@@ -150,238 +150,3 @@ To materialize the results of your DataFrame operations:
# Count rows
count = df.count()

- HTML Rendering in Jupyter
- -------------------------
-
- In Jupyter notebooks and other environments that support rich HTML display,
- DataFusion DataFrames render automatically as formatted HTML tables. This behavior
- comes from the ``_repr_html_`` method, which Jupyter calls automatically.
-
- Basic HTML Rendering
- ~~~~~~~~~~~~~~~~~~~~
-
- In a Jupyter environment, displaying a DataFrame object triggers HTML rendering:
-
- .. code-block:: python
-
-     # Displays as an HTML table in Jupyter
-     df
-
-     # Explicit display also uses HTML rendering
-     display(df)
-
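Jupyter's rich display looks for a ``_repr_html_`` method on the object being shown and renders the string it returns. The sketch below illustrates that hook with a hypothetical ``TinyTable`` class. It is a stand-in for illustration only, not DataFusion's implementation.

```python
import html


class TinyTable:
    """Hypothetical stand-in for an object with rich HTML display."""

    def __init__(self, columns, rows):
        self.columns = list(columns)
        self.rows = [list(r) for r in rows]

    def _repr_html_(self) -> str:
        # Jupyter calls this method (if present) and renders the returned HTML.
        head = "".join(f"<th>{html.escape(str(c))}</th>" for c in self.columns)
        body = "".join(
            "<tr>" + "".join(f"<td>{html.escape(str(v))}</td>" for v in row) + "</tr>"
            for row in self.rows
        )
        return f"<table><thead><tr>{head}</tr></thead><tbody>{body}</tbody></table>"


table = TinyTable(["a", "b"], [[1, "<x>"]])
rendered = table._repr_html_()
```

Outside a notebook, calling ``_repr_html_()`` directly, as in the last line, is a quick way to inspect the markup an object would produce.
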
- HTML Rendering Customization
- ----------------------------
-
- DataFusion provides extensive customization options for HTML table rendering through the
- ``datafusion.html_formatter`` module.
-
- Configuring the HTML Formatter
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
- You can customize how DataFrames are rendered by configuring the formatter:
-
- .. code-block:: python
-
-     from datafusion.html_formatter import configure_formatter
-
-     configure_formatter(
-         max_cell_length=30,                 # Maximum length of cell content before truncation
-         max_width=800,                      # Maximum width of table in pixels
-         max_height=400,                     # Maximum height of table in pixels
-         max_memory_bytes=2 * 1024 * 1024,   # Maximum memory used for rendering (2MB)
-         min_rows_display=10,                # Minimum rows to display
-         repr_rows=20,                       # Number of rows to display in representation
-         enable_cell_expansion=True,         # Allow cells to be expandable on click
-         custom_css=None,                    # Custom CSS to apply
-         show_truncation_message=True,       # Show message when data is truncated
-         style_provider=None,                # Custom style provider class
-         use_shared_styles=True,             # Share styles across tables to reduce duplication
-     )
-
- Custom Style Providers
- ~~~~~~~~~~~~~~~~~~~~~~
-
- For advanced styling needs, you can create a custom style provider class:
-
- .. code-block:: python
-
-     from datafusion.html_formatter import configure_formatter
-
-     class CustomStyleProvider:
-         def get_cell_style(self) -> str:
-             return "background-color: #f5f5f5; color: #333; padding: 8px; border: 1px solid #ddd;"
-
-         def get_header_style(self) -> str:
-             return "background-color: #4285f4; color: white; font-weight: bold; padding: 10px;"
-
-     # Apply custom styling
-     configure_formatter(style_provider=CustomStyleProvider())
-
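The provider in the example above is an ordinary class with two methods, ``get_cell_style`` and ``get_header_style``, each returning a CSS string. A minimal sketch for checking that an object has both methods before passing it on follows. ``StyleProviderLike`` is a hypothetical name introduced here, not part of DataFusion, and the sketch assumes these two methods are all a provider needs, since they are the only ones the text shows.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class StyleProviderLike(Protocol):
    """Hypothetical protocol naming the two methods used in the example above."""

    def get_cell_style(self) -> str: ...

    def get_header_style(self) -> str: ...


class CompactStyleProvider:
    def get_cell_style(self) -> str:
        return "padding: 4px; border: 1px solid #ddd;"

    def get_header_style(self) -> str:
        return "padding: 4px; font-weight: bold;"


class Incomplete:
    def get_cell_style(self) -> str:
        return ""


# isinstance() against a runtime-checkable Protocol only checks that the methods exist.
ok = isinstance(CompactStyleProvider(), StyleProviderLike)
missing = isinstance(Incomplete(), StyleProviderLike)
```
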
- Custom Type Formatters
- ~~~~~~~~~~~~~~~~~~~~~~
-
- You can register custom formatters for specific data types:
-
- .. code-block:: python
-
-     import datetime
-
-     from datafusion.html_formatter import get_formatter
-
-     formatter = get_formatter()
-
-     # Format integers with color based on value
-     def format_int(value):
-         return f'<span style="color: {"red" if value > 100 else "blue"}">{value}</span>'
-
-     formatter.register_formatter(int, format_int)
-
-     # Format date values
-     def format_date(value):
-         return f'<span class="date-value">{value.isoformat()}</span>'
-
-     formatter.register_formatter(datetime.date, format_date)
-
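The idea behind per-type formatters can be sketched without the library as a dictionary keyed by Python type. Everything below (``formatters``, ``register``, ``render``) is hypothetical and written for illustration. How DataFusion resolves a value to its registered formatter is not stated here, so the exact-type lookup is an assumption of the sketch.

```python
import datetime
import html

# Hypothetical registry: maps a Python type to a function returning an HTML string.
formatters = {}


def register(type_, func):
    formatters[type_] = func


def render(value) -> str:
    # Exact-type lookup; unregistered types fall back to escaped str().
    func = formatters.get(type(value))
    return func(value) if func else html.escape(str(value))


register(int, lambda v: f'<span style="color: {"red" if v > 100 else "blue"}">{v}</span>')
register(datetime.date, lambda v: f'<span class="date-value">{v.isoformat()}</span>')

big = render(250)
day = render(datetime.date(2024, 1, 31))
text = render("<b>")
```
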
- Custom Cell Builders
- ~~~~~~~~~~~~~~~~~~~~
-
- For complete control over cell rendering:
-
- .. code-block:: python
-
-     from datafusion.html_formatter import get_formatter
-
-     formatter = get_formatter()
-
-     def custom_cell_builder(value, row, col, table_id):
-         try:
-             num_value = float(value)
-             if num_value > 0:  # Positive values get green
-                 return f'<td style="background-color: #d9f0d3">{value}</td>'
-             if num_value < 0:  # Negative values get red
-                 return f'<td style="background-color: #f0d3d3">{value}</td>'
-         except (ValueError, TypeError):
-             pass
-
-         # Default styling for non-numeric or zero values
-         return f'<td style="border: 1px solid #ddd">{value}</td>'
-
-     formatter.set_custom_cell_builder(custom_cell_builder)
-
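The cell builder above interpolates ``value`` into the markup unchanged. If the returned string is inserted into the page as-is (an assumption; the text does not say), data containing ``<`` or ``&`` could break the table. The variant below is a sketch that escapes the text first. ``safe_cell_builder`` is a hypothetical name and keeps the ``(value, row, col, table_id)`` signature shown above.

```python
import html


def safe_cell_builder(value, row, col, table_id):
    # Escape the text before interpolating it, so "<" or "&" in data cannot break the markup.
    text = html.escape(str(value))
    try:
        num_value = float(value)
    except (ValueError, TypeError):
        num_value = 0.0  # Treat non-numeric values like zero: default styling.
    if num_value > 0:
        return f'<td style="background-color: #d9f0d3">{text}</td>'
    if num_value < 0:
        return f'<td style="background-color: #f0d3d3">{text}</td>'
    return f'<td style="border: 1px solid #ddd">{text}</td>'


positive = safe_cell_builder(3.5, 0, 0, "t1")
markup = safe_cell_builder("<script>", 0, 1, "t1")
```
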
- Custom Header Builders
- ~~~~~~~~~~~~~~~~~~~~~~
-
- Similarly, you can customize the rendering of table headers:
-
- .. code-block:: python
-
-     from datafusion.html_formatter import get_formatter
-
-     formatter = get_formatter()
-
-     def custom_header_builder(field):
-         tooltip = f"Type: {field.type}"
-         return f'<th style="background-color: #333; color: white" title="{tooltip}">{field.name}</th>'
-
-     formatter.set_custom_header_builder(custom_header_builder)
-
- Managing Formatter State
- ------------------------
-
- The HTML formatter maintains global state that can be managed:
-
- .. code-block:: python
-
-     from datafusion.html_formatter import reset_formatter, reset_styles_loaded_state, get_formatter
-
-     # Reset the formatter to default settings
-     reset_formatter()
-
-     # Reset only the styles loaded state (useful when styles were loaded but need reloading)
-     reset_styles_loaded_state()
-
-     # Get the current formatter instance to make changes
-     formatter = get_formatter()
-
- Advanced Example: Dashboard-Style Formatting
- --------------------------------------------
-
- This example shows how to create dashboard-like styling for your DataFrames:
-
- .. code-block:: python
-
-     from datafusion.html_formatter import configure_formatter, get_formatter
-
-     # Define custom CSS
-     custom_css = """
-     .datafusion-table {
-         font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;
-         border-collapse: collapse;
-         width: 100%;
-         box-shadow: 0 2px 3px rgba(0,0,0,0.1);
-     }
-     .datafusion-table th {
-         position: sticky;
-         top: 0;
-         z-index: 10;
-     }
-     .datafusion-table tr:hover td {
-         background-color: #f1f7fa !important;
-     }
-     .datafusion-table .numeric-positive {
-         color: #0a7c00;
-     }
-     .datafusion-table .numeric-negative {
-         color: #d13438;
-     }
-     """
-
-     class DashboardStyleProvider:
-         def get_cell_style(self) -> str:
-             return "padding: 8px 12px; border-bottom: 1px solid #e0e0e0;"
-
-         def get_header_style(self) -> str:
-             return ("background-color: #0078d4; color: white; font-weight: 600; "
-                     "padding: 12px; text-align: left; border-bottom: 2px solid #005a9e;")
-
-     # Apply configuration
-     configure_formatter(
-         max_height=500,
-         enable_cell_expansion=True,
-         custom_css=custom_css,
-         style_provider=DashboardStyleProvider(),
-         max_cell_length=50
-     )
-
-     # Add custom formatters for numbers
-     formatter = get_formatter()
-
-     def format_number(value):
-         try:
-             num = float(value)
-             cls = "numeric-positive" if num > 0 else "numeric-negative" if num < 0 else ""
-             return f'<span class="{cls}">{value:,}</span>' if cls else f'{value:,}'
-         except (ValueError, TypeError):
-             return str(value)
-
-     formatter.register_formatter(int, format_number)
-     formatter.register_formatter(float, format_number)
-
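The ``format_number`` helper from the dashboard example is a plain function, so it can be exercised on its own without the formatter. The sketch below repeats it and runs it over a few sample values chosen for illustration. The sample values are not from the original text.

```python
def format_number(value):
    try:
        num = float(value)
        cls = "numeric-positive" if num > 0 else "numeric-negative" if num < 0 else ""
        return f'<span class="{cls}">{value:,}</span>' if cls else f"{value:,}"
    except (ValueError, TypeError):
        # Non-numeric input, or a string that cannot take the "," format spec.
        return str(value)


samples = [format_number(v) for v in (1234567, -2.5, 0, "n/a", None)]
```

Positive and negative numbers are wrapped in a classed ``<span>``, zero is returned with thousands separators but no wrapper, and anything that is not a number falls back to ``str(value)``.
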
- Best Practices
- --------------
-
- 1. **Memory Management**: For large datasets, use ``max_memory_bytes`` to limit memory usage.
-
- 2. **Responsive Design**: Set reasonable ``max_width`` and ``max_height`` values to ensure tables display well on different screens.
-
- 3. **Style Optimization**: Use ``use_shared_styles=True`` to avoid duplicate style definitions when displaying multiple tables.
-
- 4. **Reset When Needed**: Call ``reset_formatter()`` when you want to start fresh with default settings.
-
- 5. **Cell Expansion**: Use ``enable_cell_expansion=True`` when cells might contain longer content that users may want to see in full.
-
- Additional Resources
- --------------------
-
- * :doc:`../user-guide/dataframe` - Complete guide to using DataFrames
- * :doc:`../user-guide/io/index` - I/O Guide for reading data from various sources
- * :doc:`../user-guide/data-sources` - Comprehensive data sources guide
- * :ref:`io_csv` - CSV file reading
- * :ref:`io_parquet` - Parquet file reading
- * :ref:`io_json` - JSON file reading
- * :ref:`io_avro` - Avro file reading
- * :ref:`io_custom_table_provider` - Custom table providers
- * `API Reference <https://arrow.apache.org/datafusion-python/api/index.html>`_ - Full API reference