Commit 12066f4

docs: unify dataframe documentation (#2)
1 parent ff5bfcc commit 12066f4

5 files changed: +242 -246 lines changed

docs/source/api/dataframe.rst

Lines changed: 0 additions & 235 deletions
@@ -150,238 +150,3 @@ To materialize the results of your DataFrame operations:
     # Count rows
     count = df.count()

-HTML Rendering in Jupyter
--------------------------
-
-When working in Jupyter notebooks or other environments that support rich HTML display,
-DataFusion DataFrames automatically render as nicely formatted HTML tables. This functionality
-is provided by the ``_repr_html_`` method, which is automatically called by Jupyter.
-
-Basic HTML Rendering
-~~~~~~~~~~~~~~~~~~~~
-
-In a Jupyter environment, simply displaying a DataFrame object will trigger HTML rendering:
-
-.. code-block:: python
-
-    # Will display as HTML table in Jupyter
-    df
-
-    # Explicit display also uses HTML rendering
-    display(df)
-
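IPython, which powers Jupyter's rich display, checks the displayed object for a `_repr_html_` method and renders the string it returns as HTML. A minimal sketch of that protocol, using a hypothetical `TinyTable` class in place of a DataFrame (it is not DataFusion's implementation):

```python
import html


class TinyTable:
    """Hypothetical stand-in for a DataFrame; not DataFusion's implementation."""

    def __init__(self, columns, rows):
        self.columns = columns
        self.rows = rows

    def _repr_html_(self) -> str:
        # IPython calls this hook when the object is the last expression in a
        # cell or is passed to display(); the returned string is shown as HTML.
        header = "".join(f"<th>{html.escape(str(c))}</th>" for c in self.columns)
        body = "".join(
            "<tr>" + "".join(f"<td>{html.escape(str(v))}</td>" for v in row) + "</tr>"
            for row in self.rows
        )
        return f"<table><thead><tr>{header}</tr></thead><tbody>{body}</tbody></table>"


table = TinyTable(["id", "name"], [(1, "a<b"), (2, "c")])
print(table._repr_html_())
```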
-HTML Rendering Customization
------------------------------
-
-DataFusion provides extensive customization options for HTML table rendering through the
-``datafusion.html_formatter`` module.
-
-Configuring the HTML Formatter
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-You can customize how DataFrames are rendered by configuring the formatter:
-
-.. code-block:: python
-
-    from datafusion.html_formatter import configure_formatter
-
-    configure_formatter(
-        max_cell_length=30,                # Maximum length of cell content before truncation
-        max_width=800,                     # Maximum width of table in pixels
-        max_height=400,                    # Maximum height of table in pixels
-        max_memory_bytes=2 * 1024 * 1024,  # Maximum memory used for rendering (2 MB)
-        min_rows_display=10,               # Minimum rows to display
-        repr_rows=20,                      # Number of rows to display in representation
-        enable_cell_expansion=True,        # Allow cells to be expandable on click
-        custom_css=None,                   # Custom CSS to apply
-        show_truncation_message=True,      # Show message when data is truncated
-        style_provider=None,               # Custom style provider class
-        use_shared_styles=True             # Share styles across tables to reduce duplication
-    )
-
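The configuration options above can be hard to picture from their names alone. Below is a hypothetical pure-Python helper, `preview`, that applies two of the limits (`repr_rows` and `max_cell_length`) to plain row tuples. It assumes truncation keeps the first `max_cell_length` characters and appends `...`; the marker DataFusion actually uses and its click-to-expand behavior are not reproduced.

```python
def preview(rows, repr_rows=20, max_cell_length=30):
    """Hypothetical helper: limit the rows shown and truncate long cell text."""
    shown = []
    for row in rows[:repr_rows]:
        cells = []
        for value in row:
            text = str(value)
            # Keep at most max_cell_length characters, then mark the cut.
            if len(text) > max_cell_length:
                text = text[:max_cell_length] + "..."
            cells.append(text)
        shown.append(cells)
    # Report whether any rows were left out, e.g. to show a truncation message.
    truncated = len(rows) > repr_rows
    return shown, truncated


rows = [(i, "x" * 40) for i in range(25)]
shown, truncated = preview(rows, repr_rows=20, max_cell_length=30)
print(len(shown), truncated, len(shown[0][1]))  # 20 True 33
```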
-Custom Style Providers
-~~~~~~~~~~~~~~~~~~~~~~
-
-For advanced styling needs, you can create a custom style provider class:
-
-.. code-block:: python
-
-    from datafusion.html_formatter import configure_formatter
-
-    class CustomStyleProvider:
-        def get_cell_style(self) -> str:
-            return "background-color: #f5f5f5; color: #333; padding: 8px; border: 1px solid #ddd;"
-
-        def get_header_style(self) -> str:
-            return "background-color: #4285f4; color: white; font-weight: bold; padding: 10px;"
-
-    # Apply custom styling
-    configure_formatter(style_provider=CustomStyleProvider())
-
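The style provider in the example is duck-typed: any object with `get_cell_style()` and `get_header_style()` methods returning inline CSS strings fits. A sketch of that contract as a `typing.Protocol`, plus a hypothetical `render_row` function showing where each string might be applied (it is not DataFusion's renderer):

```python
from typing import Protocol, Sequence


class StyleProviderLike(Protocol):
    """Hypothetical protocol mirroring the two methods the example defines."""

    def get_cell_style(self) -> str: ...

    def get_header_style(self) -> str: ...


class CustomStyleProvider:
    def get_cell_style(self) -> str:
        return "background-color: #f5f5f5; color: #333; padding: 8px; border: 1px solid #ddd;"

    def get_header_style(self) -> str:
        return "background-color: #4285f4; color: white; font-weight: bold; padding: 10px;"


def render_row(values: Sequence[object], provider: StyleProviderLike, header: bool = False) -> str:
    # Header cells use <th> with the header style; body cells use <td> with the cell style.
    tag = "th" if header else "td"
    style = provider.get_header_style() if header else provider.get_cell_style()
    cells = "".join(f'<{tag} style="{style}">{v}</{tag}>' for v in values)
    return f"<tr>{cells}</tr>"


provider = CustomStyleProvider()
print(render_row(["id", "name"], provider, header=True))
print(render_row([1, "alice"], provider))
```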
-Custom Type Formatters
-~~~~~~~~~~~~~~~~~~~~~~
-
-You can register custom formatters for specific data types:
-
-.. code-block:: python
-
-    import datetime
-    from datafusion.html_formatter import get_formatter
-    formatter = get_formatter()
-
-    # Format integers with color based on value
-    def format_int(value):
-        return f'<span style="color: {"red" if value > 100 else "blue"}">{value}</span>'
-
-    formatter.register_formatter(int, format_int)
-
-    # Format date values
-    def format_date(value):
-        return f'<span class="date-value">{value.isoformat()}</span>'
-
-    formatter.register_formatter(datetime.date, format_date)
-
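Registering formatters by type boils down to a lookup from a Python type to a function. The sketch below is a hypothetical `TypeFormatterRegistry` that assumes values are matched with `isinstance` in registration order; DataFusion's exact lookup rules are not shown in this diff. Under that assumption, `bool` values reach the `int` formatter, because `bool` is a subclass of `int`.

```python
import datetime
from typing import Any, Callable, Dict


class TypeFormatterRegistry:
    """Hypothetical registry; not DataFusion's implementation."""

    def __init__(self) -> None:
        self._formatters: Dict[type, Callable[[Any], str]] = {}

    def register_formatter(self, type_class: type, fn: Callable[[Any], str]) -> None:
        self._formatters[type_class] = fn

    def format(self, value: Any) -> str:
        # The first registered type that matches wins; otherwise fall back to str().
        for type_class, fn in self._formatters.items():
            if isinstance(value, type_class):
                return fn(value)
        return str(value)


def format_int(value: int) -> str:
    return f'<span style="color: {"red" if value > 100 else "blue"}">{value}</span>'


def format_date(value: datetime.date) -> str:
    return f'<span class="date-value">{value.isoformat()}</span>'


registry = TypeFormatterRegistry()
registry.register_formatter(int, format_int)
registry.register_formatter(datetime.date, format_date)

print(registry.format(150))
print(registry.format(datetime.date(2024, 1, 31)))
print(registry.format("text"))
```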
-Custom Cell Builders
-~~~~~~~~~~~~~~~~~~~~
-
-For complete control over cell rendering:
-
-.. code-block:: python
-
-    formatter = get_formatter()
-
-    def custom_cell_builder(value, row, col, table_id):
-        try:
-            num_value = float(value)
-            if num_value > 0:  # Positive values get green
-                return f'<td style="background-color: #d9f0d3">{value}</td>'
-            if num_value < 0:  # Negative values get red
-                return f'<td style="background-color: #f0d3d3">{value}</td>'
-        except (ValueError, TypeError):
-            pass
-
-        # Default styling for non-numeric or zero values
-        return f'<td style="border: 1px solid #ddd">{value}</td>'
-
-    formatter.set_custom_cell_builder(custom_cell_builder)
-
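Because a cell builder is an ordinary function, it can be called directly before it is registered. The variant below keeps the same color rules but escapes the value with `html.escape`, assuming cell values may contain characters such as `<` or `&` that the builder above would insert verbatim. As in the original, `row`, `col` and `table_id` are accepted but unused.

```python
import html


def custom_cell_builder(value, row, col, table_id):
    # Escape the display text so markup-like values render literally.
    text = html.escape(str(value))
    try:
        num_value = float(value)
        if num_value > 0:  # Positive values get green
            return f'<td style="background-color: #d9f0d3">{text}</td>'
        if num_value < 0:  # Negative values get red
            return f'<td style="background-color: #f0d3d3">{text}</td>'
    except (ValueError, TypeError):
        pass
    # Default styling for non-numeric or zero values
    return f'<td style="border: 1px solid #ddd">{text}</td>'


print(custom_cell_builder(3.5, 0, 0, "t1"))
print(custom_cell_builder("<b>n/a</b>", 0, 1, "t1"))
```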
-Custom Header Builders
-~~~~~~~~~~~~~~~~~~~~~~
-
-Similarly, you can customize the rendering of table headers:
-
-.. code-block:: python
-
-    def custom_header_builder(field):
-        tooltip = f"Type: {field.type}"
-        return f'<th style="background-color: #333; color: white" title="{tooltip}">{field.name}</th>'
-
-    formatter.set_custom_header_builder(custom_header_builder)
-
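The header builder only reads `field.name` and `field.type`; in DataFusion these come from the Arrow schema. To try a builder without a DataFrame, the sketch below passes a hypothetical `FieldStub` named tuple in place of a real Arrow field, and escapes both values since a column name or type string could contain `<`, `&` or quotes.

```python
import html
from typing import NamedTuple


class FieldStub(NamedTuple):
    """Hypothetical stand-in for an Arrow field: only name and type are used."""

    name: str
    type: str


def custom_header_builder(field):
    # html.escape escapes double quotes by default, which keeps the title attribute intact.
    tooltip = html.escape(f"Type: {field.type}")
    name = html.escape(str(field.name))
    return f'<th style="background-color: #333; color: white" title="{tooltip}">{name}</th>'


print(custom_header_builder(FieldStub("price", "double")))
print(custom_header_builder(FieldStub('say "hi"', "string")))
```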
-Managing Formatter State
--------------------------
-
-The HTML formatter maintains global state that can be managed:
-
-.. code-block:: python
-
-    from datafusion.html_formatter import reset_formatter, reset_styles_loaded_state, get_formatter
-
-    # Reset the formatter to default settings
-    reset_formatter()
-
-    # Reset only the styles loaded state (useful when styles were loaded but need reloading)
-    reset_styles_loaded_state()
-
-    # Get the current formatter instance to make changes
-    formatter = get_formatter()
-
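A reset function matters because the formatter lives at module level: a configuration call made in one notebook cell keeps affecting every later table until it is undone. A minimal sketch of that get/configure/reset pattern, with hypothetical names (`Settings`, `get_settings`, `configure`, `reset`) that are not DataFusion's code. This version merges each call's overrides into the current settings.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings; values are illustrative, not the library's defaults."""

    max_cell_length: int = 30
    repr_rows: int = 20
    enable_cell_expansion: bool = True


_DEFAULTS = Settings()
_current = _DEFAULTS


def get_settings() -> Settings:
    return _current


def configure(**overrides) -> None:
    # Replace the module-level instance; every later render would see the change.
    # Unknown option names raise TypeError from dataclasses.replace.
    global _current
    _current = replace(_current, **overrides)


def reset() -> None:
    global _current
    _current = _DEFAULTS


configure(max_cell_length=50)
print(get_settings().max_cell_length)  # 50
reset()
print(get_settings().max_cell_length)  # 30
```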
-Advanced Example: Dashboard-Style Formatting
----------------------------------------------
-
-This example shows how to create dashboard-like styling for your DataFrames:
-
-.. code-block:: python
-
-    from datafusion.html_formatter import configure_formatter, get_formatter
-
-    # Define custom CSS
-    custom_css = """
-    .datafusion-table {
-        font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;
-        border-collapse: collapse;
-        width: 100%;
-        box-shadow: 0 2px 3px rgba(0,0,0,0.1);
-    }
-    .datafusion-table th {
-        position: sticky;
-        top: 0;
-        z-index: 10;
-    }
-    .datafusion-table tr:hover td {
-        background-color: #f1f7fa !important;
-    }
-    .datafusion-table .numeric-positive {
-        color: #0a7c00;
-    }
-    .datafusion-table .numeric-negative {
-        color: #d13438;
-    }
-    """
-
-    class DashboardStyleProvider:
-        def get_cell_style(self) -> str:
-            return "padding: 8px 12px; border-bottom: 1px solid #e0e0e0;"
-
-        def get_header_style(self) -> str:
-            return ("background-color: #0078d4; color: white; font-weight: 600; "
-                    "padding: 12px; text-align: left; border-bottom: 2px solid #005a9e;")
-
-    # Apply configuration
-    configure_formatter(
-        max_height=500,
-        enable_cell_expansion=True,
-        custom_css=custom_css,
-        style_provider=DashboardStyleProvider(),
-        max_cell_length=50
-    )
-
-    # Add custom formatters for numbers
-    formatter = get_formatter()
-
-    def format_number(value):
-        try:
-            num = float(value)
-            cls = "numeric-positive" if num > 0 else "numeric-negative" if num < 0 else ""
-            return f'<span class="{cls}">{value:,}</span>' if cls else f'{value:,}'
-        except (ValueError, TypeError):
-            return str(value)
-
-    formatter.register_formatter(int, format_number)
-    formatter.register_formatter(float, format_number)
-
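`format_number` is a plain function, so its output can be checked before registering it. The copy below is unchanged from the example; it shows that the `,` format spec adds thousands separators for both `int` and `float`, and that zero gets no CSS class.

```python
def format_number(value):
    try:
        num = float(value)
        cls = "numeric-positive" if num > 0 else "numeric-negative" if num < 0 else ""
        # The "," format spec inserts thousands separators for int and float.
        return f'<span class="{cls}">{value:,}</span>' if cls else f'{value:,}'
    except (ValueError, TypeError):
        # Values that float() rejects are shown as plain text.
        return str(value)


for v in (1234567, -42, 0, 1234.5):
    print(format_number(v))
```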
-Best Practices
---------------
-
-1. **Memory Management**: For large datasets, use ``max_memory_bytes`` to limit memory usage.
-
-2. **Responsive Design**: Set reasonable ``max_width`` and ``max_height`` values to ensure tables display well on different screens.
-
-3. **Style Optimization**: Use ``use_shared_styles=True`` to avoid duplicate style definitions when displaying multiple tables.
-
-4. **Reset When Needed**: Call ``reset_formatter()`` when you want to start fresh with default settings.
-
-5. **Cell Expansion**: Use ``enable_cell_expansion=True`` when cells might contain longer content that users may want to see in full.
-
-Additional Resources
----------------------
-
-* :doc:`../user-guide/dataframe` - Complete guide to using DataFrames
-* :doc:`../user-guide/io/index` - I/O Guide for reading data from various sources
-* :doc:`../user-guide/data-sources` - Comprehensive data sources guide
-* :ref:`io_csv` - CSV file reading
-* :ref:`io_parquet` - Parquet file reading
-* :ref:`io_json` - JSON file reading
-* :ref:`io_avro` - Avro file reading
-* :ref:`io_custom_table_provider` - Custom table providers
-* `API Reference <https://arrow.apache.org/datafusion-python/api/index.html>`_ - Full API reference

docs/source/api/index.rst

Lines changed: 0 additions & 2 deletions
@@ -23,5 +23,3 @@ This section provides detailed API documentation for the DataFusion Python libra

 .. toctree::
    :maxdepth: 2
-
-   dataframe

docs/source/index.rst

Lines changed: 1 addition & 8 deletions
@@ -72,7 +72,7 @@ Example
    user-guide/introduction
    user-guide/basics
    user-guide/data-sources
-   user-guide/dataframe
+   user-guide/dataframe/index
    user-guide/common-operations/index
    user-guide/io/index
    user-guide/configuration
@@ -88,10 +88,3 @@ Example
    contributor-guide/introduction
    contributor-guide/ffi

-.. _toc.api:
-.. toctree::
-   :hidden:
-   :maxdepth: 1
-   :caption: API
-
-   api/index

docs/source/user-guide/dataframe.rst renamed to docs/source/user-guide/dataframe/index.rst

Lines changed: 6 additions & 1 deletion
@@ -215,4 +215,9 @@ You can control how much data is displayed and how much memory is used for rende
         repr_rows=20 # Show 20 rows in __repr__ output
     )

-These parameters help balance comprehensive data display against performance considerations.
+These parameters help balance comprehensive data display against performance considerations.
+
+.. toctree::
+   :maxdepth: 1
+
+   rendering
