diff --git a/docs/source/api/dataframe.rst b/docs/source/api/dataframe.rst deleted file mode 100644 index a9e9e47c..00000000 --- a/docs/source/api/dataframe.rst +++ /dev/null @@ -1,387 +0,0 @@ -.. Licensed to the Apache Software Foundation (ASF) under one -.. or more contributor license agreements. See the NOTICE file -.. distributed with this work for additional information -.. regarding copyright ownership. The ASF licenses this file -.. to you under the Apache License, Version 2.0 (the -.. "License"); you may not use this file except in compliance -.. with the License. You may obtain a copy of the License at - -.. http://www.apache.org/licenses/LICENSE-2.0 - -.. Unless required by applicable law or agreed to in writing, -.. software distributed under the License is distributed on an -.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -.. KIND, either express or implied. See the License for the -.. specific language governing permissions and limitations -.. under the License. - -================= -DataFrame API -================= - -Overview --------- - -The ``DataFrame`` class is the core abstraction in DataFusion that represents tabular data and operations -on that data. DataFrames provide a flexible API for transforming data through various operations such as -filtering, projection, aggregation, joining, and more. - -A DataFrame represents a logical plan that is lazily evaluated. The actual execution occurs only when -terminal operations like ``collect()``, ``show()``, or ``to_pandas()`` are called. - -Creating DataFrames -------------------- - -DataFrames can be created in several ways: - -* From SQL queries via a ``SessionContext``: - - .. code-block:: python - - from datafusion import SessionContext - - ctx = SessionContext() - df = ctx.sql("SELECT * FROM your_table") - -* From registered tables: - - .. code-block:: python - - df = ctx.table("your_table") - -* From various data sources: - - .. 
code-block:: python - - # From CSV files (see :ref:`io_csv` for detailed options) - df = ctx.read_csv("path/to/data.csv") - - # From Parquet files (see :ref:`io_parquet` for detailed options) - df = ctx.read_parquet("path/to/data.parquet") - - # From JSON files (see :ref:`io_json` for detailed options) - df = ctx.read_json("path/to/data.json") - - # From Avro files (see :ref:`io_avro` for detailed options) - df = ctx.read_avro("path/to/data.avro") - - # From Pandas DataFrame - import pandas as pd - pandas_df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}) - df = ctx.from_pandas(pandas_df) - - # From Arrow data - import pyarrow as pa - batch = pa.RecordBatch.from_arrays( - [pa.array([1, 2, 3]), pa.array([4, 5, 6])], - names=["a", "b"] - ) - df = ctx.from_arrow(batch) - - For detailed information about reading from different data sources, see the :doc:`I/O Guide <../user-guide/io/index>`. - For custom data sources, see :ref:`io_custom_table_provider`. - -Common DataFrame Operations ---------------------------- - -DataFusion's DataFrame API offers a wide range of operations: - -.. 
code-block:: python - - from datafusion import column, literal - - # Select specific columns - df = df.select("col1", "col2") - - # Select with expressions - df = df.select(column("a") + column("b"), column("a") - column("b")) - - # Filter rows - df = df.filter(column("age") > literal(25)) - - # Add computed columns - df = df.with_column("full_name", column("first_name") + literal(" ") + column("last_name")) - - # Multiple column additions - df = df.with_columns( - (column("a") + column("b")).alias("sum"), - (column("a") * column("b")).alias("product") - ) - - # Sort data - df = df.sort(column("age").sort(ascending=False)) - - # Join DataFrames - df = df1.join(df2, on="user_id", how="inner") - - # Aggregate data - from datafusion import functions as f - df = df.aggregate( - [], # Group by columns (empty for global aggregation) - [f.sum(column("amount")).alias("total_amount")] - ) - - # Limit rows - df = df.limit(100) - - # Drop columns - df = df.drop("temporary_column") - -Terminal Operations -------------------- - -To materialize the results of your DataFrame operations: - -.. code-block:: python - - # Collect all data as PyArrow RecordBatches - result_batches = df.collect() - - # Convert to various formats - pandas_df = df.to_pandas() # Pandas DataFrame - polars_df = df.to_polars() # Polars DataFrame - arrow_table = df.to_arrow_table() # PyArrow Table - py_dict = df.to_pydict() # Python dictionary - py_list = df.to_pylist() # Python list of dictionaries - - # Display results - df.show() # Print tabular format to console - - # Count rows - count = df.count() - -HTML Rendering in Jupyter -------------------------- - -When working in Jupyter notebooks or other environments that support rich HTML display, -DataFusion DataFrames automatically render as nicely formatted HTML tables. This functionality -is provided by the ``_repr_html_`` method, which is automatically called by Jupyter. 
- -Basic HTML Rendering -~~~~~~~~~~~~~~~~~~~~ - -In a Jupyter environment, simply displaying a DataFrame object will trigger HTML rendering: - -.. code-block:: python - - # Will display as HTML table in Jupyter - df - - # Explicit display also uses HTML rendering - display(df) - -HTML Rendering Customization ----------------------------- - -DataFusion provides extensive customization options for HTML table rendering through the -``datafusion.html_formatter`` module. - -Configuring the HTML Formatter -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -You can customize how DataFrames are rendered by configuring the formatter: - -.. code-block:: python - - from datafusion.html_formatter import configure_formatter - - configure_formatter( - max_cell_length=30, # Maximum length of cell content before truncation - max_width=800, # Maximum width of table in pixels - max_height=400, # Maximum height of table in pixels - max_memory_bytes=2 * 1024 * 1024,# Maximum memory used for rendering (2MB) - min_rows_display=10, # Minimum rows to display - repr_rows=20, # Number of rows to display in representation - enable_cell_expansion=True, # Allow cells to be expandable on click - custom_css=None, # Custom CSS to apply - show_truncation_message=True, # Show message when data is truncated - style_provider=None, # Custom style provider class - use_shared_styles=True # Share styles across tables to reduce duplication - ) - -Custom Style Providers -~~~~~~~~~~~~~~~~~~~~~~ - -For advanced styling needs, you can create a custom style provider class: - -.. 
code-block:: python - - from datafusion.html_formatter import configure_formatter - - class CustomStyleProvider: - def get_cell_style(self) -> str: - return "background-color: #f5f5f5; color: #333; padding: 8px; border: 1px solid #ddd;" - - def get_header_style(self) -> str: - return "background-color: #4285f4; color: white; font-weight: bold; padding: 10px;" - - # Apply custom styling - configure_formatter(style_provider=CustomStyleProvider()) - -Custom Type Formatters -~~~~~~~~~~~~~~~~~~~~~~ - -You can register custom formatters for specific data types: - -.. code-block:: python - - from datafusion.html_formatter import get_formatter - - formatter = get_formatter() - - # Format integers with color based on value - def format_int(value): - return f' 100 else "blue"}">{value}' - - formatter.register_formatter(int, format_int) - - # Format date values - def format_date(value): - return f'{value.isoformat()}' - - formatter.register_formatter(datetime.date, format_date) - -Custom Cell Builders -~~~~~~~~~~~~~~~~~~~~ - -For complete control over cell rendering: - -.. code-block:: python - - formatter = get_formatter() - - def custom_cell_builder(value, row, col, table_id): - try: - num_value = float(value) - if num_value > 0: # Positive values get green - return f'{value}' - if num_value < 0: # Negative values get red - return f'{value}' - except (ValueError, TypeError): - pass - - # Default styling for non-numeric or zero values - return f'{value}' - - formatter.set_custom_cell_builder(custom_cell_builder) - -Custom Header Builders -~~~~~~~~~~~~~~~~~~~~~~ - -Similarly, you can customize the rendering of table headers: - -.. code-block:: python - - def custom_header_builder(field): - tooltip = f"Type: {field.type}" - return f'{field.name}' - - formatter.set_custom_header_builder(custom_header_builder) - -Managing Formatter State ------------------------~ - -The HTML formatter maintains global state that can be managed: - -.. 
code-block:: python - - from datafusion.html_formatter import reset_formatter, reset_styles_loaded_state, get_formatter - - # Reset the formatter to default settings - reset_formatter() - - # Reset only the styles loaded state (useful when styles were loaded but need reloading) - reset_styles_loaded_state() - - # Get the current formatter instance to make changes - formatter = get_formatter() - -Advanced Example: Dashboard-Style Formatting -------------------------------------------~~ - -This example shows how to create a dashboard-like styling for your DataFrames: - -.. code-block:: python - - from datafusion.html_formatter import configure_formatter, get_formatter - - # Define custom CSS - custom_css = """ - .datafusion-table { - font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif; - border-collapse: collapse; - width: 100%; - box-shadow: 0 2px 3px rgba(0,0,0,0.1); - } - .datafusion-table th { - position: sticky; - top: 0; - z-index: 10; - } - .datafusion-table tr:hover td { - background-color: #f1f7fa !important; - } - .datafusion-table .numeric-positive { - color: #0a7c00; - } - .datafusion-table .numeric-negative { - color: #d13438; - } - """ - - class DashboardStyleProvider: - def get_cell_style(self) -> str: - return "padding: 8px 12px; border-bottom: 1px solid #e0e0e0;" - - def get_header_style(self) -> str: - return ("background-color: #0078d4; color: white; font-weight: 600; " - "padding: 12px; text-align: left; border-bottom: 2px solid #005a9e;") - - # Apply configuration - configure_formatter( - max_height=500, - enable_cell_expansion=True, - custom_css=custom_css, - style_provider=DashboardStyleProvider(), - max_cell_length=50 - ) - - # Add custom formatters for numbers - formatter = get_formatter() - - def format_number(value): - try: - num = float(value) - cls = "numeric-positive" if num > 0 else "numeric-negative" if num < 0 else "" - return f'{value:,}' if cls else f'{value:,}' - except (ValueError, TypeError): - return str(value) - - 
formatter.register_formatter(int, format_number) - formatter.register_formatter(float, format_number) - -Best Practices --------------- - -1. **Memory Management**: For large datasets, use ``max_memory_bytes`` to limit memory usage. - -2. **Responsive Design**: Set reasonable ``max_width`` and ``max_height`` values to ensure tables display well on different screens. - -3. **Style Optimization**: Use ``use_shared_styles=True`` to avoid duplicate style definitions when displaying multiple tables. - -4. **Reset When Needed**: Call ``reset_formatter()`` when you want to start fresh with default settings. - -5. **Cell Expansion**: Use ``enable_cell_expansion=True`` when cells might contain longer content that users may want to see in full. - -Additional Resources --------------------- - -* :doc:`../user-guide/dataframe` - Complete guide to using DataFrames -* :doc:`../user-guide/io/index` - I/O Guide for reading data from various sources -* :doc:`../user-guide/data-sources` - Comprehensive data sources guide -* :ref:`io_csv` - CSV file reading -* :ref:`io_parquet` - Parquet file reading -* :ref:`io_json` - JSON file reading -* :ref:`io_avro` - Avro file reading -* :ref:`io_custom_table_provider` - Custom table providers -* `API Reference `_ - Full API reference diff --git a/docs/source/api/index.rst b/docs/source/api/index.rst deleted file mode 100644 index 7f58227c..00000000 --- a/docs/source/api/index.rst +++ /dev/null @@ -1,27 +0,0 @@ -.. Licensed to the Apache Software Foundation (ASF) under one -.. or more contributor license agreements. See the NOTICE file -.. distributed with this work for additional information -.. regarding copyright ownership. The ASF licenses this file -.. to you under the Apache License, Version 2.0 (the -.. "License"); you may not use this file except in compliance -.. with the License. You may obtain a copy of the License at - -.. http://www.apache.org/licenses/LICENSE-2.0 - -.. 
Unless required by applicable law or agreed to in writing, -.. software distributed under the License is distributed on an -.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -.. KIND, either express or implied. See the License for the -.. specific language governing permissions and limitations -.. under the License. - -============= -API Reference -============= - -This section provides detailed API documentation for the DataFusion Python library. - -.. toctree:: - :maxdepth: 2 - - dataframe diff --git a/docs/source/index.rst b/docs/source/index.rst index ff1e4728..adec60f4 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -72,7 +72,7 @@ Example user-guide/introduction user-guide/basics user-guide/data-sources - user-guide/dataframe + user-guide/dataframe/index user-guide/common-operations/index user-guide/io/index user-guide/configuration @@ -93,5 +93,3 @@ Example :hidden: :maxdepth: 1 :caption: API - - api/index diff --git a/docs/source/user-guide/basics.rst b/docs/source/user-guide/basics.rst index 2975d9a6..7c682046 100644 --- a/docs/source/user-guide/basics.rst +++ b/docs/source/user-guide/basics.rst @@ -73,7 +73,7 @@ DataFrames are typically created by calling a method on :py:class:`~datafusion.c calling the transformation methods, such as :py:func:`~datafusion.dataframe.DataFrame.filter`, :py:func:`~datafusion.dataframe.DataFrame.select`, :py:func:`~datafusion.dataframe.DataFrame.aggregate`, and :py:func:`~datafusion.dataframe.DataFrame.limit` to build up a query definition. -For more details on working with DataFrames, including visualization options and conversion to other formats, see :doc:`dataframe`. +For more details on working with DataFrames, including visualization options and conversion to other formats, see :doc:`dataframe/index`. 
Expressions ----------- diff --git a/docs/source/user-guide/dataframe/index.rst b/docs/source/user-guide/dataframe/index.rst new file mode 100644 index 00000000..f69485af --- /dev/null +++ b/docs/source/user-guide/dataframe/index.rst @@ -0,0 +1,209 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +.. or more contributor license agreements. See the NOTICE file +.. distributed with this work for additional information +.. regarding copyright ownership. The ASF licenses this file +.. to you under the Apache License, Version 2.0 (the +.. "License"); you may not use this file except in compliance +.. with the License. You may obtain a copy of the License at + +.. http://www.apache.org/licenses/LICENSE-2.0 + +.. Unless required by applicable law or agreed to in writing, +.. software distributed under the License is distributed on an +.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +.. KIND, either express or implied. See the License for the +.. specific language governing permissions and limitations +.. under the License. + +DataFrames +========== + +Overview +-------- + +The ``DataFrame`` class is the core abstraction in DataFusion that represents tabular data and operations +on that data. DataFrames provide a flexible API for transforming data through various operations such as +filtering, projection, aggregation, joining, and more. + +A DataFrame represents a logical plan that is lazily evaluated. The actual execution occurs only when +terminal operations like ``collect()``, ``show()``, or ``to_pandas()`` are called. + +Creating DataFrames +------------------- + +DataFrames can be created in several ways: + +* From SQL queries via a ``SessionContext``: + + .. code-block:: python + + from datafusion import SessionContext + + ctx = SessionContext() + df = ctx.sql("SELECT * FROM your_table") + +* From registered tables: + + .. code-block:: python + + df = ctx.table("your_table") + +* From various data sources: + + .. 
code-block:: python + + # From CSV files (see :ref:`io_csv` for detailed options) + df = ctx.read_csv("path/to/data.csv") + + # From Parquet files (see :ref:`io_parquet` for detailed options) + df = ctx.read_parquet("path/to/data.parquet") + + # From JSON files (see :ref:`io_json` for detailed options) + df = ctx.read_json("path/to/data.json") + + # From Avro files (see :ref:`io_avro` for detailed options) + df = ctx.read_avro("path/to/data.avro") + + # From Pandas DataFrame + import pandas as pd + pandas_df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}) + df = ctx.from_pandas(pandas_df) + + # From Arrow data + import pyarrow as pa + batch = pa.RecordBatch.from_arrays( + [pa.array([1, 2, 3]), pa.array([4, 5, 6])], + names=["a", "b"] + ) + df = ctx.from_arrow(batch) + +For detailed information about reading from different data sources, see the :doc:`I/O Guide <../io/index>`. +For custom data sources, see :ref:`io_custom_table_provider`. + +Common DataFrame Operations +--------------------------- + +DataFusion's DataFrame API offers a wide range of operations: + +.. 
code-block:: python + + from datafusion import column, literal + + # Select specific columns + df = df.select("col1", "col2") + + # Select with expressions + df = df.select(column("a") + column("b"), column("a") - column("b")) + + # Filter rows + df = df.filter(column("age") > literal(25)) + + # Add computed columns + df = df.with_column("full_name", column("first_name") + literal(" ") + column("last_name")) + + # Multiple column additions + df = df.with_columns( + (column("a") + column("b")).alias("sum"), + (column("a") * column("b")).alias("product") + ) + + # Sort data + df = df.sort(column("age").sort(ascending=False)) + + # Join DataFrames + df = df1.join(df2, on="user_id", how="inner") + + # Aggregate data + from datafusion import functions as f + df = df.aggregate( + [], # Group by columns (empty for global aggregation) + [f.sum(column("amount")).alias("total_amount")] + ) + + # Limit rows + df = df.limit(100) + + # Drop columns + df = df.drop("temporary_column") + +Terminal Operations +------------------- + +To materialize the results of your DataFrame operations: + +.. code-block:: python + + # Collect all data as PyArrow RecordBatches + result_batches = df.collect() + + # Convert to various formats + pandas_df = df.to_pandas() # Pandas DataFrame + polars_df = df.to_polars() # Polars DataFrame + arrow_table = df.to_arrow_table() # PyArrow Table + py_dict = df.to_pydict() # Python dictionary + py_list = df.to_pylist() # Python list of dictionaries + + # Display results + df.show() # Print tabular format to console + + # Count rows + count = df.count() + +HTML Rendering +-------------- + +When working in Jupyter notebooks or other environments that support HTML rendering, DataFrames will +automatically display as formatted HTML tables. For detailed information about customizing HTML +rendering, formatting options, and advanced styling, see :doc:`rendering`. 
+ +Core Classes +------------ + +**DataFrame** + The main DataFrame class for building and executing queries. + + See: :py:class:`datafusion.DataFrame` + +**SessionContext** + The primary entry point for creating DataFrames from various data sources. + + Key methods for DataFrame creation: + + * :py:meth:`~datafusion.SessionContext.read_csv` - Read CSV files + * :py:meth:`~datafusion.SessionContext.read_parquet` - Read Parquet files + * :py:meth:`~datafusion.SessionContext.read_json` - Read JSON files + * :py:meth:`~datafusion.SessionContext.read_avro` - Read Avro files + * :py:meth:`~datafusion.SessionContext.table` - Access registered tables + * :py:meth:`~datafusion.SessionContext.sql` - Execute SQL queries + * :py:meth:`~datafusion.SessionContext.from_pandas` - Create from Pandas DataFrame + * :py:meth:`~datafusion.SessionContext.from_arrow` - Create from Arrow data + + See: :py:class:`datafusion.SessionContext` + +Expression Classes +------------------ + +**Expr** + Represents expressions that can be used in DataFrame operations. + + See: :py:class:`datafusion.Expr` + +**Functions for creating expressions:** + +* :py:func:`datafusion.column` - Reference a column by name +* :py:func:`datafusion.literal` - Create a literal value expression + +Built-in Functions +------------------ + +DataFusion provides many built-in functions for data manipulation: + +* :py:mod:`datafusion.functions` - Mathematical, string, date/time, and aggregation functions + +For a complete list of available functions, see the :py:mod:`datafusion.functions` module documentation. + + +.. 
toctree:: + :maxdepth: 1 + + rendering diff --git a/docs/source/user-guide/dataframe.rst b/docs/source/user-guide/dataframe/rendering.rst similarity index 72% rename from docs/source/user-guide/dataframe.rst rename to docs/source/user-guide/dataframe/rendering.rst index 23c65b5f..4c37c747 100644 --- a/docs/source/user-guide/dataframe.rst +++ b/docs/source/user-guide/dataframe/rendering.rst @@ -15,59 +15,37 @@ .. specific language governing permissions and limitations .. under the License. -DataFrames -========== +HTML Rendering in Jupyter +========================= -Overview --------- +When working in Jupyter notebooks or other environments that support rich HTML display, +DataFusion DataFrames automatically render as nicely formatted HTML tables. This functionality +is provided by the ``_repr_html_`` method, which is automatically called by Jupyter to provide +a richer visualization than plain text output. -DataFusion's DataFrame API provides a powerful interface for building and executing queries against data sources. -It offers a familiar API similar to pandas and other DataFrame libraries, but with the performance benefits of Rust -and Arrow. +Basic HTML Rendering +-------------------- -A DataFrame represents a logical plan that can be composed through operations like filtering, projection, and aggregation. -The actual execution happens when terminal operations like ``collect()`` or ``show()`` are called. - -Basic Usage ------------ +In a Jupyter environment, simply displaying a DataFrame object will trigger HTML rendering: .. 
code-block:: python - import datafusion - from datafusion import col, lit + # Will display as HTML table in Jupyter + df - # Create a context and register a data source - ctx = datafusion.SessionContext() - ctx.register_csv("my_table", "path/to/data.csv") - - # Create and manipulate a DataFrame - df = ctx.sql("SELECT * FROM my_table") - - # Or use the DataFrame API directly - df = (ctx.table("my_table") - .filter(col("age") > lit(25)) - .select([col("name"), col("age")])) - - # Execute and collect results - result = df.collect() - - # Display the first few rows - df.show() + # Explicit display also uses HTML rendering + display(df) -HTML Rendering --------------- - -When working in Jupyter notebooks or other environments that support HTML rendering, DataFrames will -automatically display as formatted HTML tables, making it easier to visualize your data. +Customizing HTML Rendering +--------------------------- -The ``_repr_html_`` method is called automatically by Jupyter to render a DataFrame. This method -controls how DataFrames appear in notebook environments, providing a richer visualization than -plain text output. +DataFusion provides extensive customization options for HTML table rendering through the +``datafusion.html_formatter`` module. -Customizing HTML Rendering --------------------------- +Configuring the HTML Formatter +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -You can customize how DataFrames are rendered in HTML by configuring the formatter: +You can customize how DataFrames are rendered by configuring the formatter: .. code-block:: python @@ -91,7 +69,7 @@ You can customize how DataFrames are rendered in HTML by configuring the formatt The formatter settings affect all DataFrames displayed after configuration. 
Custom Style Providers ----------------------- +----------------------- For advanced styling needs, you can create a custom style provider: @@ -118,7 +96,8 @@ For advanced styling needs, you can create a custom style provider: configure_formatter(style_provider=MyStyleProvider()) Performance Optimization with Shared Styles -------------------------------------------- +-------------------------------------------- + The ``use_shared_styles`` parameter (enabled by default) optimizes performance when displaying multiple DataFrames in notebook environments: @@ -138,7 +117,7 @@ When ``use_shared_styles=True``: - Applies consistent styling across all DataFrames Creating a Custom Formatter ---------------------------- +---------------------------- For complete control over rendering, you can implement a custom formatter: @@ -184,7 +163,7 @@ Get the current formatter settings: print(formatter.theme) Contextual Formatting ---------------------- +---------------------- You can also use a context manager to temporarily change formatting settings: @@ -207,12 +186,38 @@ Memory and Display Controls You can control how much data is displayed and how much memory is used for rendering: - .. code-block:: python - +.. code-block:: python + configure_formatter( max_memory_bytes=4 * 1024 * 1024, # 4MB maximum memory for display min_rows_display=50, # Always show at least 50 rows repr_rows=20 # Show 20 rows in __repr__ output ) -These parameters help balance comprehensive data display against performance considerations. \ No newline at end of file +These parameters help balance comprehensive data display against performance considerations. + +Best Practices +-------------- + +1. **Global Configuration**: Use ``configure_formatter()`` at the beginning of your notebook to set up consistent formatting for all DataFrames. + +2. **Memory Management**: Set appropriate ``max_memory_bytes`` limits to prevent performance issues with large datasets. + +3. 
**Shared Styles**: Keep ``use_shared_styles=True`` (default) for better performance in notebooks with multiple DataFrames. + +4. **Reset When Needed**: Call ``reset_formatter()`` when you want to start fresh with default settings. + +5. **Cell Expansion**: Use ``enable_cell_expansion=True`` when cells might contain longer content that users may want to see in full. + +Additional Resources +-------------------- + +* :doc:`../dataframe/index` - Complete guide to using DataFrames +* :doc:`../io/index` - I/O Guide for reading data from various sources +* :doc:`../data-sources` - Comprehensive data sources guide +* :ref:`io_csv` - CSV file reading +* :ref:`io_parquet` - Parquet file reading +* :ref:`io_json` - JSON file reading +* :ref:`io_avro` - Avro file reading +* :ref:`io_custom_table_provider` - Custom table providers +* `API Reference `_ - Full API reference
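
The rendering page above says that Jupyter calls ``_repr_html_`` automatically to display a DataFrame. The following is a minimal sketch of that hook using only the standard library. ``TinyTable`` is a hypothetical stand-in invented for illustration and is not part of DataFusion; the point is only that IPython looks the method up by name and renders the HTML string it returns, falling back to ``__repr__`` in plain consoles.

```python
import html


class TinyTable:
    """Hypothetical stand-in for a DataFrame; not part of DataFusion."""

    def __init__(self, columns, rows):
        self.columns = list(columns)
        self.rows = [list(r) for r in rows]

    def __repr__(self):
        # Plain-text fallback used where rich display is unavailable.
        return f"TinyTable({len(self.rows)} rows x {len(self.columns)} cols)"

    def _repr_html_(self):
        # Jupyter/IPython finds this method by name and renders the returned HTML.
        # Values are escaped so cell content cannot inject markup.
        head = "".join(f"<th>{html.escape(str(c))}</th>" for c in self.columns)
        body = "".join(
            "<tr>" + "".join(f"<td>{html.escape(str(v))}</td>" for v in row) + "</tr>"
            for row in self.rows
        )
        return f"<table><thead><tr>{head}</tr></thead><tbody>{body}</tbody></table>"


table = TinyTable(["a", "b"], [[1, 4], [2, "<5>"]])
rendered = table._repr_html_()
print(repr(table))
print(rendered)
```

In a notebook, leaving ``table`` as the last expression in a cell would trigger the same ``_repr_html_`` call without invoking it by hand.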