Closed
Labels
Compat (pandas objects compatibility with NumPy or Python functions), Docs, IO Parquet (parquet, feather)
Description
In the dev docs, the example that reads only a subset of columns with read_parquet is broken for the pyarrow engine: http://pandas-docs.github.io/pandas-docs-travis/io.html#io-parquet
In [514]: result = pd.read_parquet('example_pa.parquet', engine='pyarrow', columns=['a', 'b'])
...
IndexError: Table column index 6 is out of range
In [515]: result = pd.read_parquet('example_fp.parquet', engine='fastparquet', columns=['a', 'b'])
In [516]: result.dtypes
Out[516]:
a object
b int64
dtype: object
This is due to a bug in pyarrow in how it handles the pandas metadata when not all columns are present (I am reporting it over there). In the meantime, we should fix our docs so they do not show this broken example.