Closed
Labels: Arrow (pyarrow functionality)
Description
As far as I can see, there is no easy way to convert a PyArrow table into a DataFrame whose columns use pyarrow types.
I'd expect at least one of these idioms to work:
import numpy
import pyarrow
import pandas
arrow_u8 = pyarrow.array([1, 2, 3], type=pyarrow.uint8())
arrow_f64 = pyarrow.array([1., 2., 3.], type=pyarrow.float64())
table = pyarrow.table([arrow_u8, arrow_f64], names=['u8', 'f64'])
# Using the PyArrow `to_pandas` method will use NumPy backed data
df = table.to_pandas()
# Using the constructor with a PyArrow table raises: ValueError: DataFrame constructor not properly called!
df = pandas.DataFrame(table)
# This is not implemented (the method doesn't exist)
df = pandas.DataFrame.from_arrow(table)
# Creating a dataframe column by column naively from the arrow array will use NumPy dtypes
df = pandas.DataFrame({'u8': arrow_u8,
                       'f64': arrow_f64})
I think the easiest way to do the conversion right now is something like this:
df = pandas.DataFrame({name: pandas.Series(array,
                                           dtype=pandas.ArrowDtype(array.type))
                       for array, name
                       in zip(table.columns, table.column_names)})
@pandas-dev/pandas-core Given that Arrow dtypes are one of the highlights of pandas 2.0, shouldn't we provide at least one easy way to do this conversion before the release?