Closed
Labels
Arrow (pyarrow functionality), Deprecate (Functionality to remove in pandas), Enhancement, Needs Discussion (Requires discussion from core team before further action), Strings (String extension data type and string data)
Description
Feature Type
- Adding new functionality to pandas
- Changing existing functionality in pandas
- Removing existing functionality in pandas
Problem Description
I wish I could use pandas to create pyarrow-backed Series for strings.
I wish there was a single data type and single extension array for strings (rather than 2).
Currently, we have 2 pyarrow data types & arrays for strings:
- StringDtype("pyarrow"), backed by arrays.ArrowStringArray
- ArrowDtype(pa.string()), backed by arrays.ArrowExtensionArray
I propose we use ArrowDtype(pa.string()) and ArrowExtensionArray.
Feature Description
import pyarrow as pa
import pandas as pd

# Spelling 1: StringDtype with pyarrow storage
series_str_arry = pd.Series(['red', 'blue', None], dtype="string[pyarrow]")
# Spelling 2: ArrowDtype wrapping pa.string()
string_ext_arry = pd.ArrowDtype(pa.string())
series_ext_arry = pd.Series(['red', 'blue', None], dtype=string_ext_arry)

# Desired behavior under this proposal: one dtype and one array type for both spellings
assert series_str_arry.dtype == series_ext_arry.dtype
assert series_str_arry.dtype.construct_array_type() == series_ext_arry.dtype.construct_array_type()
Alternative Solutions
- Keep both data types and arrays
Additional Context
jbrockmendel