-
-
Notifications
You must be signed in to change notification settings - Fork 18.7k
ENH: Add misc pyarrow types to ArrowDtype.type #51854
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
pandas/core/arrays/arrow/dtype.py
Outdated
elif pa.types.is_dictionary(pa_type): | ||
from pandas.core.arrays import Categorical | ||
|
||
return Categorical |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jbrockmendel you had mentioned type
should be the type of the associated scalar.
For our the CategoricalDtype.type
this CategoricalDtypeType
, but I guess we don't have a categorical scalar per se?
For ArrowDtype.type
should we use CategoricalDtypeType
here or the type of the categorical values instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
id be OK either way. CategoricalDtypeType
seems like the safe option internal-consistency wise, but seems pretty useless for a user
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, would it be okay to return a Categorical here? The "scalar" could be a length 1 Categorical
in practice but I'm not sure if that breaks the mental model of anything else
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That seems really weird to me. Is there some test(s) that are easier to fix with that than with either of the other options?
We should probably deprecate/change CategoricalDtype.type at some point and ensure these match.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if this makes other tests easier as of yet since we don't test pa.dictionary
fully yet.
Will just use CategoricalDtypeType
here for now.
# TODO: Potentially change this & CategoricalDtype.type to | ||
# something more representative of the scalar | ||
return CategoricalDtypeType | ||
elif pa.types.is_list(pa_type) or pa.types.is_large_list(pa_type): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
side note: think with this supported we can get rid of JSONArray?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I think we can get rid of the date and decimal arrays now since they have corresponding arrow types tested and json and list arrays once we add tests for those arrow types
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…rowDtype.type) (#51887) Backport PR #51854: ENH: Add misc pyarrow types to ArrowDtype.type Co-authored-by: Matthew Roeschke <[email protected]>
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.