Open
Labels
Bug; Dtype Conversions (unexpected or buggy dtype conversions); ExtensionArray (extending pandas with custom dtypes or arrays); NA - MaskedArrays (related to pd.NA and nullable extension arrays)
Description
Follow-up on #22762, which added _reduce to the ExtensionArray interface and added an implementation for IntegerArray.
Currently, for IntegerArray we special-case 'sum', 'min', 'max' and 'prod' to make sure they return ints (because the underlying computation can run on a float array, depending on whether there are missing values).
pandas/pandas/core/arrays/integer.py
Lines 549 to 554 in 08e2752
    # if we have a preservable numeric op,
    # provide coercion back to an integer type if possible
    elif name in ['sum', 'min', 'max', 'prod'] and notna(result):
        int_result = int(result)
        if int_result == result:
            result = int_result
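To make the effect of that check easier to see in isolation, here is a minimal standalone sketch. The function name `coerce_reduction_result` and the set `PRESERVABLE_OPS` are hypothetical, and `np.isnan` stands in for pandas' `notna`; this is not the pandas implementation, only an illustration of the same logic.

```python
import numpy as np

# Hypothetical name: the reductions the quoted snippet coerces back to int.
PRESERVABLE_OPS = {"sum", "min", "max", "prod"}


def coerce_reduction_result(name, result):
    """Hypothetical standalone version of the check quoted above."""
    # NaN stands in for a missing result; it is left untouched.
    if name in PRESERVABLE_OPS and not np.isnan(result):
        int_result = int(result)  # int() always yields a built-in Python int
        if int_result == result:  # only coerce when no information is lost
            result = int_result
    return result


total = coerce_reduction_result("sum", np.float64(4.0))
mean = coerce_reduction_result("mean", np.float64(2.0))

print(type(total))  # built-in int, because of the int() call
print(type(mean))   # unchanged NumPy float scalar
```

Because the coercion goes through `int()`, the coerced reductions come back as built-in Python integers, while reductions that skip the branch keep whatever scalar type the float computation produced.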
That basic check has the following "problem":
- Some reductions will return Python integers, others will return NumPy scalars. Accessing single elements also gives NumPy scalars.
In [3]: pd.Series(pd.core.arrays.integer_array([1, None, 3])).mean()
Out[3]: 2.0
In [4]: type(_)
Out[4]: numpy.float64
In [5]: pd.Series(pd.core.arrays.integer_array([1, None, 3])).sum()
Out[5]: 4
In [6]: type(_)
Out[6]: int
In [7]: type(pd.Series(pd.core.arrays.integer_array([1, None, 3]))[0])
Out[7]: numpy.int64
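One conceivable way to make the scalar types line up is to coerce through the NumPy scalar type of the array's dtype rather than through `int()`. The sketch below only illustrates that idea under stated assumptions: the helper name `coerce_to_numpy_scalar` and its `dtype` parameter are hypothetical, and nothing here is presented as the approach pandas takes.

```python
import numpy as np


def coerce_to_numpy_scalar(result, dtype=np.dtype("int64")):
    """Hypothetical helper: return a NumPy scalar of `dtype` when lossless."""
    # NaN stands in for a missing result; it is left untouched.
    if np.isnan(result):
        return result
    candidate = dtype.type(result)  # e.g. np.int64(4.0) gives np.int64(4)
    # Keep the original value if the cast would drop a fractional part.
    return candidate if candidate == result else result


total = coerce_to_numpy_scalar(np.float64(4.0))
fractional = coerce_to_numpy_scalar(np.float64(2.5))

print(type(total))       # NumPy integer scalar, matching element access
print(type(fractional))  # still a NumPy float scalar
```

With this variant a coerced reduction has the same scalar type as a single element taken from the array, at the cost of no longer returning built-in Python integers.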