Closed
Labels: Needs Tests (unit test(s) needed to prevent regressions), good first issue
Description
It seems that, somehow, the columns used by `sum` on a one-row DataFrame depend on the values in the row instead of only on the dtypes. Observe:
```python
import pandas as pd
import numpy as np

# Frame with some non-numeric dtypes
df = pd.DataFrame({'a': [1], 'b': [1.1], 'c': ['foo'], 'd': [pd.Timestamp('2000-01-01')]})
# Only change here is that `d` is `NaT`
df2 = pd.DataFrame({'a': [1], 'b': [1.1], 'c': ['foo'], 'd': [pd.NaT]})
# This is just the first one twice
df3 = pd.concat([df, df])

# I'd expect all 3 to use the same columns in the reduction
df_sum = df.sum()
df2_sum = df2.sum()
df3_sum = df3.sum()
```
Loading that in an IPython session:
```
In [1]: df_sum
Out[1]:
a                      1
b                    1.1
c                    foo
d    2000-01-01 00:00:00
dtype: object

In [2]: df2_sum
Out[2]:
a    1.0
b    1.1
dtype: float64

In [3]: df3_sum
Out[3]:
a    2.0
b    2.2
dtype: float64

In [4]: pd.__version__
Out[4]: u'0.18.1'

In [5]: np.__version__
Out[5]: '1.11.1'
```
I'd expect all three to use only the columns `['a', 'b']`, as these are the only numeric columns. Strangely, `_get_numeric_data` does return just `['a', 'b']` in all cases, so that isn't the cause.
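A minimal sketch of a dtype-based regression check, in the spirit of the "Needs Tests" label. It assumes that passing `numeric_only=True` explicitly is acceptable for the check (the argument exists in 0.18 and in current releases, while the default handling of non-numeric columns has differed between versions), and it uses the public `select_dtypes(include=[np.number])` as a stand-in for the private `_get_numeric_data`. The helper name `check_reduction_columns` is hypothetical.

```python
import numpy as np
import pandas as pd


def check_reduction_columns(frame):
    """Hypothetical helper: assert that a numeric-only sum keeps exactly
    the columns whose dtypes are numeric, regardless of the row values."""
    # Columns chosen purely by dtype (public analogue of _get_numeric_data).
    # Note: np.number excludes bool columns; none exist in these frames.
    expected = frame.select_dtypes(include=[np.number]).columns.tolist()
    # Pass numeric_only explicitly so the result does not rely on the default.
    result = frame.sum(numeric_only=True)
    assert result.index.tolist() == expected, (result.index.tolist(), expected)
    return result


# The three frames from the report.
df = pd.DataFrame({'a': [1], 'b': [1.1], 'c': ['foo'], 'd': [pd.Timestamp('2000-01-01')]})
df2 = pd.DataFrame({'a': [1], 'b': [1.1], 'c': ['foo'], 'd': [pd.NaT]})
df3 = pd.concat([df, df])

sums = [check_reduction_columns(f) for f in (df, df2, df3)]
for s in sums:
    print(s.index.tolist())  # ['a', 'b'] for each frame
```

Swapping `frame.sum(numeric_only=True)` for the bare `frame.sum()` turns this into a probe of the default behaviour on a given pandas version; on recent releases that default call may raise for the datetime column instead of silently dropping it.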