BUG: Groupby transform with missing groups #8955

@miketkelly

In a groupby/transform where some of the group keys are missing (NaN), should the transformed values for those rows be set to missing (my preference), left unchanged, or should this raise an error? Currently the behavior is inconsistent between Series and DataFrames, and between cythonized and non-cythonized transformations.
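To make the first option concrete, here is a minimal sketch of what "set to missing" could look like for a Series. The helper name `transform_with_missing_keys` is hypothetical and not part of pandas; it sidesteps the NaN group entirely by transforming only the rows that have a key and then reindexing back to the original index.

```python
import numpy as np
import pandas as pd


def transform_with_missing_keys(values, keys, func):
    """Hypothetical helper: per-group transform, NaN where the key is missing."""
    keys = pd.Series(keys, index=values.index)
    present = keys.notna()
    # Transform only the rows whose key is present, so no NaN group is involved.
    result = values[present].groupby(keys[present]).transform(func)
    # Reindexing restores the original length; rows that were dropped become NaN.
    return result.reindex(values.index)


s = pd.Series([100, 200, 300, 400])
out = transform_with_missing_keys(s, [1, 1, np.nan, np.nan], "mean")
print(out)
```

With this approach the first two rows carry the group mean and the last two rows are NaN, regardless of which code path the transformation function would otherwise take.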

For a Series with a non-cythonized transformation, the values are left unchanged:

>>> import pandas as pd
>>> import numpy as np

>>> s = pd.Series([100, 200, 300, 400])
>>> s.groupby([1, 1, np.nan, np.nan]).transform(pd.Series.mean)
0    200
1    200
2    300
3    400

For a Series with a cythonized function, it's an error (this changed between 0.14.1 and 0.15.0):

>>> s.groupby([1, 1, np.nan, np.nan]).transform(np.mean)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pandas/pandas/core/groupby.py", line 2425, in transform
    return self._transform_fast(cyfunc)
  File "pandas/pandas/core/groupby.py", line 2466, in _transform_fast
    return self._set_result_index_ordered(Series(values))
  File "pandas/pandas/core/groupby.py", line 494, in _set_result_index_ordered
    result.index = self.obj.index
  File "pandas/pandas/core/generic.py", line 1948, in __setattr__
    object.__setattr__(self, name, value)
  File "pandas/src/properties.pyx", line 65, in pandas.lib.AxisProperty.__set__ (pandas/lib.c:41020)
  File "pandas/pandas/core/series.py", line 262, in _set_axis
    self._data.set_axis(axis, labels)
  File "pandas/pandas/core/internals.py", line 2217, in set_axis
    'new values have %d elements' % (old_len, new_len))
ValueError: Length mismatch: Expected axis has 2 elements, new values have 4 elements

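For DataFrames, the results are the opposite: the cythonized function leaves the values unchanged, and the non-cythonized one raises the error: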

>>> f = pd.DataFrame({'a': s, 'b': s * 2})
>>> f
     a    b
0  100  200
1  200  400
2  300  600
3  400  800
>>> f.groupby([1, 1, np.nan, np.nan]).transform(np.sum)
     a    b
0  300  600
1  300  600
2  300  600
3  400  800
>>> f.groupby([1, 1, np.nan, np.nan]).transform(pd.DataFrame.sum)
Traceback (most recent call last):
  File "pandas/pandas/core/groupby.py", line 3002, in transform
    return self._transform_general(func, *args, **kwargs)
  File "pandas/pandas/core/groupby.py", line 2968, in _transform_general
    return self._set_result_index_ordered(concatenated)
  File "pandas/pandas/core/groupby.py", line 494, in _set_result_index_ordered
    result.index = self.obj.index
  File "pandas/pandas/core/generic.py", line 1948, in __setattr__
    object.__setattr__(self, name, value)
  File "pandas/src/properties.pyx", line 65, in pandas.lib.AxisProperty.__set__ (pandas/lib.c:41020)
  File "pandas/pandas/core/generic.py", line 406, in _set_axis
    self._data.set_axis(axis, labels)
  File "pandas/pandas/core/internals.py", line 2217, in set_axis
    'new values have %d elements' % (old_len, new_len))
ValueError: Length mismatch: Expected axis has 2 elements, new values have 4 elements
>>> print(pd.__version__)
0.15.1-125-ge463818
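For comparison, the DataFrame case can be given the same "set to missing" result without going through transform at all. This is only a sketch under the assumption that aggregation drops NaN keys (the default); the variable names are illustrative. It aggregates per group and then broadcasts the aggregate back to the rows by key.

```python
import numpy as np
import pandas as pd

s = pd.Series([100, 200, 300, 400])
f = pd.DataFrame({'a': s, 'b': s * 2})
keys = np.array([1, 1, np.nan, np.nan])

# Aggregate once per group; rows with a NaN key are excluded from the result.
sums = f.groupby(keys).sum()

# Broadcast the per-group sums back to every row by looking up its key.
# A NaN key is not in the aggregated index, so those rows come back as NaN.
out = sums.reindex(keys)
out.index = f.index
print(out)
```

Rows 0 and 1 receive the group sums (300 and 600), while rows 2 and 3 are NaN in both columns, which matches the behavior proposed for the Series case.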

Labels: Apply (Apply, Aggregate, Transform, Map), Bug, Groupby
