DataFrameGroupBy.quantile breaks on axis=1 using a multi-indexed dataframe #17961

Closed
@levlitichev

Description

I am trying to group by a particular level of a DataFrame with multi-indexed columns and then call quantile on the groups. quantile raises a TypeError when grouping along the columns (axis=1) but works when grouping along the rows, and other functions such as mean work fine.

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(3, 4), columns=[["A", "A", "B", "B"], range(4)])
df.columns.names = ["first", "second"]
df
first          A                   B
second         0         1         2         3
0       0.942337  0.090621  0.977834  0.332177
1       0.301687  0.907762  0.062494  0.091152
2       0.554201  0.282348  0.344425  0.941074

Calling quantile on the columns breaks:

df.groupby(level="first", axis=1).quantile(0.75)

TypeError Traceback (most recent call last)
in ()
----> 1 df.groupby(level="first", axis=1).quantile(0.75)

/Users/lev/miniconda2/envs/cmappy/lib/python2.7/site-packages/pandas/core/groupby.pyc in quantile(self, q, axis, numeric_only, interpolation)

/Users/lev/miniconda2/envs/cmappy/lib/python2.7/site-packages/pandas/core/groupby.pyc in wrapper(*args, **kwargs)
610 try:
611 return self._aggregate_item_by_item(name,
--> 612 *args, **kwargs)
613 except (AttributeError):
614 raise ValueError

/Users/lev/miniconda2/envs/cmappy/lib/python2.7/site-packages/pandas/core/groupby.pyc in _aggregate_item_by_item(self, func, *args, **kwargs)
3554 # GH6337
3555 if not len(result_columns) and errors is not None:
-> 3556 raise errors
3557
3558 return DataFrame(result, columns=result_columns)

TypeError: quantile() got an unexpected keyword argument 'numeric_only'

But mean works okay:

df.groupby(level="first", axis=1).mean()
first A B
0 0.516479 0.655005
1 0.604725 0.076823
2 0.418274 0.642749

Transposing fixes the problem:

df.T.groupby(level="first", axis=0).quantile(0.75).T
first A B
0.75
0 0.729408 0.816419
1 0.756243 0.083987
2 0.486237 0.791911
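As a sanity check on these numbers: each group here has only two columns, so with linear interpolation (the default for quantile) the 0.75 quantile of a row is the smaller value plus 0.75 times the gap to the larger one. The sketch below reproduces that arithmetic in plain Python for row 0 of group A, using the rounded values printed above. The function name `linear_quantile` is made up for illustration and is not a pandas API.

```python
def linear_quantile(values, q):
    """Quantile with linear interpolation between neighbouring sorted values."""
    ordered = sorted(values)
    # Fractional position of the quantile within the sorted values.
    pos = q * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])


# Row 0, group "A", copied from the printed frame (rounded to 6 decimals).
row0_group_a = [0.942337, 0.090621]
print(linear_quantile(row0_group_a, 0.75))
```

The printed value should agree with the 0.729408 in the transposed result, up to the rounding of the displayed inputs.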

Expected Output

The expected output is the last result above, but it'd be nice to get it without having to transpose and then transpose back.
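In the meantime, the transpose round trip can be wrapped in a small helper. This is only a sketch of the workaround shown above, not a fix in pandas itself: the name `quantile_by_column_level` is hypothetical, and the frame is rebuilt from the rounded values printed earlier so that the result can be compared against the output above.

```python
import pandas as pd


def quantile_by_column_level(df, level, q=0.5):
    """Hypothetical helper: per-row quantile within each column group."""
    # Group along the rows of the transposed frame (the path that works),
    # then transpose back so the original rows are rows again.
    return df.T.groupby(level=level).quantile(q).T


# Same layout as the example frame, with the rounded values printed above.
columns = pd.MultiIndex.from_arrays(
    [["A", "A", "B", "B"], [0, 1, 2, 3]], names=["first", "second"]
)
df = pd.DataFrame(
    [
        [0.942337, 0.090621, 0.977834, 0.332177],
        [0.301687, 0.907762, 0.062494, 0.091152],
        [0.554201, 0.282348, 0.344425, 0.941074],
    ],
    columns=columns,
)

result = quantile_by_column_level(df, "first", 0.75)
print(result)
```

One caveat with this approach: transposing a frame whose columns have mixed dtypes upcasts everything to object, so the helper is only a reasonable stand-in for all-numeric frames like this one.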

Thank you very much for all your good work!

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.13.final.0
python-bits: 64
OS: Darwin
OS-release: 16.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.20.3
pytest: None
pip: 9.0.1
setuptools: 33.1.1.post20170320
Cython: None
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 5.4.1
sphinx: None
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.5
s3fs: None
pandas_gbq: None
pandas_datareader: None
None
