Description
I am trying to group by a particular level in a dataframe with multi-indexed columns. In particular, I want to use the quantile function. It looks like quantile breaks for columns but not for rows, and other functions like mean work fine.
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.rand(3, 4), columns=[["A", "A", "B", "B"], range(4)])
df.columns.names = ["first", "second"]
df
first | A | A | B | B |
---|---|---|---|---|
second | 0 | 1 | 2 | 3 |
0 | 0.942337 | 0.090621 | 0.977834 | 0.332177 |
1 | 0.301687 | 0.907762 | 0.062494 | 0.091152 |
2 | 0.554201 | 0.282348 | 0.344425 | 0.941074 |
Calling quantile on the columns breaks:
df.groupby(level="first", axis=1).quantile(0.75)
TypeError Traceback (most recent call last)
in ()
----> 1 df.groupby(level="first", axis=1).quantile(0.75)

/Users/lev/miniconda2/envs/cmappy/lib/python2.7/site-packages/pandas/core/groupby.pyc in quantile(self, q, axis, numeric_only, interpolation)

/Users/lev/miniconda2/envs/cmappy/lib/python2.7/site-packages/pandas/core/groupby.pyc in wrapper(*args, **kwargs)
610 try:
611 return self._aggregate_item_by_item(name,
--> 612 *args, **kwargs)
613 except (AttributeError):
614 raise ValueError

/Users/lev/miniconda2/envs/cmappy/lib/python2.7/site-packages/pandas/core/groupby.pyc in _aggregate_item_by_item(self, func, *args, **kwargs)
3554 # GH6337
3555 if not len(result_columns) and errors is not None:
-> 3556 raise errors
3557
3558 return DataFrame(result, columns=result_columns)

TypeError: quantile() got an unexpected keyword argument 'numeric_only'
But mean works okay:
df.groupby(level="first", axis=1).mean()
first | A | B |
---|---|---|
0 | 0.516479 | 0.655005 |
1 | 0.604725 | 0.076823 |
2 | 0.418274 | 0.642749 |
Transposing fixes the problem:
df.T.groupby(level="first", axis=0).quantile(0.75).T
first | A | B |
---|---|---|
0.75 | ||
0 | 0.729408 | 0.816419 |
1 | 0.756243 | 0.083987 |
2 | 0.486237 | 0.791911 |
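Until the column path works, the transpose workaround can be wrapped in a small helper so the double transpose lives in one place. This is only a sketch: the name quantile_by_column_level is hypothetical (not a pandas API), the seed is fixed purely to make the example repeatable, and the MultiIndex is built explicitly so the snippet does not depend on how a list of arrays is interpreted.

```python
import numpy as np
import pandas as pd


def quantile_by_column_level(frame, level, q):
    """Group columns by `level` and take the q-th quantile within each group.

    Hypothetical helper wrapping the transpose workaround; not part of pandas.
    """
    # Transpose so the column MultiIndex becomes the row index, group on the
    # rows (the code path that works), then transpose back.
    return frame.T.groupby(level=level).quantile(q).T


# Same shape as the frame in this report: 3 rows, 4 columns in groups A and B.
columns = pd.MultiIndex.from_arrays(
    [["A", "A", "B", "B"], [0, 1, 2, 3]], names=["first", "second"]
)
rng = np.random.RandomState(0)  # assumption: fixed seed for repeatability
df = pd.DataFrame(rng.rand(3, 4), columns=columns)

result = quantile_by_column_level(df, "first", 0.75)
print(result)
```

The result has one column per value of the chosen level and keeps the original row index, matching the shape of the transposed output above.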
Expected Output
The expected output is the last result above, but it would be nice to get it without having to transpose and then re-transpose.
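For comparison, the same numbers can be produced without any transpose by taking a row-wise quantile over each group of columns. This is a rough sketch, not a fix for groupby itself: quantile_per_column_group is a hypothetical name, the level is assumed to be given by name, and the loop over groups will be slower than a real grouped implementation on wide frames.

```python
import numpy as np
import pandas as pd


def quantile_per_column_group(frame, level, q):
    """Row-wise q-th quantile within each group of columns sharing a level value.

    Hypothetical helper; assumes `level` is the name of a column level.
    """
    # Distinct values of the chosen column level, in order of appearance.
    keys = frame.columns.get_level_values(level).unique()
    # For each group, select its columns and take the quantile across them.
    pieces = {
        key: frame.xs(key, axis=1, level=level).quantile(q, axis=1)
        for key in keys
    }
    out = pd.DataFrame(pieces)
    out.columns.name = level
    return out


columns = pd.MultiIndex.from_arrays(
    [["A", "A", "B", "B"], [0, 1, 2, 3]], names=["first", "second"]
)
rng = np.random.RandomState(0)  # assumption: fixed seed for repeatability
df = pd.DataFrame(rng.rand(3, 4), columns=columns)

no_transpose = quantile_per_column_group(df, "first", 0.75)
print(no_transpose)
```

On this frame it agrees with df.T.groupby(level="first").quantile(0.75).T, which gives a quick way to check either approach against the other.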
Thank you very much for all your good work!
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 2.7.13.final.0
python-bits: 64
OS: Darwin
OS-release: 16.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None
pandas: 0.20.3
pytest: None
pip: 9.0.1
setuptools: 33.1.1.post20170320
Cython: None
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 5.4.1
sphinx: None
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.5
s3fs: None
pandas_gbq: None
pandas_datareader: None
None