Description
Edit from @TomAugspurger: this is fixed on master, but the example below needs to be added as a unit test. The test can probably go in groupby/test_groupby.py.
With pandas 1.0.1 (full version with dependencies at the end), a Series with a NAMED PeriodIndex raises an error when grouped by index.month.
There is no error if the index is not named.
There was no error with pandas 0.25.3.
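Since the error only appears when the index is named, one possible workaround on an affected version is to group by a key that carries no name. The snippet below is a minimal sketch under that assumption: it converts the month numbers to a plain NumPy array before grouping and restores the name afterwards. The variable names are illustrative, and this has not been confirmed against 1.0.1 here.

```python
import pandas as pd

# Same data as in the report, with the index name set at construction time.
index = pd.period_range(start="2018-01", periods=24, freq="M", name="Month")
ser = pd.Series(range(24), index=index)

# A NumPy array has no `name` attribute, so groupby has nothing to look up
# in the Series (assumption: this sidesteps the failing name lookup).
months = ser.index.month.to_numpy()
result = ser.groupby(months).sum()

# Restore the label that grouping by the named index would have produced.
result.index.name = "Month"
print(result)
```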
Code Sample
import pandas as pd
index = pd.period_range(start='2018-01', periods=24, freq='M')
periodSerie = pd.Series(range(24),index=index)
periodSerie.index.name = 'Month'
periodSerie.groupby(periodSerie.index.month).sum()
Error
It seems to me that pandas tries to interpret the index name as if it were part of the index itself.
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
~/.virtualenvs/dev/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_value(self, series, key)
4410 try:
-> 4411 return libindex.get_value_at(s, key)
4412 except IndexError:
pandas/_libs/index.pyx in pandas._libs.index.get_value_at()
pandas/_libs/index.pyx in pandas._libs.index.get_value_at()
pandas/_libs/util.pxd in pandas._libs.util.get_value_at()
pandas/_libs/util.pxd in pandas._libs.util.validate_indexer()
TypeError: 'str' object cannot be interpreted as an integer
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
~/.virtualenvs/dev/lib/python3.8/site-packages/pandas/core/indexes/period.py in get_value(self, series, key)
516 try:
--> 517 value = super().get_value(s, key)
518 except (KeyError, IndexError):
~/.virtualenvs/dev/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_value(self, series, key)
4418 else:
-> 4419 raise e1
4420 except Exception:
~/.virtualenvs/dev/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_value(self, series, key)
4404 try:
-> 4405 return self._engine.get_value(s, k, tz=getattr(series.dtype, "tz", None))
4406 except KeyError as e1:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index_class_helper.pxi in pandas._libs.index.Int64Engine._check_type()
KeyError: 'Month'
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
pandas/_libs/tslibs/parsing.pyx in pandas._libs.tslibs.parsing.parse_datetime_string_with_reso()
pandas/_libs/tslibs/parsing.pyx in pandas._libs.tslibs.parsing.dateutil_parse()
ValueError: Unknown datetime string format, unable to parse: Month
During handling of the above exception, another exception occurred:
DateParseError Traceback (most recent call last)
<ipython-input-6-a3e948d22d88> in <module>
5 periodSerie = pd.Series(range(24),index=index)
6 periodSerie.index.name = 'Month'
----> 7 periodSerie.groupby(periodSerie.index.month).sum()
~/.virtualenvs/dev/lib/python3.8/site-packages/pandas/core/series.py in groupby(self, by, axis, level, as_index, sort, group_keys, squeeze, observed)
1676 axis = self._get_axis_number(axis)
1677
-> 1678 return groupby_generic.SeriesGroupBy(
1679 obj=self,
1680 keys=by,
~/.virtualenvs/dev/lib/python3.8/site-packages/pandas/core/groupby/groupby.py in __init__(self, obj, keys, axis, level, grouper, exclusions, selection, as_index, sort, group_keys, squeeze, observed, mutated)
400 from pandas.core.groupby.grouper import get_grouper
401
--> 402 grouper, exclusions, obj = get_grouper(
403 obj,
404 keys,
~/.virtualenvs/dev/lib/python3.8/site-packages/pandas/core/groupby/grouper.py in get_grouper(obj, key, axis, level, sort, observed, mutated, validate)
583 for i, (gpr, level) in enumerate(zip(keys, levels)):
584
--> 585 if is_in_obj(gpr): # df.groupby(df['name'])
586 in_axis, name = True, gpr.name
587 exclusions.append(name)
~/.virtualenvs/dev/lib/python3.8/site-packages/pandas/core/groupby/grouper.py in is_in_obj(gpr)
577 return False
578 try:
--> 579 return gpr is obj[gpr.name]
580 except (KeyError, IndexError):
581 return False
~/.virtualenvs/dev/lib/python3.8/site-packages/pandas/core/series.py in __getitem__(self, key)
869 key = com.apply_if_callable(key, self)
870 try:
--> 871 result = self.index.get_value(self, key)
872
873 if not is_scalar(result):
~/.virtualenvs/dev/lib/python3.8/site-packages/pandas/core/indexes/period.py in get_value(self, series, key)
518 except (KeyError, IndexError):
519 if isinstance(key, str):
--> 520 asdt, parsed, reso = parse_time_string(key, self.freq)
521 grp = resolution.Resolution.get_freq_group(reso)
522 freqn = resolution.get_freq_group(self.freq)
pandas/_libs/tslibs/parsing.pyx in pandas._libs.tslibs.parsing.parse_time_string()
pandas/_libs/tslibs/parsing.pyx in pandas._libs.tslibs.parsing.parse_datetime_string_with_reso()
DateParseError: Unknown datetime string format, unable to parse: Month
Expected Output
With pandas 0.25.3, the following expected output is produced:
Month
1 12
2 14
3 16
4 18
5 20
6 22
7 24
8 26
9 28
10 30
11 32
12 34
dtype: int64
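The edit note at the top asks for this example to be turned into a unit test. A regression test along those lines might look like the sketch below. The test name is hypothetical, the expected values are taken from the output above, and the index dtype check is relaxed on the assumption that the integer dtype of the month index may differ between pandas versions.

```python
import pandas as pd
import pandas.testing as tm


def test_groupby_named_period_index_month():
    # Hypothetical test name; reproduces the report's example.
    index = pd.period_range(start="2018-01", periods=24, freq="M")
    ser = pd.Series(range(24), index=index)
    ser.index.name = "Month"

    result = ser.groupby(ser.index.month).sum()

    # Months 1..12, each summing the values from 2018 and 2019.
    expected = pd.Series(
        range(12, 36, 2),
        index=pd.Index(range(1, 13), name="Month"),
    )
    # check_index_type=False: do not pin the integer dtype of the month index.
    tm.assert_series_equal(result, expected, check_index_type=False)
```

Run under pytest, the test passes only if grouping by the named index's month attribute succeeds and yields the sums listed above.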
Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
python : 3.8.1.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.18-1-MANJARO
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : fr_FR.UTF-8
LOCALE : fr_FR.UTF-8
pandas : 1.0.1
numpy : 1.18.0
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 44.0.0
Cython : 0.29.15
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.11.1
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.1.2
numexpr : None
odfpy : None
openpyxl : 3.0.2
pandas_gbq : None
pyarrow : None
pytables : None
pytest : None
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : None
xlsxwriter : None
numba : None