Series with NAMED period index raise error on groupby index.month (pandas 1.0 specific) #32108

@daxid

edit from @TomAugspurger: this is fixed on master, but the example below needs to be added as a unit test. The test can probably go in groupby/test_groupby.py.

Description

With pandas 1.0.1 (full version details, including dependencies, are at the end), a Series with a NAMED period index raises an error on groupby by index.month.

There is no error if the index is not named.

There was no error with pandas 0.25.3.

Code Sample

import pandas as pd

index = pd.period_range(start='2018-01', periods=24, freq='M')
periodSerie = pd.Series(range(24),index=index)
periodSerie.index.name = 'Month'
periodSerie.groupby(periodSerie.index.month).sum()

Error

It seems to me that pandas tries to interpret the index name as if it were part of the index itself.

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~/.virtualenvs/dev/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_value(self, series, key)
   4410             try:
-> 4411                 return libindex.get_value_at(s, key)
   4412             except IndexError:

pandas/_libs/index.pyx in pandas._libs.index.get_value_at()

pandas/_libs/index.pyx in pandas._libs.index.get_value_at()

pandas/_libs/util.pxd in pandas._libs.util.get_value_at()

pandas/_libs/util.pxd in pandas._libs.util.validate_indexer()

TypeError: 'str' object cannot be interpreted as an integer

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
~/.virtualenvs/dev/lib/python3.8/site-packages/pandas/core/indexes/period.py in get_value(self, series, key)
    516         try:
--> 517             value = super().get_value(s, key)
    518         except (KeyError, IndexError):

~/.virtualenvs/dev/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_value(self, series, key)
   4418                 else:
-> 4419                     raise e1
   4420             except Exception:

~/.virtualenvs/dev/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_value(self, series, key)
   4404         try:
-> 4405             return self._engine.get_value(s, k, tz=getattr(series.dtype, "tz", None))
   4406         except KeyError as e1:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index_class_helper.pxi in pandas._libs.index.Int64Engine._check_type()

KeyError: 'Month'

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
pandas/_libs/tslibs/parsing.pyx in pandas._libs.tslibs.parsing.parse_datetime_string_with_reso()

pandas/_libs/tslibs/parsing.pyx in pandas._libs.tslibs.parsing.dateutil_parse()

ValueError: Unknown datetime string format, unable to parse: Month

During handling of the above exception, another exception occurred:

DateParseError                            Traceback (most recent call last)
<ipython-input-6-a3e948d22d88> in <module>
      5 periodSerie = pd.Series(range(24),index=index)
      6 periodSerie.index.name = 'Month'
----> 7 periodSerie.groupby(periodSerie.index.month).sum()

~/.virtualenvs/dev/lib/python3.8/site-packages/pandas/core/series.py in groupby(self, by, axis, level, as_index, sort, group_keys, squeeze, observed)
   1676         axis = self._get_axis_number(axis)
   1677 
-> 1678         return groupby_generic.SeriesGroupBy(
   1679             obj=self,
   1680             keys=by,

~/.virtualenvs/dev/lib/python3.8/site-packages/pandas/core/groupby/groupby.py in __init__(self, obj, keys, axis, level, grouper, exclusions, selection, as_index, sort, group_keys, squeeze, observed, mutated)
    400             from pandas.core.groupby.grouper import get_grouper
    401 
--> 402             grouper, exclusions, obj = get_grouper(
    403                 obj,
    404                 keys,

~/.virtualenvs/dev/lib/python3.8/site-packages/pandas/core/groupby/grouper.py in get_grouper(obj, key, axis, level, sort, observed, mutated, validate)
    583     for i, (gpr, level) in enumerate(zip(keys, levels)):
    584 
--> 585         if is_in_obj(gpr):  # df.groupby(df['name'])
    586             in_axis, name = True, gpr.name
    587             exclusions.append(name)

~/.virtualenvs/dev/lib/python3.8/site-packages/pandas/core/groupby/grouper.py in is_in_obj(gpr)
    577             return False
    578         try:
--> 579             return gpr is obj[gpr.name]
    580         except (KeyError, IndexError):
    581             return False

~/.virtualenvs/dev/lib/python3.8/site-packages/pandas/core/series.py in __getitem__(self, key)
    869         key = com.apply_if_callable(key, self)
    870         try:
--> 871             result = self.index.get_value(self, key)
    872 
    873             if not is_scalar(result):

~/.virtualenvs/dev/lib/python3.8/site-packages/pandas/core/indexes/period.py in get_value(self, series, key)
    518         except (KeyError, IndexError):
    519             if isinstance(key, str):
--> 520                 asdt, parsed, reso = parse_time_string(key, self.freq)
    521                 grp = resolution.Resolution.get_freq_group(reso)
    522                 freqn = resolution.get_freq_group(self.freq)

pandas/_libs/tslibs/parsing.pyx in pandas._libs.tslibs.parsing.parse_time_string()

pandas/_libs/tslibs/parsing.pyx in pandas._libs.tslibs.parsing.parse_datetime_string_with_reso()

DateParseError: Unknown datetime string format, unable to parse: Month
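The traceback above suggests the failure comes from `is_in_obj` evaluating `obj[gpr.name]`, i.e. looking up the key's name ('Month') as a label in the Series. A possible workaround on affected versions, offered only as a sketch, is to group by a plain NumPy array, which carries no name, and restore the axis name afterwards. This has not been verified against pandas 1.0.1 itself; the variable names `ser`, `month_numbers`, and `monthly_totals` are illustrative.

```python
import pandas as pd

index = pd.period_range(start="2018-01", periods=24, freq="M", name="Month")
ser = pd.Series(range(24), index=index)

# Assumption: a bare ndarray has no `.name`, so groupby has no label
# to look up in the Series and the failing lookup should be skipped.
month_numbers = ser.index.month.to_numpy()

# Group by the unnamed array, then put the "Month" label back on the result index.
monthly_totals = ser.groupby(month_numbers).sum().rename_axis("Month")
print(monthly_totals)
```

The result should match the expected output from pandas 0.25.3, at the cost of an extra `rename_axis` call.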

Expected Output

With pandas 0.25.3, the following expected output is produced:

Month
1     12
2     14
3     16
4     18
5     20
6     22
7     24
8     26
9     28
10    30
11    32
12    34
dtype: int64
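Since the note at the top asks for this example to become a unit test, here is one way such a regression test might look. It is a sketch, not the test that was actually merged: the function name is hypothetical, and it uses the public `pd.testing.assert_series_equal` rather than the helpers used inside the pandas test suite.

```python
import pandas as pd


def test_groupby_named_period_index_month():
    # Hypothetical test name; reproduces the code sample from this report.
    index = pd.period_range(start="2018-01", periods=24, freq="M", name="Month")
    ser = pd.Series(range(24), index=index)

    # Grouping by the month numbers of a NAMED PeriodIndex must not raise.
    result = ser.groupby(ser.index.month).sum()

    # Each month number appears twice (2018 and 2019): 0 + 12, 1 + 13, ...
    expected = pd.Series(
        list(range(12, 36, 2)),
        index=pd.Index(list(range(1, 13)), name="Month"),
    )
    pd.testing.assert_series_equal(result, expected)
```

The expected values mirror the 0.25.3 output shown above, including the 'Month' name on the result index.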

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.8.1.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.18-1-MANJARO
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : fr_FR.UTF-8
LOCALE : fr_FR.UTF-8

pandas : 1.0.1
numpy : 1.18.0
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 44.0.0
Cython : 0.29.15
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.11.1
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.1.2
numexpr : None
odfpy : None
openpyxl : 3.0.2
pandas_gbq : None
pyarrow : None
pytables : None
pytest : None
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : None
xlsxwriter : None
numba : None
