
Indexing a MultiIndex with a (Multi)Index #15472

@toobaz

Description


Code Sample, a copy-pastable example if possible

In [2]: s = pd.Series(range(8), index=pd.MultiIndex.from_product([[1,2], [3,4], [3,4]],
                                                                 names=['a', 'b', 'c']))

In [3]: s.loc[s.index] # Works as expected
Out[3]: 
a  b  c
1  3  3    0
      4    1
   4  3    2
      4    3
2  3  3    4
      4    5
   4  3    6
      4    7
dtype: int64

In [4]: s.loc[s.iloc[2:-1].index] # Works as expected
Out[4]: 
a  b  c
1  4  3    2
      4    3
2  3  3    4
      4    5
   4  3    6
dtype: int64

In [5]: s.loc[s.index.droplevel('c')] # Just reindexes... weird
Out[5]: 
1  3   NaN
   3   NaN
   4   NaN
   4   NaN
2  3   NaN
   3   NaN
   4   NaN
   4   NaN
dtype: float64

In [6]: s.loc[s.index.droplevel(['b', 'c']), :] # Works (flat index)
Out[6]: 
a  b  c
1  3  3    0
      4    1
   4  3    2
      4    3
2  3  3    4
      4    5
   4  3    6
      4    7
dtype: int64

In [7]: s.loc[s.index.droplevel(['b', 'c'])] #... but fails if I use the shortened notation!
[...]
TypeError: unhashable type: 'Int64Index'

In [8]: s.loc[s.swaplevel('b', 'c')] # Works
Out[8]: 
a  b  c
1  3  3    0
      4    1
   4  3    2
      4    3
2  3  3    4
      4    5
   4  3    6
      4    7
dtype: int64

In [9]: s.loc[s.index.swaplevel('b', 'c')]  # Different result! (reindexes)
Out[9]: 
a  c  b
1  3  3    0
   4  3    2
   3  4    1
   4  4    3
2  3  3    4
   4  3    6
   3  4    5
   4  4    7
dtype: int64

In [10]: s.loc[pd.MultiIndex.from_product([[1,2], [3], [4]],
                                          names=['a', 'c', 'b'])] # Does not respect level names!
Out[10]: 
a  c  b
1  3  4    1
2  3  4    5
dtype: int64
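Until this is unified, the partial-level selections attempted in In [5] and In [7] can be written unambiguously as an explicit boolean mask built only from the levels the indexer covers. This is a sketch that assumes a recent pandas release, not the 0.19 development build above. The keys in ab_keys and a_keys are arbitrary examples:

```python
import pandas as pd

# Same Series as in the report: 3-level MultiIndex with names a, b, c.
s = pd.Series(range(8),
              index=pd.MultiIndex.from_product([[1, 2], [3, 4], [3, 4]],
                                               names=['a', 'b', 'c']))

# Partial-level indexer covering only levels a and b (cf. In [5]):
# compare against the (a, b) part of each row, keeping every value of c.
ab_keys = [(1, 4), (2, 3)]
mask_ab = s.index.droplevel('c').isin(ab_keys)
print(s[mask_ab])

# Flat indexer on level a only (cf. In [6] and In [7]).
a_keys = [2]
mask_a = s.index.get_level_values('a').isin(a_keys)
print(s[mask_a])
```

A mask never reindexes. Keys that match nothing select no rows, so no NaN rows appear as in Out[5].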

Problem description

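Depending on the indexer, .loc either selects the expected rows, silently reindexes (In [5], In [9]), raises (In [7]) or ignores level names (In [10]). This clearly needs a unified approach (and I can try to provide one).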

Expected Output

I guess most expected outputs above are obvious, except for In [10] (and maybe In [5], which however is already discussed elsewhere). That is, it is not obvious whether level names in the indexer should be matched to level names in the indexed object when both are set (see this comment). It would probably be more pandas-ish if they were.

In other words, while there is no doubt that

Out[10]: 
a  c  b
1  3  4    1
2  3  4    5
dtype: int64

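is wrong, we must decide whether we want positional matching, where the indexer's second and third values are looked up in levels b and c regardless of its level names: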

Out[10]: 
a  b  c
1  3  4    1
2  3  4    5
dtype: int64

or name-based matching, where the indexer's c=3 and b=4 are looked up in the levels with those names:

Out[10]: 
a  b  c
1  4  3    2
2  4  3    6
dtype: int64
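Both candidates can already be spelled out explicitly with existing MultiIndex methods, which may help to compare them. In this sketch, which assumes a recent pandas release, set_names reproduces the positional reading (first option) and reorder_levels reproduces the name-based one (second option):

```python
import pandas as pd

s = pd.Series(range(8),
              index=pd.MultiIndex.from_product([[1, 2], [3, 4], [3, 4]],
                                               names=['a', 'b', 'c']))

# The indexer from In [10]: its levels are named a, c, b (not a, b, c).
indexer = pd.MultiIndex.from_product([[1, 2], [3], [4]], names=['a', 'c', 'b'])

# Option 1, positional matching: overwrite the indexer's names with the
# target's, so 3 is looked up in level b and 4 in level c.
positional = s.loc[indexer.set_names(s.index.names)]
print(positional)  # values 1 and 5

# Option 2, name-based matching: reorder the indexer's levels to the
# target's order, so c=3 and b=4 are looked up in the levels so named.
by_name = s.loc[indexer.reorder_levels(s.index.names)]
print(by_name)  # values 2 and 6
```

If name-based matching were adopted, .loc would in effect apply the reorder_levels step itself whenever both sides have fully named levels.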

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.7.0-1-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: it_IT.utf8
LOCALE: it_IT.UTF-8

pandas: 0.19.0+478.g12f2c6a
pytest: 3.0.6
pip: 8.1.2
setuptools: 28.0.0
Cython: 0.23.4
numpy: 1.12.0
scipy: 0.18.1
xarray: None
IPython: 5.1.0.dev
sphinx: 1.4.8
patsy: 0.3.0-dev
dateutil: 2.5.3
pytz: 2015.7
blosc: None
bottleneck: 1.2.0
tables: 3.2.2
numexpr: 2.6.0
feather: None
matplotlib: 2.0.0rc2
openpyxl: 2.3.0
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.3
lxml: 3.6.4
bs4: 4.5.1
html5lib: 0.999
httplib2: 0.9.1
apiclient: 1.5.2
sqlalchemy: 1.0.15
pymysql: None
psycopg2: None
jinja2: 2.8
s3fs: None
pandas_datareader: 0.2.1

Metadata

Assignees

No one assigned

Labels

Bug, Indexing, MultiIndex
