Open
Labels
API Design, Bug, Index (related to the Index class or subclasses), Needs Discussion (requires discussion from core team before further action)
Description
Code Sample, a copy-pastable example
import pandas as pd
pd.Index(["a", "a", "b", "c"]).get_indexer(pd.Index(["b", "c"]))
# InvalidIndexError: Reindexing only valid with uniquely valued Index objects
Came across this here: #38745 (comment)
Problem description
It makes sense to me why this would error:
pd.Index(["a", "a", "b", "c"]).get_indexer(pd.Index(["a", "c"]))
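There isn't a unique solution: "a" appears twice in the index, so it has no single position. In the case presented at the top, each requested label ("b" and "c") appears exactly once in the index, so there is a unique solution.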
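To make the distinction concrete, here is a minimal pure-Python sketch of the semantics being asked for. It is not how pandas implements anything; the function name `get_indexer_sketch` and the choice of `ValueError` are assumptions made for illustration. The only pandas convention it borrows is returning -1 for a label that is not found.

```python
from collections import defaultdict


def get_indexer_sketch(labels, targets):
    """Return the position of each target in labels (hypothetical helper).

    Raises only when a *requested* label is duplicated in labels;
    duplicates among labels that are never requested are ignored.
    """
    # Map every label to the list of positions where it occurs.
    positions = defaultdict(list)
    for pos, label in enumerate(labels):
        positions[label].append(pos)

    result = []
    for target in targets:
        hits = positions.get(target, [])
        if len(hits) > 1:
            # Ambiguous: the requested label has more than one position.
            raise ValueError(f"label {target!r} is not unique: positions {hits}")
        # -1 marks a missing label, mirroring the get_indexer convention.
        result.append(hits[0] if hits else -1)
    return result


print(get_indexer_sketch(["a", "a", "b", "c"], ["b", "c"]))
```

With this rule, asking for `["b", "c"]` succeeds even though "a" is duplicated, while asking for `["a", "c"]` still raises.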
Expected Output
pd.Index(["a", "a", "b", "c"]).get_indexer(pd.Index(["b", "c"]))
# array([2, 3])
Note, this is the behaviour of pd.Index._get_indexer on master.
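Until the behaviour is settled, something close to the expected output can probably be had from the public API alone. The following is a sketch of a possible workaround, not a proposed patch: `get_indexer_if_unambiguous` is a hypothetical name, and raising `ValueError` is an assumption. It relies on `Index.get_indexer_non_unique`, which returns one entry per match (or a single -1 for a missing label), so the result is longer than the target exactly when some requested label is duplicated in the index.

```python
import pandas as pd


def get_indexer_if_unambiguous(index, target):
    """Hypothetical wrapper: positions of target labels in a possibly non-unique index."""
    target = pd.Index(target)
    # indexer has one entry per match; labels that are not found contribute one -1.
    indexer, _missing = index.get_indexer_non_unique(target)
    if len(indexer) != len(target):
        # At least one requested label matched more than one position.
        raise ValueError("at least one requested label is duplicated in the index")
    return indexer


idx = pd.Index(["a", "a", "b", "c"])
print(get_indexer_if_unambiguous(idx, ["b", "c"]))
```

Calling it with `["a", "c"]` raises, which matches the case above where there is no unique solution.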
Output of pd.show_versions()
master
INSTALLED VERSIONS
------------------
commit : 7f912a4009da963b5eacdcebb638c38eec06e7a7
python : 3.8.5.final.0
python-bits : 64
OS : Darwin
OS-release : 19.6.0
Version : Darwin Kernel Version 19.6.0: Tue Nov 10 00:10:30 PST 2020; root:xnu-6153.141.10~1/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.3.0.dev0+228.g7f912a4009
numpy : 1.18.5
pytz : 2020.1
dateutil : 2.8.1
pip : 20.1.1
setuptools : 49.2.0.post20200712
Cython : 0.29.21
pytest : 5.4.3
hypothesis : 5.20.3
sphinx : 3.1.1
blosc : None
feather : None
xlsxwriter : 1.2.9
lxml.etree : 4.5.2
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.16.1
pandas_datareader: None
bs4 : 4.9.1
bottleneck : 1.3.2
fsspec : 0.7.4
fastparquet : 0.4.1
gcsfs : 0.6.2
matplotlib : 3.2.2
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.4
pandas_gbq : None
pyarrow : 0.17.1
pyxlsb : None
s3fs : 0.4.2
scipy : 1.5.1
sqlalchemy : 1.3.18
tables : 3.6.1
tabulate : 0.8.7
xarray : 0.16.0
xlrd : 1.2.0
xlwt : 1.3.0
numba : 0.50.1
1.2.0
pd.show_versions() is failing with: ImportError: Can't determine version for numba, so here's conda list:
$ conda list
# packages in environment at /Users/isaac/miniconda3/envs/pandas-1.2:
#
# Name Version Build Channel
appnope 0.1.2 py38hecd8cb5_1001
backcall 0.2.0 py_0
blas 1.0 mkl
ca-certificates 2020.12.8 hecd8cb5_0
certifi 2020.12.5 py38hecd8cb5_0
decorator 4.4.2 py_0
intel-openmp 2019.4 233
ipython 7.19.0 py38h01d92e1_0
ipython_genutils 0.2.0 pyhd3eb1b0_1
jedi 0.17.0 py38_0
libcxx 11.0.0 h4c3b8ed_1 conda-forge
libedit 3.1.20191231 h1de35cc_1
libffi 3.3 hb1e8313_2
mkl 2019.4 233
mkl-service 2.3.0 py38h9ed2024_0
mkl_fft 1.2.0 py38hc64f4ea_0
mkl_random 1.1.1 py38h959d312_0
ncurses 6.2 h0a44026_1
numpy 1.19.2 py38h456fd55_0
numpy-base 1.19.2 py38hcfb5961_0
openssl 1.1.1i h9ed2024_0
pandas 1.2.0 py38he9f00de_0 conda-forge
parso 0.8.1 pyhd3eb1b0_0
pexpect 4.8.0 pyhd3eb1b0_3
pickleshare 0.7.5 pyhd3eb1b0_1003
pip 20.3.3 py38hecd8cb5_0
prompt-toolkit 3.0.8 py_0
ptyprocess 0.6.0 pyhd3eb1b0_2
pygments 2.7.3 pyhd3eb1b0_0
python 3.8.5 h26836e1_1
python-dateutil 2.8.1 py_0
python_abi 3.8 1_cp38 conda-forge
pytz 2020.4 pyhd3eb1b0_0
readline 8.0 h1de35cc_0
setuptools 51.0.0 py38hecd8cb5_2
six 1.15.0 py38hecd8cb5_0
sqlite 3.33.0 hffcf06c_0
tk 8.6.10 hb0a8c7a_0
traitlets 5.0.5 py_0
wcwidth 0.2.5 py_0
wheel 0.36.2 pyhd3eb1b0_0
xz 5.2.5 h1de35cc_0
zlib 1.2.11 h1de35cc_3