Closed
Labels
Algos (Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff), Bug, Testing (pandas testing functions or related to the test suite), good first issue
Description
The documentation for factorize does not say what data type is expected for the values. I assume that any list of hashables should work, in particular a list of tuples.
factorize does work on a list of tuples as long as the tuples do not all have the same length, but it fails as soon as every tuple has the same length. (It looks like some inference about the structure of the values is happening that shouldn't be.)
import pandas as pd
pd.factorize([(1, 1), (1, 2), (0, 0), (1, 2), 'nonsense']) # This works
(array([0, 1, 2, 1, 3]), array([(1, 1), (1, 2), (0, 0), 'nonsense'], dtype=object))
pd.factorize([(1, 1), (1, 2), (0, 0), (1, 2), (1, 2, 3)]) # This also works.
pd.factorize([(1, 1), (1, 2), (0, 0), (1, 2)]) # <-- fails
ValueError Traceback (most recent call last)
<ipython-input-22-3ca8ec02e16c> in <module>()
1 print pd.factorize([(1, 1), (1, 2), (0, 0), (1, 2), 'nonsense'])
----> 2 print pd.factorize([(1, 1), (1, 2), (0, 0), (1, 2)])
/usr/local/lib/python2.7/dist-packages/pandas/core/algorithms.pyc in factorize(values, sort, order, na_sentinel)
132 table = hash_klass(len(vals))
133 uniques = vec_klass()
--> 134 labels = table.get_labels(vals, uniques, 0, na_sentinel)
135
136 labels = com._ensure_platform_int(labels)
/usr/local/lib/python2.7/dist-packages/pandas/hashtable.so in pandas.hashtable.Int64HashTable.get_labels (pandas/hashtable.c:8575)()
ValueError: Buffer has wrong number of dimensions (expected 1, got 2)
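A guess at the cause, sketched with NumPy alone. This assumes factorize converts its input to an array before hashing, which I have not confirmed in the pandas source. If it does, equal-length tuples of ints would be read as rows of a 2-D integer array, which would match both the Int64HashTable in the traceback and the "expected 1, got 2" message. The names `same_len`, `inferred`, and `as_objects` are only for illustration.

```python
import numpy as np

same_len = [(1, 1), (1, 2), (0, 0), (1, 2)]

# Equal-length tuples of ints are interpreted as rows of a 2-D integer array.
inferred = np.asarray(same_len)
print(inferred.ndim, inferred.dtype.kind)  # 2 i

# Filling a 1-D object array element by element keeps each tuple
# as a single opaque item instead of a row.
as_objects = np.empty(len(same_len), dtype=object)
for i, item in enumerate(same_len):
    as_objects[i] = item
print(as_objects.ndim, as_objects.dtype.kind)  # 1 O
```

Passing a 1-D object array like `as_objects` to pd.factorize should sidestep the shape inference, though I have not checked that against 0.15.2.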
Version: pandas 0.15.2
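For reference, here is the behaviour I would expect for any list of hashables, written as a small pure-Python sketch. `factorize_hashables` is a hypothetical helper, not a pandas function and not how pandas implements factorize. It just assigns each distinct value an integer code in order of first appearance.

```python
def factorize_hashables(values):
    """Hypothetical reference helper, not part of pandas."""
    codes = []       # integer code for each input value
    uniques = []     # distinct values in order of first appearance
    positions = {}   # value -> index into uniques
    for value in values:
        if value not in positions:
            positions[value] = len(uniques)
            uniques.append(value)
        codes.append(positions[value])
    return codes, uniques


codes, uniques = factorize_hashables([(1, 1), (1, 2), (0, 0), (1, 2)])
print(codes)    # [0, 1, 2, 1]
print(uniques)  # [(1, 1), (1, 2), (0, 0)]
```

Each tuple is treated as one hashable value, so whether the tuples share a length makes no difference to the result.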