Closed
Labels
Categorical (Categorical Data Type), Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)
Description
xref #8731; the solution might be the same.
Currently, when we do a crosstab, the distinct values in each column are reported in lexical order. But crosstabs are usually most useful with categorical data, which may have an inherent ordering.
import pandas as pd

d = {'MAKE': pd.Series(['Honda', 'Acura', 'Tesla', 'Honda', 'Honda', 'Acura']),
     'MODEL': pd.Series(['Sedan', 'Sedan', 'Electric', 'Pickup', 'Sedan', 'Sedan'])}
data = pd.DataFrame(d)

# Crosstab on plain string columns
pd.crosstab(data['MAKE'], data['MODEL'])

# Crosstab after converting MODEL to a categorical with an explicit category order
data['MODEL'] = data['MODEL'].astype('category')
data['MODEL'] = data['MODEL'].cat.set_categories(['Sedan', 'Electric', 'Pickup'])
pd.crosstab(data['MAKE'], data['MODEL'])
Both crosstab calls above produce the same output, shown below. I believe the code is essentially performing a lexical sort on the contents of the Series being passed.
Output:
MODEL  Electric  Pickup  Sedan
MAKE
Acura         0       0      2
Honda         0       1      2
Tesla         1       0      0
Would it be possible for crosstab to maintain the ordering of the categorical variable if column.cat.ordered on the passed column is True? Thanks!
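For illustration, here is a minimal sketch of the ordering being asked for, written as a manual workaround rather than as anything crosstab does on its own. It assumes the desired column order is the category order set on MODEL, and reindexes the crosstab columns by data['MODEL'].cat.categories. The names table and ordered are only illustrative.

```python
import pandas as pd

data = pd.DataFrame({
    'MAKE': ['Honda', 'Acura', 'Tesla', 'Honda', 'Honda', 'Acura'],
    'MODEL': ['Sedan', 'Sedan', 'Electric', 'Pickup', 'Sedan', 'Sedan'],
})

# Give MODEL an explicit category order (not lexical).
data['MODEL'] = data['MODEL'].astype('category')
data['MODEL'] = data['MODEL'].cat.set_categories(['Sedan', 'Electric', 'Pickup'])

table = pd.crosstab(data['MAKE'], data['MODEL'])

# Workaround (an assumption, not built-in crosstab behaviour):
# force the columns into the category order after the fact.
ordered = table.reindex(columns=data['MODEL'].cat.categories)

print(ordered)
```

This only reorders the columns of the result; the counts themselves are unchanged.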