Commit 6047485

BUG: fix get_dummies not applying sparse=True to all columns
When sparse=True is passed, get_dummies only converts the columns that take part in the dummy encoding to SparseDataFrames, and the remaining columns stay dense. That would be fine on its own, but reindexing on one of the dense columns then causes block errors at a lower level. This should fix that by casting all columns to sparse when sparse=True.
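The mixed dense/sparse result described above can be inspected with a small sketch. It assumes a recent pandas release, where `pd.SparseDtype` exists and `SparseDataFrame` has been removed, so it illustrates the shape of the problem and does not reproduce the original failure. The frame contents mirror the regression test in this commit; the names `encoded` and `is_sparse` are illustrative only.

```python
import pandas as pd

# Same data as the regression test in this commit: one numeric column
# that is left alone and one categorical column that gets dummy-encoded.
df = pd.DataFrame({"GDP": [1, 2], "Nation": ["AB", "CD"]})

encoded = pd.get_dummies(df, columns=["Nation"], sparse=True)

# Record, per column, whether the column is backed by a sparse dtype.
is_sparse = {
    col: isinstance(dtype, pd.SparseDtype)
    for col, dtype in encoded.dtypes.items()
}

for col, flag in is_sparse.items():
    print(col, "sparse" if flag else "dense")
```

Only the dummy columns are requested as sparse here. How the untouched `GDP` column is stored depends on the pandas version, which is exactly the inconsistency this commit targets.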
1 parent dbec3c9 commit 6047485

3 files changed

+13
-0
lines changed

doc/source/whatsnew/v0.23.0.txt

Lines changed: 1 addition & 0 deletions
@@ -353,6 +353,7 @@ Reshaping
 - Bug in :func:`Series.rank` where ``Series`` containing ``NaT`` modifies the ``Series`` inplace (:issue:`18521`)
 - Bug in :func:`cut` which fails when using readonly arrays (:issue:`18773`)
 - Bug in :func:`Dataframe.pivot_table` which fails when the ``aggfunc`` arg is of type string. The behavior is now consistent with other methods like ``agg`` and ``apply`` (:issue:`18713`)
+- Bug in :func:`get_dummies` which doesn't cast all columns as sparse which causes reindex to fail. (:issue:`18914`)


 Numeric

pandas/core/reshape/reshape.py

Lines changed: 4 additions & 0 deletions
@@ -841,6 +841,10 @@ def check_len(item, name):
             with_dummies = []
         else:
             with_dummies = [data.drop(columns_to_encode, axis=1)]
+            if sparse:
+                with_dummies = [SparseDataFrame(with_dummies[0],
+                                                index=with_dummies[0].index,
+                                                default_fill_value=0)]

         for (col, pre, sep) in zip(columns_to_encode, prefix, prefix_sep):

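The hunk above wraps the non-encoded columns in a `SparseDataFrame`, a class that later pandas releases removed. As a rough sketch of the same idea on a recent pandas, the helper below casts any remaining dense numeric columns to a sparse dtype. `sparsify_remaining` is a hypothetical name, not part of pandas, and the choice to skip non-numeric columns is an assumption made for this illustration.

```python
import pandas as pd


def sparsify_remaining(frame, fill_value=0):
    """Return a copy of `frame` with dense numeric columns cast to sparse.

    Hypothetical helper: columns that are already sparse, or that are not
    numeric, are left unchanged.
    """
    out = frame.copy()
    for col in out.columns:
        dtype = out[col].dtype
        if isinstance(dtype, pd.SparseDtype):
            continue  # already sparse, e.g. a dummy column
        if pd.api.types.is_numeric_dtype(dtype):
            out[col] = out[col].astype(pd.SparseDtype(dtype, fill_value))
    return out


df = pd.DataFrame({"GDP": [1, 2], "Nation": ["AB", "CD"]})
encoded = pd.get_dummies(df, columns=["Nation"], sparse=True)
all_sparse = sparsify_remaining(encoded)

# Reindexing on the formerly dense column, as in the bug report.
gdp_only = all_sparse.reindex(columns=["GDP"])
```

After the cast every column shares a sparse representation, which is the invariant the commit tries to establish before any reindexing happens.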
pandas/tests/reshape/test_reshape.py

Lines changed: 8 additions & 0 deletions
@@ -454,6 +454,14 @@ def test_dataframe_dummies_preserve_categorical_dtype(self, dtype):

         tm.assert_frame_equal(result, expected)

+    def test_get_dummies_sparsify_all_columns(self):
+        # GH18914
+        df = DataFrame.from_items([('GDP', [1, 2]), ('Nation', ['AB', 'CD'])])
+        df = get_dummies(df, columns=['Nation'], sparse=True)
+        df2 = df.reindex(columns=['GDP'])
+
+        tm.assert_index_equal(df.index, df2.index)
+

 class TestCategoricalReshape(object):
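The regression test relies on `DataFrame.from_items`, which was deprecated in pandas 0.23 and removed in 1.0. A standalone sketch of the same check that runs outside the pandas test suite might look like this. It builds the frame with the plain `DataFrame` constructor and uses the public `pandas.testing` module; the function name is illustrative.

```python
import pandas as pd
import pandas.testing as tm


def check_reindex_after_sparse_dummies():
    """Mirror of the GH18914 regression check using public pandas APIs."""
    df = pd.DataFrame({"GDP": [1, 2], "Nation": ["AB", "CD"]})
    encoded = pd.get_dummies(df, columns=["Nation"], sparse=True)

    # Reindex on the column that was not dummy-encoded.
    reindexed = encoded.reindex(columns=["GDP"])

    # The row index must survive the reindex unchanged.
    tm.assert_index_equal(encoded.index, reindexed.index)
    return reindexed


reindexed = check_reindex_after_sparse_dummies()
```

Like the original test, this only asserts that the reindex succeeds and preserves the row index. It does not check whether the resulting columns are sparse.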
