Skip to content

Bug in Series.str.repeat with string dtype and sequence of repeats #31632

@TomAugspurger

Description

@TomAugspurger

Code Sample, a copy-pastable example if possible

In [1]: import pandas as pd

In [2]: s = pd.Series(['a', None], dtype="string")

In [3]: s.str.repeat([1, 2])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~/sandbox/pandas/pandas/core/strings.py in rep(x, r)
    781             try:
--> 782                 return bytes.__mul__(x, r)
    783             except TypeError:

TypeError: descriptor '__mul__' requires a 'bytes' object but received a 'NAType'

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
<ipython-input-3-a01827562f7a> in <module>
----> 1 s.str.repeat([1, 2])

~/sandbox/pandas/pandas/core/strings.py in wrapper(self, *args, **kwargs)
   1950                 )
   1951                 raise TypeError(msg)
-> 1952             return func(self, *args, **kwargs)
   1953
   1954         wrapper.__name__ = func_name

~/sandbox/pandas/pandas/core/strings.py in repeat(self, repeats)
   2780     @forbid_nonstring_types(["bytes"])
   2781     def repeat(self, repeats):
-> 2782         result = str_repeat(self._parent, repeats)
   2783         return self._wrap_result(result)
   2784

~/sandbox/pandas/pandas/core/strings.py in str_repeat(arr, repeats)
    785
    786         repeats = np.asarray(repeats, dtype=object)
--> 787         result = libops.vec_binop(com.values_from_object(arr), repeats, rep)
    788         return result
    789

~/sandbox/pandas/pandas/_libs/ops.pyx in pandas._libs.ops.vec_binop()
    239                 result[i] = y
    240             else:
--> 241                 raise
    242
    243     return maybe_convert_bool(result.base)  # `.base` to access np.ndarray

~/sandbox/pandas/pandas/_libs/ops.pyx in pandas._libs.ops.vec_binop()
    232         y = right[i]
    233         try:
--> 234             result[i] = op(x, y)
    235         except TypeError:
    236             if x is None or is_nan(x):

~/sandbox/pandas/pandas/core/strings.py in rep(x, r)
    782                 return bytes.__mul__(x, r)
    783             except TypeError:
--> 784                 return str.__mul__(x, r)
    785
    786         repeats = np.asarray(repeats, dtype=object)

TypeError: descriptor '__mul__' requires a 'str' object but received a 'NAType'

Problem description

The str_repeat method correctly handles NA values when repeats is a scalar, but fails when its a sequence.

Expected Output

0       a
1    <NA>
dtype: string

Metadata

Metadata

Assignees

No one assigned

    Labels

    Missing-datanp.nan, pd.NaT, pd.NA, dropna, isnull, interpolateNA - MaskedArraysRelated to pd.NA and nullable extension arraysStringsString extension data type and string data

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions