Description
In #35169 / #42597 we discussed the desired behavior of the PyArrow-backed StringArray when a given method is not implemented in `pyarrow.compute`.
For string methods like `str_normalize`, which aren't currently implemented in `pyarrow.compute`, I believe we (silently) cast from `string[pyarrow]` to an object-dtype ndarray of Python `str` objects at
pandas/pandas/core/arrays/string_.py, lines 527 to 528 in edd5af7:

```python
mask = isna(self)
arr = np.asarray(self)
```
These kinds of performance cliffs are difficult for users to debug, and I don't think we should do that conversion on behalf of the user. If something isn't implemented yet, I think we should raise an error whose message tells users to convert to `string[python]` dtype first.
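To make the "raise" option concrete, here is a minimal stdlib-only sketch of the dispatch logic, not pandas code. It assumes a column is modeled as a list of `str` or `None` (missing), and that `ARROW_KERNELS` stands in for "methods `pyarrow.compute` implements"; `ARROW_KERNELS`, `StringMethodNotImplementedError`, and `apply_str_method` are all hypothetical names.

```python
# Hypothetical registry standing in for string kernels available in
# pyarrow.compute; the real set of kernels is not modeled here.
ARROW_KERNELS = {
    "upper": str.upper,
    "lower": str.lower,
}


class StringMethodNotImplementedError(NotImplementedError):
    """Hypothetical error type used only in this sketch."""


def apply_str_method(values, method_name):
    """Apply a string method to a column, or raise if no kernel exists.

    ``values`` is a list of ``str`` or ``None`` (None marks a missing value).
    Instead of silently falling back to a slow object-dtype path, this
    raises and points the user at an explicit conversion.
    """
    kernel = ARROW_KERNELS.get(method_name)
    if kernel is None:
        raise StringMethodNotImplementedError(
            f"str.{method_name} is not implemented for string[pyarrow]; "
            "convert with .astype('string[python]') first."
        )
    # Missing values propagate unchanged; present values go through the kernel.
    return [None if v is None else kernel(v) for v in values]
```

With this policy, `apply_str_method(["ab", None], "upper")` returns `["AB", None]`, while `apply_str_method(["ab"], "normalize")` fails immediately, so the cost of the conversion is something the user opts into rather than discovers later.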
If we don't want to raise, we could instead emit a `PerformanceWarning`, similar to what we do for `SparseArray` when converting to dense.
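The warning alternative could look roughly like the following stdlib-only sketch. It keeps the slow fallback but makes it visible. `PerformanceWarning` here is a local stand-in for `pandas.errors.PerformanceWarning`, and `ARROW_KERNELS`, `PYTHON_FALLBACKS`, and `apply_str_method` are hypothetical names; a column is again assumed to be a list of `str` or `None`.

```python
import unicodedata
import warnings


class PerformanceWarning(Warning):
    """Local stand-in for pandas.errors.PerformanceWarning (assumption)."""


# Hypothetical "fast path" kernels, standing in for pyarrow.compute.
ARROW_KERNELS = {"upper": str.upper, "lower": str.lower}

# Hypothetical slow, element-wise Python fallbacks (the object-dtype path).
PYTHON_FALLBACKS = {
    "normalize": lambda s, form: unicodedata.normalize(form, s),
}


def apply_str_method(values, method_name, *args):
    """Use a fast kernel if one exists; otherwise warn and loop in Python."""
    kernel = ARROW_KERNELS.get(method_name)
    if kernel is not None:
        return [None if v is None else kernel(v, *args) for v in values]
    # No kernel: tell the user they are on the slow path instead of hiding it.
    warnings.warn(
        f"str.{method_name} is not implemented in pyarrow.compute; falling "
        "back to an element-wise loop over Python str objects.",
        PerformanceWarning,
        stacklevel=2,
    )
    fallback = PYTHON_FALLBACKS[method_name]
    return [None if v is None else fallback(v, *args) for v in values]
```

Here `apply_str_method(["e\u0301", None], "normalize", "NFC")` still returns `["\u00e9", None]`, but it also emits a `PerformanceWarning`, which users can escalate to an error with `warnings.simplefilter("error", PerformanceWarning)` if they want the strict behavior.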