
ExtensionArray series.align(frame) not working #20576

@jorisvandenbossche

Aligning a Series backed by extension array data with a DataFrame fails as soon as the indexes differ, i.e. when there is actually something to align:

In [27]: from pandas.tests.extension.decimal.array import DecimalArray, make_data

In [28]: dec_arr = DecimalArray(make_data()[:3])

In [29]: s = pd.Series(dec_arr)

In [30]: s
Out[30]: 
0    0.29242561210243966929311909552779980003833770...
1    0.34798224977276304148432473084540106356143951...
2    0.04963128775050906771326708621927537024021148...
dtype: decimal

In [31]: frame = pd.DataFrame({'col': np.arange(3)})

In [32]: s.align(frame)
Out[32]: 
(0    0.29242561210243966929311909552779980003833770...
 1    0.34798224977276304148432473084540106356143951...
 2    0.04963128775050906771326708621927537024021148...
 dtype: decimal,    col
 0    0
 1    1
 2    2)

In [33]: frame = pd.DataFrame({'col': np.arange(4)})

In [34]: s.align(frame)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-34-94a43a65d337> in <module>()
----> 1 s.align(frame)

/home/joris/scipy/pandas/pandas/core/series.py in align(self, other, join, axis, level, copy, fill_value, method, limit, fill_axis, broadcast_axis)
   3231                                          fill_value=fill_value, method=method,
   3232                                          limit=limit, fill_axis=fill_axis,
-> 3233                                          broadcast_axis=broadcast_axis)
   3234 
   3235     def rename(self, index=None, **kwargs):

/home/joris/scipy/pandas/pandas/core/generic.py in align(self, other, join, axis, level, copy, fill_value, method, limit, fill_axis, broadcast_axis)
   7174                                      copy=copy, fill_value=fill_value,
   7175                                      method=method, limit=limit,
-> 7176                                      fill_axis=fill_axis)
   7177         elif isinstance(other, Series):
   7178             return self._align_series(other, join=join, axis=axis, level=level,

/home/joris/scipy/pandas/pandas/core/generic.py in _align_frame(self, other, join, axis, level, copy, fill_value, method, limit, fill_axis)
   7210         left = self._reindex_with_indexers(reindexers, copy=copy,
   7211                                            fill_value=fill_value,
-> 7212                                            allow_dups=True)
   7213         # other must be always DataFrame
   7214         right = other._reindex_with_indexers({0: [join_index, iridx],

/home/joris/scipy/pandas/pandas/core/generic.py in _reindex_with_indexers(self, reindexers, fill_value, copy, allow_dups)
   3812                                                 fill_value=fill_value,
   3813                                                 allow_dups=allow_dups,
-> 3814                                                 copy=copy)
   3815 
   3816         if copy and new_data is self._data:

/home/joris/scipy/pandas/pandas/core/internals.py in reindex_indexer(self, new_axis, indexer, axis, fill_value, allow_dups, copy)
   4377         if axis == 0:
   4378             new_blocks = self._slice_take_blocks_ax0(indexer,
-> 4379                                                      fill_tuple=(fill_value,))
   4380         else:
   4381             new_blocks = [blk.take_nd(indexer, axis=axis, fill_tuple=(

/home/joris/scipy/pandas/pandas/core/internals.py in _slice_take_blocks_ax0(self, slice_or_indexer, fill_tuple)
   4411             elif not allow_fill or self.ndim == 1:
   4412                 if allow_fill and fill_tuple[0] is None:
-> 4413                     _, fill_value = maybe_promote(blk.dtype)
   4414                     fill_tuple = (fill_value, )
   4415 

/home/joris/scipy/pandas/pandas/core/dtypes/cast.py in maybe_promote(dtype, fill_value)
    334     elif is_datetimetz(dtype):
    335         pass
--> 336     elif issubclass(np.dtype(dtype).type, string_types):
    337         dtype = np.object_
    338 

TypeError: data type not understood
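For comparison, here is a minimal check of what `s.align(frame)` is expected to return in the failing case. It mirrors the result `frame.align(s, axis=0)` already produces further down: the Series is reindexed to the union of both indexes, the new label is filled with a missing value, and the extension dtype is kept. This is a sketch under one assumption: because `DecimalArray` lives in pandas' test suite, it substitutes pandas' built-in nullable `"Int64"` extension dtype, which is also backed by an ExtensionArray.

```python
import numpy as np
import pandas as pd

# Assumption: the nullable "Int64" extension dtype stands in for the
# test-suite DecimalArray used in the report.
s = pd.Series(pd.array([10, 20, 30], dtype="Int64"))
frame = pd.DataFrame({"col": np.arange(4)})

# The failing case from the report: the frame has a label (3) that s lacks.
left, right = s.align(frame)

# Expected: s is reindexed to the union of the indexes, the extra label
# holds the dtype's missing value, and the extension dtype is preserved.
print(left.dtype)
print(left.isna().tolist())
print(right.shape)
```

On a pandas version affected by this issue, the `s.align(frame)` line is where the `TypeError` above would be raised.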

Other alignment combinations (frame with series, series with series) work correctly:

In [36]: frame.align(s, axis=0)
Out[36]: 
(   col
 0    0
 1    1
 2    2
 3    3, 0    0.29242561210243966929311909552779980003833770...
 1    0.34798224977276304148432473084540106356143951...
 2    0.04963128775050906771326708621927537024021148...
 3                                                  NaN
 dtype: decimal)

In [37]: s.align(frame['col'])
Out[37]: 
(0    0.29242561210243966929311909552779980003833770...
 1    0.34798224977276304148432473084540106356143951...
 2    0.04963128775050906771326708621927537024021148...
 3                                                  NaN
 dtype: decimal, 0    0
 1    1
 2    2
 3    3
 Name: col, dtype: int64)
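Until the Series-with-DataFrame path is fixed, one possible workaround is to route the call through `DataFrame.align(..., axis=0)`, which the session above shows working, and swap the results back. The helper name `align_series_to_frame` is hypothetical, and the nullable `"Int64"` dtype again stands in for `DecimalArray`.

```python
import numpy as np
import pandas as pd


def align_series_to_frame(s, frame, join="outer"):
    """Hypothetical helper: aligns s with frame on the index by calling
    DataFrame.align (the direction shown working in the report) and
    returns the pair in the same order as s.align(frame)."""
    frame_aligned, s_aligned = frame.align(s, join=join, axis=0)
    return s_aligned, frame_aligned


# Assumption: "Int64" extension dtype used in place of DecimalArray.
s = pd.Series(pd.array([10, 20, 30], dtype="Int64"))
frame = pd.DataFrame({"col": np.arange(4)})

s_al, frame_al = align_series_to_frame(s, frame)
print(s_al)
print(frame_al)
```

This only covers alignment along the index; `s.align(frame)` with the default `axis=None` does the same for a Series, since a Series has no columns to join.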

Labels: ExtensionArray (Extending pandas with custom dtypes or arrays.)