DataFrame does not play well with classes extending it #2859

Closed
@lukauskas

Hi,

I am running the git-cloned version of pandas, and there seem to be quite a few issues with user-defined classes extending DataFrame.

It seems that the DataFrame constructor is hardcoded in a lot of places where self.__class__ or cls should be used instead. This causes some weird behaviour.

Allow me to illustrate. Let's import pandas and define a class that extends DataFrame:

In [2]: import pandas as pd
In [3]: pd.__version__
Out[3]: '0.10.1'
In [4]: class ClassExtendingDataFrame(pd.DataFrame):
   ...:     pass
   ...: 

Note that ClassExtendingDataFrame does not override anything and is essentially the same as DataFrame, just renamed.

One would expect the following code to create a new instance of ClassExtendingDataFrame:

In [5]: dict = {'a' : [1,2,3], 'b': [2,3,4]}
In [6]: x = ClassExtendingDataFrame.from_dict(dict)

Unfortunately:

In [10]: assert(isinstance(x, ClassExtendingDataFrame))
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-10-3f1ceeb1b90f> in <module>()
----> 1 assert(isinstance(x, ClassExtendingDataFrame))

AssertionError: 

In [11]: type(x)
Out[11]: pandas.core.frame.DataFrame

This is due to DataFrame being hardcoded in from_dict: https://github.com/pydata/pandas/blob/master/pandas/core/frame.py#L905
The cls variable should be used there instead.
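To make the suggestion concrete, here is a minimal sketch of the difference using a toy class. Table, SubTable and from_dict_hardcoded are made-up names for illustration only; this is not the pandas source, just the general pattern of a classmethod returning cls(...) rather than naming the base class.

```python
class Table:
    """Toy stand-in for a DataFrame-like class (hypothetical, not pandas code)."""

    def __init__(self, data):
        self.data = dict(data)

    @classmethod
    def from_dict_hardcoded(cls, data):
        # The base class is named explicitly, so subclasses lose their type.
        return Table(data)

    @classmethod
    def from_dict(cls, data):
        # cls is whichever class the method was called on.
        return cls(data)


class SubTable(Table):
    pass


hardcoded = SubTable.from_dict_hardcoded({'a': [1, 2, 3], 'b': [2, 3, 4]})
generic = SubTable.from_dict({'a': [1, 2, 3], 'b': [2, 3, 4]})

print(type(hardcoded).__name__)  # Table
print(type(generic).__name__)    # SubTable
```

With the cls version, the alternate constructor behaves like the plain constructor call and returns the subclass.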

Note that if ClassExtendingDataFrame is initialised using the constructor rather than the from_dict method, the correct object is created:

In [12]: a = ClassExtendingDataFrame(dict)
In [13]: isinstance(a, ClassExtendingDataFrame)
Out[13]: True

However, operations as simple as slicing break this:

In [14]: isinstance(a[:5], ClassExtendingDataFrame)
Out[14]: False
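The same idea applies to instance methods such as slicing. Below is a rough sketch with a toy container; Rows and SubRows are hypothetical names, and this is not how pandas implements indexing. It only shows that building the result with type(self) (equivalently self.__class__) keeps the subclass.

```python
class Rows:
    """Toy sliceable container (hypothetical, not pandas code)."""

    def __init__(self, rows):
        self.rows = list(rows)

    def __getitem__(self, key):
        if isinstance(key, slice):
            # type(self) instead of a hardcoded Rows(...) preserves the subclass.
            return type(self)(self.rows[key])
        return self.rows[key]


class SubRows(Rows):
    pass


a = SubRows([1, 2, 3])
sliced = a[:5]

print(isinstance(sliced, SubRows))  # True
print(sliced.rows)                  # [1, 2, 3]
```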

These are just the two examples I have noticed myself, but I am sure there are more.
A thorough review of the hardcoded DataFrame constructors is needed to check whether they can be replaced by self.__class__ or cls.
