Description
Hi,
I am running the git-cloned version of pandas, and there seem to be quite a few issues with user-defined classes extending DataFrame.
It seems that the DataFrame class constructor is hardcoded in a lot of places where the self.__class__ or cls constructors should be used instead. This causes some weird behaviour.
Allow me to illustrate. Let's import pandas and define a class that extends DataFrame:
In [2]: import pandas as pd
In [3]: pd.__version__
Out[3]: '0.10.1'
In [4]: class ClassExtendingDataFrame(pd.DataFrame):
   ...:     pass
   ...:
Note that ClassExtendingDataFrame does not override anything and is essentially the same as DataFrame, just renamed.
Now one would expect a new instance of ClassExtendingDataFrame to be created by the following code:
In [5]: dict = {'a' : [1,2,3], 'b': [2,3,4]}
In [6]: x = ClassExtendingDataFrame.from_dict(dict)
Unfortunately:
In [10]: assert(isinstance(x, ClassExtendingDataFrame))
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-10-3f1ceeb1b90f> in <module>()
----> 1 assert(isinstance(x, ClassExtendingDataFrame))
AssertionError:
In [11]: type(x)
Out[11]: pandas.core.frame.DataFrame
This is due to DataFrame being hardcoded in from_dict: https://github.com/pydata/pandas/blob/master/pandas/core/frame.py#L905 .
The cls variable should be used here.
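The difference between a hardcoded constructor and cls can be shown without pandas. This is only a minimal sketch: Frame, SubFrame, from_dict_hardcoded and from_dict_cls are hypothetical stand-ins invented for illustration, not pandas code.

```python
class Frame:
    """Hypothetical stand-in for DataFrame; not pandas code."""

    def __init__(self, data):
        self.data = dict(data)

    @classmethod
    def from_dict_hardcoded(cls, data):
        # The base class is named explicitly, so a subclass is ignored.
        return Frame(data)

    @classmethod
    def from_dict_cls(cls, data):
        # cls is whichever class the method was called on.
        return cls(data)


class SubFrame(Frame):
    pass


hardcoded = SubFrame.from_dict_hardcoded({'a': [1, 2, 3]})
via_cls = SubFrame.from_dict_cls({'a': [1, 2, 3]})
print(type(hardcoded).__name__)  # Frame
print(type(via_cls).__name__)    # SubFrame
```

Only the second classmethod preserves the subclass, which is the behaviour I would expect from from_dict.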
Note that if ClassExtendingDataFrame is initialised using the constructor rather than the from_dict method, the correct object is created:
In [12]: a = ClassExtendingDataFrame(dict)
In [13]: isinstance(a, ClassExtendingDataFrame)
Out[13]: True
However, operations as simple as slicing break this:
In [14]: isinstance(a[:5], ClassExtendingDataFrame)
Out[14]: False
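The same idea applies to instance methods such as slicing, where type(self) (or self.__class__) plays the role of cls. Again this is just a sketch under stated assumptions: Rows and SubRows are hypothetical classes, and the real pandas slicing code is certainly more involved.

```python
class Rows:
    """Hypothetical row container standing in for DataFrame; not pandas code."""

    def __init__(self, rows):
        self.rows = list(rows)

    def __getitem__(self, key):
        if isinstance(key, slice):
            # type(self) keeps the subclass; writing Rows(...) here
            # would return the base class, as in the report above.
            return type(self)(self.rows[key])
        return self.rows[key]


class SubRows(Rows):
    pass


a = SubRows([1, 2, 3])
print(isinstance(a[:2], SubRows))  # True
print(a[:2].rows)                  # [1, 2]
```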
These are just the two examples I have noticed myself, but I am sure there are more.
A thorough review of hardcoded DataFrame constructors is needed to check whether they can be replaced by self.__class__ or cls instead.
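As a possible starting point for such a review, the standard-library ast module can list places where a class calls its own name directly. This is a rough sketch, not a complete audit: find_hardcoded_calls and the SAMPLE source are hypothetical, and the scan only sees direct calls by name inside the class body, not calls in module-level helpers.

```python
import ast


def find_hardcoded_calls(source, class_name):
    """Return line numbers where class_name(...) is called inside its own class body."""
    tree = ast.parse(source)
    hits = []
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef) and node.name == class_name:
            for inner in ast.walk(node):
                if (isinstance(inner, ast.Call)
                        and isinstance(inner.func, ast.Name)
                        and inner.func.id == class_name):
                    hits.append(inner.lineno)
    return sorted(hits)


# Hypothetical source used only to exercise the scan; not the real frame.py.
SAMPLE = '''\
class DataFrame:
    @classmethod
    def from_dict(cls, data):
        return DataFrame(data)

    def copy(self):
        return self.__class__(self.data)
'''

print(find_hardcoded_calls(SAMPLE, 'DataFrame'))  # [4]
```

Each reported line would still need a manual look, since some hardcoded calls may be intentional.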