BUG: Behavior change on DataFrame instantiation from 2.0.0 to 2.0.1 #53100

Closed
@ivirshup

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

pd.DataFrame({}).columns.dtype

Issue Description

On 2.0.0, this returned:

dtype('O')

On 2.0.1, this returns:

dtype('int64')

This was changed in #52404, and I'd previously reported a similar issue around the 2.0.0rcs in #51725.

Happy to continue this conversation on #51725 if that's more appropriate.

I would note that:

pd.DataFrame(columns=[]).columns.dtype

Still returns dtype('O') in 2.0.1, as it did in 2.0.0 and in the 1.x series.

I have lots of tests that rely on creating an empty dataframe whose columns have a "string like" dtype, or for which pd.api.types.infer_dtype returns "empty" (e.g. my library only works with dataframes whose column names are strings, similar to pyarrow).
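To make the kind of check described above concrete, here is a minimal sketch of such a test helper. The name `has_string_like_columns` is hypothetical and not taken from any library; it only assumes that `pd.api.types.infer_dtype` reports "empty" for an empty object-dtype Index and "integer" for an empty integer-typed one.

```python
import pandas as pd
from pandas.api.types import infer_dtype


def has_string_like_columns(df: pd.DataFrame) -> bool:
    """Hypothetical helper: accept columns inferred as strings or as empty."""
    # infer_dtype looks at the values for object dtype, and at the dtype
    # itself for typed indexes such as RangeIndex (int64).
    return infer_dtype(df.columns) in ("string", "empty")


# Column dtypes are pinned explicitly here so the checks do not depend on
# what the DataFrame constructor infers in a given pandas version.
object_cols = pd.DataFrame(columns=pd.Index([], dtype=object))
range_cols = pd.DataFrame(columns=pd.RangeIndex(0))
named_cols = pd.DataFrame(columns=["a", "b"])

print(has_string_like_columns(object_cols))
print(has_string_like_columns(range_cols))
print(has_string_like_columns(named_cols))
```

With a check like this, an empty frame whose columns are integer-typed is rejected, which is why the inferred columns dtype of an empty frame matters for such tests.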

From

It sounds like there is a desire to move to "empty" as the dtype. I am a fan of this, and think it resembles the 1.x behavior.

Expected Behavior

Ideally, the behavior of pd.DataFrame({}) would not change between bug fix releases.

This could also be addressed with information around:

  • Whether pd.DataFrame(columns=[]) will also change in a future version.
  • Which way of creating an empty DataFrame will not change.
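In the meantime, one possible workaround is to avoid relying on inference and pass an explicitly typed Index as the columns. This is only a sketch: the helper name `empty_frame_with_object_columns` is hypothetical, and nothing in this issue confirms that maintainers consider this the stable way to do it.

```python
import pandas as pd


def empty_frame_with_object_columns() -> pd.DataFrame:
    """Hypothetical helper: build an empty frame with object-dtype columns."""
    # Passing a typed Index means the columns dtype is stated by the caller
    # rather than inferred by the DataFrame constructor.
    return pd.DataFrame(columns=pd.Index([], dtype=object))


df = empty_frame_with_object_columns()
print(pd.__version__, repr(df.columns.dtype))
```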

Installed Versions

INSTALLED VERSIONS
------------------
commit           : 37ea63d540fd27274cad6585082c91b1283f963d
python           : 3.10.10.final.0
python-bits      : 64
OS               : Darwin
OS-release       : 20.6.0
Version          : Darwin Kernel Version 20.6.0: Thu Mar  9 20:39:26 PST 2023; root:xnu-7195.141.49.700.6~1/RELEASE_X86_64
machine          : x86_64
processor        : i386
byteorder        : little
LC_ALL           : None
LANG             : None
LOCALE           : None.UTF-8

pandas           : 2.0.1
numpy            : 1.24.3
pytz             : 2023.3
dateutil         : 2.8.2
setuptools       : 67.7.2
pip              : 23.1.2
Cython           : None
pytest           : 7.3.1
hypothesis       : None
sphinx           : 6.2.1
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 3.1.2
IPython          : 8.13.2
pandas_datareader: None
bs4              : 4.12.2
bottleneck       : None
brotli           : None
fastparquet      : None
fsspec           : 2023.4.0
gcsfs            : None
matplotlib       : 3.7.1
numba            : 0.57.0
numexpr          : None
odfpy            : None
openpyxl         : 3.1.2
pandas_gbq       : None
pyarrow          : None
pyreadstat       : None
pyxlsb           : None
s3fs             : None
scipy            : 1.10.1
snappy           : None
sqlalchemy       : None
tables           : None
tabulate         : None
xarray           : None
xlrd             : None
zstandard        : None
tzdata           : 2023.3
qtpy             : None
pyqt5            : None

Labels: Bug, Constructors (Series/DataFrame/Index/pd.array Constructors), DataFrame (DataFrame data structure), Regression (Functionality that used to work in a prior pandas version)
