Description
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
pd.DataFrame({}).columns.dtype
Issue Description
On 2.0.0, this returned:
dtype('O')
On 2.0.1, this returns:
dtype('int64')
This was changed in #52404, and I'd previously reported a similar issue around the 2.0.0 release candidates in #51725.
Happy to continue this conversation on #51725 if that's more appropriate.
I would note that:
pd.DataFrame(columns=[]).columns.dtype
still returns dtype('O') in 2.0.1, as it did in 2.0.0 and in the 1.x series.
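To compare the two constructions side by side on whatever pandas version is installed, a small script like the following prints the Index subclass, dtype, and pd.api.types.infer_dtype result for each one. The results depend on the pandas version, so no expected output is shown.

```python
import pandas as pd

# The two ways of building an empty DataFrame discussed in this issue.
frames = {
    "pd.DataFrame({})": pd.DataFrame({}),
    "pd.DataFrame(columns=[])": pd.DataFrame(columns=[]),
}

print("pandas", pd.__version__)
for label, df in frames.items():
    cols = df.columns
    # Report the Index subclass, its dtype, and what infer_dtype makes of it.
    print(
        f"{label}: {type(cols).__name__}, "
        f"dtype={cols.dtype!r}, "
        f"infer_dtype={pd.api.types.infer_dtype(cols)!r}"
    )
```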
I have lots of tests that rely on creating an empty DataFrame whose columns are "string-like", i.e. where pd.api.types.infer_dtype on the columns returns "empty". (For example, my library only works with DataFrames whose column names are strings, similar to pyarrow.)
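A check of the kind such tests might perform can be sketched as below. The helper name has_string_like_columns is hypothetical, and treating "empty" as string-like is an assumption that mirrors the 1.x and 2.0.0 behavior described above: an empty object-dtype Index infers as "empty", while an empty int64 RangeIndex infers as "integer" and fails the check.

```python
import pandas as pd
from pandas.api.types import infer_dtype


def has_string_like_columns(df: pd.DataFrame) -> bool:
    """Hypothetical helper: True if every column label is a string.

    An empty object-dtype columns Index infers as "empty" and is accepted;
    integer labels (including an empty RangeIndex) infer as "integer".
    skipna=False so that NaN column labels are not silently ignored.
    """
    return infer_dtype(df.columns, skipna=False) in ("string", "empty")
```

For example, has_string_like_columns(pd.DataFrame({"a": [1]})) is True, while a frame with integer column labels such as pd.DataFrame({0: [1]}) is not.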
From
It sounds like there is a desire to move to "empty" as the dtype. I am a fan of this, and think it resembles the 1.x behavior.
Expected Behavior
Ideally, the behavior of pd.DataFrame({}) would not change between bug-fix releases.
This could also be addressed with information about:
- whether pd.DataFrame(columns=[]) will also change in a future version, and
- which way of constructing an empty DataFrame won't change.
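Until that is settled, one possible workaround, sketched here as an assumption rather than a documented stability guarantee, is to pass the columns Index explicitly so that its dtype does not depend on the constructor's defaults. The helper name empty_frame_with_object_columns is hypothetical.

```python
import pandas as pd


def empty_frame_with_object_columns() -> pd.DataFrame:
    """Hypothetical helper: an empty DataFrame whose columns are an
    explicitly object-dtype Index, instead of relying on whatever default
    pd.DataFrame({}) or pd.DataFrame(columns=[]) happens to pick."""
    return pd.DataFrame(columns=pd.Index([], dtype=object))


df = empty_frame_with_object_columns()
print(df.columns.dtype)
print(pd.api.types.infer_dtype(df.columns))
```

Spelling out dtype=object keeps the intent visible in the test code itself, at the cost of being more verbose than pd.DataFrame({}).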
Installed Versions
INSTALLED VERSIONS
------------------
commit : 37ea63d540fd27274cad6585082c91b1283f963d
python : 3.10.10.final.0
python-bits : 64
OS : Darwin
OS-release : 20.6.0
Version : Darwin Kernel Version 20.6.0: Thu Mar 9 20:39:26 PST 2023; root:xnu-7195.141.49.700.6~1/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.UTF-8
pandas : 2.0.1
numpy : 1.24.3
pytz : 2023.3
dateutil : 2.8.2
setuptools : 67.7.2
pip : 23.1.2
Cython : None
pytest : 7.3.1
hypothesis : None
sphinx : 6.2.1
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.13.2
pandas_datareader: None
bs4 : 4.12.2
bottleneck : None
brotli : None
fastparquet : None
fsspec : 2023.4.0
gcsfs : None
matplotlib : 3.7.1
numba : 0.57.0
numexpr : None
odfpy : None
openpyxl : 3.1.2
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.10.1
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2023.3
qtpy : None
pyqt5 : None