BUG: read_csv: C and python engines have inconsistent behavior when parse_dates=[] #38489

@xuhdev

Description

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

import io
import pandas as pd

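# The header has one field but the data row has two space-separated fields,
# so read_csv uses the first data column as an implicit index.
s = """some_header
2020-01-01 01:00:00"""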


print('python,parse_dates=False')
print(pd.read_csv(io.StringIO(s), parse_dates=False, engine='python', delimiter=' '))
print('python,parse_dates=[]')
print(pd.read_csv(io.StringIO(s), parse_dates=[], engine='python', delimiter=' '))

print('c, parse_dates=False')
print(pd.read_csv(io.StringIO(s), parse_dates=False, engine='c', delimiter=' '))
print('c, parse_dates=[]')
print(pd.read_csv(io.StringIO(s), parse_dates=[], engine='c', delimiter=' '))

Problem description

In the code above, the first three read_csv calls succeed, but the last one (engine='c', parse_dates=[]) crashes with a TypeError:

Traceback (most recent call last):
  File "bug.py", line 16, in <module>
    print(pd.read_csv(io.StringIO(s), parse_dates=[], engine='c', delimiter=' '))
  File "/home/hong/wsrc/dax-api/.tox/dev/lib64/python3.8/site-packages/pandas/io/parsers.py", line 688, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/home/hong/wsrc/dax-api/.tox/dev/lib64/python3.8/site-packages/pandas/io/parsers.py", line 460, in _read
    data = parser.read(nrows)
  File "/home/hong/wsrc/dax-api/.tox/dev/lib64/python3.8/site-packages/pandas/io/parsers.py", line 1198, in read
    ret = self._engine.read(nrows)
  File "/home/hong/wsrc/dax-api/.tox/dev/lib64/python3.8/site-packages/pandas/io/parsers.py", line 2200, in read
    values = self._maybe_parse_dates(values, i, try_parse_dates=True)
  File "/home/hong/wsrc/dax-api/.tox/dev/lib64/python3.8/site-packages/pandas/io/parsers.py", line 2261, in _maybe_parse_dates
    if try_parse_dates and self._should_parse_dates(index):
  File "/home/hong/wsrc/dax-api/.tox/dev/lib64/python3.8/site-packages/pandas/io/parsers.py", line 1569, in _should_parse_dates
    j = self.index_col[i]
TypeError: 'NoneType' object is not subscriptable
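The traceback suggests why False and [] diverge. The sketch below is a simplified, hypothetical model of the check in _should_parse_dates, not the actual pandas source. It assumes that a boolean parse_dates returns early, and that index_col is None here because the index was inferred implicitly (the header is one field short). ParserModel and should_parse_dates are illustrative names.

```python
from typing import Optional, Sequence, Union

ParseDates = Union[bool, Sequence[Union[int, str]]]


class ParserModel:
    """Hypothetical stand-in for the C parser's date-parsing check."""

    def __init__(self, parse_dates: ParseDates, index_col: Optional[Sequence[int]]):
        self.parse_dates = parse_dates
        # Assumed to be None when the index is inferred implicitly.
        self.index_col = index_col

    def should_parse_dates(self, i: int) -> bool:
        if isinstance(self.parse_dates, bool):
            # Booleans return early, so index_col is never touched.
            return self.parse_dates
        # Any list, even an empty one, falls through and subscripts index_col.
        j = self.index_col[i]  # TypeError if index_col is None
        return j in self.parse_dates


print(ParserModel(False, None).should_parse_dates(0))  # False
try:
    ParserModel([], None).should_parse_dates(0)
except TypeError as exc:
    print("TypeError:", exc)
```

In this model, parse_dates=False never reaches index_col, while parse_dates=[] hits self.index_col[i] and fails the same way as the traceback.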

Expected Output

All four calls succeed and print the CSV contents, i.e. parse_dates=[] behaves the same under the C and python engines.
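Until the engines agree, one possible workaround is to map an empty parse_dates list to False before calling read_csv, since parse_dates=False succeeded on both engines in the report. This is a sketch that assumes an empty list and False are meant to be equivalent ("parse no dates"). The helper names normalize_parse_dates and read_csv_compat are hypothetical and not part of pandas.

```python
import io
from typing import Any

import pandas as pd


def normalize_parse_dates(parse_dates: Any) -> Any:
    """Treat an empty list/tuple as 'parse no dates' (False); pass anything else through."""
    if isinstance(parse_dates, (list, tuple)) and len(parse_dates) == 0:
        return False
    return parse_dates


def read_csv_compat(filepath_or_buffer: Any, **kwargs: Any) -> pd.DataFrame:
    """Hypothetical wrapper around pd.read_csv that never forwards parse_dates=[]."""
    if "parse_dates" in kwargs:
        kwargs["parse_dates"] = normalize_parse_dates(kwargs["parse_dates"])
    return pd.read_csv(filepath_or_buffer, **kwargs)


s = """some_header
2020-01-01 01:00:00"""

# Per this report, the same call via pd.read_csv raised TypeError on pandas 1.1.5.
df = read_csv_compat(io.StringIO(s), parse_dates=[], engine="c", delimiter=" ")
print(df)
```

The wrapper only rewrites the empty case, so non-empty lists such as parse_dates=["some_header"] are passed through unchanged.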

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit : b5958ee
python : 3.8.0.final.0
python-bits : 64
OS : Linux
OS-release : 3.10.0-1160.6.1.el7.x86_64
Version : #1 SMP Wed Oct 21 13:44:38 EDT 2020
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.1.5
numpy : 1.19.4
pytz : 2020.4
dateutil : 2.8.1
pip : 20.3.1
setuptools : 50.3.2
Cython : None
pytest : 6.2.0
hypothesis : None
sphinx : 3.3.1
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.19.0
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None



Metadata

Assignees: No one assigned
Labels: Closing Candidate (May be closeable, needs more eyeballs); IO CSV (read_csv, to_csv)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
