BUG: IndexError on read_csv/read_table when using usecols/names parameters and omitting last column #5766

@wrenoud

Example code:

from StringIO import StringIO
import pandas as pd

names = ["a","b","c"]

data = """\
0,1,2
3,4,5
6,7,8"""

# usecols works as expected if all columns are named
print pd.read_csv(StringIO(data), header=None, usecols=[1,2], names=names)
print pd.read_csv(StringIO(data), header=None, usecols=[0,1], names=names)

# naming only columns selected with usecols works when last column is included
print pd.read_csv(StringIO(data), header=None, usecols=[1,2], names=names[1:])
# causes IndexError
print pd.read_csv(StringIO(data), header=None, usecols=[0,1], names=names[:-1])

Output:

   b  c
0  1  2
1  4  5
2  7  8

[3 rows x 2 columns]
   a  b
0  0  1
1  3  4
2  6  7

[3 rows x 2 columns]
   b  c
0  1  2
1  4  5
2  7  8

[3 rows x 2 columns]
Traceback (most recent call last):
  File "pandas_test2.py", line 18, in <module>
    print pd.read_csv(StringIO(data), header=None, usecols=[0,1], names=names[:-1])
  File "/home/weston/pandas/pandas/io/parsers.py", line 404, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/home/weston/pandas/pandas/io/parsers.py", line 212, in _read
    return parser.read()
  File "/home/weston/pandas/pandas/io/parsers.py", line 610, in read
    ret = self._engine.read(nrows)
  File "/home/weston/pandas/pandas/io/parsers.py", line 1050, in read
    data = self._reader.read(nrows)
  File "parser.pyx", line 727, in pandas.parser.TextReader.read (pandas/parser.c:6475)
  File "parser.pyx", line 749, in pandas.parser.TextReader._read_low_memory (pandas/parser.c:6695)
  File "parser.pyx", line 824, in pandas.parser.TextReader._read_rows (pandas/parser.c:7517)
  File "parser.pyx", line 902, in pandas.parser.TextReader._convert_column_data (pandas/parser.c:8296)
  File "parser.pyx", line 1139, in pandas.parser.TextReader._get_column_name (pandas/parser.c:11353)
IndexError: list index out of range
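
Possible workaround, going only by the working cases above: pass a name for every column in the file and let usecols do the selection, rather than trimming names to match usecols. A minimal sketch, written in Python 3 syntax (io.StringIO and the print function), which is an assumption since the report itself runs on Python 2.7:

```python
from io import StringIO

import pandas as pd

names = ["a", "b", "c"]

data = """\
0,1,2
3,4,5
6,7,8"""

# Name all three columns in the file, then select the first two with usecols.
# This mirrors the second working call in the report, not the failing one.
df = pd.read_csv(StringIO(data), header=None, usecols=[0, 1], names=names)
print(df)
```

This sidesteps the failing call rather than fixing it; it assumes the full list of column names is known up front.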

print_versions.py output:

INSTALLED VERSIONS
------------------
Python: 2.7.3.final.0
OS: Linux 3.2.0-51-generic #77-Ubuntu SMP Wed Jul 24 20:21:10 UTC 2013 i686
byteorder: little
LC_ALL: None
LANG: en_CA.UTF-8

pandas: 0.13.0rc1-119-g2485e09
Cython: 0.15.1
Numpy: 1.6.1
Scipy: 0.9.0
statsmodels: Not installed
    patsy: Not installed
scikits.timeseries: Not installed
dateutil: 1.5
pytz: 2011k
bottleneck: Not installed
PyTables: Not Installed
    numexpr: Not Installed
matplotlib: 1.1.1rc
openpyxl: Not installed
xlrd: Not installed
xlwt: Not installed
xlsxwriter: Not installed
sqlalchemy: Not installed
lxml: Not installed
bs4: Not installed
html5lib: Not installed
bigquery: Not installed
apiclient: Not installed

Labels: Bug, Regression (functionality that used to work in a prior pandas version)
