Description
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- (optional) I have confirmed this bug exists on the master branch of pandas.
Note: Please read this guide detailing how to provide the necessary information for us to reproduce your bug.
Code Sample, a copy-pastable example
import tempfile
import textwrap
import pandas
import os

workspace_dir = tempfile.mkdtemp()
csv_file = os.path.join(workspace_dir, 'non-utf8.csv')
# encode with ISO Cyrillic, include a non-ASCII character to achieve UnicodeDecodeError
with open(csv_file, 'w', encoding='iso8859_5') as file_obj:
    file_obj.write(textwrap.dedent(
        """
        header,
        fЮЮ,
        bar
        """
    ).strip())

try:
    dataframe = pandas.read_csv(csv_file, sep=None)
except UnicodeDecodeError as error:
    os.remove(csv_file)
    raise
Problem description
os.remove raises a PermissionError on Windows, apparently because the file handle opened by read_csv is still open when the UnicodeDecodeError propagates. This only happens when the sep=None kwarg is used. Leaving out that kwarg gets the expected output.
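A possible workaround until this is fixed, sketched under assumptions: open the file in the calling code and hand the open handle to the reader, so that the with block closes it even if parsing fails. The helper name read_with_owned_handle and the stand-in reader are hypothetical and only illustrate the pattern; I have not verified this against pandas 1.2.0 on Windows.

```python
import os
import tempfile


def read_with_owned_handle(path, reader, encoding="utf-8"):
    """Open `path` ourselves and pass the open handle to `reader`.

    The `with` block closes the handle even when `reader` raises,
    so the caller can delete the file afterwards.
    (Hypothetical helper, not part of pandas.)
    """
    with open(path, encoding=encoding, newline="") as handle:
        return reader(handle)


def demo():
    # Same setup as the report: ISO Cyrillic bytes that are not valid UTF-8.
    workspace_dir = tempfile.mkdtemp()
    csv_file = os.path.join(workspace_dir, "non-utf8.csv")
    with open(csv_file, "w", encoding="iso8859_5") as file_obj:
        file_obj.write("header,\nfЮЮ,\nbar")

    try:
        # Stand-in reader that simply decodes the whole file. With pandas
        # one would pass something like:
        #   lambda f: pandas.read_csv(f, sep=None, engine="python")
        read_with_owned_handle(csv_file, lambda f: f.read())
    except UnicodeDecodeError:
        # The handle was already closed by the `with` block above.
        os.remove(csv_file)
        os.rmdir(workspace_dir)
        return not os.path.exists(csv_file)
    return False


if __name__ == "__main__":
    print(demo())
```

This relies on read_csv accepting an open file object in place of a path, and on the assumption that pandas leaves a caller-supplied handle for the caller to close.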
Expected Output
Traceback (most recent call last):
  File "..\scratch\pandas_file_handle.py", line 19, in <module>
    dataframe = pandas.read_csv(csv_file, sep=None)
  File "C:\Users\dmf\projects\invest\env\lib\site-packages\pandas\io\parsers.py", line 605, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "C:\Users\dmf\projects\invest\env\lib\site-packages\pandas\io\parsers.py", line 457, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "C:\Users\dmf\projects\invest\env\lib\site-packages\pandas\io\parsers.py", line 814, in __init__
    self._engine = self._make_engine(self.engine)
  File "C:\Users\dmf\projects\invest\env\lib\site-packages\pandas\io\parsers.py", line 1045, in _make_engine
    return mapping[engine](self.f, **self.options)  # type: ignore[call-arg]
  File "C:\Users\dmf\projects\invest\env\lib\site-packages\pandas\io\parsers.py", line 2291, in __init__
    self._make_reader(self.handles.handle)
  File "C:\Users\dmf\projects\invest\env\lib\site-packages\pandas\io\parsers.py", line 2412, in _make_reader
    line = f.readline()
  File "C:\Users\dmf\projects\invest\env\lib\codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xce in position 10: invalid continuation byte
Output of pd.show_versions()
INSTALLED VERSIONS
commit : 3e89b4c
python : 3.7.9.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19041
machine : AMD64
processor : AMD64 Family 23 Model 1 Stepping 1, AuthenticAMD
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : None.None
pandas : 1.2.0
numpy : 1.19.2
pytz : 2020.5
dateutil : 2.8.1
pip : 20.2.4
setuptools : 49.6.0.post20201009
Cython : 0.29.21
pytest : 6.1.1
hypothesis : None
sphinx : 3.2.1
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
numba : None