Closed
Labels
IO CSV (read_csv, to_csv), Output-Formatting (__repr__ of pandas objects, to_string)
Description
If I have a DataFrame whose cells contain lists of strings (or unicode strings), the lists are written incorrectly when I call to_csv() with the encoding parameter set. The error does not occur if encoding is not set.
Here is an example (using pandas version 0.16.2):
import pandas as pd

df = pd.DataFrame.from_records(
    [('Mary S.', ['Detroit, MI', 'New York, NY']),
     ('John U.', [u'Atlanta, GA', u'Paris, France'])],
    columns=['name', 'residences'])
df.to_csv('ascii.csv')
df.to_csv('utf8.csv', encoding='utf-8')
The ASCII-encoded CSV file is fine (contents of 'ascii.csv' below):
,name,residences
0,Mary S.,"['Detroit, MI', 'New York, NY']"
1,John U.,"[u'Atlanta, GA', u'Paris, France']"
But in the UTF-8 CSV file, the strings within the lists are not quoted (contents of 'utf8.csv' below):
,name,residences
0,Mary S.,"[Detroit, MI, New York, NY]"
1,John U.,"[Atlanta, GA, Paris, France]"
This makes the data impossible to recover. For example, if I load this file using read_csv(), the relevant cells are treated as strings and cannot be accurately recast as lists.
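To make the recovery problem concrete, here is a minimal sketch that tries to turn the cell text from each file back into a list. It uses ast.literal_eval from the standard library as one possible way to recast the strings; the helper name try_parse_list is hypothetical and only for illustration.

```python
import ast

# Cell values copied from the two CSV files shown above.
ascii_cell = "['Detroit, MI', 'New York, NY']"
utf8_cell = "[Detroit, MI, New York, NY]"


def try_parse_list(cell):
    """Return the parsed list, or None if the cell is not a valid Python literal."""
    try:
        return ast.literal_eval(cell)
    except (ValueError, SyntaxError):
        # Unquoted elements are not literals, so parsing fails.
        return None


recovered_ascii = try_parse_list(ascii_cell)
recovered_utf8 = try_parse_list(utf8_cell)
```

With the quoted form the original two-element list comes back. With the unquoted form parsing fails, and splitting on commas would not help either, because the commas inside each residence are indistinguishable from the separators between residences.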
The behavior is the same using encoding='utf-16', but I didn't check any other encodings.
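Until the behavior is fixed, one possible workaround is to serialize each list to a JSON string before writing, so the output no longer depends on how the writer formats Python lists. The sketch below is an assumption-laden illustration using only the standard-library csv and json modules rather than pandas; presumably the same idea would carry over to a DataFrame by converting the column with json.dumps before calling to_csv(), but that has not been verified against 0.16.2.

```python
import csv
import io
import json

# Same records as in the example above.
rows = [
    ("Mary S.", ["Detroit, MI", "New York, NY"]),
    ("John U.", ["Atlanta, GA", "Paris, France"]),
]

# Write: each list becomes an explicit JSON string in its cell.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["name", "residences"])
for name, residences in rows:
    writer.writerow([name, json.dumps(residences)])

# Read back: json.loads restores the original lists.
buffer.seek(0)
reader = csv.DictReader(buffer)
restored = [(row["name"], json.loads(row["residences"])) for row in reader]
```

Because the cell content is produced by json.dumps rather than by the list's string representation, the round trip does not rely on quoting inside the list surviving the export.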