Description
Code Sample, a copy-pastable example if possible
# requires registration-only downloads from https://oai.epi-ucsf.org/datarelease/DataClinical.asp
import pandas

df1 = pandas.read_sas('allclinical00.sas7bdat')
df2 = pandas.read_sas('AllClinical00.xpt')
Problem description
I downloaded SAS datasets from the Osteoarthritis Initiative (OAI) and tried loading them with Pandas. I do not have SAS myself, and I don't have prior experience with it. The OAI data offers different formats for download, as you can see from the above filenames.
Expected Output
I would expect large datasets (roughly 4800 rows with several hundred columns) and, as far as I understand, the same data from the .sas7bdat and .xpt files. In fact, out of 12 .sas7bdat files, I can open only 3; all others fail with:
('Warning: column count mismatch (%d + %d != %d)\n', 143, 191, 1226)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-3-9a49a9b73cd0> in <module>()
----> 1 df = pandas.read_sas('allclinical00.sas7bdat')
/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/io/sas/sasreader.pyc in read_sas(filepath_or_buffer, format, index, encoding, chunksize, iterator)
59 return reader
60
---> 61 data = reader.read()
62 reader.close()
63 return data
/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/io/sas/sas7bdat.pyc in read(self, nrows)
602 nrows = m
603
--> 604 nd = (self.column_types == b'd').sum()
605 ns = (self.column_types == b's').sum()
606
AttributeError: 'bool' object has no attribute 'sum'
Note the (improperly formatted) warning at the top. self.column_types turns out to be an empty list ([]).
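The AttributeError follows from how Python compares a plain list with a bytes object. The sketch below reproduces that mechanism in isolation; it assumes only what is reported above (that column_types is an empty list) and does not use the pandas reader itself. The variable names are illustrative.

```python
import numpy as np

# As reported above: column_types was an empty plain list, not an array.
column_types = []

# Comparing a list with bytes is an ordinary Python comparison,
# so the result is a single bool rather than an elementwise mask.
list_mask = column_types == b'd'
print(type(list_mask).__name__)

try:
    list_mask.sum()
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)

# With a NumPy array the comparison is elementwise, so .sum() counts matches.
array_types = np.asarray(column_types, dtype='S1')
numeric_count = int((array_types == b'd').sum())
print(numeric_count)

# The same expression on a non-empty array of column type codes.
sample_types = np.array([b'd', b's', b'd'])
sample_count = int((sample_types == b'd').sum())
print(sample_count)
```

So the crash appears to be a symptom: the column metadata was never populated, which would be consistent with the column count mismatch warning printed before the traceback.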
For df2, I also tried read_sas(), but got
ValueError: Header record is not an XPORT file.
Probably, this is related to the following part of the OAI documentation:
Using SAS Transport Files
The SAS dataset(s) in this zip file were created using SAS CPORT and the SAS V9 engine in the Windows environment. We strongly recommend that you use SAS V9 or higher to access the OAI data. …
PROC CPORT creates files in transport format, which uses an environment-independent standard for character encoding and numeric representation. Transport files that are created by PROC CPORT can be transferred across operating environments and read using PROC CIMPORT.
Note: SAS transport files that are created using PROC CPORT are not interchangeable with transport files that are created using the XPORT engine.
I think if I can load the .sas7bdat files, it would be OK if pandas cannot read CPORT files. It would be helpful, though, if pandas recognized them and raised a more specific error (for example, "This seems to be a CPORT file. CPORT files are currently not supported, only XPORT and SAS7BDAT.").
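A rough sketch of what such a check could look like, written as a standalone helper rather than as part of pandas. The function names are hypothetical. The XPORT prefix is the start of the standard library header record; the CPORT marker is an assumption based on CPORT files commonly beginning with a repeated "**COMPRESSED**" string, and has not been verified against the OAI files.

```python
# First bytes of the 80-byte library header record of an XPORT file.
XPORT_PREFIX = b"HEADER RECORD*******LIBRARY HEADER RECORD!!!!!!!"
# Assumption: PROC CPORT output starts with this repeated marker.
CPORT_MARKER = b"**COMPRESSED**"


def classify_transport_header(first_record):
    """Guess the transport format from the first record (hypothetical helper)."""
    if first_record.startswith(XPORT_PREFIX):
        return "xport"
    if CPORT_MARKER in first_record:
        return "cport"
    return "unknown"


def check_transport_file(path):
    """Read the first 80 bytes of a file and raise a specific error for CPORT."""
    with open(path, "rb") as handle:
        kind = classify_transport_header(handle.read(80))
    if kind == "cport":
        raise ValueError(
            "This seems to be a CPORT file. CPORT files are currently "
            "not supported, only XPORT and SAS7BDAT."
        )
    return kind
```

Calling check_transport_file('AllClinical00.xpt') before read_sas() would then distinguish a CPORT file from one that is simply corrupt, provided the marker assumption holds.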
Output of pd.show_versions()
pandas: 0.19.2
nose: 1.3.7
pip: 9.0.1
setuptools: 34.3.1
Cython: 0.25.2
numpy: 1.12.0
scipy: 0.19.0
statsmodels: 0.8.0
xarray: None
IPython: 5.3.0
sphinx: 1.5.3
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: 1.2.0
tables: 3.2.3.1
numexpr: 2.6.2
matplotlib: 2.0.0
openpyxl: 2.4.5
xlrd: 1.0.0
xlwt: None
xlsxwriter: None
lxml: 3.6.0
bs4: 4.5.3
html5lib: 1.0b10
httplib2: None
apiclient: None
sqlalchemy: 1.1.6
pymysql: None
psycopg2: None
jinja2: 2.9.5
boto: None
pandas_datareader: None