Commit 9774163

ENH: feather support in the pandas IO api
closes #13092
1 parent d98e982 commit 9774163

16 files changed, with 332 additions and 2 deletions

appveyor.yml

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -81,6 +81,7 @@ install:
8181

8282
# add the pandas channel *before* defaults to have defaults take priority
8383
- cmd: conda config --add channels pandas
84+
- cmd: conda config --add channels conda-forge
8485
- cmd: conda config --remove channels defaults
8586
- cmd: conda config --add channels defaults
8687
- cmd: conda install anaconda-client

ci/install_travis.sh

Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -74,8 +74,11 @@ else
7474
conda config --set always_yes true --set changeps1 false || exit 1
7575
conda update -q conda
7676

77-
# add the pandas channel *before* defaults to have defaults take priority
77+
# add the pandas channel to take priority
78+
# add the conda-forge channel *before* defaults
79+
# to add extra packages
7880
echo "add channels"
81+
conda config --add channels conda-forge || exit 1
7982
conda config --add channels pandas || exit 1
8083
conda config --remove channels defaults || exit 1
8184
conda config --add channels defaults || exit 1

ci/requirements-2.7-64.run

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -9,6 +9,7 @@ openpyxl
99
xlrd
1010
sqlalchemy
1111
lxml=3.2.1
12+
feather-format
1213
scipy
1314
xlsxwriter
1415
boto

ci/requirements-2.7.run

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -9,6 +9,7 @@ openpyxl=1.6.2
99
xlrd=0.9.2
1010
sqlalchemy=0.9.6
1111
lxml=3.2.1
12+
feather-format
1213
scipy
1314
xlsxwriter=0.4.6
1415
boto=2.36.0

ci/requirements-3.5.run

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -9,6 +9,7 @@ scipy
99
numexpr
1010
pytables
1111
html5lib
12+
feather-format
1213
lxml
1314
matplotlib
1415
jinja2

ci/requirements-3.5_OSX.run

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -5,6 +5,7 @@ xlsxwriter
55
xlrd
66
xlwt
77
numexpr
8+
feather-format
89
pytables
910
html5lib
1011
lxml

doc/source/api.rst

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -82,6 +82,14 @@ HDFStore: PyTables (HDF5)
8282
HDFStore.get
8383
HDFStore.select
8484

85+
Feather
86+
~~~~~~~
87+
88+
.. autosummary::
89+
:toctree: generated/
90+
91+
read_feather
92+
8593
SAS
8694
~~~
8795

doc/source/install.rst

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -247,6 +247,7 @@ Optional Dependencies
247247
* `SciPy <http://www.scipy.org>`__: miscellaneous statistical functions
248248
* `xarray <http://xarray.pydata.org>`__: pandas like handling for > 2 dims, needed for converting Panels to xarray objects. Version 0.7.0 or higher is recommended.
249249
* `PyTables <http://www.pytables.org>`__: necessary for HDF5-based storage. Version 3.0.0 or higher required, Version 3.2.1 or higher highly recommended.
250+
* `Feather Format <https://github.com/wesm/feather>`__: necessary for feather-based storage.
250251
* `SQLAlchemy <http://www.sqlalchemy.org>`__: for SQL database support. Version 0.8.1 or higher recommended. Besides SQLAlchemy, you also need a database specific driver. You can find an overview of supported drivers for each SQL dialect in the `SQLAlchemy docs <http://docs.sqlalchemy.org/en/latest/dialects/index.html>`__. Some common drivers are:
251252

252253
- `psycopg2 <http://initd.org/psycopg/>`__: for PostgreSQL

doc/source/io.rst

Lines changed: 59 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -34,6 +34,7 @@ object.
3434
* :ref:`read_csv<io.read_csv_table>`
3535
* :ref:`read_excel<io.excel_reader>`
3636
* :ref:`read_hdf<io.hdf5>`
37+
* :ref:`read_feather<io.feather>`
3738
* :ref:`read_sql<io.sql>`
3839
* :ref:`read_json<io.json_reader>`
3940
* :ref:`read_msgpack<io.msgpack>` (experimental)
@@ -49,6 +50,7 @@ The corresponding ``writer`` functions are object methods that are accessed like
4950
* :ref:`to_csv<io.store_in_csv>`
5051
* :ref:`to_excel<io.excel_writer>`
5152
* :ref:`to_hdf<io.hdf5>`
53+
* :ref:`to_feather<io.feather>`
5254
* :ref:`to_sql<io.sql>`
5355
* :ref:`to_json<io.json_writer>`
5456
* :ref:`to_msgpack<io.msgpack>` (experimental)
@@ -4089,6 +4091,63 @@ object). This cannot be changed after table creation.
40894091
os.remove('store.h5')
40904092
40914093
4094+
.. _io.feather:
4095+
4096+
Feather
4097+
-------
4098+
4099+
.. versionadded:: 0.19.1
4100+
4101+
Feather provides binary columnar serialization for data frames. It is designed to make reading and writing data
4102+
frames efficient, and to make sharing data across data analysis languages easy.
4103+
4104+
Feather is designed to faithfully serialize and de-serialize DataFrames, supporting all of the pandas
4105+
dtypes, including extension dtypes such as categorical and datetime with tz.
4106+
4107+
Several caveats apply:
4108+
4109+
- This is a newer library, and the format, though stable, is not guaranteed to be backward compatible
4110+
with earlier versions.
4111+
- The format will NOT write an ``Index`` or ``MultiIndex`` for the ``DataFrame`` and will raise an
4112+
error if a non-default one is provided. Call ``.reset_index()`` first to store the index as regular columns.
4113+
- Unsupported types include ``Period`` and actual Python object types. Attempting to serialize them
4114+
raises a helpful error message.
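
The index caveat above can be sketched as follows. This is a minimal illustration, not part of the commit: the frame, the ``key`` index name and the column name are assumptions made up for the example, and the Feather calls are left commented out because they need the optional ``feather-format`` dependency.

```python
import pandas as pd

# Hypothetical frame with a named, non-default index.
df = pd.DataFrame({'value': [1.0, 2.0, 3.0]},
                  index=pd.Index(['x', 'y', 'z'], name='key'))

# Move the index into an ordinary column; the result has a default RangeIndex,
# which is the only kind of index the writer accepts.
flat = df.reset_index()

# With feather-format installed, the round trip would look like this:
# flat.to_feather('example.fth')
# restored = pd.read_feather('example.fth').set_index('key')

# The same restoring step, shown here without touching disk.
restored = flat.set_index('key')
```

Restoring the index with ``set_index`` after reading is a convention the caller has to follow; nothing in the file records which column used to be the index.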
4115+
4116+
See the `full documentation <https://github.com/wesm/feather>`__.
4117+
4118+
.. ipython:: python
4119+
4120+
df = pd.DataFrame({'a': list('abc'),
4121+
'b': list(range(1, 4)),
4122+
'c': np.arange(3, 6).astype('u1'),
4123+
'd': np.arange(4.0, 7.0, dtype='float64'),
4124+
'e': [True, False, True],
4125+
'f': pd.Categorical(list('abc')),
4126+
'g': pd.date_range('20130101', periods=3),
4127+
'h': pd.date_range('20130101', periods=3, tz='US/Eastern'),
4128+
'i': pd.date_range('20130101', periods=3, freq='ns')})
4129+
4130+
df
4131+
df.dtypes
4132+
4133+
Write to a feather file.
4134+
4135+
.. ipython:: python
4136+
4137+
df.to_feather('example.fth')
4138+
4139+
Read from a feather file.
4140+
4141+
.. ipython:: python
4142+
4143+
pd.read_feather('example.fth')
4144+
4145+
.. ipython:: python
4146+
:suppress:
4147+
4148+
import os
4149+
os.remove('example.fth')
4150+
40924151
.. _io.sql:
40934152
40944153
SQL Queries

doc/source/whatsnew/v0.19.1.txt

Lines changed: 9 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -15,6 +15,15 @@ Highlights include:
1515
:backlinks: none
1616

1717

18+
.. _whatsnew_0191.new_features:
19+
20+
New features
21+
~~~~~~~~~~~~
22+
23+
- Integration with ``feather-format``, including a new top-level ``pd.read_feather()`` function and a ``DataFrame.to_feather()`` method; see :ref:`here <io.feather>`.
24+
25+
26+
1827
.. _whatsnew_0191.performance:
1928

2029
Performance Improvements
