diff --git a/doc/source/io.rst b/doc/source/io.rst
index 48fe6e24dda9f..363a82ccbcf69 100644
--- a/doc/source/io.rst
+++ b/doc/source/io.rst
@@ -456,12 +456,22 @@ data columns:
                     index_col=0) #index is the nominal column
    df
 
-**Note**: When passing a dict as the `parse_dates` argument, the order of
-the columns prepended is not guaranteed, because `dict` objects do not impose
-an ordering on their keys. On Python 2.7+ you may use `collections.OrderedDict`
-instead of a regular `dict` if this matters to you. Because of this, when using a
-dict for 'parse_dates' in conjunction with the `index_col` argument, it's best to
-specify `index_col` as a column label rather then as an index on the resulting frame.
+.. note::
+   read_csv has a fast_path for parsing datetime strings in iso8601 format,
+   e.g. "2000-01-01T00:01:02+00:00" and similar variations. If you can arrange
+   for your data to store datetimes in this format, load times will be
+   significantly faster; ~20x has been observed.
+
+
+.. note::
+
+   When passing a dict as the `parse_dates` argument, the order of
+   the columns prepended is not guaranteed, because `dict` objects do not impose
+   an ordering on their keys. On Python 2.7+ you may use `collections.OrderedDict`
+   instead of a regular `dict` if this matters to you. Because of this, when using a
+   dict for 'parse_dates' in conjunction with the `index_col` argument, it's best to
+   specify `index_col` as a column label rather than as an index on the resulting frame.
+
 
 Date Parsing Functions
 ~~~~~~~~~~~~~~~~~~~~~~
diff --git a/doc/source/release.rst b/doc/source/release.rst
index 99b8bfc460068..1809ec5cccdba 100644
--- a/doc/source/release.rst
+++ b/doc/source/release.rst
@@ -1595,6 +1595,9 @@ Improvements to existing features
   - Add methods ``neg`` and ``inv`` to Series
   - Implement ``kind`` option in ``ExcelFile`` to indicate whether it's an XLS
     or XLSX file (:issue:`2613`)
+  - Documented a fast-path in pd.read_csv when parsing iso8601 datetime strings
+    yielding as much as a 20x speedup. (:issue:`5993`)
+
 
 Bug Fixes
 ~~~~~~~~~
diff --git a/pandas/io/parsers.py b/pandas/io/parsers.py
index 813b7e59e107a..52d4e6bbac50a 100644
--- a/pandas/io/parsers.py
+++ b/pandas/io/parsers.py
@@ -87,6 +87,7 @@
     If [1, 2, 3] -> try parsing columns 1, 2, 3 each as a separate date column.
     If [[1, 3]] -> combine columns 1 and 3 and parse as a single date column.
     {'foo' : [1, 3]} -> parse columns 1, 3 as date and call result 'foo'
+    A fast-path exists for iso8601-formatted dates.
 keep_date_col : boolean, default False
     If True and parse_dates specifies combining multiple columns then
     keep the original columns.
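
The io.rst note in this patch recommends storing datetimes in iso8601 format so that read_csv can take its fast path. As a rough illustration of the producing side only, here is a minimal sketch that writes such a file using the standard library. The helper name `write_iso8601_csv` and the `timestamp`/`value` column names are hypothetical and not part of the patch.

```python
import csv
import io
from datetime import datetime, timedelta, timezone


def write_iso8601_csv(rows, out):
    """Write (timestamp, value) rows, formatting timestamps as ISO 8601.

    Hypothetical helper for illustration; not part of pandas.
    """
    writer = csv.writer(out)
    writer.writerow(["timestamp", "value"])
    for ts, value in rows:
        # isoformat() on a timezone-aware datetime produces strings such as
        # 2000-01-01T00:01:02+00:00, the shape quoted in the io.rst note.
        writer.writerow([ts.isoformat(), value])


# Three hourly, timezone-aware sample timestamps (assumed data).
start = datetime(2000, 1, 1, 0, 1, 2, tzinfo=timezone.utc)
rows = [(start + timedelta(hours=i), i) for i in range(3)]

buf = io.StringIO()
write_iso8601_csv(rows, buf)
csv_text = buf.getvalue()
print(csv_text)
```

A file written this way could then be loaded with something like `pd.read_csv(path, parse_dates=["timestamp"])`. Whether the fast path applies, and how large the speedup is, depends on the pandas version and the data, so the ~20x figure should be read as an observation rather than a guarantee.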