Commit 199cc43

Merge branch '33141-pandas-cut' of github.com:mabelvj/pandas into 33141-pandas-cut
2 parents 777e13e + 4599d46 commit 199cc43

145 files changed: +2692 -1128 lines changed

README.md

Lines changed: 1 addition & 0 deletions
@@ -7,6 +7,7 @@
 # pandas: powerful Python data analysis toolkit
 [![PyPI Latest Release](https://img.shields.io/pypi/v/pandas.svg)](https://pypi.org/project/pandas/)
 [![Conda Latest Release](https://anaconda.org/conda-forge/pandas/badges/version.svg)](https://anaconda.org/anaconda/pandas/)
+[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.3509134.svg)](https://doi.org/10.5281/zenodo.3509134)
 [![Package Status](https://img.shields.io/pypi/status/pandas.svg)](https://pypi.org/project/pandas/)
 [![License](https://img.shields.io/pypi/l/pandas.svg)](https://github.com/pandas-dev/pandas/blob/master/LICENSE)
 [![Travis Build Status](https://travis-ci.org/pandas-dev/pandas.svg?branch=master)](https://travis-ci.org/pandas-dev/pandas)

asv_bench/asv.conf.json

Lines changed: 1 addition & 1 deletion
@@ -39,7 +39,7 @@
     // followed by the pip installed packages).
     "matrix": {
         "numpy": [],
-        "Cython": [],
+        "Cython": ["0.29.16"],
         "matplotlib": [],
         "sqlalchemy": [],
         "scipy": [],

asv_bench/benchmarks/array.py

Lines changed: 18 additions & 0 deletions
@@ -9,6 +9,11 @@ def setup(self):
         self.values_float = np.array([1.0, 0.0, 1.0, 0.0])
         self.values_integer = np.array([1, 0, 1, 0])
         self.values_integer_like = [1, 0, 1, 0]
+        self.data = np.array([True, False, True, False])
+        self.mask = np.array([False, False, True, False])
+
+    def time_constructor(self):
+        pd.arrays.BooleanArray(self.data, self.mask)
 
     def time_from_bool_array(self):
         pd.array(self.values_bool, dtype="boolean")
@@ -21,3 +26,16 @@ def time_from_integer_like(self):
 
     def time_from_float_array(self):
         pd.array(self.values_float, dtype="boolean")
+
+
+class IntegerArray:
+    def setup(self):
+        self.values_integer = np.array([1, 0, 1, 0])
+        self.data = np.array([1, 2, 3, 4], dtype="int64")
+        self.mask = np.array([False, False, True, False])
+
+    def time_constructor(self):
+        pd.arrays.IntegerArray(self.data, self.mask)
+
+    def time_from_integer_array(self):
+        pd.array(self.values_integer, dtype="Int64")
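
Note on the benchmarks above: the new `time_constructor` methods time the direct `(data, mask)` constructors, while the existing methods go through the `pd.array` factory. The following is a minimal sketch of the two construction paths, not part of the commit. It assumes a pandas version that exposes `pd.arrays.BooleanArray`, `pd.arrays.IntegerArray`, and `pd.NA`; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Direct construction: a values array plus a boolean mask,
# where True in the mask marks a missing entry.
mask = np.array([False, False, True, False])
bool_arr = pd.arrays.BooleanArray(np.array([True, False, True, False]), mask)
int_arr = pd.arrays.IntegerArray(np.array([1, 2, 3, 4], dtype="int64"), mask)

# Factory construction: pd.array infers the mask from missing values.
bool_factory = pd.array([True, False, None, False], dtype="boolean")
int_factory = pd.array([1, 2, None, 4], dtype="Int64")

print(bool_arr)
print(int_arr)
```

In both cases the third element is masked, so it reads back as `pd.NA` regardless of the value stored underneath the mask.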

ci/code_checks.sh

Lines changed: 4 additions & 4 deletions
@@ -292,10 +292,6 @@ if [[ -z "$CHECK" || "$CHECK" == "doctests" ]]; then
     pytest -q --doctest-modules pandas/core/generic.py
     RET=$(($RET + $?)) ; echo $MSG "DONE"
 
-    MSG='Doctests groupby.py' ; echo $MSG
-    pytest -q --doctest-modules pandas/core/groupby/groupby.py -k"-cumcount -describe -pipe"
-    RET=$(($RET + $?)) ; echo $MSG "DONE"
-
     MSG='Doctests series.py' ; echo $MSG
     pytest -q --doctest-modules pandas/core/series.py
     RET=$(($RET + $?)) ; echo $MSG "DONE"
@@ -318,6 +314,10 @@ if [[ -z "$CHECK" || "$CHECK" == "doctests" ]]; then
     pytest -q --doctest-modules pandas/core/dtypes/
     RET=$(($RET + $?)) ; echo $MSG "DONE"
 
+    MSG='Doctests groupby' ; echo $MSG
+    pytest -q --doctest-modules pandas/core/groupby/
+    RET=$(($RET + $?)) ; echo $MSG "DONE"
+
     MSG='Doctests indexes' ; echo $MSG
     pytest -q --doctest-modules pandas/core/indexes/
     RET=$(($RET + $?)) ; echo $MSG "DONE"

ci/deps/azure-36-minimum_versions.yaml

Lines changed: 1 addition & 1 deletion
@@ -22,7 +22,7 @@ dependencies:
   - numpy=1.13.3
   - openpyxl=2.5.7
   - pytables=3.4.2
-  - python-dateutil=2.6.1
+  - python-dateutil=2.7.3
   - pytz=2017.2
   - scipy=0.19.0
   - xlrd=1.1.0

ci/deps/azure-37-numpydev.yaml

Lines changed: 2 additions & 1 deletion
@@ -14,7 +14,8 @@ dependencies:
   - pytz
   - pip
   - pip:
-    - cython>=0.29.16
+    - cython==0.29.16
+    # GH#33507 cython 3.0a1 is causing TypeErrors 2020-04-13
     - "git+git://github.com/dateutil/dateutil.git"
     - "-f https://7933911d6844c6c53a7d-47bd50c35cd79bd838daf386af554a83.ssl.cf2.rackcdn.com"
     - "--pre"

ci/deps/azure-macos-36.yaml

Lines changed: 1 addition & 1 deletion
@@ -23,7 +23,7 @@ dependencies:
   - openpyxl
   - pyarrow>=0.13.0
   - pytables
-  - python-dateutil==2.6.1
+  - python-dateutil==2.7.3
   - pytz
   - xarray
   - xlrd

doc/source/getting_started/install.rst

Lines changed: 1 addition & 1 deletion
@@ -221,7 +221,7 @@ Package Minimum support
 ================================================================ ==========================
 `setuptools <https://setuptools.readthedocs.io/en/latest/>`__    24.2.0
 `NumPy <https://www.numpy.org>`__                                1.13.3
-`python-dateutil <https://dateutil.readthedocs.io/en/stable/>`__ 2.6.1
+`python-dateutil <https://dateutil.readthedocs.io/en/stable/>`__ 2.7.3
 `pytz <https://pypi.org/project/pytz/>`__                        2017.2
 ================================================================ ==========================
 

doc/source/getting_started/intro_tutorials/03_subset_data.rst

Lines changed: 5 additions & 5 deletions
@@ -23,7 +23,7 @@
 <div class="card-body">
 <p class="card-text">
 
-This tutorial uses the titanic data set, stored as CSV. The data
+This tutorial uses the Titanic data set, stored as CSV. The data
 consists of the following data columns:
 
 - PassengerId: Id of every passenger.
@@ -72,7 +72,7 @@ How do I select specific columns from a ``DataFrame``?
 <ul class="task-bullet">
 <li>
 
-I’m interested in the age of the titanic passengers.
+I’m interested in the age of the Titanic passengers.
 
 .. ipython:: python
 
@@ -111,7 +111,7 @@ the number of rows is returned.
 <ul class="task-bullet">
 <li>
 
-I’m interested in the age and sex of the titanic passengers.
+I’m interested in the age and sex of the Titanic passengers.
 
 .. ipython:: python
 
@@ -198,7 +198,7 @@ can be used to filter the ``DataFrame`` by putting it in between the
 selection brackets ``[]``. Only rows for which the value is ``True``
 will be selected.
 
-We now from before that the original titanic ``DataFrame`` consists of
+We know from before that the original Titanic ``DataFrame`` consists of
 891 rows. Let’s have a look at the amount of rows which satisfy the
 condition by checking the ``shape`` attribute of the resulting
 ``DataFrame`` ``above_35``:
@@ -212,7 +212,7 @@ condition by checking the ``shape`` attribute of the resulting
 <ul class="task-bullet">
 <li>
 
-I’m interested in the titanic passengers from cabin class 2 and 3.
+I’m interested in the Titanic passengers from cabin class 2 and 3.
 
 .. ipython:: python
 

doc/source/user_guide/io.rst

Lines changed: 21 additions & 2 deletions
@@ -285,14 +285,18 @@ chunksize : int, default ``None``
 Quoting, compression, and file format
 +++++++++++++++++++++++++++++++++++++
 
-compression : {``'infer'``, ``'gzip'``, ``'bz2'``, ``'zip'``, ``'xz'``, ``None``}, default ``'infer'``
+compression : {``'infer'``, ``'gzip'``, ``'bz2'``, ``'zip'``, ``'xz'``, ``None``, ``dict``}, default ``'infer'``
   For on-the-fly decompression of on-disk data. If 'infer', then use gzip,
   bz2, zip, or xz if filepath_or_buffer is a string ending in '.gz', '.bz2',
   '.zip', or '.xz', respectively, and no decompression otherwise. If using 'zip',
   the ZIP file must contain only one data file to be read in.
-  Set to ``None`` for no decompression.
+  Set to ``None`` for no decompression. Can also be a dict with key ``'method'``
+  set to one of {``'zip'``, ``'gzip'``, ``'bz2'``}, and other keys set to
+  compression settings. As an example, the following could be passed for
+  faster compression: ``compression={'method': 'gzip', 'compresslevel': 1}``.
 
   .. versionchanged:: 0.24.0 'infer' option added and set to default.
+  .. versionchanged:: 1.1.0 dict option extended to support ``gzip`` and ``bz2``.
 thousands : str, default ``None``
   Thousands separator.
 decimal : str, default ``'.'``
@@ -3347,6 +3351,12 @@ The compression type can be an explicit parameter or be inferred from the file e
 If 'infer', then use ``gzip``, ``bz2``, ``zip``, or ``xz`` if filename ends in ``'.gz'``, ``'.bz2'``, ``'.zip'``, or
 ``'.xz'``, respectively.
 
+The compression parameter can also be a ``dict`` in order to pass options to the
+compression protocol. It must have a ``'method'`` key set to the name
+of the compression protocol, which must be one of
+{``'zip'``, ``'gzip'``, ``'bz2'``}. All other key-value pairs are passed to
+the underlying compression library.
+
 .. ipython:: python
 
    df = pd.DataFrame({
@@ -3383,6 +3393,15 @@ The default is to 'infer':
    rt = pd.read_pickle("s1.pkl.bz2")
    rt
 
+Passing options to the compression protocol in order to speed up compression:
+
+.. ipython:: python
+
+   df.to_pickle(
+       "data.pkl.gz",
+       compression={"method": "gzip", 'compresslevel': 1}
+   )
+
 .. ipython:: python
    :suppress:
 
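
Note on the `io.rst` change above: the documented dict form can be exercised with a short round trip. This is a minimal sketch, not part of the commit. It assumes pandas 1.1.0 or later (the version named in the `versionchanged` note), and the frame contents, temporary directory, and variable names are made up for illustration.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample frame; any picklable DataFrame would do.
df = pd.DataFrame({"A": range(1000), "B": ["foo", "bar"] * 500})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.pkl.gz")

    # 'method' selects the protocol; the remaining keys are forwarded
    # to the underlying compression library (here, compresslevel for gzip).
    df.to_pickle(path, compression={"method": "gzip", "compresslevel": 1})

    # On read, the default compression='infer' picks gzip from the ".gz" suffix.
    roundtrip = pd.read_pickle(path)

print(roundtrip.equals(df))
```

A lower `compresslevel` trades file size for speed, which is why the documentation suggests it for faster compression; reading the file back does not require repeating the options.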
