Commit df87e14

BUG: Slice Arrow buffer before passing it to numpy (#40896)
Merge branch 'master' into issue-40896
2 parents ff85a80 + 3513f59 commit df87e14

95 files changed: +1797 -540 lines changed

.pre-commit-config.yaml

Lines changed: 2 additions & 2 deletions
@@ -35,7 +35,7 @@ repos:
     exclude: ^pandas/_libs/src/(klib|headers)/
     args: [--quiet, '--extensions=c,h', '--headers=h', --recursive, '--filter=-readability/casting,-runtime/int,-build/include_subdir']
 - repo: https://gitlab.com/pycqa/flake8
-  rev: 3.9.0
+  rev: 3.9.1
   hooks:
   - id: flake8
     additional_dependencies:
@@ -75,7 +75,7 @@ repos:
   hooks:
   - id: yesqa
     additional_dependencies:
-    - flake8==3.9.0
+    - flake8==3.9.1
     - flake8-comprehensions==3.1.0
     - flake8-bugbear==21.3.2
     - pandas-dev-flaker==0.2.0

LICENSE

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@ BSD 3-Clause License
 Copyright (c) 2008-2011, AQR Capital Management, LLC, Lambda Foundry, Inc. and PyData Development Team
 All rights reserved.

-Copyright (c) 2011-2020, Open source contributors.
+Copyright (c) 2011-2021, Open source contributors.

 Redistribution and use in source and binary forms, with or without
 modification, are permitted provided that the following conditions are met:

asv_bench/benchmarks/groupby.py

Lines changed: 28 additions & 0 deletions
@@ -505,6 +505,34 @@ def time_frame_agg(self, dtype, method):
         self.df.groupby("key").agg(method)


+class CumminMax:
+    param_names = ["dtype", "method"]
+    params = [
+        ["float64", "int64", "Float64", "Int64"],
+        ["cummin", "cummax"],
+    ]
+
+    def setup(self, dtype, method):
+        N = 500_000
+        vals = np.random.randint(-10, 10, (N, 5))
+        null_vals = vals.astype(float, copy=True)
+        null_vals[::2, :] = np.nan
+        null_vals[::3, :] = np.nan
+        df = DataFrame(vals, columns=list("abcde"), dtype=dtype)
+        null_df = DataFrame(null_vals, columns=list("abcde"), dtype=dtype)
+        keys = np.random.randint(0, 100, size=N)
+        df["key"] = keys
+        null_df["key"] = keys
+        self.df = df
+        self.null_df = null_df
+
+    def time_frame_transform(self, dtype, method):
+        self.df.groupby("key").transform(method)
+
+    def time_frame_transform_many_nulls(self, dtype, method):
+        self.null_df.groupby("key").transform(method)
+
+
 class RankWithTies:
     # GH 21237
     param_names = ["dtype", "tie_method"]
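
For orientation, the operation this benchmark times can be sketched on a toy frame. This is a minimal illustration and not part of the commit: the six-row data and the helper name `grouped_cumulative` are assumptions made for the example, while the dtypes, the method names, and the `groupby("key").transform(method)` call are taken from the diff.

```python
import pandas as pd

# Toy stand-in for the benchmark's 500_000-row frame (values are illustrative).
df = pd.DataFrame(
    {
        "key": [0, 1, 0, 1, 0, 1],
        "a": [3, 5, 1, 7, 2, 4],
    }
)


def grouped_cumulative(frame, method, dtype):
    """Apply a cumulative method ("cummin" or "cummax") within each "key" group."""
    typed = frame.astype({"a": dtype})
    # transform() returns a frame aligned to the input rows, without the key column.
    return typed.groupby("key").transform(method)


# Same parameter grid as the benchmark: four dtypes times two methods.
for dtype in ["float64", "int64", "Float64", "Int64"]:
    for method in ["cummin", "cummax"]:
        result = grouped_cumulative(df, method, dtype)
        print(dtype, method, result["a"].tolist())
```

Each output row holds the running minimum or maximum of its own group up to that row, so the result has the same length and row order as the input.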

doc/source/_static/style/hq_ax1.png

Binary image, not rendered here (sizes shown: 5.95 KB, 5.96 KB)

doc/source/_static/style/hq_props.png

Binary image, not rendered here (size shown: 6.09 KB)

doc/source/development/roadmap.rst

Lines changed: 2 additions & 2 deletions
@@ -71,8 +71,8 @@ instead of comparing as False).

 Long term, we want to introduce consistent missing data handling for all data
 types. This includes consistent behavior in all operations (indexing, arithmetic
-operations, comparisons, etc.). We want to eventually make the new semantics the
-default.
+operations, comparisons, etc.). There has been discussion of eventually making
+the new semantics the default.

 This has been discussed at
 `github #28095 <https://github.com/pandas-dev/pandas/issues/28095>`__ (and

doc/source/getting_started/install.rst

Lines changed: 15 additions & 0 deletions
@@ -362,6 +362,21 @@ pyarrow 0.15.0 Parquet, ORC, and feather reading /
 pyreadstat                                   SPSS files (.sav) reading
 ========================= ================== =============================================================

+.. _install.warn_orc:
+
+.. warning::
+
+    * If you want to use :func:`~pandas.read_orc`, it is highly recommended to install pyarrow using conda.
+      The following is a summary of the environment in which :func:`~pandas.read_orc` can work.
+
+    ========================= ================== =============================================================
+    System                    Conda              PyPI
+    ========================= ================== =============================================================
+    Linux                     Successful         Failed(pyarrow==3.0 Successful)
+    macOS                     Successful         Failed
+    Windows                   Failed             Failed
+    ========================= ================== =============================================================
+
 Access data in the cloud
 ^^^^^^^^^^^^^^^^^^^^^^^^

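
The support table added in this hunk can also be read as a simple lookup. The sketch below is illustrative only and not part of the commit: `ORC_SUPPORT` and `orc_support` are hypothetical names, the outcomes are copied verbatim from the table, and the snippet does not try to detect how pyarrow was actually installed.

```python
import platform

# Outcomes copied from the warning table; keys are (system, install channel).
ORC_SUPPORT = {
    ("Linux", "conda"): "Successful",
    ("Linux", "pypi"): "Failed(pyarrow==3.0 Successful)",
    ("macOS", "conda"): "Successful",
    ("macOS", "pypi"): "Failed",
    ("Windows", "conda"): "Failed",
    ("Windows", "pypi"): "Failed",
}

# platform.system() reports macOS as "Darwin", so map it to the table's label.
_SYSTEM_LABELS = {"Darwin": "macOS"}


def orc_support(channel, system=None):
    """Return the table entry for an install channel ("conda" or "pypi").

    `system` defaults to the current platform; combinations that are not in
    the table return "Unknown".
    """
    system = system or platform.system()
    system = _SYSTEM_LABELS.get(system, system)
    return ORC_SUPPORT.get((system, channel.lower()), "Unknown")


print(orc_support("conda", "Linux"))
print(orc_support("PyPI", "Darwin"))
```

Keeping the table as data rather than as branching logic makes it easy to update if later pyarrow releases change which combinations work.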

doc/source/getting_started/intro_tutorials/01_table_oriented.rst

Lines changed: 1 addition & 1 deletion
@@ -176,7 +176,7 @@ these are by default not taken into account by the :func:`~DataFrame.describe` m

 Many pandas operations return a ``DataFrame`` or a ``Series``. The
 :func:`~DataFrame.describe` method is an example of a pandas operation returning a
-pandas ``Series``.
+pandas ``Series`` or a pandas ``DataFrame``.

 .. raw:: html

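
The corrected sentence can be checked directly: `describe` returns a `Series` when called on a single column and a `DataFrame` when called on a whole frame. The snippet is a minimal sketch and not part of the commit; the three-row frame and its column names are made up for illustration.

```python
import pandas as pd

# Illustrative data; column names are assumptions for this example.
df = pd.DataFrame(
    {
        "Age": [22, 35, 58],
        "Sex": ["male", "male", "female"],
    }
)

column_summary = df["Age"].describe()  # called on a Series
frame_summary = df.describe()          # called on a DataFrame

print(type(column_summary).__name__)
print(type(frame_summary).__name__)
```

By default the frame-level summary covers only the numeric columns, so `Sex` does not appear in `frame_summary`.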

doc/source/user_guide/io.rst

Lines changed: 5 additions & 0 deletions
@@ -5443,6 +5443,11 @@ Similar to the :ref:`parquet <io.parquet>` format, the `ORC Format <https://orc.
 for data frames. It is designed to make reading data frames efficient. pandas provides *only* a reader for the
 ORC format, :func:`~pandas.read_orc`. This requires the `pyarrow <https://arrow.apache.org/docs/python/>`__ library.

+.. warning::
+
+    * It is *highly recommended* to install pyarrow using conda due to some issues occurred by pyarrow.
+    * :func:`~pandas.read_orc` is not supported on Windows yet, you can find valid environments on :ref:`install optional dependencies <install.warn_orc>`.
+
 .. _io.sql:

 SQL queries
