diff --git a/.coveragerc b/.coveragerc deleted file mode 100644 index 3f630aa6cf8f5..0000000000000 --- a/.coveragerc +++ /dev/null @@ -1,27 +0,0 @@ -# .coveragerc to control coverage.py -[run] -branch = False -omit = */tests/* - -[report] -# Regexes for lines to exclude from consideration -exclude_lines = - # Have to re-enable the standard pragma - pragma: no cover - - # Don't complain about missing debug-only code: - def __repr__ - if self\.debug - - # Don't complain if tests don't hit defensive assertion code: - raise AssertionError - raise NotImplementedError - - # Don't complain if non-runnable code isn't run: - if 0: - if __name__ == .__main__.: - -ignore_errors = False - -[html] -directory = coverage_html_report diff --git a/.github/CONTRIBUTING.md b/.github/CONTRIBUTING.md index 95729f845ff5c..2e6e980242197 100644 --- a/.github/CONTRIBUTING.md +++ b/.github/CONTRIBUTING.md @@ -1,24 +1,23 @@ -Contributing to pandas -====================== +# Contributing to pandas Whether you are a novice or experienced software developer, all contributions and suggestions are welcome! -Our main contribution docs can be found [here](https://github.com/pandas-dev/pandas/blob/master/doc/source/contributing.rst), but if you do not want to read it in its entirety, we will summarize the main ways in which you can contribute and point to relevant places in the docs for further information. +Our main contributing guide can be found [in this repo](https://github.com/pandas-dev/pandas/blob/master/doc/source/development/contributing.rst) or [on the website](https://pandas-docs.github.io/pandas-docs-travis/development/contributing.html). If you do not want to read it in its entirety, we will summarize the main ways in which you can contribute and point to relevant sections of that document for further information. + +## Getting Started -Getting Started ---------------- If you are looking to contribute to the *pandas* codebase, the best place to start is the [GitHub "issues" tab](https://github.com/pandas-dev/pandas/issues). This is also a great place for filing bug reports and making suggestions for ways in which we can improve the code and documentation. -If you have additional questions, feel free to ask them on the [mailing list](https://groups.google.com/forum/?fromgroups#!forum/pydata) or on [Gitter](https://gitter.im/pydata/pandas). Further information can also be found in our [Getting Started](https://github.com/pandas-dev/pandas/blob/master/doc/source/contributing.rst#where-to-start) section of our main contribution doc. +If you have additional questions, feel free to ask them on the [mailing list](https://groups.google.com/forum/?fromgroups#!forum/pydata) or on [Gitter](https://gitter.im/pydata/pandas). Further information can also be found in the "[Where to start?](https://github.com/pandas-dev/pandas/blob/master/doc/source/development/contributing.rst#where-to-start)" section. + +## Filing Issues + +If you notice a bug in the code or documentation, or have suggestions for how we can improve either, feel free to create an issue on the [GitHub "issues" tab](https://github.com/pandas-dev/pandas/issues) using [GitHub's "issue" form](https://github.com/pandas-dev/pandas/issues/new). The form contains some questions that will help us best address your issue. 
For more information regarding how to file issues against *pandas*, please refer to the "[Bug reports and enhancement requests](https://github.com/pandas-dev/pandas/blob/master/doc/source/development/contributing.rst#bug-reports-and-enhancement-requests)" section. -Filing Issues -------------- -If you notice a bug in the code or in docs or have suggestions for how we can improve either, feel free to create an issue on the [GitHub "issues" tab](https://github.com/pandas-dev/pandas/issues) using [GitHub's "issue" form](https://github.com/pandas-dev/pandas/issues/new). The form contains some questions that will help us best address your issue. For more information regarding how to file issues against *pandas*, please refer to the [Bug reports and enhancement requests](https://github.com/pandas-dev/pandas/blob/master/doc/source/contributing.rst#bug-reports-and-enhancement-requests) section of our main contribution doc. +## Contributing to the Codebase -Contributing to the Codebase ----------------------------- -The code is hosted on [GitHub](https://www.github.com/pandas-dev/pandas), so you will need to use [Git](http://git-scm.com/) to clone the project and make changes to the codebase. Once you have obtained a copy of the code, you should create a development environment that is separate from your existing Python environment so that you can make and test changes without compromising your own work environment. For more information, please refer to our [Working with the code](https://github.com/pandas-dev/pandas/blob/master/doc/source/contributing.rst#working-with-the-code) section of our main contribution docs. +The code is hosted on [GitHub](https://www.github.com/pandas-dev/pandas), so you will need to use [Git](http://git-scm.com/) to clone the project and make changes to the codebase. Once you have obtained a copy of the code, you should create a development environment that is separate from your existing Python environment so that you can make and test changes without compromising your own work environment. For more information, please refer to the "[Working with the code](https://github.com/pandas-dev/pandas/blob/master/doc/source/development/contributing.rst#working-with-the-code)" section. -Before submitting your changes for review, make sure to check that your changes do not break any tests. You can find more information about our test suites can be found [here](https://github.com/pandas-dev/pandas/blob/master/doc/source/contributing.rst#test-driven-development-code-writing). We also have guidelines regarding coding style that will be enforced during testing. Details about coding style can be found [here](https://github.com/pandas-dev/pandas/blob/master/doc/source/contributing.rst#code-standards). +Before submitting your changes for review, make sure to check that your changes do not break any tests. You can find more information about our test suites in the "[Test-driven development/code writing](https://github.com/pandas-dev/pandas/blob/master/doc/source/development/contributing.rst#test-driven-development-code-writing)" section. We also have guidelines regarding coding style that will be enforced during testing, which can be found in the "[Code standards](https://github.com/pandas-dev/pandas/blob/master/doc/source/development/contributing.rst#code-standards)" section. -Once your changes are ready to be submitted, make sure to push your changes to GitHub before creating a pull request. 
Details about how to do that can be found in the [Contributing your changes to pandas](https://github.com/pandas-dev/pandas/blob/master/doc/source/contributing.rst#contributing-your-changes-to-pandas) section of our main contribution docs. We will review your changes, and you will most likely be asked to make additional changes before it is finally ready to merge. However, once it's ready, we will merge it, and you will have successfully contributed to the codebase! +Once your changes are ready to be submitted, make sure to push your changes to GitHub before creating a pull request. Details about how to do that can be found in the "[Contributing your changes to pandas](https://github.com/pandas-dev/pandas/blob/master/doc/source/development/contributing.rst#contributing-your-changes-to-pandas)" section. We will review your changes, and you will most likely be asked to make additional changes before it is finally ready to merge. However, once it's ready, we will merge it, and you will have successfully contributed to the codebase! diff --git a/.github/FUNDING.yml b/.github/FUNDING.yml new file mode 100644 index 0000000000000..944ce9b4fb1f6 --- /dev/null +++ b/.github/FUNDING.yml @@ -0,0 +1,2 @@ +custom: https://pandas.pydata.org/donate.html +tidelift: pypi/pandas diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md index 4e1e9ce017408..7c3870470f074 100644 --- a/.github/PULL_REQUEST_TEMPLATE.md +++ b/.github/PULL_REQUEST_TEMPLATE.md @@ -1,4 +1,5 @@ - [ ] closes #xxxx - [ ] tests added / passed +- [ ] passes `black pandas` - [ ] passes `git diff upstream/master -u -- "*.py" | flake8 --diff` - [ ] whatsnew entry diff --git a/.github/SECURITY.md b/.github/SECURITY.md new file mode 100644 index 0000000000000..f3b059a5d4f13 --- /dev/null +++ b/.github/SECURITY.md @@ -0,0 +1 @@ +To report a security vulnerability to pandas, please go to https://tidelift.com/security and see the instructions there. 
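The revised pull-request template above asks contributors to run the same formatting and lint checks locally that CI enforces. Below is a minimal sketch of that workflow, assuming a clone whose `upstream` remote points at pandas-dev/pandas (a repo-setup convention, not something this diff establishes); the two check commands are taken verbatim from the template, and the optional `pre-commit` steps use pre-commit's standard CLI against the `.pre-commit-config.yaml` added later in this diff:

```sh
# Format the codebase with black (the new checklist item)
black pandas

# Lint only the lines this branch changes, as the template specifies
git diff upstream/master -u -- "*.py" | flake8 --diff

# Optional: run the same hooks automatically on each commit
# (assumes pre-commit is installed, e.g. python -m pip install pre-commit)
pre-commit install
pre-commit run --all-files
```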
diff --git a/.gitignore b/.gitignore index 0d4e8c6fb75a6..6c3c275c48fb7 100644 --- a/.gitignore +++ b/.gitignore @@ -57,10 +57,19 @@ dist # wheel files *.whl **/wheelhouse/* +pip-wheel-metadata # coverage .coverage coverage.xml coverage_html_report +.mypy_cache +*.pytest_cache +# hypothesis test database +.hypothesis/ +__pycache__ +# pytest-monkeytype +monkeytype.sqlite3 + # OS generated files # ###################### @@ -88,8 +97,8 @@ scikits *.c *.cpp -# Performance Testing # -####################### +# Unit / Performance Testing # +############################## asv_bench/env/ asv_bench/html/ asv_bench/results/ @@ -98,6 +107,8 @@ asv_bench/pandas/ # Documentation generated files # ################################# doc/source/generated +doc/source/user_guide/styled.xlsx +doc/source/reference/api doc/source/_static doc/source/vbench doc/source/vbench.rst @@ -105,6 +116,5 @@ doc/source/index.rst doc/build/html/index.html # Windows specific leftover: doc/tmp.sv -doc/source/styled.xlsx -doc/source/templates/ env/ +doc/source/savefig/ diff --git a/.pep8speaks.yml b/.pep8speaks.yml index 299b76c8922cc..5a83727ddf5f8 100644 --- a/.pep8speaks.yml +++ b/.pep8speaks.yml @@ -2,9 +2,3 @@ scanner: diff_only: True # If True, errors caused by only the patch are shown - -pycodestyle: - max-line-length: 79 - ignore: # Errors and warnings to ignore - - E731 - - E402 diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml new file mode 100644 index 0000000000000..b79f0f71dac23 --- /dev/null +++ b/.pre-commit-config.yaml @@ -0,0 +1,17 @@ +repos: +- repo: https://github.com/python/black + rev: stable + hooks: + - id: black + language_version: python3.7 +- repo: https://gitlab.com/pycqa/flake8 + rev: 3.7.7 + hooks: + - id: flake8 + language: python_venv + additional_dependencies: [flake8-comprehensions] +- repo: https://github.com/pre-commit/mirrors-isort + rev: v4.3.20 + hooks: + - id: isort + language: python_venv diff --git a/.travis.yml b/.travis.yml index 4cbe7f86bd2fa..79fecc41bec0d 100644 --- a/.travis.yml +++ b/.travis.yml @@ -1,6 +1,4 @@ -sudo: false language: python -# Default Python version is usually 2.7 python: 3.5 # To turn off cached cython files and compiler cache @@ -23,86 +21,36 @@ env: git: # for cloning - depth: 1000 + depth: false matrix: fast_finish: true exclude: # Exclude the default Python 3.5 build - python: 3.5 + include: - - os: osx - language: generic - env: - - JOB="3.5_OSX" TEST_ARGS="--skip-slow --skip-network" - - dist: trusty - env: - - JOB="2.7_LOCALE" LOCALE_OVERRIDE="zh_CN.UTF-8" SLOW=true - addons: - apt: - packages: - - language-pack-zh-hans - - dist: trusty - env: - - JOB="2.7" TEST_ARGS="--skip-slow" LINT=true - addons: - apt: - packages: - - python-gtk2 - # In allow_failures - - dist: trusty - env: - - JOB="3.5_CONDA_BUILD_TEST" TEST_ARGS="--skip-slow --skip-network" CONDA_BUILD_TEST=true - - dist: trusty - env: - - JOB="3.6" TEST_ARGS="--skip-slow --skip-network" PANDAS_TESTING_MODE="deprecate" CONDA_FORGE=true COVERAGE=true - # In allow_failures - dist: trusty env: - - JOB="2.7_SLOW" SLOW=true - # In allow_failures - - dist: trusty - env: - - JOB="3.6_PIP_BUILD_TEST" TEST_ARGS="--skip-slow" PIP_BUILD_TEST=true - addons: - apt: - packages: - - xsel - # In allow_failures + - JOB="3.7" ENV_FILE="ci/deps/travis-37.yaml" PATTERN="(not slow and not network)" + - dist: trusty env: - - JOB="3.6_NUMPY_DEV" TEST_ARGS="--skip-slow --skip-network" PANDAS_TESTING_MODE="deprecate" - # In allow_failures + - JOB="3.6, locale" ENV_FILE="ci/deps/travis-36-locale.yaml" 
PATTERN="((not slow and not network) or (single and db))" LOCALE_OVERRIDE="zh_CN.UTF-8" + - dist: trusty env: - - JOB="3.6_ASV" ASV=true + - JOB="3.6, coverage" ENV_FILE="ci/deps/travis-36-cov.yaml" PATTERN="((not slow and not network) or (single and db))" PANDAS_TESTING_MODE="deprecate" COVERAGE=true + # In allow_failures - dist: trusty env: - - JOB="3.6_DOC" DOC=true + - JOB="3.6, slow" ENV_FILE="ci/deps/travis-36-slow.yaml" PATTERN="slow" + allow_failures: - dist: trusty env: - - JOB="3.5_CONDA_BUILD_TEST" TEST_ARGS="--skip-slow --skip-network" CONDA_BUILD_TEST=true - - dist: trusty - env: - - JOB="2.7_SLOW" SLOW=true - - dist: trusty - env: - - JOB="3.6_PIP_BUILD_TEST" TEST_ARGS="--skip-slow" PIP_BUILD_TEST=true - addons: - apt: - packages: - - xsel - - dist: trusty - env: - - JOB="3.6_NUMPY_DEV" TEST_ARGS="--skip-slow --skip-network" PANDAS_TESTING_MODE="deprecate" - - dist: trusty - env: - - JOB="3.6_ASV" ASV=true - - dist: trusty - env: - - JOB="3.6_DOC" DOC=true + - JOB="3.6, slow" ENV_FILE="ci/deps/travis-36-slow.yaml" PATTERN="slow" before_install: - echo "before_install" @@ -115,41 +63,36 @@ before_install: - pwd - uname -a - git --version - - git tag + - ./ci/check_git_tags.sh + # Because travis runs on Google Cloud and has a /etc/boto.cfg, + # it breaks moto import, see: + # https://github.com/spulec/moto/issues/1771 + # https://github.com/boto/boto/issues/3741 + # This overrides travis and tells it to look nowhere. + - export BOTO_CONFIG=/dev/null install: - echo "install start" - ci/prep_cython_cache.sh - - ci/install_travis.sh + - ci/setup_env.sh - ci/submit_cython_cache.sh - echo "install done" before_script: - - ci/install_db_travis.sh + # display server (for clipboard functionality) needs to be started here, + # does not work if done in install:setup_env.sh (GH-26103) - export DISPLAY=":99.0" - - ci/before_script_travis.sh + - echo "sh -e /etc/init.d/xvfb start" + - sh -e /etc/init.d/xvfb start + - sleep 3 script: - echo "script start" - - ci/run_build_docs.sh - - ci/script_single.sh - - ci/script_multi.sh - - ci/lint.sh - - ci/asv.sh - - echo "checking imports" - - source activate pandas && python ci/check_imports.py - - echo "script done" - -after_success: - - ci/upload_coverage.sh + - source activate pandas-dev + - ci/run_tests.sh after_script: - echo "after_script start" - - source activate pandas && pushd /tmp && python -c "import pandas; pandas.show_versions();" && popd - - if [ -e /tmp/single.xml ]; then - ci/print_skipped.py /tmp/single.xml; - fi - - if [ -e /tmp/multiple.xml ]; then - ci/print_skipped.py /tmp/multiple.xml; - fi + - source activate pandas-dev && pushd /tmp && python -c "import pandas; pandas.show_versions();" && popd + - ci/print_skipped.py - echo "after_script done" diff --git a/LICENSES/DATEUTIL_LICENSE b/LICENSES/DATEUTIL_LICENSE new file mode 100644 index 0000000000000..6053d35cfc60b --- /dev/null +++ b/LICENSES/DATEUTIL_LICENSE @@ -0,0 +1,54 @@ +Copyright 2017- Paul Ganssle +Copyright 2017- dateutil contributors (see AUTHORS file) + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+ See the License for the specific language governing permissions and + limitations under the License. + +The above license applies to all contributions after 2017-12-01, as well as +all contributions that have been re-licensed (see AUTHORS file for the list of +contributors who have re-licensed their code). +-------------------------------------------------------------------------------- +dateutil - Extensions to the standard Python datetime module. + +Copyright (c) 2003-2011 - Gustavo Niemeyer +Copyright (c) 2012-2014 - Tomi Pieviläinen +Copyright (c) 2014-2016 - Yaron de Leeuw +Copyright (c) 2015- - Paul Ganssle +Copyright (c) 2015- - dateutil contributors (see AUTHORS file) + +All rights reserved. + +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are met: + + * Redistributions of source code must retain the above copyright notice, + this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above copyright notice, + this list of conditions and the following disclaimer in the documentation + and/or other materials provided with the distribution. + * Neither the name of the copyright holder nor the names of its + contributors may be used to endorse or promote products derived from + this software without specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR +CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, +EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, +PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR +PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF +LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING +NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS +SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +The above BSD License Applies to all code, even that also covered by Apache 2.0. diff --git a/LICENSES/HAVEN_LICENSE b/LICENSES/HAVEN_LICENSE new file mode 100644 index 0000000000000..2f444cb44d505 --- /dev/null +++ b/LICENSES/HAVEN_LICENSE @@ -0,0 +1,2 @@ +YEAR: 2013-2016 +COPYRIGHT HOLDER: Hadley Wickham; RStudio; and Evan Miller diff --git a/LICENSES/HAVEN_MIT b/LICENSES/HAVEN_MIT new file mode 100644 index 0000000000000..b03d0e640627a --- /dev/null +++ b/LICENSES/HAVEN_MIT @@ -0,0 +1,32 @@ +Based on http://opensource.org/licenses/MIT + +This is a template. Complete and ship as file LICENSE the following 2 +lines (only) + +YEAR: +COPYRIGHT HOLDER: + +and specify as + +License: MIT + file LICENSE + +Copyright (c) , + +Permission is hereby granted, free of charge, to any person obtaining +a copy of this software and associated documentation files (the +"Software"), to deal in the Software without restriction, including +without limitation the rights to use, copy, modify, merge, publish, +distribute, sublicense, and/or sell copies of the Software, and to +permit persons to whom the Software is furnished to do so, subject to +the following conditions: + +The above copyright notice and this permission notice shall be +included in all copies or substantial portions of the Software. 
+ +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, +EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF +MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND +NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE +LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION +OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION +WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. diff --git a/LICENSES/MUSL_LICENSE b/LICENSES/MUSL_LICENSE new file mode 100644 index 0000000000000..a8833d4bc4744 --- /dev/null +++ b/LICENSES/MUSL_LICENSE @@ -0,0 +1,132 @@ +musl as a whole is licensed under the following standard MIT license: + +---------------------------------------------------------------------- +Copyright © 2005-2014 Rich Felker, et al. + +Permission is hereby granted, free of charge, to any person obtaining +a copy of this software and associated documentation files (the +"Software"), to deal in the Software without restriction, including +without limitation the rights to use, copy, modify, merge, publish, +distribute, sublicense, and/or sell copies of the Software, and to +permit persons to whom the Software is furnished to do so, subject to +the following conditions: + +The above copyright notice and this permission notice shall be +included in all copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, +EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF +MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. +IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY +CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, +TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE +SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. +---------------------------------------------------------------------- + +Authors/contributors include: + +Anthony G. Basile +Arvid Picciani +Bobby Bingham +Boris Brezillon +Brent Cook +Chris Spiegel +Clément Vasseur +Emil Renner Berthing +Hiltjo Posthuma +Isaac Dunham +Jens Gustedt +Jeremy Huntwork +John Spencer +Justin Cormack +Luca Barbato +Luka Perkov +M Farkas-Dyck (Strake) +Michael Forney +Nicholas J. Kain +orc +Pascal Cuoq +Pierre Carrier +Rich Felker +Richard Pennington +sin +Solar Designer +Stefan Kristiansson +Szabolcs Nagy +Timo Teräs +Valentin Ochs +William Haddon + +Portions of this software are derived from third-party works licensed +under terms compatible with the above MIT license: + +The TRE regular expression implementation (src/regex/reg* and +src/regex/tre*) is Copyright © 2001-2008 Ville Laurikari and licensed +under a 2-clause BSD license (license text in the source files). The +included version has been heavily modified by Rich Felker in 2012, in +the interests of size, simplicity, and namespace cleanliness. + +Much of the math library code (src/math/* and src/complex/*) is +Copyright © 1993,2004 Sun Microsystems or +Copyright © 2003-2011 David Schultz or +Copyright © 2003-2009 Steven G. Kargl or +Copyright © 2003-2009 Bruce D. Evans or +Copyright © 2008 Stephen L. Moshier +and labelled as such in comments in the individual source files. All +have been licensed under extremely permissive terms. + +The ARM memcpy code (src/string/armel/memcpy.s) is Copyright © 2008 +The Android Open Source Project and is licensed under a two-clause BSD +license. It was taken from Bionic libc, used on Android. 
+ +The implementation of DES for crypt (src/misc/crypt_des.c) is +Copyright © 1994 David Burren. It is licensed under a BSD license. + +The implementation of blowfish crypt (src/misc/crypt_blowfish.c) was +originally written by Solar Designer and placed into the public +domain. The code also comes with a fallback permissive license for use +in jurisdictions that may not recognize the public domain. + +The smoothsort implementation (src/stdlib/qsort.c) is Copyright © 2011 +Valentin Ochs and is licensed under an MIT-style license. + +The BSD PRNG implementation (src/prng/random.c) and XSI search API +(src/search/*.c) functions are Copyright © 2011 Szabolcs Nagy and +licensed under following terms: "Permission to use, copy, modify, +and/or distribute this code for any purpose with or without fee is +hereby granted. There is no warranty." + +The x86_64 port was written by Nicholas J. Kain. Several files (crt) +were released into the public domain; others are licensed under the +standard MIT license terms at the top of this file. See individual +files for their copyright status. + +The mips and microblaze ports were originally written by Richard +Pennington for use in the ellcc project. The original code was adapted +by Rich Felker for build system and code conventions during upstream +integration. It is licensed under the standard MIT terms. + +The powerpc port was also originally written by Richard Pennington, +and later supplemented and integrated by John Spencer. It is licensed +under the standard MIT terms. + +All other files which have no copyright comments are original works +produced specifically for use as part of this library, written either +by Rich Felker, the main author of the library, or by one or more +contibutors listed above. Details on authorship of individual files +can be found in the git version control history of the project. The +omission of copyright and license comments in each file is in the +interest of source tree size. + +All public header files (include/* and arch/*/bits/*) should be +treated as Public Domain as they intentionally contain no content +which can be covered by copyright. Some source modules may fall in +this category as well. If you believe that a file is so trivial that +it should be in the Public Domain, please contact the authors and +request an explicit statement releasing it from copyright. + +The following files are trivial, believed not to be copyrightable in +the first place, and hereby explicitly released to the Public Domain: + +All public headers: include/*, arch/*/bits/* +Startup files: crt/* diff --git a/LICENSES/SIX b/LICENSES/SIX deleted file mode 100644 index 6fd669af222d3..0000000000000 --- a/LICENSES/SIX +++ /dev/null @@ -1,21 +0,0 @@ -six license (substantial portions used in the python 3 compatibility module) -=========================================================================== -Copyright (c) 2010-2013 Benjamin Peterson - -Permission is hereby granted, free of charge, to any person obtaining a copy -of this software and associated documentation files (the "Software"), to deal -in the Software without restriction, including without limitation the rights -to use, copy, modify, merge, publish, distribute, sublicense, and/or sell -copies of the Software, and to permit persons to whom the Software is -furnished to do so, subject to the following conditions: -# -The above copyright notice and this permission notice shall be included in all -copies or substantial portions of the Software. 
-# -THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR -IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, -FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE -AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER -LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, -OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE -SOFTWARE. diff --git a/MANIFEST.in b/MANIFEST.in index 9773019c6e6e0..d82e64d0a68b8 100644 --- a/MANIFEST.in +++ b/MANIFEST.in @@ -8,22 +8,35 @@ include pyproject.toml graft doc prune doc/build +graft LICENSES + graft pandas -global-exclude *.so -global-exclude *.pyd +global-exclude *.bz2 +global-exclude *.csv +global-exclude *.dta +global-exclude *.gz +global-exclude *.h5 +global-exclude *.html +global-exclude *.json +global-exclude *.msgpack +global-exclude *.pickle +global-exclude *.png global-exclude *.pyc +global-exclude *.pyd +global-exclude *.sas7bdat +global-exclude *.so +global-exclude *.xls +global-exclude *.xlsm +global-exclude *.xlsx +global-exclude *.xpt +global-exclude *.xz +global-exclude *.zip global-exclude *~ -global-exclude \#* -global-exclude .git* global-exclude .DS_Store -global-exclude *.png +global-exclude .git* +global-exclude \#* -# include examples/data/* -# recursive-include examples *.py -# recursive-include doc/source * -# recursive-include doc/sphinxext * -# recursive-include LICENSES * include versioneer.py include pandas/_version.py include pandas/io/formats/templates/*.tpl diff --git a/Makefile b/Makefile index c79175cd3c401..27a2c3682de9c 100644 --- a/Makefile +++ b/Makefile @@ -1,7 +1,6 @@ -tseries: pandas/_libs/lib.pyx pandas/_libs/tslib.pyx pandas/_libs/hashtable.pyx - python setup.py build_ext --inplace +.PHONY : develop build clean clean_pyc doc lint-diff black -.PHONY : develop build clean clean_pyc tseries doc +all: develop clean: -python setup.py clean @@ -13,10 +12,13 @@ build: clean_pyc python setup.py build_ext --inplace lint-diff: - git diff master --name-only -- "*.py" | grep "pandas" | xargs flake8 + git diff upstream/master --name-only -- "*.py" | xargs flake8 + +black: + black . --exclude '(asv_bench/env|\.egg|\.git|\.hg|\.mypy_cache|\.nox|\.tox|\.venv|_build|buck-out|build|dist|setup.py)' develop: build - -python setup.py develop + python -m pip install --no-build-isolation -e . doc: -rm -rf doc/build doc/source/generated diff --git a/README.md b/README.md index 4b9c9505e320a..d5e71fc4740cf 100644 --- a/README.md +++ b/README.md [Hunks @@ -9,18 +9,34 @@, @@ -33,33 +49,21 @@, and @@ -67,18 +71,18 @@ edit the README's HTML badge table; the <table>/<tr>/<td>/<img> markup was lost in extraction and only the badge labels survive: Latest Release (PyPI and conda), Package Status, License, Build Status (the CircleCI and AppVeyor badge cells are removed and an Azure Pipelines badge is added), Coverage, the Conda / Conda-forge / PyPI download rows, and Gitter. The readable tail of the last hunk continues below.]
-[![https://gitter.im/pydata/pandas](https://badges.gitter.im/Join%20Chat.svg)](https://gitter.im/pydata/pandas?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge) -## What is it + +## What is it? **pandas** is a Python package providing fast, flexible, and expressive data structures designed to make working with "relational" or "labeled" data both @@ -86,7 +90,7 @@ easy and intuitive. It aims to be the fundamental high-level building block for doing practical, **real world** data analysis in Python. Additionally, it has the broader goal of becoming **the most powerful and flexible open source data analysis / manipulation tool available in any language**. It is already well on -its way toward this goal. +its way towards this goal. ## Main Features Here are just a few of the things that pandas does well: @@ -147,7 +151,7 @@ The source code is currently hosted on GitHub at: https://github.com/pandas-dev/pandas Binary installers for the latest released version are available at the [Python -package index](https://pypi.python.org/pypi/pandas) and on conda. +package index](https://pypi.org/project/pandas) and on conda. ```sh # conda @@ -160,9 +164,9 @@ pip install pandas ``` ## Dependencies -- [NumPy](http://www.numpy.org): 1.9.0 or higher +- [NumPy](https://www.numpy.org): 1.13.3 or higher - [python-dateutil](https://labix.org/python-dateutil): 2.5.0 or higher -- [pytz](https://pythonhosted.org/pytz): 2011k or higher +- [pytz](https://pythonhosted.org/pytz): 2015.4 or higher See the [full installation instructions](https://pandas.pydata.org/pandas-docs/stable/install.html#dependencies) for recommended and optional dependencies. @@ -184,16 +188,17 @@ python setup.py install or for installing in [development mode](https://pip.pypa.io/en/latest/reference/pip_install.html#editable-installs): + ```sh -python setup.py develop +python -m pip install --no-build-isolation -e . ``` -Alternatively, you can use `pip` if you want all the dependencies pulled -in automatically (the `-e` option is for installing it in [development -mode](https://pip.pypa.io/en/latest/reference/pip_install.html#editable-installs)): +If you have `make`, you can also use `make develop` to run the same command. + +or alternatively ```sh -pip install -e . +python setup.py develop ``` See the full instructions for [installing from source](https://pandas.pydata.org/pandas-docs/stable/install.html#installing-from-source). @@ -216,13 +221,18 @@ Further, general questions and discussions can also take place on the [pydata ma ## Discussion and Development Most development discussion is taking place on github in this repo. Further, the [pandas-dev mailing list](https://mail.python.org/mailman/listinfo/pandas-dev) can also be used for specialized discussions or design issues, and a [Gitter channel](https://gitter.im/pydata/pandas) is available for quick development related questions. -## Contributing to pandas +## Contributing to pandas [![Open Source Helpers](https://www.codetriage.com/pandas-dev/pandas/badges/users.svg)](https://www.codetriage.com/pandas-dev/pandas) + All contributions, bug reports, bug fixes, documentation improvements, enhancements and ideas are welcome. -A detailed overview on how to contribute can be found in the **[contributing guide.](https://pandas.pydata.org/pandas-docs/stable/contributing.html)** +A detailed overview on how to contribute can be found in the **[contributing guide](https://dev.pandas.io/contributing.html)**. There is also an [overview](.github/CONTRIBUTING.md) on GitHub. 
+ +If you are simply looking to start working with the pandas codebase, navigate to the [GitHub "issues" tab](https://github.com/pandas-dev/pandas/issues) and start looking through interesting issues. There are a number of issues listed under [Docs](https://github.com/pandas-dev/pandas/issues?labels=Docs&sort=updated&state=open) and [good first issue](https://github.com/pandas-dev/pandas/issues?labels=good+first+issue&sort=updated&state=open) where you could start out. -If you are simply looking to start working with the pandas codebase, navigate to the [GitHub “issues” tab](https://github.com/pandas-dev/pandas/issues) and start looking through interesting issues. There are a number of issues listed under [Docs](https://github.com/pandas-dev/pandas/issues?labels=Docs&sort=updated&state=open) and [Difficulty Novice](https://github.com/pandas-dev/pandas/issues?q=is%3Aopen+is%3Aissue+label%3A%22Difficulty+Novice%22) where you could start out. +You can also triage issues which may include reproducing bug reports, or asking for vital information such as version numbers or reproduction instructions. If you would like to start triaging issues, one easy way to get started is to [subscribe to pandas on CodeTriage](https://www.codetriage.com/pandas-dev/pandas). Or maybe through using pandas you have an idea of your own or are looking for something in the documentation and thinking ‘this can be improved’...you can do something about it! Feel free to ask questions on the [mailing list](https://groups.google.com/forum/?fromgroups#!forum/pydata) or on [Gitter](https://gitter.im/pydata/pandas). + +As contributors and maintainers to this project, you are expected to abide by pandas' code of conduct. More information can be found at: [Contributor Code of Conduct](https://github.com/pandas-dev/pandas/blob/master/.github/CODE_OF_CONDUCT.md) diff --git a/appveyor.yml b/appveyor.yml deleted file mode 100644 index ba001208864a8..0000000000000 --- a/appveyor.yml +++ /dev/null @@ -1,96 +0,0 @@ -# With infos from -# http://tjelvarolsson.com/blog/how-to-continuously-test-your-python-code-on-windows-using-appveyor/ -# https://packaging.python.org/en/latest/appveyor/ -# https://github.com/rmcgibbo/python-appveyor-conda-example - -# Backslashes in quotes need to be escaped: \ -> "\\" - -matrix: - fast_finish: true # immediately finish build once one of the jobs fails. - -environment: - global: - # SDK v7.0 MSVC Express 2008's SetEnv.cmd script will fail if the - # /E:ON and /V:ON options are not enabled in the batch script interpreter - # See: http://stackoverflow.com/a/13751649/163740 - CMD_IN_ENV: "cmd /E:ON /V:ON /C .\\ci\\run_with_env.cmd" - clone_folder: C:\projects\pandas - PANDAS_TESTING_MODE: "deprecate" - - matrix: - - - CONDA_ROOT: "C:\\Miniconda3_64" - PYTHON_VERSION: "3.6" - PYTHON_ARCH: "64" - CONDA_PY: "36" - CONDA_NPY: "113" - - - CONDA_ROOT: "C:\\Miniconda3_64" - PYTHON_VERSION: "2.7" - PYTHON_ARCH: "64" - CONDA_PY: "27" - CONDA_NPY: "110" - -# We always use a 64-bit machine, but can build x86 distributions -# with the PYTHON_ARCH variable (which is used by CMD_IN_ENV). -platform: - - x64 - -# all our python builds have to happen in tests_script... 
-build: false - -install: - # cancel older builds for the same PR - - ps: if ($env:APPVEYOR_PULL_REQUEST_NUMBER -and $env:APPVEYOR_BUILD_NUMBER -ne ((Invoke-RestMethod ` - https://ci.appveyor.com/api/projects/$env:APPVEYOR_ACCOUNT_NAME/$env:APPVEYOR_PROJECT_SLUG/history?recordsNumber=50).builds | ` - Where-Object pullRequestId -eq $env:APPVEYOR_PULL_REQUEST_NUMBER)[0].buildNumber) { ` - throw "There are newer queued builds for this pull request, failing early." } - - # this installs the appropriate Miniconda (Py2/Py3, 32/64 bit) - # updates conda & installs: conda-build jinja2 anaconda-client - - powershell .\ci\install.ps1 - - SET PATH=%CONDA_ROOT%;%CONDA_ROOT%\Scripts;%PATH% - - echo "install" - - cd - - ls -ltr - - git tag --sort v:refname - - # this can conflict with git - - cmd: rmdir C:\cygwin /s /q - - # install our build environment - - cmd: conda config --set show_channel_urls true --set always_yes true --set changeps1 false - - cmd: conda update -q conda - - cmd: conda config --set ssl_verify false - - # add the pandas channel *before* defaults to have defaults take priority - - cmd: conda config --add channels conda-forge - - cmd: conda config --add channels pandas - - cmd: conda config --remove channels defaults - - cmd: conda config --add channels defaults - - # this is now the downloaded conda... - - cmd: conda info -a - - # create our env - - cmd: conda create -n pandas python=%PYTHON_VERSION% cython pytest>=3.1.0 pytest-xdist - - cmd: activate pandas - - cmd: pip install moto - - SET REQ=ci\requirements-%PYTHON_VERSION%_WIN.run - - cmd: echo "installing requirements from %REQ%" - - cmd: conda install -n pandas --file=%REQ% - - cmd: conda list -n pandas - - cmd: echo "installing requirements from %REQ% - done" - - # add some pip only reqs to the env - - SET REQ=ci\requirements-%PYTHON_VERSION%_WIN.pip - - cmd: echo "installing requirements from %REQ%" - - cmd: pip install -Ur %REQ% - - # build em using the local source checkout in the correct windows env - - cmd: '%CMD_IN_ENV% python setup.py build_ext --inplace' - -test_script: - # tests - - cmd: activate pandas - - cmd: test.bat diff --git a/asv_bench/asv.conf.json b/asv_bench/asv.conf.json index 9c333f62810f4..c04bbf53a86a6 100644 --- a/asv_bench/asv.conf.json +++ b/asv_bench/asv.conf.json @@ -50,12 +50,13 @@ "xlsxwriter": [], "xlrd": [], "xlwt": [], + "odfpy": [], "pytest": [], // If using Windows with python 2.7 and want to build using the // mingw toolchain (rather than MSVC), uncomment the following line. // "libpython": [], }, - + "conda_channels": ["defaults", "conda-forge"], // Combinations of libraries/python versions can be excluded/included // from the set to test. Each entry is a dictionary containing additional // key-value pairs to include/exclude. @@ -107,7 +108,7 @@ // `asv` will cache wheels of the recent builds in each // environment, making them faster to install next time. This is // number of builds to keep, per environment. - "wheel_cache_size": 8, + "build_cache_size": 8, // The commits after which the regression search in `asv publish` // should start looking for regressions. Dictionary whose keys are @@ -118,9 +119,8 @@ // skipped for the matching benchmark. 
// "regressions_first_commits": { - ".*": "v0.20.0" + ".*": "0409521665" }, "regression_thresholds": { - ".*": 0.05 } } diff --git a/asv_bench/benchmarks/__init__.py b/asv_bench/benchmarks/__init__.py index e69de29bb2d1d..eada147852fe1 100644 --- a/asv_bench/benchmarks/__init__.py +++ b/asv_bench/benchmarks/__init__.py @@ -0,0 +1 @@ +"""Pandas benchmarks.""" diff --git a/asv_bench/benchmarks/algorithms.py b/asv_bench/benchmarks/algorithms.py index cccd38ef11251..7d97f2c740acb 100644 --- a/asv_bench/benchmarks/algorithms.py +++ b/asv_bench/benchmarks/algorithms.py @@ -1,108 +1,130 @@ -import warnings from importlib import import_module import numpy as np + +from pandas._libs import lib + import pandas as pd from pandas.util import testing as tm -for imp in ['pandas.util', 'pandas.tools.hashing']: +for imp in ["pandas.util", "pandas.tools.hashing"]: try: hashing = import_module(imp) break - except: + except (ImportError, TypeError, ValueError): pass -from .pandas_vb_common import setup # noqa - -class Factorize(object): - - goal_time = 0.2 - - params = [True, False] - param_names = ['sort'] +class MaybeConvertObjects: + def setup(self): + N = 10 ** 5 - def setup(self, sort): - N = 10**5 - self.int_idx = pd.Int64Index(np.arange(N).repeat(5)) - self.float_idx = pd.Float64Index(np.random.randn(N).repeat(5)) - self.string_idx = tm.makeStringIndex(N) + data = list(range(N)) + data[0] = pd.NaT + data = np.array(data) + self.data = data - def time_factorize_int(self, sort): - self.int_idx.factorize(sort=sort) + def time_maybe_convert_objects(self): + lib.maybe_convert_objects(self.data) - def time_factorize_float(self, sort): - self.float_idx.factorize(sort=sort) - def time_factorize_string(self, sort): - self.string_idx.factorize(sort=sort) +class Factorize: + params = [[True, False], ["int", "uint", "float", "string"]] + param_names = ["sort", "dtype"] -class Duplicated(object): + def setup(self, sort, dtype): + N = 10 ** 5 + data = { + "int": pd.Int64Index(np.arange(N).repeat(5)), + "uint": pd.UInt64Index(np.arange(N).repeat(5)), + "float": pd.Float64Index(np.random.randn(N).repeat(5)), + "string": tm.makeStringIndex(N).repeat(5), + } + self.idx = data[dtype] - goal_time = 0.2 + def time_factorize(self, sort, dtype): + self.idx.factorize(sort=sort) - params = ['first', 'last', False] - param_names = ['keep'] - def setup(self, keep): - N = 10**5 - self.int_idx = pd.Int64Index(np.arange(N).repeat(5)) - self.float_idx = pd.Float64Index(np.random.randn(N).repeat(5)) - self.string_idx = tm.makeStringIndex(N) +class FactorizeUnique: - def time_duplicated_int(self, keep): - self.int_idx.duplicated(keep=keep) + params = [[True, False], ["int", "uint", "float", "string"]] + param_names = ["sort", "dtype"] - def time_duplicated_float(self, keep): - self.float_idx.duplicated(keep=keep) + def setup(self, sort, dtype): + N = 10 ** 5 + data = { + "int": pd.Int64Index(np.arange(N)), + "uint": pd.UInt64Index(np.arange(N)), + "float": pd.Float64Index(np.arange(N)), + "string": tm.makeStringIndex(N), + } + self.idx = data[dtype] + assert self.idx.is_unique - def time_duplicated_string(self, keep): - self.string_idx.duplicated(keep=keep) + def time_factorize(self, sort, dtype): + self.idx.factorize(sort=sort) -class DuplicatedUniqueIndex(object): +class Duplicated: - goal_time = 0.2 + params = [["first", "last", False], ["int", "uint", "float", "string"]] + param_names = ["keep", "dtype"] - def setup(self): - N = 10**5 - self.idx_int_dup = pd.Int64Index(np.arange(N * 5)) + def setup(self, keep, dtype): + N = 10 ** 
5 + data = { + "int": pd.Int64Index(np.arange(N).repeat(5)), + "uint": pd.UInt64Index(np.arange(N).repeat(5)), + "float": pd.Float64Index(np.random.randn(N).repeat(5)), + "string": tm.makeStringIndex(N).repeat(5), + } + self.idx = data[dtype] # cache is_unique - self.idx_int_dup.is_unique - - def time_duplicated_unique_int(self): - self.idx_int_dup.duplicated() + self.idx.is_unique + def time_duplicated(self, keep, dtype): + self.idx.duplicated(keep=keep) -class Match(object): - goal_time = 0.2 +class DuplicatedUniqueIndex: - def setup(self): - self.uniques = tm.makeStringIndex(1000).values - self.all = self.uniques.repeat(10) - - def time_match_string(self): - with warnings.catch_warnings(record=True): - pd.match(self.all, self.uniques) + params = ["int", "uint", "float", "string"] + param_names = ["dtype"] + def setup(self, dtype): + N = 10 ** 5 + data = { + "int": pd.Int64Index(np.arange(N)), + "uint": pd.UInt64Index(np.arange(N)), + "float": pd.Float64Index(np.random.randn(N)), + "string": tm.makeStringIndex(N), + } + self.idx = data[dtype] + # cache is_unique + self.idx.is_unique -class Hashing(object): + def time_duplicated_unique(self, dtype): + self.idx.duplicated() - goal_time = 0.2 +class Hashing: def setup_cache(self): - N = 10**5 + N = 10 ** 5 df = pd.DataFrame( - {'strings': pd.Series(tm.makeStringIndex(10000).take( - np.random.randint(0, 10000, size=N))), - 'floats': np.random.randn(N), - 'ints': np.arange(N), - 'dates': pd.date_range('20110101', freq='s', periods=N), - 'timedeltas': pd.timedelta_range('1 day', freq='s', periods=N)}) - df['categories'] = df['strings'].astype('category') + { + "strings": pd.Series( + tm.makeStringIndex(10000).take(np.random.randint(0, 10000, size=N)) + ), + "floats": np.random.randn(N), + "ints": np.arange(N), + "dates": pd.date_range("20110101", freq="s", periods=N), + "timedeltas": pd.timedelta_range("1 day", freq="s", periods=N), + } + ) + df["categories"] = df["strings"].astype("category") df.iloc[10:20] = np.nan return df @@ -110,19 +132,55 @@ def time_frame(self, df): hashing.hash_pandas_object(df) def time_series_int(self, df): - hashing.hash_pandas_object(df['ints']) + hashing.hash_pandas_object(df["ints"]) def time_series_string(self, df): - hashing.hash_pandas_object(df['strings']) + hashing.hash_pandas_object(df["strings"]) def time_series_float(self, df): - hashing.hash_pandas_object(df['floats']) + hashing.hash_pandas_object(df["floats"]) def time_series_categorical(self, df): - hashing.hash_pandas_object(df['categories']) + hashing.hash_pandas_object(df["categories"]) def time_series_timedeltas(self, df): - hashing.hash_pandas_object(df['timedeltas']) + hashing.hash_pandas_object(df["timedeltas"]) def time_series_dates(self, df): - hashing.hash_pandas_object(df['dates']) + hashing.hash_pandas_object(df["dates"]) + + +class Quantile: + params = [ + [0, 0.5, 1], + ["linear", "nearest", "lower", "higher", "midpoint"], + ["float", "int", "uint"], + ] + param_names = ["quantile", "interpolation", "dtype"] + + def setup(self, quantile, interpolation, dtype): + N = 10 ** 5 + data = { + "int": np.arange(N), + "uint": np.arange(N).astype(np.uint64), + "float": np.random.randn(N), + } + self.idx = pd.Series(data[dtype].repeat(5)) + + def time_quantile(self, quantile, interpolation, dtype): + self.idx.quantile(quantile, interpolation=interpolation) + + +class SortIntegerArray: + params = [10 ** 3, 10 ** 5] + + def setup(self, N): + data = np.arange(N, dtype=float) + data[40] = np.nan + self.array = pd.array(data, dtype="Int64") + + def 
time_argsort(self, N): + self.array.argsort() + + +from .pandas_vb_common import setup # noqa: F401 isort:skip diff --git a/asv_bench/benchmarks/attrs_caching.py b/asv_bench/benchmarks/attrs_caching.py index 48f0b7d71144c..501e27b9078ec 100644 --- a/asv_bench/benchmarks/attrs_caching.py +++ b/asv_bench/benchmarks/attrs_caching.py @@ -1,17 +1,14 @@ import numpy as np + from pandas import DataFrame + try: from pandas.util import cache_readonly except ImportError: from pandas.util.decorators import cache_readonly -from .pandas_vb_common import setup # noqa - - -class DataFrameAttributes(object): - - goal_time = 0.2 +class DataFrameAttributes: def setup(self): self.df = DataFrame(np.random.randn(10, 6)) self.cur_index = self.df.index @@ -23,18 +20,17 @@ def time_set_index(self): self.df.index = self.cur_index -class CacheReadonly(object): - - goal_time = 0.2 - +class CacheReadonly: def setup(self): - class Foo: - @cache_readonly def prop(self): return 5 + self.obj = Foo() def time_cache_readonly(self): self.obj.prop + + +from .pandas_vb_common import setup # noqa: F401 isort:skip diff --git a/asv_bench/benchmarks/binary_ops.py b/asv_bench/benchmarks/binary_ops.py index cc8766e1fa39c..58e0db67d6025 100644 --- a/asv_bench/benchmarks/binary_ops.py +++ b/asv_bench/benchmarks/binary_ops.py @@ -1,26 +1,24 @@ import numpy as np + from pandas import DataFrame, Series, date_range from pandas.core.algorithms import checked_add_with_arr + try: import pandas.core.computation.expressions as expr except ImportError: import pandas.computation.expressions as expr -from .pandas_vb_common import setup # noqa - - -class Ops(object): - goal_time = 0.2 +class Ops: - params = [[True, False], ['default', 1]] - param_names = ['use_numexpr', 'threads'] + params = [[True, False], ["default", 1]] + param_names = ["use_numexpr", "threads"] def setup(self, use_numexpr, threads): self.df = DataFrame(np.random.randn(20000, 100)) self.df2 = DataFrame(np.random.randn(20000, 100)) - if threads != 'default': + if threads != "default": expr.set_numexpr_threads(threads) if not use_numexpr: expr.set_use_numexpr(False) @@ -42,21 +40,24 @@ def teardown(self, use_numexpr, threads): expr.set_numexpr_threads() -class Ops2(object): - - goal_time = 0.2 - +class Ops2: def setup(self): - N = 10**3 + N = 10 ** 3 self.df = DataFrame(np.random.randn(N, N)) self.df2 = DataFrame(np.random.randn(N, N)) - self.df_int = DataFrame(np.random.randint(np.iinfo(np.int16).min, - np.iinfo(np.int16).max, - size=(N, N))) - self.df2_int = DataFrame(np.random.randint(np.iinfo(np.int16).min, - np.iinfo(np.int16).max, - size=(N, N))) + self.df_int = DataFrame( + np.random.randint( + np.iinfo(np.int16).min, np.iinfo(np.int16).max, size=(N, N) + ) + ) + self.df2_int = DataFrame( + np.random.randint( + np.iinfo(np.int16).min, np.iinfo(np.int16).max, size=(N, N) + ) + ) + + self.s = Series(np.random.randn(N)) # Division @@ -80,21 +81,30 @@ def time_frame_int_mod(self): def time_frame_float_mod(self): self.df % self.df2 + # Dot product + + def time_frame_dot(self): + self.df.dot(self.df2) -class Timeseries(object): + def time_series_dot(self): + self.s.dot(self.s) - goal_time = 0.2 + def time_frame_series_dot(self): + self.df.dot(self.s) - params = [None, 'US/Eastern'] - param_names = ['tz'] + +class Timeseries: + + params = [None, "US/Eastern"] + param_names = ["tz"] def setup(self, tz): - N = 10**6 + N = 10 ** 6 halfway = (N // 2) - 1 - self.s = Series(date_range('20010101', periods=N, freq='T', tz=tz)) + self.s = Series(date_range("20010101", periods=N, 
freq="T", tz=tz)) self.ts = self.s[halfway] - self.s2 = Series(date_range('20010101', periods=N, freq='s', tz=tz)) + self.s2 = Series(date_range("20010101", periods=N, freq="s", tz=tz)) def time_series_timestamp_compare(self, tz): self.s <= self.ts @@ -109,27 +119,22 @@ def time_timestamp_ops_diff_with_shift(self, tz): self.s - self.s.shift() -class AddOverflowScalar(object): - - goal_time = 0.2 +class AddOverflowScalar: params = [1, -1, 0] - param_names = ['scalar'] + param_names = ["scalar"] def setup(self, scalar): - N = 10**6 + N = 10 ** 6 self.arr = np.arange(N) def time_add_overflow_scalar(self, scalar): checked_add_with_arr(self.arr, scalar) -class AddOverflowArray(object): - - goal_time = 0.2 - +class AddOverflowArray: def setup(self): - N = 10**6 + N = 10 ** 6 self.arr = np.arange(N) self.arr_rev = np.arange(-N, 0) self.arr_mixed = np.array([1, -1]).repeat(N / 2) @@ -143,9 +148,12 @@ def time_add_overflow_arr_mask_nan(self): checked_add_with_arr(self.arr, self.arr_mixed, arr_mask=self.arr_nan_1) def time_add_overflow_b_mask_nan(self): - checked_add_with_arr(self.arr, self.arr_mixed, - b_mask=self.arr_nan_1) + checked_add_with_arr(self.arr, self.arr_mixed, b_mask=self.arr_nan_1) def time_add_overflow_both_arg_nan(self): - checked_add_with_arr(self.arr, self.arr_mixed, arr_mask=self.arr_nan_1, - b_mask=self.arr_nan_2) + checked_add_with_arr( + self.arr, self.arr_mixed, arr_mask=self.arr_nan_1, b_mask=self.arr_nan_2 + ) + + +from .pandas_vb_common import setup # noqa: F401 isort:skip diff --git a/asv_bench/benchmarks/categoricals.py b/asv_bench/benchmarks/categoricals.py index 7743921003353..559aa7050a640 100644 --- a/asv_bench/benchmarks/categoricals.py +++ b/asv_bench/benchmarks/categoricals.py @@ -1,8 +1,10 @@ import warnings import numpy as np + import pandas as pd import pandas.util.testing as tm + try: from pandas.api.types import union_categoricals except ImportError: @@ -11,19 +13,14 @@ except ImportError: pass -from .pandas_vb_common import setup # noqa - - -class Concat(object): - - goal_time = 0.2 +class Concat: def setup(self): - N = 10**5 - self.s = pd.Series(list('aabbcd') * N).astype('category') + N = 10 ** 5 + self.s = pd.Series(list("aabbcd") * N).astype("category") - self.a = pd.Categorical(list('aabbcd') * N) - self.b = pd.Categorical(list('bbcdjk') * N) + self.a = pd.Categorical(list("aabbcd") * N) + self.b = pd.Categorical(list("bbcdjk") * N) def time_concat(self): pd.concat([self.s, self.s]) @@ -32,25 +29,25 @@ def time_union(self): union_categoricals([self.a, self.b]) -class Constructor(object): - - goal_time = 0.2 - +class Constructor: def setup(self): - N = 10**5 - self.categories = list('abcde') + N = 10 ** 5 + self.categories = list("abcde") self.cat_idx = pd.Index(self.categories) self.values = np.tile(self.categories, N) self.codes = np.tile(range(len(self.categories)), N) - self.datetimes = pd.Series(pd.date_range('1995-01-01 00:00:00', - periods=N / 10, - freq='s')) + self.datetimes = pd.Series( + pd.date_range("1995-01-01 00:00:00", periods=N / 10, freq="s") + ) self.datetimes_with_nat = self.datetimes.copy() self.datetimes_with_nat.iloc[-1] = pd.NaT self.values_some_nan = list(np.tile(self.categories + [np.nan], N)) self.values_all_nan = [np.nan] * len(self.values) + self.values_all_int8 = np.ones(N, "int8") + self.categorical = pd.Categorical(self.values, self.categories) + self.series = pd.Series(self.categorical) def time_regular(self): pd.Categorical(self.values, self.categories) @@ -70,66 +67,74 @@ def time_with_nan(self): def time_all_nan(self): 
pd.Categorical(self.values_all_nan) + def time_from_codes_all_int8(self): + pd.Categorical.from_codes(self.values_all_int8, self.categories) -class ValueCounts(object): + def time_existing_categorical(self): + pd.Categorical(self.categorical) - goal_time = 0.2 + def time_existing_series(self): + pd.Categorical(self.series) + + +class ValueCounts: params = [True, False] - param_names = ['dropna'] + param_names = ["dropna"] def setup(self, dropna): - n = 5 * 10**5 - arr = ['s%04d' % i for i in np.random.randint(0, n // 10, size=n)] - self.ts = pd.Series(arr).astype('category') + n = 5 * 10 ** 5 + arr = ["s{:04d}".format(i) for i in np.random.randint(0, n // 10, size=n)] + self.ts = pd.Series(arr).astype("category") def time_value_counts(self, dropna): self.ts.value_counts(dropna=dropna) -class Repr(object): - - goal_time = 0.2 - +class Repr: def setup(self): - self.sel = pd.Series(['s1234']).astype('category') + self.sel = pd.Series(["s1234"]).astype("category") def time_rendering(self): str(self.sel) -class SetCategories(object): - - goal_time = 0.2 - +class SetCategories: def setup(self): - n = 5 * 10**5 - arr = ['s%04d' % i for i in np.random.randint(0, n // 10, size=n)] - self.ts = pd.Series(arr).astype('category') + n = 5 * 10 ** 5 + arr = ["s{:04d}".format(i) for i in np.random.randint(0, n // 10, size=n)] + self.ts = pd.Series(arr).astype("category") def time_set_categories(self): self.ts.cat.set_categories(self.ts.cat.categories[::2]) -class Rank(object): +class RemoveCategories: + def setup(self): + n = 5 * 10 ** 5 + arr = ["s{:04d}".format(i) for i in np.random.randint(0, n // 10, size=n)] + self.ts = pd.Series(arr).astype("category") + + def time_remove_categories(self): + self.ts.cat.remove_categories(self.ts.cat.categories[::2]) - goal_time = 0.2 +class Rank: def setup(self): - N = 10**5 + N = 10 ** 5 ncats = 100 self.s_str = pd.Series(tm.makeCategoricalIndex(N, ncats)).astype(str) - self.s_str_cat = self.s_str.astype('category') + self.s_str_cat = pd.Series(self.s_str, dtype="category") with warnings.catch_warnings(record=True): - self.s_str_cat_ordered = self.s_str.astype('category', - ordered=True) + str_cat_type = pd.CategoricalDtype(set(self.s_str), ordered=True) + self.s_str_cat_ordered = self.s_str.astype(str_cat_type) self.s_int = pd.Series(np.random.randint(0, ncats, size=N)) - self.s_int_cat = self.s_int.astype('category') + self.s_int_cat = pd.Series(self.s_int, dtype="category") with warnings.catch_warnings(record=True): - self.s_int_cat_ordered = self.s_int.astype('category', - ordered=True) + int_cat_type = pd.CategoricalDtype(set(self.s_int), ordered=True) + self.s_int_cat_ordered = self.s_int.astype(int_cat_type) def time_rank_string(self): self.s_str.rank() @@ -148,3 +153,133 @@ def time_rank_int_cat(self): def time_rank_int_cat_ordered(self): self.s_int_cat_ordered.rank() + + +class Isin: + + params = ["object", "int64"] + param_names = ["dtype"] + + def setup(self, dtype): + np.random.seed(1234) + n = 5 * 10 ** 5 + sample_size = 100 + arr = [i for i in np.random.randint(0, n // 10, size=n)] + if dtype == "object": + arr = ["s{:04d}".format(i) for i in arr] + self.sample = np.random.choice(arr, sample_size) + self.series = pd.Series(arr).astype("category") + + def time_isin_categorical(self, dtype): + self.series.isin(self.sample) + + +class IsMonotonic: + def setup(self): + N = 1000 + self.c = pd.CategoricalIndex(list("a" * N + "b" * N + "c" * N)) + self.s = pd.Series(self.c) + + def time_categorical_index_is_monotonic_increasing(self): + 
self.c.is_monotonic_increasing + + def time_categorical_index_is_monotonic_decreasing(self): + self.c.is_monotonic_decreasing + + def time_categorical_series_is_monotonic_increasing(self): + self.s.is_monotonic_increasing + + def time_categorical_series_is_monotonic_decreasing(self): + self.s.is_monotonic_decreasing + + +class Contains: + def setup(self): + N = 10 ** 5 + self.ci = tm.makeCategoricalIndex(N) + self.c = self.ci.values + self.key = self.ci.categories[0] + + def time_categorical_index_contains(self): + self.key in self.ci + + def time_categorical_contains(self): + self.key in self.c + + +class CategoricalSlicing: + + params = ["monotonic_incr", "monotonic_decr", "non_monotonic"] + param_names = ["index"] + + def setup(self, index): + N = 10 ** 6 + categories = ["a", "b", "c"] + values = [0] * N + [1] * N + [2] * N + if index == "monotonic_incr": + self.data = pd.Categorical.from_codes(values, categories=categories) + elif index == "monotonic_decr": + self.data = pd.Categorical.from_codes( + list(reversed(values)), categories=categories + ) + elif index == "non_monotonic": + self.data = pd.Categorical.from_codes([0, 1, 2] * N, categories=categories) + else: + raise ValueError("Invalid index param: {}".format(index)) + + self.scalar = 10000 + self.list = list(range(10000)) + self.cat_scalar = "b" + + def time_getitem_scalar(self, index): + self.data[self.scalar] + + def time_getitem_slice(self, index): + self.data[: self.scalar] + + def time_getitem_list_like(self, index): + self.data[[self.scalar]] + + def time_getitem_list(self, index): + self.data[self.list] + + def time_getitem_bool_array(self, index): + self.data[self.data == self.cat_scalar] + + +class Indexing: + def setup(self): + N = 10 ** 5 + self.index = pd.CategoricalIndex(range(N), range(N)) + self.series = pd.Series(range(N), index=self.index).sort_index() + self.category = self.index[500] + + def time_get_loc(self): + self.index.get_loc(self.category) + + def time_shape(self): + self.index.shape + + def time_shallow_copy(self): + self.index._shallow_copy() + + def time_align(self): + pd.DataFrame({"a": self.series, "b": self.series[:500]}) + + def time_intersection(self): + self.index[:750].intersection(self.index[250:]) + + def time_unique(self): + self.index.unique() + + def time_reindex(self): + self.index.reindex(self.index[:500]) + + def time_reindex_missing(self): + self.index.reindex(["a", "b", "c", "d"]) + + def time_sort_values(self): + self.index.sort_values(ascending=False) + + +from .pandas_vb_common import setup # noqa: F401 isort:skip diff --git a/asv_bench/benchmarks/ctors.py b/asv_bench/benchmarks/ctors.py index 3f9016787aab4..ec3dd7a48a89f 100644 --- a/asv_bench/benchmarks/ctors.py +++ b/asv_bench/benchmarks/ctors.py @@ -1,45 +1,96 @@ import numpy as np + +from pandas import DatetimeIndex, Index, MultiIndex, Series, Timestamp import pandas.util.testing as tm -from pandas import Series, Index, DatetimeIndex, Timestamp, MultiIndex -from .pandas_vb_common import setup # noqa + +def no_change(arr): + return arr + + +def list_of_str(arr): + return list(arr.astype(str)) + + +def gen_of_str(arr): + return (x for x in arr.astype(str)) + + +def arr_dict(arr): + return dict(zip(range(len(arr)), arr)) + + +def list_of_tuples(arr): + return [(i, -i) for i in arr] + + +def gen_of_tuples(arr): + return ((i, -i) for i in arr) -class SeriesConstructors(object): +def list_of_lists(arr): + return [[i, -i] for i in arr] - goal_time = 0.2 - param_names = ["data_fmt", "with_index"] - params = [[lambda x: x, - list, - 
lambda arr: list(arr.astype(str)), - lambda arr: dict(zip(range(len(arr)), arr)), - lambda arr: [(i, -i) for i in arr], - lambda arr: [[i, -i] for i in arr], - lambda arr: ([(i, -i) for i in arr][:-1] + [None]), - lambda arr: ([[i, -i] for i in arr][:-1] + [None])], - [False, True]] +def list_of_tuples_with_none(arr): + return [(i, -i) for i in arr][:-1] + [None] - def setup(self, data_fmt, with_index): - N = 10**4 - arr = np.random.randn(N) + +def list_of_lists_with_none(arr): + return [[i, -i] for i in arr][:-1] + [None] + + +class SeriesConstructors: + + param_names = ["data_fmt", "with_index", "dtype"] + params = [ + [ + no_change, + list, + list_of_str, + gen_of_str, + arr_dict, + list_of_tuples, + gen_of_tuples, + list_of_lists, + list_of_tuples_with_none, + list_of_lists_with_none, + ], + [False, True], + ["float", "int"], + ] + + # Generators get exhausted on use, so run setup before every call + number = 1 + repeat = (3, 250, 10) + + def setup(self, data_fmt, with_index, dtype): + if data_fmt in (gen_of_str, gen_of_tuples) and with_index: + raise NotImplementedError( + "Series constructors do not support " "using generators with indexes" + ) + N = 10 ** 4 + if dtype == "float": + arr = np.random.randn(N) + else: + arr = np.arange(N) self.data = data_fmt(arr) self.index = np.arange(N) if with_index else None - def time_series_constructor(self, data_fmt, with_index): + def time_series_constructor(self, data_fmt, with_index, dtype): Series(self.data, index=self.index) -class SeriesDtypesConstructors(object): - - goal_time = 0.2 - +class SeriesDtypesConstructors: def setup(self): - N = 10**4 - self.arr = np.random.randn(N, N) - self.arr_str = np.array(['foo', 'bar', 'baz'], dtype=object) - self.s = Series([Timestamp('20110101'), Timestamp('20120101'), - Timestamp('20130101')] * N * 10) + N = 10 ** 4 + self.arr = np.random.randn(N) + self.arr_str = np.array(["foo", "bar", "baz"], dtype=object) + self.s = Series( + [Timestamp("20110101"), Timestamp("20120101"), Timestamp("20130101")] + * N + * 10 + ) def time_index_from_array_string(self): Index(self.arr_str) @@ -54,13 +105,13 @@ def time_dtindex_from_index_with_series(self): Index(self.s) -class MultiIndexConstructor(object): - - goal_time = 0.2 - +class MultiIndexConstructor: def setup(self): - N = 10**4 + N = 10 ** 4 self.iterables = [tm.makeStringIndex(N), range(20)] def time_multiindex_from_iterables(self): MultiIndex.from_product(self.iterables) + + +from .pandas_vb_common import setup # noqa: F401 isort:skip diff --git a/asv_bench/benchmarks/dtypes.py b/asv_bench/benchmarks/dtypes.py new file mode 100644 index 0000000000000..24cc1c6f9fa70 --- /dev/null +++ b/asv_bench/benchmarks/dtypes.py @@ -0,0 +1,43 @@ +import numpy as np + +from pandas.api.types import pandas_dtype + +from .pandas_vb_common import ( + datetime_dtypes, + extension_dtypes, + numeric_dtypes, + string_dtypes, +) + +_numpy_dtypes = [ + np.dtype(dtype) for dtype in (numeric_dtypes + datetime_dtypes + string_dtypes) +] +_dtypes = _numpy_dtypes + extension_dtypes + + +class Dtypes: + params = _dtypes + list(map(lambda dt: dt.name, _dtypes)) + param_names = ["dtype"] + + def time_pandas_dtype(self, dtype): + pandas_dtype(dtype) + + +class DtypesInvalid: + param_names = ["dtype"] + params = ["scalar-string", "scalar-int", "list-string", "array-string"] + data_dict = { + "scalar-string": "foo", + "scalar-int": 1, + "list-string": ["foo"] * 1000, + "array-string": np.array(["foo"] * 1000), + } + + def time_pandas_dtype_invalid(self, dtype): + try: + 
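The new dtypes.py module benchmarks both the success and the failure path of `pandas_dtype`. For orientation, a small sketch of the behavior being timed (calls I am confident hold for this API):

```python
from pandas.api.types import pandas_dtype

# Success path: dtype names and numpy dtypes normalize to a dtype object.
pandas_dtype("int64")           # -> dtype('int64')
pandas_dtype("datetime64[ns]")  # -> dtype('<M8[ns]')

# Failure path: DtypesInvalid times exactly this raise-and-catch
# round trip for inputs that are not valid dtypes.
try:
    pandas_dtype(["foo"] * 3)   # a list is not a dtype
except TypeError:
    pass
```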
pandas_dtype(self.data_dict[dtype]) + except TypeError: + pass + + +from .pandas_vb_common import setup # noqa: F401 isort:skip diff --git a/asv_bench/benchmarks/eval.py b/asv_bench/benchmarks/eval.py index 8e581dcf22b4c..06a181875aaa8 100644 --- a/asv_bench/benchmarks/eval.py +++ b/asv_bench/benchmarks/eval.py @@ -1,19 +1,17 @@ import numpy as np + import pandas as pd + try: import pandas.core.computation.expressions as expr except ImportError: import pandas.computation.expressions as expr -from .pandas_vb_common import setup # noqa - - -class Eval(object): - goal_time = 0.2 +class Eval: - params = [['numexpr', 'python'], [1, 'all']] - param_names = ['engine', 'threads'] + params = [["numexpr", "python"], [1, "all"]] + param_names = ["engine", "threads"] def setup(self, engine, threads): self.df = pd.DataFrame(np.random.randn(20000, 100)) @@ -25,43 +23,44 @@ def setup(self, engine, threads): expr.set_numexpr_threads(1) def time_add(self, engine, threads): - pd.eval('self.df + self.df2 + self.df3 + self.df4', engine=engine) + pd.eval("self.df + self.df2 + self.df3 + self.df4", engine=engine) def time_and(self, engine, threads): - pd.eval('(self.df > 0) & (self.df2 > 0) & ' - '(self.df3 > 0) & (self.df4 > 0)', engine=engine) + pd.eval( + "(self.df > 0) & (self.df2 > 0) & " "(self.df3 > 0) & (self.df4 > 0)", + engine=engine, + ) def time_chained_cmp(self, engine, threads): - pd.eval('self.df < self.df2 < self.df3 < self.df4', engine=engine) + pd.eval("self.df < self.df2 < self.df3 < self.df4", engine=engine) def time_mult(self, engine, threads): - pd.eval('self.df * self.df2 * self.df3 * self.df4', engine=engine) + pd.eval("self.df * self.df2 * self.df3 * self.df4", engine=engine) def teardown(self, engine, threads): expr.set_numexpr_threads() -class Query(object): - - goal_time = 0.2 - +class Query: def setup(self): - N = 10**6 + N = 10 ** 6 halfway = (N // 2) - 1 - index = pd.date_range('20010101', periods=N, freq='T') + index = pd.date_range("20010101", periods=N, freq="T") s = pd.Series(index) self.ts = s.iloc[halfway] - self.df = pd.DataFrame({'a': np.random.randn(N), 'dates': s}, - index=index) + self.df = pd.DataFrame({"a": np.random.randn(N), "dates": index}, index=index) data = np.random.randn(N) self.min_val = data.min() self.max_val = data.max() def time_query_datetime_index(self): - self.df.query('index < @self.ts') + self.df.query("index < @self.ts") def time_query_datetime_column(self): - self.df.query('dates < @self.ts') + self.df.query("dates < @self.ts") def time_query_with_boolean_selection(self): - self.df.query('(a >= @self.min_val) & (a <= @self.max_val)') + self.df.query("(a >= @self.min_val) & (a <= @self.max_val)") + + +from .pandas_vb_common import setup # noqa: F401 isort:skip diff --git a/asv_bench/benchmarks/frame_ctor.py b/asv_bench/benchmarks/frame_ctor.py index 21b20cb123ed6..3944e0bc523d8 100644 --- a/asv_bench/benchmarks/frame_ctor.py +++ b/asv_bench/benchmarks/frame_ctor.py @@ -1,29 +1,24 @@ import numpy as np + +from pandas import DataFrame, MultiIndex, Series, Timestamp, date_range import pandas.util.testing as tm -from pandas import DataFrame, Series, MultiIndex, Timestamp, date_range + try: from pandas.tseries.offsets import Nano, Hour except ImportError: # For compatibility with older versions - from pandas.core.datetools import * # noqa - -from .pandas_vb_common import setup # noqa - + from pandas.core.datetools import * # noqa -class FromDicts(object): - - goal_time = 0.2 +class FromDicts: def setup(self): N, K = 5000, 50 - index = 
tm.makeStringIndex(N) - columns = tm.makeStringIndex(K) - frame = DataFrame(np.random.randn(N, K), index=index, columns=columns) + self.index = tm.makeStringIndex(N) + self.columns = tm.makeStringIndex(K) + frame = DataFrame(np.random.randn(N, K), index=self.index, columns=self.columns) self.data = frame.to_dict() - self.some_dict = list(self.data.values())[0] - self.dict_list = frame.to_dict(orient='records') - self.data2 = {i: {j: float(j) for j in range(100)} - for i in range(2000)} + self.dict_list = frame.to_dict(orient="records") + self.data2 = {i: {j: float(j) for j in range(100)} for i in range(2000)} def time_list_of_dict(self): DataFrame(self.dict_list) @@ -31,18 +26,21 @@ def time_list_of_dict(self): def time_nested_dict(self): DataFrame(self.data) - def time_dict(self): - Series(self.some_dict) + def time_nested_dict_index(self): + DataFrame(self.data, index=self.index) + + def time_nested_dict_columns(self): + DataFrame(self.data, columns=self.columns) + + def time_nested_dict_index_columns(self): + DataFrame(self.data, index=self.index, columns=self.columns) def time_nested_dict_int64(self): # nested dict, integer indexes, regression described in #621 DataFrame(self.data2) -class FromSeries(object): - - goal_time = 0.2 - +class FromSeries: def setup(self): mi = MultiIndex.from_product([range(100), range(100)]) self.s = Series(np.random.randn(10000), index=mi) @@ -51,16 +49,15 @@ def time_mi_series(self): DataFrame(self.s) -class FromDictwithTimestamp(object): +class FromDictwithTimestamp: - goal_time = 0.2 params = [Nano(1), Hour(1)] - param_names = ['offset'] + param_names = ["offset"] def setup(self, offset): - N = 10**3 + N = 10 ** 3 np.random.seed(1234) - idx = date_range(Timestamp('1/1/1900'), freq=offset, periods=N) + idx = date_range(Timestamp("1/1/1900"), freq=offset, periods=N) df = DataFrame(np.random.randn(N, 10), index=idx) self.d = df.to_dict() @@ -68,11 +65,14 @@ def time_dict_with_timestamp_offsets(self, offset): DataFrame(self.d) -class FromRecords(object): +class FromRecords: - goal_time = 0.2 params = [None, 1000] - param_names = ['nrows'] + param_names = ["nrows"] + + # Generators get exhausted on use, so run setup before every call + number = 1 + repeat = (3, 250, 10) def setup(self, nrows): N = 100000 @@ -83,13 +83,26 @@ def time_frame_from_records_generator(self, nrows): self.df = DataFrame.from_records(self.gen, nrows=nrows) -class FromNDArray(object): - - goal_time = 0.2 - +class FromNDArray: def setup(self): N = 100000 self.data = np.random.randn(N) def time_frame_from_ndarray(self): self.df = DataFrame(self.data) + + +class FromLists: + + goal_time = 0.2 + + def setup(self): + N = 1000 + M = 100 + self.data = [[j for j in range(M)] for i in range(N)] + + def time_frame_from_lists(self): + self.df = DataFrame(self.data) + + +from .pandas_vb_common import setup # noqa: F401 isort:skip diff --git a/asv_bench/benchmarks/frame_methods.py b/asv_bench/benchmarks/frame_methods.py index 4ff71c706cd34..05f98c66faa2b 100644 --- a/asv_bench/benchmarks/frame_methods.py +++ b/asv_bench/benchmarks/frame_methods.py @@ -2,42 +2,34 @@ import warnings import numpy as np -import pandas.util.testing as tm -from pandas import (DataFrame, Series, MultiIndex, date_range, period_range, - isnull, NaT) - -from .pandas_vb_common import setup # noqa - -class GetNumericData(object): +from pandas import DataFrame, MultiIndex, NaT, Series, date_range, isnull, period_range +import pandas.util.testing as tm - goal_time = 0.2 +class GetNumericData: def setup(self): self.df = 
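FromRecords above pins `number = 1` because its input is a generator: a generator is exhausted after one pass, so letting ASV call the timed body several times per `setup()` would time an empty iterator from the second call on. A sketch of the idiom, reading the `repeat` tuple as ASV's `(min_repeat, max_repeat, max_time)` convention (my reading, not stated in the patch):

```python
import pandas as pd

class FromRecordsLike:
    # One timed call per setup(), so each timing sees a fresh generator.
    number = 1
    repeat = (3, 250, 10.0)

    def setup(self):
        self.gen = ((i, -i) for i in range(100_000))

    def time_from_records(self):
        pd.DataFrame.from_records(self.gen, nrows=1000)
```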
DataFrame(np.random.randn(10000, 25)) - self.df['foo'] = 'bar' - self.df['bar'] = 'baz' - with warnings.catch_warnings(record=True): - self.df = self.df.consolidate() + self.df["foo"] = "bar" + self.df["bar"] = "baz" + self.df = self.df._consolidate() def time_frame_get_numeric_data(self): self.df._get_numeric_data() -class Lookup(object): - - goal_time = 0.2 - +class Lookup: def setup(self): - self.df = DataFrame(np.random.randn(10000, 8), - columns=list('abcdefgh')) - self.df['foo'] = 'bar' + self.df = DataFrame(np.random.randn(10000, 8), columns=list("abcdefgh")) + self.df["foo"] = "bar" self.row_labels = list(self.df.index[::10])[:900] self.col_labels = list(self.df.columns) * 100 self.row_labels_all = np.array( - list(self.df.index) * len(self.df.columns), dtype='object') + list(self.df.index) * len(self.df.columns), dtype="object" + ) self.col_labels_all = np.array( - list(self.df.columns) * len(self.df.index), dtype='object') + list(self.df.columns) * len(self.df.index), dtype="object" + ) def time_frame_fancy_lookup(self): self.df.lookup(self.row_labels, self.col_labels) @@ -46,20 +38,22 @@ def time_frame_fancy_lookup_all(self): self.df.lookup(self.row_labels_all, self.col_labels_all) -class Reindex(object): - - goal_time = 0.2 - +class Reindex: def setup(self): - N = 10**3 + N = 10 ** 3 self.df = DataFrame(np.random.randn(N * 10, N)) self.idx = np.arange(4 * N, 7 * N) self.df2 = DataFrame( - {c: {0: np.random.randint(0, 2, N).astype(np.bool_), - 1: np.random.randint(0, N, N).astype(np.int16), - 2: np.random.randint(0, N, N).astype(np.int32), - 3: np.random.randint(0, N, N).astype(np.int64)} - [np.random.randint(0, 4)] for c in range(N)}) + { + c: { + 0: np.random.randint(0, 2, N).astype(np.bool_), + 1: np.random.randint(0, N, N).astype(np.int16), + 2: np.random.randint(0, N, N).astype(np.int32), + 3: np.random.randint(0, N, N).astype(np.int64), + }[np.random.randint(0, 4)] + for c in range(N) + } + ) def time_reindex_axis0(self): self.df.reindex(self.idx) @@ -70,52 +64,142 @@ def time_reindex_axis1(self): def time_reindex_both_axes(self): self.df.reindex(index=self.idx, columns=self.idx) - def time_reindex_both_axes_ix(self): - self.df.ix[self.idx, self.idx] - def time_reindex_upcast(self): self.df2.reindex(np.random.permutation(range(1200))) -class Iteration(object): +class Rename: + def setup(self): + N = 10 ** 3 + self.df = DataFrame(np.random.randn(N * 10, N)) + self.idx = np.arange(4 * N, 7 * N) + self.dict_idx = {k: k for k in self.idx} + self.df2 = DataFrame( + { + c: { + 0: np.random.randint(0, 2, N).astype(np.bool_), + 1: np.random.randint(0, N, N).astype(np.int16), + 2: np.random.randint(0, N, N).astype(np.int32), + 3: np.random.randint(0, N, N).astype(np.int64), + }[np.random.randint(0, 4)] + for c in range(N) + } + ) + + def time_rename_single(self): + self.df.rename({0: 0}) + + def time_rename_axis0(self): + self.df.rename(self.dict_idx) + + def time_rename_axis1(self): + self.df.rename(columns=self.dict_idx) + + def time_rename_both_axes(self): + self.df.rename(index=self.dict_idx, columns=self.dict_idx) + + def time_dict_rename_both_axes(self): + self.df.rename(index=self.dict_idx, columns=self.dict_idx) + - goal_time = 0.2 +class Iteration: + # mem_itertuples_* benchmarks are slow + timeout = 120 def setup(self): N = 1000 self.df = DataFrame(np.random.randn(N * 10, N)) self.df2 = DataFrame(np.random.randn(N * 50, 10)) - self.df3 = DataFrame(np.random.randn(N, 5 * N), - columns=['C' + str(c) for c in range(N * 5)]) + self.df3 = DataFrame( + np.random.randn(N, 5 * 
N), columns=["C" + str(c) for c in range(N * 5)] + ) + self.df4 = DataFrame(np.random.randn(N * 1000, 10)) - def time_iteritems(self): + def time_items(self): # (monitor no-copying behaviour) - if hasattr(self.df, '_item_cache'): + if hasattr(self.df, "_item_cache"): self.df._item_cache.clear() - for name, col in self.df.iteritems(): + for name, col in self.df.items(): pass - def time_iteritems_cached(self): - for name, col in self.df.iteritems(): + def time_items_cached(self): + for name, col in self.df.items(): pass def time_iteritems_indexing(self): for col in self.df3: self.df3[col] + def time_itertuples_start(self): + self.df4.itertuples() + + def time_itertuples_read_first(self): + next(self.df4.itertuples()) + def time_itertuples(self): - for row in self.df2.itertuples(): + for row in self.df4.itertuples(): pass - def time_iterrows(self): - for row in self.df.iterrows(): + def time_itertuples_to_list(self): + list(self.df4.itertuples()) + + def mem_itertuples_start(self): + return self.df4.itertuples() + + def peakmem_itertuples_start(self): + self.df4.itertuples() + + def mem_itertuples_read_first(self): + return next(self.df4.itertuples()) + + def peakmem_itertuples(self): + for row in self.df4.itertuples(): + pass + + def mem_itertuples_to_list(self): + return list(self.df4.itertuples()) + + def peakmem_itertuples_to_list(self): + list(self.df4.itertuples()) + + def time_itertuples_raw_start(self): + self.df4.itertuples(index=False, name=None) + + def time_itertuples_raw_read_first(self): + next(self.df4.itertuples(index=False, name=None)) + + def time_itertuples_raw_tuples(self): + for row in self.df4.itertuples(index=False, name=None): + pass + + def time_itertuples_raw_tuples_to_list(self): + list(self.df4.itertuples(index=False, name=None)) + + def mem_itertuples_raw_start(self): + return self.df4.itertuples(index=False, name=None) + + def peakmem_itertuples_raw_start(self): + self.df4.itertuples(index=False, name=None) + + def peakmem_itertuples_raw_read_first(self): + next(self.df4.itertuples(index=False, name=None)) + + def peakmem_itertuples_raw(self): + for row in self.df4.itertuples(index=False, name=None): pass + def mem_itertuples_raw_to_list(self): + return list(self.df4.itertuples(index=False, name=None)) + + def peakmem_itertuples_raw_to_list(self): + list(self.df4.itertuples(index=False, name=None)) -class ToString(object): + def time_iterrows(self): + for row in self.df.iterrows(): + pass - goal_time = 0.2 +class ToString: def setup(self): self.df = DataFrame(np.random.randn(100, 10)) @@ -123,24 +207,18 @@ def time_to_string_floats(self): self.df.to_string() -class ToHTML(object): - - goal_time = 0.2 - +class ToHTML: def setup(self): nrows = 500 self.df2 = DataFrame(np.random.randn(nrows, 10)) - self.df2[0] = period_range('2000', periods=nrows) + self.df2[0] = period_range("2000", periods=nrows) self.df2[1] = range(nrows) def time_to_html_mixed(self): self.df2.to_html() -class Repr(object): - - goal_time = 0.2 - +class Repr: def setup(self): nrows = 10000 data = np.random.randn(nrows, 10) @@ -164,10 +242,7 @@ def time_frame_repr_wide(self): repr(self.df_wide) -class MaskBool(object): - - goal_time = 0.2 - +class MaskBool: def setup(self): data = np.random.randn(1000, 500) df = DataFrame(data) @@ -182,12 +257,9 @@ def time_frame_mask_floats(self): self.bools.astype(float).mask(self.mask) -class Isnull(object): - - goal_time = 0.2 - +class Isnull: def setup(self): - N = 10**3 + N = 10 ** 3 self.df_no_null = DataFrame(np.random.randn(N, N)) sample = 
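The expanded Iteration class mixes three ASV method prefixes: `time_` measures wall-clock time, `mem_` reports the size of the object a benchmark returns, and `peakmem_` reports the process's peak memory while the body runs (standard ASV naming, as I understand it). Reduced to a skeleton:

```python
import numpy as np
import pandas as pd

class ItertuplesFootprint:
    def setup(self):
        self.df = pd.DataFrame(np.random.randn(10_000, 10))

    def time_to_list(self):
        # wall-clock cost of materializing every row tuple
        list(self.df.itertuples(index=False, name=None))

    def mem_result(self):
        # ASV sizes whatever the benchmark returns
        return list(self.df.itertuples(index=False, name=None))

    def peakmem_to_list(self):
        # peak RSS of the process while this body executes
        list(self.df.itertuples(index=False, name=None))
```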
np.array([np.nan, 1.0]) @@ -198,8 +270,20 @@ def setup(self): data = np.random.choice(sample, (N, N)) self.df_strings = DataFrame(data) - sample = np.array([NaT, np.nan, None, np.datetime64('NaT'), - np.timedelta64('NaT'), 0, 1, 2.0, '', 'abcd']) + sample = np.array( + [ + NaT, + np.nan, + None, + np.datetime64("NaT"), + np.timedelta64("NaT"), + 0, + 1, + 2.0, + "", + "abcd", + ] + ) data = np.random.choice(sample, (N, N)) self.df_obj = DataFrame(data) @@ -216,11 +300,10 @@ def time_isnull_obj(self): isnull(self.df_obj) -class Fillna(object): +class Fillna: - goal_time = 0.2 - params = ([True, False], ['pad', 'bfill']) - param_names = ['inplace', 'method'] + params = ([True, False], ["pad", "bfill"]) + param_names = ["inplace", "method"] def setup(self, inplace, method): values = np.random.randn(10000, 100) @@ -231,19 +314,19 @@ def time_frame_fillna(self, inplace, method): self.df.fillna(inplace=inplace, method=method) -class Dropna(object): +class Dropna: - goal_time = 0.2 - params = (['all', 'any'], [0, 1]) - param_names = ['how', 'axis'] + params = (["all", "any"], [0, 1]) + param_names = ["how", "axis"] def setup(self, how, axis): self.df = DataFrame(np.random.randn(10000, 1000)) - self.df.ix[50:1000, 20:50] = np.nan - self.df.ix[2000:3000] = np.nan - self.df.ix[:, 60:70] = np.nan + with warnings.catch_warnings(record=True): + self.df.ix[50:1000, 20:50] = np.nan + self.df.ix[2000:3000] = np.nan + self.df.ix[:, 60:70] = np.nan self.df_mixed = self.df.copy() - self.df_mixed['foo'] = 'bar' + self.df_mixed["foo"] = "bar" def time_dropna(self, how, axis): self.df.dropna(how=how, axis=axis) @@ -252,28 +335,28 @@ def time_dropna_axis_mixed_dtypes(self, how, axis): self.df_mixed.dropna(how=how, axis=axis) -class Count(object): - - goal_time = 0.2 +class Count: params = [0, 1] - param_names = ['axis'] + param_names = ["axis"] def setup(self, axis): self.df = DataFrame(np.random.randn(10000, 1000)) - self.df.ix[50:1000, 20:50] = np.nan - self.df.ix[2000:3000] = np.nan - self.df.ix[:, 60:70] = np.nan + with warnings.catch_warnings(record=True): + self.df.ix[50:1000, 20:50] = np.nan + self.df.ix[2000:3000] = np.nan + self.df.ix[:, 60:70] = np.nan self.df_mixed = self.df.copy() - self.df_mixed['foo'] = 'bar' + self.df_mixed["foo"] = "bar" self.df.index = MultiIndex.from_arrays([self.df.index, self.df.index]) - self.df.columns = MultiIndex.from_arrays([self.df.columns, - self.df.columns]) - self.df_mixed.index = MultiIndex.from_arrays([self.df_mixed.index, - self.df_mixed.index]) - self.df_mixed.columns = MultiIndex.from_arrays([self.df_mixed.columns, - self.df_mixed.columns]) + self.df.columns = MultiIndex.from_arrays([self.df.columns, self.df.columns]) + self.df_mixed.index = MultiIndex.from_arrays( + [self.df_mixed.index, self.df_mixed.index] + ) + self.df_mixed.columns = MultiIndex.from_arrays( + [self.df_mixed.columns, self.df_mixed.columns] + ) def time_count_level_multi(self, axis): self.df.count(axis=axis, level=1) @@ -282,16 +365,13 @@ def time_count_level_mixed_dtypes_multi(self, axis): self.df_mixed.count(axis=axis, level=1) -class Apply(object): - - goal_time = 0.2 - +class Apply: def setup(self): self.df = DataFrame(np.random.randn(1000, 100)) self.s = Series(np.arange(1028.0)) self.df2 = DataFrame({i: self.s for i in range(1028)}) - self.df3 = DataFrame(np.random.randn(1000, 3), columns=list('ABC')) + self.df3 = DataFrame(np.random.randn(1000, 3), columns=list("ABC")) def time_apply_user_func(self): self.df2.apply(lambda x: np.corrcoef(x, self.s)[(0, 1)]) @@ -309,13 +389,10 @@ def 
time_apply_pass_thru(self): self.df.apply(lambda x: x) def time_apply_ref_by_name(self): - self.df3.apply(lambda x: x['A'] + x['B'], axis=1) - - -class Dtypes(object): + self.df3.apply(lambda x: x["A"] + x["B"], axis=1) - goal_time = 0.2 +class Dtypes: def setup(self): self.df = DataFrame(np.random.randn(1000, 1000)) @@ -323,22 +400,19 @@ def time_frame_dtypes(self): self.df.dtypes -class Equals(object): - - goal_time = 0.2 - +class Equals: def setup(self): - N = 10**3 + N = 10 ** 3 self.float_df = DataFrame(np.random.randn(N, N)) self.float_df_nan = self.float_df.copy() self.float_df_nan.iloc[-1, -1] = np.nan - self.object_df = DataFrame('foo', index=range(N), columns=range(N)) + self.object_df = DataFrame("foo", index=range(N), columns=range(N)) self.object_df_nan = self.object_df.copy() self.object_df_nan.iloc[-1, -1] = np.nan self.nonunique_cols = self.object_df.copy() - self.nonunique_cols.columns = ['A'] * len(self.nonunique_cols.columns) + self.nonunique_cols.columns = ["A"] * len(self.nonunique_cols.columns) self.nonunique_cols_nan = self.nonunique_cols.copy() self.nonunique_cols_nan.iloc[-1, -1] = np.nan @@ -361,11 +435,10 @@ def time_frame_object_unequal(self): self.object_df.equals(self.object_df_nan) -class Interpolate(object): +class Interpolate: - goal_time = 0.2 - params = [None, 'infer'] - param_names = ['downcast'] + params = [None, "infer"] + param_names = ["downcast"] def setup(self, downcast): N = 10000 @@ -373,12 +446,16 @@ def setup(self, downcast): self.df = DataFrame(np.random.randn(N, 100)) self.df.values[::2] = np.nan - self.df2 = DataFrame({'A': np.arange(0, N), - 'B': np.random.randint(0, 100, N), - 'C': np.random.randn(N), - 'D': np.random.randn(N)}) - self.df2.loc[1::5, 'A'] = np.nan - self.df2.loc[1::5, 'C'] = np.nan + self.df2 = DataFrame( + { + "A": np.arange(0, N), + "B": np.random.randint(0, 100, N), + "C": np.random.randn(N), + "D": np.random.randn(N), + } + ) + self.df2.loc[1::5, "A"] = np.nan + self.df2.loc[1::5, "C"] = np.nan def time_interpolate(self, downcast): self.df.interpolate(downcast=downcast) @@ -387,11 +464,10 @@ def time_interpolate_some_good(self, downcast): self.df2.interpolate(downcast=downcast) -class Shift(object): +class Shift: # frame shift speedup issue-5609 - goal_time = 0.2 params = [0, 1] - param_names = ['axis'] + param_names = ["axis"] def setup(self, axis): self.df = DataFrame(np.random.rand(10000, 500)) @@ -400,8 +476,7 @@ def time_shift(self, axis): self.df.shift(1, axis=axis) -class Nunique(object): - +class Nunique: def setup(self): self.df = DataFrame(np.random.randn(10000, 1000)) @@ -409,17 +484,18 @@ def time_frame_nunique(self): self.df.nunique() -class Duplicated(object): - - goal_time = 0.2 - +class Duplicated: def setup(self): - n = (1 << 20) - t = date_range('2015-01-01', freq='S', periods=(n // 64)) + n = 1 << 20 + t = date_range("2015-01-01", freq="S", periods=(n // 64)) xs = np.random.randn(n // 64).round(2) - self.df = DataFrame({'a': np.random.randint(-1 << 8, 1 << 8, n), - 'b': np.random.choice(t, n), - 'c': np.random.choice(xs, n)}) + self.df = DataFrame( + { + "a": np.random.randint(-1 << 8, 1 << 8, n), + "b": np.random.choice(t, n), + "c": np.random.choice(xs, n), + } + ) self.df2 = DataFrame(np.random.randn(1000, 100).astype(str)).T def time_frame_duplicated(self): @@ -429,86 +505,108 @@ def time_frame_duplicated_wide(self): self.df2.duplicated() -class XS(object): +class XS: - goal_time = 0.2 params = [0, 1] - param_names = ['axis'] + param_names = ["axis"] def setup(self, axis): - self.N = 10**4 + 
self.N = 10 ** 4 self.df = DataFrame(np.random.randn(self.N, self.N)) def time_frame_xs(self, axis): self.df.xs(self.N / 2, axis=axis) -class SortValues(object): +class SortValues: - goal_time = 0.2 params = [True, False] - param_names = ['ascending'] + param_names = ["ascending"] def setup(self, ascending): - self.df = DataFrame(np.random.randn(1000000, 2), columns=list('AB')) + self.df = DataFrame(np.random.randn(1000000, 2), columns=list("AB")) def time_frame_sort_values(self, ascending): - self.df.sort_values(by='A', ascending=ascending) - - -class SortIndexByColumns(object): + self.df.sort_values(by="A", ascending=ascending) - goal_time = 0.2 +class SortIndexByColumns: def setup(self): N = 10000 K = 10 - self.df = DataFrame({'key1': tm.makeStringIndex(N).values.repeat(K), - 'key2': tm.makeStringIndex(N).values.repeat(K), - 'value': np.random.randn(N * K)}) + self.df = DataFrame( + { + "key1": tm.makeStringIndex(N).values.repeat(K), + "key2": tm.makeStringIndex(N).values.repeat(K), + "value": np.random.randn(N * K), + } + ) def time_frame_sort_values_by_columns(self): - self.df.sort_values(by=['key1', 'key2']) + self.df.sort_values(by=["key1", "key2"]) -class Quantile(object): +class Quantile: - goal_time = 0.2 params = [0, 1] - param_names = ['axis'] + param_names = ["axis"] def setup(self, axis): - self.df = DataFrame(np.random.randn(1000, 3), columns=list('ABC')) + self.df = DataFrame(np.random.randn(1000, 3), columns=list("ABC")) def time_frame_quantile(self, axis): self.df.quantile([0.1, 0.5], axis=axis) -class GetDtypeCounts(object): +class GetDtypeCounts: # 2807 - goal_time = 0.2 - def setup(self): self.df = DataFrame(np.random.randn(10, 10000)) def time_frame_get_dtype_counts(self): - self.df.get_dtype_counts() + with warnings.catch_warnings(record=True): + self.df.get_dtype_counts() def time_info(self): self.df.info() -class NSort(object): +class NSort: - goal_time = 0.2 - params = ['first', 'last'] - param_names = ['keep'] + params = ["first", "last", "all"] + param_names = ["keep"] def setup(self, keep): - self.df = DataFrame(np.random.randn(1000, 3), columns=list('ABC')) + self.df = DataFrame(np.random.randn(100000, 3), columns=list("ABC")) + + def time_nlargest_one_column(self, keep): + self.df.nlargest(100, "A", keep=keep) + + def time_nlargest_two_columns(self, keep): + self.df.nlargest(100, ["A", "B"], keep=keep) + + def time_nsmallest_one_column(self, keep): + self.df.nsmallest(100, "A", keep=keep) + + def time_nsmallest_two_columns(self, keep): + self.df.nsmallest(100, ["A", "B"], keep=keep) + + +class Describe: + def setup(self): + self.df = DataFrame( + { + "a": np.random.randint(0, 100, int(1e6)), + "b": np.random.randint(0, 100, int(1e6)), + "c": np.random.randint(0, 100, int(1e6)), + } + ) + + def time_series_describe(self): + self.df["a"].describe() + + def time_dataframe_describe(self): + self.df.describe() - def time_nlargest(self, keep): - self.df.nlargest(100, 'A', keep=keep) - def time_nsmallest(self, keep): - self.df.nsmallest(100, 'A', keep=keep) +from .pandas_vb_common import setup # noqa: F401 isort:skip diff --git a/asv_bench/benchmarks/gil.py b/asv_bench/benchmarks/gil.py index 21c1ccf46e1c4..d57492dd37268 100644 --- a/asv_bench/benchmarks/gil.py +++ b/asv_bench/benchmarks/gil.py @@ -1,10 +1,21 @@ import numpy as np -import pandas.util.testing as tm -from pandas import DataFrame, Series, read_csv, factorize, date_range + +from pandas import DataFrame, Series, date_range, factorize, read_csv from pandas.core.algorithms import take_1d +import 
pandas.util.testing as tm + try: - from pandas import (rolling_median, rolling_mean, rolling_min, rolling_max, - rolling_var, rolling_skew, rolling_kurt, rolling_std) + from pandas import ( + rolling_median, + rolling_mean, + rolling_min, + rolling_max, + rolling_var, + rolling_skew, + rolling_kurt, + rolling_std, + ) + have_rolling_methods = True except ImportError: have_rolling_methods = False @@ -14,6 +25,7 @@ from pandas import algos try: from pandas.util.testing import test_parallel + have_real_test_parallel = True except ImportError: have_real_test_parallel = False @@ -21,33 +33,36 @@ def test_parallel(num_threads=1): def wrapper(fname): return fname + return wrapper -from .pandas_vb_common import BaseIO, setup # noqa +from .pandas_vb_common import BaseIO # noqa: E402 isort:skip -class ParallelGroupbyMethods(object): - goal_time = 0.2 - params = ([2, 4, 8], ['count', 'last', 'max', 'mean', 'min', 'prod', - 'sum', 'var']) - param_names = ['threads', 'method'] +class ParallelGroupbyMethods: + + params = ([2, 4, 8], ["count", "last", "max", "mean", "min", "prod", "sum", "var"]) + param_names = ["threads", "method"] def setup(self, threads, method): if not have_real_test_parallel: raise NotImplementedError - N = 10**6 - ngroups = 10**3 - df = DataFrame({'key': np.random.randint(0, ngroups, size=N), - 'data': np.random.randn(N)}) + N = 10 ** 6 + ngroups = 10 ** 3 + df = DataFrame( + {"key": np.random.randint(0, ngroups, size=N), "data": np.random.randn(N)} + ) @test_parallel(num_threads=threads) def parallel(): - getattr(df.groupby('key')['data'], method)() + getattr(df.groupby("key")["data"], method)() + self.parallel = parallel def loop(): - getattr(df.groupby('key')['data'], method)() + getattr(df.groupby("key")["data"], method)() + self.loop = loop def time_parallel(self, threads, method): @@ -58,51 +73,51 @@ def time_loop(self, threads, method): self.loop() -class ParallelGroups(object): +class ParallelGroups: - goal_time = 0.2 params = [2, 4, 8] - param_names = ['threads'] + param_names = ["threads"] def setup(self, threads): if not have_real_test_parallel: raise NotImplementedError - size = 2**22 - ngroups = 10**3 + size = 2 ** 22 + ngroups = 10 ** 3 data = Series(np.random.randint(0, ngroups, size=size)) @test_parallel(num_threads=threads) def get_groups(): data.groupby(data).groups + self.get_groups = get_groups def time_get_groups(self, threads): self.get_groups() -class ParallelTake1D(object): +class ParallelTake1D: - goal_time = 0.2 - params = ['int64', 'float64'] - param_names = ['dtype'] + params = ["int64", "float64"] + param_names = ["dtype"] def setup(self, dtype): if not have_real_test_parallel: raise NotImplementedError - N = 10**6 - df = DataFrame({'col': np.arange(N, dtype=dtype)}) + N = 10 ** 6 + df = DataFrame({"col": np.arange(N, dtype=dtype)}) indexer = np.arange(100, len(df) - 100) @test_parallel(num_threads=2) def parallel_take1d(): - take_1d(df['col'].values, indexer) + take_1d(df["col"].values, indexer) + self.parallel_take1d = parallel_take1d def time_take1d(self, dtype): self.parallel_take1d() -class ParallelKth(object): +class ParallelKth: number = 1 repeat = 5 @@ -110,99 +125,105 @@ class ParallelKth(object): def setup(self): if not have_real_test_parallel: raise NotImplementedError - N = 10**7 - k = 5 * 10**5 - kwargs_list = [{'arr': np.random.randn(N)}, - {'arr': np.random.randn(N)}] + N = 10 ** 7 + k = 5 * 10 ** 5 + kwargs_list = [{"arr": np.random.randn(N)}, {"arr": np.random.randn(N)}] @test_parallel(num_threads=2, kwargs_list=kwargs_list) def 
parallel_kth_smallest(arr): algos.kth_smallest(arr, k) + self.parallel_kth_smallest = parallel_kth_smallest def time_kth_smallest(self): self.parallel_kth_smallest() -class ParallelDatetimeFields(object): - - goal_time = 0.2 - +class ParallelDatetimeFields: def setup(self): if not have_real_test_parallel: raise NotImplementedError - N = 10**6 - self.dti = date_range('1900-01-01', periods=N, freq='T') - self.period = self.dti.to_period('D') + N = 10 ** 6 + self.dti = date_range("1900-01-01", periods=N, freq="T") + self.period = self.dti.to_period("D") def time_datetime_field_year(self): @test_parallel(num_threads=2) def run(dti): dti.year + run(self.dti) def time_datetime_field_day(self): @test_parallel(num_threads=2) def run(dti): dti.day + run(self.dti) def time_datetime_field_daysinmonth(self): @test_parallel(num_threads=2) def run(dti): dti.days_in_month + run(self.dti) def time_datetime_field_normalize(self): @test_parallel(num_threads=2) def run(dti): dti.normalize() + run(self.dti) def time_datetime_to_period(self): @test_parallel(num_threads=2) def run(dti): - dti.to_period('S') + dti.to_period("S") + run(self.dti) def time_period_to_datetime(self): @test_parallel(num_threads=2) def run(period): period.to_timestamp() + run(self.period) -class ParallelRolling(object): +class ParallelRolling: - goal_time = 0.2 - params = ['median', 'mean', 'min', 'max', 'var', 'skew', 'kurt', 'std'] - param_names = ['method'] + params = ["median", "mean", "min", "max", "var", "skew", "kurt", "std"] + param_names = ["method"] def setup(self, method): if not have_real_test_parallel: raise NotImplementedError win = 100 arr = np.random.rand(100000) - if hasattr(DataFrame, 'rolling'): + if hasattr(DataFrame, "rolling"): df = DataFrame(arr).rolling(win) @test_parallel(num_threads=2) def parallel_rolling(): getattr(df, method)() + self.parallel_rolling = parallel_rolling elif have_rolling_methods: - rolling = {'median': rolling_median, - 'mean': rolling_mean, - 'min': rolling_min, - 'max': rolling_max, - 'var': rolling_var, - 'skew': rolling_skew, - 'kurt': rolling_kurt, - 'std': rolling_std} + rolling = { + "median": rolling_median, + "mean": rolling_mean, + "min": rolling_min, + "max": rolling_max, + "var": rolling_var, + "skew": rolling_skew, + "kurt": rolling_kurt, + "std": rolling_std, + } @test_parallel(num_threads=2) def parallel_rolling(): rolling[method](arr, win) + self.parallel_rolling = parallel_rolling else: raise NotImplementedError @@ -215,42 +236,46 @@ class ParallelReadCSV(BaseIO): number = 1 repeat = 5 - params = ['float', 'object', 'datetime'] - param_names = ['dtype'] + params = ["float", "object", "datetime"] + param_names = ["dtype"] def setup(self, dtype): if not have_real_test_parallel: raise NotImplementedError rows = 10000 cols = 50 - data = {'float': DataFrame(np.random.randn(rows, cols)), - 'datetime': DataFrame(np.random.randn(rows, cols), - index=date_range('1/1/2000', - periods=rows)), - 'object': DataFrame('foo', - index=range(rows), - columns=['object%03d'.format(i) - for i in range(5)])} - - self.fname = '__test_{}__.csv'.format(dtype) + data = { + "float": DataFrame(np.random.randn(rows, cols)), + "datetime": DataFrame( + np.random.randn(rows, cols), index=date_range("1/1/2000", periods=rows) + ), + "object": DataFrame( + "foo", + index=range(rows), + columns=["object%03d".format(i) for i in range(5)], + ), + } + + self.fname = "__test_{}__.csv".format(dtype) df = data[dtype] df.to_csv(self.fname) @test_parallel(num_threads=2) def parallel_read_csv(): read_csv(self.fname) + 
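These gil.py classes all follow one pattern: build the timed callable inside `setup` so that `test_parallel` (imported at the top of the file, with a no-op fallback) closes over the prepared data, then store it on `self` for the `time_*` method to call. A self-contained sketch of that pattern:

```python
import numpy as np

try:
    from pandas.util.testing import test_parallel
except ImportError:
    # Same no-op fallback gil.py defines when the helper is missing.
    def test_parallel(num_threads=1):
        def wrapper(fname):
            return fname
        return wrapper

class ParallelSum:
    def setup(self):
        data = np.random.randn(1_000_000)

        @test_parallel(num_threads=2)
        def parallel_sum():
            data.sum()

        # Bound on self so time_* measures only the threaded execution,
        # not the decoration or the data construction.
        self.parallel_sum = parallel_sum

    def time_parallel_sum(self):
        self.parallel_sum()
```

One aside on ParallelReadCSV above: `"object%03d".format(i)` mixes a %-style placeholder with `str.format`, so every generated column is literally named `object%03d`; `"object{:03d}".format(i)` appears to be the intent (the same slip exists on both sides of the hunk).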
self.parallel_read_csv = parallel_read_csv def time_read_csv(self, dtype): self.parallel_read_csv() -class ParallelFactorize(object): +class ParallelFactorize: number = 1 repeat = 5 params = [2, 4, 8] - param_names = ['threads'] + param_names = ["threads"] def setup(self, threads): if not have_real_test_parallel: @@ -261,10 +286,12 @@ def setup(self, threads): @test_parallel(num_threads=threads) def parallel(): factorize(strings) + self.parallel = parallel def loop(): factorize(strings) + self.loop = loop def time_parallel(self, threads): @@ -273,3 +300,6 @@ def time_parallel(self, threads): def time_loop(self, threads): for i in range(threads): self.loop() + + +from .pandas_vb_common import setup # noqa: F401 isort:skip diff --git a/asv_bench/benchmarks/groupby.py b/asv_bench/benchmarks/groupby.py index 61db39528a5fb..d51c53e2264f1 100644 --- a/asv_bench/benchmarks/groupby.py +++ b/asv_bench/benchmarks/groupby.py @@ -1,48 +1,92 @@ -import warnings -from string import ascii_letters -from itertools import product from functools import partial +from itertools import product +from string import ascii_letters import numpy as np -from pandas import (DataFrame, Series, MultiIndex, date_range, period_range, - TimeGrouper, Categorical) -import pandas.util.testing as tm -from .pandas_vb_common import setup # noqa - - -class ApplyDictReturn(object): - goal_time = 0.2 +from pandas import ( + Categorical, + DataFrame, + MultiIndex, + Series, + Timestamp, + date_range, + period_range, +) +import pandas.util.testing as tm +method_blacklist = { + "object": { + "median", + "prod", + "sem", + "cumsum", + "sum", + "cummin", + "mean", + "max", + "skew", + "cumprod", + "cummax", + "rank", + "pct_change", + "min", + "var", + "mad", + "describe", + "std", + "quantile", + }, + "datetime": { + "median", + "prod", + "sem", + "cumsum", + "sum", + "mean", + "skew", + "cumprod", + "cummax", + "pct_change", + "var", + "mad", + "describe", + "std", + }, +} + + +class ApplyDictReturn: def setup(self): self.labels = np.arange(1000).repeat(10) self.data = Series(np.random.randn(len(self.labels))) def time_groupby_apply_dict_return(self): - self.data.groupby(self.labels).apply(lambda x: {'first': x.values[0], - 'last': x.values[-1]}) - + self.data.groupby(self.labels).apply( + lambda x: {"first": x.values[0], "last": x.values[-1]} + ) -class Apply(object): - - goal_time = 0.2 +class Apply: def setup_cache(self): - N = 10**4 + N = 10 ** 4 labels = np.random.randint(0, 2000, size=N) labels2 = np.random.randint(0, 3, size=N) - df = DataFrame({'key': labels, - 'key2': labels2, - 'value1': np.random.randn(N), - 'value2': ['foo', 'bar', 'baz', 'qux'] * (N // 4) - }) + df = DataFrame( + { + "key": labels, + "key2": labels2, + "value1": np.random.randn(N), + "value2": ["foo", "bar", "baz", "qux"] * (N // 4), + } + ) return df def time_scalar_function_multi_col(self, df): - df.groupby(['key', 'key2']).apply(lambda x: 1) + df.groupby(["key", "key2"]).apply(lambda x: 1) def time_scalar_function_single_col(self, df): - df.groupby('key').apply(lambda x: 1) + df.groupby("key").apply(lambda x: 1) @staticmethod def df_copy_function(g): @@ -51,29 +95,29 @@ def df_copy_function(g): return g.copy() def time_copy_function_multi_col(self, df): - df.groupby(['key', 'key2']).apply(self.df_copy_function) + df.groupby(["key", "key2"]).apply(self.df_copy_function) def time_copy_overhead_single_col(self, df): - df.groupby('key').apply(self.df_copy_function) - + df.groupby("key").apply(self.df_copy_function) -class Groups(object): - goal_time = 0.2 
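Apply, Groups, and the Count*/AggFunctions classes in groupby.py use `setup_cache` rather than `setup`. Under ASV's semantics, `setup_cache` runs once per environment, its return value is cached, and that value is injected as the leading argument of every benchmark method, which is why these `time_*` methods take a `df`/`data` parameter. Minimal sketch:

```python
import numpy as np
import pandas as pd

class CachedGroupby:
    def setup_cache(self):
        # Built once and reused across all timings of this class.
        n = 10_000
        return pd.DataFrame({"key": np.random.randint(0, 100, n),
                             "val": np.random.randn(n)})

    # The cached frame arrives as the first argument.
    def time_count(self, df):
        df.groupby("key").count()

    def time_nunique(self, df):
        df.groupby("key").nunique()
```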
+class Groups: - param_names = ['key'] - params = ['int64_small', 'int64_large', 'object_small', 'object_large'] + param_names = ["key"] + params = ["int64_small", "int64_large", "object_small", "object_large"] def setup_cache(self): - size = 10**6 - data = {'int64_small': Series(np.random.randint(0, 100, size=size)), - 'int64_large': Series(np.random.randint(0, 10000, size=size)), - 'object_small': Series( - tm.makeStringIndex(100).take( - np.random.randint(0, 100, size=size))), - 'object_large': Series( - tm.makeStringIndex(10000).take( - np.random.randint(0, 10000, size=size)))} + size = 10 ** 6 + data = { + "int64_small": Series(np.random.randint(0, 100, size=size)), + "int64_large": Series(np.random.randint(0, 10000, size=size)), + "object_small": Series( + tm.makeStringIndex(100).take(np.random.randint(0, 100, size=size)) + ), + "object_large": Series( + tm.makeStringIndex(10000).take(np.random.randint(0, 10000, size=size)) + ), + } return data def setup(self, data, key): @@ -83,50 +127,10 @@ def time_series_groups(self, data, key): self.ser.groupby(self.ser).groups -class FirstLast(object): - - goal_time = 0.2 - - param_names = ['dtype'] - params = ['float32', 'float64', 'datetime', 'object'] - - def setup(self, dtype): - N = 10**5 - # with datetimes (GH7555) - if dtype == 'datetime': - self.df = DataFrame({'values': date_range('1/1/2011', - periods=N, - freq='s'), - 'key': range(N)}) - elif dtype == 'object': - self.df = DataFrame({'values': ['foo'] * N, - 'key': range(N)}) - else: - labels = np.arange(N / 10).repeat(10) - data = Series(np.random.randn(len(labels)), dtype=dtype) - data[::3] = np.nan - data[1::3] = np.nan - labels = labels.take(np.random.permutation(len(labels))) - self.df = DataFrame({'values': data, 'key': labels}) - - def time_groupby_first(self, dtype): - self.df.groupby('key').first() - - def time_groupby_last(self, dtype): - self.df.groupby('key').last() - - def time_groupby_nth_all(self, dtype): - self.df.groupby('key').nth(0, dropna='all') - - def time_groupby_nth_none(self, dtype): - self.df.groupby('key').nth(0) - +class GroupManyLabels: -class GroupManyLabels(object): - - goal_time = 0.2 params = [1, 1000] - param_names = ['ncols'] + param_names = ["ncols"] def setup(self, ncols): N = 1000 @@ -138,50 +142,47 @@ def time_sum(self, ncols): self.df.groupby(self.labels).sum() -class Nth(object): - - goal_time = 0.2 - - def setup_cache(self): - df = DataFrame(np.random.randint(1, 100, (10000, 2))) - df.iloc[1, 1] = np.nan - return df - - def time_frame_nth_any(self, df): - df.groupby(0).nth(0, dropna='any') - - def time_frame_nth(self, df): - df.groupby(0).nth(0) +class Nth: - def time_series_nth_any(self, df): - df[1].groupby(df[0]).nth(0, dropna='any') - - def time_series_nth(self, df): - df[1].groupby(df[0]).nth(0) + param_names = ["dtype"] + params = ["float32", "float64", "datetime", "object"] + def setup(self, dtype): + N = 10 ** 5 + # with datetimes (GH7555) + if dtype == "datetime": + values = date_range("1/1/2011", periods=N, freq="s") + elif dtype == "object": + values = ["foo"] * N + else: + values = np.arange(N).astype(dtype) -class NthObject(object): + key = np.arange(N) + self.df = DataFrame({"key": key, "values": values}) + self.df.iloc[1, 1] = np.nan # insert missing data - goal_time = 0.2 + def time_frame_nth_any(self, dtype): + self.df.groupby("key").nth(0, dropna="any") - def setup_cache(self): - df = DataFrame(np.random.randint(1, 100, (10000,)), columns=['g']) - df['obj'] = ['a'] * 5000 + ['b'] * 5000 - return df + def 
time_groupby_nth_all(self, dtype): + self.df.groupby("key").nth(0, dropna="all") - def time_nth(self, df): - df.groupby('g').nth(5) + def time_frame_nth(self, dtype): + self.df.groupby("key").nth(0) - def time_nth_last(self, df): - df.groupby('g').last() + def time_series_nth_any(self, dtype): + self.df["values"].groupby(self.df["key"]).nth(0, dropna="any") + def time_series_nth_all(self, dtype): + self.df["values"].groupby(self.df["key"]).nth(0, dropna="all") -class DateAttributes(object): + def time_series_nth(self, dtype): + self.df["values"].groupby(self.df["key"]).nth(0) - goal_time = 0.2 +class DateAttributes: def setup(self): - rng = date_range('1/1/2000', '12/31/2005', freq='H') + rng = date_range("1/1/2000", "12/31/2005", freq="H") self.year, self.month, self.day = rng.year, rng.month, rng.day self.ts = Series(np.random.randn(len(rng)), index=rng) @@ -189,215 +190,276 @@ def time_len_groupby_object(self): len(self.ts.groupby([self.year, self.month, self.day])) -class Int64(object): - - goal_time = 0.2 - +class Int64: def setup(self): arr = np.random.randint(-1 << 12, 1 << 12, (1 << 17, 5)) i = np.random.choice(len(arr), len(arr) * 5) arr = np.vstack((arr, arr[i])) i = np.random.permutation(len(arr)) arr = arr[i] - self.cols = list('abcde') + self.cols = list("abcde") self.df = DataFrame(arr, columns=self.cols) - self.df['jim'], self.df['joe'] = np.random.randn(2, len(self.df)) * 10 + self.df["jim"], self.df["joe"] = np.random.randn(2, len(self.df)) * 10 def time_overflow(self): self.df.groupby(self.cols).max() -class CountMultiDtype(object): - - goal_time = 0.2 - +class CountMultiDtype: def setup_cache(self): n = 10000 - offsets = np.random.randint(n, size=n).astype('timedelta64[ns]') - dates = np.datetime64('now') + offsets - dates[np.random.rand(n) > 0.5] = np.datetime64('nat') - offsets[np.random.rand(n) > 0.5] = np.timedelta64('nat') + offsets = np.random.randint(n, size=n).astype("timedelta64[ns]") + dates = np.datetime64("now") + offsets + dates[np.random.rand(n) > 0.5] = np.datetime64("nat") + offsets[np.random.rand(n) > 0.5] = np.timedelta64("nat") value2 = np.random.randn(n) value2[np.random.rand(n) > 0.5] = np.nan - obj = np.random.choice(list('ab'), size=n).astype(object) + obj = np.random.choice(list("ab"), size=n).astype(object) obj[np.random.randn(n) > 0.5] = np.nan - df = DataFrame({'key1': np.random.randint(0, 500, size=n), - 'key2': np.random.randint(0, 100, size=n), - 'dates': dates, - 'value2': value2, - 'value3': np.random.randn(n), - 'ints': np.random.randint(0, 1000, size=n), - 'obj': obj, - 'offsets': offsets}) + df = DataFrame( + { + "key1": np.random.randint(0, 500, size=n), + "key2": np.random.randint(0, 100, size=n), + "dates": dates, + "value2": value2, + "value3": np.random.randn(n), + "ints": np.random.randint(0, 1000, size=n), + "obj": obj, + "offsets": offsets, + } + ) return df def time_multi_count(self, df): - df.groupby(['key1', 'key2']).count() + df.groupby(["key1", "key2"]).count() -class CountInt(object): - - goal_time = 0.2 - +class CountMultiInt: def setup_cache(self): n = 10000 - df = DataFrame({'key1': np.random.randint(0, 500, size=n), - 'key2': np.random.randint(0, 100, size=n), - 'ints': np.random.randint(0, 1000, size=n), - 'ints2': np.random.randint(0, 1000, size=n)}) + df = DataFrame( + { + "key1": np.random.randint(0, 500, size=n), + "key2": np.random.randint(0, 100, size=n), + "ints": np.random.randint(0, 1000, size=n), + "ints2": np.random.randint(0, 1000, size=n), + } + ) return df - def time_int_count(self, df): - 
df.groupby(['key1', 'key2']).count() - - def time_int_nunique(self, df): - df.groupby(['key1', 'key2']).nunique() - + def time_multi_int_count(self, df): + df.groupby(["key1", "key2"]).count() -class AggFunctions(object): + def time_multi_int_nunique(self, df): + df.groupby(["key1", "key2"]).nunique() - goal_time = 0.2 +class AggFunctions: def setup_cache(self): - N = 10**5 - fac1 = np.array(['A', 'B', 'C'], dtype='O') - fac2 = np.array(['one', 'two'], dtype='O') - df = DataFrame({'key1': fac1.take(np.random.randint(0, 3, size=N)), - 'key2': fac2.take(np.random.randint(0, 2, size=N)), - 'value1': np.random.randn(N), - 'value2': np.random.randn(N), - 'value3': np.random.randn(N)}) + N = 10 ** 5 + fac1 = np.array(["A", "B", "C"], dtype="O") + fac2 = np.array(["one", "two"], dtype="O") + df = DataFrame( + { + "key1": fac1.take(np.random.randint(0, 3, size=N)), + "key2": fac2.take(np.random.randint(0, 2, size=N)), + "value1": np.random.randn(N), + "value2": np.random.randn(N), + "value3": np.random.randn(N), + } + ) return df def time_different_str_functions(self, df): - df.groupby(['key1', 'key2']).agg({'value1': 'mean', - 'value2': 'var', - 'value3': 'sum'}) + df.groupby(["key1", "key2"]).agg( + {"value1": "mean", "value2": "var", "value3": "sum"} + ) def time_different_numpy_functions(self, df): - df.groupby(['key1', 'key2']).agg({'value1': np.mean, - 'value2': np.var, - 'value3': np.sum}) + df.groupby(["key1", "key2"]).agg( + {"value1": np.mean, "value2": np.var, "value3": np.sum} + ) def time_different_python_functions_multicol(self, df): - df.groupby(['key1', 'key2']).agg([sum, min, max]) + df.groupby(["key1", "key2"]).agg([sum, min, max]) def time_different_python_functions_singlecol(self, df): - df.groupby('key1').agg([sum, min, max]) + df.groupby("key1").agg([sum, min, max]) -class GroupStrings(object): - - goal_time = 0.2 - +class GroupStrings: def setup(self): - n = 2 * 10**5 - alpha = list(map(''.join, product(ascii_letters, repeat=4))) + n = 2 * 10 ** 5 + alpha = list(map("".join, product(ascii_letters, repeat=4))) data = np.random.choice(alpha, (n // 5, 4), replace=False) data = np.repeat(data, 5, axis=0) - self.df = DataFrame(data, columns=list('abcd')) - self.df['joe'] = (np.random.randn(len(self.df)) * 10).round(3) + self.df = DataFrame(data, columns=list("abcd")) + self.df["joe"] = (np.random.randn(len(self.df)) * 10).round(3) self.df = self.df.sample(frac=1).reset_index(drop=True) def time_multi_columns(self): - self.df.groupby(list('abcd')).max() - + self.df.groupby(list("abcd")).max() -class MultiColumn(object): - - goal_time = 0.2 +class MultiColumn: def setup_cache(self): - N = 10**5 + N = 10 ** 5 key1 = np.tile(np.arange(100, dtype=object), 1000) key2 = key1.copy() np.random.shuffle(key1) np.random.shuffle(key2) - df = DataFrame({'key1': key1, - 'key2': key2, - 'data1': np.random.randn(N), - 'data2': np.random.randn(N)}) + df = DataFrame( + { + "key1": key1, + "key2": key2, + "data1": np.random.randn(N), + "data2": np.random.randn(N), + } + ) return df def time_lambda_sum(self, df): - df.groupby(['key1', 'key2']).agg(lambda x: x.values.sum()) + df.groupby(["key1", "key2"]).agg(lambda x: x.values.sum()) def time_cython_sum(self, df): - df.groupby(['key1', 'key2']).sum() + df.groupby(["key1", "key2"]).sum() def time_col_select_lambda_sum(self, df): - df.groupby(['key1', 'key2'])['data1'].agg(lambda x: x.values.sum()) + df.groupby(["key1", "key2"])["data1"].agg(lambda x: x.values.sum()) def time_col_select_numpy_sum(self, df): - df.groupby(['key1', 
'key2'])['data1'].agg(np.sum) - + df.groupby(["key1", "key2"])["data1"].agg(np.sum) -class Size(object): - - goal_time = 0.2 +class Size: def setup(self): - n = 10**5 - offsets = np.random.randint(n, size=n).astype('timedelta64[ns]') - dates = np.datetime64('now') + offsets - self.df = DataFrame({'key1': np.random.randint(0, 500, size=n), - 'key2': np.random.randint(0, 100, size=n), - 'value1': np.random.randn(n), - 'value2': np.random.randn(n), - 'value3': np.random.randn(n), - 'dates': dates}) + n = 10 ** 5 + offsets = np.random.randint(n, size=n).astype("timedelta64[ns]") + dates = np.datetime64("now") + offsets + self.df = DataFrame( + { + "key1": np.random.randint(0, 500, size=n), + "key2": np.random.randint(0, 100, size=n), + "value1": np.random.randn(n), + "value2": np.random.randn(n), + "value3": np.random.randn(n), + "dates": dates, + } + ) self.draws = Series(np.random.randn(n)) - labels = Series(['foo', 'bar', 'baz', 'qux'] * (n // 4)) - self.cats = labels.astype('category') + labels = Series(["foo", "bar", "baz", "qux"] * (n // 4)) + self.cats = labels.astype("category") def time_multi_size(self): - self.df.groupby(['key1', 'key2']).size() - - def time_dt_size(self): - self.df.groupby(['dates']).size() - - def time_dt_timegrouper_size(self): - with warnings.catch_warnings(record=True): - self.df.groupby(TimeGrouper(key='dates', freq='M')).size() + self.df.groupby(["key1", "key2"]).size() def time_category_size(self): self.draws.groupby(self.cats).size() -class GroupByMethods(object): - - goal_time = 0.2 - - param_names = ['dtype', 'method'] - params = [['int', 'float'], - ['all', 'any', 'count', 'cumcount', 'cummax', 'cummin', - 'cumprod', 'cumsum', 'describe', 'first', 'head', 'last', 'mad', - 'max', 'min', 'median', 'mean', 'nunique', 'pct_change', 'prod', - 'rank', 'sem', 'shift', 'size', 'skew', 'std', 'sum', 'tail', - 'unique', 'value_counts', 'var']] - - def setup(self, dtype, method): +class GroupByMethods: + + param_names = ["dtype", "method", "application"] + params = [ + ["int", "float", "object", "datetime"], + [ + "all", + "any", + "bfill", + "count", + "cumcount", + "cummax", + "cummin", + "cumprod", + "cumsum", + "describe", + "ffill", + "first", + "head", + "last", + "mad", + "max", + "min", + "median", + "mean", + "nunique", + "pct_change", + "prod", + "quantile", + "rank", + "sem", + "shift", + "size", + "skew", + "std", + "sum", + "tail", + "unique", + "value_counts", + "var", + ], + ["direct", "transformation"], + ] + + def setup(self, dtype, method, application): + if method in method_blacklist.get(dtype, {}): + raise NotImplementedError # skip benchmark ngroups = 1000 size = ngroups * 2 rng = np.arange(ngroups) values = rng.take(np.random.randint(0, ngroups, size=size)) - if dtype == 'int': + if dtype == "int": key = np.random.randint(0, size, size=size) + elif dtype == "float": + key = np.concatenate( + [np.random.random(ngroups) * 0.1, np.random.random(ngroups) * 10.0] + ) + elif dtype == "object": + key = ["foo"] * size + elif dtype == "datetime": + key = date_range("1/1/2011", periods=size, freq="s") + + df = DataFrame({"values": values, "key": key}) + + if application == "transform": + if method == "describe": + raise NotImplementedError + + self.as_group_method = lambda: df.groupby("key")["values"].transform(method) + self.as_field_method = lambda: df.groupby("values")["key"].transform(method) else: - key = np.concatenate([np.random.random(ngroups) * 0.1, - np.random.random(ngroups) * 10.0]) + self.as_group_method = 
getattr(df.groupby("key")["values"], method) + self.as_field_method = getattr(df.groupby("values")["key"], method) - df = DataFrame({'values': values, 'key': key}) - self.df_groupby_method = getattr(df.groupby('key')['values'], method) + def time_dtype_as_group(self, dtype, method, application): + self.as_group_method() - def time_method(self, dtype, method): - self.df_groupby_method() + def time_dtype_as_field(self, dtype, method, application): + self.as_field_method() -class Float32(object): - # GH 13335 - goal_time = 0.2 +class RankWithTies: + # GH 21237 + param_names = ["dtype", "tie_method"] + params = [ + ["float64", "float32", "int64", "datetime64"], + ["first", "average", "dense", "min", "max"], + ] + + def setup(self, dtype, tie_method): + N = 10 ** 4 + if dtype == "datetime64": + data = np.array([Timestamp("2011/01/01")] * N, dtype=dtype) + else: + data = np.array([1] * N, dtype=dtype) + self.df = DataFrame({"values": data, "key": ["foo"] * N}) + def time_rank_ties(self, dtype, tie_method): + self.df.groupby("key").rank(method=tie_method) + + +class Float32: + # GH 13335 def setup(self): tmp1 = (np.random.random(10000) * 0.1).astype(np.float32) tmp2 = (np.random.random(10000) * 10.0).astype(np.float32) @@ -406,166 +468,161 @@ def setup(self): self.df = DataFrame(dict(a=arr, b=arr)) def time_sum(self): - self.df.groupby(['a'])['b'].sum() - + self.df.groupby(["a"])["b"].sum() -class Categories(object): - - goal_time = 0.2 +class Categories: def setup(self): - N = 10**5 + N = 10 ** 5 arr = np.random.random(N) - data = {'a': Categorical(np.random.randint(10000, size=N)), - 'b': arr} + data = {"a": Categorical(np.random.randint(10000, size=N)), "b": arr} self.df = DataFrame(data) - data = {'a': Categorical(np.random.randint(10000, size=N), - ordered=True), - 'b': arr} + data = { + "a": Categorical(np.random.randint(10000, size=N), ordered=True), + "b": arr, + } self.df_ordered = DataFrame(data) - data = {'a': Categorical(np.random.randint(100, size=N), - categories=np.arange(10000)), - 'b': arr} + data = { + "a": Categorical( + np.random.randint(100, size=N), categories=np.arange(10000) + ), + "b": arr, + } self.df_extra_cat = DataFrame(data) def time_groupby_sort(self): - self.df.groupby('a')['b'].count() + self.df.groupby("a")["b"].count() def time_groupby_nosort(self): - self.df.groupby('a', sort=False)['b'].count() + self.df.groupby("a", sort=False)["b"].count() def time_groupby_ordered_sort(self): - self.df_ordered.groupby('a')['b'].count() + self.df_ordered.groupby("a")["b"].count() def time_groupby_ordered_nosort(self): - self.df_ordered.groupby('a', sort=False)['b'].count() + self.df_ordered.groupby("a", sort=False)["b"].count() def time_groupby_extra_cat_sort(self): - self.df_extra_cat.groupby('a')['b'].count() + self.df_extra_cat.groupby("a")["b"].count() def time_groupby_extra_cat_nosort(self): - self.df_extra_cat.groupby('a', sort=False)['b'].count() + self.df_extra_cat.groupby("a", sort=False)["b"].count() -class Datelike(object): +class Datelike: # GH 14338 - goal_time = 0.2 - params = ['period_range', 'date_range', 'date_range_tz'] - param_names = ['grouper'] + params = ["period_range", "date_range", "date_range_tz"] + param_names = ["grouper"] def setup(self, grouper): - N = 10**4 - rng_map = {'period_range': period_range, - 'date_range': date_range, - 'date_range_tz': partial(date_range, tz='US/Central')} - self.grouper = rng_map[grouper]('1900-01-01', freq='D', periods=N) - self.df = DataFrame(np.random.randn(10**4, 2)) + N = 10 ** 4 + rng_map = { + "period_range": 
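GroupByMethods prunes unsupported dtype/method pairs by raising `NotImplementedError` from `setup`; ASV treats that as "skip this parameter combination" rather than a failure, the same idiom the gil.py classes use when `test_parallel` is unavailable. Sketched:

```python
class SkipUnsupported:
    params = [["int", "object"], ["sum", "prod"]]
    param_names = ["dtype", "method"]

    def setup(self, dtype, method):
        if dtype == "object" and method == "prod":
            # ASV records this combination as skipped, not failed.
            raise NotImplementedError
        self.values = list(range(1_000))

    def time_agg(self, dtype, method):
        # stand-in body; the real class dispatches on `method`
        sum(self.values)
```

One apparent slip worth flagging in GroupByMethods: the new `application` parameter is spelled `"transformation"` in `params`, but `setup` compares against `"transform"`, so the transform branch looks unreachable as written.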
period_range, + "date_range": date_range, + "date_range_tz": partial(date_range, tz="US/Central"), + } + self.grouper = rng_map[grouper]("1900-01-01", freq="D", periods=N) + self.df = DataFrame(np.random.randn(10 ** 4, 2)) def time_sum(self, grouper): self.df.groupby(self.grouper).sum() -class SumBools(object): +class SumBools: # GH 2692 - goal_time = 0.2 - def setup(self): N = 500 - self.df = DataFrame({'ii': range(N), - 'bb': [True] * N}) + self.df = DataFrame({"ii": range(N), "bb": [True] * N}) def time_groupby_sum_booleans(self): - self.df.groupby('ii').sum() + self.df.groupby("ii").sum() -class SumMultiLevel(object): +class SumMultiLevel: # GH 9049 - goal_time = 0.2 timeout = 120.0 def setup(self): N = 50 - self.df = DataFrame({'A': list(range(N)) * 2, - 'B': range(N * 2), - 'C': 1}).set_index(['A', 'B']) + self.df = DataFrame( + {"A": list(range(N)) * 2, "B": range(N * 2), "C": 1} + ).set_index(["A", "B"]) def time_groupby_sum_multiindex(self): self.df.groupby(level=[0, 1]).sum() -class Transform(object): - - goal_time = 0.2 - +class Transform: def setup(self): n1 = 400 n2 = 250 - index = MultiIndex(levels=[np.arange(n1), tm.makeStringIndex(n2)], - labels=[np.repeat(range(n1), n2).tolist(), - list(range(n2)) * n1], - names=['lev1', 'lev2']) + index = MultiIndex( + levels=[np.arange(n1), tm.makeStringIndex(n2)], + codes=[np.repeat(range(n1), n2).tolist(), list(range(n2)) * n1], + names=["lev1", "lev2"], + ) arr = np.random.randn(n1 * n2, 3) arr[::10000, 0] = np.nan arr[1::10000, 1] = np.nan arr[2::10000, 2] = np.nan - data = DataFrame(arr, index=index, columns=['col1', 'col20', 'col3']) + data = DataFrame(arr, index=index, columns=["col1", "col20", "col3"]) self.df = data n = 20000 - self.df1 = DataFrame(np.random.randint(1, n, (n, 3)), - columns=['jim', 'joe', 'jolie']) + self.df1 = DataFrame( + np.random.randint(1, n, (n, 3)), columns=["jim", "joe", "jolie"] + ) self.df2 = self.df1.copy() - self.df2['jim'] = self.df2['joe'] + self.df2["jim"] = self.df2["joe"] - self.df3 = DataFrame(np.random.randint(1, (n / 10), (n, 3)), - columns=['jim', 'joe', 'jolie']) + self.df3 = DataFrame( + np.random.randint(1, (n / 10), (n, 3)), columns=["jim", "joe", "jolie"] + ) self.df4 = self.df3.copy() - self.df4['jim'] = self.df4['joe'] + self.df4["jim"] = self.df4["joe"] def time_transform_lambda_max(self): - self.df.groupby(level='lev1').transform(lambda x: max(x)) + self.df.groupby(level="lev1").transform(lambda x: max(x)) def time_transform_ufunc_max(self): - self.df.groupby(level='lev1').transform(np.max) + self.df.groupby(level="lev1").transform(np.max) def time_transform_multi_key1(self): - self.df1.groupby(['jim', 'joe'])['jolie'].transform('max') + self.df1.groupby(["jim", "joe"])["jolie"].transform("max") def time_transform_multi_key2(self): - self.df2.groupby(['jim', 'joe'])['jolie'].transform('max') + self.df2.groupby(["jim", "joe"])["jolie"].transform("max") def time_transform_multi_key3(self): - self.df3.groupby(['jim', 'joe'])['jolie'].transform('max') + self.df3.groupby(["jim", "joe"])["jolie"].transform("max") def time_transform_multi_key4(self): - self.df4.groupby(['jim', 'joe'])['jolie'].transform('max') - + self.df4.groupby(["jim", "joe"])["jolie"].transform("max") -class TransformBools(object): - - goal_time = 0.2 +class TransformBools: def setup(self): N = 120000 transition_points = np.sort(np.random.choice(np.arange(N), 1400)) transitions = np.zeros(N, dtype=np.bool) transitions[transition_points] = True self.g = transitions.cumsum() - self.df = DataFrame({'signal': 
np.random.rand(N)}) + self.df = DataFrame({"signal": np.random.rand(N)}) def time_transform_mean(self): - self.df['signal'].groupby(self.g).transform(np.mean) + self.df["signal"].groupby(self.g).transform(np.mean) -class TransformNaN(object): +class TransformNaN: # GH 12737 - goal_time = 0.2 - def setup(self): - self.df_nans = DataFrame({'key': np.repeat(np.arange(1000), 10), - 'B': np.nan, - 'C': np.nan}) - self.df_nans.loc[4::10, 'B':'C'] = 5 + self.df_nans = DataFrame( + {"key": np.repeat(np.arange(1000), 10), "B": np.nan, "C": np.nan} + ) + self.df_nans.loc[4::10, "B":"C"] = 5 def time_first(self): - self.df_nans.groupby('key').transform('first') + self.df_nans.groupby("key").transform("first") + + +from .pandas_vb_common import setup # noqa: F401 isort:skip diff --git a/asv_bench/benchmarks/index_cached_properties.py b/asv_bench/benchmarks/index_cached_properties.py new file mode 100644 index 0000000000000..13b33855569c9 --- /dev/null +++ b/asv_bench/benchmarks/index_cached_properties.py @@ -0,0 +1,75 @@ +import pandas as pd + + +class IndexCache: + number = 1 + repeat = (3, 100, 20) + + params = [ + [ + "DatetimeIndex", + "Float64Index", + "IntervalIndex", + "Int64Index", + "MultiIndex", + "PeriodIndex", + "RangeIndex", + "TimedeltaIndex", + "UInt64Index", + ] + ] + param_names = ["index_type"] + + def setup(self, index_type): + N = 10 ** 5 + if index_type == "MultiIndex": + self.idx = pd.MultiIndex.from_product( + [pd.date_range("1/1/2000", freq="T", periods=N // 2), ["a", "b"]] + ) + elif index_type == "DatetimeIndex": + self.idx = pd.date_range("1/1/2000", freq="T", periods=N) + elif index_type == "Int64Index": + self.idx = pd.Index(range(N)) + elif index_type == "PeriodIndex": + self.idx = pd.period_range("1/1/2000", freq="T", periods=N) + elif index_type == "RangeIndex": + self.idx = pd.RangeIndex(start=0, stop=N) + elif index_type == "IntervalIndex": + self.idx = pd.IntervalIndex.from_arrays(range(N), range(1, N + 1)) + elif index_type == "TimedeltaIndex": + self.idx = pd.TimedeltaIndex(range(N)) + elif index_type == "Float64Index": + self.idx = pd.Float64Index(range(N)) + elif index_type == "UInt64Index": + self.idx = pd.UInt64Index(range(N)) + else: + raise ValueError + assert len(self.idx) == N + self.idx._cache = {} + + def time_values(self, index_type): + self.idx._values + + def time_shape(self, index_type): + self.idx.shape + + def time_is_monotonic(self, index_type): + self.idx.is_monotonic + + def time_is_monotonic_decreasing(self, index_type): + self.idx.is_monotonic_decreasing + + def time_is_monotonic_increasing(self, index_type): + self.idx.is_monotonic_increasing + + def time_is_unique(self, index_type): + self.idx.is_unique + + def time_engine(self, index_type): + self.idx._engine + + def time_inferred_type(self, index_type): + self.idx.inferred_type + + def time_is_all_dates(self, index_type): + self.idx.is_all_dates diff --git a/asv_bench/benchmarks/index_object.py b/asv_bench/benchmarks/index_object.py index f1703e163917a..a94960d494707 100644 --- a/asv_bench/benchmarks/index_object.py +++ b/asv_bench/benchmarks/index_object.py @@ -1,43 +1,50 @@ +import gc + import numpy as np -import pandas.util.testing as tm -from pandas import (Series, date_range, DatetimeIndex, Index, RangeIndex, - Float64Index) -from .pandas_vb_common import setup # noqa +from pandas import ( + DatetimeIndex, + Float64Index, + Index, + IntervalIndex, + RangeIndex, + Series, + date_range, +) +import pandas.util.testing as tm -class SetOperations(object): +class SetOperations: - goal_time = 
0.2 - params = (['datetime', 'date_string', 'int', 'strings'], - ['intersection', 'union', 'symmetric_difference']) - param_names = ['dtype', 'method'] + params = ( + ["datetime", "date_string", "int", "strings"], + ["intersection", "union", "symmetric_difference"], + ) + param_names = ["dtype", "method"] def setup(self, dtype, method): - N = 10**5 - dates_left = date_range('1/1/2000', periods=N, freq='T') - fmt = '%Y-%m-%d %H:%M:%S' + N = 10 ** 5 + dates_left = date_range("1/1/2000", periods=N, freq="T") + fmt = "%Y-%m-%d %H:%M:%S" date_str_left = Index(dates_left.strftime(fmt)) int_left = Index(np.arange(N)) str_left = tm.makeStringIndex(N) - data = {'datetime': {'left': dates_left, 'right': dates_left[:-1]}, - 'date_string': {'left': date_str_left, - 'right': date_str_left[:-1]}, - 'int': {'left': int_left, 'right': int_left[:-1]}, - 'strings': {'left': str_left, 'right': str_left[:-1]}} - self.left = data[dtype]['left'] - self.right = data[dtype]['right'] + data = { + "datetime": {"left": dates_left, "right": dates_left[:-1]}, + "date_string": {"left": date_str_left, "right": date_str_left[:-1]}, + "int": {"left": int_left, "right": int_left[:-1]}, + "strings": {"left": str_left, "right": str_left[:-1]}, + } + self.left = data[dtype]["left"] + self.right = data[dtype]["right"] def time_operation(self, dtype, method): getattr(self.left, method)(self.right) -class SetDisjoint(object): - - goal_time = 0.2 - +class SetDisjoint: def setup(self): - N = 10**5 + N = 10 ** 5 B = N + 20000 self.datetime_left = DatetimeIndex(range(N)) self.datetime_right = DatetimeIndex(range(N, B)) @@ -46,26 +53,22 @@ def time_datetime_difference_disjoint(self): self.datetime_left.difference(self.datetime_right) -class Datetime(object): - - goal_time = 0.2 - +class Datetime: def setup(self): - self.dr = date_range('20000101', freq='D', periods=10000) + self.dr = date_range("20000101", freq="D", periods=10000) def time_is_dates_only(self): self.dr._is_dates_only -class Ops(object): +class Ops: - sample_time = 0.2 - params = ['float', 'int'] - param_names = ['dtype'] + params = ["float", "int"] + param_names = ["dtype"] def setup(self, dtype): - N = 10**6 - indexes = {'int': 'makeIntIndex', 'float': 'makeFloatIndex'} + N = 10 ** 6 + indexes = {"int": "makeIntIndex", "float": "makeFloatIndex"} self.index = getattr(tm, indexes[dtype])(N) def time_add(self, dtype): @@ -84,13 +87,10 @@ def time_modulo(self, dtype): self.index % 2 -class Range(object): - - goal_time = 0.2 - +class Range: def setup(self): - self.idx_inc = RangeIndex(start=0, stop=10**7, step=3) - self.idx_dec = RangeIndex(start=10**7, stop=-1, step=-3) + self.idx_inc = RangeIndex(start=0, stop=10 ** 7, step=3) + self.idx_dec = RangeIndex(start=10 ** 7, stop=-1, step=-3) def time_max(self): self.idx_inc.max() @@ -104,11 +104,14 @@ def time_min(self): def time_min_trivial(self): self.idx_inc.min() + def time_get_loc_inc(self): + self.idx_inc.get_loc(900000) -class IndexAppend(object): + def time_get_loc_dec(self): + self.idx_dec.get_loc(100000) - goal_time = 0.2 +class IndexAppend: def setup(self): N = 10000 @@ -136,21 +139,22 @@ def time_append_obj_list(self): self.obj_idx.append(self.object_idxs) -class Indexing(object): +class Indexing: - goal_time = 0.2 - params = ['String', 'Float', 'Int'] - param_names = ['dtype'] + params = ["String", "Float", "Int"] + param_names = ["dtype"] def setup(self, dtype): - N = 10**6 - self.idx = getattr(tm, 'make{}Index'.format(dtype))(N) + N = 10 ** 6 + self.idx = getattr(tm, "make{}Index".format(dtype))(N) 
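+ # getattr resolves the asv "dtype" parameter to a pandas.util.testing
+ # helper, e.g. "String" -> tm.makeStringIndex(N), "Int" -> tm.makeIntIndex(N).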
self.array_mask = (np.arange(N) % 3) == 0 self.series_mask = Series(self.array_mask) self.sorted = self.idx.sort_values() half = N // 2 self.non_unique = self.idx[:half].append(self.idx[:half]) - self.non_unique_sorted = self.sorted[:half].append(self.sorted[:half]) + self.non_unique_sorted = ( + self.sorted[:half].append(self.sorted[:half]).sort_values() + ) self.key = self.sorted[N // 4] def time_boolean_array(self, dtype): @@ -181,10 +185,8 @@ def time_get_loc_non_unique_sorted(self, dtype): self.non_unique_sorted.get_loc(self.key) -class Float64IndexMethod(object): +class Float64IndexMethod: # GH 13166 - goal_time = 0.2 - def setup(self): N = 100000 a = np.arange(N) @@ -192,3 +194,55 @@ def setup(self): def time_get_loc(self): self.ind.get_loc(0) + + +class IntervalIndexMethod: + # GH 24813 + params = [10 ** 3, 10 ** 5] + + def setup(self, N): + left = np.append(np.arange(N), np.array(0)) + right = np.append(np.arange(1, N + 1), np.array(1)) + self.intv = IntervalIndex.from_arrays(left, right) + self.intv._engine + + self.intv2 = IntervalIndex.from_arrays(left + 1, right + 1) + self.intv2._engine + + self.left = IntervalIndex.from_breaks(np.arange(N)) + self.right = IntervalIndex.from_breaks(np.arange(N - 3, 2 * N - 3)) + + def time_monotonic_inc(self, N): + self.intv.is_monotonic_increasing + + def time_is_unique(self, N): + self.intv.is_unique + + def time_intersection(self, N): + self.left.intersection(self.right) + + def time_intersection_one_duplicate(self, N): + self.intv.intersection(self.right) + + def time_intersection_both_duplicate(self, N): + self.intv.intersection(self.intv2) + + +class GC: + params = [1, 2, 5] + + def create_use_drop(self): + idx = Index(list(range(1000 * 1000))) + idx._engine + + def peakmem_gc_instances(self, N): + try: + gc.disable() + + for _ in range(N): + self.create_use_drop() + finally: + gc.enable() + + +from .pandas_vb_common import setup # noqa: F401 isort:skip diff --git a/asv_bench/benchmarks/indexing.py b/asv_bench/benchmarks/indexing.py index 77e013e1e4fb0..ac35139c1954a 100644 --- a/asv_bench/benchmarks/indexing.py +++ b/asv_bench/benchmarks/indexing.py @@ -1,125 +1,156 @@ import warnings import numpy as np + +from pandas import ( + CategoricalIndex, + DataFrame, + Float64Index, + IndexSlice, + Int64Index, + IntervalIndex, + MultiIndex, + Series, + UInt64Index, + concat, + date_range, + option_context, + period_range, +) import pandas.util.testing as tm -from pandas import (Series, DataFrame, MultiIndex, Int64Index, Float64Index, - IntervalIndex, IndexSlice, concat, date_range) -from .pandas_vb_common import setup, Panel # noqa -class NumericSeriesIndexing(object): +class NumericSeriesIndexing: - goal_time = 0.2 - params = [Int64Index, Float64Index] - param = ['index'] + params = [ + (Int64Index, UInt64Index, Float64Index), + ("unique_monotonic_inc", "nonunique_monotonic_inc"), + ] + param_names = ["index_dtype", "index_structure"] - def setup(self, index): - N = 10**6 - idx = index(range(N)) - self.data = Series(np.random.rand(N), index=idx) + def setup(self, index, index_structure): + N = 10 ** 6 + indices = { + "unique_monotonic_inc": index(range(N)), + "nonunique_monotonic_inc": index( + list(range(55)) + [54] + list(range(55, N - 1)) + ), + } + self.data = Series(np.random.rand(N), index=indices[index_structure]) self.array = np.arange(10000) self.array_list = self.array.tolist() - def time_getitem_scalar(self, index): + def time_getitem_scalar(self, index, index_structure): self.data[800000] - def time_getitem_slice(self, index): + 
def time_getitem_slice(self, index, index_structure): self.data[:800000] - def time_getitem_list_like(self, index): + def time_getitem_list_like(self, index, index_structure): self.data[[800000]] - def time_getitem_array(self, index): + def time_getitem_array(self, index, index_structure): self.data[self.array] - def time_getitem_lists(self, index): + def time_getitem_lists(self, index, index_structure): self.data[self.array_list] - def time_iloc_array(self, index): + def time_iloc_array(self, index, index_structure): self.data.iloc[self.array] - def time_iloc_list_like(self, index): + def time_iloc_list_like(self, index, index_structure): self.data.iloc[[800000]] - def time_iloc_scalar(self, index): + def time_iloc_scalar(self, index, index_structure): self.data.iloc[800000] - def time_iloc_slice(self, index): + def time_iloc_slice(self, index, index_structure): self.data.iloc[:800000] - def time_ix_array(self, index): - self.data.ix[self.array] + def time_ix_array(self, index, index_structure): + with warnings.catch_warnings(record=True): + self.data.ix[self.array] - def time_ix_list_like(self, index): - self.data.ix[[800000]] + def time_ix_list_like(self, index, index_structure): + with warnings.catch_warnings(record=True): + self.data.ix[[800000]] - def time_ix_scalar(self, index): - self.data.ix[800000] + def time_ix_scalar(self, index, index_structure): + with warnings.catch_warnings(record=True): + self.data.ix[800000] - def time_ix_slice(self, index): - self.data.ix[:800000] + def time_ix_slice(self, index, index_structure): + with warnings.catch_warnings(record=True): + self.data.ix[:800000] - def time_loc_array(self, index): + def time_loc_array(self, index, index_structure): self.data.loc[self.array] - def time_loc_list_like(self, index): + def time_loc_list_like(self, index, index_structure): self.data.loc[[800000]] - def time_loc_scalar(self, index): + def time_loc_scalar(self, index, index_structure): self.data.loc[800000] - def time_loc_slice(self, index): + def time_loc_slice(self, index, index_structure): self.data.loc[:800000] -class NonNumericSeriesIndexing(object): - - goal_time = 0.2 - params = ['string', 'datetime'] - param_names = ['index'] - - def setup(self, index): - N = 10**5 - indexes = {'string': tm.makeStringIndex(N), - 'datetime': date_range('1900', periods=N, freq='s')} - index = indexes[index] +class NonNumericSeriesIndexing: + + params = [ + ("string", "datetime", "period"), + ("unique_monotonic_inc", "nonunique_monotonic_inc", "non_monotonic"), + ] + param_names = ["index_dtype", "index_structure"] + + def setup(self, index, index_structure): + N = 10 ** 6 + if index == "string": + index = tm.makeStringIndex(N) + elif index == "datetime": + index = date_range("1900", periods=N, freq="s") + elif index == "period": + index = period_range("1900", periods=N, freq="s") + index = index.sort_values() + assert index.is_unique and index.is_monotonic_increasing + if index_structure == "nonunique_monotonic_inc": + index = index.insert(item=index[2], loc=2)[:-1] + elif index_structure == "non_monotonic": + index = index[::2].append(index[1::2]) + assert len(index) == N self.s = Series(np.random.rand(N), index=index) self.lbl = index[80000] + # warm up index mapping + self.s[self.lbl] - def time_getitem_label_slice(self, index): - self.s[:self.lbl] + def time_getitem_label_slice(self, index, index_structure): + self.s[: self.lbl] - def time_getitem_pos_slice(self, index): + def time_getitem_pos_slice(self, index, index_structure): self.s[:80000] - def 
time_get_value(self, index): - with warnings.catch_warnings(record=True): - self.s.get_value(self.lbl) - - def time_getitem_scalar(self, index): + def time_getitem_scalar(self, index, index_structure): self.s[self.lbl] + def time_getitem_list_like(self, index, index_structure): + self.s[[self.lbl]] -class DataFrameStringIndexing(object): - - goal_time = 0.2 +class DataFrameStringIndexing: def setup(self): index = tm.makeStringIndex(1000) columns = tm.makeStringIndex(30) - self.df = DataFrame(np.random.randn(1000, 30), index=index, - columns=columns) + with warnings.catch_warnings(record=True): + self.df = DataFrame(np.random.randn(1000, 30), index=index, columns=columns) self.idx_scalar = index[100] self.col_scalar = columns[10] self.bool_indexer = self.df[self.col_scalar] > 0 self.bool_obj_indexer = self.bool_indexer.astype(object) - def time_get_value(self): - with warnings.catch_warnings(record=True): - self.df.get_value(self.idx_scalar, self.col_scalar) - def time_ix(self): - self.df.ix[self.idx_scalar, self.col_scalar] + with warnings.catch_warnings(record=True): + self.df.ix[self.idx_scalar, self.col_scalar] def time_loc(self): self.df.loc[self.idx_scalar, self.col_scalar] @@ -134,10 +165,7 @@ def time_boolean_rows_object(self): self.df[self.bool_obj_indexer] -class DataFrameNumericIndexing(object): - - goal_time = 0.2 - +class DataFrameNumericIndexing: def setup(self): self.idx_dupe = np.array(range(30)) * 99 self.df = DataFrame(np.random.randn(10000, 5)) @@ -160,16 +188,17 @@ def time_bool_indexer(self): self.df[self.bool_indexer] -class Take(object): +class Take: - goal_time = 0.2 - params = ['int', 'datetime'] - param_names = ['index'] + params = ["int", "datetime"] + param_names = ["index"] def setup(self, index): N = 100000 - indexes = {'int': Int64Index(np.arange(N)), - 'datetime': date_range('2011-01-01', freq='S', periods=N)} + indexes = { + "int": Int64Index(np.arange(N)), + "datetime": date_range("2011-01-01", freq="S", periods=N), + } index = indexes[index] self.s = Series(np.random.rand(N), index=index) self.indexer = [True, False, True, True, False] * 20000 @@ -178,40 +207,40 @@ def time_take(self, index): self.s.take(self.indexer) -class MultiIndexing(object): - - goal_time = 0.2 - +class MultiIndexing: def setup(self): mi = MultiIndex.from_product([range(1000), range(1000)]) self.s = Series(np.random.randn(1000000), index=mi) self.df = DataFrame(self.s) n = 100000 - self.mdt = DataFrame({'A': np.random.choice(range(10000, 45000, 1000), - n), - 'B': np.random.choice(range(10, 400), n), - 'C': np.random.choice(range(1, 150), n), - 'D': np.random.choice(range(10000, 45000), n), - 'x': np.random.choice(range(400), n), - 'y': np.random.choice(range(25), n)}) + with warnings.catch_warnings(record=True): + self.mdt = DataFrame( + { + "A": np.random.choice(range(10000, 45000, 1000), n), + "B": np.random.choice(range(10, 400), n), + "C": np.random.choice(range(1, 150), n), + "D": np.random.choice(range(10000, 45000), n), + "x": np.random.choice(range(400), n), + "y": np.random.choice(range(25), n), + } + ) self.idx = IndexSlice[20000:30000, 20:30, 35:45, 30000:40000] - self.mdt = self.mdt.set_index(['A', 'B', 'C', 'D']).sort_index() + self.mdt = self.mdt.set_index(["A", "B", "C", "D"]).sort_index() def time_series_ix(self): - self.s.ix[999] + with warnings.catch_warnings(record=True): + self.s.ix[999] def time_frame_ix(self): - self.df.ix[999] + with warnings.catch_warnings(record=True): + self.df.ix[999] def time_index_slice(self): self.mdt.loc[self.idx, :] -class 
IntervalIndexing(object): - - goal_time = 0.2 - +class IntervalIndexing: def setup_cache(self): idx = IntervalIndex.from_breaks(np.arange(1000001)) monotonic = Series(np.arange(1000000), index=idx) @@ -230,24 +259,50 @@ def time_loc_list(self, monotonic): monotonic.loc[80000:] -class PanelIndexing(object): +class CategoricalIndexIndexing: - goal_time = 0.2 + params = ["monotonic_incr", "monotonic_decr", "non_monotonic"] + param_names = ["index"] - def setup(self): - with warnings.catch_warnings(record=True): - self.p = Panel(np.random.randn(100, 100, 100)) - self.inds = range(0, 100, 10) + def setup(self, index): + N = 10 ** 5 + values = list("a" * N + "b" * N + "c" * N) + indices = { + "monotonic_incr": CategoricalIndex(values), + "monotonic_decr": CategoricalIndex(reversed(values)), + "non_monotonic": CategoricalIndex(list("abc" * N)), + } + self.data = indices[index] - def time_subset(self): - with warnings.catch_warnings(record=True): - self.p.ix[(self.inds, self.inds, self.inds)] + self.int_scalar = 10000 + self.int_list = list(range(10000)) + self.cat_scalar = "b" + self.cat_list = ["a", "c"] -class MethodLookup(object): + def time_getitem_scalar(self, index): + self.data[self.int_scalar] - goal_time = 0.2 + def time_getitem_slice(self, index): + self.data[: self.int_scalar] + def time_getitem_list_like(self, index): + self.data[[self.int_scalar]] + + def time_getitem_list(self, index): + self.data[self.int_list] + + def time_getitem_bool_array(self, index): + self.data[self.data == self.cat_scalar] + + def time_get_loc_scalar(self, index): + self.data.get_loc(self.cat_scalar) + + def time_get_indexer_list(self, index): + self.data.get_indexer(self.cat_list) + + +class MethodLookup: def setup_cache(self): s = Series() return s @@ -256,55 +311,65 @@ def time_lookup_iloc(self, s): s.iloc def time_lookup_ix(self, s): - s.ix + with warnings.catch_warnings(record=True): + s.ix def time_lookup_loc(self, s): s.loc -class GetItemSingleColumn(object): - - goal_time = 0.2 - +class GetItemSingleColumn: def setup(self): - self.df_string_col = DataFrame(np.random.randn(3000, 1), columns=['A']) + self.df_string_col = DataFrame(np.random.randn(3000, 1), columns=["A"]) self.df_int_col = DataFrame(np.random.randn(3000, 1)) def time_frame_getitem_single_column_label(self): - self.df_string_col['A'] + self.df_string_col["A"] def time_frame_getitem_single_column_int(self): self.df_int_col[0] -class AssignTimeseriesIndex(object): - - goal_time = 0.2 - +class AssignTimeseriesIndex: def setup(self): N = 100000 - idx = date_range('1/1/2000', periods=N, freq='H') - self.df = DataFrame(np.random.randn(N, 1), columns=['A'], index=idx) + idx = date_range("1/1/2000", periods=N, freq="H") + self.df = DataFrame(np.random.randn(N, 1), columns=["A"], index=idx) def time_frame_assign_timeseries_index(self): - self.df['date'] = self.df.index + self.df["date"] = self.df.index -class InsertColumns(object): - - goal_time = 0.2 - +class InsertColumns: def setup(self): - self.N = 10**3 + self.N = 10 ** 3 self.df = DataFrame(index=range(self.N)) def time_insert(self): np.random.seed(1234) for i in range(100): - self.df.insert(0, i, np.random.randn(self.N), - allow_duplicates=True) + self.df.insert(0, i, np.random.randn(self.N), allow_duplicates=True) def time_assign_with_setitem(self): np.random.seed(1234) for i in range(100): self.df[i] = np.random.randn(self.N) + + +class ChainIndexing: + + params = [None, "warn"] + param_names = ["mode"] + + def setup(self, mode): + self.N = 1000000 + + def time_chained_indexing(self, 
mode): + with warnings.catch_warnings(record=True): + with option_context("mode.chained_assignment", mode): + df = DataFrame({"A": np.arange(self.N), "B": "foo"}) + df2 = df[df.A > self.N // 2] + df2["C"] = 1.0 + + +from .pandas_vb_common import setup # noqa: F401 isort:skip diff --git a/asv_bench/benchmarks/indexing_engines.py b/asv_bench/benchmarks/indexing_engines.py new file mode 100644 index 0000000000000..44a22dfa77791 --- /dev/null +++ b/asv_bench/benchmarks/indexing_engines.py @@ -0,0 +1,71 @@ +import numpy as np + +from pandas._libs import index as libindex + + +def _get_numeric_engines(): + engine_names = [ + ("Int64Engine", np.int64), + ("Int32Engine", np.int32), + ("Int16Engine", np.int16), + ("Int8Engine", np.int8), + ("UInt64Engine", np.uint64), + ("UInt32Engine", np.uint32), + ("UInt16Engine", np.uint16), + ("UInt8Engine", np.uint8), + ("Float64Engine", np.float64), + ("Float32Engine", np.float32), + ] + return [ + (getattr(libindex, engine_name), dtype) + for engine_name, dtype in engine_names + if hasattr(libindex, engine_name) + ] + + +class NumericEngineIndexing: + + params = [ + _get_numeric_engines(), + ["monotonic_incr", "monotonic_decr", "non_monotonic"], + ] + param_names = ["engine_and_dtype", "index_type"] + + def setup(self, engine_and_dtype, index_type): + engine, dtype = engine_and_dtype + N = 10 ** 5 + values = list([1] * N + [2] * N + [3] * N) + arr = { + "monotonic_incr": np.array(values, dtype=dtype), + "monotonic_decr": np.array(list(reversed(values)), dtype=dtype), + "non_monotonic": np.array([1, 2, 3] * N, dtype=dtype), + }[index_type] + + self.data = engine(lambda: arr, len(arr)) + # code below avoids populating the mapping etc. while timing. + self.data.get_loc(2) + + def time_get_loc(self, engine_and_dtype, index_type): + self.data.get_loc(2) + + +class ObjectEngineIndexing: + + params = [("monotonic_incr", "monotonic_decr", "non_monotonic")] + param_names = ["index_type"] + + def setup(self, index_type): + N = 10 ** 5 + values = list("a" * N + "b" * N + "c" * N) + arr = { + "monotonic_incr": np.array(values, dtype=object), + "monotonic_decr": np.array(list(reversed(values)), dtype=object), + "non_monotonic": np.array(list("abc") * N, dtype=object), + }[index_type] + + self.data = libindex.ObjectEngine(lambda: arr, len(arr)) + # code below avoids populating the mapping etc. while timing.
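+ # Engines populate their hash-table mapping lazily, on the first lookup;
+ # for example (illustrative): engine.get_loc("b") once here builds the
+ # mapping, and every later engine.get_loc("b") is a plain hash lookup.
+ # The warm-up call below therefore keeps that one-time cost out of
+ # time_get_loc.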
+ self.data.get_loc("b") + + def time_get_loc(self, index_type): + self.data.get_loc("b") diff --git a/asv_bench/benchmarks/inference.py b/asv_bench/benchmarks/inference.py index 16d9e7cd73cbb..e85b3bd2c7687 100644 --- a/asv_bench/benchmarks/inference.py +++ b/asv_bench/benchmarks/inference.py @@ -1,67 +1,65 @@ import numpy as np -import pandas.util.testing as tm + from pandas import DataFrame, Series, to_numeric +import pandas.util.testing as tm -from .pandas_vb_common import numeric_dtypes, lib, setup # noqa +from .pandas_vb_common import lib, numeric_dtypes -class NumericInferOps(object): +class NumericInferOps: # from GH 7332 - goal_time = 0.2 params = numeric_dtypes - param_names = ['dtype'] + param_names = ["dtype"] def setup(self, dtype): - N = 5 * 10**5 - self.df = DataFrame({'A': np.arange(N).astype(dtype), - 'B': np.arange(N).astype(dtype)}) + N = 5 * 10 ** 5 + self.df = DataFrame( + {"A": np.arange(N).astype(dtype), "B": np.arange(N).astype(dtype)} + ) def time_add(self, dtype): - self.df['A'] + self.df['B'] + self.df["A"] + self.df["B"] def time_subtract(self, dtype): - self.df['A'] - self.df['B'] + self.df["A"] - self.df["B"] def time_multiply(self, dtype): - self.df['A'] * self.df['B'] + self.df["A"] * self.df["B"] def time_divide(self, dtype): - self.df['A'] / self.df['B'] + self.df["A"] / self.df["B"] def time_modulo(self, dtype): - self.df['A'] % self.df['B'] + self.df["A"] % self.df["B"] -class DateInferOps(object): +class DateInferOps: # from GH 7332 - goal_time = 0.2 - def setup_cache(self): - N = 5 * 10**5 - df = DataFrame({'datetime64': np.arange(N).astype('datetime64[ms]')}) - df['timedelta'] = df['datetime64'] - df['datetime64'] + N = 5 * 10 ** 5 + df = DataFrame({"datetime64": np.arange(N).astype("datetime64[ms]")}) + df["timedelta"] = df["datetime64"] - df["datetime64"] return df def time_subtract_datetimes(self, df): - df['datetime64'] - df['datetime64'] + df["datetime64"] - df["datetime64"] def time_timedelta_plus_datetime(self, df): - df['timedelta'] + df['datetime64'] + df["timedelta"] + df["datetime64"] def time_add_timedeltas(self, df): - df['timedelta'] + df['timedelta'] + df["timedelta"] + df["timedelta"] -class ToNumeric(object): +class ToNumeric: - goal_time = 0.2 - params = ['ignore', 'coerce'] - param_names = ['errors'] + params = ["ignore", "coerce"] + param_names = ["errors"] def setup(self, errors): N = 10000 self.float = Series(np.random.randn(N)) - self.numstr = self.float.astype('str') + self.numstr = self.float.astype("str") self.str = Series(tm.makeStringIndex(N)) def time_from_float(self, errors): @@ -74,23 +72,34 @@ def time_from_str(self, errors): to_numeric(self.str, errors=errors) -class ToNumericDowncast(object): +class ToNumericDowncast: - param_names = ['dtype', 'downcast'] - params = [['string-float', 'string-int', 'string-nint', 'datetime64', - 'int-list', 'int32'], - [None, 'integer', 'signed', 'unsigned', 'float']] + param_names = ["dtype", "downcast"] + params = [ + [ + "string-float", + "string-int", + "string-nint", + "datetime64", + "int-list", + "int32", + ], + [None, "integer", "signed", "unsigned", "float"], + ] N = 500000 N2 = int(N / 2) - data_dict = {'string-int': ['1'] * N2 + [2] * N2, - 'string-nint': ['-1'] * N2 + [2] * N2, - 'datetime64': np.repeat(np.array(['1970-01-01', '1970-01-02'], - dtype='datetime64[D]'), N), - 'string-float': ['1.1'] * N2 + [2] * N2, - 'int-list': [1] * N2 + [2] * N2, - 'int32': np.repeat(np.int32(1), N)} + data_dict = { + "string-int": ["1"] * N2 + [2] * N2, + "string-nint": ["-1"] * N2 + [2] 
* N2, + "datetime64": np.repeat( + np.array(["1970-01-01", "1970-01-02"], dtype="datetime64[D]"), N + ), + "string-float": ["1.1"] * N2 + [2] * N2, + "int-list": [1] * N2 + [2] * N2, + "int32": np.repeat(np.int32(1), N), + } def setup(self, dtype, downcast): self.data = self.data_dict[dtype] @@ -99,11 +108,10 @@ def time_downcast(self, dtype, downcast): to_numeric(self.data, downcast=downcast) -class MaybeConvertNumeric(object): - +class MaybeConvertNumeric: def setup_cache(self): - N = 10**6 - arr = np.repeat([2**63], N) + np.arange(N).astype('uint64') + N = 10 ** 6 + arr = np.repeat([2 ** 63], N) + np.arange(N).astype("uint64") data = arr.astype(object) data[1::2] = arr[1::2].astype(str) data[-1] = -1 @@ -111,3 +119,6 @@ def setup_cache(self): def time_convert(self, data): lib.maybe_convert_numeric(data, set(), coerce_numeric=False) + + +from .pandas_vb_common import setup # noqa: F401 isort:skip diff --git a/asv_bench/benchmarks/io/csv.py b/asv_bench/benchmarks/io/csv.py index 3b7fdc6e2d78c..9b8599b0a1b64 100644 --- a/asv_bench/benchmarks/io/csv.py +++ b/asv_bench/benchmarks/io/csv.py @@ -1,40 +1,42 @@ +from io import StringIO import random -import timeit import string import numpy as np + +from pandas import Categorical, DataFrame, date_range, read_csv, to_datetime import pandas.util.testing as tm -from pandas import DataFrame, Categorical, date_range, read_csv -from pandas.compat import PY2 -from pandas.compat import cStringIO as StringIO -from ..pandas_vb_common import setup, BaseIO # noqa +from ..pandas_vb_common import BaseIO class ToCSV(BaseIO): - goal_time = 0.2 - fname = '__test__.csv' - params = ['wide', 'long', 'mixed'] - param_names = ['kind'] + fname = "__test__.csv" + params = ["wide", "long", "mixed"] + param_names = ["kind"] def setup(self, kind): wide_frame = DataFrame(np.random.randn(3000, 30)) - long_frame = DataFrame({'A': np.arange(50000), - 'B': np.arange(50000) + 1., - 'C': np.arange(50000) + 2., - 'D': np.arange(50000) + 3.}) - mixed_frame = DataFrame({'float': np.random.randn(5000), - 'int': np.random.randn(5000).astype(int), - 'bool': (np.arange(5000) % 2) == 0, - 'datetime': date_range('2001', - freq='s', - periods=5000), - 'object': ['foo'] * 5000}) - mixed_frame.loc[30:500, 'float'] = np.nan - data = {'wide': wide_frame, - 'long': long_frame, - 'mixed': mixed_frame} + long_frame = DataFrame( + { + "A": np.arange(50000), + "B": np.arange(50000) + 1.0, + "C": np.arange(50000) + 2.0, + "D": np.arange(50000) + 3.0, + } + ) + mixed_frame = DataFrame( + { + "float": np.random.randn(5000), + "int": np.random.randn(5000).astype(int), + "bool": (np.arange(5000) % 2) == 0, + "datetime": date_range("2001", freq="s", periods=5000), + "object": ["foo"] * 5000, + } + ) + mixed_frame.loc[30:500, "float"] = np.nan + data = {"wide": wide_frame, "long": long_frame, "mixed": mixed_frame} self.df = data[kind] def time_frame(self, kind): @@ -43,119 +45,156 @@ def time_frame(self, kind): class ToCSVDatetime(BaseIO): - goal_time = 0.2 - fname = '__test__.csv' + fname = "__test__.csv" def setup(self): - rng = date_range('1/1/2000', periods=1000) + rng = date_range("1/1/2000", periods=1000) self.data = DataFrame(rng, index=rng) def time_frame_date_formatting(self): - self.data.to_csv(self.fname, date_format='%Y%m%d') + self.data.to_csv(self.fname, date_format="%Y%m%d") + + +class ToCSVDatetimeBig(BaseIO): + + fname = "__test__.csv" + timeout = 1500 + params = [1000, 10000, 100000] + param_names = ["obs"] + + def setup(self, obs): + d = "2018-11-29" + dt = "2018-11-26 11:18:27.0" + 
self.data = DataFrame( + { + "dt": [np.datetime64(dt)] * obs, + "d": [np.datetime64(d)] * obs, + "r": [np.random.uniform()] * obs, + } + ) + def time_frame(self, obs): + self.data.to_csv(self.fname) -class ReadCSVDInferDatetimeFormat(object): - goal_time = 0.2 - params = ([True, False], ['custom', 'iso8601', 'ymd']) - param_names = ['infer_datetime_format', 'format'] +class StringIORewind: + def data(self, stringio_object): + stringio_object.seek(0) + return stringio_object + + +class ReadCSVDInferDatetimeFormat(StringIORewind): + + params = ([True, False], ["custom", "iso8601", "ymd"]) + param_names = ["infer_datetime_format", "format"] def setup(self, infer_datetime_format, format): - rng = date_range('1/1/2000', periods=1000) - formats = {'custom': '%m/%d/%Y %H:%M:%S.%f', - 'iso8601': '%Y-%m-%d %H:%M:%S', - 'ymd': '%Y%m%d'} + rng = date_range("1/1/2000", periods=1000) + formats = { + "custom": "%m/%d/%Y %H:%M:%S.%f", + "iso8601": "%Y-%m-%d %H:%M:%S", + "ymd": "%Y%m%d", + } dt_format = formats[format] - self.data = StringIO('\n'.join(rng.strftime(dt_format).tolist())) + self.StringIO_input = StringIO("\n".join(rng.strftime(dt_format).tolist())) def time_read_csv(self, infer_datetime_format, format): - read_csv(self.data, header=None, names=['foo'], parse_dates=['foo'], - infer_datetime_format=infer_datetime_format) + read_csv( + self.data(self.StringIO_input), + header=None, + names=["foo"], + parse_dates=["foo"], + infer_datetime_format=infer_datetime_format, + ) + + +class ReadCSVConcatDatetime(StringIORewind): + + iso8601 = "%Y-%m-%d %H:%M:%S" + + def setup(self): + rng = date_range("1/1/2000", periods=50000, freq="S") + self.StringIO_input = StringIO("\n".join(rng.strftime(self.iso8601).tolist())) + + def time_read_csv(self): + read_csv( + self.data(self.StringIO_input), + header=None, + names=["foo"], + parse_dates=["foo"], + infer_datetime_format=False, + ) + + +class ReadCSVConcatDatetimeBadDateValue(StringIORewind): + + params = (["nan", "0", ""],) + param_names = ["bad_date_value"] + + def setup(self, bad_date_value): + self.StringIO_input = StringIO(("%s,\n" % bad_date_value) * 50000) + + def time_read_csv(self, bad_date_value): + read_csv( + self.data(self.StringIO_input), + header=None, + names=["foo", "bar"], + parse_dates=["foo"], + infer_datetime_format=False, + ) class ReadCSVSkipRows(BaseIO): - goal_time = 0.2 - fname = '__test__.csv' + fname = "__test__.csv" params = [None, 10000] - param_names = ['skiprows'] + param_names = ["skiprows"] def setup(self, skiprows): N = 20000 index = tm.makeStringIndex(N) - df = DataFrame({'float1': np.random.randn(N), - 'float2': np.random.randn(N), - 'string1': ['foo'] * N, - 'bool1': [True] * N, - 'int1': np.random.randint(0, N, size=N)}, - index=index) + df = DataFrame( + { + "float1": np.random.randn(N), + "float2": np.random.randn(N), + "string1": ["foo"] * N, + "bool1": [True] * N, + "int1": np.random.randint(0, N, size=N), + }, + index=index, + ) df.to_csv(self.fname) def time_skipprows(self, skiprows): read_csv(self.fname, skiprows=skiprows) -class ReadUint64Integers(object): - - goal_time = 0.2 - +class ReadUint64Integers(StringIORewind): def setup(self): - self.na_values = [2**63 + 500] - arr = np.arange(10000).astype('uint64') + 2**63 - self.data1 = StringIO('\n'.join(arr.astype(str).tolist())) + self.na_values = [2 ** 63 + 500] + arr = np.arange(10000).astype("uint64") + 2 ** 63 + self.data1 = StringIO("\n".join(arr.astype(str).tolist())) arr = arr.astype(object) arr[500] = -1 - self.data2 = 
StringIO('\n'.join(arr.astype(str).tolist())) + self.data2 = StringIO("\n".join(arr.astype(str).tolist())) def time_read_uint64(self): - read_csv(self.data1, header=None, names=['foo']) + read_csv(self.data(self.data1), header=None, names=["foo"]) def time_read_uint64_neg_values(self): - read_csv(self.data2, header=None, names=['foo']) + read_csv(self.data(self.data2), header=None, names=["foo"]) def time_read_uint64_na_values(self): - read_csv(self.data1, header=None, names=['foo'], - na_values=self.na_values) - - -class S3(object): - # Make sure that we can read part of a file from S3 without - # needing to download the entire thing. Use the timeit.default_timer - # to measure wall time instead of CPU time -- we want to see - # how long it takes to download the data. - timer = timeit.default_timer - params = ([None, "gzip", "bz2"], ["python", "c"]) - param_names = ["compression", "engine"] - - def setup(self, compression, engine): - if compression == "bz2" and engine == "c" and PY2: - # The Python 2 C parser can't read bz2 from open files. - raise NotImplementedError - try: - import s3fs # noqa - except ImportError: - # Skip these benchmarks if `boto` is not installed. - raise NotImplementedError - - ext = "" - if compression == "gzip": - ext = ".gz" - elif compression == "bz2": - ext = ".bz2" - self.big_fname = "s3://pandas-test/large_random.csv" + ext - - def time_read_csv_10_rows(self, compression, engine): - # Read a small number of rows from a huge (100,000 x 50) table. - read_csv(self.big_fname, nrows=10, compression=compression, - engine=engine) + read_csv( + self.data(self.data1), header=None, names=["foo"], na_values=self.na_values + ) class ReadCSVThousands(BaseIO): - goal_time = 0.2 - fname = '__test__.csv' - params = ([',', '|'], [None, ',']) - param_names = ['sep', 'thousands'] + fname = "__test__.csv" + params = ([",", "|"], [None, ","]) + param_names = ["sep", "thousands"] def setup(self, sep, thousands): N = 10000 @@ -163,8 +202,8 @@ def setup(self, sep, thousands): data = np.random.randn(N, K) * np.random.randint(100, 10000, (N, K)) df = DataFrame(data) if thousands is not None: - fmt = ':{}'.format(thousands) - fmt = '{' + fmt + '}' + fmt = ":{}".format(thousands) + fmt = "{" + fmt + "}" df = df.applymap(lambda x: fmt.format(x)) df.to_csv(self.fname, sep=sep) @@ -172,63 +211,69 @@ def time_thousands(self, sep, thousands): read_csv(self.fname, sep=sep, thousands=thousands) -class ReadCSVComment(object): - - goal_time = 0.2 - +class ReadCSVComment(StringIORewind): def setup(self): - data = ['A,B,C'] + (['1,2,3 # comment'] * 100000) - self.s_data = StringIO('\n'.join(data)) + data = ["A,B,C"] + (["1,2,3 # comment"] * 100000) + self.StringIO_input = StringIO("\n".join(data)) def time_comment(self): - read_csv(self.s_data, comment='#', header=None, names=list('abc')) + read_csv( + self.data(self.StringIO_input), comment="#", header=None, names=list("abc") + ) -class ReadCSVFloatPrecision(object): +class ReadCSVFloatPrecision(StringIORewind): - goal_time = 0.2 - params = ([',', ';'], ['.', '_'], [None, 'high', 'round_trip']) - param_names = ['sep', 'decimal', 'float_precision'] + params = ([",", ";"], [".", "_"], [None, "high", "round_trip"]) + param_names = ["sep", "decimal", "float_precision"] def setup(self, sep, decimal, float_precision): - floats = [''.join(random.choice(string.digits) for _ in range(28)) - for _ in range(15)] - rows = sep.join(['0{}'.format(decimal) + '{}'] * 3) + '\n' + floats = [ + "".join(random.choice(string.digits) for _ in range(28)) for _ in 
range(15) + ] + rows = sep.join(["0{}".format(decimal) + "{}"] * 3) + "\n" data = rows * 5 data = data.format(*floats) * 200 # 1000 x 3 strings csv - self.s_data = StringIO(data) + self.StringIO_input = StringIO(data) def time_read_csv(self, sep, decimal, float_precision): - read_csv(self.s_data, sep=sep, header=None, names=list('abc'), - float_precision=float_precision) + read_csv( + self.data(self.StringIO_input), + sep=sep, + header=None, + names=list("abc"), + float_precision=float_precision, + ) def time_read_csv_python_engine(self, sep, decimal, float_precision): - read_csv(self.s_data, sep=sep, header=None, engine='python', - float_precision=None, names=list('abc')) + read_csv( + self.data(self.StringIO_input), + sep=sep, + header=None, + engine="python", + float_precision=None, + names=list("abc"), + ) class ReadCSVCategorical(BaseIO): - goal_time = 0.2 - fname = '__test__.csv' + fname = "__test__.csv" def setup(self): N = 100000 - group1 = ['aaaaaaaa', 'bbbbbbb', 'cccccccc', 'dddddddd', 'eeeeeeee'] - df = DataFrame(np.random.choice(group1, (N, 3)), columns=list('abc')) + group1 = ["aaaaaaaa", "bbbbbbb", "cccccccc", "dddddddd", "eeeeeeee"] + df = DataFrame(np.random.choice(group1, (N, 3)), columns=list("abc")) df.to_csv(self.fname, index=False) def time_convert_post(self): read_csv(self.fname).apply(Categorical) def time_convert_direct(self): - read_csv(self.fname, dtype='category') - + read_csv(self.fname, dtype="category") -class ReadCSVParseDates(object): - - goal_time = 0.2 +class ReadCSVParseDates(StringIORewind): def setup(self): data = """{},19:00:00,18:56:00,0.8100,2.8100,7.2000,0.0000,280.0000\n {},20:00:00,19:56:00,0.0100,2.2100,7.2000,0.0000,260.0000\n @@ -236,14 +281,130 @@ def setup(self): {},21:00:00,21:18:00,-0.9900,2.0100,3.6000,0.0000,270.0000\n {},22:00:00,21:56:00,-0.5900,1.7100,5.1000,0.0000,290.0000\n """ - two_cols = ['KORD,19990127'] * 5 + two_cols = ["KORD,19990127"] * 5 data = data.format(*two_cols) - self.s_data = StringIO(data) + self.StringIO_input = StringIO(data) def time_multiple_date(self): - read_csv(self.s_data, sep=',', header=None, - names=list(string.digits[:9]), parse_dates=[[1, 2], [1, 3]]) + read_csv( + self.data(self.StringIO_input), + sep=",", + header=None, + names=list(string.digits[:9]), + parse_dates=[[1, 2], [1, 3]], + ) def time_baseline(self): - read_csv(self.s_data, sep=',', header=None, parse_dates=[1], - names=list(string.digits[:9])) + read_csv( + self.data(self.StringIO_input), + sep=",", + header=None, + parse_dates=[1], + names=list(string.digits[:9]), + ) + + +class ReadCSVCachedParseDates(StringIORewind): + params = ([True, False],) + param_names = ["do_cache"] + + def setup(self, do_cache): + data = ( + "\n".join("10/{}".format(year) for year in range(2000, 2100)) + "\n" + ) * 10 + self.StringIO_input = StringIO(data) + + def time_read_csv_cached(self, do_cache): + try: + read_csv( + self.data(self.StringIO_input), + header=None, + parse_dates=[0], + cache_dates=do_cache, + ) + except TypeError: + # cache_dates is a new keyword in 0.25 + pass + + +class ReadCSVMemoryGrowth(BaseIO): + + chunksize = 20 + num_rows = 1000 + fname = "__test__.csv" + + def setup(self): + with open(self.fname, "w") as f: + for i in range(self.num_rows): + f.write("{i}\n".format(i=i)) + + def mem_parser_chunks(self): + # see gh-24805. 
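+ # read_csv with chunksize returns an iterator of DataFrame chunks (here
+ # self.chunksize = 20 rows each); consuming it end to end lets this
+ # benchmark catch parser memory that grows with the number of chunks read.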
+ result = read_csv(self.fname, chunksize=self.chunksize) + + for _ in result: + pass + + +class ReadCSVParseSpecialDate(StringIORewind): + params = (["mY", "mdY", "hm"],) + param_names = ["value"] + objects = { + "mY": "01-2019\n10-2019\n02/2000\n", + "mdY": "12/02/2010\n", + "hm": "21:34\n", + } + + def setup(self, value): + count_elem = 10000 + data = self.objects[value] * count_elem + self.StringIO_input = StringIO(data) + + def time_read_special_date(self, value): + read_csv( + self.data(self.StringIO_input), + sep=",", + header=None, + names=["Date"], + parse_dates=["Date"], + ) + + +class ParseDateComparison(StringIORewind): + params = ([False, True],) + param_names = ["cache_dates"] + + def setup(self, cache_dates): + count_elem = 10000 + data = "12-02-2010\n" * count_elem + self.StringIO_input = StringIO(data) + + def time_read_csv_dayfirst(self, cache_dates): + try: + read_csv( + self.data(self.StringIO_input), + sep=",", + header=None, + names=["Date"], + parse_dates=["Date"], + cache_dates=cache_dates, + dayfirst=True, + ) + except TypeError: + # cache_dates is a new keyword in 0.25 + pass + + def time_to_datetime_dayfirst(self, cache_dates): + df = read_csv( + self.data(self.StringIO_input), dtype={"date": str}, names=["date"] + ) + to_datetime(df["date"], cache=cache_dates, dayfirst=True) + + def time_to_datetime_format_DD_MM_YYYY(self, cache_dates): + df = read_csv( + self.data(self.StringIO_input), dtype={"date": str}, names=["date"] + ) + to_datetime(df["date"], cache=cache_dates, format="%d-%m-%Y") + + +from ..pandas_vb_common import setup # noqa: F401 isort:skip diff --git a/asv_bench/benchmarks/io/excel.py b/asv_bench/benchmarks/io/excel.py index a7c6c43d15026..c97cf768e27d9 100644 --- a/asv_bench/benchmarks/io/excel.py +++ b/asv_bench/benchmarks/io/excel.py @@ -1,37 +1,72 @@ +from io import BytesIO + import numpy as np -from pandas import DataFrame, date_range, ExcelWriter, read_excel -from pandas.compat import BytesIO +from odf.opendocument import OpenDocumentSpreadsheet +from odf.table import Table, TableCell, TableRow +from odf.text import P + +from pandas import DataFrame, ExcelWriter, date_range, read_excel import pandas.util.testing as tm -from ..pandas_vb_common import BaseIO, setup # noqa +def _generate_dataframe(): + N = 2000 + C = 5 + df = DataFrame( + np.random.randn(N, C), + columns=["float{}".format(i) for i in range(C)], + index=date_range("20000101", periods=N, freq="H"), + ) + df["object"] = tm.makeStringIndex(N) + return df -class Excel(object): - goal_time = 0.2 - params = ['openpyxl', 'xlsxwriter', 'xlwt'] - param_names = ['engine'] +class WriteExcel: + + params = ["openpyxl", "xlsxwriter", "xlwt"] + param_names = ["engine"] def setup(self, engine): - N = 2000 - C = 5 - self.df = DataFrame(np.random.randn(N, C), - columns=['float{}'.format(i) for i in range(C)], - index=date_range('20000101', periods=N, freq='H')) - self.df['object'] = tm.makeStringIndex(N) - self.bio_read = BytesIO() - self.writer_read = ExcelWriter(self.bio_read, engine=engine) - self.df.to_excel(self.writer_read, sheet_name='Sheet1') - self.writer_read.save() - self.bio_read.seek(0) - - self.bio_write = BytesIO() - self.bio_write.seek(0) - self.writer_write = ExcelWriter(self.bio_write, engine=engine) + self.df = _generate_dataframe() + + def time_write_excel(self, engine): + bio = BytesIO() + bio.seek(0) + writer = ExcelWriter(bio, engine=engine) + self.df.to_excel(writer, sheet_name="Sheet1") + writer.save() + + +class ReadExcel: + + params = ["xlrd", "openpyxl", "odf"] + 
param_names = ["engine"] + fname_excel = "spreadsheet.xlsx" + fname_odf = "spreadsheet.ods" + + def _create_odf(self): + doc = OpenDocumentSpreadsheet() + table = Table(name="Table1") + for row in self.df.values: + tr = TableRow() + for val in row: + tc = TableCell(valuetype="string") + tc.addElement(P(text=val)) + tr.addElement(tc) + table.addElement(tr) + + doc.spreadsheet.addElement(table) + doc.save(self.fname_odf) + + def setup_cache(self): + self.df = _generate_dataframe() + + self.df.to_excel(self.fname_excel, sheet_name="Sheet1") + self._create_odf() def time_read_excel(self, engine): - read_excel(self.bio_read) + fname = self.fname_odf if engine == "odf" else self.fname_excel + read_excel(fname, engine=engine) - def time_write_excel(self, engine): - self.df.to_excel(self.writer_write, sheet_name='Sheet1') - self.writer_write.save() + +from ..pandas_vb_common import setup # noqa: F401 isort:skip diff --git a/asv_bench/benchmarks/io/hdf.py b/asv_bench/benchmarks/io/hdf.py index 4b6e1d69af92d..8ec04a2087f1b 100644 --- a/asv_bench/benchmarks/io/hdf.py +++ b/asv_bench/benchmarks/io/hdf.py @@ -1,95 +1,98 @@ -import warnings - import numpy as np -from pandas import DataFrame, Panel, date_range, HDFStore, read_hdf + +from pandas import DataFrame, HDFStore, date_range, read_hdf import pandas.util.testing as tm -from ..pandas_vb_common import BaseIO, setup # noqa +from ..pandas_vb_common import BaseIO class HDFStoreDataFrame(BaseIO): - - goal_time = 0.2 - def setup(self): N = 25000 index = tm.makeStringIndex(N) - self.df = DataFrame({'float1': np.random.randn(N), - 'float2': np.random.randn(N)}, - index=index) - self.df_mixed = DataFrame({'float1': np.random.randn(N), - 'float2': np.random.randn(N), - 'string1': ['foo'] * N, - 'bool1': [True] * N, - 'int1': np.random.randint(0, N, size=N)}, - index=index) + self.df = DataFrame( + {"float1": np.random.randn(N), "float2": np.random.randn(N)}, index=index + ) + self.df_mixed = DataFrame( + { + "float1": np.random.randn(N), + "float2": np.random.randn(N), + "string1": ["foo"] * N, + "bool1": [True] * N, + "int1": np.random.randint(0, N, size=N), + }, + index=index, + ) self.df_wide = DataFrame(np.random.randn(N, 100)) self.start_wide = self.df_wide.index[10000] self.stop_wide = self.df_wide.index[15000] - self.df2 = DataFrame({'float1': np.random.randn(N), - 'float2': np.random.randn(N)}, - index=date_range('1/1/2000', periods=N)) + self.df2 = DataFrame( + {"float1": np.random.randn(N), "float2": np.random.randn(N)}, + index=date_range("1/1/2000", periods=N), + ) self.start = self.df2.index[10000] self.stop = self.df2.index[15000] - self.df_wide2 = DataFrame(np.random.randn(N, 100), - index=date_range('1/1/2000', periods=N)) - self.df_dc = DataFrame(np.random.randn(N, 10), - columns=['C%03d' % i for i in range(10)]) + self.df_wide2 = DataFrame( + np.random.randn(N, 100), index=date_range("1/1/2000", periods=N) + ) + self.df_dc = DataFrame( + np.random.randn(N, 10), columns=["C%03d" % i for i in range(10)] + ) - self.fname = '__test__.h5' + self.fname = "__test__.h5" self.store = HDFStore(self.fname) - self.store.put('fixed', self.df) - self.store.put('fixed_mixed', self.df_mixed) - self.store.append('table', self.df2) - self.store.append('table_mixed', self.df_mixed) - self.store.append('table_wide', self.df_wide) - self.store.append('table_wide2', self.df_wide2) + self.store.put("fixed", self.df) + self.store.put("fixed_mixed", self.df_mixed) + self.store.append("table", self.df2) + self.store.append("table_mixed", self.df_mixed) + 
self.store.append("table_wide", self.df_wide) + self.store.append("table_wide2", self.df_wide2) def teardown(self): self.store.close() self.remove(self.fname) def time_read_store(self): - self.store.get('fixed') + self.store.get("fixed") def time_read_store_mixed(self): - self.store.get('fixed_mixed') + self.store.get("fixed_mixed") def time_write_store(self): - self.store.put('fixed_write', self.df) + self.store.put("fixed_write", self.df) def time_write_store_mixed(self): - self.store.put('fixed_mixed_write', self.df_mixed) + self.store.put("fixed_mixed_write", self.df_mixed) def time_read_store_table_mixed(self): - self.store.select('table_mixed') + self.store.select("table_mixed") def time_write_store_table_mixed(self): - self.store.append('table_mixed_write', self.df_mixed) + self.store.append("table_mixed_write", self.df_mixed) def time_read_store_table(self): - self.store.select('table') + self.store.select("table") def time_write_store_table(self): - self.store.append('table_write', self.df) + self.store.append("table_write", self.df) def time_read_store_table_wide(self): - self.store.select('table_wide') + self.store.select("table_wide") def time_write_store_table_wide(self): - self.store.append('table_wide_write', self.df_wide) + self.store.append("table_wide_write", self.df_wide) def time_write_store_table_dc(self): - self.store.append('table_dc_write', self.df_dc, data_columns=True) + self.store.append("table_dc_write", self.df_dc, data_columns=True) def time_query_store_table_wide(self): - self.store.select('table_wide', where="index > self.start_wide and " - "index < self.stop_wide") + self.store.select( + "table_wide", where="index > self.start_wide and " "index < self.stop_wide" + ) def time_query_store_table(self): - self.store.select('table', where="index > self.start and " - "index < self.stop") + self.store.select("table", where="index > self.start and " "index < self.stop") def time_store_repr(self): repr(self.store) @@ -101,51 +104,28 @@ def time_store_info(self): self.store.info() -class HDFStorePanel(BaseIO): - - goal_time = 0.2 - - def setup(self): - self.fname = '__test__.h5' - with warnings.catch_warnings(record=True): - self.p = Panel(np.random.randn(20, 1000, 25), - items=['Item%03d' % i for i in range(20)], - major_axis=date_range('1/1/2000', periods=1000), - minor_axis=['E%03d' % i for i in range(25)]) - self.store = HDFStore(self.fname) - self.store.append('p1', self.p) - - def teardown(self): - self.store.close() - self.remove(self.fname) - - def time_read_store_table_panel(self): - with warnings.catch_warnings(record=True): - self.store.select('p1') - - def time_write_store_table_panel(self): - with warnings.catch_warnings(record=True): - self.store.append('p2', self.p) - - class HDF(BaseIO): - goal_time = 0.2 - params = ['table', 'fixed'] - param_names = ['format'] + params = ["table", "fixed"] + param_names = ["format"] def setup(self, format): - self.fname = '__test__.h5' + self.fname = "__test__.h5" N = 100000 C = 5 - self.df = DataFrame(np.random.randn(N, C), - columns=['float{}'.format(i) for i in range(C)], - index=date_range('20000101', periods=N, freq='H')) - self.df['object'] = tm.makeStringIndex(N) - self.df.to_hdf(self.fname, 'df', format=format) + self.df = DataFrame( + np.random.randn(N, C), + columns=["float{}".format(i) for i in range(C)], + index=date_range("20000101", periods=N, freq="H"), + ) + self.df["object"] = tm.makeStringIndex(N) + self.df.to_hdf(self.fname, "df", format=format) def time_read_hdf(self, format): - 
read_hdf(self.fname, 'df') + read_hdf(self.fname, "df") def time_write_hdf(self, format): - self.df.to_hdf(self.fname, 'df', format=format) + self.df.to_hdf(self.fname, "df", format=format) + + +from ..pandas_vb_common import setup # noqa: F401 isort:skip diff --git a/asv_bench/benchmarks/io/json.py b/asv_bench/benchmarks/io/json.py index acfdd327c3b51..5c1d39776b91c 100644 --- a/asv_bench/benchmarks/io/json.py +++ b/asv_bench/benchmarks/io/json.py @@ -1,24 +1,28 @@ import numpy as np + +from pandas import DataFrame, concat, date_range, read_json, timedelta_range import pandas.util.testing as tm -from pandas import DataFrame, date_range, timedelta_range, concat, read_json -from ..pandas_vb_common import setup, BaseIO # noqa +from ..pandas_vb_common import BaseIO class ReadJSON(BaseIO): - goal_time = 0.2 fname = "__test__.json" - params = (['split', 'index', 'records'], ['int', 'datetime']) - param_names = ['orient', 'index'] + params = (["split", "index", "records"], ["int", "datetime"]) + param_names = ["orient", "index"] def setup(self, orient, index): N = 100000 - indexes = {'int': np.arange(N), - 'datetime': date_range('20000101', periods=N, freq='H')} - df = DataFrame(np.random.randn(N, 5), - columns=['float_{}'.format(i) for i in range(5)], - index=indexes[index]) + indexes = { + "int": np.arange(N), + "datetime": date_range("20000101", periods=N, freq="H"), + } + df = DataFrame( + np.random.randn(N, 5), + columns=["float_{}".format(i) for i in range(5)], + index=indexes[index], + ) df.to_json(self.fname, orient=orient) def time_read_json(self, orient, index): @@ -27,101 +31,188 @@ def time_read_json(self, orient, index): class ReadJSONLines(BaseIO): - goal_time = 0.2 fname = "__test_lines__.json" - params = ['int', 'datetime'] - param_names = ['index'] + params = ["int", "datetime"] + param_names = ["index"] def setup(self, index): N = 100000 - indexes = {'int': np.arange(N), - 'datetime': date_range('20000101', periods=N, freq='H')} - df = DataFrame(np.random.randn(N, 5), - columns=['float_{}'.format(i) for i in range(5)], - index=indexes[index]) - df.to_json(self.fname, orient='records', lines=True) + indexes = { + "int": np.arange(N), + "datetime": date_range("20000101", periods=N, freq="H"), + } + df = DataFrame( + np.random.randn(N, 5), + columns=["float_{}".format(i) for i in range(5)], + index=indexes[index], + ) + df.to_json(self.fname, orient="records", lines=True) def time_read_json_lines(self, index): - read_json(self.fname, orient='records', lines=True) + read_json(self.fname, orient="records", lines=True) def time_read_json_lines_concat(self, index): - concat(read_json(self.fname, orient='records', lines=True, - chunksize=25000)) + concat(read_json(self.fname, orient="records", lines=True, chunksize=25000)) def peakmem_read_json_lines(self, index): - read_json(self.fname, orient='records', lines=True) + read_json(self.fname, orient="records", lines=True) def peakmem_read_json_lines_concat(self, index): - concat(read_json(self.fname, orient='records', lines=True, - chunksize=25000)) + concat(read_json(self.fname, orient="records", lines=True, chunksize=25000)) class ToJSON(BaseIO): - goal_time = 0.2 fname = "__test__.json" - params = ['split', 'columns', 'index'] - param_names = ['orient'] + params = [ + ["split", "columns", "index", "values", "records"], + ["df", "df_date_idx", "df_td_int_ts", "df_int_floats", "df_int_float_str"], + ] + param_names = ["orient", "frame"] + + def setup(self, orient, frame): + N = 10 ** 5 + ncols = 5 + index = date_range("20000101", 
periods=N, freq="H") + timedeltas = timedelta_range(start=1, periods=N, freq="s") + datetimes = date_range(start=1, periods=N, freq="s") + ints = np.random.randint(100000000, size=N) + floats = np.random.randn(N) + strings = tm.makeStringIndex(N) + self.df = DataFrame(np.random.randn(N, ncols), index=np.arange(N)) + self.df_date_idx = DataFrame(np.random.randn(N, ncols), index=index) + self.df_td_int_ts = DataFrame( + { + "td_1": timedeltas, + "td_2": timedeltas, + "int_1": ints, + "int_2": ints, + "ts_1": datetimes, + "ts_2": datetimes, + }, + index=index, + ) + self.df_int_floats = DataFrame( + { + "int_1": ints, + "int_2": ints, + "int_3": ints, + "float_1": floats, + "float_2": floats, + "float_3": floats, + }, + index=index, + ) + self.df_int_float_str = DataFrame( + { + "int_1": ints, + "int_2": ints, + "float_1": floats, + "float_2": floats, + "str_1": strings, + "str_2": strings, + }, + index=index, + ) + + def time_to_json(self, orient, frame): + getattr(self, frame).to_json(self.fname, orient=orient) + + def peakmem_to_json(self, orient, frame): + getattr(self, frame).to_json(self.fname, orient=orient) + + def time_to_json_wide(self, orient, frame): + base_df = getattr(self, frame).copy() + df = concat([base_df.iloc[:100]] * 1000, ignore_index=True, axis=1) + df.to_json(self.fname, orient=orient) + + def peakmem_to_json_wide(self, orient, frame): + base_df = getattr(self, frame).copy() + df = concat([base_df.iloc[:100]] * 1000, ignore_index=True, axis=1) + df.to_json(self.fname, orient=orient) + + +class ToJSONLines(BaseIO): + + fname = "__test__.json" - def setup(self, lines_orient): - N = 10**5 + def setup(self): + N = 10 ** 5 ncols = 5 - index = date_range('20000101', periods=N, freq='H') - timedeltas = timedelta_range(start=1, periods=N, freq='s') - datetimes = date_range(start=1, periods=N, freq='s') + index = date_range("20000101", periods=N, freq="H") + timedeltas = timedelta_range(start=1, periods=N, freq="s") + datetimes = date_range(start=1, periods=N, freq="s") ints = np.random.randint(100000000, size=N) floats = np.random.randn(N) strings = tm.makeStringIndex(N) self.df = DataFrame(np.random.randn(N, ncols), index=np.arange(N)) self.df_date_idx = DataFrame(np.random.randn(N, ncols), index=index) - self.df_td_int_ts = DataFrame({'td_1': timedeltas, - 'td_2': timedeltas, - 'int_1': ints, - 'int_2': ints, - 'ts_1': datetimes, - 'ts_2': datetimes}, - index=index) - self.df_int_floats = DataFrame({'int_1': ints, - 'int_2': ints, - 'int_3': ints, - 'float_1': floats, - 'float_2': floats, - 'float_3': floats}, - index=index) - self.df_int_float_str = DataFrame({'int_1': ints, - 'int_2': ints, - 'float_1': floats, - 'float_2': floats, - 'str_1': strings, - 'str_2': strings}, - index=index) - - def time_floats_with_int_index(self, orient): - self.df.to_json(self.fname, orient=orient) - - def time_floats_with_dt_index(self, orient): - self.df_date_idx.to_json(self.fname, orient=orient) - - def time_delta_int_tstamp(self, orient): - self.df_td_int_ts.to_json(self.fname, orient=orient) - - def time_float_int(self, orient): - self.df_int_floats.to_json(self.fname, orient=orient) - - def time_float_int_str(self, orient): - self.df_int_float_str.to_json(self.fname, orient=orient) - - def time_floats_with_int_idex_lines(self, orient): - self.df.to_json(self.fname, orient='records', lines=True) - - def time_floats_with_dt_index_lines(self, orient): - self.df_date_idx.to_json(self.fname, orient='records', lines=True) - - def time_delta_int_tstamp_lines(self, orient): - 
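
For readers unfamiliar with the asv conventions used throughout these files: each entry in params is one axis of a grid, param_names labels the axes, and asv calls setup and every time_*/peakmem_* method once per combination, passing the parameter values positionally after self. A small self-contained illustration (the class and values are hypothetical, not part of this diff):

import numpy as np
from pandas import DataFrame

class ToJSONGridExample:
    # asv runs time_to_json once per (orient, nrows) pair: 2 x 2 = 4 variants
    params = [["split", "records"], [10, 100]]
    param_names = ["orient", "nrows"]

    def setup(self, orient, nrows):
        self.df = DataFrame({"a": np.arange(nrows)})

    def time_to_json(self, orient, nrows):
        self.df.to_json(orient=orient)
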
self.df_td_int_ts.to_json(self.fname, orient='records', lines=True) - - def time_float_int_lines(self, orient): - self.df_int_floats.to_json(self.fname, orient='records', lines=True) - - def time_float_int_str_lines(self, orient): - self.df_int_float_str.to_json(self.fname, orient='records', lines=True) + self.df_td_int_ts = DataFrame( + { + "td_1": timedeltas, + "td_2": timedeltas, + "int_1": ints, + "int_2": ints, + "ts_1": datetimes, + "ts_2": datetimes, + }, + index=index, + ) + self.df_int_floats = DataFrame( + { + "int_1": ints, + "int_2": ints, + "int_3": ints, + "float_1": floats, + "float_2": floats, + "float_3": floats, + }, + index=index, + ) + self.df_int_float_str = DataFrame( + { + "int_1": ints, + "int_2": ints, + "float_1": floats, + "float_2": floats, + "str_1": strings, + "str_2": strings, + }, + index=index, + ) + + def time_floats_with_int_idex_lines(self): + self.df.to_json(self.fname, orient="records", lines=True) + + def time_floats_with_dt_index_lines(self): + self.df_date_idx.to_json(self.fname, orient="records", lines=True) + + def time_delta_int_tstamp_lines(self): + self.df_td_int_ts.to_json(self.fname, orient="records", lines=True) + + def time_float_int_lines(self): + self.df_int_floats.to_json(self.fname, orient="records", lines=True) + + def time_float_int_str_lines(self): + self.df_int_float_str.to_json(self.fname, orient="records", lines=True) + + +class ToJSONMem: + def setup_cache(self): + df = DataFrame([[1]]) + frames = {"int": df, "float": df.astype(float)} + + return frames + + def peakmem_int(self, frames): + df = frames["int"] + for _ in range(100_000): + df.to_json() + + def peakmem_float(self, frames): + df = frames["float"] + for _ in range(100_000): + df.to_json() + + +from ..pandas_vb_common import setup # noqa: F401 isort:skip diff --git a/asv_bench/benchmarks/io/msgpack.py b/asv_bench/benchmarks/io/msgpack.py index 8ccce01117ca4..f5038602539ab 100644 --- a/asv_bench/benchmarks/io/msgpack.py +++ b/asv_bench/benchmarks/io/msgpack.py @@ -1,26 +1,32 @@ +import warnings + import numpy as np + from pandas import DataFrame, date_range, read_msgpack import pandas.util.testing as tm -from ..pandas_vb_common import BaseIO, setup # noqa +from ..pandas_vb_common import BaseIO class MSGPack(BaseIO): - - goal_time = 0.2 - def setup(self): - self.fname = '__test__.msg' + self.fname = "__test__.msg" N = 100000 C = 5 - self.df = DataFrame(np.random.randn(N, C), - columns=['float{}'.format(i) for i in range(C)], - index=date_range('20000101', periods=N, freq='H')) - self.df['object'] = tm.makeStringIndex(N) - self.df.to_msgpack(self.fname) + self.df = DataFrame( + np.random.randn(N, C), + columns=["float{}".format(i) for i in range(C)], + index=date_range("20000101", periods=N, freq="H"), + ) + self.df["object"] = tm.makeStringIndex(N) + with warnings.catch_warnings(record=True): + self.df.to_msgpack(self.fname) def time_read_msgpack(self): read_msgpack(self.fname) def time_write_msgpack(self): self.df.to_msgpack(self.fname) + + +from ..pandas_vb_common import setup # noqa: F401 isort:skip diff --git a/asv_bench/benchmarks/io/parsers.py b/asv_bench/benchmarks/io/parsers.py new file mode 100644 index 0000000000000..c5e099bd44eac --- /dev/null +++ b/asv_bench/benchmarks/io/parsers.py @@ -0,0 +1,42 @@ +import numpy as np + +try: + from pandas._libs.tslibs.parsing import ( + _concat_date_cols, + _does_string_look_like_datetime, + ) +except ImportError: + # Avoid whole benchmark suite import failure on asv (currently 0.4) + pass + + +class 
DoesStringLookLikeDatetime: + + params = (["2Q2005", "0.0", "10000"],) + param_names = ["value"] + + def setup(self, value): + self.objects = [value] * 1000000 + + def time_check_datetimes(self, value): + for obj in self.objects: + _does_string_look_like_datetime(obj) + + +class ConcatDateCols: + + params = ([1234567890, "AAAA"], [1, 2]) + param_names = ["value", "dim"] + + def setup(self, value, dim): + count_elem = 10000 + if dim == 1: + self.object = (np.array([value] * count_elem),) + if dim == 2: + self.object = ( + np.array([value] * count_elem), + np.array([value] * count_elem), + ) + + def time_check_concat(self, value, dim): + _concat_date_cols(self.object) diff --git a/asv_bench/benchmarks/io/pickle.py b/asv_bench/benchmarks/io/pickle.py index 2ad0fcca6eb26..647e9d27dec9d 100644 --- a/asv_bench/benchmarks/io/pickle.py +++ b/asv_bench/benchmarks/io/pickle.py @@ -1,22 +1,22 @@ import numpy as np + from pandas import DataFrame, date_range, read_pickle import pandas.util.testing as tm -from ..pandas_vb_common import BaseIO, setup # noqa +from ..pandas_vb_common import BaseIO class Pickle(BaseIO): - - goal_time = 0.2 - def setup(self): - self.fname = '__test__.pkl' + self.fname = "__test__.pkl" N = 100000 C = 5 - self.df = DataFrame(np.random.randn(N, C), - columns=['float{}'.format(i) for i in range(C)], - index=date_range('20000101', periods=N, freq='H')) - self.df['object'] = tm.makeStringIndex(N) + self.df = DataFrame( + np.random.randn(N, C), + columns=["float{}".format(i) for i in range(C)], + index=date_range("20000101", periods=N, freq="H"), + ) + self.df["object"] = tm.makeStringIndex(N) self.df.to_pickle(self.fname) def time_read_pickle(self): @@ -24,3 +24,6 @@ def time_read_pickle(self): def time_write_pickle(self): self.df.to_pickle(self.fname) + + +from ..pandas_vb_common import setup # noqa: F401 isort:skip diff --git a/asv_bench/benchmarks/io/sas.py b/asv_bench/benchmarks/io/sas.py index 526c524de7fff..7ce8ef8c12639 100644 --- a/asv_bench/benchmarks/io/sas.py +++ b/asv_bench/benchmarks/io/sas.py @@ -3,18 +3,27 @@ from pandas import read_sas -class SAS(object): +class SAS: - goal_time = 0.2 - params = ['sas7bdat', 'xport'] - param_names = ['format'] + params = ["sas7bdat", "xport"] + param_names = ["format"] def setup(self, format): # Read files that are located in 'pandas/io/tests/sas/data' - files = {'sas7bdat': 'test1.sas7bdat', 'xport': 'paxraw_d_short.xpt'} + files = {"sas7bdat": "test1.sas7bdat", "xport": "paxraw_d_short.xpt"} file = files[format] - paths = [os.path.dirname(__file__), '..', '..', '..', 'pandas', - 'tests', 'io', 'sas', 'data', file] + paths = [ + os.path.dirname(__file__), + "..", + "..", + "..", + "pandas", + "tests", + "io", + "sas", + "data", + file, + ] self.f = os.path.join(*paths) def time_read_msgpack(self, format): diff --git a/asv_bench/benchmarks/io/sql.py b/asv_bench/benchmarks/io/sql.py index ef4e501e5f3b9..fe84c869717e3 100644 --- a/asv_bench/benchmarks/io/sql.py +++ b/asv_bench/benchmarks/io/sql.py @@ -1,132 +1,145 @@ import sqlite3 import numpy as np -import pandas.util.testing as tm -from pandas import DataFrame, date_range, read_sql_query, read_sql_table from sqlalchemy import create_engine -from ..pandas_vb_common import setup # noqa +from pandas import DataFrame, date_range, read_sql_query, read_sql_table +import pandas.util.testing as tm -class SQL(object): +class SQL: - goal_time = 0.2 - params = ['sqlalchemy', 'sqlite'] - param_names = ['connection'] + params = ["sqlalchemy", "sqlite"] + param_names = ["connection"] def 
setup(self, connection): N = 10000 - con = {'sqlalchemy': create_engine('sqlite:///:memory:'), - 'sqlite': sqlite3.connect(':memory:')} - self.table_name = 'test_type' - self.query_all = 'SELECT * FROM {}'.format(self.table_name) + con = { + "sqlalchemy": create_engine("sqlite:///:memory:"), + "sqlite": sqlite3.connect(":memory:"), + } + self.table_name = "test_type" + self.query_all = "SELECT * FROM {}".format(self.table_name) self.con = con[connection] - self.df = DataFrame({'float': np.random.randn(N), - 'float_with_nan': np.random.randn(N), - 'string': ['foo'] * N, - 'bool': [True] * N, - 'int': np.random.randint(0, N, size=N), - 'datetime': date_range('2000-01-01', - periods=N, - freq='s')}, - index=tm.makeStringIndex(N)) - self.df.loc[1000:3000, 'float_with_nan'] = np.nan - self.df['datetime_string'] = self.df['datetime'].astype(str) - self.df.to_sql(self.table_name, self.con, if_exists='replace') + self.df = DataFrame( + { + "float": np.random.randn(N), + "float_with_nan": np.random.randn(N), + "string": ["foo"] * N, + "bool": [True] * N, + "int": np.random.randint(0, N, size=N), + "datetime": date_range("2000-01-01", periods=N, freq="s"), + }, + index=tm.makeStringIndex(N), + ) + self.df.loc[1000:3000, "float_with_nan"] = np.nan + self.df["datetime_string"] = self.df["datetime"].astype(str) + self.df.to_sql(self.table_name, self.con, if_exists="replace") def time_to_sql_dataframe(self, connection): - self.df.to_sql('test1', self.con, if_exists='replace') + self.df.to_sql("test1", self.con, if_exists="replace") def time_read_sql_query(self, connection): read_sql_query(self.query_all, self.con) -class WriteSQLDtypes(object): +class WriteSQLDtypes: - goal_time = 0.2 - params = (['sqlalchemy', 'sqlite'], - ['float', 'float_with_nan', 'string', 'bool', 'int', 'datetime']) - param_names = ['connection', 'dtype'] + params = ( + ["sqlalchemy", "sqlite"], + ["float", "float_with_nan", "string", "bool", "int", "datetime"], + ) + param_names = ["connection", "dtype"] def setup(self, connection, dtype): N = 10000 - con = {'sqlalchemy': create_engine('sqlite:///:memory:'), - 'sqlite': sqlite3.connect(':memory:')} - self.table_name = 'test_type' - self.query_col = 'SELECT {} FROM {}'.format(dtype, self.table_name) + con = { + "sqlalchemy": create_engine("sqlite:///:memory:"), + "sqlite": sqlite3.connect(":memory:"), + } + self.table_name = "test_type" + self.query_col = "SELECT {} FROM {}".format(dtype, self.table_name) self.con = con[connection] - self.df = DataFrame({'float': np.random.randn(N), - 'float_with_nan': np.random.randn(N), - 'string': ['foo'] * N, - 'bool': [True] * N, - 'int': np.random.randint(0, N, size=N), - 'datetime': date_range('2000-01-01', - periods=N, - freq='s')}, - index=tm.makeStringIndex(N)) - self.df.loc[1000:3000, 'float_with_nan'] = np.nan - self.df['datetime_string'] = self.df['datetime'].astype(str) - self.df.to_sql(self.table_name, self.con, if_exists='replace') + self.df = DataFrame( + { + "float": np.random.randn(N), + "float_with_nan": np.random.randn(N), + "string": ["foo"] * N, + "bool": [True] * N, + "int": np.random.randint(0, N, size=N), + "datetime": date_range("2000-01-01", periods=N, freq="s"), + }, + index=tm.makeStringIndex(N), + ) + self.df.loc[1000:3000, "float_with_nan"] = np.nan + self.df["datetime_string"] = self.df["datetime"].astype(str) + self.df.to_sql(self.table_name, self.con, if_exists="replace") def time_to_sql_dataframe_column(self, connection, dtype): - self.df[[dtype]].to_sql('test1', self.con, if_exists='replace') + 
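
Both connection flavors exercised here follow the same pattern: write a frame with to_sql, then read it back with read_sql_query. A compact sketch of that round trip against an in-memory database (the table name is illustrative):

import sqlite3

import numpy as np
from pandas import DataFrame, read_sql_query

con = sqlite3.connect(":memory:")
df = DataFrame({"x": np.random.randn(100)})
df.to_sql("demo", con, if_exists="replace")  # replace any prior run's table
read_sql_query("SELECT * FROM demo", con)
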
self.df[[dtype]].to_sql("test1", self.con, if_exists="replace") def time_read_sql_query_select_column(self, connection, dtype): read_sql_query(self.query_col, self.con) -class ReadSQLTable(object): - - goal_time = 0.2 - +class ReadSQLTable: def setup(self): N = 10000 - self.table_name = 'test' - self.con = create_engine('sqlite:///:memory:') - self.df = DataFrame({'float': np.random.randn(N), - 'float_with_nan': np.random.randn(N), - 'string': ['foo'] * N, - 'bool': [True] * N, - 'int': np.random.randint(0, N, size=N), - 'datetime': date_range('2000-01-01', - periods=N, - freq='s')}, - index=tm.makeStringIndex(N)) - self.df.loc[1000:3000, 'float_with_nan'] = np.nan - self.df['datetime_string'] = self.df['datetime'].astype(str) - self.df.to_sql(self.table_name, self.con, if_exists='replace') + self.table_name = "test" + self.con = create_engine("sqlite:///:memory:") + self.df = DataFrame( + { + "float": np.random.randn(N), + "float_with_nan": np.random.randn(N), + "string": ["foo"] * N, + "bool": [True] * N, + "int": np.random.randint(0, N, size=N), + "datetime": date_range("2000-01-01", periods=N, freq="s"), + }, + index=tm.makeStringIndex(N), + ) + self.df.loc[1000:3000, "float_with_nan"] = np.nan + self.df["datetime_string"] = self.df["datetime"].astype(str) + self.df.to_sql(self.table_name, self.con, if_exists="replace") def time_read_sql_table_all(self): read_sql_table(self.table_name, self.con) def time_read_sql_table_parse_dates(self): - read_sql_table(self.table_name, self.con, columns=['datetime_string'], - parse_dates=['datetime_string']) - + read_sql_table( + self.table_name, + self.con, + columns=["datetime_string"], + parse_dates=["datetime_string"], + ) -class ReadSQLTableDtypes(object): - goal_time = 0.2 +class ReadSQLTableDtypes: - params = ['float', 'float_with_nan', 'string', 'bool', 'int', 'datetime'] - param_names = ['dtype'] + params = ["float", "float_with_nan", "string", "bool", "int", "datetime"] + param_names = ["dtype"] def setup(self, dtype): N = 10000 - self.table_name = 'test' - self.con = create_engine('sqlite:///:memory:') - self.df = DataFrame({'float': np.random.randn(N), - 'float_with_nan': np.random.randn(N), - 'string': ['foo'] * N, - 'bool': [True] * N, - 'int': np.random.randint(0, N, size=N), - 'datetime': date_range('2000-01-01', - periods=N, - freq='s')}, - index=tm.makeStringIndex(N)) - self.df.loc[1000:3000, 'float_with_nan'] = np.nan - self.df['datetime_string'] = self.df['datetime'].astype(str) - self.df.to_sql(self.table_name, self.con, if_exists='replace') + self.table_name = "test" + self.con = create_engine("sqlite:///:memory:") + self.df = DataFrame( + { + "float": np.random.randn(N), + "float_with_nan": np.random.randn(N), + "string": ["foo"] * N, + "bool": [True] * N, + "int": np.random.randint(0, N, size=N), + "datetime": date_range("2000-01-01", periods=N, freq="s"), + }, + index=tm.makeStringIndex(N), + ) + self.df.loc[1000:3000, "float_with_nan"] = np.nan + self.df["datetime_string"] = self.df["datetime"].astype(str) + self.df.to_sql(self.table_name, self.con, if_exists="replace") def time_read_sql_table_column(self, dtype): read_sql_table(self.table_name, self.con, columns=[dtype]) + + +from ..pandas_vb_common import setup # noqa: F401 isort:skip diff --git a/asv_bench/benchmarks/io/stata.py b/asv_bench/benchmarks/io/stata.py index e0f5752ca930f..28829785d72e9 100644 --- a/asv_bench/benchmarks/io/stata.py +++ b/asv_bench/benchmarks/io/stata.py @@ -1,33 +1,37 @@ import numpy as np + from pandas import DataFrame, date_range, read_stata 
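
The "from ..pandas_vb_common import setup  # noqa: F401 isort:skip" line that now closes each module re-exports a shared asv fixture: asv runs a module-level setup function before every benchmark, noqa silences the unused-import lint warning, and isort:skip pins the import at the bottom of the file. The fixture itself, as defined in pandas_vb_common later in this diff, simply reseeds NumPy so every benchmark sees identical random data:

import numpy as np

def setup(*args, **kwargs):
    # invoked by asv before each benchmark run
    np.random.seed(1234)
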
import pandas.util.testing as tm -from ..pandas_vb_common import BaseIO, setup # noqa +from ..pandas_vb_common import BaseIO class Stata(BaseIO): - goal_time = 0.2 - params = ['tc', 'td', 'tm', 'tw', 'th', 'tq', 'ty'] - param_names = ['convert_dates'] + params = ["tc", "td", "tm", "tw", "th", "tq", "ty"] + param_names = ["convert_dates"] def setup(self, convert_dates): - self.fname = '__test__.dta' - N = 100000 - C = 5 - self.df = DataFrame(np.random.randn(N, C), - columns=['float{}'.format(i) for i in range(C)], - index=date_range('20000101', periods=N, freq='H')) - self.df['object'] = tm.makeStringIndex(N) - self.df['int8_'] = np.random.randint(np.iinfo(np.int8).min, - np.iinfo(np.int8).max - 27, N) - self.df['int16_'] = np.random.randint(np.iinfo(np.int16).min, - np.iinfo(np.int16).max - 27, N) - self.df['int32_'] = np.random.randint(np.iinfo(np.int32).min, - np.iinfo(np.int32).max - 27, N) - self.df['float32_'] = np.array(np.random.randn(N), - dtype=np.float32) - self.convert_dates = {'index': convert_dates} + self.fname = "__test__.dta" + N = self.N = 100000 + C = self.C = 5 + self.df = DataFrame( + np.random.randn(N, C), + columns=["float{}".format(i) for i in range(C)], + index=date_range("20000101", periods=N, freq="H"), + ) + self.df["object"] = tm.makeStringIndex(self.N) + self.df["int8_"] = np.random.randint( + np.iinfo(np.int8).min, np.iinfo(np.int8).max - 27, N + ) + self.df["int16_"] = np.random.randint( + np.iinfo(np.int16).min, np.iinfo(np.int16).max - 27, N + ) + self.df["int32_"] = np.random.randint( + np.iinfo(np.int32).min, np.iinfo(np.int32).max - 27, N + ) + self.df["float32_"] = np.array(np.random.randn(N), dtype=np.float32) + self.convert_dates = {"index": convert_dates} self.df.to_stata(self.fname, self.convert_dates) def time_read_stata(self, convert_dates): @@ -35,3 +39,16 @@ def time_read_stata(self, convert_dates): def time_write_stata(self, convert_dates): self.df.to_stata(self.fname, self.convert_dates) + + +class StataMissing(Stata): + def setup(self, convert_dates): + super().setup(convert_dates) + for i in range(10): + missing_data = np.random.randn(self.N) + missing_data[missing_data < 0] = np.nan + self.df["missing_{0}".format(i)] = missing_data + self.df.to_stata(self.fname, self.convert_dates) + + +from ..pandas_vb_common import setup # noqa: F401 isort:skip diff --git a/asv_bench/benchmarks/join_merge.py b/asv_bench/benchmarks/join_merge.py index de0a3b33da147..6aa82a43a4d6a 100644 --- a/asv_bench/benchmarks/join_merge.py +++ b/asv_bench/benchmarks/join_merge.py @@ -1,36 +1,26 @@ -import warnings import string import numpy as np + +from pandas import DataFrame, MultiIndex, Series, concat, date_range, merge, merge_asof import pandas.util.testing as tm -from pandas import (DataFrame, Series, MultiIndex, date_range, concat, merge, - merge_asof) + try: from pandas import merge_ordered except ImportError: from pandas import ordered_merge as merge_ordered -from .pandas_vb_common import Panel, setup # noqa - - -class Append(object): - - goal_time = 0.2 +class Append: def setup(self): - self.df1 = DataFrame(np.random.randn(10000, 4), - columns=['A', 'B', 'C', 'D']) + self.df1 = DataFrame(np.random.randn(10000, 4), columns=["A", "B", "C", "D"]) self.df2 = self.df1.copy() self.df2.index = np.arange(10000, 20000) self.mdf1 = self.df1.copy() - self.mdf1['obj1'] = 'bar' - self.mdf1['obj2'] = 'bar' - self.mdf1['int1'] = 5 - try: - with warnings.catch_warnings(record=True): - self.mdf1.consolidate(inplace=True) - except: - pass + self.mdf1["obj1"] = "bar" + 
self.mdf1["obj2"] = "bar" + self.mdf1["int1"] = 5 + self.mdf1 = self.mdf1._consolidate() self.mdf2 = self.mdf1.copy() self.mdf2.index = self.df2.index @@ -41,24 +31,25 @@ def time_append_mixed(self): self.mdf1.append(self.mdf2) -class Concat(object): +class Concat: - goal_time = 0.2 params = [0, 1] - param_names = ['axis'] + param_names = ["axis"] def setup(self, axis): N = 1000 s = Series(N, index=tm.makeStringIndex(N)) - self.series = [s[i:- i] for i in range(1, 10)] * 50 + self.series = [s[i:-i] for i in range(1, 10)] * 50 self.small_frames = [DataFrame(np.random.randn(5, 4))] * 1000 - df = DataFrame({'A': range(N)}, - index=date_range('20130101', periods=N, freq='s')) + df = DataFrame( + {"A": range(N)}, index=date_range("20130101", periods=N, freq="s") + ) self.empty_left = [DataFrame(), df] self.empty_right = [df, DataFrame()] + self.mixed_ndims = [df, df.head(N // 2)] def time_concat_series(self, axis): - concat(self.series, axis=axis) + concat(self.series, axis=axis, sort=False) def time_concat_small_frames(self, axis): concat(self.small_frames, axis=axis) @@ -69,45 +60,19 @@ def time_concat_empty_right(self, axis): def time_concat_empty_left(self, axis): concat(self.empty_left, axis=axis) + def time_concat_mixed_ndims(self, axis): + concat(self.mixed_ndims, axis=axis) -class ConcatPanels(object): - goal_time = 0.2 - params = ([0, 1, 2], [True, False]) - param_names = ['axis', 'ignore_index'] - - def setup(self, axis, ignore_index): - with warnings.catch_warnings(record=True): - panel_c = Panel(np.zeros((10000, 200, 2), - dtype=np.float32, - order='C')) - self.panels_c = [panel_c] * 20 - panel_f = Panel(np.zeros((10000, 200, 2), - dtype=np.float32, - order='F')) - self.panels_f = [panel_f] * 20 - - def time_c_ordered(self, axis, ignore_index): - with warnings.catch_warnings(record=True): - concat(self.panels_c, axis=axis, ignore_index=ignore_index) +class ConcatDataFrames: - def time_f_ordered(self, axis, ignore_index): - with warnings.catch_warnings(record=True): - concat(self.panels_f, axis=axis, ignore_index=ignore_index) - - -class ConcatDataFrames(object): - - goal_time = 0.2 params = ([0, 1], [True, False]) - param_names = ['axis', 'ignore_index'] + param_names = ["axis", "ignore_index"] def setup(self, axis, ignore_index): - frame_c = DataFrame(np.zeros((10000, 200), - dtype=np.float32, order='C')) + frame_c = DataFrame(np.zeros((10000, 200), dtype=np.float32, order="C")) self.frame_c = [frame_c] * 20 - frame_f = DataFrame(np.zeros((10000, 200), - dtype=np.float32, order='F')) + frame_f = DataFrame(np.zeros((10000, 200), dtype=np.float32, order="F")) self.frame_f = [frame_f] * 20 def time_c_ordered(self, axis, ignore_index): @@ -117,93 +82,91 @@ def time_f_ordered(self, axis, ignore_index): concat(self.frame_f, axis=axis, ignore_index=ignore_index) -class Join(object): +class Join: - goal_time = 0.2 params = [True, False] - param_names = ['sort'] + param_names = ["sort"] def setup(self, sort): level1 = tm.makeStringIndex(10).values level2 = tm.makeStringIndex(1000).values - label1 = np.arange(10).repeat(1000) - label2 = np.tile(np.arange(1000), 10) - index2 = MultiIndex(levels=[level1, level2], - labels=[label1, label2]) - self.df_multi = DataFrame(np.random.randn(len(index2), 4), - index=index2, - columns=['A', 'B', 'C', 'D']) - - self.key1 = np.tile(level1.take(label1), 10) - self.key2 = np.tile(level2.take(label2), 10) - self.df = DataFrame({'data1': np.random.randn(100000), - 'data2': np.random.randn(100000), - 'key1': self.key1, - 'key2': self.key2}) - - self.df_key1 = 
DataFrame(np.random.randn(len(level1), 4), - index=level1, - columns=['A', 'B', 'C', 'D']) - self.df_key2 = DataFrame(np.random.randn(len(level2), 4), - index=level2, - columns=['A', 'B', 'C', 'D']) + codes1 = np.arange(10).repeat(1000) + codes2 = np.tile(np.arange(1000), 10) + index2 = MultiIndex(levels=[level1, level2], codes=[codes1, codes2]) + self.df_multi = DataFrame( + np.random.randn(len(index2), 4), index=index2, columns=["A", "B", "C", "D"] + ) + + self.key1 = np.tile(level1.take(codes1), 10) + self.key2 = np.tile(level2.take(codes2), 10) + self.df = DataFrame( + { + "data1": np.random.randn(100000), + "data2": np.random.randn(100000), + "key1": self.key1, + "key2": self.key2, + } + ) + + self.df_key1 = DataFrame( + np.random.randn(len(level1), 4), index=level1, columns=["A", "B", "C", "D"] + ) + self.df_key2 = DataFrame( + np.random.randn(len(level2), 4), index=level2, columns=["A", "B", "C", "D"] + ) shuf = np.arange(100000) np.random.shuffle(shuf) self.df_shuf = self.df.reindex(self.df.index[shuf]) def time_join_dataframe_index_multi(self, sort): - self.df.join(self.df_multi, on=['key1', 'key2'], sort=sort) + self.df.join(self.df_multi, on=["key1", "key2"], sort=sort) def time_join_dataframe_index_single_key_bigger(self, sort): - self.df.join(self.df_key2, on='key2', sort=sort) + self.df.join(self.df_key2, on="key2", sort=sort) def time_join_dataframe_index_single_key_small(self, sort): - self.df.join(self.df_key1, on='key1', sort=sort) + self.df.join(self.df_key1, on="key1", sort=sort) def time_join_dataframe_index_shuffle_key_bigger_sort(self, sort): - self.df_shuf.join(self.df_key2, on='key2', sort=sort) + self.df_shuf.join(self.df_key2, on="key2", sort=sort) -class JoinIndex(object): - - goal_time = 0.2 - +class JoinIndex: def setup(self): N = 50000 - self.left = DataFrame(np.random.randint(1, N / 500, (N, 2)), - columns=['jim', 'joe']) - self.right = DataFrame(np.random.randint(1, N / 500, (N, 2)), - columns=['jolie', 'jolia']).set_index('jolie') + self.left = DataFrame( + np.random.randint(1, N / 500, (N, 2)), columns=["jim", "joe"] + ) + self.right = DataFrame( + np.random.randint(1, N / 500, (N, 2)), columns=["jolie", "jolia"] + ).set_index("jolie") def time_left_outer_join_index(self): - self.left.join(self.right, on='jim') + self.left.join(self.right, on="jim") -class JoinNonUnique(object): +class JoinNonUnique: # outer join of non-unique # GH 6329 - goal_time = 0.2 - def setup(self): - date_index = date_range('01-Jan-2013', '23-Jan-2013', freq='T') - daily_dates = date_index.to_period('D').to_timestamp('S', 'S') + date_index = date_range("01-Jan-2013", "23-Jan-2013", freq="T") + daily_dates = date_index.to_period("D").to_timestamp("S", "S") self.fracofday = date_index.values - daily_dates.values - self.fracofday = self.fracofday.astype('timedelta64[ns]') + self.fracofday = self.fracofday.astype("timedelta64[ns]") self.fracofday = self.fracofday.astype(np.float64) / 86400000000000.0 self.fracofday = Series(self.fracofday, daily_dates) - index = date_range(date_index.min(), date_index.max(), freq='D') + index = date_range(date_index.min(), date_index.max(), freq="D") self.temp = Series(1.0, index)[self.fracofday.index] def time_join_non_unique_equal(self): self.fracofday * self.temp -class Merge(object): +class Merge: - goal_time = 0.2 params = [True, False] - param_names = ['sort'] + param_names = ["sort"] def setup(self, sort): N = 10000 @@ -211,17 +174,25 @@ def setup(self, sort): indices2 = tm.makeStringIndex(N).values key = np.tile(indices[:8000], 10) key2 = 
np.tile(indices2[:8000], 10) - self.left = DataFrame({'key': key, 'key2': key2, - 'value': np.random.randn(80000)}) - self.right = DataFrame({'key': indices[2000:], - 'key2': indices2[2000:], - 'value2': np.random.randn(8000)}) - - self.df = DataFrame({'key1': np.tile(np.arange(500).repeat(10), 2), - 'key2': np.tile(np.arange(250).repeat(10), 4), - 'value': np.random.randn(10000)}) - self.df2 = DataFrame({'key1': np.arange(500), - 'value2': np.random.randn(500)}) + self.left = DataFrame( + {"key": key, "key2": key2, "value": np.random.randn(80000)} + ) + self.right = DataFrame( + { + "key": indices[2000:], + "key2": indices2[2000:], + "value2": np.random.randn(8000), + } + ) + + self.df = DataFrame( + { + "key1": np.tile(np.arange(500).repeat(10), 2), + "key2": np.tile(np.arange(250).repeat(10), 4), + "value": np.random.randn(10000), + } + ) + self.df2 = DataFrame({"key1": np.arange(500), "value2": np.random.randn(500)}) self.df3 = self.df[:5000] def time_merge_2intkey(self, sort): @@ -231,125 +202,141 @@ def time_merge_dataframe_integer_2key(self, sort): merge(self.df, self.df3, sort=sort) def time_merge_dataframe_integer_key(self, sort): - merge(self.df, self.df2, on='key1', sort=sort) + merge(self.df, self.df2, on="key1", sort=sort) -class I8Merge(object): +class I8Merge: - goal_time = 0.2 - params = ['inner', 'outer', 'left', 'right'] - param_names = ['how'] + params = ["inner", "outer", "left", "right"] + param_names = ["how"] def setup(self, how): - low, high, n = -1000, 1000, 10**6 - self.left = DataFrame(np.random.randint(low, high, (n, 7)), - columns=list('ABCDEFG')) - self.left['left'] = self.left.sum(axis=1) - self.right = self.left.sample(frac=1).rename({'left': 'right'}, axis=1) + low, high, n = -1000, 1000, 10 ** 6 + self.left = DataFrame( + np.random.randint(low, high, (n, 7)), columns=list("ABCDEFG") + ) + self.left["left"] = self.left.sum(axis=1) + self.right = self.left.sample(frac=1).rename({"left": "right"}, axis=1) self.right = self.right.reset_index(drop=True) - self.right['right'] *= -1 + self.right["right"] *= -1 def time_i8merge(self, how): merge(self.left, self.right, how=how) -class MergeCategoricals(object): - - goal_time = 0.2 - +class MergeCategoricals: def setup(self): self.left_object = DataFrame( - {'X': np.random.choice(range(0, 10), size=(10000,)), - 'Y': np.random.choice(['one', 'two', 'three'], size=(10000,))}) + { + "X": np.random.choice(range(0, 10), size=(10000,)), + "Y": np.random.choice(["one", "two", "three"], size=(10000,)), + } + ) self.right_object = DataFrame( - {'X': np.random.choice(range(0, 10), size=(10000,)), - 'Z': np.random.choice(['jjj', 'kkk', 'sss'], size=(10000,))}) + { + "X": np.random.choice(range(0, 10), size=(10000,)), + "Z": np.random.choice(["jjj", "kkk", "sss"], size=(10000,)), + } + ) self.left_cat = self.left_object.assign( - Y=self.left_object['Y'].astype('category')) + Y=self.left_object["Y"].astype("category") + ) self.right_cat = self.right_object.assign( - Z=self.right_object['Z'].astype('category')) + Z=self.right_object["Z"].astype("category") + ) def time_merge_object(self): - merge(self.left_object, self.right_object, on='X') + merge(self.left_object, self.right_object, on="X") def time_merge_cat(self): - merge(self.left_cat, self.right_cat, on='X') - + merge(self.left_cat, self.right_cat, on="X") -class MergeOrdered(object): +class MergeOrdered: def setup(self): groups = tm.makeStringIndex(10).values - self.left = DataFrame({'group': groups.repeat(5000), - 'key': np.tile(np.arange(0, 10000, 2), 10), - 'lvalue': 
np.random.randn(50000)}) - self.right = DataFrame({'key': np.arange(10000), - 'rvalue': np.random.randn(10000)}) + self.left = DataFrame( + { + "group": groups.repeat(5000), + "key": np.tile(np.arange(0, 10000, 2), 10), + "lvalue": np.random.randn(50000), + } + ) + self.right = DataFrame( + {"key": np.arange(10000), "rvalue": np.random.randn(10000)} + ) def time_merge_ordered(self): - merge_ordered(self.left, self.right, on='key', left_by='group') + merge_ordered(self.left, self.right, on="key", left_by="group") -class MergeAsof(object): +class MergeAsof: + params = [["backward", "forward", "nearest"]] + param_names = ["direction"] - def setup(self): + def setup(self, direction): one_count = 200000 two_count = 1000000 df1 = DataFrame( - {'time': np.random.randint(0, one_count / 20, one_count), - 'key': np.random.choice(list(string.ascii_uppercase), one_count), - 'key2': np.random.randint(0, 25, one_count), - 'value1': np.random.randn(one_count)}) + { + "time": np.random.randint(0, one_count / 20, one_count), + "key": np.random.choice(list(string.ascii_uppercase), one_count), + "key2": np.random.randint(0, 25, one_count), + "value1": np.random.randn(one_count), + } + ) df2 = DataFrame( - {'time': np.random.randint(0, two_count / 20, two_count), - 'key': np.random.choice(list(string.ascii_uppercase), two_count), - 'key2': np.random.randint(0, 25, two_count), - 'value2': np.random.randn(two_count)}) - - df1 = df1.sort_values('time') - df2 = df2.sort_values('time') - - df1['time32'] = np.int32(df1.time) - df2['time32'] = np.int32(df2.time) - - self.df1a = df1[['time', 'value1']] - self.df2a = df2[['time', 'value2']] - self.df1b = df1[['time', 'key', 'value1']] - self.df2b = df2[['time', 'key', 'value2']] - self.df1c = df1[['time', 'key2', 'value1']] - self.df2c = df2[['time', 'key2', 'value2']] - self.df1d = df1[['time32', 'value1']] - self.df2d = df2[['time32', 'value2']] - self.df1e = df1[['time', 'key', 'key2', 'value1']] - self.df2e = df2[['time', 'key', 'key2', 'value2']] - - def time_on_int(self): - merge_asof(self.df1a, self.df2a, on='time') - - def time_on_int32(self): - merge_asof(self.df1d, self.df2d, on='time32') - - def time_by_object(self): - merge_asof(self.df1b, self.df2b, on='time', by='key') - - def time_by_int(self): - merge_asof(self.df1c, self.df2c, on='time', by='key2') - - def time_multiby(self): - merge_asof(self.df1e, self.df2e, on='time', by=['key', 'key2']) - - -class Align(object): - - goal_time = 0.2 - + { + "time": np.random.randint(0, two_count / 20, two_count), + "key": np.random.choice(list(string.ascii_uppercase), two_count), + "key2": np.random.randint(0, 25, two_count), + "value2": np.random.randn(two_count), + } + ) + + df1 = df1.sort_values("time") + df2 = df2.sort_values("time") + + df1["time32"] = np.int32(df1.time) + df2["time32"] = np.int32(df2.time) + + self.df1a = df1[["time", "value1"]] + self.df2a = df2[["time", "value2"]] + self.df1b = df1[["time", "key", "value1"]] + self.df2b = df2[["time", "key", "value2"]] + self.df1c = df1[["time", "key2", "value1"]] + self.df2c = df2[["time", "key2", "value2"]] + self.df1d = df1[["time32", "value1"]] + self.df2d = df2[["time32", "value2"]] + self.df1e = df1[["time", "key", "key2", "value1"]] + self.df2e = df2[["time", "key", "key2", "value2"]] + + def time_on_int(self, direction): + merge_asof(self.df1a, self.df2a, on="time", direction=direction) + + def time_on_int32(self, direction): + merge_asof(self.df1d, self.df2d, on="time32", direction=direction) + + def time_by_object(self, direction): + 
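
The new direction parameter selects how merge_asof matches keys: "backward" takes the last right-hand row whose key is at or before the left key, "forward" takes the first at or after it, and "nearest" takes whichever is closer. A tiny usage sketch with illustrative data (both frames must be sorted on the key):

import pandas as pd

left = pd.DataFrame({"time": [1, 5, 10], "left_val": ["a", "b", "c"]})
right = pd.DataFrame({"time": [2, 3, 7], "right_val": [1.0, 2.0, 3.0]})
# direction="backward" matches left time=5 to right time=3; "forward" would pick 7
pd.merge_asof(left, right, on="time", direction="backward")
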
merge_asof(self.df1b, self.df2b, on="time", by="key", direction=direction) + + def time_by_int(self, direction): + merge_asof(self.df1c, self.df2c, on="time", by="key2", direction=direction) + + def time_multiby(self, direction): + merge_asof( + self.df1e, self.df2e, on="time", by=["key", "key2"], direction=direction + ) + + +class Align: def setup(self): - size = 5 * 10**5 - rng = np.arange(0, 10**13, 10**7) - stamps = np.datetime64('now').view('i8') + rng + size = 5 * 10 ** 5 + rng = np.arange(0, 10 ** 13, 10 ** 7) + stamps = np.datetime64("now").view("i8") + rng idx1 = np.sort(np.random.choice(stamps, size, replace=False)) idx2 = np.sort(np.random.choice(stamps, size, replace=False)) self.ts1 = Series(np.random.randn(size), idx1) @@ -359,4 +346,7 @@ def time_series_align_int64_index(self): self.ts1 + self.ts2 def time_series_align_left_monotonic(self): - self.ts1.align(self.ts2, join='left') + self.ts1.align(self.ts2, join="left") + + +from .pandas_vb_common import setup # noqa: F401 isort:skip diff --git a/asv_bench/benchmarks/multiindex_object.py b/asv_bench/benchmarks/multiindex_object.py index 0c92214795557..3f4fd7ad911c1 100644 --- a/asv_bench/benchmarks/multiindex_object.py +++ b/asv_bench/benchmarks/multiindex_object.py @@ -1,57 +1,50 @@ import string import numpy as np -import pandas.util.testing as tm -from pandas import date_range, MultiIndex - -from .pandas_vb_common import setup # noqa - -class GetLoc(object): +from pandas import DataFrame, MultiIndex, date_range +import pandas.util.testing as tm - goal_time = 0.2 +class GetLoc: def setup(self): self.mi_large = MultiIndex.from_product( [np.arange(1000), np.arange(20), list(string.ascii_letters)], - names=['one', 'two', 'three']) + names=["one", "two", "three"], + ) self.mi_med = MultiIndex.from_product( - [np.arange(1000), np.arange(10), list('A')], - names=['one', 'two', 'three']) + [np.arange(1000), np.arange(10), list("A")], names=["one", "two", "three"] + ) self.mi_small = MultiIndex.from_product( - [np.arange(100), list('A'), list('A')], - names=['one', 'two', 'three']) + [np.arange(100), list("A"), list("A")], names=["one", "two", "three"] + ) def time_large_get_loc(self): - self.mi_large.get_loc((999, 19, 'Z')) + self.mi_large.get_loc((999, 19, "Z")) def time_large_get_loc_warm(self): for _ in range(1000): - self.mi_large.get_loc((999, 19, 'Z')) + self.mi_large.get_loc((999, 19, "Z")) def time_med_get_loc(self): - self.mi_med.get_loc((999, 9, 'A')) + self.mi_med.get_loc((999, 9, "A")) def time_med_get_loc_warm(self): for _ in range(1000): - self.mi_med.get_loc((999, 9, 'A')) + self.mi_med.get_loc((999, 9, "A")) def time_string_get_loc(self): - self.mi_small.get_loc((99, 'A', 'A')) + self.mi_small.get_loc((99, "A", "A")) def time_small_get_loc_warm(self): for _ in range(1000): - self.mi_small.get_loc((99, 'A', 'A')) - - -class Duplicates(object): + self.mi_small.get_loc((99, "A", "A")) - goal_time = 0.2 +class Duplicates: def setup(self): size = 65536 - arrays = [np.random.randint(0, 8192, size), - np.random.randint(0, 1024, size)] + arrays = [np.random.randint(0, 8192, size), np.random.randint(0, 1024, size)] mask = np.random.rand(size) < 0.1 self.mi_unused_levels = MultiIndex.from_arrays(arrays) self.mi_unused_levels = self.mi_unused_levels[mask] @@ -60,18 +53,26 @@ def time_remove_unused_levels(self): self.mi_unused_levels.remove_unused_levels() -class Integer(object): - - goal_time = 0.2 - +class Integer: def setup(self): - self.mi_int = MultiIndex.from_product([np.arange(1000), - np.arange(1000)], - names=['one', 
'two']) - self.obj_index = np.array([(0, 10), (0, 11), (0, 12), - (0, 13), (0, 14), (0, 15), - (0, 16), (0, 17), (0, 18), - (0, 19)], dtype=object) + self.mi_int = MultiIndex.from_product( + [np.arange(1000), np.arange(1000)], names=["one", "two"] + ) + self.obj_index = np.array( + [ + (0, 10), + (0, 11), + (0, 12), + (0, 13), + (0, 14), + (0, 15), + (0, 16), + (0, 17), + (0, 18), + (0, 19), + ], + dtype=object, + ) def time_get_indexer(self): self.mi_int.get_indexer(self.obj_index) @@ -80,31 +81,25 @@ def time_is_monotonic(self): self.mi_int.is_monotonic -class Duplicated(object): - - goal_time = 0.2 - +class Duplicated: def setup(self): n, k = 200, 5000 - levels = [np.arange(n), - tm.makeStringIndex(n).values, - 1000 + np.arange(n)] - labels = [np.random.choice(n, (k * n)) for lev in levels] - self.mi = MultiIndex(levels=levels, labels=labels) + levels = [np.arange(n), tm.makeStringIndex(n).values, 1000 + np.arange(n)] + codes = [np.random.choice(n, (k * n)) for lev in levels] + self.mi = MultiIndex(levels=levels, codes=codes) def time_duplicated(self): self.mi.duplicated() -class Sortlevel(object): - - goal_time = 0.2 - +class Sortlevel: def setup(self): n = 1182720 low, high = -4096, 4096 - arrs = [np.repeat(np.random.randint(low, high, (n // k)), k) - for k in [11, 7, 5, 3, 1]] + arrs = [ + np.repeat(np.random.randint(low, high, (n // k)), k) + for k in [11, 7, 5, 3, 1] + ] self.mi_int = MultiIndex.from_arrays(arrs)[np.random.permutation(n)] a = np.repeat(np.arange(100), 1000) @@ -122,14 +117,11 @@ def time_sortlevel_one(self): self.mi.sortlevel(1) -class Values(object): - - goal_time = 0.2 - +class Values: def setup_cache(self): level1 = range(1000) - level2 = date_range(start='1/1/2012', periods=100) + level2 = date_range(start="1/1/2012", periods=100) mi = MultiIndex.from_product([level1, level2]) return mi @@ -138,3 +130,21 @@ def time_datetime_level_values_copy(self, mi): def time_datetime_level_values_sliced(self, mi): mi[:10].values + + +class CategoricalLevel: + def setup(self): + + self.df = DataFrame( + { + "a": np.arange(1_000_000, dtype=np.int32), + "b": np.arange(1_000_000, dtype=np.int64), + "c": np.arange(1_000_000, dtype=float), + } + ).astype({"a": "category", "b": "category"}) + + def time_categorical_level(self): + self.df.set_index(["a", "b"]) + + +from .pandas_vb_common import setup # noqa: F401 isort:skip diff --git a/asv_bench/benchmarks/offset.py b/asv_bench/benchmarks/offset.py index e161b887ee86f..d822646e712ae 100644 --- a/asv_bench/benchmarks/offset.py +++ b/asv_bench/benchmarks/offset.py @@ -1,79 +1,85 @@ -# -*- coding: utf-8 -*- -import warnings from datetime import datetime +import warnings import numpy as np + import pandas as pd + try: import pandas.tseries.holiday # noqa except ImportError: pass hcal = pd.tseries.holiday.USFederalHolidayCalendar() -# These offests currently raise a NotImplimentedError with .apply_index() -non_apply = [pd.offsets.Day(), - pd.offsets.BYearEnd(), - pd.offsets.BYearBegin(), - pd.offsets.BQuarterEnd(), - pd.offsets.BQuarterBegin(), - pd.offsets.BMonthEnd(), - pd.offsets.BMonthBegin(), - pd.offsets.CustomBusinessDay(), - pd.offsets.CustomBusinessDay(calendar=hcal), - pd.offsets.CustomBusinessMonthBegin(calendar=hcal), - pd.offsets.CustomBusinessMonthEnd(calendar=hcal), - pd.offsets.CustomBusinessMonthEnd(calendar=hcal)] -other_offsets = [pd.offsets.YearEnd(), pd.offsets.YearBegin(), - pd.offsets.QuarterEnd(), pd.offsets.QuarterBegin(), - pd.offsets.MonthEnd(), pd.offsets.MonthBegin(), - pd.offsets.DateOffset(months=2, 
days=2),
-                 pd.offsets.BusinessDay(), pd.offsets.SemiMonthEnd(),
-                 pd.offsets.SemiMonthBegin()]
+# These offsets currently raise a NotImplementedError with .apply_index()
+non_apply = [
+    pd.offsets.Day(),
+    pd.offsets.BYearEnd(),
+    pd.offsets.BYearBegin(),
+    pd.offsets.BQuarterEnd(),
+    pd.offsets.BQuarterBegin(),
+    pd.offsets.BMonthEnd(),
+    pd.offsets.BMonthBegin(),
+    pd.offsets.CustomBusinessDay(),
+    pd.offsets.CustomBusinessDay(calendar=hcal),
+    pd.offsets.CustomBusinessMonthBegin(calendar=hcal),
+    pd.offsets.CustomBusinessMonthEnd(calendar=hcal),
+    pd.offsets.CustomBusinessMonthEnd(calendar=hcal),
+]
+other_offsets = [
+    pd.offsets.YearEnd(),
+    pd.offsets.YearBegin(),
+    pd.offsets.QuarterEnd(),
+    pd.offsets.QuarterBegin(),
+    pd.offsets.MonthEnd(),
+    pd.offsets.MonthBegin(),
+    pd.offsets.DateOffset(months=2, days=2),
+    pd.offsets.BusinessDay(),
+    pd.offsets.SemiMonthEnd(),
+    pd.offsets.SemiMonthBegin(),
+]
 offsets = non_apply + other_offsets


-class ApplyIndex(object):
-
-    goal_time = 0.2
+class ApplyIndex:

     params = other_offsets
-    param_names = ['offset']
+    param_names = ["offset"]

     def setup(self, offset):
         N = 10000
-        self.rng = pd.date_range(start='1/1/2000', periods=N, freq='T')
+        self.rng = pd.date_range(start="1/1/2000", periods=N, freq="T")

     def time_apply_index(self, offset):
         offset.apply_index(self.rng)


-class OnOffset(object):
-
-    goal_time = 0.2
+class OnOffset:

     params = offsets
-    param_names = ['offset']
+    param_names = ["offset"]

     def setup(self, offset):
-        self.dates = [datetime(2016, m, d)
-                      for m in [10, 11, 12]
-                      for d in [1, 2, 3, 28, 29, 30, 31]
-                      if not (m == 11 and d == 31)]
+        self.dates = [
+            datetime(2016, m, d)
+            for m in [10, 11, 12]
+            for d in [1, 2, 3, 28, 29, 30, 31]
+            if not (m == 11 and d == 31)
+        ]

     def time_on_offset(self, offset):
         for date in self.dates:
             offset.onOffset(date)


-class OffsetSeriesArithmetic(object):
+class OffsetSeriesArithmetic:

-    goal_time = 0.2
     params = offsets
-    param_names = ['offset']
+    param_names = ["offset"]

     def setup(self, offset):
         N = 1000
-        rng = pd.date_range(start='1/1/2000', periods=N, freq='T')
+        rng = pd.date_range(start="1/1/2000", periods=N, freq="T")
         self.data = pd.Series(rng)

     def time_add_offset(self, offset):
@@ -81,30 +87,28 @@ def time_add_offset(self, offset):
         self.data + offset


-class OffsetDatetimeIndexArithmetic(object):
+class OffsetDatetimeIndexArithmetic:

-    goal_time = 0.2
     params = offsets
-    param_names = ['offset']
+    param_names = ["offset"]

     def setup(self, offset):
         N = 1000
-        self.data = pd.date_range(start='1/1/2000', periods=N, freq='T')
+        self.data = pd.date_range(start="1/1/2000", periods=N, freq="T")

     def time_add_offset(self, offset):
         with warnings.catch_warnings(record=True):
             self.data + offset


-class OffestDatetimeArithmetic(object):
+class OffestDatetimeArithmetic:

-    goal_time = 0.2
     params = offsets
-    param_names = ['offset']
+    param_names = ["offset"]

     def setup(self, offset):
         self.date = datetime(2011, 1, 1)
-        self.dt64 = np.datetime64('2011-01-01 09:00Z')
+        self.dt64 = np.datetime64("2011-01-01 09:00Z")

     def time_apply(self, offset):
         offset.apply(self.date)
diff --git a/asv_bench/benchmarks/package.py b/asv_bench/benchmarks/package.py
new file mode 100644
index 0000000000000..8ca33db361fa0
--- /dev/null
+++ b/asv_bench/benchmarks/package.py
@@ -0,0 +1,25 @@
+"""
+Benchmarks for pandas at the package-level.
+"""
+import subprocess
+import sys
+
+from pandas.compat import PY37
+
+
+class TimeImport:
+    def time_import(self):
+        if PY37:
+            # on py37+ the "-X importtime" usage gives us a more precise
+            #  measurement of the import time we actually care about,
+            #  without the subprocess or interpreter overhead
+            cmd = [sys.executable, "-X", "importtime", "-c", "import pandas as pd"]
+            p = subprocess.run(cmd, stderr=subprocess.PIPE)
+
+            line = p.stderr.splitlines()[-1]
+            field = line.split(b"|")[-2].strip()
+            total = int(field)  # microseconds
+            return total
+
+        cmd = [sys.executable, "-c", "import pandas as pd"]
+        subprocess.run(cmd, stderr=subprocess.PIPE)
diff --git a/asv_bench/benchmarks/pandas_vb_common.py b/asv_bench/benchmarks/pandas_vb_common.py
index c0d24afae4219..1faf13329110d 100644
--- a/asv_bench/benchmarks/pandas_vb_common.py
+++ b/asv_bench/benchmarks/pandas_vb_common.py
@@ -1,23 +1,49 @@
-import os
 from importlib import import_module
+import os

 import numpy as np
-try:
-    from pandas import Panel
-except ImportError:
-    from pandas import WidePanel as Panel  # noqa
+
+import pandas as pd

 # Compatibility import for lib
-for imp in ['pandas._libs.lib', 'pandas.lib']:
+for imp in ["pandas._libs.lib", "pandas.lib"]:
     try:
         lib = import_module(imp)
         break
-    except:
+    except (ImportError, TypeError, ValueError):
         pass

-numeric_dtypes = [np.int64, np.int32, np.uint32, np.uint64, np.float32,
-                  np.float64, np.int16, np.int8, np.uint16, np.uint8]
+numeric_dtypes = [
+    np.int64,
+    np.int32,
+    np.uint32,
+    np.uint64,
+    np.float32,
+    np.float64,
+    np.int16,
+    np.int8,
+    np.uint16,
+    np.uint8,
+]
 datetime_dtypes = [np.datetime64, np.timedelta64]
+string_dtypes = [np.object]
+try:
+    extension_dtypes = [
+        pd.Int8Dtype,
+        pd.Int16Dtype,
+        pd.Int32Dtype,
+        pd.Int64Dtype,
+        pd.UInt8Dtype,
+        pd.UInt16Dtype,
+        pd.UInt32Dtype,
+        pd.UInt64Dtype,
+        pd.CategoricalDtype,
+        pd.IntervalDtype,
+        pd.DatetimeTZDtype("ns", "UTC"),
+        pd.PeriodDtype("D"),
+    ]
+except AttributeError:
+    extension_dtypes = []


 def setup(*args, **kwargs):
@@ -27,17 +53,18 @@ def setup(*args, **kwargs):
     np.random.seed(1234)


-class BaseIO(object):
+class BaseIO:
     """
     Base class for IO benchmarks
     """
+
     fname = None

     def remove(self, f):
         """Remove created files"""
         try:
             os.remove(f)
-        except:
+        except OSError:
             # On Windows, attempting to remove a file that is in use
             # causes an exception to be raised
             pass
diff --git a/asv_bench/benchmarks/panel_ctor.py b/asv_bench/benchmarks/panel_ctor.py
deleted file mode 100644
index ce946c76ed199..0000000000000
--- a/asv_bench/benchmarks/panel_ctor.py
+++ /dev/null
@@ -1,60 +0,0 @@
-import warnings
-from datetime import datetime, timedelta
-
-from pandas import DataFrame, DatetimeIndex, date_range
-
-from .pandas_vb_common import Panel, setup  # noqa
-
-
-class DifferentIndexes(object):
-    goal_time = 0.2
-
-    def setup(self):
-        self.data_frames = {}
-        start = datetime(1990, 1, 1)
-        end = datetime(2012, 1, 1)
-        for x in range(100):
-            end += timedelta(days=1)
-            idx = date_range(start, end)
-            df = DataFrame({'a': 0, 'b': 1, 'c': 2}, index=idx)
-            self.data_frames[x] = df
-
-    def time_from_dict(self):
-        with warnings.catch_warnings(record=True):
-            Panel.from_dict(self.data_frames)
-
-
-class SameIndexes(object):
-
-    goal_time = 0.2
-
-    def setup(self):
-        idx = DatetimeIndex(start=datetime(1990, 1, 1),
-                            end=datetime(2012, 1, 1),
-                            freq='D')
-        df = DataFrame({'a': 0, 'b': 1, 'c': 2}, index=idx)
-        self.data_frames = dict(enumerate([df] * 100))
-
-    def time_from_dict(self):
-        with warnings.catch_warnings(record=True):
-
Panel.from_dict(self.data_frames) - - -class TwoIndexes(object): - - goal_time = 0.2 - - def setup(self): - start = datetime(1990, 1, 1) - end = datetime(2012, 1, 1) - df1 = DataFrame({'a': 0, 'b': 1, 'c': 2}, - index=DatetimeIndex(start=start, end=end, freq='D')) - end += timedelta(days=1) - df2 = DataFrame({'a': 0, 'b': 1, 'c': 2}, - index=DatetimeIndex(start=start, end=end, freq='D')) - dfs = [df1] * 50 + [df2] * 50 - self.data_frames = dict(enumerate(dfs)) - - def time_from_dict(self): - with warnings.catch_warnings(record=True): - Panel.from_dict(self.data_frames) diff --git a/asv_bench/benchmarks/panel_methods.py b/asv_bench/benchmarks/panel_methods.py deleted file mode 100644 index a5b1a92e9cf67..0000000000000 --- a/asv_bench/benchmarks/panel_methods.py +++ /dev/null @@ -1,24 +0,0 @@ -import warnings - -import numpy as np - -from .pandas_vb_common import Panel, setup # noqa - - -class PanelMethods(object): - - goal_time = 0.2 - params = ['items', 'major', 'minor'] - param_names = ['axis'] - - def setup(self, axis): - with warnings.catch_warnings(record=True): - self.panel = Panel(np.random.randn(100, 1000, 100)) - - def time_pct_change(self, axis): - with warnings.catch_warnings(record=True): - self.panel.pct_change(1, axis=axis) - - def time_shift(self, axis): - with warnings.catch_warnings(record=True): - self.panel.shift(1, axis=axis) diff --git a/asv_bench/benchmarks/period.py b/asv_bench/benchmarks/period.py index 897a3338c164c..7303240a25f29 100644 --- a/asv_bench/benchmarks/period.py +++ b/asv_bench/benchmarks/period.py @@ -1,29 +1,46 @@ -from pandas import (DataFrame, Series, Period, PeriodIndex, date_range, - period_range) - - -class PeriodProperties(object): - - params = (['M', 'min'], - ['year', 'month', 'day', 'hour', 'minute', 'second', - 'is_leap_year', 'quarter', 'qyear', 'week', 'daysinmonth', - 'dayofweek', 'dayofyear', 'start_time', 'end_time']) - param_names = ['freq', 'attr'] +from pandas import DataFrame, Period, PeriodIndex, Series, date_range, period_range + +from pandas.tseries.frequencies import to_offset + + +class PeriodProperties: + + params = ( + ["M", "min"], + [ + "year", + "month", + "day", + "hour", + "minute", + "second", + "is_leap_year", + "quarter", + "qyear", + "week", + "daysinmonth", + "dayofweek", + "dayofyear", + "start_time", + "end_time", + ], + ) + param_names = ["freq", "attr"] def setup(self, freq, attr): - self.per = Period('2012-06-01', freq=freq) + self.per = Period("2012-06-01", freq=freq) def time_property(self, freq, attr): getattr(self.per, attr) -class PeriodUnaryMethods(object): +class PeriodUnaryMethods: - params = ['M', 'min'] - param_names = ['freq'] + params = ["M", "min"] + param_names = ["freq"] def setup(self, freq): - self.per = Period('2012-06-01', freq=freq) + self.per = Period("2012-06-01", freq=freq) def time_to_timestamp(self, freq): self.per.to_timestamp() @@ -32,53 +49,83 @@ def time_now(self, freq): self.per.now(freq) def time_asfreq(self, freq): - self.per.asfreq('A') + self.per.asfreq("A") -class PeriodIndexConstructor(object): +class PeriodConstructor: + params = [["D"], [True, False]] + param_names = ["freq", "is_offset"] - goal_time = 0.2 + def setup(self, freq, is_offset): + if is_offset: + self.freq = to_offset(freq) + else: + self.freq = freq - params = ['D'] - param_names = ['freq'] + def time_period_constructor(self, freq, is_offset): + Period("2012-06-01", freq=freq) - def setup(self, freq): - self.rng = date_range('1985', periods=1000) - self.rng2 = date_range('1985', periods=1000).to_pydatetime() - 
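
The is_offset axis added here checks that construction costs are comparable whether freq arrives as a string or as the equivalent DateOffset resolved through to_offset. The two spellings, mirroring the setup code:

from pandas import Period
from pandas.tseries.frequencies import to_offset

Period("2012-06-01", freq="D")             # freq as a string
Period("2012-06-01", freq=to_offset("D"))  # freq as a DateOffset
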
def time_from_date_range(self, freq): +class PeriodIndexConstructor: + + params = [["D"], [True, False]] + param_names = ["freq", "is_offset"] + + def setup(self, freq, is_offset): + self.rng = date_range("1985", periods=1000) + self.rng2 = date_range("1985", periods=1000).to_pydatetime() + self.ints = list(range(2000, 3000)) + self.daily_ints = ( + date_range("1/1/2000", periods=1000, freq=freq).strftime("%Y%m%d").map(int) + ) + if is_offset: + self.freq = to_offset(freq) + else: + self.freq = freq + + def time_from_date_range(self, freq, is_offset): PeriodIndex(self.rng, freq=freq) - def time_from_pydatetime(self, freq): + def time_from_pydatetime(self, freq, is_offset): PeriodIndex(self.rng2, freq=freq) + def time_from_ints(self, freq, is_offset): + PeriodIndex(self.ints, freq=freq) -class DataFramePeriodColumn(object): + def time_from_ints_daily(self, freq, is_offset): + PeriodIndex(self.daily_ints, freq=freq) - goal_time = 0.2 +class DataFramePeriodColumn: def setup(self): - self.rng = period_range(start='1/1/1990', freq='S', periods=20000) + self.rng = period_range(start="1/1/1990", freq="S", periods=20000) self.df = DataFrame(index=range(len(self.rng))) def time_setitem_period_column(self): - self.df['col'] = self.rng + self.df["col"] = self.rng + def time_set_index(self): + # GH#21582 limited by comparisons of Period objects + self.df["col2"] = self.rng + self.df.set_index("col2", append=True) -class Algorithms(object): - goal_time = 0.2 +class Algorithms: - params = ['index', 'series'] - param_names = ['typ'] + params = ["index", "series"] + param_names = ["typ"] def setup(self, typ): - data = [Period('2011-01', freq='M'), Period('2011-02', freq='M'), - Period('2011-03', freq='M'), Period('2011-04', freq='M')] - - if typ == 'index': - self.vector = PeriodIndex(data * 1000, freq='M') - elif typ == 'series': + data = [ + Period("2011-01", freq="M"), + Period("2011-02", freq="M"), + Period("2011-03", freq="M"), + Period("2011-04", freq="M"), + ] + + if typ == "index": + self.vector = PeriodIndex(data * 1000, freq="M") + elif typ == "series": self.vector = Series(data * 1000) def time_drop_duplicates(self, typ): @@ -88,12 +135,9 @@ def time_value_counts(self, typ): self.vector.value_counts() -class Indexing(object): - - goal_time = 0.2 - +class Indexing: def setup(self): - self.index = PeriodIndex(start='1985', periods=1000, freq='D') + self.index = period_range(start="1985", periods=1000, freq="D") self.series = Series(range(1000), index=self.index) self.period = self.index[500] @@ -110,7 +154,10 @@ def time_series_loc(self): self.series.loc[self.period] def time_align(self): - DataFrame({'a': self.series, 'b': self.series[:500]}) + DataFrame({"a": self.series, "b": self.series[:500]}) def time_intersection(self): self.index[:750].intersection(self.index[250:]) + + def time_unique(self): + self.index.unique() diff --git a/asv_bench/benchmarks/plotting.py b/asv_bench/benchmarks/plotting.py index 5b49112b0e07d..5c718516360ed 100644 --- a/asv_bench/benchmarks/plotting.py +++ b/asv_bench/benchmarks/plotting.py @@ -1,44 +1,74 @@ +import matplotlib import numpy as np -from pandas import DataFrame, Series, DatetimeIndex, date_range + +from pandas import DataFrame, DatetimeIndex, Series, date_range + try: from pandas.plotting import andrews_curves except ImportError: from pandas.tools.plotting import andrews_curves -import matplotlib -matplotlib.use('Agg') -from .pandas_vb_common import setup # noqa +matplotlib.use("Agg") -class Plotting(object): +class SeriesPlotting: + params = [["line", 
"bar", "area", "barh", "hist", "kde", "pie"]] + param_names = ["kind"] - goal_time = 0.2 + def setup(self, kind): + if kind in ["bar", "barh", "pie"]: + n = 100 + elif kind in ["kde"]: + n = 10000 + else: + n = 1000000 - def setup(self): - self.s = Series(np.random.randn(1000000)) - self.df = DataFrame({'col': self.s}) + self.s = Series(np.random.randn(n)) + if kind in ["area", "pie"]: + self.s = self.s.abs() - def time_series_plot(self): - self.s.plot() + def time_series_plot(self, kind): + self.s.plot(kind=kind) - def time_frame_plot(self): - self.df.plot() +class FramePlotting: + params = [ + ["line", "bar", "area", "barh", "hist", "kde", "pie", "scatter", "hexbin"] + ] + param_names = ["kind"] -class TimeseriesPlotting(object): + def setup(self, kind): + if kind in ["bar", "barh", "pie"]: + n = 100 + elif kind in ["kde", "scatter", "hexbin"]: + n = 10000 + else: + n = 1000000 - goal_time = 0.2 + self.x = Series(np.random.randn(n)) + self.y = Series(np.random.randn(n)) + if kind in ["area", "pie"]: + self.x = self.x.abs() + self.y = self.y.abs() + self.df = DataFrame({"x": self.x, "y": self.y}) + def time_frame_plot(self, kind): + self.df.plot(x="x", y="y", kind=kind) + + +class TimeseriesPlotting: def setup(self): N = 2000 M = 5 - idx = date_range('1/1/1975', periods=N) + idx = date_range("1/1/1975", periods=N) self.df = DataFrame(np.random.randn(N, M), index=idx) - idx_irregular = DatetimeIndex(np.concatenate((idx.values[0:10], - idx.values[12:]))) - self.df2 = DataFrame(np.random.randn(len(idx_irregular), M), - index=idx_irregular) + idx_irregular = DatetimeIndex( + np.concatenate((idx.values[0:10], idx.values[12:])) + ) + self.df2 = DataFrame( + np.random.randn(len(idx_irregular), M), index=idx_irregular + ) def time_plot_regular(self): self.df.plot() @@ -49,16 +79,19 @@ def time_plot_regular_compat(self): def time_plot_irregular(self): self.df2.plot() + def time_plot_table(self): + self.df.plot(table=True) -class Misc(object): - - goal_time = 0.6 +class Misc: def setup(self): N = 500 M = 10 self.df = DataFrame(np.random.randn(N, M)) - self.df['Name'] = ["A"] * N + self.df["Name"] = ["A"] * N def time_plot_andrews_curves(self): andrews_curves(self.df, "Name") + + +from .pandas_vb_common import setup # noqa: F401 isort:skip diff --git a/asv_bench/benchmarks/reindex.py b/asv_bench/benchmarks/reindex.py index 413427a16f40b..cd450f801c805 100644 --- a/asv_bench/benchmarks/reindex.py +++ b/asv_bench/benchmarks/reindex.py @@ -1,22 +1,20 @@ import numpy as np -import pandas.util.testing as tm -from pandas import (DataFrame, Series, DatetimeIndex, MultiIndex, Index, - date_range) -from .pandas_vb_common import setup, lib # noqa +from pandas import DataFrame, Index, MultiIndex, Series, date_range, period_range +import pandas.util.testing as tm -class Reindex(object): +from .pandas_vb_common import lib - goal_time = 0.2 +class Reindex: def setup(self): - rng = DatetimeIndex(start='1/1/1970', periods=10000, freq='1min') - self.df = DataFrame(np.random.rand(10000, 10), index=rng, - columns=range(10)) - self.df['foo'] = 'bar' + rng = date_range(start="1/1/1970", periods=10000, freq="1min") + self.df = DataFrame(np.random.rand(10000, 10), index=rng, columns=range(10)) + self.df["foo"] = "bar" self.rng_subset = Index(rng[::2]) - self.df2 = DataFrame(index=range(10000), - data=np.random.rand(10000, 30), columns=range(30)) + self.df2 = DataFrame( + index=range(10000), data=np.random.rand(10000, 30), columns=range(30) + ) N = 5000 K = 200 level1 = tm.makeStringIndex(N).values.repeat(K) @@ -35,33 
+33,31 @@ def time_reindex_multiindex(self): self.s.reindex(self.s_subset.index) -class ReindexMethod(object): +class ReindexMethod: - goal_time = 0.2 - params = ['pad', 'backfill'] - param_names = ['method'] + params = [["pad", "backfill"], [date_range, period_range]] + param_names = ["method", "constructor"] - def setup(self, method): + def setup(self, method, constructor): N = 100000 - self.idx = date_range('1/1/2000', periods=N, freq='1min') + self.idx = constructor("1/1/2000", periods=N, freq="1min") self.ts = Series(np.random.randn(N), index=self.idx)[::2] - def time_reindex_method(self, method): + def time_reindex_method(self, method, constructor): self.ts.reindex(self.idx, method=method) -class Fillna(object): +class Fillna: - goal_time = 0.2 - params = ['pad', 'backfill'] - param_names = ['method'] + params = ["pad", "backfill"] + param_names = ["method"] def setup(self, method): N = 100000 - self.idx = date_range('1/1/2000', periods=N, freq='1min') + self.idx = date_range("1/1/2000", periods=N, freq="1min") ts = Series(np.random.randn(N), index=self.idx)[::2] self.ts_reindexed = ts.reindex(self.idx) - self.ts_float32 = self.ts_reindexed.astype('float32') + self.ts_float32 = self.ts_reindexed.astype("float32") def time_reindexed(self, method): self.ts_reindexed.fillna(method=method) @@ -70,20 +66,18 @@ def time_float_32(self, method): self.ts_float32.fillna(method=method) -class LevelAlign(object): - - goal_time = 0.2 - +class LevelAlign: def setup(self): self.index = MultiIndex( levels=[np.arange(10), np.arange(100), np.arange(100)], - labels=[np.arange(10).repeat(10000), - np.tile(np.arange(100).repeat(100), 10), - np.tile(np.tile(np.arange(100), 100), 10)]) - self.df = DataFrame(np.random.randn(len(self.index), 4), - index=self.index) - self.df_level = DataFrame(np.random.randn(100, 4), - index=self.index.levels[1]) + codes=[ + np.arange(10).repeat(10000), + np.tile(np.arange(100).repeat(100), 10), + np.tile(np.tile(np.arange(100), 100), 10), + ], + ) + self.df = DataFrame(np.random.randn(len(self.index), 4), index=self.index) + self.df_level = DataFrame(np.random.randn(100, 4), index=self.index.levels[1]) def time_align_level(self): self.df.align(self.df_level, level=1, copy=False) @@ -92,19 +86,19 @@ def time_reindex_level(self): self.df_level.reindex(self.index, level=1) -class DropDuplicates(object): +class DropDuplicates: - goal_time = 0.2 params = [True, False] - param_names = ['inplace'] + param_names = ["inplace"] def setup(self, inplace): N = 10000 K = 10 key1 = tm.makeStringIndex(N).values.repeat(K) key2 = tm.makeStringIndex(N).values.repeat(K) - self.df = DataFrame({'key1': key1, 'key2': key2, - 'value': np.random.randn(N * K)}) + self.df = DataFrame( + {"key1": key1, "key2": key2, "value": np.random.randn(N * K)} + ) self.df_nan = self.df.copy() self.df_nan.iloc[:10000, :] = np.nan @@ -114,15 +108,14 @@ def setup(self, inplace): N = 1000000 K = 10000 key1 = np.random.randint(0, K, size=N) - self.df_int = DataFrame({'key1': key1}) - self.df_bool = DataFrame(np.random.randint(0, 2, size=(K, 10), - dtype=bool)) + self.df_int = DataFrame({"key1": key1}) + self.df_bool = DataFrame(np.random.randint(0, 2, size=(K, 10), dtype=bool)) def time_frame_drop_dups(self, inplace): - self.df.drop_duplicates(['key1', 'key2'], inplace=inplace) + self.df.drop_duplicates(["key1", "key2"], inplace=inplace) def time_frame_drop_dups_na(self, inplace): - self.df_nan.drop_duplicates(['key1', 'key2'], inplace=inplace) + self.df_nan.drop_duplicates(["key1", "key2"], inplace=inplace) def 
time_series_drop_dups_int(self, inplace): self.s.drop_duplicates(inplace=inplace) @@ -137,27 +130,23 @@ def time_frame_drop_dups_bool(self, inplace): self.df_bool.drop_duplicates(inplace=inplace) -class Align(object): +class Align: # blog "pandas escaped the zoo" - goal_time = 0.2 - def setup(self): n = 50000 indices = tm.makeStringIndex(n) subsample_size = 40000 self.x = Series(np.random.randn(n), indices) - self.y = Series(np.random.randn(subsample_size), - index=np.random.choice(indices, subsample_size, - replace=False)) + self.y = Series( + np.random.randn(subsample_size), + index=np.random.choice(indices, subsample_size, replace=False), + ) def time_align_series_irregular_string(self): self.x + self.y -class LibFastZip(object): - - goal_time = 0.2 - +class LibFastZip: def setup(self): N = 10000 K = 10 @@ -170,3 +159,6 @@ def setup(self): def time_lib_fast_zip(self): lib.fast_zip(self.col_array_list) + + +from .pandas_vb_common import setup # noqa: F401 isort:skip diff --git a/asv_bench/benchmarks/replace.py b/asv_bench/benchmarks/replace.py index 41208125e8f32..2a115fb0b4fe3 100644 --- a/asv_bench/benchmarks/replace.py +++ b/asv_bench/benchmarks/replace.py @@ -1,18 +1,16 @@ import numpy as np -import pandas as pd -from .pandas_vb_common import setup # noqa +import pandas as pd -class FillNa(object): +class FillNa: - goal_time = 0.2 params = [True, False] - param_names = ['inplace'] + param_names = ["inplace"] def setup(self, inplace): - N = 10**6 - rng = pd.date_range('1/1/2000', periods=N, freq='min') + N = 10 ** 6 + rng = pd.date_range("1/1/2000", periods=N, freq="min") data = np.random.randn(N) data[::2] = np.nan self.ts = pd.Series(data, index=rng) @@ -24,35 +22,56 @@ def time_replace(self, inplace): self.ts.replace(np.nan, 0.0, inplace=inplace) -class ReplaceDict(object): +class ReplaceDict: - goal_time = 0.2 params = [True, False] - param_names = ['inplace'] + param_names = ["inplace"] def setup(self, inplace): - N = 10**5 - start_value = 10**5 + N = 10 ** 5 + start_value = 10 ** 5 self.to_rep = dict(enumerate(np.arange(N) + start_value)) - self.s = pd.Series(np.random.randint(N, size=10**3)) + self.s = pd.Series(np.random.randint(N, size=10 ** 3)) def time_replace_series(self, inplace): self.s.replace(self.to_rep, inplace=inplace) -class Convert(object): +class ReplaceList: + # GH#28099 + + params = [(True, False)] + param_names = ["inplace"] + + def setup(self, inplace): + self.df = pd.DataFrame({"A": 0, "B": 0}, index=range(4 * 10 ** 7)) + + def time_replace_list(self, inplace): + self.df.replace([np.inf, -np.inf], np.nan, inplace=inplace) - goal_time = 0.5 - params = (['DataFrame', 'Series'], ['Timestamp', 'Timedelta']) - param_names = ['constructor', 'replace_data'] + def time_replace_list_one_match(self, inplace): + # the 1 can be held in self.df._data.blocks[0], while the inf and -inf can't + self.df.replace([np.inf, -np.inf, 1], np.nan, inplace=inplace) + + +class Convert: + + params = (["DataFrame", "Series"], ["Timestamp", "Timedelta"]) + param_names = ["constructor", "replace_data"] def setup(self, constructor, replace_data): - N = 10**3 - data = {'Series': pd.Series(np.random.randint(N, size=N)), - 'DataFrame': pd.DataFrame({'A': np.random.randint(N, size=N), - 'B': np.random.randint(N, size=N)})} + N = 10 ** 3 + data = { + "Series": pd.Series(np.random.randint(N, size=N)), + "DataFrame": pd.DataFrame( + {"A": np.random.randint(N, size=N), "B": np.random.randint(N, size=N)} + ), + } self.to_replace = {i: getattr(pd, replace_data) for i in range(N)} self.data =
data[constructor] def time_replace(self, constructor, replace_data): self.data.replace(self.to_replace) + + +from .pandas_vb_common import setup # noqa: F401 isort:skip diff --git a/asv_bench/benchmarks/reshape.py b/asv_bench/benchmarks/reshape.py index 9044b080c45f9..441f4b380656e 100644 --- a/asv_bench/benchmarks/reshape.py +++ b/asv_bench/benchmarks/reshape.py @@ -1,47 +1,40 @@ from itertools import product +import string import numpy as np -from pandas import DataFrame, MultiIndex, date_range, melt, wide_to_long - -from .pandas_vb_common import setup # noqa +import pandas as pd +from pandas import DataFrame, MultiIndex, date_range, melt, wide_to_long -class Melt(object): - - goal_time = 0.2 +class Melt: def setup(self): - self.df = DataFrame(np.random.randn(10000, 3), columns=['A', 'B', 'C']) - self.df['id1'] = np.random.randint(0, 10, 10000) - self.df['id2'] = np.random.randint(100, 1000, 10000) + self.df = DataFrame(np.random.randn(10000, 3), columns=["A", "B", "C"]) + self.df["id1"] = np.random.randint(0, 10, 10000) + self.df["id2"] = np.random.randint(100, 1000, 10000) def time_melt_dataframe(self): - melt(self.df, id_vars=['id1', 'id2']) + melt(self.df, id_vars=["id1", "id2"]) -class Pivot(object): - - goal_time = 0.2 - +class Pivot: def setup(self): N = 10000 - index = date_range('1/1/2000', periods=N, freq='h') - data = {'value': np.random.randn(N * 50), - 'variable': np.arange(50).repeat(N), - 'date': np.tile(index.values, 50)} + index = date_range("1/1/2000", periods=N, freq="h") + data = { + "value": np.random.randn(N * 50), + "variable": np.arange(50).repeat(N), + "date": np.tile(index.values, 50), + } self.df = DataFrame(data) def time_reshape_pivot_time_series(self): - self.df.pivot('date', 'variable', 'value') - + self.df.pivot("date", "variable", "value") -class SimpleReshape(object): - - goal_time = 0.2 +class SimpleReshape: def setup(self): - arrays = [np.arange(100).repeat(100), - np.roll(np.tile(np.arange(100), 100), 25)] + arrays = [np.arange(100).repeat(100), np.roll(np.tile(np.arange(100), 100), 25)] index = MultiIndex.from_arrays(arrays) self.df = DataFrame(np.random.randn(10000, 4), index=index) self.udf = self.df.unstack(1) @@ -53,82 +46,221 @@ def time_unstack(self): self.df.unstack(1) -class Unstack(object): +class Unstack: - goal_time = 0.2 + params = ["int", "category"] - def setup(self): + def setup(self, dtype): m = 100 n = 1000 levels = np.arange(m) index = MultiIndex.from_product([levels] * 2) columns = np.arange(n) - values = np.arange(m * m * n).reshape(m * m, n) + if dtype == "int": + values = np.arange(m * m * n).reshape(m * m, n) + else: + # the category branch is ~20x slower than int. So we + # cut down the size a bit. Now it's only ~3x slower. 
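+ # building the Categorical columns here in setup() also keeps their + # construction cost out of the timed unstack() calls below.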
+ n = 50 + columns = columns[:n] + indices = np.random.randint(0, 52, size=(m * m, n)) + values = np.take(list(string.ascii_letters), indices) + values = [pd.Categorical(v) for v in values.T] + self.df = DataFrame(values, index, columns) self.df2 = self.df.iloc[:-1] - def time_full_product(self): + def time_full_product(self, dtype): self.df.unstack() - def time_without_last_row(self): + def time_without_last_row(self, dtype): self.df2.unstack() -class SparseIndex(object): - - goal_time = 0.2 - +class SparseIndex: def setup(self): NUM_ROWS = 1000 - self.df = DataFrame({'A': np.random.randint(50, size=NUM_ROWS), - 'B': np.random.randint(50, size=NUM_ROWS), - 'C': np.random.randint(-10, 10, size=NUM_ROWS), - 'D': np.random.randint(-10, 10, size=NUM_ROWS), - 'E': np.random.randint(10, size=NUM_ROWS), - 'F': np.random.randn(NUM_ROWS)}) - self.df = self.df.set_index(['A', 'B', 'C', 'D', 'E']) + self.df = DataFrame( + { + "A": np.random.randint(50, size=NUM_ROWS), + "B": np.random.randint(50, size=NUM_ROWS), + "C": np.random.randint(-10, 10, size=NUM_ROWS), + "D": np.random.randint(-10, 10, size=NUM_ROWS), + "E": np.random.randint(10, size=NUM_ROWS), + "F": np.random.randn(NUM_ROWS), + } + ) + self.df = self.df.set_index(["A", "B", "C", "D", "E"]) def time_unstack(self): self.df.unstack() -class WideToLong(object): - - goal_time = 0.2 - +class WideToLong: def setup(self): nyrs = 20 nidvars = 20 N = 5000 - self.letters = list('ABCD') - yrvars = [l + str(num) - for l, num in product(self.letters, range(1, nyrs + 1))] + self.letters = list("ABCD") + yrvars = [l + str(num) for l, num in product(self.letters, range(1, nyrs + 1))] columns = [str(i) for i in range(nidvars)] + yrvars - self.df = DataFrame(np.random.randn(N, nidvars + len(yrvars)), - columns=columns) - self.df['id'] = self.df.index + self.df = DataFrame(np.random.randn(N, nidvars + len(yrvars)), columns=columns) + self.df["id"] = self.df.index def time_wide_to_long_big(self): - wide_to_long(self.df, self.letters, i='id', j='year') + wide_to_long(self.df, self.letters, i="id", j="year") -class PivotTable(object): - - goal_time = 0.2 - +class PivotTable: def setup(self): N = 100000 - fac1 = np.array(['A', 'B', 'C'], dtype='O') - fac2 = np.array(['one', 'two'], dtype='O') + fac1 = np.array(["A", "B", "C"], dtype="O") + fac2 = np.array(["one", "two"], dtype="O") ind1 = np.random.randint(0, 3, size=N) ind2 = np.random.randint(0, 2, size=N) - self.df = DataFrame({'key1': fac1.take(ind1), - 'key2': fac2.take(ind2), - 'key3': fac2.take(ind2), - 'value1': np.random.randn(N), - 'value2': np.random.randn(N), - 'value3': np.random.randn(N)}) + self.df = DataFrame( + { + "key1": fac1.take(ind1), + "key2": fac2.take(ind2), + "key3": fac2.take(ind2), + "value1": np.random.randn(N), + "value2": np.random.randn(N), + "value3": np.random.randn(N), + } + ) + self.df2 = DataFrame( + {"col1": list("abcde"), "col2": list("fghij"), "col3": [1, 2, 3, 4, 5]} + ) + self.df2.col1 = self.df2.col1.astype("category") + self.df2.col2 = self.df2.col2.astype("category") def time_pivot_table(self): - self.df.pivot_table(index='key1', columns=['key2', 'key3']) + self.df.pivot_table(index="key1", columns=["key2", "key3"]) + + def time_pivot_table_agg(self): + self.df.pivot_table( + index="key1", columns=["key2", "key3"], aggfunc=["sum", "mean"] + ) + + def time_pivot_table_margins(self): + self.df.pivot_table(index="key1", columns=["key2", "key3"], margins=True) + + def time_pivot_table_categorical(self): + self.df2.pivot_table( + index="col1", values="col3", 
columns="col2", aggfunc=np.sum, fill_value=0 + ) + + def time_pivot_table_categorical_observed(self): + self.df2.pivot_table( + index="col1", + values="col3", + columns="col2", + aggfunc=np.sum, + fill_value=0, + observed=True, + ) + + +class Crosstab: + def setup(self): + N = 100000 + fac1 = np.array(["A", "B", "C"], dtype="O") + fac2 = np.array(["one", "two"], dtype="O") + self.ind1 = np.random.randint(0, 3, size=N) + self.ind2 = np.random.randint(0, 2, size=N) + self.vec1 = fac1.take(self.ind1) + self.vec2 = fac2.take(self.ind2) + + def time_crosstab(self): + pd.crosstab(self.vec1, self.vec2) + + def time_crosstab_values(self): + pd.crosstab(self.vec1, self.vec2, values=self.ind1, aggfunc="sum") + + def time_crosstab_normalize(self): + pd.crosstab(self.vec1, self.vec2, normalize=True) + + def time_crosstab_normalize_margins(self): + pd.crosstab(self.vec1, self.vec2, normalize=True, margins=True) + + +class GetDummies: + def setup(self): + categories = list(string.ascii_letters[:12]) + s = pd.Series( + np.random.choice(categories, size=1000000), + dtype=pd.api.types.CategoricalDtype(categories), + ) + self.s = s + + def time_get_dummies_1d(self): + pd.get_dummies(self.s, sparse=False) + + def time_get_dummies_1d_sparse(self): + pd.get_dummies(self.s, sparse=True) + + +class Cut: + params = [[4, 10, 1000]] + param_names = ["bins"] + + def setup(self, bins): + N = 10 ** 5 + self.int_series = pd.Series(np.arange(N).repeat(5)) + self.float_series = pd.Series(np.random.randn(N).repeat(5)) + self.timedelta_series = pd.Series( + np.random.randint(N, size=N), dtype="timedelta64[ns]" + ) + self.datetime_series = pd.Series( + np.random.randint(N, size=N), dtype="datetime64[ns]" + ) + self.interval_bins = pd.IntervalIndex.from_breaks(np.linspace(0, N, bins)) + + def time_cut_int(self, bins): + pd.cut(self.int_series, bins) + + def time_cut_float(self, bins): + pd.cut(self.float_series, bins) + + def time_cut_timedelta(self, bins): + pd.cut(self.timedelta_series, bins) + + def time_cut_datetime(self, bins): + pd.cut(self.datetime_series, bins) + + def time_qcut_int(self, bins): + pd.qcut(self.int_series, bins) + + def time_qcut_float(self, bins): + pd.qcut(self.float_series, bins) + + def time_qcut_timedelta(self, bins): + pd.qcut(self.timedelta_series, bins) + + def time_qcut_datetime(self, bins): + pd.qcut(self.datetime_series, bins) + + def time_cut_interval(self, bins): + # GH 27668 + pd.cut(self.int_series, self.interval_bins) + + def peakmem_cut_interval(self, bins): + # GH 27668 + pd.cut(self.int_series, self.interval_bins) + + +class Explode: + param_names = ["n_rows", "max_list_length"] + params = [[100, 1000, 10000], [3, 5, 10]] + + def setup(self, n_rows, max_list_length): + + data = [np.arange(np.random.randint(max_list_length)) for _ in range(n_rows)] + self.series = pd.Series(data) + + def time_explode(self, n_rows, max_list_length): + self.series.explode() + + +from .pandas_vb_common import setup # noqa: F401 isort:skip diff --git a/asv_bench/benchmarks/rolling.py b/asv_bench/benchmarks/rolling.py index 75990d83f8212..b42fa553b495c 100644 --- a/asv_bench/benchmarks/rolling.py +++ b/asv_bench/benchmarks/rolling.py @@ -1,38 +1,85 @@ -import pandas as pd import numpy as np -from .pandas_vb_common import setup # noqa +import pandas as pd -class Methods(object): +class Methods: - sample_time = 0.2 - params = (['DataFrame', 'Series'], - [10, 1000], - ['int', 'float'], - ['median', 'mean', 'max', 'min', 'std', 'count', 'skew', 'kurt', - 'sum']) - param_names = ['contructor', 'window', 
'dtype', 'method'] + params = ( + ["DataFrame", "Series"], + [10, 1000], + ["int", "float"], + ["median", "mean", "max", "min", "std", "count", "skew", "kurt", "sum"], + ) + param_names = ["constructor", "window", "dtype", "method"] def setup(self, constructor, window, dtype, method): - N = 10**5 - arr = np.random.random(N).astype(dtype) + N = 10 ** 5 + arr = (100 * np.random.random(N)).astype(dtype) self.roll = getattr(pd, constructor)(arr).rolling(window) def time_rolling(self, constructor, window, dtype, method): getattr(self.roll, method)() + def peakmem_rolling(self, constructor, window, dtype, method): + getattr(self.roll, method)() + + +class ExpandingMethods: + + params = ( + ["DataFrame", "Series"], + ["int", "float"], + ["median", "mean", "max", "min", "std", "count", "skew", "kurt", "sum"], + ) + param_names = ["constructor", "dtype", "method"] + + def setup(self, constructor, dtype, method): + N = 10 ** 5 + arr = (100 * np.random.random(N)).astype(dtype) + self.expanding = getattr(pd, constructor)(arr).expanding() + + def time_expanding(self, constructor, dtype, method): + getattr(self.expanding, method)() + + +class EWMMethods: + + params = (["DataFrame", "Series"], [10, 1000], ["int", "float"], ["mean", "std"]) + param_names = ["constructor", "window", "dtype", "method"] + + def setup(self, constructor, window, dtype, method): + N = 10 ** 5 + arr = (100 * np.random.random(N)).astype(dtype) + self.ewm = getattr(pd, constructor)(arr).ewm(halflife=window) + + def time_ewm(self, constructor, window, dtype, method): + getattr(self.ewm, method)() + + +class VariableWindowMethods(Methods): + params = ( + ["DataFrame", "Series"], + ["50s", "1h", "1d"], + ["int", "float"], + ["median", "mean", "max", "min", "std", "count", "skew", "kurt", "sum"], + ) + param_names = ["constructor", "window", "dtype", "method"] + + def setup(self, constructor, window, dtype, method): + N = 10 ** 5 + arr = (100 * np.random.random(N)).astype(dtype) + index = pd.date_range("2017-01-01", periods=N, freq="5s") + self.roll = getattr(pd, constructor)(arr, index=index).rolling(window) + -class Pairwise(object): +class Pairwise: - sample_time = 0.2 - params = ([10, 1000, None], - ['corr', 'cov'], - [True, False]) - param_names = ['window', 'method', 'pairwise'] + params = ([10, 1000, None], ["corr", "cov"], [True, False]) + param_names = ["window", "method", "pairwise"] def setup(self, window, method, pairwise): - N = 10**4 + N = 10 ** 4 arr = np.random.random(N) self.df = pd.DataFrame(arr) @@ -44,19 +91,38 @@ def time_pairwise(self, window, method, pairwise): getattr(r, method)(self.df, pairwise=pairwise) -class Quantile(object): +class Quantile: + params = ( + ["DataFrame", "Series"], + [10, 1000], + ["int", "float"], + [0, 0.5, 1], + ["linear", "nearest", "lower", "higher", "midpoint"], + ) + param_names = ["constructor", "window", "dtype", "percentile", "interpolation"] - sample_time = 0.2 - params = (['DataFrame', 'Series'], - [10, 1000], - ['int', 'float'], - [0, 0.5, 1]) - param_names = ['constructor', 'window', 'dtype', 'percentile'] - - def setup(self, constructor, window, dtype, percentile): - N = 10**5 + def setup(self, constructor, window, dtype, percentile, interpolation): + N = 10 ** 5 arr = np.random.random(N).astype(dtype) self.roll = getattr(pd, constructor)(arr).rolling(window) - def time_quantile(self, constructor, window, dtype, percentile): - self.roll.quantile(percentile) + def time_quantile(self, constructor, window, dtype, percentile, interpolation): + self.roll.quantile(percentile,
interpolation=interpolation) + + +class PeakMemFixed: + def setup(self): + N = 10 + arr = 100 * np.random.random(N) + self.roll = pd.Series(arr).rolling(10) + + def peakmem_fixed(self): + # GH 25926 + # This is to detect memory leaks in rolling operations. + # To save time this is only run on one method. + # 6000 iterations is enough for most types of leaks to be detected + for x in range(6000): + self.roll.max() + + +from .pandas_vb_common import setup # noqa: F401 isort:skip diff --git a/asv_bench/benchmarks/series_methods.py b/asv_bench/benchmarks/series_methods.py index 478aba278029c..a3f1d92545c3f 100644 --- a/asv_bench/benchmarks/series_methods.py +++ b/asv_bench/benchmarks/series_methods.py @@ -1,22 +1,20 @@ from datetime import datetime import numpy as np -import pandas.util.testing as tm -from pandas import Series, date_range, NaT -from .pandas_vb_common import setup # noqa +from pandas import NaT, Series, date_range +import pandas.util.testing as tm -class SeriesConstructor(object): +class SeriesConstructor: - goal_time = 0.2 - params = [None, 'dict'] - param_names = ['data'] + params = [None, "dict"] + param_names = ["data"] def setup(self, data): - self.idx = date_range(start=datetime(2015, 10, 26), - end=datetime(2016, 1, 1), - freq='50s') + self.idx = date_range( + start=datetime(2015, 10, 26), end=datetime(2016, 1, 1), freq="50s" + ) dict_data = dict(zip(self.idx, range(len(self.idx)))) self.data = None if data is None else dict_data @@ -24,11 +22,10 @@ def time_constructor(self, data): Series(data=self.data, index=self.idx) -class IsIn(object): +class IsIn: - goal_time = 0.2 - params = ['int64', 'object'] - param_names = ['dtype'] + params = ["int64", "uint64", "object"] + param_names = ["dtype"] def setup(self, dtype): self.s = Series(np.random.randint(1, 10, 100000)).astype(dtype) @@ -38,11 +35,66 @@ def time_isin(self, dtypes): self.s.isin(self.values) -class NSort(object): +class IsInFloat64: + def setup(self): + self.small = Series([1, 2], dtype=np.float64) + self.many_different_values = np.arange(10 ** 6, dtype=np.float64) + self.few_different_values = np.zeros(10 ** 7, dtype=np.float64) + self.only_nans_values = np.full(10 ** 7, np.nan, dtype=np.float64) - goal_time = 0.2 - params = ['last', 'first'] - param_names = ['keep'] + def time_isin_many_different(self): + # runtime is dominated by creation of the lookup-table + self.small.isin(self.many_different_values) + + def time_isin_few_different(self): + # runtime is dominated by creation of the lookup-table + self.small.isin(self.few_different_values) + + def time_isin_nan_values(self): + # runtime is dominated by creation of the lookup-table + self.small.isin(self.only_nans_values) + + +class IsInForObjects: + def setup(self): + self.s_nans = Series(np.full(10 ** 4, np.nan)).astype(np.object) + self.vals_nans = np.full(10 ** 4, np.nan).astype(np.object) + self.s_short = Series(np.arange(2)).astype(np.object) + self.s_long = Series(np.arange(10 ** 5)).astype(np.object) + self.vals_short = np.arange(2).astype(np.object) + self.vals_long = np.arange(10 ** 5).astype(np.object) + # because of nans floats are special: + self.s_long_floats = Series(np.arange(10 ** 5, dtype=np.float)).astype( + np.object + ) + self.vals_long_floats = np.arange(10 ** 5, dtype=np.float).astype(np.object) + + def time_isin_nans(self): + # if nan-objects are different objects, + # this has the potential to trigger O(n^2) running time + self.s_nans.isin(self.vals_nans) + + def time_isin_short_series_long_values(self): + # running time
dominated by the preprocessing + self.s_short.isin(self.vals_long) + + def time_isin_long_series_short_values(self): + # running time dominated by look-up + self.s_long.isin(self.vals_short) + + def time_isin_long_series_long_values(self): + # no dominating part + self.s_long.isin(self.vals_long) + + def time_isin_long_series_long_values_floats(self): + # no dominating part + self.s_long_floats.isin(self.vals_long_floats) + + +class NSort: + + params = ["first", "last", "all"] + param_names = ["keep"] def setup(self, keep): self.s = Series(np.random.randint(1, 10, 100000)) @@ -54,56 +106,95 @@ def time_nsmallest(self, keep): self.s.nsmallest(3, keep=keep) -class Dropna(object): +class Dropna: - goal_time = 0.2 - params = ['int', 'datetime'] - param_names = ['dtype'] + params = ["int", "datetime"] + param_names = ["dtype"] def setup(self, dtype): - N = 10**6 - data = {'int': np.random.randint(1, 10, N), - 'datetime': date_range('2000-01-01', freq='S', periods=N)} + N = 10 ** 6 + data = { + "int": np.random.randint(1, 10, N), + "datetime": date_range("2000-01-01", freq="S", periods=N), + } self.s = Series(data[dtype]) - if dtype == 'datetime': + if dtype == "datetime": self.s[np.random.randint(1, N, 100)] = NaT def time_dropna(self, dtype): self.s.dropna() -class Map(object): +class SearchSorted: goal_time = 0.2 - params = ['dict', 'Series'] - param_names = 'mapper' + params = [ + "int8", + "int16", + "int32", + "int64", + "uint8", + "uint16", + "uint32", + "uint64", + "float16", + "float32", + "float64", + "str", + ] + param_names = ["dtype"] - def setup(self, mapper): - map_size = 1000 - map_data = Series(map_size - np.arange(map_size)) - self.map_data = map_data if mapper == 'Series' else map_data.to_dict() - self.s = Series(np.random.randint(0, map_size, 10000)) + def setup(self, dtype): + N = 10 ** 5 + data = np.array([1] * N + [2] * N + [3] * N).astype(dtype) + self.s = Series(data) - def time_map(self, mapper): - self.s.map(self.map_data) + def time_searchsorted(self, dtype): + key = "2" if dtype == "str" else 2 + self.s.searchsorted(key) -class Clip(object): +class Map: - goal_time = 0.2 + params = (["dict", "Series", "lambda"], ["object", "category", "int"]) + param_names = ["mapper", "dtype"] + + def setup(self, mapper, dtype): + map_size = 1000 + map_data = Series(map_size - np.arange(map_size), dtype=dtype) + + # construct mapper + if mapper == "Series": + self.map_data = map_data + elif mapper == "dict": + self.map_data = map_data.to_dict() + elif mapper == "lambda": + map_dict = map_data.to_dict() + self.map_data = lambda x: map_dict[x] + else: + raise NotImplementedError + + self.s = Series(np.random.randint(0, map_size, 10000), dtype=dtype) + + def time_map(self, mapper, *args, **kwargs): + self.s.map(self.map_data) - def setup(self): - self.s = Series(np.random.randn(50)) - def time_clip(self): +class Clip: + params = [50, 1000, 10 ** 5] + param_names = ["n"] + + def setup(self, n): + self.s = Series(np.random.randn(n)) + + def time_clip(self, n): self.s.clip(0, 1) -class ValueCounts(object): +class ValueCounts: - goal_time = 0.2 - params = ['int', 'float', 'object'] - param_names = ['dtype'] + params = ["int", "uint", "float", "object"] + param_names = ["dtype"] def setup(self, dtype): self.s = Series(np.random.randint(0, 1000, size=100000)).astype(dtype) @@ -112,12 +203,77 @@ def time_value_counts(self, dtype): self.s.value_counts() -class Dir(object): - - goal_time = 0.2 - +class Dir: def setup(self): self.s = Series(index=tm.makeStringIndex(10000)) def time_dir_strings(self):
dir(self.s) + + +class SeriesGetattr: + # https://github.com/pandas-dev/pandas/issues/19764 + def setup(self): + self.s = Series(1, index=date_range("2012-01-01", freq="s", periods=int(1e6))) + + def time_series_datetimeindex_repr(self): + getattr(self.s, "a", None) + + +class All: + + params = [[10 ** 3, 10 ** 6], ["fast", "slow"]] + param_names = ["N", "case"] + + def setup(self, N, case): + val = case != "fast" + self.s = Series([val] * N) + + def time_all(self, N, case): + self.s.all() + + +class Any: + + params = [[10 ** 3, 10 ** 6], ["fast", "slow"]] + param_names = ["N", "case"] + + def setup(self, N, case): + val = case == "fast" + self.s = Series([val] * N) + + def time_any(self, N, case): + self.s.any() + + +class NanOps: + + params = [ + [ + "var", + "mean", + "median", + "max", + "min", + "sum", + "std", + "sem", + "argmax", + "skew", + "kurt", + "prod", + ], + [10 ** 3, 10 ** 6], + ["int8", "int32", "int64", "float64"], + ] + param_names = ["func", "N", "dtype"] + + def setup(self, func, N, dtype): + self.s = Series([1] * N, dtype=dtype) + self.func = getattr(self.s, func) + + def time_func(self, func, N, dtype): + self.func() + + +from .pandas_vb_common import setup # noqa: F401 isort:skip diff --git a/asv_bench/benchmarks/sparse.py b/asv_bench/benchmarks/sparse.py index dcb7694abc2ad..ac78ca53679fd 100644 --- a/asv_bench/benchmarks/sparse.py +++ b/asv_bench/benchmarks/sparse.py @@ -1,11 +1,8 @@ -import itertools - import numpy as np import scipy.sparse -from pandas import (SparseSeries, SparseDataFrame, SparseArray, Series, - date_range, MultiIndex) -from .pandas_vb_common import setup # noqa +import pandas as pd +from pandas import MultiIndex, Series, SparseArray, date_range def make_array(size, dense_proportion, fill_value, dtype): @@ -16,99 +13,75 @@ def make_array(size, dense_proportion, fill_value, dtype): return arr -class SparseSeriesToFrame(object): - - goal_time = 0.2 - +class SparseSeriesToFrame: def setup(self): K = 50 N = 50001 - rng = date_range('1/1/2000', periods=N, freq='T') + rng = date_range("1/1/2000", periods=N, freq="T") self.series = {} for i in range(1, K): data = np.random.randn(N)[:-i] idx = rng[:-i] data[100:] = np.nan - self.series[i] = SparseSeries(data, index=idx) + self.series[i] = pd.Series(pd.SparseArray(data), index=idx) def time_series_to_frame(self): - SparseDataFrame(self.series) + pd.DataFrame(self.series) -class SparseArrayConstructor(object): +class SparseArrayConstructor: - goal_time = 0.2 - params = ([0.1, 0.01], [0, np.nan], - [np.int64, np.float64, np.object]) - param_names = ['dense_proportion', 'fill_value', 'dtype'] + params = ([0.1, 0.01], [0, np.nan], [np.int64, np.float64, np.object]) + param_names = ["dense_proportion", "fill_value", "dtype"] def setup(self, dense_proportion, fill_value, dtype): - N = 10**6 + N = 10 ** 6 self.array = make_array(N, dense_proportion, fill_value, dtype) def time_sparse_array(self, dense_proportion, fill_value, dtype): SparseArray(self.array, fill_value=fill_value, dtype=dtype) -class SparseDataFrameConstructor(object): - - goal_time = 0.2 - +class SparseDataFrameConstructor: def setup(self): N = 1000 self.arr = np.arange(N) self.sparse = scipy.sparse.rand(N, N, 0.005) - self.dict = dict(zip(range(N), itertools.repeat([0]))) - - def time_constructor(self): - SparseDataFrame(columns=self.arr, index=self.arr) def time_from_scipy(self): - SparseDataFrame(self.sparse) - - def time_from_dict(self): - SparseDataFrame(self.dict) + pd.DataFrame.sparse.from_spmatrix(self.sparse) -class 
FromCoo(object): - - goal_time = 0.2 - +class FromCoo: def setup(self): - self.matrix = scipy.sparse.coo_matrix(([3.0, 1.0, 2.0], - ([1, 0, 0], [0, 2, 3])), - shape=(100, 100)) + self.matrix = scipy.sparse.coo_matrix( + ([3.0, 1.0, 2.0], ([1, 0, 0], [0, 2, 3])), shape=(100, 100) + ) def time_sparse_series_from_coo(self): - SparseSeries.from_coo(self.matrix) - + pd.Series.sparse.from_coo(self.matrix) -class ToCoo(object): - - goal_time = 0.2 +class ToCoo: def setup(self): s = Series([np.nan] * 10000) s[0] = 3.0 s[100] = -1.0 s[999] = 12.1 s.index = MultiIndex.from_product([range(10)] * 4) - self.ss = s.to_sparse() + self.ss = s.astype("Sparse") def time_sparse_series_to_coo(self): - self.ss.to_coo(row_levels=[0, 1], - column_levels=[2, 3], - sort_labels=True) + self.ss.sparse.to_coo(row_levels=[0, 1], column_levels=[2, 3], sort_labels=True) -class Arithmetic(object): +class Arithmetic: - goal_time = 0.2 params = ([0.1, 0.01], [0, np.nan]) - param_names = ['dense_proportion', 'fill_value'] + param_names = ["dense_proportion", "fill_value"] def setup(self, dense_proportion, fill_value): - N = 10**6 + N = 10 ** 6 arr1 = make_array(N, dense_proportion, fill_value, np.int64) self.array1 = SparseArray(arr1, fill_value=fill_value) arr2 = make_array(N, dense_proportion, fill_value, np.int64) @@ -127,26 +100,27 @@ def time_divide(self, dense_proportion, fill_value): self.array1 / self.array2 -class ArithmeticBlock(object): +class ArithmeticBlock: - goal_time = 0.2 params = [np.nan, 0] - param_names = ['fill_value'] + param_names = ["fill_value"] def setup(self, fill_value): - N = 10**6 - self.arr1 = self.make_block_array(length=N, num_blocks=1000, - block_size=10, fill_value=fill_value) - self.arr2 = self.make_block_array(length=N, num_blocks=1000, - block_size=10, fill_value=fill_value) + N = 10 ** 6 + self.arr1 = self.make_block_array( + length=N, num_blocks=1000, block_size=10, fill_value=fill_value + ) + self.arr2 = self.make_block_array( + length=N, num_blocks=1000, block_size=10, fill_value=fill_value + ) def make_block_array(self, length, num_blocks, block_size, fill_value): arr = np.full(length, fill_value) - indicies = np.random.choice(np.arange(0, length, block_size), - num_blocks, - replace=False) + indicies = np.random.choice( + np.arange(0, length, block_size), num_blocks, replace=False + ) for ind in indicies: - arr[ind:ind + block_size] = np.random.randint(0, 100, block_size) + arr[ind : ind + block_size] = np.random.randint(0, 100, block_size) return SparseArray(arr, fill_value=fill_value) def time_make_union(self, fill_value): @@ -160,3 +134,6 @@ def time_addition(self, fill_value): def time_division(self, fill_value): self.arr1 / self.arr2 + + +from .pandas_vb_common import setup # noqa: F401 isort:skip diff --git a/asv_bench/benchmarks/stat_ops.py b/asv_bench/benchmarks/stat_ops.py index c447c78d0d070..ed5ebfa61594e 100644 --- a/asv_bench/benchmarks/stat_ops.py +++ b/asv_bench/benchmarks/stat_ops.py @@ -1,25 +1,22 @@ import numpy as np -import pandas as pd - -from .pandas_vb_common import setup # noqa +import pandas as pd -ops = ['mean', 'sum', 'median', 'std', 'skew', 'kurt', 'mad', 'prod', 'sem', - 'var'] +ops = ["mean", "sum", "median", "std", "skew", "kurt", "mad", "prod", "sem", "var"] -class FrameOps(object): +class FrameOps: - goal_time = 0.2 - params = [ops, ['float', 'int'], [0, 1], [True, False]] - param_names = ['op', 'dtype', 'axis', 'use_bottleneck'] + params = [ops, ["float", "int"], [0, 1], [True, False]] + param_names = ["op", "dtype", "axis", "use_bottleneck"] def 
setup(self, op, dtype, axis, use_bottleneck): df = pd.DataFrame(np.random.randn(100000, 4)).astype(dtype) try: pd.options.compute.use_bottleneck = use_bottleneck - except: + except TypeError: from pandas.core import nanops + nanops._USE_BOTTLENECK = use_bottleneck self.df_func = getattr(df, op) @@ -27,18 +24,19 @@ def time_op(self, op, dtype, axis, use_bottleneck): self.df_func(axis=axis) -class FrameMultiIndexOps(object): +class FrameMultiIndexOps: - goal_time = 0.2 params = ([0, 1, [0, 1]], ops) - param_names = ['level', 'op'] + param_names = ["level", "op"] def setup(self, level, op): levels = [np.arange(10), np.arange(100), np.arange(100)] - labels = [np.arange(10).repeat(10000), - np.tile(np.arange(100).repeat(100), 10), - np.tile(np.tile(np.arange(100), 100), 10)] - index = pd.MultiIndex(levels=levels, labels=labels) + codes = [ + np.arange(10).repeat(10000), + np.tile(np.arange(100).repeat(100), 10), + np.tile(np.tile(np.arange(100), 100), 10), + ] + index = pd.MultiIndex(levels=levels, codes=codes) df = pd.DataFrame(np.random.randn(len(index), 4), index=index) self.df_func = getattr(df, op) @@ -46,18 +44,18 @@ def time_op(self, level, op): self.df_func(level=level) -class SeriesOps(object): +class SeriesOps: - goal_time = 0.2 - params = [ops, ['float', 'int'], [True, False]] - param_names = ['op', 'dtype', 'use_bottleneck'] + params = [ops, ["float", "int"], [True, False]] + param_names = ["op", "dtype", "use_bottleneck"] def setup(self, op, dtype, use_bottleneck): s = pd.Series(np.random.randn(100000)).astype(dtype) try: pd.options.compute.use_bottleneck = use_bottleneck - except: + except TypeError: from pandas.core import nanops + nanops._USE_BOTTLENECK = use_bottleneck self.s_func = getattr(s, op) @@ -65,18 +63,19 @@ def time_op(self, op, dtype, use_bottleneck): self.s_func() -class SeriesMultiIndexOps(object): +class SeriesMultiIndexOps: - goal_time = 0.2 params = ([0, 1, [0, 1]], ops) - param_names = ['level', 'op'] + param_names = ["level", "op"] def setup(self, level, op): levels = [np.arange(10), np.arange(100), np.arange(100)] - labels = [np.arange(10).repeat(10000), - np.tile(np.arange(100).repeat(100), 10), - np.tile(np.tile(np.arange(100), 100), 10)] - index = pd.MultiIndex(levels=levels, labels=labels) + codes = [ + np.arange(10).repeat(10000), + np.tile(np.arange(100).repeat(100), 10), + np.tile(np.tile(np.arange(100), 100), 10), + ] + index = pd.MultiIndex(levels=levels, codes=codes) s = pd.Series(np.random.randn(len(index)), index=index) self.s_func = getattr(s, op) @@ -84,14 +83,13 @@ def time_op(self, level, op): self.s_func(level=level) -class Rank(object): +class Rank: - goal_time = 0.2 - params = [['DataFrame', 'Series'], [True, False]] - param_names = ['constructor', 'pct'] + params = [["DataFrame", "Series"], [True, False]] + param_names = ["constructor", "pct"] def setup(self, constructor, pct): - values = np.random.randn(10**5) + values = np.random.randn(10 ** 5) self.data = getattr(pd, constructor)(values) def time_rank(self, constructor, pct): @@ -101,14 +99,64 @@ def time_average_old(self, constructor, pct): self.data.rank(pct=pct) / len(self.data) -class Correlation(object): +class Correlation: + + params = [["spearman", "kendall", "pearson"], [True, False]] + param_names = ["method", "use_bottleneck"] - goal_time = 0.2 - params = ['spearman', 'kendall', 'pearson'] - param_names = ['method'] + def setup(self, method, use_bottleneck): + try: + pd.options.compute.use_bottleneck = use_bottleneck + except TypeError: + from pandas.core import nanops - def 
setup(self, method): + nanops._USE_BOTTLENECK = use_bottleneck self.df = pd.DataFrame(np.random.randn(1000, 30)) + self.df2 = pd.DataFrame(np.random.randn(1000, 30)) + self.df_wide = pd.DataFrame(np.random.randn(1000, 200)) + self.df_wide_nans = self.df_wide.where(np.random.random((1000, 200)) < 0.9) + self.s = pd.Series(np.random.randn(1000)) + self.s2 = pd.Series(np.random.randn(1000)) - def time_corr(self, method): + def time_corr(self, method, use_bottleneck): self.df.corr(method=method) + + def time_corr_wide(self, method, use_bottleneck): + self.df_wide.corr(method=method) + + def time_corr_wide_nans(self, method, use_bottleneck): + self.df_wide_nans.corr(method=method) + + def peakmem_corr_wide(self, method, use_bottleneck): + self.df_wide.corr(method=method) + + def time_corr_series(self, method, use_bottleneck): + self.s.corr(self.s2, method=method) + + def time_corrwith_cols(self, method, use_bottleneck): + self.df.corrwith(self.df2, method=method) + + def time_corrwith_rows(self, method, use_bottleneck): + self.df.corrwith(self.df2, axis=1, method=method) + + +class Covariance: + + params = [[True, False]] + param_names = ["use_bottleneck"] + + def setup(self, use_bottleneck): + try: + pd.options.compute.use_bottleneck = use_bottleneck + except TypeError: + from pandas.core import nanops + + nanops._USE_BOTTLENECK = use_bottleneck + self.s = pd.Series(np.random.randn(100000)) + self.s2 = pd.Series(np.random.randn(100000)) + + def time_cov_series(self, use_bottleneck): + self.s.cov(self.s2) + + +from .pandas_vb_common import setup # noqa: F401 isort:skip diff --git a/asv_bench/benchmarks/strings.py b/asv_bench/benchmarks/strings.py index b203c8b0fa5c9..f30b2482615bd 100644 --- a/asv_bench/benchmarks/strings.py +++ b/asv_bench/benchmarks/strings.py @@ -1,35 +1,36 @@ import warnings import numpy as np -from pandas import Series -import pandas.util.testing as tm - -class Methods(object): +from pandas import DataFrame, Series +import pandas.util.testing as tm - goal_time = 0.2 +class Methods: def setup(self): - self.s = Series(tm.makeStringIndex(10**5)) - - def time_cat(self): - self.s.str.cat(sep=',') + self.s = Series(tm.makeStringIndex(10 ** 5)) def time_center(self): self.s.str.center(100) def time_count(self): - self.s.str.count('A') + self.s.str.count("A") def time_endswith(self): - self.s.str.endswith('A') + self.s.str.endswith("A") def time_extract(self): with warnings.catch_warnings(record=True): - self.s.str.extract('(\\w*)A(\\w*)') + self.s.str.extract("(\\w*)A(\\w*)") def time_findall(self): - self.s.str.findall('[A-Z]+') + self.s.str.findall("[A-Z]+") + + def time_find(self): + self.s.str.find("[A-Z]+") + + def time_rfind(self): + self.s.str.rfind("[A-Z]+") def time_get(self): self.s.str.get(0) @@ -37,29 +38,44 @@ def time_get(self): def time_len(self): self.s.str.len() + def time_join(self): + self.s.str.join(" ") + def time_match(self): - self.s.str.match('A') + self.s.str.match("A") + + def time_normalize(self): + self.s.str.normalize("NFC") def time_pad(self): - self.s.str.pad(100, side='both') + self.s.str.pad(100, side="both") + + def time_partition(self): + self.s.str.partition("A") + + def time_rpartition(self): + self.s.str.rpartition("A") def time_replace(self): - self.s.str.replace('A', '\x01\x01') + self.s.str.replace("A", "\x01\x01") + + def time_translate(self): + self.s.str.translate({"A": "\x01\x01"}) def time_slice(self): self.s.str.slice(5, 15, 2) def time_startswith(self): - self.s.str.startswith('A') + self.s.str.startswith("A") def time_strip(self): 
- self.s.str.strip('A') + self.s.str.strip("A") def time_rstrip(self): - self.s.str.rstrip('A') + self.s.str.rstrip("A") def time_lstrip(self): - self.s.str.lstrip('A') + self.s.str.lstrip("A") def time_title(self): self.s.str.title() @@ -70,77 +86,99 @@ def time_upper(self): def time_lower(self): self.s.str.lower() + def time_wrap(self): + self.s.str.wrap(10) + + def time_zfill(self): + self.s.str.zfill(10) -class Repeat(object): - goal_time = 0.2 - params = ['int', 'array'] - param_names = ['repeats'] +class Repeat: + + params = ["int", "array"] + param_names = ["repeats"] def setup(self, repeats): - N = 10**5 + N = 10 ** 5 self.s = Series(tm.makeStringIndex(N)) - repeat = {'int': 1, 'array': np.random.randint(1, 3, N)} - self.repeat = repeat[repeats] + repeat = {"int": 1, "array": np.random.randint(1, 3, N)} + self.values = repeat[repeats] def time_repeat(self, repeats): - self.s.str.repeat(self.repeat) + self.s.str.repeat(self.values) + + +class Cat: + + params = ([0, 3], [None, ","], [None, "-"], [0.0, 0.001, 0.15]) + param_names = ["other_cols", "sep", "na_rep", "na_frac"] + + def setup(self, other_cols, sep, na_rep, na_frac): + N = 10 ** 5 + mask_gen = lambda: np.random.choice([True, False], N, p=[1 - na_frac, na_frac]) + self.s = Series(tm.makeStringIndex(N)).where(mask_gen()) + if other_cols == 0: + # str.cat self-concatenates only for others=None + self.others = None + else: + self.others = DataFrame( + {i: tm.makeStringIndex(N).where(mask_gen()) for i in range(other_cols)} + ) + + def time_cat(self, other_cols, sep, na_rep, na_frac): + # before the concatenation (one caller + other_cols columns), the total + # expected fraction of rows containing any NaN is: + # reduce(lambda t, _: t + (1 - t) * na_frac, range(other_cols + 1), 0) + # for other_cols=3 and na_frac=0.15, this works out to ~48% + self.s.str.cat(others=self.others, sep=sep, na_rep=na_rep) -class Contains(object): +class Contains: - goal_time = 0.2 params = [True, False] - param_names = ['regex'] + param_names = ["regex"] def setup(self, regex): - self.s = Series(tm.makeStringIndex(10**5)) + self.s = Series(tm.makeStringIndex(10 ** 5)) def time_contains(self, regex): - self.s.str.contains('A', regex=regex) + self.s.str.contains("A", regex=regex) -class Split(object): +class Split: - goal_time = 0.2 params = [True, False] - param_names = ['expand'] + param_names = ["expand"] def setup(self, expand): - self.s = Series(tm.makeStringIndex(10**5)).str.join('--') + self.s = Series(tm.makeStringIndex(10 ** 5)).str.join("--") def time_split(self, expand): - self.s.str.split('--', expand=expand) + self.s.str.split("--", expand=expand) + def time_rsplit(self, expand): + self.s.str.rsplit("--", expand=expand) -class Dummies(object): - - goal_time = 0.2 +class Dummies: def setup(self): - self.s = Series(tm.makeStringIndex(10**5)).str.join('|') + self.s = Series(tm.makeStringIndex(10 ** 5)).str.join("|") def time_get_dummies(self): - self.s.str.get_dummies('|') - - -class Encode(object): + self.s.str.get_dummies("|") - goal_time = 0.2 +class Encode: def setup(self): self.ser = Series(tm.makeUnicodeIndex()) def time_encode_decode(self): - self.ser.str.encode('utf-8').str.decode('utf-8') - - -class Slice(object): + self.ser.str.encode("utf-8").str.decode("utf-8") - goal_time = 0.2 +class Slice: def setup(self): - self.s = Series(['abcdefg', np.nan] * 500000) + self.s = Series(["abcdefg", np.nan] * 500000) def time_vector_slice(self): # GH 2602 diff --git a/asv_bench/benchmarks/timedelta.py b/asv_bench/benchmarks/timedelta.py index 
3fe75b3c34299..36a9db529f98f 100644 --- a/asv_bench/benchmarks/timedelta.py +++ b/asv_bench/benchmarks/timedelta.py @@ -1,53 +1,62 @@ import datetime import numpy as np -from pandas import Series, timedelta_range, to_timedelta, Timestamp, Timedelta +from pandas import ( + DataFrame, + Series, + Timedelta, + Timestamp, + timedelta_range, + to_timedelta, +) -class TimedeltaConstructor(object): - - goal_time = 0.2 +class TimedeltaConstructor: def time_from_int(self): Timedelta(123456789) def time_from_unit(self): - Timedelta(1, unit='d') + Timedelta(1, unit="d") def time_from_components(self): - Timedelta(days=1, hours=2, minutes=3, seconds=4, milliseconds=5, - microseconds=6, nanoseconds=7) + Timedelta( + days=1, + hours=2, + minutes=3, + seconds=4, + milliseconds=5, + microseconds=6, + nanoseconds=7, + ) def time_from_datetime_timedelta(self): Timedelta(datetime.timedelta(days=1, seconds=1)) def time_from_np_timedelta(self): - Timedelta(np.timedelta64(1, 'ms')) + Timedelta(np.timedelta64(1, "ms")) def time_from_string(self): - Timedelta('1 days') + Timedelta("1 days") def time_from_iso_format(self): - Timedelta('P4DT12H30M5S') + Timedelta("P4DT12H30M5S") def time_from_missing(self): - Timedelta('nat') - - -class ToTimedelta(object): + Timedelta("nat") - goal_time = 0.2 +class ToTimedelta: def setup(self): self.ints = np.random.randint(0, 60, size=10000) self.str_days = [] self.str_seconds = [] for i in self.ints: - self.str_days.append('{0} days'.format(i)) - self.str_seconds.append('00:00:{0:02d}'.format(i)) + self.str_days.append("{0} days".format(i)) + self.str_seconds.append("00:00:{0:02d}".format(i)) def time_convert_int(self): - to_timedelta(self.ints, unit='s') + to_timedelta(self.ints, unit="s") def time_convert_string_days(self): to_timedelta(self.str_days) @@ -56,37 +65,30 @@ def time_convert_string_seconds(self): to_timedelta(self.str_seconds) -class ToTimedeltaErrors(object): +class ToTimedeltaErrors: - goal_time = 0.2 - params = ['coerce', 'ignore'] - param_names = ['errors'] + params = ["coerce", "ignore"] + param_names = ["errors"] def setup(self, errors): ints = np.random.randint(0, 60, size=10000) - self.arr = ['{0} days'.format(i) for i in ints] - self.arr[-1] = 'apple' + self.arr = ["{0} days".format(i) for i in ints] + self.arr[-1] = "apple" def time_convert(self, errors): to_timedelta(self.arr, errors=errors) -class TimedeltaOps(object): - - goal_time = 0.2 - +class TimedeltaOps: def setup(self): self.td = to_timedelta(np.arange(1000000)) - self.ts = Timestamp('2000') + self.ts = Timestamp("2000") def time_add_td_ts(self): self.td + self.ts -class TimedeltaProperties(object): - - goal_time = 0.2 - +class TimedeltaProperties: def setup_cache(self): td = Timedelta(days=365, minutes=35, seconds=25, milliseconds=35) return td @@ -104,13 +106,10 @@ def time_timedelta_nanoseconds(self, td): td.nanoseconds -class DatetimeAccessor(object): - - goal_time = 0.2 - +class DatetimeAccessor: def setup_cache(self): N = 100000 - series = Series(timedelta_range('1 days', periods=N, freq='h')) + series = Series(timedelta_range("1 days", periods=N, freq="h")) return series def time_dt_accessor(self, series): @@ -127,3 +126,35 @@ def time_timedelta_microseconds(self, series): def time_timedelta_nanoseconds(self, series): series.dt.nanoseconds + + +class TimedeltaIndexing: + def setup(self): + self.index = timedelta_range(start="1985", periods=1000, freq="D") + self.index2 = timedelta_range(start="1986", periods=1000, freq="D") + self.series = Series(range(1000), index=self.index) + 
self.timedelta = self.index[500] + + def time_get_loc(self): + self.index.get_loc(self.timedelta) + + def time_shape(self): + self.index.shape + + def time_shallow_copy(self): + self.index._shallow_copy() + + def time_series_loc(self): + self.series.loc[self.timedelta] + + def time_align(self): + DataFrame({"a": self.series, "b": self.series[:500]}) + + def time_intersection(self): + self.index.intersection(self.index2) + + def time_union(self): + self.index.union(self.index2) + + def time_unique(self): + self.index.unique() diff --git a/asv_bench/benchmarks/timeseries.py b/asv_bench/benchmarks/timeseries.py index e1a6bc7a68e9d..498774034d642 100644 --- a/asv_bench/benchmarks/timeseries.py +++ b/asv_bench/benchmarks/timeseries.py @@ -1,37 +1,36 @@ -import warnings from datetime import timedelta +import dateutil import numpy as np -from pandas import to_datetime, date_range, Series, DataFrame, period_range + +from pandas import DataFrame, Series, date_range, period_range, to_datetime + from pandas.tseries.frequencies import infer_freq + try: - from pandas.plotting._converter import DatetimeConverter + from pandas.plotting._matplotlib.converter import DatetimeConverter except ImportError: from pandas.tseries.converter import DatetimeConverter -from .pandas_vb_common import setup # noqa +class DatetimeIndex: -class DatetimeIndex(object): - - goal_time = 0.2 - params = ['dst', 'repeated', 'tz_aware', 'tz_naive'] - param_names = ['index_type'] + params = ["dst", "repeated", "tz_aware", "tz_local", "tz_naive"] + param_names = ["index_type"] def setup(self, index_type): N = 100000 - dtidxes = {'dst': date_range(start='10/29/2000 1:00:00', - end='10/29/2000 1:59:59', freq='S'), - 'repeated': date_range(start='2000', - periods=N / 10, - freq='s').repeat(10), - 'tz_aware': date_range(start='2000', - periods=N, - freq='s', - tz='US/Eastern'), - 'tz_naive': date_range(start='2000', - periods=N, - freq='s')} + dtidxes = { + "dst": date_range( + start="10/29/2000 1:00:00", end="10/29/2000 1:59:59", freq="S" + ), + "repeated": date_range(start="2000", periods=N / 10, freq="s").repeat(10), + "tz_aware": date_range(start="2000", periods=N, freq="s", tz="US/Eastern"), + "tz_local": date_range( + start="2000", periods=N, freq="s", tz=dateutil.tz.tzlocal() + ), + "tz_naive": date_range(start="2000", periods=N, freq="s"), + } self.index = dtidxes[index_type] def time_add_timedelta(self, index_type): @@ -59,93 +58,86 @@ def time_to_pydatetime(self, index_type): self.index.to_pydatetime() -class TzLocalize(object): +class TzLocalize: - goal_time = 0.2 + params = [None, "US/Eastern", "UTC", dateutil.tz.tzutc()] + param_names = "tz" - def setup(self): - dst_rng = date_range(start='10/29/2000 1:00:00', - end='10/29/2000 1:59:59', freq='S') - self.index = date_range(start='10/29/2000', - end='10/29/2000 00:59:59', freq='S') + def setup(self, tz): + dst_rng = date_range( + start="10/29/2000 1:00:00", end="10/29/2000 1:59:59", freq="S" + ) + self.index = date_range(start="10/29/2000", end="10/29/2000 00:59:59", freq="S") self.index = self.index.append(dst_rng) self.index = self.index.append(dst_rng) - self.index = self.index.append(date_range(start='10/29/2000 2:00:00', - end='10/29/2000 3:00:00', - freq='S')) + self.index = self.index.append( + date_range(start="10/29/2000 2:00:00", end="10/29/2000 3:00:00", freq="S") + ) - def time_infer_dst(self): - with warnings.catch_warnings(record=True): - self.index.tz_localize('US/Eastern', infer_dst=True) + def time_infer_dst(self, tz): + self.index.tz_localize(tz, 
ambiguous="infer") -class ResetIndex(object): +class ResetIndex: - goal_time = 0.2 - params = [None, 'US/Eastern'] - param_names = 'tz' + params = [None, "US/Eastern"] + param_names = "tz" def setup(self, tz): - idx = date_range(start='1/1/2000', periods=1000, freq='H', tz=tz) + idx = date_range(start="1/1/2000", periods=1000, freq="H", tz=tz) self.df = DataFrame(np.random.randn(1000, 2), index=idx) def time_reest_datetimeindex(self, tz): self.df.reset_index() -class Factorize(object): +class Factorize: - goal_time = 0.2 - params = [None, 'Asia/Tokyo'] - param_names = 'tz' + params = [None, "Asia/Tokyo"] + param_names = "tz" def setup(self, tz): N = 100000 - self.dti = date_range('2011-01-01', freq='H', periods=N, tz=tz) + self.dti = date_range("2011-01-01", freq="H", periods=N, tz=tz) self.dti = self.dti.repeat(5) def time_factorize(self, tz): self.dti.factorize() -class InferFreq(object): +class InferFreq: - goal_time = 0.2 - params = [None, 'D', 'B'] - param_names = ['freq'] + params = [None, "D", "B"] + param_names = ["freq"] def setup(self, freq): if freq is None: - self.idx = date_range(start='1/1/1700', freq='D', periods=10000) + self.idx = date_range(start="1/1/1700", freq="D", periods=10000) self.idx.freq = None else: - self.idx = date_range(start='1/1/1700', freq=freq, periods=10000) + self.idx = date_range(start="1/1/1700", freq=freq, periods=10000) def time_infer_freq(self, freq): infer_freq(self.idx) -class TimeDatetimeConverter(object): - - goal_time = 0.2 - +class TimeDatetimeConverter: def setup(self): N = 100000 - self.rng = date_range(start='1/1/2000', periods=N, freq='T') + self.rng = date_range(start="1/1/2000", periods=N, freq="T") def time_convert(self): DatetimeConverter.convert(self.rng, None, None) -class Iteration(object): +class Iteration: - goal_time = 0.2 params = [date_range, period_range] - param_names = ['time_index'] + param_names = ["time_index"] def setup(self, time_index): - N = 10**6 - self.idx = time_index(start='20140101', freq='T', periods=N) + N = 10 ** 6 + self.idx = time_index(start="20140101", freq="T", periods=N) self.exit = 10000 def time_iter(self, time_index): @@ -158,34 +150,30 @@ def time_iter_preexit(self, time_index): break -class ResampleDataFrame(object): +class ResampleDataFrame: - goal_time = 0.2 - params = ['max', 'mean', 'min'] - param_names = ['method'] + params = ["max", "mean", "min"] + param_names = ["method"] def setup(self, method): - rng = date_range(start='20130101', periods=100000, freq='50L') + rng = date_range(start="20130101", periods=100000, freq="50L") df = DataFrame(np.random.randn(100000, 2), index=rng) - self.resample = getattr(df.resample('1s'), method) + self.resample = getattr(df.resample("1s"), method) def time_method(self, method): self.resample() -class ResampleSeries(object): +class ResampleSeries: - goal_time = 0.2 - params = (['period', 'datetime'], ['5min', '1D'], ['mean', 'ohlc']) - param_names = ['index', 'freq', 'method'] + params = (["period", "datetime"], ["5min", "1D"], ["mean", "ohlc"]) + param_names = ["index", "freq", "method"] def setup(self, index, freq, method): - indexes = {'period': period_range(start='1/1/2000', - end='1/1/2001', - freq='T'), - 'datetime': date_range(start='1/1/2000', - end='1/1/2001', - freq='T')} + indexes = { + "period": period_range(start="1/1/2000", end="1/1/2001", freq="T"), + "datetime": date_range(start="1/1/2000", end="1/1/2001", freq="T"), + } idx = indexes[index] ts = Series(np.random.randn(len(idx)), index=idx) self.resample = getattr(ts.resample(freq), method) 
@@ -194,38 +182,38 @@ def time_resample(self, index, freq, method): self.resample() -class ResampleDatetetime64(object): +class ResampleDatetime64: # GH 7754 - goal_time = 0.2 - def setup(self): - rng3 = date_range(start='2000-01-01 00:00:00', - end='2000-01-01 10:00:00', freq='555000U') - self.dt_ts = Series(5, rng3, dtype='datetime64[ns]') + rng3 = date_range( + start="2000-01-01 00:00:00", end="2000-01-01 10:00:00", freq="555000U" + ) + self.dt_ts = Series(5, rng3, dtype="datetime64[ns]") def time_resample(self): - self.dt_ts.resample('1S').last() + self.dt_ts.resample("1S").last() -class AsOf(object): +class AsOf: - goal_time = 0.2 - params = ['DataFrame', 'Series'] - param_names = ['constructor'] + params = ["DataFrame", "Series"] + param_names = ["constructor"] def setup(self, constructor): N = 10000 M = 10 - rng = date_range(start='1/1/1990', periods=N, freq='53s') - data = {'DataFrame': DataFrame(np.random.randn(N, M)), - 'Series': Series(np.random.randn(N))} + rng = date_range(start="1/1/1990", periods=N, freq="53s") + data = { + "DataFrame": DataFrame(np.random.randn(N, M)), + "Series": Series(np.random.randn(N)), + } self.ts = data[constructor] self.ts.index = rng self.ts2 = self.ts.copy() self.ts2.iloc[250:5000] = np.nan self.ts3 = self.ts.copy() self.ts3.iloc[-5000:] = np.nan - self.dates = date_range(start='1/1/1990', periods=N * 10, freq='5s') + self.dates = date_range(start="1/1/1990", periods=N * 10, freq="5s") self.date = self.dates[0] self.date_last = self.dates[-1] self.date_early = self.date - timedelta(10) @@ -255,15 +243,14 @@ def time_asof_nan_single(self, constructor): self.ts3.asof(self.date_last) -class SortIndex(object): +class SortIndex: - goal_time = 0.2 params = [True, False] - param_names = ['monotonic'] + param_names = ["monotonic"] def setup(self, monotonic): - N = 10**5 - idx = date_range(start='1/1/2000', periods=N, freq='s') + N = 10 ** 5 + idx = date_range(start="1/1/2000", periods=N, freq="s") self.s = Series(np.random.randn(N), index=idx) if not monotonic: self.s = self.s.sample(frac=1) @@ -275,13 +262,10 @@ def time_get_slice(self, monotonic): self.s[:10000] -class IrregularOps(object): - - goal_time = 0.2 - +class IrregularOps: def setup(self): - N = 10**5 - idx = date_range(start='1/1/2000', periods=N, freq='s') + N = 10 ** 5 + idx = date_range(start="1/1/2000", periods=N, freq="s") s = Series(np.random.randn(N), index=idx) self.left = s.sample(frac=1) self.right = s.sample(frac=1) @@ -290,13 +274,10 @@ def time_add(self): self.left + self.right -class Lookup(object): - - goal_time = 0.2 - +class Lookup: def setup(self): N = 1500000 - rng = date_range(start='1/1/2000', periods=N, freq='S') + rng = date_range(start="1/1/2000", periods=N, freq="S") self.ts = Series(1, index=rng) self.lookup_val = rng[N // 2] @@ -305,28 +286,36 @@ def time_lookup_and_cleanup(self): self.ts.index._cleanup() -class ToDatetimeYYYYMMDD(object): - - goal_time = 0.2 - +class ToDatetimeYYYYMMDD: def setup(self): - rng = date_range(start='1/1/2000', periods=10000, freq='D') - self.stringsD = Series(rng.strftime('%Y%m%d')) + rng = date_range(start="1/1/2000", periods=10000, freq="D") + self.stringsD = Series(rng.strftime("%Y%m%d")) def time_format_YYYYMMDD(self): - to_datetime(self.stringsD, format='%Y%m%d') + to_datetime(self.stringsD, format="%Y%m%d") + + +class ToDatetimeCacheSmallCount: + + params = ([True, False], [50, 500, 5000, 100000]) + param_names = ["cache", "count"] + def setup(self, cache, count): + rng = date_range(start="1/1/1971", periods=count) +
self.unique_date_strings = rng.strftime("%Y-%m-%d").tolist() -class ToDatetimeISO8601(object): + def time_unique_date_strings(self, cache, count): + to_datetime(self.unique_date_strings, cache=cache) - goal_time = 0.2 +class ToDatetimeISO8601: def setup(self): - rng = date_range(start='1/1/2000', periods=20000, freq='H') - self.strings = rng.strftime('%Y-%m-%d %H:%M:%S').tolist() - self.strings_nosep = rng.strftime('%Y%m%d %H:%M:%S').tolist() - self.strings_tz_space = [x.strftime('%Y-%m-%d %H:%M:%S') + ' -0800' - for x in rng] + rng = date_range(start="1/1/2000", periods=20000, freq="H") + self.strings = rng.strftime("%Y-%m-%d %H:%M:%S").tolist() + self.strings_nosep = rng.strftime("%Y%m%d %H:%M:%S").tolist() + self.strings_tz_space = [ + x.strftime("%Y-%m-%d %H:%M:%S") + " -0800" for x in rng + ] def time_iso8601(self): to_datetime(self.strings) @@ -335,67 +324,108 @@ def time_iso8601_nosep(self): to_datetime(self.strings_nosep) def time_iso8601_format(self): - to_datetime(self.strings, format='%Y-%m-%d %H:%M:%S') + to_datetime(self.strings, format="%Y-%m-%d %H:%M:%S") def time_iso8601_format_no_sep(self): - to_datetime(self.strings_nosep, format='%Y%m%d %H:%M:%S') + to_datetime(self.strings_nosep, format="%Y%m%d %H:%M:%S") def time_iso8601_tz_spaceformat(self): to_datetime(self.strings_tz_space) -class ToDatetimeFormat(object): +class ToDatetimeNONISO8601: + def setup(self): + N = 10000 + half = int(N / 2) + ts_string_1 = "March 1, 2018 12:00:00+0400" + ts_string_2 = "March 1, 2018 12:00:00+0500" + self.same_offset = [ts_string_1] * N + self.diff_offset = [ts_string_1] * half + [ts_string_2] * half + + def time_same_offset(self): + to_datetime(self.same_offset) - goal_time = 0.2 + def time_different_offset(self): + to_datetime(self.diff_offset) + +class ToDatetimeFormatQuarters: def setup(self): - self.s = Series(['19MAY11', '19MAY11:00:00:00'] * 100000) - self.s2 = self.s.str.replace(':\\S+$', '') + self.s = Series(["2Q2005", "2Q05", "2005Q1", "05Q1"] * 10000) + + def time_infer_quarter(self): + to_datetime(self.s) + + +class ToDatetimeFormat: + def setup(self): + self.s = Series(["19MAY11", "19MAY11:00:00:00"] * 100000) + self.s2 = self.s.str.replace(":\\S+$", "") def time_exact(self): - to_datetime(self.s2, format='%d%b%y') + to_datetime(self.s2, format="%d%b%y") def time_no_exact(self): - to_datetime(self.s, format='%d%b%y', exact=False) + to_datetime(self.s, format="%d%b%y", exact=False) -class ToDatetimeCache(object): +class ToDatetimeCache: - goal_time = 0.2 params = [True, False] - param_names = ['cache'] + param_names = ["cache"] def setup(self, cache): N = 10000 self.unique_numeric_seconds = list(range(N)) self.dup_numeric_seconds = [1000] * N - self.dup_string_dates = ['2000-02-11'] * N - self.dup_string_with_tz = ['2000-02-11 15:00:00-0800'] * N + self.dup_string_dates = ["2000-02-11"] * N + self.dup_string_with_tz = ["2000-02-11 15:00:00-0800"] * N def time_unique_seconds_and_unit(self, cache): - to_datetime(self.unique_numeric_seconds, unit='s', cache=cache) + to_datetime(self.unique_numeric_seconds, unit="s", cache=cache) def time_dup_seconds_and_unit(self, cache): - to_datetime(self.dup_numeric_seconds, unit='s', cache=cache) + to_datetime(self.dup_numeric_seconds, unit="s", cache=cache) def time_dup_string_dates(self, cache): to_datetime(self.dup_string_dates, cache=cache) def time_dup_string_dates_and_format(self, cache): - to_datetime(self.dup_string_dates, format='%Y-%m-%d', cache=cache) + to_datetime(self.dup_string_dates, format="%Y-%m-%d", cache=cache) def 
time_dup_string_tzoffset_dates(self, cache): to_datetime(self.dup_string_with_tz, cache=cache) -class DatetimeAccessor(object): +class DatetimeAccessor: - def setup(self): + params = [None, "US/Eastern", "UTC", dateutil.tz.tzutc()] + param_names = "tz" + + def setup(self, tz): N = 100000 - self.series = Series(date_range(start='1/1/2000', periods=N, freq='T')) + self.series = Series(date_range(start="1/1/2000", periods=N, freq="T", tz=tz)) - def time_dt_accessor(self): + def time_dt_accessor(self, tz): self.series.dt - def time_dt_accessor_normalize(self): + def time_dt_accessor_normalize(self, tz): self.series.dt.normalize() + + def time_dt_accessor_month_name(self, tz): + self.series.dt.month_name() + + def time_dt_accessor_day_name(self, tz): + self.series.dt.day_name() + + def time_dt_accessor_time(self, tz): + self.series.dt.time + + def time_dt_accessor_date(self, tz): + self.series.dt.date + + def time_dt_accessor_year(self, tz): + self.series.dt.year + + +from .pandas_vb_common import setup # noqa: F401 isort:skip diff --git a/asv_bench/benchmarks/timestamp.py b/asv_bench/benchmarks/timestamp.py index c142a9b59fc43..8ebb2d8d2f35d 100644 --- a/asv_bench/benchmarks/timestamp.py +++ b/asv_bench/benchmarks/timestamp.py @@ -1,25 +1,26 @@ import datetime -from pandas import Timestamp +import dateutil import pytz +from pandas import Timestamp -class TimestampConstruction(object): +class TimestampConstruction: def time_parse_iso8601_no_tz(self): - Timestamp('2017-08-25 08:16:14') + Timestamp("2017-08-25 08:16:14") def time_parse_iso8601_tz(self): - Timestamp('2017-08-25 08:16:14-0500') + Timestamp("2017-08-25 08:16:14-0500") def time_parse_dateutil(self): - Timestamp('2017/08/25 08:16:14 AM') + Timestamp("2017/08/25 08:16:14 AM") def time_parse_today(self): - Timestamp('today') + Timestamp("today") def time_parse_now(self): - Timestamp('now') + Timestamp("now") def time_fromordinal(self): Timestamp.fromordinal(730120) @@ -28,16 +29,14 @@ def time_fromtimestamp(self): Timestamp.fromtimestamp(1515448538) -class TimestampProperties(object): - goal_time = 0.2 - - _tzs = [None, pytz.timezone('Europe/Amsterdam')] - _freqs = [None, 'B'] +class TimestampProperties: + _tzs = [None, pytz.timezone("Europe/Amsterdam"), pytz.UTC, dateutil.tz.tzutc()] + _freqs = [None, "B"] params = [_tzs, _freqs] - param_names = ['tz', 'freq'] + param_names = ["tz", "freq"] def setup(self, tz, freq): - self.ts = Timestamp('2017-08-25 08:16:14', tzinfo=tz, freq=freq) + self.ts = Timestamp("2017-08-25 08:16:14", tzinfo=tz, freq=freq) def time_tz(self, tz, freq): self.ts.tz @@ -46,7 +45,7 @@ def time_dayofweek(self, tz, freq): self.ts.dayofweek def time_weekday_name(self, tz, freq): - self.ts.weekday_name + self.ts.day_name() def time_dayofyear(self, tz, freq): self.ts.dayofyear @@ -76,29 +75,30 @@ def time_is_quarter_end(self, tz, freq): self.ts.is_quarter_end def time_is_year_start(self, tz, freq): - self.ts.is_quarter_end + self.ts.is_year_start def time_is_year_end(self, tz, freq): - self.ts.is_quarter_end + self.ts.is_year_end def time_is_leap_year(self, tz, freq): - self.ts.is_quarter_end + self.ts.is_leap_year def time_microsecond(self, tz, freq): self.ts.microsecond + def time_month_name(self, tz, freq): + self.ts.month_name() -class TimestampOps(object): - goal_time = 0.2 - params = [None, 'US/Eastern'] - param_names = ['tz'] +class TimestampOps: + params = [None, "US/Eastern", pytz.UTC, dateutil.tz.tzutc()] + param_names = ["tz"] def setup(self, tz): - self.ts = Timestamp('2017-08-25 08:16:14', tz=tz) + self.ts = 
Timestamp("2017-08-25 08:16:14", tz=tz) def time_replace_tz(self, tz): - self.ts.replace(tzinfo=pytz.timezone('US/Eastern')) + self.ts.replace(tzinfo=pytz.timezone("US/Eastern")) def time_replace_None(self, tz): self.ts.replace(tzinfo=None) @@ -106,13 +106,31 @@ def time_replace_None(self, tz): def time_to_pydatetime(self, tz): self.ts.to_pydatetime() + def time_normalize(self, tz): + self.ts.normalize() + + def time_tz_convert(self, tz): + if self.ts.tz is not None: + self.ts.tz_convert(tz) + + def time_tz_localize(self, tz): + if self.ts.tz is None: + self.ts.tz_localize(tz) + + def time_to_julian_date(self, tz): + self.ts.to_julian_date() + + def time_floor(self, tz): + self.ts.floor("5T") + + def time_ceil(self, tz): + self.ts.ceil("5T") -class TimestampAcrossDst(object): - goal_time = 0.2 +class TimestampAcrossDst: def setup(self): dt = datetime.datetime(2016, 3, 27, 1) - self.tzinfo = pytz.timezone('CET').localize(dt, is_dst=False).tzinfo + self.tzinfo = pytz.timezone("CET").localize(dt, is_dst=False).tzinfo self.ts2 = Timestamp(dt) def time_replace_across_dst(self): diff --git a/asv_bench/vbench_to_asv.py b/asv_bench/vbench_to_asv.py deleted file mode 100644 index b1179387e65d5..0000000000000 --- a/asv_bench/vbench_to_asv.py +++ /dev/null @@ -1,163 +0,0 @@ -import ast -import vbench -import os -import sys -import astor -import glob - - -def vbench_to_asv_source(bench, kinds=None): - tab = ' ' * 4 - if kinds is None: - kinds = ['time'] - - output = 'class {}(object):\n'.format(bench.name) - output += tab + 'goal_time = 0.2\n\n' - - if bench.setup: - indented_setup = [tab * 2 + '{}\n'.format(x) for x in bench.setup.splitlines()] - output += tab + 'def setup(self):\n' + ''.join(indented_setup) + '\n' - - for kind in kinds: - output += tab + 'def {}_{}(self):\n'.format(kind, bench.name) - for line in bench.code.splitlines(): - output += tab * 2 + line + '\n' - output += '\n\n' - - if bench.cleanup: - output += tab + 'def teardown(self):\n' + tab * 2 + bench.cleanup - - output += '\n\n' - return output - - -class AssignToSelf(ast.NodeTransformer): - def __init__(self): - super(AssignToSelf, self).__init__() - self.transforms = {} - self.imports = [] - - self.in_class_define = False - self.in_setup = False - - def visit_ClassDef(self, node): - self.transforms = {} - self.in_class_define = True - - functions_to_promote = [] - setup_func = None - - for class_func in ast.iter_child_nodes(node): - if isinstance(class_func, ast.FunctionDef): - if class_func.name == 'setup': - setup_func = class_func - for anon_func in ast.iter_child_nodes(class_func): - if isinstance(anon_func, ast.FunctionDef): - functions_to_promote.append(anon_func) - - if setup_func: - for func in functions_to_promote: - setup_func.body.remove(func) - func.args.args.insert(0, ast.Name(id='self', ctx=ast.Load())) - node.body.append(func) - self.transforms[func.name] = 'self.' + func.name - - ast.fix_missing_locations(node) - - self.generic_visit(node) - - return node - - def visit_TryExcept(self, node): - if any(isinstance(x, (ast.Import, ast.ImportFrom)) for x in node.body): - self.imports.append(node) - else: - self.generic_visit(node) - return node - - def visit_Assign(self, node): - for target in node.targets: - if isinstance(target, ast.Name) and not isinstance(target.ctx, ast.Param) and not self.in_class_define: - self.transforms[target.id] = 'self.' 
+ target.id - self.generic_visit(node) - - return node - - def visit_Name(self, node): - new_node = node - if node.id in self.transforms: - if not isinstance(node.ctx, ast.Param): - new_node = ast.Attribute(value=ast.Name(id='self', ctx=node.ctx), attr=node.id, ctx=node.ctx) - - self.generic_visit(node) - - return ast.copy_location(new_node, node) - - def visit_Import(self, node): - self.imports.append(node) - - def visit_ImportFrom(self, node): - self.imports.append(node) - - def visit_FunctionDef(self, node): - """Delete functions that are empty due to imports being moved""" - self.in_class_define = False - - self.generic_visit(node) - - if node.body: - return node - - -def translate_module(target_module): - g_vars = {} - l_vars = {} - exec('import ' + target_module) in g_vars - - print(target_module) - module = eval(target_module, g_vars) - - benchmarks = [] - for obj_str in dir(module): - obj = getattr(module, obj_str) - if isinstance(obj, vbench.benchmark.Benchmark): - benchmarks.append(obj) - - if not benchmarks: - return - - rewritten_output = '' - for bench in benchmarks: - rewritten_output += vbench_to_asv_source(bench) - - with open('rewrite.py', 'w') as f: - f.write(rewritten_output) - - ast_module = ast.parse(rewritten_output) - - transformer = AssignToSelf() - transformed_module = transformer.visit(ast_module) - - unique_imports = {astor.to_source(node): node for node in transformer.imports} - - transformed_module.body = unique_imports.values() + transformed_module.body - - transformed_source = astor.to_source(transformed_module) - - with open('benchmarks/{}.py'.format(target_module), 'w') as f: - f.write(transformed_source) - - -if __name__ == '__main__': - cwd = os.getcwd() - new_dir = os.path.join(os.path.dirname(__file__), '../vb_suite') - sys.path.insert(0, new_dir) - - for module in glob.glob(os.path.join(new_dir, '*.py')): - mod = os.path.basename(module) - if mod in ['make.py', 'measure_memory_consumption.py', 'perf_HEAD.py', 'run_suite.py', 'test_perf.py', 'generate_rst_files.py', 'test.py', 'suite.py']: - continue - print('') - print(mod) - - translate_module(mod.replace('.py', '')) diff --git a/azure-pipelines.yml b/azure-pipelines.yml new file mode 100644 index 0000000000000..263a87176a9c9 --- /dev/null +++ b/azure-pipelines.yml @@ -0,0 +1,169 @@ +# Adapted from https://github.com/numba/numba/blob/master/azure-pipelines.yml +jobs: +# Mac and Linux use the same template +- template: ci/azure/posix.yml + parameters: + name: macOS + vmImage: xcode9-macos10.13 + +- template: ci/azure/posix.yml + parameters: + name: Linux + vmImage: ubuntu-16.04 + +- template: ci/azure/windows.yml + parameters: + name: Windows + vmImage: vs2017-win2016 + +- job: 'Checks' + pool: + vmImage: ubuntu-16.04 + timeoutInMinutes: 90 + steps: + - script: | + echo '##vso[task.prependpath]$(HOME)/miniconda3/bin' + echo '##vso[task.setvariable variable=ENV_FILE]environment.yml' + echo '##vso[task.setvariable variable=AZURE]true' + displayName: 'Setting environment variables' + + # Do not require a conda environment + - script: ci/code_checks.sh patterns + displayName: 'Looking for unwanted patterns' + condition: true + + - script: | + sudo apt-get install -y libc6-dev-i386 + ci/setup_env.sh + displayName: 'Setup environment and build pandas' + condition: true + + # Do not require pandas + - script: | + source activate pandas-dev + ci/code_checks.sh lint + displayName: 'Linting' + condition: true + + - script: | + source activate pandas-dev + ci/code_checks.sh dependencies + displayName: 
'Dependencies consistency' + condition: true + + # Require pandas + - script: | + source activate pandas-dev + ci/code_checks.sh code + displayName: 'Checks on imported code' + condition: true + + - script: | + source activate pandas-dev + ci/code_checks.sh doctests + displayName: 'Running doctests' + condition: true + + - script: | + source activate pandas-dev + ci/code_checks.sh docstrings + displayName: 'Docstring validation' + condition: true + + - script: | + source activate pandas-dev + ci/code_checks.sh typing + displayName: 'Typing validation' + condition: true + + - script: | + source activate pandas-dev + pytest --capture=no --strict scripts + displayName: 'Testing docstring validation script' + condition: true + + - script: | + source activate pandas-dev + cd asv_bench + asv check -E existing + git remote add upstream https://github.com/pandas-dev/pandas.git + git fetch upstream + if git diff upstream/master --name-only | grep -q "^asv_bench/"; then + asv machine --yes + ASV_OUTPUT="$(asv dev)" + if [[ $(echo "$ASV_OUTPUT" | grep "failed") ]]; then + echo "##vso[task.logissue type=error]Benchmarks run with errors" + echo "$ASV_OUTPUT" + exit 1 + else + echo "Benchmarks run without errors" + fi + else + echo "Benchmarks did not run, no changes detected" + fi + displayName: 'Running benchmarks' + condition: true + +- job: 'Docs' + pool: + vmImage: ubuntu-16.04 + timeoutInMinutes: 90 + steps: + - script: | + echo '##vso[task.setvariable variable=ENV_FILE]environment.yml' + echo '##vso[task.prependpath]$(HOME)/miniconda3/bin' + displayName: 'Setting environment variables' + + - script: | + sudo apt-get install -y libc6-dev-i386 + ci/setup_env.sh + displayName: 'Setup environment and build pandas' + + - script: | + source activate pandas-dev + # Next we should simply have `doc/make.py --warnings-are-errors`, everything else is required because the ipython directive doesn't fail the build on errors (https://github.com/ipython/ipython/issues/11547) + doc/make.py --warnings-are-errors | tee sphinx.log ; SPHINX_RET=${PIPESTATUS[0]} + grep -B1 "^<<<-------------------------------------------------------------------------$" sphinx.log ; IPY_RET=$(( $? != 1 )) + exit $(( $SPHINX_RET + $IPY_RET )) + displayName: 'Build documentation' + + - script: | + cd doc/build/html + git init + touch .nojekyll + echo "dev.pandas.io" > CNAME + printf "User-agent: *\nDisallow: /" > robots.txt + git add --all . + git config user.email "pandas-dev@python.org" + git config user.name "pandas-docs-bot" + git commit -m "pandas documentation in master" + displayName: 'Create git repo for docs build' + condition : | + and(not(eq(variables['Build.Reason'], 'PullRequest')), + eq(variables['Build.SourceBranch'], 'refs/heads/master')) + + # For `InstallSSHKey@0` to work, next steps are required: + # 1. Generate a pair of private/public keys (i.e. `ssh-keygen -t rsa -b 4096 -C "your_email@example.com"`) + # 2. Go to "Library > Secure files" in the Azure Pipelines dashboard: https://dev.azure.com/pandas-dev/pandas/_library?itemType=SecureFiles + # 3. Click on "+ Secure file" + # 4. Upload the private key (the name of the file must match with the specified in "sshKeySecureFile" input below, "pandas_docs_key") + # 5. Click on file name after it is created, tick the box "Authorize for use in all pipelines" and save + # 6. 
The public key specified in "sshPublicKey" is the pair of the uploaded private key, and needs to be set as a deploy key of the repo where the docs will be pushed (with write access): https://github.com/pandas-dev/pandas-dev.github.io/settings/keys + - task: InstallSSHKey@0 + inputs: + hostName: 'github.com,192.30.252.128 ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAq2A7hRGmdnm9tUDbO9IDSwBK6TbQa+PXYPCPy6rbTrTtw7PHkccKrpp0yVhp5HdEIcKr6pLlVDBfOLX9QUsyCOV0wzfjIJNlGEYsdlLJizHhbn2mUjvSAHQqZETYP81eFzLQNnPHt4EVVUh7VfDESU84KezmD5QlWpXLmvU31/yMf+Se8xhHTvKSCZIFImWwoG6mbUoWf9nzpIoaSjB+weqqUUmpaaasXVal72J+UX2B+2RPW3RcT0eOzQgqlJL3RKrTJvdsjE3JEAvGq3lGHSZXy28G3skua2SmVi/w4yCE6gbODqnTWlg7+wC604ydGXA8VJiS5ap43JXiUFFAaQ==' + sshPublicKey: 'ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQDHmz3l/EdqrgNxEUKkwDUuUcLv91unig03pYFGO/DMIgCmPdMG96zAgfnESd837Rm0wSSqylwSzkRJt5MV/TpFlcVifDLDQmUhqCeO8Z6dLl/oe35UKmyYICVwcvQTAaHNnYRpKC5IUlTh0JEtw9fGlnp1Ta7U1ENBLbKdpywczElhZu+hOQ892zqOj3CwA+U2329/d6cd7YnqIKoFN9DWT3kS5K6JE4IoBfQEVekIOs23bKjNLvPoOmi6CroAhu/K8j+NCWQjge5eJf2x/yTnIIP1PlEcXoHIr8io517posIx3TBup+CN8bNS1PpDW3jyD3ttl1uoBudjOQrobNnJeR6Rn67DRkG6IhSwr3BWj8alwUG5mTdZzwV5Pa9KZFdIiqX7NoDGg+itsR39QCn0thK8lGRNSR8KrWC1PSjecwelKBO7uQ7rnk/rkrZdBWR4oEA8YgNH8tirUw5WfOr5a0AIaJicKxGKNdMxZt+zmC+bS7F4YCOGIm9KHa43RrKhoGRhRf9fHHHKUPwFGqtWG4ykcUgoamDOURJyepesBAO3FiRE9rLU6ILbB3yEqqoekborHmAJD5vf7PWItW3Q/YQKuk3kkqRcKnexPyzyyq5lUgTi8CxxZdaASIOu294wjBhhdyHlXEkVTNJ9JKkj/obF+XiIIp0cBDsOXY9hDQ== pandas-dev@python.org' + sshKeySecureFile: 'pandas_docs_key' + displayName: 'Install GitHub ssh deployment key' + condition : | + and(not(eq(variables['Build.Reason'], 'PullRequest')), + eq(variables['Build.SourceBranch'], 'refs/heads/master')) + + - script: | + cd doc/build/html + git remote add origin git@github.com:pandas-dev/pandas-dev.github.io.git + git push -f origin master + displayName: 'Publish docs to GitHub pages' + condition : | + and(not(eq(variables['Build.Reason'], 'PullRequest')), + eq(variables['Build.SourceBranch'], 'refs/heads/master')) diff --git a/ci/README.txt b/ci/README.txt deleted file mode 100644 index bb71dc25d6093..0000000000000 --- a/ci/README.txt +++ /dev/null @@ -1,17 +0,0 @@ -Travis is a ci service that's well-integrated with GitHub. -The following types of breakage should be detected -by Travis builds: - -1) Failing tests on any supported version of Python. -2) Pandas should install and the tests should run if no optional deps are installed. -That also means tests which rely on optional deps need to raise SkipTest() -if the dep is missing. -3) unicode related fails when running under exotic locales. - -We tried running the vbench suite for a while, but with varying load -on Travis machines, that wasn't useful. - -Travis currently (4/2013) has a 5-job concurrency limit. Exceeding it -basically doubles the total runtime for a commit through travis, and -since dep+pandas installation is already quite long, this should become -a hard limit on concurrent travis runs. diff --git a/ci/asv.sh b/ci/asv.sh deleted file mode 100755 index 1e9a8d6380eb5..0000000000000 --- a/ci/asv.sh +++ /dev/null @@ -1,35 +0,0 @@ -#!/bin/bash - -echo "inside $0" - -source activate pandas - -RET=0 - -if [ "$ASV" ]; then - echo "Check for failed asv benchmarks" - - cd asv_bench - - asv machine --yes - - time asv dev | tee failed_asv.txt - - echo "The following asvs benchmarks (if any) failed." - - cat failed_asv.txt | grep "failed" failed_asv.txt - - if [ $? = "0" ]; then - RET=1 - fi - - echo "DONE displaying failed asvs benchmarks." 
- - rm failed_asv.txt - - echo "Check for failed asv benchmarks DONE" -else - echo "NOT checking for failed asv benchmarks" -fi - -exit $RET diff --git a/ci/azure/posix.yml b/ci/azure/posix.yml new file mode 100644 index 0000000000000..6093df46ffb60 --- /dev/null +++ b/ci/azure/posix.yml @@ -0,0 +1,100 @@ +parameters: + name: '' + vmImage: '' + +jobs: +- job: ${{ parameters.name }} + pool: + vmImage: ${{ parameters.vmImage }} + strategy: + matrix: + ${{ if eq(parameters.name, 'macOS') }}: + py35_macos: + ENV_FILE: ci/deps/azure-macos-35.yaml + CONDA_PY: "35" + PATTERN: "not slow and not network" + + ${{ if eq(parameters.name, 'Linux') }}: + py35_compat: + ENV_FILE: ci/deps/azure-35-compat.yaml + CONDA_PY: "35" + PATTERN: "not slow and not network" + + py36_locale_slow_old_np: + ENV_FILE: ci/deps/azure-36-locale.yaml + CONDA_PY: "36" + PATTERN: "slow" + LOCALE_OVERRIDE: "zh_CN.UTF-8" + EXTRA_APT: "language-pack-zh-hans" + + py36_locale_slow: + ENV_FILE: ci/deps/azure-36-locale_slow.yaml + CONDA_PY: "36" + PATTERN: "not slow and not network" + LOCALE_OVERRIDE: "it_IT.UTF-8" + + py36_32bit: + ENV_FILE: ci/deps/azure-36-32bit.yaml + CONDA_PY: "36" + PATTERN: "not slow and not network" + BITS32: "yes" + + py37_locale: + ENV_FILE: ci/deps/azure-37-locale.yaml + CONDA_PY: "37" + PATTERN: "not slow and not network" + LOCALE_OVERRIDE: "zh_CN.UTF-8" + + py37_np_dev: + ENV_FILE: ci/deps/azure-37-numpydev.yaml + CONDA_PY: "37" + PATTERN: "not slow and not network" + TEST_ARGS: "-W error" + PANDAS_TESTING_MODE: "deprecate" + EXTRA_APT: "xsel" + + steps: + - script: | + if [ "$(uname)" == "Linux" ]; then sudo apt-get install -y libc6-dev-i386 $EXTRA_APT; fi + echo '##vso[task.prependpath]$(HOME)/miniconda3/bin' + echo "Creating Environment" + ci/setup_env.sh + displayName: 'Setup environment and build pandas' + - script: | + source activate pandas-dev + ci/run_tests.sh + displayName: 'Test' + - script: source activate pandas-dev && pushd /tmp && python -c "import pandas; pandas.show_versions();" && popd + - task: PublishTestResults@2 + inputs: + testResultsFiles: 'test-data-*.xml' + testRunTitle: ${{ format('{0}-$(CONDA_PY)', parameters.name) }} + - powershell: | + $junitXml = "test-data-single.xml" + $(Get-Content $junitXml | Out-String) -match 'failures="(.*?)"' + if ($matches[1] -eq 0) + { + Write-Host "No test failures in test-data-single" + } + else + { + # note that this will produce $LASTEXITCODE=1 + Write-Error "$($matches[1]) tests failed" + } + + $junitXmlMulti = "test-data-multiple.xml" + $(Get-Content $junitXmlMulti | Out-String) -match 'failures="(.*?)"' + if ($matches[1] -eq 0) + { + Write-Host "No test failures in test-data-multi" + } + else + { + # note that this will produce $LASTEXITCODE=1 + Write-Error "$($matches[1]) tests failed" + } + displayName: 'Check for test failures' + - script: | + source activate pandas-dev + python ci/print_skipped.py + displayName: 'Print skipped tests' diff --git a/ci/azure/windows.yml b/ci/azure/windows.yml new file mode 100644 index 0000000000000..dfa82819b9826 --- /dev/null +++ b/ci/azure/windows.yml @@ -0,0 +1,59 @@ +parameters: + name: '' + vmImage: '' + +jobs: +- job: ${{ parameters.name }} + pool: + vmImage: ${{ parameters.vmImage }} + strategy: + matrix: + py36_np15: + ENV_FILE: ci/deps/azure-windows-36.yaml + CONDA_PY: "36" + + py37_np141: + ENV_FILE: ci/deps/azure-windows-37.yaml + CONDA_PY: "37" + + steps: + - powershell: | + Write-Host "##vso[task.prependpath]$env:CONDA\Scripts" + Write-Host "##vso[task.prependpath]$HOME/miniconda3/bin" + 
displayName: 'Add conda to PATH' + - script: conda update -q -n base conda + displayName: Update conda + - script: | + call activate + conda env create -q --file ci\\deps\\azure-windows-$(CONDA_PY).yaml + displayName: 'Create anaconda environment' + - script: | + call activate pandas-dev + call conda list + ci\\incremental\\build.cmd + displayName: 'Build' + - script: | + call activate pandas-dev + pytest -m "not slow and not network" --junitxml=test-data.xml pandas -n 2 -r sxX --strict --durations=10 %* + displayName: 'Test' + - task: PublishTestResults@2 + inputs: + testResultsFiles: 'test-data.xml' + testRunTitle: 'Windows-$(CONDA_PY)' + - powershell: | + $junitXml = "test-data.xml" + $(Get-Content $junitXml | Out-String) -match 'failures="(.*?)"' + if ($matches[1] -eq 0) + { + Write-Host "No test failures in test-data" + } + else + { + # note that this will produce $LASTEXITCODE=1 + Write-Error "$($matches[1]) tests failed" + } + displayName: 'Check for test failures' + - script: | + source activate pandas-dev + python ci/print_skipped.py + displayName: 'Print skipped tests' diff --git a/ci/before_script_travis.sh b/ci/before_script_travis.sh deleted file mode 100755 index 0b3939b1906a2..0000000000000 --- a/ci/before_script_travis.sh +++ /dev/null @@ -1,11 +0,0 @@ -#!/bin/bash - -echo "inside $0" - -if [ "${TRAVIS_OS_NAME}" == "linux" ]; then - sh -e /etc/init.d/xvfb start - sleep 3 -fi - -# Never fail because bad things happened here. -true diff --git a/ci/build_docs.sh b/ci/build_docs.sh deleted file mode 100755 index a038304fe0f7a..0000000000000 --- a/ci/build_docs.sh +++ /dev/null @@ -1,73 +0,0 @@ -#!/bin/bash - -if [ "${TRAVIS_OS_NAME}" != "linux" ]; then - echo "not doing build_docs on non-linux" - exit 0 -fi - -cd "$TRAVIS_BUILD_DIR" -echo "inside $0" - -git show --pretty="format:" --name-only HEAD~5.. --first-parent | grep -P "rst|txt|doc" - -if [ "$?" != "0" ]; then - echo "Skipping doc build, none were modified" - # nope, skip docs build - exit 0 -fi - - -if [ "$DOC" ]; then - - echo "Will build docs" - - source activate pandas - - mv "$TRAVIS_BUILD_DIR"/doc /tmp - cd /tmp/doc - - echo ############################### - echo # Log file for the doc build # - echo ############################### - - echo ./make.py - ./make.py - - echo ######################## - echo # Create and send docs # - echo ######################## - - cd /tmp/doc/build/html - git config --global user.email "pandas-docs-bot@localhost.foo" - git config --global user.name "pandas-docs-bot" - - # create the repo - git init - - touch README - git add README - git commit -m "Initial commit" --allow-empty - git branch gh-pages - git checkout gh-pages - touch .nojekyll - git add --all . - git commit -m "Version" --allow-empty - - git remote remove origin - git remote add origin "https://${PANDAS_GH_TOKEN}@github.com/pandas-dev/pandas-docs-travis.git" - git fetch origin - git remote -v - - git push origin gh-pages -f - - echo "Running doctests" - cd "$TRAVIS_BUILD_DIR" - pytest --doctest-modules \ - pandas/core/reshape/concat.py \ - pandas/core/reshape/pivot.py \ - pandas/core/reshape/reshape.py \ - pandas/core/reshape/tile.py - -fi - -exit 0 diff --git a/ci/check_git_tags.sh b/ci/check_git_tags.sh new file mode 100755 index 0000000000000..9dbcd4f98683e --- /dev/null +++ b/ci/check_git_tags.sh @@ -0,0 +1,28 @@ +set -e + +if [[ ! 
$(git tag) ]]; then + echo "No git tags in clone, please sync your git tags with upstream using:" + echo " git fetch --tags upstream" + echo " git push --tags origin" + echo "" + echo "If the issue persists, the clone depth needs to be increased in .travis.yml" + exit 1 +fi + +# This will error if there are no tags and we omit --always +DESCRIPTION=$(git describe --long --tags) +echo "$DESCRIPTION" + +if [[ "$DESCRIPTION" == *"untagged"* ]]; then + echo "Unable to determine most recent tag, aborting build" + exit 1 +else + if [[ "$DESCRIPTION" != *"g"* ]]; then + # A good description will have the hash prefixed by g, a bad one will be + # just the hash + echo "Unable to determine most recent tag, aborting build" + exit 1 + else + echo "$(git tag)" + fi +fi diff --git a/ci/check_imports.py b/ci/check_imports.py deleted file mode 100644 index d6f24ebcc4d3e..0000000000000 --- a/ci/check_imports.py +++ /dev/null @@ -1,35 +0,0 @@ -""" -Check that certain modules are not loaded by `import pandas` -""" -import sys - -blacklist = { - 'bs4', - 'html5lib', - 'ipython', - 'jinja2' - 'lxml', - 'numexpr', - 'openpyxl', - 'py', - 'pytest', - 's3fs', - 'scipy', - 'tables', - 'xlrd', - 'xlsxwriter', - 'xlwt', -} - - -def main(): - import pandas # noqa - - modules = set(x.split('.')[0] for x in sys.modules) - imported = modules & blacklist - if modules & blacklist: - sys.exit("Imported {}".format(imported)) - - -if __name__ == '__main__': - main() diff --git a/ci/code_checks.sh b/ci/code_checks.sh new file mode 100755 index 0000000000000..f839d86318e2e --- /dev/null +++ b/ci/code_checks.sh @@ -0,0 +1,297 @@ +#!/bin/bash +# +# Run checks related to code quality. +# +# This script is intended for both the CI and to check locally that code standards are +# respected. We are currently linting (PEP-8 and similar), looking for patterns of +# common mistakes (sphinx directives with missing blank lines, old style classes, +# unwanted imports...), we run doctests here (currently some files only), and we +# validate formatting errors in docstrings. +# +# Usage: +# $ ./ci/code_checks.sh # run all checks +# $ ./ci/code_checks.sh lint # run linting only +# $ ./ci/code_checks.sh patterns # check for patterns that should not exist +# $ ./ci/code_checks.sh code # checks on imported code +# $ ./ci/code_checks.sh doctests # run doctests +# $ ./ci/code_checks.sh docstrings # validate docstring errors +# $ ./ci/code_checks.sh dependencies # check that dependencies are consistent +# $ ./ci/code_checks.sh typing # run static type analysis + +[[ -z "$1" || "$1" == "lint" || "$1" == "patterns" || "$1" == "code" || "$1" == "doctests" || "$1" == "docstrings" || "$1" == "dependencies" || "$1" == "typing" ]] || \ + { echo "Unknown command $1. Usage: $0 [lint|patterns|code|doctests|docstrings|dependencies|typing]"; exit 9999; } + +BASE_DIR="$(dirname $0)/.." +RET=0 +CHECK=$1 + +function invgrep { + # grep with inverse exit status and formatting for azure-pipelines + # + # This function works exactly as grep, but with opposite exit status: + # - 0 (success) when no patterns are found + # - 1 (fail) when the patterns are found + # + # This is useful for the CI, as we want to fail if one of the patterns + # that we want to avoid is found by grep. + if [[ "$AZURE" == "true" ]]; then + set -o pipefail + grep -n "$@" | awk -F ":" '{print "##vso[task.logissue type=error;sourcepath=" $1 ";linenumber=" $2 ";] Found unwanted pattern: " $3}' + else + grep "$@" + fi + return $((! 
$?)) +} + +if [[ "$AZURE" == "true" ]]; then + FLAKE8_FORMAT="##vso[task.logissue type=error;sourcepath=%(path)s;linenumber=%(row)s;columnnumber=%(col)s;code=%(code)s;]%(text)s" +else + FLAKE8_FORMAT="default" +fi + +### LINTING ### +if [[ -z "$CHECK" || "$CHECK" == "lint" ]]; then + + echo "black --version" + black --version + + MSG='Checking black formatting' ; echo $MSG + black . --check --exclude '(asv_bench/env|\.egg|\.git|\.hg|\.mypy_cache|\.nox|\.tox|\.venv|_build|buck-out|build|dist|setup.py)' + RET=$(($RET + $?)) ; echo $MSG "DONE" + + # `setup.cfg` contains the list of error codes that are being ignored in flake8 + + echo "flake8 --version" + flake8 --version + + # pandas/_libs/src is C code, so no need to search there. + MSG='Linting .py code' ; echo $MSG + flake8 --format="$FLAKE8_FORMAT" . + RET=$(($RET + $?)) ; echo $MSG "DONE" + + MSG='Linting .pyx code' ; echo $MSG + flake8 --format="$FLAKE8_FORMAT" pandas --filename=*.pyx --select=E501,E302,E203,E111,E114,E221,E303,E128,E231,E126,E265,E305,E301,E127,E261,E271,E129,W291,E222,E241,E123,F403,C400,C401,C402,C403,C404,C405,C406,C407,C408,C409,C410,C411 + RET=$(($RET + $?)) ; echo $MSG "DONE" + + MSG='Linting .pxd and .pxi.in' ; echo $MSG + flake8 --format="$FLAKE8_FORMAT" pandas/_libs --filename=*.pxi.in,*.pxd --select=E501,E302,E203,E111,E114,E221,E303,E231,E126,F403 + RET=$(($RET + $?)) ; echo $MSG "DONE" + + echo "flake8-rst --version" + flake8-rst --version + + MSG='Linting code-blocks in .rst documentation' ; echo $MSG + flake8-rst doc/source --filename=*.rst --format="$FLAKE8_FORMAT" + RET=$(($RET + $?)) ; echo $MSG "DONE" + + # Check that cython casting is of the form `<type>obj` as opposed to `<type> obj`; + # it doesn't make a difference, but we want to be internally consistent. + # Note: this grep pattern is (intended to be) equivalent to the python + # regex r'(?<![ ->])> ' + MSG='Linting .pyx code for spacing conventions in casting' ; echo $MSG + invgrep -r -E --include '*.pyx' --include '*.pxi.in' '[a-zA-Z0-9*]> ' pandas/_libs + RET=$(($RET + $?)) ; echo $MSG "DONE" + + # readability/casting: Warnings about C casting instead of C++ casting + # runtime/int: Warnings about using C number types instead of C++ ones + # build/include_subdir: Warnings about prefacing included header files with directory + + # We don't lint all C files because we don't want to lint any that are built + # from Cython files nor do we want to lint C files that we didn't modify for + # this particular codebase (e.g. src/headers, src/klib, src/msgpack). However, + # we can lint all header files since they aren't "generated" like C files are. 
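The casting lint above pairs a grep character class with what its comment calls the equivalent Python lookbehind regex. A small sketch of the behavior that lookbehind is meant to have, using hypothetical sample strings rather than real pandas source:

```python
# Illustration of the cython-casting lint above: flag `<type> obj` (a space
# after the cast) while allowing `<type>obj`, `->` arrows and `>>` shifts.
import re

PATTERN = re.compile(r"(?<![ ->])> ")

assert PATTERN.search("cdef int i = <int>obj") is None        # allowed style
assert PATTERN.search("cdef int i = <int> obj") is not None   # flagged style
assert PATTERN.search("def f() -> int: ...") is None          # arrow is fine
```

The lookbehind is meant to suppress matches after a space, a hyphen (return-type arrows) or another `>` (shifts and nested templates), which approximates the same contexts the grep pattern's `[a-zA-Z0-9*]` class selects for.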
+ MSG='Linting .c and .h' ; echo $MSG + cpplint --quiet --extensions=c,h --headers=h --recursive --filter=-readability/casting,-runtime/int,-build/include_subdir pandas/_libs/src/*.h pandas/_libs/src/parser pandas/_libs/ujson pandas/_libs/tslibs/src/datetime pandas/io/msgpack pandas/_libs/*.cpp pandas/util + RET=$(($RET + $?)) ; echo $MSG "DONE" + + echo "isort --version-number" + isort --version-number + + # Imports - Check formatting using isort; see setup.cfg for settings + MSG='Check import format using isort ' ; echo $MSG + isort --recursive --check-only pandas asv_bench + RET=$(($RET + $?)) ; echo $MSG "DONE" + +fi + +### PATTERNS ### +if [[ -z "$CHECK" || "$CHECK" == "patterns" ]]; then + + # Check for imports from pandas.core.common instead of `import pandas.core.common as com` + # Check for imports from collections.abc instead of `from collections import abc` + MSG='Check for non-standard imports' ; echo $MSG + invgrep -R --include="*.py*" -E "from pandas.core.common import " pandas + invgrep -R --include="*.py*" -E "from collections.abc import " pandas + # invgrep -R --include="*.py*" -E "from numpy import nan " pandas # GH#24822 not yet implemented since the offending imports have not all been removed + RET=$(($RET + $?)) ; echo $MSG "DONE" + + MSG='Check for pytest warns' ; echo $MSG + invgrep -r -E --include '*.py' 'pytest\.warns' pandas/tests/ + RET=$(($RET + $?)) ; echo $MSG "DONE" + + MSG='Check for pytest raises without context' ; echo $MSG + invgrep -r -E --include '*.py' "[[:space:]] pytest.raises" pandas/tests/ + RET=$(($RET + $?)) ; echo $MSG "DONE" + + MSG='Check for python2-style file encodings' ; echo $MSG + invgrep -R --include="*.py" --include="*.pyx" -E "# -\*- coding: utf-8 -\*-" pandas scripts + RET=$(($RET + $?)) ; echo $MSG "DONE" + + MSG='Check for python2-style super usage' ; echo $MSG + invgrep -R --include="*.py" -E "super\(\w*, (self|cls)\)" pandas + RET=$(($RET + $?)) ; echo $MSG "DONE" + + # Check for the following code in testing: `np.testing` and `np.array_equal` + MSG='Check for invalid testing' ; echo $MSG + invgrep -r -E --include '*.py' --exclude testing.py '(numpy|np)(\.testing|\.array_equal)' pandas/tests/ + RET=$(($RET + $?)) ; echo $MSG "DONE" + + # Check for the following code in the extension array base tests: `tm.assert_frame_equal` and `tm.assert_series_equal` + MSG='Check for invalid EA testing' ; echo $MSG + invgrep -r -E --include '*.py' --exclude base.py 'tm.assert_(series|frame)_equal' pandas/tests/extension/base + RET=$(($RET + $?)) ; echo $MSG "DONE" + + MSG='Check for deprecated messages without sphinx directive' ; echo $MSG + invgrep -R --include="*.py" --include="*.pyx" -E "(DEPRECATED|DEPRECATE|Deprecated)(:|,|\.)" pandas + RET=$(($RET + $?)) ; echo $MSG "DONE" + + MSG='Check for python2 new-style classes and for empty parentheses' ; echo $MSG + invgrep -R --include="*.py" --include="*.pyx" -E "class\s\S*\((object)?\):" pandas asv_bench/benchmarks scripts + RET=$(($RET + $?)) ; echo $MSG "DONE" + + MSG='Check for backticks incorrectly rendering because of missing spaces' ; echo $MSG + invgrep -R --include="*.rst" -E "[a-zA-Z0-9]\`\`?[a-zA-Z0-9]" doc/source/ + RET=$(($RET + $?)) ; echo $MSG "DONE" + + MSG='Check for incorrect sphinx directives' ; echo $MSG + invgrep -R --include="*.py" --include="*.pyx" --include="*.rst" -E "\.\. 
(autosummary|contents|currentmodule|deprecated|function|image|important|include|ipython|literalinclude|math|module|note|raw|seealso|toctree|versionadded|versionchanged|warning):[^:]" ./pandas ./doc/source + RET=$(($RET + $?)) ; echo $MSG "DONE" + + MSG='Check that the deprecated `assert_raises_regex` is not used (`pytest.raises(match=pattern)` should be used instead)' ; echo $MSG + invgrep -R --exclude=*.pyc --exclude=testing.py --exclude=test_util.py assert_raises_regex pandas + RET=$(($RET + $?)) ; echo $MSG "DONE" + + # Check for the following code in testing: `unittest.mock`, `mock.Mock()` or `mock.patch` + MSG='Check that unittest.mock is not used (pytest builtin monkeypatch fixture should be used instead)' ; echo $MSG + invgrep -r -E --include '*.py' '(unittest(\.| import )mock|mock\.Mock\(\)|mock\.patch)' pandas/tests/ + RET=$(($RET + $?)) ; echo $MSG "DONE" + + MSG='Check for wrong space after code-block directive and before colon (".. code-block ::" instead of ".. code-block::")' ; echo $MSG + invgrep -R --include="*.rst" ".. code-block ::" doc/source + RET=$(($RET + $?)) ; echo $MSG "DONE" + + MSG='Check for wrong space after ipython directive and before colon (".. ipython ::" instead of ".. ipython::")' ; echo $MSG + invgrep -R --include="*.rst" ".. ipython ::" doc/source + RET=$(($RET + $?)) ; echo $MSG "DONE" + + MSG='Check that no file in the repo contains trailing whitespaces' ; echo $MSG + set -o pipefail + if [[ "$AZURE" == "true" ]]; then + # we exclude all c/cpp files as the c/cpp files of pandas code base are tested when Linting .c and .h files + ! grep -n '--exclude=*.'{svg,c,cpp,html} --exclude-dir=env -RI "\s$" * | awk -F ":" '{print "##vso[task.logissue type=error;sourcepath=" $1 ";linenumber=" $2 ";] Trailing whitespaces found: " $3}' + else + ! grep -n '--exclude=*.'{svg,c,cpp,html} --exclude-dir=env -RI "\s$" * | awk -F ":" '{print $1 ":" $2 ":Trailing whitespaces found: " $3}' + fi + RET=$(($RET + $?)) ; echo $MSG "DONE" +fi + +### CODE ### +if [[ -z "$CHECK" || "$CHECK" == "code" ]]; then + + MSG='Check import. No warnings, and blacklist some optional dependencies' ; echo $MSG + python -W error -c " +import sys +import pandas + +blacklist = {'bs4', 'gcsfs', 'html5lib', 'http', 'ipython', 'jinja2', 'hypothesis', + 'lxml', 'numexpr', 'openpyxl', 'py', 'pytest', 's3fs', 'scipy', + 'tables', 'urllib.request', 'xlrd', 'xlsxwriter', 'xlwt'} + +# GH#28227 for some of these check for top-level modules, while others are +# more specific (e.g. 
urllib.request) +import_mods = set(m.split('.')[0] for m in sys.modules) | set(sys.modules) +mods = blacklist & import_mods +if mods: + sys.stderr.write('err: pandas should not import: {}\n'.format(', '.join(mods))) + sys.exit(len(mods)) + " + RET=$(($RET + $?)) ; echo $MSG "DONE" + +fi + +### DOCTESTS ### +if [[ -z "$CHECK" || "$CHECK" == "doctests" ]]; then + + MSG='Doctests frame.py' ; echo $MSG + pytest -q --doctest-modules pandas/core/frame.py \ + -k" -itertuples -join -reindex -reindex_axis -round" + RET=$(($RET + $?)) ; echo $MSG "DONE" + + MSG='Doctests series.py' ; echo $MSG + pytest -q --doctest-modules pandas/core/series.py \ + -k"-nonzero -reindex -searchsorted -to_dict" + RET=$(($RET + $?)) ; echo $MSG "DONE" + + MSG='Doctests generic.py' ; echo $MSG + pytest -q --doctest-modules pandas/core/generic.py \ + -k"-_set_axis_name -_xs -describe -droplevel -groupby -interpolate -pct_change -pipe -reindex -reindex_axis -to_json -transpose -values -xs -to_clipboard" + RET=$(($RET + $?)) ; echo $MSG "DONE" + + MSG='Doctests groupby.py' ; echo $MSG + pytest -q --doctest-modules pandas/core/groupby/groupby.py -k"-cumcount -describe -pipe" + RET=$(($RET + $?)) ; echo $MSG "DONE" + + MSG='Doctests datetimes.py' ; echo $MSG + pytest -q --doctest-modules pandas/core/tools/datetimes.py + RET=$(($RET + $?)) ; echo $MSG "DONE" + + MSG='Doctests top-level reshaping functions' ; echo $MSG + pytest -q --doctest-modules \ + pandas/core/reshape/concat.py \ + pandas/core/reshape/pivot.py \ + pandas/core/reshape/reshape.py \ + pandas/core/reshape/tile.py \ + pandas/core/reshape/melt.py \ + -k"-crosstab -pivot_table -cut" + RET=$(($RET + $?)) ; echo $MSG "DONE" + + MSG='Doctests interval classes' ; echo $MSG + pytest -q --doctest-modules \ + pandas/core/indexes/interval.py \ + pandas/core/arrays/interval.py \ + -k"-from_arrays -from_breaks -from_intervals -from_tuples -set_closed -to_tuples -interval_range" + RET=$(($RET + $?)) ; echo $MSG "DONE" + +fi + +### DOCSTRINGS ### +if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then + + MSG='Validate docstrings (GL03, GL04, GL05, GL06, GL07, GL09, GL10, SS04, SS05, PR03, PR04, PR05, PR10, EX04, RT01, RT04, RT05, SA05)' ; echo $MSG + $BASE_DIR/scripts/validate_docstrings.py --format=azure --errors=GL03,GL04,GL05,GL06,GL07,GL09,GL10,SS04,SS05,PR03,PR04,PR05,PR10,EX04,RT01,RT04,RT05,SA05 + RET=$(($RET + $?)) ; echo $MSG "DONE" + +fi + +### DEPENDENCIES ### +if [[ -z "$CHECK" || "$CHECK" == "dependencies" ]]; then + + MSG='Check that requirements-dev.txt has been generated from environment.yml' ; echo $MSG + $BASE_DIR/scripts/generate_pip_deps_from_conda.py --compare --azure + RET=$(($RET + $?)) ; echo $MSG "DONE" + +fi + +### TYPING ### +if [[ -z "$CHECK" || "$CHECK" == "typing" ]]; then + + echo "mypy --version" + mypy --version + + MSG='Performing static analysis using mypy' ; echo $MSG + mypy pandas + RET=$(($RET + $?)) ; echo $MSG "DONE" +fi + + +exit $RET diff --git a/ci/deps/azure-35-compat.yaml b/ci/deps/azure-35-compat.yaml new file mode 100644 index 0000000000000..dd54001984ec7 --- /dev/null +++ b/ci/deps/azure-35-compat.yaml @@ -0,0 +1,30 @@ +name: pandas-dev +channels: + - defaults + - conda-forge +dependencies: + - beautifulsoup4=4.6.0 + - bottleneck=1.2.1 + - jinja2=2.8 + - numexpr=2.6.2 + - numpy=1.13.3 + - openpyxl=2.4.8 + - pytables=3.4.2 + - python-dateutil=2.6.1 + - python=3.5.3 + - pytz=2017.2 + - scipy=0.19.0 + - xlrd=1.1.0 + - xlsxwriter=0.9.8 + - xlwt=1.2.0 + # universal + - hypothesis>=3.58.0 + - pytest-xdist + - pytest-mock + - 
pytest-azurepipelines + - pip + - pip: + # for python 3.5, pytest>=4.0.2, cython>=0.29.13 is not available in conda + - cython>=0.29.13 + - pytest==4.5.0 + - html5lib==1.0b2 diff --git a/ci/deps/azure-36-32bit.yaml b/ci/deps/azure-36-32bit.yaml new file mode 100644 index 0000000000000..321cc203961d5 --- /dev/null +++ b/ci/deps/azure-36-32bit.yaml @@ -0,0 +1,22 @@ +name: pandas-dev +channels: + - defaults + - conda-forge +dependencies: + - gcc_linux-32 + - gcc_linux-32 + - gxx_linux-32 + - numpy=1.14.* + - python-dateutil + - python=3.6.* + - pytz=2017.2 + # universal + - pytest>=4.0.2,<5.0.0 + - pytest-xdist + - pytest-mock + - pytest-azurepipelines + - hypothesis>=3.58.0 + - pip + - pip: + # Anaconda doesn't build a new enough Cython + - cython>=0.29.13 diff --git a/ci/deps/azure-36-locale.yaml b/ci/deps/azure-36-locale.yaml new file mode 100644 index 0000000000000..76868f598f11b --- /dev/null +++ b/ci/deps/azure-36-locale.yaml @@ -0,0 +1,30 @@ +name: pandas-dev +channels: + - defaults + - conda-forge +dependencies: + - beautifulsoup4==4.6.0 + - bottleneck=1.2.* + - cython=0.29.13 + - lxml + - matplotlib=2.2.2 + - numpy=1.14.* + - openpyxl=2.4.8 + - python-dateutil + - python-blosc + - python=3.6.* + - pytz=2017.2 + - scipy + - sqlalchemy=1.1.4 + - xlrd=1.1.0 + - xlsxwriter=0.9.8 + - xlwt=1.2.0 + # universal + - pytest>=5.0.0 + - pytest-xdist>=1.29.0 + - pytest-mock + - pytest-azurepipelines + - hypothesis>=3.58.0 + - pip + - pip: + - html5lib==1.0b2 diff --git a/ci/deps/azure-36-locale_slow.yaml b/ci/deps/azure-36-locale_slow.yaml new file mode 100644 index 0000000000000..21205375204dc --- /dev/null +++ b/ci/deps/azure-36-locale_slow.yaml @@ -0,0 +1,36 @@ +name: pandas-dev +channels: + - defaults + - conda-forge +dependencies: + - beautifulsoup4 + - cython>=0.29.13 + - gcsfs + - html5lib + - ipython + - jinja2 + - lxml + - matplotlib=3.0.* + - nomkl + - numexpr + - numpy=1.15.* + - openpyxl + - pytables + - python-dateutil + - python=3.6.* + - pytz + - s3fs + - scipy + - xarray + - xlrd + - xlsxwriter + - xlwt + # universal + - pytest>=4.0.2 + - pytest-xdist + - pytest-mock + - pytest-azurepipelines + - moto + - pip + - pip: + - hypothesis>=3.58.0 diff --git a/ci/deps/azure-37-locale.yaml b/ci/deps/azure-37-locale.yaml new file mode 100644 index 0000000000000..24464adb74f5b --- /dev/null +++ b/ci/deps/azure-37-locale.yaml @@ -0,0 +1,35 @@ +name: pandas-dev +channels: + - defaults + - conda-forge +dependencies: + - beautifulsoup4 + - cython>=0.29.13 + - html5lib + - ipython + - jinja2 + - lxml + - matplotlib + - moto + - nomkl + - numexpr + - numpy + - openpyxl + - pytables + - python-dateutil + - python=3.7.* + - pytz + - s3fs + - scipy + - xarray + - xlrd + - xlsxwriter + - xlwt + # universal + - pytest>=5.0.1 + - pytest-xdist>=1.29.0 + - pytest-mock + - pytest-azurepipelines + - pip + - pip: + - hypothesis>=3.58.0 diff --git a/ci/deps/azure-37-numpydev.yaml b/ci/deps/azure-37-numpydev.yaml new file mode 100644 index 0000000000000..0fb06fd43724c --- /dev/null +++ b/ci/deps/azure-37-numpydev.yaml @@ -0,0 +1,22 @@ +name: pandas-dev +channels: + - defaults +dependencies: + - python=3.7.* + - pytz + - Cython>=0.29.13 + # universal + # pytest < 5 until defaults has pytest-xdist>=1.29.0 + - pytest>=4.0.2,<5.0 + - pytest-xdist + - pytest-mock + - hypothesis>=3.58.0 + - pip + - pip: + - "git+git://github.com/dateutil/dateutil.git" + - "-f https://7933911d6844c6c53a7d-47bd50c35cd79bd838daf386af554a83.ssl.cf2.rackcdn.com" + - "--pre" + - "numpy" + - "scipy" + # 
https://github.com/pandas-dev/pandas/issues/27421 + - pytest-azurepipelines<1.0.0 diff --git a/ci/deps/azure-macos-35.yaml b/ci/deps/azure-macos-35.yaml new file mode 100644 index 0000000000000..4e0f09904b695 --- /dev/null +++ b/ci/deps/azure-macos-35.yaml @@ -0,0 +1,35 @@ +name: pandas-dev +channels: + - defaults +dependencies: + - beautifulsoup4 + - bottleneck + - html5lib + - jinja2 + - lxml + - matplotlib=2.2.3 + - nomkl + - numexpr + - numpy=1.13.3 + - openpyxl + - pyarrow + - pytables + - python=3.5.* + - python-dateutil==2.6.1 + - pytz + - xarray + - xlrd + - xlsxwriter + - xlwt + - pip + - pip: + # Anaconda / conda-forge don't build for 3.5 + - cython>=0.29.13 + - pyreadstat + # universal + - pytest>=5.0.1 + - pytest-xdist>=1.29.0 + - pytest-mock + - hypothesis>=3.58.0 + # https://github.com/pandas-dev/pandas/issues/27421 + - pytest-azurepipelines<1.0.0 diff --git a/ci/deps/azure-windows-36.yaml b/ci/deps/azure-windows-36.yaml new file mode 100644 index 0000000000000..88b38aaef237c --- /dev/null +++ b/ci/deps/azure-windows-36.yaml @@ -0,0 +1,28 @@ +name: pandas-dev +channels: + - conda-forge + - defaults +dependencies: + - blosc + - bottleneck + - fastparquet>=0.2.1 + - matplotlib=3.0.2 + - numexpr + - numpy=1.15.* + - openpyxl + - pyarrow + - pytables + - python-dateutil + - python=3.6.* + - pytz + - scipy + - xlrd + - xlsxwriter + - xlwt + # universal + - cython>=0.29.13 + - pytest>=5.0.1 + - pytest-xdist>=1.29.0 + - pytest-mock + - pytest-azurepipelines + - hypothesis>=3.58.0 diff --git a/ci/deps/azure-windows-37.yaml b/ci/deps/azure-windows-37.yaml new file mode 100644 index 0000000000000..7680ed9fd9c92 --- /dev/null +++ b/ci/deps/azure-windows-37.yaml @@ -0,0 +1,34 @@ +name: pandas-dev +channels: + - defaults + - conda-forge +dependencies: + - beautifulsoup4 + - bottleneck + - gcsfs + - html5lib + - jinja2 + - lxml + - matplotlib=2.2.* + - moto + - numexpr + - numpy=1.14.* + - openpyxl + - pytables + - python=3.7.* + - python-dateutil + - pytz + - s3fs + - scipy + - sqlalchemy + - xlrd + - xlsxwriter + - xlwt + # universal + - cython>=0.29.13 + - pytest>=5.0.0 + - pytest-xdist>=1.29.0 + - pytest-mock + - pytest-azurepipelines + - hypothesis>=3.58.0 + - pyreadstat diff --git a/ci/deps/travis-36-cov.yaml b/ci/deps/travis-36-cov.yaml new file mode 100644 index 0000000000000..b2a74fceaf0fa --- /dev/null +++ b/ci/deps/travis-36-cov.yaml @@ -0,0 +1,52 @@ +name: pandas-dev +channels: + - defaults + - conda-forge +dependencies: + - beautifulsoup4 + - botocore>=1.11 + - cython>=0.29.13 + - dask + - fastparquet>=0.2.1 + - gcsfs + - geopandas + - html5lib + - matplotlib + - moto + - nomkl + - numexpr + - numpy=1.15.* + - odfpy + - openpyxl + - pandas-gbq + # https://github.com/pydata/pandas-gbq/issues/271 + - google-cloud-bigquery<=1.11 + - psycopg2 + # pyarrow segfaults on load: https://github.com/pandas-dev/pandas/issues/26716 + # - pyarrow=0.9.0 + - pymysql + - pytables + - python-snappy + - python=3.6.* + - pytz + - s3fs + - scikit-learn + - scipy + - sqlalchemy + - statsmodels + - xarray + - xlrd + - xlsxwriter + - xlwt + # universal + - pytest>=5.0.1 + - pytest-xdist>=1.29.0 + - pytest-cov + - pytest-mock + - hypothesis>=3.58.0 + - pip + - pip: + - brotlipy + - coverage + - pandas-datareader + - python-dateutil diff --git a/ci/deps/travis-36-locale.yaml b/ci/deps/travis-36-locale.yaml new file mode 100644 index 0000000000000..44795766d7c31 --- /dev/null +++ b/ci/deps/travis-36-locale.yaml @@ -0,0 +1,42 @@ +name: pandas-dev +channels: + - defaults + - conda-forge 
+dependencies: + - beautifulsoup4 + - blosc=1.14.3 + - python-blosc + - cython>=0.29.13 + - fastparquet=0.2.1 + - gcsfs=0.2.2 + - html5lib + - ipython + - jinja2 + - lxml=3.8.0 + - matplotlib=3.0.* + - moto + - nomkl + - numexpr + - numpy + - openpyxl + - pandas-gbq=0.8.0 + - psycopg2=2.6.2 + - pymysql=0.7.11 + - pytables + - python-dateutil + - python=3.6.* + - pytz + - s3fs=0.0.8 + - scipy + - sqlalchemy=1.1.4 + - xarray=0.10 + - xlrd + - xlsxwriter + - xlwt + # universal + - pytest>=5.0.1 + - pytest-xdist>=1.29.0 + - pytest-mock + - pip + - pip: + - hypothesis>=3.58.0 diff --git a/ci/deps/travis-36-slow.yaml b/ci/deps/travis-36-slow.yaml new file mode 100644 index 0000000000000..e9c5dadbc924a --- /dev/null +++ b/ci/deps/travis-36-slow.yaml @@ -0,0 +1,32 @@ +name: pandas-dev +channels: + - defaults + - conda-forge +dependencies: + - beautifulsoup4 + - cython>=0.29.13 + - html5lib + - lxml + - matplotlib + - numexpr + - numpy + - openpyxl + - patsy + - psycopg2 + - pymysql + - pytables + - python-dateutil + - python=3.6.* + - pytz + - s3fs + - scipy + - sqlalchemy + - xlrd + - xlsxwriter + - xlwt + # universal + - pytest>=5.0.0 + - pytest-xdist>=1.29.0 + - pytest-mock + - moto + - hypothesis>=3.58.0 diff --git a/ci/deps/travis-37.yaml b/ci/deps/travis-37.yaml new file mode 100644 index 0000000000000..903636f2fe060 --- /dev/null +++ b/ci/deps/travis-37.yaml @@ -0,0 +1,24 @@ +name: pandas-dev +channels: + - defaults + - conda-forge + - c3i_test +dependencies: + - python=3.7.* + - botocore>=1.11 + - cython>=0.29.13 + - numpy + - python-dateutil + - nomkl + - pyarrow + - pytz + # universal + - pytest>=5.0.0 + - pytest-xdist>=1.29.0 + - pytest-mock + - hypothesis>=3.58.0 + - s3fs + - pip + - pyreadstat + - pip: + - moto diff --git a/ci/environment-dev.yaml b/ci/environment-dev.yaml deleted file mode 100644 index c72abd0c19516..0000000000000 --- a/ci/environment-dev.yaml +++ /dev/null @@ -1,14 +0,0 @@ -name: pandas-dev -channels: - - defaults - - conda-forge -dependencies: - - Cython - - NumPy - - moto - - pytest>=3.1 - - python-dateutil>=2.5.0 - - python=3 - - pytz - - setuptools>=3.3 - - sphinx diff --git a/ci/incremental/build.cmd b/ci/incremental/build.cmd new file mode 100644 index 0000000000000..b61b59e287299 --- /dev/null +++ b/ci/incremental/build.cmd @@ -0,0 +1,9 @@ +@rem https://github.com/numba/numba/blob/master/buildscripts/incremental/build.cmd + +@rem Build extensions +python setup.py build_ext -q -i + +@rem Install pandas +python -m pip install --no-build-isolation -e . + +if %errorlevel% neq 0 exit /b %errorlevel% diff --git a/ci/install.ps1 b/ci/install.ps1 deleted file mode 100644 index 64ec7f81884cd..0000000000000 --- a/ci/install.ps1 +++ /dev/null @@ -1,92 +0,0 @@ -# Sample script to install Miniconda under Windows -# Authors: Olivier Grisel, Jonathan Helmus and Kyle Kastner, Robert McGibbon -# License: CC0 1.0 Universal: http://creativecommons.org/publicdomain/zero/1.0/ - -$MINICONDA_URL = "http://repo.continuum.io/miniconda/" - - -function DownloadMiniconda ($python_version, $platform_suffix) { - $webclient = New-Object System.Net.WebClient - $filename = "Miniconda3-latest-Windows-" + $platform_suffix + ".exe" - $url = $MINICONDA_URL + $filename - - $basedir = $pwd.Path + "\" - $filepath = $basedir + $filename - if (Test-Path $filename) { - Write-Host "Reusing" $filepath - return $filepath - } - - # Download and retry up to 3 times in case of network transient errors. 
- Write-Host "Downloading" $filename "from" $url - $retry_attempts = 2 - for($i=0; $i -lt $retry_attempts; $i++){ - try { - $webclient.DownloadFile($url, $filepath) - break - } - Catch [Exception]{ - Start-Sleep 1 - } - } - if (Test-Path $filepath) { - Write-Host "File saved at" $filepath - } else { - # Retry once to get the error message if any at the last try - $webclient.DownloadFile($url, $filepath) - } - return $filepath -} - - -function InstallMiniconda ($python_version, $architecture, $python_home) { - Write-Host "Installing Python" $python_version "for" $architecture "bit architecture to" $python_home - if (Test-Path $python_home) { - Write-Host $python_home "already exists, skipping." - return $false - } - if ($architecture -match "32") { - $platform_suffix = "x86" - } else { - $platform_suffix = "x86_64" - } - - $filepath = DownloadMiniconda $python_version $platform_suffix - Write-Host "Installing" $filepath "to" $python_home - $install_log = $python_home + ".log" - $args = "/S /D=$python_home" - Write-Host $filepath $args - Start-Process -FilePath $filepath -ArgumentList $args -Wait -Passthru - if (Test-Path $python_home) { - Write-Host "Python $python_version ($architecture) installation complete" - } else { - Write-Host "Failed to install Python in $python_home" - Get-Content -Path $install_log - Exit 1 - } -} - - -function InstallCondaPackages ($python_home, $spec) { - $conda_path = $python_home + "\Scripts\conda.exe" - $args = "install --yes " + $spec - Write-Host ("conda " + $args) - Start-Process -FilePath "$conda_path" -ArgumentList $args -Wait -Passthru -} - -function UpdateConda ($python_home) { - $conda_path = $python_home + "\Scripts\conda.exe" - Write-Host "Updating conda..." - $args = "update --yes conda" - Write-Host $conda_path $args - Start-Process -FilePath "$conda_path" -ArgumentList $args -Wait -Passthru -} - - -function main () { - InstallMiniconda "3.5" $env:PYTHON_ARCH $env:CONDA_ROOT - UpdateConda $env:CONDA_ROOT - InstallCondaPackages $env:CONDA_ROOT "conda-build jinja2 anaconda-client" -} - -main diff --git a/ci/install_circle.sh b/ci/install_circle.sh deleted file mode 100755 index fd79f907625e9..0000000000000 --- a/ci/install_circle.sh +++ /dev/null @@ -1,86 +0,0 @@ -#!/usr/bin/env bash - -home_dir=$(pwd) -echo "[home_dir: $home_dir]" - -echo "[ls -ltr]" -ls -ltr - -echo "[Using clean Miniconda install]" -rm -rf "$MINICONDA_DIR" - -# install miniconda -wget http://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -q -O miniconda.sh || exit 1 -bash miniconda.sh -b -p "$MINICONDA_DIR" || exit 1 - -export PATH="$MINICONDA_DIR/bin:$PATH" - -echo "[update conda]" -conda config --set ssl_verify false || exit 1 -conda config --set always_yes true --set changeps1 false || exit 1 -conda update -q conda - -# add the pandas channel to take priority -# to add extra packages -echo "[add channels]" -conda config --add channels pandas || exit 1 -conda config --remove channels defaults || exit 1 -conda config --add channels defaults || exit 1 - -# Useful for debugging any issues with conda -conda info -a || exit 1 - -# support env variables passed -export ENVS_FILE=".envs" - -# make sure that the .envs file exists. 
it is ok if it is empty -touch $ENVS_FILE - -# assume all command line arguments are environmental variables -for var in "$@" -do - echo "export $var" >> $ENVS_FILE -done - -echo "[environmental variable file]" -cat $ENVS_FILE -source $ENVS_FILE - -export REQ_BUILD=ci/requirements-${JOB}.build -export REQ_RUN=ci/requirements-${JOB}.run -export REQ_PIP=ci/requirements-${JOB}.pip - -# edit the locale override if needed -if [ -n "$LOCALE_OVERRIDE" ]; then - echo "[Adding locale to the first line of pandas/__init__.py]" - rm -f pandas/__init__.pyc - sedc="3iimport locale\nlocale.setlocale(locale.LC_ALL, '$LOCALE_OVERRIDE')\n" - sed -i "$sedc" pandas/__init__.py - echo "[head -4 pandas/__init__.py]" - head -4 pandas/__init__.py - echo -fi - -# create envbuild deps -echo "[create env: ${REQ_BUILD}]" -time conda create -n pandas -q --file=${REQ_BUILD} || exit 1 -time conda install -n pandas pytest>=3.1.0 || exit 1 - -source activate pandas -time pip install moto || exit 1 - -# build but don't install -echo "[build em]" -time python setup.py build_ext --inplace || exit 1 - -# we may have run installations -echo "[conda installs: ${REQ_RUN}]" -if [ -e ${REQ_RUN} ]; then - time conda install -q --file=${REQ_RUN} || exit 1 -fi - -# we may have additional pip installs -echo "[pip installs: ${REQ_PIP}]" -if [ -e ${REQ_PIP} ]; then - pip install -r $REQ_PIP -fi diff --git a/ci/install_db_circle.sh b/ci/install_db_circle.sh deleted file mode 100755 index a00f74f009f54..0000000000000 --- a/ci/install_db_circle.sh +++ /dev/null @@ -1,8 +0,0 @@ -#!/bin/bash - -echo "installing dbs" -mysql -e 'create database pandas_nosetest;' -psql -c 'create database pandas_nosetest;' -U postgres - -echo "done" -exit 0 diff --git a/ci/install_db_travis.sh b/ci/install_db_travis.sh deleted file mode 100755 index e4e6d7a5a9b85..0000000000000 --- a/ci/install_db_travis.sh +++ /dev/null @@ -1,13 +0,0 @@ -#!/bin/bash - -if [ "${TRAVIS_OS_NAME}" != "linux" ]; then - echo "not using dbs on non-linux" - exit 0 -fi - -echo "installing dbs" -mysql -e 'create database pandas_nosetest;' -psql -c 'create database pandas_nosetest;' -U postgres - -echo "done" -exit 0 diff --git a/ci/install_travis.sh b/ci/install_travis.sh deleted file mode 100755 index 6e270519e60c3..0000000000000 --- a/ci/install_travis.sh +++ /dev/null @@ -1,203 +0,0 @@ -#!/bin/bash - -# edit the locale file if needed -function edit_init() -{ - if [ -n "$LOCALE_OVERRIDE" ]; then - echo "[Adding locale to the first line of pandas/__init__.py]" - rm -f pandas/__init__.pyc - sedc="3iimport locale\nlocale.setlocale(locale.LC_ALL, '$LOCALE_OVERRIDE')\n" - sed -i "$sedc" pandas/__init__.py - echo "[head -4 pandas/__init__.py]" - head -4 pandas/__init__.py - echo - fi -} - -echo -echo "[install_travis]" -edit_init - -home_dir=$(pwd) -echo -echo "[home_dir]: $home_dir" - -# install miniconda -MINICONDA_DIR="$HOME/miniconda3" - -echo -echo "[Using clean Miniconda install]" - -if [ -d "$MINICONDA_DIR" ]; then - rm -rf "$MINICONDA_DIR" -fi - -# install miniconda -if [ "${TRAVIS_OS_NAME}" == "osx" ]; then - time wget http://repo.continuum.io/miniconda/Miniconda3-latest-MacOSX-x86_64.sh -q -O miniconda.sh || exit 1 -else - time wget http://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -q -O miniconda.sh || exit 1 -fi -time bash miniconda.sh -b -p "$MINICONDA_DIR" || exit 1 - -echo -echo "[show conda]" -which conda - -echo -echo "[update conda]" -conda config --set ssl_verify false || exit 1 -conda config --set quiet true --set always_yes true --set changeps1 false 
|| exit 1 -conda update -q conda - -if [ "$CONDA_BUILD_TEST" ]; then - echo - echo "[installing conda-build]" - conda install conda-build -fi - -echo -echo "[add channels]" -conda config --remove channels defaults || exit 1 -conda config --add channels defaults || exit 1 - -if [ "$CONDA_FORGE" ]; then - # add conda-forge channel as priority - conda config --add channels conda-forge || exit 1 -fi - -# Useful for debugging any issues with conda -conda info -a || exit 1 - -# set the compiler cache to work -echo -if [ -z "$NOCACHE" ] && [ "${TRAVIS_OS_NAME}" == "linux" ]; then - echo "[Using ccache]" - export PATH=/usr/lib/ccache:/usr/lib64/ccache:$PATH - gcc=$(which gcc) - echo "[gcc]: $gcc" - ccache=$(which ccache) - echo "[ccache]: $ccache" - export CC='ccache gcc' -elif [ -z "$NOCACHE" ] && [ "${TRAVIS_OS_NAME}" == "osx" ]; then - echo "[Install ccache]" - brew install ccache > /dev/null 2>&1 - echo "[Using ccache]" - export PATH=/usr/local/opt/ccache/libexec:$PATH - gcc=$(which gcc) - echo "[gcc]: $gcc" - ccache=$(which ccache) - echo "[ccache]: $ccache" -else - echo "[Not using ccache]" -fi - -echo -echo "[create env]" - -# create our environment -REQ="ci/requirements-${JOB}.build" -time conda create -n pandas --file=${REQ} || exit 1 - -source activate pandas - -# may have addtl installation instructions for this build -echo -echo "[build addtl installs]" -REQ="ci/requirements-${JOB}.build.sh" -if [ -e ${REQ} ]; then - time bash $REQ || exit 1 -fi - -time conda install -n pandas pytest>=3.1.0 -time pip install -q pytest-xdist moto - -if [ "$LINT" ]; then - conda install flake8=3.4.1 - pip install cpplint -fi - -if [ "$COVERAGE" ]; then - pip install coverage pytest-cov -fi - -echo -if [ -z "$PIP_BUILD_TEST" ] && [ -z "$CONDA_BUILD_TEST" ]; then - - # build but don't install - echo "[build em]" - time python setup.py build_ext --inplace || exit 1 - -fi - -# we may have run installations -echo -echo "[conda installs]" -REQ="ci/requirements-${JOB}.run" -if [ -e ${REQ} ]; then - time conda install -n pandas --file=${REQ} || exit 1 -fi - -# we may have additional pip installs -echo -echo "[pip installs]" -REQ="ci/requirements-${JOB}.pip" -if [ -e ${REQ} ]; then - pip install -r $REQ -fi - -# may have addtl installation instructions for this build -echo -echo "[addtl installs]" -REQ="ci/requirements-${JOB}.sh" -if [ -e ${REQ} ]; then - time bash $REQ || exit 1 -fi - -# remove any installed pandas package -# w/o removing anything else -echo -echo "[removing installed pandas]" -conda remove pandas -y --force -pip uninstall -y pandas - -echo -echo "[no installed pandas]" -conda list pandas -pip list --format columns |grep pandas - -# build and install -echo - -if [ "$PIP_BUILD_TEST" ]; then - - # build & install testing - echo "[building release]" - time bash scripts/build_dist_for_release.sh || exit 1 - conda uninstall -y cython - time pip install dist/*tar.gz || exit 1 - -elif [ "$CONDA_BUILD_TEST" ]; then - - # build & install testing - echo "[building conda recipe]" - time conda build ./conda.recipe --python 3.5 -q --no-test || exit 1 - - echo "[installing]" - conda install pandas --use-local || exit 1 - -else - - # install our pandas - echo "[running setup.py develop]" - python setup.py develop || exit 1 - -fi - -echo -echo "[show pandas]" -conda list pandas - -echo -echo "[done]" -exit 0 diff --git a/ci/lint.sh b/ci/lint.sh deleted file mode 100755 index 49bf9a690b990..0000000000000 --- a/ci/lint.sh +++ /dev/null @@ -1,156 +0,0 @@ -#!/bin/bash - -echo "inside $0" - -source activate pandas - 
-RET=0 - -if [ "$LINT" ]; then - - # pandas/_libs/src is C code, so no need to search there. - echo "Linting *.py" - flake8 pandas --filename=*.py --exclude pandas/_libs/src - if [ $? -ne "0" ]; then - RET=1 - fi - echo "Linting *.py DONE" - - echo "Linting setup.py" - flake8 setup.py - if [ $? -ne "0" ]; then - RET=1 - fi - echo "Linting setup.py DONE" - - echo "Linting asv_bench/benchmarks/" - flake8 asv_bench/benchmarks/ --exclude=asv_bench/benchmarks/*.py --ignore=F811 - if [ $? -ne "0" ]; then - RET=1 - fi - echo "Linting asv_bench/benchmarks/*.py DONE" - - echo "Linting scripts/*.py" - flake8 scripts --filename=*.py - if [ $? -ne "0" ]; then - RET=1 - fi - echo "Linting scripts/*.py DONE" - - echo "Linting *.pyx" - flake8 pandas --filename=*.pyx --select=E501,E302,E203,E111,E114,E221,E303,E128,E231,E126,E265,E305,E301,E127,E261,E271,E129,W291,E222,E241,E123,F403 - if [ $? -ne "0" ]; then - RET=1 - fi - echo "Linting *.pyx DONE" - - echo "Linting *.pxi.in" - for path in 'src' - do - echo "linting -> pandas/$path" - flake8 pandas/$path --filename=*.pxi.in --select=E501,E302,E203,E111,E114,E221,E303,E231,E126,F403 - if [ $? -ne "0" ]; then - RET=1 - fi - done - echo "Linting *.pxi.in DONE" - - echo "Linting *.pxd" - for path in '_libs' - do - echo "linting -> pandas/$path" - flake8 pandas/$path --filename=*.pxd --select=E501,E302,E203,E111,E114,E221,E303,E231,E126,F403 - if [ $? -ne "0" ]; then - RET=1 - fi - done - echo "Linting *.pxd DONE" - - # readability/casting: Warnings about C casting instead of C++ casting - # runtime/int: Warnings about using C number types instead of C++ ones - # build/include_subdir: Warnings about prefacing included header files with directory - - # We don't lint all C files because we don't want to lint any that are built - # from Cython files nor do we want to lint C files that we didn't modify for - # this particular codebase (e.g. src/headers, src/klib, src/msgpack). However, - # we can lint all header files since they aren't "generated" like C files are. - echo "Linting *.c and *.h" - for path in '*.h' 'period_helper.c' 'datetime' 'parser' 'ujson' - do - echo "linting -> pandas/_libs/src/$path" - cpplint --quiet --extensions=c,h --headers=h --filter=-readability/casting,-runtime/int,-build/include_subdir --recursive pandas/_libs/src/$path - if [ $? -ne "0" ]; then - RET=1 - fi - done - echo "Linting *.c and *.h DONE" - - echo "Check for invalid testing" - - # Check for the following code in testing: - # - # np.testing - # np.array_equal - grep -r -E --include '*.py' --exclude testing.py '(numpy|np)(\.testing|\.array_equal)' pandas/tests/ - - if [ $? = "0" ]; then - RET=1 - fi - - # Check for pytest.warns - grep -r -E --include '*.py' 'pytest\.warns' pandas/tests/ - - if [ $? = "0" ]; then - RET=1 - fi - - echo "Check for invalid testing DONE" - - # Check for imports from pandas.core.common instead - # of `import pandas.core.common as com` - echo "Check for non-standard imports" - grep -R --include="*.py*" -E "from pandas.core.common import " pandas - if [ $? 
= "0" ]; then - RET=1 - fi - echo "Check for non-standard imports DONE" - - echo "Check for use of lists instead of generators in built-in Python functions" - - # Example: Avoid `any([i for i in some_iterator])` in favor of `any(i for i in some_iterator)` - # - # Check the following functions: - # any(), all(), sum(), max(), min(), list(), dict(), set(), frozenset(), tuple(), str.join() - grep -R --include="*.py*" -E "[^_](any|all|sum|max|min|list|dict|set|frozenset|tuple|join)\(\[.* for .* in .*\]\)" pandas - - if [ $? = "0" ]; then - RET=1 - fi - echo "Check for use of lists instead of generators in built-in Python functions DONE" - - echo "Check for incorrect sphinx directives" - SPHINX_DIRECTIVES=$(echo \ - "autosummary|contents|currentmodule|deprecated|function|image|"\ - "important|include|ipython|literalinclude|math|module|note|raw|"\ - "seealso|toctree|versionadded|versionchanged|warning" | tr -d "[:space:]") - for path in './pandas' './doc/source' - do - grep -R --include="*.py" --include="*.pyx" --include="*.rst" -E "\.\. ($SPHINX_DIRECTIVES):[^:]" $path - if [ $? = "0" ]; then - RET=1 - fi - done - echo "Check for incorrect sphinx directives DONE" - - echo "Check for deprecated messages without sphinx directive" - grep -R --include="*.py" --include="*.pyx" -E "(DEPRECATED|DEPRECATE|Deprecated)(:|,|\.)" pandas - - if [ $? = "0" ]; then - RET=1 - fi - echo "Check for deprecated messages without sphinx directive DONE" -else - echo "NOT Linting" -fi - -exit $RET diff --git a/ci/print_skipped.py b/ci/print_skipped.py index dd2180f6eeb19..6bc1dcfcd320d 100755 --- a/ci/print_skipped.py +++ b/ci/print_skipped.py @@ -1,7 +1,8 @@ #!/usr/bin/env python -import sys import math +import os +import sys import xml.etree.ElementTree as et @@ -10,43 +11,42 @@ def parse_results(filename): root = tree.getroot() skipped = [] - current_class = old_class = '' + current_class = "" i = 1 assert i - 1 == len(skipped) - for el in root.findall('testcase'): - cn = el.attrib['classname'] - for sk in el.findall('skipped'): + for el in root.findall("testcase"): + cn = el.attrib["classname"] + for sk in el.findall("skipped"): old_class = current_class current_class = cn - name = '{classname}.{name}'.format(classname=current_class, - name=el.attrib['name']) - msg = sk.attrib['message'] - out = '' + name = "{classname}.{name}".format( + classname=current_class, name=el.attrib["name"] + ) + msg = sk.attrib["message"] + out = "" if old_class != current_class: ndigits = int(math.log(i, 10) + 1) - out += ('-' * (len(name + msg) + 4 + ndigits) + '\n') # 4 for : + space + # + space - out += '#{i} {name}: {msg}'.format(i=i, name=name, msg=msg) + + # 4 for : + space + # + space + out += "-" * (len(name + msg) + 4 + ndigits) + "\n" + out += "#{i} {name}: {msg}".format(i=i, name=name, msg=msg) skipped.append(out) i += 1 assert i - 1 == len(skipped) assert i - 1 == len(skipped) # assert len(skipped) == int(root.attrib['skip']) - return '\n'.join(skipped) + return "\n".join(skipped) -def main(args): - print('SKIPPED TESTS:') - for fn in args.filename: - print(parse_results(fn)) - return 0 - +def main(): + test_files = ["test-data-single.xml", "test-data-multiple.xml", "test-data.xml"] -def parse_args(): - import argparse - parser = argparse.ArgumentParser() - parser.add_argument('filename', nargs='+', help='XUnit file to parse') - return parser.parse_args() + print("SKIPPED TESTS:") + for fn in test_files: + if os.path.isfile(fn): + print(parse_results(fn)) + return 0 -if __name__ == '__main__': - 
sys.exit(main(parse_args())) +if __name__ == "__main__": + sys.exit(main()) diff --git a/ci/print_versions.py b/ci/print_versions.py deleted file mode 100755 index 8be795174d76d..0000000000000 --- a/ci/print_versions.py +++ /dev/null @@ -1,28 +0,0 @@ -#!/usr/bin/env python - - -def show_versions(as_json=False): - import imp - import os - fn = __file__ - this_dir = os.path.dirname(fn) - pandas_dir = os.path.abspath(os.path.join(this_dir, "..")) - sv_path = os.path.join(pandas_dir, 'pandas', 'util') - mod = imp.load_module( - 'pvmod', *imp.find_module('print_versions', [sv_path])) - return mod.show_versions(as_json) - - -if __name__ == '__main__': - # optparse is 2.6-safe - from optparse import OptionParser - parser = OptionParser() - parser.add_option("-j", "--json", metavar="FILE", nargs=1, - help="Save output as JSON into file, pass in '-' to output to stdout") - - (options, args) = parser.parse_args() - - if options.json == "-": - options.json = True - - show_versions(as_json=options.json) diff --git a/ci/requirements-2.7.build b/ci/requirements-2.7.build deleted file mode 100644 index 17d34f3895c64..0000000000000 --- a/ci/requirements-2.7.build +++ /dev/null @@ -1,6 +0,0 @@ -python=2.7* -python-dateutil=2.5.0 -pytz=2013b -nomkl -numpy=1.13* -cython=0.24 diff --git a/ci/requirements-2.7.pip b/ci/requirements-2.7.pip deleted file mode 100644 index 876d9e978fa84..0000000000000 --- a/ci/requirements-2.7.pip +++ /dev/null @@ -1,10 +0,0 @@ -blosc -pandas-gbq -html5lib -beautifulsoup4 -pathlib -backports.lzma -py -PyCrypto -mock -ipython diff --git a/ci/requirements-2.7.run b/ci/requirements-2.7.run deleted file mode 100644 index 7c10b98fb6e14..0000000000000 --- a/ci/requirements-2.7.run +++ /dev/null @@ -1,20 +0,0 @@ -python-dateutil=2.5.0 -pytz=2013b -numpy -xlwt=0.7.5 -numexpr -pytables -matplotlib -openpyxl=2.4.0 -xlrd=0.9.2 -sqlalchemy=0.9.6 -lxml -scipy -xlsxwriter=0.5.2 -s3fs -bottleneck -psycopg2 -patsy -pymysql=0.6.3 -jinja2=2.8 -xarray=0.8.0 diff --git a/ci/requirements-2.7.sh b/ci/requirements-2.7.sh deleted file mode 100644 index e3bd5e46026c5..0000000000000 --- a/ci/requirements-2.7.sh +++ /dev/null @@ -1,7 +0,0 @@ -#!/bin/bash - -source activate pandas - -echo "install 27" - -conda install -n pandas -c conda-forge feather-format pyarrow=0.4.1 fastparquet diff --git a/ci/requirements-2.7_COMPAT.build b/ci/requirements-2.7_COMPAT.build deleted file mode 100644 index 0a83a7346e8b5..0000000000000 --- a/ci/requirements-2.7_COMPAT.build +++ /dev/null @@ -1,5 +0,0 @@ -python=2.7* -numpy=1.9.2 -cython=0.24 -python-dateutil=2.5.0 -pytz=2013b diff --git a/ci/requirements-2.7_COMPAT.pip b/ci/requirements-2.7_COMPAT.pip deleted file mode 100644 index 13cd35a923124..0000000000000 --- a/ci/requirements-2.7_COMPAT.pip +++ /dev/null @@ -1,4 +0,0 @@ -html5lib==1.0b2 -beautifulsoup4==4.2.0 -openpyxl -argparse diff --git a/ci/requirements-2.7_COMPAT.run b/ci/requirements-2.7_COMPAT.run deleted file mode 100644 index c3daed6e6e1da..0000000000000 --- a/ci/requirements-2.7_COMPAT.run +++ /dev/null @@ -1,14 +0,0 @@ -numpy=1.9.2 -python-dateutil=2.5.0 -pytz=2013b -scipy=0.14.0 -xlwt=0.7.5 -xlrd=0.9.2 -bottleneck=1.0.0 -numexpr=2.4.4 # we test that we correctly don't use an unsupported numexpr -pytables=3.2.2 -psycopg2 -pymysql=0.6.0 -sqlalchemy=0.7.8 -xlsxwriter=0.5.2 -jinja2=2.8 diff --git a/ci/requirements-2.7_LOCALE.build b/ci/requirements-2.7_LOCALE.build deleted file mode 100644 index a6f2e25387910..0000000000000 --- a/ci/requirements-2.7_LOCALE.build +++ /dev/null @@ -1,5 +0,0 @@ -python=2.7* 
-python-dateutil -pytz=2013b -numpy=1.9.2 -cython=0.24 diff --git a/ci/requirements-2.7_LOCALE.pip b/ci/requirements-2.7_LOCALE.pip deleted file mode 100644 index 1b825bbf492ca..0000000000000 --- a/ci/requirements-2.7_LOCALE.pip +++ /dev/null @@ -1,3 +0,0 @@ -html5lib==1.0b2 -beautifulsoup4==4.2.1 -blosc diff --git a/ci/requirements-2.7_LOCALE.run b/ci/requirements-2.7_LOCALE.run deleted file mode 100644 index 0a809a7dd6e5d..0000000000000 --- a/ci/requirements-2.7_LOCALE.run +++ /dev/null @@ -1,12 +0,0 @@ -python-dateutil -pytz -numpy=1.9.2 -xlwt=0.7.5 -openpyxl=2.4.0 -xlsxwriter=0.5.2 -xlrd=0.9.2 -bottleneck=1.0.0 -matplotlib=1.4.3 -sqlalchemy=0.8.1 -lxml -scipy diff --git a/ci/requirements-2.7_SLOW.build b/ci/requirements-2.7_SLOW.build deleted file mode 100644 index a665ab9edd585..0000000000000 --- a/ci/requirements-2.7_SLOW.build +++ /dev/null @@ -1,5 +0,0 @@ -python=2.7* -python-dateutil -pytz -numpy=1.10* -cython diff --git a/ci/requirements-2.7_SLOW.run b/ci/requirements-2.7_SLOW.run deleted file mode 100644 index db95a6ccb2314..0000000000000 --- a/ci/requirements-2.7_SLOW.run +++ /dev/null @@ -1,19 +0,0 @@ -python-dateutil -pytz -numpy=1.10* -matplotlib=1.4.3 -scipy -patsy -xlwt -openpyxl -xlsxwriter -xlrd -numexpr -pytables -sqlalchemy -lxml -s3fs -psycopg2 -pymysql -html5lib -beautifulsoup4 diff --git a/ci/requirements-2.7_WIN.run b/ci/requirements-2.7_WIN.run deleted file mode 100644 index c4ca7fc736bb1..0000000000000 --- a/ci/requirements-2.7_WIN.run +++ /dev/null @@ -1,18 +0,0 @@ -dateutil -pytz -numpy=1.10* -xlwt -numexpr -pytables==3.2.2 -matplotlib -openpyxl -xlrd -sqlalchemy -lxml -scipy -xlsxwriter -s3fs -bottleneck -html5lib -beautifulsoup4 -jinja2=2.8 diff --git a/ci/requirements-3.5_ASCII.build b/ci/requirements-3.5_ASCII.build deleted file mode 100644 index f7befe3b31865..0000000000000 --- a/ci/requirements-3.5_ASCII.build +++ /dev/null @@ -1,6 +0,0 @@ -python=3.5* -python-dateutil -pytz -nomkl -numpy -cython diff --git a/ci/requirements-3.5_ASCII.run b/ci/requirements-3.5_ASCII.run deleted file mode 100644 index b9d543f557d06..0000000000000 --- a/ci/requirements-3.5_ASCII.run +++ /dev/null @@ -1,3 +0,0 @@ -python-dateutil -pytz -numpy diff --git a/ci/requirements-3.5_CONDA_BUILD_TEST.build b/ci/requirements-3.5_CONDA_BUILD_TEST.build deleted file mode 100644 index f7befe3b31865..0000000000000 --- a/ci/requirements-3.5_CONDA_BUILD_TEST.build +++ /dev/null @@ -1,6 +0,0 @@ -python=3.5* -python-dateutil -pytz -nomkl -numpy -cython diff --git a/ci/requirements-3.5_CONDA_BUILD_TEST.pip b/ci/requirements-3.5_CONDA_BUILD_TEST.pip deleted file mode 100644 index c9565f2173070..0000000000000 --- a/ci/requirements-3.5_CONDA_BUILD_TEST.pip +++ /dev/null @@ -1,2 +0,0 @@ -xarray==0.9.1 -pandas_gbq diff --git a/ci/requirements-3.5_CONDA_BUILD_TEST.run b/ci/requirements-3.5_CONDA_BUILD_TEST.run deleted file mode 100644 index 669cf437f2164..0000000000000 --- a/ci/requirements-3.5_CONDA_BUILD_TEST.run +++ /dev/null @@ -1,20 +0,0 @@ -pytz -numpy -openpyxl -xlsxwriter -xlrd -xlwt -scipy -numexpr -pytables -html5lib -lxml -matplotlib -jinja2 -bottleneck -sqlalchemy -pymysql -psycopg2 -s3fs -beautifulsoup4 -ipython diff --git a/ci/requirements-3.5_CONDA_BUILD_TEST.sh b/ci/requirements-3.5_CONDA_BUILD_TEST.sh deleted file mode 100644 index 093fdbcf21d78..0000000000000 --- a/ci/requirements-3.5_CONDA_BUILD_TEST.sh +++ /dev/null @@ -1,11 +0,0 @@ -#!/bin/bash - -source activate pandas - -echo "install 35 CONDA_BUILD_TEST" - -# pip install python-dateutil to get latest -conda remove -n pandas 
python-dateutil --force -pip install python-dateutil - -conda install -n pandas -c conda-forge feather-format pyarrow=0.7.1 diff --git a/ci/requirements-3.5_OSX.build b/ci/requirements-3.5_OSX.build deleted file mode 100644 index f5bc01b67a20a..0000000000000 --- a/ci/requirements-3.5_OSX.build +++ /dev/null @@ -1,4 +0,0 @@ -python=3.5* -nomkl -numpy=1.10.4 -cython diff --git a/ci/requirements-3.5_OSX.pip b/ci/requirements-3.5_OSX.pip deleted file mode 100644 index d1fc1fe24a079..0000000000000 --- a/ci/requirements-3.5_OSX.pip +++ /dev/null @@ -1 +0,0 @@ -python-dateutil==2.5.3 diff --git a/ci/requirements-3.5_OSX.run b/ci/requirements-3.5_OSX.run deleted file mode 100644 index 1d83474d10f2f..0000000000000 --- a/ci/requirements-3.5_OSX.run +++ /dev/null @@ -1,16 +0,0 @@ -pytz -numpy=1.10.4 -openpyxl -xlsxwriter -xlrd -xlwt -numexpr -pytables -html5lib -lxml -matplotlib -jinja2 -bottleneck -xarray -s3fs -beautifulsoup4 diff --git a/ci/requirements-3.5_OSX.sh b/ci/requirements-3.5_OSX.sh deleted file mode 100644 index c2978b175968c..0000000000000 --- a/ci/requirements-3.5_OSX.sh +++ /dev/null @@ -1,7 +0,0 @@ -#!/bin/bash - -source activate pandas - -echo "install 35_OSX" - -conda install -n pandas -c conda-forge feather-format==0.3.1 fastparquet diff --git a/ci/requirements-3.6.build b/ci/requirements-3.6.build deleted file mode 100644 index 1c4b46aea3865..0000000000000 --- a/ci/requirements-3.6.build +++ /dev/null @@ -1,6 +0,0 @@ -python=3.6* -python-dateutil -pytz -nomkl -numpy -cython diff --git a/ci/requirements-3.6.pip b/ci/requirements-3.6.pip deleted file mode 100644 index 753a60d6c119a..0000000000000 --- a/ci/requirements-3.6.pip +++ /dev/null @@ -1 +0,0 @@ -brotlipy diff --git a/ci/requirements-3.6.run b/ci/requirements-3.6.run deleted file mode 100644 index e30461d06b8ea..0000000000000 --- a/ci/requirements-3.6.run +++ /dev/null @@ -1,25 +0,0 @@ -python-dateutil -pytz -numpy -scipy -openpyxl -xlsxwriter -xlrd -xlwt -numexpr -pytables -matplotlib -lxml -html5lib -jinja2 -sqlalchemy -pymysql<0.8.0 -feather-format -pyarrow -psycopg2 -python-snappy -fastparquet -beautifulsoup4 -s3fs -xarray -ipython diff --git a/ci/requirements-3.6.sh b/ci/requirements-3.6.sh deleted file mode 100644 index f5c3dbf59a29d..0000000000000 --- a/ci/requirements-3.6.sh +++ /dev/null @@ -1,7 +0,0 @@ -#!/bin/bash - -source activate pandas - -echo "[install 3.6 downstream deps]" - -conda install -n pandas -c conda-forge pandas-datareader xarray geopandas seaborn statsmodels scikit-learn dask diff --git a/ci/requirements-3.6_ASV.build b/ci/requirements-3.6_ASV.build deleted file mode 100644 index bc72eed2a0d4e..0000000000000 --- a/ci/requirements-3.6_ASV.build +++ /dev/null @@ -1,5 +0,0 @@ -python=3.6* -python-dateutil -pytz -numpy=1.13* -cython diff --git a/ci/requirements-3.6_ASV.run b/ci/requirements-3.6_ASV.run deleted file mode 100644 index 6c45e3371e9cf..0000000000000 --- a/ci/requirements-3.6_ASV.run +++ /dev/null @@ -1,25 +0,0 @@ -ipython -ipykernel -ipywidgets -sphinx=1.5* -nbconvert -nbformat -notebook -matplotlib -seaborn -scipy -lxml -beautifulsoup4 -html5lib -pytables -python-snappy -openpyxl -xlrd -xlwt -xlsxwriter -sqlalchemy -numexpr -bottleneck -statsmodels -xarray -pyqt diff --git a/ci/requirements-3.6_ASV.sh b/ci/requirements-3.6_ASV.sh deleted file mode 100755 index 8a46f85dbb6bc..0000000000000 --- a/ci/requirements-3.6_ASV.sh +++ /dev/null @@ -1,7 +0,0 @@ -#!/bin/bash - -source activate pandas - -echo "[install ASV_BUILD deps]" - -pip install git+https://github.com/spacetelescope/asv diff 
--git a/ci/requirements-3.6_DOC.build b/ci/requirements-3.6_DOC.build deleted file mode 100644 index bc72eed2a0d4e..0000000000000 --- a/ci/requirements-3.6_DOC.build +++ /dev/null @@ -1,5 +0,0 @@ -python=3.6* -python-dateutil -pytz -numpy=1.13* -cython diff --git a/ci/requirements-3.6_DOC.run b/ci/requirements-3.6_DOC.run deleted file mode 100644 index 6c45e3371e9cf..0000000000000 --- a/ci/requirements-3.6_DOC.run +++ /dev/null @@ -1,25 +0,0 @@ -ipython -ipykernel -ipywidgets -sphinx=1.5* -nbconvert -nbformat -notebook -matplotlib -seaborn -scipy -lxml -beautifulsoup4 -html5lib -pytables -python-snappy -openpyxl -xlrd -xlwt -xlsxwriter -sqlalchemy -numexpr -bottleneck -statsmodels -xarray -pyqt diff --git a/ci/requirements-3.6_DOC.sh b/ci/requirements-3.6_DOC.sh deleted file mode 100644 index aec0f62148622..0000000000000 --- a/ci/requirements-3.6_DOC.sh +++ /dev/null @@ -1,11 +0,0 @@ -#!/bin/bash - -source activate pandas - -echo "[install DOC_BUILD deps]" - -pip install pandas-gbq - -conda install -n pandas -c conda-forge feather-format pyarrow nbsphinx pandoc fastparquet - -conda install -n pandas -c r r rpy2 --yes diff --git a/ci/requirements-3.6_LOCALE.build b/ci/requirements-3.6_LOCALE.build deleted file mode 100644 index 1c4b46aea3865..0000000000000 --- a/ci/requirements-3.6_LOCALE.build +++ /dev/null @@ -1,6 +0,0 @@ -python=3.6* -python-dateutil -pytz -nomkl -numpy -cython diff --git a/ci/requirements-3.6_LOCALE.run b/ci/requirements-3.6_LOCALE.run deleted file mode 100644 index ad54284c6f7e3..0000000000000 --- a/ci/requirements-3.6_LOCALE.run +++ /dev/null @@ -1,22 +0,0 @@ -python-dateutil -pytz -numpy -scipy -openpyxl -xlsxwriter -xlrd -xlwt -numexpr -pytables -matplotlib -lxml -html5lib -jinja2 -sqlalchemy -pymysql -# feather-format (not available on defaults ATM) -psycopg2 -beautifulsoup4 -s3fs -xarray -ipython diff --git a/ci/requirements-3.6_LOCALE_SLOW.build b/ci/requirements-3.6_LOCALE_SLOW.build deleted file mode 100644 index 1c4b46aea3865..0000000000000 --- a/ci/requirements-3.6_LOCALE_SLOW.build +++ /dev/null @@ -1,6 +0,0 @@ -python=3.6* -python-dateutil -pytz -nomkl -numpy -cython diff --git a/ci/requirements-3.6_LOCALE_SLOW.run b/ci/requirements-3.6_LOCALE_SLOW.run deleted file mode 100644 index ad54284c6f7e3..0000000000000 --- a/ci/requirements-3.6_LOCALE_SLOW.run +++ /dev/null @@ -1,22 +0,0 @@ -python-dateutil -pytz -numpy -scipy -openpyxl -xlsxwriter -xlrd -xlwt -numexpr -pytables -matplotlib -lxml -html5lib -jinja2 -sqlalchemy -pymysql -# feather-format (not available on defaults ATM) -psycopg2 -beautifulsoup4 -s3fs -xarray -ipython diff --git a/ci/requirements-3.6_NUMPY_DEV.build b/ci/requirements-3.6_NUMPY_DEV.build deleted file mode 100644 index 336fbe86b57d8..0000000000000 --- a/ci/requirements-3.6_NUMPY_DEV.build +++ /dev/null @@ -1,2 +0,0 @@ -python=3.6* -pytz diff --git a/ci/requirements-3.6_NUMPY_DEV.build.sh b/ci/requirements-3.6_NUMPY_DEV.build.sh deleted file mode 100644 index 9145bf1d3481c..0000000000000 --- a/ci/requirements-3.6_NUMPY_DEV.build.sh +++ /dev/null @@ -1,21 +0,0 @@ -#!/bin/bash - -source activate pandas - -echo "install numpy master wheel" - -# remove the system installed numpy -pip uninstall numpy -y - -# install numpy wheel from master -PRE_WHEELS="https://7933911d6844c6c53a7d-47bd50c35cd79bd838daf386af554a83.ssl.cf2.rackcdn.com" -pip install --pre --upgrade --timeout=60 -f $PRE_WHEELS numpy scipy - -# install dateutil from master -# pip install -U git+git://github.com/dateutil/dateutil.git -pip install dateutil - -# cython via pip 
-pip install cython - -true diff --git a/ci/requirements-3.6_NUMPY_DEV.run b/ci/requirements-3.6_NUMPY_DEV.run deleted file mode 100644 index af44f198c687e..0000000000000 --- a/ci/requirements-3.6_NUMPY_DEV.run +++ /dev/null @@ -1 +0,0 @@ -pytz diff --git a/ci/requirements-3.6_PIP_BUILD_TEST.build b/ci/requirements-3.6_PIP_BUILD_TEST.build deleted file mode 100644 index 1c4b46aea3865..0000000000000 --- a/ci/requirements-3.6_PIP_BUILD_TEST.build +++ /dev/null @@ -1,6 +0,0 @@ -python=3.6* -python-dateutil -pytz -nomkl -numpy -cython diff --git a/ci/requirements-3.6_PIP_BUILD_TEST.pip b/ci/requirements-3.6_PIP_BUILD_TEST.pip deleted file mode 100644 index f4617133cad5b..0000000000000 --- a/ci/requirements-3.6_PIP_BUILD_TEST.pip +++ /dev/null @@ -1,6 +0,0 @@ -xarray -geopandas -seaborn -pandas_datareader -statsmodels -scikit-learn diff --git a/ci/requirements-3.6_PIP_BUILD_TEST.sh b/ci/requirements-3.6_PIP_BUILD_TEST.sh deleted file mode 100644 index 3a8cf673b32f2..0000000000000 --- a/ci/requirements-3.6_PIP_BUILD_TEST.sh +++ /dev/null @@ -1,7 +0,0 @@ -#!/bin/bash - -source activate pandas - -echo "install 36 PIP_BUILD_TEST" - -conda install -n pandas -c conda-forge pyarrow dask pyqt qtpy diff --git a/ci/requirements-3.6_WIN.run b/ci/requirements-3.6_WIN.run deleted file mode 100644 index 3042888763863..0000000000000 --- a/ci/requirements-3.6_WIN.run +++ /dev/null @@ -1,17 +0,0 @@ -python-dateutil -pytz -numpy=1.13* -bottleneck -openpyxl -xlsxwriter -xlrd -xlwt -scipy -feather-format -numexpr -pytables -matplotlib -blosc -thrift=0.10* -fastparquet -pyarrow diff --git a/ci/requirements-optional-conda.txt b/ci/requirements-optional-conda.txt deleted file mode 100644 index 6edb8d17337e4..0000000000000 --- a/ci/requirements-optional-conda.txt +++ /dev/null @@ -1,27 +0,0 @@ -beautifulsoup4 -blosc -bottleneck -fastparquet -feather-format -html5lib -ipython -ipykernel -jinja2 -lxml -matplotlib -nbsphinx -numexpr -openpyxl -pyarrow -pymysql -pytables -pytest-cov -pytest-xdist -s3fs -scipy -seaborn -sqlalchemy -xarray -xlrd -xlsxwriter -xlwt diff --git a/ci/requirements-optional-pip.txt b/ci/requirements-optional-pip.txt deleted file mode 100644 index 8d4421ba2b681..0000000000000 --- a/ci/requirements-optional-pip.txt +++ /dev/null @@ -1,29 +0,0 @@ -# This file was autogenerated by scripts/convert_deps.py -# Do not modify directly -beautifulsoup4 -blosc -bottleneck -fastparquet -feather-format -html5lib -ipython -ipykernel -jinja2 -lxml -matplotlib -nbsphinx -numexpr -openpyxl -pyarrow -pymysql -tables -pytest-cov -pytest-xdist -s3fs -scipy -seaborn -sqlalchemy -xarray -xlrd -xlsxwriter -xlwt \ No newline at end of file diff --git a/ci/requirements_dev.txt b/ci/requirements_dev.txt deleted file mode 100644 index 82f8de277c57b..0000000000000 --- a/ci/requirements_dev.txt +++ /dev/null @@ -1,10 +0,0 @@ -# This file was autogenerated by scripts/convert_deps.py -# Do not modify directly -Cython -NumPy -moto -pytest>=3.1 -python-dateutil>=2.5.0 -pytz -setuptools>=3.3 -sphinx diff --git a/ci/run_build_docs.sh b/ci/run_build_docs.sh deleted file mode 100755 index 2909b9619552e..0000000000000 --- a/ci/run_build_docs.sh +++ /dev/null @@ -1,10 +0,0 @@ -#!/bin/bash - -echo "inside $0" - -"$TRAVIS_BUILD_DIR"/ci/build_docs.sh 2>&1 - -# wait until subprocesses finish (build_docs.sh) -wait - -exit 0 diff --git a/ci/run_circle.sh b/ci/run_circle.sh deleted file mode 100755 index 435985bd42148..0000000000000 --- a/ci/run_circle.sh +++ /dev/null @@ -1,9 +0,0 @@ -#!/usr/bin/env bash - -echo "[running tests]" -export 
PATH="$MINICONDA_DIR/bin:$PATH" - -source activate pandas - -echo "pytest --strict --junitxml=$CIRCLE_TEST_REPORTS/reports/junit.xml $@ pandas" -pytest --strict --junitxml=$CIRCLE_TEST_REPORTS/reports/junit.xml $@ pandas diff --git a/ci/run_tests.sh b/ci/run_tests.sh new file mode 100755 index 0000000000000..27d3fcb4cf563 --- /dev/null +++ b/ci/run_tests.sh @@ -0,0 +1,59 @@ +#!/bin/bash + +set -e + +if [ "$DOC" ]; then + echo "We are not running pytest as this is a doc-build" + exit 0 +fi + +# Workaround for pytest-xdist flaky collection order +# https://github.com/pytest-dev/pytest/issues/920 +# https://github.com/pytest-dev/pytest/issues/1075 +export PYTHONHASHSEED=$(python -c 'import random; print(random.randint(1, 4294967295))') + +if [ -n "$LOCALE_OVERRIDE" ]; then + export LC_ALL="$LOCALE_OVERRIDE" + export LANG="$LOCALE_OVERRIDE" + PANDAS_LOCALE=`python -c 'import pandas; print(pandas.get_option("display.encoding"))'` + if [[ "$LOCALE_OVERRIDE" != "$PANDAS_LOCALE" ]]; then + echo "pandas could not detect the locale. System locale: $LOCALE_OVERRIDE, pandas detected: $PANDAS_LOCALE" + # TODO Not really aborting the tests until https://github.com/pandas-dev/pandas/issues/23923 is fixed + # exit 1 + fi +fi +if [[ "not network" == *"$PATTERN"* ]]; then + export http_proxy=http://1.2.3.4 https_proxy=http://1.2.3.4; +fi + + +if [ -n "$PATTERN" ]; then + PATTERN=" and $PATTERN" +fi + +for TYPE in single multiple +do + if [ "$COVERAGE" ]; then + COVERAGE_FNAME="/tmp/cov-$TYPE.xml" + COVERAGE="-s --cov=pandas --cov-report=xml:$COVERAGE_FNAME" + fi + + TYPE_PATTERN=$TYPE + NUM_JOBS=1 + if [[ "$TYPE_PATTERN" == "multiple" ]]; then + TYPE_PATTERN="not single" + NUM_JOBS=2 + fi + + PYTEST_CMD="pytest -m \"$TYPE_PATTERN$PATTERN\" -n $NUM_JOBS -s --strict --durations=10 --junitxml=test-data-$TYPE.xml $TEST_ARGS $COVERAGE pandas" + echo $PYTEST_CMD + # If no tests are found (the "single and slow" case), pytest exits with code 5; treat that as success so the script does not fail + sh -c "$PYTEST_CMD; ret=\$?; [ \$ret = 5 ] && exit 0 || exit \$ret" + + # 2019-08-21 disabling because this is hitting HTTP 400 errors GH#27602 + # if [[ "$COVERAGE" && $? == 0 && "$TRAVIS_BRANCH" == "master" ]]; then + # echo "uploading coverage for $TYPE tests" + # echo "bash <(curl -s https://codecov.io/bash) -Z -c -F $TYPE -f $COVERAGE_FNAME" + # bash <(curl -s https://codecov.io/bash) -Z -c -F $TYPE -f $COVERAGE_FNAME + # fi +done diff --git a/ci/run_with_env.cmd b/ci/run_with_env.cmd deleted file mode 100644 index 848f4608c8627..0000000000000 --- a/ci/run_with_env.cmd +++ /dev/null @@ -1,95 +0,0 @@ -:: EXPECTED ENV VARS: PYTHON_ARCH (either x86 or x64) -:: CONDA_PY (either 27, 33, 35 etc. - only major version is extracted) -:: -:: -:: To build extensions for 64 bit Python 3, we need to configure environment -:: variables to use the MSVC 2010 C++ compilers from GRMSDKX_EN_DVD.iso of: -:: MS Windows SDK for Windows 7 and .NET Framework 4 (SDK v7.1) -:: -:: To build extensions for 64 bit Python 2, we need to configure environment -:: variables to use the MSVC 2008 C++ compilers from GRMSDKX_EN_DVD.iso of: -:: MS Windows SDK for Windows 7 and .NET Framework 3.5 (SDK v7.0) -:: -:: 32 bit builds, and 64-bit builds for 3.5 and beyond, do not require specific -:: environment configurations.
-:: -:: Note: this script needs to be run with the /E:ON and /V:ON flags for the -:: cmd interpreter, at least for (SDK v7.0) -:: -:: More details at: -:: https://github.com/cython/cython/wiki/64BitCythonExtensionsOnWindows -:: http://stackoverflow.com/a/13751649/163740 -:: -:: Author: Phil Elson -:: Original Author: Olivier Grisel (https://github.com/ogrisel/python-appveyor-demo) -:: License: CC0 1.0 Universal: http://creativecommons.org/publicdomain/zero/1.0/ -:: -:: Notes about batch files for Python people: -:: -:: Quotes in values are literally part of the values: -:: SET FOO="bar" -:: FOO is now five characters long: " b a r " -:: If you don't want quotes, don't include them on the right-hand side. -:: -:: The CALL lines at the end of this file look redundant, but if you move them -:: outside of the IF clauses, they do not run properly in the SET_SDK_64==Y -:: case, I don't know why. -:: originally from https://github.com/pelson/Obvious-CI/blob/master/scripts/obvci_appveyor_python_build_env.cmd -@ECHO OFF - -SET COMMAND_TO_RUN=%* -SET WIN_SDK_ROOT=C:\Program Files\Microsoft SDKs\Windows - -:: Extract the major and minor versions, and allow for the minor version to be -:: more than 9. This requires the version number to have two dots in it. -SET MAJOR_PYTHON_VERSION=%CONDA_PY:~0,1% - -IF "%CONDA_PY:~2,1%" == "" ( - :: CONDA_PY style, such as 27, 34 etc. - SET MINOR_PYTHON_VERSION=%CONDA_PY:~1,1% -) ELSE ( - IF "%CONDA_PY:~3,1%" == "." ( - SET MINOR_PYTHON_VERSION=%CONDA_PY:~2,1% - ) ELSE ( - SET MINOR_PYTHON_VERSION=%CONDA_PY:~2,2% - ) -) - -:: Based on the Python version, determine what SDK version to use, and whether -:: to set the SDK for 64-bit. -IF %MAJOR_PYTHON_VERSION% == 2 ( - SET WINDOWS_SDK_VERSION="v7.0" - SET SET_SDK_64=Y -) ELSE ( - IF %MAJOR_PYTHON_VERSION% == 3 ( - SET WINDOWS_SDK_VERSION="v7.1" - IF %MINOR_PYTHON_VERSION% LEQ 4 ( - SET SET_SDK_64=Y - ) ELSE ( - SET SET_SDK_64=N - ) - ) ELSE ( - ECHO Unsupported Python version: "%MAJOR_PYTHON_VERSION%" - EXIT /B 1 - ) -) - -IF "%PYTHON_ARCH%"=="64" ( - IF %SET_SDK_64% == Y ( - ECHO Configuring Windows SDK %WINDOWS_SDK_VERSION% for Python %MAJOR_PYTHON_VERSION% on a 64 bit architecture - SET DISTUTILS_USE_SDK=1 - SET MSSdk=1 - "%WIN_SDK_ROOT%\%WINDOWS_SDK_VERSION%\Setup\WindowsSdkVer.exe" -q -version:%WINDOWS_SDK_VERSION% - "%WIN_SDK_ROOT%\%WINDOWS_SDK_VERSION%\Bin\SetEnv.cmd" /x64 /release - ECHO Executing: %COMMAND_TO_RUN% - call %COMMAND_TO_RUN% || EXIT /B 1 - ) ELSE ( - ECHO Using default MSVC build environment for 64 bit architecture - ECHO Executing: %COMMAND_TO_RUN% - call %COMMAND_TO_RUN% || EXIT /B 1 - ) -) ELSE ( - ECHO Using default MSVC build environment for 32 bit architecture - ECHO Executing: %COMMAND_TO_RUN% - call %COMMAND_TO_RUN% || EXIT /B 1 -) diff --git a/ci/script_multi.sh b/ci/script_multi.sh deleted file mode 100755 index 766e51625fbe6..0000000000000 --- a/ci/script_multi.sh +++ /dev/null @@ -1,60 +0,0 @@ -#!/bin/bash -e - -echo "[script multi]" - -source activate pandas - -if [ -n "$LOCALE_OVERRIDE" ]; then - export LC_ALL="$LOCALE_OVERRIDE"; - echo "Setting LC_ALL to $LOCALE_OVERRIDE" - - pycmd='import pandas; print("pandas detected console encoding: %s" % pandas.get_option("display.encoding"))' - python -c "$pycmd" -fi - -# Workaround for pytest-xdist flaky collection order -# https://github.com/pytest-dev/pytest/issues/920 -# https://github.com/pytest-dev/pytest/issues/1075 -export PYTHONHASHSEED=$(python -c 'import random; print(random.randint(1, 4294967295))') -echo 
PYTHONHASHSEED=$PYTHONHASHSEED - -if [ "$PIP_BUILD_TEST" ] || [ "$CONDA_BUILD_TEST" ]; then - echo "[build-test]" - - echo "[env]" - pip list --format columns |grep pandas - - echo "[running]" - cd /tmp - unset PYTHONPATH - - echo "[build-test: single]" - python -c 'import pandas; pandas.test(["--skip-slow", "--skip-network", "-r xX", "-m single"])' - - echo "[build-test: not single]" - python -c 'import pandas; pandas.test(["-n 2", "--skip-slow", "--skip-network", "-r xX", "-m not single"])' - -elif [ "$DOC" ]; then - echo "We are not running pytest as this is a doc-build" - -elif [ "$ASV" ]; then - echo "We are not running pytest as this is an asv-build" - -elif [ "$COVERAGE" ]; then - echo pytest -s -n 2 -m "not single" --cov=pandas --cov-report xml:/tmp/cov-multiple.xml --junitxml=/tmp/multiple.xml --strict $TEST_ARGS pandas - pytest -s -n 2 -m "not single" --cov=pandas --cov-report xml:/tmp/cov-multiple.xml --junitxml=/tmp/multiple.xml --strict $TEST_ARGS pandas - -elif [ "$SLOW" ]; then - TEST_ARGS="--only-slow --skip-network" - echo pytest -r xX -m "not single and slow" -v --junitxml=/tmp/multiple.xml --strict $TEST_ARGS pandas - pytest -r xX -m "not single and slow" -v --junitxml=/tmp/multiple.xml --strict $TEST_ARGS pandas - -else - echo pytest -n 2 -r xX -m "not single" --junitxml=/tmp/multiple.xml --strict $TEST_ARGS pandas - pytest -n 2 -r xX -m "not single" --junitxml=/tmp/multiple.xml --strict $TEST_ARGS pandas # TODO: doctest - -fi - -RET="$?" - -exit "$RET" diff --git a/ci/script_single.sh b/ci/script_single.sh deleted file mode 100755 index 153847ab2e8c9..0000000000000 --- a/ci/script_single.sh +++ /dev/null @@ -1,40 +0,0 @@ -#!/bin/bash - -echo "[script_single]" - -source activate pandas - -if [ -n "$LOCALE_OVERRIDE" ]; then - export LC_ALL="$LOCALE_OVERRIDE"; - echo "Setting LC_ALL to $LOCALE_OVERRIDE" - - pycmd='import pandas; print("pandas detected console encoding: %s" % pandas.get_option("display.encoding"))' - python -c "$pycmd" -fi - -if [ "$SLOW" ]; then - TEST_ARGS="--only-slow --skip-network" -fi - -if [ "$PIP_BUILD_TEST" ] || [ "$CONDA_BUILD_TEST" ]; then - echo "We are not running pytest as this is a build test." - -elif [ "$DOC" ]; then - echo "We are not running pytest as this is a doc-build" - -elif [ "$ASV" ]; then - echo "We are not running pytest as this is an asv-build" - -elif [ "$COVERAGE" ]; then - echo pytest -s -m "single" --strict --cov=pandas --cov-report xml:/tmp/cov-single.xml --junitxml=/tmp/single.xml $TEST_ARGS pandas - pytest -s -m "single" --strict --cov=pandas --cov-report xml:/tmp/cov-single.xml --junitxml=/tmp/single.xml $TEST_ARGS pandas - -else - echo pytest -m "single" -r xX --junitxml=/tmp/single.xml --strict $TEST_ARGS pandas - pytest -m "single" -r xX --junitxml=/tmp/single.xml --strict $TEST_ARGS pandas # TODO: doctest - -fi - -RET="$?" 
- -exit "$RET" diff --git a/ci/setup_env.sh b/ci/setup_env.sh new file mode 100755 index 0000000000000..382491a947488 --- /dev/null +++ b/ci/setup_env.sh @@ -0,0 +1,146 @@ +#!/bin/bash -e + + +# edit the locale file if needed +if [ -n "$LOCALE_OVERRIDE" ]; then + echo "Adding locale to the first line of pandas/__init__.py" + rm -f pandas/__init__.pyc + SEDC="3iimport locale\nlocale.setlocale(locale.LC_ALL, '$LOCALE_OVERRIDE')\n" + sed -i "$SEDC" pandas/__init__.py + echo "[head -4 pandas/__init__.py]" + head -4 pandas/__init__.py + echo + sudo locale-gen "$LOCALE_OVERRIDE" +fi + +MINICONDA_DIR="$HOME/miniconda3" + + +if [ -d "$MINICONDA_DIR" ]; then + echo + echo "rm -rf "$MINICONDA_DIR"" + rm -rf "$MINICONDA_DIR" +fi + +echo "Install Miniconda" +UNAME_OS=$(uname) +if [[ "$UNAME_OS" == 'Linux' ]]; then + if [[ "$BITS32" == "yes" ]]; then + CONDA_OS="Linux-x86" + else + CONDA_OS="Linux-x86_64" + fi +elif [[ "$UNAME_OS" == 'Darwin' ]]; then + CONDA_OS="MacOSX-x86_64" +else + echo "OS $UNAME_OS not supported" + exit 1 +fi + +wget -q "https://repo.continuum.io/miniconda/Miniconda3-latest-$CONDA_OS.sh" -O miniconda.sh +chmod +x miniconda.sh +./miniconda.sh -b + +export PATH=$MINICONDA_DIR/bin:$PATH + +echo +echo "which conda" +which conda + +echo +echo "update conda" +conda config --set ssl_verify false +conda config --set quiet true --set always_yes true --set changeps1 false +conda update -n base conda + +echo "conda info -a" +conda info -a + +echo +echo "set the compiler cache to work" +if [ -z "$NOCACHE" ] && [ "${TRAVIS_OS_NAME}" == "linux" ]; then + echo "Using ccache" + export PATH=/usr/lib/ccache:/usr/lib64/ccache:$PATH + GCC=$(which gcc) + echo "gcc: $GCC" + CCACHE=$(which ccache) + echo "ccache: $CCACHE" + export CC='ccache gcc' +elif [ -z "$NOCACHE" ] && [ "${TRAVIS_OS_NAME}" == "osx" ]; then + echo "Install ccache" + brew install ccache > /dev/null 2>&1 + echo "Using ccache" + export PATH=/usr/local/opt/ccache/libexec:$PATH + gcc=$(which gcc) + echo "gcc: $gcc" + CCACHE=$(which ccache) + echo "ccache: $CCACHE" +else + echo "Not using ccache" +fi + +echo "source deactivate" +source deactivate + +echo "conda list (root environment)" +conda list + +# Clean up any leftovers from a previous build +# (note workaround for https://github.com/conda/conda/issues/2679: +# `conda env remove` issue) +conda remove --all -q -y -n pandas-dev + +echo +echo "conda env create -q --file=${ENV_FILE}" +time conda env create -q --file="${ENV_FILE}" + + +if [[ "$BITS32" == "yes" ]]; then + # activate 32-bit compiler + export CONDA_BUILD=1 +fi + +echo "activate pandas-dev" +source activate pandas-dev + +echo +echo "remove any installed pandas package" +echo "w/o removing anything else" +conda remove pandas -y --force || true +pip uninstall -y pandas || true + +echo +echo "conda list pandas" +conda list pandas + +# Make sure any error below is reported as such + +echo "[Build extensions]" +python setup.py build_ext -q -i + +# XXX: Some of our environments end up with old versions of pip (10.x) +# Adding a new enough version of pip to the requirements explodes the +# solve time. Just using pip to update itself. +# - py35_macos +# - py35_compat +# - py36_32bit +echo "[Updating pip]" +python -m pip install --no-deps -U pip wheel setuptools + +echo "[Install pandas]" +python -m pip install --no-build-isolation -e .
+ +echo +echo "conda list" +conda list + +# Install DB for Linux +if [ "${TRAVIS_OS_NAME}" == "linux" ]; then + echo "installing dbs" + mysql -e 'create database pandas_nosetest;' + psql -c 'create database pandas_nosetest;' -U postgres +else + echo "not using dbs on non-linux Travis builds or Azure Pipelines" +fi + +echo "done" diff --git a/ci/show_circle.sh b/ci/show_circle.sh deleted file mode 100755 index bfaa65c1d84f2..0000000000000 --- a/ci/show_circle.sh +++ /dev/null @@ -1,8 +0,0 @@ -#!/usr/bin/env bash - -echo "[installed versions]" - -export PATH="$MINICONDA_DIR/bin:$PATH" -source activate pandas - -python -c "import pandas; pandas.show_versions();" diff --git a/ci/upload_coverage.sh b/ci/upload_coverage.sh deleted file mode 100755 index a7ef2fa908079..0000000000000 --- a/ci/upload_coverage.sh +++ /dev/null @@ -1,12 +0,0 @@ -#!/bin/bash - -if [ -z "$COVERAGE" ]; then - echo "coverage is not selected for this build" - exit 0 -fi - -source activate pandas - -echo "uploading coverage" -bash <(curl -s https://codecov.io/bash) -Z -c -F single -f /tmp/cov-single.xml -bash <(curl -s https://codecov.io/bash) -Z -c -F multiple -f /tmp/cov-multiple.xml diff --git a/circle.yml b/circle.yml deleted file mode 100644 index 9d49145af54e3..0000000000000 --- a/circle.yml +++ /dev/null @@ -1,38 +0,0 @@ -machine: - environment: - # these are globally set - MINICONDA_DIR: /home/ubuntu/miniconda3 - - -database: - override: - - ./ci/install_db_circle.sh - - -checkout: - post: - # since circleci does a shallow fetch - # we need to populate our tags - - git fetch --depth=1000 - - -dependencies: - override: - - > - case $CIRCLE_NODE_INDEX in - 0) - sudo apt-get install language-pack-it && ./ci/install_circle.sh JOB="2.7_COMPAT" LOCALE_OVERRIDE="it_IT.UTF-8" ;; - 1) - sudo apt-get install language-pack-zh-hans && ./ci/install_circle.sh JOB="3.6_LOCALE" LOCALE_OVERRIDE="zh_CN.UTF-8" ;; - 2) - sudo apt-get install language-pack-zh-hans && ./ci/install_circle.sh JOB="3.6_LOCALE_SLOW" LOCALE_OVERRIDE="zh_CN.UTF-8" ;; - 3) - ./ci/install_circle.sh JOB="3.5_ASCII" LOCALE_OVERRIDE="C" ;; - esac - - ./ci/show_circle.sh - - -test: - override: - - case $CIRCLE_NODE_INDEX in 0) ./ci/run_circle.sh --skip-slow --skip-network ;; 1) ./ci/run_circle.sh --only-slow --skip-network ;; 2) ./ci/run_circle.sh --skip-slow --skip-network ;; 3) ./ci/run_circle.sh --skip-slow --skip-network ;; esac: - parallel: true diff --git a/codecov.yml b/codecov.yml index 512bc2e82a736..1644bf315e0ac 100644 --- a/codecov.yml +++ b/codecov.yml @@ -1,13 +1,13 @@ codecov: branch: master +comment: off + coverage: status: project: default: - enabled: no target: '82' patch: default: - enabled: no target: '50' diff --git a/conda.recipe/meta.yaml b/conda.recipe/meta.yaml index 86bed996c8aab..f92090fecccf3 100644 --- a/conda.recipe/meta.yaml +++ b/conda.recipe/meta.yaml @@ -12,22 +12,28 @@ source: requirements: build: + - {{ compiler('c') }} + - {{ compiler('cxx') }} + host: - python + - pip - cython - - numpy 1.11.* + - numpy - setuptools >=3.3 - python-dateutil >=2.5.0 - pytz - run: - - python - - numpy >=1.11.* + - python {{ python }} + - {{ pin_compatible('numpy') }} - python-dateutil >=2.5.0 - pytz test: - imports: - - pandas + requires: + - pytest + commands: + - python -c "import pandas; pandas.test()" + about: home: http://pandas.pydata.org diff --git a/doc/README.rst b/doc/README.rst index efa21fdd3a2d9..5423e7419d03b 100644 --- a/doc/README.rst +++ b/doc/README.rst @@ -1,173 +1 @@ -.. 
_contributing.docs: - -Contributing to the documentation -================================= - -Whether you are someone who loves writing, teaching, or development, -contributing to the documentation is a huge value. If you don't see yourself -as a developer type, please don't stress and know that we want you to -contribute. You don't even have to be an expert on *pandas* to do so! -Something as simple as rewriting small passages for clarity -as you reference the docs is a simple but effective way to contribute. The -next person to read that passage will be in your debt! - -Actually, there are sections of the docs that are worse off by being written -by experts. If something in the docs doesn't make sense to you, updating the -relevant section after you figure it out is a simple way to ensure it will -help the next person. - -.. contents:: Table of contents: - :local: - - -About the pandas documentation ------------------------------- - -The documentation is written in **reStructuredText**, which is almost like writing -in plain English, and built using `Sphinx `__. The -Sphinx Documentation has an excellent `introduction to reST -`__. Review the Sphinx docs to perform more -complex changes to the documentation as well. - -Some other important things to know about the docs: - -- The pandas documentation consists of two parts: the docstrings in the code - itself and the docs in this folder ``pandas/doc/``. - - The docstrings provide a clear explanation of the usage of the individual - functions, while the documentation in this folder consists of tutorial-like - overviews per topic together with some other information (what's new, - installation, etc). - -- The docstrings follow the **Numpy Docstring Standard** which is used widely - in the Scientific Python community. This standard specifies the format of - the different sections of the docstring. See `this document - `_ - for a detailed explanation, or look at some of the existing functions to - extend it in a similar manner. - -- The tutorials make heavy use of the `ipython directive - `_ sphinx extension. - This directive lets you put code in the documentation which will be run - during the doc build. For example: - - :: - - .. ipython:: python - - x = 2 - x**3 - - will be rendered as - - :: - - In [1]: x = 2 - - In [2]: x**3 - Out[2]: 8 - - This means that almost all code examples in the docs are always run (and the - output saved) during the doc build. This way, they will always be up to date, - but it makes the doc building a bit more complex. - - -How to build the pandas documentation -------------------------------------- - -Requirements -^^^^^^^^^^^^ - -To build the pandas docs there are some extra requirements: you will need to -have ``sphinx`` and ``ipython`` installed. `numpydoc -`_ is used to parse the docstrings that -follow the Numpy Docstring Standard (see above), but you don't need to install -this because a local copy of ``numpydoc`` is included in the pandas source -code. `nbsphinx `_ is used to convert -Jupyter notebooks. You will need to install it if you intend to modify any of -the notebooks included in the documentation. - -Furthermore, it is recommended to have all `optional dependencies -`_ -installed. This is not needed, but be aware that you will see some error -messages. Because all the code in the documentation is executed during the doc -build, the examples using this optional dependencies will generate errors. -Run ``pd.show_versions()`` to get an overview of the installed version of all -dependencies. - -.. 
warning:: - - Sphinx version >= 1.2.2 or the older 1.1.3 is required. - -Building pandas -^^^^^^^^^^^^^^^ - -For a step-by-step overview on how to set up your environment, to work with -the pandas code and git, see `the developer pages -`_. -When you start to work on some docs, be sure to update your code to the latest -development version ('master'):: - - git fetch upstream - git rebase upstream/master - -Often it will be necessary to rebuild the C extension after updating:: - - python setup.py build_ext --inplace - -Building the documentation -^^^^^^^^^^^^^^^^^^^^^^^^^^ - -So how do you build the docs? Navigate to your local folder -``pandas/doc/`` directory in the console and run:: - - python make.py html - -And then you can find the html output in the folder ``pandas/doc/build/html/``. - -The first time it will take quite a while, because it has to run all the code -examples in the documentation and build all generated docstring pages. -In subsequent evocations, sphinx will try to only build the pages that have -been modified. - -If you want to do a full clean build, do:: - - python make.py clean - python make.py build - - -Starting with 0.13.1 you can tell ``make.py`` to compile only a single section -of the docs, greatly reducing the turn-around time for checking your changes. -You will be prompted to delete `.rst` files that aren't required, since the -last committed version can always be restored from git. - -:: - - #omit autosummary and API section - python make.py clean - python make.py --no-api - - # compile the docs with only a single - # section, that which is in indexing.rst - python make.py clean - python make.py --single indexing - -For comparison, a full doc build may take 10 minutes. a ``-no-api`` build -may take 3 minutes and a single section may take 15 seconds. - -Where to start? ---------------- - -There are a number of issues listed under `Docs -`_ -and `good first issue -`_ -where you could start out. - -Or maybe you have an idea of your own, by using pandas, looking for something -in the documentation and thinking 'this can be improved', let's do something -about that! - -Feel free to ask questions on `mailing list -`_ or submit an -issue on Github. +See `contributing.rst `_ in this repo. diff --git a/doc/cheatsheet/Pandas_Cheat_Sheet.pdf b/doc/cheatsheet/Pandas_Cheat_Sheet.pdf index 0492805a1408b..48da05d053b96 100644 Binary files a/doc/cheatsheet/Pandas_Cheat_Sheet.pdf and b/doc/cheatsheet/Pandas_Cheat_Sheet.pdf differ diff --git a/doc/cheatsheet/Pandas_Cheat_Sheet.pptx b/doc/cheatsheet/Pandas_Cheat_Sheet.pptx index 6cca9ac4647f7..039b3898fa301 100644 Binary files a/doc/cheatsheet/Pandas_Cheat_Sheet.pptx and b/doc/cheatsheet/Pandas_Cheat_Sheet.pptx differ diff --git a/doc/cheatsheet/Pandas_Cheat_Sheet_JA.pdf b/doc/cheatsheet/Pandas_Cheat_Sheet_JA.pdf new file mode 100644 index 0000000000000..cf1e40e627f33 Binary files /dev/null and b/doc/cheatsheet/Pandas_Cheat_Sheet_JA.pdf differ diff --git a/doc/cheatsheet/Pandas_Cheat_Sheet_JA.pptx b/doc/cheatsheet/Pandas_Cheat_Sheet_JA.pptx new file mode 100644 index 0000000000000..564d92ddbb56a Binary files /dev/null and b/doc/cheatsheet/Pandas_Cheat_Sheet_JA.pptx differ diff --git a/doc/cheatsheet/README.txt b/doc/cheatsheet/README.txt index d32fe5bcd05a6..0eae39f318d23 100644 --- a/doc/cheatsheet/README.txt +++ b/doc/cheatsheet/README.txt @@ -1,8 +1,8 @@ The Pandas Cheat Sheet was created using Microsoft Powerpoint 2013. To create the PDF version, within Powerpoint, simply do a "Save As" -and pick "PDF' as the format. 
+and pick "PDF" as the format. -This cheat sheet was inspired by the RstudioData Wrangling Cheatsheet[1], written by Irv Lustig, Princeton Consultants[2]. +This cheat sheet was inspired by the RStudio Data Wrangling Cheatsheet[1], written by Irv Lustig, Princeton Consultants[2]. [1]: https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf [2]: http://www.princetonoptimization.com/ diff --git a/doc/logo/pandas_logo.py b/doc/logo/pandas_logo.py index c3647f0c7d2a8..89410e3847bef 100644 --- a/doc/logo/pandas_logo.py +++ b/doc/logo/pandas_logo.py @@ -1,10 +1,9 @@ # script to generate the pandas logo -from matplotlib import pyplot as plt -from matplotlib import rcParams +from matplotlib import pyplot as plt, rcParams import numpy as np -rcParams['mathtext.fontset'] = 'cm' +rcParams["mathtext.fontset"] = "cm" def fnx(): @@ -37,8 +36,12 @@ def fnx(): plt.figtext(0.05, 0.5, "pandas", size=40) plt.figtext( - 0.05, 0.2, r"$y_{it} = \beta^{\prime} x_{it} + \mu_{i} + \epsilon_{it}$", - size=16, color="#5a89a4") - -fig.savefig('pandas_logo.svg') -fig.savefig('pandas_logo.png') + 0.05, + 0.2, + r"$y_{it} = \beta^{\prime} x_{it} + \mu_{i} + \epsilon_{it}$", + size=16, + color="#5a89a4", +) + +fig.savefig("pandas_logo.svg") +fig.savefig("pandas_logo.png") diff --git a/doc/make.py b/doc/make.py index acef563f301e4..cbb1fa6a5324a 100755 --- a/doc/make.py +++ b/doc/make.py @@ -1,438 +1,372 @@ #!/usr/bin/env python - """ Python script for building documentation. To build the docs you must have all optional dependencies for pandas installed. See the installation instructions for a list of these. -Note: currently latex builds do not work because of table formats that are not -supported in the latex generation. - -2014-01-30: Latex has some issues but 'latex_forced' works ok for 0.13.0-400 or so - Usage ----- -python make.py clean -python make.py html + $ python make.py clean + $ python make.py html + $ python make.py latex """ -from __future__ import print_function - -import io -import glob # noqa +import argparse +import csv +import importlib import os import shutil +import subprocess import sys -from contextlib import contextmanager - -import sphinx # noqa -import argparse -import jinja2 # noqa - -os.environ['PYTHONPATH'] = '..' - -SPHINX_BUILD = 'sphinxbuild' - - -def _process_user(user): - if user is None or user is False: - user = '' - else: - user = user + '@' - return user - - -def upload_dev(user=None): - 'push a copy to the pydata dev directory' - user = _process_user(user) - if os.system('cd build/html; rsync -avz . {0}pandas.pydata.org' - ':/usr/share/nginx/pandas/pandas-docs/dev/ -essh'.format(user)): - raise SystemExit('Upload to Pydata Dev failed') - - -def upload_dev_pdf(user=None): - 'push a copy to the pydata dev directory' - user = _process_user(user) - if os.system('cd build/latex; scp pandas.pdf {0}pandas.pydata.org' - ':/usr/share/nginx/pandas/pandas-docs/dev/'.format(user)): - raise SystemExit('PDF upload to Pydata Dev failed') - - -def upload_stable(user=None): - 'push a copy to the pydata stable directory' - user = _process_user(user) - if os.system('cd build/html; rsync -avz . 
{0}pandas.pydata.org' - ':/usr/share/nginx/pandas/pandas-docs/stable/ -essh'.format(user)): - raise SystemExit('Upload to stable failed') - - -def upload_stable_pdf(user=None): - 'push a copy to the pydata dev directory' - user = _process_user(user) - if os.system('cd build/latex; scp pandas.pdf {0}pandas.pydata.org' - ':/usr/share/nginx/pandas/pandas-docs/stable/'.format(user)): - raise SystemExit('PDF upload to stable failed') - - -def upload_prev(ver, doc_root='./', user=None): - 'push a copy of older release to appropriate version directory' - user = _process_user(user) - local_dir = doc_root + 'build/html' - remote_dir = '/usr/share/nginx/pandas/pandas-docs/version/%s/' % ver - cmd = 'cd %s; rsync -avz . %spandas.pydata.org:%s -essh' - cmd = cmd % (local_dir, user, remote_dir) - print(cmd) - if os.system(cmd): - raise SystemExit( - 'Upload to %s from %s failed' % (remote_dir, local_dir)) - - local_dir = doc_root + 'build/latex' - pdf_cmd = 'cd %s; scp pandas.pdf %spandas.pydata.org:%s' - pdf_cmd = pdf_cmd % (local_dir, user, remote_dir) - if os.system(pdf_cmd): - raise SystemExit('Upload PDF to %s from %s failed' % (ver, doc_root)) - -def build_pandas(): - os.chdir('..') - os.system('python setup.py clean') - os.system('python setup.py build_ext --inplace') - os.chdir('doc') - -def build_prev(ver): - if os.system('git checkout v%s' % ver) != 1: - os.chdir('..') - os.system('python setup.py clean') - os.system('python setup.py build_ext --inplace') - os.chdir('doc') - os.system('python make.py clean') - os.system('python make.py html') - os.system('python make.py latex') - os.system('git checkout master') - - -def clean(): - if os.path.exists('build'): - shutil.rmtree('build') - - if os.path.exists('source/generated'): - shutil.rmtree('source/generated') - - -@contextmanager -def maybe_exclude_notebooks(): - """ - Skip building the notebooks if pandoc is not installed. - This assumes that nbsphinx is installed. - """ - base = os.path.dirname(__file__) - notebooks = [os.path.join(base, 'source', nb) - for nb in ['style.ipynb']] - contents = {} - - def _remove_notebooks(): - for nb in notebooks: - with open(nb, 'rt') as f: - contents[nb] = f.read() - os.remove(nb) - - # Skip notebook conversion if - # 1. nbconvert isn't installed, or - # 2. nbconvert is installed, but pandoc isn't - try: - import nbconvert - except ImportError: - print("Warning: nbconvert not installed. Skipping notebooks.") - _remove_notebooks() - else: - try: - nbconvert.utils.pandoc.get_pandoc_version() - except nbconvert.utils.pandoc.PandocMissing: - print("Warning: Pandoc is not installed. Skipping notebooks.") - _remove_notebooks() - - yield - for nb, content in contents.items(): - with open(nb, 'wt') as f: - f.write(content) - - -def html(): - check_build() - - with maybe_exclude_notebooks(): - if os.system('sphinx-build -P -b html -d build/doctrees ' - 'source build/html'): - raise SystemExit("Building HTML failed.") - try: - # remove stale file - os.remove('build/html/pandas.zip') - except: - pass - - -def zip_html(): - try: - print("\nZipping up HTML docs...") - # just in case the wonky build box doesn't have zip - # don't fail this. - os.system('cd build; rm -f html/pandas.zip; zip html/pandas.zip -r -q html/* ') - print("\n") - except: - pass - -def latex(): - check_build() - if sys.platform != 'win32': - # LaTeX format. - if os.system('sphinx-build -j 2 -b latex -d build/doctrees ' - 'source build/latex'): - raise SystemExit("Building LaTeX failed.") - # Produce pdf. 
- - os.chdir('build/latex') - - # Call the makefile produced by sphinx... - if os.system('make'): - print("Rendering LaTeX failed.") - print("You may still be able to get a usable PDF file by going into 'build/latex'") - print("and executing 'pdflatex pandas.tex' for the requisite number of passes.") - print("Or using the 'latex_forced' target") - raise SystemExit - - os.chdir('../..') - else: - print('latex build has not been tested on windows') - -def latex_forced(): - check_build() - if sys.platform != 'win32': - # LaTeX format. - if os.system('sphinx-build -j 2 -b latex -d build/doctrees ' - 'source build/latex'): - raise SystemExit("Building LaTeX failed.") - # Produce pdf. - - os.chdir('build/latex') - - # Manually call pdflatex, 3 passes should ensure latex fixes up - # all the required cross-references and such. - os.system('pdflatex -interaction=nonstopmode pandas.tex') - os.system('pdflatex -interaction=nonstopmode pandas.tex') - os.system('pdflatex -interaction=nonstopmode pandas.tex') - raise SystemExit("You should check the file 'build/latex/pandas.pdf' for problems.") - - os.chdir('../..') - else: - print('latex build has not been tested on windows') - - -def check_build(): - build_dirs = [ - 'build', 'build/doctrees', 'build/html', - 'build/latex', 'build/plots', 'build/_static', - 'build/_templates'] - for d in build_dirs: - try: - os.mkdir(d) - except OSError: - pass - - -def all(): - # clean() - html() - - -def auto_dev_build(debug=False): - msg = '' - try: - step = 'clean' - clean() - step = 'html' - html() - step = 'upload dev' - upload_dev() - if not debug: - sendmail(step) - - step = 'latex' - latex() - step = 'upload pdf' - upload_dev_pdf() - if not debug: - sendmail(step) - except (Exception, SystemExit) as inst: - msg = str(inst) + '\n' - sendmail(step, '[ERROR] ' + msg) - - -def sendmail(step=None, err_msg=None): - from_name, to_name = _get_config() - - if step is None: - step = '' - - if err_msg is None or '[ERROR]' not in err_msg: - msgstr = 'Daily docs %s completed successfully' % step - subject = "DOC: %s successful" % step - else: - msgstr = err_msg - subject = "DOC: %s failed" % step - - import smtplib - from email.MIMEText import MIMEText - msg = MIMEText(msgstr) - msg['Subject'] = subject - msg['From'] = from_name - msg['To'] = to_name - - server_str, port, login, pwd = _get_credentials() - server = smtplib.SMTP(server_str, port) - server.ehlo() - server.starttls() - server.ehlo() - - server.login(login, pwd) - try: - server.sendmail(from_name, to_name, msg.as_string()) - finally: - server.close() - - -def _get_dir(subdir=None): - import getpass - USERNAME = getpass.getuser() - if sys.platform == 'darwin': - HOME = '/Users/%s' % USERNAME - else: - HOME = '/home/%s' % USERNAME - - if subdir is None: - subdir = '/code/scripts/config' - conf_dir = '%s/%s' % (HOME, subdir) - return conf_dir - - -def _get_credentials(): - tmp_dir = _get_dir() - cred = '%s/credentials' % tmp_dir - with open(cred, 'r') as fh: - server, port, un, domain = fh.read().split(',') - port = int(port) - login = un + '@' + domain + '.com' - - import base64 - with open('%s/cron_email_pwd' % tmp_dir, 'r') as fh: - pwd = base64.b64decode(fh.read()) - - return server, port, login, pwd - - -def _get_config(): - tmp_dir = _get_dir() - with open('%s/addresses' % tmp_dir, 'r') as fh: - from_name, to_name = fh.read().split(',') - return from_name, to_name - -funcd = { - 'html': html, - 'zip_html': zip_html, - 'upload_dev': upload_dev, - 'upload_stable': upload_stable, - 'upload_dev_pdf': 
upload_dev_pdf, - 'upload_stable_pdf': upload_stable_pdf, - 'latex': latex, - 'latex_forced': latex_forced, - 'clean': clean, - 'auto_dev': auto_dev_build, - 'auto_debug': lambda: auto_dev_build(True), - 'build_pandas': build_pandas, - 'all': all, -} - -small_docs = False - -# current_dir = os.getcwd() -# os.chdir(os.path.dirname(os.path.join(current_dir, __file__))) +import webbrowser -import argparse -argparser = argparse.ArgumentParser(description=""" -pandas documentation builder -""".strip()) +import docutils +import docutils.parsers.rst -# argparser.add_argument('-arg_name', '--arg_name', -# metavar='label for arg help', -# type=str|etc, -# nargs='N|*|?|+|argparse.REMAINDER', -# required=False, -# #choices='abc', -# help='help string', -# action='store|store_true') +DOC_PATH = os.path.dirname(os.path.abspath(__file__)) +SOURCE_PATH = os.path.join(DOC_PATH, "source") +BUILD_PATH = os.path.join(DOC_PATH, "build") +REDIRECTS_FILE = os.path.join(DOC_PATH, "redirects.csv") -# args = argparser.parse_args() -#print args.accumulate(args.integers) +class DocBuilder: + """ + Class to wrap the different commands of this script. -def generate_index(api=True, single=False, **kwds): - from jinja2 import Template - with open("source/index.rst.template") as f: - t = Template(f.read()) + All public methods of this class can be called as parameters of the + script. + """ - with open("source/index.rst","w") as f: - f.write(t.render(api=api,single=single,**kwds)) + def __init__( + self, + num_jobs=0, + include_api=True, + single_doc=None, + verbosity=0, + warnings_are_errors=False, + ): + self.num_jobs = num_jobs + self.verbosity = verbosity + self.warnings_are_errors = warnings_are_errors + + if single_doc: + single_doc = self._process_single_doc(single_doc) + include_api = False + os.environ["SPHINX_PATTERN"] = single_doc + elif not include_api: + os.environ["SPHINX_PATTERN"] = "-api" + + self.single_doc_html = None + if single_doc and single_doc.endswith(".rst"): + self.single_doc_html = os.path.splitext(single_doc)[0] + ".html" + elif single_doc: + self.single_doc_html = "reference/api/pandas.{}.html".format(single_doc) + + def _process_single_doc(self, single_doc): + """ + Make sure the provided value for --single is a path to an existing + .rst/.ipynb file, or a pandas object that can be imported. + + For example, categorial.rst or pandas.DataFrame.head. For the latter, + return the corresponding file path + (e.g. reference/api/pandas.DataFrame.head.rst). + """ + base_name, extension = os.path.splitext(single_doc) + if extension in (".rst", ".ipynb"): + if os.path.exists(os.path.join(SOURCE_PATH, single_doc)): + return single_doc + else: + raise FileNotFoundError("File {} not found".format(single_doc)) + + elif single_doc.startswith("pandas."): + try: + obj = pandas # noqa: F821 + for name in single_doc.split("."): + obj = getattr(obj, name) + except AttributeError: + raise ImportError("Could not import {}".format(single_doc)) + else: + return single_doc[len("pandas.") :] + else: + raise ValueError( + ( + "--single={} not understood. Value should be a " + "valid path to a .rst or .ipynb file, or a " + "valid pandas object (e.g. categorical.rst or " + "pandas.DataFrame.head)" + ).format(single_doc) + ) + + @staticmethod + def _run_os(*args): + """ + Execute a command as a OS terminal. 
+ + Parameters + ---------- + *args : list of str + Command and parameters to be executed + + Examples + -------- + >>> DocBuilder()._run_os('python', '--version') + """ + subprocess.check_call(args, stdout=sys.stdout, stderr=sys.stderr) + + def _sphinx_build(self, kind): + """ + Call sphinx to build documentation. + + Attribute `num_jobs` from the class is used. + + Parameters + ---------- + kind : {'html', 'latex'} + + Examples + -------- + >>> DocBuilder(num_jobs=4)._sphinx_build('html') + """ + if kind not in ("html", "latex"): + raise ValueError("kind must be html or latex, " "not {}".format(kind)) + + cmd = ["sphinx-build", "-b", kind] + if self.num_jobs: + cmd += ["-j", str(self.num_jobs)] + if self.warnings_are_errors: + cmd += ["-W", "--keep-going"] + if self.verbosity: + cmd.append("-{}".format("v" * self.verbosity)) + cmd += [ + "-d", + os.path.join(BUILD_PATH, "doctrees"), + SOURCE_PATH, + os.path.join(BUILD_PATH, kind), + ] + return subprocess.call(cmd) + + def _open_browser(self, single_doc_html): + """ + Open a browser tab showing single + """ + url = os.path.join("file://", DOC_PATH, "build", "html", single_doc_html) + webbrowser.open(url, new=2) + + def _get_page_title(self, page): + """ + Open the rst file `page` and extract its title. + """ + fname = os.path.join(SOURCE_PATH, "{}.rst".format(page)) + option_parser = docutils.frontend.OptionParser( + components=(docutils.parsers.rst.Parser,) + ) + doc = docutils.utils.new_document("", option_parser.get_default_values()) + with open(fname) as f: + data = f.read() + + parser = docutils.parsers.rst.Parser() + # do not generate any warning when parsing the rst + with open(os.devnull, "a") as f: + doc.reporter.stream = f + parser.parse(data, doc) + + section = next( + node for node in doc.children if isinstance(node, docutils.nodes.section) + ) + title = next( + node for node in section.children if isinstance(node, docutils.nodes.title) + ) + + return title.astext() + + def _add_redirects(self): + """ + Create in the build directory an html file with a redirect, + for every row in REDIRECTS_FILE. + """ + html = """ + + + + + +
+ <html>
+ <head>
+ <meta http-equiv="refresh" content="0;URL={url}"/>
+ </head>
+ <body>
+ <p>
+ The page has been moved to <a href="{url}">{title}</a>
+ </p>
+ </body>
+ </html>
+ + + """ + with open(REDIRECTS_FILE) as mapping_fd: + reader = csv.reader(mapping_fd) + for row in reader: + if not row or row[0].strip().startswith("#"): + continue + + path = os.path.join(BUILD_PATH, "html", *row[0].split("/")) + ".html" + + try: + title = self._get_page_title(row[1]) + except Exception: + # the file can be an ipynb and not an rst, or docutils + # may not be able to read the rst because it has some + # sphinx specific stuff + title = "this page" + + if os.path.exists(path): + raise RuntimeError( + ("Redirection would overwrite an existing file: " "{}").format( + path + ) + ) + + with open(path, "w") as moved_page_fd: + moved_page_fd.write( + html.format(url="{}.html".format(row[1]), title=title) + ) + + def html(self): + """ + Build HTML documentation. + """ + ret_code = self._sphinx_build("html") + zip_fname = os.path.join(BUILD_PATH, "html", "pandas.zip") + if os.path.exists(zip_fname): + os.remove(zip_fname) + + if ret_code == 0: + if self.single_doc_html is not None: + self._open_browser(self.single_doc_html) + else: + self._add_redirects() + return ret_code + + def latex(self, force=False): + """ + Build PDF documentation. + """ + if sys.platform == "win32": + sys.stderr.write("latex build has not been tested on windows\n") + else: + ret_code = self._sphinx_build("latex") + os.chdir(os.path.join(BUILD_PATH, "latex")) + if force: + for i in range(3): + self._run_os("pdflatex", "-interaction=nonstopmode", "pandas.tex") + raise SystemExit( + "You should check the file " + '"build/latex/pandas.pdf" for problems.' + ) + else: + self._run_os("make") + return ret_code + + def latex_forced(self): + """ + Build PDF documentation with retries to find missing references. + """ + return self.latex(force=True) + + @staticmethod + def clean(): + """ + Clean documentation generated files. + """ + shutil.rmtree(BUILD_PATH, ignore_errors=True) + shutil.rmtree(os.path.join(SOURCE_PATH, "reference", "api"), ignore_errors=True) + + def zip_html(self): + """ + Compress HTML documentation into a zip file. + """ + zip_fname = os.path.join(BUILD_PATH, "html", "pandas.zip") + if os.path.exists(zip_fname): + os.remove(zip_fname) + dirname = os.path.join(BUILD_PATH, "html") + fnames = os.listdir(dirname) + os.chdir(dirname) + self._run_os("zip", zip_fname, "-r", "-q", *fnames) -import argparse -argparser = argparse.ArgumentParser(description="pandas documentation builder", - epilog="Targets : %s" % funcd.keys()) - -argparser.add_argument('--no-api', - default=False, - help='Ommit api and autosummary', - action='store_true') -argparser.add_argument('--single', - metavar='FILENAME', - type=str, - default=False, - help='filename of section to compile, e.g. 
"indexing"') -argparser.add_argument('--user', - type=str, - default=False, - help='Username to connect to the pydata server') def main(): - args, unknown = argparser.parse_known_args() - sys.argv = [sys.argv[0]] + unknown - if args.single: - args.single = os.path.basename(args.single).split(".rst")[0] - - if 'clean' in unknown: - args.single=False - - generate_index(api=not args.no_api and not args.single, single=args.single) - - if len(sys.argv) > 2: - ftype = sys.argv[1] - ver = sys.argv[2] - - if ftype == 'build_previous': - build_prev(ver, user=args.user) - if ftype == 'upload_previous': - upload_prev(ver, user=args.user) - elif len(sys.argv) == 2: - for arg in sys.argv[1:]: - func = funcd.get(arg) - if func is None: - raise SystemExit('Do not know how to handle %s; valid args are %s' % ( - arg, list(funcd.keys()))) - if args.user: - func(user=args.user) - else: - func() - else: - small_docs = False - all() -# os.chdir(current_dir) - -if __name__ == '__main__': - import sys + cmds = [method for method in dir(DocBuilder) if not method.startswith("_")] + + argparser = argparse.ArgumentParser( + description="pandas documentation builder", + epilog="Commands: {}".format(",".join(cmds)), + ) + argparser.add_argument( + "command", + nargs="?", + default="html", + help="command to run: {}".format(", ".join(cmds)), + ) + argparser.add_argument( + "--num-jobs", type=int, default=0, help="number of jobs used by sphinx-build" + ) + argparser.add_argument( + "--no-api", default=False, help="omit api and autosummary", action="store_true" + ) + argparser.add_argument( + "--single", + metavar="FILENAME", + type=str, + default=None, + help=( + 'filename (relative to the "source" folder)' + " of section or method name to compile, e.g. " + '"development/contributing.rst",' + ' "ecosystem.rst", "pandas.DataFrame.join"' + ), + ) + argparser.add_argument( + "--python-path", type=str, default=os.path.dirname(DOC_PATH), help="path" + ) + argparser.add_argument( + "-v", + action="count", + dest="verbosity", + default=0, + help=( + "increase verbosity (can be repeated), " + "passed to the sphinx build command" + ), + ) + argparser.add_argument( + "--warnings-are-errors", + "-W", + action="store_true", + help="fail if warnings are raised", + ) + args = argparser.parse_args() + + if args.command not in cmds: + raise ValueError( + "Unknown command {}. Available options: {}".format( + args.command, ", ".join(cmds) + ) + ) + + # Below we update both os.environ and sys.path. The former is used by + # external libraries (namely Sphinx) to compile this module and resolve + # the import of `python_path` correctly. The latter is used to resolve + # the import within the module, injecting it into the global namespace + os.environ["PYTHONPATH"] = args.python_path + sys.path.insert(0, args.python_path) + globals()["pandas"] = importlib.import_module("pandas") + + # Set the matplotlib backend to the non-interactive Agg backend for all + # child processes. 
+ os.environ["MPLBACKEND"] = "module://matplotlib.backends.backend_agg" + + builder = DocBuilder( + args.num_jobs, + not args.no_api, + args.single, + args.verbosity, + args.warnings_are_errors, + ) + return getattr(builder, args.command)() + + +if __name__ == "__main__": sys.exit(main()) diff --git a/doc/redirects.csv b/doc/redirects.csv new file mode 100644 index 0000000000000..a7886779c97d5 --- /dev/null +++ b/doc/redirects.csv @@ -0,0 +1,1581 @@ +# This file should contain all the redirects in the documentation +# in the format `,` + +# whatsnew +whatsnew,whatsnew/index +release,whatsnew/index + +# getting started +10min,getting_started/10min +basics,getting_started/basics +comparison_with_r,getting_started/comparison/comparison_with_r +comparison_with_sql,getting_started/comparison/comparison_with_sql +comparison_with_sas,getting_started/comparison/comparison_with_sas +comparison_with_stata,getting_started/comparison/comparison_with_stata +dsintro,getting_started/dsintro +overview,getting_started/overview +tutorials,getting_started/tutorials + +# user guide +advanced,user_guide/advanced +categorical,user_guide/categorical +computation,user_guide/computation +cookbook,user_guide/cookbook +enhancingperf,user_guide/enhancingperf +gotchas,user_guide/gotchas +groupby,user_guide/groupby +indexing,user_guide/indexing +integer_na,user_guide/integer_na +io,user_guide/io +merging,user_guide/merging +missing_data,user_guide/missing_data +options,user_guide/options +reshaping,user_guide/reshaping +sparse,user_guide/sparse +style,user_guide/style +text,user_guide/text +timedeltas,user_guide/timedeltas +timeseries,user_guide/timeseries +visualization,user_guide/visualization + +# development +contributing,development/contributing +contributing_docstring,development/contributing_docstring +developer,development/developer +extending,development/extending +internals,development/internals + +# api +api,reference/index +generated/pandas.api.extensions.ExtensionArray.argsort,../reference/api/pandas.api.extensions.ExtensionArray.argsort +generated/pandas.api.extensions.ExtensionArray.astype,../reference/api/pandas.api.extensions.ExtensionArray.astype +generated/pandas.api.extensions.ExtensionArray.copy,../reference/api/pandas.api.extensions.ExtensionArray.copy +generated/pandas.api.extensions.ExtensionArray.dropna,../reference/api/pandas.api.extensions.ExtensionArray.dropna +generated/pandas.api.extensions.ExtensionArray.dtype,../reference/api/pandas.api.extensions.ExtensionArray.dtype +generated/pandas.api.extensions.ExtensionArray.factorize,../reference/api/pandas.api.extensions.ExtensionArray.factorize +generated/pandas.api.extensions.ExtensionArray.fillna,../reference/api/pandas.api.extensions.ExtensionArray.fillna +generated/pandas.api.extensions.ExtensionArray,../reference/api/pandas.api.extensions.ExtensionArray +generated/pandas.api.extensions.ExtensionArray.isna,../reference/api/pandas.api.extensions.ExtensionArray.isna +generated/pandas.api.extensions.ExtensionArray.nbytes,../reference/api/pandas.api.extensions.ExtensionArray.nbytes +generated/pandas.api.extensions.ExtensionArray.ndim,../reference/api/pandas.api.extensions.ExtensionArray.ndim +generated/pandas.api.extensions.ExtensionArray.shape,../reference/api/pandas.api.extensions.ExtensionArray.shape +generated/pandas.api.extensions.ExtensionArray.take,../reference/api/pandas.api.extensions.ExtensionArray.take +generated/pandas.api.extensions.ExtensionArray.unique,../reference/api/pandas.api.extensions.ExtensionArray.unique 
+generated/pandas.api.extensions.ExtensionDtype.construct_array_type,../reference/api/pandas.api.extensions.ExtensionDtype.construct_array_type +generated/pandas.api.extensions.ExtensionDtype.construct_from_string,../reference/api/pandas.api.extensions.ExtensionDtype.construct_from_string +generated/pandas.api.extensions.ExtensionDtype,../reference/api/pandas.api.extensions.ExtensionDtype +generated/pandas.api.extensions.ExtensionDtype.is_dtype,../reference/api/pandas.api.extensions.ExtensionDtype.is_dtype +generated/pandas.api.extensions.ExtensionDtype.kind,../reference/api/pandas.api.extensions.ExtensionDtype.kind +generated/pandas.api.extensions.ExtensionDtype.name,../reference/api/pandas.api.extensions.ExtensionDtype.name +generated/pandas.api.extensions.ExtensionDtype.names,../reference/api/pandas.api.extensions.ExtensionDtype.names +generated/pandas.api.extensions.ExtensionDtype.na_value,../reference/api/pandas.api.extensions.ExtensionDtype.na_value +generated/pandas.api.extensions.ExtensionDtype.type,../reference/api/pandas.api.extensions.ExtensionDtype.type +generated/pandas.api.extensions.register_dataframe_accessor,../reference/api/pandas.api.extensions.register_dataframe_accessor +generated/pandas.api.extensions.register_extension_dtype,../reference/api/pandas.api.extensions.register_extension_dtype +generated/pandas.api.extensions.register_index_accessor,../reference/api/pandas.api.extensions.register_index_accessor +generated/pandas.api.extensions.register_series_accessor,../reference/api/pandas.api.extensions.register_series_accessor +generated/pandas.api.types.infer_dtype,../reference/api/pandas.api.types.infer_dtype +generated/pandas.api.types.is_bool_dtype,../reference/api/pandas.api.types.is_bool_dtype +generated/pandas.api.types.is_bool,../reference/api/pandas.api.types.is_bool +generated/pandas.api.types.is_categorical_dtype,../reference/api/pandas.api.types.is_categorical_dtype +generated/pandas.api.types.is_categorical,../reference/api/pandas.api.types.is_categorical +generated/pandas.api.types.is_complex_dtype,../reference/api/pandas.api.types.is_complex_dtype +generated/pandas.api.types.is_complex,../reference/api/pandas.api.types.is_complex +generated/pandas.api.types.is_datetime64_any_dtype,../reference/api/pandas.api.types.is_datetime64_any_dtype +generated/pandas.api.types.is_datetime64_dtype,../reference/api/pandas.api.types.is_datetime64_dtype +generated/pandas.api.types.is_datetime64_ns_dtype,../reference/api/pandas.api.types.is_datetime64_ns_dtype +generated/pandas.api.types.is_datetime64tz_dtype,../reference/api/pandas.api.types.is_datetime64tz_dtype +generated/pandas.api.types.is_datetimetz,../reference/api/pandas.api.types.is_datetimetz +generated/pandas.api.types.is_dict_like,../reference/api/pandas.api.types.is_dict_like +generated/pandas.api.types.is_extension_array_dtype,../reference/api/pandas.api.types.is_extension_array_dtype +generated/pandas.api.types.is_extension_type,../reference/api/pandas.api.types.is_extension_type +generated/pandas.api.types.is_file_like,../reference/api/pandas.api.types.is_file_like +generated/pandas.api.types.is_float_dtype,../reference/api/pandas.api.types.is_float_dtype +generated/pandas.api.types.is_float,../reference/api/pandas.api.types.is_float +generated/pandas.api.types.is_hashable,../reference/api/pandas.api.types.is_hashable +generated/pandas.api.types.is_int64_dtype,../reference/api/pandas.api.types.is_int64_dtype +generated/pandas.api.types.is_integer_dtype,../reference/api/pandas.api.types.is_integer_dtype 
+generated/pandas.api.types.is_integer,../reference/api/pandas.api.types.is_integer +generated/pandas.api.types.is_interval_dtype,../reference/api/pandas.api.types.is_interval_dtype +generated/pandas.api.types.is_interval,../reference/api/pandas.api.types.is_interval +generated/pandas.api.types.is_iterator,../reference/api/pandas.api.types.is_iterator +generated/pandas.api.types.is_list_like,../reference/api/pandas.api.types.is_list_like +generated/pandas.api.types.is_named_tuple,../reference/api/pandas.api.types.is_named_tuple +generated/pandas.api.types.is_number,../reference/api/pandas.api.types.is_number +generated/pandas.api.types.is_numeric_dtype,../reference/api/pandas.api.types.is_numeric_dtype +generated/pandas.api.types.is_object_dtype,../reference/api/pandas.api.types.is_object_dtype +generated/pandas.api.types.is_period_dtype,../reference/api/pandas.api.types.is_period_dtype +generated/pandas.api.types.is_period,../reference/api/pandas.api.types.is_period +generated/pandas.api.types.is_re_compilable,../reference/api/pandas.api.types.is_re_compilable +generated/pandas.api.types.is_re,../reference/api/pandas.api.types.is_re +generated/pandas.api.types.is_scalar,../reference/api/pandas.api.types.is_scalar +generated/pandas.api.types.is_signed_integer_dtype,../reference/api/pandas.api.types.is_signed_integer_dtype +generated/pandas.api.types.is_sparse,../reference/api/pandas.api.types.is_sparse +generated/pandas.api.types.is_string_dtype,../reference/api/pandas.api.types.is_string_dtype +generated/pandas.api.types.is_timedelta64_dtype,../reference/api/pandas.api.types.is_timedelta64_dtype +generated/pandas.api.types.is_timedelta64_ns_dtype,../reference/api/pandas.api.types.is_timedelta64_ns_dtype +generated/pandas.api.types.is_unsigned_integer_dtype,../reference/api/pandas.api.types.is_unsigned_integer_dtype +generated/pandas.api.types.pandas_dtype,../reference/api/pandas.api.types.pandas_dtype +generated/pandas.api.types.union_categoricals,../reference/api/pandas.api.types.union_categoricals +generated/pandas.bdate_range,../reference/api/pandas.bdate_range +generated/pandas.Categorical.__array__,../reference/api/pandas.Categorical.__array__ +generated/pandas.Categorical.categories,../reference/api/pandas.Categorical.categories +generated/pandas.Categorical.codes,../reference/api/pandas.Categorical.codes +generated/pandas.CategoricalDtype.categories,../reference/api/pandas.CategoricalDtype.categories +generated/pandas.Categorical.dtype,../reference/api/pandas.Categorical.dtype +generated/pandas.CategoricalDtype,../reference/api/pandas.CategoricalDtype +generated/pandas.CategoricalDtype.ordered,../reference/api/pandas.CategoricalDtype.ordered +generated/pandas.Categorical.from_codes,../reference/api/pandas.Categorical.from_codes +generated/pandas.Categorical,../reference/api/pandas.Categorical +generated/pandas.CategoricalIndex.add_categories,../reference/api/pandas.CategoricalIndex.add_categories +generated/pandas.CategoricalIndex.as_ordered,../reference/api/pandas.CategoricalIndex.as_ordered +generated/pandas.CategoricalIndex.as_unordered,../reference/api/pandas.CategoricalIndex.as_unordered +generated/pandas.CategoricalIndex.categories,../reference/api/pandas.CategoricalIndex.categories +generated/pandas.CategoricalIndex.codes,../reference/api/pandas.CategoricalIndex.codes +generated/pandas.CategoricalIndex.equals,../reference/api/pandas.CategoricalIndex.equals +generated/pandas.CategoricalIndex,../reference/api/pandas.CategoricalIndex 
+generated/pandas.CategoricalIndex.map,../reference/api/pandas.CategoricalIndex.map +generated/pandas.CategoricalIndex.ordered,../reference/api/pandas.CategoricalIndex.ordered +generated/pandas.CategoricalIndex.remove_categories,../reference/api/pandas.CategoricalIndex.remove_categories +generated/pandas.CategoricalIndex.remove_unused_categories,../reference/api/pandas.CategoricalIndex.remove_unused_categories +generated/pandas.CategoricalIndex.rename_categories,../reference/api/pandas.CategoricalIndex.rename_categories +generated/pandas.CategoricalIndex.reorder_categories,../reference/api/pandas.CategoricalIndex.reorder_categories +generated/pandas.CategoricalIndex.set_categories,../reference/api/pandas.CategoricalIndex.set_categories +generated/pandas.Categorical.ordered,../reference/api/pandas.Categorical.ordered +generated/pandas.concat,../reference/api/pandas.concat +generated/pandas.core.groupby.DataFrameGroupBy.all,../reference/api/pandas.core.groupby.DataFrameGroupBy.all +generated/pandas.core.groupby.DataFrameGroupBy.any,../reference/api/pandas.core.groupby.DataFrameGroupBy.any +generated/pandas.core.groupby.DataFrameGroupBy.bfill,../reference/api/pandas.core.groupby.DataFrameGroupBy.bfill +generated/pandas.core.groupby.DataFrameGroupBy.boxplot,../reference/api/pandas.core.groupby.DataFrameGroupBy.boxplot +generated/pandas.core.groupby.DataFrameGroupBy.corr,../reference/api/pandas.core.groupby.DataFrameGroupBy.corr +generated/pandas.core.groupby.DataFrameGroupBy.corrwith,../reference/api/pandas.core.groupby.DataFrameGroupBy.corrwith +generated/pandas.core.groupby.DataFrameGroupBy.count,../reference/api/pandas.core.groupby.DataFrameGroupBy.count +generated/pandas.core.groupby.DataFrameGroupBy.cov,../reference/api/pandas.core.groupby.DataFrameGroupBy.cov +generated/pandas.core.groupby.DataFrameGroupBy.cummax,../reference/api/pandas.core.groupby.DataFrameGroupBy.cummax +generated/pandas.core.groupby.DataFrameGroupBy.cummin,../reference/api/pandas.core.groupby.DataFrameGroupBy.cummin +generated/pandas.core.groupby.DataFrameGroupBy.cumprod,../reference/api/pandas.core.groupby.DataFrameGroupBy.cumprod +generated/pandas.core.groupby.DataFrameGroupBy.cumsum,../reference/api/pandas.core.groupby.DataFrameGroupBy.cumsum +generated/pandas.core.groupby.DataFrameGroupBy.describe,../reference/api/pandas.core.groupby.DataFrameGroupBy.describe +generated/pandas.core.groupby.DataFrameGroupBy.diff,../reference/api/pandas.core.groupby.DataFrameGroupBy.diff +generated/pandas.core.groupby.DataFrameGroupBy.ffill,../reference/api/pandas.core.groupby.DataFrameGroupBy.ffill +generated/pandas.core.groupby.DataFrameGroupBy.fillna,../reference/api/pandas.core.groupby.DataFrameGroupBy.fillna +generated/pandas.core.groupby.DataFrameGroupBy.filter,../reference/api/pandas.core.groupby.DataFrameGroupBy.filter +generated/pandas.core.groupby.DataFrameGroupBy.hist,../reference/api/pandas.core.groupby.DataFrameGroupBy.hist +generated/pandas.core.groupby.DataFrameGroupBy.idxmax,../reference/api/pandas.core.groupby.DataFrameGroupBy.idxmax +generated/pandas.core.groupby.DataFrameGroupBy.idxmin,../reference/api/pandas.core.groupby.DataFrameGroupBy.idxmin +generated/pandas.core.groupby.DataFrameGroupBy.mad,../reference/api/pandas.core.groupby.DataFrameGroupBy.mad +generated/pandas.core.groupby.DataFrameGroupBy.pct_change,../reference/api/pandas.core.groupby.DataFrameGroupBy.pct_change +generated/pandas.core.groupby.DataFrameGroupBy.plot,../reference/api/pandas.core.groupby.DataFrameGroupBy.plot 
+generated/pandas.core.groupby.DataFrameGroupBy.quantile,../reference/api/pandas.core.groupby.DataFrameGroupBy.quantile +generated/pandas.core.groupby.DataFrameGroupBy.rank,../reference/api/pandas.core.groupby.DataFrameGroupBy.rank +generated/pandas.core.groupby.DataFrameGroupBy.resample,../reference/api/pandas.core.groupby.DataFrameGroupBy.resample +generated/pandas.core.groupby.DataFrameGroupBy.shift,../reference/api/pandas.core.groupby.DataFrameGroupBy.shift +generated/pandas.core.groupby.DataFrameGroupBy.size,../reference/api/pandas.core.groupby.DataFrameGroupBy.size +generated/pandas.core.groupby.DataFrameGroupBy.skew,../reference/api/pandas.core.groupby.DataFrameGroupBy.skew +generated/pandas.core.groupby.DataFrameGroupBy.take,../reference/api/pandas.core.groupby.DataFrameGroupBy.take +generated/pandas.core.groupby.DataFrameGroupBy.tshift,../reference/api/pandas.core.groupby.DataFrameGroupBy.tshift +generated/pandas.core.groupby.GroupBy.agg,../reference/api/pandas.core.groupby.GroupBy.agg +generated/pandas.core.groupby.GroupBy.aggregate,../reference/api/pandas.core.groupby.GroupBy.aggregate +generated/pandas.core.groupby.GroupBy.all,../reference/api/pandas.core.groupby.GroupBy.all +generated/pandas.core.groupby.GroupBy.any,../reference/api/pandas.core.groupby.GroupBy.any +generated/pandas.core.groupby.GroupBy.apply,../reference/api/pandas.core.groupby.GroupBy.apply +generated/pandas.core.groupby.GroupBy.bfill,../reference/api/pandas.core.groupby.GroupBy.bfill +generated/pandas.core.groupby.GroupBy.count,../reference/api/pandas.core.groupby.GroupBy.count +generated/pandas.core.groupby.GroupBy.cumcount,../reference/api/pandas.core.groupby.GroupBy.cumcount +generated/pandas.core.groupby.GroupBy.ffill,../reference/api/pandas.core.groupby.GroupBy.ffill +generated/pandas.core.groupby.GroupBy.first,../reference/api/pandas.core.groupby.GroupBy.first +generated/pandas.core.groupby.GroupBy.get_group,../reference/api/pandas.core.groupby.GroupBy.get_group +generated/pandas.core.groupby.GroupBy.groups,../reference/api/pandas.core.groupby.GroupBy.groups +generated/pandas.core.groupby.GroupBy.head,../reference/api/pandas.core.groupby.GroupBy.head +generated/pandas.core.groupby.GroupBy.indices,../reference/api/pandas.core.groupby.GroupBy.indices +generated/pandas.core.groupby.GroupBy.__iter__,../reference/api/pandas.core.groupby.GroupBy.__iter__ +generated/pandas.core.groupby.GroupBy.last,../reference/api/pandas.core.groupby.GroupBy.last +generated/pandas.core.groupby.GroupBy.max,../reference/api/pandas.core.groupby.GroupBy.max +generated/pandas.core.groupby.GroupBy.mean,../reference/api/pandas.core.groupby.GroupBy.mean +generated/pandas.core.groupby.GroupBy.median,../reference/api/pandas.core.groupby.GroupBy.median +generated/pandas.core.groupby.GroupBy.min,../reference/api/pandas.core.groupby.GroupBy.min +generated/pandas.core.groupby.GroupBy.ngroup,../reference/api/pandas.core.groupby.GroupBy.ngroup +generated/pandas.core.groupby.GroupBy.nth,../reference/api/pandas.core.groupby.GroupBy.nth +generated/pandas.core.groupby.GroupBy.ohlc,../reference/api/pandas.core.groupby.GroupBy.ohlc +generated/pandas.core.groupby.GroupBy.pct_change,../reference/api/pandas.core.groupby.GroupBy.pct_change +generated/pandas.core.groupby.GroupBy.pipe,../reference/api/pandas.core.groupby.GroupBy.pipe +generated/pandas.core.groupby.GroupBy.prod,../reference/api/pandas.core.groupby.GroupBy.prod +generated/pandas.core.groupby.GroupBy.rank,../reference/api/pandas.core.groupby.GroupBy.rank 
+generated/pandas.core.groupby.GroupBy.sem,../reference/api/pandas.core.groupby.GroupBy.sem +generated/pandas.core.groupby.GroupBy.size,../reference/api/pandas.core.groupby.GroupBy.size +generated/pandas.core.groupby.GroupBy.std,../reference/api/pandas.core.groupby.GroupBy.std +generated/pandas.core.groupby.GroupBy.sum,../reference/api/pandas.core.groupby.GroupBy.sum +generated/pandas.core.groupby.GroupBy.tail,../reference/api/pandas.core.groupby.GroupBy.tail +generated/pandas.core.groupby.GroupBy.transform,../reference/api/pandas.core.groupby.GroupBy.transform +generated/pandas.core.groupby.GroupBy.var,../reference/api/pandas.core.groupby.GroupBy.var +generated/pandas.core.groupby.SeriesGroupBy.is_monotonic_decreasing,../reference/api/pandas.core.groupby.SeriesGroupBy.is_monotonic_decreasing +generated/pandas.core.groupby.SeriesGroupBy.is_monotonic_increasing,../reference/api/pandas.core.groupby.SeriesGroupBy.is_monotonic_increasing +generated/pandas.core.groupby.SeriesGroupBy.nlargest,../reference/api/pandas.core.groupby.SeriesGroupBy.nlargest +generated/pandas.core.groupby.SeriesGroupBy.nsmallest,../reference/api/pandas.core.groupby.SeriesGroupBy.nsmallest +generated/pandas.core.groupby.SeriesGroupBy.nunique,../reference/api/pandas.core.groupby.SeriesGroupBy.nunique +generated/pandas.core.groupby.SeriesGroupBy.unique,../reference/api/pandas.core.groupby.SeriesGroupBy.unique +generated/pandas.core.groupby.SeriesGroupBy.value_counts,../reference/api/pandas.core.groupby.SeriesGroupBy.value_counts +generated/pandas.core.resample.Resampler.aggregate,../reference/api/pandas.core.resample.Resampler.aggregate +generated/pandas.core.resample.Resampler.apply,../reference/api/pandas.core.resample.Resampler.apply +generated/pandas.core.resample.Resampler.asfreq,../reference/api/pandas.core.resample.Resampler.asfreq +generated/pandas.core.resample.Resampler.backfill,../reference/api/pandas.core.resample.Resampler.backfill +generated/pandas.core.resample.Resampler.bfill,../reference/api/pandas.core.resample.Resampler.bfill +generated/pandas.core.resample.Resampler.count,../reference/api/pandas.core.resample.Resampler.count +generated/pandas.core.resample.Resampler.ffill,../reference/api/pandas.core.resample.Resampler.ffill +generated/pandas.core.resample.Resampler.fillna,../reference/api/pandas.core.resample.Resampler.fillna +generated/pandas.core.resample.Resampler.first,../reference/api/pandas.core.resample.Resampler.first +generated/pandas.core.resample.Resampler.get_group,../reference/api/pandas.core.resample.Resampler.get_group +generated/pandas.core.resample.Resampler.groups,../reference/api/pandas.core.resample.Resampler.groups +generated/pandas.core.resample.Resampler.indices,../reference/api/pandas.core.resample.Resampler.indices +generated/pandas.core.resample.Resampler.interpolate,../reference/api/pandas.core.resample.Resampler.interpolate +generated/pandas.core.resample.Resampler.__iter__,../reference/api/pandas.core.resample.Resampler.__iter__ +generated/pandas.core.resample.Resampler.last,../reference/api/pandas.core.resample.Resampler.last +generated/pandas.core.resample.Resampler.max,../reference/api/pandas.core.resample.Resampler.max +generated/pandas.core.resample.Resampler.mean,../reference/api/pandas.core.resample.Resampler.mean +generated/pandas.core.resample.Resampler.median,../reference/api/pandas.core.resample.Resampler.median +generated/pandas.core.resample.Resampler.min,../reference/api/pandas.core.resample.Resampler.min 
+generated/pandas.core.resample.Resampler.nearest,../reference/api/pandas.core.resample.Resampler.nearest +generated/pandas.core.resample.Resampler.nunique,../reference/api/pandas.core.resample.Resampler.nunique +generated/pandas.core.resample.Resampler.ohlc,../reference/api/pandas.core.resample.Resampler.ohlc +generated/pandas.core.resample.Resampler.pad,../reference/api/pandas.core.resample.Resampler.pad +generated/pandas.core.resample.Resampler.pipe,../reference/api/pandas.core.resample.Resampler.pipe +generated/pandas.core.resample.Resampler.prod,../reference/api/pandas.core.resample.Resampler.prod +generated/pandas.core.resample.Resampler.quantile,../reference/api/pandas.core.resample.Resampler.quantile +generated/pandas.core.resample.Resampler.sem,../reference/api/pandas.core.resample.Resampler.sem +generated/pandas.core.resample.Resampler.size,../reference/api/pandas.core.resample.Resampler.size +generated/pandas.core.resample.Resampler.std,../reference/api/pandas.core.resample.Resampler.std +generated/pandas.core.resample.Resampler.sum,../reference/api/pandas.core.resample.Resampler.sum +generated/pandas.core.resample.Resampler.transform,../reference/api/pandas.core.resample.Resampler.transform +generated/pandas.core.resample.Resampler.var,../reference/api/pandas.core.resample.Resampler.var +generated/pandas.core.window.EWM.corr,../reference/api/pandas.core.window.EWM.corr +generated/pandas.core.window.EWM.cov,../reference/api/pandas.core.window.EWM.cov +generated/pandas.core.window.EWM.mean,../reference/api/pandas.core.window.EWM.mean +generated/pandas.core.window.EWM.std,../reference/api/pandas.core.window.EWM.std +generated/pandas.core.window.EWM.var,../reference/api/pandas.core.window.EWM.var +generated/pandas.core.window.Expanding.aggregate,../reference/api/pandas.core.window.Expanding.aggregate +generated/pandas.core.window.Expanding.apply,../reference/api/pandas.core.window.Expanding.apply +generated/pandas.core.window.Expanding.corr,../reference/api/pandas.core.window.Expanding.corr +generated/pandas.core.window.Expanding.count,../reference/api/pandas.core.window.Expanding.count +generated/pandas.core.window.Expanding.cov,../reference/api/pandas.core.window.Expanding.cov +generated/pandas.core.window.Expanding.kurt,../reference/api/pandas.core.window.Expanding.kurt +generated/pandas.core.window.Expanding.max,../reference/api/pandas.core.window.Expanding.max +generated/pandas.core.window.Expanding.mean,../reference/api/pandas.core.window.Expanding.mean +generated/pandas.core.window.Expanding.median,../reference/api/pandas.core.window.Expanding.median +generated/pandas.core.window.Expanding.min,../reference/api/pandas.core.window.Expanding.min +generated/pandas.core.window.Expanding.quantile,../reference/api/pandas.core.window.Expanding.quantile +generated/pandas.core.window.Expanding.skew,../reference/api/pandas.core.window.Expanding.skew +generated/pandas.core.window.Expanding.std,../reference/api/pandas.core.window.Expanding.std +generated/pandas.core.window.Expanding.sum,../reference/api/pandas.core.window.Expanding.sum +generated/pandas.core.window.Expanding.var,../reference/api/pandas.core.window.Expanding.var +generated/pandas.core.window.Rolling.aggregate,../reference/api/pandas.core.window.Rolling.aggregate +generated/pandas.core.window.Rolling.apply,../reference/api/pandas.core.window.Rolling.apply +generated/pandas.core.window.Rolling.corr,../reference/api/pandas.core.window.Rolling.corr 
+generated/pandas.core.window.Rolling.count,../reference/api/pandas.core.window.Rolling.count +generated/pandas.core.window.Rolling.cov,../reference/api/pandas.core.window.Rolling.cov +generated/pandas.core.window.Rolling.kurt,../reference/api/pandas.core.window.Rolling.kurt +generated/pandas.core.window.Rolling.max,../reference/api/pandas.core.window.Rolling.max +generated/pandas.core.window.Rolling.mean,../reference/api/pandas.core.window.Rolling.mean +generated/pandas.core.window.Rolling.median,../reference/api/pandas.core.window.Rolling.median +generated/pandas.core.window.Rolling.min,../reference/api/pandas.core.window.Rolling.min +generated/pandas.core.window.Rolling.quantile,../reference/api/pandas.core.window.Rolling.quantile +generated/pandas.core.window.Rolling.skew,../reference/api/pandas.core.window.Rolling.skew +generated/pandas.core.window.Rolling.std,../reference/api/pandas.core.window.Rolling.std +generated/pandas.core.window.Rolling.sum,../reference/api/pandas.core.window.Rolling.sum +generated/pandas.core.window.Rolling.var,../reference/api/pandas.core.window.Rolling.var +generated/pandas.core.window.Window.mean,../reference/api/pandas.core.window.Window.mean +generated/pandas.core.window.Window.sum,../reference/api/pandas.core.window.Window.sum +generated/pandas.crosstab,../reference/api/pandas.crosstab +generated/pandas.cut,../reference/api/pandas.cut +generated/pandas.DataFrame.abs,../reference/api/pandas.DataFrame.abs +generated/pandas.DataFrame.add,../reference/api/pandas.DataFrame.add +generated/pandas.DataFrame.add_prefix,../reference/api/pandas.DataFrame.add_prefix +generated/pandas.DataFrame.add_suffix,../reference/api/pandas.DataFrame.add_suffix +generated/pandas.DataFrame.agg,../reference/api/pandas.DataFrame.agg +generated/pandas.DataFrame.aggregate,../reference/api/pandas.DataFrame.aggregate +generated/pandas.DataFrame.align,../reference/api/pandas.DataFrame.align +generated/pandas.DataFrame.all,../reference/api/pandas.DataFrame.all +generated/pandas.DataFrame.any,../reference/api/pandas.DataFrame.any +generated/pandas.DataFrame.append,../reference/api/pandas.DataFrame.append +generated/pandas.DataFrame.apply,../reference/api/pandas.DataFrame.apply +generated/pandas.DataFrame.applymap,../reference/api/pandas.DataFrame.applymap +generated/pandas.DataFrame.as_blocks,../reference/api/pandas.DataFrame.as_blocks +generated/pandas.DataFrame.asfreq,../reference/api/pandas.DataFrame.asfreq +generated/pandas.DataFrame.as_matrix,../reference/api/pandas.DataFrame.as_matrix +generated/pandas.DataFrame.asof,../reference/api/pandas.DataFrame.asof +generated/pandas.DataFrame.assign,../reference/api/pandas.DataFrame.assign +generated/pandas.DataFrame.astype,../reference/api/pandas.DataFrame.astype +generated/pandas.DataFrame.at,../reference/api/pandas.DataFrame.at +generated/pandas.DataFrame.at_time,../reference/api/pandas.DataFrame.at_time +generated/pandas.DataFrame.axes,../reference/api/pandas.DataFrame.axes +generated/pandas.DataFrame.between_time,../reference/api/pandas.DataFrame.between_time +generated/pandas.DataFrame.bfill,../reference/api/pandas.DataFrame.bfill +generated/pandas.DataFrame.blocks,../reference/api/pandas.DataFrame.blocks +generated/pandas.DataFrame.bool,../reference/api/pandas.DataFrame.bool +generated/pandas.DataFrame.boxplot,../reference/api/pandas.DataFrame.boxplot +generated/pandas.DataFrame.clip,../reference/api/pandas.DataFrame.clip +generated/pandas.DataFrame.clip_lower,../reference/api/pandas.DataFrame.clip_lower 
+generated/pandas.DataFrame.clip_upper,../reference/api/pandas.DataFrame.clip_upper +generated/pandas.DataFrame.columns,../reference/api/pandas.DataFrame.columns +generated/pandas.DataFrame.combine_first,../reference/api/pandas.DataFrame.combine_first +generated/pandas.DataFrame.combine,../reference/api/pandas.DataFrame.combine +generated/pandas.DataFrame.compound,../reference/api/pandas.DataFrame.compound +generated/pandas.DataFrame.convert_objects,../reference/api/pandas.DataFrame.convert_objects +generated/pandas.DataFrame.copy,../reference/api/pandas.DataFrame.copy +generated/pandas.DataFrame.corr,../reference/api/pandas.DataFrame.corr +generated/pandas.DataFrame.corrwith,../reference/api/pandas.DataFrame.corrwith +generated/pandas.DataFrame.count,../reference/api/pandas.DataFrame.count +generated/pandas.DataFrame.cov,../reference/api/pandas.DataFrame.cov +generated/pandas.DataFrame.cummax,../reference/api/pandas.DataFrame.cummax +generated/pandas.DataFrame.cummin,../reference/api/pandas.DataFrame.cummin +generated/pandas.DataFrame.cumprod,../reference/api/pandas.DataFrame.cumprod +generated/pandas.DataFrame.cumsum,../reference/api/pandas.DataFrame.cumsum +generated/pandas.DataFrame.describe,../reference/api/pandas.DataFrame.describe +generated/pandas.DataFrame.diff,../reference/api/pandas.DataFrame.diff +generated/pandas.DataFrame.div,../reference/api/pandas.DataFrame.div +generated/pandas.DataFrame.divide,../reference/api/pandas.DataFrame.divide +generated/pandas.DataFrame.dot,../reference/api/pandas.DataFrame.dot +generated/pandas.DataFrame.drop_duplicates,../reference/api/pandas.DataFrame.drop_duplicates +generated/pandas.DataFrame.drop,../reference/api/pandas.DataFrame.drop +generated/pandas.DataFrame.droplevel,../reference/api/pandas.DataFrame.droplevel +generated/pandas.DataFrame.dropna,../reference/api/pandas.DataFrame.dropna +generated/pandas.DataFrame.dtypes,../reference/api/pandas.DataFrame.dtypes +generated/pandas.DataFrame.duplicated,../reference/api/pandas.DataFrame.duplicated +generated/pandas.DataFrame.empty,../reference/api/pandas.DataFrame.empty +generated/pandas.DataFrame.eq,../reference/api/pandas.DataFrame.eq +generated/pandas.DataFrame.equals,../reference/api/pandas.DataFrame.equals +generated/pandas.DataFrame.eval,../reference/api/pandas.DataFrame.eval +generated/pandas.DataFrame.ewm,../reference/api/pandas.DataFrame.ewm +generated/pandas.DataFrame.expanding,../reference/api/pandas.DataFrame.expanding +generated/pandas.DataFrame.ffill,../reference/api/pandas.DataFrame.ffill +generated/pandas.DataFrame.fillna,../reference/api/pandas.DataFrame.fillna +generated/pandas.DataFrame.filter,../reference/api/pandas.DataFrame.filter +generated/pandas.DataFrame.first,../reference/api/pandas.DataFrame.first +generated/pandas.DataFrame.first_valid_index,../reference/api/pandas.DataFrame.first_valid_index +generated/pandas.DataFrame.floordiv,../reference/api/pandas.DataFrame.floordiv +generated/pandas.DataFrame.from_csv,../reference/api/pandas.DataFrame.from_csv +generated/pandas.DataFrame.from_dict,../reference/api/pandas.DataFrame.from_dict +generated/pandas.DataFrame.from_items,../reference/api/pandas.DataFrame.from_items +generated/pandas.DataFrame.from_records,../reference/api/pandas.DataFrame.from_records +generated/pandas.DataFrame.ftypes,../reference/api/pandas.DataFrame.ftypes +generated/pandas.DataFrame.ge,../reference/api/pandas.DataFrame.ge +generated/pandas.DataFrame.get_dtype_counts,../reference/api/pandas.DataFrame.get_dtype_counts 
+generated/pandas.DataFrame.get_ftype_counts,../reference/api/pandas.DataFrame.get_ftype_counts +generated/pandas.DataFrame.get,../reference/api/pandas.DataFrame.get +generated/pandas.DataFrame.get_value,../reference/api/pandas.DataFrame.get_value +generated/pandas.DataFrame.get_values,../reference/api/pandas.DataFrame.get_values +generated/pandas.DataFrame.groupby,../reference/api/pandas.DataFrame.groupby +generated/pandas.DataFrame.gt,../reference/api/pandas.DataFrame.gt +generated/pandas.DataFrame.head,../reference/api/pandas.DataFrame.head +generated/pandas.DataFrame.hist,../reference/api/pandas.DataFrame.hist +generated/pandas.DataFrame,../reference/api/pandas.DataFrame +generated/pandas.DataFrame.iat,../reference/api/pandas.DataFrame.iat +generated/pandas.DataFrame.idxmax,../reference/api/pandas.DataFrame.idxmax +generated/pandas.DataFrame.idxmin,../reference/api/pandas.DataFrame.idxmin +generated/pandas.DataFrame.iloc,../reference/api/pandas.DataFrame.iloc +generated/pandas.DataFrame.index,../reference/api/pandas.DataFrame.index +generated/pandas.DataFrame.infer_objects,../reference/api/pandas.DataFrame.infer_objects +generated/pandas.DataFrame.info,../reference/api/pandas.DataFrame.info +generated/pandas.DataFrame.insert,../reference/api/pandas.DataFrame.insert +generated/pandas.DataFrame.interpolate,../reference/api/pandas.DataFrame.interpolate +generated/pandas.DataFrame.is_copy,../reference/api/pandas.DataFrame.is_copy +generated/pandas.DataFrame.isin,../reference/api/pandas.DataFrame.isin +generated/pandas.DataFrame.isna,../reference/api/pandas.DataFrame.isna +generated/pandas.DataFrame.isnull,../reference/api/pandas.DataFrame.isnull +generated/pandas.DataFrame.items,../reference/api/pandas.DataFrame.items +generated/pandas.DataFrame.__iter__,../reference/api/pandas.DataFrame.__iter__ +generated/pandas.DataFrame.iteritems,../reference/api/pandas.DataFrame.iteritems +generated/pandas.DataFrame.iterrows,../reference/api/pandas.DataFrame.iterrows +generated/pandas.DataFrame.itertuples,../reference/api/pandas.DataFrame.itertuples +generated/pandas.DataFrame.ix,../reference/api/pandas.DataFrame.ix +generated/pandas.DataFrame.join,../reference/api/pandas.DataFrame.join +generated/pandas.DataFrame.keys,../reference/api/pandas.DataFrame.keys +generated/pandas.DataFrame.kurt,../reference/api/pandas.DataFrame.kurt +generated/pandas.DataFrame.kurtosis,../reference/api/pandas.DataFrame.kurtosis +generated/pandas.DataFrame.last,../reference/api/pandas.DataFrame.last +generated/pandas.DataFrame.last_valid_index,../reference/api/pandas.DataFrame.last_valid_index +generated/pandas.DataFrame.le,../reference/api/pandas.DataFrame.le +generated/pandas.DataFrame.loc,../reference/api/pandas.DataFrame.loc +generated/pandas.DataFrame.lookup,../reference/api/pandas.DataFrame.lookup +generated/pandas.DataFrame.lt,../reference/api/pandas.DataFrame.lt +generated/pandas.DataFrame.mad,../reference/api/pandas.DataFrame.mad +generated/pandas.DataFrame.mask,../reference/api/pandas.DataFrame.mask +generated/pandas.DataFrame.max,../reference/api/pandas.DataFrame.max +generated/pandas.DataFrame.mean,../reference/api/pandas.DataFrame.mean +generated/pandas.DataFrame.median,../reference/api/pandas.DataFrame.median +generated/pandas.DataFrame.melt,../reference/api/pandas.DataFrame.melt +generated/pandas.DataFrame.memory_usage,../reference/api/pandas.DataFrame.memory_usage +generated/pandas.DataFrame.merge,../reference/api/pandas.DataFrame.merge +generated/pandas.DataFrame.min,../reference/api/pandas.DataFrame.min 
+generated/pandas.DataFrame.mode,../reference/api/pandas.DataFrame.mode +generated/pandas.DataFrame.mod,../reference/api/pandas.DataFrame.mod +generated/pandas.DataFrame.mul,../reference/api/pandas.DataFrame.mul +generated/pandas.DataFrame.multiply,../reference/api/pandas.DataFrame.multiply +generated/pandas.DataFrame.ndim,../reference/api/pandas.DataFrame.ndim +generated/pandas.DataFrame.ne,../reference/api/pandas.DataFrame.ne +generated/pandas.DataFrame.nlargest,../reference/api/pandas.DataFrame.nlargest +generated/pandas.DataFrame.notna,../reference/api/pandas.DataFrame.notna +generated/pandas.DataFrame.notnull,../reference/api/pandas.DataFrame.notnull +generated/pandas.DataFrame.nsmallest,../reference/api/pandas.DataFrame.nsmallest +generated/pandas.DataFrame.nunique,../reference/api/pandas.DataFrame.nunique +generated/pandas.DataFrame.pct_change,../reference/api/pandas.DataFrame.pct_change +generated/pandas.DataFrame.pipe,../reference/api/pandas.DataFrame.pipe +generated/pandas.DataFrame.pivot,../reference/api/pandas.DataFrame.pivot +generated/pandas.DataFrame.pivot_table,../reference/api/pandas.DataFrame.pivot_table +generated/pandas.DataFrame.plot.barh,../reference/api/pandas.DataFrame.plot.barh +generated/pandas.DataFrame.plot.bar,../reference/api/pandas.DataFrame.plot.bar +generated/pandas.DataFrame.plot.box,../reference/api/pandas.DataFrame.plot.box +generated/pandas.DataFrame.plot.density,../reference/api/pandas.DataFrame.plot.density +generated/pandas.DataFrame.plot.hexbin,../reference/api/pandas.DataFrame.plot.hexbin +generated/pandas.DataFrame.plot.hist,../reference/api/pandas.DataFrame.plot.hist +generated/pandas.DataFrame.plot,../reference/api/pandas.DataFrame.plot +generated/pandas.DataFrame.plot.kde,../reference/api/pandas.DataFrame.plot.kde +generated/pandas.DataFrame.plot.line,../reference/api/pandas.DataFrame.plot.line +generated/pandas.DataFrame.plot.pie,../reference/api/pandas.DataFrame.plot.pie +generated/pandas.DataFrame.plot.scatter,../reference/api/pandas.DataFrame.plot.scatter +generated/pandas.DataFrame.pop,../reference/api/pandas.DataFrame.pop +generated/pandas.DataFrame.pow,../reference/api/pandas.DataFrame.pow +generated/pandas.DataFrame.prod,../reference/api/pandas.DataFrame.prod +generated/pandas.DataFrame.product,../reference/api/pandas.DataFrame.product +generated/pandas.DataFrame.quantile,../reference/api/pandas.DataFrame.quantile +generated/pandas.DataFrame.query,../reference/api/pandas.DataFrame.query +generated/pandas.DataFrame.radd,../reference/api/pandas.DataFrame.radd +generated/pandas.DataFrame.rank,../reference/api/pandas.DataFrame.rank +generated/pandas.DataFrame.rdiv,../reference/api/pandas.DataFrame.rdiv +generated/pandas.DataFrame.reindex_axis,../reference/api/pandas.DataFrame.reindex_axis +generated/pandas.DataFrame.reindex,../reference/api/pandas.DataFrame.reindex +generated/pandas.DataFrame.reindex_like,../reference/api/pandas.DataFrame.reindex_like +generated/pandas.DataFrame.rename_axis,../reference/api/pandas.DataFrame.rename_axis +generated/pandas.DataFrame.rename,../reference/api/pandas.DataFrame.rename +generated/pandas.DataFrame.reorder_levels,../reference/api/pandas.DataFrame.reorder_levels +generated/pandas.DataFrame.replace,../reference/api/pandas.DataFrame.replace +generated/pandas.DataFrame.resample,../reference/api/pandas.DataFrame.resample +generated/pandas.DataFrame.reset_index,../reference/api/pandas.DataFrame.reset_index +generated/pandas.DataFrame.rfloordiv,../reference/api/pandas.DataFrame.rfloordiv 
+generated/pandas.DataFrame.rmod,../reference/api/pandas.DataFrame.rmod
+generated/pandas.DataFrame.rmul,../reference/api/pandas.DataFrame.rmul
+generated/pandas.DataFrame.rolling,../reference/api/pandas.DataFrame.rolling
+generated/pandas.DataFrame.round,../reference/api/pandas.DataFrame.round
+generated/pandas.DataFrame.rpow,../reference/api/pandas.DataFrame.rpow
+generated/pandas.DataFrame.rsub,../reference/api/pandas.DataFrame.rsub
+generated/pandas.DataFrame.rtruediv,../reference/api/pandas.DataFrame.rtruediv
+generated/pandas.DataFrame.sample,../reference/api/pandas.DataFrame.sample
+generated/pandas.DataFrame.select_dtypes,../reference/api/pandas.DataFrame.select_dtypes
+generated/pandas.DataFrame.select,../reference/api/pandas.DataFrame.select
+generated/pandas.DataFrame.sem,../reference/api/pandas.DataFrame.sem
+generated/pandas.DataFrame.set_axis,../reference/api/pandas.DataFrame.set_axis
+generated/pandas.DataFrame.set_index,../reference/api/pandas.DataFrame.set_index
+generated/pandas.DataFrame.set_value,../reference/api/pandas.DataFrame.set_value
+generated/pandas.DataFrame.shape,../reference/api/pandas.DataFrame.shape
+generated/pandas.DataFrame.shift,../reference/api/pandas.DataFrame.shift
+generated/pandas.DataFrame.size,../reference/api/pandas.DataFrame.size
+generated/pandas.DataFrame.skew,../reference/api/pandas.DataFrame.skew
+generated/pandas.DataFrame.slice_shift,../reference/api/pandas.DataFrame.slice_shift
+generated/pandas.DataFrame.sort_index,../reference/api/pandas.DataFrame.sort_index
+generated/pandas.DataFrame.sort_values,../reference/api/pandas.DataFrame.sort_values
+generated/pandas.DataFrame.squeeze,../reference/api/pandas.DataFrame.squeeze
+generated/pandas.DataFrame.stack,../reference/api/pandas.DataFrame.stack
+generated/pandas.DataFrame.std,../reference/api/pandas.DataFrame.std
+generated/pandas.DataFrame.style,../reference/api/pandas.DataFrame.style
+generated/pandas.DataFrame.sub,../reference/api/pandas.DataFrame.sub
+generated/pandas.DataFrame.subtract,../reference/api/pandas.DataFrame.subtract
+generated/pandas.DataFrame.sum,../reference/api/pandas.DataFrame.sum
+generated/pandas.DataFrame.swapaxes,../reference/api/pandas.DataFrame.swapaxes
+generated/pandas.DataFrame.swaplevel,../reference/api/pandas.DataFrame.swaplevel
+generated/pandas.DataFrame.tail,../reference/api/pandas.DataFrame.tail
+generated/pandas.DataFrame.take,../reference/api/pandas.DataFrame.take
+generated/pandas.DataFrame.T,../reference/api/pandas.DataFrame.T
+generated/pandas.DataFrame.timetuple,../reference/api/pandas.DataFrame.timetuple
+generated/pandas.DataFrame.to_clipboard,../reference/api/pandas.DataFrame.to_clipboard
+generated/pandas.DataFrame.to_csv,../reference/api/pandas.DataFrame.to_csv
+generated/pandas.DataFrame.to_dense,../reference/api/pandas.DataFrame.to_dense
+generated/pandas.DataFrame.to_dict,../reference/api/pandas.DataFrame.to_dict
+generated/pandas.DataFrame.to_excel,../reference/api/pandas.DataFrame.to_excel
+generated/pandas.DataFrame.to_feather,../reference/api/pandas.DataFrame.to_feather
+generated/pandas.DataFrame.to_gbq,../reference/api/pandas.DataFrame.to_gbq
+generated/pandas.DataFrame.to_hdf,../reference/api/pandas.DataFrame.to_hdf
+generated/pandas.DataFrame.to_html,../reference/api/pandas.DataFrame.to_html
+generated/pandas.DataFrame.to_json,../reference/api/pandas.DataFrame.to_json
+generated/pandas.DataFrame.to_latex,../reference/api/pandas.DataFrame.to_latex
+generated/pandas.DataFrame.to_msgpack,../reference/api/pandas.DataFrame.to_msgpack
+generated/pandas.DataFrame.to_numpy,../reference/api/pandas.DataFrame.to_numpy
+generated/pandas.DataFrame.to_panel,../reference/api/pandas.DataFrame.to_panel
+generated/pandas.DataFrame.to_parquet,../reference/api/pandas.DataFrame.to_parquet
+generated/pandas.DataFrame.to_period,../reference/api/pandas.DataFrame.to_period
+generated/pandas.DataFrame.to_pickle,../reference/api/pandas.DataFrame.to_pickle
+generated/pandas.DataFrame.to_records,../reference/api/pandas.DataFrame.to_records
+generated/pandas.DataFrame.to_sparse,../reference/api/pandas.DataFrame.to_sparse
+generated/pandas.DataFrame.to_sql,../reference/api/pandas.DataFrame.to_sql
+generated/pandas.DataFrame.to_stata,../reference/api/pandas.DataFrame.to_stata
+generated/pandas.DataFrame.to_string,../reference/api/pandas.DataFrame.to_string
+generated/pandas.DataFrame.to_timestamp,../reference/api/pandas.DataFrame.to_timestamp
+generated/pandas.DataFrame.to_xarray,../reference/api/pandas.DataFrame.to_xarray
+generated/pandas.DataFrame.transform,../reference/api/pandas.DataFrame.transform
+generated/pandas.DataFrame.transpose,../reference/api/pandas.DataFrame.transpose
+generated/pandas.DataFrame.truediv,../reference/api/pandas.DataFrame.truediv
+generated/pandas.DataFrame.truncate,../reference/api/pandas.DataFrame.truncate
+generated/pandas.DataFrame.tshift,../reference/api/pandas.DataFrame.tshift
+generated/pandas.DataFrame.tz_convert,../reference/api/pandas.DataFrame.tz_convert
+generated/pandas.DataFrame.tz_localize,../reference/api/pandas.DataFrame.tz_localize
+generated/pandas.DataFrame.unstack,../reference/api/pandas.DataFrame.unstack
+generated/pandas.DataFrame.update,../reference/api/pandas.DataFrame.update
+generated/pandas.DataFrame.values,../reference/api/pandas.DataFrame.values
+generated/pandas.DataFrame.var,../reference/api/pandas.DataFrame.var
+generated/pandas.DataFrame.where,../reference/api/pandas.DataFrame.where
+generated/pandas.DataFrame.xs,../reference/api/pandas.DataFrame.xs
+generated/pandas.date_range,../reference/api/pandas.date_range
+generated/pandas.DatetimeIndex.ceil,../reference/api/pandas.DatetimeIndex.ceil
+generated/pandas.DatetimeIndex.date,../reference/api/pandas.DatetimeIndex.date
+generated/pandas.DatetimeIndex.day,../reference/api/pandas.DatetimeIndex.day
+generated/pandas.DatetimeIndex.day_name,../reference/api/pandas.DatetimeIndex.day_name
+generated/pandas.DatetimeIndex.dayofweek,../reference/api/pandas.DatetimeIndex.dayofweek
+generated/pandas.DatetimeIndex.dayofyear,../reference/api/pandas.DatetimeIndex.dayofyear
+generated/pandas.DatetimeIndex.floor,../reference/api/pandas.DatetimeIndex.floor
+generated/pandas.DatetimeIndex.freq,../reference/api/pandas.DatetimeIndex.freq
+generated/pandas.DatetimeIndex.freqstr,../reference/api/pandas.DatetimeIndex.freqstr
+generated/pandas.DatetimeIndex.hour,../reference/api/pandas.DatetimeIndex.hour
+generated/pandas.DatetimeIndex,../reference/api/pandas.DatetimeIndex
+generated/pandas.DatetimeIndex.indexer_at_time,../reference/api/pandas.DatetimeIndex.indexer_at_time
+generated/pandas.DatetimeIndex.indexer_between_time,../reference/api/pandas.DatetimeIndex.indexer_between_time
+generated/pandas.DatetimeIndex.inferred_freq,../reference/api/pandas.DatetimeIndex.inferred_freq
+generated/pandas.DatetimeIndex.is_leap_year,../reference/api/pandas.DatetimeIndex.is_leap_year
+generated/pandas.DatetimeIndex.is_month_end,../reference/api/pandas.DatetimeIndex.is_month_end
+generated/pandas.DatetimeIndex.is_month_start,../reference/api/pandas.DatetimeIndex.is_month_start
+generated/pandas.DatetimeIndex.is_quarter_end,../reference/api/pandas.DatetimeIndex.is_quarter_end
+generated/pandas.DatetimeIndex.is_quarter_start,../reference/api/pandas.DatetimeIndex.is_quarter_start
+generated/pandas.DatetimeIndex.is_year_end,../reference/api/pandas.DatetimeIndex.is_year_end
+generated/pandas.DatetimeIndex.is_year_start,../reference/api/pandas.DatetimeIndex.is_year_start
+generated/pandas.DatetimeIndex.microsecond,../reference/api/pandas.DatetimeIndex.microsecond
+generated/pandas.DatetimeIndex.minute,../reference/api/pandas.DatetimeIndex.minute
+generated/pandas.DatetimeIndex.month,../reference/api/pandas.DatetimeIndex.month
+generated/pandas.DatetimeIndex.month_name,../reference/api/pandas.DatetimeIndex.month_name
+generated/pandas.DatetimeIndex.nanosecond,../reference/api/pandas.DatetimeIndex.nanosecond
+generated/pandas.DatetimeIndex.normalize,../reference/api/pandas.DatetimeIndex.normalize
+generated/pandas.DatetimeIndex.quarter,../reference/api/pandas.DatetimeIndex.quarter
+generated/pandas.DatetimeIndex.round,../reference/api/pandas.DatetimeIndex.round
+generated/pandas.DatetimeIndex.second,../reference/api/pandas.DatetimeIndex.second
+generated/pandas.DatetimeIndex.snap,../reference/api/pandas.DatetimeIndex.snap
+generated/pandas.DatetimeIndex.strftime,../reference/api/pandas.DatetimeIndex.strftime
+generated/pandas.DatetimeIndex.time,../reference/api/pandas.DatetimeIndex.time
+generated/pandas.DatetimeIndex.timetz,../reference/api/pandas.DatetimeIndex.timetz
+generated/pandas.DatetimeIndex.to_frame,../reference/api/pandas.DatetimeIndex.to_frame
+generated/pandas.DatetimeIndex.to_perioddelta,../reference/api/pandas.DatetimeIndex.to_perioddelta
+generated/pandas.DatetimeIndex.to_period,../reference/api/pandas.DatetimeIndex.to_period
+generated/pandas.DatetimeIndex.to_pydatetime,../reference/api/pandas.DatetimeIndex.to_pydatetime
+generated/pandas.DatetimeIndex.to_series,../reference/api/pandas.DatetimeIndex.to_series
+generated/pandas.DatetimeIndex.tz_convert,../reference/api/pandas.DatetimeIndex.tz_convert
+generated/pandas.DatetimeIndex.tz,../reference/api/pandas.DatetimeIndex.tz
+generated/pandas.DatetimeIndex.tz_localize,../reference/api/pandas.DatetimeIndex.tz_localize
+generated/pandas.DatetimeIndex.weekday,../reference/api/pandas.DatetimeIndex.weekday
+generated/pandas.DatetimeIndex.week,../reference/api/pandas.DatetimeIndex.week
+generated/pandas.DatetimeIndex.weekofyear,../reference/api/pandas.DatetimeIndex.weekofyear
+generated/pandas.DatetimeIndex.year,../reference/api/pandas.DatetimeIndex.year
+generated/pandas.DatetimeTZDtype.base,../reference/api/pandas.DatetimeTZDtype.base
+generated/pandas.DatetimeTZDtype.construct_array_type,../reference/api/pandas.DatetimeTZDtype.construct_array_type
+generated/pandas.DatetimeTZDtype.construct_from_string,../reference/api/pandas.DatetimeTZDtype.construct_from_string
+generated/pandas.DatetimeTZDtype,../reference/api/pandas.DatetimeTZDtype
+generated/pandas.DatetimeTZDtype.isbuiltin,../reference/api/pandas.DatetimeTZDtype.isbuiltin
+generated/pandas.DatetimeTZDtype.is_dtype,../reference/api/pandas.DatetimeTZDtype.is_dtype
+generated/pandas.DatetimeTZDtype.isnative,../reference/api/pandas.DatetimeTZDtype.isnative
+generated/pandas.DatetimeTZDtype.itemsize,../reference/api/pandas.DatetimeTZDtype.itemsize
+generated/pandas.DatetimeTZDtype.kind,../reference/api/pandas.DatetimeTZDtype.kind
+generated/pandas.DatetimeTZDtype.name,../reference/api/pandas.DatetimeTZDtype.name
+generated/pandas.DatetimeTZDtype.names,../reference/api/pandas.DatetimeTZDtype.names
+generated/pandas.DatetimeTZDtype.na_value,../reference/api/pandas.DatetimeTZDtype.na_value
+generated/pandas.DatetimeTZDtype.num,../reference/api/pandas.DatetimeTZDtype.num
+generated/pandas.DatetimeTZDtype.reset_cache,../reference/api/pandas.DatetimeTZDtype.reset_cache
+generated/pandas.DatetimeTZDtype.shape,../reference/api/pandas.DatetimeTZDtype.shape
+generated/pandas.DatetimeTZDtype.str,../reference/api/pandas.DatetimeTZDtype.str
+generated/pandas.DatetimeTZDtype.subdtype,../reference/api/pandas.DatetimeTZDtype.subdtype
+generated/pandas.DatetimeTZDtype.tz,../reference/api/pandas.DatetimeTZDtype.tz
+generated/pandas.DatetimeTZDtype.unit,../reference/api/pandas.DatetimeTZDtype.unit
+generated/pandas.describe_option,../reference/api/pandas.describe_option
+generated/pandas.errors.DtypeWarning,../reference/api/pandas.errors.DtypeWarning
+generated/pandas.errors.EmptyDataError,../reference/api/pandas.errors.EmptyDataError
+generated/pandas.errors.OutOfBoundsDatetime,../reference/api/pandas.errors.OutOfBoundsDatetime
+generated/pandas.errors.ParserError,../reference/api/pandas.errors.ParserError
+generated/pandas.errors.ParserWarning,../reference/api/pandas.errors.ParserWarning
+generated/pandas.errors.PerformanceWarning,../reference/api/pandas.errors.PerformanceWarning
+generated/pandas.errors.UnsortedIndexError,../reference/api/pandas.errors.UnsortedIndexError
+generated/pandas.errors.UnsupportedFunctionCall,../reference/api/pandas.errors.UnsupportedFunctionCall
+generated/pandas.eval,../reference/api/pandas.eval
+generated/pandas.ExcelFile.parse,../reference/api/pandas.ExcelFile.parse
+generated/pandas.ExcelWriter,../reference/api/pandas.ExcelWriter
+generated/pandas.factorize,../reference/api/pandas.factorize
+generated/pandas.Float64Index,../reference/api/pandas.Float64Index
+generated/pandas.get_dummies,../reference/api/pandas.get_dummies
+generated/pandas.get_option,../reference/api/pandas.get_option
+generated/pandas.Grouper,../reference/api/pandas.Grouper
+generated/pandas.HDFStore.append,../reference/api/pandas.HDFStore.append
+generated/pandas.HDFStore.get,../reference/api/pandas.HDFStore.get
+generated/pandas.HDFStore.groups,../reference/api/pandas.HDFStore.groups
+generated/pandas.HDFStore.info,../reference/api/pandas.HDFStore.info
+generated/pandas.HDFStore.keys,../reference/api/pandas.HDFStore.keys
+generated/pandas.HDFStore.put,../reference/api/pandas.HDFStore.put
+generated/pandas.HDFStore.select,../reference/api/pandas.HDFStore.select
+generated/pandas.HDFStore.walk,../reference/api/pandas.HDFStore.walk
+generated/pandas.Index.all,../reference/api/pandas.Index.all
+generated/pandas.Index.any,../reference/api/pandas.Index.any
+generated/pandas.Index.append,../reference/api/pandas.Index.append
+generated/pandas.Index.argmax,../reference/api/pandas.Index.argmax
+generated/pandas.Index.argmin,../reference/api/pandas.Index.argmin
+generated/pandas.Index.argsort,../reference/api/pandas.Index.argsort
+generated/pandas.Index.array,../reference/api/pandas.Index.array
+generated/pandas.Index.asi8,../reference/api/pandas.Index.asi8
+generated/pandas.Index.asof,../reference/api/pandas.Index.asof
+generated/pandas.Index.asof_locs,../reference/api/pandas.Index.asof_locs
+generated/pandas.Index.astype,../reference/api/pandas.Index.astype
+generated/pandas.Index.base,../reference/api/pandas.Index.base
+generated/pandas.Index.contains,../reference/api/pandas.Index.contains
+generated/pandas.Index.copy,../reference/api/pandas.Index.copy
+generated/pandas.Index.data,../reference/api/pandas.Index.data
+generated/pandas.Index.delete,../reference/api/pandas.Index.delete
+generated/pandas.Index.difference,../reference/api/pandas.Index.difference
+generated/pandas.Index.drop_duplicates,../reference/api/pandas.Index.drop_duplicates
+generated/pandas.Index.drop,../reference/api/pandas.Index.drop
+generated/pandas.Index.droplevel,../reference/api/pandas.Index.droplevel
+generated/pandas.Index.dropna,../reference/api/pandas.Index.dropna
+generated/pandas.Index.dtype,../reference/api/pandas.Index.dtype
+generated/pandas.Index.dtype_str,../reference/api/pandas.Index.dtype_str
+generated/pandas.Index.duplicated,../reference/api/pandas.Index.duplicated
+generated/pandas.Index.empty,../reference/api/pandas.Index.empty
+generated/pandas.Index.equals,../reference/api/pandas.Index.equals
+generated/pandas.Index.factorize,../reference/api/pandas.Index.factorize
+generated/pandas.Index.fillna,../reference/api/pandas.Index.fillna
+generated/pandas.Index.flags,../reference/api/pandas.Index.flags
+generated/pandas.Index.format,../reference/api/pandas.Index.format
+generated/pandas.Index.get_duplicates,../reference/api/pandas.Index.get_duplicates
+generated/pandas.Index.get_indexer_for,../reference/api/pandas.Index.get_indexer_for
+generated/pandas.Index.get_indexer,../reference/api/pandas.Index.get_indexer
+generated/pandas.Index.get_indexer_non_unique,../reference/api/pandas.Index.get_indexer_non_unique
+generated/pandas.Index.get_level_values,../reference/api/pandas.Index.get_level_values
+generated/pandas.Index.get_loc,../reference/api/pandas.Index.get_loc
+generated/pandas.Index.get_slice_bound,../reference/api/pandas.Index.get_slice_bound
+generated/pandas.Index.get_value,../reference/api/pandas.Index.get_value
+generated/pandas.Index.get_values,../reference/api/pandas.Index.get_values
+generated/pandas.Index.groupby,../reference/api/pandas.Index.groupby
+generated/pandas.Index.has_duplicates,../reference/api/pandas.Index.has_duplicates
+generated/pandas.Index.hasnans,../reference/api/pandas.Index.hasnans
+generated/pandas.Index.holds_integer,../reference/api/pandas.Index.holds_integer
+generated/pandas.Index,../reference/api/pandas.Index
+generated/pandas.Index.identical,../reference/api/pandas.Index.identical
+generated/pandas.Index.inferred_type,../reference/api/pandas.Index.inferred_type
+generated/pandas.Index.insert,../reference/api/pandas.Index.insert
+generated/pandas.Index.intersection,../reference/api/pandas.Index.intersection
+generated/pandas.Index.is_all_dates,../reference/api/pandas.Index.is_all_dates
+generated/pandas.Index.is_boolean,../reference/api/pandas.Index.is_boolean
+generated/pandas.Index.is_categorical,../reference/api/pandas.Index.is_categorical
+generated/pandas.Index.is_floating,../reference/api/pandas.Index.is_floating
+generated/pandas.Index.is_,../reference/api/pandas.Index.is_
+generated/pandas.Index.isin,../reference/api/pandas.Index.isin
+generated/pandas.Index.is_integer,../reference/api/pandas.Index.is_integer
+generated/pandas.Index.is_interval,../reference/api/pandas.Index.is_interval
+generated/pandas.Index.is_lexsorted_for_tuple,../reference/api/pandas.Index.is_lexsorted_for_tuple
+generated/pandas.Index.is_mixed,../reference/api/pandas.Index.is_mixed
+generated/pandas.Index.is_monotonic_decreasing,../reference/api/pandas.Index.is_monotonic_decreasing
+generated/pandas.Index.is_monotonic,../reference/api/pandas.Index.is_monotonic
+generated/pandas.Index.is_monotonic_increasing,../reference/api/pandas.Index.is_monotonic_increasing
+generated/pandas.Index.isna,../reference/api/pandas.Index.isna
+generated/pandas.Index.isnull,../reference/api/pandas.Index.isnull
+generated/pandas.Index.is_numeric,../reference/api/pandas.Index.is_numeric
+generated/pandas.Index.is_object,../reference/api/pandas.Index.is_object
+generated/pandas.Index.is_type_compatible,../reference/api/pandas.Index.is_type_compatible
+generated/pandas.Index.is_unique,../reference/api/pandas.Index.is_unique
+generated/pandas.Index.item,../reference/api/pandas.Index.item
+generated/pandas.Index.itemsize,../reference/api/pandas.Index.itemsize
+generated/pandas.Index.join,../reference/api/pandas.Index.join
+generated/pandas.Index.map,../reference/api/pandas.Index.map
+generated/pandas.Index.max,../reference/api/pandas.Index.max
+generated/pandas.Index.memory_usage,../reference/api/pandas.Index.memory_usage
+generated/pandas.Index.min,../reference/api/pandas.Index.min
+generated/pandas.Index.name,../reference/api/pandas.Index.name
+generated/pandas.Index.names,../reference/api/pandas.Index.names
+generated/pandas.Index.nbytes,../reference/api/pandas.Index.nbytes
+generated/pandas.Index.ndim,../reference/api/pandas.Index.ndim
+generated/pandas.Index.nlevels,../reference/api/pandas.Index.nlevels
+generated/pandas.Index.notna,../reference/api/pandas.Index.notna
+generated/pandas.Index.notnull,../reference/api/pandas.Index.notnull
+generated/pandas.Index.nunique,../reference/api/pandas.Index.nunique
+generated/pandas.Index.putmask,../reference/api/pandas.Index.putmask
+generated/pandas.Index.ravel,../reference/api/pandas.Index.ravel
+generated/pandas.Index.reindex,../reference/api/pandas.Index.reindex
+generated/pandas.Index.rename,../reference/api/pandas.Index.rename
+generated/pandas.Index.repeat,../reference/api/pandas.Index.repeat
+generated/pandas.Index.searchsorted,../reference/api/pandas.Index.searchsorted
+generated/pandas.Index.set_names,../reference/api/pandas.Index.set_names
+generated/pandas.Index.set_value,../reference/api/pandas.Index.set_value
+generated/pandas.Index.shape,../reference/api/pandas.Index.shape
+generated/pandas.Index.shift,../reference/api/pandas.Index.shift
+generated/pandas.Index.size,../reference/api/pandas.Index.size
+generated/pandas.IndexSlice,../reference/api/pandas.IndexSlice
+generated/pandas.Index.slice_indexer,../reference/api/pandas.Index.slice_indexer
+generated/pandas.Index.slice_locs,../reference/api/pandas.Index.slice_locs
+generated/pandas.Index.sort,../reference/api/pandas.Index.sort
+generated/pandas.Index.sortlevel,../reference/api/pandas.Index.sortlevel
+generated/pandas.Index.sort_values,../reference/api/pandas.Index.sort_values
+generated/pandas.Index.str,../reference/api/pandas.Index.str
+generated/pandas.Index.strides,../reference/api/pandas.Index.strides
+generated/pandas.Index.summary,../reference/api/pandas.Index.summary
+generated/pandas.Index.symmetric_difference,../reference/api/pandas.Index.symmetric_difference
+generated/pandas.Index.take,../reference/api/pandas.Index.take
+generated/pandas.Index.T,../reference/api/pandas.Index.T
+generated/pandas.Index.to_flat_index,../reference/api/pandas.Index.to_flat_index
+generated/pandas.Index.to_frame,../reference/api/pandas.Index.to_frame
+generated/pandas.Index.to_list,../reference/api/pandas.Index.to_list
+generated/pandas.Index.tolist,../reference/api/pandas.Index.tolist
+generated/pandas.Index.to_native_types,../reference/api/pandas.Index.to_native_types
+generated/pandas.Index.to_numpy,../reference/api/pandas.Index.to_numpy
+generated/pandas.Index.to_series,../reference/api/pandas.Index.to_series
+generated/pandas.Index.transpose,../reference/api/pandas.Index.transpose
+generated/pandas.Index.union,../reference/api/pandas.Index.union
+generated/pandas.Index.unique,../reference/api/pandas.Index.unique
+generated/pandas.Index.value_counts,../reference/api/pandas.Index.value_counts
+generated/pandas.Index.values,../reference/api/pandas.Index.values
+generated/pandas.Index.view,../reference/api/pandas.Index.view
+generated/pandas.Index.where,../reference/api/pandas.Index.where
+generated/pandas.infer_freq,../reference/api/pandas.infer_freq
+generated/pandas.Interval.closed,../reference/api/pandas.Interval.closed
+generated/pandas.Interval.closed_left,../reference/api/pandas.Interval.closed_left
+generated/pandas.Interval.closed_right,../reference/api/pandas.Interval.closed_right
+generated/pandas.Interval,../reference/api/pandas.Interval
+generated/pandas.IntervalIndex.closed,../reference/api/pandas.IntervalIndex.closed
+generated/pandas.IntervalIndex.contains,../reference/api/pandas.IntervalIndex.contains
+generated/pandas.IntervalIndex.from_arrays,../reference/api/pandas.IntervalIndex.from_arrays
+generated/pandas.IntervalIndex.from_breaks,../reference/api/pandas.IntervalIndex.from_breaks
+generated/pandas.IntervalIndex.from_tuples,../reference/api/pandas.IntervalIndex.from_tuples
+generated/pandas.IntervalIndex.get_indexer,../reference/api/pandas.IntervalIndex.get_indexer
+generated/pandas.IntervalIndex.get_loc,../reference/api/pandas.IntervalIndex.get_loc
+generated/pandas.IntervalIndex,../reference/api/pandas.IntervalIndex
+generated/pandas.IntervalIndex.is_non_overlapping_monotonic,../reference/api/pandas.IntervalIndex.is_non_overlapping_monotonic
+generated/pandas.IntervalIndex.is_overlapping,../reference/api/pandas.IntervalIndex.is_overlapping
+generated/pandas.IntervalIndex.left,../reference/api/pandas.IntervalIndex.left
+generated/pandas.IntervalIndex.length,../reference/api/pandas.IntervalIndex.length
+generated/pandas.IntervalIndex.mid,../reference/api/pandas.IntervalIndex.mid
+generated/pandas.IntervalIndex.overlaps,../reference/api/pandas.IntervalIndex.overlaps
+generated/pandas.IntervalIndex.right,../reference/api/pandas.IntervalIndex.right
+generated/pandas.IntervalIndex.set_closed,../reference/api/pandas.IntervalIndex.set_closed
+generated/pandas.IntervalIndex.to_tuples,../reference/api/pandas.IntervalIndex.to_tuples
+generated/pandas.IntervalIndex.values,../reference/api/pandas.IntervalIndex.values
+generated/pandas.Interval.left,../reference/api/pandas.Interval.left
+generated/pandas.Interval.length,../reference/api/pandas.Interval.length
+generated/pandas.Interval.mid,../reference/api/pandas.Interval.mid
+generated/pandas.Interval.open_left,../reference/api/pandas.Interval.open_left
+generated/pandas.Interval.open_right,../reference/api/pandas.Interval.open_right
+generated/pandas.Interval.overlaps,../reference/api/pandas.Interval.overlaps
+generated/pandas.interval_range,../reference/api/pandas.interval_range
+generated/pandas.Interval.right,../reference/api/pandas.Interval.right
+generated/pandas.io.formats.style.Styler.apply,../reference/api/pandas.io.formats.style.Styler.apply
+generated/pandas.io.formats.style.Styler.applymap,../reference/api/pandas.io.formats.style.Styler.applymap
+generated/pandas.io.formats.style.Styler.background_gradient,../reference/api/pandas.io.formats.style.Styler.background_gradient
+generated/pandas.io.formats.style.Styler.bar,../reference/api/pandas.io.formats.style.Styler.bar
+generated/pandas.io.formats.style.Styler.clear,../reference/api/pandas.io.formats.style.Styler.clear
+generated/pandas.io.formats.style.Styler.env,../reference/api/pandas.io.formats.style.Styler.env
+generated/pandas.io.formats.style.Styler.export,../reference/api/pandas.io.formats.style.Styler.export
+generated/pandas.io.formats.style.Styler.format,../reference/api/pandas.io.formats.style.Styler.format
+generated/pandas.io.formats.style.Styler.from_custom_template,../reference/api/pandas.io.formats.style.Styler.from_custom_template
+generated/pandas.io.formats.style.Styler.hide_columns,../reference/api/pandas.io.formats.style.Styler.hide_columns
+generated/pandas.io.formats.style.Styler.hide_index,../reference/api/pandas.io.formats.style.Styler.hide_index
+generated/pandas.io.formats.style.Styler.highlight_max,../reference/api/pandas.io.formats.style.Styler.highlight_max
+generated/pandas.io.formats.style.Styler.highlight_min,../reference/api/pandas.io.formats.style.Styler.highlight_min
+generated/pandas.io.formats.style.Styler.highlight_null,../reference/api/pandas.io.formats.style.Styler.highlight_null
+generated/pandas.io.formats.style.Styler,../reference/api/pandas.io.formats.style.Styler
+generated/pandas.io.formats.style.Styler.loader,../reference/api/pandas.io.formats.style.Styler.loader
+generated/pandas.io.formats.style.Styler.pipe,../reference/api/pandas.io.formats.style.Styler.pipe
+generated/pandas.io.formats.style.Styler.render,../reference/api/pandas.io.formats.style.Styler.render
+generated/pandas.io.formats.style.Styler.set_caption,../reference/api/pandas.io.formats.style.Styler.set_caption
+generated/pandas.io.formats.style.Styler.set_precision,../reference/api/pandas.io.formats.style.Styler.set_precision
+generated/pandas.io.formats.style.Styler.set_properties,../reference/api/pandas.io.formats.style.Styler.set_properties
+generated/pandas.io.formats.style.Styler.set_table_attributes,../reference/api/pandas.io.formats.style.Styler.set_table_attributes
+generated/pandas.io.formats.style.Styler.set_table_styles,../reference/api/pandas.io.formats.style.Styler.set_table_styles
+generated/pandas.io.formats.style.Styler.set_uuid,../reference/api/pandas.io.formats.style.Styler.set_uuid
+generated/pandas.io.formats.style.Styler.template,../reference/api/pandas.io.formats.style.Styler.template
+generated/pandas.io.formats.style.Styler.to_excel,../reference/api/pandas.io.formats.style.Styler.to_excel
+generated/pandas.io.formats.style.Styler.use,../reference/api/pandas.io.formats.style.Styler.use
+generated/pandas.io.formats.style.Styler.where,../reference/api/pandas.io.formats.style.Styler.where
+generated/pandas.io.json.build_table_schema,../reference/api/pandas.io.json.build_table_schema
+generated/pandas.io.json.json_normalize,../reference/api/pandas.io.json.json_normalize
+generated/pandas.io.stata.StataReader.data,../reference/api/pandas.io.stata.StataReader.data
+generated/pandas.io.stata.StataReader.data_label,../reference/api/pandas.io.stata.StataReader.data_label
+generated/pandas.io.stata.StataReader.value_labels,../reference/api/pandas.io.stata.StataReader.value_labels
+generated/pandas.io.stata.StataReader.variable_labels,../reference/api/pandas.io.stata.StataReader.variable_labels
+generated/pandas.io.stata.StataWriter.write_file,../reference/api/pandas.io.stata.StataWriter.write_file
+generated/pandas.isna,../reference/api/pandas.isna
+generated/pandas.isnull,../reference/api/pandas.isnull
+generated/pandas.melt,../reference/api/pandas.melt
+generated/pandas.merge_asof,../reference/api/pandas.merge_asof
+generated/pandas.merge,../reference/api/pandas.merge
+generated/pandas.merge_ordered,../reference/api/pandas.merge_ordered
+generated/pandas.MultiIndex.codes,../reference/api/pandas.MultiIndex.codes
+generated/pandas.MultiIndex.droplevel,../reference/api/pandas.MultiIndex.droplevel
+generated/pandas.MultiIndex.from_arrays,../reference/api/pandas.MultiIndex.from_arrays
+generated/pandas.MultiIndex.from_frame,../reference/api/pandas.MultiIndex.from_frame
+generated/pandas.MultiIndex.from_product,../reference/api/pandas.MultiIndex.from_product
+generated/pandas.MultiIndex.from_tuples,../reference/api/pandas.MultiIndex.from_tuples
+generated/pandas.MultiIndex.get_indexer,../reference/api/pandas.MultiIndex.get_indexer
+generated/pandas.MultiIndex.get_level_values,../reference/api/pandas.MultiIndex.get_level_values
+generated/pandas.MultiIndex.get_loc,../reference/api/pandas.MultiIndex.get_loc
+generated/pandas.MultiIndex.get_loc_level,../reference/api/pandas.MultiIndex.get_loc_level
+generated/pandas.MultiIndex,../reference/api/pandas.MultiIndex
+generated/pandas.MultiIndex.is_lexsorted,../reference/api/pandas.MultiIndex.is_lexsorted
+generated/pandas.MultiIndex.levels,../reference/api/pandas.MultiIndex.levels
+generated/pandas.MultiIndex.levshape,../reference/api/pandas.MultiIndex.levshape
+generated/pandas.MultiIndex.names,../reference/api/pandas.MultiIndex.names
+generated/pandas.MultiIndex.nlevels,../reference/api/pandas.MultiIndex.nlevels
+generated/pandas.MultiIndex.remove_unused_levels,../reference/api/pandas.MultiIndex.remove_unused_levels
+generated/pandas.MultiIndex.reorder_levels,../reference/api/pandas.MultiIndex.reorder_levels
+generated/pandas.MultiIndex.set_codes,../reference/api/pandas.MultiIndex.set_codes
+generated/pandas.MultiIndex.set_levels,../reference/api/pandas.MultiIndex.set_levels
+generated/pandas.MultiIndex.sortlevel,../reference/api/pandas.MultiIndex.sortlevel
+generated/pandas.MultiIndex.swaplevel,../reference/api/pandas.MultiIndex.swaplevel
+generated/pandas.MultiIndex.to_flat_index,../reference/api/pandas.MultiIndex.to_flat_index
+generated/pandas.MultiIndex.to_frame,../reference/api/pandas.MultiIndex.to_frame
+generated/pandas.MultiIndex.to_hierarchical,../reference/api/pandas.MultiIndex.to_hierarchical
+generated/pandas.notna,../reference/api/pandas.notna
+generated/pandas.notnull,../reference/api/pandas.notnull
+generated/pandas.option_context,../reference/api/pandas.option_context
+generated/pandas.Panel.abs,../reference/api/pandas.Panel.abs
+generated/pandas.Panel.add,../reference/api/pandas.Panel.add
+generated/pandas.Panel.add_prefix,../reference/api/pandas.Panel.add_prefix
+generated/pandas.Panel.add_suffix,../reference/api/pandas.Panel.add_suffix
+generated/pandas.Panel.agg,../reference/api/pandas.Panel.agg
+generated/pandas.Panel.aggregate,../reference/api/pandas.Panel.aggregate
+generated/pandas.Panel.align,../reference/api/pandas.Panel.align
+generated/pandas.Panel.all,../reference/api/pandas.Panel.all
+generated/pandas.Panel.any,../reference/api/pandas.Panel.any
+generated/pandas.Panel.apply,../reference/api/pandas.Panel.apply
+generated/pandas.Panel.as_blocks,../reference/api/pandas.Panel.as_blocks
+generated/pandas.Panel.asfreq,../reference/api/pandas.Panel.asfreq
+generated/pandas.Panel.as_matrix,../reference/api/pandas.Panel.as_matrix
+generated/pandas.Panel.asof,../reference/api/pandas.Panel.asof
+generated/pandas.Panel.astype,../reference/api/pandas.Panel.astype
+generated/pandas.Panel.at,../reference/api/pandas.Panel.at
+generated/pandas.Panel.at_time,../reference/api/pandas.Panel.at_time
+generated/pandas.Panel.axes,../reference/api/pandas.Panel.axes
+generated/pandas.Panel.between_time,../reference/api/pandas.Panel.between_time
+generated/pandas.Panel.bfill,../reference/api/pandas.Panel.bfill
+generated/pandas.Panel.blocks,../reference/api/pandas.Panel.blocks
+generated/pandas.Panel.bool,../reference/api/pandas.Panel.bool
+generated/pandas.Panel.clip,../reference/api/pandas.Panel.clip
+generated/pandas.Panel.clip_lower,../reference/api/pandas.Panel.clip_lower
+generated/pandas.Panel.clip_upper,../reference/api/pandas.Panel.clip_upper
+generated/pandas.Panel.compound,../reference/api/pandas.Panel.compound
+generated/pandas.Panel.conform,../reference/api/pandas.Panel.conform
+generated/pandas.Panel.convert_objects,../reference/api/pandas.Panel.convert_objects
+generated/pandas.Panel.copy,../reference/api/pandas.Panel.copy
+generated/pandas.Panel.count,../reference/api/pandas.Panel.count
+generated/pandas.Panel.cummax,../reference/api/pandas.Panel.cummax
+generated/pandas.Panel.cummin,../reference/api/pandas.Panel.cummin
+generated/pandas.Panel.cumprod,../reference/api/pandas.Panel.cumprod
+generated/pandas.Panel.cumsum,../reference/api/pandas.Panel.cumsum
+generated/pandas.Panel.describe,../reference/api/pandas.Panel.describe
+generated/pandas.Panel.div,../reference/api/pandas.Panel.div
+generated/pandas.Panel.divide,../reference/api/pandas.Panel.divide
+generated/pandas.Panel.drop,../reference/api/pandas.Panel.drop
+generated/pandas.Panel.droplevel,../reference/api/pandas.Panel.droplevel
+generated/pandas.Panel.dropna,../reference/api/pandas.Panel.dropna
+generated/pandas.Panel.dtypes,../reference/api/pandas.Panel.dtypes
+generated/pandas.Panel.empty,../reference/api/pandas.Panel.empty
+generated/pandas.Panel.eq,../reference/api/pandas.Panel.eq
+generated/pandas.Panel.equals,../reference/api/pandas.Panel.equals
+generated/pandas.Panel.ffill,../reference/api/pandas.Panel.ffill
+generated/pandas.Panel.fillna,../reference/api/pandas.Panel.fillna
+generated/pandas.Panel.filter,../reference/api/pandas.Panel.filter
+generated/pandas.Panel.first,../reference/api/pandas.Panel.first
+generated/pandas.Panel.first_valid_index,../reference/api/pandas.Panel.first_valid_index
+generated/pandas.Panel.floordiv,../reference/api/pandas.Panel.floordiv
+generated/pandas.Panel.from_dict,../reference/api/pandas.Panel.from_dict
+generated/pandas.Panel.fromDict,../reference/api/pandas.Panel.fromDict
+generated/pandas.Panel.ftypes,../reference/api/pandas.Panel.ftypes
+generated/pandas.Panel.ge,../reference/api/pandas.Panel.ge
+generated/pandas.Panel.get_dtype_counts,../reference/api/pandas.Panel.get_dtype_counts
+generated/pandas.Panel.get_ftype_counts,../reference/api/pandas.Panel.get_ftype_counts
+generated/pandas.Panel.get,../reference/api/pandas.Panel.get
+generated/pandas.Panel.get_value,../reference/api/pandas.Panel.get_value
+generated/pandas.Panel.get_values,../reference/api/pandas.Panel.get_values
+generated/pandas.Panel.groupby,../reference/api/pandas.Panel.groupby
+generated/pandas.Panel.gt,../reference/api/pandas.Panel.gt
+generated/pandas.Panel.head,../reference/api/pandas.Panel.head
+generated/pandas.Panel,../reference/api/pandas.Panel
+generated/pandas.Panel.iat,../reference/api/pandas.Panel.iat
+generated/pandas.Panel.iloc,../reference/api/pandas.Panel.iloc
+generated/pandas.Panel.infer_objects,../reference/api/pandas.Panel.infer_objects
+generated/pandas.Panel.interpolate,../reference/api/pandas.Panel.interpolate
+generated/pandas.Panel.is_copy,../reference/api/pandas.Panel.is_copy
+generated/pandas.Panel.isna,../reference/api/pandas.Panel.isna
+generated/pandas.Panel.isnull,../reference/api/pandas.Panel.isnull
+generated/pandas.Panel.items,../reference/api/pandas.Panel.items
+generated/pandas.Panel.__iter__,../reference/api/pandas.Panel.__iter__
+generated/pandas.Panel.iteritems,../reference/api/pandas.Panel.iteritems
+generated/pandas.Panel.ix,../reference/api/pandas.Panel.ix
+generated/pandas.Panel.join,../reference/api/pandas.Panel.join
+generated/pandas.Panel.keys,../reference/api/pandas.Panel.keys
+generated/pandas.Panel.kurt,../reference/api/pandas.Panel.kurt
+generated/pandas.Panel.kurtosis,../reference/api/pandas.Panel.kurtosis
+generated/pandas.Panel.last,../reference/api/pandas.Panel.last
+generated/pandas.Panel.last_valid_index,../reference/api/pandas.Panel.last_valid_index
+generated/pandas.Panel.le,../reference/api/pandas.Panel.le
+generated/pandas.Panel.loc,../reference/api/pandas.Panel.loc
+generated/pandas.Panel.lt,../reference/api/pandas.Panel.lt
+generated/pandas.Panel.mad,../reference/api/pandas.Panel.mad
+generated/pandas.Panel.major_axis,../reference/api/pandas.Panel.major_axis
+generated/pandas.Panel.major_xs,../reference/api/pandas.Panel.major_xs
+generated/pandas.Panel.mask,../reference/api/pandas.Panel.mask
+generated/pandas.Panel.max,../reference/api/pandas.Panel.max
+generated/pandas.Panel.mean,../reference/api/pandas.Panel.mean
+generated/pandas.Panel.median,../reference/api/pandas.Panel.median
+generated/pandas.Panel.min,../reference/api/pandas.Panel.min
+generated/pandas.Panel.minor_axis,../reference/api/pandas.Panel.minor_axis
+generated/pandas.Panel.minor_xs,../reference/api/pandas.Panel.minor_xs
+generated/pandas.Panel.mod,../reference/api/pandas.Panel.mod
+generated/pandas.Panel.mul,../reference/api/pandas.Panel.mul
+generated/pandas.Panel.multiply,../reference/api/pandas.Panel.multiply
+generated/pandas.Panel.ndim,../reference/api/pandas.Panel.ndim
+generated/pandas.Panel.ne,../reference/api/pandas.Panel.ne
+generated/pandas.Panel.notna,../reference/api/pandas.Panel.notna
+generated/pandas.Panel.notnull,../reference/api/pandas.Panel.notnull
+generated/pandas.Panel.pct_change,../reference/api/pandas.Panel.pct_change
+generated/pandas.Panel.pipe,../reference/api/pandas.Panel.pipe
+generated/pandas.Panel.pop,../reference/api/pandas.Panel.pop
+generated/pandas.Panel.pow,../reference/api/pandas.Panel.pow
+generated/pandas.Panel.prod,../reference/api/pandas.Panel.prod
+generated/pandas.Panel.product,../reference/api/pandas.Panel.product
+generated/pandas.Panel.radd,../reference/api/pandas.Panel.radd
+generated/pandas.Panel.rank,../reference/api/pandas.Panel.rank
+generated/pandas.Panel.rdiv,../reference/api/pandas.Panel.rdiv
+generated/pandas.Panel.reindex_axis,../reference/api/pandas.Panel.reindex_axis
+generated/pandas.Panel.reindex,../reference/api/pandas.Panel.reindex
+generated/pandas.Panel.reindex_like,../reference/api/pandas.Panel.reindex_like
+generated/pandas.Panel.rename_axis,../reference/api/pandas.Panel.rename_axis
+generated/pandas.Panel.rename,../reference/api/pandas.Panel.rename
+generated/pandas.Panel.replace,../reference/api/pandas.Panel.replace
+generated/pandas.Panel.resample,../reference/api/pandas.Panel.resample
+generated/pandas.Panel.rfloordiv,../reference/api/pandas.Panel.rfloordiv
+generated/pandas.Panel.rmod,../reference/api/pandas.Panel.rmod
+generated/pandas.Panel.rmul,../reference/api/pandas.Panel.rmul
+generated/pandas.Panel.round,../reference/api/pandas.Panel.round
+generated/pandas.Panel.rpow,../reference/api/pandas.Panel.rpow
+generated/pandas.Panel.rsub,../reference/api/pandas.Panel.rsub
+generated/pandas.Panel.rtruediv,../reference/api/pandas.Panel.rtruediv
+generated/pandas.Panel.sample,../reference/api/pandas.Panel.sample
+generated/pandas.Panel.select,../reference/api/pandas.Panel.select
+generated/pandas.Panel.sem,../reference/api/pandas.Panel.sem
+generated/pandas.Panel.set_axis,../reference/api/pandas.Panel.set_axis
+generated/pandas.Panel.set_value,../reference/api/pandas.Panel.set_value
+generated/pandas.Panel.shape,../reference/api/pandas.Panel.shape
+generated/pandas.Panel.shift,../reference/api/pandas.Panel.shift
+generated/pandas.Panel.size,../reference/api/pandas.Panel.size
+generated/pandas.Panel.skew,../reference/api/pandas.Panel.skew
+generated/pandas.Panel.slice_shift,../reference/api/pandas.Panel.slice_shift
+generated/pandas.Panel.sort_index,../reference/api/pandas.Panel.sort_index
+generated/pandas.Panel.sort_values,../reference/api/pandas.Panel.sort_values
+generated/pandas.Panel.squeeze,../reference/api/pandas.Panel.squeeze
+generated/pandas.Panel.std,../reference/api/pandas.Panel.std
+generated/pandas.Panel.sub,../reference/api/pandas.Panel.sub
+generated/pandas.Panel.subtract,../reference/api/pandas.Panel.subtract
+generated/pandas.Panel.sum,../reference/api/pandas.Panel.sum
+generated/pandas.Panel.swapaxes,../reference/api/pandas.Panel.swapaxes
+generated/pandas.Panel.swaplevel,../reference/api/pandas.Panel.swaplevel
+generated/pandas.Panel.tail,../reference/api/pandas.Panel.tail
+generated/pandas.Panel.take,../reference/api/pandas.Panel.take
+generated/pandas.Panel.timetuple,../reference/api/pandas.Panel.timetuple
+generated/pandas.Panel.to_clipboard,../reference/api/pandas.Panel.to_clipboard
+generated/pandas.Panel.to_csv,../reference/api/pandas.Panel.to_csv
+generated/pandas.Panel.to_dense,../reference/api/pandas.Panel.to_dense
+generated/pandas.Panel.to_excel,../reference/api/pandas.Panel.to_excel
+generated/pandas.Panel.to_frame,../reference/api/pandas.Panel.to_frame
+generated/pandas.Panel.to_hdf,../reference/api/pandas.Panel.to_hdf
+generated/pandas.Panel.to_json,../reference/api/pandas.Panel.to_json
+generated/pandas.Panel.to_latex,../reference/api/pandas.Panel.to_latex
+generated/pandas.Panel.to_msgpack,../reference/api/pandas.Panel.to_msgpack
+generated/pandas.Panel.to_pickle,../reference/api/pandas.Panel.to_pickle
+generated/pandas.Panel.to_sparse,../reference/api/pandas.Panel.to_sparse
+generated/pandas.Panel.to_sql,../reference/api/pandas.Panel.to_sql
+generated/pandas.Panel.to_xarray,../reference/api/pandas.Panel.to_xarray
+generated/pandas.Panel.transform,../reference/api/pandas.Panel.transform
+generated/pandas.Panel.transpose,../reference/api/pandas.Panel.transpose
+generated/pandas.Panel.truediv,../reference/api/pandas.Panel.truediv
+generated/pandas.Panel.truncate,../reference/api/pandas.Panel.truncate
+generated/pandas.Panel.tshift,../reference/api/pandas.Panel.tshift
+generated/pandas.Panel.tz_convert,../reference/api/pandas.Panel.tz_convert
+generated/pandas.Panel.tz_localize,../reference/api/pandas.Panel.tz_localize
+generated/pandas.Panel.update,../reference/api/pandas.Panel.update
+generated/pandas.Panel.values,../reference/api/pandas.Panel.values
+generated/pandas.Panel.var,../reference/api/pandas.Panel.var
+generated/pandas.Panel.where,../reference/api/pandas.Panel.where
+generated/pandas.Panel.xs,../reference/api/pandas.Panel.xs
+generated/pandas.Period.asfreq,../reference/api/pandas.Period.asfreq
+generated/pandas.Period.day,../reference/api/pandas.Period.day
+generated/pandas.Period.dayofweek,../reference/api/pandas.Period.dayofweek
+generated/pandas.Period.dayofyear,../reference/api/pandas.Period.dayofyear
+generated/pandas.Period.days_in_month,../reference/api/pandas.Period.days_in_month
+generated/pandas.Period.daysinmonth,../reference/api/pandas.Period.daysinmonth
+generated/pandas.Period.end_time,../reference/api/pandas.Period.end_time
+generated/pandas.Period.freq,../reference/api/pandas.Period.freq
+generated/pandas.Period.freqstr,../reference/api/pandas.Period.freqstr
+generated/pandas.Period.hour,../reference/api/pandas.Period.hour
+generated/pandas.Period,../reference/api/pandas.Period
+generated/pandas.PeriodIndex.asfreq,../reference/api/pandas.PeriodIndex.asfreq
+generated/pandas.PeriodIndex.day,../reference/api/pandas.PeriodIndex.day
+generated/pandas.PeriodIndex.dayofweek,../reference/api/pandas.PeriodIndex.dayofweek
+generated/pandas.PeriodIndex.dayofyear,../reference/api/pandas.PeriodIndex.dayofyear
+generated/pandas.PeriodIndex.days_in_month,../reference/api/pandas.PeriodIndex.days_in_month
+generated/pandas.PeriodIndex.daysinmonth,../reference/api/pandas.PeriodIndex.daysinmonth
+generated/pandas.PeriodIndex.end_time,../reference/api/pandas.PeriodIndex.end_time
+generated/pandas.PeriodIndex.freq,../reference/api/pandas.PeriodIndex.freq
+generated/pandas.PeriodIndex.freqstr,../reference/api/pandas.PeriodIndex.freqstr
+generated/pandas.PeriodIndex.hour,../reference/api/pandas.PeriodIndex.hour
+generated/pandas.PeriodIndex,../reference/api/pandas.PeriodIndex
+generated/pandas.PeriodIndex.is_leap_year,../reference/api/pandas.PeriodIndex.is_leap_year
+generated/pandas.PeriodIndex.minute,../reference/api/pandas.PeriodIndex.minute
+generated/pandas.PeriodIndex.month,../reference/api/pandas.PeriodIndex.month
+generated/pandas.PeriodIndex.quarter,../reference/api/pandas.PeriodIndex.quarter
+generated/pandas.PeriodIndex.qyear,../reference/api/pandas.PeriodIndex.qyear
+generated/pandas.PeriodIndex.second,../reference/api/pandas.PeriodIndex.second
+generated/pandas.PeriodIndex.start_time,../reference/api/pandas.PeriodIndex.start_time
+generated/pandas.PeriodIndex.strftime,../reference/api/pandas.PeriodIndex.strftime
+generated/pandas.PeriodIndex.to_timestamp,../reference/api/pandas.PeriodIndex.to_timestamp
+generated/pandas.PeriodIndex.weekday,../reference/api/pandas.PeriodIndex.weekday
+generated/pandas.PeriodIndex.week,../reference/api/pandas.PeriodIndex.week
+generated/pandas.PeriodIndex.weekofyear,../reference/api/pandas.PeriodIndex.weekofyear
+generated/pandas.PeriodIndex.year,../reference/api/pandas.PeriodIndex.year
+generated/pandas.Period.is_leap_year,../reference/api/pandas.Period.is_leap_year
+generated/pandas.Period.minute,../reference/api/pandas.Period.minute
+generated/pandas.Period.month,../reference/api/pandas.Period.month
+generated/pandas.Period.now,../reference/api/pandas.Period.now
+generated/pandas.Period.ordinal,../reference/api/pandas.Period.ordinal
+generated/pandas.Period.quarter,../reference/api/pandas.Period.quarter
+generated/pandas.Period.qyear,../reference/api/pandas.Period.qyear
+generated/pandas.period_range,../reference/api/pandas.period_range
+generated/pandas.Period.second,../reference/api/pandas.Period.second
+generated/pandas.Period.start_time,../reference/api/pandas.Period.start_time
+generated/pandas.Period.strftime,../reference/api/pandas.Period.strftime
+generated/pandas.Period.to_timestamp,../reference/api/pandas.Period.to_timestamp
+generated/pandas.Period.weekday,../reference/api/pandas.Period.weekday
+generated/pandas.Period.week,../reference/api/pandas.Period.week
+generated/pandas.Period.weekofyear,../reference/api/pandas.Period.weekofyear
+generated/pandas.Period.year,../reference/api/pandas.Period.year
+generated/pandas.pivot,../reference/api/pandas.pivot
+generated/pandas.pivot_table,../reference/api/pandas.pivot_table
+generated/pandas.plotting.andrews_curves,../reference/api/pandas.plotting.andrews_curves
+generated/pandas.plotting.bootstrap_plot,../reference/api/pandas.plotting.bootstrap_plot
+generated/pandas.plotting.deregister_matplotlib_converters,../reference/api/pandas.plotting.deregister_matplotlib_converters
+generated/pandas.plotting.lag_plot,../reference/api/pandas.plotting.lag_plot
+generated/pandas.plotting.parallel_coordinates,../reference/api/pandas.plotting.parallel_coordinates
+generated/pandas.plotting.radviz,../reference/api/pandas.plotting.radviz
+generated/pandas.plotting.register_matplotlib_converters,../reference/api/pandas.plotting.register_matplotlib_converters
+generated/pandas.plotting.scatter_matrix,../reference/api/pandas.plotting.scatter_matrix
+generated/pandas.qcut,../reference/api/pandas.qcut
+generated/pandas.RangeIndex.from_range,../reference/api/pandas.RangeIndex.from_range
+generated/pandas.RangeIndex,../reference/api/pandas.RangeIndex
+generated/pandas.read_clipboard,../reference/api/pandas.read_clipboard
+generated/pandas.read_csv,../reference/api/pandas.read_csv
+generated/pandas.read_excel,../reference/api/pandas.read_excel
+generated/pandas.read_feather,../reference/api/pandas.read_feather
+generated/pandas.read_fwf,../reference/api/pandas.read_fwf
+generated/pandas.read_gbq,../reference/api/pandas.read_gbq
+generated/pandas.read_hdf,../reference/api/pandas.read_hdf
+generated/pandas.read_html,../reference/api/pandas.read_html
+generated/pandas.read_json,../reference/api/pandas.read_json
+generated/pandas.read_msgpack,../reference/api/pandas.read_msgpack
+generated/pandas.read_parquet,../reference/api/pandas.read_parquet
+generated/pandas.read_pickle,../reference/api/pandas.read_pickle
+generated/pandas.read_sas,../reference/api/pandas.read_sas
+generated/pandas.read_sql,../reference/api/pandas.read_sql
+generated/pandas.read_sql_query,../reference/api/pandas.read_sql_query
+generated/pandas.read_sql_table,../reference/api/pandas.read_sql_table
+generated/pandas.read_stata,../reference/api/pandas.read_stata
+generated/pandas.read_table,../reference/api/pandas.read_table
+generated/pandas.reset_option,../reference/api/pandas.reset_option
+generated/pandas.Series.abs,../reference/api/pandas.Series.abs
+generated/pandas.Series.add,../reference/api/pandas.Series.add
+generated/pandas.Series.add_prefix,../reference/api/pandas.Series.add_prefix
+generated/pandas.Series.add_suffix,../reference/api/pandas.Series.add_suffix
+generated/pandas.Series.agg,../reference/api/pandas.Series.agg
+generated/pandas.Series.aggregate,../reference/api/pandas.Series.aggregate
+generated/pandas.Series.align,../reference/api/pandas.Series.align
+generated/pandas.Series.all,../reference/api/pandas.Series.all
+generated/pandas.Series.any,../reference/api/pandas.Series.any
+generated/pandas.Series.append,../reference/api/pandas.Series.append
+generated/pandas.Series.apply,../reference/api/pandas.Series.apply
+generated/pandas.Series.argmax,../reference/api/pandas.Series.argmax
+generated/pandas.Series.argmin,../reference/api/pandas.Series.argmin
+generated/pandas.Series.argsort,../reference/api/pandas.Series.argsort
+generated/pandas.Series.__array__,../reference/api/pandas.Series.__array__
+generated/pandas.Series.array,../reference/api/pandas.Series.array
+generated/pandas.Series.as_blocks,../reference/api/pandas.Series.as_blocks
+generated/pandas.Series.asfreq,../reference/api/pandas.Series.asfreq
+generated/pandas.Series.as_matrix,../reference/api/pandas.Series.as_matrix
+generated/pandas.Series.asobject,../reference/api/pandas.Series.asobject
+generated/pandas.Series.asof,../reference/api/pandas.Series.asof
+generated/pandas.Series.astype,../reference/api/pandas.Series.astype
+generated/pandas.Series.at,../reference/api/pandas.Series.at
+generated/pandas.Series.at_time,../reference/api/pandas.Series.at_time
+generated/pandas.Series.autocorr,../reference/api/pandas.Series.autocorr
+generated/pandas.Series.axes,../reference/api/pandas.Series.axes
+generated/pandas.Series.base,../reference/api/pandas.Series.base
+generated/pandas.Series.between,../reference/api/pandas.Series.between
+generated/pandas.Series.between_time,../reference/api/pandas.Series.between_time
+generated/pandas.Series.bfill,../reference/api/pandas.Series.bfill
+generated/pandas.Series.blocks,../reference/api/pandas.Series.blocks
+generated/pandas.Series.bool,../reference/api/pandas.Series.bool
+generated/pandas.Series.cat.add_categories,../reference/api/pandas.Series.cat.add_categories
+generated/pandas.Series.cat.as_ordered,../reference/api/pandas.Series.cat.as_ordered
+generated/pandas.Series.cat.as_unordered,../reference/api/pandas.Series.cat.as_unordered
+generated/pandas.Series.cat.categories,../reference/api/pandas.Series.cat.categories
+generated/pandas.Series.cat.codes,../reference/api/pandas.Series.cat.codes
+generated/pandas.Series.cat,../reference/api/pandas.Series.cat
+generated/pandas.Series.cat.ordered,../reference/api/pandas.Series.cat.ordered
+generated/pandas.Series.cat.remove_categories,../reference/api/pandas.Series.cat.remove_categories
+generated/pandas.Series.cat.remove_unused_categories,../reference/api/pandas.Series.cat.remove_unused_categories
+generated/pandas.Series.cat.rename_categories,../reference/api/pandas.Series.cat.rename_categories
+generated/pandas.Series.cat.reorder_categories,../reference/api/pandas.Series.cat.reorder_categories
+generated/pandas.Series.cat.set_categories,../reference/api/pandas.Series.cat.set_categories
+generated/pandas.Series.clip,../reference/api/pandas.Series.clip
+generated/pandas.Series.clip_lower,../reference/api/pandas.Series.clip_lower
+generated/pandas.Series.clip_upper,../reference/api/pandas.Series.clip_upper
+generated/pandas.Series.combine_first,../reference/api/pandas.Series.combine_first
+generated/pandas.Series.combine,../reference/api/pandas.Series.combine
+generated/pandas.Series.compound,../reference/api/pandas.Series.compound
+generated/pandas.Series.compress,../reference/api/pandas.Series.compress
+generated/pandas.Series.convert_objects,../reference/api/pandas.Series.convert_objects
+generated/pandas.Series.copy,../reference/api/pandas.Series.copy
+generated/pandas.Series.corr,../reference/api/pandas.Series.corr
+generated/pandas.Series.count,../reference/api/pandas.Series.count
+generated/pandas.Series.cov,../reference/api/pandas.Series.cov
+generated/pandas.Series.cummax,../reference/api/pandas.Series.cummax
+generated/pandas.Series.cummin,../reference/api/pandas.Series.cummin
+generated/pandas.Series.cumprod,../reference/api/pandas.Series.cumprod
+generated/pandas.Series.cumsum,../reference/api/pandas.Series.cumsum
+generated/pandas.Series.data,../reference/api/pandas.Series.data
+generated/pandas.Series.describe,../reference/api/pandas.Series.describe
+generated/pandas.Series.diff,../reference/api/pandas.Series.diff
+generated/pandas.Series.div,../reference/api/pandas.Series.div
+generated/pandas.Series.divide,../reference/api/pandas.Series.divide
+generated/pandas.Series.divmod,../reference/api/pandas.Series.divmod
+generated/pandas.Series.dot,../reference/api/pandas.Series.dot
+generated/pandas.Series.drop_duplicates,../reference/api/pandas.Series.drop_duplicates
+generated/pandas.Series.drop,../reference/api/pandas.Series.drop
+generated/pandas.Series.droplevel,../reference/api/pandas.Series.droplevel
+generated/pandas.Series.dropna,../reference/api/pandas.Series.dropna
+generated/pandas.Series.dt.ceil,../reference/api/pandas.Series.dt.ceil
+generated/pandas.Series.dt.components,../reference/api/pandas.Series.dt.components
+generated/pandas.Series.dt.date,../reference/api/pandas.Series.dt.date
+generated/pandas.Series.dt.day,../reference/api/pandas.Series.dt.day
+generated/pandas.Series.dt.day_name,../reference/api/pandas.Series.dt.day_name
+generated/pandas.Series.dt.dayofweek,../reference/api/pandas.Series.dt.dayofweek
+generated/pandas.Series.dt.dayofyear,../reference/api/pandas.Series.dt.dayofyear
+generated/pandas.Series.dt.days,../reference/api/pandas.Series.dt.days
+generated/pandas.Series.dt.days_in_month,../reference/api/pandas.Series.dt.days_in_month
+generated/pandas.Series.dt.daysinmonth,../reference/api/pandas.Series.dt.daysinmonth
+generated/pandas.Series.dt.end_time,../reference/api/pandas.Series.dt.end_time
+generated/pandas.Series.dt.floor,../reference/api/pandas.Series.dt.floor
+generated/pandas.Series.dt.freq,../reference/api/pandas.Series.dt.freq
+generated/pandas.Series.dt.hour,../reference/api/pandas.Series.dt.hour
+generated/pandas.Series.dt,../reference/api/pandas.Series.dt
+generated/pandas.Series.dt.is_leap_year,../reference/api/pandas.Series.dt.is_leap_year
+generated/pandas.Series.dt.is_month_end,../reference/api/pandas.Series.dt.is_month_end
+generated/pandas.Series.dt.is_month_start,../reference/api/pandas.Series.dt.is_month_start
+generated/pandas.Series.dt.is_quarter_end,../reference/api/pandas.Series.dt.is_quarter_end
+generated/pandas.Series.dt.is_quarter_start,../reference/api/pandas.Series.dt.is_quarter_start
+generated/pandas.Series.dt.is_year_end,../reference/api/pandas.Series.dt.is_year_end
+generated/pandas.Series.dt.is_year_start,../reference/api/pandas.Series.dt.is_year_start
+generated/pandas.Series.dt.microsecond,../reference/api/pandas.Series.dt.microsecond
+generated/pandas.Series.dt.microseconds,../reference/api/pandas.Series.dt.microseconds
+generated/pandas.Series.dt.minute,../reference/api/pandas.Series.dt.minute
+generated/pandas.Series.dt.month,../reference/api/pandas.Series.dt.month
+generated/pandas.Series.dt.month_name,../reference/api/pandas.Series.dt.month_name
+generated/pandas.Series.dt.nanosecond,../reference/api/pandas.Series.dt.nanosecond
+generated/pandas.Series.dt.nanoseconds,../reference/api/pandas.Series.dt.nanoseconds
+generated/pandas.Series.dt.normalize,../reference/api/pandas.Series.dt.normalize
+generated/pandas.Series.dt.quarter,../reference/api/pandas.Series.dt.quarter
+generated/pandas.Series.dt.qyear,../reference/api/pandas.Series.dt.qyear
+generated/pandas.Series.dt.round,../reference/api/pandas.Series.dt.round
+generated/pandas.Series.dt.second,../reference/api/pandas.Series.dt.second
+generated/pandas.Series.dt.seconds,../reference/api/pandas.Series.dt.seconds
+generated/pandas.Series.dt.start_time,../reference/api/pandas.Series.dt.start_time
+generated/pandas.Series.dt.strftime,../reference/api/pandas.Series.dt.strftime
+generated/pandas.Series.dt.time,../reference/api/pandas.Series.dt.time
+generated/pandas.Series.dt.timetz,../reference/api/pandas.Series.dt.timetz
+generated/pandas.Series.dt.to_period,../reference/api/pandas.Series.dt.to_period
+generated/pandas.Series.dt.to_pydatetime,../reference/api/pandas.Series.dt.to_pydatetime
+generated/pandas.Series.dt.to_pytimedelta,../reference/api/pandas.Series.dt.to_pytimedelta
+generated/pandas.Series.dt.total_seconds,../reference/api/pandas.Series.dt.total_seconds
+generated/pandas.Series.dt.tz_convert,../reference/api/pandas.Series.dt.tz_convert
+generated/pandas.Series.dt.tz,../reference/api/pandas.Series.dt.tz
+generated/pandas.Series.dt.tz_localize,../reference/api/pandas.Series.dt.tz_localize
+generated/pandas.Series.dt.weekday,../reference/api/pandas.Series.dt.weekday
+generated/pandas.Series.dt.week,../reference/api/pandas.Series.dt.week
+generated/pandas.Series.dt.weekofyear,../reference/api/pandas.Series.dt.weekofyear
+generated/pandas.Series.dt.year,../reference/api/pandas.Series.dt.year
+generated/pandas.Series.dtype,../reference/api/pandas.Series.dtype
+generated/pandas.Series.dtypes,../reference/api/pandas.Series.dtypes
+generated/pandas.Series.duplicated,../reference/api/pandas.Series.duplicated
+generated/pandas.Series.empty,../reference/api/pandas.Series.empty
+generated/pandas.Series.eq,../reference/api/pandas.Series.eq
+generated/pandas.Series.equals,../reference/api/pandas.Series.equals
+generated/pandas.Series.ewm,../reference/api/pandas.Series.ewm
+generated/pandas.Series.expanding,../reference/api/pandas.Series.expanding
+generated/pandas.Series.factorize,../reference/api/pandas.Series.factorize
+generated/pandas.Series.ffill,../reference/api/pandas.Series.ffill
+generated/pandas.Series.fillna,../reference/api/pandas.Series.fillna
+generated/pandas.Series.filter,../reference/api/pandas.Series.filter
+generated/pandas.Series.first,../reference/api/pandas.Series.first
+generated/pandas.Series.first_valid_index,../reference/api/pandas.Series.first_valid_index
+generated/pandas.Series.flags,../reference/api/pandas.Series.flags
+generated/pandas.Series.floordiv,../reference/api/pandas.Series.floordiv
+generated/pandas.Series.from_array,../reference/api/pandas.Series.from_array
+generated/pandas.Series.from_csv,../reference/api/pandas.Series.from_csv
+generated/pandas.Series.ftype,../reference/api/pandas.Series.ftype
+generated/pandas.Series.ftypes,../reference/api/pandas.Series.ftypes
+generated/pandas.Series.ge,../reference/api/pandas.Series.ge
+generated/pandas.Series.get_dtype_counts,../reference/api/pandas.Series.get_dtype_counts
+generated/pandas.Series.get_ftype_counts,../reference/api/pandas.Series.get_ftype_counts
+generated/pandas.Series.get,../reference/api/pandas.Series.get
+generated/pandas.Series.get_value,../reference/api/pandas.Series.get_value
+generated/pandas.Series.get_values,../reference/api/pandas.Series.get_values
+generated/pandas.Series.groupby,../reference/api/pandas.Series.groupby
+generated/pandas.Series.gt,../reference/api/pandas.Series.gt
+generated/pandas.Series.hasnans,../reference/api/pandas.Series.hasnans
+generated/pandas.Series.head,../reference/api/pandas.Series.head
+generated/pandas.Series.hist,../reference/api/pandas.Series.hist
+generated/pandas.Series,../reference/api/pandas.Series
+generated/pandas.Series.iat,../reference/api/pandas.Series.iat
+generated/pandas.Series.idxmax,../reference/api/pandas.Series.idxmax
+generated/pandas.Series.idxmin,../reference/api/pandas.Series.idxmin
+generated/pandas.Series.iloc,../reference/api/pandas.Series.iloc
+generated/pandas.Series.imag,../reference/api/pandas.Series.imag
+generated/pandas.Series.index,../reference/api/pandas.Series.index
+generated/pandas.Series.infer_objects,../reference/api/pandas.Series.infer_objects
+generated/pandas.Series.interpolate,../reference/api/pandas.Series.interpolate
+generated/pandas.Series.is_copy,../reference/api/pandas.Series.is_copy
+generated/pandas.Series.isin,../reference/api/pandas.Series.isin
+generated/pandas.Series.is_monotonic_decreasing,../reference/api/pandas.Series.is_monotonic_decreasing
+generated/pandas.Series.is_monotonic,../reference/api/pandas.Series.is_monotonic
+generated/pandas.Series.is_monotonic_increasing,../reference/api/pandas.Series.is_monotonic_increasing
+generated/pandas.Series.isna,../reference/api/pandas.Series.isna
+generated/pandas.Series.isnull,../reference/api/pandas.Series.isnull
+generated/pandas.Series.is_unique,../reference/api/pandas.Series.is_unique
+generated/pandas.Series.item,../reference/api/pandas.Series.item
+generated/pandas.Series.items,../reference/api/pandas.Series.items
+generated/pandas.Series.itemsize,../reference/api/pandas.Series.itemsize
+generated/pandas.Series.__iter__,../reference/api/pandas.Series.__iter__
+generated/pandas.Series.iteritems,../reference/api/pandas.Series.iteritems
+generated/pandas.Series.ix,../reference/api/pandas.Series.ix
+generated/pandas.Series.keys,../reference/api/pandas.Series.keys
+generated/pandas.Series.kurt,../reference/api/pandas.Series.kurt
+generated/pandas.Series.kurtosis,../reference/api/pandas.Series.kurtosis
+generated/pandas.Series.last,../reference/api/pandas.Series.last
+generated/pandas.Series.last_valid_index,../reference/api/pandas.Series.last_valid_index
+generated/pandas.Series.le,../reference/api/pandas.Series.le
+generated/pandas.Series.loc,../reference/api/pandas.Series.loc
+generated/pandas.Series.lt,../reference/api/pandas.Series.lt
+generated/pandas.Series.mad,../reference/api/pandas.Series.mad
+generated/pandas.Series.map,../reference/api/pandas.Series.map
+generated/pandas.Series.mask,../reference/api/pandas.Series.mask
+generated/pandas.Series.max,../reference/api/pandas.Series.max
+generated/pandas.Series.mean,../reference/api/pandas.Series.mean
+generated/pandas.Series.median,../reference/api/pandas.Series.median
+generated/pandas.Series.memory_usage,../reference/api/pandas.Series.memory_usage
+generated/pandas.Series.min,../reference/api/pandas.Series.min
+generated/pandas.Series.mode,../reference/api/pandas.Series.mode
+generated/pandas.Series.mod,../reference/api/pandas.Series.mod
+generated/pandas.Series.mul,../reference/api/pandas.Series.mul
+generated/pandas.Series.multiply,../reference/api/pandas.Series.multiply
+generated/pandas.Series.name,../reference/api/pandas.Series.name
+generated/pandas.Series.nbytes,../reference/api/pandas.Series.nbytes
+generated/pandas.Series.ndim,../reference/api/pandas.Series.ndim
+generated/pandas.Series.ne,../reference/api/pandas.Series.ne
+generated/pandas.Series.nlargest,../reference/api/pandas.Series.nlargest
+generated/pandas.Series.nonzero,../reference/api/pandas.Series.nonzero
+generated/pandas.Series.notna,../reference/api/pandas.Series.notna
+generated/pandas.Series.notnull,../reference/api/pandas.Series.notnull
+generated/pandas.Series.nsmallest,../reference/api/pandas.Series.nsmallest
+generated/pandas.Series.nunique,../reference/api/pandas.Series.nunique
+generated/pandas.Series.pct_change,../reference/api/pandas.Series.pct_change
+generated/pandas.Series.pipe,../reference/api/pandas.Series.pipe
+generated/pandas.Series.plot.area,../reference/api/pandas.Series.plot.area
+generated/pandas.Series.plot.barh,../reference/api/pandas.Series.plot.barh
+generated/pandas.Series.plot.bar,../reference/api/pandas.Series.plot.bar
+generated/pandas.Series.plot.box,../reference/api/pandas.Series.plot.box
+generated/pandas.Series.plot.density,../reference/api/pandas.Series.plot.density
+generated/pandas.Series.plot.hist,../reference/api/pandas.Series.plot.hist
+generated/pandas.Series.plot,../reference/api/pandas.Series.plot
+generated/pandas.Series.plot.kde,../reference/api/pandas.Series.plot.kde
+generated/pandas.Series.plot.line,../reference/api/pandas.Series.plot.line
+generated/pandas.Series.plot.pie,../reference/api/pandas.Series.plot.pie
+generated/pandas.Series.pop,../reference/api/pandas.Series.pop
+generated/pandas.Series.pow,../reference/api/pandas.Series.pow
+generated/pandas.Series.prod,../reference/api/pandas.Series.prod
+generated/pandas.Series.product,../reference/api/pandas.Series.product
+generated/pandas.Series.ptp,../reference/api/pandas.Series.ptp
+generated/pandas.Series.put,../reference/api/pandas.Series.put
+generated/pandas.Series.quantile,../reference/api/pandas.Series.quantile
+generated/pandas.Series.radd,../reference/api/pandas.Series.radd
+generated/pandas.Series.rank,../reference/api/pandas.Series.rank
+generated/pandas.Series.ravel,../reference/api/pandas.Series.ravel
+generated/pandas.Series.rdiv,../reference/api/pandas.Series.rdiv
+generated/pandas.Series.rdivmod,../reference/api/pandas.Series.rdivmod
+generated/pandas.Series.real,../reference/api/pandas.Series.real
+generated/pandas.Series.reindex_axis,../reference/api/pandas.Series.reindex_axis
+generated/pandas.Series.reindex,../reference/api/pandas.Series.reindex
+generated/pandas.Series.reindex_like,../reference/api/pandas.Series.reindex_like
+generated/pandas.Series.rename_axis,../reference/api/pandas.Series.rename_axis
+generated/pandas.Series.rename,../reference/api/pandas.Series.rename
+generated/pandas.Series.reorder_levels,../reference/api/pandas.Series.reorder_levels
+generated/pandas.Series.repeat,../reference/api/pandas.Series.repeat
+generated/pandas.Series.replace,../reference/api/pandas.Series.replace
+generated/pandas.Series.resample,../reference/api/pandas.Series.resample
+generated/pandas.Series.reset_index,../reference/api/pandas.Series.reset_index
+generated/pandas.Series.rfloordiv,../reference/api/pandas.Series.rfloordiv
+generated/pandas.Series.rmod,../reference/api/pandas.Series.rmod
+generated/pandas.Series.rmul,../reference/api/pandas.Series.rmul
+generated/pandas.Series.rolling,../reference/api/pandas.Series.rolling +generated/pandas.Series.round,../reference/api/pandas.Series.round +generated/pandas.Series.rpow,../reference/api/pandas.Series.rpow +generated/pandas.Series.rsub,../reference/api/pandas.Series.rsub +generated/pandas.Series.rtruediv,../reference/api/pandas.Series.rtruediv +generated/pandas.Series.sample,../reference/api/pandas.Series.sample +generated/pandas.Series.searchsorted,../reference/api/pandas.Series.searchsorted +generated/pandas.Series.select,../reference/api/pandas.Series.select +generated/pandas.Series.sem,../reference/api/pandas.Series.sem +generated/pandas.Series.set_axis,../reference/api/pandas.Series.set_axis +generated/pandas.Series.set_value,../reference/api/pandas.Series.set_value +generated/pandas.Series.shape,../reference/api/pandas.Series.shape +generated/pandas.Series.shift,../reference/api/pandas.Series.shift +generated/pandas.Series.size,../reference/api/pandas.Series.size +generated/pandas.Series.skew,../reference/api/pandas.Series.skew +generated/pandas.Series.slice_shift,../reference/api/pandas.Series.slice_shift +generated/pandas.Series.sort_index,../reference/api/pandas.Series.sort_index +generated/pandas.Series.sort_values,../reference/api/pandas.Series.sort_values +generated/pandas.Series.sparse.density,../reference/api/pandas.Series.sparse.density +generated/pandas.Series.sparse.fill_value,../reference/api/pandas.Series.sparse.fill_value +generated/pandas.Series.sparse.from_coo,../reference/api/pandas.Series.sparse.from_coo +generated/pandas.Series.sparse.npoints,../reference/api/pandas.Series.sparse.npoints +generated/pandas.Series.sparse.sp_values,../reference/api/pandas.Series.sparse.sp_values +generated/pandas.Series.sparse.to_coo,../reference/api/pandas.Series.sparse.to_coo +generated/pandas.Series.squeeze,../reference/api/pandas.Series.squeeze +generated/pandas.Series.std,../reference/api/pandas.Series.std +generated/pandas.Series.str.capitalize,../reference/api/pandas.Series.str.capitalize +generated/pandas.Series.str.cat,../reference/api/pandas.Series.str.cat +generated/pandas.Series.str.center,../reference/api/pandas.Series.str.center +generated/pandas.Series.str.contains,../reference/api/pandas.Series.str.contains +generated/pandas.Series.str.count,../reference/api/pandas.Series.str.count +generated/pandas.Series.str.decode,../reference/api/pandas.Series.str.decode +generated/pandas.Series.str.encode,../reference/api/pandas.Series.str.encode +generated/pandas.Series.str.endswith,../reference/api/pandas.Series.str.endswith +generated/pandas.Series.str.extractall,../reference/api/pandas.Series.str.extractall +generated/pandas.Series.str.extract,../reference/api/pandas.Series.str.extract +generated/pandas.Series.str.findall,../reference/api/pandas.Series.str.findall +generated/pandas.Series.str.find,../reference/api/pandas.Series.str.find +generated/pandas.Series.str.get_dummies,../reference/api/pandas.Series.str.get_dummies +generated/pandas.Series.str.get,../reference/api/pandas.Series.str.get +generated/pandas.Series.str,../reference/api/pandas.Series.str +generated/pandas.Series.strides,../reference/api/pandas.Series.strides +generated/pandas.Series.str.index,../reference/api/pandas.Series.str.index +generated/pandas.Series.str.isalnum,../reference/api/pandas.Series.str.isalnum +generated/pandas.Series.str.isalpha,../reference/api/pandas.Series.str.isalpha +generated/pandas.Series.str.isdecimal,../reference/api/pandas.Series.str.isdecimal 
+generated/pandas.Series.str.isdigit,../reference/api/pandas.Series.str.isdigit +generated/pandas.Series.str.islower,../reference/api/pandas.Series.str.islower +generated/pandas.Series.str.isnumeric,../reference/api/pandas.Series.str.isnumeric +generated/pandas.Series.str.isspace,../reference/api/pandas.Series.str.isspace +generated/pandas.Series.str.istitle,../reference/api/pandas.Series.str.istitle +generated/pandas.Series.str.isupper,../reference/api/pandas.Series.str.isupper +generated/pandas.Series.str.join,../reference/api/pandas.Series.str.join +generated/pandas.Series.str.len,../reference/api/pandas.Series.str.len +generated/pandas.Series.str.ljust,../reference/api/pandas.Series.str.ljust +generated/pandas.Series.str.lower,../reference/api/pandas.Series.str.lower +generated/pandas.Series.str.lstrip,../reference/api/pandas.Series.str.lstrip +generated/pandas.Series.str.match,../reference/api/pandas.Series.str.match +generated/pandas.Series.str.normalize,../reference/api/pandas.Series.str.normalize +generated/pandas.Series.str.pad,../reference/api/pandas.Series.str.pad +generated/pandas.Series.str.partition,../reference/api/pandas.Series.str.partition +generated/pandas.Series.str.repeat,../reference/api/pandas.Series.str.repeat +generated/pandas.Series.str.replace,../reference/api/pandas.Series.str.replace +generated/pandas.Series.str.rfind,../reference/api/pandas.Series.str.rfind +generated/pandas.Series.str.rindex,../reference/api/pandas.Series.str.rindex +generated/pandas.Series.str.rjust,../reference/api/pandas.Series.str.rjust +generated/pandas.Series.str.rpartition,../reference/api/pandas.Series.str.rpartition +generated/pandas.Series.str.rsplit,../reference/api/pandas.Series.str.rsplit +generated/pandas.Series.str.rstrip,../reference/api/pandas.Series.str.rstrip +generated/pandas.Series.str.slice,../reference/api/pandas.Series.str.slice +generated/pandas.Series.str.slice_replace,../reference/api/pandas.Series.str.slice_replace +generated/pandas.Series.str.split,../reference/api/pandas.Series.str.split +generated/pandas.Series.str.startswith,../reference/api/pandas.Series.str.startswith +generated/pandas.Series.str.strip,../reference/api/pandas.Series.str.strip +generated/pandas.Series.str.swapcase,../reference/api/pandas.Series.str.swapcase +generated/pandas.Series.str.title,../reference/api/pandas.Series.str.title +generated/pandas.Series.str.translate,../reference/api/pandas.Series.str.translate +generated/pandas.Series.str.upper,../reference/api/pandas.Series.str.upper +generated/pandas.Series.str.wrap,../reference/api/pandas.Series.str.wrap +generated/pandas.Series.str.zfill,../reference/api/pandas.Series.str.zfill +generated/pandas.Series.sub,../reference/api/pandas.Series.sub +generated/pandas.Series.subtract,../reference/api/pandas.Series.subtract +generated/pandas.Series.sum,../reference/api/pandas.Series.sum +generated/pandas.Series.swapaxes,../reference/api/pandas.Series.swapaxes +generated/pandas.Series.swaplevel,../reference/api/pandas.Series.swaplevel +generated/pandas.Series.tail,../reference/api/pandas.Series.tail +generated/pandas.Series.take,../reference/api/pandas.Series.take +generated/pandas.Series.T,../reference/api/pandas.Series.T +generated/pandas.Series.timetuple,../reference/api/pandas.Series.timetuple +generated/pandas.Series.to_clipboard,../reference/api/pandas.Series.to_clipboard +generated/pandas.Series.to_csv,../reference/api/pandas.Series.to_csv +generated/pandas.Series.to_dense,../reference/api/pandas.Series.to_dense 
+generated/pandas.Series.to_dict,../reference/api/pandas.Series.to_dict +generated/pandas.Series.to_excel,../reference/api/pandas.Series.to_excel +generated/pandas.Series.to_frame,../reference/api/pandas.Series.to_frame +generated/pandas.Series.to_hdf,../reference/api/pandas.Series.to_hdf +generated/pandas.Series.to_json,../reference/api/pandas.Series.to_json +generated/pandas.Series.to_latex,../reference/api/pandas.Series.to_latex +generated/pandas.Series.to_list,../reference/api/pandas.Series.to_list +generated/pandas.Series.tolist,../reference/api/pandas.Series.tolist +generated/pandas.Series.to_msgpack,../reference/api/pandas.Series.to_msgpack +generated/pandas.Series.to_numpy,../reference/api/pandas.Series.to_numpy +generated/pandas.Series.to_period,../reference/api/pandas.Series.to_period +generated/pandas.Series.to_pickle,../reference/api/pandas.Series.to_pickle +generated/pandas.Series.to_sparse,../reference/api/pandas.Series.to_sparse +generated/pandas.Series.to_sql,../reference/api/pandas.Series.to_sql +generated/pandas.Series.to_string,../reference/api/pandas.Series.to_string +generated/pandas.Series.to_timestamp,../reference/api/pandas.Series.to_timestamp +generated/pandas.Series.to_xarray,../reference/api/pandas.Series.to_xarray +generated/pandas.Series.transform,../reference/api/pandas.Series.transform +generated/pandas.Series.transpose,../reference/api/pandas.Series.transpose +generated/pandas.Series.truediv,../reference/api/pandas.Series.truediv +generated/pandas.Series.truncate,../reference/api/pandas.Series.truncate +generated/pandas.Series.tshift,../reference/api/pandas.Series.tshift +generated/pandas.Series.tz_convert,../reference/api/pandas.Series.tz_convert +generated/pandas.Series.tz_localize,../reference/api/pandas.Series.tz_localize +generated/pandas.Series.unique,../reference/api/pandas.Series.unique +generated/pandas.Series.unstack,../reference/api/pandas.Series.unstack +generated/pandas.Series.update,../reference/api/pandas.Series.update +generated/pandas.Series.valid,../reference/api/pandas.Series.valid +generated/pandas.Series.value_counts,../reference/api/pandas.Series.value_counts +generated/pandas.Series.values,../reference/api/pandas.Series.values +generated/pandas.Series.var,../reference/api/pandas.Series.var +generated/pandas.Series.view,../reference/api/pandas.Series.view +generated/pandas.Series.where,../reference/api/pandas.Series.where +generated/pandas.Series.xs,../reference/api/pandas.Series.xs +generated/pandas.set_option,../reference/api/pandas.set_option +generated/pandas.SparseDataFrame.to_coo,../reference/api/pandas.SparseDataFrame.to_coo +generated/pandas.SparseSeries.from_coo,../reference/api/pandas.SparseSeries.from_coo +generated/pandas.SparseSeries.to_coo,../reference/api/pandas.SparseSeries.to_coo +generated/pandas.test,../reference/api/pandas.test +generated/pandas.testing.assert_frame_equal,../reference/api/pandas.testing.assert_frame_equal +generated/pandas.testing.assert_index_equal,../reference/api/pandas.testing.assert_index_equal +generated/pandas.testing.assert_series_equal,../reference/api/pandas.testing.assert_series_equal +generated/pandas.Timedelta.asm8,../reference/api/pandas.Timedelta.asm8 +generated/pandas.Timedelta.ceil,../reference/api/pandas.Timedelta.ceil +generated/pandas.Timedelta.components,../reference/api/pandas.Timedelta.components +generated/pandas.Timedelta.days,../reference/api/pandas.Timedelta.days +generated/pandas.Timedelta.delta,../reference/api/pandas.Timedelta.delta 
+generated/pandas.Timedelta.floor,../reference/api/pandas.Timedelta.floor +generated/pandas.Timedelta.freq,../reference/api/pandas.Timedelta.freq +generated/pandas.Timedelta,../reference/api/pandas.Timedelta +generated/pandas.TimedeltaIndex.ceil,../reference/api/pandas.TimedeltaIndex.ceil +generated/pandas.TimedeltaIndex.components,../reference/api/pandas.TimedeltaIndex.components +generated/pandas.TimedeltaIndex.days,../reference/api/pandas.TimedeltaIndex.days +generated/pandas.TimedeltaIndex.floor,../reference/api/pandas.TimedeltaIndex.floor +generated/pandas.TimedeltaIndex,../reference/api/pandas.TimedeltaIndex +generated/pandas.TimedeltaIndex.inferred_freq,../reference/api/pandas.TimedeltaIndex.inferred_freq +generated/pandas.TimedeltaIndex.microseconds,../reference/api/pandas.TimedeltaIndex.microseconds +generated/pandas.TimedeltaIndex.nanoseconds,../reference/api/pandas.TimedeltaIndex.nanoseconds +generated/pandas.TimedeltaIndex.round,../reference/api/pandas.TimedeltaIndex.round +generated/pandas.TimedeltaIndex.seconds,../reference/api/pandas.TimedeltaIndex.seconds +generated/pandas.TimedeltaIndex.to_frame,../reference/api/pandas.TimedeltaIndex.to_frame +generated/pandas.TimedeltaIndex.to_pytimedelta,../reference/api/pandas.TimedeltaIndex.to_pytimedelta +generated/pandas.TimedeltaIndex.to_series,../reference/api/pandas.TimedeltaIndex.to_series +generated/pandas.Timedelta.isoformat,../reference/api/pandas.Timedelta.isoformat +generated/pandas.Timedelta.is_populated,../reference/api/pandas.Timedelta.is_populated +generated/pandas.Timedelta.max,../reference/api/pandas.Timedelta.max +generated/pandas.Timedelta.microseconds,../reference/api/pandas.Timedelta.microseconds +generated/pandas.Timedelta.min,../reference/api/pandas.Timedelta.min +generated/pandas.Timedelta.nanoseconds,../reference/api/pandas.Timedelta.nanoseconds +generated/pandas.timedelta_range,../reference/api/pandas.timedelta_range +generated/pandas.Timedelta.resolution,../reference/api/pandas.Timedelta.resolution +generated/pandas.Timedelta.round,../reference/api/pandas.Timedelta.round +generated/pandas.Timedelta.seconds,../reference/api/pandas.Timedelta.seconds +generated/pandas.Timedelta.to_pytimedelta,../reference/api/pandas.Timedelta.to_pytimedelta +generated/pandas.Timedelta.total_seconds,../reference/api/pandas.Timedelta.total_seconds +generated/pandas.Timedelta.to_timedelta64,../reference/api/pandas.Timedelta.to_timedelta64 +generated/pandas.Timedelta.value,../reference/api/pandas.Timedelta.value +generated/pandas.Timedelta.view,../reference/api/pandas.Timedelta.view +generated/pandas.Timestamp.asm8,../reference/api/pandas.Timestamp.asm8 +generated/pandas.Timestamp.astimezone,../reference/api/pandas.Timestamp.astimezone +generated/pandas.Timestamp.ceil,../reference/api/pandas.Timestamp.ceil +generated/pandas.Timestamp.combine,../reference/api/pandas.Timestamp.combine +generated/pandas.Timestamp.ctime,../reference/api/pandas.Timestamp.ctime +generated/pandas.Timestamp.date,../reference/api/pandas.Timestamp.date +generated/pandas.Timestamp.day,../reference/api/pandas.Timestamp.day +generated/pandas.Timestamp.day_name,../reference/api/pandas.Timestamp.day_name +generated/pandas.Timestamp.dayofweek,../reference/api/pandas.Timestamp.dayofweek +generated/pandas.Timestamp.dayofyear,../reference/api/pandas.Timestamp.dayofyear +generated/pandas.Timestamp.days_in_month,../reference/api/pandas.Timestamp.days_in_month +generated/pandas.Timestamp.daysinmonth,../reference/api/pandas.Timestamp.daysinmonth 
+generated/pandas.Timestamp.dst,../reference/api/pandas.Timestamp.dst +generated/pandas.Timestamp.floor,../reference/api/pandas.Timestamp.floor +generated/pandas.Timestamp.fold,../reference/api/pandas.Timestamp.fold +generated/pandas.Timestamp.freq,../reference/api/pandas.Timestamp.freq +generated/pandas.Timestamp.freqstr,../reference/api/pandas.Timestamp.freqstr +generated/pandas.Timestamp.fromisoformat,../reference/api/pandas.Timestamp.fromisoformat +generated/pandas.Timestamp.fromordinal,../reference/api/pandas.Timestamp.fromordinal +generated/pandas.Timestamp.fromtimestamp,../reference/api/pandas.Timestamp.fromtimestamp +generated/pandas.Timestamp.hour,../reference/api/pandas.Timestamp.hour +generated/pandas.Timestamp,../reference/api/pandas.Timestamp +generated/pandas.Timestamp.is_leap_year,../reference/api/pandas.Timestamp.is_leap_year +generated/pandas.Timestamp.is_month_end,../reference/api/pandas.Timestamp.is_month_end +generated/pandas.Timestamp.is_month_start,../reference/api/pandas.Timestamp.is_month_start +generated/pandas.Timestamp.isocalendar,../reference/api/pandas.Timestamp.isocalendar +generated/pandas.Timestamp.isoformat,../reference/api/pandas.Timestamp.isoformat +generated/pandas.Timestamp.isoweekday,../reference/api/pandas.Timestamp.isoweekday +generated/pandas.Timestamp.is_quarter_end,../reference/api/pandas.Timestamp.is_quarter_end +generated/pandas.Timestamp.is_quarter_start,../reference/api/pandas.Timestamp.is_quarter_start +generated/pandas.Timestamp.is_year_end,../reference/api/pandas.Timestamp.is_year_end +generated/pandas.Timestamp.is_year_start,../reference/api/pandas.Timestamp.is_year_start +generated/pandas.Timestamp.max,../reference/api/pandas.Timestamp.max +generated/pandas.Timestamp.microsecond,../reference/api/pandas.Timestamp.microsecond +generated/pandas.Timestamp.min,../reference/api/pandas.Timestamp.min +generated/pandas.Timestamp.minute,../reference/api/pandas.Timestamp.minute +generated/pandas.Timestamp.month,../reference/api/pandas.Timestamp.month +generated/pandas.Timestamp.month_name,../reference/api/pandas.Timestamp.month_name +generated/pandas.Timestamp.nanosecond,../reference/api/pandas.Timestamp.nanosecond +generated/pandas.Timestamp.normalize,../reference/api/pandas.Timestamp.normalize +generated/pandas.Timestamp.now,../reference/api/pandas.Timestamp.now +generated/pandas.Timestamp.quarter,../reference/api/pandas.Timestamp.quarter +generated/pandas.Timestamp.replace,../reference/api/pandas.Timestamp.replace +generated/pandas.Timestamp.resolution,../reference/api/pandas.Timestamp.resolution +generated/pandas.Timestamp.round,../reference/api/pandas.Timestamp.round +generated/pandas.Timestamp.second,../reference/api/pandas.Timestamp.second +generated/pandas.Timestamp.strftime,../reference/api/pandas.Timestamp.strftime +generated/pandas.Timestamp.strptime,../reference/api/pandas.Timestamp.strptime +generated/pandas.Timestamp.time,../reference/api/pandas.Timestamp.time +generated/pandas.Timestamp.timestamp,../reference/api/pandas.Timestamp.timestamp +generated/pandas.Timestamp.timetuple,../reference/api/pandas.Timestamp.timetuple +generated/pandas.Timestamp.timetz,../reference/api/pandas.Timestamp.timetz +generated/pandas.Timestamp.to_datetime64,../reference/api/pandas.Timestamp.to_datetime64 +generated/pandas.Timestamp.today,../reference/api/pandas.Timestamp.today +generated/pandas.Timestamp.to_julian_date,../reference/api/pandas.Timestamp.to_julian_date +generated/pandas.Timestamp.toordinal,../reference/api/pandas.Timestamp.toordinal 
+generated/pandas.Timestamp.to_period,../reference/api/pandas.Timestamp.to_period +generated/pandas.Timestamp.to_pydatetime,../reference/api/pandas.Timestamp.to_pydatetime +generated/pandas.Timestamp.tz_convert,../reference/api/pandas.Timestamp.tz_convert +generated/pandas.Timestamp.tz,../reference/api/pandas.Timestamp.tz +generated/pandas.Timestamp.tzinfo,../reference/api/pandas.Timestamp.tzinfo +generated/pandas.Timestamp.tz_localize,../reference/api/pandas.Timestamp.tz_localize +generated/pandas.Timestamp.tzname,../reference/api/pandas.Timestamp.tzname +generated/pandas.Timestamp.utcfromtimestamp,../reference/api/pandas.Timestamp.utcfromtimestamp +generated/pandas.Timestamp.utcnow,../reference/api/pandas.Timestamp.utcnow +generated/pandas.Timestamp.utcoffset,../reference/api/pandas.Timestamp.utcoffset +generated/pandas.Timestamp.utctimetuple,../reference/api/pandas.Timestamp.utctimetuple +generated/pandas.Timestamp.value,../reference/api/pandas.Timestamp.value +generated/pandas.Timestamp.weekday,../reference/api/pandas.Timestamp.weekday +generated/pandas.Timestamp.weekday_name,../reference/api/pandas.Timestamp.weekday_name +generated/pandas.Timestamp.week,../reference/api/pandas.Timestamp.week +generated/pandas.Timestamp.weekofyear,../reference/api/pandas.Timestamp.weekofyear +generated/pandas.Timestamp.year,../reference/api/pandas.Timestamp.year +generated/pandas.to_datetime,../reference/api/pandas.to_datetime +generated/pandas.to_numeric,../reference/api/pandas.to_numeric +generated/pandas.to_timedelta,../reference/api/pandas.to_timedelta +generated/pandas.tseries.frequencies.to_offset,../reference/api/pandas.tseries.frequencies.to_offset +generated/pandas.unique,../reference/api/pandas.unique +generated/pandas.util.hash_array,../reference/api/pandas.util.hash_array +generated/pandas.util.hash_pandas_object,../reference/api/pandas.util.hash_pandas_object +generated/pandas.wide_to_long,../reference/api/pandas.wide_to_long diff --git a/doc/source/_static/banklist.html b/doc/source/_static/banklist.html index cbcce5a2d49ff..cb07c332acbe7 100644 --- a/doc/source/_static/banklist.html +++ b/doc/source/_static/banklist.html @@ -37,7 +37,7 @@ else var sValue = li.selectValue; $('#googlesearch').submit(); - + } function findValue2(li) { if( li == null ) return alert("No match!"); @@ -47,7 +47,7 @@ // otherwise, let's just display the value in the text box else var sValue = li.selectValue; - + $('#googlesearch2').submit(); } function selectItem(li) { @@ -62,7 +62,7 @@ function log(event, data, formatted) { $("
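The block above is the tail of a plain `old,new` CSV mapping: each row redirects a pre-reorganization URL under `generated/` to its new home under `reference/api/`. As a rough illustration of how such a file can be consumed, the sketch below writes a meta-refresh HTML stub at each old path. `write_redirect_stubs` is a hypothetical helper, not the pandas build code, and it assumes relative targets without a `.html` suffix, as in the rows above.

```python
import csv
import os

def write_redirect_stubs(csv_path: str, build_dir: str) -> None:
    """For each old,new row, write an HTML stub at the old path that
    immediately redirects the browser to the new location."""
    with open(csv_path, newline="") as f:
        for row in csv.reader(f):
            if len(row) != 2 or row[0].startswith("#"):
                continue  # skip comment lines and malformed rows
            old, new = row
            stub = os.path.join(build_dir, old + ".html")
            os.makedirs(os.path.dirname(stub), exist_ok=True)
            with open(stub, "w") as out:
                # a meta refresh with zero delay acts as a client-side redirect
                out.write(
                    '<html><head><meta http-equiv="refresh" '
                    f'content="0;URL={new}.html"/></head></html>\n'
                )
```

One design point worth noting: because the targets are relative (`../reference/api/...`), the stubs keep working regardless of where the built docs are hosted.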
  • ").html( !data ? "No match!" : "Selected: " + formatted).appendTo("#result"); } - + function formatItem(row) { return row[0] + " (id: " + row[1] + ")"; } @@ -81,7 +81,7 @@ selectFirst: false }); - + $("#search2").autocomplete("/searchjs.asp", { width: 160, autoFill: false, @@ -93,7 +93,7 @@ selectFirst: false }); - + }); @@ -232,16 +232,16 @@

    Each depositor insured to at least $250,000 per insured bank

    Failed Bank List

    The FDIC is often appointed as receiver for failed banks. This page contains useful information for the customers and vendors of these banks. This includes information on the acquiring bank (if applicable), how your accounts and loans are affected, and how vendors can file claims against the receivership. Failed Financial Institution Contact Search displays point of contact information related to failed banks.

    - +

    This list includes banks which have failed since October 1, 2000. To search for banks that failed prior to those on this page, visit this link: Failures and Assistance Transactions

    - +

    Failed Bank List - CSV file (Updated on Mondays. Also opens in Excel - Excel Help)

    - +

    Due to the small screen size some information is no longer visible.
    Full information available when viewed on a larger screen.

    @@ -253,7 +253,7 @@

    Failed Bank List

    City ST CERT - Acquiring Institution + Acquiring Institution Closing Date Updated Date @@ -294,7 +294,7 @@

    Failed Bank List

    Capital Bank, N.A. May 10, 2013 May 14, 2013 - + Douglas County Bank Douglasville @@ -383,7 +383,7 @@

    Failed Bank List

    Sunwest Bank January 11, 2013 January 24, 2013 - + Community Bank of the Ozarks Sunrise Beach @@ -392,7 +392,7 @@

    Failed Bank List

    Bank of Sullivan December 14, 2012 January 24, 2013 - + Hometown Community Bank Braselton @@ -401,7 +401,7 @@

    Failed Bank List

    CertusBank, National Association November 16, 2012 January 24, 2013 - + Citizens First National Bank Princeton @@ -518,7 +518,7 @@

    Failed Bank List

    Metcalf Bank July 20, 2012 December 17, 2012 - + First Cherokee State Bank Woodstock @@ -635,7 +635,7 @@

    Failed Bank List

    Southern States Bank May 18, 2012 May 20, 2013 - + Security Bank, National Association North Lauderdale @@ -644,7 +644,7 @@

    Failed Bank List

    Banesco USA May 4, 2012 October 31, 2012 - + Palm Desert National Bank Palm Desert @@ -734,7 +734,7 @@

    Failed Bank List

    No Acquirer March 9, 2012 October 29, 2012 - + Global Commerce Bank Doraville @@ -752,7 +752,7 @@

    Failed Bank List

    No Acquirer February 24, 2012 December 17, 2012 - + Central Bank of Georgia Ellaville @@ -761,7 +761,7 @@

    Failed Bank List

    Ameris Bank February 24, 2012 August 9, 2012 - + SCB Bank Shelbyville @@ -770,7 +770,7 @@

    Failed Bank List

    First Merchants Bank, National Association February 10, 2012 March 25, 2013 - + Charter National Bank and Trust Hoffman Estates @@ -779,7 +779,7 @@

    Failed Bank List

    Barrington Bank & Trust Company, National Association February 10, 2012 March 25, 2013 - + BankEast Knoxville @@ -788,7 +788,7 @@

    Failed Bank List

    U.S.Bank National Association January 27, 2012 March 8, 2013 - + Patriot Bank Minnesota Forest Lake @@ -797,7 +797,7 @@

    Failed Bank List

    First Resource Bank January 27, 2012 September 12, 2012 - + Tennessee Commerce Bank Franklin @@ -806,7 +806,7 @@

    Failed Bank List

    Republic Bank & Trust Company January 27, 2012 November 20, 2012 - + First Guaranty Bank and Trust Company of Jacksonville Jacksonville @@ -815,7 +815,7 @@

    Failed Bank List

    CenterState Bank of Florida, N.A. January 27, 2012 September 12, 2012 - + American Eagle Savings Bank Boothwyn @@ -824,7 +824,7 @@

    Failed Bank List

    Capital Bank, N.A. January 20, 2012 January 25, 2013 - + The First State Bank Stockbridge @@ -833,7 +833,7 @@

    Failed Bank List

    Hamilton State Bank January 20, 2012 January 25, 2013 - + Central Florida State Bank Belleview @@ -842,7 +842,7 @@

    Failed Bank List

    CenterState Bank of Florida, N.A. January 20, 2012 January 25, 2013 - + Western National Bank Phoenix @@ -869,7 +869,7 @@

    Failed Bank List

    First NBC Bank November 18, 2011 August 13, 2012 - + Polk County Bank Johnston @@ -887,7 +887,7 @@

    Failed Bank List

    Century Bank of Georgia November 10, 2011 August 13, 2012 - + SunFirst Bank Saint George @@ -896,7 +896,7 @@

    Failed Bank List

    Cache Valley Bank November 4, 2011 November 16, 2012 - + Mid City Bank, Inc. Omaha @@ -905,7 +905,7 @@

    Failed Bank List

    Premier Bank November 4, 2011 August 15, 2012 - + All American Bank Des Plaines @@ -914,7 +914,7 @@

    Failed Bank List

    International Bank of Chicago October 28, 2011 August 15, 2012 - + Community Banks of Colorado Greenwood Village @@ -959,7 +959,7 @@

    Failed Bank List

    Blackhawk Bank & Trust October 14, 2011 August 15, 2012 - + First State Bank Cranford @@ -968,7 +968,7 @@

    Failed Bank List

    Northfield Bank October 14, 2011 November 8, 2012 - + Blue Ridge Savings Bank, Inc. Asheville @@ -977,7 +977,7 @@

    Failed Bank List

    Bank of North Carolina October 14, 2011 November 8, 2012 - + Piedmont Community Bank Gray @@ -986,7 +986,7 @@

    Failed Bank List

    State Bank and Trust Company October 14, 2011 January 22, 2013 - + Sun Security Bank Ellington @@ -1202,7 +1202,7 @@

    Failed Bank List

    Ameris Bank July 15, 2011 November 2, 2012 - + One Georgia Bank Atlanta @@ -1247,7 +1247,7 @@

    Failed Bank List

    First American Bank and Trust Company June 24, 2011 November 2, 2012 - + First Commercial Bank of Tampa Bay Tampa @@ -1256,7 +1256,7 @@

    Failed Bank List

    Stonegate Bank June 17, 2011 November 2, 2012 - + McIntosh State Bank Jackson @@ -1265,7 +1265,7 @@

    Failed Bank List

    Hamilton State Bank June 17, 2011 November 2, 2012 - + Atlantic Bank and Trust Charleston @@ -1274,7 +1274,7 @@

    Failed Bank List

    First Citizens Bank and Trust Company, Inc. June 3, 2011 October 31, 2012 - + First Heritage Bank Snohomish @@ -1283,7 +1283,7 @@

    Failed Bank List

    Columbia State Bank May 27, 2011 January 28, 2013 - + Summit Bank Burlington @@ -1292,7 +1292,7 @@

    Failed Bank List

    Columbia State Bank May 20, 2011 January 22, 2013 - + First Georgia Banking Company Franklin @@ -2030,7 +2030,7 @@

    Failed Bank List

    Westamerica Bank August 20, 2010 September 12, 2012 - + Los Padres Bank Solvang @@ -2624,7 +2624,7 @@

    Failed Bank List

    MB Financial Bank, N.A. April 23, 2010 August 23, 2012 - + Amcore Bank, National Association Rockford @@ -2768,7 +2768,7 @@

    Failed Bank List

    First Citizens Bank March 19, 2010 August 23, 2012 - + Bank of Hiawassee Hiawassee @@ -3480,7 +3480,7 @@

    Failed Bank List

    October 2, 2009 August 21, 2012 - + Warren Bank Warren MI @@ -3767,7 +3767,7 @@

    Failed Bank List

    Herring Bank July 31, 2009 August 20, 2012 - + Security Bank of Jones County Gray @@ -3848,7 +3848,7 @@

    Failed Bank List

    California Bank & Trust July 17, 2009 August 20, 2012 - + BankFirst Sioux Falls @@ -4811,7 +4811,7 @@

    Failed Bank List

    Bank of the Orient October 13, 2000 March 17, 2005 - + @@ -4854,7 +4854,7 @@

    Failed Bank List
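For context on the hunks summarized above: the edit to the fixture is purely mechanical, and could be reproduced with a small helper along these lines (`strip_trailing_whitespace` is illustrative, not part of the patch):

```python
from pathlib import Path

def strip_trailing_whitespace(path: str) -> None:
    """Rewrite a text file with trailing whitespace removed from every line."""
    p = Path(path)
    lines = p.read_text(encoding="utf-8").splitlines()
    p.write_text("\n".join(line.rstrip() for line in lines) + "\n", encoding="utf-8")

strip_trailing_whitespace("doc/source/_static/banklist.html")
```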