Description
Short Description
Would maintainers here be open to a pull request that modifies MANIFEST.in
to exclude test files from this project's package artifacts?
Long Description
Today, all test files are included in the project's package artifacts.
```shell
python setup.py sdist > /dev/null
cat dask_kubernetes.egg-info/SOURCES.txt
```

```text
LICENSE
...
dask_kubernetes/tests/config-demo.yaml
dask_kubernetes/tests/conftest.py
dask_kubernetes/tests/fake_gcp_auth.py
dask_kubernetes/tests/test_async.py
dask_kubernetes/tests/test_helm.py
dask_kubernetes/tests/test_kind.py
dask_kubernetes/tests/test_objects.py
dask_kubernetes/tests/test_sync.py
dask_kubernetes/tests/helm/values.yaml
```
I believe such a change could reduce the uncompressed size of the package by about 150 KB.
| | main | without tests |
|---|---|---|
| compressed | 56k | 48k |
| uncompressed | 576k | 424k |
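The size estimate follows directly from the table. As a small worked check (plain arithmetic on the reported `du -h` figures, which are in the `k` units `du` prints; `savings` is just an illustrative helper, not part of any tooling), the change trims roughly a quarter of the uncompressed size and about a seventh of the compressed size:

```python
from __future__ import annotations


def savings(before: float, after: float) -> tuple[float, float]:
    """Return (absolute reduction, reduction as a percentage of `before`)."""
    saved = before - after
    return saved, 100.0 * saved / before


# Sizes from the table above, in the "k" units reported by `du -h`
rows = [("compressed", 56, 48), ("uncompressed", 576, 424)]
for label, before, after in rows:
    saved, pct = savings(before, after)
    print(f"{label}: {saved:.0f}k smaller ({pct:.1f}%)")
```

This prints `compressed: 8k smaller (14.3%)` and `uncompressed: 152k smaller (26.4%)`, consistent with the roughly 150 KB estimate.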
how I estimated this
Place the following at the root of the repo as `check-sizes.sh`, and run `sh check-sizes.sh`.
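```shell
# clean up artifacts from previous runs
# (the egg-info directory uses an underscore, matching the SOURCES.txt path below)
rm -rf dask_kubernetes.egg-info
rm -rf dist/
rm -rf __pycache__

echo ""
echo "building source distribution"
echo ""
python setup.py sdist > /dev/null
cp dask_kubernetes.egg-info/SOURCES.txt ~/DASK-K8S-SOURCES.txt

# plain cd instead of pushd/popd, which are not available in POSIX sh
cd dist/ || exit 1

echo ""
echo "sdist compressed size"
echo ""
du -a -h .

# unpack the tarball and delete it, so only the extracted files are measured
tar -xf dask-kubernetes*.tar.gz
rm dask-kubernetes*.tar.gz
ls .

echo ""
echo "sdist uncompressed size"
echo ""
du -sh .
cd ..
```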
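The script measures sizes with `du`, which reports disk blocks used rather than the sum of file sizes, so its numbers depend on the filesystem. A filesystem-independent alternative is to read the tarball with Python's standard-library `tarfile` module, without extracting it. This is a minimal sketch, not part of the original script: `sdist_sizes` is a hypothetical name, and it assumes a gzip-compressed sdist (`.tar.gz`), which is what `python setup.py sdist` produces by default.

```python
from __future__ import annotations

import os
import tarfile


def sdist_sizes(tarball: str) -> tuple[int, int]:
    """Return (compressed size, total uncompressed size) of a .tar.gz sdist, in bytes.

    The uncompressed figure is the sum of the regular files' sizes, so unlike
    `du` it does not depend on the filesystem's block size.
    """
    compressed = os.path.getsize(tarball)
    with tarfile.open(tarball, mode="r:gz") as tar:
        # only count regular files; directory entries have no meaningful size
        uncompressed = sum(m.size for m in tar.getmembers() if m.isfile())
    return compressed, uncompressed
```

For example, `sdist_sizes(glob.glob("dist/dask-kubernetes*.tar.gz")[0])` could be called after each build. Expect the uncompressed total to be typically smaller than the `du -sh` figure, because `du` rounds each file up to whole blocks.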
Change `MANIFEST.in` to the following and run the script again.
```text
recursive-include dask_kubernetes *.py
recursive-include dask_kubernetes *.yaml
recursive-exclude dask_kubernetes/tests *
include setup.py
include setup.cfg
include LICENSE
include README.rst
include requirements.txt
include MANIFEST.in
include versioneer.py
recursive-exclude * __pycache__
recursive-exclude * *.py[co]
```
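After switching to the `MANIFEST.in` above and rebuilding, it is easy to confirm that nothing under `dask_kubernetes/tests/` made it into the sdist. The sketch below lists any such members using the standard-library `tarfile` module. `find_excluded_members` is a hypothetical helper, and it assumes the usual sdist layout in which every path starts with a `<name>-<version>/` directory.

```python
from __future__ import annotations

import tarfile


def find_excluded_members(tarball: str, excluded_dir: str = "dask_kubernetes/tests/") -> list[str]:
    """Return the sdist members that live under excluded_dir (empty if none)."""
    with tarfile.open(tarball, mode="r:gz") as tar:
        names = tar.getnames()
    found = []
    for name in names:
        # sdist paths look like "<name>-<version>/<path>", so drop the top-level directory
        _, _, relative = name.partition("/")
        if relative.startswith(excluded_dir):
            found.append(name)
    return found
```

An empty list means the exclusion worked; any returned paths are test files that still slipped into the sdist.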
why I think this change should be considered
Even though the package is already fairly small and the savings here are not huge, I still think this is worth doing, unless there is a compelling reason to support running tests from the published package artifact (rather than from a clone of the repo).
- reduces the amount of data transferred by PyPI and conda channels serving this package
- reduces the total size of any container images built with `dask-kubernetes` in them (and, therefore, the data transferred when those images are pulled, plus the storage footprint of those images)
- reduces the storage footprint of the package, which can be important in environments like AWS Lambda functions where storage is very limited (Suggestion: remove tests from the distribution pandas-dev/pandas#30741 (comment))
- reduces all of the above for new packages created in the future that copy `dask-kubernetes` as a reference
references
Other projects that have adopted similar changes when I've proposed them:
- remove docs from packages pandas-dev/pandas#38846
- [python-package] remove unnecessary files to reduce sdist size microsoft/LightGBM#3639
- [python-package] cut unnecessary files out of sdist package dmlc/xgboost#6560
Thanks for your time and consideration!