Skip to content

feature request: exclude tests from package tarball #340

@jameslamb

Description

@jameslamb

Short Description

Would maintainers here be open to a pull request that modifies MANIFEST.in to exclude test files from this project's package artifacts?

Long Description

Today, all test files are included in the project's package artifacts.

python setup.py sdist > /dev/null
cat dask_kubernetes.egg-info/SOURCES.txt

LICENSE
...
dask_kubernetes/tests/config-demo.yaml
dask_kubernetes/tests/conftest.py
dask_kubernetes/tests/fake_gcp_auth.py
dask_kubernetes/tests/test_async.py
dask_kubernetes/tests/test_helm.py
dask_kubernetes/tests/test_kind.py
dask_kubernetes/tests/test_objects.py
dask_kubernetes/tests/test_sync.py
dask_kubernetes/tests/helm/values.yaml

I believe such a change could reduce the uncompressed size of the package by about 150kb.

main without tests
compressed 56k 48k
uncompressed 576k 424k
how I estimated this this

Place the following at the root of the repo, call it check-sizes.sh, and run sh check-sizes.sh

rm -rf dask-kubernetes.egg-info
rm -rf dist/
rm -rf __pycache__

echo ""
echo "building source distribution"
echo ""
python setup.py sdist > /dev/null
cp dask_kubernetes.egg-info/SOURCES.txt ~/DASK-K8S-SOURCES.txt
pushd dist/
    echo ""
    echo "sdist compressed size"
    echo ""
    du -a -h .
    tar -xf dask-kubernetes*.tar.gz
    rm dask-kubernetes*.tar.gz
    ls .
    echo ""
    echo "sdist uncompressed size"
    echo ""
    du -sh .
popd

Change MANIFEST.in to the following and run the script again.

recursive-include dask_kubernetes *.py
recursive-include dask_kubernetes *.yaml
recursive-exclude dask_kubernetes/tests *

include setup.py
include setup.cfg
include LICENSE
include README.rst
include requirements.txt
include MANIFEST.in
include versioneer.py

recursive-exclude * __pycache__
recursive-exclude * *.py[co]

why I think this change should be considered

Even though the package is already fairly small and the savings here are not huge, I still think this is worth doing unless there is a compelling reason to allow tests to be run from the published package artifact (instead of from cloning the repo).

  • reduces amount of data transfer for PyPI and conda channels serving this package
  • reduces the total size of any container images built with dask-kubernetes in them (and, therefore, data transfer when those images are pulled + storage footprint of those images)
  • reduces storage footprint of the package, which can be important in environments like AWS Lambda functions where storage is very limited (Suggestion: remove tests from the distribution pandas-dev/pandas#30741 (comment))
  • reduces any of the above for new packages created in the future that copy dask-kubernetes as a reference

references

Other projects that have adopted similar changes when I've proposed them:

Thanks for your time and consideration!

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions