[IMP] snippets.convert_html_columns: a batch processing story #94
TLDR: RTFM
Once upon a time, in a countryside farm in Belgium...
At first, upgrading databases was straightforward. But as time passed, the databases grew, and some CPU-intensive computations took so long that a solution had to be found. Fortunately, the Python standard library has the perfect module for this task: `concurrent.futures`.
Then Python 3.10 appeared, and `ProcessPoolExecutor` started to hang occasionally for no apparent reason. Soon, our hero found out he wasn't the only one to suffer from this issue [1]. Unfortunately, the proposed solution looked like overkill. Still, it revealed that the issue had already been known [2] for a few years. Although an official patch wasn't ready to be committed, the discussion about its legitimacy [3] led our hero to a nicer solution.
By default, `ProcessPoolExecutor.map` submits elements to the pool one by one (`chunksize=1`). This is pretty inefficient when there are a lot of elements to process. It can be changed by passing a large value for the `chunksize` argument, so that elements are sent to the workers in batches.
Who would have thought that a bigger chunk size would solve a performance issue?
As always, the answer was in the documentation [4].
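As a minimal sketch of the idea, and not the code from this pull request, the batching could look like the following. The `convert_all` helper and its default `chunksize` of 1000 are hypothetical, and `html.escape` is only a stand-in for the real per-row conversion.

```python
import html
from concurrent.futures import ProcessPoolExecutor


def convert_all(values, chunksize=1000, max_workers=None):
    """Apply a CPU-bound conversion to every value using a process pool.

    `html.escape` is only a stand-in for the real per-row conversion, and
    the default chunksize of 1000 is an arbitrary illustrative value.
    """
    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        # With chunksize=1 (the default), every item is sent to a worker as
        # its own task. A larger chunksize groups items into batches, so far
        # fewer tasks cross the process boundary.
        return list(executor.map(html.escape, values, chunksize=chunksize))


if __name__ == "__main__":
    rows = ["<p>row %d & co</p>" % i for i in range(10_000)]
    converted = convert_all(rows, chunksize=1000)
    print(len(converted), converted[0])
```

`Executor.map` returns results in input order regardless of `chunksize`, so the caller does not need to change how it consumes them. The best value depends on the workload and is worth measuring rather than guessing.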
Footnotes
1. https://stackoverflow.com/questions/74633896/processpoolexecutor-using-map-hang-on-large-load
2. https://github.com/python/cpython/issues/74028
3. https://github.com/python/cpython/pull/114975#pullrequestreview-1867070041
4. https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.Executor.map