#9408 (my PR) appears to have introduced a race condition in mypy_test.py.
Evidence of the race
We've had several "flukey" errors in CI over the last few days, all with the same traceback:
- On Python 3.7: https://github.com/python/typeshed/actions/runs/3909559123/jobs/6682289730
- On Python 3.8:
- On Python 3.9: https://github.com/python/typeshed/actions/runs/3916447325/jobs/6695558555
The traceback is the same every time:
Testing third-party packages...
Traceback (most recent call last):
  File "./tests/mypy_test.py", line 557, in <module>
    main()
  File "./tests/mypy_test.py", line 544, in main
    code, files_checked_this_version = test_typeshed(code, args=config, tempdir=td_path)
  File "./tests/mypy_test.py", line 525, in test_typeshed
    code, third_party_files_checked = test_third_party_stubs(code, args, tempdir)
  File "./tests/mypy_test.py", line 500, in test_third_party_stubs
    setup_virtual_environments(distributions_to_check, args, tempdir)
  File "./tests/mypy_test.py", line 444, in setup_virtual_environments
    requirements_set, venv_info = venv_info_future.result()
  File "/opt/hostedtoolcache/Python/3.7.15/x64/lib/python3.7/concurrent/futures/_base.py", line 428, in result
    return self.__get_result()
  File "/opt/hostedtoolcache/Python/3.7.15/x64/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/opt/hostedtoolcache/Python/3.7.15/x64/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "./tests/mypy_test.py", line 384, in setup_venv_for_external_requirements_set
    return requirements_set, make_venv(venv_dir)
  File "/home/runner/work/typeshed/typeshed/tests/utils.py", line 150, in make_venv
    venv.create(venv_dir, with_pip=True, clear=True)
  File "/opt/hostedtoolcache/Python/3.7.15/x64/lib/python3.7/venv/__init__.py", line 390, in create
    builder.create(env_dir)
  File "/opt/hostedtoolcache/Python/3.7.15/x64/lib/python3.7/venv/__init__.py", line 68, in create
    self._setup_pip(context)
  File "/opt/hostedtoolcache/Python/3.7.15/x64/lib/python3.7/venv/__init__.py", line 288, in _setup_pip
    subprocess.check_output(cmd, stderr=subprocess.STDOUT)
  File "/opt/hostedtoolcache/Python/3.7.15/x64/lib/python3.7/subprocess.py", line 411, in check_output
    **kwargs).stdout
  File "/opt/hostedtoolcache/Python/3.7.15/x64/lib/python3.7/subprocess.py", line 488, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/opt/hostedtoolcache/Python/3.7.15/x64/lib/python3.7/subprocess.py", line 800, in __init__
    restore_signals, start_new_session)
  File "/opt/hostedtoolcache/Python/3.7.15/x64/lib/python3.7/subprocess.py", line 1551, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
OSError: [Errno 26] Text file busy: '/tmp/tmp45sugieh/.venv-5354540476898/bin/python'
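For what it's worth, errno 26 corresponds to ETXTBSY on Linux, the error raised when a process tries to execute a file that another process still has open for writing:

```python
>>> import errno
>>> errno.ETXTBSY
26
>>> errno.errorcode[26]
'ETXTBSY'
```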
Cause of the race
The race happens in this part of the test, where multiple virtual environments are set up concurrently using a threadpool (tests/mypy_test.py, lines 438 to 445 at commit 0a291da):
with concurrent.futures.ThreadPoolExecutor() as executor:
    venv_info_futures = [
        executor.submit(setup_venv_for_external_requirements_set, requirements_set, tempdir)
        for requirements_set in external_requirements_to_distributions
    ]
    for venv_info_future in concurrent.futures.as_completed(venv_info_futures):
        requirements_set, venv_info = venv_info_future.result()
        requirements_sets_to_venvs[requirements_set] = venv_info
However, it's unclear exactly why the race occurs. The traceback indicates that two threads are attempting to access the same Python executable at the same time, yet each requirements set is given its own venv directory, so it's hard to see how that could happen.
Reproducing the race condition
@JelleZijlstra has managed to reproduce the race locally on a Linux machine using Python 3.7. This rules out any GitHub Actions-specific hypotheses for why the race might be happening.
I have not been able to reproduce the failure in my local environment (Windows, Python 3.10). (I've tried creating a venv for 145 stubs packages in a threadpool with 20 workers -- still no race for me.)
Note that mypy_test.py is only ever run on Ubuntu in CI.
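For concreteness, a stripped-down repro attempt along those lines looks something like the sketch below (this reflects the general approach rather than the exact script used; the venv count and worker count are illustrative):

```python
# Sketch: create many venvs concurrently in a thread pool, the same way mypy_test.py does.
import concurrent.futures
import tempfile
import venv
from pathlib import Path


def create_one(venv_dir: Path) -> None:
    # This is the call that fails with ETXTBSY in CI.
    venv.create(venv_dir, with_pip=True, clear=True)


with tempfile.TemporaryDirectory() as td:
    venv_dirs = [Path(td) / f".venv-{i}" for i in range(145)]
    with concurrent.futures.ThreadPoolExecutor(max_workers=20) as executor:
        # Consuming the iterator re-raises any exception from the worker threads.
        list(executor.map(create_one, venv_dirs))
```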
Fixing the issue
Idea (1): Just create the venvs one-at-a-time
You can't have a race condition if you don't try to do things concurrently. Locally, though, this makes the test much slower for me -- the time taken to create the venvs goes up from around 20 seconds to around 60 seconds. The slow part of creating a venv is the stdlib's venv.EnvBuilder._setup_pip method -- the exact part that appears to be implicated in the race condition.
However, if we did go this route, we could potentially mitigate the performance regression by caching venvs between runs of the test (both locally and in CI).
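Concretely, this would mean replacing the threadpool shown above with a plain loop -- something like this sketch, reusing the names from the snippet above:

```python
# Sketch of idea (1): create the venvs one at a time, no executor involved.
# setup_venv_for_external_requirements_set, external_requirements_to_distributions,
# requirements_sets_to_venvs and tempdir are the existing names from mypy_test.py.
for requirements_set in external_requirements_to_distributions:
    _, venv_info = setup_venv_for_external_requirements_set(requirements_set, tempdir)
    requirements_sets_to_venvs[requirements_set] = venv_info
```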
Idea (2): Do concurrency differently
Instead of using a threadpool, we could manually create our threads. Or we could move to an asyncio-based concurrency model.
Either of these might fix the race condition, but it's hard to know whether they would without knowing exactly what's causing it.
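As an illustration, one asyncio-flavoured approach (an assumption about what this could look like, not something the test does today) would be to shell out to python -m venv as subprocesses rather than calling venv.create() from worker threads:

```python
# Sketch of an asyncio-based alternative: drive `python -m venv` as subprocesses
# instead of calling venv.create() from threads.
from __future__ import annotations

import asyncio
import sys
from pathlib import Path


async def make_venv_async(venv_dir: Path) -> None:
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-m", "venv", "--clear", str(venv_dir)
    )
    if await proc.wait() != 0:
        raise RuntimeError(f"venv creation failed for {venv_dir}")


async def make_all_venvs(venv_dirs: list[Path]) -> None:
    await asyncio.gather(*(make_venv_async(d) for d in venv_dirs))
```

Whether switching to subprocesses like this would actually dodge the ETXTBSY failure is an open question, for the reason given above.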
Idea (3): Paper over the problem
This diff would probably "fix" the problem, but it wouldn't be a particularly principled fix:
diff --git a/tests/mypy_test.py b/tests/mypy_test.py
index 087f41366..a0a66ed31 100644
--- a/tests/mypy_test.py
+++ b/tests/mypy_test.py
@@ -381,7 +381,17 @@ _DISTRIBUTION_TO_VENV_MAPPING: dict[str, VenvInfo] = {}
 def setup_venv_for_external_requirements_set(requirements_set: frozenset[str], tempdir: Path) -> tuple[frozenset[str], VenvInfo]:
     venv_dir = tempdir / f".venv-{hash(requirements_set)}"
-    return requirements_set, make_venv(venv_dir)
+    while True:
+        try:
+            venv_info = make_venv(venv_dir)
+        except OSError as e:
+            if e.errno != 26:
+                raise
+            else:
+                time.sleep(0.5)
+        else:
+            break
+    return requirements_set, venv_info
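If we did go down this road, a slightly more defensive variant of the same idea (my sketch, not part of the diff above) would compare against errno.ETXTBSY rather than the bare number 26, and cap the number of retries:

```python
# Sketch of a variant of idea (3): retry on ETXTBSY only, with a bounded number of attempts.
# make_venv is the existing helper in tests/utils.py.
import errno
import time


def make_venv_with_retry(venv_dir, attempts=10):
    for attempt in range(attempts):
        try:
            return make_venv(venv_dir)
        except OSError as e:
            # Re-raise immediately for anything other than ETXTBSY,
            # and give up after the final attempt.
            if e.errno != errno.ETXTBSY or attempt == attempts - 1:
                raise
            time.sleep(0.5)
```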