Closed
Description
Background information
What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)
v5.0.3
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
Using spack 0.22.1
Please describe the system on which you are running
- Operating system/version: Ubuntu 20.04
- Computer hardware: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
Details of the problem
I am using parallel HDF5 to write a 2D distributed array. If I pass a cartesian communicator to HDF5, I sometimes notice that the dataset in the HDF5 file is corrupted when using 3 processes. Attached (hdf5_reproducer.tar.gz) is a small reproducer in C (< 100 LOC), together with an HDF5 file I got by running the reproducer and the output of the ompi_info command.
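As a rough illustration of what a 2D array distributed over 3 processes can look like, here is a minimal sketch that assumes a row-wise block split. The helper block_for_rank and the type block_t are hypothetical names for this illustration; they are not taken from the attached reproducer, whose decomposition may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type (not from the attached reproducer): the contiguous
 * block of rows owned by one rank. */
typedef struct {
    size_t offset; /* index of the first row owned by this rank */
    size_t count;  /* number of rows owned by this rank */
} block_t;

/* Split n rows into nprocs contiguous blocks and return the block owned
 * by rank. Assumption: the first (n % nprocs) ranks get one extra row. */
static block_t block_for_rank(size_t n, int nprocs, int rank)
{
    assert(nprocs > 0 && rank >= 0 && rank < nprocs);

    size_t p = (size_t)nprocs;
    size_t r = (size_t)rank;
    size_t base = n / p;   /* rows every rank gets */
    size_t extra = n % p;  /* leftover rows, handed to the lowest ranks */

    block_t b;
    b.count = base + (r < extra ? 1 : 0);
    /* Ranks below this one contribute base rows each, plus one extra
     * row for each of them that is among the first `extra` ranks. */
    b.offset = r * base + (r < extra ? r : extra);
    return b;
}
```

With such a split, each rank would typically use its offset and count as the first-dimension start and count of its hyperslab selection (H5Sselect_hyperslab) before the collective write; that part is not shown here.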
Without understanding the underlying logic, I also noticed several situations in which I never seem to get corrupted data:
- requesting MPI_THREAD_MULTIPLE during MPI initialization,
- passing a non-cartesian communicator,
- using another MPI implementation such as MPICH.
Thank you,
Thomas