We have observed this error only once in our MTT runs. That setup runs v2.x with SLURM/PMIx. The failure may be related to this configuration, though I doubt it.
Here is the error message:
export OMPI_MCA_btl_openib_if_include=mlx4_0:1
OMPI_MCA_btl_openib_if_include=mlx4_0:1
OMPI_MCA_mpi_add_procs_cutoff=0
OMPI_MCA_pmix_base_async_modex=1
OMPI_MCA_pmix_base_collect_data=0
/tmp/mtt_116453_slurm/bin/srun -N 8 -n 64 --mpi=pmix_v1 -p pmellanox <mtt-base>/installs/T8JL/tests/mpich_tests/mpich-mellanox.git/test/mpi/coll/allgather2
[boo13:10605] Attempt to free memory that is still in use by an ongoing MPI communication (buffer 0xa89000, size 9302016). MPI job will now abort.
srun: error: boo13: task 9: Exited with exit code 1
srun: Terminating job step 1592.0