
UCX ptmalloc3 conflicts with Open MPI ptmalloc2 #696

@abouteiller

The following command line crashes during MPI_FINALIZE (backtrace below):
mpirun -np 2 -hostfile /opt/etc/arc.machinefile.ompi -np 2 bin/ucx_perftest -c0 -xrc -dmlx4_0:2 -t put_bw

#0  0x00007fabb8628625 in raise () from /lib64/libc.so.6
#1  0x00007fabb8629d8d in abort () from /lib64/libc.so.6
#2  0x00007fabb9abcdbf in ucm_dlfree (mem=0x1a3c000) at ../../../src/ucm/ptmalloc3/malloc.c:4659
#3  0x00007fabb9ab4042 in ucm_set_event_handler (events=32683, priority=-1179959230, cb=0x7ffebc795680, arg=0x7) at ../../../src/ucm/event/event.c:374
#4  0x00007fabb9ab409e in ucm_set_event_handler (events=32683, priority=-1179912614, cb=0x1a3c010, arg=0x7fabb9ab409e <ucm_set_event_handler+120>)
    at ../../../src/ucm/event/event.c:389
#5  0x00007fabb9ab42bf in ucs_spin_lock (lock=0x7fabb9ab42bf <ucs_spin_lock+35>) at /home/bouteill/ucx/debug.build/../src/ucs/type/spinlock.h:47
#6  0x00007fabb869e77d in closedir () from /lib64/libc.so.6
#7  0x00007fabb7c90ca9 in opal_os_dirpath_is_empty () from /opt/ompi-1.8.8/lib/libopen-pal.so.6
#8  0x00007fabb7c90d15 in opal_os_dirpath_destroy () from /opt/ompi-1.8.8/lib/libopen-pal.so.6
#9  0x00007fabb7f377fa in orte_session_dir_finalize () from /opt/ompi-1.8.8/lib/libopen-rte.so.7
#10 0x00007fabb7f4b7ca in orte_ess_base_app_finalize () from /opt/ompi-1.8.8/lib/libopen-rte.so.7
#11 0x00007fabb72261eb in rte_finalize () from /opt/ompi-1.8.8/lib/openmpi/mca_ess_env.so
#12 0x00007fabb7f29e11 in orte_finalize () from /opt/ompi-1.8.8/lib/libopen-rte.so.7
#13 0x00007fabb8bea671 in ompi_mpi_finalize () from /opt/ompi-1.8.8/lib/libmpi.so.1
#14 0x0000000000408614 in cleanup_mpi_rte (ctx=0x7ffebc7958c0) at ../../../../src/tools/perf/perftest.c:1045
#15 0x0000000000408d87 in main (argc=6, argv=0x7ffebc795c98) at ../../../../src/tools/perf/perftest.c:1201

With Open MPI's ptmalloc2 memory hooks disabled, the same test completes without error:
mpirun -mca memory_linux_ptmalloc2_enable false -hostfile /opt/etc/arc.machinefile.ompi -np 2 bin/ucx_perftest -c0 -xrc -dmlx4_0:2 -t put_bw
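The same workaround can likely be applied through the environment instead of on the mpirun command line. This is a sketch that assumes Open MPI's usual convention of reading any MCA parameter from an OMPI_MCA_<parameter name> variable, which mpirun forwards to the processes it launches. It has not been verified against this particular 1.8.8 installation.

```shell
# Sketch (assumption): Open MPI picks up MCA parameters from OMPI_MCA_<name>
# environment variables, so this should be equivalent to passing
# "-mca memory_linux_ptmalloc2_enable false" to mpirun.
export OMPI_MCA_memory_linux_ptmalloc2_enable=false
```

With the variable exported, the original mpirun command (without the -mca option) should then behave like the working invocation.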