Description
Please submit all the information below so that we can understand the working environment that is the context for your question.
- If you have a problem building or installing Open MPI, be sure to read this.
- If you have a problem launching MPI or OpenSHMEM applications, be sure to read this.
- If you have a problem running MPI or OpenSHMEM applications (i.e., after launching them), be sure to read this.
Background information
I want to exchange data of a derived data type between several ranks. When the array being sent is small, the data are sent and received successfully. But when I change the array from e(2,2) to e(200,200) and send e(1,1:100), errors appear. I did not revise anything other than the dimensions of the array, which is strange. I also tested whether this problem occurs when the data type is plain double precision and found that it does not.
What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)
v4.0.3
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
source
If you are building/installing from a git clone, please copy-n-paste the output from git submodule status
.
Please describe the system on which you are running
- Operating system/version: Ubuntu 20.04.0
- Computer hardware: Intel(R) Xeon(R) W-2155 CPU @ 3.30GHz
- Network type: Wi-Fi
Details of the problem
The derived data type is efield, and MPI_EFIELD is the corresponding MPI datatype. I use MPI_Isend and MPI_Irecv to exchange the derived-type array e between rank 0 and rank 1. This works well when I send a small array, e.g., e(2,2). However, when I allocate a larger array, e(200,200), and send e(1,1:100), the program runs into errors and it seems that the data are not exchanged.
The first listing is the example code for the small array, e(2,2), followed by its output:
program main
  use mpi
  implicit none

  type efield
     double precision :: r1, r2, i1, i2
  end type efield

  integer :: status(MPI_STATUS_SIZE)
  integer :: rank, n_ranks, request, ierr, status0, neighbour
  type(efield), dimension(:,:), allocatable :: e
  type(efield) :: etype
  integer :: MPI_EFIELD
  integer, dimension(1:4) :: types = MPI_DOUBLE_PRECISION
  integer(MPI_ADDRESS_KIND) :: base_address, addr_r1, addr_r2, addr_i1, addr_i2
  integer(MPI_ADDRESS_KIND), dimension(4) :: displacements
  integer, dimension(4) :: block_lengths = 1

  ! Initialize MPI
  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, n_ranks, ierr)

  ! Create MPI data type for efield
  call MPI_Get_address(etype%r1, addr_r1, ierr)
  call MPI_Get_address(etype%r2, addr_r2, ierr)
  call MPI_Get_address(etype%i1, addr_i1, ierr)
  call MPI_Get_address(etype%i2, addr_i2, ierr)
  call MPI_Get_address(etype, base_address, ierr)
  displacements(1) = addr_r1 - base_address
  displacements(2) = addr_r2 - base_address
  displacements(3) = addr_i1 - base_address
  displacements(4) = addr_i2 - base_address
  call MPI_Type_Create_Struct(4, block_lengths, displacements, types, MPI_EFIELD, ierr)
  call MPI_Type_Commit(MPI_EFIELD, ierr)
  print *, 'MPI Create Struct: MPI_EFIELD', ierr

  allocate(e(2,2), STAT=status0)
  if (status0 /= 0) then
     print *, 'Allocation error on rank', rank, 'status', status0
     call MPI_Abort(MPI_COMM_WORLD, status0, ierr)
  else
     print *, rank, 'allocates e successfully', status0
  end if

  if (rank == 0) then
     e%r1 = 0.0
     e%r2 = 0.0
     e%i1 = 0.0
     e%i2 = 0.0
     neighbour = 1
  else if (rank == 1) then
     e%r1 = 1.0
     e%r2 = 1.0
     e%i1 = 1.0
     e%i2 = 1.0
     neighbour = 0
  end if

  call MPI_Isend(e(1,1), 1, MPI_EFIELD, neighbour, 0, MPI_COMM_WORLD, request, ierr)
  call MPI_Irecv(e(1,1), 1, MPI_EFIELD, neighbour, 0, MPI_COMM_WORLD, request, ierr)
  print *, 'before MPI_BARRIER', rank
  call MPI_BARRIER(MPI_COMM_WORLD, ierr)
  print *, 'after MPI_BARRIER'
  print *, rank, e(1,1)%r1, e(1,2)%r1, 'rank, after, before'

  ! Cleanup
  deallocate(e, STAT=status0)
  if (status0 /= 0) then
     print *, 'Deallocation error on rank', rank, 'status', status0
  end if
  call MPI_Finalize(ierr)
end program main
output:
MPI Create Struct: MPI_EFIELD 0
0 allocates e successfully 0
before MPI_BARRIER 0
MPI Create Struct: MPI_EFIELD 0
1 allocates e successfully 0
before MPI_BARRIER 1
after MPI_BARRIER
after MPI_BARRIER
1 0.0000000000000000 1.0000000000000000 rank, after, before
0 1.0000000000000000 0.0000000000000000 rank, after, before
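For what it's worth, the committed MPI_EFIELD type can be sanity-checked with a fragment like the one below. This check is not part of the programs in this report; it is only a sketch that could be placed right after the MPI_Type_Commit call, and it assumes the same names (MPI_EFIELD, rank, ierr) as above.
! Sketch only: query the size and extent of the committed MPI_EFIELD type.
! With four double precision components at displacements 0, 8, 16 and 24,
! both the size and the extent are expected to be 32 bytes.
block
  integer :: type_size
  integer(MPI_ADDRESS_KIND) :: lb, extent
  call MPI_Type_size(MPI_EFIELD, type_size, ierr)
  call MPI_Type_get_extent(MPI_EFIELD, lb, extent, ierr)
  print *, rank, 'MPI_EFIELD size, lower bound, extent:', type_size, lb, extent
end block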
This is the second code, in which I only changed the dimensions of e and the count of data sent; it is followed by its output:
program main
  use mpi
  implicit none

  type efield
     double precision :: r1, r2, i1, i2
  end type efield

  integer :: status(MPI_STATUS_SIZE)
  integer :: rank, n_ranks, request, ierr, status0, neighbour
  type(efield), dimension(:,:), allocatable :: e
  type(efield) :: etype
  integer :: MPI_EFIELD
  integer, dimension(1:4) :: types = MPI_DOUBLE_PRECISION
  integer(MPI_ADDRESS_KIND) :: base_address, addr_r1, addr_r2, addr_i1, addr_i2
  integer(MPI_ADDRESS_KIND), dimension(4) :: displacements
  integer, dimension(4) :: block_lengths = 1

  ! Initialize MPI
  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, n_ranks, ierr)

  ! Create MPI data type for efield
  call MPI_Get_address(etype%r1, addr_r1, ierr)
  call MPI_Get_address(etype%r2, addr_r2, ierr)
  call MPI_Get_address(etype%i1, addr_i1, ierr)
  call MPI_Get_address(etype%i2, addr_i2, ierr)
  call MPI_Get_address(etype, base_address, ierr)
  displacements(1) = addr_r1 - base_address
  displacements(2) = addr_r2 - base_address
  displacements(3) = addr_i1 - base_address
  displacements(4) = addr_i2 - base_address
  call MPI_Type_Create_Struct(4, block_lengths, displacements, types, MPI_EFIELD, ierr)
  call MPI_Type_Commit(MPI_EFIELD, ierr)
  print *, 'MPI Create Struct: MPI_EFIELD', ierr

  allocate(e(200,200), STAT=status0)
  if (status0 /= 0) then
     print *, 'Allocation error on rank', rank, 'status', status0
     call MPI_Abort(MPI_COMM_WORLD, status0, ierr)
  else
     print *, rank, 'allocates e successfully', status0
  end if

  if (rank == 0) then
     e%r1 = 0.0
     e%r2 = 0.0
     e%i1 = 0.0
     e%i2 = 0.0
     neighbour = 1
  else if (rank == 1) then
     e%r1 = 1.0
     e%r2 = 1.0
     e%i1 = 1.0
     e%i2 = 1.0
     neighbour = 0
  end if

  call MPI_Isend(e(1,1:100), 100, MPI_EFIELD, neighbour, 0, MPI_COMM_WORLD, request, ierr)
  call MPI_Irecv(e(1,1:100), 100, MPI_EFIELD, neighbour, 0, MPI_COMM_WORLD, request, ierr)
  print *, 'before MPI_BARRIER', rank
  call MPI_BARRIER(MPI_COMM_WORLD, ierr)
  print *, 'after MPI_BARRIER'
  print *, rank, e(1,1)%r1, e(1,199)%r1, 'rank, after, before'

  ! Cleanup
  deallocate(e, STAT=status0)
  if (status0 /= 0) then
     print *, 'Deallocation error on rank', rank, 'status', status0
  end if
  call MPI_Finalize(ierr)
end program main
output:
MPI Create Struct: MPI_EFIELD 0
1 allocates e successfully 0
MPI Create Struct: MPI_EFIELD 0
0 allocates e successfully 0
before MPI_BARRIER 1
before MPI_BARRIER 0
after MPI_BARRIER
after MPI_BARRIER
0 0.0000000000000000 0.0000000000000000 rank, after, before
1 1.0000000000000000 1.0000000000000000 rank, after, before
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
#0 0x7f702bcbbd11 in ???
#1 0x7f702bcbaee5 in ???
#2 0x7f702baec08f in ???
at /build/glibc-LcI20x/glibc-2.31/signal/../sysdeps/unix/sysv/linux/x86_64/sigaction.c:0
#3 0x7f702bb40f84 in _int_malloc
at /build/glibc-LcI20x/glibc-2.31/malloc/malloc.c:3742
#4 0x7f702bb43298 in __GI___libc_malloc
at /build/glibc-LcI20x/glibc-2.31/malloc/malloc.c:3066
#5 0x7f702acd2ff9 in ???
#6 0x7f702b9d2caa in ???
#7 0x7f702bfaec8c in ???
#8 0x55626630efa2 in ???
#9 0x55626630efe2 in ???
#10 0x7f702bacd082 in __libc_start_main
at ../csu/libc-start.c:308
#11 0x55626630e1cd in ???
#12 0xffffffffffffffff in ???
#0 0x7f99e1df8d11 in ???
#1 0x7f99e1df7ee5 in ???
#2 0x7f99e1c2908f in ???
at /build/glibc-LcI20x/glibc-2.31/signal/../sysdeps/unix/sysv/linux/x86_64/sigaction.c:0
#3 0x7f99e1c7df84 in _int_malloc
at /build/glibc-LcI20x/glibc-2.31/malloc/malloc.c:3742
#4 0x7f99e1c80298 in __GI___libc_malloc
at /build/glibc-LcI20x/glibc-2.31/malloc/malloc.c:3066
#5 0x7f99e0e0fff9 in ???
#6 0x7f99e1b0fcaa in ???
#7 0x7f99e20ebc8c in ???
#8 0x55c17b023fa2 in ???
#9 0x55c17b023fe2 in ???
#10 0x7f99e1c0a082 in __libc_start_main
at ../csu/libc-start.c:308
#11 0x55c17b0231cd in ???
#12 0xffffffffffffffff in ???
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 0 on node bellpc exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
I also tested other cases in which the data type is plain double precision, and those worked well; a rough sketch of what I mean is appended below. So I wonder what the reason for this behaviour is and how I can solve the problem.
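For reference, the plain double precision test is roughly the following. This is a minimal illustrative sketch for two ranks, not a verbatim copy of the program I ran; the array name d and the program name dp_test are just placeholders.
program dp_test
  use mpi
  implicit none
  integer :: rank, n_ranks, request, ierr, neighbour
  double precision, allocatable :: d(:,:)

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, n_ranks, ierr)

  ! Same shapes and exchange pattern as the second program above,
  ! but with the built-in MPI_DOUBLE_PRECISION type instead of MPI_EFIELD.
  ! Run with two ranks.
  allocate(d(200,200))
  if (rank == 0) then
     d = 0.0d0
     neighbour = 1
  else
     d = 1.0d0
     neighbour = 0
  end if

  call MPI_Isend(d(1,1:100), 100, MPI_DOUBLE_PRECISION, neighbour, 0, MPI_COMM_WORLD, request, ierr)
  call MPI_Irecv(d(1,1:100), 100, MPI_DOUBLE_PRECISION, neighbour, 0, MPI_COMM_WORLD, request, ierr)
  call MPI_Barrier(MPI_COMM_WORLD, ierr)
  print *, rank, d(1,1), d(1,199), 'rank, after, before'

  deallocate(d)
  call MPI_Finalize(ierr)
end program dp_test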