
A small array of a derived data type (in Fortran) can be sent with MPI_Isend and MPI_Irecv, but it runs into errors when I enlarge the array #12595

@Bellahra

Description

Please submit all the information below so that we can understand the working environment that is the context for your question.

Background information

I want to exchange data of a derived data type between several ranks. When the data being sent is a small array, it is sent and received successfully. But when I changed the array from e(2,2) to e(200,200) and sent e(1,1:100), the program ran into errors, even though I did not change anything else except the dimensions of the array. I also tested whether the problem occurs when the data type is plain double precision and found that it does not.

What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)

v4.0.3

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

source

If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.

Please describe the system on which you are running

  • Operating system/version: Ubuntu 20.04.0
  • Computer hardware: Intel(R) Xeon(R) W-2155 CPU @ 3.30GHz
  • Network type: Wi-Fi

Details of the problem

The derived data type is efield, and MPI_EFIELD is the corresponding MPI datatype. I use MPI_Isend and MPI_Irecv to exchange the derived-type array e between rank 0 and rank 1. This works well when I send a small array, e.g., e(2,2). However, when I allocate a larger array, e(200,200), and send e(1,1:100), the program runs into errors and the data do not appear to be exchanged.
The first listing is the example code for the small array, e(2,2), followed by its output:

program main
  use mpi
  implicit none

  type efield
    double precision :: r1, r2, i1, i2
  end type efield
  integer :: status(MPI_STATUS_SIZE)
  integer :: rank, n_ranks, request, ierr, status0,neighbour
  type(efield), dimension(:,:), allocatable :: e
  type(efield) :: etype
  integer :: MPI_EFIELD
  integer, dimension(1:4) :: types = MPI_DOUBLE_PRECISION
  integer(MPI_ADDRESS_KIND) :: base_address, addr_r1, addr_r2, addr_i1, addr_i2
  integer(MPI_ADDRESS_KIND), dimension(4) :: displacements
  integer, dimension(4) :: block_lengths = 1
  ! Initialize MPI
  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, n_ranks, ierr)
  ! Create MPI data type for efield
  call MPI_Get_address(etype%r1, addr_r1, ierr)
  call MPI_Get_address(etype%r2, addr_r2, ierr)
  call MPI_Get_address(etype%i1, addr_i1, ierr)
  call MPI_Get_address(etype%i2, addr_i2, ierr)
  call MPI_Get_address(etype, base_address, ierr)
  displacements(1) = addr_r1 - base_address
  displacements(2) = addr_r2 - base_address
  displacements(3) = addr_i1 - base_address
  displacements(4) = addr_i2 - base_address
  call MPI_Type_Create_Struct(4, block_lengths, displacements, types, MPI_EFIELD, ierr)
  call MPI_Type_Commit(MPI_EFIELD, ierr)
  print*,'MPI Create Struct: MPI_EFIELD',ierr
  allocate(e(2,2), STAT=status0)
  if (status0 /= 0) then
    print *, 'Allocation error on rank', rank, 'status', status0
    call MPI_Abort(MPI_COMM_WORLD, status0, ierr)
  else
    print *, rank, 'allocates e successfully', status0
  end if
  if (rank == 0) then
    e%r1 = 0.0
    e%r2 = 0.0
    e%i1 = 0.0
    e%i2 = 0.0 
    neighbour=1
  else if (rank == 1) then
    e%r1 = 1.0
    e%r2 = 1.0
    e%i1 = 1.0
    e%i2 = 1.0
    neighbour=0  
  end if
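  ! Non-blocking exchange of one element of e with the neighbouring rank;
  ! the same request variable is reused by both calls and is never waited on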
  call MPI_Isend(e(1,1), 1, MPI_EFIELD, neighbour, 0, MPI_COMM_WORLD, request, ierr)
  call MPI_Irecv(e(1,1), 1, MPI_EFIELD, neighbour, 0, MPI_COMM_WORLD, request, ierr)
  print *, 'before MPI_BARRIER', rank
  call MPI_BARRIER(MPI_COMM_WORLD, ierr)
  print *, 'after MPI_BARRIER'
  print*,rank,e(1,1)%r1,e(1,2)%r1,'rank, after, before'
  ! Cleanup
  deallocate(e, STAT=status0)
  if (status0 /= 0) then
    print *, 'Deallocation error on rank', rank, 'status', status0
  end if
  call MPI_Finalize(ierr)
end program main

output:

 MPI Create Struct: MPI_EFIELD           0
           0 allocates e successfully           0
 before MPI_BARRIER           0
 MPI Create Struct: MPI_EFIELD           0
           1 allocates e successfully           0
 before MPI_BARRIER           1
 after MPI_BARRIER
 after MPI_BARRIER
           1   0.0000000000000000        1.0000000000000000      rank, after, before
           0   1.0000000000000000        0.0000000000000000      rank, after, before

The second listing is the code where I changed only the dimensions of e and the count of data sent, followed by its output:

program main
  use mpi
  implicit none

  type efield
    double precision :: r1, r2, i1, i2
  end type efield
  integer :: status(MPI_STATUS_SIZE)
  integer :: rank, n_ranks, request, ierr, status0,neighbour
  type(efield), dimension(:,:), allocatable :: e
  type(efield) :: etype
  integer :: MPI_EFIELD
  integer, dimension(1:4) :: types = MPI_DOUBLE_PRECISION
  integer(MPI_ADDRESS_KIND) :: base_address, addr_r1, addr_r2, addr_i1, addr_i2
  integer(MPI_ADDRESS_KIND), dimension(4) :: displacements
  integer, dimension(4) :: block_lengths = 1
  ! Initialize MPI
  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, n_ranks, ierr)
  ! Create MPI data type for efield
  call MPI_Get_address(etype%r1, addr_r1, ierr)
  call MPI_Get_address(etype%r2, addr_r2, ierr)
  call MPI_Get_address(etype%i1, addr_i1, ierr)
  call MPI_Get_address(etype%i2, addr_i2, ierr)
  call MPI_Get_address(etype, base_address, ierr)
  displacements(1) = addr_r1 - base_address
  displacements(2) = addr_r2 - base_address
  displacements(3) = addr_i1 - base_address
  displacements(4) = addr_i2 - base_address
  call MPI_Type_Create_Struct(4, block_lengths, displacements, types, MPI_EFIELD, ierr)
  call MPI_Type_Commit(MPI_EFIELD, ierr)
  print*,'MPI Create Struct: MPI_EFIELD',ierr
  allocate(e(200,200), STAT=status0)
  if (status0 /= 0) then
    print *, 'Allocation error on rank', rank, 'status', status0
    call MPI_Abort(MPI_COMM_WORLD, status0, ierr)
  else
    print *, rank, 'allocates e successfully', status0
  end if
  if (rank == 0) then
    e%r1 = 0.0
    e%r2 = 0.0
    e%i1 = 0.0
    e%i2 = 0.0 
    neighbour=1
  else if (rank == 1) then
    e%r1 = 1.0
    e%r2 = 1.0
    e%i1 = 1.0
    e%i2 = 1.0
    neighbour=0  
  end if
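  ! Non-blocking exchange of 100 elements taken from the array section e(1,1:100);
  ! as above, the same request variable is reused by both calls and is never waited on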
  call MPI_Isend(e(1,1:100), 100, MPI_EFIELD, neighbour, 0, MPI_COMM_WORLD, request, ierr)
  call MPI_Irecv(e(1,1:100), 100, MPI_EFIELD, neighbour, 0, MPI_COMM_WORLD, request, ierr)
  print *, 'before MPI_BARRIER', rank
  call MPI_BARRIER(MPI_COMM_WORLD, ierr)
  print *, 'after MPI_BARRIER'
  print*,rank,e(1,1)%r1,e(1,199)%r1,'rank, after, before'
  ! Cleanup
  deallocate(e, STAT=status0)
  if (status0 /= 0) then
    print *, 'Deallocation error on rank', rank, 'status', status0
  end if
  call MPI_Finalize(ierr)
end program main

output:

 MPI Create Struct: MPI_EFIELD           0
           1 allocates e successfully           0
 MPI Create Struct: MPI_EFIELD           0
           0 allocates e successfully           0
 before MPI_BARRIER           1
 before MPI_BARRIER           0
 after MPI_BARRIER
 after MPI_BARRIER
           0   0.0000000000000000        0.0000000000000000      rank, after, before
           1   1.0000000000000000        1.0000000000000000      rank, after, before

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0x7f702bcbbd11 in ???
#1  0x7f702bcbaee5 in ???
#2  0x7f702baec08f in ???
    at /build/glibc-LcI20x/glibc-2.31/signal/../sysdeps/unix/sysv/linux/x86_64/sigaction.c:0
#3  0x7f702bb40f84 in _int_malloc
    at /build/glibc-LcI20x/glibc-2.31/malloc/malloc.c:3742
#4  0x7f702bb43298 in __GI___libc_malloc
    at /build/glibc-LcI20x/glibc-2.31/malloc/malloc.c:3066
#5  0x7f702acd2ff9 in ???
#6  0x7f702b9d2caa in ???
#7  0x7f702bfaec8c in ???
#8  0x55626630efa2 in ???
#9  0x55626630efe2 in ???
#10  0x7f702bacd082 in __libc_start_main
    at ../csu/libc-start.c:308
#11  0x55626630e1cd in ???
#12  0xffffffffffffffff in ???
#0  0x7f99e1df8d11 in ???
#1  0x7f99e1df7ee5 in ???
#2  0x7f99e1c2908f in ???
    at /build/glibc-LcI20x/glibc-2.31/signal/../sysdeps/unix/sysv/linux/x86_64/sigaction.c:0
#3  0x7f99e1c7df84 in _int_malloc
    at /build/glibc-LcI20x/glibc-2.31/malloc/malloc.c:3742
#4  0x7f99e1c80298 in __GI___libc_malloc
    at /build/glibc-LcI20x/glibc-2.31/malloc/malloc.c:3066
#5  0x7f99e0e0fff9 in ???
#6  0x7f99e1b0fcaa in ???
#7  0x7f99e20ebc8c in ???
#8  0x55c17b023fa2 in ???
#9  0x55c17b023fe2 in ???
#10  0x7f99e1c0a082 in __libc_start_main
    at ../csu/libc-start.c:308
#11  0x55c17b0231cd in ???
#12  0xffffffffffffffff in ???
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 0 on node bellpc exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

I also tested other cases where the data type is plain double precision, and those worked well. So I wonder what the reason for this is and how I can solve the problem.
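
In case it helps to narrow this down, below is a minimal sketch of how I imagine the exchange could be written so that the send and the receive each get their own request handle and are completed with MPI_Waitall before e is read, and so that the 100 elements passed to MPI_Isend/MPI_Irecv form a contiguous column (first element plus a count) rather than the strided section e(1,1:100). The program name exchange_sketch and the choice of receiving into a separate column starting at e(1,101) are only for illustration; I have not verified that this is the recommended usage.

program exchange_sketch
  use mpi
  implicit none

  type efield
    double precision :: r1, r2, i1, i2
  end type efield

  type(efield), dimension(:,:), allocatable :: e
  type(efield) :: etype
  integer :: rank, n_ranks, ierr, neighbour, MPI_EFIELD
  integer :: requests(2)
  integer :: statuses(MPI_STATUS_SIZE, 2)
  integer(MPI_ADDRESS_KIND) :: base_address, addr(4), displacements(4)
  integer :: block_lengths(4) = 1
  integer :: types(4) = MPI_DOUBLE_PRECISION

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, n_ranks, ierr)

  ! This sketch assumes exactly two ranks (mpirun -np 2)
  if (n_ranks /= 2) then
    if (rank == 0) print *, 'run this sketch with exactly 2 ranks'
    call MPI_Abort(MPI_COMM_WORLD, 1, ierr)
  end if

  ! Build the derived datatype exactly as in the reproducer above
  call MPI_Get_address(etype, base_address, ierr)
  call MPI_Get_address(etype%r1, addr(1), ierr)
  call MPI_Get_address(etype%r2, addr(2), ierr)
  call MPI_Get_address(etype%i1, addr(3), ierr)
  call MPI_Get_address(etype%i2, addr(4), ierr)
  displacements = addr - base_address
  call MPI_Type_create_struct(4, block_lengths, displacements, types, MPI_EFIELD, ierr)
  call MPI_Type_commit(MPI_EFIELD, ierr)

  allocate(e(200,200))
  if (rank == 0) then
    e%r1 = 0.0d0; e%r2 = 0.0d0; e%i1 = 0.0d0; e%i2 = 0.0d0
    neighbour = 1
  else
    e%r1 = 1.0d0; e%r2 = 1.0d0; e%i1 = 1.0d0; e%i2 = 1.0d0
    neighbour = 0
  end if

  ! Pass the first element plus a count so the 100 elements sent are the
  ! contiguous column e(1:100,1), and receive into the separate contiguous
  ! column e(1:100,101).  Keep separate request handles for the send and
  ! the receive, and complete both before reading e again.
  call MPI_Isend(e(1,1),   100, MPI_EFIELD, neighbour, 0, MPI_COMM_WORLD, requests(1), ierr)
  call MPI_Irecv(e(1,101), 100, MPI_EFIELD, neighbour, 0, MPI_COMM_WORLD, requests(2), ierr)
  call MPI_Waitall(2, requests, statuses, ierr)

  print *, rank, e(1,1)%r1, e(1,101)%r1, 'rank, sent, received'

  call MPI_Type_free(MPI_EFIELD, ierr)
  deallocate(e)
  call MPI_Finalize(ierr)
end program exchange_sketch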
