MacOS 12.3.1 on M1 sysconf(_SC_OPEN_MAX) is returning an error #10358

@jsquyres

This issue started with this post on the users list: https://www.mail-archive.com/[email protected]/msg34892.html

I Zoom'ed with Scott, and we dug into this a bit. The root cause of the problem appears to be the do_child() function in the default ODLS module. Specifically:

long fd, fdmax = sysconf(_SC_OPEN_MAX);

This line calls sysconf() to determine how many FDs to close in the child process that was just forked (before the exec).

On Scott's system, the value returned from this call is -1, which gets interpreted as long_max (i.e., in the billions). His system appears to hang, but it's not actually hung: the child is just looping billions of times, calling close() in this loop:

for (fd = 3; fd < fdmax; fd++) {
    if (
#if OPAL_PMIX_V1
        fd != cd->opts.p_internal[1] &&
#endif
        fd != write_fd) {
        close(fd);
    }
}
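
One way the loop bound could be guarded is sketched below. This is only an illustration, not the actual Open MPI code or fix: the names bounded_fd_max() and close_fds_except() and the FD_CLOSE_CAP value are hypothetical, and the cap is an arbitrary number picked for the example.

```c
#include <unistd.h>

/* Hypothetical cap, chosen only for illustration. */
#define FD_CLOSE_CAP 65536L

/* Clamp the value reported by sysconf(_SC_OPEN_MAX) to a sane range.
 * A negative result (error or "no limit") or an enormous result both
 * fall back to the cap, so the close() loop stays bounded. */
long bounded_fd_max(long reported)
{
    if (reported < 0 || reported > FD_CLOSE_CAP) {
        return FD_CLOSE_CAP;
    }
    return reported;
}

/* Close every descriptor from 3 up to the bounded maximum,
 * except keep_fd (analogous to write_fd in the loop above). */
void close_fds_except(int keep_fd)
{
    long fdmax = bounded_fd_max(sysconf(_SC_OPEN_MAX));

    for (long fd = 3; fd < fdmax; fd++) {
        if (fd != keep_fd) {
            close((int)fd);
        }
    }
}
```

With a bound like this, an unexpected return value from sysconf() costs at most FD_CLOSE_CAP calls to close() instead of billions. The trade-off is that descriptors numbered above the cap would be left open.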

Scott is running:

[Two screenshots attached: 2022-05-05 at 3:12:19 PM and 2022-05-05 at 3:14:09 PM]

We interactively added an opal_output() in the ODLS default component and saw:

  • The value returned by sysconf() is -1, which is interpreted as long_max (something in the billions).
  • After the call, errno was 22 (Invalid argument). However, I neglected to set errno to 0 before calling sysconf(), so I'm not 100% sure that this errno value came from that call.
  • The value of _SC_OPEN_MAX is 5 (which is the same as it is on my Intel MacOS 12.3.1 machine)
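
To remove the errno ambiguity noted above, the check could be repeated with errno cleared first. The sketch below is a standalone diagnostic, not code from Open MPI, and query_sysconf() is a hypothetical helper name. It relies on the POSIX rule that sysconf() returns -1 and sets errno for an invalid name, but returns -1 and leaves errno unchanged when the limit is indeterminate.

```c
#include <errno.h>
#include <unistd.h>

/* Call sysconf(name) and store the raw result in *value.
 * Returns  0 if a concrete value was reported,
 *          1 if the result was -1 with errno untouched (no definite limit),
 *         -1 if the result was -1 and errno was set (a real error). */
int query_sysconf(int name, long *value)
{
    errno = 0;                 /* clear stale errors before the call */
    long v = sysconf(name);
    *value = v;

    if (v != -1) {
        return 0;
    }
    return (errno == 0) ? 1 : -1;
}
```

Calling query_sysconf(_SC_OPEN_MAX, &v) and printing both the return code and v would show whether the -1 seen on Scott's system is a genuine error or an indeterminate limit.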

This is happening on Open MPI 4.1.x, but since this code hasn't changed in forever, I suspect it's happening on all versions of Open MPI / PRTE.
