Closed
Labels: NEWS, RTE (Issue likely is in RTE or PMIx areas), Target: v3.0.x, Target: v3.1.x, Target: v4.0.x, bug
Description
Per #6298, we had an accidental change in behavior of mpirun --host aaa,bbb
between version v2.1.x and v3.0.x. A fix just went in to master in #6493.
Here's what happened:
- v2.0.x: behavior X
- v2.1.x: behavior X
- v3.0.x: switch to behavior Y
- v3.1.x: behavior Y
- v4.0.x: behavior Y
- master (to become v5.0.x): after PR #6493 ("Ensure that nodes are always used in order provided"), back to behavior X
The question is: should we put this fix on any of v3.0.x, v3.1.x, and/or v4.0.x?
Summary of behavior change
Behavior X
The ordering of hosts in the --host list matters:
$ mpirun --host aaa,bbb rank_test
aaa: MCW rank 0
bbb: MCW rank 1
$ mpirun --host bbb,aaa rank_test
aaa: MCW rank 1
bbb: MCW rank 0
Behavior Y
The ordering of hosts in the --host list does not matter (note: this behavior was unintentional. It was always intended that we honor the ordering of hosts in the --host list):
$ mpirun --host aaa,bbb rank_test
aaa: MCW rank 0
bbb: MCW rank 1
$ mpirun --host bbb,aaa rank_test
aaa: MCW rank 0
bbb: MCW rank 1
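To make the difference concrete, here is a small toy model of the two behaviors. It is not Open MPI code: the function names `assign_ranks` and `report` are hypothetical, it assumes one process per host, and it models behavior Y as an alphabetical sort purely as a stand-in (the outputs above only show that the --host ordering is ignored, not which ordering is used internally).

```python
def assign_ranks(hosts, honor_order):
    """Return {host: rank} for one process per host (toy model)."""
    # Behavior X: ranks follow the order given to --host.
    # Behavior Y (as modeled here): the given order is ignored; sorting
    # alphabetically is an assumption standing in for whatever fixed
    # internal order the runtime actually used.
    ordered = list(hosts) if honor_order else sorted(hosts)
    return {host: rank for rank, host in enumerate(ordered)}


def report(hosts, honor_order):
    """Format lines like the rank_test output above, sorted by host name."""
    ranks = assign_ranks(hosts, honor_order)
    return [f"{host}: MCW rank {ranks[host]}" for host in sorted(ranks)]


if __name__ == "__main__":
    for label, honor in (("Behavior X", True), ("Behavior Y", False)):
        print(label)
        for hosts in (["aaa", "bbb"], ["bbb", "aaa"]):
            print("  --host " + ",".join(hosts))
            for line in report(hosts, honor):
                print("    " + line)
```

Under this model, only behavior X changes the rank assignment when the host list is reversed, which matches the two transcripts above.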
Discussion points
We need to discuss this and decide what to do. Points (in no particular order):
- This is a fairly minor change in behavior.
- Apparently no one noticed this change in behavior between v2.1.x and v3.0.x. It was only discovered recently by @bturrubiates, a Cisco employee (while using Open MPI for other / unrelated testing).
- The fix is probably not worth putting into v3.0.x or v3.1.x.
- But it might be worthwhile to put it into v4.0.x...?
- That being said, even putting it in v4.0.x is at least sorta breaking backwards compatibility. You could squint at this and call it a bug and therefore allow it in. Or you could say that it was effectively the behavior of all the v3.x/v4.x releases, and they're backwards compatible with each other, so we should maintain that behavior in v4.0.x.