Skip to content

v2.1.0: usnic cagent: correctly compute the "large" ping message size #2145

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

jsquyres
Copy link
Member

@jsquyres jsquyres commented Oct 1, 2016

The (effective) "+42" computation was, in fact, the incorrect answer in this case (gasp!).

We should just take the max_msg_size from the command (which came from the libfabric endpoint max_msg_size attribute in the client) and subtract off the max header size: 68 (which is explained in the comment). This will result in a "large" message size which is likely slightly smaller than the MTU, but still right up near the MTU, and therefore good enough.

Note: the old computation (i.e., -(68-42)) worked fine when we asked for Libfabric API v1.1 because the usnic provider would return a max_msg_size that was already less than the MTU due to FI_PREFIX behavior shenanigans. Once we started asking for Libfabric API v1.4, the usnic Libfabric provider started returning (MTU + prefix_size), and the -(68-42) computation started giving a value that was over the MTU. This caused sendto() on the connectivity checker UDP socket to fail.

This commit also removes an old/misleading comment.

Signed-off-by: Jeff Squyres [email protected]

(cherry picked from commit dfc72e4)

@bturrubiates please review

The (effective) "+42" computation was, in fact, the incorrect answer
in this case (gasp!).

We should just take the max_msg_size from the command (which came from
the libfabric endpoint max_msg_size attribute in the client) and
subtract off the max header size: 68 (which is explained in the
comment).  This will result in a "large" message size which is likely
slightly smaller than the MTU, but still right up near the MTU, and
therefore good enough.

Also remove an old/misleading comment.

Signed-off-by: Jeff Squyres <[email protected]>
(cherry picked from commit dfc72e4)
@jsquyres jsquyres added the bug label Oct 1, 2016
@jsquyres jsquyres added this to the v2.1.0 milestone Oct 1, 2016
@jsquyres
Copy link
Member Author

jsquyres commented Oct 1, 2016

@hppritcha usNIC will not function with Libfabric v1.4 without this commit.

@jsquyres
Copy link
Member Author

jsquyres commented Oct 4, 2016

@bturrubiates actually approved this, but he's apparently not a committer on this repo. I'll approve the review (since I am a committer) so that Github allows the merge.

@jsquyres jsquyres merged commit f9a68cd into open-mpi:v2.x Oct 4, 2016
@jsquyres jsquyres deleted the pr/v2.1.0/usnic-fix-cagent-max-msg-size branch October 4, 2016 12:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants