Skip to content

coll/base: do not drop const qualifier #8207

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 14, 2020

Conversation

ggouaillardet
Copy link
Contributor

MPI_Ialltoallw() and friends take a const MPI_Datatype types[] argument.
In order to be able to call OBJ_RELEASE(types[0]), we used to simply
drop the const modifier. This change make it right by introducing the
OBJ_RELEASE_NO_NULLIFY(object) macro that no more set object = NULL
if the object is freed.

Signed-off-by: Gilles Gouaillardet [email protected]

@ggouaillardet
Copy link
Contributor Author

Refs #8023

@devreal
Copy link
Contributor

devreal commented Nov 13, 2020

I wonder if this is necessary. Section 5.12 of the 3.1 standard says:

Once initiated, all associated send buffers and buffers associated with input arguments (such
as arrays of counts, displacements, or datatypes in the vector versions of the collectives)
should not be modified, and all associated receive buffers should not be accessed, until the
collective operation completes.

That means I am not allowed to free any of the stypes I passed to MPI_Ialltoallw before the operation has completed, right? So OMPI can safely call OBJ_RELEASE on them because they will not be free'd at that moment.

@ibm-ompi
Copy link

The IBM CI (GNU/Scale) build failed! Please review the log, linked below.

Gist: https://gist.github.com/a5c4721b6bd9b002524e1dccd0e32d7e

@ibm-ompi
Copy link

The IBM CI (XL) build failed! Please review the log, linked below.

Gist: https://gist.github.com/5817c5136e27c85dd74b4ec9e215f303

@ibm-ompi
Copy link

The IBM CI (PGI) build failed! Please review the log, linked below.

Gist: https://gist.github.com/91511d778be878707a27ac11f1fbbf6b

@ggouaillardet
Copy link
Contributor Author

Thanks for the pointer!

so in the case of MPI_Ialltoallw(), does it mean that we cannot MPI_Type_free(i&stypes[0]) but we can free(stypes) ?

MPI_Ialltoallw() and friends take a const MPI_Datatype types[] argument.
In order to be able to call OBJ_RELEASE(types[0]), we used to simply
drop the const modifier. This change make it right by introducing the
OBJ_RELEASE_NO_NULLIFY(object) macro that no more set object = NULL
if the object is freed.

Signed-off-by: Gilles Gouaillardet <[email protected]>
@ggouaillardet ggouaillardet force-pushed the topic/retain_datatypes_w branch from 5b01618 to c49e5e5 Compare November 13, 2020 08:31
@devreal
Copy link
Contributor

devreal commented Nov 13, 2020

My reading of the standard is that between calling MPI_I*w and completion of the request, the user is not allowed to modify any of the buffers or their contents, so no MPI_Type_free and no free before completion. I wonder if it is really necessary to OBJ_RETAIN the types after all if my reading is correct (I'm just getting started with collectives so forgive me if I am missing something here).

@ggouaillardet
Copy link
Contributor Author

Well, in that case do not need to retain anything at all in the case of MPI_I*w(), and that will make everything much easier!
Do we even need to retain datatypes in the case of MPI_Ialltoallv() ?
if no, what about MPI_Ialltoall() and all other collectives?

@ggouaillardet
Copy link
Contributor Author

FWIW, I remember we had to retain datatype(s) because some test cases from other test suite did not pass.
I am now wondering whether the tests were just wrong ... or if only MPI_I*alltoallw() did not have to be changed...

@devreal
Copy link
Contributor

devreal commented Nov 13, 2020

My interpretation is that this only applies to buffers, not single datatypes. So it does apply to the MPI_I*w variants but not to MPI_I*v or MPI_I* collectives, where datatypes are passed by value. For the latter two, the user is free to MPI_Type_free the type once the operation was initiated (just as in MPI_Send etc)

@ggouaillardet
Copy link
Contributor Author

that's fine with me, I will take care of that.

just to be on the safe side @bosilca @jsquyres can you please confirm we have the correct interpretation of the standard here?

@ggouaillardet
Copy link
Contributor Author

Likely to be replaced with #8208 if the interpretation of the standard is confirmed.

@bosilca
Copy link
Member

bosilca commented Nov 13, 2020

I think the meaning of the paragraph in the MPI standard is to simply allow the implementation to use the user pointer to track the different objects, instead of being required to allocate their own storage. However, this does not mean the user is not allowed to free some the datatypes. This can be done for as long as the arrays passed as arguments are not modified. In this regard the advice to implementors in 5.1.9 in MPI 4.0 is quite clear:

The implementation may keep a reference count of active communications that use the datatype, in order to decide when to free it. Also, one may implement constructors of derived datatypes so that they keep pointers to their datatype arguments, rather than copying them. In this case, one needs to keep track of active datatype definition references in order to know when a datatype object can be freed.

Thus, I think we need to change the refcount of the datatypes (and all other MPI objects), but we should not change the array, which means the use of the newly added macro OBJ_RELEASE_NO_NULLIFY is the correct path forward (based on my understanding of the MPI standard).

@jsquyres
Copy link
Member

I agree with @bosilca. Let's merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants