-
Notifications
You must be signed in to change notification settings - Fork 908
osc/sm: Fix a bus error on MPI_WIN_{POST,START}. #1259
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@miked-mellanox could you check why the |
@kawashima-fj @hjelmn wouldn't it be easier to have the posts at the beginning ? diff --git a/ompi/mca/osc/sm/osc_sm_component.c b/ompi/mca/osc/sm/osc_sm_component.c
index 2e2948e..91da943 100644
--- a/ompi/mca/osc/sm/osc_sm_component.c
+++ b/ompi/mca/osc/sm/osc_sm_component.c
@@ -280,9 +280,9 @@ component_select(struct ompi_win_t *win, void **base, size_t size, int disp_unit
module->posts = calloc (comm_size, sizeof (module->posts[0]));
if (NULL == module->posts) return OMPI_ERR_TEMP_OUT_OF_RESOURCE;
- module->global_state = (ompi_osc_sm_global_state_t *) (module->segment_base);
+ module->posts[0] = (uint64_t *) (module->segment_base);
+ module->global_state = (ompi_osc_sm_global_state_t *) (module->posts + comm_size * post_size);
module->node_states = (ompi_osc_sm_node_state_t *) (module->global_state + 1);
- module->posts[0] = (uint64_t *) (module->node_states + comm_size);
for (i = 0, total = state_size + posts_size ; i < comm_size ; ++i) {
if (i > 0) { |
Yes, it will be good. |
In Gilles' patch, diff --git a/ompi/mca/osc/sm/osc_sm_component.c b/ompi/mca/osc/sm/osc_sm_component.c
index 2e2948e..91da943 100644
--- a/ompi/mca/osc/sm/osc_sm_component.c
+++ b/ompi/mca/osc/sm/osc_sm_component.c
@@ -280,9 +280,9 @@ component_select(struct ompi_win_t *win, void **base, size_t size, int disp_unit
module->posts = calloc (comm_size, sizeof (module->posts[0]));
if (NULL == module->posts) return OMPI_ERR_TEMP_OUT_OF_RESOURCE;
- module->global_state = (ompi_osc_sm_global_state_t *) (module->segment_base);
+ module->posts[0] = (uint64_t *) (module->segment_base);
+ module->global_state = (ompi_osc_sm_global_state_t *) (module->posts[0] + comm_size * post_size);
module->node_states = (ompi_osc_sm_node_state_t *) (module->global_state + 1);
- module->posts[0] = (uint64_t *) (module->node_states + comm_size);
for (i = 0, total = state_size + posts_size ; i < comm_size ; ++i) {
if (i > 0) { |
@hjelmn Can you have a look at this? |
:bot:retest: |
Hmm, either I can commit the corrected patch or you can update this PR with that patch. |
A bus error occurs in sm OSC under the following conditions. - sparc64 or any other architectures which need strict alignment. - `MPI_WIN_POST` or `MPI_WIN_START` is called for a window created by sm OSC. - The communicator size is odd and greater than 3. The lines 283-285 in current `ompi/mca/osc/sm/osc_sm_component.c` has the following code. ```c module->global_state = (ompi_osc_sm_global_state_t *) (module->segment_base); module->node_states = (ompi_osc_sm_node_state_t *) (module->global_state + 1); module->posts[0] = (uint64_t *) (module->node_states + comm_size); ``` The size of `ompi_osc_sm_node_state_t` is multiples of 4 but not multiples of 8. So if `comm_size` is odd, `module->posts[0]` does not aligned to 8. This causes a bus error when accessing `module->posts[i][j]`. This patch fixes the alignment of `module->posts[0]` by setting `module->posts[0]` first.
50bc645
to
ad26899
Compare
@hjelmn Please review again. I pushed a new commit based on Gilles' patch. |
@hjelmn Can you have a look at this? |
@hjelmn what is the status of this PR ? |
Need to retest. If it comes back clean will merge and PR to 2.x. :bot:retest: |
osc/sm: Fix a bus error on MPI_WIN_{POST,START}.
…mmon-symbols-if-git Makefile.am: only check for common symbols on dev builds
A bus error occurs in sm OSC under the following conditions.
MPI_WIN_POST
orMPI_WIN_START
is called for a window created by sm OSC.The lines 283-285 in current
ompi/mca/osc/sm/osc_sm_component.c
has the following code.The size of
ompi_osc_sm_node_state_t
is multiples of 4 but not multiples of 8. So ifcomm_size
is odd,module->posts[0]
does not aligned to 8. This causes a bus error when accessingmodule->posts[i][j]
.This patch fixes the alignment of
module->posts[0]
by inserting a padding betweennode_states
andmodule->posts[0]
.@hjelmn Please review.
I don't know whether this is the best fix method.
If you have a better method, tell me or create a new PR please.
v2.x and v1.10 branches have the same problem. I'll create PRs for them if this PR is accepted.