Commit d523c75

Adding -mca comm_method to print table of communication methods
This is closely related to Platform-MPI's old -prot feature. The long format of the tables it prints could look like this:

> Host 0 [myhost001] ranks 0 - 1
> Host 1 [myhost002] ranks 2 - 3
> Host 2 [myhost003] ranks 4
> Host 3 [myhost004] ranks 5
> Host 4 [myhost005] ranks 6
> Host 5 [myhost006] ranks 7
> Host 6 [myhost007] ranks 8
> Host 7 [myhost008] ranks 9
> Host 8 [myhost009] ranks 10
>
>  host | 0    1    2    3    4    5    6    7    8
> ======|==============================================
>     0 : sm   tcp  tcp  tcp  tcp  tcp  tcp  tcp  tcp
>     1 : tcp  sm   tcp  tcp  tcp  tcp  tcp  tcp  tcp
>     2 : tcp  tcp  self tcp  tcp  tcp  tcp  tcp  tcp
>     3 : tcp  tcp  tcp  self tcp  tcp  tcp  tcp  tcp
>     4 : tcp  tcp  tcp  tcp  self tcp  tcp  tcp  tcp
>     5 : tcp  tcp  tcp  tcp  tcp  self tcp  tcp  tcp
>     6 : tcp  tcp  tcp  tcp  tcp  tcp  self tcp  tcp
>     7 : tcp  tcp  tcp  tcp  tcp  tcp  tcp  self tcp
>     8 : tcp  tcp  tcp  tcp  tcp  tcp  tcp  tcp  self
>
> Connection summary:
>   on-host:  all connections are sm or self
>   off-host: all connections are tcp

In this example hosts 0 and 1 had multiple ranks, so "sm" was more meaningful than "self" for identifying how the ranks on those hosts talk to each other, while hosts 2..8 had one rank per host, so "self" was more meaningful as their BTL.
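A minimal sketch of how one row of the long-format table could be produced, assuming each row is available as an array of method-name strings (one entry per destination host). `format_full_row` is a hypothetical helper, not the code in hook_comm_method_fns.c; the 5-character cell width is an assumption chosen so that "self", the longest name in this example, fits and the columns line up under the ` host |` header.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not the function in hook_comm_method_fns.c): writes one
 * row of the long-format table into buf, e.g. "    0 : sm   tcp  tcp".
 * methods[j] is the name of the method used from `host` to host j. */
static void format_full_row(char *buf, size_t len, int host,
                            const char *const methods[], int nhosts)
{
    assert(buf != NULL && len > 0 && nhosts > 0);

    /* Right-justify the host number in 5 columns so that ':' lines up
     * with the '|' of the " host |" header line. */
    int n = snprintf(buf, len, "%5d :", host);
    size_t off = (n > 0) ? (size_t)n : 0;

    for (int j = 0; j < nhosts && off < len; ++j) {
        /* Each cell is a separator space plus the name padded to 4 chars;
         * the last cell gets no trailing padding. */
        if (j + 1 < nhosts) {
            n = snprintf(buf + off, len - off, " %-4s", methods[j]);
        } else {
            n = snprintf(buf + off, len - off, " %s", methods[j]);
        }
        if (n < 0) {
            break;
        }
        /* On truncation off runs past len and the loop stops; snprintf
         * has already NUL-terminated the buffer. */
        off += (size_t)n;
    }
}
```

Called with host 0 and the names {"sm", "tcp", "tcp"}, it produces `    0 : sm   tcp  tcp`. Writing into a caller-supplied buffer with snprintf means an undersized buffer yields a truncated row rather than an overrun.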
Above a certain number of hosts (12 by default) the table above gets too big, so we shrink it to a more abbreviated table that carries the same data:

>  host | 0 1 2 3 4 8
> ======|====================
>     0 : A C C C C C C C C
>     1 : C A C C C C C C C
>     2 : C C B C C C C C C
>     3 : C C C B C C C C C
>     4 : C C C C B C C C C
>     5 : C C C C C B C C C
>     6 : C C C C C C B C C
>     7 : C C C C C C C B C
>     8 : C C C C C C C C B
> key: A == sm
> key: B == self
> key: C == tcp

Then above 36 hosts we stop printing the 2D table entirely and just print the summary:

> Connection summary:
>   on-host:  all connections are sm or self
>   off-host: all connections are tcp

The options to control it are:

  -mca comm_method 1 : print the above table at the end of MPI_Init
  -mca comm_method 2 : print the above table at the beginning of MPI_Finalize

The most important difference between these two is that when printing the table during MPI_Init(), we send extra messages to make sure all hosts are connected to each other. So the table ends up working against the idea of on-demand connections (although it only forces n^2 connections in the number of hosts, not in the total number of ranks). When printing at MPI_Finalize(), we don't create any connections that weren't already made, so the table is more likely to have "n/a" entries if some hosts never connected to each other.

The other tunable is a simple environment variable, MPI_COMM_METHOD_MAX, which defaults to 12 and controls the host counts at which the unabbreviated / abbreviated 2D tables get printed:

  1 - n      : full-size 2D table
  n+1 - 3n   : shortened 2D table
  3n+1 - inf : summary, no 2D table

The source of the information used in the table is the .mca_component_name. In the case of BTLs, the module always had a .btl_component linking back to the component. This commit adds similar fields, .pml_component and .mtl_component, to the PML and MTL modules.
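The MPI_COMM_METHOD_MAX thresholds described above reduce to two comparisons. This sketch assumes the variable holds a plain decimal integer and falls back to the default of 12 otherwise; `comm_method_max`, `choose_table_style`, and the `table_style` enum are hypothetical names, and how the real code treats malformed values is not stated in the commit.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical names: the commit message describes the thresholds,
 * not these identifiers. */
enum table_style { TABLE_FULL, TABLE_ABBREVIATED, TABLE_SUMMARY_ONLY };

/* Reads MPI_COMM_METHOD_MAX, falling back to the documented default of 12
 * when it is unset or not a positive integer. Values above INT_MAX / 3 are
 * rejected so that 3 * n cannot overflow in choose_table_style(). */
static int comm_method_max(void)
{
    const char *s = getenv("MPI_COMM_METHOD_MAX");
    if (s != NULL && *s != '\0') {
        char *end = NULL;
        long v = strtol(s, &end, 10);
        if (*end == '\0' && v > 0 && v <= INT_MAX / 3) {
            return (int)v;
        }
    }
    return 12;
}

/* 1..n hosts: full table; n+1..3n: abbreviated table; above 3n: summary only. */
static enum table_style choose_table_style(int nhosts, int n)
{
    assert(nhosts >= 1 && n >= 1 && n <= INT_MAX / 3);
    if (nhosts <= n) {
        return TABLE_FULL;
    }
    if (nhosts <= 3 * n) {
        return TABLE_ABBREVIATED;
    }
    return TABLE_SUMMARY_ONLY;
}
```

With the variable unset, n is 12, so 1-12 hosts get the full table, 13-36 the abbreviated one, and 37 or more only the summary, matching the 12 and 36 host breakpoints given above.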
Note: when setting the .pml_component field I noticed nobody was setting .pml_flags, so I added a 0 for that field as well.

With the new field linking back to the component, we can then access the component name with code like this:

  mca_pml.pml_component->pmlm_version.mca_component_name

See the three lookup_{pml,mtl,btl}_name() functions in hook_comm_method_fns.c, and their use in comm_method() to parse the strings and produce an integer representing the connection type being used.

Signed-off-by: Mark Allen <[email protected]>
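A sketch of the lookup pattern described in the commit message: follow the module's back-pointer to its component, read `mca_component_name`, and map that string to an integer. The `*_stub_t` structs are simplified stand-ins that only mirror the shape of the BTL field chain (`btl_component` -> `btl_version.mca_component_name`); the method table, its numbering, and `lookup_btl_name_sketch` are hypothetical, and the real lookup_{pml,mtl,btl}_name() functions are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for the Open MPI structures: only the link from a
 * module back to its component's name is modeled. */
typedef struct {
    char mca_component_name[64];
} component_version_stub_t;

typedef struct {
    component_version_stub_t btl_version;
} btl_component_stub_t;

typedef struct {
    const btl_component_stub_t *btl_component;
} btl_module_stub_t;

/* Hypothetical method table: a method's integer id is its index here,
 * with 0 reserved for "n/a" (unknown or never connected). */
static const char *const method_names[] = { "n/a", "self", "sm", "tcp" };
#define NUM_METHODS (sizeof method_names / sizeof method_names[0])

/* Follows module -> component -> version -> mca_component_name. */
static const char *lookup_btl_name_sketch(const btl_module_stub_t *module)
{
    if (module == NULL || module->btl_component == NULL) {
        return NULL;
    }
    return module->btl_component->btl_version.mca_component_name;
}

/* Maps a component name to its integer id, or 0 ("n/a") if unrecognized. */
static int method_id_from_name(const char *name)
{
    if (name == NULL) {
        return 0;
    }
    for (size_t i = 1; i < NUM_METHODS; ++i) {
        if (strcmp(name, method_names[i]) == 0) {
            return (int)i;
        }
    }
    return 0;
}
```

Reserving id 0 for "n/a" is a design choice in this sketch: a missing or unrecognized name gets the same value that a never-connected host pair would display.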
1 parent f614438 commit d523c75

21 files changed: +1093 / -13 lines changed

ompi/mca/hook/comm_method/Makefile.am

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
#
# Copyright (c) 2018 IBM Corporation. All rights reserved.
# $COPYRIGHT$
#
# Additional copyrights may follow
#
# $HEADER$
#

sources = \
        hook_comm_method.h \
        hook_comm_method_component.c \
        hook_comm_method_fns.c

# This component will only ever be built statically -- never as a DSO.

noinst_LTLIBRARIES = libmca_hook_comm_method.la

libmca_hook_comm_method_la_SOURCES = $(sources)
libmca_hook_comm_method_la_LDFLAGS = -module -avoid-version
Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
#
# Copyright (c) 2018 IBM Corporation. All rights reserved.
#
# $COPYRIGHT$
#
# Additional copyrights may follow
#
# $HEADER$
#

# Make this a static component
AC_DEFUN([MCA_ompi_hook_comm_method_COMPILE_MODE], [
    AC_MSG_CHECKING([for MCA component $2:$3 compile mode])
    $4="static"
    AC_MSG_RESULT([$$4])
])

# MCA_ompi_hook_comm_method_CONFIG([action-if-can-compile],
#                                  [action-if-cant-compile])
# ------------------------------------------------
AC_DEFUN([MCA_ompi_hook_comm_method_CONFIG],[
    AC_CONFIG_FILES([ompi/mca/hook/comm_method/Makefile])

    $1
])
Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
/*
 * Copyright (c) 2016-2018 IBM Corporation. All rights reserved.
 * $COPYRIGHT$
 *
 * Additional copyrights may follow
 *
 * $HEADER$
 */
#ifndef MCA_HOOK_COMM_METHOD_H
#define MCA_HOOK_COMM_METHOD_H

#include "ompi_config.h"

#include "ompi/constants.h"

#include "ompi/mca/hook/hook.h"
#include "ompi/mca/hook/base/base.h"

BEGIN_C_DECLS

OMPI_MODULE_DECLSPEC extern const ompi_hook_base_component_1_0_0_t mca_hook_comm_method_component;

extern int mca_hook_comm_method_verbose;
extern int mca_hook_comm_method_output;
extern bool hook_comm_method_enable_mpi_init;
extern bool hook_comm_method_enable_mpi_finalize;

void ompi_hook_comm_method_mpi_init_bottom(int argc, char **argv, int requested, int *provided);

void ompi_hook_comm_method_mpi_finalize_top(void);

END_C_DECLS

#endif /* MCA_HOOK_COMM_METHOD_H */
Lines changed: 150 additions & 0 deletions
@@ -0,0 +1,150 @@
/*
 * Copyright (c) 2016-2018 IBM Corporation. All rights reserved.
 * $COPYRIGHT$
 *
 * Additional copyrights may follow
 *
 * $HEADER$
 */

#include "ompi_config.h"

#include "hook_comm_method.h"

static int ompi_hook_comm_method_component_open(void);
static int ompi_hook_comm_method_component_close(void);
static int ompi_hook_comm_method_component_register(void);

/*
 * Public string showing the component version number
 */
const char *mca_hook_comm_method_component_version_string =
    "Open MPI 'comm_method' hook MCA component version " OMPI_VERSION;

/*
 * Instantiate the public struct with all of our public information
 * and pointers to our public functions in it
 */
const ompi_hook_base_component_1_0_0_t mca_hook_comm_method_component = {

    /* First, the mca_component_t struct containing meta information
     * about the component itself */
    .hookm_version = {
        OMPI_HOOK_BASE_VERSION_1_0_0,

        /* Component name and version */
        .mca_component_name = "comm_method",
        MCA_BASE_MAKE_VERSION(component, OMPI_MAJOR_VERSION, OMPI_MINOR_VERSION,
                              OMPI_RELEASE_VERSION),

        /* Component open and close functions */
        .mca_open_component = ompi_hook_comm_method_component_open,
        .mca_close_component = ompi_hook_comm_method_component_close,
        .mca_register_component_params = ompi_hook_comm_method_component_register,

        // Force this component to always be considered - component must be static
        //.mca_component_flags = MCA_BASE_COMPONENT_FLAG_ALWAYS_CONSIDER,
    },
    .hookm_data = {
        /* The component is checkpoint ready */
        MCA_BASE_METADATA_PARAM_CHECKPOINT
    },

    /* Component functions */
    .hookm_mpi_initialized_top = NULL,
    .hookm_mpi_initialized_bottom = NULL,

    .hookm_mpi_finalized_top = NULL,
    .hookm_mpi_finalized_bottom = NULL,

    .hookm_mpi_init_top = NULL,
    .hookm_mpi_init_top_post_opal = NULL,
    .hookm_mpi_init_bottom = ompi_hook_comm_method_mpi_init_bottom,
    .hookm_mpi_init_error = NULL,

    .hookm_mpi_finalize_top = ompi_hook_comm_method_mpi_finalize_top,
    .hookm_mpi_finalize_bottom = NULL,
};

int mca_hook_comm_method_verbose = 0;
int mca_hook_comm_method_output = -1;
bool hook_comm_method_enable_mpi_init = false;
bool hook_comm_method_enable_mpi_finalize = false;

static int ompi_hook_comm_method_component_open(void)
{
    // Nothing to do
    return OMPI_SUCCESS;
}

static int ompi_hook_comm_method_component_close(void)
{
    // Nothing to do
    return OMPI_SUCCESS;
}

static int ompi_hook_comm_method_component_register(void)
{

    /*
     * Component verbosity level
     */
    // Inherit the verbosity of the base framework, but also allow this to be overridden
    if( ompi_hook_base_framework.framework_verbose > MCA_BASE_VERBOSE_NONE ) {
        mca_hook_comm_method_verbose = ompi_hook_base_framework.framework_verbose;
    }
    else {
        mca_hook_comm_method_verbose = MCA_BASE_VERBOSE_NONE;
    }
    (void) mca_base_component_var_register(&mca_hook_comm_method_component.hookm_version, "verbose",
                                           NULL,
                                           MCA_BASE_VAR_TYPE_INT, NULL,
                                           0, 0,
                                           OPAL_INFO_LVL_9,
                                           MCA_BASE_VAR_SCOPE_READONLY,
                                           &mca_hook_comm_method_verbose);

    mca_hook_comm_method_output = opal_output_open(NULL);
    opal_output_set_verbosity(mca_hook_comm_method_output, mca_hook_comm_method_verbose);

    /*
     * If the component is active for mpi_init / mpi_finalize
     */
    hook_comm_method_enable_mpi_init = false;
    (void) mca_base_component_var_register(&mca_hook_comm_method_component.hookm_version, "enable_mpi_init",
                                           "Enable comm_method behavior on mpi_init",
                                           MCA_BASE_VAR_TYPE_BOOL, NULL,
                                           0, 0,
                                           OPAL_INFO_LVL_3,
                                           MCA_BASE_VAR_SCOPE_READONLY,
                                           &hook_comm_method_enable_mpi_init);

    hook_comm_method_enable_mpi_finalize = false;
    (void) mca_base_component_var_register(&mca_hook_comm_method_component.hookm_version, "enable_mpi_finalize",
                                           "Enable comm_method behavior on mpi_finalize",
                                           MCA_BASE_VAR_TYPE_BOOL, NULL,
                                           0, 0,
                                           OPAL_INFO_LVL_3,
                                           MCA_BASE_VAR_SCOPE_READONLY,
                                           &hook_comm_method_enable_mpi_finalize);

    // User can set the comm_method mca variable too.
    // The MCA variable system keeps the storage pointer it is given, so the
    // storage must outlive this function: use static storage, not a stack local.
    static int hook_comm_method;
    hook_comm_method = -1;
    (void) mca_base_var_register("ompi", NULL, NULL, "comm_method",
                                 "Enable comm_method behavior (1) mpi_init or (2) mpi_finalize",
                                 MCA_BASE_VAR_TYPE_INT, NULL,
                                 0, 0,
                                 OPAL_INFO_LVL_3,
                                 MCA_BASE_VAR_SCOPE_READONLY,
                                 &hook_comm_method);

    if( 1 == hook_comm_method ) {
        hook_comm_method_enable_mpi_init = true;
    }
    else if( 2 == hook_comm_method ) {
        hook_comm_method_enable_mpi_finalize = true;
    }

    return OMPI_SUCCESS;
}