Title: File-based MPI Initialization Tutorial
1File-based MPI Initialization (Tutorial)
- Yonsei Univ.
- Kyung-Lang Park
- (2005.4. 23)
2Contents
- Introduction
- DUROC-based Initialization (MPICH-G2)
- File-based Initialization (MPICH-GX)
- Details of Modification
- New File Format
3What is the MPI initialization?
- A procedure performed in each process
- To understand the topology
- What rank am I?
- How many processes are in the program?
- To get information of other processes
- Protocol Type
- Hostname
- Listening port
- It is done by MPI_Init(&argc, &argv)
- Every MPI program should start with MPI_Init
4DUROC-based Initialization
- MPICH-G2 uses DUROC when initializing MPI processes
- What is DUROC?
- Dynamically-Updated Request Online Coallocator
- Allocates a job across multiple resource managers
- What does DUROC do?
- Request processing (requestor side, in globusrun)
- Runtime communication support (process side, in cpi)
- Based on NEXUS
- Each MPI process obtains topology information by using the DUROC API
- DUROC_INTRA_SUBJOB_RANK()
- DUROC_INTRA_SUBJOB_SIZE()
- MPI processes exchange information by using the DUROC API
- DUROC_INTER_SUBJOB_SEND()
- DUROC_INTRA_SUBJOB_SEND()
- Before the initialization, processes do not know the protocol information of other processes
5DUROC-based Initialization Steps
Declare variables
Mpich_globus2_debug_init()
Module activation
Build_channels()
Duroc_barrier
Create my_commworld_id
Tcp buffer size
Distribute_byte_array (from my_commworld_id to commworld_id_vector)
Check the size of MPIR_SHANDLE
Getting MyGlobusJobContact
Check G2_MAXHOSTNAMELEN
Getting rank_in_my_subjob, my_subjob_size
Distribute_byte_array (from MyGlobusJobContact to GramJobContactsVector)
Get_topology()
Create_my_miproto()
Distribute_byte_array (from miproto to miproto_vector)
Depends on the DUROC API
6File-based Initialization
- Each process obtains all necessary information from a given file
- Topology information
- Protocol information
- Removes the dependency on the Globus Toolkit
- We can run an MPI program without globusrun and DUROC
7File-based Initialization Steps
Declare variables
Mpich_globus2_debug_init()
Module activation
Build_channels()
Duroc_barrier
Create my_commworld_id
Create_my_commworld_id_vector()
Tcp buffer size
Distribute_byte_array (from my_commworld_id to commworld_id_vector)
Check the size of MPIR_SHANDLE
Getting MyGlobusJobContact
Socket_Barrier
Check G2_MAXHOSTNAMELEN
Getting rank_in_my_subjob, my_subjob_size
Distribute_byte_array (from MyGlobusJobContact to GramJobContactsVector)
Get topology from the file
Get_topology()
Create_my_miproto() (using a given port)
Distribute_byte_array (from miproto to miproto_vector)
Get_mi_protos_vector_from_file()
Depends on the file
8Modified Initialization Steps
9File Format
First part: topology information, one line per process. Fields in order:
Rank_in_my_subjob, My_subjob_size, MPID_MyWorldSize, nsubjobs, MPI process listen port, Unique value for commworld_id, Barrier_port, hostname, Front nodes hostname
0 2 4 2 33501 9292 36000 dccsaturn dccsun.sogang.ac.kr
1 2 4 2 33501 9292 36000 dccneptune dccsun.sogang.ac.kr
0 2 4 2 33501 9292 36000 cluster203.yonsei.ac.kr cluster203.yonsei.ac.kr
1 2 4 2 33501 9292 36000 cluster202.yonsei.ac.kr cluster202.yonsei.ac.kr
Second part: protocol information, one line per process. Fields in order:
s_nprotos, s_tcptype, hostname, port, wan_id_length, lan_id, localhost_id, Front nodes hostname
1 0 dccsaturn 33501 12 LAN_ID_foo 0 dccsun.sogang.ac.kr
1 0 dccneptune 33501 12 LAN_ID_foo 0 dccsun.sogang.ac.kr
1 0 cluster203.yonsei.ac.kr 33501 12 LAN_ID_foo 1 cluster203.yonsei.ac.kr
1 0 cluster202.yonsei.ac.kr 33501 12 LAN_ID_foo 1 cluster202.yonsei.ac.kr
10File Format Description
- First part (topology information)
- rank_in_my_subjob: the rank of the process within its own subjob.
- my_subjob_size: the size of the process's own subjob.
- MPID_MyWorldSize: the total number of processes in the MPI job.
- nsubjobs: the number of subjobs.
- MPI process listen port: the listening port of the MPI process, which must be the same value as the port in the second part.
- unique value for commworld_id: a unique ID used to construct COMMWORLD.
- barrier_port: the listening port used for synchronization between the processes in COMMWORLD.
- hostname: the hostname of the computational node running the MPI process.
- front nodes hostname: the hostname of the front node connected to the computational node running the MPI process, when the environment uses private IP addresses. Otherwise, it is the hostname of the computational node itself. For instance, the first line of Figure 3 describes a process whose execution node is dccsaturn, which has a private IP address and sits behind the front node dccsun.sogang.ac.kr, while the third line of Figure 3 describes the computational node cluster203.yonsei.ac.kr, which has a public IP address.
- Second part (protocol information)
- s_nprotos: the number of protocols used.
- s_tcptype: the type of protocol used. 0 is tcp, 1 is mpi, and 2 is unknown. Currently, only the tcp type is supported.
- hostname: the hostname of the computational node running the MPI process.
- port: the listening port of the MPI process.
- lan_id_lng: the length of lan_id.
- lan_id: an identifier indicating that the node is on the designated LAN.
- localhost_id: an identifier indicating that the node is in the designated intra-machine area rather than on the LAN or WAN.
- front nodes hostname: same meaning as in the first part.
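The first-part fields described above can be read with a single sscanf call. The following is a minimal sketch, not the MPICH-GX code: the struct, the function name parse_topo_line, and HOSTNAME_MAX are hypothetical, and the field order is an assumption taken from the argument order of the sscanf call in get_topology().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define HOSTNAME_MAX 256 /* hypothetical stand-in for G2_MAXHOSTNAMELEN */

/* One line of the first part of the file (names follow the slide). */
struct topo_line {
    int rank_in_my_subjob;
    int my_subjob_size;
    int world_size;              /* MPID_MyWorldSize */
    int nsubjobs;
    unsigned short listen_port;  /* MPI process listen port */
    int commworld_id;            /* unique value for commworld_id */
    unsigned short barrier_port;
    char hostname[HOSTNAME_MAX];
    char front_name[HOSTNAME_MAX];
};

/* Returns 1 if all nine fields were parsed, 0 otherwise. */
static int parse_topo_line(const char *line, struct topo_line *out)
{
    assert(line != NULL && out != NULL);
    /* %hu matches unsigned short; %255s bounds the hostname buffers. */
    int n = sscanf(line, "%d %d %d %d %hu %d %hu %255s %255s",
                   &out->rank_in_my_subjob, &out->my_subjob_size,
                   &out->world_size, &out->nsubjobs, &out->listen_port,
                   &out->commworld_id, &out->barrier_port,
                   out->hostname, out->front_name);
    return n == 9;
}
```

Bounding the string conversions keeps a malformed file from overrunning the hostname buffers, and checking the conversion count rejects short lines.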
11Global Declaration
- #define MAX_PENDING 5
- #define PROXY_MSG_SIZE 128
- struct gp_guid_t *my_guid;
- unsigned short Proxy_Port = 12269;
- extern globus_bool_t g2_proxy_connect(struct gp_guid_t *dest_gp_guid, struct gp_guid_t *src_gp_guid, unsigned short dest_port, globus_io_attr_t *attr, globus_io_handle_t *handle);
- static unsigned short user_port = (unsigned short)0;
- char master_hostname[G2_MAXHOSTNAMELEN];
- char front_name[G2_MAXHOSTNAMELEN];
- int get_topology(char *file_contents, int *MPID_MyWorldRank, int *rank_in_my_subjob, int *my_subjob_size, int *MPID_MyWorldSize, int *nsubjobs, unsigned short *user_port, int *channel_id, unsigned short *barrier_port, char *front_name);
- void get_mi_protos_vector(char *file_contents, int MPID_MyWorldSize, globus_byte_t **mi_protos_vector, int *mi_protos_vector_lengths, char *master_hostname);
- void get_commworld_id(char *master_hostname, int channel_id, int MPID_MyWorldSize, globus_byte_t **my_commworld_id_vector);
12Local Declaration in globus_init()
- int i;
- globus_byte_t *my_miproto;
- globus_byte_t **mi_protos_vectors;
- int *mi_protos_vector_lengths;
- int nbytes;
- int rank_in_my_subjob;
- int my_subjob_size;
- int nsubjobs;
- int *subjob_addresses;
- int rc;
- Additional variables for reading the file:
- FILE *ifp;
- char *file_name;
- int channel_id, string_count;
- unsigned short barrier_port;
- char *file_contents;
- Obtaining the front name and my hostname:
- create_my_gp_guid()
13Create_my_gp_guid()
- void create_my_gp_guid(struct gp_guid_t **my_guid)
- {
- struct gp_guid_t *tmp_guid;
- char *front;
- /* Allocating my_guid */
- tmp_guid = (struct gp_guid_t *)globus_libc_malloc(sizeof(struct gp_guid_t));
- tmp_guid->compute_name = (char *)globus_libc_malloc(sizeof(char) * G2_MAXHOSTNAMELEN);
- /* Front_name querying: getenv based */
- front = globus_libc_getenv("FRONT_NAME");
- if (front != GLOBUS_NULL)
- {
- strcpy(tmp_guid->front_name, front);
- globus_libc_gethostname(tmp_guid->compute_name, G2_MAXHOSTNAMELEN);
- }
- else
- {
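The slide cuts off at the else branch, so the fallback is not shown. As a hedged sketch only: the file format description says that when no front node is involved, the front name is the hostname of the computational node itself, so one plausible fallback is to copy the local hostname. The struct guid_names and the function fill_guid_names below are hypothetical and do not use the Globus API; the caller is assumed to pass the value of the FRONT_NAME environment variable (or NULL) and the local hostname.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define HOSTNAME_MAX 256 /* hypothetical stand-in for G2_MAXHOSTNAMELEN */

/* Hypothetical reduced form of gp_guid_t: only the two names. */
struct guid_names {
    char front_name[HOSTNAME_MAX];
    char compute_name[HOSTNAME_MAX];
};

/* env_front: value of FRONT_NAME, or NULL when it is not set.
 * local_hostname: the name of the node this process runs on. */
static void fill_guid_names(struct guid_names *g, const char *env_front,
                            const char *local_hostname)
{
    assert(g != NULL && local_hostname != NULL);
    snprintf(g->compute_name, sizeof g->compute_name, "%s", local_hostname);
    /* Assumed fallback: without a front node, the node is its own front. */
    const char *front = (env_front != NULL && env_front[0] != '\0')
                            ? env_front : local_hostname;
    snprintf(g->front_name, sizeof g->front_name, "%s", front);
}
```

In a real process the two arguments would come from getenv("FRONT_NAME") and gethostname(); passing them in keeps the selection logic easy to check in isolation.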
14Module activation barrier
- Activate GLOBUS_DUROC_RUNTIME_MODULE
- globus_duroc_runtime_barrier() --> Deleted
- Activate GLOBUS_COMMON_MODULE
- Activate GLOBUS_IO_MODULE
- if (sizeof(MPIR_SHANDLE) > globus_dc_sizeof_u_long(1)) ERROR
- if (G2_MAXHOSTNAMELEN < MAXHOSTNAMELEN) ERROR
15File open
- file_name = globus_libc_getenv("FILENAME");
- while (!(ifp = fopen(file_name, "r")))
- {
- globus_libc_fprintf(stderr, "Cannot open");
- globus_libc_usleep(1000000);
- }
- int input_c; int count = 0; /* int, so that EOF can be detected */
- if (!(file_contents = (char *)globus_libc_malloc(5120 * sizeof(char))))
- {
- globus_libc_fprintf(stderr, "ERROR: failed malloc of %d bytes for input_string\n", 5119 * sizeof(char));
- exit(1);
- }
- while ((input_c = fgetc(ifp)) != EOF)
- {
- *(file_contents + count++) = (char)input_c;
- if (count > 5119)
- {
- globus_libc_fprintf(stderr, "fail: process information file is too big.\n");
- exit(1);
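The read loop above can be isolated into a small helper. This is a minimal sketch in plain C rather than the globus_libc wrappers; the function name read_all and its return convention are hypothetical. It keeps the slide's idea of a fixed-size buffer and reports an oversized file instead of exiting.

```c
#include <assert.h>
#include <stdio.h>

/* Reads the whole stream into buf (capacity cap, including the
 * terminating '\0'). Returns the number of bytes read, or -1 if the
 * file does not fit in the buffer. */
static long read_all(FILE *fp, char *buf, size_t cap)
{
    assert(fp != NULL && buf != NULL && cap > 0);
    size_t count = 0;
    int c; /* int, not char, so that EOF is distinguishable */
    while ((c = fgetc(fp)) != EOF) {
        if (count + 1 >= cap)
            return -1; /* no room left for this byte plus '\0' */
        buf[count++] = (char)c;
    }
    buf[count] = '\0';
    return (long)count;
}
```

Terminating the buffer matters here because the later steps walk the contents with sscanf and string functions.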
16Getting basic information
- globus_duroc_runtime_intra_subjob_rank(&rank_in_my_subjob)
- globus_duroc_runtime_intra_subjob_size(&my_subjob_size)
- These statements can be removed because the process reads this information from the file
- get_topology(rank_in_my_subjob, my_subjob_size, &subjob_addresses, &MPID_MyWorldSize, &nsubjobs, &MPID_MyWorldRank)
- subjob_addresses is dynamic information that can be obtained only from DUROC. If we remove all DUROC communication, it is no longer needed.
- The other values are obtained from the file.
- string_count = get_topology(file_contents, &MPID_MyWorldRank, &rank_in_my_subjob, &my_subjob_size, &MPID_MyWorldSize, &nsubjobs, &user_port, &channel_id, &barrier_port, front_name);
17Get_topology()
- int get_topology(char *file_contents, int *MPID_MyWorldRank, int *rank_in_my_subjob, int *my_subjob_size, int *MPID_MyWorldSize, int *nsubjobs, unsigned short *user_port, int *channel_id, unsigned short *barrier_port, char *front_name)
- {
- int i, j, k;
- char my_hostname[G2_MAXHOSTNAMELEN];
- char *p_myinfo; char *file_index; char s_rank_in_my_subjob[5];
- char s_my_subjob_size[5]; char s_MPID_MyWorldSize[5]; char s_nsubjobs[5];
- char s_user_port[10]; char s_channel_id[10]; char s_barrier_port[10];
- file_index = file_contents;
- sscanf(file_index, "%s %s %s %s %s %s %s %s %s", s_rank_in_my_subjob, s_my_subjob_size, s_MPID_MyWorldSize, s_nsubjobs, s_user_port, s_channel_id, s_barrier_port, my_hostname, front_name);
- sscanf(s_MPID_MyWorldSize, "%d", MPID_MyWorldSize);
- for (j = 0; j < *MPID_MyWorldSize; j++, i = 0)
- {
- sscanf(file_index, "%d %d %d %d %hu %d %hu %s %s", rank_in_my_subjob, my_subjob_size, MPID_MyWorldSize, nsubjobs, user_port, channel_id, barrier_port, my_hostname, front_name);
- if ((0 == strcmp(my_hostname, my_guid->compute_name)) && (0 == strcmp(front_name, my_guid->front_name)))
- {
- *MPID_MyWorldRank = j;
Finding my info
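The "finding my info" step can be sketched independently of Globus. In the sketch below, find_world_rank is a hypothetical helper: it scans the first part of the file line by line and returns the index of the line whose hostname and front node name match the caller's. How the original code advances from one line to the next is not visible on the slide, so the strchr-based advance is an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns the 0-based index of the first topology line whose last two
 * fields equal (my_host, my_front), or -1 if no line matches. */
static int find_world_rank(const char *contents, int world_size,
                           const char *my_host, const char *my_front)
{
    assert(contents != NULL && my_host != NULL && my_front != NULL);
    const char *line = contents;
    for (int j = 0; j < world_size && line != NULL && *line != '\0'; j++) {
        char host[256], front[256];
        /* The starred conversions skip the seven numeric fields. */
        if (sscanf(line, "%*d %*d %*d %*d %*u %*d %*u %255s %255s",
                   host, front) == 2
            && strcmp(host, my_host) == 0
            && strcmp(front, my_front) == 0)
            return j;
        line = strchr(line, '\n'); /* assumed: one process per line */
        if (line != NULL)
            line++;
    }
    return -1;
}
```

The matching line's position doubles as the world rank, which is why the order of lines in the file matters.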
18Create_my_miproto
- Getting TCP information
- Hostname
- struct in_addr net_addr, net_mask, if_addr
- globus_io_tcp_create_listener(port, ...) // passive socket ready
- The port number can be read from the file
- Create char *my_mi_proto
- s_tcptype
- hostname
- globus_lan_id <- GLOBUS_LAN_ID
- globus_wan_id <- GLOBUS_WAN_ID
- local_host_id <- GLOBUS_DUROC_SUBJOB_INDEX
- Can be removed because it will be built from the file
19distribute_byte_array(my_mi_proto, mi_protos_vector)
- Exchange my_mi_proto with other processes using the DUROC API
- The exchanged my_mi_proto values are gathered into mi_protos_vector
- These statements can be removed because mi_protos_vector is obtained from the file
20Get_mi_protos_vector()
- void get_mi_protos_vector(char *file_contents, int MPID_MyWorldSize, globus_byte_t **mi_protos_vector, int *mi_protos_vector_lengths, char *master_hostname)
- {
- int count = 0, i = 0;
- char *temp = file_contents;
- while (temp)
- {
- if (temp[count++] != '\n') ;
- else
- {
- mi_protos_vector_lengths[i] = count;
- mi_protos_vector[i] = (globus_byte_t *)globus_libc_malloc(mi_protos_vector_lengths[i]);
- memcpy(*(mi_protos_vector + i), temp, mi_protos_vector_lengths[i]);
- if (i == 0)
- sscanf(temp + 4, "%s", master_hostname);
- if (++i != MPID_MyWorldSize)
- temp = temp + count;
- else
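The slide is truncated, but the visible part splits the second part of the file into one record per process and stores each record with its length. A minimal sketch of that idea in plain C follows; split_records is a hypothetical name, and storing each record as a NUL-terminated copy (without the newline) is a design choice of this sketch, not something the slide confirms.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copies up to n newline-separated records from contents into vec,
 * storing each record's length (including the '\0') in len.
 * Returns the number of records copied; the caller frees vec[i]. */
static int split_records(const char *contents, int n, char **vec, int *len)
{
    assert(contents != NULL && vec != NULL && len != NULL);
    const char *p = contents;
    int i = 0;
    while (i < n && *p != '\0') {
        const char *nl = strchr(p, '\n');
        size_t sz = (nl != NULL) ? (size_t)(nl - p) : strlen(p);
        vec[i] = malloc(sz + 1);
        if (vec[i] == NULL)
            break;
        memcpy(vec[i], p, sz);
        vec[i][sz] = '\0';
        len[i] = (int)sz + 1;
        i++;
        p = (nl != NULL) ? nl + 1 : p + sz; /* step past the newline */
    }
    return i;
}
```

With records shaped like the example, skipping the first four characters ("1 0 ") of record 0 lands on the hostname, which matches the `temp + 4` offset used for master_hostname on the slide.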
21Build_channels
- Build the channel structure using mi_protos_vector
- -> build_channel_with_gp_guid()
- Mostly the same, but it adds a guid to channel_table
- g_malloc(tp->gp_guid, struct gp_guid_t *, sizeof(struct gp_guid_t));
- sscanf(cp, "%s", tp->gp_guid->front_name);
- tp->gp_guid->compute_name = tp->hostname;
- tp->gp_guid->process_id = channel_id;
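Building a channel entry amounts to parsing one protocol record from mi_protos_vector and keeping the front node name at its end. The sketch below parses a record of the second part of the file; the struct miproto and parse_miproto are hypothetical names, and the field order follows the example records rather than the MPICH-GX source.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical parsed form of one second-part record. */
struct miproto {
    int nprotos;            /* s_nprotos */
    int tcptype;            /* s_tcptype: 0 is tcp */
    char hostname[256];
    unsigned short port;
    int lan_id_lng;
    char lan_id[64];
    int localhost_id;
    char front_name[256];   /* front nodes hostname */
};

/* Returns 1 if all eight fields were parsed, 0 otherwise. */
static int parse_miproto(const char *rec, struct miproto *m)
{
    assert(rec != NULL && m != NULL);
    int n = sscanf(rec, "%d %d %255s %hu %d %63s %d %255s",
                   &m->nprotos, &m->tcptype, m->hostname, &m->port,
                   &m->lan_id_lng, m->lan_id, &m->localhost_id,
                   m->front_name);
    return n == 8;
}
```

A caller building the guid would then use front_name together with hostname, mirroring the front_name and compute_name assignments on the slide.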
22Create my_commworld_id dist..
- Create my_commworld_id (root only)
- hostname + globus_libc_getpid()
- Read my_commworld_id from the file.
- We use an assigned number instead of the dynamic PID.
- Distribute_byte_array(my_commworld_id)
- Distribute my_commworld_id to the other processes
- The above two statements can be removed (we read it from the file).
- Insert the channel structure into CommWorldChannelsTable[0] with my_commworld_id
23MyGlobusGramJobContact and dist..
- MyGlobusGramJobContact <- getenv("GLOBUS_GRAM_JOB_CONTACT")
- Distribute_byte_array(..)
- Exchange MyGlobusGramJobContact with other processes
- MyGlobusGramJobContact is dynamic information created by the globus-job-manager in Globus 2.x environments, so it cannot be obtained from the file.
- However, I think it is not essential information, so it can be deleted.
24Globus_barrier_with_proxy()
- We move the barrier from the beginning of initialization to the end
- The barrier should also go through the proxy
- So, namul changed the yonsei-barrier to a proxy-barrier
25Modifying the format
First part of the original file, with the figure's annotations:
- Mandatory information for each process: don't change
- Shared information: move it to the first line
- Mandatory information for each process: don't change
0 2 4 2 33501 9292 36000 dccsaturn.sogang.ac.kr dccsun.sogang.ac.kr
1 2 4 2 33501 9292 36000 dccneptune.sogang.ac.kr dccsun.sogang.ac.kr
0 2 4 2 33501 9292 36000 cluster203.yonsei.ac.kr cluster203.yonsei.ac.kr
1 2 4 2 33501 9292 36000 cluster202.yonsei.ac.kr cluster202.yonsei.ac.kr
- Give a global rank for ease of understanding
Second part of the original file, with the figure's annotations:
1 0 dccsaturn 33501 12 LAN_ID_foo 0 dccsun.sogang.ac.kr
1 0 dccneptune 33501 12 LAN_ID_foo 0 dccsun.sogang.ac.kr
1 0 cluster203 33501 12 LAN_ID_foo 1 cluster203.yonsei.ac.kr
1 0 cluster202 33501 12 LAN_ID_foo 1 cluster202.yonsei.ac.kr
- Duplicated information: delete
- Duplicated information: delete
- Mandatory information: move it to the upper part
- Optional information: read from environment variables
- Constant: delete
26New File Format
# This is an init file example (you can use # for comments)
# Shared information (MyWorldSize, nsubjobs, unique value)
4 2 9292
# Protocol information of each process
2 0 0 33501 36000 dccsaturn.sogang.ac.kr dccsun.sogang.ac.kr
2 1 1 33501 36000 dccneptune.sogang.ac.kr dccsun.sogang.ac.kr
2 0 2 33501 36000 cluster203.yonsei.ac.kr cluster203.yonsei.ac.kr
2 1 3 33501 36000 cluster202.yonsei.ac.kr cluster202.yonsei.ac.kr
Fields of each process line, in order:
My_subjob_size, Rank_in_my_subjob, Global_rank, Listen port, Barrier port, hostname, Front hostname
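A reader for the new format could look like the sketch below. It rests on several assumptions that the slide does not confirm: comment lines start with '#', the first non-comment line holds the three shared values (MyWorldSize, nsubjobs, unique value), and every following line holds one process in the order my_subjob_size, rank_in_my_subjob, global rank, listen port, barrier port, hostname, front hostname. All struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct shared_info { int world_size; int nsubjobs; int commworld_id; };

struct proc_info {
    int my_subjob_size, rank_in_my_subjob, global_rank;
    unsigned short listen_port, barrier_port;
    char hostname[256], front_name[256];
};

/* Parses text in the assumed new format. Returns the number of process
 * records stored in procs (at most max_procs), or -1 on a malformed
 * line or a missing shared-information line. */
static int parse_init_file(const char *text, struct shared_info *sh,
                           struct proc_info *procs, int max_procs)
{
    assert(text != NULL && sh != NULL && procs != NULL);
    int have_shared = 0, n = 0;
    const char *p = text;
    while (*p != '\0') {
        char line[512];
        size_t len = strcspn(p, "\n");
        size_t copy = len < sizeof line - 1 ? len : sizeof line - 1;
        memcpy(line, p, copy);
        line[copy] = '\0';
        p += len;
        if (*p == '\n')
            p++;
        size_t lead = strspn(line, " \t");
        if (line[lead] == '#' || line[lead] == '\0')
            continue; /* comment or blank line */
        if (!have_shared) {
            if (sscanf(line, "%d %d %d", &sh->world_size, &sh->nsubjobs,
                       &sh->commworld_id) != 3)
                return -1;
            have_shared = 1;
        } else if (n < max_procs) {
            struct proc_info *q = &procs[n];
            if (sscanf(line, "%d %d %d %hu %hu %255s %255s",
                       &q->my_subjob_size, &q->rank_in_my_subjob,
                       &q->global_rank, &q->listen_port, &q->barrier_port,
                       q->hostname, q->front_name) != 7)
                return -1;
            n++;
        }
    }
    return have_shared ? n : -1;
}
```

Compared with the original format, the shared values are read once instead of once per process, and the explicit global rank removes the need to infer the rank from the line position.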