MPICH2: MPI_Init, Send, Recv

1
MPICH2: MPI_Init, Send, Recv
  • Oct 7, 2005
  • Sogang University
  • Distributed Computing & Communication Laboratory
  • Eunseok, Kim

2
Outline
  • Internal routine naming
  • MPI_Init ()
  • MPI_Send()
  • MPI_Recv()
  • Conclusion

3
Internal routine name
  • MPI
      • MPI implementation
  • MPIR
      • Routines used only within the MPI implementation
      • Outside of the ADI
  • MPID
      • Routines either defined in the ADI or used within the ADI
  • MPIU
      • Routines that are defined in the util directory and may be used
        by either the ADI or the implementation of the MPI routines
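
The prefix convention above can be checked mechanically. The following is a minimal sketch, not MPICH2 code: `layer_t` and `classify_routine` are hypothetical names, and longer prefixes such as MPIDI simply fall into the "other" bucket here.

```c
#include <string.h>

/* Layers named on the slide; LAYER_OTHER covers everything else. */
typedef enum {
    LAYER_MPI, LAYER_MPIR, LAYER_MPID, LAYER_MPIU, LAYER_OTHER
} layer_t;

/* Classify a routine by the text before its first underscore. */
layer_t classify_routine(const char *name)
{
    const char *us = strchr(name, '_');
    size_t len = us ? (size_t)(us - name) : strlen(name);

    if (len == 3 && strncmp(name, "MPI", 3) == 0)
        return LAYER_MPI;
    if (len == 4 && strncmp(name, "MPIR", 4) == 0)
        return LAYER_MPIR;
    if (len == 4 && strncmp(name, "MPID", 4) == 0)
        return LAYER_MPID;
    if (len == 4 && strncmp(name, "MPIU", 4) == 0)
        return LAYER_MPIU;
    return LAYER_OTHER;
}
```

For example, `classify_routine("MPIR_Err_init")` yields `LAYER_MPIR`, while `classify_routine("MPIDI_CH3_Init")` yields `LAYER_OTHER` because this sketch only knows the four prefixes listed on the slide.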

4
Concept of Initialization
  • Lazy initialization
      • Each module initializes itself
      • Reduces executable size and link time
      • Enabled by the weak-symbol option
  • Process info comes from the PM (Process Manager)
      • Via PMI (Process Management Interface)
      • Most info is set through environment variables (EV)
      • MPD, Gforker, Smpd
      • By default, MPD is used to launch MPI jobs
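
Lazy initialization, where each module initializes itself on first use, can be sketched as follows. All names here (`module_state_t`, `module_get_value`, and so on) are hypothetical and do not come from MPICH2; the sketch is also not thread-safe, which a real implementation would have to address.

```c
/* Hypothetical module state; not an MPICH2 structure. */
typedef struct {
    int initialized;   /* set on first use */
    int init_calls;    /* how many times the setup actually ran */
    int value;         /* stand-in for real module data */
} module_state_t;

static module_state_t module_state = { 0, 0, 0 };

/* Run the one-time setup only if it has not run yet. */
static void module_ensure_init(void)
{
    if (!module_state.initialized) {
        module_state.value = 42;        /* placeholder for real setup work */
        module_state.init_calls++;
        module_state.initialized = 1;
    }
}

/* Public entry point: callers never initialize the module explicitly. */
int module_get_value(void)
{
    module_ensure_init();
    return module_state.value;
}

/* Exposed only so the behaviour can be checked. */
int module_init_calls(void)
{
    return module_state.init_calls;
}
```

Calling `module_get_value()` any number of times runs the setup exactly once, so a program that never touches the module never pays for its initialization.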

5
Starting MPI jobs using MPD
[Diagram: MPI_EXEC, MPD, PMI, the machine file, and processes P0 ... Pn]
  1. Read machine file
  2. Generate XML
  3. Pass the XML doc
  4. PMI_Init
6
MPI_Init()
[Flowchart: call sequence during initialization]
  • MPI_Init()
      • MPIR_Init_thread()
          • MPIR_Err_init()
          • MPIR_Wtime_init()
          • MPIR_Datatype_init()
          • MPIR_Nest_init()
          • MPID_Init()
              • MPIDI_CH3_Init()
  • End of MPI_Init
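
The nesting of the initialization calls can be made explicit with stubs that log the order in which they run. This is only an illustration: the `stub_*` functions are hypothetical placeholders named after the routines on the slide, and their bodies do nothing except record the call.

```c
#include <string.h>

/* Records the order in which the stub routines run. */
static char call_log[256];

static void log_call(const char *name)
{
    if (call_log[0] != '\0')
        strncat(call_log, ">", sizeof(call_log) - strlen(call_log) - 1);
    strncat(call_log, name, sizeof(call_log) - strlen(call_log) - 1);
}

/* Stubs named after the routines on the slide; bodies are placeholders. */
static void stub_ch3_init(void)      { log_call("MPIDI_CH3_Init"); }
static void stub_mpid_init(void)     { log_call("MPID_Init"); stub_ch3_init(); }
static void stub_err_init(void)      { log_call("MPIR_Err_init"); }
static void stub_wtime_init(void)    { log_call("MPIR_Wtime_init"); }
static void stub_datatype_init(void) { log_call("MPIR_Datatype_init"); }
static void stub_nest_init(void)     { log_call("MPIR_Nest_init"); }

static void stub_init_thread(void)
{
    log_call("MPIR_Init_thread");
    stub_err_init();
    stub_wtime_init();
    stub_datatype_init();
    stub_nest_init();
    stub_mpid_init();      /* the device layer is initialized last */
}

/* Entry point of the sketch; returns the recorded call order. */
const char *stub_mpi_init(void)
{
    call_log[0] = '\0';
    log_call("MPI_Init");
    stub_init_thread();
    return call_log;
}
```
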
7
Global Object
  • MPIR_Process
  • MPIDI_Process
  • MPIDI_CH3I_Process
  • Etc.
  • Each module has its own global variable
      • Initialized and finalized within the module

8
MPIR_Process
  • Comm
      • Comm_world
      • Comm_self
      • Comm_parent (spawn)
  • Group
      • Remote / local
  • Comm_kind
      • Intra / inter
  • VCRT (Virtual Connection Reference Table)
  • Thread
      • Condition variables
  • Attribute
      • Attribute variable
      • Attribute function

9
MPIDI_Process
  • Receive queue
      • Posted
      • Unexpected
  • Process group id and rank
  • Condition variable

10
MPIDI_CH3I_Process
  • Parent port number
  • Accept queue
  • Condition variable

11
MPI_Init()
  • Declare nest level value
      • Ex) MPID_MPI_INIT_STATE_DECL(MPID_STATE_MPI_INIT)
  • Init critical section
      • Thread mutex creation
  • Call MPIR_Init_thread

12
MPIR_Init_thread()
  • Address alignment issues
      • Ex) HAVE_WINDOWS_H, _WIN64
  • Set up MPIR_Process object
      • attrs
      • Init comm (null)
  • Call all the init-type functions
      • Ex) MPIR_Err_init(), MPIR_Datatype_init()
  • Determine several policies
      • Ex) allowing the device to select an alternative function for
        some function
  • Call MPID_Init (CH3)

13
MPID_Init()
  • Init receive queues
  • Set processor name and other configurations
      • Via system calls and environment variables
  • Call MPIDI_CH3_Init
      • Assign process group id and rank to MPIDI_Process
      • Cf) rank in comm_world
  • Init MPI_COMM_(WORLD / SELF) objects
      • Creating VCRT
  • Get parent or parent port
      • Determine who my parent is
      • Others or no parent
  • Synchronization through MPIR_Bcast

14
MPIDI_CH3_Init()
  • PMI_Init
      • Port, host name, fd (socket)
      • Ex) readline(Fd, ), writeline(Fd, ) or from environment variables (EV)
  • PMI_Get_(rank/id)
      • Pg_rank, Pg_id
  • MPIDI_CH3I_Progress_init
      • Establish non-blocking listener
      • From EV
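
Reading process info from environment variables can be sketched with plain `getenv()`. The helper names `parse_int_or` and `env_int_or` are hypothetical, and the choice of variable name (for instance something like `PMI_RANK`) is an assumption that depends on the process manager in use; the sketch also ignores integer overflow.

```c
#include <stdlib.h>

/* Parse a decimal integer; return fallback if text is NULL or malformed. */
int parse_int_or(const char *text, int fallback)
{
    char *end;
    long v;

    if (text == NULL || *text == '\0')
        return fallback;
    v = strtol(text, &end, 10);
    if (*end != '\0')          /* trailing junk, or no digits at all */
        return fallback;
    return (int)v;
}

/* Read an integer setting (e.g. a rank) from the environment. */
int env_int_or(const char *var, int fallback)
{
    return parse_int_or(getenv(var), fallback);
}
```

A launcher would export the variable before starting each process, and the library side would call something like `env_int_or("PMI_RANK", -1)` and treat -1 as "not started by a process manager".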

15
MPIDI_CH3_Init ()
  • Set up process group
      • vct
  • KVS (Key-Value Space)
      • Contains information about processes
      • Each process publishes and submits its own business card
        (contact info)
  • PMI_Barrier()
      • Synchronization commands: barrier_in, barrier_out
      • Through blocking I/O
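
The business-card exchange can be pictured with a toy in-memory key-value space. This is a sketch under assumptions: the real KVS is held by the process manager and reached through PMI, the `kvs_*` and `publish_business_card` names are hypothetical, and both the key format and the card format below are illustrative only.

```c
#include <stdio.h>
#include <string.h>

#define KVS_MAX_ENTRIES 16
#define KVS_MAX_LEN     64

/* Toy in-memory key-value space (assumption: the real one is remote). */
typedef struct {
    char keys[KVS_MAX_ENTRIES][KVS_MAX_LEN];
    char vals[KVS_MAX_ENTRIES][KVS_MAX_LEN];
    int  count;
} kvs_t;

/* Store a pair; returns 0 on success, -1 if full or too long. */
int kvs_put(kvs_t *kvs, const char *key, const char *val)
{
    if (kvs->count >= KVS_MAX_ENTRIES ||
        strlen(key) >= KVS_MAX_LEN || strlen(val) >= KVS_MAX_LEN)
        return -1;
    strcpy(kvs->keys[kvs->count], key);
    strcpy(kvs->vals[kvs->count], val);
    kvs->count++;
    return 0;
}

/* Look up a key; returns the stored value or NULL if absent. */
const char *kvs_get(const kvs_t *kvs, const char *key)
{
    int i;

    for (i = 0; i < kvs->count; i++)
        if (strcmp(kvs->keys[i], key) == 0)
            return kvs->vals[i];
    return NULL;
}

/* Publish the contact info of one rank (formats are illustrative). */
int publish_business_card(kvs_t *kvs, int rank, const char *host, int port)
{
    char key[KVS_MAX_LEN], card[KVS_MAX_LEN];

    snprintf(key, sizeof(key), "P%d-businesscard", rank);
    snprintf(card, sizeof(card), "host=%s port=%d", host, port);
    return kvs_put(kvs, key, card);
}
```

After every process has published its card and passed the barrier, any process can look up a peer's key to learn how to connect to it.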

16
An Example CH3 Implementation over TCP
  • Dashed lines separate four communication types

17
MPI_Send()
[Flowchart: MPI_Send() call path]
  • MPI_Send() calls MPID_Send()
  • My_rank == dest_rank?
      • Y: MPIDI_Isend_self()
      • N: Pkt_size > PKT_MAX_LEN?
          • N, contiguous: MPIDI_CH3_iStartMsgv()
          • N, not contiguous: MPID_Segment_Init(), then MPIDI_CH3_iSendv()
          • Y, contiguous: MPIDI_CH3_iStartRndvMsg(), then
            MPIDI_CH3_iStartMsgv()
          • Y, not contiguous: MPID_Segment_Init(), then
            MPIDI_CH3_iStartRndvMsg() and MPIDI_CH3_iStartMsgv()
  • MPID_Request_Release()
  • End of MPI_Send
18
MPI_Send()
  • Validate handle params
      • MPIR_ERRTEST_COMM()
      • MPID_Comm_valid_ptr(), etc.
  • Call MPID_Send()
      • If all data was sent, it returns NULL
      • Otherwise, it returns a ptr to the request
  • This function blocks until the request is complete

19
MPID_Send ()
  • If the pkt can be sent directly
      • If data is contiguous
          • A single iov is used
          • By MPIDI_CH3_iStartMsgv()
      • Otherwise
          • Segmentation
          • MPID_Segment_init()
          • MPIDI_CH3U_Request_load_send_iov()
          • MPIDI_CH3_iSendv()
  • Pkts that could not be sent are queued in the VC
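
The difference between the contiguous case (a single iov) and the segmented case can be sketched with the plain POSIX `struct iovec`. This is not how MPICH2 builds its vectors (that goes through the segment routines listed above); `build_send_iov` is a hypothetical helper that describes a simple strided layout.

```c
#include <stddef.h>
#include <sys/uio.h>

/*
 * Describe `count` blocks of `block_len` bytes, each `stride` bytes apart.
 * Returns the number of iov entries written, or -1 if iov is too small.
 */
int build_send_iov(char *base, size_t count, size_t block_len, size_t stride,
                   struct iovec *iov, size_t iov_cap)
{
    size_t i;

    if (count == 0 || block_len == 0)
        return 0;

    /* Contiguous: blocks touch each other, so one entry covers them all. */
    if (stride == block_len) {
        if (iov_cap < 1)
            return -1;
        iov[0].iov_base = base;
        iov[0].iov_len  = count * block_len;
        return 1;
    }

    /* Non-contiguous: one entry per block. */
    if (iov_cap < count)
        return -1;
    for (i = 0; i < count; i++) {
        iov[i].iov_base = base + i * stride;
        iov[i].iov_len  = block_len;
    }
    return (int)count;
}
```

The resulting array could be handed to a gather-write call such as `writev()`, which is why a contiguous buffer needs only one entry while a strided one needs several.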

20
MPID_Send ()
  • Else
      • The pkt should be sent by rendezvous
      • If data is contiguous
          • A single iov is used
      • Otherwise
          • Segmentation
          • MPID_Segment_init()
          • MPIDI_CH3U_Request_load_send_iov()
      • First send rndv msg
          • MPIDI_CH3_iStartRndvMsg()
          • MPIDI_CH3_iStartMsgv()
  • After finishing send
      • Release Request obj
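
The decision logic of MPID_Send() described on the last two slides can be condensed into one function. This is a sketch, not the MPICH2 source: `send_path_t`, `choose_send_path`, and the threshold `SKETCH_PKT_MAX_LEN` are hypothetical, and the threshold value is an arbitrary placeholder.

```c
#include <stddef.h>

/* Hypothetical eager limit in bytes; the real value is set inside MPICH2. */
#define SKETCH_PKT_MAX_LEN ((size_t)128 * 1024)

typedef enum {
    SEND_SELF,              /* destination is the sending process itself */
    SEND_EAGER_CONTIG,      /* small message, a single iov */
    SEND_EAGER_NONCONTIG,   /* small message, built via segments */
    SEND_RNDV_CONTIG,       /* large message, rendezvous, a single iov */
    SEND_RNDV_NONCONTIG     /* large message, rendezvous, via segments */
} send_path_t;

/* Pick the send path from rank, size, and layout of the data. */
send_path_t choose_send_path(int my_rank, int dest_rank,
                             size_t nbytes, int contiguous)
{
    if (my_rank == dest_rank)
        return SEND_SELF;
    if (nbytes > SKETCH_PKT_MAX_LEN)
        return contiguous ? SEND_RNDV_CONTIG : SEND_RNDV_NONCONTIG;
    return contiguous ? SEND_EAGER_CONTIG : SEND_EAGER_NONCONTIG;
}
```

A message of exactly `SKETCH_PKT_MAX_LEN` bytes still takes the direct path in this sketch, because the comparison is strictly greater-than.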

21
MPIDI_CH3_iStartMsg()
  • Attempt to send msg immediately
  • If successful, return NULL
  • Otherwise create a Request and return a ptr to it
  • Unsent msg will be queued in the VC and sent later
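
The "send now, or queue the rest" behaviour can be sketched with a toy virtual connection. Everything here is hypothetical (`vc_t`, `send_req_t`, `start_msg`); the `room` field merely models how many bytes the underlying socket would accept at the moment, and no real I/O takes place.

```c
#include <stdlib.h>
#include <string.h>

/* Pending-send record; stands in for a request object. */
typedef struct send_req {
    char  *data;              /* copy of the bytes still to be sent */
    size_t len;
    struct send_req *next;
} send_req_t;

/* Toy virtual connection with a queue of unsent requests. */
typedef struct {
    size_t      room;         /* bytes the socket accepts right now */
    send_req_t *queue_head;
    send_req_t *queue_tail;
} vc_t;

/*
 * Try to send immediately. Returns NULL when everything was written;
 * otherwise queues the unsent remainder on the VC and returns its request.
 */
send_req_t *start_msg(vc_t *vc, const char *buf, size_t len)
{
    size_t sent = 0;
    send_req_t *req;

    /* Only write directly if nothing is queued, to preserve ordering. */
    if (vc->queue_head == NULL)
        sent = len < vc->room ? len : vc->room;
    vc->room -= sent;
    if (sent == len)
        return NULL;

    req = malloc(sizeof(*req));
    if (req == NULL)
        abort();
    req->len  = len - sent;
    req->data = malloc(req->len);
    if (req->data == NULL)
        abort();
    memcpy(req->data, buf + sent, req->len);
    req->next = NULL;

    if (vc->queue_tail != NULL)
        vc->queue_tail->next = req;
    else
        vc->queue_head = req;
    vc->queue_tail = req;
    return req;
}
```

A progress loop would later drain the queue as the connection becomes writable; a caller that gets NULL back knows there is nothing left to wait for.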

22
MPI_Recv()
[Flowchart: MPI_Recv() call path]
  • MPI_Recv() calls MPID_Recv()
  • Msg type?
      • MPIDI_REQUEST_SELF_MSG: just copying buffer
      • MPIDI_REQUEST_EAGER_MSG: MPIDI_CH3_iStartMsgv(), then unpacking
      • MPIDI_REQUEST_RNDV_MSG: MPIDI_CH3U_Post_data_receive(), then
        MPIDI_CH3_iStartRndvTransfer()
  • MPID_Request_Release()
  • End of MPI_Recv
23
MPI_Recv ()
  • Validate handle params
      • MPIR_ERRTEST_COMM()
      • MPID_Comm_valid_ptr(), etc.
  • Call MPID_Recv()
  • Release request after finished

24
MPID_Recv ()
  • Check unexpected / posted queue
      • MPIDI_CH3U_Recvq_FDU_or_AEP()
      • If a pending req exists, dequeue and process it
  • Get msg type of the request
      • MPIDI_REQUEST_EAGER_MSG
      • MPIDI_REQUEST_RNDV_MSG
      • MPIDI_REQUEST_SELF_MSG
  • Release request after finished
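
The queue check can be sketched as "find and dequeue a matching unexpected request, or allocate a new one and enqueue it as posted", which is how the FDU_or_AEP suffix is usually read. The types and functions below are hypothetical simplifications: matching uses only source and tag, and wildcards and communicator context are left out.

```c
#include <stdlib.h>

/* Minimal request: only the fields needed for matching. */
typedef struct req {
    int source;
    int tag;
    struct req *next;
} req_t;

typedef struct {
    req_t *unexpected;   /* messages that arrived before a matching receive */
    req_t *posted;       /* receives waiting for a matching message */
} recvq_t;

/* Append at the tail so the oldest entry is matched first. */
static void enqueue(req_t **head, req_t *r)
{
    while (*head != NULL)
        head = &(*head)->next;
    r->next = NULL;
    *head = r;
}

/* Arrival side: record a message nobody has asked for yet. */
req_t *recvq_add_unexpected(recvq_t *q, int source, int tag)
{
    req_t *r = malloc(sizeof(*r));

    if (r == NULL)
        return NULL;
    r->source = source;
    r->tag = tag;
    enqueue(&q->unexpected, r);
    return r;
}

/*
 * Receive side: dequeue a matching unexpected request if one exists
 * (*found = 1), else allocate a request and post it (*found = 0).
 */
req_t *recvq_fdu_or_aep(recvq_t *q, int source, int tag, int *found)
{
    req_t **pp, *r;

    *found = 0;
    for (pp = &q->unexpected; *pp != NULL; pp = &(*pp)->next) {
        if ((*pp)->source == source && (*pp)->tag == tag) {
            r = *pp;
            *pp = r->next;       /* unlink from the unexpected queue */
            r->next = NULL;
            *found = 1;
            return r;
        }
    }

    r = malloc(sizeof(*r));
    if (r == NULL)
        return NULL;
    r->source = source;
    r->tag = tag;
    enqueue(&q->posted, r);
    return r;
}
```

When `*found` is 1 the data has already arrived and the receive can complete based on the message type; when it is 0 the receive waits until a matching message shows up.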

25
Message type
  • MPIDI_REQUEST_EAGER_MSG
      • Send back eager sync ack
          • MPIDI_CH3_iStartMsg()
      • Get pending count in the request
      • Unpack data and free buffer
  • MPIDI_REQUEST_RNDV_MSG (RTS)
      • Check posted queue and obtain req obj
      • MPIDI_CH3U_Post_data_receive()
      • If a rendezvous channel is defined
          • Call MPIDI_CH3_iStartRndvTransfer()
          • RDMA read is performed
  • MPIDI_REQUEST_SELF_MSG
      • Just copying buffer

26
Summary
  • Modular architecture
      • Provides independence
      • Local init / finalization
  • Developer can implement an algorithm within a specific module
      • Obey the interfaces
      • And link
  • However
      • Too complicated
      • Reference links too deep
      • Scattered source
  • A more precise analysis will be possible
      • After analyzing coll and ptp, etc.