Title: Message Passing Communication

1. Message Passing Communication Patterns & Primitives
- Communication patterns and primitives
  - 1-to-1, 1-to-all, all-to-1, all-to-all, synchronization
- Orthogonal issues
  - synchronous/asynchronous
  - blocking/non-blocking
  - buffer management
  - additional message parameters: tags, groups, data formats
- The PVM way
- The MPI way
2. Communication Patterns & Primitives
- 1-to-1 (point-to-point)
- 1-to-all (many)
- All (many)-to-1
- All-to-all
- Synchronization
3. Point-to-Point Communication
- Basic primitives
  - send(dest, data, ...)
  - receive(source, buf, ...)
  - all other operations can be built from these, but usually less efficiently
- Additional parameters specify data formats, buffer locations and properties (size, (non-)safe for reuse), communication modes (blocking, non-blocking) and message types
- receive() can specify a wildcard source: receive from anybody

[Figure: the source task calls send(dest, data, ...), the destination task calls receive(source, data, ...); data flows from source to dest]
4. Point-to-Point Example: Tree Computation

rootNode() {
    // global initialization
    // read data
    distribute()
    // output result
    // global clean-up
}

leafNode() {
    // initialize leaf
    receive(parent, data, ...)
    // process data and compute result
    send(parent, result, ...)
    // leaf clean-up
}

innerNode() {
    // initialize inner node
    receive(parent, data)
    distribute()
    send(parent, result)
    // inner node clean-up
}

distribute(lSon, rSon, data, result) {
    // split data
    send(lSon, lData)
    send(rSon, rData)
    receive(lSon, lResult)
    receive(rSon, rResult)
    // combine results into result
}
5. One-to-Many
- Basic primitives
  - broadcast(data, source, group_id, ...)
  - scatter(data, recvBuf, source, group_id, ...)

[Figure: broadcast(data, ...) copies the whole data from the source to every group member; scatter(data, ...) splits the data so each member receives one part]
6. Many-to-One
- Basic primitives
  - gather(sendBuf, recvBuf, dest, group_id, ...)
  - reduce(sendBuf, recvBuf, dest, operation, group_id, ...)
  - scan(sendBuf, recvBuf, operation, group_id, ...)

[Figure: gather() collects one piece from each group member into dest's buffer; reduce() combines one value from each member with the given operation into a single result at dest]
7. Many-to-One: scan()
- Also called parallel prefix
- scan(sendBuf, recvBuf, operation, group_id, ...)
- performs reduce() on all predecessors, i.e. member i receives the reduction of the sendBuf values of members 0..i

[Figure: example of scan(sendBuf, recvBuf, ..., group_id, ...) over the group members' sendBuf values (4, 1, 0, 4, -1, ...), each member's recvBuf holding the running result]
8. Multi-party Communication Example
- Problem
  - given arrays A[N], B[m], where N is big and m is reasonably small
  - compute C[N] and d, where C[i] = f(A[i], B) and d = max(g(A[i], B)), with f() and g() computationally expensive functions

master() {
    // initialize; read A and B
    // distribute data
    broadcast(B, master_id, group_id, ...)
    scatter(A, Abuf, master_id, group_id, ...)
    // gather results
    gather(Cbuf, C, master_id, group_id, ...)
    reduce(dBuf, d, master_id, group_id, ...)
    // output results
    // global clean-up
}
9. Multi-party Communication Example (cont.)

void slave() {
    // initialize slave
    // get B
    broadcast(B, master_id, group_id, ...)
    // and my part of A into Abuf
    scatter(A, Abuf, master_id, group_id, ...)
    // compute Cbuf = f(Abuf, B)
    // compute dBuf = max(g(Abuf, B))
    gather(Cbuf, C, master_id, group_id, ...)
    reduce(dBuf, d, master_id, group_id, ...)
    // clean-up
}
10. All-to-All & Synchronization
- There could be all-to-all variants: allgather(), allscatter(), allreduce()
- Global synchronization
  - barrier(group_id)
  - stops until all tasks within the group reach the same barrier call

[Figure: each task calls barrier(); tasks that arrive early block until the last task reaches its barrier() call, then all continue]
12. Orthogonal Issues
- Synchronous methods
- Asynchronous methods
- Message types: tags
- Communication groups
- Data types: packing & unpacking transmitted data
13. Synchronous Communication
- Synchronous (blocking) methods
  - sender waits until the receiver receives the message
  - receiver waits until the sender sends the message
- Advantages
  - inherently synchronizes sender(s) with receiver(s)
  - single copying sufficient
- Disadvantages
  - processor idle while waiting for communication
  - possible deadlock problem
14. Deadlock with Blocking send()/receive()
- P1 and P2 want to exchange information
- With blocking send() and receive() they cannot execute the same code, although the two processes are essentially symmetric; either identical ordering deadlocks:

// both send first - each blocks in send(), waiting for the other to receive:
P1: send(P2, data1); receive(P2, data2)
P2: send(P1, data2); receive(P1, data1)

// both receive first - each blocks in receive(), waiting for the other to send:
P1: receive(P2, data2); send(P2, data1)
P2: receive(P1, data1); send(P1, data2)
15. Avoiding Deadlock
- P1 and P2 perform send() and receive() in different order:

P1: send(P2, data1);    receive(P2, data2)
P2: receive(P1, data1); send(P1, data2)

- might get tricky for dynamic communication patterns
- alternatively, a combined exchange primitive avoids the ordering problem:

P1: sendrecv(P2, data1, data2)
P2: sendrecv(P1, data2, data1)
16. Asynchronous Methods
- The sender continues before the receiver gets the message. Where is the message meanwhile?
- In the sender's buffer
  - the sender can continue immediately
  - it has to be really careful not to reuse the buffer before the message is delivered!
- Copied to a system buffer
  - the sender is blocked until the message has been copied
  - if the system buffer overflows, deadlock can still occur
- An asynchronous send() returns a handle which can be used to check whether/when the message was delivered
17. Asynchronous Methods (cont.)
- Advantages
  - allows overlapping computation and communication
  - easier to avoid deadlock (but care still needed)
- Disadvantages
  - additional copying (and buffers) needed
  - needs explicit check whether the message has arrived
18. Additional Parameters Needed
- Message tags
  - specify logical message types
  - easier and safer programming
  - a wildcard can be specified for receiving
- Group identification
  - one process can participate in several communication groups
  - prevents unsafe message passing
- Data type specification
  - the size (to know how much to copy) and type (to be able to translate for the receiver) of the data being sent should be specified
21. MPI Routines
- General format
  - rc = MPI_Xxxxx(parameter, ...)
- Example
  - rc = MPI_Bsend(&buf, count, type, dest, tag, comm)
- Basic types of routines
  - Environment management
  - Point-to-point communication (blocking & non-blocking)
  - Collective communication
  - Process group & communicator
  - Derived types, virtual topology & miscellaneous
22Environment Management Routines
Basic structure of a program using MPI MPI
include file Initialize MPI environment Do
work and make message passing calls Terminate
MPI environment
23. Environment Management Routines (cont.)
- The most important environment management routines:
  - MPI_Init(int *argc, char ***argv) - must be called in every MPI program, exactly once, and before any other MPI routine
  - MPI_Comm_size(MPI_Comm comm, int *size) - determines the size of the group associated with the communicator
  - MPI_Comm_rank(MPI_Comm comm, int *rank) - determines the rank of the calling process within the communicator
  - MPI_Finalize() - terminates the MPI execution environment
24. Environment Management Routines - Example

#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[]) {
    int numtasks, rank, rc;
    rc = MPI_Init(&argc, &argv);
    if (rc != 0) {
        printf("Error starting MPI program. Terminating.\n");
        MPI_Abort(MPI_COMM_WORLD, rc);
    }
    MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    printf("Number of tasks = %d; My rank = %d\n", numtasks, rank);
    /* do some work */
    MPI_Finalize();
    return 0;
}
25. Blocking Point-to-Point Communication
- MPI_Send()
  - basic blocking send operation; returns only after the application buffer in the sending task is free for reuse
- MPI_Recv()
  - receives a message and blocks until the requested data is available in the application buffer in the receiving task
- MPI_Ssend() - synchronous blocking send
- MPI_Bsend() - buffered blocking send
- MPI_Rsend() - blocking ready send, use with great care
- MPI_Sendrecv()
  - sends a message and posts a receive before blocking; blocks until the sending application buffer is free for reuse and the receiving application buffer contains the received message
26. Blocking Point-to-Point Communication Example, p.1

#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[]) {
    int numtasks, rank, dest, source, rc, tag = 1;
    char inmsg, outmsg = 'x';
    MPI_Status Stat;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

27. Blocking Point-to-Point Communication Example, p.2

    if (rank == 0) {
        dest = 1;
        source = 1;
        rc = MPI_Send(&outmsg, 1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
        rc = MPI_Recv(&inmsg, 1, MPI_CHAR, source, tag, MPI_COMM_WORLD, &Stat);
    } else if (rank == 1) {
        dest = 0;
        source = 0;
        rc = MPI_Recv(&inmsg, 1, MPI_CHAR, source, tag, MPI_COMM_WORLD, &Stat);
        rc = MPI_Send(&outmsg, 1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}
28. Nonblocking Point-to-Point Communication
- MPI_Isend(), MPI_Irecv()
  - identify the send/receive buffer; communication proceeds immediately; a communication request handle is returned for handling the pending message status; the program must use calls to MPI_Wait or MPI_Test to determine when the operation completes
- MPI_Issend(), MPI_Ibsend(), MPI_Irsend() - non-blocking versions of the corresponding sends
- MPI_Test(), MPI_Testany(), MPI_Testall(), MPI_Testsome() - check the status of a specified non-blocking send or receive operation
- MPI_Wait(), MPI_Waitany(), MPI_Waitall(), MPI_Waitsome() - block until a specified non-blocking send or receive operation has completed
- MPI_Probe() - blocks until a matching message arrives; MPI_Iprobe() is the non-blocking test for a message
29. Nonblocking Point-to-Point Communication Example, p.1

#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[]) {
    int numtasks, rank, next, prev, buf[2], tag1 = 1, tag2 = 2;
    MPI_Request reqs[4];
    MPI_Status stats[4];

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

30. Nonblocking Point-to-Point Communication Example, p.2

    prev = rank - 1;
    next = rank + 1;
    if (rank == 0) prev = numtasks - 1;
    if (rank == (numtasks - 1)) next = 0;

    MPI_Irecv(&buf[0], 1, MPI_INT, prev, tag1, MPI_COMM_WORLD, &reqs[0]);
    MPI_Irecv(&buf[1], 1, MPI_INT, next, tag2, MPI_COMM_WORLD, &reqs[1]);
    MPI_Isend(&rank, 1, MPI_INT, prev, tag2, MPI_COMM_WORLD, &reqs[2]);
    MPI_Isend(&rank, 1, MPI_INT, next, tag1, MPI_COMM_WORLD, &reqs[3]);

    MPI_Waitall(4, reqs, stats);
    MPI_Finalize();
    return 0;
}
31Collective Communication
- Involve all processes in the scope of
communicator - Three categories
- synchronization (barrier())
- data movement (broadcast, scatter, gather,
alltoall) - collective computation (reduce(), scan())
- Limitations/differences from point-to-point
- blocking (no more true with MPI 2)
- do not take tag arguments
- work with MPI defined datatypes - not with
derived types - Collective operations within subsets of processes
are accomplished by first partitioning the
subsets into a new groups and then attaching the
new groups to new communicators
32Collective Communication Routines
MPI_Barrier() MPI_Bcast() MPI_Scatter() MPI_Gathe
r() MPI_Alltoall() MPI_Allgather() MPI_Allreduce(
) MPI_Reduce() MPI_Reduce_scatter() MPI_Scan()
33. Collective Communication Example, p.1

#include "mpi.h"
#include <stdio.h>
#define SIZE 4

int main(int argc, char *argv[]) {
    int numtasks, rank, sendcount, recvcount, source;
    float sendbuf[SIZE][SIZE] = {
        { 1.0,  2.0,  3.0,  4.0},
        { 5.0,  6.0,  7.0,  8.0},
        { 9.0, 10.0, 11.0, 12.0},
        {13.0, 14.0, 15.0, 16.0}
    };
    float recvbuf[SIZE];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &numtasks);

34. Collective Communication Example, p.2

    if (numtasks == SIZE) {
        source = 1;
        sendcount = SIZE;
        recvcount = SIZE;
        MPI_Scatter(sendbuf, sendcount, MPI_FLOAT,
                    recvbuf, recvcount, MPI_FLOAT,
                    source, MPI_COMM_WORLD);
        printf("rank= %d Results: %f %f %f %f\n", rank,
               recvbuf[0], recvbuf[1], recvbuf[2], recvbuf[3]);
    } else {
        printf("Must specify %d processors. Terminating.\n", SIZE);
    }
    MPI_Finalize();
    return 0;
}
35. Familiarize Yourself with MPI
- Other MPI routines
  - Process group & communicator
  - Derived types, virtual topology & miscellaneous
- First assignment includes simple MPI programming
- There are lots of MPI tutorials on the Web
  - http://www.mhpcc.edu/training/workshop/mpi/MAIN.html
  - http://www-unix.mcs.anl.gov/mpi/tutorial/
- Lab specific info
  - on the course web page