Title: Parallel Programming: Message Passing Interface
1. Parallel Programming: Message Passing Interface
2. Overview
- MPI is a standard set of 129 functions
- It draws on several earlier message-passing systems
- p4
- PVM
- Express
- PARMACS
3. Overview
- We will concentrate on a core of 24 functions
- Should allow you to create simple MPI programs
- Understand how MPI implements local, global, and asynchronous communication
- Learn about mechanisms for modular programming
4. MPI Programming Model
- A computation is made up of one or more processes that call library routines (MPI functions) to send and receive messages
- Usually in MPI, a fixed set of processes is created at program startup, with one process per processor
5. MPI Programming Model
- Each process can execute a different program; this is called multiple program multiple data (MPMD)
- MPI is primarily concerned with communication
- Point-to-point
- Collective
- Probe for messages
6. MPI Programming Model
- Algorithms that allocate one process per processor are ideal for MPI
- Algorithms that create processes dynamically require reformulation
7. MPI Basics
- MPI is a complex, multifaceted system, but it has six core functions
- MPI_INIT
- MPI_FINALIZE
- MPI_COMM_SIZE
- MPI_COMM_RANK
- MPI_SEND
- MPI_RECV
8. MPI Basics
- The communicator defines the process group within which an operation is to be performed
- The default value for communicator parameters is MPI_COMM_WORLD
9. MPI Basics: MPI_INIT
- Initiates an MPI computation
- Must be called before any other MPI function
- Must be called exactly once per computation
10. MPI Basics: MPI_FINALIZE
- Shuts down an MPI computation
- No MPI functions can be called after this call
- Also called once per computation
11. MPI Basics: MPI_COMM_SIZE/RANK
- MPI_COMM_SIZE determines the number of processes in a computation
- MPI_COMM_RANK determines the process id
12. MPI Basics: MPI_COMM_SIZE/RANK
- Example
- program main
- begin
- MPI_INIT()
- MPI_COMM_SIZE(MPI_COMM_WORLD, count)
- MPI_COMM_RANK(MPI_COMM_WORLD, myid)
- print("I am", myid, "of", count)
- MPI_FINALIZE()
- end
13. MPI Basics: MPI_COMM_SIZE/RANK
- Example
- I am 1 of 4
- I am 3 of 4
- I am 0 of 4
- I am 2 of 4
- How does the system know how many processes? The count is specified outside the program when it is launched (e.g. mpirun -np 4)
14. MPI Basics: MPI_SEND
- MPI_SEND(buf, count, datatype, dest, tag, comm)
- A message containing count elements
- Of type datatype
- Starting at address buf
- Is to be sent to process dest
- The tag value can be used by the receiver to distinguish or select messages
15. MPI Basics: MPI_RECV
- MPI_RECV(buf, count, datatype, source, tag, comm, status)
- Attempts to receive a message whose envelope matches the specified tag, comm, and source
- Places the received elements of type datatype
- At address buf, which must be large enough
- To contain count elements
- The status variable is used for reference (e.g. to query the actual source and tag)
16. MPI Basics: Envelopes
- Every message has an associated envelope
- The envelope consists of the message's tag, source, and context
- The tag is set by the sending process to distinguish the message
- The source (in the MPI_RECV function) is used to limit the scope of messages received
- The context is specified through the communicator
17. Example Program
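The slide's program listing is not in the transcript. A minimal sketch in C of such a program, assuming process 0 sends a single integer to process 1 (the payload value is illustrative):

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[]) {
        int rank, value;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            value = 42;   /* illustrative payload */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
            printf("Process 1 received %d from process %d\n",
                   value, status.MPI_SOURCE);
        }

        MPI_Finalize();
        return 0;
    }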
18. Language Binding
- MPI is a standard, not a language
- Each language must have its own binding of the MPI standard
- Each binding differs in ways that are specific to the language
- Handles hide internal structure
19. Language Binding: C
- Defined in mpi.h
- Status codes are returned as integers
- The integer values correspond to constants (e.g. MPI_SUCCESS)
- Handles are special types
20. Language Binding: C
- The status variable is represented as a special type, MPI_Status
- status.MPI_SOURCE field
- status.MPI_TAG field
- Each C datatype has a corresponding MPI datatype (e.g. MPI_INT, MPI_LONG)
21. C Example
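The slide's own listing is not in the transcript; a plausible C rendering of the pseudocode from slide 12:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[]) {
        int count, myid;

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &count);   /* number of processes */
        MPI_Comm_rank(MPI_COMM_WORLD, &myid);    /* this process's rank */
        printf("I am %d of %d\n", myid, count);
        MPI_Finalize();
        return 0;
    }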
22. Language Binding: Fortran
- Defined in mpif.h
- Function calls have an additional INTEGER argument for the return code
- All handles are of type INTEGER
23. Language Binding: Fortran
- The status variable is an array of integers
- Array is of size MPI_STATUS_SIZE
- MPI_SOURCE contains the index of the source field
- MPI_TAG contains the index of the tag field
- Each Fortran datatype has a corresponding MPI
datatype (e.g. MPI_INTEGER, MPI_REAL)
24. Fortran Example
25. Language Binding: Java
- mpiJava: modelled after the C binding for MPI; implemented through JNI wrappers around native MPI software
- JavaMPI: automatic generation of wrappers for legacy MPI libraries; a C-like binding based on the JCI code generator
- MPIJ: a pure Java implementation of MPI, closely based on the C binding; a large subset of MPI is implemented using native marshaling
26. Determinism
- MPI is naturally nondeterministic. Why? When several processes send to the same receiver, the arrival order of their messages is not fixed
- BUT, MPI does guarantee that two messages sent from one process to another process will arrive in the order sent
- By specifying the source, tag, and/or context, channels can be created that guarantee determinism, as illustrated below
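A minimal illustration of the contrast (the buffer, the sender rank 3, and tag 0 are illustrative assumptions):

    /* Nondeterministic: accepts the first matching message from ANY sender */
    MPI_Recv(buf, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
             MPI_COMM_WORLD, &status);

    /* Deterministic channel: only messages from rank 3 with tag 0 match,
       and messages from rank 3 are delivered in the order they were sent */
    MPI_Recv(buf, 1, MPI_INT, 3, 0, MPI_COMM_WORLD, &status);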
27. Global Operations
- Convenience functions for collective communication
- These functions must be called collectively (by all processes in the group)
28. Global Operations: Barrier
- This call (MPI_BARRIER) blocks a process until ALL processes in the communicator have called it
- Used to synchronize processes
- Can be used to separate two phases of a computation, as sketched below
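A minimal sketch of that two-phase pattern (the phase functions are hypothetical placeholders):

    compute_phase_one();            /* hypothetical: every process works */

    /* No process enters phase two until all have finished phase one */
    MPI_Barrier(MPI_COMM_WORLD);

    compute_phase_two();            /* hypothetical: next phase */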
29. Global Operations: Data Movement
- Collective data movement functions
- All processes interact with a root process to
broadcast, gather, or scatter data
30. Global Operations: MPI_BCAST
- MPI_BCAST(inbuf, incnt, intype, root, comm)
- One-to-all broadcast of data
- The root process sends data to all processes
- The data of type intype are located in inbuf, and there are incnt elements
- After the call, the data are replicated in the inbuf of every process
31. Global Operations: MPI_GATHER
- MPI_GATHER(inbuf, incnt, intype, outbuf, outcnt, outtype, root, comm)
- All-to-one data gathering
- All processes (including the root) send the incnt elements of type intype located in their inbuf to the root process
- The data are placed in the root's outbuf as contiguous elements, in process id order
32. Global Operations: MPI_SCATTER
- MPI_SCATTER(inbuf, incnt, intype, outbuf, outcnt, outtype, root, comm)
- One-to-all data scattering operation
- Similar to BCAST, except that the i-th portion of the root's inbuf is sent to process i
- Each process's outbuf holds the data it receives
33. Global Operations: Data Movement (diagram)
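A minimal sketch of the three data-movement calls in C (the buffer sizes and the choice of rank 0 as root are illustrative assumptions):

    #include <stdlib.h>
    #include <mpi.h>

    void data_movement_demo(int rank, int nprocs) {
        int value;                /* BCAST: one int to every process       */
        int mine[2];              /* SCATTER: each process receives 2 ints */
        int *parts = NULL;        /* SCATTER source, needed only on root   */
        int *all = NULL;          /* GATHER destination, only on root      */

        if (rank == 0) {
            value = 7;
            parts = malloc(2 * nprocs * sizeof(int));
            all   = malloc(2 * nprocs * sizeof(int));
            for (int i = 0; i < 2 * nprocs; i++)
                parts[i] = i;
        }

        /* Afterwards every process has value == 7 */
        MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);

        /* Process i receives elements 2i and 2i+1 of parts */
        MPI_Scatter(parts, 2, MPI_INT, mine, 2, MPI_INT, 0, MPI_COMM_WORLD);

        /* Root collects every mine[] buffer, contiguously, in rank order */
        MPI_Gather(mine, 2, MPI_INT, all, 2, MPI_INT, 0, MPI_COMM_WORLD);

        if (rank == 0) { free(parts); free(all); }
    }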
34. Global Operations: Reductions
- Perform some operation on data from all processes
- MPI_REDUCE(inbuf, outbuf, outcnt, outtype, op, root, comm)
- Applies the op operation to the values in the inbuf of all processes and places the result in the outbuf of the root process
- MPI_ALLREDUCE takes the same arguments minus root, and places the result in the outbuf of ALL processes
35. Global Operations: Reductions (diagram)
36. Global Operations: Reductions
- Valid operations are
- MPI_MIN
- MPI_MAX
- MPI_SUM
- MPI_PROD
- MPI_LAND, MPI_LOR, MPI_LXOR
- MPI_BAND, MPI_BOR, MPI_BXOR
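A minimal sketch of a global sum (local_val is an assumed per-process contribution):

    int local_val = myid + 1;   /* assumed per-process value */
    int total;

    /* Sum local_val over all processes; every process receives the result */
    MPI_Allreduce(&local_val, &total, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);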
38. C Example: Finite Difference
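The slide's listing is not in the transcript; what follows is a sketch of the usual pattern, assuming a 1-D Jacobi-style update on a ring of processes with ghost cells at local[0] and local[n+1] (the update rule and buffer layout are illustrative):

    #include <math.h>
    #include <mpi.h>

    /* One iteration: exchange boundary values, update, test convergence */
    double fd_iteration(double *local, int n, int rank, int nprocs) {
        MPI_Status status;
        int left  = (rank + nprocs - 1) % nprocs;   /* ring neighbors */
        int right = (rank + 1) % nprocs;
        double local_err = 0.0, global_err;

        /* Swap boundary values with both neighbors */
        MPI_Sendrecv(&local[n], 1, MPI_DOUBLE, right, 0,
                     &local[0], 1, MPI_DOUBLE, left,  0,
                     MPI_COMM_WORLD, &status);
        MPI_Sendrecv(&local[1], 1, MPI_DOUBLE, left,  1,
                     &local[n + 1], 1, MPI_DOUBLE, right, 1,
                     MPI_COMM_WORLD, &status);

        /* Jacobi update; prev carries the old value of local[i-1] */
        double prev = local[0];
        for (int i = 1; i <= n; i++) {
            double saved = local[i];
            double new_val = 0.5 * (prev + local[i + 1]);
            if (fabs(new_val - saved) > local_err)
                local_err = fabs(new_val - saved);
            local[i] = new_val;
            prev = saved;
        }

        /* Global convergence test: maximum change across all processes */
        MPI_Allreduce(&local_err, &global_err, 1, MPI_DOUBLE, MPI_MAX,
                      MPI_COMM_WORLD);
        return global_err;
    }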
39. Asynchronous Communication
- MPI_IPROBE(source, tag, comm, flag, status)
- Non-blocking call that sets flag to indicate whether a matching message is waiting
- MPI_PROBE(source, tag, comm, status)
- Blocking version of IPROBE
- MPI_GET_COUNT(status, datatype, count)
- Sets count from the status returned by a receive or probe call
40. Asynchronous Communication
- Example (C):

    int count, *buf, source;
    MPI_Status status;           /* comm is assumed initialized */

    /* Block until a tag-0 message from any source is available */
    MPI_Probe(MPI_ANY_SOURCE, 0, comm, &status);
    source = status.MPI_SOURCE;

    /* Determine how many ints the message holds, then allocate space */
    MPI_Get_count(&status, MPI_INT, &count);
    buf = malloc(count * sizeof(int));

    MPI_Recv(buf, count, MPI_INT, source, 0, comm, &status);
41. Modularity
- Modularity in MPI is accomplished through communicators
- A communicator allows a subset of processes to communicate and perform global operations without interfering with other subsets
- Sequential and parallel composition are supported, while concurrent composition is not
42. Modularity: Communicators
- MPI_COMM_DUP(comm, newcomm)
- Creates a new communicator made up of the same processes as comm
- MPI_COMM_FREE(comm)
- Destroys a communicator
43. Modularity: Communicators
- MPI_COMM_SPLIT(comm, color, key, newcomm)
- Creates one or more new communicators
- The color determines which new communicator this process is assigned to
- The key determines the order of the processes within that communicator (see the sketch below)
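A minimal sketch that splits MPI_COMM_WORLD into even- and odd-ranked halves (the even/odd criterion is an illustrative assumption):

    int myid;
    MPI_Comm newcomm;

    MPI_Comm_rank(MPI_COMM_WORLD, &myid);

    /* color 0 = even ranks, color 1 = odd ranks; key orders by old rank */
    MPI_Comm_split(MPI_COMM_WORLD, myid % 2, myid, &newcomm);

    /* Collective operations on newcomm involve only half the processes */
    MPI_Barrier(newcomm);
    MPI_Comm_free(&newcomm);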
44. Modularity: Inter-group Communication
- Ordinary communicators support collective operations and intra-group communication
- Communication between two disjoint groups requires an inter-communicator
45. Modularity: Inter-group Communication
- MPI_INTERCOMM_CREATE(comm, local_leader, peercomm, remote_leader, tag, intercomm)
- comm is the current communicator for this process
- local_leader is the leader process for this group
- peercomm is the parent communicator
- remote_leader is the leader process for the other group
- intercomm is the new inter-communicator
46. C Inter-communicator Example
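The slide's listing is not in the transcript; a minimal sketch, assuming at least two processes and a split of the world into two halves whose leaders are world ranks 0 and nprocs/2 (the tag value 99 is arbitrary):

    #include <mpi.h>

    int main(int argc, char *argv[]) {
        int rank, nprocs;
        MPI_Comm local, inter;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        /* Split the world into two groups */
        int color = (rank < nprocs / 2) ? 0 : 1;
        MPI_Comm_split(MPI_COMM_WORLD, color, rank, &local);

        /* Each group's local leader is rank 0 of local; the remote leader
           is identified by its rank in the peer communicator */
        int remote_leader = (color == 0) ? nprocs / 2 : 0;
        MPI_Intercomm_create(local, 0, MPI_COMM_WORLD, remote_leader,
                             99, &inter);

        /* Sends through inter address ranks in the OTHER group */
        MPI_Comm_free(&inter);
        MPI_Comm_free(&local);
        MPI_Finalize();
        return 0;
    }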
47. Performance Issues
- A SEND/RECV pair requires one communication
- A PROBE costs the same as a RECV
- Communicator creation does not involve communication (creating an inter-communicator costs one communication)
48. Performance Issues
49. Summary
- Described the MPI standard
- Showed how it is implemented in C and Fortran
- Discussed the use of communicators
- And how to pass messages in a number of different
ways
50. Break, then Case Study