An Introduction to Parallel Programming with MPI

1
An Introduction to Parallel Programming with MPI
  • March 22, 24, 29, 31
  • 2005
  • David Adams daadams3_at_vt.edu
  • http://research.cs.vt.edu/lasca/schedule

2
MPI and Classical References
  • MPI
  • M. Snir, W. Gropp, MPI: The Complete Reference
    (2-volume set), MIT Press, MA, (1998).
  • Parallel Computing
  • D. P. Bertsekas and J. N. Tsitsiklis, Parallel
    and Distributed Computation, Prentice-Hall,
    Englewood Cliffs, NJ, (1989).
  • M. J. Quinn, Designing Efficient Algorithms for
    Parallel Computers, McGraw-Hill, NY, (1987).

3
Outline
  • Disclaimers
  • Overview of basic parallel programming on a
    cluster with the goals of MPI
  • Batch system interaction
  • Startup procedures
  • Quick review
  • Blocking message passing
  • Non-blocking message passing
  • Collective communications

4
Review
  • Messages are the only way processors can pass
    information.
  • MPI hides the low-level details of message
    transport, leaving the user to specify only the
    message logic.
  • Parallel algorithms are built from identifying
    the concurrency opportunities in the problem
    itself, not in the serial algorithm.
  • Communication is slow.
  • Partitioning and pipelining are two primary
    methods for exploiting concurrency.
  • To make good use of the hardware, we want to
    balance the computational load across all
    processors and keep the process compute-bound
    rather than communication-bound.

5
More Review
  • MPI messages specify a starting point, a length,
    and data type information.
  • MPI messages are read from contiguous memory.
  • These functions will generally appear in all MPI
    programs:
  • MPI_INIT, MPI_FINALIZE
  • MPI_COMM_SIZE, MPI_COMM_RANK
  • MPI_COMM_WORLD is the global communicator
    available at the start of all MPI runs.

6
Hello World: Fortran90
  • PROGRAM Hello_World
  • IMPLICIT NONE
  • INCLUDE 'mpif.h'
  • INTEGER :: ierr_p, rank_p, size_p
  • INTEGER, DIMENSION(MPI_STATUS_SIZE) :: status_p
  • CALL MPI_INIT(ierr_p)
  • CALL MPI_COMM_RANK(MPI_COMM_WORLD, rank_p, ierr_p)
  • CALL MPI_COMM_SIZE(MPI_COMM_WORLD, size_p, ierr_p)
  • IF (rank_p == 0) THEN
  • WRITE(*,*) 'Hello world! I am process 0 and I am special!'
  • ELSE
  • WRITE(*,*) 'Hello world! I am process', rank_p
  • END IF
  • CALL MPI_FINALIZE(ierr_p)
  • END PROGRAM Hello_World

7
Hello World: C (case sensitive)
  • #include <stdio.h>
  • #include <mpi.h>
  • int main (int argc, char **argv)
  • {
  • int rank_p, size_p;
  • MPI_Init(&argc, &argv);
  • MPI_Comm_rank(MPI_COMM_WORLD, &rank_p);
  • MPI_Comm_size(MPI_COMM_WORLD, &size_p);
  • if (rank_p == 0)
  • printf("%d Hello World! I am special!\n", rank_p);
  • else
  • printf("%d Hello World!\n", rank_p);
  • MPI_Finalize();
  • return 0;
  • }

8
MPI Messages
  • Messages are non-overtaking.
  • All MPI messages are completed in two parts:
  • Send
  • Can be blocking or non-blocking.
  • Identifies the destination, data type and length,
    and a message type identifier (tag).
  • Identifies to MPI a space in memory specifically
    reserved for the sending of this message.
  • Receive
  • Can be blocking or non-blocking
  • Identifies the source, data type and length, and
    a message type identifier (tag).
  • Identifies to MPI a space in memory specifically
    reserved for the completion of this message.

9
Message Semantics (Modes)
  • Standard
  • The completion of the send does not necessarily
    mean that the matching receive has started, and
    no assumption should be made in the application
    program about whether the out-going data is
    buffered.
  • All buffering is made at the discretion of your
    MPI implementation.
  • Completion of an operation simply means that the
    message buffer space can now be modified safely
    again.
  • Buffered
  • Synchronous
  • Ready

10
Message Semantics (Modes)
  • Standard
  • Buffered (not recommended)
  • The user can guarantee that a certain amount of
    buffer space is available.
  • The catch is that the space must be explicitly
    provided by the application program.
  • Making sure the buffer space does not become full
    is entirely the user's responsibility.
  • Synchronous
  • Ready

11
Message Semantics (Modes)
  • Standard
  • Buffered (not recommended)
  • Synchronous
  • A rendezvous semantic between sender and receiver
    is used.
  • Completion of a send signals that the receive has
    at least started.
  • Ready

12
Message Semantics (Modes)
  • Standard
  • Buffered (not recommended)
  • Synchronous
  • Ready (not recommended)
  • Allows the user to exploit extra knowledge to
    simplify the protocol and potentially achieve
    higher performance.
  • In a ready-mode send, the user asserts that the
    matching receive already has been posted.
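
The four modes on the slides above differ mainly in what must be true before a send can complete. The sketch below summarizes that as a toy decision function in plain C; it makes no MPI calls, and `SendMode`, `SendOutcome`, and `send_outcome` are names invented for this illustration. It assumes `buffer_available` means "the implementation chose to buffer" for standard mode and "the user-supplied buffer has room" for buffered mode.

```c
#include <assert.h>

typedef enum { MODE_STANDARD, MODE_BUFFERED, MODE_SYNCHRONOUS, MODE_READY } SendMode;
typedef enum { SEND_COMPLETES, SEND_WAITS, SEND_ERRONEOUS } SendOutcome;

/* Toy summary of the four send modes (hypothetical helper, not an MPI call).
 * receive_posted:   nonzero if the matching receive has already been posted.
 * buffer_available: nonzero if buffer space can hold the message. */
SendOutcome send_outcome(SendMode mode, int receive_posted, int buffer_available)
{
    switch (mode) {
    case MODE_STANDARD:
        /* May complete through buffering or through a waiting receiver. */
        return (receive_posted || buffer_available) ? SEND_COMPLETES : SEND_WAITS;
    case MODE_BUFFERED:
        /* Completes locally; running out of user buffer space is an error. */
        return buffer_available ? SEND_COMPLETES : SEND_ERRONEOUS;
    case MODE_SYNCHRONOUS:
        /* Completes only once the matching receive has at least started. */
        return receive_posted ? SEND_COMPLETES : SEND_WAITS;
    case MODE_READY:
        /* The user asserted the receive is posted; otherwise erroneous. */
        return receive_posted ? SEND_COMPLETES : SEND_ERRONEOUS;
    }
    assert(0 && "unknown send mode");
    return SEND_ERRONEOUS;
}
```

For example, `send_outcome(MODE_SYNCHRONOUS, 0, 1)` yields `SEND_WAITS` in this model: buffer space does not help a synchronous send whose receive has not started.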

13
Blocking Message Passing (SEND)
  • MPI_SEND (BUF, COUNT, DATATYPE, DEST, TAG, COMM,
    IERROR)
  • IN <type> BUF(*)
  • IN INTEGER COUNT, DATATYPE, DEST, TAG, COMM
  • OUT INTEGER IERROR
  • Performs a standard-mode, blocking send.
  • Blocking means that the code cannot continue
    until the send has completed.
  • Completion of the send means that the data has
    been buffered, either locally or non-locally, and
    that the message buffer is now free to modify.
  • Completion implies nothing about the matching
    receive.

14
Buffer
  • MPI_SEND (BUF, COUNT, DATATYPE, DEST, TAG, COMM,
    IERROR)
  • IN <type> BUF(*)
  • IN INTEGER COUNT, DATATYPE, DEST, TAG, COMM
  • OUT INTEGER IERROR
  • BUF is an array. It can be an array of one
    object but it must be an array.
  • The definition
  • INTEGER X
  • DOES NOT EQUAL
  • INTEGER X(1)

15
Buffer
  • MPI_SEND (BUF, COUNT, DATATYPE, DEST, TAG, COMM,
    IERROR)
  • IN <type> BUF(*)
  • IN INTEGER COUNT, DATATYPE, DEST, TAG, COMM
  • OUT INTEGER IERROR
  • BUF is the parameter from which MPI determines
    the starting point of the memory space allocated
    to this message.
  • Recall that this memory space must be contiguous.
    Fortran90 pointer arrays are not necessarily
    contiguous, and array sections are in general
    not contiguous.

16
Buffer
  • MPI_SEND (BUF, COUNT, DATATYPE, DEST, TAG, COMM,
    IERROR)
  • IN <type> BUF(*)
  • IN INTEGER COUNT, DATATYPE, DEST, TAG, COMM
  • OUT INTEGER IERROR
  • Until the send is complete, the data inside BUF
    is undefined.
  • Any attempt to change the data in BUF before the
    send completes is also an undefined operation
    (though possible).
  • Once a send operation begins, it is the user's
    job to see that no modifications to BUF are made.
  • Completion of the send assures the user that it
    is safe to modify the contents of BUF again.

17
DATATYPE
  • MPI_SEND (BUF, COUNT, DATATYPE, DEST, TAG, COMM,
    IERROR)
  • IN <type> BUF(*)
  • IN INTEGER COUNT, DATATYPE, DEST, TAG, COMM
  • OUT INTEGER IERROR
  • DATATYPE is an MPI specific data type
    corresponding to the type of data stored in BUF.
  • An array of integers would be sent using the
    MPI_INTEGER data type
  • An array of logical variables would be sent using
    the MPI_LOGICAL data type
  • etc.

18
MPI Types in Fortran 77
  • MPI_INTEGER INTEGER
  • MPI_REAL REAL
  • MPI_DOUBLE_PRECISION DOUBLE PRECISION
  • MPI_COMPLEX COMPLEX
  • MPI_LOGICAL LOGICAL
  • MPI_CHARACTER CHARACTER(1)
  • MPI_BYTE
  • MPI_PACKED

19
MPI types in C
  • MPI_CHAR signed char
  • MPI_SHORT signed short int
  • MPI_INT signed int
  • MPI_LONG signed long int
  • MPI_UNSIGNED_CHAR unsigned char
  • MPI_UNSIGNED_SHORT unsigned short int
  • MPI_UNSIGNED unsigned int
  • MPI_UNSIGNED_LONG unsigned long int
  • MPI_FLOAT float
  • MPI_DOUBLE double
  • MPI_LONG_DOUBLE long double
  • MPI_BYTE
  • MPI_PACKED

20
COUNT
  • MPI_SEND (BUF, COUNT, DATATYPE, DEST, TAG, COMM,
    IERROR)
  • IN <type> BUF(*)
  • IN INTEGER COUNT, DATATYPE, DEST, TAG, COMM
  • OUT INTEGER IERROR
  • COUNT specifies the number of entries of type
    DATATYPE in the buffer BUF.
  • From the combined information of COUNT, DATATYPE,
    and BUF, MPI can determine the starting point in
    memory for the message and the number of bytes to
    move.
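
As a rough illustration of how the byte count follows from COUNT and the size of one entry, the sketch below does the arithmetic in plain C. It makes no MPI calls; `message_bytes` and `message_end` are hypothetical helpers, and the sketch assumes a basic datatype whose entries are stored back to back starting at the buffer address.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of MPI): number of bytes occupied by a
 * message of `count` entries that are `elem_size` bytes each. */
size_t message_bytes(size_t count, size_t elem_size)
{
    assert(elem_size > 0);
    return count * elem_size;
}

/* Address one past the last byte of the message, given the start of BUF. */
const unsigned char *message_end(const void *buf, size_t count, size_t elem_size)
{
    return (const unsigned char *)buf + message_bytes(count, elem_size);
}
```

For an array declared as `double x[8]`, `message_bytes(8, sizeof(double))` equals `sizeof x`, which is the contiguous span that starts at `x`.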

21
Communicator
  • MPI_SEND (BUF, COUNT, DATATYPE, DEST, TAG, COMM,
    IERROR)
  • IN <type> BUF(*)
  • IN INTEGER COUNT, DATATYPE, DEST, TAG, COMM
  • OUT INTEGER IERROR
  • COMM provides MPI with the reference point for
    the communication domain applied to this send.
  • For most MPI programs MPI_COMM_WORLD will be
    sufficient as the argument for this parameter.

22
DESTINATION
  • MPI_SEND (BUF, COUNT, DATATYPE, DEST, TAG, COMM,
    IERROR)
  • IN <type> BUF(*)
  • IN INTEGER COUNT, DATATYPE, DEST, TAG, COMM
  • OUT INTEGER IERROR
  • DEST is an integer representing the rank of the
    process I am trying to send a message to.
  • The rank value is with respect to the
    communicator in the COMM parameter.
  • For MPI_COMM_WORLD, the value in DEST is the
    absolute rank of the processor you are trying to
    reach.

23
TAG
  • MPI_SEND (BUF, COUNT, DATATYPE, DEST, TAG, COMM,
    IERROR)
  • IN <type> BUF(*)
  • IN INTEGER COUNT, DATATYPE, DEST, TAG, COMM
  • OUT INTEGER IERROR
  • The TAG parameter is an integer between 0 and
    some upper bound where the upper bound is machine
    dependent. The value for the upper bound is
    found in the attribute MPI_TAG_UB.
  • This integer value can be used to distinguish
    messages since send-receive pairs will only match
    if their TAG values also match.

24
IERROR
  • MPI_SEND (BUF, COUNT, DATATYPE, DEST, TAG, COMM,
    IERROR)
  • IN <type> BUF(*)
  • IN INTEGER COUNT, DATATYPE, DEST, TAG, COMM
  • OUT INTEGER IERROR
  • Assuming everything is working as planned then
    the value of IERROR on exit will be MPI_SUCCESS.
  • Values not equal to MPI_SUCCESS indicate some
    error but these values are implementation
    specific.

25
Send Modes
  • Standard
  • MPI_SEND
  • Buffered (not recommended)
  • MPI_BSEND
  • Synchronous
  • MPI_SSEND
  • Ready (not recommended)
  • MPI_RSEND

26
Blocking Message Passing (RECEIVE)
  • MPI_RECV (BUF, COUNT, DATATYPE, SOURCE, TAG,
    COMM, STATUS, IERROR)
  • OUT <type> BUF(*)
  • IN INTEGER COUNT, DATATYPE, SOURCE, TAG, COMM
  • OUT INTEGER IERROR, STATUS(MPI_STATUS_SIZE)
  • Performs a standard-mode, blocking receive.
  • Blocking means that the code can not continue
    until the receive has completed.
  • Completion of the receive means that the data has
    been placed into the message buffer locally and
    that the message buffer is now safe to modify or
    use.
  • Completion implies nothing about the completion
    of the matching send (except that the send has
    started).

27
BUFFER, DATATYPE, COMM, and IERROR
  • MPI_RECV (BUF, COUNT, DATATYPE, SOURCE, TAG,
    COMM, STATUS, IERROR)
  • OUT <type> BUF(*)
  • IN INTEGER COUNT, DATATYPE, SOURCE, TAG, COMM
  • OUT INTEGER IERROR, STATUS(MPI_STATUS_SIZE)
  • The parameters BUF, DATATYPE, and IERROR follow
    the same rules as those of the send.
  • Send-receive pairs will only match if their
    SOURCE/DEST, TAG, and COMM information match.

28
COUNT
  • MPI_RECV (BUF, COUNT, DATATYPE, SOURCE, TAG,
    COMM, STATUS, IERROR)
  • OUT <type> BUF(*)
  • IN INTEGER COUNT, DATATYPE, SOURCE, TAG, COMM
  • OUT INTEGER IERROR, STATUS(MPI_STATUS_SIZE)
  • As in the send operation, the COUNT parameter
    indicates the number of entries of type DATATYPE
    in BUF.
  • The COUNT values of a send-receive pair, however,
    do not need to match.
  • It is the user's responsibility to see that the
    buffer on the receiving end is big enough to
    store the incoming message. An overflow error
    is returned in IERROR when BUF is too small.
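
The relationship between the receive COUNT and the size of the incoming message can be sketched as follows. This is a toy model in plain C with no MPI calls; `toy_receive`, `TOY_SUCCESS`, and `TOY_ERR_TRUNCATE` are names invented for this illustration, and the message is assumed to consist of C `int` entries.

```c
#include <assert.h>
#include <string.h>

enum { TOY_SUCCESS = 0, TOY_ERR_TRUNCATE = 1 };

/* Toy receive (hypothetical, not an MPI call): `buf` has room for
 * `recv_count` ints and the incoming message holds `incoming_count` ints.
 * A shorter message is accepted; a longer one is reported as an error.
 * On success, *received is set to the number of entries actually stored. */
int toy_receive(int *buf, int recv_count,
                const int *incoming, int incoming_count, int *received)
{
    assert(recv_count >= 0 && incoming_count >= 0);
    if (incoming_count > recv_count)
        return TOY_ERR_TRUNCATE;   /* message does not fit in BUF */
    memcpy(buf, incoming, (size_t)incoming_count * sizeof *buf);
    *received = incoming_count;
    return TOY_SUCCESS;
}
```

In this model, a receive posted with a count of 5 accepts a 3-entry message and leaves the last two entries of the buffer untouched, while a receive posted with a count of 2 reports an error for the same message.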

29
Source
  • MPI_RECV (BUF, COUNT, DATATYPE, SOURCE, TAG,
    COMM, STATUS, IERROR)
  • OUT <type> BUF(*)
  • IN INTEGER COUNT, DATATYPE, SOURCE, TAG, COMM
  • OUT INTEGER IERROR, STATUS(MPI_STATUS_SIZE)
  • SOURCE is an integer representing the rank of the
    process I am willing to receive a message from.
  • The rank value is with respect to the
    communicator in the COMM parameter.
  • For MPI_COMM_WORLD, the value in SOURCE is the
    absolute rank of the processor you are willing to
    receive from.
  • The receiver can specify a wildcard value for
    SOURCE (MPI_ANY_SOURCE) indicating that any
    source is acceptable as long as the TAG and COMM
    parameters match.

30
TAG
  • MPI_RECV (BUF, COUNT, DATATYPE, SOURCE, TAG,
    COMM, STATUS, IERROR)
  • OUT <type> BUF(*)
  • IN INTEGER COUNT, DATATYPE, SOURCE, TAG, COMM
  • OUT INTEGER IERROR, STATUS(MPI_STATUS_SIZE)
  • The TAG value is an integer that must be matched
    with the TAG value of the corresponding send.
  • The receiver can specify a wildcard value for TAG
    (MPI_ANY_TAG) indicating that it is willing to
    receive any tag value as long as the source and
    COMM values match.
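
The matching rules for SOURCE, TAG, and COMM, including the two wildcards, can be written as a small predicate. The sketch below is plain C with no MPI calls; `Envelope`, `envelope_matches`, `TOY_ANY_SOURCE`, and `TOY_ANY_TAG` are names invented for this illustration, and the wildcard values are placeholders, since the real MPI_ANY_SOURCE and MPI_ANY_TAG constants are defined by the MPI implementation.

```c
#include <assert.h>

/* Wildcard values for this toy model only. */
enum { TOY_ANY_SOURCE = -1, TOY_ANY_TAG = -1 };

typedef struct {
    int rank;  /* SOURCE on the receive side, sender's rank on the message */
    int tag;
    int comm;  /* stand-in identifier for the communicator */
} Envelope;

/* Returns 1 if the posted receive `recv` accepts the message `msg`. */
int envelope_matches(Envelope recv, Envelope msg)
{
    assert(msg.rank >= 0 && msg.tag >= 0);   /* a message never carries wildcards */
    if (recv.comm != msg.comm)
        return 0;                            /* COMM has no wildcard */
    if (recv.rank != TOY_ANY_SOURCE && recv.rank != msg.rank)
        return 0;
    if (recv.tag != TOY_ANY_TAG && recv.tag != msg.tag)
        return 0;
    return 1;
}
```

A receive posted with both wildcards still rejects a message sent on a different communicator, which is why COMM is checked first.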

31
STATUS
  • MPI_RECV (BUF, COUNT, DATATYPE, SOURCE, TAG,
    COMM, STATUS, IERROR)
  • OUT <type> BUF(*)
  • IN INTEGER COUNT, DATATYPE, SOURCE, TAG, COMM
  • OUT INTEGER IERROR, STATUS(MPI_STATUS_SIZE)
  • The STATUS parameter is a returned parameter that
    contains information about the completion of the
    message.
  • When using wildcards you may need to find out who
    it was that sent you a message, what it was
    about, and how long the message was before
    continuing to process. This is the type of
    information found in STATUS.

32
STATUS
  • MPI_RECV (BUF, COUNT, DATATYPE, SOURCE, TAG,
    COMM, STATUS, IERROR)
  • OUT <type> BUF(*)
  • IN INTEGER COUNT, DATATYPE, SOURCE, TAG, COMM
  • OUT INTEGER IERROR, STATUS(MPI_STATUS_SIZE)
  • In FORTRAN77, STATUS is an array of integers of
    size MPI_STATUS_SIZE.
  • The three constants MPI_SOURCE, MPI_TAG, and
    MPI_ERROR are the indices of the entries that
    store the source, tag, and error fields,
    respectively.
  • In C, STATUS is a structure of type MPI_Status
    that contains three fields named MPI_SOURCE,
    MPI_TAG, and MPI_ERROR.
  • Notice that the length of the message doesn't
    appear to be included.

33
Questions/Answers
  • Question: What is the purpose of having the
    error returned in the STATUS data structure? It
    seems redundant.
  • Answer: It is possible for a single function such
    as MPI_WAITALL( ) to complete multiple messages
    in a single call. In these cases each individual
    message may produce its own error code, and that
    code is what is returned in the STATUS data
    structure.

34
MPI_GET_COUNT
  • MPI_GET_COUNT(STATUS, DATATYPE, COUNT, IERROR)
  • IN INTEGER STATUS(MPI_STATUS_SIZE), DATATYPE
  • OUT INTEGER COUNT, IERROR
  • MPI_GET_COUNT allows you to determine how many
    entries of type DATATYPE were received in the
    message.
  • For advanced users, see also MPI_GET_ELEMENTS.
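
The way a count can be recovered from a status plus a datatype can be sketched with a toy structure. The code below is plain C with no MPI calls; `ToyStatus`, `toy_get_count`, and `TOY_UNDEFINED` are names invented for this illustration, and the `byte_count` field is an assumption of the sketch, since the layout of a real status object beyond its source, tag, and error fields is left to the implementation.

```c
#include <assert.h>
#include <stddef.h>

enum { TOY_UNDEFINED = -1 };

typedef struct {
    int source;
    int tag;
    int error;
    size_t byte_count;  /* assumed hidden field: bytes actually received */
} ToyStatus;

/* Number of whole entries of `elem_size` bytes in the received message,
 * or TOY_UNDEFINED if the byte count is not a multiple of `elem_size`. */
int toy_get_count(const ToyStatus *status, size_t elem_size)
{
    assert(status != NULL && elem_size > 0);
    if (status->byte_count % elem_size != 0)
        return TOY_UNDEFINED;
    return (int)(status->byte_count / elem_size);
}
```

The same status gives different counts for different datatypes, which is one reason the count is derived on request instead of being stored as a fixed field.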

35
Six Powerful Functions
  • MPI_INIT
  • MPI_FINALIZE
  • MPI_COMM_RANK
  • MPI_COMM_SIZE
  • MPI_SEND
  • MPI_RECV

36
Deadlock
  • MPI does not enforce a safe programming style.
  • It is the user's responsibility to ensure that it
    is impossible for the program to fall into a
    deadlock condition.
  • Deadlock occurs when a process blocks to wait for
    an event that, given the current state of the
    system, can never happen.

37
Deadlock examples
  • CALL MPI_COMM_RANK(comm, rank, ierr)
  • IF (rank .EQ. 0) THEN
  • CALL MPI_RECV(recvbuf, count, MPI_REAL, 1, tag,
    comm, status, ierr)
  • CALL MPI_SEND(sendbuf, count, MPI_REAL, 1, tag,
    comm, ierr)
  • ELSE IF (rank .EQ. 1) THEN
  • CALL MPI_RECV(recvbuf, count, MPI_REAL, 0, tag,
    comm, status, ierr)
  • CALL MPI_SEND(sendbuf, count, MPI_REAL, 0, tag,
    comm, ierr)
  • END IF
  • This program will always deadlock.

38
Deadlock examples
  • CALL MPI_COMM_RANK(comm, rank, ierr)
  • IF (rank .EQ. 0) THEN
  • CALL MPI_SEND(sendbuf, count, MPI_REAL, 1, tag,
    comm, ierr)
  • CALL MPI_RECV(recvbuf, count, MPI_REAL, 1, tag,
    comm, status, ierr)
  • ELSE IF (rank .EQ. 1) THEN
  • CALL MPI_SEND(sendbuf, count, MPI_REAL, 0, tag,
    comm, ierr)
  • CALL MPI_RECV(recvbuf, count, MPI_REAL, 0, tag,
    comm, status, ierr)
  • END IF
  • This program is unsafe. Why?

39
Safe Way
  • CALL MPI_COMM_RANK(comm, rank, ierr)
  • IF (rank .EQ. 0) THEN
  • CALL MPI_SEND(sendbuf, count, MPI_REAL, 1, tag,
    comm, ierr)
  • CALL MPI_RECV(recvbuf, count, MPI_REAL, 1, tag,
    comm, status, ierr)
  • ELSE IF (rank .EQ. 1) THEN
  • CALL MPI_RECV(recvbuf, count, MPI_REAL, 0, tag,
    comm, status, ierr)
  • CALL MPI_SEND(sendbuf, count, MPI_REAL, 0, tag,
    comm, ierr)
  • END IF
  • This is a silly example; no one would ever try to
    do it the other ways, right?
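
The three orderings on the deadlock slides above can be compared with a small toy model. The sketch below is plain C with no MPI calls; `Op`, `exchange_completes`, and the `buffer_slots` parameter are names invented for this illustration. It assumes a blocking send completes either by handing the message to a peer that is already waiting in a receive or by placing it in a free buffer slot, which is one simple reading of standard-mode semantics and not a description of any particular MPI implementation.

```c
#include <assert.h>
#include <stddef.h>

/* One step in a rank's script: a blocking send to, or receive from, the peer. */
typedef enum { OP_SEND, OP_RECV } Op;

/* Toy model of two ranks exchanging messages with blocking calls.
 * buffer_slots is the number of messages the hypothetical system will buffer.
 * Returns 1 if both scripts run to completion, 0 if the ranks deadlock. */
int exchange_completes(const Op *script0, size_t len0,
                       const Op *script1, size_t len1,
                       int buffer_slots)
{
    const Op *script[2] = { script0, script1 };
    size_t len[2] = { len0, len1 };
    size_t pc[2] = { 0, 0 };     /* index of each rank's next operation */
    int pending[2] = { 0, 0 };   /* buffered messages addressed to each rank */
    int free_slots = buffer_slots;

    assert(buffer_slots >= 0);
    for (;;) {
        int progressed = 0;
        for (int r = 0; r < 2; r++) {
            int peer = 1 - r;
            if (pc[r] >= len[r])
                continue;                      /* this rank has finished */
            if (script[r][pc[r]] == OP_RECV) {
                if (pending[r] > 0) {          /* take a buffered message */
                    pending[r]--;
                    free_slots++;
                    pc[r]++;
                    progressed = 1;
                }
            } else if (pc[peer] < len[peer] &&
                       script[peer][pc[peer]] == OP_RECV &&
                       pending[peer] == 0) {
                pc[r]++;                       /* rendezvous: the peer is  */
                pc[peer]++;                    /* waiting in its receive   */
                progressed = 1;
            } else if (free_slots > 0) {
                free_slots--;                  /* buffer the outgoing message */
                pending[peer]++;
                pc[r]++;
                progressed = 1;
            }
        }
        if (pc[0] >= len[0] && pc[1] >= len[1])
            return 1;
        if (!progressed)
            return 0;                          /* every rank is blocked */
    }
}
```

In this model, receive-first on both ranks never completes, send-first on both ranks completes only when `buffer_slots` is at least 1, and the send/receive versus receive/send ordering completes even with `buffer_slots` set to 0.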

40-50
Motivating Example for Deadlock
  • [Figure sequence: Timestep 1 through Timestep 10]

51
Super Idea!
  • CALL MPI_COMM_RANK(comm, rank, ierr)
  • IF (rank .EQ. 0) THEN
  • CALL MPI_SEND(sendbuf, count, MPI_REAL, 1, tag,
    comm, ierr)
  • CALL MPI_RECV(recvbuf, count, MPI_REAL, 1, tag,
    comm, status, ierr)
  • ELSE IF (rank .EQ. 1) THEN
  • CALL MPI_SEND(sendbuf, count, MPI_REAL, 2, tag,
    comm, ierr)
  • CALL MPI_RECV(recvbuf, count, MPI_REAL, 0, tag,
    comm, status, ierr)
  • ELSE IF (rank .EQ. 2) THEN
  • I'll cleverly order my sends so that they all
    happen at the same time and all the communication
    will be completed in one time step!

52
WRONG!
  • The code will be unsafe.
  • "It worked perfectly for me, why doesn't it work
    on this machine?"
  • "It ran fine on Washday and now it doesn't work. I
    haven't changed anything!"
  • "My code works if I send smaller messages. Maybe
    your machine can't handle my optimized code."
  • Why?
  • http://research.cs.vt.edu/lasca/schedule
  • Please send any additional questions to
  • lasca_at_cs.vt.edu