Introduction to MPI Programming
1
  • Introduction to MPI Programming
  • (Part II)
  • Michael Griffiths, Deniz Savas, Alan Real
  • January 2006

2
Overview
  • Review point to point communications
  • Data types
  • Data packing
  • Collective Communication
  • Broadcast, scatter and gather of data
  • Reduction Operations
  • Barrier Synchronisation
  • Patterns for Parallel Programming
  • Exercises

3
Blocking operations
  • Relate to when the operation has completed
  • Only return from the subroutine call when the
    operation has completed

4
Non-blocking operations
  • Return straight away and allow the sub-program to
    return to perform other work.
  • At some time later the sub-program should test or
    wait for the completion of the non-blocking
    operation.
  • A non-blocking operation immediately followed by
    a matching wait is equivalent to a blocking
    operation.
  • Non-blocking operations are not the same as
    sequential subroutine calls as the operation
    continues after the call has returned.

5
(No Transcript)
6
Non-blocking communication
  • Separate communication into three phases
  • Initiate non-blocking communication
  • Do some work
  • Perhaps involving other communications
  • Wait for non-blocking communication to complete.

7
Non-blocking send
(Diagram: the sender posts a send request and later
waits; the receiver posts a matching receive within
MPI_COMM_WORLD.)
  • Send is initiated and returns straight away.
  • Sending process can do other things
  • Can test later whether operation has completed.

8
Non-blocking receive
(Diagram: the receiver posts a receive request and
later waits; the sender posts a matching send within
MPI_COMM_WORLD.)
  • Receive is initiated and returns straight away.
  • Receiving process can do other things
  • Can test later whether operation has completed.

9
The Request Handle
  • Same arguments as non-blocking call
  • Additional request handle
  • In C/C++ it is of type MPI_Request / MPI::Request
  • In Fortran it is an INTEGER
  • Request handle is allocated when a communication
    is initiated
  • Can query to test whether non-blocking operation
    has completed

10
Non-blocking synchronous send
  • Fortran
  • CALL MPI_ISSEND(buf, count, datatype, dest, tag,
    comm, request, error)
  • CALL MPI_WAIT(request, status, error)
  • C
  • MPI_Issend(buf, count, datatype, dest, tag,
    comm, &request)
  • MPI_Wait(&request, &status)
  • C++
  • request = comm.Issend(buf, count, datatype,
    dest, tag)
  • request.Wait()

11
Non-blocking synchronous receive
  • Fortran
  • CALL MPI_IRECV(buf, count, datatype, src, tag,
    comm, request, error)
  • CALL MPI_WAIT(request, status, error)
  • C
  • MPI_Irecv(buf, count, datatype, src, tag, comm,
    &request)
  • MPI_Wait(&request, &status)
  • C++
  • request = comm.Irecv(buf, count, datatype, src,
    tag)
  • request.Wait(status)
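
These calls combine into the three-phase pattern from
the earlier slide. Below is a minimal C sketch, not
taken from the course material, in which two processes
exchange one integer each; the assumption that exactly
two processes are running is part of the example.

/* Minimal sketch (assumption): two processes exchange one integer with
 * non-blocking calls and could overlap local work before waiting. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, other, sendval, recvval;
    MPI_Request sreq, rreq;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    other = 1 - rank;               /* assumes exactly 2 processes */
    sendval = rank;

    /* Phase 1: initiate the non-blocking operations */
    MPI_Irecv(&recvval, 1, MPI_INT, other, 0, MPI_COMM_WORLD, &rreq);
    MPI_Issend(&sendval, 1, MPI_INT, other, 0, MPI_COMM_WORLD, &sreq);

    /* Phase 2: do some independent work here */

    /* Phase 3: wait before using recvval or reusing sendval */
    MPI_Wait(&rreq, &status);
    MPI_Wait(&sreq, MPI_STATUS_IGNORE);

    printf("Rank %d received %d\n", rank, recvval);
    MPI_Finalize();
    return 0;
}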

12
Blocking v Non-blocking
  • Send and receive can be blocking or non-blocking.
  • A blocking send can be used with a non-blocking
    receive, and vice versa.
  • Non-blocking sends can use any mode
  • Synchronous mode affects completion, not
    initiation.
  • A non-blocking call followed by an explicit wait
    is identical to the corresponding blocking
    communication.

13
Completion
  • Can either wait or test for completion
  • Fortran (LOGICAL flag)
  • CALL MPI_WAIT(request, status, ierror)
  • CALL MPI_TEST(request, flag, status, ierror)
  • C (int flag)
  • MPI_Wait(&request, &status)
  • MPI_Test(&request, &flag, &status)
  • C++ (bool flag)
  • request.Wait()
  • flag = request.Test() (for sends)
  • request.Wait(status)
  • flag = request.Test(status) (for receives)
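
A small C sketch (an assumption, not from the slides)
of the test-based approach: keep doing local work until
MPI_Test reports that a pending request has completed.

#include <mpi.h>

/* Poll a pending request, interleaving useful work with the checks. */
void poll_until_done(MPI_Request *req)
{
    int flag = 0;
    MPI_Status status;

    while (!flag) {
        /* do a small piece of local work here, then check again */
        MPI_Test(req, &flag, &status);
    }
}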

14
Other related wait and test routines
  • If multiple non-blocking calls are issued
  • MPI_TESTANY: tests whether any one of a list of
    requests (they could be send or receive
    requests) has completed.
  • MPI_WAITANY: waits until any one of a list of
    requests has completed.
  • MPI_TESTALL: tests whether all the requests in a
    list have completed.
  • MPI_WAITALL: waits until all the requests in a
    list have completed.
  • MPI_PROBE, MPI_IPROBE: allow incoming messages
    to be checked for without actually receiving
    them. Note that MPI_PROBE is blocking: it waits
    until there is something to probe for.
  • MPI_CANCEL: cancels a pending communication. A
    last-resort, clean-up operation!
  • The ALL/ANY routines take an array of requests
    and can return an array of statuses.
  • The ANY routines also return the index of the
    completed operation.
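
Where several requests are outstanding, MPI_Waitall is
often the simplest choice. A hedged C sketch, assuming
four integer receives from known source ranks:

#include <mpi.h>

#define NREQ 4

/* Issue several non-blocking receives, then wait for all of them. */
void receive_from_four_sources(int buf[NREQ], const int sources[NREQ])
{
    MPI_Request reqs[NREQ];
    MPI_Status  stats[NREQ];

    for (int i = 0; i < NREQ; i++)
        MPI_Irecv(&buf[i], 1, MPI_INT, sources[i], 0,
                  MPI_COMM_WORLD, &reqs[i]);

    /* Blocks until every request in the array has completed. */
    MPI_Waitall(NREQ, reqs, stats);
}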

15
Merging send and receive operations into a single
unit
  • The following is the syntax of the MPI_Sendrecv
    command
  • In C
  • int MPI_Sendrecv(void *sendbuf, int sendcount,
    MPI_Datatype sendtype, int dest, int sendtag,
    void *recvbuf, int recvcount, MPI_Datatype
    recvtype, int source, int recvtag, MPI_Comm
    comm, MPI_Status *status)
  • In Fortran
  • <sendtype> sendbuf(*)
  • <recvtype> recvbuf(*)
  • INTEGER sendcount, sendtype, dest, sendtag,
    recvcount, recvtype
  • INTEGER source, recvtag, comm,
    status(MPI_STATUS_SIZE), ierror
  • MPI_SENDRECV(sendbuf, sendcount, sendtype, dest,
    sendtag, recvbuf, recvcount, recvtype, source,
    recvtag, comm, status, ierror)

16
Important Notes about MPI_Sendrecv
  • Beware! A message sent by MPI_Sendrecv can be
    received by a regular receive operation if the
    destination and tag match.
  • MPI_PROC_NULL can be specified as the destination
    or source to allow one-directional working
    (useful for the end nodes in non-circular
    communication).
  • Any communication with MPI_PROC_NULL returns
    immediately with no effect, but as if the
    operation had been successful. This can make
    programming easier.
  • The send and receive buffers must not overlap;
    they must be separate memory locations. This
    restriction can be avoided by using the
    MPI_Sendrecv_replace routine. (A C sketch of a
    ring shift using MPI_Sendrecv follows below.)
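
A minimal C sketch of a non-circular right shift with
MPI_Sendrecv and MPI_PROC_NULL (an illustration, not
the course's own code):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size, left, right, sendval, recvval = -1;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* End processes have no partner: MPI_PROC_NULL is simply skipped. */
    left  = (rank == 0)        ? MPI_PROC_NULL : rank - 1;
    right = (rank == size - 1) ? MPI_PROC_NULL : rank + 1;
    sendval = rank;

    /* Send to the right neighbour, receive from the left neighbour. */
    MPI_Sendrecv(&sendval, 1, MPI_INT, right, 0,
                 &recvval, 1, MPI_INT, left,  0,
                 MPI_COMM_WORLD, &status);

    printf("Rank %d received %d\n", rank, recvval);
    MPI_Finalize();
    return 0;
}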

17
Data Packing
  • Up until now we have only seen contiguous data
    of pre-defined data-types being communicated by
    MPI calls. This can be rather restricting if what
    we are intending to transfer involves structures
    of data made up of mixtures of primitive data
    types, such as integer count followed by a
    sequence of real numbers.
  • One solution to this problem is to use the
    MPI_PACK and MPI_UNPACK routines. The philosophy
    used is similar to the Fortran write/read to/from
    internal buffers and the scanf function in C.
  • The MPI_PACK routine can be called repeatedly to
    compress the data into a send buffer; the
    resulting buffer of data can then be sent using
    MPI_SEND or an equivalent with the datatype set
    to MPI_PACKED.
  • At the receiving-end it can be received by
    using MPI_RECV with the data type MPI_PACKED. The
    received data can then be unpacked by using
    MPI_UNPACK to recover the original packed data.
    This method of working can also improve
    communications efficiency by reducing the number
    of data transfer send-receive calls. There are
    usually fixed overheads associated with setting
    up the communications that would cause
    inefficiencies if the sent/received messages are
    just too small.

18
MPI_Pack
  • Fortran
  • <type> INBUF(*), OUTBUF(*)
  • INTEGER INCOUNT, DATATYPE, OUTSIZE, POSITION,
    COMM, IERROR
  • MPI_PACK(INBUF, INCOUNT, DATATYPE, OUTBUF,
    OUTSIZE, POSITION, COMM, IERROR)
  • C
  • int MPI_Pack(void *inbuf, int incount,
    MPI_Datatype datatype, void *outbuf, int outsize,
    int *position, MPI_Comm comm)
  • Packs the message in inbuf, of type datatype and
    length incount, and stores it in outbuf. Outsize
    is the maximum length of outbuf in bytes, rather
    than its actual size.
  • On entry, position indicates the starting
    location in outbuf where data will be written.
    On exit, position points to the first free
    position in outbuf following the location
    occupied by the packed message. This can then be
    used directly as the position parameter of the
    next MPI_Pack call.

19
MPI_Unpack
  • Fortran
  • <type> INBUF(*), OUTBUF(*)
  • INTEGER INSIZE, POSITION, OUTCOUNT, DATATYPE,
    COMM, IERROR
  • MPI_UNPACK(INBUF, INSIZE, POSITION, OUTBUF,
    OUTCOUNT, DATATYPE, COMM, IERROR)
  • C
  • int MPI_Unpack(void *inbuf, int insize, int
    *position, void *outbuf, int outcount,
    MPI_Datatype datatype, MPI_Comm comm)
  • Unpacks the message in inbuf as data of type
    datatype and length outcount and stores it in
    outbuf.
  • On entry, position indicates the starting
    location in inbuf where data will be read from.
    On exit, position points to the start of the
    next set of data in inbuf. This can then be used
    directly as the position parameter of the next
    MPI_Unpack call.
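
A hedged C sketch of the pack/unpack pattern described
above: an integer count followed by that many doubles
travels in one MPI_PACKED message. The 1024-byte buffer
size and the helper names are assumptions of the
example, not names from the course.

#include <mpi.h>

#define BUFBYTES 1024

/* Pack a count and the values into one buffer and send it. */
void send_count_and_values(int n, const double *vals, int dest, MPI_Comm comm)
{
    char buf[BUFBYTES];
    int pos = 0;

    MPI_Pack(&n, 1, MPI_INT, buf, BUFBYTES, &pos, comm);
    MPI_Pack((void *)vals, n, MPI_DOUBLE, buf, BUFBYTES, &pos, comm);
    /* cast above only for older non-const MPI prototypes */
    MPI_Send(buf, pos, MPI_PACKED, dest, 0, comm);
}

/* Receive the packed buffer and unpack the count, then the values. */
void recv_count_and_values(int *n, double *vals, int src, MPI_Comm comm)
{
    char buf[BUFBYTES];
    int pos = 0;
    MPI_Status status;

    MPI_Recv(buf, BUFBYTES, MPI_PACKED, src, 0, comm, &status);
    MPI_Unpack(buf, BUFBYTES, &pos, n, 1, MPI_INT, comm);
    MPI_Unpack(buf, BUFBYTES, &pos, vals, *n, MPI_DOUBLE, comm);
}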

20
Derived Datatypes
  • Basic data types provided in MPI allow us to send
    messages consisting of arrays of these types. We
    can also pack mixtures of these arrays into a
    single array of type MPI_PACKED to be sent at one
    go.
  • However, under certain circumstances a better
    approach is to define a data type of our own
    choosing, constructed from the existing data
    types, and then define our messages in units of
    this newly invented data type. This is a similar
    approach to defining structs in C and
    user-defined types in Fortran. It improves
    efficiency by reducing the number of
    communication calls needed to communicate the
    data (a single call could not otherwise carry a
    mixture of basic types).
  • A new data type is specified as an ordered list
    of its constituent components and the location
    of each component within the overall structure.
    This is referred to as the type map of the new
    data type. Displacements are measured from the
    beginning of the structure.

21
Derived data types
  • Use of derived data types involves the following
    steps:
  • Construct (define) the new data type
  • Commit the new data type
  • Use the new type in message passing
    (send/receive) calls
  • Optionally, free any data types that are no
    longer needed.
  • CONSTRUCTING THE NEW DATA TYPE
  • Rather than a single routine, the MPI library
    provides a set of routines for constructing new
    data types, each one suitable for a particular
    form of data.
  • These are:
  • Contiguous
  • Vector
  • Indexed
  • Structure

22
Derived data types
  • The following routines help construct new data
    types:
  • MPI_Type_contiguous
  • MPI_Type_vector, MPI_Type_hvector
  • MPI_Type_indexed, MPI_Type_hindexed
  • MPI_Type_create_struct
  • MPI_Type_contiguous allows you to refer to a
    contiguous vector of a primitive type as a new
    type to be used in communications. A bit like
    being able to reference a matrix by its name
    only.
  • MPI_Type_vector allows us to refer to a
    collection of elements that are separated from
    each other by constant strides, for example
    elements (1), (3), (5), (7), ... of an existing
    vector, as our unit.
  • MPI_Type_indexed allows the strides to vary in a
    predefined manner, which is not possible with
    MPI_Type_vector.
  • We shall study only MPI_Type_create_struct as an
    example, as it is the most general and complex
    of all these types.

23
MPI_Type_create_struct
  • FORTRAN
  • MPI_TYPE_CREATE_STRUCT(COUNT, BLOCK_LENGTHS,
    DISPLACEMENTS, TYPES, NEWTYPE, IERROR)
  • INTEGER COUNT, BLOCK_LENGTHS(*),
    DISPLACEMENTS(*), TYPES(*), NEWTYPE, IERROR
  • C
  • int MPI_Type_create_struct(int count, int
    block_lengths[], MPI_Aint displacements[],
    MPI_Datatype types[], MPI_Datatype *newtype)
  • The data is made up of COUNT blocks. Each
    block(i) is made up of block_lengths(i) items of
    type types(i). The displacement of each block
    within the type is given by displacements(i) in
    BYTES.
  • When the type is successfully created, NEWTYPE
    returns a handle to the new data type that can
    be used in subsequent send/receive calls. For
    example, a structure made up of 2 integers
    followed by 6 reals followed by a character
    string of 5 characters is seen as a structure of
    3 blocks, whose lengths are 2, 6, 5 respectively
    and whose data types are (MPI_INTEGER, MPI_REAL,
    MPI_CHARACTER). Assuming 4-byte integers and
    reals, the displacements are (0, 8, 32) bytes.
  • NOTE: The MPI-1 standard defines this function
    as MPI_TYPE_STRUCT, which was renamed in MPI-2,
    so the old name is also valid.

24
MPI_Type_commit
  • Once a type is constructed, it can be committed
    for use by invoking this function. This allows
    us to send messages of the new type using all
    the MPI message communication routines.
  • FORTRAN
  • MPI_TYPE_COMMIT(DATATYPE, IERROR)
  • C
  • int MPI_Type_commit(MPI_Datatype *datatype)
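
A hedged C sketch putting MPI_Type_create_struct and
MPI_Type_commit together for the 2-integer / 6-real /
5-character example above. The struct layout and the
use of MPI_Get_address (rather than hand-counted byte
offsets) are choices of this sketch.

#include <mpi.h>

typedef struct {
    int   header[2];
    float values[6];
    char  name[5];
} record_t;                        /* illustrative layout */

MPI_Datatype make_record_type(void)
{
    int          lengths[3] = { 2, 6, 5 };
    MPI_Aint     displacements[3];
    MPI_Datatype types[3]   = { MPI_INT, MPI_FLOAT, MPI_CHAR };
    MPI_Datatype newtype;
    record_t     sample;
    MPI_Aint     base;

    /* Use real addresses so any compiler padding is accounted for. */
    MPI_Get_address(&sample, &base);
    MPI_Get_address(&sample.header, &displacements[0]);
    MPI_Get_address(&sample.values, &displacements[1]);
    MPI_Get_address(&sample.name,   &displacements[2]);
    for (int i = 0; i < 3; i++)
        displacements[i] -= base;

    MPI_Type_create_struct(3, lengths, displacements, types, &newtype);
    MPI_Type_commit(&newtype);
    return newtype;        /* free later with MPI_Type_free(&newtype) */
}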

25
Timers
  • Double precision MPI functions
  • Fortran: DOUBLE PRECISION t1
  • t1 = MPI_WTIME()
  • C: double t1
  • t1 = MPI_Wtime()
  • C++: double t1
  • t1 = MPI::Wtime()
  • Time is measured in seconds.
  • Time to perform a task is measured by consulting
    the timer before and after.
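
For instance, a small C sketch of timing a section of
code by reading the timer before and after it:

#include <mpi.h>
#include <stdio.h>

void time_some_work(void)
{
    double t1, t2;

    t1 = MPI_Wtime();
    /* ... the work being measured goes here ... */
    t2 = MPI_Wtime();

    printf("Elapsed time: %f seconds\n", t2 - t1);
}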

26
  • Collective Communication

27
Overview
  • Introduction and characteristics
  • Barrier Synchronisation
  • Global reduction operations
  • Predefined operations
  • Broadcast
  • Scatter
  • Gather
  • Partial sums
  • Exercise

28
Collective communications
  • Are higher-level routines involving several
    processes at a time.
  • Can be built out of point-to-point
    communications.
  • Examples are
  • Barriers
  • Broadcast
  • Reduction operations

29
Collective Communication
  • Communications involving a group of processes.
  • Called by all processes in a communicator.
  • Examples
  • Broadcast, scatter, gather (data distribution)
  • Global sum, global maximum, etc. (reduction
    operations)
  • Barrier synchronisation
  • Characteristics
  • Collective communication will not interfere with
    point-to-point communication and vice-versa.
  • All processes must call the collective routine.
  • Synchronisation not guaranteed (except for
    barrier)
  • No non-blocking collective communication
  • No tags
  • Receive buffers must be exactly the right size

30
Collective Communications (one for all, all for
one!)
  • Collective communication is defined as
    communication that involves all the processes in
    a group. Collective communication routines can
    be divided into the following broad categories:
  • Barrier synchronisation
  • Broadcast from one to all
  • Scatter from one to all
  • Gather from all to one
  • Scatter/gather from all to all
  • Global reduction (distributed elementary
    operations)
  • IMPORTANT NOTE: Collective communication
    operations and the point-to-point operations we
    have seen earlier are invisible to each other
    and hence do not interfere with each other.
  • This is important for avoiding deadlocks due to
    interference.

31
BARRIER SYNCHRONIZATION
(Diagram: seven processes on a time axis; three of
them wait idle at the barrier statement for the
other four to catch up.)
32
Graphic Representations of Collective
Communication Types
(Diagram: data movement among processes for
BROADCAST, SCATTER, GATHER, ALLGATHER and
ALLTOALL, with processes on one axis and data on
the other.)
33
Barrier Synchronisation
  • Each process in the communicator waits at the
    barrier until all processes encounter the
    barrier.
  • Fortran
  • INTEGER comm, error
  • CALL MPI_BARRIER(comm, error)
  • C
  • MPI_Barrier(MPI_Comm comm)
  • C++
  • Comm.Barrier()
  • E.g.
  • MPI::COMM_WORLD.Barrier()

34
Global reduction operations
  • Used to compute a result involving data
    distributed over a group of processes
  • Global sum or product
  • Global maximum or minimum
  • Global user-defined operation

35
Predefined operations
36
MPI_Reduce
  • Performs count operations (o) on individual
    elements of sendbuf between processes

(Diagram: ranks 0-2 each hold a row of elements;
after the reduction the root holds the element-wise
results, e.g. AoDoG and BoEoH.)
37
MPI_Reduce syntax
  • Fortran
  • INTEGER count, rtype, op, root, comm, error
  • CALL MPI_REDUCE(sbuf, rbuf, count, rtype, op,
    root, comm, error)
  • C
  • MPI_Reduce(void *sbuf, void *rbuf, int count,
    MPI_Datatype datatype, MPI_Op op, int root,
    MPI_Comm comm)
  • C++
  • Comm.Reduce(const void *sbuf, void *recvbuf, int
    count, const MPI::Datatype &datatype, const
    MPI::Op &op, int root)

38
MPI_Reduce example
  • Integer global sum
  • Fortran
  • INTEGER x, result, error
  • CALL MPI_REDUCE(x, result, 1, MPI_INTEGER,
    MPI_SUM, 0, MPI_COMM_WORLD, error)
  • C
  • int x, result
  • MPI_Reduce(&x, &result, 1, MPI_INT, MPI_SUM, 0,
    MPI_COMM_WORLD)
  • C++
  • int x, result
  • MPI::COMM_WORLD.Reduce(&x, &result, 1, MPI::INT,
    MPI::SUM, 0)

39
MPI_Allreduce
  • No root process
  • All processes get results of reduction operation

(Diagram: as for MPI_Reduce, but every rank (0-2)
receives the element-wise results such as AoDoG.)
40
MPI_Allreduce syntax
  • Fortran
  • INTEGER count, rtype, op, comm, error
  • CALL MPI_ALLREDUCE(sbuf, rbuf, count, rtype, op,
    comm, error)
  • C
  • MPI_Allreduce(void *sbuf, void *rbuf, int count,
    MPI_Datatype datatype, MPI_Op op, MPI_Comm comm)
  • C++
  • Comm.Allreduce(const void *sbuf, void *recvbuf,
    int count, const MPI::Datatype &datatype, const
    MPI::Op &op)

41
Practice Session 3
  • Using reduction operations
  • This example shows the use of the continued
    fraction method of calculating pi and makes each
    processor calculate a different portion of the
    expansion series.
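
One possible sketch in C (an assumption: it
approximates pi by integrating 4/(1+x^2) over [0,1]
rather than reproducing the exact series used in the
exercise), with MPI_Reduce combining the partial sums
on rank 0:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    const int n = 1000000;          /* number of intervals (assumption) */
    int rank, size, i;
    double h, x, partial = 0.0, pi = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    h = 1.0 / n;
    for (i = rank; i < n; i += size) {   /* each rank takes every size-th interval */
        x = (i + 0.5) * h;
        partial += 4.0 / (1.0 + x * x);
    }
    partial *= h;

    MPI_Reduce(&partial, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("pi is approximately %.12f\n", pi);

    MPI_Finalize();
    return 0;
}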

42
Broadcast
  • Duplicates data from root process to other
    processes in communicator

(Diagram: before the broadcast only the root holds
A; after the broadcast every rank (0-3) holds A.)
43
Broadcast syntax
  • Fortran
  • INTEGER count, datatype, root, comm, error
  • CALL MPI_BCAST(buffer, count, datatype, root,
    comm, error)
  • C
  • MPI_Bcast(void *buffer, int count, MPI_Datatype
    datatype, int root, MPI_Comm comm)
  • C++
  • Comm.Bcast(void *buffer, int count, const
    MPI::Datatype &datatype, int root)
  • E.g. broadcasting 10 integers from rank 0
  • int tenints[10]
  • MPI::COMM_WORLD.Bcast(tenints, 10, MPI::INT, 0)

44
Scatter
  • Distributes data from root process amongst
    processors within communicator.
(Diagram: the root holds A, B, C, D; after the
scatter ranks 0-3 each hold one of them.)
45
Scatter syntax
  • scount (and rcount) is the number of elements
    each process is sent (i.e. the number received)
  • Fortran
  • INTEGER scount, stype, rcount, rtype, root, comm,
    error
  • CALL MPI_SCATTER(sbuf, scount, stype, rbuf,
    rcount, rtype, root, comm, error)
  • C
  • MPI_Scatter(void *sbuf, int scount, MPI_Datatype
    stype, void *rbuf, int rcount, MPI_Datatype
    rtype, int root, MPI_Comm comm)
  • C++
  • Comm.Scatter(const void *sbuf, int scount, const
    MPI::Datatype &stype, void *rbuf, int rcount,
    const MPI::Datatype &rtype, int root)
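
For example, a hedged C sketch in which rank 0 scatters
a table so that every rank, including the root,
receives CHUNK elements; the chunk size and fill values
are assumptions of the example:

#include <mpi.h>
#include <stdlib.h>

#define CHUNK 4

void scatter_table(int rank, int size)
{
    double *table = NULL;
    double local[CHUNK];

    if (rank == 0) {
        table = malloc(size * CHUNK * sizeof(double));
        for (int i = 0; i < size * CHUNK; i++)
            table[i] = (double)i;          /* fill with sample data */
    }

    /* Every rank (including the root) receives CHUNK elements. */
    MPI_Scatter(table, CHUNK, MPI_DOUBLE,
                local, CHUNK, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    free(table);                           /* free(NULL) is a no-op */
}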

46
Gather
  • Collects data distributed amongst processes in
    the communicator onto the root process
    (collection is done in rank order).

(Diagram: ranks 0-3 each hold one element (A, B, C,
D); after the gather the root holds all four in
rank order.)
47
Gather syntax
  • Takes same arguments as Scatter operation
  • Fortran
  • INTEGER scount, stype, rcount, rtype, root, comm,
    error
  • CALL MPI_GATHER(sbuf, scount, stype, rbuf,
    rcount, rtype, root, comm, error)
  • C
  • MPI_Gather(void *sbuf, int scount, MPI_Datatype
    stype, void *rbuf, int rcount, MPI_Datatype
    rtype, int root, MPI_Comm comm)
  • C++
  • Comm.Gather(const void *sbuf, int scount, const
    MPI::Datatype &stype, void *rbuf, int rcount,
    const MPI::Datatype &rtype, int root)

48
All Gather
  • Collects all data on all processes in communicator

(Diagram: ranks 0-3 each hold one element (A, B, C,
D); after the allgather every rank holds all four.)
49
All Gather syntax
  • As Gather but no root is defined.
  • Fortran
  • INTEGER scount, stype, rcount, rtype, comm, error
  • CALL MPI_ALLGATHER(sbuf, scount, stype, rbuf,
    rcount, rtype, comm, error)
  • C
  • MPI_Allgather(void *sbuf, int scount, MPI_Datatype
    stype, void *rbuf, int rcount, MPI_Datatype
    rtype, MPI_Comm comm)
  • C++
  • Comm.Allgather(const void *sbuf, int scount, const
    MPI::Datatype &stype, void *rbuf, int rcount,
    const MPI::Datatype &rtype)

50
MPI_Scan
  • Performs a partial reduction
  • E.g. partial sums

(Diagram: rank 0 holds A, rank 1 holds AoD, rank 2
holds AoDoG, i.e. the running reduction up to and
including that rank.)
51
MPI_Scan syntax
  • Fortran
  • INTEGER count, rtype, op, comm, error
  • CALL MPI_SCAN(sbuf, rbuf, count, rtype, op, comm,
    error)
  • C
  • MPI_Scan(void *sbuf, void *rbuf, int count,
    MPI_Datatype datatype, MPI_Op op, MPI_Comm comm)
  • C++
  • Comm.Scan(const void *sbuf, void *recvbuf, int
    count, const MPI::Datatype &datatype, const
    MPI::Op &op)
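
A small C sketch (illustrative only): each rank
contributes its rank number, so after the scan rank r
holds 0 + 1 + ... + r.

#include <mpi.h>
#include <stdio.h>

void partial_sum_of_ranks(void)
{
    int rank, partial;

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Scan(&rank, &partial, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
    printf("Rank %d: partial sum = %d\n", rank, partial);
}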

52
Practice Session 4 diffusion example
  • Arrange processes to communicate round a ring.
  • Each process stores a copy of its rank in an
    integer variable.
  • Each process communicates this value to its right
    neighbour and receives a value from its left
    neighbour.
  • Each process computes the sum of all the values
    received.
  • Repeat for the number of processes involved and
    print out the sum stored at each process.
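
One possible C solution sketch (an assumption; the
intended exercise structure may differ), using
MPI_Sendrecv so the shift around the ring cannot
deadlock:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size, left, right, sendval, recvval, sum;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    right = (rank + 1) % size;
    left  = (rank - 1 + size) % size;

    sum = rank;                    /* start from this rank's own value */
    sendval = rank;
    for (int step = 0; step < size - 1; step++) {
        /* Pass to the right neighbour, receive from the left neighbour. */
        MPI_Sendrecv(&sendval, 1, MPI_INT, right, 0,
                     &recvval, 1, MPI_INT, left,  0,
                     MPI_COMM_WORLD, &status);
        sum += recvval;
        sendval = recvval;         /* forward the received value next step */
    }

    printf("Rank %d: sum of all ranks = %d\n", rank, sum);
    MPI_Finalize();
    return 0;
}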

53
Generating Cartesian Topologies
  • MPI_Cart_create
  • Makes a new communicator to which topology
    information has been attached
  • MPI_Cart_coords
  • Determines process coords in cartesian topology
    given rank in group
  • MPI_Cart_shift
  • Returns the shifted source and destination ranks,
    given a shift direction and amount

54
MPI_Cart_create syntax
  • Fortran
  • INTEGER comm_old, ndims, dims(*), comm_cart,
    ierror
  • LOGICAL periods(*), reorder
  • CALL MPI_CART_CREATE(comm_old, ndims, dims,
    periods, reorder, comm_cart, ierror)
  • C
  • MPI_Cart_create(MPI_Comm comm_old, int ndims, int
    *dims, int *periods, int reorder, MPI_Comm
    *comm_cart)
  • C++
  • MPI::Intracomm::Create_cart(int ndims, const int
    dims[], const bool periods[], bool reorder)

55
MPI_Cart_coords syntax
  • Fortran
  • CALL MPI_CART_COORDS(INTEGER COMM, INTEGER
    RANK, INTEGER MAXDIMS, INTEGER COORDS(*), INTEGER
    IERROR)
  • C
  • int MPI_Cart_coords(MPI_Comm comm, int rank, int
    maxdims, int *coords)
  • C++
  • void MPI::Cartcomm::Get_coords(int rank, int
    maxdims, int coords[]) const

56
MPI_Cart_shift syntax
  • Fortran
  • MPI_CART_SHIFT(INTEGER COMM, INTEGER
    DIRECTION, INTEGER DISP, INTEGER
    RANK_SOURCE, INTEGER RANK_DEST, INTEGER IERROR)
  • C
  • int MPI_Cart_shift(MPI_Comm comm, int
    direction, int disp, int *rank_source, int
    *rank_dest)
  • C++
  • void MPI::Cartcomm::Shift(int direction, int
    disp, int &rank_source, int &rank_dest) const
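
The three routines fit together as in this hedged C
sketch: a 1-D periodic (ring) topology over all
processes, this rank's coordinate, and its two
neighbours.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int size, rank, coords[1], left, right;
    int dims[1], periods[1] = { 1 };        /* periodic: a ring */
    MPI_Comm ring;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    dims[0] = size;
    MPI_Cart_create(MPI_COMM_WORLD, 1, dims, periods, 1, &ring);

    MPI_Comm_rank(ring, &rank);
    MPI_Cart_coords(ring, rank, 1, coords);

    /* Shift along dimension 0 by +1: left is the source, right the destination. */
    MPI_Cart_shift(ring, 0, 1, &left, &right);

    printf("Rank %d (coord %d): left=%d right=%d\n",
           rank, coords[0], left, right);

    MPI_Comm_free(&ring);
    MPI_Finalize();
    return 0;
}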

57
Topologies Examples
  • See Diffusion example
  • See cartesian example

58
Examples for Parallel Programming
  • Master/slave
  • E.g. share work example
  • E.g. Ising model
  • Communicating Sequential Elements Pattern
  • Poisson equation
  • Highly coupled processes
  • Systolic loop algorithm
  • E.g. md example

59
Poisson Solver Using Jacobi Iteration
  • Communicating Sequential Elements Pattern
  • Operations in each component depend on partial
    results in neighbour components.

(Diagram: two rows of slave threads; neighbouring
slaves exchange data with each other.)
60
Layered Decomposition of 2d Array
  • Distribute 2d array across processors
  • Processors store all columns
  • Rows allocated amongst processors
  • Each proc has left proc and right proc
  • Each proc has max and min vertex that it stores
  • U(i,j)_new = ( U(i+1,j) + U(i-1,j) + U(i,j+1) +
    U(i,j-1) ) / 4
  • Each proc has a ghost layer
  • Used in calculation of update (see above)
  • Obtained from neighbouring left and right
    processors
  • Pass top and bottom layers to neighbouring
    processors
  • These become the neighbours' ghost layers
  • Distribute rows over processors: N/nproc rows per
    proc
  • Every processor stores all N columns
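
A hedged C sketch of the ghost-layer exchange, assuming
each process stores nlocal+2 rows of N+2 doubles with
rows 0 and nlocal+1 as ghost layers, and that up/down
are the neighbour ranks (MPI_PROC_NULL at the ends of
the decomposition):

#include <mpi.h>

void exchange_ghost_rows(double *u, int nlocal, int N,
                         int up, int down, MPI_Comm comm)
{
    int rowlen = N + 2;
    MPI_Status status;

    /* Send my top interior row up, receive my bottom ghost row from below. */
    MPI_Sendrecv(&u[1 * rowlen],            rowlen, MPI_DOUBLE, up,   0,
                 &u[(nlocal + 1) * rowlen], rowlen, MPI_DOUBLE, down, 0,
                 comm, &status);

    /* Send my bottom interior row down, receive my top ghost row from above. */
    MPI_Sendrecv(&u[nlocal * rowlen],       rowlen, MPI_DOUBLE, down, 1,
                 &u[0 * rowlen],            rowlen, MPI_DOUBLE, up,   1,
                 comm, &status);
}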

61
(Diagram: rows 1 to N+1 of the grid divided among
processors 1-4; pKmin/pKmax mark the row range held
by processor K. Each processor sends its top and
bottom layers to its neighbours and receives their
boundary layers into its ghost rows.)
62
Master Slave
(Diagram: a master thread exchanges data with
several slave threads.)
  • A computation is required where independent
    computations are performed, perhaps repeatedly,
    on all elements of some ordered data.
  • Example
  • Image processing: perform computation on
    different sets of pixels within an image

63
Highly Coupled Efficient Element Exchange
  • Highly Coupled Efficient Element Exchange using
    Systolic loop techniques
  • Extreme example of Communicating Sequential
    Elements Pattern

64
Systolic Loop
  • Distribute Elements Over Processors
  • Three buffers
  • Local elements
  • Travelling elements (local elements at start)
  • Send buffer
  • Loop over number of processors
  • Transfer travelling elements
  • Interleave send/receive to prevent deadlock
  • Send contents of send buffer to next proc
  • Receive buffer from previous proc to travelling
    elements
  • Point travelling elements to send buffer
  • Allow local elements to interact with travelling
    elements
  • Accumulate reduced computations over processors
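
A skeleton of this loop in C (a sketch under
assumptions: elem_t, the interact() routine and the
fixed buffer size are hypothetical placeholders, not
names from the course code):

#include <mpi.h>
#include <string.h>

typedef struct { double x, y, z; } elem_t;    /* hypothetical element type */

/* Hypothetical interaction between the local block and a travelling block. */
static void interact(elem_t *local, elem_t *moving, int n)
{
    (void)local; (void)moving; (void)n;        /* application-specific work */
}

void systolic_loop(elem_t *local, int n, MPI_Comm comm)
{
    int rank, size;
    MPI_Status status;
    elem_t moving[256], incoming[256];         /* assumes n <= 256 */

    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    memcpy(moving, local, n * sizeof(elem_t)); /* travelling = local at start */
    interact(local, moving, n);                /* local-local interactions */

    for (int step = 0; step < size - 1; step++) {
        /* Transfer travelling elements: the interleaved send/receive of
         * MPI_Sendrecv prevents deadlock on the ring. */
        MPI_Sendrecv(moving,   n * (int)sizeof(elem_t), MPI_BYTE,
                     (rank + 1) % size, 0,
                     incoming, n * (int)sizeof(elem_t), MPI_BYTE,
                     (rank - 1 + size) % size, 0, comm, &status);
        memcpy(moving, incoming, n * sizeof(elem_t));

        interact(local, moving, n);            /* local meets new travellers */
    }
    /* Partial results accumulated inside interact() can be combined across
     * processors afterwards, e.g. with MPI_Reduce. */
}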

65
Systolic Loop Element Pump
First cycle of 3 for 4 processor systolic loop
(Diagram: procs 1-4 each hold their local elements
plus a moving-element buffer received from the
previous proc in the ring: proc 1 from 4, proc 2
from 1, proc 3 from 2, proc 4 from 3.)
66
Practice Sessions 5 and 6
  • Defining and Using Processor Topologies
  • Patterns for parallel computing

67
Further Information
  • All MPI routines have a UNIX man page
  • Use the C-style name for Fortran/C/C++
  • E.g. man MPI_Finalize will give correct syntax
    and information for the Fortran, C and C++ calls.
  • Designing and Building Parallel Programs (Ian
    Foster)
  • http://www-unix.mcs.anl.gov/dbpp/
  • Standard documents
  • http://www.mpi-forum.org/
  • Many books and information on the web.
  • EPCC documents.