Transcript and Presenter's Notes

Title: High Performance Parallel Programming


1
High Performance Parallel Programming
  • Dirk van der Knijff
  • Advanced Research Computing
  • Information Division

2
High Performance Parallel Programming
  • Lecture 4: Message Passing Interface 3

3
So Far..
  • Messages
  • source, dest, data, tag, communicator
  • Communicators
  • MPI_COMM_WORLD
  • Point-to-point communications
  • different modes - standard, synchronous,
    buffered, ready
  • blocking vs non-blocking
  • Derived datatypes
  • construct then commit

4
Ping-pong exercise program
  • /*
  • This file has been written as a sample
    solution to an exercise in a
  • course given at the Edinburgh Parallel
    Computing Centre. It is made
  • freely available with the understanding that
    every copy of this file
  • must include this header and that EPCC takes
    no responsibility for
  • the use of the enclosed teaching material.
  • Authors: Joel Malard, Alan Simpson
  • Contact: epcc-tec@epcc.ed.ac.uk
  • Purpose: A program to experiment with
    point-to-point communications.
  • Contents: C source code.
  • */

5
  • #include <stdio.h>
  • #include <mpi.h>
  • #define proc_A 0
  • #define proc_B 1
  • #define ping 101
  • #define pong 101
  • float buffer[100000];
  • long float_size;
  • void processor_A (void), processor_B (void);
  • void main ( int argc, char *argv[] )
  • {
  • int ierror, rank, size;
  • extern long float_size;
  • MPI_Init(&argc, &argv);
  • MPI_Type_extent(MPI_FLOAT, &float_size);
  • MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  • if (rank == proc_A)
  • processor_A();
  • else if (rank == proc_B)
  • processor_B();
  • MPI_Finalize();
  • }

6
  • void processor_A( void )
  • {
  • int i, length, ierror;
  • MPI_Status status;
  • double start, finish, time;
  • extern float buffer[100000];
  • extern long float_size;
  • printf("Length\tTotal Time\tTransfer Rate\n");
  • for (length = 1; length <= 100000; length += 1000) {
  • start = MPI_Wtime();
  • for (i = 1; i <= 100; i++) {
  • MPI_Ssend(buffer, length, MPI_FLOAT, proc_B, ping, MPI_COMM_WORLD);
  • MPI_Recv(buffer, length, MPI_FLOAT, proc_B, pong, MPI_COMM_WORLD, &status);
  • }
  • finish = MPI_Wtime();
  • time = finish - start;
  • /* time/200. is the time per one-way message (100 iterations, 2 messages each) */
  • printf("%d\t%f\t%f\n", length, time/200., (float)(length*float_size)/(time/200.));
  • }
  • }

7
  • void processor_B( void )
  • {
  • int i, length, ierror;
  • MPI_Status status;
  • extern float buffer[100000];
  • for (length = 1; length <= 100000; length += 1000) {
  • for (i = 1; i <= 100; i++) {
  • MPI_Recv(buffer, length, MPI_FLOAT, proc_A, ping, MPI_COMM_WORLD, &status);
  • MPI_Ssend(buffer, length, MPI_FLOAT, proc_A, pong, MPI_COMM_WORLD);
  • }
  • }
  • }

8
Ping-pong exercise results
9
Ping-pong exercise results 2
10
Running ping-pong
  • compile
  • mpicc ping_pong.c -o ping_pong
  • submit
  • qsub ping_pong.sh
  • where ping_pong.sh is
  • #PBS -q exclusive
  • #PBS -l nodes=2
  • cd <your sub_directory>
  • mpirun ping_pong

11
Collective communication
  • Communications involving a group of processes
  • Called by all processes in a communicator
  • for sub-groups you need to form a new
    communicator (see the MPI_Comm_split sketch below)
  • Examples
  • Barrier synchronisation
  • Broadcast, Scatter, Gather
  • Global sum, Global maximum, etc.
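
The lecture does not show how to form a sub-group communicator, so here is a minimal sketch assuming MPI_Comm_split with an illustrative even/odd colouring; the colour and key values are not from the slides.

  #include <mpi.h>
  #include <stdio.h>

  int main( int argc, char *argv[] )
  {
      int rank, subrank, sum;
      MPI_Comm subcomm;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      /* colour 0 = even ranks, colour 1 = odd ranks; key orders ranks in each group */
      MPI_Comm_split(MPI_COMM_WORLD, rank % 2, rank, &subcomm);
      MPI_Comm_rank(subcomm, &subrank);

      /* this collective involves only the processes in subcomm */
      MPI_Reduce(&rank, &sum, 1, MPI_INT, MPI_SUM, 0, subcomm);
      if (subrank == 0)
          printf("sum of world ranks in my sub-group = %d\n", sum);

      MPI_Comm_free(&subcomm);
      MPI_Finalize();
      return 0;
  }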

12
Characteristics
  • Collective action over a communicator
  • All processes must communicate
  • Synchronisation may or may not occur
  • All collective operations are blocking
  • No tags
  • Receive buffers must be exactly the right size
  • Collective communications and point-to-point
    communications cannot interfere

13
MPI_Barrier
  • Blocks each calling process until all other
    members have also called it.
  • Generally used to synchronise between phases of a
    program
  • Only one argument - no data is exchanged
  • MPI_Barrier(comm) - see the sketch below
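
A minimal sketch of the usage described above; the two "phases" are just illustrative printf calls, not code from the lecture.

  #include <mpi.h>
  #include <stdio.h>

  int main( int argc, char *argv[] )
  {
      int rank;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      printf("rank %d finished phase 1\n", rank);   /* phase 1 work */

      /* no process continues until every process has reached this point */
      MPI_Barrier(MPI_COMM_WORLD);

      printf("rank %d starting phase 2\n", rank);   /* phase 2 work */

      MPI_Finalize();
      return 0;
  }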

14
Broadcast
  • Copies data from a specified root process to all
    other processes in communicator
  • all processes must specify the same root
  • other arguments same as for point-to-point
  • datatypes and sizes must match
  • MPI_Bcast(buffer, count, datatype, root, comm)
    - see the sketch below
  • Note: MPI does not support a multicast function
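
A minimal sketch of an MPI_Bcast call, assuming an illustrative parameter dt known only on the root initially; not code from the lecture.

  #include <mpi.h>
  #include <stdio.h>

  int main( int argc, char *argv[] )
  {
      int rank;
      double dt = 0.0;    /* illustrative parameter */

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      if (rank == 0)
          dt = 0.01;      /* only the root has the value initially */

      /* every process calls MPI_Bcast with the same root, count and datatype */
      MPI_Bcast(&dt, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);

      printf("rank %d now has dt = %f\n", rank, dt);

      MPI_Finalize();
      return 0;
  }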

15
Scatter, Gather
  • Scatter and Gather are inverse operations
  • Note that all processes partake - even root
  • Scatter

16
Gather
  • Gather

17
MPI_Scatter, MPI_Gather
  • MPI_Scatter(sendbuf, sendcount,
    sendtype, recvbuf, recvcount, recvtype, root,
    comm)
  • MPI_Gather(sendbuf, sendcount,
    sendtype, recvbuf, recvcount, recvtype, root,
    comm)
  • sendcount in scatter and recvcount in
    gather refer to the size of each individual
    message (a scatter sketch follows below)
  • (sendtype = recvtype => sendcount = recvcount)
  • total type signatures must match
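
A minimal sketch of MPI_Scatter to complement the MPI_Gather example on the next slide; the block size of 4 ints per process is an arbitrary choice, not from the lecture.

  #include <mpi.h>
  #include <stdio.h>
  #include <stdlib.h>

  int main( int argc, char *argv[] )
  {
      int rank, gsize, i;
      int *sendbuf = NULL;
      int recvbuf[4];

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &gsize);

      if (rank == 0) {
          /* only the root needs the full send buffer */
          sendbuf = (int *)malloc(gsize * 4 * sizeof(int));
          for (i = 0; i < gsize * 4; i++)
              sendbuf[i] = i;
      }

      /* sendcount and recvcount are both the per-process message size */
      MPI_Scatter(sendbuf, 4, MPI_INT, recvbuf, 4, MPI_INT, 0, MPI_COMM_WORLD);

      printf("rank %d received %d %d %d %d\n", rank,
             recvbuf[0], recvbuf[1], recvbuf[2], recvbuf[3]);

      if (rank == 0)
          free(sendbuf);
      MPI_Finalize();
      return 0;
  }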

18
Example
  • MPI_Comm comm;
  • int gsize, sendarray[100];
  • int root, myrank, *rbuf;
  • MPI_Datatype rtype;
  • ...
  • MPI_Comm_rank(comm, &myrank);
  • MPI_Comm_size(comm, &gsize);
  • MPI_Type_contiguous(100, MPI_INT, &rtype);
  • MPI_Type_commit(&rtype);
  • if (myrank == root)
  • rbuf = (int *)malloc(gsize*100*sizeof(int));
  • MPI_Gather(sendarray, 100, MPI_INT, rbuf, 1,
    rtype, root, comm);

19
More routines
  • MPI_Allgather(sendbuf, sendcount, sendtype,
    recvbuf, recvcount, recvtype, comm)
  • like MPI_Gather, but every process receives the
    full result (see the sketch after this list)
  • MPI_Alltoall(sendbuf, sendcount, sendtype,
    recvbuf, recvcount, recvtype, comm)
  • every process sends a distinct block of data to
    every other process
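
A minimal sketch of MPI_Allgather, assuming one int per process and at most 64 processes; not code from the lecture.

  #include <mpi.h>
  #include <stdio.h>

  int main( int argc, char *argv[] )
  {
      int rank, gsize, i;
      int allranks[64];    /* assumes at most 64 processes */

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &gsize);

      /* one int from each process ends up, in rank order, on every process */
      MPI_Allgather(&rank, 1, MPI_INT, allranks, 1, MPI_INT, MPI_COMM_WORLD);

      if (rank == 0)
          for (i = 0; i < gsize; i++)
              printf("allranks[%d] = %d\n", i, allranks[i]);

      MPI_Finalize();
      return 0;
  }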

20
Vector routines
  • MPI_Scatterv(sendbuf, sendcount, displs,
    sendtype, recvbuf, recvcount, recvtype, root,
    comm)
  • MPI_Gatherv(sendbuf, sendcount, sendtype,
    recvbuf, recvcount, displs, recvtype, root, comm)
  • MPI_Allgatherv(sendbuf, sendcount, sendtype,
    recvbuf, recvcount, displs, recvtype, comm)
  • MPI_Alltoallv(sendbuf, sendcount, sdispls,
    sendtype, recvbuf, recvcount, rdispls, recvtype,
    comm)
  • Allow send/recv to be from/to non-contiguous
    locations in an array
  • Useful if sending different counts at different
    times (see the MPI_Scatterv sketch below)
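
A minimal sketch of MPI_Scatterv with a different count for each process; the counts (rank + 1 ints each) and the 16-process limit are illustrative assumptions, not from the lecture.

  #include <mpi.h>
  #include <stdio.h>
  #include <stdlib.h>

  int main( int argc, char *argv[] )
  {
      int rank, gsize, i, total;
      int *sendbuf = NULL, *sendcounts = NULL, *displs = NULL;
      int recvbuf[16];                  /* assumes at most 16 processes */
      int recvcount;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &gsize);

      recvcount = rank + 1;             /* process i receives i+1 ints */

      if (rank == 0) {
          sendcounts = (int *)malloc(gsize * sizeof(int));
          displs     = (int *)malloc(gsize * sizeof(int));
          total = 0;
          for (i = 0; i < gsize; i++) {
              sendcounts[i] = i + 1;    /* a different count for each process */
              displs[i]     = total;    /* where that process's block starts  */
              total        += sendcounts[i];
          }
          sendbuf = (int *)malloc(total * sizeof(int));
          for (i = 0; i < total; i++)
              sendbuf[i] = i;
      }

      MPI_Scatterv(sendbuf, sendcounts, displs, MPI_INT,
                   recvbuf, recvcount, MPI_INT, 0, MPI_COMM_WORLD);

      printf("rank %d received %d ints starting with %d\n",
             rank, recvcount, recvbuf[0]);

      if (rank == 0) {
          free(sendbuf);
          free(sendcounts);
          free(displs);
      }
      MPI_Finalize();
      return 0;
  }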

21
Global reduction routines
  • Used to compute a result which depends on data
    distributed over a number of processes
  • Examples
  • global sum or product
  • global maximum or minimum
  • global user-defined operation
  • Operation should be associative
  • aside: remember that floating-point operations
    are technically not associative, but we usually
    don't care - it can affect results in parallel
    programs though

22
Global reduction (cont.)
  • MPI_Reduce(sendbuf, recvbuf, count, datatype, op,
    root, comm)
  • combines count elements from each sendbuf using
    op and leaves results in recvbuf on process root
  • e.g.
  • MPI_Reduce(s, r, 2, MPI_INT, MPI_SUM, 1, comm)

[Diagram: each of the five processes holds two integers in its send
buffer s; after the reduce with MPI_SUM and root = 1, the receive
buffer r on process 1 contains the element-wise sums, here (8, 9).
A runnable sketch of this call follows below.]
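
A runnable sketch of the MPI_Reduce call above; the send-buffer values are illustrative, not the exact numbers from the slide's diagram.

  #include <mpi.h>
  #include <stdio.h>

  int main( int argc, char *argv[] )
  {
      int rank;
      int s[2], r[2];

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      s[0] = rank;          /* first element contributed by this process  */
      s[1] = 2 * rank;      /* second element contributed by this process */

      /* combine 2 elements from every sendbuf with MPI_SUM onto root 1 */
      MPI_Reduce(s, r, 2, MPI_INT, MPI_SUM, 1, MPI_COMM_WORLD);

      if (rank == 1)
          printf("r = (%d, %d)\n", r[0], r[1]);

      MPI_Finalize();
      return 0;
  }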
23
Reduction operators
  • MPI_MAX Maximum
  • MPI_MIN Minimum
  • MPI_SUM Sum
  • MPI_PROD Product
  • MPI_LAND Logical AND
  • MPI_BAND Bitwise AND
  • MPI_LOR Logical OR
  • MPI_BOR Bitwise OR
  • MPI_LXOR Logical XOR
  • MPI_BXOR Bitwise XOR
  • MPI_MAXLOC Max value and location
  • MPI_MINLOC Min value and location

24
User-defined operators
  • In C the operator is defined as a function of
    type
  • typedef void MPI_User_function(void *invec,
    void *inoutvec, int *len, MPI_Datatype
    *datatype);
  • In Fortran you must write a function as
  • function <user_function>(invec(*),
    inoutvec(*), len, type)
  • where the function has the following schema
  • for (i = 1 to len)
  • inoutvec(i) = inoutvec(i) op invec(i)
  • Then
  • MPI_Op_create(user_function, commute, &op)
  • returns a handle op of type MPI_Op (see the
    sketch below)
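
A minimal sketch of MPI_Op_create in C, assuming an illustrative complex-number sum; the complex_t type and complex_sum function are examples, not from the lecture.

  #include <mpi.h>
  #include <stdio.h>

  typedef struct { double re, im; } complex_t;

  /* must have the MPI_User_function signature given above */
  void complex_sum( void *invec, void *inoutvec, int *len, MPI_Datatype *datatype )
  {
      complex_t *in    = (complex_t *)invec;
      complex_t *inout = (complex_t *)inoutvec;
      int i;
      for (i = 0; i < *len; i++) {
          /* inoutvec(i) = inoutvec(i) op invec(i) */
          inout[i].re += in[i].re;
          inout[i].im += in[i].im;
      }
  }

  int main( int argc, char *argv[] )
  {
      int rank;
      complex_t z, total;
      MPI_Datatype ctype;
      MPI_Op csum;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      z.re = rank;
      z.im = 1.0;

      /* describe the pair of doubles to MPI and register the operator */
      MPI_Type_contiguous(2, MPI_DOUBLE, &ctype);
      MPI_Type_commit(&ctype);
      MPI_Op_create(complex_sum, 1 /* commutative */, &csum);

      MPI_Reduce(&z, &total, 1, ctype, csum, 0, MPI_COMM_WORLD);
      if (rank == 0)
          printf("total = %f + %fi\n", total.re, total.im);

      MPI_Op_free(&csum);
      MPI_Type_free(&ctype);
      MPI_Finalize();
      return 0;
  }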

25
Variants
  • MPI_Allreduce(sendbuf, recvbuf, count, datatype,
    op, comm)
  • All processes involved receive identical results
  • MPI_Reduce_scatter(sendbuf, recvbuf, recvcounts,
    datatype, op, comm)
  • Acts as if a reduce was performed and then each
    process receives recvcounts[myrank] elements of
    the result.

26
Reduce-scatter
  • int s[5], r[5];
  • int rc[5] = { 1, 2, 0, 1, 1 };
  • int rank, gsize;
  • ...
  • MPI_Reduce_scatter(s, r, rc, MPI_INT, MPI_SUM,
    comm);

27
Scan
  • MPI_Scan(sendbuf, recvbuf, count, datatype, op,
    comm)
  • Performs a prefix reduction on data across group
  • recvbuf on process myrank = sendbuf(0) op
    sendbuf(1) op ... op sendbuf(myrank)
  • MPI_Scan(s, r, 5, MPI_INT, MPI_SUM, comm)
    (see the sketch below)
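
A minimal sketch of MPI_Scan computing a prefix sum of ranks; the single-int payload is an illustrative choice, not from the lecture.

  #include <mpi.h>
  #include <stdio.h>

  int main( int argc, char *argv[] )
  {
      int rank, prefix;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      /* process i receives rank(0) + rank(1) + ... + rank(i) */
      MPI_Scan(&rank, &prefix, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

      printf("rank %d: prefix sum of ranks = %d\n", rank, prefix);

      MPI_Finalize();
      return 0;
  }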

28
Further topics
  • Error-handling
  • Errors are handled by an error handler
  • MPI_ERRORS_ARE_FATAL - default for MPI_COMM_WORLD
  • MPI_ERRORS_RETURN - the call returns an error
    code, but the MPI state afterwards is undefined
  • MPI_Error_string(errorcode, string, resultlen)
  • Message probing
  • Messages can be probed before being received
    (see the MPI_Probe sketch after this list)
  • Note - wildcard reads may receive a different
    message
  • blocking and non-blocking
  • Persistent communications
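
A minimal sketch of message probing, assuming at least two processes; the message size of 37 ints is arbitrary, not from the lecture.

  #include <mpi.h>
  #include <stdio.h>
  #include <stdlib.h>

  int main( int argc, char *argv[] )
  {
      int rank, count;
      int *buf;
      MPI_Status status;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      if (rank == 0) {
          int data[37] = {0};           /* arbitrary message size */
          MPI_Send(data, 37, MPI_INT, 1, 0, MPI_COMM_WORLD);
      } else if (rank == 1) {
          /* wait for a matching message without receiving it */
          MPI_Probe(0, 0, MPI_COMM_WORLD, &status);
          /* find out how big it is, then allocate a buffer of the right size */
          MPI_Get_count(&status, MPI_INT, &count);
          buf = (int *)malloc(count * sizeof(int));
          MPI_Recv(buf, count, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
          printf("received %d ints\n", count);
          free(buf);
      }

      MPI_Finalize();
      return 0;
  }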

29
Assignment 2.
  • Write a general procedure to multiply 2 matrices.
  • Start with
  • http://www.hpc.unimelb.edu.au/cs/assignment2/
  • This is a harness for last year's assignment
  • Last year I asked them to optimise first
  • This year just parallelize
  • Next Tuesday I will discuss strategies
  • That doesn't mean don't start now
  • Ideas available in various places

30
High Performance Parallel Programming
  • Tomorrow - matrix multiplication