Introduction to Parallel Computing
1
Introduction to Parallel Computing
  • Part IIb

2
What is MPI?
  • Message Passing Interface (MPI) is a standardised interface. Using this interface, several implementations have been made.
  • The MPI standard specifies three forms of subroutine interfaces:
  • Language-independent notation
  • Fortran notation
  • C notation

3
MPI Features
  • MPI implementations provide
  • Abstraction of hardware implementation
  • Synchronous communication
  • Asynchronous communication
  • File operations
  • Time measurement operations

4
Implementations
  • (Table on the slide; the implementations mentioned in the deck include MPICH, MPICH-T3E, LAM and CHIMP, on Unix / Windows NT and Cray T3E platforms.)
5
Programming with MPI
  • What is the difference between programming using the traditional approach and the MPI approach?
  • Use of the MPI library
  • Compiling
  • Running

6
Compiling (1)
  • When a program is written, compiling it is done a little differently from the normal situation. Although details differ between MPI implementations, there are two frequently used approaches.

7
Compiling (2)
  • First approach: compile with a normal C compiler and link the MPI library explicitly
  • Second approach: use the compiler wrapper provided by the MPI implementation

gcc myprogram.c -o myexecutable -lmpi
mpicc myprogram.c -o myexecutable
8
Running (1)
  • In order to run an MPI-enabled application we should generally use the command mpirun,
  • where x is the number of processes to use, and <parameters> are the arguments to the executable, if any.

mpirun -np x myexecutable <parameters>
9
Running (2)
  • The mpirun program will take care of the creation of processes on the selected processors. By default, mpirun decides which processors to use; this is usually determined by a global configuration file. It is possible to specify processors, but the specification may only be used as a hint.
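  • For example, with the classic MPICH implementation the hosts can be listed in a machine file and passed on the command line (the option name and the file name myhosts below are illustrative and differ per implementation):

mpirun -np 4 -machinefile myhosts myexecutable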

10
MPI Programming (1)
  • Implementations of MPI support Fortran, C, or both. Here we only consider programming using the C libraries. The first step in writing a program using MPI is to include the correct header:

#include "mpi.h"
11
MPI Programming (2)
#include "mpi.h"

int main (int argc, char *argv[])
{
  MPI_Init (&argc, &argv);
  // ...
  MPI_Finalize ();
  return 0;
}
12
MPI_Init
  • int MPI_Init (int *argc, char ***argv)
  • The MPI_Init procedure should be called before any other MPI procedure (except MPI_Initialized). It must be called exactly once, at program initialisation. It removes the arguments that are used by MPI from the argument array.

13
MPI_Finalize
  • int MPI_Finalize (void)
  • This routine cleans up all MPI state. It should be the last MPI routine to be called in a program; no other MPI routine may be called after MPI_Finalize. Pending communication should be finished before finalisation.

14
Using multiple processes
  • When running an MPI-enabled program using multiple processes, each process runs an identical copy of the program, so there must be a way to know which process we are. This situation is comparable to programming with the fork statement. MPI defines two subroutines that can be used for this.

15
MPI_Comm_size
  • int MPI_Comm_size (MPI_Comm comm, int *size)
  • This call returns the number of processes involved in a communicator. To find out how many processes are used in total, call this function with the predefined global communicator MPI_COMM_WORLD.

16
MPI_Comm_rank
  • int MPI_Comm_rank (MPI_Comm comm, int *rank)
  • This procedure determines the rank (index) of the calling process in the communicator. Each process is assigned a unique number within a communicator.

17
MPI_COMM_WORLD
  • MPI communicators are used to specify the processes that communication applies to. A communicator is shared by a group of processes. The predefined MPI_COMM_WORLD applies to all processes. Communicators can be duplicated, created and deleted. For most applications, use of MPI_COMM_WORLD suffices.
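  • As an aside, a minimal sketch of duplicating and freeing a communicator (not from the slides; it only uses the standard MPI_Comm_dup and MPI_Comm_free calls):

#include "mpi.h"

int main (int argc, char *argv[])
{
  MPI_Comm mycomm;                          // private copy of the world communicator
  MPI_Init (&argc, &argv);
  MPI_Comm_dup (MPI_COMM_WORLD, &mycomm);   // same group of processes, separate context
  // ... communication on mycomm cannot interfere with traffic on MPI_COMM_WORLD ...
  MPI_Comm_free (&mycomm);                  // delete the duplicate again
  MPI_Finalize ();
  return 0;
}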

18
Example Hello World!
#include <stdio.h>
#include "mpi.h"

int main (int argc, char *argv[])
{
  int size, rank;
  MPI_Init (&argc, &argv);
  MPI_Comm_size (MPI_COMM_WORLD, &size);
  MPI_Comm_rank (MPI_COMM_WORLD, &rank);
  printf ("Hello world! from processor (%d/%d)\n", rank + 1, size);
  MPI_Finalize ();
  return 0;
}

19
Running Hello World!
  • mpicc -o hello hello.c
  • mpirun -np 3 hello
  • Hello world! from processor (1/3)
  • Hello world! from processor (2/3)
  • Hello world! from processor (3/3)
  • _

20
MPI_Send
  • int MPI_Send (void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm)
  • Synchronously sends a message to dest. Data is found in buf, which contains count elements of datatype. To identify the send, a tag has to be specified. The destination dest is the processor rank in communicator comm.

21
MPI_Recv
  • int MPI_Recv (void *buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Status *status)
  • Synchronously receives a message from source. The buffer must be able to hold count elements of datatype. The status field is filled with status information. MPI_Recv and MPI_Send calls should match: equal tag, count and datatype.

22
Datatypes
  • MPI_CHAR signed char
  • MPI_SHORT signed short int
  • MPI_INT signed int
  • MPI_LONG signed long int
  • MPI_UNSIGNED_CHAR unsigned char
  • MPI_UNSIGNED_SHORT unsigned short int
  • MPI_UNSIGNED unsigned int
  • MPI_UNSIGNED_LONG unsigned long int
  • MPI_FLOAT float
  • MPI_DOUBLE double
  • MPI_LONG_DOUBLE long double
  • (http://www-jics.cs.utk.edu/MPI/MPIguide/MPIguide.html)

23
Example send / receive
#include <stdio.h>
#include "mpi.h"

int main (int argc, char *argv[])
{
  MPI_Status s;
  int size, rank, i, j;
  MPI_Init (&argc, &argv);
  MPI_Comm_size (MPI_COMM_WORLD, &size);
  MPI_Comm_rank (MPI_COMM_WORLD, &rank);
  if (rank == 0)                 // Master process
  {
    printf ("Receiving data . . .\n");
    for (i = 1; i < size; i++)
    {
      MPI_Recv ((void *)&j, 1, MPI_INT, i, 0xACE5, MPI_COMM_WORLD, &s);
      printf ("%d sent %d\n", i, j);
    }
  }
  else                           // Workers: send the square of the rank (matches the sample output)
  {
    j = rank * rank;
    MPI_Send ((void *)&j, 1, MPI_INT, 0, 0xACE5, MPI_COMM_WORLD);
  }
  MPI_Finalize ();
  return 0;
}

24
Running send / receive
  • mpicc -o sendrecv sendrecv.c
  • mpirun -np 4 sendrecv
  • Receiving data . . .
  • 1 sent 1
  • 2 sent 4
  • 3 sent 9
  • _

25
MPI_Bcast
  • int MPI_Bcast (void *buffer, int count, MPI_Datatype datatype, int root, MPI_Comm comm)
  • Synchronously broadcasts a message from root to all processors in communicator comm (including itself). The buffer is used as the source in the root processor and as the destination in the others.

26
MPI_Barrier
  • int MPI_Barrier (MPI_Comm comm)
  • Blocks until all processes defined in comm have reached this routine. Use this routine to synchronize processes.

27
Example broadcast / barrier
#include <stdio.h>
#include "mpi.h"

int main (int argc, char *argv[])
{
  int rank, i;
  MPI_Init (&argc, &argv);
  MPI_Comm_rank (MPI_COMM_WORLD, &rank);
  if (rank == 0) i = 27;
  MPI_Bcast ((void *)&i, 1, MPI_INT, 0, MPI_COMM_WORLD);
  printf ("%d i %d\n", rank, i);
  // Wait for every process to reach this code
  MPI_Barrier (MPI_COMM_WORLD);
  MPI_Finalize ();
  return 0;
}

28
Running broadcast / barrier
  • mpicc -o broadcast broadcast.c
  • mpirun -np 3 broadcast
  • 0 i 27
  • 1 i 27
  • 2 i 27
  • _

29
MPI_Sendrecv
  • int MPI_Sendrecv (void *sendbuf, int sendcount, MPI_Datatype sendtype,
                      int dest, int sendtag,
                      void *recvbuf, int recvcount, MPI_Datatype recvtype,
                      int source, int recvtag, MPI_Comm comm, MPI_Status *status)
  • int MPI_Sendrecv_replace (void *buf, int count, MPI_Datatype datatype,
                              int dest, int sendtag, int source, int recvtag,
                              MPI_Comm comm, MPI_Status *status)
  • Send and receive in one call (the second form uses only one buffer).
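  • A minimal sketch of how MPI_Sendrecv avoids the deadlock of two matching blocking calls (not from the slides): every process passes its rank to its right neighbour and receives from its left neighbour in a ring, in a single call.

#include <stdio.h>
#include "mpi.h"

int main (int argc, char *argv[])
{
  int size, rank, left, right, sendval, recvval;
  MPI_Status s;
  MPI_Init (&argc, &argv);
  MPI_Comm_size (MPI_COMM_WORLD, &size);
  MPI_Comm_rank (MPI_COMM_WORLD, &rank);
  right = (rank + 1) % size;                  // neighbour we send to
  left  = (rank + size - 1) % size;           // neighbour we receive from
  sendval = rank;
  MPI_Sendrecv ((void *)&sendval, 1, MPI_INT, right, 0,
                (void *)&recvval, 1, MPI_INT, left, 0,
                MPI_COMM_WORLD, &s);
  printf ("%d received %d\n", rank, recvval);
  MPI_Finalize ();
  return 0;
}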

30
Other useful routines
  • MPI_Scatter
  • MPI_Gather
  • MPI_Type_vector
  • MPI_Type_commit (both sketched after this list)
  • MPI_Reduce / MPI_Allreduce
  • MPI_Op_create
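  • As a sketch of the two derived-datatype routines above (not from the slides; the 4 x 4 matrix and the choice of sending its second column are illustrative, and at least two processes are assumed):

#include <stdio.h>
#include "mpi.h"

int main (int argc, char *argv[])
{
  double a[4][4];                             // 4 x 4 matrix, stored row by row
  MPI_Datatype column;
  MPI_Status s;
  int rank, i, j;
  MPI_Init (&argc, &argv);
  MPI_Comm_rank (MPI_COMM_WORLD, &rank);
  for (i = 0; i < 4; i++)
    for (j = 0; j < 4; j++)
      a[i][j] = rank * 100 + i * 4 + j;
  // 4 blocks of 1 double with stride 4 doubles: describes one matrix column
  MPI_Type_vector (4, 1, 4, MPI_DOUBLE, &column);
  MPI_Type_commit (&column);
  if (rank == 0)
    MPI_Send ((void *)&a[0][1], 1, column, 1, 0, MPI_COMM_WORLD);
  else if (rank == 1)
  {
    MPI_Recv ((void *)&a[0][1], 1, column, 0, 0, MPI_COMM_WORLD, &s);
    printf ("received column, a[0][1] = %f\n", a[0][1]);
  }
  MPI_Type_free (&column);
  MPI_Finalize ();
  return 0;
}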

31
Example scatter / reduce
#include <stdio.h>
#include "mpi.h"

int main (int argc, char *argv[])
{
  int data[] = { 1, 2, 3, 4, 5, 6, 7 };   // Size must be >= number of processors
  int rank, i = -1, j = -1;
  MPI_Init (&argc, &argv);
  MPI_Comm_rank (MPI_COMM_WORLD, &rank);
  MPI_Scatter ((void *)data, 1, MPI_INT,
               (void *)&i, 1, MPI_INT,
               0, MPI_COMM_WORLD);
  printf ("%d Received i %d\n", rank, i);
  MPI_Reduce ((void *)&i, (void *)&j, 1, MPI_INT,
              MPI_PROD, 0, MPI_COMM_WORLD);
  printf ("%d j %d\n", rank, j);
  MPI_Finalize ();
  return 0;
}

32
Running scatter / reduce
  • mpicc -o scatterreduce scatterreduce.c
  • mpirun -np 4 scatterreduce
  • 0 Received i 1
  • 0 j 24
  • 1 Received i 2
  • 1 j -1
  • 2 Received i 3
  • 2 j -1
  • 3 Received i 4
  • 3 j -1
  • _

33
Some reduce operations
  • (Table on the slide; the predefined MPI reduce operations include MPI_MAX, MPI_MIN, MPI_SUM, MPI_PROD, MPI_LAND, MPI_LOR, MPI_BAND and MPI_BOR.)
34
Measuring running time
  • double MPI_Wtime (void)

double timeStart, timeEnd;
...
timeStart = MPI_Wtime ();
// Code to measure time for goes here.
timeEnd = MPI_Wtime ();
...
printf ("Running time %f seconds\n", timeEnd - timeStart);
35
Parallel sorting (1)
  • Sorting a sequence of numbers using the binary sort method. This method divides a given sequence into two halves (until only one element remains) and sorts both halves recursively. The two halves are then merged together to form a sorted sequence.

36
Binary sort pseudo-code
sorted-sequence = BinarySort (sequence)
  if (number of elements in sequence > 1)
    seqA = first half of sequence
    seqB = second half of sequence
    BinarySort (seqA)
    BinarySort (seqB)
    sorted-sequence = merge (seqA, seqB)
  else
    sorted-sequence = sequence

37
Merge two sorted sequences
(Figure: two sorted halves of the numbers 1-8 are merged, element by element, into one sorted sequence.)
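  • A minimal C sketch of this merge step (not from the slides; the function and parameter names are illustrative):

// Merge two sorted integer arrays a (length na) and b (length nb)
// into the output array out (length na + nb).
void merge (int *a, int na, int *b, int nb, int *out)
{
  int ia = 0, ib = 0, io = 0;
  while (ia < na && ib < nb)                  // take the smaller head element
    out[io++] = (a[ia] <= b[ib]) ? a[ia++] : b[ib++];
  while (ia < na) out[io++] = a[ia++];        // copy the rest of a
  while (ib < nb) out[io++] = b[ib++];        // copy the rest of b
}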
38
Example binary sort
  • (Figure on the slide: binary sort applied to an example sequence.)
39
Parallel sorting (2)
  • This way of dividing work and gathering the results is quite a natural fit for a parallel implementation. Divide the work in two and give each half to a processor. Have each of these processors divide their work again, until either no data can be split any further or no more processors are available.

40
Implementation problems
  • Number of processors may not be a power of two
  • Number of elements may not be a power of two
  • How to achieve an even workload? (see the sketch after this list)
  • Data size is less than number of processors
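  • One common way to handle the uneven cases above (a sketch, not the deck's code) is to compute per-process counts and displacements so that the first n mod p processes receive one extra element. The arrays can then be passed to MPI_Scatterv / MPI_Gatherv, and a process whose count is 0 simply receives nothing.

// Divide n elements as evenly as possible over p processes.
void partition (int n, int p, int *counts, int *displs)
{
  int i, offset = 0;
  for (i = 0; i < p; i++)
  {
    counts[i] = n / p + (i < n % p ? 1 : 0);  // first n%p ranks get one extra element
    displs[i] = offset;                       // start of rank i's block
    offset += counts[i];
  }
}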

41
Parallel matrix multiplication
  • We use the following partitioning of data (p = 4)

(Figure: the matrices are partitioned into row blocks assigned to processes P1-P4.)
42
Implementation
  • Master (process 0) reads data
  • Master sends size of data to slaves
  • Slaves allocate memory
  • Master broadcasts second matrix to all other
    processes
  • Master sends respective parts of first matrix to
    all other processes
  • Every process performs its local multiplication
  • All slave processes send back their results. (A condensed sketch of these steps follows below.)
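  • A condensed sketch of these steps (not the deck's actual code; it uses MPI_Bcast, MPI_Scatter and MPI_Gather instead of explicit sends, fills the matrices with dummy values instead of reading them, and assumes the dimension N divides evenly over the processes):

#include <stdlib.h>
#include "mpi.h"

#define N 1000                                // matrix dimension (illustrative)

int main (int argc, char *argv[])
{
  int size, rank, i, j, k, rows;
  double *A = NULL, *B, *C = NULL, *Apart, *Cpart;
  MPI_Init (&argc, &argv);
  MPI_Comm_size (MPI_COMM_WORLD, &size);
  MPI_Comm_rank (MPI_COMM_WORLD, &rank);
  rows  = N / size;                           // rows of the first matrix per process
  B     = malloc (N * N * sizeof (double));
  Apart = malloc (rows * N * sizeof (double));
  Cpart = malloc (rows * N * sizeof (double));
  if (rank == 0)                              // master owns the full matrices
  {
    A = malloc (N * N * sizeof (double));
    C = malloc (N * N * sizeof (double));
    for (i = 0; i < N * N; i++) { A[i] = 1.0; B[i] = 1.0; }
  }
  MPI_Bcast ((void *)B, N * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);   // broadcast second matrix
  MPI_Scatter ((void *)A, rows * N, MPI_DOUBLE,                  // distribute rows of first matrix
               (void *)Apart, rows * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);
  for (i = 0; i < rows; i++)                  // local multiplication of the row block
    for (j = 0; j < N; j++)
    {
      double sum = 0.0;
      for (k = 0; k < N; k++)
        sum += Apart[i * N + k] * B[k * N + j];
      Cpart[i * N + j] = sum;
    }
  MPI_Gather ((void *)Cpart, rows * N, MPI_DOUBLE,               // collect the result rows
              (void *)C, rows * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);
  MPI_Finalize ();
  return 0;
}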

43
Multiplication 1000 x 1000
  • (Chart on the slide: running times for a 1000 x 1000 multiplication.)
44
Multiplication 5000 x 5000
  • (Chart on the slide: running times for a 5000 x 5000 multiplication.)
45
Gaussian elimination
  • We use the following partitioning of data (p = 4)

(Figure: partitioning of the matrix rows over processes P1-P4.)
46
Implementation (1)
  • Master reads both matrices
  • Master sends size of matrices to slaves
  • Slaves calculate their part and allocate memory
  • Master sends each slave its respective part
  • Set sweeping row to 0 in all processes
  • Sweep matrix (see next sheet)
  • Slaves send back their results

47
Implementation (2)
  • While sweeping row not past final row do
  • Have every process decide whether they own the
    current sweeping row
  • The owner sends a copy of the row to every other
    process
  • All processes sweep their part of the matrix
    using the current row
  • Sweeping row is incremented (see the sketch after this list)
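  • A sketch of this sweep loop (not the deck's code; it assumes each process holds a contiguous block of rows [first, last) of the n x m augmented matrix in myrows, and that owner_of(row) returns the rank owning a given global row):

#include <string.h>
#include <stdlib.h>
#include "mpi.h"

// Sweep phase of the Gaussian elimination, executed by every process.
void sweep (double *myrows, int first, int last, int n, int m,
            int (*owner_of) (int))
{
  int row, i, j;
  double *pivot = malloc (m * sizeof (double));
  for (row = 0; row < n; row++)               // the current sweeping row
  {
    if (row >= first && row < last)           // the owner fills the buffer
      memcpy (pivot, &myrows[(row - first) * m], m * sizeof (double));
    MPI_Bcast ((void *)pivot, m, MPI_DOUBLE,  // owner sends a copy to every process
               owner_of (row), MPI_COMM_WORLD);
    for (i = 0; i < last - first; i++)        // sweep the locally owned rows
      if (first + i > row)                    // only rows below the sweeping row
      {
        double factor = myrows[i * m + row] / pivot[row];
        for (j = row; j < m; j++)
          myrows[i * m + j] -= factor * pivot[j];
      }
  }
  free (pivot);
}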

48
Programming hints
  • Keep it simple!
  • Avoid deadlocks
  • Write robust code even at cost of speed
  • Design in advance; debugging is more difficult (printing output works differently)
  • Error handling requires synchronisation; you can't just exit the program.

49
References (1)
  • MPI Forum Home Page
  • http://www.mpi-forum.org/index.html
  • Beginner's guide to MPI (see also /MPI/)
  • http://www-jics.cs.utk.edu/MPI/MPIguide/MPIguide.html
  • MPICH
  • http://www-unix.mcs.anl.gov/mpi/mpich/

50
References (2)
  • Miscellaneous
  • http://www.erc.msstate.edu/labs/hpcl/projects/mpi/
  • http://nexus.cs.usfca.edu/mpi/
  • http://www-unix.mcs.anl.gov/gropp/
  • http://www.epm.ornl.gov/walker/mpitutorial/
  • http://www.lam-mpi.org/
  • http://epcc.ed.ac.uk/chimp/
  • http://www-unix.mcs.anl.gov/mpi/www/www3/

51
Thank you for coming!