Title: Introduction to Parallel Computing
1. Introduction to Parallel Computing
2. What is MPI?
- Message Passing Interface (MPI) is a standardised interface; several implementations of this interface exist.
- The MPI standard specifies three forms of subroutine interfaces:
  - Language-independent notation
  - Fortran notation
  - C notation
3. MPI Features
- MPI implementations provide:
  - Abstraction of the hardware implementation
  - Synchronous communication
  - Asynchronous communication
  - File operations
  - Time measurement operations
4. Implementations
5. Programming with MPI
- What is the difference between programming using the traditional approach and the MPI approach?
  - Use of the MPI library
  - Compiling
  - Running
6. Compiling (1)
- Once a program is written, compiling it is done a little differently from the normal situation. Although the details differ between MPI implementations, there are two frequently used approaches.
7. Compiling (2)
- First approach: link the MPI library explicitly
  gcc myprogram.c -o myexecutable -lmpi
- Second approach: use the compiler wrapper provided by the implementation
  mpicc myprogram.c -o myexecutable
8. Running (1)
- To run an MPI-enabled application, we generally use the command mpirun:
  mpirun -np x myexecutable <parameters>
- Here x is the number of processes to use, and <parameters> are the arguments to the executable, if any.
9. Running (2)
- The mpirun program takes care of creating the processes on the selected processors.
- By default, mpirun decides which processors to use; this is usually determined by a global configuration file.
- It is possible to specify processors explicitly, but the specification may only be used as a hint.
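- Many implementations also let you suggest the machines to use through a machine file; the exact option name differs per implementation, so the -machinefile flag below is an assumption to check against your local documentation.
  mpirun -np 4 -machinefile machines.txt myexecutable
- Here machines.txt would simply list one host name per line.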
10. MPI Programming (1)
- Implementations of MPI support Fortran, C, or both. Here we only consider programming using the C library. The first step in writing a program using MPI is to include the correct header:
  #include "mpi.h"
11. MPI Programming (2)
  #include "mpi.h"

  int main (int argc, char *argv[])
  {
      MPI_Init (&argc, &argv);
      MPI_Finalize ();
      return 0;
  }
12. MPI_Init
- int MPI_Init (int *argc, char ***argv)
- The MPI_Init procedure should be called before any other MPI procedure (except MPI_Initialized). It must be called exactly once, at program initialisation. It removes the arguments that are used by MPI from the argument array.
13. MPI_Finalize
- int MPI_Finalize (void)
- This routine cleans up all MPI state. It should be the last MPI routine called in a program; no other MPI routine may be called after MPI_Finalize. Pending communication should be finished before finalisation.
14. Using multiple processes
- When running an MPI-enabled program with multiple processes, each process runs an identical copy of the program, so there must be a way to find out which process we are. This situation is comparable to programming with the fork statement. MPI defines two subroutines that can be used for this.
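- A minimal sketch of this single-program style (MPI_Comm_rank is introduced on the following slides): every process runs the same code and branches on its rank.
  #include <stdio.h>
  #include "mpi.h"

  int main (int argc, char *argv[])
  {
      int rank;
      MPI_Init (&argc, &argv);
      MPI_Comm_rank (MPI_COMM_WORLD, &rank);
      if (rank == 0)
          printf ("I am the master process\n");   /* exactly one process takes this branch */
      else
          printf ("I am worker %d\n", rank);      /* all other processes take this branch  */
      MPI_Finalize ();
      return 0;
  }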
15. MPI_Comm_size
- int MPI_Comm_size (MPI_Comm comm, int *size)
- This call returns the number of processes involved in a communicator. To find out how many processes are used in total, call this function with the predefined global communicator MPI_COMM_WORLD.
16. MPI_Comm_rank
- int MPI_Comm_rank (MPI_Comm comm, int *rank)
- This procedure determines the rank (index) of the calling process in the communicator. Each process is assigned a unique number within a communicator.
17. MPI_COMM_WORLD
- MPI communicators are used to specify which processes a communication applies to. A communicator is shared by a group of processes. The predefined communicator MPI_COMM_WORLD applies to all processes. Communicators can be duplicated, created and deleted; for most applications, MPI_COMM_WORLD suffices.
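- As an illustration of creating a communicator, a minimal sketch using MPI_Comm_split; the colour value rank % 2 is an arbitrary choice that splits MPI_COMM_WORLD into two groups.
  #include <stdio.h>
  #include "mpi.h"

  int main (int argc, char *argv[])
  {
      int worldRank, subRank;
      MPI_Comm subComm;
      MPI_Init (&argc, &argv);
      MPI_Comm_rank (MPI_COMM_WORLD, &worldRank);
      /* Processes with the same colour end up in the same new communicator */
      MPI_Comm_split (MPI_COMM_WORLD, worldRank % 2, worldRank, &subComm);
      MPI_Comm_rank (subComm, &subRank);
      printf ("World rank %d has rank %d in its sub-communicator\n", worldRank, subRank);
      MPI_Comm_free (&subComm);
      MPI_Finalize ();
      return 0;
  }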
18. Example: Hello World!
  #include <stdio.h>
  #include "mpi.h"

  int main (int argc, char *argv[])
  {
      int size, rank;
      MPI_Init (&argc, &argv);
      MPI_Comm_size (MPI_COMM_WORLD, &size);
      MPI_Comm_rank (MPI_COMM_WORLD, &rank);
      printf ("Hello world! from processor (%d/%d)\n", rank + 1, size);
      MPI_Finalize ();
      return 0;
  }
19. Running Hello World!
- mpicc -o hello hello.c
- mpirun -np 3 hello
- Hello world! from processor (1/3)
- Hello world! from processor (2/3)
- Hello world! from processor (3/3)
20. MPI_Send
- int MPI_Send (void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm)
- Synchronously sends a message to dest. The data is found in buf, which contains count elements of datatype. To identify the send, a tag has to be specified. The destination dest is the process rank in communicator comm.
21. MPI_Recv
- int MPI_Recv (void *buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Status *status)
- Synchronously receives a message from source. The buffer must be able to hold count elements of datatype. The status field is filled with status information. MPI_Recv and MPI_Send calls should match: equal tag, count and datatype.
22. Datatypes
- MPI_CHAR            signed char
- MPI_SHORT           signed short int
- MPI_INT             signed int
- MPI_LONG            signed long int
- MPI_UNSIGNED_CHAR   unsigned char
- MPI_UNSIGNED_SHORT  unsigned short int
- MPI_UNSIGNED        unsigned int
- MPI_UNSIGNED_LONG   unsigned long int
- MPI_FLOAT           float
- MPI_DOUBLE          double
- MPI_LONG_DOUBLE     long double
- (http://www-jics.cs.utk.edu/MPI/MPIguide/MPIguide.html)
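- The datatype argument tells MPI how to interpret the buffer. A minimal sketch (array size and tag chosen arbitrarily) that sends ten doubles from process 1 to process 0; run it with at least two processes.
  #include <stdio.h>
  #include "mpi.h"

  int main (int argc, char *argv[])
  {
      double values[10];
      int i, rank;
      MPI_Status s;
      MPI_Init (&argc, &argv);
      MPI_Comm_rank (MPI_COMM_WORLD, &rank);
      if (rank == 1)
      {
          for (i = 0; i < 10; i++) values[i] = i * 0.5;
          MPI_Send ((void *)values, 10, MPI_DOUBLE, 0, 1, MPI_COMM_WORLD);
      }
      else if (rank == 0)
      {
          MPI_Recv ((void *)values, 10, MPI_DOUBLE, 1, 1, MPI_COMM_WORLD, &s);
          printf ("Last value received: %f\n", values[9]);
      }
      MPI_Finalize ();
      return 0;
  }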
23. Example: send / receive
  #include <stdio.h>
  #include "mpi.h"

  int main (int argc, char *argv[])
  {
      MPI_Status s;
      int size, rank, i, j;
      MPI_Init (&argc, &argv);
      MPI_Comm_size (MPI_COMM_WORLD, &size);
      MPI_Comm_rank (MPI_COMM_WORLD, &rank);
      if (rank == 0)    // Master process
      {
          printf ("Receiving data . . .\n");
          for (i = 1; i < size; i++)
          {
              MPI_Recv ((void *)&j, 1, MPI_INT, i, 0xACE5, MPI_COMM_WORLD, &s);
              printf ("%d sent %d\n", i, j);
          }
      }
      else              // Slaves send the square of their rank (matches the output on the next slide)
      {
          j = rank * rank;
          MPI_Send ((void *)&j, 1, MPI_INT, 0, 0xACE5, MPI_COMM_WORLD);
      }
      MPI_Finalize ();
      return 0;
  }
24. Running send / receive
- mpicc -o sendrecv sendrecv.c
- mpirun -np 4 sendrecv
- Receiving data . . .
- 1 sent 1
- 2 sent 4
- 3 sent 9
25. MPI_Bcast
- int MPI_Bcast (void *buffer, int count, MPI_Datatype datatype, int root, MPI_Comm comm)
- Synchronously broadcasts a message from root to all processes in communicator comm (including itself). The buffer is used as the source in the root process and as the destination in the others.
26. MPI_Barrier
- int MPI_Barrier (MPI_Comm comm)
- Blocks until all processes defined in comm have reached this routine. Use this routine to synchronise processes.
27. Example: broadcast / barrier
  #include <stdio.h>
  #include "mpi.h"

  int main (int argc, char *argv[])
  {
      int rank, i;
      MPI_Init (&argc, &argv);
      MPI_Comm_rank (MPI_COMM_WORLD, &rank);
      if (rank == 0) i = 27;
      MPI_Bcast ((void *)&i, 1, MPI_INT, 0, MPI_COMM_WORLD);
      printf ("%d: i = %d\n", rank, i);
      // Wait for every process to reach this code
      MPI_Barrier (MPI_COMM_WORLD);
      MPI_Finalize ();
      return 0;
  }
28. Running broadcast / barrier
- mpicc -o broadcast broadcast.c
- mpirun -np 3 broadcast
- 0: i = 27
- 1: i = 27
- 2: i = 27
29. MPI_Sendrecv
- int MPI_Sendrecv (void *sendbuf, int sendcount, MPI_Datatype sendtype, int dest, int sendtag, void *recvbuf, int recvcount, MPI_Datatype recvtype, int source, int recvtag, MPI_Comm comm, MPI_Status *status)
- int MPI_Sendrecv_replace (void *buf, int count, MPI_Datatype datatype, int dest, int sendtag, int source, int recvtag, MPI_Comm comm, MPI_Status *status)
- Combined send and receive; the second variant uses a single buffer for both.
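- Neither routine is used elsewhere in these slides; as a minimal sketch, a ring shift with MPI_Sendrecv_replace (the neighbour calculation and tag value are my own choice).
  #include <stdio.h>
  #include "mpi.h"

  int main (int argc, char *argv[])
  {
      int rank, size, value, left, right;
      MPI_Status s;
      MPI_Init (&argc, &argv);
      MPI_Comm_rank (MPI_COMM_WORLD, &rank);
      MPI_Comm_size (MPI_COMM_WORLD, &size);
      right = (rank + 1) % size;           /* destination of our value */
      left  = (rank - 1 + size) % size;    /* source of the new value  */
      value = rank;
      /* Send to the right neighbour and receive from the left one, reusing the same buffer */
      MPI_Sendrecv_replace ((void *)&value, 1, MPI_INT, right, 0, left, 0, MPI_COMM_WORLD, &s);
      printf ("%d received %d\n", rank, value);
      MPI_Finalize ();
      return 0;
  }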
30. Other useful routines
- MPI_Scatter
- MPI_Gather (see the sketch below)
- MPI_Type_vector
- MPI_Type_commit
- MPI_Reduce / MPI_Allreduce
- MPI_Op_create
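- Of these, MPI_Gather is the mirror of MPI_Scatter. A minimal sketch in which every process contributes its rank and the root collects them all (the MAXPROCS bound is an assumption of this sketch).
  #include <stdio.h>
  #include "mpi.h"

  #define MAXPROCS 64   /* assumed upper bound on the number of processes */

  int main (int argc, char *argv[])
  {
      int rank, size, ranks[MAXPROCS], i;
      MPI_Init (&argc, &argv);
      MPI_Comm_rank (MPI_COMM_WORLD, &rank);
      MPI_Comm_size (MPI_COMM_WORLD, &size);
      /* Every process sends one int; the root (0) receives one int from each */
      MPI_Gather ((void *)&rank, 1, MPI_INT, (void *)ranks, 1, MPI_INT, 0, MPI_COMM_WORLD);
      if (rank == 0)
          for (i = 0; i < size; i++)
              printf ("Slot %d holds rank %d\n", i, ranks[i]);
      MPI_Finalize ();
      return 0;
  }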
31. Example: scatter / reduce
  #include <stdio.h>
  #include "mpi.h"

  int main (int argc, char *argv[])
  {
      int data[] = { 1, 2, 3, 4, 5, 6, 7 };   // Size must be >= number of processors
      int rank, i = -1, j = -1;
      MPI_Init (&argc, &argv);
      MPI_Comm_rank (MPI_COMM_WORLD, &rank);
      MPI_Scatter ((void *)data, 1, MPI_INT,
                   (void *)&i, 1, MPI_INT,
                   0, MPI_COMM_WORLD);
      printf ("%d: Received i = %d\n", rank, i);
      MPI_Reduce ((void *)&i, (void *)&j, 1, MPI_INT,
                  MPI_PROD, 0, MPI_COMM_WORLD);
      printf ("%d: j = %d\n", rank, j);
      MPI_Finalize ();
      return 0;
  }
32. Running scatter / reduce
- mpicc -o scatterreduce scatterreduce.c
- mpirun -np 4 scatterreduce
- 0: Received i = 1
- 0: j = 24
- 1: Received i = 2
- 1: j = -1
- 2: Received i = 3
- 2: j = -1
- 3: Received i = 4
- 3: j = -1
33. Some reduce operations
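- Commonly used predefined operations include MPI_SUM, MPI_PROD, MPI_MAX and MPI_MIN. A minimal sketch of a sum over all ranks using MPI_Allreduce, which delivers the result to every process.
  #include <stdio.h>
  #include "mpi.h"

  int main (int argc, char *argv[])
  {
      int rank, sum;
      MPI_Init (&argc, &argv);
      MPI_Comm_rank (MPI_COMM_WORLD, &rank);
      /* Every process contributes its rank; every process receives the total */
      MPI_Allreduce ((void *)&rank, (void *)&sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
      printf ("%d: sum of all ranks = %d\n", rank, sum);
      MPI_Finalize ();
      return 0;
  }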
34. Measuring running time
  double timeStart, timeEnd;
  ...
  timeStart = MPI_Wtime ();
  // Code to measure time for goes here.
  timeEnd = MPI_Wtime ();
  ...
  printf ("Running time: %f seconds\n", timeEnd - timeStart);
35. Parallel sorting (1)
- Sorting a sequence of numbers using the binary sort method. This method divides a given sequence into two halves (until only one element remains) and sorts both halves recursively. The two halves are then merged together to form a sorted sequence.
36. Binary sort pseudo-code
  sorted-sequence BinarySort (sequence)
  {
      if (number of elements in sequence > 1)
      {
          seqA = first half of sequence
          seqB = second half of sequence
          BinarySort (seqA)
          BinarySort (seqB)
          sorted-sequence = merge (seqA, seqB)
      }
      else
          sorted-sequence = sequence
  }
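- A sequential C sketch of this method (my own translation of the pseudo-code, using a temporary buffer for the merge and the example elements from the next slide):
  #include <stdio.h>
  #include <string.h>

  /* Sort a[lo..hi-1] by recursively sorting both halves and merging them */
  static void BinarySort (int a[], int tmp[], int lo, int hi)
  {
      int mid, i, j, k;
      if (hi - lo <= 1)
          return;                       /* one element: already sorted */
      mid = (lo + hi) / 2;
      BinarySort (a, tmp, lo, mid);     /* first half  */
      BinarySort (a, tmp, mid, hi);     /* second half */
      /* Merge the two sorted halves into tmp */
      i = lo; j = mid; k = lo;
      while (i < mid && j < hi)
          tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
      while (i < mid) tmp[k++] = a[i++];
      while (j < hi)  tmp[k++] = a[j++];
      memcpy (&a[lo], &tmp[lo], (hi - lo) * sizeof (int));
  }

  int main (void)
  {
      int data[8] = { 1, 7, 8, 4, 5, 6, 2, 3 }, tmp[8], i;
      BinarySort (data, tmp, 0, 8);
      for (i = 0; i < 8; i++) printf ("%d ", data[i]);
      printf ("\n");
      return 0;
  }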
37. Merge two sorted sequences
- (Figure: two sorted sequences are merged element by element into one sorted sequence; the example uses the elements 1, 7, 8, 4, 5, 6, 2, 3.)
38. Example binary sort
39. Parallel sorting (2)
- This way of dividing work and gathering the results lends itself naturally to a parallel implementation. Divide the work in two and give one half to each of two processors. Have each of these processors divide their work again, until either the data cannot be split any further or no more processors are available.
40. Implementation problems
- The number of processors may not be a power of two
- The number of elements may not be a power of two
- How to achieve an even workload?
- The data size may be less than the number of processors
41. Parallel matrix multiplication
- We use the following partitioning of the data (p = 4):
- (Figure: the matrices are partitioned into parts P1, P2, P3 and P4, one part per process.)
42. Implementation
- Master (process 0) reads data
- Master sends size of data to slaves
- Slaves allocate memory
- Master broadcasts second matrix to all other processes
- Master sends respective parts of first matrix to all other processes
- Every process performs its local multiplication
- All slave processes send back their result (see the sketch below)
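- A simplified sketch of these steps: it uses MPI_Bcast, MPI_Scatter and MPI_Gather instead of explicit sends and receives, and assumes the matrix size N is divisible by the number of processes.
  #include <stdio.h>
  #include <stdlib.h>
  #include "mpi.h"

  #define N 4   /* assumed matrix size for this sketch */

  int main (int argc, char *argv[])
  {
      double A[N][N], B[N][N], C[N][N];
      double *rowsA, *rowsC;
      int rank, size, rows, i, j, k;
      MPI_Init (&argc, &argv);
      MPI_Comm_rank (MPI_COMM_WORLD, &rank);
      MPI_Comm_size (MPI_COMM_WORLD, &size);
      rows = N / size;                                 /* rows of A handled per process */
      rowsA = malloc (rows * N * sizeof (double));
      rowsC = malloc (rows * N * sizeof (double));
      if (rank == 0)                                   /* master fills the matrices */
          for (i = 0; i < N; i++)
              for (j = 0; j < N; j++) { A[i][j] = i + j; B[i][j] = (i == j); }
      /* Second matrix goes to everyone, first matrix is split row-wise */
      MPI_Bcast ((void *)B, N * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);
      MPI_Scatter ((void *)A, rows * N, MPI_DOUBLE,
                   (void *)rowsA, rows * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);
      /* Local multiplication of the assigned rows */
      for (i = 0; i < rows; i++)
          for (j = 0; j < N; j++)
          {
              rowsC[i * N + j] = 0.0;
              for (k = 0; k < N; k++)
                  rowsC[i * N + j] += rowsA[i * N + k] * B[k][j];
          }
      MPI_Gather ((void *)rowsC, rows * N, MPI_DOUBLE,
                  (void *)C, rows * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);
      if (rank == 0)
          printf ("C[0][0] = %f\n", C[0][0]);
      free (rowsA); free (rowsC);
      MPI_Finalize ();
      return 0;
  }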
43. Multiplication 1000 x 1000
44. Multiplication 5000 x 5000
45. Gaussian elimination
- We use the following partitioning of the data (p = 4):
- (Figure: the matrix rows are partitioned into parts P1, P2, P3 and P4, one part per process.)
46. Implementation (1)
- Master reads both matrices
- Master sends size of matrices to slaves
- Slaves calculate their part and allocate memory
- Master sends each slave its respective part
- Set sweeping row to 0 in all processes
- Sweep matrix (see next sheet)
- Slaves send back their result
47. Implementation (2)
- While the sweeping row is not past the final row, do:
  - Have every process decide whether it owns the current sweeping row
  - The owner sends a copy of the row to every other process
  - All processes sweep their part of the matrix using the current row
  - The sweeping row is incremented (a sketch of this loop follows below)
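- A hedged sketch of this sweep loop only: the example system, the block-row ownership and the absence of pivoting are simplifying assumptions; distributing the data and back substitution are omitted, and the number of processes is assumed to divide N.
  #include <stdio.h>
  #include "mpi.h"

  #define N 4                      /* assumed system size for this sketch */

  int main (int argc, char *argv[])
  {
      double a[N][N + 1];          /* augmented matrix, built on every process for simplicity */
      double pivotRow[N + 1];
      int rank, size, rows, first, last, r, i, j;
      MPI_Init (&argc, &argv);
      MPI_Comm_rank (MPI_COMM_WORLD, &rank);
      MPI_Comm_size (MPI_COMM_WORLD, &size);
      rows = N / size;
      first = rank * rows;         /* this process sweeps rows first..last-1 */
      last  = first + rows;
      /* Every process builds the same example system (a real program would distribute it) */
      for (i = 0; i < N; i++)
          for (j = 0; j <= N; j++)
              a[i][j] = (i == j) ? 2.0 : 1.0;
      for (r = 0; r < N; r++)
      {
          int owner = r / rows;                    /* which process owns the sweeping row */
          if (rank == owner)                       /* the owner provides a copy of that row */
              for (j = 0; j <= N; j++) pivotRow[j] = a[r][j];
          MPI_Bcast ((void *)pivotRow, N + 1, MPI_DOUBLE, owner, MPI_COMM_WORLD);
          /* Eliminate the current column from the locally owned rows below r */
          for (i = (first > r + 1 ? first : r + 1); i < last; i++)
          {
              double factor = a[i][r] / pivotRow[r];
              for (j = r; j <= N; j++)
                  a[i][j] -= factor * pivotRow[j];
          }
      }
      if (rank == size - 1)
          printf ("Last diagonal element after sweeping: %f\n", a[N - 1][N - 1]);
      MPI_Finalize ();
      return 0;
  }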
48. Programming hints
- Keep it simple!
- Avoid deadlocks
- Write robust code, even at the cost of speed
- Design in advance; debugging is more difficult (printing output is different)
- Error handling requires synchronisation; you can't just exit the program
49. References (1)
- MPI Forum Home Page
  - http://www.mpi-forum.org/index.html
- Beginner's guide to MPI (see also /MPI/)
  - http://www-jics.cs.utk.edu/MPI/MPIguide/MPIguide.html
- MPICH
  - http://www-unix.mcs.anl.gov/mpi/mpich/
50. References (2)
- Miscellaneous
  - http://www.erc.msstate.edu/labs/hpcl/projects/mpi/
  - http://nexus.cs.usfca.edu/mpi/
  - http://www-unix.mcs.anl.gov/gropp/
  - http://www.epm.ornl.gov/walker/mpitutorial/
  - http://www.lam-mpi.org/
  - http://epcc.ed.ac.uk/chimp/
  - http://www-unix.mcs.anl.gov/mpi/www/www3/
51. Thank you for coming!