Title: MPI
1 MPI
- Message Passing Interface
- Beowulf and O2K
- Today we will discuss two simple MPI programs
- Important: MPI makes fewer assumptions about the
computing environment than pthreads and OpenMP.
2 SPMD
- Single Program Multiple Data
- The same program is run on different processors
- Each processor has an identifier (rank)
- The rank of the processor is used to produce
different results on the distinct processors.
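To make the role of the rank concrete, here is a minimal sketch in plain C of one common way a rank is turned into a distinct share of the work. The helper name block_range and the block-distribution rule are assumptions for illustration and are not part of MPI; in a real MPI program, rank and p would come from MPI_Comm_rank and MPI_Comm_size.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not an MPI call): split items 0..n-1 into p
 * contiguous blocks and return the half-open range [*lo, *hi) owned
 * by the given rank. The first n % p ranks get one extra item. */
void block_range(int rank, int p, int n, int *lo, int *hi) {
    assert(p > 0 && rank >= 0 && rank < p && n >= 0);
    int base  = n / p;   /* items every rank gets        */
    int extra = n % p;   /* leftover items to spread out */
    *lo = rank * base + (rank < extra ? rank : extra);
    *hi = *lo + base + (rank < extra ? 1 : 0);
}

/* Print the range each rank would work on. Every "processor" runs the
 * same code; only the rank differs. */
void show_ranges(int p, int n) {
    for (int rank = 0; rank < p; rank++) {
        int lo, hi;
        block_range(rank, p, n, &lo, &hi);
        printf("rank %d handles items %d..%d\n", rank, lo, hi - 1);
    }
}
```

With n = 10 and p = 4, the ranks would get items 0..2, 3..5, 6..7 and 8..9.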
3 Reference
- Parallel Programming with MPI
- By Peter S. Pacheco
- The website is http://fawlty.cs.usfca.edu/mpi.
- To get the user's guide, click on the "A
User's Guide to MPI" link in the first paragraph.
Ghostscript and Adobe can read it.
4 Compiling and Running MPI
- O2K
- cc <compile options> filename(s).c -lmpi
- mpirun -np <# of procs> a.out
5 Greetings
    #include <stdio.h>
    #include <string.h>
    #include "mpi.h"

    int main(int argc, char* argv[]) {
        int         my_rank;       /* rank of process           */
        int         p;             /* number of processes       */
        int         source;        /* rank of sender            */
        int         dest;          /* rank of receiver          */
        int         tag = 0;       /* tag for messages          */
        char        message[100];  /* storage for message       */
        MPI_Status  status;        /* return status for receive */
6 Greetings Continued
        /* Start up MPI */
        MPI_Init(&argc, &argv);

        /* Find out process rank */
        MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

        /* Find out number of processes */
        MPI_Comm_size(MPI_COMM_WORLD, &p);
7 Greetings Continued
        if (my_rank != 0) {
            /* Create message */
            sprintf(message, "Greetings from process %d!", my_rank);
            dest = 0;
            /* Use strlen+1 so that '\0' gets transmitted */
            MPI_Send(message, strlen(message)+1, MPI_CHAR,
                     dest, tag, MPI_COMM_WORLD);
        } else { /* my_rank == 0 */
            for (source = 1; source < p; source++) {
                MPI_Recv(message, 100, MPI_CHAR, source, tag,
                         MPI_COMM_WORLD, &status);
                printf("%s\n", message);
            }
        }
8 Wrap up
        /* Shut down MPI */
        MPI_Finalize();
        return 0;
    } /* main */
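9 Using a Matlab program for DD
- $y'' = f(t, y, y')$, $y(a) = c$, $y(b) = d$
- $h = (b-a)/(2n)$, e = guess1, f = guess2
- Solve on $[a, r]$, $r = (n+1)h$, with $y(r) = e$, giving Y1(0, ..., n+1)
- Solve on $[L, b]$, $L = (n-1)h$, with $y(L) = f$, giving Y2(0, ..., n+1)
- If $|Y2(2) - e| + |Y1(n-1) - f| < \text{eps}$, exit
- Else set e = Y2(2) and f = Y1(n-1)
- Go back to the first Solve step.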
10 Trapezoid Rule
- Compute the integral of f using the trapezoid rule
- Single processor case
- Multiprocessor (MP) case
- Communication
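The single-processor and multiprocessor cases can be sketched in plain C as follows. This is only an illustration under stated assumptions: the integrand f(x) = x*x is made up, the number of processes p is assumed to divide n evenly, and the loop over ranks stands in for processes that would really run concurrently. The running total plays the role of the communication step, for example a reduction with MPI_SUM onto the root.

```c
#include <assert.h>

/* Example integrand; an assumption for illustration only. */
double f(double x) { return x * x; }

/* Single processor case: trapezoid rule on [a, b] with n equal
 * subintervals. */
double trap(double a, double b, int n) {
    assert(n > 0);
    double h = (b - a) / n;
    double sum = (f(a) + f(b)) / 2.0;
    for (int i = 1; i < n; i++)
        sum += f(a + i * h);
    return sum * h;
}

/* Simulated MP case: each of p "processes" integrates its own piece of
 * [a, b] using n/p trapezoids. Assumes p divides n. Adding the local
 * results into total stands in for the communication step. */
double trap_split(double a, double b, int n, int p) {
    assert(p > 0 && n % p == 0);
    double h = (b - a) / n;
    int local_n = n / p;
    double total = 0.0;
    for (int rank = 0; rank < p; rank++) {
        double local_a = a + rank * local_n * h;
        double local_b = local_a + local_n * h;
        total += trap(local_a, local_b, local_n);
    }
    return total;
}
```

For example, trap_split(0.0, 1.0, 100, 4) should agree with trap(0.0, 1.0, 100) up to rounding, and both should be close to 1/3.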
11 MPI Send
    int MPI_Send(void*        buffer,
                 int          count,
                 MPI_Datatype dt,
                 int          destination,
                 int          tag,
                 MPI_Comm     communicator)
12 MPI Receive
    int MPI_Recv(void*        buffer,
                 int          count,
                 MPI_Datatype dt,
                 int          source,
                 int          tag,
                 MPI_Comm     communicator,
                 MPI_Status*  status)
13 MPI Broadcast
    int MPI_Bcast(void*        message,
                  int          count,
                  MPI_Datatype dt,
                  int          root,
                  MPI_Comm     communicator)
14 MPI Reduce
    int MPI_Reduce(void*        operand,
                   void*        result,
                   int          count,
                   MPI_Datatype dt,
                   MPI_Op       operator,
                   int          root,
                   MPI_Comm     communicator)
15 MPI_Op
- MPI_MAX
- MPI_MIN
- MPI_SUM
- MPI_PROD
- MPI_LAND
- MPI_LOR
- MPI_BOR
- MPI_LXOR
- MPI_BXOR
- MPI_MAXLOC
- MPI_MINLOC
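As a rough illustration of what two of these operators compute, the sketch below mimics, in plain C and without MPI, the result a reduction would leave on the root. The array contrib holds one hypothetical contribution per process, and the names reduce_sum, reduce_maxloc and ValueRank are assumptions made for this example, not MPI identifiers.

```c
#include <assert.h>

/* A value paired with the rank that owns it, mirroring the
 * (value, index) pairs that MPI_MAXLOC operates on. */
typedef struct { double value; int rank; } ValueRank;

/* What a reduction with MPI_SUM leaves on the root: the sum of one
 * contribution per process. */
double reduce_sum(const double *contrib, int p) {
    double s = 0.0;
    for (int r = 0; r < p; r++)
        s += contrib[r];
    return s;
}

/* What MPI_MAXLOC computes: the largest value and the rank holding it.
 * Ties go to the lowest rank, hence the strict comparison. */
ValueRank reduce_maxloc(const double *contrib, int p) {
    assert(p > 0);
    ValueRank best = { contrib[0], 0 };
    for (int r = 1; r < p; r++) {
        if (contrib[r] > best.value) {
            best.value = contrib[r];
            best.rank = r;
        }
    }
    return best;
}
```

MPI_MINLOC behaves the same way with the comparison reversed.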
16 MPI Matrix Multiply Design
- Where are the matrices at the beginning of the computation?
- Where will the result be at the end of the computation?
- What will each processor compute?
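One possible set of answers, offered only as a sketch: distribute A and C by blocks of rows, give every process a full copy of B, and let each process compute its own rows of C = A * B. The plain C code below simulates that design with a loop over ranks instead of real MPI processes. The function names, the row-major flat storage, and the assumption that p divides n are all choices made for this example.

```c
#include <assert.h>

/* Compute rows [row_lo, row_hi) of C = A * B for n x n matrices stored
 * row-major in flat arrays. In a row-block design, each process would
 * call this with its own row range. */
void multiply_rows(const double *A, const double *B, double *C,
                   int n, int row_lo, int row_hi) {
    assert(0 <= row_lo && row_lo <= row_hi && row_hi <= n);
    for (int i = row_lo; i < row_hi; i++) {
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
    }
}

/* Simulate p processes, assuming p divides n: rank r owns rows
 * r*(n/p) .. (r+1)*(n/p) - 1 of A and of C; every rank reads all of B. */
void multiply_simulated(const double *A, const double *B, double *C,
                        int n, int p) {
    assert(p > 0 && n % p == 0);
    int rows = n / p;
    for (int rank = 0; rank < p; rank++)
        multiply_rows(A, B, C, n, rank * rows, (rank + 1) * rows);
}
```

In this design the result ends up spread across the processes by rows, so a final gather onto one process would be needed if the whole of C is wanted in one place.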
17 Example: Lab 4
- Design and implement a Domain Decomposition program using Matlab.
- Assume that there are 4 intervals.
- [T, Y] = dd_ode(), where T and Y are the global grid and global solution respectively.
- dd_ode_4(a, b, c, f, left, right, bleft, bright, npts)
- Global mesh T = left:h:right, where h = (right - left)/(4*npts)
- Thus npts is (nearly) the number of points per interval
18 Lab 4 (2)
- The 4 intervals become
- [T(1), T(npts+1)] and
- [T((k-1)*npts), T(k*npts+1)] for k = 2, 3, 4
- Take a guess at the missing boundary values, use Solve_ode to solve the 4 BVPs, and then exchange information and iterate.
- Extra credit: Write the routine for any number of intervals, passing this number in the argument list.
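The index bookkeeping for the intervals can be sketched as a small helper, written here in plain C rather than Matlab. The function name interval_bounds and its argument list are hypothetical. It returns 1-based indices so the values match the T(...) subscripts above, and it takes the number of intervals as an argument, in the spirit of the extra-credit variant.

```c
#include <assert.h>

/* 1-based (Matlab-style) first and last indices into the global mesh T
 * for interval k out of nintervals, with npts mesh cells per interval.
 * Interval 1 is T(1)..T(npts+1); interval k >= 2 is
 * T((k-1)*npts)..T(k*npts+1), so neighbors overlap by one mesh cell. */
void interval_bounds(int k, int nintervals, int npts,
                     int *first, int *last) {
    assert(nintervals >= 1 && npts >= 1 && 1 <= k && k <= nintervals);
    *first = (k == 1) ? 1 : (k - 1) * npts;
    *last  = k * npts + 1;
}
```

With 4 intervals and npts = 500, the last interval ends at index 2001, which is the final point of a global mesh with 4*npts cells.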
19 Lab 4 (3)
- Comment on the efficiency of the program
- Compare the run time of the dd program to the run time for solving the global problem using Solve_ode with n = 4*npts.
- Solve y y y cos(t3) with y(0) = 0, y(3) = 0, for n = 4*500 = 2000.
20 Solving Tridiagonal Systems
- $a_i x_{i-1} + d_i x_i + c_i x_{i+1} = b_i$ for $i = 1, \dots, n$
- $x_0$ and $x_{n+1}$ are both 0
- Use Gaussian elimination
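Written out as recurrences, Gaussian elimination on this system takes the following form. This sketch assumes no pivoting is needed (every updated diagonal entry is nonzero, as for a diagonally dominant matrix); the tilde notation and the multiplier $m_i$ are introduced here for illustration. The steps correspond to the loops in the trid subroutine, where sub, diag and sup hold $a$, $d$ and $c$.

```latex
\begin{aligned}
&\text{Forward elimination, } i = 2, \dots, n: \\
&\qquad m_i = \frac{a_i}{\tilde d_{i-1}}, \qquad
  \tilde d_i = d_i - m_i\, c_{i-1}, \qquad
  \tilde b_i = b_i - m_i\, \tilde b_{i-1},
  \qquad (\tilde d_1 = d_1,\ \tilde b_1 = b_1) \\
&\text{Back substitution:} \\
&\qquad x_n = \frac{\tilde b_n}{\tilde d_n}, \qquad
  x_i = \frac{\tilde b_i - c_i\, x_{i+1}}{\tilde d_i},
  \quad i = n-1, \dots, 1
\end{aligned}
```

Both sweeps touch each unknown once, so the cost grows linearly with n.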
21
    subroutine trid(sub, diag, sup, b, n, ans)
      integer :: n, i
      real(kind=8), dimension(n) :: diag, sub, sup, ans, b

      ! A 1x1 system is solved directly
      if (n .le. 1) then
        ans(1) = b(1)/diag(1)
        return
      end if
      ! Copy the right-hand side into the solution vector
      do i = 1, n
        ans(i) = b(i)
      end do
      ! Forward elimination (overwrites sub and diag)
      do i = 2, n
        sub(i) = sub(i)/diag(i-1)
        diag(i) = diag(i) - sub(i)*sup(i-1)
        ans(i) = ans(i) - sub(i)*ans(i-1)
      end do
      ! Back substitution
      ans(n) = ans(n)/diag(n)
      do i = n-1, 1, -1
        ans(i) = (ans(i) - sup(i)*ans(i+1))/diag(i)
      end do
    end subroutine trid
22
      integer :: n, i
      real(kind=8), dimension(:), allocatable :: a, b, c, ans, d

      n = 10
      allocate(a(n), d(n), c(n), b(n), ans(n))
      do i = 1, n
        a(i) = -1.d0
        d(i) = 2.d0
        c(i) = -1.d0
        b(i) = 0.d0
      end do
      b(1) = 1.d0
      call trid(a, d, c, b, n, ans)
      ! Print the error against the exact solution (n+1-i)/(n+1)
      do i = 1, n
        print *, i, ans(i) - (n+1-i)/(n+1.d0)
      end do
      deallocate(a, d, c, b, ans)
      stop
      end