Title: MPI
1MPI & OpenMP
- Davies Muche, Jason Shields
2Outline
- A brief Background on Parallel Computers
- A Brief History of MPI
- MPI
- A Brief History of OpenMP
- OpenMP
3What is a Parallel Computer?
- A parallel computer is a set of processors that are able to work cooperatively to solve a computational problem
- This definition is broad enough to include parallel supercomputers that have hundreds or thousands of processors, networks of workstations, multiple-processor workstations, and embedded systems
4More on Parallel Computers
- At the heart of all parallel machines is a collection of processors, often in the hundreds or thousands
- Each processor has its own cache and local memory
- Parallel computers all have some communication facility: powerful switches are normally used to interconnect processors, while buses are used to connect processors in local clusters
5Models of Parallel Machines
- The three most important classes of parallel machines are:
6Shared Memory
7Shared Disk
8Shared Nothing
9Shared Nothing Continued
- All communication is via the communication network, from processor to processor
- Relatively inexpensive to build
- If one processor P wants to read data from the disk of another processor Q, P sends a request message to Q, and Q sends the data back over the network in another message
- Both processors must execute a program that supports message transfer
10MPI
- This concept of message transfer, in which processes communicate with other processes by sending and receiving messages, is the core of the Message Passing Interface (MPI)
11What is MPI?
- MPI is a message-passing library specification proposed as a standard by a committee of vendors, implementers, and users. It is designed to permit the development of parallel software libraries
- WHAT IT'S NOT:
- - A compiler
- - A specific product
12A Brief History of MPI
- Related work before the MPI Standard:
- PICL, PVM - Oak Ridge National Laboratory
- PARMACS, P4, Chameleon - Argonne National Laboratory
- Express - Caltech/ParaSoft
- LAM - Ohio Supercomputer Center
- TCGMSG - specially designed for quantum chemistry
- ISIS (Cornell University)
- Linda (Yale)
13Birth of MPI
- In April 1992, during a one-day workshop on Standards for Message Passing in a Distributed-Memory Environment, the participants realized that they were continually reinventing the wheel, duplicating each other's efforts
- They got together at Supercomputing '92 and decided to thrash out a common standard, meeting in the same hotel in Dallas where the HPF Forum met
- The first standard (MPI-1.0) was completed in May 1994
- The beta version of the second, enhanced standard (MPI-2.0) was released in July of 1997
- Industrial participants included Convex, Cray, IBM, Intel, Meiko, nCUBE, NEC, and Thinking Machines
14Where to use MPI
- MPI is primarily for SPMD and MIMD types of parallel computing environments
- SPMD - Single Program, Multiple Data (same program, different data)
- MIMD - Multiple Instruction, Multiple Data (different programs, different data)
15Features of MPI
- MPI is large. Its extensive functionality requires many functions (126 functions)
- However, MPI can be small too!
- Many parallel programs can be written with just six basic functions
16The Six Basic Functions of MPI
- MPI_INIT (argc, argv)
- Initiate an MPI computation.
- argc, argv are required only in the C language binding, where they are the main program's arguments.
- MPI_FINALIZE ()
- Shut down a computation.
- MPI_COMM_SIZE (comm, size)
- Determine the number of processes in a computation.
- comm - communicator (handle)
- size - number of processes in the group of comm (integer)
17Basic functions continued
- MPI_COMM_RANK (comm, pid)
- Determine the identifier of the current process.
- comm - communicator (handle)
- pid - process id in the group of comm (integer)
18Basic functions continued
- MPI_SEND (buf, count, datatype, dest, tag, comm)
- Send a message.
- buf - address of send buffer (choice)
- count - number of elements to send (integer > 0)
- datatype - datatype of send buffer elements (handle)
- dest - process id of destination process (integer)
- tag - message tag (integer)
- comm - communicator (handle)
19Basic functions continued
- MPI_RECV (buf, count, datatype, source, tag, comm, status)
- Receive a message.
- buf - address of receive buffer (choice)
- count - size of receive buffer, in elements (integer > 0)
- datatype - datatype of receive buffer elements (handle)
- source - process id of source process, or MPI_ANY_SOURCE (integer)
- tag - message tag, or MPI_ANY_TAG (integer)
- comm - communicator (handle)
- status - status object (status)
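- To show how these six functions fit together, here is a minimal point-to-point sketch (not from the original slides; variable names are illustrative) in which process 1 sends a single integer to process 0. It assumes the program is started with at least two processes.

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank, size, value = 42, tag = 0;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 1) {
            /* send one MPI_INT to destination process 0 */
            MPI_Send(&value, 1, MPI_INT, 0, tag, MPI_COMM_WORLD);
        } else if (rank == 0) {
            /* receive one MPI_INT from source process 1 */
            MPI_Recv(&value, 1, MPI_INT, 1, tag, MPI_COMM_WORLD, &status);
            printf("Process 0 received %d from process 1\n", value);
        }

        MPI_Finalize();
        return 0;
    }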
20Sample C Program
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int size, rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        printf("Hello world! I'm %d of %d\n", rank, size);
        MPI_Finalize();
        return 0;
    }
21Sample Output run over four processors
- Hello world! I'm 1 of 4
- Hello world! I'm 3 of 4
- Hello world! I'm 0 of 4
- Hello world! I'm 2 of 4
- Since the printf is a local statement, every processor executes it
22Compiling and Running MPI Programs
- Compiling
- mpicc can be used to compile small programs
- Example: mpicc -o execute myprog.c
- For larger programs, it is ideal to make use of a makefile
- Running
- The MPI standard does not specify how a parallel computation is started
- Example: a typical mechanism could be a command line argument indicating the number of processes that are to be created, for example myprog -n 4, where myprog is the name of the executable
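- As a concrete sketch (the launcher is implementation specific; mpirun and mpiexec are the commands commonly shipped with implementations such as MPICH and Open MPI):

    mpicc -o myprog myprog.c     # compile with the MPI wrapper compiler
    mpirun -np 4 ./myprog        # start 4 processes (mpiexec -n 4 ./myprog is also common)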
23Commentary
- All MPI programs must include the header file mpi.h
- All MPI programs must call MPI_INIT as the first MPI call. This establishes the MPI environment.
- All MPI programs must call MPI_FINALIZE as the last call; this terminates the MPI environment.
- Only one invocation of MPI_INIT can occur in each program
- NO calls to MPI can be made after MPI_FINALIZE is called
- All non-MPI routines are local, e.g. the printf("Hello world! ...") runs on each processor
24References
- H. Garcia-Molina, J.D. Ullman, J. Widom, Database Systems: The Complete Book, Prentice Hall, Upper Saddle River, NJ 07458 (2002), pp. 775-777.
- http://www-unix.mcs.anl.gov/dbpp/text/node8.html#figmulticomputer
- http://www.mcs.anl.gov/mpi/index.html
- http://www.mgnet.org/douglas/ccd-old-classes.html
- http://www-unix.mcs.anl.gov/dbpp/text/node7.html#SECTION02210000000000000000
- http://beige.ucs.indiana.edu/B673/node115.html
25OpenMP
26What is OpenMP?
- OpenMP is a specification for a set of compiler directives, library routines, and environment variables that are used to specify shared memory parallelism.
- Supports Fortran (77, 90, and 95), C, and C++
- MP = Multi Processing
27What is OpenMP? (cont.)
- An Application Program Interface (API) that may be used to explicitly direct multi-threaded, shared memory parallelism
- OpenMP stands for Open specifications for Multi Processing, developed via collaborative work with interested parties from the hardware and software industry, government and academia
28What OpenMP isn't
- A specific language or compiler
- Meant for distributed memory parallel systems (without help)
- Implemented the same by every vendor
- Guaranteed to make the most efficient use of
shared memory
29History of OpenMP
- In the early '90s, vendors of shared-memory machines were making similar programming extensions for Fortran.
- Done in order to specify which loops were to be parallelized in a program.
- The compiler would be responsible for automatically parallelizing such loops across the SMP processors
30History (cont.)
- Implementations were all functionally similar, but were diverging (as usual).
- The vendors saw a need for a standard for shared memory machine programming.
31History (cont.)
- First attempt at a standard was the draft for
ANSI X3H5 in 1994. It was never adopted, largely
due to waning interest as distributed memory
machines became popular
32History of OpenMP (cont.)
- By 1997, newer shared memory machines started to become popular again.
- In the spring of 1997, specifications for OpenMP started, picking up where ANSI X3H5 left off.
33OpenMP Release Dates
- October 1997: Fortran version 1.0
- Late 1998: C/C++ version 1.0
- June 2000: Fortran version 2.0
- April 2002: C/C++ version 2.0
34Why is OpenMP needed?
- Most software vendors have ignored parallel computing.
- There are other approaches, but most have not been very successful
- Sets a standard for shared memory parallel programming.
- It is portable
- Can be used for multiple programs
- Similar to a header file in C
35How does OpenMP work?
36Thread Based Parallelism
- OpenMP is based upon the ability of a shared memory process to consist of multiple threads.
- A process can be divided into several different threads, distributed, then recombined.
- OpenMP takes advantage of this.
37Explicit Parallelism
- Within OpenMP, the parallelism of a program is not automatic.
- It is specified by the programmer
- Gives the programmer full control of how the parallelism is implemented.
- Does not allow the programmer to direct commands at specific processors.
38Fork - Join Model
- OpenMP uses the Fork-Join Model of execution.
- Programs begin as a single process, called the master thread
- When a parallel region is encountered, the master thread forks, creating a team of parallel threads.
- When the threads in the team complete their work, they synchronize, then terminate, joining back into the master thread.
39How is OpenMP used?
40How is OpenMP typically used?
- OpenMP is usually used to parallelize loops
- Find your ugliest, most time consuming loops.
- Split them up between threads.
- Threads communicate by sharing variables
41Compiler Directive Based
- Parallelism in OpenMP is specified by embedding compiler directives into the code.
- These direct the compiler, letting it know that the following code segment is to be processed in parallel.
- There are many compiler directives that specify parallelism in various ways.
42Basic OpenMP protocol
- The same basic protocol must be used to specify
each parallel region in a program.
43Fortran
- The first item of each OpenMP Fortran line is a sentinel.
- In fixed form source, !$OMP, C$OMP, and *$OMP are accepted sentinels.
- In free form source, only !$OMP is accepted
- Example:

    sentinel directive-name [clause ...]
        ! parallel section executed by all threads
        . . .
        ! all threads join master thread and disband
    sentinel end directive
44C
- Each OpenMP directive line in C must start with #pragma omp
- Each parallel region is enclosed inside braces, similar to a function
- Example:

    #pragma omp directive-name [clause ...]   // must be followed by a new-line
    {
        /* parallel section executed by all threads */
        . . .
    }   /* all threads join master thread and disband */
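- As a concrete illustration of this format, here is a minimal sketch (not from the original slides) in which every thread in a parallel region prints a greeting, using the run-time routines omp_get_thread_num and omp_get_num_threads:

    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        /* The parallel directive forks a team of threads;
           the structured block in braces runs on every thread. */
        #pragma omp parallel
        {
            int id = omp_get_thread_num();        /* this thread's id */
            int nthreads = omp_get_num_threads(); /* size of the team */
            printf("Hello from thread %d of %d\n", id, nthreads);
        }   /* threads join the master thread here */
        return 0;
    }

- Compiled with an OpenMP-aware compiler, e.g. gcc -fopenmp hello_omp.c (the filename is illustrative; the flag name varies by compiler)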
45Directive scoping
- The scope of a directive determines which code the directive applies to.
- Static (lexical) extent - the code textually enclosed in the structured block that follows the directive
- Dynamic extent - the static extent plus any routines called from within it
- Orphaned directive - a directive that appears outside the static extent of any enclosing directive
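- A minimal sketch of these terms (illustrative, not from the original slides): the omp for directive inside scale() below is orphaned, the braces after the parallel directive form its static extent, and the call to scale() falls within its dynamic extent.

    #include <stdio.h>
    #include <omp.h>

    /* Orphaned directive: this omp for appears outside the lexical
       (static) extent of the parallel region that eventually runs it. */
    void scale(double *a, int n)
    {
        #pragma omp for
        for (int i = 0; i < n; i++)
            a[i] *= 2.0;
    }

    int main(void)
    {
        double a[8] = {1, 2, 3, 4, 5, 6, 7, 8};

        #pragma omp parallel      /* static extent: the block below        */
        {                         /* dynamic extent: also includes scale() */
            scale(a, 8);
        }
        printf("a[7] = %f\n", a[7]);
        return 0;
    }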
46Examples of Compiler Directives
47PARALLEL Region Construct
- Specifies a block of code that will be executed by multiple threads.
- Fundamental OpenMP parallel construct

    #pragma omp parallel [clause ...]   newline
        if (scalar_expression)
        private (list)
        shared (list)
        default (shared | none)
        firstprivate (list)
        reduction (operator: list)
        copyin (list)

    {
        structured code block
    }
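- A small sketch of a parallel region using some of these clauses (illustrative values, not from the original slides):

    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        int nthreads = 0;   /* written by one thread, listed as shared         */
        int total = 0;      /* combined across threads by the reduction clause */
        int id;             /* private: each thread gets its own copy          */

        #pragma omp parallel private(id) shared(nthreads) reduction(+:total)
        {
            id = omp_get_thread_num();
            if (id == 0)
                nthreads = omp_get_num_threads();  /* master records the team size */
            total += id;                           /* each thread adds its own id  */
        }
        printf("%d threads, sum of thread ids = %d\n", nthreads, total);
        return 0;
    }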
48Work-Sharing Constructs
- The following directives are designed specifically for distributing the execution of the enclosed code among the members of the team that encounter it.
- for
- SECTIONS
- SINGLE
49 for SECTIONS SINGLE
50for
- This specifies that the iterations of the following loop are to be divided among the threads of the team
- Assumes that a parallel region has already been created
- PARALLEL for can be used otherwise
51for example
    #pragma omp for [clause ...]   newline
        schedule (type [, chunk])
        ordered
        private (list)
        firstprivate (list)
        lastprivate (list)
        shared (list)
        reduction (operator: list)
        nowait

    for_loop
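- A small sketch of the for directive inside a parallel region (illustrative values, not from the original slides):

    #include <stdio.h>

    #define N 1000

    int main(void)
    {
        double a[N], sum = 0.0;
        int i;

        #pragma omp parallel shared(a, sum) private(i)
        {
            /* The iterations of each loop below are divided among the
               threads of the team created by the enclosing parallel region. */
            #pragma omp for schedule(static)
            for (i = 0; i < N; i++)
                a[i] = i * 0.5;

            #pragma omp for reduction(+:sum)
            for (i = 0; i < N; i++)
                sum += a[i];
        }
        printf("sum = %f\n", sum);
        return 0;
    }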
52SECTIONS and SINGLE
- SECTIONS divides the enclosed code into separate sections and hands each section to one thread of the team
- SINGLE specifies that only one thread is to execute the following block of code
- These directives can also be combined with the PARALLEL directive
- For examples: http://www.llnl.gov/computing/tutorials/openMP/
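- A brief sketch of SECTIONS and SINGLE inside a parallel region, in the spirit of the LLNL tutorial examples (not from the original slides):

    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        #pragma omp parallel
        {
            /* Each section block is executed once, by some thread in the team. */
            #pragma omp sections
            {
                #pragma omp section
                printf("Section 1 run by thread %d\n", omp_get_thread_num());

                #pragma omp section
                printf("Section 2 run by thread %d\n", omp_get_thread_num());
            }

            /* Only one thread in the team executes the single block. */
            #pragma omp single
            printf("Single block run by thread %d\n", omp_get_thread_num());
        }
        return 0;
    }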
53Synchronization Constructs
- MASTER - code executed only by the master thread
- CRITICAL - code executed by only one thread at a time
- BARRIER - synchronizes all threads in the team; each thread waits at the barrier until all threads have reached it
- ATOMIC - specifies that a specific memory location must be updated atomically, rather than letting multiple threads attempt to write to it. In essence, this directive provides a mini-CRITICAL section.
- Several others.
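- To make these constructs concrete, here is a small sketch (illustrative, not from the original slides) that combines ATOMIC, CRITICAL, BARRIER, and MASTER:

    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        int counter = 0;
        double best = 0.0;

        #pragma omp parallel shared(counter, best)
        {
            double candidate = omp_get_thread_num() * 1.5;  /* arbitrary per-thread value */

            /* ATOMIC: the update of a single memory location is done atomically. */
            #pragma omp atomic
            counter++;

            /* CRITICAL: only one thread at a time executes this block. */
            #pragma omp critical
            {
                if (candidate > best)
                    best = candidate;
            }

            /* BARRIER: no thread continues until all threads have finished the updates above. */
            #pragma omp barrier

            /* MASTER: only the master thread (thread 0) prints the summary. */
            #pragma omp master
            printf("%d threads checked in, best = %f\n", counter, best);
        }
        return 0;
    }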
54References
- www.openmp.org
- www.llnl.gov/computing/tutorials/openMP
- www.openmp.org/presentations/sc99/sc99_tutorial_files/frame.htm