Title: Basics of Parallel Programming and MPI Introduction
1. Lecture 3
- Basics of Parallel Programming and MPI Introduction
2. Parallel Programming
- The activity of constructing a parallel program from a given algorithm.
- Approaches
- Library subroutines - P4, PVM, MPI
- New constructs to handle parallelism - Fortran 90
- Compiler directives or formatted comments added to the language to mark the parallel block - HPF
3. What is a process?
- A process is an entity with 4 components: (P, C, D, S)
- P is a program, or a piece of code, that the process is associated with.
- C is a control state, defined by a set of control variables that indicate where to execute next.
- D is a data state, defined by a set of data variables that hold user data.
- S is a process status: ready, suspended, running, etc.
4. Process Status
5. Execution Mode
- A computer system has two modes: system mode and user mode.
- Kernel processes (the important programs in the OS) always execute in system mode.
- Processes created by user programs run in user mode, unless such a process requests services from the kernel (I/O functions, exception handling).
6. Address Space
- A new address space is created when a new process is created. It has four parts:
- 1. Holds the code of the process; fixed size and not writable.
- 2. Holds states and dynamic data of the process; can grow and shrink.
- 3. Holds activation records of the procedure calls made by the process; can grow and shrink.
- 4. Holds process status and context.
7. Context Switching
- The context of a process is the part of the process that is stored in the processor registers.
- When a context switch occurs, the process's current context is saved and the new one is loaded.
- Events such as a keyboard interrupt, or a status change of a process to suspended, cause context switching.
8. Process Control
- Performed by the kernel using information in process descriptors.
- The kernel checks to ensure that a process only accesses the resources (processor, memory, I/O) it is supposed to.
- The kernel also acts as a scheduler, assigning resources to processes.
9. Resource Sharing
- Resource sharing, handled by the scheduler, can happen in several forms:
- Dedicated mode: not shared; the resource is occupied from start to finish.
- Batch mode: a user process, once scheduled to run, will run to completion.
- Time-sharing mode: multiple processes run simultaneously in an interleaved fashion.
- Preemption mode: a high-priority process can take the processor away from a lower-priority process.
10. Threads
- When a new process is created, hundreds of thousands of clock cycles are wasted because a new address space has to be created.
- In some cases, creating a heavyweight process is not suitable (e.g., when a process holds a large data set).
- To reduce the overhead, a light-weight process, or thread, is used instead.
- Threads can exist sharing an address space within a process.
- Creating a thread is much faster because less memory allocation is required.
11. Single vs. Multiple Programs
- Single-program: users can write one program that will be run on all nodes (an MPI sketch follows this list)
- pid = my_process_id()
- if (pid == 1) A()
- else if (pid == 2) B()
- else if (pid == 3) C()
- Multiple-program: users can write 3 executable programs A, B and C, which are loaded onto three nodes in a shell script
- run A on node 1
- run B on node 2
- run C on node 3
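- As a concrete version of the single-program style above, here is a minimal sketch in C with MPI (the MPI calls are introduced later in this lecture); the routines A(), B(), C() are placeholder stubs:

  #include <stdio.h>
  #include <mpi.h>

  /* Stand-ins for the routines A(), B(), C() named in the outline above */
  static void A(void) { printf("A\n"); }
  static void B(void) { printf("B\n"); }
  static void C(void) { printf("C\n"); }

  int main (int argc, char *argv[]) {
      int pid;
      MPI_Init (&argc, &argv);
      MPI_Comm_rank (MPI_COMM_WORLD, &pid);   /* every node runs this same program */
      if (pid == 1) A();
      else if (pid == 2) B();
      else if (pid == 3) C();
      MPI_Finalize();
      return 0;
  }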
12. Static vs. Dynamic Parallelism
- Static: a program in which the number of processes can be predetermined at compile time. This type of parallelism has less run-time overhead and produces more efficient binary code.
- Dynamic: a program in which processes can be created during run time, and the number of processes to be created is unknown at the beginning. This type of parallelism is more flexible.
13. Fork/Join
- Process A
- begin
- Fork(B)
- X = foo(z)
- Join(B)
- Output(X + Y)
- end
- Process B
- begin
- Y = boo(z)
- end
- A is the parent process and B is the child process.
- When Fork(B) is executed, A and B execute in parallel.
- Join(B) forces A to wait until B terminates before executing the output.
14. Process Interaction
- Communication: passes data among two or more processes (shared variables or message passing).
- Synchronization: causes processes to wait for one another.
- Aggregation: merges partial results computed by the component processes of a parallel program.
15. Synchronous vs. Asynchronous Interaction
- Suppose n processes execute a parallel program that contains an interaction code C.
- If code C cannot be executed until all n processes have reached C, the interaction is called synchronous.
- If a process that reaches C can proceed to execute C without having to wait for the others, the interaction is asynchronous.
- Example: with a synchronous send/receive, a process will not return from the send/receive function until the message is both sent and received.
16. Blocking vs. Non-blocking Interaction
- Blocking: the process has to wait until a certain event happens.
- Example: blocking send - a process can move on only after the message has been sent out (not necessarily received).
- Non-blocking: no wait time.
- Example: non-blocking send - a process can move to the next operation as soon as it has requested the send (the message has not necessarily been sent out).
- This non-blocking scheme implies that the send buffer cannot yet be safely overwritten (see the sketch after this list).
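- In MPI terms, this corresponds to MPI_Isend followed later by MPI_Wait: only after MPI_Wait completes is the send buffer safe to reuse. A minimal sketch, assuming at least two processes and an arbitrary payload and tag:

  #include <mpi.h>

  int main (int argc, char *argv[]) {
      int id, value = 42;
      MPI_Request request;
      MPI_Status status;

      MPI_Init (&argc, &argv);
      MPI_Comm_rank (MPI_COMM_WORLD, &id);
      if (id == 0) {
          MPI_Isend (&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &request);  /* returns at once */
          /* other computation may proceed here, but 'value' must not be overwritten yet */
          MPI_Wait (&request, &status);     /* after this, the send buffer is safe to reuse */
      } else if (id == 1) {
          MPI_Recv (&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
      }
      MPI_Finalize();
      return 0;
  }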
17. Problems in Parallel Programming
- Lost Update Problem
- Temporary Update Problem
- Incorrect Summary Problem
18. Lost Update
- Process 1          Process 2
  Read(A)
                     Read(A)
  A = A - m          Read(B)
  Write(A)           A = A + B
                     Write(A)
- Process 2's final Write(A) overwrites the value written by Process 1, so Process 1's update is lost.
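- The same hazard can be demonstrated with two threads sharing a variable and no locking; this is only an illustration, and the bad interleaving is not guaranteed to occur on any particular run:

  #include <stdio.h>
  #include <pthread.h>

  static int A = 100;

  /* Process 1: A = A - m (m = 10 here) as an unprotected read-modify-write */
  static void *process1(void *arg) { int t = A; t = t - 10; A = t; return NULL; }
  /* Process 2: A = A + B (B = 50 here) as an unprotected read-modify-write */
  static void *process2(void *arg) { int t = A; t = t + 50; A = t; return NULL; }

  int main (void) {
      pthread_t p1, p2;
      pthread_create (&p1, NULL, process1, NULL);
      pthread_create (&p2, NULL, process2, NULL);
      pthread_join (p1, NULL);
      pthread_join (p2, NULL);
      /* Correct result is 140; if both reads happen before either write,
         one update is lost and A ends up as 90 or 150 instead. */
      printf ("A = %d\n", A);
      return 0;
  }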
19. Temporary Update
- When a process, P1, updates data and then fails for some reason, another process, P2, accesses the updated data before the value is put back to its original state.
20. Incorrect Summary
- A process is aggregating values over a set of data while another process is updating some of these data; inconsistency can occur.
- Process 1                          Process 2
  Data A, B, C in x
  Count the number of items in x
                                     Write(new item D into x)
  Read(x)
  Sum(x)  // (A + B + C + D)
  Ave = Sum / Count
- The sum includes the new item D, but the count does not, so the computed average is incorrect.
21. To Solve the Problems
- Divide a parallel program into a set of transactions.
- Each transaction must have the following properties:
- Consistency
- Atomicity
- Durability
- Isolation
- Serializability
22. Concurrency Control
- Locking mechanism
- Timestamp
- Optimistic concurrency control (OCC)
23. Locking
- The data to be accessed is locked so that no other process can gain access to it.
- When the process that holds the lock is through using the data, the data is unlocked.
- A lock can be defined using three variables: Lock(L, C, I).
- L is a Boolean variable that indicates the locked/unlocked state.
- C is a condition on which a process may wait.
- I is an identifier of the process that holds the lock.
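- With POSIX threads, this maps onto a mutex (and, for the wait condition, a condition variable). A minimal sketch that protects the shared variable from the lost-update example, using a hypothetical lock A_lock:

  #include <stdio.h>
  #include <pthread.h>

  static int A = 100;
  static pthread_mutex_t A_lock = PTHREAD_MUTEX_INITIALIZER;

  static void *update(void *arg) {
      pthread_mutex_lock (&A_lock);     /* acquire the lock; other threads must wait */
      A = A - 10;                       /* the read-modify-write is now a critical section */
      pthread_mutex_unlock (&A_lock);   /* unlock when through using the data */
      return NULL;
  }

  int main (void) {
      pthread_t t1, t2;
      pthread_create (&t1, NULL, update, NULL);
      pthread_create (&t2, NULL, update, NULL);
      pthread_join (t1, NULL);
      pthread_join (t2, NULL);
      printf ("A = %d\n", A);           /* always 80: no update is lost */
      return 0;
  }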
24. Timestamp
- Each access to shared data is stamped with a time.
- During an attempt to access data, the current time is compared with the timestamps of the other processes on this data.
- If the timestamp associated with the process that requests an access is less than the timestamps of the other processes, the access is denied.
- A request to write is granted if the data was last read and written by an older process.
- A read request is granted if the data was last written by an older process.
- This scheme introduces no deadlock.
25. OCC
- All processes can access data concurrently, but before any modification is committed, a check is made to see whether any conflicting concurrent access has taken place. If so, the modification is rejected and the data remains in its original state.
- OCC is based on the assumption that the likelihood of two processes accessing the same data is low.
26. Introduction to MPI
27. Message-passing Model
28. Processes
- Number is specified at start-up time
- Remains constant throughout execution of program
- All execute same program
- Each has unique ID number
- Alternately performs computations and communicates
29. Advantages of the Message-passing Model
- Gives the programmer the ability to manage the memory hierarchy
- Portability to many architectures
- Provides a platform- and language-independent standard for message passing
- Simplifies debugging
30. The Message Passing Interface
- Late 1980s: vendors had unique libraries
- 1989: Parallel Virtual Machine (PVM) developed at Oak Ridge National Lab
- 1992: work on the MPI standard begun
- 1994: version 1.0 of the MPI standard
- 1997: version 2.0 of the MPI standard
- Today: MPI is the dominant message-passing library standard
31. Include Files
#include <mpi.h>
#include <stdio.h>   /* Standard I/O header file */
32. Local Variables
int main (int argc, char *argv[]) {
    int i;
    int id;    /* Process rank */
    int p;     /* Number of processes */
- Include argc and argv: they are needed to initialize MPI
- One copy of every variable exists for each process running this program
33. Initialize MPI
MPI_Init (&argc, &argv);
- First MPI function called by each process
- Not necessarily first executable statement
- Allows system to do any necessary setup
34. Communicators
- Communicator: an opaque object that provides the message-passing environment for processes
- MPI_COMM_WORLD
- Default communicator
- Includes all processes
- Possible to create new communicators
- Will do this later
35. Communicator
- (Figure: the MPI_COMM_WORLD communicator containing six processes with ranks 0 through 5)
36. Determine Number of Processes
MPI_Comm_size (MPI_COMM_WORLD, &p);
- First argument is the communicator
- Number of processes returned through the second argument
37. Determine Process Rank
MPI_Comm_rank (MPI_COMM_WORLD, &id);
- First argument is the communicator
- Process rank (in range 0, 1, ..., p-1) returned through the second argument
38. Replication of Automatic Variables
39. Shutting Down MPI
MPI_Finalize();
- Call after all other MPI library calls
- Allows system to free up MPI resources
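- Putting the calls from slides 31-39 together, a minimal sketch of a program that every process runs:

  #include <mpi.h>
  #include <stdio.h>

  int main (int argc, char *argv[]) {
      int id;   /* Process rank */
      int p;    /* Number of processes */

      MPI_Init (&argc, &argv);                /* set up the MPI environment */
      MPI_Comm_size (MPI_COMM_WORLD, &p);     /* how many processes are running */
      MPI_Comm_rank (MPI_COMM_WORLD, &id);    /* which one this process is */

      printf ("Process %d of %d\n", id, p);
      fflush (stdout);

      MPI_Finalize();                         /* free MPI resources; last MPI call */
      return 0;
  }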
40. Point-to-point Communication
- Involves a pair of processes
- One process sends a message
- Other process receives the message
41. Send/Receive Not Collective
42. Function MPI_Send
int MPI_Send (
    void         *message,
    int           count,
    MPI_Datatype  datatype,
    int           dest,
    int           tag,
    MPI_Comm      comm
)
43. Function MPI_Recv
int MPI_Recv (
    void         *message,
    int           count,
    MPI_Datatype  datatype,
    int           source,
    int           tag,
    MPI_Comm      comm,
    MPI_Status   *status
)
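- A small usage sketch of the two functions, with process 1 sending one integer to process 0 (the value and tag are arbitrary illustrative choices; at least two processes are assumed):

  #include <mpi.h>
  #include <stdio.h>

  int main (int argc, char *argv[]) {
      int id, value;
      MPI_Status status;

      MPI_Init (&argc, &argv);
      MPI_Comm_rank (MPI_COMM_WORLD, &id);
      if (id == 1) {
          value = 99;
          MPI_Send (&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);            /* dest = 0, tag = 0 */
      } else if (id == 0) {
          MPI_Recv (&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &status);   /* source = 1, tag = 0 */
          printf ("Process 0 received %d\n", value);
      }
      MPI_Finalize();
      return 0;
  }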
44. Return from MPI_Send
- Function blocks until the message buffer is free
- The message buffer is free when:
- The message has been copied to a system buffer, or
- The message has been transmitted
- Typical scenario:
- Message is copied to a system buffer
- Transmission overlaps computation
45. Return from MPI_Recv
- Function blocks until the message is in the buffer
- If the message never arrives, the function never returns
46. Deadlock
- Deadlock: a process waits for a condition that will never become true
- It is easy to write send/receive code that deadlocks:
- Two processes both receive before sending (see the sketch after this list)
- The send tag doesn't match the receive tag
- A process sends a message to the wrong destination process
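- A minimal sketch of the first case, assuming exactly two processes: both ranks post a blocking MPI_Recv first, so neither ever reaches its MPI_Send. Reversing the order on one rank (send first, then receive) breaks the cycle.

  #include <mpi.h>

  int main (int argc, char *argv[]) {
      int id, other, mine = 1, theirs;
      MPI_Status status;

      MPI_Init (&argc, &argv);
      MPI_Comm_rank (MPI_COMM_WORLD, &id);
      other = 1 - id;                       /* assumes exactly two processes, ranks 0 and 1 */

      /* DEADLOCK: both processes block inside MPI_Recv, so neither ever sends */
      MPI_Recv (&theirs, 1, MPI_INT, other, 0, MPI_COMM_WORLD, &status);
      MPI_Send (&mine,   1, MPI_INT, other, 0, MPI_COMM_WORLD);

      MPI_Finalize();
      return 0;
  }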
47. Compiling MPI Programs
mpicc -O -o foo foo.c
- mpicc: script to compile and link C+MPI programs
- Flags have the same meaning as for the C compiler
- -O: optimize
- -o <file>: where to put the executable
48. Running MPI Programs
- mpirun -help
- mpirun -np <p> <exec> <arg1> ...
- -np <p>: number of processes
- <exec>: executable
- <arg1> ...: command-line arguments
49. Specifying Host Processors
- mpirun -p4pg <pgfile> <exec> <arg1> ...
- <pgfile> lists the host processors in order of their use (put it in the home directory)
- Example of a pgfile's contents:
- olab1 0 /home/tiranee/myprog
- olab2 1 /home/tiranee/myprog
- olab3 2 /home/tiranee/myprog
- olab1 1 /home/tiranee/myprog
50. Enabling Remote Logins
- MPI needs to be able to initiate processes on other processors without supplying a password
- Each processor in the group must list all other processors in its .rhosts file, e.g., as sketched below
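- For illustration only (reusing the hostnames and username from the pgfile example above), the .rhosts file on olab1 might contain one "hostname username" entry per remote host:
  olab2 tiranee
  olab3 tiranee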
51. Deciphering Output
- Output order only partially reflects the order of output events inside the parallel computer
- If process A prints two messages, the first message will appear before the second
- If process A calls printf before process B, there is no guarantee that process A's message will appear before process B's message
- Try to put fflush(stdout) after every printf()
52. Lab 1
- Write a "Hello World" program in which each process in the group sends a "Hello world" message to process 0, and process 0 prints out all the messages.