Title: Introduction to algorithms and applications
1. Course Outline
- Introduction to algorithms and applications
- Parallel machines and architectures
- Programming methods, languages, and environments
- Message passing (SR, MPI, Java)
- Higher-level languages: HPF
- Applications
- N-body problems, search algorithms, bioinformatics
- Grid computing
- Multimedia content analysis on Grids
- (guest lecture Frank Seinstra, 4 December)
2. Approaches to Parallel Programming
- Sequential language + library
- MPI, PVM
- Extend sequential language
- C/Linda, Concurrent C
- New languages designed for parallel or distributed programming
- SR, occam, Ada, Orca
3. Paradigms for Parallel Programming
- Processes + shared variables: -
- Processes + message passing: SR and MPI
- Concurrent object-oriented languages: Java
- Concurrent functional languages: -
- Concurrent logic languages: -
- Data-parallelism (SPMD model): HPF
4. Interprocess Communication and Synchronization based on Message Passing (Henri Bal)
5. Overview
- Message passing
- General issues
- Examples: rendezvous, Remote Procedure Calls, broadcast
- Nondeterminism
- Select statement
- Example language: SR (Synchronizing Resources)
- Traveling Salesman Problem in SR
- Example library: MPI (Message Passing Interface)
6. Point-to-point Message Passing
- Basic primitives: send and receive
- As library routines
- send(destination, MsgBuffer)
- receive(source, MsgBuffer)
- As language constructs
- send MsgName(arguments) to destination
- receive MsgName(arguments) from source
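To make the library-routine form concrete, here is a minimal sketch in C with MPI (the example library covered later in this lecture); the message tag 0 and the value 42 are arbitrary choices:

    /* process 0 sends an integer to process 1 */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, value;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        if (rank == 0) {
            value = 42;
            /* send(destination, MsgBuffer) */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            /* receive(source, MsgBuffer) */
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("received %d\n", value);
        }
        MPI_Finalize();
        return 0;
    }

Run with two processes, e.g. mpirun -np 2 ./a.out.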
7. Issues in Message Passing
- Naming the sender and receiver
- Explicit or implicit receipt of messages
- Synchronous versus asynchronous messages
8. Direct naming
- Sender and receiver directly name each other
- S: send M to R
- R: receive M from S
- Asymmetric direct naming (more flexible)
- S: send M to R
- R: receive M
- Direct naming is easy to implement
- Destination of message is known in advance
- Implementation just maps logical names to machine addresses
9. Indirect naming
- Indirect naming uses an extra indirection level
- S: send M to P -- P is a port name
- R: receive M from P
- Sender and receiver need not know each other
- Port names can be moved around (e.g., in a message)
- send ReplyPort(P) to U -- P is name of reply port
- Most languages allow only a single process at a time to receive from any given port
- Some languages allow multiple receivers that service messages on demand -> called a mailbox
10. Explicit Message Receipt
- Explicit receive by an existing process
- Receiving process only handles the message when it is willing to do so

    process main()
        // regular computations here
        receive M(...)      // explicit message receipt
        // code to handle the message
        // more regular computations
11. Implicit message receipt
- Receipt by a new thread of control, created for handling the incoming message

    int X

    process main()
        // just regular computations, this code can access X

    message-handler M()
        // created whenever a message M arrives
        // code to handle the message, can also access X
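To illustrate the mechanism, here is a sketch in C with POSIX threads; the dispatcher loop in main() stands in for the language runtime, and next_message() is a purely hypothetical stand-in for the transport layer:

    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int X = 42;                       /* global, shared with all handlers */

    static int next_message(void)     /* hypothetical transport layer */
    {
        static int n = 0;
        return n < 3 ? n++ : -1;      /* deliver three messages, then stop */
    }

    static void *handler_M(void *arg) /* one thread per incoming message M */
    {
        int msg = *(int *)arg;
        free(arg);
        printf("handling M(%d); can also access X = %d\n", msg, X);
        return NULL;
    }

    int main(void)
    {
        int m;
        while ((m = next_message()) >= 0) {
            int *arg = malloc(sizeof *arg);
            *arg = m;
            pthread_t t;
            pthread_create(&t, NULL, handler_M, arg);  /* new thread of control */
            pthread_detach(t);
        }
        sleep(1);                     /* crude: let detached handlers finish */
        return 0;
    }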
12. Threads
- Threads run in (pseudo-)parallel on the same node
- Each thread has its own program counter and local variables
- Threads share global variables

[Figure: timeline of the main thread and two M handler threads, all sharing the global variable X]
13. Differences (1)
- Implicit receipt is used if it is unknown when a message will arrive; example: a request for remote data

Explicit receipt:

    process main()
        int X
        while (true)
            if (there is a message readX)
                receive readX(S)
                send valueX(X) to S
            // regular computations

Implicit receipt:

    int X

    process main()
        // regular computations

    message-handler readX(S)
        send valueX(X) to S
14. Differences (2)
- Explicit receive gives more control over when to accept which messages; e.g., SR allows:

    receive ReadFile(file, offset, NrBytes) by NrBytes
    // sorts messages by (increasing) 3rd parameter, i.e., small reads go first

- MPI has explicit receive (+ polling for implicit receive)
- Java has implicit receive: Remote Method Invocation (RMI)
- SR has both
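In MPI terms, a process can approximate implicit receipt by polling with MPI_Iprobe while retaining explicit control over when messages are accepted; a sketch (the tag 0 and the message format are arbitrary choices):

    #include <mpi.h>

    void compute_and_serve(void)
    {
        int flag;
        MPI_Status status;
        for (;;) {
            /* poll: is there a message with tag 0 from anyone? */
            MPI_Iprobe(MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &flag, &status);
            if (flag) {
                int x;
                MPI_Recv(&x, 1, MPI_INT, status.MPI_SOURCE, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                /* code to handle the message */
            }
            /* regular computations */
        }
    }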
15. Synchronous vs. asynchronous Message Passing
- Synchronous message passing
- Sender is blocked until receiver has accepted the message
- Too restrictive for many parallel applications
- Asynchronous message passing
- Sender continues immediately
- More efficient
- Ordering problems
- Buffering problems
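MPI exposes both flavors, which makes the trade-off visible in code; a minimal sketch (tags 0 and 1 are arbitrary):

    #include <mpi.h>

    void send_both_ways(int dest, int *buf)
    {
        /* synchronous: completes only after the receiver has started
           accepting the message */
        MPI_Ssend(buf, 1, MPI_INT, dest, 0, MPI_COMM_WORLD);

        /* asynchronous: returns immediately; buf must not be reused
           until the request completes */
        MPI_Request req;
        MPI_Isend(buf, 1, MPI_INT, dest, 1, MPI_COMM_WORLD, &req);
        /* ... other work (ordering and buffering now become issues) ... */
        MPI_Wait(&req, MPI_STATUS_IGNORE);
    }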
16. Message ordering
- Ordering with asynchronous message passing:

    SENDER:             RECEIVER:
    send message(1)     receive message(N); print N
    send message(2)     receive message(M); print M

- Messages may be received in any order, depending on the protocol

[Figure: message(1) and message(2) in transit; the network may deliver them in either order]
17. Example: AT&T crash

[Figure: two telephone switches; one asks "Are you still alive?", the other responds "Something's wrong, I'd better crash!"]
18. Message buffering
- Keep messages in a buffer until the receive() is done
- What if the buffer overflows?
- Continue, but delete some messages (e.g., the oldest one), or
- Use flow control: block the sender temporarily
- Flow control changes the semantics, since it introduces synchronization:
- S: send zillion messages to R; receive messages
- R: send zillion messages to S; receive messages
- -> deadlock!
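The same deadlock shows up in MPI when the standard-mode send runs out of system buffer space; a sketch of the unsafe pattern, with the message size N chosen large for illustration:

    #include <mpi.h>

    #define N (1 << 24)            /* large enough to exhaust buffering */

    /* called by both of two processes, each with the other as peer */
    void unsafe_exchange(int peer, double *sendbuf, double *recvbuf)
    {
        /* may block once buffer space is gone... */
        MPI_Send(sendbuf, N, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD);
        /* ...so neither process ever reaches its receive: deadlock */
        MPI_Recv(recvbuf, N, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    }

Using MPI_Sendrecv, or posting the receives before the sends, avoids this pattern.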
19. Example communication primitives
- Rendezvous (Ada)
- Remote Procedure Call (RPC)
- Broadcast
20. Rendezvous (Ada)
- Two-way interaction
- Synchronous (blocking) send
- Explicit receive
- Output parameters sent back to caller
- Entry = procedure implemented by a task that can be called remotely
21. Example

    task SERVER is
        entry INCREMENT(X : integer; Y : out integer);
    end;

Entry call:

    S.INCREMENT(2, A);    -- invoke entry of task S
22. Accept statement

    task body SERVER is
    begin
        accept INCREMENT(X : integer; Y : out integer) do
            Y := X + 1;    -- handle entry call
        end;
        ...
    end;

- Entry call is fully synchronous
- Invoker waits until server is ready to accept
- Accept statement waits for entry call
- Caller proceeds after accept statement has been executed
23. Remote Procedure Call (RPC)
- Similar to traditional procedure call
- Caller and receiver are different processes
- Possibly on different machines
- Fully synchronous
- Sender waits for RPC to complete
- Implicit message receipt
- New thread of control within receiver
24. Broadcast
- Many networks (e.g., Ethernet) support:
- broadcast: send message to all machines
- multicast: send message to a set of machines
- Hardware multicast is very efficient
- Ethernet: same delay as for a unicast
- Multicast can be made reliable using software protocols
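In MPI, broadcast is available as a collective library call; a minimal sketch in which process 0 is (arbitrarily) chosen as the root:

    #include <mpi.h>

    void share_value(int *value)
    {
        /* every process in the communicator calls MPI_Bcast;
           the root's buffer is copied to all other processes */
        MPI_Bcast(value, 1, MPI_INT, 0, MPI_COMM_WORLD);
    }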
25. Nondeterminism
- Interactions may depend on run-time conditions
- e.g., wait for a message from either A or B, whichever comes first
- Need to express and control nondeterminism
- specify when to accept which message
- Example (bounded buffer):
- do simultaneously:
- when buffer not full: accept request to store message
- when buffer not empty: accept request to fetch message
26. Select statement
- Several alternatives of the form:

    WHEN condition => ACCEPT message DO statement

- Each alternative may:
- succeed, if condition = true and a message is available
- fail, if condition = false
- suspend, if condition = true but no message is available yet
- The entire select statement may:
- succeed, if any alternative succeeds -> pick one nondeterministically
- fail, if all alternatives fail
- suspend, if some alternatives suspend and none succeeds yet
27. Example: bounded buffer in Ada

    select
        when not FULL(BUFFER) =>
            accept STORE_ITEM(X : INTEGER) do
                -- store X in buffer
            end;
    or
        when not EMPTY(BUFFER) =>
            accept FETCH_ITEM(X : out INTEGER) do
                -- X := first item from buffer
            end;
    end select;
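A library like MPI has no select statement, but the same guarded, nondeterministic choice can be approximated by polling each alternative whose condition holds; in this sketch the tags TAG_STORE and TAG_FETCH and the request protocol are illustrative assumptions:

    #include <mpi.h>

    #define TAG_STORE 1
    #define TAG_FETCH 2
    #define CAP 128

    void buffer_process(void)
    {
        int items[CAP], head = 0, count = 0, flag;
        MPI_Status st;
        for (;;) {
            /* when buffer not full => accept a store request */
            if (count < CAP) {
                MPI_Iprobe(MPI_ANY_SOURCE, TAG_STORE, MPI_COMM_WORLD,
                           &flag, &st);
                if (flag) {
                    int x;
                    MPI_Recv(&x, 1, MPI_INT, st.MPI_SOURCE, TAG_STORE,
                             MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                    items[(head + count++) % CAP] = x;
                }
            }
            /* when buffer not empty => accept a fetch request */
            if (count > 0) {
                MPI_Iprobe(MPI_ANY_SOURCE, TAG_FETCH, MPI_COMM_WORLD,
                           &flag, &st);
                if (flag) {
                    int dummy;   /* empty request message */
                    MPI_Recv(&dummy, 1, MPI_INT, st.MPI_SOURCE, TAG_FETCH,
                             MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                    MPI_Send(&items[head], 1, MPI_INT, st.MPI_SOURCE,
                             TAG_FETCH, MPI_COMM_WORLD);
                    head = (head + 1) % CAP;
                    count--;
                }
            }
        }
    }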
28. Synchronizing Resources (SR)
- Developed at the University of Arizona
- Goals of SR
- Expressiveness
- Many message passing primitives
- Ease of use
- Minimize number of underlying concepts
- Clean integration of language constructs
- Efficiency
- Each primitive must be efficient
29. Overview of SR
- Multiple forms of message passing
- Asynchronous message passing
- Rendezvous (explicit receipt)
- Remote Procedure Call (implicit receipt)
- Multicast
- Powerful receive statement
- Conditional ordered receive, based on contents of message
- Select statement
- Resource = module run on 1 node (uni/multiprocessor)
- Contains multiple threads that share variables
30. Orthogonality in SR
- The send and receive primitives can be combined in all 4 possible ways
31. Example

    body S    # sender
        send R.m1()    # asynchronous message passing
        send R.m2()    # fork
        call R.m1()    # rendezvous
        call R.m2()    # RPC
    end S

    body R    # receiver
        proc m2()      # implicit receipt
            # code to handle m2
        end
        initial        # main process of R
            do true ->          # infinite loop
                in m1() ->      # explicit receive
                    # code to handle m1
                ni
            od
        end
    end R
32. Traveling Salesman Problem (TSP) in SR
- Find the shortest route for a salesman among a given set of cities
- Each city must be visited once, no return to the initial city

[Figure: example graph with the cities New York, Chicago, Saint Louis, and Miami, labeled with inter-city distances 1-4]
33. Sequential branch-and-bound
- Structure the entire search space as a tree, sorted using the nearest-city-first heuristic
34. Pruning the search tree
- Keep track of the best solution found so far (the bound)
- Cut off partial routes >= bound
35. Parallelizing TSP
- Distribute the search tree over the CPUs
- CPUs analyze different routes
- Results in reasonably large-grain jobs
36. Distribution of TSP search tree
Subtasks:
- New York -> Chicago
- New York -> Saint Louis
- New York -> Miami

[Figure: the subtree below each subtask is assigned to CPU 1, CPU 2, and CPU 3, respectively]
37. Distribution of the tree (2)
- Static distribution: each CPU gets a fixed part of the tree
- Load balancing problem: subtrees take different amounts of time
38. Dynamic distribution: Replicated Workers Model
- Master process generates a large number of jobs (subtrees) and repeatedly hands them out
- Worker processes (subcontractors) repeatedly take work and execute it
- 1 worker per processor
- General, frequently used model for parallel processing
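A sketch of the replicated workers model in C with MPI, assuming the master is process 0; the tags and the encoding of a job as a single integer are illustrative:

    #include <mpi.h>

    #define TAG_REQUEST 1
    #define TAG_JOB     2
    #define TAG_STOP    3

    void master(int njobs, int nworkers)
    {
        MPI_Status st;
        int dummy;
        for (int job = 0; job < njobs; job++) {   /* hand out all jobs */
            MPI_Recv(&dummy, 1, MPI_INT, MPI_ANY_SOURCE, TAG_REQUEST,
                     MPI_COMM_WORLD, &st);
            MPI_Send(&job, 1, MPI_INT, st.MPI_SOURCE, TAG_JOB,
                     MPI_COMM_WORLD);
        }
        for (int w = 0; w < nworkers; w++) {      /* then stop every worker */
            MPI_Recv(&dummy, 1, MPI_INT, MPI_ANY_SOURCE, TAG_REQUEST,
                     MPI_COMM_WORLD, &st);
            MPI_Send(&dummy, 0, MPI_INT, st.MPI_SOURCE, TAG_STOP,
                     MPI_COMM_WORLD);
        }
    }

    void worker(void)
    {
        MPI_Status st;
        int job, dummy = 0;
        for (;;) {
            MPI_Send(&dummy, 1, MPI_INT, 0, TAG_REQUEST, MPI_COMM_WORLD);
            MPI_Recv(&job, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
            if (st.MPI_TAG == TAG_STOP)
                break;
            /* execute the job, e.g. search one TSP subtree */
        }
    }

This gives automatic load balancing: a fast worker simply requests jobs more often.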
39. Implementing TSP in SR
- Need communication to distribute work
- Need communication to implement global bound
40. Distributing work
- Master generates jobs to be executed by workers
- Not known in advance which worker will execute which job
- A mailbox (port with >1 receiver) would have helped
- Use an intermediate buffer process instead

[Figure: the Master sends jobs to a buffer process, from which the workers fetch them]
41. Implementing the global bound
- Problem: the bound is a global variable, but it must be implemented with message passing
- The bound is accessed millions of times, but updated only when a better route is found
- The only efficient solution is to replicate it manually
42. Managing a replicated variable in SR
- Use a BoundManager process to serialize updates
- Assign: asynchronous, explicit ordered receive
- Update: synchronous, implicit receive, multicast

[Figure: a worker that executes M := 3 sends Assign to the BoundManager, which multicasts the update to every worker's copy of M]
43. SR code fragments for TSP

    body worker
        var M : int := Infinite                  # copy of bound
        sem sema                                 # semaphore
        proc update(value : int)
            P(sema)                              # lock copy
            M := value
            V(sema)                              # unlock
        end update
        initial
            # main code for worker:
            # - can read M (using sema)
            # - can use: send BoundManager.Assign(value)
        end
    end worker

    body BoundManager
        var M : int := Infinite
        do true ->                               # handle requests 1 by 1
            in Assign(value) by value ->         # smallest value first
                if value < M ->
                    M := value
                    co (i := 1 to ncpus)         # multicast
                        call worker[i].update(value)
                    oc
                fi
            ni
        od
    end BoundManager
44. Search overhead
- Problem: the path with length 6 has not yet been computed by CPU 1 when CPU 3 starts on n -> m -> s
- The parallel algorithm does more work than the sequential algorithm: search overhead

[Figure: the subtrees searched by CPU 1, CPU 2, and CPU 3]
45. Performance of TSP in SR
- Communication overhead
- Distribution of jobs + updating the global bound (small overhead)
- Load imbalances
- Replicated workers model has automatic load balancing
- Synchronization overhead
- Mutual exclusion (locking) needed for accessing copy of bound
- Search overhead
- Main performance problem
- In practice, high speedups possible