Introduction in algorithms and applications

1
Course Outline
  • Introduction in algorithms and applications
  • Parallel machines and architectures
  • Programming methods, languages, and environments
  • Message passing (SR, MPI, Java)
  • Higher-level languages: HPF
  • Applications
  • N-body problems, search algorithms,
    bioinformatics
  • Grid computing
  • Multimedia content analysis on Grids
  • (guest lecture Frank Seinstra, 4 December)

2
Approaches to Parallel Programming
  • Sequential language + library
  • MPI, PVM
  • Extend sequential language
  • C/Linda, Concurrent C
  • New languages designed for parallel or
    distributed programming
  • SR, occam, Ada, Orca

3
Paradigms for Parallel Programming
  • Processes + shared variables              -
  • Processes + message passing               SR and MPI
  • Concurrent object-oriented languages      Java
  • Concurrent functional languages           -
  • Concurrent logic languages                -
  • Data-parallelism (SPMD model)             HPF

4
Interprocess Communication and Synchronization based on Message Passing
Henri Bal
5
Overview
  • Message passing
  • General issues
  • Examples: rendezvous, Remote Procedure Calls,
    Broadcast
  • Nondeterminism
  • Select statement
  • Example language: SR (Synchronizing Resources)
  • Traveling Salesman Problem in SR
  • Example library: MPI (Message Passing Interface)

6
Point-to-point Message Passing
  • Basic primitives: send and receive
  • As library routines
  • send(destination, MsgBuffer)
  • receive(source, MsgBuffer)
  • As language constructs
  • send MsgName(arguments) to destination
  • receive MsgName(arguments) from source
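
In C with the MPI library, the send/receive pair from the library-routine
style above looks roughly as follows; this is a minimal sketch, and the
ranks, tag, and value are illustrative:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {                 /* sender */
        value = 42;
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {          /* receiver */
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("received %d\n", value);
    }
    MPI_Finalize();
    return 0;
}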

7
Issues in Message Passing
  • Naming the sender and receiver
  • Explicit or implicit receipt of messages
  • Synchronous versus asynchronous messages

8
Direct naming
  • Sender and receiver directly name each other
  • S: send M to R
  • R: receive M from S
  • Asymmetric direct naming (more flexible)
  • S: send M to R
  • R: receive M
  • Direct naming is easy to implement
  • Destination of message is known in advance
  • Implementation just maps logical names to machine
    addresses

9
Indirect naming
  • Indirect naming uses extra indirection level
  • S: send M to P -- P is a port name
  • R: receive M from P
  • Sender and receiver need not know each other
  • Port names can be moved around (e.g., in a
    message)
  • send ReplyPort(P) to U -- P is name of reply
    port
  • Most languages allow only a single process at a
    time to receive from any given port
  • Some languages allow multiple receivers that
    service messages on demand -> called a mailbox

10
Explicit Message Receipt
  • Explicit receive by an existing process
  • Receiving process only handles message when it is
    willing to do so

process main()
    // regular computation here
    receive M(...)           // explicit message receipt
    // code to handle message
    // more regular computations ...
11
Implicit message receipt
  • Receipt by a new thread of control, created for
    handling the incoming message

int X

process main()
    // just regular computations, this code can access X

message-handler M()
    // created whenever a message M arrives
    // code to handle the message, can also access X
12
Threads
  • Threads run in (pseudo-) parallel on the same
    node
  • Each thread has its own program counter and local
    variables
  • Threads share global variables

[Figure: timeline of the main thread and two M handler threads, all sharing variable X]
13
Differences (1)
  • Implicit receipt is used if it's unknown when a
    message will arrive; example: a request for
    remote data

// explicit receipt:
process main()
    int X
    while (true)
        if (there is a message readX)
            receive readX(S)
            send valueX(X) to S
        // regular computations

// implicit receipt:
int X
process main()
    // regular computations
message-handler readX(S)
    send valueX(X) to S
14
Differences (2)
  • Explicit receive gives more control over when to
    accept which messages; e.g., SR allows
  • receive ReadFile(file, offset, NrBytes) by NrBytes
  • // sorts messages by (increasing) 3rd
    parameter, i.e. small reads go first
  • MPI has explicit receive (+ polling for implicit
    receive; see the sketch below)
  • Java has implicit receive: Remote Method
    Invocation (RMI)
  • SR has both
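
The MPI bullet above can be made concrete: implicit receipt can be
approximated on top of MPI's explicit receive by polling with MPI_Iprobe.
A minimal sketch; the message type and handler are illustrative:

#include <mpi.h>

/* Call this periodically from the regular computation loop. */
void poll_for_messages(void)
{
    int flag;
    MPI_Status st;
    MPI_Iprobe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &flag, &st);
    if (flag) {                      /* a message is pending */
        int x;
        MPI_Recv(&x, 1, MPI_INT, st.MPI_SOURCE, st.MPI_TAG,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        /* ... handle the message ... */
    }
}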

15
Synchronous vs. asynchronous Message Passing
  • Synchronous message passing
  • Sender is blocked until receiver has accepted the
    message
  • Too restrictive for many parallel applications
  • Asynchronous message passing
  • Sender continues immediately
  • More efficient
  • Ordering problems
  • Buffering problems
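
In MPI, the two flavors map onto different send primitives. A minimal
sketch, where the buffer, count, destination, and tags are illustrative:

#include <mpi.h>

void send_both_ways(int *buf, int n, int dest)
{
    /* Synchronous: blocks until the receiver has started to receive. */
    MPI_Ssend(buf, n, MPI_INT, dest, 0, MPI_COMM_WORLD);

    /* Asynchronous (non-blocking): returns immediately. */
    MPI_Request req;
    MPI_Isend(buf, n, MPI_INT, dest, 1, MPI_COMM_WORLD, &req);
    /* ... continue computing, overlapping with communication ... */
    MPI_Wait(&req, MPI_STATUS_IGNORE);   /* buf may be reused only now */
}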

16
Message ordering
  • Ordering with asynchronous message passing
  • SENDER:                  RECEIVER:
  • send message(1)          receive message(N); print N
  • send message(2)          receive message(M); print M
  • Messages may be received in any order, depending
    on the protocol

17
Example: AT&T crash
Are you still alive?
Something's wrong, I'd better crash!
18
Message buffering
  • Keep messages in a buffer until the receive( ) is
    done
  • What if the buffer overflows?
  • Continue, but delete some messages (e.g., oldest
    one), or
  • Use flow control: block the sender temporarily
  • Flow control changes the semantics since it
    introduces synchronization
  • S: send zillion messages to R; receive messages
  • R: send zillion messages to S; receive messages
  • -> deadlock!
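
The same deadlock shows up in MPI when both sides block in their sends
before reaching their receives; a combined send-receive is one safe
alternative. A sketch with illustrative buffers:

#include <mpi.h>

void exchange(int other, int *out, int *in, int n)
{
    /* Deadlock-prone if both sides block in the send:
     *   MPI_Ssend(out, n, MPI_INT, other, 0, MPI_COMM_WORLD);
     *   MPI_Recv(in, n, MPI_INT, other, 0, MPI_COMM_WORLD,
     *            MPI_STATUS_IGNORE);
     * Safe alternative: let MPI pair up the send and the receive. */
    MPI_Sendrecv(out, n, MPI_INT, other, 0,
                 in,  n, MPI_INT, other, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}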

19
Example communication primitives
  • Rendezvous (Ada)
  • Remote Procedure Call (RPC)
  • Broadcast

20
Rendezvous (Ada)
  • Two-way interaction
  • Synchronous (blocking) send
  • Explicit receive
  • Output parameters sent back to caller
  • Entry = procedure implemented by a task that can
    be called remotely

21
Example
task SERVER is
    entry INCREMENT(X : integer; Y : out integer);
end;

S.INCREMENT(2, A);   -- entry call: invoke entry of task S
22
Accept statement
task body SERVER is
begin
    accept INCREMENT(X : integer; Y : out integer) do
        Y := X + 1;   -- handle entry call
    end;
    ...
end;
  • Entry call is fully synchronous
  • Invoker waits until server is ready to accept
  • Accept statement waits for entry call
  • Caller proceeds after accept statement has been
    executed

23
Remote Procedure Call (RPC)
  • Similar to traditional procedure call
  • Caller and receiver are different processes
  • Possibly on different machines
  • Fully synchronous
  • Sender waits for RPC to complete
  • Implicit message receipt
  • New thread of control within receiver

24
Broadcast
  • Many networks (e.g., Ethernet) support:
  • broadcast: send message to all machines
  • multicast: send message to a set of machines
  • Hardware multicast is very efficient
  • Ethernet: same delay as for a unicast
  • Multicast can be made reliable using software
    protocols
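
In MPI, broadcast is a collective call executed by every process in the
communicator. A minimal sketch; the root rank and value are illustrative:

#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, value = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0)
        value = 42;                  /* root supplies the data */
    MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);
    /* every process now holds value == 42 */
    MPI_Finalize();
    return 0;
}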

25
Nondeterminism
  • Interactions may depend on run-time conditions
  • e.g. wait for a message from either A or B,
    whichever comes first
  • Need to express and control nondeterminism
  • specify when to accept which message
  • Example (bounded buffer)
  • do simultaneously
  • when buffer not full => accept request to store
    message
  • when buffer not empty => accept request to fetch
    message

26
Select statement
  • several alternatives of the form
  • WHEN condition => ACCEPT message DO statement
  • Each alternative may
  • succeed, if condition = true and a message is
    available
  • fail, if condition = false
  • suspend, if condition = true but no message is
    available yet
  • Entire select statement may
  • succeed, if any alternative succeeds -> pick one
    nondeterministically
  • fail, if all alternatives fail
  • suspend, if some alternatives suspend and none
    succeeds yet

27
Example bounded buffer in Ada
select
    when not FULL(BUFFER) =>
        accept STORE_ITEM(X : INTEGER) do
            -- store X in buffer
        end;
or
    when not EMPTY(BUFFER) =>
        accept FETCH_ITEM(X : out INTEGER) do
            X := first item from buffer;
        end;
end select;
28
Synchronizing Resources (SR)
  • Developed at University of Arizona
  • Goals of SR
  • Expressiveness
  • Many message passing primitives
  • Ease of use
  • Minimize number of underlying concepts
  • Clean integration of language constructs
  • Efficiency
  • Each primitive must be efficient

29
Overview of SR
  • Multiple forms of message passing
  • Asynchronous message passing
  • Rendezvous (explicit receipt)
  • Remote Procedure Call (implicit receipt)
  • Multicast
  • Powerful receive-statement
  • Conditional and ordered receive, based on contents
    of message
  • Select statement
  • Resource = module run on 1 node
    (uni/multiprocessor)
  • Contains multiple threads that share variables

30
Orthogonality in SR
  • The send and receive primitives can be combined
    in all 4 possible ways
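
The four combinations, reconstructed from the example on the next slide:

                          explicit receipt ("in")        implicit receipt ("proc")
  synchronous ("call")    rendezvous                     remote procedure call
  asynchronous ("send")   asynchronous message passing   fork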

31
Example
body S                  # sender
    send R.m1()         # asynchr. message passing
    send R.m2()         # fork
    call R.m1()         # rendezvous
    call R.m2()         # RPC
end S

body R                  # receiver
    proc m2()           # implicit receipt
        # code to handle m2
    end
    initial             # main process of R
        do true ->      # infinite loop
            in m1() ->  # explicit receipt
                # code to handle m1
            ni
        od
    end
end R
32
Traveling Salesman Problem (TSP) in SR
  • Find shortest route for salesman among given set
    of cities
  • Each city must be visited once, no return to
    initial city

[Figure: map with the cities New York, Chicago, Saint Louis, and Miami and the distances between them]
33
Sequential branch-and-bound
  • Structure the entire search space as a tree,
    sorted using nearest-city first heuristic

34
Pruning the search tree
  • Keep track of best solution found so far (the
    'bound')
  • Cut off partial routes > bound (see the sketch
    below)
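
A minimal sequential sketch of this pruning test in C, assuming a global
distance matrix, a bitmask of visited cities, and a global bound; all
names are illustrative:

#include <limits.h>

#define N 4                    /* number of cities (illustrative) */
int dist[N][N];                /* distance matrix */
int bound = INT_MAX;           /* length of best complete route so far */

void search(int city, int length, int visited)
{
    if (length >= bound)                 /* partial route already as long */
        return;                          /* as the bound: prune subtree   */
    if (visited == (1 << N) - 1) {       /* all cities visited: new best */
        bound = length;
        return;
    }
    for (int next = 0; next < N; next++)
        if (!(visited & (1 << next)))
            search(next, length + dist[city][next], visited | (1 << next));
}
/* initial call: search(0, 0, 1) starts the tour at city 0 */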

35
Parallelizing TSP
  • Distribute the search tree over the CPUs
  • CPUs analyze different routes
  • Results in reasonably large-grain jobs

36
Distribution of TSP search tree
Subtasks:
  - New York -> Chicago
  - New York -> Saint Louis
  - New York -> Miami

[Figure: the three subtasks assigned to CPU 1, CPU 2, and CPU 3]
37
Distribution of the tree (2)
  • Static distribution: each CPU gets a fixed part
    of the tree
  • Load balancing problem: subtrees take different
    amounts of time

[Figure: search tree with edge lengths, statically divided over the CPUs; the subtrees differ in size]
38
Dynamic distribution Replicated Workers Model
  • Master process generates large number of jobs
    (subtrees) and repeatedly hands them out
  • Worker processes (subcontractors) repeatedly take
    work and execute it
  • 1 worker per processor
  • General, frequently-used model for parallel
    processing
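
A sketch of this replicated-workers pattern in C with MPI, with the
master on rank 0; jobs are plain integers here and the termination
protocol is simplified, so the tags and helpers are illustrative:

#include <mpi.h>

enum { REQ_TAG, JOB_TAG, STOP_TAG };

void master(int nworkers, int njobs)
{
    int worker, job;
    for (job = 0; job < njobs; job++) {  /* hand out jobs on demand */
        MPI_Recv(&worker, 1, MPI_INT, MPI_ANY_SOURCE, REQ_TAG,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Send(&job, 1, MPI_INT, worker, JOB_TAG, MPI_COMM_WORLD);
    }
    for (int w = 0; w < nworkers; w++) { /* tell every worker to stop */
        MPI_Recv(&worker, 1, MPI_INT, MPI_ANY_SOURCE, REQ_TAG,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Send(&job, 0, MPI_INT, worker, STOP_TAG, MPI_COMM_WORLD);
    }
}

void worker(int rank)
{
    for (;;) {
        int job;
        MPI_Status st;
        MPI_Send(&rank, 1, MPI_INT, 0, REQ_TAG, MPI_COMM_WORLD);
        MPI_Recv(&job, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
        if (st.MPI_TAG == STOP_TAG)
            break;
        /* ... expand the search subtree for this job ... */
    }
}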

39
Implementing TSP in SR
  • Need communication to distribute work
  • Need communication to implement global bound

40
Distributing work
  • Master generates jobs to be executed by workers
  • Not known in advance which worker will execute
    which job
  • A mailbox (port with >1 receiver) would have
    helped
  • Use intermediate buffer process instead

[Figure: Master hands jobs to a buffer process, from which the workers fetch work]
41
Implementing the global bound
  • Problem: the bound is a global variable, but it
    must be implemented with message passing
  • The bound is accessed millions of times, but
    updated only when a better route is found
  • Only efficient solution is to manually replicate
    it

42
Managing a replicated variable in SR
  • Use a BoundManager process to serialize updates

[Figure: Worker 2 assigns M := 3; the BoundManager applies the update and propagates it to Worker 1 and Worker 2]
  • Assign: asynchronous + explicit, ordered receive
  • Update: synchronous + implicit receive + multicast
43
SR code fragments for TSP
body worker
    var M : int := Infinite          # local copy of the bound
    sem sema                         # semaphore protecting M
    proc update(value : int)         # implicit receipt
        P(sema)                      # lock copy
        M := value
        V(sema)                      # unlock
    end update
    initial
        # main code for worker:
        # - can read M (using sema)
        # - can use send BoundManager.Assign(value)
    end
end worker

body BoundManager
    var M : int := Infinite
    do true ->                       # handle requests one by one
        in Assign(value) by value ->
            if value < M ->
                M := value
                co (i := 1 to ncpus)         # multicast update
                    call worker[i].update(value)
                oc
            fi
        ni
    od
end BoundManager
44
Search overhead
Problem: the path with length 6 is not yet computed
by CPU 1 when CPU 3 starts n->m->s. The parallel
algorithm does more work than the sequential
algorithm: search overhead.

[Figure: search tree distributed over CPU 1, CPU 2, and CPU 3]
45
Performance of TSP in SR
  • Communication overhead
  • Distribution of jobs + updating the global bound
    (small overhead)
  • Load imbalances
  • Replicated worker model has automatic load
    balancing
  • Synchronization overhead
  • Mutual exclusion (locking) needed for accessing
    copy of bound
  • Search overhead
  • Main performance problem
  • In practice: high speedups possible