Lectures on Parallel and Distributed Algorithms

Slides: 21
Provided by: Dare70

Transcript and Presenter's Notes



1
Lectures on Parallel and Distributed Algorithms
  • COMP 523 Advanced Algorithmic Techniques
  • Lecturer Dariusz Kowalski

2
Overview
  • These lectures
  • Parallel machine
  • Prefix computation
  • Distributed computing
  • Consensus problem

3
Parallel machine - model
  • Set of n processors and m memory cells
  • Computation in synchronized rounds
  • During one round each processor performs one of
  • a local computation step (constant-size local cache)
  • a read from or a write to shared memory
  • Minimize
  • Time
  • Work (total number of processor steps)
  • Number of processors
  • Additional memory

4
Types of parallel machines
  • EREW: Exclusive Read Exclusive Write
  • CREW: Concurrent Read Exclusive Write
  • ERCW: Exclusive Read Concurrent Write
  • CRCW: Concurrent Read Concurrent Write
  • In each round a cell can be either read or
    written
  • Exclusive Read/Write: only one processor can
    read/write a memory cell during one round
  • Concurrent Read/Write: many processors can
    read/write a memory cell during one round
  • Concurrent Write rules (which value is stored when
    several processors write one cell): arbitrary,
    maximum, sum, etc.
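The write rules above can be illustrated with a small Python sketch. It models one round of writes as a mapping from a cell to the (processor ID, value) pairs aimed at it. The helper names (write_round, exclusive, arbitrary, maximum, total) are hypothetical, and since the model leaves the winner of an arbitrary write unspecified, the sketch simply picks the lowest processor ID as one legal outcome.

```python
def exclusive(writes):
    """EREW/CREW: a second writer to the same cell in one round is an error."""
    if len(writes) > 1:
        raise ValueError("concurrent write to one cell is not allowed")
    return writes[0][1]

def arbitrary(writes):
    """Any single write may win; the lowest processor ID is one legal outcome."""
    return min(writes)[1]

def maximum(writes):
    """The largest written value is stored."""
    return max(value for _, value in writes)

def total(writes):
    """The sum of all written values is stored."""
    return sum(value for _, value in writes)

def write_round(memory, writes, rule):
    """Apply one round of writes.

    memory: list of cell contents
    writes: dict cell index -> list of (processor_id, value) written this round
    rule:   how several writes to the same cell are resolved
    """
    new_memory = list(memory)
    for cell, requests in writes.items():
        new_memory[cell] = rule(requests)
    return new_memory

# Processors 1 and 2 both write cell 0; processor 3 alone writes cell 2.
requests = {0: [(2, 5), (1, 3)], 2: [(3, 4)]}
for rule in (arbitrary, maximum, total):
    print(rule.__name__, write_round([0, 0, 0], requests, rule))
# arbitrary [3, 0, 4]
# maximum [5, 0, 4]
# total [8, 0, 4]
```

With the exclusive rule the same requests raise an error, which is exactly the difference between the ER/EW and CR/CW machine types.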

5
Problem - prefix computation
  • Input: m memory cells with integers
  • Goal: for each cell i compute a function F(1, i),
    where F(·,·) is such that
  • F(i, k) can be computed in constant time from
    F(i, j) and F(j+1, k) for any j with i ≤ j < k
  • F(i, i) is the value stored originally in cell i
  • Examples
  • Computing a maximum (for every prefix)
  • Computing a sum (for every prefix)
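A sequential reference for the prefix problem, as a Python sketch assuming that F(i, k) is obtained by folding an associative operation op over cells i..k (true for both examples, maximum and sum). The names F and all_prefixes are illustrative; the final assertion checks the combining property F(i, k) = op(F(i, j), F(j+1, k)) on a small input.

```python
from itertools import accumulate
from operator import add

def F(values, i, k, op):
    """F(i, k): fold op over cells i..k (1-based, inclusive), left to right."""
    result = values[i - 1]
    for x in values[i:k]:
        result = op(result, x)
    return result

def all_prefixes(values, op):
    """[F(1, 1), F(1, 2), ..., F(1, m)] computed sequentially."""
    return list(accumulate(values, op))

cells = [3, 1, 4, 1, 5]
print(all_prefixes(cells, max))  # [3, 3, 4, 4, 5]
print(all_prefixes(cells, add))  # [3, 4, 8, 9, 14]

# Combining property: F(i, k) = op(F(i, j), F(j+1, k)) for every i <= j < k.
m = len(cells)
assert all(
    F(cells, i, k, add) == add(F(cells, i, j, add), F(cells, j + 1, k, add))
    for i in range(1, m + 1)
    for k in range(i + 1, m + 1)
    for j in range(i, k)
)
```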

6
CRCW - simple solution
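  • Let the result of concurrent writing by several
    processors to the same cell be given by the
    function F (e.g., the maximum or the sum of the
    written values)
  • m memory cells, m additional memory cells, m^2
    processors
  • Algorithm
  • The processor with ID (i-1)·m + j, i.e., the pair
    (i, j), reads cell i if i ≤ j ≤ m, and then writes
    the value to additional cell j; the concurrent
    writes to cell j combine into F(1, j)
  • Time 2, additional memory m, work O(m^2)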
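The two rounds can be simulated sequentially in Python. This is a sketch under stated assumptions: indices are 0-based, only the m(m+1)/2 processors with i ≤ j do any work (the rest of the m^2 processors stay idle), and concurrent writes are merged with op, which therefore has to be commutative as well as associative, as maximum and sum are. The function name crcw_prefix is hypothetical.

```python
from operator import add

def crcw_prefix(cells, op):
    """Simulate the two-round CRCW prefix algorithm (0-based indices).

    Returns (additional cells holding F(1, j) for every j, number of active processors).
    """
    m = len(cells)
    # Processor (i, j) is active only when i <= j; the others stay idle.
    active = [(i, j) for i in range(m) for j in range(i, m)]
    # Round 1: processor (i, j) reads cell i (concurrent reads are allowed).
    read = {(i, j): cells[i] for (i, j) in active}
    # Round 2: processor (i, j) writes its value to additional cell j;
    # all writes aimed at the same cell are merged with op.
    extra = [None] * m
    for (i, j) in active:
        value = read[(i, j)]
        extra[j] = value if extra[j] is None else op(extra[j], value)
    return extra, len(active)

print(crcw_prefix([3, 1, 4, 1, 5], add))  # ([3, 4, 8, 9, 14], 15)
print(crcw_prefix([3, 1, 4, 1, 5], max))  # ([3, 3, 4, 4, 5], 15)
```

The quadratic number of active processors is where the O(m^2) work comes from, even though only two rounds are needed.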

7
EREW - algorithm
  • m memory cells, n = m/log m processors (assume for
    simplicity that log m divides m)
  • Additional array M[1..n]
  • Recursive algorithm
  • Parallel preprocessing
  • each processor i (1 ≤ i ≤ n) sequentially computes
    the values
  • F((i-1)·log m + 1, (i-1)·log m + 1), ...,
    F((i-1)·log m + 1, i·log m)
  • then writes M[i] = F((i-1)·log m + 1, i·log m)
  • Parallel recursion (pointer jumping)
  • in step t, 1 ≤ t ≤ ⌈log n⌉: if i - 2^(t-1) > 0, then
    the processor with ID i reads M[i - 2^(t-1)] and
    combines it with its current value M[i] (treating
    M[i - 2^(t-1)] as F(max(1, (i - 2^t)·log m + 1),
    (i - 2^(t-1))·log m) and M[i] as
    F((i - 2^(t-1))·log m + 1, i·log m)), and writes
    the result to M[i]
  • Parallel post-processing
  • each processor i sequentially computes the values
    F(1, (i-1)·log m + 1), ..., F(1, i·log m), using the
    value F(1, (i-1)·log m) stored in M[i-1] (for i > 1)
    and the values
  • F((i-1)·log m + 1, (i-1)·log m + 1), ...,
    F((i-1)·log m + 1, i·log m) computed in
    preprocessing
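A sequential Python simulation of the three phases, as a sketch rather than a reference implementation. Assumptions not on the slide: indices are 0-based, the block length is ⌊log2 m⌋ (at least 1) so that log m need not divide m, the last block may be shorter, and each recursion step is split into a read round and a write round so that no cell is read and written in the same round. The function name erew_prefix is hypothetical.

```python
from operator import add

def erew_prefix(a, op):
    """Simulate the EREW prefix algorithm with about m / log m processors.

    op must be associative; it need not be commutative. Processors run one
    after another here; each phase mirrors one parallel phase of the slide.
    """
    m = len(a)
    b = max(1, m.bit_length() - 1)       # block length: floor(log2 m), at least 1
    n = (m + b - 1) // b                  # number of processors
    blocks = [a[k * b:(k + 1) * b] for k in range(n)]

    # Parallel preprocessing: processor i computes the prefixes of its block.
    local = []
    for block in blocks:
        pre = [block[0]]
        for x in block[1:]:
            pre.append(op(pre[-1], x))
        local.append(pre)
    M = [pre[-1] for pre in local]        # M[i] = F over the whole block i

    # Parallel recursion (pointer jumping): distance d = 2^(t-1) in step t.
    d = 1
    while d < n:
        # Read round: processor i reads M[i - d]; each cell has one reader.
        left = {i: M[i - d] for i in range(d, n)}
        # Write round: processor i overwrites only its own cell M[i].
        for i, value in left.items():
            M[i] = op(value, M[i])
        d *= 2

    # Parallel post-processing: prepend the prefix of all earlier blocks.
    out = list(local[0]) if n else []
    for i in range(1, n):
        out.extend(op(M[i - 1], x) for x in local[i])
    return out

print(erew_prefix([3, 1, 4, 1, 5, 9, 2, 6], add))  # [3, 4, 8, 9, 14, 23, 25, 31]
print(erew_prefix(list("abcdefg"), add)[-1])       # abcdefg
```

Because every combination puts the left range first, the sketch also works for associative but non-commutative operations such as string concatenation, as the second call shows.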

8
Analysis
  • Correctness
  • It is sufficient to show that after step t of the
    recursive part each location M[i] contains the
    value F(max(1, (i - 2^t)·log m + 1), i·log m)
  • Proof by induction on t
  • for t = 0 (before the first step) it follows from
    the initialization of M in the preprocessing part
  • the inductive step follows immediately from the
    recursive algorithm: combining M[i - 2^(t-1)] with
    M[i] joins two adjacent ranges into one
  • after the last step 2^t ≥ n ≥ i, so M[i] =
    F(1, i·log m), which is exactly what
    post-processing needs
  • Memory O(n)
  • for the additional array M used during recursion
  • or none if the original values are overwritten
  • Time O(log m)
  • parallel preprocessing and post-processing:
    O(log m)
  • parallel recursion: O(log n), which is O(log m)
    since n ≤ m
  • Work O(m)
  • time O(log m) times the number of processors
    O(m/log m)
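The invariant used in the correctness proof can be checked mechanically. The following Python sketch (the function check_invariant is hypothetical) runs only the pointer-jumping phase on block values with F = sum, uses 1-based indices as on the slides, and lets a block length b stand in for log m. After every step t it asserts M[i] = F(max(1, (i - 2^t)·b + 1), i·b), and at the end M[i] = F(1, i·b).

```python
def check_invariant(a, b):
    """Pointer jumping on block sums, asserting the correctness invariant.

    a: cell values; len(a) must be a multiple of b
    b: block length, playing the role of log m
    Returns (M[1..n] after the last step, number of steps).
    """
    n = len(a) // b

    def F(lo, hi):
        """F(lo, hi) with 1-based inclusive indices, here the sum."""
        return sum(a[lo - 1:hi])

    # Preprocessing (t = 0): M[i] covers block i; M[0] is unused.
    M = [None] + [F((i - 1) * b + 1, i * b) for i in range(1, n + 1)]
    t = 0
    while 2 ** t < n:
        t += 1
        d = 2 ** (t - 1)
        before = M[:]                      # all reads of step t precede its writes
        for i in range(d + 1, n + 1):      # only processors with i - 2^(t-1) > 0 act
            M[i] = before[i - d] + before[i]
        for i in range(1, n + 1):          # invariant after step t
            assert M[i] == F(max(1, (i - 2 ** t) * b + 1), i * b)
    assert all(M[i] == F(1, i * b) for i in range(1, n + 1))
    return M[1:], t

print(check_invariant(list(range(1, 17)), 2))  # ([3, 10, 21, 36, 55, 78, 105, 136], 3)
```

The returned step count is ⌈log2 n⌉, matching the bound on the recursion time.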

9
Conclusions
  • Prefix computation
  • Finding maximum/minimum
  • Computing sums
  • for all m prefixes, in optimal logarithmic time
    and linear work

10
Textbook and Questions
  • How can the prefix algorithms be modified for a
    smaller/larger number of processors?
  • An expression contains braces of two kinds, ( )
    and a second kind. How can we check in parallel,
    in logarithmic time, whether it is a proper
    expression (each opening brace has its matching
    closing counterpart)?
  • Is it easier if there is only one kind of braces
    in the expression?
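For the single-kind case, one possible approach (a sketch, not necessarily the intended answer) reduces the question to prefix computation: map ( to +1, ) to -1 and every other character to 0; the expression is proper exactly when no prefix sum is negative and the total is 0. Both the prefix sums and the minimum over them are prefix computations, so the EREW algorithm gives logarithmic time. The Python sketch below uses the sequential itertools.accumulate as a stand-in for the parallel algorithm, and the function name balanced_one_kind is hypothetical.

```python
from itertools import accumulate

def balanced_one_kind(expr):
    """Check one kind of braces via prefix sums of +1 / -1 / 0."""
    steps = [1 if c == "(" else -1 if c == ")" else 0 for c in expr]
    # Prefix sums: the nesting depth after each character.
    depth = list(accumulate(steps))
    # Proper iff the depth never drops below 0 and ends at 0.
    return all(d >= 0 for d in depth) and (depth[-1] == 0 if depth else True)

for expr in ["(()())", "())(", "(a(b)c)", "(("]:
    print(expr, balanced_one_kind(expr))
# (()()) True
# ())( False
# (a(b)c) True
# (( False
```

With two kinds of braces, depth counting alone is not enough, because a string such as "( [ ) ]" has correct depths but mismatched pairs.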

11
Distributed message-passing model
  • Set of n processors/processes with distinct IDs
    p1, ..., pn
  • In each step each processor can (depending on the
    algorithm)
  • send a message to any subset of other processors
  • receive incoming messages
  • perform local computation
  • Computation proceeds (depending on the adversary)
    either
  • in synchronized rounds: in each round every
    processor performs three steps (local computation,
    sending and receiving), e.g., (p1, p2, p3),
    (p1, p2, p3), (p1, p2, p3), ...
  • or in an asynchronous pattern: steps are taken in
    some arbitrary order unknown to the processors,
    e.g., p1, p2, p2, p3, p2, p3, p2, p1, ...

12
Fault-tolerance
  • Failures in the system
  • Lack of synchrony: the unknown order of steps is
    generated by the adversary
  • Processor crashes: the adversary decides which
    processors crash and chooses the steps at which
    these events happen
  • Message loss (messages not properly sent or
    received): malicious processors/links are
    selected by the adversary
  • Byzantine failures: processors may cheat, e.g.,
    behave in the ways described above, mess up the
    content of messages, pretend to have a different
    ID, etc.

13
Analysis of distributed algorithms
  • When designing an algorithm, our goal is to prove
  • Correctness: nontrivial because of the lack of
    central information and because of failures
  • Termination: nontrivial because of the lack of
    central control
  • Efficiency
  • Time
  • Work (total number of processor steps)
  • Number of messages sent
  • Total size of messages sent

14
Consensus in synchronous crash model
  • Consensus
  • Each processor has its initial value
  • Goal: processors decide on the same value, chosen
    among the initial ones
  • We require from the algorithm
  • Agreement: no two processors decide on different
    values
  • Termination: each processor eventually decides
    unless it fails
  • Validity: if all initial values are the same, then
    this value is the decision
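The three requirements can be phrased as checks on a finished run. In this Python sketch a run is described by a dictionary of initial values, a dictionary of decisions and the set of crashed processors; the function name check_consensus and this representation are assumptions for illustration.

```python
def check_consensus(initial, decisions, crashed):
    """Check Agreement, Termination and Validity for one finished run.

    initial:   dict processor -> initial value
    decisions: dict processor -> decided value (only processors that decided)
    crashed:   set of processors that crashed
    """
    decided = set(decisions.values())
    values = set(initial.values())
    return {
        # no two processors decide on different values
        "agreement": len(decided) <= 1,
        # every processor that did not fail has decided
        "termination": set(initial) - set(crashed) <= set(decisions),
        # if all initial values are equal, that value is the decision
        "validity": len(values) != 1 or decided <= values,
    }

# The run from the flooding example later in these slides:
# p1 and p2 crash, p3 and p4 decide 1.
print(check_consensus({1: 1, 2: 0, 3: 0, 4: 0}, {3: 1, 4: 1}, {1, 2}))
# {'agreement': True, 'termination': True, 'validity': True}
```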

15
Model for consensus problem
  • We consider the model with crash failures (easier
    than others, e.g., Byzantine failures): a processor
    stops all activity, and messages sent during the
    crash are delivered or lost arbitrarily (depending
    on the adversary)
  • Asynchronous: impossible to solve even if only one
    processor can crash
  • Synchronous: requires at least f + 1 rounds if up
    to f processors may crash
  • Consensus can be viewed as a kind of
    maximum-finding problem: let us agree on the
    largest initial value (although it could be easier,
    since we could agree on any initial value)

16
Flooding algorithm for consensus
  • f-resilient algorithm: an algorithm that solves the
    consensus problem if at most f crashes occur
  • Flooding algorithm
  • During each round 1 ≤ j ≤ f + 1 each processor
    sends to all other processors all the initial
    values it has learnt so far
  • Decision of a processor: if the set of collected
    initial values is a singleton, then decide on this
    value; otherwise decide on a default value (e.g.,
    the maximum)
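A Python sketch of the flooding algorithm with a simple adversary. Assumptions not on the slide: the adversary is given as a dictionary mapping a processor to the round in which it crashes and the set of processors that still receive its message in that round; a crashed processor neither receives nor decides; and every message addressed to another processor counts as sent, even if the receiver has already crashed. The function name flooding is hypothetical. The last lines replay the crash pattern of the example on the next slide.

```python
def flooding(initial, f, crashes=None):
    """Simulate the flooding algorithm for f + 1 synchronous rounds.

    initial: dict processor id -> initial value
    crashes: adversary schedule, dict id -> (round, receivers): in that round
             the processor's message reaches only `receivers`, then it stops.
    Returns (decisions of the surviving processors, number of messages sent).
    """
    crashes = crashes or {}
    everyone = set(initial)
    known = {p: {v} for p, v in initial.items()}
    alive = set(everyone)
    messages = 0
    for r in range(1, f + 2):
        inbox = {p: [] for p in alive}
        for p in alive:
            targets = everyone - {p}
            if p in crashes and crashes[p][0] == r:
                targets &= set(crashes[p][1])   # crash in the middle of sending
            messages += len(targets)            # counted even if a receiver crashed
            for q in targets & alive:
                inbox[q].append(frozenset(known[p]))  # snapshot of the sender's set
        alive -= {p for p, (cr, _) in crashes.items() if cr == r}
        for p in alive:
            for s in inbox[p]:
                known[p] |= s

    def decide(values):
        # Singleton: decide on its value; otherwise the default rule (maximum).
        return next(iter(values)) if len(values) == 1 else max(values)

    return {p: decide(known[p]) for p in alive}, messages

# Example slide: p1 crashes in round 1 reaching only p2,
# p2 crashes in round 2 reaching only p3.
decisions, sent = flooding({1: 1, 2: 0, 3: 0, 4: 0}, f=2,
                           crashes={1: (1, {2}), 2: (2, {3})})
print(decisions)  # p3 and p4 both decide 1, as in the table
```

In a failure-free run every processor sends n - 1 messages in each of the f + 1 rounds, so the message count is (f + 1)·n·(n - 1).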

17
Flooding algorithm - example
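  • 4 processors, f = 2 crashes, default decision:
    maximum
  • Known initial values after each round (--- means
    crashed):
        Init   R1    R2    R3    Decision
    p1  1      ---   ---   ---   ---
    p2  0      0,1   ---   ---   ---
    p3  0      0     0,1   0,1   1
    p4  0      0     0     0,1   1
  • p1 crashes in round 1 after its message reaches
    only p2; p2 crashes in round 2 after its message
    reaches only p3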

18
Analysis of Flooding algorithm
  • Agreement: since there are f + 1 rounds and at
    most f crashes, there is a round j (clean) in which
    no crash occurs. During this round all non-faulty
    processors exchange messages, hence their sets of
    collected values are the same after this round.
    The sets do not change afterwards, since every
    later message carries this same set, and
    consequently all non-faulty processors decide the
    same
  • Termination: after round f + 1
  • Validity: if all initial values are the same, the
    set of collected initial values is always a
    singleton, and the decision is on this value
    (otherwise it is the maximum among the received
    values)
  • Message complexity: the total number of messages
    sent is O(f·n^2), since in each of the f + 1 rounds
    each of the n processors sends n - 1 messages
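The agreement argument can also be tested exhaustively on small systems. The Python sketch below (all names hypothetical) enumerates every binary input and every adversary with at most f crashes, where each crashing processor picks a round and the subset of processors its last message reaches, and checks agreement and validity of the flooding algorithm. This is a brute-force check for tiny n and f under that adversary model, not a proof.

```python
from itertools import chain, combinations, product

def run_flooding(initial, f, crashes):
    """Flooding for f + 1 rounds on processors 0..n-1.

    crashes: processor -> (crash round, frozenset of receivers of its last message).
    Returns the decisions of the processors that never crashed.
    """
    n = len(initial)
    known = [{v} for v in initial]
    alive = set(range(n))
    for r in range(1, f + 2):
        inbox = {p: [] for p in alive}
        for p in alive:
            targets = set(range(n)) - {p}
            if p in crashes and crashes[p][0] == r:
                targets &= crashes[p][1]
            for q in targets & alive:
                inbox[q].append(frozenset(known[p]))
        alive -= {p for p, (cr, _) in crashes.items() if cr == r}
        for p in alive:
            known[p].update(*inbox[p])
    # max of a singleton is its only value; otherwise max is the default rule
    return {p: max(known[p]) for p in alive}

def subsets(items):
    items = list(items)
    return chain.from_iterable(combinations(items, k) for k in range(len(items) + 1))

def adversaries(n, f):
    """Every choice of at most f crashing processors, crash rounds and partial deliveries."""
    for k in range(f + 1):
        for victims in combinations(range(n), k):
            options = [
                [(r, frozenset(s)) for r in range(1, f + 2) for s in subsets(set(range(n)) - {v})]
                for v in victims
            ]
            for choice in product(*options):
                yield dict(zip(victims, choice))

def exhaustive_check(n, f):
    """True if agreement and validity hold for all binary inputs and all adversaries."""
    for initial in product([0, 1], repeat=n):
        for crashes in adversaries(n, f):
            decided = set(run_flooding(initial, f, crashes).values())
            if len(decided) > 1:
                return False                      # agreement violated
            if len(set(initial)) == 1 and decided - set(initial):
                return False                      # validity violated
    return True

print(exhaustive_check(3, 1), exhaustive_check(4, 1))  # True True
```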

19
Decreasing message complexity
  • Modification of the algorithm
  • A processor sends messages to all processors
    during the first round, and during round j > 1 only
    if in the previous round it has learnt about a new
    initial value
  • Termination and Validity remain the same
  • Agreement: a similar argument; the only difference
    is that the message exchange may not happen in the
    clean round itself, but by the end of the clean
    round every value is known to all non-faulty
    processors: values learnt earlier were already sent
    before this round, and newly learnt ones are sent
    during this round
  • Communication: there is a constant number of
    different initial values, and each processor
    announces a given value as newly learnt at most
    once, i.e., each value is announced at most n times,
    each time to at most n-1 processors, hence in total
    O(n^2) messages
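The modified algorithm as a Python sketch. Assumptions not on the slide: in round 1 a processor announces its own initial value, in later rounds it announces only the values it learnt in the previous round, the adversary is a dictionary mapping a processor to its crash round and the receivers of its last message, and messages addressed to crashed processors still count as sent. The function name modified_flooding is hypothetical. On a failure-free run with values 0, 1, 0, 1 and f = 2 it sends 24 messages, while the original flooding algorithm sends (f + 1)·n·(n - 1) = 36.

```python
def modified_flooding(initial, f, crashes=None):
    """Send in round 1, and in round j > 1 only after learning a new value.

    initial: list of initial values of processors 0..n-1
    crashes: adversary schedule, processor -> (round, receivers of its last message)
    Returns (decisions of surviving processors, number of messages sent).
    """
    crashes = crashes or {}
    n = len(initial)
    known = [{v} for v in initial]
    fresh = [{v} for v in initial]       # values to announce this round
    alive = set(range(n))
    messages = 0
    for r in range(1, f + 2):
        inbox = {p: [] for p in alive}
        for p in alive:
            if not fresh[p]:
                continue                 # nothing new learnt last round: stay silent
            targets = set(range(n)) - {p}
            if p in crashes and crashes[p][0] == r:
                targets &= set(crashes[p][1])
            messages += len(targets)
            for q in targets & alive:
                inbox[q].append(frozenset(fresh[p]))
        alive -= {p for p, (cr, _) in crashes.items() if cr == r}
        fresh = [set() for _ in range(n)]
        for p in alive:
            incoming = set().union(*inbox[p])
            fresh[p] = incoming - known[p]   # newly learnt values, announced next round
            known[p] |= incoming
    # max of a singleton is its only value; otherwise max is the default rule
    return {p: max(known[p]) for p in alive}, messages

decisions, sent = modified_flooding([0, 1, 0, 1], 2)
print(set(decisions.values()), sent)  # {1} 24
```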

20
Conclusion and Reading
  • Distributed models
  • Message-passing
  • Synchronous/asynchronous
  • Fault-tolerance
  • Distributed problems and algorithms
  • Consensus in synchronous crash setting
  • Textbook
  • Johnsonbaugh, Schaefer: Algorithms, Chapter 12
  • Attiya, Welch: Distributed Computing, Chapter 5