Lectures on Parallel and Distributed Algorithms

1
Lectures on Parallel and Distributed Algorithms

COMP 523 Advanced Algorithmic Techniques
Lecturer Dariusz Kowalski

2
Overview

These lectures
Parallel machine
Prefix computation
Distributed computing
Consensus problem

3
Parallel machine - model

Set of n processors and m memory cells
Computation in synchronized rounds
During one round each processor performs one of:
local computation step (constant-size local cache)
read/write to shared memory
Minimize:
Time
Work (total number of processor steps)
Number of processors
Additional memory

4
Types of parallel machines

EREW: Exclusive Read, Exclusive Write
CREW: Concurrent Read, Exclusive Write
ERCW: Exclusive Read, Concurrent Write
CRCW: Concurrent Read, Concurrent Write
In each round a cell can be either read or written
Exclusive Read/Write: only one processor can read/write to a memory cell during one round
Concurrent Read/Write: many processors can read/write to a memory cell during one round
Concurrent Write: the stored result is determined by a fixed rule (arbitrary, maximum, sum, etc.)
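The concurrent-write rules listed above can be illustrated with a small sequential sketch. The function name resolve_concurrent_writes and the policy strings are hypothetical, chosen only for this illustration; the code simulates what a single cell would hold after one round of conflicting writes.

```python
import random
from collections import defaultdict


def resolve_concurrent_writes(writes, policy="maximum", rng=None):
    """Simulate one CRCW write round.

    writes: list of (cell, value) pairs issued by the processors in one round.
    policy: "arbitrary", "maximum" or "sum" (hypothetical names for the rules).
    Returns a dict mapping each written cell to the value it ends up storing.
    """
    by_cell = defaultdict(list)
    for cell, value in writes:
        by_cell[cell].append(value)

    result = {}
    for cell, values in by_cell.items():
        if policy == "maximum":
            result[cell] = max(values)
        elif policy == "sum":
            result[cell] = sum(values)
        elif policy == "arbitrary":
            # Any one of the competing values may win.
            result[cell] = (rng or random).choice(values)
        else:
            raise ValueError(f"unknown policy: {policy}")
    return result
```

For example, two processors writing 3 and 5 to cell 0 leave 5 there under the maximum rule and 8 under the sum rule.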

5
Problem - prefix computation

Input: m memory cells with integers
Goal: for each cell i compute a function F(1,i), where F(·,·) is such that
F(i,k) can be computed in constant time from F(i,j) and F(j+1,k) for any j between i and k
F(i,i) is the value stored originally in cell i
Examples:
Computing a maximum (for every prefix)
Computing a sum (for every prefix)
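As a point of reference, the problem can be stated as a short sequential routine. This is only a sketch of the specification, not a parallel algorithm; the name prefix_values and the parameter combine (standing for the constant-time rule that merges F(i,j) and F(j+1,k)) are assumptions made for the illustration.

```python
def prefix_values(cells, combine):
    """Return [F(1,1), F(1,2), ..., F(1,m)] for the given cells.

    combine(a, b) merges the value of a range with the value of the
    range that immediately follows it, e.g. max or addition.
    """
    out = []
    for value in cells:
        # F(1,i) is obtained from F(1,i-1) and F(i,i).
        out.append(value if not out else combine(out[-1], value))
    return out
```

With combine = max this yields the running maximum, and with addition it yields the running sum.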

6
CRCW - simple solution

Let the result of the concurrent writing of two processors be according to the function F(·,·)
m memory cells, m additional memory cells, m^2 processors
Algorithm:
Processor with ID i·m + j, for 1 ≤ i ≤ j ≤ m, reads cell i and then writes the value to additional cell j
Time: 2; Memory: m; Work: O(m^2)
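The two rounds of this solution can be simulated sequentially as follows. This is a sketch under assumptions: processors are identified by 0-based pairs (i, j) rather than by a single ID, the name crcw_prefix is hypothetical, and the concurrent write is resolved by applying combine to the competing values in order of i.

```python
def crcw_prefix(cells, combine):
    """Simulate the two-round CRCW prefix computation on m cells."""
    m = len(cells)

    # Round 1: processor (i, j) with i <= j reads cell i (concurrent reads).
    read = {(i, j): cells[i] for j in range(m) for i in range(j + 1)}

    # Round 2: all processors (i, j) write to additional cell j at once;
    # the concurrent-write rule merges the written values with combine.
    extra = [None] * m
    for j in range(m):
        acc = read[(0, j)]
        for i in range(1, j + 1):
            acc = combine(acc, read[(i, j)])
        extra[j] = acc
    return extra
```

The dictionary read has about m^2/2 entries, which mirrors the quadratic work bound.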

7
EREW - algorithm

m memory cells, n = m/log m processors
Additional array M[1..n]
Recursive Algorithm
Parallel Preprocessing:
each processor i sequentially computes functions F(i log m + 1, i log m + 1), ..., F(i log m + 1, (i+1) log m)
then writes M[i] = F(i log m + 1, (i+1) log m)
Parallel Recursion (pointer jumping):
in step 1 ≤ t ≤ log n, if i - 2^(t-1) > 0 then the processor with ID i reads M[i - 2^(t-1)] and combines it with its current value M[i] -- as if M[i - 2^(t-1)] corresponds to F((i - 2^t) log m + 1, (i - 2^(t-1)) log m) and as if M[i] corresponds to F((i - 2^(t-1)) log m + 1, i log m) -- and writes the result to M[i]
Parallel Post-processing:
each processor i sequentially computes functions F(1, i log m + 1), ..., F(1, (i+1) log m) using the value F(1, i log m) stored in M[i] and the values F(i log m + 1, i log m + 1), ..., F(i log m + 1, (i+1) log m) computed in the preprocessing part
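The three phases can be sketched as a sequential simulation. Several choices here are assumptions made for the illustration: blocks and the array M are 0-indexed (so M[p] summarises block p), the block size is the ceiling of log2 m, the name erew_prefix is hypothetical, and exclusive access is not enforced, only mimicked by taking a snapshot of M so that all reads of a round happen before its writes.

```python
import math


def erew_prefix(cells, combine):
    """Blocked prefix computation with pointer jumping (sequential simulation)."""
    m = len(cells)
    if m == 0:
        return []
    block = max(1, math.ceil(math.log2(m)))   # about log m cells per processor
    n = math.ceil(m / block)                   # number of simulated processors
    blocks = [cells[p * block:(p + 1) * block] for p in range(n)]

    # Preprocessing: each processor computes the prefixes inside its own block.
    local = []
    for b in blocks:
        acc = [b[0]]
        for v in b[1:]:
            acc.append(combine(acc[-1], v))
        local.append(acc)
    M = [acc[-1] for acc in local]             # value of each whole block

    # Pointer jumping: with distance d, M[p] absorbs M[p - d].
    d = 1
    while d < n:
        old = M[:]                             # reads precede writes in a round
        for p in range(d, n):
            M[p] = combine(old[p - d], old[p])
        d *= 2

    # Post-processing: prepend the value of all earlier blocks, M[p - 1].
    out = []
    for p in range(n):
        for v in local[p]:
            out.append(v if p == 0 else combine(M[p - 1], v))
    return out
```

After the loop with distance d, M[p] covers blocks max(0, p - 2d + 1) through p, which is the invariant used in the correctness argument.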

8
Analysis

Correctness:
It is sufficient to show that after step t of the recursive part each location M[i] contains the computed value F(max{1, (i - 2^t) log m + 1}, i log m)
Proof by induction:
for t = 1 it follows from the initialization of M and the preprocessing part
the inductive step follows immediately from the recursive algorithm
Memory: O(n)
for the additional memory M used during recursion, or none if the original values are modified
Time: O(log m)
Parallel preprocessing and post-processing: O(log m)
Parallel recursion: O(log m)
Work: O(m)
time O(log m) times number of processors O(m/log m)

9
Conclusions

Prefix computation
Finding maximum/minimum
Computing sums
for all m prefixes, in optimal logarithmic time
and linear work

10
Textbook and Questions

How to modify the prefix algorithms for a smaller/larger number of processors?
Given a regular expression containing braces of two kinds, ( ) and one other kind, how to check in parallel, in logarithmic time, whether it is a proper expression (each opening brace has its corresponding closing counterpart)?
Is it easier if there is only one kind of braces in the expression?

11
Distributed message-passing model

Set of n processors/processes with different IDs
p1,...,pn
In each step each processor can (depending on the algorithm):
send a message to any subset of other processors
receive incoming messages
perform local computation
Computation can proceed (depending on the adversary):
in synchronized rounds: in a round every processor performs three steps (local computation, sending and receiving), e.g., (p1,p2,p3), (p1,p2,p3), (p1,p2,p3),...
in an asynchronous pattern: steps are done according to some arbitrary order unknown to the processors, e.g., p1,p2,p2,p3,p2,p3,p2,p1,...

12
Fault-tolerance

Failures in the system:
Lack of synchrony: the unknown order of steps is generated by the adversary
Processor crashes: the adversary decides which processors crash and chooses the steps for these events
Message losses: messages are lost (not properly sent or received); the malicious processors/links are selected by the adversary
Byzantine failures: processors may cheat, e.g., can behave in the ways described above, corrupt the content of messages, pretend to have a different ID, etc.

13
Analysis of distributed algorithms

When designing an algorithm, our goal is to prove:
Correctness: non-trivial because of the lack of central information and because of failures
Termination: non-trivial because of the lack of central control
Efficiency:
Time
Work (total number of processor steps)
Number of messages sent
Total size of messages sent

14
Consensus in synchronous crash model

Consensus:
Each processor has its initial value
Goal: processors decide on the same value among the initial ones
We require from the algorithm:
Agreement: no two processors decide on different values
Termination: each processor eventually decides unless it fails
Validity: if all initial values are the same then this value is the decision

15
Model for consensus problem

We consider the model with crash failures (easier than others, e.g., Byzantine failures): a crashed processor stops all activity, and messages sent during the crash are delivered or lost arbitrarily (depending on the adversary)
Asynchronous: impossible to solve even if only one processor can crash
Synchronous: requires at least f + 1 rounds if f processors can crash
Consensus can be viewed as a kind of maximum-finding problem: let's agree on the largest initial value (although it could be easier, since we could agree on any initial value)

16
Flooding algorithm for consensus

f-resilient algorithm: an algorithm that solves the consensus problem if at most f crashes occur
Flooding Algorithm:
During each round 1 ≤ j ≤ f + 1 each processor sends to all other processors all the initial values about which it has already learnt
Decision of a processor: if the set of collected initial values is a singleton then decide on this value, otherwise decide on the default value (e.g., maximum)
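The flooding algorithm can be sketched as a round-by-round simulation. The function name flooding_consensus and the way crashes are described (a dict mapping a processor to the round in which it crashes and the set of recipients its last messages still reach) are assumptions made for this illustration; they play the role of the adversary.

```python
def flooding_consensus(initial, f, crashes=None, default=max):
    """Simulate flooding for f + 1 synchronous rounds.

    initial: dict processor -> initial value.
    crashes: dict processor -> (round, recipients reached in that round).
    Returns the decisions of the processors that did not crash.
    """
    crashes = crashes or {}
    known = {p: {v} for p, v in initial.items()}
    alive = set(initial)

    for rnd in range(1, f + 2):
        inbox = {p: set() for p in alive}
        crashed_now = set()
        for p in alive:
            if p in crashes and crashes[p][0] == rnd:
                recipients = crashes[p][1]      # partial delivery while crashing
                crashed_now.add(p)
            else:
                recipients = alive - {p}
            for q in recipients:
                if q in inbox and q != p:
                    inbox[q] |= known[p]
        alive -= crashed_now
        for p in alive:
            known[p] |= inbox[p]

    # Singleton set: decide on its element; otherwise use the default rule.
    return {p: next(iter(known[p])) if len(known[p]) == 1 else default(known[p])
            for p in alive}
```

For instance, with initial values p1 = 1 and p2 = p3 = p4 = 0, f = 2, p1 crashing in round 1 after reaching only p2, and p2 crashing in round 2 after reaching only p3, the surviving processors p3 and p4 both decide 1.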

17
Flooding algorithm - example

4 processors, f = 2 crashes, default = maximum
      Init   R1     R2     R3     Decision
p1    1      ---    ---    ---    ---
p2    0      0,1    ---    ---    ---
p3    0      0      0,1    0,1    1
p4    0      0      0      0,1    1
18
Analysis of Flooding algorithm

Agreement: there is a round j (a clean round) in which no crash occurs, since there are f + 1 rounds and at most f crashes. During this round all non-faulty processors exchange messages, hence their sets of collected values are the same after this round. These sets do not change after this round, and consequently all non-faulty processors decide on the same value
Termination: after round f + 1
Validity: if all initial values are the same, the set of collected initial values is always a singleton, and the decision is on this value; otherwise the decision is on the maximum among the received values
Message complexity (total number of messages sent): O(f n^2)

19
Decreasing message complexity

Modification of the algorithm:
A processor sends messages to all processors during the first round, and during a round j > 1 only if in the previous round it has learnt about a new initial value
Termination and Validity remain the same
Agreement: similar argument; the only difference is that the full message exchange may not happen in the clean round, but by the end of the clean round all previously learnt values were sent before this round, and new ones are sent during this round
Communication: there is a constant number of different values and each of them is sent as a newly learnt value at most n times, each time to at most n-1 processors, hence O(n^2) messages in total.
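The effect of the modification on the message count can be observed in a small simulation. This sketch makes simplifying assumptions that are not in the slides: the run is failure-free, a processor forwards only the values it learnt in the previous round, and the name flooding_send_on_new is hypothetical.

```python
def flooding_send_on_new(initial, f, default=max):
    """Failure-free run of flooding where a processor sends only new values.

    Returns (decisions, total number of point-to-point messages).
    """
    procs = list(initial)
    known = {p: {initial[p]} for p in procs}
    fresh = {p: {initial[p]} for p in procs}   # values learnt most recently
    messages = 0

    for _ in range(f + 1):
        inbox = {p: set() for p in procs}
        for p in procs:
            if not fresh[p]:
                continue                        # nothing new: stay silent
            for q in procs:
                if q != p:
                    inbox[q] |= fresh[p]
                    messages += 1
        for p in procs:
            fresh[p] = inbox[p] - known[p]      # what is new for p this round
            known[p] |= fresh[p]

    decisions = {p: next(iter(v)) if len(v) == 1 else default(v)
                 for p, v in known.items()}
    return decisions, messages
```

With 4 processors, f = 2 and initial values 1, 0, 0, 0, this run uses 24 messages, whereas sending in every one of the 3 rounds would use 3 · 4 · 3 = 36.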