Title: Locality Sensitive Distributed Computing Exercise Set 2
1. Locality Sensitive Distributed Computing, Exercise Set 2
David Peleg, Weizmann Institute
2. Basic partition construction algorithm
Simple distributed implementation of Algorithm BasicPart:
a single thread of computation (a single locus of activity at any given moment).
3. Basic partition construction algorithm
Components:
- ClusterCons: procedure for constructing a cluster around a chosen center v
- NextCtr: procedure for selecting the next center v around which to grow a cluster
- RepEdge: procedure for selecting a representative inter-cluster edge between any two adjacent clusters
4. Cluster construction procedure ClusterCons
Goal: invoked at a center v, construct the cluster and a BFS tree (rooted at v) spanning it.
Tool: a variant of Dijkstra's BFS algorithm.
5. Recall: Dijkstra's BFS algorithm
(figure: the tree constructed so far and the exploration of phase p+1)
6. Main changes to Algorithm DistDijk
1. Ignoring covered vertices: the global BFS algorithm sends exploration messages to all neighbors except those known to be in the tree; the new variant also ignores vertices known to belong to previously constructed clusters.
2. Bounding depth: the BFS tree is grown to a limited depth, adding new layers tentatively, based on the halting condition |Γ(S)| < n^{1/k}·|S|.
7. Distributed Implementation
Before deciding to expand the tree T by adding a newly discovered layer L:
- Count the vertices in L by a convergecast process.
- Leaf w ∈ T: set Zw = number of its new children in L.
- Internal vertex: add up the counts received from its children and upcast the sum.
8. Distributed Implementation
- Root: compare the final count Zv to the total number of vertices in T (known from the previous phase).
- If the ratio is ≥ n^{1/k}, broadcast the next Pulse message (confirm the new layer and start the next phase).
- Otherwise, broadcast a Reject message (reject the new layer and complete the current cluster).
The final broadcast step has 2 more goals:
- mark the cluster by a unique name (e.g., the ID of the root),
- inform all vertices of the new cluster name.
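A minimal sequential sketch of this count-and-decide step (a plain Python simulation, not a message-passing program); the names count_layer and decide_expansion and the dict-based tree representation are illustrative assumptions, not from the slides.

    def count_layer(root, children, new_children):
        """Convergecast: a leaf reports Zw = number of its newly discovered
        children; an internal vertex adds its subtrees' counts and upcasts."""
        def upcast(v):
            total = len(new_children.get(v, []))   # Zw: new children of v in L
            for c in children.get(v, []):          # internal vertex: sum the
                total += upcast(c)                 # counts upcast by its children
            return total
        return upcast(root)

    def decide_expansion(tree_size, layer_count, n, k):
        """Root's decision, read here as the halting condition of slide 6
        with Gamma(T) = T u L: confirm while |T| + |L| >= n^(1/k) * |T|."""
        if tree_size + layer_count >= (n ** (1.0 / k)) * tree_size:
            return "Pulse"    # confirm the new layer and start the next phase
        return "Reject"       # reject the layer and complete the current cluster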
9. Distributed Implementation (cont)
This information is used to define cluster borders: once a cluster is complete, each vertex in it informs all its neighbors of its new residence.
⇒ Nodes of the cluster under construction know which neighbors already belong to existing clusters.
10. Center selection procedure NextCtr
Fact: the algorithm's center of activity is always located at the currently constructed cluster C.
Idea: select as the center of the next cluster some vertex v adjacent to C (v taken from the rejected layer).
Implementation: via a convergecast process (a leaf picks an arbitrary neighbor from the rejected layer and upcasts it to its parent; an internal node upcasts an arbitrary candidate).
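A small sketch of this candidate convergecast, assuming each vertex knows its tree children and (if it is a leaf) some arbitrary neighbor in the rejected layer, or None; pick_next_center and its dict arguments are hypothetical names.

    def pick_next_center(root, children, rejected_neighbor):
        """Upcast one arbitrary candidate center: a leaf proposes a rejected
        neighbor (or None); an internal vertex forwards any non-None candidate."""
        def upcast(v):
            candidates = [rejected_neighbor.get(v)]
            candidates += [upcast(c) for c in children.get(v, [])]
            return next((x for x in candidates if x is not None), None)
        return upcast(root)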
11. Center selection procedure (NextCtr)
Problem: what if the rejected layer is empty?
(It might still be that the entire process is not yet complete: there may be some yet-unclustered nodes elsewhere in G.)
(figure: clusters grown around r0)
12. Center selection procedure (NextCtr)
Solution: traverse the graph, using the cluster construction procedure within a global search procedure.
(figure: clusters grown around r0)
13. Distributed Implementation
- Use a DFS algorithm for traversing the tree of constructed clusters.
- Start at the originator vertex r0 and invoke ClusterCons to construct the first cluster.
- Whenever the rejected layer is nonempty, choose one rejected vertex as the next cluster center.
- Each cluster center marks a parent cluster in the cluster DFS tree, namely, the cluster from which it was selected.
14. Distributed Implementation (cont)
DFS algorithm (cont):
- Once the search cannot progress forward (the rejected layer is empty), the DFS backtracks to the previous cluster and looks for a new center among its neighboring nodes.
- If no such neighbors are available, the DFS process continues backtracking on the cluster DFS tree.
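A sequential sketch of this cluster-level DFS, assuming cluster_cons(center, covered) grows a cluster around center (ignoring covered vertices) and returns its vertex set, and neighbors(v) returns the neighbors of v in G; all names are illustrative and the message-passing details are omitted.

    def basic_part_dfs(r0, cluster_cons, neighbors):
        covered = set()
        clusters = []
        path = []                               # indices of clusters on the DFS path

        center = r0
        while center is not None:
            c = cluster_cons(center, covered)   # grow the next cluster
            covered |= c
            clusters.append(c)
            path.append(len(clusters) - 1)      # its parent cluster is path[-2]

            center = None
            while path and center is None:
                rejected = {u for v in clusters[path[-1]]
                            for u in neighbors(v) if u not in covered}
                if rejected:
                    center = rejected.pop()     # forward step: next cluster center
                else:
                    path.pop()                  # rejected layer empty: backtrack
        return clusters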
15. Inter-cluster edge selection RepEdge
Goal: select one representative inter-cluster edge between every two adjacent clusters C and C'.
E(C,C') = the set of edges connecting C and C' (known to their endpoints in C, as C vertices know the cluster residence of each neighbor).
16. Inter-cluster edge selection RepEdge
⇒ A representative edge can be selected by a convergecast process over all edges of E(C,C').
Requirement: C and C' must select the same edge.
Solution: use a unique ordering of the edges and pick the minimum edge of E(C,C').
Q: How can a unique edge order be defined from unique vertex IDs?
17. Inter-cluster edge selection (RepEdge)
E.g., define the ID-weight of an edge e = (v,w), where ID(v) < ID(w), as the pair ⟨ID(v), ID(w)⟩, and order ID-weights lexicographically.
This ensures distinct weights and allows a consistent selection of inter-cluster edges.
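A minimal sketch of this ID-weight ordering, assuming vertex names are their IDs; the helpers id_weight and representative_edge are illustrative.

    def id_weight(edge):
        v, w = edge
        return (min(v, w), max(v, w))   # the pair <ID(v), ID(w)> with ID(v) < ID(w)

    def representative_edge(candidates):
        """Both C and C' pick the minimum-ID-weight edge of E(C, C'), so they
        agree on the representative because the order is total and global."""
        return min(candidates, key=id_weight)

    # e.g. representative_edge([(7, 3), (2, 9), (3, 5)]) == (2, 9)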
18. Inter-cluster edge selection (RepEdge)
Problem: cluster C must carry out the selection process for every adjacent cluster C' individually.
Solution:
- Inform each C vertex of the identities of all clusters adjacent to C, by a convergecast and broadcast.
- Pipeline the individual selection processes.
19. Analysis
Let (C1, C2, ..., Cp) be the clusters constructed by the algorithm.
For cluster Ci:
- Ei = edges with at least one endpoint in Ci
- ni = |Ci|, mi = |Ei|, ri = Rad(Ci)
20. Analysis (cont)
ClusterCons: the depth-bounded Dijkstra procedure constructs Ci and its BFS tree in O(ri^2) time and O(ni·ri + mi) messages.
⇒ Time(ClusterCons) = Σi O(ri^2) ≤ Σi O(ri·k) ≤ k·Σi O(ni) = O(kn)
Q: Prove an O(n) bound.
21. Analysis (cont)
Constructing Ci and its BFS tree costs O(ri^2) time and O(ni·ri + mi) messages.
⇒ Comm(ClusterCons) = Σi O(ni·ri + mi)
Each edge occurs in at most 2 distinct sets Ei, and Σi ni·ri ≤ k·Σi ni = kn, hence Comm(ClusterCons) = O(nk + |E|).
22. Analysis (NextCtr)
The DFS process on the cluster tree is more expensive than a plain DFS:
a DFS step, i.e., visiting a cluster Ci and deciding the next step, requires O(ri) time and O(ni) communication.
23. Analysis (NextCtr)
- The DFS visits the clusters of the cluster tree O(p) times in total.
- The entire DFS process (not counting the invocations of Procedure ClusterCons) requires:
- Time(NextCtr) = O(p·k) = O(nk)
- Comm(NextCtr) = O(p·n) = O(n^2)
24. Analysis (RepEdge)
si = number of neighboring clusters surrounding Ci.
Convergecasting the ID of one neighboring cluster C' in Ci costs O(ri) time and O(ni) messages.
For all si neighboring clusters: O(si + ri) time (by pipelining) and O(si·ni) messages.
25. Analysis (RepEdge)
The pipelined inter-cluster edge selection is similar.
As si ≤ n, we get:
Time(RepEdge) = maxi O(si + ri) = O(n)
Comm(RepEdge) = Σi O(si·ni) = O(n^2)
26. Analysis
Thm: Distributed Algorithm BasicPart requires Time = O(nk) and Comm = O(n^2).
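A short tally of the three components from slides 20-25, assuming |E| ≤ n^2; this only restates the bounds already derived above.

    \begin{align*}
    \mathrm{Time}(\mathrm{BasicPart}) &= \mathrm{Time}(\mathrm{ClusterCons}) + \mathrm{Time}(\mathrm{NextCtr}) + \mathrm{Time}(\mathrm{RepEdge})\\
      &= O(nk) + O(nk) + O(n) = O(nk),\\
    \mathrm{Comm}(\mathrm{BasicPart}) &= O(nk + |E|) + O(n^2) + O(n^2) = O(n^2).
    \end{align*}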
27. Sparse spanners
Example: the m-dimensional hypercube Hm = (Vm, Em), where Vm = {0,1}^m and Em = {(x,y) : x and y differ in exactly one bit}.
|Vm| = 2^m, |Em| = m·2^{m-1}, diameter m.
Ex: Prove that for every m ≥ 0, the m-cube has a 3-spanner with at most 7·2^m edges.
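A small sketch that builds Hm as defined above and checks |Vm| = 2^m and |Em| = m·2^{m-1}; it does not construct the 3-spanner asked for in the exercise, and the function name hypercube is illustrative.

    from itertools import product

    def hypercube(m):
        V = [''.join(bits) for bits in product('01', repeat=m)]
        E = {(x, y) for x in V for y in V
             if x < y and sum(a != b for a, b in zip(x, y)) == 1}
        return V, E

    for m in range(1, 6):
        V, E = hypercube(m)
        assert len(V) == 2 ** m and len(E) == m * 2 ** (m - 1)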
28. Regional Matchings
A locality-sensitive tool for distributed match-making.
29. Distributed match-making
A paradigm for establishing a client-server connection in a distributed system (via specified rendezvous locations in the network).
- Ads of server v are written in the locations Write(v).
- Client u reads ads in the locations Read(u).
30. Regional Matchings
Requirement: the read and write sets must intersect:
for every v,u ∈ V, Write(v) ∩ Read(u) ≠ ∅,
so that client u must find an ad of server v.
31. Regional Matchings (cont)
Distance considerations are taken into account: client u must find an ad of server v only if they are sufficiently close.
l-regional matching: a collection of read and write sets RW = { Read(v), Write(v) : v ∈ V } s.t. for every v,u ∈ V,
dist(u,v) ≤ l ⇒ Write(v) ∩ Read(u) ≠ ∅
32. Regional Matchings (cont)
Degree parameters:
Dwrite(RW) = max_{v∈V} |Write(v)|
Dread(RW) = max_{v∈V} |Read(v)|
33. Regional Matchings (cont)
Radius parameters:
Strwrite(RW) = max_{u,v∈V} { dist(u,v) : u ∈ Write(v) } / l
Strread(RW) = max_{u,v∈V} { dist(u,v) : u ∈ Read(v) } / l
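A sketch of the definitions on slides 31-33 as a checker, assuming dist(u, v) is available and RW is given as two dicts, Read and Write, mapping each vertex to its set of rendezvous vertices; the function names are illustrative.

    def is_regional_matching(V, Read, Write, dist, l):
        """dist(u, v) <= l must imply that Write(v) and Read(u) intersect."""
        return all(Write[v] & Read[u]
                   for u in V for v in V if dist(u, v) <= l)

    def matching_parameters(V, Read, Write, dist, l):
        """Degree and radius parameters of the regional matching RW."""
        d_read = max(len(Read[v]) for v in V)
        d_write = max(len(Write[v]) for v in V)
        str_read = max(dist(u, v) for v in V for u in Read[v]) / l
        str_write = max(dist(u, v) for v in V for u in Write[v]) / l
        return d_read, d_write, str_read, str_write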
34. Regional matching construction
Given a graph G and k,l ≥ 1, construct a regional matching RW_{l,k}:
- Set S = the l-neighborhood cover { Γl(v) : v ∈ V }.
35. Regional matching construction
- Build a coarsening cover T as in the Max-Deg-Cover Theorem.
36. Regional matching construction
- Select a center vertex r0(T) in each cluster T ∈ T.
37. Regional matching construction
- Select for every v a cluster Tv ∈ T s.t. Γl(v) ⊆ Tv.
(figure: Γl(v) contained in the cluster Tv = T1)
38. Regional matching construction
Set:
- Read(v) = { r0(T) : v ∈ T }
- Write(v) = { r0(Tv) }
(figure example: Read(v) = {r1, r2, r3}, Write(v) = {r1})
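A sketch of this construction, assuming the coarsening cover is given as a list of vertex sets (cover), center[i] is the chosen center r0 of cover[i], and Tv[v] is the index of a cluster containing Γl(v); only the assembly of the read and write sets is shown, and all names are illustrative.

    def build_regional_matching(V, cover, center, Tv):
        # Read(v): centers of all clusters containing v; Write(v): center of Tv
        Read = {v: {center[i] for i, T in enumerate(cover) if v in T} for v in V}
        Write = {v: {center[Tv[v]]} for v in V}
        return Read, Write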
39. Analysis
Claim: the resulting RW_{l,k} is an l-regional matching.
Proof: Consider u,v such that dist(u,v) ≤ l.
Let Tv be the cluster s.t. Write(v) = { r0(Tv) }.
40. Analysis (cont)
By definition, u ∈ Γl(v). Also Γl(v) ⊆ Tv
⇒ u ∈ Tv ⇒ r0(Tv) ∈ Read(u) ⇒ Read(u) ∩ Write(v) ≠ ∅
41. Analysis (cont)
Thm: For every graph G(V,E,w) and l,k ≥ 1, there is an l-regional matching RW_{l,k} with
- Dread(RW_{l,k}) ≤ 2k·n^{1/k}
- Dwrite(RW_{l,k}) = 1
- Strread(RW_{l,k}) ≤ 2k+1
- Strwrite(RW_{l,k}) ≤ 2k+1
42. Analysis (cont)
Taking k = log n we get:
Corollary: For every graph G(V,E,w) and l ≥ 1, there is an l-regional matching RW_l with
- Dread(RW_l) = O(log n)
- Dwrite(RW_l) = 1
- Strread(RW_l) = O(log n)
- Strwrite(RW_l) = O(log n)