1
Locality Sensitive Distributed Computing
Exercise Set 2
  • David Peleg, Weizmann Institute

2
Basic partition construction algorithm
A simple distributed implementation of Algorithm BasicPart: a single thread of computation (a single locus of activity at any given moment).
3
Basic partition construction algorithm
Components:
  • ClusterCons: procedure for constructing a cluster around a chosen center v
  • NextCtr: procedure for selecting the next center v around which to grow a cluster
  • RepEdge: procedure for selecting a representative inter-cluster edge between any two adjacent clusters
4
Cluster construction procedure ClusterCons
Goal: invoked at a center v, construct the cluster and a BFS tree (rooted at v) spanning it.
Tool: a variant of Dijkstra's algorithm.
5
Recall: Dijkstra's BFS algorithm
[Figure: the BFS tree at the start of phase p+1]
6
Main changes to Algorithm DistDijk
1. Ignoring covered vertices: the global BFS algorithm sends exploration messages to all neighbors except those known to be in the tree; the new variant also ignores vertices known to belong to previously constructed clusters.
2. Bounding depth: the BFS tree is grown to a limited depth, adding new layers tentatively, based on the halting condition |Γ(S)| < |S|·n^(1/k) (see the sketch below).
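
To make the two changes concrete, here is a minimal sequential sketch of the depth-bounded cluster growth (the distributed version follows on the next slides); the adjacency-dict representation and the names grow_cluster and covered are assumptions for illustration, not from the slides:

    def grow_cluster(adj, center, n, k, covered):
        """Depth-bounded BFS sketch: keep adding layers while the
        cluster grows by a factor of at least n**(1/k)."""
        cluster, frontier = {center}, {center}
        while frontier:
            # Change 1: ignore vertices already in this cluster or in
            # previously constructed clusters.
            layer = {w for v in frontier for w in adj[v]
                     if w not in cluster and w not in covered}
            # Change 2: halting condition |Γ(S)| < |S| * n^(1/k),
            # reading |Γ(S)| as |S| + |layer| here.
            if len(cluster) + len(layer) < len(cluster) * n ** (1.0 / k):
                return cluster, layer  # second value = the rejected layer
            cluster |= layer
            frontier = layer
        return cluster, set()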
7
Distributed Implementation
  • Before deciding to expand the tree T by adding the newly discovered layer L:
  • Count the vertices in L by a convergecast process:
  • Leaf w ∈ T: set Z_w = number of new children in L
  • Internal vertex: add and upcast the counts.

8
Distributed Implementation
  • Root: compare the final count Z_v to the total number of vertices in T (known from the previous phase).
  • If the ratio is ≥ n^(1/k), then broadcast the next Pulse message (confirm the new layer and start the next phase).
  • Otherwise, broadcast a Reject message (reject the new layer and complete the current cluster).
  • The final broadcast step has 2 more goals:
  • mark the cluster with a unique name (e.g., the ID of the root),
  • inform all vertices of the new cluster name (see the sketch below).
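
A hedged sketch of this layer-confirmation step; the recursive tree representation, the new_children table, and the function names are assumptions for illustration:

    def convergecast_count(tree, w, new_children):
        """Leaf w: report Z_w = its number of new children in L;
        internal vertex: add the children's counts and upcast."""
        z = new_children.get(w, 0)
        for child in tree.get(w, ()):
            z += convergecast_count(tree, child, new_children)
        return z

    def root_decision(size_T, z_v, n, k):
        """Root: one reading of the ratio test, matching slide 6:
        accept the layer iff T grows by a factor of >= n^(1/k)."""
        if size_T + z_v >= size_T * n ** (1.0 / k):
            return "Pulse"   # confirm new layer, start next phase
        return "Reject"      # reject layer, complete current cluster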

9
Distributed Implementation (cont)
This information is used to define the cluster borders, i.e., once a cluster is complete, each vertex in it informs all its neighbors of its new residence. ⇒ The nodes of the cluster under construction know which neighbors already belong to existing clusters.
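
A minimal sketch of this border-marking step (message passing is simulated by writing directly into a shared residence table; all names are hypothetical):

    def announce_residence(adj, cluster, name, residence):
        """Each vertex of a completed cluster tells every neighbor
        its new cluster name; the neighbor records who lives where."""
        for v in cluster:
            for w in adj[v]:
                residence.setdefault(w, {})[v] = name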
10
Center selection procedure NextCtr
Fact: the algorithm's center of activity is always located at the currently constructed cluster C.
Idea: select as the center of the next cluster some vertex v adjacent to C (a v from the rejected layer).
Implementation: via a convergecast process (a leaf picks an arbitrary neighbor from the rejected layer and upcasts it to its parent; an internal node upcasts an arbitrary candidate), as sketched below.
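
A hedged sketch of this candidate convergecast, reusing the recursive tree representation from the earlier sketch (names hypothetical):

    def convergecast_candidate(tree, w, rejected_neighbor):
        """Leaf: upcast an arbitrary neighbor in the rejected layer
        (None if there is none); internal node: forward any non-None
        candidate received from below or found locally."""
        candidates = [rejected_neighbor.get(w)]
        for child in tree.get(w, ()):
            candidates.append(
                convergecast_candidate(tree, child, rejected_neighbor))
        return next((c for c in candidates if c is not None), None)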
11
Center selection procedure (NextCtr)
Problem: what if the rejected layer is empty?
(The entire process may still be incomplete: there may be some yet-unclustered nodes elsewhere in G.)
[Figure: a cluster grown from r0 whose rejected layer is empty]
12
Center selection procedure (NextCtr)
Solution: traverse the graph, using the cluster construction procedure within a global search procedure.
[Figure: the global search starting from r0]
13
Distributed Implementation
  • Use a DFS algorithm for traversing the tree of constructed clusters.
  • Start at the originator vertex r0 and invoke ClusterCons to construct the first cluster.
  • Whenever the rejected layer is nonempty, choose one rejected vertex as the next cluster center.
  • Each cluster center marks a parent cluster in the cluster DFS tree, namely, the cluster from which it was selected.

14
Distributed Implementation (cont)
  • DFS algorithm (cont):
  • Once the search cannot progress forward (the rejected layer is empty),
  • the DFS backtracks to the previous cluster and looks for a new center among its neighboring nodes.
  • If no such neighbors are available, the DFS process continues backtracking up the cluster DFS tree (see the sketch below).
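Putting slides 13-14 together, a sequential sketch of the whole partition loop, reusing grow_cluster from the slide 6 sketch (the explicit stack standing in for the cluster DFS tree is an assumption):

    def basic_part(adj, r0, n, k):
        """DFS on the cluster tree (slides 13-14): grow a cluster,
        move to an unclustered neighbor if one exists, else backtrack."""
        covered, clusters, stack = set(), [], []
        center = r0
        while center is not None:
            cluster, _rejected = grow_cluster(adj, center, n, k, covered)
            covered |= cluster
            clusters.append(cluster)
            stack.append(cluster)      # forward step of the cluster DFS
            center = None
            while stack and center is None:
                # Look for a new center among neighbors of the top cluster.
                for v in stack[-1]:
                    center = next((w for w in adj[v] if w not in covered),
                                  None)
                    if center is not None:
                        break
                if center is None:
                    stack.pop()        # backtrack on the cluster DFS tree
        return clusters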

15
Inter-cluster edge selection RepEdge
Goal: select one representative inter-cluster edge between every two adjacent clusters C and C'.
E(C,C') = the set of edges connecting C and C'
(known to the endpoints in C, as C's vertices know the cluster residence of each neighbor)
[Figure: the clusters of the partition and the inter-cluster edges]
16
Inter-cluster edge selection RepEdge
⇒ A representative edge can be selected by a convergecast process over the edges of E(C,C').
Requirement: C and C' must select the same edge.
Solution: use a unique ordering of the edges and pick the minimum edge of E(C,C').
Q: Define a unique edge order using unique IDs.
17
Inter-cluster edge selection (RepEdge)
E.g., define the ID-weight of an edge e = (v,w), where ID(v) < ID(w), as the pair ⟨ID(v), ID(w)⟩, and order ID-weights lexicographically. This ensures distinct weights and allows a consistent selection of inter-cluster edges (see the sketch below).
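
A small sketch of this ordering; vid is an assumed map from vertex to its unique ID:

    def id_weight(e, vid):
        """ID-weight of edge e = (v, w): the pair <ID(v), ID(w)>
        with the smaller ID first, compared lexicographically."""
        v, w = e
        return tuple(sorted((vid[v], vid[w])))

    def representative_edge(E_CC, vid):
        """C and C' minimize over the same set E(C,C') under the
        same total order, so both pick the same edge."""
        return min(E_CC, key=lambda e: id_weight(e, vid))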
18
Inter-cluster edge selection (RepEdge)
  • Problem:
  • Cluster C must carry out the selection process for every adjacent cluster C' individually.
  • Solution:
  • Inform each vertex of C of the identities of all clusters adjacent to C, by a convergecast and a broadcast.
  • Pipeline the individual selection processes.

19
Analysis
(C1, C2, ..., Cp) = the clusters constructed by the algorithm.
For cluster Ci:
Ei = the set of edges with at least one endpoint in Ci,
ni = |Ci|, mi = |Ei|, ri = Rad(Ci)
20
Analysis (cont)
ClusterCons: the depth-bounded Dijkstra procedure constructs Ci and its BFS tree in O(ri²) time and O(ni·ri + mi) messages.
⇒ Time(ClusterCons) = Σi O(ri²) ≤ Σi O(ri·k) = k·Σi O(ri) ≤ k·Σi O(ni) = O(kn)
(using ri ≤ k and ri ≤ ni)
Q: Prove an O(n) bound.
21
Analysis (cont)
Ci and its BFS tree cost O(ri²) time and O(ni·ri + mi) messages.
⇒ Comm(ClusterCons) = Σi O(ni·ri + mi)
Each edge occurs in at most 2 distinct sets Ei, hence
Comm(ClusterCons) = O(nk + |E|).
22
Analysis (NextCtr)
The DFS process on the cluster tree is more expensive than a plain DFS: visiting a cluster Ci and deciding the next step requires O(ri) time and O(ni) communication.
[Figure: a DFS step on the cluster tree]
23
Analysis (NextCtr)
  • The DFS visits the clusters of the cluster tree O(p) times.
  • The entire DFS process (not counting the Procedure ClusterCons invocations) requires:
  • Time(NextCtr) = O(pk) = O(nk)
  • Comm(NextCtr) = O(pn) = O(n²)

24
Analysis (RepEdge)
si = the number of neighboring clusters surrounding Ci.
Convergecasting the ID of one neighboring cluster C' in Ci costs O(ri) time and O(ni) messages.
For all si neighboring clusters: O(si + ri) time (by pipelining) and O(si·ni) messages.
25
Analysis (RepEdge)
The pipelined inter-cluster edge selection is similar. As si ≤ n, we get:
Time(RepEdge) = maxi O(si + ri) = O(n)
Comm(RepEdge) = Σi O(si·ni) = O(n²)
26
Analysis
Thm: The distributed Algorithm BasicPart requires
Time = O(nk) and Comm = O(n²).
27
Sparse spanners
Example: the m-dimensional hypercube Hm = (Vm, Em),
Vm = {0,1}^m, Em = {(x,y) : x and y differ in exactly one bit},
|Vm| = 2^m, |Em| = m·2^(m-1), diameter = m.
Ex: Prove that for every m ≥ 0, the m-cube has a 3-spanner with ≤ 7·2^m edges.
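
For concreteness, a small sketch that builds Hm and checks the vertex and edge counts above (the bit-vector-as-integer representation is an assumption):

    def hypercube(m):
        """H_m: vertices are m-bit integers; x and y are adjacent
        iff they differ in exactly one bit."""
        V = list(range(2 ** m))
        E = [(x, x ^ (1 << b)) for x in V for b in range(m)
             if x < x ^ (1 << b)]   # list each edge once
        return V, E

    V, E = hypercube(4)
    assert len(V) == 2 ** 4 and len(E) == 4 * 2 ** 3  # |Em| = m * 2^(m-1)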
28
Regional Matchings
A locality-sensitive tool for distributed match-making.
29
Distributed match making
A paradigm for establishing a client-server connection in a distributed system (via specified rendezvous locations in the network).
Ads of server v are written in the locations Write(v); client u reads ads in the locations Read(u).
[Figure: server v with its write locations and client u with its read locations]
30
Regional Matchings
Requirement: the read and write sets must intersect:
for every v,u ∈ V, Write(v) ∩ Read(u) ≠ ∅
⇒ client u must find an ad of server v.
[Figure: Write(v) and Read(u) intersecting]
31
Regional Matchings (cont)
Distance considerations are taken into account: client u must find an ad of server v only if they are sufficiently close.
ℓ-regional matching: read and write sets RW = {Read(v), Write(v) : v ∈ V} s.t. for every v,u ∈ V:
dist(u,v) ≤ ℓ ⇒ Write(v) ∩ Read(u) ≠ ∅
32
Regional Matchings (cont)
Degree parameters:
Dwrite(RW) = max_{v∈V} |Write(v)|
Dread(RW) = max_{v∈V} |Read(v)|
33
Regional Matchings (cont)
Radius parameters:
Strwrite(RW) = max_{u,v∈V} { dist(u,v) : u ∈ Write(v) } / ℓ
Strread(RW) = max_{u,v∈V} { dist(u,v) : u ∈ Read(v) } / ℓ
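
These definitions translate directly into executable checks; a hedged sketch, assuming dist is a precomputed distance function and read/write map each vertex to a set of locations (all names hypothetical):

    def is_regional_matching(V, dist, read, write, ell):
        """ℓ-regional matching: every pair within distance ℓ has
        intersecting Write(v) and Read(u)."""
        return all(write[v] & read[u]
                   for v in V for u in V if dist(u, v) <= ell)

    def parameters(V, dist, read, write, ell):
        """Degree and radius parameters of slides 32-33."""
        d_write = max(len(write[v]) for v in V)
        d_read = max(len(read[v]) for v in V)
        s_write = max(dist(u, v) for v in V for u in write[v]) / ell
        s_read = max(dist(u, v) for v in V for u in read[v]) / ell
        return d_write, d_read, s_write, s_read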
34
Regional matching construction
  • Given a graph G and k, ℓ ≥ 1,
  • construct a regional matching RW_ℓ,k.
  • Set S = Γℓ(V)
  • (the ℓ-neighborhood cover {Γℓ(v) : v ∈ V})

35
Regional matching construction
  • Build a coarsening cover T as in the Max-Deg-Cover Thm.

36
Regional matching construction
  • Select a center vertex r0(T) in each cluster T ∈ T.

37
Regional matching construction
  • Select for every v a cluster Tv ∈ T s.t. Γℓ(v) ⊆ Tv.

[Figure: cluster Tv = T1 containing the ball Γℓ(v) around v]
38
Regional matching construction
  • Set
  • Read(v) = { r0(T) : v ∈ T }
  • Write(v) = { r0(Tv) } (see the sketch below)

[Figure: cluster T1 with center r1 and the ball Γℓ(v); here Read(v) = {r1, r2, r3} and Write(v) = {r1}]
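
A compact sketch of the construction on slides 34-38, assuming the coarsening cover and its centers are already given (building the cover itself is not shown; all names hypothetical):

    def build_regional_matching(V, cover, center, ball):
        """cover: clusters (vertex sets) of the coarsening cover T;
        center[i] = r0 of cover[i]; ball[v] = Γ_ℓ(v).
        Returns the Read and Write sets of slide 38."""
        read, write = {}, {}
        for v in V:
            # Read(v): centers of all clusters containing v.
            read[v] = {center[i] for i, T in enumerate(cover) if v in T}
            # Write(v): center of one cluster T_v with Γ_ℓ(v) ⊆ T_v.
            i_v = next(i for i, T in enumerate(cover) if ball[v] <= T)
            write[v] = {center[i_v]}
        return read, write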
39
Analysis
Claim: the resulting RW_ℓ,k is an ℓ-regional matching.
Proof: Consider u,v such that dist(u,v) ≤ ℓ.
Let Tv be the cluster s.t. Write(v) = {r0(Tv)}.
40
Analysis (cont)
By definition, u ∈ Γℓ(v). Also Γℓ(v) ⊆ Tv ⇒ u ∈ Tv ⇒ r0(Tv) ∈ Read(u) ⇒ Read(u) ∩ Write(v) ≠ ∅.
41
Analysis (cont)
Thm: For every graph G(V,E,w) and ℓ,k ≥ 1, there is an ℓ-regional matching RW_ℓ,k with:
Dread(RW_ℓ,k) ≤ 2k·n^(1/k)
Dwrite(RW_ℓ,k) = 1
Strread(RW_ℓ,k) ≤ 2k+1
Strwrite(RW_ℓ,k) ≤ 2k+1
42
Analysis (cont)
  • Taking k = log n, we get:
  • Corollary: For every graph G(V,E,w) and ℓ ≥ 1, there is an ℓ-regional matching RW_ℓ with:
  • Dread(RW_ℓ) = O(log n)
  • Dwrite(RW_ℓ) = 1
  • Strread(RW_ℓ) = O(log n)
  • Strwrite(RW_ℓ) = O(log n)