Title: External-Memory MST
1. External-Memory MST
2. Minimum-Spanning Tree
- Given a weighted, undirected graph G(V,E), the
minimum-spanning tree (MST) problem is the
problem of finding a spanning tree for G of
minimum weight.
- Assumptions
  - G is connected.
  - No two edges in G have the same weight.
3. External-Memory Graph Algorithms
- Standard two-level I/O model with a single disk
  - N = V + E
  - M = number of vertices/edges that can fit into
    internal memory.
  - B = number of vertices/edges per disk block.
- The graph is given as a list of edges sorted by
vertex.
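The bounds on the following slides are stated in terms of sort(E), which the slides do not spell out. In the standard two-level I/O model, scanning N items costs Θ(N/B) I/Os and sorting costs Θ((N/B) · log_{M/B}(N/B)) I/Os. A minimal sketch for getting a feel for these quantities; the helper names `scan_ios` and `sort_ios` are hypothetical and constant factors are ignored:

```python
import math


def scan_ios(n: int, B: int) -> int:
    """Blocks touched when reading n items sequentially: ceil(n / B)."""
    return math.ceil(n / B)


def sort_ios(n: int, M: int, B: int) -> float:
    """Order-of-magnitude sorting bound (n/B) * log_{M/B}(n/B)."""
    blocks = n / B
    if blocks <= 1:
        return 1.0  # everything fits in a single block
    # At least one pass over the data is always needed.
    return blocks * max(1.0, math.log(blocks, M / B))
```

For realistic M and B the logarithmic factor is a small constant, which is why an O(sort(E)) bound is far better than one I/O per edge.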
4. External-Memory Graph Algorithms (2)
- For MST and connected components (CC), randomized
O(sort(E))-I/O algorithms are known.
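5. Prim's Algorithm
[Figure: example graph on vertices a, b, c, d, e, f with distinct edge weights 1 to 9; edges labeled (b,a), (a,c), (c,d), (d,e), (a,f); a priority queue holding the vertices a b c d e f.]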
6. Prim's Algorithm (2)
- Prim's algorithm cannot be implemented
efficiently in external memory.
  - It is not guaranteed that even the priority queue
    alone fits in memory.
  - Thus, we cannot in general get the current priority
    of a vertex without using an I/O.
  - A direct implementation leads to an Ω(E) I/O
    algorithm.
7. Prim's Algorithm (3)
Modification: store edges in the priority queue
instead of vertices.
[Figure: the same example graph, with the priority queue now holding edges. Any two edges have distinct weights.]
Priority queue contents after each step, as shown on the slide:
- b,a (1) b,c (5) b,d (6)
- a,c (3) b,c (5) b,d (6) a,f (7)
- c,d (2) b,d (6) c,b (5) a,f (7) b,c (5) c,e (8)
- d,e (4) b,d (6) c,b (5) a,f (7) b,c (5) c,e (8) d,b (6)
- c,b (5) a,f (7) b,c (5) e,c (8) b,d (6) c,e (8) d,b (6) e,f (9)
- b,d (6) e,c (8) d,b (6) c,e (8) a,f (7) e,f (9)
- a,f (7) e,c (8) c,e (8) e,f (9)
- e,c (8) c,e (8) e,f (9) f,e (9)
8. Modified Prim Algorithm
- The correctness follows directly from the
correctness of the original algorithm (the blue
rule still applies).
- Efficiency
  - At least one I/O per vertex is needed in order to read its
    adjacency list, giving O(V + E/B) I/Os.
  - O(E) operations on an external priority queue can be
    performed in O(sort(E)) I/Os.
  - Thus in total we have O(V + sort(E)) I/Os.
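The edge-based variant can be sketched in memory as follows. This is an illustration under assumptions, not the external-memory implementation: Python's `heapq` stands in for the external priority queue, the function name `modified_prim` is hypothetical, and the example graph is reconstructed from the priority-queue contents shown on the slides. The point it shows is that no per-vertex lookup is needed: because weights are distinct, an edge whose endpoints are both already in the tree sits in the queue twice, and its two copies come out as consecutive minima.

```python
import heapq
from collections import defaultdict

# Example graph (assumed reconstruction of the slide figure): (u, v, weight).
EDGES = [("a", "b", 1), ("c", "d", 2), ("a", "c", 3), ("d", "e", 4),
         ("b", "c", 5), ("b", "d", 6), ("a", "f", 7), ("c", "e", 8),
         ("e", "f", 9)]


def modified_prim(edges, start):
    """Prim's algorithm with edges (not vertices) in the priority queue."""
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((w, v))
        adj[v].append((w, u))

    mst = []
    pq = [(w, start, v) for w, v in adj[start]]
    heapq.heapify(pq)
    while pq:
        w, u, v = heapq.heappop(pq)
        if pq and pq[0][0] == w:
            # Same weight twice means the same edge was inserted from both
            # endpoints, so both are already in the tree: discard both copies.
            heapq.heappop(pq)
            continue
        mst.append((u, v, w))
        # Read the adjacency list of the new tree vertex v and insert all
        # of its edges except the one we just arrived through.
        for w2, x in adj[v]:
            if x != u:
                heapq.heappush(pq, (w2, v, x))
    return mst
```

Starting from `b`, the queue evolves exactly as in the trace on slide 7, and the edges are reported in the order (b,a), (a,c), (c,d), (d,e), (a,f).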
9. Boruvka's Algorithm
(1) Select for each vertex the minimum weight
edge adjacent to it.
(2) Contract the graph and return to (1).
[Figure: the example graph with edge weights 1 to 9; the edges selected in step (1) are (b,a), (c,d), (d,e), (a,f).]
10. Boruvka's Algorithm
[Figure: the graph after one contraction; supervertices abf and cde are joined by parallel edges of weights 3, 5, 6, 9; edge labels (b,a), (a,c), (c,d), (d,e), (a,f).]
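The two steps above can be sketched in memory as follows. This is a minimal illustration, not the external-memory procedure: a union-find structure is used as a shortcut for the contraction, the function names are hypothetical, and the example graph is an assumed reconstruction of the slide figure.

```python
# Example graph (assumed reconstruction of the slide figure): (u, v, weight).
EDGES = [("a", "b", 1), ("c", "d", 2), ("a", "c", 3), ("d", "e", 4),
         ("b", "c", 5), ("b", "d", 6), ("a", "f", 7), ("c", "e", 8),
         ("e", "f", 9)]


def boruvka_step(vertices, edges):
    """One Boruvka step; returns (selected edges, contracted edges, vertex -> representative)."""
    # (1) Lightest incident edge for every vertex.
    best = {}
    for u, v, w in edges:
        for x in (u, v):
            if x not in best or w < best[x][2]:
                best[x] = (u, v, w)
    selected = set(best.values())

    # (2) Contract. Union-find is an in-memory shortcut used only here.
    parent = {x: x for x in vertices}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v, _ in selected:
        parent[find(u)] = find(v)
    rep = {x: find(x) for x in vertices}

    # Drop self-loops; keep only the lightest edge of each parallel bundle.
    lightest = {}
    for u, v, w in edges:
        a, b = sorted((rep[u], rep[v]))
        if a != b and ((a, b) not in lightest or w < lightest[(a, b)][2]):
            lightest[(a, b)] = (a, b, w)
    return selected, list(lightest.values()), rep


def boruvka_mst_weights(vertices, edges):
    """Repeat the step until no edges remain; return the sorted MST edge weights."""
    weights = []
    while edges:
        selected, edges, rep = boruvka_step(vertices, edges)
        weights.extend(w for _, _, w in selected)
        vertices = set(rep.values())
    return sorted(weights)
```

On the example, the first step selects the edges of weight 1, 2, 4 and 7 and leaves two supervertices joined by the edge of weight 3, matching the figure. Edges are reported by weight because endpoints are renamed to representatives after each contraction.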
11. External-Memory Boruvka's Step
- For each vertex v, let C(v) be the vertex joined to v
by the lightest edge incident to v.
- Let G' be the graph obtained by taking only edges
of the form (v, C(v)) for each v.
- Let G'_d be the graph obtained by directing each
edge (v, C(v)) in G' from C(v) to v.
- The goal is to contract each connected component
in G' into a single vertex.
12. Unique Representatives
- In each connected component of G'_d:
  - Each vertex has indegree 1.
  - The weight of the edges along any root-leaf path
    is increasing.
  - There is exactly one cycle, consisting of the
    minimal weight edge (selected by both of its endpoints).
13. External-Memory Boruvka's Step (2)
- The roots can be easily identified, and we can
choose them to be the unique representatives of
the components in G'.
- We would like to replace each edge (u, v) with an
edge (u_r, v_r), where u_r and v_r are the unique
representatives of the components containing u
and v respectively.
- Then, we can remove parallel edges and self edges, and
obtain the contracted graph.
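14. External-Memory Boruvka's Step (3)
[Figure: G, G' and G'_d for the example graph; the list L built from the edges (b,a) (1), (a,f) (7), (c,d) (2), (d,e) (4); the priority queue; and the output.]
Priority queue contents, as (vertex (weight) representative):
- a (1) b, d (2) c
- d (2) c, f (7) b
- e (4) c, f (7) b
Output (vertex → representative): b → b, c → c, a → b, d → c, f → b, e → c.
The priority queue is initialized with each vertex that is an immediate
successor of a root vertex.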
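One way to read this slide is that representatives are pushed down from the roots in order of increasing edge weight, which works because weights increase along every root-leaf path. The sketch below is an in-memory interpretation under that assumption: the dictionary `children` is a stand-in for the list L, `heapq` replaces the external priority queue, and the function name is hypothetical.

```python
import heapq
from collections import defaultdict


def propagate_representatives(roots, tree_edges):
    """tree_edges: (parent, child, weight) edges of G'_d after breaking cycles.

    Returns a list of (vertex, representative) pairs.
    """
    children = defaultdict(list)  # in-memory stand-in for the list L
    for parent, child, w in tree_edges:
        children[parent].append((w, child))

    output = [(r, r) for r in roots]  # every root represents itself
    # Initialize with each immediate successor of a root vertex.
    pq = [(w, child, r) for r in roots for w, child in children[r]]
    heapq.heapify(pq)
    while pq:
        w, v, r = heapq.heappop(pq)
        output.append((v, r))
        # Hand the representative on to the successors of v.
        for w2, child in children[v]:
            heapq.heappush(pq, (w2, child, r))
    return output
```

For the roots b and c and the edges (b,a) (1), (a,f) (7), (c,d) (2), (d,e) (4), the queue passes through the same three states listed on the slide, and every vertex ends up paired with b or c.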
15. External-Memory Boruvka's Step (4)
- To finish the contraction:
  - sort the output of the previous phase and E by
    the first component. Then scan the two lists
    simultaneously, replacing each edge (v, u) in E
    with (v_r, u).
  - sort the output and E by the second component,
    and then scan the two lists replacing each edge
    (v_r, u) in E with (v_r, u_r).
  - sort E by both components and by weight, and with
    a single scan remove duplicate edges and self edges.
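The three sort-and-scan passes above can be sketched as follows. This is an in-memory illustration in which Python's `sorted` stands in for an external sort and the loops play the role of the simultaneous scans; the function name and the example data are assumptions. Among parallel edges the lightest is kept, which is the one an MST algorithm needs.

```python
def relabel_by_sort_and_scan(edges, rep_pairs):
    """edges: (u, v, w); rep_pairs: one (vertex, representative) pair per vertex."""

    def replace(component, edge_list):
        # Sort both lists by the same key, then advance through them together.
        edge_list = sorted(edge_list, key=lambda e: e[component])
        reps = sorted(rep_pairs)
        out, j = [], 0
        for e in edge_list:
            while reps[j][0] < e[component]:
                j += 1
            e = list(e)
            e[component] = reps[j][1]
            out.append(tuple(e))
        return out

    edges = replace(0, edges)  # (v, u)   -> (v_r, u)
    edges = replace(1, edges)  # (v_r, u) -> (v_r, u_r)

    # Sort by both components and by weight, then clean up in one scan.
    edges = sorted((min(u, v), max(u, v), w) for u, v, w in edges)
    result = []
    for u, v, w in edges:
        if u == v:
            continue  # self edge
        if result and result[-1][:2] == (u, v):
            continue  # heavier parallel edge
        result.append((u, v, w))
    return result


EDGES = [("a", "b", 1), ("c", "d", 2), ("a", "c", 3), ("d", "e", 4),
         ("b", "c", 5), ("b", "d", 6), ("a", "f", 7), ("c", "e", 8),
         ("e", "f", 9)]
REPS = [("a", "b"), ("b", "b"), ("f", "b"), ("c", "c"), ("d", "c"), ("e", "c")]
```

With the example data, `relabel_by_sort_and_scan(EDGES, REPS)` leaves a single edge between the representatives b and c, of weight 3.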
16. Boruvka's Step - I/O efficiency
- 1. Lightest incident edges can be collected in
O(E/B) I/Os in a simple scan of the edge-list
representation of G (we assume it is sorted by vertex).
- 2. Detection of cycles in G'_d can be done in
O(sort(V)) I/Os:
  - sort the collected edges by weight and find
    duplicates in a single scan.
  - remove edges to break cycles and identify unique
    representatives.
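The duplicate-detection idea can be sketched as below: since weights are distinct, an edge that appears twice in the weight-sorted list was selected by both of its endpoints and therefore forms the cycle of its component. This is an in-memory illustration; the function name is hypothetical, and choosing the endpoint with the smaller name as root is an arbitrary assumption (the slides do not say which endpoint is used).

```python
def find_roots(chosen):
    """chosen: one (v, C(v), weight) triple per vertex v.

    Returns (roots, tree_edges) with tree_edges directed as (parent, child, weight).
    """
    by_weight = sorted(chosen, key=lambda t: t[2])
    roots, tree_edges = [], []
    i = 0
    while i < len(by_weight):
        v, cv, w = by_weight[i]
        if i + 1 < len(by_weight) and by_weight[i + 1][2] == w:
            # Edge selected from both sides: break the cycle by keeping
            # only the copy that leaves the chosen root.
            root, other = min(v, cv), max(v, cv)
            roots.append(root)
            tree_edges.append((root, other, w))
            i += 2
        else:
            tree_edges.append((cv, v, w))  # directed from C(v) to v
            i += 1
    return roots, tree_edges
```

Each component contributes exactly one duplicate, so the number of roots found equals the number of supervertices after the contraction.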
17. Boruvka's Step - I/O efficiency (2)
- 3. The list L contains each edge in G'_d at most
twice, and can be constructed in O(sort(V)) I/Os:
  - sort one instance of the list of edges by the
    second component.
  - sort another instance by the first component.
  - create the structure of L in a single scan and
    sort it by weight.
- 4. The PQ can be initialized in a similar way in
O(sort(V)) I/Os.
18. Boruvka's Step - I/O efficiency (3)
- 5. We perform a total of V insertions to PQ, and
V extract-min operations. That can be performed
in O(sort(V)) I/Os.
- 6. Replacing the edges of G with the unique
representatives is done using a few sorting and
scanning operations as described before. Here the
entire edge list is sorted, and thus O(sort(E))
I/Os are needed.
- Total
  - O(E/B + sort(V) + sort(E)) = O(sort(E)) I/Os.
19. Results So Far
- Modified Prim: O(V + sort(E)) I/Os
- Modified Boruvka: O(sort(E) · lg V) I/Os
- Boruvka followed by Prim: O(sort(E) · lg(VB/E)) I/Os
  - Contract G until V ≤ E/B using Boruvka's steps.
  - Run Prim on the result.
It is possible to perform lg(VB/E) Boruvka's
steps using lg lg(VB/E) superphases requiring
O(sort(E)) I/Os each.
20. Yet a better MST algorithm
- Superphase Algorithm
- At superphase i:
  - Let N_i = 2^((3/2)^i) (so N_{i+1} = N_i · (N_i)^(1/2)).
  - Let G_i = (V_i, E_i) be the graph prior to
    superphase i.
  - Let E'_i ⊆ E_i be the set that for each vertex
    contains the ⌈√N_i⌉ lightest edges incident to it.
  - Let the blocking value for a vertex be the weight
    of the (⌈√N_i⌉ + 1)-th lightest edge incident to it
    (or infinity if no such edge exists).
  - E'_i and blocking values can be found with
    O(sort(E_i)) I/Os as described earlier.
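The definitions of the lightest-edge set and the blocking values can be made concrete with a small in-memory sketch. The function name is hypothetical, the per-vertex lists stand in for a scan of the sorted edge list, and the example graph is an assumed reconstruction of the figure used on the earlier slides.

```python
import math
from collections import defaultdict

EDGES = [("a", "b", 1), ("c", "d", 2), ("a", "c", 3), ("d", "e", 4),
         ("b", "c", 5), ("b", "d", 6), ("a", "f", 7), ("c", "e", 8),
         ("e", "f", 9)]


def lightest_edges_and_blocking(edges, n_i):
    """Return (E'_i, blocking) for the parameter N_i of superphase i."""
    k = math.ceil(math.sqrt(n_i))  # number of edges kept per vertex
    incident = defaultdict(list)
    for u, v, w in edges:
        incident[u].append((w, u, v))
        incident[v].append((w, u, v))

    e_prime, blocking = set(), {}
    for x, lst in incident.items():
        lst.sort()  # lightest first
        e_prime.update((u, v, w) for w, u, v in lst[:k])
        # Weight of the (k+1)-th lightest incident edge, if there is one.
        blocking[x] = lst[k][0] if len(lst) > k else math.inf
    return e_prime, blocking
```

For N_0 = 2 each vertex keeps its 2 lightest edges. On the example this drops only the edge (b,d) of weight 6, and vertex c, whose third-lightest edge has weight 5, gets blocking value 5.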
21. Superphase Algorithm
- At superphase i, perform on G'_i = (V_i, E'_i) ⌈log √N_i⌉
contraction phases as described before, but now
select the lightest edge incident to a vertex
only if its weight is smaller than the vertex's blocking value.
- After a single contraction, the blocking value of
a supervertex is set to be the minimum of the
blocking values of the contracted vertices.
- After that, the remaining edges of E'_i contain
all edges of E_i adjacent to supervertex v with
weight smaller than the blocking value of v.
- Thus only edges that actually belong to the MST
are contracted.
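A single blocked contraction phase might look like the sketch below. It is an in-memory illustration under assumptions: union-find replaces the external contraction, parallel edges are left in place for brevity, the function name is hypothetical, and the example edge set and blocking values are the ones obtained for N_0 = 2 on the assumed reconstruction of the slide graph.

```python
import math

# E'_0 and blocking values for the example graph with N_0 = 2 (assumed data).
E_PRIME = [("a", "b", 1), ("c", "d", 2), ("a", "c", 3), ("d", "e", 4),
           ("b", "c", 5), ("a", "f", 7), ("c", "e", 8), ("e", "f", 9)]
BLOCKING = {"a": 7, "b": 6, "c": 5, "d": 6, "e": 9, "f": math.inf}


def blocked_contraction_phase(edges, blocking):
    """Returns (selected edges, remaining edges, blocking values of supervertices)."""
    # Lightest incident edge per vertex, taken only if below its blocking value.
    best = {}
    for u, v, w in edges:
        for x in (u, v):
            if w < blocking[x] and (x not in best or w < best[x][2]):
                best[x] = (u, v, w)
    selected = set(best.values())

    parent = {x: x for x in blocking}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v, _ in selected:
        parent[find(u)] = find(v)

    # A supervertex inherits the minimum blocking value of its members.
    new_blocking = {}
    for x, b in blocking.items():
        r = find(x)
        new_blocking[r] = min(b, new_blocking.get(r, math.inf))
    remaining = [(find(u), find(v), w) for u, v, w in edges if find(u) != find(v)]
    return selected, remaining, new_blocking
```

On the example, the first phase contracts the edges of weight 1, 2, 4 and 7 and yields two supervertices with blocking values 6 and 5. A second phase may still select the edge of weight 3, but an edge of weight 5 or more would be refused, since a lighter edge missing from the kept set (here the one of weight 6 is missing) could otherwise be overlooked.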
22. Superphase Algorithm (2)
- But how many vertices remain after each
superphase?
- The blocking value might prevent us from
selecting an edge for v. But if so, then:
  - The blocking value of v corresponds to the
    blocking value of some vertex u in V_i, and v must
    contain the ⌈√N_i⌉ edges adjacent to u in E'_i.
  - Thus v must be the contraction of at least √N_i
    vertices from V_i.
- If no blocking value prevents us from selecting
an edge for v, then after ⌈log √N_i⌉ phases, v must
be the contraction of at least 2^(log √N_i) = √N_i
vertices.
23. Superphase Algorithm (3)
- It can be proved by induction on i that |V_i| ≤ 2V /
N_i:
  - For i = 0, N_0 = 2 and |V_0| = V.
  - |V_{i+1}| ≤ |V_i| / √N_i ≤ (2V / N_i) / √N_i = 2V /
    N_{i+1}
- Conclusion: |E'_i| ≤ |V_i| · ⌈√N_i⌉ = O(V / √N_i)
- Thus, in order to reduce the number of vertices
by a factor of √N_i we used so far
  - O(sort(E_i) + sort(E'_i) · log √N_i)
  - = O(sort(E) + sort(V / √N_i) · log √N_i)
  - = O(sort(E)) I/Os.
24. Superphase Algorithm (4)
- In order to finish a superphase, we need to
reincorporate the edges of E_i that were not selected into E'_i.
- During the contraction phases, maintain a list C
of pairs of the form (v, v_s) for v ∈ V_i.
- Use the output of the Boruvka's step, as
described earlier, in order to update C:
  - Sort C by second component and the output by
    first component and scan them simultaneously.
  - This is done using O(sort(V_i)) I/Os.
- In total, in order to maintain C, we use
  - O(sort(V_i) · log √N_i) = O(sort(V / N_i) · log √N_i)
    = O(sort(V)) I/Os.
25. Superphase Algorithm I/O Efficiency
- E'_i and blocking values are computed in
O(sort(E_i)) I/Os.
- Each superphase takes up O(sort(E)) I/Os.
- Maintaining the list C during the superphase is
done with O(sort(V)) I/Os.
- Given C, the edges in (E_i \ E'_i) can be
reincorporated in O(sort(E)) I/Os as we did in the
single contraction algorithm.
- Finally, in order to reduce V to E/B,
log_{3/2} lg(VB / E) superphases are needed.
- Total: O(sort(E) · lg lg(VB / E)) I/Os.
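To see how quickly the doubly exponential sequence N_i = 2^((3/2)^i) pays off, the sketch below counts superphases using the bound |V_i| ≤ 2V / N_i from slide 23 and compares the result with the number of plain Boruvka steps, each of which at least halves the vertex count. The function names and the sample parameters are hypothetical, and the count is derived from the upper bound rather than from a real run.

```python
import math


def superphases_needed(V, E, B):
    """Smallest i such that the bound 2V / N_i drops to E/B or below."""
    i = 0
    while 2 * V / 2 ** (1.5 ** i) > E / B:
        i += 1
    return i


def boruvka_steps_needed(V, E, B):
    """Halving steps needed to get from V vertices down to E/B."""
    return max(0, math.ceil(math.log2(V * B / E)))
```

For example, with V = 2^20, E = 2^22 and B = 2^10, eight halving steps are needed, while the vertex bound after six superphases is already below E/B = 4096. The gap widens as VB/E grows, which is the difference between the lg(VB/E) and lg lg(VB/E) factors in the two bounds.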