IO Efficient Minimum Spanning Tree Algorithm - PowerPoint PPT Presentation

1
IO Efficient Minimum Spanning Tree Algorithm

Dawei Chen
20080415

2
Outline

What is Minimum Spanning Tree (MST)
MST algorithms in internal memory
The connectivity problems in IO efficient manner
(pre-knowledge for MST in external memory)
IO Efficient MST algorithm

3
Outline

What is Minimum Spanning Tree (MST)
MST algorithms in internal memory
The connectivity problems in IO efficient manner
(pre-knowledge for MST in external memory)
IO Efficient MST algorithm

4
Minimum Spanning Tree (MST)

Spanning Tree (ST): Given a connected, undirected
graph, a spanning tree is a subgraph that is a
tree and contains all of the graph's vertices.
Minimum Spanning Tree (MST): Given a connected,
undirected graph with weights assigned to its
edges, an MST is an ST whose total edge weight is
no larger than that of any other ST.

5
Minimum Spanning Tree (MST)
6
Outline

What is Minimum Spanning Tree (MST)
MST algorithms in internal memory
The connectivity problems in IO efficient manner
(pre-knowledge for MST in external memory)
IO Efficient MST algorithm

7
MST algorithms in internal memory

Prim's and Kruskal's algorithms
Only Kruskal's algorithm is introduced here.

It works as follows:
1. Create a forest F (a set of trees), where each
vertex in the graph is a separate tree.
2. Create a set S containing all the edges in the
graph.
3. While S is nonempty:
a. Remove an edge with minimum weight from S.
b. If that edge connects two different trees,
add it to the forest, combining the two trees
into a single tree.
c. Otherwise discard that edge.
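The steps above can be sketched in Python as follows. This is a minimal in-memory sketch, not the presenter's code: the function name `kruskal`, the `(weight, x, y)` edge format, and the use of a union-find structure to test whether an edge connects two different trees are all assumptions made for illustration.

```python
def kruskal(num_vertices, edges):
    """Return the edges of a minimum spanning forest.

    edges is a list of (weight, x, y) tuples with vertices
    numbered 0 .. num_vertices - 1 (an assumed input format).
    """
    # Step 1: every vertex starts as its own tree.
    parent = list(range(num_vertices))

    def find(v):
        # Follow parent links to the representative of v's tree,
        # shortening the path along the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    forest = []
    # Steps 2 and 3: visit the edges in order of increasing weight.
    for weight, x, y in sorted(edges):
        root_x, root_y = find(x), find(y)
        if root_x != root_y:
            # The edge joins two different trees: keep it and merge them.
            parent[root_x] = root_y
            forest.append((weight, x, y))
        # Otherwise the edge would close a cycle and is discarded.
    return forest
```

Sorting the edge list once replaces the repeated "remove an edge with minimum weight" step; the order in which edges are examined is the same.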
8
MST algorithms in internal memory
This is our original graph. The numbers near the
arcs indicate their weight. None of the arcs are
highlighted.
9
MST algorithms in internal memory
AD and CE are the shortest arcs, with length 5,
and AD has been arbitrarily chosen, so it is
highlighted.
10
MST algorithms in internal memory
CE is now the shortest arc that does not form a
cycle, with length 5, so it is highlighted as the
second arc.
11
MST algorithms in internal memory
The next arc, DF with length 6, is highlighted
using much the same method.
12
MST algorithms in internal memory
The next-shortest arcs are AB and BE, both with
length 7. AB is chosen arbitrarily, and is
highlighted. The arc BD has been highlighted in
red, because there already exists a path (in
green) between B and D, so it would form a cycle
(ABD) if it were chosen.
13
MST algorithms in internal memory
The process continues to highlight the
next-smallest arc, BE with length 7. Many more
arcs are highlighted in red at this stage: BC
because it would form the loop BCE, DE because it
would form the loop DEBA, and FE because it would
form the loop FEBAD.
14
MST algorithms in internal memory
Finally, the process finishes with the arc EG of
length 9, and the minimum spanning tree is found.
15
MST algorithms in internal memory

Complexity of Kruskal's algorithm: O(E log E)
Proof: not presented here.

16
Outline

What is Minimum Spanning Tree (MST)
MST algorithms in internal memory
The connectivity problems in IO efficient manner
(pre-knowledge for MST in external memory)
IO Efficient MST algorithm

17
Connectivity Problem

The connectivity problem is to compute the number
of connected parts (components) of a given graph.

18
Connectivity Problem

The connectivity problem is equivalent to the
labeling problem: if two vertices are in the same
component, give them the same label.

19
SemiExternalConnectivity

The SemiExternalConnectivity algorithm assumes all
vertices can be loaded into main memory, i.e.
|V| < M.

Let L(x) denote the label of x. L(x) is a
number. For each edge (x, y), if L(x) != L(y):
for every vertex m, if L(m) = L(x) or L(m) = L(y),
then let L(m) = min{L(x), L(y)}.
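The labeling rule can be sketched as below. This is an illustrative in-memory version under stated assumptions: the slide does not say how labels are initialised, so the sketch assumes every vertex starts with its own index as label; the function name `semi_external_labels` is hypothetical. In the external-memory setting the edge list would be streamed from disk while the label array stays in memory.

```python
def semi_external_labels(num_vertices, edges):
    """Label vertices so that two vertices share a label exactly
    when they are in the same connected component.

    edges is an iterable of (x, y) pairs; it is read once, in order.
    """
    # Assumption: each vertex starts with a distinct label (its index).
    label = list(range(num_vertices))
    for x, y in edges:
        label_x, label_y = label[x], label[y]
        if label_x != label_y:
            smaller = min(label_x, label_y)
            # Relabel every vertex carrying either old label.
            for m in range(num_vertices):
                if label[m] == label_x or label[m] == label_y:
                    label[m] = smaller
    return label
```

The number of components is then the number of distinct values in the returned list, for example `len(set(semi_external_labels(n, edges)))`.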
20
SemiExternalConnectivity

Black solid lines have been checked.
Dashed lines have not been checked.
The red solid line is being checked.

21
SemiExternalConnectivity

Correctness of SemiExternalConnectivity: obvious.
Complexity: O(VE)

22
FullExternalConnectivity

FullExternalConnectivity algorithm for graph G

If |V| < M, apply SemiExternalConnectivity.
Otherwise:
Let wv denote the smallest neighbor of vertex w.
Create a subgraph H of G consisting of the edges
(w, wv) for all w in G.
Compress each connected part of H into a single
vertex.
Recursively run FullExternalConnectivity on the
compressed graph.
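The recursion can be sketched as follows. This is an in-memory simulation under several assumptions, not an IO-efficient implementation: `memory_size` stands in for M, the helper `label_components` plays the role of SemiExternalConnectivity, and the same helper is reused to find the connected parts of H (a real external-memory version would need an IO-efficient method for that step). The extra stop condition for an empty edge list is also an assumption, added so that isolated vertices cannot keep the recursion going. All function names are hypothetical.

```python
def label_components(vertices, edges):
    """Label each vertex with the smallest vertex of its component."""
    label = {v: v for v in vertices}
    for x, y in edges:
        label_x, label_y = label[x], label[y]
        if label_x != label_y:
            smaller = min(label_x, label_y)
            for m in label:
                if label[m] in (label_x, label_y):
                    label[m] = smaller
    return label


def full_external_connectivity(vertices, edges, memory_size):
    """Return a dict mapping every vertex to its component label."""
    # Base case: the vertices fit in memory, or nothing is left to merge.
    if len(vertices) < memory_size or not edges:
        return label_components(vertices, edges)

    # For every vertex w that has a neighbor, find its smallest neighbor wv.
    smallest = {}
    for x, y in edges:
        smallest[x] = min(smallest.get(x, y), y)
        smallest[y] = min(smallest.get(y, x), x)

    # H consists of the edges (w, wv).
    h_edges = list(smallest.items())

    # Compress: each connected part of H becomes one super-vertex.
    h_label = label_components(vertices, h_edges)
    super_vertices = sorted(set(h_label.values()))
    super_edges = sorted({
        (min(h_label[x], h_label[y]), max(h_label[x], h_label[y]))
        for x, y in edges
        if h_label[x] != h_label[y]
    })

    # Recurse on the compressed graph, then map labels back.
    compressed = full_external_connectivity(super_vertices, super_edges, memory_size)
    return {v: compressed[h_label[v]] for v in vertices}
```

Every vertex that has at least one neighbor is merged with that neighbor, so the number of non-isolated vertices at least halves in each round.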
23
FullExternalConnectivity

The compress procedure
Lines: edges in G. Red lines: edges in H (also in
G).

24
FullExternalConnectivity

Correctness: obvious
Complexity:

25
Outline

What is Minimum Spanning Tree (MST)
MST algorithms in internal memory
The connectivity problems in IO efficient manner
(pre-knowledge for MST in external memory)
IO Efficient MST algorithm

26
IO Efficient MST algorithm

SemiExternal case: similar to the connectivity
problem.
FullExternal case: also similar to the
connectivity problem.

27
SemiExternal case

All the vertices can be loaded into memory, i.e.
|V| < M.
Similar to the connectivity problem.

28
SemiExternal case

The modification: at every step we check the
unchecked edge with the smallest weight.
S is the result edge set.

Let L(x) denote the label of x. L(x) is a
number. For the unchecked edge (x, y) with the
smallest weight, if L(x) != L(y): for every
vertex m, if L(m) = L(x) or L(m) = L(y), then let
L(m) = min{L(x), L(y)}, and add (x, y) to S.
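A minimal sketch of this modified labeling procedure follows. It assumes edges are given as `(weight, x, y)` tuples, that labels start out as the vertex indices, and that sorting the edge list in memory stands in for an external sort; the function name `semi_external_mst` is hypothetical.

```python
def semi_external_mst(num_vertices, edges):
    """Return the result edge set S as a list of (weight, x, y) tuples."""
    # Assumption: each vertex starts with a distinct label (its index).
    label = list(range(num_vertices))
    result = []
    # Visiting edges in sorted order always yields the unchecked
    # edge with the smallest weight.
    for weight, x, y in sorted(edges):
        label_x, label_y = label[x], label[y]
        if label_x != label_y:
            smaller = min(label_x, label_y)
            # Merge the two components by relabeling their vertices.
            for m in range(num_vertices):
                if label[m] == label_x or label[m] == label_y:
                    label[m] = smaller
            result.append((weight, x, y))
    return result
```

Only the label array has to stay in memory; the sorted edge list is read once from start to end.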
29
SemiExternal case

Correctness: obvious, because it is actually
Kruskal's algorithm.
Complexity: O(sort(E))

30
FullExternal case

Similar to the connectivity problem.
The modification: when constructing the subgraph
H of G, we use (w, wv) where (w, wv) is the
lowest-weighted edge of w, rather than wv being
w's smallest neighbor.
The final H is the resulting MST.
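One way to picture the construction is the in-memory sketch below. It is an illustration under assumptions, not an IO-efficient implementation: the graph is assumed connected with pairwise distinct edge weights (the assumption made in the proof), edges are `(weight, x, y)` tuples, compression is simulated with a component array, and the name `full_external_mst` is hypothetical.

```python
def full_external_mst(num_vertices, edges):
    """Return the edges of H, collected over all rounds of compression."""
    # component[v] identifies the compressed vertex that v belongs to.
    component = list(range(num_vertices))
    h_edges = []
    remaining = list(edges)
    while remaining:
        # For every compressed vertex, find its lowest-weighted edge.
        lowest = {}
        for edge in remaining:
            weight, x, y = edge
            for c in (component[x], component[y]):
                if c not in lowest or weight < lowest[c][0]:
                    lowest[c] = edge
        chosen = sorted(set(lowest.values()))
        h_edges.extend(chosen)

        # Compress: merge the two sides of every chosen edge.
        for _, x, y in chosen:
            cx, cy = component[x], component[y]
            if cx != cy:
                smaller = min(cx, cy)
                component = [smaller if c in (cx, cy) else c for c in component]

        # Drop edges that now lie inside a single compressed vertex.
        remaining = [(w, x, y) for w, x, y in remaining
                     if component[x] != component[y]]
    return h_edges
```

With distinct weights the edges chosen in one round never form a cycle, so every chosen edge can be added to the result without a further check.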

31
FullExternal case

The construction of H
Lines: edges in G. Red lines: edges in H (also in
G).

32
FullExternal case

Correctness: proved later.
Complexity:

There are optimizations, but they are not
presented here.
33
Proof of FullExternal

First of all, we need to show that, with this
construction method, the resulting subgraph H is
a tree.
We will prove this by showing that there are no
cycles in H.
To simplify the problem, we assume that no two
edge weights are equal.

34
Proof of FullExternal

Suppose there is a cycle in H, with edges a1, a2,
..., an.
Clearly a1 is the lowest-weighted edge of either
vertex 1 or vertex 2. The same holds for
a2, a3, ..., an.
Assume a1 < a2. Then a2 is not vertex 2's
lowest-weighted edge, so it must be vertex 3's.
So a2 < a3.
Since a2 < a3, in the same way we get a3 < a4.
So we have a1 < a2 < a3 < ... < an < a1, which is
impossible.
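The no-cycle claim can also be checked empirically on small inputs. The sketch below is only a sanity check, not a substitute for the proof. It assumes distinct weights, no self-loops, and edges given as `(weight, x, y)` tuples; both function names are hypothetical.

```python
def lowest_edge_subgraph(edges):
    """Build H: for every vertex, keep its lowest-weighted edge."""
    lowest = {}
    for weight, x, y in edges:
        for v in (x, y):
            if v not in lowest or weight < lowest[v][0]:
                lowest[v] = (weight, x, y)
    # An edge that is lowest for both of its endpoints appears once.
    return set(lowest.values())


def is_acyclic(num_vertices, edges):
    """Return True if the given edges contain no cycle."""
    parent = list(range(num_vertices))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for _, x, y in edges:
        root_x, root_y = find(x), find(y)
        if root_x == root_y:
            # x and y were already connected, so this edge closes a cycle.
            return False
        parent[root_x] = root_y
    return True
```

A typical use is `is_acyclic(n, lowest_edge_subgraph(edges))` for an edge list with pairwise distinct weights.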

35
Proof of FullExternal

Further, for a tree T which is an MST of G, we
prove that every edge of H is also in T.
If not, there must be an edge (v, wv) that is in H
but not in T. Further, since H and T are both STs,
they have the same number of edges. So there must
be an edge (x, y) that is in T but not in H.

36
Proof of FullExternal

Edge (x, y) is not selected at random. W.l.o.g., we
assume that in T, the path from y to v is the
same as in H (note that y and v may be the same
vertex).

37
Proof of FullExternal

Assume the path from v to y is (a1, a2, ..., an),
where a1 = v and an = y. Because (v, wv) is the
lowest-weighted edge of v, (v, wv) < (a1, a2),
where v = a1. Similarly we have
(a1, a2) < (a2, a3) < ... < (an-1, an) < (x, y).
So we get (v, wv) < (x, y).

38
Proof of FullExternal

Then in T, we replace (x, y) by (v, wv) and get
another ST (*), whose total edge weight is smaller
than that of T. This contradicts the fact that T
is an MST.
So every edge of H is in T.
Further, by the definition of MST, H is
an MST of G.

(*) Note that a connected graph G is a tree if
and only if |V| = |E| + 1. Since |E| does not
change, a tree remains a tree provided that it is
still connected.