A Randomized Linear-Time Algorithm to Find Minimum Spanning Trees

1
A Randomized Linear-Time Algorithm to Find
Minimum Spanning Trees

David R. Karger
Philip N. Klein
Robert E. Tarjan

2
Talk Outline

Objective and related work from the literature
Intuition
Definitions
Algorithm
Proof and analysis
Conclusion and future work

3
Objective

A minimum spanning tree is a tree formed from a
subset of the edges in a given undirected graph,
with two properties
1. it spans the graph, i.e., it includes every
vertex in the graph, and
2. it is a minimum, i.e., the total weight of
all the edges is as low as possible.

Goal: find a minimum spanning tree of a graph in
linear time with very high probability.
4
Related Work

Borůvka 1926, textbook algorithms
Yao 1975
Cheriton and Tarjan 1976
Fredman and Tarjan 1987
Gabow 1986
Chazelle 1995

These are all deterministic results. What about a
randomized algorithm?
5
Intuition

Cycle Property
Cut Property
Randomization

6
Intuition

For any cycle C in a graph, the heaviest edge in
C does not appear in the minimum spanning tree.

7
Cycle Property
8
Cycle Property

For any graph, find all possible cycles and
remove the heaviest edge from each cycle. Does this
give a minimum spanning tree?
What is the time complexity?
How do we detect the cycles in the graph?

9
Cut Property

For any proper nonempty subset X of the
vertices, the lightest edge with exactly one
endpoint in X belongs to the minimum spanning
tree.

10
Cut Property
11
Borůvka Algorithm

For each vertex, select the minimum-weight edge
incident to the vertex. Contract all the selected
edges, replacing by a single vertex each
connected component defined by the selected edges
and deleting all resulting isolated vertices,
loops (edges both of whose endpoints are the
same), and all but the lowest-weight edge among
each set of multiple edges.

Running time: O(m log n)
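A single Borůvka step as described above can be sketched in Python. This is a minimal illustration, not the presenters' code: the function name `boruvka_step`, the `(weight, u, v)` edge tuples, and the assumptions of distinct weights and no self-loops are all choices made for this example.

```python
def boruvka_step(num_vertices, edges):
    """Run one Boruvka step (illustrative sketch).

    edges: list of (weight, u, v) with distinct weights, no self-loops,
    and endpoints in range(num_vertices).
    Returns (selected, new_num_vertices, contracted_edges).
    """
    # For each vertex, select its minimum-weight incident edge.
    best = [None] * num_vertices
    for e in edges:
        for x in (e[1], e[2]):
            if best[x] is None or e < best[x]:
                best[x] = e
    selected = {e for e in best if e is not None}

    # Union-find over the selected edges gives the components to contract.
    parent = list(range(num_vertices))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for _, u, v in selected:
        parent[find(u)] = find(v)

    # Drop loops; keep only the lightest edge between each pair of components.
    lightest = {}
    for w, u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            continue
        key = (min(ru, rv), max(ru, rv))
        if key not in lightest or w < lightest[key][0]:
            lightest[key] = (w, ru, rv)

    # Relabel surviving components 0..k-1; isolated components disappear.
    roots = sorted({r for _, ru, rv in lightest.values() for r in (ru, rv)})
    label = {r: i for i, r in enumerate(roots)}
    contracted = [(w, label[ru], label[rv]) for w, ru, rv in lightest.values()]
    return selected, len(roots), contracted


edges = [(1, 0, 1), (2, 2, 3), (3, 1, 2), (4, 0, 3)]
selected, n2, contracted = boruvka_step(4, edges)
print(sorted(selected), n2, contracted)
```

On this four-vertex example the step selects the edges of weight 1 and 2, leaving two contracted vertices joined by the weight-3 edge; the weight-4 edge is dropped as a parallel edge.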
12
Randomization

How can randomization help us achieve our
goal?
Borůvka + Cycle Property + Randomization
= Linear time with very high probability

13
Definition
14
Definition

Let G be a graph with weighted edges.
$w(x, y)$: the weight of edge $\{x, y\}$.
If F is a forest of a subgraph in G,
$F(x, y)$: the path (if any) connecting x and y in F;
$w_F(x, y)$: the maximum weight of an edge on $F(x, y)$, with the convention that
$w_F(x, y) = \infty$ if x and y are not connected in F.

15
Definition

An edge $\{x, y\}$ is F-heavy if $w(x, y) > w_F(x, y)$.
Otherwise, $\{x, y\}$ is F-light.
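The F-heavy test can be sketched directly from the definition. The snippet below is a naive illustration (one path search per query), not the linear-time verification algorithm used in the analysis; the names `max_weight_on_path` and `is_f_heavy` and the `(weight, x, y)` edge tuples are assumptions made for this example.

```python
import math
from collections import defaultdict


def max_weight_on_path(forest_edges, x, y):
    """Return the maximum edge weight on the forest path from x to y,
    or infinity if x and y are not connected in the forest."""
    adj = defaultdict(list)
    for w, u, v in forest_edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    # Depth-first search from x, carrying the heaviest edge seen so far.
    stack = [(x, None, -math.inf)]
    while stack:
        node, prev, heaviest = stack.pop()
        if node == y:
            return heaviest
        for nxt, w in adj[node]:
            if nxt != prev:  # a forest has no cycles, so this avoids backtracking
                stack.append((nxt, node, max(heaviest, w)))
    return math.inf


def is_f_heavy(forest_edges, edge):
    """An edge is F-heavy if it is heavier than every edge on the forest path
    between its endpoints; otherwise it is F-light."""
    w, x, y = edge
    return w > max_weight_on_path(forest_edges, x, y)


forest = [(1, "a", "b"), (2, "b", "c")]
print(is_f_heavy(forest, (3, "a", "c")))   # heavier than the whole path a-b-c
print(is_f_heavy(forest, (10, "a", "d")))  # endpoints not connected in the forest
```

An edge whose endpoints lie in different trees of F is always F-light under the infinity convention, which is why the second query is not F-heavy despite its large weight.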

16
F-heavy and F-light
[Figure: example edges labeled F-heavy and F-light]
17
F-heavy and F-light

Note that the edges of F are all F-light.
For any forest F, no F-heavy edge can be in the
minimum spanning forest of G.

This follows from the cycle property.
18
Algorithm

Recursive function call
Input: an undirected graph
Output: a minimum spanning forest
Running time: O(m) with very high probability
19
Algorithm

Step 1. Apply two successive Borůvka steps to the
graph, thereby reducing the number of vertices by
at least a factor of four.

20
Algorithm
21
Algorithm

Step 2. In the contracted graph, choose a
subgraph H by selecting each edge independently
with probability 1/2. Apply the algorithm
recursively to H, producing a minimum spanning
forest F of H. Find all the F-heavy edges (both
those in H and those not in H) and delete them.

22
Algorithm
23
Algorithm

Step 3. Apply the algorithm recursively to the
remaining graph to compute a spanning forest
F'. Return those edges contracted in Step 1
together with the edges of F'.
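Steps 1 to 3 can be put together in a short Python sketch. It is illustrative only: the F-light test uses a naive path search per edge instead of the linear-time verification algorithm, so it does not achieve the linear bound. The names `kkt_msf`, `boruvka_step`, and `f_light_edges`, the `(weight, u, v, edge_id)` tuples, and the assumptions of distinct weights, comparable vertex labels, and no self-loops are choices made for this example.

```python
import math
import random
from collections import defaultdict


def boruvka_step(edges):
    """One Boruvka step. Returns (ids of selected edges, contracted edge list)."""
    best = {}
    for e in edges:
        for x in (e[1], e[2]):
            if x not in best or e < best[x]:
                best[x] = e
    parent = {x: x for x in best}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for _, u, v, _ in best.values():
        parent[find(u)] = find(v)
    lightest = {}  # lightest edge between each pair of components; loops dropped
    for w, u, v, eid in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            key = (min(ru, rv), max(ru, rv))
            if key not in lightest or w < lightest[key][0]:
                lightest[key] = (w, ru, rv, eid)
    return {e[3] for e in best.values()}, list(lightest.values())


def f_light_edges(forest, edges):
    """Return the edges that are F-light (naive path search per edge)."""
    adj = defaultdict(list)
    for w, u, v, _ in forest:
        adj[u].append((v, w))
        adj[v].append((u, w))

    def w_f(x, y):
        stack = [(x, None, -math.inf)]
        while stack:
            node, prev, heaviest = stack.pop()
            if node == y:
                return heaviest
            for nxt, w in adj[node]:
                if nxt != prev:
                    stack.append((nxt, node, max(heaviest, w)))
        return math.inf  # not connected in the forest

    return [e for e in edges if e[0] <= w_f(e[1], e[2])]


def kkt_msf(edges, rng=random):
    """Return the ids of the minimum spanning forest edges."""
    if not edges:
        return set()
    # Step 1: two Boruvka steps.
    picked1, g = boruvka_step(edges)
    picked2, g = boruvka_step(g)
    if not g:
        return picked1 | picked2
    # Step 2: sample H, recurse, and discard the F-heavy edges.
    h = [e for e in g if rng.random() < 0.5]
    f_ids = kkt_msf(h, rng)
    forest = [e for e in h if e[3] in f_ids]
    remaining = f_light_edges(forest, g)
    # Step 3: recurse on what is left and add the contracted edges.
    return picked1 | picked2 | kkt_msf(remaining, rng)


edges = [(1, 0, 1, 0), (2, 1, 2, 1), (3, 0, 2, 2), (4, 2, 3, 3), (6, 1, 3, 4)]
print(sorted(kkt_msf(edges)))
```

Because the weights are distinct, the minimum spanning forest is unique, so the returned edge ids do not depend on the random sample; only the amount of work does.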

24
Algorithm
25
Algorithm
[Figure: edges of the contracted graph, split into edges of H and those not in H, each marked F-light or F-heavy]
26
Analysis

Correctness?
Worst-case time complexity?
Expected time complexity?

27
Correctness

Completeness
By the cut property, every edge contracted
during Step 1 is in the minimum spanning forest.
Hence the remaining edges of the minimum spanning
forest of the original graph form a minimum
spanning forest of the contracted graph.

28
Correctness

Soundness
By the cycle property, the edges deleted in Step
2 do not belong to the minimum spanning forest. By
the inductive hypothesis, the minimum spanning
forest of the remaining graph is correctly
determined by the recursive call in Step 3.

29
Worst-case time complexity

The worst-case running time of the minimum-spanning-forest
algorithm is
$O(\min\{n^2, m \log n\})$, the same as the bound for Borůvka's algorithm.
Count the total number of edges. Step 1 reduces
the number of vertices to at most 1/4 of the original, so a subproblem at
depth d has at most $n/4^d$ vertices and therefore at most $(n/4^d)^2$ edges. Summing
over all subproblems gives an $O(n^2)$ bound on the
total number of edges.

30
Worst-case time complexity

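Parent: E(G)
Left child: E(H)
Right child: E(G')
Number of edges in the next recursion level:
$|E(H)| + |E(G')| \le |E(G^*)| + |E(F)| \le |E(G)| - |V(G)|/2 + |V(G)|/4$,
where $G^*$ denotes G after the two Borůvka steps of Step 1.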

31
Worst-case time complexity
[Figure: recursion tree, labeled "m edges"]
32
Worst-case time complexity

The total time spent in Steps 1-3, excluding the recursive calls, is linear in
the number of edges.
Step 1 is just two steps of Borůvka's algorithm.
Step 2 takes linear time using the modified
Dixon-Rauch-Tarjan verification algorithm:
the F-heavy edges of G can be computed in time
linear in the number of edges of G.

33
Analysis

Given a graph G with n vertices and m edges.
One Borůvka step forms connected components from the
selected edges and replaces each by a single
vertex. Since each component contains at least 2
vertices, at most n/2 vertices remain.
In a component with k vertices, the $k - 1$ selected
edges are contracted and thus removed. Hence the number of edges removed is at
least $\sum_{c \in C} (v_c - 1) = n - |C|$, where C is the set of connected components and $v_c$ is the number of vertices in component c.
Since there are at most n/2 components,
at least n/2 edges are removed.

34
Analysis

Given $F = \mathrm{MST}(H)$, for (x, y) in H:
If (x, y) is in F, (x, y) is F-light.
If (x, y) is not in F, assume (x, y) is F-light and let P be the path F(x, y).
Then the heaviest edge in the cycle $P \cup \{(x, y)\}$ would lie on
P, and by the cycle property it would belong to no MST of H.
This contradicts $P \subseteq F$, thus (x, y)
is F-heavy.
Thus, each F-light edge in H is also in F, and
vice versa.

35
Analysis
According to the distribution of edges between H
and G', every edge of F belongs to both, so the edges of F are processed twice:
once by MST(H) and once by MST(G').
36
Analysis
The binary tree represents the recursive
invocations of MST.
The left child represents the invocation of
MST(H). The right child represents the invocation of
MST(G').
Since 2 Borůvka steps are performed before the
invocations of MST(H) and MST(G'), the number of
vertices is reduced by a factor of at least 4. Thus, the
height of the invocation tree is at most $\log_4 n$.
37
Analysis - Worst Case

Given a graph G with m edges and n vertices.
After 2 Borůvka steps, at most n/4 vertices and
$m - n/2$ edges remain in the contracted graph $G^*$. This also holds for
H and G', which are subgraphs of $G^*$.
Since $F = \mathrm{MST}(H)$, F has at most $v_H - 1$ edges, and
thus fewer than n/4.
According to the edge distribution,
$e_H + e_{G'} \le e_{G^*} + e_F \le m - n/2 + v_H \le m - n/2 + n/4 \le m$.
Thus, the total number of edges in the
subproblems is no more than in the original problem.

38
Analysis - Worst Case

Since the total number of edges of the subproblems at the
same depth is bounded by m, and the depth is at
most $\log_4 n$, the overall number of edges is at most
$m \log_4 n$.
Since a subproblem at depth d has
at most $n/4^d$ vertices, it has at most $(n/4^d)^2$ edges.
With at most $2^d$ subproblems at depth d, the overall number of edges is also bounded by

$\sum_{d \ge 0} 2^d (n/4^d)^2 = n^2 \sum_{d \ge 0} 8^{-d} = O(n^2)$.

Since the running time of the algorithm is
proportional to the number of edges, the time
complexity is

$O(\min\{n^2, m \log n\})$
39
Analysis - Average Case
Here, we analyze the average case by partitioning
the invocation tree into left paths (the red paths in the
figure). After counting the edges of the subproblems
along each left path, we sum them up to
get the overall estimate.
40
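Analysis - Average Case

For G with k edges, let $G^*$ be G after the Borůvka steps. Sampling each edge of $G^*$ with
probability 1/2 gives H, so $E[e_H] \le k/2$. Since $G^* \subseteq G$,
we have $E[e_{G^*}] \le E[e_G]$ and
$E[e_H] = E[e_{G^*}]/2 \le E[e_G]/2$.
Along the left path starting with $E[e_G] = k$, the
expected total number of edges is at most
$\sum_{i \ge 0} k/2^i = 2k$.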

41
Analysis - Average Case
Given $v_G = n$, $F = \mathrm{MST}(H)$ where $H \subseteq G$

Each F-light edge has probability 1/2
of being sampled into H.
Since each F-light edge in H is also in F, and F
includes no edges outside H, the chance that an
F-light edge is in F is also 1/2.
An edge e heavier than every edge of F, with both
endpoints connected in F, is never F-light, since it would be the
heaviest edge on the cycle it closes with F.
Thus, when F is connected, the heaviest F-light edge is always in F.
Given $e_F = k$, $e_{G'}$ is the number of trials needed to obtain k
successes (selections into H), so it follows a
negative binomial distribution.

42
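Analysis - Average Case
Given $v_G = n$, $F = \mathrm{MST}(H)$ where $H \subseteq G$

For $e_F = k$, $e_{G'}$ follows a negative binomial
distribution with parameters 1/2 and k. Thus
$E[e_{G'} \mid e_F = k] = k/(1/2) = 2k$.
Summing over all cases, we get
$E[e_{G'}] = \sum_k \Pr[e_F = k] \cdot 2k = 2E[e_F] \le 2(n - 1) < 2n$.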
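The mean of the negative binomial distribution used here can be checked with a small simulation. This is only an illustrative sketch: the function name `trials_until_k_successes`, the seed, and the sample sizes are arbitrary choices for this example.

```python
import random


def trials_until_k_successes(k, p, rng):
    """Count independent trials (success probability p) until the k-th success."""
    trials = successes = 0
    while successes < k:
        trials += 1
        if rng.random() < p:
            successes += 1
    return trials


rng = random.Random(1)
k = 50
samples = [trials_until_k_successes(k, 0.5, rng) for _ in range(2000)]
mean = sum(samples) / len(samples)
print(mean)  # expected to be near k / p = 100
```

Each success plays the role of an F-light edge being selected into H, so the trial count models the number of F-light edges examined before F has collected k edges.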

43
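Analysis - Average Case

For all right subproblems, the expected sum of edges
is at most

$\sum_{d \ge 1} 2^{d-1} \cdot 2 \cdot n/4^d = n$,

since there are $2^{d-1}$ right subproblems at depth d, each built on at most $n/4^d$ vertices and expected to have at most twice that many edges.
For each left path, the expected total number of
edges is twice that of the leading subproblem, which
is the root or a right child. So the overall expected
value is at most 2(m + n).
Since the running time is proportional to the overall
number of edges, its expected value is O(m + n) = O(m).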

44
Analysis - Probability of Linearity
Chernoff bound: given i.i.d. random
variables $x_i$, $0 < i \le n$, with X the sum of all
$x_i$, for $t > 0$ we have
[equation missing from the transcript]
Thus, the probability of fewer than s successes
(each with chance p) within k trials is
[equation missing from the transcript]
Analysis - Probability of Linearity

Given a left path with leading problem G, $e_G = k$.
Each edge is kept in the next subproblem with probability at most 1/2,
and each kept edge
contributes 1 to the total number of edges. The path
ends by the time the k-th edge removal occurs.
The probability that the path has more than 3k total edges is at most the
probability of fewer than k edge removals in k + 3k
trials. By the Chernoff bound, this
probability is $\exp(-\Omega(k))$.

46
Analysis - Probability of Linearity

Given $v_G = n'$. Each edge of G' has probability 1/2
of being in F. Since $e_F \le n' - 1$, the
probability that $e_{G'} > 3n'$ is at most the probability of
at most $n' - 1$ F edges in 3n' trials. By the
Chernoff bound, this probability is $\exp(-\Omega(n'))$.
There are at most n/2 vertices in total over all right subproblems. If
we take all the trials as a whole, the probability
that there are more than 3n/2 edges in all right
subproblems is $\exp(-\Omega(n))$.

47
Analysis - Probability of Linearity
Combining the previous two analyses, the total number of edges never
exceeds 3(m + 3n/2) with at least the probability below,
which is expressed over the set of all
right subproblems:
[equation missing from the transcript]

Thus, the probability that the time complexity is
O(m) is $1 - \exp(-\Omega(m))$.
