Title: BFS and DFS
1BFS and DFS
- BFS and DFS in directed graphs
- BFS in undirected graphs
- An improved undirected BFS-algorithm
2The Buffered Repository Tree (BRT)
- Stores key-value pairs (k,v)
- Supported operations
- INSERT(k,v) inserts a new pair (k,v) into T
- EXTRACT(k) extracts all pairs with key k
- Complexity
- INSERT: O((1/B) log2(N/B)) I/Os amortized
- EXTRACT: O(log2(N/B) + K/B) I/Os amortized (K = number of reported elements)
3The Buffered Repository Tree (BRT)
- Leaves store between B/4 and B elements
- Internal nodes have buffers of size B
- Root in main memory, rest on disk
4INSERT(k,v)
- O(X/B) I/Os to empty buffer of size X ≥ B
- Amortized charge per element and level O(1/B)
- Height of tree O(log2(N/B))
- Insertion cost O((1/B)log2(N/B)) amortized
5EXTRACT(k)
- Number of traversed nodes: O(log2(N/B) + K/B)
- I/Os per node: O(1)
- Cost of operation: O(log2(N/B) + K/B)
- But careful with removal of extracted elements
(Figure: the elements with key k in the tree.)
6Cost of Rebalancing
- O(N/B) leaf creations and deletions
- O(N/B) node splits, fusions, merges
- Each such operation costs O(1) I/Os
- O(N/B) I/Os for rebalancing
- Theorem: The BRT supports INSERT and EXTRACT operations in O((1/B) log2(N/B)) and O(log2(N/B) + K/B) I/Os amortized, respectively.
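The buffering idea behind these bounds can be illustrated with a small in-memory sketch. It is a deliberate simplification and not the BRT itself: the tree is a static binary tree over an assumed integer key universe, so rebalancing is not modeled, and the names `BRTSketch`, `universe`, and `B` are hypothetical.

```python
class _Node:
    """Node responsible for the keys in the half-open range [lo, hi)."""

    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
        self.buffer = []              # (key, value) pairs waiting at this node
        self.left = self.right = None
        if hi - lo > 1:               # internal node: split the key range in half
            mid = (lo + hi) // 2
            self.left, self.right = _Node(lo, mid), _Node(mid, hi)


class BRTSketch:
    """Simplified buffered repository tree over integer keys 0..universe-1."""

    def __init__(self, universe, B=4):
        self.root = _Node(0, universe)
        self.B = B                    # assumed buffer capacity of internal nodes

    def insert(self, k, v):
        # INSERT only touches the root buffer; work is deferred until overflow.
        self.root.buffer.append((k, v))
        self._flush_if_full(self.root)

    def _flush_if_full(self, node):
        # Leaves keep their pairs; an internal node empties an overfull
        # buffer by distributing its pairs to its two children.
        if node.left is None or len(node.buffer) <= self.B:
            return
        for k, v in node.buffer:
            child = node.left if k < node.left.hi else node.right
            child.buffer.append((k, v))
        node.buffer = []
        self._flush_if_full(node.left)
        self._flush_if_full(node.right)

    def extract(self, k):
        # EXTRACT walks the root-to-leaf path for k and removes every pair
        # with key k from the buffers along that path.
        found, node = [], self.root
        while node is not None:
            found += [p for p in node.buffer if p[0] == k]
            node.buffer = [p for p in node.buffer if p[0] != k]
            if node.left is None:
                break
            node = node.left if k < node.left.hi else node.right
        return found
```

An insertion is cheap because it only appends to the root buffer, while an extraction has to inspect every buffer on one root-to-leaf path. That is the asymmetry the two bounds express.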
7Directed DFS
- Algorithm proceeds as internal memory algorithm
- Use stack to determine order in which vertices are visited
- For current vertex v
- Find unvisited out-neighbor w
- Push w on the stack
- Continue search at w
- If no unvisited out-neighbor exists
- Remove v from stack
- Continue search at v's parent
- Stack operations cost O(N/B) I/Os
- Problem: Finding an unvisited vertex
8Directed DFS
- Data structures
- BRT T
- Stores directed edges (v,w) with key v
- Priority queues P(v), one per vertex
- Stores unexplored out-edges of v
- Invariant (figure legend): every out-edge of v is in one of three states
- Not in P(v)
- In P(v) and in T
- In P(v), but not in T
9Directed DFS
- Finding next vertex w after vertex v (each step lists its cost for one vertex, then its total cost)
- EXTRACT(v): Retrieve red edges from T. O(log2(E/B) + K1/B); total O(V log2(E/B) + E/B)
- Remove these edges from P(v) using DELETE. O(sort(K1)); total O(V + sort(E))
- Retrieve next edge using DELETEMIN on P(v). O((1/B) logm(E/B)); total O(sort(E))
- Insert in-edges of w into T. O(1 + (K2/B) log2(E/B)); total O((E/B) log2(E/B))
- Push w on the stack. O(1/B) amortized; total O(V/B)
- Total: O((V + E/B) log2(E/B))
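The interplay of the stack, the repository T, and the queues P(v) can be sketched in memory as follows. This is only an illustration under assumptions: a dictionary of lists stands in for the BRT, Python's `heapq` stands in for the external priority queues, every vertex is assumed to appear as a key of `out_adj`, and the function name `directed_dfs` is hypothetical. The point of the sketch is that no "visited" array is ever consulted.

```python
import heapq
from collections import defaultdict


def directed_dfs(out_adj, root):
    """Return the vertices reachable from root in DFS preorder."""
    in_adj = defaultdict(list)
    for v, ws in out_adj.items():
        for w in ws:
            in_adj[w].append(v)

    # P[v]: priority queue of the unexplored out-edges of v (stored by head w).
    # A sorted list is already a valid binary heap.
    P = {v: sorted(ws) for v, ws in out_adj.items()}
    # T: stand-in for the BRT. T[v] holds the heads w of edges (v, w) whose
    # head was visited since the last EXTRACT(v).
    T = defaultdict(list)
    order, stack = [], []

    def visit(w):
        order.append(w)
        for u in in_adj[w]:             # INSERT all in-edges (u, w) with key u
            T[u].append(w)
        stack.append(w)

    visit(root)
    while stack:
        v = stack[-1]
        stale = set(T.pop(v, []))       # EXTRACT(v)
        if stale:                       # DELETE the extracted edges from P(v)
            P[v] = [w for w in P[v] if w not in stale]
            heapq.heapify(P[v])
        if P[v]:
            visit(heapq.heappop(P[v]))  # DELETEMIN yields an unvisited head
        else:
            stack.pop()                 # no unvisited out-neighbor: backtrack
    return order
```

Because every edge into a visited vertex is removed from P(v) before the next DELETEMIN on P(v), the edge returned by DELETEMIN always leads to an unvisited vertex.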
10Directed DFS & BFS
- BFS can be solved using same algorithm
- Only modification: Use queue (FIFO) instead of stack
- Theorem: Depth-first search and breadth-first search in a directed graph G = (V,E) can be solved in O((V + E/B) log2(E/B)) I/Os.
- Exercise: Convince yourself that the priority queues P(v) are not necessary in the case of BFS.
11Undirected BFS
Partition graph into levels L(0), L(1), ... around source (figure: levels L(0), L(1), L(2), L(3))
- Observation: For v ∈ L(i), all its neighbors are in L(i-1) ∪ L(i) ∪ L(i+1).
- Build BFS-tree level by level
- Initially, L(0) = {r}
- Given levels L(i-1) and L(i)
- Let X(i) = set of all neighbors of vertices in L(i)
- Let L(i+1) = X(i) \ (L(i-1) ∪ L(i))
12Undirected BFS
- Constructing L(i+1)
- Retrieve adjacency lists of vertices in L(i) ⇒ X(i)
- Sort X(i)
- Scan L(i-1), L(i), and X(i) to
- Remove duplicates from X(i)
- Compute X(i) \ (L(i-1) ∪ L(i))
- Complexity: O(|L(i)| + sort(|L(i-1)| + |X(i)|)) I/Os per level, O(V + sort(E)) I/Os in total
- Theorem: Breadth-first search in an undirected graph G = (V,E) can be solved in O(V + sort(E)) I/Os.
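The level-by-level construction can be sketched in a few lines. This is an in-memory illustration only: `adj` is an assumed dictionary mapping each vertex to its neighbor list, the name `level_bfs` is hypothetical, and a Python set stands in for the parallel scan of the sorted lists L(i-1) and L(i).

```python
def level_bfs(adj, r):
    """Return the BFS levels L(0), L(1), ... of an undirected graph."""
    levels = [[r]]
    prev, cur = [], [r]
    while cur:
        # Retrieve the adjacency lists of L(i) and sort the result X(i).
        X = sorted(w for v in cur for w in adj[v])
        # One pass over the sorted X(i): drop duplicates and every vertex
        # that already lies in L(i-1) or L(i).
        known = set(prev) | set(cur)
        nxt = []
        for w in X:
            if w not in known and (not nxt or nxt[-1] != w):
                nxt.append(w)
        if nxt:
            levels.append(nxt)
        prev, cur = cur, nxt
    return levels
```

Only the two most recent levels are needed to filter X(i), which is exactly what the observation about undirected graphs guarantees.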
13A Faster BFS-Algorithm
- Problem with simple BFS-algorithm
- Random accesses to retrieve adjacency lists
- Idea for a faster algorithm
- Load more than one adjacency list at a time
- Reduces number of random accesses
- Causes edges to be involved in more than one iteration of the algorithm
- Trade-off
14A Faster BFS-Algorithm (Randomized)
- Let 0 < μ < 1 be a parameter (specified later)
- Two phases
- Build μV disjoint clusters of diameter O(1/μ)
- Perform modified version of SIMPLEBFS
- Clusters C1,...,Cq formed using BFS from randomly chosen set V' = {r1,...,rq} of masters
- Vertex is chosen as a master with probability μ (coin flip)
- Observation: E[|V'|] = μV. That is, the expected number of clusters is μV.
15Forming Clusters (Randomized)
- Apply SIMPLEBFS to form clusters
- L(0) = V', the set of masters
- v ∈ Ci if v is descendant of ri
16Forming Clusters (Randomized)
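- Lemma: The expected diameter of a cluster is 2/μ, where μ is the probability of choosing a vertex as a master.
- E[k] ≤ 1/μ
- Corollary: The clusters are formed in expected O((1/μ) sort(E)) I/Os.
(Figure: a path through the vertices x, v1, v2, v3, v4, v5, ..., vk and the vertex s.)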
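The cluster formation can be sketched as a BFS started from all masters at once. The sketch makes several assumptions that are not stated on the slides: the graph is connected, the source is always made a master, an ordinary in-memory queue replaces SIMPLEBFS, and the names `form_clusters`, `mu`, and `seed` are hypothetical.

```python
import random
from collections import deque


def form_clusters(adj, source, mu, seed=0):
    """Return (masters, cluster), where cluster[v] is the index of v's master."""
    rng = random.Random(seed)
    # Coin flip per vertex; the source is forced to be a master (assumption).
    masters = [v for v in adj if v == source or rng.random() < mu]
    cluster = {m: i for i, m in enumerate(masters)}
    frontier = deque(masters)             # L(0) = set of masters
    while frontier:
        v = frontier.popleft()
        for w in adj[v]:
            if w not in cluster:          # w joins the cluster that reaches it first
                cluster[w] = cluster[v]
                frontier.append(w)
    return masters, cluster
```

A larger `mu` yields more clusters of smaller diameter, a smaller `mu` yields fewer and larger clusters, which is the trade-off the algorithm tunes.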
17Forming Clusters (Randomized)
- Form files F1,...,Fq, one per cluster; Fi = concatenation of adjacency lists of vertices in Ci
- Augment every edge (v,w) ∈ Fi with the start position of file Fj s.t. w ∈ Cj
- Edge = triple (v,w,pj)
18The BFS-Phase
- Maintain a sorted pool H of edges s.t. adjacency lists of vertices in L(i) are contained in H
- Scan L(i) and H to find vertices in L(i) whose adjacency lists are not in H: O((|L(i)| + |H|)/B)
- Form list of start positions of files containing these adjacency lists and remove duplicates: O(sort(|L(i)|))
- Retrieve files, sort them, and merge resulting list H' with H: O(K + sort(|H'|) + |H|/B), where K is the number of retrieved files
- Scan L(i) and H to build X(i): O((|L(i)| + |H|)/B)
- Construct L(i+1) from L(i-1), L(i), and X(i) as before: O(sort(|L(i)| + |L(i-1)| + |X(i)|))
19The BFS-Phase
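- I/O-complexity of single step
- O(K + |H|/B + sort(|H'| + |L(i-1)| + |L(i)| + |X(i)|))
- Expected I/O-complexity: O(μV + E/(μB) + sort(E)), where μ is the master probability
- Choose μ = sqrt(E/(VB)), which balances the first two terms
- Theorem: BFS in an undirected graph G = (V,E) can be solved in O(sqrt(VE/B) + sort(E)) I/Os.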
20Single Source Shortest Paths
- The tournament tree
- SSSP in undirected graphs
- SSSP in planar graphs
21Single Source Shortest Paths
- Need
- I/O-efficient priority queue
- I/O-efficient method to update only
unvisited vertices
22The Tournament Tree
- I/O-efficient priority queue
- Supports
- INSERT(x,p)
- DELETE(x)
- DELETEMIN
- DECREASEKEY(x,p)
- All operations take O((1/B) log2(N/B)) I/Os amortized
- Note: N = size of the universe ≠ number of elements in the tree
23The Tournament Tree
- Static binary tree over all elements in the
universe
- Elements map to leaves, M elements per leaf
- Internal nodes store between M/2 and M elements
- Internal nodes have signal buffers of size M
- Root in main memory, rest on disk
24The Tournament Tree
- Elements stored at each node are sorted by priority
- Elements at node v have smaller priority than elements at v's descendants
- Convention: x ∈ T if and only if p(x) is finite
25The Tournament Tree: Deletions
- Operation DELETE(x) ⇒ signal DELETE(x)
(Figure labels: x, DELETE(x), UPDATE(x,∞).)
26The Tournament Tree: Insertions and Updates
- Operations INSERT(x,p) and DECREASEKEY(x,p) ⇒ signal UPDATE(x,p)
- If the node stores x with current priority p'
- If p < p': update
- If p ≥ p': do nothing
- If the node does not store x and all its elements have priority < p
- Forward signal to w
- If the node does not store x and at least one element has priority ≥ p
- Insert x
- Send DELETE(x) to w
27The Tournament Tree: Handling Overflow
- Let y be element with highest priority py
- Send signal PUSH(y,py) to appropriate child of v
28The Tournament Tree: Keeping the Nodes Filled
- O(M/B) I/Os to move M/2 elements one level up the tree
29The Tournament Tree: Signal Propagation
- Scan v's signals, partition into sets Xu and Xw
- Load u into memory, apply signals in Xu to u, insert signals into u's signal buffer
- Do the same for w
- O((X + M)/B) = O(X/B) I/Os
30The Tournament Tree: Analysis
- Elements travel up the tree
- Cost O(1/B) I/Os amortized per element and level
- O((K/B)log2(N/B)) I/Os for K operations
- Signals travel down the tree
- Cost O(1/B) I/Os amortized per signal and level
- O(K) signals for K operations
- O((K/B)log2(N/B)) I/Os
- Theorem: The tournament tree supports INSERT, DELETE, DELETEMIN, and DECREASEKEY operations in O((1/B) log2(N/B)) I/Os amortized.
31Single Source Shortest Paths
- Modified Dijkstra
- Retrieve next vertex v from priority queue Q using DELETEMIN
- Retrieve v's adjacency list
- Update distances of all of v's neighbors, except predecessor u on the path from s to v
- Repeat
- O(V + (E/B) log2(V/B)) I/Os using tournament tree
32Single Source Shortest Paths
- Problem
- Observation: If v performs a spurious update of u, u has tried to update v before.
- Record this update attempt of u on v by inserting u into another priority queue Q'
- Priority: d(s,u) + w(u,v)
(Figure: the edge between u and v.)
33Single Source Shortest Paths
- Second modification
- Retrieve next vertex using two DELETEMINs, one on Q, one on the second queue Q'
- Let (x,px) be the element retrieved from Q, let (y,py) be the element retrieved from Q'
- If px ≤ py: re-insert (y,py) into Q' and proceed as normal
- If px > py: re-insert (x,px) into Q and perform a DELETE(y) on Q
34Single Source Shortest Paths
- Lemma: A spurious update is removed from Q before the targeted vertex can be retrieved using DELETEMIN.
- Event A: Spurious update happens (time d(s,v))
- Event B: Vertex u is deleted from Q by retrieval of u from the second queue Q' (time d(s,u) + w(e))
- Event C: Vertex u is retrieved from Q using DELETEMIN operation (time d(s,v) + w(e))
(Figure: the edge e between u and v.)
35Single Source Shortest Paths
- Assume that all vertices have different distances from source s
- d(u) < d(v)
- d(v) ≤ d(u) + w(e) < d(v) + w(e)
- Sequence of events: A → B → C
- Theorem: The single source shortest path problem on an undirected graph G = (V,E) can be solved in O(V + (E/B) log2(V/B)) I/Os.
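The two-queue scheme can be sketched in memory as follows. The sketch assumes positive edge weights and pairwise distinct distances from s, as in the analysis above. It is not the tournament tree: `DictPQ` is a hypothetical toy queue with a linear-time minimum, and peeking at the minimum replaces the "DELETEMIN, then re-insert" pattern. `Q2` plays the role of the second queue, and the `tag` field carrying the predecessor is an assumption of this sketch.

```python
class DictPQ:
    """Toy priority queue with UPDATE, DELETE, and a peek at the minimum."""

    def __init__(self):
        self.items = {}                        # element -> (priority, tag)

    def update(self, x, p, tag=None):
        # INSERT or DECREASEKEY: keep the smaller of old and new priority.
        if x not in self.items or p < self.items[x][0]:
            self.items[x] = (p, tag)

    def delete(self, x):
        self.items.pop(x, None)                # no-op if x is absent

    def peek_min(self):
        if not self.items:
            return None
        x = min(self.items, key=lambda e: self.items[e][0])
        return x, self.items[x][0], self.items[x][1]


def sssp_two_queues(adj, s):
    """adj[v] = list of (neighbor, weight); returns distances from s."""
    Q, Q2 = DictPQ(), DictPQ()
    dist = {}
    Q.update(s, 0)
    while Q.items:
        v, d, pred = Q.peek_min()
        y = Q2.peek_min()
        if y is not None and y[1] < d:
            # A recorded update attempt of u comes first: remove it and
            # cancel a possible spurious copy of the finished vertex u in Q.
            (u, _), _, _ = y
            Q2.delete(y[0])
            Q.delete(u)
            continue
        Q.delete(v)
        dist[v] = d                            # no visited flags are consulted
        for w, wt in adj[v]:
            if w == pred:
                continue                       # skip predecessor on the s-v path
            Q.update(w, d + wt, tag=v)
            Q2.update((v, w), d + wt)          # record v's update attempt on w
    return dist
```

Every entry in `Q2` belongs to a vertex that is already finished, so the `DELETE` it triggers can only remove spurious entries from `Q`.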
36Planar Graphs
- Shortest paths in planar graphs
- Planar separators
- Planar DFS
37Shortest Paths in Planar Graphs
38Shortest Paths in Planar Graphs
- Observation: For every separator vertex v, the distances from s to v in G and GR are the same.
- The distances from s to all separator vertices can be computed in GR.
(Figure: the vertices s and v in G and in GR.)
39Shortest Paths in Planar Graphs
- Observation: For every vertex v in Gi, dist(s,v) = min{dist(s,x) + dist(x,v) : x ∈ ∂Gi}.
- Can compute dist(s,v) in the following graph
(Figure: a graph containing s and v.)
40Shortest Paths in Planar Graphs
- Three main steps
- Solve all-pairs shortest paths in subgraphs Gi
- Compute shortest paths from s to separator vertices in GR
- Compute shortest paths from s to all remaining vertices
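The three steps can be sketched in memory for a partition that is given as input. The sketch does not compute a separator and does not use planarity. It assumes that each region is a vertex set that includes its boundary vertices, that every edge has both endpoints in at least one region, and that s is treated like a boundary vertex of its region. The names `dijkstra` and `partitioned_sssp` are hypothetical.

```python
import heapq


def dijkstra(adj, src):
    """Textbook Dijkstra; adj[v] is a list of (neighbor, weight) pairs."""
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist[v]:
            continue                            # outdated heap entry
        for w, wt in adj.get(v, []):
            if d + wt < dist.get(w, float("inf")):
                dist[w] = d + wt
                heapq.heappush(heap, (d + wt, w))
    return dist


def partitioned_sssp(adj, regions, separator, s):
    special = set(separator) | {s}
    regions = [set(r) for r in regions]
    gr = {x: [] for x in special}               # the reduced graph GR
    tables = []
    # Step 1: shortest paths inside each subgraph Gi from its boundary vertices.
    for region in regions:
        sub = {v: [(w, wt) for w, wt in adj[v] if w in region] for v in region}
        table = {x: dijkstra(sub, x) for x in region & special}
        tables.append(table)
        for x, dx in table.items():
            for y in table:
                if y != x and y in dx:
                    gr[x].append((y, dx[y]))    # GR edge weighted by local distance
    # Step 2: distances from s to all separator vertices in GR.
    d_sep = dijkstra(gr, s)
    # Step 3: dist(s,v) = min over boundary vertices x of dist(s,x) + dist(x,v).
    dist = {}
    for region, table in zip(regions, tables):
        for v in region:
            for x, dx in table.items():
                if x in d_sep and v in dx:
                    cand = d_sep[x] + dx[v]
                    if cand < dist.get(v, float("inf")):
                        dist[v] = cand
    return dist
```

On a small example the result can be checked against a plain run of `dijkstra` on the whole graph, which is what the regions are meant to reproduce.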
41Shortest Paths in Planar Graphs
- Regular h-partition
- O(N/h) subgraphs G1,...,Gr
- Each Gi has size at most h
- Each Gi has boundary size at most O(sqrt(h))
- Total number of separator vertices: O(N/sqrt(h))
- Number of boundary sets is O(N/h)
42Shortest Paths in Planar Graphs
- Three main steps
- Solve all-pairs shortest paths in subgraphs Gi
- Compute shortest paths from s to separator vertices in GR
- Compute shortest paths from s to all remaining vertices
- Assume the given partition is a regular B^2-partition
- Steps 1 and 3 take O(scan(N)) I/Os
- Graph GR has O(N/B) vertices and O(N) edges
43Shortest Paths in Planar Graphs
- Data structures
- List L storing tentative distances of all vertices
- Priority queue Q storing vertices with their tentative distances as priorities
- One step
- Retrieve next vertex v using DELETEMIN
- Get distances of v's neighbors from L
- Update their distances in Q using DELETE and INSERT
- O(N + sort(N)) I/Os
44Shortest Paths in Planar Graphs
- One I/O per boundary set
- Each boundary set is touched O(B) times
- Once per vertex on the boundary of the region
- O(N/B^2) boundary sets ⇒ O(N/B) I/Os
45Planar Separator
- Goal: Compute a separator S of size O(N/sqrt(h)) whose removal partitions G into subgraphs of size at most h.
- Basic idea
- Compute hierarchy of log(DB) graphs of geometrically decreasing size using graph contraction
- Compute a separator of the smallest graph
- Undo the contractions and maintain the separator while doing this
- Assumption: M = Ω(h log2 B)
46Planar Separator
47Planar Separator
- Properties
- All Gi are planar
- |G_{i+1}| ≤ |G_i|/2
- Every vertex in G_{i+1} represents only a constant number of vertices in G_i
- Every vertex in G_{i+1} represents at most 2^{i+2} vertices in G0
- r = log2(DB) graphs G0,...,Gr
- |Gr| = O(N/(DB))
48Planar Separator
49Planar Separator
- Compute separator Sr of Gr
- Sr = S''r partitions Gr into connected components of size at most h·log2(DB)
- Takes O(|Gr|) = O(N/B) I/Os [AD96]
50Planar Separator
- Compute S_i from S_{i+1}
- Let S'i be the set of vertices in Gi represented by the vertices in S_{i+1}
- Connected components of Gi - S'i have size at most c·h·log2(DB)
- Partition every connected component of size more than h·log2(DB) into components of size at most h·log2(DB) ⇒ separator S''i
- Takes O(sort(|Gi|)) I/Os
- Connected components: O(sort(|Gi|))
- Partitioning happens in internal memory
- Total: O(sort(N)) I/Os
51Planar Separator
- Separator S0 partitions G0 into connected components of size at most h·log2(DB)
- Size of S0
52Planar Separator
- Compute a superset S of S0 so that no connected component of G - S has size more than h
- Partition every connected component of G - S0 separately in internal memory
- Total number of extra separator vertices is
- Extra cost: O(sort(N)) I/Os
53Building the Graph Hierarchy
- Properties
- All Gi are planar
- |G_{i+1}| ≤ |G_i|/2
- Every vertex in G_{i+1} represents only a constant number of vertices in G_i
- Every vertex in G_{i+1} represents at most 2^{i+2} vertices in G0
- Build Gi1 from Gi by
- Contracting edges
- Merging vertices of degree 2 with the same
neighbors
54Building the Graph Hierarchy
- Iterative approach
- Extract set of edges that can be contracted
- Contract subset of these edges to reduce number of vertices by a factor of two
- Repeat until no contractible edges remain
- Problem
- Standard graph contraction procedure may contract
too many vertices into a single vertex.
55Building the Graph Hierarchy
- Solution
- Compute maximal matching of contractible subgraph
- Contract edges in the matching
- New problem
- We may not contract sufficient number of edges to
reduce number of vertices by a constant factor
- Two-stage contraction
- Contract maximal matching
- Contract edges between matched and unmatched
vertices
56Building the Graph Hierarchy
- Why is this two-stage approach good?
- No unmatched vertex remains in contractible subgraph
- Every matched vertex represents at least two vertices before the contraction
- Size of graph reduces by a factor of two
- If a single iteration takes O(sort(|G_i|)) I/Os, the whole construction of G_{i+1} from G_i takes O(sort(|G_i|)) I/Os
57A Single Contraction Phase
- Maximal matching can be computed and contracted in O(sort(|H|)) I/Os, where H is the current contractible subgraph
- Bipartite contraction
- Takes O(sort(|H|)) I/Os using buffer tree as priority queue
58Building the Graph Hierarchy
- Lemma: Graph G_{i+1} can be constructed from G_i in O(sort(|G_i|)) I/Os.
- Corollary: The whole graph hierarchy can be built in O(sort(|G0|)) = O(sort(N)) I/Os.
59Planar DFS
60Planar DFS
61Planar DFS
- Observation: Every cycle in the i-th layer is a boundary cycle of graph Gi.
- Every bicomp of a layer is a cycle.
62DFS in a Layer
63Planar DFS
- DFS in a single layer Hi takes O(sort(|Hi|)) I/Os
- Compute the bicomps
- Root the bicomp tree
- Remove one of the edges incident to parent cutpoint in each cycle
- Total I/O-complexity: O(sort(N))
64Planar DFS
65Planar DFS
66Building the Face-on-Vertex Graph
67Lower Bounds and Open Problems
- Lower bounds
- List ranking, BFS, DFS, and shortest paths
- Connected and biconnected components
- Open problems
68Lower Bounds: Split Proximate Neighbors
(Figure: four rows, each listing the elements 1 to 8.)
69Lower Bounds: Split Proximate Neighbors
- Lemma: Split proximate neighbors requires Ω(perm(N)) I/Os.
- Total: O(I(N) + scan(N)) = O(I(N))
- I(N) = Ω(perm(N))
70Lower Bounds: List Ranking
- Consider general algorithms for weighted list ranking
- Algorithm is only allowed to use associativity of sum operator
- Algorithm can be made to have the following property
- For every vertex v, v and succ(v) are both in main memory at some point during the course of the algorithm
- Note: The lower bound we show does not hold for unweighted list ranking or weighted list ranking over groups.
71Lower Bounds: List Ranking
- When both copies of x are in main memory, move to buffer of size B
- When buffer full, flush to disk
- Split proximate neighbors could be solved in O(I(N) + scan(N)) I/Os
- I(N) = Ω(perm(N))
72Lower Bounds: List Ranking, BFS, DFS, and Shortest Paths
- Theorem: List ranking requires Ω(perm(N)) I/Os.
- List ranking can be solved using BFS, DFS, or SSSP from the head of the list.
- Theorem: BFS, DFS, and SSSP require Ω(perm(N)) I/Os.
- Note: Again, lower bound holds only for algorithms that compute distances from source only by adding path lengths.
73Lower Bounds: Segmented Duplicate Elimination
- Let P ≤ N ≤ P^2
- Elements drawn from interval [2P+1, 3P]
- Construct Boolean array C[2P+1..3P] s.t. C[i] = 1 iff i ∈ S
- Proposition: Segmented duplicate elimination requires Ω(perm(N)) I/Os.
(Figure: example labeled S with the numbers 17, 18, 19, 20, 22, 23, 19, 19, 20, 20, 22, 20, 18, 23, 17, 19 and four labels P/2.)
74Lower Bounds: Connected Components
(Figure: the example numbers 17, 18, 19, 20, 22, 23, 19, 19, 20, 20, 22, 20, 18, 23, 17, 19 with labels S1, S2, S3, S4, and a graph drawn on the vertices 1 to 4 and 17 to 24.)
- Graph construction: O(scan(N)) I/Os
- V = Θ(P), E = N
75Lower Bounds: Connected and Biconnected Components
- Theorem: Computing the connected components of a graph G = (V,E) requires Ω(perm(E)) I/Os.
- Theorem: Computing the biconnected components of a graph G = (V,E) requires Ω(perm(E)) I/Os.
76More Classes of Sparse Graphs
- Grid graphs
- Separators: Size in O(sort(N)) I/Os
- BFS/SSSP: O(sort(N))
- DFS
- Graphs of bounded treewidth
- Separators O(N/h) in O(sort(N)) I/Os
- BFS/SSSP O(sort(N))
- DFS ???
77Open Problems
- Optimal separators for grid graphs
- DFS
- Grid graphs
- Graphs of bounded treewidth
- Semi-external shortest paths
- Optimal connectivity
- Optimal BFS, DFS, and shortest paths or lower bounds
- Directed graphs
- Topological sorting
- Strongly connected components