Title: Online Topological Ordering
1Online Topological Ordering
- Siddhartha Sen, COS 518
- 11/20/2007
2Outline
- Problem statement and motivation
- Prior work (summary)
- Result by Ajwani et al.
- Algorithm
- Correctness
- Running time
- Implementation
- Comparison to prior work
- Incremental complexity analysis
- Practical implications
- Open problems
- Breaking news
3Problem statement
- Offline or static version (STO)
- Given a DAG G = (V,E) (with n = |V| and m = |E|), find a linear ordering T of its nodes such that for all directed paths from x ∈ V to y ∈ V (x ≠ y), T(x) < T(y), where T: V → [1..n] is a bijective mapping
- Online version (DTO)
- Edges of G are not known beforehand, but are revealed one by one
- Each time an edge is added to the graph, T must be updated
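The problem statement can be made concrete with a small validity check. This is a minimal sketch, not part of the original slides: the function name `is_valid_topological_order`, the dictionary representation of T, and the example graph are all assumptions for illustration. Checking each edge is enough, because the condition on paths then follows by transitivity.

```python
def is_valid_topological_order(T, edges):
    """Return True if T is a bijection onto 1..n and T[x] < T[y] for every edge (x, y)."""
    n = len(T)
    # T must use each position 1..n exactly once.
    if sorted(T.values()) != list(range(1, n + 1)):
        return False
    # Every edge must point from an earlier position to a later one.
    return all(T[x] < T[y] for x, y in edges)


# Hypothetical example graph on the node names used in the slides.
edges = [("v", "a"), ("a", "b"), ("c", "u")]
T = {"v": 1, "a": 2, "c": 3, "d": 4, "b": 5, "u": 6}

print(is_valid_topological_order(T, edges))                 # True
print(is_valid_topological_order(T, edges + [("u", "v")]))  # False: u -> v breaks T
```

In the online version, the second call is the situation an algorithm has to repair after each insertion.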
4Problem statement
[Figure: nodes a, b, c, d, u, v; inserting the edge u → v invalidates the topological order, with the affected region marked]
5Motivation
- Traditional applications
- Online cycle detection in pointer analysis
- Incremental evaluation of computational circuits
- Semantic checking by structure-based editors
- Maintaining dependences between modules during compilation
- Other applications
- Scheduling jobs in grid computing systems, where
dependences arise between the subtasks of a job
6Prior work (summary)
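- Offline problem: [bound not shown] per edge, [bound not shown] for m edges
- Alpern et al. (AHRSZ, 90): [bound not shown] per edge
- Marchetti-Spaccamela et al. (MNR, 96): [bound not shown] per edge (amortized), [bound not shown] for m edges
- Pearce and Kelly (PK, 04): [bound not shown] per edge
- Katriel and Bodlaender (KB, 05): [bound not shown] per edge (amortized), [bound not shown] for m edges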
incremental complexity analysis
7Ajwani et al. (AFM)
- Contributions
- Solves DTO in O(n^{2.75}) time, regardless of the number of edges m inserted
- Uses generic bucket data structure with efficient support for insert, delete, collect-all
- Analysis based on tunable parameter t = max number of nodes in each bucket
- Shortcomings
- Poor discussion of motivating applications
- No insight into how algorithm works or achieves running time
- No intuitive comparison with prior algorithms (AHRSZ, MNR, etc.)
8Notation
- d(u,v) denotes |T(u) − T(v)|
- u < v is shorthand for T(u) < T(v)
- u → v denotes an edge from u to v
- u ⇝ v means v is reachable from u
9Algorithm AFM
10Algorithm AFM
[Figure, animated over slides 10-26: nodes a, b, c, d, u, v; inserting u → v invalidates the topological order. Each row below is one slide: the active call, the sets shown for it, the node labels in the order they appear in the figure, and the action taken.]
Slide | Call         | Set A | Set B | Node labels | Action
10    | Reorder(u,v) | v, a  | c, u  | a b c d u v |
11    | Reorder(c,a) | Ø     | Ø     | a b c d u v |
12    | Reorder(c,a) | Ø     | Ø     | c b a d u v | Swap!
13    | Reorder(u,v) | v, a  | c, u  | c b a d u v |
14    | Reorder(u,a) | a, b  | u     | c b a d u v |
15    | Reorder(u,b) | Ø     | Ø     | c b a d u v |
16    | Reorder(u,b) | Ø     | Ø     | c u a d b v | Swap!
17    | Reorder(u,a) | a, b  | u     | c u a d b v |
18    | Reorder(u,a) | Ø     | Ø     | c u a d b v |
19    | Reorder(u,a) | Ø     | Ø     | c a u d b v | Swap!
20    | Reorder(u,v) | v, a  | c, u  | c a u d b v |
21    | Reorder(c,v) | Ø     | Ø     | c a u d b v |
22    | Reorder(c,v) | Ø     | Ø     | v a u d b c | Swap!
23    | Reorder(u,v) | v, a  | c, u  | v a u d b c |
24    | Reorder(u,v) | Ø     | Ø     | v a u d b c |
25    | Reorder(u,v) | Ø     | Ø     | u a v d b c | Swap!
26    | Reorder(u,v) | Ø     | Ø     | u a v d b c | Done!
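The walkthrough above can be replayed with a short sketch. This is a reconstruction under stated assumptions, not the pseudocode from the original slide: the class and method names are hypothetical, adjacency is kept in plain sets instead of buckets, and the loop condition is read as u' ≥ v' evaluated against the current order. The starting order v, a, c, d, b, u and the edges v → a, a → b, c → u are also assumptions, chosen because they reproduce the sequence of calls and swaps in the trace.

```python
class CycleDetected(Exception):
    """Raised when an inserted edge would close a cycle."""


class OnlineTopoOrder:
    def __init__(self, nodes):
        self.order = list(nodes)                             # T^{-1}: position -> node
        self.pos = {x: i for i, x in enumerate(self.order)}  # T: node -> position
        self.succ = {x: set() for x in self.order}           # outgoing edges
        self.pred = {x: set() for x in self.order}           # incoming edges

    def insert(self, u, v):
        """Insert edge u -> v, repairing the order first if v <= u."""
        if self.pos[v] <= self.pos[u]:
            self._reorder(u, v)
        self.succ[u].add(v)
        self.pred[v].add(u)

    def _swap(self, u, v):
        pu, pv = self.pos[u], self.pos[v]
        self.order[pu], self.order[pv] = v, u
        self.pos[u], self.pos[v] = pv, pu

    def _reorder(self, u, v):
        if u == v:
            raise CycleDetected(f"cycle through {u}")
        pos = self.pos
        # A: successors of v at or before u, in decreasing order.
        A = sorted((w for w in self.succ[v] if pos[w] <= pos[u]),
                   key=pos.get, reverse=True)
        # B: predecessors of u at or after v, in increasing order.
        B = sorted((w for w in self.pred[u] if pos[v] <= pos[w]), key=pos.get)
        if not A and not B:
            self._swap(u, v)
            return
        for v2 in A + [v]:              # {v} ∪ A, decreasing (v comes last)
            for u2 in B + [u]:          # B ∪ {u}, increasing (u comes last)
                if pos[u2] >= pos[v2]:  # assumption: tested on current positions
                    self._reorder(u2, v2)


g = OnlineTopoOrder(["v", "a", "c", "d", "b", "u"])
for x, y in [("v", "a"), ("a", "b"), ("c", "u")]:
    g.insert(x, y)
g.insert("u", "v")
print(g.order)  # ['c', 'u', 'v', 'd', 'a', 'b']
```

With this input the swaps happen in the order (c,a), (u,b), (u,a), (c,v), (u,v), matching the trace. Inserting a → u afterwards raises `CycleDetected`, since u → v → a already exists.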
27Data structures
- Store T and T^{-1} as arrays
- O(1) lookup for topological order and inverse
- Graph stored as array of vertices, where each vertex has two adjacency lists (for incoming/outgoing edges)
- Each adjacency list stored as array of buckets
- Each bucket contains at most t nodes for a fixed t
- i-th bucket of node u contains all adjacent nodes v with i·t < d(u,v) ≤ (i+1)·t
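The bucket rule can be illustrated with a few lines of code. This is a sketch only: it assumes the rule reads i·t < d(u,v) ≤ (i+1)·t with buckets numbered from 0, and the helper names `bucket_index` and `build_buckets` are hypothetical.

```python
def bucket_index(d, t):
    """Index i with i*t < d <= (i+1)*t, for a distance d >= 1."""
    return (d - 1) // t


def build_buckets(u, neighbours, T, t):
    """Group the neighbours of u by their distance from u in the order T."""
    buckets = {}
    for v in neighbours:
        d = abs(T[u] - T[v])
        buckets.setdefault(bucket_index(d, t), set()).add(v)
    return buckets


# Nodes are named by their positions, so T is the identity here.
T = {x: x for x in range(1, 10)}
print(build_buckets(1, [2, 3, 4, 6, 9], T, t=3))
# {0: {2, 3, 4}, 1: {6}, 2: {9}}
```

Because the outgoing (or incoming) neighbours of u all lie on one side of u, their distances are distinct, so no bucket holds more than t nodes.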
28Data structures
- A bucket is any data structure with efficient support for the following operations
- Insert: insert an element into a given bucket
- Delete: given an element and a bucket, delete the element from the bucket (if found; otherwise, return 0)
- Collect-all: copy all elements from a given bucket to some vector
- Analysis assumes a generic bucket data structure and counts the number of bucket operations
- Later, we will consider different implementations of the data structure and corresponding running times/space usage
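As one possible concrete form of the three operations, here is a minimal hash-set-backed bucket. The class name `SetBucket` and its method names are assumptions for illustration; the slides deliberately leave the implementation open.

```python
class SetBucket:
    """A bucket backed by a Python set (expected O(1) insert and delete)."""

    def __init__(self):
        self._items = set()

    def insert(self, x):
        self._items.add(x)

    def delete(self, x):
        """Remove x if present; return 1 if it was found, otherwise 0."""
        if x in self._items:
            self._items.remove(x)
            return 1
        return 0

    def collect_all(self, out):
        """Copy every element of the bucket into the list `out`."""
        out.extend(self._items)


b = SetBucket()
b.insert("a")
b.insert("b")
print(b.delete("a"), b.delete("z"))  # 1 0
collected = []
b.collect_all(collected)
print(collected)                     # ['b']
```

The analysis only counts calls to these operations, so any structure with the same interface can be swapped in.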
29Correctness
- Theorem 1. Algorithm AFM returns a valid topological order after each edge insertion.
- Lemma 1. Given a DAG G and a valid topological order, if u ⇝ v and u ≤ v, then all subsequent calls to REORDER will maintain u ≤ v.
- Lemma 2. Given a DAG G with v ⇝ y and x ⇝ u, a call of REORDER(u,v) will ensure that x < y.
- Theorem 2. The algorithm detects a cycle iff there is a cycle in the given edge sequence.
30Correctness
- Theorem 1. Algorithm AFM returns a valid topological order after each edge insertion.
- Proof: use Lemmas 1 and 2.
- For graph with no edges, any ordering is a topological ordering
- Need to show that Insert(u,v) maintains correct topological order of G' = G ∪ {(u,v)}
- If u ≤ v, this is trivial; otherwise,
- Show that x ≤ y for all nodes x,y of G' with x ⇝ y. If there was a path x ⇝ y in G, Lemma 1 gives x ≤ y. Otherwise, x ⇝ y was introduced to G' by (u,v), and Lemma 2 gives x ≤ y in G' since there is x ⇝ u → v ⇝ y in G'.
31Correctness
- Lemma 1. Given a DAG G and a valid topological order, if u ⇝ v and u ≤ v, then all subsequent calls to Reorder will maintain u ≤ v.
- Proof by contradiction
- Consider the first call of Reorder that leads to u > v. Either this led to swapping u and w with w > v or swapping w and v with w < u. In the first case:
- Call was Reorder(w,u) and A = Ø
- However, ∃ x ∈ A for which u → x ⇝ v (since v is between u and w), leading to a contradiction
32Correctness
- Lemma 2. Given a DAG G with v ⇝ y and x ⇝ u, a call of Reorder(u,v) will ensure that x < y.
- Proof by induction on recursion depth of Reorder(u,v)
- For leaf nodes, A = B = Ø. If x ≤ y before, Lemma 1 ensures x ≤ y will continue; otherwise, x = u and y = v and swapping gives x ≤ y.
- Assume lemma is true up to a certain tree level (show this implies higher levels). If A ≠ Ø, there is a v' such that v → v' ⇝ y, otherwise v' = v = y. If B ≠ Ø, there is a u' such that x ⇝ u' → u, otherwise u' = u = x. Hence v' ⇝ y and x ⇝ u'.
- For loops will call Reorder(u',v'), which ensures x ≤ y by inductive hypothesis
- Lemma 1 ensures further calls to Reorder maintain x ≤ y
33Correctness
- Theorem 2. The algorithm detects a cycle iff there is a cycle in the given edge sequence.
- Proof (⇒)
- Within a call to Insert(u,v), there are paths v ⇝ v' and u' ⇝ u for each recursive call to Reorder(u',v')
- Trivial for first call and follows by definition of A and B for subsequent calls
- If algorithm detects a cycle in line 1, then we have v ⇝ v' = u' ⇝ u and adding u → v completes the cycle
34Correctness
- Theorem 2. The algorithm detects a cycle iff there is a cycle in the given edge sequence.
- Proof (⇐), by induction on number of nodes in path v ⇝ u
- Consider edge (u,v) of the cycle v ⇝ u → v inserted last. Since v ⇝ u before inserting this edge, Theorem 1 states that v ≤ u, so Reorder(u,v) will be called.
- Call of Reorder(u,v) with u = v or v → u clearly reports a cycle
- Consider path v → x ⇝ y → u of length k ≥ 2 and call to Reorder(u,v). Since v < x ≤ y < u before the call, x ∈ A and y ∈ B, so Reorder(y,x) will be called. The path x ⇝ y has k − 2 nodes, so call to Reorder will detect the cycle (by the inductive hypothesis).
35Algorithm AFM
36Running time
- Theorem 3. Online topological ordering can be computed using O(n^{3.5}/t) bucket inserts and deletes, O(n^3/t) bucket collect-all operations collecting O(n^2 t) elements, and O(n^{2.5} + n^2 t) operations for sorting.
- Lemma 4. Reorder is called O(n^2) times.
- Lemma 5. The summation of |A| + |B| over all calls of Reorder is O(n^2).
- Lemma 6. Calculating the sorted sets A and B over all calls of Reorder can be done by O(n^3/t) bucket collect-all operations touching a total of O(n^2 t) elements and O(n^{2.5} + n^2 t) operations for sorting these elements.
- Lemma 9. Updating the data structure over all calls of Reorder requires O(n^{3.5}/t) bucket inserts and deletes.
37Running time
- Theorem 3. Online topological ordering can be computed using O(n^{3.5}/t) bucket inserts and deletes, O(n^3/t) bucket collect-all operations collecting O(n^2 t) elements, and O(n^{2.5} + n^2 t) operations for sorting.
- Proof
- Use Lemmas 4, 6, and 9. Additionally, show that merging sets A and B (lines 6-7 in the algorithm) takes O(n^2) time
- Merging takes O(|A| + |B|), which is O(n^2) over all calls to Reorder by Lemma 5; finding vertices in B that exceed the chosen v' takes O(the number of those vertices), which is also the number of recursive calls to Reorder made. Lemma 4 says the latter value is O(n^2).
38Running time
- Lemma 4. Reorder is called O(n^2) times.
- Proof
- Consider the first time Reorder(u,v) is called. If A = B = Ø, then u and v are swapped. Otherwise, Reorder(u',v') is called recursively for all v' ∈ {v} ∪ A and u' ∈ B ∪ {u} with u' ≥ v'. The order in which recursive calls are made and the fact that Reorder is local (only touches the affected region) ensures that Reorder(u,v) is not called except as the last recursive call. In this second call to Reorder(u,v), A = B = Ø
- Consider all v' ∈ A and u' ∈ B from the first call of Reorder(u,v). Reorder(u,v') and Reorder(u',v) must have been called by the for loops before the second call to Reorder(u,v). Therefore, u < v' and u' < v for all v' ∈ A and u' ∈ B, so u and v are swapped during the second call.
- Reorder(u,v) will not be called again because u < v.
39Running time
- Lemma 9. Updating the data structure over all calls of REORDER requires O(n^{3.5}/t) bucket inserts and deletes.
- Proof: uses an LP
- Data structure requires O(d(u,v)·n/t) bucket inserts and deletes to swap two nodes u and v.
- Need to update adjacency lists of u and v and all w adjacent to u and/or v. If d(u,v) > t, build from scratch in O(n). Otherwise, can show that at most d(u,v) nodes need to transfer between any pair of consecutive buckets. This yields a bound of O(d(u,v)·n/t).
- Each node pair is swapped at most once (Lemma 7), so summing up over all calls of REORDER(u,v) where u and v are swapped, we need O(Σ d(u,v)·n/t) bucket inserts and deletes. Σ d(u,v) = O(n^{2.5}) by Lemma 8, so the result follows.
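The final step of the proof, written out as a single chain (the sums range over the calls Reorder(u,v) that end in a swap):

```latex
\[
\sum_{\text{swaps }(u,v)} O\!\left(\frac{d(u,v)\,n}{t}\right)
  = O\!\left(\frac{n}{t}\sum_{\text{swaps }(u,v)} d(u,v)\right)
  = O\!\left(\frac{n}{t}\cdot n^{2.5}\right)
  = O\!\left(\frac{n^{3.5}}{t}\right)
\]
```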
40Running time
- How to prove Σ d(u,v) = O(n^{2.5})?
- Use an LP
- Let T* denote the final topological ordering and X(T*(u),T*(v)) = d(u,v) if and when Reorder(u,v) leads to a swapping, 0 otherwise
- Model some linear constraints on X(i,j)
- 0 ≤ X(i,j) ≤ n for all i,j ∈ [1..n]
- X(i,j) = 0 for all j ≤ i
- Σ_{j>i} X(i,j) − Σ_{j<i} X(j,i) ≤ n for all 1 ≤ i ≤ n
- Over insertion of all edges, a node's net movement right and left in the topological ordering must be less than n
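Collected in one place, the constraints listed above suggest a linear program of the following shape. This is a sketch assembled from those constraints, not a copy of the LP on the next slide; in particular the objective (the total of all X(i,j), which equals Σ d(u,v) when each pair is swapped at most once) is an assumption.

```latex
\begin{align*}
\text{maximize}\quad   & \sum_{1 \le i < j \le n} X(i,j) \\
\text{subject to}\quad & 0 \le X(i,j) \le n && \text{for all } i,j \in [1..n] \\
                       & X(i,j) = 0 && \text{for all } j \le i \\
                       & \sum_{j > i} X(i,j) - \sum_{j < i} X(j,i) \le n && \text{for all } 1 \le i \le n
\end{align*}
```

Any upper bound on the optimum of such a program, for instance from a feasible solution of its dual, is also an upper bound on Σ d(u,v).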
41Running time
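- Yields the following LP: [LP not shown]
- And its dual: [dual not shown]
42Running time
- Which yields the following feasible solution: [solution not shown]
- This solution has a value of [value not shown], which gives the O(n^{2.5}) bound on Σ d(u,v)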
43Implementation of data structure
- Balanced binary tree gives O(1 + log τ) time insert and delete and O(1 + τ) collect-all, where τ is the number of elements in the bucket
- Total time is O(n^2 t + n^{3.5} log n / t) by Theorem 3. Setting t = n^{0.75} (log n)^{1/2}, we get a total time of O(n^{2.75} (log n)^{1/2}) and O(n^2) space
- n-bit array gives O(1) insert and delete and O(total output size + total number of deletes) collect-all operation
- Total time is O(n^2 t + n^{3.5}/t). Setting t = n^{0.75} gives O(n^{2.75}) time and O(n^{2.25}) space for O(n^2/t) buckets
- Uniform hashing is similar to n-bit array
- O(n^{2.75}) expected time and O(n^2) space
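The choices of t above come from balancing the two terms of each running time. A short worked derivation:

```latex
% n-bit array (and uniform hashing): balance n^2 t against n^{3.5}/t
\[
n^{2} t = \frac{n^{3.5}}{t}
\;\Longrightarrow\; t^{2} = n^{1.5}
\;\Longrightarrow\; t = n^{0.75},
\qquad n^{2} t = n^{2.75}.
\]
% Balanced binary tree: balance n^2 t against n^{3.5} \log n / t
\[
n^{2} t = \frac{n^{3.5}\log n}{t}
\;\Longrightarrow\; t = n^{0.75}(\log n)^{1/2},
\qquad n^{2} t = n^{2.75}(\log n)^{1/2}.
\]
% Space of the n-bit array variant: O(n^2/t) buckets of n bits each
\[
\frac{n^{2}}{t}\cdot n = \frac{n^{3}}{n^{0.75}} = n^{2.25}.
\]
```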
44Empirical comparison
- Compared against PK, MNR, and AHRSZ for the
following hard-case graph
45Empirical comparison
46Comparison to prior work
- No insight provided by Ajwani et al.
- Pearce and Kelly compare PK, AHRSZ, and MNR using incremental complexity analysis
- In dynamic problems, typically no fixed input captures the minimal amount of work to be performed
- Instead of complexity analysis based on input size, measure work in terms of a parameter δ representing the (minimal) change in input and output required
- For DTO problem, input is current DAG and topological order; output after an edge insertion is updated DAG and (any) valid ordering
- Algorithm is bounded if time complexity can be expressed only in terms of |δ|; otherwise, it is unbounded
47Comparison to prior work
- Runtime comparisons
- AHRSZ is bounded by |K_min|, the minimal cover of vertices that are incorrectly ordered after an edge insertion, plus adjacent edges
- PK is bounded by |δ_uv|, the set of vertices in the affected region which reach u or are reachable from v, plus adjacent edges; PK is worst-case optimal wrt number of vertices reordered
- MNR takes Θ(||δ_uv^F|| + |AR_uv|) in the incremental complexity model, where AR_uv is the set of vertices in the affected region
- |K_min| ≤ |δ_uv| ≤ |AR_uv|, so AHRSZ is strictly better than PK, but PK and MNR are more difficult to compare (former expected to outperform the latter on sparse graphs)
- KB analyzes a variant of AHRSZ
- AFM appears to improve the bound on the time to insert m edges for AHRSZ
48Comparison to prior work
- Intuitive comparison
- AHRSZ performs simultaneous forward and backward searches from u and v until the two frontiers meet; nodes with incorrect priorities are placed in a set and corrected using DFSs in this set
- MNR does a similar DFS to discover incorrect priorities, but visits all nodes in the affected region during reassignment
- PK is similar to MNR but reassigns priorities using only positions previously held by members of δ_uv
- KB and AFM appear to be improvements in the runtime analysis of variants of AHRSZ
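To make the description of PK concrete, here is a hedged sketch of an insertion in that style: search forward from v and backward from u inside the affected region, then hand the positions those nodes already held back to them in a valid order. The function names `pk_insert` and `_search`, the data layout, and the example graph are assumptions for illustration, not code from Pearce and Kelly.

```python
def _search(start, adj, inside):
    """Depth-first search from `start` along `adj`, restricted by `inside`."""
    seen, stack = {start}, [start]
    while stack:
        x = stack.pop()
        for w in adj[x]:
            if w not in seen and inside(w):
                seen.add(w)
                stack.append(w)
    return seen


def pk_insert(order, pos, succ, pred, u, v):
    """Insert edge u -> v; reorder only nodes that reach u or are reachable from v."""
    if u == v:
        raise ValueError("edge would create a cycle")
    lb, ub = pos[v], pos[u]
    if lb < ub:  # v currently precedes u, so the order is invalidated
        fwd = _search(v, succ, lambda w: pos[w] <= ub)  # reachable from v
        if u in fwd:
            raise ValueError("edge would create a cycle")
        bwd = _search(u, pred, lambda w: pos[w] >= lb)  # nodes reaching u
        # Nodes that reach u go first, then nodes reachable from v,
        # each group keeping its current relative order.
        nodes = sorted(bwd, key=pos.get) + sorted(fwd, key=pos.get)
        slots = sorted(pos[w] for w in nodes)  # reuse only their old positions
        for w, p in zip(nodes, slots):
            order[p] = w
            pos[w] = p
    succ[u].add(v)
    pred[v].add(u)


order = ["v", "a", "c", "d", "b", "u"]
pos = {x: i for i, x in enumerate(order)}
succ = {x: set() for x in order}
pred = {x: set() for x in order}
for x, y in [("v", "a"), ("a", "b"), ("c", "u")]:
    pk_insert(order, pos, succ, pred, x, y)
pk_insert(order, pos, succ, pred, "u", "v")
print(order)  # ['c', 'u', 'v', 'd', 'a', 'b']
```

Node d is neither reachable from v nor able to reach u, so it keeps its position even though it lies inside the affected region.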
49Comparison to prior work
- Practical implications
- PK and MNR use simpler data structures (arrays) than AHRSZ (priority queues and Dietz and Sleator ordered list structure)
- PK and MNR use simpler traversal algorithms than AHRSZ
- PK visits fewer nodes during reassignments
- Experiments run by Pearce and Kelly
- MNR performs poorly on sparse graphs, but is the most efficient on dense graphs
- PK performs well on very sparse/dense graphs, but not so well in between
- AHRSZ is relatively poor on sparse graphs, but has constant performance otherwise (competitive with the others)
50Open problems
- Only lower bound for the problem is Ω(n log n) for inserting n − 1 edges, by Ramalingam and Reps; better lower bounds?
- Reduce the (wide) gap between best known lower and upper bounds
- Answer: does the definition of δ for DTO need to include adjacent edges?
- Does the bounded complexity model capture the power of amortization?
- Include edge deletions in the analysis of AFM or any of the other algorithms
- Perform a theoretical and empirical analysis of a parallel version of AFM or any of the other algorithms
51Breaking news
- Kavitha and Mathew improve the upper bound to O(min{n^{2.5}, (m + n log n) m^{0.5}})
- Doesn't appear to be anything wildly unique about their algorithm
- Do a better job of keeping the sizes of sets δ_uv^F and δ_uv^B close to each other
52Thank you