Online Topological Ordering - PowerPoint PPT Presentation

Slides: 53

Provided by: sid82

Learn more at: https://www.cs.princeton.edu

Transcript and Presenter's Notes

1
Online Topological Ordering

Siddhartha Sen, COS 518
11/20/2007

2
Outline

Problem statement and motivation
Prior work (summary)
Result by Ajwani et al.
Algorithm
Correctness
Running time
Implementation
Comparison to prior work
Incremental complexity analysis
Practical implications
Open problems
Breaking news

3
Problem statement

Offline or static version (STO)
Given a DAG G = (V,E) (with n = |V| and m = |E|), find a linear ordering T of its nodes such that for all directed paths from x ∈ V to y ∈ V (x ≠ y), T(x) < T(y), where T: V → [1..n] is a bijective mapping
Online version (DTO)
Edges of G are not known beforehand, but are revealed one by one
Each time an edge is added to the graph, T must be updated
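
As a point of reference for the static version, here is a minimal sketch of an offline topological sort (Kahn's algorithm, which is not named on the slides) together with a validity check for a given T. The function names and the dict-based representation are illustrative assumptions. Checking T(x) < T(y) on every edge is enough, because the path condition then follows by transitivity.

```python
from collections import deque


def static_topological_order(nodes, edges):
    """Return T: node -> position in 1..n, or raise if the graph has a cycle."""
    succ = {v: [] for v in nodes}
    indegree = {v: 0 for v in nodes}
    for x, y in edges:
        succ[x].append(y)
        indegree[y] += 1

    ready = deque(v for v in nodes if indegree[v] == 0)
    T = {}
    while ready:
        x = ready.popleft()
        T[x] = len(T) + 1              # next free position, starting at 1
        for y in succ[x]:
            indegree[y] -= 1
            if indegree[y] == 0:
                ready.append(y)

    if len(T) != len(nodes):           # some node never became ready
        raise ValueError("graph has a cycle")
    return T


def is_valid_order(T, edges):
    """True iff T(x) < T(y) holds for every edge x -> y."""
    return all(T[x] < T[y] for x, y in edges)
```

In the online version, the question is how to repair T cheaply after each insertion instead of rerunning a computation like this from scratch.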

4
Problem statement
[Figure: nodes a, b, c, d, u, v arranged in topological order. Inserting the edge u → v invalidates the topological order; the affected region is marked.]
5
Motivation

Traditional applications
Online cycle detection in pointer analysis
Incremental evaluation of computational circuits
Semantic checking by structure-based editors
Maintaining dependences between modules during
compilation
Other applications
Scheduling jobs in grid computing systems, where
dependences arise between the subtasks of a job

6
Prior work (summary)

[Table of running-time bounds; the bound expressions are not available in this transcript.]
Offline problem: per edge; for m edges
Alpern et al. (AHRSZ, '90): per edge
Marchetti-Spaccamela et al. (MNR, '96): per edge (amortized); for m edges
Pearce and Kelly (PK, '04): per edge
Katriel and Bodlaender (KB, '05): per edge (amortized); for m edges
Callout on slide: incremental complexity analysis

7
Ajwani et al. (AFM)

Contributions
Solves DTO in O(n^2.75) time, regardless of the number of edges m inserted
Uses a generic bucket data structure with efficient support for insert, delete, collect-all
Analysis based on a tunable parameter t = max number of nodes in each bucket
Shortcomings
Poor discussion of motivating applications
No insight into how the algorithm works or achieves its running time
No intuitive comparison with prior algorithms (AHRSZ, MNR, etc.)

8
Notation

d(u,v) denotes |T(u) − T(v)|
u < v is shorthand for T(u) < T(v)
u → v denotes an edge from u to v
u ⇝ v means v is reachable from u

9
Algorithm AFM
10-26
Algorithm AFM
[Animation, slides 10-26: worked example. Inserting u → v invalidates the topological order of the nodes a, b, c, d, u, v. Each frame shows the active call to Reorder with its Set A and Set B; the recursion-depth indicator was graphical. "Node labels" lists the labels in the order they appear on each frame.]

Slide  Call          Set A  Set B  Event  Node labels
10     Reorder(u,v)  v, a   c, u          a b c d u v
11     Reorder(c,a)  Ø      Ø             a b c d u v
12     Reorder(c,a)  Ø      Ø      Swap!  c b a d u v
13     Reorder(u,v)  v, a   c, u          c b a d u v
14     Reorder(u,a)  a, b   u             c b a d u v
15     Reorder(u,b)  Ø      Ø             c b a d u v
16     Reorder(u,b)  Ø      Ø      Swap!  c u a d b v
17     Reorder(u,a)  a, b   u             c u a d b v
18     Reorder(u,a)  Ø      Ø             c u a d b v
19     Reorder(u,a)  Ø      Ø      Swap!  c a u d b v
20     Reorder(u,v)  v, a   c, u          c a u d b v
21     Reorder(c,v)  Ø      Ø             c a u d b v
22     Reorder(c,v)  Ø      Ø      Swap!  v a u d b c
23     Reorder(u,v)  v, a   c, u          v a u d b c
24     Reorder(u,v)  Ø      Ø             v a u d b c
25     Reorder(u,v)  Ø      Ø      Swap!  u a v d b c
26     Reorder(u,v)  Ø      Ø      Done!  u a v d b c
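
The sequence of calls in this example can be reproduced with a small sketch of the recursive Reorder procedure. This is an illustration under stated assumptions, not the authors' implementation: it uses plain Python sets instead of buckets, the class and method names are made up, the cycle test (u = v or an existing edge v → u) follows the wording of the Theorem 2 proof, and the order comparison in the loops is evaluated against the current positions. Here A and B are the direct successors of v and direct predecessors of u inside the affected region.

```python
class CycleError(Exception):
    """Raised when an inserted edge would close a cycle."""


class OnlineTopologicalOrder:
    def __init__(self, nodes):
        self.order = list(nodes)                              # T^-1: position -> node
        self.pos = {v: i for i, v in enumerate(self.order)}   # T: node -> position
        self.succ = {v: set() for v in self.order}
        self.pred = {v: set() for v in self.order}
        self.trace = []                                       # every call to _reorder

    def insert(self, u, v):
        """Insert edge u -> v, repairing the order first if it is violated."""
        if self.pos[v] <= self.pos[u]:
            self._reorder(u, v)
        self.succ[u].add(v)
        self.pred[v].add(u)

    def _swap(self, u, v):
        pu, pv = self.pos[u], self.pos[v]
        self.order[pu], self.order[pv] = v, u
        self.pos[u], self.pos[v] = pv, pu

    def _reorder(self, u, v):
        self.trace.append((u, v))
        if u == v or u in self.succ[v]:
            raise CycleError(f"edge into {v!r} closes a cycle through {u!r}")
        pos = self.pos
        # Successors of v and predecessors of u that lie between v and u.
        A = [w for w in self.succ[v] if pos[w] <= pos[u]]
        B = [w for w in self.pred[u] if pos[w] >= pos[v]]
        if not A and not B:
            self._swap(u, v)
            return
        vs = sorted([v] + A, key=pos.__getitem__, reverse=True)  # decreasing order
        us = sorted(B + [u], key=pos.__getitem__)                # increasing order
        for v2 in vs:
            for u2 in us:
                if pos[u2] > pos[v2]:        # pair is still out of order
                    self._reorder(u2, v2)
```

With the assumed initial order v, a, c, b, d, u and the assumed edges v → a, a → b, c → u (chosen to be consistent with the sets A and B shown on the frames), inserting u → v produces exactly the call sequence of slides 10-26 and ends with the order c, u, v, a, d, b. If a CycleError is raised, the offending edge is not added; the order may have been permuted but is still valid for the existing edges.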
27
Data structures

Store T and T^-1 as arrays
O(1) lookup for topological order and inverse
Graph stored as array of vertices, where each vertex has two adjacency lists (for incoming/outgoing edges)
Each adjacency list stored as array of buckets
Each bucket contains at most t nodes for a fixed t
i-th bucket of node u contains all adjacent nodes v with i · t < d(u,v) ≤ (i+1) · t

28
Data structures

A bucket is any data structure with efficient support for the following operations:
Insert: insert an element into a given bucket
Delete: given an element and a bucket, delete the element from the bucket (if found; otherwise, return 0)
Collect-all: copy all elements from a given bucket to some vector
Analysis assumes a generic bucket data structure and counts the number of bucket operations
Later, we will consider different implementations of the data structure and the corresponding running times/space usage
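
To make the interface concrete, here is a minimal sketch of one possible bucket (backed by a Python set, in the spirit of the hashing variant discussed later) and of the mapping from a neighbour to its bucket. The names Bucket, bucket_index and build_out_buckets are illustrative assumptions, as is the convention that bucket i holds distances i·t + 1 through (i+1)·t, so that every distance d(u,v) ≥ 1 falls in exactly one bucket. Returning 1 on a successful delete is also an assumption; the slides only specify the return value 0 for a missing element.

```python
class Bucket:
    """Set-backed bucket supporting insert, delete and collect-all."""

    def __init__(self):
        self._items = set()

    def insert(self, x):
        self._items.add(x)

    def delete(self, x):
        """Remove x if present; return 1 if it was found, 0 otherwise."""
        if x in self._items:
            self._items.remove(x)
            return 1
        return 0

    def collect_all(self, out):
        """Copy all elements of this bucket into the list `out`."""
        out.extend(self._items)


def bucket_index(pos, u, v, t):
    """Bucket of u that holds neighbour v, with d(u,v) = |T(u) - T(v)| >= 1."""
    d = abs(pos[u] - pos[v])
    return (d - 1) // t


def build_out_buckets(pos, succ, u, t):
    """Array of buckets for the outgoing adjacency list of u."""
    n = len(pos)
    buckets = [Bucket() for _ in range((n + t - 1) // t)]   # ceil(n / t) buckets
    for v in succ[u]:
        buckets[bucket_index(pos, u, v, t)].insert(v)
    return buckets
```

With this layout, the neighbours of u within distance t (the candidates for the set A or B of a short-range Reorder call) sit in bucket 0 and can be gathered with a single collect-all.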

29
Correctness

Theorem 1. Algorithm AFM returns a valid
topological order after each edge insertion.
Lemma 1. Given a DAG G and a valid topological order, if u ⇝ v and u ≤ v, then all subsequent calls to REORDER will maintain u ≤ v.
Lemma 2. Given a DAG G with v ⇝ y and x ⇝ u, a call of REORDER(u,v) will ensure that x < y.
Theorem 2. The algorithm detects a cycle iff
there is a cycle in the given edge sequence.

30
Correctness

Theorem 1. Algorithm AFM returns a valid
topological order after each edge insertion.
Proof: use Lemmas 1 and 2.
For a graph with no edges, any ordering is a topological ordering
Need to show that Insert(u,v) maintains a correct topological order of G' = G ∪ {(u,v)}
If u ≤ v, this is trivial; otherwise,
Show that x ≤ y for all nodes x,y of G' with x ⇝ y. If there was a path x ⇝ y in G, Lemma 1 gives x ≤ y. Otherwise, x ⇝ y was introduced to G' by (u,v), and Lemma 2 gives x ≤ y in G' since there is x ⇝ u → v ⇝ y in G'.

31
Correctness

Lemma 1. Given a DAG G and a valid topological order, if u ⇝ v and u ≤ v, then all subsequent calls to Reorder will maintain u ≤ v.
Proof by contradiction
Consider the first call of Reorder that leads to u > v. Either this led to swapping u and w with w ≥ v, or to swapping w and v with w ≤ u. In the first case:
Call was Reorder(w,u) and A = Ø
However, ∃ x ∈ A for which u → x ⇝ v (since v is between u and w), leading to a contradiction

32
Correctness

Lemma 2. Given a DAG G with v ⇝ y and x ⇝ u, a call of Reorder(u,v) will ensure that x < y.
Proof by induction on recursion depth of Reorder(u,v)
For leaf nodes, A = B = Ø. If x ≤ y before, Lemma 1 ensures x ≤ y will continue; otherwise, x = u and y = v and swapping gives x ≤ y.
Assume the lemma is true up to a certain tree level (show this implies higher levels). If A ≠ Ø, there is a v' such that v → v' ⇝ y, otherwise v' = v = y. If B ≠ Ø, there is a u' such that x ⇝ u' → u, otherwise u' = u = x. Hence v' ⇝ y ∧ x ⇝ u'.
For loops will call Reorder(u',v'), which ensures x ≤ y by the inductive hypothesis
Lemma 1 ensures further calls to Reorder maintain x ≤ y

33
Correctness

Theorem 2. The algorithm detects a cycle iff
there is a cycle in the given edge sequence.
Proof (⇒)
Within a call to Insert(u,v), there are paths v ⇝ v' and u' ⇝ u for each recursive call to Reorder(u',v')
Trivial for the first call, and follows by definition of A and B for subsequent calls
If the algorithm detects a cycle in line 1, then we have v ⇝ v' = u' ⇝ u, and adding u → v completes the cycle

34
Correctness

Theorem 2. The algorithm detects a cycle iff
there is a cycle in the given edge sequence.
Proof (⇐), by induction on the number of nodes in the path v ⇝ u
Consider the edge (u,v) of the cycle v ⇝ u → v inserted last. Since v ⇝ u before inserting this edge, Theorem 1 states that v ≤ u, so Reorder(u,v) will be called.
A call of Reorder(u,v) with u = v or v → u clearly reports a cycle
Consider a path v → x ⇝ y → u of length k ≥ 2 and a call to Reorder(u,v). Since v → x ⇝ y → u before the call, x ∈ A and y ∈ B, so Reorder(y,x) will be called. The path x ⇝ y has k − 2 nodes, so the call to Reorder will detect the cycle (by the inductive hypothesis).

35
Algorithm AFM
36
Running time

Theorem 3. Online topological ordering can be computed using O(n^3.5/t) bucket inserts and deletes, O(n^3/t) bucket collect-all operations collecting O(n^2 t) elements, and O(n^2.5 + n^2 t) operations for sorting.
Lemma 4. Reorder is called O(n^2) times.
Lemma 5. The summation of |A| + |B| over all calls of Reorder is O(n^2).
Lemma 6. Calculating the sorted sets A and B over all calls of Reorder can be done by O(n^3/t) bucket collect-all operations touching a total of O(n^2 t) elements and O(n^2.5 + n^2 t) operations for sorting these elements.
Lemma 9. Updating the data structure over all calls of Reorder requires O(n^3.5/t) bucket inserts and deletes.

37
Running time

Theorem 3. Online topological ordering can be computed using O(n^3.5/t) bucket inserts and deletes, O(n^3/t) bucket collect-all operations collecting O(n^2 t) elements, and O(n^2.5 + n^2 t) operations for sorting.
Proof
Use Lemmas 4, 6, and 9. Additionally, show that merging sets A and B (lines 6-7 in the algorithm) takes O(n^2) time
Merging takes O(|A| + |B|), which is O(n^2) over all calls to Reorder by Lemma 5. Finding the vertices in B that exceed the chosen v' takes O(the number of those vertices), which is also the number of recursive calls to Reorder made. Lemma 4 says the latter value is O(n^2).

38
Running time

Lemma 4. Reorder is called O(n^2) times.
Proof
Consider the first time Reorder(u,v) is called. If A = B = Ø, then u and v are swapped. Otherwise, Reorder(u',v') is called recursively for all v' ∈ {v} ∪ A and u' ∈ B ∪ {u} with u' > v'. The order in which the recursive calls are made, and the fact that Reorder is local (it only touches the affected region), ensure that Reorder(u,v) is not called again except as the last recursive call. In this second call to Reorder(u,v), A = B = Ø
Consider all v' ∈ A and u' ∈ B from the first call of Reorder(u,v). Reorder(u,v') and Reorder(u',v) must have been called by the for loops before the second call to Reorder(u,v). Therefore, u ≤ v' and u' ≤ v for all v' ∈ A and u' ∈ B, so u and v are swapped during the second call.
Reorder(u,v) will not be called again because u ≤ v.

39
Running time

Lemma 9. Updating the data structure over all calls of REORDER requires O(n^3.5/t) bucket inserts and deletes.
Proof: use LP
The data structure requires O(d(u,v) n/t) bucket inserts and deletes to swap two nodes u and v. Need to update the adjacency lists of u and v and of all w adjacent to u and/or v. If d(u,v) ≥ t, build from scratch in O(n). Otherwise, one can show that at most d(u,v) nodes need to transfer between any pair of consecutive buckets. This yields a bound of O(d(u,v) n/t).
Each node pair is swapped at most once (Lemma 7), so summing over all calls of REORDER(u,v) where u and v are swapped, we need O(Σ d(u,v) n/t) bucket inserts and deletes. Σ d(u,v) = O(n^2.5) by Lemma 8, so the result follows.

40
Running time

How to prove Σ d(u,v) = O(n^2.5)?
Use an LP
Let T* denote the final topological ordering and define X(T*(u), T*(v)) = d(u,v) if and when Reorder(u,v) leads to a swapping, and 0 otherwise
Model some linear constraints on X(i,j):
0 ≤ X(i,j) ≤ n for all i,j ∈ [1..n]
X(i,j) = 0 for all j ≤ i
Σ_{j>i} X(i,j) − Σ_{j<i} X(j,i) ≤ n for all 1 ≤ i ≤ n
Over the insertion of all edges, a node's net movement right and left in the topological ordering must be less than n

41
Running time

Yields the following LP
And its dual

42
Running time

Which yields the following feasible solution
This solution has a value of

43
Implementation of data structure

Balanced binary tree gives O(1 + log τ) time insert and delete and O(1 + τ) collect-all, where τ is the number of elements in the bucket
Total time is O(n^2 t + n^3.5 log n / t) by Theorem 3. Setting t = n^0.75 (log n)^(1/2), we get a total time of O(n^2.75 (log n)^(1/2)) and O(n^2) space
n-bit array gives O(1) insert and delete and an O(total output size + total number of deletes) collect-all operation
Total time is O(n^2 t + n^3.5/t). Setting t = n^0.75 gives O(n^2.75) time and O(n^2.25) space for O(n^2/t) buckets
Uniform hashing is similar to the n-bit array: O(n^2.75) expected time and O(n^2) space
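
The choices of t follow from balancing the two terms of the total running time. The short derivation below is a worked illustration added for clarity, not taken from the slides; f and g are just names for the two cost expressions, and the space line assumes each of the O(n^2/t) buckets is an array of n bits.

```latex
\begin{align*}
\text{n-bit array:}\quad
  f(t) &= n^{2}t + \frac{n^{3.5}}{t}, \\
  n^{2}t = \frac{n^{3.5}}{t}
    &\iff t^{2} = n^{1.5} \iff t = n^{0.75}, \\
  f(n^{0.75}) &= 2\,n^{2.75} = O(n^{2.75}), \\
  \text{space} &= O\!\left(\frac{n^{2}}{t}\right)\cdot n
               = O\!\left(\frac{n^{3}}{n^{0.75}}\right) = O(n^{2.25}). \\[6pt]
\text{balanced tree:}\quad
  g(t) &= n^{2}t + \frac{n^{3.5}\log n}{t}, \\
  n^{2}t = \frac{n^{3.5}\log n}{t}
    &\iff t = n^{0.75}(\log n)^{1/2}, \\
  g\!\left(n^{0.75}(\log n)^{1/2}\right)
    &= 2\,n^{2.75}(\log n)^{1/2} = O\!\left(n^{2.75}(\log n)^{1/2}\right).
\end{align*}
```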

44
Empirical comparison

Compared against PK, MNR, and AHRSZ for the
following hard-case graph

45
Empirical comparison
46
Comparison to prior work

No insight provided by Ajwani et al.
Pearce and Kelly compare PK, AHRSZ, and MNR using
incremental complexity analysis
In dynamic problems, typically no fixed input
captures the minimal amount of work to be
performed
Instead of complexity analysis based on input size, measure work in terms of a parameter δ representing the (minimal) change in input and output required
For DTO problem, input is current DAG and
topological order, output after an edge insertion
is updated DAG and (any) valid ordering
An algorithm is bounded if its time complexity can be expressed only in terms of ‖δ‖; otherwise, it is unbounded

47
Comparison to prior work

Runtime comparisons
AHRSZ is bounded by ‖K_min‖, the minimal cover of vertices that are incorrectly ordered after an edge insertion, plus adjacent edges
PK is bounded by ‖δ_uv‖, the set of vertices in the affected region which reach u or are reachable from v, plus adjacent edges; PK is worst-case optimal with respect to the number of vertices reordered
MNR takes Θ(⟨⟨δ_uv^F⟩⟩ + AR_uv) in the incremental complexity model, where AR_uv is the set of vertices in the affected region
‖K_min‖ ≤ ‖δ_uv‖ ≤ ‖AR_uv‖, so AHRSZ is strictly better than PK, but PK and MNR are more difficult to compare (the former is expected to outperform the latter on sparse graphs)
KB analyzes a variant of AHRSZ
AFM appears to improve the bound on the time to
insert m edges for AHRSZ

48
Comparison to prior work

Intuitive comparison
AHRSZ performs simultaneous forward and backward searches from u and v until the two frontiers meet; nodes with incorrect priorities are placed in a set and corrected using DFSs in this set
MNR does a similar DFS to discover incorrect priorities, but visits all nodes in the affected region during reassignment
PK is similar to MNR but reassigns priorities using only positions previously held by members of δ_uv
KB and AFM appear to be improvements in the
runtime analysis of variants of AHRSZ
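
To illustrate the PK idea of reusing only the positions previously held by members of δ_uv, here is a minimal sketch in the spirit of Pearce and Kelly's algorithm. It is an illustration, not their implementation: the function name pk_insert, the dict/list representation and the use of ValueError are assumptions. A forward search from the head and a backward search from the tail of the new edge stay inside the affected region; the discovered nodes are then written back into the pool of positions they occupied, with the backward set first.

```python
def pk_insert(order, pos, succ, pred, x, y):
    """Insert edge x -> y, repairing `order` and `pos` in place."""
    if x == y:
        raise ValueError("self-loop closes a cycle")
    lb, ub = pos[y], pos[x]
    if lb < ub:                                    # order is violated
        # Forward search from y, restricted to positions below ub.
        delta_f, stack, seen = [], [y], {y}
        while stack:
            n = stack.pop()
            delta_f.append(n)
            for w in succ[n]:
                if pos[w] == ub:                   # reached x: y already reaches x
                    raise ValueError("edge closes a cycle")
                if w not in seen and pos[w] < ub:
                    seen.add(w)
                    stack.append(w)
        # Backward search from x, restricted to positions above lb.
        delta_b, stack, seen = [], [x], {x}
        while stack:
            n = stack.pop()
            delta_b.append(n)
            for w in pred[n]:
                if w not in seen and pos[w] > lb:
                    seen.add(w)
                    stack.append(w)
        # Reassign: nodes reaching x first, then nodes reachable from y,
        # each group in its old relative order, into the old positions.
        delta_b.sort(key=pos.__getitem__)
        delta_f.sort(key=pos.__getitem__)
        nodes = delta_b + delta_f
        slots = sorted(pos[n] for n in nodes)
        for n, p in zip(nodes, slots):
            pos[n] = p
            order[p] = n
    succ[x].add(y)
    pred[y].add(x)
```

Nodes in the affected region that are unrelated to the new edge keep their positions, which is the difference from MNR described above.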

49
Comparison to prior work

Practical implications
PK and MNR use simpler data structures (arrays) than AHRSZ (priority queues and the Dietz and Sleator ordered list structure)
PK and MNR use simpler traversal algorithms than
AHRSZ
PK visits fewer nodes during reassignments
Experiments run by Pearce and Kelly
MNR performs poorly on sparse graphs, but is the
most efficient on dense graphs
PK performs well on very sparse/dense graphs, but
not so well in between
AHRSZ is relatively poor on sparse graphs, but
has constant performance otherwise (competitive
with the others)

50
Open problems

The only lower bound for the problem is Ω(n log n) for inserting n − 1 edges, by Ramalingam and Reps; better lower bounds?
Reduce the (wide) gap between the best known lower and upper bounds
Answer: does the definition of δ for DTO need to include adjacent edges?
Does the bounded complexity model capture the
power of amortization?
Include edge deletions in the analysis of AFM or
any of the other algorithms
Perform a theoretical and empirical analysis of a
parallel version of AFM or any of the other
algorithms

51
Breaking news