Title: A sextic algorithm for website design
1A sextic algorithm for website design
- Brent Heeringa (heeringa_at_cs.umass.edu)
- (Joint work with Micah Adler)
- 21 October 2004
- Union College
2A website design problem (for example, a new kitchen store)
- Given products, their popularity, and their organization
- How do we create a good website?
- Navigation is natural
- Access to information is timely
3Good website: Natural Navigation
- Organization is a DAG
- TC (transitive closure) of DAG enumerates all viable categorical relationships and introduces shortcuts
- Subgraph of TC preserves logical relationship between categories
[Figure: a DAG on nodes A, B, C and its TC]
4Good website: Timely Access to Info
- Two obstacles to finding info quickly
- Time scanning a page for correct link
- Time descending the DAG
- Associate a cost with each obstacle
- Page cost (function of out-degree of node)
- Path cost (sum of page costs on path)
- Good access structure
- Minimize expected path cost
- Optimal subgraph is always a full tree
[Figure: example path to a leaf of weight 1/2. Page cost = # links; path cost = 3 + 2 = 5; weighted path cost = 5/2]
5Constrained Subtree Selection (CSS)
- An instance of CSS is a triple (G, γ, w)
- G is a rooted DAG with n leaves (constraint graph)
- γ is a function of the out-degree of each internal node (degree cost)
- w is a probability distribution over the n leaves (weights)
- A solution is any directed subtree of the transitive closure of G which includes the root and leaves
- An optimal solution is one which minimizes the expected path cost
[Figure: constraint graph with nodes A, B, C, D; four leaf weights of 1/4 each; γ(x) = x]
6Constrained Subtree Selection (CSS)
[Figure: the same graph with leaf weights 1/4. Weighted path costs shown so far: 3(1/4). γ(x) = x, Cost = 4]
7Constrained Subtree Selection (CSS)
[Figure: the same graph with leaf weights 1/4. Weighted path costs shown so far: 3(1/4), 5(1/4). γ(x) = x, Cost = 4]
8Constrained Subtree Selection (CSS)
[Figure: the same graph with leaf weights 1/4. Weighted path costs shown so far: 3(1/4), 5(1/4), 5(1/4). γ(x) = x, Cost = 4]
9Constrained Subtree Selection (CSS)
[Figure: the same graph with leaf weights 1/4. Weighted path costs: 3(1/4), 5(1/4), 5(1/4), 3(1/4). γ(x) = x, Cost = 4]
10Constrained Subtree Selection (CSS)
[Figure: the same graph with leaf weights 1/4]
Expected path cost: 1/4 (3 + 5 + 5 + 3) = 1/4 (16) = 4
γ(x) = x, Cost = 4
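The expected path cost on these slides can be checked with a few lines of Python. This is only a sketch: the nested-list tree encoding and the name `expected_path_cost` are illustrative choices, and the tree shape (a root of out-degree 3 with one child of out-degree 2) is an assumption chosen to reproduce the path costs 3, 5, 5, 3 shown above.

```python
from fractions import Fraction

def expected_path_cost(tree, gamma=lambda d: d, cost_so_far=0):
    """Sum over leaves of (leaf weight) * (sum of page costs on the path).

    An internal node is a list of children; a leaf is its weight.
    gamma maps an out-degree to a page cost (here gamma(x) = x by default).
    """
    if not isinstance(tree, list):
        return tree * cost_so_far              # leaf: weight times its path cost
    page_cost = gamma(len(tree))               # cost of scanning this page
    return sum(expected_path_cost(child, gamma, cost_so_far + page_cost)
               for child in tree)

q = Fraction(1, 4)
# Assumed tree shape: the root links to two leaves and one 2-link page.
example = [q, [q, q], q]
print(expected_path_cost(example))
```

With uniform weights the result matches the slide's total of 4; other weights or degree costs can be tried by changing the leaves or passing a different `gamma`.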
11Constrained Subtree Selection (CSS)
[Figure: the same graph with leaf weights 1/2, 1/6, 1/6, 1/6. γ(x) = x, Cost = 3 1/2]
12Constraint-Free Graphs and k-favorability
- Constraint-Free Graph
- Every directed, full tree with n leaves is a subtree of the TC
- CSS is no longer constrained by the graph
- k-favorable degree cost γ
- Fix γ. There exists k > 1 such that, for any constraint-free instance of CSS under γ, there is an optimal tree with maximal out-degree k
13Linear Degree Cost - γ(x) = x
- 3 paths w/ cost 5
- 2 paths w/ cost 4
- Unweighted path costs are all less, so weighted path costs must all be less
- Generalization to n > 6 paths is straightforward
14Linear Degree Cost - γ(x) = x
15Linear Degree Cost - γ(x) = x
[Figure: a leaf with weight > 1/2]
- Prefer binary structure when a leaf has at least half the mass
- Prefer ternary structure when mass is uniformly distributed
- CSS with 2-favorable degree costs and constraint-free (C.F.) graphs is the Huffman coding problem
- Examples: quadratic, exp, ceiling of log
16Results
- Complexity: NP-Complete for equal weights and many γ
- Sufficient condition on γ
- Hardness depends on constraint graph
- Highlighted Algorithm
- Theorem: O(n^6)-time DP algorithm
- γ(x) = x and G is constraint free
- Other results
- Characterizations of optimal trees for uniform probability distributions
- Theorem: poly-time constant-approximation
- γ ≥ 1 and k-favorable; G has constant out-degree
- Approximate Hotlink Assignment - Kranakis et al.
17Related Work
- Adaptive Websites: Perkowitz & Etzioni
- Challenge to the AI community
- Novel views of websites: page synthesis problem
- Hotlink Assignment: Kranakis, Krizanc, Shende, et al.
- Add 1 hotlink per page to minimize expected distance from root to leaves
- Recently: pages have cost proportional to their size
- Hotlinks don't change page cost
- Optimal Prefix-Free Codes: Golin & Rote
- Min code for n words with r symbols where symbol a_i has cost c_i
- Resembles CSS without a constraint graph
18Dynamic Programming Review
- Problems which exhibit
- Optimal substructure
- An optimal sol. may be written in terms of opt. solutions to subproblems
- Inductive definition
- Overlapping subproblems
- Different problem instances share subproblems
- Repeated computation
19Dynamic Programming Fib
0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, ...
Problem: What is the ith Fibonacci number?
- Optimal substructure (inductive definition)
- Fib(0) = 0, Fib(1) = 1, Fib(i) = Fib(i-1) + Fib(i-2)
- Overlapping subproblems
- Fib(7) = Fib(6) + Fib(5) (but Fib(6) calls Fib(5))
- We only need to calculate Fib(5) once
- Don't repeat computations
- Idea: Store solutions to subproblems in a table
20Dynamic Programming Fib
- General Approach
- Write inductive definition
- Range of parameters in definition defines table size
- Fill in table using definition
- Analysis: (Table size) × (# of lookups)
Fib(0) = 0, Fib(1) = 1, Fib(i) = Fib(i-1) + Fib(i-2)
Fib(14): 0 ≤ i ≤ 14
i      | 0  1  2  3  4  5  6  ...  12   13   14
Fib(i) | 0  1  1  2  3  5  8  ...  144  233  377
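The general approach (write the inductive definition, size the table from the parameter range, fill it in) can be sketched for Fib as follows. The function name `fib_table` is an illustrative choice, not something taken from the slides.

```python
def fib_table(n):
    """Fill a table with Fib(0..n) using Fib(i) = Fib(i-1) + Fib(i-2)."""
    table = [0] * (n + 1)              # table size: n + 1 cells
    if n >= 1:
        table[1] = 1
    for i in range(2, n + 1):
        # each cell needs 2 lookups, so the total work is O(n)
        table[i] = table[i - 1] + table[i - 2]
    return table

print(fib_table(14))
```

Each Fib(i) is computed once and then read from the table, which is the "don't repeat computations" idea.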
21Dynamic Programming Subset Sum
- Subset Sum (SS): Given a set of n positive integers X = (x_1, ..., x_n) and a positive integer T, is there a subset of X which sums to T?
- Example: X = {2, 3, 5, 9, 10, 15, 17} and T = 28
22Dynamic Programming Subset Sum
- Yes: {2, 9, 17} and {3, 10, 15}
23Dynamic Programming Subset Sum
- Inductive definition
Let X_i = (x_1, ..., x_i) be the first i integers of X
SS(t, i) = TRUE if there is a subset of X_i which sums to t; FALSE otherwise
24Dynamic Programming Review
SS(0, i) = TRUE
SS(t, 0) = FALSE for t > 0
SS(t, i) = SS(t - x_i, i-1) OR SS(t, i-1)
- SS(t - x_i, i-1): the ith element is in the subset
- SS(t, i-1): the ith element is not in the subset
Parameter range: 0 ≤ t ≤ T, 0 ≤ i ≤ n
[Figure: T × n table with cell (t, i)]
- Table size: Tn
- Each cell (t, i) depends on 2 other cells
- O(Tn) time for SS
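A minimal sketch of the Subset Sum table, following the inductive definition SS(t, i) = SS(t - x_i, i-1) OR SS(t, i-1) with SS(0, i) = TRUE. The function name and the list-of-lists table layout are illustrative assumptions.

```python
def subset_sum(xs, target):
    """Return True if some subset of xs (positive integers) sums to target."""
    n = len(xs)
    # ss[t][i] is SS(t, i): can the first i integers reach the sum t?
    ss = [[False] * (n + 1) for _ in range(target + 1)]
    for i in range(n + 1):
        ss[0][i] = True                        # the empty subset sums to 0
    for i in range(1, n + 1):
        x = xs[i - 1]
        for t in range(1, target + 1):
            without_x = ss[t][i - 1]           # ith element not in the subset
            with_x = t >= x and ss[t - x][i - 1]   # ith element in the subset
            ss[t][i] = without_x or with_x
    return ss[target][n]

print(subset_sum([2, 3, 5, 9, 10, 15, 17], 28))
```

The table has (T + 1)(n + 1) cells and each cell reads two others, which gives the O(Tn) bound.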
25Lopsided Trees
- Recall: γ(x) = x (3-favorable) and G is constraint free
- Node level = path cost
- Adding an edge increases level
- Grow lopsided trees level by level
26Lopsided Trees
27Lopsided Trees
28Lopsided Trees
29Lopsided Trees
- We know exact cost of tree up to the current level i
- Exact cost of m leaves
- Remaining n-m leaves must have path cost at least i
30Lopsided Trees Cost
- Exact cost of C: 3 · (1/3) = 1
- Remaining mass up to level 4: (2/3) · 4 = 8/3
- Total: 1 + 8/3 = 11/3
31Lopsided Trees Cost
- Tree cost at Level 5 in terms of Tree cost at Level 4
- Add in the mass of remaining leaves
- Cost at Level 5
- No new leaves
- 11/3 + 2/3 = 13/3
- Cost updates don't depend on level
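The level-by-level cost update can be reproduced with exact fractions. This sketch assumes the setup from the example: one leaf (C, weight 1/3) fixed at level 3 and the remaining mass 2/3 still below the frontier; `truncated_cost` is a hypothetical helper name.

```python
from fractions import Fraction

def truncated_cost(placed, remaining_mass, level):
    """Cost of a tree grown to `level`.

    placed: (weight, level) pairs for leaves whose level is already fixed.
    remaining_mass: total weight of leaves still below the frontier, each
    charged `level` because its path cost is at least that much.
    """
    exact = sum(weight * lv for weight, lv in placed)
    return exact + remaining_mass * level

placed = [(Fraction(1, 3), 3)]         # leaf C at level 3
remaining = Fraction(2, 3)
cost4 = truncated_cost(placed, remaining, 4)
cost5 = truncated_cost(placed, remaining, 5)
print(cost4, cost5, cost5 - cost4)
```

The difference between consecutive levels is just the remaining mass, which is why the update does not depend on the level itself.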
32Lopsided Trees
33Lopsided Trees
34Lopsided Trees
- Equality on trees
- Equal number of leaves at or above frontier
- Equal number of leaves at each relative level below frontier
- Nodes have out-degree at most 3
- Node below frontier: γ(3) = 3
- (m; l1, l2, l3) signature
- Example: Signature (2; 3, 2, 0)
- 2: C and F are leaves
- 3: G, H, I are 1 level past the frontier
- 2: J and K are 2 levels past the frontier
- Signature if F is interior node with 3 children?
35Inductive Definition
- Let CSS(m, l1, l2, l3) = min cost tree with sig (m; l1, l2, l3)
- Can we define CSS(m, l1, l2, l3) in terms of optimal solutions to subproblems?
- Which trees, when grown by one level, have sig (m; l1, l2, l3)?
- Which parent sigs (m'; l1', l2', l3') lead to the child sigs (m; l1, l2, l3)?
36Different Signatures
(2; 2, 0, 0)
(0; 4, 0, 0)
37Same Signature (2; 0, 2, 3)
Different signatures lead to (2; 0, 2, 3)
38The other direction (which signatures can a tree grow?)
Sig (0; 2, 0, 0)
- Growing a tree only affects frontier
- Only l1 affects next level
- Choose # of leaves
- The remaining nodes are internal
- Choose # of degree-2 nodes (d2)
- Remaining nodes are degree-3 (d3)
- O(n^2) choices
Sig (1; 0, 0, 3)
39The original question(warning here be symbols)
- Which (m'; l1', l2', l3') → (m; l1, l2, l3)?
- (m'; l1', l2', l3') is the PARENT; (m; l1, l2, l3) is the CHILD
40The original question(warning here be symbols)
- Which (m'; l1', l2', l3') → (m; l1, l2, l3)?
- Suppose we know
- l1' (the # of nodes one level below the frontier)
- d2 (the # of the l1' nodes which are degree-2 interior nodes in (m; l1, l2, l3))
- Let's determine the values of the remaining variables
[Figure: tree with levels 1, 2, 3 marked; the l1' nodes and the d2 nodes are labeled]
41The original question(warning here be symbols)
m = m' + l1' - d2 - d3
- m': the old number of leaves
- l1': nodes at one level below the frontier
- d2: internal nodes of degree 2
- d3: internal nodes of degree 3
- m: the new number of leaves
42The original question(warning here be symbols)
m = m' + l1' - d2 - l3/3
- m': the old number of leaves
- l1': nodes at one level below the frontier
- d2: internal nodes of degree 2
- l3/3: internal nodes of degree 3
- m: the new number of leaves
43The original question(warning here be symbols)
l1 = l2'
- l2': the old number of nodes at 2 levels below the frontier
- l1: new nodes one level below the frontier
44The original question(warning here be symbols)
l2 = l3' + 2 d2
- l2: the new number of nodes 2 levels below the frontier
- d2 nodes are binary so they contribute 2 d2 to the frontier
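The growth relations can be collected into one small function. In this sketch a parent signature (m', l1', l2', l3') is grown by one level, with d2 and d3 of its l1' frontier nodes becoming internal nodes of degree 2 and 3; the relations used are m = m' + l1' - d2 - d3, l1 = l2', l2 = l3' + 2 d2 and l3 = 3 d3. The names `grow` and `children` are illustrative, and reading l3 = 3 d3 off the l3/3 term is an interpretation of the slides.

```python
def grow(parent, d2, d3):
    """Child signature after growing `parent` by one level."""
    m, l1, l2, l3 = parent
    if d2 < 0 or d3 < 0 or d2 + d3 > l1:
        raise ValueError("d2 + d3 cannot exceed the l1 frontier nodes")
    # frontier nodes that are not internal become leaves
    return (m + l1 - d2 - d3, l2, l3 + 2 * d2, 3 * d3)

def children(parent):
    """All signatures reachable from `parent` in one growth step."""
    l1 = parent[1]
    return {grow(parent, d2, d3)
            for d2 in range(l1 + 1)
            for d3 in range(l1 - d2 + 1)}

print(grow((0, 2, 0, 0), d2=0, d3=1))
```

Growing (0; 2, 0, 0) with one leaf and one degree-3 node gives (1; 0, 0, 3), the pair of signatures shown on the earlier slide; the O(n^2) choices correspond to the two loops in `children`.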
45The original question(warning here be symbols)
- Which (m'; l1', l2', l3') → (m; l1, l2, l3)?
- l1' and d2 are sufficient
- l1' and d2 are both O(n)
- O(n^2) possibilities for (m'; l1', l2', l3')
- CSS(m, l1, l2, l3) = min cost tree with sig. (m; l1, l2, l3)
- CSS(m, l1, l2, l3) = min { CSS(m', l1', l2', l3') } + c_m for 1 ≤ d2 ≤ l1' ≤ n
- (c_m is the sum of the smallest n-m weights)
- CSS(n, 0, 0, 0) = cost of optimal tree
- Analysis
- Table size: O(n^4)
- Each cell takes O(n^2) lookups
- O(n^6) algorithm
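A compact sketch of the whole DP for γ(x) = x on a constraint-free instance is given below. It is an illustration under stated assumptions, not the authors' implementation: it grows signatures forward level by level in a dictionary instead of filling an explicit O(n^4) table, it starts from a hypothetical signature (0; 1, 0, 0) in which the root sits one level below the frontier, it assumes (per the 3-favorable claim) that out-degrees 2 and 3 suffice, and it gives heavier weights to shallower leaves. The name `css_linear` is made up for this example.

```python
from fractions import Fraction
from itertools import accumulate

def css_linear(weights):
    """Minimum expected path cost for gamma(x) = x, constraint-free G.

    `weights` must be non-empty.
    """
    n = len(weights)
    w = sorted(weights, reverse=True)          # shallow leaves get heavy weights
    prefix = [0] + list(accumulate(w))
    # c[m]: sum of the n - m smallest weights (leaves still below the frontier)
    c = [prefix[n] - prefix[m] for m in range(n + 1)]

    current = {(0, 1, 0, 0): 0}                # assumed start: root below frontier
    for _ in range(3 * n):                     # no leaf is deeper than 3(n - 1)
        grown = {}
        for (m, l1, l2, l3), cost in current.items():
            for d2 in range(l1 + 1):           # frontier nodes of degree 2
                for d3 in range(l1 - d2 + 1):  # frontier nodes of degree 3
                    child = (m + l1 - d2 - d3, l2, l3 + 2 * d2, 3 * d3)
                    if sum(child) > n:         # would need more than n leaves
                        continue
                    new_cost = cost + c[child[0]]
                    if child not in grown or new_cost < grown[child]:
                        grown[child] = new_cost
        current = grown
    return current[(n, 0, 0, 0)]

print(css_linear([Fraction(1, 2)] + [Fraction(1, 6)] * 3))
```

The two inner loops are the O(n^2) choices of d2 and d3 per signature; the pruning test relies on every node at or below the frontier leading to at least one distinct leaf.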
46Some Observations
- Generalize algorithm
- Theorem: O(n^(γ(k)+k))-time DP algorithm
- γ is positive, integer-valued, non-decreasing, k-favorable and G is constraint free
- Signatures: (γ(k)+1)-vectors
- Table size: O(n^(γ(k)+1))
- Each cell requires O(n^(k-1)) lookups
50(extra slides follow)
51Motivation and Lower Bound
- Many constraint graphs have constant out-degree
- Remains NP-Hard for many degree costs
- Lemma 1: H(w)/log(k) is a lower bound on the cost of an optimal tree
- For any k-favorable degree cost γ, with γ ≥ 1
- G is constraint-free
[Figure: trees T with every node cost set to 1; C(T) is compared with c(T), and c(T) ≥ H(w)/log(k) (Shannon)]
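The bound in Lemma 1 is easy to evaluate numerically. The sketch below computes H(w)/log(k) with base-2 logarithms for both terms (the ratio is the same in any base); the function name is an illustrative choice.

```python
import math

def entropy_lower_bound(weights, k):
    """H(w) / log(k): lower bound on the cost of an optimal tree (Lemma 1)."""
    entropy = -sum(w * math.log2(w) for w in weights if w > 0)
    return entropy / math.log2(k)

# Four equally likely leaves: H(w) = 2 bits.
print(entropy_lower_bound([0.25] * 4, k=2))
print(entropy_lower_bound([0.25] * 4, k=4))
```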
52A Simple Lemma
- Lemma 2: For any tree with m weighted nodes there exists 1 node (splitter) which, when removed, divides the tree into subtrees with at most half the weight of the original tree.
[Figure: a splitter node; each subtree left after its removal is labeled < 1/2]
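One way to find a splitter is to walk down from the root into any child whose subtree holds more than half the total weight, stopping when no such child exists. This is a sketch of that standard idea, not necessarily the procedure behind Lemma 2; the dictionary encoding of the tree and the name `find_splitter` are assumptions.

```python
def find_splitter(children, weight, root):
    """Return a node whose removal leaves parts of weight <= half the total.

    children: dict mapping a node to a list of its children.
    weight:   dict mapping a node to its weight (missing nodes weigh 0).
    """
    subtree = {}

    def total(v):
        subtree[v] = weight.get(v, 0) + sum(total(c) for c in children.get(v, []))
        return subtree[v]

    whole = total(root)
    v = root
    while True:
        heavy = [c for c in children.get(v, []) if 2 * subtree[c] > whole]
        if not heavy:
            return v                   # every child subtree is at most half
        v = heavy[0]                   # at most one child can exceed half

tree = {"r": ["a", "b"], "a": ["c", "d"]}
mass = {"b": 1, "c": 2, "d": 2}
print(find_splitter(tree, mass, "r"))
```

When the walk stops at v, each child subtree weighs at most half, and the part above v also weighs at most half because v's own subtree was heavier than half when the walk entered it.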
53Approximation Algorithm
- Let G be a DAG where out-degree of every node ≤ d
- Choose a spanning tree T from G
- Balance-Tree(T)
- Find a splitter node in T (Lemma 2)
- Stop if splitter is child of root
- Disconnect the splitter and reconnect it to the root
- Root has degree at most d+1
- Call Balance-Tree on all subtrees
[Figure: splitter reconnected to the root; mass of each subtree is at most half of whole tree]
54Approximation Algorithm
- Analysis
- Mass under any node is at most half of mass under its grandparent
- Path length to leaf with weight w_i is at most -2 log(w_i)
- Theorem
- O(m)-time O(log(k) γ(d+1))-approx to optimal solution
- For any DAG G with m nodes and out-degree ≤ d
- For every k-favorable degree cost γ ≥ 1
[Equation labels: upper bound on node cost; weighted path length]
55Open Problems
- Theorem: There is an for any instance (G, γ, w) of CSS where G is constraint free, γ is k-favorable, maps the positive integers to the positive integers and is non-decreasing
NO
- Proof
- c(T) c(T) c(T) H(w)/log(k)
- T is optimal tree for CSS cost c
- T is optimal tree for OPC cost c for k symbols each with weight 1 (i.e. γ(x) = 1)
- H is entropy