A sextic algorithm for website design


Provided by: brenthe

Learn more at: https://www.cs.williams.edu

1
A sextic algorithm for website design

Brent Heeringa (heeringa_at_cs.umass.edu)
(Joint work with Micah Adler)
21 October 2004
Union College

2
A website design problem (for example, a new kitchen store)

Given products, their popularity, and their organization
How do we create a good website?
Navigation is natural
Access to information is timely

3
Good website: Natural Navigation

Organization is a DAG
The transitive closure (TC) of the DAG enumerates all viable categorical relationships and introduces shortcuts
A subgraph of the TC preserves the logical relationships between categories

(Figure: a graph on nodes A, B, C shown next to its transitive closure, TC)
4
Good website: Timely Access to Info

Two obstacles to finding info quickly
Time scanning a page for the correct link
Time descending the DAG
Associate a cost with each obstacle
Page cost (function of out-degree of node)
Path cost (sum of page costs on path)
Good access structure
Minimize expected path cost
Optimal subgraph is always a full tree

(Figure: a leaf of weight 1/2. Page cost = number of links; path cost = 3 + 2 = 5; weighted path cost = (1/2)(5) = 5/2)
5
Constrained Subtree Selection (CSS)

An instance of CSS is a triple (G, γ, w)
G is a rooted DAG with n leaves (constraint graph)
γ is a function of the out-degree of each internal node (degree cost)
w is a probability distribution over the n leaves (weights)
A solution is any directed subtree of the transitive closure of G which includes the root and leaves
An optimal solution is one which minimizes the expected path cost

(Figure: node labels C, B, D, A; four leaf weights of 1/4; γ(x) = x)
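
The objective defined on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the chosen subtree is given as a map from each internal node to its children, the degree cost defaults to the linear one, γ(x) = x, and the name `expected_path_cost`, the dict layout, and the node names are hypothetical rather than taken from the talk.

```python
from fractions import Fraction


def expected_path_cost(tree, root, weights, gamma=lambda degree: degree):
    """Sum over leaves of (leaf weight) * (sum of page costs on the root-to-leaf path).

    tree    -- maps each internal node to the list of its children (assumed layout)
    weights -- maps each leaf to its probability
    gamma   -- degree cost; defaults to the linear cost gamma(x) = x
    """
    total = 0
    stack = [(root, 0)]  # (node, path cost accumulated above this node)
    while stack:
        node, cost = stack.pop()
        children = tree.get(node, [])
        if not children:
            total += weights[node] * cost  # leaf: charge its weighted path cost
        else:
            page = gamma(len(children))    # page cost depends only on out-degree
            for child in children:
                stack.append((child, cost + page))
    return total


# Four equally likely leaves under a ternary root with one binary child:
# path costs are 3, 3, 5, 5, so the expected cost is (1/4)(3 + 3 + 5 + 5).
quarter = Fraction(1, 4)
tree = {"root": ["a", "b", "x"], "x": ["c", "d"]}
weights = {"a": quarter, "b": quarter, "c": quarter, "d": quarter}
print(expected_path_cost(tree, "root", weights))  # 4
```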
6
Constrained Subtree Selection (CSS)

(Figure: node labels C, B, D, A; four leaf weights of 1/4; weighted path costs shown so far: 3(1/4))
γ(x) = x, Cost = 4
7
Constrained Subtree Selection (CSS)

(Figure: node labels C, B, D, A; four leaf weights of 1/4; weighted path costs shown so far: 3(1/4), 5(1/4))
γ(x) = x, Cost = 4
8
Constrained Subtree Selection (CSS)

(Figure: node labels C, B, D, A; four leaf weights of 1/4; weighted path costs shown so far: 3(1/4), 5(1/4), 5(1/4))
γ(x) = x, Cost = 4
9
Constrained Subtree Selection (CSS)

(Figure: node labels C, B, D, A; four leaf weights of 1/4; weighted path costs: 3(1/4), 5(1/4), 5(1/4), 3(1/4))
γ(x) = x, Cost = 4
10
Constrained Subtree Selection (CSS)

(Figure: node labels C, B, D, A; four leaf weights of 1/4)
(1/4)(3 + 5 + 5 + 3) = (1/4)(16) = 4
γ(x) = x, Cost = 4
11
Constrained Subtree Selection (CSS)

(Figure: node labels C, B, D, A; leaf weights 1/2, 1/6, 1/6, 1/6)
γ(x) = x, Cost = 3 1/2
12
Constraint-Free Graphs and k-favorability

Constraint-Free Graph
Every directed, full tree with n leaves is a subtree of the TC
CSS is no longer constrained by the graph
k-favorable degree cost γ
Fix γ. There exists k > 1 such that, for any constraint-free instance of CSS under γ, an optimal tree has maximal out-degree k

13
Linear Degree Cost - γ(x) = x

5 paths w/ cost 5

3 paths w/ cost 5
2 paths w/ cost 4

Unweighted path costs are all less, so weighted path costs must all be less
Generalization to n > 6 paths is straightforward

14
Linear Degree Cost - γ(x) = x

4 paths w/ cost 4

4 paths w/ cost 4

15
Linear Degree Cost - γ(x) = x

(Figure: a leaf with weight > 1/2)
Prefer binary structure when a leaf has at least half the mass

Prefer ternary structure when mass is uniformly distributed

CSS with 2-favorable degree costs and constraint-free graphs is the Huffman coding problem
Examples: quadratic, exponential, ceiling of log
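
To make the Huffman connection concrete, here is a minimal sketch of the standard Huffman merge, which returns the minimum expected depth of a full binary tree over the given leaf weights. If every internal node of the optimal tree is binary, as the 2-favorable case suggests, each path cost would be γ(2) times the leaf depth, so the CSS cost would be γ(2) times this value; that scaling step is an inference, and the function name is hypothetical.

```python
import heapq
from fractions import Fraction


def huffman_expected_depth(weights):
    """Minimum expected leaf depth over full binary trees (classic Huffman merge)."""
    heap = list(weights)
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        a = heapq.heappop(heap)      # two lightest subtrees
        b = heapq.heappop(heap)
        total += a + b               # every leaf below the merge gets one level deeper
        heapq.heappush(heap, a + b)
    return total


print(huffman_expected_depth([Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]))  # 3/2
```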

16
Results

Complexity: NP-Complete for equal weights and many γ
Sufficient condition on γ
Hardness depends on constraint graph
Highlighted Algorithm
Theorem: O(n^6)-time DP algorithm
γ(x) = x and G is constraint free
Other results
Characterizations of optimal trees for uniform probability distributions
Theorem: poly-time constant-approximation
γ ≥ 1 and k-favorable; G has constant out-degree
Approximate Hotlink Assignment - Kranakis et al.

17
Related Work

Adaptive Websites (Perkowitz and Etzioni)
Challenge to the AI community
Novel views of websites; page synthesis problem
Hotlink Assignment (Kranakis, Krizanc, Shende, et al.)
Add 1 hotlink per page to minimize expected distance from root to leaves
Recently: pages have cost proportional to their size
Hotlinks don't change page cost
Optimal Prefix-Free Codes (Golin and Rote)
Min code for n words with r symbols where symbol a_i has cost c_i
Resembles CSS without a constraint graph

18
Dynamic Programming Review

Problems which exhibit:
Optimal substructure
An optimal solution may be written in terms of optimal solutions to subproblems
Inductive definition
Overlapping subproblems
Different problem instances share subproblems
Repeated computation

19
Dynamic Programming: Fib
0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, ...
Problem: What is the ith Fibonacci number?

Optimal substructure (inductive definition)
Overlapping subproblems
Fib(7) = Fib(6) + Fib(5) (but Fib(6) calls Fib(5))
We only need to calculate Fib(5) once
Don't repeat computations
Idea: Store solutions to subproblems in a table

Fib(0) = 0, Fib(1) = 1, Fib(i) = Fib(i-1) + Fib(i-2)
20
Dynamic Programming: Fib

General Approach
Write inductive definition
Range of parameters in definition defines table size
Fill in table using definition
Analysis: (Table size) × (# of lookups)

Fib(0) = 0, Fib(1) = 1, Fib(i) = Fib(i-1) + Fib(i-2)
Fib(14): 0 ≤ i ≤ 14

i:      0  1  2  3  4  5  6  ...  12   13   14
Fib(i): 0  1  1  2  3  5  8  ...  144  233  377
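
The table-filling recipe on this slide can be sketched as follows. The function name `fib_table` is an illustrative choice; the table has one cell per value of i and each cell needs two lookups, which gives the O(n) bound implied by (table size) × (# of lookups).

```python
def fib_table(n):
    """Fill the table Fib(0..n) bottom-up from the inductive definition."""
    table = [0] * (n + 1)          # Fib(0) = 0
    if n >= 1:
        table[1] = 1               # Fib(1) = 1
    for i in range(2, n + 1):
        table[i] = table[i - 1] + table[i - 2]   # two lookups per cell
    return table


print(fib_table(14)[14])  # 377
```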
21
Dynamic Programming: Subset Sum

Subset Sum (SS): Given a set of n positive integers X = (x_1, ..., x_n) and a positive integer T, is there a subset of X which sums to T?

Example: X = {2, 3, 5, 9, 10, 15, 17} and T = 28

22
Dynamic Programming: Subset Sum

Example: X = {2, 3, 5, 9, 10, 15, 17} and T = 28
Yes: {2, 9, 17} and {3, 10, 15}

23
Dynamic Programming: Subset Sum

Inductive definition

Let X_i = (x_1, ..., x_i) be the first i integers of X
SS(t,i) = TRUE if there is a subset of X_i which sums to t; FALSE otherwise
24
Dynamic Programming Review
SS(0,i) = TRUE
SS(t,0) = FALSE for t > 0
SS(t,i) = SS(t - x_i, i-1) OR SS(t, i-1)
SS(t - x_i, i-1): the ith element is in the subset
SS(t, i-1): the ith element is not in the subset

Parameter range: 0 ≤ t ≤ T, 0 ≤ i ≤ n

(Figure: a T-by-n table with cell (t,i) marked)

Table size: T × n
Each cell (t,i) depends on 2 other cells
O(Tn) time for SS
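
A minimal sketch of the Subset Sum table described above, assuming the inputs are positive integers as in the problem statement. The name `subset_sum` and the list-of-lists layout are illustrative choices; `ss[t][i]` plays the role of SS(t,i).

```python
def subset_sum(xs, target):
    """Return True if some subset of xs sums to target, using the SS(t, i) table."""
    n = len(xs)
    # ss[t][i] is True when a subset of the first i integers sums to t.
    ss = [[False] * (n + 1) for _ in range(target + 1)]
    for i in range(n + 1):
        ss[0][i] = True                      # the empty subset sums to 0
    for i in range(1, n + 1):
        x = xs[i - 1]
        for t in range(1, target + 1):
            without_x = ss[t][i - 1]         # the ith element is not in the subset
            with_x = t >= x and ss[t - x][i - 1]   # the ith element is in the subset
            ss[t][i] = without_x or with_x
    return ss[target][n]


print(subset_sum([2, 3, 5, 9, 10, 15, 17], 28))  # True
```

Each of the (T + 1)(n + 1) cells is filled with two lookups, matching the O(Tn) bound on the slide.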

25
Lopsided Trees

Recall: γ(x) = x (3-favorable) and G is constraint free
Node level = path cost
Adding an edge increases level
Grow lopsided trees level by level

26
Lopsided Trees
27
Lopsided Trees
28
Lopsided Trees
29
Lopsided Trees

We know the exact cost of the tree up to the current level i
Exact cost of m leaves
Remaining n-m leaves must have path cost at least i

30
Lopsided Trees: Cost

Exact cost of C: 3 · (1/3) = 1
Remaining mass up to level 4: (2/3) · 4 = 8/3
Total: 1 + 8/3 = 11/3

31
Lopsided Trees: Cost

Tree cost at Level 5 in terms of Tree cost at Level 4
Add in the mass of remaining leaves
Cost at Level 5
No new leaves
11/3 + 2/3 = 13/3
Cost updates don't depend on level
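
The arithmetic on the two cost slides can be checked with a short sketch. It assumes the truncated cost at a level is the exact weighted cost of the leaves already placed plus the remaining mass charged at that level; the function name and argument layout are hypothetical.

```python
from fractions import Fraction


def truncated_cost(placed, remaining_mass, level):
    """Cost of a tree truncated at `level`.

    placed         -- (weight, level) pairs for leaves whose level is already fixed
    remaining_mass -- total weight of the leaves not yet placed
    """
    exact = sum(weight * depth for weight, depth in placed)
    return exact + remaining_mass * level


placed = [(Fraction(1, 3), 3)]          # leaf C: weight 1/3 at level 3
rest = Fraction(2, 3)                   # mass of the leaves still to be placed
at_level_4 = truncated_cost(placed, rest, 4)
at_level_5 = truncated_cost(placed, rest, 5)
print(at_level_4, at_level_5)           # 11/3 13/3
```

With no new leaves, the step from level 4 to level 5 adds exactly the remaining mass, 2/3, whatever the level is.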

32
Lopsided Trees
33
Lopsided Trees
34
Lopsided Trees

Equality on trees
Equal number of leaves at or above frontier
Equal number of leaves at each relative level below frontier
Nodes have out-degree ≤ 3
Nodes are at most γ(3) = 3 levels below the frontier
(m; l1, l2, l3) signature
Example: Signature (2; 3, 2, 0)
2: C and F are leaves
3: G, H, I are 1 level past the frontier
2: J and K are 2 levels past the frontier
Signature if F is an interior node with 3 children?

35
Inductive Definition

Let CSS(m,l1,l2,l3) = min cost tree with sig (m; l1, l2, l3)
Can we define CSS(m,l1,l2,l3) in terms of optimal solutions to subproblems?
Which trees, when grown by one level, have sig (m; l1, l2, l3)?
Which parent sigs (m'; l1', l2', l3') lead to the child sig (m; l1, l2, l3)?

36
Different Signatures
(2; 2, 0, 0)
(0; 4, 0, 0)
37
Same Signature (2; 0, 2, 3)
Different signatures lead to (2; 0, 2, 3)
38
The other direction (which signatures can a tree grow)
Sig (0; 2, 0, 0)

Growing a tree only affects frontier
Only l1 affects next level
Choose # of leaves
The remaining nodes are internal
Choose # of degree-2 nodes (d2)
Remaining nodes are degree-3 (d3)
O(n^2) choices

Sig (1; 0, 0, 3)
39
The original question (warning: here be symbols)

Which (m'; l1', l2', l3') → (m; l1, l2, l3)?

(m'; l1', l2', l3') is the PARENT; (m; l1, l2, l3) is the CHILD
40
The original question (warning: here be symbols)

Which (m'; l1', l2', l3') → (m; l1, l2, l3)?
Suppose we know
l1' (the # of nodes one level below the frontier)
d2 (the # of the l1' nodes which are degree-2 interior nodes in (m'; l1', l2', l3'))
Let's determine the values of the remaining variables

(Figure: l1' nodes one level below the frontier, d2 of them binary; levels 1, 2, 3 marked)
41
The original question (warning: here be symbols)

m = m' + l1' - d2 - d3
m: the new number of leaves
m': the old number of leaves
l1': nodes at one level below the frontier
d2: internal nodes of degree 2
d3: internal nodes of degree 3
42
The original question (warning: here be symbols)

m = m' + l1' - d2 - l3/3
m: the new number of leaves
m': the old number of leaves
l1': nodes at one level below the frontier
d2: internal nodes of degree 2
l3/3: internal nodes of degree 3
43
The original question (warning: here be symbols)

l1 = l2'
l1: new nodes one level below the frontier
l2': the old number of nodes at 2 levels below the frontier
44
The original question (warning: here be symbols)

l2 = l3' + 2 d2
l2: the new number of nodes 2 levels below the frontier
d2 nodes are binary so they contribute 2 d2 to the frontier
45
The original question (warning: here be symbols)

Which (m'; l1', l2', l3') → (m; l1, l2, l3)?
l1' and d2 are sufficient
l1' and d2 are both O(n)
O(n^2) possibilities for (m'; l1', l2', l3')
CSS(m,l1,l2,l3) = min cost tree with sig. (m; l1, l2, l3)
CSS(m,l1,l2,l3) = min { CSS(m',l1',l2',l3') + c_m' } for 1 ≤ d2 ≤ l1' ≤ n
(c_m' is the sum of the smallest n-m' weights)
CSS(n,0,0,0) = cost of optimal tree
Analysis
Table size: O(n^4)
Each cell takes O(n^2) lookups
O(n^6) algorithm
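
The signature idea can be exercised on small inputs with the sketch below. It is not the table-filling algorithm from the slides: it grows signatures forward, level by level, and uses a priority queue (Dijkstra-style search) to find the cheapest way to reach (n; 0, 0, 0). The starting signatures, the pruning rule, and the name `css_linear_cost` are assumptions made for illustration; only γ(x) = x with out-degrees 2 and 3 is handled.

```python
import heapq
from fractions import Fraction


def css_linear_cost(weights):
    """Cheapest expected path cost for gamma(x) = x on a constraint-free graph.

    Signatures are tuples (m, l1, l2, l3). Needs at least two leaves.
    """
    n = len(weights)
    w = sorted(weights, reverse=True)            # heaviest leaves are placed first
    tail = [sum(w[m:]) for m in range(n + 1)]    # tail[m]: mass of the n - m lightest
    # Assumed starting points: a root of degree 2 or 3 puts its children
    # 2 or 3 levels below the level-0 frontier.
    heap = [(0, s) for s in [(0, 0, 2, 0), (0, 0, 0, 3)] if sum(s) <= n]
    heapq.heapify(heap)
    best = {}
    while heap:
        cost, sig = heapq.heappop(heap)
        if sig in best:
            continue
        best[sig] = cost
        if sig == (n, 0, 0, 0):
            return cost
        m, l1, l2, l3 = sig
        grown = cost + tail[m]                   # add the mass of the unplaced leaves
        for d2 in range(l1 + 1):                 # frontier nodes made binary
            for d3 in range(l1 - d2 + 1):        # frontier nodes made ternary
                child = (m + l1 - d2 - d3, l2, l3 + 2 * d2, 3 * d3)
                # Every pending node leads to at least one leaf, so prune above n.
                if sum(child) <= n and child not in best:
                    heapq.heappush(heap, (grown, child))
    raise ValueError("need at least two leaves")


print(css_linear_cost([Fraction(1, 4)] * 4))  # 4
```

On the weights 1/2, 1/6, 1/6, 1/6 the same function returns 7/2, which agrees with the cost 3 1/2 shown for that example earlier in the talk.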

46
Some Observations

Generalize algorithm
Theorem: O(n^(γ(k)+k))-time DP algorithm
γ is positive, integer-valued, non-decreasing, k-favorable and G is constraint free
Signatures: (γ(k)+1)-vectors
Table size: O(n^(γ(k)+1))
Each cell requires O(n^(k-1)) lookups

50
(extra slides follow)
51
Motivation and Lower Bound

Many constraint graphs have constant out-degree
Remains NP-Hard for many degree costs
Lemma 1: H(w)/log(k) is a lower bound on the cost of an optimal tree
For any k-favorable degree cost γ, with γ ≥ 1
G is constraint-free

(Figure: a tree T with every page cost replaced by 1)
C(T) ≥ c(T)
c(T) ≥ H(w)/log(k) (Shannon)
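
The bound in Lemma 1 is easy to evaluate numerically. The sketch below assumes H(w) is the Shannon entropy of the leaf weights and that the same logarithm base is used in the numerator and denominator (base 2 here); the function name is hypothetical.

```python
import math


def entropy_lower_bound(weights, k):
    """H(w) / log(k), with both logarithms taken base 2."""
    entropy = -sum(w * math.log2(w) for w in weights if w > 0)
    return entropy / math.log2(k)


# Four equally likely leaves and k = 3: about 1.26.
print(entropy_lower_bound([0.25] * 4, 3))
```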
52
A Simple Lemma

Lemma 2: For any tree with m weighted nodes there exists 1 node (splitter) which, when removed, divides the tree into subtrees with at most half the weight of the original tree.

(Figure: a splitter node whose removal leaves subtrees, each of weight < 1/2)
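
One common way to find such a splitter is to compute subtree weights and walk from the root toward any child whose subtree holds more than half the total weight. The sketch below assumes an undirected adjacency-list tree with non-negative node weights; the names `find_splitter`, `adj`, and `weight` are illustrative and not taken from the talk.

```python
def find_splitter(adj, weight, root):
    """Return a node whose removal leaves components of weight <= half the total.

    adj    -- maps each node to the list of its neighbours (undirected tree)
    weight -- maps each node to a non-negative weight
    """
    parent = {root: None}
    order = []
    stack = [root]
    while stack:                         # iterative DFS to record parents
        u = stack.pop()
        order.append(u)
        for v in adj[u]:
            if v != parent[u]:
                parent[v] = u
                stack.append(v)
    subtree = dict(weight)
    for u in reversed(order):            # children are processed before parents
        if parent[u] is not None:
            subtree[parent[u]] += subtree[u]
    total = subtree[root]
    u = root
    while True:
        heavy = [v for v in adj[u] if v != parent[u] and 2 * subtree[v] > total]
        if not heavy:
            return u                     # no child side exceeds half the weight
        u = heavy[0]                     # at most one child can be that heavy


path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
print(find_splitter(path, {v: 1 for v in path}, "a"))  # c
```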
53
Approximation Algorithm

Let G be a DAG where the out-degree of every node is ≤ d
Choose a spanning tree T from G
Balance-Tree(T)
Find a splitter node in T (Lemma 2)
Stop if splitter is child of root
Disconnect the splitter and reconnect it to the root
root has degree at most d+1
Call Balance-Tree on all subtrees

(Figure: splitter)
Mass of each subtree is at most half of whole tree
54
Approximation Algorithm

Analysis
Mass under any node is half of mass under its grandparent
Path length to leaf with weight w_i is -2 log(w_i)
Theorem
O(m)-time O(log(k) γ(d+1))-approx to optimal solution
For any DAG G with m nodes and out-degree ≤ d
For every k-favorable degree cost γ ≥ 1

(Figure labels: Upper Bound on Node Cost; Weighted Path Length)
55
Open Problems

Theorem: There is an O(n^(γ(k)+k))-time algorithm for any instance (G, γ, w) of CSS where G is constraint free and γ is k-favorable, maps the positive integers to the positive integers, and is non-decreasing