A sextic algorithm for website design

1
A sextic algorithm for website design
  • Brent Heeringa (heeringa_at_cs.umass.edu)
  • (Joint work with Micah Adler)
  • 21 October 2004
  • Union College

2
A website design problem (for example, a new
kitchen store)
  • Given products, their popularity, and their
    organization
  • How do we create a good website?
  • Navigation is natural
  • Access to information is timely

3
Good website: Natural Navigation
  • Organization is a DAG
  • TC of DAG enumerates all viable categorical
    relationships and introduces shortcuts
  • Subgraph of TC preserves logical relationship
    between categories

[Figure: a DAG on nodes A, B, C and its transitive closure (TC)]
4
Good website: Timely Access to Info
  • Two obstacles to finding info quickly
  • Time scanning a page for correct link
  • Time descending the DAG
  • Associate a cost with each obstacle
  • Page cost (function of out-degree of node)
  • Path cost (sum of page costs on path)
  • Good access structure
  • Minimize expected path cost
  • Optimal subgraph is always a full tree

[Figure: example path to a leaf of weight 1/2]
Page cost = # links; path cost = 3 + 2 = 5; weighted
path cost = 5 · (1/2) = 5/2
5
Constrained Subtree Selection (CSS)
  • An instance of CSS is a triple (G,γ,w)
  • G is a rooted DAG with n leaves (constraint
    graph)
  • γ is a function of the out-degree of each
    internal node (degree cost)
  • w is a probability distribution over the n
    leaves (weights)
  • A solution is any directed subtree of the
    transitive closure of G which includes the root
    and leaves
  • An optimal solution is one which minimizes the
    expected path cost

[Figure: constraint graph with leaves C, B, D, A, each with weight 1/4]
γ(x) = x
6
Constrained Subtree Selection (CSS)

[Figure: leaves C, B, D, A, each with weight 1/4; weighted path costs so far: 3(1/4)]
γ(x) = x, Cost = 4
7
Constrained Subtree Selection (CSS)

[Figure: leaves C, B, D, A, each with weight 1/4; weighted path costs so far: 3(1/4), 5(1/4)]
γ(x) = x, Cost = 4
8
Constrained Subtree Selection (CSS)

[Figure: leaves C, B, D, A, each with weight 1/4; weighted path costs so far: 3(1/4), 5(1/4), 5(1/4)]
γ(x) = x, Cost = 4
9
Constrained Subtree Selection (CSS)

[Figure: leaves C, B, D, A, each with weight 1/4; weighted path costs: 3(1/4), 5(1/4), 5(1/4), 3(1/4)]
γ(x) = x, Cost = 4
10
Constrained Subtree Selection (CSS)

[Figure: leaves C, B, D, A, each with weight 1/4]
Expected path cost: 1/4(3 + 5 + 5 + 3) = 1/4(16) = 4
γ(x) = x, Cost = 4
11
Constrained Subtree Selection (CSS)

[Figure: leaves C, B, D, A with leaf weights 1/2, 1/6, 1/6, 1/6]
γ(x) = x, Cost = 3 1/2
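
The cost calculation on the preceding slides can be reproduced in a few lines. The sketch below is an illustration, not code from the talk: the function name `expected_path_cost`, the nested-list tree encoding, and the two tree shapes are assumptions, chosen so that the path costs match the figures (3, 5, 5, 3 for the uniform weights and 2, 5, 5, 5 for the skewed weights). The placement of leaf labels in the original figures is not recoverable, so the labels here are arbitrary.

```python
from fractions import Fraction


def expected_path_cost(tree, weights, degree_cost=lambda d: d):
    """Expected path cost of a tree given as nested lists.

    A leaf is a string label; an internal node is a list of children.
    The page cost of an internal node is degree_cost(out-degree), and
    a leaf's path cost is the sum of page costs of its ancestors.
    """
    total = Fraction(0)
    stack = [(tree, 0)]
    while stack:
        node, cost_so_far = stack.pop()
        if isinstance(node, str):
            total += weights[node] * cost_so_far
        else:
            page = degree_cost(len(node))
            stack.extend((child, cost_so_far + page) for child in node)
    return total


# Hypothetical tree with path costs 3, 5, 5, 3 under the linear degree cost.
uniform = {leaf: Fraction(1, 4) for leaf in "ABCD"}
tree_uniform = ["A", ["B", "C"], "D"]

# Hypothetical tree with path costs 2, 5, 5, 5: the heavy leaf hangs off a binary root.
skewed = {"A": Fraction(1, 2), "B": Fraction(1, 6),
          "C": Fraction(1, 6), "D": Fraction(1, 6)}
tree_skewed = ["A", ["B", "C", "D"]]

cost_uniform = expected_path_cost(tree_uniform, uniform)
cost_skewed = expected_path_cost(tree_skewed, skewed)
```

Under the linear degree cost, `cost_uniform` is 4 and `cost_skewed` is 7/2, which agrees with the costs shown on the slides.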
12
Constraint-Free Graphs and k-favorability
  • Constraint-Free Graph
  • Every directed, full tree with n leaves is a
    subtree of the TC
  • CSS is no longer constrained by the graph
  • k-favorable degree cost γ
  • Fix γ. There exists k > 1 such that, for any
    constraint-free instance of CSS under γ, an
    optimal tree has maximal out-degree k

13
Linear Degree Cost - γ(x) = x
  • 5 paths w/ cost 5
  • 3 paths w/ cost 5
  • 2 paths w/ cost 4
  • Unweighted path costs are all no greater, so
    weighted path costs are no greater either
  • Generalization to n > 6 paths is straightforward

14
Linear Degree Cost - γ(x) = x
  • 4 paths w/ cost 4
  • 4 paths w/ cost 4

15
Linear Degree Cost - γ(x) = x
[Figure: a leaf with weight > 1/2]
  • Prefer binary structure when a leaf has at least
    half the mass
  • Prefer ternary structure when mass is
    uniformly distributed
  • CSS with 2-favorable degree costs and C.F.
    graphs is the Huffman coding problem
  • Examples: quadratic, exp, ceiling of log
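
For the 2-favorable case, a minimal sketch of the Huffman connection is given below. It is an illustration rather than code from the talk, and the function name `huffman_expected_depth` is an assumption. It uses the standard fact that the expected leaf depth of a Huffman tree equals the sum of all merged weights. If every internal node is binary, each page costs γ(2), so the expected path cost would be γ(2) times this depth.

```python
import heapq
from fractions import Fraction


def huffman_expected_depth(weights):
    """Expected leaf depth of an optimal binary (Huffman) tree."""
    heap = [Fraction(w) for w in weights]
    heapq.heapify(heap)
    total = Fraction(0)
    while len(heap) > 1:
        # Merge the two lightest subtrees; every leaf below the new
        # node gets one level deeper, which adds the merged weight.
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        total += a + b
        heapq.heappush(heap, a + b)
    return total


example_weights = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
depth = huffman_expected_depth(example_weights)  # leaf depths 1, 2, 3, 3
quadratic_cost = 2 ** 2 * depth                  # assumes gamma(x) = x^2, so gamma(2) = 4
```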

16
Results
  • Complexity: NP-complete for equal weights and
    many γ
  • Sufficient condition on γ
  • Hardness depends on constraint graph
  • Highlighted Algorithm
  • Theorem: O(n^6)-time DP algorithm
  • γ(x) = x and G is constraint free
  • Other results
  • Characterizations of optimal trees for uniform
    probability distributions
  • Theorem: poly-time constant-approximation
  • γ ≥ 1 and k-favorable; G has constant out-degree
  • Approximate Hotlink Assignment - Kranakis et
    al.

17
Related Work
  • Adaptive Websites: Perkowitz and Etzioni
  • Challenge to the AI community
  • Novel views of websites: page synthesis problem
  • Hotlink Assignment: Kranakis, Krizanc, Shende,
    et al.
  • Add 1 hotlink per page to minimize expected
    distance from root to leaves
  • Recently: pages have cost proportional to their
    size
  • Hotlinks don't change page cost
  • Optimal Prefix-Free Codes: Golin and Rote
  • Min code for n words with r symbols where symbol
    ai has cost ci
  • Resembles CSS without a constraint graph

18
Dynamic Programming Review
  • Problems which exhibit
  • Optimal substructure
  • An optimal sol. may be written in terms of opt.
    solutions to subproblems
  • Inductive definition
  • Overlapping subproblems
  • Different problem instances share subproblems
  • Repeated computation

19
Dynamic Programming: Fib
0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, ...
Problem: What is the ith Fibonacci number?
  • Optimal substructure (inductive definition)
  • Overlapping subproblems
  • Fib(7) = Fib(6) + Fib(5) (but Fib(6) calls
    Fib(5))
  • We only need to calculate Fib(5) once
  • Don't repeat computations
  • Idea: Store solutions to subproblems in a table

Fib(0) = 0, Fib(1) = 1, Fib(i) = Fib(i-1) + Fib(i-2)
20
Dynamic Programming: Fib
  • General Approach
  • Write inductive definition
  • Range of parameters in definition defines table
    size
  • Fill in table using definition
  • Analysis: (table size) × (# of lookups)

Fib(0) = 0, Fib(1) = 1, Fib(i) = Fib(i-1) + Fib(i-2)
Fib(14): 0 ≤ i ≤ 14

i       0  1  2  3  4  5  6  ...  12   13   14
Fib(i)  0  1  1  2  3  5  8  ...  144  233  377
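
The table-filling approach for Fib can be sketched as follows. The function name `fib_table` is an assumption made for illustration. The table has n + 1 cells and each cell needs 2 lookups, so the running time is linear in n.

```python
def fib_table(n):
    """Fill the table Fib(0..n) from the inductive definition."""
    table = [0] * (n + 1)
    if n >= 1:
        table[1] = 1
    for i in range(2, n + 1):
        # Each cell depends on the two cells before it.
        table[i] = table[i - 1] + table[i - 2]
    return table


table = fib_table(14)
```

Here `table[12]`, `table[13]` and `table[14]` hold 144, 233 and 377, the last three entries of the table on the slide.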
21
Dynamic Programming: Subset Sum
  • Subset Sum (SS): Given a set of n positive
    integers X = (x1, ..., xn) and a positive integer T,
    is there a subset of X which sums to T?
  • Example: X = {2, 3, 5, 9, 10, 15, 17} and T = 28

22
Dynamic Programming: Subset Sum
  • Yes: {2, 9, 17} and {3, 10, 15}

23
Dynamic Programming: Subset Sum
  • Inductive definition

Let Xi = (x1, ..., xi) be the first i integers of X
SS(t,i) = TRUE if there is a subset of Xi which
sums to t; FALSE otherwise
24
Dynamic Programming Review
SS(0,i) = TRUE
SS(t,0) = FALSE (for t > 0)
SS(t,i) = SS(t-xi,i-1) OR SS(t,i-1)
  SS(t-xi,i-1): the ith element is in the subset
  SS(t,i-1): the ith element is not in the subset

Parameter range: 0 ≤ t ≤ T, 0 ≤ i ≤ n

[Figure: table with T columns and n rows; cell (t,i) highlighted]
  • Table size: Tn
  • Each cell (t,i) depends on 2 other cells
  • O(Tn) time for SS
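
The recurrence above translates directly into a table-filling routine. This is a minimal sketch; the function name `subset_sum` and the row-by-row table layout are assumptions made for illustration.

```python
def subset_sum(xs, target):
    """Return True if some subset of xs sums to target (xs positive ints)."""
    n = len(xs)
    # ss[i][t] is True when some subset of the first i integers sums to t.
    ss = [[False] * (target + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        ss[i][0] = True  # the empty subset sums to 0
    for i in range(1, n + 1):
        x = xs[i - 1]
        for t in range(1, target + 1):
            # Either the ith element is left out, or it is used and the
            # first i-1 elements must sum to t - x.
            ss[i][t] = ss[i - 1][t] or (t >= x and ss[i - 1][t - x])
    return ss[n][target]


X = [2, 3, 5, 9, 10, 15, 17]
answer = subset_sum(X, 28)
```

For the example from the slides, `answer` is True (for instance 2 + 9 + 17 = 28). The table has (T + 1)(n + 1) cells and each cell reads two others, giving the O(Tn) bound.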

25
Lopsided Trees
  • Recall: γ(x) = x (3-favorable) and G is constraint
    free
  • Node level = path cost
  • Adding an edge increases level
  • Grow lopsided trees level by level

26
Lopsided Trees
27
Lopsided Trees
28
Lopsided Trees
29
Lopsided Trees
  • We know exact cost of tree up to the current
    level i
  • Exact cost of m leaves
  • Remaining n-m leaves must have path-cost at
    least i

30
Lopsided Trees: Cost
  • Exact cost of C: 3 · (1/3) = 1
  • Remaining mass up to level 4: (2/3) · 4 = 8/3
  • Total: 1 + 8/3 = 11/3

31
Lopsided Trees: Cost
  • Tree cost at Level 5 in terms of Tree cost at
    Level 4
  • Add in the mass of remaining leaves
  • Cost at Level 5
  • No new leaves
  • 11/3 + 2/3 = 13/3
  • Cost updates don't depend on level
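
The level-by-level cost update can be checked numerically. In the sketch below, the function name `truncated_cost` and the list-of-pairs encoding are assumptions made for illustration. It treats leaf C (weight 1/3, level 3) as the only leaf at or above the frontier, and the remaining mass 2/3 as sitting at the current level.

```python
from fractions import Fraction


def truncated_cost(placed, remaining_mass, level):
    """Cost of a tree truncated at `level`.

    placed: (weight, level) pairs for leaves at or above the frontier.
    remaining_mass: total weight of the leaves not yet placed, each
    charged as if it sat exactly at `level`.
    """
    exact = sum((w * l for w, l in placed), Fraction(0))
    return exact + remaining_mass * level


placed = [(Fraction(1, 3), 3)]  # leaf C
remaining = Fraction(2, 3)
cost_level_4 = truncated_cost(placed, remaining, 4)
cost_level_5 = truncated_cost(placed, remaining, 5)
```

The two values are 11/3 and 13/3. Their difference is exactly the remaining mass 2/3, whatever the level, which is why the update does not depend on the level.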

32
Lopsided Trees
33
Lopsided Trees
34
Lopsided Trees
  • Equality on trees
  • Equal number of leaves at or above frontier
  • Equal number of leaves at each relative level
    below frontier
  • Nodes have outdegree ≤ 3
  • Nodes are at most γ(3) = 3 levels below the frontier
  • (m; l1, l2, l3): signature
  • Example: Signature (2; 3, 2, 0)
  • 2: C and F are leaves
  • 3: G, H, I are 1 level past the frontier
  • 2: J and K are 2 levels past the frontier
  • Signature if F is interior node with 3 children?

35
Inductive Definition
  • Let CSS(m,l1,l2,l3) = min cost tree with sig
    (m; l1, l2, l3)
  • Can we define CSS(m,l1,l2,l3) in terms of optimal
    solutions to subproblems?
  • Which trees, when grown by one level, have sig
    (m; l1, l2, l3)?
  • Which parent sigs (m'; l1', l2', l3') lead to the
    child sig (m; l1, l2, l3)?

36
Different Signatures
(2; 2, 0, 0)
(0; 4, 0, 0)
37
Same Signature: (2; 0, 2, 3)
Different signatures lead to (2; 0, 2, 3)
38
The other direction (which signatures can a tree
grow)
Sig (0; 2, 0, 0)
  • Growing a tree only affects frontier
  • Only l1 affects next level
  • Choose # of leaves
  • The remaining nodes are internal
  • Choose # of degree-2 nodes (d2)
  • Remaining nodes are degree-3 (d3)
  • O(n^2) choices

Sig (1; 0, 0, 3)
39
The original question (warning: here be symbols)
  • Which (m'; l1', l2', l3') → (m; l1, l2, l3)?
    (parent signature → child signature)
40
The original question (warning: here be symbols)
  • Which (m'; l1', l2', l3') → (m; l1, l2, l3)?
  • Suppose we know
  • l1' (the # of nodes one level below the frontier)
  • d2 (the # of l1' which are degree-2 interior
    nodes in (m'; l1', l2', l3'))
  • Let's determine the values of the remaining
    variables

[Figure: levels 1, 2, 3 below the frontier, with the l1' nodes and the d2 nodes marked]
41
The original question (warning: here be symbols)
  • Which (m'; l1', l2', l3') → (m; l1, l2, l3)?
  • m = m' + l1' - d2 - d3
  • m: the new number of leaves
  • m': the old number of leaves
  • l1': nodes at one level below the frontier
  • d2: internal nodes of degree 2
  • d3: internal nodes of degree 3
42
The original question (warning: here be symbols)
  • Which (m'; l1', l2', l3') → (m; l1, l2, l3)?
  • m = m' + l1' - d2 - l3/3
  • m: the new number of leaves
  • m': the old number of leaves
  • l1': nodes at one level below the frontier
  • d2: internal nodes of degree 2
  • l3/3: internal nodes of degree 3
43
The original question (warning: here be symbols)
  • Which (m'; l1', l2', l3') → (m; l1, l2, l3)?
  • l1 = l2'
  • l1: new nodes one level below the frontier
  • l2': the old number of nodes at 2 levels below the
    frontier
44
The original question (warning: here be symbols)
  • Which (m'; l1', l2', l3') → (m; l1, l2, l3)?
  • l2 = l3' + 2d2
  • l2: the new number of nodes 2 levels below the
    frontier
  • d2 nodes are binary so they contribute 2d2 to the
    frontier
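
The signature update built up over the last few slides fits in one small function. This is a sketch under assumptions: the name `grow` and the tuple encoding `(m, l1, l2, l3)` are illustrative, and the rule l3 = 3·d3 is read off from the slide that substitutes l3/3 for d3.

```python
def grow(parent, d2, d3):
    """Advance the frontier one level and return the child signature.

    parent = (m', l1', l2', l3'). Of the l1' nodes reaching the frontier,
    d2 become degree-2 internal nodes, d3 become degree-3 internal nodes,
    and the rest become leaves.
    """
    m, l1, l2, l3 = parent
    if d2 < 0 or d3 < 0 or d2 + d3 > l1:
        raise ValueError("d2 + d3 must be between 0 and l1'")
    return (
        m + l1 - d2 - d3,  # new number of leaves
        l2,                # old level-2 nodes are now one level below
        l3 + 2 * d2,       # binary nodes put 2 children two levels below
        3 * d3,            # ternary nodes put 3 children three levels below
    )


child = grow((0, 2, 0, 0), d2=0, d3=1)
```

With one of the two frontier nodes made a leaf and the other a degree-3 node, `child` is `(1, 0, 0, 3)`, the pair of signatures shown on the "other direction" slide.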
45
The original question (warning: here be symbols)
  • Which (m'; l1', l2', l3') → (m; l1, l2, l3)?
  • l1' and d2 are sufficient
  • l1' and d2 are both O(n)
  • O(n^2) possibilities for (m'; l1', l2', l3')
  • CSS(m,l1,l2,l3) = min cost tree with sig. (m; l1,
    l2, l3)
  • CSS(m,l1,l2,l3) = min { CSS(m',l1',l2',l3')
    + c_m' } for 1 ≤ d2 ≤ l1' ≤ n
  • (c_m is the sum of the smallest n-m weights)
  • CSS(n,0,0,0) = cost of optimal tree
  • Analysis
  • Table size: O(n^4)
  • Each cell takes O(n^2) lookups
  • O(n^6) algorithm
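
A compact way to experiment with this recurrence is sketched below. It is not the table-filling procedure from the talk: it explores the same signatures with the same update rule, but as a shortest-path search (Dijkstra's algorithm) from the two possible root signatures to (n; 0, 0, 0), which sidesteps choosing a fill order. The function name `css_linear_cost`, the pruning rules, and the starting signatures are assumptions made for illustration. The code assumes γ(x) = x, a constraint-free graph, and out-degrees 2 or 3 only.

```python
import heapq
from fractions import Fraction


def css_linear_cost(weights):
    """Minimum expected path cost for gamma(x) = x, constraint-free G."""
    w = sorted((Fraction(x) for x in weights), reverse=True)
    n = len(w)
    if n < 2:
        raise ValueError("need at least two leaves")
    # tail[m] = sum of the smallest n - m weights (c_m on the slide).
    tail = [Fraction(0)] * (n + 1)
    for m in range(n - 1, -1, -1):
        tail[m] = tail[m + 1] + w[m]

    goal = (n, 0, 0, 0)
    best = {}
    heap = []

    def push(cost, sig):
        m, l1, l2, l3 = sig
        pending = l1 + l2 + l3
        if m + pending > n:           # every pending node needs a leaf
            return
        if pending == 0 and m < n:    # dead end: nothing left to grow
            return
        if sig not in best or cost < best[sig]:
            best[sig] = cost
            heapq.heappush(heap, (cost, sig))

    # Frontier at the root: its children sit 2 or 3 levels below.
    push(Fraction(0), (0, 0, 2, 0))
    push(Fraction(0), (0, 0, 0, 3))
    while heap:
        cost, sig = heapq.heappop(heap)
        if cost > best[sig]:
            continue
        if sig == goal:
            return cost
        m, l1, l2, l3 = sig
        step = cost + tail[m]  # every unplaced leaf moves one level down
        for d2 in range(l1 + 1):
            for d3 in range(l1 - d2 + 1):
                push(step, (m + l1 - d2 - d3, l2, l3 + 2 * d2, 3 * d3))
    raise RuntimeError("no tree found")


uniform_opt = css_linear_cost([Fraction(1, 4)] * 4)
skewed_opt = css_linear_cost([Fraction(1, 2)] + [Fraction(1, 6)] * 3)
```

On the two four-leaf examples from the CSS slides, the search returns 4 for uniform weights and 7/2 for weights 1/2, 1/6, 1/6, 1/6. There are O(n^4) signatures and O(n^2) choices of (d2, d3) per signature, in line with the analysis above, although the priority queue adds a logarithmic factor that the table-based version avoids.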

46
Some Observations
  • Generalize algorithm
  • Theorem: O(n^(γ(k)+k))-time DP algorithm
  • γ is positive, integer-valued, non-decreasing,
    k-favorable and G is constraint free
  • Signatures: (γ(k)+1)-vectors
  • Table size: O(n^(γ(k)+1))
  • Each cell requires O(n^(k-1)) lookups

50
(extra slides follow)
51
Motivation and Lower Bound
  • Many constraint graphs have constant out-degree
  • Remains NP-Hard for many degree costs
  • Lemma 1: H(w)/log(k) is a lower bound on the cost
    of an optimal tree
  • For any k-favorable degree cost γ, with γ ≥ 1
  • G is constraint-free

[Figure: three trees with unit edge costs]
c(T) ≥ c'(T) ≥ c'(T') ≥ H(w)/log(k)
(Shannon)
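
The bound in Lemma 1 is easy to evaluate for a given weight vector. The sketch below is illustrative: the function name `entropy_lower_bound` is an assumption, and it takes both the entropy and the logarithm in base 2 so that the ratio is the usual Shannon bound for a k-ary code.

```python
import math


def entropy_lower_bound(weights, k):
    """H(w) / log(k), with entropy and log both taken in base 2."""
    entropy = -sum(w * math.log2(w) for w in weights if w > 0)
    return entropy / math.log2(k)


bound_k2 = entropy_lower_bound([0.25] * 4, 2)
bound_k3 = entropy_lower_bound([0.25] * 4, 3)
```

For four equally likely leaves the entropy is 2 bits, so the bound is 2 for k = 2 and about 1.26 for k = 3. Both sit below the cost of 4 found for this instance under γ(x) = x, as the lemma requires.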
52
A Simple Lemma
  • Lemma 2: For any tree with m weighted nodes
    there exists 1 node (splitter) which, when
    removed, divides the tree into subtrees with at
    most half the weight of the original tree.

[Figure: a splitter node whose removal leaves subtrees each of weight < 1/2]
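
One standard way to find such a splitter is to walk from the root toward the heaviest subtree until no child subtree holds more than half of the total weight. The sketch below is an assumption about how this could be implemented, not the authors' procedure. The name `find_splitter` and the dictionary-based tree encoding are illustrative.

```python
def find_splitter(children, weight, root):
    """Return a node whose removal leaves pieces of at most half the weight.

    children: dict mapping a node to the list of its children.
    weight: dict mapping every node to a non-negative weight.
    """
    # Pre-order traversal; reversing it visits children before parents.
    order = []
    stack = [root]
    while stack:
        v = stack.pop()
        order.append(v)
        stack.extend(children.get(v, []))
    subtree = {}
    for v in reversed(order):
        subtree[v] = weight[v] + sum(subtree[c] for c in children.get(v, []))

    total = subtree[root]
    v = root
    while True:
        # At most one child can hold more than half of the total weight.
        heavy = [c for c in children.get(v, []) if 2 * subtree[c] > total]
        if not heavy:
            return v
        v = heavy[0]


path = {"a": ["b"], "b": ["c"], "c": ["d"], "d": ["e"]}
unit = {v: 1 for v in "abcde"}
splitter = find_splitter(path, unit, "a")
```

On the five-node path with unit weights, the splitter is the middle node `c`: removing it leaves two pieces of weight 2 each, both at most half of 5.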
53
Approximation Algorithm
  • Let G be a DAG where out-degree of every node ≤ d
  • Choose a spanning tree T from G
  • Balance-Tree(T)
  • Find a splitter node in T (Lemma 2)
  • Stop if splitter is child of root
  • Disconnect the splitter and reconnect it to the
    root
  • root has degree at most d+1
  • Call Balance-Tree on all subtrees

[Figure: splitter]
Mass of each subtree is at most half of whole
tree
54
Approximation Algorithm
  • Analysis
  • Mass under any node is half of mass under its
    grandparent
  • Path length to leaf with weight wi is -2log(wi)
  • Theorem
  • O(m)-time O(log(k) γ(d+1))-approx to optimal
    solution
  • For any DAG G with m nodes and out-degree ≤ d
  • For every k-favorable degree cost γ ≥ 1

Upper Bound on Node Cost
Weighted Path Length
55
Open Problems
  • Theorem: There is an for any instance (G,γ,w) of
    CSS where G is constraint free, γ is
    k-favorable, maps the positive integers to the
    positive integers and is non-decreasing

NO
  • Proof
  • c(T) ≥ c'(T) ≥ c'(T') ≥ H(w)/log(k)
  • T is optimal tree for CSS cost c
  • T' is optimal tree for OPC cost c' for k symbols
    each with weight 1 (i.e. γ(x) = 1)
  • H is entropy