1
Decision Trees and Influences
Ryan O'Donnell - Microsoft
Mike Saks - Rutgers
Oded Schramm - Microsoft
Rocco Servedio - Columbia
2
Part I: Decision trees have large influences
3
Printer troubleshooter
[Flowchart: a printer-troubleshooting decision tree. Root: Does anything print? Internal questions: Right size paper? Can print from Notepad? Network printer? Printer mis-setup? File too complicated? Driver OK? Leaves: Solved / Call tech support.]
4
Decision tree complexity
  • f : Attr1 × Attr2 × … × Attrn → {-1,1}.
  • What's the best DT for f, and how do we find it?
  • Depth = worst-case # of questions.
  • Expected depth = avg. # of questions.

5
Building decision trees
  • Identify the most influential/decisive/relevant
    variable.
  • Put it at the root.
  • Recursively build DTs for its children.
  • Almost all real-world learning algs are based on
    this: CART, C4.5, …
  • Almost no theoretical (PAC-style) learning algs
    are based on this:
  • [Blum92, KM93, BBVKV97, PTF-folklore, OS04]: no.
  • [EH89, SJ03]: sorta.
  • Conjectured to be good for some problems (e.g.,
    percolation [SS04]) but unprovable.
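As a toy illustration of this greedy recipe (not the talk's code; the function and variable names are my own), one can build a tree over a full truth table, splitting on the variable whose flip most often changes f:

```python
from itertools import product

def build_tree(f, n, fixed=None):
    """Greedy DT construction (the CART/ID3 idea in miniature):
    query the variable whose flip most often changes f, then recurse."""
    fixed = fixed or {}
    free = [j for j in range(n) if j not in fixed]
    # all inputs consistent with the variables fixed so far
    xs = []
    for bits in product((-1, 1), repeat=len(free)):
        x = [0] * n
        for j, b in zip(free, bits):
            x[j] = b
        for j, b in fixed.items():
            x[j] = b
        xs.append(tuple(x))
    vals = {f(x) for x in xs}
    if len(vals) == 1:           # f is constant on this subcube: make a leaf
        return vals.pop()
    def relevance(j):            # how often flipping x_j changes f here
        return sum(f(x) != f(x[:j] + (-x[j],) + x[j + 1:]) for x in xs)
    root = max(free, key=relevance)
    return (root,
            build_tree(f, n, {**fixed, root: -1}),
            build_tree(f, n, {**fixed, root: 1}))

maj3 = lambda x: 1 if sum(x) > 0 else -1
tree = build_tree(maj3, 3)
print(tree)
```

For Maj3 this recovers a depth-3 tree, matching D(Maj3) = 3.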

6
Boolean DTs
  • f : {-1,1}^n → {-1,1}.
  • D(f) = min depth of a DT for f.
  • 0 ≤ D(f) ≤ n.

[Diagram: a decision tree computing Maj3, querying x1, x2, x3, with leaves labelled -1 and 1.]
7
Boolean DTs
  • {-1,1}^n viewed as a probability space, with the
    uniform probability distribution.
  • A uniformly random path down a DT, plus a uniformly
    random setting of the unqueried variables,
    defines a uniformly random input
  • ⇒ expected # of questions on a random input =
    expected depth d(f).

8
Influences
  • Influence of coordinate j on f:
  • the probability that x_j is relevant for f:
  • I_j(f) = Pr[ f(x) ≠ f(x^(⊕j)) ],
    where x^(⊕j) is x with the jth bit flipped.
  • 0 ≤ I_j(f) ≤ 1.
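The definition translates directly into a brute-force computation (an illustrative sketch; `influence` and `maj3` are my own names):

```python
from itertools import product

def influence(f, n, j):
    """I_j(f) = Pr over uniform x in {-1,1}^n of [f(x) != f(x^(+j))]."""
    flips = 0
    for x in product((-1, 1), repeat=n):
        y = x[:j] + (-x[j],) + x[j + 1:]   # flip the jth bit
        flips += f(x) != f(y)
    return flips / 2 ** n

maj3 = lambda x: 1 if sum(x) > 0 else -1
print([influence(maj3, 3, j) for j in range(3)])  # [0.5, 0.5, 0.5]
```

Each coordinate of Maj3 has influence ½: flipping x_j matters exactly when the other two bits disagree.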

9
Main question
  • If a function f has a shallow decision tree,
    does it have a variable with significant
    influence?

10
Main question
  • No.
  • But for a silly reason:
  • Suppose f is highly biased, say Pr[f = 1] = p ≪ 1.
  • Then for any j,
  • I_j(f) = Pr[f(x) = 1, f(x^(⊕j)) = -1]
           + Pr[f(x) = -1, f(x^(⊕j)) = 1]
  •        ≤ Pr[f(x) = 1] + Pr[f(x^(⊕j)) = 1]
  •        = p + p
  •        = 2p.

11
Variance
  • ⇒ Influences are always at most 2·min{p,q}
    (writing p = Pr[f = 1], q = Pr[f = -1]).
  • Analytically nicer expression: Var[f].
  • Var[f] = E[f²] − E[f]²
  •        = 1 − (p − q)² = 1 − (2p − 1)²
           = 4p(1 − p) = 4pq.
  • 2·min{p,q} ≤ 4pq ≤ 4·min{p,q}.
  • It's 1 for balanced functions.
  • So I_j(f) ≤ Var[f], and it is fair to say I_j(f) is
    significant if it's a significant fraction of
    Var[f].
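A quick numerical check of Var[f] = 4pq (illustrative only; the biased example f is my own choice):

```python
from itertools import product

def var(f, n):
    """Var[f] = E[f^2] - E[f]^2 = 1 - E[f]^2 for a +-1-valued f."""
    xs = list(product((-1, 1), repeat=n))
    mean = sum(f(x) for x in xs) / len(xs)
    return 1 - mean ** 2

# biased example: f = 1 only on the all-ones input, so p = 1/8, q = 7/8
f = lambda x: 1 if min(x) == 1 else -1
p, q = 1 / 8, 7 / 8
print(var(f, 3), 4 * p * q)  # both 0.4375
```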

12
Main question
  • If a function f has a shallow decision tree,
    does it have a variable with influence at least
    a significant fraction of Var[f]?

13
Notation
  • t(d) := min over f with D(f) ≤ d of
    max_j { I_j(f) / Var[f] }.
14
Known lower bounds
  • Suppose f : {-1,1}^n → {-1,1}.
  • An elementary old inequality states:
  • Var[f] ≤ Σ_j I_j(f).
  • Thus f has a variable with influence at least
    Var[f]/n.
  • A deep inequality of [KKL88] shows there is
    always a coord. j such that I_j(f) ≥ Var[f] ·
    Ω(log n / n).
  • If D(f) ≤ d then f really has at most 2^d
    variables.
  • Hence we get t(d) ≥ 1/2^d from the first, and
    t(d) ≥ Ω(d/2^d) from KKL.
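Both sides of the elementary inequality are easy to compute exactly for small n; for Maj3 it reads 1 ≤ 3/2 (a sketch with my own helper names):

```python
from itertools import product

def influence(f, n, j):
    """I_j(f) by exact enumeration over {-1,1}^n."""
    xs = list(product((-1, 1), repeat=n))
    return sum(f(x) != f(x[:j] + (-x[j],) + x[j + 1:]) for x in xs) / len(xs)

def var(f, n):
    """Var[f] = 1 - E[f]^2 for a +-1-valued f."""
    xs = list(product((-1, 1), repeat=n))
    return 1 - (sum(f(x) for x in xs) / len(xs)) ** 2

maj3 = lambda x: 1 if sum(x) > 0 else -1
total = sum(influence(maj3, 3, j) for j in range(3))
print(var(maj3, 3), "<=", total)  # 1.0 <= 1.5
```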

15
Our result
  • t(d) ≥ 1/d.
  • This is tight:
  • Take SEL, the 3-bit selector: if x1 = -1, output
    x2, else output x3.
  • Then Var[SEL] = 1, d = 2, and all three variables
    have influence ½.
  • (The recursive version, SEL(SEL, SEL, SEL), etc.,
    gives a Var = 1 function with d = 2^h and all
    influences 2^{-h}, for any h.)
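The tightness claim can be checked directly (a sketch; `sel` is my encoding of the selector, 0-indexed):

```python
from itertools import product

def influence(f, n, j):
    """I_j(f) by exact enumeration over {-1,1}^n."""
    xs = list(product((-1, 1), repeat=n))
    return sum(f(x) != f(x[:j] + (-x[j],) + x[j + 1:]) for x in xs) / len(xs)

# SEL: x0 selects between x1 and x3 (here 0-indexed: x[1] vs x[2])
sel = lambda x: x[1] if x[0] == -1 else x[2]
print([influence(sel, 3, j) for j in range(3)])  # [0.5, 0.5, 0.5]
# SEL is balanced (Var = 1) and its tree always asks exactly 2 questions,
# so max_j I_j / Var[f] = 1/2 = 1/d.
```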
16
Our actual main theorem
  • Given a decision tree for f, let d_j(f) =
    Pr[tree queries x_j].
  • Then:
  • Var[f] ≤ Σ_j d_j(f) · I_j(f).
  • Cor: Fix the tree with smallest expected depth.
  • Then Σ_j d_j(f) = E[depth of a path] = d(f) ≤
    D(f).
  • ⇒ Var[f] ≤ (max_j I_j) · Σ_j d_j ≤ (max_j I_j) · d(f)
  • ⇒ max_j I_j ≥ Var[f] / d(f) ≥ Var[f] / D(f).
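For Maj3 with its optimal tree (query x0 and x1 always; query x2 only on disagreement), the inequality can be verified numerically (an illustrative sketch, not from the talk):

```python
from itertools import product

def influence(f, n, j):
    """I_j(f) by exact enumeration over {-1,1}^n."""
    xs = list(product((-1, 1), repeat=n))
    return sum(f(x) != f(x[:j] + (-x[j],) + x[j + 1:]) for x in xs) / len(xs)

maj3 = lambda x: 1 if sum(x) > 0 else -1
# d_j = Pr[tree queries x_j]: x0, x1 always; x2 iff x0 != x1 (prob 1/2)
d = [1.0, 1.0, 0.5]
I = [influence(maj3, 3, j) for j in range(3)]
rhs = sum(dj * Ij for dj, Ij in zip(d, I))
print(1.0, "<=", rhs)  # Var[Maj3] = 1 <= 1.25
```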

17
Proof
  • Pick a random path in the tree. This gives some
    set of variables, P = (x_{J_1}, …, x_{J_T}), along
    with an assignment to them, β_P.
  • Call the remaining set of variables P̄ and pick a
    random assignment β_{P̄} for them too.
  • Let X be the (uniformly random) string given by
    combining these two assignments, (β_P, β_{P̄}).
  • Also, define J_{T+1}, …, J_n = −.

18
Proof
  • Let β′_P be an independent random assignment to
    the variables in P.
  • Let Z = (β′_P, β_{P̄}).
  • Note: Z is also uniformly random.

[Diagram: a path J_1, J_2, J_3, …, J_T down the tree (with J_{T+1} = … = J_n = −); X = (β_P, β_{P̄}) and Z = (β′_P, β_{P̄}) agree on the unqueried variables P̄.]
19
Proof
  • Finally, for t = 0, …, T, let Y_t be the same string
    as X, except that Z's assignments (β′_P) for
    variables x_{J_1}, …, x_{J_t} are swapped in.
  • Note: Y_0 = X, Y_T = Z.
  • Y_0 = X = (-1,  1, -1, …,  1,  1, -1,  1, -1)
  • Y_1 =     ( 1,  1, -1, …,  1,  1, -1,  1, -1)
  • Y_2 =     ( 1, -1, -1, …,  1,  1, -1,  1, -1)
  • Y_T = Z = ( 1, -1, -1, …, -1,  1, -1,  1, -1)
  • Also define Y_{T+1} = … = Y_n = Z.

20
  • Var[f] = E[f²] − E[f]²
  •        = E[f(X)·f(X)] − E[f(X)·f(Z)]
    (f(X) is determined by the path alone, which is
    independent of Z, so E[f(X)·f(Z)] = E[f]²)
  •        = E[f(X)·f(Y_0)] − E[f(X)·f(Y_n)]
  •        = Σ_{t=1}^{n} E[f(X)·(f(Y_{t-1}) − f(Y_t))]
  •        ≤ Σ_t E[ |f(Y_{t-1}) − f(Y_t)| ]
  •        = Σ_t 2·Pr[f(Y_{t-1}) ≠ f(Y_t)]
  •        = Σ_t Σ_j Pr[J_t = j] ·
           2·Pr[f(Y_{t-1}) ≠ f(Y_t) | J_t = j].

21
Proof
  • Σ_t Σ_j Pr[J_t = j] ·
    2·Pr[f(Y_{t-1}) ≠ f(Y_t) | J_t = j].
  • Utterly Crucial Observation:
  • Conditioned on J_t = j,
  • (Y_{t-1}, Y_t) are jointly distributed exactly as
  • (W, W′), where W is uniformly random, and W′
    is W with the jth bit rerandomized.
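The consequence used next, 2·Pr[f(W) ≠ f(W′)] = I_j(f) (rerandomizing bit j flips it with probability ½), is easy to sanity-check by Monte Carlo (my own harness, not from the talk):

```python
import random
from itertools import product

def influence(f, n, j):
    """I_j(f) by exact enumeration over {-1,1}^n."""
    xs = list(product((-1, 1), repeat=n))
    return sum(f(x) != f(x[:j] + (-x[j],) + x[j + 1:]) for x in xs) / len(xs)

maj3 = lambda x: 1 if sum(x) > 0 else -1
rng = random.Random(0)
j, trials, diff = 0, 200000, 0
for _ in range(trials):
    w = tuple(rng.choice((-1, 1)) for _ in range(3))
    w2 = w[:j] + (rng.choice((-1, 1)),) + w[j + 1:]   # rerandomize bit j
    diff += maj3(w) != maj3(w2)
est = diff / trials
print(2 * est, influence(maj3, 3, j))   # both ~ 0.5
```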

22
[Diagram repeated from slide 18, now with the hybrids Y_0 = X, Y_1, Y_2, …, Y_T = Z drawn in: conditioned on J_t = j, the strings Y_{t-1} and Y_t differ only (possibly) in coordinate j.]
23
Proof
  • Σ_t Σ_j Pr[J_t = j] ·
    2·Pr[f(Y_{t-1}) ≠ f(Y_t) | J_t = j]
  • = Σ_t Σ_j Pr[J_t = j] · 2·Pr[f(W) ≠ f(W′)]
  • = Σ_t Σ_j Pr[J_t = j] · I_j(f)
  • = Σ_j I_j · Σ_t Pr[J_t = j]
  • = Σ_j I_j · d_j.

24
Part II: Lower bounds for monotone graph properties
25
Monotone graph properties
  • Consider graphs on v vertices; let n = (v choose 2).
  • Nontrivial monotone graph property:
  • nontrivial property: a (nonempty, nonfull)
    subset of all v-vertex graphs
  • graph property: closed under permutations of
    the vertices (⇒ no edge is distinguished)
  • monotone: adding edges can only put you into the
    property, not take you out
  • e.g. Contains-A-Triangle, Connected,
    Has-Hamiltonian-Path, Non-Planar,
    Has-at-least-n/2-edges, …
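As a sanity check, monotonicity of Contains-A-Triangle can be verified exhaustively for v = 4 (an illustrative sketch; the edge-slot encoding is my own):

```python
from itertools import combinations, product

v = 4
slots = list(combinations(range(v), 2))      # n = C(4,2) = 6 potential edges

def has_triangle(g):
    """g is a 0/1 vector over the edge slots of a v-vertex graph."""
    es = {e for e, bit in zip(slots, g) if bit}
    return any({(a, b), (a, c), (b, c)} <= es
               for a, b, c in combinations(range(v), 3))

# monotone: adding any single edge to a graph with the property keeps it
monotone = all(has_triangle(g[:i] + (1,) + g[i + 1:])
               for g in product((0, 1), repeat=len(slots)) if has_triangle(g)
               for i in range(len(slots)))
print(monotone)  # True
```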

26
Aanderaa-Karp-Rosenberg conj.
  • Every nontrivial monotone graph property has
    D(f) = n.
  • Rivest-Vuillemin 75: ≥ v²/16.
  • Kleitman-Kwiatkowski 80: ≥ v²/9.
  • Kahn-Saks-Sturtevant 84: ≥ n/2; = n if v is a
    prime power.
  • Topology + group theory!
  • Yao 88: = n in the bipartite case.

27
Randomized DTs
  • Have coin-flip nodes in the trees that cost
    nothing.
  • Or: a probability distribution over deterministic
    DTs.
  • Note: we want both 0-sided error and worst-case
    input.
  • R(f) = min, over randomized DTs that compute f
    with 0 error, of the max over inputs x of the
    expected # of queries.
  • The expectation is only over the DT's internal
    coins.

28
  • D(Maj3) = 3.
  • Pick two of the three variables at random, and
    check if they're the same. If not, check the 3rd.
  • ⇒ R(Maj3) ≤ 8/3.
  • Let f = recursive-Maj3 = Maj3(Maj3, Maj3, Maj3),
    etc.
  • For the depth-h version (n = 3^h):
  • D(f) = 3^h.
  • R(f) ≤ (8/3)^h.
  • (Not best possible!)
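The 8/3 arises on inputs like (1, 1, -1): the agreeing pair is sampled with probability 1/3, so the expected cost is 2·(1/3) + 3·(2/3) = 8/3. A simulation of this zero-error algorithm (my own harness, not from the talk) confirms it:

```python
import random
from itertools import product

def randomized_maj3(x, rng):
    """Query two random variables; query the third only if they disagree.
    Zero-error: returns (majority value, number of queries)."""
    i, j = rng.sample(range(3), 2)
    if x[i] == x[j]:
        return x[i], 2
    k = 3 - i - j                 # the remaining index
    return x[k], 3

rng = random.Random(0)
worst = 0.0                       # worst-case expected query count
for x in product((-1, 1), repeat=3):
    runs = [randomized_maj3(x, rng) for _ in range(20000)]
    assert all(ans == (1 if sum(x) > 0 else -1) for ans, _ in runs)
    worst = max(worst, sum(q for _, q in runs) / len(runs))
print(worst)   # close to 8/3 on inputs like (1, 1, -1)
```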

29
Randomized AKR / Yao conj.
  • Yao conjectured in '77 that every nontrivial
    monotone graph property f has R(f) = Ω(v²).
  • Lower bound Ω(·) / who:
  • v: Yao 77
  • v·log^{1/12} v: Yao 87
  • v^{5/4}: King 88
  • v^{4/3}: Hajnal 91
  • v^{4/3}·log^{1/3} v: Chakrabarti-Khot 01
  • min{v/p, v²/log v}: Friedgut-Kahn-Wigderson 02
  • v^{4/3} / p^{1/3}: us

30
Outline
  • Extend the main inequality to the p-biased case.
    (Then the LHS is 1.)
  • Use Yao's minimax principle: show that under the
    p-biased distribution on {-1,1}^n, d = Σ d_j =
    avg # of queries is large for any tree.
  • Main inequality: max influence is small ⇒ d is
    large.
  • Graph property ⇒ all variables have the same
    influence.
  • Hence: sum of influences is small ⇒ d is large.
  • [OS04]: f monotone ⇒ sum of influences ≤ √d.
  • Hence: sum of influences is large ⇒ d is large.
  • So either way, d is large.

31
Generalizing the inequality
  • Var[f] ≤ Σ_{j=1}^{n} d_j(f) · I_j(f).
  • Generalizations (which basically require no proof
    change):
  • holds for randomized DTs
  • holds for randomized subcube partitions
  • holds for functions on any product probability
    space, f : Ω_1 × … × Ω_n → {-1,1} (with the
    notion of influence suitably generalized)
  • holds for real-valued functions, with a
    (necessary) loss of a factor of at most √d
32
Closing thought
  • It's funny that our bound gets stuck at roughly
    the same level as Hajnal / Chakrabarti-Khot:
    n^{2/3} = v^{4/3}.
  • Note that n^{2/3}, I believe, cannot be improved
    by more than a log factor merely for monotone
    transitive functions, due to [BSW04].
  • Thus to get better than v^{4/3} for monotone graph
    properties, you must use the fact that it's a
    graph property.
  • Chakrabarti-Khot does definitely use the fact
    that it's a graph property (all sorts of
    graph-packing lemmas).
  • Or do they? Since they get stuck at essentially
    v^{4/3}, I wonder if there's any chance their
    result doesn't truly need the fact that it's a
    graph property…