Transcript and Presenter's Notes

Title: Correlation Immune Functions and Learning


1
Correlation Immune Functions and Learning
  • Lisa Hellerstein
  • Polytechnic Institute of NYU
  • Brooklyn, NY
  • Includes joint work with Bernard Rosell (AT&T),
    Eric Bach and David Page (U. of Wisconsin), and
    Soumya Ray (Case Western)

2
Identifying relevant variables from random
examples

x                          f(x)
(1,1,0,0,0,1,1,0,1,0)      1
(0,1,0,0,1,0,1,1,0,1)      1
(1,0,0,1,0,1,0,0,1,0)      0
3
Technicalities
  • Assume random examples drawn from uniform
    distribution over {0,1}^n
  • Have access to source of random examples

4
Detecting that a variable is relevant
  • Look for dependence between input variables and
    the output
  • If xi is irrelevant: P(f=1 | xi=1) = P(f=1 | xi=0)
  • If xi is relevant: P(f=1 | xi=1) ≠ P(f=1 | xi=0)
    for the previous function f
    (a small sketch of this test follows below)
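Below is a minimal Python sketch of this test (illustrative only, with
made-up helper names): estimate P(f=1 | xi=1) and P(f=1 | xi=0) from
uniform random examples and flag variables whose estimates differ
noticeably.

    import random

    def draw_example(f, n):
        # one uniform random example (x, f(x))
        x = tuple(random.randint(0, 1) for _ in range(n))
        return x, f(x)

    def estimate_gap(f, n, i, m=20000):
        # estimate |P(f=1 | xi=1) - P(f=1 | xi=0)| from m random examples
        ones = [0, 0]   # counts of f=1, indexed by the observed value of xi
        tot = [0, 0]    # counts of examples, indexed by the observed value of xi
        for _ in range(m):
            x, y = draw_example(f, n)
            tot[x[i]] += 1
            ones[x[i]] += y
        return abs(ones[1] / tot[1] - ones[0] / tot[0])

    # Example: f = x0 OR x3 on 10 variables.  The estimated gap is about 0.5
    # for the two relevant variables and close to 0 for the irrelevant ones.
    f = lambda x: x[0] | x[3]
    print([round(estimate_gap(f, 10, i), 2) for i in range(10)])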

5
Unfortunately
xi relevant:    P(f=1 | xi=1) = 1/2 = P(f=1 | xi=0)
xi irrelevant:  P(f=1 | xi=1) = 1/2 = P(f=1 | xi=0)
Finding a relevant variable is easy for some
functions. Not so easy for others.
6
How to find the relevant variables
  • Suppose you know r (the number of relevant vars)
  • Assume r << n
  • (Think of r ≈ log n)
  • Get m random examples, where
  • m = poly(2^r, log n, 1/δ)
  • With probability > 1-δ, have enough info to
    determine which r variables are relevant
  • All other sets of r variables can be ruled out

7
 x1 x2 x3 x4 x5 x6 x7 x8 x9 x10    f
(1, 1, 0, 1, 1, 0, 1, 0, 1, 0)     1
(0, 1, 1, 1, 1, 0, 1, 1, 0, 0)     0
(1, 1, 1, 0, 0, 0, 0, 0, 0, 0)     1
(0, 0, 0, 1, 1, 0, 0, 0, 0, 0)     0
(1, 1, 1, 0, 0, 0, 1, 1, 1, 1)     0
8
 x1 x2 x3 x4 x5 x6 x7 x8 x9 x10    f
(1, 1, 0, 1, 1, 0, 1, 0, 1, 0)     1
(0, 1, 1, 1, 1, 0, 1, 1, 0, 0)     0
(1, 1, 1, 0, 0, 0, 0, 0, 0, 0)     1
(0, 0, 0, 1, 1, 0, 0, 0, 0, 0)     0
(1, 1, 1, 0, 0, 0, 1, 1, 0, 1)     0
9
 x1 x2 x3 x4 x5 x6 x7 x8 x9 x10    f
(1, 1, 0, 1, 1, 0, 1, 0, 1, 0)     1
(0, 1, 1, 1, 1, 0, 1, 1, 0, 0)     0
(1, 1, 1, 0, 0, 0, 0, 0, 0, 0)     1
(0, 0, 0, 1, 1, 0, 0, 0, 0, 0)     0
(1, 1, 1, 0, 0, 0, 1, 1, 0, 1)     0
x3, x5, x9 can't be the relevant variables
10
 x1 x2 x3 x4 x5 x6 x7 x8 x9 x10    f
(1, 1, 0, 1, 1, 0, 1, 0, 1, 0)     1
(0, 1, 1, 1, 1, 0, 1, 1, 0, 0)     0
(1, 1, 1, 0, 0, 0, 0, 0, 0, 0)     1
(0, 0, 0, 1, 1, 0, 0, 0, 0, 0)     0
(1, 1, 1, 0, 0, 0, 1, 1, 1, 1)     0
x1, x3, x10 are OK
11
  • Naïve algorithm: try all combinations of r
    variables. Time ≈ n^r (sketched below)
  • Mossel, O'Donnell, Servedio STOC 2003
  • Algorithm that takes time n^(cr), where c ≈ 0.704
  • Subroutine: find a single relevant variable
  • Still open: can this bound be improved?
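A rough Python sketch of the naïve search (illustration only; this is not
the Mossel-O'Donnell-Servedio algorithm): a set S of r variables is ruled
out as soon as two examples agree on S but have different labels, exactly
as on the previous slides.

    from itertools import combinations

    def consistent(examples, S):
        # False iff two examples agree on the coordinates in S but disagree on the label
        seen = {}
        for x, y in examples:
            key = tuple(x[i] for i in S)
            if key in seen and seen[key] != y:
                return False
            seen[key] = y
        return True

    def candidate_relevant_sets(examples, n, r):
        # all r-subsets of variables not ruled out by the sample -- about n^r subsets to try
        return [S for S in combinations(range(n), r) if consistent(examples, S)]

With m = poly(2^r, log n, 1/δ) examples (slide 6), with high probability
only the set of truly relevant variables survives.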

12
  • If the output of f is dependent on xi, can detect the
    dependence (whp) in time poly(n, 2^r) and identify
    xi as relevant.
  • Problematic functions:
  • Every variable is independent of the output of f
  • P[f=1 | xi=0] = P[f=1 | xi=1] for all xi
  • Equivalently, all degree-1 Fourier coefficients are 0
  • Functions with this property are said to be
  • CORRELATION-IMMUNE

13
  • P[f=1 | xi=0] = P[f=1 | xi=1] for all xi
  • Geometrically (e.g. n = 2):
[Figure: the Boolean square with corners 00, 01, 10, 11]
14
  • P[f=1 | xi=0] = P[f=1 | xi=1] for all xi
  • Geometrically:
[Figure: Parity(x1,x2) on the corners 00, 01, 10, 11 of the square]
15
  • P[f=1 | xi=0] = P[f=1 | xi=1] for all xi
  • Geometrically:
[Figure: the square split into the halves x1 = 1 and x1 = 0]
16
  • P[f=1 | xi=0] = P[f=1 | xi=1] for all xi
[Figure: the square split into the halves x2 = 0 and x2 = 1]
17
  • Other correlation-immune functions besides
    parity?
  • f(x1,…,xn) = 1 iff x1 = x2 = … = xn

18
  • Other correlation-immune functions besides
    parity?
  • All reflexive functions

19
  • Other correlation-immune functions besides
    parity?
  • All reflexive functions
  • More

20
Correlation-immune functions and decision tree
learners
  • Decision tree learners in ML
  • Popular machine learning approach (CART, C4.5)
  • Given a set of examples of a Boolean function, build
    a decision tree
  • Heuristics for decision tree learning
  • Greedy, top-down
  • Differ in the way they choose which variable to put in
    a node
  • Pick the variable having the highest gain
  • P[f=1 | xi=1] = P[f=1 | xi=0] means 0 gain
    (see the sketch below)
  • Correlation-immune functions are problematic for
    decision tree learners
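A minimal Python sketch (not CART or C4.5 themselves) of the gain
computation a greedy learner performs at the root: for a
correlation-immune function such as parity, every variable, relevant or
not, gets gain 0, so the greedy choice is uninformed.

    from itertools import product
    from math import log2

    def entropy(examples):
        # binary entropy of the labels in a list of (x, y) pairs
        if not examples:
            return 0.0
        p = sum(y for _, y in examples) / len(examples)
        if p in (0.0, 1.0):
            return 0.0
        return -p * log2(p) - (1 - p) * log2(1 - p)

    def gain(examples, i):
        # information gain of splitting on variable i
        left = [(x, y) for x, y in examples if x[i] == 0]
        right = [(x, y) for x, y in examples if x[i] == 1]
        n = len(examples)
        return (entropy(examples)
                - (len(left) / n) * entropy(left)
                - (len(right) / n) * entropy(right))

    # All 2^n examples of parity(x1,x2) padded with two irrelevant bits:
    # every variable gets gain 0.
    n = 4
    parity12 = lambda x: (x[0] + x[1]) % 2
    examples = [(x, parity12(x)) for x in product([0, 1], repeat=n)]
    print([round(gain(examples, i), 3) for i in range(n)])   # [0.0, 0.0, 0.0, 0.0]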

21
  • Lookahead
  • Skewing: an efficient alternative to lookahead
    for decision tree induction. IJCAI 2003. Page,
    Ray
  • Why skewing works: learning difficult Boolean
    functions with greedy tree learners. ICML 2005.
    Rosell, Hellerstein, Ray, Page

22
Story Part One
23
  • How many difficult functions?
  • More than 2^(2^(n-1)) fns of n variables
24
  • How many different hard functions?
  • More than 2^(2^(n/2)) fns of n variables
  • SOMEONE MUST HAVE STUDIED THESE FUNCTIONS BEFORE
25
(No Transcript)
26
(No Transcript)
27
Story Part Two
28
  • I had lunch with Eric Bach

29
  • Roy, B. K. 2002. A Brief Outline of Research on
    Correlation Immune Functions. In Proceedings of
    the 7th Australian Conference on Information
    Security and Privacy (July 3-5, 2002). L. M.
    Batten and J. Seberry, Eds. Lecture Notes in
    Computer Science, vol. 2384. Springer-Verlag,
    London, 379-394.

30
Correlation-immune functions
  • k-correlation immune function:
  • For every subset S of the input variables s.t.
    1 ≤ |S| ≤ k,
  • P[f=1 | any fixed assignment to S] = P[f=1]
  • Xiao, Massey 1988: equivalently, all Fourier
    coefficients of degree i are 0, for 1 ≤ i ≤ k
    (a brute-force check is sketched below)
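A brute-force Python sketch of the Xiao-Massey characterization
(exponential in n, purely for illustration): f is k-correlation immune
iff every Fourier coefficient of degree between 1 and k is 0.

    from itertools import combinations, product

    def fourier_coefficient(f, n, S):
        # f_hat(S) = E[ (-1)^( f(x) + sum_{i in S} x_i ) ] under the uniform distribution
        total = 0
        for x in product([0, 1], repeat=n):
            total += (-1) ** (f(x) + sum(x[i] for i in S))
        return total / 2 ** n

    def correlation_immunity_order(f, n):
        # largest k such that f is k-correlation immune (0 if none)
        for k in range(1, n + 1):
            for S in combinations(range(n), k):
                if abs(fourier_coefficient(f, n, S)) > 1e-12:
                    return k - 1
        return n

    parity = lambda x: sum(x) % 2
    print(correlation_immunity_order(parity, 4))   # 3: parity of n bits is (n-1)-correlation immune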

31
  • Siegenthaler's Theorem
  • If f is k-correlation immune, then the GF(2)
    polynomial for f has degree at most n-k.

32
  • Siegenthaler's Theorem (1984)
  • If f is k-correlation immune, then the GF(2)
    polynomial for f has degree at most n-k.
  • The algorithm of Mossel, O'Donnell, Servedio (STOC
    2003) is based on this theorem

33
End of Story
34
Non-uniform distributions
  • Correlation-immune functions are defined wrt the
    uniform distribution
  • What if distribution is biased?
  • e.g. each bit 1 with probability ¾

35
f(x1,x2) = parity(x1,x2), each bit 1 with
probability 3/4
P[f=1 | x1=1] ≠ P[f=1 | x1=0]   (worked out below)
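A quick check of the claim, in the notation above: under the 3/4-biased
distribution,
P[f=1 | x1=1] = P[x2=0] = 1/4
P[f=1 | x1=0] = P[x2=1] = 3/4
More generally the two sides are 1-p and p, which coincide only at
p = 1/2.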
36
f(x1,x2) = parity(x1,x2), each bit 1 with probability 1/4
P[f=1 | x1=1] ≠ P[f=1 | x1=0]
For added irrelevant variables, the probabilities would be equal
37
Correlation-immunity wrt p-biased distributions
  • Definitions
  • f is correlation-immune wrt distribution D if
  • P_D[f=1 | xi=1] = P_D[f=1 | xi=0]
  • for all xi
  • p-biased distribution Dp: each bit set to 1
    independently with probability p
  • For all p-biased distributions D,
  • P_D[f=1 | xi=1] = P_D[f=1 | xi=0]
  • for all irrelevant xi
38
  • Lemma: Let f(x1,…,xn) be a Boolean function with
    r relevant variables. Then f is correlation
    immune w.r.t. Dp for at most r-1 values of p.
  • Pf: Correlation immune wrt Dp means
  • P_Dp[f=1 | xi=1] - P_Dp[f=1 | xi=0] = 0   (*)
  • for all xi.
  • Consider fixed f and xi. Can write the lhs of (*)
    as a polynomial h(p).
39
  • e.g. f(x1,x2,x3) = parity(x1,x2,x3), p-biased
    distribution Dp
  • h(p) = P_Dp[f=1 | x1=1] - P_Dp[f=1 | x1=0]
  •      = ( p·p + (1-p)·(1-p) ) - ( p·(1-p) + (1-p)·p ) = (1-2p)^2
  • If we add an irrelevant variable, this polynomial
    doesn't change
  • h(p), for arbitrary f and variable xi, has degree ≤
    r-1, where r is the number of relevant variables.
  • f is correlation-immune wrt at most r-1 values of p,
    unless h(p) is identically 0 for all xi.

40
  • h(p) = P_Dp[f=1 | xi=1] - P_Dp[f=1 | xi=0]
  • P_Dp[f=1 | xi=1] = Σ_d w_d p^d (1-p)^(n-1-d),
    where w_d is the number of inputs x for which
    f(x)=1, xi=1, and x contains exactly d additional
    1s
  • i.e. w_d = number of positive assignments of
    f_{xi←1} of Hamming weight d
  • Similar expression for P_Dp[f=1 | xi=0]
41
  • P_Dp[f=1 | xi=1] - P_Dp[f=1 | xi=0]
    = Σ_d (w_d - r_d) p^d (1-p)^(n-1-d)
  • where w_d = number of positive assignments of
    f_{xi←1} of Hamming weight d
  • r_d = number of positive assignments of
    f_{xi←0} of Hamming weight d
  • Not identically 0 iff w_d ≠ r_d for some d
42
Property of Boolean functions
  • Lemma: If f has at least one relevant variable,
    then for some relevant variable xi and some d,
  • w_d ≠ r_d
  • where
  • w_d = number of positive assignments of f_{xi←1} of
    Hamming weight d
  • r_d = number of positive assignments of f_{xi←0} of
    Hamming weight d
43
  • How much does it help to have access to examples
    from different distributions?

44
  • How much does it help to have access to examples
    from different distributions?
  • Hellerstein, Rosell, Bach, Page, Ray
  • Exploiting Product Distributions to Identify
    Relevant Variables of Correlation Immune
    Functions

45
  • Even if f is not correlation-immune wrt Dp, may
    need a very large sample to detect a relevant
    variable
  • if the value of p is very near a root of h(p)
  • Lemma: If h(p) is not identically 0, then for some
    value of p in the set
  • { 1/(r+1), 2/(r+1), 3/(r+1), …, (r+1)/(r+1) },
  • |h(p)| ≥ 1/(r+1)^(r-1)
46
  • Algorithm to find a relevant variable (sketched below)
  • Uses examples from distributions Dp, for
  • p ∈ { 1/(r+1), 2/(r+1), 3/(r+1), …, (r+1)/(r+1) }
  • Sample size: poly((r+1)^r, log n, log 1/δ)
  • Essentially the same algorithm was found independently
    by Arpe and Mossel, using very different
    techniques
  • Another algorithm to find a relevant variable
  • Based on proving (roughly) that if we choose a random
    p, then h^2(p) is likely to be reasonably large.
    Uses the prime number theorem.
  • Uses examples from poly(2^r, log 1/δ)
    distributions Dp
  • Sample size: poly(2^r, log n, log 1/δ)
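A rough Python sketch of the first algorithm's outer loop (the helper
names and the fixed threshold are placeholders for illustration; the
actual sample sizes and confidence analysis are as stated above): for
each p = i/(r+1), estimate the gap P(f=1 | xj=1) - P(f=1 | xj=0) for
every variable under Dp and report a variable whose gap is clearly
nonzero.

    import random

    def biased_example(f, n, p):
        # one example (x, f(x)) with each bit 1 independently with probability p
        x = tuple(1 if random.random() < p else 0 for _ in range(n))
        return x, f(x)

    def find_relevant_variable(f, n, r, m=50000, threshold=0.05):
        for i in range(1, r + 2):                 # p = 1/(r+1), ..., (r+1)/(r+1)
            p = i / (r + 1)
            sample = [biased_example(f, n, p) for _ in range(m)]
            for j in range(n):
                ones, tot = [0, 0], [0, 0]
                for x, y in sample:
                    tot[x[j]] += 1
                    ones[x[j]] += y
                if min(tot) > 0 and abs(ones[1] / tot[1] - ones[0] / tot[0]) > threshold:
                    return j
        return None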

47
  • Better algorithms?

48
Summary
  • Finding relevant variables (junta-learning)
  • Correlation-immune functions
  • Learning from p-biased distributions

49
Moral of the Story
  • The Handbook of Integer Sequences can be useful in
    doing a literature search
  • Eating lunch with the right person can be much
    more useful