Transcript and Presenter's Notes

Title: Correlation Immune Functions and Learning


1
Correlation Immune Functions and Learning
  • Lisa Hellerstein
  • Polytechnic Institute of NYU
  • Brooklyn, NY
  • Includes joint work with Bernard Rosell (AT&T),
    Eric Bach and David Page (U. of Wisconsin), and
    Soumya Ray (Case Western)

2
Identifying relevant variables from random
examples

x                          f(x)
(1,1,0,0,0,1,1,0,1,0)      1
(0,1,0,0,1,0,1,1,0,1)      1
(1,0,0,1,0,1,0,0,1,0)      0
3
Technicalities
  • Assume random examples drawn from uniform
    distribution over {0,1}^n
  • Have access to source of random examples

4
Detecting that a variable is relevant
  • Look for dependence between input variables and
    the output
  • If xi is irrelevant: P(f=1 | xi=1) = P(f=1 | xi=0)
  • If xi is relevant: P(f=1 | xi=1) ≠ P(f=1 | xi=0)
    for the previous function f
    (a small sketch of this test follows below)
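Below is a minimal Python sketch of this test (illustrative only, with
made-up helper names): estimate P(f=1 | xi=1) and P(f=1 | xi=0) from
uniform random examples and flag variables whose estimates differ
noticeably.

    import random

    def draw_example(f, n):
        # one uniform random example (x, f(x))
        x = tuple(random.randint(0, 1) for _ in range(n))
        return x, f(x)

    def estimate_gap(f, n, i, m=20000):
        # estimate |P(f=1 | xi=1) - P(f=1 | xi=0)| from m random examples
        ones = [0, 0]   # counts of f=1, indexed by the observed value of xi
        tot = [0, 0]    # counts of examples, indexed by the observed value of xi
        for _ in range(m):
            x, y = draw_example(f, n)
            tot[x[i]] += 1
            ones[x[i]] += y
        return abs(ones[1] / tot[1] - ones[0] / tot[0])

    # Example: f = x0 OR x3 on 10 variables.  The estimated gap is about 0.5
    # for the two relevant variables and close to 0 for the irrelevant ones.
    f = lambda x: x[0] | x[3]
    print([round(estimate_gap(f, 10, i), 2) for i in range(10)])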

5
Unfortunately
xi relevant:    P(f=1 | xi=1) = 1/2 = P(f=1 | xi=0)
xi irrelevant:  P(f=1 | xi=1) = 1/2 = P(f=1 | xi=0)
Finding a relevant variable is easy for some
functions. Not so easy for others.
6
How to find the relevant variables
  • Suppose you know r (the number of relevant vars)
  • Assume r << n
  • (Think of r ≈ log n)
  • Get m random examples, where
  • m = poly(2^r, log n, 1/δ)
  • With probability > 1-δ, have enough info to
    determine which r variables are relevant
  • All other sets of r variables can be ruled out

7
 x1 x2 x3 x4 x5 x6 x7 x8 x9 x10    f
(1, 1, 0, 1, 1, 0, 1, 0, 1, 0)     1
(0, 1, 1, 1, 1, 0, 1, 1, 0, 0)     0
(1, 1, 1, 0, 0, 0, 0, 0, 0, 0)     1
(0, 0, 0, 1, 1, 0, 0, 0, 0, 0)     0
(1, 1, 1, 0, 0, 0, 1, 1, 1, 1)     0
8
 x1 x2 x3 x4 x5 x6 x7 x8 x9 x10    f
(1, 1, 0, 1, 1, 0, 1, 0, 1, 0)     1
(0, 1, 1, 1, 1, 0, 1, 1, 0, 0)     0
(1, 1, 1, 0, 0, 0, 0, 0, 0, 0)     1
(0, 0, 0, 1, 1, 0, 0, 0, 0, 0)     0
(1, 1, 1, 0, 0, 0, 1, 1, 0, 1)     0
9
 x1 x2 x3 x4 x5 x6 x7 x8 x9 x10    f
(1, 1, 0, 1, 1, 0, 1, 0, 1, 0)     1
(0, 1, 1, 1, 1, 0, 1, 1, 0, 0)     0
(1, 1, 1, 0, 0, 0, 0, 0, 0, 0)     1
(0, 0, 0, 1, 1, 0, 0, 0, 0, 0)     0
(1, 1, 1, 0, 0, 0, 1, 1, 0, 1)     0
x3, x5, x9 can't be the relevant variables
10
 x1 x2 x3 x4 x5 x6 x7 x8 x9 x10    f
(1, 1, 0, 1, 1, 0, 1, 0, 1, 0)     1
(0, 1, 1, 1, 1, 0, 1, 1, 0, 0)     0
(1, 1, 1, 0, 0, 0, 0, 0, 0, 0)     1
(0, 0, 0, 1, 1, 0, 0, 0, 0, 0)     0
(1, 1, 1, 0, 0, 0, 1, 1, 1, 1)     0
x1, x3, x10 are OK
11
  • Naïve algorithm: try all combinations of r
    variables. Time ≈ n^r (sketched below)
  • Mossel, O'Donnell, Servedio STOC 2003
  • Algorithm that takes time n^(cr), where c ≈ 0.704
  • Subroutine: find a single relevant variable
  • Still open: can this bound be improved?
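A rough Python sketch of the naïve search (illustration only; this is not
the Mossel-O'Donnell-Servedio algorithm): a set S of r variables is ruled
out as soon as two examples agree on S but have different labels, exactly
as on the previous slides.

    from itertools import combinations

    def consistent(examples, S):
        # False iff two examples agree on the coordinates in S but disagree on the label
        seen = {}
        for x, y in examples:
            key = tuple(x[i] for i in S)
            if key in seen and seen[key] != y:
                return False
            seen[key] = y
        return True

    def candidate_relevant_sets(examples, n, r):
        # all r-subsets of variables not ruled out by the sample -- about n^r subsets to try
        return [S for S in combinations(range(n), r) if consistent(examples, S)]

With m = poly(2^r, log n, 1/δ) examples (slide 6), with high probability
only the set of truly relevant variables survives.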

12
  • If the output of f is dependent on xi, can detect the
    dependence (whp) in time poly(n, 2^r) and identify
    xi as relevant.
  • Problematic functions:
  • Every variable is independent of the output of f
  • P[f=1 | xi=0] = P[f=1 | xi=1] for all xi
  • Equivalently, all degree-1 Fourier coefficients are 0
  • Functions with this property are said to be
  • CORRELATION-IMMUNE

13
  • P[f=1 | xi=0] = P[f=1 | xi=1] for all xi
  • Geometrically (e.g. n = 2):
[Figure: the Boolean square with corners 00, 01, 10, 11]
14
  • P[f=1 | xi=0] = P[f=1 | xi=1] for all xi
  • Geometrically:
[Figure: Parity(x1,x2) on the corners 00, 01, 10, 11 of the square]
15
  • P[f=1 | xi=0] = P[f=1 | xi=1] for all xi
  • Geometrically:
[Figure: the square split into the halves x1 = 1 and x1 = 0]
16
  • P[f=1 | xi=0] = P[f=1 | xi=1] for all xi
[Figure: the square split into the halves x2 = 0 and x2 = 1]
17
  • Other correlation-immune functions besides
    parity?
  • f(x1,…,xn) = 1 iff x1 = x2 = … = xn

18
  • Other correlation-immune functions besides
    parity?
  • All reflexive functions

19
  • Other correlation-immune functions besides
    parity?
  • All reflexive functions
  • More

20
Correlation-immune functions and decision tree
learners
  • Decision tree learners in ML
  • Popular machine learning approach (CART, C4.5)
  • Given a set of examples of a Boolean function, build
    a decision tree
  • Heuristics for decision tree learning
  • Greedy, top-down
  • Differ in the way they choose which variable to put in
    a node
  • Pick the variable having the highest gain
  • P[f=1 | xi=1] = P[f=1 | xi=0] means 0 gain
    (see the sketch below)
  • Correlation-immune functions are problematic for
    decision tree learners
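A minimal Python sketch (not CART or C4.5 themselves) of the gain
computation a greedy learner performs at the root: for a
correlation-immune function such as parity, every variable, relevant or
not, gets gain 0, so the greedy choice is uninformed.

    from itertools import product
    from math import log2

    def entropy(examples):
        # binary entropy of the labels in a list of (x, y) pairs
        if not examples:
            return 0.0
        p = sum(y for _, y in examples) / len(examples)
        if p in (0.0, 1.0):
            return 0.0
        return -p * log2(p) - (1 - p) * log2(1 - p)

    def gain(examples, i):
        # information gain of splitting on variable i
        left = [(x, y) for x, y in examples if x[i] == 0]
        right = [(x, y) for x, y in examples if x[i] == 1]
        n = len(examples)
        return (entropy(examples)
                - (len(left) / n) * entropy(left)
                - (len(right) / n) * entropy(right))

    # All 2^n examples of parity(x1,x2) padded with two irrelevant bits:
    # every variable gets gain 0.
    n = 4
    parity12 = lambda x: (x[0] + x[1]) % 2
    examples = [(x, parity12(x)) for x in product([0, 1], repeat=n)]
    print([round(gain(examples, i), 3) for i in range(n)])   # [0.0, 0.0, 0.0, 0.0]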

21
  • Lookahead
  • Skewing: an efficient alternative to lookahead
    for decision tree induction. IJCAI 2003. Page,
    Ray
  • Why skewing works: learning difficult Boolean
    functions with greedy tree learners. ICML 2005.
    Rosell, Hellerstein, Ray, Page

22
Story Part One
23
  • How many difficult functions?
  • More than 2^(2^(n-1)) fns of n variables
24
  • How many different hard functions?
  • More than 2^(2^(n/2)) fns of n variables
  • SOMEONE MUST HAVE STUDIED THESE FUNCTIONS BEFORE
25
(No Transcript)
26
(No Transcript)
27
Story Part Two
28
  • I had lunch with Eric Bach

29
  • Roy, B. K. 2002. A Brief Outline of Research on
    Correlation Immune Functions. In Proceedings of
    the 7th Australian Conference on Information
    Security and Privacy (July 3-5, 2002). L. M.
    Batten and J. Seberry, Eds. Lecture Notes in
    Computer Science, vol. 2384. Springer-Verlag,
    London, 379-394.

30
Correlation-immune functions
  • k-correlation immune function:
  • For every subset S of the input variables s.t.
    1 ≤ |S| ≤ k,
  • P[f=1 | any fixed assignment to S] = P[f=1]
  • Xiao, Massey 1988: equivalently, all Fourier
    coefficients of degree i are 0, for 1 ≤ i ≤ k
    (a brute-force check is sketched below)
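A brute-force Python sketch of the Xiao-Massey characterization
(exponential in n, purely for illustration): f is k-correlation immune
iff every Fourier coefficient of degree between 1 and k is 0.

    from itertools import combinations, product

    def fourier_coefficient(f, n, S):
        # f_hat(S) = E[ (-1)^( f(x) + sum_{i in S} x_i ) ] under the uniform distribution
        total = 0
        for x in product([0, 1], repeat=n):
            total += (-1) ** (f(x) + sum(x[i] for i in S))
        return total / 2 ** n

    def correlation_immunity_order(f, n):
        # largest k such that f is k-correlation immune (0 if none)
        for k in range(1, n + 1):
            for S in combinations(range(n), k):
                if abs(fourier_coefficient(f, n, S)) > 1e-12:
                    return k - 1
        return n

    parity = lambda x: sum(x) % 2
    print(correlation_immunity_order(parity, 4))   # 3: parity of n bits is (n-1)-correlation immune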

31
  • Siegenthaler's Theorem
  • If f is k-correlation immune, then the GF(2)
    polynomial for f has degree at most n-k.

32
  • Siegenthaler's Theorem (1984)
  • If f is k-correlation immune, then the GF(2)
    polynomial for f has degree at most n-k.
  • The algorithm of Mossel, O'Donnell, Servedio (STOC
    2003) is based on this theorem

33
End of Story
34
Non-uniform distributions
  • Correlation-immune functions are defined wrt the
    uniform distribution
  • What if distribution is biased?
  • e.g. each bit 1 with probability ¾

35
f(x1,x2) = parity(x1,x2), each bit 1 with
probability 3/4
P[f=1 | x1=1] ≠ P[f=1 | x1=0]   (worked out below)
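A quick check of the claim, in the notation above: under the 3/4-biased
distribution,
P[f=1 | x1=1] = P[x2=0] = 1/4
P[f=1 | x1=0] = P[x2=1] = 3/4
More generally the two sides are 1-p and p, which coincide only at
p = 1/2.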
36
f(x1,x2) = parity(x1,x2), each bit 1 with probability 1/4
P[f=1 | x1=1] ≠ P[f=1 | x1=0]
For added irrelevant variables, the probabilities would be equal
37
Correlation-immunity wrt p-biased distributions
  • Definitions
  • f is correlation-immune wrt distribution D if
  • P_D[f=1 | xi=1] = P_D[f=1 | xi=0]
  • for all xi
  • p-biased distribution Dp: each bit set to 1
    independently with probability p
  • For all p-biased distributions D,
  • P_D[f=1 | xi=1] = P_D[f=1 | xi=0]
  • for all irrelevant xi
38
  • Lemma: Let f(x1,…,xn) be a Boolean function with
    r relevant variables. Then f is correlation
    immune w.r.t. Dp for at most r-1 values of p.
  • Pf: Correlation immune wrt Dp means
  • P_Dp[f=1 | xi=1] - P_Dp[f=1 | xi=0] = 0   (*)
  • for all xi.
  • Consider fixed f and xi. Can write the lhs of (*)
    as a polynomial h(p).
39
  • e.g. f(x1,x2,x3) = parity(x1,x2,x3), p-biased
    distribution Dp
  • h(p) = P_Dp[f=1 | x1=1] - P_Dp[f=1 | x1=0]
  •      = ( p·p + (1-p)·(1-p) ) - ( p·(1-p) + (1-p)·p ) = (1-2p)^2
  • If we add an irrelevant variable, this polynomial
    doesn't change
  • h(p), for arbitrary f and variable xi, has degree ≤
    r-1, where r is the number of relevant variables.
  • f is correlation-immune wrt at most r-1 values of p,
    unless h(p) is identically 0 for all xi.

40
  • h(p) = P_Dp[f=1 | xi=1] - P_Dp[f=1 | xi=0]
  • P_Dp[f=1 | xi=1] = Σ_d w_d p^d (1-p)^(n-1-d),
    where w_d is the number of inputs x for which
    f(x)=1, xi=1, and x contains exactly d additional
    1s
  • i.e. w_d = number of positive assignments of
    f_{xi←1} of Hamming weight d
  • Similar expression for P_Dp[f=1 | xi=0]
41
  • P_Dp[f=1 | xi=1] - P_Dp[f=1 | xi=0]
    = Σ_d (w_d - r_d) p^d (1-p)^(n-1-d)
  • where w_d = number of positive assignments of
    f_{xi←1} of Hamming weight d
  • r_d = number of positive assignments of
    f_{xi←0} of Hamming weight d
  • Not identically 0 iff w_d ≠ r_d for some d
42
Property of Boolean functions
  • Lemma: If f has at least one relevant variable,
    then for some relevant variable xi and some d,
  • w_d ≠ r_d
  • where
  • w_d = number of positive assignments of f_{xi←1} of
    Hamming weight d
  • r_d = number of positive assignments of f_{xi←0} of
    Hamming weight d
43
  • How much does it help to have access to examples
    from different distributions?

44
  • How much does it help to have access to examples
    from different distributions?
  • Hellerstein, Rosell, Bach, Page, Ray
  • Exploiting Product Distributions to Identify
    Relevant Variables of Correlation Immune
    Functions

45
  • Even if f is not correlation-immune wrt Dp, may
    need a very large sample to detect a relevant
    variable
  • if the value of p is very near a root of h(p)
  • Lemma: If h(p) is not identically 0, then for some
    value of p in the set
  • { 1/(r+1), 2/(r+1), 3/(r+1), …, (r+1)/(r+1) },
  • |h(p)| ≥ 1/(r+1)^(r-1)
46
  • Algorithm to find a relevant variable (sketched below)
  • Uses examples from distributions Dp, for
  • p ∈ { 1/(r+1), 2/(r+1), 3/(r+1), …, (r+1)/(r+1) }
  • Sample size: poly((r+1)^r, log n, log 1/δ)
  • Essentially the same algorithm was found independently
    by Arpe and Mossel, using very different
    techniques
  • Another algorithm to find a relevant variable
  • Based on proving (roughly) that if we choose a random
    p, then h^2(p) is likely to be reasonably large.
    Uses the prime number theorem.
  • Uses examples from poly(2^r, log 1/δ)
    distributions Dp
  • Sample size: poly(2^r, log n, log 1/δ)
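A rough Python sketch of the first algorithm's outer loop (the helper
names and the fixed threshold are placeholders for illustration; the
actual sample sizes and confidence analysis are as stated above): for
each p = i/(r+1), estimate the gap P(f=1 | xj=1) - P(f=1 | xj=0) for
every variable under Dp and report a variable whose gap is clearly
nonzero.

    import random

    def biased_example(f, n, p):
        # one example (x, f(x)) with each bit 1 independently with probability p
        x = tuple(1 if random.random() < p else 0 for _ in range(n))
        return x, f(x)

    def find_relevant_variable(f, n, r, m=50000, threshold=0.05):
        for i in range(1, r + 2):                 # p = 1/(r+1), ..., (r+1)/(r+1)
            p = i / (r + 1)
            sample = [biased_example(f, n, p) for _ in range(m)]
            for j in range(n):
                ones, tot = [0, 0], [0, 0]
                for x, y in sample:
                    tot[x[j]] += 1
                    ones[x[j]] += y
                if min(tot) > 0 and abs(ones[1] / tot[1] - ones[0] / tot[0]) > threshold:
                    return j
        return None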

47
  • Better algorithms?

48
Summary
  • Finding relevant variables (junta-learning)
  • Correlation-immune functions
  • Learning from p-biased distributions

49
Moral of the Story
  • The Handbook of Integer Sequences can be useful in
    doing a literature search
  • Eating lunch with the right person can be much
    more useful