Title: Fourier Analysis and Boolean Function Learning
1. Fourier Analysis and Boolean Function Learning
- Jeff Jackson
- Duquesne University
- www.mathcs.duq.edu/jackson
2. Themes
- Fourier analysis is central to learning-theoretic results in a wide variety of models
- Results are generally the strongest known for learning Boolean function classes with respect to the uniform distribution
- Work on learning problems has led to some new harmonic results:
  - Spectral properties of Boolean function classes
  - Algorithms for approximating Boolean functions
3Uniform Learning Model
Boolean Function Class F (e.g., DNF)
Hypothesis h0,1n ? 0,1 s.t. PrxU f(x) ?
h(x) lt e
Target functionf 0,1n ? 0,1
Uniform Random Exampleslt x, f(x) gt
Example OracleEX(f)
Learning AlgorithmA
Accuracy e gt 0
4. Circuit Classes
- Constant-depth AND/OR circuits (AC0 without the polynomial-size restriction; call this CDC)
- DNF: depth-2 circuit with OR at root
[Figure: a depth-d circuit of alternating ∨ and ∧ levels over inputs v1, v2, ..., vn; negations allowed]
5. Decision Trees
[Figure: a decision tree; the root queries v3, internal nodes query v2, v1, and v4, and each leaf is labeled 0 or 1]
6. Decision Trees
[Figure: evaluating x = 11001; x3 = 0, so the evaluation follows the 0-branch out of the root v3]
7. Decision Trees
[Figure: x1 = 1, so the evaluation follows the 1-branch out of v1]
8. Decision Trees
[Figure: x = 11001 reaches a leaf labeled 1, so f(x) = 1]
9. Function Size
- Each function representation has a natural size measure
  - CDC, DNF: # of gates
  - DT: # of leaves
- Size s_F(f) of f with respect to class F is the size of the smallest representation of f within F
- For all Boolean f, s_CDC(f) ≤ s_DNF(f) ≤ s_DT(f)
10. Efficient Uniform Learning Model
- Boolean function class F (e.g., DNF)
- Target function f: {0,1}^n → {0,1}
- Example oracle EX(f): supplies uniform random examples ⟨x, f(x)⟩
- Learning algorithm A, given accuracy ε > 0
- Output: hypothesis h: {0,1}^n → {0,1} s.t. Pr_{x∼U}[f(x) ≠ h(x)] < ε
- Time: poly(n, s_F, 1/ε)
11. Harmonic-Based Uniform Learning
- [LMN]: constant-depth circuits are quasi-efficiently (n^polylog(s/ε)-time) uniformly learnable
- [BT]: monotone Boolean functions are uniformly learnable in time roughly 2^(√n·log n)
  - Monotone: for all x and i, f(x with x_i←0) ≤ f(x with x_i←1)
  - Also exponential in 1/ε (so assumes ε constant)
  - But independent of any size measure
12. Notation
- Assume f: {0,1}^n → {-1,1}
- For all a ∈ {0,1}^n, χ_a(x) ≡ (-1)^(a·x)
- For all a ∈ {0,1}^n, the Fourier coefficient f̂(a) of f at a is f̂(a) = E_{x∼U}[f(x)·χ_a(x)]
- Sometimes write, e.g., f̂(1) for f̂(100)
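To make the notation concrete, here is a minimal Python sketch (function names are illustrative): chi evaluates χ_a, and fourier_estimate approximates f̂(a) = E_{x∼U}[f(x)·χ_a(x)] from uniform random examples.

```python
import random

def chi(a, x):
    # chi_a(x) = (-1)^(a . x) for a, x in {0,1}^n
    return (-1) ** sum(ai & xi for ai, xi in zip(a, x))

def fourier_estimate(f, a, n, m=10000):
    # Empirical estimate of f_hat(a) = E_{x~U}[f(x) * chi_a(x)]
    # from m uniform random examples; f maps {0,1}^n -> {-1,1}.
    total = 0
    for _ in range(m):
        x = [random.randint(0, 1) for _ in range(n)]
        total += f(x) * chi(a, x)
    return total / m

# Sanity check: f = chi_(110) has f_hat(110) = 1, all other coefficients 0.
f = lambda x: chi([1, 1, 0], x)
print(fourier_estimate(f, [1, 1, 0], 3))  # ~ 1.0
print(fourier_estimate(f, [0, 1, 0], 3))  # ~ 0.0
```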
13. Fourier Properties of Classes
- [LMN]: if f is a constant-depth circuit of depth d and S = {a : |a| < log^d(s/ε)} (|a| = # of 1s in a), then Σ_{a∉S} f̂²(a) < ε
- [BT]: if f is a monotone Boolean function and S = {a : |a| < √n/ε}, then Σ_{a∉S} f̂²(a) < ε
14. Spectral Properties
15. Proof Techniques
- [LMN]: Håstad's Switching Lemma + harmonic analysis
- [BT]: based on [KKL]
  - Define AS(f) ≡ n · Pr_{x,i}[f(x with x_i←0) ≠ f(x with x_i←1)] (average sensitivity)
  - If S = {a : |a| < AS(f)/ε}, then Σ_{a∉S} f̂²(a) < ε
  - For monotone f, harmonic analysis + Cauchy-Schwarz shows AS(f) ≤ √n
  - Note: this is tight for MAJ
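A small Monte Carlo estimator for AS(f), as a concrete companion to the definition above (names and sample size illustrative):

```python
import random

def avg_sensitivity_estimate(f, n, m=20000):
    # AS(f) = n * Pr_{x,i}[f(x with x_i=0) != f(x with x_i=1)],
    # estimated over m random (x, i) pairs; f maps {0,1}^n -> {-1,1}.
    flips = 0
    for _ in range(m):
        x = [random.randint(0, 1) for _ in range(n)]
        i = random.randrange(n)
        x0, x1 = x[:], x[:]
        x0[i], x1[i] = 0, 1
        if f(x0) != f(x1):
            flips += 1
    return n * flips / m

# MAJ on 11 bits: AS(MAJ) = Theta(sqrt(n)), the tight case noted above.
maj = lambda x: 1 if sum(x) > len(x) // 2 else -1
print(avg_sensitivity_estimate(maj, 11))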
16. Function Approximation
- For all Boolean f: f = Σ_a f̂(a)·χ_a (the Fourier expansion)
- For S ⊆ {0,1}^n, define f_S ≡ Σ_{a∈S} f̂(a)·χ_a
- [LMN]: Pr_x[f(x) ≠ sign(f_S(x))] ≤ E[(f − f_S)²] = Σ_{a∉S} f̂²(a)
17. The Fourier Learning Algorithm
- Given ε (and perhaps s, d, ...)
- Determine k such that for S = {a : |a| < k}, Σ_{a∉S} f̂²(a) < ε
- Draw a sufficiently large sample of examples ⟨x, f(x)⟩ to closely estimate f̂(a) for all a ∈ S
  - Chernoff bounds: sample size ≈ n^k/ε suffices
- Output h = sign(Σ_{a∈S} f̂(a)·χ_a)
- Run time ≈ n^(2k)/ε (a code sketch follows)
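A sketch of this low-degree algorithm in Python, indexing coefficients by subsets of {0, ..., n-1}; names and sample size are illustrative, and the loop over all |a| ≤ k is what drives the n^(2k)/ε cost:

```python
import random
from itertools import combinations

def low_degree_learn(f, n, k, m=20000):
    # LMN-style low-degree algorithm: estimate f_hat(a) for every index
    # set a with |a| <= k from one batch of uniform examples, then
    # predict with the sign of the truncated Fourier expansion.
    sample = [tuple(random.randint(0, 1) for _ in range(n)) for _ in range(m)]
    labels = [f(x) for x in sample]

    def chi(a, x):  # parity over the coordinates in the index set a
        return (-1) ** sum(x[i] for i in a)

    coeffs = {a: sum(y * chi(a, x) for x, y in zip(sample, labels)) / m
              for d in range(k + 1) for a in combinations(range(n), d)}

    return lambda x: 1 if sum(c * chi(a, x) for a, c in coeffs.items()) >= 0 else -1

# Example: MAJ on 5 bits is well approximated by its low-degree spectrum.
maj = lambda x: 1 if sum(x) > 2 else -1
h = low_degree_learn(maj, 5, 3)
```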
18. Halfspaces
- [KOS]: halfspaces are efficiently uniformly learnable (given ε is constant)
  - Halfspace: ∃ w ∈ R^(n+1) s.t. f(x) = sign(w·(x∘1))
  - If S = {a : |a| < (21/ε)²}, then Σ_{a∉S} f̂²(a) < ε
  - Apply LMN algorithm
- Similar result applies for an arbitrary function applied to a constant number of halfspaces
  - Intersection of halfspaces is a key learning problem
19. Halfspace Techniques
- [O] (cf. [BKS], [BJTa]):
  - Noise sensitivity of f at γ is the probability that corrupting each bit of x independently with probability γ changes f(x)
  - NS_γ(f) = ½(1 − Σ_a (1−2γ)^|a| f̂²(a))
- [KOS]:
  - If S = {a : |a| < 1/γ}, then Σ_{a∉S} f̂²(a) < 3·NS_γ(f)
  - If f is a halfspace, then NS_γ(f) < 9√γ
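A sampling sketch of NS_γ(f); the noisy pair (x, y) is generated directly (names illustrative):

```python
import random

def noise_sensitivity_estimate(f, n, gamma, m=20000):
    # NS_gamma(f) = Pr[f(x) != f(y)], where x is uniform and y flips
    # each bit of x independently with probability gamma.
    disagree = 0
    for _ in range(m):
        x = [random.randint(0, 1) for _ in range(n)]
        y = [xi ^ (random.random() < gamma) for xi in x]
        disagree += f(x) != f(y)
    return disagree / m

# A halfspace: per [KOS], the estimate should fall below 9*sqrt(gamma).
hs = lambda x: 1 if 3 * x[0] + sum(x[1:]) >= len(x) / 2 else -1
print(noise_sensitivity_estimate(hs, 25, 0.01))
```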
20. Monotone DT
- [OS]: monotone functions are efficiently learnable given that
  - ε is constant
  - s_DT(f) is used as the size measure
- Techniques:
  - Harmonic analysis: for monotone f, AS(f) ≤ √(log s_DT(f))
  - [BT]: if S = {a : |a| < AS(f)/ε}, then Σ_{a∉S} f̂²(a) < ε
  - [Friedgut]: ∃ T, |T| ≤ 2^(AS(f)/ε), s.t. Σ_{A⊄T} f̂²(A) < ε
21. Weak Approximators
- [KKL] also show that if f is monotone, there is an i such that −f̂(i) ≥ log²n / n
- Therefore Pr[f(x) = −χ_i(x)] ≥ ½ + log²n / 2n
- In general, an h s.t. Pr[f = h] ≥ ½ + 1/poly(n,s) is called a weak approximator to f
- If A outputs a weak approximator for every f in F, then F is weakly learnable
22. Uniform Learning Model
- Boolean function class F (e.g., DNF)
- Target function f: {0,1}^n → {0,1}
- Example oracle EX(f): supplies uniform random examples ⟨x, f(x)⟩
- Learning algorithm A, given accuracy ε > 0
- Output: hypothesis h: {0,1}^n → {0,1} s.t. Pr_{x∼U}[f(x) ≠ h(x)] < ε
23. Weak Uniform Learning Model
- Boolean function class F (e.g., DNF)
- Target function f: {0,1}^n → {0,1}
- Example oracle EX(f): supplies uniform random examples ⟨x, f(x)⟩
- Learning algorithm A
- Output: hypothesis h: {0,1}^n → {0,1} s.t. Pr_{x∼U}[f(x) ≠ h(x)] < ½ − 1/p(n,s)
24. Efficient Weak Learning Algorithm for Monotone Boolean Functions
- Draw a set of n² examples ⟨x, f(x)⟩
- For i = 1 to n:
  - Estimate f̂(i)
- Output h = −χ_i for the i maximizing the estimate of −f̂(i) (a code sketch follows)
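A direct rendering of this weak learner as a sketch, assuming f is given as a Python function with range {-1,1} (names illustrative):

```python
import random

def weak_learn_monotone(f, n):
    # Estimate the degree-1 coefficients f_hat(i) from ~n^2 uniform
    # examples; for monotone f, some -f_hat(i) >= log^2(n)/n, so the
    # best single-variable hypothesis h = -chi_i weakly approximates f.
    m = n * n
    sample = [tuple(random.randint(0, 1) for _ in range(n)) for _ in range(m)]
    labels = [f(x) for x in sample]
    corr = [sum(y * -((-1) ** x[i]) for x, y in zip(sample, labels)) / m
            for i in range(n)]  # corr[i] estimates -f_hat(i)
    best = max(range(n), key=lambda i: corr[i])
    return lambda x: -((-1) ** x[best])  # h = -chi_best
```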
25. Weak Approximation for MAJ of Constant-Depth Circuits
- Note that adding a single MAJ gate to a CDC destroys the LMN spectral property
- [JKS]: MAJ of CDCs is quasi-efficiently quasi-weakly uniformly learnable
- If f is a MAJ of CDCs of depth d, and the number of gates in f is s, then there is an A ∈ {0,1}^n such that
  - |A| < log^d s ≡ k
  - Pr[f(x) = χ_A(x)] ≥ ½ + 1/(4s·n^k)
26. Weak Learning Algorithm
- Compute k = log^d s
- Draw s·n^k examples ⟨x, f(x)⟩
- Repeat over A with |A| < k:
  - Estimate f̂(A)
- Until an A is found s.t. f̂(A) > 1/(2s·n^k)
- Output h = χ_A
- Run time n^polylog(s) (a code sketch follows)
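A sketch of this search loop, with a generic threshold theta standing in for 1/(2s·n^k) (names illustrative):

```python
import random
from itertools import combinations

def weak_learn_parity(f, n, k, theta, m=20000):
    # Exhaustively estimate f_hat(A) over all index sets |A| < k and
    # return the first parity chi_A whose coefficient clears theta.
    sample = [tuple(random.randint(0, 1) for _ in range(n)) for _ in range(m)]
    labels = [f(x) for x in sample]
    for d in range(k):
        for A in combinations(range(n), d):
            est = sum(y * (-1) ** sum(x[i] for i in A)
                      for x, y in zip(sample, labels)) / m
            if est > theta:
                return lambda x, A=A: (-1) ** sum(x[i] for i in A)
    return None  # no heavy low-degree coefficient found at this sample size
```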
27. Weak Approximator: Proof Techniques
- Discriminator Lemma [HMPST]
  - Implies one of the CDCs is a weak approximator to f
- LMN spectral characterization of CDC
- Harmonic analysis
- Beigel result used to extend weak learning to CDC with polylog MAJ gates
28. Boosting
- In many (not all) cases, uniform weak learning algorithms can be converted to uniform (strong) learning algorithms using a boosting technique ([S], [FS], ...)
- Need to learn weakly with respect to near-uniform distributions
  - For a near-uniform distribution D, find a weak h_j s.t. Pr_{x∼D}[h_j = f] > ½ + 1/poly(n,s)
- Final h is typically a MAJ of weak approximators (see the sketch below)
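A minimal AdaBoost-style sketch in the spirit of [FS]; it reweights a fixed labeled sample (labels in {-1,1}) and outputs a weighted MAJ. The weak_learner(sample, labels, weights) interface is an assumption of this sketch, not from the talk:

```python
import math

def boost(weak_learner, sample, labels, rounds):
    m = len(sample)
    w = [1.0 / m] * m                     # start from the uniform distribution
    hyps = []
    for _ in range(rounds):
        h = weak_learner(sample, labels, w)
        err = sum(wi for wi, x, y in zip(w, sample, labels) if h(x) != y)
        err = min(max(err, 1e-9), 0.5 - 1e-9)
        alpha = 0.5 * math.log((1 - err) / err)
        hyps.append((alpha, h))
        # Reweight: upweight the examples h got wrong, then renormalize.
        w = [wi * math.exp(-alpha * y * h(x))
             for wi, x, y in zip(w, sample, labels)]
        total = sum(w)
        w = [wi / total for wi in w]
    # Final hypothesis: weighted MAJ of the weak approximators.
    return lambda x: 1 if sum(a * h(x) for a, h in hyps) >= 0 else -1
```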
29. Strong Learning for MAJ of Constant-Depth Circuits
- [JKS]: MAJ of CDC is quasi-efficiently uniformly learnable
  - Show that for near-uniform distributions, some parity function is a weak approximator
  - Beigel result again extends this to CDC with polylog MAJ gates
- [KP] boosting: there are distributions for which no parity is a weak approximator
30. Uniform Learning from a Membership Oracle
- Boolean function class F (e.g., DNF)
- Target function f: {0,1}^n → {0,1}
- Membership oracle MEM(f): on query x, returns f(x)
- Learning algorithm A, given accuracy ε > 0
- Output: hypothesis h: {0,1}^n → {0,1} s.t. Pr_{x∼U}[f(x) ≠ h(x)] < ε
31. Uniform Membership Learning of Decision Trees
- [KM]:
  - L₁(f) ≡ Σ_a |f̂(a)| ≤ s_DT(f)
  - If S = {a : |f̂(a)| ≥ ε/L₁(f)}, then Σ_{a∉S} f̂²(a) < ε
- [GL] (Goldreich-Levin): algorithm (membership oracle) for finding all a with |f̂(a)| ≥ θ in time ≈ n/θ⁶
- So DT can be efficiently uniformly membership-learned (see the sketch below)
- Output h has the same form as LMN: h = sign(Σ_{a∈S} f̂(a)·χ_a)
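A compact sketch of the [KM]/[GL]-style search, given a membership oracle mem (a Python function returning f(x) ∈ {-1,1}); the θ²/2 cutoff and sample size are illustrative:

```python
import random

def km_heavy_coefficients(mem, n, theta, m=2000):
    # Kushilevitz-Mansour-style search (a sketch): keep a prefix alpha
    # alive only if the total squared weight of coefficients extending
    # it, W(alpha) = sum_beta f_hat(alpha.beta)^2, appears large.  By
    # Parseval, at most ~1/theta^2 prefixes per level survive the cut.
    def chi(a, y):
        return (-1) ** sum(ai & yi for ai, yi in zip(a, y))

    def weight(alpha):
        # W(alpha) = E_x[(E_y[f(y.x) chi_alpha(y)])^2], estimated with
        # membership queries on y.x and y2.x for independent y, y2.
        k = len(alpha)
        total = 0.0
        for _ in range(m):
            x = [random.randint(0, 1) for _ in range(n - k)]
            y = [random.randint(0, 1) for _ in range(k)]
            y2 = [random.randint(0, 1) for _ in range(k)]
            total += mem(y + x) * chi(alpha, y) * mem(y2 + x) * chi(alpha, y2)
        return total / m

    prefixes = [[]]
    for _ in range(n):
        prefixes = [p + [b] for p in prefixes for b in (0, 1)
                    if weight(p + [b]) >= theta * theta / 2]
    return prefixes  # candidate a's with |f_hat(a)| >= theta (approximately)
```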
32. Uniform Membership Learning of DNF
- [J]:
  - For all distributions D, ∃ χ_a s.t. Pr_{x∼D}[f(x) = χ_a(x)] ≥ ½ + 1/(6·s_DNF)
  - Modified GL can efficiently locate such a χ_a given an oracle for near-uniform D
  - Boosters can provide such an oracle when uniform learning
  - Boosting then provides strong learning
- [BJTb], [KS], [F]:
  - For near-uniform D, can find χ_a in time ≈ n·s²
33. Uniform Learning from a Random Walk Oracle
- Boolean function class F (e.g., DNF)
- Target function f: {0,1}^n → {0,1}
- Random walk oracle RW(f): supplies random walk examples ⟨x, f(x)⟩
- Learning algorithm A, given accuracy ε > 0
- Output: hypothesis h: {0,1}^n → {0,1} s.t. Pr_{x∼U}[f(x) ≠ h(x)] < ε
34. Random Walk DNF Learning
- [BMOS]:
  - Noise sensitivity and related values can be accurately estimated using a random walk oracle (see the sketch below)
  - NS_γ(f) = ½(1 − Σ_a (1−2γ)^|a| f̂²(a))
  - T_ρ^b(f) ≡ Σ_{a: |a| ≤ b} ρ^|a| f̂²(a)
  - Estimating T_ρ^b(f) is efficient if b is logarithmic
  - Only logarithmic b is needed to learn DNF [BF]
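A rough illustration of the random-walk idea, not the [BMOS] algorithm itself: walk points about 2γn update-steps apart behave approximately like γ-noisy pairs, so noise-sensitivity-type quantities can be estimated from the walk (all names and the spacing heuristic are illustrative).

```python
import random

def random_walk_oracle(f, n):
    # Simulated RW(f): each step re-randomizes one uniformly chosen
    # coordinate, then reveals the example <x, f(x)>.
    x = [random.randint(0, 1) for _ in range(n)]
    while True:
        yield tuple(x), f(x)
        x[random.randrange(n)] = random.randint(0, 1)

def ns_from_walk(f, n, gamma, pairs=5000):
    # After k ~ 2*gamma*n update steps, each coordinate has been flipped
    # with probability about gamma, so Pr[label changes between points
    # k steps apart] roughly tracks NS_gamma(f).
    k = max(1, round(2 * gamma * n))
    walk = random_walk_oracle(f, n)
    labels = [y for _, (_, y) in zip(range(pairs * k), walk)][::k]
    return sum(a != b for a, b in zip(labels, labels[1:])) / (len(labels) - 1)
```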
35. Random Walk Parity Learning
- [JW] (unpublished):
  - Effectively, BMOS is limited to finding heavy Fourier coefficients f̂(a) with logarithmic |a|
  - Using a breadth-first variation of KM, can locate any f̂(a) > θ in time O(n^(log 1/θ))
  - A heavy coefficient corresponds to a parity function that weakly approximates f
36. Uniform Learning from a Classification Noise Oracle
- Boolean function class F (e.g., DNF)
- Target function f: {0,1}^n → {0,1}
- Classification noise oracle EX_η(f): draws uniform random x and returns ⟨x, f(x)⟩ with probability 1−η, ⟨x, ¬f(x)⟩ with probability η
- Learning algorithm A, given accuracy ε > 0 and error rate η > 0
- Output: hypothesis h: {0,1}^n → {0,1} s.t. Pr_{x∼U}[f(x) ≠ h(x)] < ε
37. Uniform Learning from a Statistical Query Oracle
- Boolean function class F (e.g., DNF)
- Target function f: {0,1}^n → {0,1}
- Statistical query oracle SQ(f): given a query (q(·,·), τ), returns some value within τ of E_U[q(x, f(x))]
- Learning algorithm A, given accuracy ε > 0
- Output: hypothesis h: {0,1}^n → {0,1} s.t. Pr_{x∼U}[f(x) ≠ h(x)] < ε
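A simulated SQ oracle as a sketch; the random perturbation models the oracle's freedom to answer anywhere within tolerance τ (names illustrative):

```python
import random

def make_sq_oracle(f, n, m=50000):
    # SQ(f): given query q(x, label) and tolerance tau, return some
    # value within tau of E_U[q(x, f(x))] (empirical mean + slack).
    def sq(q, tau):
        total = 0.0
        for _ in range(m):
            x = tuple(random.randint(0, 1) for _ in range(n))
            total += q(x, f(x))
        return total / m + random.uniform(-tau / 2, tau / 2)
    return sq

# Example query: the degree-1 correlation E[f(x) * (-1)^(x_0)].
# sq = make_sq_oracle(f, n)
# est = sq(lambda x, y: y * (-1) ** x[0], 0.05)
```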
38. SQ and Classification Noise Learning
- [K]:
  - If F is uniformly SQ learnable in time poly(n, s_F, 1/ε, 1/τ), then F is uniformly CN learnable in time poly(n, s_F, 1/ε, 1/τ, 1/(1−2η))
- Empirically, it is almost always true that if F is efficiently uniformly learnable, then F is efficiently uniformly SQ learnable (i.e., 1/τ is poly in the other parameters)
- Exception: F = PAR_n ≡ {χ_a : a ∈ {0,1}^n, |a| ≤ n}
39. Uniform SQ Hardness for PAR
- [BFJKMR]:
  - Harmonic analysis shows that for any q and any χ_a: E_U[q(x, χ_a(x))] = q̂(0^(n+1)) + q̂(a∘1)
  - Thus an adversarial SQ response to (q, τ) is q̂(0^(n+1)) whenever |q̂(a∘1)| < τ
  - Parseval: |q̂(b∘1)| < τ for all but 1/τ² Fourier coefficients
  - So each bad query eliminates only polynomially many coefficients
  - Even PAR_(log n) is not efficiently SQ learnable
40. Uniform Learning from an Attribute Noise Oracle
- Boolean function class F (e.g., DNF)
- Target function f: {0,1}^n → {0,1}
- Attribute noise oracle EX_{D_N}(f): draws uniform random x and returns ⟨x⊕r, f(x)⟩ with r ∼ D_N
- Noise model D_N
- Learning algorithm A, given accuracy ε > 0
- Output: hypothesis h: {0,1}^n → {0,1} s.t. Pr_{x∼U}[f(x) ≠ h(x)] < ε
41. Uniform Learning with Independent Attribute Noise
- [BJTa]:
  - The LMN algorithm actually produces estimates of f̂(a)·E_{r∼D_N}[χ_a(r)]
- Example application:
  - Assume the noise process D_N is a product distribution: D_N(x) = Π_i (p_i·x_i + (1−p_i)(1−x_i))
  - Assume p_i < 1/polylog n and 1/ε at most quasi-poly(n) (mild restrictions)
  - Then a modified LMN uniformly learns attribute-noisy AC0 in quasi-poly time (see the sketch below)
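A sketch of the resulting correction when the noise is an independent product with known rates p_i: the naive estimate converges to f̂(a)·Π_{i∈a}(1−2p_i), so dividing by this attenuation factor recovers f̂(a) (assumes the p_i are known and bounded away from ½; names illustrative):

```python
def denoised_fourier_estimate(noisy_sample, a, p):
    # noisy_sample: list of (x XOR r, f(x)) with r_i ~ Bernoulli(p[i])
    # independently; a is an index set (subset of range(n)).
    m = len(noisy_sample)
    naive = sum(y * (-1) ** sum(z[i] for i in a)
                for z, y in noisy_sample) / m   # ~ f_hat(a) * E[chi_a(r)]
    atten = 1.0
    for i in a:
        atten *= 1.0 - 2.0 * p[i]               # E[chi_a(r)] for product noise
    return naive / atten
```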
42. Agnostic Learning Model
- Arbitrary Boolean target function f: {0,1}^n → {0,1}
- Example oracle EX(f): supplies uniform random examples ⟨x, f(x)⟩
- Learning algorithm A, given accuracy ε > 0
- Output: hypothesis h ∈ H s.t. Pr_{x∼U}[f(x) ≠ h(x)] < opt_H + ε (opt_H = best error achievable by any hypothesis in H)
43. Agnostic Learning of Halfspaces
- [KKMS]:
  - Agnostic learning algorithm for H = the set of halfspaces
  - Algorithm is not Fourier-based (L₁ regression)
  - However, a somewhat weaker result can be obtained by simple Fourier analysis
44. Near-Agnostic Learning via LMN
- [KKMS]:
  - Let f be an arbitrary Boolean function
  - Fix any set S ⊆ {0,1}^n and fix ε
  - Let g be any function s.t.
    - Σ_{a∉S} ĝ²(a) < ε, and
    - Pr[f ≠ g] (call this Δ) is minimized over all such g
  - Then for the h learned by LMN by estimating the coefficients of f over S: Pr[f ≠ h] < 4Δ + ε
45. Summary
- Most uniform-learning results for Boolean function classes depend on harmonic analysis
- Learning theory provides motivation for new harmonic observations
- Even very weak harmonic results can be useful in learning-theory algorithms
46. Some Open Problems
- Efficient uniform learning of monotone DNF
  - Best to date for small s_DNF is [Ser], time ≈ n·s^(log s) (based on [BT], [M], [LMN])
- Non-uniform learning
  - Relatively easy to extend many results to product distributions; e.g., [FJS] extends LMN
  - Key issue in real-world applicability
47. Open Problems (cont'd)
- Weaker dependence on ε
  - Several algorithms are fully exponential (or worse) in 1/ε
- Additional proper learning results
  - Proper learning allows for interpretation of the learned hypothesis
48. References
- [Beigel] Beigel. When Do Extra Majority Gates Help?...
- [BFJKMR] Blum, Furst, Jackson, Kearns, Mansour, Rudich. Weakly Learning DNF...
- [BJTa] Bshouty, Jackson, Tamon. Uniform-Distribution Attribute Noise Learnability.
- [BJTb] Bshouty, Jackson, Tamon. More Efficient PAC-learning of DNF...
- [BKS] Benjamini, Kalai, Schramm. Noise Sensitivity of Boolean Functions...
- [BMOS] Bshouty, Mossel, O'Donnell, Servedio. Learning DNF from Random Walks.
- [BT] Bshouty, Tamon. On the Fourier Spectrum of Monotone Functions.
- [F] Feldman. Attribute Efficient and Non-adaptive Learning of Parities...
- [FJS] Furst, Jackson, Smith. Improved Learning of AC0 Functions.
- [FS] Freund, Schapire. A Decision-theoretic Generalization of On-line Learning...
- [Friedgut] Friedgut. Boolean Functions with Low Average Sensitivity Depend on Few Coordinates.
- [HMPST] Hajnal, Maass, Pudlák, Szegedy, Turán. Threshold Circuits of Bounded Depth.
- [J] Jackson. An Efficient Membership-Query Algorithm for Learning DNF...
- [JKS] Jackson, Klivans, Servedio. Learnability Beyond AC0.
- [JW] Jackson, Wimmer. In preparation.
- [KKL] Kahn, Kalai, Linial. The Influence of Variables on Boolean Functions.
- [KKMS] Kalai, Klivans, Mansour, Servedio. Agnostically Learning Halfspaces.
- [K] Kearns. Efficient Noise-tolerant Learning from Statistical Queries.
- [KM] Kushilevitz, Mansour. Learning Decision Trees Using the Fourier Spectrum.