1
Fourier Analysis and Boolean Function Learning
  • Jeff Jackson
  • Duquesne University
  • www.mathcs.duq.edu/jackson

2
Themes
  • Fourier analysis is central to learning-theoretic
    results in a wide variety of models
  • Results are generally the strongest known for
    learning Boolean function classes with respect to
    the uniform distribution
  • Work on learning problems has led to some new
    harmonic results
  • Spectral properties of Boolean function classes
  • Algorithms for approximating Boolean functions

3
Uniform Learning Model
[Diagram: the uniform learning model. Target function f: {0,1}^n →
{0,1} drawn from Boolean function class F (e.g., DNF); example oracle
EX(f) supplies uniform random examples ⟨x, f(x)⟩ to learning
algorithm A; given accuracy ε > 0, A outputs a hypothesis
h: {0,1}^n → {0,1} s.t. Pr_{x~U}[f(x) ≠ h(x)] < ε]
4
Circuit Classes
  • Constant-depth AND/OR circuits (AC0 without the
    polynomial-size restriction; call this class CDC)
  • DNF: depth-2 circuit with an OR at the root

[Figure: a circuit of d alternating levels of OR (∨) and AND (∧)
gates over variables v1, v2, v3, ..., vn; negations allowed]
5
Decision Trees
[Figure: a decision tree with internal nodes testing v3, v2, v1, v4
and leaves labeled 0 and 1]
6
Decision Trees
[Figure: the same tree evaluating x = 11001; the root tests v3 and
x3 = 0, so the walk follows the 0-branch]
7
Decision Trees
[Figure: the evaluation of x = 11001 continues; the next node tests
v1 and x1 = 1, so the walk follows the 1-branch]
8
Decision Trees
[Figure: the evaluation of x = 11001 reaches a leaf labeled 1, so
f(x) = 1]
9
Function Size
  • Each function representation has a natural size
    measure
  • CDC, DNF: # of gates
  • DT: # of leaves
  • Size s_F(f) of f with respect to class F is the size
    of the smallest representation of f within F
  • For all Boolean f: s_CDC(f) ≤ s_DNF(f) ≤ s_DT(f)

10
Efficient Uniform Learning Model
[Diagram: the uniform learning model as before, except that A must
run in time poly(n, s_F, 1/ε). Target function f: {0,1}^n → {0,1}
from class F (e.g., DNF); example oracle EX(f) supplies uniform
random examples ⟨x, f(x)⟩; given accuracy ε > 0, A outputs
h: {0,1}^n → {0,1} s.t. Pr_{x~U}[f(x) ≠ h(x)] < ε]
11
Harmonic-Based Uniform Learning
  • [LMN]: constant-depth circuits are quasi-efficiently
    (n^polylog(s/ε)-time) uniform learnable
  • [BT]: monotone Boolean functions are uniform
    learnable in time roughly 2^(√n·log n)
  • Monotone: for all x, i: f(x|x_i=0) ≤ f(x|x_i=1)
  • Also exponential in 1/ε (so assumes ε constant)
  • But independent of any size measure

12
Notation
  • Assume f: {0,1}^n → {−1,1}
  • For all a ∈ {0,1}^n, χ_a(x) ≡ (−1)^(a·x)
  • For all a ∈ {0,1}^n, the Fourier coefficient f̂(a) of
    f at a is f̂(a) ≡ E_{x~U}[f(x)·χ_a(x)] (sketch below)
  • Sometimes write, e.g., f̂(1) for f̂(10...0)
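To make the definitions concrete, here is a minimal Python sketch
(the function names are illustrative, not from the talk) that
evaluates χ_a and estimates f̂(a) from uniform random examples:

```python
import random

def chi(a, x):
    # chi_a(x) = (-1)^(a . x): parity of the bits of x selected by a
    return -1 if sum(ai & xi for ai, xi in zip(a, x)) % 2 else 1

def estimate_coefficient(f, a, n, samples=10000):
    # f_hat(a) = E_{x~U}[f(x) * chi_a(x)], estimated by sampling;
    # f maps {0,1}^n -> {-1,1} as assumed on this slide
    total = 0
    for _ in range(samples):
        x = [random.randint(0, 1) for _ in range(n)]
        total += f(x) * chi(a, x)
    return total / samples

# Example: f = chi_{110}, so the estimate of f_hat((1,1,0)) is ~1
f = lambda x: chi((1, 1, 0), x)
print(estimate_coefficient(f, (1, 1, 0), 3))
```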
13
Fourier Properties of Classes
  • [LMN]: if f is a constant-depth circuit of depth d
    and S = {a : |a| < log^d(s/ε)} (|a| ≡ # of 1s
    in a), then Σ_{a∉S} f̂²(a) < ε
  • [BT]: if f is a monotone Boolean function and
    S = {a : |a| < √n/ε}, then Σ_{a∉S} f̂²(a) < ε

14
Spectral Properties
15
Proof Techniques
  • [LMN]: Håstad's Switching Lemma + harmonic
    analysis
  • [BT]: based on [KKL]
  • Define AS(f) ≡ n · Pr_{x,i}[f(x|x_i=0) ≠ f(x|x_i=1)]
    (average sensitivity; see the sketch below)
  • If S = {a : |a| < AS(f)/ε} then Σ_{a∉S} f̂²(a) < ε
  • For monotone f, harmonic analysis +
    Cauchy–Schwarz shows AS(f) ≤ √n
  • Note: this is tight for MAJ
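Average sensitivity as defined above is easy to estimate by sampling
when f can be evaluated; a small sketch under that assumption (names
are illustrative):

```python
import random

def average_sensitivity(f, n, samples=100000):
    # AS(f) = n * Pr_{x,i}[ f(x|x_i=0) != f(x|x_i=1) ]
    count = 0
    for _ in range(samples):
        x = [random.randint(0, 1) for _ in range(n)]
        i = random.randrange(n)
        x0, x1 = x[:], x[:]
        x0[i], x1[i] = 0, 1
        if f(x0) != f(x1):
            count += 1
    return n * count / samples

# For MAJ on n bits this returns roughly sqrt(2n/pi), consistent
# with the note that the AS(f) <= sqrt(n) bound is tight for MAJ.
```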


16
Function Approximation
  • For all Boolean f: f = Σ_a f̂(a) χ_a (Fourier expansion)
  • For S ⊆ {0,1}^n, define g ≡ Σ_{a∈S} f̂(a) χ_a
  • [LMN]: Pr[f ≠ sign(g)] ≤ E[(f − g)²] = Σ_{a∉S} f̂²(a)

17
The Fourier Learning Algorithm
  • Given ε (and perhaps s, d, ...)
  • Determine k such that for S = {a : |a| < k},
    Σ_{a∉S} f̂²(a) < ε
  • Draw a sufficiently large sample of examples
    ⟨x, f(x)⟩ to closely estimate f̂(a) for all a ∈ S
  • Chernoff bounds: sample size ≈ n^k/ε suffices
  • Output h = sign(Σ_{a∈S} f̃(a) χ_a), where f̃(a) is the
    estimate of f̂(a)
  • Run time ≈ n^(2k)/ε (a code sketch follows)
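A compact sketch of the whole algorithm, under the slide's
assumptions (uniform examples, f mapping to ±1, degree cutoff k);
the names and structure are illustrative only:

```python
from itertools import combinations

def chi_idx(idx, x):
    # chi over the coordinates in idx
    return -1 if sum(x[i] for i in idx) % 2 else 1

def low_degree_learn(examples, n, k):
    # examples: list of (x, f(x)) pairs with x a 0/1 tuple, f(x) in {-1,1}
    m = len(examples)
    coeffs = {}
    for d in range(k):                      # every a with |a| = d < k
        for idx in combinations(range(n), d):
            coeffs[idx] = sum(y * chi_idx(idx, x) for x, y in examples) / m
    # h = sign of the estimated low-degree part of f
    return lambda x: 1 if sum(c * chi_idx(idx, x)
                              for idx, c in coeffs.items()) >= 0 else -1
```

With m ≈ n^k/ε examples and ≈ n^k coefficients to estimate, this
takes ≈ n^(2k)/ε steps, matching the run time on the slide.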
18
Halfspaces
  • [KOS]: halfspaces are efficiently uniform
    learnable (given ε is constant)
  • Halfspace: ∃ w ∈ R^(n+1) s.t. f(x) = sign(w · (x∘1))
  • If S = {a : |a| < (21/ε)²} then Σ_{a∉S} f̂²(a) < ε
  • Apply the LMN algorithm
  • A similar result applies to an arbitrary function
    applied to a constant number of halfspaces
  • Intersection of halfspaces is a key learning problem


19
Halfspace Techniques
  • [O] (cf. [BKS], [BJTa]):
  • Noise sensitivity of f at γ is the probability that
    corrupting each bit of x independently with
    probability γ changes f(x)
  • NS_γ(f) = ½(1 − Σ_a (1−2γ)^|a| f̂²(a)) (see sketch below)
  • [KOS]:
  • If S = {a : |a| < 1/γ} then Σ_{a∉S} f̂²(a) < 3·NS_γ(f)
  • If f is a halfspace then NS_γ(f) < 9√γ
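Noise sensitivity is likewise straightforward to estimate by
sampling correlated pairs; a sketch (illustrative names, γ the
corruption probability, f a ±1-valued callable):

```python
import random

def noise_sensitivity(f, n, gamma, samples=100000):
    # NS_gamma(f) = Pr[f(x) != f(y)], where y is x with each bit
    # independently flipped with probability gamma
    disagree = 0
    for _ in range(samples):
        x = [random.randint(0, 1) for _ in range(n)]
        y = [b ^ (random.random() < gamma) for b in x]
        if f(x) != f(y):
            disagree += 1
    return disagree / samples
```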
20
Monotone DT
  • [OS]: monotone functions are efficiently
    learnable given:
  • ε is constant
  • s_DT(f) is used as the size measure
  • Techniques:
  • Harmonic analysis: for monotone f, AS(f) ≤
    √(log s_DT(f))
  • [BT]: if S = {a : |a| < AS(f)/ε} then Σ_{a∉S} f̂²(a)
    < ε
  • [Friedgut]: ∃ T, |T| ≤ 2^(AS(f)/ε), s.t. Σ_{A∉T} f̂²(A) < ε
21
Weak Approximators
  • [KKL] also show that if f is monotone, there is an
    i such that −f̂(i) ≥ log²n / n
  • Therefore Pr[f(x) = −χ_i(x)] ≥ ½ + log²n/(2n)
  • In general, h s.t. Pr[f = h] ≥ ½ + 1/poly(n,s) is
    called a weak approximator to f
  • If A outputs a weak approximator for every f in
    F, then F is weakly learnable


22
Uniform Learning Model
[Diagram: the uniform learning model, repeated for contrast with the
weak model on the next slide. EX(f) supplies uniform random examples
⟨x, f(x)⟩; given accuracy ε > 0, A outputs h: {0,1}^n → {0,1} s.t.
Pr_{x~U}[f(x) ≠ h(x)] < ε]
23
Weak Uniform Learning Model
[Diagram: the uniform learning model, except with no accuracy
parameter; A outputs hypothesis h: {0,1}^n → {0,1} s.t.
Pr_{x~U}[f(x) ≠ h(x)] < ½ − 1/p(n,s)]
24
Efficient Weak Learning Algorithm for Monotone
Boolean Functions
  • Draw a set of n² examples ⟨x, f(x)⟩
  • For i = 1 to n:
  • Estimate f̂(i) (call the estimate f̃(i))
  • Output h = −χ_i for the i maximizing −f̃(i) (sketch
    below)
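A sketch of this weak learner (illustrative names; assumes
f: {0,1}^n → {−1,1} is monotone and examples are uniform):

```python
def weak_learn_monotone(examples, n):
    m = len(examples)
    # Estimate each degree-1 coefficient f_hat(i) = E[f(x) * chi_i(x)]
    est = [sum(y * (-1 if x[i] else 1) for x, y in examples) / m
           for i in range(n)]
    # For monotone f every f_hat(i) <= 0, so take the most negative
    # estimate and output h = -chi_{i*}
    i_star = min(range(n), key=lambda i: est[i])
    return lambda x: 1 if x[i_star] else -1
```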
25
Weak Approximation for MAJ of Constant-Depth
Circuits
  • Note that adding a single MAJ gate to a CDC destroys
    the LMN spectral property
  • [JKS]: MAJ of CDCs is quasi-efficiently
    quasi-weakly uniform learnable
  • If f is a MAJ of CDCs of depth d, and if the
    number of gates in f is s, then there is an A
    ∈ {0,1}^n such that:
  • |A| < log^d s ≡ k
  • Pr[f(x) = χ_A(x)] ≥ ½ + 1/(4s·n^k)

26
Weak Learning Algorithm
  • Compute k = log^d s
  • Draw s·n^k examples ⟨x, f(x)⟩
  • Repeat over A with |A| < k:
  • Estimate f̂(A)
  • Until an A is found s.t. f̃(A) > 1/(2s·n^k)
  • Output h = χ_A (sketch below)
  • Run time n^polylog(s)
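The search loop is the same coefficient estimation as in the
low-degree sketch earlier, stopped at the first sufficiently heavy
coefficient (illustrative names):

```python
from itertools import combinations

def find_heavy_parity(examples, n, k, threshold):
    # Return the first A with |A| < k whose estimated coefficient
    # exceeds the threshold (1/(2*s*n^k) on this slide); h = chi_A
    m = len(examples)
    for d in range(k):
        for A in combinations(range(n), d):
            est = sum(y * (-1 if sum(x[i] for i in A) % 2 else 1)
                      for x, y in examples) / m
            if est > threshold:
                return A
    return None
```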
27
Weak ApproximatorProof Techniques
  • Discriminator Lemma ([HMPST]):
  • Implies one of the CDCs is a weak approximator
    to f
  • [LMN] spectral characterization of CDC
  • Harmonic analysis
  • [Beigel] result used to extend weak learning to CDC
    with polylog-many MAJ gates

28
Boosting
  • In many (not all) cases, uniform weak learning
    algorithms can be converted to uniform (strong)
    learning algorithms using a boosting technique
    ([S], [FS], ...)
  • Need to learn weakly with respect to near-uniform
    distributions
  • For near-uniform distribution D, find weak h_j
    s.t. Pr_{x~D}[h_j = f] > ½ + 1/poly(n,s)
  • Final h is typically a MAJ of the weak approximators

29
Strong Learning for MAJ of Constant-Depth
Circuits
  • [JKS]: MAJ of CDC is quasi-efficiently uniform
    learnable
  • Show that for near-uniform distributions, some
    parity function is a weak approximator
  • [Beigel] result again extends to CDC with polylog
    MAJ gates
  • [KP] boosting: there are distributions for which
    no parity is a weak approximator, so the booster
    must keep its distributions near-uniform

30
Uniform Learning from a Membership Oracle
[Diagram: the uniform learning model, except that instead of EX(f),
A has a membership oracle MEM(f): A sends a query x and receives
f(x). Given accuracy ε > 0, A outputs h: {0,1}^n → {0,1} s.t.
Pr_{x~U}[f(x) ≠ h(x)] < ε]
31
Uniform Membership Learning of Decision Trees
  • [KM]:
  • L1(f) ≡ Σ_a |f̂(a)| ≤ s_DT(f)
  • If S = {a : |f̂(a)| ≥ ε/L1(f)} then Σ_{a∉S} f̂²(a) < ε
  • [GL]: algorithm (membership oracle) for finding all a
    with |f̂(a)| ≥ θ in time ≈ n/θ⁶ (recursion illustrated
    below)
  • So DT can be efficiently uniform membership learned
  • Output h has the same form as LMN: h = sign(Σ_{a∈S} f̃(a) χ_a)
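For intuition, here is an exponential-time illustration of the
KM-style search tree; it computes the subtree weights W(prefix) =
Σ f̂²(a) over all a extending a prefix exactly, whereas the real
algorithm estimates them by sampling with MEM(f) (all names are
illustrative):

```python
from itertools import product

def km_search(f, n, theta):
    # Brute-force Fourier table (illustration only; KM avoids this)
    points = list(product((0, 1), repeat=n))
    coeffs = {a: sum(f(x) * (-1) ** sum(ai & xi for ai, xi in zip(a, x))
                     for x in points) / 2 ** n
              for a in points}
    def weight(prefix):
        # Total squared weight of coefficients extending this prefix
        return sum(c * c for a, c in coeffs.items()
                   if a[:len(prefix)] == prefix)
    live = [()]
    for _ in range(n):
        # A prefix stays alive only if its subtree can still hold a
        # coefficient of magnitude >= theta; by Parseval, at most
        # 1/theta^2 prefixes per level survive
        live = [p + (b,) for p in live for b in (0, 1)
                if weight(p + (b,)) >= theta ** 2]
    return live   # contains every a with |f_hat(a)| >= theta
```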
32
Uniform Membership Learning of DNF
  • [J]:
  • ∀ distributions D, ∃ χ_a s.t. Pr_{x~D}[f(x) =
    χ_a(x)] ≥ ½ + 1/(6·s_DNF)
  • Modified [GL] can efficiently locate such a χ_a
    given an oracle for near-uniform D
  • Boosters can provide such an oracle when uniform
    learning
  • Boosting provides strong learning
  • [BJTb], [KS], [F]:
  • For near-uniform D, can find χ_a in time ≈ n·s²

33
Uniform Learning from a Random Walk Oracle
[Diagram: the uniform learning model, except that the random walk
oracle RW(f) supplies examples ⟨x, f(x)⟩ in which successive x's are
the steps of a random walk on {0,1}^n. Given accuracy ε > 0, A
outputs h s.t. Pr_{x~U}[f(x) ≠ h(x)] < ε]
34
Random Walk DNF Learning
  • [BMOS]:
  • Noise sensitivity and related quantities can be
    accurately estimated using a random walk oracle
    (toy simulation below)
  • NS_γ(f) = ½(1 − Σ_a (1−2γ)^|a| f̂²(a))
  • T_γ^b(f) ≡ Σ_{|a| ≤ b} γ^|a| f̂²(a)
  • Estimating T_γ^b(f) is efficient if b is
    logarithmic
  • Only logarithmic b is needed to learn DNF [BF]
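A random walk oracle is easy to simulate, which helps make the
estimation claim concrete; a toy generator (illustrative names,
f a callable):

```python
import random

def random_walk_oracle(f, n):
    # RW(f): start at a uniform x, then flip one uniformly random
    # coordinate per step, labeling every visited point with f(x)
    x = [random.randint(0, 1) for _ in range(n)]
    while True:
        yield tuple(x), f(x)
        x[random.randrange(n)] ^= 1
```

Consecutive examples from this stream are correlated points at small
Hamming distance, which is what lets noise-sensitivity-type
quantities such as T_γ^b(f) be estimated; independent uniform
examples never supply such nearby pairs.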
35
Random Walk Parity Learning
  • [JW] (unpublished):
  • Effectively, [BMOS] is limited to finding heavy
    Fourier coefficients f̂(a) with logarithmic |a|
  • Using a breadth-first variation of [KM], can
    locate any a with |f̂(a)| > θ in time O(n^(log 1/θ))
  • A heavy coefficient corresponds to a parity
    function that weakly approximates f
36
Uniform Learning from a Classification Noise
Oracle
[Diagram: the uniform learning model, except that the classification
noise oracle EX_η(f) draws uniform random x and returns ⟨x, f(x)⟩
with probability 1−η and ⟨x, −f(x)⟩ with probability η. Given
accuracy ε > 0 and error rate η > 0, A outputs h s.t.
Pr_{x~U}[f(x) ≠ h(x)] < ε]
37
Uniform Learning from a Statistical Query Oracle
[Diagram: the uniform learning model, except that instead of
examples, A submits statistical queries (q(·,·), τ) to the oracle
SQ(f) and receives E_U[q(x, f(x))] ± τ. Given accuracy ε > 0, A
outputs h s.t. Pr_{x~U}[f(x) ≠ h(x)] < ε]
38
SQ and Classification Noise Learning
  • [K]:
  • If F is uniform SQ learnable in time poly(n,
    s_F, 1/ε, 1/τ) then F is uniform CN learnable in
    time poly(n, s_F, 1/ε, 1/τ, 1/(1−2η))
  • Empirically, it is almost always true that if F is
    efficiently uniform learnable then F is
    efficiently uniform SQ learnable (i.e., 1/τ is poly
    in the other parameters)
  • Exception: F = PAR_n ≡ {χ_a : a ∈ {0,1}^n, |a| ≤ n}

39
Uniform SQ Hardness for PAR
  • [BFJKMR]:
  • Harmonic analysis shows that for any q and χ_a:
    E_U[q(x, χ_a(x))] = q̂(0^(n+1)) + q̂(a∘1)
  • Thus the adversarial SQ response to (q,τ) is
    q̂(0^(n+1)) whenever |q̂(a∘1)| < τ
  • Parseval: |q̂(b∘1)| < τ for all but 1/τ² Fourier
    coefficients
  • So each bad query eliminates only poly-many
    coefficients
  • Even PAR_(log n) is not efficiently SQ learnable
40
Uniform Learning from an Attribute Noise Oracle
[Diagram: the uniform learning model, except that the attribute
noise oracle EX_{D_N}(f) draws uniform random x and returns
⟨x⊕r, f(x)⟩ with r ~ D_N. Given accuracy ε > 0 and noise model D_N,
A outputs h s.t. Pr_{x~U}[f(x) ≠ h(x)] < ε]
41
Uniform Learning with Independent Attribute Noise
  • [BJTa]:
  • The LMN algorithm then produces estimates of
    f̂(a) · E_{r~D_N}[χ_a(r)]
  • Example application:
  • Assume the noise process D_N is a product
    distribution:
  • D_N(x) = Π_i (p_i·x_i + (1−p_i)(1−x_i))
  • Then E_r[χ_a(r)] = Π_{i: a_i=1}(1−2p_i), which is
    close to 1 when the p_i are small
  • Assume p_i < 1/polylog n and 1/ε at most
    quasi-poly(n) (mild restrictions)
  • Then modified LMN uniform learns attribute-noisy
    AC0 in quasi-poly time


42
Agnostic Learning Model
[Diagram: the agnostic model. The target f: {0,1}^n → {0,1} is an
arbitrary Boolean function; EX(f) supplies uniform random examples
⟨x, f(x)⟩; given accuracy ε > 0, A outputs h ∈ H s.t.
Pr_{x~U}[f(x) ≠ h(x)] < opt_H + ε]
43
Agnostic Learning of Halfspaces
  • [KKMS]:
  • Agnostic learning algorithm for H = the set of
    halfspaces
  • The algorithm is not Fourier-based (it uses L1
    regression)
  • However, a somewhat weaker result can be obtained
    by simple Fourier analysis

44
Near-Agnostic Learning via LMN
  • [KKMS]:
  • Let f be an arbitrary Boolean function
  • Fix any set S ⊆ {0,1}^n and fix ε
  • Let g be any function s.t.:
  • Σ_{a∉S} ĝ²(a) < ε, and
  • Pr[f ≠ g] (call this Δ) is minimized over all such
    g
  • Then for the h learned by LMN by estimating the
    coefficients of f over S:
  • Pr[f ≠ h] < 4Δ + ε


45
Summary
  • Most uniform-learning results for Boolean
    function classes depend on harmonic analysis
  • Learning theory provides motivation for new
    harmonic observations
  • Even very weak harmonic results can be useful
    in learning-theory algorithms

46
Some Open Problems
  • Efficient uniform learning of monotone DNF
  • Best to date for small s_DNF is [Ser], time
    ≈ n·s^(log s) (based on [BT], [M], [LMN])
  • Non-uniform learning
  • Relatively easy to extend many results to product
    distributions, e.g., [FJS] extends [LMN]
  • Key issue for real-world applicability

47
Open Problems (cont'd)
  • Weaker dependence on ε
  • Several algorithms are fully exponential (or worse)
    in 1/ε
  • Additional proper learning results
  • Proper hypotheses allow interpretation of the
    learned hypothesis

48
References
  • [Beigel] Beigel. When Do Extra Majority Gates Help?...
  • [BFJKMR] Blum, Furst, Jackson, Kearns, Mansour,
    Rudich. Weakly Learning DNF...
  • [BJTa] Bshouty, Jackson, Tamon.
    Uniform-Distribution Attribute Noise
    Learnability.
  • [BJTb] Bshouty, Jackson, Tamon. More Efficient
    PAC-learning of DNF...
  • [BKS] Benjamini, Kalai, Schramm. Noise
    Sensitivity of Boolean Functions...
  • [BMOS] Bshouty, Mossel, O'Donnell, Servedio.
    Learning DNF from Random Walks.
  • [BT] Bshouty, Tamon. On the Fourier Spectrum of
    Monotone Functions.
  • [F] Feldman. Attribute Efficient and Non-adaptive
    Learning of Parities...
  • [FJS] Furst, Jackson, Smith. Improved Learning of
    AC0 Functions.
  • [FS] Freund, Schapire. A Decision-theoretic
    Generalization of On-line Learning...
  • [Friedgut] Friedgut. Boolean Functions with Low
    Average Sensitivity Depend on Few Coordinates.
  • [HMPST] Hajnal, Maass, Pudlák, Szegedy, Turán.
    Threshold Circuits of Bounded Depth.
  • [J] Jackson. An Efficient Membership-Query
    Algorithm for Learning DNF...
  • [JKS] Jackson, Klivans, Servedio. Learnability
    Beyond AC0.
  • [JW] Jackson, Wimmer. In preparation.
  • [KKL] Kahn, Kalai, Linial. The Influence of
    Variables on Boolean Functions.
  • [KKMS] Kalai, Klivans, Mansour, Servedio. On
    Agnostic Boosting and Parity Learning.
  • [K] Kearns. Efficient Noise-tolerant Learning
    from Statistical Queries.
  • [KM] Kushilevitz, Mansour. Learning Decision
    Trees using the Fourier Spectrum.