Title: Harmonic Analysis in Learning Theory
1. Harmonic Analysis in Learning Theory
- Jeff Jackson
- Duquesne University
2. Themes
- Harmonic analysis is central to learning-theoretic results in a wide variety of models
- Results are generally the strongest known for learning with respect to the uniform distribution
- Work on learning problems has led to some new harmonic results
  - Spectral properties of Boolean function classes
  - Algorithms for approximating Boolean functions
3. Uniform Learning Model
- Boolean function class F (e.g., DNF)
- Target function f: {0,1}^n → {0,1}
- Example oracle EX(f) supplies uniform random examples <x, f(x)>
- Accuracy ε > 0
- Learning algorithm A outputs a hypothesis h: {0,1}^n → {0,1} s.t. Pr_{x~U}[f(x) ≠ h(x)] < ε
4. Circuit Classes
- Constant-depth AND/OR circuits (AC0 without the polynomial-size restriction; call this class CDC)
- DNF: a depth-2 circuit with an OR at the root
[Figure: a constant-depth circuit with d alternating levels of ∧ and ∨ gates over variables v1, ..., vn; negations allowed]
5. Decision Trees
[Figure: a decision tree with internal nodes labeled v3, v2, v1, v4 and leaves labeled 0 and 1]
6. Decision Trees
[Figure: evaluating the tree on x = 11001; the root queries x3, and x3 = 0]
7. Decision Trees
[Figure: continuing the evaluation of x = 11001; the next node queries x1, and x1 = 1]
8. Decision Trees
[Figure: the evaluation of x = 11001 reaches a leaf labeled 1, so f(x) = 1 (a small evaluation sketch follows below)]
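The figures above walk a single input through the tree. Below is a minimal evaluation sketch in Python, using a hypothetical tree that is consistent with the steps shown; the slides do not fully specify the tree, so the structure and names here are illustrative only.

```python
# Hypothetical decision tree consistent with the evaluation above: the root
# queries x3; on x3 = 0 the next node queries x1, whose 1-branch is a 1-leaf.
# Internal nodes are (variable index, zero-subtree, one-subtree); leaves are 0 or 1.
tree = (3, (1, 0, 1),            # x3 = 0: query x1, leaves 0 / 1
           (2, 0, (4, 0, 1)))    # x3 = 1: query x2, then possibly x4

def evaluate(tree, x):
    """Follow the tree on input x (variables are 1-indexed)."""
    while not isinstance(tree, int):
        var, zero_branch, one_branch = tree
        tree = one_branch if x[var - 1] == 1 else zero_branch
    return tree

x = [1, 1, 0, 0, 1]          # x = 11001
print(evaluate(tree, x))     # -> 1, matching the figure
```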
9. Function Size
- Each function representation has a natural size measure
  - CDC, DNF: number of gates
  - DT: number of leaves
- The size s_F(f) of f with respect to class F is the size of the smallest representation of f within F
- For all Boolean f, s_CDC(f) ≤ s_DNF(f) ≤ s_DT(f)
10. Efficient Uniform Learning Model
- Boolean function class F (e.g., DNF)
- Target function f: {0,1}^n → {0,1}
- Example oracle EX(f) supplies uniform random examples <x, f(x)>
- Accuracy ε > 0
- Learning algorithm A, now required to run in time poly(n, s_F, 1/ε)
- Output: hypothesis h: {0,1}^n → {0,1} s.t. Pr_{x~U}[f(x) ≠ h(x)] < ε
11. Harmonic-Based Uniform Learning
- [LMN]: constant-depth circuits are quasi-efficiently (n^polylog(s/ε)-time) uniform learnable
- [BT]: monotone Boolean functions are uniform learnable in time roughly 2^(√n·log n)
  - Monotone: for all x and i, f(x|x_i←0) ≤ f(x|x_i←1)
  - Also exponential in 1/ε (so assumes ε is constant)
  - But independent of any size measure
12. Notation
- Assume f: {0,1}^n → {-1,+1}
- For all a ∈ {0,1}^n, χ_a(x) = (-1)^(a·x)
- For all a ∈ {0,1}^n, the Fourier coefficient of f at a is f̂(a) = E_{x~U}[f(x)·χ_a(x)] (a small numerical sketch follows below)
- Sometimes write, e.g., f̂(1) for f̂(10...0)
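A small numerical sketch of these definitions; function names and the example target are illustrative. The sampling estimate anticipates the example-based algorithms on later slides. Note that under this sign convention a monotone function correlates with -χ_i rather than χ_i, which is why the single-variable coefficient below comes out negative.

```python
import random

def chi(a, x):
    """Parity character chi_a(x) = (-1)^(a . x) for 0/1 vectors a and x."""
    return -1 if sum(ai & xi for ai, xi in zip(a, x)) % 2 else 1

def fourier_coefficient(f, a, n):
    """Exact f_hat(a) = E_{x~U}[f(x) * chi_a(x)], by enumerating {0,1}^n."""
    total = 0
    for i in range(2 ** n):
        x = [(i >> j) & 1 for j in range(n)]
        total += f(x) * chi(a, x)
    return total / 2 ** n

def estimate_coefficient(f, a, n, m=20000):
    """Sample-based estimate of f_hat(a) from m uniform random examples."""
    s = 0
    for _ in range(m):
        x = [random.getrandbits(1) for _ in range(n)]
        s += f(x) * chi(a, x)
    return s / m

# Example: majority of 3 bits, with range {-1,+1}.
maj3 = lambda x: 1 if sum(x) >= 2 else -1
print(fourier_coefficient(maj3, [1, 0, 0], 3))   # exactly -0.5
print(estimate_coefficient(maj3, [1, 0, 0], 3))  # close to -0.5
```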
13. Fourier Properties of Classes
- [LMN]: if f is a constant-depth circuit of depth d and S = {a : |a| < log^d(s/ε)} (|a| = number of 1s in a), then Σ_{a∉S} f̂²(a) < ε
- [BT]: if f is a monotone Boolean function and S = {a : |a| < √n/ε}, then Σ_{a∉S} f̂²(a) < ε
14. Spectral Properties
15. Proof Techniques
- [LMN]: Håstad's Switching Lemma + harmonic analysis
- [BT]: based on [KKL]
  - Define the average sensitivity AS(f) = n · Pr_{x,i}[f(x|x_i←0) ≠ f(x|x_i←1)] (a Monte Carlo sketch follows below)
  - If S = {a : |a| < AS(f)/ε} then Σ_{a∉S} f̂²(a) < ε
  - For monotone f, harmonic analysis + Cauchy-Schwarz shows AS(f) ≤ √n
  - Note: this is tight for MAJ
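A Monte Carlo sketch of the average-sensitivity definition above; the sample size and the MAJ example are illustrative choices.

```python
import random

def average_sensitivity(f, n, samples=20000):
    """Estimate AS(f) = n * Pr_{x,i}[ f(x with x_i=0) != f(x with x_i=1) ]."""
    flips = 0
    for _ in range(samples):
        x = [random.getrandbits(1) for _ in range(n)]
        i = random.randrange(n)
        x0, x1 = list(x), list(x)
        x0[i], x1[i] = 0, 1
        flips += f(x0) != f(x1)
    return n * flips / samples

# MAJ on n (odd) bits: AS(MAJ) grows like sqrt(n), matching the tightness remark above.
n = 101
maj = lambda x: 1 if sum(x) > n // 2 else -1
print(average_sensitivity(maj, n))   # roughly 0.8 * sqrt(n), i.e. about 8 here
```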
16. Function Approximation
- For all Boolean f, f(x) = Σ_{a∈{0,1}^n} f̂(a)·χ_a(x)
- For S ⊆ {0,1}^n, define f_S(x) = Σ_{a∈S} f̂(a)·χ_a(x)
- [LMN]: Pr_x[f(x) ≠ sign(f_S(x))] ≤ Σ_{a∉S} f̂²(a)
17. The Fourier Learning Algorithm
- Given ε (and perhaps s, d)
- Determine k such that for S = {a : |a| < k}, Σ_{a∉S} f̂²(a) < ε
- Draw a sufficiently large sample of examples <x, f(x)> to closely estimate f̂(a) for all a ∈ S
  - Chernoff bounds: a sample of size n^k/ε suffices
- Output h = sign(Σ_{a∈S} f̂(a)·χ_a), using the estimated coefficients
- Run time ≈ n^(2k)/ε (a minimal sketch of the algorithm follows below)
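A minimal sketch of this low-degree algorithm for small n and k; the sample size, names, and toy target are illustrative choices rather than the original implementation.

```python
import itertools
import random

def chi(a, x):
    """chi_a(x) = (-1)^(a . x) for 0/1 tuples a and x."""
    return -1 if sum(ai & xi for ai, xi in zip(a, x)) % 2 else 1

def low_degree_learn(examples, n, k):
    """Estimate every Fourier coefficient of degree <= k from uniform examples,
    then return the hypothesis h(x) = sign(sum_{|a| <= k} est(a) * chi_a(x))."""
    m = len(examples)
    coeffs = {}
    for size in range(k + 1):
        for idx in itertools.combinations(range(n), size):
            a = tuple(1 if i in idx else 0 for i in range(n))
            coeffs[a] = sum(y * chi(a, x) for x, y in examples) / m
    def h(x):
        return 1 if sum(c * chi(a, x) for a, c in coeffs.items()) >= 0 else -1
    return h

# Toy run: learn MAJ on 5 bits from uniform examples, using degree bound k = 3.
n, k, m = 5, 3, 4000
target = lambda x: 1 if sum(x) >= 3 else -1
examples = []
for _ in range(m):
    x = tuple(random.getrandbits(1) for _ in range(n))
    examples.append((x, target(x)))
h = low_degree_learn(examples, n, k)
errs = sum(h(x) != target(x) for x in itertools.product((0, 1), repeat=n))
print(errs / 2 ** n)   # small, since MAJ_5 has little Fourier weight above degree 3
```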
18. Halfspaces
- [KOS]: halfspaces are efficiently uniform learnable (given that ε is constant)
  - Halfspace: there is a w ∈ R^(n+1) s.t. f(x) = sign(w·(x∘1))
  - If S = {a : |a| < (21/ε)²} then Σ_{a∉S} f̂²(a) < ε
  - Apply the LMN algorithm
- A similar result applies to an arbitrary function applied to a constant number of halfspaces
  - Intersection of halfspaces is a key learning problem
19. Halfspace Techniques
- Noise sensitivity [O] (cf. [BKS], [BJTa])
  - The noise sensitivity of f at η is the probability that corrupting each bit of x independently with probability η changes f(x)
  - NS_η(f) = ½(1 - Σ_a (1-2η)^|a| · f̂²(a)) (a numerical check follows below)
- [KOS]:
  - If S = {a : |a| < 1/η} then Σ_{a∉S} f̂²(a) < 3·NS_η(f)
  - If f is a halfspace then NS_ε(f) < 9√ε
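A small numerical check of the noise-sensitivity identity above, by direct enumeration for a small n; the example function is an arbitrary small DNF, and all names are illustrative.

```python
import itertools

def chi(a, x):
    return -1 if sum(ai & xi for ai, xi in zip(a, x)) % 2 else 1

def fourier(f, n):
    """All Fourier coefficients of f, by direct enumeration of {0,1}^n."""
    pts = list(itertools.product((0, 1), repeat=n))
    return {a: sum(f(x) * chi(a, x) for x in pts) / 2 ** n for a in pts}

def ns_direct(f, n, eta):
    """NS_eta(f) from the definition: average over x and noise patterns r."""
    pts = list(itertools.product((0, 1), repeat=n))
    total = 0.0
    for x in pts:
        for r in pts:
            p = 1.0
            for bit in r:
                p *= eta if bit else (1 - eta)
            y = tuple(xi ^ ri for xi, ri in zip(x, r))
            total += p * (f(x) != f(y))
    return total / len(pts)

def ns_from_spectrum(f, n, eta):
    """NS_eta(f) = 1/2 * (1 - sum_a (1 - 2*eta)^|a| * f_hat(a)^2)."""
    return 0.5 * (1 - sum((1 - 2 * eta) ** sum(a) * c * c
                          for a, c in fourier(f, n).items()))

n, eta = 4, 0.1
f = lambda x: 1 if (x[0] & x[1]) | (x[2] & x[3]) else -1   # a small DNF
print(ns_direct(f, n, eta), ns_from_spectrum(f, n, eta))   # the two values agree
```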
20. Monotone DT
- [OS]: monotone functions are efficiently learnable given that
  - ε is constant
  - s_DT(f) is used as the size measure
- Techniques
  - Harmonic analysis: for monotone f, AS(f) ≤ √(log s_DT(f))
  - [BT]: if S = {a : |a| < AS(f)/ε} then Σ_{a∉S} f̂²(a) < ε
  - Friedgut: there is a T with |T| ≤ 2^(AS(f)/ε) s.t. Σ_{A∉T} f̂²(A) < ε
21. Weak Approximators
- [KKL] also show that if f is monotone, there is an i such that -f̂(i) ≥ log²n / n
- Therefore Pr[f(x) = -χ_i(x)] ≥ ½ + log²n / (2n)
- In general, an h s.t. Pr[f = h] ≥ ½ + 1/poly(n,s) is called a weak approximator to f
- If A outputs a weak approximator for every f in F, then F is weakly learnable
22. Uniform Learning Model
- Recap of the model from slide 3: from uniform random examples <x, f(x)> drawn from EX(f), output h: {0,1}^n → {0,1} with Pr_{x~U}[f(x) ≠ h(x)] < ε
23. Weak Uniform Learning Model
- Boolean function class F (e.g., DNF)
- Target function f: {0,1}^n → {0,1}
- Example oracle EX(f) supplies uniform random examples <x, f(x)>
- Learning algorithm A outputs a hypothesis h: {0,1}^n → {0,1} s.t. Pr_{x~U}[f(x) ≠ h(x)] < ½ - 1/p(n,s)
24. Efficient Weak Learning Algorithm for Monotone Boolean Functions
- Draw a set of n² examples <x, f(x)>
- For i = 1 to n
  - Estimate f̂(i)
- Output h = -χ_i for the i maximizing -f̂(i) (a minimal sketch follows below)
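A minimal sketch of this weak learner, under the ±1 range and the sign convention above (for monotone f, the heavy degree-1 coefficient is negative, so the hypothesis is -χ_i); the names, default sample size, and toy target are illustrative.

```python
import itertools
import random

def chi_i(i, x):
    """Single-variable parity chi_i(x) = (-1)^(x_i)."""
    return -1 if x[i] else 1

def weak_learn_monotone(f, n, m=None):
    """Estimate the n degree-1 coefficients from m uniform examples and output
    h = -chi_i for the index i with the largest estimate of -f_hat(i)."""
    m = m or n * n   # n^2 examples, as on the slide
    examples = [tuple(random.getrandbits(1) for _ in range(n)) for _ in range(m)]
    est = [sum(f(x) * chi_i(i, x) for x in examples) / m for i in range(n)]
    best = max(range(n), key=lambda i: -est[i])
    return lambda x: -chi_i(best, x)

# Toy check against a small monotone target (a monotone DNF on 8 variables).
n = 8
f = lambda x: 1 if (x[0] & x[1]) | (x[2] & x[3] & x[4]) else -1
h = weak_learn_monotone(f, n)
agree = sum(h(x) == f(x) for x in itertools.product((0, 1), repeat=n)) / 2 ** n
print(agree)   # noticeably above 1/2
```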
25. Weak Approximation for MAJ of Constant-Depth Circuits
- Note that adding a single MAJ gate to a CDC destroys the LMN spectral property
- [JKS]: MAJ of CDCs is quasi-efficiently quasi-weakly uniform learnable
- If f is a MAJ of CDCs of depth d, and the number of gates in f is s, then there is a set A such that
  - |A| < log^d s ≡ k
  - Pr[f(x) = χ_A(x)] ≥ ½ + 1/(4s·n^k)
26. Weak Learning Algorithm
- Compute k = log^d s
- Draw s·n^k examples <x, f(x)>
- Repeat over sets A with |A| < k:
  - Estimate f̂(A)
- Until an A is found with f̂(A) > 1/(2s·n^k)
- Output h = χ_A
- Run time n^polylog(s) (a compact sketch of the search follows below)
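A compact sketch of this parity search; the threshold, sample size, and the mildly parity-correlated toy target are illustrative, and a real run would use the parameters stated above.

```python
import itertools
import random

def chi(a, x):
    return -1 if sum(ai & xi for ai, xi in zip(a, x)) % 2 else 1

def find_heavy_parity(examples, n, k, threshold):
    """Return the first index set A (as a 0/1 tuple) with |A| <= k whose
    estimated Fourier coefficient exceeds the threshold, or None."""
    m = len(examples)
    for size in range(k + 1):
        for idx in itertools.combinations(range(n), size):
            a = tuple(1 if i in idx else 0 for i in range(n))
            if sum(y * chi(a, x) for x, y in examples) / m > threshold:
                return a
    return None

# Toy target: the parity of x1 and x2, with its label flipped when x4 = x5 = x6 = 1.
n, m = 6, 5000
def target(x):
    parity = 1 if (x[0] ^ x[1]) == 0 else -1
    return -parity if (x[3] & x[4] & x[5]) else parity
examples = []
for _ in range(m):
    x = tuple(random.getrandbits(1) for _ in range(n))
    examples.append((x, target(x)))
print(find_heavy_parity(examples, n, k=2, threshold=0.25))  # -> (1, 1, 0, 0, 0, 0)
```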
27. Weak Approximator: Proof Techniques
- Discriminator Lemma [HMPST]
  - Implies that one of the CDCs is a weak approximator to f
- [LMN] spectral characterization of CDCs
- Harmonic analysis
- A result of Beigel is used to extend weak learning to CDCs with polylog many MAJ gates
28. Boosting
- In many (not all) cases, uniform weak learning algorithms can be converted to uniform (strong) learning algorithms using a boosting technique [S, F, ...]
- Need to learn weakly with respect to near-uniform distributions
  - For a near-uniform distribution D, find a weak hypothesis h_j s.t. Pr_{x~D}[h_j = f] > ½ + 1/poly(n,s)
- The final h is typically a MAJ of weak approximators
29. Strong Learning for MAJ of Constant-Depth Circuits
- [JKS]: MAJ of CDC is quasi-efficiently uniform learnable
  - Show that for near-uniform distributions, some parity function is a weak approximator
  - The Beigel result again extends this to CDC with polylog many MAJ gates
  - [KP] boosting: there are distributions for which no parity is a weak approximator
30. Uniform Learning from a Membership Oracle
- Boolean function class F (e.g., DNF)
- Target function f: {0,1}^n → {0,1}
- Membership oracle MEM(f): on query x, returns f(x)
- Accuracy ε > 0
- Learning algorithm A outputs a hypothesis h: {0,1}^n → {0,1} s.t. Pr_{x~U}[f(x) ≠ h(x)] < ε
31. Uniform Membership Learning of Decision Trees
- [KM]:
  - L1(f) ≡ Σ_a |f̂(a)| ≤ s_DT(f)
  - If S = {a : |f̂(a)| ≥ ε/L1(f)} then Σ_{a∉S} f̂²(a) < ε
  - [GL]: an algorithm (membership oracle) for finding all a with |f̂(a)| ≥ θ in time ≈ n/θ^6
  - So DT can be efficiently uniform membership learned
  - Output h has the same form as LMN: h = sign(Σ_{a∈S} f̂(a)·χ_a) (a sketch of a KM-style coefficient search follows below)
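A hedged sketch of a KM/GL-style search for large coefficients using membership queries; the estimator, sample sizes, and toy target are illustrative choices, not the papers' exact algorithm.

```python
import random

def chi(a, x):
    """Parity character chi_a(x) = (-1)^(a . x) for 0/1 tuples a and x."""
    return -1 if sum(ai & xi for ai, xi in zip(a, x)) % 2 else 1

def weight_estimate(f, n, prefix, samples=500, inner=32):
    """Estimate W(prefix) = sum over suffixes b of f_hat(prefix.b)^2, using
    W(prefix) = E_x[ (E_y[ f(y.x) * chi(prefix, y) ])^2 ], where y ranges over
    the first len(prefix) bits and x over the remaining bits."""
    k = len(prefix)
    total = 0.0
    for _ in range(samples):
        x = tuple(random.getrandbits(1) for _ in range(n - k))
        inner_sum = 0.0
        for _ in range(inner):
            y = tuple(random.getrandbits(1) for _ in range(k))
            inner_sum += f(y + x) * chi(prefix, y)   # membership query at y.x
        total += (inner_sum / inner) ** 2
    return total / samples

def km_search(f, n, theta):
    """Return candidate indices a with |f_hat(a)| plausibly >= theta.
    Only prefixes whose estimated weight stays above theta^2 / 2 survive, so
    (by Parseval) only polynomially many prefixes are kept at each level."""
    prefixes = [()]
    for _ in range(n):
        survivors = []
        for p in prefixes:
            for bit in (0, 1):
                q = p + (bit,)
                if weight_estimate(f, n, q) >= theta ** 2 / 2:
                    survivors.append(q)
        prefixes = survivors
    return prefixes

# Tiny usage example: on 4 bits, f = chi_{(1,1,0,0)} has one coefficient of size 1.
n = 4
f = lambda x: -1 if (x[0] ^ x[1]) else 1
print(km_search(f, n, theta=0.5))   # -> [(1, 1, 0, 0)]
```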
32. Uniform Membership Learning of DNF
- [J]:
  - For all distributions D, there is a χ_a s.t. Pr_{x~D}[f(x) = χ_a(x)] ≥ ½ + 1/(6·s_DNF)
  - A modified GL algorithm can efficiently locate such a χ_a given an oracle for a near-uniform D
  - Boosters can provide such an oracle when uniform learning
  - Boosting then provides strong learning
- [BJTb] (see also [KS]):
  - A modified Levin algorithm finds the χ_a in time ≈ n·s²
33. Uniform Learning from a Classification Noise Oracle
- Boolean function class F (e.g., DNF)
- Target function f: {0,1}^n → {0,1}
- Classification noise oracle EX_η(f): draws a uniform random x and returns <x, f(x)> with probability 1-η and the mislabeled <x, ¬f(x)> with probability η
- Accuracy ε > 0, noise rate η > 0
- Learning algorithm A outputs a hypothesis h: {0,1}^n → {0,1} s.t. Pr_{x~U}[f(x) ≠ h(x)] < ε
34. Uniform Learning from a Statistical Query Oracle
- Boolean function class F (e.g., DNF)
- Target function f: {0,1}^n → {0,1}
- Statistical query oracle SQ(f): given a query (q(·,·), τ), returns E_U[q(x, f(x))] to within tolerance τ
- Accuracy ε > 0
- Learning algorithm A outputs a hypothesis h: {0,1}^n → {0,1} s.t. Pr_{x~U}[f(x) ≠ h(x)] < ε
35. SQ and Classification Noise Learning
- [K]:
  - If F is uniform SQ learnable in time poly(n, s_F, 1/ε, 1/τ) then F is uniform CN learnable in time poly(n, s_F, 1/ε, 1/τ, 1/(1-2η))
- Empirically, it is almost always the case that if F is efficiently uniform learnable then F is efficiently uniform SQ learnable (i.e., 1/τ is poly in the other parameters)
- Exception: F = PAR_n ≡ {χ_a : a ∈ {0,1}^n, |a| ≤ n}
36. Uniform SQ Hardness for PAR
- [BFJKMR]:
  - Harmonic analysis shows that for any q and any χ_a, E_U[q(x, χ_a(x))] = q̂(0^(n+1)) + q̂(a∘1)
  - Thus the adversarial SQ response to (q, τ) is q̂(0^(n+1)) whenever |q̂(a∘1)| < τ
  - Parseval: |q̂(b∘1)| ≥ τ for at most 1/τ² of the Fourier coefficients
  - So a bad query eliminates only polynomially many candidate parities
  - Even PAR_(log n) is not efficiently SQ learnable
37. Uniform Learning from an Attribute Noise Oracle
- Boolean function class F (e.g., DNF)
- Target function f: {0,1}^n → {0,1}
- Attribute noise oracle EX_{D_N}(f): draws a uniform random x and a noise vector r ~ D_N, and returns <x⊕r, f(x)>
- Accuracy ε > 0, noise model D_N
- Learning algorithm A outputs a hypothesis h: {0,1}^n → {0,1} s.t. Pr_{x~U}[f(x) ≠ h(x)] < ε
38. Uniform Learning with Independent Attribute Noise
- [BJTa]:
  - Run on noisy examples, the LMN algorithm produces estimates of f̂(a)·E_{r~D_N}[χ_a(r)] (a short derivation follows below)
- Example application
  - Assume the noise process D_N is a product distribution: D_N(x) = Π_i (p_i·x_i + (1-p_i)·(1-x_i))
  - Assume p_i < 1/polylog n and 1/ε at most quasi-poly(n) (mild restrictions)
  - Then a modified LMN algorithm uniform learns attribute-noisy AC0 in quasi-polynomial time
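A short derivation of the attenuation factor under the product-noise assumption above (a sketch of the calculation, not the paper's presentation). It uses χ_a(x⊕r) = χ_a(x)·χ_a(r) and the independence of x and r:

```latex
\[
\mathbb{E}_{x,r}\!\left[f(x)\,\chi_a(x \oplus r)\right]
  = \mathbb{E}_{x,r}\!\left[f(x)\,\chi_a(x)\,\chi_a(r)\right]
  = \hat{f}(a)\,\mathbb{E}_{r \sim D_N}\!\left[\chi_a(r)\right]
  = \hat{f}(a)\prod_{i\,:\,a_i = 1}(1 - 2p_i).
\]
```

So the plain LMN estimate computed from noisy examples <x⊕r, f(x)> converges to f̂(a)·E_{r~D_N}[χ_a(r)]; when the p_i are small, as assumed above, these attenuation factors stay close to 1 for low-degree a, and they can be divided out if the p_i are known or estimated.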
39. Agnostic Learning Model
- Target function f: {0,1}^n → {0,1} is an arbitrary Boolean function
- Example oracle EX(f) supplies uniform random examples <x, f(x)>
- Learning algorithm A outputs a hypothesis h: {0,1}^n → {0,1} s.t. Pr_{x~U}[f(x) ≠ h(x)] is minimized
40. Near-Agnostic Learning via LMN
- [KKM]:
  - Let f be an arbitrary Boolean function
  - Fix any set S ⊆ {0,1}^n and fix ε
  - Let g be any function s.t.
    - Σ_{a∉S} ĝ²(a) < ε, and
    - Pr[f ≠ g] is minimized (call this Δ)
  - Then for the h learned by LMN by estimating the coefficients of f over S:
    - Pr[f ≠ h] < 4Δ + ε
41. Average Case Uniform Learning Model
- Boolean function class F (e.g., DNF)
- Target function f: {0,1}^n → {0,1} is D-random, i.e., drawn from a distribution D over F
- Example oracle EX(f) supplies uniform random examples <x, f(x)>
- Accuracy ε > 0
- Learning algorithm A outputs a hypothesis h: {0,1}^n → {0,1} s.t. Pr_{x~U}[f(x) ≠ h(x)] < ε
42. Average Case Learning of DT
- [JSa]:
  - D is uniform over complete, non-redundant, log-depth DTs
  - DT is efficiently uniform learnable on average
  - The output is a DT (proper learning)
43. Average Case Learning of DT
- Technique
  - [KM]: all Fourier coefficients of a DT with minimum depth d are rational with denominator 2^d
  - In an average-case tree, the coefficient f̂(i) of at least one variable v_i has an odd numerator
  - So log(denominator) gives the minimum depth of the tree
  - Try each variable at the root and find the depths of the child trees, choosing the root with the shallowest children
  - Recurse on the child trees to choose their roots
44. Average Case Learning of DNF
- [JSb]:
  - D: s terms, each term drawn uniformly from the terms of length log s
  - Monotone DNF with < n² terms, and DNF with < n^1.5 terms, are properly and efficiently uniform learnable on average
- Harmonic property
  - In an average-case DNF, the sign of f̂(i,j) (usually) indicates whether v_i and v_j appear in a common term
45. Summary
- Most uniform-learning results depend on harmonic analysis
- Learning theory provides motivation for new harmonic observations
- Even very weak harmonic results can be useful in learning-theory algorithms
46. Some Open Problems
- Efficient uniform learning of monotone DNF
  - Best to date for small s_DNF is [S]: time roughly n·s^(log s) (based on [BT], [M], [LMN])
- Non-uniform learning
  - Relatively easy to extend many results to product distributions; e.g., [FJS] extends [LMN]
  - A key issue for real-world applicability
47. Open Problems (cont'd)
- Weaker dependence on ε
  - Several algorithms are fully exponential (or worse) in 1/ε
- Additional proper learning results
  - Proper hypotheses allow for interpretation of the learned hypothesis