Title: A Quarter-Century of Efficient Learnability
A Quarter-Century of Efficient Learnability
- Rocco Servedio
- Columbia University
Valiant 60th Birthday Symposium, Bethesda, Maryland, May 30, 2009
1984
and of course...
Probably Approximately Correct learning [Valiant84]
[Valiant84] presents a range of learning models and oracles.
- Concept class C of Boolean functions over a domain X (typically X = {0,1}^n or R^n)
- Unknown target concept c in C to be learned from examples
- Unknown and arbitrary distribution D over X; D models the (possibly complex) world
Learner has access to i.i.d. draws from D labeled according to c: each example x belongs to X, is drawn i.i.d. from D, and comes with its label c(x).
PAC learning a concept class C
- Learner's goal: come up with a hypothesis h that will have high accuracy on future examples. Efficiently!
- For any target function c in C,
- for any distribution D over X,
- with probability at least 1 - δ the learner outputs a hypothesis h that is ε-accurate w.r.t. D.
Algorithm must be computationally efficient: should run in time poly(n, 1/ε, 1/δ).
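In standard notation (a textbook formulation of the guarantee, included for reference; not quoted from the slide):

```latex
% PAC learning of a concept class C over domain X (standard formulation)
\textbf{Definition.} Algorithm $A$ PAC learns $\mathcal{C}$ if for every target
$c \in \mathcal{C}$, every distribution $\mathcal{D}$ over $X$, and every
$\varepsilon, \delta > 0$: given i.i.d. draws $(x, c(x))$ with $x \sim \mathcal{D}$,
with probability at least $1 - \delta$, $A$ outputs a hypothesis $h$ with
\[
  \Pr_{x \sim \mathcal{D}}\bigl[h(x) \neq c(x)\bigr] \;\le\; \varepsilon .
\]
$A$ is \emph{efficient} if it runs in time $\mathrm{poly}(n, 1/\varepsilon, 1/\delta)$.
```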
So, what can be learned efficiently?
The PAC model, and its variants, provide a clean theoretical framework for studying the computational complexity of learning problems. From [Valiant84]:
"The results of learnability theory would then indicate the maximum granularity of the single concepts that can be acquired without programming. This paper attempts to explore the limits of what is learnable as allowed by algorithmic complexity. The identification of these limits is a major goal of the line of work proposed in this paper."
25 years of efficient learnability
(Valiant didn't just ask the question "what can be learned efficiently?"; he did a great deal towards answering it. This talk highlights some of these contributions and how the field has evolved since then.)
In the rest of the 1980s, Valiant and colleagues gave remarkable results on the abilities and limitations of computationally efficient learning algorithms. This work introduced research directions and questions that continue to be intensively studied to this day.
Rest of talk: survey some
- positive results (algorithms)
- negative results (two flavors of hardness results)
Positive results: learning k-DNF
Theorem [Valiant84]: k-DNF is learnable in polynomial time for any k = O(1).
Idea (k = 2): view a k-DNF as a disjunction over the O(n^k) possible terms, treated as metavariables, and learn that disjunction using elimination.
25 years later, improving this to k = ω(1) is still a major open question! Much has been learned in trying for this improvement.
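To illustrate the elimination idea, here is a minimal Python sketch (the function names and the (bit-tuple, 0/1-label) data format are my own, not from the slide): enumerate the O(n^k) candidate terms and delete every term that fires on a negative example.

```python
from itertools import combinations, product

def enumerate_terms(n, k):
    """All conjunctions of at most k literals over x_0, ..., x_{n-1}.
    A term is a tuple of (index, sign) pairs; sign=True means a positive literal."""
    terms = []
    for size in range(1, k + 1):
        for idxs in combinations(range(n), size):
            for signs in product([True, False], repeat=size):
                terms.append(tuple(zip(idxs, signs)))
    return terms

def term_fires(term, x):
    """Evaluate a conjunction of literals on a 0/1 example x."""
    return all((x[i] == 1) == sign for i, sign in term)

def learn_k_dnf(examples, n, k):
    """Elimination over metavariables: keep only the terms that are false on every
    negative example.  Every term of the target k-DNF survives, so the disjunction
    of the survivors agrees with the target on all sampled examples."""
    candidates = enumerate_terms(n, k)                    # O(n^k) metavariables
    for x, label in examples:
        if label == 0:
            candidates = [t for t in candidates if not term_fires(t, x)]
    return lambda x: any(term_fires(t, x) for t in candidates)
```

The output is consistent with the sample by construction, and a sample of size poly(n^k, 1/ε, 1/δ) suffices for ε-accuracy by the standard PAC argument for consistent hypotheses drawn from a class of size 2^{O(n^k)}.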
Poly-time PAC learning, general distributions
- Decision lists (greedy alg.) [Rivest87]
- Halfspaces (poly-time LP) [Littlestone87, BEHW89]
- Parities, integer lattices (Gaussian elimination) [HelmboldSloanWarmuth92, FischerSimon92] (a sketch of the parity case follows below)
- Restricted types of branching programs (decision lists of parities) [ErgunKumarRubinfeld95, BshoutyTamonWilson98]
- Geometric concept classes (random projections) [BshoutyChenHomer94, BGMST98, Vempala99]
- and more
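As a concrete instance from the list above, here is a minimal sketch of the parity learner (an illustration in the spirit of the cited works, with my own names and data format): find any parity consistent with the sample via Gaussian elimination over GF(2).

```python
def solve_gf2(rows, labels):
    """Gaussian elimination over GF(2): return any a in {0,1}^n with
    <a, x> mod 2 == label for every example (x, label), or None if inconsistent."""
    n = len(rows[0])
    M = [list(x) + [y] for x, y in zip(rows, labels)]     # augmented matrix over GF(2)
    pivot_cols, r = [], 0
    for c in range(n):
        pivot = next((i for i in range(r, len(M)) if M[i][c] == 1), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(len(M)):
            if i != r and M[i][c] == 1:
                M[i] = [a ^ b for a, b in zip(M[i], M[r])]
        pivot_cols.append(c)
        r += 1
    if any(all(v == 0 for v in row[:-1]) and row[-1] == 1 for row in M):
        return None            # inconsistent (cannot happen for noiseless parity data)
    a = [0] * n
    for i, c in enumerate(pivot_cols):
        a[c] = M[i][-1]
    return a

def learn_parity(examples):
    """Any parity consistent with the sample generalizes (Occam-style argument),
    and Gaussian elimination finds one in polynomial time."""
    xs, ys = zip(*examples)
    a = solve_gf2(list(xs), list(ys))
    return None if a is None else (lambda x: sum(ai & xi for ai, xi in zip(a, x)) % 2)
```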
General-distribution PAC learning, cont'd
- Quasi-poly / sub-exponential-time learning:
- poly-size decision trees [EhrenfeuchtHaussler89, Blum92]
- poly-size DNF [Bshouty96, TaruiTsukiji99, KlivansS01]
- intersections of few poly(n)-weight halfspaces [KlivansODonnellS02]
- PTF method (halfspaces over metavariables): a link with complexity theory (see the sketch after the figure below)
[Figure: a DNF drawn as an OR of AND gates over literals x1, ..., x7 (some negated), illustrating the polynomial-threshold/metavariable view]
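The PTF method can be summarized in one line (a standard formulation; the degree bound in the closing comment is the one from [KlivansS01] cited above):

```latex
% Polynomial threshold function (PTF) method: if every f in the class C is the sign
% of a degree-d real polynomial, run the poly-time halfspace learner over the
% n^{O(d)} monomials viewed as metavariables.
\[
  f(x) = \mathrm{sign}\bigl(p_f(x)\bigr),\ \deg(p_f) \le d \ \text{ for all } f \in \mathcal{C}
  \;\;\Longrightarrow\;\;
  \mathcal{C} \text{ is PAC learnable in time } n^{O(d)}.
\]
% E.g., every poly-size DNF has a PTF of degree \tilde{O}(n^{1/3}) [KlivansS01],
% which gives the sub-exponential-time DNF learner mentioned above.
```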
Distribution-specific learning
- Theorem [KearnsLiValiant87]: monotone Boolean functions can be weakly learned (accuracy 1/2 + Ω(1/n)) in poly time under the uniform distribution on {0,1}^n.
- Ushered in the study of algorithms for uniform-distribution and distribution-specific learning: halfspaces [Baum90], DNF [Verbeurgt90, Jackson95], decision trees [KushilevitzMansour93], AC0 [LinialMansourNisan89, FurstJacksonSmith91], extended AC0 [JacksonKlivansS02], juntas [MosselODonnellS03], general monotone functions [BshoutyTamon96, BlumBurchLangford98, ODonnellWimmer09], monotone decision trees [ODonnellS06], intersections of halfspaces [BlumKannan94, Vempala97, KwekPitt98, KlivansODonnellS08], convex sets, much more.
- Key tool: Fourier analysis of Boolean functions.
- Recently come full circle on monotone functions: [ODonnellWimmer09] poly time, accuracy 1/2 + Ω(log n / √n), optimal! (matching the bound of [BlumBurchLangford98])
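The Fourier machinery referred to above rests on the expansion of Boolean functions in the parity basis (standard notation, included for reference):

```latex
% Fourier expansion of f : {-1,1}^n -> {-1,1} in the parity (character) basis
\[
  f(x) = \sum_{S \subseteq [n]} \hat{f}(S)\,\chi_S(x),
  \qquad
  \chi_S(x) = \prod_{i \in S} x_i,
  \qquad
  \hat{f}(S) = \mathop{\mathbf{E}}_{x \sim \{-1,1\}^n}\bigl[f(x)\,\chi_S(x)\bigr].
\]
% "Low-degree" algorithm [LinialMansourNisan89]: estimate each \hat{f}(S) with |S| <= d
% from uniform random examples and output the sign of the truncated expansion; the
% hypothesis is accurate whenever most of f's Fourier weight lies on degree <= d.
```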
Other variants
- After [Valiant84], efficient learning algorithms studied in many settings.
- Learning in the presence of noise: malicious [Valiant85], agnostic [KearnsSchapireSellie93], random misclassification [AngluinLaird87]
- Related models: exact learning from queries and counterexamples [Angluin87], Statistical Query learning [Kearns93], many others
- PAC-style analyses of unsupervised learning problems: learning discrete distributions [KMRRSS94], learning mixture distributions [Dasgupta99, AroraKannan01], many others
- Evolvability framework [Valiant07, Feldman08]
- Nice algorithmic results in all these settings.
Limits of efficient learnability: is proper learning feasible?
Proper learning: a learning algorithm for class C must use hypotheses from C.
- There are efficient proper learning algorithms for conjunctions, disjunctions, halfspaces, decision lists, parities, k-DNF, k-CNF.
- What about k-term DNF: can we learn k-term DNF using k-term DNF as hypotheses?
Proper learning is computationally hard
Theorem [PittValiant87]: If NP ≠ RP, no poly-time algorithm can learn 3-term DNF using 3-term DNF hypotheses.
Given a graph G, the reduction produces a distribution over labeled examples such that there is a high-accuracy 3-term DNF iff G is 3-colorable.
Note: one can learn 3-term DNF in poly time using 3-CNF hypotheses! Often a change of representation can make a difficult learning task easy.
[Figure: a graph G mapped by the reduction to a distribution over labeled examples: (011111, +), (001111, -), (101111, +), (010111, -), (110111, +), (011101, -)]
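To make the reduction concrete, here is a minimal sketch (names and data format are my own; it demonstrates only the easy direction, that a proper 3-coloring yields a 3-term DNF consistent with all the examples):

```python
def graph_to_examples(n_vertices, edges):
    """Pitt-Valiant style map from a graph to labeled examples:
    vertex i   -> positive example: all 1s except a 0 in coordinate i,
    edge (i,j) -> negative example: all 1s except 0s in coordinates i and j."""
    examples = []
    for i in range(n_vertices):
        x = [1] * n_vertices
        x[i] = 0
        examples.append((tuple(x), +1))
    for i, j in edges:
        x = [1] * n_vertices
        x[i] = x[j] = 0
        examples.append((tuple(x), -1))
    return examples

def dnf_from_coloring(coloring):
    """Given a proper 3-coloring, the 3-term DNF with one term per color c,
    T_c = AND of x_i over all vertices i NOT colored c, labels every example correctly."""
    return [[i for i, ci in enumerate(coloring) if ci != c] for c in set(coloring)]

# Toy check: a triangle on vertices 0, 1, 2 plus an extra vertex 3.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
examples = graph_to_examples(4, edges)
terms = dnf_from_coloring([0, 1, 2, 0])          # a proper 3-coloring of this graph
for x, label in examples:
    fires = any(all(x[i] == 1 for i in term) for term in terms)
    assert fires == (label == +1)
```

The hard direction is the converse: any 3-term DNF consistent with these examples can be decoded into a proper 3-coloring, which is what makes poly-time proper learning of 3-term DNF as hard as graph 3-coloring.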
From 1987...
This work showed computational barriers to learning with restricted representations in general, not just proper learning.
Theorem [PittValiant87]: Learning k-term DNF using (2k-3)-term DNF hypotheses is hard.
Opened the door to a whole range of hardness results: class C is hard to learn using hypotheses from class C'.
...to 2009
- Great progress in recent years using sophisticated machinery from hardness of approximation.
- [ABFKP04]: Hard to learn n-term DNF using n^100-size OR-of-halfspace hypotheses. [Feldman06]: Holds even if the learner can make membership queries to the target function.
- [KhotSaket08]: Hard to (even weakly) learn an intersection of 2 halfspaces using 100 halfspaces as the hypothesis.
- If data is corrupted with 1% noise, then:
- [FeldmanGopalanKhotPonnuswami08]: Hard to (even weakly) learn an AND using an AND as the hypothesis. Same for halfspaces.
- [GopalanKhotSaket07, Viola08]: Hard to (even weakly) learn a parity even using degree-100 GF(2) polynomials as hypotheses.
- Active area with lots of ongoing work.
Representation-Independent Hardness
Suppose there are no hypothesis restrictions: any poly-size circuit is OK. Are there learning problems that are still hard for computational reasons?
Yes:
- [Valiant84]: The existence of pseudorandom functions [GoldreichGoldwasserMicali84] implies that general Boolean circuits are (representation-independently) hard to learn. (An efficient learner for poly-size circuits would yield a distinguisher, since an accurate hypothesis predicts a pseudorandom function on fresh points, which is impossible for a truly random function.)
PKC and hardness of learning
- Key insight of [KearnsValiant89]: Public-key cryptosystems ⇒ hard-to-learn functions.
- An adversary can create labeled examples of the decryption function by herself, so that function must not be learnable from labeled examples, or else the cryptosystem would be insecure!
- Theorem [KearnsValiant89]: Simple classes of functions (NC1, TC0, poly-size DFAs) are inherently hard to learn.
Theorem [Regev05, KlivansSherstov06]: Really simple functions (poly-size ORs of halfspaces) are inherently hard to learn.
Closing the gap: Can these results be extended to show that DNF are inherently hard to learn? Or are DNF efficiently learnable?
Efficient learnability: Model and Results
- Valiant
- provided an elegant model for the computational study of learning
- followed this up with foundational results on what is (and isn't) efficiently learnable
- These fundamental questions continue to be intensively studied and cross-fertilize other topics in TCS.
Thank you, Les!