
1
Probably Approximately Correct Model (PAC)
2
Example (PAC)
  • Concept: "average body-size" person
  • Inputs for each person:
  • height
  • weight
  • Sample: labeled examples of persons
  • label +: average body-size
  • label -: not average body-size
  • Two-dimensional inputs

3
(No Transcript)
4
Example (PAC)
  • Assumption: the target concept is a rectangle.
  • Goal:
  • Find a rectangle that approximates the target.
  • Formally:
  • With high probability,
  • output a rectangle such that
  • its error is low.

5
Example (Modeling)
  • Assume:
  • A fixed distribution over persons.
  • Goal:
  • Low error with respect to THIS distribution!!!
  • What does the distribution look like?
  • Highly complex.
  • Each parameter is not uniform.
  • The parameters are highly correlated.

6
Model-based approach
  • First try to model the distribution.
  • Given a model of the distribution
  • find an optimal decision rule.
  • Bayesian Learning

7
PAC approach
  • Assume that the distribution is fixed.
  • Samples are drawn i.i.d.:
  • independently
  • identically distributed
  • Concentrate on the decision rule rather than on
    the distribution.

8
PAC Learning
  • Task: learn a rectangle from examples.
  • Input: a point (x,y) and its classification + or -
  • classified by a rectangle R
  • Goal:
  • with the fewest examples,
  • compute R'
  • R' is a good approximation of R

9
PAC Learning Accuracy
  • Testing the accuracy of a hypothesis:
  • use the distribution D of examples.
  • Error = R Δ R' (the symmetric difference)
  • Pr[Error] = D(Error) = D(R Δ R')
  • We would like Pr[Error] to be controllable.
  • Given a parameter ε:
  • Find R' such that Pr[Error] < ε.

10
PAC Learning Hypothesis
  • Which Rectangle should we choose?

11
Setting up the Analysis
  • Choose the smallest consistent rectangle (see the sketch below).
  • Need to show:
  • For any distribution D and rectangle R,
  • given input parameters ε and δ,
  • select m(ε,δ) examples.
  • Let R' be the smallest consistent rectangle.
  • With probability 1-δ:
  • D(R Δ R') < ε

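A minimal Python sketch of the smallest-consistent-rectangle learner (not from the slides; the function names are illustrative):

    def fit_smallest_rectangle(samples):
        """Return the smallest axis-aligned rectangle consistent with the
        positively labeled points, as (x_min, x_max, y_min, y_max)."""
        positives = [(x, y) for (x, y), label in samples if label == '+']
        if not positives:
            return None  # empty hypothesis: classify everything as negative
        xs = [x for x, _ in positives]
        ys = [y for _, y in positives]
        return (min(xs), max(xs), min(ys), max(ys))

    def classify(rect, point):
        """Label a point '+' iff it falls inside the hypothesis rectangle."""
        if rect is None:
            return '-'
        x_min, x_max, y_min, y_max = rect
        x, y = point
        return '+' if x_min <= x <= x_max and y_min <= y <= y_max else '-'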
12
Analysis
  • Note that R' ⊆ R, therefore R Δ R' = R - R'

(Figure: the hypothesis rectangle R' nested inside the target rectangle R.)
13
Analysis (cont.)
  • By definition, D(T'u) = ε/4 (T'u is the minimal top strip of R with weight ε/4; Tu is the top strip of R - R')
  • Compute the probability that Tu ⊇ T'u:
  • Pr[(x,y) ∈ T'u] = ε/4
  • Probability of NO example in T'u,
  • for m examples: (1-ε/4)^m < e^(-εm/4)
  • Failure probability (over the four strips): 4·e^(-εm/4) < δ
  • Sample bound: m > (4/ε) ln(4/δ) (numeric sketch below)

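A quick numeric sketch of this bound in Python (illustrative name, not from the slides):

    import math

    def rectangle_sample_size(eps, delta):
        """Strip-argument bound for the rectangle learner: m > (4/eps) * ln(4/delta)."""
        return math.ceil((4.0 / eps) * math.log(4.0 / delta))

    # e.g. eps = 0.1, delta = 0.05 gives m = 176 examples
    print(rectangle_sample_size(0.1, 0.05))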
14
PAC comments
  • We only assumed that examples are i.i.d.
  • We have two independent parameters:
  • Accuracy ε
  • Confidence δ
  • No assumption about the likelihood of rectangles.
  • Hypothesis is tested on the same distribution as
    the sample.

15
PAC model Setting
  • A distribution D (unknown)
  • Target function ct from C
  • ct : X → {0,1}
  • Hypothesis h from H
  • h : X → {0,1}
  • Error probability:
  • error(h) = Prob_D[h(x) ≠ ct(x)]
  • Oracle EX(ct,D)

16
PAC Learning Definition
  • C and H are concept classes over X.
  • C is PAC learnable by H if:
  • there exists an algorithm A such that,
  • for any distribution D over X and any ct in C,
  • for every input ε and δ,
  • A outputs a hypothesis h in H,
  • while having access to EX(ct,D),
  • such that with probability 1-δ we have error(h) < ε.
  • Complexities.

17
Finite Concept class
  • Assume C ⊆ H and H is finite.
  • h is ε-bad if error(h) > ε.
  • Algorithm:
  • Sample a set S of m(ε,δ) examples.
  • Find an h in H which is consistent with S.
  • The algorithm fails if h is ε-bad.

18
Analysis
  • Assume hypothesis g is ε-bad.
  • The probability that g is consistent:
  • Pr[g consistent] ≤ (1-ε)^m < e^(-εm)
  • The probability that there exists a g
  • which is ε-bad and consistent:
  • ≤ |H| · Pr[g consistent and ε-bad] ≤ |H| e^(-εm)
  • Sample size (numeric sketch below):
  • m > (1/ε) ln(|H|/δ)

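The same bound as a small Python helper (illustrative; not from the slides):

    import math

    def finite_class_sample_size(h_size, eps, delta):
        """Consistent learner over a finite class: m > (1/eps) * ln(|H|/delta)."""
        return math.ceil((1.0 / eps) * math.log(h_size / delta))

    # e.g. OR functions over n = 20 variables: |H| = 3**20
    print(finite_class_sample_size(3 ** 20, eps=0.1, delta=0.05))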
19
PAC non-feasible case
  • What happens if ct is not in H?
  • We need to redefine the goal.
  • Let h* in H minimize the error: ε* = error(h*).
  • Goal: find h in H such that
  • error(h) ≤ error(h*) + ε = ε* + ε

20
Analysis
  • For each h in H,
  • let obs-error(h) be the error on the sample S.
  • Compute the probability that
  • |obs-error(h) - error(h)| ≥ ε/2:
  • Chernoff bound: exp(-(ε/2)² m)
  • Over the entire H: |H| · exp(-(ε/2)² m)
  • Sample size (sketch below):
  • m > (4/ε²) ln(|H|/δ)

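A minimal Python sketch of the non-feasible (agnostic) learner and its sample bound, assuming H is given as an explicit list of callables (names illustrative):

    import math

    def agnostic_sample_size(h_size, eps, delta):
        """Uniform-convergence bound from the slide: m > (4/eps**2) * ln(|H|/delta)."""
        return math.ceil((4.0 / eps ** 2) * math.log(h_size / delta))

    def best_observed_hypothesis(hypotheses, sample):
        """Return the h in H minimizing the observed error on the sample."""
        def obs_error(h):
            return sum(1 for x, label in sample if h(x) != label) / len(sample)
        return min(hypotheses, key=obs_error)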
21
Correctness
  • Assume that for all h in H:
  • |obs-error(h) - error(h)| < ε/2
  • In particular:
  • obs-error(h*) < error(h*) + ε/2
  • error(h) - ε/2 < obs-error(h)
  • For the output h:
  • obs-error(h) ≤ obs-error(h*)
  • Conclusion: error(h) < error(h*) + ε

22
Example Learning OR of literals
  • Inputs: x1, …, xn
  • Literals: x1, ¬x1, …
  • OR functions
  • Number of functions?

3^n
23
ELIM Algorithm for learning OR
  • Keep a list of all 2n literals.
  • For every example whose classification is 0:
  • erase all the literals that are 1 in it (see the sketch below).
  • Example
  • Correctness:
  • Our hypothesis h: an OR of the remaining literals.
  • Our set of literals includes the target OR
    literals.
  • Every time h predicts zero we are correct.
  • Sample size: m > (1/ε) ln(3^n/δ)

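A minimal Python sketch of ELIM, assuming examples are 0/1 tuples and literals are encoded as (index, is_positive) pairs (names illustrative):

    def elim(n, examples):
        """Learn an OR of literals from labeled examples.

        examples: iterable of (x, label), x a tuple of n bits, label 0 or 1.
        Returns the surviving literals as (index, is_positive) pairs."""
        # Start with all 2n literals: x_i and its negation for each variable.
        literals = {(i, pos) for i in range(n) for pos in (True, False)}
        for x, label in examples:
            if label == 0:
                # Negative example: every literal it satisfies must be erased,
                # since the target OR evaluates to 0 on x.
                literals -= {(i, pos) for (i, pos) in literals if (x[i] == 1) == pos}
        return literals

    def predict(literals, x):
        """h(x) = OR of the remaining literals."""
        return int(any((x[i] == 1) == pos for (i, pos) in literals))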
24
Learning parity
  • Functions such as x1 ⊕ x7 ⊕ x9
  • Number of functions: 2^n
  • Algorithm:
  • Sample a set of examples.
  • Solve the resulting linear equations over GF(2) (see the sketch below).
  • Sample size: m > (1/ε) ln(2^n/δ)

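A minimal Python sketch of the parity learner: Gaussian elimination over GF(2) on the sampled examples (illustrative names; if the sample does not determine the target uniquely, any consistent solution is returned):

    def learn_parity(examples, n):
        """Find a parity function consistent with the sample.

        examples: list of (x, label), x a tuple of n bits, label in {0,1}.
        Returns a 0/1 coefficient vector, or None if the system is inconsistent."""
        # Augmented rows: n coefficient bits plus the label bit.
        rows = [list(x) + [label] for x, label in examples]
        pivots = []  # (row, column) of each pivot
        r = 0
        for c in range(n):
            # Find a row with a 1 in column c at or below row r.
            pivot = next((i for i in range(r, len(rows)) if rows[i][c] == 1), None)
            if pivot is None:
                continue
            rows[r], rows[pivot] = rows[pivot], rows[r]
            for i in range(len(rows)):
                if i != r and rows[i][c] == 1:
                    rows[i] = [a ^ b for a, b in zip(rows[i], rows[r])]
            pivots.append((r, c))
            r += 1
        # Inconsistent system: an all-zero coefficient row with label 1.
        if any(all(v == 0 for v in row[:n]) and row[n] == 1 for row in rows):
            return None
        coeffs = [0] * n  # free variables set to 0
        for row, col in pivots:
            coeffs[col] = rows[row][n]
        return coeffs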
25
Infinite Concept class
  • X = [0,1] and H = {cq : q ∈ [0,1]}
  • cq(x) = 0 iff x < q
  • Assume C = H
  • Which cq should we choose in [min, max]? (see the sketch below)

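A minimal Python sketch of a consistent threshold learner for this class (illustrative; it returns the midpoint of the consistent interval):

    def learn_threshold(sample):
        """Pick a consistent threshold q' for c_q(x) = 0 iff x < q.

        sample: iterable of (x, label), x in [0,1], label in {0,1}."""
        zeros = [x for x, label in sample if label == 0]  # points with x < q
        ones = [x for x, label in sample if label == 1]   # points with x >= q
        lo = max(zeros) if zeros else 0.0
        hi = min(ones) if ones else 1.0
        return (lo + hi) / 2.0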
26
Proof I
  • Show that
  • Pr[D([min,max]) > ε] < δ
  • Proof: by contradiction.
  • Suppose the probability that x ∈ [min,max] is at least ε.
  • The probability that we do not sample from [min,max]
  • is (1-ε)^m.
  • This needs m > (1/ε) ln(1/δ).

What's WRONG?!
27
Proof II (correct)
  • Let max' satisfy D([q, max']) = ε/2
  • Let min' satisfy D([min', q]) = ε/2
  • Goal: show that with high probability
  • some positive example x+ falls in [q, max'] and
  • some negative example x- falls in [min', q].
  • In such a case any value in [x-, x+] is good.
  • Compute the sample size!

28
Non-Feasible case
  • Suppose we sample:
  • Algorithm:
  • Find the function h with the lowest observed error!

29
Analysis
  • Define {zi} as an ε/4-net (w.r.t. D)
  • For the optimal h* and our h there are:
  • zj : |error(h_zj) - error(h*)| < ε/4
  • zk : |error(h_zk) - error(h)| < ε/4
  • Show that with high probability:
  • |obs-error(h_zi) - error(h_zi)| < ε/4
  • Completing the proof.
  • Computing the sample size.

30
General e-net approach
  • Given a class H, define a class G:
  • for every h in H
  • there exists a g in G such that
  • D(g Δ h) < ε/4
  • Algorithm: find the best h in H.
  • Computing the confidence and sample size.

31
Occam Razor
  • Finding the shortest consistent hypothesis.
  • Definition: (a,b)-Occam algorithm,
  • a > 0 and b < 1
  • Input: a sample S of size m
  • Output: hypothesis h
  • for every (x,b) in S: h(x) = b (h is consistent with S)
  • size(h) < size^a(ct) · m^b
  • Efficiency.

32
Occam algorithm and compression
(Figure: two parties A and B; both know the points x1, …, xm; A holds the labeled sample S = {(xi, bi)} and wants to communicate the labels to B.)
33
Compression
  • Option 1:
  • A sends B the values b1, …, bm
  • m bits of information
  • Option 2:
  • A sends B the hypothesis h
  • Occam: for large enough m, size(h) < m
  • Option 3 (MDL):
  • A sends B a hypothesis h and corrections
  • complexity: size(h) + size(errors)

34
Occam Razor Theorem
  • A: an (a,b)-Occam algorithm for C using H
  • D: a distribution over the inputs X
  • ct in C: the target function, n = size(ct)
  • Sample size:
  • with probability 1-δ, A(S) = h has error(h) < ε

35
Occam Razor Theorem
  • Use the bound for a finite hypothesis class.
  • Effective hypothesis class size: 2^size(h)
  • size(h) < n^a · m^b
  • Sample size: (see the derivation sketch below)

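The transcript omits the sample-size expression; a sketch of the standard derivation, obtained by plugging the effective class size into the finite-class bound (an assumption here, not the slide's own formula):

    \[
      m \;>\; \frac{1}{\epsilon}\Bigl(\ln\frac{1}{\delta} + n^{a} m^{b} \ln 2\Bigr)
      \;\;\Longrightarrow\;\;
      m \;=\; O\!\left(\frac{1}{\epsilon}\ln\frac{1}{\delta}
          \;+\; \Bigl(\frac{n^{a}}{\epsilon}\Bigr)^{\frac{1}{1-b}}\right).
    \]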
36
Learning OR with few attributes
  • Target function: an OR of k literals
  • Goal: learn in time
  • polynomial in k and log n,
  • with ε and δ constant
  • ELIM makes slow progress:
  • it may disqualify only one literal per round
  • and may remain with O(n) literals

37
Set Cover - Definition
  • Input: S1, …, St with Si ⊆ U
  • Output: Si1, …, Sik with ∪j Sij = U
  • Question: are there k sets that cover U?
  • NP-complete

38
Set Cover Greedy algorithm
  • j = 0; Uj = U; C = ∅
  • While Uj ≠ ∅ (see the sketch below):
  • let Si be arg max |Si ∩ Uj|
  • add Si to C
  • let Uj+1 = Uj - Si
  • j = j + 1

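A minimal Python sketch of the greedy rule (illustrative names; if an optimal cover of size k exists, it returns at most about k ln|U| sets):

    def greedy_set_cover(universe, sets):
        """Repeatedly pick the set covering the most still-uncovered elements.
        sets: list of Python sets. Returns the indices of the chosen sets."""
        uncovered = set(universe)
        chosen = []
        while uncovered:
            # arg max over |S_i ∩ U_j|
            best = max(range(len(sets)), key=lambda i: len(sets[i] & uncovered))
            if not sets[best] & uncovered:
                raise ValueError("the given sets do not cover the universe")
            chosen.append(best)
            uncovered -= sets[best]
        return chosen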
39
Set Cover Greedy Analysis
  • At termination, C is a cover.
  • Assume there is a cover C' of size k.
  • C' is a cover for every Uj.
  • Some S in C' covers at least |Uj|/k elements of Uj.
  • Analysis of |Uj|: |Uj+1| ≤ |Uj| - |Uj|/k
  • Solving the recursion (see below):
  • number of sets j < k ln |U|

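Expanding the recursion (a standard step, sketched here):

    \[
      |U_{j+1}| \le |U_j|\Bigl(1 - \tfrac{1}{k}\Bigr)
      \;\Rightarrow\;
      |U_j| \le |U|\Bigl(1 - \tfrac{1}{k}\Bigr)^{j} \le |U|\,e^{-j/k},
    \]
    so after j ≥ k ln|U| steps we have |Uj| < 1, i.e. Uj is empty.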
40
Building an Occam algorithm
  • Given a sample S of size m:
  • run ELIM on S;
  • let LIT be the resulting set of literals.
  • There exist k literals in LIT that classify
    all of S correctly.
  • Negative examples:
  • any subset of LIT classifies them correctly.

41
Building an Occam algorithm
  • Positive examples:
  • search for a small subset of LIT
  • which classifies S correctly.
  • For a literal z, build Tz = {x : z satisfies x}
  • There are k sets Tz that cover the positive examples.
  • Find at most k ln m sets that cover them (see the sketch below).
  • Output h: the OR of the chosen k ln m literals.
  • size(h) < k ln m · log(2n)
  • Sample size: m = O(k log n · log(k log n))

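A minimal end-to-end Python sketch of this Occam algorithm (illustrative names; ELIM plus a greedy cover of the positives, not the slides' exact pseudocode):

    def occam_or_learner(n, examples):
        """Learn a small OR of literals, encoded as (index, is_positive) pairs."""
        # ELIM step: keep only literals consistent with every negative example.
        literals = {(i, pos) for i in range(n) for pos in (True, False)}
        positives = [x for x, label in examples if label == 1]
        for x, label in examples:
            if label == 0:
                literals -= {(i, pos) for (i, pos) in literals if (x[i] == 1) == pos}
        # Build T_z for each surviving literal z: the positives it satisfies.
        t_sets = {z: {j for j, x in enumerate(positives) if (x[z[0]] == 1) == z[1]}
                  for z in literals}
        # Greedy set cover of the positive examples.
        uncovered, chosen = set(range(len(positives))), []
        while uncovered:
            z = max(t_sets, key=lambda z: len(t_sets[z] & uncovered))
            if not t_sets[z] & uncovered:
                raise ValueError("no OR over LIT is consistent with the sample")
            chosen.append(z)
            uncovered -= t_sets[z]
        return chosen  # an OR of about k ln m literals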
42
Summary
  • PAC model
  • Confidence and accuracy
  • Sample size
  • Finite (and infinite) concept class
  • Occam Razor

43
Learning algorithms
  • OR function
  • Parity function
  • OR of a few literals
  • Open problems
  • OR in the non-feasible case
  • Parity of a few literals