
1
Probably Approximately Correct Model (PAC)
2
Example (PAC)
  • Concept: "average body-size" person
  • Inputs for each person:
  • height
  • weight
  • Sample: labeled examples of persons
  • label +: average body-size
  • label -: not average body-size
  • Two-dimensional inputs

3
(No Transcript)
4
Example (PAC)
  • Assumption: the target concept is a rectangle.
  • Goal:
  • Find a rectangle that approximates the target.
  • Formally:
  • With high probability,
  • output a rectangle such that
  • its error is low.

5
Example (Modeling)
  • Assume:
  • A fixed distribution over persons.
  • Goal:
  • Low error with respect to THIS distribution!!!
  • What does the distribution look like?
  • Highly complex.
  • Each parameter is not uniform.
  • The parameters are highly correlated.

6
Model-based approach
  • First try to model the distribution.
  • Given a model of the distribution
  • find an optimal decision rule.
  • Bayesian Learning

7
PAC approach
  • Assume that the distribution is fixed.
  • Samples are drawn i.i.d.:
  • independently
  • identically distributed
  • Concentrate on the decision rule rather than on
    the distribution.

8
PAC Learning
  • Task: learn a rectangle from examples.
  • Input: a point (x,y) and its classification + or -
  • classified by a rectangle R
  • Goal:
  • with the fewest examples,
  • compute R'
  • R' is a good approximation of R

9
PAC Learning Accuracy
  • Testing the accuracy of a hypothesis:
  • use the distribution D of examples.
  • Error = R Δ R' (the symmetric difference)
  • Pr[Error] = D(Error) = D(R Δ R')
  • We would like Pr[Error] to be controllable.
  • Given a parameter ε:
  • Find R' such that Pr[Error] < ε.

10
PAC Learning Hypothesis
  • Which Rectangle should we choose?

11
Setting up the Analysis
  • Choose the smallest consistent rectangle (see the sketch below).
  • Need to show:
  • For any distribution D and rectangle R,
  • given input parameters ε and δ,
  • select m(ε,δ) examples.
  • Let R' be the smallest consistent rectangle.
  • With probability 1-δ:
  • D(R Δ R') < ε

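A minimal Python sketch of the smallest-consistent-rectangle learner (not from the slides; the function names are illustrative):

    def fit_smallest_rectangle(samples):
        """Return the smallest axis-aligned rectangle consistent with the
        positively labeled points, as (x_min, x_max, y_min, y_max)."""
        positives = [(x, y) for (x, y), label in samples if label == '+']
        if not positives:
            return None  # empty hypothesis: classify everything as negative
        xs = [x for x, _ in positives]
        ys = [y for _, y in positives]
        return (min(xs), max(xs), min(ys), max(ys))

    def classify(rect, point):
        """Label a point '+' iff it falls inside the hypothesis rectangle."""
        if rect is None:
            return '-'
        x_min, x_max, y_min, y_max = rect
        x, y = point
        return '+' if x_min <= x <= x_max and y_min <= y <= y_max else '-'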
12
Analysis
  • Note that R' ⊆ R, therefore R Δ R' = R - R'

(Figure: the hypothesis rectangle R' nested inside the target rectangle R.)
13
Analysis (cont.)
  • By definition, D(T'u) = ε/4 (T'u is the minimal top strip of R with weight ε/4; Tu is the top strip of R - R')
  • Compute the probability that Tu ⊇ T'u:
  • Pr[(x,y) ∈ T'u] = ε/4
  • Probability of NO example in T'u,
  • for m examples: (1-ε/4)^m < e^(-εm/4)
  • Failure probability (over the four strips): 4·e^(-εm/4) < δ
  • Sample bound: m > (4/ε) ln(4/δ) (numeric sketch below)

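A quick numeric sketch of this bound in Python (illustrative name, not from the slides):

    import math

    def rectangle_sample_size(eps, delta):
        """Strip-argument bound for the rectangle learner: m > (4/eps) * ln(4/delta)."""
        return math.ceil((4.0 / eps) * math.log(4.0 / delta))

    # e.g. eps = 0.1, delta = 0.05 gives m = 176 examples
    print(rectangle_sample_size(0.1, 0.05))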
14
PAC comments
  • We only assumed that examples are i.i.d.
  • We have two independent parameters:
  • Accuracy ε
  • Confidence δ
  • No assumption about the likelihood of rectangles.
  • Hypothesis is tested on the same distribution as
    the sample.

15
PAC model Setting
  • A distribution D (unknown)
  • Target function ct from C
  • ct : X → {0,1}
  • Hypothesis h from H
  • h : X → {0,1}
  • Error probability:
  • error(h) = Prob_D[h(x) ≠ ct(x)]
  • Oracle EX(ct,D)

16
PAC Learning Definition
  • C and H are concept classes over X.
  • C is PAC learnable by H if:
  • there exists an algorithm A such that,
  • for any distribution D over X and any ct in C,
  • for every input ε and δ,
  • A outputs a hypothesis h in H,
  • while having access to EX(ct,D),
  • such that with probability 1-δ we have error(h) < ε.
  • Complexities.

17
Finite Concept class
  • Assume C ⊆ H and H is finite.
  • h is ε-bad if error(h) > ε.
  • Algorithm:
  • Sample a set S of m(ε,δ) examples.
  • Find an h in H which is consistent with S.
  • The algorithm fails if h is ε-bad.

18
Analysis
  • Assume hypothesis g is ε-bad.
  • The probability that g is consistent:
  • Pr[g consistent] ≤ (1-ε)^m < e^(-εm)
  • The probability that there exists a g
  • which is ε-bad and consistent:
  • ≤ |H| · Pr[g consistent and ε-bad] ≤ |H| e^(-εm)
  • Sample size (numeric sketch below):
  • m > (1/ε) ln(|H|/δ)

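The same bound as a small Python helper (illustrative; not from the slides):

    import math

    def finite_class_sample_size(h_size, eps, delta):
        """Consistent learner over a finite class: m > (1/eps) * ln(|H|/delta)."""
        return math.ceil((1.0 / eps) * math.log(h_size / delta))

    # e.g. OR functions over n = 20 variables: |H| = 3**20
    print(finite_class_sample_size(3 ** 20, eps=0.1, delta=0.05))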
19
PAC non-feasible case
  • What happens if ct is not in H?
  • We need to redefine the goal.
  • Let h* in H minimize the error: ε* = error(h*).
  • Goal: find h in H such that
  • error(h) ≤ error(h*) + ε = ε* + ε

20
Analysis
  • For each h in H,
  • let obs-error(h) be the error on the sample S.
  • Compute the probability that
  • |obs-error(h) - error(h)| ≥ ε/2:
  • Chernoff bound: exp(-(ε/2)² m)
  • Over the entire H: |H| · exp(-(ε/2)² m)
  • Sample size (sketch below):
  • m > (4/ε²) ln(|H|/δ)

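A minimal Python sketch of the non-feasible (agnostic) learner and its sample bound, assuming H is given as an explicit list of callables (names illustrative):

    import math

    def agnostic_sample_size(h_size, eps, delta):
        """Uniform-convergence bound from the slide: m > (4/eps**2) * ln(|H|/delta)."""
        return math.ceil((4.0 / eps ** 2) * math.log(h_size / delta))

    def best_observed_hypothesis(hypotheses, sample):
        """Return the h in H minimizing the observed error on the sample."""
        def obs_error(h):
            return sum(1 for x, label in sample if h(x) != label) / len(sample)
        return min(hypotheses, key=obs_error)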
21
Correctness
  • Assume that for all h in H:
  • |obs-error(h) - error(h)| < ε/2
  • In particular:
  • obs-error(h*) < error(h*) + ε/2
  • error(h) - ε/2 < obs-error(h)
  • For the output h:
  • obs-error(h) ≤ obs-error(h*)
  • Conclusion: error(h) < error(h*) + ε

22
Example Learning OR of literals
  • Inputs: x1, …, xn
  • Literals: x1, ¬x1, …
  • OR functions
  • Number of functions?

3^n
23
ELIM Algorithm for learning OR
  • Keep a list of all 2n literals.
  • For every example whose classification is 0:
  • erase all the literals that are 1 in it (see the sketch below).
  • Example
  • Correctness:
  • Our hypothesis h: an OR of the remaining literals.
  • Our set of literals includes the target OR
    literals.
  • Every time h predicts zero we are correct.
  • Sample size: m > (1/ε) ln(3^n/δ)

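A minimal Python sketch of ELIM, assuming examples are 0/1 tuples and literals are encoded as (index, is_positive) pairs (names illustrative):

    def elim(n, examples):
        """Learn an OR of literals from labeled examples.

        examples: iterable of (x, label), x a tuple of n bits, label 0 or 1.
        Returns the surviving literals as (index, is_positive) pairs."""
        # Start with all 2n literals: x_i and its negation for each variable.
        literals = {(i, pos) for i in range(n) for pos in (True, False)}
        for x, label in examples:
            if label == 0:
                # Negative example: every literal it satisfies must be erased,
                # since the target OR evaluates to 0 on x.
                literals -= {(i, pos) for (i, pos) in literals if (x[i] == 1) == pos}
        return literals

    def predict(literals, x):
        """h(x) = OR of the remaining literals."""
        return int(any((x[i] == 1) == pos for (i, pos) in literals))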
24
Learning parity
  • Functions such as x1 ⊕ x7 ⊕ x9
  • Number of functions: 2^n
  • Algorithm:
  • Sample a set of examples.
  • Solve the resulting linear equations over GF(2) (see the sketch below).
  • Sample size: m > (1/ε) ln(2^n/δ)

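A minimal Python sketch of the parity learner: Gaussian elimination over GF(2) on the sampled examples (illustrative names; if the sample does not determine the target uniquely, any consistent solution is returned):

    def learn_parity(examples, n):
        """Find a parity function consistent with the sample.

        examples: list of (x, label), x a tuple of n bits, label in {0,1}.
        Returns a 0/1 coefficient vector, or None if the system is inconsistent."""
        # Augmented rows: n coefficient bits plus the label bit.
        rows = [list(x) + [label] for x, label in examples]
        pivots = []  # (row, column) of each pivot
        r = 0
        for c in range(n):
            # Find a row with a 1 in column c at or below row r.
            pivot = next((i for i in range(r, len(rows)) if rows[i][c] == 1), None)
            if pivot is None:
                continue
            rows[r], rows[pivot] = rows[pivot], rows[r]
            for i in range(len(rows)):
                if i != r and rows[i][c] == 1:
                    rows[i] = [a ^ b for a, b in zip(rows[i], rows[r])]
            pivots.append((r, c))
            r += 1
        # Inconsistent system: an all-zero coefficient row with label 1.
        if any(all(v == 0 for v in row[:n]) and row[n] == 1 for row in rows):
            return None
        coeffs = [0] * n  # free variables set to 0
        for row, col in pivots:
            coeffs[col] = rows[row][n]
        return coeffs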
25
Infinite Concept class
  • X = [0,1] and H = {cq : q ∈ [0,1]}
  • cq(x) = 0 iff x < q
  • Assume C = H
  • Which cq should we choose in [min, max]? (see the sketch below)

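A minimal Python sketch of a consistent threshold learner for this class (illustrative; it returns the midpoint of the consistent interval):

    def learn_threshold(sample):
        """Pick a consistent threshold q' for c_q(x) = 0 iff x < q.

        sample: iterable of (x, label), x in [0,1], label in {0,1}."""
        zeros = [x for x, label in sample if label == 0]  # points with x < q
        ones = [x for x, label in sample if label == 1]   # points with x >= q
        lo = max(zeros) if zeros else 0.0
        hi = min(ones) if ones else 1.0
        return (lo + hi) / 2.0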
26
Proof I
  • Show that
  • Pr[D([min,max]) > ε] < δ
  • Proof: by contradiction.
  • Suppose the probability that x ∈ [min,max] is at least ε.
  • The probability that we do not sample from [min,max]
  • is (1-ε)^m.
  • This needs m > (1/ε) ln(1/δ).

What's WRONG?!
27
Proof II (correct)
  • Let max' satisfy D([q, max']) = ε/2
  • Let min' satisfy D([min', q]) = ε/2
  • Goal: show that with high probability
  • some positive example x+ falls in [q, max'] and
  • some negative example x- falls in [min', q].
  • In such a case any value in [x-, x+] is good.
  • Compute the sample size!

28
Non-Feasible case
  • Suppose we sample:
  • Algorithm:
  • Find the function h with the lowest observed error!

29
Analysis
  • Define {zi} as an ε/4-net (w.r.t. D)
  • For the optimal h* and our h there are:
  • zj : |error(h_zj) - error(h*)| < ε/4
  • zk : |error(h_zk) - error(h)| < ε/4
  • Show that with high probability:
  • |obs-error(h_zi) - error(h_zi)| < ε/4
  • Completing the proof.
  • Computing the sample size.

30
General e-net approach
  • Given a class H, define a class G:
  • for every h in H
  • there exists a g in G such that
  • D(g Δ h) < ε/4
  • Algorithm: find the best h in H.
  • Computing the confidence and sample size.

31
Occam Razor
  • Finding the shortest consistent hypothesis.
  • Definition: (a,b)-Occam algorithm,
  • a > 0 and b < 1
  • Input: a sample S of size m
  • Output: hypothesis h
  • for every (x,b) in S: h(x) = b (h is consistent with S)
  • size(h) < size^a(ct) · m^b
  • Efficiency.

32
Occam algorithm and compression
(Figure: two parties A and B; both know the points x1, …, xm; A holds the labeled sample S = {(xi, bi)} and wants to communicate the labels to B.)
33
Compression
  • Option 1:
  • A sends B the values b1, …, bm
  • m bits of information
  • Option 2:
  • A sends B the hypothesis h
  • Occam: for large enough m, size(h) < m
  • Option 3 (MDL):
  • A sends B a hypothesis h and corrections
  • complexity: size(h) + size(errors)

34
Occam Razor Theorem
  • A: an (a,b)-Occam algorithm for C using H
  • D: a distribution over the inputs X
  • ct in C: the target function, n = size(ct)
  • Sample size:
  • with probability 1-δ, A(S) = h has error(h) < ε

35
Occam Razor Theorem
  • Use the bound for a finite hypothesis class.
  • Effective hypothesis class size: 2^size(h)
  • size(h) < n^a · m^b
  • Sample size: (see the derivation sketch below)

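The transcript omits the sample-size expression; a sketch of the standard derivation, obtained by plugging the effective class size into the finite-class bound (an assumption here, not the slide's own formula):

    \[
      m \;>\; \frac{1}{\epsilon}\Bigl(\ln\frac{1}{\delta} + n^{a} m^{b} \ln 2\Bigr)
      \;\;\Longrightarrow\;\;
      m \;=\; O\!\left(\frac{1}{\epsilon}\ln\frac{1}{\delta}
          \;+\; \Bigl(\frac{n^{a}}{\epsilon}\Bigr)^{\frac{1}{1-b}}\right).
    \]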
36
Learning OR with few attributes
  • Target function: an OR of k literals
  • Goal: learn in time
  • polynomial in k and log n,
  • with ε and δ constant
  • ELIM makes slow progress:
  • it may disqualify only one literal per round
  • and may remain with O(n) literals

37
Set Cover - Definition
  • Input: S1, …, St with Si ⊆ U
  • Output: Si1, …, Sik with ∪j Sij = U
  • Question: are there k sets that cover U?
  • NP-complete

38
Set Cover Greedy algorithm
  • j = 0; Uj = U; C = ∅
  • While Uj ≠ ∅ (see the sketch below):
  • let Si be arg max |Si ∩ Uj|
  • add Si to C
  • let Uj+1 = Uj - Si
  • j = j + 1

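A minimal Python sketch of the greedy rule (illustrative names; if an optimal cover of size k exists, it returns at most about k ln|U| sets):

    def greedy_set_cover(universe, sets):
        """Repeatedly pick the set covering the most still-uncovered elements.
        sets: list of Python sets. Returns the indices of the chosen sets."""
        uncovered = set(universe)
        chosen = []
        while uncovered:
            # arg max over |S_i ∩ U_j|
            best = max(range(len(sets)), key=lambda i: len(sets[i] & uncovered))
            if not sets[best] & uncovered:
                raise ValueError("the given sets do not cover the universe")
            chosen.append(best)
            uncovered -= sets[best]
        return chosen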
39
Set Cover Greedy Analysis
  • At termination, C is a cover.
  • Assume there is a cover C' of size k.
  • C' is a cover for every Uj.
  • Some S in C' covers at least |Uj|/k elements of Uj.
  • Analysis of |Uj|: |Uj+1| ≤ |Uj| - |Uj|/k
  • Solving the recursion (see below):
  • number of sets j < k ln |U|

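Expanding the recursion (a standard step, sketched here):

    \[
      |U_{j+1}| \le |U_j|\Bigl(1 - \tfrac{1}{k}\Bigr)
      \;\Rightarrow\;
      |U_j| \le |U|\Bigl(1 - \tfrac{1}{k}\Bigr)^{j} \le |U|\,e^{-j/k},
    \]
    so after j ≥ k ln|U| steps we have |Uj| < 1, i.e. Uj is empty.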
40
Building an Occam algorithm
  • Given a sample S of size m:
  • run ELIM on S;
  • let LIT be the resulting set of literals.
  • There exist k literals in LIT that classify
    all of S correctly.
  • Negative examples:
  • any subset of LIT classifies them correctly.

41
Building an Occam algorithm
  • Positive examples:
  • search for a small subset of LIT
  • which classifies S correctly.
  • For a literal z, build Tz = {x : z satisfies x}
  • There are k sets Tz that cover the positive examples.
  • Find at most k ln m sets that cover them (see the sketch below).
  • Output h: the OR of the chosen k ln m literals.
  • size(h) < k ln m · log(2n)
  • Sample size: m = O(k log n · log(k log n))

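A minimal end-to-end Python sketch of this Occam algorithm (illustrative names; ELIM plus a greedy cover of the positives, not the slides' exact pseudocode):

    def occam_or_learner(n, examples):
        """Learn a small OR of literals, encoded as (index, is_positive) pairs."""
        # ELIM step: keep only literals consistent with every negative example.
        literals = {(i, pos) for i in range(n) for pos in (True, False)}
        positives = [x for x, label in examples if label == 1]
        for x, label in examples:
            if label == 0:
                literals -= {(i, pos) for (i, pos) in literals if (x[i] == 1) == pos}
        # Build T_z for each surviving literal z: the positives it satisfies.
        t_sets = {z: {j for j, x in enumerate(positives) if (x[z[0]] == 1) == z[1]}
                  for z in literals}
        # Greedy set cover of the positive examples.
        uncovered, chosen = set(range(len(positives))), []
        while uncovered:
            z = max(t_sets, key=lambda z: len(t_sets[z] & uncovered))
            if not t_sets[z] & uncovered:
                raise ValueError("no OR over LIT is consistent with the sample")
            chosen.append(z)
            uncovered -= t_sets[z]
        return chosen  # an OR of about k ln m literals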
42
Summary
  • PAC model
  • Confidence and accuracy
  • Sample size
  • Finite (and infinite) concept class
  • Occam Razor

43
Learning algorithms
  • OR function
  • Parity function
  • OR of a few literals
  • Open problems
  • OR in the non-feasible case
  • Parity of a few literals