Transcript and Presenter's Notes

Title: A PAC-Bayes Risk Bound for General Loss Functions


1
A PAC-Bayes Risk Bound for General Loss Functions
  • NIPS 2006
  • Pascal Germain, Alexandre Lacasse, François
    Laviolette, Mario Marchand
  • Université Laval, Québec, Canada

2
Summary
  • We provide a (tight) PAC-Bayesian bound for the
    expected loss of convex combinations of
    classifiers under a wide class of loss functions,
    such as the exponential loss and the logistic
    loss.
  • Experiments with AdaBoost indicate that the upper
    bound (computed on the training set) behaves very
    similarly to the true loss (estimated on the test
    set).

3
Convex Combinations of Classifiers
  • Consider any set H of {-1, +1}-valued classifiers
    and any posterior distribution Q on H.
  • For any input example x, the [-1, +1]-valued
    output fQ(x) of the convex combination of
    classifiers is given by the weighted average
    sketched below.
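  A minimal sketch of this definition (assuming fQ denotes the
  Q-weighted average of the classifier outputs, so that the majority
  vote predicts sgn(fQ(x))):

      f_Q(x) \;=\; \mathbb{E}_{h \sim Q}\, h(x)
      \;=\; \sum_{h \in H} Q(h)\, h(x)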

4
The Margin and WQ(x,y)
  • WQ(x,y) is the fraction, under measure Q, of
    classifiers that err on example (x,y)
  • It is related to the margin y fQ(x) as shown
    below.
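  A sketch of the relation, using that I(h(x) ≠ y) = (1 - y h(x))/2 for
  {-1, +1}-valued outputs:

      W_Q(x,y) \;=\; \mathbb{E}_{h \sim Q}\, I\bigl(h(x) \neq y\bigr)
      \;=\; \frac{1 - y\, f_Q(x)}{2},
      \qquad\text{equivalently}\qquad
      y\, f_Q(x) \;=\; 1 - 2\, W_Q(x,y)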

5
General Loss Functions ζQ(x,y)
  • Hence, we consider any loss function ζQ(x,y) that
    can be written as a Taylor series around WQ = ½,
  • and our task is to provide tight bounds for the
    expected loss ζQ that depend on the empirical
    loss measured on a training set of m examples,
    as sketched below.
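  A sketch of these quantities, assuming the expansion is taken in the
  margin y fQ(x) (equivalently around WQ = ½) with coefficients g(k):

      \zeta_Q(x,y) \;=\; \sum_{k=0}^{\infty} g(k)\,\bigl(y\, f_Q(x)\bigr)^{k}

      \zeta_Q \;=\; \mathbb{E}_{(x,y)\sim D}\, \zeta_Q(x,y),
      \qquad
      \widehat{\zeta}_Q \;=\; \frac{1}{m}\sum_{i=1}^{m} \zeta_Q(x_i, y_i)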

6
Bounds for the Majority Vote
  • A bound on ζQ also provides a bound on the risk
    of the majority-vote classifier, as sketched
    below.
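  A minimal sketch of the argument, writing BQ for the majority-vote
  classifier and assuming the loss is at least some constant a > 0
  whenever the majority vote errs (e.g. a = 1 for the exponential loss):

      \text{if } \zeta_Q(x,y) \ge a \text{ whenever } W_Q(x,y) \ge \tfrac12,
      \text{ then}\quad
      R(B_Q) \;=\; \Pr_{(x,y)\sim D}\!\bigl(W_Q(x,y) \ge \tfrac12\bigr)
      \;\le\; \frac{\zeta_Q}{a}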

7
A PAC-Bayes Bound on ζQ
8
Proof
  • where h1···hk denotes the product of k
    classifiers, each drawn independently according
    to Q. Hence the identity sketched below.
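  A sketch of the key step: a power of the margin becomes an expectation
  over products of independently drawn classifiers.

      \bigl(y\, f_Q(x)\bigr)^{k}
      \;=\; \Bigl(\mathbb{E}_{h \sim Q}\, y\, h(x)\Bigr)^{k}
      \;=\; \mathbb{E}_{h_1,\dots,h_k \sim Q^{k}}\;
            y^{k}\, h_1(x)\, h_2(x) \cdots h_k(x)

  Summing over k with weights g(k) then expresses ζQ(x,y) as an
  expectation over products of classifiers.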

9
Proof (ctn.)
  • Let us define the error rate R(h1···hk) of a
    product of classifiers as below,
  • in order to relate ζQ to the error rate of a new
    Gibbs classifier.
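  A sketch of one natural definition (an assumption of this sketch): the
  product classifier is scored against y^k, so that its error rate
  linearly encodes the expectation above.

      R(h_1 \cdots h_k)
      \;=\; \Pr_{(x,y)\sim D}\bigl( y^{k}\, h_1(x) \cdots h_k(x) = -1 \bigr)
      \;=\; \frac{1 - \mathbb{E}_{(x,y)\sim D}\, y^{k}\, h_1(x) \cdots h_k(x)}{2}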

10
Proof (ctn.)
  • Here the new posterior is a distribution over
    products of classifiers that works as follows:
  • a number k is chosen according to g(k)/c,
  • then k classifiers in H are chosen according to
    Qk.
  • R(GQ) then denotes the risk of this Gibbs
    classifier.
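  Putting the two sampling steps together, the Gibbs risk is a weighted
  average of product-classifier risks (a sketch based on the g(k)/c
  weights and the definition of R(h1···hk) assumed above):

      R(G_Q) \;=\; \sum_{k} \frac{g(k)}{c}\;
      \mathbb{E}_{h_1,\dots,h_k \sim Q^{k}}\, R(h_1 \cdots h_k)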

11
Proof (ctn.)
  • The standard PAC-Bayes theorem implies that, for
    any prior over the union ∪k∈N Hk of products of
    classifiers, we have a bound on the risk of this
    Gibbs classifier.
  • Our theorem follows for any prior having the same
    structure as the posterior (i.e. k is first
    chosen according to g(k)/c, then k classifiers
    are chosen according to Pk), since in that case
    the KL divergence decomposes as sketched below.
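  A sketch of that decomposition, writing Q̄ and P̄ (names assumed here)
  for the resulting posterior and prior over products of classifiers;
  since both choose k with the same probabilities, the chain rule for
  the KL divergence gives:

      \mathrm{KL}(\bar{Q}\,\|\,\bar{P})
      \;=\; \sum_{k} \frac{g(k)}{c}\, \mathrm{KL}\bigl(Q^{k}\,\big\|\,P^{k}\bigr)
      \;=\; \Bigl(\sum_{k} \frac{g(k)}{c}\, k\Bigr)\, \mathrm{KL}(Q\,\|\,P)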

12
Remark
  • Since ζQ is recovered from R(GQ) through a
    multiplication by c,
  • any looseness in the bound for R(GQ) will be
    amplified by c in the bound for ζQ.
  • Hence, the bound on ζQ can be tight only for
    small c.
  • This is the case for ζQ(x,y) = |fQ(x) - y|^r,
    since we have c = 1 for r = 1 and c = 3 for
    r = 2.
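  A quick check of those values, reading c as the sum of the absolute
  Taylor coefficients for k ≥ 1 (an assumed reading that matches the
  numbers above); since y ∈ {-1, +1} and fQ(x) ∈ [-1, +1], we have
  |fQ(x) - y| = 1 - y fQ(x), so:

      r = 1:\quad 1 - y\, f_Q(x) \;\Rightarrow\; c = |{-1}| = 1
      r = 2:\quad \bigl(1 - y\, f_Q(x)\bigr)^{2}
      = 1 - 2\, y\, f_Q(x) + \bigl(y\, f_Q(x)\bigr)^{2}
      \;\Rightarrow\; c = |{-2}| + |1| = 3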

13
Bound Behavior During AdaBoost
  • Here H is the set of decision stumps. The output
    h(x) of a decision stump h on attribute x with
    threshold t is given by h(x) = ±sgn(x - t).
  • If P(h) = 1/|H| for all h ∈ H, then the KL term
    reduces as sketched below.
  • The entropy H(Q) generally increases at each
    boosting round.
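  A sketch of the reduction, assuming H(Q) denotes the Shannon entropy
  of the posterior Q:

      \mathrm{KL}(Q\,\|\,P)
      \;=\; \sum_{h \in H} Q(h)\, \ln\bigl(Q(h)\,|H|\bigr)
      \;=\; \ln|H| \;-\; H(Q)

  An increasing entropy H(Q) therefore means a shrinking KL term as
  boosting proceeds.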

14
Results for the Exponential Loss
  • For this loss function, we have the expansion
    sketched below.
  • Since c increases exponentially fast with the
    scale parameter of the loss, so does the risk
    bound.
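  A sketch of the expansion, writing the exponential loss with an
  assumed scale parameter γ:

      \zeta_Q(x,y) \;=\; \exp\bigl(-\gamma\, y\, f_Q(x)\bigr)
      \;=\; \sum_{k=0}^{\infty} \frac{(-\gamma)^{k}}{k!}\,\bigl(y\, f_Q(x)\bigr)^{k},
      \qquad
      c \;=\; \sum_{k=1}^{\infty} \frac{\gamma^{k}}{k!} \;=\; e^{\gamma} - 1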

15
Exponential Loss Results (ctn.)
16
Exponential Loss Results (ctn.)
17
Results for the Sigmoid Loss
  • For this loss function, we have the expansion
    sketched below.
  • The Taylor series for tanh(x) converges only for
    |x| < π/2. The scale parameter of the loss is
    thus limited to values below π/2.
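  A sketch, assuming a sigmoid loss built from tanh with an assumed
  scale parameter γ:

      \zeta_Q(x,y) \;=\; \frac{1 - \tanh\bigl(\gamma\, y\, f_Q(x)\bigr)}{2},
      \qquad
      \tanh(u) \;=\; u - \frac{u^{3}}{3} + \frac{2u^{5}}{15} - \cdots
      \quad (|u| < \pi/2)

  Since |y fQ(x)| ≤ 1, convergence of the series requires γ < π/2,
  matching the restriction above.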

18
Sigmoid Loss Results (ctn.)
19
Conclusion
  • We have obtained PAC-Bayesian risk bounds for any
    loss function ζQ having a convergent Taylor
    expansion around WQ = ½.
  • The bound is tight only for small c.
  • On AdaBoost, the loss bound is essentially
    parallel to the true loss.