Transcript and Presenter's Notes

Title: A PAC-Bayes Risk Bound for General Loss Functions


1
A PAC-Bayes Risk Bound for General Loss Functions
  • NIPS 2006
  • Pascal Germain, Alexandre Lacasse, François
    Laviolette, Mario Marchand
  • Université Laval, Québec, Canada

2
Summary
  • We provide a (tight) PAC-Bayesian bound for the
    expected loss of convex combinations of
    classifiers under a wide class of loss functions,
    such as the exponential loss and the logistic
    loss.
  • Experiments with AdaBoost indicate that the upper
    bound (computed on the training set) behaves very
    similarly to the true loss (estimated on the test
    set).

3
Convex Combinations of Classifiers
  • Consider any set H of {-1, +1}-valued classifiers
    and any posterior distribution Q on H.
  • For any input example x, the [-1, +1]-valued
    output fQ(x) of the convex combination of
    classifiers is given by the weighted average
    sketched below.
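  A minimal sketch of this definition (assuming fQ denotes the
  Q-weighted average of the classifier outputs, so that the majority
  vote predicts sgn(fQ(x))):

      f_Q(x) \;=\; \mathbb{E}_{h \sim Q}\, h(x)
      \;=\; \sum_{h \in H} Q(h)\, h(x)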

4
The Margin and WQ(x,y)
  • WQ(x,y) is the fraction, under measure Q, of
    classifiers that err on example (x,y)
  • It is related to the margin y fQ(x) as shown
    below.
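  A sketch of the relation, using that I(h(x) ≠ y) = (1 - y h(x))/2 for
  {-1, +1}-valued outputs:

      W_Q(x,y) \;=\; \mathbb{E}_{h \sim Q}\, I\bigl(h(x) \neq y\bigr)
      \;=\; \frac{1 - y\, f_Q(x)}{2},
      \qquad\text{equivalently}\qquad
      y\, f_Q(x) \;=\; 1 - 2\, W_Q(x,y)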

5
General Loss Functions ζQ(x,y)
  • Hence, we consider any loss function ζQ(x,y) that
    can be written as a Taylor series around WQ = ½,
  • and our task is to provide tight bounds for the
    expected loss ζQ that depend on the empirical
    loss measured on a training set of m examples,
    as sketched below.
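  A sketch of these quantities, assuming the expansion is taken in the
  margin y fQ(x) (equivalently around WQ = ½) with coefficients g(k):

      \zeta_Q(x,y) \;=\; \sum_{k=0}^{\infty} g(k)\,\bigl(y\, f_Q(x)\bigr)^{k}

      \zeta_Q \;=\; \mathbb{E}_{(x,y)\sim D}\, \zeta_Q(x,y),
      \qquad
      \widehat{\zeta}_Q \;=\; \frac{1}{m}\sum_{i=1}^{m} \zeta_Q(x_i, y_i)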

6
Bounds for the Majority Vote
  • A bound on ζQ also provides a bound on the risk
    of the majority-vote classifier, as sketched
    below.
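  A minimal sketch of the argument, writing BQ for the majority-vote
  classifier and assuming the loss is at least some constant a > 0
  whenever the majority vote errs (e.g. a = 1 for the exponential loss):

      \text{if } \zeta_Q(x,y) \ge a \text{ whenever } W_Q(x,y) \ge \tfrac12,
      \text{ then}\quad
      R(B_Q) \;=\; \Pr_{(x,y)\sim D}\!\bigl(W_Q(x,y) \ge \tfrac12\bigr)
      \;\le\; \frac{\zeta_Q}{a}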

7
A PAC-Bayes Bound on ζQ
8
Proof
  • where h1···hk denotes the product of k
    classifiers, each drawn independently according
    to Q. Hence the identity sketched below.
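  A sketch of the key step: a power of the margin becomes an expectation
  over products of independently drawn classifiers.

      \bigl(y\, f_Q(x)\bigr)^{k}
      \;=\; \Bigl(\mathbb{E}_{h \sim Q}\, y\, h(x)\Bigr)^{k}
      \;=\; \mathbb{E}_{h_1,\dots,h_k \sim Q^{k}}\;
            y^{k}\, h_1(x)\, h_2(x) \cdots h_k(x)

  Summing over k with weights g(k) then expresses ζQ(x,y) as an
  expectation over products of classifiers.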

9
Proof (ctn.)
  • Let us define the error rate R(h1···hk) of a
    product of classifiers as below,
  • in order to relate ζQ to the error rate of a new
    Gibbs classifier.
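  A sketch of one natural definition (an assumption of this sketch): the
  product classifier is scored against y^k, so that its error rate
  linearly encodes the expectation above.

      R(h_1 \cdots h_k)
      \;=\; \Pr_{(x,y)\sim D}\bigl( y^{k}\, h_1(x) \cdots h_k(x) = -1 \bigr)
      \;=\; \frac{1 - \mathbb{E}_{(x,y)\sim D}\, y^{k}\, h_1(x) \cdots h_k(x)}{2}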

10
Proof (ctn.)
  • Here the new posterior is a distribution over
    products of classifiers that works as follows:
  • a number k is chosen according to g(k)/c,
  • then k classifiers in H are chosen according to
    Qk.
  • R(GQ) then denotes the risk of this Gibbs
    classifier.
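  Putting the two sampling steps together, the Gibbs risk is a weighted
  average of product-classifier risks (a sketch based on the g(k)/c
  weights and the definition of R(h1···hk) assumed above):

      R(G_Q) \;=\; \sum_{k} \frac{g(k)}{c}\;
      \mathbb{E}_{h_1,\dots,h_k \sim Q^{k}}\, R(h_1 \cdots h_k)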

11
Proof (ctn.)
  • The standard PAC-Bayes theorem implies that, for
    any prior over the union ∪k∈N Hk of products of
    classifiers, we have a bound on the risk of this
    Gibbs classifier.
  • Our theorem follows for any prior having the same
    structure as the posterior (i.e. k is first
    chosen according to g(k)/c, then k classifiers
    are chosen according to Pk), since in that case
    the KL divergence decomposes as sketched below.
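  A sketch of that decomposition, writing Q̄ and P̄ (names assumed here)
  for the resulting posterior and prior over products of classifiers;
  since both choose k with the same probabilities, the chain rule for
  the KL divergence gives:

      \mathrm{KL}(\bar{Q}\,\|\,\bar{P})
      \;=\; \sum_{k} \frac{g(k)}{c}\, \mathrm{KL}\bigl(Q^{k}\,\big\|\,P^{k}\bigr)
      \;=\; \Bigl(\sum_{k} \frac{g(k)}{c}\, k\Bigr)\, \mathrm{KL}(Q\,\|\,P)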

12
Remark
  • Since ζQ is recovered from R(GQ) through a
    multiplication by c,
  • any looseness in the bound for R(GQ) will be
    amplified by c in the bound for ζQ.
  • Hence, the bound on ζQ can be tight only for
    small c.
  • This is the case for ζQ(x,y) = |fQ(x) - y|^r,
    since we have c = 1 for r = 1 and c = 3 for
    r = 2.
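  A quick check of those values, reading c as the sum of the absolute
  Taylor coefficients for k ≥ 1 (an assumed reading that matches the
  numbers above); since y ∈ {-1, +1} and fQ(x) ∈ [-1, +1], we have
  |fQ(x) - y| = 1 - y fQ(x), so:

      r = 1:\quad 1 - y\, f_Q(x) \;\Rightarrow\; c = |{-1}| = 1
      r = 2:\quad \bigl(1 - y\, f_Q(x)\bigr)^{2}
      = 1 - 2\, y\, f_Q(x) + \bigl(y\, f_Q(x)\bigr)^{2}
      \;\Rightarrow\; c = |{-2}| + |1| = 3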

13
Bound Behavior During AdaBoost
  • Here H is the set of decision stumps. The output
    h(x) of a decision stump h on attribute x with
    threshold t is given by h(x) = ±sgn(x - t).
  • If P(h) = 1/|H| for all h ∈ H, then the KL term
    reduces as sketched below.
  • The entropy H(Q) generally increases at each
    boosting round.
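  A sketch of the reduction, assuming H(Q) denotes the Shannon entropy
  of the posterior Q:

      \mathrm{KL}(Q\,\|\,P)
      \;=\; \sum_{h \in H} Q(h)\, \ln\bigl(Q(h)\,|H|\bigr)
      \;=\; \ln|H| \;-\; H(Q)

  An increasing entropy H(Q) therefore means a shrinking KL term as
  boosting proceeds.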

14
Results for the Exponential Loss
  • For this loss function, we have the expansion
    sketched below.
  • Since c increases exponentially fast with the
    scale parameter of the loss, so does the risk
    bound.
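  A sketch of the expansion, writing the exponential loss with an
  assumed scale parameter γ:

      \zeta_Q(x,y) \;=\; \exp\bigl(-\gamma\, y\, f_Q(x)\bigr)
      \;=\; \sum_{k=0}^{\infty} \frac{(-\gamma)^{k}}{k!}\,\bigl(y\, f_Q(x)\bigr)^{k},
      \qquad
      c \;=\; \sum_{k=1}^{\infty} \frac{\gamma^{k}}{k!} \;=\; e^{\gamma} - 1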

15
Exponential Loss Results (ctn.)
16
Exponential Loss Results (ctn.)
17
Results for the Sigmoid Loss
  • For this loss function, we have the expansion
    sketched below.
  • The Taylor series for tanh(x) converges only for
    |x| < π/2. The scale parameter of the loss is
    thus limited to values below π/2.
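  A sketch, assuming a sigmoid loss built from tanh with an assumed
  scale parameter γ:

      \zeta_Q(x,y) \;=\; \frac{1 - \tanh\bigl(\gamma\, y\, f_Q(x)\bigr)}{2},
      \qquad
      \tanh(u) \;=\; u - \frac{u^{3}}{3} + \frac{2u^{5}}{15} - \cdots
      \quad (|u| < \pi/2)

  Since |y fQ(x)| ≤ 1, convergence of the series requires γ < π/2,
  matching the restriction above.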

18
Sigmoid Loss Results (ctn.)
19
Conclusion
  • We have obtained PAC-Bayesian risk bounds for any
    loss function ζQ having a convergent Taylor
    expansion around WQ = ½.
  • The bound is tight only for small c.
  • On AdaBoost, the loss bound is essentially
    parallel to the true loss.