VC dimension and Bootstrap method

Provided by: umiac7

Transcript and Presenter's Notes

1
VC dimension and Bootstrap method

Elements of Statistical Learning
Graduate Seminar (ENEE698A)
Presented by Xue Mei
Oct. 15, 2003

2
VC dimension

Why do we use the VC dimension?
1. One difficulty in using estimates of in-sample error is the need to specify the number of parameters (or the complexity) d used in the fit.
2. Although the effective number of parameters d(S) = trace(S) can be used as an estimate, it is still not fully general.
3. VC theory provides such a general measure of complexity, and gives associated bounds on the optimism.

3
Shattering

A set of instances S is shattered by a hypothesis space H if, for every dichotomy of S, there is a hypothesis in H consistent with that dichotomy.
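
The definition can be checked by brute force on small examples. The sketch below is an illustration only: it uses closed intervals on the real line as the hypothesis space (a choice made here, not taken from the slides), and the helper names `interval_realizes` and `is_shattered` are hypothetical. It assumes the points are distinct.

```python
from itertools import product

def interval_realizes(points, labels):
    """True if some closed interval [a, b] contains exactly the points labeled 1."""
    positives = [p for p, y in zip(points, labels) if y == 1]
    if not positives:
        # An interval placed away from every point labels them all 0.
        return True
    lo, hi = min(positives), max(positives)
    # [lo, hi] is the tightest candidate; it must not capture a point labeled 0.
    return all(not (lo <= p <= hi) for p, y in zip(points, labels) if y == 0)

def is_shattered(points, realizes):
    """Check every dichotomy (0/1 labeling) of the points against the class."""
    return all(realizes(points, labels)
               for labels in product([0, 1], repeat=len(points)))

two_points = is_shattered([1.0, 2.0], interval_realizes)
three_points = is_shattered([1.0, 2.0, 3.0], interval_realizes)
print(two_points, three_points)
```

Two points are shattered, but three are not: no interval can contain the two outer points while excluding the middle one.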

4
Example Shattering
Is this set of points shattered by the hypothesis
space H of all circles?
5
Example Shattering
6
Is this set of points shattered by circles?
7
How About This One?
8
The VC Dimension

VC dimension of a set of indicator functions
The VC dimension of a set of indicator functions $\{f(x, \alpha)\}$ is the maximum number $h$ of vectors that can be separated into two classes in all $2^h$ possible ways using functions of the set.
VC dimension of a set of real-valued functions
The VC dimension of a class of real-valued functions $\{g(x, \alpha)\}$ is defined to be the VC dimension of the indicator class $\{I(g(x, \alpha) - \beta > 0)\}$, where $\beta$ takes values over the range of $g$.
That is, the VC dimension of the class $\{f(x, \alpha)\}$ is defined to be the largest number of points (in some configuration) that can be shattered by members of $\{f(x, \alpha)\}$.

9
The first three panels show that the class of
lines in the plane can shatter three points. The
last panel shows that this class cannot shatter
four points. Hence the VC dimension is three.
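
A small brute-force check of this claim, as a sketch under stated assumptions: the point coordinates are hypothetical, and the function names are illustrative. The separability test tries, as candidate normal vectors, the difference and the perpendicular of every pair of points, which is enough to find a strictly separating line in the plane whenever one exists. The four-point check covers only one configuration (a square), so it illustrates the result without proving it for all configurations.

```python
from itertools import combinations, product

def linearly_separable(points, labels):
    """Exact test: can a line strictly separate the 1-labeled from the 0-labeled points?"""
    pos = [p for p, y in zip(points, labels) if y == 1]
    neg = [p for p, y in zip(points, labels) if y == 0]
    if not pos or not neg:
        return True  # a line with all points on one side handles this case
    for (ax, ay), (bx, by) in combinations(points, 2):
        dx, dy = bx - ax, by - ay
        # Candidate normals: along the pair, and perpendicular to it.
        for wx, wy in ((dx, dy), (-dy, dx)):
            proj_pos = [wx * x + wy * y for x, y in pos]
            proj_neg = [wx * x + wy * y for x, y in neg]
            if max(proj_pos) < min(proj_neg) or max(proj_neg) < min(proj_pos):
                return True
    return False

def shattered_by_lines(points):
    """True if every dichotomy of the points is realized by some line."""
    return all(linearly_separable(points, labels)
               for labels in product([0, 1], repeat=len(points)))

triangle = [(0, 0), (1, 0), (0, 1)]        # three points in general position
square = [(0, 0), (1, 0), (1, 1), (0, 1)]  # four points; the XOR labeling fails
print(shattered_by_lines(triangle), shattered_by_lines(square))
```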
10
Example Circles

VC = 3, since 3 points can be shattered but not 4

11
Is there an H with VC = ∞?
12
Estimation of in-sample error using VC dimension

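If we fit N training points using a class of functions $\{f(x, \alpha)\}$ having VC dimension $h$, then with probability at least $1 - \eta$ over training sets:
For binary classification:
$\mathrm{Err} \le \overline{\mathrm{err}} + \frac{\epsilon}{2}\left(1 + \sqrt{1 + \frac{4\,\overline{\mathrm{err}}}{\epsilon}}\right)$
For regression:
$\mathrm{Err} \le \frac{\overline{\mathrm{err}}}{(1 - c\sqrt{\epsilon})_+}$
where $\epsilon = a_1 \frac{h[\log(a_2 N/h) + 1] - \log(\eta/4)}{N}$, $\overline{\mathrm{err}}$ is the training error, and $c$ is a constant.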
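
As a numerical illustration, the sketch below evaluates bounds of the form given in Elements of Statistical Learning, assuming $\epsilon = a_1 (h[\log(a_2 N/h) + 1] - \log(\eta/4))/N$, the classification bound $\overline{\mathrm{err}} + (\epsilon/2)(1 + \sqrt{1 + 4\,\overline{\mathrm{err}}/\epsilon})$, and the regression bound $\overline{\mathrm{err}}/(1 - c\sqrt{\epsilon})_+$. The function names, the default $\eta = 0.05$, and the choice $c = 1$ are assumptions made for this example.

```python
import math

def vc_epsilon(n, h, eta=0.05, a1=4.0, a2=2.0):
    """epsilon = a1 * (h * (log(a2 * n / h) + 1) - log(eta / 4)) / n."""
    return a1 * (h * (math.log(a2 * n / h) + 1.0) - math.log(eta / 4.0)) / n

def classification_bound(train_err, n, h, eta=0.05, a1=4.0, a2=2.0):
    """Upper bound on test error for binary classification."""
    eps = vc_epsilon(n, h, eta, a1, a2)
    return train_err + (eps / 2.0) * (1.0 + math.sqrt(1.0 + 4.0 * train_err / eps))

def regression_bound(train_err, n, h, eta=0.05, a1=1.0, a2=1.0, c=1.0):
    """Upper bound for regression; infinite when 1 - c*sqrt(eps) is not positive."""
    eps = vc_epsilon(n, h, eta, a1, a2)
    denom = 1.0 - c * math.sqrt(eps)
    return train_err / denom if denom > 0 else float("inf")

# Hypothetical setting: VC dimension 3, training error 0.10.
for n in (100, 1000, 10000):
    print(n, round(classification_bound(0.10, n, 3), 3))
```

The bound tightens as N grows relative to h, and for small N it can exceed 1, in which case it carries no information.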

13
How to choose $a_1$ and $a_2$?

For classification, Cherkassky and Mulier make no recommendation, with $a_1 = 4$, $a_2 = 2$ corresponding to worst-case scenarios.
For regression they suggest $a_1 = a_2 = 1$.

14
How to use VC dimension

Vapnik's structural risk minimization approach fits a nested sequence of models of increasing VC dimension and then chooses the model with the smallest value of the upper bound.
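
A minimal sketch of this selection rule, assuming the classification bound $\overline{\mathrm{err}} + (\epsilon/2)(1 + \sqrt{1 + 4\,\overline{\mathrm{err}}/\epsilon})$ with $\epsilon = a_1 (h[\log(a_2 N/h) + 1] - \log(\eta/4))/N$. The model names, VC dimensions, and training errors below are hypothetical values chosen for illustration, not results from the presentation.

```python
import math

def vc_upper_bound(train_err, n, h, eta=0.05, a1=4.0, a2=2.0):
    """VC upper bound on test error for binary classification (assumed form)."""
    eps = a1 * (h * (math.log(a2 * n / h) + 1.0) - math.log(eta / 4.0)) / n
    return train_err + (eps / 2.0) * (1.0 + math.sqrt(1.0 + 4.0 * train_err / eps))

def select_by_srm(models, n):
    """models: (name, vc_dimension, training_error) tuples for a nested sequence.

    Returns the name of the model with the smallest upper bound.
    """
    scored = [(vc_upper_bound(err, n, h), name) for name, h, err in models]
    return min(scored)[1]

# Hypothetical nested sequence: training error falls as VC dimension grows.
candidates = [("small", 2, 0.30), ("medium", 4, 0.12), ("large", 10, 0.10)]
print(select_by_srm(candidates, n=200))
```

The largest model has the lowest training error here, but its complexity penalty outweighs that gain, so the bound favors the intermediate model.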

15
Basic idea of Bootstrap

Originally, from some list of data, one computes an object (a statistic).
Create an artificial list of the same length by randomly drawing elements from that list with replacement. Some elements will be picked more than once.
Compute a new object from the artificial list.
Repeat 100-1000 times and look at the distribution of these objects.
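
The steps above can be sketched in a few lines. This is a minimal illustration using only the standard library; the function name `bootstrap_replicates` and the sample data are hypothetical.

```python
import random
import statistics

def bootstrap_replicates(data, statistic, n_boot=1000, seed=0):
    """Recompute `statistic` on n_boot resamples drawn with replacement."""
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    n = len(data)
    replicates = []
    for _ in range(n_boot):
        resample = rng.choices(data, k=n)  # same size as the original list
        replicates.append(statistic(resample))
    return replicates

data = [4.2, 5.1, 3.8, 6.0, 5.5, 4.9, 5.3, 4.4]  # hypothetical measurements
reps = bootstrap_replicates(data, statistics.mean)
print(len(reps), min(reps), max(reps))
```

The list `reps` approximates the sampling distribution of the statistic, so it can be histogrammed or summarized.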

16
Bootstrap
17
A simple example

Data are available comparing GPA and LSAT scores.
There is some linear correlation between them (a high GPA usually means a high LSAT score): r = 0.776.
But how reliable is this result? (What is the probability of the result?)

19

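In mathematical terms
From the original data $x = (x_1, x_2, \ldots, x_N)$, construct a sequence such as
$x^{*b} = (x_{i_1}, x_{i_2}, \ldots, x_{i_N})$, where each index is drawn uniformly at random, with replacement, from $1, \ldots, N$.

$x^{*b}$ is called a bootstrap subsample.
Call $\{x^{*1}, x^{*2}, \ldots, x^{*B}\}$ an ensemble of bootstrap subsamples.
The number of elements in each bootstrap subsample is the same as in the original set, so some elements are repeated.
Having computed the statistic $\hat{\theta}^{*b}$ on each subsample, one can now histogram it and get an idea of the probability distribution. Hence, one can get an idea of the variance, skewness, and so on.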
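
A sketch of this procedure for a correlation coefficient follows. The paired data are synthetic and generated inside the example; they are not the GPA/LSAT data from the earlier slide, and all function names are hypothetical.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation; returns None if either variable is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)

def bootstrap_correlation(pairs, n_boot=1000, seed=0):
    """Correlation recomputed on n_boot bootstrap subsamples of the pairs."""
    rng = random.Random(seed)
    reps = []
    while len(reps) < n_boot:
        sample = rng.choices(pairs, k=len(pairs))  # resample whole pairs
        r = pearson_r([p[0] for p in sample], [p[1] for p in sample])
        if r is not None:  # skip degenerate subsamples
            reps.append(r)
    return reps

def variance_and_skewness(values):
    """Second central moment and standardized third central moment."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m2, m3 / m2 ** 1.5

gen = random.Random(1)
xs = [gen.gauss(0.0, 1.0) for _ in range(15)]    # synthetic data, N = 15
pairs = [(x, x + gen.gauss(0.0, 0.8)) for x in xs]
reps = bootstrap_correlation(pairs)
var, skew = variance_and_skewness(reps)
print(round(math.sqrt(var), 3), round(skew, 3))
```

Resampling whole pairs, rather than each variable separately, preserves the dependence between the two variables in every subsample.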

22
Determine the standard deviation

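Suppose we have an experiment with measurements $x_1, x_2, \ldots, x_N$.
We can compute the average $\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$ and its standard deviation.
But what should we do when the statistic we are observing is something other than the mean, e.g. the median? Here the bootstrap comes in. We use
$\widehat{\mathrm{se}}_B = \sqrt{\frac{1}{B-1}\sum_{b=1}^{B}\left(\hat{\theta}^{*b} - \bar{\theta}^{*}\right)^2}$
to estimate the standard error (the square root of the variance of the statistic),
where $\hat{\theta}^{*b}$ is the statistic computed on the $b$-th bootstrap subsample and $\bar{\theta}^{*} = \frac{1}{B}\sum_{b=1}^{B}\hat{\theta}^{*b}$.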
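
As a sketch of this estimate for the median: the code below takes the sample standard deviation (denominator B - 1) of the bootstrap replicates. The measurements and the function name are hypothetical.

```python
import random
import statistics

def bootstrap_standard_error(data, statistic, n_boot=1000, seed=0):
    """Standard deviation (B - 1 denominator) of the bootstrap replicates."""
    rng = random.Random(seed)
    replicates = [statistic(rng.choices(data, k=len(data)))
                  for _ in range(n_boot)]
    return statistics.stdev(replicates)

# Hypothetical measurements, for illustration only.
measurements = [2.1, 3.4, 1.9, 5.6, 4.4, 3.8, 2.7, 6.1, 3.3, 4.9]
se_median = bootstrap_standard_error(measurements, statistics.median)
se_mean = bootstrap_standard_error(measurements, statistics.mean)
print(round(se_median, 3), round(se_mean, 3))
```

The same function works for any statistic that maps a list to a number, which is the practical appeal of the method.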

23
How many bootstraps ?

There is no clear answer to this. There are many theorems on asymptotic convergence, but no real estimates!
Rule of thumb: try it 100 times, then 1000 times, and see whether your answers have changed by much.
In any case, there are only $N^N$ possible (ordered) subsamples.
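
The rule of thumb can be tried directly. The sketch below uses hypothetical data and hypothetical function names: it counts the ordered subsamples of size N and compares the bootstrap standard error of the median for 100 and 1000 repetitions.

```python
import random
import statistics

def n_ordered_resamples(n):
    """Number of ordered samples of size n drawn with replacement from n items."""
    return n ** n

def se_of_median(data, n_boot, seed=0):
    """Bootstrap standard error of the median using n_boot subsamples."""
    rng = random.Random(seed)
    reps = [statistics.median(rng.choices(data, k=len(data)))
            for _ in range(n_boot)]
    return statistics.stdev(reps)

# Hypothetical measurements, for illustration only.
data = [2.1, 3.4, 1.9, 5.6, 4.4, 3.8, 2.7, 6.1, 3.3, 4.9]
print("possible ordered subsamples:", n_ordered_resamples(len(data)))
for n_boot in (100, 1000):
    print(n_boot, round(se_of_median(data, n_boot), 3))
```

If the two estimates agree to the precision that matters for the application, more repetitions are unlikely to help.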

24
Is it reliable ?

There is good agreement for Normal (Gaussian) distributions. Skewed distributions tend to be more problematic, particularly in the tails (the bootstrap underestimates the errors).

25
Summary of the Bootstrap method

The original object O (a tree, a best fit, ...) is computed from a list of data (numbers, sequences, microarray data, ...).
Construct a new list, with the same number of elements, from the original list by randomly picking elements from the list. Any one element from the list can be picked any number of times.
Compute the new object; call it O1.
Repeat the process many times (typically 100-1000).
The elements O1, O2, ... are assumed to be taken from a statistical distribution, so one can compute averages, variances, etc.

26
Conclusions