VC dimension and Bootstrap method - PowerPoint PPT Presentation

Provided by: umiac7
1
VC dimension and Bootstrap method
  • Elements of Statistical Learning
  • Graduate Seminar (ENEE698A)
  • Presented by Xue Mei
  • Oct. 15, 2003

2
VC dimension
  • Why do we use the VC dimension?
  • 1. One difficulty in using estimates of
    in-sample error is the need to specify the number
    of parameters (or the complexity) d
    used in the fit.
  • 2. Although the effective number of parameters
    d(S) = trace(S) is available as an estimate,
    it is still not fully general.
  • 3. VC theory provides such a general measure
    of complexity, and gives associated
    bounds on the optimism.

3
Shattering
  • A set of instances S is shattered by a hypothesis
    space H if and only if, for every dichotomy of S,
    there is a hypothesis in H consistent with that
    dichotomy.
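
The shattering definition can be checked by brute force when the hypothesis family is small. The sketch below is only an illustration: the helper names (`is_shattered`, `interval_hypotheses`) and the choice of closed intervals on the real line as the hypothesis space are assumptions, not taken from the slides.

```python
from itertools import product

def is_shattered(points, hypotheses):
    """Return True if every dichotomy of `points` is realized by some hypothesis."""
    realized = {tuple(h(x) for x in points) for h in hypotheses}
    return all(labels in realized
               for labels in product([False, True], repeat=len(points)))

def interval_hypotheses(endpoints):
    """Hypothetical finite family: indicators of closed intervals [a, b].

    Pairs with a > b give the empty interval (labels every point False).
    """
    return [lambda x, a=a, b=b: a <= x <= b
            for a in endpoints for b in endpoints]

# Candidate endpoints lie between and beyond the test points, so every
# labeling that any interval can produce on these points is covered.
H = interval_hypotheses([0.5, 1.5, 2.5, 3.5])

two_points_shattered = is_shattered([1, 2], H)       # all 4 dichotomies realizable
three_points_shattered = is_shattered([1, 2, 3], H)  # (True, False, True) is not
print(two_points_shattered, three_points_shattered)
```

Two points are shattered by intervals, but three are not, because no single interval contains the outer two points while excluding the middle one.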

4
Example: Shattering
Is this set of points shattered by the hypothesis
space H of all circles?
5
Example: Shattering
6
Is this set of points shattered by circles?
7
How About This One?
8
The VC Dimension VC
  • VC dimension of a set of indicator functions
  • The VC dimension of a set of indicator
    functions $\{f(x, \alpha)\}$ is the maximum
    number h of vectors $x_1, \dots, x_h$ that can be
    separated into two classes in all $2^h$ possible ways
    using functions of the set.
  • VC dimension of a set of real-valued functions
  • The VC dimension of a class of real-valued
    functions $\{g(x, \alpha)\}$ is defined to be the VC
    dimension of the indicator class
    $\{I(g(x, \alpha) - \beta > 0)\}$,
    where $\beta$ takes values over the range of g.
  • i.e.
  • The VC dimension of the class $\{f(x, \alpha)\}$ is
    defined to be the largest number of points that
    can be shattered by members of $\{f(x, \alpha)\}$.

9
The first three panels show that the class of
lines in the plane can shatter three points. The
last panel shows that this class cannot shatter
four points. Hence the VC dimension is three.
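
The claim for lines in the plane can be tested on specific point configurations. The sketch below is an assumption-laden illustration, not part of the slides: it uses the fact that two finite planar sets are strictly separable by a line exactly when all difference vectors (positive point minus negative point) fit inside an open half-plane. The point sets and function names are made up. It checks only the configurations it is given, so it illustrates the statement rather than proving it for all sets of four points.

```python
import math
from itertools import product

def linearly_separable(neg, pos, tol=1e-12):
    """Strict separability of two finite planar point sets by a line.

    A separating line exists iff some direction w has w.(p - n) > 0 for
    every pair, i.e. all difference vectors fit in an open half-plane.
    """
    diffs = [(p[0] - n[0], p[1] - n[1]) for n in neg for p in pos]
    if not diffs:
        return True   # one class is empty, any far-away line works
    if any(dx == 0 and dy == 0 for dx, dy in diffs):
        return False  # the same point carries both labels
    angles = sorted(math.atan2(dy, dx) for dx, dy in diffs)
    gaps = [b - a for a, b in zip(angles, angles[1:])]
    gaps.append(2 * math.pi - (angles[-1] - angles[0]))  # wrap-around gap
    # Vectors fit in an open half-plane iff some angular gap exceeds pi.
    return max(gaps) > math.pi + tol

def shattered_by_lines(points):
    """Check every dichotomy of `points` for linear separability."""
    for labels in product([0, 1], repeat=len(points)):
        neg = [p for p, y in zip(points, labels) if y == 0]
        pos = [p for p, y in zip(points, labels) if y == 1]
        if not linearly_separable(neg, pos):
            return False
    return True

three = [(0, 0), (1, 0), (0, 1)]            # non-collinear triple (illustrative)
square = [(0, 0), (1, 0), (0, 1), (1, 1)]   # the XOR labeling fails here
print(shattered_by_lines(three), shattered_by_lines(square))
```
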
10
Example: Circles
  • VC = 3, since 3 points can be shattered but not 4

11
Is there an H with VC = ∞?
12
Estimation of in-sample error using VC dimension
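  • If we fit N training points using a class of
    functions $\{f(x, \alpha)\}$ having VC
    dimension h, then with probability at least
    $1 - \eta$ over training sets:
  • For binary classification
    $\mathrm{Err} \le \overline{\mathrm{err}} + \frac{\epsilon}{2}\left(1 + \sqrt{1 + \frac{4\,\overline{\mathrm{err}}}{\epsilon}}\right)$
  • For regression
    $\mathrm{Err} \le \frac{\overline{\mathrm{err}}}{\left(1 - c\sqrt{\epsilon}\right)_+}$
  • Where
    $\epsilon = a_1 \frac{h\left[\log(a_2 N / h) + 1\right] - \log(\eta / 4)}{N}$,
    $\overline{\mathrm{err}}$ is the training error, c is a constant,
    and $0 < a_1 \le 4$, $0 < a_2 \le 2$.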

13
How to choose a1 and a2?
  • For classification, the authors of these bounds make no
    recommendation, with a1 = 4, a2 = 2 corresponding to worst-case
    scenarios
  • For regression they suggest a1 = a2 = 1
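
To see how the constants affect the bounds, the sketch below evaluates them numerically. It assumes the form given in Elements of Statistical Learning: epsilon = a1 (h [log(a2 N / h) + 1] - log(eta / 4)) / N, the classification bound err + (epsilon / 2)(1 + sqrt(1 + 4 err / epsilon)), and the regression bound err / (1 - c sqrt(epsilon))_+. The function names, the default eta = 0.05, and c = 1 are illustrative assumptions.

```python
import math

def vc_epsilon(N, h, eta, a1, a2):
    """epsilon = a1 * (h * (log(a2 * N / h) + 1) - log(eta / 4)) / N"""
    return a1 * (h * (math.log(a2 * N / h) + 1.0) - math.log(eta / 4.0)) / N

def classification_bound(train_err, N, h, eta=0.05, a1=4.0, a2=2.0):
    """Upper bound on test error for binary classification (worst-case a1, a2)."""
    eps = vc_epsilon(N, h, eta, a1, a2)
    return train_err + 0.5 * eps * (1.0 + math.sqrt(1.0 + 4.0 * train_err / eps))

def regression_bound(train_err, N, h, eta=0.05, a1=1.0, a2=1.0, c=1.0):
    """Upper bound on test error for regression; c = 1 is an assumed value."""
    eps = vc_epsilon(N, h, eta, a1, a2)
    denom = 1.0 - c * math.sqrt(eps)
    # The positive-part denominator makes the bound vacuous when it hits zero.
    return train_err / denom if denom > 0 else math.inf

# Made-up numbers: training error 0.1, N = 1000 points, VC dimension 10.
print(classification_bound(0.1, 1000, 10))
print(regression_bound(0.1, 1000, 10))
```

With the worst-case classification constants the bound is loose at this sample size, which is consistent with the slide's remark that a1 = 4, a2 = 2 correspond to worst-case scenarios.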

14
How to use VC dimension
  • Vapnik's structural risk minimization approach
    fits a nested sequence of models of increasing VC
    dimensions $h_1 < h_2 < \cdots$ and then chooses the
    model with the smallest value of the upper bound.
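
A minimal sketch of that selection step follows. It assumes the binary classification bound err + (epsilon / 2)(1 + sqrt(1 + 4 err / epsilon)) with epsilon = a1 (h [log(a2 N / h) + 1] - log(eta / 4)) / N. The list of (VC dimension, training error) pairs is made up, and `select_by_srm` is a hypothetical helper name.

```python
import math

def upper_bound(train_err, N, h, eta=0.05, a1=4.0, a2=2.0):
    """Classification bound on test error for a model class of VC dimension h."""
    eps = a1 * (h * (math.log(a2 * N / h) + 1.0) - math.log(eta / 4.0)) / N
    return train_err + 0.5 * eps * (1.0 + math.sqrt(1.0 + 4.0 * train_err / eps))

def select_by_srm(models, N):
    """models: list of (vc_dimension, training_error) pairs.

    Returns the pair whose upper bound is smallest.
    """
    return min(models, key=lambda m: upper_bound(m[1], N, m[0]))

# Made-up nested sequence: training error falls as the VC dimension grows.
models = [(2, 0.30), (5, 0.18), (10, 0.12), (50, 0.05), (200, 0.01)]
best = select_by_srm(models, N=5000)
print(best)
```

The richest model has the lowest training error but also the largest complexity penalty, so the minimum of the bound falls on an intermediate model.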

15
Basic idea of Bootstrap
  • Originally, from some list of data, one computes
    an object.
  • Create an artificial list by randomly drawing
    elements from that list. Some elements will be
    picked more than once.
  • Compute a new object.
  • Repeat 100-1000 times and look at the
    distribution of these objects.
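
The steps above can be sketched in a few lines. This is a minimal illustration using only the Python standard library. The data list is made up, the "object" is taken to be the median, and `bootstrap` is a hypothetical helper name.

```python
import random
import statistics

def bootstrap(data, statistic, n_boot=1000, seed=0):
    """Recompute `statistic` on n_boot lists drawn with replacement from `data`."""
    rng = random.Random(seed)
    n = len(data)
    replicates = []
    for _ in range(n_boot):
        resample = rng.choices(data, k=n)  # some elements are picked more than once
        replicates.append(statistic(resample))
    return replicates

data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9, 3.7, 3.0]  # made-up list
medians = bootstrap(data, statistics.median)

print("median of original list:", statistics.median(data))
print("spread of bootstrap medians:", statistics.stdev(medians))
```
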

16
Bootstrap
17
A simple example
  • Data are available comparing GPA and LSAT scores.
  • Some linear correlation between the scores (high GPA
    usually means high LSAT): r = 0.776.
  • But how reliable is this result? (What is the
    probability of the result?)
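
One way to probe the reliability of a correlation coefficient is to resample the (x, y) pairs and recompute r each time. The sketch below does this on synthetic paired data, which stands in for the GPA/LSAT table (the real values are not in this transcript). The sample size of 15, the generating model, and the 95% percentile interval are all assumptions made for illustration.

```python
import math
import random

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(1)
# Synthetic stand-in for paired scores (NOT the slide's GPA/LSAT data).
x = [rng.gauss(600, 40) for _ in range(15)]
y = [3.0 + 0.005 * (xi - 600) + rng.gauss(0, 0.15) for xi in x]

r_hat = pearson_r(x, y)
r_boot = []
for _ in range(2000):
    idx = [rng.randrange(len(x)) for _ in x]    # resample pairs together
    if len(set(idx)) > 1:                       # skip degenerate resamples
        r_boot.append(pearson_r([x[i] for i in idx], [y[i] for i in idx]))

r_boot.sort()
lo = r_boot[int(0.025 * len(r_boot))]
hi = r_boot[int(0.975 * len(r_boot))]
print(f"r = {r_hat:.3f}, bootstrap 95% percentile interval = ({lo:.3f}, {hi:.3f})")
```

Resampling indices rather than the x and y lists separately keeps each pair intact, which is what preserves the correlation structure.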

18
(No Transcript)
19
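  • In mathematical terms: given the data
    $x = (x_1, x_2, \dots, x_N)$
  • Construct the following sequence
    $x^* = (x_{i_1}, x_{i_2}, \dots, x_{i_N})$, with each index
    $i_k$ drawn at random from $\{1, \dots, N\}$
  • (for example, $x^* = (x_3, x_1, x_3, \dots)$)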

20
  • The resampled list $x^*$ is called a bootstrap subsample
  • Call $x^{*1}, x^{*2}, \dots, x^{*B}$ an ensemble of B bootstrap
    subsamples
  • The number of elements in each bootstrap
    subsample is the same as in the original set, so
    some elements are repeated.
  • Having computed the statistic $s(x^{*b})$ on each subsample,
    one can now histogram it and get an idea of its probability
    distribution. Hence, one can get an idea of the
    variance, skewness, ...
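
The index-based construction and the summary of the resulting distribution might look like the sketch below. The data are a made-up skewed sample, the statistic is the mean, and the text histogram is only a rough stand-in for a plot.

```python
import random
import statistics
from collections import Counter

rng = random.Random(42)
data = [rng.expovariate(1.0) for _ in range(30)]   # made-up skewed sample
N, B = len(data), 2000

replicates = []
for _ in range(B):
    idx = [rng.randrange(N) for _ in range(N)]     # random index sequence
    subsample = [data[i] for i in idx]             # same size as the original
    replicates.append(statistics.fmean(subsample))

m = statistics.fmean(replicates)
var = statistics.pvariance(replicates, m)
skew = sum((r - m) ** 3 for r in replicates) / (B * var ** 1.5)
print(f"variance = {var:.4f}, skewness = {skew:.3f}")

# Crude text histogram with 10 equal-width bins.
lo, hi = min(replicates), max(replicates)
width = (hi - lo) / 10
bins = Counter(min(int((r - lo) / width), 9) for r in replicates)
for k in range(10):
    print(f"{lo + k * width:6.3f} | {'#' * (bins[k] // 20)}")
```
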

21
(No Transcript)
22
Determine the standard deviation
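  • Suppose we have an experiment with measurements $x_1, \dots, x_N$.
    We can compute the average $\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$
    and its standard deviation.
    But what do we do when the statistic we are
    observing is something other than the mean, e.g.
    the median? Here the bootstrap comes in. We use
    $\widehat{\mathrm{se}}_B = \sqrt{\frac{1}{B-1}\sum_{b=1}^{B}\left(s(x^{*b}) - \bar{s}^*\right)^2}$
  • to estimate the standard error (the square
    root of the variance of the statistic)
  • Where $s(x^{*b})$ is the statistic computed on the b-th
    bootstrap subsample and
    $\bar{s}^* = \frac{1}{B}\sum_{b=1}^{B} s(x^{*b})$
    is the mean of the B bootstrap replicates.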
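
As a hedged sketch of this estimate: the bootstrap standard error is the sample standard deviation (with the 1/(B-1) normalization) of the statistic recomputed on B resamples. The measurements below are simulated, and `bootstrap_se` is a hypothetical helper. For the mean, the result can be compared against the familiar s / sqrt(N) formula as a sanity check.

```python
import math
import random
import statistics

def bootstrap_se(data, statistic, n_boot=2000, seed=0):
    """Bootstrap standard error of `statistic` from n_boot resamples."""
    rng = random.Random(seed)
    reps = [statistic(rng.choices(data, k=len(data))) for _ in range(n_boot)]
    return statistics.stdev(reps)  # uses the 1/(B-1) normalization

rng = random.Random(7)
data = [rng.gauss(10.0, 2.0) for _ in range(50)]   # made-up measurements

se_mean_formula = statistics.stdev(data) / math.sqrt(len(data))
se_mean_boot = bootstrap_se(data, statistics.mean)
se_median_boot = bootstrap_se(data, statistics.median)

print(f"mean:   formula {se_mean_formula:.3f}, bootstrap {se_mean_boot:.3f}")
print(f"median: bootstrap {se_median_boot:.3f}")
```

The median has no equally simple closed-form standard error, which is the case the slide has in mind.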

23
How many bootstraps?
  • No clear answer to this. Lots of theorems on
    asymptotic convergence, but no real estimates!
  • Rule of thumb: try it 100 times, then 1000
    times, and see if your answers have changed by
    much.
  • In any case, there are only $N^N$ possible subsamples
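
The rule of thumb is easy to try out. The sketch below uses a made-up sample of N = 25 values and takes the standard error of the median as the quantity of interest. It compares the estimate from 100 and from 1000 resamples, and also counts the N^N ordered draws.

```python
import random
import statistics

def bootstrap_se_median(data, n_boot, seed):
    """Bootstrap standard error of the median from n_boot resamples."""
    rng = random.Random(seed)
    reps = [statistics.median(rng.choices(data, k=len(data)))
            for _ in range(n_boot)]
    return statistics.stdev(reps)

rng = random.Random(3)
data = [rng.gauss(0.0, 1.0) for _ in range(25)]    # made-up sample

se_100 = bootstrap_se_median(data, 100, seed=11)
se_1000 = bootstrap_se_median(data, 1000, seed=12)
relative_change = abs(se_1000 - se_100) / se_1000

n_ordered_resamples = len(data) ** len(data)       # N^N ordered draws

print(f"SE with 100 resamples:  {se_100:.3f}")
print(f"SE with 1000 resamples: {se_1000:.3f}")
print(f"relative change: {relative_change:.1%}")
print(f"N^N = {n_ordered_resamples:.3e}")
```

Even for N = 25 the number of possible ordered resamples is astronomically larger than any practical number of bootstrap repetitions.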

24
Is it reliable?
  • Good agreement for Normal (Gaussian)
    distributions; skewed distributions tend to be more
    problematic, particularly in the tails (the
    bootstrap underestimates the errors).

25
Summary of the Bootstrap method
  • The original object O (a tree, a best fit, ...) is
    computed from a list of data (numbers,
    sequences, microarray data, ...).
  • Construct a new list, with the same number of
    elements, from the original list by randomly
    picking elements from the list. Any one element
    from the list can be picked any number of times.
  • Compute new object, call it O1
  • Repeat the process many times (typically
    100-1000).
  • The elements O1, O2, ... are assumed to be
    taken from a statistical distribution, so one can
    compute averages, variances, etc.

26
Conclusions
  • The VC dimension and the bootstrap method play important
    roles in statistical learning theory.
  • The bootstrap is also very useful in video and
    acoustic tracking.