Bootstrap - PowerPoint PPT Presentation

1
Bootstrap
  • Chingchun Huang
  • Vision Lab, NCTU

2
Introduction
  • A data-based simulation method
  • For statistical inference
  • Finding estimators of the parameter of interest
  • Confidence intervals for the parameter of interest

3
An example
  • Two statistics defined for a random variable
  • Average: the sample mean
  • Standard error: the standard deviation of the
    sample means
  • Calculation of the two statistics
  • Carry out the measurement many times
  • Observations on these two statistics
  • The standard error decreases as N increases
  • The sample mean becomes more reliable as N increases
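As a minimal sketch of these two statistics (the data here are hypothetical uniform draws, and the function name is my own), one can repeat an N-observation measurement many times and watch the standard error shrink as N grows:

```python
import random
import statistics

def sample_mean_and_se(n_obs, n_repeats=2000, seed=0):
    """Repeat an n_obs-observation measurement n_repeats times;
    return the average of the sample means and their standard
    deviation (the standard error of the mean)."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.random() for _ in range(n_obs))
             for _ in range(n_repeats)]
    return statistics.fmean(means), statistics.stdev(means)

for n in (10, 100, 1000):
    avg, se = sample_mean_and_se(n)
    print(f"N={n:4d}  sample mean ~ {avg:.3f}  standard error ~ {se:.4f}")
```

For Uniform(0,1) data the standard error should fall roughly as 1/sqrt(12N), illustrating the "more reliable as N increases" observation.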

4
Central limit theorem
  • Averages taken from any distribution (your
    experimental data) will have a normal
    distribution
  • The error of such a statistic will decrease
    slowly as the number of observations increases

5
[Figures: a normal distribution and averages of the
normal distribution; a χ² distribution and averages
of the χ² distribution]
6
[Figures: a uniform distribution and averages of the
uniform distribution]
7
Consequences of the central limit theorem
  • But nobody tells you how big the sample has to
    be...
  • Should we believe a measurement of the average?
  • What about statistics other than the average?

Bootstrap --- the technique to the rescue
8
Basic idea of bootstrap
  • Originally, from some list of data, one computes
    an object (e.g. statistic).
  • Create an artificial list by randomly drawing
    elements from that list. Some elements will be
    picked more than once.
  • Nonparametric mode (later)
  • Parametric mode (later)
  • Compute a new object.
  • Repeat 100-1000 times and look at the
    distribution of these objects.
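The steps above can be sketched as follows (nonparametric mode, with hypothetical data; the `bootstrap` helper name is my own):

```python
import random
import statistics

def bootstrap(data, statistic, n_boot=1000, seed=0):
    """Nonparametric bootstrap: draw len(data) elements from `data`
    with replacement (some elements picked more than once), recompute
    `statistic`, and repeat n_boot times."""
    rng = random.Random(seed)
    n = len(data)
    return [statistic([rng.choice(data) for _ in range(n)])
            for _ in range(n_boot)]

data = [2.1, 3.4, 1.8, 5.0, 2.7, 4.2, 3.9, 2.2, 4.8, 3.1]
reps = bootstrap(data, statistics.median)
print("bootstrap SE of the median:", round(statistics.stdev(reps), 3))
```

The distribution of the `reps` values is then examined, e.g. its spread estimates the standard error of the statistic.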

9
A simple example
  • Data are available comparing grades before and
    after leaving graduate school
  • There is some linear correlation between the
    grades: r = 0.776
  • But how reliable is this result (r = 0.776)?
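The reliability question can be answered by bootstrapping the correlation coefficient. Below is a minimal sketch with hypothetical before/after grades (the slide's actual data are not shown, so the numbers and helper names here are my own); the key point is that (x, y) pairs are resampled together:

```python
import random
import statistics

def pearson_r(xs, ys):
    """Pearson linear correlation coefficient."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs) *
           sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den

def bootstrap_r(xs, ys, n_boot=1000, seed=0):
    """Resample (x, y) pairs with replacement and recompute r."""
    rng = random.Random(seed)
    n = len(xs)
    reps = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        reps.append(pearson_r([xs[i] for i in idx],
                              [ys[i] for i in idx]))
    return reps

# Hypothetical grades, not the slide's data
before = [3.2, 2.8, 3.9, 3.5, 2.5, 3.7, 3.0, 3.3, 2.9, 3.6]
after  = [3.4, 2.6, 3.8, 3.6, 2.7, 3.9, 2.8, 3.5, 3.0, 3.7]
reps = bootstrap_r(before, after)
print(f"r = {pearson_r(before, after):.3f}, "
      f"bootstrap SE = {statistics.stdev(reps):.3f}")
```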

10-13
A simple example (figure slides; no transcript)
14
Confidence intervals
  • Consider a situation similar to the one before
  • The parameter of interest is θ (e.g., the mean)
  • θ̂ is an estimator of θ based on the sample
  • We are interested in finding a confidence
    interval for the parameter

15
The percentile algorithm
  • Input the level 2α for the confidence
    interval.
  • Generate B bootstrap samples.
  • Compute θ̂*(b) for b = 1, …, B.
  • Arrange the θ̂*(b) values in increasing order.
  • Compute the α and (1 − α) percentiles of
    the ordered values.
  • The C.I. is given by the (α·100)th and the
    ((1 − α)·100)th percentile values.
  • Percentiles of θ̂*:
    Percentile  5     10    16    50    84     90     95
    Value       49.7  56.4  62.7  86.9  112.3  118.7  126.7
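The percentile algorithm can be sketched in a few lines (hypothetical data; the helper name is my own, and the argument `alpha` here is the total level, i.e. 2α in the slide's notation):

```python
import random
import statistics

def percentile_ci(data, statistic, alpha=0.05, n_boot=1000, seed=0):
    """Percentile-method C.I.: sort the bootstrap replications and
    read off the alpha/2 and 1 - alpha/2 quantiles."""
    rng = random.Random(seed)
    n = len(data)
    reps = sorted(statistic([rng.choice(data) for _ in range(n)])
                  for _ in range(n_boot))
    lo = reps[int(alpha / 2 * n_boot)]
    hi = reps[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Hypothetical measurements (not the slide's data)
data = [61.0, 88.0, 72.5, 95.2, 110.4, 53.7, 99.8, 84.1, 120.3, 77.6]
lo, hi = percentile_ci(data, statistics.fmean, alpha=0.10)
print(f"90% C.I. for the mean: ({lo:.1f}, {hi:.1f})")
```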

16
How many bootstraps ?
  • No clear answer to this.
  • Rule of thumb: try it 100 times, then 1000
    times, and see whether your answers have changed
    by much.
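The rule of thumb can be checked directly by recomputing the bootstrap standard error for increasing B (hypothetical data; the helper name is my own):

```python
import random
import statistics

def boot_se(data, statistic, n_boot, seed=0):
    """Standard error of `statistic` estimated from n_boot
    bootstrap replications."""
    rng = random.Random(seed)
    n = len(data)
    reps = [statistic([rng.choice(data) for _ in range(n)])
            for _ in range(n_boot)]
    return statistics.stdev(reps)

# Hypothetical data
data = [12.1, 9.8, 14.3, 11.0, 10.5, 13.2, 9.1, 12.8, 11.7, 10.2]
for b in (100, 1000, 5000):
    print(f"B={b:5d}  bootstrap SE of the mean = "
          f"{boot_se(data, statistics.fmean, b):.4f}")
```

When the estimate stops moving as B grows (as in the table on the next slide), B is large enough.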

17
How many bootstraps ?
B           50      100     200     500     1000    2000    3000
Std. Error  0.0869  0.0804  0.0790  0.0745  0.0759  0.0756  0.0755
18
Convergence
  • These histograms show the distribution of the
    correlation coefficient over the bootstrap
    samples, here for B = 200 and B = 500.

19
Cont'd
  • B = 1000, B = 2000

20
Cont'd
  • B = 3000, B = 4000
  • Now it can be seen that the sampling distributions
    of the correlation coefficient are more or less
    identical.

21
Cont'd
  • The graph above shows the similarity between the
    bootstrap distribution and direct enumeration
    from random samples drawn from the empirical
    distribution.

22
Is it reliable ?
  • Observations:
  • Good agreement for normal (Gaussian)
    distributions
  • Skewed distributions tend to be more problematic,
    particularly in the tails
  • A tip: for now, nobody is going to shoot you down
    for using it.

23
Schematic representation of bootstrap procedure
24
Bootstrap
  • The bootstrap can be used either
    non-parametrically or parametrically
  • In nonparametric mode, it avoids restrictive and
    sometimes dangerous parametric assumptions about
    the form of the underlying population.
  • In parametric mode it can provide more accurate
    estimates of errors than traditional methods.

25
Parametric Bootstrap
Real world: unknown distribution P → observed
sample x → statistic of interest θ̂(x)
Bootstrap world: estimated probability model P̂ →
bootstrap sample x* → bootstrap replication θ̂(x*)
26
Bootstrap
  • The technique was extended, modified and refined
    to handle a wide variety of problems, including:
  • (1) confidence intervals and hypothesis tests,
  • (2) linear and nonlinear regression,
  • (3) time series analysis and other problems

27
Example: one-dimensional smoothing
Fit a cubic spline (N = 50 training data)
28
The bootstrap and maximum likelihood method
Least squares: μ̂(x) = Σj β̂j hj(x), where the hj(x)
are the spline basis functions and
β̂ = (HᵀH)⁻¹Hᵀy, with Hij = hj(xi)
29
The bootstrap and maximum likelihood method
Nonparametric bootstrap: Repeat B = 200
times: draw a dataset of N = 50 with replacement
from the training data zi = (xi, yi); fit a cubic
spline.
Construct a 95% pointwise confidence
interval: at each xi compute the mean and find
the 2.5 and 97.5 percentiles.
30
The bootstrap and maximum likelihood method
Parametric bootstrap: We assume that the
model errors are Gaussian. Fit a cubic spline on
the training data zi = (xi, yi) and estimate σ̂.
Repeat B = 200 times: simulate new responses
zi* = (xi, yi*) with yi* = μ̂(xi) + εi*,
εi* ~ N(0, σ̂²); fit a cubic spline on the zi*.
Construct a 95% pointwise confidence interval: at
each xi compute the mean and find the 2.5 and
97.5 percentiles.
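The slides fit a cubic spline; as a minimal self-contained sketch I use a straight-line least-squares fit instead (hypothetical data, function names my own), but the parametric-bootstrap logic is the same: fit, estimate the Gaussian error scale, simulate new responses at the same x's, refit:

```python
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least squares fit y = a + b*x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

def parametric_bootstrap(xs, ys, n_boot=200, seed=0):
    """Parametric bootstrap: fit the model, estimate sigma from the
    residuals, simulate Gaussian responses at the same x's, refit."""
    rng = random.Random(seed)
    a, b = fit_line(xs, ys)
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    sigma = statistics.stdev(resid)
    reps = []
    for _ in range(n_boot):
        y_star = [a + b * x + rng.gauss(0.0, sigma) for x in xs]
        reps.append(fit_line(xs, y_star))
    return reps

# Hypothetical training data
xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
ys = [0.2, 1.1, 2.3, 2.9, 4.2, 5.1, 5.8, 7.2, 8.1, 8.8]
reps = parametric_bootstrap(xs, ys)
slopes = sorted(b for _, b in reps)
print("95% C.I. for the slope:",
      (round(slopes[int(0.025 * len(slopes))], 3),
       round(slopes[int(0.975 * len(slopes)) - 1], 3)))
```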
31
The bootstrap and maximum likelihood method
Parametric bootstrap
Conclusion: least squares = parametric
bootstrap as B → ∞ (only because of the Gaussian
errors)
32
Some remarks
  • The bootstrap is
  • A computer-based method for assigning measures of
    accuracy to statistical estimates.
  • The basic idea behind the bootstrap is very simple,
    and goes back at least two centuries.
  • The bootstrap method is not a way of reducing
    the error! It only tries to estimate it.
  • Bootstrap methods depend only on the bootstrap
    samples; they do not depend on the underlying
    distribution.

33
A general data set-up
  • We have dealt with
  • The standard error
  • The confidence interval
  • under the assumption that the distribution is
    either unknown or very complicated.
  • The situation can be more general:
  • e.g., regression,
  • sometimes using maximum likelihood estimation.

34
Conclusion
  • The bootstrap allows the data analyst to
  • Assess the statistical accuracy of complicated
    procedures, by exploiting the power of the
    computer.
  • The use of the bootstrap either
  • Relieves the analyst from having to do complex
    mathematical derivations, or
  • Provides an answer where no analytical answer can
    be obtained.

35
Addendum The Jack-knife
  • Jack-knife is a special kind of bootstrap.
  • Each bootstrap subsample has all but one of the
    original elements of the list.
  • For example, if the original list has 10 elements,
    then there are 10 jack-knife subsamples.
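A minimal jack-knife sketch (hypothetical data; helper name my own): recompute the statistic with each element left out, then scale the spread of the leave-one-out values:

```python
import statistics

def jackknife_se(data, statistic):
    """Jack-knife standard error: leave each element out once,
    recompute the statistic, and scale the spread by (n-1)/n."""
    n = len(data)
    loo = [statistic(data[:i] + data[i + 1:]) for i in range(n)]
    mean_loo = statistics.fmean(loo)
    var = (n - 1) / n * sum((v - mean_loo) ** 2 for v in loo)
    return var ** 0.5

data = [2.1, 3.4, 1.8, 5.0, 2.7, 4.2, 3.9, 2.2, 4.8, 3.1]
print("jack-knife SE of the mean:",
      round(jackknife_se(data, statistics.fmean), 4))
```

For the mean, the jack-knife standard error reduces exactly to the textbook s/√n.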

36
Introduction (continued)
  • Definition of Efron's nonparametric bootstrap:
  • Given a sample of n independent identically
    distributed (i.i.d.) observations X1, X2, …, Xn from
    a distribution F, and a parameter θ of the
    distribution F with a real-valued estimator
    θ̂(X1, X2, …, Xn), the bootstrap estimates the
    accuracy of the estimator by replacing F with Fn,
    the empirical distribution, where Fn places
    probability mass 1/n at each observation Xi.

37
Introduction (continued)
  • Let X1*, X2*, …, Xn* be a bootstrap sample, that
    is, a sample of size n taken with replacement from
    Fn.
  • The bootstrap estimates the variance of
  • θ̂(X1, X2, …, Xn) by computing or approximating
    the variance of
  • θ̂* = θ̂(X1*, X2*, …, Xn*).

38
Introduction (continued)
  • The bootstrap is similar to earlier techniques
    which are also called resampling methods:
  • (1) the jackknife,
  • (2) cross-validation,
  • (3) the delta method,
  • (4) permutation methods, and
  • (5) subsampling.

39
Bootstrap Remedies
  • In the past decade, researchers have found
    remedies for many of the problems where the
    bootstrap is inconsistent, giving modified
    bootstrap solutions that are consistent.
  • For both problems described thus far, a simple
    procedure called the m-out-of-n bootstrap has been
    shown to lead to consistent estimates.

40
The m-out-of-n Bootstrap
  • This idea was proposed by Bickel and Ren (1996)
    for handling doubly censored data.
  • Instead of sampling n times with replacement from
    a sample of size n, they suggest doing it only m
    times, where m is much less than n.
  • To get the consistency results, both m and n need
    to get large, but at different rates: we need
    m = o(n), that is, m/n → 0 as m and n both → ∞.
  • This method leads to consistent bootstrap
    estimates in many cases where the ordinary
    bootstrap has problems, particularly (1) the mean
    with infinite variance and (2) extreme value
    distributions.
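A minimal sketch of the m-out-of-n idea (hypothetical data; helper name my own), applied to the maximum, which is one of the cases where the ordinary bootstrap fails:

```python
import random
import statistics

def m_out_of_n_bootstrap(data, statistic, m, n_boot=1000, seed=0):
    """m-out-of-n bootstrap: each replication resamples only m
    points (m much less than n) with replacement."""
    rng = random.Random(seed)
    return [statistic([rng.choice(data) for _ in range(m)])
            for _ in range(n_boot)]

rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) for _ in range(1000)]
m = int(len(data) ** 0.5)      # m = sqrt(n), so m/n -> 0 as n grows
reps = m_out_of_n_bootstrap(data, max, m)
print("spread of the resampled maxima:",
      round(statistics.stdev(reps), 3))
```

Choosing m to grow like sqrt(n) satisfies m = o(n), the rate condition stated above.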

Don't know why.
41
Examples where the bootstrap fails
  • Athreya (1987) shows that the bootstrap estimate
    of the sample mean is inconsistent when the
    population distribution has an infinite variance.
  • Angus (1993) provides similar inconsistency
    results for the maximum and minimum of a sequence
    of independent identically distributed
    observations.
