Introduction to Bootstrap Estimation - PowerPoint PPT Presentation

Provided by: GOOD48

Transcript and Presenter's Notes

Title: Introduction to Bootstrap Estimation

1
Introduction to Bootstrap Estimation
2
What is the Bootstrap Method?

Given a sample with n observations, quantify the
uncertainty in any parameter estimate (e.g.,
mean, variance, percentile value, etc.)
All Bootstrap methods involve generating
hypothetical samples from the original sample
Each hypothetical sample, called a Bootstrap
Sample, represents a potential set of
observations that COULD be obtained, if we
resampled the population
Use Monte Carlo simulation to generate MANY
Bootstrap Samples (B > 5000)

3
Parametric vs. Distribution Free

Parametric Bootstrap
Select and fit a probability distribution to the
sample data. Generate bootstrap samples (B) from
this distribution.
Non-parametric (Distribution-Free) Bootstrap
Generate bootstrap samples (B) directly from the
sample data. Randomly sample with replacement.
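
The two approaches above can be sketched in a few lines. This is a minimal illustration, not the presenter's implementation: the data are the five values used in the Example slide, the normal family chosen for the parametric case is an assumption made only for illustration, and the function names are hypothetical.

```python
import random
import statistics

def nonparametric_sample(data, rng):
    """Draw one bootstrap sample: n values taken from data with replacement."""
    return rng.choices(data, k=len(data))

def parametric_sample(data, rng):
    """Fit a distribution to data, then draw n values from the fitted distribution.

    A normal distribution is assumed here purely for illustration.
    """
    mu = statistics.mean(data)
    sigma = statistics.stdev(data)
    return [rng.gauss(mu, sigma) for _ in data]

rng = random.Random(0)        # fixed seed so the run is repeatable
data = [23, 28, 30, 50, 61]   # sample from the Example slide
B = 5000                      # number of bootstrap samples

nonparam = [nonparametric_sample(data, rng) for _ in range(B)]
param = [parametric_sample(data, rng) for _ in range(B)]
```

Every value in a non-parametric bootstrap sample is one of the observed values, while a parametric sample can contain values that were never observed.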

4
Distribution-Free Approaches

Bootstrap Methods
involve resampling from the original data set
many times, tracking the parameter estimates for
each iteration, and developing a PDFu (a
probability distribution for uncertainty).
are statistically sound and do not require
assumptions about PDFv (the distribution for
variability)
suffer from the same limitations as other methods
if the sample is not representative

5
Parameter Uncertainty
Parameter Estimates for each 1-D MCA Simulation
Soil Ingestion ~ Lognormal(m, s)
m ~ uniform(min, max)
s ~ triang(min, mode, max)
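
One way to read this slide is as a two-level simulation: an outer loop draws the uncertain parameters m and s, and each draw drives one 1-D simulation of soil ingestion. The sketch below assumes that reading. All numeric bounds are hypothetical placeholders, and m and s are assumed to be log-scale parameters; neither assumption comes from the slide.

```python
import random
import statistics

def one_simulation(n_draws, rng):
    """Run one 1-D simulation with a single draw of the uncertain parameters."""
    m = rng.uniform(3.0, 4.0)            # m ~ uniform(min, max), hypothetical bounds
    s = rng.triangular(0.4, 1.0, 0.6)    # s ~ triang(min, mode, max); arguments are (low, high, mode)
    draws = [rng.lognormvariate(m, s) for _ in range(n_draws)]
    return m, s, statistics.mean(draws)

rng = random.Random(1)
# Each outer iteration yields one set of parameter values and one simulated mean.
results = [one_simulation(1000, rng) for _ in range(200)]
simulated_means = [mean for _, _, mean in results]
```

The spread of `simulated_means` reflects uncertainty in the parameters, separate from the variability within each simulation.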
6
Example

Original Sample (n = 5)

23, 28, 30, 50, 61

Bootstrap Samples (B = 5000)

Sample 1: 28, 50, 30, 23, 23
Sample 2: 30, 50, 50, 61, 28
Sample 3: 61, 23, 30, 23, 28
. . .
Sample 5000: 28, 50, 30, 61, 30
7
Uncertainty in the Mean

Original Sample (n = 5)

23, 28, 30, 50, 61
Mean = 38.4

Bootstrap Samples (B = 5000)

Sample 1: 28, 50, 30, 23, 23 (mean 30.8)
Sample 2: 30, 50, 50, 61, 28 (mean 43.8)
Sample 3: 61, 23, 30, 23, 28 (mean 33.0)
. . .
Sample 5000: 28, 50, 30, 61, 30 (mean 39.8)
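
The spread of the bootstrap sample means is what quantifies uncertainty in the mean. A minimal sketch using the five observations from this slide follows; the seed and variable names are arbitrary choices, not part of the presentation.

```python
import random
import statistics

data = [23, 28, 30, 50, 61]   # original sample from the slide
B = 5000                      # number of bootstrap samples
rng = random.Random(42)       # fixed seed for repeatability

sample_mean = statistics.mean(data)

# One mean per bootstrap sample; together they approximate the
# sampling distribution of the mean.
boot_means = [
    statistics.mean(rng.choices(data, k=len(data)))
    for _ in range(B)
]

boot_se = statistics.stdev(boot_means)   # bootstrap standard error of the mean
```

Because each resample only reuses observed values, every bootstrap mean lies between the smallest and largest observation.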
8
Confidence Intervals

CIs for parameter estimates are calculated using
statistics from the original sample and the
bootstrap samples
Many different methods are available
complexity vs. accuracy
Difference in results will depend on sample size
(n) and skewness of original data

9
Example
If x1, x2, ..., xn is an independent sample from a
normal distribution, X ~ N(μ, σ), and m, s are
the sample mean and standard deviation, then
normal distribution theory says that
$(m - \mu) / (s / \sqrt{n}) \sim t_{n-1}$
and that
$(n - 1) s^2 / \sigma^2 \sim \chi^2_{n-1}$
10
Normal Distribution Assumption

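Does not require Monte Carlo sampling
Select significance level (e.g., α = 0.05 for 95%
CI)
$[LCL, UCL] = m \pm t_{1-\alpha/2,\, n-1} \cdot s / \sqrt{n}$
where m and s are the sample mean and standard
deviation, n is the sample size, and
$t_{1-\alpha/2,\, n-1}$ is the $(1 - \alpha/2)$ quantile of the
Student t distribution with n - 1 degrees of freedom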
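
As a worked illustration, the usual normal-theory interval for the mean, m ± t · s / sqrt(n), can be computed directly for the five-observation sample used elsewhere in this deck. The sketch assumes that this standard t interval is the formula the slide refers to. The critical value 2.776 (97.5th percentile of the t distribution with 4 degrees of freedom) is taken from a standard t table, since Python's standard library has no t quantile function.

```python
import math
import statistics

data = [23, 28, 30, 50, 61]
n = len(data)

m = statistics.mean(data)     # sample mean
s = statistics.stdev(data)    # sample standard deviation (n - 1 denominator)

# Critical value for alpha = 0.05 and n - 1 = 4 degrees of freedom,
# hard-coded from a t table (assumption: two-sided 95% CI).
t_crit = 2.776

half_width = t_crit * s / math.sqrt(n)
lcl = m - half_width
ucl = m + half_width
```

For these data the interval is roughly 18.2 to 58.6, which is symmetric about the sample mean by construction.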
11
Bootstrap Methods

Percentile Bootstrap
Standard Bootstrap
Bootstrap-t (Pivotal Bootstrap)
Bias-corrected and accelerated (BCa) Bootstrap

References: Hall (1988); Efron and Tibshirani
(1993); U.S. EPA (1997) or Singh et al. (1997)
12
Percentile Bootstrap

Easy! Just calculate the parameter for each
bootstrap sample and select α (e.g., 0.05).
LCL = (α/2)th percentile.
UCL = (1 - α/2)th percentile.
Use Excel's PERCENTILE function:
PERCENTILE(bootstrap data array, 0.025)
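
The same calculation can be sketched outside a spreadsheet. The `percentile` helper below is a hypothetical function written for this example; it uses linear interpolation between order statistics, which is the convention Excel's PERCENTILE follows. The data are the five observations from the Example slide.

```python
import math
import random
import statistics

def percentile(values, p):
    """Percentile with linear interpolation, p given as a fraction in [0, 1]."""
    xs = sorted(values)
    pos = p * (len(xs) - 1)          # fractional index into the sorted values
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)

data = [23, 28, 30, 50, 61]
alpha = 0.05
B = 5000
rng = random.Random(7)

# Parameter of interest here is the mean; any statistic could be used instead.
boot_means = [statistics.mean(rng.choices(data, k=len(data))) for _ in range(B)]

lcl = percentile(boot_means, alpha / 2)       # 2.5th percentile
ucl = percentile(boot_means, 1 - alpha / 2)   # 97.5th percentile
```

With this method the limits are themselves bootstrap estimates, so for the mean they cannot fall outside the range of the observed data.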

13
Standard Bootstrap

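Obtain B bootstrap estimates of the parameter θ
Calculate the standard error of θ as the
standard deviation of the B bootstrap estimates
$[LCL, UCL] = \hat{\theta} \pm z_{1-\alpha/2} \cdot SE_b$, where $\hat{\theta}$ is
the estimate from the original sample and
$z_{1-\alpha/2}$ is the standard normal quantile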
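
A minimal sketch of the standard bootstrap for the mean follows, assuming the interval is formed as the original estimate plus or minus a standard normal quantile times the bootstrap standard error. The data are the five observations from the Example slide; variable names are illustrative.

```python
import random
import statistics
from statistics import NormalDist

data = [23, 28, 30, 50, 61]
alpha = 0.05
B = 5000
rng = random.Random(3)

theta_hat = statistics.mean(data)   # estimate from the original sample

# B bootstrap estimates of the parameter
boot_estimates = [statistics.mean(rng.choices(data, k=len(data))) for _ in range(B)]

# Standard error = standard deviation of the bootstrap estimates
se_boot = statistics.stdev(boot_estimates)

z = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for a 95% CI
lcl = theta_hat - z * se_boot
ucl = theta_hat + z * se_boot
```

Only the standard error comes from the resampling; the shape of the interval is still the symmetric normal one.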

14
Bootstrap-t

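Same as Standard Bootstrap, except obtain the
t-statistic from the bootstrap samples.
For each bootstrap sample b, calculate
$t_b = (\hat{\theta}_b - \hat{\theta}) / SE_b$
where $\hat{\theta}_b$ and $SE_b$ are the estimate and its
standard error from bootstrap sample b, and
$\hat{\theta}$ is the estimate from the original sample

Calculate the (α/2)th and (1 - α/2)th
percentiles of $t_b$.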
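
The bootstrap-t steps can be sketched as follows for the mean. This assumes the common form of the method: t_b is the bootstrap estimate minus the original estimate, divided by the standard error of that bootstrap sample, and the interval is built by inverting the percentiles of t_b. Here SE_b uses the large-sample rule s_b / sqrt(n). Resamples whose values are all identical have zero standard error and are skipped, which is a practical choice made for this example.

```python
import math
import random
import statistics

def percentile(sorted_values, p):
    """Linear-interpolation percentile of an already sorted list, p in [0, 1]."""
    pos = p * (len(sorted_values) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)

data = [23, 28, 30, 50, 61]
n = len(data)
alpha = 0.05
B = 5000
rng = random.Random(5)

theta_hat = statistics.mean(data)
se_hat = statistics.stdev(data) / math.sqrt(n)   # SE of the original estimate

t_values = []
for _ in range(B):
    sample = rng.choices(data, k=n)
    s_b = statistics.stdev(sample)
    if s_b == 0:                                 # degenerate resample, t_b undefined
        continue
    se_b = s_b / math.sqrt(n)
    t_values.append((statistics.mean(sample) - theta_hat) / se_b)

t_values.sort()
t_lo = percentile(t_values, alpha / 2)
t_hi = percentile(t_values, 1 - alpha / 2)

# The upper t percentile sets the lower limit and vice versa.
lcl = theta_hat - t_hi * se_hat
ucl = theta_hat - t_lo * se_hat
```

Because the t percentiles need not be symmetric, the resulting interval can be asymmetric about the original estimate.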

15
Calculating SEb for Bootstrap-t

Normal Approximation Rule (Large Sample)

Nested Bootstrap
For each bootstrap sample (b), run j = 1000
bootstrap simulations to derive 1000 parameter
estimates; SEb is their standard deviation
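
The nested approach can be sketched as a resample of each resample. To keep the run short, the example uses 200 outer samples and 100 inner simulations rather than the counts on the slides; those reduced counts and the helper name `nested_se` are choices made for this illustration.

```python
import random
import statistics

def nested_se(sample, n_inner, rng):
    """Estimate SE_b by bootstrapping the bootstrap sample itself."""
    inner_estimates = [
        statistics.mean(rng.choices(sample, k=len(sample)))
        for _ in range(n_inner)
    ]
    return statistics.stdev(inner_estimates)

data = [23, 28, 30, 50, 61]
rng = random.Random(9)
B = 200    # outer bootstrap samples (reduced for speed)
J = 100    # inner simulations per outer sample (the slide uses 1000)

results = []
for _ in range(B):
    sample = rng.choices(data, k=len(data))
    results.append((statistics.mean(sample), nested_se(sample, J, rng)))
```

The cost grows as B times J, which is why the large-sample rule is often preferred when it is defensible.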

16
BCa Bootstrap

See Appendix or Efron and Tibshirani (1993)
Accounts for skewness in the bootstrap sample
means and the rate of change of the standard error
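
For readers without the appendix, a rough sketch of the BCa calculation for the mean is given below. It follows the commonly published form of the method: a bias-correction term z0 from the share of bootstrap estimates below the original estimate, and an acceleration term a from jackknife (leave-one-out) estimates. It is an illustration under those assumptions, not the presenter's own procedure.

```python
import random
import statistics
from statistics import NormalDist

data = [23, 28, 30, 50, 61]
n = len(data)
alpha = 0.05
B = 5000
rng = random.Random(11)
nd = NormalDist()

theta_hat = statistics.mean(data)
boot = sorted(statistics.mean(rng.choices(data, k=n)) for _ in range(B))

# Bias correction: how far the bootstrap distribution is shifted
# relative to the original estimate.
prop_below = sum(b < theta_hat for b in boot) / B
z0 = nd.inv_cdf(prop_below)

# Acceleration: skewness of the jackknife estimates, a proxy for the
# rate of change of the standard error.
jack = [statistics.mean(data[:i] + data[i + 1:]) for i in range(n)]
jack_mean = statistics.mean(jack)
num = sum((jack_mean - j) ** 3 for j in jack)
den = 6.0 * sum((jack_mean - j) ** 2 for j in jack) ** 1.5
a = num / den

def adjusted_level(z):
    """Map a nominal normal quantile z to the adjusted percentile level."""
    return nd.cdf(z0 + (z0 + z) / (1 - a * (z0 + z)))

p_lo = adjusted_level(nd.inv_cdf(alpha / 2))
p_hi = adjusted_level(nd.inv_cdf(1 - alpha / 2))

# Read the adjusted percentiles off the sorted bootstrap estimates.
lcl = boot[int(p_lo * (B - 1))]
ucl = boot[int(p_hi * (B - 1))]
```

When z0 and a are both zero, the adjusted levels reduce to α/2 and 1 - α/2, and the result matches the percentile bootstrap.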

17
How do the Methods Compare?

Confidence Intervals will differ depending on the
approach that is used.
In general, as n decreases and skewness
increases:
Bootstrap-t > BCa > percentile > standard
[LCL, UCL] can exceed the min and max from the
observed data
CI for mean from the Standard bootstrap ≈ CI for
mean assuming X ~ Normal(μ, σ)

18
So when should you use the Bootstrap estimate,
and which approach is best?

Use of Lognormal or Normal PDFs is weakly
supported
Data are poorly fit by continuous distributions
(e.g., censored, mixed)
Analytical solution is messy (a simple alternative
is a parametric bootstrap)

19
So when should you use the Bootstrap estimate,
and which approach is best?

As with other approaches for quantifying
parameter uncertainty, confidence in parameter
estimates improves with better data quality and
increased sample size
Bootstrap is not a substitute for a weak or
non-representative sample
Choosing the best bootstrap approach remains an
exercise in judgment. Extent of differences in
coverage of CIs may be a useful contribution to
sensitivity analysis