Title: Introduction to Bootstrap Estimation
1. Introduction to Bootstrap Estimation
2. What is the Bootstrap Method?
- Given a sample with n observations, quantify the
uncertainty in any parameter estimate (e.g.,
mean, variance, percentile value, etc.)
- All Bootstrap methods involve generating
hypothetical samples from the original sample
- Each hypothetical sample, called a Bootstrap
Sample, represents a potential set of
observations that COULD be obtained if we
resampled the population
- Use Monte Carlo simulation to generate MANY
Bootstrap Samples (B > 5000)
3. Parametric vs. Distribution Free
- Parametric Bootstrap
  - Select and fit a probability distribution to the
  sample data. Generate bootstrap samples (B) from
  this distribution.
- Non-parametric (Distribution-Free) Bootstrap
  - Generate bootstrap samples (B) directly from the
  sample data. Randomly sample with replacement.
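
As a rough illustration of the two approaches above, the sketch below draws one bootstrap sample each way. The normal distribution used for the parametric case, the function names, and the seed are assumptions made for illustration; the five data values are the ones used in the worked example later in the deck.

```python
import random
import statistics

def nonparametric_sample(data, rng):
    """Resample the observed values with replacement (distribution-free)."""
    return rng.choices(data, k=len(data))

def parametric_sample(data, rng):
    """Fit a normal distribution (an assumed choice) and draw n new values from it."""
    mu = statistics.mean(data)
    sigma = statistics.stdev(data)
    return [rng.gauss(mu, sigma) for _ in data]

rng = random.Random(1)          # fixed seed, only so the run is repeatable
data = [23, 28, 30, 50, 61]
np_sample = nonparametric_sample(data, rng)   # values come only from `data`
p_sample = parametric_sample(data, rng)       # values can fall outside `data`
```

Repeating either call B times gives the B bootstrap samples; only the source of the draws differs.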
4. Distribution-Free Approaches
- Bootstrap Methods
  - involve resampling from the original data set
  many times, tracking the parameter estimates for
  each iteration, and developing a PDFu (a distribution
  describing uncertainty in the parameter).
  - are statistically sound and do not require
  assumptions about PDFv (the distribution describing
  variability in the data)
  - suffer from the same limitations as other methods if
  the sample is not representative
5. Parameter Uncertainty
Parameter Estimates for each 1-D MCA Simulation
Soil Ingestion ~ Lognormal(m, s)
m ~ uniform(min, max)
s ~ triang(min, mode, max)
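
One way to read the scheme above is as a two-level simulation: draw a parameter pair (m, s), then run a 1-D simulation with those parameters, and repeat. The sketch below follows that reading. The numeric ranges are hypothetical, and treating m and s as the mean and standard deviation of log(X) is an assumption, since the slide does not say how the lognormal is parameterized.

```python
import random

rng = random.Random(2)

# Hypothetical ranges, chosen only to make the sketch runnable.
M_MIN, M_MAX = 3.0, 4.0                  # m ~ uniform(min, max)
S_MIN, S_MODE, S_MAX = 0.3, 0.5, 0.9     # s ~ triang(min, mode, max)

def one_simulation(n_draws, rng):
    """Draw one (m, s) pair, then run a 1-D simulation with those parameters."""
    m = rng.uniform(M_MIN, M_MAX)
    s = rng.triangular(S_MIN, S_MAX, S_MODE)   # argument order is (low, high, mode)
    # Assumption: m and s are the mean and SD of log(X).
    draws = [rng.lognormvariate(m, s) for _ in range(n_draws)]
    return m, s, draws

# 50 outer iterations, each with its own parameter pair and 1000 inner draws.
results = [one_simulation(1000, rng) for _ in range(50)]
```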
6. Example
Original sample: 23, 28, 30, 50, 61
- Bootstrap Samples (B = 5000)
  Sample   Values
  1        28, 50, 30, 23, 23
  2        30, 50, 50, 61, 28
  3        61, 23, 30, 23, 28
  ...      ...
  5000     28, 50, 30, 61, 30
7. Uncertainty in the Mean
Original sample: 23, 28, 30, 50, 61
Sample mean: 38.4
- Bootstrap Samples (B = 5000)
  Sample   Values               Mean
  1        28, 50, 30, 23, 23   30.8
  2        30, 50, 50, 61, 28   43.8
  3        61, 23, 30, 23, 28   33.0
  ...      ...                  ...
  5000     28, 50, 30, 61, 30   39.8
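
The resampling shown above can be sketched in a few lines. This is a minimal illustration, not the presenter's implementation; the seed and variable names are assumptions, and the individual samples drawn will differ from the ones listed on the slide.

```python
import random
import statistics

data = [23, 28, 30, 50, 61]
B = 5000
rng = random.Random(42)        # fixed seed, only so the run is repeatable

boot_means = []
for _ in range(B):
    sample = rng.choices(data, k=len(data))   # resample with replacement
    boot_means.append(statistics.mean(sample))

sample_mean = statistics.mean(data)           # mean of the original sample
boot_se = statistics.stdev(boot_means)        # spread of the bootstrap means
```

The list `boot_means` is the empirical distribution of the mean; its spread is what the confidence interval methods on the following slides summarize.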
8. Confidence Intervals
- CIs for parameter estimates are calculated using
statistics from the original sample and the
bootstrap samples
- Many different methods are available
  - complexity vs. accuracy
- Differences in results will depend on sample size
(n) and skewness of the original data
9. Example
If x1, x2, ..., xn is an independent sample from a
normal distribution, X ~ N(μ, σ), and m, s are
the sample mean and standard deviation, then
normal distribution theory says that
[equation missing from source]
and that
[equation missing from source]
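
The standard normal-theory results for this setup, assumed here to be the ones the slide refers to, are written out below. The notation N(mean, standard deviation) follows the slide, and t with subscript n − 1 denotes Student's t distribution with n − 1 degrees of freedom.

```latex
m \sim N\!\left(\mu,\ \frac{\sigma}{\sqrt{n}}\right)
\qquad\text{and}\qquad
\frac{m - \mu}{s / \sqrt{n}} \sim t_{n-1}
```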
10. Normal Distribution Assumption
- Does not require Monte Carlo sampling
- Select significance level (e.g., α = 0.05 for a 95%
CI)
[equation missing from source]
where
[equation missing from source]
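
Under the normal assumption, the usual two-sided interval for the mean is m ± t × s/√n, where t is the (1 − α/2) quantile of Student's t with n − 1 degrees of freedom; that this is the formula the slide showed is an assumption. The sketch below applies it to the five example values. The critical value is taken from a standard t table because the Python standard library has no t quantile function.

```python
import math
import statistics

data = [23, 28, 30, 50, 61]
n = len(data)
m = statistics.mean(data)       # sample mean
s = statistics.stdev(data)      # sample standard deviation (n - 1 divisor)

# Two-sided 95% critical value of Student's t with n - 1 = 4 degrees of
# freedom, from a standard t table.
T_CRIT = 2.776

half_width = T_CRIT * s / math.sqrt(n)
lcl, ucl = m - half_width, m + half_width
```

For these five values the interval comes out at roughly 18.2 to 58.6, with no Monte Carlo sampling involved.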
11. Bootstrap Methods
- Percentile Bootstrap
- Standard Bootstrap
- Bootstrap-t (Pivotal Bootstrap)
- Bias-corrected and accelerated (BCa) Bootstrap
References
- Hall (1988)
- Efron and Tibshirani (1993)
- U.S. EPA (1997) or Singh et al. (1997)
12. Percentile Bootstrap
- Easy! Just calculate the parameter for each
bootstrap sample and select α (e.g., 0.05).
- LCL = (α/2)th percentile.
- UCL = (1 − α/2)th percentile.
- Use Excel's percentile function:
percentile(bootstrap data array, 0.025)
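
The same steps can be sketched outside a spreadsheet. The helper names below are hypothetical, and the `percentile` function uses linear interpolation between order statistics, which is intended to mirror the convention of Excel's PERCENTILE function.

```python
import random
import statistics

def percentile(sorted_values, p):
    """Linear-interpolation percentile of pre-sorted values, with p in [0, 1]."""
    k = (len(sorted_values) - 1) * p
    lo = int(k)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (k - lo) * (sorted_values[hi] - sorted_values[lo])

def percentile_ci(data, stat, B=5000, alpha=0.05, seed=0):
    """Percentile bootstrap: LCL and UCL are read off the sorted estimates."""
    rng = random.Random(seed)
    estimates = sorted(stat(rng.choices(data, k=len(data))) for _ in range(B))
    return percentile(estimates, alpha / 2), percentile(estimates, 1 - alpha / 2)

lcl, ucl = percentile_ci([23, 28, 30, 50, 61], statistics.mean)
```

Because both limits are bootstrap estimates themselves, a percentile interval for the mean cannot fall outside the range of the observed data.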
13. Standard Bootstrap
- Obtain B bootstrap estimates of the parameter
theta
- Calculate the standard error of theta based on
the standard deviation of the B bootstrap estimates
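
A minimal sketch of these two steps follows. The slide does not show how the standard error is turned into an interval; the sketch assumes the common form theta ± z × SE with z a standard normal quantile, and the function name is hypothetical.

```python
import random
import statistics
from statistics import NormalDist

def standard_bootstrap_ci(data, stat, B=5000, alpha=0.05, seed=0):
    """Return (LCL, UCL, SE) using the SD of the bootstrap estimates as the SE."""
    rng = random.Random(seed)
    theta_hat = stat(data)                       # estimate from the original sample
    estimates = [stat(rng.choices(data, k=len(data))) for _ in range(B)]
    se_boot = statistics.stdev(estimates)        # bootstrap standard error
    z = NormalDist().inv_cdf(1 - alpha / 2)      # about 1.96 for alpha = 0.05
    return theta_hat - z * se_boot, theta_hat + z * se_boot, se_boot

lcl, ucl, se_boot = standard_bootstrap_ci([23, 28, 30, 50, 61], statistics.mean)
```

The interval is symmetric about the original estimate by construction, so it does not reflect skewness in the bootstrap distribution.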
14. Bootstrap-t
- Same as Standard Bootstrap, except obtain the
t-statistic from the bootstrap samples.
- For each bootstrap sample, calculate tb
- Calculate the (α/2)th and (1 − α/2)th
percentiles of tb.
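
The sketch below illustrates these steps for the mean. It assumes the usual definition tb = (bootstrap mean − sample mean) / SEb and the usual interval (m − t_hi × SE, m − t_lo × SE); neither formula is shown on the slide. SEb is taken as sb/√n, and bootstrap samples whose values are all identical are skipped because tb is undefined for them. Both choices are assumptions of this sketch.

```python
import math
import random
import statistics

def se_mean(x):
    """Standard error of the mean, s / sqrt(n)."""
    return statistics.stdev(x) / math.sqrt(len(x))

def bootstrap_t_ci(data, B=5000, alpha=0.05, seed=0):
    rng = random.Random(seed)
    n = len(data)
    m_hat, se_hat = statistics.mean(data), se_mean(data)
    t_values = []
    for _ in range(B):
        sample = rng.choices(data, k=n)
        se_b = se_mean(sample)
        if se_b == 0:            # all values identical: t_b undefined, skip
            continue
        t_values.append((statistics.mean(sample) - m_hat) / se_b)
    t_values.sort()
    t_lo = t_values[int((alpha / 2) * (len(t_values) - 1))]
    t_hi = t_values[int((1 - alpha / 2) * (len(t_values) - 1))]
    # The upper t percentile sets the lower limit, and vice versa.
    return m_hat - t_hi * se_hat, m_hat - t_lo * se_hat

lcl, ucl = bootstrap_t_ci([23, 28, 30, 50, 61])
```

Unlike the percentile interval, these limits are not restricted to the range of the observed data.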
15. Calculating SEb for Bootstrap-t
- Normal Approximation Rule (Large Sample)
- Nested Bootstrap
  - For each bootstrap sample (b), run j = 1000
  bootstrap simulations to derive 1000 parameter
  estimates
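
A nested bootstrap can be sketched as a second level of resampling inside each bootstrap sample, with SEb taken as the standard deviation of the inner estimates. The function name is hypothetical, and the demonstration uses 20 outer samples and J = 200 inner simulations rather than the 5000 and 1000 quoted on the slides, purely to keep the run short.

```python
import random
import statistics

def nested_se(sample, stat, J, rng):
    """SE_b for one bootstrap sample: SD of J second-level bootstrap estimates."""
    inner = [stat(rng.choices(sample, k=len(sample))) for _ in range(J)]
    return statistics.stdev(inner)

rng = random.Random(7)
data = [23, 28, 30, 50, 61]

# Reduced sizes for illustration only; total work grows as B x J.
outer = [rng.choices(data, k=len(data)) for _ in range(20)]
se_b = [nested_se(b, statistics.mean, 200, rng) for b in outer]
```

At full size the cost is B × J resamples (5000 × 1000 here), which is why the large-sample formula is the cheaper option when it is applicable.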
16. BCa Bootstrap
- See Appendix or Efron and Tibshirani (1993)
- Accounts for skewness in the bootstrap sample
means and the rate of change of the standard error
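
For orientation, here is a compact sketch of a BCa interval using the commonly cited formulas: a bias-correction term z0 from the share of bootstrap estimates below the original estimate, and an acceleration term a from jackknife (leave-one-out) estimates. The function name and the clamping of the share away from 0 and 1 are assumptions of this sketch; the details should be checked against Efron and Tibshirani (1993).

```python
import random
import statistics
from statistics import NormalDist

def bca_ci(data, stat, B=5000, alpha=0.05, seed=0):
    rng = random.Random(seed)
    n = len(data)
    theta_hat = stat(data)
    boots = sorted(stat(rng.choices(data, k=n)) for _ in range(B))
    nd = NormalDist()

    # Bias correction z0: share of bootstrap estimates below theta_hat.
    below = sum(b < theta_hat for b in boots)
    frac = min(max(below / B, 1 / (B + 1)), B / (B + 1))   # keeps inv_cdf finite
    z0 = nd.inv_cdf(frac)

    # Acceleration a: skewness of the jackknife (leave-one-out) estimates.
    jack = [stat(data[:i] + data[i + 1:]) for i in range(n)]
    jack_mean = statistics.mean(jack)
    num = sum((jack_mean - j) ** 3 for j in jack)
    den = 6 * sum((jack_mean - j) ** 2 for j in jack) ** 1.5
    a = num / den if den else 0.0

    def adjusted(p):
        """Shift the nominal percentile p using z0 and a."""
        z = nd.inv_cdf(p)
        return nd.cdf(z0 + (z0 + z) / (1 - a * (z0 + z)))

    lo_p, hi_p = adjusted(alpha / 2), adjusted(1 - alpha / 2)
    return boots[int(lo_p * (B - 1))], boots[int(hi_p * (B - 1))]

lcl, ucl = bca_ci([23, 28, 30, 50, 61], statistics.mean)
```

With z0 = 0 and a = 0 the adjusted percentiles reduce to α/2 and 1 − α/2, so the result coincides with the percentile bootstrap.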
17. How do the Methods Compare?
- Confidence Intervals will differ depending on the
approach that is used.
- In general, as n decreases and skewness
increases:
  - Bootstrap-t > BCa > percentile > standard
- LCL, UCL can exceed the min and max from the
observed data
- CI for mean of Standard bootstrap ≈ CI for mean
assuming X ~ Normal(μ, σ)
18. So when should you use the Bootstrap estimate, and which approach is best?
- Use of Lognormal or Normal PDFs is weakly
supported
- Data are poorly fit by continuous distributions
(e.g., censored, mixed)
- Analytical solution is messy (simple alternative
is a parametric bootstrap)
19. So when should you use the Bootstrap estimate, and which approach is best?
- As with other approaches for quantifying
parameter uncertainty, confidence in parameter
estimates improves with better data quality and
increased sample size
- Bootstrap is not a substitute for a weak or
non-representative sample
- Choosing the best bootstrap approach remains an
exercise in judgment. The extent of differences in
coverage of CIs may be a useful contribution to
sensitivity analysis