Limits to Statistical Theory Bootstrap analysis - PowerPoint PPT Presentation

Provided by: brucek64

1
Limits to Statistical Theory: Bootstrap analysis
  • ESM 206
  • 11 April 2006

2
Assumption of t-test
  • Sample mean is a t-distributed random variable
  • Guaranteed if observations are normally
    distributed random variables or sample size is
    very large
  • In practice, OK if observations are not too
    skewed and sample size is reasonably large
  • This assumption also applies when using the standard
    formula for the 95% CI of the mean
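
For reference, the "standard formula" in the last bullet is usually written as below. The notation is an assumption, not taken from the slides: x-bar is the sample mean, s the sample standard deviation, n the sample size, and t_{0.975, n-1} the 97.5th percentile of the t distribution with n - 1 degrees of freedom.

```latex
\bar{x} \;\pm\; t_{0.975,\,n-1}\,\frac{s}{\sqrt{n}}
```

The interval is symmetric about the sample mean, which is why it can behave poorly for strongly skewed data.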

3
Resampling for a confidence interval of the mean
  • IN AN IDEAL WORLD
      • Take sample
      • Calculate sample mean
      • Take new sample
      • Calculate new mean
      • Repeat many times
      • Look at the distribution of sample means
      • 95% CI ranges from the 2.5th percentile to the
        97.5th percentile
  • IN THE REAL WORLD
      • Find some way to simulate taking a sample
      • Calculate the sample mean
      • Repeat many times
      • Look at the distribution of sample means
      • 95% CI ranges from the 2.5th percentile to the
        97.5th percentile
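
The "real world" recipe above can be sketched in a few lines of Python. This is only an illustration: the function names (percentile_ci, resampling_ci, demo) are hypothetical, and the demo's "sample" is a made-up stand-in (50 draws from a normal distribution), not data from the presentation.

```python
import random
import statistics


def percentile_ci(values, level=0.95):
    """Return (lower, upper) percentile interval from simulated statistics."""
    ordered = sorted(values)
    n = len(ordered)
    alpha = 1.0 - level
    # Nearest-rank positions; with 999 values and level=0.95 these are
    # the 25th and 975th sorted values.
    lo_rank = min(max(round((n + 1) * alpha / 2), 1), n)
    hi_rank = min(max(round((n + 1) * (1 - alpha / 2)), 1), n)
    return ordered[lo_rank - 1], ordered[hi_rank - 1]


def resampling_ci(simulate_sample, n_rep=999, level=0.95):
    """Simulate a sample n_rep times, take each mean, return the percentile CI."""
    means = [statistics.fmean(simulate_sample()) for _ in range(n_rep)]
    return percentile_ci(means, level)


def demo(seed=42):
    # Hypothetical stand-in for "taking a sample": 50 draws from Normal(10, 2)
    rng = random.Random(seed)
    return resampling_ci(lambda: [rng.gauss(10, 2) for _ in range(50)])
```

How simulate_sample is implemented is exactly the choice between the parametric and nonparametric bootstrap.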

4
Bootstrap resampling
  • PARAMETRIC BOOTSTRAP
      • Assume data are random variables from a
        particular distribution
          • E.g., log-normal
      • Use data to estimate parameters of the
        distribution
          • E.g., mean, variance
      • Use random number generator to create sample
          • Same size as original
      • Calculate sample mean
      • Allows us to ask: "What if data were a random
        sample from the specified distribution with the
        specified parameters?"
  • NONPARAMETRIC BOOTSTRAP
      • Assume underlying distribution from which data
        come is unknown
      • Best estimate of this distribution is the data
        themselves (the empirical distribution function)
      • Create a new dataset by sampling with replacement
        from the data
          • Same size as original
      • Calculate sample mean
  • WHICH IS BETTER?
      • If underlying distribution is correctly chosen,
        parametric has more precision
      • If underlying distribution is incorrectly chosen,
        parametric has more bias
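
As a rough sketch of the two approaches, the functions below each build one simulated dataset of the same size as the original. All names are hypothetical, and the parametric version assumes a log-normal model (the slides' own example), so it only works for strictly positive data.

```python
import math
import random
import statistics


def nonparametric_replicate(data, rng):
    # Sample with replacement from the data themselves, same size as original
    return rng.choices(data, k=len(data))


def parametric_replicate_lognormal(data, rng):
    # Assumption: data are log-normal. Estimate mean and SD of log(data),
    # then draw a new dataset of the same size from that fitted distribution.
    logs = [math.log(y) for y in data]
    mu = statistics.fmean(logs)
    sigma = statistics.stdev(logs)
    return [rng.lognormvariate(mu, sigma) for _ in data]


def bootstrap_means(data, make_replicate, n_rep=999, seed=0):
    # Sorted means of n_rep simulated datasets
    rng = random.Random(seed)
    return sorted(
        statistics.fmean(make_replicate(data, rng)) for _ in range(n_rep)
    )
```

Passing either replicate function to bootstrap_means gives the sorted distribution of sample means from which a percentile interval can be read off.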

5
TcCB in the cleanup site
  • Parametric bootstrap
      • If Y is log-normal, it is specified in terms of
        mean and standard deviation of X = log(Y)
          • Mean = -0.547
          • SD = 1.360
      • Use Monte Carlo simulation to generate 999
        replicate simulated datasets from log-normal
        distribution
      • Calculate mean of each replicate and sort means
          • 25th value is lower end of 95% CI
          • 975th value is upper end of 95% CI
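
A minimal sketch of this procedure, using the log-scale mean and SD quoted on the slide. The slide does not state the number of observations, so sample_size is left as a parameter, and any value passed to it is an assumption; the function name is also hypothetical. Results depend on the random seed and will not reproduce the presenter's numbers exactly.

```python
import random
import statistics

MU_LOG = -0.547      # mean of X = log(Y), from the slide
SIGMA_LOG = 1.360    # SD of X = log(Y), from the slide
N_REPLICATES = 999   # number of simulated datasets, from the slide


def parametric_bootstrap_ci(sample_size, seed=0):
    """95% CI of the mean from 999 simulated log-normal datasets."""
    rng = random.Random(seed)
    means = sorted(
        statistics.fmean(
            [rng.lognormvariate(MU_LOG, SIGMA_LOG) for _ in range(sample_size)]
        )
        for _ in range(N_REPLICATES)
    )
    # 25th and 975th sorted values (indices 24 and 974, zero-based)
    return means[24], means[974]
```

For example, parametric_bootstrap_ci(sample_size=50) returns a (lower, upper) pair; the 50 is an arbitrary placeholder, not the size of the TcCB dataset.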

95% CI: (-0.678, 8.458)

6
Parametric bootstrap results
  • 95% CI: (0.917, 2.293)

7
Normal QQ Plot
  • Sort data
  • Index the values (i = 1, 2, ..., n)
  • Calculate q = i / (n + 1)
      • This is the quantile
  • Plot quantiles against data values
      • This is the empirical cumulative distribution
        function (CDF)
  • Construct CDF of standard normal using same
    quantiles
  • Compare the distributions at the same quantiles
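
The steps above can be sketched as follows, assuming the plotting position q = i / (n + 1). The function name is hypothetical, and only the coordinates are computed; plotting is left to whatever tool is at hand.

```python
from statistics import NormalDist


def normal_qq_points(data):
    """Return (standard normal quantile, data value) pairs for a QQ plot."""
    ordered = sorted(data)              # sort data
    n = len(ordered)
    std_normal = NormalDist()           # mean 0, SD 1
    points = []
    for i, value in enumerate(ordered, start=1):   # index i = 1, 2, ..., n
        q = i / (n + 1)                 # quantile of the i-th sorted value
        # Standard normal value at the same quantile
        points.append((std_normal.inv_cdf(q), value))
    return points
```

If the data are roughly normal, the points fall close to a straight line; curvature suggests skew or heavy tails.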

8
Nonparametric bootstrap results
  • 95% CI: (0.851, 9.248)

9
Bootstrap and hypothesis tests
  • One-sample t-test
      • Calculate bootstrap CI of mean
      • Does it include the test value?
  • Paired t-test
      • Calculate differences
          • D_i = x_i - y_i
      • Find bootstrap CI of mean difference
      • Does it include zero?
  • Two-sample t-test
      • Want to create simulated data where H0 is true
        (same mean) but allow variance and shape of
        distribution to differ between populations
      • Easiest with nonparametric
      • Subtract mean from each sample. Now both samples
        have mean zero
      • Resample these residuals, creating simulated
        group A from residuals of group A and simulated
        group B from residuals of group B
      • Generate distribution of t values
      • P is fraction of simulated t values that exceed t
        calculated from data
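
The two-sample procedure above might look like this in Python. Two assumptions are worth flagging: the slides do not say which t statistic is used, so an unequal-variance (Welch-style) t is used here, and the P value is one-sided, following the wording "exceed". All function names are hypothetical.

```python
import math
import random
import statistics


def welch_t(a, b):
    # t statistic that allows unequal variances (an assumption, see lead-in)
    se = math.sqrt(
        statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    )
    return (statistics.fmean(a) - statistics.fmean(b)) / se


def bootstrap_two_sample_p(a, b, n_rep=999, seed=0):
    """Return (observed t, one-sided bootstrap P) under H0: equal means."""
    rng = random.Random(seed)
    t_obs = welch_t(a, b)
    # Subtract each group's mean so both sets of residuals have mean zero
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    resid_a = [x - mean_a for x in a]
    resid_b = [x - mean_b for x in b]
    exceed = 0
    for _ in range(n_rep):
        # Resample each group only from its own residuals
        sim_a = rng.choices(resid_a, k=len(a))
        sim_b = rng.choices(resid_b, k=len(b))
        try:
            if welch_t(sim_a, sim_b) > t_obs:
                exceed += 1
        except ZeroDivisionError:
            # Degenerate resample (zero variance in both groups):
            # counted as not exceeding
            pass
    return t_obs, exceed / n_rep
```

Because each group is resampled from its own residuals, the simulated groups share a mean of zero but keep their own spread and shape, which is the point of the method.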

10
TcCB: H0: cleanup mean = reference mean
  • t = 1.45
  • Bootstrapped t values do not follow a t
    distribution!
  • P = 0.02