Title: Resampling Methods
1Resampling Methods
- Peter Bruce
- Resampling Stats, Cytel Software, Statistics.com
- pbruce_at_resample.com
2Resampling Methods
- What is resampling
- Examples
- Historical perspective
3What is resampling
- Permutation
- Bootstrap
- Monte Carlo simulation
4Permutation
- Survival times (days)
- Treated mice: 94, 38, 23, 197, 99, 16, 141
- Mean: 86.9
- Untreated mice: 52, 10, 40, 104, 51, 27, 146, 30, 46
- Mean: 56.2
- (Efron & Tibshirani)
5
- 1. Calculate the difference between the means of
the two observed samples: it's 30.6 days in
favor of the treated mice.
- 2. Consider the two samples combined (16
observations) as the relevant universe to
resample from.
6
- 3. Draw 7 hypothetical observations and
designate them "Treatment"; draw 9 hypothetical
observations and designate them "Control".
- 4. Compute and record the difference between the
means of the two samples.
7
- 5. Repeat steps 3 and 4 perhaps 1000 times.
- 6. Determine how often the resampled difference
exceeds the observed difference of 30.6.
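A minimal Python sketch of steps 1-6 for the mouse data, offered as an illustration rather than the presenter's own program. The function name `permutation_test`, the fixed seed, and the use of 10,000 trials are assumptions; the draw in step 3 is done by shuffling the combined data, i.e. without replacement.

```python
import random

treated = [94, 38, 23, 197, 99, 16, 141]
untreated = [52, 10, 40, 104, 51, 27, 146, 30, 46]


def mean(values):
    return sum(values) / len(values)


def permutation_test(group_a, group_b, trials=10_000, seed=0):
    """Return (observed difference in means, share of shuffles exceeding it)."""
    rng = random.Random(seed)                 # fixed seed: an assumption, for repeatability
    observed = mean(group_a) - mean(group_b)  # step 1
    pooled = list(group_a) + list(group_b)    # step 2: the combined universe
    n_a = len(group_a)
    exceed = 0
    for _ in range(trials):                   # step 5
        rng.shuffle(pooled)                   # step 3: first n_a are "Treatment", rest "Control"
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])  # step 4
        if diff > observed:                   # step 6
            exceed += 1
    return observed, exceed / trials


observed_diff, p_estimate = permutation_test(treated, untreated)
print(f"observed difference: {observed_diff:.1f} days")
print(f"share of shuffles exceeding it: {p_estimate:.3f}")
```

The printed share is the estimated one-sided p-value; it varies slightly with the seed and the number of trials.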
8Histogram of permuted differences
9The Bootstrap
- A new pigfood ration is tested on twelve pigs,
with six-week weight gains as follows:
- 496 544 464 416 512 560 608 544 480 466 512 496
- Mean 508 ounces (establish a confidence
interval)
10The Classic Bootstrap
Draw simulated samples from a hypothetical
universe that embodies all we know about the
universe that this sample came from: our sample,
replicated an infinite number of times
11
- 1. Put the observed weight gains in a hat
- 2. Sample 12 with replacement
- 3. Record the mean
- 4. Repeat steps 2-3, say, 1000 times
- 5. Record the 5th and 95th percentiles (for a
90% confidence interval)
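The five steps can be sketched in Python as follows. This is a plain percentile bootstrap under stated assumptions: the function name `bootstrap_mean_interval`, the fixed seed, and the nearest-rank way of picking the percentiles are illustrative choices, not part of the original slides.

```python
import random

weight_gains = [496, 544, 464, 416, 512, 560, 608, 544, 480, 466, 512, 496]


def bootstrap_mean_interval(data, trials=1000, lower=0.05, upper=0.95, seed=0):
    """Percentile bootstrap interval for the mean of `data`."""
    rng = random.Random(seed)              # fixed seed: an assumption, for repeatability
    n = len(data)
    means = []
    for _ in range(trials):                # step 4
        resample = rng.choices(data, k=n)  # steps 1-2: draw n values with replacement
        means.append(sum(resample) / n)    # step 3
    means.sort()
    # Step 5: nearest-rank percentiles of the recorded means.
    lo = means[round(lower * (trials - 1))]
    hi = means[round(upper * (trials - 1))]
    return lo, hi


lo, hi = bootstrap_mean_interval(weight_gains)
print(f"90% bootstrap interval for the mean gain: {lo:.1f} to {hi:.1f} ounces")
```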
12Bootstrapped sample means
13Historical Perspective
14 1908 - W. S. Gosset
17Fisher's Tea Taster
- Eight cups of tea are prepared, four with tea poured
first and four with milk poured first. The cups
are presented to the taster in random order.
18Permutation solution
- 1. Mark a strip of paper with eight guesses
about the order of the "tea-first" and
"milk-first" cups, say T T T T M M M M.
- 2. Make a deck of eight cards, four marked "T"
and four marked "M."
- 3. Deal out these eight cards successively in all
possible orderings (permutations).
- 4. Record how many of those permutations show >
6 matches.
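A short Python sketch of the exhaustive version, assuming the guesses T T T T M M M M from step 1. The helper names `count_matches` and `share_exceeding` are hypothetical. Dealing the deck in every order repeats each distinct arrangement equally often, so the shares are unaffected.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

guesses = "TTTTMMMM"  # step 1: the strip of marked guesses
deck = "TTTTMMMM"     # step 2: four "T" cards and four "M" cards


def count_matches(dealt, marked):
    """Number of positions where the dealt card agrees with the guess."""
    return sum(card == guess for card, guess in zip(dealt, marked))


# Step 3: deal the eight cards in every possible order (8! orderings).
match_counts = Counter(count_matches(dealt, guesses) for dealt in permutations(deck))
total_deals = sum(match_counts.values())


def share_exceeding(threshold):
    """Step 4: share of orderings with more than `threshold` matches."""
    hits = sum(n for matches, n in match_counts.items() if matches > threshold)
    return Fraction(hits, total_deals)


for matches in sorted(match_counts):
    print(matches, "matches:", Fraction(match_counts[matches], total_deals))
print("more than 6 matches:", share_exceeding(6))
```

Only an even number of matches can occur, because every misplaced "T" forces a misplaced "M".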
19Approximate Permutation
- 3. Shuffle the deck and deal it out along the
strip of paper with the marked guesses; record
the number of matches.
- 4. Repeat many times.
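The shuffled version might look like this in Python. The function name `simulate_matches`, the 20,000 trials, and the fixed seed are assumptions made for illustration.

```python
import random

guesses = list("TTTTMMMM")  # the strip of marked guesses


def simulate_matches(trials=20_000, seed=0):
    """Shuffle the deck `trials` times; return the match count from each deal."""
    rng = random.Random(seed)    # fixed seed: an assumption, for repeatability
    deck = list("TTTTMMMM")      # four "T" cards and four "M" cards
    results = []
    for _ in range(trials):      # step 4: repeat many times
        rng.shuffle(deck)        # step 3: shuffle and deal along the strip
        results.append(sum(card == guess for card, guess in zip(deck, guesses)))
    return results


results = simulate_matches()
share_all_eight = results.count(8) / len(results)
share_six_or_more = sum(m >= 6 for m in results) / len(results)
print(f"all eight matched: {share_all_eight:.4f}")
print(f"six or more matched: {share_six_or_more:.4f}")
```

With enough shuffles these shares settle near the values that dealing every possible ordering would give.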
20Other names
- Monte Carlo permutation
- Randomization test
- Sampled permutation (randomization) test
21Extension to multiple samples
- Fisher went on to apply the same idea to
agricultural experiments involving two or more
samples. The question became "How likely is it
that random arrangements of the observed data
would produce samples differing as much as the
observed samples differ?"
22Extension to samples from populations
- In the 1930s, Fisher and Pitman showed that the
inference for a permutation test extended to
cover not just random re-arrangements of a fixed,
finite set of elements, but also samples from
larger populations.
23Formula-based analogs
- Fisher and Pitman showed that the t-distribution
and chi-squared distribution are good
approximations to the permutation distribution
for sufficiently large and/or
normally distributed samples.
24The bootstrap
- 1969: Simon publishes the bootstrap as an example
in Basic Research Methods in Social Science (the
earlier pigfood example)
- 1979: Efron names and publishes first paper on the
bootstrap
- Coincides with advent of personal computer
25Additional examples
26
- 10 of 135 high-cholesterol men developed MI
(.074), and only 21 of 470 low-cholesterol men
(.045), for a difference of .029
27Resampling solution
- 1. Constitute an urn with 31 1s (MI) and 574
2s (no MI)
- 2. Take a sample of size 135 from the urn (high
cholesterol men)
- 3. Take a second sample of size 470 (low
cholesterol men)
28Resampling solution, cont.
- 4. Count the number of 1s (MIs) in each
- 5. Divide by sample sizes to get proportions
- 6. Find the difference in proportions: sample 1
(n=135) minus sample 2 (n=470)
- 7. Keep score of the difference
29
' An urn called "men" with 31 "1s" (infarctions) and 574 "2s" (no infarction)
URN 31#1 574#2 men
REPEAT 1000
  SHUFFLE men men
  TAKE men 1,135 high      'Sample (without replacement) from the urn 135 "men"
  TAKE men 136,605 low     'Same for a group of 470
  COUNT high =1 a          'Count MIs in first group
  DIVIDE a 135 aa          'Express as a proportion
  COUNT low =1 b           'Count MIs in second group
  DIVIDE b 470 bb          'Express as a proportion
  SUBTRACT aa bb c         'Find the difference
  SCORE c z                'Keep score
END
HISTOGRAM z
COUNT z >.029 k            'How often was the resampled difference > the observed difference?
DIVIDE k 1000 kk           'Convert this result to a proportion
PRINT kk
30Results (est. p-value 0.125)
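For readers without Resampling Stats, the same urn procedure can be sketched in Python. This is a rough equivalent under stated assumptions, not a translation endorsed by the slides: the function name `urn_test`, the fixed seed, and the 10,000 trials are illustrative.

```python
import random


def urn_test(mi_total=31, no_mi_total=574, n_high=135, n_low=470,
             observed_diff=0.029, trials=10_000, seed=0):
    """Share of shuffles whose difference in MI proportions exceeds observed_diff."""
    rng = random.Random(seed)                 # fixed seed: an assumption, for repeatability
    urn = [1] * mi_total + [2] * no_mi_total  # 1 = MI, 2 = no MI
    exceed = 0
    for _ in range(trials):
        rng.shuffle(urn)
        high = urn[:n_high]                   # first 135 "men"
        low = urn[n_high:n_high + n_low]      # remaining 470 "men"
        diff = high.count(1) / n_high - low.count(1) / n_low
        if diff > observed_diff:              # keep score
            exceed += 1
    return exceed / trials


p_estimate = urn_test()
print(f"estimated p-value: {p_estimate:.3f}")
```

The estimate should land in the neighborhood of the figure reported on this slide, with some run-to-run variation.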
31Permutation procedures
- Exact - conserve Type I error
- Increasingly a part of software
- Tend to be conservative
32Bootstrap procedures
- Bootstrap variants of permutation tests with 2 x
2 contingency tables can improve power
- Straight bootstrap (involving no shuffling) can
produce overly narrow confidence limits
- As sample sizes increase, bootstrap undercoverage
decreases
33Bootstrap adjustments
- Formula-based adjustments (t-boot, Boot-bca)
- Double (iterated) bootstrap
- Parametric bootstrap
34Resampling Checklist
- 1. Specify the relevant universe(s)
- 2. Specify the sampling procedure (size, number of
samples)
- 3. Calculate the statistic/estimate of
interest
- 4. Score the resample results and, after
completion, use them to calculate a numerical answer
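One way to read the checklist is as the four arguments of a small generic routine. The sketch below is an assumption about how that might look in Python; `resample` and `draw_with_replacement` are hypothetical names, and the pig weight gains are reused as the universe.

```python
import random
from statistics import median


def resample(universe, draw, statistic, trials=1000, seed=0):
    """Draw `trials` samples from `universe` and score `statistic` on each."""
    rng = random.Random(seed)             # fixed seed: an assumption, for repeatability
    scores = []
    for _ in range(trials):
        sample = draw(rng, universe)      # item 2: the sampling procedure
        scores.append(statistic(sample))  # item 3: the statistic of interest
    return scores                         # item 4: scored results


# Item 1: the relevant universe (the observed pig weight gains).
weight_gains = [496, 544, 464, 416, 512, 560, 608, 544, 480, 466, 512, 496]


def draw_with_replacement(rng, universe):
    return rng.choices(universe, k=len(universe))


scores = sorted(resample(weight_gains, draw_with_replacement, median))
print("middle 90% of resampled medians:", scores[50], "to", scores[949])
```

Swapping in a different `draw` or `statistic` changes the analysis without touching the loop.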
35Flexibility in test statistic
- Prior to resampling, much work was expended on
determining the sampling distribution of the test
statistic.
- No longer necessary
- Use a wide variety of home-grown statistics
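As an illustration of a home-grown statistic, the sketch below runs a shuffle test on the mouse survival times using the difference in medians instead of means. The names `shuffle_test` and `median_gap`, the fixed seed, and the 10,000 trials are assumptions, not part of the original slides.

```python
import random
from statistics import median

treated = [94, 38, 23, 197, 99, 16, 141]
untreated = [52, 10, 40, 104, 51, 27, 146, 30, 46]


def median_gap(group_a, group_b):
    """A home-grown statistic: difference between the two group medians."""
    return median(group_a) - median(group_b)


def shuffle_test(group_a, group_b, statistic, trials=10_000, seed=0):
    """Return (observed statistic, share of shuffles exceeding it)."""
    rng = random.Random(seed)  # fixed seed: an assumption, for repeatability
    observed = statistic(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    exceed = 0
    for _ in range(trials):
        rng.shuffle(pooled)
        if statistic(pooled[:n_a], pooled[n_a:]) > observed:
            exceed += 1
    return observed, exceed / trials


observed_gap, p_estimate = shuffle_test(treated, untreated, median_gap)
print("observed difference in medians:", observed_gap)
print(f"share of shuffles exceeding it: {p_estimate:.3f}")
```

No sampling distribution had to be derived for the median difference; only the `statistic` argument changed.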
36Terms
- Bootstrap
- - sampling with replacement from observed data
- Permutation test
- - constitute null model, permute, divide into
resamples
- Randomization test = permutation test
- Approximate permutation test
- - shuffling instead of exhaustive permutation
- Monte Carlo simulation
- Exact tests: control of Type I error
- Resampling