Resampling Methods - PowerPoint PPT Presentation

Provided by: peter971

1
Resampling Methods
  • Peter Bruce
  • Resampling Stats, Cytel Software, Statistics.com
  • pbruce_at_resample.com

2
Resampling Methods
  • What is resampling
  • Examples
  • Historical perspective

3
What is resampling
  • Permutation
  • Bootstrap
  • Monte Carlo simulation

4
Permutation
  • Survival times (days)
  • Treated mice: 94, 38, 23, 197, 99, 16, 141
  • Mean: 86.86
  • Untreated mice: 52, 10, 40, 104, 51, 27, 146, 30, 46
  • Mean: 56.22
  • (Efron & Tibshirani)

5
  • 1. Calculate the difference between the means of
    the two observed samples: it is 30.6 days in
    favor of the treated mice.
  • 2. Consider the two samples combined (16
    observations) as the relevant universe to
    resample from.

6
  • 3. Draw 7 hypothetical observations and
    designate them "Treatment"; draw 9 hypothetical
    observations and designate them "Control".
  • 4. Compute and record the difference between the
    means of the two samples.

7
  • 5. Repeat steps 3 and 4 perhaps 1000 times.
  • 6. Determine how often the resampled difference
    exceeds the observed difference of 30.6
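
Steps 1 to 6 can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's own program: the function name, the fixed seed, and the use of "at least as large as" for the comparison are assumptions made here. Shuffling the pooled 16 values and splitting them 7 / 9 is one way to carry out the draw in step 3.

```python
import random

treated = [94, 38, 23, 197, 99, 16, 141]
untreated = [52, 10, 40, 104, 51, 27, 146, 30, 46]

def mean(xs):
    return sum(xs) / len(xs)

def permutation_p_value(a, b, n_resamples=1000, seed=0):
    """Share of shuffles whose mean difference is at least the observed one."""
    rng = random.Random(seed)            # fixed seed only for repeatability
    observed = mean(a) - mean(b)         # step 1
    pooled = list(a) + list(b)           # step 2: the combined universe
    hits = 0
    for _ in range(n_resamples):         # step 5
        rng.shuffle(pooled)              # step 3: first 7 are "Treatment"
        diff = mean(pooled[:len(a)]) - mean(pooled[len(a):])  # step 4
        if diff >= observed:             # step 6
            hits += 1
    return observed, hits / n_resamples

observed, p = permutation_p_value(treated, untreated)
print(f"observed difference = {observed:.1f}, estimated p = {p:.3f}")
```

The estimated proportion changes slightly from run to run unless the seed is fixed; more resamples give a steadier estimate.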

8
Histogram of permuted differences
9
The Bootstrap
  • A new pigfood ration is tested on twelve pigs,
    with six-week weight gains as follows:
  • 496 544 464 416 512 560 608 544 480 466 512 496
  • Mean: 508 ounces (establish a confidence
    interval)

10
The Classic Bootstrap
Draw simulated samples from a hypothetical
universe that embodies all we know about the
universe that this sample came from: our sample,
replicated an infinite number of times.
11
  • 1. Put the observed weight gains in a hat
  • 2. Sample 12 with replacement
  • 3. Record the mean
  • 4. Repeat steps 2-3, say, 1000 times
  • 5. Record the 5th and 95th percentiles (for a
    90% confidence interval)
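
The hat procedure can be sketched as follows. This is an illustrative Python version under stated assumptions: the function name and the fixed seed are choices made here, and `statistics.quantiles` is just one of several reasonable ways to read off the 5th and 95th percentiles.

```python
import random
import statistics

gains = [496, 544, 464, 416, 512, 560, 608, 544, 480, 466, 512, 496]

def bootstrap_means(data, n_resamples=1000, seed=0):
    """Means of resamples drawn with replacement, each the size of the data."""
    rng = random.Random(seed)            # fixed seed only for repeatability
    n = len(data)
    # Steps 2-4: sample 12 with replacement, record the mean, repeat.
    return [statistics.fmean(rng.choices(data, k=n)) for _ in range(n_resamples)]

means = bootstrap_means(gains)
cuts = statistics.quantiles(means, n=20)  # 19 cut points: 5%, 10%, ..., 95%
low, high = cuts[0], cuts[-1]             # step 5: 5th and 95th percentiles
print(f"sample mean = {statistics.fmean(gains):.1f}")
print(f"90% bootstrap interval: {low:.1f} to {high:.1f}")
```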

12
Bootstrapped sample means
13
Historical Perspective
14
1908 - W. S. Gosset
15
(No Transcript)
16
(No Transcript)
17
Fisher's Tea Taster
  • 8 cups of tea are prepared, four with tea poured
    first and four with milk poured first. The cups
    are presented to her in random order.

18
Permutation solution
  • 1. Mark a strip of paper with eight guesses
    about the order of the "tea-first" and
    "milk-first" cups -- let's say T T T T M M M M.
  • 2. Make a deck of eight cards, four marked "T"
    and four marked "M."
  • 3. Deal out these eight cards successively in all
    possible orderings (permutations)
  • 4. Record how many of those permutations show >
    6 matches.
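
A small Python sketch of the exhaustive count, offered as an illustration rather than the original method. It enumerates the 70 distinct ways of placing four "T" cards among eight positions instead of all 8! card orderings; each distinct arrangement corresponds to the same number of orderings, so the proportions are unchanged. The function name is an assumption made here.

```python
from itertools import combinations

guess = "TTTTMMMM"  # the marked strip of paper

def match_counts(guess):
    """Tally matches for every distinct deal of four T's and four M's."""
    counts = {}
    n = len(guess)
    for t_positions in combinations(range(n), 4):
        deal = ["T" if i in t_positions else "M" for i in range(n)]
        matches = sum(d == g for d, g in zip(deal, guess))
        counts[matches] = counts.get(matches, 0) + 1
    return counts

counts = match_counts(guess)
total = sum(counts.values())
for matches in sorted(counts):
    # matches, number of deals, proportion of all deals
    print(matches, counts[matches], f"{counts[matches] / total:.4f}")
```

Only even numbers of matches can occur, because every misplaced "T" forces a misplaced "M".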

19
Approximate Permutation
  • 3. Shuffle the deck and deal it out along the
    strip of paper with the marked guesses, record
    the number of matches.
  • 4. Repeat many times.
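
The shuffled version can be sketched the same way. This is a minimal illustration with an assumed function name, shuffle count, and seed; it estimates the share of shuffles giving each number of matches rather than enumerating every deal.

```python
import random

def approx_match_distribution(guess="TTTTMMMM", n_shuffles=10000, seed=0):
    """Estimate the distribution of matches by repeated shuffling."""
    rng = random.Random(seed)            # fixed seed only for repeatability
    deck = list("TTTTMMMM")              # four "T" cards and four "M" cards
    tally = {}
    for _ in range(n_shuffles):
        rng.shuffle(deck)                # step 3: shuffle and deal along the strip
        matches = sum(card == g for card, g in zip(deck, guess))
        tally[matches] = tally.get(matches, 0) + 1
    return {m: c / n_shuffles for m, c in sorted(tally.items())}

print(approx_match_distribution())
```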

20
Other names
  • Monte Carlo permutation
  • Randomization test
  • Sampled permutation (randomization) test

21
Extension to multiple samples
  • Fisher went on to apply the same idea to
    agricultural experiments involving two or more
    samples. The question became "How likely is it
    that random arrangements of the observed data
    would produce samples differing as much as the
    observed samples differ?"

22
Extension to samples from populations
  • In the 1930s, Fisher and Pitman showed that the
    inference for a permutation test extended to
    cover not just random re-arrangements of a fixed
    set of finite elements, but also samples from
    larger populations.

23
Formula-based analogs
  • Fisher and Pitman showed that the t-distribution
    and chi-squared distribution are good
    approximations for sufficiently large and/or
    normally-distributed samples.

24
The bootstrap
  • 1969 Simon publishes the bootstrap as an example
    in Basic Research Methods in Social Science (the
    earlier pigfood example)
  • 1979 Efron names and publishes first paper on the
    bootstrap
  • Coincides with advent of personal computer

25
Additional examples
  • Myocardial infarctions

26
  • 10 of 135 high-cholesterol men developed MI
    (.074), and only 21 of 470 low-cholesterol men
    (.045), for a difference of .029

27
Resampling solution
  • 1. Constitute an urn with 31 1s (MI) and 574
    2s (no MI)
  • 2. Take a sample of size 135 from the urn (high
    cholesterol men)
  • 3. Take a second sample of size 470 (low
    cholesterol men)

28
Resampling solution, cont.
  • 4. Count the number of 1s (MIs) in each
  • 5. Divide by sample sizes to get proportions
  • 6. Find the difference in proportions: sample 1
    (n = 135) minus sample 2 (n = 470)
  • 7. Keep score of the difference
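
For readers without Resampling Stats, steps 1 to 7 can be sketched in Python. This is an illustrative translation, not the presenter's code: the function name and seed are assumptions, and the .029 threshold is the rounded observed difference taken from the slides.

```python
import random

def mi_resampling(n_resamples=1000, seed=0):
    """Shuffle the urn, split it 135 / 470, and score the difference in MI rates."""
    rng = random.Random(seed)            # fixed seed only for repeatability
    urn = [1] * 31 + [2] * 574           # step 1: 1 = MI, 2 = no MI
    observed = 10 / 135 - 21 / 470
    diffs = []
    for _ in range(n_resamples):
        rng.shuffle(urn)
        high, low = urn[:135], urn[135:]          # steps 2-3, without replacement
        a = high.count(1) / 135                   # steps 4-5
        b = low.count(1) / 470
        diffs.append(a - b)                       # steps 6-7
    p = sum(d > 0.029 for d in diffs) / n_resamples
    return observed, p

observed, p = mi_resampling()
print(f"observed difference = {observed:.3f}, estimated p = {p:.3f}")
```

Because the counts are whole numbers, a resampled difference above .029 is the same event as 10 or more MIs landing in the group of 135.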

29
URN 31#1 574#2 men       'An urn called "men" with 31 "1s" (infarctions) and 574 "2s" (no infarction)
REPEAT 1000
  SHUFFLE men men
  TAKE men 1,135 high    'Sample (without replacement) from the urn 135 "men"
  TAKE men 136,605 low   'Same for a group of 470.
  COUNT high =1 a        'Count MI's in first group
  DIVIDE a 135 aa        'Express as a proportion
  COUNT low =1 b         'Count MI's in second group
  DIVIDE b 470 bb        'Express as a proportion
  SUBTRACT aa bb c       'Find the difference
  SCORE c z              'Keep score
END
HISTOGRAM z
COUNT z > .029 k         'How often was the resampled difference > the observed difference?
DIVIDE k 1000 kk         'Convert this result to a proportion
PRINT kk
30
Results (est. p-value 0.125)
31
Permutation procedures
  • Exact - conserve Type I error
  • Increasingly a part of software
  • Tend to be conservative

32
Bootstrap procedures
  • Bootstrap variants of permutation tests with 2 x
    2 contingency tables can improve power
  • Straight bootstrap (involving no shuffling) can
    produce overly narrow confidence limits
  • As sample sizes increase, bootstrap undercoverage
    decreases
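
The undercoverage point can be checked with a small simulation. The sketch below is only an illustration under assumptions chosen here: standard normal data with true mean 0, the sample mean as the statistic, a nominal 90% percentile interval, and small trial counts to keep it quick. None of these settings come from the slides.

```python
import random
import statistics

def percentile_interval(sample, rng, n_resamples=200):
    """Nominal 90% bootstrap percentile interval for the mean."""
    n = len(sample)
    means = [statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_resamples)]
    cuts = statistics.quantiles(means, n=20)   # cut points at 5%, ..., 95%
    return cuts[0], cuts[-1]

def coverage(n, trials=200, seed=0):
    """Share of simulated samples whose interval contains the true mean (0)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        low, high = percentile_interval(sample, rng)
        hits += low <= 0.0 <= high
    return hits / trials

for n in (5, 40):
    print(f"n = {n}: estimated coverage = {coverage(n):.2f}")
```

With so few trials the estimates are noisy, so they indicate the direction of the effect rather than precise coverage figures.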

33
Bootstrap adjustments
  • Formula-based adjustments (t-boot, Boot-bca)
  • Double (iterated) bootstrap
  • Parametric bootstrap

34
Resampling Checklist
  • 1. Specify the relevant universe(s).
  • 2. Specify the sampling procedure (size, number
    of samples).
  • 3. Calculate the statistic/estimate of interest.
  • 4. Score the resample results and, after
    completion, use them to calculate a numerical
    answer.
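
The checklist maps onto a small generic routine. The sketch below is one possible arrangement, with hypothetical names (`resample`, `draw_with_replacement`) introduced here for illustration; it reuses the pig weight gains from the bootstrap example and swaps the mean for the median to show that the statistic is a free choice.

```python
import random
import statistics

def resample(universe, draw, statistic, n_resamples=1000, seed=0):
    """Apply a sampling procedure and a statistic repeatedly; return the scores."""
    rng = random.Random(seed)            # fixed seed only for repeatability
    return [statistic(draw(universe, rng)) for _ in range(n_resamples)]

# 1. Relevant universe: the observed weight gains.
gains = [496, 544, 464, 416, 512, 560, 608, 544, 480, 466, 512, 496]

# 2. Sampling procedure: same size as the data, with replacement.
def draw_with_replacement(universe, rng):
    return rng.choices(universe, k=len(universe))

# 3. Statistic of interest: here the median rather than the mean.
scores = resample(gains, draw_with_replacement, statistics.median)

# 4. Use the scored results to calculate a numerical answer.
cuts = statistics.quantiles(scores, n=20)
print(f"90% interval for the median: {cuts[0]:.1f} to {cuts[-1]:.1f}")
```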

35
Flexibility in test statistic
  • Prior to resampling, much work expended on
    determining sampling distribution of test
    statistic.
  • No longer necessary
  • Use a wide variety of home-grown statistics

36
Terms
  • Bootstrap
  • - sampling with replacement from observed data
  • Permutation test
  • - constitute null model, permute, divide into
    resamples
  • Randomization test = permutation test
  • Approximate permutation test
  • - shuffling instead of exhaustive permutation
  • Monte Carlo simulation
  • Exact tests = control of Type I error
  • Resampling