Title: Bootstraps and Jackknives
- Hal Whitehead
- BIOL4062/5062
- Confidence in estimators
- Why use bootstraps or jackknives?
- The jackknife
- The parametric bootstrap
- The non-parametric bootstrap (a.k.a. "the bootstrap")
Estimation without confidence (standard error, confidence interval) has little value
Confidence in estimates: traditional approach
- Data → (biological model) → Estimator (statistic)
- Estimator → (statistical model) → Confidence in estimator
Confidence in estimates: traditional approach
- e.g. What is the sex ratio of a vole population?
- Trap 12 males, 15 females
- Estimated sex ratio (proportion male): $12/(12+15) = 0.444$
- Using the binomial distribution
- $\mathrm{SE} = \sqrt{0.444 \times (1-0.444)/(12+15)} = 0.096$
- So the sex ratio is estimated to be 0.444 (SE 0.096)
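The binomial calculation above can be checked with a few lines of Python. This is a minimal sketch; the function name `binomial_se` is our own, not from the course.

```python
import math

def binomial_se(successes, n):
    """Proportion p = successes/n and its binomial standard error sqrt(p(1-p)/n)."""
    p = successes / n
    return p, math.sqrt(p * (1 - p) / n)

males, females = 12, 15
p, se = binomial_se(males, males + females)
print(f"sex ratio = {p:.3f} (SE {se:.3f})")  # sex ratio = 0.444 (SE 0.096)
```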
e.g. Asymmetry of size among nestlings in nests of 6
- Measure the difference between the size of each nestling and its most similar neighbour
- Sizes: 1.2 4.3 4.7 3.2 6.1 1.3
- Differences: 0.1 0.4 0.4 1.1 1.4 0.1 → mean 0.58
- But what confidence do we have in this?
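A minimal Python sketch of this asymmetry statistic, assuming "most similar neighbour" means the nestmate whose size is closest; the function name `nearest_size_differences` is hypothetical.

```python
import statistics

def nearest_size_differences(sizes):
    """Absolute size difference between each nestling and its most similar nestmate."""
    return [
        min(abs(a - b) for j, b in enumerate(sizes) if j != i)
        for i, a in enumerate(sizes)
    ]

sizes = [1.2, 4.3, 4.7, 3.2, 6.1, 1.3]   # one nest of 6, from the slide
diffs = nearest_size_differences(sizes)
asymmetry = statistics.mean(diffs)
print([round(d, 1) for d in diffs], round(asymmetry, 2))
# [0.1, 0.4, 0.4, 1.1, 1.4, 0.1] 0.58
```

The statistic is easy to compute; what it lacks is a simple formula for its standard error.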
Confidence in estimator: mean distance between animals
- In a small population, what is the expected distance between any two animals?
- Estimate is the mean of the distances between all pairs of animals
- What is our confidence in this estimate?
- No easy formula (pairs share animals, so the distances are not independent)
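For concreteness, here is a sketch of the estimator itself, using made-up coordinates (hypothetical, for illustration only). With n animals there are n(n-1)/2 pairs, but each animal appears in n-1 of them, which is where the dependence comes from.

```python
import itertools
import math
import statistics

def mean_pairwise_distance(points):
    """Mean Euclidean distance over all unordered pairs of animals."""
    dists = [math.dist(p, q) for p, q in itertools.combinations(points, 2)]
    return statistics.mean(dists)

# Hypothetical (x, y) positions of three animals, for illustration only
positions = [(0, 0), (3, 4), (6, 8)]
print(mean_pairwise_distance(positions))  # pair distances 5, 10, 5 -> mean 20/3
```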
Use bootstraps and jackknives when
- There is no clear biological model
- Deriving a statistical model is very difficult, impossible, or tedious
- The statistical model is too complicated to be useful
- The model may not be quite valid
- An accurate measure of precision under the statistical model is only possible with large n
The Jackknife
- Data $D = \{X_1, X_2, X_3, \ldots, X_n\}$ → statistic $s$
- Jackknife replicates leave out units (or groups of units) in turn
- $J_1 = \{X_2, X_3, \ldots, X_n\}$ → statistic $s_{-1}$ (missing unit 1)
- $J_2 = \{X_1, X_3, \ldots, X_n\}$ → statistic $s_{-2}$ (missing unit 2)
- etc.
- Convert into pseudovalues
- $f_1 = n s - (n-1) s_{-1}$
- $f_2 = n s - (n-1) s_{-2}$
- etc.
The Jackknife
- The jackknifed estimate of $s$ is then
- $s_J = \mathrm{mean}(f_1, \ldots, f_n)$
- $\mathrm{SE}(s) = \mathrm{SE}(f_1, \ldots, f_n) = \mathrm{SD}(f_1, \ldots, f_n)/\sqrt{n}$
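The two jackknife slides translate directly into a short function. This is a minimal sketch in Python; `jackknife` is our own name, and `statistic` can be any function that takes a list of units.

```python
import math
import statistics

def jackknife(data, statistic):
    """Leave-one-out jackknife via pseudovalues f_i = n*s - (n-1)*s_{-i}.

    Returns the jackknifed estimate s_J (mean of the pseudovalues), its SE
    (SD of the pseudovalues / sqrt(n)) and the pseudovalues themselves.
    """
    data = list(data)
    n = len(data)
    s = statistic(data)
    pseudo = []
    for i in range(n):
        s_minus_i = statistic(data[:i] + data[i + 1:])  # drop unit i
        pseudo.append(n * s - (n - 1) * s_minus_i)
    s_j = statistics.mean(pseudo)
    se = statistics.stdev(pseudo) / math.sqrt(n)
    return s_j, se, pseudo
```

As a sanity check, for the sample mean the pseudovalues are just the data values, so `jackknife([1, 2, 3, 4, 5], statistics.mean)` returns an estimate of 3 and the usual standard error of the mean, SD/√n.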
The Jackknife
- Jackknifed estimate reduces bias (it removes the bias term of order 1/n)
- Jackknife SE is rough and ready
- usually conservative (overestimates SE)
- If data are not independent, jackknife on blocks of units
- Confidence intervals assume normality
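A sketch of jackknifing on blocks, with a normal-theory confidence interval. This is our own illustration: the block structure, the equal weighting of blocks, and the 1.96 multiplier (a 95% interval under normality) are assumptions, not taken from the slides.

```python
import math
import statistics

def block_jackknife(blocks, statistic, z=1.96):
    """Delete-a-block jackknife for non-independent data.

    `blocks` is a list of groups of units (e.g. all observations from one
    day); each replicate drops one whole block, so with g blocks the
    pseudovalues are g*s - (g-1)*s_{-i}. The interval is s_J +/- z * SE,
    which assumes the pseudovalue mean is approximately normal.
    """
    g = len(blocks)
    pooled = [x for b in blocks for x in b]
    s = statistic(pooled)
    pseudo = []
    for i in range(g):
        rest = [x for j, b in enumerate(blocks) if j != i for x in b]
        pseudo.append(g * s - (g - 1) * statistic(rest))
    s_j = statistics.mean(pseudo)
    se = statistics.stdev(pseudo) / math.sqrt(g)
    return s_j, se, (s_j - z * se, s_j + z * se)

# Hypothetical data: three blocks of two observations each
s_j, se, ci = block_jackknife([[1, 2], [3, 4], [5, 6]], statistics.mean)
print(s_j, se, ci)
```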
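Correlation between gill weight and body weight in 12 crabs

| Gill (mg) | Body (g) | $r_{-i}$ (crab $i$ omitted) | $f_i$ |
|---|---|---|---|
| 159 | 14.40 | 0.888 | 0.607 |
| 179 | 15.20 | 0.884 | 0.656 |
| 100 | 11.30 | 0.892 | 0.570 |
| 45 | 2.50 | 0.830 | 1.249 |
| 384 | 22.70 | 0.811 | 1.452 |
| 230 | 14.90 | 0.863 | 0.879 |
| 100 | 1.41 | 0.875 | 0.751 |
| 320 | 15.81 | 0.872 | 0.779 |
| 80 | 4.19 | 0.845 | 1.078 |
| 220 | 15.39 | 0.867 | 0.843 |
| 320 | 17.25 | 0.858 | 0.940 |
| 210 | 9.52 | 0.877 | 0.725 |

- $r = 0.865$
- Jackknife $r = 0.878$ (mean of the $f_i$)
- SE $= 0.0768$ ($= \mathrm{SD}(f_i)/\sqrt{12}$)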
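The crab calculation can be reproduced from the table. This is a sketch: `pearson_r` is written out by hand rather than relying on a particular library, and the printed values should agree with the slide to within rounding.

```python
import math
import statistics

gill = [159, 179, 100, 45, 384, 230, 100, 320, 80, 220, 320, 210]   # mg
body = [14.40, 15.20, 11.30, 2.50, 22.70, 14.90,
        1.41, 15.81, 4.19, 15.39, 17.25, 9.52]                      # g

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

n = len(gill)
r = pearson_r(gill, body)
# r_{-i}: correlation with crab i left out
r_minus = [pearson_r(gill[:i] + gill[i + 1:], body[:i] + body[i + 1:])
           for i in range(n)]
pseudo = [n * r - (n - 1) * ri for ri in r_minus]     # f_i
r_jack = statistics.mean(pseudo)
se_jack = statistics.stdev(pseudo) / math.sqrt(n)
print(f"r = {r:.3f}, jackknife r = {r_jack:.3f}, SE = {se_jack:.4f}")
```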
Bootstraps
Parametric Bootstrap
- Assume the data were produced by a model with some unknown parameters, which need to be estimated
- Model → Data → Parameter estimates ($s$)
- The bootstrap process
- Model with parameter estimates ($s$) → Random data → Bootstrap replicate estimates ($s^*$)
- The distribution of the bootstrap replicate estimates ($s^*$'s) gives the distribution, confidence intervals and standard errors of $s$ (plus an indicator of bias)
- Usually use 100-10,000 bootstrap replicates
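The process above can be sketched generically: the caller supplies a function that simulates one data set from the fitted model and a function that re-estimates the statistic. As an illustration (our choice, not from the slides) it is applied to the vole sex-ratio example under a binomial model with $p = 12/27$; the names `parametric_bootstrap`, `simulate_males` and `sex_ratio` are hypothetical. The bootstrap SE should come out close to the binomial formula's 0.096.

```python
import random
import statistics

def parametric_bootstrap(simulate, estimate, n_boot=1000, seed=None):
    """Return n_boot bootstrap replicate estimates s*.

    simulate(rng) -> one random data set drawn from the model with the fitted
    parameters; estimate(data) -> the statistic s computed on that data set.
    """
    rng = random.Random(seed)
    return [estimate(simulate(rng)) for _ in range(n_boot)]

# Illustration: vole sex ratio, binomial model fitted to 12 males out of 27
n_trapped = 27
p_hat = 12 / n_trapped

def simulate_males(rng):
    # Each trapped animal is male with probability p_hat
    return sum(rng.random() < p_hat for _ in range(n_trapped))

def sex_ratio(males):
    return males / n_trapped

reps = parametric_bootstrap(simulate_males, sex_ratio, n_boot=10000, seed=1)
bias = statistics.mean(reps) - p_hat   # indicator of bias (near 0 here)
se = statistics.stdev(reps)            # bootstrap SE of the estimate
print(f"bias = {bias:.4f}, SE = {se:.3f}")
```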
Parametric Bootstrap, an example: Mark-Recapture Estimate
- Mark 25 animals
- Recapture 46
- of which 12 are marked
- What is the population size?
- Petersen estimate is $25 \times 46/12 = 95.8$
- What is our confidence in this estimate, and its expected bias?
Parametric Bootstrap, an example: Mark-Recapture Estimate
- Mark 25 animals, recapture 46, 12 marked
- Petersen estimate is $25 \times 46/12 = 95.8$
- What is our confidence, and the expected bias?
- Parametric bootstrap replicates
- 96 animals, mark 25, recapture 46
- How many marked?
- From simulation ($m^*$)
- 9 14 14 9 14 13 12 13 12 14 ...
- Calculate population estimates ($n^* = 25 \times 46/m^*$)
- 127.8 82.1 82.1 127.8 82.1 88.5 95.8 88.5 95.8 82.1 ...
Parametric Bootstrap, an example: Mark-Recapture Estimate
- Petersen estimate is $25 \times 46/12 = 95.8$
- Bootstrap population estimates (assuming $n = 96$)
- 127.8 82.1 82.1 127.8 82.1 88.5 95.8 88.5 95.8 82.1 ...
- Expected bias
- $\mathrm{mean}(n^*) - 96 = 99.7 - 96 = 3.7$
- Estimated standard error
- $\mathrm{SD}(n^*) = 20.4$
- So the bias-corrected population estimate is $95.8 - 3.7 = 92.1$ (SE 20.4)
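The simulation behind these numbers can be sketched as follows, assuming the recapture is a random sample without replacement from 96 animals of which 25 are marked (done here with `random.sample`). The seed and the number of replicates are arbitrary choices, so the bias and SE will be close to, but not exactly, the slide's 3.7 and 20.4.

```python
import random
import statistics

N_ASSUMED = 96            # population size used to simulate (slide: n = 96)
MARKED, RECAPTURED, MARKED_SEEN = 25, 46, 12

petersen = MARKED * RECAPTURED / MARKED_SEEN   # 95.8

rng = random.Random(2)    # arbitrary seed
population = [1] * MARKED + [0] * (N_ASSUMED - MARKED)   # 1 = marked animal
estimates = []
for _ in range(10000):
    m_star = sum(rng.sample(population, RECAPTURED))  # marked animals recaptured
    if m_star > 0:                                   # Petersen undefined if m* = 0
        estimates.append(MARKED * RECAPTURED / m_star)

bias = statistics.mean(estimates) - N_ASSUMED
se = statistics.stdev(estimates)
print(f"Petersen {petersen:.1f}, bias {bias:.1f}, SE {se:.1f}, "
      f"bias-corrected {petersen - bias:.1f}")
```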
Parametric Bootstrap, an example: Mark-Recapture Estimate
Non-Parametric Bootstrap (a.k.a. the bootstrap)
- Data $D = \{X_1, X_2, X_3, \ldots, X_n\}$ → statistic $s$
- Bootstrap replicates
- $D^*_1 = \{X^*_1, X^*_2, X^*_3, \ldots, X^*_n\}$ → statistic $s^*_1$
- $D^*_2 = \{X^*_1, X^*_2, X^*_3, \ldots, X^*_n\}$ → statistic $s^*_2$
- ...
- $X^*_1, X^*_2, X^*_3, \ldots, X^*_n$ are selected at random, with replacement, from $X_1, X_2, X_3, \ldots, X_n$
- The distribution, confidence interval and SE of $s$ are estimated from the distribution, percentiles and standard deviation of the $s^*$'s
- Usually use 100-10,000 bootstrap replicates
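A minimal sketch of the non-parametric bootstrap with a percentile confidence interval (the function name `bootstrap` and its parameters are our own). With 1000 replicates and alpha = 0.05 the interval runs from the 25th to the 975th sorted replicate.

```python
import random
import statistics

def bootstrap(data, statistic, n_boot=1000, alpha=0.05, seed=None):
    """Non-parametric bootstrap with a percentile confidence interval.

    Each replicate resamples n units with replacement from the data. Returns
    the observed statistic, the bootstrap SE (SD of the s* values), the
    percentile interval and the sorted replicates.
    """
    rng = random.Random(seed)
    n = len(data)
    reps = sorted(statistic(rng.choices(data, k=n)) for _ in range(n_boot))
    se = statistics.stdev(reps)
    lo_rank = max(1, round(n_boot * alpha / 2))   # e.g. 25th of 1000
    hi_rank = round(n_boot * (1 - alpha / 2))     # e.g. 975th of 1000
    return statistic(data), se, (reps[lo_rank - 1], reps[hi_rank - 1]), reps

est, se, (lo, hi), reps = bootstrap(list(range(1, 11)), statistics.mean, seed=0)
print(est, round(se, 3), lo, hi)
```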
Non-Parametric Bootstrap, an example: Median Gill Weight in Crabs
- Gill weights (in mg)
- 159 179 100 45 384 230 100 320 80 220 320 210
- Median 195 mg

| Sample | Gill weights (mg) | Median |
|---|---|---|
| Real | 159 179 100 45 384 230 100 320 80 220 320 210 | 195 |
| B1 | 320 159 45 320 100 320 100 320 100 230 100 210 | 185 |
| B2 | 384 384 45 384 45 384 100 80 45 179 230 230 | 205 |
| B3 | 159 320 80 45 45 80 220 210 230 320 230 220 | 215 |
| B4 | 220 179 384 100 80 100 230 230 179 230 384 45 | 200 |
| B5 | 320 220 210 100 159 320 220 210 100 80 100 210 | 210 |
| B6 | 80 100 230 100 210 384 159 220 320 45 45 210 | 185 |
| B7 | 179 210 80 320 100 230 159 320 100 45 384 320 | 195 |
| B8 | 384 159 100 159 100 179 100 179 220 384 220 159 | 169 |
| B9 | 320 210 45 320 179 159 100 210 159 45 210 100 | 169 |
| ... | | |
Non-Parametric Bootstrap, an example: Median Gill Weight in Crabs
- Gill weights (in mg)
- 159 179 100 45 384 230 100 320 80 220 320 210
- Median 195 mg
- Bootstrap (1000 replicates)
- Mean of the bootstrap medians: 188 mg
- 95% c.i.: 100-275 mg
- i.e. b(25)-b(975), the 25th and 975th of the 1000 sorted bootstrap medians
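The median example can be reproduced along these lines. This is a sketch; the seed is arbitrary, so the mean of the bootstrap medians and the interval ends will differ a little from the slide's 188 mg and 100-275 mg.

```python
import random
import statistics

gill = [159, 179, 100, 45, 384, 230, 100, 320, 80, 220, 320, 210]  # mg

rng = random.Random(1)   # arbitrary seed; results vary slightly between seeds
B = 1000
medians = sorted(
    statistics.median(rng.choices(gill, k=len(gill))) for _ in range(B)
)

print("observed median:", statistics.median(gill))  # 194.5 (195 mg on the slide)
print("mean of bootstrap medians:", statistics.mean(medians))
print("95% c.i.:", medians[24], "-", medians[974])   # b(25) - b(975)
```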
Bootstraps in Molecular Genetics
- Calculate a tree from the genetic data
- (e.g. 20 species and 300 loci)
- For each bootstrap replicate
- Resample loci with replacement
- (20 species with 300 loci, some repeated)
- Calculate a tree
- Look at the agreement between the original and bootstrap trees
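The resampling step can be sketched as below: species (rows) are kept and loci (columns) are drawn with replacement. The alignment is hypothetical random data, and tree building is deliberately left out; whatever method built the original tree would be applied to each replicate.

```python
import random

def resample_loci(alignment, rng):
    """One bootstrap replicate of a species-by-loci matrix.

    Rows (species) are kept; columns (loci) are drawn with replacement, so
    some loci appear more than once and others not at all.
    """
    n_loci = len(alignment[0])
    cols = [rng.randrange(n_loci) for _ in range(n_loci)]
    return [[row[c] for c in cols] for row in alignment]

# Hypothetical data: 20 species x 300 loci of random nucleotides
rng = random.Random(0)
alignment = [[rng.choice("ACGT") for _ in range(300)] for _ in range(20)]
replicate = resample_loci(alignment, rng)
# A tree would then be built from `replicate` and its clades compared with
# those of the tree built from `alignment`.
```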
Bootstrapped spanning tree (Glazko & Nei, Mol. Biol. Evol. 2003)
- Bootstraps
- Better estimate of confidence
- Variable n
- Self-comparisons a problem (a unit drawn more than once is compared with itself)
- e.g. Mean of associations
- Gives SEs, confidence intervals and profile of
confidence
- Jackknives
- Worse estimate of confidence
- Usually conservative
- underestimates precision
- Fixed n
- Self-comparisons not a problem
- Reduces bias
- Only directly gives SE
- Confidence intervals need assumption of normality
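A quick simulation (our own illustration; the function name is hypothetical) of why bootstrap samples have a variable number of distinct units and contain duplicates, the source of the self-comparison problem. On average only about $1 - (1 - 1/n)^n$ of the units appear in a bootstrap sample, whereas every leave-one-out jackknife replicate contains exactly n-1 distinct units.

```python
import random

def mean_fraction_distinct(n, n_boot=5000, seed=None):
    """Average fraction of the n original units that appear in a bootstrap sample."""
    rng = random.Random(seed)
    units = range(n)
    total = 0.0
    for _ in range(n_boot):
        sample = rng.choices(units, k=n)   # resample with replacement
        total += len(set(sample)) / n
    return total / n_boot

n = 12
print(mean_fraction_distinct(n, seed=0))   # close to 1 - (1 - 1/n)**n
print(1 - (1 - 1 / n) ** n)               # about 0.648 for n = 12
```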
Bootstraps and Jackknives
- Give estimates of confidence (and bias) when distributions are unknown, approximate, or intractable
- Parametric bootstrap
- very useful if model known
- needs programming
- Non-parametric bootstrap
- widely applicable (except self-referencing situations)
- few assumptions
- Jackknife
- approximate
- only standard error given directly
- useful when bootstrap not applicable