Bootstraps and Jackknives - PowerPoint PPT Presentation

About This Presentation
Title:

Bootstraps and Jackknives


Slides: 26
Provided by: HalWhi9

Transcript and Presenter's Notes

Title: Bootstraps and Jackknives


1
Bootstraps and Jackknives
  • Hal Whitehead
  • BIOL4062/5062

2
  • Confidence in estimators
  • Why use bootstraps or jackknives?
  • The jackknife
  • The parametric bootstrap
  • The non-parametric bootstrap ("the bootstrap")

3
Estimation without confidence (standard error,
confidence interval) has little value
4
Confidence in estimates: traditional approach
  • DATA → (biological model) → Estimator (statistic)
  • Estimator → (statistical model) → Confidence in estimator
5
Confidence in estimates: traditional approach
  • e.g. What is the sex ratio of a vole population?
  • Trap: 12 males, 15 females
  • Estimate ratio (proportion male): $12/(12+15) = 0.444$
  • Using the binomial distribution:
  • $\mathrm{SE} = \sqrt{0.444 \times (1 - 0.444)/(12+15)} = 0.096$
  • So the sex ratio is estimated to be 0.444 (SE 0.096)
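A minimal sketch of the binomial calculation on this slide, assuming the "sex ratio" is the proportion of males among the trapped animals. The function name `binomial_proportion_se` is hypothetical.

```python
import math

def binomial_proportion_se(k, n):
    """Return the estimated proportion k/n and its binomial standard error."""
    p = k / n
    se = math.sqrt(p * (1 - p) / n)
    return p, se

# 12 males out of 12 + 15 = 27 trapped voles
p, se = binomial_proportion_se(12, 12 + 15)
print(f"proportion male = {p:.3f} (SE {se:.3f})")  # proportion male = 0.444 (SE 0.096)
```

This is the "traditional approach": the SE comes directly from a formula that holds only if the statistical model (independent binomial trials) is right.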

6
e.g. Asymmetry of size among nestlings in nests
of 6
  • Measure the difference between the size of each nestling and
    its most similar neighbour
  • Sizes: 1.2 4.3 4.7 3.2 6.1 1.3 →
  • Differences: 0.1 0.4 0.4 1.1 1.4 0.1 → mean 0.58
  • But what confidence have we in this?
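The statistic on this slide can be computed as below (a sketch; the function name is hypothetical). Note that the two closest nestlings (1.2 and 1.3) each contribute the same difference, as do 4.3 and 4.7, so the six differences are not independent. The usual SD/√n formula for the standard error of a mean assumes independence and so does not apply directly.

```python
def nearest_neighbour_differences(sizes):
    """For each nestling, the absolute size difference to its most similar nestmate."""
    diffs = []
    for i, x in enumerate(sizes):
        diffs.append(min(abs(x - y) for j, y in enumerate(sizes) if j != i))
    return diffs

sizes = [1.2, 4.3, 4.7, 3.2, 6.1, 1.3]  # one nest of 6, values from the slide
diffs = nearest_neighbour_differences(sizes)
print([round(d, 1) for d in diffs])       # [0.1, 0.4, 0.4, 1.1, 1.4, 0.1]
print(round(sum(diffs) / len(diffs), 2))  # 0.58
```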

7
Confidence in estimator: mean distance between
animals
  • In a small population,
  • what is the expected distance between any two
    animals?
  • Estimate is the mean of the distances between all pairs
    of animals
  • What is the confidence in this estimate?
  • No easy formula (the pairwise distances are not independent)
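A small sketch of this estimator, using hypothetical positions (not from the slides). With n animals there are n(n − 1)/2 distances, but each animal appears in n − 1 of them, which is why the distances cannot be treated as independent observations.

```python
import itertools
import math

def mean_pairwise_distance(points):
    """Mean Euclidean distance over all unordered pairs of animals."""
    dists = [math.dist(p, q) for p, q in itertools.combinations(points, 2)]
    return sum(dists) / len(dists)

# Hypothetical (x, y) positions of three animals, for illustration only
positions = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]
print(mean_pairwise_distance(positions))  # 4.0  (pair distances 3, 4 and 5)
```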

8
Use Bootstraps and Jackknives when
  • There is no clear biological model
  • Deriving the statistical model is very difficult, impossible, or tedious
  • The statistical model is too complicated to be useful
  • The model may not be quite valid
  • An accurate measure of precision under the statistical
    model is only possible with large n

9
The Jackknife
  • Data $D = \{X_1, X_2, X_3, \dots, X_n\} \to$ statistic $s$
  • Jackknife replicates miss out units (or groups of
    units) in turn
  • $J_1 = \{X_2, X_3, \dots, X_n\} \to$ statistic $s_{-1}$ (missing
    unit 1)
  • $J_2 = \{X_1, X_3, \dots, X_n\} \to$ statistic $s_{-2}$ (missing
    unit 2)
  • etc.
  • Convert into pseudovalues
  • $f_1 = n s - (n-1) s_{-1}$
  • $f_2 = n s - (n-1) s_{-2}$
  • etc.

10
The Jackknife
  • The jackknifed estimate of s is then
  • $s_J = \mathrm{mean}(f_1, \dots, f_n)$
  • $\mathrm{SE}(s) = \mathrm{SE}(f_1, \dots, f_n) = \mathrm{SD}(f_1, \dots, f_n)/\sqrt{n}$
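The leave-one-out recipe on the last two slides can be sketched as a generic function (the name `jackknife` and its return layout are assumptions, not from the slides). It drops one unit at a time, forms pseudovalues $f_i = n s - (n-1) s_{-i}$, and returns their mean and standard error.

```python
import math
import statistics

def jackknife(data, statistic):
    """Leave-one-out jackknife: returns (s_J, SE, pseudovalues)."""
    data = list(data)
    n = len(data)
    s = statistic(data)
    pseudovalues = []
    for i in range(n):
        s_minus_i = statistic(data[:i] + data[i + 1:])  # statistic with unit i left out
        pseudovalues.append(n * s - (n - 1) * s_minus_i)
    s_j = statistics.mean(pseudovalues)
    se = statistics.stdev(pseudovalues) / math.sqrt(n)  # SE of the mean pseudovalue
    return s_j, se, pseudovalues

s_j, se, f = jackknife([2.0, 4.0, 6.0, 8.0], statistics.mean)
print(s_j, se)
```

A useful check: when the statistic is the mean, each pseudovalue equals the omitted data value, so the jackknife reproduces the ordinary mean and its usual SD/√n standard error.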

11
The Jackknife
  • Jackknifed estimate reduces bias (it removes bias of order 1/n)
  • Jackknife SE is rough and ready
  • usually conservative (overestimates SE)
  • Jackknife on blocks of units if the data are not
    independent
  • Assumes normality for confidence intervals
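A classic illustration of the bias-reduction property (a standard result, not taken from the slides): the "plug-in" variance with divisor n is biased downwards, and jackknifing it gives exactly the usual sample variance with divisor n − 1. The nestling sizes from the asymmetry slide are reused here as convenient numbers; the function names are hypothetical.

```python
import statistics

def plugin_variance(xs):
    """Variance with divisor n (biased downwards)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def jackknife_estimate(data, statistic):
    """Mean of the jackknife pseudovalues n*s - (n-1)*s_{-i}."""
    n = len(data)
    s = statistic(data)
    pseudo = [n * s - (n - 1) * statistic(data[:i] + data[i + 1:]) for i in range(n)]
    return sum(pseudo) / n

sizes = [1.2, 4.3, 4.7, 3.2, 6.1, 1.3]
print(plugin_variance(sizes))                      # biased (divisor n)
print(jackknife_estimate(sizes, plugin_variance))  # matches the divisor n-1 variance
print(statistics.variance(sizes))                  # unbiased sample variance
```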

12
Correlation between gill weight and body weight
in 12 crabs
(r(-i): correlation with crab i left out; f(i): pseudovalue)
Gill (mg)   Body (g)   r(-i)   f(i)
      159      14.40   0.888   0.607
      179      15.20   0.884   0.656
      100      11.30   0.892   0.570
       45       2.50   0.830   1.249
      384      22.70   0.811   1.452
      230      14.90   0.863   0.879
      100       1.41   0.875   0.751
      320      15.81   0.872   0.779
       80       4.19   0.845   1.078
      220      15.39   0.867   0.843
      320      17.25   0.858   0.940
      210       9.52   0.877   0.725
r = 0.865
  • Jackknife r = 0.878 (mean of f(i))
  • SE = 0.0768 (SD(f(i))/√12)
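A sketch that jackknifes the correlation for these 12 crabs. It assumes the gill weights are the ones listed for the same crabs on the median-gill-weight slide (159, 179, 100, ...) together with the body weights in the table; the results should come out close to the slide's r = 0.865, jackknife r = 0.878 and SE = 0.0768. The helper `pearson_r` is hypothetical and written out to avoid depending on a particular Python version.

```python
import math
import statistics

def pearson_r(pairs):
    """Pearson correlation of a list of (x, y) pairs."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

gill_mg = [159, 179, 100, 45, 384, 230, 100, 320, 80, 220, 320, 210]
body_g = [14.40, 15.20, 11.30, 2.50, 22.70, 14.90, 1.41, 15.81, 4.19, 15.39, 17.25, 9.52]
crabs = list(zip(gill_mg, body_g))

n = len(crabs)
r = pearson_r(crabs)
# Pseudovalues f(i) = n*r - (n-1)*r(-i), leaving out one crab at a time
pseudo = [n * r - (n - 1) * pearson_r(crabs[:i] + crabs[i + 1:]) for i in range(n)]
r_jack = statistics.mean(pseudo)
se = statistics.stdev(pseudo) / math.sqrt(n)
print(f"r = {r:.3f}, jackknife r = {r_jack:.3f}, SE = {se:.4f}")
```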

13
Bootstraps
14
Parametric Bootstrap
  • Assume the data were produced by a model with some
    unknown parameters, which need to be estimated
  • Model → Data → Parameter estimates (s)
  • The bootstrap process
  • Model + parameter estimates (s) → Random data
    → Bootstrap replicate estimates (s*)
  • The distribution of bootstrap replicate estimates
    (s*s) gives the distribution, confidence intervals and
    standard errors of s (plus an indicator of bias)
  • Usually use 100-10,000 bootstrap replicates
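A minimal parametric bootstrap, sketched on the vole example where the model is assumed to be binomial: simulate new data sets of 27 animals from the fitted proportion, re-estimate each time, and read bias and SE off the replicates. The bootstrap SE should land close to the analytic 0.096. Function name, seed and replicate count are assumptions.

```python
import random
import statistics

def parametric_bootstrap_proportion(k, n, n_reps=5000, seed=1):
    """Simulate data from a binomial(n, k/n) model and re-estimate p each time."""
    rng = random.Random(seed)
    p_hat = k / n
    reps = []
    for _ in range(n_reps):
        k_star = sum(rng.random() < p_hat for _ in range(n))  # one simulated data set
        reps.append(k_star / n)                               # bootstrap replicate estimate s*
    return p_hat, reps

p_hat, reps = parametric_bootstrap_proportion(12, 27)
bias = statistics.mean(reps) - p_hat
se = statistics.stdev(reps)
print(f"p = {p_hat:.3f}, estimated bias = {bias:+.4f}, bootstrap SE = {se:.3f}")
```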

15
Parametric Bootstrap, an example: Mark-Recapture
Estimate
  • Mark 25 animals
  • Recapture 46
  • of which 12 are marked
  • What is the population size?
  • Petersen estimate is $25 \times 46 / 12 = 95.8$
  • What is the confidence in this estimate, and its expected
    bias?

16
Parametric Bootstrap, an example: Mark-Recapture
Estimate
  • Mark 25 animals, recapture 46, 12 marked
  • Petersen estimate is $25 \times 46 / 12 = 95.8$
  • What is the confidence, and the expected bias?
  • Parametric bootstrap replicates
  • 96 animals, mark 25, recapture 46
  • How many are marked?
  • From simulation ($m^*$s)
  • 9 14 14 9 14 13 12 13 12
    14 ...
  • Calculate population estimates ($n^* = 25 \times 46 / m^*$)
  • 127.8 82.1 82.1 127.8 82.1 88.5 95.8 88.5
    95.8 82.1 ...

17
Parametric Bootstrap, an example: Mark-Recapture
Estimate
  • Petersen estimate is $25 \times 46 / 12 = 95.8$
  • Bootstrap population estimates (assuming $n = 96$)
  • 127.8 82.1 82.1 127.8 82.1 88.5 95.8 88.5
    95.8 82.1 ...
  • Expected bias
  • $\mathrm{mean}(n^*) - 96 = 99.7 - 96 = 3.7$
  • Estimated standard error
  • $\mathrm{SD}(n^*) = 20.4$
  • So the bias-corrected population estimate is $95.8 - 3.7 = 92.1$ (SE 20.4)
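The mark-recapture bootstrap on these slides can be sketched as follows, assuming the simulated recapture is a random sample without replacement from a population of 96 in which 25 are marked. Function names, seed and the 10,000 replicates are assumptions; results will vary a little with the seed but should be near the slide's bias of about 3.7 and SE of about 20.

```python
import random
import statistics

def petersen(marked, caught, recaptured_marked):
    """Petersen population estimate."""
    return marked * caught / recaptured_marked

def bootstrap_petersen(n_pop, marked, caught, n_reps=10000, seed=1):
    """Parametric bootstrap: assume n_pop animals, mark `marked`, recapture `caught`."""
    rng = random.Random(seed)
    population = [True] * marked + [False] * (n_pop - marked)  # True = marked animal
    estimates = []
    for _ in range(n_reps):
        m_star = sum(rng.sample(population, caught))  # marked animals in simulated recapture
        if m_star == 0:
            continue  # estimate undefined; vanishingly rare with these numbers, so skipped
        estimates.append(petersen(marked, caught, m_star))
    return estimates

n_hat = petersen(25, 46, 12)   # about 95.8
n_assumed = round(n_hat)       # 96, as on the slide
reps = bootstrap_petersen(n_assumed, 25, 46)
bias = statistics.mean(reps) - n_assumed
se = statistics.stdev(reps)
print(f"Petersen {n_hat:.1f}; bias {bias:.1f}; SE {se:.1f}; corrected {n_hat - bias:.1f}")
```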

18
Parametric Bootstrap, an example: Mark-Recapture
Estimate
19
Non-Parametric Bootstrap (a.k.a. the bootstrap)
  • Data $D = \{X_1, X_2, X_3, \dots, X_n\} \to$ statistic $s$
  • Bootstrap replicates
  • $D^*_1 = \{X^*_1, X^*_2, X^*_3, \dots, X^*_n\} \to$ statistic $s^*_1$
  • $D^*_2 = \{X^*_1, X^*_2, X^*_3, \dots, X^*_n\} \to$ statistic $s^*_2$
  • ...
  • $X^*_1, X^*_2, X^*_3, \dots, X^*_n$ are randomly selected,
    with replacement, from $X_1, X_2, X_3, \dots, X_n$
  • The distribution, confidence interval and SE of s are
    estimated from the distribution, confidence
    interval and standard error of the $s^*$s
  • Usually use 100-10,000 bootstrap replicates
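A generic sketch of the resampling step (the function name, seed and 2,000 replicates are assumptions): draw n units with replacement, recompute the statistic, repeat. Here it is applied to the mean of the crab gill weights from the next slide.

```python
import random
import statistics

def bootstrap(data, statistic, n_reps=2000, seed=1):
    """Non-parametric bootstrap: resample n units with replacement, recompute the statistic."""
    rng = random.Random(seed)
    n = len(data)
    return [statistic(rng.choices(data, k=n)) for _ in range(n_reps)]

gill_mg = [159, 179, 100, 45, 384, 230, 100, 320, 80, 220, 320, 210]
reps = bootstrap(gill_mg, statistics.mean)
print(f"mean = {statistics.mean(gill_mg):.1f} mg, bootstrap SE = {statistics.stdev(reps):.1f} mg")
```

For the mean, the bootstrap SE should be close to the textbook SD/√n (slightly smaller, by a factor of about √((n − 1)/n)), which is a handy sanity check before using the same code for statistics with no simple formula.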

20
Non-Parametric Bootstrap, an example: Median Gill
Weight in Crabs
  • Gill weights (in mg)
  • 159 179 100 45 384 230 100 320 80 220 320 210
  • Median 195 mg


Sample Gill weights (mg)                                Median
Real   159 179 100  45 384 230 100 320  80 220 320 210   195
Bootstrap replicates
B1     320 159  45 320 100 320 100 320 100 230 100 210   185
B2     384 384  45 384  45 384 100  80  45 179 230 230   205
B3     159 320  80  45  45  80 220 210 230 320 230 220   215
B4     220 179 384 100  80 100 230 230 179 230 384  45   200
B5     320 220 210 100 159 320 220 210 100  80 100 210   210
B6      80 100 230 100 210 384 159 220 320  45  45 210   185
B7     179 210  80 320 100 230 159 320 100  45 384 320   195
B8     384 159 100 159 100 179 100 179 220 384 220 159   169
B9     320 210  45 320 179 159 100 210 159  45 210 100   169
...
21
Non-Parametric Bootstrap, an example: Median Gill
Weight in Crabs
  • Gill weights (in mg)
  • 159 179 100 45 384 230 100 320 80 220 320 210
  • Median 195 mg
  • Bootstrap (1000 samples)
  • mean of the bootstrap medians 188 mg
  • 95% c.i. 100-275 mg
  • b(25) to b(975): the 25th and 975th of the 1000 ordered bootstrap medians
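A sketch of the percentile interval described here, assuming 1000 replicates and taking the 25th and 975th ordered bootstrap medians (1-based) as the 95% limits. The seed is arbitrary, so the numbers will differ somewhat from the slide's 188 mg and 100-275 mg. Note that the exact sample median is (179 + 210)/2 = 194.5, shown rounded as 195 on the slide.

```python
import random
import statistics

gill_mg = [159, 179, 100, 45, 384, 230, 100, 320, 80, 220, 320, 210]

rng = random.Random(2024)
n_reps = 1000
medians = sorted(
    statistics.median(rng.choices(gill_mg, k=len(gill_mg))) for _ in range(n_reps)
)

# b(25) and b(975): 25th and 975th ordered replicates (1-based), i.e. indices 24 and 974
ci_low, ci_high = medians[24], medians[974]
print(f"sample median {statistics.median(gill_mg)} mg")  # sample median 194.5 mg
print(f"mean of bootstrap medians {statistics.mean(medians):.0f} mg")
print(f"95% percentile interval {ci_low}-{ci_high} mg")
```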

22
Bootstraps in Molecular Genetics
  • Calculate tree based on genetic data
  • (e.g. 20 species and 300 loci)
  • For each bootstrap replicate
  • Resample loci with replacement
  • (20 species with 300 loci, some repeats)
  • Calculate tree
  • Look at agreement between original and bootstrap
    trees
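A toy sketch of the key step, resampling loci (columns), not species (rows), with replacement. Real tree building is out of scope here; as a crude, hypothetical stand-in for "calculate tree", it records which pair of species is closest by proportion of differing loci, and counts how often the bootstrap replicates agree with the original. The three-species alignment and all function names are invented for illustration.

```python
import itertools
import random

# Hypothetical toy data: 3 species x 10 loci (one character per locus)
alignment = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGAAC",
    "C": "TCGAACGTTG",
}

def p_distance(a, b):
    """Proportion of loci at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def closest_pair(seqs):
    """Stand-in for 'calculate tree': the pair of species with the smallest distance."""
    return min(itertools.combinations(sorted(seqs), 2),
               key=lambda pair: p_distance(seqs[pair[0]], seqs[pair[1]]))

def resample_loci(seqs, rng):
    """Draw loci (columns) with replacement; every species keeps its row."""
    n_loci = len(next(iter(seqs.values())))
    cols = [rng.randrange(n_loci) for _ in range(n_loci)]
    return {sp: "".join(s[c] for c in cols) for sp, s in seqs.items()}

rng = random.Random(7)
original = closest_pair(alignment)
n_reps = 1000
support = sum(closest_pair(resample_loci(alignment, rng)) == original
              for _ in range(n_reps)) / n_reps
print(original, support)
```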

23
Bootstrapped spanning tree
Glazko and Nei, Mol. Biol. Evol. 2003
24
  • Bootstraps
  • Better estimate of confidence
  • Variable n
  • Self-comparisons a problem (a unit drawn twice is compared
    with itself), e.g. mean of associations
  • Gives SEs, confidence intervals and profile of
    confidence
  • Jackknives
  • Worse estimate of confidence
  • Usually conservative (underestimates precision)
  • Fixed n
  • Self-comparisons not a problem
  • Reduces Bias
  • Only directly gives SE
  • Confidence intervals need assumption of normality
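A small illustration of the self-comparison problem, using hypothetical positions and one hand-picked resample. If a bootstrap resample of animals contains the same animal twice, one "pair" is that animal compared with itself (distance 0), which drags a mean of pairwise distances down. Leave-one-animal-out jackknife samples keep n − 1 distinct animals, so no such pairs arise.

```python
import itertools
import math

# Hypothetical (x, y) positions of three animals, for illustration only
positions = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]

def mean_pairwise_distance(points):
    """Mean Euclidean distance over all unordered pairs."""
    dists = [math.dist(p, q) for p, q in itertools.combinations(points, 2)]
    return sum(dists) / len(dists)

# One possible bootstrap resample of animal indices (drawn with replacement):
# animal 0 appears twice, so one pair is animal 0 with itself.
resample = [0, 0, 1]
boot_points = [positions[i] for i in resample]
self_pairs = sum(1 for i, j in itertools.combinations(resample, 2) if i == j)
print(self_pairs, mean_pairwise_distance(boot_points))  # 1 2.0  (original estimate is 4.0)

# Jackknife: drop one animal at a time; all remaining animals are distinct
jack_values = [mean_pairwise_distance(positions[:i] + positions[i + 1:])
               for i in range(len(positions))]
print(jack_values)  # [5.0, 4.0, 3.0]
```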

25
Bootstraps and Jackknives
  • Give estimates of confidence (and bias) when
    distributions are unknown, approximate, or intractable
  • Parametric bootstrap
  • very useful if model known
  • needs programming
  • Non-parametric bootstrap
  • widely applicable (except self-referencing
    situations)
  • few assumptions
  • Jackknife
  • approximate
  • only standard error given directly
  • useful when bootstrap not applicable