Introduction into the Bootstrap - PowerPoint PPT Presentation

Provided by: URCG

1
Introduction into the Bootstrap
  • Marieke Timmerman

Baron von Münchhausen
2
What is bootstrapping?
  • Resampling based method to obtain inferential
    information on parameter estimates
  • Resampling based methods
  • bootstrap (inferential info on parameters)
  • jackknife
  • permutation test (significance testing)
  • cross-validation (validity of full model)

3
Standard inference based on analytically derived
results
  • Derivation of the sampling distribution of θ̂ using
    assumptions about the population distribution
    function
  • and estimation of the standard error/confidence interval

4
When Standard inference fails
  • Distributional assumptions violated
  • Derivation of sampling distribution impossible or
    too complex

5
Alternative: bootstrap
Repeat many times
6
Core idea of the bootstrap
  • The empirical distribution function (EDF) becomes
    equal to the population distribution function
    (PDF) as n → ∞
  • For n < ∞, assume that the EDF is representative of
    the PDF
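
For reference, the EDF of a sample x_1, …, x_n can be written out as below. The symbols F̂_n (EDF) and F (population distribution function) are notation added here, not taken from the slides, and the convergence statement (the Glivenko-Cantelli theorem) assumes independent, identically distributed observations.

```latex
\hat{F}_n(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{x_i \le x\},
\qquad
\sup_x \bigl| \hat{F}_n(x) - F(x) \bigr| \xrightarrow{\text{a.s.}} 0
\quad \text{as } n \to \infty .
```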

7
Repeat many times
Estimate standard error or confidence interval
8
Example: CI for population mean µ
Repeat many times
9
Bootstrap sampling distribution of θ̂
Used for estimating the Standard Error (SE) and
Confidence Interval (CI)
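
A minimal sketch of this step for the sample mean, assuming a simple non-parametric bootstrap. The data values, the function names, the number of replicates (2000) and the seed are illustrative choices, not taken from the slides.

```python
import random
import statistics

def bootstrap_distribution(sample, statistic, n_boot=2000, seed=0):
    """Return n_boot values of `statistic`, each computed on a resample
    of size n drawn with replacement from `sample`."""
    rng = random.Random(seed)
    n = len(sample)
    return [statistic(rng.choices(sample, k=n)) for _ in range(n_boot)]

def percentile_ci(boot_values, alpha=0.025):
    """Central 1 - 2*alpha interval read off the sorted bootstrap values."""
    ordered = sorted(boot_values)
    b = len(ordered)
    left = ordered[int(alpha * b)]
    right = ordered[min(b - 1, int((1 - alpha) * b))]
    return left, right

# Hypothetical data, for illustration only.
sample = [4.1, 5.3, 2.8, 6.0, 5.5, 3.9, 4.7, 5.1, 6.4, 4.4]

boot_means = bootstrap_distribution(sample, statistics.mean)
se_boot = statistics.stdev(boot_means)          # bootstrap standard error
ci_left, ci_right = percentile_ci(boot_means)   # 95% percentile interval
```

The standard deviation of the bootstrap values serves as the SE estimate, and the interval here uses the plain percentile method, which is only one of the CI options discussed later in the deck.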
10
(No Transcript)
11
  • Example: y_i = β_0 + β_1 x_i + e_i
  • 1. Which θ?
  • 2. Sample(s) drawn from which population(s)?
  • 3. How to define the EDF?
  • 4. Is s(x) non-unique?
  • 5. How to estimate CIs from distribution of s(x)?

12
3. How to define the EDF?
y_i = β_0 + β_1 x_i + e_i, i = 1, …, n
  • Resampling: draw n times with replacement
  • non-parametric: resample i from i = 1, …, n; the EDF is
    (x_i, y_i), i = 1, …, n
  • semi-parametric
  • parametric
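
The three options can be sketched for the simple regression example as follows. The slides do not spell out the semi-parametric and parametric variants, so reading them as "resample residuals around the fitted line" and "draw normal errors with the estimated residual spread" is an assumption here. All names and data are illustrative.

```python
import random

def ols(xs, ys):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1*x + e."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sxy / sxx  # fails if a resample contains a single distinct x
    return my - b1 * mx, b1

def pairs_resample(xs, ys, rng):
    """Non-parametric: draw n indices with replacement, keep (x_i, y_i) together."""
    n = len(xs)
    idx = [rng.randrange(n) for _ in range(n)]
    return [xs[i] for i in idx], [ys[i] for i in idx]

def residual_resample(xs, ys, rng):
    """Assumed semi-parametric variant: x fixed, residuals resampled."""
    b0, b1 = ols(xs, ys)
    fitted = [b0 + b1 * x for x in xs]
    resid = [y - f for y, f in zip(ys, fitted)]
    return list(xs), [f + rng.choice(resid) for f in fitted]

def parametric_resample(xs, ys, rng):
    """Assumed parametric variant: x fixed, normal errors with estimated sigma."""
    b0, b1 = ols(xs, ys)
    resid = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    sigma = (sum(r * r for r in resid) / (len(xs) - 2)) ** 0.5
    return list(xs), [b0 + b1 * x + rng.gauss(0.0, sigma) for x in xs]

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

rng = random.Random(0)
pairs_slopes = [ols(*pairs_resample(xs, ys, rng))[1] for _ in range(200)]
resid_slopes = [ols(*residual_resample(xs, ys, rng))[1] for _ in range(200)]
param_slopes = [ols(*parametric_resample(xs, ys, rng))[1] for _ in range(200)]
```

Each list holds bootstrap replicates of the slope. The schemes differ in what they treat as random: the (x, y) pairs, the residuals only, or errors from an assumed distribution.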

13
5. How to estimate CIs from distribution of
s(x)?
  • CIs based on bootstrap tables
  • CIs based on percentiles

14
  • Based on bootstrap tables
  • Wald ( )
  • Student's t-interval
  • Bootstrap t-interval with

If no simple SE formula is available: use of the
double bootstrap (pff)
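
A sketch of the bootstrap t-interval for a case where a simple SE formula does exist (the sample mean, with SE = s/√n). The bootstrap replaces the t-table with the empirical distribution of the studentized statistic. The function names, data and replicate count are assumptions made for illustration.

```python
import random
import statistics

def mean_and_se(values):
    """Sample mean and its plug-in standard error s / sqrt(n)."""
    n = len(values)
    return statistics.mean(values), statistics.stdev(values) / n ** 0.5

def bootstrap_t_ci(sample, alpha=0.025, n_boot=2000, seed=0):
    """Central 1 - 2*alpha bootstrap t-interval for the mean."""
    rng = random.Random(seed)
    n = len(sample)
    est, se = mean_and_se(sample)
    t_stars = []
    for _ in range(n_boot):
        resample = rng.choices(sample, k=n)
        est_b, se_b = mean_and_se(resample)
        if se_b > 0:  # skip degenerate resamples with zero spread
            t_stars.append((est_b - est) / se_b)
    t_stars.sort()
    b = len(t_stars)
    t_lo = t_stars[int(alpha * b)]
    t_hi = t_stars[min(b - 1, int((1 - alpha) * b))]
    # Note the reversal: the upper t* quantile sets the lower limit.
    return est - t_hi * se, est - t_lo * se

# Hypothetical data, for illustration only.
sample = [4.1, 5.3, 2.8, 6.0, 5.5, 3.9, 4.7, 5.1, 6.4, 4.4]
ci_left, ci_right = bootstrap_t_ci(sample)
```

When no SE formula is available, `mean_and_se` would itself need an inner bootstrap on every resample, which is what makes the double bootstrap expensive.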
15
  • Based on percentiles
  • Percentile method
  • Bias corrected percentile method
  • Bias corrected and accelerated (BCa)
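
A sketch of the BCa interval, assuming the usual construction with a bias-correction constant z0 taken from the bootstrap distribution and an acceleration constant a taken from jackknife estimates. Setting both to zero gives back the plain percentile method. Names, data and the replicate count are illustrative.

```python
import random
from statistics import NormalDist, mean

def bca_ci(sample, statistic, alpha=0.025, n_boot=2000, seed=0):
    """Central 1 - 2*alpha BCa interval for `statistic` (sample must be a list)."""
    rng = random.Random(seed)
    n = len(sample)
    theta_hat = statistic(sample)
    boot = sorted(statistic(rng.choices(sample, k=n)) for _ in range(n_boot))

    # Bias correction z0: normal quantile of the share of bootstrap
    # values below the sample estimate (share must lie strictly in (0, 1)).
    below = sum(v < theta_hat for v in boot) / n_boot
    z0 = NormalDist().inv_cdf(below)

    # Acceleration a from jackknife (leave-one-out) estimates.
    jack = [statistic(sample[:i] + sample[i + 1:]) for i in range(n)]
    jack_mean = mean(jack)
    num = sum((jack_mean - j) ** 3 for j in jack)
    den = 6.0 * sum((jack_mean - j) ** 2 for j in jack) ** 1.5
    a = num / den if den > 0 else 0.0

    def adjusted_level(level):
        """Shift a nominal percentile level using z0 and a."""
        z = NormalDist().inv_cdf(level)
        return NormalDist().cdf(z0 + (z0 + z) / (1 - a * (z0 + z)))

    def quantile(p):
        return boot[min(n_boot - 1, max(0, int(p * n_boot)))]

    return quantile(adjusted_level(alpha)), quantile(adjusted_level(1 - alpha))

# Hypothetical right-skewed data, for illustration only.
sample = [1.2, 0.8, 3.5, 2.1, 0.9, 7.4, 1.7, 2.6, 1.1, 4.8]
ci_left, ci_right = bca_ci(sample, mean)
```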

16
Which bootstrap CI estimate?
  • Percentile methods are (and bootstrap table
    methods are not):
  • range preserving
  • transformation respecting
  • BCa usually better than ordinary percentile
    method
  • What does "better" mean?

17
Quality of CI? → Coverage
  • central 1 − 2α CI: [CI_left, CI_right]
  • P(θ < CI_left) = α, P(θ > CI_right) = α, with θ the
    population parameter

18
(No Transcript)
19
What's next
  • Principal Component Analysis
  • 4. Is s(x) non-unique? How to make s(x)
    comparable?
  • Multilevel Component Analysis
  • 2. Sample(s) drawn from which population(s)?

20
Principal Component Analysis
X (I×J): observed scores of I subjects on J variables
Z: standardized scores of X
F (I×Q): principal component scores
A (J×Q): principal loadings
Q: number of selected principal components
T (Q×Q): rotation matrix
21
1. Which θ?
  • Loadings
  • 1. Principal loadings (A_Q)
  • 2. Rotated loadings (A_Q T)
  • a. Procrustes rotation towards external
    structure
  • b. use one, fixed criterion (e.g., Varimax)
  • c. search for the optimal simple solution
  • Oblique case: correlations between components

22
2. Sample(s) drawn from which Population(s)?
  • observed scores of I subjects on J variables

23
3. How to define the EDF?
  • non-parametric: X_b, row-wise resampling of Z


24
4. Is s(x) non-unique? How to make s(x)
comparable?
  • Loadings
  • 1. Principal loadings (A_Q): non-unique
  • Sign of principal loadings (A_Q) is arbitrary
  • reflect columns of A_Q to the same direction

26
2. Rotated loadings (A_Q T)
a. Procrustes rotation towards external structure
yields a unique rotated solution
27
2. Rotated loadings (A_Q T)
  • b. use of one, fixed criterion (e.g., Varimax)
    yields a non-unique solution
  • Sign and order of Varimax-rotated loadings are
    arbitrary
  • reflect and reorder columns of A_Q T
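
One way to carry out the reflect-and-reorder step is to align each bootstrap loading matrix with the sample loading matrix. The matching rule used below (choose the column order that maximizes the summed absolute inner products with the sample columns, then flip signs) is an assumption. The slides do not say which rule was used. Matrices are J×Q lists of rows, and all names and numbers are illustrative.

```python
from itertools import permutations

def column(matrix, q):
    """Column q of a matrix stored as a list of rows."""
    return [row[q] for row in matrix]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def align_to_reference(boot, ref):
    """Reorder and reflect the columns of `boot` so they match `ref`."""
    n_comp = len(ref[0])
    ref_cols = [column(ref, q) for q in range(n_comp)]
    boot_cols = [column(boot, q) for q in range(n_comp)]
    # Exhaustive search over column orders; feasible for small Q only.
    best = max(
        permutations(range(n_comp)),
        key=lambda perm: sum(
            abs(dot(ref_cols[q], boot_cols[p])) for q, p in enumerate(perm)
        ),
    )
    aligned_cols = []
    for q, p in enumerate(best):
        # Reflect the column if it points away from the reference column.
        sign = -1.0 if dot(ref_cols[q], boot_cols[p]) < 0 else 1.0
        aligned_cols.append([sign * v for v in boot_cols[p]])
    # Back to the J x Q row layout.
    return [list(row) for row in zip(*aligned_cols)]

# Hypothetical sample loadings and one bootstrap solution whose columns
# came out swapped, with one of them sign-flipped.
ref = [[0.8, 0.1], [0.7, 0.0], [0.1, 0.9], [0.0, 0.8]]
boot = [[-0.12, 0.79], [0.02, 0.72], [-0.88, 0.08], [-0.81, 0.03]]
aligned = align_to_reference(boot, ref)
```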

28
2. Rotated loadings (A_Q T)
c. search for the optimal simple solution
  • How are bootstrap solutions A_Q T found?
  • For each bootstrap solution, look for optimal
    simple loadings (infeasible); reflect and reorder
    columns of A_Q T
  • For each bootstrap solution, Procrustes rotation
    towards optimally simple sample loadings
    yields a unique solution

29
  • Fixed criterion versus Procrustes towards
    (simple) sample loadings
  • Unstable Varimax-rotated solutions across samples?

30
5. How to estimate CIs from the distribution of
θ̂?
  • Wald?
  • BCa?

31
Simulation study
  • CIs for Varimax rotated Sample loadings
  • Data properties varied
  • VAF in population (0.8,0.6,0.4)
  • number of variables (8, 16)
  • sample size (50, 100, 500)
  • distribution of component scores (normal,
    leptokurtic, skew)
  • simplicity of loading matrix (simple,
    halfsimple, complex)
  • Design completely crossed, 1000 replicates per
    cell

32
  • Simplicity of loading matrix →
  • Stability of Varimax solution across samples

33
Quality criteria for 95% CIs: P(θ < CI_left) = α,
P(θ > CI_right) = α
  • 95% coverage: (1 − prop(θ < CI_left) − prop(θ > CI_right)) × 100
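
These criteria can be computed from a set of simulated intervals as in the sketch below. The toy simulation (normal data with known mean 0, n = 30, percentile intervals, 200 replicates) is an illustrative setup only. It is not the simulation design reported in the slides.

```python
import random
import statistics

def coverage_summary(intervals, theta):
    """Proportion of intervals missing theta on each side, and % coverage."""
    n = len(intervals)
    miss_left = sum(theta < left for left, _ in intervals) / n
    miss_right = sum(theta > right for _, right in intervals) / n
    coverage = (1 - miss_left - miss_right) * 100
    return miss_left, miss_right, coverage

def percentile_ci(sample, rng, alpha=0.025, n_boot=300):
    """Percentile bootstrap interval for the mean of `sample`."""
    n = len(sample)
    means = sorted(
        statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_boot)
    )
    return means[int(alpha * n_boot)], means[min(n_boot - 1, int((1 - alpha) * n_boot))]

rng = random.Random(1)
theta = 0.0  # true population mean in this toy simulation
intervals = []
for _ in range(200):
    sample = [rng.gauss(theta, 1.0) for _ in range(30)]
    intervals.append(percentile_ci(sample, rng))

miss_left, miss_right, coverage = coverage_summary(intervals, theta)
```

For a well-behaved 95% interval, both miss proportions should be close to α = 0.025 and the coverage close to 95. Reporting the two sides separately shows whether an interval misses more often on one side.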

34
Quality of estimated confidence intervals
35
(No Transcript)
36
Empirical example of 2 PCA loadings
37
Multilevel Component Analysis
  • Examples
  • inhabitants within different countries
  • measurement occasions within different subjects

38
2. Sample(s) drawn from which population(s)?
Which level(s) considered fixed, which random?
  • different countries and samples of inhabitants:
    level 2 (countries) fixed, level 1 (inhabitants)
    random
  • sample of mothers and their children:
    level 2 (mothers) random, level 1 (children)
    fixed
  • sample of hospitals and samples of patients:
    both level 2 and 1 random

39
3. How to define the EDF?
  • MLCA (two levels: groups and objects)
  • level 2 fixed, level 1 random? (multi-group)
  • Object resampling: resample objects within all groups
  • level 2 random, level 1 fixed? (multi-observation)
  • Group resampling: resample groups (keeping all associated objects)
  • levels 2 and 1 random? (real multilevel)
  • Double resampling: resample objects within resampled groups
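
The three resampling schemes can be sketched as follows, with the data held as a dictionary that maps each group to its list of objects. The data layout, function names and example values are assumptions made for illustration.

```python
import random

def resample_objects(groups, rng):
    """Level 2 fixed, level 1 random: resample objects within every group."""
    return {g: rng.choices(objs, k=len(objs)) for g, objs in groups.items()}

def resample_groups(groups, rng):
    """Level 2 random, level 1 fixed: resample groups, keep their objects."""
    names = list(groups)
    drawn = rng.choices(names, k=len(names))
    # A group can be drawn more than once, so each draw gets its own key.
    return {(b, g): list(groups[g]) for b, g in enumerate(drawn)}

def resample_double(groups, rng):
    """Both levels random: resample objects within resampled groups."""
    names = list(groups)
    drawn = rng.choices(names, k=len(names))
    return {
        (b, g): rng.choices(groups[g], k=len(groups[g]))
        for b, g in enumerate(drawn)
    }

# Hypothetical two-level data: three groups with their objects.
groups = {"A": [1, 2, 3], "B": [10, 20], "C": [100, 200, 300, 400]}

rng = random.Random(0)
by_object = resample_objects(groups, rng)
by_group = resample_groups(groups, rng)
by_both = resample_double(groups, rng)
```

Which function applies depends on which level is treated as random for the population at hand, as the slide asks.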
40
Quality of estimated confidence intervals
  • multi-group case (level 2 fixed, level 1 random):
    10 groups; 20, 100, or 200 individuals per group
  • multi-observation case (level 2 random, level 1
    fixed): 20, 100, or 200 groups; 10 individuals per group
41
[Figure: results for the multi-group case (level 2 fixed, level 1 random) and the multi-level case (level 2 and level 1 random); panels for 20 and 40 groups, high and low loadings, between and within]
42
To conclude
43
Some remarks
  • Bootstrapping is no solution for small sample
    sizes
  • THE bootstrap procedure does not exist
  • Be very careful in designing a bootstrap
    procedure (you may test it via simulation)

44
(No Transcript)