Title: Introduction to the Bootstrap
1 Introduction to the Bootstrap
Baron von Münchhausen
2 What is bootstrapping?
- Resampling-based method to obtain inferential information on parameter estimates
- Resampling-based methods
  - bootstrap (inferential information on parameters)
  - jackknife
  - permutation test (significance testing)
  - cross-validation (validity of the full model)
3 Standard inference based on analytically derived results
- Derivation of the sampling distribution of the estimate, using assumptions about the population distribution function
- and estimation of the standard error / confidence interval
4 When standard inference fails
- Distributional assumptions violated
- Derivation of sampling distribution impossible or
too complex
5 Alternative: Bootstrap
Repeat many times
6 Core idea of the bootstrap
- The empirical distribution function (EDF) becomes equal to the population distribution function (PDF) as n → ∞
- For n < ∞, assume that the EDF is representative of the PDF
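A minimal sketch of the first point, assuming a standard normal population (an assumption chosen only for illustration): the largest vertical gap between the EDF and the population distribution function shrinks as n grows. The helper names `normal_cdf` and `max_edf_gap` are hypothetical.

```python
import numpy as np
from math import erf, sqrt

def normal_cdf(t):
    """Distribution function of the standard normal population."""
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))

def max_edf_gap(sample):
    """Largest vertical distance between the EDF of `sample` and the normal CDF."""
    xs = np.sort(sample)
    n = len(xs)
    cdf = np.array([normal_cdf(t) for t in xs])
    upper = np.arange(1, n + 1) / n - cdf  # EDF value at each jump minus CDF
    lower = cdf - np.arange(0, n) / n      # CDF minus EDF value just before each jump
    return max(upper.max(), lower.max())

rng = np.random.default_rng(0)
gaps = {n: max_edf_gap(rng.standard_normal(n)) for n in (20, 200, 20000)}
for n, gap in gaps.items():
    print(f"n = {n:6d}: max |EDF - PDF| = {gap:.4f}")
```

Drawing from the EDF is the same as drawing observations from the sample with replacement, which is what every bootstrap scheme in the following slides does.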
7 Repeat many times
Estimate standard error or confidence interval
8 Example: CI for population mean µ
Repeat many times
9 Bootstrap sampling distribution of the estimate of θ
Used for estimating the standard error (SE) and confidence interval (CI)
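The "repeat many times" loop for the mean can be sketched as follows. This is an illustration under assumptions, not the procedure shown on the slides: the function name `bootstrap_mean`, the number of bootstrap samples, and the simulated data are all hypothetical, and the interval shown is the simple percentile interval.

```python
import numpy as np

def bootstrap_mean(x, alpha=0.025, n_boot=4000, seed=0):
    """Bootstrap sampling distribution, SE and central 1-2*alpha percentile CI of the mean."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Each row of idx is one bootstrap sample: n draws with replacement.
    idx = rng.integers(0, n, size=(n_boot, n))
    means = x[idx].mean(axis=1)          # bootstrap distribution of the mean
    se = means.std(ddof=1)               # bootstrap standard error
    ci = tuple(np.quantile(means, [alpha, 1 - alpha]))
    return means, se, ci

rng = np.random.default_rng(42)
sample = rng.normal(loc=10.0, scale=3.0, size=100)   # simulated data
means, se, (ci_left, ci_right) = bootstrap_mean(sample)
print(f"SE = {se:.3f}, 95% CI = [{ci_left:.2f}, {ci_right:.2f}]")
```

For the mean, the bootstrap SE should be close to the textbook value s/sqrt(n), which gives a quick sanity check of the loop.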
11 - Example: yi = β0 + β1xi + ei
- 1. Which θ?
- 2. Sample(s) drawn from which population(s)?
- 3. How to define the EDF?
- 4. Is s(x) non-unique?
- 5. How to estimate CIs from the distribution of s(x)?
12 3. How to define the EDF?
yi = β0 + β1xi + ei, i = 1,…,n
- Resampling: draw n times with replacement
- non-parametric: resample i from i = 1,…,n; EDF is (xi, yi), i = 1,…,n
- semi-parametric
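The two ways of defining the EDF for the regression example can be sketched as below. The non-parametric scheme resamples (xi, yi) pairs, as stated on the slide. The slide does not spell out the semi-parametric scheme; the sketch assumes the common variant that keeps the xi fixed and resamples the fitted residuals. The names `fit_line` and `bootstrap_slopes` and the simulated data are hypothetical.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares estimates (b0, b1) for y = b0 + b1*x + e."""
    b1, b0 = np.polyfit(x, y, deg=1)   # polyfit returns the highest degree first
    return b0, b1

def bootstrap_slopes(x, y, scheme="pairs", n_boot=1000, seed=0):
    """Bootstrap distribution of the slope under one of two resampling schemes."""
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = len(x)
    b0, b1 = fit_line(x, y)
    fitted = b0 + b1 * x
    resid = y - fitted
    slopes = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)       # n draws with replacement
        if scheme == "pairs":                  # non-parametric: EDF of (xi, yi)
            xb, yb = x[idx], y[idx]
        elif scheme == "residuals":            # assumed semi-parametric variant
            xb, yb = x, fitted + resid[idx]
        else:
            raise ValueError("scheme must be 'pairs' or 'residuals'")
        slopes[b] = fit_line(xb, yb)[1]
    return slopes

rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 50)
y = 1.0 + 2.0 * x + rng.normal(scale=1.0, size=50)   # simulated data
se_pairs = bootstrap_slopes(x, y, "pairs").std(ddof=1)
se_resid = bootstrap_slopes(x, y, "residuals").std(ddof=1)
```

Residual resampling leans on the model being correct (in particular, errors with constant variance), whereas pair resampling does not.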
13 5. How to estimate CIs from the distribution of s(x)?
- CIs based on bootstrap tables
- CIs based on percentiles
14 - Based on bootstrap tables
- Wald ( )
- Student's t-interval
- Bootstrap t-interval with
If no simple SE formula is available: use of the double bootstrap (pff)
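A minimal sketch of a bootstrap t-interval for the mean, where a simple SE formula (s/sqrt(n)) is available, so no double bootstrap is needed. The function name `bootstrap_t_ci` and the simulated data are hypothetical; this illustrates the general technique and is not a reconstruction of the formula missing from the slide.

```python
import numpy as np

def bootstrap_t_ci(x, alpha=0.025, n_boot=4000, seed=0):
    """Central 1-2*alpha bootstrap t-interval for the population mean."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = len(x)
    mean = x.mean()
    se = x.std(ddof=1) / np.sqrt(n)
    xb = x[rng.integers(0, n, size=(n_boot, n))]   # one bootstrap sample per row
    mean_b = xb.mean(axis=1)
    se_b = xb.std(axis=1, ddof=1) / np.sqrt(n)
    # Studentized statistic per bootstrap sample: this is the "bootstrap table"
    # that replaces the Student t table.
    t_b = (mean_b - mean) / se_b
    t_lo, t_hi = np.quantile(t_b, [alpha, 1 - alpha])
    # Note the reversal: the upper t quantile gives the lower limit.
    return mean - t_hi * se, mean - t_lo * se

rng = np.random.default_rng(3)
x = rng.exponential(scale=2.0, size=40)   # simulated, skewed data
ci_left, ci_right = bootstrap_t_ci(x)
```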
15 - Percentile method
- Bias corrected percentile method
- Bias corrected and accelerated (BCa)
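The three percentile-type intervals differ only in which percentiles of the bootstrap distribution are read off. The sketch below follows the usual textbook construction of BCa (bias correction z0 from the bootstrap distribution, acceleration a from the jackknife); the function name `bca_ci` and the simulated data are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def bca_ci(x, statistic, alpha=0.025, n_boot=4000, seed=0):
    """Central 1-2*alpha BCa interval for statistic(x)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = len(x)
    theta_hat = statistic(x)
    reps = np.array([statistic(x[rng.integers(0, n, size=n)])
                     for _ in range(n_boot)])
    nd = NormalDist()
    # Bias correction: where does the estimate sit in the bootstrap distribution?
    prop = float(np.mean(reps < theta_hat))
    prop = min(max(prop, 1 / (n_boot + 1)), n_boot / (n_boot + 1))  # keep z0 finite
    z0 = nd.inv_cdf(prop)
    # Acceleration from the jackknife (leave-one-out) estimates.
    jack = np.array([statistic(np.delete(x, i)) for i in range(n)])
    d = jack.mean() - jack
    denom = 6.0 * np.sum(d ** 2) ** 1.5
    a = np.sum(d ** 3) / denom if denom > 0 else 0.0

    def adjusted(p):
        z = nd.inv_cdf(p)
        return nd.cdf(z0 + (z0 + z) / (1 - a * (z0 + z)))

    # z0 = 0 and a = 0 give the percentile method; a = 0 alone gives the
    # bias corrected percentile method.
    lo, hi = np.quantile(reps, [adjusted(alpha), adjusted(1 - alpha)])
    return lo, hi

rng = np.random.default_rng(5)
x = rng.exponential(scale=2.0, size=60)   # simulated, skewed data
ci_left, ci_right = bca_ci(x, np.mean)
```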
16 Which bootstrap CI estimate?
- Percentile methods are (and bootstrap table methods are not)
  - range preserving
  - transformation respecting
- BCa is usually better than the ordinary percentile method
- What does "better" mean?
17 Quality of CI? → Coverage
- central 1-2α CI: [CIleft, CIright]
- P(θ < CIleft) = α and P(θ > CIright) = α, with θ the population parameter
19 What's next
- Principal Component Analysis
  - 4. Is s(x) non-unique? How to make s(x) comparable?
- Multilevel Component Analysis
  - 2. Sample(s) drawn from which population(s)?
20 Principal Component Analysis
X (I×J): observed scores of I subjects on J variables
Z: standardized scores of X
F (I×Q): principal component scores
A (J×Q): principal loadings
Q: number of selected principal components
T (Q×Q): rotation matrix
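A minimal sketch of how Z, F and A can be computed with a singular value decomposition. It assumes one common scaling convention (component scores with unit variance, loadings equal to the correlations between variables and components); the slides do not state which convention is used. The function name `pca_loadings` and the random data are hypothetical.

```python
import numpy as np

def pca_loadings(X, Q):
    """Return (Z, F, A): standardized scores, component scores, principal loadings."""
    X = np.asarray(X, dtype=float)
    I = X.shape[0]
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # I x J standardized scores
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    F = np.sqrt(I - 1) * U[:, :Q]                      # I x Q scores, unit variance
    A = Vt[:Q].T * s[:Q] / np.sqrt(I - 1)              # J x Q loadings
    return Z, F, A

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))        # random data, I = 100 subjects, J = 6 variables
Z, F, A = pca_loadings(X, Q=2)
# Rotated loadings are A @ T for a Q x Q rotation matrix T.
```

With Q = J the product F @ A.T reproduces Z exactly; with Q < J it is the best rank-Q approximation in the least-squares sense.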
21 1. Which θ?
- Loadings
  - 1. Principal loadings (A_Q)
  - 2. Rotated loadings (A_Q T)
    - a. Procrustes rotation towards external structure
    - b. use one, fixed criterion (e.g., Varimax)
    - c. search for the optimal simple solution
- Oblique case: correlations between components
22 2. Sample(s) drawn from which population(s)?
- observed scores of I subjects on J variables
23 3. How to define the EDF?
- non-parametric: X_b, row-wise resampling of Z
24 4. Is s(x) non-unique? How to make s(x) comparable?
- Loadings
  - 1. Principal loadings (A_Q): non-unique
    - Sign of principal loadings (A_Q) is arbitrary
    - reflect columns of A_Q to the same direction
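Row-wise resampling combined with reflection can be sketched as follows. The sketch assumes that the sample loadings serve as the reference direction and that each bootstrap sample is re-standardized before the PCA; neither choice is stated on the slides. The function names and the simulated data are hypothetical.

```python
import numpy as np

def principal_loadings(Z, Q):
    """J x Q principal loadings of the (re-)standardized data."""
    Zs = (Z - Z.mean(axis=0)) / Z.std(axis=0, ddof=1)
    _, s, Vt = np.linalg.svd(Zs, full_matrices=False)
    return Vt[:Q].T * s[:Q] / np.sqrt(len(Zs) - 1)

def reflect_to(A_boot, A_ref):
    """Flip the sign of each column of A_boot so it points the same way as A_ref."""
    signs = np.sign(np.sum(A_boot * A_ref, axis=0))
    signs[signs == 0] = 1.0
    return A_boot * signs

def bootstrap_loadings(Z, Q, n_boot=500, seed=0):
    """Sample loadings plus an (n_boot, J, Q) array of reflected bootstrap loadings."""
    rng = np.random.default_rng(seed)
    I = Z.shape[0]
    A_ref = principal_loadings(Z, Q)
    out = np.empty((n_boot,) + A_ref.shape)
    for b in range(n_boot):
        Zb = Z[rng.integers(0, I, size=I)]   # resample whole rows (subjects)
        out[b] = reflect_to(principal_loadings(Zb, Q), A_ref)
    return A_ref, out

rng = np.random.default_rng(0)
common = rng.normal(size=(80, 1))
Z = common + 0.7 * rng.normal(size=(80, 5))   # simulated data with one strong component
A_ref, A_boot = bootstrap_loadings(Z, Q=2, n_boot=100)
```

Without the reflection step, a loading near 0.8 would show up as roughly half +0.8 and half -0.8 across bootstrap samples, and any CI computed from that distribution would be meaningless.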
26 2. Rotated loadings (A_Q T)
a. Procrustes rotation towards external structure yields a unique rotated solution
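A minimal sketch of an orthogonal Procrustes rotation: it finds the rotation matrix T that brings the loadings as close as possible (in the least-squares sense) to a target structure. It assumes an orthogonal rotation; the oblique case is not covered. The function name `procrustes_rotate` and the random matrices are hypothetical.

```python
import numpy as np

def procrustes_rotate(A, target):
    """Orthogonal T minimizing ||A @ T - target||, and the rotated loadings A @ T."""
    U, _, Vt = np.linalg.svd(A.T @ target)
    T = U @ Vt
    return A @ T, T

rng = np.random.default_rng(0)
A = rng.normal(size=(8, 3))                       # stand-in for unrotated loadings
R, _ = np.linalg.qr(rng.normal(size=(3, 3)))      # some orthogonal matrix
target = A @ R                                    # stand-in for the external structure
rotated, T = procrustes_rotate(A, target)
```

Because the target fixes both the sign and the order of the columns, no separate reflection or reordering step is needed afterwards.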
27 2. Rotated loadings (A_Q T)
- b. use of one, fixed criterion (e.g., Varimax) yields a non-unique solution
  - Sign and order of Varimax rotated loadings are arbitrary
  - reflect and reorder columns of A_Q T
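The reflect-and-reorder step can be sketched as below, together with a compact Varimax routine for context. The Varimax code is the widely used SVD-based iteration, not necessarily the implementation behind the slides. The matching step assumes a reference loading matrix (for example the rotated sample loadings) and tries every column permutation, which is only practical for small Q. The function names are hypothetical.

```python
import numpy as np
from itertools import permutations

def varimax(A, max_iter=100, tol=1e-8):
    """Varimax rotation of a J x Q loading matrix; returns (A @ T, T)."""
    J, Q = A.shape
    T = np.eye(Q)
    crit_old = 0.0
    for _ in range(max_iter):
        L = A @ T
        G = A.T @ (L ** 3 - L @ np.diag(np.sum(L ** 2, axis=0)) / J)
        U, s, Vt = np.linalg.svd(G)
        T = U @ Vt
        crit = s.sum()
        if crit_old != 0 and crit / crit_old < 1 + tol:
            break
        crit_old = crit
    return A @ T, T

def match_columns(A_boot, A_ref):
    """Reorder and reflect the columns of A_boot to agree with A_ref."""
    Q = A_ref.shape[1]
    C = A_ref.T @ A_boot     # C[q, k]: agreement of reference column q with column k
    best = max(permutations(range(Q)),
               key=lambda p: sum(abs(C[q, p[q]]) for q in range(Q)))
    order = np.array(best)
    signs = np.sign(C[np.arange(Q), order])
    signs[signs == 0] = 1.0
    return A_boot[:, order] * signs

rng = np.random.default_rng(0)
A_ref = rng.normal(size=(6, 3))
A_shuffled = A_ref[:, [2, 0, 1]] * np.array([-1.0, 1.0, -1.0])
A_matched = match_columns(A_shuffled, A_ref)   # recovers A_ref
```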
28 2. Rotated loadings (A_Q T)
c. search for the optimal simple solution
- How are bootstrap solutions A_Q T found?
  - For each bootstrap solution, look for optimal simple loadings (infeasible); reflect and reorder columns of A_Q T
  - For each bootstrap solution, Procrustes rotation towards the optimally simple sample loadings yields a unique solution
29 - Fixed criterion versus Procrustes towards (simple) sample loadings
- Unstable Varimax rotated solutions over samples?
30 5. How to estimate CIs from the distribution of s(x)?
31 Simulation study
- CIs for Varimax rotated sample loadings
- Data properties varied
  - VAF (variance accounted for) in population (0.8, 0.6, 0.4)
  - number of variables (8, 16)
  - sample size (50, 100, 500)
  - distribution of component scores (normal, leptokurtic, skew)
  - simplicity of loading matrix (simple, half-simple, complex)
- Design completely crossed, 1000 replicates per cell
32 - Simplicity of loading matrix ?
- Stability of Varimax solution of samples
33 Quality criteria for 95% CIs
P(θ < CIleft) = α, P(θ > CIright) = α
- 95% coverage: (1 - prop(θ < CIleft) - prop(θ > CIright)) × 100
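The coverage criterion can be checked by simulation when the population parameter is known. The sketch below is a deliberately small stand-in for such a study: it uses the percentile CI for a normal mean instead of Varimax rotated loadings, and far fewer replicates than the 1000 per cell mentioned above. The function names and settings are hypothetical.

```python
import numpy as np

def percentile_ci(x, alpha, n_boot, rng):
    """Central 1-2*alpha percentile interval for the mean."""
    n = len(x)
    means = x[rng.integers(0, n, size=(n_boot, n))].mean(axis=1)
    return np.quantile(means, [alpha, 1 - alpha])

def coverage(n=50, mu=0.0, alpha=0.025, n_rep=300, n_boot=500, seed=0):
    """Return (prop(theta < CIleft), prop(theta > CIright), coverage in %)."""
    rng = np.random.default_rng(seed)
    below = above = 0
    for _ in range(n_rep):
        x = rng.normal(mu, 1.0, size=n)        # new sample from the known population
        lo, hi = percentile_ci(x, alpha, n_boot, rng)
        below += int(mu < lo)                  # theta falls left of the interval
        above += int(mu > hi)                  # theta falls right of the interval
    p_left, p_right = below / n_rep, above / n_rep
    return p_left, p_right, 100.0 * (1.0 - p_left - p_right)

p_left, p_right, cov = coverage()
print(f"left miss = {p_left:.3f}, right miss = {p_right:.3f}, coverage = {cov:.1f}%")
```

Reporting the two miss proportions separately, not only their sum, shows whether an interval is systematically shifted to one side.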
34 Quality of estimated confidence intervals
36 Empirical example of 2 PCA loadings
37 Multilevel Component Analysis
- Examples
- inhabitants within different countries
- measurement occasions within different subjects
38 2. Sample(s) drawn from which population(s)?
Which level(s) considered fixed, which random?
- different countries and samples of inhabitants: level 2 (countries) fixed, level 1 (inhabitants) random
- sample of mothers and their children: level 2 (mothers) random, level 1 (children) fixed
- sample of hospitals and samples of patients: both level 2 and level 1 random
39 3. How to define the EDF?
- MLCA (two levels: groups and objects)
- level 2 fixed, level 1 random? (multi-group)
  - Resample objects within all groups (object resampling)
- level 2 random, level 1 fixed? (multi-observation)
  - Resample groups, keeping all associated objects (group resampling)
- levels 2 and 1 random? (real multilevel)
  - Resample objects within resampled groups (double resampling)
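The three resampling schemes can be sketched with one function. The data layout (a list with one objects-by-variables array per group) and the function name `resample_multilevel` are assumptions made for illustration.

```python
import numpy as np

def resample_multilevel(groups, scheme, rng):
    """Draw one bootstrap sample from two-level data.

    groups: list of 2-D arrays, one (objects x variables) array per level-2 unit.
    scheme: "object" (level 2 fixed, level 1 random),
            "group"  (level 2 random, level 1 fixed),
            "double" (both levels random).
    """
    K = len(groups)
    if scheme == "object":
        chosen, resample_objects = range(K), True
    elif scheme == "group":
        chosen, resample_objects = rng.integers(0, K, size=K), False
    elif scheme == "double":
        chosen, resample_objects = rng.integers(0, K, size=K), True
    else:
        raise ValueError("scheme must be 'object', 'group' or 'double'")
    sample = []
    for k in chosen:
        g = groups[k]
        if resample_objects:
            # Resample objects (rows) with replacement within this group.
            g = g[rng.integers(0, len(g), size=len(g))]
        sample.append(g)
    return sample

rng = np.random.default_rng(0)
# Toy data: 5 groups of 4 objects on 2 variables; the integer part encodes the group.
groups = [np.full((4, 2), k) + np.arange(4)[:, None] / 10 for k in range(5)]
obj = resample_multilevel(groups, "object", rng)
grp = resample_multilevel(groups, "group", rng)
dbl = resample_multilevel(groups, "double", rng)
```

The scheme should mirror the sampling design: resampling a level that was fixed in the study adds variability that the CI should not reflect, while failing to resample a random level leaves variability out.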
40 Quality of estimated confidence intervals
multi-group case: level 2 fixed, level 1 random
10 groups; 20, 100, or 200 individuals per group
multi-observation case: level 2 random, level 1 fixed
20, 100, or 200 groups; 10 individuals per group
41 multi-group case: level 2 fixed, level 1 random
multi-level case: level 2 and level 1 random
[Figure panels: 20 groups / 40 groups; high loadings / low loadings; between / within]
42 To conclude
43 Some remarks
- Bootstrapping is no solution for small sample sizes
- THE bootstrap procedure does not exist
- Be very careful in designing a bootstrap procedure (you may test it via simulation)