Title: Resampling techniques
1 Resampling techniques
- Why resampling?
- Jackknife
- Cross-validation
- Bootstrap
- Examples of application of bootstrap
2 Why resampling?
- One of the purposes of statistics is to estimate parameters and assess their reliability. Since estimators are functions of the sample points, they are random variables. If we could find the distribution of such a random variable (the sampling distribution of the statistic), we could assess the reliability of the estimator. Unfortunately, apart from the simplest cases, sampling distributions are not easy to derive. There are several techniques to approximate them, including Edgeworth series, Laplace approximations and saddle-point approximations; these give analytical forms for the approximate distributions. With the advent of computers, computationally intensive methods are emerging, and in many cases they work satisfactorily.
- Examples of the simplest cases, where the sampling distributions are known (written out in formulas after this list), include:
- The sample mean, when the sample comes from a population with a normal distribution, has a normal distribution with mean equal to the population mean and variance equal to the population variance divided by the sample size, if the population variance is known. If the population variance is not known, the variance of the sample mean is estimated by the sample variance divided by n.
- The sample variance has the distribution of a multiple of the χ2 distribution. Again this is valid if the population distribution is normal and the sample points are independent.
- The sample mean divided by the square root of the sample variance has a multiple of the t distribution, again in the normal and independent case.
- For independent samples from normal populations, the ratio of the two sample variances has a multiple of the F distribution.
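For reference, these classical results in formulas, for an i.i.d. sample $X_1,\dots,X_n$ from $N(\mu,\sigma^2)$ with sample variance $S^2=\frac{1}{n-1}\sum_i (X_i-\bar X)^2$ (and two independent such samples in the last case):

\bar X \sim N\!\left(\mu,\tfrac{\sigma^2}{n}\right), \qquad
\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}, \qquad
\frac{\bar X-\mu}{S/\sqrt{n}} \sim t_{n-1}, \qquad
\frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim F_{n_1-1,\,n_2-1}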
3 Resampling techniques
- Three of the popular computer-intensive resampling techniques are:
- Jackknife. It is a useful tool for bias removal. It may work well for medium and large samples.
- Cross-validation. A very useful technique for model selection. It can help to choose the best model among those under consideration.
- Bootstrap. Perhaps the most important resampling technique. It can reduce bias as well as give the variance of the estimator. Moreover, it can give the distribution of the statistic under consideration. This distribution can be used for such a wide variety of purposes as interval estimation and hypothesis testing.
4 Jackknife
- The jackknife is used for bias removal. As we know, the mean-square error of an estimator is equal to the square of the bias plus the variance of the estimator. If the bias is much larger than the variance, then under some circumstances the jackknife can be used.
- Description of the jackknife. Let us assume that we have a sample of size n. We estimate the statistic of interest using all the data, giving t_n. Then, removing one point at a time, we estimate t_{n-1,i}, where the subscripts indicate the size of the sample and the index of the removed sample point. The new estimator is then
  t_jack = n t_n − (n − 1) (1/n) Σ_{i=1}^{n} t_{n-1,i}
- If the order of the bias of the statistic t_n is O(1/n), then after the jackknife the order of the bias becomes O(1/n²).
- The variance is estimated using
  var_jack = ((n − 1)/n) Σ_{i=1}^{n} (t_{n-1,i} − t̄_{n-1})²,  where t̄_{n-1} = (1/n) Σ_{i=1}^{n} t_{n-1,i}
- This procedure can be applied iteratively, i.e. the jackknife can be applied again to the new estimator. The first application of the jackknife can reduce the bias without changing the variance of the estimator, but its second and higher-order applications can in general increase the variance of the estimator. A short sketch in R is given at the end of this slide.
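A minimal R sketch of this procedure for a generic statistic (the function name jackknife and the default statistic mean are illustrative, not part of the slides):

# Jackknife bias correction and variance estimate for a statistic stat
jackknife <- function(x, stat = mean) {
  n   <- length(x)
  tn  <- stat(x)                                       # estimate from the full sample, t_n
  loo <- sapply(seq_len(n), function(i) stat(x[-i]))   # leave-one-out estimates t_{n-1,i}
  t_jack <- n * tn - (n - 1) * mean(loo)               # bias-corrected (jackknifed) estimate
  v_jack <- (n - 1) / n * sum((loo - mean(loo))^2)     # jackknife variance estimate
  list(estimate = t_jack, variance = v_jack)
}

For a numeric vector x, jackknife(x) then returns the jackknifed mean and its variance estimate.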
5 Jackknife: an example
- Let us take a data set of size 12 and perform the jackknife for the mean value (the same computation in R is sketched after the table).
Data: 368 390 379 260 404 318 352 359 216 222 283 332; mean t_12 = 323.5833
Jackknife samples (one point removed at a time) and their means t_{11,i}:
 1) remove 368 → 319.5455
 2) remove 390 → 317.5455
 3) remove 379 → 318.5455
 4) remove 260 → 329.3636
 5) remove 404 → 316.2727
 6) remove 318 → 324.0909
 7) remove 352 → 321.0000
 8) remove 359 → 320.3636
 9) remove 216 → 333.3636
10) remove 222 → 332.8182
11) remove 283 → 327.2727
12) remove 332 → 322.8182
t_jack = 12 × 323.5833 − 11 × mean(t_{11,i}) = 323.5833. For the mean it is already an unbiased estimator. var_jack(t) = 345.7058
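The same computation in a few lines of R (a minimal sketch; the leave-one-out means reproduce the table above):

x   <- c(368, 390, 379, 260, 404, 318, 352, 359, 216, 222, 283, 332)
n   <- length(x)
loo <- sapply(seq_len(n), function(i) mean(x[-i]))   # the twelve leave-one-out means
n * mean(x) - (n - 1) * mean(loo)                    # jackknifed mean: 323.5833
(n - 1) / n * sum((loo - mean(loo))^2)               # jackknife variance estimate of the mean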
6 Cross-validation
- Cross-validation is a resampling technique to overcome overfitting.
- Let us consider a least-squares technique. Assume that we have a sample of size n, y = (y1, y2, ..., yn), and we want to estimate the parameters θ = (θ1, θ2, ..., θm). Let us further assume that the mean value of the observations is a function of these parameters (we may not know the form of this function). Then we can postulate that the function has a form g and find the values of the parameters by least squares, minimising
  h(θ) = Σ_{i=1}^{n} (y_i − g(x_i, θ))²
  where x_i is the i-th row of X, and X is a fixed matrix or a matrix of random variables. After minimisation of h we have values of the parameters and therefore a complete definition of the function. The form of the function g defines the model we want to use, and we may consider several forms. Obviously, if we have more parameters, the fit will be better. The question is what would happen if we had new observations. Using the estimated values of the parameters we could evaluate the squared differences for them. Let us say we have new observations (y_{n+1}, ..., y_{n+l}). Can our function predict these new observations? Which function predicts better? To answer these questions we can calculate
  PE = Σ_{j=1}^{l} (y_{n+j} − g(x_{n+j}, θ̂))²
- where PE is the prediction error. The function g that gives the smallest value of PE has higher predictive power. A model that gives a smaller h but a larger PE is an overfitted model. A small R illustration follows.
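A small R illustration of this point (the simulated data, the two candidate models and the split into old and new observations are all made up here for the sake of the example):

set.seed(1)
d     <- data.frame(x = runif(30))
d$y   <- 2 + 3 * d$x + rnorm(30, sd = 0.3)      # true mean is linear in x
train <- d[1:20, ]; test <- d[21:30, ]          # "old" and "new" observations
m1 <- lm(y ~ x, data = train)                   # modest model
m2 <- lm(y ~ poly(x, 8), data = train)          # heavily parameterised model
sum(residuals(m1)^2); sum(residuals(m2)^2)      # h: smaller for the bigger model
sum((test$y - predict(m1, test))^2)             # PE of the modest model
sum((test$y - predict(m2, test))^2)             # PE of the overfitted model, usually much larger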
7 Cross-validation (cont.)
- If we have a sample of observations, can we use this sample to choose among given models? Cross-validation attempts to reduce overfitting and thus helps model selection.
- Description of cross-validation. We have a sample of size n.
- Divide the sample into K roughly equal-sized parts.
- For the k-th part, estimate the parameters using the other K−1 parts, excluding the k-th part. Calculate the prediction error for the k-th part.
- Repeat this for all k = 1, 2, ..., K, combine all prediction errors and get the cross-validation prediction error.
- If K = n then we have the leave-one-out cross-validation technique.
- Let us denote the estimate at the k-th step by θ̂_k (it is a vector of parameters). Let the k-th subset of the sample be A_k and the number of points in this subset be N_k. Then the prediction error per observation is
  CV = (1/n) Σ_{k=1}^{K} Σ_{i ∈ A_k} (y_i − g(x_i, θ̂_k))²
- We then choose the function that gives the smallest prediction error. We can expect that in future, when we have new observations, this function will give the smallest prediction error.
- This technique is widely used in modern statistical analysis. It is not restricted to least squares; instead of least squares we could use other techniques such as maximum likelihood or Bayesian estimation. A compact sketch in R is given after this list.
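A compact R sketch of K-fold cross-validation for the least-squares setting above (cv_error, the model-formula interface and K = 10 are choices made for this illustration):

# Cross-validation prediction error per observation for a model formula
cv_error <- function(formula, data, K = 10) {
  n     <- nrow(data)
  folds <- sample(rep(1:K, length.out = n))        # assign points to K roughly equal parts
  se    <- numeric(n)
  for (k in 1:K) {
    fit  <- lm(formula, data = data[folds != k, ])        # fit on the other K-1 parts
    pred <- predict(fit, newdata = data[folds == k, ])    # predict the held-out k-th part
    se[folds == k] <- (data[folds == k, all.vars(formula)[1]] - pred)^2
  }
  mean(se)                                         # prediction error per observation
}

Comparing, say, cv_error(y ~ x, d) with cv_error(y ~ poly(x, 8), d) for the simulated data above would then point to the model with the better predictive power.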
8 Bootstrap
- The bootstrap is one of the computationally expensive techniques. Its simplicity, together with increasing computational power, makes it a method of choice in many applications. In a very simple form it works as follows.
- We have a sample of size n and want to estimate some parameter θ; the estimator applied to the full sample gives t. To each sample point we assign a probability (usually 1/n, i.e. all sample points have equal probability). Then from this sample we draw, with replacement, another random sample of size n and estimate θ again. Let us denote the estimate of the parameter at the j-th resampling stage by t_j. The bootstrap estimator of θ and its variance are calculated as
  t_boot = (1/B) Σ_{j=1}^{B} t_j,   var_boot = (1/(B − 1)) Σ_{j=1}^{B} (t_j − t_boot)²
- This is a very simple form of bootstrap resampling; a sketch in R follows. For parameter estimation, the number of bootstrap samples B is usually chosen to be around 200. When the distribution itself is desired, the recommended number is around 1000-2000.
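A minimal R sketch of this simple form of the bootstrap (the data vector, the choice of the mean as the statistic and B = 200 are placeholders):

x  <- c(368, 390, 379, 260, 404, 318, 352, 359, 216, 222, 283, 332)   # any sample
B  <- 200                                             # around 200 is enough for a variance
tb <- replicate(B, mean(sample(x, replace = TRUE)))   # resample with replacement, re-estimate
mean(tb)                                              # bootstrap estimate of the parameter
var(tb)                                               # bootstrap estimate of its variance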
- Let us analyse how the bootstrap works in one simple case. Consider a random variable X with sample space x = (x1, ..., xM), where each point x_j has probability f_j, i.e. P(X = x_j) = f_j.
- f = (f1, ..., fM) represents the distribution of the population. A sample of size n gives relative frequencies for each sample point, f̂ = (f̂1, ..., f̂M), where f̂_j is the proportion of sample points equal to x_j.
9 Bootstrap (cont.)
- Then the distribution of f̂ conditional on f is a multinomial distribution.
- The multinomial distribution is the extension of the binomial distribution; it is written out below.
- The limiting distribution of √n (f̂ − f) is a multivariate normal distribution.
- If we resample from the given sample, then we should consider the conditional distribution of the relative frequencies f̂* in the bootstrap sample given f̂ (which is also a multinomial distribution).
- The limiting distribution of √n (f̂* − f̂), conditional on f̂, is the same as the limiting distribution for the original sample. Since these two distributions converge to the same limit, well-behaved functions of them also have the same limiting distributions. Thus, if we use the bootstrap to derive the distribution of a sample statistic, we can expect that in the limit it converges to the sampling distribution of that statistic; i.e. a statistic written as a function of the relative frequencies, t(f̂) (compared with t(f)), and its bootstrap version t(f̂*) (compared with t(f̂)), have the same limiting distributions.
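Written out (a standard formulation; here n_j denotes the number of sample points equal to x_j, so f̂_j = n_j/n, and f̂* are the corresponding relative frequencies in a bootstrap resample):

P(n_1, \dots, n_M \mid f) = \frac{n!}{n_1! \cdots n_M!} \, f_1^{n_1} \cdots f_M^{n_M}

\sqrt{n}\,(\hat f - f) \xrightarrow{d} N\big(0,\ \mathrm{diag}(f) - f f^{\mathsf T}\big), \qquad
\sqrt{n}\,(\hat f^{*} - \hat f) \,\big|\, \hat f \xrightarrow{d} N\big(0,\ \mathrm{diag}(f) - f f^{\mathsf T}\big)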
10 Bootstrap (cont.)
- If we could enumerate all possible resamples from our sample, we could build the ideal bootstrap distribution. In practice, even with modern computers, this is usually impossible to achieve. Instead, Monte Carlo simulation is used. It usually works as follows:
- Draw a random sample of size n with replacement from the given sample of size n.
- Estimate the parameter and get the estimate t_j.
- Repeat steps 1) and 2) B times and build the frequency and cumulative distributions for t.
11 Bootstrap (cont.)
- While resampling we did not use any assumption about the population distribution, so this bootstrap is a non-parametric bootstrap. If we have some idea about the population distribution, then we can use it in resampling, i.e. when we draw randomly from our sample we can use the assumed population distribution. For example, if we know that the population distribution is normal, then we can estimate its parameters using our sample (the sample mean and variance), approximate the population distribution by this fitted distribution and use it to draw new samples. As can be expected, if the assumption about the population distribution is correct, then the parametric bootstrap will perform better; if it is not correct, then the non-parametric bootstrap will outperform its parametric counterpart. A minimal parametric-bootstrap sketch in R follows.
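A minimal R sketch of a parametric bootstrap under a normal assumption (the data, the statistic and B are placeholders):

x  <- c(368, 390, 379, 260, 404, 318, 352, 359, 216, 222, 283, 332)
B  <- 1000
mu <- mean(x); s <- sd(x)                          # fit the assumed normal population
tb <- replicate(B, mean(rnorm(length(x), mu, s)))  # draw new samples from N(mu, s^2)
mean(tb); var(tb)                                  # parametric bootstrap estimate and variance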
12 Balanced bootstrap
- One variation of bootstrap resampling is the balanced bootstrap. In this case, when resampling, one makes sure that the number of occurrences of each sample point is the same across all resamples: if we draw B bootstrap samples, we make the total number of occurrences of each x_i over all bootstrap samples equal to B. Of course, within each individual sample some of the observations will be present several times and others will be missing, but over all of them every sample point is present and occurs the same number of times. It can be achieved as follows (see the R sketch after this list):
- Let us assume that the number of sample points is n.
- Repeat the numbers from 1 to n, B times.
- Find a random permutation of the numbers from 1 to nB. Call it a vector N of length nB.
- Take the first n elements of N and the corresponding sample points, and estimate the parameter of interest. Then take the second n elements (from n+1 to 2n) and the corresponding sample points and estimate again. Repeat this B times and find the bootstrap estimators, distributions, etc.
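A minimal R sketch of this balanced scheme (the function name balanced_boot and the default statistic mean are illustrative):

# Balanced bootstrap: every observation appears exactly B times across the B resamples
balanced_boot <- function(x, B, stat = mean) {
  n   <- length(x)
  idx <- sample(rep(seq_len(n), B))          # repeat indices 1..n B times, then permute them
  idx <- matrix(idx, nrow = B, byrow = TRUE) # row j holds the indices of the j-th resample
  apply(idx, 1, function(i) stat(x[i]))      # B balanced bootstrap estimates
}
tb <- balanced_boot(c(368, 390, 379, 260, 404, 318, 352, 359, 216, 222, 283, 332), B = 1000)
mean(tb); var(tb)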
13 Balanced bootstrap: an example
- Let us assume that we have 3 sample points and the number of bootstrap samples we want is 3. Our observations are (x1, x2, x3).
- We repeat the numbers from 1 to 3 three times:
- 1 2 3 1 2 3 1 2 3
- Then we take one of the random permutations of the numbers from 1 to 3×3 = 9, e.g.
- 4 3 9 5 6 1 2 8 7
- First we take the observations x1, x3, x3 and estimate the parameter.
- Then we take x2, x3, x1 and estimate the parameter.
- Then we take x2, x2, x1 and estimate the parameter.
- As can be seen, each observation is present 3 times in total.
- This technique is meant to improve the results of bootstrap resampling.
14 Bootstrap: an example
- Let us take the example we used for the jackknife. We generate 10000 (simple) bootstrap samples and estimate the mean value for each of them. This gives the bootstrap distribution of the estimated parameter, which can now be used for various purposes (variance estimation, interval estimation, hypothesis testing and so on). For comparison, the normal distribution with mean equal to the sample mean and variance equal to the sample variance divided by the number of observations is also given (black line). The approximation by the normal distribution appears to be quite good. A sketch in R reproducing this comparison follows.
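A sketch in R of this comparison (graphical details such as the number of histogram bins are arbitrary):

smp <- c(368, 390, 379, 260, 404, 318, 352, 359, 216, 222, 283, 332)  # the jackknife example data
tb  <- replicate(10000, mean(sample(smp, replace = TRUE)))            # 10000 bootstrap means
hist(tb, breaks = 50, freq = FALSE, main = "Bootstrap distribution of the mean")
curve(dnorm(x, mean = mean(smp), sd = sd(smp) / sqrt(length(smp))),   # N(sample mean, s^2/n)
      add = TRUE, lwd = 2)                                            # reference normal (black line)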
15 References
- Efron, B. (1979) Bootstrap methods: another look at the jackknife. Ann. Statist. 7, 1-26
- Efron, B. and Tibshirani, R.J. (1993) An Introduction to the Bootstrap
- Chernick, M.R. (1999) Bootstrap Methods: A Practitioner's Guide
- Berthold, M. and Hand, D.J. (2003) Intelligent Data Analysis
- Kendall's Advanced Theory of Statistics, Vol. 1 and 2
16 Exercise 2
- Differences between means and bootstrap confidence intervals.
- Take the sleep data set from R. Find the difference between the means for treatments 1 and 2. Find confidence intervals using t.test and the bootstrap technique (a possible skeleton is sketched below).
- Necessary commands:
- data(sleep)
- a <- sleep$extra[sleep$group == 1]
- b <- sleep$extra[sleep$group == 2]
- t.test - standard t-test
- boot_mean - available from the mres_course/2006 webpage.
- Write a report.
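A possible skeleton for the exercise (boot_mean is the course-provided function and is not reproduced here; a hand-rolled percentile bootstrap for the difference of means is sketched instead):

data(sleep)
a <- sleep$extra[sleep$group == 1]
b <- sleep$extra[sleep$group == 2]
t.test(b, a)                                          # difference of means with a t-based CI
diffs <- replicate(1000, mean(sample(b, replace = TRUE)) - mean(sample(a, replace = TRUE)))
quantile(diffs, c(0.025, 0.975))                      # 95% percentile bootstrap interval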