Resampling techniques - PowerPoint PPT Presentation

Provided by: gar115

1
Resampling techniques
  • Why resampling?
  • Jackknife
  • Cross-validation
  • Bootstrap
  • Examples of application of bootstrap

2
Why resampling?
  • One of the purposes of statistics is to estimate
    parameters and to assess the reliability of the
    estimates. Since estimators are functions of the
    sample points, they are random variables. If we
    could find the distribution of this random
    variable (the sample statistic), we could assess
    the reliability of the estimator. Unfortunately,
    apart from the simplest cases, the sampling
    distribution is not easy to derive. There are
    several techniques to approximate it, including
    Edgeworth series, the Laplace approximation and
    saddle-point approximations. They give analytical
    forms for the approximate distributions. With the
    advent of computers, computationally intensive
    methods have emerged. They work satisfactorily in
    many cases.
  • Examples of the simplest cases, where the
    sampling distributions are known, include
  • The sample mean, when the sample is from a
    population with a normal distribution, has a
    normal distribution with mean equal to the
    population mean and variance equal to the
    population variance divided by the sample size
    n. If the population variance is not known, the
    variance of the sample mean is estimated by the
    sample variance divided by n.
  • The sample variance is distributed as a multiple
    of the χ² distribution: (n-1)s²/σ² has the χ²
    distribution with n-1 degrees of freedom. This
    is valid if the population distribution is
    normal and the sample points are independent.
  • The difference between the sample mean and the
    population mean, divided by the square root of
    the sample variance, is distributed as a
    multiple of the t distribution with n-1 degrees
    of freedom (again in the normal and independent
    case).
  • For two independent samples from normal
    distributions, the ratio of the two sample
    variances is distributed as a multiple of the
    F distribution.
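
The first of these known cases can be checked by simulation. The following is a minimal sketch, not part of the original slides: the population parameters (mu = 10, sigma = 2), the sample size n = 25 and the helper name simulate_sample_means are all chosen here purely for illustration.

```python
import random
import statistics

def simulate_sample_means(mu, sigma, n, n_repeats, seed=0):
    """Draw n_repeats samples of size n from N(mu, sigma^2) and return their means."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_repeats):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means

# Illustrative values only: population N(10, 2^2), samples of size 25.
means = simulate_sample_means(mu=10.0, sigma=2.0, n=25, n_repeats=20000)

# Theory: the sample mean is normal with mean mu and variance sigma^2 / n = 0.16.
print("mean of the sample means:    ", statistics.fmean(means))
print("variance of the sample means:", statistics.variance(means))
```

The two printed values should lie close to 10 and 0.16, up to simulation noise.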

3
Resampling techniques
  • Three popular computer-intensive resampling
    techniques are
  • Jackknife. It is a useful tool for bias removal.
    It may work well for medium and large samples.
  • Cross-validation. A very useful technique for
    model selection. It may help to choose the best
    model among those under consideration.
  • Bootstrap. Perhaps one of the most important
    resampling techniques. It can reduce bias and
    give the variance of the estimator. Moreover, it
    can give the distribution of the statistic under
    consideration. This distribution can be used for
    a wide variety of purposes, such as interval
    estimation and hypothesis testing.

4
Jackknife
  • The jackknife is used for bias removal. As we
    know, the mean-square error of an estimator is
    equal to the square of the bias plus the variance
    of the estimator. If the bias is much larger than
    the variance, then under some circumstances the
    jackknife can be used.
  • Description of the jackknife: Let us assume that
    we have a sample of size n. We calculate some
    sample statistic t_n using all the data. Then, by
    removing one point at a time, we calculate
    t_{n-1,i}, where the subscripts indicate the size
    of the sample and the index of the removed sample
    point. The new estimator is then derived as
        $$ t_{jack} = n t_n - (n-1) \bar{t}_{n-1}, \qquad \bar{t}_{n-1} = \frac{1}{n} \sum_{i=1}^{n} t_{n-1,i} $$
  • If the order of the bias of the statistic t_n is
    O(n^{-1}), then after the jackknife the order of
    the bias becomes O(n^{-2}).
  • The variance is estimated using the usual
    jackknife variance estimator
        $$ \widehat{var}(t) = \frac{n-1}{n} \sum_{i=1}^{n} (t_{n-1,i} - \bar{t}_{n-1})^2 $$
  • This procedure can be applied iteratively, i.e.
    the jackknife can be applied again to the new
    estimator. The first application of the jackknife
    can reduce the bias without substantially
    changing the variance of the estimator, but its
    second and higher order applications can, in
    general, increase the variance of the estimator.
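
A short sketch of why the order of the bias drops. It assumes that the bias can be expanded in powers of 1/n with constants a and b that do not depend on n (these symbols are introduced here for illustration only), and it writes the jackknife estimator as t_jack = n t_n - (n-1) \bar{t}_{n-1}, where \bar{t}_{n-1} is the average of the leave-one-out values t_{n-1,i}:

```latex
\begin{aligned}
E[t_n] &= \theta + \frac{a}{n} + \frac{b}{n^2} + \dots \\
E[\bar{t}_{n-1}] &= \theta + \frac{a}{n-1} + \frac{b}{(n-1)^2} + \dots \\
E[t_{jack}] &= n\,E[t_n] - (n-1)\,E[\bar{t}_{n-1}] \\
            &= \theta + \frac{b}{n} - \frac{b}{n-1} + \dots
             = \theta - \frac{b}{n(n-1)} + \dots
\end{aligned}
```

The a/n terms cancel exactly, so the leading term of the remaining bias is of order n^{-2}.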

5
Jackknife: An example
  • Let us take a data set of size 12 and perform
    the jackknife for the mean value.

    data                                              mean
 0) 368 390 379 260 404 318 352 359 216 222 283 332   323.5833

Jackknife samples
 1)     390 379 260 404 318 352 359 216 222 283 332   319.5455
 2) 368     379 260 404 318 352 359 216 222 283 332   317.5455
 3) 368 390     260 404 318 352 359 216 222 283 332   318.5455
 4) 368 390 379     404 318 352 359 216 222 283 332   329.3636
 5) 368 390 379 260     318 352 359 216 222 283 332   316.2727
 6) 368 390 379 260 404     352 359 216 222 283 332   324.0909
 7) 368 390 379 260 404 318     359 216 222 283 332   321.0000
 8) 368 390 379 260 404 318 352     216 222 283 332   320.3636
 9) 368 390 379 260 404 318 352 359     222 283 332   333.3636
10) 368 390 379 260 404 318 352 359 216     283 332   332.8182
11) 368 390 379 260 404 318 352 359 216 222     332   327.2727
12) 368 390 379 260 404 318 352 359 216 222 283       322.8182

t_jack = 12 × 323.5833 - 11 × mean(t) = 323.5833. It is an unbiased estimator.
var(t) = 345.7058

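
The leave-one-out calculation for this data set can be sketched in a few lines. The function name jackknife is hypothetical. The variance uses the common (n-1)/n form, which is an assumption about what the slide intended, so the printed variance need not match the slide's figure to every digit.

```python
import statistics

def jackknife(data, statistic):
    """Return (jackknife estimate, jackknife variance, leave-one-out values)."""
    n = len(data)
    t_n = statistic(data)
    # Leave-one-out values t_{n-1,i}: recompute the statistic without point i.
    loo = [statistic(data[:i] + data[i + 1:]) for i in range(n)]
    loo_mean = statistics.fmean(loo)
    # Bias-corrected estimator: n * t_n - (n - 1) * mean of leave-one-out values.
    t_jack = n * t_n - (n - 1) * loo_mean
    # Common jackknife variance form (assumed here): (n-1)/n * sum of squared deviations.
    variance = (n - 1) / n * sum((t - loo_mean) ** 2 for t in loo)
    return t_jack, variance, loo

data = [368, 390, 379, 260, 404, 318, 352, 359, 216, 222, 283, 332]
t_jack, variance, loo = jackknife(data, statistics.fmean)

print(f"sample mean        = {statistics.fmean(data):.4f}")
print(f"jackknife estimate = {t_jack:.4f}")
print(f"jackknife variance = {variance:.4f}")
print("leave-one-out means:", [round(t, 4) for t in loo])
```

For the mean, the jackknife estimate coincides with the sample mean, and this form of the variance reduces to the sample variance divided by n.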
6
Cross-validation
  • Cross-validation is a resampling technique used
    to guard against overfitting.
  • Let us consider a least-squares technique. Let us
    assume that we have a sample of size n,
    y = (y_1, y_2, ..., y_n). We want to estimate the
    parameters θ = (θ_1, θ_2, ..., θ_m). Now let us
    further assume that the mean value of the
    observations is a function of these parameters
    (we may not know the form of this function). Then
    we can postulate that the function has a form g
    and find the values of the parameters using the
    least-squares technique, i.e. by minimising
        $$ h = \sum_{i=1}^{n} (y_i - g(x_i, \theta))^2 $$
  • Here x_i is the i-th row of X, which is a fixed
    matrix or a matrix of random variables. After
    minimisation of h we will have the values of the
    parameters and therefore a complete definition
    of the function. The form of the function g
    defines the model we want to use. We may have
    several forms of the function. Obviously, the
    more parameters we have, the better the fit to
    the sample will be. The question is what would
    happen if we had new observations. Using the
    estimated values of the parameters we could
    calculate the squared differences for them. Let
    us say we have new observations
    (y_{n+1}, ..., y_{n+l}). Can our function predict
    these new observations? Which function predicts
    better? To answer these questions we can
    calculate the new differences
        $$ PE = \sum_{i=n+1}^{n+l} (y_i - g(x_i, \hat{\theta}))^2 $$
  • Here PE is the prediction error. The function g
    that gives the smallest value of PE has the
    highest predictive power. A model that gives a
    smaller h but a larger PE corresponds to an
    overfitted model.

7
Cross-validation Cont.
  • If we have only one sample of observations, can
    we use it to choose among the given models?
    Cross-validation attempts to reduce overfitting
    and thus helps model selection.
  • Description of cross-validation: We have a sample
    of size n.
  • Divide the sample into K parts of roughly equal
    size.
  • For the k-th part, estimate the parameters using
    the other K-1 parts, excluding the k-th part.
    Calculate the prediction error for the k-th part.
  • Repeat this for all k = 1, 2, ..., K and combine
    all prediction errors to get the cross-validation
    prediction error.
  • If K = n then we have the leave-one-out
    cross-validation technique.
  • Let us denote the estimate at the k-th step by
    \hat{θ}_k (it is a vector of parameters). Let the
    k-th subset of the sample be A_k and the number
    of points in this subset be N_k. Then the
    prediction error per observation can be written
    as
        $$ PE = \frac{1}{K} \sum_{k=1}^{K} \frac{1}{N_k} \sum_{i \in A_k} (y_i - g(x_i, \hat{\theta}_k))^2 $$
  • Then we would choose the function that gives the
    smallest prediction error. We can expect that in
    future, when we have new observations, this
    function will give the smallest prediction error.
  • This technique is widely used in modern
    statistical analysis. It is not restricted to the
    least-squares technique. Instead of least squares
    we could use other techniques such as
    maximum likelihood or Bayesian estimation.
8
Bootstrap
  • The bootstrap is one of the computationally
    intensive techniques. Its simplicity, together
    with increasing computational power, makes it
    the method of choice in many applications. In a
    very simple form it works as follows.
  • We have a sample of size n. We want to estimate
    some parameter θ. The estimator for this
    parameter gives t. To each sample point we
    assign a probability (usually equal to 1/n, i.e.
    all sample points have equal probability). Then
    from this sample we draw, with replacement,
    another random sample of size n and estimate θ.
    Let us denote the estimate of the parameter at
    the j-th resampling stage by t_j. The bootstrap
    estimator for θ and its variance are calculated
    as
        $$ t_B = \frac{1}{B} \sum_{j=1}^{B} t_j, \qquad V_B(t) = \frac{1}{B-1} \sum_{j=1}^{B} (t_j - t_B)^2 $$
    where B is the number of bootstrap samples.
  • This is a very simple form of application of
    bootstrap resampling. For parameter estimation,
    the number of bootstrap samples is usually
    chosen to be around 200. When the distribution
    is desired, the recommended number is around
    1000-2000.
  • Let us analyse how the bootstrap works in one
    simple case. Consider a random variable X with
    sample space x = (x_1, ..., x_M). Each point has
    probability f_j, i.e.
        $$ P(X = x_j) = f_j, \qquad \sum_{j=1}^{M} f_j = 1 $$
  • f = (f_1, ..., f_M) represents the distribution
    of the population. A sample of size n will have
    relative frequencies for each sample point
        $$ \hat{f}_j = \frac{n_j}{n} $$
    where n_j is the number of observations equal
    to x_j.
9
Bootstrap Cont.
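  • Then the distribution of the relative frequencies
    \hat{f} conditional on f will be a multinomial
    distribution.
  • The multinomial distribution is the extension of
    the binomial distribution and is expressed as
        $$ P(n_1, ..., n_M) = \frac{n!}{n_1! \cdots n_M!} f_1^{n_1} \cdots f_M^{n_M}, \qquad \sum_{j=1}^{M} n_j = n $$
  • The limiting distribution of
        $$ \sqrt{n} (\hat{f} - f) $$
    is a multinormal distribution. If we resample
    from the given sample, then we should consider
    the conditional distribution of the relative
    frequencies \hat{f}^* in the resample (which is
    also a multinomial distribution)
        $$ \hat{f}^* \mid \hat{f} $$
  • The limiting distribution of
        $$ \sqrt{n} (\hat{f}^* - \hat{f}) $$
    is the same as that of \sqrt{n}(\hat{f} - f) for
    the original sample. Since these two
    distributions converge to the same distribution,
    well-behaved functions of them will also have
    the same limiting distributions. Thus, if we use
    the bootstrap to derive the distribution of a
    sample statistic, we can expect that in the
    limit it will converge to the distribution of
    the sample statistic, i.e. the following two
    functions will have the same limiting
    distributions
        $$ \sqrt{n} (t(\hat{f}^*) - t(\hat{f})) \mid \hat{f} \qquad \text{and} \qquad \sqrt{n} (t(\hat{f}) - t(f)) \mid f $$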

10
Bootstrap Cont.
  • If we could enumerate all possible resamples from
    our sample, we could build the ideal bootstrap
    distribution. In practice, even with modern
    computers, this is infeasible except for very
    small samples (there are n^n ordered resamples
    of size n). Instead, Monte Carlo simulation is
    used. Usually it works as follows
  • 1) Draw a random sample of size n with
    replacement from the given sample of size n.
  • 2) Estimate the parameter and get the estimate
    t_j.
  • 3) Repeat steps 1) and 2) B times and build the
    frequency and cumulative distributions for t.
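
The Monte Carlo steps above can be sketched as follows. It is a minimal illustration and not the course's own code: the function name bootstrap, the choice B = 2000 and the crude percentile interval at the end are assumptions made here. The data are the 12 values from the jackknife example.

```python
import random
import statistics

def bootstrap(data, statistic, B, seed=0):
    """Return B bootstrap replicates t_1, ..., t_B of the statistic."""
    rng = random.Random(seed)
    n = len(data)
    replicates = []
    for _ in range(B):
        resample = rng.choices(data, k=n)        # step 1: n draws with replacement
        replicates.append(statistic(resample))   # step 2: re-estimate the parameter
    return replicates                            # step 3: B repetitions

data = [368, 390, 379, 260, 404, 318, 352, 359, 216, 222, 283, 332]
t = bootstrap(data, statistics.fmean, B=2000)

t_B = statistics.fmean(t)       # bootstrap estimate
V_B = statistics.variance(t)    # bootstrap variance (divisor B - 1)

# The sorted replicates form the empirical cumulative distribution; reading off
# its 2.5% and 97.5% points gives a crude percentile interval.
t_sorted = sorted(t)
lower = t_sorted[int(0.025 * len(t_sorted))]
upper = t_sorted[int(0.975 * len(t_sorted)) - 1]

print("bootstrap estimate:", t_B)
print("bootstrap variance:", V_B)
print("approximate 95% percentile interval:", (lower, upper))
```

Changing the seed changes the replicates, so the printed values vary slightly from run to run of the simulation.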

11
Bootstrap Cont.
  • While resampling we did not use any assumption
    about the population distribution, so this
    bootstrap is a non-parametric bootstrap. If we
    have some idea about the population distribution,
    then we can use it in resampling, i.e. instead
    of drawing randomly from our sample we can draw
    from the assumed population distribution. For
    example, if we know that the population
    distribution is normal, then we can estimate its
    parameters using our sample (sample mean and
    variance). Then we can approximate the population
    distribution with this fitted distribution and
    use it to draw new samples. As can be expected,
    if the assumption about the population
    distribution is correct, then the parametric
    bootstrap will perform better. If it is not
    correct, then the non-parametric bootstrap will
    outperform its parametric counterpart.
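
A minimal sketch of the parametric variant, assuming a normal population as in the example above. The function name, the use of the median as the statistic and the choice B = 2000 are illustrative assumptions. The data are the 12 values from the jackknife example.

```python
import random
import statistics

def parametric_bootstrap_normal(data, statistic, B, seed=0):
    """Bootstrap replicates drawn from a normal distribution fitted to the data."""
    rng = random.Random(seed)
    n = len(data)
    mu_hat = statistics.fmean(data)      # estimated population mean
    sigma_hat = statistics.stdev(data)   # estimated population standard deviation
    replicates = []
    for _ in range(B):
        # Draw a new sample of size n from the fitted N(mu_hat, sigma_hat^2),
        # not from the observed points themselves.
        resample = [rng.gauss(mu_hat, sigma_hat) for _ in range(n)]
        replicates.append(statistic(resample))
    return replicates

data = [368, 390, 379, 260, 404, 318, 352, 359, 216, 222, 283, 332]
medians = parametric_bootstrap_normal(data, statistics.median, B=2000)

print("mean of the bootstrap medians:    ", statistics.fmean(medians))
print("variance of the bootstrap medians:", statistics.variance(medians))
```

The only difference from the non-parametric version is the source of the resamples: here they come from the fitted distribution, so the result is only as good as the normality assumption.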

12
Balanced bootstrap
  • One variation of bootstrap resampling is the
    balanced bootstrap. In this case, when
    resampling, one makes sure that the number of
    occurrences of each sample point is the same,
    i.e. if we draw B bootstrap samples, we make the
    number of occurrences of each x_i equal to B
    over all bootstrap samples taken together. Of
    course, in each sample some of the observations
    will be present several times and others will be
    missing. But over all samples together we want
    to make sure that all sample points are present
    and that their numbers of occurrences are the
    same. This can be achieved as follows
  • Let us assume that the number of sample points is
    n.
  • Repeat the numbers from 1 to n, B times. This
    gives a sequence of length nB.
  • Find a random permutation of the numbers from 1
    to nB. Call it a vector N of length nB. Each
    entry of N is a position in the repeated
    sequence and therefore points to a sample point.
  • Take the first n entries of N and the
    corresponding sample points. Estimate the
    parameter of interest. Then take the second n
    entries (from n+1 to 2n) and the corresponding
    sample points and do the estimation. Repeat this
    B times and find the bootstrap estimators,
    distributions, etc.

13
Balanced bootstrap Example.
  • Let us assume that we have 3 sample points and
    the number of bootstrap samples we want is 3.
    Our observations are (x1, x2, x3).
  • Then we repeat the numbers from 1 to 3 three
    times
  • 1 2 3 1 2 3 1 2 3
  • Then we take one of the random permutations of
    the numbers from 1 to 3×3 = 9, e.g.
  • 4 3 9 5 6 1 2 8 7
  • Each entry is a position in the repeated
    sequence: positions 4, 3, 9 hold 1, 3, 3,
    positions 5, 6, 1 hold 2, 3, 1, and positions
    2, 8, 7 hold 2, 2, 1.
  • First we take the observations x1, x3, x3 and
    estimate the parameter.
  • Then we take x2, x3, x1 and estimate the
    parameter.
  • Then we take x2, x2, x1 and estimate the
    parameter.
  • As can be seen, each observation is present 3
    times.
  • This technique is meant to improve the results of
    bootstrap resampling.
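
The balanced scheme can be sketched as follows. The function name and B = 1000 are assumptions made for illustration, and indices start at 0 rather than 1. Shuffling the repeated index list directly is equivalent to applying a random permutation of the positions 1 to nB. The data are the 12 values from the jackknife example.

```python
import random
import statistics
from collections import Counter

def balanced_bootstrap_indices(n, B, seed=0):
    """Return B index lists of length n in which every index 0..n-1
    occurs exactly B times over all lists taken together."""
    rng = random.Random(seed)
    pool = list(range(n)) * B   # repeat the indices B times (length n*B)
    rng.shuffle(pool)           # random permutation of the n*B entries
    # Cut the permuted sequence into B consecutive blocks of length n.
    return [pool[j * n:(j + 1) * n] for j in range(B)]

data = [368, 390, 379, 260, 404, 318, 352, 359, 216, 222, 283, 332]
samples = balanced_bootstrap_indices(len(data), B=1000)

# Estimate the mean on every balanced bootstrap sample.
t = [statistics.fmean([data[i] for i in idx]) for idx in samples]

# Check the balance: each sample point is used exactly B times in total.
counts = Counter(i for idx in samples for i in idx)
print("occurrences per sample point:", sorted(counts.values()))
print("mean of the bootstrap means: ", statistics.fmean(t))
print("sample mean:                 ", statistics.fmean(data))
```

Because every point is used equally often, the average of the bootstrap means equals the sample mean up to rounding, which is not guaranteed for the simple bootstrap.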

14
Bootstrap Example.
  • Let us take the example we used for the
    jackknife. We generate 10000 (simple) bootstrap
    samples and estimate the mean value for each of
    them. Here is the bootstrap distribution of the
    estimated parameter. This distribution can now
    be used for various purposes (variance
    estimation, interval estimation, hypothesis
    testing and so on). For comparison, the normal
    distribution with mean equal to the sample mean
    and variance equal to the sample variance
    divided by the number of elements is also given
    (black line).

[Figure: bootstrap distribution of the estimated mean, with the normal approximation shown as a black line]

It seems that the approximation with the normal
distribution is sufficiently good.

15
References
  • Efron, B. (1979) Bootstrap methods: another look
    at the jackknife. Ann. Statist. 7, 1-26
  • Efron, B. and Tibshirani, R.J. (1993) An
    Introduction to the Bootstrap
  • Chernick, M.R. (1999) Bootstrap Methods: A
    Practitioner's Guide
  • Berthold, M. and Hand, D.J. (2003) Intelligent
    Data Analysis
  • Kendall's Advanced Theory of Statistics, Vol. 1
    and 2

16
Exercise 2
  • Differences between means and bootstrap
    confidence intervals
  • Take the data set sleep from R. Find the
    difference between the means for treatments 1
    and 2. Find confidence intervals using t.test
    and the bootstrap technique.
  • Necessary commands
  • data(sleep)
  • a <- sleep$extra[sleep$group == 1]
  • b <- sleep$extra[sleep$group == 2]
  • t.test - standard t-test
  • boot_mean - available from the
    mres_course/2006 webpage.
  • Write a report.