CSI5388: Functional Elements of Statistics for Machine Learning, Part I (PowerPoint Presentation Transcript)

1
CSI5388: Functional Elements of Statistics for Machine Learning, Part I
2
Contents of the Lecture
  • Part I (this set of lecture notes)
  • Definitions and Preliminaries
  • Hypothesis Testing: Parametric Approaches
  • Part II (the next set of lecture notes)
  • Hypothesis Testing: Non-Parametric Approaches
  • Power of a Test
  • Statistical Tests for Comparing Multiple
    Classifiers

3
Definitions and Preliminaries I
  • A Random Variable is a function that assigns
    unique numerical values to all possible outcomes
    of a random experiment under fixed conditions.
  • If X takes on N values x1, x2, ..., xN, with
    each xi ∈ R (each equally likely), then:
  • The Mean of X is µ = (1/N) Σ xi
  • The Variance is σ² = (1/N) Σ (xi − µ)²
  • The Standard Deviation is σ = sqrt(σ²)
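
These three definitions can be sketched directly in Python. This is a minimal illustration assuming X takes each of its N values with equal probability; the data list is invented for the example:

```python
import math

def mean(xs):
    """Population mean: the average of the N equally likely values of X."""
    return sum(xs) / len(xs)

def variance(xs):
    """Population variance: the mean squared deviation from the mean."""
    mu = mean(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)

def std_dev(xs):
    """Population standard deviation: the square root of the variance."""
    return math.sqrt(variance(xs))

xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(mean(xs))      # 5.0
print(variance(xs))  # 4.0
print(std_dev(xs))   # 2.0
```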

4
Definitions and Preliminaries II
  • Sample Variance: s² = (1/(N−1)) Σ (xi − X̄)²
  • Sample Standard Deviation: s = sqrt(s²)
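
The only change from the population formulas is the N−1 divisor (Bessel's correction). A minimal sketch, reusing the same invented data:

```python
import math

def sample_variance(xs):
    """Sample variance: divide by N-1 rather than N (Bessel's correction)."""
    xbar = sum(xs) / len(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)

def sample_std(xs):
    """Sample standard deviation: square root of the sample variance."""
    return math.sqrt(sample_variance(xs))

xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(sample_variance(xs))  # 32/7, slightly larger than the population variance 4.0
print(sample_std(xs))
```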

5
Hypothesis Testing
  • Generalities
  • Sampling Distributions
  • Procedure
  • One- versus Two-tailed tests
  • Parametric approaches

6
Generalities
  • Purpose: Assuming a given sampling
    distribution, we want to establish whether a
    sample result is representative of that sampling
    distribution. This is interesting because
    it helps us decide whether the results we
    obtained in an experiment can generalize to
    future data.
  • Approaches to Hypothesis Testing: There are two
    different approaches to hypothesis testing:
    Parametric and Non-Parametric.

7
Sampling Distributions
  • Definition: The sampling distribution of a
    statistic (e.g., the mean, the median, or any
    other description/summary of a data set) is the
    distribution of values obtained for that
    statistic over all possible samples of the
    same size from a given population.
  • Note: Since the populations under study are
    usually infinite or, at least, very large, the
    true sampling distribution is usually unknown.
    Therefore, rather than finding its exact value,
    it will have to be estimated. Nonetheless, we can
    do so quite well, especially when considering the
    mean of the data.
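
The estimation idea can be made concrete by simulation. This sketch (with an invented, deliberately skewed population) draws many samples of the same size and collects the mean of each, giving an empirical approximation of the sampling distribution of the mean:

```python
import random
import statistics

random.seed(0)
# Invented skewed population (exponential, mean 1, std 1) for illustration.
population = [random.expovariate(1.0) for _ in range(100_000)]

def sample_means(population, sample_size, n_samples):
    """Approximate the sampling distribution of the mean by drawing
    n_samples samples of the same size and keeping each sample's mean."""
    return [statistics.fmean(random.sample(population, sample_size))
            for _ in range(n_samples)]

means = sample_means(population, sample_size=50, n_samples=2000)
# The sample means cluster tightly around the population mean:
# their spread is far smaller than the spread of the population itself.
print(statistics.fmean(means), statistics.stdev(means), statistics.stdev(population))
```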

8
Procedure I
  • Idea: Assuming a given sampling distribution,
    we want to establish whether a sample
    result is representative of that sampling
    distribution. This is interesting because
    it helps us decide whether the results we
    obtained in an experiment can generalize to
    future data.
  • Example: If the sample mean we obtain on a
    particular data sample is representative of the
    sampling distribution, then we can conclude that
    our data sample is representative of the whole
    population. If not, it means that the values in
    our sample are unrepresentative. (Perhaps this
    sample contained data that were particularly
    easy or particularly difficult to classify.)

9
Procedure II
  1. State your research hypothesis.
  2. Formulate a null hypothesis stating the opposite
    of your research hypothesis. In particular, the
    null hypothesis concerns the relationship between
    the sampling statistic of the basic population
    and the sample result you obtained from your
    specific set of data.
  3. Collect your specific data and compute the
    sample statistic on it.
  4. Calculate the probability of obtaining the sample
    result you observed if the sample emanated from
    the population that gave you the original sample
    statistic.
  5. If this probability is low, reject the null
    hypothesis, and state that the sample you
    considered does not emanate from the population
    that gave you the original sample statistic.
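
Steps 4 and 5 can be sketched empirically: if we can sample from the reference population, we can estimate the probability of a sample mean at least as extreme as the one observed. The reference population and the observed mean below are invented for illustration:

```python
import random
import statistics

random.seed(1)
# Invented reference population (assumption: H0 says our sample came from it).
reference = [random.gauss(0.0, 1.0) for _ in range(50_000)]

def empirical_p_value(reference, observed_mean, sample_size, n_resamples=5000):
    """Estimate the probability (two-tailed) of a sample mean at least as far
    from the reference mean as the observed one, by repeated resampling."""
    ref_mean = statistics.fmean(reference)
    obs_dev = abs(observed_mean - ref_mean)
    hits = sum(
        abs(statistics.fmean(random.sample(reference, sample_size)) - ref_mean) >= obs_dev
        for _ in range(n_resamples)
    )
    return hits / n_resamples

p = empirical_p_value(reference, observed_mean=0.8, sample_size=25)
# Step 5: a low probability means we reject the null hypothesis.
print(p, "reject H0" if p < 0.05 else "fail to reject H0")
```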

10
One- and Two-Tailed Tests
  • If H0 is expressed as an equality, then there are
    two ways to reject H0: either the statistic
    computed from your sample at hand is lower than
    the sampling statistic, or it is higher. If you
    are only concerned about lower statistics, or
    only about higher ones, then you should perform a
    one-tailed test. If you are simultaneously
    concerned about the two ways in which H0 can be
    rejected, then you should perform a two-tailed test.
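
For a symmetric sampling distribution such as the normal, the two-tailed probability is simply double the one-tailed probability. A small sketch (the observed statistic 1.96 is an invented example value):

```python
from statistics import NormalDist

z = 1.96  # invented observed standardized statistic
p_one = 1 - NormalDist().cdf(z)  # one-tailed: only "too high" counts as extreme
p_two = 2 * p_one                # two-tailed: extreme in either direction counts
print(round(p_one, 4), round(p_two, 4))  # roughly 0.025 and 0.05
```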

11
Parametric Approaches to Hypothesis Testing
  • The classical approach to hypothesis testing is
    parametric. This means that, in order to be
    applied, this approach makes a number of
    assumptions regarding the distribution of the
    population and the available sample.
  • Non-parametric approaches, discussed later, do
    not make these strong assumptions, although they
    do make some assumptions as well, as will be
    discussed there.

12
Why are Hypothesis Tests often applied to means?
  • Hypothesis tests are often applied to means. The
    reason is that, unlike for other statistics, the
    standard deviation of the mean is known and
    simple to calculate.
  • Having access to the standard deviation is
    essential: without it, hypothesis testing could
    not be performed, since the probability that the
    sample under consideration emanates from the
    population represented by the original sampling
    statistic is linked to this standard deviation.

13
Why is the standard deviation of the mean easy to
calculate?
  • Because of the important Central Limit Theorem,
    which states that no matter how your original
    population is distributed, if you use large
    enough samples, then the sampling distribution of
    the mean of these samples approaches a normal
    distribution. If the mean of the original
    population is µ and its standard deviation is σ,
    then the mean of the sampling distribution is µ
    and its standard deviation is σ/sqrt(N).
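
The theorem can be checked by simulation. This sketch uses an invented, heavily skewed parent population (exponential with µ = 1 and σ = 1) and verifies that the means of samples of size N have mean close to µ and standard deviation close to σ/sqrt(N):

```python
import math
import random
import statistics

random.seed(2)
n, n_samples = 40, 5000  # sample size N, and number of samples drawn

# Invented skewed parent population: exponential, so µ = 1 and σ = 1.
means = [statistics.fmean(random.expovariate(1.0) for _ in range(n))
         for _ in range(n_samples)]

# CLT prediction: mean of the sampling distribution ≈ µ = 1,
# and its standard deviation ≈ σ/sqrt(N) = 1/sqrt(40).
print(statistics.fmean(means), statistics.stdev(means), 1 / math.sqrt(n))
```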

14
When is the sampling distribution of the mean
Normal?
  • The sample size necessary for the sampling
    distribution of the mean to approach normality
    depends on the distribution of the parent
    population.
  • If the parent population is normal, then the
    sampling distribution of the mean is also normal.
  • If the parent population is not normal, but
    symmetrical and uni-modal, then the sampling
    distribution of the mean will be normal, even for
    small sample sizes.
  • If the population is very skewed, then sample
    sizes of at least 30 will be required for the
    sampling distribution of the mean to be normal.

15
How are hypothesis tests set up? t-tests
  • Hypothesis tests are used to find out whether a
    sample mean comes from a sampling distribution
    with a specified mean.
  • We will consider:
  • One-sample t-tests
  •   σ known
  •   σ unknown
  • Two-sample t-tests
  •   Two matched samples
  •   Two independent samples

16
One-sample t-tests: σ known
  • If σ is known, we can use the central limit
    theorem to obtain the sampling distribution of
    the population's mean (mean µ and standard
    deviation σ/sqrt(N)).
  • Let X̄ be the mean of our data sample. We compute
  • z = (X̄ − µ)/(σ/sqrt(N)) (1)
  • We find the probability that z is as large as the
    value obtained from the z-table, and then output
    this probability if we are solely interested in a
    one-tailed test, or double it before outputting
    it if we are interested in a two-tailed test.
  • If this output probability is smaller than .05,
    we reject H0 at the .05 level of
    significance. Otherwise, we state that we
    have no evidence to conclude that H0 does not
    hold.
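
Equation (1) and the decision rule can be sketched as follows, using `NormalDist` in place of a z-table (the sample data, µ, and σ below are invented for illustration):

```python
import math
from statistics import NormalDist

def z_test(sample, mu, sigma, two_tailed=True):
    """One-sample z-test: sigma, the population std, is assumed known.
    Returns the z statistic of equation (1) and the output probability."""
    n = len(sample)
    xbar = sum(sample) / n
    z = (xbar - mu) / (sigma / math.sqrt(n))  # equation (1)
    p = 1 - NormalDist().cdf(abs(z))          # probability from the "z-table"
    return z, 2 * p if two_tailed else p      # double it for a two-tailed test

# Invented sample; H0: the population mean is 5.0, with known sigma = 0.2.
sample = [5.1, 4.9, 5.3, 5.2, 4.8, 5.4, 5.0, 5.2, 5.3]
z, p = z_test(sample, mu=5.0, sigma=0.2)
print(round(z, 3), round(p, 4))
print("reject H0 at .05" if p < 0.05 else "no evidence against H0")
```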

17
What is the meaning and purpose of z?
  • Normal distributions can all be easily mapped
    into a single one, using a specific
    transformation.
  • This means that, in our hypothesis tests, we can
    use the same information about the sampling
    distribution over and over (if we assume that our
    population is normally distributed), no matter
    what the mean and variance of our actual
    population are.
  • Any observation can be changed into a standard
    score, z, with respect to mean 0 and standard
    deviation 1, as follows:
  • z = (X − mean)/sd

18
One-sample t-tests unknown
  • In most situations, s, the variance of the
    population is unknown. In this case, we replace s
    by s, the sample standard deviation, in equation
    (1) yielding
  • t (X µ)/(s/sqrt(N)) (2)
  • Because s is likely to under-estimate s, and,
    thus, return a t-value larger than z would have
    been had s been known, it is inappropriate to use
    the distribution of z to accept or reject the
    null hypothesis.
  • Instead, we use the Students t distribution,
    which corrects for this problem and compares t to
    the t-table with degree of freedom N-1. We then
    proceed as we did for z on the slide about s
    known, above.
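
Equation (2) differs from equation (1) only in using the sample standard deviation s. A minimal sketch (same invented sample as before; the critical value 2.306 is the standard two-tailed t-table entry for df = 8 at the .05 level):

```python
import math
import statistics

def t_statistic(sample, mu):
    """One-sample t statistic: sigma unknown, replaced by the sample std s.
    Returns the t value of equation (2) and the degrees of freedom N-1."""
    n = len(sample)
    xbar = statistics.fmean(sample)
    s = statistics.stdev(sample)  # sample std: N-1 divisor
    return (xbar - mu) / (s / math.sqrt(n)), n - 1  # equation (2)

sample = [5.1, 4.9, 5.3, 5.2, 4.8, 5.4, 5.0, 5.2, 5.3]  # invented data
t, df = t_statistic(sample, mu=5.0)
# Compare |t| to the t-table at df = N-1; e.g. 2.306 is the two-tailed
# critical value at the .05 level for df = 8.
print(round(t, 3), df, "reject H0" if abs(t) > 2.306 else "fail to reject H0")
```

Note that with s happening to equal the assumed σ of the earlier z example, t comes out equal to z, yet it is now compared to the wider t distribution, so H0 is not rejected here.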

19
What is the meaning and purpose of t?
  • t follows the same principle as z, except that
    t should be used when the standard
    deviation is unknown.
  • t, however, represents a family of curves rather
    than a single curve. The shape of the t
    distribution changes from sample size to sample
    size.
  • As the sample size grows larger and larger, t
    looks more and more like a normal distribution.

20
Assumption of the t-test with σ unknown
  • Please note that one assumption is made in the
    use of the t-test: we assume that the sample was
    drawn from a normally distributed population.
  • This is required because the derivation of t by
    Student was based on the assumption that the mean
    and variance of the population were independent,
    an assumption that is true in the case of a
    normal distribution.
  • In practice, however, the assumption about the
    distribution from which the sample was drawn can
    be lifted whenever the sample size is
    sufficiently large to produce a normal sampling
    distribution of the mean. In general, n = 25 or 30
    (number of cases in a sample) is sufficiently
    large. Often, it can be smaller than that.

21
Two-sample t-tests: matched samples
  • Given two matched populations, we want to test
    whether the difference in means between these two
    populations is significant or not. We do so by
    looking at the mean, D̄, and standard deviation,
    sD, of the pairwise differences, and comparing D̄
    to a mean of 0.
  • We can then apply the t-test as we did above, in
    the case where σ was unknown.
  • This time, we have
  • t = (D̄ − 0)/(sD/sqrt(n)) (3)
  • We use the t-table as before, with n−1 degrees of
    freedom, and the same assumptions about the
    normality of the distribution.
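
Equation (3) reduces the matched-samples case to a one-sample t-test on the pairwise differences. A sketch with invented per-fold accuracies of two hypothetical classifiers evaluated on the same folds:

```python
import math
import statistics

def paired_t(x, y):
    """Matched-samples t-test: a one-sample t-test on the pairwise
    differences, compared against a mean of 0 (equation (3))."""
    d = [a - b for a, b in zip(x, y)]      # pairwise differences
    n = len(d)
    dbar = statistics.fmean(d)             # D-bar
    sd = statistics.stdev(d)               # std of the differences
    return (dbar - 0) / (sd / math.sqrt(n)), n - 1

acc_a = [0.81, 0.79, 0.84, 0.80, 0.77, 0.83]  # invented classifier A accuracies
acc_b = [0.78, 0.75, 0.80, 0.79, 0.74, 0.80]  # classifier B on the same folds
t, df = paired_t(acc_a, acc_b)
print(round(t, 3), df)  # compare |t| to the t-table at df = n-1
```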

22
Two-sample t-tests: independent samples
  • This time, we are interested in comparing two
    populations with different means and variances.
    The two samples are completely independent.
  • We can again apply the t-test, with the same
    conditions applying, using the formula
  • t = (X̄1 − X̄2)/sqrt(s1²/n1 + s2²/n2)
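
The formula above uses each sample's own (unpooled) variance. A minimal sketch with two invented independent samples:

```python
import math
import statistics

def independent_t(x1, x2):
    """Two independent samples: t statistic with unpooled sample
    variances, as in the formula above."""
    m1, m2 = statistics.fmean(x1), statistics.fmean(x2)
    v1, v2 = statistics.variance(x1), statistics.variance(x2)  # N-1 divisor
    return (m1 - m2) / math.sqrt(v1 / len(x1) + v2 / len(x2))

x1 = [10.0, 12.0, 11.0, 13.0, 12.0, 11.0]  # invented sample 1
x2 = [9.0, 10.0, 8.0, 9.0, 10.0, 9.0]      # invented sample 2
t = independent_t(x1, x2)
print(round(t, 3))
```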

23
Confidence Intervals
  • Sample means represent point estimates of the
    mean parameter.Here, we are interested in
    interval estimates, which tell us how large or
    small the true value of µ could be without
    causing us to reject H0, given that we ran a
    t-test on the mean of our sample.
  • To calculate these intervals, we simply take the
    equations presented on the previous slides and
    express them in terms of µ, and as a function of
    t.
  • We then replace t for the two-tailed value we are
    interested in in the t-table. This value can be
    positive or negative, meaning that we will obtain
    two values for µ µupper and µlower. This gives
    us the limits of the confidence interval.
  • The confidence interval means that µ has a
    certain probability (attached to the value of t
    chosen) to belong to this interval. The greater
    the size of the interval, the greater the
    probability that µ is included. Conversely, the
    smaller that interval, the smaller the
    probability that it is included.
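
Solving equation (2) for µ gives µ = X̄ ± t · s/sqrt(n), which yields µlower and µupper. A sketch with the same invented sample as earlier (2.306 is the two-tailed t-table value for df = 8 at the 95% level):

```python
import math
import statistics

def mean_confidence_interval(sample, t_crit):
    """Invert the one-sample t-test: the range of mu values that would
    not cause H0 to be rejected. t_crit is the two-tailed critical
    value from the t-table at df = n-1."""
    n = len(sample)
    xbar = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # s/sqrt(n)
    return xbar - t_crit * se, xbar + t_crit * se  # mu_lower, mu_upper

sample = [5.1, 4.9, 5.3, 5.2, 4.8, 5.4, 5.0, 5.2, 5.3]  # invented data
lo, hi = mean_confidence_interval(sample, t_crit=2.306)
print(round(lo, 3), round(hi, 3))
```

A larger t_crit (e.g. the 99% table value) widens the interval, matching the remark above that a larger interval carries a greater probability of containing µ.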