CSI5388: Functional Elements of Statistics for Machine Learning, Part I
Contents of the Lecture
- Part I (This set of lecture notes)
- Definition and Preliminaries
- Hypothesis Testing Parametric Approaches
- Part II (The next set of lecture notes)
- Hypothesis Testing Non-Parametric Approaches
- Power of a Test
- Statistical Tests for Comparing Multiple
Classifiers
Definitions and Preliminaries I
- A Random Variable is a function that assigns a unique numerical value to each possible outcome of a random experiment under fixed conditions.
- If X takes on N values x1, x2, ..., xN, such that each xi ∈ R, then
- The Mean of X is µ = (1/N) Σ xi
- The Variance is σ² = (1/N) Σ (xi − µ)²
- The Standard Deviation is σ = sqrt(σ²)
Definitions and Preliminaries II
- The Sample Variance is s² = (1/(N − 1)) Σ (xi − X̄)²
- The Sample Standard Deviation is s = sqrt(s²)
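These estimators can be sketched in a few lines of Python using only the standard library (the data values below are invented for illustration):

```python
import math

def mean(xs):
    # Mean: the average of the values.
    return sum(xs) / len(xs)

def population_variance(xs):
    # Variance: average squared deviation from the mean (divide by N).
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def sample_variance(xs):
    # Sample variance: divide by N - 1 (Bessel's correction) so the
    # estimate of the population variance is unbiased.
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(mean(data))                            # 5.0
print(population_variance(data))             # 4.0
print(math.sqrt(population_variance(data)))  # standard deviation: 2.0
print(sample_variance(data))                 # 32/7, slightly larger than 4.0
```

Note that the sample variance (divide by N − 1) is always a little larger than the population formula (divide by N); the difference vanishes as N grows.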
Hypothesis Testing
- Generalities
- Sampling Distributions
- Procedure
- One- versus Two-tailed tests
- Parametric approaches
Generalities
- Purpose: If we assume a given sampling distribution, we want to establish whether a sample result is representative of that sampling distribution. This is interesting because it helps us decide whether the results we obtained in an experiment can generalize to future data.
- Approaches to Hypothesis Testing: There are two different approaches to hypothesis testing, parametric and non-parametric.
Sampling Distributions
- Definition: The sampling distribution of a statistic (for example, the mean, the median, or any other summary of a data set) is the distribution of values obtained for that statistic over all possible samples of the same size from a given population.
- Note: Since the populations under study are usually infinite, or at least very large, the true sampling distribution is usually unknown. Therefore, rather than computing it exactly, it will have to be estimated. Nonetheless, we can do so quite well, especially when considering the mean of the data.
Procedure I
- Idea: If we assume a given sampling distribution, we want to establish whether a sample result is representative of that sampling distribution. This is interesting because it helps us decide whether the results we obtained in an experiment can generalize to future data.
- Example: If the sample mean we obtain on a particular data sample is representative of the sampling distribution, then we can conclude that our data sample is representative of the whole population. If not, the values in our sample are unrepresentative. (Perhaps this sample contained data that were particularly easy or particularly difficult to classify.)
Procedure II
- State your research hypothesis.
- Formulate a null hypothesis stating the opposite of your research hypothesis. In particular, the null hypothesis concerns the relationship between the sampling statistic of the base population and the sample result you obtained from your specific set of data.
- Collect your specific data and compute the sample statistic on it.
- Calculate the probability of obtaining the sample result you obtained if the sample emanated from the population that gave you the original sampling statistic.
- If this probability is low, reject the null hypothesis, and state that the sample you considered does not emanate from the population that gave you the original sampling statistic.
One- and Two-Tailed Tests
- If H0 is expressed as an equality, then there are two ways to reject H0: either the statistic computed from your sample at hand is lower than the sampling statistic, or it is higher. If you are only concerned about one of these directions (lower or higher), then you should perform a one-tailed test. If you are simultaneously concerned about both ways in which H0 can be rejected, then you should perform a two-tailed test.
Parametric Approaches to Hypothesis Testing
- The classical approach to hypothesis testing is parametric. This means that, in order to be applied, this approach makes a number of assumptions regarding the distribution of the population and the available sample.
- Non-parametric approaches, discussed later, do not make these strong assumptions, although they make some assumptions as well, as will be discussed there.
Why are Hypothesis Tests often applied to Means?
- Hypothesis tests are often applied to means. The reason is that, unlike for other statistics, the standard deviation of the mean is known and simple to calculate.
- Without a standard deviation, hypothesis testing could not be performed, since the probability that the sample under consideration emanates from the population represented by the original sampling statistic is linked to this standard deviation. Having access to the standard deviation is therefore essential.
Why is the Standard Deviation of the Mean easy to calculate?
- Because of the important Central Limit Theorem, which states that no matter how your original population is distributed, if you use large enough samples, then the sampling distribution of the mean of these samples approaches a normal distribution. If the mean of the original population is µ and its standard deviation is σ, then the mean of the sampling distribution is µ and its standard deviation is σ/sqrt(N), where N is the sample size.
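The theorem is easy to check empirically. The sketch below (all choices, including the exponential population and the sample size of 50, are illustrative assumptions, not from the slides) draws many samples from a skewed population with µ = 1 and σ = 1 and verifies that the means of those samples have mean close to µ and standard deviation close to σ/sqrt(N):

```python
import math
import random

random.seed(0)  # fixed seed so the simulation is reproducible

# Skewed parent population: exponential with mean 1 and standard deviation 1.
N = 50          # size of each sample
M = 20000       # number of samples drawn

sample_means = [
    sum(random.expovariate(1.0) for _ in range(N)) / N
    for _ in range(M)
]

grand_mean = sum(sample_means) / M
sd_of_means = math.sqrt(
    sum((m - grand_mean) ** 2 for m in sample_means) / M
)

# The CLT predicts: mean of the sampling distribution ≈ µ = 1,
# standard deviation of the sampling distribution ≈ σ/sqrt(N) = 1/sqrt(50).
print(grand_mean)   # close to 1.0
print(sd_of_means)  # close to 1/sqrt(50) ≈ 0.141
```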
When is the Sampling Distribution of the Mean Normal?
- The sample size necessary for the sampling distribution of the mean to approach normality depends on the distribution of the parent population.
- If the parent population is normal, then the sampling distribution of the mean is also normal.
- If the parent population is not normal but symmetrical and uni-modal, then the sampling distribution of the mean will be close to normal, even for small sample sizes.
- If the population is very skewed, then sample sizes of at least 30 will be required for the sampling distribution of the mean to be approximately normal.
How are Hypothesis Tests set up? t-tests
- Hypothesis tests are used to find out whether a sample mean comes from a sampling distribution with a specified mean.
- We will consider:
- One-sample t-tests
- µ, σ known
- µ, σ unknown
- Two-sample t-tests
- Two matched samples
- Two independent samples
One-sample t-tests: σ known
- If σ is known, we can use the Central Limit Theorem to obtain the sampling distribution of the population's mean (its mean is µ and its standard deviation is σ/sqrt(N)).
- Let X̄ be the mean of our data sample; we compute
- z = (X̄ − µ)/(σ/sqrt(N)) (1)
- We find the probability that z is as large as the value obtained, using the z-table. We then output this probability if we are solely interested in a one-tailed test, and double it before outputting it if we are interested in a two-tailed test.
- If this output probability is smaller than .05, we reject H0 at the .05 level of significance. Otherwise, we state that we have no evidence to conclude that H0 does not hold.
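This z-test can be sketched with only the standard library: the z-table lookup is replaced by the standard normal CDF, computed from math.erf. The numbers in the example (µ = 100, σ = 15, a sample of 25 with mean 106) are invented for illustration:

```python
import math

def normal_cdf(z):
    # Standard normal cumulative distribution function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def one_sample_z_test(sample_mean, mu, sigma, n, two_tailed=True):
    # Equation (1): z = (X-bar - mu) / (sigma / sqrt(N)).
    z = (sample_mean - mu) / (sigma / math.sqrt(n))
    # One-tailed probability of a z at least this extreme;
    # doubled for a two-tailed test, as described on the slide.
    p = 1.0 - normal_cdf(abs(z))
    return z, 2.0 * p if two_tailed else p

# Hypothetical example: H0 says the sample comes from a population
# with mu = 100, sigma = 15; our sample of 25 has mean 106.
z, p = one_sample_z_test(106, 100, 15, 25)
print(z, p)  # z = 2.0, two-tailed p ≈ 0.0455, so H0 is rejected at .05
```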
What are the Meaning and Purpose of z?
- All normal distributions can easily be mapped into a single one, the standard normal, using a specific transformation.
- This means that, in our hypothesis tests, we can reuse the same information about the sampling distribution over and over (if we assume that our population is normally distributed), no matter what the mean and variance of our actual population are.
- Any observation can be changed into a standard score, z, with respect to mean 0 and standard deviation 1, as follows: z = (X − µ)/σ
One-sample t-tests: σ unknown
- In most situations, σ, the standard deviation of the population, is unknown. In this case, we replace σ by s, the sample standard deviation, in equation (1), yielding
- t = (X̄ − µ)/(s/sqrt(N)) (2)
- Because s is likely to under-estimate σ and, thus, return a t-value larger than z would have been had σ been known, it is inappropriate to use the distribution of z to accept or reject the null hypothesis.
- Instead, we use Student's t distribution, which corrects for this problem: we compare t to the t-table with N − 1 degrees of freedom. We then proceed as we did for z on the slide about σ known, above.
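A sketch of this test in Python follows. Since the standard library has no t distribution, the computed t statistic is compared against a critical value taken from a t-table (2.262 for 9 degrees of freedom, two-tailed, at the .05 level); the ten "accuracy scores" are invented example data:

```python
import math

def one_sample_t(data, mu):
    n = len(data)
    x_bar = sum(data) / n
    # Sample standard deviation s (divide by N - 1).
    s = math.sqrt(sum((x - x_bar) ** 2 for x in data) / (n - 1))
    # Equation (2): t = (X-bar - mu) / (s / sqrt(N)),
    # with N - 1 degrees of freedom.
    return (x_bar - mu) / (s / math.sqrt(n)), n - 1

# Hypothetical sample of 10 accuracy scores, tested against H0: mu = 0.80.
scores = [0.83, 0.85, 0.79, 0.88, 0.84, 0.86, 0.82, 0.87, 0.81, 0.85]
t, df = one_sample_t(scores, 0.80)

t_crit = 2.262  # two-tailed critical value for df = 9, alpha = .05 (t-table)
print(t, df)
print(abs(t) > t_crit)  # True here: reject H0 at the .05 level
```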
What are the Meaning and Purpose of t?
- t follows the same principle as z, except that t should be used when the population standard deviation is unknown.
- t, however, represents a family of curves rather than a single curve: the shape of the t distribution changes with the sample size.
- As the sample size grows larger and larger, t looks more and more like a normal distribution.
Assumption of the t-test with σ unknown
- Please note that one assumption is made in the use of the t-test: we assume that the sample was drawn from a normally distributed population.
- This is required because Student's derivation of t was based on the assumption that the mean and variance of the population are independent, an assumption that is true in the case of a normal distribution.
- In practice, however, the assumption about the distribution from which the sample was drawn can be lifted whenever the sample size is sufficiently large to produce a normal sampling distribution of the mean. In general, n = 25 or 30 (the number of cases in a sample) is sufficiently large; often, it can be smaller than that.
Two-sample t-tests: matched samples
- Given two matched samples, we want to test whether the difference in means between them is significant. We do so by looking at the mean of the pairwise differences, D̄, and their standard deviation, s_D, and comparing D̄ to a mean of 0.
- We can then apply the t-test as we did above, in the case where σ was unknown. This time, we have
- t = (D̄ − 0)/(s_D/sqrt(n)) (3)
- We use the t-table as before, with n − 1 degrees of freedom, and the same assumptions about the normality of the distribution.
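Equation (3) amounts to a one-sample t-test on the pairwise differences. A sketch (the two lists of per-fold classifier accuracies are invented example data):

```python
import math

def paired_t(xs, ys):
    # Differences between matched observations.
    d = [x - y for x, y in zip(xs, ys)]
    n = len(d)
    d_bar = sum(d) / n
    # Standard deviation of the differences (divide by n - 1).
    s_d = math.sqrt(sum((v - d_bar) ** 2 for v in d) / (n - 1))
    # Equation (3): t = (D-bar - 0) / (s_D / sqrt(n)),
    # with n - 1 degrees of freedom.
    return d_bar / (s_d / math.sqrt(n)), n - 1

# Hypothetical accuracies of two classifiers on the same 6 folds
# (matched samples: fold i of clf_a is paired with fold i of clf_b).
clf_a = [0.91, 0.89, 0.93, 0.90, 0.88, 0.92]
clf_b = [0.88, 0.87, 0.91, 0.89, 0.86, 0.90]
t, df = paired_t(clf_a, clf_b)
print(t, df)  # compare |t| with the t-table value for df = 5
```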
Two-sample t-tests: independent samples
- This time, we are interested in comparing two populations with different means and variances. The two samples are completely independent.
- We can again apply the t-test, with the same conditions applying, using the formula
- t = (X̄1 − X̄2)/sqrt(s1²/n1 + s2²/n2)
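A sketch of this statistic follows. The slide gives the formula but not the degrees of freedom; with unequal variances these are usually obtained from the Welch–Satterthwaite approximation, which is an assumption here rather than something stated on the slide, so the sketch returns only the t value. The two groups of measurements are invented:

```python
import math

def sample_var(xs):
    # Sample variance (divide by n - 1).
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def independent_t(xs, ys):
    n1, n2 = len(xs), len(ys)
    x1, x2 = sum(xs) / n1, sum(ys) / n2
    v1, v2 = sample_var(xs), sample_var(ys)
    # t = (X1-bar - X2-bar) / sqrt(s1^2/n1 + s2^2/n2)
    return (x1 - x2) / math.sqrt(v1 / n1 + v2 / n2)

# Hypothetical measurements from two independent groups.
group_a = [5.1, 4.9, 5.4, 5.0, 5.2]
group_b = [4.6, 4.8, 4.5, 4.7, 4.4]
t = independent_t(group_a, group_b)
print(t)
```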
Confidence Intervals
- Sample means represent point estimates of the mean parameter. Here, we are interested in interval estimates, which tell us how large or small the true value of µ could be without causing us to reject H0, given that we ran a t-test on the mean of our sample.
- To calculate these intervals, we simply take the equations presented on the previous slides and express them in terms of µ, as a function of t.
- We then substitute for t the two-tailed critical value we are interested in from the t-table. This value can be positive or negative, meaning that we obtain two values for µ: µ_upper and µ_lower. These give us the limits of the confidence interval.
- The confidence interval means that µ has a certain probability (attached to the value of t chosen) of belonging to this interval. The greater the size of the interval, the greater the probability that µ is included; conversely, the smaller the interval, the smaller the probability that it is included.
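Solving equation (2), t = (X̄ − µ)/(s/sqrt(N)), for µ at t = ±t_crit gives µ_lower = X̄ − t_crit·s/sqrt(N) and µ_upper = X̄ + t_crit·s/sqrt(N). A sketch, reusing a hypothetical sample of 10 scores and the t-table critical value 2.262 (df = 9, two-tailed, .05 level):

```python
import math

def confidence_interval(data, t_crit):
    n = len(data)
    x_bar = sum(data) / n
    # Sample standard deviation s (divide by n - 1).
    s = math.sqrt(sum((x - x_bar) ** 2 for x in data) / (n - 1))
    # Solving t = (X-bar - mu) / (s / sqrt(N)) for mu at +/- t_crit:
    # mu_lower = X-bar - t_crit*s/sqrt(N), mu_upper = X-bar + t_crit*s/sqrt(N).
    half_width = t_crit * s / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

# Hypothetical sample of 10 accuracy scores.
scores = [0.83, 0.85, 0.79, 0.88, 0.84, 0.86, 0.82, 0.87, 0.81, 0.85]
lo, hi = confidence_interval(scores, 2.262)  # t-table value for df = 9, .05 level
print(lo, hi)  # the 95% confidence interval [mu_lower, mu_upper]
```

Any hypothesized µ inside [lo, hi] would not be rejected by the corresponding two-tailed t-test at the .05 level; any µ outside it would be.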