Title: CSC 323 Quarter: Winter
CSC 323, Winter Quarter 02/03
- Daniela Stan Raicu
- School of CTI, DePaul University
Outline
Chapter 5 Sampling Distributions
- Population and sample
- Sampling distribution of a sample mean
- Central limit theorem
- Examples
Introduction
- This chapter begins a bridge from the study of probabilities to the study of statistical inference by introducing the sampling distribution.
- Quality of sample data
- The quality of all statistical analysis depends on the quality of the sample data.
- If the data sample is not representative, analyzing the data and drawing conclusions will be unproductive at best.
Random sampling: every unit in the population has an equal chance of being chosen.
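The idea of random sampling can be sketched in a few lines of Python. This is only an illustration: the population of 1000 labelled units, the helper name `simple_random_sample`, and the seed are assumptions made for the example, not part of the slides.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct units; every unit is equally likely to be chosen."""
    rng = random.Random(seed)          # seeded only to make the sketch repeatable
    return rng.sample(population, n)   # sampling without replacement

# Hypothetical population: 1000 units labelled 0..999.
population = list(range(1000))
sample = simple_random_sample(population, 10, seed=42)
print(sample)
```

Sampling without replacement is used here because a unit is normally measured only once; `random.choices` would be the with-replacement counterpart.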
Some definitions
- Parameter: a number describing a population.
- Statistic: a number describing a sample.
1. A random sample should represent the
population well, so sample statistics from a
random sample should provide reasonable estimates
of population parameters.
Sample statistic         Population parameter
Sample mean x̄            μ
Sample proportion p̂      p
Sample variance s²       σ²
Some definitions (cont.)
2. All sample statistics have some error in
estimating population parameters.
3. If repeated samples are taken from a
population and the same statistic (e.g. mean) is
calculated from each sample, the statistics will
vary, that is, they will have a distribution.
4. A larger sample provides more information than
a smaller sample so a statistic from a large
sample should have less error than a statistic
from a small sample.
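Points 3 and 4 can be illustrated with a small simulation. This is a sketch under stated assumptions: the population of 10,000 uniform values, the sample sizes 10 and 100, and the helper names are all invented for the example.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical population: 10,000 values drawn uniformly from [0, 100).
population = [rng.uniform(0, 100) for _ in range(10_000)]
mu = statistics.fmean(population)  # the population parameter

def sample_means(n, repeats=2000):
    """Compute the same statistic (the mean) from many repeated samples of size n."""
    return [statistics.fmean(rng.sample(population, n)) for _ in range(repeats)]

def mean_abs_error(n, repeats=2000):
    """Average distance between the sample mean and the population mean."""
    return statistics.fmean([abs(m - mu) for m in sample_means(n, repeats)])

error_small = mean_abs_error(10)
error_large = mean_abs_error(100)
print(f"mean |error| with n=10:  {error_small:.2f}")
print(f"mean |error| with n=100: {error_large:.2f}")
```

The sample means differ from run to run, so they have a distribution of their own, and the typical error is smaller for the larger sample size.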
Describing the Sample Mean
- Let us assume that we want to estimate the mean μ of the population, since this is usually the first piece of information that an analyst wants to analyze.
- Since the value of the sample mean depends on the particular sample we draw, the sample mean is a variable with a huge number of possible values.
- The sample mean is a random variable because the samples are drawn randomly.
- The best way to summarize this vast amount of information is to describe it with a probability distribution.
The Distribution of the Sample Mean
Problem
Population: A, B, C, D, E, F
Population mean: μ = 0.1483
Population variance: σ² = 0.00061
The Distribution of the Sample Mean
Assumptions
- What is the central value of the variable x̄?
- What is its variability?
- Is there a familiar pattern in the variability?
What is the central value of the sample mean?
- For large samples, the distribution of x̄ should be symmetrical: x̄ should be larger than μ about 50% of the time and smaller than μ about 50% of the time.
It can be shown theoretically (Central Limit Theorem) that the mean of the sample means equals the population mean: E(x̄) = μ.
In our example, E(x̄) = 0.1483 = μ.
x̄ is an unbiased estimator of μ.
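Unbiasedness can be checked exactly on a small population by listing every possible sample. The sketch below assumes hypothetical values for a six-unit population (the values behind the slide's example are not given, so these are not the slide's data) and a sample size of 2.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical values for a six-unit population; NOT the data from the slide.
population = [Fraction(v) for v in (2, 4, 6, 8, 10, 12)]
mu = sum(population) / len(population)  # population mean

def all_sample_means(values, n):
    """Sample mean of every possible sample of size n (drawn without replacement)."""
    return [sum(sample) / n for sample in combinations(values, n)]

means = all_sample_means(population, 2)
mean_of_means = sum(means) / len(means)

print("number of possible samples:", len(means))
print("mean of the sample means:  ", mean_of_means)
print("population mean:           ", mu)
```

Exact fractions are used so that the comparison between the mean of the sample means and the population mean is not blurred by floating-point rounding.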
What is the variance of the sample mean?
- An estimator's variance reveals a great deal about the quality of the estimator.
The variance of the sample mean is σ_x̄² = σ²/n, where σ² = variance of the population and n = sample size.
Increasing the sample size n decreases the variance σ_x̄², which gives better accuracy of the estimator.
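The formula for the variance of the sample mean follows in two steps. The derivation below is a sketch that assumes the n observations X_1, ..., X_n are drawn independently, each with variance σ² (for example, sampling with replacement or from a very large population); that assumption is not stated on the slide.

```latex
\begin{align*}
\sigma_{\bar{x}}^{2}
  = \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
  &= \frac{1}{n^{2}}\sum_{i=1}^{n}\operatorname{Var}(X_i)
     && \text{(independence)} \\
  &= \frac{1}{n^{2}}\cdot n\,\sigma^{2}
   = \frac{\sigma^{2}}{n}
\end{align*}
```

Because n appears in the denominator, the variance of the sample mean shrinks as the sample grows.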
Accuracy of the Estimator
As in many problems, there is a trade-off between accuracy and dollars.
What will we get for our money if we invest dollars in obtaining a larger sample size? n = 100? n = 200?
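One way to put numbers on the trade-off is to compare the standard deviation of the sample mean at the two sample sizes. The sketch below reuses the population variance 0.00061 quoted in the earlier example and assumes independent draws; the function name `standard_error` is chosen for the illustration.

```python
import math

def standard_error(population_variance, n):
    """Standard deviation of the sample mean: sqrt(sigma^2 / n)."""
    return math.sqrt(population_variance / n)

sigma2 = 0.00061  # population variance from the earlier example
se_100 = standard_error(sigma2, 100)
se_200 = standard_error(sigma2, 200)

print(f"n = 100: standard error = {se_100:.5f}")
print(f"n = 200: standard error = {se_200:.5f}")
print(f"ratio = {se_200 / se_100:.3f}")
```

Doubling the sample size multiplies the standard error by 1/√2, which is roughly a 29% reduction rather than a halving, so each extra dollar spent on sample size buys less additional accuracy.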
Is there a familiar pattern in the data?
- As the sample size becomes larger, the distribution of the sample mean becomes closer to a normal distribution, regardless of the distribution of the population from which the sample is drawn.
- The central limit theorem summarizes the distribution of the sample mean.
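The drift toward a normal shape can be seen in a simulation. The sketch below assumes a strongly right-skewed population (exponential with rate 1, chosen only for illustration) and uses skewness as a rough measure of asymmetry; a normal distribution has skewness 0.

```python
import random
import statistics

rng = random.Random(1)

def skewness(values):
    """Average cubed z-score; 0 for a perfectly symmetric distribution."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - m) / sd) ** 3 for v in values])

def sample_mean_skewness(n, repeats=5000):
    """Skewness of the distribution of sample means for samples of size n."""
    means = [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(repeats)
    ]
    return skewness(means)

for n in (1, 5, 30, 100):
    print(f"n = {n:3d}: skewness of sample means = {sample_mean_skewness(n):.2f}")
```

With n = 1 the sample mean is just a single draw and inherits the population's skew; as n grows the skewness of the sample means moves toward 0.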
The Central Limit Theorem
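A standard formulation of the theorem, written in the notation of the preceding slides, is sketched below. It assumes a random sample of n independent observations from a population with mean μ and finite variance σ²; the symbols Z and N(·,·) are introduced here and do not appear in the original.

```latex
\[
\bar{x} \;\overset{\text{approx.}}{\sim}\; N\!\left(\mu,\ \frac{\sigma^{2}}{n}\right)
\quad \text{for large } n,
\qquad \text{equivalently} \qquad
Z = \frac{\bar{x}-\mu}{\sigma/\sqrt{n}} \;\overset{\text{approx.}}{\sim}\; N(0,1).
\]
```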
Importance of the central limit theorem
- The most important feature is that it can be applied to any population as long as the sample size n is large enough.
How large is large? n > 30
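The "any population" claim can be probed with a rough check: if the sample mean is approximately normal, about 95% of sample means should fall within 1.96 standard errors of μ. The sketch below tries this for three hypothetical populations (uniform, exponential, and Bernoulli, none of which come from the slides) at n = 40, a sample size above the n > 30 rule of thumb.

```python
import math
import random
import statistics

rng = random.Random(2)

# Hypothetical populations: (name, sampler, true mean, true standard deviation).
populations = [
    ("uniform(0,1)", lambda: rng.random(), 0.5, math.sqrt(1 / 12)),
    ("exponential(1)", lambda: rng.expovariate(1.0), 1.0, 1.0),
    ("bernoulli(0.3)", lambda: 1.0 if rng.random() < 0.3 else 0.0,
     0.3, math.sqrt(0.3 * 0.7)),
]

def coverage(draw, mu, sigma, n, repeats=4000):
    """Fraction of sample means lying within 1.96 standard errors of mu."""
    half_width = 1.96 * sigma / math.sqrt(n)
    hits = 0
    for _ in range(repeats):
        xbar = statistics.fmean([draw() for _ in range(n)])
        if abs(xbar - mu) <= half_width:
            hits += 1
    return hits / repeats

results = {
    name: coverage(draw, mu, sigma, n=40)
    for name, draw, mu, sigma in populations
}
for name, fraction in results.items():
    print(f"{name:15s} fraction within 1.96 SE: {fraction:.3f}")
```

Fractions close to 0.95 for all three populations suggest the normal approximation is already usable at this sample size, although very skewed or heavy-tailed populations may need larger n.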
Importance of the central limit theorem
Examples
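Is x̄ normally distributed?
Flowchart: Is the population normal? (Yes / No). Each branch leads to a second question whose text is not recoverable. Outcomes shown:
- x̄ is considered to be normal
- x̄ has a Student's t distribution
- x̄ may or may not be considered normal (we need more info)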