Title: Sampling Distribution Models for MEANS
Chapter 18
- Sampling Distribution Models for MEANS
The sampling distribution
- We've already studied the sampling distribution for proportions, and learned that it was Normal, with
- a mean (mu) at p, the true population proportion, and
- a standard deviation of $\sqrt{pq/n}$, where $q = 1 - p$.
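A quick simulation can make the proportion result concrete. This minimal Python sketch (standard library only) draws many samples of size n from a population with true proportion p and compares the spread of the resulting phat values with $\sqrt{pq/n}$. The values p = 0.3 and n = 100, the seed, and the helper name `simulate_phat` are illustrative assumptions, not taken from the slides.

```python
import math
import random
import statistics

def simulate_phat(p, n, reps=10_000, seed=1):
    """Return `reps` sample proportions, each based on n independent
    yes/no draws that are "yes" with probability p."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

p, n = 0.3, 100  # hypothetical values chosen for illustration
phats = simulate_phat(p, n)
print("mean of phat:", round(statistics.mean(phats), 4))
print("SD of phat:  ", round(statistics.stdev(phats), 4))
print("sqrt(pq/n):  ", round(math.sqrt(p * (1 - p) / n), 4))
```

The simulated mean should land near p and the simulated SD near $\sqrt{pq/n} \approx 0.0458$.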
Sampling distributions
- Recall that a sampling distribution is the distribution of values taken on by a statistic (phat or xbar) in ALL possible samples of the same size (n) from the same population.
- The sampling distribution for MEANS works very similarly to what we saw for proportions.
- Let's look at the Pennies file and see what happens when we approximate a sampling distribution with different sample sizes.
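The Pennies file itself is not reproduced in these slides, so here is a minimal sketch of the same experiment on a stand-in population: take 1000 random samples of size n, record each sample's mean, and look at the center and spread of those means. The population below is a HYPOTHETICAL left-skewed list of "years" generated only for illustration, and `approx_sampling_distribution` is a made-up helper name.

```python
import random
import statistics

def approx_sampling_distribution(population, n, reps=1000, seed=0):
    """Return the means of `reps` random samples of size n, each drawn
    without replacement from `population`. A histogram of these means
    approximates the sampling distribution of xbar."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.sample(population, n)) for _ in range(reps)]

# HYPOTHETICAL stand-in for the Pennies file (not reproduced here):
# mostly recent "years" with a tail of older ones, i.e. skewed left.
gen = random.Random(42)
population = [2005 - int(gen.expovariate(1 / 8)) for _ in range(2000)]
print("population: mu =", round(statistics.fmean(population), 2),
      "sigma =", round(statistics.pstdev(population), 3))

for n in (7, 21, 71):
    means = approx_sampling_distribution(population, n)
    print(f"n={n:>2}: mean of xbar = {statistics.fmean(means):.2f}, "
          f"SD of xbar = {statistics.stdev(means):.3f}")
```

As on the Pennies slides, the mean of the xbar values should stay near the population mu while their SD shrinks as n grows.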
Pennies
- The POPULATION distribution: mu = 1993.37, sigma = 9.66972.
- Histogram of the means of 1000 samples of size 7: mean of the distribution = 1993.52, SD = 3.5002. Note that it still shows a fair amount of skewness to the left.
More Pennies
- Histogram of the means of 1000 samples of size 21: mean of the distribution = 1993.42, SD = 2.09872. Still slightly skewed left.
- Histogram of the means of 1000 samples of size 71: mean of the distribution = 1993.42, SD = 1.16528.
The Central Limit Theorem
What did you notice about the histograms of the sample means as the sample size (n) got larger?
1) They looked closer and closer to a Normal distribution!!
2) The mean of the distribution was always near the true population mean (mu).
3) The standard deviation of the distribution got smaller as n increased.
These observations are what the Central Limit Theorem tells us: no matter what the distribution of the underlying population is (Normal, skewed, multimodal, etc.), the sampling distribution of xbar for random, independent samples gets closer and closer to Normal as n increases.
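To see the CLT work on a population that is clearly not Normal, this sketch samples from an Exponential(1) population (an illustrative choice, not from the slides), which has mean 1, SD 1, and a strong right skew. For each n it reports the skewness of 5000 sample means and their SD next to $1/\sqrt{n}$; the skewness should shrink toward 0 as n grows. The helper names `skewness` and `exp_sample_means` are hypothetical.

```python
import math
import random
import statistics

def skewness(xs):
    """Sample skewness: the average cubed z-score (about 0 when symmetric)."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return math.fsum(((x - m) / s) ** 3 for x in xs) / len(xs)

def exp_sample_means(n, reps=5000, seed=0):
    """Means of `reps` samples of size n from an Exponential(1) population
    (mean 1, SD 1, strongly right-skewed)."""
    rng = random.Random(seed)
    return [math.fsum(rng.expovariate(1.0) for _ in range(n)) / n
            for _ in range(reps)]

for n in (2, 10, 30):
    means = exp_sample_means(n)
    print(f"n={n:>2}: skewness = {skewness(means):.2f}, "
          f"SD of xbar = {statistics.stdev(means):.3f}, "
          f"1/sqrt(n) = {1 / math.sqrt(n):.3f}")
```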
More on the Central Limit Theorem
The CLT (Central Limit Theorem) tells us that the Normal model the sampling distribution approaches has
1) Mean $\mu_{\bar{x}}$ EQUAL TO the population mean, mu.
2) Standard deviation $\sigma_{\bar{x}} = \sigma / \sqrt{n}$.
Did we see this with the Pennies?
1) The mean of the histogram for n = 71 was 1993.42, extremely close to the population mu of 1993.37.
2) The SD of the histogram for n = 71 was 1.16528. By the CLT, it should have been $9.66972 / \sqrt{71} \approx 1.148$, so again we were very close. (Remember, the histograms for the pennies are not exactly sampling distributions, just histograms of a LOT of samples.)
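The same comparison can be run for all three Pennies sample sizes at once. This sketch uses only the numbers printed on the Pennies slides (sigma = 9.66972 and the observed histogram SDs) and computes $\sigma/\sqrt{n}$ for each n; `clt_sd` is just an illustrative name.

```python
import math

SIGMA = 9.66972  # population SD from the Pennies slide
# Observed SDs of the histograms of 1000 sample means (from the slides)
observed = {7: 3.5002, 21: 2.09872, 71: 1.16528}

def clt_sd(sigma, n):
    """CLT standard deviation of xbar: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

for n, sd in observed.items():
    print(f"n={n:>2}  observed={sd:.4f}  sigma/sqrt(n)={clt_sd(SIGMA, n):.4f}")
```

The predicted values come out to about 3.655, 2.110, and 1.148, each within roughly 0.16 of the observed SD.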
So what does this mean?
- The center of the sampling distribution for xbar will always be the same as the population mean.
- The SD of xbar gets smaller as sample size increases: it equals $\sigma/\sqrt{n}$, so it shrinks in proportion to $1/\sqrt{n}$ (quadrupling n halves the SD).
- The shape of the sampling distribution gets more and more Normal as n increases.
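The $1/\sqrt{n}$ rate means the sample size has to grow by a factor of 4 to cut the SD of xbar in half. A tiny sketch, assuming a hypothetical population SD of 10:

```python
import math

def sd_of_mean(sigma, n):
    """SD of xbar for samples of size n: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

sigma = 10.0  # hypothetical population SD
for n in (1, 4, 16, 64):
    print(n, sd_of_mean(sigma, n))
# prints:
# 1 10.0
# 4 5.0
# 16 2.5
# 64 1.25
```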
Assumptions
- Of course, as always, there are assumptions that must be made to use the Normal model for sample means.
- Data must be a random sample.
- Sampled values must be independent.
- Sample size < 10% of population.
- How large a sample is needed?
- This depends on the population.
- In general, if the population is Normal, we don't need to worry about the size of the sample.
- If the population is NOT Normal, though, we do.
- Rule of thumb: if n > 40, we can assume a Normal model.
- If 15 < n < 40, use Normal if the histogram of the sample shows little skewness and no extreme outliers.
- If n < 15, use Normal if the histogram of the sample is symmetric with no outliers.
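The rule of thumb can be written out as a small checklist function. This is a HYPOTHETICAL helper, not something from the course materials: the skewness, symmetry, and outlier flags are judgments you would make by looking at a histogram of your sample, and randomness and independence still have to be checked separately. The slide leaves the boundary cases n = 15 and n = 40 unspecified; here they are assumed to fall in the next-larger category.

```python
def normal_model_ok(n, population_size, population_normal=False,
                    little_skew=False, symmetric=False,
                    extreme_outliers=False, any_outliers=False):
    """Rough encoding of the slide's conditions for using a Normal model
    for xbar. Does NOT check randomness or independence of the sample."""
    # 10% condition: the sample must be under 10% of the population.
    if n >= 0.10 * population_size:
        return False
    # A Normal population needs no sample-size condition.
    if population_normal:
        return True
    # Boundary assumption: n = 40 is treated as "large".
    if n >= 40:
        return True
    # Boundary assumption: n = 15 goes in the middle band.
    if n >= 15:
        return little_skew and not extreme_outliers
    # Small samples: need a symmetric histogram and no outliers at all.
    return symmetric and not any_outliers

print(normal_model_ok(50, 10_000))                  # large n
print(normal_model_ok(20, 10_000, little_skew=True))
print(normal_model_ok(10, 10_000, symmetric=False))
```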
Example Problem 22
Ithaca, NY gets an average of 35.4 inches of rain each year, with a standard deviation of 4.2 inches. Assume that a Normal model applies (to the population of years). During what percentage of years does Ithaca get more than 40 inches of rain? i.e., P(x > 40)?
Known: mu = 35.4, sigma = 4.2.
Calculate the z-score: $z = (40 - 35.4)/4.2 \approx 1.10$.
Look the z-score up on the Z-table: P(x < 40) = 0.8643 (I used z = 1.10).
We want P(x > 40), though, so use 1 - 0.8643 = 0.1357, i.e., about 13.6% of years.
Note: this part is just a Chapter 6 question!!
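To check the table lookup in software, Python's standard-library `statistics.NormalDist` (Python 3.8+) provides the Normal CDF. This sketch shows that rounding z to 1.10, as on the slide, reproduces 0.1357, while the unrounded z gives a slightly different value of about 0.1367.

```python
from statistics import NormalDist

mu, sigma = 35.4, 4.2            # inches of rain per year (from the problem)
z = (40 - mu) / sigma            # about 1.095
std_normal = NormalDist()        # mean 0, SD 1

p_table = 1 - std_normal.cdf(round(z, 2))  # z rounded to 1.10, as on the slide
p_exact = 1 - std_normal.cdf(z)            # no rounding of z

print(round(z, 4), round(p_table, 4), round(p_exact, 4))
# The unstandardized model gives the same answer directly:
print(round(1 - NormalDist(mu, sigma).cdf(40), 4))
```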
Example Problem 22 (continued)
Let xbar be the mean amount of rain for a random sample of 4 years. Describe the sampling distribution model of xbar.
It will be $N(35.4, 4.2/\sqrt{4})$, which is N(35.4, 2.1).
What is the probability of a sample of 4 years having less than 30 inches of rain? i.e., find P(xbar < 30).
Calculate the z-score: $z = (30 - 35.4)/2.1 \approx -2.57$.
Look z up on the table: P(xbar < 30) = 0.0051.
What is the probability of a sample of 4 years having more than 40 inches? i.e., P(xbar > 40)?
Z-score: $z = (40 - 35.4)/2.1 \approx 2.19$.
P(xbar > 40) = 1 - P(xbar < 40) = 1 - 0.9857 = 0.0143.
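The same `statistics.NormalDist` class can describe the sampling distribution model of xbar directly: build a Normal with mean mu and SD $\sigma/\sqrt{n}$ and evaluate its CDF. Because the CDF is evaluated without rounding z, the results may differ from the table answers in the fourth decimal place.

```python
import math
from statistics import NormalDist

mu, sigma, n = 35.4, 4.2, 4                        # values from the problem
xbar_model = NormalDist(mu, sigma / math.sqrt(n))  # N(35.4, 2.1)

p_below_30 = xbar_model.cdf(30)       # P(xbar < 30)
p_above_40 = 1 - xbar_model.cdf(40)   # P(xbar > 40) = 1 - P(xbar < 40)

print("SD of xbar:", xbar_model.stdev)
print("P(xbar < 30):", round(p_below_30, 4))
print("P(xbar > 40):", round(p_above_40, 4))
```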