Title: Sampling Distribution Models
Chapter 18
- Sampling Distribution Models
Demonstration
- Observe the in-class SPSS demonstration of sampling distribution models.
Demonstration Summary
- First, we examined the distribution of state appropriations for education across the entire population of U.S. states.
- Our findings indicated that the distribution of state spending on education was skewed to the right, with a mean ($\mu$) of 1,272,969,120.00 and a standard deviation ($\sigma$) of 1,567,930,688.096.
Demonstration Summary
- Next, we randomly selected 30 states to be included in our sample. Analysis of this sample indicated that, again, the distribution of spending on education was skewed to the right; however, the sample mean ($\bar{x}$) was 1,410,710,766.67, with a standard deviation ($s$) of 1,941,673,134.577.
Demonstration Summary
- We then repeated the random sampling process to get a new sample of 30 states. This new sample also had a distribution that was skewed to the right; however, its mean and standard deviation differed: 1,126,093,266.67 and 1,781,298,838.439, respectively.
- Did we do something wrong?
Demonstration Summary
- We then examined 100 different random samples of size 30 and found that each sample had a slightly different mean and standard deviation due to sampling variability (i.e., different combinations of states were included in each of our samples).
- When we created a histogram of our collection of sample means, we discovered something pretty amazing: that distribution looked very much like a normal model, even though the distribution of state appropriations in our original population was skewed to the right.
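The repeated-sampling demonstration can be mimicked outside SPSS. The state appropriations data are not reproduced here, so this sketch assumes a hypothetical right-skewed population of 50 values drawn from an exponential distribution; it draws 100 random samples of size 30 (without replacement, like picking 30 distinct states) and prints a crude text histogram of the sample means. The names `population` and `sample_means` are illustrative only.

```python
import random
import statistics

# ASSUMPTION: a hypothetical right-skewed population of 50 values stands in
# for the state appropriations data, which are not reproduced here.
random.seed(1)
population = [random.expovariate(1 / 1_000) for _ in range(50)]

def sample_means(pop, n, reps):
    """Draw `reps` simple random samples of size n (without replacement) and return their means."""
    return [statistics.mean(random.sample(pop, n)) for _ in range(reps)]

means = sample_means(population, n=30, reps=100)

print("population mean:", round(statistics.mean(population), 1))
print("mean of the 100 sample means:", round(statistics.mean(means), 1))

# Crude text histogram of the sample means using 10 equal-width bins
lo, hi = min(means), max(means)
width = (hi - lo) / 10 or 1
counts = [0] * 10
for m in means:
    counts[min(int((m - lo) / width), 9)] += 1
for i, c in enumerate(counts):
    print(f"{lo + i * width:8.1f} | {'*' * c}")
```

Each run with a different seed gives a different set of sample means, which is exactly the sampling variability described above.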
Sampling Distribution
- A listing of all the values that a sample mean can take on, and how often those values occur, is called the sampling distribution of the sample mean.
- The histogram of sample means from the demonstration depicts the sampling distribution of the sample mean.
- Like any other distribution, the sampling distribution of the sample mean has a shape, a center, and a measure of variability (i.e., spread).
- This distribution can be interpreted as the probability distribution of sample means.
- Under certain conditions, this sampling distribution will approximate the normal model regardless of the shape of the original variable's distribution in the population.
Simulating the Sampling Distribution of a Mean
- We can use simulation to get a sense of what the sampling distribution of the sample mean might look like.
- Let's start with a simulation of 10,000 tosses of a single die. A histogram of the results is roughly flat (uniform), since each face from 1 to 6 is equally likely.
Means: Averaging More Dice
- The average of two dice after a simulation of 10,000 tosses has a triangular histogram that peaks at 3.5.
- The average of 5 dice after a simulation of 10,000 tosses looks even more bell-shaped and is more tightly concentrated around 3.5.
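The dice simulations above are easy to reproduce. This is a minimal sketch using only Python's standard library: `dice_averages` and `text_histogram` are hypothetical helper names, and the text histogram stands in for the graphical histograms shown in class.

```python
import random
import statistics

random.seed(2018)

def dice_averages(k, tosses=10_000):
    """Average of k fair dice, repeated `tosses` times."""
    return [sum(random.randint(1, 6) for _ in range(k)) / k for _ in range(tosses)]

def text_histogram(values, bins=12):
    """Print a sideways histogram; the longest bar is 50 characters."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    scale = max(counts) / 50
    for i, c in enumerate(counts):
        print(f"{lo + i * width:5.2f} | {'#' * round(c / scale)}")
    return counts

for k in (1, 2, 5):
    avgs = dice_averages(k)
    print(f"\nAverage of {k} dice: mean={statistics.mean(avgs):.3f}, SD={statistics.stdev(avgs):.3f}")
    text_histogram(avgs)
```

The printed histograms move from flat (one die) to triangular (two dice) to bell-shaped (five dice), while the mean stays near 3.5 and the SD shrinks.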
Means: What the Simulations Show
- As the sample size (number of dice) gets larger, each sample average is more likely to be close to the population mean.
- So, we see the shape continuing to tighten around 3.5, the mean of a single die toss.
- And it probably does not shock you that the sampling distribution of this mean gets closer and closer to Normal.
The Central Limit Theorem (CLT)
- The mean of a random sample has a sampling distribution whose shape can be approximated by a Normal model. The larger the sample, the better the approximation will be.
- The CLT is surprising and a bit weird.
- Not only does the histogram of the sample means get closer and closer to the Normal model as the sample size grows, but this is true regardless of the shape of the population distribution.
- All we need is for the observations to be independent and collected with randomization.
Conditions Required for the CLT
- Random Sampling Condition: The data values must be sampled randomly, or the concept of a sampling distribution makes no sense.
- Independence Assumption: This is impossible to know for sure, so instead we check the 10% Condition: the sample size, n, is no more than 10% of the population.
But Which Normal?
- Recall that Normal models are described by their means and standard deviations.
- The mean of all sample means is the population mean $\mu$. That is to say, the sampling distribution of the mean has mean $\mu$.
- The standard deviation of all sample means is $\sigma/\sqrt{n}$. That is to say, the sampling distribution of the mean has standard deviation $\sigma/\sqrt{n}$.
The Sampling Distribution Model for a Mean
- When a random sample of size $n$ is drawn from any population with mean $\mu$ and standard deviation $\sigma$, its sample mean has a sampling distribution with the same mean $\mu$ but whose standard deviation is $\sigma/\sqrt{n}$ (we write $SD(\bar{x}) = \sigma/\sqrt{n}$).
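As a numerical check of $SD(\bar{x}) = \sigma/\sqrt{n}$, this sketch reuses the fair-die example: the population is the six faces, $\mu = 3.5$ and $\sigma = \sqrt{35/12} \approx 1.708$. It compares the formula with the SD of 10,000 simulated means of $n = 5$ dice. The helper name `sd_of_sample_mean` is hypothetical.

```python
import math
import random
import statistics

# Population: one fair die, faces 1..6 equally likely
faces = [1, 2, 3, 4, 5, 6]
mu = statistics.mean(faces)         # 3.5
sigma = statistics.pstdev(faces)    # population SD, about 1.708

def sd_of_sample_mean(sigma, n):
    """SD of the sampling distribution of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

random.seed(7)
n = 5
sim_means = [sum(random.choices(faces, k=n)) / n for _ in range(10_000)]

print(f"theory:    mean={mu:.3f}, SD={sd_of_sample_mean(sigma, n):.3f}")
print(f"simulated: mean={statistics.mean(sim_means):.3f}, SD={statistics.stdev(sim_means):.3f}")
```

`pstdev` is used for $\sigma$ because the six faces are the whole population, not a sample.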
The Sampling Distribution Model for a Mean (continued)
- No matter what population (whether it has a
distribution that is symmetric, uniform, or
skewed to the right or left) the random sample
comes from, the shape of the sampling
distribution is approximately Normal as long as
the sample size is large enough. The larger the
sample used, the more closely the Normal
approximates the sampling distribution for the
mean.
Sampling Distributions for Proportions
- The Central Limit Theorem does not apply only to sample means.
- We can draw the same conclusions about the shape, center, and variability of sample proportions.
- The sample proportion is denoted by $\hat{p}$ and is equal to the number of individuals in the sample in the category of interest, divided by the total sample size ($n$).
What About the Sampling Distribution Model for a Proportion?
- Provided that the sampled values are independent and the sample size is large enough, the sampling distribution of $\hat{p}$ (the sample proportion) is modeled by a Normal model with
  - Mean $\mu(\hat{p}) = p$
  - Standard deviation $SD(\hat{p}) = \sqrt{pq/n}$
- where $p$ is the probability of success (i.e., an observation falls into the specific group of the categorical variable that you are interested in) and $q = 1 - p$ is the probability of failure.
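A simulation can confirm that $\hat{p}$ centers at $p$ with spread $\sqrt{pq/n}$. This sketch assumes independent trials and borrows $p = 0.44$, $n = 200$ from the binge-drinking application purely as illustrative inputs; `simulate_p_hats` is a hypothetical helper name.

```python
import math
import random
import statistics

def simulate_p_hats(p, n, reps=5_000, seed=0):
    """Simulate `reps` sample proportions, each from n independent trials with success probability p."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

p, n = 0.44, 200   # illustrative values
q = 1 - p
p_hats = simulate_p_hats(p, n)

print(f"theory:    mean={p:.4f}, SD={math.sqrt(p * q / n):.4f}")
print(f"simulated: mean={statistics.mean(p_hats):.4f}, SD={statistics.stdev(p_hats):.4f}")
```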
Necessary Conditions When Working with Proportions
- Two assumptions:
  - The sampled values must be independent of each other.
  - The sample size, n, must be large enough.
- Check the following corresponding conditions:
  - 10% Condition: the sample size must be no larger than 10% of the population.
  - Success/Failure Condition: the sample size must be large enough that np and nq are both at least 10. In other words, we need to expect at least 10 successes and 10 failures to have enough data for a sound conclusion.
Standard Error
- The standard deviations of our Normal models are as follows:
  - For proportions: $SD(\hat{p}) = \sqrt{pq/n}$
  - For means: $SD(\bar{x}) = \sigma/\sqrt{n}$
- When we don't know $p$ or $\sigma$, we're stuck, right?
Standard Error (cont.)
- Nope. We will use sample statistics to estimate these population parameters.
- For a sample proportion, the standard error is $SE(\hat{p}) = \sqrt{\hat{p}\hat{q}/n}$.
- For the sample mean, the standard error is $SE(\bar{x}) = s/\sqrt{n}$.
- When we estimate the standard deviation of a sampling distribution using statistics found from the data, the estimate is called a standard error.
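Both standard errors replace an unknown parameter with its sample estimate. A minimal sketch follows; the function names and the small data set are hypothetical.

```python
import math
import statistics

def se_proportion(successes, n):
    """SE(p-hat) = sqrt(p_hat * q_hat / n), using the sample proportion in place of p."""
    p_hat = successes / n
    return math.sqrt(p_hat * (1 - p_hat) / n)

def se_mean(data):
    """SE(x-bar) = s / sqrt(n), using the sample SD s in place of sigma."""
    return statistics.stdev(data) / math.sqrt(len(data))

# Illustrative (made-up) data
print(round(se_proportion(88, 200), 4))        # p-hat = 0.44
print(round(se_mean([3, 5, 4, 6, 2]), 4))      # s = sqrt(2.5), n = 5
```

`statistics.stdev` divides by $n - 1$, which matches the sample standard deviation $s$.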
Watch out for small samples from skewed populations
- If the original population is not itself normally distributed, here is a common guideline: for samples of size n greater than 30, the distribution of the sample means can be approximated reasonably well by a normal model. The approximation gets better as the sample size, n, becomes larger.
- If the original population is itself normally distributed, then the sample means will be normally distributed for any sample size n (not just values of n larger than 30).
Applications of the Central Limit Theorem - 1
- On the 2001 ACT, students had a mean score of 21.3 with a standard deviation of 6.0. Assume that the scores are normally distributed.
- If 60 students are randomly selected, find the probability that their mean score is greater than 23.5.
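One way to work Application 1: the sample mean of 60 scores has a Normal model with mean 21.3 and $SD(\bar{x}) = 6.0/\sqrt{60} \approx 0.775$, so 23.5 sits about 2.84 SDs above the mean. This sketch uses the standard library's `statistics.NormalDist` (Python 3.8+) in place of a z-table.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 21.3, 6.0, 60
sd_mean = sigma / sqrt(n)                          # SD of the sample mean
z = (23.5 - mu) / sd_mean                          # standardized value of 23.5
prob = 1 - NormalDist(mu, sd_mean).cdf(23.5)       # P(x-bar > 23.5)
print(f"SD(x-bar)={sd_mean:.4f}, z={z:.2f}, P(x-bar > 23.5)={prob:.4f}")
```

The probability comes out to roughly 0.002, so a mean of 23.5 or more would be very unusual for a random sample of 60 students.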
Applications of the Central Limit Theorem - 2
- A national study found that 44% of college students engage in binge drinking (5 drinks at a sitting for men, 4 for women). Use the 68-95-99.7 Rule to describe the sampling distribution model for the proportion of students in a randomly selected group of 200 college students who engage in binge drinking. Do you think the appropriate conditions are met?
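A sketch of the 68-95-99.7 intervals for Application 2, using $p = 0.44$ and $n = 200$, so $SD(\hat{p}) = \sqrt{(0.44)(0.56)/200} \approx 0.035$. It also prints $np$ and $nq$ for the Success/Failure Condition.

```python
from math import sqrt

p, n = 0.44, 200
q = 1 - p
sd = sqrt(p * q / n)   # SD of the sample proportion

print(f"np = {n * p:.0f}, nq = {n * q:.0f}")
for k, pct in ((1, 68), (2, 95), (3, 99.7)):
    print(f"{pct}% of samples: {p - k * sd:.3f} to {p + k * sd:.3f}")
```

This gives roughly 40.5% to 47.5% (68%), 37.0% to 51.0% (95%), and 33.5% to 54.5% (99.7%). The Success/Failure Condition holds (88 and 112), and 200 students is plausibly far less than 10% of all college students; the randomization assumption still has to come from how the group was selected.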
Example 3
- Carbon monoxide emissions for a certain kind of car vary with mean 2.9 g/m and standard deviation 0.4 g/m. A company has 80 of these cars in its fleet.
- Estimate the probability that the mean CO level for the company's fleet is between 3.0 and 3.1 g/m.
- There is only a 5% chance that the fleet's mean CO level is greater than what value?
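A sketch for Example 3, assuming the company's 80 cars can be treated as a random sample of this kind of car. The fleet mean then has a Normal model with mean 2.9 and $SD(\bar{x}) = 0.4/\sqrt{80} \approx 0.0447$; `NormalDist.cdf` gives the probability and `NormalDist.inv_cdf` gives the 95th percentile.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 2.9, 0.4, 80
model = NormalDist(mu, sigma / sqrt(n))   # sampling model for the fleet mean

p_between = model.cdf(3.1) - model.cdf(3.0)
cutoff = model.inv_cdf(0.95)              # only 5% of fleet means exceed this
print(f"P(3.0 < x-bar < 3.1) = {p_between:.4f}")
print(f"5% chance the fleet mean exceeds {cutoff:.3f} g/m")
```

The probability is about 0.013, and the cutoff is about 2.97 g/m.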
Example 4
- Just before a referendum on a school budget, a local newspaper polls 400 voters in an attempt to predict whether the budget will pass. Suppose that the budget actually has the support of 52% of the voters. What's the probability the newspaper's sample will lead them to predict defeat? Be sure to verify that the assumptions and conditions necessary for your analysis are met.
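A sketch for Example 4, assuming the 400 voters are a random sample and make up less than 10% of the district's voters. The newspaper predicts defeat when $\hat{p} < 0.50$; with $p = 0.52$, $SD(\hat{p}) = \sqrt{(0.52)(0.48)/400} \approx 0.025$.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.52, 400
q = 1 - p
print(f"np = {n * p:.0f}, nq = {n * q:.0f}")   # Success/Failure Condition: both at least 10

sd = sqrt(p * q / n)
prob_defeat = NormalDist(p, sd).cdf(0.50)      # P(p-hat < 0.50)
print(f"SD(p-hat)={sd:.4f}, P(predict defeat)={prob_defeat:.3f}")
```

The result is roughly 0.21, so about one poll in five of this size would wrongly point to defeat.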
Assignment
- Read Chapter 18 Again!
- Try the following exercises from Ch. 18
- 1, 3, 7, 9, 17, 21, 23, 25, 27, 33, 37
- Work through the ActivStats assignments for
Chapter 18 for additional practice.