Title: Sampling Methods and Sampling Distributions
1Sampling Methods and Sampling Distributions
- Potential sampling errors
- Sampling Distributions and the Central Limit
Theorem
- Confidence Intervals
- Review
2Review of terms
- A target population is the entire group of
elements about which we want information.
- A sample is part of the target population.
3Inference
- Making an inference means using sample results to
describe the population.
Sample (Known)
Population (Unknown)
We don't know the mean of the population, so we
have to infer it from samples of the population.
4Sampling Questions
- What errors might there be in a sample conducted
over the phone?
- If you wanted to estimate the number of people
who would vote Liberal if an election were held
tomorrow, how would you go about it?
5Sampling Terminology
- An Element is an object on which we take a
Measurement. Objects that are people are called
Subjects.
- A Target Population is a collection of elements
about which we wish to make an Inference.
- Sampling Units are non-overlapping collections
of elements from the target population.
- A Frame is a list of sampling units.
- The sampling Design specifies the Method of
selecting the sample.
6Errors in Survey Sampling
- Selection Error
- The sampling frame does not represent the target
population. We exclude members of the target
population from the sample.
- Example: we are interested in determining
filmgoers' attitudes toward horror films. The
sampling frame is households that own a VCR. Many
filmgoers do not own VCRs, so we have committed a
selection error.
- Increasing the Sample Size Will Not Help.
7Errors in Survey Sampling
- Response Error
- Respondents do not
- 1) Understand question
- 2) Have the information
- 3) Want to give the information.
- Example: ask 13-year-old school students the
following question: "How often do you imbibe
intoxicating spirits?" Respondents may not
understand or be honest.
- Increasing the Sample Size Will Not Help.
8Errors in Survey Sampling
- Non-Response Error
- Respondents are not representative of the sampling
frame.
- Be concerned when a large percentage of the
sampling frame does not respond.
- Lower income families may ignore mailed surveys.
- Families with two wage earners eat out often and
are often not at home when an interviewer calls.
- Increasing the Sample Size Will Not Help.
9More terms: Parameters and Statistics
- A population parameter is a numerical measure
that describes the target population.
- A sample statistic is an estimate of the unknown
population parameter and will vary from sample to
sample.
10A small population (N = 5)
Number of bedrooms per household: 1, 2, 2, 3, 5
Note that the denominator for the standard
deviation calculation is N = 5 because this is a
population
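The population figures quoted on the following slides (mean 2.6, standard deviation 1.36) can be checked with a minimal sketch. The variable names are illustrative; `pstdev` is the standard-library function that divides by N.

```python
from statistics import mean, pstdev

# Number of bedrooms per household for the whole population (N = 5).
bedrooms = [1, 2, 2, 3, 5]

population_mean = mean(bedrooms)   # sum of values divided by N
population_sd = pstdev(bedrooms)   # population formula: divides by N, not N - 1

print(f"population mean = {population_mean:.2f}")
print(f"population standard deviation = {population_sd:.2f}")
```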
11A single sample of size n = 2 from the population
of N = 5
- Do not expect
- the sample mean to equal the population mean of 2.6.
- the sample standard deviation to equal the
population standard deviation of 1.36.
12Sample Statistics
- Note that the sample standard deviation has a
different formula than the population standard
deviation (it divides by n - 1 rather than N).
- To help keep the ideas separate we have different
symbols for populations and samples.
- The sample mean is $\bar{x}$
- The sample standard deviation is s
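As a small illustration of the sample formulas, the sketch below uses a hypothetical sample of two households (the values 2 and 5 are an assumption chosen for the example, not a sample from the slides). `stdev` divides by n - 1, unlike the population formula.

```python
from statistics import mean, stdev

# Hypothetical sample of n = 2 households from the population 1, 2, 2, 3, 5.
sample = [2, 5]

x_bar = mean(sample)   # sample mean
s = stdev(sample)      # sample standard deviation: divides by n - 1

print(f"sample mean = {x_bar}, sample standard deviation = {s:.2f}")
```

Neither value matches the population mean of 2.6 or the population standard deviation of 1.36, which is the point of the previous slide.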
13Margin of Error
- Because sample statistics and population
parameters are almost always going to be
different, we have some error when we take a
sample.
- But what affects the amount of error?
- Dartboard example
14Margins of Error
Margin of Error: the possible difference between the
sample result and the result we would obtain if
we measured the entire population. We want it as
small as possible.
15Samples of 3 from population
16Effect of sample size
                                    Samples of Size 3   Samples of Size 4
Population Mean                     2.6                 2.6
Largest and Smallest Sample Means   3.33, 1.67          3, 2
Maximum Margin of Error             0.93                0.6
Increasing the sample size reduces the margin of
error
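The figures for samples of size 3 and 4 can be reproduced by listing every possible sample from the bedroom population (1, 2, 2, 3, 5). This is a sketch that assumes sampling without replacement; the function name is illustrative.

```python
from itertools import combinations
from statistics import mean

bedrooms = [1, 2, 2, 3, 5]

def max_margin_of_error(population, n):
    """Largest gap between any possible sample mean and the population mean."""
    mu = mean(population)
    sample_means = [mean(s) for s in combinations(population, n)]
    return max(abs(m - mu) for m in sample_means)

for n in (3, 4):
    print(f"samples of size {n}: maximum margin of error = "
          f"{max_margin_of_error(bedrooms, n):.2f}")
```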
17Effect of level of confidence
Increasing the level of confidence increases the
margin of error
18Effect of population variance
New population: 1, 2, 4, 6, 7 bedrooms
As the variance of the population increases, the
margin of error also increases
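A rough check of this claim compares the original bedroom population (1, 2, 2, 3, 5) with the new one, using samples of size 3 drawn without replacement. The sample size and the helper name are assumptions made for the sketch.

```python
from itertools import combinations
from statistics import mean, pstdev

original = [1, 2, 2, 3, 5]
more_spread = [1, 2, 4, 6, 7]

def worst_case_error(population, n=3):
    """Largest distance between a possible sample mean and the population mean."""
    mu = mean(population)
    return max(abs(mean(s) - mu) for s in combinations(population, n))

for label, pop in (("original", original), ("new", more_spread)):
    print(f"{label}: population sd = {pstdev(pop):.2f}, "
          f"maximum margin of error = {worst_case_error(pop):.2f}")
```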
19Summary: Sampling Lessons
- Increasing the sample size reduces the margin
of error.
- If we increase the level of confidence in an
inference, the price we pay is in the margin of
error.
- As the variability of the target population
increases, the margin of error increases.
20Sampling Distribution
- What is a sampling distribution of the mean?
21Bedrooms (samples of n = 3)
The sampling distribution contains all
possible sample means.
22Sampling distribution
This is a sampling distribution
23Standard Error of the Mean
- The standard deviation of the sampling
distribution measures the spread of the sample
means around their mean and is called the
standard error of the mean.
- The standard error of the mean is smaller than
the standard deviation of the population
(whenever n > 1).
- Why?
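One way to see why: each sample mean averages several values, so extreme households are diluted by more typical ones. The sketch below builds the full sampling distribution for samples of size 3 from the bedroom population (1, 2, 2, 3, 5), assuming sampling without replacement, and compares its spread with the population's.

```python
from itertools import combinations
from statistics import mean, pstdev

bedrooms = [1, 2, 2, 3, 5]

# Every possible sample of size 3, reduced to its mean.
sample_means = [mean(s) for s in combinations(bedrooms, 3)]

standard_error = pstdev(sample_means)   # spread of the sample means
population_sd = pstdev(bedrooms)        # spread of individual households

print(f"mean of sample means = {mean(sample_means):.2f}")
print(f"standard error = {standard_error:.2f}, population sd = {population_sd:.2f}")
```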
24Two New Populations (both N = 6)
- A: 1, 1, 2, 4, 5, 5
- B: 1, 2, 3, 3, 4, 5
25Central Limit Theorem
- No matter what the population distribution looks
like, the sampling distribution of the mean will
always end up looking like a normal distribution
(for high enough n).
27Try playing with the Central Limit Theorem on the
class web page.
- Try different sample sizes (n).
- Try different population distributions.
- See how the sampling distributions look
normal.
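For readers without access to the class web page, the experiment can be approximated with a short simulation. This is only a sketch: the exponential population, the seed, the number of repetitions and the skewness measure are all assumptions chosen for illustration.

```python
import random
from statistics import mean, pstdev

rng = random.Random(0)  # fixed seed so the run is repeatable

def simulate_sample_means(n, repetitions=5000):
    """Draw repeated samples of size n from a right-skewed population
    (exponential with mean 1) and return the mean of each sample."""
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n
            for _ in range(repetitions)]

def skewness(values):
    """Rough skewness: 0 for a symmetric shape such as the normal curve."""
    m, sd = mean(values), pstdev(values)
    return mean(((v - m) / sd) ** 3 for v in values)

results = {n: simulate_sample_means(n) for n in (2, 10, 30)}
for n, means in results.items():
    print(f"n={n:2d}  mean={mean(means):.3f}  "
          f"std error={pstdev(means):.3f}  skewness={skewness(means):.2f}")
```

As n grows, the mean of the sample means stays near the population mean of 1, the standard error shrinks, and the skewness moves toward 0, which is what "looking normal" means here.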
29Some Conclusions
                     Population           Sampling Distribution
Mean                 $\mu$ (unknown)      $\mu$
Standard Deviation   $\sigma$ (unknown)   $\sigma/\sqrt{n}$
Shape                Any shape            Approx. normal provided n > 30
30Estimating Unknown Population Parameters
                     Unknown Parameter    Sample Statistic
Mean                 $\mu$                $\bar{x}$
Standard Deviation   $\sigma$             s
Standard Error       $\sigma/\sqrt{n}$    $s/\sqrt{n}$
31Why does the Central Limit Theorem work?
- As sample size increases
- most sample means will be close to the population
mean.
- some sample means will be relatively far above or
below the population mean.
- a few sample means will be very far above or
below the population mean.
- The bullets above describe a normal distribution.
32Lessons
- The mean of the sampling distribution of the
sample mean is the same as the mean of the
population from which the samples were drawn.
- The standard error of the mean is smaller than
the standard deviation of the population.
33Lessons
- The standard error of the mean decreases as the
sample size increases.
- If the population is normal or the sample size is
sufficiently large, the distribution of the
sample mean will be near-normal. We will be able
to use the standard normal table to compute
probabilities for the sample means.
34Two assumptions for Central Limit Theorem to work
- 1) Samples are drawn randomly from the population
(each possible sample has an equal chance of
being chosen)
- 2) The population is (near) normal or the sample
size is large (n ≥ 30)
35Overview of Inference
Draw Conclusion about a Population Parameter
36Confidence Interval
- A confidence interval is a range estimate of an
unknown population parameter.
- The level of confidence associated with an
interval estimate is the percentage of intervals
that will include the unknown population
parameter over a large number of similarly
constructed intervals.
- Just like the confidence we had in the margin of
error in an earlier lecture (dartboard example)
37Confidence Intervals
Sampling Distribution of the mean
38What does 95% confidence look like? ($\alpha$ = 0.05)
Each tail has probability 0.025
39Intervals and Confidence Level
Confidence Intervals
40Margin of Error
- So what a confidence interval does is add and
subtract a margin of error from the sample mean.
- The margin of error is $z \cdot \sigma / \sqrt{n}$,
- but if we don't know $\sigma$ then we'll have to
use s (the sample standard deviation) instead.
41Assumptions for confidence intervals
- 1. Random samples
- 2. If n < 30 then the population must be near
normal to do a confidence interval. (If n ≥ 30
then the sampling distribution is close enough to
normal whatever the population.)
42Margin of Error - Three Lessons
Lesson 1: As sample size (n) increases,
margin of error decreases.
Lesson 2: As confidence level (z) increases,
margin of error increases.
Lesson 3: As variance ($s^2$) increases,
margin of error increases.
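The three lessons follow directly from the form of the margin of error, z times the standard deviation divided by the square root of n. The sketch below varies one input at a time; all of the numbers (z = 2, a standard deviation of 10, n = 25 and the alternatives) are hypothetical values chosen for easy arithmetic.

```python
from math import sqrt

def margin_of_error(z, sd, n):
    """z * sd / sqrt(n): the amount added to and subtracted from the sample mean."""
    return z * sd / sqrt(n)

baseline = margin_of_error(z=2, sd=10, n=25)
larger_sample = margin_of_error(z=2, sd=10, n=100)        # Lesson 1: n up
higher_confidence = margin_of_error(z=2.58, sd=10, n=25)  # Lesson 2: z up
more_variable = margin_of_error(z=2, sd=20, n=25)         # Lesson 3: sd up

print(baseline, larger_sample, higher_confidence, more_variable)
```

Quadrupling the sample size only halves the margin of error, because n enters through its square root.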
43Rules of thumb
- Some quick rules of thumb for z (values come from
the normal distribution)
- For confidence of 90%, use z = 1.64
- For confidence of 95%, use z = 2
- For confidence of 99%, use z = 2.58
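These values can be recovered from the standard normal distribution with Python's standard library. The helper name `z_for_confidence` is illustrative; it assumes a two-sided interval with equal probability in each tail.

```python
from statistics import NormalDist

def z_for_confidence(confidence):
    """Two-sided z-value: leaves (1 - confidence) / 2 in each tail."""
    tail = (1 - confidence) / 2
    return NormalDist().inv_cdf(1 - tail)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_for_confidence(level):.2f}")
```

The 95% value comes out as 1.96, which the rule of thumb rounds to 2.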
44T-distribution
- What is a t-value?
- A sophisticated statistical way of dealing with
smaller sample sizes by using slightly different
values instead of the rule-of-thumb z-values.
- Do I have to care?
- No. Increased accuracy of t-values possibly
spurious and not worth the added effort. Just
think z-value wherever you see t-value and use
the rule of thumb.
- If accuracy is important, can I use the t-value
anyway?
- Yes. Statpro calculations automatically use
t-values.
45Width versus meaningfulness of Confidence
Intervals
GOAL: a narrow confidence interval and a high
level of confidence.
46Try the confidence interval demonstration on the
class web page. Try different values of ?. Count
how many of the confidence intervals contain the
population mean.
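The counting exercise can also be approximated offline. The sketch below is an assumption-laden stand-in for the web demonstration: it invents a normal population with mean 50 and standard deviation 10, treats the standard deviation as known, and uses z = 1.96 with samples of 30.

```python
import random
from math import sqrt

rng = random.Random(1)      # fixed seed so the run is repeatable
MU, SIGMA = 50.0, 10.0      # hypothetical population mean and standard deviation
N, Z = 30, 1.96             # sample size and z-value for 95% confidence

def interval_covers_mu():
    """Draw one sample, build its confidence interval, report whether it contains MU."""
    sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
    x_bar = sum(sample) / N
    margin = Z * SIGMA / sqrt(N)
    return x_bar - margin <= MU <= x_bar + margin

trials = 4000
coverage = sum(interval_covers_mu() for _ in range(trials)) / trials
print(f"{coverage:.1%} of intervals contain the population mean")
```

The share of intervals that contain the population mean should land close to the chosen confidence level, which is what the level of confidence describes.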
47Using Statpro
- Make sure the data is in a column with a label in
the first row.
- Use Statpro function
- Statistical Inference > One sample analysis
- Select data
- Choose confidence interval for mean and input
confidence level (e.g. 95)
48Question
- As the sample size increases, does the estimated
standard error increase, decrease, or stay the
same?
49Question
- As the sample size increases, does the sample
standard deviation increase, decrease, or stay
the same?
50Question
- As the sample size increases, does the sample
mean increase, decrease, or stay the same?
51Question
- As the sample size increases, does the margin of
error increase, decrease, or stay the same?
52Second hand cars
- In a survey of their latest 20 customers, a
second-hand car dealer found that the average age
of car buyers is 37.3 years with a standard
deviation of 4.2 years.
- What is a 95% CI for the mean age of second-hand
car buyers?
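One possible working of this question, using the rule-of-thumb z = 2 for 95% confidence and the sample standard deviation in place of the unknown population value. It is a sketch of the calculation, not an official answer key.

```python
from math import sqrt

n = 20          # customers surveyed
x_bar = 37.3    # sample mean age (years)
s = 4.2         # sample standard deviation (years)
z = 2           # rule-of-thumb value for 95% confidence

margin = z * s / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(f"95% CI: {lower:.1f} to {upper:.1f} years (margin of error {margin:.2f})")
```

Because n = 20 is below 30, this relies on buyer ages being near normal. A t-value (about 2.09 for 19 degrees of freedom) would widen the interval slightly.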
53Small populations
- If you have a relatively large sample compared to
the population (n/N > 0.05)
- Use a correction for the confidence interval
N = number in population, n = number in sample
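The slide does not spell out the correction. A common choice, assumed here, is to multiply the standard error by the finite population correction factor $\sqrt{(N-n)/(N-1)}$. The sketch below checks that assumption against the bedroom population (1, 2, 2, 3, 5) by comparing the corrected formula with the exact spread of all possible sample means.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, pstdev

bedrooms = [1, 2, 2, 3, 5]   # whole population, N = 5
N, n = len(bedrooms), 3      # n / N = 0.6, well above 0.05

sigma = pstdev(bedrooms)
uncorrected = sigma / sqrt(n)
fpc = sqrt((N - n) / (N - 1))    # assumed finite population correction factor
corrected = uncorrected * fpc

# Exact standard error: spread of the means of all possible samples of size n.
exact = pstdev([mean(s) for s in combinations(bedrooms, n)])

print(f"uncorrected = {uncorrected:.3f}, corrected = {corrected:.3f}, exact = {exact:.3f}")
```

When sampling without replacement from a small population, the uncorrected formula overstates the standard error, so the corrected confidence interval is narrower.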
54What did we do?
- Talked about margins of error.
- Saw how the Central Limit Theorem ensures that
sample means have an approximately normal
distribution when n is large enough.
- Talked about confidence intervals
- Reviewed first half of material for subject
55Managerial applications
- What did you learn today that makes a difference
to the way you manage?
- What are the three most important things to
remember from today's lecture?
56Next lecture (after Midterm)
- Download data files metrobus.xls and
customerages.xls and bring them on your laptop.
- Read supplementary material on Two Samples,
Matched Pairs and Estimating P.