Title: What Do Samples Tell Us
1 What Do Samples Tell Us?
2 Review: Samples, Good and Bad
- We select a sample to get information about a population
- We want a sample that fairly represents the population
- Convenience samples and voluntary response samples are common, but they do not produce trustworthy, representative results because they are usually biased
- Bias is the systematic favoring of one part of the population (and its opinions) over other parts of the population
3 Review: Samples, Good and Bad
- Using chance to choose a sample is one of the fundamental ideas of statistics
- Random samples use chance to choose the sample
- In a simple random sample (SRS)
  - Every individual in the population has the same chance of being in the sample
  - Every sample of the same size has the same chance of being chosen
- To choose an SRS
  - Use a table of random digits, or
  - Use software that produces random digits
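As a minimal sketch of the software route, the snippet below draws an SRS with Python's standard `random` module. The population of labels 0 to 99, the function name `draw_srs`, and the seed are illustrative assumptions, not part of the slides.

```python
import random


def draw_srs(population, n, seed=None):
    """Return a simple random sample of n distinct individuals."""
    rng = random.Random(seed)  # the seed only makes the illustration repeatable
    # Random.sample draws without replacement, so every subset of size n
    # has the same chance of being chosen.
    return rng.sample(population, n)


population = list(range(100))  # hypothetical population labeled 0..99
sample = draw_srs(population, 4, seed=1)
print(sample)
```

Drawing without replacement is what makes this an SRS: no individual can appear twice, and no group of four is favored over any other.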
4
- Questions from last time?
5 Same-sex Marriage
- Same-sex marriage is a controversial issue!
- During 2004, same-sex marriage became a major news item when several cities performed same-sex marriages even though this was against the law
- As a result, President Bush called on Congress to promptly pass a law that effectively banned same-sex marriage
- Question: What do Americans really think about same-sex marriage?
6 Same-sex Marriage
- The Gallup Poll decided to find out. Between July 2003 and February 2004 they conducted a poll that asked
  - "Would you favor or oppose a constitutional amendment that would define marriage as being between a man and a woman, thus barring marriages between gay and lesbian couples?"
- Result: supported by a very slim majority of Americans. 51% of Americans support such an amendment, 45% oppose
- What does this result mean?
7 Same-sex Marriage
- We need to read further. Gallup says
  - The sample consists of 2,527 randomly selected adults
- But according to the Census Bureau there are 209 million adults in the United States. How can the opinions of only 2,527 reflect all 209 million?
- A sample cannot give us the exact truth about the population, so the result comes with a margin of error
  - "For results based on a sample of this size, one can say with 95% confidence that the error attributable to sampling and other random effects could be plus or minus 2 percentage points."
- WHAT DOES THIS MEAN?
8 From Sample to Population
- Gallup's finding is that "such an amendment is supported by a very slim majority of Americans, 51%"
- Gallup makes this claim about the 209 million adults who are Americans, but they do not know the truth about those 209 million
- Rather, they know the truth about the 2,527 adults that they contacted and talked to
- That sample was chosen at random from the population of 209 million adults and therefore is representative of those 209 million (no bias)
9 From Sample to Population
- What Gallup has done is take the fact that 51% of the sample support the amendment and turn it into an estimate that 51% of the population support the amendment
- This is a basic concept in statistics: using a fact about a sample to produce an estimate of the truth about the whole population
- There is a vocabulary to talk about this
10 From Sample to Population
- Parameter is to population as statistic is to sample.
- Want to estimate an unknown parameter?
  - Choose a sample from the population and use a sample statistic as your estimate.
11 Example 1: Do you favor a constitutional amendment?
- p is a parameter whose value is the proportion of adults in the population who favor the amendment
  - This is what we are interested in
  - We do not know the real value of p
- To estimate the value of p, Gallup took a sample of 2,527 adults
- The proportion of adults in the sample who favor the amendment is a statistic whose value is an estimate of p
- The name we give this statistic is p̂, or p-hat
12 Example 1: Do you favor a constitutional amendment?
- 1,289 adults in the sample favored the amendment, so p-hat = 1,289 / 2,527 ≈ 0.51
- That is, the value of the statistic p-hat is 51%
- Because the 2,527 adults in the sample were chosen at random, it is reasonable to think we can use the value of the statistic p-hat as an estimate of the unknown parameter p
13 Example 1: Do you favor a constitutional amendment?
- The fact is that 51% of the sample favored the amendment
- We do not know the percentage of adults in the population that favor the amendment, but because we have a representative sample, we estimate that to be 51% as well
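The arithmetic behind p-hat can be checked in a few lines. This is only an illustrative calculation using the counts quoted in Example 1; the variable names are assumptions.

```python
favor = 1289  # adults in the sample who favored the amendment (from Example 1)
n = 2527      # sample size (from Example 1)

# The sample proportion is the statistic used to estimate the parameter p.
p_hat = favor / n
print(f"p-hat = {p_hat:.4f}, or about {p_hat:.0%}")
```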
14 Sampling Variability
- What would happen if Gallup took a second sample of 2,527 adults?
- Almost certainly a different number would support the amendment, perhaps 1,322, or maybe 1,016
- Because we choose samples randomly and this involves variability, repeatedly taking samples would yield a variety of values for the p-hat statistic, say 42%, 51%, and 67%
- If the variation in p-hat among a large number of samples is too great, we cannot trust the results of any one sample
15 Sampling Variability
- The first big advantage of random samples is that they attack bias
- The second is that if we take lots of random samples of the same size from a population, the variation from sample to sample follows a predictable pattern
- This predictable pattern shows that the results from bigger samples are less variable than the results from smaller samples
16 Example 2: Lots and lots of samples
- How trustworthy are samples?
- Let's compare many samples of two different sizes for the Gallup poll
- Imagine that exactly half the adults in the population favor the amendment, so p = 0.5, or 50%
- Now, what if Gallup used an SRS of size 100 to estimate p?
- How is this different from using an SRS of size 2,527?
17 Example 2: Lots and lots of samples
- 1,000 SRSs of size 100
- Histogram of the number of SRSs yielding each value of p-hat. Notice how dispersed (spread out) the histogram is, but that it is centered around 0.50 (50%)
18 Example 2: Lots and lots of samples
- 1,000 SRSs of size 2,527
- Notice the histogram is much sharper this time, concentrated right around 0.50 (50%)
19 Example 2: Lots and lots of samples
- There is no bias with either the small or the big sample
  - The values of p-hat do not heap up anywhere except around the true value of p
- The results from the small sample are more variable
  - They cover a range from about 0.40 to 0.59
- The results from the big sample are less variable
  - They cover a range from about 0.4804 to 0.5204
- The conclusion: both sample sizes give us an unbiased estimate of p, but the bigger sample almost always gives an estimate that is closer to the true value
- This is true for any value of p, such as 0.40 or 0.65
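The comparison in Example 2 can be imitated with a small simulation. The sketch below assumes the population is so large that each sampled adult independently favors the amendment with probability p = 0.5; the function name `simulate_p_hats` and the seeds are illustrative, and the exact ranges it prints will not match the histograms on the slides.

```python
import random
import statistics


def simulate_p_hats(p, n, num_samples, seed=0):
    """Simulate p-hat for num_samples random samples of size n."""
    rng = random.Random(seed)
    p_hats = []
    for _ in range(num_samples):
        # Count how many of the n sampled adults favor the amendment.
        favor = sum(rng.random() < p for _ in range(n))
        p_hats.append(favor / n)
    return p_hats


small = simulate_p_hats(0.5, 100, 1000)
large = simulate_p_hats(0.5, 2527, 1000, seed=1)

for label, values in (("n = 100", small), ("n = 2,527", large)):
    print(
        label,
        "center:", round(statistics.mean(values), 3),
        "range:", round(min(values), 3), "to", round(max(values), 3),
        "std dev:", round(statistics.stdev(values), 4),
    )
```

Both sets of values should center near 0.5 (no bias), while the values for the larger samples should be far less spread out.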
20 Sampling Variability
21 Sampling Variability
- Let's think about the true value of the population parameter as a bull's-eye on a target and the sample statistic as an arrow fired at the bull's-eye
- Bias and variability describe what happens when an archer fires many arrows at the target
  - Bias means the aim is consistently off
  - Variability means that repeated shots are widely scattered
27 Sampling Variability
28 Sampling Variability
- Gallup only took one sample
- We cannot know how close to the truth the p-hat estimated from this sample is, because we don't know the truth
- However, it is true that large random samples almost always give an estimate that is close to the truth
29 Margin of Error and all that
- The margin of error translates sampling variability into a statement of how much confidence we can have in the results of a survey
30 Margin of Error and all that
- A random sample will usually not estimate the truth about the population exactly
- We need a margin of error to tell us how close to the truth the estimate is
- Although the estimate usually differs from the truth by less than the margin of error, we cannot be certain that it does not differ by more than the margin of error
- For a 95% margin of error, 95% of the samples we draw differ from the truth by less than the margin of error; 5% miss by more than the margin of error
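The claim that about 95% of samples land within the margin of error can be checked by simulation. The sketch below makes three assumptions that are not stated on this slide: the true proportion is p = 0.5, the population is large enough that draws are effectively independent, and the 95% margin of error is approximated by 1/sqrt(n), which gives roughly the 2 percentage points Gallup reports for n = 2,527. The function name `coverage` is hypothetical.

```python
import math
import random


def coverage(p, n, num_samples, seed=0):
    """Fraction of simulated samples whose p-hat is within 1/sqrt(n) of p."""
    rng = random.Random(seed)
    margin = 1 / math.sqrt(n)  # assumed rough 95% margin of error
    hits = 0
    for _ in range(num_samples):
        # Each sampled adult favors the amendment with probability p.
        p_hat = sum(rng.random() < p for _ in range(n)) / n
        if abs(p_hat - p) < margin:
            hits += 1
    return hits / num_samples


share_within = coverage(p=0.5, n=2527, num_samples=1000)
print(f"{share_within:.1%} of samples fell within the margin of error")
```

The printed share should be close to 95%, with the remaining samples missing by more than the margin of error.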
31 Margin of Error and all that
- Finding the exact margin of error is a job for statisticians
- We will use a simple formula to get a rough idea of the size of a sample survey's margin of error when an SRS is used
32 Example 4: What is the margin of error?
- What is the margin of error for the Gallup poll?
- n = 2,527
- The margin of error for 95% confidence is roughly 1/√n = 1/√2527 = 1/50.27 ≈ 0.0199
- That is about 2.0%, pretty much what Gallup announced
33 Example 5: Margin of error and sample size
- In Example 2, we compared SRSs of size n = 100 and n = 2,527
- The variability in p-hat for n = 100 was about 5 times as large as for n = 2,527
- We saw that for n = 2,527 the margin of error for 95% confidence is about 2.0%. What about for n = 100?
  - 1/√n = 1/√100 = 1/10 = 0.10
  - That is, about 10%
- This is roughly 5 times the margin of error of the n = 2,527 sample
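The two margins of error compared in Examples 4 and 5 can be computed directly. This sketch assumes the quick rule is 1/sqrt(n), which reproduces the figures of about 2.0% and about 10% quoted in the examples; the function name `quick_margin_of_error` is illustrative.

```python
import math


def quick_margin_of_error(n):
    """Rough 95% margin of error for an SRS of size n (assumed rule: 1/sqrt(n))."""
    return 1 / math.sqrt(n)


small_moe = quick_margin_of_error(100)
large_moe = quick_margin_of_error(2527)

print(f"n = 100:   {small_moe:.1%}")
print(f"n = 2,527: {large_moe:.1%}")
print(f"ratio:     {small_moe / large_moe:.2f}")
```

Because the rule depends on the square root of n, a sample about 25 times larger cuts the margin of error by a factor of only about 5.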
34 Confidence Statements
- Here is what Gallup says about their results
  - "The poll found that a very slim majority of Americans, 51%, favor such an amendment."
- Here's a more informative statement
  - We are 95% confident that between 49% and 53% of all adults favor such an amendment
- These are both confidence statements that tell us roughly how close the estimate of p (that is, p-hat) is to the true value of p
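The "between 49% and 53%" statement is the estimate plus or minus the margin of error. As a hedged sketch, the helper below (a hypothetical name, `confidence_interval`) assumes the rough 95% margin of error 1/sqrt(n) and uses the counts from Example 1.

```python
import math


def confidence_interval(p_hat, n):
    """Return (low, high) = p-hat minus and plus an assumed 1/sqrt(n) margin."""
    margin = 1 / math.sqrt(n)
    return p_hat - margin, p_hat + margin


low, high = confidence_interval(1289 / 2527, 2527)
print(f"We are 95% confident that between {low:.0%} and {high:.0%} favor the amendment")
```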
35 Confidence Statements
- A confidence statement is a fact about what happens in all possible samples, and is used to say how much we can trust the result of one sample
- 95% confidence means
  - We used a sampling method that gives a result this close to the truth 95% of the time
36 Confidence Statements: Rules of Thumb
- The conclusion of a confidence statement always applies to the population, not to the sample
- Our conclusion about the population is never completely certain
- A sample survey can choose to use a confidence level other than 95%
  - Remember that our quick rule only works for a confidence level of 95%
37 Confidence Statements: Rules of Thumb
- It is usual to report the margin of error for 95% confidence
- If you want a smaller margin of error with the same confidence, you must take a larger sample
38 Sampling from Large Populations
- Gallup's sample of 2,527 adults is only 1 of every 82,700 adults in the US
- Does it matter whether the 2,527 are 1 in 100 or 1 in 82,700 of the population?
39 Sampling from Large Populations
- Why doesn't population size matter?
- Imagine using a shovel to sample corn kernels from a truck full of corn or from a bag full of corn
  - The kernels are well mixed (to ensure a random sample)
  - The shovel doesn't know or care whether it is shoveling from the truck or the bag
  - The variability in each shovelful of corn depends on the size of the shovel, not the size of the container it's shoveling from
- This is great news for those who study large populations!
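The shovel analogy can be tried numerically. The sketch below draws SRSs of the same size (n = 100) from a hypothetical "bag" of 10,000 individuals and a hypothetical "truck" of 1,000,000, each with exactly half in favor, and compares how much p-hat varies. The population sizes, the function name `p_hat_spread`, and the labeling scheme are all illustrative assumptions.

```python
import random
import statistics


def p_hat_spread(population_size, n, num_samples, seed=0):
    """Standard deviation of p-hat over many SRSs from a half-in-favor population."""
    rng = random.Random(seed)
    half = population_size // 2  # labels below `half` count as "in favor"
    p_hats = []
    for _ in range(num_samples):
        # Draw n distinct labels without replacement: an SRS of the population.
        sample = rng.sample(range(population_size), n)
        p_hats.append(sum(label < half for label in sample) / n)
    return statistics.stdev(p_hats)


bag = p_hat_spread(10_000, 100, 1000)
truck = p_hat_spread(1_000_000, 100, 1000, seed=1)
print(f"spread from the bag:   {bag:.4f}")
print(f"spread from the truck: {truck:.4f}")
```

The two spreads should be nearly the same even though one population is 100 times larger, which is the point of the analogy: the variability depends on the size of the shovel.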
40 Sampling from Large Populations
- Random samples of size 1,000 or more are large enough to give small margins of error and can still properly represent very large populations
- Keep in mind that even very large voluntary response or convenience samples are worthless because of bias
  - Taking a large sample DOES NOT fix bias
- This is not good news for people who study small populations!
  - Remember, the margin of error depends on the sample size, not the population size
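To see why a large sample does not fix bias, consider a toy voluntary response survey. Everything in this sketch is hypothetical: the true proportion is set to 0.5, and supporters are assumed to be three times as likely to respond as opponents (response probabilities 0.3 and 0.1). The function name `voluntary_response_p_hat` is illustrative.

```python
import random

TRUE_P = 0.5  # hypothetical true proportion in favor


def voluntary_response_p_hat(n, seed=0):
    """p-hat from n voluntary responses when supporters respond more often."""
    rng = random.Random(seed)
    favor = 0
    responses = 0
    while responses < n:
        supports = rng.random() < TRUE_P
        # Hypothetical response rates: supporters 0.3, opponents 0.1.
        respond_prob = 0.3 if supports else 0.1
        if rng.random() < respond_prob:
            responses += 1
            if supports:
                favor += 1
    return favor / n


small_biased = voluntary_response_p_hat(100)
big_biased = voluntary_response_p_hat(10_000, seed=1)
print(f"n = 100:    p-hat = {small_biased:.3f}")
print(f"n = 10,000: p-hat = {big_biased:.3f} (truth is {TRUE_P})")
```

Under these assumed response rates the larger sample settles ever more tightly around 0.75, not 0.5: more data reduces variability but leaves the bias untouched.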
41 Sampling from Large Populations
- It always takes a sample size of roughly 2,500 to get a margin of error of 2% (with 95% confidence)
- So, if you want to answer a question about the 10,000 students at the University of Smallsville, and you want a margin of error of 2%, you will have to interview a quarter of all the students
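The sample size needed for a given margin of error follows from inverting the quick rule. This sketch assumes the rough 95% margin of error is 1/sqrt(n), so n is about (1/margin) squared; the function name `required_sample_size` is hypothetical.

```python
def required_sample_size(margin):
    """Approximate SRS size for a 95% margin of error (assumed rule: 1/sqrt(n))."""
    # Solving margin = 1/sqrt(n) for n gives n = (1/margin)**2,
    # rounded here to the nearest whole person.
    return round((1 / margin) ** 2)


n_needed = required_sample_size(0.02)
students = 10_000  # University of Smallsville enrollment (from the slide)
print(f"sample size needed: {n_needed}")
print(f"share of students to interview: {n_needed / students:.0%}")
```

The required sample size does not change with enrollment, which is why the same 2,500 interviews amount to a quarter of a 10,000-student population.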
42 Summary
- The purpose of sampling is to gain information about a population
- We often use a sample statistic to estimate the value of a population parameter
- To describe how trustworthy a single sample is, ask "What would happen if we took a large number of samples from the same population?"
- If almost all samples give a result that is very close to the truth, then we can trust our one sample, even though we can't be certain that it is close to the truth
43 Summary
- In planning a survey
  - Avoid bias by using a random sampling technique
  - Choose a large enough sample to reduce the variability of the result
- Using a large random sample guarantees that almost all samples will give an accurate result
- Use a confidence statement to say how accurate the result is
44 Summary
- Most frequently, the margin of error is all that is mentioned
- Usually this margin of error corresponds to 95% confidence
- That is, if we chose many samples, the truth about the population would be within the margin of error of the sample result 95% of the time
45 Summary
- We can estimate the margin of error for 95% confidence for an SRS with the formula 1/√n, where n is the sample size
- As the formula suggests, the margin of error depends only on the sample size, not on the size of the population
46 Exercise 3.5
- A sampling experiment. The n = 100 and n = 2,527 examples show how the sample proportion p-hat behaves when we take many samples from the same population. You can follow the steps in this process on a small scale.
- The figure on the following slide represents a small population. Each circle represents an adult. The white circles are people who favor a constitutional amendment that would define marriage as between a man and a woman, and the colored circles are people who are opposed. You can check that 50 of the 100 circles are white, so the population proportion in favor is p = 50/100 = 0.5
47 (Figure: the population of 100 circles; no transcript)
48 Exercise 3.5
- The circles are labeled 00, 01, ..., 99. Use line 101 of Table A to draw an SRS of size 4. What is the proportion p-hat of the people in your sample who favor the constitutional amendment?
- Take 9 more SRSs of size 4 (10 in all), using lines 102 to 110 of Table A, a different line for each sample. You now have 10 values of the sample proportion p-hat.
- Because your samples have only 4 people, the only values p-hat can take are 0/4, 1/4, 2/4, 3/4, and 4/4. That is, p-hat is always 0, 0.25, 0.5, 0.75, or 1. Mark these numbers on a line and make a histogram of your 10 results by putting a bar above each number to show how many samples had that outcome.
49 Exercise 3.5
- Taking samples of size 4 from a population of size 100 is not a practical setting, but let's look at your results anyway. How many of your 10 samples estimated the population proportion p = 0.5 exactly correctly? Is the true value 0.5 in the center of your sample values? Explain why 0.5 would be in the center of the sample values if you took a large number of samples.
50 Solution to Exercise 3.5
- Starting at line 101, we choose 19, 22, 39, and 50. Two of these circles are white, so p-hat = 0.5.
- The table (next slide) shows all ten samples, indicating which circles are shaded.
- Histogram on the next slide.
- Four were exactly correct. For this small number of samples, the center seems to be a bit higher than 0.5. In a large number of samples, 0.5 should be in the center because random sampling is unbiased.
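The exercise can also be imitated in software instead of with Table A. Which circles are white depends on the figure, which is not reproduced in this transcript, so the sketch below assumes (hypothetically) that labels 00 to 49 are white. The seed and function name are illustrative, and the samples it draws will not match the ones in the solution.

```python
import random
from collections import Counter

rng = random.Random(35)
labels = list(range(100))  # circles labeled 00..99
white = set(range(50))     # hypothetical: which 50 circles favor the amendment


def p_hat_of_sample(size=4):
    """Draw one SRS of circles and return the proportion that are white."""
    sample = rng.sample(labels, size)
    return sum(label in white for label in sample) / size


ten = [p_hat_of_sample() for _ in range(10)]
many = [p_hat_of_sample() for _ in range(10_000)]

# Text histogram of the ten results, one row per possible value of p-hat.
counts = Counter(ten)
for value in (0, 0.25, 0.5, 0.75, 1):
    print(f"{value:>4}: {'#' * counts[value]}")
print("average of 10,000 samples:", round(sum(many) / len(many), 3))
```

Ten samples can easily center a little above or below 0.5, but the average over many samples should sit very close to 0.5.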
51 Solution to Exercise 3.5