What Do Samples Tell Us - PowerPoint PPT Presentation




Slides: 52

Provided by: samuel59


Transcript and Presenter's Notes


1
What Do Samples Tell Us?

Chapter 3

2
Review: Samples, Good and Bad

We select a sample to get information about a
population
We want a sample that fairly represents the
population
Convenience samples and voluntary response
samples are common but do not produce
trustworthy, representative results because they
are usually biased
Bias is the systematic favoring of one part of
the population (and their opinions) over other
parts of the population

3
Review: Samples, Good and Bad

Using chance to choose a sample is one of the
fundamental ideas of statistics
Random samples use chance to choose the sample
In a simple random sample (SRS):
Every individual in the population has the same
chance of being in the sample
Every sample of the same size has the same chance
of being chosen
To choose an SRS:
Use a table of random digits, or
Use software that produces random digits
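Python's standard `random` module is one way to let software do the choosing. The sketch below is an illustration and does not come from the slides. `choose_srs` is a hypothetical helper, and individuals are labeled 0 to population_size - 1. `random.sample` draws without replacement, so every individual and every set of the same size has the same chance of being chosen, which matches the SRS definition above.

```python
import random


def choose_srs(population_size, sample_size, seed=None):
    """Return a simple random sample of labels 0 .. population_size - 1.

    random.sample draws without replacement, so every individual, and every
    set of sample_size individuals, has the same chance of being chosen.
    """
    rng = random.Random(seed)
    return sorted(rng.sample(range(population_size), sample_size))


# Choose 4 of 100 individuals and show them as two-digit labels 00-99.
labels = choose_srs(100, 4, seed=1)
print([f"{label:02d}" for label in labels])
```

Passing a seed makes the draw repeatable. With no seed, each call gives a fresh random sample.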

4
Questions from last time?

5
Same-sex Marriage

Same-sex marriage is a controversial issue!
During 2004, same-sex marriage became a major news item when several cities performed same-sex marriages even though this was against the law
As a result, President Bush called on Congress to promptly pass a constitutional amendment that would ban same-sex marriage
Question: What do Americans really think about same-sex marriage?

6
Same-sex Marriage

The Gallup Poll decided to find out: between July 2003 and February 2004, it conducted a poll that asked
Would you favor or oppose a constitutional amendment that would define marriage as being between a man and a woman, thus barring marriages between gay and lesbian couples?
Result: the amendment is supported by a very slim majority of Americans; 51% of Americans support such an amendment and 45% oppose it
What does this result mean?

7
Same-sex Marriage

We need to read further. Gallup says:
Sample consists of 2,527 randomly selected adults
But according to the Census Bureau, there are 209 million adults in the United States. How can the opinion of only 2,527 reflect all 209 million?
A sample cannot give us the exact truth about the population, so the result comes with a margin of error
For results based on a sample of this size, one can say with 95% confidence that the error attributable to sampling and other random effects could be plus or minus 2 percentage points.
WHAT DOES THIS MEAN?

8
From Sample to Population

Gallup's finding is that such an amendment is supported by a very slim majority of Americans, 51%
Gallup makes this claim about all 209 million American adults, but they do not know the truth about those 209 million
They do know the truth about the 2,527 adults they contacted and talked to
That sample was chosen at random from the population of 209 million adults and therefore is representative of those 209 million (no bias)

9
From Sample to Population

What Gallup has done is take the fact that 51% of the sample support the amendment and turn it into an estimate that 51% of the population support the amendment
This is a basic concept in statistics: using a fact about a sample to produce an estimate of the truth about the whole population
There is a vocabulary to talk about this

10
From Sample to Population

Parameter is to population as statistic is to
sample.
Want to estimate an unknown parameter?
Choose a sample from the population and use a
sample statistic as your estimate.

11
Example 1: Do you favor a constitutional amendment?

p is a parameter whose value is the proportion of adults in the population who favor the amendment
This is what we are interested in
We do not know the real value of p
To estimate the value of p, Gallup took a sample of 2,527 adults
The proportion of adults in the sample who favor the amendment is a statistic whose value is an estimate of p
The name we give this statistic is $\hat{p}$, or "p-hat"

12
Example 1: Do you favor a constitutional amendment?

1,289 adults in the sample favored the amendment, so

$$\hat{p} = \frac{1289}{2527} = 0.51$$

That is, the value of the statistic p-hat is 0.51, or 51%
Because the 2,527 adults in the sample were chosen at random, it is reasonable to think we can use the value of the statistic p-hat as an estimate of the unknown parameter p
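A couple of lines of Python can check the p-hat arithmetic. The variable names are illustrative only.

```python
favor = 1289   # adults in the sample who favored the amendment
n = 2527       # sample size

p_hat = favor / n  # sample proportion, our estimate of the parameter p
print(f"p-hat = {p_hat:.4f} (about {p_hat:.0%})")
```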

13
Example 1: Do you favor a constitutional amendment?

The fact is that 51% of the sample favored the amendment
We do not know the percentage of adults in the population who favor the amendment, but because we have a representative sample, we estimate it to be 51% as well

14
Sampling Variability

What would happen if Gallup took a second sample of 2,527 adults?
Almost certainly a different number would support the amendment, perhaps 1,322, or maybe 1,016
Because we choose samples randomly, and randomness involves variability, repeatedly taking samples would yield a variety of values for the statistic p-hat, say 42%, 51%, and 67%
If the variation in p-hat among a large number of samples is too great, we cannot trust the results of any one sample

15
Sampling Variability

The first big advantage of random samples is that
they attack bias
The second is that if we take lots of random
samples of the same size from a population, the
variation from sample to sample follows a
predictable pattern
This predictable pattern shows that the results
from bigger samples are less variable than the
results from smaller samples

16
Example 2: Lots and lots of samples

How trustworthy are samples?
Let's compare many samples of two different sizes for the Gallup poll
Imagine that exactly half the adults in the population favor the amendment, so p = 0.5, or 50%
Now, what if Gallup used an SRS of size 100 to estimate p?
How is this different from using an SRS of size 2,527?

17
Example 2: Lots and lots of samples

1,000 SRSs of size 100
Histogram of the number of SRSs yielding each value of p-hat. Notice how dispersed (spread out) the histogram is, but also that it is centered around 0.50 (50%)

18
Example 2: Lots and lots of samples

1,000 SRSs of size 2,527
Notice the histogram is much sharper this time, concentrated right around 0.50 (50%)

19
Example 2: Lots and lots of samples

There is no bias with either the small or the big sample
Values of p-hat do not heap up anywhere except around the true value of p
The results from the small sample are more variable: they cover a range from about 0.40 to 0.59
The results from the big sample are less variable: they cover a range from about 0.4804 to 0.5204
The conclusion: both sample sizes give us an unbiased estimate of p, but the bigger sample almost always gives an estimate p-hat that is closer to the true value
This is true for any value of p, such as 0.40 or 0.65
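A simulation gives a rough reproduction of Example 2. This sketch assumes p = 0.5. It also treats the population as so large that each adult in an SRS favors the amendment independently with probability p, which is a close approximation when sampling 2,527 people from 209 million. The helper `simulate_p_hats` is hypothetical. The printed ranges depend on the random seed, so they will not match the slides' figures exactly.

```python
import random
import statistics


def simulate_p_hats(p, n, num_samples, rng):
    """Simulate p-hat for num_samples samples of size n.

    Assumption: the population is so large that an SRS is well approximated
    by letting each sampled adult favor independently with probability p.
    """
    p_hats = []
    for _ in range(num_samples):
        favor = sum(1 for _ in range(n) if rng.random() < p)
        p_hats.append(favor / n)
    return p_hats


rng = random.Random(2004)
for n in (100, 2527):
    p_hats = simulate_p_hats(0.5, n, 1000, rng)   # 1,000 SRSs of each size
    print(f"n = {n:5d}: mean p-hat = {statistics.mean(p_hats):.3f}, "
          f"range {min(p_hats):.3f} to {max(p_hats):.3f}")
```

Both means land near 0.5, so neither size is biased. The n = 2,527 range is far narrower.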

20
Sampling Variability
21
Sampling Variability

Let's think about the true value of the population parameter as a bull's-eye on a target and the sample statistic as an arrow fired at the bull's-eye
Bias and variability describe what happens when an archer fires many arrows at the target
Bias means the aim is consistently off
Variability means that repeated shots are widely scattered
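The archer picture can be made concrete by measuring two things for a sampling method. Bias is the average p-hat minus p. Variability is the standard deviation of p-hat. Everything in this sketch is a made-up illustration:

- a hypothetical population of 10,000, half of whom favor the amendment;
- a hypothetical "convenience" method that reaches only the first 7,000 people, 5,000 of whom are supporters.

The last row shows that a larger convenience sample is less scattered but still off target.

```python
import random
import statistics


def describe_method(draw_p_hat, p_true, trials, rng):
    """Estimate (bias, spread) of a sampling method's p-hat.

    bias   = average p-hat minus the true p  (is the aim consistently off?)
    spread = standard deviation of p-hat     (are the shots scattered?)
    """
    p_hats = [draw_p_hat(rng) for _ in range(trials)]
    return statistics.mean(p_hats) - p_true, statistics.stdev(p_hats)


# Hypothetical population of 10,000 adults: 1 = favors, 0 = opposes.
population = [1] * 5000 + [0] * 5000
p_true = sum(population) / len(population)   # 0.5
reachable = population[:7000]                # 5,000 supporters, 2,000 opponents


def srs(rng, n=100):
    # Random sample from the whole population.
    return sum(rng.sample(population, n)) / n


def convenience(rng, n=100):
    # Hypothetical biased method: only the first 7,000 people can be reached.
    return sum(rng.sample(reachable, n)) / n


rng = random.Random(21)
for name, method in [("SRS, n = 100", srs),
                     ("convenience, n = 100", convenience),
                     ("convenience, n = 1000", lambda r: convenience(r, n=1000))]:
    bias, spread = describe_method(method, p_true, 500, rng)
    print(f"{name:22s} bias = {bias:+.3f}  spread = {spread:.3f}")
```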

22
(No Transcript)
23
(No Transcript)
24
(No Transcript)
25
(No Transcript)
26
(No Transcript)
27
Sampling Variability
28
Sampling Variability

Gallup took only one sample
We cannot know how close to the truth the p-hat estimated from this sample is, because we don't know the truth
However, large random samples almost always give an estimate that is close to the truth

29
Margin of Error and all that

The margin of error translates sampling
variability into a statement of how much
confidence we can have in the results of a survey

What does this mean?

30
Margin of Error and all that

A random sample will usually not estimate the truth about the population exactly
We need a margin of error to tell us how close to the truth the estimate is likely to be
Although the estimate usually differs from the truth by less than the margin of error, we cannot be certain that it does not differ from the truth by more than the margin of error
With 95% confidence, 95% of all the samples we could draw differ from the truth by less than the margin of error, and 5% miss by more than the margin of error
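A simulation can check the 95% claim. This sketch assumes the quick margin of error 1/sqrt(n) and a very large population in which each sampled person favors with probability p. At p = 0.5 the standard deviation of p-hat is 0.5/sqrt(n), so 1/sqrt(n) is two standard deviations and roughly 95% of samples should land inside it. For p farther from 0.5 the standard deviation is smaller, so the quick rule is somewhat conservative. The helper `coverage` is hypothetical.

```python
import math
import random


def coverage(p, n, num_samples, rng):
    """Fraction of simulated samples whose p-hat is within 1/sqrt(n) of p.

    Assumption: a very large population, so each sampled person favors the
    amendment independently with probability p.
    """
    margin = 1 / math.sqrt(n)
    hits = 0
    for _ in range(num_samples):
        favor = sum(1 for _ in range(n) if rng.random() < p)
        if abs(favor / n - p) < margin:
            hits += 1
    return hits / num_samples


share = coverage(0.5, 2527, 1000, random.Random(95))
print(f"{share:.1%} of samples landed within the margin of error")
```

The printed share varies with the seed but typically lands close to 95%.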

31
Margin of Error and all that

Finding the exact margin of error is a job for statisticians
We will use a simple formula to get a rough idea of the size of a sample survey's margin of error when an SRS is used

32
Example 4: What is the margin of error?

What is the margin of error for the Gallup poll?
n = 2,527
The margin of error for 95% confidence is

$$\frac{1}{\sqrt{n}} = \frac{1}{\sqrt{2527}} = \frac{1}{50.27} \approx 0.020$$

That is about 2.0%, pretty much what Gallup announced
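The quick formula is a one-liner. `quick_margin_of_error` is a hypothetical name, and the rule applies only to 95% confidence and an SRS.

```python
import math


def quick_margin_of_error(n):
    """Rough 95% margin of error for a sample proportion from an SRS of size n."""
    return 1 / math.sqrt(n)


moe = quick_margin_of_error(2527)
print(f"n = 2,527: margin of error = {moe:.4f}, about {moe:.1%}")
```

For n = 2,527 this gives about 0.0199, which rounds to 2.0%.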

33
Example 5: Margin of error and sample size

In Example 2, we compared SRSs of size n = 100 and n = 2,527
The variability in p-hat for n = 100 was about 5 times as large as for n = 2,527
We saw that for n = 2,527, the margin of error for 95% confidence is about 2.0%. What about for n = 100?

$$\frac{1}{\sqrt{100}} = \frac{1}{10} = 0.10$$

That is about 10%
This is roughly 5 times the margin of error of the n = 2,527 sample
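The factor of 5 follows from the quick formula. Because the 95% margin of error is roughly 1/sqrt(n), the ratio of two margins is the square root of the ratio of the sample sizes:

$$\frac{1/\sqrt{100}}{1/\sqrt{2527}} = \sqrt{\frac{2527}{100}} = \sqrt{25.27} \approx 5.03$$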

34
Confidence Statements

Here is what Gallup says about its results:
The poll found that a very slim majority of Americans, 51%, favor such an amendment.
Here's a more informative statement:
We are 95% confident that between 49% and 53% of all adults favor such an amendment
These are both confidence statements that tell us roughly how close the estimate of p (that is, p-hat) is to the true value of p
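The second statement is just p-hat plus or minus the margin of error. Here is a sketch with a hypothetical helper, assuming the quick 1/sqrt(n) margin for 95% confidence:

```python
import math


def confidence_interval_95(p_hat, n):
    """Rough 95% confidence interval p_hat +/- 1/sqrt(n) for an SRS of size n."""
    margin = 1 / math.sqrt(n)
    return p_hat - margin, p_hat + margin


low, high = confidence_interval_95(0.51, 2527)
print(f"We are 95% confident that between {low:.0%} and {high:.0%} "
      "of all adults favor such an amendment")
```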

35
Confidence Statements

A confidence statement is a fact about what
happens in all possible samples, and is used to
say how much we can trust the result of one
sample
95% confidence means: we used a sampling method that gives a result this close to the truth 95% of the time

36
Confidence Statements: Rules of Thumb

The conclusion of a confidence statement always applies to the population, not to the sample
Our conclusion about the population is never completely certain
A sample survey can choose to use a confidence level other than 95%
Remember that our quick rule for the margin of error works only for 95% confidence

37
Confidence Statements: Rules of Thumb

It is usual to report the margin of error for 95% confidence
If you want a smaller margin of error with the same confidence, you must take a larger sample
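Solving the quick 95% rule for n shows how fast the required sample grows. This derivation holds under the 1/sqrt(n) rule only and is not an exact sample-size calculation:

$$m = \frac{1}{\sqrt{n}} \quad\Longrightarrow\quad n = \frac{1}{m^2}, \qquad \frac{1}{(m/2)^2} = \frac{4}{m^2}$$

Halving the margin of error therefore takes four times as many people. For m = 0.02, n = 1/0.0004 = 2,500.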

38
Sampling from Large Populations

Gallup's sample of 2,527 adults is only 1 of every 82,700 adults in the US
Does it matter whether 2,527 is 1 in 100 or 1 in 82,700?

39
Sampling from Large Populations

Why doesn't population size matter?
Imagine using a shovel to sample corn kernels from either a truck full of corn or a bag full of corn
The kernels are well mixed (to ensure a random sample)
The shovel doesn't know or care whether it is shoveling from the truck or the bag
The variability in each shovelful of corn depends on the size of the shovel, not the size of the container it's shoveling from
This is great news for those who study large populations!
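The corn-shovel argument can be checked numerically. In this hypothetical sketch, half of each population favors the amendment. We draw 2,000 SRSs of size 100 (the "shovel") from a population of 10,000 (the "bag") and from one of 1,000,000 (the "truck"). The spread of p-hat comes out at about 0.05 for both. It depends on n, not on the population size, as long as the sample is a small fraction of the population. The helper `spread_of_p_hat` is illustrative only.

```python
import random
import statistics


def spread_of_p_hat(population_size, n, num_samples, rng):
    """Standard deviation of p-hat over many SRSs of size n.

    Hypothetical population: individuals with labels below
    population_size // 2 favor the amendment, so exactly half favor it.
    """
    half = population_size // 2
    p_hats = []
    for _ in range(num_samples):
        sample = rng.sample(range(population_size), n)   # SRS of labels
        p_hats.append(sum(1 for label in sample if label < half) / n)
    return statistics.stdev(p_hats)


rng = random.Random(39)
for size in (10_000, 1_000_000):   # a "bag" and a "truck" of corn
    print(f"population {size:>9,}: spread of p-hat = "
          f"{spread_of_p_hat(size, 100, 2000, rng):.4f}")
```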

40
Sampling from Large Populations

Random samples of size 1,000 or more are large enough to give small margins of error and can still properly represent very large populations
Keep in mind that even very large voluntary response or convenience samples are worthless because of bias
Taking a large sample DOES NOT fix bias
Remember that the margin of error depends on the sample size, not the population size
This is not good news for people who study small populations!

41
Sampling from Large Populations

It always takes a sample size of roughly 2,500 to get a margin of error of 2% (with 95% confidence)
So, if you want to answer a question about the 10,000 students at the University of Smallsville with a margin of error of 2%, you will have to interview a quarter of all the students
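Inverting the quick rule gives the sample size for a target margin. The hypothetical helper below works in percentage points so that the arithmetic stays exact. It returns the smallest n whose quick margin, 100/sqrt(n) points, is at most the target.

```python
import math


def quick_sample_size(margin_pct):
    """Smallest n whose quick 95% margin of error, 100/sqrt(n) percentage
    points, is at most margin_pct."""
    return math.ceil((100 / margin_pct) ** 2)


n_needed = quick_sample_size(2)    # target: 2-point margin of error
print(n_needed)                    # 2500
print(n_needed / 10_000)           # 0.25 of Smallsville's 10,000 students
```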

42
Summary

The purpose of sampling is to gain information
about a population
We often use a sample statistic to estimate the
value of a population parameter
To describe how trustworthy a single sample is, ask: What would happen if we took a large number of samples from the same population?
If almost all samples give a result that is very close to the truth, then we can trust our one sample, even though we can't be certain that it is close to the truth

43
Summary

In planning a survey:
Avoid bias by using a random sampling technique
Choose a large enough sample to reduce the variability of the result
Using a large random sample ensures that almost all samples will give an accurate result
Use a confidence statement to say how accurate the result is

44
Summary

Most frequently, the margin of error is all that is mentioned
Usually this margin of error corresponds to 95% confidence
That is, if we chose many samples, the truth about the population would be within the margin of error of the sample result 95% of the time

45
Summary

We can estimate the margin of error for 95% confidence for an SRS of size n with the formula

$$\text{margin of error} \approx \frac{1}{\sqrt{n}}$$

As the formula suggests, the margin of error depends only on the sample size, not on the size of the population

46
Exercise 3.5

A sampling experiment. The n = 100 and n = 2,527 examples show how the sample proportion p-hat behaves when we take many samples from the same population. You can follow the steps in this process on a small scale.
The figure on the following slide represents a small population. Each circle represents an adult. The white circles are people who favor a constitutional amendment that would define marriage as between a man and a woman, and the colored circles are people who are opposed. You can check that 50 of the 100 circles are white, so the population proportion in favor is p = 50/100 = 0.5

47
(No Transcript)
48
Exercise 3.5

The circles are labeled 00, 01, ..., 99. Use line 101 of Table A to draw an SRS of size 4. What is the proportion p-hat of the people in your sample who favor the constitutional amendment?
Take 9 more SRSs of size 4 (10 in all), using lines 102 to 110 of Table A, a different line for each sample. You now have 10 values of the sample proportion p-hat.
Because your samples have only 4 people, the only values p-hat can take are 0/4, 1/4, 2/4, 3/4, and 4/4. That is, p-hat is always 0, 0.25, 0.5, 0.75, or 1. Mark these numbers on a line and make a histogram of your 10 results by putting a bar above each number to show how many samples had that outcome.
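Software can stand in for Table A. This sketch rests on assumptions:

- circles 00 to 49 are taken to be the white ones, a hypothetical labeling rather than the figure's actual arrangement;
- Python's `random` module replaces lines 101 to 110, so the draws will not match the solution's samples.

It draws 10 SRSs of size 4 and prints a text histogram over the five possible values of p-hat.

```python
import random
from collections import Counter

# Assumption: circles 00-49 are white (favor), 50-99 are colored (oppose).
# This is a hypothetical labeling, not the layout of the figure in the slides.
WHITE = set(range(50))


def p_hat_of_srs(rng, size=4):
    """Draw an SRS of `size` circles and return the proportion that are white."""
    sample = rng.sample(range(100), size)
    return sum(1 for label in sample if label in WHITE) / size


rng = random.Random(101)
results = [p_hat_of_srs(rng) for _ in range(10)]   # 10 SRSs of size 4
counts = Counter(results)

# Text histogram: one '#' per sample at each possible value of p-hat.
for value in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"{value:4.2f} | {'#' * counts[value]}")
```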

49
Exercise 3.5

Taking samples of size 4 from a population of size 100 is not a practical setting, but let's look at your results anyway. How many of your 10 samples estimated the population proportion p = 0.5 exactly correctly? Is the true value 0.5 in the center of your sample values? Explain why 0.5 would be in the center of the sample values if you took a large number of samples.
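A quick simulation addresses the last question. It assumes a hypothetical labeling in which circles 00 to 49 are white, and it shows the center settling near 0.5 once the number of samples is large.

```python
import random
import statistics

rng = random.Random(3)
# Assumption: circles 00-49 are white (favor); hypothetical labeling.
# Each entry is p-hat for one SRS of 4 circles drawn from 00-99.
p_hats = [sum(1 for label in rng.sample(range(100), 4) if label < 50) / 4
          for _ in range(10_000)]
print(f"average p-hat over 10,000 samples: {statistics.mean(p_hats):.3f}")
```

No finite run gives exactly 0.5. Still, random sampling has no systematic tendency to over- or under-count white circles, so the average settles near p.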

50
Solution to Exercise 3.5

Starting at line 101, we choose 19, 22, 39, and 50. Two of these circles are white, so p-hat = 0.5.
The table (next slide) shows all ten samples, indicating which circles are shaded.
Histogram on the next slide.
Four were exactly correct. For this small number of samples, the center seems to be a bit higher than 0.5. In a large number of samples, 0.5 should be in the center because random sampling is unbiased.

51
Solution to Exercise 3.5
