What Do Samples Tell Us - PowerPoint PPT Presentation

Slides: 52
Provided by: samuel59
1
What Do Samples Tell Us?
  • Chapter 3

2
Review: Samples, Good and Bad
  • We select a sample to get information about a
    population
  • We want a sample that fairly represents the
    population
  • Convenience samples and voluntary response
    samples are common but do not produce
    trustworthy, representative results because they
    are usually biased
  • Bias is the systematic favoring of one part of
    the population (and their opinions) over other
    parts of the population

3
Review: Samples, Good and Bad
  • Using chance to choose a sample is one of the
    fundamental ideas of statistics
  • Random samples use chance to choose the sample
  • In a simple random sample (SRS):
  • Every individual in the population has the same
    chance of being in the sample
  • Every sample of the same size has the same chance
    of being chosen
  • To choose an SRS:
  • Use a table of random digits, or
  • Use software that produces random digits
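The software route can be sketched in Python. This is a minimal illustration, assuming the population is just a list of labels; the function name `choose_srs`, the seed, and the 100-label population are hypothetical and not taken from the slides.

```python
import random

def choose_srs(population, n, seed=None):
    """Return a simple random sample of n individuals, drawn without replacement."""
    rng = random.Random(seed)
    # random.sample gives every subset of size n the same chance of being chosen
    return rng.sample(population, n)

# Hypothetical population: 100 individuals labeled 0..99
population = list(range(100))
sample = choose_srs(population, 5, seed=1)
print(sample)
```

Fixing the seed only makes the draw repeatable for demonstration; leave it as `None` for a fresh sample each time.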

4
  • Questions from last time?

5
Same-sex Marriage
  • Same-sex marriage is a controversial issue!
  • During 2004, same-sex marriage became a major news
    item when several cities performed same-sex
    marriages even though this was against the law
  • As a result, President Bush called on Congress to
    promptly pass a law that effectively banned
    same-sex marriage
  • Question: What do Americans really think about
    same-sex marriage?

6
Same-sex Marriage
  • The Gallup Poll decided to find out. Between July
    2003 and February 2004 they conducted a poll that
    asked:
  • "Would you favor or oppose a constitutional
    amendment that would define marriage as being
    between a man and a woman, thus barring
    marriages between gay and lesbian couples?"
  • Result: supported by a very slim majority of
    Americans; 51% of Americans support such an
    amendment, 45% oppose
  • What does this result mean?

7
Same-sex Marriage
  • We need to read further. Gallup says:
  • The sample consists of 2,527 randomly selected
    adults
  • But according to the Census Bureau there are
    209 million adults in the United States. How can
    the opinions of only 2,527 reflect all 209
    million?
  • A sample cannot give us the exact truth about the
    population, so the result comes with a margin of
    error
  • "For results based on a sample of this size, one
    can say with 95% confidence that the error
    attributable to sampling and other random effects
    could be plus or minus 2 percentage points."
  • What does this mean?

8
From Sample to Population
  • Gallup's finding is that such an amendment is
    supported by a very slim majority of Americans,
    51%
  • Gallup makes this claim about the 209
    million adults who are Americans, but they do not
    know the truth about those 209 million
  • Rather, they know the truth about the 2,527
    adults that they contacted and talked to
  • That sample was chosen at random from the
    population of 209 million adults and therefore is
    representative of those 209 million (no bias)

9
From Sample to Population
  • What Gallup has done is take the fact that 51%
    of the sample support the amendment and turn it
    into an estimate that 51% of the population
    support the amendment
  • This is a basic concept in statistics: using a
    fact about a sample to produce an estimate of
    the truth about the whole population
  • There is a vocabulary for talking about this

10
From Sample to Population
  • Parameter is to population as statistic is to
    sample.
  • Want to estimate an unknown parameter?
  • Choose a sample from the population and use a
    sample statistic as your estimate.

11
Example 1: Do you favor a constitutional
amendment?
  • p is a parameter whose value is the proportion of
    adults in the population who favor the amendment
  • This is what we are interested in
  • We do not know the real value of p
  • To estimate the value of p, Gallup took a sample
    of 2,527 adults
  • The proportion of adults in the sample who favor
    the amendment is a statistic whose value is an
    estimate of p
  • The name we give this statistic is $\hat{p}$, or
    p-hat

12
Example 1: Do you favor a constitutional
amendment?
  • 1,289 adults in the sample favored the amendment,
    so $\hat{p} = 1289/2527 \approx 0.51$
  • That is, the value of the statistic p-hat is 51%
  • Because the 2,527 adults in the sample were
    chosen at random, it is reasonable to think we
    can use the value of the statistic p-hat as an
    estimate of the unknown parameter p
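The arithmetic behind p-hat can be checked in a few lines of Python. The counts come from the slide; the variable names are only illustrative.

```python
favor = 1289   # adults in the sample who favored the amendment
n = 2527       # sample size

# The sample proportion p-hat is the count in favor divided by the sample size
p_hat = favor / n
print(f"p-hat = {p_hat:.4f} (about {p_hat:.0%})")  # p-hat = 0.5101 (about 51%)
```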

13
Example 1: Do you favor a constitutional
amendment?
  • The fact is that 51% of the sample favored the
    amendment
  • We do not know the percentage of adults in the
    population that favor the amendment, but because
    we have a representative sample, we estimate that
    to be 51% as well

14
Sampling Variability
  • What would happen if Gallup took a second sample
    of 2,527 adults?
  • Almost certainly a different number would support
    the amendment, perhaps 1,322, or maybe 1,016
  • Because we choose samples randomly and this
    involves variability, repeatedly taking samples
    would yield a variety of values for the p-hat
    statistic, say 42%, 51% and 67%
  • If the variation in p-hat among a large number of
    samples is too great, we cannot trust the results
    of any one sample

15
Sampling Variability
  • The first big advantage of random samples is that
    they attack bias
  • The second is that if we take lots of random
    samples of the same size from a population, the
    variation from sample to sample follows a
    predictable pattern
  • This predictable pattern shows that the results
    from bigger samples are less variable than the
    results from smaller samples

16
Example 2: Lots and lots of samples
  • How trustworthy are samples?
  • Let's compare many samples of two different sizes
    for the Gallup poll
  • Imagine that exactly half the adults in the
    population favor the amendment, so p = 0.5, or
    50%
  • Now, what if Gallup used an SRS of size 100 to
    estimate p?
  • How is this different from using an SRS of size
    2,527?

17
Example 2: Lots and lots of samples
  • 1,000 SRSs of size 100
  • Histogram of the number of SRSs yielding a given
    value of p-hat. Notice how dispersed (spread out)
    the histogram is, but that it is centered around
    0.50 (50%)

18
Example 2: Lots and lots of samples
  • 1,000 SRSs of size 2,527
  • Notice the histogram is much sharper this time,
    concentrated right around 0.50 (50%)

19
Example 2: Lots and lots of samples
  • There is no bias with either the small or the big
    sample
  • There is no heaping of values of p-hat anywhere
    else but around the true value of p
  • The results from the small sample are more
    variable:
    they cover a range from about 0.40 to 0.59
  • The results from the big sample are less
    variable:
    they cover a range from about 0.4804 to 0.5204
  • The conclusion: both sample sizes give us an
    unbiased estimate of p, but the bigger sample
    almost always gives an estimate p-hat that is
    closer to the true value
  • This is true for any value of p, such as 0.40 or
    0.65
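The comparison in Example 2 can be imitated with a short simulation. This is a sketch under one stated assumption: each respondent is treated as an independent draw that favors the amendment with probability p, which approximates an SRS from a very large population. The function name `simulate_p_hats` and the seed are hypothetical, and the simulated ranges will not match the slide's histograms exactly.

```python
import random
import statistics

def simulate_p_hats(p, n, num_samples, seed=0):
    """Draw num_samples samples of size n and return the p-hat from each."""
    rng = random.Random(seed)
    p_hats = []
    for _ in range(num_samples):
        # Assumption: each respondent independently favors with probability p
        favor = sum(rng.random() < p for _ in range(n))
        p_hats.append(favor / n)
    return p_hats

small = simulate_p_hats(0.5, 100, 1000)    # 1,000 samples of size 100
large = simulate_p_hats(0.5, 2527, 1000)   # 1,000 samples of size 2,527

for label, p_hats in (("n = 100", small), ("n = 2,527", large)):
    print(label,
          "mean:", round(statistics.mean(p_hats), 3),
          "min:", round(min(p_hats), 3),
          "max:", round(max(p_hats), 3))
```

Both sets of p-hat values should center near 0.5, with the n = 100 values spread over a much wider range than the n = 2,527 values.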

21
Sampling Variability
  • Let's think about the true value of the
    population parameter as a bull's-eye on a target
    and the sample statistic as an arrow fired at the
    bull's-eye
  • Bias and variability describe what happens when
    an archer fires many arrows at the target
  • Bias means the aim is consistently off
  • Variability means that repeated shots are widely
    scattered

28
Sampling Variability
  • Gallup only took one sample
  • We cannot know how close to the truth the p-hat
    estimated from this sample is, because we don't
    know the truth
  • However, it is true that large random samples
    almost always give an estimate that is close to
    the truth

29
Margin of Error and all that
  • The margin of error translates sampling
    variability into a statement of how much
    confidence we can have in the results of a survey
  • What does this mean?

30
Margin of Error and all that
  • A random sample will usually not estimate the
    truth about the population exactly
  • We need a margin of error to tell us how close
    to the truth the estimate is
  • Although the estimate usually differs from the
    truth by less than the margin of error, we cannot
    be certain that it does not differ from the truth
    by more than the margin of error
  • 95% of the samples we draw differ from the truth
    by less than the margin of error; 5% miss by more
    than the margin of error

31
Margin of Error and all that
  • Finding the exact margin of error is a job for
    statisticians
  • We will use a simple formula to get a rough idea
    of the size of a sample survey's margin of error
    when an SRS is used

32
Example 4: What is the margin of error?
  • What is the margin of error for the Gallup poll?
  • n = 2,527
  • The margin of error for 95% confidence is
    $1/\sqrt{n} = 1/\sqrt{2527} \approx 0.0199$
  • That is about 2.0%, pretty much what Gallup
    announced
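The quick calculation can be written as a small Python function. This sketch assumes the rough rule for an SRS is 1/sqrt(n), which is consistent with the 2.0% and 10% figures quoted in these examples; the function name `quick_margin_of_error` is hypothetical.

```python
import math

def quick_margin_of_error(n):
    """Rough 95% margin of error for an SRS of size n, assuming the 1/sqrt(n) rule."""
    return 1 / math.sqrt(n)

moe = quick_margin_of_error(2527)
print(f"{moe:.4f}")  # 0.0199, i.e. about 2.0 percentage points
```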

33
Example 5: Margin of error and sample size
  • In Example 2, we compared SRSs of size n = 100
    and n = 2,527
  • The variability in p-hat for n = 100 was about 5
    times as large as for n = 2,527
  • We saw that for n = 2,527, the margin of error
    for 95% confidence is about 2.0%. What about for
    n = 100?
  • $1/\sqrt{100} = 0.10$, that is, about 10%
  • This is roughly 5 times the margin of error of
    the n = 2,527 sample
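The factor of roughly 5 can be checked directly. The sketch below again assumes the 1/sqrt(n) quick rule; the variable names are illustrative only.

```python
import math

sizes = [100, 2527]
# Quick-rule margin of error for each sample size (assumed rule: 1/sqrt(n))
margins = {n: 1 / math.sqrt(n) for n in sizes}

ratio = margins[100] / margins[2527]
for n in sizes:
    print(f"n = {n}: margin of error about {margins[n]:.1%}")
print(f"ratio: {ratio:.2f}")
```

Because the margin shrinks with the square root of n, a sample about 25 times larger cuts the margin of error by a factor of only about 5.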

34
Confidence Statements
  • Here is what Gallup says about their results:
  • "The poll found that a very slim majority of
    Americans, 51%, favor such an amendment."
  • Here's a more informative statement:
  • "We are 95% confident that between 49% and 53% of
    all adults favor such an amendment"
  • These are both confidence statements that tell us
    roughly how close the estimate of p (that is,
    p-hat) is to the true value of p
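The 49% to 53% range is the sample proportion plus or minus the margin of error. A minimal sketch, assuming the 1/sqrt(n) quick rule for the margin and using the counts given in Example 1:

```python
import math

favor, n = 1289, 2527
p_hat = favor / n              # sample proportion, about 0.51
margin = 1 / math.sqrt(n)      # assumed quick rule, about 0.02

low, high = p_hat - margin, p_hat + margin
print(f"{low:.0%} to {high:.0%}")  # 49% to 53%
```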

35
Confidence Statements
  • A confidence statement is a fact about what
    happens in all possible samples, and is used to
    say how much we can trust the result of one
    sample
  • 95% confidence means:
  • We used a sampling method that gives a result
    this close to the truth 95% of the time
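What "95% of the time" means can be illustrated by simulation: draw many samples from a population whose true p is known and count how often p-hat lands within the margin of error. This is a sketch with assumptions: respondents are modeled as independent draws with probability p, the margin is the 1/sqrt(n) quick rule, and the function name `coverage` is hypothetical.

```python
import math
import random

def coverage(p, n, num_samples, seed=0):
    """Fraction of simulated samples whose p-hat is within 1/sqrt(n) of the true p."""
    rng = random.Random(seed)
    margin = 1 / math.sqrt(n)
    hits = 0
    for _ in range(num_samples):
        # Assumption: each respondent independently favors with probability p
        favor = sum(rng.random() < p for _ in range(n))
        if abs(favor / n - p) <= margin:
            hits += 1
    return hits / num_samples

rate = coverage(p=0.5, n=100, num_samples=2000)
print(f"{rate:.1%} of samples landed within the margin of error")
```

The quick rule is slightly generous, so the observed rate tends to come out at or a little above 95%.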

36
Confidence Statements: Rules of Thumb
  • The conclusion of a confidence statement always
    applies to the population, not to the sample
  • Our conclusion about the population is never
    completely certain
  • A sample survey can choose to use a confidence
    level other than 95%
  • Remember that our quick rule only works for a
    confidence level of 95%

37
Confidence Statements: Rules of Thumb
  • It is usual to report the margin of error for 95%
    confidence
  • If you want a smaller margin of error with the
    same confidence, you must take a larger sample

38
Sampling from Large Populations
  • Gallup's sample of 2,527 adults is only 1 of
    every 82,700 adults in the US
  • Does it matter whether the sample is 1 in 100 or
    1 in 82,700 of the population?

39
Sampling from Large Populations
  • Why doesn't population size matter?
  • Imagine sampling corn kernels from a truck full
    of corn or a bag full of corn using a shovel
  • The kernels are well mixed (to ensure a random
    sample)
  • The shovel doesn't care or know whether it is
    shoveling from the truck or the bag
  • The variability in each shovelful of corn
    depends on the size of the shovel, not the size
    of the container it's shoveling from
  • This is great news for those who study large
    populations!
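The truck-versus-bag idea can be tested with a small simulation: take SRSs of the same size from a small and a large population in which half the individuals favor the amendment, and compare the spread of p-hat. The population sizes, the function name `p_hat_spread`, and the seed are hypothetical choices for this sketch.

```python
import random
import statistics

def p_hat_spread(population_size, n, num_samples, seed=0):
    """Std. dev. of p-hat over repeated SRSs from a population where half favor."""
    rng = random.Random(seed)
    cutoff = population_size // 2   # individuals 0..cutoff-1 are the ones in favor
    p_hats = []
    for _ in range(num_samples):
        sample = rng.sample(range(population_size), n)  # SRS without replacement
        p_hats.append(sum(i < cutoff for i in sample) / n)
    return statistics.pstdev(p_hats)

bag = p_hat_spread(10_000, 100, 1000)       # the "bag"
truck = p_hat_spread(1_000_000, 100, 1000)  # the "truck"
print(f"bag: {bag:.3f}  truck: {truck:.3f}")
```

With the same "shovel" size of 100, the two spreads should be nearly equal even though one population is 100 times larger than the other.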

40
Sampling from Large Populations
  • Random samples of size 1,000 or more are large
    enough to give small margins of error and can
    still properly represent very large populations
  • Keep in mind that even very large voluntary
    response or convenience samples are worthless
    because of bias
  • Taking a large sample DOES NOT fix bias
  • This is not good news for people who study small
    populations!
  • Remember: the margin of error depends on the
    sample size, not the population size

41
Sampling from Large Populations
  • It always takes a sample size of roughly 2,500 to
    get a margin of error of 2% (with 95% confidence)
  • So, if you want to answer a question about the
    10,000 students at the University of Smallsville,
    and you want a margin of error of 2%, you will
    have to interview a quarter of all the students
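The figure of roughly 2,500 follows from turning the quick rule around: if the margin of error is 1/sqrt(n), then n = (1/margin)^2. A minimal sketch under that assumption, with illustrative variable names:

```python
margin = 0.02                   # desired margin of error: 2 percentage points
n_needed = (1 / margin) ** 2    # invert the assumed quick rule margin = 1/sqrt(n)

students = 10_000               # University of Smallsville enrollment, from the slide
fraction = n_needed / students

print(round(n_needed))          # 2500
print(f"{fraction:.0%} of the student body")
```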

42
Summary
  • The purpose of sampling is to gain information
    about a population
  • We often use a sample statistic to estimate the
    value of a population parameter
  • To describe how trustworthy a single sample is,
    ask "What would happen if we took a large number
    of samples from the same population?"
  • If almost all samples give a result that is very
    close to the truth, then we can trust our one
    sample, even though we can't be certain that it
    is close to the truth

43
Summary
  • In planning a survey
  • Avoid bias by using a random sampling technique
  • Choose a large enough sample to reduce the
    variability of the result
  • Using a large random sample guarantees that almost
    all samples will give an accurate result
  • Use a confidence statement to say how accurate
    the result is

44
Summary
  • Most frequently, the margin of error is all that
    is mentioned
  • Usually this margin of error corresponds to 95%
    confidence
  • That is, if we chose many samples, the truth about
    the population would be within the margin of
    error of the sample result 95% of the time

45
Summary
  • We can estimate the margin of error for 95%
    confidence for an SRS of size n with the formula
    $1/\sqrt{n}$
  • As the formula suggests, the margin of error
    depends only on the sample size, not on the size
    of the population

46
Exercise 3.5
  • A sampling experiment. The n = 100 and n = 2,527
    examples show how the sample proportion p-hat
    behaves when we take many samples from the same
    population. You can follow the steps in this
    process on a small scale.
  • The figure on the following slide represents a
    small population. Each circle represents an
    adult. The white circles are people who favor a
    constitutional amendment that would define
    marriage as between a man and a woman, and the
    colored circles are people who are opposed. You
    can check that 50 of the 100 circles are white,
    so the population proportion in favor is p =
    50/100 = 0.5

47
(Figure: population of 100 circles, white or colored; not included in the transcript)
48
Exercise 3.5
  • The circles are labeled 00, 01, ..., 99. Use line
    101 of Table A to draw an SRS of size 4. What is
    the proportion p-hat of the people in your sample
    who favor the constitutional amendment?
  • Take 9 more SRSs of size 4 (10 in all), using
    lines 102 to 110 of Table A, a different line for
    each sample. You now have 10 values of the
    sample proportion p-hat.
  • Because your samples have only 4 people, the only
    values p-hat can take are 0/4, 1/4, 2/4, 3/4, and
    4/4. That is, p-hat is always 0, 0.25, 0.5, 0.75
    or 1. Mark these numbers on a line and make a
    histogram of your 10 results by putting a bar
    above each number to show how many samples had
    that outcome.
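The same experiment can be run with software in place of Table A. The sketch below is only an illustration: the figure is not available here, so it assumes a hypothetical labeling in which circles 00 to 49 are white and 50 to 99 are colored, and the seed is arbitrary. Results will differ from those obtained with Table A.

```python
import random
from collections import Counter

rng = random.Random(3)
# Hypothetical labeling: circles 00-49 are white (favor), 50-99 are colored (oppose)
white = set(range(50))

p_hats = []
for _ in range(10):
    srs = rng.sample(range(100), 4)   # SRS of size 4 from labels 00..99
    p_hats.append(sum(label in white for label in srs) / 4)

# Text histogram: one '#' per sample that produced each possible value of p-hat
counts = Counter(p_hats)
for value in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"{value:<5}{'#' * counts[value]}")
```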

49
Exercise 3.5
  • Taking samples of size 4 from a population of
    size 100 is not a practical setting, but let's
    look at your results anyway. How many of your 10
    samples estimated the population proportion p =
    0.5 exactly correctly? Is the true value 0.5 in
    the center of your sample values? Explain why
    0.5 would be in the center of the sample values
    if you took a large number of samples.

50
Solution to Exercise 3.5
  • Starting at line 101, we choose 19, 22, 39, and
    50. Two of these circles are white, so p-hat =
    0.5.
  • The table (next slide) shows all ten samples,
    indicating which circles are shaded.
  • Histogram on the next slide.
  • Four were exactly correct. For this small number
    of samples, the center seems to be a bit higher
    than 0.5. In a large number of samples, 0.5
    should be in the center because random
    sampling is unbiased.

51
Solution to Exercise 3.5