Introduction to Inference Chapter 6

Slides: 34
Provided by: artwar

Transcript and Presenter's Notes


1
Introduction to Inference: Chapter 6
2
Statistical Inference
  • Want to draw conclusions based on sample data
  • i.e.-- say something about an entire population
    based on information in a sample.
  • Conclusions are subject to sampling error
  • We want to quantify the margin of error we are
    likely to encounter.
  • Examples
  • 1. I'm interested in estimating the mean
    income of loggers in the interior of BC.
  • 2. Gallup poll.

3
Confidence intervals
  • Our goal is to obtain an estimate of some
    parameter of a population.
  • We want our estimate to be of the general form
  • Best guess ± error of estimation
  • To get our estimate, we take a random sample from
    the population and proceed on the basis of the
    information we obtain from the sample.
  • What is our best guess?
  • How do we obtain our error of estimation?

4
Confidence intervals for population mean
  • Best Guess
  • Our best guess is xbar, the sample mean.
  • How to determine the estimation error?

5
Determining the estimation error
  • To determine the estimation error, we'll use our
    knowledge of the sampling distribution of xbar.
  • We know that xbar has mean µ and standard
    deviation σ/sqrt(n), where n is our sample
    size.
  • If our original population is normal, we also
    know that xbar is normally distributed.
  • Even if the original population isn't normal,
    xbar approximately follows a normal distribution
    if the sample size, n, is large (by the CLT).

6
Determining the error of estimation -- continued
  • For example, we are about 95% sure that xbar lies
    in the range
  • µ ± 2σ/sqrt(n). (*)
  • E.g., if σ = 45 and n = 100, we are 95% sure
    that xbar lies in the range µ ± 9. (verify).
  • How do we use (*) to get our confidence interval?

7
A confidence interval for population mean
  • It turns out that our 95% c.i. for µ is just
  • xbar ± 2σ/sqrt(n).
  • Why does this work?
  • From our picture (to be added in class), we see
    that our interval will trap µ approximately
    95 percent of the time.
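
In place of the picture, the "traps µ about 95 percent of the time" claim can be checked by simulation. The sketch below is illustrative only: it assumes a normal population, reuses σ = 45 and n = 100 from the previous slide, and picks µ = 500, the number of trials, and the seed arbitrarily.

```python
import random
from math import sqrt
from statistics import fmean

def coverage(mu=500.0, sigma=45.0, n=100, trials=2000, seed=1):
    """Fraction of simulated intervals xbar ± 2*sigma/sqrt(n) that contain mu."""
    rng = random.Random(seed)
    margin = 2 * sigma / sqrt(n)          # 2 * 45 / 10 = 9
    hits = 0
    for _ in range(trials):
        # Draw one sample of size n and compute its mean
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / trials

rate = coverage()
print(f"margin = {2 * 45 / sqrt(100):.1f}, coverage = {rate:.3f}")
```

The reported coverage should land near 0.95; it varies slightly with the seed and the number of trials.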

8
A numerical example
  • I collect an SRS of n = 100 loggers. I find that the
    average income in the sample is 17,000.
  • Obtain a 95% c.i. for the mean income of all
    loggers. Assume that the std. dev. of loggers'
    incomes is known to be σ = 2500.
  • (In Ch 7 we will drop the assumption that σ is
    known -- σ will be estimated from the sample data)
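
A minimal sketch of the logger calculation, assuming the known standard deviation of 2500 given above. It shows both the rule-of-thumb multiplier 2 used on the slides and the exact 95% multiplier (about 1.96) from the standard normal distribution; the variable names are chosen here for illustration.

```python
from math import sqrt
from statistics import NormalDist

xbar, sigma, n = 17_000, 2_500, 100

# Rule-of-thumb multiplier 2, as on the slides
m_rule = 2 * sigma / sqrt(n)
# Exact 95% multiplier: the 97.5th percentile of the standard normal
z = NormalDist().inv_cdf(0.975)
m_exact = z * sigma / sqrt(n)

print(f"rule of 2: {xbar - m_rule:.0f} to {xbar + m_rule:.0f}")
print(f"z = {z:.3f}: {xbar - m_exact:.0f} to {xbar + m_exact:.0f}")
```

The two intervals are roughly 16,500 to 17,500 and 16,510 to 17,490, so the rule of 2 is slightly conservative.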

9
Level C confidence intervals
10
Tradeoffs
  • For a given sample size, higher level of
    confidence leads to wider confidence interval.
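
The tradeoff can be seen numerically. This sketch holds the sample size fixed and computes the margin of error at three confidence levels; the values σ = 2500 and n = 100 are borrowed from the logger example purely for illustration, and `margin` is a helper name introduced here.

```python
from math import sqrt
from statistics import NormalDist

def margin(conf_level, sigma, n):
    """Margin of error z*sigma/sqrt(n) for a level-C interval with known sigma."""
    z = NormalDist().inv_cdf((1 + conf_level) / 2)
    return z * sigma / sqrt(n)

# Same sigma and n throughout; only the confidence level changes
margins = {c: margin(c, 2500, 100) for c in (0.90, 0.95, 0.99)}
for c, m in margins.items():
    print(f"C = {c:.0%}: margin = {m:.1f}")
```

The margin grows from about 411 at 90% to about 644 at 99%.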

11
Designing a confidence interval
  • Suppose I want a level C confidence interval of
    the form
  • xbar ± m,
  • where m is a desired margin of error that I
    supply in advance. I also specify C in advance.
  • Required sample size is
  • n = (zσ/m)^2
  • where z is the z value required for level C
    confidence
  • Where does this formula come from?
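
One way to see where the formula comes from: the margin of error of a level C interval is z times the standard deviation of xbar, so set that equal to the desired m and solve for n. In practice n is then rounded up to the next whole number.

```latex
\[
  z\,\frac{\sigma}{\sqrt{n}} = m
  \;\Longrightarrow\;
  \sqrt{n} = \frac{z\sigma}{m}
  \;\Longrightarrow\;
  n = \left(\frac{z\sigma}{m}\right)^{2}.
\]
```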

12
Example--Designing a confidence interval
  • Recall the logger example. I have σ = 2500.
    Suppose I want a 99% c.i. of the form xbar ±
    300. How big a sample do I need?
  • What if I want a 99% confidence interval of the form
    xbar ± 150?
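
A short sketch of both sample-size calculations, using the sample-size formula from the previous slide with the exact 99% z value (about 2.576). The helper name `sample_size` is introduced here for illustration.

```python
from math import ceil
from statistics import NormalDist

def sample_size(conf_level, sigma, m):
    """Smallest n with z*sigma/sqrt(n) <= m, i.e. (z*sigma/m)^2 rounded up."""
    z = NormalDist().inv_cdf((1 + conf_level) / 2)
    return ceil((z * sigma / m) ** 2)

n_300 = sample_size(0.99, 2500, 300)
n_150 = sample_size(0.99, 2500, 150)
print(n_300, n_150)
```

This gives 461 loggers for a margin of 300 and 1844 for a margin of 150: halving the margin of error roughly quadruples the required sample size.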

13
Example 6.12 (Text)
  • A study of career paths of hotel general managers
    sent questionnaires to a SRS of 160 hotels
    belonging to major US hotel chains. There were
    114 responses. The average time these 114
    general managers had spent with their current
    company was 11.78 years. Give a 99% c.i. for the
    mean number of years general managers of
    major-chain hotels have spent with their current
    company. (Take it as known that the std dev of
    time with the company for general managers is 3.2
    years).
  • A margin of error of ± 1 year is considered
    acceptable. What is the minimum sample size that
    would be required to achieve this level of
    accuracy with a confidence level of 99%?
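
A sketch of both parts of this example, treating the 114 responses as the sample and taking the standard deviation of 3.2 years as known, as the exercise states. Whether the 114 respondents behave like a random sample is a separate question that the code does not address.

```python
from math import ceil, sqrt
from statistics import NormalDist

xbar, sigma, n = 11.78, 3.2, 114
z = NormalDist().inv_cdf(0.995)          # 99% confidence, about 2.576

m = z * sigma / sqrt(n)                  # margin of error for n = 114
low, high = xbar - m, xbar + m
n_needed = ceil((z * sigma / 1.0) ** 2)  # sample size for a margin of 1 year

print(f"99% c.i.: {low:.2f} to {high:.2f}")
print(f"n needed for margin 1: {n_needed}")
```

The interval is about 11.01 to 12.55 years, and a sample of 68 would already achieve a margin of 1 year.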

14
Confidence interval summary
  • Assume I have a random sample and that the
    population std dev, σ, is known.
  • A level C c.i. for the population mean is
  • xbar ± zσ/sqrt(n).
  • (value of z will depend on C)
  • The c.i. is exact when the underlying population
    is normal.
  • If the pop. is not normal, the c.i. is
    approximately correct for large samples (by CLT).
  • We can find the sample size, n, required to
    obtain a c.i. with a specified margin of error, m,
    by using the formula
  • n = (zσ/m)^2

15
Tests of Significance
  • Confidence interval
  • Goal is to estimate some parameter of a
    population.
  • Test of Significance
  • Goal is to assess the evidence provided by the
    data in favor of some claim about the population.

16
Motivational Example
  • Four randomly selected students do a 20-hour SAT
    prep course at a special school. After the
    course, they write the SAT. Their scores turn
    out to be 560, 600, 590, 490. We find xbar =
    560.
  • Scores of students who take the course are known
    to be normally distributed with a standard
    deviation of 50.
  • National SAT test scores are normally distributed
    with a mean of 500 and a standard deviation of
    50.
  • Does the sample data provide strong support for
    the school's claim that the prep course is
    effective in increasing SAT scores?

17
Motivational Example (cont.)
  • Let's use what we know about the behavior of xbar
    to assess this claim.
  • In particular, we ask:
  • What is the probability of observing a sample
    mean of 560 or larger if the population mean
    score for those who took the course is 500?
    (i.e., the course doesn't help on average)

18
Motivational Example (cont.)
  • What if the sample mean had been xbar = 700?
  • What if the sample mean were xbar = 520?
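
The three scenarios can be compared directly. This sketch computes the probability of a sample mean at least as large as the one observed, assuming the course has no effect (population mean 500, standard deviation 50, n = 4). The function name is introduced here for illustration.

```python
from math import sqrt
from statistics import NormalDist

def upper_tail_p(xbar, mu0=500, sigma=50, n=4):
    """P(sample mean >= xbar) when the population mean is mu0."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    return 1 - NormalDist().cdf(z)

p_values = {xbar: upper_tail_p(xbar) for xbar in (560, 700, 520)}
for xbar, p in p_values.items():
    print(f"xbar = {xbar}: P = {p:.4g}")
```

A mean of 560 has probability of about 0.008 under "no effect", a mean of 700 is essentially impossible, and a mean of 520 (probability about 0.21) would be unremarkable.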

19
Formalization of our example
  • First, state hypotheses
  • H0: µ = 500
  • (course has no effect).
  • Ha: µ > 500
  • (course increases mean score).
  • Terminology
  • H0 is the null hypothesis.
  • Ha is the alternative hypothesis.
  • Usually, Ha is the claim we hope to establish.
    (Equivalently, H0 is the claim we want to
    falsify).
  • In our example, the more the sample mean exceeds
    500, the more evidence we have in favor of Ha
    (i.e., against H0).

20
Formalization of our example (cont.)
  • Second: we compute the test statistic
  • z = (xbar - µ0)/(σ/sqrt(n))
  • = (560 - 500)/(50/sqrt(4)) = 2.4
  • Third: assess the strength of the evidence
    against the null hypothesis by computing a
    p-value.
  • p-value = Prob(z > 2.4) = 0.0082
  • (The p-value is obtained from the z value that
    results from the specific form of our
    hypotheses.)
  • Fourth: state a conclusion.
  • Strong evidence in favor of the school's claim.

21
Other possible forms of H0 and Ha in our example
  • Suppose that I suspect that the school has a
    reverse effect and actually decreases average
    SAT performance.
  • How would I set up hypotheses in such a way that
    a low average test score will support my
    suspicions?
  • What if I suspect that the course has some effect
    on SAT scores, but I'm not sure whether it
    increases or decreases the average score? How
    would I set up appropriate two-sided hypotheses
    for this situation?
  • Either a low or a high average test score will
    support my suspicions.
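
The three forms of the alternative differ only in which tail of the normal curve is used for the p-value. A minimal sketch, applied to the SAT data; the function `z_test` and the labels "greater", "less", and "two-sided" are names chosen here, not notation from the slides.

```python
from math import sqrt
from statistics import NormalDist

def z_test(xbar, mu0, sigma, n, alternative):
    """Return (z, p-value) for H0: mu = mu0 with known sigma."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    phi = NormalDist().cdf
    if alternative == "greater":        # Ha: mu > mu0, large xbar supports Ha
        p = 1 - phi(z)
    elif alternative == "less":         # Ha: mu < mu0, small xbar supports Ha
        p = phi(z)
    elif alternative == "two-sided":    # Ha: mu != mu0, either extreme supports Ha
        p = 2 * (1 - phi(abs(z)))
    else:
        raise ValueError(f"unknown alternative: {alternative}")
    return z, p

results = {alt: z_test(560, 500, 50, 4, alt)
           for alt in ("greater", "less", "two-sided")}
for alt, (z, p) in results.items():
    print(f"{alt:>9}: z = {z:.2f}, p = {p:.4f}")
```

With xbar = 560 the p-values are about 0.0082, 0.9918, and 0.0164: the two-sided p-value is double the one-sided value, and the data give no support at all to the "decreases" alternative.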

22
General form of the z test
23
Example
  • SSHA is a psychological test that
    measures motivation, attitude towards school, and
    study habits of students. Scores range from 0 to
    200. The mean score for US college students is
    about 115 with a std. dev. of about 30. A teacher
    who suspects that older students have better
    attitudes towards school gives the SSHA to 20
    students who are at least 30 years of age. Their
    mean score is 135.2.
  • State appropriate null and alternative
    hypotheses.
  • Report the p-value of your test and state your
    conclusion clearly.
  • Your test required 2 important assumptions in
    addition to the assumption that σ = 30. What
    are they? Which is more important?
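
A sketch of the calculation for this example, assuming the one-sided alternative that older students have a higher mean score (H0: µ = 115 against Ha: µ > 115) and a known standard deviation of 30.

```python
from math import sqrt
from statistics import NormalDist

xbar, mu0, sigma, n = 135.2, 115, 30, 20

z = (xbar - mu0) / (sigma / sqrt(n))
p_value = 1 - NormalDist().cdf(z)   # upper tail, since Ha is mu > 115

print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

Here z is about 3.01 and the p-value about 0.0013, which would count as strong evidence against H0 if the assumptions behind the test hold.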

24
Another Example
  • A study of the pay of corporate CEOs examined the
    increase in cash compensation for the CEOs of 104
    companies, adjusted for inflation, in a recent
    year. The average inflation-adjusted increase
    in the sample was xbar = 6.9%, with a sample
    standard deviation of s = 55%. Is this good
    evidence that the mean real compensation
    increased in the past year?
  • Because the sample size is large, s is close to
    the population σ, so it is reasonable to
    assume that σ = 55%.
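
A sketch of the test for the CEO data, assuming H0: µ = 0 (no real increase) against Ha: µ > 0, and using the sample standard deviation of 55 in place of the population value as the slide suggests. The mean of 6.9 and the standard deviation of 55 are taken to be in the same units.

```python
from math import sqrt
from statistics import NormalDist

xbar, mu0, s, n = 6.9, 0.0, 55.0, 104

z = (xbar - mu0) / (s / sqrt(n))    # s stands in for sigma because n is large
p_value = 1 - NormalDist().cdf(z)   # upper tail, since Ha is mu > 0

print(f"z = {z:.2f}, p-value = {p_value:.3f}")
```

With z about 1.28 and a p-value of about 0.10, the evidence for an increase is weak: the large spread swamps the average increase.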

25
P-values and statistical significance
26
Example statistical significance
  • The mean yield of corn in the US is about 120
    bushels per acre. A survey of 40 farmers this
    year gives a sample mean of xbar = 123.8 bushels
    per acre. We want to know if this provides good
    evidence that the national mean this year is not
    120 bushels per acre. Assume that the farmers
    surveyed constitute a SRS from the population of
    all commercial corn growers and that the
    population has a std dev of σ = 10 bushels
    per acre.
  • (a) Set up the appropriate hypotheses. Give the
    p-value for the test. Is the result significant
    at the 5% level?
  • (b) Are you convinced that the population mean is
    not 120 bushels per acre?
  • (c) Is your conclusion correct if the
    distribution of corn yields is somewhat
    non-normal? Why?
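
A sketch of part (a), assuming the two-sided hypotheses H0: µ = 120 against Ha: µ ≠ 120, since the question asks whether the mean "is not" 120.

```python
from math import sqrt
from statistics import NormalDist

xbar, mu0, sigma, n = 123.8, 120, 10, 40
alpha = 0.05

z = (xbar - mu0) / (sigma / sqrt(n))
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided p-value
significant = p_value < alpha

print(f"z = {z:.2f}, p-value = {p_value:.4f}, significant at 5%: {significant}")
```

The p-value is about 0.016, so the result is significant at the 5% level but would not be at the 1% level.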

27
Another Example
  • A computer has a random number generator that
    generates random numbers uniformly distributed
    between 0 and 1. If this is true, the numbers
    generated come from a population with µ = 0.5 and
    σ = 0.2887. A command to generate 100 random
    numbers gives an average of 0.4365. Is the
    generator working properly?
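
A sketch for this example. It first confirms the quoted standard deviation, using the standard result that a uniform distribution on (0, 1) has variance 1/12, and then runs a two-sided test of H0: µ = 0.5, since an average that is too high or too low would both suggest a problem.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 0.5
sigma = sqrt(1 / 12)              # std. dev. of Uniform(0, 1), about 0.2887
xbar, n = 0.4365, 100

z = (xbar - mu0) / (sigma / sqrt(n))
p_value = 2 * NormalDist().cdf(-abs(z))   # two-sided p-value

print(f"sigma = {sigma:.4f}, z = {z:.2f}, p-value = {p_value:.4f}")
```

The p-value is about 0.028: an average this far from 0.5 would occur in fewer than 3 out of 100 runs of a properly working generator.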

28
Another Example
  • A union leader claims that the average school
    teacher makes less than 40,000 per year. A
    random sample of 400 school teachers finds a
    sample mean of xbar = 39,650. The standard
    deviation in school teachers' incomes is known to
    be 5,000. Assess the union leader's claim.
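
A sketch of the test, assuming the claim is set up as the alternative: H0: µ = 40,000 against Ha: µ < 40,000. Only a low sample mean supports the claim, so the p-value is a lower-tail probability.

```python
from math import sqrt
from statistics import NormalDist

xbar, mu0, sigma, n = 39_650, 40_000, 5_000, 400

z = (xbar - mu0) / (sigma / sqrt(n))   # -350 / 250 = -1.4
p_value = NormalDist().cdf(z)          # lower tail, since Ha is mu < 40,000

print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

The p-value is about 0.08, so the sample gives only modest support for the claim; it is not significant at the 5% level.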

29
Relationship between confidence intervals and
two-sided tests
  • A level α two-sided significance test rejects the
    hypothesis
  • H0: µ = µ0
  • and accepts the alternative
  • Ha: µ ≠ µ0
  • precisely when the value µ0 lies outside a
    level 1 - α confidence interval for µ.

30
Example (confidence interval and two-sided
hypothesis test)
  • Diameters of a certain machine part are normally
    distributed with a standard deviation of 0.1 mm.
    A random sample of 25 parts yields an average
    diameter of 11.9 mm.
  • Find a 99% c.i. for the true mean diameter.
  • Based on your sample, can you conclude, at the 1%
    level of significance, that the true mean
    diameter differs from 12 mm?
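
A sketch that works this example both ways, to illustrate that the 99% confidence interval and the two-sided test at the 1% level lead to the same decision. Variable names are chosen here for illustration.

```python
from math import sqrt
from statistics import NormalDist

xbar, sigma, n, mu0 = 11.9, 0.1, 25, 12.0
alpha = 0.01

# 99% confidence interval for the mean diameter
z_star = NormalDist().inv_cdf(1 - alpha / 2)
m = z_star * sigma / sqrt(n)
low, high = xbar - m, xbar + m

# Two-sided z test of H0: mu = 12
z = (xbar - mu0) / (sigma / sqrt(n))
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

outside_ci = not (low <= mu0 <= high)
rejects = p_value < alpha
print(f"99% c.i.: {low:.4f} to {high:.4f}")
print(f"z = {z:.1f}, p-value = {p_value:.1e}")
print(f"12 outside c.i.: {outside_ci}, test rejects H0: {rejects}")
```

The interval is about 11.85 to 11.95 mm, which excludes 12 mm, and the test (z about -5) rejects H0 at the 1% level, as the relationship between intervals and two-sided tests predicts.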

31
Comments about hypothesis testing
  • P-values tell us more than setting, in advance, a
    fixed level of significance.
  • Statistical significance is not necessarily the
    same as practical significance.
  • Statistical testing is not always valid, e.g.,
    faulty data, bias in questionnaires, etc.

32
Two types of error
  • Example: If there are more than 32,000 trees on
    a plot of land it will be economical to log the
    plot. I sample the plot and get the following
    95% c.i. for the total number of trees:
  • 32,500 ± 3,500
  • Should I log the plot?
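
The arithmetic behind the dilemma, as a small sketch: the interval contains values on both sides of the 32,000 threshold, so the sample does not settle the question and either decision could turn out to be wrong.

```python
estimate, margin, threshold = 32_500, 3_500, 32_000

low, high = estimate - margin, estimate + margin
threshold_inside = low <= threshold <= high

print(f"95% c.i.: {low} to {high}; threshold inside: {threshold_inside}")
```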

33
Type I and Type II errors