Chapter 14 Tests of Hypotheses Based on Count Data

Transcript and Presenter's Notes

1
Chapter 14 Tests of Hypotheses Based on Count
Data
  • 14.2 Tests concerning proportions (large samples)
  • 14.3 Differences between proportions
  • 14.4 The analysis of an r x c table

2
14.2 Tests concerning proportions (large samples)
  • np > 5 and n(1 - p) > 5
  • n independent trials
  • X = number of successes
  • p = probability of a success
  • Estimate: $\hat{p} = X/n$

3
Tests of Hypotheses
  • Null: H0: p = p0
  • Possible alternatives:
  • HA: p < p0
  • HA: p > p0
  • HA: p ≠ p0

4
Test Statistics
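  • Under H0, p = p0, and X is approximately normal
    with mean np0 and standard deviation
    $\sqrt{n p_0 (1 - p_0)}$.
  • Statistic: $z = \dfrac{X - n p_0}{\sqrt{n p_0 (1 - p_0)}}$
  • is approximately standard normal under H0.
  • Reject H0 if z is too far from 0 in either
    direction.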

5
Rejection Regions
6
Equivalent Form
7
Example 14.1
  • H0: p = 0.75 vs HA: p ≠ 0.75
  • α = 0.05
  • n = 300
  • x = 206
  • Reject H0 if z < -1.96 or z > 1.96

8
Observed z value
  • Conclusion: reject H0 since z < -1.96
  • P(z < -2.5 or z > 2.5) = 0.0124 < α ⇒ reject H0.
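
The computation in Example 14.1 can be sketched in a few lines of Python. This is only an illustration, not part of the original slides: the function names are made up here, and the normal CDF is built from the standard library's error function. The slide rounds z to -2.5 before looking up the tail area, so the unrounded p-value printed here comes out slightly smaller than 0.0124.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def one_sample_proportion_z(x, n, p0):
    """Large-sample z statistic and two-sided p-value for H0: p = p0."""
    z = (x - n * p0) / math.sqrt(n * p0 * (1.0 - p0))
    p_two_sided = 2.0 * normal_cdf(-abs(z))
    return z, p_two_sided


# Example 14.1: n = 300, x = 206, H0: p = 0.75
z, p_value = one_sample_proportion_z(x=206, n=300, p0=0.75)
print(f"z = {z:.3f}, two-sided p-value = {p_value:.4f}")
print("reject H0 at 0.05" if abs(z) > 1.96 else "do not reject H0 at 0.05")
```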

9
Example 14.2
  • Toss a coin 100 times and you get 45 heads
  • Estimate p = probability of getting a head
  • Is the coin balanced? α = 0.05
  • Solution
  • H0: p = 0.50 vs HA: p ≠ 0.50

10
Enough Evidence to Reject H0?
  • Critical value: z_{0.025} = 1.96
  • Reject H0 if z > 1.96 or z < -1.96
  • Conclusion: do not reject H0, since the observed
    z = (45 - 50)/5 = -1 lies between -1.96 and 1.96

11
Another example
  • The following table is for a certain screening
    test

12
  • Test to see if the sensitivity of the screening
    test is less than 97%.
  • Hypothesis: H0: p = 0.97 vs HA: p < 0.97
  • Test statistic

13
What is the conclusion?
  • Check the p-value: when z = -2.6325, the p-value is 0.004
  • Conclusion: we can reject the null hypothesis at
    level 0.05.

14
One word of caution about sample size
  • If we decrease the sample size by a factor of 10,

15
And if we try to use the z-test,
the p-value is well above 0.05 (p = 0.2026).
So we cannot reach the same conclusion.
And this is wrong, because the normal approximation
behind the z-test needs a large enough sample!
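
The two p-values quoted for the screening example are consistent with a simple scaling argument: if the observed proportion stays the same and n is divided by 10, the z statistic is divided by the square root of 10. The sketch below assumes that this is how the slide's figure was obtained (the underlying table is not in the transcript) and uses only the z value reported above.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


z_full = -2.6325                    # z reported on the slide for the full sample
z_small = z_full / math.sqrt(10.0)  # same proportions, sample size divided by 10

# One-sided (lower-tail) p-values, matching the alternative "less than 97%"
p_full = normal_cdf(z_full)
p_small = normal_cdf(z_small)

print(f"full sample: z = {z_full:.4f}, p-value = {p_full:.4f}")
print(f"n / 10:      z = {z_small:.4f}, p-value = {p_small:.4f}")
```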
16
So for tests concerning proportions
  • We want
  • np > 5 and n(1 - p) > 5

17
14.3 Differences Between Proportions
  • Two drugs (two treatments)
  • p1 = proportion of patients who recovered after
    taking drug 1
  • p2 = proportion of patients who recovered after
    taking drug 2
  • Compare effectiveness of two drugs

18
Tests of Hypotheses
  • Null: H0: p1 = p2 (p1 - p2 = 0)
  • Possible alternatives:
  • HA: p1 < p2
  • HA: p1 > p2
  • HA: p1 ≠ p2

19
Compare Two Proportions
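  • Drug 1: n1 patients, x1 recovered
  • Drug 2: n2 patients, x2 recovered
  • Estimates: $\hat{p}_1 = x_1/n_1$, $\hat{p}_2 = x_2/n_2$
  • Statistic for test: $\hat{p}_1 - \hat{p}_2$
  • If we did this study over and over and drew a
    histogram of the resulting values of
    $\hat{p}_1 - \hat{p}_2$, that histogram or
    distribution would have standard deviation
    $\sqrt{\dfrac{p_1(1 - p_1)}{n_1} + \dfrac{p_2(1 - p_2)}{n_2}}$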

20
Estimating the Standard Error
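  • Under H0, p1 = p2 = p. So the standard deviation
    becomes $\sqrt{p(1 - p)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}$
  • Estimate the common p by
    $\hat{p} = \dfrac{x_1 + x_2}{n_1 + n_2}$

21
So put them together:
$z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$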
22
Example 12.3
  • Two sided test
  • H0: p1 = p2 vs HA: p1 ≠ p2
  • n1 = 80, x1 = 56
  • n2 = 80, x2 = 38

23
Two Tailed Test
  • Observed z-value: z = 2.88
  • Critical value for two-tailed test: 1.96
  • Conclusion: reject H0 since z > 1.96

24
Rejection Regions
25
P-value of the previous example
  • P-value = P(z < -2.88) + P(z > 2.88) = 2 × 0.002 = 0.004
  • So not only can we reject H0 at the 0.05 level, we
    can also reject at the 0.01 level.
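
As a check on Example 12.3, here is a minimal sketch of the pooled two-sample z test for n1 = 80, x1 = 56, n2 = 80, x2 = 38. The function names are illustrative, not from the slides. With unrounded arithmetic the statistic comes out near 2.89 rather than the 2.88 quoted on the slide, presumably because the slide rounded an intermediate value; the conclusion is the same.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-sample z statistic and two-sided p-value for H0: p1 = p2."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)  # estimate of the common p under H0
    se = math.sqrt(p_pooled * (1.0 - p_pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p1_hat - p2_hat) / se
    return z, 2.0 * normal_cdf(-abs(z))


z, p_value = two_proportion_z(56, 80, 38, 80)
print(f"z = {z:.2f}, two-sided p-value = {p_value:.4f}")
```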

26
14.4 The analysis of an r x c table
  • Recall Example 12.3
  • Two-sided test: H0: p1 = p2 vs HA: p1 ≠ p2
  • n1 = 80, x1 = 56; n2 = 80, x2 = 38
  • We can put this into a 2x2 table and the question
    now becomes is there a relationship between
    treatment and outcome? We will come back to this
    example after we introduce 2x2 tables and
    chi-square test.

           Recover   Not Rec   Total
Treat 1       56        24       80
Treat 2       38        42       80
Total         94        66      160
27
2x2 Contingency Table
  • The table shows the data from a study of 91
    patients who had a myocardial infarction (Snow
    1965). One variable is treatment (propranolol
    versus a placebo), and the other is outcome
    (survival for at least 28 days versus death
    within 28 days).

28
Hypotheses for Two-way Tables
  • The hypotheses for two-way tables are stated in
    very broad strokes.
  • The null hypothesis H0 is simply that there is no
    association between the row and column variables.
  • The alternative hypothesis Ha is that there is an
    association between the two variables. It
    doesn't specify a particular direction and can't
    really be described as one-sided or two-sided.

29
Hypothesis statement in Our Example
  • Null hypothesis: the method of treating the
    myocardial infarction patients did not influence
    the proportion of patients who survived for at
    least 28 days.
  • The alternative hypothesis is that the outcome
    (survival or death) depended on the treatment,
    meaning that the outcome was the dependent
    variable and the treatment was the independent
    variable.

30
Calculation of Expected Cell Count
  • To test the null hypothesis, we compare the
    observed cell counts (or frequencies) to the
    expected cell counts (also called the expected
    frequencies)
  • The process of comparing the observed counts with
    the expected counts is called a goodness-of-fit
    test. (If the chi-square value is small, the fit
    is good and the null hypothesis is not rejected.)

31
  • Observed cell counts

Expected cell counts
32
The Chi-Square (χ²) Test Statistic
The chi-square statistic is a measure of how much
the observed cell counts in a two-way table
differ from the expected cell counts. It can be
used for tables larger than 2 x 2 if the average
of the expected cell counts is > 5 and the
smallest expected cell count is > 1, and for 2 x
2 tables when all 4 expected cell counts are > 5.
The formula is
$\chi^2 = \sum \dfrac{(\text{observed count} - \text{expected count})^2}{\text{expected count}}$
Degrees of freedom (df) = (r - 1) x (c - 1)
where observed is an observed
sample count and expected is the computed
expected cell count for the same cell, r is the
number of rows, c is the number of columns, and
the sum (Σ) is over all the r x c cells in the
table (these do not include the total cells).
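
The formula above can be sketched for a general r x c table as follows. The expected count used here is the usual (row total × column total) / grand total; that formula is assumed, since the slide that computes the expected counts did not survive in the transcript. The data are the treatment-by-outcome table from Example 12.3, and the function names are illustrative.

```python
def expected_counts(table):
    """Expected cell counts under no association: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(table):
    """Chi-square statistic and degrees of freedom for an r x c table of counts."""
    expected = expected_counts(table)
    stat = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Rows: Treat 1, Treat 2; columns: Recover, Not Rec (totals excluded)
observed = [[56, 24], [38, 42]]
expected = expected_counts(observed)
stat, df = chi_square(observed)
print("expected counts:", expected)
print(f"chi-square = {stat:.3f}, df = {df}")
```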
33
The Chi-Square (χ²) Test Statistic
34
(No Transcript)
35
Example: Patient Compliance w/ Rx
In a study of 100 patients with hypertension, 50
were randomly allocated to a group prescribed 10
mg lisinopril to be taken once daily, while the
other 50 patients were prescribed 5 mg lisinopril
to be taken twice daily. At the end of the 60-day
study period the patients returned their
remaining medication to the research pharmacy.
The pharmacy then counted the remaining pills and
classified each patient as < 95% or ≥ 95%
compliant with their prescription. The two-way
table for Compliance and Treatment was

                    Treatment
Compliance   10 mg Daily   5 mg bid   Total
≥ 95%             46           40        86
< 95%              4           10        14
Total             50           50       100
36
Example: Patient Compliance w/ Rx
                    Treatment
Compliance   10 mg Daily   5 mg bid   Total
≥ 95%            460          400       860
< 95%             40          100       140
Total            500          500      1000
χ² = 29.9, df = (2-1)(2-1) = 1, P-value < 0.001
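
The table with 1000 patients is the 100-patient table with every count multiplied by 10, so the proportions are identical and only the sample size differs. The sketch below computes χ² for both tables to show the effect. The χ² value for the smaller table is computed here and is not quoted in the transcript; the p-value uses the identity that a chi-square variable with 1 df is the square of a standard normal. Function names are illustrative.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat


def chi_square_p_value_df1(stat):
    """Upper-tail area for chi-square with 1 df: P(Z^2 > stat)."""
    return math.erfc(math.sqrt(stat / 2.0))


small = chi_square_2x2(46, 40, 4, 10)      # 100 patients
large = chi_square_2x2(460, 400, 40, 100)  # same proportions, 1000 patients
p_small = chi_square_p_value_df1(small)
p_large = chi_square_p_value_df1(large)

print(f"n = 100:  chi-square = {small:.2f}, p-value = {p_small:.4f}")
print(f"n = 1000: chi-square = {large:.2f}, p-value = {p_large:.2e}")
```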
37
If we use the two sample test for proportion
38
The χ² and z Test Statistics
  • The comparison of the proportions of successes
    in two populations leads to a 2 x 2 table, so the
    population proportions can be compared either
    using the χ² test or the two-sample z test.
  • For a 2-sided test, it really doesn't matter
    which one we use, because
  • they always give exactly the same result: the χ²
    statistic is equal to the square of the z
    statistic, and
  • the chi-square with one degree of freedom χ²(1)
    critical values are equal to the squares of the
    corresponding z critical values.
  • A P-value for the 2 x 2 χ² can be found by
    calculating the square root of the chi-square,
    looking that up in the table for P(Z > z) and
    multiplying by 2, because the chi-square always
    tests the two-sided alternative.
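
The equivalence can be checked numerically on the Example 12.3 data. This is a small sketch with illustrative function names: it computes the pooled z statistic and the 2 x 2 chi-square separately and compares z² with χ², and the two routes to the two-sided p-value.

```python
import math


def pooled_z(x1, n1, x2, n2):
    """Pooled two-sample z statistic for H0: p1 = p2."""
    p_pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pooled * (1.0 - p_pooled) * (1.0 / n1 + 1.0 / n2))
    return (x1 / n1 - x2 / n2) / se


def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    cells = (
        (a, (a + b) * (a + c) / n),
        (b, (a + b) * (b + d) / n),
        (c, (c + d) * (a + c) / n),
        (d, (c + d) * (b + d) / n),
    )
    return sum((obs - exp) ** 2 / exp for obs, exp in cells)


z = pooled_z(56, 80, 38, 80)
chi2 = chi_square_2x2(56, 24, 38, 42)

# Two-sided p-value from z, and the chi-square(1) upper tail from chi2
p_from_z = 2.0 * 0.5 * (1.0 + math.erf(-abs(z) / math.sqrt(2.0)))
p_from_chi2 = math.erfc(math.sqrt(chi2 / 2.0))

print(f"z^2 = {z ** 2:.6f}, chi-square = {chi2:.6f}")
print(f"p from z = {p_from_z:.6f}, p from chi-square = {p_from_chi2:.6f}")
print(f"1.96^2 = {1.96 ** 2:.4f}  (compare with the chi-square(1) 0.05 critical value)")
```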

39
The χ² and z Test Statistics
  • For a 2 x 2 table with a one-sided alternative:
  • The 1-sided two-sample z statistic could be
    used.
  • The chi-square p-value could be modified as
    follows:
  • 1-sided p-value = 0.5 × (2-sided p-value) if the
    observed difference is in the direction of the
    alternative, e.g. $\hat{p}_1 < \hat{p}_2$ when
    HA: p1 < p2, and
  • 1-sided p-value = 1 - 0.5 × (2-sided p-value) if
    the observed difference is away from the
    alternative, e.g. $\hat{p}_1 > \hat{p}_2$ when
    HA: p1 < p2.
  • The chi-square is the one most often seen in the
    literature.
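
A short sketch of the one-sided adjustment, again on the Example 12.3 table (observed proportions 0.70 and 0.475). The helper names and the boolean argument are made up for illustration; the rule itself is the one stated in the bullets above.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    cells = (
        (a, (a + b) * (a + c) / n),
        (b, (a + b) * (b + d) / n),
        (c, (c + d) * (a + c) / n),
        (d, (c + d) * (b + d) / n),
    )
    return sum((obs - exp) ** 2 / exp for obs, exp in cells)


def one_sided_p_value(stat, in_direction_of_alternative):
    """Convert the 2-sided chi-square(1) p-value into a 1-sided p-value."""
    two_sided = math.erfc(math.sqrt(stat / 2.0))
    half = 0.5 * two_sided
    return half if in_direction_of_alternative else 1.0 - half


stat = chi_square_2x2(56, 24, 38, 42)
p1_hat, p2_hat = 56 / 80, 38 / 80

p_ha_greater = one_sided_p_value(stat, p1_hat > p2_hat)  # HA: p1 > p2
p_ha_less = one_sided_p_value(stat, p1_hat < p2_hat)     # HA: p1 < p2

print(f"HA: p1 > p2 -> 1-sided p-value = {p_ha_greater:.4f}")
print(f"HA: p1 < p2 -> 1-sided p-value = {p_ha_less:.4f}")
```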

40
Summary Computations for Two-way Tables
  • Create the table, including observed cell counts,
    column and row totals.
  • Find the expected cell counts.
  • Determine if a χ² test is appropriate.
  • Calculate the χ² statistic and number of degrees
    of freedom.
  • Find the approximate P-value:
  • Use Table III (chi-square table) to find the
    approximate P-value.
  • Or use the z-table and find the two-tailed
    p-value if the table is 2 x 2.
  • Draw conclusions about the association between
    the row and column variables.

41
Yates Correction for Continuity
  • Because the chi-square test is based on the
    normal approximation of the binomial
    distribution (which is discrete), many
    statisticians believe a correction for
    continuity is needed.
  • It makes little difference if the numbers in the
    table are large, but in tables with small numbers
    it is worth doing.
  • It reduces the size of the chi-square value and
    so reduces the chance of finding a statistically
    significant difference, so that correction for
    continuity makes the test more conservative.
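
The slides do not show the corrected formula. The sketch below assumes the usual form of Yates' correction, in which 0.5 is subtracted from each |observed - expected| before squaring, and applies it to the 100-patient compliance table to show how much the statistic shrinks. The function name and the `yates` flag are illustrative.

```python
def chi_square_2x2(a, b, c, d, yates=False):
    """Chi-square for [[a, b], [c, d]], optionally with Yates' continuity correction."""
    n = a + b + c + d
    cells = (
        (a, (a + b) * (a + c) / n),
        (b, (a + b) * (b + d) / n),
        (c, (c + d) * (a + c) / n),
        (d, (c + d) * (b + d) / n),
    )
    stat = 0.0
    for obs, exp in cells:
        diff = abs(obs - exp)
        if yates:
            # Assumed standard form: shrink |O - E| by 0.5, never below zero
            diff = max(diff - 0.5, 0.0)
        stat += diff ** 2 / exp
    return stat


plain = chi_square_2x2(46, 40, 4, 10)
corrected = chi_square_2x2(46, 40, 4, 10, yates=True)
print(f"uncorrected chi-square = {plain:.3f}")
print(f"Yates-corrected chi-square = {corrected:.3f}")
```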

42
What do we do if the expected value in any of
the cells of a 2x2 table is below 5?
For example, a sample of teenagers might be
divided into male and female on the one hand, and
those that are and are not currently dieting on
the other. We hypothesize, perhaps, that the
proportion of dieting individuals is higher among
the women than among the men, and we want to test
whether any difference of proportions that we
observe is significant. The data might look like
this

43
The question we ask about these data is: knowing
that 10 of these 24 teenagers are dieters, what
is the probability that these 10 dieters would be
so unevenly distributed between the girls and the
boys? If we were to choose 10 of the teenagers at
random, what is the probability that 9 of them
would be among the 12 girls, and only 1 from
among the 12 boys? This is the hypergeometric
distribution! Fisher's exact test uses the
hypergeometric distribution to calculate the
exact probability of obtaining such a set of
values.
44
Fisher's exact test
  • Before we proceed with the Fisher test, we first
    introduce some notation. We represent the cells
    by the letters a, b, c and d, call the totals
    across rows and columns marginal totals, and
    represent the grand total by n. So the table now
    looks like this:

                 Column 1   Column 2   Row total
Row 1               a          b         a + b
Row 2               c          d         c + d
Column total      a + c      b + d         n


45
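Fisher showed that the probability of obtaining
any such set of values was given by the
hypergeometric distribution
$p = \dfrac{\binom{a+b}{a}\binom{c+d}{c}}{\binom{n}{a+c}} = \dfrac{(a+b)!\,(c+d)!\,(a+c)!\,(b+d)!}{a!\,b!\,c!\,d!\,n!}$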
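
A minimal sketch of this probability for the dieting example, using the counts described earlier (10 dieters among 24 teenagers, 9 of them among the 12 girls and 1 among the 12 boys). The assignment of those counts to the letters a, b, c, d is an assumption made here for illustration; the function name is also illustrative.

```python
from math import comb


def table_probability(a, b, c, d):
    """Hypergeometric probability of the 2 x 2 table [[a, b], [c, d]] given its margins."""
    n = a + b + c + d
    return comb(a + b, a) * comb(c + d, c) / comb(n, a + c)


# Rows: dieting / not dieting; columns: boys / girls (layout assumed)
p_observed = table_probability(a=1, b=9, c=11, d=3)
print(f"P(observed table) = {p_observed:.6f}")
```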
                                               
   
46
In our example
More extreme than observed
As extreme as observed
H0: pM = pW vs HA: pM < pW. Recall that the p-value
is the probability of observing data as extreme
or more extreme if the null hypothesis is true.
So the p-value in this problem is 0.00137.
47
Two Sided Test
More extreme than observed
As extreme as observed
H0: pM = pW vs HA: pM ≠ pW. Extreme observations
include either mostly women dieters or mostly
men. Since the numbers of men and women are
equal, probabilities remain the same if we
interchange men and women. So the p-value in
this problem is 2 × 0.00137.
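
Both p-values for the dieting example can be reproduced by summing hypergeometric probabilities. In this sketch X is the number of boys among the 10 dieters (observed X = 1), a choice of variable made here for illustration. The one-sided p-value adds the observed table and the more extreme one (X = 0); the two-sided p-value adds every outcome whose probability is no larger than that of the observed table.

```python
from math import comb


def hypergeom_pmf(k, population, successes, draws):
    """P(k successes) when drawing `draws` items without replacement."""
    return comb(successes, k) * comb(population - successes, draws - k) / comb(population, draws)


# 24 teenagers, 12 of them boys, 10 dieters; X = number of boys among the dieters
pmf = [hypergeom_pmf(k, population=24, successes=12, draws=10) for k in range(11)]
observed = 1

one_sided = sum(pmf[: observed + 1])  # X = 0 or X = 1
p_obs = pmf[observed]
# Small relative tolerance guards against floating-point ties
two_sided = sum(p for p in pmf if p <= p_obs * (1 + 1e-9))

print(f"one-sided p-value = {one_sided:.5f}")
print(f"two-sided p-value = {two_sided:.5f}")
```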
48
The Fisher Exact Probability Test
  • Used when one or more of the expected counts in a
    contingency table is small (< 2).
  • Fisher's Exact Test is based on exact
    probabilities from a specific distribution (the
    hypergeometric distribution).
  • There's really no lower bound on the amount of
    data that is needed for Fisher's Exact Test. You
    can use Fisher's Exact Test when one of the cells
    in your table has a zero in it. Fisher's Exact
    Test is also very useful for highly imbalanced
    tables. If one or two of the cells in a two by
    two table have numbers in the thousands and one
    or two of the other cells has numbers less than
    5, you can still use Fisher's Exact Test.
  • Fisher's Exact Test has no formal test statistic
    and no critical value, and it only gives you a
    p-value.

49
  • Does pregnancy affect outcome of methadone
    maintenance treatment?
  • Journal of Substance Abuse Treatment, June 2004,
    295-303

50
(No Transcript)
51
Pick 3 randomly for 3 Axis 1 Disorder Cases
51 pregnant
51 non-pregnant
Observed X = 0
52
Adenoma Tumors in Mice
  • Genetic basis of variation in adenoma
    multiplicity in ApcMin/+ Mom1S mice. Proceedings
    of the National Academy of Sciences, February 22,
    2005, 2868-2873
  • The results (Table 5) indicated that the
    frequency of allele loss was significantly
    higher in line V (96%) than in line I (77%)
    (P = 0.003, Fisher's exact test).
  • Table 5. Frequency of allele loss of WT Apc in
    lines

    Type          Number   Percent
    Line MinI/I   46/60      77
    Line MinV/V   52/54      96

53
          Loss   No Loss   Total
Line I     46      14        60
Line V     52     X = 2      54
Total      98      16       114

   X     P(X)
   0   0.000012
   1   0.000223
   2   0.001925
   3   0.009938
   4   0.034317
  ...
  12   0.012970
  13   0.002941
  14   0.000445
  15   0.000040
  16   0.000002

P = 0.002647

Observed count
More extreme values are values with smaller
probability.
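
The tabulated probabilities and the final P can be reproduced with a short sketch. Here X is the "No Loss" count in Line V, with the margins fixed at 54 Line V mice, 60 Line I mice and 16 "No Loss" mice in total. The rule used for the two-sided p-value is the one stated above: add up every value of X whose probability is no larger than that of the observed X = 2. Variable names are illustrative.

```python
from math import comb

LINE_V, LINE_I, NO_LOSS = 54, 60, 16  # fixed margins of the 2 x 2 table


def pmf(k):
    """P(X = k): k of the 16 'No Loss' mice fall in Line V."""
    return comb(LINE_V, k) * comb(LINE_I, NO_LOSS - k) / comb(LINE_V + LINE_I, NO_LOSS)


probs = {k: pmf(k) for k in range(NO_LOSS + 1)}
observed = 2
p_obs = probs[observed]

# Values at least as extreme: probability no larger than the observed one
extreme = [k for k, p in probs.items() if p <= p_obs * (1 + 1e-9)]
p_value = sum(probs[k] for k in extreme)

for k in (0, 1, 2, 3, 4, 12, 13, 14, 15, 16):
    print(f"{k:2d}  {probs[k]:.6f}")
print("values counted as extreme:", extreme)
print(f"P = {p_value:.6f}")
```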