Chapter 14 Tests of Hypotheses Based on Count Data presentation

1
Chapter 14 Tests of Hypotheses Based on Count Data

14.2 Tests concerning proportions (large samples)
14.3 Differences between proportions
14.4 The analysis of an r x c table

2
14.2 Tests concerning proportions (large samples)

Large-sample condition: $np > 5$ and $n(1-p) > 5$
$n$ independent trials
$X$ = number of successes
$p$ = probability of a success
Estimate: $\hat{p} = X/n$

3
Tests of Hypotheses

Null: $H_0: p = p_0$
Possible alternatives:
$H_A: p < p_0$
$H_A: p > p_0$
$H_A: p \ne p_0$

4
Test Statistics

Under $H_0$, $p = p_0$, and the statistic
$z = \frac{X - np_0}{\sqrt{np_0(1-p_0)}}$
is approximately standard normal.
Reject $H_0$ if $z$ is too far from 0 in either
direction (two-sided alternative).

5
Rejection Regions
6
Equivalent Form
7
Example 14.1

$H_0: p = 0.75$ vs $H_A: p \ne 0.75$
$\alpha = 0.05$
$n = 300$
$x = 206$
Reject $H_0$ if $z < -1.96$ or $z > 1.96$

8
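Observed z value: $z = \frac{206 - 300(0.75)}{\sqrt{300(0.75)(0.25)}} = \frac{-19}{7.5} \approx -2.53$

Conclusion: reject $H_0$ since $z < -1.96$
$P(z < -2.5 \text{ or } z > 2.5) = 0.0124 < \alpha \Rightarrow$ reject $H_0$.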
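
The calculation in Example 14.1 can be sketched in a few lines of Python. This is only an illustration using the standard library: the function names `normal_cdf` and `one_proportion_z_test` are made up for this sketch, and the normal CDF is obtained from `math.erfc` rather than from a printed table, so the P-value differs slightly from the table value quoted for z rounded to -2.5.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, written in terms of the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

def one_proportion_z_test(x, n, p0):
    """Large-sample z-test of H0: p = p0; returns (z, two-sided P-value)."""
    z = (x - n * p0) / math.sqrt(n * p0 * (1 - p0))
    p_value = 2 * normal_cdf(-abs(z))
    return z, p_value

# Example 14.1: n = 300 trials, x = 206 successes, H0: p = 0.75
z, p_value = one_proportion_z_test(x=206, n=300, p0=0.75)
print(f"z = {z:.2f}, two-sided P-value = {p_value:.4f}")
```

Because the unrounded z is about -2.53 rather than -2.5, the computed P-value comes out a little below 0.0124; the conclusion (reject at the 0.05 level) is the same.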

9
Example 14.2

Toss a coin 100 times and you get 45 heads.
Estimate $p$ = probability of getting a head: $\hat{p} = 45/100 = 0.45$
Is the coin balanced? $\alpha = 0.05$
Solution
$H_0: p = 0.50$ vs $H_A: p \ne 0.50$

10
Enough Evidence to Reject H0?

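Critical value: $z_{0.025} = 1.96$
Reject $H_0$ if $z > 1.96$ or $z < -1.96$
Conclusion: do not reject $H_0$, since the observed
$z = (45 - 50)/\sqrt{100(0.5)(0.5)} = -1.0$ lies between
the critical values.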

11
Another example

The following table is for a certain screening
test

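Test to see if the sensitivity of the screening
test is less than 97%.
Hypothesis: $H_0: p = 0.97$ vs $H_A: p < 0.97$, where $p$ is the sensitivity
Test statistic: $z = \frac{X - np_0}{\sqrt{np_0(1-p_0)}}$ with $p_0 = 0.97$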

13
What is the conclusion?

Check the p-value: when $z = -2.6325$, p-value = 0.004
Conclusion: we can reject the null hypothesis at
level 0.05.

14
One word of caution about sample size

If we decrease the sample size by a factor of 10,

15
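And if we try to use the z-test,
the P-value is certainly greater than 0.05 ($p = 0.2026$).
So we cannot reach the same conclusion.
And using the z-test here is wrong, because the
sample is no longer large enough for the normal
approximation!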
16
So for tests concerning proportions

We want
$np > 5$ and $n(1-p) > 5$
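
A small sketch of why the caution matters. It assumes that shrinking the sample by a factor of 10 leaves the observed proportion unchanged, in which case the z statistic is divided by $\sqrt{10}$. The sample sizes 1000 and 100 below are hypothetical (the screening-test table is not reproduced in this transcript), and `large_sample_ok` is an illustrative helper name, not something from the slides.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

def large_sample_ok(n, p0):
    """Rule of thumb from the slides: n*p0 > 5 and n*(1 - p0) > 5."""
    return n * p0 > 5 and n * (1 - p0) > 5

z_full = -2.6325                   # z value quoted for the full sample
z_small = z_full / math.sqrt(10)   # same proportions, one tenth of the data

p_full = normal_cdf(z_full)        # one-sided P-value, lower tail
p_small = normal_cdf(z_small)
print(f"full sample:  P = {p_full:.4f}")
print(f"n / 10:       P = {p_small:.4f}")

# Hypothetical sample sizes, only to illustrate the rule with p0 = 0.97
print(large_sample_ok(1000, 0.97), large_sample_ok(100, 0.97))
```

With p0 = 0.97 the condition n(1 - p0) > 5 needs n above roughly 167, so a sample of 100 would fail the check even though the formula still produces a number.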

17
14.3 Differences Between Proportions

Two drugs (two treatments)
$p_1$ = percentage of patients recovered after taking
drug 1
$p_2$ = percentage of patients recovered after taking
drug 2
Compare effectiveness of two drugs

18
Tests of Hypotheses

Null: $H_0: p_1 = p_2$ ($p_1 - p_2 = 0$)
Possible alternatives:
$H_A: p_1 < p_2$
$H_A: p_1 > p_2$
$H_A: p_1 \ne p_2$

19
Compare Two Proportions

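Drug 1: $n_1$ patients, $x_1$ recovered
Drug 2: $n_2$ patients, $x_2$ recovered
Estimates: $\hat{p}_1 = x_1/n_1$, $\hat{p}_2 = x_2/n_2$
Statistic for test: $\hat{p}_1 - \hat{p}_2$
If we did this study over and over and drew a
histogram of the resulting values of $\hat{p}_1 - \hat{p}_2$,
that histogram or distribution would have
standard deviation
$\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$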

20
Estimating the Standard Error

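Under $H_0$, $p_1 = p_2 = p$. So the standard deviation becomes
$\sqrt{p(1-p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$
Estimate the common $p$ by $\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$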

21
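So put them together: $z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$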
22
Example 12.3

Two sided test
$H_0: p_1 = p_2$ vs $H_A: p_1 \ne p_2$
$n_1 = 80$, $x_1 = 56$
$n_2 = 80$, $x_2 = 38$

23
Two Tailed Test

Observed z-value: $z \approx 2.88$
Critical value for two-tailed test: 1.96
Conclusion: Reject $H_0$ since $z > 1.96$
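
As a rough check of Example 12.3, the pooled two-sample z statistic can be computed directly. The sketch below assumes the pooled-variance form of the test (common p estimated from both samples combined); `two_proportion_z_test` is an illustrative name, and the P-value uses `math.erfc` instead of a normal table.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-sample z-test of H0: p1 = p2; returns (z, two-sided P-value)."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    pooled = (x1 + x2) / (n1 + n2)          # estimate of the common p under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * P(Z > |z|)
    return z, p_value

# Example 12.3: 56 of 80 recovered on drug 1, 38 of 80 on drug 2
z, p_value = two_proportion_z_test(56, 80, 38, 80)
print(f"z = {z:.2f}, two-sided P-value = {p_value:.4f}")
```

Computed without intermediate rounding, z comes out near 2.89, close to the 2.88 used on the slides, and the two-sided P-value is about 0.004.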

24
Rejection Regions
25
P-value of the previous example

P-value $= P(z < -2.88) + P(z > 2.88) = 2(0.002) = 0.004$
So not only can we reject $H_0$ at the 0.05 level, we
can also reject it at the 0.01 level.

26
14.4 The analysis of an r x c table

Recall Example 12.3
Two-sided test: $H_0: p_1 = p_2$ vs $H_A: p_1 \ne p_2$
$n_1 = 80$, $x_1 = 56$; $n_2 = 80$, $x_2 = 38$
We can put this into a 2x2 table and the question
now becomes is there a relationship between
treatment and outcome? We will come back to this
example after we introduce 2x2 tables and
chi-square test.

           Recover   Not Rec   Total
Treat 1       56        24       80
Treat 2       38        42       80
Total         94        66      160
27
2x2 Contingency Table

The table shows the data from a study of 91
patients who had a myocardial infarction (Snow
1965). One variable is treatment (propranolol
versus a placebo), and the other is outcome
(survival for at least 28 days versus death
within 28 days).

28
Hypotheses for Two-way Tables

The hypotheses for two-way tables are very broad.
The null hypothesis $H_0$ is simply that there is no
association between the row and column variables.

The alternative hypothesis $H_A$ is that there is an
association between the two variables. It
doesn't specify a particular direction and can't
really be described as one-sided or two-sided.

29
Hypothesis statement in Our Example

Null hypothesis: the method of treating the
myocardial infarction patients did not influence
the proportion of patients who survived for at
least 28 days.
The alternative hypothesis is that the outcome
(survival or death) depended on the treatment,
meaning that the outcome was the dependent
variable and the treatment was the independent
variable.

30
Calculation of Expected Cell Count

To test the null hypothesis, we compare the
observed cell counts (or frequencies) to the
expected cell counts (also called the expected
frequencies).
The process of comparing the observed counts with
the expected counts is called a goodness-of-fit
test. (If the chi-square value is small, the fit
is good and the null hypothesis is not rejected.)
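
Under the null hypothesis of no association, each expected cell count is (row total × column total) / grand total. A minimal sketch follows, using the counts from the Example 12.3 treatment-by-outcome table; `expected_counts` is an illustrative name, not a library function.

```python
def expected_counts(observed):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Rows: Treat 1, Treat 2; columns: Recover, Not Rec (Example 12.3)
observed = [[56, 24],
            [38, 42]]
expected = expected_counts(observed)
for row in expected:
    print(row)
```

Both treatment groups have 80 patients, so both rows get the same expected counts: 47 recoveries and 33 non-recoveries.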

Observed cell counts

Expected cell counts
32
The Chi-Square ($\chi^2$) Test Statistic
The chi-square statistic is a measure of how much
the observed cell counts in a two-way table
differ from the expected cell counts. It can be
used for tables larger than 2 x 2 if the average
of the expected cell counts is > 5 and the
smallest expected cell count is > 1, and for 2 x
2 tables when all 4 expected cell counts are > 5.
The formula is
$\chi^2 = \sum \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}}$
Degrees of freedom: $df = (r - 1) \times (c - 1)$
where observed is an observed sample count and
expected is the computed expected cell count for
the same cell, $r$ is the number of rows, $c$ is the
number of columns, and the sum ($\sum$) is over all
the r x c cells in the table (these do not include
the total cells).
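
The formula translates almost line for line into code. The sketch below is one possible way to write it with plain Python lists; `chi_square_statistic` is an illustrative name, and the data are the Example 12.3 counts (Treat 1: 56 recovered, 24 not; Treat 2: 38 recovered, 42 not).

```python
def chi_square_statistic(observed):
    """Return (chi-square, df) for an r x c table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n   # expected count for cell (i, j)
            stat += (obs - exp) ** 2 / exp
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df

stat, df = chi_square_statistic([[56, 24], [38, 42]])
print(f"chi-square = {stat:.3f}, df = {df}")
```

For this table the four cells contribute 81/47, 81/33, 81/47 and 81/33, giving a statistic of about 8.36 with 1 degree of freedom.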
33
The Chi-Square ($\chi^2$) Test Statistic
35
Example: Patient Compliance w/ Rx
In a study of 100 patients with hypertension, 50
were randomly allocated to a group prescribed 10
mg lisinopril to be taken once daily, while the
other 50 patients were prescribed 5 mg lisinopril
to be taken twice daily. At the end of the 60
day study period the patients returned their
remaining medication to the research pharmacy.
The pharmacy then counted the remaining pills and
classified each patient as < 95% or ≥ 95%
compliant with their prescription. The two-way
table for Compliance and Treatment was

                   Treatment
Compliance   10 mg Daily   5 mg bid   Total
≥ 95%             46           40        86
< 95%              4           10        14
Total             50           50       100
36
Example: Patient Compliance w/ Rx

                   Treatment
Compliance   10 mg Daily   5 mg bid   Total
≥ 95%            460          400       860
< 95%             40          100       140
Total            500          500      1000

$\chi^2 = 29.9$, $df = (2-1)(2-1) = 1$, P-value < 0.001
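
The effect of multiplying every count by 10 can be checked numerically. This sketch recomputes the chi-square statistic for the 100-patient and the 1000-patient versions of the compliance table; the helper names are illustrative, and the df = 1 P-value is obtained from the identity $P(\chi^2_1 > x) = \mathrm{erfc}(\sqrt{x/2})$ rather than from a table.

```python
import math

def chi_square_statistic(observed):
    """Chi-square statistic for an r x c table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return sum((obs - row_totals[i] * col_totals[j] / n) ** 2
               / (row_totals[i] * col_totals[j] / n)
               for i, row in enumerate(observed)
               for j, obs in enumerate(row))

def p_value_df1(chi2):
    """Upper-tail P-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))

small = [[46, 40], [4, 10]]                       # 100 patients
large = [[10 * x for x in row] for row in small]  # same proportions, 1000 patients

chi2_small = chi_square_statistic(small)
chi2_large = chi_square_statistic(large)
print(f"n = 100:  chi-square = {chi2_small:.2f}, P = {p_value_df1(chi2_small):.4f}")
print(f"n = 1000: chi-square = {chi2_large:.2f}, P = {p_value_df1(chi2_large):.2e}")
```

The proportions are identical in the two tables, yet the statistic grows from about 2.99 to about 29.9: with the same observed difference, a tenfold larger sample gives a tenfold larger chi-square.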
37
If we use the two-sample test for proportions
38
The $\chi^2$ and z Test Statistics

The comparison of the proportions of successes
in two populations leads to a 2 x 2 table, so the
population proportions can be compared using
either the $\chi^2$ test or the two-sample z test.
For a 2-sided test, it really doesn't matter
which one is used: they always give exactly the
same result, because the $\chi^2$ statistic is equal
to the square of the z statistic, and the critical
values of the chi-square distribution with one
degree of freedom, $\chi^2(1)$, are equal to the
squares of the corresponding z critical values.

A P-value for the 2 x 2 $\chi^2$ can be found by
calculating the square root of the chi-square,
looking that up in the table for $P(Z > z)$ and
multiplying by 2, because the chi-square always
tests the two-sided alternative.
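
The identity $\chi^2 = z^2$ is easy to confirm numerically. The sketch below uses the Example 12.3 counts and assumes the pooled two-sample z statistic and the uncorrected chi-square (no continuity correction); the 2 x 2 shortcut formula $n(ad - bc)^2 / [(a+b)(c+d)(a+c)(b+d)]$ is algebraically the same as summing (observed - expected)²/expected over the four cells. Function names are illustrative.

```python
import math

def pooled_z(x1, n1, x2, n2):
    """Pooled two-sample z statistic for H0: p1 = p2."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se

def chi_square_2x2(a, b, c, d):
    """Uncorrected chi-square for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

z = pooled_z(56, 80, 38, 80)
chi2 = chi_square_2x2(56, 24, 38, 42)

p_from_z = math.erfc(abs(z) / math.sqrt(2))     # 2 * P(Z > |z|)
p_from_chi2 = math.erfc(math.sqrt(chi2 / 2))    # P(chi-square with 1 df > chi2)

print(f"z^2 = {z * z:.4f}, chi-square = {chi2:.4f}")
print(f"P from z = {p_from_z:.5f}, P from chi-square = {p_from_chi2:.5f}")
```

Taking the square root of the chi-square and doubling the upper-tail normal probability is exactly what `p_from_chi2` does, which is why the two P-values coincide.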

39
The $\chi^2$ and z Test Statistics

For a 2 x 2 table with a one-sided alternative:
The 1-sided two-sample z statistic could be
used.
The chi-square p-value could be modified as follows:
1-sided p-value = 0.5(2-sided p-value) if the
observed difference is in the direction of the
alternative (e.g. $H_A: p_1 < p_2$ and $\hat{p}_1 < \hat{p}_2$), and
1-sided p-value = 1 - 0.5(2-sided p-value) if
the observed difference is away from the
alternative (e.g. $H_A: p_1 < p_2$ and $\hat{p}_1 > \hat{p}_2$).

The chi-square is the one most often seen in the
literature.
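
The conversion rule above is small enough to express as a helper. This is a sketch only; `one_sided_p_value` and its boolean argument are names invented here, and the caller has to decide whether the observed difference points in the direction of the alternative.

```python
def one_sided_p_value(two_sided_p, in_direction_of_alternative):
    """Convert a 2-sided p-value from a 2 x 2 chi-square into a 1-sided p-value."""
    half = 0.5 * two_sided_p
    return half if in_direction_of_alternative else 1 - half

# Two-sided p-value of 0.004 (the value quoted for Example 12.3)
p_supporting = one_sided_p_value(0.004, in_direction_of_alternative=True)
p_opposing = one_sided_p_value(0.004, in_direction_of_alternative=False)
print(p_supporting, p_opposing)
```

When the data point away from the alternative the one-sided p-value is close to 1, so the null hypothesis is never rejected in favour of an alternative the data contradict.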

40
Summary Computations for Two-way Tables

Create the table, including observed cell counts,
column and row totals.

Find the expected cell counts.
Determine if a $\chi^2$ test is appropriate.
Calculate the $\chi^2$ statistic and number of degrees
of freedom.

Find the approximate P-value:
Use Table III (chi-square table) to find the
approximate P-value.
Or use the z-table and find the two-tailed p-value if
the table is 2 x 2.

Draw conclusions about the association between
the row and column variables.
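
The "is a chi-square test appropriate?" step can be sketched as a check on the expected counts, using the thresholds stated earlier in these slides (all four expected counts above 5 for a 2 x 2 table; average above 5 and minimum above 1 for larger tables). The function name is illustrative, and other textbooks use slightly different cut-offs.

```python
def chi_square_is_appropriate(expected):
    """Apply the slides' rule of thumb to a table of expected cell counts."""
    cells = [e for row in expected for e in row]
    if len(expected) == 2 and len(expected[0]) == 2:
        return all(e > 5 for e in cells)          # 2 x 2: every cell above 5
    average = sum(cells) / len(cells)
    return average > 5 and min(cells) > 1         # larger tables

# Expected counts for the compliance table (row totals 86 and 14, columns 50 and 50)
print(chi_square_is_appropriate([[43.0, 43.0], [7.0, 7.0]]))

# Expected counts for a 2 x 2 table with row totals 12, 12 and column totals 10, 14
print(chi_square_is_appropriate([[5.0, 7.0], [5.0, 7.0]]))
```

The second table fails because two expected counts are exactly 5, not above it; that is the kind of situation where an exact test is the safer choice.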

41
Yates' Correction for Continuity

Because the chi-square test is based on the normal
approximation to the (discrete) binomial
distribution, many statisticians believe a
correction for continuity is needed.
It makes little difference if the numbers in the
table are large, but in tables with small numbers
it is worth doing.
It reduces the size of the chi-square value and
so reduces the chance of finding a statistically
significant difference, so the correction for
continuity makes the test more conservative.
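
The slide does not spell out the corrected formula; the usual textbook form subtracts 0.5 from each |observed - expected| before squaring, and that form is assumed in the sketch below. The `max(..., 0)` guard is an extra assumption to keep the adjusted difference from going negative. The data are the 100-patient compliance table (46, 40 / 4, 10), and the function name is illustrative.

```python
def chi_square_2x2(observed, yates=False):
    """Chi-square for a 2 x 2 table, optionally with Yates' continuity correction."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n
            diff = abs(obs - exp)
            if yates:
                diff = max(diff - 0.5, 0.0)   # shrink each deviation by one half
            stat += diff ** 2 / exp
    return stat

table = [[46, 40], [4, 10]]
uncorrected = chi_square_2x2(table)
corrected = chi_square_2x2(table, yates=True)
print(f"uncorrected = {uncorrected:.3f}, Yates-corrected = {corrected:.3f}")
```

Here every |observed - expected| is 3, so the correction replaces 9 by 6.25 in each numerator and the statistic drops from about 2.99 to about 2.08.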

42
What do we do if the expected value in any of
the cells in a 2x2 table is below 5?
For example, a sample of teenagers might be
divided into male and female on the one hand, and
those that are and are not currently dieting on
the other. We hypothesize, perhaps, that the
proportion of dieting individuals is higher among
the women than among the men, and we want to test
whether any difference of proportions that we
observe is significant. The data might look like
this (counts as described on the next slide):

          Dieting   Not dieting   Total
Girls        9           3          12
Boys         1          11          12
Total       10          14          24

43
The question we ask about these data is: knowing
that 10 of these 24 teenagers are dieters, what
is the probability that these 10 dieters would be
so unevenly distributed between the girls and the
boys? If we were to choose 10 of the teenagers at
random, what is the probability that 9 of them
would be among the 12 girls, and only 1 from
among the 12 boys?
This is the hypergeometric distribution!
Fisher's exact test uses the hypergeometric
distribution to calculate the exact probability
of obtaining such a set of values.
44
Fisher's exact test

Before we proceed with the Fisher test, we first
introduce some notation. We represent the cells
by the letters a, b, c and d, call the totals
across rows and columns marginal totals, and
represent the grand total by n. So the table now
looks like this:

          Column 1   Column 2   Total
Row 1        a          b       a + b
Row 2        c          d       c + d
Total      a + c      b + d       n

45
Fisher showed that the probability of obtaining
any such set of values was given by the
hypergeometric distribution:
$p = \frac{(a+b)!\,(c+d)!\,(a+c)!\,(b+d)!}{n!\,a!\,b!\,c!\,d!}$
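
A short sketch of the single-table probability, written with binomial coefficients: $\binom{a+b}{a}\binom{c+d}{c} / \binom{n}{a+c}$ is algebraically identical to the factorial form of the hypergeometric probability with cells a, b, c, d. The function name is illustrative, and the example cells are the dieting counts described earlier (9 of 12 girls and 1 of 12 boys dieting).

```python
from math import comb, factorial

def table_probability(a, b, c, d):
    """Hypergeometric probability of the 2 x 2 table [[a, b], [c, d]] with fixed margins."""
    n = a + b + c + d
    return comb(a + b, a) * comb(c + d, c) / comb(n, a + c)

# Girls: 9 dieting, 3 not; boys: 1 dieting, 11 not
p_observed = table_probability(9, 3, 1, 11)
print(f"P(observed table) = {p_observed:.6f}")

# Same value from the factorial form, as a cross-check
a, b, c, d = 9, 3, 1, 11
n = a + b + c + d
p_factorial = (factorial(a + b) * factorial(c + d) * factorial(a + c) * factorial(b + d)
               / (factorial(n) * factorial(a) * factorial(b) * factorial(c) * factorial(d)))
```

The binomial-coefficient form is usually preferable in code because it avoids building very large factorials for big tables.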


46
In our example
As extreme as observed: 9 of the 10 dieters are girls
More extreme than observed: all 10 dieters are girls
$H_0: p_M = p_W$ vs $H_A: p_M < p_W$
Recall that the p-value is the probability of
observing data as extreme or more extreme if the
null hypothesis is true.
So the p-value in this problem is 0.00137.
47
Two Sided Test
As extreme as observed: 9 of the 10 dieters in one group
More extreme than observed: all 10 dieters in one group
$H_0: p_M = p_W$ vs $H_A: p_M \ne p_W$
Extreme observations include either mostly women
dieters or mostly men. Since the numbers of men
and women are equal, probabilities remain the
same if we interchange men and women. So the
p-value in this problem is $2(0.00137) = 0.00274$.
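
Both p-values for the dieting example can be reproduced by enumerating every table with the same margins. The sketch below is one way to do it with exact integer arithmetic; `fisher_p_values` is an illustrative name. The one-sided value sums the tables with at least as many dieting girls as observed, and the two-sided value sums all tables whose probability is no larger than that of the observed table.

```python
from math import comb

def fisher_p_values(a, b, c, d):
    """Return (upper one-sided, two-sided) Fisher exact p-values for [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)
    lowest = max(0, col1 - row2)      # smallest possible value of the top-left cell
    highest = min(row1, col1)         # largest possible value of the top-left cell
    # Integer weights avoid floating-point ties when comparing probabilities
    weights = {k: comb(row1, k) * comb(row2, col1 - k)
               for k in range(lowest, highest + 1)}
    one_sided = sum(w for k, w in weights.items() if k >= a) / total
    two_sided = sum(w for w in weights.values() if w <= weights[a]) / total
    return one_sided, two_sided

# Girls: 9 dieting, 3 not; boys: 1 dieting, 11 not
one_sided, two_sided = fisher_p_values(9, 3, 1, 11)
print(f"one-sided p = {one_sided:.5f}, two-sided p = {two_sided:.5f}")
```

Without intermediate rounding the one-sided value is about 0.00138, in line with the 0.00137 quoted on the slides, and because the two groups are the same size the two-sided value is exactly twice that.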
48
The Fisher Exact Probability Test

Used when one or more of the expected counts in a
contingency table is small (< 2).
Fisher's Exact Test is based on exact
probabilities from a specific distribution (the
hypergeometric distribution).
There's really no lower bound on the amount of
data that is needed for Fisher's Exact Test. You
can use Fisher's Exact Test when one of the cells
in your table has a zero in it. Fisher's Exact
Test is also very useful for highly imbalanced
tables. If one or two of the cells in a two by
two table have numbers in the thousands and one
or two of the other cells have numbers less than
5, you can still use Fisher's Exact Test.
Fisher's Exact Test has no formal test statistic
and no critical value, and it only gives you a
p-value.

Does pregnancy affect outcome of methadone
maintenance treatment?
Journal of Substance Abuse Treatment, June 2004,
295-303

51
Pick 3 randomly for 3 Axis 1 Disorder Cases
51 pregnant
51 non-pregnant
Observed X = 0
52
Adenoma Tumors in Mice

Genetic basis of variation in adenoma
multiplicity in ApcMin/ Mom1S mice. Proceedings
of the National Academy of Sciences, February 22,
2005, 2868-2873
The results (Table 5) indicated that the
frequency of allele loss was significantly
higher in line V (96%) than in line I (77%)
(P = 0.003, Fisher's exact test).
Table 5. Frequency of allele loss of WT Apc in
lines

Type           Number   Percent
Line MinI/I    46/60      77
Line MinV/V    52/54      96

53
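          Loss   No Loss   Total
Line I     46      14        60
Line V     52     X = 2      54
Total      98      16       114

X     P(X)
0     0.000012
1     0.000223
2     0.001925   (observed count)
3     0.009938
4     0.034317
...
12    0.012970
13    0.002941
14    0.000445
15    0.000040
16    0.000002
P = 0.002647

More extreme values are values with smaller
probability than the observed count, so P sums
the probabilities for X = 0, 1, 2, 14, 15 and 16.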
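
The two-sided P-value for the mouse data can be reproduced from the hypergeometric distribution. In this sketch X is taken to be the number of "No Loss" animals in line V (2 observed, out of 16 "No Loss" animals among 114, with 54 animals in line V), and the two-sided P-value adds up every value of X whose probability does not exceed that of the observed count. Function names are illustrative.

```python
from math import comb

def hypergeom_weight(k, population, successes, draws):
    """Unnormalised hypergeometric weight: ways to get k successes in the draws."""
    return comb(successes, k) * comb(population - successes, draws - k)

def two_sided_fisher(k_observed, population, successes, draws):
    """Sum the probabilities of all outcomes no more likely than the observed one."""
    total = comb(population, draws)
    lowest = max(0, draws - (population - successes))
    highest = min(successes, draws)
    weights = [hypergeom_weight(k, population, successes, draws)
               for k in range(lowest, highest + 1)]
    observed_weight = hypergeom_weight(k_observed, population, successes, draws)
    return sum(w for w in weights if w <= observed_weight) / total

# 114 mice in total, 16 with no allele loss, 54 mice in line V, X = 2 observed
p_two_sided = two_sided_fisher(2, population=114, successes=16, draws=54)
p_observed = hypergeom_weight(2, 114, 16, 54) / comb(114, 54)
print(f"P(X = 2) = {p_observed:.6f}, two-sided P = {p_two_sided:.6f}")
```

The result rounds to 0.003, the value reported in the paper for Fisher's exact test.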
