Lecture 9 Chapter 22. Tests for two-way tables - PowerPoint PPT Presentation

Provided by: Brigi66

Transcript and Presenter's Notes

1
Lecture 9 Chapter 22. Tests for two-way tables
2
Objectives
  • The chi-square test for two-way tables (an
    NHST test for independence)
  • Two-way tables
  • Hypotheses for the chi-square test for two-way
    tables
  • Expected counts in a two-way table
  • Conditions for the chi-square test
  • Chi-square test for two-way tables
  • Simpson's paradox

3
Two-way tables
  • An experiment has a two-way factorial design if
    two categorical factors are studied with several
    levels of each factor.
  • Two-way tables organize data about two
    categorical variables with any number of
    levels/treatments obtained from a factorial
    design or two-way observational study.

High school students were asked whether they
smoke, and whether their parents smoke
4
Marginal distribution
  • The marginal distributions (in the margins of
    the table) summarize each factor independently.

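                        Student smokes   Student does not smoke   Total
Both parents smoke            400                1380              1780
One parent smokes             416                1823              2239
Neither parent smokes         188                1168              1356
Total                        1004                4371              5375

Marginal distribution for parental smoking:
P(both parents) = 1780/5375 = 33.1%
P(one parent) = 2239/5375 = 41.7%
P(neither parent) = 1356/5375 = 25.2%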
5
Conditional distribution
  • The cells of the two-way table represent the
    intersection of a given level of one factor with
    a given level of the other factor. Dividing the
    cell counts by a row (or column) total gives the
    conditional distribution for that row (or column).

Conditional distribution of student smoking for
different parental smoking statuses:
P(student smokes | both parents) = 400/1780 = 22.5%
P(student smokes | one parent) = 416/2239 = 18.6%
P(student smokes | neither parent) = 188/1356 = 13.9%
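The marginal and conditional percentages for the smoking data can be reproduced with a few lines of Python. This is a minimal sketch, assuming each row of the table is a parental smoking status and the two columns are (student smokes, student does not smoke); the variable names are illustrative and not part of the lecture.

```python
# Counts from the student/parent smoking table.
# Assumed layout: each row is a parental status; the two values are
# (student smokes, student does not smoke).
counts = {
    "both parents": (400, 1380),
    "one parent": (416, 1823),
    "neither parent": (188, 1168),
}

# Row totals and the grand total of the table.
row_totals = {status: sum(cells) for status, cells in counts.items()}
grand_total = sum(row_totals.values())

# Marginal distribution of parental smoking: row total / grand total.
marginal = {status: total / grand_total for status, total in row_totals.items()}

# Conditional distribution of student smoking given parental status:
# cell count / row total.
conditional = {status: cells[0] / row_totals[status] for status, cells in counts.items()}

for status in counts:
    print(f"P({status}) = {marginal[status]:.1%}, "
          f"P(student smokes | {status}) = {conditional[status]:.1%}")
```

The marginal distribution divides by the grand total, while each conditional distribution divides by its own row total, which is why the conditional percentages do not sum to 100% across rows.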
6
Hypotheses
A two-way table has r rows and c columns. H0
states that there is no association between the
row and column variables in the table.
Statistical hypotheses:
H0: There is no association between the row and
column variables.
Ha: There is an association/relationship
between the 2 variables.
We will compare actual counts from the sample
data with the counts we would expect if the null
hypothesis of no relationship were true.
7
Expected counts in a two-way table
A two-way table has r rows and c columns. H0
states that there is no association between the
row and column variables (factors) in the
table. The expected count in any cell of a
two-way table when H0 is true is
$\text{expected count} = \frac{\text{row total} \times \text{column total}}{\text{table total}}$
The expected count is the average count you would get
for that cell if the null hypothesis were true.
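The expected count, row total times column total divided by the table total, follows from the definition of independence. A short derivation, writing $n$ for the table total and $R_i$, $C_j$ for the totals of row $i$ and column $j$ (this notation is introduced here for illustration and is not from the slides):

```latex
\begin{aligned}
P(\text{row } i \text{ and column } j)
  &= P(\text{row } i)\, P(\text{column } j)
  && \text{(independence under } H_0\text{)} \\
  &\approx \frac{R_i}{n} \cdot \frac{C_j}{n}
  && \text{(estimated from the margins)} \\
\text{expected count}_{ij}
  &= n \cdot \frac{R_i}{n} \cdot \frac{C_j}{n}
   = \frac{R_i \, C_j}{n}
\end{aligned}
```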
8
Cocaine addiction
  • Cocaine produces short-term feelings of physical
    and mental well-being. To maintain the effect,
    the drug may have to be taken more frequently and
    at higher doses. After stopping use, users will
    feel tired, sleepy and depressed.

A study compares the rates of successful
rehabilitation for cocaine addicts following 1 of
3 treatment options:
1. antidepressant treatment (desipramine)
2. standard treatment (lithium)
3. placebo (sugar pill)
9
Cocaine addiction
Calculate the expected cell counts if relapse is
independent of the treatment.
10
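Observed: 35% of all subjects did not relapse (26/74 ≈ 0.35).
Expected: if relapse is independent of treatment, 35% of
each treatment group would not relapse.

Expected relapse counts
              No                               Yes
Desipramine   25 × 26/74 ≈ 25 × 0.35 ≈ 8.78    25 × 0.65 ≈ 16.22
Lithium       26 × 0.35 ≈ 9.14                 26 × 0.65 ≈ 16.86
Placebo       23 × 0.35 ≈ 8.08                 23 × 0.65 ≈ 14.92

(0.35 and 0.65 are the rounded proportions 26/74 and 48/74.)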
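The expected counts for the cocaine study can be checked with a short script. This is a sketch under one assumption: the observed counts (15, 10, 7, 19, 4, 19) are taken from the actual/expected table that appears later in this lecture, and the helper name `expected_counts` is made up for illustration.

```python
def expected_counts(observed):
    """Return expected counts under independence for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected count = row total * column total / table total.
    return [[r * c / n for c in col_totals] for r in row_totals]

# Observed counts per treatment; columns are (no relapse, relapse).
treatments = ["Desipramine", "Lithium", "Placebo"]
observed = [[15, 10], [7, 19], [4, 19]]

expected = expected_counts(observed)
for name, row in zip(treatments, expected):
    print(f"{name:<12} no relapse: {row[0]:.2f}  relapse: {row[1]:.2f}")
```

Working from the unrounded proportion 26/74 rather than 0.35 is what gives 8.78 instead of 8.75 for the first cell.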
11
Situations appropriate for the chi-square test
  • The chi-square test for two-way tables looks for
    evidence of association between two
    categorical variables (factors) in sample data.
    The samples can be drawn either
  • By randomly selecting SRSs from different
    populations (or from a population subjected to
    different treatments)
  • Example: girls vaccinated for HPV or not, among
    8th graders and 12th graders
  • Example: remission or no remission for different
    treatments
  • Or by taking 1 SRS and classifying the
    individuals according to 2 categorical variables
    (factors)
  • Example: 11th graders' smoking status and their
    parents' smoking status
  • Use the test when looking for associations
    between two categorical/nominal variables.

12
  • We can safely use the chi-square test when
  • no more than 20% of expected counts are less than
    5 (< 5)
  • all individual expected counts are 1 or more
    (≥ 1)
  • What goes wrong? With small expected cell counts
    the sampling distribution of the test statistic
    is not well approximated by the chi-square
    distribution.
  • Statistician's note: If one factor has many
    levels and too many expected counts are too low,
    you might be able to collapse some of the
    levels (regroup them) and thus have large-enough
    expected counts.
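The two conditions (no more than 20% of expected counts below 5, and every expected count at least 1) are easy to check mechanically. The following is a minimal sketch; the function name and the example tables are hypothetical and only illustrate the rule.

```python
def chi_square_conditions_ok(expected):
    """Check the two rule-of-thumb conditions on a table of expected counts."""
    cells = [value for row in expected for value in row]
    share_below_5 = sum(value < 5 for value in cells) / len(cells)
    # Condition 1: no more than 20% of expected counts are below 5.
    # Condition 2: every expected count is at least 1.
    return share_below_5 <= 0.20 and min(cells) >= 1

# Hypothetical expected counts, made up for illustration.
print(chi_square_conditions_ok([[8.8, 16.2], [9.1, 16.9], [8.1, 14.9]]))  # all cells >= 5
print(chi_square_conditions_ok([[0.8, 12.0], [6.5, 20.7]]))  # one cell below 1
```

When the check fails because one factor has many sparse levels, merging rows or columns before recomputing the expected counts is the regrouping the note describes.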

13
The chi-square test for two-way tables
H0: there is no association between the row and
column variables
Ha: H0 is not true
The χ² statistic sums over all r × c cells in the
table:
$\chi^2 = \sum \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}}$
When H0 is true, the χ² statistic follows the χ²
distribution with (r-1)(c-1) degrees of freedom.
P-value: P(χ² variable ≥ calculated χ² | H0 is
true)
14
Table A
Example: df = 6
If χ² = 15.9, the P-value is between 0.01 and 0.02.
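Rather than bracketing the P-value with a table, the upper-tail probability can be computed directly. The sketch below uses only the Python standard library and the standard power series for the regularized lower incomplete gamma function; the function name `chi2_sf` is made up here, and a statistics package would normally be used instead.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(chi-square variable >= x) for df degrees of freedom.

    Uses the power series for the regularized lower incomplete gamma
    function P(a, z) with a = df / 2 and z = x / 2, then returns 1 - P(a, z).
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a          # first series term: 1 / a
    total = term
    n = 1
    while term > total * 1e-15:
        term *= z / (a + n)  # next term: previous * z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return 1.0 - lower

p_value = chi2_sf(15.9, df=6)
print(f"P-value = {p_value:.4f}")
```

For df = 6 and a statistic of 15.9 the result lies between 0.01 and 0.02, in agreement with the table lookup. The series converges slowly for very large statistics, which is acceptable for a classroom check but is one reason to prefer a library routine in practice.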
15
Table of counts actual/expected, with three
rows and two columns: df = (3 - 1)(2 - 1) = 2

              No relapse    Relapse
Desipramine   15 / 8.78     10 / 16.22
Lithium        7 / 9.14     19 / 16.86
Placebo        4 / 8.08     19 / 14.92

We compute the X² statistic:
$X^2 = \frac{(15 - 8.78)^2}{8.78} + \frac{(10 - 16.22)^2}{16.22} + \frac{(7 - 9.14)^2}{9.14} + \frac{(19 - 16.86)^2}{16.86} + \frac{(4 - 8.08)^2}{8.08} + \frac{(19 - 14.92)^2}{14.92} \approx 10.7$
  • Using Table D: 10.60 < X² < 11.98 ⇒ 0.005 >
    P > 0.0025
  • The P-value is very small (JMP gives P = 0.0047)
    and we reject H0.
  • ⇒ There is a significant relationship between
    treatment type (desipramine, lithium, placebo)
    and outcome (relapse or not).
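The whole test for the cocaine data can be sketched in a few lines. The observed counts are the "actual" values from the table of counts; the P-value line relies on the fact that, for df = 2 only, the chi-square upper tail has the closed form exp(-x/2). Variable names are illustrative.

```python
import math

treatments = ["Desipramine", "Lithium", "Placebo"]
observed = [[15, 10], [7, 19], [4, 19]]  # columns: no relapse, relapse

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Chi-square statistic: sum of (observed - expected)^2 / expected over all cells.
statistic = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        statistic += (count - expected) ** 2 / expected

# Degrees of freedom: (rows - 1) * (columns - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

# The closed form below is valid for df = 2 only.
assert df == 2
p_value = math.exp(-statistic / 2)

print(f"X2 = {statistic:.2f}, df = {df}, P = {p_value:.4f}")
```

Run as written, this should report a statistic of about 10.73 and a P-value of about 0.0047, consistent with the Table D bounds and the JMP value quoted above.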

16
Interpreting the X² output
  • When the X² test is statistically significant,
    the largest components indicate which
    condition(s) are most different from H0. You can
    also compare the observed and expected counts, or
    compare the computed proportions in a graph.

[Table: χ² components for each treatment (desipramine, lithium, placebo) and outcome (no relapse, relapse)]
The largest X² component, 4.41, is for
desipramine/no relapse. Desipramine has the
highest success rate (see graph).
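The individual components can be listed to see which cells drive the statistic. This is a sketch using the observed cocaine counts from the table of counts; names are illustrative. It works from unrounded expected counts, so the largest component comes out near 4.40, whereas the slide's 4.41 appears to come from expected counts rounded to two decimals.

```python
treatments = ["Desipramine", "Lithium", "Placebo"]
outcomes = ["no relapse", "relapse"]
observed = [[15, 10], [7, 19], [4, 19]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Component for each cell: (observed - expected)^2 / expected.
components = {}
for i, treatment in enumerate(treatments):
    for j, outcome in enumerate(outcomes):
        expected = row_totals[i] * col_totals[j] / n
        components[(treatment, outcome)] = (observed[i][j] - expected) ** 2 / expected

# List the cells from the largest component to the smallest.
for cell, value in sorted(components.items(), key=lambda item: item[1], reverse=True):
    print(f"{cell[0]:<12} {cell[1]:<11} {value:.2f}")

largest_cell = max(components, key=components.get)
```

A large component only says that a cell departs from independence; the direction (more or fewer than expected) still has to be read from the observed and expected counts.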
17
Influence of parental smoking
  • Here is a computer output for a chi-square test
    performed on the data from a random sample of
    high school students (rows are parental smoking
    habits, columns are the students' smoking
    habits). What does it tell you?

Is the sample size sufficient? What are the
hypotheses? Are the data OK for a χ² test? What
else should you ask? What is your interpretation?
18
Caution with categorical data
  • An association that holds for all of several
    groups can reverse direction when the data are
    combined to form a single group. This reversal is
    called Simpson's paradox.
  • Kidney stones

A study compared the success rates of two
different procedures for removing kidney stones:
open surgery and percutaneous nephrolithotomy
(PCNL), a minimally invasive technique.

              Open surgery   PCNL
Success           273         289
Failure            77          61
% failure          22%         17%
It turns out that, for any given patient, PCNL
is more likely to result in failure. Can you
think of a reason why?
19
The procedures are not chosen randomly by
surgeons! In fact, the minimally invasive
procedure is most likely used for smaller stones
(with a good chance of success) whereas open
surgery is likely used for more problematic
conditions.
For both small stones and large stones, open
surgery has a lower failure rate. This is
Simpson's paradox. The more challenging cases
with large stones tend to be treated more often
with open surgery, making it appear as if the
procedure were less reliable overall.
Beware of lurking variables!
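The reversal can be made concrete with counts split by stone size. The slide's own breakdown did not survive in this transcript, so the sketch below uses the per-size counts from the commonly cited kidney stone study. That is an assumption: its combined totals (273 of 350 for open surgery, 289 of 350 for PCNL) match the counts in this lecture, but the per-size figures are not in the source.

```python
# (successes, patients) by procedure and stone size.
# ASSUMPTION: per-size counts come from the commonly cited kidney stone
# study; only the combined totals appear in the lecture.
results = {
    "open surgery": {"small": (81, 87), "large": (192, 263)},
    "PCNL": {"small": (234, 270), "large": (55, 80)},
}

def failure_rate(successes, patients):
    """Fraction of patients for whom the procedure failed."""
    return (patients - successes) / patients

overall = {}
for procedure, by_size in results.items():
    for size, (successes, patients) in by_size.items():
        print(f"{procedure:<13} {size} stones: {failure_rate(successes, patients):.1%} failure")
    # Combine the two stone sizes into a single group.
    total_successes = sum(s for s, _ in by_size.values())
    total_patients = sum(p for _, p in by_size.values())
    overall[procedure] = failure_rate(total_successes, total_patients)
    print(f"{procedure:<13} overall: {overall[procedure]:.1%} failure")
```

Within each stone size open surgery fails less often, yet combined it fails more often, because most open-surgery patients fall in the harder large-stone group.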