Introduction to Categorical Data Analysis July 22, 2004 - PowerPoint PPT Presentation

About This Presentation

Description:

Title: Disordered Eating, Menstrual Irregularity, and Bone Mineral Density in Young Female Runners Author: John Last modified by: kristinc Created Date – PowerPoint PPT presentation

Slides: 49

Provided by: John4423

Learn more at: http://web.stanford.edu

Transcript and Presenter's Notes

1
Introduction to Categorical Data Analysis
July 22, 2004
2
Categorical data

The t-test, ANOVA, and linear regression all
assumed outcome variables that were continuous
(normally distributed).
Even their non-parametric equivalents assumed at
least many levels of the outcome (discrete
quantitative or ordinal).
We haven't discussed the case where the outcome
variable is categorical.

3
Types of Variables: a taxonomy
Categorical
  binary: 2 categories
  nominal: more categories
  ordinal: order matters
Quantitative
  discrete: numerical
  continuous: uninterrupted
4
Overview of statistical tests

Independent variable = predictor
Dependent variable = outcome
e.g., BMD = pounds + age + amenorrheic (1/0)

5
(No Transcript)
6
(No Transcript)
7
Difference in proportions

Example: You poll 50 people from random
districts in Florida as they exit the polls on
election day 2004. You also poll 50 people from
random districts in Massachusetts. 49% of
pollees in Florida say that they voted for Kerry,
and 53% of pollees in Massachusetts say they
voted for Kerry. Is there enough evidence to
reject the null hypothesis that the states voted
for Kerry in equal proportions?

8
Null distribution of a difference in proportions
9
Null distribution of a difference in proportions
10
Answer to Example

We saw a difference of 4% between Florida and
Massachusetts.
The null distribution predicts chance variation
between the two states of 10%.
P(our data / null distribution) = P(Z > .04/.10 = .4) > .05
Not enough evidence to reject the null.
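
The arithmetic behind this answer can be sketched in Python. This is a minimal illustration, assuming the usual pooled standard error for a difference in proportions; the function names are made up for this example and are not from the slides.

```python
import math

def two_proportion_z(p1, n1, p2, n2):
    """Return (difference, standard error, z) under the null of equal proportions."""
    # Pooled proportion: the best single estimate if both states vote alike.
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    diff = p2 - p1
    return diff, se, diff / se

def upper_tail_normal(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Florida: 49% of 50 pollees; Massachusetts: 53% of 50 pollees.
diff, se, z = two_proportion_z(0.49, 50, 0.53, 50)
p_one_sided = upper_tail_normal(z)

print(round(diff, 2), round(se, 2), round(z, 1))  # 0.04 0.1 0.4
print(round(p_one_sided, 2))                      # 0.34
```

The one-sided p-value of about .34 is well above .05, which agrees with the conclusion on the slide.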

11
Chi-square test for comparing proportions (of a
categorical variable) between groups
I. Chi-Square Test of Independence: When both
your predictor and outcome variables are
categorical, they may be cross-classified in a
contingency table and compared using a chi-square
test of independence. A contingency table
with R rows and C columns is an R x C contingency
table.
12
Example

Asch, S.E. (1955). Opinions and social pressure.
Scientific American, 193, 31-35.

13
The Experiment

A Subject volunteers to participate in a visual
perception study.
Everyone else in the room is actually a
conspirator in the study (unbeknownst to the
Subject).
The experimenter reveals a pair of cards

14
The Task Cards
Standard line
Comparison lines A, B, and C
15
The Experiment

Everyone goes around the room and says which
comparison line (A, B, or C) is correct; the true
Subject always answers last, after hearing all
the others' answers.
The first few times, the 7 conspirators give
the correct answer.
Then, they start purposely giving the (obviously)
wrong answer.
75% of Subjects tested went along with the
group's consensus at least once.

16
Further Results

In a further experiment, group size (number of
conspirators) was altered from 2-10.
Does the group size alter the proportion of
subjects who conform?

17
The Chi-Square test

Apparently, conformity is less likely with fewer
or more group members.

18

20 + 50 + 75 + 60 + 30 = 235 conformed
out of 500 experiments.
Overall likelihood of conforming = 235/500 = .47

19
Expected frequencies if no association between
group size and conformity

20

Do observed and expected differ more than
expected due to chance?

21
Chi-Square test
Rule of thumb: if the chi-square statistic is
much greater than its degrees of freedom, this
indicates statistical significance. Here 85 >> 4.
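
The observed-versus-expected comparison can be sketched in Python. The transcript gives only the conforming counts (20, 50, 75, 60, 30) and the total of 500 experiments, so this sketch assumes 100 experiments at each of five group sizes; that split is an assumption, not something the slides state. Under it the statistic comes out near 79.5 rather than the 85 quoted above, so the original table may have differed in some detail. Either value is far larger than the 4 degrees of freedom.

```python
import math

conformed = [20, 50, 75, 60, 30]   # conforming counts from the slide
group_n = [100] * 5                # ASSUMPTION: 100 experiments per group size
did_not = [n - c for n, c in zip(group_n, conformed)]
table = [conformed, did_not]       # 2 x 5 contingency table

def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an R x C table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

def chi2_upper_tail_df4(x):
    """P(X > x) for a chi-square with 4 df (closed form valid only for df = 4)."""
    return math.exp(-x / 2) * (1 + x / 2)

stat, df = chi_square_independence(table)
p_value = chi2_upper_tail_df4(stat)
print(round(stat, 1), df)  # 79.5 4
```

With no association, every group size would be expected to show 47 conformers and 53 non-conformers per 100 experiments, which is what the `expected` term computes.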
22
The Chi-Square distribution is a sum of squared
normal deviates
The expected value and variance of a
chi-square: E(x) = df, Var(x) = 2(df)
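
A quick simulation can make this concrete. The sketch below sums squared standard normal draws and checks that the sample mean and variance land near df and 2(df); the function name, seed, and number of draws are arbitrary choices for illustration.

```python
import random
import statistics

def simulate_chi_square(df, n_draws, seed=0):
    """Draw chi-square variates as sums of df squared standard normal deviates."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0, 1) ** 2 for _ in range(df)) for _ in range(n_draws)]

draws = simulate_chi_square(df=4, n_draws=20000)
mean = statistics.fmean(draws)
var = statistics.variance(draws)

# Both values should be close to the theoretical 4 and 8.
print(round(mean, 2), round(var, 2))
```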
23
Chi-Square test
Rule of thumb: if the chi-square statistic is
much greater than its degrees of freedom, this
indicates statistical significance. Here 85 >> 4.
24
Caveat

When the sample size is very small in any cell
(<5), Fisher's exact test is used as an
alternative to the chi-square test.

25
Example of Fisher's Exact Test
26
Fisher's Tea-tasting experiment
Claim: Fisher's colleague (call her Cathy)
claimed that, when drinking tea, she could
distinguish whether milk or tea was added to the
cup first. To test her claim, Fisher designed
an experiment in which she tasted 8 cups of tea
(4 cups had milk poured first, 4 had tea poured
first).
Null hypothesis: Cathy's guessing
abilities are no better than chance.
Alternative hypotheses:
Right-tail: She guesses right more than expected by chance.
Left-tail: She guesses wrong more than expected by chance.
27
Fisher's Tea-tasting experiment
Experimental Results
28
Fisher's Exact Test
Step 1: Identify tables that are as extreme or
more extreme than what actually happened. Here
she identified 3 out of 4 of the
milk-poured-first teas correctly. Is that good
luck or real talent? The only way she could have
done better is if she had identified 4 of 4 correctly.
29
Fisher's Exact Test
Step 2: Calculate the probability of the tables
(assuming fixed marginals)
30
Step 3: To get the left-tail and right-tail
p-values, consider the probability mass
function of X, where X = the number of correct
identifications of the cups with milk poured
first.
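
Steps 2 and 3 can be sketched in Python using only the standard library. With the margins fixed (4 milk-first cups, 4 tea-first cups, and 4 cups guessed as milk-first), X follows a hypergeometric distribution. The function and variable names are illustrative.

```python
from math import comb

def table_probability(x, milk_cups=4, tea_cups=4, guessed_milk=4):
    """Hypergeometric P(X = x): x milk-first cups correctly identified, margins fixed."""
    ways = comb(milk_cups, x) * comb(tea_cups, guessed_milk - x)
    return ways / comb(milk_cups + tea_cups, guessed_milk)

# Probability mass function over every possible table (X = 0..4).
pmf = {x: table_probability(x) for x in range(5)}

observed = 3
right_tail = sum(p for x, p in pmf.items() if x >= observed)
left_tail = sum(p for x, p in pmf.items() if x <= observed)
# Two-sided: add up all tables no more probable than the observed one.
two_sided = sum(p for p in pmf.values() if p <= pmf[observed] + 1e-12)

print(round(pmf[observed], 4))  # 0.2286
print(round(left_tail, 4))      # 0.9857
print(round(right_tail, 4))     # 0.2429
print(round(two_sided, 4))      # 0.4857
```

The right-tail value of about .24 means that identifying 3 or more of the 4 milk-first cups would happen by chance roughly one time in four.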
31
SAS code and output for generating Fisher's Exact
statistics for a 2x2 table
32
data tea;
  input MilkFirst GuessedMilk Freq;
  datalines;
1 1 3
1 0 1
0 1 1
0 0 3
;
run;

data tea; *Fix quirky reversal of SAS 2x2 tables;
  set tea;
  MilkFirst = 1 - MilkFirst;
  GuessedMilk = 1 - GuessedMilk;
run;

proc freq data=tea;
  tables MilkFirst*GuessedMilk / exact;
  weight Freq;
run;
33
SAS output
Statistics for Table of MilkFirst by GuessedMilk

Statistic                      DF     Value     Prob
----------------------------------------------------
Chi-Square                      1    2.0000   0.1573
Likelihood Ratio Chi-Square     1    2.0930   0.1480
Continuity Adj. Chi-Square      1    0.5000   0.4795
Mantel-Haenszel Chi-Square      1    1.7500   0.1859
Phi Coefficient                      0.5000
Contingency Coefficient              0.4472
Cramer's V                           0.5000

WARNING: 100% of the cells have expected counts less
than 5. Chi-Square may not be a valid test.

Fisher's Exact Test
----------------------------------
Cell (1,1) Frequency (F)         3
Left-sided Pr <= F          0.9857
Right-sided Pr >= F         0.2429

Table Probability (P)       0.2286
Two-sided Pr <= P           0.4857

Sample Size = 8
34
Introduction to the 2x2 Table
35
Introduction to the 2x2 Table
36
Cohort Studies
Disease
Disease-free
Target population
Disease
Disease-free
TIME
37
The Risk Ratio, or Relative Risk (RR)
38
Hypothetical Data

39
Case-Control Studies

Sample on disease status and ask retrospectively
about exposures (for rare diseases)
Marginal probabilities of exposure for cases and
controls are valid.
Doesn't require knowledge of the absolute risks
of disease
For rare diseases, can approximate relative risk
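
The rare-disease approximation can be illustrated with a short Python sketch. All counts below are hypothetical and chosen only to show the pattern; a, b, c, d follow the usual 2x2 layout (exposed with disease, exposed without, unexposed with disease, unexposed without).

```python
def risk_ratio(a, b, c, d):
    """Risk in the exposed divided by risk in the unexposed."""
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    """Cross-product ratio of a 2x2 table."""
    return (a * d) / (b * c)

# HYPOTHETICAL rare disease: risks of 0.3% vs 0.1%.
rare = (30, 9970, 10, 9990)
# HYPOTHETICAL common disease: risks of 60% vs 20%.
common = (600, 400, 200, 800)

rr_rare, or_rare = risk_ratio(*rare), odds_ratio(*rare)
rr_common, or_common = risk_ratio(*common), odds_ratio(*common)

print(round(rr_rare, 2), round(or_rare, 2))      # 3.0 3.01
print(round(rr_common, 2), round(or_common, 2))  # 3.0 6.0
```

In both tables the risk ratio is 3, but the odds ratio stays close to it only when the disease is rare.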

40
Case-Control Studies
Exposed in past

Disease
(Cases)

Not exposed
Target population
Exposed
No Disease (Controls)
Not Exposed
41
The Odds Ratio (OR)
42
The Odds Ratio
43
Properties of the OR (simulation)
44
Properties of the lnOR
Standard deviation
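
The transcript does not preserve the formula from this slide. As a hedged sketch, the standard large-sample result is that the standard error of lnOR is sqrt(1/a + 1/b + 1/c + 1/d), and a 95% confidence interval is built on the log scale and then exponentiated. The counts below are hypothetical, and the function name is made up for this example.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio, standard error of ln(OR), and a 95% confidence interval."""
    odds_ratio = (a * d) / (b * c)
    se_ln_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ln_or = math.log(odds_ratio)
    lower = math.exp(ln_or - z * se_ln_or)
    upper = math.exp(ln_or + z * se_ln_or)
    return odds_ratio, se_ln_or, (lower, upper)

# HYPOTHETICAL 2x2 counts: exposed cases, exposed controls,
# unexposed cases, unexposed controls.
or_hat, se, (lower, upper) = odds_ratio_ci(20, 80, 10, 90)
print(round(or_hat, 2), round(se, 4))
print(round(lower, 2), round(upper, 2))
```

Working on the log scale is the usual choice because lnOR is roughly symmetric and normal in large samples, while the OR itself is skewed to the right.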
45
Hypothetical Data
30
30
46
Example: Cell phones and brain tumors
(cross-sectional data)
47
Same data, but use Chi-square test or Fisher's
exact test
48
Same data, but use Odds Ratio
