Title: Analysis of matched data
1Analysis of matched data
2Pair Matching: Why match?
- Pairing can control for extraneous sources of variability and increase the power of a statistical test.
- Match 1 control to 1 case based on potential confounders, such as age, gender, and smoking.
3Example
- Johnson and Johnson (NEJM 287: 1122-1125, 1972) selected 85 Hodgkin's patients who had a sibling of the same sex who was free of the disease and whose age was within 5 years of the patient's. They presented the data as:
OR = 1.47; chi-square = 1.53 (NS)
From John A. Rice, Mathematical Statistics and
Data Analysis.
4Example
- But several letters to the editor pointed out that those investigators had made an error by ignoring the pairings. These are not independent samples because the sibs are paired. It is better to analyze the data like this:
OR = 2.14; chi-square = 2.91 (p = .09)
From John A. Rice, Mathematical Statistics and
Data Analysis.
5Pair Matching: Agresti example
- Match each MI case to a control without MI based on age and gender.
- Ask about history of diabetes to find out if diabetes increases your risk for MI.
6Pair Matching: Agresti example
Which cells are informative?
7Pair Matching
The OR estimate comes only from discordant pairs! The question is: among the discordant pairs, what proportion are discordant in the direction of the case vs. the direction of the control? If more discordant pairs favor the case, this indicates OR > 1.
8P(favors case | discordant pair)
9odds(favors case | discordant pair)
10OR estimate comes only from discordant pairs!!
OR = 37/16 = 2.31. Makes sense!
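The arithmetic behind this estimate can be sketched in a few lines of Python. This is only an illustration: the variable names are made up here, and the counts 37 and 16 are the two discordant cells of the diabetes/MI example.

```python
# Discordant-pair counts from the MI/diabetes example (names are illustrative).
case_only_exposed = 37     # case has diabetes, matched control does not
control_only_exposed = 16  # control has diabetes, matched case does not

n_discordant = case_only_exposed + control_only_exposed

# Proportion of discordant pairs that "favor" the case.
p_favors_case = case_only_exposed / n_discordant

# Odds that a discordant pair favors the case; this equals the matched-pairs OR.
odds_favors_case = p_favors_case / (1 - p_favors_case)
matched_or = case_only_exposed / control_only_exposed

print(f"P(favors case | discordant pair) = {p_favors_case:.3f}")
print(f"odds(favors case | discordant pair) = {odds_favors_case:.2f}")
print(f"Matched OR = {matched_or:.2f}")
```

The concordant pairs never enter the calculation, which is the point of the slide.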
11McNemar's Test
Null hypothesis: P(favors case | discordant pair) = .5 (note: equivalent to OR = 1.0, or cell b = cell c)
12McNemar's Test
By normal approximation to binomial
13McNemar's Test: generally
By normal approximation to binomial
Equivalently
14McNemar's Test
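The formulas on these slides did not survive in the transcript, so the following is a hedged sketch of the usual large-sample form of McNemar's test rather than a copy of the slide. It assumes b and c are the two discordant counts (37 and 16 in the diabetes/MI example), that under the null the number of pairs favoring the case is Binomial(b + c, .5), and that no continuity correction is applied. The function name is hypothetical.

```python
import math


def mcnemar(b, c):
    """Return (z, chi_square, two_sided_p) for discordant counts b and c."""
    n = b + c
    # Under H0, the count favoring the case is Binomial(n, 0.5):
    # mean n/2 and variance n * 0.5 * 0.5.
    z = (b - n * 0.5) / math.sqrt(n * 0.5 * 0.5)
    # Equivalent chi-square form (1 df); algebraically equal to z squared.
    chi_square = (b - c) ** 2 / n
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, chi_square, p_value


z, chi_square, p_value = mcnemar(37, 16)
print(f"Z = {z:.2f}, chi-square = {chi_square:.2f}, p = {p_value:.4f}")
```

Squaring Z reproduces the chi-square form, which is why the two versions of the test are described as equivalent.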
15RECALL: 95% confidence interval for a difference in INDEPENDENT proportions
16 95% CI for difference in dependent proportions
17 95% CI for difference in dependent proportions
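The interval formula itself is missing from the transcript. As a sketch, the code below uses the standard large-sample (Wald) interval for a difference in paired proportions; that choice is an assumption, not necessarily the exact form shown on the slide. The function name and the cell labels a, b, c, d are illustrative, and the counts are the 144 pairs of the diabetes/MI example.

```python
import math


def paired_diff_ci(a, b, c, d, z_crit=1.96):
    """Wald CI for p_case - p_control from a table of matched pairs.

    a: both exposed; b: only the case exposed;
    c: only the control exposed; d: neither exposed.
    """
    n = a + b + c + d
    # The concordant cells cancel, so the difference depends only on b - c.
    diff = (b - c) / n
    se = math.sqrt((b + c) - (b - c) ** 2 / n) / n
    return diff, diff - z_crit * se, diff + z_crit * se


diff, lower, upper = paired_diff_ci(9, 37, 16, 82)
print(f"difference = {diff:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

Unlike the independent-samples interval, the standard error here is built from the discordant cells only.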
18The connection between McNemar and
Cochran-Mantel-Haenszel Tests
19View each pair as its own age-gender stratum
Example: concordant for exposure (cell "a" from before)
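20 Number of pairs (strata) of each type:
- Both exposed (concordant): x 9
- Case exposed, control unexposed (discordant): x 37
- Control exposed, case unexposed (discordant): x 16
- Neither exposed (concordant): x 82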
21Mantel-Haenszel for pair-matched data
We want to know the relationship between diabetes
and MI controlling for age and gender (the
matching variables). Mantel-Haenszel methods
apply.
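22RECALL: The Mantel-Haenszel Summary Odds Ratio
23 Contribution of each pair type (T = 2 subjects per stratum):
- Both exposed: ad/T = 0, bc/T = 0
- Case exposed, control unexposed: ad/T = 1/2, bc/T = 0
- Control exposed, case unexposed: ad/T = 0, bc/T = 1/2
- Neither exposed: ad/T = 0, bc/T = 0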
24Mantel-Haenszel Summary OR
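As a rough check, the summary odds ratio can be computed by treating every pair as its own stratum. The sketch assumes the usual Mantel-Haenszel form, sum(ad/T) / sum(bc/T), with a = exposed case, b = exposed control, c = unexposed case, d = unexposed control. The dictionary and function names are illustrative, and the counts are those of the diabetes/MI example.

```python
# Each pair type is a 2x2 stratum (a, b, c, d) plus the number of such pairs.
#   a = exposed case, b = exposed control, c = unexposed case, d = unexposed control
PAIR_TYPES = {
    "both exposed": ((1, 1, 0, 0), 9),
    "case only exposed": ((1, 0, 0, 1), 37),
    "control only exposed": ((0, 1, 1, 0), 16),
    "neither exposed": ((0, 0, 1, 1), 82),
}


def mantel_haenszel_or(pair_types):
    """Mantel-Haenszel summary OR: sum(ad/T) / sum(bc/T) over all strata."""
    numerator = 0.0
    denominator = 0.0
    for (a, b, c, d), n_pairs in pair_types.values():
        total = a + b + c + d  # T = 2 for every matched pair
        numerator += n_pairs * a * d / total
        denominator += n_pairs * b * c / total
    return numerator / denominator


or_mh = mantel_haenszel_or(PAIR_TYPES)
print(f"Mantel-Haenszel OR = {or_mh:.2f}")
```

The factors of 1/2 cancel, so the result reduces to the ratio of the two discordant counts, 37/16.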
25Mantel-Haenszel Test Statistic (same as McNemar's)
26Concordant cells contribute nothing to the Mantel-Haenszel statistic (observed = expected)
27Discordant cells
30Example: Salmonella Outbreak in France, 1996
From: Large outbreak of Salmonella enterica serotype paratyphi B infection caused by a goats' milk cheese, France, 1993: a case finding and epidemiological study. BMJ 312: 91-94, Jan 1996.
32Epidemic Curve
33Matched Case Control Study
- Case: Salmonella gastroenteritis.
- Community controls (1:1) matched for:
- age group (< 1, 1-4, 5-14, 15-34, 35-44, 45-54, 55-64, or > 65 years)
- gender
- city of residence
34Results
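35In 2x2 table form: any goat's cheese
36In 2x2 table form: Brand A goat's cheese
37 Number of pairs (strata) of each type:
- Both exposed (concordant): x 8
- Case exposed, control unexposed (discordant): x 24
- Control exposed, case unexposed (discordant): x 2
- Neither exposed (concordant): x 25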
38Summary: 8 concordant-exposed pairs (strata) contribute nothing to the numerator (observed - expected = 0) and nothing to the denominator (variance = 0).
Summary: 25 concordant-unexposed pairs contribute nothing to the numerator (observed - expected = 0) and nothing to the denominator (variance = 0).
39Summary: 2 discordant "control-exposed" pairs contribute -.5 each to the numerator (observed - expected = -.5) and .25 each to the denominator (variance = .25).
Summary: 24 discordant "case-exposed" pairs contribute .5 each to the numerator (observed - expected = .5) and .25 each to the denominator (variance = .25).
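The per-stratum contributions summarized above can be reproduced with a short sketch. It assumes the standard Mantel-Haenszel test statistic without continuity correction: observed minus expected exposed cases, summed over strata, then squared and divided by the summed hypergeometric variances. The names are illustrative, and the pair counts are the Brand A goat's cheese data (8, 24, 2, 25).

```python
# Each pair type is a 2x2 stratum (a, b, c, d) plus the number of such pairs.
#   a = exposed case, b = exposed control, c = unexposed case, d = unexposed control
PAIR_TYPES = {
    "both exposed": ((1, 1, 0, 0), 8),
    "case only exposed": ((1, 0, 0, 1), 24),
    "control only exposed": ((0, 1, 1, 0), 2),
    "neither exposed": ((0, 0, 1, 1), 25),
}


def stratum_contribution(a, b, c, d):
    """Return (observed - expected, variance) for cell a of one stratum."""
    total = a + b + c + d
    expected = (a + b) * (a + c) / total
    variance = ((a + b) * (c + d) * (a + c) * (b + d)) / (total ** 2 * (total - 1))
    return a - expected, variance


def mantel_haenszel_chi_square(pair_types):
    sum_diff = 0.0
    sum_var = 0.0
    for cells, n_pairs in pair_types.values():
        diff, variance = stratum_contribution(*cells)
        sum_diff += n_pairs * diff
        sum_var += n_pairs * variance
    return sum_diff ** 2 / sum_var


chi_square_mh = mantel_haenszel_chi_square(PAIR_TYPES)
chi_square_mcnemar = (24 - 2) ** 2 / (24 + 2)
print(f"Mantel-Haenszel chi-square = {chi_square_mh:.2f}")
print(f"McNemar chi-square = {chi_square_mcnemar:.2f}")
```

Both lines print the same value, because only the 26 discordant pairs contribute to either statistic.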
41Extension: M:1 matched studies
- This is just a thought problem! I will not test you on the material that follows, although it's very good practice in thinking like a statistician.
- We will soon learn conditional logistic regression, which can handle all of the data we are discussing today with much more ease.
- You can see that as M increases, so does complexity!
42M:1 matched studies
- One-to-one pair matching provides the most cost-effective design when cases and controls are equally scarce.
- But when cases are the limiting factor, as with rare diseases, statistical power may be increased by selecting more than 1 control matched to each case.
- But with diminishing returns.
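To get a feel for the diminishing returns, the sketch below uses a commonly cited rule of thumb that is not stated on the slides and is assumed here: near the null, the efficiency of M controls per case relative to an unlimited number of controls is roughly M / (M + 1). The function name is illustrative.

```python
def relative_efficiency(m):
    """Approximate efficiency of M:1 matching relative to unlimited controls.

    Assumption (rule of thumb, holds near the null): M / (M + 1).
    """
    return m / (m + 1)


previous = 0.0
for m in range(1, 7):
    efficiency = relative_efficiency(m)
    # The gain from each additional control shrinks as M grows.
    print(f"M = {m}: efficiency = {efficiency:.3f}, gain = {efficiency - previous:.3f}")
    previous = efficiency
```

Under this approximation, most of the achievable gain is already in hand by about 4 controls per case.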
43M:1 matched studies
- 2:1 matched study of colorectal cancer.
- Background: Carcinoembryonic antigen (CEA) is the classical tumor marker for colorectal cancer. This study investigated whether the plasma levels of carcinoembryonic antigen and/or CA 242 were elevated BEFORE clinical diagnosis of colorectal cancer.
- From: Palmqvist R et al. Prediagnostic Levels of Carcinoembryonic Antigen and CA 242 in Colorectal Cancer: A Matched Case-Control Study. Diseases of the Colon & Rectum. 46(11):1538-1544, November 2003.
44M:1 matched studies: Prediagnostic Levels of Carcinoembryonic Antigen and CA 242 in Colorectal Cancer: A Matched Case-Control Study
- Study design: A so-called nested case-control study.
- Idea: Study subjects who were members of an ongoing prospective cohort study in Sweden had given blood at baseline, when they had no disease. Years later, blood can be thawed and tested for the presence of prediagnostic antigens.
- Key innovation: The cohort is large, the disease is rare, and it's too costly to test everyone's blood, so only test stored blood of cases and matched controls from the cohort.
45M:1 matched studies
- Two cancer-free controls were randomly selected for each case from the corresponding cohort at the time of diagnosis of the matched case.
- Matched for:
- gender
- age at recruitment (±12 months)
- date of blood sampling (±2 months)
- fasting time (< 4 hours, 4-8 hours, > 8 hours).
46 2:1 matching
- stratum = matching group
- 3 subjects per stratum
- 6 possible 2x2 tables
47Everyone exposed: non-informative
Case exposed; 1 control unexposed
Case exposed; both controls unexposed
48Case unexposed; both controls exposed
Case unexposed; 1 control exposed
Everyone unexposed: non-informative
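The six table types can be enumerated mechanically. The sketch below is illustrative only, with made-up names: a matched set is described by whether the case is exposed and by how many of its 2 controls are exposed, and a set is informative only if its members differ in exposure.

```python
from itertools import product

N_CONTROLS = 2  # 2:1 matching


def classify(case_exposed, n_controls_exposed, n_controls=N_CONTROLS):
    """Label a matched set as informative or non-informative."""
    total_exposed = int(case_exposed) + n_controls_exposed
    # If nobody or everybody in the set is exposed, the set says nothing
    # about the exposure-disease association.
    if total_exposed in (0, 1 + n_controls):
        return "non-informative"
    return "informative"


# (case exposed?, number of exposed controls): 2 x 3 = 6 possible tables.
table_types = list(product([True, False], range(N_CONTROLS, -1, -1)))

for case_exposed, n_controls_exposed in table_types:
    label = classify(case_exposed, n_controls_exposed)
    print(f"case exposed={case_exposed}, controls exposed={n_controls_exposed}: {label}")
```

With M controls per case the same enumeration gives 2(M + 1) table types, which is one way to see the complexity grow with M.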
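49RESULTS (number of matched sets of each type)
Everyone exposed: 0
Case exposed; 1 control unexposed: 2
Case exposed; both controls unexposed: 12
50
Case unexposed; both controls exposed: 0
Case unexposed; 1 control exposed: 1
Everyone unexposed: 102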
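51 2 tables with 2 exposed (CEA):
- Case unexposed; both controls exposed: x 0
- Case exposed; 1 control exposed: x 2
Together, these two groups represent all possible discordant tables (either 2 or 1 total exposed).
13 tables with 1 exposed (CEA):
- Case exposed; both controls unexposed: x 12
- Case unexposed; 1 control exposed: x 1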
52 2 tables with 2 exposed
56Summary
- P(case exposed | 2 total exposed) = 2OR/(2OR+1)
- P(case unexposed | 2 total exposed) = 1 - 2OR/(2OR+1)
- P(case exposed | 1 total exposed) = OR/(OR+2)
- P(case unexposed | 1 total exposed) = 1 - OR/(OR+2)
- Therefore, we can write a likelihood equation for our data that is a function of the OR, and use Maximum Likelihood Estimation to solve for OR.
57Applying to example data
- We'll talk (briefly!) about maximum likelihood estimation next week, but for now:
- This is the probability of our data as a function of the unknown OR.
- To find the value of the OR that maximizes the function (and therefore the likelihood of our data): take the derivative, set it equal to 0, and solve for OR.
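The likelihood expression itself is not in the transcript. As a sketch, the code below builds the log-likelihood from the four conditional probabilities in the Summary slide and the CEA counts of informative sets (2 and 0 sets with 2 total exposed; 12 and 1 sets with 1 total exposed). The names are illustrative, and the maximization finds the zero of the derivative by bisection, which is one of several ways to do it.

```python
import math

# Informative matched sets from the CEA example (names are illustrative).
N_CASE_EXP_2_TOTAL = 2     # case and 1 control exposed
N_CASE_UNEXP_2_TOTAL = 0   # case unexposed, both controls exposed
N_CASE_EXP_1_TOTAL = 12    # case exposed, both controls unexposed
N_CASE_UNEXP_1_TOTAL = 1   # case unexposed, 1 control exposed


def log_likelihood(odds_ratio):
    """Log of the probability of the observed sets as a function of the OR."""
    p2 = 2 * odds_ratio / (2 * odds_ratio + 1)  # P(case exposed | 2 total exposed)
    p1 = odds_ratio / (odds_ratio + 2)          # P(case exposed | 1 total exposed)
    return (N_CASE_EXP_2_TOTAL * math.log(p2)
            + N_CASE_UNEXP_2_TOTAL * math.log(1 - p2)
            + N_CASE_EXP_1_TOTAL * math.log(p1)
            + N_CASE_UNEXP_1_TOTAL * math.log(1 - p1))


def score(odds_ratio):
    """Derivative of the log-likelihood with respect to the OR."""
    n_case_exposed = N_CASE_EXP_2_TOTAL + N_CASE_EXP_1_TOTAL
    n_two_total = N_CASE_EXP_2_TOTAL + N_CASE_UNEXP_2_TOTAL
    n_one_total = N_CASE_EXP_1_TOTAL + N_CASE_UNEXP_1_TOTAL
    return (n_case_exposed / odds_ratio
            - 2 * n_two_total / (2 * odds_ratio + 1)
            - n_one_total / (odds_ratio + 2))


def solve_mle(low=0.01, high=1000.0, tol=1e-8):
    """Bisection: the score is positive below the maximum and negative above it."""
    while high - low > tol:
        mid = (low + high) / 2
        if score(mid) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2


or_hat = solve_mle()
print(f"Maximum likelihood estimate of OR = {or_hat:.2f}")
```

For these counts, setting the derivative to zero reduces to the quadratic 2·OR² - 49·OR - 28 = 0, whose positive root is about 25, so the numerical answer can be checked by hand.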
58Applying to example data
Breslow-Day give a simpler, robust estimate of the OR for 2:1 matching