Title: Using Logistic Regression In Case Control Studies
- Department of Community Health Sciences
September 27, 2002
No statistics should stand in the way of an
experimenter keeping his eyes open, his mind
flexible, and on the lookout for surprises.
(William Feller)
Background
- Quan H., Arboleda-Florez J., Fick G.H., Stuart
H.L., Love E.J. (2002) Association Between
Physical Illness and Suicide Among The Elderly.
Social Psychiatry and Psychiatric Epidemiology,
37: 190-197
- David Adler, Nimira Kanji, Kiril Trpkov, Gordon
Fick, Rhiannon M. Hughes. HPC2/ELAC2 Gene Variants
Associated with Prostate Cancer (in submission)
- The class of MDSC643.02 in the Winter Term 2002
Case Control Studies
- Investigator selects cases and controls
- Investigator determines exposure
- Primary outcome measure: odds of exposure
(yes/no)
- The Magic Odds Ratio (OR)
Case Control Studies
- Two by Two tables
- Classical Stratified Analysis (SA)
- Stratum specific odds ratios
- Crude odds ratio
- Mantel-Haenszel odds ratio
- EASY. Right?
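
As a rough illustration of the stratified analysis listed above, the sketch below computes stratum-specific, crude and Mantel-Haenszel odds ratios from two-by-two tables. The counts and the function names are made up for this example; they are not taken from either study mentioned in the Background.

```python
# Hypothetical counts, invented for illustration only.
# Each stratum is (a, b, c, d):
#   a = exposed cases,    b = unexposed cases,
#   c = exposed controls, d = unexposed controls.
strata = [
    (10, 20, 5, 40),
    (30, 15, 20, 25),
]


def odds_ratio(a, b, c, d):
    """Odds ratio from a single two-by-two table."""
    return (a * d) / (b * c)


def crude_odds_ratio(tables):
    """Collapse the strata into one table, then take its odds ratio."""
    a, b, c, d = (sum(t[i] for t in tables) for i in range(4))
    return odds_ratio(a, b, c, d)


def mantel_haenszel_odds_ratio(tables):
    """Mantel-Haenszel summary: sum(a*d/n) / sum(b*c/n) over the strata."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den


stratum_ors = [odds_ratio(*t) for t in strata]
crude_or = crude_odds_ratio(strata)
mh_or = mantel_haenszel_odds_ratio(strata)

print("Stratum-specific ORs:", stratum_ors)
print("Crude OR:", round(crude_or, 3))
print("Mantel-Haenszel OR:", round(mh_or, 3))
```

The sketch gives point estimates only. It says nothing about standard errors or tests of homogeneity across strata.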
A definition of the Chi-Square test: A
procedure any fool can carry out and frequently
does. (SJ Penn)
Logistic Regression
- 1) Model the log of the odds of exposure
- OR
- 2) Model the log of the odds of disease
- Does it matter? MOST of the time.
- Standard likelihood theory gives us its blessing for
option 1)
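
One way to write the two options down, as a sketch only: the symbols are assumptions introduced here, not notation from the slides. D is disease status (1 = case, 0 = control), E is a dichotomous exposure (1 = exposed, 0 = unexposed) and Z is a vector of stratifying covariates.

```latex
% Option 1: exposure is the dependent variable
\log \frac{P(E = 1 \mid D, Z)}{P(E = 0 \mid D, Z)}
  = \alpha_0 + \alpha_1 D + \alpha_2^{\top} Z

% Option 2: disease is the dependent variable
\log \frac{P(D = 1 \mid E, Z)}{P(D = 0 \mid E, Z)}
  = \beta_0 + \beta_1 E + \beta_2^{\top} Z
```

In a case-control study D is fixed by design and E is what gets observed, which is why option 1 matches the sampling scheme directly.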
It does not matter -
- IF the model is equivalent to a stratified
analysis,
- ... then some of the coefficients from LR will be the
same as the log(OR) values from SA
- ... not all the coefficients will be the same
though
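
A small sketch of the simplest such case: one dichotomous exposure and no other covariates, so the logistic model is saturated and its maximum likelihood fit reproduces the observed log odds exactly. The counts are hypothetical and the variable names are chosen for this example.

```python
from math import log

# Hypothetical two-by-two table, invented for illustration only.
a, b = 40, 35   # cases:    exposed, unexposed
c, d = 25, 65   # controls: exposed, unexposed

# log(OR) as a classical analysis of the table would report it.
log_or = log((a * d) / (b * c))

# Option 1: log odds of exposure, given disease status (0 = control, 1 = case).
exposure_intercept = log(c / d)            # log odds of exposure among controls
exposure_slope = log(a / b) - log(c / d)   # change from controls to cases

# Option 2: log odds of disease, given exposure status (0 = unexposed, 1 = exposed).
disease_intercept = log(b / d)             # log odds of being a case among unexposed
disease_slope = log(a / c) - log(b / d)    # change from unexposed to exposed

print(f"log(OR) from the table      : {log_or:.4f}")
print(f"slope, exposure as outcome  : {exposure_slope:.4f}")
print(f"slope, disease as outcome   : {disease_slope:.4f}")
print(f"intercept, exposure outcome : {exposure_intercept:.4f}")
print(f"intercept, disease outcome  : {disease_intercept:.4f}")
```

Both slopes equal the log(OR) from the table, but the intercepts differ. The intercept in the disease model depends on how many cases and controls the investigator chose to sample.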
Results will differ -
- In ALL other situations (at least a little)
- BUT there are those solid papers in the
literature that appear to say it's OKAY to
model the odds of disease
- AND the textbooks and standard references appear
to give a green light as well
References
- Prentice RL and Pyke R (1979) Biometrika 66(3):
403-411
- after some impenetrable mathematics
- ... is precisely the distributional statement that
would arise if a model for the odds of disease
were directly applied to the case-control data
- BUT BUT Aren't we all frequentists?
The books
- Kleinbaum, Kupper, Morgenstern
- Rothman and Greenland
- Rosner
- Matthews and Farewell
- They ALL note the estimates are OKAY
- They are ALL silent on the sampling distribution.
- BUT what about the standard errors? P-values?
It does not follow that if quantitative methods
be indiscriminately applied to inexhaustible
quantities of data, scientific understanding will
necessarily emerge. (M.K. Hubbert)
An exercise that makes no provision for the
definition and estimation of error cannot
properly be called an experiment. (D.B. DeLury)
Exposures may not be dichotomous
- If exposure is measured, then the model for
exposure could be linear regression
- There is no obvious magical odds ratio now
- BUT it is still SO SO tempting to just model the
log of the odds of disease with a continuous
independent variable (exposure)
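
To make the two choices concrete, here is a minimal sketch that fits both models to the same small set of hypothetical measurements. The data, the helper names and the hand-written Newton-Raphson fit are all assumptions made for this example, not anything taken from the studies discussed.

```python
from math import exp

# Hypothetical exposure measurements, invented for illustration only.
cases = [2.1, 3.4, 2.9, 4.0, 3.1, 1.8, 3.7, 2.5]
controls = [1.2, 2.0, 2.8, 1.5, 3.0, 2.2, 1.0, 2.6]


def mean(values):
    return sum(values) / len(values)


# Model A: exposure as the dependent variable.
# Regressing exposure on case status (0/1) by least squares gives a slope
# equal to the difference in mean exposure between cases and controls.
mean_difference = mean(cases) - mean(controls)


def fit_logistic(x, y, iterations=25):
    """Fit logit P(y=1) = b0 + b1*x by Newton-Raphson (two parameters only)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        p = [1.0 / (1.0 + exp(-(b0 + b1 * xi))) for xi in x]
        # Score vector (gradient of the log likelihood).
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Information matrix entries.
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system information * step = score.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Model B: disease as the dependent variable, exposure as the covariate.
x = cases + controls
y = [1] * len(cases) + [0] * len(controls)
intercept, log_or_per_unit = fit_logistic(x, y)

print(f"Model A, mean exposure difference (cases - controls): {mean_difference:.3f}")
print(f"Model B, log odds of disease per unit of exposure   : {log_or_per_unit:.3f}")
```

The two coefficients are on different scales and answer differently worded questions. The sketch only puts them side by side; it does not show which standard errors or P-values can be trusted.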
The modelling process -
- Can lead us in very different ways to very
different models and very different conclusions
- Quan Hude et al
- et al and Rhiannon Hughes
What about the Gate Keepers?
- Editors and Associate Editors
- Epidemiologists
- Biostatisticians
Conclusions
- I am taking yet another poke at the much maligned
case-control study
- Epidemiological issues still dominate the
challenges of designing and using case-control
studies
- It remains safe to model the exposure(s)
individually as dependent variable(s) (if we
trust the standard likelihood theory)
SJ Penn again
A definition of Power: A probability of a
possible outcome of a potential decision
conditional upon an imaginable circumstance given
a conceivable value of an algebraic embodiment of
an abstract mathematical idea and the strict
adherence to an extremely precise rule.