How to use data to get The Right Answer

Transcript and Presenter's Notes

1
How to use data to get The Right Answer
  • Donna Spiegelman
  • Departments of Epidemiology and Biostatistics
  • Harvard School of Public Health
  • stdls@channing.harvard.edu

2
Triumphs of Modern Epidemiology
  • Alcohol and esophageal cancer
  • Hep B virus and liver cancer
  • HPV and cervical cancer
  • H. pylori and peptic ulcer
  • Folic acid and neural tube defects
  • Asbestos and lung cancer, mesothelioma
  • Aniline dye and bladder cancer
  • Vinyl chloride and angiosarcoma of the liver
  • Nickel and nasal cancer
  • Radon and lung cancer
  • Aspirin and MI
  • Dalkon Shield IUD and PID ...
  • www.epimonitor.net/EpiMonday/Triumph62501.htm
  • http://www-cie.iarc.fr/monoeval/crthgr01.html

3
The Start of Junk Science??
  • Definition of junk science: It is a hodgepodge
    of biased data, spurious inference, and logical
    legerdemain, patched together by researchers
    whose enthusiasm for discovery and diagnosis far
    outstrips their skill. It is a catalog of every
    conceivable kind of error: data dredging, wishful
    thinking, truculent dogmatism, and, now and again,
    outright fraud.

4
  • The Junkyard Dogs
  • Unfortunately, and increasingly today, one can
    find examples of junk science that compromise the
    integrity of the field of science and, at the
    same time, create a scare environment where
    unnecessary regulations on industry in
    particular, are rammed through without respect to
    rhyme, reason, effect or cause.
  • Michael A. Miles, former CEO of the Philip
    Morris tobacco company

6
More controversial weak effects
  • Air pollution and all-cause mortality, CVD
    mortality
  • Low dose exposure to radon and lung cancer
  • Low dose exposure to lead and neurotoxicity in
    children
  • Passive cigarette smoking and lung cancer
  • Alcohol and breast cancer
  • Oral contraceptives and breast cancer

7
Population Attributable Risk and Weak Effects
Still Important!
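
One way to see why a weak effect can still matter is Levin's formula for the population attributable risk, PAR = p(RR - 1) / [1 + p(RR - 1)], where p is the exposure prevalence. The formula is not shown on the slide, and the prevalences and relative risks below are hypothetical values chosen only for illustration.

```python
def population_attributable_risk(prevalence: float, relative_risk: float) -> float:
    """Levin's formula: fraction of all cases attributable to the exposure."""
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must lie in [0, 1]")
    if relative_risk <= 0.0:
        raise ValueError("relative_risk must be positive")
    excess = prevalence * (relative_risk - 1.0)
    return excess / (1.0 + excess)


# Hypothetical inputs: a weak effect of a common exposure versus
# a strong effect of a rare exposure.
weak_common = population_attributable_risk(prevalence=0.50, relative_risk=1.3)
strong_rare = population_attributable_risk(prevalence=0.01, relative_risk=5.0)

print(f"weak but common exposure: PAR = {weak_common:.3f}")
print(f"strong but rare exposure: PAR = {strong_rare:.3f}")
```

With these made-up inputs the weak, common exposure accounts for a larger share of cases than the strong, rare one.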
8
The research (and the researcher) that Philip
Morris did not want you to see
  • (Ragnar) Rylander was at that time at another
    Swedish university and had previously undertaken
    assignments for both Lorillard (another tobacco
    company) and Philip Morris. He was to be
    officially carried on the books as a
    consultant to FTR Fabriques de tabac réunies, a
    Philip Morris subsidiary and would be paid by
    FTR.

By means of material from internal industry
documents it can be revealed that one company,
Philip Morris, acquired a research facility,
INBIFO, in Germany and created a complex
mechanism seeking to ensure that the work done in
the facility could not be linked to Philip
Morris. In particular it involved the appointment
of a Swedish professor as a co-ordinator, who
would synthesise reports for onward transmission
to the USA. Various arrangements were made to
conceal this process.
Relation between Philip Morris and INBIFO
Source: Diethelm et al., Lancet 2005
9
Weak Associations (NEJM 1990)
  • Rylander
  • Studies of a relation between exposure to
    environmental tobacco smoke and lung cancer must
    take into account other factors and the
    possibility that exposure to environmental
    tobacco smoke may be confounded. This has not
    been considered in the majority of such studies.
    Until this has been done, the claim of causality
    between environmental tobacco smoke and lung
    cancer remains uncertain.
  • Angell
  • There is no question that epidemiologic studies
    of risk factors for disease are of growing
    interest and importance, both for individuals and
    for the public health. It is important, however,
    to remember the pitfalls in interpreting them and
    to be cautious in advising patients on the basis
    of single or conflicting studies. This is
    particularly true of studies that purport to show
    only weak associations between exposures and
    disease.

10
RCTs vs. observational studies
  • β-carotene and lung cancer
  • HRT and CVD incidence and mortality
  • VIOXX
  • Dietary fat and breast cancer

11
  • Standard designs & analysis sometimes not
    adequately controlling for
      - confounding
      - information bias
      - selection bias
  • Wrong answer?
      - Agreed: We can be doing a better job
      - Not agreed: HOW

12
Confounding: What do we do?
  • Collect data on known & suspected confounders:
    industry standard, END of mainstream epi methods
  • Time-varying confounders: MSMs, G-causal algorithm
13
Confounding: outstanding problems
  • unmeasured confounding
  • known or suspected confounders
  • unknown confounders

Fact: 47% of US breast cancer incidence is
explained by known risk factors (Madigan et al.,
JNCI, 1995; 87:1681-1695). R² in most epi regressions
(blood pressure, serum hormones) is 20-40%
(Pediatric Task Force on BP Control in Children,
Pediatrics, 2004; Hankinson, personal
communication).
Undiscovered genes? Unimagined environmental
factors? Complex non-linear interactions?
14
Solution to confounding by unknown risk
factors: randomization
  • VERY limited applicability
Outstanding questions:
  • a few strong risk factors or many weak ones?
  • many rare ones or a few common ones?
  • modeling of scenarios: do biases cancel?
NEW IDEAS NEEDED
15
Effects of unknown confounders
  • References
  • The impact of residual and unmeasured
    confounding in epidemiological studies: a
    simulation study, Davey-Smith and colleagues, Am
    J Epidemiol 2007; 166:646-655
  • Poppers, Kaposi's sarcoma, and HIV infection:
    empirical evidence of a strong confounding
    effect?, Morabia, Prev Med 1995; 24:90-95
  • Marshall and Hastrup, Am J Epidemiol, 1999;
    150:88-96, and 1996; 143:1069-1078

16
Unmeasured confounding by known or suspected risk
factors
We can use the data to get the right
answer/improve the validity of new studies!
Design: two-stage (Reilly & Salim,
http://meb.ki.se/marrei/software/)
  • Stage 1: $(D_i, E_i, C_{1i})$, $i = 1, \ldots, n$
  • Stage 2: $(D_i, E_i, C_{1i}, C_{2i})$, $i = 1, \ldots, n_2$,
    or $(E_i, C_{1i}, C_{2i})$, $i = 1, \ldots, n_2$
  • So that $(D_i, E_i, C_{1i}, \cdot)$, $i = n_2 + 1, \ldots, n_1 + n_2$,
    with $n_1 \gg n_2$
Analysis: MLE of 2-stage likelihood
References: Reilly M, 1996; Weinberg & Wacholder,
1990; Zhao & Lipsitz, 1992; Robins et al.,
1994; Cain & Breslow, AJE, 1988; many others
17
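$f(D \mid E, C_1, C_2; \beta)$ = pdf of complete data
$\Pr(I \mid D, E, C_1)$, where $I = 1$ if in stage 2, 0 otherwise
$f(D, I \mid E, C_1; \beta, \alpha) = \Pr(I \mid D, E, C_1) \int f(D \mid E, C_1, c_2; \beta)\, f(c_2 \mid E, C_1; \alpha)\, dc_2$
= likelihood of 2-stage design
Log-likelihood contributions:
  • Stage 1: $\log f(D \mid E, C_1; \beta, \alpha)$
  • Stage 2: $\log f(D \mid E, C_1, C_2; \beta) + \log f(C_2 \mid E, C_1; \alpha)$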
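
The two-stage log-likelihood above can be sketched numerically. This is a minimal illustration, not the authors' software: it assumes all four variables are binary, a logistic outcome model with coefficients `beta`, and a logistic model for C2 given (E, C1) with nuisance coefficients `alpha` (both names and both model forms are assumptions made here). The selection term Pr(I | D, E, C1) is omitted because it does not involve the parameters being estimated.

```python
import math


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def bernoulli(y, p):
    """Probability of a binary outcome y under success probability p."""
    return p if y == 1 else 1.0 - p


def f_outcome(d, e, c1, c2, beta):
    """Assumed outcome model: logit Pr(D=1 | E, C1, C2) = b0 + b1*E + b2*C1 + b3*C2."""
    b0, b1, b2, b3 = beta
    return bernoulli(d, expit(b0 + b1 * e + b2 * c1 + b3 * c2))


def f_confounder(c2, e, c1, alpha):
    """Assumed nuisance model: logit Pr(C2=1 | E, C1) = a0 + a1*E + a2*C1."""
    a0, a1, a2 = alpha
    return bernoulli(c2, expit(a0 + a1 * e + a2 * c1))


def two_stage_loglik(stage1_only, stage2, beta, alpha):
    """Observed-data log-likelihood for a two-stage design.

    stage1_only: (d, e, c1) tuples for subjects whose C2 was not measured.
    stage2:      (d, e, c1, c2) tuples for subjects with complete data.
    """
    loglik = 0.0
    for d, e, c1 in stage1_only:
        # C2 is missing: sum (the discrete analogue of integrating) over its values.
        marginal = sum(
            f_outcome(d, e, c1, c2, beta) * f_confounder(c2, e, c1, alpha)
            for c2 in (0, 1)
        )
        loglik += math.log(marginal)
    for d, e, c1, c2 in stage2:
        loglik += math.log(f_outcome(d, e, c1, c2, beta))
        loglik += math.log(f_confounder(c2, e, c1, alpha))
    return loglik


# Hypothetical toy data and parameter values.
stage1_only = [(1, 1, 0), (0, 0, 1), (0, 1, 1)]
stage2 = [(1, 1, 0, 1), (0, 0, 0, 0)]
print(two_stage_loglik(stage1_only, stage2, (-2.0, 0.7, 0.3, 0.9), (-0.5, 0.4, 0.2)))
```

In practice this function would be handed to a numerical optimizer to obtain the MLE of `beta` and `alpha` jointly.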
18
Example: Kyle Steenland, retrospective cohort
study of lung cancer in relation to occupational
silica exposure (Steenland & Greenland, AJE 2004;
160:384-392)
$f(D \mid E, C)$: $E$ = silica, $C$ = smoking
$f(D \mid E = r) = \sum_{s=1}^{S} f(D \mid E = r, C = s) \Pr(C = s \mid E = r)$, where
$r = 1, \ldots, R$ levels of exposure and $s = 1, \ldots, S$ levels of smoking
Data:
  • n1 silica workers in retrospective cohort study
  • n2 silica workers in 1987 smoking prevalence study
  • n3 NHIS participants on general population
    smoking rates in 1986
  • n4 ACS prospective cohort data on smoking & lung cancer
Likelihood (silica + 1987 smoking data + US
smoking data + ACS lung cancer & smoking data)
[Likelihood equation not transcribed. Its terms were labeled: silica, $\log f(D_i \mid E_i)$; 1987 silica smoking data; ACS; U.S. Annotation: could treat as known]
  • assume distribution of smoking during entire
    period = 1987 distribution
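
The key step in this example is averaging the smoking-specific risk over the smoking distribution at each exposure level: f(D | E = r) is the sum over s of f(D | E = r, C = s) Pr(C = s | E = r). A minimal sketch follows; the three smoking categories and every number in it are hypothetical and are not taken from the Steenland data.

```python
def marginal_risk(risk_by_smoking, smoking_distribution):
    """f(D | E = r) = sum over s of f(D | E = r, C = s) * Pr(C = s | E = r)."""
    if len(risk_by_smoking) != len(smoking_distribution):
        raise ValueError("one risk and one probability per smoking level")
    if abs(sum(smoking_distribution) - 1.0) > 1e-9:
        raise ValueError("smoking distribution must sum to 1")
    return sum(f * p for f, p in zip(risk_by_smoking, smoking_distribution))


# Hypothetical values for one exposure level r.
# Smoking levels s: never, former, current.
risk_given_smoking = [0.01, 0.02, 0.05]   # f(D | E = r, C = s), e.g. from external cohort data
smoking_among_exposed = [0.3, 0.3, 0.4]   # Pr(C = s | E = r), e.g. from a smoking survey

print(marginal_risk(risk_given_smoking, smoking_among_exposed))
```

The point of the design is that the two inputs can come from different data sources, which is why the likelihood combines several studies.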

19
Obstacles
  • software? Design software available; offsets or
    weights in PROC GENMOD or PROC PHREG can be
    used for analysis
  • training?
  • funding?
Result
  • The right answer?
  • A better answer?
  • Is it worth it?
20
INFORMATION BIAS
What do we usually do? NOTHING!
What can we do?
  • Design: main study/validation study
    (MS/EVS, MS/IVS, IVS)
  • Analysis: measurement error methods,
    misclassification methods
References: Carroll, Ruppert, Stefanski, 1995, Chapman
& Hall; Rosner et al., AJE, 1990, 1992;
Spiegelman, "Reliability studies" and
"Validation studies," Encyclopedia of Biostatistics;
Robins et al., JASA, 1994
21
EXAMPLE: FRAMINGHAM HEART STUDY
MAIN STUDY
  - 1731 men free of CHD (non-fatal MI, fatal CHD)
    at exam 4
  - Followed for 10 years for CHD incidence
    (163 events, cumulative incidence = 9.4%)
REPRODUCIBILITY STUDY
  - 1346 men with all risk factor information at
    exams 2 & 3 (subgroup of 1731 men)
  - Risk factors in main study: Age, BMI, Serum
    Cholesterol, Serum Glucose, Smoking, SBP
  - Risk factors in reproducibility study: Serum
    Cholesterol, BMI, Serum Glucose, SBP, Smoking
22
Example (from Rosner, Spiegelman, Willett, AJE,
1992): Framingham Heart Study reliability study
(n = 1346 men)
[Model equation not transcribed. Its terms were labeled: subject i's observed value at time j; subject i's true mean]
Reliability coefficients:
  CHOL  75%
  GLUC  52%
  BMI   95%
  SBP   72%
23
Assumptions
1. Measurement error model [equation not transcribed; variance terms labeled "within" and "between"]
2. Disease incidence model: log [remainder not transcribed]
3.
  • Pr (Di) is small
  • Measurement error independent of disease status

4. Reliability substudy representative of main
study
24
The Procedure
For one variable measured with unbiased, additive
error [model not transcribed]: simplest case
Step 1. Run a logistic regression of D on Z, U in
the main study [logit model not transcribed], where
  • Z is measured with error
  • U is measured without error (> 1)
25
Step 2. Estimate reliability coefficient from
reliability substudy (n2 subjects,
r replicates)
Need same # of replicates per subject
[Formula not transcribed; terms labeled: within-person variance (estimated), TOTAL]
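
A minimal sketch of Step 2, assuming the reliability coefficient is the intraclass correlation (between-person variance divided by total variance) estimated from a balanced one-way ANOVA. The slide's own formula was not transcribed, so this estimator is an assumption, and the replicate values below are hypothetical.

```python
def reliability_coefficient(replicates):
    """Estimate reliability from a balanced set of replicate measurements.

    replicates: one list per subject, each with the same number r >= 2
    of repeated measurements.
    Returns (reliability, within_person_variance, between_person_variance).
    """
    n = len(replicates)
    r = len(replicates[0]) if n else 0
    if n < 2 or r < 2 or any(len(row) != r for row in replicates):
        raise ValueError("need >= 2 subjects, each with the same number (>= 2) of replicates")

    subject_means = [sum(row) / r for row in replicates]
    grand_mean = sum(subject_means) / n

    # One-way ANOVA mean squares.
    ms_between = r * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    within_var = sum(
        (x - m) ** 2 for row, m in zip(replicates, subject_means) for x in row
    ) / (n * (r - 1))

    # Method-of-moments between-person variance, truncated at zero.
    between_var = max((ms_between - within_var) / r, 0.0)
    total_var = between_var + within_var
    if total_var == 0.0:
        raise ValueError("no variation in the data")
    return between_var / total_var, within_var, between_var


# Hypothetical data: 3 subjects, 2 replicates each.
rho, s2_within, s2_between = reliability_coefficient([[1.0, 3.0], [5.0, 7.0], [9.0, 11.0]])
print(rho, s2_within, s2_between)
```

The balanced-design check mirrors the slide's requirement of the same number of replicates per subject.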
26
Step 3. Correct.
[Formulas not transcribed; terms labeled: corrected, uncorrected, MAIN STUDY, RELIABILITY STUDY]
RELIABILITY STUDY: this contributes much less.

(Donner, Intl Stat Review, 1986)
95% C.I. for odds ratio [formula not transcribed]
Biologically meaningful comparison, e.g. 90th
percentile - 10th percentile
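
A sketch of the correction step for the simplest one-variable case, under assumptions that are not on the slide: the corrected log odds ratio is the uncorrected one divided by the reliability coefficient, its variance comes from a first-order delta method, and the variance of the reliability coefficient uses the usual large-sample formula for an intraclass correlation. The inputs reuse the cholesterol figures reported in this deck (uncorrected OR 2.21, 95% CI 1.43 to 3.39; reliability 75%; 1346 men); treating exams 2 and 3 as r = 2 replicates is also an assumption.

```python
import math


def correct_log_odds(beta, se_beta, rho, n2, r):
    """Univariate correction of a log odds ratio for measurement error.

    beta, se_beta: uncorrected estimate and standard error (main study).
    rho: reliability coefficient; n2, r: subjects and replicates in the
    reliability study.
    Returns (corrected beta, corrected SE, main-study variance term,
    reliability-study variance term).
    """
    beta_corrected = beta / rho
    # Large-sample variance of an intraclass correlation (assumed form).
    var_rho = 2.0 * (1.0 - rho) ** 2 * (1.0 + (r - 1) * rho) ** 2 / (r * (r - 1) * (n2 - 1))
    main_term = se_beta ** 2 / rho ** 2
    reliability_term = beta ** 2 * var_rho / rho ** 4
    return beta_corrected, math.sqrt(main_term + reliability_term), main_term, reliability_term


# Uncorrected cholesterol odds ratio and CI as reported in this deck.
beta = math.log(2.21)
se_beta = (math.log(3.39) - math.log(1.43)) / (2 * 1.96)

beta_c, se_c, main_term, reliability_term = correct_log_odds(beta, se_beta, rho=0.75, n2=1346, r=2)
or_corrected = math.exp(beta_c)
ci_corrected = (math.exp(beta_c - 1.96 * se_c), math.exp(beta_c + 1.96 * se_c))

print(f"corrected OR = {or_corrected:.2f}, 95% CI = ({ci_corrected[0]:.2f}, {ci_corrected[1]:.2f})")
print(f"main-study variance term = {main_term:.5f}, reliability-study term = {reliability_term:.5f}")
```

Because this sketch corrects one variable in isolation, it should not be expected to match a multivariate correction exactly.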
27
Results
10-year cumulative incidence of CHD (163 events
/ 1731 men)
Odds ratio (95% CI), uncorrected vs. corrected:

Variable     Increment     Uncorrected          Corrected
CHOL         100 mg/dl     2.21 (1.43, 3.39)    2.91 (1.62, 5.24)
GLUC         34 mg/dl      1.27 (0.97, 1.66)    1.75 (0.87, 3.52)
BMI          9.7 kg/m2     1.64 (1.04, 2.58)    1.49 (0.92, 2.43)
SBP          49 mmHg       2.80 (1.85, 4.24)    3.93 (2.19, 7.05)
SMOKE        30 cig/day    1.70 (1.17, 2.47)    1.69 (1.16, 2.47)
AGE 45-54                  2.05 (1.27, 3.33)    1.89 (1.16, 3.07)
AGE 55-64                  3.21 (1.95, 5.29)    2.85 (1.72, 4.74)
AGE 65-69                  4.30 (2.06, 8.98)    3.73 (1.67, 8.35)
28
General framework for estimation and inference in
failure time regression models
  • Main study/validation study designs

The data
$(D_i, T_i, X_i, V_i)$, $i = 1, \ldots, n_1$: main study
subjects
$(D_i, T_i, x_i, X_i, V_i)$, $i = n_1 + 1, \ldots, n_1 + n_2$:
validation study subjects
where
$T_i$ = survival time
$D_i$ = 1 if case at $T_i$, 0 otherwise
$x_i$ = perfect exposure measurement
$X_i$ = surrogate exposure measurement for $x_i$
$V_i$ = other perfectly measured covariate data
- assume sampling into validation study is at
random
Spiegelman and Logan, submitted
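
To make the main study/validation study data structure concrete, here is a minimal sketch of regression calibration, a simpler technique than the failure-time framework described above and not the method of Spiegelman and Logan. It assumes a linear calibration model E[x | X] = a + bX fitted by least squares in the validation subjects, ignores the covariates V, and uses hypothetical numbers throughout.

```python
def fit_calibration(surrogate, truth):
    """Least-squares fit of truth = intercept + slope * surrogate (validation study)."""
    n = len(surrogate)
    if n < 2 or n != len(truth):
        raise ValueError("need at least two paired (surrogate, truth) measurements")
    mean_s = sum(surrogate) / n
    mean_t = sum(truth) / n
    sxx = sum((s - mean_s) ** 2 for s in surrogate)
    if sxx == 0.0:
        raise ValueError("surrogate has no variation")
    sxt = sum((s - mean_s) * (t - mean_t) for s, t in zip(surrogate, truth))
    slope = sxt / sxx
    return mean_t - slope * mean_s, slope


def calibrate(surrogate_main, intercept, slope):
    """Replace each main-study surrogate X by its predicted true exposure E[x | X]."""
    return [intercept + slope * s for s in surrogate_main]


# Hypothetical validation study: both X (surrogate) and x (truth) observed.
validation_X = [1.0, 2.0, 3.0, 4.0]
validation_x = [1.5, 2.0, 2.5, 3.0]
intercept, slope = fit_calibration(validation_X, validation_x)

# Hypothetical main study: only the surrogate X observed.
main_X = [0.0, 10.0]
print(calibrate(main_X, intercept, slope))
```

The calibrated values would then stand in for the unobserved true exposure when the outcome model is fitted in the main study; standard errors need further adjustment, which this sketch does not attempt.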
31
Effect of radon exposure on lung cancer mortality
rates: UNM uranium miners

                                Mortality RR (95% CI)
Method        Estimate (SE)     100 WLM            500 WLM
Uncorrected   3.52 (0.658)      1.4 (1.3, 1.6)     5.8 (3.1, 11)
EPL           5.00 (1.00)       1.7 (1.4, 2.0)     12 (4.6, 32)

  • > 30% attenuation in the uncorrected estimate
  • policy implications for risk assessment
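
The slide does not state the model or the units of the estimates. The sketch below assumes a log-linear model, RR = exp(beta x WLM), with each estimate expressed per 1000 WLM; this assumption is made here only because it comes close to the tabulated relative risks, and it may not be the model actually fitted.

```python
import math


def relative_risk(beta_per_1000_wlm, se, wlm, z=1.96):
    """RR and Wald interval at a given cumulative exposure, assuming RR = exp(beta * WLM)."""
    scale = wlm / 1000.0
    point = math.exp(beta_per_1000_wlm * scale)
    lower = math.exp((beta_per_1000_wlm - z * se) * scale)
    upper = math.exp((beta_per_1000_wlm + z * se) * scale)
    return point, lower, upper


# Estimates and standard errors as reported on the slide.
estimates = {"Uncorrected": (3.52, 0.658), "EPL": (5.00, 1.00)}

for label, (beta, se) in estimates.items():
    for wlm in (100, 500):
        rr, lo, hi = relative_risk(beta, se, wlm)
        print(f"{label:12s} {wlm:4d} WLM: RR = {rr:.2f} ({lo:.2f}, {hi:.2f})")
```

Under this assumed model the gap between the corrected and uncorrected relative risks widens sharply with exposure, which is where the policy implications for risk assessment arise.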

32
Nutritional epidemiology
  • Tworoger SS, Eliassen AH, Rosner B, Sluss P, Hankinson SE. Plasma prolactin concentrations and risk of premenopausal breast cancer. Cancer Research, 2004; 64:6814-6819.
  • Hankinson SE, Willett WC, Michaud DS, Manson JE, Colditz GA, Longcope C, Rosner B, Speizer FE. Plasma prolactin levels and subsequent risk of breast cancer in postmenopausal women. Journal of the National Cancer Institute, 1999; 91:629-634.
  • Smith-Warner SA, Spiegelman D, Adami H, Beeson L, van den Brandt P, Folsom A, Fraser G, Freudenheim J, Goldbohm R, Graham S, Kushi L, Miller A, Rohan T, Speizer FE, Toniolo P, Willett WC, Wolk A, Zeleniuch-Jacquotte A, Hunter DJ. Types of dietary fat and breast cancer: a pooled analysis of cohort studies. International Journal of Cancer, 2001; 92:767-774.
  • Holmes MD, Stampfer MJ, Wolf AM, Jones CP, Spiegelman D, Manson JE, Colditz GA. Can behavioral risk factors explain the difference in body mass index between African-American and European-American women? Ethnicity and Disease, 1999; 8:331-339.
  • Rich-Edwards JW, Hu F, Michels K, Stampfer MJ, Manson JE, Rosner B, Willett WC. Breastfeeding in infancy and risk of cardiovascular disease in adult women. Epidemiology, 2004; 15:550-556.
  • Koh-Banerjee P, Chu NF, Spiegelman D, Rosner B, Colditz GA, Willett WC, Rimm EB. Prospective study of the association of changes in dietary intake, physical activity, alcohol consumption, and smoking with 9-year gain in waist circumference among 15,587 men. Am J Clin Nutr, 2003; 78:719-727.
  • Koh-Banerjee P, Franz M, Sampson L, Liu S, Jacobs Jr. DR, Spiegelman D, Willett WC, Rimm EB. Changes in whole grain, bran and cereal fiber consumption in relation to 8-year weight gain among men. Am J Clin Nutr, 2004; 5:1237-1245.
33
Environmental epidemiology
  • Keshaviah AP, Weller EA, Spiegelman D. Occupational exposure to methyl tertiary-butyl ether in relation to key health symptom prevalence: the effect of measurement error correction. Environmetrics, 2002; 14:573-582.
  • Thurston SW, Williams P, Hauser R, Hu H, Hernandez-Avila M, Spiegelman D. A comparison of regression calibration methods for measurement error in main study/internal validation study designs. Journal of Statistical Planning and Inference, 2005; 131:175-190.
    Fetal lead exposure in relation to birth weight. MS/IVS: bone lead vs. cord lead (r = 0.19)
  • Weller EA, Milton DK, Eisen EA, Spiegelman D. Regression calibration for logistic regression with multiple surrogates for one exposure. Journal of Statistical Planning and Inference, 2007; 137:449-461.
    Metal working fluids exposure in relation to lung function. MS/EVS: job characteristics vs. personal monitors (r = 0.82)
  • Horick N, Milton DK, Gold D, Weller E, Spiegelman D. Household dust endotoxin exposure and respiratory effects in infants: correction for measurement error bias. Environmental Health Perspectives, 2006; 114:135-140.
  • Li R, Weller EA, Dockery DW, Neas LM, Spiegelman D. Association of indoor nitrogen dioxide with respiratory symptoms in children: the effect of measurement error correction with multiple surrogates. Journal of Exposure Analysis and Environmental Epidemiology, 2006; 16:342-350.
34
SOFTWARE IS AVAILABLE!
  • http://www.hsph.harvard.edu/facres/spglmn.html

  • SAS macros for regression calibration in main
    study/validation study designs (Rosner et
    al., AJE, 1990, 1992; Spiegelman et al., AJCN,
    1997; Spiegelman et al., SIM, 2001)
  • STATA (Carroll et al.: SIMEX, regression
    calibration)

So why are methods under-utilized?
  • No validation data
  • Insufficient training of statisticians & epidemiologists
  • Either/or about assumptions
35
Quantitative correction for selection bias
  • Design: main study/selection study
  • Analysis: ML, SPE E-E
Note
  • large overlap w/ missing data literature
  • when D is missing, potential for selection bias
References
  • ML: Little & Rubin, Wiley, 1986
  • SPE E-E: Scharfstein et al., 1998; Rotnitzky et
    al., 1997; Robins et al., 1995
36
Basic idea
Let I = 1 if selected, 0 otherwise; Pr (I | E, C) =
selection probability
Selection study has data on those not in main
study: (Di, Ei, Ci (Ci, Ui)), i = 1, ..., n2
  • Ui: surrogates for D, risk factors for D
  • Mail, phone, house visit to get data
IPW: $W_i = \Pr^{-1}(I_i = 1 \mid D_i, E_i, C_i)$
Use PROC GENMOD w/ robust variance and weights Wi,
i = 1, ..., n1: REPEATED SUBJECT = ID / TYPE = IND
For dependent censoring (a.k.a. biased loss to
follow-up)
Assumes [remainder not transcribed]
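
The weighting idea can be illustrated with a small sketch: each selected subject is weighted by the inverse of his or her selection probability, so that selected subjects stand in for similar subjects who were not selected. The population, selection probabilities, and counts below are hypothetical, and only point estimates are computed; the robust variance that PROC GENMOD would supply is not attempted here.

```python
def weighted_risk(records, exposed):
    """Weighted risk of disease among subjects with the given exposure value.

    records: (d, e, weight) tuples for selected subjects.
    """
    num = sum(w * d for d, e, w in records if e == exposed)
    den = sum(w for d, e, w in records if e == exposed)
    return num / den


# Hypothetical selected sample: (D, E, selection probability, number selected).
# Exposed non-cases were selected with probability 0.5; everyone else with 1.0.
groups = [
    (1, 1, 1.0, 20),
    (0, 1, 0.5, 40),
    (1, 0, 1.0, 10),
    (0, 0, 1.0, 90),
]

naive = [(d, e, 1.0) for d, e, p, n in groups for _ in range(n)]     # ignores selection
ipw = [(d, e, 1.0 / p) for d, e, p, n in groups for _ in range(n)]   # W = 1 / Pr(I = 1 | D, E, C)

naive_rr = weighted_risk(naive, 1) / weighted_risk(naive, 0)
ipw_rr = weighted_risk(ipw, 1) / weighted_risk(ipw, 0)

print(f"naive risk ratio: {naive_rr:.2f}")
print(f"IPW risk ratio:   {ipw_rr:.2f}")
```

In this constructed example the underlying population has 20 cases among 100 exposed and 10 among 100 unexposed, so the weighted analysis recovers a risk ratio of 2 while the unweighted one overstates it.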
37
ENAR 2007 Spring Meeting
  • Double Sampling Designs for Addressing Loss to
    Follow-up In Estimating Mortality
  • Ming-Wen An, Johns Hopkins University
  • Constantine Frangakis, Johns Hopkins University
  • Donald B. Rubin, Harvard University
  • Constantin T. Yiannoutsos, Indiana University
    School of Medicine
  • Loss to follow-up is an important challenge that
    arises when estimating mortality, and is of
    particular concern in developing countries. In
    the absence of more active follow-up systems,
    resulting mortality estimates may be biased. One
    design approach to address this is double
    sampling, where a subset of patients who are
    lost to follow-up is chosen to be actively
    followed, often subject to resource constraints,
    with the goal of obtaining valid and efficient
    estimators. We demonstrate our results using data
    from Africa, which were collected to estimate HIV
    mortality as part of the evaluation of the
    President's Emergency Plan for AIDS Relief
    (PEPFAR).
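
A minimal sketch of how a double sample can be combined with routinely followed patients. This is a simple stratified estimator written for illustration, not the estimator proposed in the abstract; it assumes the traced patients are a simple random sample of those lost to follow-up, and all counts are hypothetical.

```python
def double_sampling_mortality(n_retained, deaths_retained, n_lost, n_traced, deaths_traced):
    """Overall mortality combining retained patients with a traced sample of the lost.

    Mortality among all lost patients is estimated from the traced subset,
    then averaged with mortality among retained patients in proportion to
    the size of each group.
    """
    if n_retained <= 0 or n_traced <= 0 or n_traced > n_lost:
        raise ValueError("check group sizes")
    mortality_retained = deaths_retained / n_retained
    mortality_lost = deaths_traced / n_traced
    share_lost = n_lost / (n_retained + n_lost)
    return (1.0 - share_lost) * mortality_retained + share_lost * mortality_lost


# Hypothetical cohort: 800 retained (40 deaths), 200 lost, 50 of the lost traced (15 deaths).
naive_estimate = 40 / 800
corrected_estimate = double_sampling_mortality(800, 40, 200, 50, 15)

print(f"retained patients only: {naive_estimate:.3f}")
print(f"with double sampling:   {corrected_estimate:.3f}")
```

When patients who are lost have higher mortality than those retained, as in these made-up numbers, an estimate based on retained patients alone is biased downward.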

38
CONCLUSIONS
  • Methods EXIST for efficient study design and
    valid data analysis when standard
    design with standard analysis gives the wrong
    answer
  • Why do epidemiologists routinely adjust for one
    source of bias only?
    (confounding by measured risk factors)
  • Barriers to utilization
      - software gaps
      - software unfriendly, no QC
      - inadequate training of students & practitioners
        (Epi & Biostat)
      - are two-stage designs fundable @ NIH?