Epidemiologic design from a sampling perspective

Provided by: davidj83
Learn more at: https://sites.pitt.edu

1
Epidemiologic design from a sampling perspective
  • Epidemiology II Lecture
  • April 14, 2005
  • David Jacobs

2
Why different epidemiologic designs?
  • It is generally not possible to observe everyone
    in a population
  • New questions arise after data / samples have
    been collected
  • Cost and feasibility
  • Statistical efficiency and appropriateness to
    study question

3
The possibilities
  • There are many approaches
  • Sampling from the whole population
  • Sampling from exposure
  • Sampling from caseness
  • Haphazard selection

4
True Population Configuration Underlying Epidemiologic Study Designs
[Figure: population configuration at Time 0 and Time 1]

5
True Population Configuration Underlying Epidemiologic Study Designs: Approximate numbers for Minnesota
[Figure: population configuration at Time 0 and Time 1]

6
Alternate Format: Population Values, Exposed and Diseased
                Diseased    Not diseased
Exposed         A           C
Not exposed     B           D
The numbers A, B, C, D are fixed and known exactly.

7
Alternate Format: Population Values, Exposed and Diseased
                Diseased        Not diseased
Exposed         A = 50,000      C = 950,000
Not exposed     B = 50,000      D = 2,950,000
The numbers A, B, C, D are fixed and known exactly.

8
Measures of Risk and Relative Risk: Whole Population
                Diseased    Not diseased    Odds    Risk (Probability)
Exposed         A           C               A/C     A/(A+C)
Not exposed     B           D               B/D     B/(B+D)
Exposure Ratio  A/(A+B)     C/(C+D)
  • Risk Difference = A/(A+C) - B/(B+D)
  • Risk Ratio (Relative Risk) = [A/(A+C)] / [B/(B+D)]
  • Odds Ratio = (A/C) / (B/D) = AD/BC

9
Measures of Risk and Relative Risk: Whole Population
                Diseased    Not diseased    Odds     Risk (Probability)
Exposed         50,000      950,000         0.053    0.05
Not exposed     50,000      2,950,000       0.017    0.017
Exposure Ratio  0.5         0.244
  • Risk Difference = 0.033
  • Risk Ratio (Relative Risk) = 3
  • Odds Ratio = 3.11
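
The whole-population measures can be recomputed directly from the four cell counts. The following is a minimal Python sketch, not part of the original lecture; the function name `two_by_two_measures` and the dictionary keys are illustrative choices, and the cell labels follow the slides (A, C exposed; B, D not exposed).

```python
def two_by_two_measures(A, B, C, D):
    """Odds, risks, and comparative measures for a 2x2 table.

    A = exposed diseased, C = exposed not diseased,
    B = unexposed diseased, D = unexposed not diseased.
    """
    risk_exposed = A / (A + C)
    risk_unexposed = B / (B + D)
    return {
        "odds_exposed": A / C,
        "odds_unexposed": B / D,
        "risk_exposed": risk_exposed,
        "risk_unexposed": risk_unexposed,
        "exposure_ratio_diseased": A / (A + B),
        "exposure_ratio_not_diseased": C / (C + D),
        "risk_difference": risk_exposed - risk_unexposed,
        "risk_ratio": risk_exposed / risk_unexposed,
        "odds_ratio": (A * D) / (B * C),
    }


# Population values used on the slides.
population = two_by_two_measures(A=50_000, B=50_000, C=950_000, D=2_950_000)
for name, value in population.items():
    print(f"{name}: {value:.3f}")
```

Because the function only needs four counts, the same call can be reused on the expected sample tables of each design to see which measures survive the sampling.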

10
  • Epidemiologic studies sample from A, B, C, and D
    to estimate odds or risk, and hence the risk
    difference, risk ratio (relative risk), or odds ratio
  • Epidemiologic design is determined by
    investigator control, temporality, and sampling
    fraction

11
Epidemiologic design: level of investigator control
  • Clinical Trial
      • Exposure assigned (at random)
      • Reflects temporary state
  • Observational
      • Exposure occurs naturally
      • Often reflects long term state

12
Epidemiologic design: temporality
  • Clinical Trial, Cohort, Nested case control,
    Case-cohort
      • Exposure assessed at variable times before
        disease
  • Cross-sectional
      • Exposure assessed simultaneously with disease
  • Case-control
      • Past exposure assessed simultaneously with disease

13
Epidemiologic design: sampling fraction
  • Cells A, B, C, and D are sampled at random with
    constant probability (called the sampling
    fraction)
  • Sample sizes are a, b, c, d
  • If a/A = b/B = c/C = d/D, then the sampling
    fraction is equal for all cells

14
Sampling fractions
                Diseased      Not diseased
Exposed         a/A = fA      c/C = fC
Not exposed     b/B = fB      d/D = fD
The numbers A, B, C, D are fixed and known
exactly. The numbers a, b, c, d are realized in a
given study, determined during the study.

15
Expected observations given sampling fractions
                Diseased                  Not diseased
Exposed         a = 5,000 (fA = 0.1)      c = 950 (fC = 0.001)
Not exposed     b = 1,250 (fB = 0.025)    d = 5,900 (fD = 0.002)
  • Naïve (and wrong) risks: 5000/5950 = 0.84 and
    1250/7150 = 0.175; naïve relative risk = 4.8
  • Correct risks: (5000/0.1) / (5000/0.1 + 950/0.001)
    = 0.05 and (1250/0.025) / (1250/0.025 + 5900/0.002)
    = 0.017, leading to relative risk 0.05/0.017 = 3
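
The correction above amounts to dividing each observed cell by its sampling fraction before computing risk. A small Python sketch of that arithmetic follows; the function names `naive_risk` and `weighted_risk` are hypothetical and not from the lecture, and the counts and fractions are the expected values from the table.

```python
def naive_risk(cases, noncases):
    """Risk computed from sampled counts, ignoring sampling fractions."""
    return cases / (cases + noncases)


def weighted_risk(cases, f_cases, noncases, f_noncases):
    """Risk after scaling each cell back up by 1 / sampling fraction."""
    est_cases = cases / f_cases
    est_noncases = noncases / f_noncases
    return est_cases / (est_cases + est_noncases)


# Expected counts and sampling fractions from the slide.
naive_rr = naive_risk(5000, 950) / naive_risk(1250, 5900)
risk_exposed = weighted_risk(5000, 0.1, 950, 0.001)
risk_unexposed = weighted_risk(1250, 0.025, 5900, 0.002)
corrected_rr = risk_exposed / risk_unexposed

print(f"naive relative risk:     {naive_rr:.2f}")
print(f"corrected relative risk: {corrected_rr:.2f}")
```

The sketch assumes the four sampling fractions are known exactly, which is the situation described on the slide.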

16
Observations given sampling fractions
                Diseased                       Not diseased
Exposed         a = 5,000 + ea (fA = 0.1)      c = 950 + ec (fC = 0.001)
Not exposed     b = 1,250 + eb (fB = 0.025)    d = 5,900 + ed (fD = 0.002)
Here ea, eb, ec, ed are random sampling errors.
All estimates differ from population values by
random amounts (see example in Excel file).
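
To see the random errors concretely, one realized study can be simulated by keeping each person in a cell with probability equal to that cell's sampling fraction. This is a sketch under stated assumptions and not a reproduction of the Excel example mentioned above: independent selection of each person is assumed, and the function names and the seed value are arbitrary illustrative choices.

```python
import random


def draw_count(cell_size, fraction, rng):
    """Number of people sampled when each is kept with probability `fraction`."""
    return sum(1 for _ in range(cell_size) if rng.random() < fraction)


def simulate_study(population, fractions, seed=0):
    """One realized sample (a, b, c, d) from the population cells."""
    rng = random.Random(seed)
    return {cell: draw_count(population[cell], fractions[cell], rng)
            for cell in population}


population = {"A": 50_000, "B": 50_000, "C": 950_000, "D": 2_950_000}
fractions = {"A": 0.1, "B": 0.025, "C": 0.001, "D": 0.002}

expected = {cell: population[cell] * fractions[cell] for cell in population}
observed = simulate_study(population, fractions, seed=2005)  # arbitrary seed
errors = {cell: observed[cell] - expected[cell] for cell in population}

print("observed:", observed)
print("errors:  ", errors)
```

Changing the seed gives a different realized study, so the estimates move around the population values from run to run.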
17
Epidemiologic design: sampling fraction
  • Cross-sectional: sample equally from everyone,
    fA = fB = fC = fD
  • Clinical trial, Cohort study: sample equally from
    initial exposure groups
      • fAC and fBD
      • (a+c)/(A+C) usually differs from (b+d)/(B+D) in a
        clinical trial, usually the same in a cohort study

18
Cross-sectional Study
                Diseased     Not diseased
Exposed         a/A = f      c/C = f
Not exposed     b/B = f      d/D = f
  • Sampling fraction is the same in all cells.
  • Risk and odds estimates are unbiased, so risk
    differences and ratios are unbiased.

19
Expected Cross-sectional Study
                Diseased               Not diseased
Exposed         a = 50 (fA = 0.001)    c = 950 (fC = 0.001)
Not exposed     b = 50 (fB = 0.001)    d = 2,950 (fD = 0.001)
Naïve risks and relative risks are correct!
50/1000 = 0.05, etc.

20
Observed Cross-sectional Study
                Diseased                    Not diseased
Exposed         a = 50 + ea (fA = 0.001)    c = 950 + ec (fC = 0.001)
Not exposed     b = 50 + eb (fB = 0.001)    d = 2,950 + eabc (fD = 0.001)
All estimates differ from population values by
random amounts
21
Clinical Trial or Cohort Study
                Diseased       Not diseased
Exposed         a/A = fAC      c/C = fAC
Not exposed     b/B = fBD      d/D = fBD
  • Sampling fraction is fixed within exposed and
    within not exposed. Usually fAC ≠ fBD in a
    clinical trial, fAC = fBD in a cohort study
    (which has a cross-sectional baseline).
  • Risk and odds estimates are unbiased, so risk
    differences and ratios are unbiased.

22
Expected Clinical Trial or Cohort Study
                Diseased                 Not diseased
Exposed         a = 100 (fAC = 0.002)    c = 1,900 (fAC = 0.002)
Not exposed     b = 50 (fBD = 0.001)     d = 2,950 (fBD = 0.001)
  • fAC usually = fBD in a cohort study
  • fAC may differ from fBD in a clinical trial (if
    treatment allocation is not 1:1)

23
Expected Measures of Risk and Relative Risk:
Clinical Trial or Cohort Study
                Diseased    Not diseased    Odds     Risk (Probability)
Exposed         100         1,900           0.053    0.05
Not exposed     50          2,950           0.017    0.017
Exposure Ratio  0.67        0.39            Differs from total population
  • Correct Risk Difference = 0.033
  • Correct Risk Ratio (Relative Risk) = 3
  • Odds Ratio = 3.11

24
Observed Clinical Trial or Cohort Study
                Diseased                      Not diseased
Exposed         a = 100 + ea (fAC = 0.002)    c = 1,900 - ea (fAC = 0.002)
Not exposed     b = 50 + eb (fBD = 0.001)     d = 2,950 - eb (fBD = 0.001)
All estimates differ from population values by
random amounts
25
Epidemiologic design: sampling fraction
  • Case control: sample differentially within
    diseased and within nondiseased
      • fA = fB = fAB and fC = fD = fCD
      • Usually fAB much greater than fCD

26
Case-control Study
                Diseased       Not diseased
Exposed         a/A = fAB      c/C = fCD
Not exposed     b/B = fAB      d/D = fCD
  • Sampling fraction is fixed within diseased and
    within not diseased.
  • Exposure probability and exposure odds estimates are
    unbiased, but risk, disease odds, risk
    differences and ratios are biased.
  • Odds ratio ≈ relative risk when disease is rare.

27
Expected Case-control Study
                Diseased                Not diseased
Exposed         a = 500 (fAB = 0.01)    c = 494 (fCD = 0.00052)
Not exposed     b = 500 (fAB = 0.01)    d = 1,534 (fCD = 0.00052)
fAB = 19.23 × fCD

28
Expected Measures of Risk and Relative Risk:
Case-Control Study
                Diseased    Not diseased    Odds                             Risk (Probability)
Exposed         500         494             1.01                             0.503
Not exposed     500         1,534           0.33                             0.246
Exposure Ratio  0.5         0.24            Differs from total population    Differs from total population
  • Incorrect Risk Difference = 0.257
  • Incorrect Risk Ratio (Relative Risk) = 2.04
  • Odds Ratio = 3.11: correct, and approximately the
    true relative risk
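
The reason the odds ratio survives case-control sampling while the risk ratio does not can be checked numerically: the sampling fractions cancel in ad/bc but not in the risks. The sketch below applies the case and control fractions from the expected table to the population cells; the helper names are hypothetical and not from the lecture.

```python
population = {"A": 50_000, "B": 50_000, "C": 950_000, "D": 2_950_000}


def expected_sample(pop, f_cases, f_noncases):
    """Expected cell counts when cases and noncases have separate fractions."""
    return {
        "a": pop["A"] * f_cases,
        "b": pop["B"] * f_cases,
        "c": pop["C"] * f_noncases,
        "d": pop["D"] * f_noncases,
    }


def risk_ratio(a, b, c, d):
    return (a / (a + c)) / (b / (b + d))


def odds_ratio(a, b, c, d):
    return (a * d) / (b * c)


sample = expected_sample(population, f_cases=0.01, f_noncases=0.00052)
true_rr = risk_ratio(population["A"], population["B"],
                     population["C"], population["D"])
naive_rr = risk_ratio(**sample)   # biased by the differential sampling
sample_or = odds_ratio(**sample)  # fractions cancel in ad/bc

print(f"true RR {true_rr:.2f}, naive RR {naive_rr:.2f}, OR {sample_or:.2f}")
```

Trying other values of `f_cases` and `f_noncases` changes the naive risk ratio but leaves the odds ratio unchanged.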

29
Observed Case-control Study
                Diseased                     Not diseased
Exposed         a = 500 + ea (fAB = 0.01)    c = 494 + ec (fCD = 0.00052)
Not exposed     b = 500 - ea (fAB = 0.01)    d = 1,534 - ec (fCD = 0.00052)

30
Epidemiologic design: sampling fraction
  • Nested case control: sample differentially within
    diseased and within nondiseased, starting with a
    cross-sectional base, so exposure is measured prior
    to disease diagnosis
      • fA = fB = fAB and fC = fD = fCD
      • Often fAB = 1
      • Usually fAB somewhat greater than fCD

31
Nested Case-Control Study, 1: Observed
Cross-sectional Study
                Diseased                   Not diseased
Exposed         a = 500 + ea (fA = 0.01)   c = 9,500 + ec (fC = 0.01)
Not exposed     b = 500 + eb (fB = 0.01)   d = 29,500 + eabc (fD = 0.01)
Previous cross-sectional example with sampling
fractions increased by a factor of 10

32
Nested Case-Control Study, 2: Sampling from the
cross-section
                Diseased       Not diseased
Exposed         a/A = fAB      c/C = fCD
Not exposed     b/B = fAB      d/D = fCD
  • Sampling fraction is fixed within diseased and
    within not diseased; temporality preserved.
  • Exposure probability and exposure odds estimates are
    unbiased, but risk, disease odds, risk
    differences and ratios are biased.
  • Odds ratio ≈ relative risk when disease is rare.

33
Observed Nested Case-Control Study
                Diseased                  Not diseased
Exposed         a = 500 + ea (fAB = 1)    c = 950 + ec + ec1 (fCD = 0.01)
Not exposed     b = 500 + eb (fAB = 1)    d = 2,950 + eabc + ec1 (fCD = 0.01)
ea, eb, ec are ignored; if fAB < 1, then there
is an ea1.

34
Expected Measures of Risk and Relative Risk:
Nested Case-Control Study
                Diseased    Not diseased    Odds                             Risk (Probability)
Exposed         500         950             0.526                            0.344
Not exposed     500         2,950           0.169                            0.145
Exposure Ratio  0.5         0.24            Differs from total population    Differs from total population
  • Incorrect Risk Difference = 0.199
  • Incorrect Risk Ratio (Relative Risk) = 2.38
  • Odds Ratio = 3.11: correct, and approximately the
    true relative risk

35
Epidemiologic design: sampling fraction
  • Case cohort: sample differentially within
    diseased and within everyone (diseased +
    nondiseased), starting with a cross-sectional
    base, so exposure is measured prior to disease
    diagnosis
      • fA = fB = fAB; the whole cohort is sampled at
        fABCD
      • Usually fAB = 1, while fABCD is a sizeable
        fraction like 0.1 or 0.25.

36
Case-Cohort Study, 1: Observed Cross-sectional
Study
                Diseased                   Not diseased
Exposed         a = 500 + ea (fA = 0.01)   c = 9,500 + ec (fC = 0.01)
Not exposed     b = 500 + eb (fB = 0.01)   d = 29,500 + eabc (fD = 0.01)
Previous cross-sectional example with sampling
fractions increased by a factor of 10

37
Case-Cohort Study, 2: Sampling from the
cross-section
                Diseased          Cohort (part of all participants)
Exposed         A/A, fAB = 1      (a+c)/(A+C) = fABCD
Not exposed     B/B, fAB = 1      (b+d)/(B+D) = fABCD
  • Sampling fraction is fixed within diseased and
    within not diseased; temporality preserved;
    cohort includes cases and noncases.
  • Risk and odds estimates are unbiased within
    exposed and within unexposed but differently
    weighted, so risk differences are biased
  • Risk ratios are unbiased.

38
Observed Case-Cohort Study
                Diseased                  Cohort
Exposed         a = 500 + ea (fAB = 1)    c = 1,000 + ec + ec1 (fABCD = 0.1)
Not exposed     b = 500 + eb (fAB = 1)    d = 3,000 + eabc + ec1 (fABCD = 0.1)

39
Observed Case-Cohort Study
                Case (fAB = 1)    Cohort (fABCD = 0.1)    Cohort (fABCD = 0.1)
                Diseased          Diseased                Not diseased
Exposed         a = 500 + ea      50 + ec + ec1           950 + ec + ec3
Not exposed     b = 500 + eb      50 + ec + ec2           2,950 + eabc + ec123
  • When fAB = 1, cohort diseased is a subset of
    case diseased.
  • When fAB < 1, cohort diseased usually overlaps
    case diseased.

40
Nested Case-Control vs Case Cohort
  • Same cases in both
  • For a certain sampling strategy, same noncases in
    both
  • Analytic strategy different

41
Expected Measures of Risk and Relative Risk:
Case-Cohort Study
                Diseased    Cohort    Odds    Risk (Probability)
Exposed         500         1,000     n/a     0.5
Not exposed     500         3,000     n/a     0.167
Exposure Ratio  0.5         0.25      Differs from total population
  • Incorrect Risk Difference = 0.333 (= true risk
    difference / fABCD)
  • Odds ratio = correct Risk Ratio (Relative Risk)
    = 3
  • Relative risk would be correct even if the
    disease were not rare

42
Analysis of Case Cohort Study
  • Set up table as if cohort were the control group
  • Include the overlapping cases in both cases and
    cohort
  • Compute ad/bc
  • You have estimated relative risk
  • Note:
      • If you know the cohort sampling fraction, you can
        multiply the cohort up and estimate true risks
      • Given additional error in second stage cohort
        sampling, this is less efficient than estimating
        relative risk without upweighting
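
The analysis steps above can be sketched in a few lines of Python using the expected case-cohort counts (500 cases in each exposure group with fAB = 1, and a cohort sample of 1,000 exposed and 3,000 unexposed with fABCD = 0.1). The function names are hypothetical, and the sketch ignores sampling error and variance estimation.

```python
def case_cohort_relative_risk(cases_exposed, cases_unexposed,
                              cohort_exposed, cohort_unexposed):
    """Cross-product ad/bc with the sampled cohort in the control column.

    Overlapping cases are counted both as cases and as cohort members.
    """
    return (cases_exposed * cohort_unexposed) / (cases_unexposed * cohort_exposed)


def upweighted_risk(cases, cohort_count, cohort_fraction, case_fraction=1.0):
    """Estimate true risk by scaling cases and cohort back to full size."""
    return (cases / case_fraction) / (cohort_count / cohort_fraction)


rr = case_cohort_relative_risk(500, 500, 1000, 3000)
risk_exposed = upweighted_risk(500, 1000, cohort_fraction=0.1)
risk_unexposed = upweighted_risk(500, 3000, cohort_fraction=0.1)

print(f"relative risk from ad/bc: {rr:.2f}")
print(f"upweighted risks: {risk_exposed:.3f} and {risk_unexposed:.3f}")
```

The upweighting step needs the cohort sampling fraction, whereas the cross-product does not, which matches the note that the relative risk can be estimated without upweighting.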

43
Analysis of Case Control and Case Cohort Study
  • Case Control
      • Logistic regression
      • e^b is an odds ratio (b is the regression coefficient)
      • Temporal bias?
  • Nested Case Control
      • Logistic regression
      • e^b is an odds ratio
      • No temporal bias
  • Case Cohort
      • Logistic regression or Linear regression
      • e^b is a relative risk, not an odds ratio
      • No temporal bias
      • Variance somewhat high unless robust variance
        estimate is used (e.g. PROC GENMOD with GEE
        option)

44
Disadvantage of Case Control vs. Case Cohort Study
  • Case Control and Nested Case Control
      • Inflexible: outcome is fixed
      • Even in nested case control study the sampling
        structure is usually unknown
  • Case Cohort
      • Ideal for the intended outcome or for multiple
        outcomes
      • If cohort is large enough, multiple outcomes can
        be analyzed
      • Cases can be included in analysis of alternate
        dependent variable because sampling structure is
        known

45
Person years of risk, 1
  • The foregoing assumes that all cases occur at the
    same time (or can safely be treated as such)
  • In many, even most studies, this assumption is
    reasonable
  • Person years ≈ length of followup × number of
    participants when events are rare and/or all
    participants start followup at nearly the same
    time

46
Person years of risk, 2
  • Even if there are 50% events and they occur on
    average somewhat after the midpoint of followup,
    person years > ¾ × length of followup × number of
    participants
  • Incidence density rates are somewhat higher than
    correspondingly scaled cumulative incidence
    rates, but relative risks are probably not much
    affected by computation of incidence density vs
    cumulative incidence
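
A rough numerical check of these two points is sketched below. All inputs are hypothetical (1,000 participants per group, 10 years of followup, cumulative incidences of 0.15 and 0.05), and the sketch assumes no loss to followup and that people with events stop contributing person time at a stated fraction of the followup period.

```python
def person_years(n, followup_years, event_fraction, mean_event_time_fraction=0.5):
    """Person years when non-events are followed to the end and events stop
    contributing, on average, at mean_event_time_fraction * followup."""
    event_free = n * (1 - event_fraction) * followup_years
    with_event = n * event_fraction * mean_event_time_fraction * followup_years
    return event_free + with_event


def incidence_density(n, followup_years, cumulative_incidence):
    """Events per person year, with events on average at the midpoint."""
    events = n * cumulative_incidence
    return events / person_years(n, followup_years, cumulative_incidence)


n, years = 1000, 10  # hypothetical group size and followup length

# 50% events at the midpoint give exactly 3/4 of n * years;
# events after the midpoint give more.
at_midpoint = person_years(n, years, event_fraction=0.5)
after_midpoint = person_years(n, years, event_fraction=0.5,
                              mean_event_time_fraction=0.6)

ci_exposed, ci_unexposed = 0.15, 0.05  # hypothetical cumulative incidences
cumulative_rr = ci_exposed / ci_unexposed
density_rr = (incidence_density(n, years, ci_exposed)
              / incidence_density(n, years, ci_unexposed))

print(at_midpoint, after_midpoint)
print(f"cumulative incidence ratio {cumulative_rr:.2f}, "
      f"incidence density ratio {density_rr:.2f}")
```

With these inputs the incidence density rates are a little higher than the cumulative incidence divided by years of followup, and the two ratios stay close to each other.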

47
Person years of risk, 3
  • Proportional hazards models do not allow time
    dependency in prediction, so most analyses are
    not considering followup time in this way.
  • The timing of events vs. censoring and competing
    risk may cause differences in findings for
    incidence density vs. cumulative incidence, but
    this would be rare
  • Subgroups with very different followup times
    could create problems, but this is rare

48
  • In prospective studies, events are accumulated
    over time, so incidence density methods can be
    applied
  • Nested case control
  • Case cohort

49
Nested Case-Control Study (1)
Consider the following hypothetical cohort
[Figure: follow-up of cohort members over time, with cases occurring at times t1, t2, and t3. X = lung cancer case; O = loss to follow-up]

50
Nested Case-Control Study (2)
  • At time t1 the first case occurs for which 8
    eligible controls are identified
  • Similarly, there are 5 eligible controls for the
    case at time t2, and 4 eligible controls for the
    case occurring at time t3
  • A control can become a case at a later time
    (e.g., cases at t2 and t3 serve as controls for
    the case at t1)
  • Controls can be selected randomly from all
    eligible controls (i.e., 1 or more controls for
    each case)
  • Number of eligible controls decreases with
    increasing number of matching factors
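
The selection procedure described above can be sketched as risk-set sampling. The cohort below is hypothetical: nine subjects with made-up exit times, chosen only so that the eligible-control counts come out as 8, 5, and 4 as quoted above. The function names and the seed are illustrative assumptions, and matching factors are not modeled.

```python
import random

# Hypothetical cohort: (subject id, exit time, reason for exit).
COHORT = [
    ("s1", 2.0, "case"),   # case at t1
    ("s2", 3.0, "lost"),
    ("s3", 4.0, "lost"),
    ("s4", 5.0, "case"),   # case at t2
    ("s5", 7.0, "case"),   # case at t3
    ("s6", 10.0, "end"),
    ("s7", 10.0, "end"),
    ("s8", 10.0, "end"),
    ("s9", 10.0, "end"),
]


def eligible_controls(cohort, case_id, case_time):
    """Everyone else still under follow-up when the case occurs."""
    return [sid for sid, exit_time, _ in cohort
            if sid != case_id and exit_time > case_time]


def sample_risk_sets(cohort, controls_per_case, seed=0):
    """For each case, randomly pick controls from its eligible set."""
    rng = random.Random(seed)
    matched_sets = []
    for case_id, case_time, reason in sorted(cohort, key=lambda row: row[1]):
        if reason != "case":
            continue
        eligible = eligible_controls(cohort, case_id, case_time)
        k = min(controls_per_case, len(eligible))
        matched_sets.append({
            "case": case_id,
            "time": case_time,
            "n_eligible": len(eligible),
            "controls": rng.sample(eligible, k),
        })
    return matched_sets


matched_sets = sample_risk_sets(COHORT, controls_per_case=2)
for matched in matched_sets:
    print(matched)
```

Note that `s4` and `s5`, who become cases later, are eligible controls for the first case, which is the point made in the third bullet.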

51
Case-Cohort Study (1)
Consider the following hypothetical cohort
[Figure: follow-up of cohort members over time from t0. X = lung cancer case; O = loss to follow-up]

52
Case-Cohort Study (2)
  • In closed cohort (in this case, when everybody
    enters cohort at t0), a sample of all subjects
    (sub-cohort) is randomly selected from cohort
    members at start of follow-up t0
  • In open cohort (i.e., when time of entry into
    cohort is variable), a sample of all subjects
    (sub-cohort) is randomly selected from members
    of cohort as it is followed over time (i.e.,
    regardless of when subjects entered the cohort)
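
For the closed-cohort case, sub-cohort selection can be sketched as a simple random sample drawn at t0 without regard to later disease status, with all cases then added to the analysis set. The cohort size, the number of cases, the sampling fraction, and the function name below are hypothetical illustrations, not values from the lecture.

```python
import random


def case_cohort_sample(cohort_ids, case_ids, subcohort_fraction, seed=0):
    """Draw a random sub-cohort at baseline, then add all cases."""
    rng = random.Random(seed)
    n_sub = round(len(cohort_ids) * subcohort_fraction)
    subcohort = set(rng.sample(sorted(cohort_ids), n_sub))
    cases = set(case_ids)
    return {
        "subcohort": subcohort,
        "cases_in_subcohort": subcohort & cases,
        "cases_outside_subcohort": cases - subcohort,
        "analysis_set": subcohort | cases,
    }


# Hypothetical closed cohort of 1,000 members, 50 of whom become cases.
cohort_ids = [f"p{i}" for i in range(1, 1001)]
case_ids = cohort_ids[:50]

design = case_cohort_sample(cohort_ids, case_ids, subcohort_fraction=0.1)
print(len(design["subcohort"]), "sub-cohort members")
print(len(design["cases_in_subcohort"]), "cases also in the sub-cohort")
print(len(design["analysis_set"]), "people in the analysis set")
```

Because the sub-cohort is drawn from everyone, some cases fall inside it by chance; these overlapping cases are the ones counted in both columns of the case-cohort table.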

53
Merits of time-based selection
  • From a black and white theoretical perspective,
    time-based selection makes sense: a person is a
    noncase until a certain time, then becomes a
    case.
  • In the life table approach, consistent with this
    thought, the risk set at any time point is cases
    and noncases at that time point.
  • However, much chronic disease develops slowly and
    the black-white, case-noncase formulation does
    not apply very well.
  • I am unenthusiastic generally about time-based
    selection of controls or noncases because I would
    like to maintain maximum separation between cases
    and noncases.

54
Analysis of events that evolve in time
  • Nevertheless, taking person years into account
    (incidence density analysis) is more precise than
    is analysis of cumulative incidence.
  • Analysis is therefore by Cox proportional hazards
    life table regression methods, or some similar
    technique.
  • The nested case-control method is not easily
    adapted to this type of analysis
  • The case-cohort method is easily analyzed this
    way (PROC PHREG or GENMOD with Poisson
    regression), but the variances of the slopes tend
    to be too large.
  • Robust variance estimation is possible in GENMOD
    with the GEE option; Barlow provides a SAS macro
    for use with PHREG.