Title: Observational Studies of Disease
1. Observational Studies of Disease
- Descriptive (incidence, prevalence)
- Analytic (associate characteristics of population with risk of disease)
- Population experience can be studied with group-level or individual-level data
- Studies using group-level data are called ecological studies
2. Studies Using Group Data (1)
- Disadvantages of group data/ecological studies
  - Most group data measures are made on individuals but not under the investigator's control
  - Do not know if persons with a given characteristic are those at higher risk of disease: the ecological fallacy
  - Confounding is a problem for all observational studies but greatest with group data: lack of investigator control in measuring confounding variables
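The ecological fallacy can be illustrated with a small numerical sketch. The three areas and all counts below are hypothetical, chosen so that the group-level data show disease rates rising with exposure prevalence while, within every area, exposed and unexposed persons have identical risk.

```python
# Hypothetical areas: (exposed persons, cases among exposed,
#                      unexposed persons, cases among unexposed)
areas = {
    "A": (200, 4, 800, 16),
    "B": (500, 20, 500, 20),
    "C": (800, 48, 200, 12),
}

def group_level_view(areas):
    """Return (exposure prevalence, disease rate) per area, as group data would show."""
    view = {}
    for name, (n_exp, cases_exp, n_unexp, cases_unexp) in areas.items():
        total = n_exp + n_unexp
        view[name] = (n_exp / total, (cases_exp + cases_unexp) / total)
    return view

def individual_level_view(areas):
    """Return the within-area risk ratio (exposed versus unexposed)."""
    return {
        name: (cases_exp / n_exp) / (cases_unexp / n_unexp)
        for name, (n_exp, cases_exp, n_unexp, cases_unexp) in areas.items()
    }

group = group_level_view(areas)
within = individual_level_view(areas)
for name in areas:
    prevalence, rate = group[name]
    print(f"Area {name}: exposure prevalence {prevalence:.0%}, "
          f"disease rate {rate:.1%}, within-area risk ratio {within[name]:.2f}")
```

In this constructed example the area-level comparison suggests that exposure raises risk, yet the individual-level risk ratio is 1 in every area; the gradient comes entirely from differences in baseline risk between areas.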
3. Studies Using Group Data (2)
- Advantages of group data/ecological studies
  - Inexpensive: secondary data already collected (vital statistics, disease registries, HMOs, etc.)
  - Rapid test of hypothesis
  - Idea that ecological studies are hypothesis-generating doesn't reflect their usual purpose
  - If hypothesized risk factor is associated with disease, it may well be seen in group-level data
  - Can overcome threshold problem: exposure is so universal that effect is difficult to detect in one setting
4. Studies Using Group Data (3)
- Advantages of group data/ecological studies
  - Some disease transmission dynamics can only be studied at group level (e.g., herd immunity and infectious disease transmission)
  - Allows global measures of group characteristics (e.g., type of health care system)
  - Allows tests of area-level interventions (e.g., closing of a public hospital)
5. Studies Using Group Data (4)
- Not all ecological studies are created equal
  - Typical study relates a disease rate across different geographic areas with an aggregate measure of a characteristic of the individuals in the areas (e.g., average alcohol consumption) or with a global measure of a characteristic of the area (e.g., climate). No attempt is made to control for confounding.
  - Quality of secondary data varies widely
  - Length of time secondary data has been collected varies (relevant to looking at an association in different time periods)
6. Strategies used to strengthen ecological studies (1)
- There are several strategies that can strengthen inferences from ecological studies
- Multiple kinds of comparisons to strengthen inference of association, e.g., across geographic areas and over different time periods
- Example: Valerie Beral's study showing inverse association between average family size and ovarian cancer mortality using comparisons among different birth cohorts, different countries, and different social and ethnic groups (Lancet, 1978)
7. Strategies used to strengthen ecological studies (2)
- Small area analysis: used in health services research to investigate variation within small geographic areas
- Reduce confounding by comparing small areas from a larger area thought to be fairly homogeneous on potential confounders (SES, disease prevalence)
- Example: Wennberg's study of variation in rates of surgical procedures in 6 areas of Vermont with similar disease prevalence (Medical Care, 1987)
8. Strategies used to strengthen ecological studies (3)
- Mixed studies that collect data on individuals but use secondary group data for rare outcomes
- Doesn't avoid ecological fallacy but reduces confounding by key measures at individual level
- Using group data may make feasible a study that would otherwise be prohibitively expensive
- Example: Bindman's study of health care access and rates of preventable hospitalizations in California medical service areas (JAMA, 1995)
9. Studies Using Individual Data
- Unifying concept: characterize morbidity and mortality in a defined population during a defined period of time
- Defined population (study base): morbidity and mortality experience of a cohort of individuals over time
- Cohort studies, case-control studies, and cross-sectional studies are best understood within the framework of a common study base
10. Study Base
- Establish by
  - Assembling an explicit cohort from
    - Sample of larger population of interest
    - Sample of persons with and without an exposure of interest
  - Identifying cases of a specific disease and defining population that gave rise to the cases
    - Defines the cohort within which the cases occurred
- Study designs differ in how they sample disease experience of the study base
11. Cohort Study
- Easiest design to understand because it explicitly defines the study base as a cohort
- Measures individual characteristics before disease occurrence, fulfilling the temporal order required for cause and effect (but is not the only study design that can do this)
- Provides conceptual basis for understanding sampling strategies of case-control, case-cohort, and cross-sectional designs
12. Cohort Study
[Figure: follow-up lines for cohort members along a Time of Follow-up axis from Begin to End, grouped into subjects dying or lost to follow-up and subjects followed until end of study. Key: X = dead, L = lost, D = disease.]
13. Types of Cohort Studies
- Fixed (closed) versus dynamic (open) cohort
  - Fixed: all subjects identified at baseline in study
  - Dynamic (open): additional subjects taken during follow-up; subjects enter at different times
- Fixed versus dynamic exposure measurement
  - Fixed: groups of individuals with and without exposure of interest do not change during follow-up. Sometimes assembled and followed as two separate cohorts of exposed and unexposed.
  - Dynamic: exposure may vary during follow-up (e.g., individuals stop or start a behavior, or exposure is defined by an accumulation of years of exposure)
14. Measuring exposures that can vary over time in a cohort study
- Simple cohort study with exposure status fixed at baseline calculates risk by number of cases of disease among exposed and unexposed subjects
- More complex cohort studies allow individuals to change from exposed to unexposed and therefore have to calculate disease occurrence on basis of both number of persons and length of time exposed (called person-time)
- Diseases with long incubation periods, such as cancer, require lag time to be taken into account in relating exposure to disease occurrence
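A minimal sketch of a person-time calculation when exposure can change during follow-up. The `Interval` record layout, the function name, and the four example subjects are hypothetical; each person contributes time to the exposed or unexposed denominator according to their status in each interval, and lag time is not handled here.

```python
from dataclasses import dataclass

@dataclass
class Interval:
    """One stretch of follow-up for one person (hypothetical record layout)."""
    start: float    # years since entry into the cohort
    end: float      # years since entry into the cohort
    exposed: bool   # exposure status during this interval
    event: bool     # True if disease onset ends this interval

def person_time_rates(intervals):
    """Return {exposed_flag: (cases, person_years, rate per person-year)}."""
    totals = {True: [0, 0.0], False: [0, 0.0]}
    for iv in intervals:
        totals[iv.exposed][0] += int(iv.event)
        totals[iv.exposed][1] += iv.end - iv.start
    return {
        flag: (cases, py, cases / py if py > 0 else float("nan"))
        for flag, (cases, py) in totals.items()
    }

# Person 1: exposed for 4 years, then unexposed for 6 years, no disease.
# Person 2: exposed for 5 years, disease onset at year 5.
# Person 3: unexposed for 8 years, disease onset at year 8.
# Person 4: unexposed for 2 years, then exposed for 3 years, disease at year 5.
follow_up = [
    Interval(0, 4, True, False), Interval(4, 10, False, False),
    Interval(0, 5, True, True),
    Interval(0, 8, False, True),
    Interval(0, 2, False, False), Interval(2, 5, True, True),
]

rates = person_time_rates(follow_up)
rate_ratio = rates[True][2] / rates[False][2]
print(f"Exposed: {rates[True][0]} cases / {rates[True][1]} person-years")
print(f"Unexposed: {rates[False][0]} cases / {rates[False][1]} person-years")
print(f"Rate ratio: {rate_ratio:.2f}")
```

Persons 1 and 4 appear in both denominators, which a simple count of exposed and unexposed subjects at baseline could not represent.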
15. Threats to Validity of a Cohort Study
- Ascertainment of disease outcome
  - Length of follow-up
  - Time between ascertainment of status (visits, follow-up interviews, medical record checks, etc.)
  - Subtlety of disease onset (case definition)
  - Secondary data sources for outcomes (e.g., registries)
- Subjects lost during follow-up
  - Key issue is whether losses are related to exposure and disease outcome
  - If disease incidence is important outcome, losses may bias results even if not related to exposures
16. Threats to Validity of a Cohort Study (2)
- Long follow-up time is the biggest threat to validity of cohort studies
  - Difficult to retain cohort and ascertain all outcomes
  - Bias from loss to follow-up is analogous to bias in case-control study based on prevalent cases
- Large size and expense of cohort may require compromise in measurements
- Can be complicated to measure dynamic exposures and allow for incubation periods
17. Common paradigm of study design presents the time the study is undertaken as key to design but neglects the time measurements are taken and the concept of study base
[Figure: study designs placed along a time axis labeled Past, Present, Future]
- Cross-sectional: classify exposure and disease at one time
- Cohort: classify by exposure, then classify by disease
- Case-control: classify by disease, then classify by exposure
18. Timing of Study and Measurements
- Prospective versus retrospective study terminology is not always clear about when measurements were made
- Exposure and disease measurements may be concurrent, non-concurrent, or both with respect to the experience of the study base
- Study may be carried out concurrently, non-concurrently, or both with respect to the experience of the study base
19. Timing of Study and Measurements
- Some authors designate case-control studies as retrospective studies: inaccurate, since cohort studies can also be retrospective
- Distinction between when measurement of exposure was made and when the study recorded it
- Key issue for causality is measuring exposure before disease, and that is not design dependent
20. Timing of measurement of exposures and disease with respect to timing of study
[Figure: three schematics, each relating Diet and Exercise measurements and CHD to the point at which the study begins]
- A: Study begins and makes measurements
- B: Study begins and records measurements made previously in medical record
- C: Study begins, asks subjects to recall information in the past
21. Timing of Study and Measurements (2)
- Schematic A is a prospective or concurrent cohort study
- Schematic B could be either retrospective cohort or case-control using medical records
- Schematic C is most often a case-control study but could be a retrospective cohort that uses recall to ascertain exposure
- Mixed designs are also possible with some measures concurrent with study and some measures non-concurrent
22. Cross-sectional Study
- In context of a cohort, a cross-sectional sample is equivalent to sampling those with prevalent disease and those without at one point in time in the follow-up
- A comparison of exposure in those with and without disease is equivalent to a case-control study using prevalent cases and concurrent controls
23. Cross-sectional Study in Context of a Cohort
Cross-sectional sample of cohort (population) at one point in time. Equivalent to sampling prevalent cases and concurrent controls.
Possible source of bias: missing potential subjects
[Figure: cohort follow-up lines with the subjects in the cross-sectional study marked at one point in time. Key: D = disease case, C = control (no disease).]
24. Cross-sectional Study in a Dynamic Population
Cross-sectional sample of a dynamic population differs from sampling in fixed cohort setting. Persons enter as well as leave the population. Disease sampling is still of prevalent cases.
[Figure: follow-up lines showing persons entering the population and persons leaving the population, with the subjects in the cross-sectional study marked. Key: D = disease case.]
25. Cross-sectional Study (2)
- Using cross-sectional study to identify a cohort can provide a representative sample (prevalent cases of disease usually excluded)
- Repeated cross-sectional studies of a population can provide important information on trends a cohort might miss
- Although major weakness is problem of temporal order (cause and effect), timing of exposure and disease can sometimes be determined retrospectively
26. Case-Control Study Designs (1)
- Best conceptualized as occurring within a cohort study
- Variations on the case-control design come from how the cases and the controls are sampled
- From the point of view of design, case-control studies can be just as valid as cohort studies
- Threats to validity come from greater difficulty in defining and sampling the study base and in measuring exposure prior to disease
27. Case-Control Studies: Case-Based Sampling
- In context of a cohort, case-based sampling identifies all cases of disease during the follow-up period and samples individuals who are disease free at the time of study (end of follow-up in the cohort context)
- Unbiased sample of cases but possibly biased sample of controls
- Requires rare disease assumption for odds ratio to estimate relative risk
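The rare disease assumption can be checked with simple arithmetic. The sketch below uses hypothetical cumulative risks and a hypothetical helper function; it compares the risk ratio with the odds ratio that case-based sampling would estimate, once for a rare disease and once for a common one.

```python
def risk_ratio_and_odds_ratio(risk_exposed, risk_unexposed):
    """Return (risk ratio, odds ratio) for the given cumulative risks."""
    risk_ratio = risk_exposed / risk_unexposed
    odds_exposed = risk_exposed / (1 - risk_exposed)
    odds_unexposed = risk_unexposed / (1 - risk_unexposed)
    return risk_ratio, odds_exposed / odds_unexposed

# Rare disease: risks of 0.2% and 0.1% over follow-up (hypothetical values).
rare_rr, rare_or = risk_ratio_and_odds_ratio(0.002, 0.001)

# Common disease: risks of 40% and 20% over follow-up (hypothetical values).
common_rr, common_or = risk_ratio_and_odds_ratio(0.40, 0.20)

print(f"Rare disease:   risk ratio {rare_rr:.3f}, odds ratio {rare_or:.3f}")
print(f"Common disease: risk ratio {common_rr:.3f}, odds ratio {common_or:.3f}")
```

Both scenarios have a risk ratio of 2, but the odds ratio stays close to 2 only when the disease is rare; with the common disease it overstates the risk ratio.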
28. Case-Control Study with Case-Based Sampling
Sampling within a cohort study: ascertaining all cases and sampling controls from subjects disease free at end of follow-up
Possible bias: potential controls not in study at end of follow-up
[Figure: cohort follow-up lines with the subjects in the case-control study marked. Key: D = disease case, C = control (no disease).]
29. Case-Control Studies: Case-Based Sampling (2)
- Case-based sampling is most common case-control design outside setting of explicit cohort
- The study base that gave rise to the cases is often not defined
- Examples of study bases
  - Cases from population disease registry: study base is the population covered by the registry
  - Cases from HMO: study base is plan members
  - Hospital cases: study base is persons who would have been admitted to hospital with the disease
30. Nested Case-control Studies
- Nested case-control studies occur within a defined cohort and sample controls from the risk set of persons at risk in the cohort at the occurrence of each case (called incidence-density sampling)
- Controls may become cases at some point later in follow-up (true of any study design if everyone is not followed until death)
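Risk-set (incidence-density) sampling can be sketched in a few lines. This is a simplified illustration, not a full implementation: the cohort layout, function names, and six subjects are hypothetical, everyone is assumed to enter follow-up at time 0, and tied cases are excluded from each other's risk sets.

```python
import random

def risk_set_at(cohort, case_id, event_time):
    """Persons still under follow-up and disease free when the case occurs.

    cohort maps person id -> (exit_time, is_case), where exit_time is the time
    of diagnosis for cases and the end of follow-up or loss for everyone else.
    """
    return sorted(
        pid for pid, (exit_time, is_case) in cohort.items()
        if pid != case_id
        and (exit_time > event_time or (exit_time == event_time and not is_case))
    )

def incidence_density_sample(cohort, controls_per_case=1, seed=0):
    """Return a list of (case_id, event_time, sampled_control_ids)."""
    rng = random.Random(seed)
    cases = sorted(
        (exit_time, pid) for pid, (exit_time, is_case) in cohort.items() if is_case
    )
    matched_sets = []
    for event_time, case_id in cases:
        risk_set = risk_set_at(cohort, case_id, event_time)
        k = min(controls_per_case, len(risk_set))
        matched_sets.append((case_id, event_time, rng.sample(risk_set, k)))
    return matched_sets

# Hypothetical cohort: three cases (a, c, f), one loss (b), two followed to the end.
cohort = {
    "a": (2.0, True),
    "b": (3.5, False),
    "c": (5.0, True),
    "d": (8.0, False),
    "e": (8.0, False),
    "f": (6.0, True),
}

matched_sets = incidence_density_sample(cohort)
for case_id, event_time, controls in matched_sets:
    print(f"case {case_id} at t={event_time}: controls {controls}")
```

Subjects c and f are eligible as controls for case a because they are still disease free at that time, which is how a control can later become a case.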
31. Nested Case-Control (Incidence Density Sampling)
Sampling within a cohort: including all cases and sampling controls from subjects disease free at the time each case is diagnosed
Cases = 10 Ds; Controls = 10 Cs; formed from 9 risk sets
[Figure: cohort follow-up lines with cases (D) and sampled controls (C), the subjects in the case-control study, grouped into Risk Set 1 through Risk Set 9.]
32. Nested Case-control Studies (2)
- In example, 10 cases occur at nine points in time (2 cases occur at same time) giving rise to 9 risk sets
- One control for each case is selected in each risk set, so 2 controls selected in risk set with 2 cases
- One of the controls, selected in the second risk set, becomes a case at the fourth risk set
33. Nested Case-control Studies (3)
- "Nested" is used by some authors to mean any case-control study conducted within a cohort study; used here to mean incidence-density sampling design
- Outside of prior cohort study, incidence sampling of the study base giving rise to the cases produces same nested design
- Example: identify cases as they occur from cancer disease registry for S.F. county and obtain controls from random sample of county at time each case occurs
34. Nested Case-control Studies (4)
- Avoids potential biases of prevalent controls or prevalent cases
- Incidence density sampling gives unbiased estimate of ratio of disease rates in exposed and unexposed subjects
- Controls for secular (calendar time) trends since cases and controls are matched on calendar time
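Why incidence density sampling estimates the rate ratio can be sketched in a short derivation. The symbols are introduced here for illustration and do not appear in the slides: $I_1$ and $I_0$ are the disease rates in exposed and unexposed subjects, and $T_1$ and $T_0$ are the person-time each group contributes. The sketch assumes controls are sampled in proportion to person-time at risk.

```latex
\begin{align*}
\text{exposure odds among cases} &= \frac{I_1 T_1}{I_0 T_0} \\
\text{exposure odds among controls} &= \frac{T_1}{T_0} \\
\text{odds ratio} &= \frac{I_1 T_1 / (I_0 T_0)}{T_1 / T_0} = \frac{I_1}{I_0}
\end{align*}
```

The person-time terms cancel, so under these assumptions the odds ratio equals the rate ratio without any rare disease assumption.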
35. Case-Cohort Studies
- Alternative design to nested case-control study: in context of a cohort, selects all cases and takes random sample of the cohort at baseline for controls
- Like the nested design, some persons selected as controls may become cases
- Like the nested design, can be extended outside setting of cohort study to a study base
36. Case-Cohort Study
Sampling within a cohort: including all cases and sampling controls from all subjects at baseline of cohort
[Figure: follow-up lines for the study subjects, with controls in the case-cohort study marked at baseline and cases in the case-cohort study marked during follow-up. Key: D = disease case, C = control (no disease).]
37. Case-Cohort Studies (2)
- Taking random sample of cohort at baseline gives estimate of prevalence of exposure in the cohort and allows calculation of attributable risk
- Controls are not linked to timing of disease occurrence so not matched to cases on calendar time
- A single baseline control group can be used for more than one disease outcome
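A minimal sketch of how a baseline sub-cohort yields exposure prevalence, a crude risk ratio, and an attributable measure. The cohort counts and function name are hypothetical, the estimate is unadjusted and ignores follow-up time, and the attributable measure shown is the population attributable fraction from Levin's formula, which is one of several quantities the term "attributable risk" can refer to.

```python
import random

def case_cohort_estimates(cohort, subcohort_size, seed=0):
    """Estimate exposure prevalence, risk ratio, and attributable fraction.

    cohort is a list of (exposed, is_case) tuples, one per member at baseline.
    All cases are used; controls are a random sample of the whole baseline cohort.
    """
    rng = random.Random(seed)
    subcohort = rng.sample(cohort, subcohort_size)
    cases = [member for member in cohort if member[1]]

    # Exposure prevalence comes from the baseline sample, not from the cases.
    p_exposed = sum(exposed for exposed, _ in subcohort) / len(subcohort)

    exposed_cases = sum(exposed for exposed, _ in cases)
    unexposed_cases = len(cases) - exposed_cases

    # Ratio of case counts divided by ratio of baseline exposure counts.
    risk_ratio = (exposed_cases / unexposed_cases) / (p_exposed / (1 - p_exposed))

    # Levin's formula for the population attributable fraction.
    excess = p_exposed * (risk_ratio - 1)
    return p_exposed, risk_ratio, excess / (1 + excess)

# Hypothetical cohort: 1000 exposed (40 cases) and 3000 unexposed (60 cases).
cohort = (
    [(True, True)] * 40 + [(True, False)] * 960
    + [(False, True)] * 60 + [(False, False)] * 2940
)

p_full, rr_full, paf_full = case_cohort_estimates(cohort, len(cohort))
p_sub, rr_sub, paf_sub = case_cohort_estimates(cohort, 400)
print(f"Full cohort: prevalence {p_full:.3f}, RR {rr_full:.2f}, PAF {paf_full:.2f}")
print(f"Sub-cohort of 400: prevalence {p_sub:.3f}, RR {rr_sub:.2f}, PAF {paf_sub:.2f}")
```

Sampling the entire cohort reproduces the true values for these counts; the 400-person sub-cohort gives estimates that vary around them with the random draw.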
38. Case-Cohort Studies (3)
- No necessity to screen out silent cases of disease from the control group
- Same sub-cohort can be used for future period of extended cohort follow-up
- Gives unbiased estimate of relative risk
39. Choosing a Study Design
- What has already been done?
  - If no research, a rapid and inexpensive ecological study may be useful
  - If several case-control studies have already been done, what would yours contribute?
  - Is it worth repeating a cohort study that has been done in one population in a different population (e.g., in women rather than in men)?
40. Choosing a Study Design (2)
- Cohort study decisions
  - Need to represent a larger population?
    - Not necessarily relevant to biological question of relative disease risk in exposed and unexposed
    - May be important to generalizing findings
  - Larger cohort versus longer follow-up
    - If disease rate is constant, same number of outcome events can be obtained with more subjects rather than more follow-up
    - Shorter follow-up limits potential usefulness of cohort to examine other research questions
    - Shorter follow-up desirable if rapid answer to research question is a high priority
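The trade-off between cohort size and follow-up length can be sketched with simple arithmetic. The rate, cohort sizes, and function name below are hypothetical, and the calculation assumes a constant disease rate, no losses to follow-up, and a disease rare enough that removal of cases from the at-risk group can be ignored.

```python
def expected_events(rate_per_person_year, n_subjects, years_follow_up):
    """Approximate expected number of outcome events under a constant rate."""
    return rate_per_person_year * n_subjects * years_follow_up

# Hypothetical disease rate of 5 per 1000 person-years.
large_short = expected_events(0.005, n_subjects=10_000, years_follow_up=2)
small_long = expected_events(0.005, n_subjects=2_000, years_follow_up=10)

print(f"10,000 subjects for 2 years: about {large_short:.0f} events")
print(f"2,000 subjects for 10 years: about {small_long:.0f} events")
```

Both designs accumulate the same person-years and so the same expected number of events; the choice then rests on cost, speed, and how useful the cohort will be for other questions.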
41. Choosing a Study Design: Case-cohort versus nested case-control
- Nested case-control somewhat more statistically efficient in cohorts with long follow-up and substantial censoring
- Analysis is more familiar and available for nested case-control
- Power of nested case-control requires only estimate of number of cases and controls; case-cohort requires information on whole cohort and drop-out rates
42. Choosing a Study Design: Case-cohort versus nested case-control (2)
- Case-cohort can use same controls for multiple disease outcomes
- Case-cohort allows direct modeling of disease incidence in exposed and unexposed
- Case-cohort allows multiple time scales (age, calendar time); nested case-control allows only one
- Nested case-control allows more efficient collection of time-dependent exposures
43. Choosing a Study Design: Case-cohort versus nested case-control (3)
- Case-cohort can use same controls for a future period of additional cohort follow-up
- Case-cohort can use controls for other purposes (such as monitoring compliance)
- Controls can be selected more rapidly in case-cohort; nested case-control may require control selection at end of study for late cases