Title: Capture-recapture method
1Capture-recapture method
Based on presentations given by Jean-Claude
Desenclos, Thomas Grein, Tony Nardone, Anne
Gallay, Natasha Crowcroft
2What is it?
- Capture-recapture methods are used for counting the total number of individuals in a population using two or more incomplete lists of those individuals
- Originally used in wildlife (birds, polar bears, wild salmon) counting
- Capture > tag > recapture > calculate
3Uses in epidemiology
- Estimate prevalence or incidence from incomplete sources
- Simplify prevalence surveys
- Evaluate completeness of a surveillance system (EPIET objective!)
- Can be used for any condition
4Principles
- Two or more sources (lists, registries, observations, samples) of cases with a given disease or state
- Sources considered independent capture samples from the same source (total) population
- Cases can be matched by unique identifiers
- Estimate total number in the source population (captured and uncaptured) from the numbers captured in each capture
11Two-source model
12Two-source model
                     In source Z    Not in source Z    Total
In source Y               a                c            Y1
Not in source Y           b                x?
Total                     Z1                            N?

N = a + b + c + x
13Two-source analysis
14With independent sources
- p(Y|Z) = p(Y|not Z)
- a/(a+b) = c/(c+x)
- c(a+b) = a(c+x)
- bc = ax
- x = bc / a
15Estimations
- Unobserved cell: x = bc / a
- Total population: N = a + b + c + (bc/a), so N = (a+b)(a+c) / a
- N = Y1 × Z1 / a
- Sensitivity of Y: Ysn = Y1/N = (a+c)/N
- Sensitivity of Z: Zsn = Z1/N = (a+b)/N
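The estimates on this slide can be sketched in a few lines of Python. This is a minimal illustration, not part of the original presentation: the function name and the counts are made up, and the cell letters follow the slides (a in both sources, b in source Z only, c in source Y only).

```python
def two_source_estimate(a, b, c):
    """Two-source capture-recapture estimates.

    a: cases found in both sources
    b: cases found in source Z only
    c: cases found in source Y only
    """
    if a <= 0:
        raise ValueError("need at least one matched case (a > 0)")
    y1 = a + c        # all cases in source Y
    z1 = a + b        # all cases in source Z
    x = b * c / a     # estimated unobserved cell
    n = y1 * z1 / a   # estimated total, equal to a + b + c + x
    return {"x": x, "N": n, "Ysn": y1 / n, "Zsn": z1 / n}


# Hypothetical counts, for illustration only
est = two_source_estimate(a=20, b=30, c=40)
print(est)
```

With these made-up counts the estimated total is 150, of which 60 cases are in neither source, and source Y has a sensitivity of 0.4.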
16Confidence interval
- N = Y1 × Z1 / a
- Var(N) = Y1 × Z1 × b × c / a^3
- 95% CI = N ± 1.96 × √Var(N)
- (Of course, adjusts only for sampling fluctuations, not for violations of assumptions of the method.)
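As a rough sketch of the interval calculation, assuming the variance formula given on this slide and made-up counts (the function name is hypothetical):

```python
import math


def two_source_ci(y1, z1, a, z=1.96):
    """Point estimate and approximate 95% CI for N from two sources.

    y1, z1: number of cases in source Y and source Z
    a: number of cases matched in both sources
    """
    b = z1 - a                           # in Z only
    c = y1 - a                           # in Y only
    n = y1 * z1 / a                      # N = Y1 * Z1 / a
    var_n = y1 * z1 * b * c / a ** 3     # Var(N) = Y1 * Z1 * b * c / a^3
    half_width = z * math.sqrt(var_n)
    return n, n - half_width, n + half_width


# Hypothetical counts, for illustration only
n, low, high = two_source_ci(y1=60, z1=50, a=20)
print(f"N = {n:.0f}, 95% CI {low:.0f} to {high:.0f}")
```

The interval reflects sampling variation only; it says nothing about dependence between sources or unequal catchability.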
17Assumptions
- The population is closed
  - No change during the investigation
- Individuals captured on both occasions can be matched
  - No loss of tags
- For each sample, each individual has the same chance of being included
  - Same catchability
- Capture in the second sample is independent of capture in the first
  - The two samples are independent, p(YZ) = p(Y) × p(Z)
18Assumptions may not hold
- The population is closed → Usually possible
- Individuals captured on both occasions can be matched → OK if good recording systems
- For each sample, each individual has the same chance of being included → Rarely true
- Capture in the second sample is independent of capture in the first → Rarely true
19Closed population
- Nobody enters or leaves the population during the study period
  - No immigration, emigration, death
- Open population
  - Some individuals captured in the first sample cannot be captured in the second
  - Probability of recapture ↓ → a ↓ → overestimates N

N = Y1 × Z1 / a
20True cases
- All cases in any source are true cases
- False positive cases
  - Positive predictive value (PPV) < 1
  - Overestimation of Y1 or Z1 → overestimates N
- Correction
  - Take a random sample of the reported cases and verify them
  - Estimate PPV and adjust the counts: PPV × Y1, PPV × Z1
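A minimal sketch of the correction step, assuming the PPV is estimated from a verified random sample of one source and then applied to that source's count. The function name and numbers are hypothetical, and a real analysis would also have to consider false positives among the matched cases.

```python
def ppv_adjusted_count(reported, sample_size, sample_confirmed):
    """Estimate PPV from a verified sample and scale the reported count.

    reported: number of cases listed in the source (Y1 or Z1)
    sample_size: number of listed cases drawn at random for verification
    sample_confirmed: number of those verified as true cases
    """
    ppv = sample_confirmed / sample_size
    return ppv, ppv * reported


# Hypothetical figures, for illustration only
ppv, y1_adjusted = ppv_adjusted_count(reported=200, sample_size=50, sample_confirmed=45)
print(f"PPV = {ppv:.2f}, adjusted Y1 = {y1_adjusted:.0f}")
```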
21True matches
- Matches and only matches are identified
  - Ideally, unique identifier available (social security number, name, etc.)
  - Combination of criteria: name initials, age, sex...
- True matches missed
  - a ↓ → overestimates N
- Wrong matches created
  - a ↑ → underestimates N

N = Y1 × Z1 / a
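Matching on a combination of criteria could look like the sketch below. The record fields (initials, age, sex) and the sample records are invented for illustration; real matching usually needs more care (typing errors, age recorded at different dates, and so on).

```python
def match_key(record):
    """Composite key built from name initials, age and sex."""
    return (record["initials"].upper(), record["age"], record["sex"].upper())


def cross_classify(source_y, source_z):
    """Return (a, b, c): in both sources, in Z only, in Y only."""
    # Note: two different people sharing the same key would be merged,
    # which is one way wrong matches are created.
    keys_y = {match_key(r) for r in source_y}
    keys_z = {match_key(r) for r in source_z}
    a = len(keys_y & keys_z)
    b = len(keys_z - keys_y)
    c = len(keys_y - keys_z)
    return a, b, c


# Hypothetical records, for illustration only
source_y = [
    {"initials": "JD", "age": 34, "sex": "M"},
    {"initials": "AG", "age": 51, "sex": "F"},
    {"initials": "TN", "age": 28, "sex": "M"},
]
source_z = [
    {"initials": "jd", "age": 34, "sex": "m"},
    {"initials": "NC", "age": 45, "sex": "F"},
]
a, b, c = cross_classify(source_y, source_z)
print(a, b, c)
```

A looser key finds more matches (larger a, smaller N), a stricter key finds fewer (smaller a, larger N), which is the trade-off described on this slide.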
22Equal catchability
- For each source, probability of capture is the same for all cases
  - Probability may differ between sources: OK
- Some people have low probability of capture by any source
  - Drug users, homeless, severely ill
  - Not counted → underestimates N

N = Y1 × Z1 / a
23Accounting for variable catchability
- Identify and exclude population outside of all sources
- or
- Stratify by factor introducing variable catchability
  - Calculate estimates by strata
  - Sum N by strata
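The stratified approach can be sketched as follows, with invented strata and counts. Each stratum gets its own two-source estimate and the stratum totals are summed; the pooled estimate is shown only for comparison.

```python
def estimate_n(a, b, c):
    """Two-source estimate N = (a + b)(a + c) / a."""
    return (a + b) * (a + c) / a


def stratified_estimate(strata):
    """strata maps a stratum name to its (a, b, c) counts."""
    per_stratum = {name: estimate_n(*counts) for name, counts in strata.items()}
    return per_stratum, sum(per_stratum.values())


# Hypothetical counts (a, b, c); stratum 2 is poorly captured by both sources
strata = {
    "stratum 1": (30, 10, 20),
    "stratum 2": (5, 15, 25),
}
per_stratum, total = stratified_estimate(strata)

# Pooled estimate that ignores the strata, for comparison
pooled = estimate_n(30 + 5, 10 + 15, 20 + 25)
print(per_stratum, round(total, 1), round(pooled, 1))
```

In this made-up example the pooled estimate (about 137) is lower than the sum of the stratum estimates (about 187), in line with the point that unequal catchability leads to underestimation.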
24Sources are independent (most important condition)
- Being in one source does not influence the probability of being in the other source
- OR > 1 (positive dependence): estimated d < true d → underestimates N
- OR < 1 (negative dependence): estimated d > true d → overestimates N
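The direction of the bias can be checked with expected cell counts. The sketch below is an illustration under stated assumptions: a known true N of 1000, a capture probability for source Y, and separate capture probabilities for source Z among cases in Y and not in Y. All values and names are invented.

```python
def expected_estimate(n_true, p_y, p_z_given_y, p_z_given_not_y):
    """Expected two-source estimate when capture in Z may depend on Y."""
    a = n_true * p_y * p_z_given_y                   # in both sources
    c = n_true * p_y * (1 - p_z_given_y)             # in Y only
    b = n_true * (1 - p_y) * p_z_given_not_y         # in Z only
    x = n_true * (1 - p_y) * (1 - p_z_given_not_y)   # in neither (unobserved)
    odds_ratio = (a * x) / (b * c)
    n_hat = (a + b) * (a + c) / a
    return odds_ratio, n_hat


# Hypothetical scenarios: (p(Z | in Y), p(Z | not in Y))
scenarios = {
    "independent": (0.4, 0.4),
    "positive dependence": (0.6, 0.2),
    "negative dependence": (0.2, 0.6),
}
for label, (p_zy, p_zn) in scenarios.items():
    odds_ratio, n_hat = expected_estimate(1000, 0.5, p_zy, p_zn)
    print(f"{label}: OR = {odds_ratio:.2f}, estimated N = {n_hat:.0f} (true N = 1000)")
```

With these numbers, positive dependence gives an estimate below 1000 and negative dependence an estimate above it, while independent sources recover the true value.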
25Example
- Estimation of the number of intravenous drug users (IVDU) in Bangkok in 1991 (Mastro 1994)
- Two sources used
  - Methadone programme (April to May 1991)
  - Police arrests (June to September 1991)
- Methadone → lower need for drugs → lower probability of being arrested → negative dependence, overestimation of N
26Still useful
- There will always be dependence
- We can predict the direction
- So we know whether our estimate is a lower or upper boundary
- And this may be what we need
- NB: Confidence intervals do not solve the problem of dependency!
27Evaluation of source dependence
- Two sources
  - Qualitative analysis of the notification process in each source
  - No statistical method to allow for dependence for two sources
- More than two sources
  - Wittes method
  - Log-linear modelling
28Wittes method
- Evaluate dependence between sources
  - Compare two-source estimates of N
  - If estimates differ → test of independence
- Test of independence
  - Calculate odds ratios between cell counts of two sources within a third source
  - If OR ≠ 1 → dependence
- Merge dependent sources
  - Repeat calculation of estimates with merged source
29Test of independence
[Figure: Venn diagram of three sources A, B and C, with cells labelled a to g]

OR = cg / de

- OR = 1 → independence
- OR > 1 → positive dependence → underestimation of N
- OR < 1 → negative dependence → overestimation of N
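A sketch of the odds ratio calculation for two sources within a third. The argument names are illustrative and are not the slide's cell letters, and the counts are invented: among cases found in the third source, cases are cross-classified by whether they also appear in each of the other two sources.

```python
def odds_ratio_within_third(both, first_only, second_only, neither):
    """Odds ratio for two sources, restricted to cases in a third source.

    both: in the third source and in both other sources
    first_only: in the third source and the first source only
    second_only: in the third source and the second source only
    neither: in the third source only
    """
    return (both * neither) / (first_only * second_only)


def interpret(odds_ratio):
    if odds_ratio > 1:
        return "positive dependence (N underestimated)"
    if odds_ratio < 1:
        return "negative dependence (N overestimated)"
    return "independence"


# Hypothetical counts, for illustration only
odds_ratio = odds_ratio_within_third(both=40, first_only=20, second_only=10, neither=30)
print(odds_ratio, interpret(odds_ratio))
```

In practice an observed OR is rarely exactly 1, so some judgement (or a confidence interval for the OR) is needed before treating two sources as dependent and merging them.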
30Example Legionellosis in France
NS: Notification system
NRL: National Reference Laboratory
HL: Hospital laboratories
31Example Legionellosis in France
- Two-source estimates
  - NS/NRL: 389 cases
  - NS/HL: 615 cases
  - HL/NRL: 715 cases
- Tests of independence (Wittes)
- Merge NS/NRL into one source
  - NS+NRL / HL: 528 cases (495-561)
32Conclusion
- If conditions are met
  - Great potential to estimate population size by using incomplete sources
  - Cheaper than exhaustive registers or full counting
- Two sources
  - Impossible to quantify extent of dependence
- Multiple sources
  - Log-linear modelling method of choice
  - Can adjust for dependence and variable catchability
33How many participants are there?
- Capture: source Preben
- Recapture: source Arnold
- Estimations?
- Assumptions hold?
34Estimations
- Unobserved cell: x = bc / a
- Total population: N = a + b + c + (bc/a)
- N = Preb1 × Arn1 / a
- Sensitivity of Preb: Prebsn = Preb1/N = (a+c)/N
- Sensitivity of Arn: Arnsn = Arn1/N = (a+b)/N
35Confidence interval
- N = Preb1 × Arn1 / a
- Var(N) = Preb1 × Arn1 × b × c / a^3
- 95% CI = N ± 1.96 × √Var(N)
- (Of course, adjusts only for sampling fluctuations, not for violations of assumptions of the method.)
36Assumptions hold?
- The population is closed
- Individuals captured on both occasions can be matched
- For each sample, each individual has the same chance of being included
- Capture in the second sample is independent of capture in the first
37Recommended reading
- Wittes JT, Colton T, Sidel VW. Capture-recapture models for assessing the completeness of case ascertainment using multiple information sources. J Chronic Dis 1974;27:25-36.
- Hook EB, Regal RR. Capture-recapture methods in epidemiology: methods and limitations. Epidemiol Rev 1995;17:243-264.
- International Working Group for Disease Monitoring and Forecasting. Capture-recapture and multiple-record systems estimation I: History and theoretical development. Am J Epidemiol 1995;142:1047-58.
- International Working Group for Disease Monitoring and Forecasting. Capture-recapture and multiple-record systems estimation II: Applications in human diseases. Am J Epidemiol 1995;142:1059-68.
38Some examples from field epidemiology
- Legionnaires' disease. Nardone et al. Epidemiol Infect 2003;131:647-54
- Malaria. Klein and Bosman. Euro Surveill 2005;10:244-6
- Measles. Van den Hof et al. Pediatr Inf Dis J 2002;21:1146-50
- Acute flaccid paralysis. Whitfield. Bull WHO 2002;80:846-851
- Pertussis deaths. Crowcroft et al. Arch Dis Child 2002;86:336-8
- Intussusception after rotavirus vaccination. Verstraeten et al. Am J Epidemiol 2001;154:1006-1012
- Tuberculosis. Tocque et al. Commun Dis Public Health 2001;4:141-3
- Salmonella outbreaks. Gallay et al. Am J Epidemiol 2000;152:171-7
- AIDS. Bernillon et al. Int J Epidemiol 2000;29:168-174
- Meningitis. Faustini et al. Eur J Epidemiol 2000;16:843-8