Title: Studies of Diagnostic Tests
1Studies of Diagnostic Tests
- Thomas B. Newman, MD, MPH
- October 15, 2009
2Reminders/Announcements
- Door must be closed
- Write down answers to problems in the book and check your answers!
- Final exam to be passed out 12/3, reviewed 12/10
- Send questions!
3Overview
- Common biases of studies of diagnostic test accuracy
- Prevalence, spectrum and nonindependence
- Meta-analysis of diagnostic tests
- Checklist systematic approach
- Examples
- Physical examination for presentation
- Pain with percussion, hopping or cough for appendicitis
- Pertussis
- Predicting hyperbilirubinemia
4Bias 1: Example
- Study of BNP to diagnose congestive heart failure
(CHF, Chapter 4, Problem 3)
5Bias 1: Example
- Gold standard: determination of CHF by two cardiologists blinded to BNP
- Chest x-rays found to be highly predictive of CHF
- Is there a problem with assessing accuracy of chest x-rays to diagnose CHF in this study?
Maisel AS, Krishnaswamy P, Nowak RM, McCord J,
Hollander JE, Duc P, et al. Rapid measurement of
B-type natriuretic peptide in the emergency
diagnosis of heart failure. N Engl J Med
2002;347(3):161-7.
6Bias 1: Incorporation bias
- Cardiologists not blinded to chest x-ray
- Probably used (incorporated) it to make final diagnosis
- Incorporation bias for assessment of chest x-ray (not BNP)
- Biases both sensitivity and specificity upward
7Bias 2: Example
- Visual assessment of jaundice in newborns
- Study patients who are getting a bilirubin measurement
- Ask clinicians to estimate extent of jaundice at time of blood draw
8Visual Assessment of jaundice: Results
- Sensitivity of jaundice below the nipple line for a TSB threshold of 12 mg/dL: 97%
- Specificity: 19%
- What is the problem?
Editor's Note: The take-home message for me is
that no jaundice below the nipple line equals no
bilirubin test, unless there's some other
indication. --Catherine D. DeAngelis, MD
Moyer et al., APAM 2000;154:391
9Bias 2: Verification bias
- Inclusion criterion for study: gold standard test was done
  - in this case, blood test for bilirubin
- Subjects with positive index tests are more likely to get the gold standard and to be included in the study
  - clinicians don't order a blood test for bilirubin if the jaundice is minimal
- How does this affect sensitivity and specificity?
10Bias 2: Verification Bias
Sensitivity, a/(a+c), is biased ___.
Specificity, d/(b+d), is biased ___.
AKA Work-up Bias, Referral Bias, or Ascertainment Bias
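A rough numeric sketch of how partial verification can distort the 2x2 table, using the usual cell labels (a = true positives, b = false positives, c = false negatives, d = true negatives). The full-cohort counts and the verification fractions are hypothetical, chosen only for illustration; they are not taken from Moyer et al.

```python
def sens_spec(a, b, c, d):
    """Return (sensitivity, specificity) for a 2x2 table.

    a = true positives, b = false positives,
    c = false negatives, d = true negatives.
    """
    return a / (a + c), d / (b + d)


# Hypothetical full cohort, as if every newborn had a bilirubin level drawn.
a, b, c, d = 80, 200, 20, 700

# Assumed verification fractions: infants with a positive index test
# (visible jaundice) get the gold standard far more often than those without.
p_verified_if_pos = 0.9
p_verified_if_neg = 0.1

# Only verified subjects make it into the study's 2x2 table.
a_v, b_v = a * p_verified_if_pos, b * p_verified_if_pos
c_v, d_v = c * p_verified_if_neg, d * p_verified_if_neg

true_sens, true_spec = sens_spec(a, b, c, d)
obs_sens, obs_spec = sens_spec(a_v, b_v, c_v, d_v)

print(f"Full cohort:   sensitivity {true_sens:.2f}, specificity {true_spec:.2f}")
print(f"Verified only: sensitivity {obs_sens:.2f}, specificity {obs_spec:.2f}")
```

Under these assumptions the verified-only table overstates sensitivity and understates specificity, because cells c and d shrink much more than cells a and b.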
11Bias 3
- Example: PIOPED study of accuracy of V/Q scan to diagnose pulmonary embolus
- Study Population: All patients presenting to the ED who received a V/Q scan
- Test: V/Q scan
- Disease: Pulmonary embolism (PE)
- Gold Standards
  - 1. Pulmonary arteriogram (PA-gram) if done (more likely with more abnormal V/Q scan)
  - 2. Clinical follow-up in other patients (more likely with normal V/Q scan)
PIOPED. JAMA 1990;263(20):2753-9.
12Double Gold Standard Bias
- Two different gold standards
  - One gold standard (e.g., surgery, invasive test) is more likely to be applied in patients with a positive index test
  - Other gold standard (e.g., clinical follow-up) is more likely to be applied in patients with a negative index test
- There are some patients in whom the two gold standards do not give the same answer
  - spontaneously resolving disease
  - newly occurring disease
13Double Gold Standard Bias: effect of spontaneously resolving cases
Sensitivity, a/(a+c), biased __ ; Specificity, d/(b+d), biased __
- Double gold standard compared with follow-up for all
- Double gold standard compared with PA-gram for all
14Double Gold Standard Bias: effect of newly occurring cases
Sensitivity, a/(a+c), biased __ ; Specificity, d/(b+d), biased __
- Double gold standard compared with follow-up for all
- Double gold standard compared with PA-gram for all
15Double Gold Standard Bias: Ultrasound diagnosis
of intussusception
16What if 10 of the 86 U/S-negative subjects who were
followed actually had intussusceptions that resolved
spontaneously?
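The "what if" can be explored numerically. Only the 86 followed U/S-negative subjects and the 10 spontaneously resolving cases come from the slide; every other cell below is a hypothetical placeholder, since the study's 2x2 table is not reproduced here. The sketch also assumes that no true negatives were verified invasively.

```python
def sens_spec(a, b, c, d):
    """Return (sensitivity, specificity); a=TP, b=FP, c=FN, d=TN."""
    return a / (a + c), d / (b + d)


# Hypothetical cells (NOT from the study).
tp, fp = 35, 2        # U/S-positive, with / without intussusception
fn_invasive = 3       # U/S-negative, intussusception found invasively

# From the slide.
followed_neg = 86     # U/S-negative subjects who only had clinical follow-up
resolved = 10         # of those, cases assumed to have resolved spontaneously

# As reported with a double gold standard: everyone who did well on
# follow-up is counted as a true negative.
reported = sens_spec(tp, fp, fn_invasive, followed_neg)

# If the invasive gold standard had been applied to everyone, the resolved
# cases would have been detected and counted as false negatives.
invasive_for_all = sens_spec(
    tp, fp, fn_invasive + resolved, followed_neg - resolved
)

print("Reported (double gold standard): sens %.2f, spec %.2f" % reported)
print("Invasive test for all:           sens %.2f, spec %.2f" % invasive_for_all)
```

With these placeholder counts, moving the 10 resolved cases from cell d to cell c lowers sensitivity substantially and specificity only slightly, so the double gold standard makes both look better than they would against the invasive test alone.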
17Spectrum of Disease, Nondisease and Test Results
- Disease is often easier to diagnose if severe
- Nondisease is easier to diagnose if patient is well than if the patient has other diseases
- Test results will be more reproducible if ambiguous results excluded
18Spectrum Bias
- Sensitivity depends on the spectrum of disease in the population being tested.
- Specificity depends on the spectrum of non-disease in the population being tested.
- Example: Absence of Nasal Bone (on 13-week ultrasound) as a Test for Chromosomal Abnormality
19Spectrum Bias Example: Absence of Nasal Bone as a
Test for Chromosomal Abnormality
Sensitivity = 229/333 = 69%, BUT the D+ group only
included fetuses with Trisomy 21
Cicero et al., Ultrasound Obstet Gynecol 2004;23:218-23
20Spectrum Bias: Absence of Nasal Bone as a Test
for Chromosomal Abnormality
- D+ group excluded 295 fetuses with other chromosomal abnormalities (esp. Trisomy 18)
- Among these fetuses, sensitivity was 32% (not 69%)
- What decision is this test supposed to help with?
- If it is whether to test chromosomes using chorionic villus sampling or amniocentesis, these 295 fetuses should be included!
21Spectrum Bias: Absence of Nasal Bone as a Test
for Chromosomal Abnormality, effect of including
other trisomies in D+ group
Sensitivity = 324/628 = 52%, NOT the 69% obtained when
the D+ group only included fetuses with Trisomy 21
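The arithmetic behind the three sensitivities can be checked directly. All counts are taken from the slides; the variable names and the subtraction used to recover the "other abnormalities" subgroup are just one way to lay it out.

```python
# Counts from the slides.
t21_pos, t21_total = 229, 333   # Trisomy 21 only: absent nasal bone / total
all_pos, all_total = 324, 628   # all chromosomal abnormalities combined

# The fetuses excluded from the original D+ group are the difference.
other_pos = all_pos - t21_pos          # absent nasal bone, other abnormalities
other_total = all_total - t21_total    # the 295 excluded fetuses


def pct(numerator, denominator):
    """Percentage rounded to the nearest whole number."""
    return round(100 * numerator / denominator)


print(f"Trisomy 21 only:     {t21_pos}/{t21_total} = {pct(t21_pos, t21_total)}%")
print(f"Other abnormalities: {other_pos}/{other_total} = {pct(other_pos, other_total)}%")
print(f"All abnormalities:   {all_pos}/{all_total} = {pct(all_pos, all_total)}%")
```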
22Quiz: What if we considered the nasal bone
absence as a test for Trisomy 21?
- Then instead of excluding subjects with other chromosomal abnormalities or including them as D+, we should count them as D-. Compared with excluding them,
  - What would happen to sensitivity?
  - What would happen to specificity?
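One way to reason through the quiz is to move the 295 fetuses with other abnormalities into the D- group and recompute. The Trisomy 21 counts and the 95/295 split follow from the slides; the counts for chromosomally normal fetuses are hypothetical, since they do not appear in this deck.

```python
# From the slides.
t21_pos, t21_total = 229, 333       # Trisomy 21: absent nasal bone / total
other_pos, other_total = 95, 295    # other abnormalities: 324-229 / 628-333

# Hypothetical counts for chromosomally normal fetuses (NOT from the study).
normal_pos, normal_total = 125, 5000

# Sensitivity for Trisomy 21 uses only the Trisomy 21 fetuses either way.
sensitivity = t21_pos / t21_total

# Specificity when other abnormalities are excluded from the study.
spec_excluding = (normal_total - normal_pos) / normal_total

# Specificity when other abnormalities are counted as D-.
d_neg_total = normal_total + other_total
true_neg = (normal_total - normal_pos) + (other_total - other_pos)
spec_counting_as_dneg = true_neg / d_neg_total

print(f"Sensitivity (unchanged):           {sensitivity:.3f}")
print(f"Specificity, others excluded:      {spec_excluding:.3f}")
print(f"Specificity, others counted as D-: {spec_counting_as_dneg:.3f}")
```

Under these assumptions sensitivity does not change, while specificity falls, because the added D- fetuses have an absent nasal bone far more often than the normal fetuses do.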
23Prevalence, spectrum and nonindependence
- Prevalence (prior probability) of disease may be related to disease severity
- One mechanism is different spectra of disease or nondisease
- Another is that whatever is causing the high prior probability is related to the same aspect of the disease as the test
24Prevalence, spectrum and nonindependence
- Examples
- Iron deficiency
- Diseases identified by screening
- Urinalysis as a test for UTI in women with more
and fewer symptoms (high and low prior
probability)
25Overfitting
26Meta-analyses of Diagnostic Tests
- Systematic and reproducible approach to finding studies
- Summary of results of each study
- Investigation into heterogeneity
- Summary estimate of results, if appropriate
- Unlike other meta-analyses (risk factors, treatments), results aren't summarized with a single number (e.g., RR), but with two related numbers (sensitivity and specificity)
- These can be plotted on an ROC plane
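A minimal sketch of what "plotting on an ROC plane" means: each study contributes one point, with 1 - specificity on the x-axis and sensitivity on the y-axis. The three studies and their 2x2 counts are hypothetical, made up only to show the calculation.

```python
# Hypothetical studies: name -> (TP, FP, FN, TN). Not real data.
studies = {
    "Study A": (45, 10, 5, 90),
    "Study B": (30, 25, 20, 75),
    "Study C": (60, 40, 2, 50),
}


def roc_point(tp, fp, fn, tn):
    """Return (1 - specificity, sensitivity) for one study."""
    sensitivity = tp / (tp + fn)
    false_positive_rate = fp / (fp + tn)
    return false_positive_rate, sensitivity


roc_points = {name: roc_point(*cells) for name, cells in studies.items()}

for name, (fpr, sens) in roc_points.items():
    print(f"{name}: x = 1 - specificity = {fpr:.2f}, y = sensitivity = {sens:.2f}")
```

Points scattered along a curve suggest studies using different thresholds for a positive test; points scattered widely off any curve suggest heterogeneity worth investigating.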
27MRI for the diagnosis of MS
Whiting et al. BMJ 2006;332:875-84
28Studies of Diagnostic Test Accuracy Checklist
- Was there an independent, blind comparison with a reference (gold) standard of diagnosis?
- Was the diagnostic test evaluated in an appropriate spectrum of patients (like those in whom we would use it in practice)?
- Was the reference standard applied regardless of the diagnostic test result?
- Was the test (or cluster of tests) validated in a second, independent group of patients?
From Sackett et al., Evidence-based Medicine, 2nd
ed. (NY: Churchill Livingstone), 2000. p 68
29Systematic Approach
- Authors and funding source
- Research question
- Study design
- Study subjects
- Predictor variable
- Outcome variable
- Results Analysis
- Conclusions
30A clinical decision rule to identify children at
low risk for appendicitis (Problem 5.6)
- Study design: prospective cohort study
- Subjects
  - Of 4140 patients 3-18 years presenting to Boston Children's Hospital ED with CC of abdominal pain
  - 767 (19%) received surgical consultation for possible appendicitis
  - 113 excluded (chronic diseases, recent imaging)
  - 53 missed
  - 601 included in the study (425 in derivation set)
Kharbanda et al. Pediatrics 2005;116(3):709-16
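The subject flow on this slide can be checked with simple arithmetic. All counts are from the slide; the variable names are illustrative, and the slide does not name the group left over after the derivation set.

```python
# Counts from the slide.
presenting = 4140   # patients 3-18 years with abdominal pain
consulted = 767     # received surgical consultation
excluded = 113      # chronic diseases, recent imaging
missed = 53
derivation = 425    # derivation set

included = consulted - excluded - missed
remainder = included - derivation   # not labeled on the slide

consult_pct = round(100 * consulted / presenting)
included_pct_of_presenting = round(100 * included / presenting)

print(f"Surgical consultation: {consulted}/{presenting} = {consult_pct}%")
print(f"Included: {included} ({included_pct_of_presenting}% of all presenting)")
print(f"Outside the derivation set: {remainder}")
```

Only about one in seven children presenting with abdominal pain ended up in the study, which matters when judging how representative the sample is.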
31A clinical decision rule to identify children at
low risk for appendicitis
- Predictor variable
  - Standardized assessment by PEM attending
  - Focus on pain with percussion, hopping or cough (complete data in N=381)
- Outcome variable
  - Pathologic diagnosis of appendicitis for those who received surgery (37%)
  - Follow-up telephone call to family or pediatrician 2-4 weeks after the ED visit for those who did not receive surgery (63%)
Kharbanda et al. Pediatrics 2005;116(3):709-16
32A clinical decision rule to identify children at
low risk for appendicitis
- Results: Pain with percussion, hopping or cough
- 78% sensitivity seems low to me. Is it valid for me in deciding whom to image?
Kharbanda et al. Pediatrics 2005;116(3):709-16
33Checklist
- Was there an independent, blind comparison with a reference (gold) standard of diagnosis?
- Was the diagnostic test evaluated in an appropriate spectrum of patients (like those in whom we would use it in practice)?
- Was the reference standard applied regardless of the diagnostic test result?
- Was the test (or cluster of tests) validated in a second, independent group of patients?
From Sackett et al., Evidence-based Medicine, 2nd
ed. (NY: Churchill Livingstone), 2000. p 68
34Systematic approach
- Study design: prospective cohort study
- Subjects
  - Of 4140 patients 3-18 years presenting to Boston Children's Hospital ED with CC of abdominal pain
  - 767 (19%) received surgical consultation for possible appendicitis
Kharbanda et al. Pediatrics 2005;116(3):709-16
35A clinical decision rule to identify children at
low risk for appendicitis
- Predictor variable
  - Pain with percussion, hopping or cough (complete data in N=381)
- Outcome variable
  - Pathologic diagnosis of appendicitis for those who received surgery (37%)
  - Follow-up telephone call to family or pediatrician 2-4 weeks after the ED visit for those who did not receive surgery (63%)
Kharbanda et al. Pediatrics 2005;116(3):709-16
36Issues
- Sample representative?
- Verification bias?
- Double-gold standard bias?
- Spectrum bias
37For children presenting with abdominal pain to
SFGH 6-M
- Sensitivity probably valid (not falsely low)
- But whether all of them tried to hop is not clear
- Specificity probably low
- PPV is high
- NPV is low
- Does not address surgical consultation decision
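Predictive values move with the prevalence of disease in the population tested, so PPV and NPV measured in a surgical-consultation cohort need not carry over to all children with abdominal pain. A sketch of that dependence using Bayes' rule: the 78% sensitivity is from the results slide, while the specificity and both prevalences are hypothetical.

```python
def predictive_values(sens, spec, prev):
    """Return (PPV, NPV) for a given sensitivity, specificity and prevalence."""
    tp = sens * prev
    fp = (1 - spec) * (1 - prev)
    fn = (1 - sens) * prev
    tn = spec * (1 - prev)
    return tp / (tp + fp), tn / (tn + fn)


SENS = 0.78   # from the results slide
SPEC = 0.80   # hypothetical; not given in this deck

# Hypothetical prevalences: a broad ED population vs. a referred population.
low_prev, high_prev = 0.05, 0.35

ppv_low, npv_low = predictive_values(SENS, SPEC, low_prev)
ppv_high, npv_high = predictive_values(SENS, SPEC, high_prev)

print(f"Prevalence {low_prev:.0%}: PPV {ppv_low:.2f}, NPV {npv_low:.2f}")
print(f"Prevalence {high_prev:.0%}: PPV {ppv_high:.2f}, NPV {npv_high:.2f}")
```

With sensitivity and specificity held fixed, the higher-prevalence population yields a higher PPV and a lower NPV.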
38Does this coughing patient have pertussis?
- RQ (for us): what are the LRs for coughing fits, whoop, and post-tussive vomiting in adults with persistent cough?
- Design (for one study we reviewed): Prospective cross-sectional study
- Subjects: 217 adults (18 years or older) with cough 7-21 days, no fever or other clear cause for cough, enrolled by 80 French GPs.
- In a subsample from 58 GPs, of 710 who met inclusion criteria only 99 (14%) enrolled
Gilberg S et al. J Inf Dis 2002;186:415-8
39Pertussis diagnosis
- Predictor variables: GPs interviewed patients using a standardized questionnaire.
- Outcome variable: Evidence of pertussis based on
  - Culture (N=1)
  - PCR (N=36)
  - Or 2-fold change in anti-pertussis toxin IgG (N=40)
- Total N = 70/217 with evidence of pertussis
Gilberg S et al. J Inf Dis 2002;186:415-8
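A quick check of the counts reported for this study. All numbers are from the slides; the variable names are illustrative.

```python
# Positive results by criterion (from the slide).
culture, pcr, serology = 1, 36, 40

# Subjects with any evidence of pertussis, out of those enrolled.
with_evidence, enrolled = 70, 217

# Enrollment in the subsample from 58 GPs.
eligible_subsample, enrolled_subsample = 710, 99

# More positive results than positive subjects means some subjects
# met more than one criterion.
extra_positives = culture + pcr + serology - with_evidence

evidence_pct = round(100 * with_evidence / enrolled)
enrolled_pct = round(100 * enrolled_subsample / eligible_subsample)

print(f"Positive results exceed positive subjects by {extra_positives}")
print(f"Evidence of pertussis: {with_evidence}/{enrolled} = {evidence_pct}%")
print(f"Enrolled among eligible: {enrolled_subsample}/{eligible_subsample} = {enrolled_pct}%")
```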
40Results
- 89% in both groups met CDC criteria for pertussis
41Issues
- Verification (selection) bias: only 14% of eligible subjects included
- Questionable gold standard (internally inconsistent)
- Nice illustration of difficulty doing a systematic review!
42Questions?