Assessing agreement for diagnostic devices - PowerPoint PPT Presentation

Transcript and Presenter's Notes

1
Assessing agreement for diagnostic devices
  • FDA/Industry Statistics Workshop
  • September 28-29, 2006
  • Bipasa Biswas
  • Mathematical Statistician, Division of
    Biostatistics
  • Office of Surveillance and Biometrics
  • Center for Devices and Radiological Health, FDA
  • No official support or endorsement by the Food
    and Drug Administration of this presentation is
    intended or should be inferred.

2
Outline
  • Accuracy measures for diagnostic tests with a dichotomous outcome:
    the ideal world of tests with a reference standard.
  • Two indices to measure accuracy: sensitivity and specificity.
  • Assessing agreement between two tests in the absence of a
    reference standard:
  • Overall agreement
  • Cohen's kappa
  • McNemar's test
  • Proposed remedy
  • Extending agreement to tests with more than two outcomes:
  • Cohen's kappa
  • Extension to the random marginal agreement coefficient (RMAC)
  • Should agreement per cell be reported?

3
Ideal world: tests with a perfect reference standard
(single test)
  • If a perfect reference standard exists to classify patients as
    diseased (D+) versus not diseased (D-), then we can represent the
    data as:

               True Status
    Test     D+       D-       Total
    T+       TP       FP       TP+FP
    T-       FN       TN       FN+TN
    Total    TP+FN    FP+TN    TP+FP+FN+TN

  • If the true status of the disease is known, then we can estimate
    Se = TP/(TP+FN) and Sp = TN/(TN+FP), as sketched below.
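A minimal sketch in Python (not part of the original slides; the
function name and example counts are illustrative only):

def se_sp(tp, fp, fn, tn):
    """Sensitivity and specificity against a perfect reference standard."""
    se = tp / (tp + fn)  # Se = TP / (TP + FN)
    sp = tn / (tn + fp)  # Sp = TN / (TN + FP)
    return se, sp

# Illustrative counts only:
se, sp = se_sp(tp=80, fp=30, fn=20, tn=70)
print(f"Se = {se:.1%}, Sp = {sp:.1%}")  # Se = 80.0%, Sp = 70.0%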
4
Ideal world: tests with a perfect reference standard
(comparing two tests)
  • McNemar's test to test equality of either sensitivity or
    specificity.

    Disease (D+)                        No disease (D-)
    Comparator test                     Comparator test
    New test  R+     R-     Total       New test  R+     R-     Total
    T+        a1     b1     a1+b1       T+        a2     b2     a2+b2
    T-        c1     d1     c1+d1       T-        c2     d2     c2+d2
    Total     a1+c1  b1+d1  n1          Total     a2+c2  b2+d2  n2

    where n1 = a1+b1+c1+d1 among the diseased and n2 = a2+b2+c2+d2
    among the non-diseased.

  • McNemar chi-square (with continuity correction, as sketched below):
  • To check equality of the sensitivities of the two tests:
    (|b1 - c1| - 1)^2 / (b1 + c1)
  • To check equality of the specificities of the two tests:
    (|c2 - b2| - 1)^2 / (c2 + b2)
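A sketch of this statistic in Python (the function name is mine; scipy
is assumed for the chi-square p-value):

from scipy.stats import chi2

def mcnemar_chi2(b, c):
    """Continuity-corrected McNemar statistic and two-sided p-value;
    b and c are the discordant counts of a paired 2x2 table."""
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    return stat, chi2.sf(stat, df=1)  # reference: chi-square with 1 df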
5
Ideal world: tests with a perfect reference standard
(comparing two tests)
  • Example:

    Disease (D+)                        No disease (D-)
    Comparator test                     Comparator test
    New test  R+    R-    Total         New test  R+    R-    Total
    T+        80    5     85            T+        85    20    105
    T-        10    5     15            T-        5     790   795
    Total     90    10    100           Total     90    810   900

  • SeT = 85.0% (85/100)    SpT = 88.3% (795/900)
  • SeR = 90.0% (90/100)    SpR = 90.0% (810/900)
  • McNemar chi-square to check equality of the sensitivities of the
    two tests: (|5 - 10| - 1)^2 / (5 + 10) = 1.07
  • p-value = 0.30
  • 95% CI for the difference in sensitivities: (-13.5%, 3.5%)
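These numbers can be reproduced with the sketch above; the quoted
interval also appears consistent with a continuity-corrected Wald
interval for a paired difference in proportions (my reading, not stated
on the slide):

from math import sqrt

stat, p = mcnemar_chi2(b=5, c=10)
print(round(stat, 2), round(p, 2))  # 1.07 0.3

# Difference in sensitivities among the n = 100 diseased subjects:
b, c, n = 5, 10, 100
d = (b - c) / n                             # SeT - SeR = -0.05
se = sqrt(b + c - (b - c) ** 2 / n) / n     # Wald SE for paired proportions
half = 1.96 * se + 1 / n                    # 1/n continuity correction
print(f"({d - half:.3f}, {d + half:.3f})")  # (-0.135, 0.035)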
6
McNemar's test when a reference standard exists
  • Note, however, that McNemar's test checks only for equality: the
    null hypothesis is equality and the alternative hypothesis is
    difference. This is not an appropriate hypothesis setup for
    demonstrating agreement, because a failure to find a statistically
    significant difference is then naively interpreted as evidence for
    equivalence.
  • The 95% confidence interval of the difference in sensitivities and
    specificities gives a better idea of the difference between the
    two tests.

7
Imperfect reference standard
  • A subject's true disease status is seldom known with certainty.
  • What is the effect on sensitivity and specificity when the
    comparator test R itself has error?

    Imperfect reference test (comparator test)
    New test  R+     R-     Total
    T+        a      b      a+b
    T-        c      d      c+d
    Total     a+c    b+d    a+b+c+d
8
Imperfect reference standard
  • Example 1: Say we have a new test T with 80% sensitivity and 70%
    specificity, and an imperfect reference test R (the comparator
    test) that misses 20% of the diseased subjects but never falsely
    indicates disease.

    True Status                         Imperfect reference test
    Test   D+     D-     Total          Test   R+     R-     Total
    T+     80     30     110            T+     64     46     110
    T-     20     70     90             T-     16     74     90
    Total  100    100    200            Total  80     120    200

  • Se = 80/100 = 80.0%      Se (relative to R) = 64/80 = 80.0%
  • Sp = 70/100 = 70.0%      Sp (relative to R) = 74/120 = 61.7%
9
Imperfect reference standard
  • Example 2: Say we have a new test T with 80% sensitivity and 70%
    specificity, and an imperfect reference test R that misses 20% of
    the diseased subjects, but where the error in R is related to the
    error in T.

    True Status                         Imperfect reference test
    Test   D+     D-     Total          Test   R+     R-     Total
    T+     80     30     110            T+     80     30     110
    T-     20     70     90             T-     0      90     90
    Total  100    100    200            Total  80     120    200

  • Se = 80/100 = 80.0%      Se (relative to R) = 80/80 = 100.0%
  • Sp = 70/100 = 70.0%      Sp (relative to R) = 90/120 = 75.0%
10
Imperfect reference standard
  • Example 3: Now suppose our test is perfect, that is, it has 100%
    sensitivity and 100% specificity, but the imperfect reference test
    R has only 90% sensitivity and 90% specificity.

    True Status                         Imperfect reference test
    Test   D+     D-     Total          Test   R+     R-     Total
    T+     100    0      100            T+     90     10     100
    T-     0      100    100            T-     10     90     100
    Total  100    100    200            Total  100    100    200

  • Se = 100/100 = 100.0%    Se (relative to R) = 90/100 = 90.0%
  • Sp = 100/100 = 100.0%    Sp (relative to R) = 90/100 = 90.0%
11
Challenges in assessing agreement in the absence
of a reference standard.
  • Commonly used overall measures are:
  • Overall agreement measure
  • Cohen's kappa
  • McNemar's test
  • Instead, report positive percent agreement (PPA) and negative
    percent agreement (NPA).

12
Estimate of Agreement
  • The overall percent agreement can be calculated as
    100 x (a + d)/(a + b + c + d).
  • The overall percent agreement, however, does not differentiate
    between agreement on the positives and agreement on the negatives.
  • Instead of overall agreement, report positive percent agreement
    (PPA) with respect to the imperfect reference standard positives
    and negative percent agreement (NPA) with respect to the imperfect
    reference standard negatives (see Feinstein and Cicchetti, 1990).
  • PPA = 100 x a/(a + c)
  • NPA = 100 x d/(b + d)
    (A sketch of all three quantities follows below.)
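A minimal Python sketch (the function name is mine), with the new test
T on the rows and the imperfect reference R on the columns:

def agreement_measures(a, b, c, d):
    """Overall percent agreement, PPA, and NPA for a 2x2 table."""
    n = a + b + c + d
    overall = 100 * (a + d) / n  # overall percent agreement
    ppa = 100 * a / (a + c)      # agreement on reference positives
    npa = 100 * d / (b + d)      # agreement on reference negatives
    return overall, ppa, npa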

13
Why not report just the overall percent agreement?
  • The overall percent agreement is insensitive to off-diagonal
    imbalance.

    Imperfect reference test
    New test  R+    R-    Total
    T+        70    15    85
    T-        0     15    15
    Total     70    30    100

  • The overall percent agreement is 85.0%, and yet it does not
    account for the off-diagonal imbalance: the PPA is 100% while the
    NPA is only 50%.
14
Why report both PPA and NPA?
    Table 1:                            Table 2:
    Imperfect reference test            Imperfect reference test
    New test  R+    R-    Total         New test  R+    R-    Total
    T+        5     5     10            T+        35    5     40
    T-        5     85    90            T-        5     55    60
    Total     10    90    100           Total     40    60    100

  • Table 1: overall pct. agreement = 90.0%; PPA = 50.0% (5/10),
    95% CI (18.7, 81.3); NPA = 94.4% (85/90), 95% CI (87.5, 98.2).
  • Table 2: overall pct. agreement = 90.0%; PPA = 87.5% (35/40),
    95% CI (73.2, 95.8); NPA = 91.7% (55/60), 95% CI (81.6, 97.2).
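The quoted intervals match exact (Clopper-Pearson) binomial confidence
intervals; a sketch of that check (my verification, assuming scipy):

from scipy.stats import beta

def exact_ci(x, n, level=0.95):
    """Clopper-Pearson exact CI for a binomial proportion x/n."""
    alpha = 1 - level
    lo = beta.ppf(alpha / 2, x, n - x + 1) if x > 0 else 0.0
    hi = beta.ppf(1 - alpha / 2, x + 1, n - x) if x < n else 1.0
    return lo, hi

print(exact_ci(5, 10))   # ~(0.187, 0.813): PPA interval for Table 1
print(exact_ci(35, 40))  # ~(0.732, 0.958): PPA interval for Table 2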
15
Kappa measure of agreement
  • Kappa is defined as the difference between observed and expected
    agreement, expressed as a fraction of the maximum possible
    difference, and it ranges from -1 to 1.

    Imperfect reference standard
    New test  R+     R-     Total
    T+        a      b      a+b
    T-        c      d      c+d
    Total     a+c    b+d    n = a+b+c+d

  • kappa = (Io - Ie)/(1 - Ie), where Io = (a + d)/n and
    Ie = ((a + c)(a + b) + (b + d)(c + d))/n^2.
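A minimal Python sketch of this formula (the function name is mine):

def cohen_kappa(a, b, c, d):
    """Cohen's kappa from the cells of a 2x2 agreement table."""
    n = a + b + c + d
    io = (a + d) / n                                     # observed agreement
    ie = ((a + c) * (a + b) + (b + d) * (c + d)) / n**2  # chance agreement
    return (io - ie) / (1 - ie)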
16
Kappa measure of agreement
    Imperfect reference standard
    New test  R+    R-    Total
    T+        35    15    50
    T-        15    35    50
    Total     50    50    100

  • Io = 70/100 = 0.70, Ie = ((50)(50) + (50)(50))/100^2 = 0.50
  • kappa = (0.70 - 0.50)/(1 - 0.50) = 0.40
  • 95% CI: (0.22, 0.58)
  • Note that the overall percent agreement is 70.0%.
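The slide's value can be checked with the sketch above:

print(round(cohen_kappa(35, 15, 15, 35), 2))  # 0.4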
17
Is the kappa measure of agreement sensitive to the
off-diagonal?

    Imperfect reference test
    New test  R+    R-    Total
    T+        35    30    65
    T-        0     35    35
    Total     35    65    100

  • kappa = 0.45, 95% CI: (0.31, 0.59)
  • Although the overall agreement stayed the same (70%) and the
    marginal differences are much bigger than before, the kappa index
    actually increased (0.45 versus 0.40).
  • The kappa statistic is affected by the marginal totals even though
    the overall agreement is the same.
18
McNemar's test to check for equality in the absence
of a reference standard
  • Hypothesis: equality of the rates of positive response.

    Imperfect reference test
    New test  R+    R-    Total
    T+        37    30    67
    T-        5     28    33
    Total     42    58    100

  • McNemar chi-square = (|b - c| - 1)^2/(b + c)
    = (|30 - 5| - 1)^2/(30 + 5) = 16.46
  • Two-sided p-value = 0.00005
19
McNemar's test (insensitivity to the main diagonal)

    Imperfect reference test
    New test  R+      R-      Total
    T+        3700    30      3730
    T-        5       2800    2805
    Total     3705    2830    6535

  • Same p-value as when a = 37 and d = 28, even though the new and
    the old test now agree on 99.5% of individual cases.
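The insensitivity is easy to see with the earlier sketch: the statistic
depends only on the discordant counts b and c, so inflating the
diagonal leaves it unchanged (my illustration):

# Identical statistic and p-value whether the diagonal holds
# (37, 28) or (3700, 2800) concordant cases:
print(mcnemar_chi2(b=30, c=5))  # ~ (16.46, 0.00005)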
20

McNemar's test (insensitivity to the main diagonal)

    Imperfect reference test
    New test  R+    R-    Total
    T+        0     19    19
    T-        18    0     18
    Total     18    19    37

  • Two-sided p-value = 1 even though the old and the new test agree
    on no cases.
21
Proposed remedy
  • Instead of reporting the overall agreement, kappa, or the
    McNemar's test p-value, report both the positive percent agreement
    and the negative percent agreement.
  • In the 510(k) paradigm, where a new device is compared to an
    already marketed device, the positive percent agreement and the
    negative percent agreement are relative to the comparator device,
    which is appropriate.

22
Agreement of tests with more than two outcomes
  • For example, in radiology one often compares the standard film
    mammogram to a digital mammogram, where the radiologists assign a
    score from 1 (negative finding) to 5 (highly suggestive of
    malignancy) depending on severity.
  • Fay (2005, Biostatistics) proposes a random marginal agreement
    coefficient (RMAC), which uses a different adjustment for chance
    than the standard agreement coefficient (Cohen's kappa); a rough
    sketch follows below.
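The sketch below reflects one reading of Fay (2005): the chance term is
built from the average of the two tests' marginal distributions rather
than from the product of two separate marginals. Treat the details as
an assumption rather than the paper's exact estimator.

import numpy as np

def rmac(table):
    """Identity-weight random marginal agreement coefficient (sketch).
    Rows: categories of test 1; columns: categories of test 2."""
    p = np.asarray(table, dtype=float)
    p = p / p.sum()
    po = float(np.trace(p))                  # observed agreement
    m = (p.sum(axis=1) + p.sum(axis=0)) / 2  # averaged ("random") marginal
    pe = float(np.sum(m ** 2))               # chance agreement under m
    return (po - pe) / (1 - pe)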

23
Comparing two tests with more than two outcomes
  • The advantage of RMAC is that differences between the two marginal
    distributions will not induce greater apparent agreement.
  • However, as stated in the paper, similar to Cohen's kappa with the
    fixed-marginal assumption, RMAC also depends on the heterogeneity
    of the population. Thus, in cases where the probability of
    responding in one category is nearly 1, the chance agreement will
    be large, leading to low agreement coefficients.

24
Comparing two tests with more than two outcomes
  • An omnibus agreement index for situations with more than two
    outcomes suffers from problems similar to those faced for tests
    with a dichotomous outcome. Also, in a regulatory set-up where a
    new test device is being compared to a predicate device, RMAC may
    not be appropriate, as it gives equal weight to the marginals from
    the test and the predicate device.
  • Instead, report individual agreement for each category (a sketch
    follows below).
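One simple per-category index is the category-specific proportion
agreement 2 x n_kk / (row k total + column k total); the slides ask for
per-cell agreement but do not prescribe a formula, so this Dice-style
choice is mine:

import numpy as np

def per_category_agreement(table):
    """Dice-style agreement for each category of a square table."""
    n = np.asarray(table, dtype=float)
    # For category k: 2*n_kk / (row k total + column k total)
    return 2 * np.diag(n) / (n.sum(axis=1) + n.sum(axis=0))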

25
Summary
  • If a perfect reference standard exists, then for a dichotomous
    test both sensitivity and specificity can be estimated and
    appropriate hypothesis tests can be performed.
  • If a new test is being compared to an imperfect predicate test,
    then the positive percent agreement and the negative percent
    agreement, along with their 95% confidence intervals, are a more
    appropriate way of comparison than reporting the overall
    agreement, the kappa statistic, or McNemar's test.
  • For tests with more than two outcomes, the kappa statistic and the
    overall agreement have the same problems if the goal of the study
    is to compare the new test against a predicate. A suggestion is to
    report agreement for each cell.

26
References
  • Pepe, M.S. (2003). The Statistical Evaluation of Medical Tests for
    Classification and Prediction. Oxford University Press.
  • Statistical Guidance on Reporting Results from Studies Evaluating
    Diagnostic Tests; Draft Guidance for Industry and FDA Reviewers.
    March 2, 2003.
  • Fleiss, J.L. (1981). Statistical Methods for Rates and
    Proportions, 2nd ed. John Wiley & Sons, New York.
  • Bossuyt, P.M., Reitsma, J.B., Bruns, D.E., Gatsonis, C.A.,
    Glasziou, P.P., Irwig, L.M., Lijmer, J.G., Moher, D., Rennie, D.,
    and de Vet, H.C.W. (2003). Towards complete and accurate reporting
    of studies of diagnostic accuracy: the STARD initiative. Clinical
    Chemistry, 49(1), 1-6. (Also appears in Annals of Internal
    Medicine (2003), 138(1), W1-12, and in British Medical Journal
    (2003), 326(7379), 41-44.)

27
References (continued)
  • Dunn, G. and Everitt, B. Clinical Biostatistics: An Introduction
    to Evidence-Based Medicine. John Wiley & Sons, New York.
  • Feinstein, A.R. and Cicchetti, D.V. (1990). High agreement but low
    kappa: I. The problems of two paradoxes. Journal of Clinical
    Epidemiology, 43(6), 543-549.
  • Feinstein, A.R. and Cicchetti, D.V. (1990). High agreement but low
    kappa: II. Resolving the paradoxes. Journal of Clinical
    Epidemiology, 43(6), 551-558.
  • Fay, M.P. (2005). Random marginal agreement coefficients:
    rethinking the adjustment for chance when measuring agreement.
    Biostatistics, 6(1), 171-180.