Inter-rater Reliability of Clinical Ratings: A Brief Primer on Kappa

1
Inter-rater Reliability of Clinical Ratings: A Brief Primer on Kappa
Daniel H. Mathalon, Ph.D., M.D.
Department of Psychiatry, Yale University School of Medicine
2
Inter-rater Reliability of Clinical Interview-Based Measures
  • Ratings of clinical severity for specific symptom domains (e.g., PANSS, BPRS, SAPS, SANS)
    • Continuous scales
    • Use intraclass correlations to assess inter-rater reliability.
  • Diagnostic Assessment
    • Categorical Data / Nominal Scale Data
    • How do we quantify reliability between diagnosticians?
    • Percent Agreement, Chi-Square, Kappa

3
Two raters classify n cases into k mutually
exclusive categories.
                  Rater 2
                  1      2     ...    j     ...    k     | Σ_j n_ij
  Rater 1   1    n_11   n_12                 n_1k        | n_1.
            2    n_21   n_22                             | n_2.
            ...
            i                   n_ij                     | n_i.
            ...
            k    n_k1                        n_kk        | n_k.
  Σ_i n_ij       n_.1   n_.2          n_.j         n_.k  | n_..

Notation:
  n_ij = number of cases falling into cell (i, j), i.e., the frequency of the
         joint event "Rater 1 assigns category i and Rater 2 assigns category j"
  n_i. = Σ_j n_ij and n_.j = Σ_i n_ij are the row and column marginal totals
  n_.. = total number of cases
  p_ij = n_ij / n_.. = proportion of cases falling into a particular cell

Reliability by percentage agreement:
  p_o = Σ_i p_ii = (1/n_..) Σ_i n_ii
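As a concreteness check, here is a minimal sketch (Python with numpy assumed; the helper name percent_agreement is ours) of computing the percentage agreement p_o from a k × k table of counts. The example counts are the 3 × 3 diagnostic table used later in this deck.

```python
import numpy as np

def percent_agreement(table):
    """Proportion of cases on the diagonal of a k x k rater-by-rater count table."""
    table = np.asarray(table, dtype=float)
    return np.trace(table) / table.sum()

# Rater 1 (rows) x Rater 2 (columns); counts from the kappa example later in the deck.
table = np.array([[106, 10, 4],
                  [22, 28, 10],
                  [2, 12, 6]])
print(percent_agreement(table))  # 140 / 200 = 0.70
```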
4
Percent Agreement Fails to Consider Agreement by
Chance
                  Rater 2
                  Schiz   Other | marginal
  Rater 1  Schiz   .81     .09  |   .90
           Other   .09     .01  |   .10
  marginal         .90     .10  |  1.0

  Chance agreement in each diagonal cell: .90 × .90 = .81 and .10 × .10 = .01
  Proportion agreement = .81 + .01 = .82
Assume that two raters whose judgments are completely independent (i.e., not influenced by the true diagnostic status of the patient) each diagnose 90% of cases as schizophrenia and 10% of cases as not schizophrenia (i.e., Other). The agreement expected by chance for each category is obtained by multiplying the marginal probabilities together. These raters can achieve a percentage agreement of 82% strictly by chance.
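A quick check of the arithmetic above, assuming numpy: under independence the cell probabilities are the products of the marginals, and chance agreement is the sum over the diagonal.

```python
import numpy as np

# Marginal diagnostic rates for two independent raters (from the slide).
rater1 = np.array([0.90, 0.10])   # P(Schiz), P(Other) for Rater 1
rater2 = np.array([0.90, 0.10])   # P(Schiz), P(Other) for Rater 2

# Chance agreement = sum of the diagonal of the outer product of the marginals.
cells = np.outer(rater1, rater2)
print(cells)            # [[0.81 0.09], [0.09 0.01]]
print(np.trace(cells))  # 0.82 agreement expected by chance alone
```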
5
Chi-Square Test of Association as Proposed
Solution
Can perform a Chi-Square test of association to test the null hypothesis that the two raters' judgments are independent. To reject independence, show that the observed cell frequencies depart from what would be expected by chance alone:

  χ² = Σ_cells (Observed − Expected)² / Expected

Problem: In the example below, we have a perfect association between the raters with zero agreement. Chi-Square is a test of association, not agreement. It is sensitive to any departure from chance agreement, even when the dependency between the raters' judgments involves perfect non-agreement. So we cannot use the Chi-Square test to assess agreement between raters.
                  Rater 2
                  Sz   BP   Other | n_i.
  Rater 1  Sz      0    5     0   |  5
           BP      0    0     5   |  5
           Other   5    0     0   |  5
  n_.j             5    5     5   | n = 15
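A sketch (assuming numpy and scipy are available) showing the point numerically: the chi-square test flags this table as a highly significant association even though the raters never agree.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Perfect association, zero agreement: every case falls off the diagonal.
table = np.array([[0, 5, 0],
                  [0, 0, 5],
                  [5, 0, 0]])

chi2, p, dof, expected = chi2_contingency(table)
agreement = np.trace(table) / table.sum()

print(f"chi-square = {chi2:.1f} on {dof} df, p = {p:.2g}")  # chi-square = 30.0, p << .001
print(f"percent agreement = {agreement:.0%}")               # 0%
```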
6
Kappa Coefficient (Cohen, 1960)
High reliability requires that the frequencies along the diagonal be greater than chance and the off-diagonal frequencies be less than chance. Use the marginal frequencies/probabilities to estimate chance agreement.

  Proportion agreement observed:            p_o = Σ_i p_ii = (1/n) Σ_i n_ii
  Proportion agreement expected by chance:  p_c = Σ_i p_i. × p_.i
  Kappa:                                    κ = (p_o − p_c) / (1 − p_c)
                  Rater 2
                  Sz     BP     Other | n_i.   p_i.
  Rater 1  Sz     106    10       4   | 120    .60
           BP      22    28      10   |  60    .30
           Other    2    12       6   |  20    .10
  n_.j            130    50      20   | 200
  p_.j            .65    .25     .10  |        1.0

  Diagonal cells: observed proportion (count and proportion expected by chance):
    Sz:    .53  (78,  .39)
    BP:    .14  (15,  .075)
    Other: .03  (2,   .01)
  Chance terms p_i. × p_.i for the diagonal: .39 + .075 + .01 = p_c = .475
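A minimal sketch of the unweighted kappa computation for the table above (numpy assumed; the function name is ours):

```python
import numpy as np

def cohens_kappa(table):
    """Unweighted Cohen's kappa from a k x k table of counts (Rater 1 rows, Rater 2 columns)."""
    table = np.asarray(table, dtype=float)
    n = table.sum()
    p_o = np.trace(table) / n        # observed agreement
    p_row = table.sum(axis=1) / n    # Rater 1 marginals, p_i.
    p_col = table.sum(axis=0) / n    # Rater 2 marginals, p_.j
    p_c = np.sum(p_row * p_col)      # agreement expected by chance
    return (p_o - p_c) / (1 - p_c)

table = [[106, 10, 4],
         [22, 28, 10],
         [2, 12, 6]]
print(round(cohens_kappa(table), 3))  # p_o = .70, p_c = .475, kappa = 0.429
```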
7
  • Interpretations of Kappa
  • κ = P(agreement | no agreement by chance)
  • 1 − p_c = 1 − .475 = .525 of cases are those where no agreement is expected by chance.
  • p_o − p_c = .70 − .475 = .225 of cases are the non-chance-agreement cases where the observers agreed.
  • Kappa is the probability that the judges will agree given no agreement by chance: κ = .225 / .525 ≈ .43.
  • Can test H0 that Kappa = 0: Kappa is approximately normally distributed in large samples, so significance can be tested using the normal distribution (see the sketch after this list).
  • Can construct confidence intervals for Kappa.
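A sketch of the large-sample z test of H0: kappa = 0, using the null-hypothesis standard error of Fleiss, Cohen & Everitt (1969); numpy and scipy are assumed, and for a confidence interval around a nonzero kappa the non-null variance (or a packaged routine such as statsmodels.stats.inter_rater.cohens_kappa) is the more appropriate tool.

```python
import numpy as np
from scipy.stats import norm

def kappa_z_test(table):
    """Cohen's kappa and a large-sample z test of H0: kappa = 0."""
    p = np.asarray(table, dtype=float)
    n = p.sum()
    p = p / n
    p_row, p_col = p.sum(axis=1), p.sum(axis=0)
    p_o, p_c = np.trace(p), np.sum(p_row * p_col)
    kappa = (p_o - p_c) / (1 - p_c)

    # Variance of kappa under H0 (rater independence), Fleiss, Cohen & Everitt (1969).
    var0 = (p_c + p_c**2 - np.sum(p_row * p_col * (p_row + p_col))) / (n * (1 - p_c)**2)
    z = kappa / np.sqrt(var0)
    p_value = 2 * norm.sf(abs(z))
    return kappa, z, p_value

table = [[106, 10, 4], [22, 28, 10], [2, 12, 6]]
kappa, z, p_value = kappa_z_test(table)
print(f"kappa = {kappa:.3f}, z = {z:.2f}, p = {p_value:.2g}")
```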

8
Weighted Kappa Coefficient
Can assign weights, w_ij, to classification errors according to their seriousness, using ratio-scale weights. The table below shows disagreement (penalty) weights: 0 for exact agreement, larger values for more serious errors. The weighted kappa is

  κ_w = (p_o(w) − p_c(w)) / (1 − p_c(w)),

where p_o(w) and p_c(w) are the weighted observed and chance agreement proportions.
                    Rater 2
                    Schizophrenia          Other Psychosis       Personality Disorder | n_i.  p_i.
  Rater 1
  Schizophrenia     106 / .53 / .39  / 0    10 / .05 / .15  / 1    4 / .02 / .06 / 6  | 120   .60
  Other Psychosis    22 / .11 / .195 / 1    28 / .14 / .075 / 0   10 / .05 / .03 / 3  |  60   .30
  Personality Dis.    2 / .01 / .065 / 6    12 / .06 / .025 / 3    6 / .03 / .01 / 0  |  20   .10
  n_.j              130                     50                    20                  | 200
  p_.j              .65                     .25                   .10                 |       1.0

  Cell entries: observed count / observed proportion p_ij / chance proportion p_i. × p_.j / weight w_ij
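A sketch of the weighted kappa for the table above. Because the slide's weights are penalty weights (0 on the diagonal), this uses the equivalent penalty-weight form κ_w = 1 − Σ w_ij p_obs,ij / Σ w_ij p_chance,ij; numpy assumed, function name ours.

```python
import numpy as np

def weighted_kappa_penalty(table, weights):
    """Weighted kappa with disagreement (penalty) weights: 0 = agreement, larger = worse error."""
    table = np.asarray(table, dtype=float)
    weights = np.asarray(weights, dtype=float)
    n = table.sum()
    p_obs = table / n
    p_chance = np.outer(table.sum(axis=1), table.sum(axis=0)) / n**2
    return 1 - np.sum(weights * p_obs) / np.sum(weights * p_chance)

# Counts and penalty weights from the slide (Sz, Other Psychosis, Personality Disorder).
table = [[106, 10, 4],
         [22, 28, 10],
         [2, 12, 6]]
weights = [[0, 1, 6],
           [1, 0, 3],
           [6, 3, 0]]
print(round(weighted_kappa_penalty(table, weights), 3))  # 0.468
```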
9
Kappa Rules of Thumb
  • κ greater than .75 is considered excellent agreement.
  • κ below .46 is considered poor agreement.

10
Weighted Kappa and the ICC
  • Weighted kappa is an intraclass correlation coefficient (except for a factor of 1/n) when the weights have the following property (sketched below):
  • w_ij = 1 − (i − j)² / (k − 1)²
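A sketch constructing the quadratic agreement weights named above, together with the agreement-weight form of weighted kappa they would plug into (numpy assumed; function names ours).

```python
import numpy as np

def quadratic_weights(k):
    """Agreement weights w_ij = 1 - (i - j)^2 / (k - 1)^2 for k ordered categories."""
    i, j = np.indices((k, k))
    return 1 - (i - j) ** 2 / (k - 1) ** 2

def weighted_kappa_agreement(table, weights):
    """Weighted kappa with agreement weights (w_ii = 1)."""
    p = np.asarray(table, dtype=float)
    p = p / p.sum()
    p_chance = np.outer(p.sum(axis=1), p.sum(axis=0))
    p_o_w = np.sum(weights * p)
    p_c_w = np.sum(weights * p_chance)
    return (p_o_w - p_c_w) / (1 - p_c_w)

print(quadratic_weights(3))
# [[1.   0.75 0.  ]
#  [0.75 1.   0.75]
#  [0.   0.75 1.  ]]
```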
11
Problems with Kappa
  • Affected by the base rates of the diagnoses.
  • Can't easily compare across studies that have different base rates, either in the population or in the reliability study.
  • Chance agreement is a problem: when the null hypothesis of rater independence is not met (which is most of the time), the estimate of chance agreement is inaccurate and possibly inappropriate.