Transcript and Presenter's Notes

Title: Rater Reliability


1
Rater Reliability
  • How Good is Your Coding?

2
Why Estimate Reliability?
  • Quality of your data
  • Number of coders or raters needed
  • Reviewers/Grant Applications

3
For What Variables Do You Need Reliability
Estimates?
  • Any variables with judgments
  • Ratings of any kind
  • Recordings, even of numbers or counts
  • Basically, all of them

4
Data Collection (1)
  • 1 judge rates all targets. Reliability is N/A (cannot be
    estimated).
  • 2 judges, each rates a different half of the targets; or more
    than 2 judges, but each rates different targets. Again N/A.
  • 2 judges, each rates all targets; or 3 or more, all rate all.
    Crossed design.
  • 4 judges, with a different pair rating each target (every target
    rated by 2 judges, but a different 2 for each target); or 3 or
    more judges, not all rating all targets. Nested design. (The two
    estimable layouts are sketched below.)
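
A minimal sketch of what crossed vs. nested data look like (plain
numpy; the judge labels and values here are made up for illustration):

import numpy as np

# Crossed design: every judge rates every target.
# Rows are targets, columns are judges; no empty cells.
crossed = np.array([
    [2, 3, 1],   # target 1 rated by judges A, B, C
    [3, 2, 2],   # target 2
    [4, 3, 3],   # target 3
])

# Nested design: each target is rated by a different pair of judges,
# so most judge-by-target cells are empty (np.nan).
nested = np.array([
    [2.0,    3.0,    np.nan, np.nan],   # target 1: judges A and B
    [np.nan, np.nan, 4.0,    3.0   ],   # target 2: judges C and D
    [1.0,    np.nan, np.nan, 2.0   ],   # target 3: judges A and D
])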

5
Data Collection (2)
  • IMHO, use a fully crossed design to estimate reliability
    (otherwise reliability will be hard to estimate and you will have
    to hire help). Fully crossed is good for the final data
    collection, too, but may not be feasible.
  • Use any design (crossed or nested) to collect the real data.
  • Use the proper estimate of reliability for the design you finally
    used (fixed for crossed, random for nested, and the proper number
    of raters).

6
Estimation (1)
  • Use the data you collected to compute sums of
    squares for judge, target, and error. SAS GLM can
    do this for you.
  • Compute ICC(3,1) if your final design treats raters as fixed
    (crossed), or ICC(2,1) if raters are random (nested).
  • Apply the Spearman-Brown formula to estimate the reliability of
    your data. (A computational sketch follows this list.)
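
A minimal sketch of the sums-of-squares step (plain numpy; the
function name and the targets-by-raters layout are assumptions for
illustration, mirroring what SAS GLM reports):

import numpy as np

def mean_squares(ratings):
    """Two-way ANOVA mean squares for a targets-by-raters matrix
    (one rating per cell, no missing data)."""
    n, k = ratings.shape                          # n targets, k raters
    grand = ratings.mean()
    ss_target = k * ((ratings.mean(axis=1) - grand) ** 2).sum()
    ss_rater = n * ((ratings.mean(axis=0) - grand) ** 2).sum()
    ss_error = ((ratings - grand) ** 2).sum() - ss_target - ss_rater
    ms_target = ss_target / (n - 1)               # targets (studies)
    ms_rater = ss_rater / (k - 1)                 # judges
    ms_error = ss_error / ((n - 1) * (k - 1))     # rater-by-target residual
    return ms_target, ms_rater, ms_error

The ICCs and the Spearman-Brown projections on the later slides all
follow from these three mean squares.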

7
Estimation (2)
  • If you collected fully crossed data (all judges saw all targets
    for the entire study), you can treat each rater as a column
    (item) and each target or study as a row (person), and then
    compute Cronbach's alpha for those data as the rater reliability
    index. Alpha = ICC(3,k). (A short alpha sketch follows.)
  • Can't do that if raters and targets are not crossed.
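
A minimal sketch of that alpha computation (numpy assumed; the
function name is illustrative):

import numpy as np

def cronbach_alpha(ratings):
    """Cronbach's alpha with raters as items (columns) and targets as
    persons (rows); equals ICC(3,k) for fully crossed data."""
    k = ratings.shape[1]                          # number of raters
    item_vars = ratings.var(axis=0, ddof=1)       # variance of each rater's column
    total_var = ratings.sum(axis=1).var(ddof=1)   # variance of the row sums
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

For the 5 x 3 example on the next slide this gives about .897, the
same value SPSS reports as alpha on slide 12.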

8
Illustration (1)
3 raters judge the rigor of 5 articles using a 1 to 5 scale.

Study   Jim   Joe   Sue
  1      2     3     1
  2      3     2     2
  3      4     3     3
  4      5     4     4
  5      5     5     3
9
Illustration (2)
Computer input: one column for ratings, one for rater, one for target.
Analysis: GLM rating = rater target rater*target (can use SAS, SPSS,
R, whatever).
Output: sums of squares and mean squares for each source. (A rough
Python equivalent follows the table.)

Source          Type III SS   Mean Square
Rater               3.73          1.87
Target             14.27          3.57
Rater*Target        2.93           .37
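
A rough Python equivalent of that GLM run (pandas and statsmodels
assumed; for this balanced design the Type II sums of squares match
the Type III values in the table):

import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

# Long format: one row per rating (target, rater, rating)
ratings = pd.DataFrame({
    "target": [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5],
    "rater":  ["Jim", "Joe", "Sue"] * 5,
    "rating": [2, 3, 1, 3, 2, 2, 4, 3, 3, 5, 4, 4, 5, 5, 3],
})

# Main-effects model; the residual plays the role of the rater-by-target term
model = ols("rating ~ C(rater) + C(target)", data=ratings).fit()
print(anova_lm(model, typ=2))   # rater SS 3.73, target SS 14.27, residual SS 2.93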
10
Illustration (3)
Use the mean squares to compute intraclass correlations (k = 3
raters, n = 5 targets).

ICC(2,1), one random rater:
  (MS_target - MS_error) / (MS_target + (k-1)MS_error + k(MS_rater - MS_error)/n)
  = (3.57 - .37) / (3.57 + 2(.37) + 3(1.87 - .37)/5) = .61

ICC(3,1), one fixed rater:
  (MS_target - MS_error) / (MS_target + (k-1)MS_error)
  = (3.57 - .37) / (3.57 + 2(.37)) = .74

See Shrout and Fleiss (1979) for additional ICCs. (A short sketch
follows.)
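
The same two formulas as a small Python sketch (function names are
illustrative), plugging in the mean squares from the table:

def icc_2_1(ms_target, ms_rater, ms_error, k, n):
    """ICC(2,1): single random rater (Shrout & Fleiss, 1979)."""
    return (ms_target - ms_error) / (
        ms_target + (k - 1) * ms_error + k * (ms_rater - ms_error) / n)

def icc_3_1(ms_target, ms_error, k):
    """ICC(3,1): single fixed rater."""
    return (ms_target - ms_error) / (ms_target + (k - 1) * ms_error)

print(icc_2_1(3.57, 1.87, 0.37, k=3, n=5))   # about .61
print(icc_3_1(3.57, 0.37, k=3))              # about .74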
11
Illustration (4)
Use Spearman-Brown to estimate the reliability of multiple raters and
to estimate the number of raters needed for a desired level of
reliability.

Spearman-Brown: r_kk = k(r_11) / (1 + (k-1)(r_11));
raters needed: k = r_want(1 - r_11) / (r_11(1 - r_want))

                Reliability of 2 raters      Raters needed for rxx of .90
random (.61)    2(.61)/(1 + .61) = .76       .90(.39)/(.61)(.10) = 5.8, so 6
fixed  (.74)    2(.74)/(1 + .74) = .85       .90(.26)/(.74)(.10) = 3.2, so 4

(A small computational sketch follows.)
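
The same arithmetic as a small Python sketch (function names are
illustrative):

import math

def spearman_brown(r1, k):
    """Reliability of the average of k raters, given single-rater reliability r1."""
    return k * r1 / (1 + (k - 1) * r1)

def raters_needed(r1, r_want):
    """Smallest number of raters whose average reaches the desired reliability."""
    return math.ceil(r_want * (1 - r1) / (r1 * (1 - r_want)))

print(spearman_brown(0.61, 2), raters_needed(0.61, 0.90))   # about .76; 6 raters (random)
print(spearman_brown(0.74, 2), raters_needed(0.74, 0.90))   # about .85; 4 raters (fixed)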
12
SPSS
  • Raters are columns, targets (studies) are rows
  • Analyze, Scale, Reliability Analysis
  • Drag all columns into Items
  • The default Model (Alpha) will produce ICC(3,k)
  • In this case alpha = .897 (three judges; the same judges all rate
    every target; take the average)

13
SPSS (2)
  • To get 1 fixed judge: Analyze, Scale, Reliability Analysis, all
    columns into Items, then click Statistics
  • Check the box Intraclass correlation coefficient
  • For 1 fixed judge, click 2-way mixed, OK, then run
  • In this case 1 fixed judge = .74.
  • For 1 random judge, click 1-way random
  • In this case, 1 random judge = .59 (not quite .61, because of my
    rounding error).

14
Categorical Agreement
  • If the same data were categorical, we could compute a percent
    agreement for each item and average over items. This does not
    take chance agreement into account, but it is easy to do.
  • We should use kappa in such cases (a small sketch follows this
    list).
  • Can use SPSS if there are 2 raters, but not if there are more.
  • You can use SAS (my program) if there are more than two.
  • http://faculty.cas.usf.edu/mbrannick/software/kappa.htm
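
A minimal sketch of percent agreement and Cohen's kappa for two
raters (plain Python; the category labels are made up for
illustration):

from collections import Counter

def percent_agreement(r1, r2):
    """Proportion of targets on which two raters chose the same category."""
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)

def cohen_kappa(r1, r2):
    """Chance-corrected agreement for two raters."""
    n = len(r1)
    p_obs = percent_agreement(r1, r2)
    c1, c2 = Counter(r1), Counter(r2)
    p_chance = sum((c1[cat] / n) * (c2[cat] / n) for cat in set(r1) | set(r2))
    return (p_obs - p_chance) / (1 - p_chance)

rater1 = ["yes", "yes", "no", "no", "yes", "no"]
rater2 = ["yes", "no",  "no", "no", "yes", "yes"]
print(percent_agreement(rater1, rater2))   # .67
print(cohen_kappa(rater1, rater2))         # .33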