Title: Meta-analysis
1. Meta-analysis
- Funded through the ESRC's Researcher Development Initiative
Session 3.3: Inter-rater reliability
Department of Education, University of Oxford
2. Steps in a meta-analysis
3. Inter-rater reliability
- Aim of the co-judge procedure: to discern
- Consistency within coder
- Consistency between coders
- Take care when making inferences based on little information
- Phenomena impossible to code become missing values
4. Inter-rater reliability
- Percent agreement: common but not recommended
- Cohen's kappa coefficient
- Kappa is the proportion of the optimum improvement over chance attained by the coders: 1 = perfect agreement, 0 = agreement no better than that expected by chance, -1 = perfect disagreement (see the sketch below)
- Kappas over .40 are considered to be a moderate level of agreement (but there is no clear basis for this guideline)
- Correlation between different raters
- Intraclass correlation: agreement among multiple raters, corrected for the number of raters using the Spearman-Brown formula (r)
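As a rough illustration of why percent agreement alone is not recommended, the minimal sketch below (Python with NumPy and scikit-learn; the ratings are invented for illustration and are not from the original slides) computes both statistics for the same pair of coders.

import numpy as np
from sklearn.metrics import cohen_kappa_score

# Hypothetical codings of 12 studies by two coders on a 3-category scale
rater1 = np.array([1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 1, 2])
rater2 = np.array([1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 1, 2])

percent_agreement = np.mean(rater1 == rater2)   # proportion of identical codes
kappa = cohen_kappa_score(rater1, rater2)       # chance-corrected agreement

print(f"Percent agreement: {percent_agreement:.2f}")
print(f"Cohen's kappa:     {kappa:.2f}")
# Kappa is lower than raw agreement because part of the observed
# agreement is expected by chance alone.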
5. Inter-rater reliability of a categorical IV (1)
Percent agreement = number of observations agreed on / total number of observations
Example: a categorical IV with 3 discrete scale steps, coded for 12 studies; 9 ratings are the same, so exact agreement = 9/12 = .75
6. Inter-rater reliability of a categorical IV (2)
Unweighted kappa
If the agreement matrix is irregular, kappa will not be calculated, or will be misleading.
Positive kappa values indicate how much the raters agree over and above chance alone; negative values indicate disagreement (worked through below).
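To make "over and above chance" concrete, here is a small by-hand computation of unweighted kappa from an agreement matrix (Python/NumPy; the matrix is invented for illustration, not taken from the slides).

import numpy as np

# Hypothetical 3x3 agreement matrix: rows = rater 1, columns = rater 2,
# cell [i, j] = number of studies coded i by rater 1 and j by rater 2
m = np.array([[5, 1, 0],
              [1, 4, 1],
              [0, 1, 2]], dtype=float)

n = m.sum()
p_o = np.trace(m) / n                                 # observed proportion of agreement
p_e = (m.sum(axis=1) * m.sum(axis=0)).sum() / n**2    # agreement expected by chance
kappa = (p_o - p_e) / (1 - p_e)                       # Cohen's kappa

print(f"p_o = {p_o:.2f}, p_e = {p_e:.2f}, kappa = {kappa:.2f}")
# kappa > 0: agreement beyond chance; kappa < 0 would indicate systematic disagreement.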
7. Inter-rater reliability of a categorical IV (3)
Unweighted kappa in SPSS
CROSSTABS
  /TABLES=rater1 BY rater2
  /FORMAT=AVALUE TABLES
  /STATISTICS=KAPPA
  /CELLS=COUNT
  /COUNT ROUND CELL.
8. Inter-rater reliability of a categorical IV (4)
Kappas in irregular matrices
If rater 2 is systematically above rater 1 when coding an ordinal scale, kappa will be misleading; it is possible to fill up the matrix with zeros (see the sketch below).
[Two example agreement matrices on the slide, with K = .51 and K = -.16]
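A quick way to see how a systematic offset distorts kappa (illustrative Python sketch, not from the original materials): if rater 2 always codes one scale step above rater 1, observed agreement is zero and kappa comes out negative, even though the two sets of codes are perfectly ordered.

from sklearn.metrics import cohen_kappa_score

rater1 = [1, 1, 1, 2, 2, 2]      # ordinal codes from rater 1
rater2 = [2, 2, 2, 3, 3, 3]      # rater 2 is systematically one step higher

print(cohen_kappa_score(rater1, rater2))   # about -0.33: "disagreement",
                                           # although the ordering is identical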
9. Inter-rater reliability of a categorical IV (5)
Kappas in irregular matrices
If there are no observations in some row or column, kappa will not be calculated; it is possible to fill up with zeros (see the sketch below).
[Example on the slide: K cannot be estimated for the incomplete matrix; K = .47 once the zeros are filled in]
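In SPSS, filling with zeros makes the crosstab square. The same idea can be illustrated in Python (a sketch with invented codes, not from the slides): making the full category set explicit keeps the agreement matrix square, with zero rows and columns for unused categories, so kappa can still be computed.

from sklearn.metrics import cohen_kappa_score

rater1 = [1, 1, 2, 2, 3, 3]      # rater 1 uses all three categories
rater2 = [1, 2, 2, 2, 2, 1]      # rater 2 never uses category 3

# Passing the full label set keeps the agreement matrix square (3 x 3,
# with a zero column for category 3), the analogue of filling with zeros.
print(cohen_kappa_score(rater1, rater2, labels=[1, 2, 3]))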
10. Inter-rater reliability of a categorical IV (6)
Weighted kappa using a SAS macro
PROC FREQ DATA=int.interrater1;
  TABLES rater1*rater2 / AGREE;
  TEST KAPPA;
RUN;
Papers and macros are available for estimating kappa when rows and columns are unequal or misaligned, or when there are multiple raters: <http://www.stataxis.com/about_me.htm>
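For comparison, weighted kappa can also be obtained outside SAS. The sketch below (Python/scikit-learn, with invented ordinal codings, not part of the original materials) uses linear weights so that near-misses on an ordinal scale are penalised less than distant disagreements.

from sklearn.metrics import cohen_kappa_score

rater1 = [1, 2, 2, 3, 3, 4, 4, 5]     # hypothetical ordinal codings
rater2 = [1, 2, 3, 3, 4, 4, 5, 5]

unweighted = cohen_kappa_score(rater1, rater2)
weighted = cohen_kappa_score(rater1, rater2, weights='linear')

print(f"unweighted kappa:        {unweighted:.2f}")
print(f"linearly weighted kappa: {weighted:.2f}")
# Weighted kappa credits partial agreement: disagreements one scale step
# apart count less than larger discrepancies.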
11. Inter-rater reliability of a continuous IV (1)
- Average correlation: r = (.873 + .879 + .866) / 3 = .873
- Coders must code in the same direction!
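Linking this back to the Spearman-Brown correction mentioned on slide 4, a short sketch (Python, not from the original slides) averages the pairwise correlations given above and steps the average up to the reliability of the pooled judgement of all three coders.

import numpy as np

pairwise_r = [0.873, 0.879, 0.866]    # correlations between the three coder pairs
k = 3                                 # number of coders

r_bar = np.mean(pairwise_r)                    # average inter-coder correlation
r_sb = (k * r_bar) / (1 + (k - 1) * r_bar)     # Spearman-Brown stepped-up reliability

print(f"average r = {r_bar:.3f}, Spearman-Brown reliability of {k} coders = {r_sb:.3f}")
# average r = 0.873; the stepped-up reliability is about 0.95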
12. Inter-rater reliability of a continuous IV (2)
13. Inter-rater reliability of a continuous IV (3)
- Design 1: one-way random effects model, when each study is rated by a different pair of coders (a worked example follows this list)
- Design 2: two-way random effects model, when a random pair of coders rates all studies
- Design 3: two-way mixed effects model, when ONE pair of coders rates all studies
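As an illustration of Design 1, the sketch below (Python/NumPy, invented ratings, not from the original slides) computes the one-way random effects ICC from the between-study and within-study mean squares; packages such as pingouin also report the two-way designs.

import numpy as np

# Hypothetical ratings: 6 studies (rows), each rated by 2 coders (columns)
x = np.array([[4.0, 5.0],
              [2.0, 2.0],
              [5.0, 4.5],
              [3.0, 3.5],
              [1.0, 2.0],
              [4.5, 5.0]])

n, k = x.shape
grand = x.mean()
study_means = x.mean(axis=1)

ms_between = k * np.sum((study_means - grand) ** 2) / (n - 1)        # between-study mean square
ms_within = np.sum((x - study_means[:, None]) ** 2) / (n * (k - 1))  # within-study mean square

icc_single = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)  # ICC(1,1): single coder
icc_average = (ms_between - ms_within) / ms_between                         # ICC(1,k): mean of k coders

print(f"ICC(1,1) = {icc_single:.2f}, ICC(1,{k}) = {icc_average:.2f}")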
14. Comparison of methods (from Orwin, p. 153, in Cooper & Hedges, 1994)
Low kappa but good agreement rate (AR) when there is little variability across items and coders agree (illustrated below)
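A quick numerical illustration of that point (Python sketch with made-up codes, not taken from Orwin): when nearly every study falls in the same category, the agreement rate is high but kappa collapses toward zero, because chance agreement is already high.

import numpy as np
from sklearn.metrics import cohen_kappa_score

# 20 studies; almost everything is coded category 1 by both coders
rater1 = np.array([1] * 20)
rater2 = np.array([1] * 19 + [2])     # a single disagreement

agreement_rate = np.mean(rater1 == rater2)
kappa = cohen_kappa_score(rater1, rater2)

print(f"agreement rate = {agreement_rate:.2f}, kappa = {kappa:.2f}")
# agreement rate = 0.95, but kappa = 0.00: with so little variability,
# chance agreement is already 0.95, so there is no improvement over chance.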
15. Inter-rater reliability in meta-analysis and primary study
16. Inter-rater reliability in meta-analysis vs. in other contexts
- Meta-analysis: coding of independent variables
- How many co-judges?
- How many objects to co-judge? (sub-sample of studies versus sub-sample of codings)
- Use of a gold standard (i.e., one master coder)
- Coder drift (cf. observer drift): are coders consistent over time?
- Your qualitative analysis is only as good as the quality of your categorisation of qualitative data