Validity and Agreement - PowerPoint PPT Presentation
Transcript and Presenter's Notes

Title: Validity and Agreement


1
Validity and Agreement
2
A study can only be as good as the data . . .
  • -Martin Bland

3
Reproducibility vs Validity
  • Reproducibility
  • the degree to which a measurement provides the
    same result each time it is performed on a given
    subject or specimen
  • Validity
  • from the Latin validus - strong
  • the degree to which a measurement truly measures
    (represents) what it purports to measure
    (represent)

4
Reproducibility vs Validity
  • Reproducibility
  • reliability, repeatability, precision,
    variability, dependability, consistency,
    stability
  • Validity
  • accuracy

5
Why Care About Reproducibility?
  • σ²O = σ²T + σ²E
  • More measurement error means more variability in
    observed measurements
  • e.g. measure height in a group of subjects:
  • If no measurement error, the spread of observed
    heights reflects only true between-subject
    differences
  • If measurement error, the spread of observed
    heights is wider than the true differences

[Figure: distributions of observed height with and
without measurement error]
6
Impact of Reproducibility on Statistical
Precision
  • observed value (O) = true value (T) + measurement
    error (E)
  • E is random and distributed N(0, σ²E)
  • When measuring a group of subjects, the
    variability of observed values is a combination
    of
  • the variability in their true values and the
    variability in the measurement error:
  • σ²O = σ²T + σ²E
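As a concrete illustration of this decomposition, here is a minimal simulation sketch; the true-score and error SDs (8 cm and 3 cm) are assumed for the height example, not values from the presentation:

```python
# Simulate O = T + E and check that var(O) = var(T) + var(E).
# The SDs (8 cm true, 3 cm error) are assumed for illustration.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

true_height = rng.normal(170, 8, n)   # T: true heights
error = rng.normal(0, 3, n)           # E ~ N(0, sigma_E^2)
observed = true_height + error        # O = T + E

print(true_height.var())              # ~64  (sigma^2_T)
print(error.var())                    # ~9   (sigma^2_E)
print(observed.var())                 # ~73  (sigma^2_O = sigma^2_T + sigma^2_E)
```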

7
Why Care About Reproducibility?
  • σ²O = σ²T + σ²E
  • More variability of observed measurements has
    profound influences on statistical
    precision/power
  • Descriptive studies: wider confidence intervals
  • RCTs: power to detect a treatment difference is
    reduced
  • Observational studies: power to detect an
    influence of a particular risk factor upon a
    given disease is reduced

8
Mathematical Definition of Reproducibility
  • Reproducibility = σ²T / (σ²T + σ²E)
  • Varies from 0 (poor) to 1 (optimal)
  • As σ²E approaches 0 (no error), reproducibility
    approaches 1

9
Why Care About Reproducibility?
  • Impact on Validity
  • Mathematically, the upper limit of a
    measurements validity is a function of its
    reproducibility
  • Consider a study to measure height in the
    community
  • Assume the measurement has imperfect
    reproducibility if we measure height twice on a
    given person, we get two different values 1 of
    the 2 values must be wrong (imperfect validity)
  • If study measures everyone only once, errors,
    despite being random, will lead to biased
    inferences when using these measurements (i.e.
    lack validity)

10
Sources of Measurement Error
  • Observer
  • within-observer (intrarater)
  • between-observer (interrater)
  • Instrument
  • within-instrument
  • between-instrument

11
Sources of Measurement Error
  • e.g. plasma HIV viral load
  • observer: measurement-to-measurement differences
    in tube filling, time before processing
  • instrument: run-to-run differences in reagent
    concentration, PCR cycle times, enzymatic
    efficiency

12
Within-Subject Variability
  • Although not the fault of the measurement
    process, moment-to-moment biological variability
    can have the same effect as errors in the
    measurement process
  • Recall that
  • observed value (O) = true value (T) + measurement
    error (E)
  • T = the average of measurements taken over time
  • E is always in reference to T
  • Therefore, lots of moment-to-moment
    within-subject biologic variability will serve to
    increase the variability in the error term and
    thus increase overall variability, because
    σ²O = σ²T + σ²E

13
(No Transcript)
14
Selected Indices or Graphic Approaches for the
Assessment of Validity and Reliability
15
Selected Indices or Graphic Approaches for the
Assessment of Validity and Reliability
16
Indices for Categorical Variables
17
Sensitivity and Specificity
18
Predictive Values at Different Prevalences with
Sensitivity .90 and Specificity .90

  Prevalence   PPV    NPV
     10%       .50    .99
     25%       .75    .96
     50%       .90    .90
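These entries follow directly from Bayes' theorem. Below is a minimal sketch that reproduces the table; the helper name predictive_values is ours (note the exact PPV at 25% prevalence works out to .75):

```python
# Compute PPV and NPV from sensitivity, specificity, and prevalence.
def predictive_values(sens: float, spec: float, prev: float) -> tuple[float, float]:
    ppv = sens * prev / (sens * prev + (1 - spec) * (1 - prev))
    npv = spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)
    return ppv, npv

for prev in (0.10, 0.25, 0.50):
    ppv, npv = predictive_values(0.90, 0.90, prev)
    print(f"prev={prev:.2f}  PPV={ppv:.2f}  NPV={npv:.2f}")
# prev=0.10  PPV=0.50  NPV=0.99
# prev=0.25  PPV=0.75  NPV=0.96
# prev=0.50  PPV=0.90  NPV=0.90
```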
19
Influence of prevalence on predictive values
20
(No Transcript)
21
Spectrum of severity
22
Youden's J statistic
  • Sensitivity = a
  • Specificity = b
  • Youden's J statistic: J = a + b - 1
  • All three of these measures have a range of 0 to
    1 (Youden's index can be less than 0, but only if
    the sensitivity and specificity are worse than
    would be obtained by chance with a random
    classification)

23
Youden's J statistic
  • Suppose that we are doing a survey in a
    population in which the true prevalence is P
  • The observed prevalence is
    aP + (1-b)(1-P) = P(a+b-1) + (1-b)

24
Youden's J statistic
  • If we compare two populations, then the observed
    prevalences are
    P1(a+b-1) + (1-b) and P0(a+b-1) + (1-b)
  • so the observed prevalence difference is
  • (P1-P0)(a+b-1)
  • Youden's index indicates the reduction in the
    true prevalence difference due to
    misclassification, as the check below illustrates
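A short numerical check of this identity, with sensitivity, specificity, and the two true prevalences chosen arbitrarily for illustration:

```python
# Observed prevalence under misclassification:
# observed = a*P + (1-b)*(1-P) = P*(a+b-1) + (1-b)
def observed_prev(p: float, sens: float, spec: float) -> float:
    return sens * p + (1 - spec) * (1 - p)

a, b = 0.85, 0.95            # assumed sensitivity and specificity
p1, p0 = 0.30, 0.10          # assumed true prevalences

d_true = p1 - p0
d_obs = observed_prev(p1, a, b) - observed_prev(p0, a, b)
print(round(d_obs / d_true, 3))   # 0.8 = a + b - 1 = Youden's J
```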

25
Youden's J statistic
  • In population-based prevalence surveys, Youden's
    J statistic is the most appropriate measure of
    validity
  • The 95% confidence interval for Youden's J is

Var(J) = Sen(1-Sen)/n1 + Spe(1-Spe)/n2
95% CI for J: J ± 1.96 √Var(J)
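A minimal sketch of this calculation, assuming invented estimates and sample sizes (n1 = subjects used to estimate sensitivity, n2 = subjects used to estimate specificity):

```python
# Youden's J with its large-sample 95% confidence interval.
from math import sqrt

sens, n1 = 0.90, 120     # assumed sensitivity estimate and sample size
spec, n2 = 0.80, 300     # assumed specificity estimate and sample size

J = sens + spec - 1
var_J = sens * (1 - sens) / n1 + spec * (1 - spec) / n2
lo, hi = J - 1.96 * sqrt(var_J), J + 1.96 * sqrt(var_J)
print(f"J = {J:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")   # J = 0.70, CI (0.63, 0.77)
```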
26
Example: Jenkins et al. (1996), ISAAC questionnaire
for asthma in children
27
  • Reliability or Reproducibility
  • Is there good agreement between these two
    imperfect measurements?

28
Percent agreement
29
Cohen's Kappa
  • Reported in 1960
  • Kappa corrects for the chance agreement that
    would be expected to occur if the 2
    classifications were completely unrelated

30
Kappa
  • Definition
  • Chance corrected measure of nominal scale
    agreement among raters
  • Assumptions
  • Subjects are independent
  • Categories are independent, mutually exclusive,
    and exhaustive
  • Raters operate independently

31
Kappa
K = (p - pe) / (1 - pe)

  • p = observed proportion of agreement
  • pe = proportion of agreement expected to occur
    by chance alone
  • Varies from -1 to +1

32
Weighted Kappa Coefficient
  • Definition
  • Proportion of weighted agreement corrected for
    chance
  • Application
  • Not all disagreements between categories are of
    equal importance

33
Weighted Kappa Coefficient
Kw = (pw - pew) / (1 - pew)

  • pw = observed weighted proportion of agreement
  • pew = weighted proportion of agreement expected
    to occur by chance alone
  • Varies from -1 to +1

34
Notation: cell proportions for two raters and three
categories (rows = Rater B, columns = Rater A)

                 Rater A category
              1      2      3    | Pi.
Rater B  1   P11    P12    P13   | P1.
         2   P21    P22    P23   | P2.
         3   P31    P32    P33   | P3.
         -----------------------------
   P.j       P.1    P.2    P.3   | 1
35
Example: K and Kw
  • Diagnostic Category
  • 1 - Personality Disorder
  • 2 - Neurosis
  • 3 - Psychosis
  • Notation in the table on the next slide:
  • a = disagreement weight, Vij
  • b = chance-expected cell proportion,
    Pcij = (Pi.)(P.j)
  • o = observed cell proportion, Pij

36
Ratings of the example data. Each cell shows the
weight Vij (a), the observed proportion Pij (o), and
the chance-expected proportion (Pi.)(P.j) (b, in
parentheses):

                           Rater A category
                  1                2                3         | pi(B)
Rater B  1   V=1    .44 (.30)  V=.75  .07 (.18)  V=.25  .09 (.12) | .6
         2   V=.75  .05 (.15)  V=1    .20 (.09)  V=.1   .05 (.06) | .3
         3   V=.25  .01 (.05)  V=.1   .03 (.03)  V=1    .06 (.02) | .1
         ---------------------------------------------------------
  pi(A)         .5                .3                .2            | 1
37
Calculating K:

P  = Σ Pij where i = j        = .44 + .20 + .06 = .70
Pc = Σ (Pi.)(P.j) where i = j = .30 + .09 + .02 = .41

K = (P - Pc) / (1 - Pc) = (.70 - .41) / (1 - .41) = .49
38
Calculating Kw = (Pw - Pcw) / (1 - Pcw):

Pw  = Σ Vij Pij for all i,j
    = 1(.44 + .20 + .06) + 0.75(.07 + .05)
      + 0.25(.09 + .01) + 0.1(.03 + .05) = .823
Pcw = Σ Vij (Pi.)(P.j) for all i,j
    = 1(.30 + .09 + .02) + 0.75(.18 + .15)
      + 0.25(.12 + .05) + 0.1(.03 + .06) = .709

Kw = (.823 - .709) / (1 - .709) = .39
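The same arithmetic can be reproduced mechanically from the table on slide 36; a sketch with NumPy (rows = Rater B, columns = Rater A):

```python
import numpy as np

P = np.array([[0.44, 0.07, 0.09],   # observed cell proportions
              [0.05, 0.20, 0.05],
              [0.01, 0.03, 0.06]])
V = np.array([[1.00, 0.75, 0.25],   # weights Vij from the example
              [0.75, 1.00, 0.10],
              [0.25, 0.10, 1.00]])

Pe = np.outer(P.sum(axis=1), P.sum(axis=0))   # chance-expected (Pi.)(P.j)

p, pc = np.trace(P), np.trace(Pe)             # diagonal sums
kappa = (p - pc) / (1 - pc)                   # 0.49

pw, pcw = (V * P).sum(), (V * Pe).sum()       # weighted sums over all cells
kappa_w = (pw - pcw) / (1 - pcw)              # 0.39
print(round(kappa, 2), round(kappa_w, 2))
```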
39
Standard Error of the Kappa Statistic
  • The standard error of the kappa statistic is a
    function of the chance-expected agreement and
    the sample size
  • To test the hypothesis H0: κ = 0 vs. H1: κ > 0,
    use the test statistic z = K / SE(K)

40
Interpretation of Kappa
  • Various authors have developed classifications
    for the interpretation of a kappa value
  • See Altman (1991), Fleiss (1981), or Byrt (1996)

41
Interpretation of Kappa
42
Interpretation of Kappa
  • Below 0.00 → Poor
  • 0.00 - 0.20 → Slight
  • 0.21 - 0.40 → Fair
  • 0.41 - 0.60 → Moderate
  • 0.61 - 0.80 → Substantial
  • 0.81 - 1.00 → Almost Perfect
  • Landis & Koch (1977)

43
  • K: strengths
  • Adjustment for chance agreement
  • Most commonly used measure of agreement
  • Many variants and generalizations of kappa
  • Interpretability in qualitative as well as
    quantitative terms
  • K: weaknesses
  • Base rate (prevalence) controversy
44
  • Kw: strengths
  • Adjustment for chance agreement
  • Ability to determine where the largest source
    of disagreement is occurring
  • Interpretability in qualitative as well as
    quantitative terms
  • Kw: weaknesses
  • Weights are arbitrarily set by the researcher
  • Decreases generalizability across studies

45
Kappa and Prevalence
  • A limitation of kappa when comparing the
    reliability of a diagnostic procedure in
    different populations is its dependence on the
    prevalence of true positivity in each
    population (from Szklo & Nieto, Epidemiology:
    Beyond the Basics)

46
Population One (Prevalence 0.05): Table for true
positives
Observer B
From Szklo and Nieto, 2000
47
Population One (Prevalence 0.05): Table for true
negatives
Observer B
From Szklo and Nieto, 2000
48
Population One (Prevalence 0.05): Table for
total population
Observer B
κ = 0.296
From Szklo and Nieto, 2000
49
Population Two (Prevalence 0.30): Table for true
positives
Observer B
From Szklo and Nieto, 2000
50
Population Two (Prevalence 0.30): Table for true
negatives
Observer B
From Szklo and Nieto, 2000
51
Population Two (Prevalence 0.30): Table for
total population
Observer B
κ = 0.598
From Szklo and Nieto, 2000
52
Kappa and Prevalence
  • So, for the same sensitivity and specificity of
    the observers, the kappa value is greater in the
    population in which the prevalence of positivity
    is higher
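A sketch of this dependence: holding sensitivity and specificity fixed for two observers who err independently given true status (an assumption of this illustration, with invented parameter values, so the kappas differ from Szklo and Nieto's), kappa rises with prevalence:

```python
# Kappa between two observers with identical sensitivity/specificity,
# as a function of true prevalence.
def kappa_two_observers(sens: float, spec: float, prev: float) -> float:
    p_pp = prev * sens**2 + (1 - prev) * (1 - spec)**2      # both rate +
    p_nn = prev * (1 - sens)**2 + (1 - prev) * spec**2      # both rate -
    p_pn = prev * sens * (1 - sens) + (1 - prev) * spec * (1 - spec)
    po = p_pp + p_nn                    # observed agreement
    q = p_pp + p_pn                     # marginal P(an observer rates +)
    pe = q**2 + (1 - q)**2              # chance-expected agreement
    return (po - pe) / (1 - pe)

for prev in (0.05, 0.30):
    print(prev, round(kappa_two_observers(0.90, 0.95, prev), 2))
# 0.05 0.41
# 0.3 0.72
```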

53
Indices for Continuous Variables
54
Reproducibility of an Interval Scale Measurement:
Peak Flow
  • Assessment requires
  • >1 measurement per subject
  • Peak flow rate in 17 adults
  • (Bland & Altman)

55
Assessment by Simple Correlation
56
Pearson Product-Moment Correlation Coefficient
  • r (the sample estimate of ρ) ranges from -1 to +1
  • r = Σ(xi - x̄)(yi - ȳ) / √[Σ(xi - x̄)² Σ(yi - ȳ)²]
  • r describes the strength of linear association
  • r² = proportion of variance (variability) of one
    variable accounted for by the other variable

57
[Figure: example scatter plots illustrating r = 1.0,
r = -1.0, r = 0.8, and r = 0.0]
58
Correlation Coefficient for Peak Flow Data
  • r(measurement 1, measurement 2) = 0.98

59
Limitations of Simple Correlation for Assessment
of Reproducibility
  • Depends upon the range of the data (see the
    simulation after this list)
  • e.g. peak flow
  • r (full range of data) = 0.98
  • r (peak flow < 450) = 0.97
  • r (peak flow > 450) = 0.94
  • Measures linear association only
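A small simulation of that range dependence, with invented peak-flow-like numbers: the same measurement error produces a smaller r when the analysis is restricted to a narrow range:

```python
import numpy as np

rng = np.random.default_rng(1)
truth = rng.uniform(200, 700, 5000)            # true peak flows (assumed)
m1 = truth + rng.normal(0, 20, truth.size)     # two error-prone readings
m2 = truth + rng.normal(0, 20, truth.size)

r_full = np.corrcoef(m1, m2)[0, 1]
r_low = np.corrcoef(m1[truth < 450], m2[truth < 450])[0, 1]
print(round(r_full, 2), round(r_low, 2))       # r shrinks in the narrow range
```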

60
(No Transcript)
61
  • Avoid using the usual correlation coefficient
    (the Pearson correlation coefficient)
  • It does not correct for systematic error!

62
  • Instead, calculate the intraclass correlation
    coefficient

ICC = Vbetween individuals / Vtotal
63
Intraclass Correlation Coefficient (ICC)
  • Say you have 2 raters
  • What if Rater 2 consistently overestimates the
    measurement when compared to Rater 1?

64
Fake Data (Margo et al., 2002)
65
Plot of Fake Data
66
Evaluation of the Scatter Diagram
  • Strong linear association: the Pearson
    correlation coefficient is 0.99
  • However, the ICC is weaker: 0.89
67
Pearson's r vs. ICC
  • The weaker concordance is due to the fact that
    the ICC takes into account the difference in
    the means, which for Rater 1 is 3.7 and for
    Rater 2 is 4.9
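A sketch of this contrast with invented two-rater data chosen so the rater means are 3.7 and 4.9 as above (these are not the actual Margo et al. data): Pearson's r ignores a constant offset, while a one-way ANOVA ICC penalizes it:

```python
import numpy as np

rater1 = np.array([2.0, 3.0, 3.5, 4.0, 4.5, 5.2])   # mean 3.7
rater2 = rater1 + 1.2                                # mean 4.9, same ordering

r = np.corrcoef(rater1, rater2)[0, 1]   # exactly 1.0: the offset is invisible

def icc_oneway(y1: np.ndarray, y2: np.ndarray) -> float:
    """One-way ICC = V_between / V_total for two measurements per subject."""
    data = np.stack([y1, y2], axis=1)
    n, k = data.shape
    ms_between = k * ((data.mean(axis=1) - data.mean()) ** 2).sum() / (n - 1)
    ms_within = ((data - data.mean(axis=1, keepdims=True)) ** 2).sum() / (n * (k - 1))
    var_between = (ms_between - ms_within) / k
    return var_between / (var_between + ms_within)

print(round(r, 2), round(icc_oneway(rater1, rater2), 2))   # 1.0 vs 0.56
```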

68
Intra- and Interobserver Agreement:
the Bland-Altman Method (Lancet, 1986)
  • There are two measurements on each of I patients,
    Yi1 and Yi2
  • Per patient, calculate the average and the
    difference:
  • Pi = (Yi1 + Yi2)/2 and di = Yi1 - Yi2
  • Make a scatter plot of di versus Pi
  • Always calculate the intraclass correlation (not
    the Pearson correlation) to quantify agreement
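A minimal matplotlib sketch of this construction (the function and the paired measurements are ours, invented on the scale of the LDL example that follows):

```python
import numpy as np
import matplotlib.pyplot as plt

def bland_altman_plot(y1: np.ndarray, y2: np.ndarray) -> None:
    mean = (y1 + y2) / 2                    # P_i = (Y_i1 + Y_i2) / 2
    diff = y1 - y2                          # d_i = Y_i1 - Y_i2
    d_bar, sd = diff.mean(), diff.std(ddof=1)
    plt.scatter(mean, diff)
    # mean difference and 95% limits of agreement
    for level in (d_bar, d_bar - 1.96 * sd, d_bar + 1.96 * sd):
        plt.axhline(level, linestyle="--")
    plt.xlabel("Mean of the two measurements")
    plt.ylabel("Difference between measurements")
    plt.show()

rng = np.random.default_rng(2)
true_ldl = rng.normal(250, 15, 50)          # invented patient values
bland_altman_plot(true_ldl + rng.normal(0, 4, 50),
                  true_ldl + rng.normal(0, 4, 50))
```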

69
Purpose of Bland and Altman-plot
  • to check for systematic difference
  • to check for equality of variance (slope = 0 if
    variances are equal)

70
Example
LDL-cholesterol of 50 patients was measured with
the Friedewald formula and directly.

                    Friedewald   Direct
  Mean              251.6        250.4
  SD                15.6         17.0

Mean difference = 1.21 (SD 6.08); p = 0.16
according to a paired t-test
71
Mixed-model ANOVA:
MSe = 18.488  →  σe = 4.30
MSp = 511.794 →  σ²p = (511.794 - 18.488)/2 = 246.653,
σp = 15.71

Intraclass correlation = 246.653 / (246.653 + 18.488)
= 0.930
Repeatability = 1.96 × SD(d) = 1.96 × 6.08 = 11.92
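The ICC and repeatability above can be re-derived from the reported summary statistics alone, for k = 2 measurements per patient (the raw data are not shown):

```python
# Variance components from the mean squares of a mixed-model ANOVA
# with k = 2 measurements per patient.
ms_error, ms_patient, k = 18.488, 511.794, 2

var_patient = (ms_patient - ms_error) / k       # 246.653
icc = var_patient / (var_patient + ms_error)    # 0.930
repeatability = 1.96 * 6.08                     # 1.96 * SD(d) = 11.92
print(round(var_patient, 3), round(icc, 3), round(repeatability, 2))
```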
72
Illustrations
[Figure: Bland-Altman plot of the LDL-cholesterol data]
73
Conclusions
  • Measurement reproducibility plays a key role in
    determining validity and statistical precision in
    all different study designs
  • When assessing reproducibility, for interval
    scale measurements
  • avoid correlation coefficients
  • use intraclass correlation coefficient
  • or the coefficient of variation if the
    within-subject SD is proportional to the
    magnitude of the measurement
  • For categorical scale measurements, use Kappa
  • What is acceptable reproducibility depends upon
    desired use
  • Assessment of validity depends upon whether or
    not gold standards are present, and can be a
    challenge when they are absent