EPI-820 Evidence-Based Medicine (EBM)
- Lecture 2: Medical Measurement
- Mat Reeves BVSc, PhD
- Department of Epidemiology
- Michigan State University
Objectives
- 1. Understand biological and measurement variation and their effects on precision and validity.
- 2. Understand the components of variability:
  - biological and measurement
  - between- and within-person/observer
- 3. Understand measures of variation and measures of agreement.
- 4. Understand the calculation and application of kappa (K).
- 5. Understand the consequences of variability in clinical data and possible remedies to ameliorate them.
- 6. Understand regression to the mean.
I. Variation in Clinical Data
- 1. Biologic variation: variation in the actual entity being measured
  - derives from the dynamic nature of physiology, homeostasis and pathophysiology
  - within-person (intra-person) biologic variability, and
  - between-person (inter-person) biologic variability
Within-Person (Day-to-Day) and Between-Person Biological Variation: Coefficient of Variation (%) (see Winkel et al., 1974)

Variable       CV within (%)   CV between (%)
Na             0.7             0.8
K              4.3             4.3
Cl             2.1             1.2
Ca             1.7             2.8
BUN            12.3            16.4
Creatinine     4.3             9.5
Cholesterol    5.3             13.6
SGOT (AST)     24.2            24.8
TP             2.9             5.7
I. Variation in Clinical Data
- 2. Measurement variation: variation due to the measurement process
  - inaccuracy of the instrument (instrument error), and/or
  - inaccuracy of the person (operator error)
  - can introduce both random error and bias
Analytical Variation: Coefficient of Variation (%) of Duplicate Samples

Variable       CV analytical (%)
Na             1.1
K              2.6
Cl             2.1
Ca             2.1
BUN            2.2
Creatinine     3.4
Cholesterol    3.1
SGOT (AST)     7.3
TP             1.7
Validity
- The degree to which a measurement process measures what is intended, i.e., accuracy.
- Lack of systematic error or bias.
- A valid instrument will, on average, be close to the underlying true value.
- Assessment of validity requires a gold standard (a reference).
What if there is no gold standard? (e.g., pain, nausea or anxiety)
- Use an instrument or clinical scale to measure a specific phenomenon or construct.
- Criterion validity: the degree to which the scale predicts a directly observable phenomenon, e.g., APGAR score and neonatal survival.
- Content validity: the extent to which the instrument includes all of the dimensions of the construct being measured, e.g., does APGAR include all relevant pathophysiological parameters?
- Construct validity: the degree to which the scale correlates with other known measures of the phenomenon, e.g., how well does a new neonatal assessment scale correlate with the APGAR score?
How do you measure validity?
- Dichotomous data
  - sensitivity, specificity, and predictive values
- Continuous data
  - mean and standard deviation of the difference between the surrogate measure and the gold standard (see Bland and Altman, 1986)
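As a sketch of the two approaches listed above, the Python below computes sensitivity, specificity and predictive values from a hypothetical 2x2 table (test result versus gold standard), and the mean and SD of the surrogate-minus-gold-standard differences for hypothetical paired readings. The 95% limits of agreement (mean ± 1.96 SD) follow the usual Bland-Altman convention; the function names and all numbers are illustrative assumptions.

```python
from statistics import mean, stdev

def diagnostic_accuracy(tp, fp, fn, tn):
    """Validity of a dichotomous test against a gold standard (2x2 table counts)."""
    return {
        "sensitivity": tp / (tp + fn),  # positives among those with the condition
        "specificity": tn / (tn + fp),  # negatives among those without it
        "ppv": tp / (tp + fp),          # condition present among test positives
        "npv": tn / (tn + fn),          # condition absent among test negatives
    }

def bland_altman(surrogate, gold):
    """Mean (systematic bias) and SD of the differences surrogate - gold,
    with the conventional 95% limits of agreement (mean +/- 1.96 SD)."""
    diffs = [s - g for s, g in zip(surrogate, gold)]
    bias, sd = mean(diffs), stdev(diffs)
    return bias, sd, (bias - 1.96 * sd, bias + 1.96 * sd)

# Hypothetical data, for illustration only
acc = diagnostic_accuracy(tp=90, fp=20, fn=10, tn=80)
print(acc["sensitivity"], acc["specificity"])  # 0.9 0.8

bias, sd, limits = bland_altman([101, 98, 105, 110, 97], [100, 100, 100, 108, 95])
print(bias, round(sd, 2))  # 1.6 2.51
```

A non-zero mean difference points to systematic error (lack of validity), while a large SD of the differences points to random error.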
Precision (or reliability or reproducibility)
- The extent to which repeated measurements of a phenomenon tend to yield the same results (regardless of their accuracy!).
- Precision refers to the lack of random error.
- $\text{Precision} \propto 1 / \text{random error}$
Hard versus Soft Data?
- Blood chloride level
- Left ventricular ejection volume
- Migraine severity
- 28-d stroke case-fatality rate
- Indirect costs of school absenteeism
- Direct costs of school absenteeism
- Degree of depression
- Alzheimer severity
- Self-reported ability to do domestic chores
- Self-reported ability to climb stairs
- Patient preferences for induced labour
- Self-reported assessment of health
Hard versus Soft Data
- There are no specific criteria to define hard data; attributes include:
  - Consistency: the ability to preserve basic evidence (repeated observations are consistent); the most important attribute.
  - Objectivity: observations are free of subjective influences.
  - Quantifiability: the ability to express the result as a number.
Hard versus Soft Data
- Hard data are usually numeric measures, such as lab data, but not always (e.g., histology, cancer stage).
- Are hard (numeric) data preferred to softer (qualitative) measures because they are more objective and reliable? (But see Feinstein AR et al., 1985, on the Will Rogers phenomenon.)
Between- and Within-Person Variation
- Four categories of clinical variability
- 1. Between-person biological variability
- 2. Within-person biological variability
- 3. Between-observer measurement variability
- 4. Within-observer measurement variability
ANOVA Model Conceptualization
- $y_{ijkl} = \mu_i + \beta_{ij} + \gamma_{ik} + \varepsilon_{il}$
- where
  - $y_{ijkl}$ = the observed measurement for individual i, measured at time j, by the kth observer at the lth replication
  - $\mu_i$ = the individual's usual true mean (between-person biological variation)
  - $\beta_{ij}$ = perturbation due to biological variation at time j (within-person biologic variation)
  - $\gamma_{ik}$ = perturbation due to measurement error by the kth observer (between-observer measurement variation)
  - $\varepsilon_{il}$ = perturbation due to measurement error at the lth replication (within-observer measurement variation)
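To make the additive model concrete, the sketch below simulates single measurements as the sum of an individual's usual mean plus three perturbations (biology at time j, observer k, replication l). It assumes all four components are independent and normally distributed, and the SD values are hypothetical; under those assumptions the variance of the observed values should be close to the sum of the four component variances.

```python
import random
from statistics import pvariance

random.seed(1)

# Hypothetical SDs for each source of variation (arbitrary units)
POPULATION_MEAN = 200.0
SD_BETWEEN_PERSON = 15.0    # spread of individuals' usual true means
SD_WITHIN_PERSON = 8.0      # biological perturbation at time j
SD_BETWEEN_OBSERVER = 4.0   # error attributable to the kth observer
SD_REPLICATION = 3.0        # error at the lth replication

def simulate_measurement():
    """One observation y_ijkl built from the four additive components."""
    usual_mean = random.gauss(POPULATION_MEAN, SD_BETWEEN_PERSON)
    time_effect = random.gauss(0, SD_WITHIN_PERSON)
    observer_effect = random.gauss(0, SD_BETWEEN_OBSERVER)
    replicate_error = random.gauss(0, SD_REPLICATION)
    return usual_mean + time_effect + observer_effect + replicate_error

ys = [simulate_measurement() for _ in range(20_000)]
expected_var = (SD_BETWEEN_PERSON**2 + SD_WITHIN_PERSON**2
                + SD_BETWEEN_OBSERVER**2 + SD_REPLICATION**2)  # 314.0
observed_var = pvariance(ys)
print(expected_var, round(observed_var, 1))
```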
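II. Statistical aspects of variability
- A. Measures of Variation
- 1. Variance and standard deviation
  - Variance: the average squared difference of individual values from the overall mean; the SD is its square root, expressed in the original units.
  - For a normal distribution, approximately 68%, 95% and 99.7% of values lie within ±1, ±2 and ±3 SD of the mean.
  - Example
    - Average US cholesterol = 220 mg/dl, SD = 15 mg/dl
    - About 95% of individual readings are expected to lie between 190 and 250 mg/dl (mean ± 2 SD).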
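The arithmetic in the cholesterol example, and the normal-distribution coverage behind it, can be checked with Python's standard `statistics` module. The three readings are hypothetical, chosen only to show the SD as the square root of the average squared deviation (here using the usual n - 1 sample denominator).

```python
from statistics import NormalDist, mean, stdev

# Hypothetical readings (mg/dl): deviations from the mean are -10, 0, +10
readings = [210, 220, 230]
print(mean(readings), stdev(readings))  # 220 10.0

# Slide example: mean 220 mg/dl, SD 15 mg/dl -> mean +/- 2 SD
mean_chol, sd_chol = 220, 15
low, high = mean_chol - 2 * sd_chol, mean_chol + 2 * sd_chol
print(low, high)  # 190 250

# Share of a normal distribution within +/- 1, 2 and 3 SD of the mean
std_normal = NormalDist()
coverage = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}
print({k: round(p, 4) for k, p in coverage.items()})  # {1: 0.6827, 2: 0.9545, 3: 0.9973}
```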
A. Measures of Variation
- 2. Coefficient of variation (CV)
  - $CV = 100 \times SD / \text{mean}$ (%): the variation of a set of measurements relative to their mean
  - conceptualized as a noise-to-signal ratio
  - a useful index for comparing the precision of different instruments, individuals and/or laboratories
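A minimal sketch of the CV as a noise-to-signal ratio: the two hypothetical instruments below have different SDs because they measure on different scales, yet the same CV, i.e., the same relative precision. The function name and readings are illustrative.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """CV (%) = 100 * SD / mean: the spread expressed relative to the level measured."""
    return 100 * stdev(values) / mean(values)

instrument_a = [98, 100, 102]    # SD 2 around a mean of 100
instrument_b = [196, 200, 204]   # SD 4 around a mean of 200
print(coefficient_of_variation(instrument_a))  # 2.0
print(coefficient_of_variation(instrument_b))  # 2.0
```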
B. Measures of Agreement
- 1. Correlation (r)
  - Pearson product-moment correlation and Spearman's rank correlation
  - measure the strength of association between two variables, ranging from -1 to +1 (Pearson: linear relationship; Spearman: monotonic relationship between ranks)
  - correlation between two sets of continuous measurements (= reliability), or the extent of replication
1. Correlation (Contd)
- Two observers, same time period: inter-rater reliability.
- Single observer, two time periods: intra-rater reliability (test-retest reliability).
- Can have very high values of r but little direct agreement between raters or instruments (e.g., if one rater consistently reads higher than the other).
- Can only be used as a test of validity if the actual true values are known.
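The point that a very high r can coexist with poor agreement is easy to demonstrate. In the hypothetical sketch below, rater B always reads 15 units higher than rater A: Pearson's r is exactly 1, yet the two raters never agree. Spearman's rank correlation is computed here as Pearson's r on the ranks (ties given their average rank), one standard way to calculate it; the function names are illustrative.

```python
import math
from statistics import mean

def pearson_r(x, y):
    """Pearson product-moment correlation."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def spearman_rho(x, y):
    """Spearman's rank correlation as Pearson's r on the ranks."""
    return pearson_r(ranks(x), ranks(y))

rater_a = [120, 130, 140, 150, 160]
rater_b = [a + 15 for a in rater_a]  # hypothetical: B always reads 15 units higher
print(pearson_r(rater_a, rater_b))                     # 1.0, perfect correlation
print(mean(b - a for a, b in zip(rater_a, rater_b)))   # 15, yet no reading agrees
```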
B. Measures of Agreement
- 2. Intra-class correlation coefficient (R, or reliability)
  - a measure of reliability for continuous or quantitative data
  - an observed value (X) consists of two parts:
    - $X = T + e$
    - where
      - T = the true, unknown level, i.e., the error-free score, steady state or signal
      - e = error (whether biologic or measurement error)
  - the true error-free value varies about some unknown mean ($\mu$) with a variance of $\sigma^2_T$
2. R (Contd)
- The error term is regarded as iid with mean 0 and variance $\sigma^2_e$.
- Variance of X: $\sigma^2_X = \sigma^2_T + \sigma^2_e$
- The relative size of the error variance ($\sigma^2_e$) in relation to the variance of the true value ($\sigma^2_T$) is a measure of imprecision.
- $R = \sigma^2_T / (\sigma^2_T + \sigma^2_e)$
- R = the proportion of the total variance due to subject-to-subject (or between-person) variability in the true value.
- As random error decreases, the value of R increases.
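The first function below transcribes the formula for R directly. Because the true-score and error variances are unknown in practice, the second function estimates R from replicate readings using a one-way random-effects ANOVA (between-subject and within-subject mean squares). This is one common estimator, chosen here as an assumption rather than as the method used in the course; the function names and readings are hypothetical.

```python
from statistics import mean

def reliability_from_variances(var_true, var_error):
    """R = var_T / (var_T + var_e): share of total variance due to true between-person differences."""
    return var_true / (var_true + var_error)

def icc_oneway(readings):
    """One-way random-effects ICC for n subjects, each measured k times (rows = subjects)."""
    n, k = len(readings), len(readings[0])
    grand_mean = mean(x for row in readings for x in row)
    subject_means = [mean(row) for row in readings]
    ss_between = k * sum((m - grand_mean) ** 2 for m in subject_means)
    ss_within = sum((x - m) ** 2
                    for row, m in zip(readings, subject_means) for x in row)
    ms_between = ss_between / (n - 1)        # between-subject mean square
    ms_within = ss_within / (n * (k - 1))    # within-subject (error) mean square
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# R falls as the error variance grows relative to the true variance
print([reliability_from_variances(100, e) for e in (0, 25, 100)])  # [1.0, 0.8, 0.5]

# Hypothetical duplicate readings on four subjects
duplicates = [[10, 12], [20, 18], [30, 31], [40, 41]]
print(round(icc_oneway(duplicates), 3))  # 0.993
```

Here the subjects differ far more from one another than the duplicates differ within a subject, so R is close to 1.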
3. Categorical data: Kappa (K)
- A measure of reliability for categorical or qualitative data.
- Kappa corrects the overall level of agreement for the agreement expected by chance, and is preferred over other measures (like overall percent agreement).
- $K = \frac{P_o - P_e}{1 - P_e} = \frac{\text{actual agreement beyond chance}}{\text{potential agreement beyond chance}}$
  - $P_o$ = the total proportion of observations on which there is agreement
  - $P_e$ = the proportion of agreement expected by chance alone
Agreement matrix for kappa statistic (inter-rater agreement, 2 observers, dichotomous data)
Agreement matrix for kappa statistic (2 observers, dichotomous data)
K (Contd)
- Observed agreement ($P_o$) = 78%
  - $(69 + 48)/150 = 0.78$, or 78%
- Agreement expected due to chance ($P_e$) = 51%
  - The chance-expected count for each agreement cell is the product of its row and column marginal totals divided by N: cell a: $87 \times 84 / 150 = 48.72$; cell d: $63 \times 66 / 150 = 27.72$
  - Then divide the sum (76.44) by 150 to get $P_e = 0.51$, or 51%.
K (Contd)
- $K = \frac{P_o - P_e}{1 - P_e} = \frac{0.78 - 0.51}{1 - 0.51} = \frac{0.27}{0.49} = 0.55$, or 55%
- Kappa varies from -1 to 1, with a value of zero denoting agreement no better than chance (negative values denote agreement worse than chance!).

Value of K      Strength of agreement
< 0             Poor
0 - 0.20        Slight
0.21 - 0.40     Fair
0.41 - 0.60     Moderate
0.61 - 0.80     Substantial
0.81 - 1.0      Almost perfect
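The kappa calculation can be sketched in a few lines of Python. The slides give only the agreement cells (a = 69, d = 48), N = 150 and the marginal totals (87 and 84 for cell a, 63 and 66 for cell d); the disagreement cells b = 18 and c = 15 used below are inferred from those totals, and which observer is assigned to rows is an assumption (swapping b and c leaves kappa unchanged). The verbal labels follow the strength-of-agreement table; the function names are illustrative.

```python
def kappa_2x2(a, b, c, d):
    """Cohen's kappa for two observers rating a dichotomous finding.
    a = both positive, d = both negative, b and c = the two kinds of disagreement."""
    n = a + b + c + d
    p_o = (a + d) / n                        # observed agreement
    expected_a = (a + b) * (a + c) / n       # row total x column total / N
    expected_d = (c + d) * (b + d) / n
    p_e = (expected_a + expected_d) / n      # agreement expected by chance
    return p_o, p_e, (p_o - p_e) / (1 - p_e)

def strength_of_agreement(kappa):
    """Verbal label matching the strength-of-agreement table."""
    if kappa < 0:
        return "Poor"
    for upper, label in ((0.20, "Slight"), (0.40, "Fair"),
                         (0.60, "Moderate"), (0.80, "Substantial")):
        if kappa <= upper:
            return label
    return "Almost perfect"

p_o, p_e, kappa = kappa_2x2(a=69, b=18, c=15, d=48)
print(round(p_o, 2), round(p_e, 4), round(kappa, 2), strength_of_agreement(kappa))
# 0.78 0.5096 0.55 Moderate
```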
K - Issue of Prevalence
- The prevalence of the condition affects the likelihood that observers will agree purely due to chance, hence the importance of using kappa.
- Example
  - Observer A classified 120/150 patients as positive
  - Observer B classified 130/150 patients as positive
  - $P_e$ is now 72%: $(120 \times 130 + 30 \times 20)/150^2 = 16200/22500 = 0.72$
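The effect of prevalence on chance agreement can be reproduced from the marginal totals alone. The sketch below assumes each observer's total is the number of patients they classified as positive and that the observers rate independently, so $P_e$ is the probability that both say positive plus the probability that both say negative. The function name is illustrative.

```python
def chance_agreement(n, positive_a, positive_b):
    """P_e for two observers rating independently, from each observer's
    number of positive classifications out of n patients."""
    p_a, p_b = positive_a / n, positive_b / n
    return p_a * p_b + (1 - p_a) * (1 - p_b)

print(round(chance_agreement(150, 87, 84), 2))    # 0.51, the worked example
print(round(chance_agreement(150, 120, 130), 2))  # 0.72, a more extreme prevalence
```

With $P_e$ at 0.72, the potential agreement beyond chance ($1 - P_e$) shrinks from 0.49 to 0.28, so a given observed agreement translates into a lower kappa.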
K - More Complicated Scenarios
- Overall (summary) kappa
  - several observers or raters and/or subjects classified into several different categories
- Weighted kappa
  - measures the relative degree of disagreement when subjects are classified into several ordinal categories (e.g., normal, slightly abnormal and very abnormal)
- Maclure and Willett (1987)
  - Use kappa for dichotomous data or nominal polytomous data only.
  - For ordinal data, use either Spearman's rank correlation or R.
IV. Consequences of variability of clinical data
- A. Clinical impact
  - Errors in diagnosis, prognosis and even treatment.
  - Clinical disagreement between clinicians.
- B. Research impact
  - Between-person biological variability is a prerequisite for etiologic studies.
  - Random within-person variability (a form of unreliability) results in non-differential misclassification, with a resulting dilution or attenuation of effect.
B. Research impact
- Generally, imprecision has less impact in the research setting than in the individual clinical setting, because results can be averaged over a large number of observations (but the measure still needs to be valid).
- Variability and misclassification increase the sample size required (and therefore costs).
- Measurement errors can introduce bias if they do not occur at random (differential misclassification).
Regression Dilution Bias
- Example: MacMahon et al. (1990)
  - imprecision resulting from a single measurement of diastolic blood pressure resulted in a 60% attenuation of RRs (for the effect of elevated blood pressure on stroke and MI)
  - this is known as regression dilution bias
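A minimal simulation shows the mechanism. It assumes a continuous outcome and an ordinary least-squares slope rather than the relative risks of the MacMahon example: regressing the outcome on a single error-prone reading flattens the slope by roughly the reliability ratio R = var_T / (var_T + var_e), and dividing by R is the classic correction. All variances and the true slope are hypothetical, in standardized units.

```python
import random

random.seed(42)

def ols_slope(x, y):
    """Least-squares slope of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

N, TRUE_SLOPE = 20_000, 2.0
usual_bp = [random.gauss(0, 1) for _ in range(N)]                  # true usual level, variance 1
outcome = [TRUE_SLOPE * t + random.gauss(0, 1) for t in usual_bp]  # depends on the usual level
single_bp = [t + random.gauss(0, 1) for t in usual_bp]             # one reading, error variance 1

reliability = 1.0 / (1.0 + 1.0)  # R = 0.5 under these assumptions
print(round(ols_slope(usual_bp, outcome), 2))   # close to 2.0
print(round(ols_slope(single_bp, outcome), 2))  # close to 2.0 * R = 1.0 (diluted)
print(round(ols_slope(single_bp, outcome) / reliability, 2))  # corrected, close to 2.0
```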
C. Regression towards the mean
- A group of individuals selected on the basis of an abnormal test result can be divided into:
  - a) those with a true underlying abnormal value, and
  - b) those with a true underlying normal value (but random fluctuations resulted in an outlying abnormal value).
- On retesting, patients in group b are closer to their typical (normal) values, so the overall mean is less extreme (= regression to the mean).
- Occurs when repeated observations are made on a variable that is inherently variable.
C. RTTM
- Often interpreted as a sign of clinical improvement, regardless of the effectiveness of treatment (an important explanation for the placebo effect).
- If the first reading is d units higher than the population mean ($\mu$), then on average the next value will be closer to the mean by $d(1 - r)$ units,
  - where r is the correlation between the two measurements.
- RTTM increases if d is large and r is small.
- RTTM is a general tendency describing the average behaviour of a group, not necessarily of individuals!
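The $d(1 - r)$ rule can be checked by simulation. The sketch below assumes each reading equals a person's usual value plus independent normal error, with equal true-score and error variances so that r = 0.5; people are selected when their first reading exceeds a hypothetical cut-off of 120, and nothing is done to them before the repeat reading.

```python
import random
from statistics import mean

random.seed(7)

MU, SD_TRUE, SD_ERROR, CUTOFF = 100.0, 10.0, 10.0, 120.0  # hypothetical values
r = SD_TRUE**2 / (SD_TRUE**2 + SD_ERROR**2)  # correlation of two readings = 0.5

first, second = [], []
for _ in range(100_000):
    usual = random.gauss(MU, SD_TRUE)          # person's usual (true) value
    x1 = usual + random.gauss(0, SD_ERROR)     # screening reading
    x2 = usual + random.gauss(0, SD_ERROR)     # repeat reading, no intervention
    if x1 > CUTOFF:                            # selected for an "abnormal" first result
        first.append(x1)
        second.append(x2)

d = mean(first) - MU                  # how far above the mean the selected group started
predicted = mean(first) - d * (1 - r)
print(round(mean(first), 1), round(mean(second), 1), round(predicted, 1))
```

The selected group's mean falls on retesting even though no treatment was given, and by about the amount the rule predicts.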
V. Remedies for variability of clinical data
- A. Within-person biologic variation
  - Standardize measurements: use a standard protocol, e.g., time of day, body position, etc.
  - Average repeated tests, e.g., take several blood pressure readings.
  - Use a less variable test, e.g., for diabetes use glycosylated Hb rather than blood glucose.
  - Plot the data: what is the trend?
  - Develop reference values for each individual, especially if:
    - within-person variability << between-person variability
    - this results in a wide (population) reference range, which makes it difficult to identify individual deviations
    - e.g., body weight, PSA, EKG
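How much averaging helps can be sketched with the standard result that the SD (and hence the CV) of the mean of n independent readings is the single-reading value divided by √n. The example plugs in the within-person CV for cholesterol from the Winkel et al. table (5.3%) and assumes the repeated readings are independent, which may not hold for readings taken close together in time. The function name is illustrative.

```python
import math

def cv_of_average(cv_single, n_readings):
    """CV of the mean of n independent readings: CV_single / sqrt(n)."""
    return cv_single / math.sqrt(n_readings)

CHOLESTEROL_CV_WITHIN = 5.3  # % (within-person CV from the Winkel et al. table)
for n in (1, 2, 4):
    print(n, round(cv_of_average(CHOLESTEROL_CV_WITHIN, n), 2))
# 1 5.3
# 2 3.75
# 4 2.65
```

Note the diminishing returns: halving the CV requires four readings, not two.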
B. Measurement Error
- Measurement imprecision is corrected by adjusting the machine or re-training the tester (or by averaging several values?).
- Measurement error that causes bias requires quality assurance testing. Fix it by re-calibration (don't average!).
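The distinction between fixing imprecision and fixing bias can be illustrated with a hypothetical mis-calibrated analyser: averaging many of its readings removes the random scatter but converges on the biased value, whereas re-calibration (modelled here simply as removing the offset) brings the average back to the true value. All numbers are invented for illustration.

```python
import random
from statistics import mean

random.seed(3)

TRUE_VALUE = 5.0   # hypothetical true analyte concentration
BIAS = 0.4         # hypothetical calibration offset of the analyser
RANDOM_SD = 0.3    # hypothetical analytical imprecision

def reading(bias):
    """One analyser reading: true value + systematic offset + random error."""
    return TRUE_VALUE + bias + random.gauss(0, RANDOM_SD)

avg_biased = mean(reading(BIAS) for _ in range(10_000))       # averaging alone
avg_recalibrated = mean(reading(0.0) for _ in range(10_000))  # after re-calibration
print(round(avg_biased, 2), round(avg_recalibrated, 2))  # about 5.4 vs about 5.0
```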
Sackett: Six strategies for preventing or minimizing clinical disagreements
- 1. Match the diagnostic environment to the diagnostic task.
- 2. Corroborate key findings by:
  - repeating observations and questions
  - confirming information with other sources (e.g., family members)
  - confirming key findings using appropriate diagnostic tests
  - seeking confirmation from blinded colleagues
- 3. Report actual findings, then report inferences.
- 4. Use appropriate technical aids to avoid imprecision (e.g., a ruler).
- 5. Blinded assessments of diagnostic findings.
- 6. Apply the skills of the social sciences (establish understanding, follow a logical order, listen, observe, interrupt only where necessary).