Title: Client Assessment and Other New Uses of Reliability
1 Client Assessment and Other New Uses of
Reliability
Part of a mini-symposium presented at the Annual
Meeting of the American College of Sports
Medicine in Baltimore, May 31, 2001
Will G Hopkins, Physiology and Physical
Education, University of Otago, Dunedin, NZ
- Reliability: the Essentials
- Assessment of Individual Clients and Patients
- Estimation of Sample Size for Experiments
- Estimation of Individual Responses to a Treatment
2 Reliability: the Essentials
- Reliability is the reproducibility of a measurement
if or when you repeat the measurement.
- It's crucial for clinicians, because you need
good reproducibility to monitor small but
clinically important changes in an individual
patient or client.
- It's crucial for researchers, because you need
good reproducibility to quantify such changes in
controlled trials with samples of reasonable size.
3 Reliability: the Essentials
- How do we quantify reliability?
- It's easy to understand for one subject tested
many times.
- The 2.8 (the standard deviation of the subject's
repeated values in the slide's example) is the
standard error of measurement.
- I call it the typical error, because it's the
typical difference between the subject's true
value and the observed values.
- It's the random error or noise in our assessment
of clients and in our experimental studies.
- Strictly, this standard deviation of a subject's
values is the total error of measurement.
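As a minimal sketch of the one-subject case, the typical error can be computed as the standard deviation of that subject's repeated values. The function name and the sample values are hypothetical; they are not the data behind the 2.8 on the slide.

```python
from statistics import stdev

def typical_error_one_subject(values):
    """Standard deviation of one subject's repeated measurements."""
    return stdev(values)

# Hypothetical repeated tests on a single subject (illustrative values only).
repeated_tests = [70, 72, 74]
print(typical_error_one_subject(repeated_tests))  # 2.0
```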
4 Reliability: the Essentials
- We usually measure reliability with many subjects
tested a few times.
- The 3.4 (the standard deviation of the difference
scores between trials) divided by √2 is the
typical error.
- The 3.4 multiplied by ±1.96 gives the limits of
agreement.
- The 2.6 is the change in the mean.
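The calculations on this slide can be sketched as follows, assuming two trials per subject. The function name and dictionary keys are hypothetical; the last two lines use only the 3.4 quoted on the slide. The limits of agreement are returned as a half-width, to be read as ± that value.

```python
from math import sqrt
from statistics import mean, stdev

def retest_reliability(trial1, trial2):
    """Reliability statistics from two trials on the same subjects."""
    diffs = [b - a for a, b in zip(trial1, trial2)]
    sd_diff = stdev(diffs)  # SD of the difference scores
    return {
        "change_in_mean": mean(diffs),
        "sd_diff": sd_diff,
        "typical_error": sd_diff / sqrt(2),
        "limits_of_agreement": 1.96 * sd_diff,  # half-width, applied as +/-
    }

# From the slide's summary value alone (SD of difference scores = 3.4):
print(round(3.4 / sqrt(2), 1))  # 2.4  (typical error)
print(round(3.4 * 1.96, 1))     # 6.7  (limits of agreement, +/-)
```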
5 Reliability: the Essentials
- And we can define retest correlations: Pearson
(for two trials) and intraclass (two or more
trials).
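A rough sketch of both correlations is given below. The slide does not say which form of the intraclass correlation is intended; this example assumes the two-way "consistency" form, often labeled ICC(3,1), computed from the ANOVA mean squares. Function names and data are hypothetical.

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson retest correlation between two trials."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def icc_consistency(scores):
    """ICC(3,1) for rows = subjects, columns = trials (two or more)."""
    n, k = len(scores), len(scores[0])
    grand = mean([v for row in scores for v in row])
    subject_means = [mean(row) for row in scores]
    trial_means = [mean([row[j] for row in scores]) for j in range(k)]
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_subjects = k * sum((m - grand) ** 2 for m in subject_means)
    ss_trials = n * sum((m - grand) ** 2 for m in trial_means)
    ss_error = ss_total - ss_subjects - ss_trials
    ms_subjects = ss_subjects / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_subjects - ms_error) / (ms_subjects + (k - 1) * ms_error)

# Hypothetical data: three subjects, two trials each.
scores = [[10, 12], [20, 19], [30, 33]]
print(pearson_r([row[0] for row in scores], [row[1] for row in scores]))
print(icc_consistency(scores))
```

With this form, a constant shift between trials (a change in the mean) does not lower the correlation; only disagreement in the subjects' rank and spacing does.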
6 Assessment of Individual Clients and Patients
- When you test or retest an individual, take
account of the relative magnitudes of signal and
noise.
- The signal is what you are trying to measure.
- It's the smallest clinically or practically
important change (within the individual) or
difference (between two individuals or between
an individual and a criterion value).
- Rarely, it's a larger change or difference.
7 Assessment of Individual Clients and Patients
- The noise is the typical error of measurement.
- It needs to come from a reliability study in
which there are no real changes in the subjects.
- Or in which any real changes are the same for all
subjects.
- Otherwise the estimate of the noise will be too
large.
- Time between tests is therefore as short as
necessary.
- A practice trial may be important, to avoid real
changes.
- If published error is not relevant to your
situation, do your own reliability study.
8 Assessment of Individual Clients and Patients
- If noise << signal...
- Example: body mass; noise in scales 0.1 kg,
signal 1 kg.
- The scales are effectively noise-free.
- Accept the measurement without worry.
- If noise >> signal...
- Example: speed at ventilatory threshold; noise
3%, signal 1%.
- The noise swamps all but large changes or
differences.
- Find a better test.
9 Assessment of Individual Clients and Patients
- If noise ≈ signal...
- Examples: many lab and field tests.
- Accept the result of the test cautiously.
- Or improve assessment by...
- 1. averaging several tests
- 2. using confidence limits
- 3. using likelihoods
- 4. possibly using Bayesian adjustment
10 Assessment of Individual Clients and Patients
- 1. Average several tests to reduce the noise.
- Noise reduces by a factor of 1/√n, where n =
number of tests.
- 2. Use likely (confidence) limits for the
subject's true value.
- Practically useful confidence is less than the
95% of research.
- For a single test, single score ± typical error
are 68% confidence limits.
- For test and retest, change score ± typical error
are 52% confidence limits.
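The 68% and 52% figures can be checked with a short sketch, assuming normally distributed error. A single score has an error SD equal to the typical error; a change score has an error SD of √2 × typical error, so ± one typical error covers less of the distribution. Function names are hypothetical.

```python
from math import erf, sqrt

def noise_after_averaging(typical_error, n_tests):
    """Noise in the mean of n_tests repeated tests."""
    return typical_error / sqrt(n_tests)

def confidence_of_plus_minus_typical_error(is_change_score):
    """Confidence level of (score +/- typical error) as likely limits."""
    # Error SD in units of the typical error: 1 for a single score,
    # sqrt(2) for a change score (difference of two noisy scores).
    error_sd = sqrt(2) if is_change_score else 1.0
    z = 1.0 / error_sd
    return erf(z / sqrt(2))  # P(|Z| < z) for a standard normal Z

print(round(100 * confidence_of_plus_minus_typical_error(False)))  # 68
print(round(100 * confidence_of_plus_minus_typical_error(True)))   # 52
print(noise_after_averaging(2.0, 4))                               # 1.0
```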
11 Assessment of Individual Clients and Patients
- Example of likely limits for a change score:
noise (typical error) 1.0, smallest important
change 0.9.
[Figure: change scores with likely limits, labeled
"a positive change?" and "no real change?"]
12 Assessment of Individual Clients and Patients
- 3. Use likelihoods that the true value is
greater/less than an important reference value or
values.
- More precise than confidence limits, but needs a
spreadsheet for the calculations.
- For single scores, the reference value is usually
a pass-fail threshold.
- For change scores, the best reference values are
± the smallest important change.
13 Assessment of Individual Clients and Patients
- Same example of a change score, to illustrate
likelihoods: noise (typical error) 1.0,
smallest important change 0.9.
[Figure: change scores with likelihoods, labeled
"a positive change" and "maybe no real change"]
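The likelihood calculation for a change score might look like the sketch below. It assumes normally distributed error and treats the true change as normally distributed about the observed change with SD √2 × typical error. The observed change of 1.5 is hypothetical (the values plotted on the slide are not recoverable here), as are the function name and dictionary keys.

```python
from math import sqrt
from statistics import NormalDist

def change_likelihoods(observed_change, typical_error, smallest_important):
    """Chances that the true change is positive, trivial, or negative."""
    sd_change = sqrt(2) * typical_error  # noise in a change score
    true_change = NormalDist(mu=observed_change, sigma=sd_change)
    p_negative = true_change.cdf(-smallest_important)
    p_positive = 1 - true_change.cdf(smallest_important)
    return {
        "positive": p_positive,
        "trivial": 1 - p_positive - p_negative,
        "negative": p_negative,
    }

# Slide's noise and threshold, with a hypothetical observed change of 1.5.
for label, p in change_likelihoods(1.5, 1.0, 0.9).items():
    print(f"{label}: {100 * p:.0f}%")
```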
14 Assessment of Individual Clients and Patients
- 4. Go Bayesian?
- That is, take into account your prior belief
about the likely outcome of the test.
- When you scale down or reject outright an
unlikely high score, you are being a Bayesian...
- ...because you attribute the high score partly or
entirely to noise, not the client.
15 Assessment of Individual Clients and Patients
- To go Bayesian quantitatively:
- 1. specify your prior belief with likely limits
- 2. combine your belief with the observed score
and the noise to give
- 3. an adjusted score with adjusted likely limits
or likelihoods.
- But how do you specify your prior belief
believably?
- Example: if you believe a change couldn't be
outside ±3, where does the ±3 come from, and
what likely limits define "couldn't"? 80%, 90%,
95%, 99%...?
- So use Bayes qualitatively but not quantitatively.
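To see why the choice of likely limits matters, here is a sketch of the three steps under strong assumptions: a normal prior centered on zero change, normal noise, and the standard normal-normal update. None of these choices comes from the slides; the observed change of 2.5 and all names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def bayes_adjust(observed, noise_sd, prior_limit, prior_level, prior_mean=0.0):
    """Adjusted score and its SD from a normal prior and normal noise."""
    # Step 1: turn "within +/- prior_limit with probability prior_level"
    # into a prior SD.
    z = NormalDist().inv_cdf(0.5 + prior_level / 2)
    prior_sd = prior_limit / z
    # Step 2: weight the observed score by the prior's share of the variance.
    weight = prior_sd ** 2 / (prior_sd ** 2 + noise_sd ** 2)
    # Step 3: adjusted score and its SD.
    adjusted = prior_mean + weight * (observed - prior_mean)
    adjusted_sd = sqrt(weight) * noise_sd
    return adjusted, adjusted_sd

# Hypothetical observed change of 2.5; noise in a change score is
# sqrt(2) x typical error, with typical error 1.0.
for level in (0.80, 0.90, 0.95, 0.99):
    adjusted, sd = bayes_adjust(2.5, sqrt(2) * 1.0, 3.0, level)
    print(f"{level:.0%} limits: adjusted change {adjusted:.2f}, SD {sd:.2f}")
```

The adjusted change shrinks further toward zero as the stated confidence in the ±3 rises, so the answer depends on a number that is hard to justify, which is the slide's argument for using Bayes only qualitatively.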
16 Estimation of Sample Size for Experiments
- Based on having acceptable precision for the
effect.
- Precision is defined by 95% likely limits.
- Estimate of likely limits needs typical error
from a reliability study in which the time frame
is the same as in the experiment.
- If published error is not relevant, try to do
your own reliability study.
- Acceptable limits
17 Estimation of Sample Size for Experiments
- Acceptable limits can't be both substantially
positive and negative, in the worst case of
observed effect = 0.
- For a crossover, 95% likely limits = ±√2 ×
(typical error)/√(sample size) × t(0.975, DF)
= ±d, where d is the smallest effect and DF is
the degrees of freedom in the experiment.
- Therefore, taking t ≈ 2, sample size ≈
8(typical error)²/d², so...
18 Estimation of Sample Size for Experiments
- When typical error = smallest effect, sample size
≈ 8.
- For a study with a control group, sample size
≈ 32 (4× as many).
- Beware: typical error in an experiment is often
larger than in a reliability study, so you may
need more subjects.
- When typical error << smallest effect, sample
size could be 1, but use 8 in each group to
ensure sample is representative.
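The rule of thumb above can be written as a small sketch. It uses only the approximation sample size ≈ 8(typical error)²/d² for a crossover and four times that for a study with a control group; it does not iterate on the exact t value. Function names are hypothetical.

```python
from math import ceil

def crossover_sample_size(typical_error, smallest_effect):
    """Approximate sample size for a crossover: 8 * error^2 / d^2."""
    n = 8 * typical_error ** 2 / smallest_effect ** 2
    return ceil(round(n, 9))  # round first to absorb floating-point fuzz

def controlled_trial_sample_size(typical_error, smallest_effect):
    """Approximate total sample size with a control group (4x crossover)."""
    n = 32 * typical_error ** 2 / smallest_effect ** 2
    return ceil(round(n, 9))

print(crossover_sample_size(1.0, 1.0))         # 8
print(controlled_trial_sample_size(1.0, 1.0))  # 32
print(crossover_sample_size(2.0, 1.0))         # 32
```

Because the sample size scales with the square of the ratio of typical error to smallest effect, doubling the noise quadruples the number of subjects needed.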
19 Estimation of Sample Size for Experiments
- When typical error >> smallest effect:
- Test 100s of subjects to estimate small effects.
- Or test fewer subjects many times pre and post
the treatment.
- Or use a smaller sample and find a test with a
smaller typical error.
- Or use a smaller sample and hope for a large
effect.
- Because larger effects need less precision.
- If you get a small effect, tell the editor your
study will contribute to a meta-analysis.
20 Individual Responses to a Treatment
- An important but neglected aspect of research.
- How to see them? Three ways.
- 1. Display each subject's values
21 Individual Responses to a Treatment
- 2. Look for an increase in the standard deviation
of the treatment group in the post test.
- But you might miss it.
22 Individual Responses to a Treatment
- 3. Look for a bigger standard deviation of the
post-pre change scores in the treatment group.
- Now it's much easier to see any individual
responses.
- To present the magnitude of individual
responses...
23 Individual Responses to a Treatment
- Express individual responses as a standard
deviation.
- Example: effect of drug = 14 ± 7 units (mean ±
SD).
- This SD for individual responses is free of
measurement error.
- It is NOT the SD of the change score for the
drug group.
- There is a simple formula for this SD (see next
slide), but getting its likely limits is more
challenging.
- If you find individual responses, try to account
for them in your analysis using subject
characteristics as covariates.
24 Individual Responses to a Treatment
- How to derive this standard deviation:
- From the standard deviations of the change scores
of the treatment and control groups:
√(SD_treat² − SD_cont²).
- Or from analysis of the treatment and control
groups as reliability studies:
√2 × √(error_treat² − error_cont²).
- Or by using mixed modeling, especially to get its
confidence limits.
- Identify subject characteristics responsible for
the individual responses by using
repeated-measures analysis of covariance.
- This approach also increases precision of the
estimate of the mean effect.
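The two formulas on this slide can be sketched as below. The input values are hypothetical, as are the function names. Raising an error when the control group's variance exceeds the treatment group's is a choice made for this example, not something the slides specify.

```python
from math import sqrt

def sd_individual_responses(sd_change_treat, sd_change_cont):
    """SD of individual responses from the SDs of the change scores."""
    diff = sd_change_treat ** 2 - sd_change_cont ** 2
    if diff < 0:
        raise ValueError("control variance exceeds treatment variance")
    return sqrt(diff)

def sd_individual_responses_from_errors(error_treat, error_cont):
    """Same quantity from the typical errors of the two groups."""
    diff = error_treat ** 2 - error_cont ** 2
    if diff < 0:
        raise ValueError("control error exceeds treatment error")
    return sqrt(2) * sqrt(diff)

# Hypothetical change-score SDs: 5 units (treatment), 3 units (control).
print(sd_individual_responses(5.0, 3.0))  # 4.0
# The equivalent typical errors are the change-score SDs divided by sqrt(2).
print(round(sd_individual_responses_from_errors(5.0 / sqrt(2), 3.0 / sqrt(2)), 6))  # 4.0
```

The two routes agree because the SD of a change score is √2 × the typical error.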
25 This presentation, spreadsheets, and more
information at A New View of Statistics,
newstats.org