Title: Measurement Theory
1Measurement Theory
[Figure: diagram of true score t, observed measures X1, X2, X3, ..., Xk, and errors e1, e2, e3, ..., ek]
2- The Big Questions . . .
- Why does it matter whether the parallel tests model or the domain sampling model is correct?
- How are k and r_ij related to reliability?
- What assumptions (and dangers) are involved in forecasting the k needed for a desired reliability?
- What can't reliability tell us?
3The earliest formal measurement theory, the theory
of parallel tests, was based on the following
simple model: $X = t + e$
4- The model makes three assumptions
- True score does not change over measurement occasions
- Errors are random
- Fallible observations are the additive combination of true score and error
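6- If true, these assumptions have the following consequences
- $S_a = S_b = \dots = S_n$
- $\sigma_{e_1} = \sigma_{e_2} = \dots = \sigma_{e_k}$
- $S_a = S_b = \dots = S_n = \sigma_{e_1} = \sigma_{e_2} = \dots = \sigma_{e_k}$
- $M_{e_a} = M_{e_n} = M_{e_1} = M_{e_k}$
- $\sigma_X^2 = \sigma_t^2 + \sigma_e^2$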
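7- If true, these assumptions have the following practical consequences
- $M_{X_1} = M_{X_2} = \dots = M_{X_k}$
- $\sigma_{X_1}^2 = \sigma_{X_2}^2 = \dots = \sigma_{X_k}^2$
- $r_{X_i X_j} = r_{X_i X_l} = \dots = r_{X_i X_k}$
- $r_{X_1 y} = r_{X_2 y} = \dots = r_{X_k y}$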
8Reliability is the extent to which measurements
are consistent, repeatable, or generalizable
across measurement occasions. This will be true
to the extent that the variability in a sample of
people is due to underlying true score
variability and not error variability. This
suggests that reliability be defined as the ratio
of true score variance to observed score
variance, $\sigma_t^2 / \sigma_X^2$.
If the model is correct and its assumptions hold,
then the correlation between two strictly parallel
measures should equal this ratio.
9Only this part can correlate with another
variable. The larger this part is, the larger a
correlation can be. A correlation between two
parallel measures then should inform us about
reliability.
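A small simulation can make this argument concrete. The sketch below is only an illustration under assumed values: the true-score standard deviation of 2, the error standard deviation of 1, and the function name `simulate_parallel` are hypothetical and do not come from the slides. It generates two strictly parallel measures and compares their correlation with the ratio of true score variance to observed score variance.

```python
import numpy as np

def simulate_parallel(n=200_000, sd_true=2.0, sd_error=1.0, seed=0):
    """Simulate two strictly parallel measures X1 = t + e1 and X2 = t + e2."""
    rng = np.random.default_rng(seed)
    t = rng.normal(0.0, sd_true, n)         # true scores, fixed across occasions
    x1 = t + rng.normal(0.0, sd_error, n)   # occasion 1: independent random error
    x2 = t + rng.normal(0.0, sd_error, n)   # occasion 2: independent random error
    r12 = np.corrcoef(x1, x2)[0, 1]         # correlation between the parallel measures
    ratio = sd_true**2 / (sd_true**2 + sd_error**2)  # true variance / observed variance
    return r12, ratio

r12, ratio = simulate_parallel()
print(f"correlation = {r12:.3f}, variance ratio = {ratio:.3f}")
```

With a large simulated sample the two numbers should agree closely. Shrinking `n` shows how much a sample correlation can wander around the population value.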
12What is the correlation between a measurement and
the true score it is supposed to tap? This is a
hypothetical value, but can it be linked to
reliability?
13This is known as the index of reliability. It is
a reminder that reliability is a proportion of
variance: the variance that a measure and its true
score have in common.
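The link between the two quantities can be sketched in a few lines. The notation is an assumption, since the slide's own formula was not transcribed: $r_{Xt}$ is the correlation between a measure and its true score, $r_{XX}$ is reliability, and the model is $X = t + e$ with errors uncorrelated with true scores.

```latex
\begin{align*}
r_{Xt} &= \frac{\operatorname{cov}(X, t)}{\sigma_X \sigma_t}
        = \frac{\operatorname{cov}(t + e,\, t)}{\sigma_X \sigma_t}
        = \frac{\sigma_t^2}{\sigma_X \sigma_t}
        = \frac{\sigma_t}{\sigma_X}, \\
r_{Xt}^2 &= \frac{\sigma_t^2}{\sigma_X^2} = r_{XX}
\quad\Longrightarrow\quad r_{Xt} = \sqrt{r_{XX}} .
\end{align*}
```

Under these assumptions the correlation with the true score is the square root of reliability, so it is never smaller than the reliability itself.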
14Reliability can be represented in several ways.
One especially important form is the standard
error of measurement. It is the standard
deviation of the errors and can be used to
establish confidence intervals around true scores.
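As a rough illustration, the standard error of measurement is commonly computed as the observed standard deviation times the square root of one minus the reliability. The sketch below assumes that form. The function names and the example values (a scale with SD 15 and reliability .91) are hypothetical, and the interval is centered on the observed score, which is a common simplification.

```python
import math

def standard_error_of_measurement(sd_x, reliability):
    """SEM = SD of observed scores * sqrt(1 - reliability)."""
    return sd_x * math.sqrt(1.0 - reliability)

def confidence_interval(score, sd_x, reliability, z=1.96):
    """Approximate interval: score +/- z * SEM (z = 1.96 for 95%)."""
    sem = standard_error_of_measurement(sd_x, reliability)
    return score - z * sem, score + z * sem

sem = standard_error_of_measurement(sd_x=15.0, reliability=0.91)
low, high = confidence_interval(score=100.0, sd_x=15.0, reliability=0.91)
print(f"SEM = {sem:.2f}, 95% interval = ({low:.2f}, {high:.2f})")
```

Because the SEM depends on the square root of one minus the reliability, even fairly reliable measures can leave a wide band of uncertainty around an individual score.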
15The standard error of measurement is similar to
the standard error of estimate in regression (the
standard deviation of the errors of prediction).
16The theory of parallel tests also produces the
well-known formula for attenuation of a
correlation due to measurement error.
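The formula itself was not transcribed. The sketch below assumes the standard classical form, in which the observed correlation equals the true-score correlation multiplied by the square root of the product of the two reliabilities. Function names and example values are hypothetical.

```python
import math

def attenuated_correlation(r_true, rel_x, rel_y):
    """Observed correlation expected when the true-score correlation is r_true."""
    return r_true * math.sqrt(rel_x * rel_y)

def disattenuated_correlation(r_observed, rel_x, rel_y):
    """Estimated true-score correlation, correcting for unreliability in both measures."""
    return r_observed / math.sqrt(rel_x * rel_y)

r_obs = attenuated_correlation(r_true=0.50, rel_x=0.80, rel_y=0.60)
r_back = disattenuated_correlation(r_obs, rel_x=0.80, rel_y=0.60)
print(f"observed r = {r_obs:.3f}, corrected r = {r_back:.3f}")
```

The correction simply inverts the attenuation, so it is only as trustworthy as the reliability estimates fed into it.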
19The theory of parallel tests has a major flaw. It
assumes that each measure provides a faithful
duplication of the true score so that multiple
measures are strictly parallel. That is rather
unrealistic. A more realistic model, known as the
domain sampling model, argues that measures
represent random samples from a domain of
possible measures, each providing an imperfect
assessment of the true score, but all being
equivalent in the long run. This model produces
the same basic outcomes as the theory of parallel
tests, but also provides insights into how
measurements can be made more reliable.
20The domain sampling model assumes that any single
measure may not capture all of a construct, that
constructs may be multi-faceted and so require
more than one measurement occasion to tap their
essential features, and that linear combinations
of measurements may therefore be required for
good measurement. One implication is that
heterogeneous domains require larger collections
of measures to provide good measurement.
21In the domain sampling model we assume that an
infinite (or at least very large) domain of items
(components) exists, that any given component is
as good as any other (in the long run), and that
the collection of items to be used is a random
sample from the domain.
Together, these imply that items or components
are exchangeable, at least from a sampling
standpoint. They are not strictly parallel, but
are considered parallel in the random sampling
sense.
22One key advantage of the domain sampling model
over the theory of parallel tests is that, to
establish reliability, we do not need two
actual parallel measures to correlate. Instead,
we can use a hypothetical parallel form. Provided
certain assumptions hold, we can arrive at a way
to estimate reliability with a single measure.
Assume that we have two linear combinations of items:
$X_R = X_{R1} + X_{R2} + \dots + X_{Rk}$ (Real)
$X_H = X_{H1} + X_{H2} + \dots + X_{Hk}$ (Hypothetical)
Assume further that there is no
difference in principle between these two
samples; that is, the actual sample is as good a
draw from the domain as any other sample we could
have drawn (e.g., the hypothetical one).
23The correlation between these two linear
combinations will be an estimate of reliability
for either linear combination.
This correlation can be worked out using the
principles of linear combinations.
24Assume a variance-covariance matrix for the
elements (some real and some hypothetical) of
each linear combination
25[Figure: variance-covariance matrix, panel label: Real]
26[Figure: variance-covariance matrix, panel label: Hypothetical]
27[Figure: variance-covariance matrix, panel label: Real and Hypothetical]
28The variance of a linear combination is just the
sum of the elements in the variance-covariance
matrix for those elements. The square root of
that sum provides the needed term in the
denominator.
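The claim that the variance of a unit-weighted linear combination equals the sum of all elements in the variance-covariance matrix can be checked numerically. The sketch below uses made-up data (four hypothetical items sharing a common component), so the numbers mean nothing beyond the check itself.

```python
import numpy as np

rng = np.random.default_rng(1)
k, n = 4, 500
# Hypothetical data: k items that share a common component plus unique noise.
common = rng.normal(size=(n, 1))
items = common + rng.normal(size=(n, k))

cov = np.cov(items, rowvar=False)   # k x k variance-covariance matrix
total = items.sum(axis=1)           # the linear combination (unit weights)

var_from_matrix = cov.sum()         # sum of every element in the matrix
var_direct = total.var(ddof=1)      # sample variance of the summed score
print(var_from_matrix, var_direct)
```

The two values agree up to floating-point error. The square root of either one is the standard deviation that belongs in the denominator of the correlation.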
29The variance of this hypothetical linear
combination is defined in the same way--the sum
of the elements in the variance-covariance matrix
for the hypothetical elements.
30But, because we assume that the real sample is
just as representative as the hypothetical, we
can substitute the variance for the real linear
combination to estimate the variance for the
hypothetical.
31The numerator is the covariance between the two
linear combinations and is estimated by the sum
of all covariances between the real and
hypothetical elements.
32But, because the real and hypothetical sets are
assumed to be comparable, the covariances among
the real elements can be used instead to estimate
the hypothetical values.
33Because of the assumptions underlying the model,
we can express the reliability entirely in terms
of the elements of the real linear combination.
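The steps described on slides 28 through 33 can be written out as follows. This is a reconstruction under assumed notation, not the transcribed slide content: $\bar{\sigma}^2$ is the average item variance and $\bar{\sigma}_{ij}$ is the average inter-item covariance among the $k$ real items.

```latex
\begin{align*}
r_{X_R X_H}
  &= \frac{\sigma_{X_R X_H}}{\sigma_{X_R}\,\sigma_{X_H}}
   \approx \frac{\sigma_{X_R X_H}}{\sigma_{X_R}^2}
   && \text{(real variance substituted for hypothetical)} \\
\sigma_{X_R}^2
  &= \sum_{i} \sigma_i^2 + \sum_{i \neq j} \sigma_{ij}
   = k\,\bar{\sigma}^2 + k(k-1)\,\bar{\sigma}_{ij} \\
\sigma_{X_R X_H}
  &\approx k^2\,\bar{\sigma}_{ij}
   && \text{($k^2$ cross-covariances, each estimated by $\bar{\sigma}_{ij}$)} \\
r_{X_R X_H}
  &\approx \frac{k^2\,\bar{\sigma}_{ij}}{k\,\bar{\sigma}^2 + k(k-1)\,\bar{\sigma}_{ij}}
   = \frac{k\,\bar{\sigma}_{ij}}{\bar{\sigma}^2 + (k-1)\,\bar{\sigma}_{ij}}
\end{align*}
```

Every term on the right-hand side comes from the real items alone, which is what allows reliability to be estimated from a single measure.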
35In standard score form, the reliability of a
linear combination depends on the number of items
combined and the average inter-item correlation.
This produces the Spearman-Brown prophecy formula
that can forecast the reliability of a measure
containing any number of components. When k is the
actual number of items on the measure, this
formula gives the standardized Cronbach's alpha,
an estimate of internal consistency reliability.
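The formula was not transcribed. The sketch below assumes the standard form, in which reliability equals k times the average inter-item correlation, divided by one plus (k - 1) times that average. The function name and the example average correlation of .20 are hypothetical.

```python
def spearman_brown(k, mean_r):
    """Reliability of a k-component composite with average inter-item correlation mean_r."""
    return k * mean_r / (1.0 + (k - 1) * mean_r)

# Forecast reliability for composites of different lengths.
for k in (1, 5, 10, 20):
    print(k, round(spearman_brown(k, mean_r=0.20), 3))
```

With k = 1 the function returns the average correlation itself. As k grows, the forecast rises toward 1 with diminishing returns.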
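36Another common form of Cronbach's alpha, for
unstandardized components:
$\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_i \sigma_i^2}{\sigma_X^2}\right)$,
where $\sigma_i^2$ is the variance of component i and
$\sigma_X^2$ is the variance of the summed score.
Cronbach's alpha is equal to the average of all
possible split-half reliabilities.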
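A minimal sketch of the unstandardized computation follows, assuming the usual variance-based form of alpha. It also checks that an equivalent expression in terms of the average item variance and average inter-item covariance gives the same value. The data are simulated and the function names are hypothetical.

```python
import numpy as np

def cronbach_alpha(items):
    """Coefficient alpha for an (n_people, k_items) array of raw scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)        # variance of each component
    total_var = items.sum(axis=1).var(ddof=1)    # variance of the summed score
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)

def alpha_from_averages(items):
    """Same quantity written with the average variance and average covariance."""
    cov = np.cov(np.asarray(items, dtype=float), rowvar=False)
    k = cov.shape[0]
    mean_var = np.trace(cov) / k
    mean_cov = (cov.sum() - np.trace(cov)) / (k * (k - 1))
    return k * mean_cov / (mean_var + (k - 1) * mean_cov)

rng = np.random.default_rng(2)
scores = rng.normal(size=(300, 1)) + rng.normal(size=(300, 5))  # hypothetical data
a1 = cronbach_alpha(scores)
a2 = alpha_from_averages(scores)
print(a1, a2)
```

The two functions are algebraically identical, so any difference in the printed values is floating-point noise.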
37The Spearman-Brown formula can be used to explore
the consequences of changing the number of
elements in the linear combination (k) and the
average correlation among those elements . . .
38The effects of the number of elements in the
linear combination (k) and the average
correlation among those elements on the 95%
confidence interval around a score can also be
explored . . .
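One way to explore this numerically is to combine the Spearman-Brown forecast with the standard error of measurement. The sketch below assumes standard scores (SD of 1) and an interval of plus or minus 1.96 standard errors. The function names and the grid of k and average-correlation values are hypothetical.

```python
import math

def spearman_brown(k, mean_r):
    """Forecast reliability of a k-item composite."""
    return k * mean_r / (1.0 + (k - 1) * mean_r)

def ci_half_width(k, mean_r, sd_x=1.0, z=1.96):
    """Half-width of the interval score +/- z * SEM for a k-item composite."""
    reliability = spearman_brown(k, mean_r)
    return z * sd_x * math.sqrt(1.0 - reliability)

for mean_r in (0.1, 0.3, 0.5):
    widths = [round(ci_half_width(k, mean_r), 2) for k in (2, 5, 10, 20)]
    print(f"mean r = {mean_r}: {widths}")
```

The interval narrows as either k or the average correlation increases, but it narrows slowly once reliability is already high.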
39How many measurements are needed?
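Assuming the standard Spearman-Brown form, the question can be answered by solving the formula for k. The sketch below does that and rounds up to a whole number of items. The function name and example values are hypothetical, and the result is only as good as the assumption that added items behave like the existing ones.

```python
import math

def items_needed(target_reliability, mean_r):
    """Solve the Spearman-Brown formula for k, rounding up to whole items."""
    k = target_reliability * (1.0 - mean_r) / (mean_r * (1.0 - target_reliability))
    return math.ceil(k - 1e-9)  # small tolerance guards against float round-up

for target in (0.70, 0.80, 0.90):
    print(target, items_needed(target, mean_r=0.20))
```

The required number of items climbs steeply as the target reliability approaches 1, especially when the average inter-item correlation is low.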
40The effect of changing the reliability on the
correlation between two variables
41The effect of changing the variability of X on
the reliability of X
The variability in X may be artificially high or
low and so produce an erroneous estimate of
reliability. This formula is also a reminder that
reliability is context-dependent.
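The formula on this slide was not transcribed. A common classical result, assumed here, holds error variance constant and lets the observed variance change, so that the new reliability is one minus the old error variance divided by the new observed variance. The function name and example values are hypothetical.

```python
def reliability_in_new_group(rel_old, sd_old, sd_new):
    """Projected reliability when the observed SD changes but error variance does not."""
    error_var = sd_old**2 * (1.0 - rel_old)   # error variance, assumed constant
    return 1.0 - error_var / sd_new**2

# Same measure, same error variance, three different amounts of score spread.
for sd_new in (5.0, 10.0, 20.0):
    print(sd_new, round(reliability_in_new_group(0.80, sd_old=10.0, sd_new=sd_new), 3))
```

Restricting the range of scores lowers the projected reliability and widening it raises it, even though nothing about the measure has changed.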
42- Some important points . . .
- The move from the theory of parallel tests to the
domain sampling model means that reliability is
estimated rather than calculated. The estimation
becomes more precise with increases in the number
of items included in the linear combination.
43- Some important points . . .
- The level of reliability does not increase with increases in sample size (N).
- Reliability depends on the conditions of measurement. There are as many reliabilities for a measure as there are different conditions of measurement.
- Application of the Spearman-Brown prophecy formula assumes that new items are the same as old items.
44- Some important points . . .
- Coefficient alpha is an estimate of internal consistency reliability. It may not produce the same results as test-retest reliability. It also is a weak indicator of unidimensionality.
How many dimensions? 2
What is the reliability of this 6-item measure? .80
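The point about unidimensionality can be illustrated with a made-up example. The correlation matrix below is hypothetical and is not the one from the slide: it has two clusters of three items that are perfectly correlated within a cluster and uncorrelated across clusters. The sketch computes standardized alpha from the average correlation and counts eigenvalues above 1 as a crude indicator of dimensionality.

```python
import numpy as np

# Hypothetical 6-item correlation matrix: two clusters of three items.
# Items within a cluster correlate 1.0; items in different clusters correlate 0.0.
block = np.ones((3, 3))
zeros = np.zeros((3, 3))
R = np.block([[block, zeros], [zeros, block]])

k = R.shape[0]
mean_r = (R.sum() - np.trace(R)) / (k * (k - 1))    # average off-diagonal correlation
alpha_std = k * mean_r / (1.0 + (k - 1) * mean_r)   # standardized alpha
n_dims = int((np.linalg.eigvalsh(R) > 1.0).sum())   # eigenvalues above 1

print(f"mean r = {mean_r:.2f}, alpha = {alpha_std:.2f}, dimensions = {n_dims}")
```

In this constructed case the average correlation is .40, so standardized alpha is .80 even though the items form two unrelated dimensions. A high alpha therefore does not establish that a measure is unidimensional.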