t - PowerPoint PPT Presentation

About This Presentation

Title:

t

Slides: 39

Provided by: michael1175

Transcript and Presenter's Notes

Title: t

1
t
Measurement Theory
[Figure: observed measures X1, X2, X3, . . ., Xk, each with its own error term e1, e2, e3, . . ., ek]
2

The Big Questions . . .
Why does it matter whether the parallel tests
model or the domain sampling model is correct?
How are k and $r_{ij}$ related to reliability?
What assumptions (and dangers) are involved in
forecasting the k needed for a desired
reliability?
What can't reliability tell us?

3
The earliest formal measurement theory, the theory
of parallel tests, was based on the following
simple model:
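The model itself did not survive in this transcript. Written in notation assumed here (τ for a person's true score, $e_i$ for the error on measurement occasion i, $X_i$ for the fallible observation), the additive model that the assumptions on the next slide describe is presumably:

```latex
X_i = \tau + e_i, \qquad i = 1, 2, \dots, k
```

Here τ is the same on every occasion and each $e_i$ is random error.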
4

The model makes three assumptions:
True score does not change over measurement
occasions
Errors are random
Fallible observations are the additive
combination of true score and error

If true, these assumptions have the following practical consequences:
$M_{X_1} = M_{X_2} = \cdots = M_{X_k}$
$s^2_{X_1} = s^2_{X_2} = \cdots = s^2_{X_k}$
$r_{X_iX_j} = r_{X_iX_l} = \cdots = r_{X_iX_k}$
$r_{X_1y} = r_{X_2y} = \cdots = r_{X_ky}$
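A small simulation can make these consequences concrete. This is an illustration under assumed values, not part of the presentation: a normal true score (mean 50, SD 10), independent normal errors (SD 5), 20,000 simulated people, and a hypothetical outside variable y that depends on τ only. The helper names `simulate_parallel` and `pearson` are introduced here. Up to sampling error, the measures should share the same mean, variance, intercorrelation, and correlation with y.

```python
import random
import statistics


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5


def simulate_parallel(n_people=20_000, k=4, true_sd=10.0, error_sd=5.0, seed=1):
    """Simulate k strictly parallel measures X_i = tau + e_i.

    Each person's tau is fixed across occasions, and every error is independent
    normal noise with the same SD, so the three assumptions hold by construction.
    """
    rng = random.Random(seed)
    taus = [rng.gauss(50.0, true_sd) for _ in range(n_people)]
    measures = [[t + rng.gauss(0.0, error_sd) for t in taus] for _ in range(k)]
    return taus, measures


if __name__ == "__main__":
    taus, xs = simulate_parallel()
    rng = random.Random(2)
    # Hypothetical outside variable that is related to the true score only.
    y = [0.5 * t + rng.gauss(0.0, 8.0) for t in taus]
    print("means:    ", [round(statistics.fmean(x), 2) for x in xs])
    print("variances:", [round(statistics.pvariance(x), 1) for x in xs])
    print("r(Xi, Xj):", [round(pearson(xs[i], xs[j]), 3)
                         for i in range(len(xs)) for j in range(i + 1, len(xs))])
    print("r(Xi, y): ", [round(pearson(x, y), 3) for x in xs])
```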

6
Reliability is the extent to which measurements are consistent, repeatable, or generalizable across measurement occasions. This will be true to the extent that the variability in a sample of people is due to underlying true-score variability rather than error variability. This suggests that reliability be defined as the proportion of observed-score variance that is true-score variance, $\rho_{XX} = \sigma^2_\tau / \sigma^2_X$. If the model is correct and its assumptions hold, the correlation between two strictly parallel measures equals this ratio.
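As a sketch of that last claim, the simulation below (assumed values: true-score SD 10, error SD 5, so the defined reliability is 100 / 125 = .80) compares the variance ratio with the correlation between two simulated parallel forms. The function name `reliability_two_ways` is hypothetical.

```python
import random
import statistics


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5


def reliability_two_ways(n_people=20_000, true_sd=10.0, error_sd=5.0, seed=3):
    """Return (var(tau) / var(X) from the generating SDs, r between two parallel forms)."""
    rng = random.Random(seed)
    taus = [rng.gauss(0.0, true_sd) for _ in range(n_people)]
    x1 = [t + rng.gauss(0.0, error_sd) for t in taus]  # form 1: same tau, fresh error
    x2 = [t + rng.gauss(0.0, error_sd) for t in taus]  # form 2: same tau, fresh error
    defined = true_sd ** 2 / (true_sd ** 2 + error_sd ** 2)
    return defined, pearson(x1, x2)


if __name__ == "__main__":
    defined, observed = reliability_two_ways()
    print(f"var(tau)/var(X) = {defined:.3f}, r(X1, X2) = {observed:.3f}")
```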
7
(No Transcript)
8
(No Transcript)
9
What is the correlation between a measurement and
the true score it is supposed to tap? This is a
hypothetical value, but can it be linked to
reliability?
10
This is known as the index of reliability, $\rho_{X\tau} = \sqrt{\rho_{XX}}$. It is a reminder that reliability is a proportion of variance: the variance that a measure and its true score have in common.
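A quick simulation check, under assumed values (true-score SD 10, error SD 5, so reliability is .80), that the correlation between a measure and its own true score lands near √.80 ≈ .894. The function name `index_of_reliability_check` is hypothetical.

```python
import math
import random
import statistics


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5


def index_of_reliability_check(n_people=20_000, true_sd=10.0, error_sd=5.0, seed=4):
    """Return (sqrt of the defined reliability, simulated r between X and tau)."""
    rng = random.Random(seed)
    taus = [rng.gauss(0.0, true_sd) for _ in range(n_people)]
    xs = [t + rng.gauss(0.0, error_sd) for t in taus]
    reliability = true_sd ** 2 / (true_sd ** 2 + error_sd ** 2)
    return math.sqrt(reliability), pearson(xs, taus)


if __name__ == "__main__":
    expected, observed = index_of_reliability_check()
    print(f"sqrt(reliability) = {expected:.3f}, r(X, tau) = {observed:.3f}")
```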
11
Reliability can be represented in several ways. One especially important form is the standard error of measurement, $s_E = s_X\sqrt{1 - r_{XX}}$. It is the standard deviation of the errors and can be used to establish confidence intervals for true scores around observed scores.
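A minimal sketch of the standard error of measurement and the simple ±z·SEM band centred on an observed score (some texts centre the band on a regression-based estimate of the true score instead). The function names and the example values (SD 15, reliability .91, so SEM = 15 × √.09 = 4.5) are hypothetical.

```python
import math


def standard_error_of_measurement(sd_x: float, reliability: float) -> float:
    """SEM = s_X * sqrt(1 - r_XX): the SD of the errors of measurement."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    return sd_x * math.sqrt(1.0 - reliability)


def true_score_interval(observed: float, sd_x: float, reliability: float, z: float = 1.96):
    """Interval of +/- z SEMs around an observed score (simple, unregressed form)."""
    sem = standard_error_of_measurement(sd_x, reliability)
    return observed - z * sem, observed + z * sem


if __name__ == "__main__":
    print(standard_error_of_measurement(15.0, 0.91))
    print(true_score_interval(100.0, 15.0, 0.91))
```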
12
The standard error of measurement is similar to
the standard error of estimate in regression (the
standard deviation of the errors of prediction).
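Side by side, the two formulas make the parallel explicit. Using the index of reliability, $r_{X\tau} = \sqrt{r_{XX}}$, the standard error of measurement is the standard error of estimate for predicting X from the true score τ:

```latex
s_{\text{est}} = s_Y \sqrt{1 - r_{XY}^2}
\qquad\text{vs.}\qquad
s_E = s_X \sqrt{1 - r_{XX}} = s_X \sqrt{1 - r_{X\tau}^2}
```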
13
The theory of parallel tests also produces the
well-known formula for attenuation of a
correlation due to measurement error.
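A short sketch of the attenuation formula, $r_{XY} = \rho_{\tau_X\tau_Y}\sqrt{r_{XX}\,r_{YY}}$, and its inverse, the correction for attenuation. The function names and example values (true-score r = .50, reliabilities .80 and .70) are hypothetical.

```python
import math


def attenuated_r(true_r: float, rel_x: float, rel_y: float) -> float:
    """Observed correlation implied by a true-score correlation and two reliabilities."""
    return true_r * math.sqrt(rel_x * rel_y)


def disattenuated_r(observed_r: float, rel_x: float, rel_y: float) -> float:
    """Correction for attenuation: estimated correlation between the true scores."""
    return observed_r / math.sqrt(rel_x * rel_y)


if __name__ == "__main__":
    r_obs = attenuated_r(0.50, 0.80, 0.70)
    print(round(r_obs, 3))                               # smaller than .50
    print(round(disattenuated_r(r_obs, 0.80, 0.70), 3))  # recovers the true-score r
```

With estimated rather than known reliabilities, the corrected value can exceed 1 because of sampling error.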
14
(No Transcript)
15
(No Transcript)
16
The theory of parallel tests has a major flaw. It
assumes that each measure provides a faithful
duplication of the true score so that multiple
measures are strictly parallel. That is rather
unrealistic. A more realistic model, known as the
domain sampling model, argues that measures
represent random samples from a domain of
possible measures, each providing an imperfect
assessment of the true score, but all being
equivalent in the long run. This model produces
the same basic outcomes as the theory of parallel
tests, but also provides insights into how
measurements can be made more reliable.
17
The domain sampling model assumes that any single
measure may not capture all of a
construct, that constructs may be multi-faceted
and so require more than one measurement occasion
to tap the essential features, and so linear
combinations of measurements may be required for
good measurement. One implication is that
heterogeneous domains require larger collections
of measures to provide good measurement.
18
In the domain sampling model we assume that an
infinite (or at least very large) domain of items
(components) exists, that any given component is
as good as any other (in the long run), and that
the collection of items to be used is a random
sample from the domain.
Together, these imply that items or components
are exchangeable, at least from a sampling
standpoint. They are not strictly parallel, but
are considered parallel in the random sampling
sense.
19
One key advantage of the domain sampling model over the theory of parallel tests is that, to establish reliability, we do not need two actual parallel measures to correlate. Instead, we can use a hypothetical parallel form. Provided certain assumptions hold, we can estimate reliability from a single measure. Assume that we have two linear combinations of items:
$X_R = X_{R1} + X_{R2} + \cdots + X_{Rk}$ (real)
$X_H = X_{H1} + X_{H2} + \cdots + X_{Hk}$ (hypothetical)
Assume further that there is no difference in principle between these two samples; that is, the actual sample is as good a draw from the domain as any other sample we could have drawn (e.g., the hypothetical one).
20
The correlation between these two linear combinations will be an estimate of reliability for either linear combination:
$r_{X_R X_H} = \dfrac{\operatorname{cov}(X_R, X_H)}{\sqrt{s^2_{X_R}\, s^2_{X_H}}}$
This correlation can be computed using the principles of linear combinations.
21
Assume a variance-covariance matrix for the
elements (some real and some hypothetical) of
each linear combination
22
The variance of a linear combination is just the
sum of the elements in the variance-covariance
matrix for those elements. The square root of
that sum provides the needed term in the
denominator.
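A small numerical check that the variance of a unit-weighted sum equals the sum of all elements of its variance-covariance matrix. The data are hypothetical (three items sharing a common component), and both functions use the divide-by-n convention so the identity holds exactly up to rounding.

```python
import math
import random


def covariance_matrix(columns):
    """Population (divide-by-n) variance-covariance matrix of item columns."""
    n = len(columns[0])
    means = [sum(col) / n for col in columns]
    return [
        [sum((a - mi) * (b - mj) for a, b in zip(ci, cj)) / n
         for cj, mj in zip(columns, means)]
        for ci, mi in zip(columns, means)
    ]


def variance(values):
    """Population variance, matching covariance_matrix's divide-by-n convention."""
    n = len(values)
    m = sum(values) / n
    return sum((v - m) ** 2 for v in values) / n


if __name__ == "__main__":
    rng = random.Random(5)
    common = [rng.gauss(0, 1) for _ in range(500)]
    items = [[c + rng.gauss(0, 1) for c in common] for _ in range(3)]
    totals = [sum(scores) for scores in zip(*items)]  # the linear combination
    C = covariance_matrix(items)
    print(math.isclose(variance(totals), sum(sum(row) for row in C)))
```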
23
The variance of this hypothetical linear
combination is defined in the same way: the sum
of the elements in the variance-covariance matrix
for the hypothetical elements.
24
But, because we assume that the real sample is
just as representative as the hypothetical, we
can substitute the variance for the real linear
combination to estimate the variance for the
hypothetical.
25
The numerator is the covariance between the two
linear combinations and is estimated by the sum
of all covariances between the real and
hypothetical elements.
26
But, because the real and hypothetical sets are
assumed to be comparable, the covariances among
the real elements can be used instead to estimate
the hypothetical values.
27
Because of the assumptions underlying the model,
we can express the reliability entirely in terms
of the elements of the real linear combination.
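One way to write out the steps, with $c_{ij}$ the elements of the real items' covariance matrix ($c_{ii} = s^2_i$) and $\bar c$ the average real inter-item covariance used to stand in for each of the $k^2$ real-hypothetical covariances:

```latex
\begin{aligned}
s^2_{X_R} &= \sum_{i=1}^{k}\sum_{j=1}^{k} c_{ij}
           = \sum_{i} s^2_i + \sum_{i \ne j} c_{ij},
           \qquad s^2_{X_H} \approx s^2_{X_R} \\
\operatorname{cov}(X_R, X_H) &= \sum_{i=1}^{k}\sum_{j=1}^{k} \operatorname{cov}(X_{Ri}, X_{Hj})
           \approx k^2 \bar{c},
           \qquad \bar{c} = \frac{1}{k(k-1)} \sum_{i \ne j} c_{ij} \\
r_{X_R X_H} &\approx \frac{k^2 \bar{c}}{s^2_{X_R}}
           = \frac{k}{k-1}\cdot\frac{\sum_{i \ne j} c_{ij}}{s^2_{X_R}}
           = \frac{k}{k-1}\left(1 - \frac{\sum_i s^2_i}{s^2_{X_R}}\right)
\end{aligned}
```

The last expression is the unstandardized form of Cronbach's alpha.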
28
(No Transcript)
29
In standard score form, the reliability of a linear combination depends on the number of items combined and the average inter-item correlation, $\bar r_{ij}$:
$r_{kk} = \dfrac{k\,\bar r_{ij}}{1 + (k - 1)\,\bar r_{ij}}$
This is the Spearman-Brown prophecy formula, which can forecast the reliability of a measure containing any number of components. When k equals the actual number of items on the measure, this formula gives the standardized Cronbach's alpha, an estimate of internal consistency reliability.
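A minimal sketch of the Spearman-Brown formula and of standardized alpha computed from a correlation matrix (k = number of items, mean taken over the off-diagonal correlations). Function names are hypothetical.

```python
def spearman_brown(k: float, mean_r: float) -> float:
    """Reliability of a composite of k components with average inter-component r."""
    return k * mean_r / (1.0 + (k - 1.0) * mean_r)


def standardized_alpha(corr):
    """Spearman-Brown with k = number of items and the mean off-diagonal correlation."""
    k = len(corr)
    off_diagonal = [corr[i][j] for i in range(k) for j in range(k) if i != j]
    mean_r = sum(off_diagonal) / len(off_diagonal)
    return spearman_brown(k, mean_r)


if __name__ == "__main__":
    print(round(spearman_brown(10, 0.3), 3))
    R = [[1.0, 0.5, 0.5],
         [0.5, 1.0, 0.5],
         [0.5, 0.5, 1.0]]
    print(round(standardized_alpha(R), 3))
```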
30
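Another common form of Cronbach's alpha, for unstandardized components:
$\alpha = \dfrac{k}{k-1}\left(1 - \dfrac{\sum_i s^2_i}{s^2_X}\right)$
Cronbach's alpha is equal to the average of all possible split-half reliabilities when each split-half coefficient is computed with the Flanagan-Rulon formula, $2\left(1 - (s^2_A + s^2_B)/s^2_X\right)$, rather than as a Spearman-Brown-corrected correlation between halves.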
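The split-half property can be checked numerically. The sketch below assumes an even number of items split into equal halves, uses hypothetical six-item data with unequal item loadings, and compares alpha with the mean Flanagan-Rulon coefficient over every equal split. The identity is algebraic, so it holds for any covariance matrix, not just this sample.

```python
import itertools
import math
import random


def covariance_matrix(columns):
    """Population (divide-by-n) variance-covariance matrix of item columns."""
    n = len(columns[0])
    means = [sum(col) / n for col in columns]
    return [[sum((a - mi) * (b - mj) for a, b in zip(ci, cj)) / n
             for cj, mj in zip(columns, means)]
            for ci, mi in zip(columns, means)]


def cronbach_alpha(cov):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of the total)."""
    k = len(cov)
    total_var = sum(sum(row) for row in cov)
    item_var = sum(cov[i][i] for i in range(k))
    return k / (k - 1) * (1.0 - item_var / total_var)


def flanagan_rulon(cov, half_a):
    """Split-half coefficient 2 * (1 - (var_A + var_B) / var_X) for one split."""
    half_b = [i for i in range(len(cov)) if i not in half_a]
    var_a = sum(cov[i][j] for i in half_a for j in half_a)
    var_b = sum(cov[i][j] for i in half_b for j in half_b)
    total_var = sum(sum(row) for row in cov)
    return 2.0 * (1.0 - (var_a + var_b) / total_var)


def mean_split_half(cov):
    """Average Flanagan-Rulon coefficient over every split into two equal halves."""
    k = len(cov)
    if k % 2:
        raise ValueError("this sketch only handles an even number of items")
    # combinations() lists each split twice (A and its complement); both give
    # the same coefficient, so the average is unaffected.
    splits = list(itertools.combinations(range(k), k // 2))
    return sum(flanagan_rulon(cov, s) for s in splits) / len(splits)


if __name__ == "__main__":
    rng = random.Random(6)
    common = [rng.gauss(0, 1) for _ in range(300)]
    loadings = (0.4, 0.6, 0.8, 1.0, 1.2, 0.5)  # hypothetical item weights
    items = [[w * c + rng.gauss(0, 1) for c in common] for w in loadings]
    cov = covariance_matrix(items)
    print(round(cronbach_alpha(cov), 6), round(mean_split_half(cov), 6))
    print(math.isclose(cronbach_alpha(cov), mean_split_half(cov)))
```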
31
The Spearman-Brown formula can be used to explore
the consequences of changing the number of
elements in the linear combination (k) and the
average correlation among those elements . . .
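A sketch of that exploration: a table of Spearman-Brown reliabilities over a grid of k and average inter-item correlation. The grid values are arbitrary choices for illustration. Reading down any column shows that each added item buys less reliability than the one before.

```python
def spearman_brown(k, mean_r):
    """Spearman-Brown reliability for k components with average inter-component r."""
    return k * mean_r / (1 + (k - 1) * mean_r)


def reliability_table(ks, mean_rs):
    """Map each (k, mean_r) pair to its Spearman-Brown reliability."""
    return {(k, r): spearman_brown(k, r) for k in ks for r in mean_rs}


if __name__ == "__main__":
    ks = [1, 2, 5, 10, 20, 40]
    mean_rs = [0.1, 0.2, 0.3, 0.5]
    table = reliability_table(ks, mean_rs)
    print("k".rjust(4) + "".join(f"r={r:.1f}".rjust(9) for r in mean_rs))
    for k in ks:
        print(str(k).rjust(4) + "".join(f"{table[(k, r)]:.3f}".rjust(9) for r in mean_rs))
```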
32
The effects of the number of elements in the
linear combination (k) and the average
correlation among those elements on the 95%
confidence interval around a score can also be
explored . . .
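A sketch of the same exploration for the 95% confidence interval, assuming standard-score form (composite SD = 1, so SEM = √(1 − r_kk)) and the simple ±1.96·SEM band. The half-width is therefore in SD units of the composite; the function name is hypothetical.

```python
import math


def spearman_brown(k, mean_r):
    """Spearman-Brown reliability for k components with average inter-component r."""
    return k * mean_r / (1 + (k - 1) * mean_r)


def ci_half_width(k, mean_r, z=1.96):
    """Half-width of a +/- z*SEM band in SD units of the composite.

    In standard-score form the composite SD is 1, so SEM = sqrt(1 - reliability).
    """
    return z * math.sqrt(1 - spearman_brown(k, mean_r))


if __name__ == "__main__":
    for k in (1, 5, 10, 20, 40):
        print(k, [round(ci_half_width(k, r), 2) for r in (0.1, 0.3, 0.5)])
```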
33
How many measurements are needed?
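Solving the Spearman-Brown formula for k gives $k = \rho^*(1 - \bar r) / [\bar r(1 - \rho^*)]$ for a target reliability $\rho^*$. A minimal sketch (hypothetical function name) that rounds up to a whole number of items; it assumes any new items have the same average inter-item correlation as the existing ones.

```python
import math


def spearman_brown(k, mean_r):
    """Spearman-Brown reliability for k components with average inter-component r."""
    return k * mean_r / (1 + (k - 1) * mean_r)


def items_needed(target_reliability, mean_r):
    """Smallest k whose Spearman-Brown reliability reaches the target.

    Assumes added items share the current average inter-item correlation.
    """
    if not (0 < mean_r < 1 and 0 < target_reliability < 1):
        raise ValueError("mean_r and target_reliability must lie strictly between 0 and 1")
    k = target_reliability * (1 - mean_r) / (mean_r * (1 - target_reliability))
    # Small tolerance so floating-point noise on an exact integer does not add an item.
    return max(1, math.ceil(k - 1e-9))


if __name__ == "__main__":
    k = items_needed(0.80, 0.30)
    print(k, round(spearman_brown(k, 0.30), 3))
```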
34
The effect of changing the reliability on the
correlation between two variables
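If the true-score correlation stays fixed, the attenuation formula implies $r'_{XY} = r_{XY}\sqrt{(r'_{XX}\,r'_{YY}) / (r_{XX}\,r_{YY})}$. A sketch with a hypothetical function name and example values (r = .30 with reliabilities .60 and .70; X improved to .90):

```python
import math


def correlation_after_reliability_change(r_xy, rel_x, rel_y, new_rel_x, new_rel_y):
    """Projected observed r when reliabilities change and the true-score r is fixed.

    Both observed correlations equal r_true * sqrt(rel_x * rel_y), so their ratio
    depends only on the reliabilities.
    """
    return r_xy * math.sqrt((new_rel_x * new_rel_y) / (rel_x * rel_y))


if __name__ == "__main__":
    print(round(correlation_after_reliability_change(0.30, 0.60, 0.70, 0.90, 0.70), 3))
```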
35
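The effect of changing the variability of X on the reliability of X, assuming the error variance stays the same ($s'^2_X$ is the variance in the new group):
$r'_{XX} = 1 - \dfrac{s^2_X\,(1 - r_{XX})}{s'^2_X}$
The variability in X may be artificially high or low and so produce an erroneous estimate of reliability. This formula is also a reminder that reliability is context-dependent.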
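A sketch of that formula, holding the error variance $s^2_X(1 - r_{XX})$ fixed. The function name and example values (variance 225 with reliability .90, moved to a more homogeneous group with variance 100) are hypothetical.

```python
def reliability_in_new_group(rel_old, var_old, var_new):
    """Reliability expected when observed variance changes but error variance does not.

    Error variance = var_old * (1 - rel_old) is held fixed, so
    rel_new = 1 - error_variance / var_new. The result can fall to zero or below
    when var_new is no larger than the error variance.
    """
    error_variance = var_old * (1.0 - rel_old)
    return 1.0 - error_variance / var_new


if __name__ == "__main__":
    print(round(reliability_in_new_group(0.90, 225.0, 100.0), 3))  # restricted range
    print(round(reliability_in_new_group(0.90, 225.0, 400.0), 3))  # wider range
```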
36

Some important points . . .
The move from the theory of parallel tests to the
domain sampling model means that reliability is
estimated rather than calculated. The estimation
becomes more precise with increases in the number
of items included in the linear combination.

The level of reliability does not increase with
increases in sample size (N).
Reliability depends on the conditions of
measurement. There are as many reliabilities for
a measure as there are different conditions of
measurement.
Application of the Spearman-Brown prophecy
formula assumes that new items are the same as
old items.

Coefficient alpha is an estimate of internal
consistency reliability. It may not produce the
same results as test-retest reliability. It also
is a weak indicator of unidimensionality.
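To illustrate the last point, the sketch below builds a hypothetical correlation matrix with two uncorrelated clusters of five items (r = .60 within each cluster, 0 between). The average inter-item correlation is 24/90 = 4/15, so standardized alpha comes out near .78 even though the items clearly measure two separate dimensions.

```python
def standardized_alpha(corr):
    """Spearman-Brown with k = number of items and the mean off-diagonal correlation."""
    k = len(corr)
    off = [corr[i][j] for i in range(k) for j in range(k) if i != j]
    mean_r = sum(off) / len(off)
    return k * mean_r / (1 + (k - 1) * mean_r)


def two_cluster_matrix(items_per_cluster=5, within_r=0.6):
    """Hypothetical correlation matrix: within_r inside each cluster, 0 between clusters."""
    k = 2 * items_per_cluster
    cluster = [i // items_per_cluster for i in range(k)]
    return [[1.0 if i == j else (within_r if cluster[i] == cluster[j] else 0.0)
             for j in range(k)] for i in range(k)]


if __name__ == "__main__":
    print(round(standardized_alpha(two_cluster_matrix()), 3))
```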
