Title: More About Reliability and Introduction to Validity
1. More About Reliability and Introduction to Validity
2. Techniques for Boosting Reliability
- The formula for coefficient alpha indicates that to boost the reliability of a test, you may either
  - (1) increase the homogeneity of the test (its mean interitem correlation), or
  - (2) increase the number of items (both levers are sketched below)
3. Item Analysis
- Once data are gathered for a test, it is possible to study the items one by one to see which contribute the most to the test and which contribute the least
- This process, called item analysis, can be valuable for refining and improving a test
- Item analysis gives item means and variances (to look for floor and ceiling effects), and correlations of each item with the total test score, called item-total correlations
4. Item-Total Correlations (ITCs)
- Items with the highest ITCs contribute the most to the test's homogeneity (and therefore to the test's reliability)
- Items with ITCs near 0 (or even negative ITCs) detract from homogeneity (and therefore depress reliability)
- With this item-by-item information, we can select and delete items to improve the reliability of a test (see the sketch after this list)
5. Boosting Reliability: Adding More Items
- The Spearman-Brown prophecy formula, solved for the lengthening multiple:

  $k = \dfrac{r_{\text{desired}}\,(1 - r_{\text{existing}})}{r_{\text{existing}}\,(1 - r_{\text{desired}})}$

  where
  - $r_{\text{existing}}$ = reliability of existing test
  - $k$ = multiple of items needed
  - $r_{\text{desired}}$ = desired reliability
6. Examples
- If k = 2, then we would need to double the number of items on our test to achieve the desired reliability
- If our 10-item test has a disappointing reliability of .5, and we want to obtain a reliability of .8, then k = .8(1 − .5) / [.5(1 − .8)] = .40/.10 = 4
- Therefore, our test needs to be 4(10) = 40 items long
- We will need to add (40 − 10) = 30 items
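The same arithmetic as a small Python function (the function name is my own), applying the prophecy formula solved for k:

```python
def lengthening_multiple(r_existing: float, r_desired: float) -> float:
    """Spearman-Brown prophecy formula solved for k, the multiple of items."""
    return r_desired * (1 - r_existing) / (r_existing * (1 - r_desired))

k = lengthening_multiple(0.5, 0.8)
print(k)                 # 4.0
print(int(k * 10))       # 40 items needed in total
print(int(k * 10) - 10)  # 30 items to add
```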
7. Stability
- A.k.a. test-retest reliability, stability indexes how well a test correlates with itself some time later
- Stability is the consistency of test scores over time
- Stability and internal consistency reliability for the most part refer to different sources of random measurement error (e.g., mood vs. a silly scale)
- Measurement of traits requires both high stability and high internal consistency; measurement of states requires only high internal consistency
8. What is Adequate Reliability?
- For a test to be used in research (for computing group means and correlations with other variables), reliability should be at least .6
- For a test to be used to make judgments about individuals, such as in clinical and counseling work, reliability should be at least .8, and preferably more like .9 or higher
9. Standard Error of Measurement
- For the clinical problem of interpreting the score of a particular person, an index of the likely amount of error in the person's test score is useful
- The standard error of measurement is the most common such index:

  $s_{err} = s\sqrt{1 - r_{xx}}$

  where
  - $s$ = standard deviation of the test
  - $r_{xx}$ = reliability of the test
10. Using the Standard Error of Measurement
- An individual's test score ± s_err yields a 68% CI: we can be 68% confident that the test-taker's true score lies within this interval
- An individual's test score ± 2(s_err) yields a 95% CI: we can be 95% confident that the test-taker's true score lies within this interval
- WISC-IV subtest with sd = 3 and reliability of .75: s_err = 3√(1 − .75) = 1.5
- For a person who scores 8, the 95% CI = 8 ± 2(1.5) = (5, 11)
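A minimal sketch of these calculations in Python, reproducing the WISC-IV subtest example above (function names are my own):

```python
import math

def sem(sd: float, reliability: float) -> float:
    """Standard error of measurement: sd * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

def ci95(score: float, sd: float, reliability: float) -> tuple[float, float]:
    """Approximate 95% CI for the true score: observed score +/- 2 * SEM."""
    e = sem(sd, reliability)
    return (score - 2 * e, score + 2 * e)

print(sem(3, 0.75))      # 1.5
print(ci95(8, 3, 0.75))  # (5.0, 11.0)
```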
11. Reliability of Differences Between Tests
- Random measurement error becomes most vexing when we are considering differences in test scores
- E.g., assessing change on a test such as before vs. after therapy, or discrepancies among several subtests
- The difference between 2 measures is usually less reliable than either original measure, and can be much less reliable
12. Reliability of Difference Scores
- The reliability of a difference between subtest x and subtest y is

  $r_{dd} = \dfrac{\tfrac{1}{2}(r_{xx} + r_{yy}) - r_{xy}}{1 - r_{xy}}$

  where
  - $r_{xx}$ = reliability of subtest x
  - $r_{yy}$ = reliability of subtest y
  - $r_{xy}$ = correlation of subtest x with subtest y
13. Example
- If we're interested in the difference between two subtests whose average reliability is .75 and whose intercorrelation is .6, the resulting reliability is $r_{dd} = (.75 - .6)/(1 - .6) = .375$
- The proportion of random measurement error more than doubles when looking at discrepancies between these tests (from 1 − .75 = .25 for each subtest to 1 − .375 = .625 for the difference)
- This is due to subtracting out half of the systematic variance in both subtests, while the random measurement errors persist
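A minimal sketch of this computation in Python, reproducing the example above (the function name is my own):

```python
def difference_reliability(r_xx: float, r_yy: float, r_xy: float) -> float:
    """Reliability of the difference score between subtests x and y."""
    return ((r_xx + r_yy) / 2 - r_xy) / (1 - r_xy)

r_dd = difference_reliability(0.75, 0.75, 0.6)
print(r_dd)                # 0.375
print(1 - 0.75, 1 - r_dd)  # error proportion: 0.25 -> 0.625, more than double
```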
14. Correction For Attenuation

  $\rho_{XY} = \dfrac{r_{xy}}{\sqrt{r_{xx}\,r_{yy}}}$

  where
  - $\rho_{XY}$ = latent or true correlation between variables X and Y (measured without error)
  - $r_{xy}$ = manifest or observed correlation between our tests of X and Y (measured with error)
  - $r_{xx}$ = reliability of test of X
  - $r_{yy}$ = reliability of test of Y
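A minimal sketch in Python; the observed correlation and reliabilities in the example are made-up values for illustration:

```python
import math

def disattenuated_r(r_xy: float, r_xx: float, r_yy: float) -> float:
    """Estimated true correlation between X and Y, corrected for the
    unreliability of both tests."""
    return r_xy / math.sqrt(r_xx * r_yy)

# An observed r of .42 between tests with reliabilities .70 and .80
print(disattenuated_r(0.42, 0.70, 0.80))  # ~0.56
```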
15. Test Validity
- Validity has to do with whether a test measures what it is supposed to measure, and not something else instead
- The validity of a test is examined by correlating it with an external variable (i.e., a measure outside of the test in question)
- Such a correlation is called a validity coefficient
16. Sources of Test Invalidity
- The 2 sources of test invalidity are
  - Random measurement error (RME)
  - Systematic measurement error (SME)
- RME limits test validity in a way we already know about: the maximum a validity coefficient can attain is the square root of the test's reliability (e.g., a test with reliability .81 can correlate at most √.81 = .90 with any external variable)
- Thus, demonstrating high test validity is also necessarily a demonstration of high test reliability
- The role of SME is the aspect of test invalidity we'll focus on now
17. Correlating With External Variables
- A test could correlate with an external variable
  - Due to its non-error test variance. That is, the external variable measures, or is related to, what our test measures.
  - Due to its systematic measurement error. That is, the external variable measures what our test should not measure.
18. Non-Error Test Variance (NETV)
- Predictive Criterion Validity: Whether our test correlates with some future criterion, such as future violence
- Concurrent Criterion Validity: Whether our test correlates with a same-time criterion (such as an unimpeachable test that is quite expensive to administer)
- Concurrent Validity: Whether our test correlates well with other, previously validated tests that use similar measurement methods
19. Non-Error Test Variance (NETV), cont.
- Convergent Validity: Whether our test correlates well with other, previously validated tests that use different measurement methods
- Construct Validity: Whether our test can be used to verify hypotheses about the construct (hypotheses stem from the theory of the construct); the most inclusive form of validity
- Content Validity: Whether our test looks like it has appropriate content (to a panel of experts)
- Face Validity: Whether our test looks like it has appropriate content (to the untrained eye)
20. Systematic Measurement Error (SME)
- Discriminant Validity: Demonstration that our test does not correlate with what it should not be measuring
21. Several External Variables
- Possible to examine the correlations of a test with several external variables, possibly some correlating due to NETV and some due to SME
[Figure: tests X1-X4 and external variables A-G plotted in the space of Factor I and Factor II]