RELIABILITY presentation

Title: RELIABILITY

1
LECTURE 6

RELIABILITY

2
RELIABILITY

Reliability is a proportion-of-variance measure (a squared quantity).
It is defined as the proportion of observed score (x) variance that is due to true score ($\tau$) variance:
$\rho^2_{x\tau} = \rho_{xx'} = \sigma^2_\tau / \sigma^2_x$

3
VENN DIAGRAM REPRESENTATION
[Venn diagram: Var(x) is divided into Var($\tau$) and Var(e); reliability is the Var($\tau$) share of Var(x)]
4
PARALLEL FORMS OF TESTS

If two items x1 and x2 are parallel, they have
equal true scores: $\tau_1 = \tau_2$
equal true score variance: $Var(\tau_1) = Var(\tau_2)$
equal error variance: $Var(e_1) = Var(e_2)$
uncorrelated errors e1 and e2: $\rho(e_1, e_2) = 0$

5
Reliability 2 parallel forms

$x_1 = \tau + e_1$, $x_2 = \tau + e_2$
$\rho(x_1, x_2)$ = reliability = $\rho_{xx'}$ = correlation between parallel forms

6
Reliability parallel forms
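[Path diagram: true score $\tau$ points to x1 and x2, each path labeled $\rho_{x\tau}$; each observed score also receives its own error e]
$\rho_{xx'} = \rho_{x\tau} \cdot \rho_{x\tau}$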
7
Reliability 3 or more parallel forms

For 3 or more items $x_i$, the same general form holds: the reliability of any pair is the correlation between them.
The reliability of the composite (sum of items) is based on the average inter-item correlation. This is the stepped-up reliability given by the Spearman-Brown formula.

8
Reliability 3 or more parallel forms

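Spearman-Brown formula for reliability:
$r_{xx} = k\,\bar{r}_{ij} / [1 + (k-1)\,\bar{r}_{ij}]$, where $\bar{r}_{ij}$ is the average inter-item correlation
Example: 3 items; item 1 correlates .5 with item 2 and .6 with item 3, and item 2 correlates .7 with item 3, so the average is .6
$r_{xx} = 3(.6) / [1 + 2(.6)] = 1.8/2.2 = .82$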
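
The three-item example can be checked numerically. The following is a minimal sketch, not part of the original slides; the function name `stepped_up_reliability` is an illustrative assumption. It averages the pairwise correlations and applies the Spearman-Brown step-up; 1.8/2.2 evaluates to about .82.

```python
def stepped_up_reliability(k, pairwise_correlations):
    """Spearman-Brown reliability of a k-item composite.

    pairwise_correlations holds the inter-item correlations; their mean
    plays the role of the average r(i,j) in the formula.
    """
    mean_r = sum(pairwise_correlations) / len(pairwise_correlations)
    return k * mean_r / (1 + (k - 1) * mean_r)


# Slide example: r12 = .5, r13 = .6, r23 = .7 (average .6), k = 3 items
r_composite = stepped_up_reliability(3, [0.5, 0.6, 0.7])
print(f"{r_composite:.2f}")
```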

9
Reliability tau equivalent scores

If two items x1 and x2 are tau equivalent, they have
equal true scores: $\tau_1 = \tau_2$
equal true score variance: $Var(\tau_1) = Var(\tau_2)$
unequal error variance: $Var(e_1) \neq Var(e_2)$
uncorrelated errors e1 and e2: $\rho(e_1, e_2) = 0$

10
Reliability tau equivalent scores

$x_1 = \tau + e_1$, $x_2 = \tau + e_2$
$\rho(x_1, x_2)$ = reliability = $\rho_{xx'}$ = correlation between tau-equivalent forms
(same computation as for parallel forms, but the observed score variances are different)

11
Reliability Spearman-Brown

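It can be shown that the reliability of the parallel-forms or tau-equivalent composite is
$\rho_{kk} = k\,\rho_{xx'} / [1 + (k-1)\,\rho_{xx'}]$
where k is the number of times the test is lengthened
Example: a test score has reliability .7
Doubling the length produces reliability $2(.7)/[1 + .7] = 1.4/1.7 = .824$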

12
Reliability Spearman-Brown

Example: a test score has reliability .95
Halving the length (k = .5) produces
$\rho_{kk} = .5(.95)/[1 + (.5-1)(.95)] = .475/.525 = .905$
Thus, a short form built from a random sample of half the items will still produce a test with adequate score reliability
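
The lengthening and halving examples follow from one function of the current reliability and the length factor k. This is a minimal sketch under that reading of the slides; the name `spearman_brown` is an illustrative choice, not something from the original deck.

```python
def spearman_brown(reliability, length_factor):
    """Projected reliability when a test is lengthened by length_factor.

    length_factor > 1 lengthens the test, length_factor < 1 shortens it.
    """
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)


doubled = spearman_brown(0.70, 2)    # slide example: doubling a test with rel = .7
halved = spearman_brown(0.95, 0.5)   # slide example: halving a test with rel = .95
print(f"doubled: {doubled:.3f}, halved: {halved:.3f}")
```

A length factor of 1 returns the input reliability unchanged, which is a quick sanity check on the formula.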

13
Reliability KR-20 for parallel or tau equivalent
items/scores

Items are scored as 0 or 1 (dichotomous scoring)
Kuder and Richardson (1937):
special cases of Cronbach's more general equation for parallel tests.
$KR\text{-}20 = [k/(k-1)]\,[1 - \sum p_i q_i / \sigma^2_y]$,
where $p_i$ = proportion of respondents obtaining a score of 1 and $q_i = 1 - p_i$.
$p_i$ is the item difficulty

14
Reliability KR-21 for parallel forms assumption

Items are scored as 0 or 1 (dichotomous scoring)
Kuder and Richardson (1937):
$KR\text{-}21 = [k/(k-1)]\,[1 - k\,p_{\cdot}\,q_{\cdot} / \sigma^2_c]$
$p_{\cdot}$ is the mean item difficulty and $q_{\cdot} = 1 - p_{\cdot}$
KR-21 assumes that all items have the same difficulty (parallel forms), so the item mean gives the best estimate of the population values.
$KR\text{-}21 \leq KR\text{-}20$.
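
To make the two Kuder-Richardson formulas concrete, here is a minimal sketch that applies both to the 12-person, 4-item data listed on the deck's SPSS DATA FILE slide. The function names are illustrative, and the use of the population variance (divide by n) for the total score is an assumption chosen to match the p*q item variances.

```python
# Rows are respondents, columns are the 4 dichotomous items (SPSS DATA FILE slide).
DATA = [
    [1, 1, 1, 0], [1, 0, 1, 1], [0, 0, 1, 0], [0, 1, 1, 1],
    [1, 1, 1, 1], [0, 0, 0, 1], [0, 1, 1, 1], [1, 1, 0, 0],
    [1, 0, 1, 1], [0, 0, 1, 0], [1, 1, 1, 0], [1, 0, 0, 0],
]


def _total_score_stats(scores):
    """Return (mean, population variance) of the total scores."""
    totals = [sum(row) for row in scores]
    mean = sum(totals) / len(totals)
    var = sum((t - mean) ** 2 for t in totals) / len(totals)
    return mean, var


def kr20(scores):
    """KR-20: uses each item's own difficulty p_i."""
    n, k = len(scores), len(scores[0])
    _, var_total = _total_score_stats(scores)
    p = [sum(row[i] for row in scores) / n for i in range(k)]
    sum_pq = sum(p_i * (1 - p_i) for p_i in p)
    return k / (k - 1) * (1 - sum_pq / var_total)


def kr21(scores):
    """KR-21: replaces every p_i with the mean item difficulty."""
    k = len(scores[0])
    mean_total, var_total = _total_score_stats(scores)
    p_bar = mean_total / k
    return k / (k - 1) * (1 - k * p_bar * (1 - p_bar) / var_total)


print(f"KR-20 = {kr20(DATA):.4f}, KR-21 = {kr21(DATA):.4f}")
```

On these data KR-21 comes out below KR-20, as the inequality on the slide requires, because the item difficulties are not all equal.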

15
Reliability congeneric scores

If two items x1 and x2 are congeneric,
1. $\tau_1 \neq \tau_2$
2. unequal true score variance: $Var(\tau_1) \neq Var(\tau_2)$
3. unequal error variance: $Var(e_1) \neq Var(e_2)$
4. errors e1 and e2 are uncorrelated: $\rho(e_1, e_2) = 0$

16
Reliability congeneric scores

$x_1 = \tau_1 + e_1$, $x_2 = \tau_2 + e_2$
$\rho_{jj} = Cov(\tau_1, \tau_2) / (\sigma_{x_1} \sigma_{x_2})$
This is the correlation between two separate measures that have a common latent variable

17
Congeneric measurement structure
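[Path diagram: $\tau_1$ points to x1 (path $\rho_{x_1\tau_1}$) and $\tau_2$ points to x2 (path $\rho_{x_2\tau_2}$); $\tau_1$ and $\tau_2$ are correlated $\rho_{12}$; errors e1 and e2]
$\rho_{xx'} = \rho_{x_1\tau_1}\,\rho_{12}\,\rho_{x_2\tau_2}$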
18
Reliability Coefficient alpha

Composite = sum of k parts, each with its own true score and variance
$C = x_1 + x_2 + \dots + x_k$
$\alpha = [k/(k-1)]\,[1 - \sum \sigma^2_k / \sigma^2_c]$
$\alpha_{est} = [k/(k-1)]\,[1 - \sum s^2_k / s^2_c]$

19
Reliability Coefficient alpha

Alpha equals
1. Spearman-Brown for parallel or tau-equivalent tests
2. KR-20 for dichotomous items (tau equivalent)
3. Hoyt, even for $\sigma^2_{\tau \times item} \neq 0$ (congeneric)

20
Hoyt reliability

Based on ANOVA concepts extended during the 1930s
by Cyrus Hoyt at U. Minnesota
Considers items and subjects as factors that are
either random or fixed (different models with
respect to expected mean squares)
Presaged more general Coefficient alpha
derivation

21
Reliability Hoyt ANOVA
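Source             df            Expected Mean Square
Person (random)    I-1           $\sigma^2_\varepsilon + \sigma^2_{\tau \times item} + K\sigma^2_\tau$
Items (random)     K-1           $\sigma^2_\varepsilon + \sigma^2_{\tau \times item} + I\sigma^2_{items}$
Error              (I-1)(K-1)    $\sigma^2_\varepsilon + \sigma^2_{\tau \times item}$
Parallel forms => $\sigma^2_{\tau \times item} = 0$
$\rho_{Hoyt} = [E(MS_{persons}) - E(MS_{error})] / E(MS_{persons})$
est $\rho_{Hoyt} = (MS_{persons} - MS_{error}) / MS_{persons}$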
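
The Hoyt estimate can be computed directly from a persons-by-items score matrix. The sketch below is an illustration rather than the original author's procedure: it runs the two-way ANOVA decomposition by hand on the 12 x 4 data from the deck's SPSS DATA FILE slide, and the function name `hoyt_reliability` is an assumption.

```python
# Rows are persons, columns are items (data from the SPSS DATA FILE slide).
DATA = [
    [1, 1, 1, 0], [1, 0, 1, 1], [0, 0, 1, 0], [0, 1, 1, 1],
    [1, 1, 1, 1], [0, 0, 0, 1], [0, 1, 1, 1], [1, 1, 0, 0],
    [1, 0, 1, 1], [0, 0, 1, 0], [1, 1, 1, 0], [1, 0, 0, 0],
]


def hoyt_reliability(scores):
    """(MS_persons - MS_error) / MS_persons from a persons x items table."""
    n_persons, n_items = len(scores), len(scores[0])
    grand_total = sum(sum(row) for row in scores)
    correction = grand_total ** 2 / (n_persons * n_items)

    # Sums of squares for the two-way layout with one score per cell
    ss_total = sum(x * x for row in scores for x in row) - correction
    ss_persons = sum(sum(row) ** 2 for row in scores) / n_items - correction
    item_totals = [sum(row[j] for row in scores) for j in range(n_items)]
    ss_items = sum(t ** 2 for t in item_totals) / n_persons - correction
    ss_error = ss_total - ss_persons - ss_items

    ms_persons = ss_persons / (n_persons - 1)
    ms_error = ss_error / ((n_persons - 1) * (n_items - 1))
    return (ms_persons - ms_error) / ms_persons


print(f"Hoyt reliability = {hoyt_reliability(DATA):.4f}")
```

The result agrees with the alpha of .1579 that SPSS reports for the same four items.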
22
Reliability Coefficient alpha

Composite = sum of k parts, each with its own true score and variance
$C = x_1 + x_2 + \dots + x_k$
Example: $s_{x_1} = 1$, $s_{x_2} = 2$, $s_{x_3} = 3$, $s_c = 5$
$\alpha_{est} = [3/(3-1)]\,[1 - (1 + 4 + 9)/25]$
$= 1.5\,[1 - 14/25]$
$= 16.5/25 = .66$
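
The arithmetic in this example can be reproduced with a few lines. This is a minimal sketch; `alpha_from_variances` is an illustrative name, and the inputs are the part and composite standard deviations given on the slide, squared to variances.

```python
def alpha_from_variances(part_variances, composite_variance):
    """Coefficient alpha from the part variances and the composite variance."""
    k = len(part_variances)
    return k / (k - 1) * (1 - sum(part_variances) / composite_variance)


# Slide example: standard deviations 1, 2, 3 for the parts and 5 for the composite
alpha_est = alpha_from_variances([1 ** 2, 2 ** 2, 3 ** 2], 5 ** 2)
print(f"{alpha_est:.2f}")
```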

24
SPSS DATA FILE
JOE       1 1 1 0
SUZY      1 0 1 1
FRANK     0 0 1 0
JUAN      0 1 1 1
SHAMIKA   1 1 1 1
ERIN      0 0 0 1
MICHAEL   0 1 1 1
BRANDY    1 1 0 0
WALID     1 0 1 1
KURT      0 0 1 0
ERIC      1 1 1 0
MAY       1 0 0 0
25
SPSS RELIABILITY OUTPUT
R E L I A B I L I T Y   A N A L Y S I S   -   S C A L E   (A L P H A)
Reliability Coefficients
N of Cases = 12.0     N of Items = 4
Alpha = .1579
26
SPSS RELIABILITY OUTPUT
R E L I A B I L I T Y   A N A L Y S I S   -   S C A L E   (A L P H A)
Reliability Coefficients
N of Cases = 12.0     N of Items = 8
Alpha = .6391
Note: same items duplicated
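
Both SPSS results can be reproduced from the data file. The sketch below is an illustration, not SPSS's own algorithm: it computes coefficient alpha with sample variances from the standard library, then duplicates every item to mimic the 8-item run. The name `cronbach_alpha` is an assumption.

```python
from statistics import variance

# Rows are respondents, columns are the 4 items (SPSS DATA FILE slide).
DATA = [
    [1, 1, 1, 0], [1, 0, 1, 1], [0, 0, 1, 0], [0, 1, 1, 1],
    [1, 1, 1, 1], [0, 0, 0, 1], [0, 1, 1, 1], [1, 1, 0, 0],
    [1, 0, 1, 1], [0, 0, 1, 0], [1, 1, 1, 0], [1, 0, 0, 0],
]


def cronbach_alpha(scores):
    """k/(k-1) * (1 - sum of item variances / variance of the total score)."""
    k = len(scores[0])
    item_variances = [variance([row[j] for row in scores]) for j in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


alpha_4 = cronbach_alpha(DATA)
# Duplicate every item: each respondent's row is concatenated with itself.
alpha_8 = cronbach_alpha([row + row for row in DATA])
print(f"4 items: {alpha_4:.4f}, 8 items (duplicated): {alpha_8:.4f}")
```

Duplicating items raises alpha without adding any new information about the respondents, because the duplicated items correlate perfectly with their originals.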
27
TRUE SCORE THEORY AND STRUCTURAL EQUATION MODELING

True score theory is consistent with the concepts
of SEM
- latent score (true score) called a factor in
SEM
- error of measurement
- path coefficient between observed score x and latent score $\tau$ is the same as the index of reliability

28
COMPOSITES AND FACTOR STRUCTURE

3 Manifest (Observed) Variables required for a
unique identification of a single factor
Parallel forms implies
Equal path coefficients (termed factor loadings)
for the manifest variables
Equal error variances
Independence of errors

29
Parallel forms factor diagram
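[Path diagram: $\tau$ points to x1, x2, and x3, each path labeled $\rho_{x\tau}$; each observed variable receives its own error e]
$\rho_{x_i x_j} = \rho_{x_i\tau}\,\rho_{x_j\tau}$ = reliability between variables i and j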
30
RELIABILITY FROM SEM

TRUE SCORE VARIANCE OF THE COMPOSITE IS OBTAINABLE FROM THE LOADINGS
$\sum_{i=1}^{k} \lambda^2_i$ = variance of factor, for k items or subtests
$= k\,\rho^2_{x\tau}$ = k times pairwise average reliability of items

31
RELIABILITY FROM SEM

RELIABILITY OF THE COMPOSITE IS OBTAINABLE FROM THE LOADINGS
$\alpha = [k/(k-1)]\,[1 - 1/\sum \lambda^2_i]$
Example: $\rho^2_{x\tau} = .8$, K = 11
$\alpha = [11/10]\,[1 - 1/8.8] = .975$

32
TAU EQUIVALENCE

ITEM TRUE SCORES DIFFER BY A CONSTANT ?i
?j ?k
ERROR STRUCTURE UNCHANGED AS TO EQUAL VARIANCES,
INDEPENDENCE

33
CONGENERIC MODEL

LESS RESTRICTIVE THAN PARALLEL FORMS OR TAU
EQUIVALENCE
LOADINGS MAY DIFFER
ERROR VARIANCES MAY DIFFER
MOST COMPLEX COMPOSITES ARE CONGENERIC
WAIS, WISC-III, K-ABC, MMPI, etc.

34
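[Path diagram: $\tau$ points to x1, x2, and x3 with paths $\rho_{x_1\tau}$, $\rho_{x_2\tau}$, $\rho_{x_3\tau}$; errors e1, e2, e3]
$\rho(x_1, x_2) = \rho_{x_1\tau}\,\rho_{x_2\tau}$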
35
COEFFICIENT ALPHA

$\rho_{xx'} = 1 - \sigma^2_E / \sigma^2_X$
$= 1 - \sum \sigma^2_i (1 - \rho_{ii}) / \sigma^2_X$,
since errors are uncorrelated
$\alpha = [k/(k-1)]\,[1 - \sum s^2_i / s^2_C]$
where $C = \sum x_i$ (composite score)
$s^2_i$ = variance of subtest $x_i$
$s^2_C$ = variance of composite
Does not assume knowledge of subtest $\rho_{ii}$

36
COEFFICIENT ALPHA - NUNNALLY'S COEFFICIENT

IF WE KNOW RELIABILITIES OF EACH SUBTEST, $\rho_i$
$\rho_N = [K/(K-1)]\,[1 - \sum s^2_i (1 - r_{ii}) / s^2_X]$
where $r_{ii}$ = coefficient alpha of each subtest
Willson (1996) showed $\alpha \leq \rho_N \leq \rho_{xx'}$

37
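NUNNALLY'S RELIABILITY CASE
[Path diagram: $\tau$ points to x1, x2, and x3 with paths $\rho_{x_1\tau}$, $\rho_{x_2\tau}$, $\rho_{x_3\tau}$; each $x_i$ also has a specific factor $s_i$ and an error $e_i$]
$\rho_{X_i X_i} = \rho^2_{x_i\tau} + s^2_i$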
38
Reliability Formula for SEM with Multiple factors
(congeneric with subtests)

Single factor model
? ? ?i2 / ? ?i2 ??ii ? ??ij
?gt ?
If eij 0, reduces to
? ? ?i2 / ? ?i2 ??ii Sum(factor
loadings on 1st factor)/ Sum of observed
variances
This generalizes (Bentler, 2004) to the sum of
factor loadings on the 1st factor divided by the
sum of variances and covariances of the factors
for multifactor congeneric tests
Bentler, P. M. (2004, October 7). Maximal Reliability for Unit-weighted Composites. UCLA Statistics Preprint No. 405. University of California, Los Angeles.
http://preprints.stat.ucla.edu/405/MaximalReliabilityforUnit-weightedcomposites.pdf

39
Multifactor models and specificity

Specificity is the correlation between two
observed items independent of the true score
Can be considered another factor
Cronbach's alpha can overestimate reliability if such factors are present
Correlated errors can also result in alpha
overestimating reliability

40
CORRELATED ERROR PROBLEMS
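[Path diagram: $\tau$ points to x1, x2, and x3 with paths $\rho_{x_1\tau}$, $\rho_{x_2\tau}$, $\rho_{x_3\tau}$; errors e1, e2, e3; specific factors s and s3]
Specificities can be misinterpreted as a correlated error model if they are correlated or form a second factor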
41
CORRELATED ERROR PROBLEMS
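[Path diagram: $\tau$ points to x1, x2, and x3 with paths $\rho_{x_1\tau}$, $\rho_{x_2\tau}$, $\rho_{x_3\tau}$; errors e1, e2, e3; specific factor s3]
Specificities can be misinterpreted as a correlated error model if the specificities are correlated or are a second factor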
42
SPSS SCALE ANALYSIS

ITEM DATA
EXAMPLE (Likert items, 0-4 scale)
                               Mean     Std Dev   Cases
1. CHLDIDEAL (0-8)             2.7029   1.4969    882.0
2. BIRTH CONTROL PILL OK       2.2959   1.0695    882.0
3. SEXED IN SCHOOL             1.1451    .3524    882.0
4. POL. VIEWS (CONS-LIB)       4.1349   1.3379    882.0
5. SPANKING OK IN SCHOOL       2.1111    .8301    882.0

43
CORRELATIONS

Correlation Matrix
            CHLDIDEL   PILLOK    SEXEDUC   POLVIEWS
CHLDIDEL     1.0000
PILLOK        .1074    1.0000
SEXEDUC       .1614     .2985    1.0000
POLVIEWS      .1016     .2449     .1630    1.0000
SPANKING     -.0154    -.0307    -.0901    -.1188

44
SCALE CHARACTERISTICS

Statistics for Scale
    Mean      Variance   Std Dev   Variables
    12.3900   7.5798     2.7531    5

                          Mean     Minimum   Maximum   Range    Max/Min   Variance
Item Means                2.4780   1.1451    4.1349    2.9898    3.6109   1.1851
Item Variances            1.1976    .1242    2.2408    2.1166   18.0415    .7132
Inter-item Correlations    .0822   -.1188     .2985     .4173   -2.5130    .0189

45
ITEM-TOTAL STATS

Item-total Statistics
            Scale      Scale      Corrected
            Mean       Variance   Item-         Squared       Alpha
            if Item    if Item    Total         Multiple      if Item
            Deleted    Deleted    Correlation   Correlation   Deleted
CHLDIDEAL    9.6871    4.4559      .1397         .0342         .2121
PILLOK      10.0941    5.2204      .2487         .1310         .0961
SEXEDUC     11.2449    6.9593      .2669         .1178         .2099
POLVIEWS     8.2551    4.7918      .1704         .0837         .1652
SPANKING    10.2789    7.3001     -.0913         .0196         .3655

46
ANOVA RESULTS

Analysis of Variance
Source of Variation   Sum of Sq.   DF     Mean Square   F       Prob.
Between People        1335.5664     881      1.5160
Within People         8120.8000    3528      2.3018
  Measures            4180.9492       4   1045.2373     934.9   .0000
  Residual            3939.8508    3524      1.1180
Total                 9456.3664    4409      2.1448

47
RELIABILITY ESTIMATE

Reliability Coefficients: 5 items
Alpha = .2625     Standardized item alpha = .3093
"Standardized" means all items are treated as parallel
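
The reported coefficients can be cross-checked against the summary statistics on the preceding SPSS slides. This is a minimal sketch using only numbers printed in that output (mean item variance, scale variance, mean inter-item correlation, and the Between People and Residual mean squares); the variable names are illustrative.

```python
k = 5                        # number of items
mean_item_variance = 1.1976  # Item Variances, Mean
scale_variance = 7.5798      # Statistics for Scale, Variance
mean_inter_item_r = 0.0822   # Inter-item Correlations, Mean
ms_people = 1.5160           # ANOVA, Between People mean square
ms_residual = 1.1180         # ANOVA, Residual mean square

# Coefficient alpha from the item variances and the scale variance
alpha = k / (k - 1) * (1 - k * mean_item_variance / scale_variance)

# Hoyt's ANOVA estimate from the mean squares
hoyt = (ms_people - ms_residual) / ms_people

# Standardized item alpha: Spearman-Brown applied to the mean correlation
standardized_alpha = k * mean_inter_item_r / (1 + (k - 1) * mean_inter_item_r)

print(f"alpha = {alpha:.4f}, Hoyt = {hoyt:.4f}, standardized = {standardized_alpha:.4f}")
```

Alpha and the Hoyt estimate agree to four decimals, consistent with the deck's statement that alpha equals the Hoyt coefficient.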

48
RELIABILITY APPLICATIONS
49
STANDARD ERRORS

$s_e$ = standard error of measurement
$= s_x [1 - \rho_{xx'}]^{1/2}$
can be computed if $\rho_{xx'}$ is estimable
provides an error band around an observed score: $[x - 1.96\,s_e,\; x + 1.96\,s_e]$
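
A short numerical illustration of the error band follows. It is a sketch only: the observed score of 90, standard deviation of 15, and reliability of .9 are hypothetical values chosen for the example, and the function names are assumptions.

```python
import math


def standard_error_of_measurement(sd_x, reliability):
    """s_e = s_x * sqrt(1 - reliability)."""
    return sd_x * math.sqrt(1 - reliability)


def error_band(x, sd_x, reliability, z=1.96):
    """Band x +/- z * s_e; z = 1.96 assumes normally distributed errors."""
    se = standard_error_of_measurement(sd_x, reliability)
    return x - z * se, x + z * se


# Hypothetical inputs: observed score 90, sd 15, reliability .9
se = standard_error_of_measurement(15, 0.9)
low, high = error_band(90, 15, 0.9)
print(f"s_e = {se:.2f}, band = [{low:.1f}, {high:.1f}]")
```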

50
[Figure: normal curve centered on the observed score x, with the band from -1.96 se to +1.96 se marked]
ASSUMES ERRORS ARE NORMALLY DISTRIBUTED
51
TRUE SCORE ESTIMATE

$\tau_{est} = \rho_{xx'}\,x + [1 - \rho_{xx'}]\,x_{mean}$
Example: x = 90, mean = 100, reliability = .9
$\tau_{est} = .9(90) + [1 - .9](100) = 81 + 10 = 91$

52
STANDARD ERROR OF TRUE SCORE ESTIMATE

$s_\tau = s_x [\rho_{xx'}]^{1/2} [1 - \rho_{xx'}]^{1/2}$
Provides an estimate of the range of likely true scores for an estimated true score
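
The true score estimate and its standard error can be computed together. The sketch below reuses the deck's example (x = 90, mean = 100, reliability = .9); the standard deviation of 15 is a hypothetical value added for illustration, and the function names are assumptions.

```python
import math


def estimated_true_score(x, mean_x, reliability):
    """Regress the observed score toward the group mean."""
    return reliability * x + (1 - reliability) * mean_x


def se_true_score_estimate(sd_x, reliability):
    """s_x * sqrt(reliability) * sqrt(1 - reliability)."""
    return sd_x * math.sqrt(reliability) * math.sqrt(1 - reliability)


tau_est = estimated_true_score(90, 100, 0.9)   # deck example
se_tau = se_true_score_estimate(15, 0.9)       # sd of 15 is hypothetical
print(f"tau_est = {tau_est:.1f}, s_tau = {se_tau:.2f}")
```

Because the observed score of 90 lies below the mean of 100, the estimate is pulled upward toward the mean.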

53
DIFFERENCE SCORES

Difference scores are widely used in education and psychology:
Learning disability = Achievement - Predicted Achievement
Gain score from beginning to end of school year
Brain injury is detected by a large discrepancy in certain IQ scale scores

54
RELIABILITY OF D SCORES

$D = x - y$
$s^2_D = s^2_x + s^2_y - 2\,r_{xy}\,s_x s_y$
$r_{DD} = [r_{xx} s^2_x + r_{yy} s^2_y - 2\,r_{xy}\,s_x s_y] / [s^2_x + s^2_y - 2\,r_{xy}\,s_x s_y]$
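
The difference score formula shows how quickly reliability drops when the two measures are correlated. The sketch below is illustrative only: the reliabilities of .9, correlation of .7, and standard deviations of 15 are hypothetical inputs, and the function name is an assumption.

```python
def difference_score_reliability(r_xx, r_yy, r_xy, s_x, s_y):
    """Reliability of D = x - y from the two reliabilities and their correlation."""
    covariance = r_xy * s_x * s_y
    true_variance = r_xx * s_x ** 2 + r_yy * s_y ** 2 - 2 * covariance
    observed_variance = s_x ** 2 + s_y ** 2 - 2 * covariance
    return true_variance / observed_variance


# Hypothetical inputs: two tests with reliability .9 that correlate .7
r_dd = difference_score_reliability(0.9, 0.9, 0.7, 15, 15)
print(f"r_DD = {r_dd:.3f}")
```

With these inputs the difference score is markedly less reliable than either test alone, and if the two tests were uncorrelated the formula would return .9.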

55
REGRESSION DISCREPANCY

$D = y - y_{pred}$
where $y_{pred} = bx + b_0$
$s_{DD} = [(1 - r^2_{xy})(1 - r_{DD})]^{1/2}$
where
$r_{DD} = [r_{yy} + r_{xx}\,r^2_{xy} - 2\,r^2_{xy}] / [1 - r^2_{xy}]$

56
TRUE DISCREPANCY

D b D y.x(y - ymn) bD x.y(x - xmn)
sD b2D y.x b2D x.yn 2(b Dy.x bDx.y rxy
and rDD 2-(rxx-ryy)2 (ryy-rxy)2 -
2(ryy-rxy)(rxx-rxy)r2xy /
(1-rxy)(ryyrxx-2rxy)-1
