Generalizability Theory

Provided by: michael1175
1
Generalizability Theory
2
  • Big Questions . . .
  • How do I get what I want (well, in stats at
    least)?
  • What happens if I overlook a facet?
  • What the heck is a hidden facet and why the
    heck should I care?

3
             Judge 1       Judge 2       Judge 3
             AM    PM      AM    PM      AM    PM
Person 1     2     3       1     3       3     5
Person 2     1     2       2     4       4     6
Person 3     2     3       2     4       5     4
Person 4     3     4       3     3       4     6
Person 5     4     5       3     5       5     7
Person 6     4     6       3     3       5     4
Person 7     3     7       4     6       6     7
Person 8     4     7       4     6       5     6
Person 9     3     5       4     7       3     7
Person 10    4     4       4     5       4     4
Person 11    3     5       3     4       5     5
Person 12    3     4       3     2       3     5
Person 13    3     3       2     4       1     2
Person 14    1     2       2     3       2     4
Person 15    2     3       1     2       3     3
Mean         2.80  4.20    2.73  4.07    3.87  5.00
Var.         1.03  2.60    1.07  2.21    1.84  2.29
4
A typical Subjects (Rows) x Measures (Columns)
format. Easiest format for doing most common
statistical analyses that match the default model
assumptions.
5
(No Transcript)
6
(No Transcript)
7
(No Transcript)
8
(No Transcript)
9
(No Transcript)
10
Assumed to be fixed
11
(No Transcript)
12
GLM j1am j1pm j2am j2pm j3am j3pm
  /WSFACTOR = judges 3 Polynomial time 2 Polynomial
  /METHOD = SSTYPE(3)
  /EMMEANS = TABLES(OVERALL)
  /EMMEANS = TABLES(judges)
  /EMMEANS = TABLES(time)
  /EMMEANS = TABLES(judges*time)
  /PRINT = DESCRIPTIVE ETASQ OPOWER PARAMETER RSSCP
  /CRITERIA = ALPHA(.05)
  /WSDESIGN = judges time judges*time .
Subjects are not explicitly part of the design.
The crossing of subjects with all measures is
assumed given the structure of the data file.
13
(No Transcript)
14
(No Transcript)
15
The most important source of variance in the GT
analysis is considered error in the ANOVA.
16
A very useful matrix to have when subjects are
nested within experimental groups. This residual
matrix removes the effect of any between-subjects
factors.
17
If the judge factor is really random, rather than
the fixed factor assumed here, the F test will be
wrong.
What's wrong with these numbers?
18
?
19
An alternative data format includes subjects as a
design variable.
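As a rough illustration of the two layouts, the sketch below reshapes the first three persons from the ratings table on slide 3 from the wide format (one row per person) into the long format (one row per rating, with subject as an explicit variable). The function name `wide_to_long` and the tuple layout are assumptions made for this example; the variable names `judge`, `time`, and `aggress` follow the syntax used on the slides.

```python
# Wide layout: one row per person, columns ordered j1am, j1pm, j2am, j2pm, j3am, j3pm.
# Only persons 1-3 from slide 3 are shown to keep the sketch short.
WIDE = [
    [2, 3, 1, 3, 3, 5],
    [1, 2, 2, 4, 4, 6],
    [2, 3, 2, 4, 5, 4],
]
JUDGES = (1, 2, 3)
TIMES = ("AM", "PM")


def wide_to_long(wide):
    """Return one (subject, judge, time, aggress) record per rating."""
    long_rows = []
    for subject, row in enumerate(wide, start=1):
        for j_idx, judge in enumerate(JUDGES):
            for t_idx, time in enumerate(TIMES):
                # Columns are grouped by judge, with AM before PM inside each judge.
                aggress = row[j_idx * len(TIMES) + t_idx]
                long_rows.append((subject, judge, time, aggress))
    return long_rows


long_data = wide_to_long(WIDE)
for record in long_data[:6]:
    print(record)
```

Each person now contributes six rows instead of one, which is what allows subject to enter the design statement as a factor.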
20
Even though the data are repeated measures, the
analysis proceeds as if it were a univariate
model; the repeated measures nature of the data
gets modeled explicitly by including subjects as
part of the design.
21
(No Transcript)
22
UNIANOVA aggress BY subject judge time
  /METHOD = SSTYPE(3)
  /INTERCEPT = INCLUDE
  /EMMEANS = TABLES(OVERALL)
  /DESIGN = judge time subject judge*time judge*subject time*subject
            judge*time*subject .
The design statement now includes subject as a
factor and crosses it with the other factors,
producing the repeated measures effects.
23
Now we have the full set of F tests.
Anything odd about this number?
Watch how this one changes across designs.
24
(No Transcript)
25
(No Transcript)
26
(No Transcript)
27
(No Transcript)
28
(No Transcript)
29
VARCOMP aggress BY subject judge time
  /RANDOM = subject judge time
  /METHOD = MINQUE (1)
  /DESIGN
  /INTERCEPT = INCLUDE .
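To make the estimation step more concrete, here is a minimal sketch that estimates the variance components for the fully crossed, fully random Person x Judge x Time design from the slide 3 ratings. It uses the ANOVA (expected mean squares) method rather than the MINQUE(1) method requested in the syntax, so it is an illustration of the logic and is not guaranteed to reproduce the slide's output. All function and variable names are assumptions made for this example.

```python
from itertools import combinations, product
from math import prod

# Ratings from slide 3, one row per person: j1am j1pm j2am j2pm j3am j3pm.
ROWS = [
    [2, 3, 1, 3, 3, 5], [1, 2, 2, 4, 4, 6], [2, 3, 2, 4, 5, 4],
    [3, 4, 3, 3, 4, 6], [4, 5, 3, 5, 5, 7], [4, 6, 3, 3, 5, 4],
    [3, 7, 4, 6, 6, 7], [4, 7, 4, 6, 5, 6], [3, 5, 4, 7, 3, 7],
    [4, 4, 4, 5, 4, 4], [3, 5, 3, 4, 5, 5], [3, 4, 3, 2, 3, 5],
    [3, 3, 2, 4, 1, 2], [1, 2, 2, 3, 2, 4], [2, 3, 1, 2, 3, 3],
]
SIZES = {"p": 15, "j": 3, "t": 2}  # persons, judges, times
N = prod(SIZES.values())


def uncorrected_ss(facets):
    """Sum over cells of (cell total squared / observations per cell)."""
    totals = {}
    for p, j, t in product(range(SIZES["p"]), range(SIZES["j"]), range(SIZES["t"])):
        index = {"p": p, "j": j, "t": t}
        key = tuple(index[f] for f in facets)
        totals[key] = totals.get(key, 0) + ROWS[p][j * SIZES["t"] + t]
    per_cell = N // len(totals)
    return sum(x * x for x in totals.values()) / per_cell


def sum_of_squares(effect):
    """SS for an effect such as 'pj', by inclusion-exclusion over its sub-effects."""
    ss = 0.0
    for k in range(len(effect) + 1):
        sign = (-1) ** (len(effect) - k)
        for sub in combinations(effect, k):
            ss += sign * uncorrected_ss(sub)
    return ss


EFFECTS = ["p", "j", "t", "pj", "pt", "jt", "pjt"]
SS = {e: sum_of_squares(e) for e in EFFECTS}
MS = {e: SS[e] / prod(SIZES[f] - 1 for f in e) for e in EFFECTS}

n_p, n_j, n_t = SIZES["p"], SIZES["j"], SIZES["t"]
# Solve the expected-mean-square equations for the random-effects model.
VAR = {
    "pjt,e": MS["pjt"],
    "pj": (MS["pj"] - MS["pjt"]) / n_t,
    "pt": (MS["pt"] - MS["pjt"]) / n_j,
    "jt": (MS["jt"] - MS["pjt"]) / n_p,
    "p": (MS["p"] - MS["pj"] - MS["pt"] + MS["pjt"]) / (n_j * n_t),
    "j": (MS["j"] - MS["pj"] - MS["jt"] + MS["pjt"]) / (n_p * n_t),
    "t": (MS["t"] - MS["pt"] - MS["jt"] + MS["pjt"]) / (n_p * n_j),
}

for name, value in VAR.items():
    print(f"{name:6s} {value:8.4f}")
```

ANOVA estimates can come out negative; the sketch reports them as computed and leaves any truncation to zero to the analyst.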
30
(No Transcript)
31
If the effects really are fixed, they can be
specified that way.
32
UNIANOVA aggress BY judge time subject
  /RANDOM = subject
  /METHOD = SSTYPE(3)
  /INTERCEPT = INCLUDE
  /EMMEANS = TABLES(OVERALL)
  /PRINT = DESCRIPTIVE ETASQ OPOWER
  /CRITERIA = ALPHA(.05)
  /DESIGN = judge time subject judge*time judge*subject time*subject
            judge*time*subject .
33
(No Transcript)
34

The summary table from the initial analysis in
which effects are assumed to be fixed.
35
(No Transcript)
36
(No Transcript)
37
UNIANOVA aggress BY subject judge time
  /RANDOM = subject judge time
  /METHOD = SSTYPE(3)
  /INTERCEPT = INCLUDE
  /EMMEANS = TABLES(OVERALL)
  /PRINT = DESCRIPTIVE ETASQ OPOWER
  /CRITERIA = ALPHA(.05)
  /DESIGN = subject time subject*time judge(time) subject*judge(time) .
38
(No Transcript)
39
VARCOMP aggress BY subject judge time
  /RANDOM = subject judge time
  /METHOD = MINQUE (1)
  /DESIGN = subject time subject*time judge(time) subject*judge(time)
  /INTERCEPT = INCLUDE .
40
(No Transcript)
41
(No Transcript)
42
UNIANOVA aggress BY subject judge time
  /RANDOM = subject judge time
  /METHOD = SSTYPE(3)
  /INTERCEPT = INCLUDE
  /EMMEANS = TABLES(OVERALL)
  /PRINT = DESCRIPTIVE ETASQ OPOWER
  /CRITERIA = ALPHA(.05)
  /DESIGN = subject time(subject) judge(time(subject)) .
43
(No Transcript)
44
VARCOMP aggress BY subject judge time
  /RANDOM = subject judge time
  /METHOD = MINQUE (1)
  /DESIGN = subject time(subject) judge(time(subject))
  /INTERCEPT = INCLUDE .
45
(No Transcript)
46
(No Transcript)
47
UNIANOVA aggress BY subject judge time
  /RANDOM = subject judge time
  /METHOD = SSTYPE(3)
  /INTERCEPT = INCLUDE
  /EMMEANS = TABLES(OVERALL)
  /PRINT = DESCRIPTIVE ETASQ OPOWER
  /CRITERIA = ALPHA(.05)
  /DESIGN = subject time(subject) judge(subject) judge*time(subject) .
48
(No Transcript)
49
VARCOMP aggress BY subject judge time
  /RANDOM = subject judge time
  /METHOD = MINQUE (1)
  /DESIGN = subject time(subject) judge(subject) judge*time(subject)
  /INTERCEPT = INCLUDE .
50
(No Transcript)
51
(People:Sex) x (Cartoon:Artist)
(P:S) x (C:A)
  • S
  • A
  • (P, PS)
  • (C, CA)
  • SA
  • (SC, SCA)
  • (AP, APS)
  • (PC, PCA, PSC, PSCA)
52
VARCOMP humor BY subject cartoon artist sex
  /RANDOM = subject cartoon artist
  /METHOD = MINQUE (1)
  /DESIGN = sex subject(sex) artist cartoon(artist) sex*artist
            sex*cartoon(artist) artist*subject(sex)
  /INTERCEPT = INCLUDE .
We can specify every component but the highest
order component; it will be modeled as the
residual.
53
The effect of sex is trivial
54
VARCOMP humor BY subject cartoon artist
  /RANDOM = subject cartoon artist
  /METHOD = MINQUE (1)
  /DESIGN = subject artist cartoon(artist) subject*artist
            subject*cartoon(artist)
  /INTERCEPT = INCLUDE .
55
Why the slight differences from the components
estimated before?
56
When data are modeled as a repeated measures
design, complete data are required. If any
participant is missing any measure, that
participant is eliminated from the entire
analysis (listwise deletion). When data are
modeled as a univariate design with subjects
as a factor, all available data can be included.
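The difference in how the two layouts treat missing data can be sketched in a few lines. The example uses persons 1 to 3 from slide 3 and blanks out one rating; that missing value is hypothetical and introduced only for illustration, as are the variable names.

```python
# Wide layout keyed by subject; columns are j1am, j1pm, j2am, j2pm, j3am, j3pm.
wide = {
    1: [2, 3, 1, 3, 3, 5],
    2: [1, 2, None, 4, 4, 6],  # hypothetical missing rating (not in the source data)
    3: [2, 3, 2, 4, 5, 4],
}

# Repeated measures (wide) analysis: any missing measure drops the whole person.
complete_cases = {s: row for s, row in wide.items() if None not in row}
ratings_used_wide = sum(len(row) for row in complete_cases.values())

# Univariate (long) analysis: only the missing observation itself is dropped.
long_rows = [
    (s, col, value)
    for s, row in wide.items()
    for col, value in enumerate(row)
    if value is not None
]

print("ratings used, wide layout:", ratings_used_wide)
print("ratings used, long layout:", len(long_rows))
```

One missing rating costs the wide analysis an entire person, while the long analysis loses a single observation.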
57
(No Transcript)
58
GLM w1 w2 w3 w4 w5 w6 w7 w8 w9 w10 w11 w12 w13 w14 w15
    h1 h2 h3 h4 h5 h6 h7 h8 h9 h10 h11 h12 h13 h14 h15
    l1 l2 l3 l4 l5 l6 l7 l8 l9 l10 l11 l12 l13 l14 l15
  /WSFACTOR = artist 3 Polynomial cartoon 15 Polynomial
  /METHOD = SSTYPE(3)
  /EMMEANS = TABLES(OVERALL)
  /EMMEANS = TABLES(artist)
  /EMMEANS = TABLES(cartoon)
  /EMMEANS = TABLES(artist*cartoon)
  /PRINT = DESCRIPTIVE ETASQ PARAMETER
  /CRITERIA = ALPHA(.05)
  /WSDESIGN = artist cartoon artist*cartoon .
59
(No Transcript)
60
We lose a case due to missing data
61
UNIANOVA humor BY subject cartoon artist
  /RANDOM = subject cartoon artist
  /METHOD = SSTYPE(3)
  /INTERCEPT = INCLUDE
  /CRITERIA = ALPHA(.05)
  /DESIGN = subject artist subject*artist cartoon(artist)
            subject*cartoon(artist) .
62
(No Transcript)
63
All cases are included.
64
What if a facet is omitted? In the playground
study, the researchers might have intended to
have a simple P x J design and had each judge
produce two ratings so that the People x Judges
interaction could be separated from random error.
But, if the two ratings contain a systematic
source of variance, failing to model it could
affect estimation of all variance components.
65
VARCOMP aggress BY subject judge
  /RANDOM = subject judge
  /METHOD = MINQUE (1)
  /DESIGN
  /INTERCEPT = INCLUDE .
66
Not a trivial value and an indication of model
misspecification.
67
When the model changes, information is not lost
or gained; it simply gets redistributed. The best
place to keep track of information is in the sums
of squares.
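The redistribution can be checked directly on the slide 3 ratings. In the sketch below, the reduced Person x Judge model treats the AM and PM ratings as two interchangeable replicates, and its within-cell residual sum of squares is compared with the four time-related sums of squares from the full Person x Judge x Time model. Function and variable names are assumptions made for this example.

```python
from itertools import combinations, product

# Ratings from slide 3, one row per person: j1am j1pm j2am j2pm j3am j3pm.
ROWS = [
    [2, 3, 1, 3, 3, 5], [1, 2, 2, 4, 4, 6], [2, 3, 2, 4, 5, 4],
    [3, 4, 3, 3, 4, 6], [4, 5, 3, 5, 5, 7], [4, 6, 3, 3, 5, 4],
    [3, 7, 4, 6, 6, 7], [4, 7, 4, 6, 5, 6], [3, 5, 4, 7, 3, 7],
    [4, 4, 4, 5, 4, 4], [3, 5, 3, 4, 5, 5], [3, 4, 3, 2, 3, 5],
    [3, 3, 2, 4, 1, 2], [1, 2, 2, 3, 2, 4], [2, 3, 1, 2, 3, 3],
]
SIZES = {"p": 15, "j": 3, "t": 2}
N = 15 * 3 * 2


def uncorrected_ss(facets):
    """Sum over cells of (cell total squared / observations per cell)."""
    totals = {}
    for p, j, t in product(range(15), range(3), range(2)):
        index = {"p": p, "j": j, "t": t}
        key = tuple(index[f] for f in facets)
        totals[key] = totals.get(key, 0) + ROWS[p][j * 2 + t]
    return sum(x * x for x in totals.values()) / (N // len(totals))


def sum_of_squares(effect):
    """SS for an effect such as 'pt', by inclusion-exclusion over its sub-effects."""
    ss = 0.0
    for k in range(len(effect) + 1):
        sign = (-1) ** (len(effect) - k)
        for sub in combinations(effect, k):
            ss += sign * uncorrected_ss(sub)
    return ss


# Reduced P x J model: everything varying inside a person-by-judge cell is residual.
residual_reduced = 0.0
for row in ROWS:
    for j in range(3):
        am, pm = row[2 * j], row[2 * j + 1]
        cell_mean = (am + pm) / 2
        residual_reduced += (am - cell_mean) ** 2 + (pm - cell_mean) ** 2

# Full P x J x T model: the same information, split into time-related effects.
time_terms = {e: sum_of_squares(e) for e in ("t", "pt", "jt", "pjt")}

print("residual SS, P x J model:", round(residual_reduced, 4))
for effect, ss in time_terms.items():
    print(f"SS {effect:4s}", round(ss, 4))
print("sum of time-related SS:  ", round(sum(time_terms.values()), 4))
```

The residual of the reduced model equals the sum of the Time, Person x Time, Judge x Time, and Person x Judge x Time sums of squares, which is the sense in which omitting the facet moves information rather than removing it.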
68
(No Transcript)
69
UNIANOVA aggress BY subject judge
  /RANDOM = subject judge
  /METHOD = SSTYPE(3)
  /INTERCEPT = INCLUDE
  /EMMEANS = TABLES(OVERALL)
  /PRINT = DESCRIPTIVE ETASQ OPOWER
  /CRITERIA = ALPHA(.05)
  /DESIGN = subject judge subject*judge .
70
In this design, we now have a residual error term
that is distinguishable from the Subject x Judge
interaction. The residual must contain the
missing information.
71
(No Transcript)
72
A subtle and important problem in the proper
application of generalizability theory is the
potential presence of hidden facets. A facet is
hidden when it has no opportunity to emerge in a
G Study but is very likely to be present in
future applications. Its absence in the G Study
keeps it from being properly taken into account,
biasing the anticipated generalizability
indicated by a D study.
73
Example: A researcher constructs a 10-item
questionnaire on attitudes about social security
reform and calculates coefficient alpha as
an indicator of reliability. The measure will be
used in a study investigating the effectiveness
of different persuasion tactics that might be
used to enact changes to the social security
system.
74
The generalizability theory approach to this
problem would be to estimate variance components
for a People x Item design and calculate the
following generalizability coefficient
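The formula on this slide did not survive in the transcript. Under the standard treatment of a one-facet People x Item design with items random, the coefficient for relative decisions would presumably take the following form; the notation (sigma-squared subscripts, n_i for the number of items) is an assumption chosen to match the n_i and n_t used on later slides.

```latex
E\rho^2 = \frac{\sigma^2_p}{\sigma^2_p + \dfrac{\sigma^2_{pi,e}}{n_i}}
```

Here \sigma^2_p is the person (universe score) variance and \sigma^2_{pi,e} is the Person x Item interaction confounded with residual error. With n_i items this is the same quantity that coefficient alpha estimates.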
75
The research problem suggests that the measure
will be used to indicate change due to treatment,
in which case there is a hidden facet: time. The
measure has only been administered once, so
time has not been given an opportunity to
influence generalizability. Its influence will
not be fairly taken into account. Had it been
included, the generalizability coefficient would
have changed and included more variance
components.
76
If time is considered random, then the
coefficient that the researcher really wants is
this one
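The slide's formula is missing from the transcript. For a People x Item x Time design with both items and time random, the standard relative-decision coefficient would presumably be the following; the subscript notation is an assumption consistent with the n_i and n_t mentioned on the surrounding slides.

```latex
E\rho^2 = \frac{\sigma^2_p}
{\sigma^2_p + \dfrac{\sigma^2_{pi}}{n_i} + \dfrac{\sigma^2_{pt}}{n_t}
 + \dfrac{\sigma^2_{pit,e}}{n_i\, n_t}}
```

Every component that involves persons and a random facet enters the error term, each divided by the number of conditions averaged over.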
77
The coefficient actually calculated will be an
overestimate of the coefficient that is desired
if the variance components involving time are
greater than zero.
Why is n_t assumed to be 1?
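A small numerical sketch may help show where the overestimate comes from. It assumes that, in a single administration, the Person x Time component is absorbed into the person variance and the three-way residual is absorbed into the Person x Item residual. The variance component values are hypothetical, chosen only for illustration, and n_t is set to 1 because scores are not aggregated over occasions.

```python
def calculated_coefficient(var_p, var_pi, var_pt, var_pit_e, n_i):
    """P x I coefficient from one administration, with time as a hidden facet.

    With a single occasion, var_pt cannot be separated from var_p, and
    var_pit_e cannot be separated from var_pi.
    """
    person = var_p + var_pt
    error = (var_pi + var_pit_e) / n_i
    return person / (person + error)


def desired_coefficient(var_p, var_pi, var_pt, var_pit_e, n_i, n_t=1):
    """P x I x T coefficient with items and time both random."""
    error = var_pi / n_i + var_pt / n_t + var_pit_e / (n_i * n_t)
    return var_p / (var_p + error)


# Hypothetical variance components (not taken from the presentation).
components = {"var_p": 0.50, "var_pi": 0.30, "var_pt": 0.20, "var_pit_e": 0.40}

calculated = calculated_coefficient(n_i=10, **components)
desired = desired_coefficient(n_i=10, n_t=1, **components)

print(f"calculated (time hidden): {calculated:.3f}")
print(f"desired (time random):    {desired:.3f}")
```

Under these assumptions the two coefficients share a denominator when n_t = 1, and they differ only in whether the Person x Time variance counts as universe score variance or as error.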
78
But, wait, what about those items? Are they kept
constant? The same ones used each time? Then
items should be considered fixed. The desired
coefficient is really
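The formula for this slide is missing from the transcript. When items are fixed and time is random, the usual result is that the Person x Item component moves from the error term into the universe score variance, which would give something like the following; the notation is an assumption consistent with the other slides.

```latex
E\rho^2 = \frac{\sigma^2_p + \dfrac{\sigma^2_{pi}}{n_i}}
{\sigma^2_p + \dfrac{\sigma^2_{pi}}{n_i} + \dfrac{\sigma^2_{pt}}{n_t}
 + \dfrac{\sigma^2_{pit,e}}{n_i\, n_t}}
```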
79
n_i is assumed to be 10 for this example.
80
What if time is considered fixed (only two times
will be tested) but different collections of items
will be used each time (alternative forms)? The
desired coefficient is then
Again, potentially quite different from the one
actually calculated . . .
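The coefficient referred to on this slide is not in the transcript. With time fixed and items random, the standard result mirrors the previous case with the roles of the two facets exchanged, so it would presumably be the following; the notation is an assumption consistent with the other slides.

```latex
E\rho^2 = \frac{\sigma^2_p + \dfrac{\sigma^2_{pt}}{n_t}}
{\sigma^2_p + \dfrac{\sigma^2_{pt}}{n_t} + \dfrac{\sigma^2_{pi}}{n_i}
 + \dfrac{\sigma^2_{pit,e}}{n_i\, n_t}}
```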
81
n_t is assumed to be 1 in this case because
aggregation will not occur over time.
82
If time is not of interest, setting most
certainly is. A measure developed in one setting
most likely will be used in many other settings
(lab, work, home, school). Not taking that hidden
facet into account could produce quite mistaken
views about reliability.
Random
Fixed
83
Generalizability theory is easily applied to
interrater reliability, but the nature of the
facets needs to be considered carefully.
Case 1: Raters assess the performance of subjects
on the same task. How reliable are the raters?
What are the facets? How are they best
conceptualized?
People x Raters (x Task) with Task fixed.
84
n_t = 1. If n_r = 1, the formula gives the
reliability of a single rater. If n_r = 2, it
gives the reliability of a 2-rater composite.
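The formula this slide refers to is not in the transcript. For a People x Raters x Task design with raters random and task fixed, the standard relative-decision coefficient would presumably look like the following, where the subscript t now stands for task and r for raters; this reconstruction and its notation are assumptions.

```latex
E\rho^2 = \frac{\sigma^2_p + \dfrac{\sigma^2_{pt}}{n_t}}
{\sigma^2_p + \dfrac{\sigma^2_{pt}}{n_t} + \dfrac{\sigma^2_{pr}}{n_r}
 + \dfrac{\sigma^2_{prt,e}}{n_r\, n_t}}
```

Because task is fixed, the Person x Task component counts toward universe score variance, and only the components involving raters contribute to error.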
85
Case 2: Raters assess the performance of
subjects, who perform a task of their choice
(e.g., an essay topic). How reliable are the
raters? What are the facets? How are they best
conceptualized?
(Task:People) x Raters with Task random.
86
n_t = 1. If n_r = 1, the formula gives the
reliability of a single rater. If n_r = 2, it
gives the reliability of a 2-rater composite.
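The formula this slide refers to is not in the transcript. For a (Task:People) x Raters design with task and raters both random, the standard relative-decision coefficient would presumably be the following; the reconstruction and the subscript notation (t:p for task nested within person) are assumptions.

```latex
E\rho^2 = \frac{\sigma^2_p}
{\sigma^2_p + \dfrac{\sigma^2_{t:p}}{n_t} + \dfrac{\sigma^2_{pr}}{n_r}
 + \dfrac{\sigma^2_{rt:p,e}}{n_r\, n_t}}
```

In contrast to Case 1, task variation is now treated as random and nested within persons, so it adds to error rather than to universe score variance.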
87
A key point that ties together all measurement
theories is the importance of correctly
specifying the model. That is especially clear in
Generalizability Theory, where the consequences
of different model choices can be explicitly
explored.
88
On to research design, where the epic battle of
good (truth) versus evil (error) is waged on a
different front. That battle cannot be fought,
however, without good measurement.