Item Response Modeling in Behavioral Research

1
Item Response Modeling in Behavioral Research

Diane Allen, Mark Wilson,
and Jun Corser Li
University of California, Berkeley
March, 2005

2
Outline

Introduction
The Data
Results for the Self-Efficacy Scale
Comparison with Classical Test Theory
Further Work with IRM
Conclusion

3
Item Response Models: Connections

theory of test/instrument scores (CTT)
content referencing (e.g., Guttman)
IRM

4
CTT vs IRM: Equations

CTT
X = T + E
IRM (Rasch)
(two alternative forms, joined by "or"; equations not transcribed)
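For reference, the standard dichotomous Rasch model can be written in two equivalent forms, as a probability and as a log-odds. This is a sketch in assumed notation (θ for the respondent location, δ_i for the location of item i), not a transcription of the slide.

```latex
% CTT: observed score = true score + error
X = T + E

% Rasch model, probability form
P(X_i = 1 \mid \theta, \delta_i) = \frac{e^{\theta - \delta_i}}{1 + e^{\theta - \delta_i}}

% or, equivalently, log-odds (logit) form
\log \frac{P(X_i = 1 \mid \theta, \delta_i)}{P(X_i = 0 \mid \theta, \delta_i)} = \theta - \delta_i
```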
5
CTT vs IRM: Issues

CTT
Confounding of instrument and respondents
Assumption of linearity of scores
IRM
Model needs to fit/allows one to select models
Comment: IRM addresses CTT issues

6
The Rasch Model: Idea
(Diagram not transcribed; its labels appear to be the respondent location θ and item locations δi)
7
The Rasch Model: Graph
8
The Rasch Model: Dichotomous Function
9
The Rasch Model: Polytomous Function
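
The function slides carry no transcript, so the following is only a minimal sketch of the standard dichotomous Rasch function and its polytomous (partial credit) extension. The function names, the `step_deltas` parameter, and the example values are assumptions for illustration, not taken from the presentation.

```python
import math

def rasch_prob(theta, delta):
    """Dichotomous Rasch model: P(X = 1) given respondent location theta
    and item location delta."""
    return 1.0 / (1.0 + math.exp(-(theta - delta)))

def pcm_probs(theta, step_deltas):
    """Partial credit (polytomous Rasch) category probabilities.

    step_deltas[k - 1] is the k-th step parameter, so an item with m steps
    has m + 1 response categories (0..m).
    """
    # Category k gets the cumulative sum of (theta - delta_j) for j <= k;
    # category 0 has an empty sum, i.e. 0.
    exponents = [0.0]
    for d in step_deltas:
        exponents.append(exponents[-1] + (theta - d))
    top = max(exponents)  # subtract the max for numerical stability
    weights = [math.exp(e - top) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]

# A respondent located exactly at the item has even odds of a positive response.
p = rasch_prob(0.0, 0.0)

# Hypothetical four-category item (three steps).
probs = pcm_probs(0.5, [-1.0, 0.0, 1.0])
```

With a single step parameter the polytomous function reduces to the dichotomous one, which is a quick way to check an implementation.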
10
The Data

Courtesy of Behavior Change Consortium
Multiple data sources
(Ory, Jordan, & Bazzarre, 2002)
Stanford, OHSU, UT, U of Rochester, IIT
Multiple behaviors/interventions
exercise, diet, smoking

11
Scales for Mediators of Changed Behavior

self-efficacy scale
self-determination scale
decisional balance scale

12
Self-efficacy (SE) Scale for exercise

"a specific belief in one's ability to perform a
particular behavior" (Garcia & King, 1991, p.
396)
14 items that express the certainty the
respondent has that he or she could exercise
under various adverse conditions (see next slide)
Respondents rate each item in increments of 10, from
0, indicating "I cannot do it at all," to 100,
indicating "certain that I can do it"

13
Self-Efficacy Items
14
Self-Determination Scale

Assesses the motivating factors for pursuing a
particular behavior
a person who is self-determined has autonomous
reasons for behaving
15 items
Respondents rate how true a statement is, from 1
(not at all) to 7 (very)

15
Self-Determination Items, Examples
16
Decisional Balance (DB) Scale

Examines how people think about exercise
Ten items that acknowledge positive aspects of
exercise (pros)
Six items that focus on the negative aspects
(cons)
Respondents rate the importance of each statement from 1 (not
at all) to 5 (extremely)
Score is calculated by subtracting the cons total
from the pros total
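
The scoring rule above can be sketched as follows. The function name and the validation checks are assumptions for illustration; only the item counts, the 1 to 5 rating range, and the pros-minus-cons rule come from the slide.

```python
def decisional_balance(pros, cons):
    """Decisional balance score: pros total minus cons total.

    pros: ratings for the ten 'pro' items, cons: ratings for the six 'con'
    items, each rating an integer from 1 (not at all) to 5 (extremely).
    """
    if len(pros) != 10 or len(cons) != 6:
        raise ValueError("expected 10 pros ratings and 6 cons ratings")
    for rating in list(pros) + list(cons):
        if rating not in (1, 2, 3, 4, 5):
            raise ValueError("ratings must be integers from 1 to 5")
    return sum(pros) - sum(cons)

# Hypothetical respondent: rates every pro as 4 and every con as 2.
score = decisional_balance([4] * 10, [2] * 6)  # 40 - 12 = 28
```

Because the two subscales have different lengths, the score ranges from -20 (all pros rated 1, all cons rated 5) to 44 (the reverse).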

17
Decisional Balance Items, Examples

I would feel more comfortable with my body if I
exercised regularly
Regular exercise would help me have a more
positive outlook on life
I think I would be too tired to do my daily work
after exercising
Regular exercise would help me relieve tension
I would find it difficult to find an exercise
activity that I enjoy that is not affected by bad
weather

18
SE Scale results

11 categories--10 thresholds
Wright map

19
(No Transcript)
20
Standard Error of Measurement
21
Standard Errors of Measurement
22
Model fit
23
Framework for Comparison

Standards for Educational and
Psychological Tests
(AERA/APA/NCME, 1999)

24
Choosing a Model

CTT
same model always
IRM
Different models may fit persons and items better,
which may be informative
Alternative models allow exploration of
measurement implications

25
Choosing a Model: Partial Credit Model vs.
Rating Scale Model

RSM constrains all thresholds to same relative
distances apart for every item.
Likelihood ratio test for SE Scale
χ² = 336.23 (df = 117), p < .0001
Effect size (real difference)
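
The degrees of freedom reported for the likelihood ratio test can be checked with a short sketch. The parameter counts assume one free step parameter per threshold per item for the partial credit model, and one location per item plus a shared threshold structure with one threshold fixed for the rating scale model; the p-value uses the Wilson-Hilferty normal approximation rather than an exact chi-square routine. All function names are hypothetical.

```python
import math

def pcm_param_count(n_items, n_thresholds):
    # Partial credit model: every item has its own set of thresholds.
    return n_items * n_thresholds

def rsm_param_count(n_items, n_thresholds):
    # Rating scale model: one location per item plus one shared threshold
    # structure, with one threshold fixed by the identification constraint
    # (an assumption about the parameterization).
    return n_items + (n_thresholds - 1)

def chi2_upper_tail_approx(x, df):
    """Approximate P(chi-square with df degrees of freedom > x)
    using the Wilson-Hilferty cube-root normal approximation."""
    z = ((x / df) ** (1.0 / 3.0) - (1.0 - 2.0 / (9.0 * df))) / math.sqrt(2.0 / (9.0 * df))
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# SE Scale: 14 items, 11 categories, hence 10 thresholds per item.
df = pcm_param_count(14, 10) - rsm_param_count(14, 10)  # 140 - 23 = 117
p = chi2_upper_tail_approx(336.23, df)
```

Under these assumptions the difference in parameter counts reproduces the reported df of 117, and the approximate p-value falls far below .0001.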

26
(No Transcript)
27
Reliability: Reliability Coefficients

CTT
Cronbach's α = .91
IRM
MML reliability = .92
Comment
usually similar except under missing data contexts
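
As an illustration of the CTT coefficient, Cronbach's α can be computed from a respondents-by-items table of complete data. This is a minimal sketch with made-up toy ratings; it is not the SE Scale data and will not reproduce the .91 reported above.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha for complete data.

    rows: one list of item scores per respondent, all the same length.
    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals)
    """
    k = len(rows[0])
    if k < 2 or any(len(r) != k for r in rows):
        raise ValueError("need at least two items and equal-length rows")
    item_variances = [variance([r[i] for r in rows]) for i in range(k)]
    total_variance = variance([sum(r) for r in rows])
    return (k / (k - 1)) * (1.0 - sum(item_variances) / total_variance)

# Toy data (hypothetical): 5 respondents, 3 items.
toy = [
    [3, 4, 3],
    [5, 5, 4],
    [1, 2, 2],
    [4, 4, 5],
    [2, 3, 2],
]
alpha = cronbach_alpha(toy)
```

The formula needs complete data: a respondent with a skipped item has no total score. That is the missing-data context in which the CTT and IRM coefficients can diverge.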

28
Reliability: Standard Errors of Measurement

CTT: Constant value = 7.66
IRM
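
The contrast between a constant CTT standard error and a location-dependent IRM standard error can be sketched as follows. The CTT formula is the usual SD × sqrt(1 − reliability). The IRM function uses the dichotomous Rasch information formula as a simplification (the SE Scale itself is polytomous), and the item locations are hypothetical. The implied standard deviation is an inference from the reported 7.66 and α = .91, not a value stated in the slides.

```python
import math

def ctt_sem(sd, reliability):
    """CTT standard error of measurement: one value for every respondent."""
    return sd * math.sqrt(1.0 - reliability)

def rasch_sem(theta, deltas):
    """Asymptotic standard error of a location estimate under the
    dichotomous Rasch model: 1 / sqrt(test information at theta)."""
    info = 0.0
    for delta in deltas:
        p = 1.0 / (1.0 + math.exp(-(theta - delta)))
        info += p * (1.0 - p)  # item information for a Rasch item
    return 1.0 / math.sqrt(info)

# Raw-score SD implied by SEM = 7.66 and reliability = .91 (an inference).
implied_sd = 7.66 / math.sqrt(1.0 - 0.91)

# Hypothetical item locations, in logits.
deltas = [-2.0, -1.0, 0.0, 1.0, 2.0]
center = rasch_sem(0.0, deltas)   # respondent in the middle of the items
extreme = rasch_sem(4.0, deltas)  # respondent well above all items
```

In this sketch the standard error grows as the respondent moves away from the item locations, which CTT's single constant cannot show.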

29
Validity: Based on Instrument Content

CTT
Contributes little
IRM
Can contribute a lot (cf. work of Wright et al.)
Comment
SE Scale not a good example of content validity

30
High Self-Efficacy
Low Self-Efficacy
31
Validity: Based on Response Process

Respondents react to the instrument as projected.
Sources: think-alouds, exit interviews
No differences in CTT and IRM usage
Potential uses of IRM may emerge
Comment: No response processes with SE Scale data

32
Validity: Based on Internal Structure 1:
Structure of Construct

CTT
no usage
IRM
Well-established methodology for relating
theoretical construct to parameters in Wright
maps.
Comment
SE Scale not a good example of construct validity

33
Validity: Based on Internal Structure 2: Item
Analysis

CTT
item discrimination index
for categories, point biserial correlations
IRM
means of respondents who chose each category
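
The two item-analysis statistics named above can be sketched side by side for a single item. The data are made-up toy values and the function names are hypothetical. The point-biserial here is the Pearson correlation between a 0/1 indicator for choosing a category and the raw total score.

```python
import math
from collections import defaultdict

def pearson(x, y):
    """Pearson correlation; returns NaN if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)

def category_point_biserials(responses, totals):
    """CTT: correlate 'chose this category' (0/1) with the raw total score."""
    return {
        cat: pearson([1.0 if r == cat else 0.0 for r in responses], totals)
        for cat in sorted(set(responses))
    }

def category_mean_locations(responses, locations):
    """IRM: mean estimated location of the respondents in each category."""
    grouped = defaultdict(list)
    for r, loc in zip(responses, locations):
        grouped[r].append(loc)
    return {cat: sum(v) / len(v) for cat, v in sorted(grouped.items())}

# Toy data (hypothetical): six respondents, one three-category item.
responses = [0, 0, 1, 1, 2, 2]
totals = [5, 7, 10, 12, 16, 18]                # raw total scores
locations = [-1.5, -1.0, -0.2, 0.3, 1.1, 1.6]  # estimated locations (logits)

pbis = category_point_biserials(responses, totals)
means = category_mean_locations(responses, locations)
```

For a well-behaved item the mean locations increase with the category, while the point-biserials move from negative for the lowest category to positive for the highest.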

34
CTT Point-biserial Correlations
35
IRM Mean of Respondent Locations for Each
Category
36
Validity: Based on Internal Structure 3:
Differential Item Functioning

DIF occurs when respondents in different groups,
but with the same location, have different
probabilities of positive response on an item
CTT: no contribution (but could use, say,
logistic regression on raw scores, ignoring
measurement results)

37
Validity: Internal Structure: DIF (Continued)

IRM: Add an interaction parameter between item i and
group g, γ_ig, to the equation:
$$P(X_i = 1 \mid \theta, \delta_i, \gamma_{ig}) = \frac{e^{(\theta - \delta_i + \gamma_{ig})}}{1 + e^{(\theta - \delta_i + \gamma_{ig})}}$$
Test for statistical significance and effect size
of DIF for Gender in SE Scale
Overall χ² = 13.021 (df = 14), p > .5
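
A minimal sketch of the DIF-extended Rasch model follows. It assumes the interaction term enters the exponent with a plus sign (θ − δ_i + γ_ig); the sign is lost in the transcript, so treat that as an assumption. The function name and example values are hypothetical.

```python
import math

def dif_prob(theta, delta, gamma=0.0):
    """P(X_i = 1) with an item-by-group interaction gamma_ig added to the
    Rasch exponent (sign convention assumed: theta - delta + gamma)."""
    return 1.0 / (1.0 + math.exp(-(theta - delta + gamma)))

# Two respondents at the same location, in different groups.
theta, delta = 0.0, 0.0
p_reference = dif_prob(theta, delta, gamma=0.0)  # group with no interaction
p_focal = dif_prob(theta, delta, gamma=0.5)      # hypothetical DIF of 0.5 logits
```

When γ_ig is zero for every item and group the model reduces to the ordinary Rasch model, which is the pattern a non-significant overall test is consistent with.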

38
Validity: Based on Other Variables

CTT: Many external validity studies available for
SE Scale
IRM: Would give very similar results

39
Validity: Based on Consequences

Use of the instrument led to the projected
consequences.
CTT and IRM: Similar usage

40
Results for the SE Scale

Aligned with some but not all Standards
model
aspects of reliability
aspects of validity
Positive features include
categories cover respondents well, and behave
well
no threat from DIF (for gender)
Recommend
incorporating meaningful category labels
interpreting results at extremes with caution

41
Results of Comparing CTT and IRM

Three types
Similar usage and results
reliability coefficients, external validity
Not much usage currently, neutral results
response process, consequential validity
IRM used much more, extended results
choosing a model, standard error of measurement,
content validity, construct validity

42
Further Work with IRM

Equating
self-determination scale
two diverse groups
simulation study, comparing the effect of
different numbers of overlapping items
Multi-dimensional analyses
SD and DB scales
better fit, more information for researcher
improved reliability with few items

43
Conclusion