Item Response Modeling in Behavioral Research - PowerPoint PPT Presentation


1
Item Response Modeling in Behavioral Research
  • Diane Allen, Mark Wilson, and Jun Corser Li
  • University of California, Berkeley
  • March, 2005

2
Outline
  • Introduction
  • The Data
  • Results for the Self-Efficacy Scale
  • Comparison with Classical Test Theory
  • Further Work with IRM
  • Conclusion

3
Item Response Models: Connections
  • theory of test/instrument scores (CTT)
  • content referencing (e.g., Guttman)
  • IRM

4
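CTT vs. IRM: Equations
  • CTT: observed score = true score + error
    • $X = T + E$
  • IRM (Rasch), where θ is the respondent location
    and δ_i the location of item i
    • $P(X_i = 1 \mid \theta, \delta_i) = \dfrac{\exp(\theta - \delta_i)}{1 + \exp(\theta - \delta_i)}$
    • or, equivalently, $\ln \dfrac{P(X_i = 1 \mid \theta, \delta_i)}{1 - P(X_i = 1 \mid \theta, \delta_i)} = \theta - \delta_i$
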
5
CTT vs. IRM: Issues
  • CTT
    • Confounding of instrument and respondents
    • Assumption of linearity of scores
  • IRM
    • Model needs to fit; allows one to select among
      models
  • Comment: IRM addresses the CTT issues

6
The Rasch Model: Idea
[Figure: respondent location θ and item locations δi]

7
The Rasch Model: Graph
8
The Rasch Model: Dichotomous Function
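The function on this slide survives only as a title. As a minimal sketch, assuming the standard notation θ (respondent location) and δ (item location), both in logits, the dichotomous Rasch probability can be computed as below; the function name `rasch_probability` is illustrative, not from the presentation.

```python
import math


def rasch_probability(theta: float, delta: float) -> float:
    """Probability of a positive response (X = 1) under the dichotomous Rasch model.

    theta: respondent location in logits (assumed notation)
    delta: item location in logits (assumed notation)
    """
    # exp(theta - delta) / (1 + exp(theta - delta)), written in the
    # algebraically equivalent form 1 / (1 + exp(-(theta - delta))).
    return 1.0 / (1.0 + math.exp(-(theta - delta)))


# When respondent and item share a location the probability is 0.5;
# each extra logit of (theta - delta) multiplies the odds of success by e.
for theta in (-2.0, 0.0, 2.0):
    print(f"theta={theta:+.1f}  P(X=1 | delta=0) = {rasch_probability(theta, 0.0):.3f}")
```

The loop prints approximately 0.119, 0.500 and 0.881, tracing the S-shaped curve that such graphs typically show.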
9
The Rasch Model: Polytomous Function
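The polytomous extension can be sketched the same way. This assumes the partial credit form (the model the SE Scale analysis later compares with the rating scale model), in which item i has thresholds δ_i1 to δ_im and the probability of category k is proportional to exp of the sum over j ≤ k of (θ - δ_ij). The function name is hypothetical.

```python
import math
from typing import List, Sequence


def pcm_category_probabilities(theta: float, thresholds: Sequence[float]) -> List[float]:
    """Category probabilities P(X = 0), ..., P(X = m) under the partial credit model.

    theta: respondent location in logits
    thresholds: item step parameters delta_i1, ..., delta_im (m = number of thresholds)
    """
    # Numerator for category k is exp(sum_{j=1..k} (theta - delta_ij));
    # the empty sum for k = 0 gives exp(0) = 1.
    cumulative = [0.0]
    for delta in thresholds:
        cumulative.append(cumulative[-1] + (theta - delta))
    # Subtract the largest exponent before exponentiating for numerical stability.
    largest = max(cumulative)
    numerators = [math.exp(c - largest) for c in cumulative]
    total = sum(numerators)
    return [n / total for n in numerators]


# Usage with made-up thresholds (not estimates from the SE Scale data):
probs = pcm_category_probabilities(0.5, [-1.0, 0.0, 1.0])
print([round(p, 3) for p in probs])
```

With a single threshold the function reduces to the dichotomous Rasch probability, and an item with 10 thresholds yields 11 category probabilities.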
10
The Data
  • Courtesy of the Behavior Change Consortium
  • Multiple data sources (Ory, Jordan, & Bazzarre, 2002)
    • Stanford, OHSU, UT, U of Rochester, IIT
  • Multiple behaviors/interventions
    • exercise, diet, smoking

11
Scales for Mediators of Changed Behavior
  • self-efficacy scale
  • self-determination scale
  • decisional balance scale

12
Self-efficacy (SE) Scale for exercise
  • "a specific belief in one's ability to perform a
    particular behavior" (Garcia & King, 1991, p. 396)
  • 14 items that express the certainty the
    respondent has that he or she could exercise
    under various adverse conditions (see next slide)
  • Respondents rate each item in increments of 10,
    from 0, indicating "I cannot do it at all," to
    100, indicating "certain that I can do it"
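Ratings of 0, 10, ..., 100 give 11 ordered response options, which an IRM analysis treats as categories 0 to 10. A minimal recoding sketch, assuming ratings are stored as integers; both function names are hypothetical.

```python
from typing import List


def se_rating_to_category(rating: int) -> int:
    """Map a 0-100 self-efficacy rating (in steps of 10) to a category index 0-10."""
    if rating < 0 or rating > 100 or rating % 10 != 0:
        raise ValueError(f"expected a multiple of 10 between 0 and 100, got {rating}")
    return rating // 10


def recode_responses(ratings: List[int]) -> List[int]:
    """Recode one respondent's item ratings to category indices."""
    return [se_rating_to_category(r) for r in ratings]


print(recode_responses([0, 30, 100]))
```

Rejecting off-grid values such as 55, rather than rounding them, keeps data-entry errors visible before model fitting.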

13
Self-Efficacy Items
14
Self-Determination Scale
  • Assesses the motivating factors for pursuing a
    particular behavior
  • a person who is self-determined has autonomous
    reasons for behaving
  • 15 items
  • Respondents rate how true a statement is, from 1
    ("not at all") to 7 ("very")

15
Self-Determination Items, Examples
16
Decisional Balance (DB) Scale
  • Examines how people think about exercise
  • Ten items that acknowledge positive aspects of
    exercise (pros)
  • Six items that focus on the negative aspects
    (cons)
  • Respondents rate the importance of each statement
    from 1 ("not at all") to 5 ("extremely")
  • Score is calculated by subtracting the cons total
    from the pros total
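The scoring rule above is simple arithmetic. A minimal sketch, assuming unweighted raw totals and complete responses; the function and constant names are hypothetical.

```python
from typing import Sequence

N_PROS = 10  # items acknowledging positive aspects of exercise
N_CONS = 6   # items focusing on negative aspects of exercise


def decisional_balance_score(pros: Sequence[int], cons: Sequence[int]) -> int:
    """Pros total minus cons total; each item is rated 1 ("not at all") to 5 ("extremely")."""
    if len(pros) != N_PROS or len(cons) != N_CONS:
        raise ValueError("expected 10 pros ratings and 6 cons ratings")
    for rating in (*pros, *cons):
        if not 1 <= rating <= 5:
            raise ValueError(f"rating out of range 1-5: {rating}")
    return sum(pros) - sum(cons)


print(decisional_balance_score([4] * 10, [2] * 6))
```

Because the two subscales have different numbers of items, the score ranges from 10 - 30 = -20 to 50 - 6 = 44, so it is not symmetric around zero.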

17
Decisional Balance Items, Examples
  • I would feel more comfortable with my body if I
    exercised regularly
  • Regular exercise would help me have a more
    positive outlook on life
  • I think I would be too tired to do my daily work
    after exercising
  • Regular exercise would help me relieve tension
  • I would find it difficult to find an exercise
    activity that I enjoy that is not affected by bad
    weather

18
SE Scale results
  • 11 categories, 10 thresholds
  • Wright map

19
(No Transcript)
20
Standard Error of Measurement
21
Standard Errors of Measurement
22
Model fit
23
Framework for Comparison
  • Standards for Educational and Psychological
    Testing (AERA/APA/NCME, 1999)

24
Choosing a Model
  • CTT
    • the same model always
  • IRM
    • different models may fit persons and items
      better, which may be informative
    • alternative models allow exploration of
      measurement implications

25
Choosing a Model: Partial Credit Model vs.
Rating Scale Model
  • The RSM constrains all thresholds to the same
    relative distances apart for every item.
  • Likelihood ratio test for the SE Scale
    • $\chi^2 = 336.23$ (df = 117), $p < .0001$
  • Effect size (real difference)
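The likelihood ratio test refers the difference in deviance (-2 log-likelihood) between the nested models to a χ² distribution whose df equals the difference in parameter counts. One count consistent with the reported df = 117 is 14 × 10 = 140 threshold parameters for the PCM versus 14 item locations plus 9 free thresholds (10 constrained to sum to zero) = 23 for the RSM. The standard-library sketch below computes the upper-tail p-value; the function name is hypothetical, and a library routine such as SciPy's `scipy.stats.chi2.sf` does the same job.

```python
import math


def chi2_upper_tail(x: float, df: int) -> float:
    """P(chi-square with df degrees of freedom > x), for integer df >= 1.

    Uses the recurrence Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) e^(-x/2) / Gamma(k/2 + 1),
    starting from Q(x; 1) = erfc(sqrt(x/2)) for odd df or Q(x; 2) = e^(-x/2) for even df.
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half_x = x / 2.0
    if df % 2 == 1:
        k, q = 1, math.erfc(math.sqrt(half_x))
    else:
        k, q = 2, math.exp(-half_x)
    while k < df:
        # Evaluate each term on the log scale to avoid overflow for large df.
        q += math.exp((k / 2.0) * math.log(half_x) - half_x - math.lgamma(k / 2.0 + 1.0))
        k += 2
    return min(q, 1.0)


# PCM vs. RSM for the SE Scale, using the values reported on the slide.
p_value = chi2_upper_tail(336.23, 117)
print(f"chi2 = 336.23, df = 117, p = {p_value:.2e}")
```

With the slide's values the p-value is far below .0001. Applied to the gender DIF statistic reported later (13.021, df = 14), the same function gives about .52, consistent with p > .5.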

26
(No Transcript)
27
Reliability: Reliability Coefficients
  • CTT
    • Cronbach's α = .91
  • IRM
    • marginal maximum likelihood (MML) reliability = .92
  • Comment
    • usually similar, except in contexts with missing
      data
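For reference, Cronbach's α can be computed from a respondents-by-items score matrix as α = k/(k - 1) × (1 - Σ item variances / total-score variance). This minimal sketch assumes complete data (every respondent answered every item), which is one plausible reason CTT and IRM coefficients can diverge when data are missing; the function name is hypothetical.

```python
from statistics import pvariance
from typing import Sequence


def cronbach_alpha(scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a complete respondents-by-items score matrix.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    n_items = len(scores[0])
    if n_items < 2:
        raise ValueError("alpha needs at least two items")
    # Population variances are used throughout; the n vs. n - 1 factor cancels in the ratio.
    item_variances = [pvariance([row[i] for row in scores]) for i in range(n_items)]
    total_variance = pvariance([sum(row) for row in scores])
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


# Toy data (not from the SE Scale): 4 respondents, 3 items.
print(round(cronbach_alpha([[3, 4, 3], [5, 5, 4], [1, 2, 2], [4, 3, 4]]), 3))
```

Identical items give α = 1 and mutually uncorrelated items give α = 0.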

28
Reliability: Standard Errors of Measurement
  • CTT: constant value, 7.66
  • IRM
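The IRM entry on this slide has no surviving text. This sketch contrasts the two quantities using textbook formulas: the CTT SEM is SD × sqrt(1 - reliability), one value for everyone on the raw-score scale, whereas the IRM SEM at location θ is 1 / sqrt(test information). For Rasch-family models an item's information at θ equals the conditional variance of its score. It assumes the partial credit model; the function names and threshold values are hypothetical.

```python
import math
from typing import List, Sequence


def ctt_sem(sd_observed: float, reliability: float) -> float:
    """Classical SEM: a single value applied to every respondent."""
    return sd_observed * math.sqrt(1.0 - reliability)


def pcm_probabilities(theta: float, thresholds: Sequence[float]) -> List[float]:
    # Category k has numerator exp(sum_{j<=k} (theta - delta_j)); k = 0 gives exp(0).
    cumulative = [0.0]
    for delta in thresholds:
        cumulative.append(cumulative[-1] + theta - delta)
    top = max(cumulative)
    weights = [math.exp(c - top) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


def irm_sem(theta: float, items: Sequence[Sequence[float]]) -> float:
    """Model-based SEM at location theta: 1 / sqrt(test information).

    Each item's information at theta is the conditional variance of its score,
    sum_k k^2 P_k - (sum_k k P_k)^2.
    """
    information = 0.0
    for thresholds in items:
        probs = pcm_probabilities(theta, thresholds)
        mean = sum(k * p for k, p in enumerate(probs))
        mean_sq = sum(k * k * p for k, p in enumerate(probs))
        information += mean_sq - mean * mean
    return 1.0 / math.sqrt(information)


# Hypothetical instrument: 14 items sharing 10 evenly spaced thresholds
# (illustrative values only, not the SE Scale estimates).
items = [[-1.8 + 0.4 * j for j in range(10)] for _ in range(14)]
for theta in (-4.0, 0.0, 4.0):
    print(f"theta={theta:+.1f}  SEM={irm_sem(theta, items):.3f} logits")
```

The IRM SEM is smallest where the thresholds are concentrated and grows toward the extremes, one reason to interpret extreme results with caution. The two SEMs are on different scales (raw score vs. logits), so they should not be compared numerically.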

29
Validity: Based on Instrument Content
  • CTT
    • contributes little
  • IRM
    • can contribute a lot (cf. the work of Wright et al.)
  • Comment
    • the SE Scale is not a good example of content
      validity

30
High Self-Efficacy
Low Self-Efficacy
31
Validity: Based on Response Process
  • Respondents react to the instrument as projected.
  • Sources: think-alouds, exit interviews
  • No differences in CTT and IRM usage
  • Potential uses of IRM may emerge
  • Comment: no response-process data were available
    for the SE Scale

32
Validity: Based on Internal Structure 1
Structure of Construct
  • CTT
    • no usage
  • IRM
    • well-established methodology for relating the
      theoretical construct to parameters in Wright
      maps
  • Comment
    • the SE Scale is not a good example of construct
      validity

33
Validity: Based on Internal Structure 2
Item Analysis
  • CTT
    • item discrimination index
    • for categories, point-biserial correlations
  • IRM
    • mean locations of respondents who chose each
      category

34
CTT: Point-Biserial Correlations
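A category's point-biserial correlation is the Pearson correlation between a 0/1 indicator ("chose this category on this item") and a score for the rest of the instrument. This minimal sketch assumes the rest score (total minus the item itself); the slide does not say whether the total or the rest score was used. Function names are hypothetical.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; NaN if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def category_point_biserials(
    responses: Sequence[Sequence[int]], item: int, n_categories: int
) -> List[float]:
    """Point-biserial for each category of one item against the rest score.

    responses: respondents-by-items matrix of category scores (0 .. n_categories - 1)
    """
    rest = [sum(row) - row[item] for row in responses]
    results = []
    for c in range(n_categories):
        chose = [1.0 if row[item] == c else 0.0 for row in responses]
        results.append(pearson(chose, rest))  # NaN if nobody (or everybody) chose c
    return results


data = [[0, 0, 0], [1, 1, 1], [2, 2, 2], [0, 1, 0], [2, 1, 2]]
print(category_point_biserials(data, item=0, n_categories=3))
```

For a well-behaved item, low categories tend to correlate negatively and high categories positively with the rest score.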
35
IRM: Mean of Respondent Locations for Each Category
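The IRM counterpart averages the estimated respondent locations (for example, in logits) within each category of an item. If the categories behave well, these means should increase with the category. A minimal sketch, assuming person location estimates are already available from the fitted model; function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Sequence


def mean_location_by_category(
    categories: Sequence[int], locations: Sequence[float]
) -> Dict[int, float]:
    """Mean estimated respondent location for each category chosen on one item.

    categories: the category each respondent chose on the item
    locations: each respondent's estimated location from the fitted model
    """
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for c, theta in zip(categories, locations):
        sums[c] += theta
        counts[c] += 1
    return {c: sums[c] / counts[c] for c in sorted(counts)}


def is_ordered(means: Dict[int, float]) -> bool:
    """True if the means strictly increase with category."""
    values = [means[c] for c in sorted(means)]
    return all(a < b for a, b in zip(values, values[1:]))


means = mean_location_by_category([0, 0, 1, 2, 2], [-1.0, -0.5, 0.2, 1.0, 2.0])
print(means, is_ordered(means))
```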
36
Validity: Based on Internal Structure 3
Differential Item Functioning
  • DIF occurs when respondents in different groups,
    but with the same location, have different
    probabilities of a positive response on an item
  • CTT: no contribution (but one could use, say,
    logistic regression on raw scores, ignoring the
    measurement results)

37
Validity: Internal Structure, DIF (Continued)
  • IRM: add an interaction parameter between item i
    and group g, $\gamma_{ig}$, to the equation
    • $P(X_i = 1 \mid \theta, \delta_i, \gamma_{ig}) = \dfrac{\exp(\theta - \delta_i + \gamma_{ig})}{1 + \exp(\theta - \delta_i + \gamma_{ig})}$
  • Test for statistical significance and effect size
    of DIF for gender in the SE Scale
    • Overall $\chi^2 = 13.021$ (df = 14), $p > .5$
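The DIF-extended model can be sketched as a small change to the Rasch probability. This assumes the interaction enters as θ - δ_i + γ_ig, so a positive γ_ig makes the item easier for group g. The function name and the γ value in the usage lines are hypothetical.

```python
import math


def dif_probability(theta: float, delta: float, gamma: float) -> float:
    """P(X_i = 1 | theta, delta_i, gamma_ig) with an item-by-group interaction term.

    gamma = 0 recovers the ordinary Rasch probability; a nonzero gamma shifts the
    item's effective location for group g only.
    """
    return 1.0 / (1.0 + math.exp(-(theta - delta + gamma)))


# Two respondents at the same location, in groups without and with DIF on this item.
theta, delta = 0.0, 0.5
p_reference = dif_probability(theta, delta, gamma=0.0)  # gamma_ig = 0
p_focal = dif_probability(theta, delta, gamma=0.4)      # hypothetical gamma_ig = 0.4
print(f"reference: {p_reference:.3f}  focal: {p_focal:.3f}")
```

Respondents at the same θ then have different success probabilities, which is exactly the definition of DIF given on the previous slide. Testing whether all γ_ig are zero yields the overall χ² reported above.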

38
Validity: Based on Other Variables
  • CTT: many external validity studies are available
    for the SE Scale
  • IRM: would give very similar results

39
Validity: Based on Consequences
  • Use of the instrument led to the projected
    consequences.
  • CTT and IRM: similar usage

40
Results for the SE Scale
  • Aligned with some but not all Standards
    • model
    • aspects of reliability
    • aspects of validity
  • Positive features include
    • categories cover respondents well and behave
      well
    • no threat from DIF (for gender)
  • Recommend
    • incorporating meaningful category labels
    • interpreting results at the extremes with caution

41
Results of Comparing CTT and IRM
  • Three types
    • Similar usage and results
      • reliability coefficients, external validity
    • Not much usage currently, neutral results
      • response process, consequential validity
    • IRM used much more, extended results
      • choosing a model, standard error of
        measurement, content validity, construct
        validity

42
Further Work with IRM
  • Equating
    • self-determination scale
    • two diverse groups
    • simulation study comparing the effect of
      different numbers of overlapping items
  • Multidimensional analyses
    • SD and DB scales
    • better fit, more information for the researcher
    • improved reliability with few items

43
Conclusion
  • IRM has strengths that can benefit behavioral
    researchers
    • refinement of the construct
    • dimensionality
    • different models
    • aligning persons and items on the same scale
    • item- and person-specific standard errors of
      measurement