Scale development Theory and applications - PowerPoint PPT Presentation

Transcript and Presenter's Notes



1
Scale development: Theory and applications
  • Chapter 7
  • Item Response Theory

Samuel O. Ortiz, Ph.D., St. John's University
2
Item Response Theory (IRT)
  • Is an alternative to classical measurement theory
    (CMT) or classical test theory (CTT)
  • In CMT, the observed score is the result of the
    respondent's true score plus error
  • In CMT error is not differentiated but rather
    collected in a single error term
  • In IRT, error is differentiated more finely,
    particularly with respect to item characteristics

3
Item Response Theory (IRT)
  • In CMT, focus is on composites, or scales
  • In IRT, focus is on individual items and their
    characteristics
  • IRT is used mainly for ability tests with
    dichotomous responses (e.g., the SAT) but can be
    applied to other domains.
  • In CMT, items are aggregated to gain reliability,
    which is achieved through redundancy
  • In IRT items are individually evaluated in the
    search for better items, better relationship to
    the attribute.
  • Adding IRT items increases the ability to
    differentiate levels of the attribute but does not
    increase reliability

4
Item Response Theory (IRT)
  • In CMT, items share a common cause (the same is
    true in IRT) and are thus similar to each other
  • But IRT items are designed to tap different
    degrees or levels of the attribute
  • IRT seeks to establish certain characteristics of
    items irrespective of who completes them, like a
    scale that measures weight or a ruler that
    measures inches

5
Item Response Theory (IRT)
  • Different Models
  • Main difference is in number of parameters of
    concern
  • Most common is the three-parameter model: item
    difficulty, capacity to discriminate, and
    susceptibility to false positives
  • Rasch scaling is concerned only with item
    difficulty
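A minimal sketch of these models in Python, assuming the standard three-parameter logistic (3PL) form; the conventional symbols a, b, and c (not named in the slides) denote discrimination, difficulty, and pseudo-guessing:

```python
import math

def icc_3pl(theta, a, b, c):
    """Three-parameter logistic item characteristic curve.
    a = discrimination (slope), b = difficulty (location),
    c = pseudo-guessing (susceptibility to false positives)."""
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

def icc_rasch(theta, b):
    # Rasch scaling keeps only difficulty: fix a = 1 and c = 0
    return icc_3pl(theta, a=1.0, b=b, c=0.0)
```

Under this parameterization, a Rasch item is simply a 3PL item with the slope and guessing floor held fixed, which is why only difficulty remains as a free parameter.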

6
Item Response Theory (IRT)
  • Item Difficulty
  • Refers to the level of the attribute being
    measured that is associated with the transition
    from failing to passing that item
  • Idea is to construct items with different
    degrees of difficulty
  • Should be able to calibrate difficulty of items
    independent of who is responding

7
Item Response Theory (IRT)
  • Item Difficulty
  • If this is accomplished, then passing the item
    represents an absolute level of the attribute and
    has a constant meaning with respect to the
    attribute; that is, you know what amount of the
    attribute is required
  • The attribute is assessed via a common metric
    that is not subject to individual differences
    other than the variable of interest.
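One consequence of difficulty calibrated on a common metric is that the ordering of items by difficulty holds for every respondent. A small illustration, assuming Rasch-style items with unit slope and hypothetical difficulty values:

```python
import math

def p_pass(theta, b):
    # Rasch-style item: pass probability depends only on theta - b
    return 1 / (1 + math.exp(-(theta - b)))

easy, hard = -1.0, 1.5  # hypothetical calibrated difficulties
# Whatever the respondent's level theta, the easier item is always
# the more likely one to be passed, so difficulty has a constant
# meaning independent of who responds.
```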

8
Item Response Theory (IRT)
  • Item Discrimination
  • Refers to degree to which an item unambiguously
    classifies a response as a pass or fail
  • The less ambiguity about whether someone passed
    or failed, the higher the item discrimination
  • Not as easy as it sounds, even on ability tests,
    e.g., writing samples on the WJ III

9
Item Response Theory (IRT)
  • Item Discrimination
  • An item that discriminates very well has a very
    narrow portion of the range of the phenomenon of
    interest in which the results are ambiguous.
  • A less discriminating item has a larger region
    of ambiguity (similar to the issue of
    reliability, but not identical)
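Under a two-parameter logistic model (an assumption here, not stated in the slides), the width of this ambiguous region has a closed form, and doubling discrimination halves it:

```python
import math

def ambiguous_width(a, lo=0.25, hi=0.75):
    """Width of the theta range over which a 2PL item's pass
    probability sits between lo and hi (the ambiguous zone).
    Inverting P = 1 / (1 + exp(-a * (theta - b))) gives
    theta = b + logit(P) / a, so the width is
    (logit(hi) - logit(lo)) / a, independent of difficulty b."""
    logit = lambda p: math.log(p / (1 - p))
    return (logit(hi) - logit(lo)) / a
```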

10
Item Response Theory (IRT)
  • False Positives
  • A false positive is a response indicating that
    some characteristic or degree of an attribute
    exists when in actuality it does not.
  • False positives are difficult to produce on an
    ability test where an answer is either correct or
    not, but they can happen to some extent through
    guessing or luck on certain tests (blocks falling
    together in the right place on Block Design, or
    guessing answers to arithmetic questions)
  • In cases where guessing or false positives are
    not an issue (e.g., measuring weight), a
    two-parameter model may be enough (difficulty and
    discrimination)
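The difference between the two- and three-parameter models shows up at the low end of the attribute. A minimal sketch, assuming the standard logistic forms, where c acts as the floor produced by guessing:

```python
import math

def p_2pl(theta, a, b):
    # two-parameter model: difficulty and discrimination only
    return 1 / (1 + math.exp(-a * (theta - b)))

def p_3pl(theta, a, b, c):
    # c is the lower asymptote: the chance of passing by luck alone
    return c + (1 - c) * p_2pl(theta, a, b)

# For a respondent with essentially none of the attribute, the 2PL
# pass probability vanishes, while a 4-option multiple-choice item
# (c = 0.25) still bottoms out near 25%.
```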

11
Item Response Theory (IRT)
  • Summary
  • The parameters represent sources of measurement
    error
  • difficulty of the item is inappropriate (too hard
    or too easy)
  • the area of ambiguity between a pass and a fail
    is too large
  • the item indicates that the attribute is present
    when it really is not
  • IRT quantifies these sources of error so that
    items can be selected that will perform well in a
    given context

12
Item Response Theory (IRT)
  • Item Characteristic Curves (ICC)
  • When parameters are quantified, item
    characteristic curves are formed as a graphical
    summary of them
  • X axis typically represents the strength of the
    characteristic or attribute
  • Y axis represents probability of passing the
    item
  • Easiest to see when comparing curves

13
Item Response Theory (IRT)
  • Difficulty: the point at which 50% pass differs
    between the items, a factual difference (B is
    harder, i.e., more difficult)

[Figure: item characteristic curves for items A and B. Y axis = likelihood of passing (0 to 100%); X axis = strength of attribute. B's 50% pass point lies farther right than A's.]
14
Item Response Theory (IRT)
  • Discrimination: a steeper slope around the 50%
    pass point means a smaller increase in the
    attribute leads to passing
  • Thus, because the region of ambiguity for A
    (blue) is smaller than the region for B (green),
    A discriminates better

[Figure: item characteristic curves for items A and B. Y axis = likelihood of passing (0 to 100%); X axis = strength of attribute. Item A has a narrow ambiguous region around its 50% point; item B has a wide one.]
15
Item Response Theory (IRT)
  • False positives: the point at which a curve
    intersects the Y axis indicates the percentage of
    passes at the zero level of the attribute
  • Thus, a false positive is the probability of
    passing without having any of the attribute
  • Lower is better; thus, A has fewer false
    positives than B and is the better item

[Figure: item characteristic curves for items A and B. Y axis = likelihood of passing (0 to 100%); X axis = strength of attribute. Item A intercepts the Y axis at about 5%, item B at about 17%.]
16
Item Response Theory (IRT)
  • Additional Issues in IRT
  • The utility of IRT is that items can be made for
    special groups or populations, matching the
    parameters of performance to the expected levels
    of the attribute of interest.
  • High-stakes decision making should use items with
    better discrimination (low ambiguity) and better
    (lower) rates of false positive responses.

17
Item Response Theory (IRT)
  • Differences between IRT and CMT
  • In CMT, we know whether an item performs well or
    poorly, but we don't know the reasons why.
  • In IRT, we can pinpoint the nature of an item's
    deficiencies, or its strengths and weaknesses
    compared to other items.
  • Like CMT, IRT does not determine the
    characteristics of items; it only quantifies them.
  • The work of developing good items still rests on
    the researcher. IRT doesn't write good items or
    make bad ones better.

18
Item Response Theory (IRT)
  • Differences between IRT and CMT
  • CMT keeps development simple at the cost of a
    more general notion of error. IRT gains precision
    about the nature of error but adds complexity.
    Not easy to do, generally speaking.
  • Item characteristics must not be associated with
    attribute-independent sample characteristics such
    as gender, age, or other variables that SHOULD be
    uncorrelated with the one being measured. The
    same assumption holds in CMT: the effect of a
    single latent variable.

19
Item Response Theory (IRT)
  • Differences between IRT and CMT
  • It is difficult to start out with IRT, since the
    true level of the attribute, theta (θ), is
    unknown. One needs many respondents, going back
    and forth to test items and reveal the nature of
    the attribute.
  • Anchoring items, those that perform equivalently
    across groups, can serve as calibration points.
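Once item parameters have been calibrated, half of the back-and-forth can be sketched directly: estimating θ for one respondent given fixed item parameters. A minimal grid-search sketch, assuming a 2PL model and hypothetical calibrations:

```python
import math

def p_pass(theta, a, b):
    return 1 / (1 + math.exp(-a * (theta - b)))

def estimate_theta(responses, items):
    """Grid-search maximum-likelihood estimate of theta, given a 0/1
    response vector and already-calibrated 2PL parameters (a, b).
    In full calibration, theta and the item parameters are unknown
    together, so estimation alternates between the two over large
    samples."""
    grid = [g / 100 for g in range(-400, 401)]  # theta in [-4, 4]
    def loglik(theta):
        ll = 0.0
        for x, (a, b) in zip(responses, items):
            p = p_pass(theta, a, b)
            ll += math.log(p if x else 1 - p)
        return ll
    return max(grid, key=loglik)

items = [(1.0, -1.0), (1.0, 0.0), (1.0, 1.0)]  # hypothetical calibrations
```

Passing more (and harder) items pushes the estimate upward, which is the sense in which responses "reveal the nature of the attribute" once items are anchored.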

20
Item Response Theory (IRT)
  • When to use IRT
  • Hierarchical items: in CMT, items are roughly
    parallel. In reality, this may not be the case,
    so where hierarchical phenomena are of interest,
    an IRT model may be best:
  • I can ambulate independently
  • I can ambulate only with an assistive device
  • I cannot walk at all

21
Item Response Theory (IRT)
  • When to use IRT
  • Items are not parallel: an answer of yes to (c)
    means (a) and (b) cannot also be answered yes.
    This is not the same as hierarchical responses
    on, e.g., Likert scales, where response options
    should lead to similar ratings or the same level
    of the attribute. IRT is more similar to Guttman
    or Thurstone scaling.
  • Another advantage of IRT with hierarchical items
    is the possibility of developing item banks
    tailored to specific ranges of the attribute,
    ability, or development (e.g., age or grade
    level)
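The mutually exclusive structure of the ambulation items can be sketched as a simple consistency check; the scoring function here is a hypothetical illustration, not something the slides prescribe:

```python
MOBILITY_ITEMS = [  # the hierarchy from the slide, least to most impaired
    "I can ambulate independently",
    "I can ambulate only with an assistive device",
    "I cannot walk at all",
]

def mobility_level(responses):
    """Items are mutually exclusive: endorsing one rules out the
    others. Returns the index of the single endorsed item, or None
    when the pattern is inconsistent (zero or multiple yes answers)."""
    endorsed = [i for i, yes in enumerate(responses) if yes]
    return endorsed[0] if len(endorsed) == 1 else None
```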

22
Item Response Theory (IRT)
  • When to use IRT
  • By focusing on an appropriate attribute level,
    items can be selected within the individual's
    range that discriminate best. This eliminates the
    need to administer all out-of-range items
    (reflected as basals and ceilings on many tests).
  • Psychological variables are not typically
    assessed via IRT, except for cognitive
    abilities/intelligence. But some may be
    well-suited to IRT techniques, e.g.,
    self-efficacy (or other notions that are
    considered stable and intrinsic properties of
    human beings). But even such variables can be
    measured quite well by CMT.
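Selecting items within the individual's range can be sketched with the standard 2PL item-information function, which peaks near an item's difficulty; the item bank values are hypothetical:

```python
import math

def p_pass(theta, a, b):
    return 1 / (1 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    # 2PL Fisher information: a^2 * P * (1 - P), largest near b
    p = p_pass(theta, a, b)
    return a * a * p * (1 - p)

def pick_item(theta_hat, bank):
    """Pick the item from a calibrated bank of (a, b) pairs that is
    most informative at the current ability estimate, so clearly
    out-of-range items are never administered."""
    return max(bank, key=lambda ab: item_information(theta_hat, *ab))

bank = [(1.0, -2.0), (1.0, 0.0), (1.0, 2.0)]  # easy / medium / hard
```

This is the idea behind adaptive testing: a basal or ceiling rule is just a coarse manual version of always choosing the most informative next item.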

23
Item Response Theory (IRT)
  • Differential Item Functioning (DIF)
  • Useful when research questions indicate that
    it's necessary to distinguish differences in
    group membership and item characteristics.
  • Mostly it boils down to figuring out why an item
    performs differently across groups that are
    actually equivalent in the attribute being
    assessed.
  • We want items to be stable and perform the same
    across groups, but empirical verification may be
    necessary when groups differ on characteristics
    such as age or grade (DeVellis also mentions
    culture but doesn't explain much; likely English
    language proficiency is another variable here).
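The core of a DIF check, comparing groups that are equivalent on the attribute, can be sketched by stratifying on total score. This is a heavily simplified illustration of the matching idea behind Mantel-Haenszel-style procedures, not the full statistic:

```python
from collections import defaultdict

def dif_gaps(records):
    """records: iterable of (group, total_score, item_passed) with
    group in {"ref", "focal"}. Within each total-score stratum (a
    rough proxy for equal standing on the attribute), compare the
    studied item's pass rate across groups; consistently large gaps
    flag possible DIF."""
    strata = defaultdict(lambda: {"ref": [0, 0], "focal": [0, 0]})
    for group, score, passed in records:
        cell = strata[score][group]
        cell[0] += int(passed)  # passes in this stratum and group
        cell[1] += 1            # respondents in this stratum and group
    gaps = {}
    for score, cells in strata.items():
        (rp, rn), (fp, fn) = cells["ref"], cells["focal"]
        if rn and fn:
            gaps[score] = rp / rn - fp / fn
    return gaps
```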

24
Item Response Theory (IRT)
  • Differential Item Functioning (DIF)
  • Note that the identification of DIF can be
    interpreted in two ways: the item is flawed (due
    to the influence of another co-variable), or the
    two groups actually differ on the attribute being
    measured.
  • Often, hierarchical items and DIF analysis are
    combined, e.g., in educational assessment, health
    outcomes, or other cases where it's important to
    differentiate true group differences for
    variables where endpoints or hierarchies matter.

25
Item Response Theory (IRT)
  • Conclusion
  • Both IRT and CMT continue to be useful. One is
    not necessarily better than the other.
  • Just because an item has performed well in one
    context doesn't mean it will in another.
  • According to DeVellis, "Having credible
    independent knowledge of the attributes being
    measured is a requirement of IRT that is
    difficult to satisfy strictly but that can be
    very adequately approximated with repeated
    testing of large and heterogeneous samples."