Item Response Models


1
Item Response Models
  • Assumptions
  • Basic statistical models
  • Caveats and interpretations

Harvey Goldstein, University of Bristol, h.goldstein_at_bristol.ac.uk
2
Basic assumptions
  • Consider a test with p binary (correct/incorrect)
    responses.
  • Each item is assumed to reflect one or more
    underlying (latent) dimensions of achievement
    or ability.
  • So let us start with an assumed 1-dimensional
    test, say of mathematics, with 40 items.
  • How do we get a value (score) on the mathematics
    scale from a set of 40 (1/0) responses from each
    individual?
  • Well, we set up a model.

3
Some simple models
  • First some basic notation (following Goldstein
    and Wood, 1989).
  • Let θ_j represent the latent (factor) score
    for individual j. Let π_ij be the probability
    that individual j responds correctly to item i.
  • Then a simple item response model is

This is just a binary response factor analysis
model.
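Written out, the model on this slide might look as follows. This is a sketch under assumed notation, not a copy of the slide: θ_j is the latent score of individual j, π_ij the probability of a correct response to item i, and a_i and b_i are a hypothetical item intercept and loading.

```latex
\pi_{ij} = a_i + b_i \theta_j , \qquad i = 1, \dots, p
```

The linear form mirrors an ordinary one-factor model, with the probability of a correct response in place of a continuous observed variable. Replacing the left-hand side by logit(π_ij) would give a logistic version.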
Goldstein, H. and Wood, R. (1989). Five decades of item response modelling. British Journal of Mathematical and Statistical Psychology 42, 139-167.
4
A potted history
  • Lawley (1944) really started it off.
  • Lord (1980) promoted the term "item response
    theory" (IRT) as opposed to classical item
    analysis.
  • Now IRT is the standard procedure for test
    construction.
  • Note that the theory is statistical, not
    substantive.
  • Technical elaborations include:
      • parameters for guessing
      • partial credit (degrees of correctness)
        responses
      • multidimensional models
  • BUT the workhorse is still the Lord model (with
    the factor assumed to be a random rather than
    fixed variable), as follows

Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, New Jersey: Lawrence Erlbaum Associates.
Lawley, D. N. (1943). The application of the maximum likelihood method to factor analysis. British Journal of Psychology 33, 172-175.
5
Classical item analysis
  • This is really an item response model (IRM).
  • A reasonable (consistent) estimate of the latent
    score (a random variable, so shown in red on the
    slide) is given by the raw score, i.e. the
    percentage (or total) of correct items.
  • A somewhat more efficient estimate is given by a
    weighted percentage, using the item
    discriminations as weights.
  • The Lord model is simply
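The logistic form usually associated with this model can be sketched as below. The notation is an assumption rather than the slide's own: π_ij is the probability that individual j answers item i correctly, θ_j is the latent score, a_i and b_i are an item intercept and discrimination, and the standard normal distribution for θ_j is one common way of treating the factor as random.

```latex
\mathrm{logit}(\pi_{ij}) = \log\frac{\pi_{ij}}{1 - \pi_{ij}} = a_i + b_i \theta_j , \qquad \theta_j \sim N(0, 1)
```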

6
Item response relationships

For a single item in a test
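A minimal sketch of such a relationship, assuming a two-parameter logistic form for the item. The function name and the intercept and slope values are hypothetical, chosen only for illustration; the curve on the original slide is not reproduced here.

```python
import math


def item_response_probability(theta, intercept, slope):
    """P(correct) when logit(p) = intercept + slope * theta."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * theta)))


# Hypothetical item: slightly hard (negative intercept), fairly discriminating.
thetas = [-3, -2, -1, 0, 1, 2, 3]
curve = [item_response_probability(t, -0.5, 1.2) for t in thetas]

for t, p in zip(thetas, curve):
    print(f"theta={t:+d}  P(correct)={p:.3f}")
```

The probability rises monotonically with the latent score; a larger slope makes the rise steeper, and the intercept shifts the curve left or right.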
7
The Rasch model
  • As used in PISA, for example.
  • Here the discrimination (roughly the
    correlation between the response for an item and
    the factor value) is assumed to be the same for
    each item.
  • The resulting (maximum likelihood) factor score
    estimates are then a 1-1 transformation of the
    raw scores.
  • So the Rasch model is a special case and will
    often (e.g. in PISA) not fit the data very well.
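The 1-1 link between raw scores and maximum likelihood factor scores can be illustrated with a short sketch. It assumes a Rasch-type model with logit(p_i) = a_i + theta and known item intercepts a_i; the intercept values and function names are hypothetical. Under this model the likelihood equation for theta sets the expected number correct equal to the raw score, so the estimate depends on the responses only through that score.

```python
import math


def expected_score(theta, intercepts):
    """Expected number correct at theta when logit(p_i) = a_i + theta."""
    return sum(1.0 / (1.0 + math.exp(-(a + theta))) for a in intercepts)


def rasch_ml_theta(raw_score, intercepts, lo=-20.0, hi=20.0, tol=1e-10):
    """Solve expected_score(theta) = raw_score by bisection.

    Zero and perfect scores have no finite ML estimate, so they are rejected.
    """
    if not 0 < raw_score < len(intercepts):
        raise ValueError("no finite ML estimate for zero or perfect scores")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if expected_score(mid, intercepts) < raw_score:
            lo = mid  # theta too small: expected score is below the raw score
        else:
            hi = mid
    return (lo + hi) / 2.0


intercepts = [-1.5, -0.5, 0.0, 0.5, 1.5]  # hypothetical item intercepts
score_to_theta = {r: rasch_ml_theta(r, intercepts)
                  for r in range(1, len(intercepts))}

for r, theta in score_to_theta.items():
    print(f"raw score {r} -> theta estimate {theta:+.3f}")
```

Each raw score maps to exactly one estimate, and higher scores map to higher estimates, whichever particular items were answered correctly.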

8
What are the advantages of modelling?
  • We can add further predictors, for example social
    background, that may mediate the relationships.
  • If we can rely on the model then we will obtain
    efficient estimates for each individual's factor
    value.
  • Item response practitioners go further:
  • If we assume that the item parameters are the
    same across populations and, for example, tests,
    then we can form common scales for different
    populations and different tests.

9
Applications
  • Consider the case of different populations, or
    the same population at different times.
  • Suppose that we require different tests for each
    population (e.g. for confidentiality reasons) but
    some items are common (say 15).
  • These items are assumed to retain the same
    parameter values in each population, and this
    means we can equate the tests to provide a common
    scale: the parameter values of the non-common
    items are determined by linking them to those
    of the common items.
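One simple way to carry out such linking is mean/mean linking, sketched below for a Rasch-type model with logit(p) = a_i + theta. This is a swapped-in illustration, not necessarily the procedure the presentation had in mind; the item names and intercept values are hypothetical. If the common items truly keep the same parameters, the two separate calibrations differ only in the origin of the scale, so the average difference on the common items estimates the shift.

```python
from statistics import mean


def link_to_scale_a(calib_a, calib_b, common_items):
    """Put test B's item intercepts on test A's scale.

    calib_a, calib_b: dicts mapping item id -> estimated intercept from
    separate calibrations. The shift is the mean difference over the
    common items, which rests on the parameter constancy assumption.
    """
    shift = mean([calib_a[i] - calib_b[i] for i in common_items])
    linked_b = {item: value + shift for item, value in calib_b.items()}
    return linked_b, shift


# Hypothetical calibrations: c1..c3 are common, a4 and b4 are unique.
calib_a = {"c1": 0.40, "c2": -0.20, "c3": 1.10, "a4": 0.70}
calib_b = {"c1": 0.10, "c2": -0.50, "c3": 0.80, "b4": -0.30}

linked, shift = link_to_scale_a(calib_a, calib_b, ["c1", "c2", "c3"])
print(f"shift = {shift:.2f}, b4 on scale A = {linked['b4']:.2f}")
```

If the differences on the common items vary a lot from item to item, that is a warning sign that the constancy assumption does not hold.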

10
Caveats
  • When linking different tests over time the
    parameter constancy assumption is difficult to
    test, and typically remains an assumption.
  • In some cases, e.g. the NAEP reading anomaly, the
    assumption can be falsified.

11
Item analysis
  • IRMs are also used to check items. Those that
    don't fit the model being used are candidates for
    removal.
  • A problem with this is that what remains conforms
    to the model, but if the model cannot be relied on
    to describe reality we may be losing important
    information.
  • One way a model may be misspecified is that the
    reality is multidimensional.

12
Multidimensional IRMs
  • We can generalise our logistic model as follows
  • Adding a further factor allows an individual to
    be characterised by two underlying traits.
  • A sensible analysis will explore the
    dimensionality structure of a set of item
    responses.
  • Assumptions are needed, for example that factors
    are independent, or alternatively that they are
    correlated but each item has a non-zero
    coefficient (loading) on only one factor, or an
    intermediate assumption.
  • What are the consequences of a more complex
    structure?
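The two-factor generalisation mentioned at the top of this slide can be sketched as follows. The notation is assumed for illustration: π_ij is the probability that individual j answers item i correctly, θ_1j and θ_2j are the two latent traits, and b_1i and b_2i are the item's loadings on each.

```latex
\mathrm{logit}(\pi_{ij}) = a_i + b_{1i} \theta_{1j} + b_{2i} \theta_{2j}
```

Setting b_2i = 0 for some items and b_1i = 0 for the rest gives the case where each item loads on only one factor; the factors may then be allowed to correlate.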

13
  • It allows a more faithful representation of
    multi-faceted achievement.
  • It allows the (multidimensional) structure of
    achievement to be compared among groups or
    populations in the following ways:
      • The correlations between factors can vary.
      • The values of loadings can vary.
      • The factor scores can be allowed to depend on
        further variables such as gender, and the
        resulting regressions may vary. For example
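A regression of this kind can be sketched as follows, under assumed notation: θ_1j is the first factor score for individual j, x_j is a gender indicator, β_0 and β_1 are regression coefficients, and u_j is a residual. None of these symbols comes from the slide itself.

```latex
\theta_{1j} = \beta_0 + \beta_1 x_j + u_j , \qquad u_j \sim N(0, \sigma_u^2)
```

Letting β_1 differ between populations is one way in which "the resulting regressions may vary".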

With extensions to multilevel modelling etc., this becomes a structural equation model.
14
Another assumption and an extension
  • In all these models we have to assume
    conditional independence, that is, that for any
    given individual the response to an item depends
    only on the model parameters and is independent
    of the responses to other items.
  • This may break down in several ways and is a
    persistent problem with these models.
  • One violation is where a series of
    (correct/incorrect) responses relate to the same
    question scenario. In such cases we can
    reformulate the set of responses as an ordered
    (partial credit) response and such response types
    are easily incorporated into the model.
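The reformulation step can be sketched as below, assuming the ordered response for a scenario is simply the number of its items answered correctly. The function name and the item and scenario labels are hypothetical.

```python
def collapse_to_ordered(responses, scenarios):
    """Replace the binary items of each scenario by one ordered score.

    responses: dict mapping item id -> 0 or 1 for one individual
    scenarios: dict mapping scenario name -> list of item ids sharing it
    Returns a dict in which each scenario contributes a single score in
    0..k (number correct out of its k items) and every other item keeps
    its 0/1 response.
    """
    grouped = {item for items in scenarios.values() for item in items}
    collapsed = {item: r for item, r in responses.items()
                 if item not in grouped}
    for name, items in scenarios.items():
        collapsed[name] = sum(responses[item] for item in items)
    return collapsed


# Hypothetical individual: q3a, q3b and q3c all refer to the same scenario.
responses = {"q1": 1, "q2": 0, "q3a": 1, "q3b": 1, "q3c": 0}
scenarios = {"q3": ["q3a", "q3b", "q3c"]}

ordered = collapse_to_ordered(responses, scenarios)
print(ordered)  # {'q1': 1, 'q2': 0, 'q3': 2}
```

The three dependent binary responses become one response with categories 0 to 3, so conditional independence only has to hold between q1, q2 and the collapsed q3.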

15
Conclusions
  • Formalising a test in terms of an underlying
    model helps to clarify what is being measured.
  • These models can incorporate group differences
    and general dependencies, which can then be
    explored, and efficient and valid statistical
    analyses can be undertaken.
  • The full complexity of achievement responses can
    be summarised in a small number of parameters
    using a full structural (and multilevel) approach
    without the need to adopt a very simple model
    such as the Rasch model.
  • If we wish to make the necessary assumptions and
    carry out equating, this can be done more
    realistically using a multidimensional model.