Item Analysis: Classical and Beyond

1
Item Analysis: Classical and Beyond
  • SCROLLA Symposium: Measurement Theory and Item
    Analysis
  • Modified for EPE 773 by Kelly Bradley on
    September 3, 2006

2
Why is item analysis relevant?
  • Item analysis provides a way of measuring the
    quality of questions: seeing how appropriate
    they were for the respondents and how well they
    measured their ability.
  • Item analysis also provides a way of re-using
    items in different instruments, with prior
    knowledge of how they are going to perform.

3
What kinds of item analysis are there?
  • Item Analysis
    • Classical
    • Latent Trait Models
      • Rasch
      • Item Response Theory (IRT1, IRT2, IRT3, IRT4)
4
Classical Test Theory
  • Classical analysis is the easiest and most widely
    used form of analysis. The statistics can be
    computed by generic statistical packages (or at a
    push by hand) and need no specialist software.
  • Classical analysis is performed on the survey
    instrument (or test) as a whole rather than on
    the item. Although item statistics can be
    generated, they apply only to that group of
    students on that collection of items.

5
Classical Test Theory Assumptions
  • Classical test theory assumes that any test score
    (or survey instrument sum) is composed of a
    true value plus random error.
  • Crucially, it assumes that this error is normally
    distributed, uncorrelated with the true score, and
    has a mean of zero.
  • x_obs = x_true + G(0, σ_err)
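
A minimal simulation sketch of this model in Python (the true-score range and σ_err = 2 are illustrative assumptions, not values from the slides):

  import numpy as np

  rng = np.random.default_rng(42)
  x_true = rng.uniform(40, 90, size=10_000)            # hypothetical true scores
  error = rng.normal(0.0, 2.0, size=10_000)            # G(0, sigma_err): mean-zero normal error
  x_obs = x_true + error                               # observed = true + error

  print(round(error.mean(), 3))                        # close to 0 (zero-mean assumption)
  print(round(np.corrcoef(x_true, error)[0, 1], 3))    # close to 0 (uncorrelated with true score)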

6
Classical Analysis Statistics
  • Difficulty (item level statistic)
  • Discrimination (item level statistic)
  • Reliability (instrument level statistic)

7
Classical Test Theory: Difficulty
  • The difficulty of a (single response selection)
    question in classical analysis is simply the
    proportion of people who answered the question
    incorrectly. For multiple mark questions, it is
    the average mark expressed as a proportion.
  • Difficulty is given on a scale of 0-1; the higher
    the proportion, the greater the difficulty.
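
A sketch of this calculation on hypothetical responses (1 = correct, 0 = incorrect):

  import numpy as np

  item_scores = np.array([1, 0, 1, 1, 0, 0, 1, 0])   # one item, eight respondents
  difficulty = 1 - item_scores.mean()                # proportion answering incorrectly
  print(difficulty)                                  # 0.5

  # For a multiple-mark question, the slide's definition is the average
  # mark expressed as a proportion (here, of an assumed maximum mark of 5).
  marks = np.array([3, 5, 2, 4])
  print(marks.mean() / 5)                            # 0.7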

8
Classical Test Theory: Discrimination
  • The discrimination of an item is the (Pearson)
    correlation between candidates' marks on the item
    and their total test marks.
  • Being a correlation, it can vary from -1 to +1,
    with higher values indicating (desirable) high
    discrimination.
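
A minimal sketch of this correlation on a small hypothetical response matrix:

  import numpy as np

  scores = np.array([        # rows = respondents, columns = items (0/1 marks)
      [1, 1, 1, 0],
      [1, 0, 1, 1],
      [0, 1, 0, 0],
      [1, 1, 1, 1],
      [0, 0, 1, 0],
  ])
  total = scores.sum(axis=1)                        # total test mark per respondent

  item = scores[:, 0]                               # marks on the first item
  discrimination = np.corrcoef(item, total)[0, 1]   # Pearson r, between -1 and +1
  print(round(discrimination, 2))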

9
Classical Test Theory: Reliability
  • Reliability is a measure of how well the survey
    (or test) holds together. For practical
    reasons, internal consistency estimates are the
    easiest to obtain; they indicate the extent to
    which each item correlates with every other item.
  • This is measured on a scale of 0-1; the greater
    the number, the higher the reliability.
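
The slides do not name a specific coefficient; Cronbach's alpha is one common internal-consistency estimate (an assumption of this example, not the presentation), sketched here on hypothetical data:

  import numpy as np

  def cronbach_alpha(scores):
      """Cronbach's alpha for a respondents x items matrix of item marks."""
      k = scores.shape[1]
      item_vars = scores.var(axis=0, ddof=1).sum()   # sum of the item variances
      total_var = scores.sum(axis=1).var(ddof=1)     # variance of the total scores
      return (k / (k - 1)) * (1 - item_vars / total_var)

  scores = np.array([
      [1, 1, 1, 0],
      [1, 0, 1, 1],
      [0, 1, 0, 0],
      [1, 1, 1, 1],
      [0, 0, 1, 0],
  ])
  print(round(cronbach_alpha(scores), 2))            # 0.52 for this toy data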

10
Classical Analysis versus Latent Trait Models
  • Classical analysis has the survey, or test (not
    the item), as its basis. Although the statistics
    generated are often generalized to similar
    populations completing a similar survey or
    taking a similar test, they only really apply to
    those students taking that test.
  • Latent trait models aim to look beyond that, at
    the underlying traits which are producing the
    test performance. They are measured at item level
    and provide sample-free measurement.

11
Latent Trait Models
  • Latent trait models have been around since the
    1940s, but were not widely used until the 1960s.
    Although theoretically possible, it is
    practically unfeasible to use these without
    specialist software.
  • They aim to measure the underlying ability (or
    trait) which is producing the test performance
    rather than measuring performance per se.
  • This leads to them being sample-free. As the
    statistics are not dependent on the test
    situation which generated them, they can be used
    more flexibly.

12
Rasch versus Item Response Theory
  • Mathematically, Rasch is identical to the most
    basic IRT model (IRT1); however, there are some
    important differences which make it a more
    viable proposition for practical testing.
  • For instance,
  • In Rasch the model is superior: data which do
    not fit the model are discarded.
  • Rasch does not permit abilities to be estimated
    for extreme items and persons.

13
IRT - the generalized model

  P_g(θ) = c_g + (1 - c_g) / (1 + e^(-a_g(θ - b_g)))

  where
  a_g = gradient of the ICC at the point θ = b_g (item discrimination)
  b_g = the ability level at which the gradient is maximized (item difficulty)
  c_g = probability of low-ability persons correctly answering (or endorsing) question g
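
A direct transcription of this model into a Python function (the parameter values below are illustrative assumptions):

  import numpy as np

  def icc(theta, a, b, c):
      """P(correct response / endorsement) under the generalized 3-parameter model."""
      return c + (1 - c) / (1 + np.exp(-a * (theta - b)))

  print(round(icc(theta=0.0, a=1.2, b=0.0, c=0.25), 3))   # at theta = b: c + (1 - c)/2 = 0.625
  print(round(icc(theta=-3.0, a=1.2, b=0.0, c=0.25), 3))  # low ability: probability approaches c
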
14
IRT - Item Characteristic Curves
  • An ICC is a plot of the probability of a
    respondent correctly answering the question
    (endorsing) against their ability (likeliness
    to endorse). The higher the ability, the higher
    the chance that they will respond correctly.

[ICC plot: a = gradient, b = ability at maximum gradient (a), c = intercept]
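
A minimal matplotlib sketch of such a plot, annotating the roles of a, b and c (parameter values again assumed):

  import numpy as np
  import matplotlib.pyplot as plt

  theta = np.linspace(-4, 4, 200)                    # ability range
  a, b, c = 1.2, 0.0, 0.25                           # illustrative parameters
  p = c + (1 - c) / (1 + np.exp(-a * (theta - b)))   # the ICC

  plt.plot(theta, p)
  plt.axhline(c, linestyle="--", label="c (intercept)")
  plt.axvline(b, linestyle=":", label="b (ability at max gradient)")
  plt.xlabel("ability (theta)")
  plt.ylabel("P(correct / endorse)")
  plt.legend()
  plt.show()
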
15
IRT - About the Parameters: Difficulty
  • Although there is no correct difficulty for any
    one item, it is clearly desirable that the
    difficulty of the test (or survey instrument) is
    centred around the average ability of the
    respondents.
  • The higher the b parameter, the more difficult
    the question.
  • b is inversely related to the probability of the
    question being answered correctly.

16
IRT - About the Parameters: Discrimination
  • In IRT (unlike Rasch) maximal discrimination is
    sought.
  • Thus, the higher the a parameter, the more
    desirable the question.
  • Differences in the discrimination of questions
    can lead to differences in the difficulties of
    questions across the ability range.

17
IRT - About the Parameters: Guessing
  • A high c parameter suggests that candidates
    with very little ability may choose the correct
    answer.
  • This is rarely a valid parameter outwith multiple
    choice testing, and the value should not vary
    excessively from the reciprocal of the number of
    choices.

18
IRT - Parameter Estimation
  • Before being used (in an item bank or for
    measurement) items must first be calibrated; that
    is, their parameters must be estimated.
  • There are two main procedures - Joint Maximum
    Likelihood and Marginal Maximum Likelihood. JML
    is most common for IRT1 and IRT2, while MML is
    used more frequently for IRT3 (a toy JML sketch
    follows below).
  • Bayesian estimation and estimated bounds may be
    imposed on the data to avoid high-discrimination
    items being over-valued.
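
A toy JML sketch for the simplest case (IRT1/Rasch), estimating person abilities and item difficulties together on simulated data; it omits the Bayesian priors and bounds mentioned above and is illustrative only:

  import numpy as np
  from scipy.optimize import minimize
  from scipy.special import expit   # logistic function 1 / (1 + exp(-x))

  def neg_log_likelihood(params, responses):
      n_persons, n_items = responses.shape
      theta = params[:n_persons]                     # person abilities
      b = params[n_persons:]                         # item difficulties
      p = expit(theta[:, None] - b[None, :])         # P(correct) under IRT1
      p = np.clip(p, 1e-9, 1 - 1e-9)                 # guard against log(0)
      return -np.sum(responses * np.log(p) + (1 - responses) * np.log(1 - p))

  rng = np.random.default_rng(0)
  true_theta = rng.normal(size=100)                  # simulated persons
  true_b = np.linspace(-2, 2, 10)                    # simulated items
  responses = (rng.random((100, 10))
               < expit(true_theta[:, None] - true_b[None, :])).astype(float)

  result = minimize(neg_log_likelihood, np.zeros(100 + 10),
                    args=(responses,), method="L-BFGS-B")
  est_b = result.x[100:]
  est_b -= est_b.mean()                              # fix the location indeterminacy
  print(np.round(est_b, 2))                          # compare with true_b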