1
Test item analysis: When are statistics a good
thing?
  • Andrew Martin
  • Purdue Pesticide Programs

2
What we're going to do
  • Review some simple, descriptive statistics.
  • Discuss the concept of random error.
  • Identify important item characteristics.
  • Conduct an item analysis using real data and
    actual test items.

3
Measures of central tendency
  • Mean = the sum of the test scores ÷ the number
    of test scores (i.e., an average)
  • Median = the middle score
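
A minimal sketch of both measures using Python's standard
statistics module (the scores list is hypothetical, for
illustration only):

```python
from statistics import mean, median

scores = [68, 72, 77, 85, 85, 90, 95]  # hypothetical test scores

print(mean(scores))    # sum of scores / number of scores -> 81.71...
print(median(scores))  # middle score of the sorted list -> 85
```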

4
Individual differences
  • Range = highest score − lowest score
  • Standard deviation ≈ the range ÷ 5 (roughly!),
    or (more precisely) the square root of the
    average squared deviation score¹
  • Variance = the standard deviation squared
  • 1. A deviation score is an individual's score
    minus the group mean score
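
A sketch of these spread measures, following the slide's
definitions (hypothetical scores; range ÷ 5 is the rough
approximation mentioned above, not an exact formula):

```python
from math import sqrt
from statistics import mean

scores = [68, 72, 77, 85, 85, 90, 95]  # hypothetical test scores

score_range = max(scores) - min(scores)      # range = highest - lowest
rough_sd = score_range / 5                   # the "roughly!" rule of thumb

m = mean(scores)
deviations = [x - m for x in scores]         # deviation scores (score - group mean)
sd = sqrt(mean(d ** 2 for d in deviations))  # sqrt of the average squared deviation
variance = sd ** 2                           # variance = SD squared

print(score_range, rough_sd, round(sd, 2), round(variance, 2))
```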

5
Score distribution
  • (The score distribution figure from the original
    slide is not preserved in this transcript.)
6
Score distribution (cont.)
  • Licensure test scores are generally NOT normally
    distributed, as shown in the preceding slide.
  • They are often left skewed (i.e., scores are
    concentrated on the right-hand side of the
    distribution).
  • But it doesn't matter. We're going to treat score
    distributions AS IF they are normal.

7
Individual differences (variance) and random error
  • Under classical measurement theory, individual
    score differences are the result of:
  • 1. true differences in achievement, and
  • 2. random error.
  • We are interested in the former and want to
    minimize the latter.
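
(Aside, stated here in standard classical test theory
notation rather than on the original slide: observed score
X = true score T + random error E; when T and E are
uncorrelated, Var(X) = Var(T) + Var(E), and reliability is
the ratio Var(T) / Var(X).)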

8
Estimating the influence of random error on score
results
  • A RELIABLE test generates scores that are
    reasonably free from the influence of random
    error (i.e., the test has a high degree of
    precision).
  • A reliability coefficient indicates a test's
    precision of measurement.
  • The general index of reliability is KR-20 (the
    Kuder-Richardson formula 20).

9
KR-20
  • KR-20 can range in value from 0 (perfectly
    unreliable) to 1 (perfectly reliable).
  • KR-20 values for licensure exams should range
    above .90.
  • KR-20 values are affected by the number of items
    on the test and by how strongly the items relate
    to (or correlate with) one another. Shorter tests
    are generally LESS reliable than longer ones, and
    anything that restricts test score variance will
    also reduce the value of KR-20.
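
As a sketch, the standard KR-20 formula is
KR-20 = (k / (k - 1)) * (1 - sum(p_i * q_i) / var(totals)),
where k is the number of items and p_i, q_i are each item's
proportions correct and incorrect. A minimal numpy
implementation (the response matrix is hypothetical, and
sources differ on sample vs. population variance):

```python
import numpy as np

def kr20(responses: np.ndarray) -> float:
    """KR-20 for a (persons x items) matrix of 0/1 item scores."""
    k = responses.shape[1]       # number of items
    p = responses.mean(axis=0)   # item p-values
    q = 1.0 - p
    # sample variance of total scores (some texts use ddof=0)
    total_var = responses.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1.0 - (p * q).sum() / total_var)

# Hypothetical data: 5 test takers x 4 items
data = np.array([[1, 1, 1, 0],
                 [1, 0, 1, 0],
                 [1, 1, 0, 1],
                 [0, 0, 1, 0],
                 [1, 1, 1, 1]])
print(round(kr20(data), 2))
```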

10
Standard error of measurement (SEM)
  • SEM offers another means of examining the
    influence of random error.
  • It is an estimate of the standard deviation of
    the scores any one person would obtain over
    repeated administrations of similar, parallel
    test forms.
  • With qualifications, SEM can be used to place
    confidence intervals around a person's actual
    score.
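
A sketch using the usual classical relationship
SEM = SD * sqrt(1 - reliability), with an approximate 95%
interval of score ± 1.96 * SEM (all numbers hypothetical):

```python
from math import sqrt

sd = 8.0            # hypothetical standard deviation of test scores
reliability = 0.91  # hypothetical KR-20 for the test

sem = sd * sqrt(1.0 - reliability)  # standard error of measurement

score = 74
low, high = score - 1.96 * sem, score + 1.96 * sem  # approx. 95% interval
print(f"SEM = {sem:.2f}; 95% interval: {low:.1f} to {high:.1f}")
```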

11
Item difficulty
  • Item difficulty is estimated by the p-value: the
    proportion of test takers who correctly answer
    the item.
  • Item p-values near .50 offer the greatest
    contribution to test reliability.
  • p-values are potentially biased by the sample of
    test takers from which they were calculated.
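
As a one-line sketch, the p-value is just the proportion of
1s in an item's 0/1 score column (hypothetical responses):

```python
import numpy as np

item = np.array([1, 0, 1, 1, 0, 1, 1, 0, 1, 1])  # hypothetical 0/1 answers
p_value = item.mean()                            # proportion answering correctly
print(p_value)  # 0.7
```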

12
Item discrimination
  • Item discrimination describes an item's ability
    to differentiate persons who are knowledgeable
    about item content from those who are not.
  • Item discrimination is typically estimated by rpb
    (the point-biserial correlation).
  • rpb indicates the strength of the relationship
    (correlation) between how individuals answer an
    item and their total score.
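
Because the item is scored 0/1, rpb is simply the Pearson
correlation between the item column and the total scores. A
minimal numpy sketch (hypothetical data; operational
analyses often correlate against a total with the item
itself removed):

```python
import numpy as np

data = np.array([[1, 1, 1, 0],   # hypothetical 5 takers x 4 items, 0/1 scored
                 [1, 0, 1, 0],
                 [1, 1, 0, 1],
                 [0, 0, 1, 0],
                 [1, 1, 1, 1]])

totals = data.sum(axis=1)
item = data[:, 0]                       # the item under review
r_pb = np.corrcoef(item, totals)[0, 1]  # point-biserial = Pearson r with 0/1 item
print(round(r_pb, 2))
```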

13
Item discrimination (cont.)
  • High achievers are expected to answer an item
    correctly more frequently than low achievers.
    Consequently, an rpb should be positive.
  • rpbs above .30 are highly discriminating (and
    offer the greatest contribution to test
    reliability).
  • rpbs, like p-values, are potentially biased by
    the sample from which they were calculated.

14
Item omits
  • Omits indicate the number of persons who failed
    to respond to an item.
  • Numerous omits (assuming no correction for
    guessing) may indicate a problem with the amount
    of time allotted for the test.
  • Extensive non-response is a threat to valid score
    interpretation and use.
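
A minimal sketch of an omit count, assuming blank or missing
entries in the raw response record mark a non-response (both
the data and that encoding are hypothetical):

```python
responses = ["B", "A", "", "C", None, "D", ""]  # hypothetical raw answers
omits = sum(1 for r in responses if r in ("", None))  # ""/None = omitted item
print(omits)  # 3
```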

15
Resources for additional help
  • Haladyna, T. (2004). Developing and Validating
    Multiple-Choice Test Items. Third Edition.
    Mahwah, NJ: Lawrence Erlbaum Associates.
  • Osterlind, S. (1998). Constructing Test Items:
    Multiple-Choice, Constructed-Response,
    Performance, and Other Formats. Second Edition.
    Norwell, MA: Kluwer Academic Publishers.
  • or
  • Contact your state land grant university's
    college of education, department of psychology,
    or testing service about performing and
    interpreting item analysis reports.
  • or try
  • http://www.eflclub.com/elvin/publications/2003/itemanalysis.html