Title: Item Response Theory
Shortcomings of the Classical True Score Model
- Sample dependence
- Limitation to the specific test situation
- Dependence on parallel forms
- Same error variance for all
Sample Dependence
- The first shortcoming of CTS theory is that the values of commonly used item statistics in test development, such as item difficulty and item discrimination, depend on the particular examinee samples in which they are obtained. The average level of ability and the range of ability scores in an examinee sample influence, often substantially, the values of the item statistics.
- The difficulty index changes with the ability level of the sample, and the discrimination index differs between a heterogeneous sample and a homogeneous sample.
Limitation to the Specific Test Situation
- The task of comparing examinees who have taken
samples of test items of differing difficulty
cannot easily be handled with standard testing
models and procedures.
Dependence on Parallel Forms
- The fundamental concept, test reliability, is
defined in terms of parallel forms.
Same Error Variance for All
- CTS presumes that the variance of errors of
measurement is the same for all examinees.
Item Response Theory
- The purpose of any test theory is to describe how inferences from examinee item responses and/or test scores can be made about unobservable examinee characteristics or traits that are measured by a test.
- An individual's expected performance on a particular test question, or item, is a function of both the level of difficulty of the item and the individual's level of ability.
- Examinee performance on a test can be predicted (or explained) by defining examinee characteristics, referred to as traits or abilities; estimating scores for examinees on these traits (called "ability scores"); and using the scores to predict or explain item and test performance. Since traits are not directly measurable, they are referred to as latent traits or abilities. An item response model specifies a relationship between the observable examinee test performance and the unobservable traits or abilities assumed to underlie performance on the test.
Assumptions of IRT
- Unidimensionality
- Local independence
Unidimensionality Assumption
- It is possible to estimate an examinee's ability on the same ability scale from any subset of items in the domain of items that have been fitted to the model. The domain of items needs to be homogeneous in the sense of measuring a single ability; if the domain of items is too heterogeneous, the ability estimates will have little meaning.
- Most of the IRT models that are currently being applied make the specific assumption that the items in a test measure a single, or unidimensional, ability or trait, and that the items form a unidimensional scale of measurement.
Local Independence
- This assumption states that an examinee's
responses to different items in a test are
statistically independent. For this assumption to
be true, an examinee's performance on one item
must not affect, either for better or for worse,
his or her responses on any other items in the
test.
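Local independence is what lets the probability of a whole response pattern factor into a product of per-item probabilities. A minimal sketch, assuming a two-parameter logistic (2PL) ICC and hypothetical item parameters:

```python
import math

def icc_2pl(theta, a, b):
    """Probability of a correct response under an (assumed) 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def pattern_likelihood(theta, items, responses):
    """Under local independence, the joint probability of a response
    pattern is simply the product of the per-item probabilities."""
    prob = 1.0
    for (a, b), u in zip(items, responses):
        p = icc_2pl(theta, a, b)
        prob *= p if u == 1 else (1.0 - p)
    return prob

items = [(1.0, -0.5), (1.2, 0.0), (0.8, 1.0)]  # hypothetical (a, b) pairs
print(round(pattern_likelihood(0.0, items, [1, 1, 0]), 4))
```

If performance on one item affected another, this factorization would not hold, which is why the assumption matters for estimation.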
Item Characteristic Curves
- Specific assumptions about the relationship between the test taker's ability and his or her performance on a given item are stated explicitly in a mathematical function, whose graph is the item characteristic curve (ICC).
- The form of the ICC is determined by the particular mathematical model on which it is based. The types of information about item characteristics may include:
- (1) the degree to which the item discriminates among individuals of differing levels of ability (the 'discrimination' parameter a),
- (2) the level of difficulty of the item (the 'difficulty' parameter b), and
- (3) the probability that an individual of low ability can answer the item correctly (the 'pseudo-chance' or 'guessing' parameter c).
- One of the major considerations in the application of IRT models, therefore, is the estimation of these item parameters.
ICC
- pseudo-chance parameter c: approximately 0.20 for the two items shown
- difficulty parameter b: the ability level at which the probability of a correct response is halfway between the pseudo-chance parameter and one
- discrimination parameter a: proportional to the slope of the ICC at the point of the difficulty parameter; the steeper the slope, the greater the discrimination parameter
[Figure: ICCs for two items, plotting probability of a correct response against the ability scale]
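The three parameters combine in the three-parameter logistic (3PL) model, P(θ) = c + (1 − c) / (1 + e^(−a(θ−b))). A small sketch with hypothetical parameter values, illustrating that at θ = b the probability sits halfway between c and one:

```python
import math

def icc_3pl(theta, a, b, c):
    """3PL item characteristic curve: probability rises from the
    guessing floor c toward 1, centered at difficulty b, with the
    slope at b governed by the discrimination parameter a."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# At theta == b the exponent is 0, so P = c + (1 - c)/2, i.e. halfway
# between the pseudo-chance parameter and one:
p_at_b = icc_3pl(theta=0.5, a=1.3, b=0.5, c=0.20)
print(round(p_at_b, 2))  # 0.6
```

For very low abilities the curve approaches the floor c rather than zero, which is what distinguishes the 3PL from the simpler one- and two-parameter models.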
Ability Score
- 1. The test developer collects a set of observed item responses from a relatively large number of test takers.
- 2. After an initial examination of how well various models fit the data, an IRT model is selected.
- 3. Through an iterative procedure, parameter estimates are assigned to items and ability scores to individuals, so as to maximize the agreement, or fit, between the particular IRT model and the test data.
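A toy version of step 3, assuming item parameters are already known and using a brute-force grid search for the maximum-likelihood ability estimate (production programs instead use methods such as Newton–Raphson or EM, and estimate item and ability parameters jointly):

```python
import math

def icc_3pl(theta, a, b, c):
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def estimate_ability(items, responses):
    """Maximum-likelihood ability estimate by grid search over
    theta in [-4, 4] -- a sketch, not a production estimator."""
    best_theta, best_ll = 0.0, -math.inf
    for i in range(-400, 401):
        theta = i / 100.0
        ll = 0.0  # log-likelihood of the response pattern at theta
        for (a, b, c), u in zip(items, responses):
            p = icc_3pl(theta, a, b, c)
            ll += math.log(p if u == 1 else 1.0 - p)
        if ll > best_ll:
            best_theta, best_ll = theta, ll
    return best_theta

items = [(1.0, -1.0, 0.2), (1.2, 0.0, 0.2), (0.9, 1.0, 0.2)]  # hypothetical
print(estimate_ability(items, [1, 1, 0]))
```

Note that an all-correct or all-wrong pattern pushes the estimate to the boundary of the grid, which is why operational programs handle such patterns specially.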
Item Information Function
- The limitations of CTS theory approaches to precision of measurement are addressed in the IRT concept of the information function. The item information function refers to the amount of information a given item provides for estimating an individual's level of ability, and is a function of both the slope of the ICC and the amount of variation at each ability level.
- The information function of a given item will be at its maximum for individuals whose ability is at or near the value of the difficulty parameter.
[Figure: item information functions for three example items]
- Item (1) provides the most information about differences in ability at the lower end of the ability scale.
- Item (2) provides relatively little information at any point on the ability scale.
- Item (3) provides the most information about differences in ability at the high end of the ability scale.
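The peak-near-b behavior can be sketched numerically. Assuming a 2PL model (c = 0), the item information function reduces to I(θ) = a²·P(θ)·(1 − P(θ)); the parameter values below are hypothetical:

```python
import math

def info_2pl(theta, a, b):
    """Fisher information of a 2PL item: a^2 * P * (1 - P),
    maximized where P = 0.5, i.e. at theta == b."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

# Information is greatest for examinees whose ability is near b = 0:
for theta in (-2.0, 0.0, 2.0):
    print(theta, round(info_2pl(theta, a=1.5, b=0.0), 4))
```

A highly discriminating item (large a) packs a lot of information into a narrow ability range around b, which is exactly the pattern items (1) and (3) show at opposite ends of the scale.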
Test Information Function
- The test information function (TIF) is the sum of the item information functions, each of which contributes independently to the total, and is a measure of how much information a test provides at different ability levels.
- The TIF is the IRT analog of CTS theory reliability and the standard error of measurement.
Item Bank
- If there is a need for regular test administration and analysis, the construction of an item bank may be taken into consideration.
- An item bank is not a simple collection of test items organized in raw form; rather, the items are stored with parameters assigned on the basis of CTS or IRT models.
- An item bank should also have a data-processing system that assures the consistent quality of the data in the bank (describing, classifying, accepting, and rejecting items).
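A minimal sketch of one banked-item record carrying describing data, plus a toy accept/reject gate; all field names and thresholds here are illustrative assumptions, not part of any standard:

```python
from dataclasses import dataclass

@dataclass
class BankItem:
    """One banked item: content plus describing/classifying data."""
    storage_code: str
    ability_measured: str        # e.g. "reading comprehension"
    stem: str
    choices: list
    key: int                     # index of the correct choice
    difficulty: float            # p-value (CTS) or b parameter (IRT)
    discrimination: float

def accept(item, min_disc=0.30, p_range=(0.20, 0.80)):
    """Toy quality gate: accept items with adequate discrimination and
    a difficulty index in a usable range (thresholds are assumptions)."""
    return (item.discrimination >= min_disc
            and p_range[0] <= item.difficulty <= p_range[1])

item = BankItem("RC-0001", "reading comprehension",
                "The author's main point is ...",
                ["A", "B", "C", "D"], 1, 0.55, 0.42)
print(accept(item))  # True
```

The accept/reject step corresponds to the quality-assurance role the text assigns to the bank's data-processing system.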
Specifications in a CTS Item Bank
- Form of items
- Type of item parts
- Describing data
- Classifying data
Form of Items
- Dichotomous
  - Listening comprehension
    - Statement + question + choices
    - Short conversation + question + choices
    - Long conversation / passage + some questions + choices
  - Reading comprehension
    - Passage + some questions + choices
    - Passage + T/F questions
  - Syntactic knowledge / vocabulary
    - Question stem with blank/underlined parts + choices
  - Cloze
    - Passage + choices
- Nondichotomous
  - Listening comprehension
    - Dictation
    - Dictation passage with blanks to be filled
Describing Data
- Ability measured
- Difficulty index
- Discrimination
- Storage code