Title: Estimation of Item Response Models
1. Estimation of Item Response Models
- Mister Ibik
- Division of Psychology in Education
- Arizona State University
- EDP 691 Advanced Topics in Item Response Theory
2. Motivation and Objectives
- Why estimate?
- The distinguishing feature of IRT modeling, as compared to classical techniques, is the presence of parameters
- These parameters characterize and guide inference regarding entities of interest (i.e., examinees, items)
- We will think through
- Different estimation situations
- Alternative estimation techniques
- The logic and mathematics underpinning these techniques
- Various strengths and weaknesses
- What you will have
- A detailed introduction to principles and mathematics
- A resource to be revisited, and revisited, and revisited
3. Outline
- Some Necessary Mathematical Background
- Maximum Likelihood and Bayesian Theory
- Estimation of Person Parameters When Item Parameters Are Known
- ML
- MAP
- EAP
- Estimation of Item Parameters When Person Parameters Are Known
- ML
- Simultaneous Estimation of Item and Person Parameters
- JML
- CML
- MML
- Other Approaches
4. Background: Finding the Root of an Equation
- Newton-Raphson algorithm
- Finds the root of an equation
- Example: the function f(x) = x²
- Has a root (where f(x) = 0) at x = 0
5. Newton-Raphson
- Newton-Raphson takes a given point, x0, and systematically progresses to find the root of the equation
- Utilizes the slope of the function to find where the root may be
- The slope of the function is given by the derivative, denoted f'(x)
- f'(x) gives the slope of the straight line that is tangent to f(x) at x
- The tangent is the best linear prediction of how the function is changing
- Starting from x0, the best guess for the root is the point where the tangent line equals 0
- This occurs at x1 = x0 - f(x0) / f'(x0)
- So x1 is the next candidate point for the root
6. Newton-Raphson Updating (1)
x0 = 1.5, f(x0) = 2.25, f'(x0) = 3, so x1 = 0.75
7. Newton-Raphson Updating (2)
x1 = 0.75, f(x1) = 0.5625, f'(x1) = 1.5, so x2 = 0.375
8. Newton-Raphson Updating (3)
x2 = 0.375, f(x2) = 0.1406, f'(x2) = 0.75, so x3 = 0.1875
9. Newton-Raphson Updating (4)
x3 = 0.1875, f(x3) = 0.0352, f'(x3) = 0.375, so x4 = 0.0938
10. Newton-Raphson Example
Iteration   x        f(x)     f'(x)    f(x)/f'(x)   Next x
0           1.5000   2.2500   3.0000   0.7500       0.7500
1           0.7500   0.5625   1.5000   0.3750       0.3750
2           0.3750   0.1406   0.7500   0.1875       0.1875
3           0.1875   0.0352   0.3750   0.0938       0.0938
4           0.0938   0.0088   0.1875   0.0469       0.0469
5           0.0469   0.0022   0.0938   0.0234       0.0234
6           0.0234   0.0005   0.0469   0.0117       0.0117
7           0.0117   0.0001   0.0234   0.0059       0.0059
8           0.0059   0.0000   0.0117   0.0029       0.0029
9           0.0029   0.0000   0.0059   0.0015       0.0015
10          0.0015   0.0000   0.0029   0.0007       0.0007
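The iteration history above can be reproduced with a short Python sketch (not part of the original slides); the function, its derivative, and the start value are taken from the example:

```python
def newton_raphson(f, f_prime, x0, tol=1e-4, max_iter=50):
    """Find a root of f via the update x(n+1) = x(n) - f(x(n)) / f'(x(n))."""
    x = x0
    for _ in range(max_iter):
        x_new = x - f(x) / f_prime(x)      # Newton-Raphson update
        if abs(x_new - x) < tol:           # stop when the change is arbitrarily small
            return x_new
        x = x_new
    return x

# The slides' example: f(x) = x^2 (root at x = 0), starting from x0 = 1.5
print(newton_raphson(lambda x: x**2, lambda x: 2 * x, x0=1.5))
```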
11. Newton-Raphson Summary
- An iterative algorithm for finding the root of an equation
- Takes a starting point and systematically progresses to find the root of the function
- Requires the derivative of the function
- Each successive point is given by x(n+1) = x(n) - f(x(n)) / f'(x(n))
- The process continues until we get arbitrarily close, as usually measured by the change in some function
12. Difficulties With Newton-Raphson
- Some functions have multiple roots
- Which root is found often depends on the start value
13. Difficulties With Newton-Raphson
- Numerical complications can arise
- When the derivative is relatively small in magnitude, the algorithm shoots off into outer space
14. Logic of Maximum Likelihood
- A general approach to parameter estimation
- The use of a model implies that the data may be sufficiently characterized by the features of the model, including the unknown parameters
- Parameters govern the data in the sense that the data depend on the parameters
- Given values of the parameters, we can calculate the (conditional) probability of the data
- P(Xij = 1 | θi, bj) = exp(θi - bj) / (1 + exp(θi - bj)) (see the sketch below)
- Maximum likelihood (ML) estimation asks: What are the values of the parameters that make the data most probable?
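As a quick illustration (my own sketch, not from the slides), the conditional probability above can be computed directly; the θ and b values are hypothetical:

```python
import math

def p_correct(theta, b):
    """P(X = 1 | theta, b) = exp(theta - b) / (1 + exp(theta - b))."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# hypothetical examinee (theta = 1.0) and item (b = 0.5)
print(p_correct(1.0, 0.5))   # about 0.62
```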
15. Example: Series of Bernoulli Variables With Unknown Probability
- Bernoulli variable: P(X = 1) = p
- The probability of the data is given by p^X (1 - p)^(1 - X)
- Suppose we have two random variables, X1 and X2
- When taken as a function of the parameters, the probability of the data is called the likelihood
- Suppose X1 = 1, X2 = 0
- P(X1 = 1, X2 = 0 | p) = L(p | X1 = 1, X2 = 0) = p(1 - p)
- Choose p to maximize the conditional probability of the data (a grid search is sketched below)
- For p = 0.1, L = 0.1 × (1 - 0.1) = 0.09
- For p = 0.2, L = 0.2 × (1 - 0.2) = 0.16
- For p = 0.3, L = 0.3 × (1 - 0.3) = 0.21
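A quick Python check of the same likelihood over a grid of candidate values for p (an illustration added here, not part of the slides):

```python
# L(p | X1 = 1, X2 = 0) = p * (1 - p), evaluated over a grid of candidate p values
grid = [i / 100 for i in range(1, 100)]
likelihood = [p * (1 - p) for p in grid]
best_L, best_p = max(zip(likelihood, grid))
print(best_L, best_p)   # 0.25 at p = 0.5: the value that makes the data most probable
```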
16. Example: Likelihood Function
17. The Likelihood Function in IRT
- The likelihood may be thought of as the conditional probability of the data, where the data are known and the parameters vary
- Let Pij = P(Xij = 1 | θi, ξj), where ξj denotes the parameters of item j
- The likelihood is then L = Πi Πj Pij^Xij (1 - Pij)^(1 - Xij)
- The goal is to maximize this function: what values of the parameters yield the highest value?
18. Log-Likelihood Functions
- It is numerically easier to maximize the natural logarithm of the likelihood, lnL
- The log-likelihood has the same maximum as the likelihood
19. Maximizing the Log-Likelihood
- Note that at the maximum of the function, the slope of the tangent line equals 0
- The slope of the tangent is given by the first derivative
- If we can find the point at which the first derivative equals 0, we will have also found the point at which the function is maximized (see the sketch below)
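To connect this idea back to Newton-Raphson, here is a small sketch (my own illustration) that finds the root of the first derivative of lnL for the two-observation Bernoulli example; the start value p = 0.2 is arbitrary:

```python
# lnL(p) = ln(p) + ln(1 - p) for the example X1 = 1, X2 = 0
def dlnL(p):
    return 1 / p - 1 / (1 - p)             # first derivative of lnL

def d2lnL(p):
    return -1 / p**2 - 1 / (1 - p)**2      # second derivative of lnL

p = 0.2                                    # arbitrary start value
for _ in range(25):
    p_new = p - dlnL(p) / d2lnL(p)         # Newton-Raphson update applied to lnL'
    if abs(p_new - p) < 1e-6:
        break
    p = p_new
print(p)   # converges to 0.5, where the (log-)likelihood is maximized
```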
20. Overview of Numerical Techniques
- One can maximize the lnL function by finding a point where its derivative is 0
- A variety of methods are available for maximizing L, or lnL
- Newton-Raphson
- Fisher scoring
- Expectation-Maximization (EM)
- The generality of ML estimation and these numerical techniques results in the same concepts and estimation routines being employed across modeling situations
- Logistic regression, log-linear modeling, FA, SEM, LCA
21. ML Estimation of Person Parameters When Item Parameters Are Known
- Assume the item parameters bj, aj, and cj are known
- Assume unidimensionality, local independence, and respondent independence
- The conditional probability now depends on the person parameter only
- The likelihood is then a function of the person parameters only
22. ML Estimation of Person Parameters When Item Parameters Are Known
- Choose each θi such that L or lnL is maximized
- Let's suppose we have one examinee
- Maximize this function using any of several methods
- We'll use Newton-Raphson
23. Newton-Raphson Estimation Recap
- Recall that NR seeks to find the root of a function (where the function equals 0)
- NR updates follow the general structure x(n+1) = x(n) - f(x(n)) / f'(x(n)): the current value, minus the function of interest divided by the derivative of the function of interest
- What is our function of interest? What is the derivative of this function?
24. Newton-Raphson Estimation of Person Parameters
- Newton-Raphson uses the derivative of the function of interest
- Here, our function of interest is itself a derivative: the first derivative of lnL with respect to θi
- We'll need the second derivative as well as the first derivative
- Updates are given by θi(new) = θi(current) - (∂lnL/∂θi) / (∂²lnL/∂θi²)
25. ML Estimation of Person Parameters When Item Parameters Are Known: The Log-Likelihood
- The log-likelihood to be maximized: lnL = Σj [Xij ln Pij + (1 - Xij) ln(1 - Pij)]
- Select a start value and iterate towards a solution using Newton-Raphson
- A hill-climbing sequence
26. ML Estimation of Person Parameters When Item Parameters Are Known: Newton-Raphson
27. ML Estimation of Person Parameters When Item Parameters Are Known: Newton-Raphson
28. ML Estimation of Person Parameters When Item Parameters Are Known: Newton-Raphson
- Move to θi = -0.0001
- When the change in θi is arbitrarily small (e.g., less than 0.001), stop estimation (the full procedure is sketched below)
- There is no meaningful change in the next step
- The key is that the tangent is 0
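A minimal Python sketch of this procedure for one examinee, assuming the Rasch model; the response pattern, item difficulties, and start value are hypothetical, not from the slides:

```python
import math

def p_rasch(theta, b):
    """P(X = 1 | theta, b) under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def ml_theta(responses, b, theta=0.0, tol=1e-3, max_iter=25):
    """Newton-Raphson ML estimation of theta with item difficulties b treated as known."""
    for _ in range(max_iter):
        p = [p_rasch(theta, bj) for bj in b]
        d1 = sum(x - pj for x, pj in zip(responses, p))    # dlnL/dtheta (Rasch)
        d2 = -sum(pj * (1 - pj) for pj in p)               # d2lnL/dtheta^2 (Rasch)
        theta_new = theta - d1 / d2                        # Newton-Raphson update
        if abs(theta_new - theta) < tol:                   # stop when the change is small
            return theta_new
        theta = theta_new
    return theta

# hypothetical 5-item response pattern and known difficulties
print(ml_theta([1, 1, 0, 1, 0], b=[-1.0, -0.5, 0.0, 0.5, 1.0]))
```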
29. Newton-Raphson Estimation of Multiple Person Parameters
- But we have N examinees, each with a θi to be estimated
- We need a multivariate version of the Newton-Raphson algorithm
30. First Order Derivatives
- First order derivatives of the log-likelihood: ∂lnL/∂θi, for i = 1, ..., N
- ∂lnL/∂θi only involves terms corresponding to subject i
Why???
31. Second Order Derivatives
- Hessian: the matrix of second order partial derivatives of the log-likelihood
- This matrix needs to be inverted
- In the current context, this matrix is diagonal
Why???
32. Second Order Derivatives
- The inverse of the Hessian is diagonal, with elements that are the reciprocals of the diagonal of the Hessian
- Updates for each θi do not depend on any other subject's θ
33. Second Order Derivatives
- The updates for each θi are independent of one another
- The procedure can be performed one examinee at a time
34. ML Estimation of Person Parameters When Item Parameters Are Known: Standard Errors
- The approximate, asymptotic standard error of the ML estimate of θi is SE(θi) = 1 / √I(θi) (illustrated below)
- where I(θi) is the information function
- Standard errors are
- asymptotic with respect to the number of items
- approximate because only an estimate of θi is employed
- asymptotically approximately unbiased
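A sketch of the corresponding computation, again assuming the Rasch model, for which the information function is I(θ) = Σj Pj(1 - Pj); the values are hypothetical:

```python
import math

def p_rasch(theta, b):
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def se_theta(theta_hat, b):
    """Approximate SE of the ML estimate: 1 / sqrt(I(theta)), with Rasch information."""
    info = sum(p_rasch(theta_hat, bj) * (1 - p_rasch(theta_hat, bj)) for bj in b)
    return 1.0 / math.sqrt(info)

print(se_theta(0.3, b=[-1.0, -0.5, 0.0, 0.5, 1.0]))
```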
35. ML Estimation of Person Parameters When Item Parameters Are Known: Strengths
- ML estimates have some desirable qualities
- They are consistent
- If a sufficient statistic exists, then the MLE is a function of that statistic (Rasch models)
- Asymptotically normally distributed
- Asymptotically the most efficient (least variable) estimator among the class of normally distributed unbiased estimators
- Asymptotic with respect to what?
36. ML Estimation of Person Parameters When Item Parameters Are Known: Weaknesses
- ML estimates have some undesirable qualities
- Estimates may fly off into outer space
- They do not exist for so-called perfect scores (all 1s or all 0s)
- They can be difficult to compute or verify when the likelihood function is not single-peaked (which may occur with the 3-PLM or more complex IRT models)
37. ML Estimation of Person Parameters When Item Parameters Are Known: Weaknesses
- Strategies to handle wayward solutions
- Bound the amount of change at any one iteration
- Atheoretical
- No longer common
- Use an alternative estimation framework (Fisher, Bayesian)
- Strategies to handle perfect scores
- Do not estimate θi
- Use an alternative estimation framework (Bayesian)
- Strategies to handle local maxima
- Re-estimate the parameters using different starting points and look for agreement
38. ML Estimation of Person Parameters When Item Parameters Are Known: Weaknesses
- An alternative to the Newton-Raphson technique is Fisher's method of scoring
- Instead of the Hessian, it uses the information matrix (the negative of the expected Hessian)
- This usually leads to quicker convergence
- It is often more stable than Newton-Raphson
- But what about those perfect scores?
39. Bayes' Theorem
- We can avoid some of the problems that occur in ML estimation by employing a Bayesian approach
- All entities are treated as random variables
- Bayes' theorem for random variables A and B: P(A | B) = P(B | A) P(A) / P(B)
- P(A | B) is the posterior distribution of A given B: the probability of A, given B
- P(B | A) is the conditional probability of B, given A
- P(B) is the marginal probability of B
40. Bayes' Theorem
- If A is discrete, P(B) = Σa P(B | A = a) P(A = a)
- If A is continuous, P(B) = ∫ P(B | A = a) P(A = a) da
- Note that P(B | A) = L(A | B)
41. Bayesian Estimation of Person Parameters: The Posterior
- Select a prior distribution for θi, denoted P(θi)
- Recall that the likelihood function takes the form P(Xi | θi)
- The posterior density of θi given Xi is P(θi | Xi) = P(Xi | θi) P(θi) / P(Xi)
- Since P(Xi) is a constant, P(θi | Xi) ∝ P(Xi | θi) P(θi)
42. Bayesian Estimation of Person Parameters: The Posterior
43. Maximum A Posteriori Estimation of Person Parameters
- The Maximum A Posteriori (MAP) estimate is the maximum of the posterior density of θi
- Computed by maximizing the posterior density, or its log
- Find θi such that ∂lnP(θi | Xi)/∂θi = 0
- Use Newton-Raphson or Fisher scoring
- The max of lnP(θi | Xi) occurs at the max of lnP(Xi | θi) + lnP(θi)
- This can be thought of as augmenting the likelihood with prior information (see the sketch below)
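A sketch of MAP estimation along these lines, assuming the Rasch model and a N(0, 1) prior (all values hypothetical); note that, unlike ML, it converges even for a perfect score:

```python
import math

def p_rasch(theta, b):
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def map_theta(responses, b, prior_mean=0.0, prior_var=1.0, theta=0.0, tol=1e-3):
    """Maximize lnP(X | theta) + lnP(theta) via Newton-Raphson (normal prior, Rasch model)."""
    for _ in range(50):
        p = [p_rasch(theta, bj) for bj in b]
        d1 = sum(x - pj for x, pj in zip(responses, p)) - (theta - prior_mean) / prior_var
        d2 = -sum(pj * (1 - pj) for pj in p) - 1.0 / prior_var
        theta_new = theta - d1 / d2
        if abs(theta_new - theta) < tol:
            return theta_new
        theta = theta_new
    return theta

# a perfect score: the MAP exists even though the ML estimate does not
print(map_theta([1, 1, 1, 1, 1], b=[-1.0, -0.5, 0.0, 0.5, 1.0]))
```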
44. Choice of Prior Distribution
- Choosing P(θi) = U(-∞, ∞) yields a posterior proportional to the likelihood
- In this case, the MAP is very similar to the ML estimate
- The prior distribution P(θi) is often assumed to be N(0, 1)
- The normal distribution is commonly justified by appeal to the CLT
- The choice of mean and variance identifies the scale of the latent continuum
45. MAP Estimation of Person Parameters: Features
- The approximate, asymptotic standard error of the MAP is SE(θi) = 1 / √I(θi)
- where I(θi) is the information from the posterior density
- Advantages of the MAP estimator
- Exists for every response pattern (why?)
- Generally leads to a reduced tendency for local extrema
- Disadvantages of the MAP estimator
- Must specify a prior
- Exhibits shrinkage, in that it is biased towards the mean; may need lots of items to swamp the prior if it is misspecified
- Calculations are iterative and may take a long time
- May result in local extrema
46. Expected A Posteriori (EAP) Estimation of Person Parameters
- The Expected A Posteriori (EAP) estimator is the mean of the posterior distribution: E(θi | Xi) = ∫ θ P(θ | Xi) dθ
- Exact computations are often intractable
- We approximate the integral using numerical techniques
- Essentially, we take a weighted average of the values, where the weights are determined by the posterior distribution
- Recall that the posterior distribution is itself determined by the prior and the likelihood
47. Numerical Integration Via Quadrature
- The posterior distribution is evaluated over a set of quadrature points
- Evaluate the heights of the distribution at each point
- Use the relative heights as the weights
- (Figure: the heights across the quadrature points sum to .165, so a point with height .021 receives weight .021/.165 ≈ .127)
48. EAP Estimation of θ via Quadrature
- The Expected A Posteriori (EAP) estimate is a weighted average: θi(EAP) ≈ Σr Qr H(Qr) (see the sketch below)
- where H(Qr) is the weight of quadrature point Qr in the posterior (compare Embretson & Reise, 2000, p. 177)
- The standard error is the standard deviation of the posterior, and may also be approximated via quadrature
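A sketch of EAP estimation along these lines, assuming the Rasch model, a N(0, 1) prior, and an evenly spaced grid of quadrature points from -4 to 4 (all values hypothetical):

```python
import math

def p_rasch(theta, b):
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def eap_theta(responses, b, n_points=41):
    """EAP estimate: posterior mean over a grid of quadrature points (N(0, 1) prior)."""
    points = [-4 + 8 * r / (n_points - 1) for r in range(n_points)]
    heights = []
    for q in points:
        prior = math.exp(-0.5 * q * q)                    # N(0, 1) density up to a constant
        like = 1.0
        for x, bj in zip(responses, b):
            pj = p_rasch(q, bj)
            like *= pj if x == 1 else (1 - pj)
        heights.append(prior * like)                      # posterior height at this point
    total = sum(heights)
    weights = [h / total for h in heights]                # relative heights as weights
    eap = sum(q * w for q, w in zip(points, weights))     # weighted average
    sd = math.sqrt(sum((q - eap) ** 2 * w for q, w in zip(points, weights)))
    return eap, sd                                        # estimate and its SE (posterior SD)

print(eap_theta([1, 1, 0, 1, 0], b=[-1.0, -0.5, 0.0, 0.5, 1.0]))
```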
49. EAP Estimation of θ via Quadrature
- Advantages
- Exists for all possible response patterns
- Non-iterative solution strategy
- Not a maximum, therefore no local extrema
- Has the smallest MSE in the population
- Disadvantages
- Must specify a prior
- Exhibits shrinkage to the prior mean; if the prior is misspecified, may need lots of items to swamp the prior
50. ML Estimation of Item Parameters When Person Parameters Are Known: Assumptions
- Assume
- the person parameters θi are known
- respondent and local independence
- Choose values for the item parameters that maximize lnL
51. Newton-Raphson Estimation
- What is the structure of this matrix?
52. ML Estimation of Item Parameters When Person Parameters Are Known
- Just as we could estimate subjects one at a time thanks to respondent independence, we can estimate items one at a time thanks to local independence (see the sketch below)
- Multivariate Newton-Raphson
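A minimal sketch of the item-side analogue: Newton-Raphson ML estimation of a single Rasch difficulty, treating the person parameters as known (data and values hypothetical):

```python
import math

def p_rasch(theta, b):
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def ml_difficulty(item_responses, thetas, b=0.0, tol=1e-3):
    """Newton-Raphson ML estimate of one Rasch difficulty b_j, with thetas known."""
    for _ in range(50):
        p = [p_rasch(t, b) for t in thetas]
        d1 = -sum(x - pi for x, pi in zip(item_responses, p))   # dlnL/db (Rasch)
        d2 = -sum(pi * (1 - pi) for pi in p)                     # d2lnL/db^2 (Rasch)
        b_new = b - d1 / d2
        if abs(b_new - b) < tol:
            return b_new
        b = b_new
    return b

# hypothetical responses of six examinees (with known thetas) to one item
print(ml_difficulty([1, 1, 1, 0, 1, 0], thetas=[-1.5, -0.5, 0.0, 0.5, 1.0, 2.0]))
```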
53. ML Estimation of Item Parameters When Person Parameters Are Known: Standard Errors
- To obtain the approximate, asymptotic standard errors
- Invert the associated information matrix, which yields the variance-covariance matrix
- Take the square root of the elements of the diagonal
- Asymptotic w.r.t. sample size, and approximate because we only have estimates of the parameters
- This is conceptually similar to the standard errors for the estimation of θ
- But why do we need a matrix approach?
54. ML Estimation of Item Parameters When Person Parameters Are Known: Standard Errors
- ML estimates of item parameters have the same properties as those for person parameters: consistent, efficient, asymptotic (w.r.t. subjects)
- aj parameters can be difficult to estimate and tend to get inflated with small sample sizes
- cj parameters are often difficult to estimate well
- Usually because there's not a lot of information in the data about the lower asymptote
- Especially true when items are easy
- Generally need larger and more heterogeneous samples to estimate the 2-PL and 3-PL
- Can employ Bayesian estimation (more on this later)