Transcript and Presenter's Notes

Title: Estimation of Item Response Models


1
Estimation of Item Response Models
  • Mister Ibik
  • Division of Psychology in Education
  • Arizona State University
  • EDP 691: Advanced Topics in Item Response Theory

2
Motivation and Objectives
  • Why estimate?
  • A distinguishing feature of IRT modeling, as
    compared to classical techniques, is the presence
    of explicit item and person parameters
  • These parameters characterize and guide inference
    regarding entities of interest (i.e., examinees,
    items)
  • We will think through
  • Different estimation situations
  • Alternative estimation techniques
  • The logic and mathematics underpinning these
    techniques
  • Various strengths and weaknesses
  • What you will have
  • A detailed introduction to principles and
    mathematics
  • A resource to be revisited…and revisited…and
    revisited

3
Outline
  • Some Necessary Mathematical Background
  • Maximum Likelihood and Bayesian Theory
  • Estimation of Person Parameters When Item
    Parameters are Known
  • ML
  • MAP
  • EAP
  • Estimation of Item Parameters When Person
    Parameters are Known
  • ML
  • Simultaneous Estimation of Item and Person
    Parameters
  • JML
  • CML
  • MML
  • Other Approaches

4
Background: Finding the Root of an Equation
  • Newton-Raphson Algorithm
  • Finds the root of an equation
  • Example: the function f(x) = x²
  • Has a root (where f(x) = 0) at x = 0

5
Newton-Raphson
  • Newton-Raphson takes a given point, x0, and
    systematically progresses to find the root of the
    equation
  • Utilizes the slope of the function to find where
    the root may be
  • The slope of the function is given by the
    derivative
  • Denoted f′(x)
  • Gives the slope of the straight line that is
    tangent to f(x) at x
  • Tangent = best linear prediction of how the
    function is changing
  • For x0, the best guess for the root is the point
    where the tangent line at x0 crosses 0
  • This occurs at x0 − f(x0)/f′(x0)
  • So the next candidate point for the root is
    x1 = x0 − f(x0)/f′(x0)

6
Newton-Raphson Updating (1)
  • Suppose x0 = 1.5
  • f(x0) = 2.25, f′(x0) = 3
  • x1 = 1.5 − 2.25/3 = 0.75
7
Newton-Raphson Updating (2)
  • Now x1 = 0.75
  • f(x1) = 0.5625, f′(x1) = 1.5
  • x2 = 0.75 − 0.5625/1.5 = 0.375
8
Newton-Raphson Updating (3)
  • Now x2 = 0.375
  • f(x2) = 0.1406, f′(x2) = 0.75
  • x3 = 0.375 − 0.1406/0.75 ≈ 0.1875
9
Newton-Raphson Updating (4)
  • Now x3 = 0.1875
  • f(x3) = 0.0352, f′(x3) = 0.375
  • x4 = 0.1875 − 0.0352/0.375 ≈ 0.0938
10
Newton-Raphson Example
Iteration   x        f(x)     f′(x)    f(x)/f′(x)   Next x
0           1.5000   2.2500   3.0000   0.7500       0.7500
1           0.7500   0.5625   1.5000   0.3750       0.3750
2           0.3750   0.1406   0.7500   0.1875       0.1875
3           0.1875   0.0352   0.3750   0.0938       0.0938
4           0.0938   0.0088   0.1875   0.0469       0.0469
5           0.0469   0.0022   0.0938   0.0234       0.0234
6           0.0234   0.0005   0.0469   0.0117       0.0117
7           0.0117   0.0001   0.0234   0.0059       0.0059
8           0.0059   0.0000   0.0117   0.0029       0.0029
9           0.0029   0.0000   0.0059   0.0015       0.0015
10          0.0015   0.0000   0.0029   0.0007       0.0007
11
Newton-Raphson Summary
  • Iterative algorithm for finding the root of an
    equation
  • Takes a starting point and systematically
    progresses to find the root of the function
  • Requires the derivative of the function
  • Each successive point is given by
    x(n+1) = x(n) − f(x(n)) / f′(x(n))
  • The process continues until we get arbitrarily
    close, as usually measured by the change in x or
    in the function value
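A minimal sketch of the algorithm summarized above, applied to the slides' example f(x) = x². The function names, tolerance, and iteration cap are illustrative choices, not taken from the presentation.

```python
def newton_raphson(f, f_prime, x0, tol=1e-3, max_iter=50):
    """Iterate x(n+1) = x(n) - f(x(n)) / f'(x(n)) until the change in x is tiny."""
    x = x0
    for _ in range(max_iter):
        x_new = x - f(x) / f_prime(x)   # can misbehave when f'(x) is near 0
        if abs(x_new - x) < tol:        # stop when the update is arbitrarily small
            return x_new
        x = x_new
    return x

# The slides' example: f(x) = x^2, with derivative f'(x) = 2x and root at x = 0
root = newton_raphson(lambda x: x**2, lambda x: 2 * x, x0=1.5)
print(root)   # the iterates halve each step (1.5, 0.75, 0.375, ...), as in the table above
```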

12
Difficulties With Newton-Raphson
  • Some functions have multiple roots
  • Which root is found often depends on the start
    value

13
Difficulties With Newton-Raphson
  • Numerical complications can arise
  • When the derivative is relatively small in
    magnitude, the step f(x)/f′(x) becomes very large
    and the algorithm shoots into outer space

14
Logic of Maximum Likelihood
  • A general approach to parameter estimation
  • The use of a model implies that the data may be
    sufficiently characterized by the features of the
    model, including the unknown parameters
  • Parameters govern the data in the sense that the
    data depend on the parameters
  • Given values of the parameters we can calculate
    the (conditional) probability of the data
  • P(Xij = 1 | θi, bj) = exp(θi − bj) / (1 + exp(θi − bj))
  • Maximum likelihood (ML) estimation asks: what
    are the values of the parameters that make the
    data most probable?

15
Example Series of Bernoulli Variables With
Unknown Probability
  • Bernoulli variable: P(X = 1) = p
  • The probability of the data is given by
    p^X (1 − p)^(1 − X)
  • Suppose we have two random variables X1 and X2
  • When taken as a function of the parameters, it is
    called the likelihood
  • Suppose X1 = 1, X2 = 0
  • P(X1 = 1, X2 = 0 | p) = L(p | X1 = 1, X2 = 0) = p(1 − p)
  • Choose p to maximize the conditional probability
    of the data (see the sketch below)
  • For p = 0.1, L = 0.1 × (1 − 0.1) = 0.09
  • For p = 0.2, L = 0.2 × (1 − 0.2) = 0.16
  • For p = 0.3, L = 0.3 × (1 − 0.3) = 0.21
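A minimal sketch of the calculation on this slide: evaluating L(p | X1 = 1, X2 = 0) = p(1 − p) over a grid of candidate p values. The grid itself is an arbitrary choice for illustration.

```python
import numpy as np

x = np.array([1, 0])                  # observed data: X1 = 1, X2 = 0
p_grid = np.arange(0.1, 1.0, 0.1)     # candidate values of p

# Likelihood of the data for each candidate p: product of p^X * (1 - p)^(1 - X)
L = np.array([np.prod(p**x * (1 - p)**(1 - x)) for p in p_grid])

for p, like in zip(p_grid, L):
    print(f"p = {p:.1f}: L = {like:.2f}")   # 0.09, 0.16, 0.21, ... peaking at p = 0.5
```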

16
Example: Likelihood Function
17
The Likelihood Function in IRT
  • The Likelihood may be thought of as the
    conditional probability, where the data are known
    and the parameters vary
  • Let Pij = P(Xij = 1 | θi, ξj), where ξj denotes
    the parameters of item j
  • L(θ, ξ | X) = ∏i ∏j Pij^Xij (1 − Pij)^(1 − Xij)
  • The goal is to maximize this function: what
    values of the parameters yield the highest value?

18
Log-Likelihood Functions
  • It is numerically easier to maximize the natural
    logarithm of the likelihood
  • The log-likelihood has the same maximum as the
    likelihood
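A minimal sketch tying the last two slides together: the likelihood and log-likelihood of one response pattern evaluated over a grid of θ values, assuming a 2-PL item response function with made-up item parameters. Both curves peak at the same θ.

```python
import numpy as np

a = np.array([1.0, 1.5, 0.8, 1.2])   # hypothetical discriminations (assumed known)
b = np.array([-1.0, 0.0, 0.5, 1.0])  # hypothetical difficulties (assumed known)
x = np.array([1, 1, 0, 1])           # one examinee's response pattern

def p_correct(theta):
    """2-PL item response function P(Xij = 1 | theta)."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

theta_grid = np.linspace(-3, 3, 61)
L = np.array([np.prod(p_correct(t)**x * (1 - p_correct(t))**(1 - x))
              for t in theta_grid])      # likelihood at each grid point
lnL = np.log(L)                          # log-likelihood at each grid point

# The likelihood and the log-likelihood have the same maximizer.
print(theta_grid[np.argmax(L)], theta_grid[np.argmax(lnL)])
```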

19
Maximizing the Log-Likelihood
  • Note that at the maximum of the function, the
    slope of the tangent line equals 0
  • The slope of the tangent is given by the first
    derivative
  • If we can find the point at which the first
    derivative equals 0, we will have also found the
    point at which the function is maximized

20
Overview of Numerical Techniques
  • One can maximize the lnL function by finding a
    point where its derivative is 0
  • A variety of methods are available for maximizing
    L, or lnL
  • Newton-Raphson
  • Fisher Scoring
  • Expectation-Maximization (EM)
  • The generality of ML estimation and these
    numerical techniques results in the same concepts
    and estimation routines being employed across
    modeling situations
  • Logistic regression, log-linear modeling, FA,
    SEM, LCA

21
ML Estimation of Person Parameters When Item
Parameters Are Known
  • Assume item parameters bj, aj, and cj, are known
  • Assume unidimensionality, local and respondent
    independence
  • Conditional probability now depends on the person
    parameter only: Pij = P(Xij = 1 | θi)
  • Likelihood function for the person parameters
    only: L(θi | Xi) = ∏j Pij^Xij (1 − Pij)^(1 − Xij)

22
ML Estimation of Person Parameters When Item
Parameters Are Known
  • Choose each ?i such that L or lnL is maximized
  • Let's suppose we have one examinee
  • Maximize this function using any of several
    methods
  • We'll use Newton-Raphson

23
Newton-Raphson Estimation Recap
  • Recall NR seeks to find the root of a function
    (where the function equals 0)
  • NR updates follow the general structure:
    updated value = current value − (function of
    interest at the current value) / (derivative of
    the function of interest at the current value)
  • That is, x(n+1) = x(n) − f(x(n)) / f′(x(n))
  • What is our function of interest here?
  • What is the derivative of this function?
24
Newton-Raphson Estimation of Person Parameters
  • Newton-Raphson uses the derivative of the
    function of interest
  • Our function is itself a derivative, the first
    derivative of lnL with respect to θi
  • We'll need the second derivative as well as the
    first derivative
  • Updates given by
    θi(new) = θi(old) − (∂lnL/∂θi) / (∂²lnL/∂θi²),
    with the derivatives evaluated at θi(old)

25
ML Estimation of Person Parameters When Item
Parameters Are Known The Log-Likelihood
  • The log-likelihood to be maximized:
    lnL(θi | Xi) = Σj [ Xij ln Pij + (1 − Xij) ln(1 − Pij) ]
  • Select a start value and iterate towards a
    solution using Newton-Raphson
  • A hill-climbing sequence
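A minimal sketch of the iteration just described for a single examinee, assuming a 2-PL response function (for which ∂lnL/∂θi = Σj aj(Xij − Pij) and ∂²lnL/∂θi² = −Σj aj² Pij(1 − Pij)). The item parameters, responses, and start value are made-up illustrative values, so the iterates will not match the numbers on the following slides.

```python
import numpy as np

a = np.array([1.0, 1.5, 0.8, 1.2])   # hypothetical item discriminations (known)
b = np.array([-1.0, 0.0, 0.5, 1.0])  # hypothetical item difficulties (known)
x = np.array([1, 1, 0, 1])           # one examinee's responses

theta = -1.0                          # start value (the slides also start at -1.0)
for _ in range(25):
    P = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    first = np.sum(a * (x - P))            # dlnL/dtheta under the 2-PL
    second = -np.sum(a**2 * P * (1 - P))   # d2lnL/dtheta2 under the 2-PL
    theta_new = theta - first / second     # Newton-Raphson update
    if abs(theta_new - theta) < 0.001:     # stop when the change is arbitrarily small
        theta = theta_new
        break
    theta = theta_new
print(theta)   # the ML estimate of theta for this response pattern
```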

26
ML Estimation of Person Parameters When Item
Parameters Are Known: Newton-Raphson
  • Start at -1.0

27
ML Estimation of Person Parameters When Item
Parameters Are Known: Newton-Raphson
  • Move to 0.09

28
ML Estimation of Person Parameters When Item
Parameters Are Known: Newton-Raphson
  • Move to -0.0001
  • When the change in θi is arbitrarily small (e.g.,
    less than 0.001), stop estimation
  • No meaningful change in the next step
  • The key is that the slope of the tangent is 0

29
Newton-Raphson Estimation of Multiple Person
Parameters
  • But we have N examinees, each with a θi to be
    estimated
  • We need a multivariate version of the
    Newton-Raphson algorithm

30
First Order Derivatives
  • First order derivatives of the log-likelihood
  • ∂lnL/∂θi only involves terms corresponding to
    subject i

Why???
31
Second Order Derivatives
  • Hessian: the matrix of second-order partial
    derivatives of the log-likelihood
  • This matrix needs to be inverted
  • In the current context, this matrix is diagonal

Why???
32
Second Order Derivatives
  • The inverse of the Hessian is diagonal with
    elements that are the reciprocals of the diagonal
    of the Hessian
  • Updates for each θi do not depend on any other
    subject's θ

33
Second Order Derivatives
  • The updates for each θi are independent of one
    another
  • The procedure can be performed one examinee at a
    time
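A minimal sketch of this point, reusing the single-examinee Newton-Raphson step sketched above: because the Hessian is diagonal, estimating N examinees is just a loop over persons. The item parameters and response matrix are made-up values, and ml_theta is a hypothetical helper, not something from the presentation.

```python
import numpy as np

a = np.array([1.0, 1.5, 0.8, 1.2])    # illustrative 2-PL item parameters (known)
b = np.array([-1.0, 0.0, 0.5, 1.0])
X = np.array([[1, 1, 0, 1],           # response matrix: one row per examinee
              [0, 1, 0, 0],
              [1, 1, 1, 0]])

def ml_theta(x, theta=0.0, tol=0.001, max_iter=25):
    """Single-examinee Newton-Raphson for theta under the 2-PL (as sketched above)."""
    for _ in range(max_iter):
        P = 1.0 / (1.0 + np.exp(-a * (theta - b)))
        step = np.sum(a * (x - P)) / -np.sum(a**2 * P * (1 - P))
        theta, prev = theta - step, theta
        if abs(theta - prev) < tol:
            break
    return theta

# Because the Hessian is diagonal, each examinee can be handled independently.
theta_hats = [ml_theta(x) for x in X]
print(theta_hats)
```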

34
ML Estimation of Person Parameters When Item
Parameters Are Known: Standard Errors
  • The approximate, asymptotic standard error of the
    ML estimate of θi is
    SE(θ̂i) = 1 / √I(θi)
  • where I(θi) is the information function
  • Standard errors are
  • asymptotic with respect to the number of items
  • approximate because only an estimate of θi is
    employed
  • asymptotically approximately unbiased
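A minimal sketch of the standard error calculation, assuming the 2-PL information function I(θi) = Σj aj² Pij(1 − Pij) evaluated at the estimate; the item parameter values are the same illustrative ones used above.

```python
import numpy as np

a = np.array([1.0, 1.5, 0.8, 1.2])   # illustrative 2-PL item parameters (known)
b = np.array([-1.0, 0.0, 0.5, 1.0])

def se_theta(theta_hat):
    """Approximate SE of the ML estimate: 1 / sqrt(I(theta)) under the 2-PL."""
    P = 1.0 / (1.0 + np.exp(-a * (theta_hat - b)))
    info = np.sum(a**2 * P * (1 - P))   # test information at theta_hat
    return 1.0 / np.sqrt(info)

print(se_theta(0.0))   # SE for an examinee whose estimate is 0.0
```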

35
ML Estimation of Person Parameters When Item
Parameters Are Known: Strengths
  • ML estimates have some desirable qualities
  • They are consistent
  • If a sufficient statistic exists, then the MLE is
    a function of that statistic (Rasch models)
  • Asymptotically normally distributed
  • Asymptotically most efficient (least variable)
    estimator among the class of normally distributed
    unbiased estimators
  • Asymptotically with respect to what?

36
ML Estimation of Person Parameters When Item
Parameters Are Known: Weaknesses
  • ML estimates have some undesirable qualities
  • Estimates may fly off into outer space
  • They do not exist for so-called perfect scores
    (all 1s or all 0s)
  • Can be difficult to compute or verify when the
    likelihood function is not single peaked (may
    occur with 3-PLM or more complex IRT models)

37
ML Estimation of Person Parameters When Item
Parameters Are Known: Weaknesses
  • Strategies to handle wayward solutions
  • Bound the amount of change at any one iteration
  • Atheoretical
  • No longer common
  • Use an alternative estimation framework (Fisher,
    Bayesian)
  • Strategies to handle perfect scores
  • Do not estimate ?i
  • Use an alternative estimation framework
    (Bayesian)
  • Strategies to handle local maxima
  • Re-estimate the parameters using different
    starting points and look for agreement

38
ML Estimation of Person Parameters When Item
Parameters Are Known: Weaknesses
  • An alternative to the Newton-Raphson technique is
    Fisher's method of scoring
  • Instead of the Hessian, it uses the information
    matrix (based on the Hessian)
  • This usually leads to quicker convergence
  • Often is more stable than Newton-Raphson
  • But what about those perfect scores?

39
Bayes' Theorem
  • We can avoid some of the problems that occur in
    ML estimation by employing a Bayesian approach
  • All entities treated as random variables
  • Bayes' Theorem for random variables A and B:
    P(A | B) = P(B | A) P(A) / P(B)
  • P(A | B): posterior distribution of A, given B
    (the probability of A, given B)
  • P(B | A): conditional probability of B, given A
  • P(A): prior probability of A
  • P(B): marginal probability of B

40
Bayes' Theorem
  • If A is discrete: P(B) = Σ_A P(B | A) P(A)
  • If A is continuous: P(B) = ∫ P(B | A) P(A) dA
  • Note that P(B | A) = L(A | B)

41
Bayesian Estimation of Person Parameters: The
Posterior
  • Select a prior distribution for θi, denoted P(θi)
  • Recall the likelihood function takes on the form
    P(Xi | θi)
  • The posterior density of θi given Xi is
    P(θi | Xi) = P(Xi | θi) P(θi) / P(Xi)
  • Since P(Xi) is a constant,
    P(θi | Xi) ∝ P(Xi | θi) P(θi)

42
Bayesian Estimation of Person Parameters: The
Posterior
  • The Likelihood
  • The Prior
  • The Posterior

43
Maximum A Posteriori Estimation of Person
Parameters
  • The Maximum A Posteriori (MAP) estimate is
    the maximum of the posterior density of θi
  • Computed by maximizing the posterior density, or
    its log
  • Find θi such that ∂ lnP(θi | Xi) / ∂θi = 0
  • Use Newton-Raphson or Fisher scoring
  • Max of lnP(θi | Xi) occurs at the max of
    lnP(Xi | θi) + lnP(θi)
  • This can be thought of as augmenting the
    likelihood with prior information
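A minimal sketch of MAP estimation as just described, reusing the illustrative 2-PL items from earlier and assuming a N(0, 1) prior, whose log-density contributes −θi to the first derivative and −1 to the second. The response pattern is a made-up perfect score, to show that the MAP still exists where the MLE does not.

```python
import numpy as np

a = np.array([1.0, 1.5, 0.8, 1.2])   # illustrative 2-PL item parameters (known)
b = np.array([-1.0, 0.0, 0.5, 1.0])
x = np.array([1, 1, 1, 1])           # a "perfect score" -- the MAP still exists

theta = 0.0
for _ in range(25):
    P = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    first = np.sum(a * (x - P)) - theta          # dlnL/dtheta + dlnP(theta)/dtheta
    second = -np.sum(a**2 * P * (1 - P)) - 1.0   # d2lnL/dtheta2 + d2lnP(theta)/dtheta2
    theta_new = theta - first / second           # Newton-Raphson on the log-posterior
    if abs(theta_new - theta) < 0.001:
        theta = theta_new
        break
    theta = theta_new
print(theta)   # a finite MAP estimate, even for an all-correct pattern
```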

44
Choice of Prior Distribution
  • Choosing P(θi) to be uniform, U(−∞, ∞), yields a
    posterior proportional to the likelihood
  • In this case, the MAP is very similar to the ML
    estimate
  • The prior distribution P(θi) is often assumed to
    be N(0, 1)
  • The normal distribution is commonly justified by
    appeal to the CLT
  • Choice of mean and variance identifies the scale
    of the latent continuum

45
MAP Estimation of Person Parameters: Features
  • The approximate, asymptotic standard error of the
    MAP is SE(θ̂i) = 1 / √I(θi)
  • where I(θi) is the information from the posterior
    density
  • Advantages of the MAP estimator
  • Exists for every response pattern (why?)
  • Generally leads to a reduced tendency for local
    extrema
  • Disadvantages of the MAP estimator
  • Must specify a prior
  • Exhibits shrinkage in that it is biased towards
    the mean; may need lots of items to swamp the
    prior if it is misspecified
  • Calculations are iterative and may take a long
    time
  • May result in local extrema

46
Expected A Posteriori (EAP) Estimation of Person
Parameters
  • The Expected A Posteriori (EAP) estimator is the
    mean of the posterior distribution,
    E(θi | Xi) = ∫ θi P(θi | Xi) dθi
  • Exact computation of this integral is often
    intractable
  • We approximate the integral using numerical
    techniques
  • Essentially, we take a weighted average of the
    values, where the weights are determined by the
    posterior distribution
  • Recall that the posterior distribution is itself
    determined by the prior and the likelihood

47
Numerical Integration Via Quadrature
  • The posterior distribution is evaluated at a set
    of quadrature points
  • Evaluate the heights of the distribution at each
    point
  • Use the relative heights as the weights
  • For example: if the heights sum to ≈ .165, a point
    with height .021 gets weight .021 / .165 ≈ .127,
    and a point with height .002 gets weight
    .002 / .165 ≈ .012
48
EAP Estimation of θ via Quadrature
  • The Expected A Posteriori (EAP) is estimated by a
    weighted average of the quadrature points,
    θ̂(EAP) ≈ Σ_r Q_r H(Q_r)
  • where H(Qr) is the weight of point Qr in the
    posterior (compare Embretson & Reise, 2000, p.
    177)
  • The standard error is the standard deviation of
    the posterior and may also be approximated via
    quadrature
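A minimal sketch of EAP estimation by quadrature, assuming the same illustrative 2-PL items as above, a N(0, 1) prior, and an arbitrary grid of equally spaced quadrature points.

```python
import numpy as np

a = np.array([1.0, 1.5, 0.8, 1.2])   # illustrative 2-PL item parameters (known)
b = np.array([-1.0, 0.0, 0.5, 1.0])
x = np.array([1, 1, 0, 1])           # one examinee's responses

Q = np.linspace(-4, 4, 41)           # quadrature points along the theta scale
prior = np.exp(-0.5 * Q**2)          # N(0, 1) prior height at each point (unnormalized)

P = 1.0 / (1.0 + np.exp(-a * (Q[:, None] - b)))    # points x items
like = np.prod(P**x * (1 - P)**(1 - x), axis=1)    # likelihood at each point
H = like * prior / np.sum(like * prior)            # relative heights = weights H(Q_r)

eap = np.sum(H * Q)                        # weighted average of the points = EAP
se = np.sqrt(np.sum(H * (Q - eap)**2))     # posterior SD = standard error
print(eap, se)
```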

49
EAP Estimation of θ via Quadrature
  • Advantages
  • Exists for all possible response patterns
  • Non-iterative solution strategy
  • Not a maximum, therefore no local extrema
  • Has smallest MSE in the population
  • Disadvantages
  • Must specify a prior
  • Exhibits shrinkage to the prior mean; if the
    prior is misspecified, may need lots of items to
    swamp the prior

50
ML Estimation of Item Parameters When Person
Parameters Are Known: Assumptions
  • Assume
  • person parameters θi are known
  • respondent and local independence
  • Choose values for item parameters that maximize
    lnL

51
Newton-Raphson Estimation
  • What is the structure of this matrix?

52
ML Estimation of Item Parameters When Person
Parameters Are Known
  • Just as we could estimate subjects one at a time
    thanks to respondent independence, we can
    estimate items one at a time thanks to local
    independence
  • Multivariate Newton-Raphson

53
ML Estimation of Item Parameters When Person
Parameters Are Known: Standard Errors
  • To obtain the approximate, asymptotic standard
    errors
  • Invert the associated information matrix, which
    yields the variance-covariance matrix
  • Take the square root of the elements of the
    diagonal
  • Asymptotic w.r.t. sample size and approximate
    because we only have estimates of the parameters
  • This is conceptually similar to those for the
    estimation of θ
  • But why do we need a matrix approach?
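A minimal sketch of the two matrix steps just described: invert the information matrix, then take square roots of its diagonal. The information matrix values are made up purely to show the mechanics.

```python
import numpy as np

# Hypothetical 2x2 information matrix for one item's (a_j, b_j) estimates
info = np.array([[12.0, -3.0],
                 [-3.0,  8.0]])

cov = np.linalg.inv(info)    # inverse of the information matrix = variance-covariance matrix
se = np.sqrt(np.diag(cov))   # square roots of the diagonal = asymptotic standard errors
print(se)                    # SE(a_j), SE(b_j)
```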

54
ML Estimation of Item Parameters When Person
Parameters Are Known: Standard Errors
  • ML estimates of item parameters have the same
    properties as those for person parameters:
    consistent, efficient, asymptotic (w.r.t.
    subjects)
  • aj parameters can be difficult to estimate, tend
    to get inflated with small sample sizes
  • cj parameters are often difficult to estimate
    well
  • Usually because there's not a lot of information
    in the data about the asymptote
  • Especially true when items are easy
  • Generally need larger and more heterogeneous
    samples to estimate 2-PL and 3-PL
  • Can employ Bayesian estimation (more on this
    later)