Transcript and Presenter's Notes

Title: Empirical Bayes DIF Assessment Rebecca Zwick, UC Santa Barbara


1
Empirical Bayes DIF Assessment
Rebecca Zwick, UC Santa Barbara
  • Presented at Measured Progress
  • August 2007

2
Overview
  • Definition and causes of DIF
  • Assessing DIF via Mantel-Haenszel
  • EB enhancement to MH DIF (1994-2002, with D. Thayer & C. Lewis)
  • Model and Applications
  • Simulation findings
  • Discussion

3
What's differential item functioning?
  • DIF occurs when equally skilled members of 2
    groups have different probabilities of answering
    an item correctly.
  • (Only dichotomous items considered today)

4
IRT Definition of (absence of) DIF
  • Lord, 1980: P(Y_i = 1 | θ, R) = P(Y_i = 1 | θ, F) means DIF is absent.
  • P(Y_i = 1 | θ, G) is the probability of correct response to item i, given θ, in group G,
  • where G = F (Focal) or R (Reference).
  • θ is a latent ability variable, imperfectly measured by test score S. (More later...)

5
Reasons for DIF
  • Construct-irrelevant difficulty (e.g., sports
    content in a math item)
  • Differential interests or educational background: NAEP History items with DIF favoring Black test-takers were about M. L. King, Harriet Tubman, and the Underground Railroad (Zwick & Ercikan, 1989)
  • Often mystifying (e.g., X + 5 = 10 has DIF; Y + 8 = 11 doesn't)

6
Mini-history of DIF analysis
  • DIF research dates back to 1960s
  • In the late 1980s (after the Golden Rule settlement), testing companies started including DIF analysis as a QC procedure.
  • Mantel-Haenszel (Holland & Thayer, 1988) became the method of choice for operational DIF analyses
  • Few assumptions
  • No complex estimation procedures
  • Easy to explain

7
Mantel-Haenszel
  • Compare item performance for members of 2 groups, after matching on total test score, S.
  • Suppose we have K levels of the score used for matching test-takers: s_1, s_2, ..., s_K.
  • In each of the K levels, the data can be represented as a 2 x 2 table (Right/Wrong by Reference/Focal).

8
Mantel-Haenszel
  • For each table, compute the conditional odds ratio:
    (odds of correct response | S = s_k, G = R) / (odds of correct response | S = s_k, G = F).
  • A weighted combination of these K ratios is the MH odds ratio, α_MH.
  • The MH DIF statistic is MH D-DIF = -2.35 ln(α_MH). (A computational sketch follows.)
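
A minimal sketch of that computation, assuming the usual 2 x 2 cell layout at each score level (A_k = Reference right, B_k = Reference wrong, C_k = Focal right, D_k = Focal wrong); the function name and example counts are illustrative:

```python
import math

def mh_d_dif(tables):
    """Mantel-Haenszel common odds ratio and the MH D-DIF statistic.

    `tables` holds one (A_k, B_k, C_k, D_k) tuple per matching score level:
        A_k = Reference right,  B_k = Reference wrong,
        C_k = Focal right,      D_k = Focal wrong.
    """
    num = den = 0.0
    for a, b, c, d in tables:
        t = a + b + c + d                  # examinees at this score level
        if t == 0:
            continue
        num += a * d / t                   # weighted Reference-right / Focal-wrong
        den += b * c / t                   # weighted Reference-wrong / Focal-right
    alpha_mh = num / den                   # MH estimate of the common odds ratio
    d_dif = -2.35 * math.log(alpha_mh)     # ETS delta scale; negative = DIF against focal group
    return alpha_mh, d_dif

# Illustrative data: three score levels with a mild Reference-group advantage
tables = [(40, 20, 30, 20), (50, 10, 45, 15), (30, 5, 25, 8)]
alpha, d_dif = mh_d_dif(tables)
print(f"alpha_MH = {alpha:.2f}, MH D-DIF = {d_dif:.2f}")
```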

9
Mantel-Haenszel
  • The MH chi-square tests the hypothesis
  • H0: α_k = 1, k = 1, 2, ..., K   versus
  • H1: α_k = α ≠ 1, k = 1, 2, ..., K,
  • where α_k is the population odds ratio at score level k.
  • (The above H0 is similar, but not, in general, identical to the IRT H0; see Zwick, 1990, Journal of Educational Statistics.)

10
Mantel-Haenszel
  • At ETS, the size of the DIF estimate plus the chi-square results are used to categorize each item:
  • A: negligible DIF
  • B: slight to moderate DIF
  • C: substantial DIF
  • For B and C, + or - is used to indicate DIF direction; - means DIF against the focal group.
  • The designation determines the item's fate. (A sketch of the classification rules follows.)
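
A sketch of those rules as they are commonly stated (roughly: A if |D-DIF| < 1 or not significantly different from 0; C if |D-DIF| is at least 1.5 and significantly greater than 1 in absolute value; B otherwise). The exact operational significance tests may differ; the z cutoffs below are illustrative.

```python
def ets_category(d_dif, se):
    """Classify an item as A/B/C from MH D-DIF and its standard error
    (a simplified sketch of the commonly cited ETS rules)."""
    z0 = abs(d_dif) / se               # test of D-DIF = 0
    z1 = (abs(d_dif) - 1.0) / se       # test of |D-DIF| = 1
    if abs(d_dif) < 1.0 or z0 < 1.96:
        return "A"                     # negligible DIF
    sign = "+" if d_dif > 0 else "-"
    if abs(d_dif) >= 1.5 and z1 > 1.645:
        return "C" + sign              # substantial DIF
    return "B" + sign                  # slight to moderate DIF

print(ets_category(-1.8, 0.3))   # C-  (DIF against the focal group)
print(ets_category(0.6, 0.4))    # A
```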

11
Drawbacks to usual MH approach
  • May give impression that DIF status is
    deterministic or is a fixed property of the item
  • Reviewers of DIF items often ignore SE
  • Is unstable in small samples, which may arise in
    CAT settings

12
EB enhancement to MH
  • Provides more stable results
  • May allow variability of DIF findings to be
    represented in a more intuitive way
  • Can be used in three ways
  • Substitute more stable point estimates for MH
  • Provide probabilistic perspective on true DIF
    status (A, B, C) and future observed status
  • Loss-function-based DIF detection

13
Main Empirical Bayes DIF Work (supported by ETS
and LSAC)
  • An EB approach to MH DIF analysis (with Thayer & Lewis). JEM, 1999. (General approach, probabilistic DIF)
  • Using loss functions for DIF detection: An EB approach (with Thayer & Lewis). JEBS, 2000. (Loss functions)
  • The assessment of DIF in CATs. In van der Linden & Glas (Eds.), CAT: Theory and Practice, 2000. (Review)
  • Application of an EB enhancement of MH DIF analysis to a CAT (with Thayer). APM, 2002. (Simulated CAT-LSAT)

14
What's an Empirical Bayes Model? (See Casella, 1985, The American Statistician)
  • In Bayesian statistics, we assume that parameters have prior distributions that describe parameter behavior.
  • Statistical theory or past research may inform us about the nature of those distributions.
  • Combining the observed data with the prior distribution yields a posterior ("after the data") distribution that can be used to obtain improved parameter estimates.
  • EB means the prior's parameters are estimated from the data (unlike fully Bayesian models).

15-20
EB DIF Model
  • (Slides 15-20 presented the model equations as images; no transcript is available. A sketch follows.)
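
Since the model slides were not transcribed, the block below is a sketch of the normal-normal EB DIF model that is consistent with the rest of the talk (the weighted-combination estimate on slide 21 and the normal posterior on slide 35). The notation is mine: MH_i is the observed MH D-DIF for item i, s_i^2 its estimated sampling variance, and μ and τ^2 the prior mean and variance, estimated from the full set of items (the "empirical" step).

```latex
\[
\widehat{\mathrm{MH}}_i \mid \theta_i \sim N(\theta_i,\, s_i^2),
\qquad \theta_i \sim N(\mu,\, \tau^2),
\]
\[
\theta_i \mid \widehat{\mathrm{MH}}_i \sim
N\!\Big( W_i\,\widehat{\mathrm{MH}}_i + (1 - W_i)\,\mu,\;\; W_i\, s_i^2 \Big),
\qquad W_i = \frac{\tau^2}{\tau^2 + s_i^2}.
\]
```

The posterior mean is the weighted combination recalled on the next slide; as an item's sample sizes shrink, s_i^2 grows, W_i moves toward zero, and the estimate is pulled toward the prior mean.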
21
Recall: the EB DIF estimate is a weighted combination of MH_i and the prior mean.
22
Next
  • Performance of EB DIF estimator
  • Probabilistic DIF idea

23
How does the EB DIF estimator EB_i compare to MH_i?
  • Applied to real data, including the GRE
  • Applied to simulated data, including a simulated CAT-LSAT (Zwick & Thayer, 2002)
  • Testlet CAT data simulated, including items with
    varying amounts of DIF
  • EB and MH both used to estimate (known) True DIF
  • Performance compared using RMSR, variance, and
    bias measures

24
Design of Simulated CAT
  • Pool: 30 five-item testlets (150 items total)
  • 10 testlets at each of 3 difficulty levels
  • Item data generated via the 3PL model
  • CAT algorithm was based on testlet scores
  • Examinees received 5 testlets (25 items)
  • Test score (used as the DIF matching variable) was the expected true score on the pool (Zwick, Thayer, & Wingersky, 1994, APM)

25
Simulation Conditions Differed on Several Factors
  • Ability distribution:
  • Always N(0,1) in the Reference group
  • Focal group either N(0,1) or N(-1,1)
  • Initial sample size per group: 1000 or 3000
  • DIF: absent or present (in amounts that vary across items)
  • 600 replications for results shown today

26
Definition of True DIF for Simulation
Range of True DIF: -2.3 to 2.9; SD = 1.
27
Definition of Root Mean Square Residual
28
MSR = Variance + Squared Bias
  • MSR = RMSR² (see the sketch below)
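
A sketch of how these quantities might be computed across simulation replications for a single item, assuming the RMSR on the previous slide is the usual root mean squared difference from True DIF. The 0.7 shrinkage factor and error SD below are made-up values, used only to illustrate the variance/bias trade-off summarized on slide 33.

```python
import numpy as np

def rmsr_decomposition(estimates, true_dif):
    """RMSR of a DIF estimator for one item over replications,
    with the decomposition MSR = variance + squared bias."""
    estimates = np.asarray(estimates, dtype=float)
    msr = np.mean((estimates - true_dif) ** 2)       # mean squared residual
    variance = np.var(estimates)                     # spread around the estimator's own mean
    sq_bias = (np.mean(estimates) - true_dif) ** 2   # note: msr == variance + sq_bias
    return np.sqrt(msr), variance, sq_bias

rng = np.random.default_rng(0)
true_dif = 1.0
mh_estimates = true_dif + rng.normal(0, 0.6, size=600)   # unbiased but noisy (illustrative)
eb_estimates = 0.7 * mh_estimates                        # shrunken toward 0: less variance, some bias
for name, est in [("MH", mh_estimates), ("EB", eb_estimates)]:
    rmsr, var, bias2 = rmsr_decomposition(est, true_dif)
    print(f"{name}: RMSR={rmsr:.3f}, variance={var:.3f}, squared bias={bias2:.3f}")
```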

29
RMSRs for No-DIF condition, Initial N = 1000; Item Ns: 80 to 300
30
RMSRs: 50 hard items, DIF condition, Focal N(-1,1); Focal Ns: 16 to 67, Reference Ns: 80 to 151
31
RMSRs for DIF condition, Focal N(-1,1), Initial N = 1000; Item Ns: 16 to 307
32
Variance and Squared Bias for the Same Condition, Initial N = 1000; Item Ns: 16 to 307
33
Summary: Performance of the EB DIF Estimator
  • RMSRs (and variances) are smaller for EB than for MH, especially in (1) the no-DIF case and (2) the very small-sample case.
  • EB estimates are more biased than MH; the bias is toward 0.
  • The above findings are consistent with theory.
  • Implications to be discussed.

34
External Applications/Elaborations of EB DIF Point Estimation
  • Defense Dept: CAT-ASVAB (Krass & Segal, 1998)
  • ACT: simulated multidimensional CAT data (Miller & Fan, NCME, 1998)
  • ETS: fully Bayesian DIF model (NCME, 2007) of Sinharay et al.; like EB, but the parameters of the prior are determined using past data (see ZTL).
  • Also tried the loss function approach.

35
Probabilistic DIF
  • In our model, the posterior distribution is normal, so it is fully determined by its mean and variance.
  • We can use the posterior distribution to infer the probability that DIF falls into each of the ETS categories (C-, B-, A, B+, C+), each of which corresponds to a particular range of DIF magnitude.
  • (Statistical significance plays no role
    here.)
  • Can display graphically.

36
Probabilistic DIF status for an A item in the LSAT simulation
MH = 4.7, SE = 2.2, Identified status: C
Posterior mean EB_i = 0.7, Posterior SD = 0.8
N_R = 101, N_F = 23
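
A sketch of the category-probability calculation for the item above, assuming a normal posterior and the usual 1.0/1.5 cut points on the D-DIF scale for the A/B/C boundaries (the cut points are my assumption; scipy supplies the normal CDF):

```python
from scipy.stats import norm

def dif_category_probs(post_mean, post_sd, cuts=(-1.5, -1.0, 1.0, 1.5)):
    """Posterior probability that true DIF falls in each ETS category,
    given a normal posterior (cut points assumed, not taken from the slides)."""
    labels = ["C-", "B-", "A", "B+", "C+"]
    edges = (-float("inf"),) + tuple(cuts) + (float("inf"),)
    return {lab: norm.cdf(hi, post_mean, post_sd) - norm.cdf(lo, post_mean, post_sd)
            for lab, lo, hi in zip(labels, edges[:-1], edges[1:])}

# The item from the slide: posterior mean 0.7, posterior SD 0.8
probs = dif_category_probs(0.7, 0.8)
print({k: round(v, 2) for k, v in probs.items()})
# Most of the posterior mass falls in category A, even though the raw MH
# statistic (4.7, SE 2.2) led to a C flag.
```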
37
Probabilistic DIF, continued
  • The EB approach can be used to accumulate DIF evidence across administrations.
  • The prior can be modified each time an item is given: use the former posterior distribution as the new prior (Zwick, Thayer, & Lewis, 1999). (A sketch follows.)
  • The pie chart could then be modified to reflect the new evidence about an item's status.
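
A sketch of that prior-updating step under the same normal model (a standard conjugate update; all numbers are illustrative):

```python
def update_prior(prior_mean, prior_var, mh_new, s2_new):
    """Combine the current prior (e.g., last administration's posterior) with a
    new MH D-DIF estimate; returns the updated posterior mean and variance."""
    w = prior_var / (prior_var + s2_new)            # weight on the new observation
    post_mean = w * mh_new + (1 - w) * prior_mean
    post_var = w * s2_new                           # = 1 / (1/prior_var + 1/s2_new)
    return post_mean, post_var

# Two successive administrations of the same item (illustrative numbers)
mean, var = 0.0, 0.8                                 # initial prior
mean, var = update_prior(mean, var, mh_new=-1.4, s2_new=1.0)   # after administration 1
mean, var = update_prior(mean, var, mh_new=-1.1, s2_new=0.5)   # after administration 2
print(mean, var)   # evidence accumulates: tighter posterior each time
```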

38
Predicting an Item's Future Status: The Posterior Predictive Distribution
  • A variation on the above can be used to predict future observed DIF status.
  • The mean of the posterior predictive distribution is the same as the posterior mean, but the variance is larger (see the sketch below).
  • For details and an application to GRE items, see Zwick, Thayer, & Lewis (1999), JEM.
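
Under the same normal model, that statement corresponds to the posterior predictive distribution below (my reconstruction, with s_i'^2 denoting the sampling variance of the future MH estimate and the other symbols as in the earlier sketch):

```latex
\[
\widehat{\mathrm{MH}}_i^{\,\mathrm{new}} \mid \widehat{\mathrm{MH}}_i \sim
N\!\Big( W_i\,\widehat{\mathrm{MH}}_i + (1 - W_i)\,\mu,\;\;
W_i\, s_i^2 + s_i'^{\,2} \Big).
\]
```

The mean equals the posterior mean, while the variance adds the future estimate's sampling variance to the posterior variance, which is why it is larger.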

39
Discussion
  • EB point estimates have advantages over their MH counterparts
  • The EB approach can be applied to non-MH DIF methods
  • The advisability of shrinkage estimation for DIF needs to be considered:
  • Reducing Type I error may yield more interpretable results
  • The degree of shrinkage can be fine-tuned
  • Probabilistic DIF displays may have value in conveying the uncertainty of DIF results.