When Are Statistically Significant DTF Effects Practically Important?

1
When Are Statistically Significant DTF Effects
Practically Important?
  • Stephen Stark, Sasha Chernyshenko, Fritz
    Drasgow
  • University of Illinois at Urbana-Champaign

2
Pros and Cons of Testing
  • Use of valid tests for selection and promotion
    improves organizational performance.
  • But it also causes differential selection rates among
    groups that differ in mean test scores.
  • When a high cut score is used for selection,
    adverse impact (AI) often occurs.
  • AI raises concerns that tests are biased against
    lower scoring groups.

3
Key Question: Do Observed Mean Differences Result
from Test Bias?
  • IRT methods can be used to answer this question
  • Mean differences can be separated into two parts:
  • Bias: differences due to problems with item
    content
  • Impact: true differences that relate to
    performance
  • Typically, we refer to:
  • Lower-scoring or smaller groups as focal (F)
    groups
  • The highest-scoring or largest group as the
    reference (R) group

4
Bias Can Occur on Two Levels
  • Item bias (DIF) refers to differences in the
    probability of correctly answering an item among
    individuals having the same level of ability but
    belonging to different groups
  • Test bias (DTF) refers to differences in
    expected total scores
  • DTF is more important because:
  • Decisions are made based on test scores, not item
    scores
  • DIF can cancel to produce tests having little or
    no DTF
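
The cancellation point can be illustrated with a minimal sketch. It assumes a 3PL model with the conventional 1.7 scaling constant, and the item parameters are hypothetical values invented for illustration (they are not from the presentation). Each item shows DIF at theta = 0, but the two effects have opposite signs, so the expected total scores of the two groups coincide there.

```python
import math

def p_3pl(theta, a, b, c):
    """3PL probability of a correct response at trait level theta."""
    return c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))

def tcc(theta, items):
    """Test characteristic curve: expected total score at theta."""
    return sum(p_3pl(theta, a, b, c) for a, b, c in items)

# Hypothetical (a, b, c) parameters, invented for illustration only.
ref_items = [(1.0, -0.5, 0.2), (1.0, 0.5, 0.2)]
# Focal group: item 1 is harder (b + 0.3), item 2 is easier (b - 0.3).
foc_items = [(1.0, -0.2, 0.2), (1.0, 0.2, 0.2)]

theta = 0.0
# Item-level differences (DIF): reference minus focal probability.
item_diffs = [p_3pl(theta, *r) - p_3pl(theta, *f)
              for r, f in zip(ref_items, foc_items)]
# Test-level difference (DTF) at the same theta.
test_diff = tcc(theta, ref_items) - tcc(theta, foc_items)

print(item_diffs)  # two values of opposite sign
print(test_diff)   # essentially zero at theta = 0
```

At other trait levels the two item effects in this toy example no longer cancel exactly, which is why DTF is usually summarized over a distribution of theta rather than at a single point.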

5
Issues Concerning DTF
  • Many IRT methods are available for detecting DTF
  • But findings are often difficult to interpret:
  • They rely on statistical significance tests, which
    depend on sample size
  • Statistically significant results may not be
    practically important
  • It is unclear what to do when DTF is found:
  • Should test be revised?
  • Should a different cut score be chosen?

6
Purpose of Research
  • Develop methods to assess importance of DTF
  • Relate DTF to:
  • 1) Observed mean differences across groups
  • 2) Traditional measures of effect size (as in lab
    studies)
  • 3) The 4/5 rule (legal definition of adverse impact)
  • Use these methods to examine two substantive
    issues related to personnel selection
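
As a reminder of how the 4/5 rule is applied, here is a small worked example. The applicant counts are hypothetical and chosen only to make the arithmetic easy to follow; adverse impact is flagged when the focal group's selection rate is less than four-fifths of the reference group's rate.

```python
# Hypothetical applicant counts, invented for illustration only.
focal_selected, focal_applicants = 12, 40
reference_selected, reference_applicants = 30, 60

focal_rate = focal_selected / focal_applicants                # 0.30
reference_rate = reference_selected / reference_applicants    # 0.50

# Impact ratio: focal selection rate relative to reference selection rate.
impact_ratio = focal_rate / reference_rate

# 4/5 rule: a ratio below 0.8 is taken as evidence of adverse impact.
adverse_impact = impact_ratio < 0.8

print(round(impact_ratio, 2), adverse_impact)
```

Here the ratio is 0.30 / 0.50 = 0.60, which falls below 0.80, so these hypothetical figures would indicate adverse impact.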

7
The DTF-R and d Indices
  • Determine the contribution of DTF to observed mean
    differences across reference and focal groups
  • DTF-R: magnitude of effect in raw score points
  • d: traditional effect size measure

8
Computing DTF-R and d
  • Select an IRT model (e.g., 3PL) and estimate item
    parameters for the reference and focal groups
    separately
  • BILOG computer program
  • Compute linking constants and equated focal group
    item parameters
  • ITERLINK computer program
  • Compute DTF-R and d values using equated item
    parameters, linking constants, and SD of focal
    group observed scores
  • DTF-EFFECT computer program
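
The slides do not state the formulas, so the following is a sketch of one plausible reading and not the DTF-EFFECT program's actual computation. It assumes that DTF-R is the reference-minus-focal difference in expected total score, averaged over the focal group's trait distribution, and that d is DTF-R divided by the SD of focal group observed scores. The normal focal density (`mu_f`, `sigma_f`), the 1.7 scaling constant, the function names, and all item parameters are assumptions made for illustration.

```python
import math

def p_3pl(theta, a, b, c):
    """3PL probability of a correct response at trait level theta."""
    return c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))

def tcc(theta, items):
    """Expected total score at theta for a list of (a, b, c) items."""
    return sum(p_3pl(theta, a, b, c) for a, b, c in items)

def dtf_effect(ref_items, foc_items, mu_f, sigma_f, sd_obs_f, n_points=81):
    """Return (DTF-R, d) under the assumptions stated above.

    The TCC difference is averaged over an evenly spaced grid weighted by
    an assumed normal focal-group density with mean mu_f and SD sigma_f.
    """
    lo = mu_f - 4.0 * sigma_f
    step = 8.0 * sigma_f / (n_points - 1)
    num = den = 0.0
    for i in range(n_points):
        t = lo + i * step
        w = math.exp(-0.5 * ((t - mu_f) / sigma_f) ** 2)
        num += w * (tcc(t, ref_items) - tcc(t, foc_items))
        den += w
    dtf_r = num / den
    return dtf_r, dtf_r / sd_obs_f

# Hypothetical equated item parameters (a, b, c), invented for illustration.
ref_items = [(1.0, -0.5, 0.2), (1.2, 0.0, 0.2), (0.8, 0.5, 0.2)]
# Every item is 0.3 harder for the focal group in this toy example.
foc_items = [(a, b + 0.3, c) for a, b, c in ref_items]

dtf_r, d = dtf_effect(ref_items, foc_items, mu_f=-0.5, sigma_f=1.0, sd_obs_f=2.5)
print(dtf_r, d)
```

With these toy parameters both values come out positive, which under the sign convention assumed here corresponds to bias against the focal group.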

9
Advantages and Disadvantages of DTF-R and d
  • Advantages
  • Easy interpretation
  • Large positive values indicate substantial bias
    against focal group
  • Statistical significance tests not needed
  • Disadvantage:
  • DTF-R and d values may be small even when
    substantial bias is present
  • This occurs if one group is favored at low trait
    levels and the other is favored at high
    trait levels

10
The RSR Index
  • Developed to address limitation of DTF-R and d
  • RSR index relates DTF directly to 4/5 rule for
    selection decisions
  • Compares proportions selected from reference and
    focal groups at specific cut points on observed
    score metric
  • So RSR can be used to determine whether:
  • DTF alone causes AI, so that test revision is
    warranted
  • AI can be eliminated simply by choosing a
    different cut score

11
Computing RSR
  • Obtain equated item parameters
  • Get the distribution of observed scores at each theta
    using Lord's recursion
  • For each possible cut score, xc, compute the
    proportion of persons in the reference and focal
    groups at or above xc by integration
  • Compute the ratio of proportions hired
  • DTF-RSR computer program
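
The steps above can be sketched as follows. This is an illustration under stated assumptions, not the DTF-RSR program itself: it uses the Lord-Wingersky recursion for the conditional number-correct distribution, a 3PL model with a 1.7 scaling constant, and a simple weighted grid in place of formal numerical integration. To isolate DTF, the sketch applies the same assumed normal theta density to both sets of item parameters; that choice, the function names, and the item parameters are all hypothetical.

```python
import math

def p_3pl(theta, a, b, c):
    """3PL probability of a correct response at trait level theta."""
    return c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))

def score_dist(theta, items):
    """Lord-Wingersky recursion: P(X = x | theta) for x = 0..n items."""
    dist = [1.0]  # with zero items, the score is 0 with probability 1
    for a, b, c in items:
        p = p_3pl(theta, a, b, c)
        new = [0.0] * (len(dist) + 1)
        for x, prob in enumerate(dist):
            new[x] += prob * (1.0 - p)   # item answered incorrectly
            new[x + 1] += prob * p       # item answered correctly
        dist = new
    return dist

def prop_at_or_above(items, cut, mu=0.0, sigma=1.0, n_points=81):
    """Proportion scoring at or above cut, averaged over a normal theta density."""
    lo = mu - 4.0 * sigma
    step = 8.0 * sigma / (n_points - 1)
    num = den = 0.0
    for i in range(n_points):
        t = lo + i * step
        w = math.exp(-0.5 * ((t - mu) / sigma) ** 2)
        num += w * sum(score_dist(t, items)[cut:])
        den += w
    return num / den

def rsr(ref_items, foc_items, cut, mu=0.0, sigma=1.0):
    """Ratio of focal to reference proportions selected at a given cut score."""
    return (prop_at_or_above(foc_items, cut, mu, sigma)
            / prop_at_or_above(ref_items, cut, mu, sigma))

# Hypothetical equated item parameters (a, b, c), invented for illustration.
ref_items = [(1.0, -0.5, 0.2), (1.2, 0.0, 0.2), (0.8, 0.5, 0.2)]
foc_items = [(a, b + 0.3, c) for a, b, c in ref_items]

for cut in range(len(ref_items) + 1):
    print(cut, rsr(ref_items, foc_items, cut))
```

Printing the ratio for every possible cut score mirrors the idea that DTF may matter at some cut points and not at others.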

12
Advantages of RSR
  • Can be used to assess importance of DTF at
    different cut scores on observed metric
  • Simple interpretation:
  • > 1: focal group favored
  • = 1: no effect
  • < 1: DTF favors reference group
  • < .8: DTF alone causes adverse impact
  • Values > 1 don't present a problem:
  • Bias against the focal group is the primary concern
  • DTF then acts to reduce the observed mean difference,
    because the focal group mean is usually lower
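
The interpretation rules above translate directly into a small lookup. The function name and the tolerance used to treat values very close to 1 as "no effect" are assumptions added for illustration.

```python
def interpret_rsr(rsr, tol=1e-9):
    """Map an RSR value to the interpretation given on this slide.

    tol is a hypothetical tolerance for treating values near 1 as exactly 1.
    """
    if rsr < 0.8:
        return "DTF alone causes adverse impact"
    if rsr < 1.0 - tol:
        return "DTF favors reference group"
    if rsr <= 1.0 + tol:
        return "no effect"
    return "focal group favored"

for value in (0.75, 0.92, 1.0, 1.10):
    print(value, interpret_rsr(value))
```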

13
Investigating the Practical Importance of DTF in
Personality Assessment
  • Stark et al. (2001) found that most scales of the
    16PF showed statistically significant DIF and DTF
    across samples of job applicants and
    nonapplicants
  • DIF/DTF was interpreted as evidence of faking
  • Research issue:
  • Does DTF have any practical implications for
    selection?
  • In this study, we used the new DTF effect size
    indices to answer this question

14
DTF-R and d Results

15
RSR Results for Four Scales
Perfectionism: DTF caused AI for xc > 6
No AI due to DTF for other scales when
reasonable cut scores are selected

16
Summary of Results
  • Although applicants scored higher on all scales,
    DTF favored nonapplicants in most cases.
  • When applicants were favored by DTF, the
    contributions to observed mean differences were
    small.
  • Thus, DTF did not tend to decrease the hiring of
    honest respondents.

17
Conclusions
  • Mean score inflation due to faking poses an
    important problem for selection
  • But DTF did not appear to be the primary source.
  • Instead, models of faking should represent item
    measurement properties as relatively invariant,
    but examinees' thetas as changed.
  • Future research should focus on:
  • Developing methods to identify inflated thetas
    (Zickar & Drasgow, 1996)
  • Creating personality inventories that are
    fake-resistant (White & Young, 1998)