1
Using IRT Methods to Construct and Score
Personality Measures that are Fake-Resistant
  • Stephen Stark
  • Georgia Institute of Technology
  • Oleksandr S. Chernyshenko
  • University of Canterbury

2
Addressing Quality and Fairness of Personality
Testing
  • IRT methods can be used to
  • Understand nature of response process
  • Test hypotheses about behavior by comparing fit
    of models bearing different assumptions
  • Facilitate computer adaptive testing
  • Create shorter, informative tests that provide
    accurate scoring
  • Benefits may not be realized unless the IRT
    model used for parameter estimation adequately
    fits the data

3
Modeling Responses to Traditional Personality
Items
  • Stark, Chernyshenko, and Drasgow (2002) compared
    fit of ideal point and dominance models to 16PF
    data
  • Found comparable fit for several scales
  • Some scales that were fit poorly by dominance
    models were fit better by ideal point models
  • Conclusion
  • Ideal point process seems appropriate for
    personality items
  • Fly in the ointment
  • Correct specification of the response process
    does not guarantee more accurate assessment,
    because traditional items are easily FAKED

4
How to Deal With Faking?
  • Social Desirability (SD) scales are often used
    to detect and correct for faking
  • Adjustments made to content scale scores
  • Little effect on validity
  • Correcting for faking using SD scores is
    problematic, because
  • SD scales may function differently across
    testing situations (Stark, Chernyshenko, Chan,
    Lee, & Drasgow, 2001)
  • Need to develop fake-resistant items

5
Examples of Traditional Items that are Easily
Faked
In each case, the socially desirable response is
obvious.
  • I get along well with others. (A)
  • I try to be the best at everything I do. (C)
  • I insult people. (A-)
  • My peers call me absent minded. (C-)

Because these items consist of individual
statements, they are commonly referred to as
single-stimulus items.
6
Fake-Resistant Format for Administering
Personality Items
  • Create items by pairing stimuli that are similar
    in desirability but represent different
    dimensions
  • Positive item
  • I get along well with others. (A)
  • I set very high standards for myself. (C)
  • Negative item
  • I insult people. (A-)
  • I work just enough to pass my classes. (C-)
  • A variation of this approach (Army AIM) has
    shown score inflation of only 0.1 SD
  • (compared to 1.5 SD for traditional items in the
    Army ABLE)

7
Purpose of Research
  • Develop IRT methods for constructing and scoring
    pairwise preference personality items involving
    statements on different dimensions
  • Formulation of model and scoring algorithm
  • Construction of fake-resistant tests
  • Investigation of scoring accuracy

8
Model Notation
9
General Model for Scoring Pairwise Preference
Responses
  • Respondent evaluates each stimulus (personality
    statement) separately and makes independent
    decisions about endorsement.
  • Stimuli may be on different dimensions.
  • Single-stimulus response probabilities P0 and
    P1 are computed using a unidimensional ideal
    point model for traditional items (GGUM)

1 = Agree, 0 = Disagree
We refer to this new pairwise preference model as
the Multi-Unidimensional Pairwise Preference (MUPP)
model.
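Since the model formulas are not reproduced in this transcript, here is a minimal Python sketch of the two probabilities involved, assuming the standard dichotomous GGUM for the single-stimulus step and the combination rule described above (independent endorsement decisions, with the two "tied" outcomes renormalized away). The function names are ours.

```python
import numpy as np

def ggum_prob_agree(theta, alpha, delta, tau):
    """P(Agree | theta) under the dichotomous GGUM ideal point model.
    alpha: discrimination; delta: statement location; tau: threshold."""
    d = theta - delta
    num = np.exp(alpha * (d - tau)) + np.exp(alpha * (2 * d - tau))
    return num / (1 + np.exp(3 * alpha * d) + num)

def mupp_prob_prefer_s(theta_s, theta_t, params_s, params_t):
    """P(prefer statement s over statement t) in a pairwise item:
    agree with s and disagree with t, renormalized over the two
    non-tied outcomes (1,0) and (0,1)."""
    p_s = ggum_prob_agree(theta_s, *params_s)  # P1 for statement s
    p_t = ggum_prob_agree(theta_t, *params_t)  # P1 for statement t
    p10 = p_s * (1 - p_t)   # agree with s, disagree with t
    p01 = (1 - p_s) * p_t   # agree with t, disagree with s
    return p10 / (p10 + p01)
```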
10
MUPP Scoring
  • Latent trait scores (thetas) and standard errors
    (SEs) obtained using Bayes modal estimation.
  • Latent trait score represents a respondent's
    standing on a personality dimension
  • SE indicates the precision of a respondent's
    score
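A minimal sketch of this scoring step for the 1-D case, reusing mupp_prob_prefer_s from the sketch above. The standard normal prior and the SE taken from the inverse Hessian at the mode are conventional choices for Bayes modal estimation, not details given in the slides.

```python
import numpy as np
from scipy.optimize import minimize

def map_score(responses, items, prior_sd=1.0):
    """Bayes modal (MAP) estimate of theta, plus its SE, for a 1-D
    pairwise preference test.
    responses: 0/1 outcomes (1 = first statement preferred)
    items: list of (params_s, params_t) GGUM parameter triples"""
    def neg_log_post(x):
        t = x[0]
        lp = -0.5 * (t / prior_sd) ** 2           # N(0, prior_sd^2) prior
        for y, (ps, pt) in zip(responses, items):
            p = mupp_prob_prefer_s(t, t, ps, pt)  # 1-D: same theta twice
            p = np.clip(p, 1e-10, 1 - 1e-10)      # numerical safety
            lp += y * np.log(p) + (1 - y) * np.log(1 - p)
        return -lp
    res = minimize(neg_log_post, x0=[0.0], method="BFGS")
    se = float(np.sqrt(res.hess_inv[0, 0]))       # curvature at the mode
    return res.x[0], se
```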

11
Test Construction Involves 3 Steps
  • Estimating parameters for individual statements
    representing different dimensions
  • Estimating social desirability ratings for
    individual statements
  • Creating fake-resistant items by pairing
    statements having similar desirability, but
    representing different dimensions

12
Test Construction (Step 1): Get Parameters for
Individual Statements
  • Data
  • 465 Army recruits were instructed to respond
    HONESTLY to approximately 500 personality
    statements measuring six dimensions, using a
    1-to-6 response format
  • Response data were dichotomized
  • GGUM stimulus parameters were estimated for each
    dimension separately using GGUM2000
  • Model-data fit was examined

13
Calibration and Fit Results from Stark's MODFIT
Computer Program
14
Test Construction (Steps 2 & 3): Creating
Fake-Resistant Items
  • Social desirability ratings obtained by
  • Computing mean proportion endorsement scores
    from 269 recruits instructed to FAKE GOOD
  • Values ranged from 1 (Low) to 6 (High)
    desirability.
  • Created fake-resistant items by pairing
    statements (see the pairing sketch below)
  • Similar desirability
  • Different dimensions
  • Different location parameters
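A greedy sketch of this pairing heuristic, assuming each statement carries a dimension label, a desirability rating, and a GGUM location (delta); the field names and thresholds are illustrative, not values from the study.

```python
def pair_statements(statements, max_desir_diff=0.5, min_loc_diff=1.0):
    """Greedily pair statements with similar desirability but
    different dimensions and well-separated location parameters."""
    pool = sorted(statements, key=lambda s: s["desirability"])
    pairs, used = [], set()
    for i, s in enumerate(pool):
        if s["id"] in used:
            continue
        for t in pool[i + 1:]:
            if (t["id"] not in used
                    and t["dim"] != s["dim"]
                    and abs(t["desirability"] - s["desirability"]) <= max_desir_diff
                    and abs(t["delta"] - s["delta"]) >= min_loc_diff):
                pairs.append((s["id"], t["id"]))
                used.update({s["id"], t["id"]})
                break
    return pairs
```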

15
Investigating MUPP Scoring Accuracy:
1-D Simulation Study Design
  • Created 10-, 20-, and 40-item tests by pairing
    ADJ stimuli
  • Could not create items that measured well at
    extremes
  • Scoring accuracy examined by
  • Generating responses for 50 simulees at each
    theta value in -3, -2.8, ..., 3
  • Comparing estimated to known thetas using bias
    and error statistics, averaged over replications
    (see the sketch below)
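The comparison step reduces to computing bias and RMSE at each generating theta; a generic sketch of that evaluation, not code from the study:

```python
import numpy as np

def bias_and_rmse(theta_true, theta_hats):
    """Bias and RMSE of estimated thetas for simulees generated at a
    single true theta, averaged over replications."""
    err = np.asarray(theta_hats) - theta_true
    return err.mean(), np.sqrt((err ** 2).mean())

# e.g., for the 50 simulees generated at theta = -1.0:
# bias, rmse = bias_and_rmse(-1.0, estimated_thetas)
```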

16
1-D Simulation Results:
Test Information for 10-, 20-, and 40-Item Tests
High information → high measurement precision
17
1-D Simulation Results:
Bias in Estimated Thetas for 10-, 20-, and 40-Item Tests
Correlations between estimated and generating
thetas > .9 for all tests.
18
Investigating MUPP Scoring Accuracy:
2-D Simulation Study Design
  • Two factors manipulated
  • Test length
  • Percent of unidimensional pairings (to set a
    common metric)
  • Nine tests required
  • Created in similar manner to 1-D case
  • Parameter recovery examined by
  • Generating response vectors for 50 simulees at
    each of 169 points on a 2-D grid, i.e.,
    {-3, -2.5, ..., 3} × {-3, -2.5, ..., 3}
    (grid construction sketched below)
  • Comparing bias and error statistics across
    experimental conditions using graphs and MANOVA
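For reference, the 169 grid points are simply the Cartesian product of 13 equally spaced values per dimension; a one-line construction in Python:

```python
import numpy as np
from itertools import product

# 13 values per dimension (-3 to 3 in steps of 0.5) -> 13 x 13 = 169 points
grid = list(product(np.arange(-3.0, 3.5, 0.5), repeat=2))
assert len(grid) == 169
```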

19
2-D Simulation Results:
Test Information Functions
20
2-D Simulation Results:
Avg. Absolute Bias Across Dimensions & Replications
21
2-D Simulation Results
  • MANOVA
  • Modest main effect for TESTLEN (eta-squared =
    .39)
  • But biases did not decrease much
  • Estimation was accurate over a wide range of
    grid points, even for short tests
  • Weak main effect for UNIPCT (eta-squared = .08)
  • Only a relatively small percentage of
    unidimensional pairings was needed.
  • Correlations between estimated and generating
    thetas for all tests were large (.77 to .95).

22
Summary & Conclusions
  • MUPP scoring procedure was accurate for 1-D and
    2-D tests
  • In practice, scoring accuracy depends on the
    quality of estimated stimulus (statement)
    parameters.
  • Tests should be constructed using
  • Roughly 20 items per dimension involved
  • 10-20% of the items should be unidimensional
    pairings
  • Test construction and scoring approach holds
    promise for reducing effects of faking.

23
Related Research in Progress
  • Constructing and validating a fake-resistant
    inventory involving
  • Multidimensional paired comparison items
  • Lower-order facets
  • Computerized adaptive item selection and scoring,
    based on MUPP