Using IRT Methods to Construct and Score Personality Measures that are Fake-Resistant

1
Using IRT Methods to Construct and Score
Personality Measures that are Fake-Resistant

Stephen Stark
Georgia Institute of Technology
Oleksandr S. Chernyshenko
University of Canterbury

2
Addressing Quality and Fairness of Personality Testing

IRT methods can be used to
  Understand the nature of the response process
  Test hypotheses about behavior by comparing the fit of models bearing different assumptions
  Facilitate computer adaptive testing
  Create shorter, informative tests that provide accurate scoring
Benefits may not be realized unless the IRT model used for parameter estimation adequately fits the data

3
Modeling Responses to Traditional Personality Items

Stark, Chernyshenko, & Drasgow (2002) compared the fit of ideal point and dominance models to 16PF data
  Found comparable fit for several scales
  Some scales that were fit poorly by dominance models were fit better by ideal point models
Conclusion
  An ideal point process seems appropriate for personality items
Fly in the ointment
  Correct specification of the response process does not guarantee more accurate assessment, because traditional items are easily FAKED
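To make the ideal point versus dominance contrast concrete, the minimal sketch below evaluates one response function of each kind. It assumes the two-parameter logistic (2PL) model as a representative dominance model and the dichotomous generalized graded unfolding model (GGUM) as the ideal point model; the function names and parameter values are hypothetical, not estimates from the 16PF data.

```python
import math


def dominance_2pl(theta, a, b):
    """2PL dominance model: P(agree) rises monotonically with theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def ideal_point_ggum(theta, alpha, delta, tau):
    """Dichotomous GGUM ideal point model: P(agree) is highest when theta is
    near the statement location delta and falls off on both sides."""
    x = theta - delta
    agree = math.exp(alpha * (x - tau)) + math.exp(alpha * (2 * x - tau))
    disagree = 1.0 + math.exp(3 * alpha * x)
    return agree / (agree + disagree)


# Hypothetical parameters: both statements located at 0 on the trait.
for theta in (-3.0, -1.5, 0.0, 1.5, 3.0):
    p_dom = dominance_2pl(theta, a=1.5, b=0.0)
    p_ip = ideal_point_ggum(theta, alpha=1.5, delta=0.0, tau=-1.0)
    print(f"theta={theta:+.1f}  dominance={p_dom:.3f}  ideal point={p_ip:.3f}")
```

Under the dominance curve, agreement keeps rising as theta increases; under the ideal point curve, respondents far above or far below the statement location are both less likely to agree than respondents near it.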

4
How to Deal With Faking?

Social Desirability (SD) scales are often used to detect and correct for faking
  Adjustments are made to content scale scores
  Little effect on validity
Correcting for faking using SD scores is problematic, because
  SD scales may function differently across testing situations (Stark, Chernyshenko, Chan, Lee, & Drasgow, 2001)
Need to develop fake-resistant items

5
Examples of Traditional Items that are Easily Faked
In each case, the socially desirable response is obvious.

I get along well with others. (A)
I try to be the best at everything I do. (C)
I insult people. (A-)
My peers call me absent minded. (C-)

Because these items consist of individual statements, they are commonly referred to as single stimulus items.
6
Fake-Resistant Format for Administering Personality Items

Create items by pairing stimuli that are similar in desirability but represent different dimensions
  Positive item
    I get along well with others. (A)
    I set very high standards for myself. (C)
  Negative item
    I insult people. (A-)
    I work just enough to pass my classes. (C-)
A variation of this approach (Army AIM) has shown score inflation of only 0.1 SD (compared to 1.5 SD for traditional items in Army ABLE)

7
Purpose of Research

Develop IRT methods for constructing and scoring pairwise preference personality items involving statements on different dimensions
  Formulation of model and scoring algorithm
  Construction of fake-resistant tests
  Investigation of scoring accuracy

8
Model Notation
9
General Model for Scoring Pairwise Preference Responses

Respondent evaluates each stimulus (personality statement) separately and makes independent decisions about endorsement.
Stimuli may be on different dimensions.
Single stimulus response probabilities P0 and P1 (0 = Disagree, 1 = Agree) are computed using a unidimensional ideal point model for traditional items (GGUM)

The new pairwise preference model is referred to as MUPP (Multi-Unidimensional Pairwise Preference)
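Under the independence assumption stated above, and assuming further that a preference is recorded only when one statement would be endorsed and the other not, the probability of preferring statement s to statement t works out to P_s(1)P_t(0) / [P_s(1)P_t(0) + P_s(0)P_t(1)], with each single-stimulus probability taken from the dichotomous GGUM. The minimal sketch below evaluates that expression; the function names and the (alpha, delta, tau) values are hypothetical illustrations, not the authors' software or estimates.

```python
import math


def ggum_agree(theta, alpha, delta, tau):
    """Dichotomous GGUM: probability of agreeing (1) with a single statement."""
    x = theta - delta
    agree = math.exp(alpha * (x - tau)) + math.exp(alpha * (2 * x - tau))
    disagree = 1.0 + math.exp(3 * alpha * x)
    return agree / (agree + disagree)


def mupp_prefer_s(theta_s, theta_t, stim_s, stim_t):
    """Probability that statement s is preferred to statement t.

    P(s > t) = P_s(1) P_t(0) / [P_s(1) P_t(0) + P_s(0) P_t(1)], where
    theta_s and theta_t are the respondent's standings on the dimensions
    measured by s and t (the same dimension for a unidimensional pairing).
    """
    ps1 = ggum_agree(theta_s, *stim_s)
    pt1 = ggum_agree(theta_t, *stim_t)
    s_over_t = ps1 * (1.0 - pt1)  # agree with s, disagree with t
    t_over_s = (1.0 - ps1) * pt1  # disagree with s, agree with t
    return s_over_t / (s_over_t + t_over_s)


# Hypothetical (alpha, delta, tau) parameters for two statements.
stmt_a = (1.2, 1.0, -1.0)   # e.g., an Agreeableness statement
stmt_c = (1.0, -0.5, -0.8)  # e.g., a Conscientiousness statement
p = mupp_prefer_s(theta_s=1.0, theta_t=-2.0, stim_s=stmt_a, stim_t=stmt_c)
print(f"P(A statement preferred) = {p:.3f}")
```

Because theta_s and theta_t can index different dimensions, the same function covers both multidimensional and unidimensional pairings.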
10
MUPP Scoring

Latent trait scores (thetas) and standard errors (SEs) are obtained using Bayes modal estimation.
  A latent trait score represents a respondent's standing on a personality dimension
  The SE indicates the precision of a respondent's score
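A minimal sketch of Bayes modal scoring for a two-dimensional test, assuming independent standard normal priors and MUPP probabilities built on the dichotomous GGUM. A coarse-then-fine grid search stands in for whatever optimizer the original procedure used, and SEs come from the inverse of the negative Hessian of the log posterior at the mode, estimated by finite differences. Function names, the item tuple layout, and all parameter values are hypothetical.

```python
import math


def ggum_agree(theta, alpha, delta, tau):
    """Dichotomous GGUM probability of agreeing with one statement."""
    x = theta - delta
    agree = math.exp(alpha * (x - tau)) + math.exp(alpha * (2 * x - tau))
    return agree / (agree + 1.0 + math.exp(3 * alpha * x))


def mupp_prefer_s(theta_s, theta_t, stim_s, stim_t):
    """MUPP probability that statement s is preferred to statement t."""
    ps, pt = ggum_agree(theta_s, *stim_s), ggum_agree(theta_t, *stim_t)
    return ps * (1 - pt) / (ps * (1 - pt) + (1 - ps) * pt)


def log_posterior(theta, items, responses):
    """Log-likelihood of the pairwise responses plus independent N(0, 1)
    log priors on each dimension (constants dropped)."""
    lp = -0.5 * sum(t * t for t in theta)
    for (dim_s, stim_s, dim_t, stim_t), u in zip(items, responses):
        p = mupp_prefer_s(theta[dim_s], theta[dim_t], stim_s, stim_t)
        lp += math.log(p if u == 1 else 1.0 - p)
    return lp


def bayes_modal_2d(items, responses, lo=-4.0, hi=4.0):
    """Approximate the posterior mode by a coarse-then-fine grid search and
    return (theta_hat, SEs); SEs are None if the mode is not a strict maximum."""
    def f(th):
        return log_posterior(th, items, responses)

    coarse = [lo + 0.1 * i for i in range(int(round((hi - lo) / 0.1)) + 1)]
    best = max(((a, b) for a in coarse for b in coarse), key=f)
    fine = [[c + 0.01 * k for k in range(-10, 11)] for c in best]
    best = max(((a, b) for a in fine[0] for b in fine[1]), key=f)
    # Curvature of the log posterior at the mode via central differences.
    h, (x, y) = 1e-3, best
    fxx = (f((x + h, y)) - 2 * f((x, y)) + f((x - h, y))) / h**2
    fyy = (f((x, y + h)) - 2 * f((x, y)) + f((x, y - h))) / h**2
    fxy = (f((x + h, y + h)) - f((x + h, y - h))
           - f((x - h, y + h)) + f((x - h, y - h))) / (4 * h**2)
    det = fxx * fyy - fxy**2
    if fxx >= 0 or det <= 0:
        return best, None
    # Covariance = inverse of the negative Hessian; SEs are its root diagonal.
    return best, (math.sqrt(-fyy / det), math.sqrt(-fxx / det))


# Hypothetical 2-D test: (dim_s, (alpha, delta, tau), dim_t, (alpha, delta, tau)).
items = [
    (0, (1.5, 1.0, -1.0), 1, (1.5, -1.0, -1.0)),
    (1, (1.2, 0.5, -0.8), 0, (1.2, -0.5, -0.8)),
    (0, (1.0, -1.5, -1.2), 0, (1.0, 1.5, -1.2)),  # unidimensional pairing
    (1, (1.3, 2.0, -1.0), 1, (1.3, -2.0, -1.0)),  # unidimensional pairing
]
theta_hat, se = bayes_modal_2d(items, [1, 0, 0, 1])
print(theta_hat, se)
```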

11
Test Construction Involves 3 Steps

Estimating parameters for individual statements representing different dimensions
Estimating social desirability ratings for individual statements
Creating fake-resistant items by pairing statements having similar desirability, but representing different dimensions

12
Test Construction (Step 1): Get Parameters for Individual Statements

Data
  465 Army recruits were instructed to respond HONESTLY to approximately 500 personality statements measuring six dimensions, using a 1 to 6 response format
  Response data were dichotomized
GGUM stimulus parameters were estimated for each dimension separately using GGUM2000
Model-data fit was examined
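The slide does not state where the 1 to 6 scale was split when the responses were dichotomized. The minimal sketch below assumes that 4-6 (the agree side) maps to 1 and 1-3 maps to 0; the function name and cut point are hypothetical.

```python
def dichotomize(responses, cut=4):
    """Recode 1-6 responses to 1 (agree) if >= cut, else 0 (disagree).

    The cut point of 4 (i.e., 1-3 -> 0, 4-6 -> 1) is an assumption.
    """
    coded = []
    for r in responses:
        if not 1 <= r <= 6:
            raise ValueError(f"response {r} is outside the 1-6 format")
        coded.append(1 if r >= cut else 0)
    return coded


print(dichotomize([1, 3, 4, 6, 2]))  # [0, 0, 1, 1, 0]
```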

13
Calibration and Fit Results from Stark's MODFIT Computer Program
14
Test Construction (Steps 2 & 3): Creating Fake-Resistant Items

Social desirability ratings were obtained by computing mean endorsement scores for 269 recruits instructed to FAKE GOOD
  Values ranged from 1 (Low) to 6 (High) desirability.
Fake-resistant items were created by pairing statements with
  Similar desirability
  Different dimensions
  Different location parameters
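One way to operationalize the three pairing rules is a greedy match, sketched below. It assumes each statement carries a dimension label, a GGUM location, and a mean fake-good desirability rating on the 1-6 scale; the tolerance values (max_sd_gap, min_loc_gap), statement IDs, and numbers are hypothetical, and the greedy strategy is an illustration rather than the procedure used to build the reported tests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    sid: str
    dimension: str
    location: float      # GGUM location (delta) from Step 1
    desirability: float  # mean fake-good rating on the 1-6 scale (Step 2)


def pair_statements(statements, max_sd_gap=0.25, min_loc_gap=0.5):
    """Greedily pair each statement with the unused partner of closest
    desirability that measures a different dimension and whose location
    differs by at least min_loc_gap. Unmatched statements stay unpaired."""
    pool = sorted(statements, key=lambda s: s.desirability)
    used, pairs = set(), []
    for s in pool:
        if s.sid in used:
            continue
        candidates = [
            t for t in pool
            if t.sid not in used and t.sid != s.sid
            and t.dimension != s.dimension
            and abs(t.desirability - s.desirability) <= max_sd_gap
            and abs(t.location - s.location) >= min_loc_gap
        ]
        if candidates:
            t = min(candidates, key=lambda c: abs(c.desirability - s.desirability))
            used.update((s.sid, t.sid))
            pairs.append((s, t))
    return pairs


# Hypothetical statements on Agreeableness (A) and Conscientiousness (C).
stmts = [
    Statement("A1", "A", 1.2, 5.1), Statement("C1", "C", -0.3, 5.0),
    Statement("A2", "A", -1.0, 1.9), Statement("C2", "C", 0.8, 2.0),
]
for s, t in pair_statements(stmts):
    print(s.sid, "+", t.sid)  # A2 + C2, then C1 + A1
```

A greedy pass is simple but not optimal; an assignment-style matcher could accept a slightly larger desirability gap on one pair in exchange for more usable pairs overall.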

15
Investigating MUPP Scoring Accuracy: 1-D Simulation Study Design

Created 10-, 20-, and 40-item tests by pairing ADJ stimuli
  Could not create items that measured well at the extremes
Scoring accuracy was examined by
  Generating responses for 50 simulees at theta values -3, -2.8, ..., 3
  Comparing estimated to known thetas using bias and error statistics, averaged over replications
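The design can be sketched as a small parameter-recovery loop for a 1-D test: simulate MUPP responses at each generating theta, score them by Bayes modal estimation, and summarize the errors. The sketch assumes a N(0, 1) prior, grid-search scoring, and RMSE as the error statistic; the test items, function names, and seed are hypothetical, and the example call uses fewer than 50 simulees per point only to keep it fast.

```python
import math
import random


def ggum_agree(theta, alpha, delta, tau):
    """Dichotomous GGUM probability of agreeing with one statement."""
    x = theta - delta
    agree = math.exp(alpha * (x - tau)) + math.exp(alpha * (2 * x - tau))
    return agree / (agree + 1.0 + math.exp(3 * alpha * x))


def prefer_s(theta, stim_s, stim_t):
    """MUPP probability of preferring s to t when both measure one trait."""
    ps, pt = ggum_agree(theta, *stim_s), ggum_agree(theta, *stim_t)
    return ps * (1 - pt) / (ps * (1 - pt) + (1 - ps) * pt)


def map_estimate(items, responses, grid):
    """Bayes modal estimate over a theta grid, assuming a N(0, 1) prior."""
    def log_post(th):
        lp = -0.5 * th * th
        for (s, t), u in zip(items, responses):
            p = prefer_s(th, s, t)
            lp += math.log(p if u == 1 else 1.0 - p)
        return lp
    return max(grid, key=log_post)


def recovery_study(items, n_simulees=50, seed=1):
    """Return {generating theta: (bias, rmse)} for theta = -3, -2.8, ..., 3."""
    rng = random.Random(seed)
    grid = [-4.0 + 0.05 * i for i in range(161)]
    results = {}
    for k in range(31):
        true_theta = round(-3.0 + 0.2 * k, 1)
        errors = []
        for _ in range(n_simulees):
            # Simulate one response vector: 1 means statement s was preferred.
            resp = [1 if rng.random() < prefer_s(true_theta, s, t) else 0
                    for s, t in items]
            errors.append(map_estimate(items, resp, grid) - true_theta)
        bias = sum(errors) / len(errors)
        rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
        results[true_theta] = (bias, rmse)
    return results


# Hypothetical 10-item 1-D test: each item pairs two (alpha, delta, tau) stimuli.
locs = [-2.5, -1.5, -0.5, 0.5, 1.5]
items = [((1.5, d, -1.0), (1.5, d + 1.5, -1.0)) for d in locs]
items += [((1.5, d + 1.0, -1.0), (1.5, d - 1.0, -1.0)) for d in locs]
for theta, (bias, rmse) in recovery_study(items, n_simulees=5).items():
    print(f"theta={theta:+.1f}  bias={bias:+.2f}  rmse={rmse:.2f}")
```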

16
1-D Simulation Results: Test Information for 10-, 20-, and 40-Item Tests
High information means high measurement precision
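For a binary pairwise response with probability P(theta) of preferring the first statement, item information is P'(theta)^2 / [P(theta)(1 - P(theta))], and test information is the sum over items; ignoring the prior, the SE is roughly 1/sqrt(information). The minimal sketch below computes this for a hypothetical 1-D test, using MUPP probabilities built on the dichotomous GGUM and a numerical derivative; the item parameters and function names are illustrative assumptions.

```python
import math


def ggum_agree(theta, alpha, delta, tau):
    """Dichotomous GGUM probability of agreeing with one statement."""
    x = theta - delta
    agree = math.exp(alpha * (x - tau)) + math.exp(alpha * (2 * x - tau))
    return agree / (agree + 1.0 + math.exp(3 * alpha * x))


def prefer_s(theta, stim_s, stim_t):
    """MUPP probability of preferring s to t when both measure one trait."""
    ps, pt = ggum_agree(theta, *stim_s), ggum_agree(theta, *stim_t)
    return ps * (1 - pt) / (ps * (1 - pt) + (1 - ps) * pt)


def total_information(theta, items, h=1e-4):
    """Sum over items of P'(theta)^2 / [P(1 - P)], P' by central difference."""
    total = 0.0
    for s, t in items:
        p = prefer_s(theta, s, t)
        dp = (prefer_s(theta + h, s, t) - prefer_s(theta - h, s, t)) / (2 * h)
        total += dp * dp / (p * (1.0 - p))
    return total


# Hypothetical 5-item 1-D test.
locs = [-2.5, -1.5, -0.5, 0.5, 1.5]
items = [((1.5, d, -1.0), (1.5, d + 1.5, -1.0)) for d in locs]
for theta in (-3.0, -1.0, 0.0, 1.0, 3.0):
    info = total_information(theta, items)
    print(f"theta={theta:+.1f}  information={info:.2f}  SE~{1 / math.sqrt(info):.2f}")
```

With a N(0, 1) prior, as in Bayes modal scoring, the posterior SE is closer to 1/sqrt(information + 1), so the prior matters most where test information is low, such as at the extremes.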
17
1-D Simulation Results: Bias in Estimated Thetas for 10-, 20-, and 40-Item Tests
Correlations between estimated and generating thetas were > .9 for all tests.
18
Investigating MUPP Scoring Accuracy: 2-D Simulation Study Design

Two factors manipulated
  Test length
  Percent of unidimensional pairings (to set a common metric)
Nine tests required
  Created in a similar manner to the 1-D case

Parameter recovery was examined by
  Generating response vectors for 50 simulees at each of 169 points on a 2-D grid, i.e., (-3, -2.5, ..., 3) × (-3, -2.5, ..., 3)
  Comparing bias and error statistics across experimental conditions using graphs and MANOVA
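The size of the 2-D design follows directly from the slide: 13 values from -3 to 3 in steps of 0.5 on each axis give 13 × 13 = 169 grid points, and 50 simulees per point give 8,450 response vectors per test. The sketch below enumerates the grid and a 3 × 3 crossing of the two factors; that crossing and the factor labels are assumptions, since the slide gives only the total of nine tests.

```python
from itertools import product


def grid_2d(lo=-3.0, hi=3.0, step=0.5):
    """All (theta1, theta2) points on a square grid from lo to hi."""
    n = int(round((hi - lo) / step)) + 1
    axis = [lo + step * i for i in range(n)]
    return list(product(axis, axis))


# Assumed 3 x 3 crossing; the slide gives only "two factors" and "nine tests",
# so these level labels are placeholders, not the levels actually used.
TEST_LENGTHS = ("short", "medium", "long")
UNI_PERCENTS = ("low", "medium", "high")
conditions = list(product(TEST_LENGTHS, UNI_PERCENTS))

points = grid_2d()
n_simulees = 50
print(len(conditions), "tests")                       # 9 tests
print(len(points), "grid points")                     # 169 grid points
print(len(points) * n_simulees, "vectors per test")   # 8450 vectors per test
```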

19
2-D Simulation Results: Test Information Functions
20
2-D Simulation Results: Avg. Absolute Bias Across Dimensions and Replications
21
2-D Simulation Results

MANOVA
  Modest main effect for TESTLEN (eta squared = .39)
    But biases did not decrease much
    Estimation was accurate over a wide range of grid points, even for short tests
  Weak main effect for UNIPCT (eta squared = .08)
    Only a relatively small percentage of unidimensional pairings was needed.
Correlations between estimated and generating thetas for all tests were large (.77 to .95).
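Eta squared is read as the proportion of variance in the outcome associated with a factor. The minimal sketch below computes the univariate one-way version, SS_between / SS_total, on hypothetical bias values; the multivariate effect sizes a MANOVA reports are computed differently (e.g., from Wilks' lambda), so this only illustrates the interpretation, not how the reported values were obtained.

```python
def eta_squared(groups):
    """Univariate one-way eta squared: SS_between / SS_total."""
    values = [v for g in groups for v in g]
    grand = sum(values) / len(values)
    ss_total = sum((v - grand) ** 2 for v in values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    return ss_between / ss_total


# Hypothetical absolute-bias values for three test lengths.
by_length = [[0.30, 0.25, 0.35], [0.22, 0.20, 0.27], [0.18, 0.15, 0.21]]
print(f"eta squared = {eta_squared(by_length):.2f}")
```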

22
Summary & Conclusions

The MUPP scoring procedure was accurate for 1-D and 2-D tests
In practice, scoring accuracy depends on the quality of the estimated stimulus (statement) parameters.
Tests should be constructed using
  Roughly 20 items per dimension involved
  10-20% unidimensional items
The test construction and scoring approach holds promise for reducing the effects of faking.

23
Related Research in Progress