Can we compare people to each other using ipsative measures?


1
Can we compare people to each other using
ipsative measures?

Prof Dave Bartram
Research Director
SHL Group plc
25th Biennial Conference of the Society for
Multivariate Analysis in the Behavioural Sciences
2 July 2006

2
Introduction

Both predictors (e.g. personality test items) and criteria (e.g. line manager ratings on competencies) can be constructed using either Likert item formats (normative) or forced-choice formats (ipsative).
What are the differences?
What are their relative advantages and disadvantages?
It is generally argued that:
Likert formats are easier to analyse but are subject to halo and response bias effects.
Ipsative formats control response bias but can pose problems for analysis due to constraints on correlations.

3
Normative OPQ32
4
Ipsative OPQ32
5
What is ipsative measurement?

Two methods: forced choice or ipsatization.
Forced choice: choosing between items that load on different scales, or allocating a fixed number of points between scales.
Ipsatization: subtracting the average score across scales from each scale score. This removes one degree of freedom and locates each profile about the same mid-point.
One can also go further and equate score variance across people, so that all profiles are equally variable across scales (this is not often done).
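As a minimal sketch of score ipsatization, assuming scores are held in a NumPy array with one row per person and one column per scale (the function name `ipsatize` and the example sten profiles are hypothetical), the code below subtracts each person's mean across scales and optionally equates within-person variance:

```python
import numpy as np


def ipsatize(scores, equate_variance=False):
    """Ipsatize a (people x scales) score matrix.

    Subtracts each person's mean across scales, which removes one degree of
    freedom. If equate_variance is True, also divides each centred profile by
    that person's SD across scales so every profile is equally variable.
    """
    scores = np.asarray(scores, dtype=float)
    centred = scores - scores.mean(axis=1, keepdims=True)
    if equate_variance:
        sd = centred.std(axis=1, keepdims=True)
        # Leave perfectly flat profiles (sd == 0) at zero instead of dividing by zero
        centred = np.divide(centred, sd, out=np.zeros_like(centred), where=sd > 0)
    return centred


# Two hypothetical people with the same profile shape at different levels
normative = np.array([[7.0, 5.0, 3.0, 5.0],
                      [9.0, 7.0, 5.0, 7.0]])
print(ipsatize(normative))                        # both rows become [2, 0, -2, 0]
print(ipsatize(normative, equate_variance=True))  # each row also scaled to SD 1
```

The two example people differ only in level, so ipsatization maps them to the same profile; this is the sense in which it locates every profile about the same mid-point.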

6
Score ipsatization
7
Forced choice is not always ipsative

Some instruments use forced choice for pairs of items from the same scale:
A1: I prefer to be alone (Extraversion -ve)
A2: I like to spend time with other people (Extraversion +ve)
versus
B1: I prefer to be alone (Extraversion -ve)
B2: I often feel anxious (Neuroticism +ve)
Pair A is not ipsative; pair B is, because only in B does the choice trade one scale off against another.

8
Alternate ipsative forced choice formats

Pairs of items - dyads
I prefer to be alone (Extraversion -ve)
I often feel anxious (Neuroticism +ve)
Triplets - triads
I prefer to be alone (Extraversion -ve)
I often feel anxious (Neuroticism +ve)
I try to help others (Agreeableness +ve)
Quads - tetrads
I prefer to be alone (Extraversion -ve)
I often feel anxious (Neuroticism +ve)
I try to help others (Agreeableness +ve)
I am creative (Openness +ve)

9
Scoring a quad
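The transcript does not include the scoring rule shown on this slide. As a sketch only, assuming a most/least rule in which the statement chosen as "most like me" scores 2 points, the one chosen as "least like me" scores 0 and the other two score 1 each, with every statement positively keyed, the hypothetical functions below score quads and show that each quad contributes the same fixed total whatever is chosen:

```python
from collections import Counter


def score_quad(scales, most, least):
    """Score one quad under an assumed most/least rule.

    scales: four scale labels, one per statement (all assumed positively keyed).
    most, least: positions (0-3) of the statements picked as 'most like me'
    and 'least like me'.
    Returns points per scale: 2 for 'most', 0 for 'least', 1 for the other two.
    """
    if len(scales) != 4 or len(set(scales)) != 4:
        raise ValueError("an ipsative quad needs four statements from four different scales")
    if most == least:
        raise ValueError("'most' and 'least' must be different statements")
    return {scale: 2 if pos == most else 0 if pos == least else 1
            for pos, scale in enumerate(scales)}


def score_questionnaire(blocks):
    """Sum quad points per scale; blocks is a list of (scales, most, least)."""
    totals = Counter()
    for scales, most, least in blocks:
        totals.update(score_quad(scales, most, least))  # adds points per scale
    return dict(totals)


# Three hypothetical quads using labels echoing the tetrad example
blocks = [
    (["Ext", "Neu", "Agr", "Opn"], 0, 1),
    (["Opn", "Agr", "Ext", "Neu"], 1, 3),
    (["Neu", "Ext", "Opn", "Agr"], 2, 0),
]
totals = score_questionnaire(blocks)
print(totals)
print(sum(totals.values()))  # 4 points per quad x 3 quads = 12, whatever is chosen
```

The constant total per quad is what makes the resulting scale scores ipsative: a point gained on one scale must be lost elsewhere.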
10
Some ipsative maths

Scores across all scales sum to a constant (by definition).
One degree of freedom is lost: for k scales, df = k - 1.
Scale scores tend to correlate negatively: high scores on one scale imply lower scores on other scales.
Average scale intercorrelation is constrained:
k = 2 scales, r = -1.0
k = 4 scales, r = -.33
k = 16 scales, r = -.07
k = 32 scales, r = -.03
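These values match the standard result that, when all k ipsative scales have equal variances, their average intercorrelation is -1/(k - 1): the variance of a constant sum is zero, so the k variances and k(k - 1) covariances must cancel. A minimal NumPy sketch comparing that formula with simulated ipsatized data (the helper names and the independent-normal simulation are assumptions for illustration):

```python
import numpy as np


def expected_mean_r(k):
    """Mean intercorrelation of k ipsative scales with equal variances: -1/(k-1)."""
    return -1.0 / (k - 1)


def simulated_mean_r(k, n=20000, seed=0):
    """Ipsatize independent normal scores and return the mean off-diagonal r."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, k))
    x -= x.mean(axis=1, keepdims=True)  # every row now sums to zero
    r = np.corrcoef(x, rowvar=False)
    return r[~np.eye(k, dtype=bool)].mean()


for k in (2, 4, 16, 32):
    print(k, round(expected_mean_r(k), 2), round(simulated_mean_r(k), 2))
```

The constraint shrinks quickly as k grows, which is the basis for the later argument that instruments with many scales are only weakly affected.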

11
Classical test theory
12
Test theory for ipsative

For ipsative forced-choice pairs, c is a constant equal to the number of items measuring each scale, and k is the number of scales.
ck = total number of items
ck/2 = total score (one point is given for each pair of items)
Notation: T = true score, e = error, X = observed score.

Meade, 2004
13
Test theory for forced choice ipsative

Meade (2004) argues that the observed score on any ipsative scale is a function of the true score and error on that scale minus some function of the true scores and errors on all the other scales.
This needs modifying depending on whether the ipsative design is complete or incomplete (i.e. does not contain all possible scale pairings).
So long as the design is balanced, however, this should not have any differential biasing effects on scales.
It does, however, explain why scale scores are negatively correlated.
It also explains why error terms are correlated in SEM.

14
Alternate models

Normative
$X_i = t_i + e_{\mathrm{rand}} + \text{faking} + \text{central tendency} + \text{acquiescence}$
Ipsative
$X_i = t_i + e_{\mathrm{rand}} - f\left(t_j + e_j\right), \quad j \neq i$
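To make the contrast between the two models concrete, the sketch below simulates both under simplifying assumptions: independent true scores, one additive person-level bias term (a stand-in for effects such as acquiescence), and score ipsatization standing in for forced-choice scoring. Here f is simply the mean over all k scales, so it also contains a 1/k share of scale i's own true score and error. All sizes, variances and names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 5000, 8                            # hypothetical numbers of people and scales
t = rng.standard_normal((n, k))           # true scores t_i
e = 0.5 * rng.standard_normal((n, k))     # random error e_rand
bias = rng.standard_normal((n, 1))        # one additive response-bias value per person

normative = t + e + bias                  # bias added equally to every scale
ipsative = normative - normative.mean(axis=1, keepdims=True)  # score ipsatization


def r(a, b):
    """Pearson correlation between two 1-D arrays."""
    return np.corrcoef(a, b)[0, 1]


# The person-level bias loads on normative scores but cancels from ipsative ones
print(round(r(normative[:, 0], bias[:, 0]), 2), round(r(ipsative[:, 0], bias[:, 0]), 2))
# The ipsative score on scale 0 falls as the true score on scale 1 rises
print(round(r(ipsative[:, 0], t[:, 1]), 2))
```

The first line illustrates why ipsative formats are said to control uniform response bias; the second shows the "minus f(t_j + e_j)" term at work.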

15
SEM model for an ipsative quad
[Figure not included]
16
Example OPQ32i quad 1 (UK English)
17
Ipsative Controversies
18
Ipsativity and self-referencing.

The key feature of ipsative measurement is that it requires people to make comparisons between their trait strengths on different scales.
It is often called self-referenced measurement because of this.
This is a misnomer, as one can argue that all multivariate self-report measures are self-referenced.
However, as a consequence, it is argued that one cannot compare people's scores on ipsative scales.
I will argue that with large numbers of scales (20 or more), the constraints that scores on the other scales place on the absolute value each scale can take are not substantive and have minimal impact in practice.

19
Percent raw-score change on Scale i when the score on Scale j changes by one raw-score point
< 5
DISC (4 scales): 33.3
OPQ32 (32 scales): 3.2
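Both values are consistent with the arithmetic of a fixed total: if a one-point change on Scale j must be offset across the remaining k - 1 scales, and we assume the offset is spread evenly (an assumption; real scoring need not spread it evenly), each of those scales moves by 100/(k - 1) percent of a point. A minimal check with a hypothetical helper:

```python
def pct_change_elsewhere(k):
    """Average % of a one-point change on one scale that is offset on each of
    the other k - 1 scales, assuming a fixed total spread evenly."""
    return 100.0 / (k - 1)


for name, k in [("DISC", 4), ("OPQ32", 32)]:
    print(name, round(pct_change_elsewhere(k), 1))  # DISC 33.3, OPQ32 3.2
```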
20
Construct Validity

The meaning of each scale is a comparison.
The correlation matrix, and hence the average correlation, is constrained.
But are not all scales understood by comparison with other traits?
When rating an item, people compare themselves both against others and against themselves, whether the format is ipsative or normative.

21
Scale Intercorrelations - OPQ32
22
Reliability - Issues

Reliability requires interval measurement. Some claim that ipsative data produce inflated reliability estimates.
Reliability can be (a little) distorted with ipsative measures (Tenopyr, 1998).
Reliability is conserved but can be depressed (Bartram, 1996; Karpatschof & Elkjaer, 2000).
Bartram (1996) derived an equation for the reliability of ipsative data, showing that reliability is reduced as a direct function of the range restriction associated with the loss of 1 df.
OPQ32 uses 208 items for the normative version and 416 for the ipsative version to ensure equal reliabilities.

23
Normative-Ipsative equivalence

N = 488 training delegates.
For ipsative, median alpha = 0.86.
For normative, median alpha = 0.83.
Alternate-form scale reliabilities for OPQ32n-OPQ32i: median 0.71.
These correlations are lower than the internal consistency reliabilities for the two versions, or the test-retest reliabilities for the OPQ32n.
Corrected for attenuation, the true-score correlations have a median of 0.83.
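The correction is presumably Spearman's standard correction for attenuation, r_true = r_xy / sqrt(r_xx * r_yy); the transcript does not say which reliabilities were used. As a rough check only, the hypothetical helper below plugs in the reported medians. This is an approximation, since the median of per-scale corrected values need not equal the correction applied to median inputs.

```python
from math import sqrt


def disattenuate(r_xy, r_xx, r_yy):
    """Spearman's correction for attenuation: estimated true-score correlation."""
    return r_xy / sqrt(r_xx * r_yy)


# Median alternate-form r (0.71) with median alphas for OPQ32i (0.86) and OPQ32n (0.83)
print(round(disattenuate(0.71, 0.86, 0.83), 2))  # about 0.84, close to the reported 0.83
```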

24
Big 5 equivalence (n = 488)
25
Profile similarities (k = 32, n = 488)
26
What determines the profile similarity?

The normative profile average deviations were correlated with the profile similarity coefficients.
The correlation is r = 0.51 (n = 488).
The similarity between a person's normative and ipsative profiles is higher for people with more differentiated normative profiles.
Ipsative consistency scores were also correlated with the similarity between normative and ipsative profiles.
The correlation is r = 0.52 (n = 488).
People with a more consistent pattern of responding to the forced-choice format are likely to have a similar normative profile, and that profile is likely to be relatively well differentiated.
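The transcript does not define either index. As a sketch under stated assumptions, the hypothetical helpers below take profile similarity to be the Pearson correlation across scales between a person's normative and ipsative profiles, and profile differentiation to be the average absolute deviation of the normative profile from its own mean:

```python
import numpy as np


def profile_similarity(normative, ipsative):
    """Pearson correlation across scales between one person's two profiles."""
    return float(np.corrcoef(normative, ipsative)[0, 1])


def profile_differentiation(profile):
    """Average absolute deviation of a profile from its own mean."""
    profile = np.asarray(profile, dtype=float)
    return float(np.mean(np.abs(profile - profile.mean())))


# Hypothetical six-scale sten profiles
differentiated = np.array([2.0, 9.0, 4.0, 8.0, 3.0, 7.0])
flat = np.array([5.0, 6.0, 5.0, 6.0, 5.0, 6.0])
print(profile_differentiation(differentiated), profile_differentiation(flat))  # 2.5 0.5

# Removing the profile mean does not change its shape, so similarity stays at 1
print(round(profile_similarity(differentiated, differentiated - differentiated.mean()), 6))
```

Because a correlation ignores profile level, under this definition any normative-ipsative dissimilarity reflects differences in profile shape rather than the loss of the profile's overall level; a flat profile leaves little shape to match, which fits the reported link with differentiation.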

27
Scale dependencies in normative and ipsative
forms.

Likert ratings for a 12-item scale rated 1-5 have a theoretical range from 12 to 60.
In practice, the score obtained is constrained by scores on other scales, as the scales are correlated.
For the normative OPQ32, the average R for predicting scale i from all the other scales is 0.66.
For ipsative, this is 1.0 by definition.
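The "by definition" step follows because, with a fixed total C, scale i equals C minus the sum of the other scales, so a linear regression recovers it exactly. A minimal sketch with simulated data (the helper `multiple_r` and the independent normal scores are assumptions for illustration, so the normative R here is near zero rather than the 0.66 observed for OPQ32n):

```python
import numpy as np


def multiple_r(X, i):
    """Multiple R for predicting column i of X from all other columns (OLS with intercept)."""
    y = X[:, i]
    others = np.delete(X, i, axis=1)
    design = np.column_stack([np.ones(len(y)), others])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(np.corrcoef(y, design @ coef)[0, 1])


rng = np.random.default_rng(2)
normative = rng.standard_normal((1000, 10))  # hypothetical, mutually independent scales
ipsative = normative - normative.mean(axis=1, keepdims=True)
print(round(multiple_r(normative, 0), 2), round(multiple_r(ipsative, 0), 2))
```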

28
Normative bias

Positive normative bias represents a shift of the profile to the right (average greater than sten 5.5).
Negative normative bias represents a shift of the profile to the left (average less than sten 5.5).
For OPQ32n, the SD of standardized average scores across scales = 0.27 (n = 242).
For OPQ32i, by definition, SD = 0.
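A minimal sketch of the profile-shift idea on the sten metric, assuming the bias index is simply a person's mean sten across scales minus the midpoint 5.5 (the helper name and example profiles are hypothetical; the slide's 0.27 SD is computed on standardized scores, which this sketch does not reproduce):

```python
import numpy as np


def normative_bias(stens):
    """Profile shift per person: mean sten across scales minus the midpoint 5.5."""
    return np.asarray(stens, dtype=float).mean(axis=1) - 5.5


stens = np.array([[7, 6, 8, 7],   # hypothetical profile shifted to the right
                  [4, 5, 3, 4],   # shifted to the left
                  [2, 9, 5, 6]])  # centred but highly differentiated
print(normative_bias(stens))      # 1.5, -1.5 and 0.0

# Re-centring each profile on 5.5 (score ipsatization) removes the shift entirely
ipsatized = stens - stens.mean(axis=1, keepdims=True) + 5.5
print(normative_bias(ipsatized).std())  # 0.0
```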

29
Distribution of average normative scale z-scores
30
Is normative bias related to personality?

Prediction of normative bias (stepwise regression):
Using normative scale residuals (n): R = 0.76 (0.75 adjusted)
Using ipsative scales (i): R = 0.53 (0.50 adjusted)
The normative residual predictor and the ipsative scale predictor correlate r = 0.62.
People who have positive normative bias are:
More
(n and i) Achieving, Controlling, Optimistic, Vigorous
(n) Evaluative, Conscientious
(i) Caring, Detail Conscious
Less
(n and i) Worrying
(n) Decisive, Competitive, Variety seeking, Modest, Independent, Conventional

N = 242 in all cases
31
Implications

So long as the number of scales is large, you can compare people across scales.
Normative and ipsative versions are not parallel forms of the same test; they provide qualitatively different but highly correlated information.
Most people have similar ipsative and normative profiles, both in shape and location.
Some people will have moderate or large score differences across forms, especially if their profiles are flat or if they show strong response bias on the normative version.
Where there is a difference between forms, which is correct, or are they both correct?

32
Criterion validity of ipsative measures
33
Criterion Validity

Constraints on scales will have an impact on external correlations.
A lower average inter-scale correlation should optimise the additive effect of scale variances (increasing multiple Rs).
Greer and Dunlap (1997) found that Type I error rates were well preserved and power nearly equivalent in a Monte Carlo study of ANOVA.
Individual scales show similar validities in normative and ipsative formats.

34
Other research

Jackson, Wroblewski & Ashton (2000) compared single-stimulus and forced-choice formats for integrity-related personality items.
Those simulating applying for a job obtained scores 1 SD better with the single-stimulus format, and this led to lower validity.
The mean shift for the forced-choice format was only one third as large, and validity was maintained.
Martin, Bowen & Hunt (2002) showed that the ipsative OPQ is more resistant to faking instructions than the normative version.
There were no differences between faking and honest groups for the ipsative version, but large differences for the normative version.

35
Other research: Christiansen et al. (2005)

Both forced-choice (FC) and normative formats are susceptible to distortion, but FC is more robust with applicants.
For validity against supervisor ratings, distortion had a more deleterious effect on the validity of the normative format, with some evidence of enhanced validity for the FC format.
High-ability individuals tend to be better at distorting FC format instruments than those of lower ability.
The triad format is harder to distort than the dyad format.
NB: OPQ32 uses a tetrad format.

36
SHL Research meta-analysis of normative vs ipsative validity data

19 studies (n = 3,241) drawn from a meta-analysis of 29 validity studies (Bartram, 2005).
Predictors included both normative and ipsative forms of OPQ personality tests.
Compare studies using a Likert format (OPQ32n, OPQ CM5.2 and CCSQ 5.2) with those using a forced-choice format (OPQ32i, OPQ CM4.2, CCSQ 7.2) where the criterion measures were the same (IMC or CCCI).
Criteria included the mixed-format (normative/ipsative) Inventory of Management Competencies (IMC).
Compare the validities of Likert ratings with the ipsative choices made by the same line managers using the IMC.
Control over candidates, instrument, items and raters.

37
Normative part of normative-ipsative IMC
38
Ipsative part of normative-ipsative IMC
39
Ipsative vs normative predictor
40
Ipsative vs Normative IMC criteria
41
Summary of results

For comparison of predictors:
Ipsative: k = 13, n = 2,348, mean validity = 0.268
Normative: k = 4, n = 409, mean validity = 0.223
For comparison of criterion measures (k = 9, n = 1,460):
Ipsative: mean validity = 0.315
Normative: mean validity = 0.189

42
Conclusions

Ipsative scales are not identical to normative scales.
However, with more scales (k > 20), results are very similar.
Both have advantages and disadvantages.
As predictors, both have good validity, but ipsative has better differentiation and is more resistant to distortion.
The choice depends on the application and the likely sources of error/bias.
We can enhance validity by using forced-choice formats to reduce halo effects in criterion ratings.

43
We should not argue against the use of a
methodology that provides real practical benefits
just because we do not understand its
psychometric complexities.

Like the bumble bee, rather than using theory to
prove that it cannot fly, we should reflect on
practice and try to understand how it does.

44
Thank you

Email dave.bartram_at_shlgroup.com for copies

45
References

Baron, H. (1996). Strengths and limitations of ipsative instruments. Journal of Occupational and Organizational Psychology, 69, 49-56.
Baron, H. (2002). Working with ipsative measures. Paper presented at the 17th annual conference of the Society for Industrial and Organizational Psychology, April 2002, Toronto, Canada.
Bartram, D. (1996). The relationship between ipsatized and normative measures of personality. Journal of Occupational and Organizational Psychology, 69, 25-39.
Bartram, D. (2005). The Great Eight competencies: A criterion-centric approach to validation. Journal of Applied Psychology, 90, 1185-1203.
Christiansen, N., Burns, G. N., & Montgomery, G. E. (2005). Reconsidering forced-choice item formats for applicant personality assessment. Human Performance, 18, 267-307.
Closs, S. J. (1996). On the factoring and interpretation of ipsative data. Journal of Occupational and Organizational Psychology, 69, 41-47.
Converse, P. D., Oswald, F. L., Imus, A., Hedricks, C., Roy, R., & Butera, H. (undated ms). Comparing yourself with many people or comparing yourself on many traits: Effects of personality test format on faking, criterion-related validity and test-taker reactions.
Converse, P. D., Oswald, F. L., Imus, A., Hedricks, C., Roy, R., & Butera, H. (undated ms). Forcing choices in personality measurement: Benefits and limitations.
Jackson, D. N., Wroblewski, V. R., & Ashton, M. C. (2000). The impact of faking on employment tests: Does forced-choice offer a solution? Human Performance, 13, 371-388.
Karpatschof, B., & Elkjaer, H. K. (2000). Yet the bumblebee flies: The reliability of ipsative scores examined by empirical data and a simulation study. Research Report No. 1. Department of Psychology, University of Copenhagen.
King, L. M., Hunter, J. E., & Schmidt, F. L. (1980). Halo in a multidimensional forced-choice performance evaluation scale. Journal of Applied Psychology, 65, 507-516.
Martin, B. A., Bowen, C.-C., & Hunt, S. T. (2002). How effective are people at faking on personality questionnaires? Personality and Individual Differences, 32, 247-256.
Matthews, G., & Oddy, K. (1997). Ipsative and normative scales in adjectival measurement of personality: Problems of bias and discrepancy. International Journal of Selection and Assessment, 5, 169-182.
McCloy, R. A., Heggestad, E. D., & Reeve, C. L. (2005). A silk purse from the sow's ear: Retrieving normative information from multidimensional forced-choice items. Organizational Research Methods, 8(2), 222-248.
Meade, A. (2004). Psychometric problems and issues involved with creating and using ipsative measures for selection. Journal of Occupational and Organizational Psychology, 77, 531-552.
Saville, P., & Willson, E. (1991). The reliability and validity of normative and ipsative approaches in the measurement of personality. Journal of Occupational and Organizational Psychology, 64, 219-238.
SHL (1993a). Inventory of Management Competencies: Manual and User's Guide. Thames Ditton, England: SHL Group plc.
SHL (1993b). OPQ Concept Model: Manual and User's Guide. Thames Ditton, England: SHL Group plc.
SHL (1999). OPQ32 Manual and User's Guide. Thames Ditton, England: SHL Group plc.
