Title: Can we compare people to each other using ipsative measures?
1Can we compare people to each other using
ipsative measures?
- Prof Dave Bartram
- Research Director
- SHL Group plc
- 25th Biennial Conference of the Society for
Multivariate Analysis in the Behavioural Sciences - 2 July 2006
2Introduction
- Both predictors (e.g. personality test items) and criteria (e.g. line manager ratings on competencies) can be constructed using either Likert item formats (normative) or forced-choice formats (ipsative).
- What are the differences?
- What are their relative advantages and disadvantages?
- It is generally argued that:
  - Likert formats are easier to analyse but are subject to halo and response bias effects.
  - Ipsative formats control response bias but can pose problems for analysis due to constraints on correlations.
3Normative OPQ32
4Ipsative OPQ32
5What is ipsative measurement?
- Two methods: forced choice or ipsatization
- Forced choice
  - choosing between items loading on different scales
  - allocating a fixed number of points between scales
- Ipsatization
  - Subtract the average score across scales from each scale score. This removes one degree of freedom and locates each profile about the same mid-point.
  - Can also go further and equate score variance across people, so that all profiles are equally variable across scales (this is not often done).
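The ipsatization step described above can be sketched in a few lines of Python. This is an illustration only: the function name `ipsatize`, the `equate_variance` flag and the two example profiles are hypothetical and are not taken from the OPQ scoring procedure.

```python
from statistics import mean, pstdev

def ipsatize(profile, equate_variance=False):
    """Ipsatize one person's scale scores (a list with one score per scale)."""
    centre = mean(profile)
    # Subtract the person's own average, so every profile is centred on zero.
    centred = [score - centre for score in profile]
    if equate_variance:
        # Optional second step: give every profile the same spread across scales.
        spread = pstdev(centred)
        if spread > 0:  # a completely flat profile is left as all zeros
            centred = [d / spread for d in centred]
    return centred

# Hypothetical raw scores on four scales for two people.
profiles = {"person_a": [12, 30, 45, 33], "person_b": [40, 42, 38, 44]}
ipsatized = {name: ipsatize(p) for name, p in profiles.items()}
for name, p in ipsatized.items():
    print(name, p, "sum =", sum(p))
```

After ipsatization each profile sums to zero, which is the lost degree of freedom: any one scale score can be recovered from the others.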
6Score ipsatization
7Forced choice is not always ipsative
- Some instruments use forced choice for pairs of items from the same scale
- A1: I prefer to be alone (extraversion -ve)
- A2: I like to spend time with other people (extraversion +ve)
- Versus
- B1: I prefer to be alone (extraversion -ve)
- B2: I often feel anxious (neuroticism +ve)
- A is not ipsative, B is.
8Alternate ipsative forced choice formats
- Pairs of items (dyads)
  - I prefer to be alone (Extraversion -ve)
  - I often feel anxious (Neuroticism +ve)
- Triplets (triads)
  - I prefer to be alone (Extraversion -ve)
  - I often feel anxious (Neuroticism +ve)
  - I try to help others (Agreeable +ve)
- Quads (tetrads)
  - I prefer to be alone (Extraversion -ve)
  - I often feel anxious (Neuroticism +ve)
  - I try to help others (Agreeable +ve)
  - I am creative (Openness +ve)
9Scoring a quad
10Some ipsative maths
- The sum of scores across all scales is a constant (by definition)
- Lose one degree of freedom. For k scales, df = k - 1.
- Scale scores tend to correlate negatively
  - High scores on one scale imply lower scores on other scales
- Average scale intercorrelation is constrained
  - k = 2 scales, r = -1.0
  - k = 4 scales, r = -.33
  - k = 16 scales, r = -.07
  - k = 32 scales, r = -.03
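The four values listed above are consistent with an average intercorrelation of -1/(k - 1), which is what the constant-sum constraint gives when all scales have equal variance. The sketch below assumes that formula and checks it against a small simulation in which independent normal scores are ipsatized; the function names and simulation settings are illustrative choices, not part of the original analysis.

```python
import random
from itertools import combinations
from math import sqrt

def expected_mean_r(k):
    """Average intercorrelation among k ipsative scales with equal variances."""
    return -1.0 / (k - 1)

def pearson(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def simulated_mean_r(k, n_people=1000, seed=0):
    """Ipsatize independent normal scores and average all scale intercorrelations."""
    rng = random.Random(seed)
    columns = [[] for _ in range(k)]
    for _ in range(n_people):
        raw = [rng.gauss(0.0, 1.0) for _ in range(k)]
        centre = sum(raw) / k
        for scale, score in enumerate(raw):
            columns[scale].append(score - centre)
    pairs = list(combinations(range(k), 2))
    return sum(pearson(columns[i], columns[j]) for i, j in pairs) / len(pairs)

for k in (2, 4, 16, 32):
    print(k, round(expected_mean_r(k), 2), round(simulated_mean_r(k), 2))
```

The simulated scores are uncorrelated before ipsatization, so any negative correlation in the output is produced by the constraint alone.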
11Classical test theory
12Test theory for ipsative
- For ipsative forced-choice pairs, c is a constant equal to the number of items measuring each scale and k is the number of scales.
- ck = total number of items
- ck/2 = total score (one point is given for each pair of items).
- Notation: T = true score, e = error, X = observed score (Meade, 2004)
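The bookkeeping above (ck items, ck/2 points) can be illustrated with a small simulation of one respondent. The sketch assumes a complete, balanced design in which every pair of scales is presented the same number of times, and a simple response rule in which the item with the higher "true score plus random error" is chosen. The function name, the `reps` parameter and the trait values are all hypothetical.

```python
import itertools
import random

def score_forced_choice_pairs(true_scores, reps=2, noise_sd=1.0, seed=0):
    """Simulate one person's scale scores on a balanced forced-choice pair design.

    Every pair of scales appears `reps` times, so each scale is measured by
    c = (k - 1) * reps items and the questionnaire has c * k items in total.
    """
    rng = random.Random(seed)
    k = len(true_scores)
    scores = [0] * k
    for _ in range(reps):
        for i, j in itertools.combinations(range(k), 2):
            # Momentary attractiveness of each item: true score plus random error.
            u_i = true_scores[i] + rng.gauss(0.0, noise_sd)
            u_j = true_scores[j] + rng.gauss(0.0, noise_sd)
            winner = i if u_i >= u_j else j
            scores[winner] += 1  # one point per pair, given to the chosen item's scale
    return scores

k, reps = 4, 2
c = (k - 1) * reps          # items per scale
scores = score_forced_choice_pairs([0.5, -1.0, 0.0, 1.5], reps=reps)
total = sum(scores)
print("items:", c * k, "scale scores:", scores, "total:", total)
```

Whatever the respondent's trait levels, the total is always ck/2, so a point gained on one scale is necessarily a point not gained on another.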
13Test theory for forced choice ipsative
- Meade (2004) argues that the observed score on any ipsative scale is a function of the true score and error on that scale minus some function of the true scores and errors on all the other scales.
- This needs modifying according to whether the ipsative design is complete or incomplete (i.e. does not contain all possible scale pairings).
- So long as the design is balanced, however, this should not have any differential biasing effects on scales.
- It does, however, explain why scale scores are negatively correlated.
- It also explains why error terms are correlated in SEM.
14Alternate models
- Normative
  - $X_i = t_i + e_{\mathrm{rand}} + \text{faking} + \text{central tendency} + \text{acquiescence}$
- Ipsative
  - $X_i = t_i + e_{\mathrm{rand}} - f(t_j + e_j)$, where $j \neq i$
[Figure: SEM path diagram for one ipsative quad; item I4 is marked "Not included"]
16Example OPQ32i quad 1 (UK English)
17Ipsative Controversies
18Ipsativity and self-referencing.
- The key feature of ipsative measurement is that it requires people to make comparisons between trait strengths of different scales.
- It is often called self-referenced measurement because of this.
- This is a misnomer, as one can argue that all multivariate self-report measures are self-referenced.
- However, as a consequence it is argued that one cannot compare people's scores on ipsative scales.
- I will argue that with large numbers of scales (20 or more) the constraints that scores on scales place on the absolute values each can have are not substantive and have minimal impact in practice.
19Percent raw score point change in score on Scale i
when score changes one raw score point on Scale j
< 5
DISC: 33.3
OPQ32: 3.2
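The two figures above are consistent with a simple calculation: if the total is fixed, a one-point gain on one scale must be offset by a one-point loss shared among the other k - 1 scales, i.e. 100/(k - 1) percent of a point each on average. The sketch below assumes that reading, with k = 4 for DISC and k = 32 for OPQ32; the function name is hypothetical.

```python
def percent_compensating_change(k):
    """Average % of a raw-score point lost on each other scale when one scale gains a point."""
    return 100.0 / (k - 1)

for name, k in [("DISC", 4), ("OPQ32", 32)]:
    print(f"{name}: k = {k}, {percent_compensating_change(k):.1f}%")
```

With four scales the dependency between any two scores is large, while with 32 scales each score is almost free to vary independently of any single other score.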
20Construct Validity
- Meaning of the scale is a comparison
- Correlation matrix is constrained, as is the average correlation
- Are not all scales understood by comparison with other traits?
- When rating an item, people compare themselves both against others and against themselves, whether the format is ipsative or normative
21Scale Intercorrelations - OPQ32
22Reliability - Issues
- Reliability requires interval measurement. Some claim that ipsative data give inflated results.
- Reliability can be (a little) distorted with ipsative measures (Tenopyr, 1998).
- Reliability is conserved but can be depressed (Bartram, 1996; Karpatschof & Elkjaer, 2000).
- Bartram (1996) derived an equation for the reliability of ipsative data by showing that reliability is reduced as a direct function of the range restriction associated with the loss of 1 df.
- OPQ32 uses 208 items for the normative version and 416 for the ipsative version to ensure equal reliabilities.
23Normative-Ipsative equivalence
- N = 488 training delegates
- For ipsative, median alpha = 0.86
- For normative, median alpha = 0.83
- Alternate-form scale reliabilities for OPQ32n-OPQ32i: median = 0.71
- These correlations are lower than the internal consistency reliabilities for the two versions, or the test-retest reliabilities for the OPQ32n
- Corrected for attenuation, true-score correlations have median = 0.83.
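The correction for attenuation mentioned above is the standard formula: observed correlation divided by the square root of the product of the two reliabilities. The sketch below applies it to the three medians on this slide, assuming the alphas are the reliabilities used in the correction. Applying it to the medians gives roughly 0.84 rather than the reported 0.83, presumably because the reported figure is the median of the 32 per-scale corrected values rather than a correction of the medians.

```python
from math import sqrt

def correct_for_attenuation(r_xy, rel_x, rel_y):
    """Estimate the true-score correlation from an observed correlation and two reliabilities."""
    return r_xy / sqrt(rel_x * rel_y)

# Median values from the slide: alternate-form r, normative alpha, ipsative alpha.
r_true = correct_for_attenuation(0.71, 0.83, 0.86)
print(round(r_true, 2))
```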
24Big 5 equivalence, n = 488
25Profile similarities, k = 32, n = 488
26What determines the profile similarity?
- The normative profile average deviations were correlated with the profile similarity coefficients.
  - The correlation is r = 0.51 (n = 488).
  - The similarity between a person's normative and ipsative profiles is higher for people with more differentiated normative profiles.
- Ipsative consistency scores were correlated with the similarity between normative and ipsative profiles.
  - The correlation is r = 0.52 (n = 488).
  - People with a more consistent pattern of responding to the forced-choice format are likely to have a similar normative profile, and that profile is likely to be relatively well differentiated.
27Scale dependencies in normative and ipsative
forms.
- Likert ratings for a 12-item scale rated 1-5 have a theoretical range from 12 to 60.
- In practice, the score obtained is constrained by scores on other scales, as scales are correlated.
- For the normative OPQ32, the average R for predicting scale i from all the other scales is 0.66.
- For ipsative this is 1.0 by definition.
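The "1.0 by definition" result can be checked directly: once scores are ipsatized, any one scale equals minus the sum of the others, so a linear predictor with all weights set to -1 reproduces it exactly. The sketch below uses simulated data; the sample size, number of scales and helper names are arbitrary illustrative choices.

```python
import random
from math import sqrt

def pearson(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

rng = random.Random(1)
n_people, k = 500, 8

# Simulated ipsatized profiles: raw scores minus each person's own mean.
ipsative = []
for _ in range(n_people):
    raw = [rng.gauss(0.0, 1.0) for _ in range(k)]
    centre = sum(raw) / k
    ipsative.append([score - centre for score in raw])

target = [row[0] for row in ipsative]
# Linear combination of the other k - 1 scales, every weight equal to -1.
predicted = [-sum(row[1:]) for row in ipsative]

r_ipsative = pearson(predicted, target)
max_error = max(abs(p - t) for p, t in zip(predicted, target))
print(r_ipsative, max_error)
```

Because a perfect linear predictor exists, the multiple R from a fitted regression cannot be lower than 1.0, whatever the data.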
28Normative bias
- Positive normative bias represents a shift of the profile to the right (average greater than sten 5.5)
- Negative normative bias represents a shift of the profile to the left (average less than sten 5.5)
- For OPQ32n, SD of standardized average scores across scales = 0.27 (n = 242)
- For OPQ32i, by definition, SD = 0.
29Distribution of average normative scale z-scores
30Is normative bias related to personality?
- Prediction of normative bias (stepwise)
  - Using normative scale residuals (n): R = 0.76 (0.75 adjusted)
  - Using ipsative scales (i): R = 0.53 (0.50 adjusted)
  - Normative residual predictor and ipsative scale predictor correlate r = 0.62
- People who have positive normative bias are
  - More
    - (n and i) Achieving, Controlling, Optimistic, Vigorous
    - (n) Evaluative, Conscientious
    - (i) Caring, Detail Conscious
  - Less
    - (n and i) Worrying
    - (n) Decisive, Competitive, Variety Seeking, Modest, Independent, Conventional
- N = 242 in all cases
31Implications
- So long as the number of scales is large, you can compare people across scales.
- Normative and ipsative versions are not parallel forms of the same test; they provide qualitatively different but highly correlated information.
- Most people have similar ipsative and normative profiles, both in shape and location.
- Some people will have moderate or large score differences across forms, especially if their profiles are flat or if they are showing strong response bias on the normative version.
- Where there is a difference between forms, which is correct, or are they both correct?
32Criterion validity of ipsative measures
33Criterion Validity
- Constraints on scales will have an impact on external correlations
- Lower average inter-scale correlation should optimise the additive effect of scale variances (increasing multiple Rs)
- Greer and Dunlap (1997) found that Type I error rates were well preserved and power was nearly equivalent in a Monte Carlo study of ANOVA.
- Individual scales give similar validities in normative and ipsative forms
34Other research
- Jackson, Wroblewski & Ashton (2000) compared single-stimulus and forced-choice formats for integrity-related personality items.
  - Those simulating applying for a job gave scores 1 SD better when using the single-stimulus format instrument, and this led to lower validity.
  - Shift in mean was only one third for the forced-choice format, and validity was maintained.
- Martin, Bowen & Hunt (2002) show that the ipsative OPQ is more resistant to faking instructions than the normative.
  - No differences between faking and honest groups for ipsative, but large differences for normative.
35Other research: Christiansen et al. (2005)
- Both FC and normative formats are susceptible to distortion, but FC is more robust with applicants
- For validity against supervisor ratings, distortion had a more deleterious effect on the validity of the normative format, with some evidence for enhancement of validity of the FC format
- High-ability individuals tend to be better at distorting FC format instruments than those of lower ability.
- Triad format is harder to distort than dyad format.
- NB: OPQ32 uses tetrad format
36SHL Research: Meta-analysis of normative vs
ipsative validity data
- 19 studies (n = 3,241) drawn from a meta-analysis of 29 validity studies (Bartram, 2005)
- Predictors included both normative and ipsative forms of OPQ personality tests
  - Compare studies using Likert format (OPQ32n, OPQ CM5.2 and CCSQ 5.2) with those using forced-choice format (OPQ32i, OPQ CM4.2, CCSQ 7.2) where the criterion measures were the same (IMC or CCCI).
- Criteria included the mixed item format (normative-ipsative) Inventory of Management Competencies (IMC).
  - Compare the validities of Likert ratings with the ipsative choices made by the same line managers using the IMC
  - Control over candidates, instrument, items and raters.
37Normative part of normative-ipsative IMC
38Ipsative part of normative-ipsative IMC
39Ipsative vs normative predictor
40Ipsative vs Normative IMC criteria
41Summary of results
- For comparison of predictors
  - Ipsative: k = 13, n = 2,348, mean ρ = 0.268
  - Normative: k = 4, n = 409, mean ρ = 0.223
- For comparison of criterion measures (k = 9, n = 1,460)
  - Ipsative: mean ρ = 0.315
  - Normative: mean ρ = 0.189
42Conclusions
- Ipsative scales are not identical to normative
- However, with more scales (k > 20) results are very similar
- Both have advantages and disadvantages
- As predictors both have good validity, but ipsative has better differentiation and is more resistant to distortion
- Choice depends on application and likely sources of error/bias
- We can enhance validity by using forced-choice formats to reduce halo effects for criterion ratings.
43We should not argue against the use of a
methodology that provides real practical benefits
just because we do not understand its
psychometric complexities.
- Like the bumble bee, rather than using theory to
prove that it cannot fly, we should reflect on
practice and try to understand how it does.
44Thank you
- Email dave.bartram_at_shlgroup.com for copies
45References
- Baron, H. (1996). Strengths and limitations of ipsative instruments. Journal of Occupational and Organizational Psychology, 69, 49-56.
- Baron, H. (2002). Working with ipsative measures. Paper presented at the 17th annual conference of the Society for Industrial and Organizational Psychology, April 2002, Toronto, Canada.
- Bartram, D. (1996). The relationship between ipsatized and normative measures of personality. Journal of Occupational and Organizational Psychology, 69, 25-39.
- Bartram, D. (2005). The Great Eight competencies: A criterion-centric approach to validation. Journal of Applied Psychology, 90, 1185-1203.
- Christiansen, N., Burns, G.N., & Montgomery, G.E. (2005). Reconsidering forced-choice item formats for applicant personality assessment. Human Performance, 18, 267-307.
- Closs, S.J. (1996). On the factoring and interpretation of ipsative data. Journal of Occupational and Organizational Psychology, 69, 41-47.
- Converse, P.D., Oswald, F.L., Imus, A., Hedricks, C., Roy, R., & Butera, H. (undated ms). Comparing yourself with many people or comparing yourself on many traits: Effects of personality test format on faking, criterion-related validity and test-taker reactions.
- Converse, P.D., Oswald, F.L., Imus, A., Hedricks, C., Roy, R., & Butera, H. (undated ms). Forcing choices in personality measurement: Benefits and limitations.
- Jackson, D.N., Wroblewski, V.R., & Ashton, M.C. (2000). The impact of faking on employment tests: Does forced-choice offer a solution? Human Performance, 13, 371-388.
- Karpatschof, B., & Elkjaer, H.K. (2000). Yet the bumblebee flies: The reliability of ipsative scores examined by empirical data and a simulation study. Research Report no. 1. Department of Psychology, University of Copenhagen.
- King, L.M., Hunter, J.E., & Schmidt, F.L. (1980). Halo in a multidimensional forced-choice performance evaluation scale. Journal of Applied Psychology, 65, 507-516.
- Martin, B.A., Bowen, C-C., & Hunt, S.T. (2002). How effective are people at faking on personality questionnaires? Personality and Individual Differences, 32, 247-256.
- Matthews, G., & Oddy, K. (1997). Ipsative and normative scales in adjectival measurement of personality: Problems of bias and discrepancy. International Journal of Selection and Assessment, 5, 169-182.
- Meade, A. (2004). Psychometric problems and issues involved with creating and using ipsative measures for selection. Journal of Occupational and Organizational Psychology, 77, 531-552.
- McCloy, R.A. (2005). A silk purse from a sow's ear: Retrieving normative information from multi-dimensional forced-choice items. Organizational Research Methods, 8(2), 222-248.
- Saville, P., & Willson, E. (1991). The reliability and validity of normative and ipsative approaches in the measurement of personality. Journal of Occupational and Organizational Psychology, 64, 219-238.
- SHL (1993a). Inventory of Management Competencies: Manual and User's Guide. Thames Ditton, England: SHL Group plc.
- SHL (1993b). OPQ Concept Model: Manual and User's Guide. Thames Ditton, England: SHL Group plc.
- SHL (1999). OPQ32 Manual and User's Guide. Thames Ditton, England: SHL Group plc.