1 Preference Analysis
Joachim Giesen and Eva Schuberth
May 24, 2006
2 Outline
- Motivation
- Approximate sorting
- Lower bound
- Upper bound
- Aggregation
- Algorithm
- Experimental results
- Conclusion
3 Motivation
- Find the preference structure of a consumer w.r.t. a set of products
- Common approach: assign a value function to the products
- The value function determines a ranking of the products
- Elicitation: pairwise comparisons
- Problem: deriving a metric value function from non-metric information
- → We restrict ourselves to finding a ranking
4 Motivation
- → Find a ranking for every respondent individually
- Efficiency measure: number of comparisons
- Comparison-based sorting algorithms
- Lower bound: Ω(n log n) comparisons
- As the set of products can be large, this is too much
5 Motivation
- Possible solutions
  - Approximation
  - Aggregation
  - Modeling and distribution assumptions
6 Approximation (joint work with J. Giesen and M. Stojakovic)
- Lower bound (proof)
- Algorithm
7 Approximation
- The consumer's true ranking of n products corresponds to the identity (increasing) permutation id on {1, .., n}
- Wanted: an approximation of the ranking, i.e., a permutation π ∈ Sn such that the distance D(π, id) is small
8 Metric on Sn
- Needed: a metric on Sn
- Meaningful in the market research context
- Spearman's footrule metric D: D(π, σ) = Σi |π(i) − σ(i)|
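As a concrete illustration (not part of the slides), Spearman's footrule distance can be computed directly from its definition; here rankings are assumed to be given as lists of items in preference order:

```python
def footrule(pi, sigma):
    """Spearman's footrule: sum over all items of the absolute
    difference between the item's position in pi and in sigma."""
    pos = {item: i for i, item in enumerate(sigma)}
    return sum(abs(i - pos[item]) for i, item in enumerate(pi))
```

For identical rankings the distance is 0; swapping two adjacent items changes it by 2.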
9 We show
- To approximate a ranking within expected distance r,
- Ω(n log(n²/r)) comparisons are necessary and
- O(n log(n²/r)) comparisons are always sufficient
10 Lower bound
Theorem: Let A be a randomized approximate sorting algorithm. If for every input permutation the expected distance of the output to id is at most r, then A performs Ω(n log(n²/r)) comparisons in the worst case.
11 Lower bound: Proof
Follows from Yao's minimax principle: it suffices to bound deterministic algorithms on a uniformly random input permutation.
12 Lower bound: Lemma
- For r > 0 let Br(id) = {π ∈ Sn : D(π, id) ≤ r} be the ball centered at id with radius r
- Lemma: |Br(id)| ≤ 2^n · C(n+r, n)
13 Lower bound: Proof of Lemma
- π ∈ Sn is uniquely determined by the sequence (π(i) − i), i = 1, .., n
- For a fixed sequence (di) of non-negative integers, at most 2^n permutations satisfy |π(i) − i| = di for all i (two sign choices per entry)
- The number of such sequences with Σi di ≤ r is at most C(n+r, n)
14 Lower bound: deterministic case
Now to show: for a fixed deterministic algorithm with too few comparisons, the number of input permutations whose output has distance more than 2r to id is more than n!/2 (so the expected distance under uniform inputs exceeds r).
15 Lower bound: deterministic case
k comparisons → at most 2^k classes of inputs with the same outcome on all comparisons
17 Lower bound: deterministic case
For π, σ in the same class, the algorithm performs the same comparisons and applies the same rearrangement to both inputs
19 Lower bound: deterministic case
At most 2^k input permutations have the same output
20 Lower bound: deterministic case
At most 2^k · |B2r(id)| input permutations have their output in B2r(id)
21 Lower bound: deterministic case
At least n! − 2^k · |B2r(id)| input permutations have their output outside B2r(id)
22 Upper Bound
- An algorithm (suggested by Chazelle) approximates any ranking within distance n²/2^m using O(n·m) comparisons.
23 Algorithm
- Partition the elements into equal-sized bins
- Elements within a bin are smaller than any element in a subsequent bin
- No ordering of the elements within a bin
- Output: a permutation consistent with the sequence of bins
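A minimal Python sketch of this binning scheme. It assumes the bins are produced by repeatedly splitting each bin at its median; `statistics.median_low` stands in for the linear-time median selection the comparison bound would actually require:

```python
import statistics

def approx_sort(items, rounds):
    """Split every bin at its median for `rounds` rounds; any order
    consistent with the resulting bin sequence approximates the true
    ranking within footrule distance n^2 / 2^rounds."""
    bins = [list(items)]
    for _ in range(rounds):
        nxt = []
        for b in bins:
            if len(b) <= 1:
                nxt.append(b)
                continue
            m = statistics.median_low(b)
            nxt.append([x for x in b if x <= m])   # lower bin
            nxt.append([x for x in b if x > m])    # subsequent bin
        bins = nxt
    # output: elements in bin order, unordered inside each bin
    return [x for b in bins for x in b]
```

With log n rounds every bin has size 1 and the output is fully sorted; stopping earlier trades accuracy for comparisons.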
24 Algorithm
[Figure: the bins after rounds 0, 1, and 2]
25 Analysis of algorithm
- m rounds → 2^m bins
- Output ranking consistent with the ordering of the bins
26 Algorithm: Theorem
Any ranking consistent with the bins computed in m rounds, i.e., obtained with O(n·m) comparisons, has distance at most n²/2^m to the true ranking.
27 Approximation: Summary
- For a sufficiently large error, fewer comparisons are needed than for exact sorting: error n²/2^m costs O(n·m) comparisons instead of Ω(n log n)
- For real applications this is still too much
- Individual elicitation of a value function is not possible
- → Second approach: Aggregation
28 Aggregation (joint work with J. Giesen and D. Mitsche)
- Motivation
  - We think that the population splits into preference/customer types
  - Respondents answer according to their type (but deviations are possible)
- Instead of
  - individual preference analysis or
  - aggregation over the whole population
- → aggregate within customer types
29 Aggregation
- Idea
  - Ask only a constant number of questions (pairwise comparisons)
  - Ask many respondents
  - Cluster the respondents according to their answers into types
  - Aggregate the information within a cluster to get type rankings
- Philosophy: first segment, then aggregate
30 Algorithm
- The algorithm works in 3 phases:
  1. Estimate the number k of customer types
  2. Segment the respondents into the k customer types
  3. Compute a ranking for each customer type
31 Algorithm
Every respondent performs pairwise comparisons. Basic data structure: a matrix A = (aij). Entry aij ∈ {-1, 1, 0} refers to respondent i and the j-th product pair (x, y):
- aij = -1 if respondent i prefers y over x
- aij = 1 if respondent i prefers x over y
- aij = 0 if respondent i has not compared x and y
32 Algorithm
- Define B = A·A^T
- Then Bij = number of product pairs on which respondents i and j agree minus the number of pairs on which they disagree (not counting 0s).
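In NumPy this is a single matrix product; the small matrix A below is made-up data for illustration (3 respondents, 4 product pairs):

```python
import numpy as np

# Made-up answers: rows = respondents, columns = product pairs,
# entries in {-1, 0, +1} as on the previous slide.
A = np.array([
    [ 1, -1,  1,  0],
    [ 1, -1,  1,  1],
    [-1,  1,  0, -1],
])

# B[i, j] = #pairs where i and j agree - #pairs where they disagree;
# pairs where either answered 0 contribute nothing.
B = A @ A.T
```

Respondents 0 and 1 agree on all three pairs both answered (B[0,1] = 3), while respondents 0 and 2 disagree on both shared pairs (B[0,2] = -2).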
33 Algorithm: phase 1
- Phase 1: estimate the number k of customer types
- Use the matrix B
- Analyze the spectrum of B
- We expect the k largest eigenvalues of B to be substantially larger than the other eigenvalues
- → Search for a gap in the eigenvalues
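A sketch of the gap search; the slides do not fix a rule for "substantially larger", so taking the largest gap among the top eigenvalues is an assumption on my part:

```python
import numpy as np

def estimate_num_types(B, k_max=10):
    """Estimate k as the number of eigenvalues before the largest
    gap in the descending spectrum of the symmetric matrix B."""
    spectrum = np.linalg.eigvalsh(B)[::-1]      # eigenvalues, descending
    top = spectrum[:min(k_max, len(spectrum))]
    gaps = top[:-1] - top[1:]                   # consecutive differences
    return int(np.argmax(gaps)) + 1             # k eigenvalues precede the gap
```

On a B with clear block structure (e.g. two groups of perfectly agreeing respondents), the top eigenvalues separate cleanly and the largest gap sits right after them.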
34 Algorithm: phase 2
- Phase 2: cluster the respondents into customer types
- Use the matrix B again
- Compute the projector P onto the space spanned by the eigenvectors of the k largest eigenvalues of B
- Every respondent corresponds to a column of P
- Cluster the columns of P
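A sketch of phase 2 with NumPy. The column-clustering step here is a simple farthest-point heuristic of my own choosing, standing in for whatever clustering routine is actually applied to the columns of P:

```python
import numpy as np

def segment(B, k):
    """Project onto the top-k eigenspace of B and cluster the columns
    of the projector P; returns a type label per respondent."""
    w, V = np.linalg.eigh(B)          # eigenvalues in ascending order
    Vk = V[:, -k:]                    # eigenvectors of the k largest eigenvalues
    P = Vk @ Vk.T                     # orthogonal projector onto their span
    n = P.shape[1]
    # farthest-point seeding over the columns of P
    seeds = [0]
    for _ in range(k - 1):
        dist = [min(np.linalg.norm(P[:, j] - P[:, s]) for s in seeds)
                for j in range(n)]
        seeds.append(int(np.argmax(dist)))
    # assign every column (respondent) to its nearest seed
    return [int(np.argmin([np.linalg.norm(P[:, j] - P[:, s]) for s in seeds]))
            for j in range(n)]
```

For an ideal block-structured B, columns of P belonging to the same type coincide, so any reasonable column clustering recovers the types.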
35 Algorithm: phase 2
- Intuition for using the projector: an example on graphs
36 Algorithm: phase 2
[Figure: adjacency matrix of the example graph]
37 Algorithm: phase 2
[Figure: the projector P]
39 Algorithm: phase 2
[Figure: embedding of the columns of P]
40 Algorithm: phase 3
- Phase 3: compute the ranking for each type
- For each type t compute the characteristic vector ct:
  - (ct)i = 1 if respondent i belongs to that type
  - (ct)i = 0 otherwise
- For each type t compute A^T·ct; if the entry for product pair (x, y) is
  - positive: x is preferred over y by type t
  - negative: y is preferred over x by type t
  - zero: type t is indifferent
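Phase 3 in code, reusing the answer matrix A and the type labels from phase 2; the tiny A and labels below are made-up data:

```python
import numpy as np

def type_scores(A, labels, t):
    """Aggregate the answers of type t: scores = A^T c_t, one score
    per product pair; the sign gives the type's preference."""
    c = (np.asarray(labels) == t).astype(int)   # characteristic vector of type t
    return A.T @ c

# made-up data: 3 respondents, 2 product pairs;
# respondents 0 and 1 form type 0, respondent 2 forms type 1
A = np.array([[ 1, -1],
              [ 1,  1],
              [-1,  1]])
scores = type_scores(A, [0, 0, 1], 0)
# scores == [2, 0]: type 0 prefers x over y on the first pair
# and is indifferent on the second
```

Sorting products by their aggregated pairwise scores then yields the type ranking.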
41 Experimental study
- On real-world data
- 21 data sets from Sawtooth Software, Inc. (conjoint data sets)
- Questions:
  - Do real populations decompose into different customer types?
  - Comparison of our algorithm to Sawtooth's algorithm
42 Conjoint structures
- Attributes: sets A1, .., An with |Ai| = mi
- An element of Ai is called a level of the i-th attribute
- A product is an element of A1 x .. x An
- Example: Car
  - Number of seats: 5, 7
  - Cargo area: small, medium, large
  - Horsepower: 240hp, 185hp
  - Price: 29000, 33000, 37000
- In practical conjoint studies
43 Quality measures
- Difficulty: we do not know the real type rankings
- We cannot directly measure the quality of the result
- Other quality measures:
  - Number of inverted pairs: average number of inversions in the partial rankings of respondents in type i with respect to the j-th type ranking
  - Deviation probability
  - Hit rate (leave-one-out experiments)
44 Study 1
- Respondents: 270
- Size of study: 8 x 3 x 4 = 96
- Questions: 20
[Figure: largest eigenvalues of matrix B]
45 Study 1
- Two types
- Size of clusters: 179 and 91
[Figure: number of inversions and deviation probability]
46 Study 1
- Hit rates
  - Sawtooth: ?
  - Our algorithm: 69%
47 Study 2
- Respondents: 539
- Size of study: 4 x 3 x 3 x 5 = 180
- Questions: 30
[Figure: largest eigenvalues of matrix B]
48 Study 2
- Four types
- Size of clusters: 81, 119, 130, 209
[Figure: number of inversions and deviation probability]
49 Study 2
- Hit rates
  - Sawtooth: 87%
  - Our algorithm: 65%
50 Study 3
- Respondents: 1184
- Size of study: 9 x 6 x 5 = 270
- Questions: 48
- Size of clusters: 6, 3, 1164, 8, 3
- Size of clusters: 3, 1175, 6
[Figure: largest eigenvalues of matrix B]
51 Study 3
- Hit rates
  - Sawtooth: 78%
  - Our algorithm: 62%
52 Study 4
- Respondents: 300
- Size of study: 6 x 4 x 6 x 3 x 2, 3456
- Questions: 40
[Figure: largest eigenvalues of matrix B]
53 Study 4
- Hit rates
  - Sawtooth: 85%
  - Our algorithm: 51%
54 Aggregation: Conclusion
- Segmentation seems to work well in practice.
- Hit rates are not good. Reason: the information is too sparse.
- → Additional assumptions are necessary:
  - Exploit the conjoint structure
  - Make distribution assumptions
56 Yao's Minimax Principle
- I: a finite set of input instances
- A: a finite set of deterministic algorithms
- C(i, a): cost of algorithm a on input i, where i ∈ I and a ∈ A
- For all distributions p over I and q over A:
  min_{a ∈ A} E_p C(i, a) ≤ max_{i ∈ I} E_q C(i, a)