Title: Rank Aggregation Methods II: Experiments
1 Rank Aggregation Methods II: Experiments
2 Recall the Rank Aggregation Problem
- m candidates (a.k.a. alternatives)
- M = {1, ..., m}: set of candidates
- n voters (a.k.a. agents or judges)
- N = {1, ..., n}: set of voters
- Each voter i has a ranking τ_i on M
- τ_i(a) < τ_i(b) means the i-th voter prefers a to b
- A ranking may be a total or partial order
- The rank aggregation problem
- Combine τ_1, ..., τ_n into a single ranking τ on M, which represents the social choice of the voters.
- Rank aggregation function: f(τ_1, ..., τ_n) = τ
- τ may be a total or partial order
3 Experiments: Distance Measures
- Goal: Quantitatively compare different rank aggregation methods.
- Performance measures
- (1) Spearman footrule distance is the sum of pointwise distances (the absolute differences between each element's positions in the two lists). It is normalized by dividing this number by the maximum value (1/2)|S|^2, where S is the set of ranked elements, giving a value between 0 and 1.
- (2) Kendall tau distance counts the number of pairwise disagreements. Dividing by the maximum possible value (1/2)|S|(|S| - 1) we obtain a normalized version, with value between 0 and 1.
- (3) The induced footrule distance is obtained by taking the projections of a full list s with respect to each partial list. In a similar manner, the induced Kendall tau distance can be defined.
- (4) The scaled footrule distance weights contributions of elements based on the length of the lists they are present in. If s is a full list and t is a partial list, then
- SF(s, t) = Σ_{i ∈ t} | s(i)/|s| - t(i)/|t| |.
  Normalize SF by dividing by |t|/2.
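The four measures above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the experiments: the function names (`ranks`, `project`, `footrule`, and so on) are hypothetical, lists are assumed to be Python lists ordered from most to least preferred, and every element of a partial list is assumed to appear in the full list.

```python
from itertools import combinations


def ranks(lst):
    """Map each element to its 1-based position in lst."""
    return {x: r for r, x in enumerate(lst, start=1)}


def footrule(s, t):
    """Normalized Spearman footrule between two full lists over the same set."""
    rs, rt = ranks(s), ranks(t)
    raw = sum(abs(rs[x] - rt[x]) for x in rs)
    return raw / (0.5 * len(s) ** 2)


def kendall(s, t):
    """Normalized Kendall tau between two full lists (at least 2 elements)."""
    rs, rt = ranks(s), ranks(t)
    flips = sum(1 for a, b in combinations(s, 2)
                if (rs[a] - rs[b]) * (rt[a] - rt[b]) < 0)
    return flips / (0.5 * len(s) * (len(s) - 1))


def project(s, t):
    """Restrict the full list s to the elements of the partial list t."""
    keep = set(t)
    return [x for x in s if x in keep]


def induced_footrule(s, t):
    """Footrule between the projection of s onto t and t itself."""
    return footrule(project(s, t), t)


def scaled_footrule(s, t):
    """Normalized scaled footrule between a full list s and a partial list t."""
    rs, rt = ranks(s), ranks(t)
    raw = sum(abs(rs[x] / len(s) - rt[x] / len(t)) for x in t)
    return raw / (len(t) / 2)


full = ["a", "b", "c", "d"]
print(footrule(full, full[::-1]))          # 1.0 (reversed list, maximal distance)
print(kendall(full, full[::-1]))           # 1.0 (every pair disagrees)
print(induced_footrule(full, ["d", "b"]))  # 1.0 (the two shared elements are swapped)
print(scaled_footrule(full, ["b", "d"]))   # 0.0 (relative positions coincide)
```

The scaled footrule example returns 0 because b sits halfway down both lists and d sits at the bottom of both, which is exactly the length-relative comparison the measure is meant to capture.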
4 Experiments: Distance Measures
- So for each aggregation method and each distance measure we get a vector of values, each component representing the distance from the aggregation to one voter list
- Simplest is to take the average (or 1-norm)
- Other norms are interesting
- Mean square distance (2-norm)
- Max distance (∞-norm)
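As a rough sketch of the three summaries, assume the per-voter distances are already available as a list of floats. The function name `summarize` is hypothetical, and the 2-norm is reported here as a root mean square, which is one possible scaling.

```python
import math


def summarize(distances):
    """Collapse a vector of per-voter distances into three summary values."""
    n = len(distances)
    return {
        "average": sum(distances) / n,                                     # scaled 1-norm
        "root_mean_square": math.sqrt(sum(d * d for d in distances) / n),  # scaled 2-norm
        "max": max(distances),                                             # infinity-norm
    }


# Distances from one aggregate ranking to three voter lists (made-up numbers).
print(summarize([0.0, 0.5, 1.0]))
```

The max norm highlights the single most dissatisfied voter, whereas the average can hide one very distant list behind several close ones.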
5 Experiments: Minimizing Average
- Search engines: Altavista (AV), Alltheweb (AW), Excite (EX), Google (GG), Hotbot (HB), Lycos (LY), and Northernlight (NL)
- K: Kendall distance
- SF: scaled footrule distance
- IF: induced footrule distance
- LK: Local Kemenization
6 Experiments in Spam Filtering
- Define spam to be web pages that are low-ranked by majority opinion (machine and human; a simplifying assumption), although they may be highly ranked by some search engines
- Intuition: if a page spams most search engines for a particular query, then no combination of these search engines can filter the spam: garbage in, garbage out.
- Spam pages are the Condorcet losers, and will occupy the bottom of any ranking that satisfies the extended Condorcet criterion
- Similarly, good pages will be among the Condorcet winners, and will rank above the losers.
7 Condorcet Criteria
- Condorcet Criterion
- A candidate of M that beats every other candidate in pairwise simple majority voting should be ranked first.
- Extended Condorcet Criterion (XCC)
- Version 1: If most voters prefer candidate a to candidate b (i.e., the number of i s.t. τ_i(a) < τ_i(b) is at least n/2), then τ should also prefer a to b (i.e., τ(a) < τ(b)).
- Version 2: If there is a partition (W, L) of M such that for any x in W and y in L the majority prefers x to y, then x must be ranked above y. W is called the set of Condorcet winners and L the set of Condorcet losers.
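The criteria above can be checked mechanically on a small profile. The sketch below assumes full lists (every voter ranks every candidate) and treats "majority" as strictly more than half of the voters; the helper names and the three-voter profile with a page called "spam" are made up for illustration.

```python
def votes(rankings, a, b):
    """Number of voters who rank a above b (full lists assumed)."""
    return sum(r.index(a) < r.index(b) for r in rankings)


def majority_prefers(rankings, a, b):
    """True if strictly more than half of the voters rank a above b."""
    return votes(rankings, a, b) > len(rankings) / 2


def condorcet_winner(rankings):
    """Return the candidate that beats every other one pairwise, or None."""
    candidates = rankings[0]
    for a in candidates:
        if all(majority_prefers(rankings, a, b) for b in candidates if b != a):
            return a
    return None


def satisfies_xcc2(ranking, rankings, winners, losers):
    """Check XCC Version 2 for one candidate partition (winners, losers)."""
    premise = all(majority_prefers(rankings, x, y)
                  for x in winners for y in losers)
    if not premise:
        return True  # not a Condorcet partition, so nothing is required
    return all(ranking.index(x) < ranking.index(y)
               for x in winners for y in losers)


voters = [["a", "b", "spam"], ["b", "a", "spam"], ["spam", "a", "b"]]
print(condorcet_winner(voters))                                           # a
print(satisfies_xcc2(["a", "spam", "b"], voters, {"a", "b"}, {"spam"}))  # False
print(satisfies_xcc2(["b", "a", "spam"], voters, {"a", "b"}, {"spam"}))  # True
```

One engine ranks the spam page first, but two out of three place it below both good pages, so any ranking satisfying Version 2 must put it last.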
8 XCC(2) and SPAM Filtering
- Note that XCC(1) => XCC(2), so Version 1 is stronger
- But XCC(1) is not always realizable
- As we will see, XCC(2) is always realizable via Local Kemenization
- Hence using rank aggregation with XCC(2) should assist in SPAM filtering, since Condorcet losers will be ranked lowest
- Let us look at where spam pages (human determined) are ranked with good aggregation methods.
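A standard way to see that Version 1 cannot always be realized is a cyclic majority (the Condorcet paradox). The sketch below uses a hypothetical three-voter profile and brute-forces all six rankings; majority is taken as strictly more than half, and the helper names are illustrative.

```python
from itertools import permutations


def votes(rankings, a, b):
    """Number of voters who rank a above b."""
    return sum(r.index(a) < r.index(b) for r in rankings)


def satisfies_xcc1(ranking, rankings):
    """True if ranking agrees with every strict pairwise majority."""
    n = len(rankings)
    for a, b in permutations(ranking, 2):
        if votes(rankings, a, b) > n / 2 and ranking.index(a) > ranking.index(b):
            return False
    return True


# Majorities: a over b, b over c, and c over a, which form a cycle.
voters = [list("abc"), list("bca"), list("cab")]
realizable = [p for p in permutations("abc") if satisfies_xcc1(p, voters)]
print(realizable)  # []
```

Every ranking of a, b, c must break at least one of the three majority preferences, so no aggregate satisfies Version 1 here. Version 2 only constrains pairs that straddle a winners/losers partition, which is why it remains satisfiable.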
9 Experiments: Filtering SPAM
10 Experiment: Word Association
- Different search engines and portals have different (default) semantics for handling a multi-word query.
- Some use OR semantics (documents contain at least one of the given query terms) while Google uses AND semantics (all the query words must appear). Both are inconvenient in many situations.
- Consider searching for a software engineering job in an on-line job database. The user lists a number of skills and a number of potential keywords in the job description, for example, "Silicon Valley C Java CORBA TCP-IP algorithms start-up pre-IPO stock options". It is clear that the "AND" rule might produce no document or SPAM, and the "OR" rule is equally disastrous.
- Experiment with rank aggregation using multiple queries based on small subsets of terms.
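One way to generate such sub-queries is to enumerate all subsets of a fixed small size. The subset size of 2 and the function name `sub_queries` are assumptions made for this sketch; the slides only say "small subsets". Each resulting query would be sent to a search engine and the returned lists aggregated.

```python
from itertools import combinations


def sub_queries(terms, size):
    """Build one query string per size-element subset of the terms."""
    return [" ".join(subset) for subset in combinations(terms, size)]


terms = ["madras", "madurai", "coimbatore", "vellore"]
queries = sub_queries(terms, 2)
print(len(queries))  # 6
print(queries[0])    # madras madurai
```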
11 Results for the query "madras madurai coimbatore vellore" (cities in the state of Tamil Nadu, India)
- Google:
  www.mssrf.org/Fris9809/location-tamilnadu.html
  www.indiaplus.com/Info/schools.html
  www.focustamilnadu.com/tamilnadu/Policy20Note...Forests.html
  www.tn.gov.in/policy/environ.htm
  www.indiacolleges.com/Tamil_Nadu.htm
- SFO with LK:
  www.madurai.com
  www.ozemail.com.au/clday/locations.htm
  www.utoledo.edu/homepages/speelam/coimbatore.html
  www.ozemail.com.au/clday/madras.htm
  www.madurai.com/around.htm
  www.indiatraveltimes.com/tamilnadu/tamil1.html
- MC4 with LK:
  www.madurai.com
  www.surfindia.com/omsakthi/tourism.htm
  www.indiatraveltimes.com/tamilnadu/tamil1.html
  www.indiatraveltimes.com/tamilnadu/tamil2.html
  www.indiatravels.com/forts/vellore_fort.htm
  www.india-tourism.de/english/south/tamil_nadu.html
12 Locally Kemeny optimal aggregation and XCC(2)
- Many existing aggregation methods do not satisfy XCC(1) or XCC(2).
- It is possible to use your favorite aggregation method to obtain a full list, then apply local Kemenization to realize XCC(2), which filters Condorcet losers.
13 Locally Kemeny optimal
- Recall that computing a Kemeny optimal aggregation is NP-hard
- Definition of locally optimal: A permutation p is a locally Kemeny optimal aggregation of partial lists t1, t2, ..., tk if there is no permutation p' that can be obtained from p by performing a single transposition of an adjacent pair of elements and for which the Kendall distance satisfies
- K(p', t1, t2, ..., tk) < K(p, t1, t2, ..., tk).
- In other words, it is impossible to reduce the total distance to the t's by flipping an adjacent pair.
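The definition translates directly into a check that tries every adjacent transposition. This is a minimal sketch: K is taken as the unnormalized sum of Kendall distances, a partial list contributes only the pairs it actually ranks, and the function names and the three-voter profile are illustrative.

```python
from itertools import combinations


def kendall_partial(p, t):
    """Number of pairs ranked by t whose order is reversed in p."""
    pos = {x: i for i, x in enumerate(p)}
    return sum(1 for a, b in combinations(t, 2) if pos[a] > pos[b])


def total_distance(p, lists):
    """K(p, t1, ..., tk): sum of Kendall distances from p to each list."""
    return sum(kendall_partial(p, t) for t in lists)


def is_locally_kemeny_optimal(p, lists):
    """True if no single adjacent transposition lowers the total distance."""
    base = total_distance(p, lists)
    for i in range(len(p) - 1):
        q = list(p)
        q[i], q[i + 1] = q[i + 1], q[i]
        if total_distance(q, lists) < base:
            return False
    return True


lists = [["a", "b", "c"], ["a", "b", "c"], ["b", "a", "c"]]
print(is_locally_kemeny_optimal(["a", "b", "c"], lists))  # True
print(is_locally_kemeny_optimal(["c", "a", "b"], lists))  # False
```

The check costs only len(p) - 1 distance evaluations, in contrast to the NP-hard search for a globally Kemeny optimal permutation.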
14 Example of LKO but not KO
- Example 1
- t1 = (1,2), t2 = (2,3), t3 = t4 = t5 = (3,1).
- p = (1,2,3). We have that p satisfies the definition of LKO, with K(p, t1, t2, ..., t5) = 3, but transposing 1 and 3 decreases the sum to 2.
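The numbers in Example 1 can be verified by brute force over all six permutations of (1, 2, 3). The sketch below uses the unnormalized Kendall distance summed over the five partial lists; the function name `K` mirrors the slide's notation but the code itself is only an illustration.

```python
from itertools import combinations, permutations


def K(p, lists):
    """Total number of list pairs whose order is reversed in p."""
    pos = {x: i for i, x in enumerate(p)}
    return sum(1 for t in lists
               for a, b in combinations(t, 2) if pos[a] > pos[b])


lists = [(1, 2), (2, 3), (3, 1), (3, 1), (3, 1)]
scores = {p: K(p, lists) for p in permutations((1, 2, 3))}

print(scores[(1, 2, 3)])                     # 3: the permutation p
print(scores[(2, 1, 3)], scores[(1, 3, 2)])  # 4 4: both adjacent swaps are worse
print(scores[(3, 2, 1)])                     # 2: transposing 1 and 3 is better
print(min(scores.values()))                  # 1: best over all six permutations
```

Both adjacent transpositions of p raise the distance to 4, so p is locally optimal, yet the non-adjacent swap reaches 2, and the exhaustive minimum of 1 is attained by (2,3,1) and (3,1,2).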
15 LKO satisfies XCC(2)
- Proof by contradiction: If the result is false, then there exist partial lists t1, t2, ..., tk, an LKO aggregation p, and a partition (W, L) for which p violates XCC(2); that is, there is some pair c in W and d in L such that p(d) < p(c). Let (d, c) be the closest such pair in p.
- Consider the immediate successor of d in p, call it e. If e = c, then c is adjacent to d in p and, since a majority prefers c to d, transposing this adjacent pair of alternatives produces a p' such that K(p', t1, t2, ..., tk) < K(p, t1, t2, ..., tk), contradicting the assumption on p.
- If e does not equal c, then either e is in W, in which case the pair (d, e) is a closer pair in p than (d, c) and also violates XCC(2), or e is in L, in which case (e, c) is a closer pair than (d, c) that violates XCC(2). Both cases contradict the choice of (d, c).
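The statement can also be checked exhaustively on a small instance. The sketch below is an illustration, not part of the proof: it uses a made-up profile of three full lists in which the good pages g1, g2, g3 form a majority cycle and a single page s is a Condorcet loser, enumerates all 24 permutations, keeps the locally Kemeny optimal ones, and confirms that each ranks s last.

```python
from itertools import combinations, permutations


def K(p, lists):
    """Total number of list pairs whose order is reversed in p."""
    pos = {x: i for i, x in enumerate(p)}
    return sum(1 for t in lists
               for a, b in combinations(t, 2) if pos[a] > pos[b])


def is_lko(p, lists):
    """True if no adjacent transposition of p lowers K."""
    base = K(p, lists)
    for i in range(len(p) - 1):
        q = list(p)
        q[i], q[i + 1] = q[i + 1], q[i]
        if K(q, lists) < base:
            return False
    return True


voters = [["g1", "g2", "g3", "s"],
          ["g2", "g3", "g1", "s"],
          ["s", "g3", "g1", "g2"]]
W, L = {"g1", "g2", "g3"}, {"s"}

lko = [p for p in permutations(["g1", "g2", "g3", "s"]) if is_lko(p, voters)]
all_ok = all(p.index(x) < p.index(y) for p in lko for x in W for y in L)
print(len(lko), all_ok)  # 3 True
```

Three different permutations are locally optimal because of the cycle among the good pages, but all of them place the Condorcet loser at the bottom.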
16 Local Kemenization procedure
- A local Kemenization of a full list with respect to the preference lists computes a locally Kemeny optimal aggregation that is maximally consistent with the original full list.
- This approach
- (1) preserves the strengths of the initial aggregation
- (2) ranks non-spam above spam.
- (3) gives a result that disagrees with the original on any pair (i, j) only if a majority endorses this disagreement.
- (4) for every d, 1 ≤ d ≤ |µ|, where µ is the initial full list, the restriction of the output is a local Kemenization of the top d elements of µ
17 Local Kemenization procedure
- A simple inductive construction.
- Assume inductively that we have constructed p, a local Kemenization of the projection of the t's onto the elements 1, ..., l-1.
- Insert the next element x into the lowest-ranked "permissible" position in p: just below the lowest-ranked element y in p such that
- (a) no majority among the (original) t's prefers x to y and
- (b) for all successors z of y in p there is a majority that prefers x to z.
- In other words, we try to insert x at the end (bottom) of the list p; we bubble it up toward the top of the list as long as a majority of the t's insists that we do.
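The bubbling description can be sketched as an insertion procedure. The names `local_kemenize` and `prefer_count` are hypothetical, and "majority" is interpreted here as: more lists rank x above y than y above x, with lists that omit either element abstaining. That interpretation is an assumption for partial lists. The input data are the five lists and the initial ordering B A D C E F from the worked example in this deck.

```python
def prefer_count(lists, x, y):
    """Number of lists that contain both x and y and rank x above y."""
    return sum(1 for t in lists
               if x in t and y in t and t.index(x) < t.index(y))


def local_kemenize(mu, lists):
    """Insert the elements of mu top-down, bubbling each one up from the
    bottom while a majority prefers it to the element directly above."""
    p = []
    for x in mu:
        i = len(p)  # tentative position: the bottom of the current list
        while i > 0 and (prefer_count(lists, x, p[i - 1])
                         > prefer_count(lists, p[i - 1], x)):
            i -= 1  # a majority insists that x move above p[i - 1]
        p.insert(i, x)
    return p


voter_lists = [list("ABFECD"), list("BCAEFD"), list("ACFDEB"),
               list("BFDCAE"), list("CABFED")]
mu = list("BADCEF")
print("".join(local_kemenize(mu, voter_lists)))  # ABCFED
```

Each element stops at the first neighbour it cannot beat by majority, so an element is moved past another only when a majority endorses the change, which is how the procedure stays close to the initial aggregation.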
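18 Example: local Kemenization procedure
- Voter lists (most preferred first):
  A B F E C D
  B C A E F D
  A C F D E B
  B F D C A E
  C A B F E D
- Initial aggregation to be locally Kemenized: B A D C E F
- Successive states of the list as elements are inserted:
  B
  B A
  A B (A and B disagree with the initial order; A > B: 3 voters, A < B: 2 voters)
  A B D (B > D: 4 voters, B < D: 1 voter, so D stays below B)
  A B D C
  A B C D
  A B C F E D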
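The tallies and the final list of this example can be cross-checked in a few lines, reading the first five rows as voter lists, the sixth row as the initial full list, and A B C F E D as the output. The variable and function names below are illustrative. The check confirms that every pair on which the output disagrees with the initial list is endorsed by a majority, and that no adjacent pair of the output is opposed by a majority.

```python
from itertools import combinations

lists = [list("ABFECD"), list("BCAEFD"), list("ACFDEB"),
         list("BFDCAE"), list("CABFED")]
mu = list("BADCEF")      # initial aggregation
result = list("ABCFED")  # output of local Kemenization


def votes(x, y):
    """Number of voter lists that rank x above y."""
    return sum(t.index(x) < t.index(y) for t in lists)


# Pairs (x, y) with x above y in the result but below y in the initial list.
flipped = [(x, y) for x, y in combinations(result, 2)
           if mu.index(x) > mu.index(y)]
endorsed = all(votes(x, y) > votes(y, x) for x, y in flipped)

# Every adjacent pair of the result has majority support for its order.
adjacent_ok = all(votes(x, y) >= votes(y, x)
                  for x, y in zip(result, result[1:]))

print(votes("A", "B"), votes("B", "A"))  # 3 2
print(votes("B", "D"), votes("D", "B"))  # 4 1
print(flipped)
print(endorsed, adjacent_ok)             # True True
```

Five pairs change order relative to the initial list, among them (A, B) and (C, D), and each change is backed by at least three of the five voters.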
19 RA and Searching Workplace Web
- Axiom 1: Intranet documents are not spam
- Axiom 2: Queries usually have unique answers (not broad-topic based)
- Axiom 3: Intranet docs are not search engine friendly (docs are accessed through portals and database queries)
- Rank aggregation allows us to combine a number of heuristic alternatives: static and dynamic, query dependent and independent