Title: Ranking in DB
1Ranking in DB
- Laks V.S. Lakshmanan
- Dept. of CS
- UBC
2Why ranking in query answering? 1/3
- Multimedia data, fuzzy querying: e.g., find the top 2 red objects with a soft texture.
- One ranked list per predicate:
  Obj  Score      Obj  Score
  D    0.85       A    0.9
  B    0.80       D    0.8
  A    0.75       C    0.4
  E    0.65       B    0.3
  C    0.60       E    0.1
- Combine the per-list scores into an overall score.
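The combination step can be sketched as follows. This is a minimal illustration, not the slide's method: the slide does not say which list grades which predicate or which combination function is used, so the names `red` and `soft` and the choice of `min` (a common fuzzy AND) are assumptions.

```python
# Per-object scores copied from the two ranked lists on the slide.
# Assumption: the first list grades "red", the second "soft texture".
red = {"D": 0.85, "B": 0.80, "A": 0.75, "E": 0.65, "C": 0.60}
soft = {"A": 0.9, "D": 0.8, "C": 0.4, "B": 0.3, "E": 0.1}


def naive_top_k(k, combine=min):
    """Score every object with `combine`, sort, and keep the best k."""
    overall = {obj: combine(red[obj], soft[obj]) for obj in red}
    ranked = sorted(overall.items(), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


print(naive_top_k(2))  # [('D', 0.8), ('A', 0.75)]
```

This touches every object in every list, which is the cost the later algorithms try to avoid.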
3Why ranking? 2/3
- IR: find the top 5 documents relevant to computational, neuroscience and brain theory.
- IR systems maintain full-text indexes: inverted lists of docs w.r.t. each keyword.
- Same Q/A paradigm as before.
- Buying a home: several criteria (price, location, area, BRs, school district). ORDER BY query in SQL.
- Finding hotels while traveling.
4Why ranking? 3/3
- Data streams, e.g., network flow data: find the 10 users with the max. BW consumption and max. packets communicated. The score may be a complex aggregation of these two measures.
- In a social net, find the 5 items tagged as most relevant to lawn mowing and belonging to users socially close to the seeker.
- And now, find top-k recs (recommender systems).
- etc.
- Fagin et al.: pioneering papers in PODS '96, PODS '01, JCSS 2003. Has burgeoned into a field now.
- Focus on the middleware algorithm which, given a score combination function, computes top-k answers by probing different subsystems (or ranked lists).
5Computational model
- Naïve method.
- How to compute top-K efficiently?
- Access methods:
  - Sorted access (sequential access): SA.
  - Random access: RA.
- Different optimization metrics:
  - Overall running time of the algorithm.
  - cost(SA) < cost(RA): minimize RAs.
  - RA not possible? Avoid RAs.
  - Combined optimization.
- Has led to a variety of algorithms.
- Memory vs. disk model.
- For the most part, assume the score aggregation is a monotone function; use SUM in examples, as is typical in IR systems.
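The two access methods and a cost metric can be sketched as a small wrapper around one ranked list. The class name `RankedList`, the helper `access_cost`, and the unit costs 1 and 10 are hypothetical choices for illustration; only the distinction between SA and RA comes from the slide.

```python
class RankedList:
    """One subsystem's list, exposing only sorted and random access."""

    def __init__(self, scores):
        self._score_of = dict(scores)
        # Descending score order, as sorted access would deliver it.
        self._sorted = sorted(self._score_of.items(), key=lambda p: -p[1])
        self._cursor = 0
        self.sa_count = 0
        self.ra_count = 0

    def sorted_access(self):
        """Return the next (object, score) pair, or None when exhausted."""
        if self._cursor >= len(self._sorted):
            return None
        self.sa_count += 1
        entry = self._sorted[self._cursor]
        self._cursor += 1
        return entry

    def random_access(self, obj):
        """Look up one object's score directly."""
        self.ra_count += 1
        return self._score_of[obj]


def access_cost(lists, cost_sa=1.0, cost_ra=10.0):
    """Total cost = (#SA * cost_sa) + (#RA * cost_ra), summed over lists."""
    return sum(l.sa_count * cost_sa + l.ra_count * cost_ra for l in lists)


a = RankedList({"D": 0.85, "B": 0.80, "A": 0.75})
b = RankedList({"A": 0.9, "D": 0.8, "B": 0.3})
print(a.sorted_access(), b.random_access("D"), access_cost([a, b]))
```

Making RA ten times as expensive as SA is only a placeholder; the point is that an algorithm's cost depends on how many accesses of each kind it performs.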
6Fagin's Algorithm (FA)
- m lists sorted by descending scores.
- Access (SA) all lists in parallel.
- For each new object x seen, fetch its scores from the other lists by RA. Overall score t(x) = t(x1, ..., xm). Store (obj, score) in set Y.
- Remember each object seen (under SA) in all lists in set H.
- Repeat until |H| ≥ K.
- Sort Y in descending order of scores, breaking ties arbitrarily, and output the top K.
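A minimal Python sketch of these steps, assuming in-memory lists of equal length and SUM as the default t. The data are a hypothetical completion of the four-list example used in this deck: the first six rows follow the cursor values the example implies, while the placement of E and G in L1 versus L2 and everything below row 6 are guesses. Scores are stored in hundredths (95 means 0.95) so that sums are exact.

```python
# Hypothetical completion of the deck's example lists (scores in hundredths).
L1 = [("H", 95), ("B", 90), ("E", 85), ("C", 80), ("G", 75),
      ("I", 70), ("F", 60), ("A", 60), ("D", 40), ("J", 30)]
L2 = [("J", 100), ("C", 95), ("G", 85), ("H", 80), ("E", 75),
      ("B", 75), ("D", 65), ("A", 50), ("F", 40), ("I", 30)]
L3 = [("C", 95), ("J", 80), ("D", 70), ("H", 65), ("G", 60),
      ("B", 55), ("I", 50), ("E", 45), ("F", 45), ("A", 30)]
L4 = [("E", 100), ("G", 95), ("H", 90), ("B", 85), ("D", 80),
      ("C", 70), ("A", 65), ("I", 55), ("J", 55), ("F", 50)]
LISTS = [L1, L2, L3, L4]


def fagin_algorithm(lists, k, t=sum):
    m = len(lists)
    lookup = [dict(lst) for lst in lists]  # random access: object -> score
    seen_in = {}   # object -> indexes of the lists where SA has seen it
    Y = {}         # object -> overall score
    H = set()      # objects seen under SA in all m lists
    depth = 0
    while len(H) < k and depth < len(lists[0]):
        for i, lst in enumerate(lists):
            obj, _ = lst[depth]
            seen_in.setdefault(obj, set()).add(i)
            if obj not in Y:
                # Fetch the remaining scores by RA and aggregate.
                Y[obj] = t(table[obj] for table in lookup)
            if len(seen_in[obj]) == m:
                H.add(obj)
        depth += 1
    # Stable sort: ties keep the order in which objects were first seen.
    ranked = sorted(Y.items(), key=lambda pair: -pair[1])
    return ranked[:k], depth, H


top, depth, H = fagin_algorithm(LISTS, 4)
print(top, depth, sorted(H))
```

On this data the loop stops after row 6 with H = {B, C, G, H}. The top-4 is C, H, G plus one of the two objects tied at 3.05; here the tie goes to E because it was seen first.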
7Example of FA (slides 7-16)
[Figure: four lists L1 to L4, each ranking objects A to J by descending local score, with the cursors advancing one row per step.]
Local scores of each object (one per list, shown here in descending order) and overall score t = SUM:
Obj  Local scores              t
A    0.65  0.60  0.50  0.30    2.05
B    0.90  0.85  0.75  0.55    3.05
C    0.95  0.95  0.80  0.70    3.40
D    0.80  0.70  0.65  0.40    2.55
E    1.00  0.85  0.75  0.45    3.05
F    0.60  0.50  0.45  0.40    1.95
G    0.95  0.85  0.75  0.60    3.15
H    0.95  0.90  0.80  0.65    3.30
I    0.70  0.55  0.50  0.30    2.05
J    1.00  0.80  0.55  0.30    2.65
Answers seen in ≥ 1 list, i.e., Y (unsorted), and answers seen (under SA) in all 4 lists, i.e., H, after each row:
- Row 1: Y gains H 3.30, J 2.65, C 3.40, E 3.05.
- Row 2: Y gains B 3.05, G 3.15.
- Row 3: Y gains D 2.55.
- Row 4: H = {H}.
- Row 5: H = {H, G}.
- Row 6: Y gains I 2.05; H = {H, G, B, C}, |H| = 4.
17FA Example concluded
- A, F not seen in any list. Yet, we are sure they can't make it to the top-4. Why?
- Based on where the cursors are now, what's the max. possible score for A, F?
- What assumptions are being made about t()?
- FA is shown to be optimal with very high probability [Fagin, PODS 1996].
- But it can be beaten by other algorithms on specific inputs.
- What about buffer size?
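The bound behind the first two questions can be checked with a few lines of arithmetic. This sketch assumes t = SUM and that after row 6 the cursors sit on I(0.70), B(0.75), B(0.55) and C(0.70), which is what the example's running totals imply. The variable names are illustrative.

```python
# Cursor scores after row 6 (assumed positions: I, B, B, C in L1..L4).
cursor_scores = [0.70, 0.75, 0.55, 0.70]

# Lists are sorted in descending order and t is monotone, so an object not
# yet seen anywhere can score at most t(cursor scores).
best_unseen = round(sum(cursor_scores), 2)

kth_score = 3.05  # lowest overall score among the current top 4
print(best_unseen, best_unseen < kth_score)  # 2.7 True
```

Since 2.70 < 3.05, neither A nor F can displace any of the current top 4. The argument relies only on t being monotone and on each list being sorted.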
18Threshold Algorithm
- Do parallel SA on all m lists.
- For each object x seen under SA in a list, fetch its scores from the other lists by RA and compute its overall score.
  - If |Buffer| < K, add x to Buffer.
  - Else if score(x) ≤ k-th score in buffer, toss x.
  - Else replace the bottom of the buffer with (x, score(x)) and re-sort.
- Stop when threshold ≤ k-th score in buffer.
- Threshold = t(worst score seen on L1, ..., worst score seen on Lm).
- Output the top-K objects and scores (in buffer).
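The steps above can be sketched in Python as follows, assuming in-memory lists of equal length and SUM as the default t. As before, the data are a hypothetical completion of the deck's four-list example (rows 1 to 6 follow the cursor values the example implies; the E/G placement in L1 versus L2 and the rows below row 6 are guesses), with scores in hundredths so that sums are exact.

```python
# Hypothetical completion of the deck's example lists (scores in hundredths).
L1 = [("H", 95), ("B", 90), ("E", 85), ("C", 80), ("G", 75),
      ("I", 70), ("F", 60), ("A", 60), ("D", 40), ("J", 30)]
L2 = [("J", 100), ("C", 95), ("G", 85), ("H", 80), ("E", 75),
      ("B", 75), ("D", 65), ("A", 50), ("F", 40), ("I", 30)]
L3 = [("C", 95), ("J", 80), ("D", 70), ("H", 65), ("G", 60),
      ("B", 55), ("I", 50), ("E", 45), ("F", 45), ("A", 30)]
L4 = [("E", 100), ("G", 95), ("H", 90), ("B", 85), ("D", 80),
      ("C", 70), ("A", 65), ("I", 55), ("J", 55), ("F", 50)]
LISTS = [L1, L2, L3, L4]


def threshold_algorithm(lists, k, t=sum):
    lookup = [dict(lst) for lst in lists]  # random access: object -> score
    buffer = []  # at most k (object, score) pairs, best first
    for depth in range(len(lists[0])):
        row = [lst[depth] for lst in lists]  # one sorted access per list
        for obj, _ in row:
            if any(obj == kept for kept, _ in buffer):
                continue
            score = t(table[obj] for table in lookup)  # scores via RA
            if len(buffer) < k:
                buffer.append((obj, score))
            elif score <= buffer[-1][1]:
                continue                      # toss
            else:
                buffer[-1] = (obj, score)     # replace the bottom entry
            buffer.sort(key=lambda pair: -pair[1])  # stable re-sort
        threshold = t(score for _, score in row)
        if len(buffer) == k and threshold <= buffer[-1][1]:
            return buffer, depth + 1, threshold
    return buffer, len(lists[0]), None


result, depth, threshold = threshold_algorithm(LISTS, 4)
print(result, depth, threshold)
```

On this data the run stops after row 5 with threshold 290 (2.90) against a k-th score of 305 (3.05), one row earlier than FA, and the buffer never holds more than K entries.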
19TA Example (slides 19-26)
[Figure: the same four lists L1 to L4 over objects A to J as in the FA example, with a threshold bar marking the current cursor row.]
Cursor scores x1, ..., x4, threshold T = x1 + x2 + x3 + x4, and buffer changes after each row (K = 4, as in the FA example):
Row  x1    x2    x3    x4    T     Buffer changes
1    0.95  1.00  0.95  1.00  3.90  H 3.30, J 2.65, C 3.40, E 3.05 enter the buffer
2    0.90  0.95  0.80  0.95  3.60  G 3.15 enters; B 3.05 and J 2.65 tossed (X)
3    0.85  0.85  0.70  0.90  3.30  D 2.55 tossed (X)
4    0.80  0.80  0.65  0.85  3.10  no change
5    0.75  0.75  0.60  0.80  2.90  T ≤ k-th score (3.05): can stop!
Final buffer: C 3.40, H 3.30, G 3.15, E 3.05.
27TA Remarks
28TA is Instance Optimal
29TA IO Proof (contd.)
30Proof (contd.)
31Proof (contd.)
32Proof (contd.)
33Proof (concluded)
34No Random Access Algorithm
- What if cost(RA) >> cost(SA), or RA wasn't allowed?
- Do SA on all lists in parallel. At depth d:
  - Maintain the worst scores seen so far on each list.
  - x = any object seen in lists 1, ..., i, with known scores x1, ..., xi.
  - Best(x) = t(x1, ..., xi, xi+1, ..., xm), where xi+1, ..., xm are the current worst scores on the lists where x is still unseen.
  - Worst(x) = t(x1, ..., xi, 0, ..., 0).
  - TopK contains the K objects with max Worst scores at depth d. Break ties using Best. M = k-th Worst score in TopK.
  - Object y is viable if Best(y) > M.
- Stop when TopK contains ≥ K distinct objects and no object outside TopK is viable. Return TopK.
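A minimal Python sketch of this procedure, assuming t = SUM, in-memory lists of equal length, and no random access at all. The data are a hypothetical completion of the deck's four-list example: rows 1 to 6 follow the cursor values the example implies, while the E/G placement in L1 versus L2 and the rows below row 6 are guesses, so the stopping depth reported here depends on those guesses. Scores are in hundredths to keep sums exact.

```python
# Hypothetical completion of the deck's example lists (scores in hundredths).
L1 = [("H", 95), ("B", 90), ("E", 85), ("C", 80), ("G", 75),
      ("I", 70), ("F", 60), ("A", 60), ("D", 40), ("J", 30)]
L2 = [("J", 100), ("C", 95), ("G", 85), ("H", 80), ("E", 75),
      ("B", 75), ("D", 65), ("A", 50), ("F", 40), ("I", 30)]
L3 = [("C", 95), ("J", 80), ("D", 70), ("H", 65), ("G", 60),
      ("B", 55), ("I", 50), ("E", 45), ("F", 45), ("A", 30)]
L4 = [("E", 100), ("G", 95), ("H", 90), ("B", 85), ("D", 80),
      ("C", 70), ("A", 65), ("I", 55), ("J", 55), ("F", 50)]
LISTS = [L1, L2, L3, L4]


def nra(lists, k):
    m = len(lists)
    known = {}  # object -> {list index: score seen under SA}
    high = []   # current cursor (worst seen) score of each list

    def worst(obj):  # unknown scores replaced by 0
        return sum(known[obj].values())

    def best(obj):   # unknown scores replaced by the cursor scores
        return sum(known[obj].get(i, high[i]) for i in range(m))

    top = []
    for depth in range(len(lists[0])):
        high = []
        for i, lst in enumerate(lists):
            obj, score = lst[depth]
            known.setdefault(obj, {})[i] = score
            high.append(score)
        # Rank by Worst, break ties by Best (stable for remaining ties).
        ranked = sorted(known, key=lambda o: (-worst(o), -best(o)))
        top, rest = ranked[:k], ranked[k:]
        if len(top) < k:
            continue
        M = worst(top[-1])
        # sum(high) bounds the score of objects not seen in any list yet.
        if sum(high) <= M and all(best(o) <= M for o in rest):
            return [(o, worst(o), best(o)) for o in top], depth + 1
    return [(o, worst(o), best(o)) for o in top], len(lists[0])


answer, depth = nra(LISTS, 4)
print(answer, depth)
```

On this data NRA needs 8 rows, against 5 for TA, because E stays viable until its last score is read by sorted access. That extra depth is the price of avoiding RA.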
35NRA Example (slides 35-40)
[Figure: the same four lists L1 to L4 over objects A to J, with the cursors advancing one row per slide.]
(Worst, Best) of each object seen so far, after reading rows 1 to d of every list:
Obj  d=1         d=2         d=3         d=4         d=5         d=6
B    -           0.90, 3.60  0.90, 3.35  1.75, 3.20  1.75, 3.10  3.05, 3.05
C    0.95, 3.90  1.90, 3.75  1.90, 3.65  2.70, 3.55  2.70, 3.50  3.40, 3.40
D    -           -           0.70, 3.30  0.70, 3.15  1.50, 3.00  1.50, 2.95
E    1.00, 3.90  1.00, 3.65  1.85, 3.40  1.85, 3.30  2.60, 3.20  2.60, 3.15
G    -           0.95, 3.60  1.80, 3.35  1.80, 3.25  3.15, 3.15  3.15, 3.15
H    0.95, 3.90  0.95, 3.65  1.85, 3.40  3.30, 3.30  3.30, 3.30  3.30, 3.30
I    -           -           -           -           -           0.70, 2.70
J    1.00, 3.90  1.80, 3.65  1.80, 3.55  1.80, 3.45  1.80, 3.35  1.80, 3.20
41NRA Features
- What sort of t() do we need to assume, for NRA to work correctly?
- How large can the buffers get?
- How does the amount of bookkeeping compare with TA?
- NRA is instance optimal over algos not making RA (and of course, not making wild guesses).
42Combined optimization
- What if we are told the ratio of cost(RA) to cost(SA)?
- Can we find algos better than NRA and TA in this case?
- Combined algorithm CA. (See Fagin et al.'s paper for details.)
43Worrying about I/O cost
- Based on Bast et al., VLDB 2006.
- Inverted lists of (itemID, score) entries in desc. score order, as usual, but on disk.
- Within a block, entries are sorted by itemID; across blocks, still in desc. score order.
- ⇒ Inverted Block Index (IBI) Algorithm.
- What is an IBI?
44A Motivating Example (slides 44-47)
List 1        List 2        List 3
Doc17 0.8     Doc25 0.7     Doc83 0.9
Doc78 0.2     Doc38 0.5     Doc17 0.7
...           Doc14 0.5     Doc61 0.3
              Doc83 0.5     ...
              ...
              Doc17 0.2
Each entry below is doc [Worst, Best]; "unseen" is the best possible score of a doc not seen yet.
- Round 1 (SA on 1,2,3): Doc17 [0.8, 2.4]; Doc25 [0.7, 2.4]; Doc83 [0.9, 2.4]; unseen ≤ 2.4.
- Round 2 (SA on 1,2,3): Doc17 [1.5, 2.0]; Doc25 [0.7, 1.6]; Doc83 [0.9, 1.6]; unseen ≤ 1.4.
- Round 3 (SA on 2,2,3!): Doc17 [1.5, 2.0]; Doc83 [1.4, 1.6]; unseen ≤ 1.0. Note the deviation from round-robin.
- Round 4 (RA for Doc17): Doc17 = 1.7; all others < 1.7: done!
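The bounds in this example can be reproduced with a short trace. This is a sketch assuming t = SUM and the list contents as read from the slide (only the prefixes that are actually accessed are included); the names `schedule`, `bounds` and `trace` are illustrative.

```python
# List prefixes read from the slide; deeper entries are not needed here.
lists = {
    1: [("Doc17", 0.8), ("Doc78", 0.2)],
    2: [("Doc25", 0.7), ("Doc38", 0.5), ("Doc14", 0.5), ("Doc83", 0.5)],
    3: [("Doc83", 0.9), ("Doc17", 0.7), ("Doc61", 0.3)],
}
schedule = [(1, 2, 3), (1, 2, 3), (2, 2, 3)]  # lists read in rounds 1 to 3

cursor = {i: 0 for i in lists}
high = {}    # list -> score at the current cursor position
known = {}   # doc -> {list: score}
trace = []


def bounds(doc):
    """Return (Worst, Best) for a doc under SUM."""
    worst = sum(known[doc].values())
    best = worst + sum(high[i] for i in lists if i not in known[doc])
    return round(worst, 1), round(best, 1)


for accesses in schedule:
    for i in accesses:
        doc, score = lists[i][cursor[i]]
        cursor[i] += 1
        high[i] = score
        known.setdefault(doc, {})[i] = score
    snapshot = {doc: bounds(doc) for doc in known}
    snapshot["unseen"] = round(sum(high.values()), 1)
    trace.append(snapshot)

# Round 4: one random access for Doc17 in list 2 (score 0.2 on the slide).
known["Doc17"][2] = 0.2
final_doc17 = bounds("Doc17")[0]

for round_no, snapshot in enumerate(trace, start=1):
    print(round_no, snapshot)
print("Doc17 after RA:", final_doc17)
```

After round 3 only Doc83 (Best 1.6) can still exceed Doc17's Worst of 1.5, so a single RA that raises Doc17 to 1.7 settles the top-1 answer.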
48IBI Algorithm
- Same setting as NRA/CA, except use IBI.
- Maintain two lists: Top-K items (T = {d1, ..., dk}) and StillHaveAShot (SHASH) items (S = {dk+1, ..., dk+q}).
- pos_i = current cursor position on list Li.
- high_i = score in Li at the current cursor position (upper bounds the score of unseen items).
- For items d in S:
  - Which attribute scores are known: E(d).
  - Which attribute scores are unknown: the complement of E(d).
  - Worst(d) = total score from E(d).
  - Best(d) = Worst(d) + Σ { high_i : i ∉ E(d) }.
  - (Exactly as in Fagin.)
49IBI Algorithm (contd.)
- In each round, compute:
  - min-k = min{Worst(d) | d ∈ T}.
  - bestscore that any unseen doc can have = sum of all high_i's.
  - For dj ∈ S: def_j = min-k − Worst(dj); denotes the deficit below the qualification level for top-k.
- T sorted in desc. Worst(); S sorted in desc. Best(). Sorting on (score, itemID) for fast processing.
- Invariant: min-k ≥ max{Worst(d) | d ∈ S}.
- Termination: when min-k ≥ max{Best(d) | d ∈ S}.
- Can remove an object from S whenever its Best ≤ min-k ⇒ stop when S = ∅.
- Early termination AND minimal bookkeeping are BOTH important for performance.
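The per-round bookkeeping can be sketched as a single function. This is a simplified illustration rather than the algorithm of Bast et al.: it assumes T and S are given as dicts of (Worst, Best) pairs, and it leaves out the unseen-doc bound and the promotion of candidates from S into T. The function name `bookkeeping_round` is hypothetical, and the sample values are derived from the motivating example's lists with K = 1.

```python
def bookkeeping_round(top, candidates):
    """top, candidates: dicts mapping item -> (worst, best)."""
    min_k = min(worst for worst, _ in top.values())
    # Keep only candidates whose Best still exceeds min-k.
    alive = {d: wb for d, wb in candidates.items() if wb[1] > min_k}
    # Deficit: how far each survivor's Worst is below the qualification level.
    deficits = {d: round(min_k - wb[0], 2) for d, wb in alive.items()}
    done = not alive  # S is empty, so no candidate can enter the top-k
    return min_k, alive, deficits, done


# State after round 3 of the motivating example (K = 1).
T = {"Doc17": (1.5, 2.0)}
S = {"Doc83": (1.4, 1.6), "Doc25": (0.7, 1.2)}
print(bookkeeping_round(T, S))

# After the random access that completes Doc17's score.
print(bookkeeping_round({"Doc17": (1.7, 1.7)}, S))
```

In the first call Doc25 is pruned and Doc83 survives with a deficit of 0.1; in the second call S becomes empty, which is the termination condition.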
50More on IBI Framework
- Instead of scheduling SAs using RR (round-robin), use a differential approach for different lists, based on expected score reductions at future cursor positions (Knapsack).
- Do SAs, then RAs.
- Order RAs based on the estimated Prob[dj can get into the top-k answers].