Title: Ranking in DB
1. Ranking in DB
- Laks V.S. Lakshmanan
- Dept. of CS
- UBC
2. Why ranking in query answering? (1/3)
- Multimedia data, fuzzy querying: e.g., find the top 2 red objects with a soft texture.
- Each predicate yields its own ranked list of objects:

  Obj  Score      Obj  Score
  D    0.85       A    0.9
  B    0.80       D    0.8
  A    0.75       C    0.4
  E    0.65       B    0.3
  C    0.60       E    0.1

- Combine the per-list scores into an overall score.
3. Why ranking? (2/3)
- IR: find the top 5 documents relevant to computational, neuroscience and brain theory.
- IR systems maintain full-text indexes: inverted lists of docs w.r.t. each keyword.
- Same Q/A paradigm as before.
4. Why ranking? (3/3)
- Data streams, e.g., of network flow data: find the 10 users with the max. BW consumption and max. packets communicated.
- In a social network, find the 5 items tagged as most relevant to lawn mowing by a user's friends.
- etc.
- Fagin et al.: pioneering papers in PODS '96, '01, TODS 2003. Has burgeoned into a field now.
- Focus on the middleware algorithm, which, given a score combination function, computes the top-K answers by probing different subsystems (or ranked lists).
5. Computational model
- Naïve method.
- How to compute top-K efficiently?
- Access methods:
  - Sorted access (sequential access): SA.
  - Random access: RA.
- Different optimization metrics:
  - Overall running time of the algorithm.
  - cost(SA) < cost(RA): minimize RAs.
  - RA not possible? Avoid RAs.
  - Combined optimization.
- Has led to a variety of algorithms.
- Memory vs. disk model.
typical in IR systems.
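As a baseline for the "naïve method", the sketch below scores every object in every list and then keeps the K best. It reuses the two ranked lists from the multimedia example earlier in the deck; the combination function t = min is an assumption made here for illustration (the slides do not fix t), and the name `naive_top_k` is hypothetical.

```python
import heapq

# The two ranked lists from the multimedia example (object -> score).
list1 = {"D": 0.85, "B": 0.80, "A": 0.75, "E": 0.65, "C": 0.60}
list2 = {"A": 0.9, "D": 0.8, "C": 0.4, "B": 0.3, "E": 0.1}

def naive_top_k(lists, k, t=min):
    """Score every object with t over all lists, then keep the k best."""
    objects = set().union(*lists)  # every object appearing in any list
    overall = {x: t(lst[x] for lst in lists) for x in objects}
    # Rank by overall score; ties are broken by object id for determinism.
    return heapq.nlargest(k, overall.items(), key=lambda kv: (kv[1], kv[0]))

top2 = naive_top_k([list1, list2], k=2)
print(top2)
```

Every entry of every list is read, which is the cost the algorithms on the following slides try to avoid.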
6. Fagin's Algorithm (FA)
- m lists sorted by descending scores.
- Access (SA) all lists in parallel.
- For each new object x seen, fetch its scores from the other lists by RA. Overall score t(x) = t(x1, ..., xm). Store (obj, score) in set Y.
- Remember each object seen (under SA) in all lists in set H.
- Repeat until |H| ≥ K.
- For each seen object x, do RA on lists as needed to find missing scores. Compute the score of x as t(x) = t(x1, ..., xm).
- Sort Y in descending order of scores, breaking ties arbitrarily, and output the top K.
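The steps above can be sketched as follows. This is a minimal illustration, not a reference implementation: it assumes every object appears in every list, that all lists have the same length, and that t = min; random access is simulated with a dictionary per list, and the RAs are batched after the sorted-access phase instead of being issued per new object. The function name and the sample data (taken from the two-list multimedia example) are choices made for this sketch.

```python
def fagin_algorithm(lists, k, t=min):
    """FA sketch. `lists`: m lists of (obj, score), each sorted by descending score."""
    m = len(lists)
    index = [dict(lst) for lst in lists]   # stand-in for random access (RA)
    seen_in = {}                           # obj -> indexes of lists where SA has seen it
    depth = 0
    # Sorted access in parallel until at least k objects were seen in all m lists.
    while sum(len(s) == m for s in seen_in.values()) < k and depth < len(lists[0]):
        for i, lst in enumerate(lists):
            seen_in.setdefault(lst[depth][0], set()).add(i)
        depth += 1
    # Random access: complete the scores of every object seen so far (set Y).
    y = {x: t(index[i][x] for i in range(m)) for x in seen_in}
    # Sort by descending overall score (ties broken by object id) and keep k.
    ranked = sorted(y.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:k], depth

l1 = [("D", 0.85), ("B", 0.80), ("A", 0.75), ("E", 0.65), ("C", 0.60)]
l2 = [("A", 0.9), ("D", 0.8), ("C", 0.4), ("B", 0.3), ("E", 0.1)]
answers, depth = fagin_algorithm([l1, l2], k=2)
print(answers, depth)
```

On this input the sorted-access phase stops at depth 3, once D and A have each been seen in both lists; objects B and C were also seen and scored, while E was never touched.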
7. Example of FA
[Figure: four lists L1, L2, L3, L4 ranking the objects A B C D E F G H I J by descending score. Legend: answers seen in ≥ 1 list, i.e., Y (unsorted); answers seen (under SA) in all 4 lists, i.e., H.]
List entries shown:
H(0.95) C(0.80) C(0.95) E(1.00) J(1.00) B(0.90) C(0.95) J(0.80)
G(0.95) D(0.70) E(0.85) G(0.85) H(0.90) H(0.80) H(0.65) B(0.85)
E(0.75) G(0.75) G(0.60) D(0.80) B(0.55) C(0.70) I(0.70) B(0.75)
D(0.65) F(0.60) I(0.50) A(0.65) E(0.45) I(0.55) A(0.60) A(0.50)
D(0.40) J(0.55) F(0.40) F(0.45) A(0.30) J(0.30) F(0.50) I(0.30)
9. Example of FA (contd.)
[Figure: same four lists L1, L2, L3, L4. Overall scores computed so far: 3.30.]
10. Example of FA (contd.)
[Figure: same four lists L1, L2, L3, L4. Overall scores computed so far: 3.30, 2.65.]
11. Example of FA (contd.)
[Figure: same four lists L1, L2, L3, L4. Overall scores computed so far: 3.40, 3.05, 3.30, 2.65.]
12. Example of FA (contd.)
[Figure: same four lists L1, L2, L3, L4. Overall scores computed so far: 3.05, 3.40, 3.05, 3.15, 3.30, 2.65.]
13. Example of FA (contd.)
[Figure: same four lists L1, L2, L3, L4. Overall scores computed so far: 3.05, 3.40, 2.55, 3.05, 3.15, 3.30, 2.65.]
14. Example of FA (contd.)
[Figure: same four lists L1, L2, L3, L4. Overall scores computed so far: 3.05, 3.40, 2.55, 3.05, 3.15, 3.30, 2.65.]
Set H (answers seen under SA in all 4 lists) so far: {H}.
15. Example of FA (contd.)
[Figure: same four lists L1, L2, L3, L4. Overall scores computed so far: 3.05, 3.40, 2.55, 3.05, 3.15, 3.30, 2.65.]
Set H (answers seen under SA in all 4 lists) so far: {H, G}.
16. Example of FA (contd.)
[Figure: same four lists L1, L2, L3, L4. Overall scores computed so far: 3.05, 3.40, 2.55, 3.05, 3.15, 3.30, 2.05, 2.65.]
Set H (answers seen under SA in all 4 lists): {H, G, B, C}, so |H| = 4.
17. FA Example concluded
- A, F not seen in any list. Yet, we are sure they can't make it to the top-4. Why?
- Based on where the cursors are now, what's the max. possible score for A, F?
- What assumptions are being made about t()?
- FA is shown to be optimal with very high probability [Fagin, PODS 1996].
- But it can be beaten by other algorithms on specific inputs.
- What about buffer size?
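One way to work through the first two questions, under assumptions that are not stated on the slide: take t to be the sum of the four scores (consistent with the overall scores shown in the example), K = 4, and the cursors at the sixth row, where the entries read under SA carry the scores 0.70, 0.75, 0.55 and 0.70. If t is monotone, an object not yet seen in any list can score at most t of those four values:

```latex
t(A),\; t(F) \;\le\; 0.70 + 0.75 + 0.55 + 0.70 \;=\; 2.70 \;<\; 3.05
```

Here 3.05 is the fourth-highest overall score among those already computed (3.40, 3.30, 3.15, 3.05), so neither A nor F can enter the top 4. The argument relies on monotonicity of t and on each list being sorted by descending score.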
18. Threshold Algorithm
- Do parallel SA on all m lists.
- For each new object x, fetch its scores from the other lists and compute its overall score.
  - If |Buffer| < K, add x to Buffer.
  - Else if score(x) < k-th score in buffer, toss x.
  - Else replace the bottom of the buffer with (x, score(x)).
- Stop when threshold ≤ k-th score in buffer.
- Threshold = t(worst score seen on L1, ..., worst score seen on Lm).
- Output the top-K objects and scores (in buffer).
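A compact sketch of these steps is given below. It assumes every object appears in every list, equal-length lists, and t = min; the buffer is kept as a min-heap so that its root is always the k-th (lowest buffered) score. Random access is again simulated with dictionaries, and the function name and sample data are choices made for this illustration.

```python
import heapq

def threshold_algorithm(lists, k, t=min):
    """TA sketch. `lists`: m lists of (obj, score), each sorted by descending score."""
    m = len(lists)
    index = [dict(lst) for lst in lists]   # stand-in for random access (RA)
    buffer = []                            # min-heap of (score, obj), at most k entries
    seen = set()
    depth = 0
    for depth in range(1, len(lists[0]) + 1):
        last = []                          # worst (last) score seen on each list
        for lst in lists:
            obj, score = lst[depth - 1]
            last.append(score)
            if obj in seen:
                continue
            seen.add(obj)
            s = t(index[i][obj] for i in range(m))   # overall score via RA
            if len(buffer) < k:
                heapq.heappush(buffer, (s, obj))
            elif s >= buffer[0][0]:                  # not below the k-th score
                heapq.heapreplace(buffer, (s, obj))
        threshold = t(last)
        if len(buffer) == k and threshold <= buffer[0][0]:
            break                                    # no unseen object can do better
    top = sorted(buffer, key=lambda e: (-e[0], e[1]))
    return [(obj, s) for s, obj in top], depth

l1 = [("D", 0.85), ("B", 0.80), ("A", 0.75), ("E", 0.65), ("C", 0.60)]
l2 = [("A", 0.9), ("D", 0.8), ("C", 0.4), ("B", 0.3), ("E", 0.1)]
answers, depth = threshold_algorithm([l1, l2], k=2)
print(answers, depth)
```

Unlike the FA sketch, only K entries are ever buffered; tossed objects are remembered here only by id so they are not scored twice.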
19. TA Example
[Figure: the same four lists L1, L2, L3, L4 over objects A B C D E F G H I J as in the FA example.]
21. TA Example (contd.)
[Figure: same four lists. Threshold bar at (x1, x2, x3, x4) = (0.95, 1.00, 0.95, 1.00). Overall scores computed so far: 3.30.]
22. TA Example (contd.)
[Figure: same four lists. Threshold bar at (x1, x2, x3, x4) = (0.95, 1.00, 0.95, 1.00), T = 3.90. Overall scores computed so far: 3.40, 3.05, 3.30, 2.65.]
23. TA Example (contd.)
[Figure: same four lists. Threshold bar at (x1, x2, x3, x4) = (0.90, 0.95, 0.80, 0.95), T = 3.60. Overall scores computed so far: 3.40, 3.05, 3.15, 3.30; scores marked X: 3.05, 2.65.]
24. TA Example (contd.)
[Figure: same four lists. Threshold bar at (x1, x2, x3, x4) = (0.85, 0.85, 0.70, 0.90), T = 3.30. Overall scores computed so far: 3.40, 3.05, 3.15, 3.30; scores marked X: 3.05, 2.55, 2.65.]
25. TA Example (contd.)
[Figure: same four lists. Threshold bar at (x1, x2, x3, x4) = (0.80, 0.80, 0.65, 0.85), T = 3.10. Overall scores computed so far: 3.40, 3.05, 3.15, 3.30; scores marked X: 3.05, 2.55, 2.65.]
26. TA Example (contd.)
[Figure: same four lists. Threshold bar at (x1, x2, x3, x4) = (0.75, 0.75, 0.60, 0.80), T = 2.90 ⇒ can stop! Overall scores computed so far: 3.40, 3.05, 3.15, 3.30; scores marked X: 3.05, 2.55, 2.65.]
27. TA Remarks
- What properties do we require of t() for TA to be correct?
- How large does the buffer ever get with TA? What happened with FA?
- Performance guarantee of TA (instance optimality):
  - $\mathcal{D}$ = class of DBs; $\mathcal{A}$ = class of algorithms; $A \in \mathcal{A}$. $A$ is instance optimal provided $\forall B \in \mathcal{A}, \forall D \in \mathcal{D}$: $\mathrm{cost}(A, D) \le c \cdot \mathrm{cost}(B, D) + c'$, for some fixed constants $c, c'$.
  - $c$ = optimality ratio.
- TA is instance optimal over algos not making wild guesses.
28. No Random Access Algorithm (NRA)
- What if cost(RA) ≫ cost(SA), or RA wasn't allowed?
- Do SA on all lists in parallel. At depth d:
  - Maintain worst (last seen) scores $\underline{x}_1, \ldots, \underline{x}_m$.
  - $x$ = any object seen in lists $1, \ldots, i$.
  - $\mathrm{Best}(x) = t(x_1, \ldots, x_i, \underline{x}_{i+1}, \ldots, \underline{x}_m)$.
  - $\mathrm{Worst}(x) = t(x_1, \ldots, x_i, 0, \ldots, 0)$.
  - TopK contains the K objects with max Worst scores at depth d. Break ties using Best. M = k-th Worst score in TopK.
  - Object y is viable if Best(y) > M.
- Stop when TopK contains ≥ K distinct objects and no object outside TopK is viable. Return TopK.
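The bookkeeping above can be sketched as follows, using sorted access only. The sketch assumes non-negative scores, equal-length lists in which every object eventually appears, and t = min (any monotone t would do); unseen objects are covered by checking t of the last-seen scores. The function name and the two sample lists are choices made for this illustration.

```python
def nra(lists, k, t=min):
    """NRA sketch. `lists`: m lists of (obj, score), each sorted by descending score."""
    m = len(lists)
    known = {}                       # obj -> {list index: score seen under SA}
    depth = 0
    for depth in range(1, len(lists[0]) + 1):
        last = []                    # worst (last seen) score per list at this depth
        for i, lst in enumerate(lists):
            obj, score = lst[depth - 1]
            known.setdefault(obj, {})[i] = score
            last.append(score)

        def worst(x):                # unknown scores replaced by 0
            return t(known[x].get(i, 0.0) for i in range(m))

        def best(x):                 # unknown scores replaced by last seen scores
            return t(known[x].get(i, last[i]) for i in range(m))

        # Rank by Worst, break ties by Best (then by id for determinism).
        ranked = sorted(known, key=lambda x: (-worst(x), -best(x), x))
        top_k, rest = ranked[:k], ranked[k:]
        if len(top_k) == k:
            m_k = worst(top_k[-1])   # M = k-th Worst score in TopK
            viable = t(last) > m_k or any(best(y) > m_k for y in rest)
            if not viable:
                break
    return [(x, worst(x)) for x in top_k], depth

l1 = [("D", 0.85), ("B", 0.80), ("A", 0.75), ("E", 0.65), ("C", 0.60)]
l2 = [("A", 0.9), ("D", 0.8), ("C", 0.4), ("B", 0.3), ("E", 0.1)]
answers, depth = nra([l1, l2], k=2)
print(answers, depth)
```

In general the returned Worst values are only lower bounds on the true scores; in this small run both answers happen to be fully seen by the time the algorithm stops.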
29. NRA Example
[Figure: the same four lists L1, L2, L3, L4 over objects A B C D E F G H I J. (Worst, Best) pairs shown: (0.95, 3.90), (1.00, 3.90), (0.95, 3.90), (1.00, 3.90).]
30. NRA Example (contd.)
[Figure: same four lists. (Worst, Best) pairs shown: (0.90, 3.60), (1.90, 3.75), (1.00, 3.65), (0.95, 3.60), (0.95, 3.65), (1.80, 3.65).]
31. NRA Features
- What sort of t() do we need to assume, for NRA to work correctly?
- How large can the buffers get?
- How does the amount of bookkeeping compare with TA?
- NRA is instance optimal over algos not making RA.
32. Combined optimization
- What if we are told that cost(RA) is a known multiple of cost(SA)?
- Can we find algos better than NRA and TA in this case?
- Combined Algorithm (CA). (See Fagin et al.'s paper for details.)
33. Worrying about I/O cost
- Based on Bast et al., VLDB 2006.
- Inverted lists of (itemID, score) entries in desc. score order, as usual, but on disk.
- Entries within each block are sorted by itemID; across blocks, still in desc. score order.
- ⇒ Inverted Block Index (IBI) Algorithm.
- What is an IBI?
34. A Motivating Example

  List 1        List 2        List 3
  Doc17 0.8     Doc25 0.7     Doc83 0.9
  Doc78 0.2     Doc38 0.5     Doc17 0.7
  ...           Doc14 0.5     Doc61 0.3
                Doc83 0.5
                ...
                Doc17 0.2

- Round 1 (SA on 1,2,3): Doc17 [0.8, 2.4]; Doc25 [0.7, 2.4]; Doc83 [0.9, 2.4]; unseen ≤ 2.4.
35. A Motivating Example (contd.)
[Same three lists as on the previous slide.]
- Round 1 (SA on 1,2,3): Doc17 [0.8, 2.4]; Doc25 [0.7, 2.4]; Doc83 [0.9, 2.4]; unseen ≤ 2.4.
- Round 2 (SA on 1,2,3): Doc17 [1.5, 2.0]; Doc25 [0.7, 1.6]; Doc83 [0.9, 1.6]; unseen ≤ 1.4.
36. A Motivating Example (contd.)
[Same three lists as on the previous slides.]
- Round 1 (SA on 1,2,3): Doc17 [0.8, 2.4]; Doc25 [0.7, 2.4]; Doc83 [0.9, 2.4]; unseen ≤ 2.4.
- Round 2 (SA on 1,2,3): Doc17 [1.5, 2.0]; Doc25 [0.7, 1.6]; Doc83 [0.9, 1.6]; unseen ≤ 1.4.
- Round 3 (SA on 2,2,3!): Doc17 [1.5, 2.0]; Doc83 [1.4, 1.6]; unseen ≤ 1.0.
37. A Motivating Example (contd.)
[Same three lists as on the previous slides.]
- Round 1 (SA on 1,2,3): Doc17 [0.8, 2.4]; Doc25 [0.7, 2.4]; Doc83 [0.9, 2.4]; unseen ≤ 2.4.
- Round 2 (SA on 1,2,3): Doc17 [1.5, 2.0]; Doc25 [0.7, 1.6]; Doc83 [0.9, 1.6]; unseen ≤ 1.4.
- Round 3 (SA on 2,2,3!): Doc17 [1.5, 2.0]; Doc83 [1.4, 1.6]; unseen ≤ 1.0. Note the deviation from round-robin.
- Round 4 (RA for Doc17): Doc17 = 1.7; all others < 1.7 ⇒ done!
38. IBI Algorithm
- Same setting as NRA/CA, except use IBI.
- Maintain two lists: Top-K items ($T = \{d_1, \ldots, d_k\}$) and StillHaveAShot (SHASH) items ($S = \{d_{k+1}, \ldots, d_{k+q}\}$).
- Pos_i = current cursor position on list L_i.
- high_i = score in L_i at the current cursor position (upper bounds the score of unseen items).
- For items d in S:
  - Which attr scores are known: $E(d)$.
  - Which attr scores are unknown: $\bar{E}(d)$.
  - Worst(d) = total score from $E(d)$.
  - $\mathrm{Best}(d) = \mathrm{Worst}(d) + \sum_{i \in \bar{E}(d)} \mathit{high}_i$.
  - (Exactly as Fagin.)
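To make the Worst/Best bookkeeping concrete, the sketch below recomputes the Round 2 bounds of the motivating example. It assumes scores are aggregated by summation (as the example's numbers suggest); the helper name `bounds` and the 0-based list indexes are choices made for this illustration.

```python
def bounds(known, high):
    """known: {list index: score} for one item; high: current high_i per list.
    Returns (Worst, Best) under sum aggregation."""
    worst = sum(known.values())
    # Add the cursor score high_i for every list where the item is still unseen.
    best = worst + sum(h for i, h in enumerate(high) if i not in known)
    return worst, best

# Round 2: the cursors of Lists 1, 2, 3 sit at scores 0.2, 0.5, 0.7.
high = [0.2, 0.5, 0.7]
doc17 = bounds({0: 0.8, 2: 0.7}, high)   # seen in List 1 and List 3
doc25 = bounds({1: 0.7}, high)           # seen in List 2 only
doc83 = bounds({2: 0.9}, high)           # seen in List 3 only
unseen = bounds({}, high)                # an item not seen anywhere
print(doc17, doc25, doc83, unseen)
```

Up to floating-point rounding, the results match the slide: Doc17 [1.5, 2.0], Doc25 [0.7, 1.6], Doc83 [0.9, 1.6], and 1.4 as the best score any unseen document can still reach.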
39. IBI Algorithm (contd.)
- In each round, compute:
  - min-k = $\min\{\mathrm{Worst}(d) : d \in T\}$.
  - bestscore that any unseen doc can have = sum of all high_i's.
  - For $d_j \in S$: def_j = min-k − Worst(d_j), which denotes the deficit below the qualification level for top-k.
- T sorted in desc. Worst(); S sorted in desc. Best(). Sorting is on (score, ItemID) for fast processing.
- Invariant: min-k ≥ $\max\{\mathrm{Worst}(d) : d \in S\}$.
- Termination: when min-k ≥ $\max\{\mathrm{Best}(d) : d \in S\}$.
- Can remove an obj from S whenever its Best < min-k ⇒ stop when S = ∅.
- Early termination AND minimal bookkeeping are BOTH important for performance.
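The end-of-round test can be sketched on the motivating example (a top-1 query). The function name `end_of_round` and its argument layout are hypothetical; the Best value 1.2 used for Doc25 in Round 3 is derived here from the listed scores (0.7 + 0.2 + 0.3) and is not shown on the slide.

```python
def end_of_round(top, candidates, unseen_best):
    """top: {item: Worst} for the current top-k (T);
    candidates: {item: Best} for the items in S;
    unseen_best: sum of all high_i.
    Returns (min_k, surviving candidates, done)."""
    min_k = min(top.values())
    # Drop every candidate whose Best is below min-k.
    survivors = {d: b for d, b in candidates.items() if b >= min_k}
    # Stop once S is empty and no unseen item can reach min-k either.
    done = not survivors and unseen_best < min_k
    return min_k, survivors, done

# After Round 3: Doc17 leads with Worst 1.5; the cursors sum to 1.0.
min_k3, s3, done3 = end_of_round({"Doc17": 1.5}, {"Doc83": 1.6, "Doc25": 1.2}, 1.0)
# After Round 4 (RA for Doc17 resolves its score to 1.7).
min_k4, s4, done4 = end_of_round({"Doc17": 1.7}, s3, 1.0)
print(min_k3, s3, done3)
print(min_k4, s4, done4)
```

After Round 3 only Doc83 remains in S, so processing continues; after the random access for Doc17 the set S becomes empty and the query terminates.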
40. More on IBI Framework
- Instead of scheduling SAs using round-robin (RR), use a differential approach for the different lists, based on expected score reductions at future cursor positions (Knapsack).
- Do SA + RA.
- Order RAs based on estimated Prob[d_j can get into the top-k answers].