Best-Effort Top-k Query Processing Under Budgetary Constraints

1
  • Best-Effort Top-k Query Processing Under
    Budgetary Constraints

Michal Shmueli-Scheuer (IBM Haifa Research Lab and UCI)
Yosi Mass, Haggai Roitman
Chen Li
Ralf Schenkel, Gerhard Weikum
2
Motivating Example
  • Mediation Systems
  • Achieve high query throughput.
  • Online Analytics (e.g. logs)
  • Achieve high query throughput.

Figure: top-k queries go into the engine, which returns top-k results.
3
Traditional top-k query
  • Pre-computed lists over multiple attributes.
  • Combine scores by some monotonic aggregation
    function.
  • Two access modes
  • sorted access (cost Cs)
  • random access (cost Cr)
  • Objective: compute the k objects with the highest scores.

Example: m lists over n objects, each sorted by descending score.

R1        R2        ..   Rm
a 0.9     d 0.87         c 0.9
b 0.6     a 0.85         b 0.6
c 0.5     f 0.5          g 0.5
..        ..             ..
d 0.4     c 0.2          a 0.4
4
NRA algorithm (Fagin et al.)
Top-2, f = SUM

R1        R2
a 0.9     d 0.87
b 0.6     a 0.85
c 0.5     f 0.5
..        ..
d 0.4     c 0.2

Entries are shown as object [worst score, best score]. Labels: highi, mink.
top-k: a [0.9, 1.77], d [0.87, 1.77]
candidates: (none yet)
Stop when mink > best score of candidates.
5
NRA algorithm (Fagin et al.)
Top-2

R1        R2
a 0.9     d 0.87
b 0.6     a 0.85
c 0.5     f 0.25
..        ..
d 0.4     c 0.2

Entries are shown as object [worst score, best score]. Labels: highi, mink.
top-k: a [1.75, 1.75], d [0.87, 1.47]
candidates: b [0.6, 1.45]
Stop when mink > best score of candidates.
6
NRA algorithm (Fagin et al.)
Top-2

R1        R2
a 0.9     d 0.87
b 0.6     a 0.85
c 0.5     f 0.25
..        ..
d 0.4     c 0.2

Entries are shown as object [worst score, best score]. Labels: highi, mink.
top-k: a [1.75, 1.75], d [0.87, 1.37]
candidates: b [0.6, 0.85], c [0.5, 0.75], f [0.25, 0.75]
Stop when mink > best score of candidates. Here mink = 0.87 > 0.85, so the scan stops.
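The NRA walkthrough on the preceding slides can be replayed with a short sketch. This is a minimal illustration, not the authors' implementation: the name `nra_top_k`, the round-by-round scan order, and the extra check against objects not seen yet are assumptions. Only the visible prefixes of R1 and R2 are used (with f = 0.25 in R2, as on the last two NRA slides); the elided entries are never read.

```python
from typing import Dict, List, Tuple

ScoredList = List[Tuple[str, float]]


def nra_top_k(lists: List[ScoredList], k: int) -> Tuple[List[str], int]:
    """Sorted-access-only top-k with f = SUM; returns (top-k ids, scan depth)."""
    m = len(lists)
    seen: Dict[str, Dict[int, float]] = {}  # object -> {list index: score}
    max_depth = max(len(lst) for lst in lists)
    top_k: List[str] = []
    for depth in range(1, max_depth + 1):
        # One round of sorted accesses: read the next entry of every list.
        for i, lst in enumerate(lists):
            if depth <= len(lst):
                obj, score = lst[depth - 1]
                seen.setdefault(obj, {})[i] = score
        # high_i: last score read from list i (0.0 once a list is exhausted).
        high = [lst[depth - 1][1] if depth <= len(lst) else 0.0 for lst in lists]
        worst = {o: sum(s.values()) for o, s in seen.items()}
        best = {o: worst[o] + sum(high[i] for i in range(m) if i not in s)
                for o, s in seen.items()}
        ranked = sorted(seen, key=lambda o: worst[o], reverse=True)
        top_k, candidates = ranked[:k], ranked[k:]
        if len(top_k) < k:
            continue
        min_k = worst[top_k[-1]]
        # Best score any candidate, or any object not seen yet, can still reach.
        threshold = max([best[o] for o in candidates] + [sum(high)])
        if min_k > threshold:
            return top_k, depth
    return top_k, max_depth


r1 = [("a", 0.9), ("b", 0.6), ("c", 0.5)]
r2 = [("d", 0.87), ("a", 0.85), ("f", 0.25)]
print(nra_top_k([r1, r2], k=2))  # (['a', 'd'], 3)
```

The run stops after three rounds because mink = 0.87 exceeds the best score still reachable by b, c, f, or any unseen object.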
7
Top-k with Budget Constraints
Top-2, f = SUM, Cs = 1, Cr = 3

R1        R2
s 0.95    a 1.0
u 0.93    b 0.9
t 0.92    c 0.85
d 0.9     d 0.8
x 0.5     e 0.7
y 0.4     t 0.6
z 0.2     f 0.4
          ..

Top-2: d 1.7, t 1.52
NRA: 12 Cs = 12, precision 0.5
TA: 7 Cs + 7 Cr = 28, precision 0
Budget = 10?
Given budget B, maximize result quality.
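The cost figures on this slide follow from a linear cost model. A minimal sketch, assuming total cost = (sorted accesses) x Cs + (random accesses) x Cr; the function name `access_cost` is illustrative.

```python
def access_cost(sorted_accesses: int, random_accesses: int,
                cs: float = 1.0, cr: float = 3.0) -> float:
    """Total cost of a run under a linear cost model (Cs = 1, Cr = 3 on the slide)."""
    return sorted_accesses * cs + random_accesses * cr


budget = 10
nra_cost = access_cost(12, 0)  # NRA: 12 sorted accesses
ta_cost = access_cost(7, 7)    # TA: 7 sorted + 7 random accesses
print(nra_cost, ta_cost)                      # 12.0 28.0
print(nra_cost <= budget, ta_cost <= budget)  # False False
```

Neither run fits into a budget of 10, which is the situation the budget-aware approach targets.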
8
Contributions
  • Sorted Accesses
  • Efficient Plan
  • Solution with Adaptive α
  • Sorted and Random Accesses
  • Efficient Plan
  • Solution with Adaptive α
  • Experiments
9
Results Under Limited Budget
Figure labels: results for limited budget; k results for unlimited budget.
10
Efficient Plan- Sorted Accesses
  • Assume that we know the k results for unlimited
    budget (REXACT).
  • Plan: (L1, 4), (L2, 2)
11
Efficient Plan- Sorted Accesses
  • Goal: find a plan t within the budget that has
    optimal precision.

Plans for B = 5
The result is denoted ROPT.
12
Sorted Accesses
  • Observations

Figure: lists L1, L2, L3 with entries (O1, SL1), (O1, SL2), (O2, SL1), (O2, SL2), (O2, SL3).
Prefer high scores
13
Observations contd.
title = war, description = weapon
Prefer large score reductions
14
Score Utilities
Score gain
Score reduction
y = 3
15
Optimization Problem
  • Bi-objective optimization problem
  • util(Li, x) = α · gain + (1 - α) · reduction
  • Heuristics
  • Fair Heuristic
  • Rank Heuristic

Where m is the number of lists
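The weighted utility can be sketched as follows. The gain and reduction values are hypothetical inputs (the transcript does not preserve how they are computed), and `pick_list` is an illustrative helper, not part of the presentation.

```python
from typing import Dict, Tuple


def util(gain: float, reduction: float, alpha: float) -> float:
    """util = alpha * gain + (1 - alpha) * reduction, with alpha in [0, 1]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * gain + (1.0 - alpha) * reduction


def pick_list(estimates: Dict[str, Tuple[float, float]], alpha: float) -> str:
    """Return the list whose (gain, reduction) estimate has the highest utility."""
    return max(estimates, key=lambda name: util(*estimates[name], alpha))


# Hypothetical (gain, reduction) estimates for two lists.
estimates = {"L1": (0.9, 0.1), "L2": (0.3, 0.6)}
print(pick_list(estimates, alpha=0.8))  # L1
print(pick_list(estimates, alpha=0.2))  # L2
```

A large α favors the list with the higher score gain; a small α favors the list with the larger score reduction.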
16
Adaptive α
Figure: weights α (gain) and (1 - α) (reduction) over time.
17
Adaptive α
Entries are shown as object [ws, bs]. Labels: hight1, hight2.
top-k: o1 [ws, bs], o2 [ws, bs], o3 [0.8, bs]
candidates: o4 [0.6, bs], o6 [ws, bs]
d(o4) = 0.8 - 0.6 = 0.2
Theobald et al. VLDB04
18
Adaptive α
19
Efficient Plan- Random Accesses
  • Observations
  • Random accesses always occur after the sorted
    accesses have finished.

schedule 1: SA, RA, SA.
schedule 2: SA, SA, RA.
precision(schedule 1) ≤ precision(schedule 2)
20
Observations- contd.
  • Random accesses are only useful for objects in
    REXACT.

Figure: top-k and candidates queues with [ws, bs] entries for o1, o2, o4, o5, and random accesses (o1, SL2), (o2, SL2), (o5, SL2), where o5 is not in REXACT. One case is labeled "Precision reduced", the other "Precision remains the same".
21
Random Accesses
  • When to switch from SA to RA?

Figure: α and (1 - α) over time.
22
Random Accesses
  • Switch from Sorted to Random
  • R = (1 - α) S
  • S: total cost of sorted accesses.
  • R: total cost of random accesses.
  • Which items to access?
  • Maximize expected score.
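One way to read the switching rule: if the whole budget B is spent, so that S + R = B, then R = (1 - α) S gives S = B / (2 - α). The sketch below works this out. The assumption S + R = B, the rounding down to whole accesses, and the function name `split_budget` are illustrative and not taken from the slides.

```python
from typing import Tuple


def split_budget(budget: float, alpha: float, cs: float, cr: float) -> Tuple[int, int]:
    """Split a budget into whole sorted and random accesses so that R = (1 - alpha) * S."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    sorted_cost = budget / (2.0 - alpha)       # S, from S + (1 - alpha) * S = B
    random_cost = (1.0 - alpha) * sorted_cost  # R
    return int(sorted_cost // cs), int(random_cost // cr)


# alpha = 1 leaves no budget for random accesses; alpha = 0 splits the cost evenly.
print(split_budget(100, alpha=1.0, cs=1, cr=10))  # (100, 0)
print(split_budget(100, alpha=0.0, cs=1, cr=10))  # (50, 5)
```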
23
Experimental Data
  • TREC Terabyte
  • 25M webpages
  • 50 queries with average length of 3 words.
  • IMDB
  • 375,000 movies
  • 20 queries, each with 4 attributes: Title,
    Genre, Actors, Description
  • Synthetic data
  • Zipf; lists: 2, 6; objects: 10000, 1000000
  • Aggregate function: Sum
24
Evaluation Methods
  • Percentage of optimal precision (result sets
    involved: Ralg, Ropt, Rexact)
  • SME
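A minimal sketch of the first metric, assuming that precision is the fraction of Rexact recovered by a result set and that "percentage of optimal precision" means precision(Ralg) divided by precision(Ropt). The exact formula is not preserved in the transcript, and the result sets below are made up for illustration.

```python
from typing import Set


def precision(result: Set[str], exact: Set[str]) -> float:
    """Fraction of the exact top-k (Rexact) that the result set recovered."""
    return len(result & exact) / len(exact)


def pct_of_optimal(r_alg: Set[str], r_opt: Set[str], r_exact: Set[str]) -> float:
    """precision(Ralg) as a percentage of precision(Ropt)."""
    optimal = precision(r_opt, r_exact)
    if optimal == 0.0:
        raise ValueError("the optimal plan found no exact result")
    return 100.0 * precision(r_alg, r_exact) / optimal


# Hypothetical result sets for k = 4.
r_exact = {"a", "b", "c", "d"}
r_opt = {"a", "b", "c", "x"}  # best any plan achieves within the budget
r_alg = {"a", "b", "y", "z"}  # what a heuristic returned
print(precision(r_opt, r_exact))                        # 0.75
print(round(pct_of_optimal(r_alg, r_opt, r_exact), 1))  # 66.7
```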
25
Results- Sorted Accesses
TREC, k = 100
  • Less budget, more improvement
26
Varied k
IMDB, B = 400
  • Lower k, more improvement.
27
Number of Lists
Zipf, k = 100, B = 4000
  • More lists, more improvement.
28
Results- Random Accesses
TREC, k = 100, Cr = 10
TREC, k = 100, Cr = 100
29
Related Works
  • Minimize budget for optimal results
  • The algorithm computes the exact results with
    minimum cost (Bast et al. VLDB06, Bruno et al.
    ICDE02, Chang et al. SIGMOD02).
  • Dual problem.
  • Anytime top-k
  • The algorithm collects statistics during
    processing, which can be used to provide
    probabilistic guarantees at any time during
    processing (Arai et al. VLDB07).
  • Does not do any optimization.
  • Approximate top-k
  • Approximate results with probabilistic
    guarantees (Theobald et al. VLDB04, Fagin et al.
    2001).
30
Conclusions
  • First attempt to deal with budget constraints.
  • For SA only, average precision is around 70%.
  • Tradeoff between RAs and SAs: when RAs are
    relatively cheap, schedules that use RAs improve.
31
Thank You !
33
Top-k query
  • Given a set of n objects and m scoring lists
    sorted in decreasing order, find the top-k
    objects according to a scoring function f.
  • Top-k: a set T of k objects such that
    f(rj1,...,rjm) ≤ f(ri1,...,rim) for every object Xi
    in T and every object Xj not in T.
  • Assumption: the scoring function f is monotone.
  • f(r1,...,rm) ≤ f(r'1,...,r'm) if ri ≤ r'i for all i.
  • Two access modes
  • sorted access: Cs
  • random access: Cr
  • Objective: compute the top-k with the minimum cost.
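
Read literally, the definition can be evaluated by brute force when every list is fully available. A minimal sketch, assuming f = SUM, that a missing score counts as 0, and hypothetical toy lists; `naive_top_k` is an illustrative name. It reads every entry of every list and ignores access costs entirely.

```python
import heapq
from collections import defaultdict
from typing import Dict, List, Tuple


def naive_top_k(lists: List[List[Tuple[str, float]]], k: int) -> List[Tuple[str, float]]:
    """Aggregate every score with f = SUM, then return the k best (object, score) pairs."""
    totals: Dict[str, float] = defaultdict(float)
    for lst in lists:
        for obj, score in lst:
            totals[obj] += score
    return heapq.nlargest(k, totals.items(), key=lambda item: item[1])


# Hypothetical complete lists, each sorted by descending score.
l1 = [("a", 0.9), ("b", 0.6), ("c", 0.5)]
l2 = [("c", 0.8), ("a", 0.7), ("b", 0.1)]
print([obj for obj, _ in naive_top_k([l1, l2], k=2)])  # ['a', 'c']
```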

34
Sorted Accesses
  • Observations
  • Objects with high scores have a higher potential
    to be part of the top-k.
  • Objects with mediocre scores do not help.

Prefer high scores
35
Example
useless
36
Applications
  • Mobile Applications
  • Highly impatient users, need fast results.
  • Mediation Systems
  • Achieve high query throughput.
  • Online analytics (e.g. logs)
  • Achieve high query throughput.

37
Motivating Example
Query throughput
Allocate time for each query
Given queries per time unit
38
Terminology
  1. Sorted Access
  2. Random Access
  3. highi
  4. Top-k queue
  5. Candidates queue
  6. mink
  7. worstScore(d)
  8. bestScore(d)

39
Efficient Offline Solution- Sorted
  • Goal: find a trace t within the budget that has
    optimal precision.

B = 5

      L1   L2
t1    0    5
t2    1    4
t3    2    3
t4    3    2
t5    4    1
t6    5    0

The result is denoted ROPT.
40
Efficient Offline Solution- Sorted
  • Goal: find a trace t within the budget that has
    optimal precision.

B = 5

      L1   L2
t1    0    5
t2    1    4
t3    2    3
t4    3    2
t5    4    1
t6    5    0

  • Feasible for k up to 100 and m up to 10.
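
The table of traces can be generated mechanically. A minimal sketch, assuming unit-cost sorted accesses and that a trace spends the whole budget; `enumerate_traces` is an illustrative name, not the authors' procedure.

```python
from typing import Iterator, Tuple


def enumerate_traces(budget: int, m: int) -> Iterator[Tuple[int, ...]]:
    """Yield every way to spend `budget` sorted accesses over m lists."""
    if m == 1:
        yield (budget,)
        return
    for first in range(budget + 1):
        for rest in enumerate_traces(budget - first, m - 1):
            yield (first,) + rest


traces = list(enumerate_traces(5, 2))
print(traces)  # [(0, 5), (1, 4), (2, 3), (3, 2), (4, 1), (5, 0)]
```

The six tuples match t1 to t6. The number of such traces is the binomial coefficient C(B + m - 1, m - 1), so exhaustive enumeration grows quickly with the budget and the number of lists.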

41
Efficient Offline Solution- Sorted
  • Proof (by contradiction)
  • Assume that t does not exist, and choose a trace s
    that is within the budget and has optimal precision.
    Construct s' with positions s'i that are the largest
    position of Pi less than or equal to si.
  • By construction, the score of any object in S is
    the same as in S'.

42
Fair Heuristic
  • Assume budget b

Runs in batches
43
Efficient Offline Solution- Random
  • Budget for RAs: (B - tCs)

Top-k: (o1, S), (o2, S), (o3, S), (o4, S), (o10, S), (o14, S), ..
44
Motivation
  • Many applications work in budget-constrained
    environments. Still, they wish to perform top-k
    queries.

Figure labels: user query, mediator, engine (budget-aware query processing), servers.
45
Future work
  • Different access costs for different lists
  • Time-aware top-k
  • Top-k with budget constraints for P2P