Best-Effort Top-k Query Processing Under Budgetary Constraints

Provided by: ics9
Learn more at: http://www.ics.uci.edu

1
  • Best-Effort Top-k Query Processing Under
    Budgetary Constraints

Michal Shmueli-Scheuer (IBM Haifa Research Lab
and UCI)
Yosi Mass, Haggai Roitman
Chen Li
Ralf Schenkel, Gerhard Weikum
2
Motivating Example
  • Mediation systems
  • Achieve high query throughput.
  • Online analytics (e.g. logs)
  • Achieve high query throughput.

[Figure: top-k queries sent to an engine, which returns top-k results]
3
Traditional top-k query
  • Pre-computed lists over multiple attributes.
  • Combine scores by some monotonic aggregation
    function.
  • Two access modes
  • sorted access (Cs)
  • random access (Cr)
  • Objective: compute the k objects with the highest scores.

[Figure: m lists over n objects, each sorted by score]
4
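NRA algorithm (Fagin et al.)
[Figure: top-2 example with f = SUM, showing the best score and worst score of each object, high_i for each list, mink, and the candidates]
Stopping condition: mink > best score of the candidates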
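The stopping condition above can be sketched in a few lines of Python. This is a minimal illustration and not the presenters' code: it assumes f = SUM, round-robin sorted accesses, and a score of 0 for attributes not yet read. The function name `nra_top_k` and the sample data are hypothetical.

```python
def nra_top_k(lists, k):
    """NRA-style top-k with f = SUM, using sorted accesses only.

    lists: m lists of (object_id, score) pairs, each sorted by descending score.
    Returns (top_k_ids, number_of_sorted_accesses).
    """
    m = len(lists)
    seen = {}        # object_id -> {list index: score read so far}
    accesses = 0
    top = []
    for depth in range(max(len(lst) for lst in lists)):
        # One round-robin step: one sorted access on every non-exhausted list.
        for i, lst in enumerate(lists):
            if depth < len(lst):
                obj, score = lst[depth]
                seen.setdefault(obj, {})[i] = score
                accesses += 1
        # high_i: last score read from list i (0.0 once a list is exhausted).
        high = [lst[depth][1] if depth < len(lst) else 0.0 for lst in lists]
        # Worst score: sum of known scores. Best score: unknowns replaced by high_i.
        worst = {o: sum(s.values()) for o, s in seen.items()}
        best = {o: worst[o] + sum(high[i] for i in range(m) if i not in s)
                for o, s in seen.items()}
        top = sorted(seen, key=lambda o: worst[o], reverse=True)[:k]
        if len(top) < k:
            continue
        mink = min(worst[o] for o in top)
        # Candidates are seen objects outside the top-k, plus a virtual
        # unseen object whose best score is the sum of all high_i.
        rivals = [best[o] for o in seen if o not in top] + [sum(high)]
        if mink > max(rivals):
            break
    return top, accesses


# Hypothetical data: two lists, top-2.
L1 = [("a", 0.9), ("b", 0.8), ("c", 0.3), ("d", 0.1)]
L2 = [("b", 0.9), ("a", 0.7), ("d", 0.2), ("c", 0.1)]
result, cost = nra_top_k([L1, L2], k=2)
```

On this sample the loop stops after two rounds, because mink (the worst score of a) already exceeds the best score any unseen object could reach.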
7
Top-k with Budget Constraints
Example: top-2, Cs = 1, Cr = 3, f = SUM
  • NRA: 12 Cs = 12, precision = 0.5
  • TA: 7 Cs + 7 Cr = 28, precision = 0
  • Budget = 10?
Given budget B, maximize result quality.
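The cost figures on this slide follow from counting accesses. A small sketch of the arithmetic, with a hypothetical helper name `access_cost`, using the slide's access counts and unit costs:

```python
def access_cost(n_sorted, n_random, cs, cr):
    """Total cost of a run: sorted accesses at Cs each, random accesses at Cr each."""
    return n_sorted * cs + n_random * cr


CS, CR, BUDGET = 1, 3, 10                 # values from the slide

nra_cost = access_cost(12, 0, CS, CR)     # NRA: 12 sorted accesses
ta_cost = access_cost(7, 7, CS, CR)       # TA: 7 sorted and 7 random accesses

for name, cost in (("NRA", nra_cost), ("TA", ta_cost)):
    status = "within" if cost <= BUDGET else "over"
    print(f"{name}: cost {cost}, {status} budget {BUDGET}")
```

Both costs exceed the budget of 10, which is the situation the slide raises.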
8
Contributions
  • Sorted Accesses
  • Efficient Plan
  • Solution with adaptive α
  • Sorted and Random Accesses
  • Efficient Plan
  • Solution with adaptive α
  • Experiments
9
Results Under Limited Budget
[Figure: results for a limited budget compared with the k results for an unlimited budget]
10
Efficient Plan- Sorted Accesses
  • Assume that we know the k results for an unlimited
    budget (REXACT).
  • Plan: (L1, 4), (L2, 2)
11
Efficient Plan- Sorted Accesses
  • Goal: find a plan t such that

Plans for B = 5
Denoted as ROPT
12
Sorted Accesses
  • Observations

L1
L2
L3
O1, SL1
O1, SL2
O2, SL1
O2, SL2
O2, SL3
Prefer high scores
13
Observations (contd.)
Query: title = war, description = weapon
Prefer large score reductions
14
Score Utilities
Score gain
Score reduction
y = 3
15
Optimization Problem
  • Bi-objective optimization problem
  • util(Li, x) = α · gain + (1 − α) · reduction
  • Heuristics
  • Fair Heuristic
  • Rank Heuristic

Where m is the number of lists
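A minimal sketch of how the weighted utility could drive the choice of the next list to read. The slides do not say how gain and reduction are estimated, so they are taken here as given numbers. The names `utility` and `pick_list` and the sample estimates are hypothetical.

```python
def utility(gain, reduction, alpha):
    """util = alpha * gain + (1 - alpha) * reduction, with alpha in [0, 1]."""
    return alpha * gain + (1 - alpha) * reduction


def pick_list(estimates, alpha):
    """Return the list whose next batch of sorted accesses has the highest utility.

    estimates: {list_name: (estimated_gain, estimated_reduction)} (assumed inputs).
    """
    return max(estimates, key=lambda name: utility(*estimates[name], alpha))


# Hypothetical estimates: L1 promises high score gain, L2 a large score reduction.
estimates = {"L1": (0.8, 0.1), "L2": (0.3, 0.9)}
print(pick_list(estimates, alpha=0.9))   # weight mostly on gain
print(pick_list(estimates, alpha=0.1))   # weight mostly on reduction
```

With α close to 1 the choice follows score gain, and with α close to 0 it follows score reduction, which is why an adaptive α changes the access plan over time.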
16
Adaptive α
[Figure: weight α on gain and weight (1 − α) on reduction, plotted over time]
17
Adaptive α
[Figure: top-k and candidates queues holding o1 [ws, bs], o2 [ws, bs], o3 [0.8, bs], o4 [0.6, bs], and o6 [ws, bs], with high_t1 and high_t2 marked]
d(o4) = 0.8 − 0.6 = 0.2
Theobald et al. VLDB04
18
Adaptive α
19
Efficient Plan- Random Accesses
  • Observations
  • Random accesses always occur after the sorted
    accesses have finished.

Schedule 1: SA, RA, SA.
Schedule 2: SA, SA, RA.
precision(schedule 1) ≤ precision(schedule 2)
20
Observations- contd.
  • Random accesses are only useful to objects in
    REXACT.

top-k
L2
o1 ws,bs
o2, SL2
Precision reduced
o5 ws,bs
o5, Not in REXACT
o2 ws,bs
o5, SL2
candidates
o4 ws,bs
o1, SL2
o5 ws,bs
Precision remains the same
21
Random Accesses
  • When to switch from SA to RA?

[Figure: α and (1 − α) over time]
22
Random Accesses
  • Switch from sorted to random
  • R (1 − α)S
  • S: total cost of sorted accesses.
  • R: total cost of random accesses.
  • Which items to access?
  • Maximize expected score.
23
Experimental Data
  • TREC Terabyte
  • 25M web pages
  • 50 queries with an average length of 3 words.
  • IMDB
  • 375,000 movies
  • 20 queries, each with 4 attributes: Title,
    Genre, Actors, Description
  • Synthetic data
  • Zipf; lists: [2, 6]; objects: [10,000, 1,000,000]
  • Aggregate function: Sum
24
Evaluation Methods
  • percentage of optimal precision

Ropt
Rexact
Ralg
Ropt
  • SME
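The formula for this measure did not survive in the transcript. The sketch below assumes that precision is the fraction of the exact top-k (Rexact) present in a result, and that "percentage of optimal precision" divides the precision of the algorithm's result (Ralg) by that of the best plan within the budget (Ropt). Both definitions and both function names are assumptions, not taken from the slides.

```python
def precision(result, exact):
    """Fraction of the exact top-k that appears in the result (assumed definition)."""
    return len(set(result) & set(exact)) / len(exact)


def percent_of_optimal(r_alg, r_opt, r_exact):
    """Precision of R_alg as a percentage of the precision of R_opt (assumed definition)."""
    optimal = precision(r_opt, r_exact)
    if optimal == 0:
        return 0.0  # convention chosen for this sketch only
    return 100.0 * precision(r_alg, r_exact) / optimal


# Hypothetical results for k = 4.
r_exact = ["o1", "o2", "o3", "o4"]
r_opt = ["o1", "o2", "o7", "o8"]    # best achievable under the budget
r_alg = ["o1", "o5", "o6", "o7"]    # what a heuristic returned
print(percent_of_optimal(r_alg, r_opt, r_exact))
```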

25
Results- Sorted Accesses
TREC, k = 100
  • Less budget, more improvement

26
Varied k
IMDB, B = 400
  • Lower k, more improvement.

27
Number of Lists
Zipf, k = 100, B = 4000
  • More lists, more improvement.

28
Results- Random Accesses
TREC, k = 100, Cr = 10
TREC, k = 100, Cr = 100
29
Related Works
  • Minimize budget for optimal results
  • The algorithm computes the exact results with
    minimum cost (Bast et al. VLDB06, Bruno et al.
    ICDE02, Chang et al. SIGMOD02).
  • Dual problem.
  • Anytime top-k
  • The algorithm collects statistics during
    processing, which can be used to provide
    probabilistic guarantees at any time during
    processing (Arai et al. VLDB07).
  • Does not perform any optimization.
  • Approximate top-k
  • Approximate results with probabilistic
    guarantees (Theobald et al. VLDB04, Fagin et al.
    2001).
30
Conclusions
  • First attempt to deal with budget constraints.
  • For SA only, average precision is around 70%.
  • Trade-off between RAs and SAs: when the cost of an RA
    is relatively low, schedules that include RAs give better results.
31
Thank You !
33
Top-k query
  • Given a set of n objects and m scoring lists
    sorted in decreasing order, find the top-k
    objects according to a scoring function f
  • top-k: a set T of k objects such that
    f(rj1, …, rjm) ≤ f(ri1, …, rim) for every object Xi
    in T and every object Xj not in T
  • Assumption: the scoring function f is monotone
  • f(r1, …, rm) ≤ f(r'1, …, r'm) if ri ≤ r'i for all i
  • Two access modes
  • sorted access: Cs
  • random access: Cr
  • Objective: compute the top-k with the minimum cost
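As a reference point for the definition above, a brute-force computation of the exact top-k reads every list completely and ignores access costs. This is a sketch under assumptions: f defaults to SUM, an object missing from a list scores 0 there, and the name `exact_top_k` and the sample data are hypothetical.

```python
import heapq
from collections import defaultdict


def exact_top_k(lists, k, f=sum):
    """Return the k object ids with the highest aggregated score f(r_1, ..., r_m).

    lists: m lists of (object_id, score) pairs.
    f: a monotone aggregation function over the m scores (SUM by default).
    """
    m = len(lists)
    scores = defaultdict(lambda: [0.0] * m)   # object_id -> [r_1, ..., r_m]
    for i, lst in enumerate(lists):
        for obj, score in lst:
            scores[obj][i] = score
    return heapq.nlargest(k, scores, key=lambda o: f(scores[o]))


# Hypothetical data: n = 4 objects, m = 2 lists.
L1 = [("a", 0.9), ("b", 0.8), ("c", 0.3), ("d", 0.1)]
L2 = [("b", 0.9), ("a", 0.7), ("d", 0.2), ("c", 0.1)]
print(exact_top_k([L1, L2], k=2))
```

This costs one access per list entry, which is the cost that threshold-style algorithms with sorted and random accesses aim to reduce.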

34
Sorted Accesses
  • Observations
  • An object with high scores has a higher potential to
    be part of the top-k.
  • An object with mediocre scores does not help.

Prefer high scores
35
Example
useless
36
Applications
  • Mobile Applications
  • Highly impatient users, need fast results.
  • Mediation Systems
  • Achieve high query throughput.
  • Online analytics (e.g. logs)
  • Achieve high query throughput.

37
Motivating Example
Query throughput
Allocate time for each query
Given queries per time unit
38
Terminology
  • Sorted Access
  • Random Access
  • high_i
  • Top-k queue
  • Candidates queue
  • mink
  • worstScore(d)
  • bestScore(d)

39
Efficient Offline Solution- Sorted
  • Goal: find a trace t such that

L1
L2
B = 5
Denoted as ROPT
40
Efficient Offline Solution- Sorted
  • Goal: find a trace t such that

B = 5
L1
L2
  • Feasible for k up to 100 and m up to 10.
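To make the offline goal concrete, the sketch below enumerates every combination of read depths that fits the budget and keeps the one with the highest precision. It is a plain brute-force search, not the efficient construction these slides refer to. It assumes f = SUM, ranks partially seen objects by the sum of the scores read so far, and measures precision as overlap with the exact top-k divided by k. All names and data are hypothetical.

```python
from itertools import product


def total_scores(lists):
    """Sum the scores of each object over the given (possibly truncated) lists."""
    scores = {}
    for lst in lists:
        for obj, score in lst:
            scores[obj] = scores.get(obj, 0.0) + score
    return scores


def top_k(scores, k):
    """Ids of the k highest-scoring objects (ties broken by id)."""
    return sorted(scores, key=lambda o: (-scores[o], o))[:k]


def best_plan(lists, k, budget, cs=1):
    """Brute-force search over depth vectors (d_1, ..., d_m) with sum(d_i) * cs <= budget."""
    exact = set(top_k(total_scores(lists), k))
    best = (-1.0, None)                       # (precision, depths)
    max_reads = budget // cs
    for depths in product(*(range(len(lst) + 1) for lst in lists)):
        if sum(depths) > max_reads:
            continue
        prefixes = [lst[:d] for lst, d in zip(lists, depths)]
        result = set(top_k(total_scores(prefixes), k))
        prec = len(result & exact) / k
        if prec > best[0]:
            best = (prec, depths)
    return best


# Hypothetical data: two lists, top-2.
L1 = [("a", 0.9), ("b", 0.8), ("c", 0.3), ("d", 0.1)]
L2 = [("b", 0.9), ("a", 0.7), ("d", 0.2), ("c", 0.1)]
print(best_plan([L1, L2], k=2, budget=2))
print(best_plan([L1, L2], k=2, budget=1))
```

The number of depth vectors grows quickly with the number of lists and their lengths, so a search of this kind is only practical on small inputs.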

41
Efficient Offline Solution- Sorted
  • Proof (by contradiction)
  • Assume that t does not exist, and choose a trace s
    that is within the budget and has optimal precision.
    Let s' be the trace whose positions s'i are the largest
    positions in Pi less than or equal to si.
  • By construction, the score of any object in s' is
    the same as in s.

42
Fair Heuristic
  • Assume budget b

Runs in batches
43
Efficient Offline Solution- Random
  • Budget for RAs: (B − t·Cs)

Top-k
o1, S
o2, S
o3, S
o4, S
o10, S
o14, S
.
44
Motivation
  • Many applications work in budget-constrained
    environments. Still, they wish to perform top-k
    queries.

Servers
Budget-aware Query processing
Mediator
Engine
User query
45
Future work
  • Different access costs for different lists
  • Time-aware top-k
  • Top-k with budget constraints for P2P