Evaluating Top-k Queries over Web-Accessible Databases

1
Evaluating Top-k Queries over Web-Accessible
Databases
  • Nicolas Bruno
  • Luis Gravano
  • Amélie Marian
  • Columbia University

2
Top-k Queries Natural in Many Scenarios
  • Example: NYC Restaurant Recommendation Service.
  • Goal: Find the best restaurants for a user:
    • Close to address: 2290 Broadway
    • Price around 25
    • Good rating

Query: Specification of Flexible Preferences
Answer: Best k Objects for Distance Function

3
Attributes often Handled by External Sources
  • MapQuest returns the distance between two
    addresses.
  • NYTimes Review gives the price range of a
    restaurant.
  • Zagat gives a food rating to the restaurant.

4
Top-k Query Processing Challenges
  • Attributes handled by external sources (e.g.,
    MapQuest distance).
  • External sources exhibit a variety of interfaces
    (e.g., NYTimes Review, Zagat).
  • Existing algorithms do not handle all types of
    interfaces.

5
Processing Top-k Queries over Web-Accessible Data
Sources
  • Data and query model
  • Algorithms for sources with different interfaces
  • Our new algorithm: Upper
  • Experimental results

6
Data Model
  • Top-k Query: assignment of weights and target
    values to attributes

Query: <25, 2290 Broadway, very good>
  (preferred price, close to address, preferred rating)
Weights: <4, 1, 2>
  (price is the most important attribute)
Target values and weights are combined in the scoring function.
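
As a rough illustration of how weights combine per-attribute scores, the sketch below computes a weighted average, which is the same form as the final-score formula used in the example executions later in the deck. The function name `combined_score` and the sample attribute scores are assumptions made for illustration; only the weights <4, 1, 2> come from the slide.

```python
from typing import Sequence


def combined_score(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of per-attribute scores, each assumed to lie in [0, 1]."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


# Weights from the slide, in the order (price, address, rating).
weights = [4, 1, 2]
# Hypothetical attribute scores for a single restaurant, same order.
example = combined_score([0.5, 1.0, 0.75], weights)
print(round(example, 4))
```

Because the result is normalized by the sum of the weights, an object that scores 1 on every attribute gets a combined score of exactly 1.
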
7
Sorted Access Source S
  • Return objects sorted by scores for a given
    query.
  • Example: Zagat

Interface: GetNextS
S-Source Access Time: tS(S)
8
Random Access Source R
  • Return the score of a given object for a given
    query.
  • Example: MapQuest

Interface: GetScoreR
R-Source Access Time: tR(R)
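
The two access interfaces can be mimicked with small in-memory stand-ins. This is only a sketch: the class names `SortedSource` and `RandomSource`, the method names, and the sample scores are hypothetical, and real sources such as Zagat or MapQuest would sit behind web requests rather than dictionaries.

```python
from typing import Dict, Optional, Tuple


class SortedSource:
    """Stand-in for an S-Source: returns objects in descending score order."""

    def __init__(self, scores: Dict[str, float], access_time: float = 1.0):
        self._ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
        self._pos = 0
        self.access_time = access_time  # plays the role of tS(S)

    def get_next(self) -> Optional[Tuple[str, float]]:
        """GetNext-style call: next best (object, score), or None when exhausted."""
        if self._pos >= len(self._ranked):
            return None
        item = self._ranked[self._pos]
        self._pos += 1
        return item


class RandomSource:
    """Stand-in for an R-Source: returns the score of one given object."""

    def __init__(self, scores: Dict[str, float], access_time: float = 1.0):
        self._scores = dict(scores)
        self.access_time = access_time  # plays the role of tR(R)

    def get_score(self, obj: str) -> float:
        """GetScore-style call: score of the requested object."""
        return self._scores[obj]


# Hypothetical scores, for illustration only.
zagat = SortedSource({"o1": 0.9, "o2": 0.8, "o3": 0.95})
mapquest = RandomSource({"o1": 0.4, "o2": 0.7, "o3": 0.2})
first = zagat.get_next()           # best-rated object comes first
probe = mapquest.get_score("o2")   # score lookup for a known object
```

An R-Source cannot enumerate objects on its own, which is why the algorithms in this deck need at least one S-Source to discover candidates.
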
9
Query Model
  • Attribute scores between 0 and 1.
  • Sequential access to sources.
  • Score ties broken arbitrarily.
  • No wild guesses (an object is probed only after
    it has been retrieved through sorted access).
  • One S-Source (or SR-Source) and multiple
    R-sources. (More on this later.)

10
Query Processing Goals
  • Processing top-k queries over R-Sources.
  • Returning exact answer to top-k query q.
  • Minimizing query response time.
  • Naïve solution too expensive (access all sources
    for all objects).

11
Example: NYC Restaurants
  • S-Source:
    • Zagat: restaurants sorted by food rating.
  • R-Sources:
    • MapQuest: distance between two input addresses.
      User address: 2290 Broadway
    • NYTimes Review: price range of the input
      restaurant.
      Target value: 25

12
TA Algorithm for SR-Sources
Fagin, Lotem, and Naor (PODS 2001)
  • Perform sorted access sequentially to all
    SR-Sources
  • Completely probe every object found for all
    attributes using random access.
  • Keep best k objects.
  • Stop when scores of best k objects are no less
    than maximum possible score of unseen objects
    (threshold).

Does NOT handle R-Sources
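
The steps of TA can be sketched as follows, under several simplifying assumptions that are not in the slides: every source is an in-memory dictionary holding the same set of objects, the scoring function is a weighted average, and all names and data (`threshold_algorithm`, the three score dictionaries) are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


def threshold_algorithm(sources: List[Dict[str, float]],
                        weights: List[float],
                        k: int) -> List[Tuple[str, float]]:
    """TA sketch: every source is assumed to allow sorted AND random access."""
    total_w = sum(weights)
    ranked = [sorted(s.items(), key=lambda kv: kv[1], reverse=True) for s in sources]

    def score(obj: str) -> float:
        # Complete probing: random access to every source for this object.
        return sum(w * s[obj] for w, s in zip(weights, sources)) / total_w

    seen: Dict[str, float] = {}
    for depth in range(len(ranked[0])):
        last_seen = []
        for ranking in ranked:               # one sorted access per source
            obj, s = ranking[depth]
            last_seen.append(s)
            if obj not in seen:
                seen[obj] = score(obj)
        # Best score any still-unseen object could reach.
        threshold = sum(w * s for w, s in zip(weights, last_seen)) / total_w
        top = heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])
        if len(top) == k and top[-1][1] >= threshold:
            return top
    return heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])


# Hypothetical scores for three objects in three SR-Sources.
src_a = {"o1": 0.9, "o2": 0.8, "o3": 0.3}
src_b = {"o1": 0.1, "o2": 0.7, "o3": 0.9}
src_c = {"o1": 0.5, "o2": 0.7, "o3": 0.9}
ta_top = threshold_algorithm([src_a, src_b, src_c], [3, 2, 1], k=1)
```

The sketch calls sorted access on every source, which is exactly what an R-Source cannot provide; that gap motivates the adaptation on the next slide.
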
13
Our Adaptation of TA Algorithm for R-Sources
TA-Adapt
  • Perform sorted access to S-Source S.
  • Probe every R-Source Ri for newly found object.
  • Keep best k objects.
  • Stop when scores of best k objects are no less
    than maximum possible score of unseen objects
    (threshold).
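
A minimal sketch of TA-Adapt under stated assumptions: one S-Source given as a list already sorted by score, R-Sources given as dictionaries, a weighted-average scoring function, and unit cost per access. The function name `ta_adapt` and the data are hypothetical and are not the values from the slides' example table.

```python
import heapq
from typing import Dict, List, Tuple


def ta_adapt(s_source: List[Tuple[str, float]],
             r_sources: List[Dict[str, float]],
             weights: List[float],
             k: int) -> Tuple[List[Tuple[str, float]], int]:
    """weights[0] belongs to the S-Source, weights[1:] to the R-Sources."""
    total_w = sum(weights)
    top: List[Tuple[float, str]] = []    # min-heap holding the best k (score, object)
    accesses = 0
    for obj, s_score in s_source:        # sorted access to S
        accesses += 1
        weighted = weights[0] * s_score
        for w, r in zip(weights[1:], r_sources):
            weighted += w * r[obj]       # probe every R-Source for the new object
            accesses += 1
        heapq.heappush(top, (weighted / total_w, obj))
        if len(top) > k:
            heapq.heappop(top)           # keep only the best k objects
        # Unseen objects score at most s_score in S and at most 1 in each R-Source.
        threshold = (weights[0] * s_score + sum(weights[1:])) / total_w
        if len(top) == k and top[0][0] >= threshold:
            break
    best = [(obj, score) for score, obj in sorted(top, reverse=True)]
    return best, accesses


# Hypothetical data: S-Source ranking plus two R-Sources.
zagat = [("o1", 0.9), ("o2", 0.8), ("o3", 0.3)]
mapquest = {"o1": 0.1, "o2": 0.7, "o3": 0.9}
nytimes = {"o1": 0.5, "o2": 0.7, "o3": 0.9}
ta_best, ta_cost = ta_adapt(zagat, [mapquest, nytimes], [3, 2, 1], k=1)
print(ta_best[0][0], ta_cost)  # o2 9
```

On this data the thresholds after each sorted access are 0.95, 0.9, and 0.65, so the best score of 0.75 only passes the test after all three objects have been fully probed.
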

14
An Example Execution of TA-Adapt
Table columns: Object, S(Zagat), R1(MQ), R2(NYT), Final Score

Threshold = 1
Total Execution Time = 9
tS(S) = tR(R1) = tR(R2) = 1, w = <3, 2, 1>, k = 1
Final Score = (3·scoreZagat + 2·scoreMQ + 1·scoreNYT)/6

15
Improvements over TA-Adapt
  • Add a shortcut test after each random-access
    probe (TA-Opt).
  • Exploit techniques for processing selections with
    expensive predicates (TA-EP).
  • Reorder accesses to R-Sources: probe the source
    with the best weight/time ratio first.

16
The Upper Algorithm
  • Selects a pair (object,source) to probe next.
  • Based on the property:

The object with the highest upper bound will be
probed before the top-k solution is reached.

17
An Example Execution of Upper
Table columns: Object, Upper Bound, S(Zagat), R1(MQ), R2(NYT), Final Score

Threshold = 1
Total Execution Time = 6
tS(S) = tR(R1) = tR(R2) = 1, w = <3, 2, 1>, k = 1
Final Score = (3·scoreZagat + 2·scoreMQ + 1·scoreNYT)/6

18
The Upper Algorithm
  • Choose object with highest upper bound.
  • If some unseen object can have higher upper
    bound:
    • Access S-Source S
  • Else:
    • Access best R-Source Ri for chosen object
  • Keep best k objects
  • If top-k objects have final values higher than
    maximum possible value of any other object,
    return top-k objects.

Interleaves accesses on objects
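
The loop above can be sketched as follows. This is a simplified reading of the slide, not the authors' implementation: it assumes unit access times (so the "best" R-Source is simply the unprobed one with the highest weight), a weighted-average scoring function, and in-memory sources. The name `upper` and all data are hypothetical.

```python
from typing import Dict, List, Tuple


def upper(s_source: List[Tuple[str, float]],
          r_sources: List[Dict[str, float]],
          weights: List[float],
          k: int) -> Tuple[List[Tuple[str, float]], int]:
    total_w = sum(weights)
    n_r = len(r_sources)
    # Assumption: unit access times, so weight/time ratio reduces to weight.
    r_order = sorted(range(n_r), key=lambda i: weights[i + 1], reverse=True)

    def bound(s_score: float, known: Dict[int, float]) -> float:
        """Upper bound on the final score: unknown R-Source scores count as 1."""
        acc = weights[0] * s_score
        for i in range(n_r):
            acc += weights[i + 1] * known.get(i, 1.0)
        return acc / total_w

    candidates: Dict[str, Tuple[float, Dict[int, float]]] = {}
    results: List[Tuple[str, float]] = []
    accesses, pos, last_s = 0, 0, 1.0
    while len(results) < k:
        exhausted = pos >= len(s_source)
        unseen_ub = float("-inf") if exhausted else bound(last_s, {})
        best = max(candidates, key=lambda o: bound(*candidates[o]), default=None)
        if best is None and exhausted:
            break                                    # fewer than k objects exist
        if best is None or bound(*candidates[best]) < unseen_ub:
            obj, last_s = s_source[pos]              # sorted access to S
            pos += 1
            accesses += 1
            candidates[obj] = (last_s, {})
            continue
        s_score, known = candidates[best]
        if len(known) == n_r:                        # fully probed: bound is final
            results.append((best, bound(s_score, known)))
            del candidates[best]
        else:
            i = next(j for j in r_order if j not in known)
            known[i] = r_sources[i][best]            # random-access probe
            accesses += 1
    return results, accesses


# Hypothetical data: S-Source ranking plus two R-Sources.
zagat = [("o1", 0.9), ("o2", 0.8), ("o3", 0.3)]
mapquest = {"o1": 0.1, "o2": 0.7, "o3": 0.9}
nytimes = {"o1": 0.5, "o2": 0.7, "o3": 0.9}
up_best, up_cost = upper(zagat, [mapquest, nytimes], [3, 2, 1], k=1)
print(up_best[0][0], up_cost)  # o2 6
```

On this data the sketch never probes `o3` and stops probing `o1` after a single low score, which is where the saving over probing every object on every source comes from.
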
19
Selecting the Best Source
  • Upper relies on expected values to make its
    choices.
  • Upper computes best subset of sources that is
    expected to:
    • Compute the final score for k top objects.
    • Discard other objects as fast as possible.
  • Upper chooses best source in best subset:
    • Best weight/time ratio.
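
The final selection step, picking the source with the best weight/time ratio, might look like the sketch below. It covers only that last step; how Upper computes the "best subset" from expected values is not spelled out on the slide and is not reproduced here. The function name `pick_source` and the numbers are hypothetical.

```python
from typing import List, Optional, Set


def pick_source(weights: List[float],
                times: List[float],
                already_probed: Set[int]) -> Optional[int]:
    """Index of the unprobed R-Source with the highest weight/time ratio."""
    remaining = [i for i in range(len(weights)) if i not in already_probed]
    if not remaining:
        return None
    return max(remaining, key=lambda i: weights[i] / times[i])


# Hypothetical setting: R1 carries more weight but is four times slower.
r_weights = [2.0, 1.0]
r_times = [4.0, 1.0]
first_choice = pick_source(r_weights, r_times, set())   # ratios 0.5 vs 1.0
second_choice = pick_source(r_weights, r_times, {1})    # only R1 is left
```

A cheap source with moderate weight can therefore be probed before an expensive source with higher weight.
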

20
Experimental Setting: Synthetic Data
  • Attribute scores randomly generated (three data
    sets: uniform, Gaussian, and correlated).
  • tR(Ri): integer between 1 and 10.
  • tS(S) ∈ {0.1, 0.2, …, 1.0}.
  • Query execution time: ttotal
  • Default: k = 50, 10,000 objects, uniform data.
  • Results: average ttotal of 100 queries.
  • Optimal: assumes complete knowledge
    (unrealistic, but useful performance bound)
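
A setup along these lines could be generated as in the sketch below, which covers only the uniform data set. The function name `make_uniform_dataset`, the seed handling, and the number of R-Sources are assumptions; the slides do not say how the Gaussian and correlated sets were produced, so they are left out.

```python
import random
from typing import List, Tuple


def make_uniform_dataset(n_objects: int,
                         n_r_sources: int,
                         seed: int = 0) -> Tuple[List[List[float]], List[int], float]:
    """Uniform attribute scores plus random access times, as listed on the slide."""
    rng = random.Random(seed)
    # One row per object: the S-Source score followed by one score per R-Source.
    scores = [[rng.random() for _ in range(n_r_sources + 1)]
              for _ in range(n_objects)]
    t_r = [rng.randint(1, 10) for _ in range(n_r_sources)]   # tR(Ri) in 1..10
    t_s = rng.choice([i / 10 for i in range(1, 11)])         # tS(S) in 0.1..1.0
    return scores, t_r, t_s


# 10,000 objects is the slide's default; 5 R-Sources is an arbitrary choice here.
scores, t_r, t_s = make_uniform_dataset(n_objects=10000, n_r_sources=5)
```

Fixing the seed keeps a run repeatable, which helps when averaging the total time over many queries.
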

21
Experiments: Varying Number of Objects Requested k
22
Experiments: Varying Number of Database Objects N
23
Experimental Setting: Real Web Data
  • S-Source: Verizon Yellow Pages
    (sorted by distance)
  • R-Sources:
    • Subway Navigator: Subway time
    • Altavista: Popularity
    • MapQuest: Driving time
    • NYTimes Review: Food and price ratings
    • Zagat: Food, Service, Décor and Price ratings

24
Experiments: Real-Web Data
(Figure: number of random accesses)
25
Evaluation Conclusions
  • TA-EP and TA-Opt much faster than TA-Adapt.
  • Upper significantly better than all versions of
    TA.
  • Upper close to optimal.
  • Real data experiments: Upper faster than TA
    adaptations.

26
Conclusion
  • Introduced first algorithm for top-k processing
    over R-Sources.
  • Adapted TA to this scenario.
  • Presented new algorithms: Upper and Pick (see
    paper)
  • Evaluated our new algorithms with both real and
    synthetic data.
  • Upper close to optimal

27
Current and Future Work
  • Relaxation of the Source Model:
    • Current source model limited
    • Any number of R-Sources and SR-Sources
    • Upper has good results even with only SR-Sources
  • Parallelism:
    • Define a query model for parallel access to
      sources
    • Adapt our algorithms to this model
  • Approximate Queries

28
References
  • Top-k Queries:
    • Evaluating Top-k Selection Queries, S. Chaudhuri
      and L. Gravano. VLDB 1999
  • TA algorithm:
    • Optimal Aggregation Algorithms for Middleware,
      R. Fagin, A. Lotem, and M. Naor. PODS 2001
  • Variations of TA:
    • Query Processing Issues in Image (Multimedia)
      Databases, S. Nepal and M. V. Ramakrishna. ICDE 1999
    • Optimizing Multi-Feature Queries for Image
      Databases, U. Güntzer, W.-T. Balke, and
      W. Kießling. VLDB 2000
  • Expensive Predicates:
    • Predicate Migration: Optimizing Queries with
      Expensive Predicates, J. M. Hellerstein and M.
      Stonebraker. SIGMOD 1993

29
Real-web Experiments
30
Real-web Experiments with Adaptive Time
31
Relaxing the Source Model
(Figure: TA-EP compared with Upper)
32
Upcoming Journal Paper
  • Variations of Upper
  • Select best source
  • Data Structures
  • Complexity Analysis
  • Relaxing Source Model
  • Adaptation of our Algorithms
  • New Algorithms
  • Variations of Data and Query Model to handle real
    web data

33
Optimality
  • TA is instance optimal over:
    • Algorithms that do not make wild guesses.
    • Databases that satisfy the distinctness property.
  • TAZ is instance optimal over:
    • Algorithms that do not make wild guesses.
  • No complexity analysis of our algorithms, but
    experimental evaluation instead