Supporting Topk join Queries in Relational Databases - PowerPoint PPT Presentation

About This Presentation
Title:

Supporting Topk join Queries in Relational Databases

Description:

Ihab F. Ilyas, Walid G. Aref, Ahmed K. Elmagarmid. Presented by: Z. Joseph, CSE-UT Arlington ... relational databases' - Ihab Ilyas, Walid Aref, Ahmed ... – PowerPoint PPT presentation

Number of Views:35
Avg rating:3.0/5.0
Slides: 14
Provided by: wha93
Learn more at: https://crystal.uta.edu
Category:

less

Transcript and Presenter's Notes

Title: Supporting Topk join Queries in Relational Databases


1
Supporting Top-k join Queries in Relational
Databases
  • Ihab F. Ilyas, Walid G. Aref, Ahmed K. Elmagarmid

Presented by Z. Joseph, CSE-UT Arlington
2
Introduction
  • Often searches are done on multiple features
  • Each feature produces a different ranking for the
    query
  • Must thus join and aggregate rankings on
    different features

3
Example
  • Find location for a house such that the
    combination of the cost of the house and 5 years
    tuition at a nearby school is minimal.
  • Exact location is not predefined in query, per
    location the house and school features would have
    to be analyzed.

4
Motivation
  • Current techniques decouple join and sorting
    (ranking) of results.
  • Sorting is expensive and is a blocking operation.
  • More apparent if ranking and the joining features
    are different.

5
Rank-Join Algorithm
  • Generate new valid join combinations
  • Compute score for each combination
  • For each incoming input, calculate the total
    score of
  • The last seen feature value and the top ranked
    feature value for all other features in the
    query.
  • Store the maximum of these as T (threshold)
  • Store top k in priority queue.
  • Halt when lowest value of queue T

6
Optimality
  • Is Instance Optimal over all correct top-K join
    algorithms.
  • Guarantees that cost of Rank-Join is O (cost of
    any other algorithm).
  • Mathematically
  • Cost(Rank-Join) cCost(Any Other Algorithm)
    c
  • c is the optimality ratio
  • c, c gt 0

7
Rank-Join Continued
  • Join strategy crucial
  • Recommended Ripple Join
  • Alternates between tuples
  • Flexible in the way it sweeps out (rectangular,
    etc)
  • Retains ordering in considering samples
  • Variant of Rank-Join
  • Hash Rank Join (HRJN)
  • Block Ripple Join

8
Hash Rank Join (HRJN) Operator
  • Built on idea of hash ripple join
  • Inputs are as two hash tables
  • Maintains highest (first) and lowest (last
    selected) objects from each relation.
  • Results are added to a priority queue
  • Advantages
  • Smaller space requirement
  • Can be pipelined

9
Hash Rank Join (HRJN) Operator Problems
  • Local Ranking Problem
  • Results from three or more input streams
  • Larger queue sizes
  • More database accesses
  • Buffer Problem
  • Cannot predict how many partial joins will result

10
HRJN Solutions?
  • Block Ripple Joins
  • Do comparisons as blocks
  • Score-Guided Strategy
  • If thresholds are very different, then this may
    be because of the way one of the rankings is
    larger and descends at a slower rate
  • Can then take more inputs from the slower growing
    ranking so that the threshold goes closer to the
    other thresholds

11
Optimal Join-Order
  • Try to have the least number of input records in
    order to get a correct ranking
  • No clear way of estimating the order of joins
  • Have a heuristic Footrule Distance
  • Simple measure of similarity among two rankings.
  • First join the most similar rankings
  • This would quickly yield a join by accessing
    fewer records

12
Rank-Join Algorithm Benefits
  • What can it do?
  • Integrates well with query plans
  • Produces results as fast as possible
  • Provides performance guarantees
  • Minimizes space requirements
  • Offers a mechanism to determine the best order of
    joining to execute query optimally.
  • Can be improved further if random access is
    available
  • Can eliminate on-the-fly duplicate elimination

13
References
  • Supporting top-k join queries in relational
    databases - Ihab Ilyas, Walid Aref, Ahmed
    Elmagarmid (2004)
  • Jing Chen DBIR Spring 2005, CSE-UT Arlington
    http//ranger.uta.edu/gdas/Courses/
    Spring2005/DBIR/slides/top-k_join.ppt
Write a Comment
User Comments (0)
About PowerShow.com