Processing Ranked Queries with the Minimum Space

1
Processing Ranked Queries with the Minimum Space
  • Yufei Tao, City University of Hong Kong
  • Marios Hadjieleftheriou, Boston University

2
Outline
  • Top-k
  • Top-(k, K)

3
Top-k definition
  • A top-k query specifies two non-negative weights
    w1, w2.
  • If a point has coordinates (x, y), its score
    equals w1 · x + w2 · y.
  • The goal is to report the k objects with the
    largest scores.
  • The definition extends to higher dimensionalities
    in a straightforward manner.
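
The definition above can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the names `score` and `top_k` and the sample student data are made up here, and ties between equal scores are broken arbitrarily.

```python
import heapq

def score(point, weights):
    """Weighted sum of the coordinates; works for any dimensionality."""
    return sum(w * x for w, x in zip(weights, point))

def top_k(points, weights, k):
    """Return the k points with the largest scores, best first."""
    return heapq.nlargest(k, points, key=lambda p: score(p, weights))

# Hypothetical data: (algorithm grade, database grade) per student.
students = [(90, 60), (70, 85), (50, 95), (80, 80)]

print(top_k(students, (1, 3), 2))  # database weighted three times as much
print(top_k(students, (3, 1), 2))  # algorithm weighted three times as much
```

The same data yields different winners under different weights, which is why a system must be prepared for every possible weight pair.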

4
Top-k applications
  • Any application where multi-dimensional points
    need to be ranked in various ways, according to
    different weights on the dimensions.
  • Student ranking: the dimensions are scores in
    courses (algorithms, databases, ...).
  • Property sales: price, size, security, distance
    to the town center, ...
  • Stock marketing: price, expected growth
    percentage, risk, ...

5
An observation
  • In practice, users are typically interested in
    top-k queries with small k.
  • E.g., most often, people talk about top-10.
  • The largest k that may be queried is usually
    orders of magnitude smaller than the dataset
    cardinality.

6
Goal
  • Let c be the largest k that will be supported by
    our top-k system.
  • We aim at computing the union of the objects
    retrieved by all top-k queries, when k is at most
    c.
  • Note that, even if k is fixed, there are still an
    infinite number of queries, which differ in
    weights.
  • For example, if c = 1, the union constitutes the
    convex hull.

7
Why the goal?
  • Goal: compute the union of the objects retrieved
    by all top-k queries, when k is at most c.
  • We refer to the union as the minimum covering
    subset.
  • Facts:
  • The subset contains the data that must be
    retained by any algorithm that correctly answers
    all top-k queries, for k ≤ c.
  • Once the subset is computed, we immediately
    improve the asymptotic performance of any
    algorithm, because it can run on the subset
    instead of the whole dataset.
  • E.g., if the algorithm answers a top-k query in
    O(log n) time on n objects, it now takes
    O(log m) time, where m is the size of the subset.

8
Previous results
  • Heuristic results?
  • Many!
  • A very well studied problem in, for example,
    databases.
  • These solutions focus on deriving heuristics that
    help solve the problem with small cost in most
    cases, but not in the worst case.
  • Theoretical results?
  • See next.

9
Onion [Chang et al., SIGMOD 2000]

10
Ranking Index [Tsaparas et al., ICDE 2003]

11
Our result 1
  • The minimum covering subset
  • The result holds in any dimensionality.

12
Example
  • c = 2
  • PCH(D) = {p2, p3, p4, p5}
  • Add the above result into the 2-minimum covering
    subset.
  • Let us remove p2 from D.
  • PCH(D - {p2}) = {p3, p4, p5}
  • Union the above result into the
    2-minimum covering subset, which incurs no
    change.
  • Then, compute PCH(D - {p3}), and so on.
  • Eventually, the 2-minimum covering subset =
    {p2, p3, p4, p5}.
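
The procedure in this example can be sketched in Python for 2D data. Several things here are assumptions rather than the authors' formulation: PCH(D) is taken to mean the points of D that rank first for at least one pair of non-negative weights, the points are assumed to be in general position (no ties, no three relevant points on a line), and the recursion for c > 2 is a natural generalisation of the c = 2 walk-through, not a formula recovered from the slides. The names `pch` and `covering_subset` and the sample points are hypothetical.

```python
def cross(o, a, b):
    """Cross product of vectors o->a and o->b (negative means a right turn)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def pch(points):
    """Points that can rank first under some non-negative weights (2D only)."""
    # Skyline pass: scan by decreasing x, keep points whose y is a new maximum.
    skyline, best_y = [], float("-inf")
    for p in sorted(set(points), key=lambda p: (-p[0], -p[1])):
        if p[1] > best_y:
            skyline.append(p)
            best_y = p[1]
    # Upper hull of the skyline, scanning by increasing x.
    hull = []
    for p in reversed(skyline):
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return set(hull)

def covering_subset(points, c):
    """PCH(D), plus the (c-1)-level result of D minus each hull point."""
    points = set(points)
    if c == 0 or not points:
        return set()
    hull = pch(points)
    result = set(hull)
    for p in hull:
        result |= covering_subset(points - {p}, c - 1)
    return result

D = [(1, 9), (6, 7), (9, 2), (5, 6), (2, 2)]
print(sorted(pch(D)))                 # hull points only
print(sorted(covering_subset(D, 2)))  # (5, 6) joins once (6, 7) is removed
```

The recursion recomputes a hull for every removed point, so its cost grows quickly with c; it is meant to mirror the example, not to be efficient.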

13
Our result 2
  • Time for computing the minimum covering subset
  • where ? is the cost of computing the convex
    hull, and ? the size of the convex hull.
  • Remember this works for any dimensionality.

14
An alternative approach
  • Next, let's try to do better in 2D space.
  • The following algorithm has an interesting
    property: it is incremental.
  • It first finds the 1-minimum covering subset,
    then the 2-, 3-, ..., up to the c-minimum
    covering subset.

15
Slope space
  • Weights (10, 20) retrieve the same top-k results
    as weights (1, 2), because scaling both weights
    by the same positive constant does not change
    the ranking.
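
A quick way to see this is to compare the rankings produced by the two weight pairs. The sketch below uses made-up points, and it assumes that a 2D query is represented by the ratio w2 / w1; the exact slope convention used on the slide is not recoverable from the transcript.

```python
def ranking(points, weights):
    """Indices of the points, ordered from highest to lowest score."""
    w1, w2 = weights
    return sorted(range(len(points)),
                  key=lambda i: -(w1 * points[i][0] + w2 * points[i][1]))

def slope(weights):
    """Assumed 1D representation of a 2D query: the ratio w2 / w1 (w1 > 0)."""
    w1, w2 = weights
    return w2 / w1

points = [(9, 2), (6, 7), (1, 9), (5, 6)]
print(ranking(points, (1, 2)))    # [1, 2, 3, 0]
print(ranking(points, (10, 20)))  # same order: only the ratio matters
print(slope((1, 2)), slope((10, 20)))
```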

16
Top-k decomposition
  • An example with k = 1.
  • [0, slope of l1), [slope of l1, slope of l2),
    [slope of l2, slope of l3), [slope of l3, ∞)
  • All slopes in the same interval have the same
    top-1 result.
  • In the same way, we can also define top-k
    decomposition.
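
A top-1 decomposition can be sketched as follows. The sketch assumes that the slope of a query is w2 / w1 and that the input is the list of points that can rank first, already sorted by decreasing x (and therefore increasing y). Both are assumptions made for this illustration, and `top1_decomposition` is a hypothetical name. Two neighbouring points a and b tie when w1·a.x + w2·a.y = w1·b.x + w2·b.y, which gives the breakpoint w2 / w1 = (a.x − b.x) / (b.y − a.y).

```python
import math

def top1_decomposition(hull):
    """Return (low, high, point) triples: point is top-1 for slopes in [low, high)."""
    intervals, low = [], 0.0
    for a, b in zip(hull, hull[1:]):
        # Slope at which a and b have equal scores.
        high = (a[0] - b[0]) / (b[1] - a[1])
        intervals.append((low, high, a))
        low = high
    intervals.append((low, math.inf, hull[-1]))
    return intervals

# Hypothetical hull, sorted by decreasing x.
for low, high, point in top1_decomposition([(9, 2), (6, 7), (1, 9)]):
    print(f"[{low}, {high}) -> {point}")
```

Small slopes favour the point with the largest x, large slopes favour the point with the largest y, and each breakpoint marks where the top-1 result changes.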

17
Tail set
  • Illustration of tail set.

18
Our result 3

19
Our result 4
  • The 2D minimum covering subset can be computed in
  • O(? ? m ? c)
  • time, where m is the size of the subset.
  • Compare it with our earlier result
  • A side result: the total number of top-c results
    is O(m).

20
A 2D ranked index

21
Our result 5
  • Any top-k query with k ≤ c can be answered in
  • O(log_B(m / B) + c / B)
  • I/Os, using
  • O(m / B)
  • space, where m is the size of the c-minimum
    covering subset and B is the disk page capacity.

22
Outline
  • Top-k
  • Top-(k, K)

23
Approximate top-k retrieval
  • In practice, slight (bounded) imprecision in the
    top-k result can often be tolerated
  • especially if retrieval of such results
    requires less space and computation overhead.
  • We propose top-(k, K) search.
  • For a top-(k, K) query, a correct result contains
    any k objects among the top-K objects.
  • For example, for a top-(1, 10) query, we may
    return any of the top-1, top-2, ..., and top-10
    objects.
  • Quality guarantee: even in the worst case, the
    returned object is still among the top 10.
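
The correctness condition for a top-(k, K) answer can be written as a small check. This is an illustrative sketch with made-up data and a hypothetical function name, and it assumes all scores are distinct so that the top-K set is well defined.

```python
import heapq

def is_valid_top_k_K(data, answer, weights, k, K):
    """True if `answer` consists of k distinct objects, all among the top-K."""
    key = lambda p: sum(w * x for w, x in zip(weights, p))
    top_K = set(heapq.nlargest(K, data, key=key))
    chosen = set(answer)
    return len(chosen) == k and chosen <= top_K

data = [(9, 2), (6, 7), (1, 9), (5, 6), (2, 2)]
# With weights (1, 2) the two best objects are (6, 7) and (1, 9).
print(is_valid_top_k_K(data, [(1, 9)], (1, 2), k=1, K=2))  # True
print(is_valid_top_k_K(data, [(5, 6)], (1, 2), k=1, K=2))  # False
```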

24
Our goal
  • Find a small subset of the database that is
    sufficient for answering all top-(k, C) queries
    correctly, for any k ≤ c, where c and C are
    system constants, and c ≤ C.
  • c controls the largest number of objects
    returned, and C determines the worst-case quality
    of the returned objects.
  • For instance, if c = 5 and C = 10, then
    regardless of the query weights, the subset
    should allow retrieval of 5 objects, each of
    which is no worse than a top-10 object.
  • In general, we refer to a subset satisfying our
    requirement as a (c, C)-covering subset.
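
One rough way to test whether a candidate subset meets this requirement in 2D is to sample query directions and confirm that the subset always holds at least c of the top-C objects. The function name, the sampling scheme and the data below are assumptions made for illustration. Because only finitely many directions are tried, a result of True is evidence rather than proof.

```python
import heapq
import math

def covers(data, subset, c, C, samples=1000):
    """Sampled check that `subset` contains >= c of the top-C objects of `data`
    for weight directions spread over the non-negative quadrant."""
    members = set(subset)
    for i in range(samples + 1):
        angle = (math.pi / 2) * i / samples
        w1, w2 = math.cos(angle), math.sin(angle)
        top_C = heapq.nlargest(C, data, key=lambda p: w1 * p[0] + w2 * p[1])
        if sum(p in members for p in top_C) < c:
            return False
    return True

data = [(9, 2), (6, 7), (1, 9), (5, 6), (2, 2)]
print(covers(data, [(6, 7)], c=1, C=2))  # (6, 7) is always first or second
print(covers(data, [(9, 2)], c=1, C=2))  # fails when y dominates the score
```

In this toy dataset a single point suffices for c = 1 and C = 2, which shows how much smaller a (c, C)-covering subset can be than the data needed for exact answers.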

25
Obvious results
  • Result 1
  • is a subset that can answer any top-(k, C) query
    with k ≤ c.
  • But the subset may not be the minimum (c,
    C)-covering subset.
  • Result 2: the minimum subset must be a subset of

26
Our result
  • We do not have a solution for finding the minimum
    (c, C)-covering subset
  • but, in 2D space, we can find a covering subset
    that is larger than the minimum one by a factor
    of
    ln(? 1), where ? is the size of the minimum
    subset.

27
Idea of our algorithm
  • Assume c = 1, C = 2.
  • Compute the top-2 decomposition.
  • Region 1: {p2, p4}
  • Region 2: {p2, p1}
  • Region 3: {p1, p3}
  • Let us consider region 1.
  • Both p2 and p4 can be a legal answer for a
    top-(1, 2) query whose query vector falls in this
    region.

28
Idea of our algorithm (cont.)
  • Region 1: {p2, p4}
  • Region 2: {p2, p1}
  • Region 3: {p1, p3}
  • We associate each point with a legal set, i.e.,
    the regions where it is a legal answer:
    p1 with Regions 2, 3
  • p2 with Regions 1, 2
  • p3 with Region 3
  • p4 with Region 1.
  • Goal: identify the smallest number of points such
    that the union of their legal sets contains all
    regions.
  • E.g., p2, p3.
  • This is a minimum set cover problem.
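
The set cover view can be illustrated with the standard greedy heuristic, which repeatedly picks the point whose legal set covers the most still-uncovered regions. Whether the authors use exactly this heuristic is an assumption; the function name `greedy_cover` is hypothetical, and the legal sets are derived directly from the three regions listed on the slide.

```python
def greedy_cover(regions):
    """regions: dict mapping a region id to the set of points legal in it.
    Returns a list of points whose legal sets together cover every region."""
    if any(not candidates for candidates in regions.values()):
        raise ValueError("every region needs at least one legal point")
    # Invert the mapping: point -> set of regions it can answer (its legal set).
    legal = {}
    for region, candidates in regions.items():
        for p in candidates:
            legal.setdefault(p, set()).add(region)
    uncovered, chosen = set(regions), []
    while uncovered:
        # Pick the point covering the most uncovered regions (ties: by name).
        best = max(sorted(legal), key=lambda p: len(legal[p] & uncovered))
        chosen.append(best)
        uncovered -= legal[best]
    return chosen

regions = {1: {"p2", "p4"}, 2: {"p2", "p1"}, 3: {"p1", "p3"}}
print(greedy_cover(regions))
```

On this input the greedy choice is p1 followed by p2, a cover of size two just like the slide's p2, p3; minimum covers need not be unique.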

29
Query processing

30
Conclusions
  • top-(k, K) retrieval