Top-k Query Processing - PowerPoint PPT Presentation

About This Presentation
Title:

Top-k Query Processing

Description:

Top-k Query Processing: Optimal aggregation algorithms for middleware, by Ronald Fagin, Amnon Lotem, and Moni Naor. Slides by Sushruth P. and Arjun Dasgupta. PowerPoint presentation.

Slides: 29
Provided by: W160
Learn more at: http://www.cs.ucr.edu

Transcript and Presenter's Notes



1
Top-k Query Processing
  • Optimal aggregation algorithms for middleware
  • Ronald Fagin, Amnon Lotem, and Moni Naor

Sushruth P., Arjun Dasgupta
2
Why top-k query processing?
  • Multimedia brings fuzzy data
  • Attribute values are graded, typically in [0, 1]
  • No clear boundary between answer and no answer
  • A query in a multimedia database means combining
    graded attributes
  • Combine attributes by an aggregation function
  • The aggregation function gives the overall grade of an
    object
  • Return the k objects with the highest overall grade
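As a baseline for the definition above, here is a minimal Python sketch that reads every grade of every object, aggregates them, and keeps the k best. The function name `naive_top_k` and the `grades` dictionary are hypothetical; the values are the four objects of the running example used later in the slides.

```python
import heapq

def naive_top_k(grades, k, t=min):
    """Score every object with aggregation function t and keep the k best.

    grades: dict mapping object id -> tuple of attribute grades in [0, 1].
    Returns a list of (overall_grade, object_id), highest grade first.
    """
    scored = ((t(*g), obj) for obj, g in grades.items())
    return heapq.nlargest(k, scored)

# Hypothetical data: two graded attributes (A1, A2) per object.
grades = {"a": (0.9, 0.85), "b": (0.8, 0.7), "c": (0.72, 0.2), "d": (0.6, 0.9)}
print(naive_top_k(grades, 2))  # [(0.85, 'a'), (0.7, 'b')]
```

This touches all grades of all objects, which is exactly the cost FA and TA try to avoid when the grades live in separate subsystems.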

Example
3
Top-k query processing
Top-k query processing: finding the k objects
that have the highest overall grades
  • How? Which algorithms?
  • Fagin's Algorithm (FA)
  • Threshold Algorithm (TA)
  • Which is the best algorithm?
  • Keep in mind: the database system serves as
    middleware
  • Multimedia objects may be kept in different
    subsystems
  • e.g. photoDB, videoDB, search engine
  • Take into account the limitations of these
    subsystems

4
Example
  • Simple database model
  • Simple query
  • Explaining Fagin's Algorithm (FA)
  • Finding top-k with FA
  • Explaining Threshold Algorithm (TA)
  • Finding top-k with TA

5
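Example: simple database model
[Figure: two lists of (object, grade) pairs, L1 sorted by grade on attribute A1 and L2 sorted by grade on attribute A2, both in descending order]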
6
Example: simple query
Find the top 2 (k = 2) objects for the following
query executed on the middleware: A1 ∧ A2
(e.g. color = red ∧ shape = round).
Issuing A1 ∧ A2 as a query to the middleware results in
the middleware combining the grades of A1 and A2
by min(A1, A2).
  • Aggregation function
  • a function that gives an object an overall grade
    based on its attribute grades
  • examples: min, max
  • Monotonicity!
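Monotonicity is what lets the algorithms prune: an object whose grades are no higher than another object's in every list can never get a higher overall grade. A rough way to test a candidate aggregation function is a brute-force check on a grid of grades, sketched below in Python. `is_monotone` is a hypothetical helper, and passing a finite grid is evidence rather than a proof.

```python
from itertools import product

def is_monotone(t, m, grid):
    """Return False if some x <= y (componentwise) has t(x) > t(y) on the grid."""
    points = list(product(grid, repeat=m))
    for x in points:
        for y in points:
            if all(a <= b for a, b in zip(x, y)) and t(*x) > t(*y):
                return False  # counterexample found
    return True

grid = [0.0, 0.25, 0.5, 0.75, 1.0]
print(is_monotone(min, 2, grid))                          # True
print(is_monotone(lambda x1, x2: abs(x1 - x2), 2, grid))  # False
```

The second function fails because raising x1 from 0 to 1 in (0, 1) lowers its value from 1 to 0.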

7
Example: Fagin's Algorithm
  • STEP 1
  • Read grades from every sorted list (sorted access, in parallel)
  • Stop when k objects have been seen in common
    in all lists

Seen after three rounds of sorted access (- = grade not seen yet):
ID   A1     A2
a    0.9    0.85
b    0.8    0.7
c    0.72   -
d    -      0.9
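Step 1 can be sketched in Python as follows. The lists are the running example truncated to four objects (an assumption, since the slides elide entries with ". . . ."); `fa_sorted_phase` is a hypothetical name.

```python
def fa_sorted_phase(lists, k):
    """Phase 1 of Fagin's Algorithm (a sketch).

    lists: m lists of (object_id, grade) pairs, each sorted by grade, descending.
    Reads one entry from every list per round (sorted access in parallel) and
    stops once at least k objects have been seen in all m lists.
    Returns (seen, depth): seen maps object_id -> {list_index: grade}.
    """
    m = len(lists)
    seen = {}
    depth = 0
    while depth < max(len(lst) for lst in lists):
        for i, lst in enumerate(lists):
            if depth < len(lst):
                obj, grade = lst[depth]
                seen.setdefault(obj, {})[i] = grade
        depth += 1
        in_all_lists = sum(1 for grades in seen.values() if len(grades) == m)
        if in_all_lists >= k:
            break
    return seen, depth

# Running example, truncated to four objects (assumption).
L1 = [("a", 0.9), ("b", 0.8), ("c", 0.72), ("d", 0.6)]
L2 = [("d", 0.9), ("a", 0.85), ("b", 0.7), ("c", 0.2)]
seen, depth = fa_sorted_phase([L1, L2], k=2)
print(depth, sorted(seen))  # 3 ['a', 'b', 'c', 'd']
```

With k = 2 the loop stops after three rounds, when a and b have appeared in both lists, as on slide 7.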
8
Example: Fagin's Algorithm
  • STEP 2
  • Random access to find the missing grades

ID   A1      A2
a    0.9     0.85
b    0.8     0.7
c    0.72    0.2*
d    0.6*    0.9
(* = grade obtained by random access)
9
Example: Fagin's Algorithm
  • STEP 3
  • Compute the grades of the seen objects.
  • Return the k highest graded objects.

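L1: (a, 0.9), (b, 0.8), (c, 0.72), . . . , (d, 0.6)
L2: (d, 0.9), (a, 0.85), (b, 0.7), . . . , (c, 0.2)
ID   A1     A2     min(A1, A2)
a    0.9    0.85   0.85
b    0.8    0.7    0.7
c    0.72   0.2    0.2
d    0.6    0.9    0.6
Top 2: a (0.85), b (0.7)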
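Putting steps 1 to 3 together, here is a minimal Python sketch of FA on the same four-object example (again an assumption about the elided list entries). Random access is simulated with a dictionary lookup per list, and every object is assumed to appear in every list; `fagins_algorithm` is a hypothetical name.

```python
def fagins_algorithm(lists, k, t=min):
    """Fagin's Algorithm (FA), a sketch.

    Assumes every object appears in every list, so random access (simulated
    here by a dict lookup) always finds a grade.
    Returns the k (object_id, overall_grade) pairs with the highest grades.
    """
    m = len(lists)
    random_access = [dict(lst) for lst in lists]  # list i: object_id -> grade
    seen = {}
    depth = 0
    # Step 1: sorted access in parallel until k objects are seen in all lists.
    while depth < max(len(lst) for lst in lists):
        for i, lst in enumerate(lists):
            if depth < len(lst):
                obj, grade = lst[depth]
                seen.setdefault(obj, {})[i] = grade
        depth += 1
        if sum(1 for g in seen.values() if len(g) == m) >= k:
            break
    # Step 2: random access for the missing grades of every seen object.
    for obj, grades in seen.items():
        for i in range(m):
            if i not in grades:
                grades[i] = random_access[i][obj]
    # Step 3: overall grade of every seen object; return the k highest.
    overall = {obj: t(*(g[i] for i in range(m))) for obj, g in seen.items()}
    return sorted(overall.items(), key=lambda kv: kv[1], reverse=True)[:k]

L1 = [("a", 0.9), ("b", 0.8), ("c", 0.72), ("d", 0.6)]
L2 = [("d", 0.9), ("a", 0.85), ("b", 0.7), ("c", 0.2)]
print(fagins_algorithm([L1, L2], k=2))  # [('a', 0.85), ('b', 0.7)]
```

It returns a (0.85) and b (0.7), the same top 2 as slide 9.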
10
New idea! Threshold Algorithm (TA)
  • Read all grades of an object as soon as it is seen
    under sorted access
  • No need to wait until the lists give k common
    objects
  • Do sorted access (and the corresponding random
    accesses) until you have seen the top k answers.
  • How do we know that the grades of seen objects are
    higher than the grades of unseen objects?
  • Predict the maximum possible grade of unseen objects

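[Figure: lists L1 and L2 split into seen objects (in L1: a 0.9, b 0.8, c 0.72) and possibly unseen objects further down (e.g. f); threshold value T = min(0.72, 0.7) = 0.7, computed from the last grades read under sorted access in L1 and L2]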
11
Example: Threshold Algorithm
Step 1: parallel sorted access to each list.
For each object seen, get all grades by random access and
determine min(A1, A2); if it is amongst the 2 highest seen → keep it in the
buffer.
ID   A1     A2     min(A1, A2)
a    0.9    0.85   0.85
d    0.6    0.9    0.6
Example: Threshold Algorithm
Step 2: determine the threshold value from the last grades
seen under sorted access: T = min(L1, L2).
If 2 objects have an overall grade ≥ threshold value → stop;
else go to the next entry position in the sorted
lists and repeat step 1.
Buffer: a (0.85), d (0.6)
T = min(0.9, 0.9) = 0.9 → d (0.6) < 0.9, so continue
13
Example: Threshold Algorithm
Step 1 (again): parallel sorted access to each
list.
For each object seen, get all grades by random access and
determine min(A1, A2); if it is amongst the 2 highest seen → keep it in the
buffer.
ID   A1     A2     min(A1, A2)
a    0.9    0.85   0.85
d    0.6    0.9    0.6
b    0.8    0.7    0.7
Buffer (2 highest): a (0.85), b (0.7)
14
Example: Threshold Algorithm
Step 2 (again): determine the threshold value based
on the objects currently seen: T = min(L1, L2).
If 2 objects have an overall grade ≥ threshold value → stop;
else go to the next entry position in the sorted
lists and repeat step 1.
Buffer: a (0.85), b (0.7)
T = min(0.8, 0.85) = 0.8 → b (0.7) < 0.8, so continue
15
Example Threshold Algorithm
Situation at the stopping condition
Buffer: a (0.85), b (0.7)
T = min(0.72, 0.7) = 0.7 → both buffered grades are ≥ 0.7, so TA stops and returns a and b
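The TA steps on slides 11 to 15 can be sketched in Python as below, assuming the four-object lists (equal length, every object in every list) and dictionary lookups standing in for random access; `threshold_algorithm` is a hypothetical name. The buffer holds at most k objects, so an object seen again under sorted access is simply looked up again. The recorded thresholds should follow the slides: 0.9, 0.8, 0.7.

```python
def threshold_algorithm(lists, k, t=min):
    """Threshold Algorithm (TA), a sketch.

    Assumes all lists have the same length (every object appears in every
    list); random access is simulated by a dict lookup.
    Returns (top_k, thresholds): top_k is a list of (object_id, grade) pairs,
    thresholds records T after each round of sorted access.
    """
    m = len(lists)
    random_access = [dict(lst) for lst in lists]
    buffer = {}  # at most k objects with the highest overall grades so far
    thresholds = []
    for depth in range(len(lists[0])):
        last = []  # grades read under sorted access in this round
        for i in range(m):
            obj, grade = lists[i][depth]
            last.append(grade)
            # Random access to the other m-1 lists gives the overall grade.
            overall = t(*(grade if j == i else random_access[j][obj] for j in range(m)))
            buffer[obj] = overall
            if len(buffer) > k:
                del buffer[min(buffer, key=buffer.get)]  # keep only the k best
        threshold = t(*last)  # best grade any unseen object could still have
        thresholds.append(threshold)
        if len(buffer) == k and min(buffer.values()) >= threshold:
            break
    return sorted(buffer.items(), key=lambda kv: kv[1], reverse=True), thresholds

L1 = [("a", 0.9), ("b", 0.8), ("c", 0.72), ("d", 0.6)]
L2 = [("d", 0.9), ("a", 0.85), ("b", 0.7), ("c", 0.2)]
top, thresholds = threshold_algorithm([L1, L2], k=2)
print(top)         # [('a', 0.85), ('b', 0.7)]
print(thresholds)  # [0.9, 0.8, 0.7]
```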
16
  • Comparison of Fagin's Algorithm and the Threshold Algorithm
  • TA sees at most as many objects as FA
  • TA stops at least as early as FA
  • When FA has seen k objects in common,
    their grades are greater than or equal to the
    threshold in TA.
  • TA may perform more random accesses than FA
  • In TA, (m-1) random accesses for each object seen under sorted access
  • In FA, random accesses are done at the end, only
    for missing grades
  • TA requires only bounded buffer space (k objects)
  • at the expense of more random seeks
  • FA makes use of unbounded buffers
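To make the comparison concrete, the sketch below counts accesses on the four-object example. It assumes TA issues m-1 random accesses after every sorted access and FA only fetches the grades still missing at the end; both function names are hypothetical.

```python
def fa_access_counts(lists, k):
    """Count (sorted, random) accesses made by FA; assumes equal-length lists."""
    m = len(lists)
    seen = {}  # object_id -> set of list indices where it was seen
    depth = 0
    while depth < len(lists[0]):
        for i, lst in enumerate(lists):
            seen.setdefault(lst[depth][0], set()).add(i)
        depth += 1
        if sum(1 for s in seen.values() if len(s) == m) >= k:
            break
    random_accesses = sum(m - len(s) for s in seen.values())  # missing grades only
    return m * depth, random_accesses

def ta_access_counts(lists, k, t=min):
    """Count (sorted, random) accesses made by TA; assumes equal-length lists."""
    m = len(lists)
    lookup = [dict(lst) for lst in lists]
    grades = {}
    depth = 0
    while depth < len(lists[0]):
        for lst in lists:
            obj = lst[depth][0]
            grades[obj] = t(*(lookup[j][obj] for j in range(m)))
        depth += 1
        best = sorted(grades.values(), reverse=True)[:k]
        threshold = t(*(lst[depth - 1][1] for lst in lists))
        if len(best) == k and best[-1] >= threshold:
            break
    # Every sorted access is followed by m-1 random accesses.
    return m * depth, m * depth * (m - 1)

L1 = [("a", 0.9), ("b", 0.8), ("c", 0.72), ("d", 0.6)]
L2 = [("d", 0.9), ("a", 0.85), ("b", 0.7), ("c", 0.2)]
print(fa_access_counts([L1, L2], 2))  # (6, 2)
print(ta_access_counts([L1, L2], 2))  # (6, 6)
```

Both stop after three rounds (6 sorted accesses), but FA needs 2 random accesses and TA needs 6.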

17
The best algorithm
  • Which algorithm is best, TA or FA?
  • Define "best"
  • middleware cost
  • concept of instance optimality
  • Consider
  • wild guesses
  • aggregation function characteristics
  • monotone, strictly monotone, strict
  • database restrictions
  • distinctness property

18
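The best algorithm: concept of optimality
$\mathbf{A}$: a class of algorithms; $A \in \mathbf{A}$ represents an
algorithm
$\mathbf{D}$: legal inputs to the algorithms (databases); $D \in \mathbf{D}$
represents a database
Middleware cost: cost of processing the data in the
subsystems, $s \cdot c_S + r \cdot c_R$ for $s$ sorted accesses at cost $c_S$ each and $r$ random accesses at cost $c_R$ each
$\mathrm{cost}(A, D)$: middleware cost when running
algorithm $A$ over database $D$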
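The middleware cost is a weighted sum of access counts. The one-line Python helper below (hypothetical name `middleware_cost`) uses illustrative unit costs c_S = 1 and c_R = 10, which are assumed values, together with the access counts FA and TA make on the four-object running example (6 sorted accesses for both; 2 random accesses for FA and 6 for TA, counting m-1 random accesses per sorted access in TA).

```python
def middleware_cost(s, r, c_s, c_r):
    """Middleware cost s*c_S + r*c_R for s sorted and r random accesses."""
    return s * c_s + r * c_r

# Illustrative (assumed) unit costs: a random access costs 10 sorted accesses.
C_S, C_R = 1, 10
print("FA:", middleware_cost(6, 2, C_S, C_R))  # FA: 26
print("TA:", middleware_cost(6, 6, C_S, C_R))  # TA: 66
```

With expensive random access, TA's extra random accesses dominate on this tiny example.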
19
The best algorithm: instance optimality and wild
guesses
  • Intuitively, B is instance optimal if it is always the
    best algorithm in $\mathbf{A}$
  • always optimal
  • In reality, is "always" really always? → we will
    exclude algorithms that make wild guesses
  • Wild guess: random access on an object not
    previously encountered by sorted
    access
  • In practice not possible
  • The database needs to know the ID to do a random access
  • If wild guesses are allowed in $\mathbf{A}$, then no algorithm
    can be instance optimal
  • Wild guesses can find the top-k objects with $km$
    random accesses
  • ($k$ objects, $m$ lists)
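For reference, the formal notion behind "always the best" in the Fagin, Lotem, and Naor paper allows a constant factor and an additive constant; the constant $c$ is called the optimality ratio:

```latex
B \in \mathbf{A} \text{ is instance optimal over } \mathbf{A} \text{ and } \mathbf{D}
\iff \exists\, c, c' \;\forall A \in \mathbf{A},\ D \in \mathbf{D}:\quad
\mathrm{cost}(B, D) \le c \cdot \mathrm{cost}(A, D) + c'
```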

20
The best algorithm: aggregation functions
  • Aggregation function $t$ combines an object's grades
    into the object's overall grade
  • $(x_1, \ldots, x_m) \mapsto t(x_1, \ldots, x_m)$
  • Monotone
  • $t(x_1, \ldots, x_m) \le t(x'_1, \ldots, x'_m)$ if $x_i \le x'_i$ for every
    $i$
  • Strictly monotone
  • $t(x_1, \ldots, x_m) < t(x'_1, \ldots, x'_m)$ if $x_i < x'_i$ for every
    $i$
  • Strict
  • $t(x_1, \ldots, x_m) = 1$ precisely when $x_i = 1$ for every
    $i$
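The last two properties can be probed the same way on a finite grid. In this hypothetical Python sketch, min and the average come out strictly monotone and strict, while max is strictly monotone but not strict (max(1, 0) = 1). A grid check can expose counterexamples but does not prove a property.

```python
from itertools import product

GRID = [0.0, 0.25, 0.5, 0.75, 1.0]

def is_strictly_monotone(t, m, grid=GRID):
    """t(x) < t(y) whenever x_i < y_i for every i (checked on grid points only)."""
    points = list(product(grid, repeat=m))
    return all(t(*x) < t(*y)
               for x in points for y in points
               if all(a < b for a, b in zip(x, y)))

def is_strict(t, m, grid=GRID):
    """t(x) == 1 exactly when every x_i == 1 (checked on grid points only)."""
    return all((t(*x) == 1.0) == all(v == 1.0 for v in x)
               for x in product(grid, repeat=m))

def avg(*xs):
    return sum(xs) / len(xs)

for name, t in [("min", min), ("max", max), ("avg", avg)]:
    print(name, is_strictly_monotone(t, 2), is_strict(t, 2))
# min True True
# max True False
# avg True True
```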

21
The best algorithm: database restrictions
Distinctness property: a database has no (sorted)
attribute list in which two objects have the same
grade
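Checking the distinctness property is a simple per-list test. A minimal Python sketch with a hypothetical helper name, applied to the four-object running example (an assumption about the elided entries) and to a made-up list with a tie:

```python
def has_distinctness_property(lists):
    """True if no list contains two objects with the same grade."""
    return all(len({grade for _, grade in lst}) == len(lst) for lst in lists)

L1 = [("a", 0.9), ("b", 0.8), ("c", 0.72), ("d", 0.6)]
L2 = [("d", 0.9), ("a", 0.85), ("b", 0.7), ("c", 0.2)]
print(has_distinctness_property([L1, L2]))                    # True
print(has_distinctness_property([[("x", 0.5), ("y", 0.5)]]))  # False
```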
22
The best algorithm: Fagin's Algorithm
  • Database with $N$ objects, each with $m$
    attributes
  • Orderings of the lists are independent
  • FA finds the top k with middleware cost
    $O(N^{(m-1)/m} k^{1/m})$
  • FA is optimal with high probability in the worst
    case for strict monotone aggregation
    functions
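To get a feel for the FA bound, the sketch below evaluates $N^{(m-1)/m} k^{1/m}$ with constants dropped, so only the order of growth is meaningful. For N = 10^6 objects, m = 2 lists, and k = 10 it gives about 3162, compared with 2·10^6 entries for reading both lists completely. The function name is hypothetical.

```python
def fa_cost_order(N, m, k):
    """N**((m-1)/m) * k**(1/m): the growth rate in FA's cost bound, constants dropped."""
    return N ** ((m - 1) / m) * k ** (1 / m)

print(round(fa_cost_order(10**6, 2, 10)))  # 3162
print(round(fa_cost_order(10**6, 3, 10)))  # 21544
```

As m grows, the exponent (m-1)/m approaches 1, so the advantage over a full scan shrinks.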

23
The best algorithm: Threshold Algorithm
  • TA is instance optimal (always optimal) for every
    monotone aggregation function, over every database (excluding wild
    guesses)
  • optimal in a much stronger sense than Fagin's
    Algorithm
  • If the aggregation function is strict and monotone:
  • optimality ratio $m + m(m-1)c_R/c_S$, which is the best
    possible ($m$ attributes)
  • If random access is free ($c_R = 0$) →
    optimality ratio $m$
  • If sorted access is free ($c_S = 0$) →
    optimality ratio is infinite
  • → TA is not instance optimal
  • TA is instance optimal (always optimal) for every
    strictly monotone aggregation function, over
    every database (including wild guesses) that
    satisfies the distinctness property
  • Optimality ratio $c m^2$ with $c = \max\{c_R/c_S, c_S/c_R\}$
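Both optimality ratios are closed forms. This Python sketch evaluates them for illustrative costs; the numbers are assumptions, not from the slides, and the function names are hypothetical.

```python
def ta_ratio_no_wild_guesses(m, c_r, c_s):
    """Optimality ratio m + m(m-1) c_R / c_S (no wild guesses, strict t)."""
    return m + m * (m - 1) * c_r / c_s

def ta_ratio_distinct(m, c_r, c_s):
    """Optimality ratio c m^2 with c = max(c_R/c_S, c_S/c_R) (distinctness)."""
    c = max(c_r / c_s, c_s / c_r)
    return c * m ** 2

print(ta_ratio_no_wild_guesses(2, c_r=10, c_s=1))  # 22.0
print(ta_ratio_no_wild_guesses(2, c_r=0, c_s=1))   # 2.0
print(ta_ratio_distinct(2, c_r=10, c_s=1))         # 40.0
```

The second call reproduces the $c_R = 0$ case: the ratio collapses to $m$.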

24
Extending TA
  • What if sorted access is restricted? e.g. a
    distance database
  • $\mathrm{TA}_Z$
  • What if random access is not possible? e.g. a web
    search engine
  • No Random Access Algorithm (NRA)
  • What if we want only approximate top-k
    objects?
  • $\mathrm{TA}_\theta$
  • What if we consider the relative costs of random and
    sorted access?
  • Combined Algorithm (CA), between TA and NRA

25
NRA
  • What if we also want the scores?
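NRA is not spelled out on this slide; the following is a sketch of NRA as we understand it from the Fagin, Lotem, and Naor paper, using sorted access only. For every seen object it keeps a lower bound W (missing grades treated as 0) and an upper bound B (missing grades replaced by the last grade read from that list), and halts when no other seen object and no unseen object can beat the k-th largest lower bound. It returns bounds rather than exact grades, which is why the slide asks about scores. The function name and the four-object lists are assumptions.

```python
def nra(lists, k, t=min):
    """No Random Access algorithm (NRA), a sketch using sorted access only.

    Assumes equal-length lists and grades in [0, 1]. Returns
    (top_k, depth): top_k lists (object_id, lower_bound, upper_bound) triples.
    """
    m = len(lists)
    seen = {}            # object_id -> {list_index: grade} read so far
    bottom = [1.0] * m   # last grade read from each list
    top = []
    for depth in range(len(lists[0])):
        for i, lst in enumerate(lists):
            obj, grade = lst[depth]
            seen.setdefault(obj, {})[i] = grade
            bottom[i] = grade
        # W: missing grades replaced by 0; B: replaced by the list's bottom
        # grade. Both are valid bounds because t is monotone.
        W = {o: t(*(g.get(i, 0.0) for i in range(m))) for o, g in seen.items()}
        B = {o: t(*(g.get(i, bottom[i]) for i in range(m))) for o, g in seen.items()}
        top = sorted(seen, key=lambda o: (W[o], B[o]), reverse=True)[:k]
        if len(top) < k:
            continue
        m_k = W[top[-1]]  # k-th largest lower bound
        others_done = all(B[o] <= m_k for o in seen if o not in top)
        unseen_done = t(*bottom) <= m_k  # best grade of a never-seen object
        if others_done and unseen_done:
            return [(o, W[o], B[o]) for o in top], depth + 1
    return [(o, W[o], B[o]) for o in top], len(lists[0])

L1 = [("a", 0.9), ("b", 0.8), ("c", 0.72), ("d", 0.6)]
L2 = [("d", 0.9), ("a", 0.85), ("b", 0.7), ("c", 0.2)]
result, depth = nra([L1, L2], k=2)
print(result, depth)  # [('a', 0.85, 0.85), ('b', 0.7, 0.7)] 4
```

On this example NRA reads four rounds, one more than TA, because without random access it must wait until d's upper bound falls to 0.7 or below.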

26
Combined Algorithm (CA)
CA is instance optimal
27
Approximation
  • A θ-approximation to the top k answers for the
    aggregation function $t$ is a collection of $k$
    objects (each along with its grade) such that for
    each $y$ among these $k$ objects and each $z$ not among
    these $k$ objects, $\theta \, t(y) \ge t(z)$ (with $\theta > 1$)
  • $\mathrm{TA}_\theta$: as soon as at least $k$ objects have been
    seen whose grade is at least equal to $T/\theta$
    ($T$ = threshold value), halt.
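$\mathrm{TA}_\theta$ changes only TA's stopping test. Below is a Python sketch under the same assumptions as a plain TA sketch (four-object example, dictionary lookups for random access); `ta_theta` is a hypothetical name. With θ = 1.25 it halts after two rounds instead of three on this example and still returns a and b.

```python
def ta_theta(lists, k, theta, t=min):
    """TA_theta sketch: halt once k buffered objects have grade >= T / theta.

    theta >= 1; theta = 1 gives plain TA. Assumes equal-length lists and
    simulates random access with a dict lookup.
    Returns (answer, depth): answer is a list of (object_id, grade) pairs.
    """
    if theta < 1:
        raise ValueError("theta must be at least 1")
    m = len(lists)
    random_access = [dict(lst) for lst in lists]
    buffer = {}  # at most k objects with the highest overall grades so far
    depth = 0
    while depth < len(lists[0]):
        last = []
        for i in range(m):
            obj, grade = lists[i][depth]
            last.append(grade)
            buffer[obj] = t(*(random_access[j][obj] for j in range(m)))
            if len(buffer) > k:
                del buffer[min(buffer, key=buffer.get)]
        depth += 1
        threshold = t(*last)
        if len(buffer) == k and min(buffer.values()) >= threshold / theta:
            break
    answer = sorted(buffer.items(), key=lambda kv: kv[1], reverse=True)
    return answer, depth

L1 = [("a", 0.9), ("b", 0.8), ("c", 0.72), ("d", 0.6)]
L2 = [("d", 0.9), ("a", 0.85), ("b", 0.7), ("c", 0.2)]
print(ta_theta([L1, L2], k=2, theta=1.25))  # ([('a', 0.85), ('b', 0.7)], 2)
print(ta_theta([L1, L2], k=2, theta=1.0))   # ([('a', 0.85), ('b', 0.7)], 3)
```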
