Skyline Queries - PowerPoint PPT Presentation

About This Presentation
Title:

Skyline Queries

Description:

Bitmap-based Algorithm. B -tree-based Algorithm. They can return the first ... Both bitmap and tree-base returns ... Bitmap representation. Do (4,8) is a ... – PowerPoint PPT presentation

Number of Views:975
Avg rating:3.0/5.0
Slides: 42
Provided by: Eri6
Category:

less

Transcript and Presenter's Notes

Title: Skyline Queries


1
Skyline Queries
  • HKU CSIS DB Seminar
  • 9 April 2003
  • Speaker Eric Lo

2
Skyline
  • A new operator in database systems
  • Filters out a set of interesting points from a
    potential large set of data points
  • A data point is interesting if it is not
    dominated by any other point

3
Example
  • Find some good places for us to hold the next DB
    Seminar
  • Dataset (Table Homes)

Home Distance from HKU Area (m2)
Kevin 1 km 10
Ben 9 km 100
Felix 5 km 2
K.K Loo 8 km 250
  • Good ? Close to HKU (Min.)
  • Good ? Max. Area (Max.)
  • Return those homes that are not worse than any
    others in ALL DIMENSIONS

4
Outline
  • Introduction to Skyline Queries
  • Skyline Operator in SQL
  • Implementation Algorithms
  • Progressive Algorithms
  • Variations of Skyline Queries
  • Experimental Result
  • Conclusion

5
The Skyline OperatorICDE 2001S.Borzonyi,
D.Kossmann, K.Stocker
  1. Define the skyline operator in databases
  2. Extension of SQL for skyline
  3. Block-nested-loop Algorithm
  4. Divide-and-conquer Algorithm

6
Problem Definition
  • Related to
  • maximum vector problem
  • contour problem
  • convex hull of a data set
  • Assume the whole dataset fits in the memory

7
SQL Extensions
  • SELECT FROM WHERE GROUP BY HAVING
    SKYLINE OF DISTINCT d1 MIN MAX,
    dm MIN MAXORDER BY
  • d1 dm denote the dimensions participate the
    Skyline
  • SELECT FROM HOMESWHERE CITYHKSKYLINE OF
    DIST MIN, AREA MAX

8
Naïve Approach for Skyline
  • 1D skyline is equivalent to computing min, max in
    SQL
  • Naïve 2D skyline
  • Sort the data according to the 2 dimensions
  • Compare every tuple with its predecessor
  • Sorting may need in 2 or more passes if the data
    are not fit into memory ? use current external
    sorting techniques

9
Naïve 2D
Home Distance from HKU Area
Kevin 1 km 10
Felix 5 km 2
KK 8 km 250
Ben 9 km 100
  1. Sort by Distance
  2. Compare Felix with Kevin ? eliminate Felix
  3. Compare KK with Kevin ? incompatible ? part
    of skyline
  4. Compare Ben with KK ? eliminate Ben

10
Naïve 2D not works for gt 2Ds
  • If skyline involves more than 2D, sorting does
    not work

Home Distance from HKU Area Rent
Kevin 1 km 10 9
Felix 5 km 2 5
KK 8 km 250 10
Ben 9 km 100 9
2D 3D
Cmp Felix, Kevin ? eliminated Cmp KK, Kevin ? part of skyline Cmp Ben, KK ? eliminated Cmp Felix, Kevin ? part of skyline Cmp KK, Felix ? part of skyline Cmp Ben, KK ? part of skyline ? No! Ben dominated by Kevin ? predecessor not work!
11
Block-nested-loops Algorithm
  • A straightforward approach
  • Compare each point p with every other point
  • If p is not dominated ? part of skyline
  • Scan the data file and keeping a list of
    candidate skyline points in main memory

12
BNL cont.
  • Insert the 1st data point into the list
  • For each subsequent point p
  • If p is dominated by any point in the list, it is
    discarded
  • If p dominates any point in the list, insert it
    into the list and remove all points dominated by
    p
  • If p neither dominated, nor dominates any point
    in the list, inserted it as part of the skyline
    in the list

13
BNL cont.
  • The candidate list is self-organizing
  • Points that have been dominated other points are
    moved to the top of list
  • Reduces the number of comparisons
  • E.g. the self-organizing list holdings the
    partial skyline like

Home Distance from HKU Area Rent
Kevin 1 km 249 1
K.K 8 km 250 100
... Other skylines which is not as strong as Kevin except a few dimensions ... Other skylines which is not as strong as Kevin except a few dimensions ... Other skylines which is not as strong as Kevin except a few dimensions ... Other skylines which is not as strong as Kevin except a few dimensions
14
More on BNL
  • Point 3 in BNL If p neither dominated, nor
    dominates any point in the list, inserted it as
    part of the skyline in the list. If the are no
    more space in the list, write p on a temporary
    file on disk. Tuples in tmp file will be further
    processed in next iteration of algorithm

15
More on BNL (cont.)
A
E
F
H
A
B
C
D
E
F
G
H
I
J
Dom. by A
After 1st Iteration, A,E,G,H areoutput as
skylines, then clear upthe list and treat I,J
as newdata set and perform BNL again
Dom. by A
G
Dom. by A
Incompatible with A
Incompatible with A,E
I
J



Dom On F, replace F
Incompatible with A,E,G
Incompatible with A,E,G,H, but full! Incompatible
with A,E,G,H, but full!J has not compare with I
16
Short summary on BNL
  • Easy to implement
  • Any dimension without using index or sorting
  • Relies on main memory ? may have many iterations
  • Not adequate for on-line processing ? it has to
    read the entire data file before it returns the
    first skyline point (not progressively)

17
Divide-and-Conquer Algorithm
Price Dist
0.2 4
0.8 4
0.4 3
0.3 2
0.1 2
0.6 1
0.8 3
0.2 3
0.3 3
0.5 2
  1. Find the median of some dimension, sayprice,
    Price(med)0.3
  2. Split the input into 2 partitions according to
    Price(med)
  3. Compute Skyline S1 in P1(lt0.3) and S2 in
    P2(gt0.3) respectively by recursive
    partitioning.Note S1 is better than S2 on
    price
  4. Recursive partitioning until a partition contains
    very few (or 1) tuples
  5. If only a few tuples, find out skyline is very
    easy
  6. Merging the skylines of each partitionsby
    eliminating those S2 which are dominated by
    S1Note None of the tuples in S1 can be
    dominatedby S2 as all tuples in S1 are better
    than S2 on pricei.e. tuples in UPPER never be
    eliminated

18
Divide-and-Conquer Algorithm
0.3 2
0.3 3
0.2 4
0.2 3
0.1 2
Price Dist
0.2 4
0.8 4
0.4 3
0.3 2
0.1 2
0.6 1
0.8 3
0.2 3
0.3 3
0.5 2
  1. Find the median of some dimension, sayprice,
    Price(med)0.3
  2. Split the input into 2 partitions according to
    Price(med)

0.8 4
0.6 1
0.5 2
0.4 3
0.8 3
19
Divide-and-Conquer Algorithm
0.2 4
0.2 3
0.1 2

0.3 2
0.3 3
0.3 2
0.3 3
0.2 4
0.2 3
0.1 2
S1
Price Dist
0.2 4
0.8 4
0.4 3
0.3 2
0.1 2
0.6 1
0.8 3
0.2 3
0.3 3
0.5 2
S2
S3
0.4 3
0.5 2

0.6 1
0.8 3
0.8 4
0.8 4
0.6 1
0.5 2
0.4 3
0.8 3
S4
S5
S6
S7
20
Divide-and-Conquer Algorithm
0.2 4
0.2 3
0.1 2

0.3 2
0.3 3
0.3 2
0.3 3
0.2 4
0.2 3
0.1 2
S1
Price Dist
0.2 4
0.8 4
0.4 3
0.3 2
0.1 2
0.6 1
0.8 3
0.2 3
0.3 3
0.5 2
S2
S1, S2
S3
S1,S2, S7
i.e. tuples in UPPER never be eliminated
0.4 3
0.5 2

0.8 4
0.8 3
0.6 1
0.8 4
0.6 1
0.5 2
0.4 3
0.8 3
S4
S5
S4, S5, S7
S6
S7
21
Efficient Progressive Skyline ComputationVLDB
2001K.L. Tan, P.K. Eng, B.C. Ooi
  • Previous approach require at least one
  • pass over the dataset to return the first
  • interesting point, We propose
  • Bitmap-based Algorithm
  • B-tree-based Algorithm
  • They can return the first interesting point once
  • they identified.

22
Progressive?
  • Both bitmap and tree-base returns skyline very
    quickly
  • Maybe useful if you are not willing to wait so
    long for the first few interesting homes out of
    the large dataset
  • Also outperform BNL and D--C in overall response
    time

23
Skyline by Bitmap
  • Main idea
  • Given a point p, if something can tell you
  • p is not dominated by any other points in DB ?
    skyline!
  • p is dominated by some points in DB ? throw away
  • Non-blocking!
  • Can return the skyline points immediately

24
Bitmap
  • All information requires to decide whether a
    point is in skyline are encoded in bitmaps
  • A data point p (p1, p2, , pd) where d is no.
    of dimensions, is mapped to a m-bit vector, m is
    number of distinct values over all dimensions

25
Bitmap
  • Distinct values on price and distance is 7 and 4
  • m 11

Price Dist
0.2 4
0.8 4
0.4 3
0.3 2
0.1 2
0.6 1
0.8 3
0.2 3
0.3 3
0.5 2
26
Bitmap representation
  • Distinct value on x 10
  • Distinct value on y 10
  • m20
  • 20-bit vector
  • E.g (4,8)
  • 4 is 4-th smallest on dimension x, set 4-th to
    the leftmost be 1 (starting from right)
  • 8 is 8-th smallest on y, set 8-th to the leftmost
    be 1)

Point Bitmap Representation
(1,9) (1111111111, 1100000000)
(2,10) (1111111110, 1000000000)
(4,8 ) (1111111000, 1110000000)
(6,7 ) (1111100000, 1111000000)
(9,10 ) (1100000000, 1000000000)
(7,5 ) (1111000000, 1111110000)
(5,6 ) (1111110000, 1111100000)
(4,3 ) (1111111000, 1111111100)
(3,2 ) (1111111100, 1111111110)
(9,1 ) (1100000000, 1111111111)
(10,4 ) (1000000000, 1111111000)
(6,2 ) (1111100000, 1111111110)
(8,3) (1110000000, 1111111100)
27
Bitmap representation
  • Do (4,8) is a skyline point? (min x, y)
  • Create bit-strings Cx and Cy (Not CY Ng!)
  • Cx 1110000110000
  • Cy 0011011111111
  • Cx Cy 0010000110000
  • If CxCy has more than one 1, dominated by some
    points

Point Bitmap Representation
(1,9) (1111111111, 1100000000)
(2,10) (1111111110, 1000000000)
(4,8 ) (1111111000, 1110000000)
(6,7 ) (1111100000, 1111000000)
(9,10 ) (1100000000, 1000000000)
(7,5 ) (1111000000, 1111110000)
(5,6 ) (1111110000, 1111100000)
(4,3 ) (1111111000, 1111111100)
(3,2 ) (1111111100, 1111111110)
(9,1 ) (1100000000, 1111111111)
(10,4 ) (1000000000, 1111111000)
(6,2 ) (1111100000, 1111111110)
(8,3) (1110000000, 1111111100)
28
Bitmap representation
  • Do (3,2) is a skyline point? (min x, y)
  • Create bit-strings Cx and Cy
  • Cx 1100000010000
  • Cy 0000000011010
  • Cx Cy 0000000010000
  • If CxCy has only 1, it is a skyline

Point Bitmap Representation
(1,9) (1111111111, 1100000000)
(2,10) (1111111110, 1000000000)
(4,8 ) (1111111000, 1110000000)
(6,7 ) (1111100000, 1111000000)
(9,10 ) (1100000000, 1000000000)
(7,5 ) (1111000000, 1111110000)
(5,6 ) (1111110000, 1111100000)
(4,3 ) (1111111000, 1111111100)
(3,2 ) (1111111100, 1111111110)
(9,1 ) (1100000000, 1111111111)
(10,4 ) (1000000000, 1111111000)
(6,2 ) (1111100000, 1111111110)
(8,3) (1110000000, 1111111100)
29
Short summary on Bitmap
  • Need to pre-compute bitmap representation of
    every point
  • Each point retrieve all bitmaps in order to get
    the juxtapositions (Cx and Cy)
  • Large storage if the domain of each attributes
    are large

30
Some other progressive algorithms
  • B-tree index (also proposed by BOC)
  • Organizes the points into d lists (d is no. of
    dimensions in data)
  • Build Btree on the lists for retrieving skylines
  • Suffer similar problem as bitmap approach

31
Some other progressive algorithms (cont.)
  • NN algorithm (by Donald Kossmann again)
  • VLDB 02

32
NN skyline
33
Outline
  • Introduction to Skyline Queries
  • Skyline Operator in SQL
  • Implementation Algorithms
  • Progressive Algorithms
  • Variations of Skyline Queries
  • Experimental Result
  • Conclusion

34
An Optimal and Progressive Algorithm for Skyline
QueriesSIGMOD 2003D.Papadias, Y. Tao, G. Fu, B.
Seeger
  • We propose
  • A NN algorithm which is more efficient andI/O
    Optimal
  • Ranked skyline queries
  • Constrained skyline queries
  • Dynamic skyline queries
  • K-dominating queries

35
Ranked Skyline
  • A ranked skyline returns K skyline points that
    have minimum/max score according to a function f
  • In our example, f 3Dist 7Area
  • Return the top K homes
  • Though skylines are returning interesting points,
    we may want the most interesting points according
    to our own preferences, especially the data set
    is large(say hotels) and skyline is also large!

36
Constrained Skyline
  • Returning the most interesting points in a
    specific data space

37
Dynamic Skyline
  • Returning update skyline dynamically
  • E.g. Ask for hotels with minimum distance and
    price (again?)
  • Minimum distance is now depends on my current
    location

38
Enumerating Skyline
  • Enumerate queries return, for each skyline point
    p, the number of points dominated by p
  • Sometime useful if you want to know this skyline
    hotel C has dominated 1000 hotels, and another
    hotel Y dominated only 1 hotel
  • maybe C is better than Y in many properties (e.g.
    price, dist, etc), but Y has only 1 properties
    better than C, e.g. with PS2

39
Experimental Evaluation
  • Running time comparison on progressive algorithms
    without NN approaches

BNL
DC
Bitmap
Index
40
Conclusion
  • Introduction the skyline queries
  • How to implement (support) the skyline operator
    in DBMS?
  • Variation of skyline queries
  • If the information are placed in different
    places, how to answer skyline queries on a mobile
    device?

41
References
  • S.Borzonyi, D.Kossmann, K.Stocker. The Skyline
    Operator. ICDE 2001.
  • K.L. Tan, P.K. Eng, B.C. Ooi . Efficient
    Progressive Skyline Computation. VLDB 2001.
  • D.Kossmann, F.Ramsak, S. Rost. Shooting Stars in
    the Sky An Online Algorithm for Skyline Queries.
    VLDB 2002.
  • D.Papadias, Y. Tao, G. Fu, B. Seeger. An Optimal
    and Progressive Algorithm for Skyline Queries.
    SIGMOD 2003.
Write a Comment
User Comments (0)
About PowerShow.com