Stabbing the Sky: Efficient Skyline Computation over Sliding Windows

Slides: 29
Provided by: Acute

1
Stabbing the Sky: Efficient Skyline Computation
over Sliding Windows
  • COMP9314 Lecture Notes

2
Outline
  • Introduction
  • n-of-N Queries
  • (n1, n2)-of-N Queries
  • Performance Evaluation
  • Conclusions

3
Skyline
  • Skyline Query
  • Input: a set of points in d-dimensional space.
  • Output: the points not dominated by any other point.
  • (x1, x2, ..., xd) dominates (y1, y2, ..., yd) iff
    xi ≤ yi for all i (1 ≤ i ≤ d) and ∃k such that xk < yk.
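
As a minimal sketch of the dominance test and the skyline definition above, assuming smaller values are better on every dimension. The function names and the sample points are illustrative assumptions, not part of the lecture notes.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(x: Sequence[float], y: Sequence[float]) -> bool:
    """x dominates y iff x <= y on every dimension and x < y on at least one."""
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))


def skyline(points: List[Point]) -> List[Point]:
    """Naive O(n^2) skyline: keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical 2-d sample (not taken from the slides).
pts = [(1, 5), (4, 1), (5, 4), (6, 3), (2, 6)]
print(skyline(pts))  # [(1, 5), (4, 1)]
```

The quadratic scan is only meant to make the definition concrete; the algorithms cited on the Challenges slide are the efficient ways to compute a static skyline.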

4
Applications
  • Multi-criteria decision making
  • Stock Trading Example
  • What are the top deals?

5
Skyline Query Over Sliding Window
  • Stock Trading Example
  • Top deals of a stock in the last 5 mins? In the last 4
    mins? ...
  • Top deals of a stock in the last 10K deals?
  • Queries
  • n-of-N model (∀n ≤ N): the most recent n
    elements
  • (n1, n2)-of-N model
  • One-time queries
  • Continuous queries

6
Challenges
  • Insertions and deletions (possibly at high speed).
  • On-line information
  • memory requirement
  • processing speed
  • Existing techniques do not support n-of-N
  • Borzsonyi et al. (ICDE01), Tan et al. (VLDB01),
    Kossmann et al. (VLDB 02), Papadias et al.
    (SIGMOD03), Kapoor (SIAM J. comp00)
  • support computation over the whole dataset only
  • O(n log^(d-2) n) for d ≥ 4; O(n log n) otherwise

7
Results
  • n-of-N
  • keep only N' (N' ≤ N) elements, where N' = O(log^d N)
    if the data distribution on each dimension is
    independent.
  • a novel encoding scheme, with O(N) space, leads
    to n-of-N query time O(log N + s) instead of
    O(n log^(d-2) n), where s is the number of skyline points.
  • a new trigger-based technique for continuously
    processing an n-of-N query:
  • trigger update time: O(log s).
  • result update time: O(log δ), where δ is the size of the
    result change.
  • (n1, n2)-of-N: similar results.

8
n-of-N Queries
  • Redundant point: e in P_N (the most recent N
    elements) is redundant iff
  • e expires w.r.t. P_N, or
  • ∃e' s.t. e' dominates e, and e' is younger than e
  • Example (figure): N = 6

9
Optimality
  • Theorem: non-redundant points (R_N) vs. n-of-N
    skyline query result (Q_{n,N})
  • No point of (P_N − R_N) appears in any Q_{n,N}
  • Q_{n,N} must be a subset of R_N
  • ∀x ∈ R_N, ∃n such that x ∈ Q_{n,N}
  • |R_N| = O(log^(d-1) N) for independent distributions
  • Only R_N needs to be kept: it is the minimum number of
    elements to be kept.
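
A small sketch of the pruning rule, assuming timestamps 1..M in arrival order and smaller-is-better coordinates. It recomputes R_N from scratch with a quadratic scan, which is only for illustration; the helper names and the eight 2-d points are hypothetical.

```python
from typing import Dict, List, Sequence


def dominates(x: Sequence[float], y: Sequence[float]) -> bool:
    """x dominates y iff x <= y on every dimension and x < y on at least one."""
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))


def non_redundant(stream: List[Sequence[float]], N: int) -> List[int]:
    """Timestamps of R_N: elements of P_N not dominated by a younger element."""
    M = len(stream)
    # P_N: the most recent N elements, keyed by timestamp (1-based).
    window: Dict[int, Sequence[float]] = {
        t: stream[t - 1] for t in range(max(1, M - N + 1), M + 1)
    }
    return [
        t for t, p in window.items()
        if not any(u > t and dominates(q, p) for u, q in window.items())
    ]


# Hypothetical stream; element t is stream[t - 1].
stream = [(3, 3), (7, 7), (1, 5), (4, 1), (5, 4), (6, 3), (2, 6), (6, 2)]
print(non_redundant(stream[:7], N=5))  # [3, 4, 5, 6, 7]
print(non_redundant(stream, N=5))      # [4, 5, 7, 8]
```

In the second call element 6 is dropped because the younger element 8 dominates it, so 6 can never appear in any n-of-N result again.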

10
Querying R_N
  • Critical dominance: e' → e, where e' is the
    youngest element that dominates e.
  • Dominance graph G_{R_N}: R_N together with the critical
    dominance relationships.

11
Querying R_N
  • e ∈ Q_{n,N} iff
  • e is a root in G_{R_N} with t(e) ≥ M − n + 1, or
  • e' → e in G_{R_N} and e' has expired from the last n
    elements: t(e') < M − n + 1 ≤ t(e)

Example (M = 7):
n   R_N          M − n + 1   Q_{n,N}
5   3,4,5,6,7    3           3,4
4   4,5,6,7      4           4,7
3   5,6,7        5           5,6,7
12
Querying R_N: Optimal Algorithm
  • To answer an n-of-N query, encode G_{R_N} using
    intervals.
  • Stab the intervals with (M − n + 1).
  • For each returned interval (x, y], return the point
    whose timestamp is y.
  • Technique: use an interval tree index to achieve
    optimal O(log |R_N| + s) query time.

e.g., n = 6
Intervals: (0,3], (0,4], (3,7], (4,5], (4,6]
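
The encoding and the stabbing query can be sketched as below. Two assumptions are made here: each critical dominance edge e' → e is stored as the half-open interval (t(e'), t(e)] and each root as (0, t(e)], which is what the intervals listed above suggest; and a linear scan stands in for the interval tree, so this sketch does not have the O(log |R_N| + s) bound. The coordinates are hypothetical, chosen to reproduce the five intervals from the slide.

```python
from typing import Dict, List, Sequence, Tuple


def dominates(x: Sequence[float], y: Sequence[float]) -> bool:
    """x dominates y iff x <= y on every dimension and x < y on at least one."""
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))


def encode_intervals(rn: Dict[int, Sequence[float]]) -> List[Tuple[int, int]]:
    """One interval (a, t] per point t of R_N; a is the timestamp of its
    critical dominator (youngest older dominator), or 0 for a root."""
    intervals = []
    for t, p in sorted(rn.items()):
        a = max((u for u, q in rn.items() if u < t and dominates(q, p)), default=0)
        intervals.append((a, t))
    return intervals


def n_of_n(intervals: List[Tuple[int, int]], M: int, n: int) -> List[int]:
    """Stab with M - n + 1 and report the right endpoint of every hit."""
    stab = M - n + 1
    return sorted(t for a, t in intervals if a < stab <= t)


# Hypothetical coordinates for R_N = {3, 4, 5, 6, 7} at M = 7.
rn = {3: (1, 5), 4: (4, 1), 5: (5, 4), 6: (6, 3), 7: (2, 6)}
intervals = encode_intervals(rn)
print(intervals)                 # [(0, 3), (0, 4), (4, 5), (4, 6), (3, 7)]
for n in (5, 4, 3):
    print(n, n_of_n(intervals, M=7, n=n))
```

With these points the three queries return 3,4 then 4,7 then 5,6,7, matching the example table for M = 7.
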
13
Maintaining R_N
  • A new element e_new arrives:
  • If the oldest element e_old ∈ R_N expires, remove e_old and
    update R_N and G_{R_N} (interval tree).
  • Find the set D ⊆ R_N of points dominated by e_new; update
    R_N and G_{R_N}.
  • Depth-first search on an R-tree of R_N.
  • Find e such that e →c e_new (critical dominance); update G_{R_N}.
  • Best-first search on the R-tree of R_N.

Example: e_new = 8, e_old = 3, D = {6}, e = 4
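
The arrival steps above can be sketched as follows. This is a simplified illustration rather than the lecture's implementation: plain linear scans stand in for the R-tree searches and the interval tree, and the class name NofNWindow, its methods, and the eight sample points are hypothetical (chosen so that element 8 removes D = {6} and gets critical dominator 4, as in the example line).

```python
from typing import Dict, List, Sequence


def dominates(x: Sequence[float], y: Sequence[float]) -> bool:
    """x dominates y iff x <= y on every dimension and x < y on at least one."""
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))


class NofNWindow:
    """Maintains R_N and its critical-dominance graph under arrivals."""

    def __init__(self, N: int) -> None:
        self.N = N
        self.M = 0                                    # elements seen so far
        self.points: Dict[int, Sequence[float]] = {}  # R_N: timestamp -> point
        self.parent: Dict[int, int] = {}              # critical dominator, 0 = root

    def insert(self, p: Sequence[float]) -> None:
        self.M += 1
        expired = self.M - self.N                     # timestamp leaving P_N
        # Step 1: drop e_old if it is still stored; its children become roots.
        self.points.pop(expired, None)
        self.parent.pop(expired, None)
        self.parent = {t: (a if a > expired else 0) for t, a in self.parent.items()}
        # Step 2: remove D, the stored points dominated by the new element.
        for t in [t for t, q in self.points.items() if dominates(p, q)]:
            del self.points[t]
            del self.parent[t]
        # Step 3: the critical dominator is the youngest stored dominator.
        a = max((t for t, q in self.points.items() if dominates(q, p)), default=0)
        self.points[self.M] = p
        self.parent[self.M] = a

    def query(self, n: int) -> List[int]:
        """n-of-N skyline (n <= N) by stabbing the intervals (parent, t]."""
        stab = max(1, self.M - n + 1)
        return sorted(t for t, a in self.parent.items() if a < stab <= t)


stream = [(3, 3), (7, 7), (1, 5), (4, 1), (5, 4), (6, 3), (2, 6), (6, 2)]
w = NofNWindow(N=5)
for p in stream[:7]:
    w.insert(p)
print(w.query(4), w.query(5))    # [4, 7] [3, 4]
w.insert(stream[7])              # element 8 arrives, element 3 expires
print(w.query(4), w.query(5))    # [5, 7, 8] [4, 7]
```

Removing D never leaves dangling edges: if a removed point critically dominated some c, then e_new also dominates c by transitivity, so c is in D as well.
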
14
Continuous n-of-N Query
  • Trigger-based algorithm
  • Deletion: Q_{n,N} − {e_old}, and Q_{n,N} − D
  • Insertion: Q_{n,N} ∪ {e_new} if ¬(∃e →c e_new and
    t(e) ≥ M − n + 1)
  • Maintain a min-heap of Q_{n,N} for efficiency

Example: e_new = 8, M = 8, e_old = 3, D = {6}, e = 4
M = 7: n = 4, N = 5, Q_{4,5} = {4, 7}; Q_{5,5} = {3, 4}
M = 8: n = 4, N = 5, Q_{4,5} = {5, 7, 8}; Q_{5,5} = {4, 7}
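
A rough sketch of the continuous query, under several assumptions: n ≤ N, the element that slides out of the last n is removed from the result and "triggers" the points it critically dominates, and linear scans replace both the min-heap and the spatial index. The class name ContinuousNofN and the sample stream are hypothetical; this illustrates the idea and is not the algorithm from the full paper.

```python
from typing import Dict, List, Sequence, Set


def dominates(x: Sequence[float], y: Sequence[float]) -> bool:
    """x dominates y iff x <= y on every dimension and x < y on at least one."""
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))


class ContinuousNofN:
    """Keeps the result Q_{n,N} current as elements arrive."""

    def __init__(self, N: int, n: int) -> None:
        self.N, self.n, self.M = N, n, 0
        self.points: Dict[int, Sequence[float]] = {}  # R_N: timestamp -> point
        self.parent: Dict[int, int] = {}              # critical dominator, 0 = root
        self.result: Set[int] = set()                 # current Q_{n,N}

    def insert(self, p: Sequence[float]) -> None:
        self.M += 1
        stab = max(1, self.M - self.n + 1)
        leaving = stab - 1                 # timestamp that just left the last n
        if leaving >= 1:
            self.result.discard(leaving)
            # Trigger: points critically dominated by the leaving element
            # are no longer dominated inside the last n elements.
            self.result.update(t for t, a in self.parent.items() if a == leaving)
        expired = self.M - self.N          # timestamp that just left P_N
        self.points.pop(expired, None)
        self.parent.pop(expired, None)
        self.parent = {t: (a if a > expired else 0) for t, a in self.parent.items()}
        # Deletion of D: stored points dominated by the new element.
        for t in [t for t, q in self.points.items() if dominates(p, q)]:
            del self.points[t]
            del self.parent[t]
            self.result.discard(t)
        # Insertion: e_new joins the result unless its critical dominator
        # is still inside the last n elements.
        a = max((t for t, q in self.points.items() if dominates(q, p)), default=0)
        self.points[self.M] = p
        self.parent[self.M] = a
        if a < stab:
            self.result.add(self.M)


stream = [(3, 3), (7, 7), (1, 5), (4, 1), (5, 4), (6, 3), (2, 6), (6, 2)]
q = ContinuousNofN(N=5, n=4)
history: List[List[int]] = []
for p in stream:
    q.insert(p)
    history.append(sorted(q.result))
print(history[6])  # result at M = 7: [4, 7]
print(history[7])  # result at M = 8: [5, 7, 8]
```

At M = 8 element 4 leaves the last four elements, which triggers 5 and 6; 6 is then deleted as part of D, and 8 is added because its critical dominator 4 is outside the last four.
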
15
(n1,n2)-of-N Query
  • More complicated than n-of-N Query
  • P_N needs to be kept!
  • (Old) critical dominance: t(a_e) = max{ t(e') :
    e' dominates e, t(e') < t(e) }
  • Backward critical dominance: t(b_e) = min{ t(e') :
    e' dominates e, t(e') > t(e) }
  • e ∈ Q_{(n1,n2),N} iff a_e < M − n2 + 1 ≤ e ≤ M − n1 + 1 < b_e
    (comparing timestamps)
  • CBC dominance graph: P_N with the two kinds of
    dominance relationships

Example: (2,4)-of-7 = {4, 6}
a_5 = 3, b_5 = 6; (M − n2 + 1, M − n1 + 1) = (4, 6)
16
Processing (n1,n2)-of-N Query
  • Encode the CBC dominance graph:
  • e → ((a_e, e], b_e)
  • Build an interval tree on the intervals (a_e, e] only.
  • Stab the interval tree with M − n2 + 1 and
    check (e ≤ M − n1 + 1 < b_e) on the fly.
  • O(log N + s), sub-optimal

Example: (2,4)-of-7, (M − n2 + 1, M − n1 + 1) = (4, 6)
Intervals: (1,6], (3,4], (3,5], ...
Candidates: 4, 5, 6
(2,4)-of-7 = {4, 6}
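
The two-step query (stab, then filter) can be sketched like this. Assumptions: a missing a_e is stored as 0 and a missing b_e as infinity, a linear scan replaces the interval tree, and the seven 2-d points are hypothetical, picked so that a_5 = 3, b_5 = 6 and the stab returns the candidates 4, 5, 6 as in the example.

```python
import math
from typing import Dict, List, Sequence, Tuple


def dominates(x: Sequence[float], y: Sequence[float]) -> bool:
    """x dominates y iff x <= y on every dimension and x < y on at least one."""
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))


def encode_cbc(pn: Dict[int, Sequence[float]]) -> Dict[int, Tuple[float, float]]:
    """For each e in P_N: a_e = youngest older dominator (0 if none),
    b_e = oldest younger dominator (infinity if none), as timestamps."""
    enc = {}
    for t, p in pn.items():
        doms = [u for u, q in pn.items() if dominates(q, p)]
        a = max((u for u in doms if u < t), default=0)
        b = min((u for u in doms if u > t), default=math.inf)
        enc[t] = (a, b)
    return enc


def candidates(enc: Dict[int, Tuple[float, float]], M: int, n2: int) -> List[int]:
    """Stab the intervals (a_e, e] with M - n2 + 1."""
    lo = M - n2 + 1
    return sorted(t for t, (a, _) in enc.items() if a < lo <= t)


def n1_n2_of_n(enc: Dict[int, Tuple[float, float]], M: int, n1: int, n2: int) -> List[int]:
    """Filter the candidates with e <= M - n1 + 1 < b_e."""
    hi = M - n1 + 1
    return [t for t in candidates(enc, M, n2) if t <= hi < enc[t][1]]


# Hypothetical P_N for N = M = 7.
pn = {1: (2, 2), 2: (9, 9), 3: (3, 5), 4: (4, 7), 5: (6, 6), 6: (5, 3), 7: (5, 4)}
enc = encode_cbc(pn)
print(enc[5])                            # (3, 6)
print(candidates(enc, M=7, n2=4))        # [4, 5, 6]
print(n1_n2_of_n(enc, M=7, n1=2, n2=4))  # [4, 6]
```

Element 5 is a candidate but is filtered out because b_5 = 6 falls inside the queried range, which is why the stab alone can return more than s points.
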
17
More on (n1,n2)-of-N Query
  • Maintenance: similar to that of the n-of-N query, but
  • always expire the oldest element in P_N, and
    maintain the interval tree and the R-tree on R_N.
  • Implementation-wise: use two interval trees to
    index R_N and P_N − R_N, respectively.
  • Continuous queries
  • More complicated:
  • a new skyline point might be neither a skyline point in the
    previous result,
  • nor critically dominated by a skyline point in
    the previous result,
  • nor a newly arrived point.
  • Basic idea
  • Maintain additional candidate solutions
    (minimization) and triggers
  • Details in the full paper

18
Experiment Setup
  • Hardware
  • P4 2.8 GHz CPU, 1 GB memory
  • Datasets
  • Correlated, independent, and anti-correlated
  • d = 2 to 5, N = 10^6
  • Algorithms
  • KLP, nN, mnN, cnN, n12N, mn12N
  • Metrics
  • Processing time → streaming rate

19
n-of-N Query
  • Varying dimensionality
  • M up to 2M, N = 1M, n drawn uniformly from [1K, 1M],
    number of queries = 1000

20
n-of-N Query (contd)
  • Varying n
  • for correlated, independent, and anti-correlated
    datasets

21
Maintenance Costs
  • 2d and 5d datasets; measure average and max time;
    N = i × 10^5

22
Scalability
  • M (total number) = 2M, N = 1M, number of queries = 2M

[Figure panels: anti-correlated, independent]
23
Continuous n-of-N Queries
  • 2d and 5d datasets
  • N = 10K and 1M
  • 10 queries with n = i × (N/10)
  • Measures: cnN avg, cnN max, nN avg, nN max

24
(n1,n2)-of-N Queries
  • Varying dimensionality
  • M up to 2M, N = 1M, number of queries = 1000
  • restricting n2 − n1 > 500
  • Scalability
  • M = 2M, N = 1M, number of queries = 2M

25
Maintenance
  • 2d and 5d datasets
  • measure average and max time
  • N = i × 10^5

26
Conclusions
  • Efficient algorithms for various sliding-window
    skyline queries
  • Keep only the minimum number of points
  • Encode and index those points
  • Maintain all the data structures
  • The proposed solutions
  • have theoretical guarantees on performance,
    and
  • have demonstrated efficiency and scalability in
    the experiments
  • Future work
  • Improve the current solution for (n1,n2)-of-N
    queries
  • Approximate skyline queries

27
Q&A
  • Thank You!

28
Reference
  • [ICDE01] S. Borzsonyi, D. Kossmann, and K.
    Stocker. The skyline operator. ICDE, 2001.
  • [VLDB01] K. Tan, P. Eng, and B. Ooi. Efficient
    progressive skyline computation. VLDB, 2001.
  • [VLDB 02] D. Kossmann, F. Ramsak, and S. Rost.
    Shooting stars in the sky: An online algorithm
    for skyline queries. VLDB, 2002.
  • [SIGMOD03] D. Papadias, Y. Tao, G. Fu, and B.
    Seeger. An optimal and progressive algorithm for
    skyline queries. SIGMOD, 2003.
  • [SIAM J. comp00] S. Kapoor. Dynamic maintenance
    of maxima of 2-d point sets. SIAM J. Comput.,
    2000.