Stabbing the Sky: Efficient Skyline Computation over Sliding Windows

1
Stabbing the Sky: Efficient Skyline Computation
over Sliding Windows

COMP9314 Lecture Notes

2
Outline

Introduction
n-of-N Queries
(n1, n2)-of-N Queries
Performance Evaluation
Conclusions

3
Skyline

Skyline Query
Input: a set of points in d-dimensional space.
Output: points not dominated by any other point.
(x_1, x_2, ..., x_d) dominates (y_1, y_2, ..., y_d) iff
x_i ≤ y_i (1 ≤ i ≤ d) and ∃k, x_k < y_k.
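
The dominance test and a brute-force skyline can be sketched as follows. This is only a minimal illustration, assuming smaller values are better in every dimension (as in the definition above). The names `dominates` and `skyline` and the sample deals are made up for this sketch, and the quadratic scan is not one of the algorithms cited in these notes.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(x: Sequence[float], y: Sequence[float]) -> bool:
    """True iff x is <= y in every dimension and < y in at least one."""
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))


def skyline(points: List[Point]) -> List[Point]:
    """Brute-force skyline: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical deals as (price, risk); lower is better in both dimensions.
deals = [(1, 5), (2, 2), (3, 3), (5, 1), (4, 4)]
print(skyline(deals))  # [(1, 5), (2, 2), (5, 1)]
```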

4
Applications

Multi-criteria decision making
Stock Trading Example
What are the top deals?

5
Skyline Query Over Sliding Window

Stock Trading Example
Top deals of a stock in the last 5 mins? The last 4
mins? ...
Top deals of a stock in the last 10K deals?
Queries
n-of-N model (∀n ≤ N): the most recent n
elements
(n1, n2)-of-N model
One-time queries
Continuous queries

6
Challenges

Insertions & deletions (possibly at high speed).
On-line information:
memory requirement
processing speed
Existing techniques do not support n-of-N queries:
Borzsonyi et al. (ICDE01), Tan et al. (VLDB01),
Kossmann et al. (VLDB 02), Papadias et al.
(SIGMOD03), Kapoor (SIAM J. comp00)
support computation over the whole dataset only:
O(n log^{d-2} n) for d ≥ 4; O(n log n) otherwise

7
Results

n-of-N
keep N' (N' ≤ N) elements, where N' = O(log^d N)
if the data distribution on each dimension is
independent.
a novel encoding scheme, with O(N') space, leads
to n-of-N query time O(log N' + s) instead of
O(n log^{d-2} n), where s is the number of skyline points.
a new trigger-based technique for continuously
processing an n-of-N query:
trigger update time O(log s).
result update time O(log δ), where δ is the result
change.
(n1, n2)-of-N: similar results.

8
n-of-N Queries

Redundant points: an element e is redundant w.r.t. P_N (the most recent N
elements) iff
e has expired w.r.t. P_N, or
∃e' s.t. e' dominates e, and e' is younger than e.
Example (figure): N = 6

9
Optimality

Theorem: Non-redundant points (R_N) vs. n-of-N
skyline query result (Q_{n,N})
Elements of (P_N - R_N) do not appear in any Q_{n,N}
Q_{n,N} must be a subset of R_N
∀x ∈ R_N, ∃n s.t. x ∈ Q_{n,N}
|R_N| = O(log^d N) for independent distributions
Only R_N needs to be kept: it is the minimum set of
elements to keep.
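
The theorem can be checked by brute force on a toy stream. The sketch below is illustrative only: the 2-d coordinates are hypothetical, chosen so that the windows reproduce the example values used later in these notes (N = 5, M = 8), and the helper names `recent`, `non_redundant` and `n_of_n_skyline` are assumptions of this sketch, not names from the paper.

```python
from typing import Dict, Set, Tuple

Point = Tuple[float, ...]


def dominates(x: Point, y: Point) -> bool:
    """Smaller is better: x <= y everywhere and x < y somewhere."""
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))


def recent(stream: Dict[int, Point], n: int) -> Dict[int, Point]:
    """The n most recent elements, keyed by arrival timestamp."""
    return dict(sorted(stream.items())[-n:])


def non_redundant(stream: Dict[int, Point], N: int) -> Set[int]:
    """R_N: elements of P_N not dominated by any younger element of P_N."""
    p_n = recent(stream, N)
    return {t for t, p in p_n.items()
            if not any(t2 > t and dominates(p2, p) for t2, p2 in p_n.items())}


def n_of_n_skyline(stream: Dict[int, Point], n: int) -> Set[int]:
    """Q_{n,N}: skyline of the n most recent elements (brute force)."""
    window = recent(stream, n)
    return {t for t, p in window.items()
            if not any(dominates(q, p) for q in window.values())}


# Hypothetical stream: timestamp -> point.
STREAM = {3: (1, 6), 4: (4, 1), 5: (5, 3), 6: (7, 2), 7: (2, 7), 8: (6, 2)}
N = 5
r_n = non_redundant(STREAM, N)
results = {n: n_of_n_skyline(STREAM, n) for n in range(1, N + 1)}

print(sorted(r_n))                       # [4, 5, 7, 8]
print(all(q <= r_n for q in results.values()))   # True: every Q_{n,N} is a subset of R_N
print(set().union(*results.values()) == r_n)     # True: every x in R_N is in some Q_{n,N}
```

Element 6 is dropped because the younger element 8 dominates it, so it can never appear in any n-of-N result.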

10
Querying R_N

Critical dominance: e' → e, where e' is the
youngest element in R_N that dominates e.
Dominance graph G_RN: R_N plus the critical
dominance
relationships.

11
Querying R_N

e ∈ Q_{n,N} iff
e is a root in G_RN, or
e' → e in G_RN and e' has expired, i.e. t(e') < M - n + 1 ≤ t(e)

Example (M = 7):
n    R_N          M-n+1   Q_{n,N}
5    3,4,5,6,7    3       3,4
4    4,5,6,7      4       4,7
3    5,6,7        5       5,6,7

12
Querying R_N: Optimal Algorithm

To answer an n-of-N query, encode G_RN using
intervals: (t(e'), t(e)] for each e' → e, and (0, t(e)] for each root e.
Stab the intervals with (M - n + 1).
For each returned interval (x, y], return the point
whose timestamp is y.
Technique: use an interval tree index to achieve
optimal O(log |R_N| + s) query time.

e.g., n = 6
(0,3], (0,4], (3,7], (4,5], (4,6]
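
The stabbing step can be illustrated on the five intervals of this example. The sketch below is a simplification: it scans a plain list instead of using the interval tree the slide calls for, so it shows the membership test rather than the O(log |R_N| + s) bound. The function names are made up for illustration.

```python
from typing import List, Tuple

# Half-open interval (x, y]: y is the timestamp of the encoded point,
# x is the timestamp of its critical dominator (0 for a root).
Interval = Tuple[int, int]


def stab(intervals: List[Interval], q: int) -> List[int]:
    """Timestamps y of all intervals (x, y] that contain the stabbing point q."""
    return sorted(y for x, y in intervals if x < q <= y)


def n_of_n_query(intervals: List[Interval], M: int, n: int) -> List[int]:
    """Answer an n-of-N query by stabbing with M - n + 1."""
    return stab(intervals, M - n + 1)


INTERVALS = [(0, 3), (0, 4), (3, 7), (4, 5), (4, 6)]
M = 7
for n in (5, 4, 3):
    print(n, n_of_n_query(INTERVALS, M, n))
# 5 [3, 4]
# 4 [4, 7]
# 3 [5, 6, 7]
```
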
13
Maintaining R_N

When a new element e_new arrives:
If the oldest element e_old ∈ R_N expires, remove e_old and
update R_N and G_RN (interval tree).
Find D ⊆ R_N dominated by e_new; update R_N and G_RN.
Depth-first search on an R-tree of R_N
Find e' that critically dominates e_new (e' → e_new); update G_RN.
Best-first search on the R-tree of R_N

Example: e_new = 8, e_old = 3, D = {6}, e' = 4
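
The three maintenance steps can be sketched with plain dictionaries. This is not the slide's implementation: linear scans stand in for the R-tree searches and a `parent` map stands in for the interval tree. The point coordinates are hypothetical, chosen so the run reproduces the example values above (e_old = 3, D = {6}, e' = 4), and the names `arrive` and `intervals` are made up for this sketch.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, ...]


def dominates(x: Point, y: Point) -> bool:
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))


def arrive(points: Dict[int, Point], parent: Dict[int, int],
           t_new: int, p_new: Point, N: int) -> None:
    """Update R_N (points) and G_RN (parent = critical dominator, 0 = root)."""
    # 1. Expire elements that fell out of the most recent N; their children become roots.
    for t in [t for t in points if t <= t_new - N]:
        del points[t], parent[t]
    for t, a in list(parent.items()):
        if a and a not in points:
            parent[t] = 0
    # 2. Remove D, the elements dominated by the new (younger) element.
    for t in [t for t, p in points.items() if dominates(p_new, p)]:
        del points[t], parent[t]
    # 3. The critical dominator of e_new is its youngest dominator in R_N.
    dominators = [t for t, p in points.items() if dominates(p, p_new)]
    parent[t_new] = max(dominators, default=0)
    points[t_new] = p_new


def intervals(parent: Dict[int, int]) -> List[Tuple[int, int]]:
    """Interval encoding (t(e'), t(e)] of the dominance graph."""
    return sorted((a, t) for t, a in parent.items())


# Hypothetical stream: timestamp -> point.
STREAM = {3: (1, 6), 4: (4, 1), 5: (5, 3), 6: (7, 2), 7: (2, 7), 8: (6, 2)}
points: Dict[int, Point] = {}
parent: Dict[int, int] = {}
for t in range(3, 8):
    arrive(points, parent, t, STREAM[t], N=5)
before = intervals(parent)
arrive(points, parent, 8, STREAM[8], N=5)
after = intervals(parent)
print(before)  # [(0, 3), (0, 4), (3, 7), (4, 5), (4, 6)]
print(after)   # [(0, 4), (0, 7), (4, 5), (4, 8)]
```

No re-linking is needed in step 2: anything critically dominated by a member of D is itself dominated by e_new (dominance is transitive), so it is removed in the same pass.
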
14
Continuous n-of-N Query

Trigger-based algorithm
Deletion: Q_{n,N} - {e_old} and Q_{n,N} - D
Insertion: Q_{n,N} ∪ {e_new} if ¬(∃e' → e_new with
t(e') ≥ M - n + 1)
Maintain a min-heap of Q_{n,N} for efficiency

Example: e_new = 8 (M = 8), e_old = 3, D = {6}, e' = 4
M = 7, N = 5: Q_{4,5} = {4, 7}, Q_{5,5} = {3, 4}
M = 8, N = 5: Q_{4,5} = {5, 7, 8}, Q_{5,5} = {4, 7}
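
One way to read the update rules is sketched below. It is a hedged interpretation, not the paper's algorithm: a plain set replaces the min-heap, and the "trigger" is taken to mean that an element leaving the n-window releases the elements it critically dominates, which follows from the membership test t(e') < M - n + 1 ≤ t(e). The `parent` map (critical dominator per element, 0 for roots) is hard-coded from the example and deliberately keeps the expired dominator 3 of element 7. The function name `update_result` is made up.

```python
from typing import Dict, Iterable, Set


def update_result(q_old: Set[int], parent: Dict[int, int], M: int, n: int,
                  dominated: Iterable[int], t_new: int) -> Set[int]:
    """Update Q_{n,N} after one arrival; M is the new total number of elements."""
    lo = M - n + 1  # oldest timestamp still inside the n-window
    # Deletion: drop elements that left the window and elements in D.
    q = {t for t in q_old if t >= lo} - set(dominated)
    # Trigger: the element with timestamp lo - 1 just left the window,
    # so everything it critically dominated enters the result.
    q |= {t for t, a in parent.items() if a == lo - 1}
    # Insertion: e_new joins unless its critical dominator is inside the window.
    if parent[t_new] < lo:
        q.add(t_new)
    return q


# Graph after e_new = 8 arrived (D = {6} already removed).
PARENT = {4: 0, 5: 4, 7: 3, 8: 4}

q45 = update_result({4, 7}, PARENT, M=8, n=4, dominated=[6], t_new=8)
q55 = update_result({3, 4}, PARENT, M=8, n=5, dominated=[6], t_new=8)
print(sorted(q45))  # [5, 7, 8]
print(sorted(q55))  # [4, 7]
```
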
15
(n1,n2)-of-N Query

More complicated than the n-of-N query
P_N needs to be kept!
(Old) critical dominance: t(a_e) = max{ t(e') : e' dominates e, t(e') < t(e) }
Backward critical dominance: t(b_e) = min{ t(e') : e' dominates e, t(e') > t(e) }
e ∈ Q_{(n1,n2),N} iff a_e < M - n2 + 1 ≤ e ≤ M - n1 + 1 < b_e
(elements identified by their timestamps)
CBC dominance graph: P_N plus the two kinds of
dominance relationships

Example: (2,4)-of-7 = {4, 6}
a_5 = 3, b_5 = 6; [M - n2 + 1, M - n1 + 1] = [4, 6]

16
Processing (n1,n2)-of-N Query

Encode the CBC dominance graph:
e → (a_e, e, b_e)
Build an interval tree on (a_e, e] only
Stab with M - n2 + 1 against the interval tree and
check e ≤ M - n1 + 1 < b_e on the fly
O(log N + s), sub-optimal

Example: (2,4)-of-7, [M - n2 + 1, M - n1 + 1] = [4, 6]
Intervals: (1,6], (3,4], (3,5], ...
Candidates: 4, 5, 6
(2,4)-of-7 = {4, 6}
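
The stab-then-filter procedure can be tried on the three intervals listed in this example. The sketch scans a list rather than an interval tree, and the function name is made up. The slides give b_5 = 6 but do not give b_4 or b_6, so the sketch assumes those two elements have no younger dominator (encoded as infinity); that value is an assumption, not data from the notes.

```python
import math
from typing import List, Tuple

# (a_e, e, b_e): timestamps of the youngest older dominator, the element itself,
# and the oldest younger dominator (0 / infinity when there is none).
Triple = Tuple[float, int, float]


def n1_n2_query(triples: List[Triple], M: int, n1: int, n2: int):
    """Return (candidates, result) for an (n1,n2)-of-N query."""
    lo, hi = M - n2 + 1, M - n1 + 1
    # Stab (a_e, e] with lo, as the interval tree would.
    stabbed = [(a, e, b) for a, e, b in triples if a < lo <= e]
    candidates = sorted(e for _, e, _ in stabbed)
    # Check e <= hi < b_e on the fly.
    result = sorted(e for _, e, b in stabbed if e <= hi < b)
    return candidates, result


TRIPLES: List[Triple] = [
    (1, 6, math.inf),  # b_6 assumed: no younger dominator
    (3, 4, math.inf),  # b_4 assumed: no younger dominator
    (3, 5, 6),         # a_5 = 3, b_5 = 6 as given in the notes
]
candidates, result = n1_n2_query(TRIPLES, M=7, n1=2, n2=4)
print(candidates)  # [4, 5, 6]
print(result)      # [4, 6]
```

Element 5 is a candidate but is filtered out because its younger dominator 6 lies inside the query range, which is why the procedure can inspect more than s elements.
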
17
More on (n1,n2)-of-N Query

Maintenance: similar to that of the n-of-N query, but
always expire the oldest element in P_N, and
maintain the interval tree and the R-tree on R_N.
Implementation-wise: use two interval trees to
index R_N and P_N - R_N, respectively.
Continuous queries
More complicated
A new skyline point might be neither a skyline point in the
previous result,
nor critically dominated by a skyline point in
the previous result,
nor a newly arrived point
Basic idea
Maintain additional candidate solutions
(minimization) and triggers
Details in the full paper

18
Experiment Setup

Hardware
P4 2.8 GHz CPU, 1 GB memory
Datasets
Correlated, independent, and anti-correlated
d = 2 to 5, N = 10^6
Algorithms
KLP, nN, mnN, cnN, n12N, mn12N
Metrics
Processing time, streaming rate

19
n-of-N Query

Varying dimensionality
M up to 2M, N = 1M, n chosen uniformly from [1K, 1M],
#queries = 1000

20
n-of-N Query (cont'd)

Varying n
for correlated, independent, and anti-correlated
datasets

21
Maintenance Costs

2d and 5d datasets; measure average and max time;
N = i × 10^5

22
Scalability

M (total number of elements) = 2M, N = 1M, #queries = 2M

[Figure panels: anti-correlated, independent]

23
Continuous n-of-N Queries

2d & 5d datasets
N = 10K and 1M
10 queries with n = i × (N/10)
Measures: cnN avg, cnN max, nN avg, nN max

24
(n1,n2)-of-N Queries

Varying dimensionality
M up to 2M, N = 1M, #queries = 1000
restricting n2 - n1 > 500
Scalability
M = 2M, N = 1M, #queries = 2M

25
Maintenance

2d and 5d datasets
measure average and max time
N = i × 10^5

26
Conclusions

Efficient algorithms for various sliding-window
skyline queries
Keep only the minimum number of points
Encode and index those points
Maintain all the data structures
The proposed solutions
have theoretical guarantees on performance,
and
have demonstrated efficiency and scalability in
the experiments
Future work
Improve the current solution for (n1,n2)-of-N
queries
Approximate skyline queries

27
Q&A

Thank You!

28
References

(ICDE01) S. Borzsonyi, D. Kossmann, and K. Stocker. The skyline operator. ICDE, 2001.
(VLDB01) K. Tan, P. Eng, and B. Ooi. Efficient progressive skyline computation. VLDB, 2001.
(VLDB 02) D. Kossmann, F. Ramsak, and S. Rost. Shooting stars in the sky: An online algorithm for skyline queries. VLDB, 2002.
(SIGMOD03) D. Papadias, Y. Tao, G. Fu, and B. Seeger. An optimal progressive algorithm for skyline queries. SIGMOD, 2003.
(SIAM J. comp00) S. Kapoor. Dynamic maintenance of maxima of 2-d point sets. SIAM J. Comput., 2000.