Title: Stabbing the Sky: Efficient Skyline Computation over Sliding Windows
1. Stabbing the Sky: Efficient Skyline Computation over Sliding Windows
2. Outline
- Introduction
- n-of-N Queries
- (n1, n2)-of-N Queries
- Performance Evaluation
- Conclusions
3. Skyline
- Skyline Query
- Input: a set of points in d-dimensional space.
- Output: points not dominated by another point.
- (x1, x2, ..., xd) dominates (y1, y2, ..., yd) iff xi ≤ yi for all i (1 ≤ i ≤ d) and ∃k such that xk < yk.
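The dominance test and the skyline definition above can be written down directly. The following is a minimal sketch of the naive quadratic computation, not the method of the talk; the names `dominates`, `skyline`, and the sample `deals` are illustrative assumptions, and smaller values are taken to be better in every dimension.

```python
from typing import List, Sequence

Point = Sequence[float]


def dominates(x: Point, y: Point) -> bool:
    """x dominates y: x_i <= y_i in every dimension and x_k < y_k in at least one."""
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))


def skyline(points: List[Point]) -> List[Point]:
    """Naive O(n^2) skyline: keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical 2-d points, smaller is better in both dimensions.
deals = [(1, 5), (2, 2), (4, 1), (3, 3), (5, 5)]
print(skyline(deals))  # [(1, 5), (2, 2), (4, 1)]
```

Here (3, 3) and (5, 5) are dropped because (2, 2) dominates both; a point never dominates itself because of the strict inequality.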
4. Applications
- Multi-criteria decision making
- Stock Trading Example
- What are the top deals?
5. Skyline Query Over Sliding Window
- Stock Trading Example
- Top deals of a stock in the last 5 mins? last 4 mins, ...
- Top deals of a stock in the last 10K deals?
- Queries
- n-of-N model (∀n ≤ N): the most recent n elements
- (n1, n2)-of-N model
- One-time queries
- Continuous queries
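As a reference point for the two query models, here is a baseline sketch that keeps the most recent N elements and recomputes each skyline from scratch. It is an assumption-laden illustration, not the technique proposed in the talk: the class `NaiveWindow`, its method names, and the sample stream are hypothetical, elements are labelled 1..M in arrival order, and an (n1, n2)-of-N query is read as covering labels M - n2 + 1 through M - n1 + 1.

```python
from collections import deque


def dominates(x, y):
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))


class NaiveWindow:
    """Keeps the most recent N elements and recomputes every skyline from scratch."""

    def __init__(self, N):
        self.items = deque(maxlen=N)  # (label, point) pairs, oldest first
        self.M = 0                    # number of elements seen so far

    def append(self, point):
        self.M += 1
        self.items.append((self.M, tuple(point)))

    def _skyline(self, lo, hi):
        """Labels of the skyline among elements with lo <= label <= hi."""
        sub = [(t, p) for t, p in self.items if lo <= t <= hi]
        return [t for t, p in sub if not any(dominates(q, p) for _, q in sub)]

    def n_of_N(self, n):
        return self._skyline(self.M - n + 1, self.M)

    def n1_n2_of_N(self, n1, n2):
        return self._skyline(self.M - n2 + 1, self.M - n1 + 1)


# Hypothetical 2-d stream of M = 7 elements with N = 6.
w = NaiveWindow(N=6)
for p in [(9, 9), (7, 7), (1, 6), (4, 1), (6, 2), (5, 4), (2, 7)]:
    w.append(p)
print(w.n_of_N(4))         # [4, 7]
print(w.n1_n2_of_N(2, 4))  # [4]
```

Every query costs time quadratic in the number of elements it covers, which is the cost the rest of the talk sets out to avoid.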
6. Challenges
- Insertions and deletions (possibly at high speed).
- On-line information
- memory requirement
- processing speed
- Existing techniques do not support n-of-N
- Borzsonyi et al. (ICDE01), Tan et al. (VLDB01), Kossmann et al. (VLDB 02), Papadias et al. (SIGMOD03), Kapoor (SIAM J. comp00)
- support computing the skyline of the whole dataset
- O(n log^{d-2} n) for d ≥ 4; O(n log n) otherwise
7. Results
- n-of-N
- keep N' (N' ≤ N) elements, where N' = O(log^d N) if the data distribution on each dimension is independent.
- a novel encoding scheme, with O(N') space, leads to n-of-N query time O(log N' + s), where s is the result size, instead of O(n log^{d-2} n).
- a new trigger-based technique for continuously processing an n-of-N query.
- trigger update time O(log s).
- result update time O(log δ), where δ is the change in the result.
- (n1, n2)-of-N: similar results.
8. n-of-N Queries
- Redundant point: e is redundant w.r.t. P_N (the most recent N elements) iff
- e has expired w.r.t. P_N, or
- ∃e' such that e' dominates e and e' is younger than e
- Example (figure): N = 6
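The redundancy rule above translates into a direct filter over the window. This is a quadratic sketch for illustration only; the function name `non_redundant` and the 2-d coordinates are assumptions, chosen so that with M = 7 and N = 6 the surviving labels are 3, 4, 5, 6, 7.

```python
def dominates(x, y):
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))


def non_redundant(window):
    """window: (label, point) pairs of the most recent N elements, oldest first.

    Returns the pairs that are not dominated by any younger element."""
    return [
        (t, p)
        for i, (t, p) in enumerate(window)
        if not any(dominates(q, p) for _, q in window[i + 1:])
    ]


# Hypothetical stream; M = 7 and N = 6, so P_N holds labels 2..7.
stream = [(9, 9), (7, 7), (1, 6), (4, 1), (6, 2), (5, 4), (2, 7)]
P_N = list(enumerate(stream, start=1))[-6:]
R_N = non_redundant(P_N)
print([t for t, _ in R_N])  # [3, 4, 5, 6, 7]
```

Element 1 has expired and element 2 is dominated by the younger element 3, so neither can appear in any later n-of-N result.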
9. Optimality
- Theorem: non-redundant points (R_N) vs. n-of-N skyline query results (Q_{n,N})
- No element of (P_N - R_N) appears in any Q_{n,N}
- Q_{n,N} must be a subset of R_N
- ∀x ∈ R_N, ∃n such that x ∈ Q_{n,N}
- |R_N| = O(log^{d-1} N) for independent distributions
- Only R_N needs to be kept: it is the minimum set of elements to be kept.
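The two halves of the theorem can be checked by brute force on a small window: every Q_{n,N} is contained in R_N, and the union of all Q_{n,N} over n = 1..N equals R_N. The sketch below is only a sanity check on random data, not a proof; `check_theorem` and the helper names are hypothetical.

```python
import random


def dominates(x, y):
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))


def skyline_labels(items):
    return {t for t, p in items if not any(dominates(q, p) for _, q in items)}


def non_redundant_labels(items):
    return {
        t
        for i, (t, p) in enumerate(items)
        if not any(dominates(q, p) for _, q in items[i + 1:])
    }


def check_theorem(window):
    """Returns (every Q_{n,N} is a subset of R_N, union of all Q_{n,N} equals R_N)."""
    rn = non_redundant_labels(window)
    results = [skyline_labels(window[-n:]) for n in range(1, len(window) + 1)]
    return all(q <= rn for q in results), set().union(*results) == rn


rng = random.Random(0)
window = [(t, (rng.random(), rng.random(), rng.random())) for t in range(1, 61)]
print(check_theorem(window))  # (True, True)
```

The second component holds because an element x of R_N is in the skyline of the window that starts exactly at x: all other elements there are younger, and none of them dominates x.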
10. Querying R_N
- critical dominance: e' → e, where e' is the youngest element that dominates e.
- dominance graph G_{R_N}: R_N and the critical dominance relationships.
11. Querying R_N
- e ∈ Q_{n,N} iff
- e is a root in G_{R_N}, or
- e' → e in G_{R_N} and e' has expired: t(e') < M - n + 1 ≤ t(e)
Example (M = 7):
n    R_N within the last n    M - n + 1    Q_{n,N}
5    3, 4, 5, 6, 7            3            3, 4
4    4, 5, 6, 7               4            4, 7
3    5, 6, 7                  5            5, 6, 7
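The query rule can be traced on the example table with a small sketch. The coordinates below are hypothetical, picked so that the critical dominance edges are 4 → 5, 4 → 6, and 3 → 7 with 3 and 4 as roots; `critical_parents` and `query` are illustrative names, and both use linear scans rather than any index.

```python
def dominates(x, y):
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))


def critical_parents(rn):
    """rn: (label, point) pairs of R_N, oldest first.

    Maps each label to the label of its youngest dominator in R_N (None for roots)."""
    parent = {}
    for i, (t, p) in enumerate(rn):
        parent[t] = None
        for older, q in reversed(rn[:i]):  # scan from youngest to oldest
            if dominates(q, p):
                parent[t] = older
                break
    return parent


def query(rn, parent, M, n):
    """Labels in Q_{n,N}: inside the last n elements, and a root or with an expired parent."""
    start = M - n + 1
    return [
        t for t, _ in rn
        if t >= start and (parent[t] is None or parent[t] < start)
    ]


rn = [(3, (1, 6)), (4, (4, 1)), (5, (6, 2)), (6, (5, 4)), (7, (2, 7))]
parent = critical_parents(rn)
print(parent)                  # {3: None, 4: None, 5: 4, 6: 4, 7: 3}
print(query(rn, parent, 7, 5))  # [3, 4]
print(query(rn, parent, 7, 4))  # [4, 7]
print(query(rn, parent, 7, 3))  # [5, 6, 7]
```

The explicit `t >= start` test makes sure that a root is only reported when it lies inside the most recent n elements.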
12. Querying R_N: Optimal Algorithm
- To answer an n-of-N query, encode G_{R_N} using intervals
- Stab the intervals by (M - n + 1).
- For all returned intervals (x, y], return the point whose timestamp is y
- Technique: use an interval tree index to achieve optimal O(log |R_N| + s) query time
e.g., n = 6
(0, 3], (0, 4], (3, 7], (4, 5], (4, 6]
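To illustrate the stabbing step, here is a sketch of a static centered interval tree over half-open intervals (x, y]. It is a simplified stand-in under stated assumptions: the class `StabTree` and the function `n_of_N` are hypothetical names, the tree is built once and does not support the insertions and deletions that window maintenance needs, and M = 7 is assumed for the five intervals of the example.

```python
class StabTree:
    """Static centered interval tree over non-empty half-open intervals (lo, hi]."""

    def __init__(self, intervals):
        his = sorted(hi for _, hi in intervals)
        self.center = his[len(his) // 2]
        left, right, here = [], [], []
        for lo, hi in intervals:
            if hi < self.center:
                left.append((lo, hi))       # entirely below the center
            elif lo >= self.center:
                right.append((lo, hi))      # entirely above the center
            else:
                here.append((lo, hi))       # lo < center <= hi
        self.by_lo = sorted(here)
        self.by_hi = sorted(here, key=lambda iv: iv[1], reverse=True)
        self.left = StabTree(left) if left else None
        self.right = StabTree(right) if right else None

    def stab(self, q, out=None):
        """Collect all intervals (lo, hi] with lo < q <= hi."""
        out = [] if out is None else out
        if q <= self.center:
            for lo, hi in self.by_lo:       # hi >= center >= q already holds
                if lo >= q:
                    break
                out.append((lo, hi))
            if q < self.center and self.left:
                self.left.stab(q, out)
        else:
            for lo, hi in self.by_hi:       # lo < center < q already holds
                if hi < q:
                    break
                out.append((lo, hi))
            if self.right:
                self.right.stab(q, out)
        return out


def n_of_N(tree, M, n):
    """Timestamps of Q_{n,N}: right endpoints of the intervals stabbed by M - n + 1."""
    return sorted(hi for _, hi in tree.stab(M - n + 1))


tree = StabTree([(0, 3), (0, 4), (3, 7), (4, 5), (4, 6)])
print(n_of_N(tree, 7, 5))  # [3, 4]
print(n_of_N(tree, 7, 4))  # [4, 7]
print(n_of_N(tree, 7, 3))  # [5, 6, 7]
```

Each node stores the intervals that contain its center, sorted by both endpoints, so a query only scans intervals it reports plus one extra per visited node.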
13. Maintaining R_N
- new element e_new arrives
- If the oldest element e_old ∈ R_N expires, remove e_old and update R_N and G_{R_N} (interval tree).
- find D ⊆ R_N dominated by e_new, update R_N and G_{R_N}
- Depth-first search on an R-tree of R_N
- find e' →c e_new (e' critically dominates e_new), update G_{R_N}
- Best-first search on the R-tree of R_N
Example: e_new = 8, e_old = 3, D = {6}, e' = 4
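The three maintenance steps can be sketched with plain dictionaries. This is an illustration under assumptions, not the implementation from the talk: linear scans replace both R-tree searches, the interval tree is replaced by a `parent` map, and the class name `NofNWindow` and the coordinates are hypothetical (chosen to reproduce e_old = 3, D = {6}, e' = 4 when element 8 arrives with N = 5).

```python
def dominates(x, y):
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))


class NofNWindow:
    """Maintains R_N and the critical dominance edges as elements arrive."""

    def __init__(self, N):
        self.N = N
        self.M = 0
        self.points = {}  # label -> point for R_N, oldest first
        self.parent = {}  # label -> label of the critical dominator (None for roots)

    def insert(self, point):
        self.M += 1
        # Step 1: expire elements that left the window; their children become roots.
        expired = [t for t in self.points if t <= self.M - self.N]
        for t in expired:
            del self.points[t]
            del self.parent[t]
        for t in list(self.parent):
            if self.parent[t] in expired:
                self.parent[t] = None
        # Step 2: remove D, the elements dominated by the new point.
        dominated = [t for t, p in self.points.items() if dominates(point, p)]
        for t in dominated:
            del self.points[t]
            del self.parent[t]
        # Step 3: the youngest remaining dominator critically dominates the new point.
        crit = None
        for t in reversed(list(self.points)):
            if dominates(self.points[t], point):
                crit = t
                break
        self.points[self.M] = point
        self.parent[self.M] = crit
        return expired, dominated, crit

    def query(self, n):
        start = self.M - n + 1
        return [
            t for t in self.points
            if t >= start and (self.parent[t] is None or self.parent[t] < start)
        ]


win = NofNWindow(N=5)
for p in [(9, 9), (7, 7), (1, 6), (4, 1), (6, 2), (5, 4), (2, 7)]:
    win.insert(p)
print(win.insert((5, 3)))  # ([3], [6], 4)
print(win.query(4))        # [5, 7, 8]
```

No re-parenting is needed in step 2: anything critically dominated by a member of D is itself dominated by the new point, so it is removed in the same step.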
14. Continuous n-of-N Query
- Trigger-based algorithm
- Deletion: Q_{n,N} - {e_old}, and Q_{n,N} - D
- Insertion: Q_{n,N} ∪ {e_new} if ¬(∃e' →c e_new and t(e') ≥ M - n + 1)
- Maintain a min-heap of Q_{n,N} for efficiency
Example: e_new = 8, M = 8, e_old = 3, D = {6}, e' = 4
M = 7, N = 5: Q_{4,5} = {4, 7}, Q_{5,5} = {3, 4}
M = 8, N = 5: Q_{4,5} = {5, 7, 8}, Q_{5,5} = {4, 7}
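One possible reading of the trigger idea is sketched below: each element that is still critically dominated inside the last n elements registers a trigger keyed by its dominator's label, and the trigger fires when that dominator slides out. This is an assumption about how the pieces fit together, not the data structure of the talk; the class `ContinuousNofN`, the heap layout, and the coordinates are hypothetical, and dominance searches are linear scans.

```python
import heapq


def dominates(x, y):
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))


class ContinuousNofN:
    """Keeps Q_{n,N} up to date after every arrival (requires 1 <= n <= N)."""

    def __init__(self, n, N):
        self.n, self.N, self.M = n, N, 0
        self.points = {}     # label -> point for R_N, oldest first
        self.result = set()  # labels currently in Q_{n,N}
        self.triggers = []   # min-heap of (critical dominator label, child label)

    def insert(self, point):
        self.M += 1
        start = self.M - self.n + 1
        # Deletion: the element leaving the last n, the expired element, and D.
        self.result.discard(start - 1)
        self.points.pop(self.M - self.N, None)
        for t in [t for t, p in self.points.items() if dominates(point, p)]:
            del self.points[t]
            self.result.discard(t)
        # Triggers: children whose critical dominator has left the last n elements.
        while self.triggers and self.triggers[0][0] < start:
            _, child = heapq.heappop(self.triggers)
            if child in self.points:  # skip children removed as part of D
                self.result.add(child)
        # Insertion: e_new joins unless its critical dominator is among the last n.
        crit = next(
            (t for t in reversed(list(self.points)) if dominates(self.points[t], point)),
            None,
        )
        self.points[self.M] = point
        if crit is None or crit < start:
            self.result.add(self.M)
        else:
            heapq.heappush(self.triggers, (crit, self.M))
        return sorted(self.result)


cq = ContinuousNofN(n=4, N=5)
for p in [(9, 9), (7, 7), (1, 6), (4, 1), (6, 2), (5, 4), (2, 7)]:
    current = cq.insert(p)
print(current)             # [4, 7]
print(cq.insert((5, 3)))   # [5, 7, 8]
```

Stale triggers for elements already removed as part of D are discarded lazily when they reach the top of the heap.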
15. (n1, n2)-of-N Query
- More complicated than n-of-N query
- P_N needs to be kept!
- (Old) critical dominance: t(a_e) = max{ t(e') | e' dominates e, t(e') < t(e) }
- backward critical dominance: t(b_e) = min{ t(e') | e' dominates e, t(e') > t(e) }
- e ∈ Q_{(n1,n2),N} iff a_e < M - n2 + 1 ≤ e ≤ M - n1 + 1 < b_e (elements identified by their timestamps)
- CBC dominance graph: P_N plus the two kinds of dominance relationships
Example: (2,4)-of-7 = {4, 6}; a_5 = 3, b_5 = 6; M - n2 + 1 = 4, M - n1 + 1 = 6
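The two critical dominators and the membership condition can be computed by brute force. The sketch below is illustrative only: `cbc_encode` and `query_n1_n2` are hypothetical names, 0 and infinity stand in for a missing a_e and b_e, and the seven 2-d points are assumptions chosen so that a_5 = 3, b_5 = 6 and the (2,4)-of-7 result is {4, 6}.

```python
import math


def dominates(x, y):
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))


def cbc_encode(window):
    """window: (label, point) pairs of P_N, oldest first.

    Returns {label: (a, b)} where a is the youngest older dominator (0 if none)
    and b is the oldest younger dominator (infinity if none)."""
    enc = {}
    for i, (t, p) in enumerate(window):
        a = max((s for s, q in window[:i] if dominates(q, p)), default=0)
        b = min((s for s, q in window[i + 1:] if dominates(q, p)), default=math.inf)
        enc[t] = (a, b)
    return enc


def query_n1_n2(enc, M, n1, n2):
    """Labels e with a_e < M - n2 + 1 <= e <= M - n1 + 1 < b_e."""
    lo, hi = M - n2 + 1, M - n1 + 1
    return [e for e, (a, b) in enc.items() if a < lo <= e <= hi < b]


# Hypothetical stream with M = N = 7.
stream = [(2, 2), (6, 1), (1, 4), (2, 6), (4, 5), (3, 3), (5, 7)]
enc = cbc_encode(list(enumerate(stream, start=1)))
print(enc[5])                      # (3, 6)
print(query_n1_n2(enc, 7, 2, 4))   # [4, 6]
```

Element 5 passes the test on a_5 but fails on b_5: its dominator 6 lies inside the queried range, so 5 is excluded.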
16. Processing (n1, n2)-of-N Query
- Encode the CBC dominance graph
- e → ((a_e, e], b_e)
- build an interval tree on (a_e, e] only
- Stab using M - n2 + 1 against the interval tree and check (e ≤ M - n1 + 1 < b_e) on-the-fly
- O(log N + s), sub-optimal
Example: (2,4)-of-7, with M - n2 + 1 = 4 and M - n1 + 1 = 6
Intervals: (1, 6], (3, 4], (3, 5], ...
Candidates: 4, 5, 6
Result: (2,4)-of-7 = {4, 6}
17. More on (n1, n2)-of-N Query
- Maintenance: similar to that of the n-of-N query, but
- Always expire the oldest element in P_N, and maintain the interval tree and the R-tree on R_N.
- Implementation-wise: use two interval trees to index R_N and P_N - R_N, respectively.
- Continuous queries
- More complicated
- A new skyline point might be neither a skyline point in the previous result, nor critically dominated by a skyline point in the previous result, nor a newly arrived point
- Basic idea
- Maintain additional candidate solutions (minimization) and triggers
- Details in the full paper
18. Experiment Setup
- Hardware
- P4 2.8 GHz CPU, 1 GB memory
- Datasets
- Correlated, independent, and anti-correlated
- d = 2 to 5, N = 10^6
- Algorithms
- KLP, nN, mnN, cnN, n12N, mn12N
- Metrics
- Processing time → streaming rate
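For readers who want to experiment with the three kinds of data, the sketch below shows one simple way to synthesize them. It is not the generator used in these experiments; `make_dataset`, `pearson`, and the noise parameters are assumptions chosen only to produce the intended sign of correlation between dimensions.

```python
import math
import random


def make_dataset(kind, n, d, seed=0):
    """Generate n points in d dimensions; kind selects the correlation pattern."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        if kind == "independent":
            p = [rng.random() for _ in range(d)]
        elif kind == "correlated":
            base = rng.random()  # all coordinates stay close to a shared value
            p = [base + rng.gauss(0, 0.05) for _ in range(d)]
        elif kind == "anti-correlated":
            u = [rng.random() for _ in range(d)]
            # Shift the point so its coordinate mean is close to 0.5:
            # being good in one dimension forces being bad in another.
            shift = rng.gauss(0.5, 0.05) - sum(u) / d
            p = [v + shift for v in u]
        else:
            raise ValueError(kind)
        points.append(tuple(p))
    return points


def pearson(points, i=0, j=1):
    """Pearson correlation between dimensions i and j."""
    xs = [p[i] for p in points]
    ys = [p[j] for p in points]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


for kind in ("correlated", "independent", "anti-correlated"):
    print(kind, round(pearson(make_dataset(kind, 2000, 2)), 2))
```

Anti-correlated data produces the largest skylines and correlated data the smallest, which is why all three are commonly used to stress skyline algorithms.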
19. n-of-N Query
- Varying dimensionality
- M up to 2M, N = 1M, n chosen uniformly from [1K, 1M], #queries = 1000
20. n-of-N Query (cont'd)
- Varying n
- for correlated, independent, and anti-correlated
datasets
21. Maintenance Costs
- 2d and 5d datasets, measuring average and max time, N = i × 10^5
22. Scalability
- M (total number) = 2M, N = 1M, #queries = 2M
Figure panels: anti-correlated, independent
23. Continuous n-of-N Queries
- 2d and 5d datasets
- N = 10K and 1M
- 10 queries with n = i × (N/10)
- measures: cnN avg, cnN max, nN avg, nN max
24. (n1, n2)-of-N Queries
- Varying dimensionality
- M up to 2M, N = 1M, #queries = 1000
- restricting n2 - n1 > 500
- Scalability
- M = 2M, N = 1M, #queries = 2M
25. Maintenance
- 2d and 5d datasets
- measure average and max time
- N = i × 10^5
26. Conclusions
- Efficient algorithms for various sliding window skyline queries
- Keep only the minimum number of points
- Encode and index those points
- Maintain all the data structures
- The proposed solutions
- have theoretical guarantees on the performance, and
- have demonstrated efficiency and scalability in the experiments
- Future work
- Improve the current solution for (n1, n2)-of-N queries
- Approximate skyline queries
27. Q&A
28. Reference
- [ICDE01] S. Borzsonyi, D. Kossmann, and K. Stocker. The skyline operator. ICDE, 2001.
- [VLDB01] K. Tan, P. Eng, and B. Ooi. Efficient progressive skyline computation. VLDB, 2001.
- [VLDB 02] D. Kossmann, F. Ramsak, and S. Rost. Shooting stars in the sky: An online algorithm for skyline queries. VLDB, 2002.
- [SIGMOD03] D. Papadias, Y. Tao, G. Fu, and B. Seeger. An optimal progressive algorithm for skyline queries. SIGMOD, 2003.
- [SIAM J. comp00] S. Kapoor. Dynamic maintenance of maxima of 2-d point sets. SIAM J. Comput., 2000.