Title: Efficient Computation of the Skyline Cube
- Yidong Yuan
- School of Computer Science and Engineering
- The University of New South Wales & NICTA
- Sydney, Australia
- Joint work with Xuemin Lin (UNSW), Qing Liu (UNSW), Wei Wang (UNSW), Jeffrey Xu Yu (CUHK), Qing Zhang (UNSW & CSIRO)
Outline
- Introduction
- Skycube Computation Techniques
- Experiments
- Summary
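Skyline Query
- Dominance: $(x_1, x_2, \ldots, x_d) \prec (y_1, y_2, \ldots, y_d) \iff \forall i,\ x_i \le y_i \ \wedge\ \exists k,\ x_k < y_k$
- A real estate example
- P5 $\prec$ P1, i.e., P5 dominates P1
- skyline returns the data points not dominated by any other point
[Table: Properties and Values]
[Figure: Skyline on price, dist (axes: price, dist; points P1 to P5)]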
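The dominance relation and the skyline definition above translate directly into a naive quadratic check. The following is a minimal Python sketch, assuming every dimension is "smaller is better" (as with price and dist); the names `dominates` and `naive_skyline` and the sample values are hypothetical, not taken from the talk.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """p dominates q: no worse on every dimension and strictly better on at least one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def naive_skyline(points: List[Point]) -> List[Point]:
    """Keep every point that no other point dominates (O(n^2) dominance tests)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (price, dist) pairs, not the values from the talk's example.
homes = [(3, 3), (1, 4), (4, 4), (2, 2), (3, 1)]
print(naive_skyline(homes))  # [(1, 4), (2, 2), (3, 1)]
```

A point never dominates itself (the strict part fails), so the inner loop can safely include `p`.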
Skyline Cube
- Skycube: the union of the skyline results of all the non-empty subsets of a d-dimensional space ($2^d - 1$ subsets)
Skycube Example
- Skyline on price, dist, age
- Skyline on price, dist
- Skyline on price, age
- …
[Figures: Dataset; Lattice Structure of a Skycube]
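To make the $2^d - 1$ count concrete, here is a hedged sketch of the straightforward baseline: enumerate every non-empty subset of dimensions and compute its skyline independently, with no sharing between subspaces. Dimension indices 0, 1, 2 stand in for hypothetical attributes such as price, dist, age; the function names are illustrative only.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Dims = Tuple[int, ...]


def dominates(p: Sequence[float], q: Sequence[float], dims: Dims) -> bool:
    """Dominance restricted to the dimensions listed in dims (smaller is better)."""
    return all(p[i] <= q[i] for i in dims) and any(p[i] < q[i] for i in dims)


def subspace_skyline(points: List[Sequence[float]], dims: Dims) -> List[int]:
    """Indices of the points not dominated by any other point on dims."""
    return [j for j, p in enumerate(points)
            if not any(dominates(q, p, dims) for q in points)]


def naive_skycube(points: List[Sequence[float]]) -> Dict[Dims, List[int]]:
    """One independent skyline per non-empty subset of the d dimensions: 2^d - 1 in all."""
    d = len(points[0])
    return {dims: subspace_skyline(points, dims)
            for k in range(1, d + 1)
            for dims in combinations(range(d), k)}


pts = [(1, 3, 2), (2, 1, 3), (3, 2, 1)]
cube = naive_skycube(pts)
print(len(cube))      # 7
print(cube[(0, 1)])   # [0, 1]
```

Every subspace rescans the whole dataset, which is exactly the lack of shared computation the talk sets out to remove.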
Motivation
- How to compute the Skycube efficiently?
- existing skyline techniques are applicable
- no sharing of computation → Not efficient!
Motivation (cont.)
- nested-loop-based algorithms
- BNL [ICDE 01]
- redundant comparisons → Not efficient!
- SFS [ICDE 03]: presort the dataset → keep the candidate list minimal
- repeated sorting → Not efficient!
[Figure: example dataset on dimensions A and B (points P1 to P5)]
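A minimal sketch of the block-nested-loop idea, assuming the whole window fits in memory (the full BNL algorithm also spills window overflow to a temporary file, which is omitted here). It shows where redundant comparisons come from: a point can enter the window and be tested against many later points before it is finally evicted. Names are illustrative.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def bnl_skyline(points: List[Point]) -> List[Point]:
    """Block-nested-loop style skyline with an unbounded in-memory window."""
    window: List[Point] = []
    for p in points:
        # p is discarded as soon as one window point dominates it
        if any(dominates(w, p) for w in window):
            continue
        # otherwise p evicts the window points it dominates and joins the window
        window = [w for w in window if not dominates(p, w)]
        window.append(p)
    return window


print(bnl_skyline([(3, 3), (1, 4), (4, 4), (2, 2), (3, 1)]))  # [(1, 4), (2, 2), (3, 1)]
```

In this run (3, 3) sits in the window and is compared against (1, 4) and (4, 4) before (2, 2) evicts it.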
Motivation (cont.)
- divide-and-conquer-based algorithm (DC [ICDE 01])
- repeats the same divide/merge steps → Not efficient!
[Figures: Divide Step of Skyline on A and B; Divide Step of Skyline on A, B, and C (points P1 to P5, split at the median mA)]
Outline
- Introduction
- Skycube Computation Techniques
- Bottom-Up Skycube Algorithm (BUS)
- Top-Down Skycube Algorithm (TDS)
- Experiments
- Summary
Property of Skycube
- Distinct Value Condition
- no two data points have the same value on the same dimension
- $SKY_U(S)$: skyline on the sub-dimension set U
- $SKY_U(S) \subseteq SKY_V(S)$ if $U \subseteq V$
- General Case
- Keep track of the bad guys
- compute the Skycube in a level-wise and
bottom-up manner - each skyline is computed by a nested-loop-based
algorithm
ABC
AB
AC
BC
A
B
C
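A hedged sketch of the level-wise, bottom-up traversal, assuming the distinct value condition so that the skylines of all children (one dimension fewer) can seed the window of their parent, in the spirit of the share-results strategy. This is an illustration of the idea, not the authors' BUS implementation; `bottom_up_skycube` is a hypothetical name.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Dims = Tuple[int, ...]


def dominates(p: Sequence[float], q: Sequence[float], dims: Dims) -> bool:
    return all(p[i] <= q[i] for i in dims) and any(p[i] < q[i] for i in dims)


def bottom_up_skycube(points: List[Sequence[float]]) -> Dict[Dims, List[int]]:
    """Level-wise, bottom-up skycube; assumes the distinct value condition."""
    d = len(points[0])
    cube: Dict[Dims, List[int]] = {}
    for k in range(1, d + 1):                    # level k holds the k-dimensional subspaces
        for dims in combinations(range(d), k):
            # share-results: each child's skyline is contained in SKY_dims
            known = set()
            for drop in (dims if k > 1 else ()):
                known.update(cube[tuple(i for i in dims if i != drop)])
            window = list(known)                 # confirmed skyline points, never evicted
            for j, p in enumerate(points):
                if j in known:                   # no dominance test needed for these
                    continue
                if any(dominates(points[w], p, dims) for w in window):
                    continue
                window = [w for w in window if not dominates(p, points[w], dims)]
                window.append(j)
            cube[dims] = sorted(window)
    return cube


print(bottom_up_skycube([(1, 3, 2), (2, 1, 3), (3, 2, 1)])[(0, 1)])  # [0, 1]
```

Seeding the window with inherited skyline points shrinks the set of points that must be tested and gives early dominators to test against.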
Sharing Strategies
- share-results: $SKY_U(S) \subseteq SKY_V(S)$ for $U \subseteq V$
- reduce the size of the input
- reduce the number of dominance tests
- share-sorting: sort the dataset on each dimension
- keep the candidate list minimal
- reduce the number of sorts from $2^d - 1$ to $d$
[Figure: subspaces A, B, and AB]
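One way to picture share-sorting, as a sketch under the distinct value condition: sort the dataset once per dimension, then compute any subspace skyline by scanning in the sorted order of one of its dimensions. A dominator always has a strictly smaller value on that dimension, so it is scanned first, and the candidate list holds only confirmed skyline points. Picking the first dimension of each subspace is an arbitrary simplification; names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Dims = Tuple[int, ...]


def dominates(p: Sequence[float], q: Sequence[float], dims: Dims) -> bool:
    return all(p[i] <= q[i] for i in dims) and any(p[i] < q[i] for i in dims)


def share_sorting_skycube(points: List[Sequence[float]]) -> Dict[Dims, List[int]]:
    d, n = len(points[0]), len(points)
    # share-sorting: exactly d sorts, one per dimension, reused by all 2^d - 1 subspaces
    order = [sorted(range(n), key=lambda j, a=a: points[j][a]) for a in range(d)]
    cube: Dict[Dims, List[int]] = {}
    for k in range(1, d + 1):
        for dims in combinations(range(d), k):
            sky: List[int] = []
            # scan in the presorted order of one dimension of the subspace
            for j in order[dims[0]]:
                if not any(dominates(points[s], points[j], dims) for s in sky):
                    sky.append(j)   # confirmed at once: later points cannot dominate it
            cube[dims] = sorted(sky)
    return cube


print(share_sorting_skycube([(1, 3, 2), (2, 1, 3), (3, 2, 1)])[(0, 1)])  # [0, 1]
```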
Filtering
- Effective Dominance Test
- filter function: $\varphi(p)$ = sum of p's coordinates
- no false negatives: $\varphi(p) \le \varphi(q) \Rightarrow$ q does not dominate p
- maintain the candidate list in non-decreasing order of filtering values (e.g., an AVL tree)
[Figure: Skyline on A and B (points P1 to P5)]
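A minimal sketch of the filter, assuming smaller-is-better values and distinct values on the scan dimension. A sorted Python list searched with `bisect` stands in for the balanced tree; since q dominating p implies $\varphi(q) < \varphi(p)$, only the prefix of candidates with a smaller filter value gets a full dominance test. `FilteredCandidates` and `filtered_skyline` are hypothetical names.

```python
import bisect
from typing import List, Sequence, Tuple

Dims = Tuple[int, ...]


def dominates(p: Sequence[float], q: Sequence[float], dims: Dims) -> bool:
    return all(p[i] <= q[i] for i in dims) and any(p[i] < q[i] for i in dims)


class FilteredCandidates:
    """Candidates kept in non-decreasing order of phi(p) = sum of p's coordinates on dims."""

    def __init__(self, dims: Dims) -> None:
        self.dims = dims
        self.keys: List[float] = []               # filter values, non-decreasing
        self.items: List[Sequence[float]] = []    # candidates, aligned with keys
        self.tests = 0                            # full dominance tests performed

    def phi(self, p: Sequence[float]) -> float:
        return sum(p[i] for i in self.dims)

    def is_dominated(self, p: Sequence[float]) -> bool:
        # candidates with phi(q) >= phi(p) cannot dominate p, so they are skipped
        stop = bisect.bisect_left(self.keys, self.phi(p))
        for q in self.items[:stop]:
            self.tests += 1
            if dominates(q, p, self.dims):
                return True
        return False

    def insert(self, p: Sequence[float]) -> None:
        key = self.phi(p)
        pos = bisect.bisect_right(self.keys, key)
        self.keys.insert(pos, key)
        self.items.insert(pos, p)


def filtered_skyline(points: List[Sequence[float]], dims: Dims) -> Tuple[List[int], int]:
    """Scan in order of dims[0] (distinct values assumed there); return (indices, tests)."""
    cand = FilteredCandidates(dims)
    result = []
    for j in sorted(range(len(points)), key=lambda j: points[j][dims[0]]):
        if not cand.is_dominated(points[j]):
            cand.insert(points[j])
            result.append(j)
    return sorted(result), cand.tests


print(filtered_skyline([(5, 1), (1, 4), (4, 4), (2, 2), (3, 3)], (0, 1)))  # ([0, 1, 3], 4)
```

When (2, 2) arrives, its filter value 4 is below that of (1, 4), so it joins the list without a single full test.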
DC Algorithm
[Figure: DC on A and B. Divide Step: split at mA into S1 and S2; Merge Step: split at mB into S11, S12, S21, S22; points P1 to P5]
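A simplified divide-and-conquer sketch, assuming distinct values on the split dimension. The dataset is split at the median of dimension A, both halves are solved recursively, and the merge removes points of the upper half's skyline that the lower half's skyline dominates. The merge here is a plain nested loop rather than DC's recursive merge on the next dimension (the mB split), so treat it as an illustration of the divide step only.

```python
from typing import List, Sequence, Tuple

Dims = Tuple[int, ...]
Point = Tuple[float, ...]


def dominates(p: Sequence[float], q: Sequence[float], dims: Dims) -> bool:
    return all(p[i] <= q[i] for i in dims) and any(p[i] < q[i] for i in dims)


def dc_skyline(points: List[Point], dims: Dims) -> List[Point]:
    """Divide-and-conquer skyline on dims; assumes distinct values on dims[0]."""
    ordered = sorted(points, key=lambda p: p[dims[0]])   # sort once on dimension A

    def solve(lo: int, hi: int) -> List[Point]:
        if hi - lo <= 1:
            return ordered[lo:hi]
        mid = (lo + hi) // 2            # split at the median m_A
        sky1 = solve(lo, mid)           # S1: smaller A values
        sky2 = solve(mid, hi)           # S2: larger A values
        # no point of S2 can dominate a point of S1, so only S2's skyline is filtered
        return sky1 + [q for q in sky2 if not any(dominates(p, q, dims) for p in sky1)]

    return solve(0, len(ordered))


print(dc_skyline([(5, 1), (1, 4), (4, 4), (2, 2), (3, 3)], (0, 1)))  # [(1, 4), (2, 2), (5, 1)]
```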
Sharing Opportunities
[Figure: skyline on A and B vs. skyline on A, B, and C (partitions S1, S2 at mA; medians mi, mj; points P1 to P5)]
Sharing Opportunities (cont.)
- decompose the merge step
[Figure: skyline on A and B vs. skyline on A, B, and C (partitions S1, S2 at mA; points P1 to P5)]
TDS Algorithm
- Basic Idea
- compute the skylines on a path simultaneously
- find a minimal set of paths
- share-parent: use the parent's skyline result as the input
[Figure: path ABC, AB, A in the skycube lattice over A, B, C; inputs S and $SKY_{ABC}(S)$]
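A hedged sketch of share-parent along one lattice path such as ABC → AB → A, assuming the distinct value condition: because $SKY_U(S) \subseteq SKY_V(S)$ and every non-skyline point is dominated by some skyline point, the skyline of a child can be computed from its parent's skyline alone. This sketch processes the path one subspace after another instead of simultaneously, and does not cover how the minimal set of paths is chosen; names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Dims = Tuple[int, ...]


def dominates(p: Sequence[float], q: Sequence[float], dims: Dims) -> bool:
    return all(p[i] <= q[i] for i in dims) and any(p[i] < q[i] for i in dims)


def skyline_of(points: List[Sequence[float]], candidates: List[int], dims: Dims) -> List[int]:
    """Skyline on dims among the given candidate indices only."""
    return [j for j in candidates
            if not any(dominates(points[i], points[j], dims) for i in candidates)]


def top_down_path(points: List[Sequence[float]], path: List[Dims]) -> Dict[Dims, List[int]]:
    """share-parent: each subspace on the path reads its parent's skyline, not S."""
    result: Dict[Dims, List[int]] = {}
    candidates = list(range(len(points)))    # the top of the path reads the full dataset S
    parent = None
    for dims in path:
        if parent is not None and not set(dims) <= set(parent):
            raise ValueError("each subspace on the path must be a subset of its parent")
        candidates = skyline_of(points, candidates, dims)
        result[dims] = candidates
        parent = dims
    return result


print(top_down_path([(1, 3, 2), (2, 1, 3), (3, 2, 1)], [(0, 1, 2), (0, 1), (0,)]))
# {(0, 1, 2): [0, 1, 2], (0, 1): [0, 1], (0,): [0]}
```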
Experiment Setting
Effect of Dimensionality
[Figure: independent data; x-axis: Dimensionality (n = 500K)]
Effect of Dimensionality (cont.)
[Figures: correlated and anti-correlated data; x-axis: Dimensionality (n = 500K)]
Effect of Cardinality
[Figure: anti-correlated data; x-axis: Cardinality (×100K, d = 8)]
Effect of Duplicate Values
[Figure: independent data (d = 8)]
Summary
- A novel concept: Skycube
- Skycube computation techniques
- Bottom-Up Skycube algorithm (BUS)
- share-results, share-sorting
- Top-Down Skycube algorithm (TDS)
- share-partition-and-merging, share-parent
- Future Work
- I/O-based techniques
- multiple skyline queries
Q&A
Preliminaries
- Existing Skyline Computation Algorithms
- nested-loop-based
- Block-Nested-Loop (BNL) algorithm [BKS, ICDE 01]
- Sort-Filter-Skyline (SFS) algorithm [CGG, ICDE 03]
- divide-and-conquer-based
- Divide-and-Conquer (DC) algorithm [BKS, ICDE 01]
- index-based
- Bitmap, Index-Method [TEO, VLDB 01]
- R-tree Index Based [KRR, VLDB 02], [PTF, SIGMOD 03]
Preliminaries: BNL and SFS Algorithms
- BNL algorithm
- SFS algorithm
- entropy value (indicator of the dominance power)
- pre-sort the dataset (e.g., P5, P2, P3, P1, P4)
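A minimal sketch of the SFS presort, assuming smaller-is-better, non-negative values and taking the entropy value to be $\sum_i \ln(x_i + 1)$ (an assumed formula for this illustration). Any score that strictly decreases under dominance works: a dominating point is then always scanned first, so the window holds only confirmed skyline points and nothing is ever evicted.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def entropy(p: Sequence[float]) -> float:
    """Assumed score: if p dominates q then entropy(p) < entropy(q) (needs values > -1)."""
    return sum(math.log(x + 1) for x in p)


def sfs_skyline(points: List[Point]) -> List[Point]:
    window: List[Point] = []
    # presort: a dominating point always has a smaller score, so it is seen first
    for p in sorted(points, key=entropy):
        # window holds only confirmed skyline points, so no eviction step is needed
        if not any(dominates(w, p) for w in window):
            window.append(p)
    return window


print(sfs_skyline([(3, 3), (1, 4), (4, 4), (2, 2), (3, 1)]))  # [(3, 1), (2, 2), (1, 4)]
```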
Preliminaries: DC Algorithm
[Figure: DC on A and B. Divide Step: split at mA into S1 and S2; Merge Step: split at mB into S11, S12, S21, S22; points P1 to P5]
General Case
- Issue: $SKY_U(S) \subseteq SKY_V(S)$ does not necessarily hold
- Solution
- share-results: re-examine $SKY_U(S)$ on V
- Example: $SKY_B(S)$: P3, P4, P5; $SKY_{AB}(S)$: P3
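A hedged sketch of the re-examination step when duplicate values are allowed. If q dominates p on V but p is in $SKY_U(S)$, then q cannot be strictly better anywhere on U, so q must equal p on every dimension of U; the sketch therefore re-tests each inherited point only against such ties. This is one possible realization, not necessarily the authors' procedure, and `reexamine` is a hypothetical name.

```python
from typing import List, Sequence, Tuple

Dims = Tuple[int, ...]


def dominates(p: Sequence[float], q: Sequence[float], dims: Dims) -> bool:
    return all(p[i] <= q[i] for i in dims) and any(p[i] < q[i] for i in dims)


def reexamine(points: List[Sequence[float]], sky_u: List[int], u: Dims, v: Dims) -> List[int]:
    """Keep the points of SKY_U(S) that are still skyline points on V (U a subset of V)."""
    survivors = []
    for j in sky_u:
        p = points[j]
        # only points tied with p on every dimension of U can dominate p on V
        ties = (q for q in points if all(q[i] == p[i] for i in u))
        if not any(dominates(q, p, v) for q in ties):
            survivors.append(j)
    return survivors


# Hypothetical data with a duplicate on dimension 0: points 0 and 1 are both in SKY_(0,)(S).
pts = [(1, 5), (1, 3), (2, 1)]
print(reexamine(pts, [0, 1], (0,), (0, 1)))  # [1]
```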
Motivation (cont.)
- other techniques
- Index method [VLDB 01]
- R-tree based index [VLDB 02, SIGMOD 03]
- pre-computation (e.g., an index) is not reusable → repeated pre-computation → Not efficient!
- Goal
- Maximizing the sharing of computation!