CURE for Cubes: Cubing Using a ROLAP Engine

1
CURE for Cubes: Cubing Using a ROLAP Engine
VLDB 2006
  • Konstantinos Morfonios
  • Yannis Ioannidis

University of Athens
2
Introduction
Execution Plan
External Partitioning
Storage Format
Experimental Evaluation
Conclusions
4
Introduction
SELECT region, SUM(revenue) FROM SALES
WHERE month = 'September' GROUP BY region
5
Introduction
SELECT A, B, C, SUM(M) FROM R GROUP BY A, B, C
SELECT A, B, SUM(M) FROM R GROUP BY A, B
SELECT SUM(M) FROM R
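The three queries above are samples from the full set of group-bys that make up a data cube over A, B, C. As a minimal sketch (the helper names `cube_groupings` and `cube_queries` are hypothetical, not from the talk), the whole set can be enumerated like this:

```python
from itertools import combinations

def cube_groupings(dims):
    """Return every subset of dims, from the full grouping down to the empty one."""
    return [combo
            for k in range(len(dims), -1, -1)
            for combo in combinations(dims, k)]

def cube_queries(dims, measure="M", table="R"):
    """Render one SQL aggregate query per grouping of the cube."""
    queries = []
    for grouping in cube_groupings(dims):
        cols = ", ".join(grouping)
        if cols:
            queries.append(
                f"SELECT {cols}, SUM({measure}) FROM {table} GROUP BY {cols}")
        else:
            # The empty grouping aggregates the whole relation.
            queries.append(f"SELECT SUM({measure}) FROM {table}")
    return queries

queries = cube_queries(["A", "B", "C"])
for q in queries:
    print(q)
```

With three flat dimensions this yields 2^3 = 8 queries, which is why the number of cube nodes grows quickly as dimensions (and hierarchy levels) are added.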
6
Introduction
  • Problems
  • Construction algorithm
  • Storage scheme
  • Focusing on ROLAP techniques (materialized views, MVs)
  • Stressed to limits?
  • Complete solution?

Unclear (not finished with efficient storage)
Unclear (not focused on hierarchies)
7
Introduction
Challenges of hierarchies
Efficient execution plan
  • Small domains in the higher levels of dimension
    hierarchies

New partitioning algorithm
  • Number of tuples increases

Novel storage scheme
10
Execution Plan
  • Extend BUC (Bottom-Up Cube) [BR99]
  • Efficient pipelining
  • Cheap identification of some kinds of redundancy
  • Inherent support for iceberg cubes and holistic
    functions
  • Existing BUC-based methods: BU-BST [WLFY02] and
    QC-Tables [LPH02]
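For readers unfamiliar with BUC, the following is a simplified in-memory sketch of bottom-up cubing with iceberg pruning on a minimum row count. It is not CURE itself and not the original BUC implementation; the function name `buc`, the row layout, and the `min_support` parameter are assumptions made for illustration.

```python
from collections import defaultdict

def buc(rows, dims, min_support=1, start=0, prefix=None, out=None):
    """Simplified BUC sketch.

    rows: list of (dim_values_tuple, measure); dims: number of dimensions.
    Returns {grouping: sum}, where a grouping holds None for aggregated-out dims.
    """
    if prefix is None:
        prefix = [None] * dims
    if out is None:
        out = {}
    # Aggregate the current partition and emit one cube tuple for it.
    out[tuple(prefix)] = sum(m for _, m in rows)
    # Refine the partition by each remaining dimension in turn.
    for d in range(start, dims):
        parts = defaultdict(list)
        for row in rows:
            parts[row[0][d]].append(row)
        for value, part in parts.items():
            if len(part) < min_support:
                continue  # iceberg pruning: no refinement can reach min_support
            prefix[d] = value
            buc(part, dims, min_support, d + 1, prefix, out)
        prefix[d] = None
    return out

rows = [(("a1", "b1", "c1"), 10), (("a1", "b1", "c2"), 20),
        (("a1", "b2", "c1"), 30), (("a2", "b1", "c1"), 40)]
cube = buc(rows, 3)
iceberg = buc(rows, 3, min_support=2)
```

Pruning is safe here because a partition can only shrink as more dimensions are fixed, so a partition below `min_support` cannot produce a qualifying refinement.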

11
Execution Plan
Dimensions: A, B, C
Lattice nodes: ABC; AB, AC, BC; A, B, C; ∅
12
Execution Plan
Dimensions: A0 → A1 → A2, B0 → B1, C0
13
Execution Plan
Dimensions: A0, A1, A2, B0, B1, C0
Height: 3
16
Execution Plan
Dimensions: A0 → A1 → A2, B0 → B1, C0
Height: 6
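One way to read the height of 6 is as the longest roll-up chain in the hierarchical lattice, where each step moves a single dimension up one level. The sketch below enumerates the lattice nodes under that reading; the function names and the dictionary encoding of hierarchies are assumptions for illustration, not taken from the talk.

```python
from itertools import product

def hierarchical_lattice(levels_per_dim):
    """Enumerate lattice nodes for dimensions with linear hierarchies.

    levels_per_dim: e.g. {"A": 3, "B": 2, "C": 1}. A node picks, per dimension,
    one level index (0 = most detailed) or None (dimension aggregated out).
    """
    choices = [[None] + list(range(n)) for n in levels_per_dim.values()]
    return list(product(*choices))

def lattice_height(levels_per_dim):
    """Longest roll-up chain: each step lifts one dimension by one level."""
    return sum(levels_per_dim.values())

hierarchy = {"A": 3, "B": 2, "C": 1}
nodes = hierarchical_lattice(hierarchy)
height = lattice_height(hierarchy)
print(len(nodes), height)
```

For A0 → A1 → A2, B0 → B1, C0 this gives (3+1)(2+1)(1+1) = 24 nodes and a longest chain of 3 + 2 + 1 = 6 steps.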
19
Execution Plan
  • Important properties of BUC-based cubing
  • Recursive calls at higher levels tend to be
    cheaper
  • Benefits from early pruning recursion at some
    node N increase with the number of ancestors of N
    in the execution plan
  • Advantage of taller execution plans

(Figure: execution-plan nodes ABC, AC, AB, A)
20
Execution Plan
CURE's Plan
23
External Partitioning
(Figure sequence: relation R and Memory; final frame labeled Sound)
28
External Partitioning
  • For sound partitioning: biggest partition ≤ M
  • In flat datasets this holds in general
  • In hierarchical datasets
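The soundness condition can be checked mechanically. The sketch below reads the condition as "the biggest partition fits in memory" and measures memory in rows for simplicity; `partition_sizes`, `is_sound`, and the toy data are hypothetical and only illustrate why a coarse hierarchy level can fail the test.

```python
from collections import defaultdict

def partition_sizes(rows, key):
    """Group rows by key(row) and return {partition_value: row_count}."""
    sizes = defaultdict(int)
    for row in rows:
        sizes[key(row)] += 1
    return dict(sizes)

def is_sound(rows, key, memory_rows):
    """Treat a partitioning as sound if its biggest partition fits in memory."""
    sizes = partition_sizes(rows, key)
    return max(sizes.values(), default=0) <= memory_rows

# Toy data: A0 has 100 distinct values, its ancestor level A2 only 2.
rows = [{"A0": i % 100, "A2": (i % 100) // 50} for i in range(1000)]
sound_on_a0 = is_sound(rows, lambda r: r["A0"], memory_rows=50)
sound_on_a2 = is_sound(rows, lambda r: r["A2"], memory_rows=50)
print(sound_on_a0, sound_on_a2)
```

Partitioning on the detailed level A0 yields 100 partitions of 10 rows each, while partitioning on A2 yields 2 partitions of 500 rows each, which no longer fit.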

29
External Partitioning
R = 500 GB, M = 1 GB
R/M = 500
A0 (50,000) → A1 (500) → A2 (5)
A2B0C0 is A0/A2 times smaller than R: A2B0C0 = 50 MB
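The numbers on these slides can be reproduced with a few lines of arithmetic. This sketch assumes decimal units (1 GB = 1000 MB) and values spread uniformly over each level's domain; both are simplifying assumptions, and the variable names are illustrative only.

```python
R_MB = 500 * 1000   # fact table R: 500 GB, expressed in MB
M_MB = 1 * 1000     # available memory M: 1 GB, expressed in MB
domain = {"A0": 50_000, "A1": 500, "A2": 5}  # distinct values per level

def avg_partition_mb(level):
    """Average partition size if R is partitioned on the given level."""
    return R_MB / domain[level]

for level in domain:
    size = avg_partition_mb(level)
    print(level, size, "fits" if size <= M_MB else "too big")

# Node A2B0C0 is estimated to be |A0| / |A2| times smaller than R.
reduction = domain["A0"] / domain["A2"]
a2b0c0_mb = R_MB / reduction
print(reduction, a2b0c0_mb)
```

Under these assumptions, partitioning on A2 gives partitions of about 100 GB each, far above M, whereas the aggregated node A2B0C0 is about 10,000 times smaller than R (50 MB) and fits in memory comfortably.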
41
Storage Format
  • Two types of redundancy
  • Dimensional Redundancy (DR)
  • Aggregational Redundancy (AR)

42
Storage Format
Example with flat cube only for simplicity
43
Storage Format
(Figure panels: CUBE with DR, CUBE without DR)
48
Storage Format
Classify tuples according to AR into
  • Normal Tuples (NTs)
  • Trivial Tuples (TTs)
  • Common Aggregate Tuples (CATs)
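The slides do not define the three classes, so the following sketch rests on assumed definitions that should be checked against the paper: a TT aggregates exactly one fact tuple, a CAT is a non-trivial tuple built from the same set of fact tuples as at least one other cube tuple, and everything else is an NT. It works on an already-enumerated flat cube, which is only meant to illustrate the classes and is not how CURE classifies tuples during construction.

```python
from collections import defaultdict
from itertools import product

def classify_cube_tuples(rows, dims):
    """rows: list of (dim_values, measure). Returns {cube_key: "TT"|"CAT"|"NT"}.

    ASSUMED definitions (not stated on the slides): see the lead-in text.
    """
    # Map each cube tuple to the set of fact row ids it aggregates.
    sources = defaultdict(set)
    for rid, (values, _) in enumerate(rows):
        for mask in product([False, True], repeat=dims):
            key = tuple(v if keep else None for v, keep in zip(values, mask))
            sources[key].add(rid)
    # Count how many cube tuples share each source set.
    shared = defaultdict(int)
    for src in sources.values():
        shared[frozenset(src)] += 1
    labels = {}
    for key, src in sources.items():
        if len(src) == 1:
            labels[key] = "TT"
        elif shared[frozenset(src)] > 1:
            labels[key] = "CAT"
        else:
            labels[key] = "NT"
    return labels

rows = [(("a1", "b1"), 10), (("a1", "b1"), 20), (("a2", "b2"), 30)]
labels = classify_cube_tuples(rows, 2)
```

In this toy input, the cube tuples (a1, b1), (a1, ALL) and (ALL, b1) all aggregate the same two fact rows, so storing their aggregate once would remove the repetition.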
58
Storage Format
  • Purpose of the previous example
  • Explanation of different types of redundancy
  • Not the construction algorithm
  • Constructing an uncompressed cube and then
    compressing it would be inefficient
  • Instead, CURE classifies tuples during
    construction itself (details in the paper)

61
Experimental Evaluation
  • Hierarchical datasets: APB-1
  • Product: Code (6,500) → Class (435) → Group (215)
    → Family (54) → Line (11) → Division (3)
  • Customer: Store (640) → Retailer (71)
  • Time: Month (17) → Quarter (6) → Year (2)
  • Channel: Base (9)
  • Flat datasets: CovType, Sep85L, Synthetic

62
Experimental Evaluation
  • Two versions of CURE
  • CURE
  • CURE

70
Conclusions
  • Main contribution: CURE
  • Efficient execution plan
  • New partitioning algorithm
  • Novel storage scheme
  • Main advantages of CURE
  • Efficient construction of complete cubes over
    large datasets with arbitrary hierarchies
  • Cube compression
  • Optimization opportunities for queries and
    updates
  • Easy implementation

71
Current and Future Work
  • Study of indexing for queries and updates
  • Comparison with the most prominent MOLAP and
    Tree-based techniques

72
Questions???
73
Thank you!
74
Storage Format
(Figure sequence, slides 74-99: Memory Image and Disk Image)