Cube Computation - PowerPoint PPT Presentation

1 / 21

About This Presentation

Title:

Cube Computation

Description:

Cube Computation Prof. Navneet Goyal Computer Science & Information Systems Department BITS, Pilani * * Cuboids Corresponding to the Cube all product date country ... – PowerPoint PPT presentation

Number of Views:186

Avg rating:3.0/5.0

Slides: 22

Provided by: csisBits6

Category:

more less

Transcript and Presenter's Notes

Title: Cube Computation

1
Cube Computation

Prof. Navneet Goyal
Computer Science Information Systems Department
BITS, Pilani

2
Cuboids Corresponding to the Cube
all
0-D(apex) cuboid
country
product
date
1-D cuboids
product,date
product,country
date, country
2-D cuboids
3-D(base) cuboid
product, date, country
3
Efficient Data Cube Computation

Data cube can be viewed as a lattice of cuboids
The bottom-most cuboid is the base cuboid
The top-most cuboid (apex) contains only one cell
How many cuboids in an n-dimensional cube with L
levels?
Materialization of data cube
Materialize every (cuboid) (full
materialization), none (no materialization), or
some (partial materialization)
Selection of which cuboids to materialize
Based on size, sharing, access frequency, etc.

4
Cube Computation ROLAP-Based Method

Efficient cube computation methods
ROLAP-based cubing algorithms (Agarwal et al96)
Array-based cubing algorithm (Zhao et al97)
Bottom-up computation method (Bayer
Ramarkrishnan99)
ROLAP-based cubing algorithms
Sorting, hashing, and grouping operations are
applied to the dimension attributes in order to
reorder and cluster related tuples
Grouping is performed on some subaggregates as a
partial grouping step
Aggregates may be computed from previously
computed aggregates, rather than from the base
fact table

5
Cube Computation ROLAP-Based Method (2)

Hash/sort based methods (Agarwal et. al. VLDB96)
Smallest-parent computing a cuboid from the
smallest cubod previously computed cuboid.
Cache-results caching results of a cuboid from
which other cuboids are computed to reduce disk
I/Os
Amortize-scans computing as many as possible
cuboids at the same time to amortize disk reads
Share-sorts sharing sorting costs cross
multiple cuboids when sort-based method is used
Share-partitions sharing the partitioning cost
cross multiple cuboids when hash-based algorithms
are used

6
Multi-way Array Aggregation for Cube Computation

Partition arrays into chunks (a small subcube
which fits in memory).
Compressed sparse array addressing (chunk_id,
offset)
Compute aggregates in multiway by visiting cube
cells in the order which minimizes the of times
to visit each cell, and reduces memory access and
storage cost.

What is the best traversing order to do multi-way
aggregation?
7
Multi-way Array Aggregation for Cube Computation

Example3-D data array containing 3 dimensions A,
B, C
Array is partitioned into small, memory-based
chunks
64 chunks
Dimension A is organized into 4 equi-sized
partitions a0-a3
Same for B C
Chunks 1, 2,,64 correspond to the subcubes
a0b0c0, a1b0c0,a3b3c3.

8
Multi-way Array Aggregation for Cube Computation

Example3-D data array containing 3 dimensions A,
B, C
Cardinality of the dimensions A-40, B-400,
C-4000
Size of each partition in A, B, C is therefore
10, 100, 1000.

9
Multi-way Array Aggregation for Cube Computation

Example3-D data array containing 3 dimensions A,
B, C
Many possible orderings with which chunks can be
read into memory
Suppose we want to computer b0c0 chunk of the BC
cuboid
By scanning chunks 1-4 of ABC, the b0c0 chunk is
computed
Cells for b0c0 are aggregated over a0 to a3.

10
Multi-way Array Aggregation for Cube Computation

Chunk memory can now be assigned to the next
chunk b1c0, which completes its aggregation after
scanning the next 4 chunks of ABC 5-8
Continuing in this manner, the entire BC cuboid
can be computed
ONLY 1 chunk of BC needs to be in memory for the
computation of all chunks of BC

11
Multi-way Array Aggregation for Cube Computation

In computing entire BC cubiod, we will have
examined all the 64 chunks
Is there a way to avoid having to rescan all of
these chunks for the computation of other
cuboids?
YES
MULTIWAY COMPUTATION OR SIMULTANEOUS AGGREGATION

12
Multi-way Array Aggregation for Cube Computation

For example, when chunk 1 (a0b0c0) is being
scanned (say for the computation of the 2D chunk
b0c0 of BC), all of the other 2D chunks relating
to a0b0c0 can be simultaneoulsy computed.
That is, when a0b0c0 is being scanned, each of
the three chunks, b0c0, a0c0, a0b0, on the
three 2D aggregation planes, BC, AC, AB, should
be computed then as well

13
Multi-way Array Aggregation for Cube Computation

Multiway computation simultaneously aggregates to
each of the 2D planes while a 3D chunk is in
memory.
Largest 2D plane is BC (40040001600000), then
AC (404000160000) finally AB (4040016000).
Scan the chunks in the order shown below.

14
Multi-way Array Aggregation for Cube Computation
B
b0c0 chunk
15
Multi-way Array Aggregation for Cube Computation
C
64
63
62
61
c3
c2
48
47
46
45
c1
29
30
31
32
c 0
B
60
13
14
15
16
b3
44
28
B
56
9
b2
40
24
52
5
b1
36
20
1
2
3
4
b0
a1
a0
a2
a3
A
16
Multi-way Array Aggregation for Cube Computation

Suppose chunks are scanned in the order shown
below
One chunk of the largest 2D plane BC is fully
computed for each row scanned
b0co is fully aggregated after scanning 1-4
Similarly b1co is fully aggregated after scanning
5-8 so on
Complete computation of 1 chunk of 2nd largest
plane AC, requires 13 chunks, given the ordering
1-64

17
Multi-way Array Aggregation for Cube Computation

Complete computation of 1 chunk of 2nd largest
plane AC, requires 13 chunks, given the ordering
1-64
a0c0 is fully aggregated after scanning of 1,5,9,
13.
For smallest plane AB, the chunk a0b0 requires
scanning 49 chunks. ( 1,17,33, 49)
NOTE that AB requires the longest scan of chunks

18
Multi-Way Array Aggregation for Cube Computation
(Cont.)

Method the planes should be sorted and computed
according to their size in ascending order.
Idea keep the smallest plane in the main memory,
fetch and compute only one chunk at a time for
the largest plane
Limitation of the method computing well only for
a small number of dimensions
If there are a large number of dimensions,
bottom-up computation and iceberg cube
computation methods can be explored

19
Multi-Way Array Aggregation for Cube Computation
(Cont.)
BEST (156000 memory units)
WORST (1641000 memory units)
20
References

S. Agarwal, R. Agrawal, P. M. Deshpande, A.
Gupta, J. F. Naughton, R. Ramakrishnan, and S.
Sarawagi. On the computation of multidimensional
aggregates. In Proc. 1996 Int. Conf. Very Large
Data Bases, 506-521, Bombay, India, Sept. 1996.
K. Beyer and R. Ramakrishnan. Bottom-Up
Computation of Sparse and Iceberg CUBEs. In
Proc. 1999 ACM-SIGMOD Int. Conf. Management of
Data (SIGMOD'99), 359-370, Philadelphia, PA, June
1999.
V. Harinarayan, A. Rajaraman, and J. D. Ullman.
Implementing data cubes efficiently. In Proc.
1996 ACM-SIGMOD Int. Conf. Management of Data,
pages 205-216, Montreal, Canada, June 1996.
Y. Zhao, P. M. Deshpande, and J. F. Naughton. An
array-based algorithm for simultaneous
multidimensional aggregates. In Proc. 1997
ACM-SIGMOD Int. Conf. Management of Data,
159-170, Tucson, Arizona, May 1997.