Cube Computation - PowerPoint PPT Presentation

About This Presentation
Title:

Cube Computation

Description:

Cube Computation. Prof. Navneet Goyal, Computer Science & Information Systems Department, BITS, Pilani.

Slides: 22
Provided by: csisBits6

Transcript and Presenter's Notes

1
Cube Computation
  • Prof. Navneet Goyal
  • Computer Science & Information Systems Department
  • BITS, Pilani

2
Cuboids Corresponding to the Cube
  • 0-D (apex) cuboid: all
  • 1-D cuboids: (product), (date), (country)
  • 2-D cuboids: (product, date), (product, country),
    (date, country)
  • 3-D (base) cuboid: (product, date, country)
3
Efficient Data Cube Computation
  • Data cube can be viewed as a lattice of cuboids
  • The bottom-most cuboid is the base cuboid
  • The top-most cuboid (apex) contains only one cell
  • How many cuboids are there in an n-dimensional
    cube with L levels?
  • Materialization of data cube
  • Materialize every cuboid (full materialization),
    none (no materialization), or some (partial
    materialization)
  • Selection of which cuboids to materialize
  • Based on size, sharing, access frequency, etc.
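The counting question above has a standard answer: if dimension i has L_i hierarchy levels (not counting the virtual top level "all"), the cube has (L_1 + 1) × (L_2 + 1) × ... × (L_n + 1) cuboids, which reduces to 2^n when no dimension has a hierarchy. A minimal sketch, with function names chosen here for illustration:

```python
from itertools import combinations
from math import prod


def count_cuboids(levels_per_dimension):
    """Total cuboids = product over dimensions of (L_i + 1)."""
    return prod(levels + 1 for levels in levels_per_dimension)


def enumerate_cuboids(dimensions):
    """List every group-by (cuboid) for dimensions without hierarchies,
    from the apex () up to the base cuboid with all dimensions."""
    return [combo
            for k in range(len(dimensions) + 1)
            for combo in combinations(dimensions, k)]


lattice = enumerate_cuboids(["product", "date", "country"])
for cuboid in lattice:
    print(cuboid if cuboid else "all")
```

For the product/date/country example this lists the 8 cuboids of the lattice on slide 2.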

4
Cube Computation: ROLAP-Based Method
  • Efficient cube computation methods
  • ROLAP-based cubing algorithms (Agarwal et al., 1996)
  • Array-based cubing algorithm (Zhao et al., 1997)
  • Bottom-up computation method (Beyer &
    Ramakrishnan, 1999)
  • ROLAP-based cubing algorithms
  • Sorting, hashing, and grouping operations are
    applied to the dimension attributes in order to
    reorder and cluster related tuples
  • Grouping is performed on some subaggregates as a
    partial grouping step
  • Aggregates may be computed from previously
    computed aggregates, rather than from the base
    fact table
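The last point can be sketched as follows, assuming a small made-up fact table and SUM as the aggregate (a distributive function, so sums of sums are valid). The helper `group_sum` and the sample rows are hypothetical, not from the slides:

```python
from collections import defaultdict

# Hypothetical fact table rows: (product, date, country, sales)
facts = [
    ("tv", "2024-01", "IN", 10),
    ("tv", "2024-01", "US", 5),
    ("tv", "2024-02", "IN", 7),
    ("radio", "2024-01", "IN", 3),
]


def group_sum(rows, key_positions):
    """Sum the last field of each row, grouped by the fields at key_positions."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in key_positions)
        totals[key] += row[-1]
    return dict(totals)


# The (product, date) cuboid is computed from the base fact table.
product_date = group_sum(facts, (0, 1))

# The (product) cuboid is then computed from the smaller (product, date)
# cuboid instead of rescanning the fact table.
product_date_rows = [key + (total,) for key, total in product_date.items()]
product_from_parent = group_sum(product_date_rows, (0,))

# Reference result computed directly from the fact table, for comparison.
product_from_base = group_sum(facts, (0,))
```

Both routes give the same (product) cuboid; the route through the parent reads fewer rows whenever the parent is smaller than the fact table.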

5
Cube Computation: ROLAP-Based Method (2)
  • Hash/sort based methods (Agarwal et al., VLDB 1996)
  • Smallest-parent: computing a cuboid from the
    smallest previously computed cuboid
  • Cache-results: caching the results of a cuboid
    from which other cuboids are computed, to reduce
    disk I/Os
  • Amortize-scans: computing as many cuboids as
    possible at the same time to amortize disk reads
  • Share-sorts: sharing sorting costs across
    multiple cuboids when a sort-based method is used
  • Share-partitions: sharing the partitioning cost
    across multiple cuboids when hash-based
    algorithms are used
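The smallest-parent heuristic can be illustrated with a short sketch. The row counts below are invented for the example, and `smallest_parent` is a hypothetical helper, not a function from the cited paper:

```python
def smallest_parent(target, computed_sizes):
    """Pick the smallest already-computed cuboid whose dimensions are a
    strict superset of the target cuboid's dimensions."""
    candidates = [dims for dims in computed_sizes
                  if set(target) < set(dims)]
    return min(candidates, key=lambda dims: computed_sizes[dims])


# Hypothetical row counts of cuboids that have already been computed.
computed_sizes = {
    ("product", "date", "country"): 1_000_000,
    ("product", "date"): 200_000,
    ("product", "country"): 5_000,
}

parent = smallest_parent(("product",), computed_sizes)
print(parent)
```

Here (product) can be derived from three computed cuboids, and the heuristic picks (product, country) because it has the fewest rows to aggregate.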

6
Multi-way Array Aggregation for Cube Computation
  • Partition arrays into chunks (a small subcube
    which fits in memory).
  • Compressed sparse array addressing: (chunk_id,
    offset)
  • Compute aggregates in a multiway fashion by
    visiting cube cells in the order that minimizes
    the number of times each cell is visited, and
    reduces memory access and storage cost.

What is the best traversal order for multi-way
aggregation?
7
Multi-way Array Aggregation for Cube Computation
  • Example: 3-D data array containing 3 dimensions A,
    B, C
  • Array is partitioned into small, memory-based
    chunks
  • 64 chunks
  • Dimension A is organized into 4 equi-sized
    partitions a0-a3
  • Same for B and C
  • Chunks 1, 2, ..., 64 correspond to the subcubes
    a0b0c0, a1b0c0, ..., a3b3c3.

8
Multi-way Array Aggregation for Cube Computation
  • Example: 3-D data array containing 3 dimensions A,
    B, C
  • Cardinality of the dimensions: A = 40, B = 400,
    C = 4000
  • Size of each partition in A, B, C is therefore
    10, 100, 1000, respectively.
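With these sizes each chunk holds 10 × 100 × 1000 = 1,000,000 cells. A minimal sketch of (chunk_id, offset) addressing follows. The chunk numbering (A varying fastest, ids 1 to 64) matches the slides; the layout of cells inside a chunk is an assumption made here for illustration:

```python
CARDINALITY = (40, 400, 4000)                              # dimensions A, B, C
PARTITIONS = 4                                             # partitions per dimension
CHUNK_SIZE = tuple(c // PARTITIONS for c in CARDINALITY)   # (10, 100, 1000)


def chunk_address(a, b, c):
    """Map 0-based cell coordinates to (chunk_id, offset).

    chunk_id runs from 1 to 64 with A varying fastest, as on the slides.
    offset assumes cells inside a chunk are also stored with A varying
    fastest (an assumption, not stated in the slides).
    """
    size_a, size_b, size_c = CHUNK_SIZE
    part_a, part_b, part_c = a // size_a, b // size_b, c // size_c
    chunk_id = 1 + part_a + PARTITIONS * part_b + PARTITIONS ** 2 * part_c
    local_a, local_b, local_c = a % size_a, b % size_b, c % size_c
    offset = local_a + size_a * local_b + size_a * size_b * local_c
    return chunk_id, offset


print(chunk_address(0, 0, 0))     # first cell of chunk a0b0c0
print(chunk_address(10, 0, 0))    # first cell of chunk a1b0c0
```

A sparse chunk can then store only its non-empty cells as (offset, value) pairs under its chunk_id.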

9
Multi-way Array Aggregation for Cube Computation
  • Example: 3-D data array containing 3 dimensions A,
    B, C
  • Many possible orderings with which chunks can be
    read into memory
  • Suppose we want to compute the b0c0 chunk of the BC
    cuboid
  • By scanning chunks 1-4 of ABC, the b0c0 chunk is
    computed
  • Cells for b0c0 are aggregated over a0 to a3.

10
Multi-way Array Aggregation for Cube Computation
  • Chunk memory can now be assigned to the next
    chunk, b1c0, which completes its aggregation
    after scanning the next 4 chunks of ABC (5-8)
  • Continuing in this manner, the entire BC cuboid
    can be computed
  • ONLY 1 chunk of BC needs to be in memory for the
    computation of all chunks of BC

11
Multi-way Array Aggregation for Cube Computation
  • In computing the entire BC cuboid, we will have
    examined all 64 chunks
  • Is there a way to avoid having to rescan all of
    these chunks for the computation of other
    cuboids?
  • YES
  • MULTIWAY COMPUTATION OR SIMULTANEOUS AGGREGATION

12
Multi-way Array Aggregation for Cube Computation
  • For example, when chunk 1 (a0b0c0) is being
    scanned (say for the computation of the 2D chunk
    b0c0 of BC), all of the other 2D chunks relating
    to a0b0c0 can be simultaneously computed.
  • That is, when a0b0c0 is being scanned, each of
    the three chunks, b0c0, a0c0, a0b0, on the
    three 2D aggregation planes, BC, AC, AB, should
    be computed then as well
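A minimal sketch of simultaneous aggregation, assuming a tiny 4 × 4 × 4 array in which each cell stands in for one chunk and SUM is the aggregate. The cell values are made up so that the results are easy to check by hand:

```python
from collections import defaultdict
from itertools import product

SIZE = 4

# Hypothetical cube: the value at (a, b, c) encodes its own coordinates.
cube = {(a, b, c): a + 10 * b + 100 * c
        for a, b, c in product(range(SIZE), repeat=3)}


def multiway_aggregate(cells):
    """Visit every 3-D cell once and update all three 2-D planes together."""
    ab, ac, bc = defaultdict(int), defaultdict(int), defaultdict(int)
    for (a, b, c), value in cells.items():
        ab[(a, b)] += value   # aggregate over C
        ac[(a, c)] += value   # aggregate over B
        bc[(b, c)] += value   # aggregate over A
    return dict(ab), dict(ac), dict(bc)


ab_plane, ac_plane, bc_plane = multiway_aggregate(cube)
print(bc_plane[(0, 0)], ac_plane[(0, 0)], ab_plane[(0, 0)])
```

The single loop replaces three separate scans of the base array, which is the saving the slide describes.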

13
Multi-way Array Aggregation for Cube Computation
  • Multiway computation simultaneously aggregates to
    each of the 2D planes while a 3D chunk is in
    memory.
  • Largest 2D plane is BC (400 × 4000 = 1,600,000),
    then AC (40 × 4000 = 160,000), and finally AB
    (40 × 400 = 16,000).
  • Scan the chunks in the order shown below.

14
Multi-way Array Aggregation for Cube Computation
[Figure: the b0c0 chunk, shown along the B axis]
15
Multi-way Array Aggregation for Cube Computation
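[Figure: the 3-D array ABC with axes A (a0-a3), B (b0-b3), and C (c0-c3), partitioned into 64 chunks numbered 1-64. In the front plane c0, row b0 holds chunks 1-4, b1 holds 5-8, b2 holds 9-12, and b3 holds 13-16; planes c1, c2, and c3 hold chunks 17-32, 33-48, and 49-64.]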
16
Multi-way Array Aggregation for Cube Computation
  • Suppose chunks are scanned in the order 1 to 64
  • One chunk of the largest 2D plane BC is fully
    computed for each row scanned
  • b0c0 is fully aggregated after scanning 1-4
  • Similarly, b1c0 is fully aggregated after
    scanning 5-8, and so on
  • Complete computation of 1 chunk of the 2nd largest
    plane, AC, requires scanning 13 chunks, given the
    ordering 1-64

17
Multi-way Array Aggregation for Cube Computation
  • a0c0 is fully aggregated after scanning chunks 1,
    5, 9, and 13.
  • For the smallest plane AB, the chunk a0b0 requires
    scanning 49 chunks (it aggregates chunks 1, 17,
    33, and 49).
  • NOTE that AB requires the longest scan of chunks
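The scan counts above can be checked with a short sketch, assuming the chunk numbering from the slides (ids 1 to 64, A varying fastest, then B, then C). The helper names are chosen here for illustration:

```python
def chunk_coords(chunk_id):
    """Return the (a, b, c) partition indices of a chunk id in 1..64."""
    index = chunk_id - 1
    return index % 4, (index // 4) % 4, index // 16


def contributing_chunks(a=None, b=None, c=None):
    """Chunk ids, in scan order 1..64, that feed the 2-D chunk whose fixed
    partition indices are given; the dimension left as None is aggregated."""
    wanted = (a, b, c)
    return [chunk_id for chunk_id in range(1, 65)
            if all(w is None or w == got
                   for w, got in zip(wanted, chunk_coords(chunk_id)))]


b0c0 = contributing_chunks(b=0, c=0)
a0c0 = contributing_chunks(a=0, c=0)
a0b0 = contributing_chunks(a=0, b=0)

# The last contributing id is the number of chunks scanned before the
# 2-D chunk is complete.
for name, ids in (("b0c0", b0c0), ("a0c0", a0c0), ("a0b0", a0b0)):
    print(name, ids, "complete after scanning", ids[-1], "chunks")
```

The longer a 2-D chunk stays incomplete, the longer its memory must be held, which is why the plane that waits longest should be the smallest one.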

18
Multi-Way Array Aggregation for Cube Computation
(Cont.)
  • Method: the planes should be sorted and computed
    according to their size in ascending order.
  • Idea: keep the smallest plane in main memory;
    fetch and compute only one chunk at a time for
    the largest plane
  • Limitation of the method: it computes well only
    for a small number of dimensions
  • If there are a large number of dimensions,
    bottom-up computation and iceberg cube
    computation methods can be explored

19
Multi-Way Array Aggregation for Cube Computation
(Cont.)
BEST (156000 memory units)
WORST (1641000 memory units)
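The two figures can be reproduced with a small calculation. The sketch below is one reading of the method, not taken verbatim from the slides: for a scan order in which the first dimension varies fastest, the plane that omits the fastest dimension needs one chunk in memory, the plane that omits the middle dimension needs one row of chunks, and the plane that omits the slowest dimension must be held whole.

```python
from itertools import permutations

CARDINALITY = {"A": 40, "B": 400, "C": 4000}
CHUNK = {"A": 10, "B": 100, "C": 1000}


def memory_units(order):
    """Memory for the three 2-D planes when chunks are scanned with
    order[0] varying fastest and order[2] varying slowest."""
    fastest, middle, slowest = order
    one_chunk = CHUNK[middle] * CHUNK[slowest]                 # plane without `fastest`
    one_row = CARDINALITY[fastest] * CHUNK[slowest]            # plane without `middle`
    whole_plane = CARDINALITY[fastest] * CARDINALITY[middle]   # plane without `slowest`
    return one_chunk + one_row + whole_plane


costs = {order: memory_units(order) for order in permutations("ABC")}
best = min(costs, key=costs.get)
worst = max(costs, key=costs.get)
print("best", best, costs[best])
print("worst", worst, costs[worst])
```

Under these assumptions the best order scans A fastest and C slowest (100,000 + 40,000 + 16,000 units), and the worst order is the reverse (1,000 + 40,000 + 1,600,000 units).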
20
References
  • S. Agarwal, R. Agrawal, P. M. Deshpande, A.
    Gupta, J. F. Naughton, R. Ramakrishnan, and S.
    Sarawagi. On the computation of multidimensional
    aggregates. In Proc. 1996 Int. Conf. Very Large
    Data Bases, 506-521, Bombay, India, Sept. 1996.
  • K. Beyer and R. Ramakrishnan. Bottom-Up
    Computation of Sparse and Iceberg CUBEs. In
    Proc. 1999 ACM-SIGMOD Int. Conf. Management of
    Data (SIGMOD'99), 359-370, Philadelphia, PA, June
    1999.
  • V. Harinarayan, A. Rajaraman, and J. D. Ullman.
    Implementing data cubes efficiently. In Proc.
    1996 ACM-SIGMOD Int. Conf. Management of Data,
    pages 205-216, Montreal, Canada, June 1996.
  • Y. Zhao, P. M. Deshpande, and J. F. Naughton. An
    array-based algorithm for simultaneous
    multidimensional aggregates. In Proc. 1997
    ACM-SIGMOD Int. Conf. Management of Data,
    159-170, Tucson, Arizona, May 1997.

21
Q & A
22
Thank You