Dense-Region Based Compact Data Cube

Provided by: HKUC


1
Dense-Region Based Compact Data Cube

Presented by Kan Kin Fai

2
Outline

Background
Introduction to Compact Data Cube
Pros and cons of the Compact Data Cube method
Dense-Region Based Compact Data Cube

3
Background

What is a data cube?
A set of pre-computed aggregates over the underlying
data warehouse.
System constraints on materializing data cube(s):
disk space, maintenance cost, etc.
Common approach: materialize only parts of a data
cube.
Alternative: use an approximation technique.
Reason: OLAP applications accept approximate
answers in many scenarios.

4
Introduction to Compact Data Cube

Compact Data Cube was proposed by Vitter and Wang
in "Approximate Computation of Multidimensional
Aggregates of Sparse Data Using Wavelets" (SIGMOD
'99).
Main ideas
Offline phase: perform a Haar wavelet transform on
the underlying data (i.e. the base cuboid) and
store the k most significant coefficients.
Online phase: process any given query based on
the k most significant coefficients.

5
Introduction to Compact Data Cube

Basics of Haar wavelet transform
Building Compact Data Cube
Thresholding and Ranking
Answering On-Line Queries

6
Introduction to Compact Data Cube

Basics of Haar wavelet transform
e.g. S = [2, 2, 0, 2, 3, 5, 4, 4]
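The decomposition of the example signal can be traced with a short sketch. It assumes the usual unnormalized Haar scheme (pairwise averages, and half of each pairwise difference as the detail coefficient); the function name `haar_decompose` and the output layout (overall average first, then details from coarsest to finest) are illustrative choices, not taken from the slides.

```python
def haar_decompose(signal):
    """Unnormalized 1-D Haar transform; the length must be a power of two."""
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    current = [float(v) for v in signal]
    details = []  # detail coefficients, coarsest resolution first
    while len(current) > 1:
        pairs = list(zip(current[0::2], current[1::2]))
        averages = [(a + b) / 2 for a, b in pairs]
        diffs = [(a - b) / 2 for a, b in pairs]
        details = diffs + details  # finer details stay to the right
        current = averages
    return current + details


coeffs = haar_decompose([2, 2, 0, 2, 3, 5, 4, 4])
print(coeffs)  # [2.75, -1.25, 0.5, 0.0, 0.0, -1.0, -1.0, 0.0]
```

The first entry is the overall average of the signal; every other entry records how the left half of some interval differs from that interval's average.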

7
Introduction to Compact Data Cube

Basics of Haar wavelet transform
For compression reasons, the detail coefficients
are normalized.
The coefficients at the lower resolutions are
weighted more heavily.
Approximates the original signal by keeping only
the most significant coefficients.
Requires only O(N) CPU time and O(N/B) I/Os to
compute for a signal of N values.
Multidimensional wavelet transform: a series of
one-dimensional wavelet transforms.
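As a rough illustration of the last point, a two-dimensional transform can be sketched as a full one-dimensional transform along every row followed by one along every column. This is a sketch under stated assumptions: it works on a small dense in-memory array, leaves the coefficients unnormalized, and uses the hypothetical helper names `haar_1d` and `haar_2d`.

```python
def haar_1d(values):
    """Unnormalized 1-D Haar transform (length must be a power of two)."""
    current = [float(v) for v in values]
    details = []
    while len(current) > 1:
        pairs = list(zip(current[0::2], current[1::2]))
        details = [(a - b) / 2 for a, b in pairs] + details
        current = [(a + b) / 2 for a, b in pairs]
    return current + details


def haar_2d(matrix):
    """Transform every row, then every column of the row-transformed result."""
    rows = [haar_1d(row) for row in matrix]
    cols = [haar_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]  # transpose back to row-major


print(haar_2d([[1, 2], [3, 4]]))  # [[2.5, -0.5], [-1.0, 0.0]]
```

The entry at position (0, 0) is the average of all cells; extending the idea to d dimensions means repeating the one-dimensional pass once per dimension.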

8
Introduction to Compact Data Cube

Building the Compact Data Cube
Problem 1: the size of the multidimensional array
representing the underlying data is too large
(assume the data are very sparse).
Solution: divide the wavelet transform process
into multiple passes.

9
Introduction to Compact Data Cube

Building the Compact Data Cube
Problem 2: the density of the intermediate
results increases from pass to pass.
Solution: truncate the intermediate
multidimensional array by cutting off entries
with small magnitude.
I/O complexity

10
Introduction to Compact Data Cube

Thresholding and Ranking
Choice 1: keep the C largest (in absolute value)
wavelet coefficients.
Choice 2: keep the C wavelet coefficients with
the largest weights among the C' largest
coefficients (C < C').
The weight of a coefficient equals the number
of its dimensions with value zero.
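The two choices can be compared on a toy coefficient set. The sketch assumes coefficients are held in a dictionary keyed by their multidimensional index, that "dimensions with value zero" means index coordinates equal to 0, and that the candidate set in Choice 2 is larger than the number finally kept (called `c_prime` here). Breaking weight ties by magnitude is an added assumption, as are all function names.

```python
def keep_largest(coeffs, c):
    """Choice 1: keep the c coefficients with the largest absolute value."""
    ranked = sorted(coeffs.items(), key=lambda item: abs(item[1]), reverse=True)
    return dict(ranked[:c])


def weight(index):
    """Number of coordinates of the coefficient index that are zero."""
    return sum(1 for i in index if i == 0)


def keep_heaviest(coeffs, c, c_prime):
    """Choice 2: among the c_prime largest, keep the c with the largest weight."""
    candidates = keep_largest(coeffs, c_prime)
    ranked = sorted(
        candidates.items(),
        key=lambda item: (weight(item[0]), abs(item[1])),  # ties: larger magnitude first
        reverse=True,
    )
    return dict(ranked[:c])


coeffs = {(0, 0): 2.5, (0, 1): -0.5, (1, 0): -1.0, (1, 1): 3.0, (2, 3): 0.1}
print(keep_largest(coeffs, 2))      # {(1, 1): 3.0, (0, 0): 2.5}
print(keep_heaviest(coeffs, 2, 4))  # {(0, 0): 2.5, (1, 0): -1.0}
```

In this toy input Choice 2 drops the largest coefficient, (1, 1), because none of its coordinates is zero.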

11
Introduction to Compact Data Cube

Answering On-Line Queries
Space ((d1)k), CPU time
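One way to picture the online phase is a one-dimensional range sum computed directly from the stored coefficients, without rebuilding the signal. The sketch below is only an illustration, not the algorithm from the paper: it assumes unnormalized Haar coefficients stored in a dictionary keyed by their position in the layout (overall average at index 0, then details from coarsest to finest), and the names `overlap` and `range_sum` are hypothetical.

```python
def overlap(lo, hi, start, end):
    """Count the integers in [lo, hi] that also lie in [start, end)."""
    return max(0, min(hi + 1, end) - max(lo, start))


def range_sum(coeffs, n, lo, hi):
    """Approximate sum of signal[lo..hi] from sparse unnormalized Haar coefficients."""
    total = 0.0
    for index, value in coeffs.items():
        if index == 0:  # overall average counts once per cell in the range
            total += value * (hi - lo + 1)
            continue
        level = index.bit_length() - 1   # index == 2**level + position
        position = index - (1 << level)
        width = n >> level               # number of cells the coefficient spans
        start = position * width
        mid = start + width // 2
        plus = overlap(lo, hi, start, mid)           # cells receiving +value
        minus = overlap(lo, hi, mid, start + width)  # cells receiving -value
        total += value * (plus - minus)
    return total


# Non-zero coefficients of S = [2, 2, 0, 2, 3, 5, 4, 4]
full = {0: 2.75, 1: -1.25, 2: 0.5, 5: -1.0, 6: -1.0}
print(range_sum(full, 8, 2, 5))                  # 10.0 (exact: 0 + 2 + 3 + 5)
print(range_sum({0: 2.75, 1: -1.25}, 8, 2, 5))   # 11.0 (approximation from two coefficients)
```

Each stored coefficient is visited once, so the cost of a query in this sketch grows with the number of coefficients kept rather than with the size of the cube.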

12
Pros and cons of the compact data cube method

Pros
Requires little disk space (a small number of
disk blocks).
Responds to on-line queries quickly.
Answers OLAP queries more accurately than other
approximation techniques such as histograms and
random sampling.
Can progressively refine the approximate answer
with no added overhead.

13
Pros and cons of the compact data cube method

Cons
Approximates a vast number of useless empty cells
in the base cuboid together with the useful
non-empty cells.
Needs to cut off entries with small magnitude at
the end of each pass in order to keep the number
of I/O operations constant from pass to
pass.

14
Dense-Region Based Compact Data Cube

Aim
To enhance the ability of the compact data cube
method to handle datasets that have the
dense-regions-in-sparse-cube property.
Main Idea
To exclude empty cells in the base cuboid from
approximation.
Two-phase approach
Compute the dense regions in the base cuboid.
Approximate each dense region independently.

15
Dense-Region Based Compact Data Cube

Question 1: how can we find the dense regions
efficiently?
Efficient DEnse region Mining (EDEM) algorithm,
proposed by Cheung et al. in "DROLAP -- A
Dense-Region-Based Approach to On-line Analytical
Processing" (DEXA '99).

16
Dense-Region Based Compact Data Cube

Basic ideas of EDEM
Build a k-d tree to store the valid cells.
Grow dense region covers along boundaries.
Search dense regions among the covers.
Complexity of EDEM: linear in the number of
dimensions and sub-quadratic in the number of
data points.

17
Dense-Region Based Compact Data Cube

Question 2: how should we allocate disk space in
approximating the dense regions?
Choice 1: allocate disk space equally to each
dense region.
Choice 2: allocate disk space according to the
sizes of the dense regions.
Choice 3: order the wavelet coefficients of all
the dense regions and keep the most significant
ones (in absolute value).
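The three allocation choices can be sketched side by side. The sketch measures disk space as a budget of stored coefficients, identifies regions by made-up names, and, for Choice 3, assumes the coefficients of different regions are normalized so that their magnitudes are comparable; none of these details come from the slides.

```python
def allocate_equally(region_sizes, budget):
    """Choice 1: every dense region gets the same share of the budget."""
    share = budget // len(region_sizes)
    return {name: share for name in region_sizes}


def allocate_by_size(region_sizes, budget):
    """Choice 2: shares are proportional to the number of cells per region."""
    total = sum(region_sizes.values())
    return {name: budget * size // total for name, size in region_sizes.items()}


def allocate_globally(region_coeffs, budget):
    """Choice 3: pool all coefficients and keep the largest in absolute value."""
    pooled = [
        (abs(value), name, index)
        for name, coeffs in region_coeffs.items()
        for index, value in coeffs.items()
    ]
    pooled.sort(reverse=True)
    kept = {name: {} for name in region_coeffs}
    for _, name, index in pooled[:budget]:
        kept[name][index] = region_coeffs[name][index]
    return kept


sizes = {"A": 64, "B": 16}
print(allocate_equally(sizes, 10))  # {'A': 5, 'B': 5}
print(allocate_by_size(sizes, 10))  # {'A': 8, 'B': 2}

coeffs = {"A": {0: 5.0, 1: -0.2, 2: 1.5}, "B": {0: 3.0, 1: -2.0}}
print(allocate_globally(coeffs, 3))  # {'A': {0: 5.0}, 'B': {0: 3.0, 1: -2.0}}
```

Under Choice 3 the small region B ends up with more coefficients than A in this toy input, because the split follows coefficient magnitude rather than region size.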

18
Dense-Region Based Compact Data Cube

Question 3: how should we treat the data points
outside the dense regions?
Keep all of them, or keep only the significant ones.
Question 4: how do we answer on-line queries
using the dense-region based approach?
Check whether a dense region is covered by the
given query.
Check whether the stored coefficients contribute to
the range sum and compute the amount of
contribution if needed.
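The query procedure for Question 4 might look roughly like the sketch below. It is a guess at the overall control flow, not the presenter's method: each dense region is assumed to be an axis-aligned box with its own range-sum estimator in local coordinates, points outside the dense regions are assumed to be stored exactly, and `uniform` is a stand-in for a real wavelet-based estimator.

```python
import math


def intersect(box_a, box_b):
    """Intersection of two inclusive boxes given as (lo, hi) tuples, or None."""
    lo = tuple(max(a, b) for a, b in zip(box_a[0], box_b[0]))
    hi = tuple(min(a, b) for a, b in zip(box_a[1], box_b[1]))
    return (lo, hi) if all(l <= h for l, h in zip(lo, hi)) else None


def uniform(density):
    """Stand-in estimator: pretends every cell of the region holds `density`."""
    def estimate(lo, hi):
        return density * math.prod(h - l + 1 for l, h in zip(lo, hi))
    return estimate


def answer_range_sum(query, regions, outliers):
    total = 0.0
    for region in regions:
        hit = intersect(query, region["box"])
        if hit is None:  # the query does not touch this dense region
            continue
        origin = region["box"][0]
        local_lo = tuple(l - o for l, o in zip(hit[0], origin))
        local_hi = tuple(h - o for h, o in zip(hit[1], origin))
        total += region["estimate"](local_lo, local_hi)
    for cell, value in outliers.items():  # exact points outside dense regions
        if all(l <= c <= h for l, c, h in zip(query[0], cell, query[1])):
            total += value
    return total


regions = [
    {"box": ((0, 0), (3, 3)), "estimate": uniform(2.0)},
    {"box": ((10, 10), (11, 11)), "estimate": uniform(5.0)},
]
outliers = {(5, 5): 7.0, (9, 9): 1.0}
print(answer_range_sum(((2, 2), (5, 5)), regions, outliers))  # 15.0
```

Only the first region overlaps the query (4 cells at density 2.0), and only the outlier at (5, 5) falls inside it.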

19
Dense-Region Based Compact Data Cube

One favorable side effect
We may parallelize the construction of the compact
data cube.
More questions
How can we handle updates to the underlying data?
How can we approximate an iceberg cube? Can we apply
the idea of the compact data cube to iceberg cubes?
Can a compact data cube be used to answer other
types of OLAP queries besides range-sum?
