Dense-Region Based Compact Data Cube

Provided by: HKUC

1
Dense-Region Based Compact Data Cube
  • Presented by Kan Kin Fai

2
Outline
  • Background
  • Introduction to Compact Data Cube
  • Pros and cons of the Compact Data Cube method
  • Dense-Region Based Compact Data Cube

3
Background
  • What is a data cube?
  • Some pre-computed aggregates on the underlying
    data warehouse.
  • System constraints on materializing data cube(s)
  • Disk space, maintenance cost, etc.
  • Common approach: materialize parts of a data
    cube.
  • Alternative: use approximation techniques.
  • Reason: OLAP applications accept approximate
    answers in many scenarios.

4
Introduction to Compact Data Cube
  • The Compact Data Cube was proposed by Vitter and
    Wang in "Approximate Computation of
    Multidimensional Aggregates of Sparse Data Using
    Wavelets" (SIGMOD '99).
  • Main Ideas
  • Offline phase: perform the Haar wavelet transform
    on the underlying data (i.e. the base cuboid) and
    store the k most significant coefficients.
  • Online phase: process any given query based on
    the k most significant coefficients.

5
Introduction to Compact Data Cube
  • Basics of Haar wavelet transform
  • Building Compact Data Cube
  • Thresholding and Ranking
  • Answering On-Line Queries

6
Introduction to Compact Data Cube
  • Basics of Haar wavelet transform
  • e.g. S = [2, 2, 0, 2, 3, 5, 4, 4]
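
The example signal on this slide can be decomposed with a few lines of Python. This is a minimal sketch, assuming the unnormalized Haar transform built from pairwise averages and half-differences, with the detail coefficient taken as (left - right) / 2; the function name `haar_decompose` is illustrative and not from the slides.

```python
def haar_decompose(signal):
    """Unnormalized 1-D Haar transform.

    Returns [overall average, detail coefficients from coarsest to finest].
    Assumes the signal length is a power of two.
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    current = [float(x) for x in signal]
    details = []
    while len(current) > 1:
        pairs = list(zip(current[0::2], current[1::2]))
        averages = [(a + b) / 2 for a, b in pairs]
        level_details = [(a - b) / 2 for a, b in pairs]  # assumed sign convention
        details = level_details + details  # coarser levels go in front
        current = averages
    return current + details


S = [2, 2, 0, 2, 3, 5, 4, 4]
print(haar_decompose(S))
# [2.75, -1.25, 0.5, 0.0, 0.0, -1.0, -1.0, 0.0]
```

The first entry is the overall average of S; the remaining seven entries are detail coefficients, ordered from the coarsest resolution to the finest.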

7
Introduction to Compact Data Cube
  • Basics of Haar wavelet transform
  • For compression reasons, the detail coefficients
    are normalized.
  • The coefficients at the lower resolutions are
    weighted more heavily.
  • Approximates the original signal by keeping only
    the most significant coefficients.
  • Requires only O(N) CPU time and O(N/B) I/Os to
    compute for a signal of N values.
  • Multidimensional wavelet transform: a series of
    one-dimensional wavelet transforms.
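
The normalization and "keep only the most significant coefficients" steps can be sketched as below. This is an illustration under stated assumptions, not the authors' exact procedure: a coefficient at resolution level j (level 0 being the coarsest) is ranked by its magnitude divided by sqrt(2^j), which is one common way to weight coarse coefficients more heavily. All function names are hypothetical.

```python
import math


def haar_decompose(signal):
    """Unnormalized Haar transform: [average, details coarse-to-fine]."""
    current = [float(x) for x in signal]
    details = []
    while len(current) > 1:
        pairs = list(zip(current[0::2], current[1::2]))
        details = [(a - b) / 2 for a, b in pairs] + details
        current = [(a + b) / 2 for a, b in pairs]
    return current + details


def haar_reconstruct(coeffs):
    """Inverse of haar_decompose."""
    current = [coeffs[0]]
    pos = 1
    while pos < len(coeffs):
        level_details = coeffs[pos:pos + len(current)]
        pos += len(current)
        nxt = []
        for avg, d in zip(current, level_details):
            nxt += [avg + d, avg - d]
        current = nxt
    return current


def level_of(i):
    """Resolution level of coefficient i (0 = coarsest); assumed layout."""
    return 0 if i == 0 else i.bit_length() - 1


def keep_top_k(coeffs, k):
    """Zero out all but the k coefficients with the largest normalized magnitude."""
    def significance(i):
        return abs(coeffs[i]) / math.sqrt(2 ** level_of(i))  # assumed normalization
    kept = set(sorted(range(len(coeffs)), key=significance, reverse=True)[:k])
    return [c if i in kept else 0.0 for i, c in enumerate(coeffs)]


S = [2, 2, 0, 2, 3, 5, 4, 4]
approx = haar_reconstruct(keep_top_k(haar_decompose(S), 4))
print(approx)
```

With half of the coefficients kept, the reconstruction stays close to S; with all eight kept, it reproduces S exactly.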

8
Introduction to Compact Data Cube
  • Building the Compact Data Cube
  • Problem 1: the size of the multidimensional array
    representing the underlying data is too large
    (assume the data are very sparse).
  • Solution: divide the wavelet transform process
    into multiple passes.

9
Introduction to Compact Data Cube
  • Building the Compact Data Cube
  • Problem 2: the density of the intermediate
    results would increase from pass to pass.
  • Solution: truncate the intermediate
    multidimensional array by cutting off entries
    with small magnitude.
  • I/O complexity
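
The multi-pass construction and the truncation between passes can be illustrated with a small in-memory sketch. It makes several simplifying assumptions that are not taken from the slides: one pass per dimension, a sparse array stored as a Python dict keyed by index tuples, dimension sizes that are powers of two, and a fixed magnitude threshold. It does not model the I/O-efficient layout of the actual method.

```python
from collections import defaultdict


def haar_1d(values):
    """Unnormalized 1-D Haar transform: [average, details coarse-to-fine]."""
    current = [float(x) for x in values]
    details = []
    while len(current) > 1:
        pairs = list(zip(current[0::2], current[1::2]))
        details = [(a - b) / 2 for a, b in pairs] + details
        current = [(a + b) / 2 for a, b in pairs]
    return current + details


def transform_pass(cells, shape, axis):
    """Apply the 1-D transform along one axis of a sparse array."""
    lines = defaultdict(dict)
    for idx, v in cells.items():
        key = idx[:axis] + idx[axis + 1:]  # coordinates on the other axes
        lines[key][idx[axis]] = v
    out = {}
    for key, line in lines.items():
        dense = [line.get(i, 0.0) for i in range(shape[axis])]
        for i, c in enumerate(haar_1d(dense)):
            if c != 0.0:
                out[key[:axis] + (i,) + key[axis:]] = c
    return out


def truncate(cells, threshold):
    """Drop entries whose magnitude is below the threshold."""
    return {idx: v for idx, v in cells.items() if abs(v) >= threshold}


def build(cells, shape, threshold):
    """One pass per dimension, truncating after each pass."""
    for axis in range(len(shape)):
        cells = truncate(transform_pass(cells, shape, axis), threshold)
    return cells


# A single non-empty cell in a 2x2 array becomes four coefficients:
print(build({(0, 0): 1.0}, (2, 2), threshold=0.0))
```

The usage line shows the densification described in Problem 2: one non-zero cell turns into four non-zero coefficients after two passes, which is why small entries are cut off between passes.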

10
Introduction to Compact Data Cube
  • Thresholding and Ranking
  • Choice 1: keep the C largest (in absolute value)
    wavelet coefficients.
  • Choice 2: keep the C wavelet coefficients with
    the largest weights among the C' largest
    coefficients (C < C').
  • The weight of a coefficient equals the number
    of its dimensions with value zero.
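
The two selection rules can be sketched as follows. The sketch assumes coefficients are held in a dict keyed by their multidimensional index tuple, and that Choice 2 draws from a candidate pool larger than the number finally kept (called `c_prime` here); these names and the data layout are illustrative assumptions.

```python
def choice_1(coeffs, c):
    """Keep the c coefficients that are largest in absolute value."""
    ranked = sorted(coeffs.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return dict(ranked[:c])


def weight(index):
    """Number of dimensions in which the coefficient's index is zero."""
    return sum(1 for i in index if i == 0)


def choice_2(coeffs, c, c_prime):
    """From the c_prime largest coefficients, keep the c with the largest weights."""
    pool = sorted(coeffs.items(), key=lambda kv: abs(kv[1]), reverse=True)[:c_prime]
    pool.sort(key=lambda kv: weight(kv[0]), reverse=True)  # stable: ties keep magnitude order
    return dict(pool[:c])


coeffs = {(0, 0): 5.0, (0, 3): -1.0, (2, 1): 4.0, (1, 2): -3.0, (3, 0): 0.5}
print(choice_1(coeffs, 2))     # the two largest magnitudes
print(choice_2(coeffs, 2, 4))  # the two heaviest among the four largest
```

In this toy input the two rules disagree: Choice 1 keeps (2, 1) because of its magnitude, while Choice 2 prefers (0, 3) because its index has a zero in one dimension.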

11
Introduction to Compact Data Cube
  • Answering On-Line Queries
  • Space O((d+1)k), CPU time
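
For a one-dimensional signal, answering a range-sum query from the stored coefficients can be sketched as below. The sketch assumes the unnormalized Haar layout in which index 0 holds the overall average and index i >= 1 holds a detail coefficient that adds to the left half of its support and subtracts from the right half. The function names are hypothetical, and the multidimensional case is not covered.

```python
def contribution(index, value, n, lo, hi):
    """Contribution of one stored coefficient to sum(signal[lo..hi]), inclusive."""
    def overlap(a, b):
        # Number of positions shared by [a, b) and [lo, hi + 1)
        return max(0, min(b, hi + 1) - max(a, lo))

    if index == 0:
        return value * overlap(0, n)  # the average counts once per cell in range
    level = index.bit_length() - 1     # 0 = coarsest detail level
    width = n >> level                 # support size of this coefficient
    start = (index - (1 << level)) * width
    mid = start + width // 2
    return value * (overlap(start, mid) - overlap(mid, start + width))


def range_sum(stored, n, lo, hi):
    """Approximate range sum from a dict {coefficient index: value}."""
    return sum(contribution(i, v, n, lo, hi) for i, v in stored.items())


# All non-zero coefficients of S = [2, 2, 0, 2, 3, 5, 4, 4]
stored = {0: 2.75, 1: -1.25, 2: 0.5, 5: -1.0, 6: -1.0}
print(range_sum(stored, 8, 2, 5))  # 10.0, the exact sum 0 + 2 + 3 + 5
```

Only coefficients whose support overlaps the query range unevenly contribute; dropping entries from `stored` turns the exact answer into an approximation.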

12
Pros and cons of the compact data cube method
  • Pros
  • Requires little disk space (a small number of
    disk blocks).
  • Responds to on-line queries quickly.
  • Answers OLAP queries more accurately than other
    approximation techniques such as histograms and
    random sampling.
  • Can progressively refine the approximate answer
    with no added overhead.

13
Pros and cons of the compact data cube method
  • Cons
  • Approximates a vast number of useless empty cells
    in the base cuboid together with the useful
    non-empty cells.
  • Needs to cut off entries with small magnitude at
    the end of each pass in order to maintain a
    constant number of I/O operations from pass to
    pass.

14
Dense-Region Based Compact Data Cube
  • Aim
  • To enhance the ability of the compact data cube
    method to handle datasets having the
    dense-regions-in-sparse-cube property.
  • Main Idea
  • To exclude empty cells in the base cuboid from
    approximation.
  • Two-phase approach
  • Compute dense regions in the base cuboid.
  • Approximate each dense region independently.

15
Dense-Region Based Compact Data Cube
  • Question 1: how can we find the dense regions
    efficiently?
  • Efficient DEnse region Mining (EDEM) algorithm,
    proposed by Cheung et al. in "DROLAP -- A
    Dense-Region-Based Approach to On-line Analytical
    Processing" (DEXA '99)

16
Dense-Region Based Compact Data Cube
  • Basic ideas of EDEM
  • Build a k-d tree to store the valid cells.
  • Grow dense region covers along boundaries.
  • Search dense regions among the covers.
  • Complexity of EDEM: linear in the number of
    dimensions and sub-quadratic in the number of
    data points.

17
Dense-Region Based Compact Data Cube
  • Question 2: how should we allocate disk space in
    approximating the dense regions?
  • Choice 1: allocate disk space equally to each
    dense region.
  • Choice 2: allocate disk space according to the
    sizes of the dense regions.
  • Choice 3: order the wavelet coefficients of all
    the dense regions and keep the most significant
    ones (in absolute value).
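
The three allocation choices can be compared with a short sketch. It assumes the disk budget is counted in coefficients and that each dense region is described either by its size in cells or by the list of its wavelet coefficient values; the function names are illustrative only.

```python
def allocate_equal(region_sizes, budget):
    """Choice 1: the same number of coefficients for every region."""
    share = budget // len(region_sizes)  # any remainder is left unallocated
    return [share] * len(region_sizes)


def allocate_by_size(region_sizes, budget):
    """Choice 2: coefficients proportional to region size (rounded down)."""
    total = sum(region_sizes)
    return [budget * size // total for size in region_sizes]


def allocate_global(region_coeffs, budget):
    """Choice 3: rank all coefficients together by absolute value.

    Returns how many coefficients each region ends up keeping.
    """
    ranked = sorted(
        ((abs(c), r) for r, coeffs in enumerate(region_coeffs) for c in coeffs),
        reverse=True,
    )
    counts = [0] * len(region_coeffs)
    for _, region in ranked[:budget]:
        counts[region] += 1
    return counts


sizes = [100, 300]
print(allocate_equal(sizes, 8))    # [4, 4]
print(allocate_by_size(sizes, 8))  # [2, 6]
print(allocate_global([[9.0, -7.0, 0.5], [3.0, -0.2, 0.1]], 3))  # [2, 1]
```

Choice 3 lets a small region with a few large coefficients take more of the budget than its size alone would justify, which the first two choices cannot do.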

18
Dense-Region Based Compact Data Cube
  • Question 3: how should we treat the data points
    outside the dense regions?
  • Keep all of them, or keep only the significant
    ones.
  • Question 4: how do we answer on-line queries
    using the dense-region based approach?
  • Check whether a dense region is covered by the
    given query.
  • Check whether the stored coefficients contribute
    to the range sum and compute the amount of
    contribution if needed.
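
One way to picture the query step is the sketch below. It assumes each dense region is stored as a bounding box plus some estimator that returns an approximate sum over a sub-box (for example, from that region's stored wavelet coefficients), and that points outside the dense regions are kept exactly. The data layout and all names are hypothetical, not taken from the slides.

```python
def intersect(box_a, box_b):
    """Intersection of two boxes given as ((lo, hi), ...) per dimension, inclusive.

    Returns None when the boxes do not overlap.
    """
    out = []
    for (alo, ahi), (blo, bhi) in zip(box_a, box_b):
        lo, hi = max(alo, blo), min(ahi, bhi)
        if lo > hi:
            return None
        out.append((lo, hi))
    return tuple(out)


def answer_range_sum(query, regions, outliers):
    """Approximate range sum over the query box.

    regions:  list of (bounding_box, estimator) pairs; estimator(sub_box)
              returns the approximate sum inside sub_box (assumed interface).
    outliers: dict {point: value} for cells outside every dense region.
    """
    total = 0.0
    for box, estimator in regions:
        part = intersect(query, box)
        if part is not None:  # this region contributes to the query
            total += estimator(part)
    for point, value in outliers.items():
        if all(lo <= x <= hi for x, (lo, hi) in zip(point, query)):
            total += value
    return total


# Toy estimator: pretend every cell of the region holds the value 1.
def count_cells(box):
    return float((box[0][1] - box[0][0] + 1) * (box[1][1] - box[1][0] + 1))


regions = [(((0, 3), (0, 3)), count_cells)]
outliers = {(10, 10): 5.0, (20, 20): 7.0}
print(answer_range_sum(((2, 10), (2, 10)), regions, outliers))
```

Regions that do not intersect the query are skipped entirely, so only the coefficients of overlapping regions need to be examined.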

19
Dense-Region Based Compact Data Cube
  • One favorable side effect
  • We may parallelize the construction of the
    compact data cube.
  • More questions
  • How can we handle updates to the underlying data?
  • How can we approximate an iceberg cube? Can we
    apply the idea of the compact data cube to
    iceberg cubes?
  • Can the compact data cube be used to answer other
    types of OLAP queries besides range-sum?