Approximate Query Processing using Wavelets

Slides: 36
Provided by: supr4
Learn more at: https://crystal.uta.edu

1
Approximate Query Processing using Wavelets
  • Kaushik Chakrabarti (Univ. of Illinois)
  • Minos Garofalakis (Bell Labs)
  • Rajeev Rastogi (Bell Labs)
  • Kyuseok Shim (KAIST and AITrc)
  • Presented at the 26th VLDB Conference, Cairo, Egypt
  • Presented by Supriya Sudheendra

2
Outline
3
Introduction
  • Approximate Query Processing is a viable solution
    for
  • Huge amounts of data
  • High query complexities
  • Stringent response-time requirements
  • Decision Support Systems
  • Support business and organizational
    decision-making activities
  • Helps decision makers compile useful information
    from raw data, solve problems and make decisions

4
Introduction
  • DSS users pose very complex queries to the DBMS
  • These queries require complex operations over gigabytes
    or terabytes of disk-resident data
  • They take a very long time to execute and produce
    exact answers
  • In a number of scenarios, users prefer a fast,
    approximate answer

5
Prior Work
  • Previous approximate query processing techniques
  • Focused on specific forms of aggregate queries
  • Data reduction mechanism: how to obtain the
    synopses of the data
  • Sampling-based techniques
  • A join operator on 2 uniform random samples
    results in a non-uniform sample having very few
    tuples
  • For non-aggregate queries, sampling produces a small
    subset of the exact answer, which might be empty
    when joins are involved

6
Prior Work
  • Histogram Based Techniques
  • Problematic for high-dimensional data
  • Storage overhead
  • High construction cost
  • Wavelet Based Techniques
  • Mathematical tool for hierarchical decomposition
    of functions
  • Apply wavelet decomposition to input data
    collection -> data synopsis
  • Avoids high construction costs and storage
    overhead

7
Contribution of the Paper
  • Viability and effectiveness of wavelets as a
    generic tool for high-dimensional DSS
  • New, I/O-efficient wavelet decomposition
    algorithm for relational tables
  • Novel query processing algebra for
    wavelet-coefficient data synopses
  • Extensive Experiments

8
Background
  • Mathematical tool to hierarchically decompose
    functions
  • Coarse overall approximation together with detail
    coefficients that influence function at various
    scales
  • Haar wavelets are conceptually simple, fast to
    compute
  • Variety of applications like image editing and
    querying

9
One-Dimensional Haar Wavelets
  • How to compute, given a data array:
  • Average the values together pairwise to get a
    lower-resolution representation of the data
  • Detail coefficients -> differences of the (second of
    the) averaged values from the computed pairwise average
  • Reconstruction of the data array is then possible
  • Why detail coefficients?

10
One-dimensional Haar Wavelets
  • Wavelet transform: overall average followed by
    detail coefficients in increasing order of
    resolution; each entry is a wavelet coefficient
  • W_A = [4, -2, 0, -1]
  • For vectors containing similar values, most detail
    coefficients have small values that can be eliminated
  • Introduces only small errors
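
The pairwise averaging and differencing described on these slides can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the input array [2, 2, 5, 7] is an assumption, chosen because it yields the transform [4, -2, 0, -1] shown above.

```python
def haar_decompose(data):
    """Return [overall average] + detail coefficients, coarsest level first."""
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    averages = [float(x) for x in data]
    details = []
    while len(averages) > 1:
        pairs = list(zip(averages[0::2], averages[1::2]))
        # Detail = pairwise average minus the second value of the pair.
        level_details = [(a + b) / 2 - b for a, b in pairs]
        averages = [(a + b) / 2 for a, b in pairs]
        # Coarser levels are computed later, so they go in front.
        details = level_details + details
    return averages + details


def haar_reconstruct(coeffs):
    """Invert haar_decompose: expand each average into (avg + d, avg - d)."""
    values = [coeffs[0]]
    pos = 1
    while pos < len(coeffs):
        level_details = coeffs[pos:pos + len(values)]
        expanded = []
        for avg, det in zip(values, level_details):
            expanded.extend([avg + det, avg - det])
        values = expanded
        pos += len(level_details)
    return values


print(haar_decompose([2, 2, 5, 7]))      # [4.0, -2.0, 0.0, -1.0]
print(haar_reconstruct([4, -2, 0, 0]))   # [2, 2, 6, 6]
```

The second call drops the smallest detail coefficient before reconstructing, which changes only the last two values of the array.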

11
One-dimensional Haar Wavelets
  • Overall average is more important than any detail
    coefficient
  • To normalize the final entries of W_A, each
    wavelet coefficient is divided by √(2^l)
  • l = level of resolution
  • W_A = [4, -2, 0, -1/√2]
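
A minimal sketch of the normalization step, assuming the coefficient layout used on the previous slide (overall average first, then details by increasing resolution) and assuming level 0 is the coarsest level. The helper `keep_largest` is a hypothetical illustration of how normalized magnitudes might be used to decide which coefficients to retain; it is not taken from the slides.

```python
import math


def normalize(coeffs):
    """Divide the coefficient at position i by sqrt(2**l), l = its level."""
    out = []
    for i, c in enumerate(coeffs):
        # Positions 0 and 1 are level 0, positions 2-3 level 1, 4-7 level 2, ...
        level = max(i.bit_length() - 1, 0)
        out.append(c / math.sqrt(2 ** level))
    return out


def keep_largest(coeffs, k):
    """Zero out all but the k coefficients with largest normalized magnitude."""
    norm = normalize(coeffs)
    ranked = sorted(range(len(coeffs)), key=lambda i: abs(norm[i]), reverse=True)
    keep = set(ranked[:k])
    return [c if i in keep else 0 for i, c in enumerate(coeffs)]


print(normalize([4, -2, 0, -1]))        # last entry is -1/sqrt(2)
print(keep_largest([4, -2, 0, -1], 2))  # [4, -2, 0, 0]
```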

12
Multi-dimensional Haar Wavelets
  • Haar wavelets can be extended to
    multi-dimensional array
  • Standard Decomposition
  • Fix an ordering for the data dimensions (1, 2, ..., d)
  • Apply the complete 1-D wavelet transform to each 1-D
    row of array cells along dimension k, for each k in turn
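
For a 2-D array, the standard decomposition can be sketched as below: a complete 1-D transform of every row, followed by a complete 1-D transform of every column of the result. The function names and the nested-list representation are assumptions made for the example.

```python
def haar_1d(data):
    """Complete 1-D Haar transform (unnormalized), coarsest coefficients first."""
    averages = [float(x) for x in data]
    details = []
    while len(averages) > 1:
        pairs = list(zip(averages[0::2], averages[1::2]))
        details = [(a + b) / 2 - b for a, b in pairs] + details
        averages = [(a + b) / 2 for a, b in pairs]
    return averages + details


def standard_decompose_2d(array):
    """Apply haar_1d along the first dimension, then along the second."""
    rows = [haar_1d(r) for r in array]               # transform every row
    cols = [haar_1d(list(c)) for c in zip(*rows)]    # transform every column
    return [list(r) for r in zip(*cols)]             # transpose back


print(standard_decompose_2d([[1, 3], [5, 7]]))  # [[4.0, -1.0], [-2.0, 0.0]]
```

Row and column lengths are assumed to be powers of two; the sketch does not validate this.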

13
  • Nonstandard Decomposition
  • Alternates between dimensions during successive
    steps of pairwise averaging and differencing for
    each 1-D row of array cells along dimension k
  • Repeated recursively on quadrant containing all
    averages across all dimensions

14
Non-standard Decomposition
  • Pairwise averaging and differencing for one
    positioning of the 2x2 box with root [2i_1, 2i_2]
  • Distribution of the results in the wavelet
    transform array
  • Process is recursed on the lower-left quadrant of W_A
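
The following is a rough 2-D sketch of the nonstandard decomposition, under several assumptions: the array is square with a power-of-two side, index [0][0] is treated as the corner of the quadrant that receives the averages, and the placement of the three detail coefficients within the other quadrants is one possible convention rather than necessarily the one used in the paper.

```python
def nonstandard_decompose_2d(array):
    n = len(array)
    if n == 0 or n & (n - 1) or any(len(r) != n for r in array):
        raise ValueError("expected a square array with a power-of-two side")
    w = [[float(x) for x in row] for row in array]
    size = n
    while size > 1:
        half = size // 2
        out = [row[:] for row in w]   # keep details from earlier passes
        for i in range(half):
            for j in range(half):
                # One positioning of the 2x2 box with root [2i, 2j].
                a, b = w[2 * i][2 * j], w[2 * i][2 * j + 1]
                c, d = w[2 * i + 1][2 * j], w[2 * i + 1][2 * j + 1]
                out[i][j] = (a + b + c + d) / 4                 # average
                out[i][half + j] = (a - b + c - d) / 4          # detail along dim 2
                out[half + i][j] = (a + b - c - d) / 4          # detail along dim 1
                out[half + i][half + j] = (a - b - c + d) / 4   # diagonal detail
        w = out
        size = half   # recurse on the quadrant holding the averages
    return w


for row in nonstandard_decompose_2d([[1, 1, 3, 3],
                                     [1, 1, 3, 3],
                                     [5, 5, 7, 7],
                                     [5, 5, 7, 7]]):
    print(row)
```

Because every 2x2 box in this input is constant, the first pass produces no nonzero details, and only the second pass (on the 2x2 array of averages) contributes nonzero coefficients.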

15
Example Decomposition of a 4 X 4 Array
16
Multi-dimensional Haar Coefficients: Semantics
and Representation
  • The d-dimensional Haar basis function corresponding
    to wavelet coefficient W is defined by:
  • A d-dimensional rectangular support region
  • Quadrant sign information

17
Support Regions for 16 Nonstandard 2-D Haar Basis
Functions
  • Blank areas: regions of A whose reconstruction
    is independent of the coefficient
  • W_A[0,0]: overall average
  • W_A[3,3]: contributes only to the upper-right
    quadrant

18
Haar Coefficients: Semantics and Representation
  • W = <R, S, v>
  • W.R: d-dimensional support hyper-rectangle of W;
    encloses all cells in A to which W contributes
  • The hyper-rectangle is represented by low and high
    boundaries across each dimension j, 1 <= j <= d:
  • W.R.boundary[j].lo and W.R.boundary[j].hi
  • W contributes to each data cell A[i_1, ..., i_d] where
  • W.R.boundary[j].lo <= i_j <= W.R.boundary[j].hi
    for all j

19
  • W.S: sign information for all d-dimensional
    quadrants of W.R
  • Denoted by W.S.sign[j].lo and W.S.sign[j].hi,
    corresponding to the lower and upper half of W.R's
    extent along dimension j
  • The sign of a quadrant is computed as the product of
    the d sign-vector entries that map to that quadrant
  • W.v: scalar magnitude of W
  • The quantity that W contributes to all data array
    cells enclosed in W.R
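
One way to picture the W.R / W.S / W.v representation is the small Python sketch below. The class and method names are hypothetical, boundaries are assumed to be inclusive integer indexes, and the four 1-D coefficients in the usage example are an assumed encoding of the transform [4, -2, 0, -1] from the earlier one-dimensional slides.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Coefficient:
    boundary: List[Tuple[int, int]]  # W.R: inclusive (lo, hi) per dimension
    sign: List[Tuple[int, int]]      # W.S: (sign of lower half, sign of upper half)
    v: float                         # W.v: scalar magnitude

    def contribution(self, cell):
        """Signed quantity this coefficient adds to the given data cell."""
        total_sign = 1
        for i, (lo, hi), (s_lo, s_hi) in zip(cell, self.boundary, self.sign):
            if not lo <= i <= hi:
                return 0.0                # cell lies outside the support region
            mid = (lo + hi + 1) // 2      # first index of the upper half
            total_sign *= s_lo if i < mid else s_hi
        return total_sign * self.v


coeffs = [
    Coefficient([(0, 3)], [(1, 1)], 4.0),    # overall average
    Coefficient([(0, 3)], [(1, -1)], -2.0),
    Coefficient([(0, 1)], [(1, -1)], 0.0),
    Coefficient([(2, 3)], [(1, -1)], -1.0),
]
data = [sum(c.contribution((i,)) for c in coeffs) for i in range(4)]
print(data)  # [2.0, 2.0, 5.0, 7.0]
```

Summing the contributions of all coefficients whose support encloses a cell reconstructs that cell's value.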

20
Building Wavelet Coefficient Synopses
  • Relation R with d attributes X_1, X_2, ..., X_d
  • R can be represented as a d-dimensional array A_R
  • The j-th dimension is indexed by the values of
    attribute X_j
  • Cells contain the count of tuples in R having the
    corresponding combination of attribute values
  • A_R: joint frequency distribution of all
    attributes of R
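
As a small illustration of the joint frequency array, the sketch below counts tuples per attribute-value combination. It assumes attribute values are already mapped to integer indexes; the function names and the sample relation are made up for the example.

```python
from collections import Counter


def joint_frequency(tuples):
    """Sparse joint frequency distribution: cell coordinates -> tuple count."""
    return Counter(tuple(t) for t in tuples)


def dense_2d(counts, size1, size2):
    """Materialize a 2-attribute distribution as a dense size1 x size2 array."""
    return [[counts.get((i, j), 0) for j in range(size2)] for i in range(size1)]


R = [(0, 1), (0, 1), (2, 3), (1, 0)]
A_R = joint_frequency(R)
print(A_R[(0, 1)])            # 2
for row in dense_2d(A_R, 4, 4):
    print(row)
```

The sparse form is usually the more practical one, since most cells of a high-dimensional array are empty.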

21
  • Chunk-based organization of relational tables
  • Joint frequency array A_R is split into
    d-dimensional chunks
  • Tuples of R of same chunk are stored contiguously
    on disk
  • If R is not chunked, one extra pre-processing
    step to reorganize R on disk

22
ComputeWavelet Algorithm
  • When a chunk is loaded for the first time,
    ComputeWavelet can perform entire computation for
    decomposing
  • Pairwise averaging and differencing is performed
    as soon as 2^d averages are accumulated
  • Memory efficient: no more than one active
    sub-array at a time for each level of resolution
23
Processing Relational Queries in Wavelet
Coefficient Domain
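[Diagram: two paths from the synopses to the approximate result]
  • Wavelet domain: wavelet-coefficient synopses
    W_T1, W_T2, ..., W_Tk -> op(W_T1, ..., W_Tk) ->
    result set (RS) of wavelet coefficients W_S ->
    render(W_S) -> approx. result relation S
  • Relational domain: wavelet-coefficient synopses
    W_T1, W_T2, ..., W_Tk -> render(W_T1, ..., W_Tk) ->
    approximate relations T1, T2, ..., Tk ->
    op(T1, T2, ..., Tk) -> approx. result relation S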
24
Selection Operator
  • Our selection operator has the general form
    select_pred(W_T), where pred represents a generic
    conjunctive predicate on a subset of the d
    attributes in T; that is,
  • pred = (l_i1 <= X_i1 <= h_i1) ∧ ... ∧ (l_ik <= X_ik <= h_ik),
    where l_ij and h_ij denote the low and
    high boundaries of the selected range along each
    selection dimension D_ij, j = 1, 2, ..., k, k <= d.

25
Selection - Relational Domain
[Figure: a relation and its joint data distribution array over
dimensions D1 and D2, with the query range marked]
  • In relational domain, interested in only those
    cells inside query range
  • In wavelet domain, interested in only the
    coefficients that contribute to those cells
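
The filtering idea in the last bullet can be sketched as an overlap test between each coefficient's support hyper-rectangle and the query range. This covers only the filtering step, not the full selection operator; the dictionary layout, function names, and sample coefficients are assumptions for the example, and unselected dimensions are assumed to be given their full domain as the range.

```python
def overlaps(boundary, query):
    """True if the support region intersects the query range in every dimension."""
    return all(lo <= q_hi and q_lo <= hi
               for (lo, hi), (q_lo, q_hi) in zip(boundary, query))


def select_coefficients(coeffs, query):
    """Keep only coefficients that contribute to some cell inside the range."""
    return [c for c in coeffs if overlaps(c["boundary"], query)]


# Four 1-D coefficients over a domain of size 4 (inclusive boundaries).
coeffs = [
    {"boundary": [(0, 3)], "v": 4.0},
    {"boundary": [(0, 3)], "v": -2.0},
    {"boundary": [(0, 1)], "v": 0.0},
    {"boundary": [(2, 3)], "v": -1.0},
]
kept = select_coefficients(coeffs, [(0, 1)])
print([c["v"] for c in kept])  # [4.0, -2.0, 0.0]
```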

26
Projection Operator
27
Projection- Wavelet Domain
28
Join Operator
29
Join Operator- Wavelet Domain
30
Experimental Study
  • Improved answer quality
  • Low synopsis construction costs
  • Fast query execution

31
Query Execution Times
32
SELECT-JOIN-SUM
33
SELECT Query errors on real-life data
34
Conclusion
  • Multidimensional wavelets are an effective tool
    for general-purpose approximate query processing
    in modern, high-dimensional applications
  • The query processing algorithms operate directly
    on the wavelet-coefficient synopses of relational
    data, thus allowing for very fast processing of
    arbitrarily complex queries entirely in the
    wavelet-coefficient domain
  • Extensive experimental study with synthetic as
    well as real-life data sets that verifies the
    effectiveness of the wavelet-based approach
    compared to both sampling and histograms

35
Thank you