SF-Tree and Its Application to OLAP

Slides: 37
Provided by: CSI115

1
SF-Tree and Its Application to OLAP
  • Speaker: Ho Wai Shing

2
Overview
  • Introduction
  • Basic OLAP technologies: ROLAP and MOLAP
  • Structure of an SF-Tree
  • Using SF-Tree for OLAP
  • Conclusion

3
Introduction
  • On-line Analytical Processing (OLAP) is an
    important tool for decision making
  • Users may ask for aggregated measure attributes
    over different combinations of dimension
    attributes [GrayBLP96].

4
Introduction -- OLAP
  • e.g., consider the CarSales table of a data
    warehouse:
  • CarSales(TransID, Buyer, Date, Shop, Color,
    Price)
  • Users may want to know the total sales of
    yellow cars sold in Sept. 2002.

5
Introduction -- OLAP
6
Introduction -- OLAP
  • i.e., the answer is 60k
  • To answer such queries efficiently, the answers
    are usually precomputed and stored
  • A popular data model for this is the data cube
    [GrayBLP96].

7
Introduction -- OLAP
8
Introduction -- OLAP
  • Note that we're referring to the model only; not
    every entry is materialized
  • every combination of dimensions is included
    (so, here, it's just one example cuboid; other
    cuboids include , , etc.)
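
To make "every combination of dimensions" concrete, here is a minimal Python sketch that builds one cuboid per subset of the dimensions. The rows, the shop name "Central", and the helper name `build_cube` are made up for illustration and do not come from the slides.

```python
from collections import defaultdict
from itertools import combinations

# Made-up rows for illustration only: (shop, color, price in k).
rows = [
    ("TST", "Yellow", 10),
    ("TST", "Red", 20),
    ("Central", "Yellow", 30),
    ("Central", "Yellow", 5),
]
DIMENSIONS = ("shop", "color")


def build_cube(rows, dimensions):
    """Return {cuboid dimensions: {dimension values: total price}}.

    One cuboid is produced for every subset of the dimensions,
    including the empty subset (the grand total).
    """
    cube = {}
    for size in range(len(dimensions) + 1):
        for dims in combinations(range(len(dimensions)), size):
            cuboid = defaultdict(int)
            for row in rows:
                key = tuple(row[i] for i in dims)
                cuboid[key] += row[-1]  # the last field is the measure
            cube[tuple(dimensions[i] for i in dims)] = dict(cuboid)
    return cube


cube = build_cube(rows, DIMENSIONS)
```

With two dimensions this yields four cuboids; for example, `cube[("color",)]` holds the totals per color and `cube[()]` holds the grand total.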

9
Introduction -- OLAP
  • Research issue: how to store this information
    with high space efficiency and/or high query
    speed.

10
ROLAP and MOLAP
11
ROLAP and MOLAP
  • In a data cube, we have to store the combinations
    of dimension values and the associated aggregate
    values.
  • ROLAP (Relational OLAP) stores the entries of the
    mapping in relational tables.
  • MOLAP (Multi-dimensional OLAP) stores the entries
    in a multi-dimensional array.

12
ROLAP and MOLAP
  • A materialized MOLAP multidimensional array (for
    colour and shop)

13
ROLAP and MOLAP
  • A ROLAP table that stores the entries (for colour
    and shop)

14
ROLAP and MOLAP
  • MOLAP advantages:
  • quick: one retrieval per point query (i.e., the
    dimension values are used to calculate the
    address of the stored aggregate value)
  • may be space efficient if the cube is very dense
    (since the dimension values are not explicitly
    stored with every entry)
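
The address calculation behind a MOLAP point query can be sketched as follows. This is a minimal illustration assuming a two-dimensional (shop, color) array in row-major order; the dimension domains and stored values are hypothetical.

```python
# Hypothetical dimension domains; each value's position is its array coordinate.
SHOPS = ["TST", "Central", "Mongkok"]
COLORS = ["Yellow", "Red", "Blue", "White"]
SHOP_POS = {name: i for i, name in enumerate(SHOPS)}
COLOR_POS = {name: i for i, name in enumerate(COLORS)}


def cell_address(shop, color):
    """Row-major offset of the (shop, color) cell in a flat array."""
    return SHOP_POS[shop] * len(COLORS) + COLOR_POS[color]


# Dense storage: one slot per possible combination, zero where nothing was sold.
cells = [0] * (len(SHOPS) * len(COLORS))
cells[cell_address("TST", "Yellow")] = 10   # made-up value, in k
cells[cell_address("Central", "Red")] = 25  # made-up value, in k


def point_query(shop, color):
    """One address computation followed by one array access."""
    return cells[cell_address(shop, color)]
```

Note that the array holds only aggregate values; the dimension values are implied by the position, which is why a dense cube can be stored compactly this way.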

15
ROLAP and MOLAP
  • MOLAP disadvantages:
  • space inefficient if the cube is sparse (has many
    zero entries), esp. for high-dimensional cases.
  • this can be eased by chunking
  • may need a lot of scans if we issue a large range
    query (i.e., one that involves many dimension
    values)

16
ROLAP and MOLAP
  • ROLAP
  • indexes are built on the table to improve query
    performance
  • e.g., a B-Tree on each dimension, or an R-Tree
    over all points.
  • ROLAP advantages:
  • space efficient (zero entries are not stored)

17
ROLAP and MOLAP
  • ROLAP disadvantages:
  • indexes, such as the R-Tree, may not be effective
    on high-D data
  • Many joins are required to produce the result (if
    single-D indexes are used)
  • Intermediate results may be large
  • Can we do better?

18
SF-Tree
19
SF-Tree
  • stands for Signature File Tree
  • stores a mapping from objects to integers
    flexibly and efficiently, with a statistical
    accuracy guarantee

20
SF-Tree
  • Basic Idea
  • divide the objects into groups that share the
    same (or a similar) associated number.
  • checking the associated number of an object is
    the same as checking which group the object
    belongs to.
  • signature files are used to improve the
    efficiency of existence checking; trees are used
    to improve accuracy and speed.
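
The grouping idea can be illustrated with a flat, heavily simplified Python sketch: one bit signature per group, and a lookup that tests which group signatures match. This is not the SF-Tree itself (the tree layout is not preserved in this transcript); the Bloom-filter-style signature, the class name `Signature`, and the parameters `n_bits` and `n_hashes` are all assumptions made for illustration.

```python
import hashlib


class Signature:
    """A fixed-size bit signature (Bloom-filter style) for one group of objects."""

    def __init__(self, n_bits=256, n_hashes=4):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, obj):
        # Derive n_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.n_hashes):
            digest = hashlib.sha256(f"{seed}:{obj!r}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, obj):
        for pos in self._positions(obj):
            self.bits |= 1 << pos

    def might_contain(self, obj):
        # False means definitely absent; True may be a false positive.
        return all((self.bits >> pos) & 1 for pos in self._positions(obj))


def build_groups(mapping):
    """Create one signature per distinct associated number."""
    groups = {}
    for obj, number in mapping.items():
        groups.setdefault(number, Signature()).add(obj)
    return groups


def candidate_numbers(groups, obj):
    """Numbers whose group signature matches obj (may include false positives)."""
    return [number for number, sig in groups.items() if sig.might_contain(obj)]
```

In this sketch the signature size is fixed regardless of how large each object is, and an inserted object is always reported by its own group; the only errors are occasional extra matches from other groups.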

21
SF-Tree
22
SF-Tree
  • Properties (advantages)
  • Space efficient, independent of object size
  • Flexible: space, speed and accuracy can be traded
    off against one another
  • Speed is independent of the number of objects

23
Using SF-Tree for OLAP
24
Using SF-Tree for OLAP
  • The information in OLAP can be modeled as a
    mapping from objects (dimension values) to
    numbers (aggregate values).
  • Thus we can use SF-Tree to store this mapping.

25
Using SF-Tree for OLAP
  • e.g., (TST, Yellow) is an object; its associated
    number is 10k
  • we can insert it into an SF-Tree
  • space requirement:
  • m/ln2 bits per object per level
  • independent of object size (i.e., dimensionality)
  • smaller than ROLAP, esp. for high-D
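
The space claim can be checked with a little arithmetic. The sketch below takes the slide's m/ln2 bits per object per level at face value; m is left as an uninterpreted parameter because its definition is not preserved in this transcript. The ROLAP model (every dimension value and the measure stored as fixed-width fields) and all default numbers are assumptions for illustration only.

```python
import math


def sf_tree_bits(n_objects, m, levels):
    """Total bits at m / ln 2 bits per object per level."""
    return n_objects * levels * m / math.log(2)


def rolap_bits(n_objects, n_dims, bits_per_dim=32, bits_per_measure=32):
    """Total bits for a table storing every dimension value plus the measure.

    The 32-bit field widths are assumed, not taken from the slides.
    """
    return n_objects * (n_dims * bits_per_dim + bits_per_measure)
```

The first function has no dimensionality parameter at all, while the second grows linearly with `n_dims`; that difference is what the "independent of object size" bullet refers to.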

26
Using SF-Tree for OLAP
  • Advantages:
  • more space efficient than ROLAP (definitely much
    better than MOLAP)
  • quicker than ROLAP in point queries (no need to
    do joins)
  • Disadvantages:
  • range queries require scanning all possible
    points in the query range (as in MOLAP).
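
The range-query cost can be seen in a small sketch: with a structure that only answers point queries, a range query has to enumerate every point in the range. A plain dict stands in for the point-query structure here, and the stored values are made up.

```python
from itertools import product

# Stand-in for any point-only structure; values are made up (in k).
point_store = {
    ("TST", "Yellow"): 10,
    ("TST", "Red"): 20,
    ("Central", "Yellow"): 30,
}


def range_query(store, shops, colors):
    """Sum the aggregates of every (shop, color) point in the query range.

    Returns (total, number of point lookups issued).
    """
    total = 0
    lookups = 0
    for point in product(shops, colors):
        lookups += 1
        total += store.get(point, 0)  # absent combinations contribute zero
    return total, lookups
```

The number of lookups is the product of the range sizes in each dimension, whether or not the points actually hold data, so it grows quickly with both range width and dimensionality.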

27
Using SF-Tree for OLAP
  • To avoid this disadvantage, we borrow an idea
    from the MRA-Tree [LazM01]
  • MRA-Tree (Multi-Resolution Aggregate Tree)
  • in a data/space partitioning tree, add aggregates
    to all internal nodes.
  • one example is a quad-tree with aggregates.

28
MRA-Tree
29
MRA-Tree
  • [Figure; values shown: 580k, 160k, 60k]
30
MRA-Tree
  • For answering range queries, the number of
    accesses is reduced
  • Extra space is required
  • A leaf node need not contain only 1 record
  • The tree size drops significantly if we increase
    the number of points in a leaf node page.
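
How stored aggregates cut down the accesses can be sketched with a one-dimensional simplification: a binary partitioning tree whose every node keeps the sum of its range. This is not the quad-tree or the structure from [LazM01]; the names `Node`, `build` and `range_sum` and the data are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    lo: int      # first position covered (inclusive)
    hi: int      # last position covered (inclusive)
    total: int   # aggregate (sum) of everything under this node
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(values, lo, hi):
    """Build a binary partitioning tree with a sum stored in every node."""
    if lo == hi:
        return Node(lo, hi, values[lo])
    mid = (lo + hi) // 2
    left = build(values, lo, mid)
    right = build(values, mid + 1, hi)
    return Node(lo, hi, left.total + right.total, left, right)


def range_sum(node, qlo, qhi, visited):
    """Sum positions qlo..qhi, recording each visited node in `visited`."""
    visited.append((node.lo, node.hi))
    if qhi < node.lo or node.hi < qlo:
        return 0                 # disjoint: contributes nothing
    if qlo <= node.lo and node.hi <= qhi:
        return node.total        # fully covered: use the stored aggregate
    return (range_sum(node.left, qlo, qhi, visited)
            + range_sum(node.right, qlo, qhi, visited))
```

A query that covers a node's whole range stops at that node instead of descending to its leaves, which is where the reduction in accesses comes from; the price is the extra aggregate kept in every internal node.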

31
MRA-Tree
32
MRA-Tree
33
SF-Tree with MRA-Tree
  • SF-Tree is more space efficient than ROLAP
  • Use SF-Tree to store leaf nodes
  • each page can thus store more points
  • tree size/depth is reduced
  • fewer page accesses per query

34
SF-Tree with MRA-Tree
  • Advantages:
  • more space efficient, i.e., may be small enough
    to fit in memory and reduce page accesses, esp.
    for high-D data
  • Disadvantages:
  • still need to scan the area in leaf nodes (vs.
    scanning data points when using ROLAP)

35
Conclusion
  • SF-Tree is space efficient and can be used to
    store a data cube (or may be used as a ROLAP
    index).
  • Although analysis shows that SF-Tree is slow for
    range queries, we incorporate the idea of the
    MRA-Tree into SF-Tree to increase the speed.

36
Reference
  • [GrayBLP96] J. Gray, A. Bosworth, A. Layman, and
    H. Pirahesh. Data cube: A relational aggregation
    operator generalizing group-by, cross-tab, and
    sub-total. In ICDE'96.
  • [LazM01] I. Lazaridis and S. Mehrotra.
    Progressive Approximate Aggregate Queries with a
    Multi-Resolution Tree Structure. In SIGMOD'01.