Maintenance of Data Cubes and Summary Tables in a Warehouse I. S. Mumick, D. Quass, N. S. Mumick Pro - PowerPoint PPT Presentation

1 / 25
About This Presentation
Title:

Maintenance of Data Cubes and Summary Tables in a Warehouse I. S. Mumick, D. Quass, N. S. Mumick Pro

Description:

... be deferred and applied to the warehouse views in large batches for efficiency ... a summary tables while minimizing the batch window needed for maintenance ... – PowerPoint PPT presentation

Number of Views:69
Avg rating:3.0/5.0
Slides: 26
Provided by: homeE
Category:

less

Transcript and Presenter's Notes

Title: Maintenance of Data Cubes and Summary Tables in a Warehouse I. S. Mumick, D. Quass, N. S. Mumick Pro


1
Maintenance of Data Cubes and Summary Tables in
a WarehouseI. S. Mumick, D. Quass, N. S.
MumickProc. ACM SIGMOD '97,
  • ???? ???
  • ?? 3??
  • ? ? ?

2
Contents
  • Introduction
  • In this paper
  • Motivating Example
  • Basic Summary-Delta Maintenance Algorithm
  • Efficiently maintaining multiple summary tables
  • Performance
  • Conclusion

3
Introduction
  • Summary table
  • Represent materialized aggregate views of base
    data collected from the source
  • Speed up query processing
  • Group the base data along various dimensions,
    corresponding to different sets of group-by
    attributes
  • Compute various aggregate functions

4
Introduction
  • Summary table
  • A smaller table of precomputed results
  • Summary table management
  • Recognition of which summaries to create
  • Updating of these summaries
  • Complete refresh
  • Incremental refresh
  • add only those records that are new since the
    last refresh of the summary table
  • Select which summary tables are refreshed
  • Select either specific summaries, only summaries
    that use the updated tables, or all summary
    tables
  • Redirecting of queries to these summaries

5
Introduction
  • Warehouse views must be updated to reflect the
    changed stage of data sources
  • Common method
  • Source changes to be deferred and applied to the
    warehouse views in large batches for efficiency
  • The warehouse is unavailable to readers while the
    views are being updated
  • Time required for maintenance is often a limiting
    factor in the number of summary tables that can
    be made available in the warehouse

6
In this paper
  • Using efficiently incremental maintenance
    techniques
  • Increase the number of summary tables available
    in the warehouse
  • Decrease the time that the warehouse is
    unavailable to readers
  • Propose a method of maintaining aggregate views
    (the summary-delta table method)
  • How to efficiently maintain a summary tables
    while minimizing the batch window needed for
    maintenance
  • Propagate and refresh functions
  • How to maintain a large set of summary tables
    defined over the same base tables

7
Motivating Example
  • Consider a warehouse of retail information
  • Fact table point-of-sale data
  • pos(storeID, itemID, date, qty, price)
  • Dimension table
  • stores(storeID, city, region)
  • items(itemID, name, category, cost)
  • COUNT() aggregate function
  • Correctly maintain an aggregate view in the
    presence of deletion
  • Determine when all tuples in a group have been
    deleted

8
(No Transcript)
9
Maintaining a single summary table
  • Maintenance process is divided into two functions
  • Propagate
  • Creating a summary-delta table from the deferred
    set of changes
  • DW can continue to be made available for querying
  • Refresh
  • Applies the net changes represented in the
    summary-delta table to the summary table
  • Updates the summary table to reflect the changes
    in the summary delta table
  • Summary tables are locked designed to run
    quickly

10
Maintaining multiple summary tables
  • Efficiently maintaining multiple summary tables
    together
  • Summary-delta table computed for the maintenance
    of one summary table often can be used to compute
    summary-delta tables for other summary tables
  • Summary-delta table to compute other
    summary-delta tables will likely require fewer
    tuple accesses than computing each summary-delta
    table from the changes directly

11
Basic Summary-Delta Maintenance Algorithm
  • Propagate function
  • Preparing changes
  • Computing the summary-delta table
  • Pre-aggregation
  • Changes to dimension tables
  • Refresh function

12
Propagate function Preparing changes
  • Defining three virtual views
  • Prepare-insertions
  • Prepare-deletions
  • Prepare-changes (The union of prepare-insertions
    and prepare-deletions)
  • Applying any selections conditions and joins that
    appear in the definition of the summary table
  • Then take a projection of the insertions/deletions
    to the base data
  • Projected attributes include
  • Each of the group-by attributes of the summary
    table
  • Aggregate-source attributes corresponding to each
    of the aggregate functions computed in the
    summary table

13
(No Transcript)
14
Propagate functionComputing the summary-delta
table
  • Summary-delta table
  • Computed by aggregating the prepare-changes
    virtual view
  • Each tuple describes the effect of the base-data
    changes on the aggregate functions

15
Propagate functionComputing the summary-delta
table
  • The query to compute the summary-delta table
  • FROM clause -gt prepare-changes
  • WHERE clause is removed
  • Expressions on which the aggregate functions
  • -gt aggregate source attributes of
    prepare-changes
  • COUNT aggregate functions -gt SUM

16
Propagate functionPre-aggregation
  • Pre-aggregation
  • As a potential optimization, it is possible to
    pre-aggregate the insertions and deletions before
    joining with some of the dimension tables
  • Joins with dimension tables whose attributes are
    not referenced in the aggregate functions, can be
    delayed until after pre-aggregation
  • Reduces the number of tuples involved in the join
  • Speeding up the computation of the summary-delta
    table
  • Decision could be made in a cost-based manager by
    a query optimizer

17
Propagate functionChanges to dimension tables
  • Changes to dimension table
  • Dimension-table-specific prepare-insertions and
    prepare-deletions views
  • Represent the changes to the aggregate functions

?gt Prepare-insertions for SiC_sales due to
insertions to items
CREATE VIEW pi_items_SiC_sales (storeID,
category, _count, _date, _quantity)
AS SELECT storeID, category, 1 AS _count, date AS
_date, qty AS _quantity FROM pos, items_ins WHERE
pos.itemID items_ins.itemID
18
Refresh function
  • The refresh function applies the changes
    represented in the summary-delta table to the
    summary table
  • Each corresponding tuple in the summary table
  • Updated, deleted, or inserted
  • Conceptually very simple
  • Left outer-join between the summary-delta table
    and the summary table
  • Updated or deleted as it joins
  • Summary-delta tuple that doesnt join is inserted
    into the summary table
  • Only complication is an occasional recomputation
    of a min/max value

19
(No Transcript)
20
Efficiently maintaining multiple summary tables
21
Efficiently maintaining multiple summary tables
  • Placing generalized cube views into a lattice
  • Making summary tables lattice-friendly
  • Optimizing the lattice
  • Summary-delta lattice
  • Computing the summary-delta lattice

22
Performance
  • Test the performance improvement
  • Summary-delta table method over recomputation
  • To determine the benefits of using the lattice
    structure when maintaining multiple summary
    tables
  • Dataset
  • Data size of the pos table 100,000-500,000
    tuples
  • Size of changes1,000-5,000 tuples
  • Performance of the refresh operation
  • Depended heavily on the number of updates/deletes
    vs. inserts to the summary tables
  • Two types of changes to the pos table
  • Update-Generating Changes
  • Insertion0Generating Changes

23
Performance
24
Performance
25
Related Work and Conclusion
  • Proposing new view maintenance techniques for
    maintaining multiple summary tables (aggregate
    views) over a star schema using a new
    summary-delta paradigm
Write a Comment
User Comments (0)
About PowerShow.com