Title: Maintenance of Data Cubes and Summary Tables in a Warehouse I. S. Mumick, D. Quass, N. S. Mumick Pro
1Maintenance of Data Cubes and Summary Tables in
a WarehouseI. S. Mumick, D. Quass, N. S.
MumickProc. ACM SIGMOD '97,
2Contents
- Introduction
- In this paper
- Motivating Example
- Basic Summary-Delta Maintenance Algorithm
- Efficiently maintaining multiple summary tables
- Performance
- Conclusion
3Introduction
- Summary table
- Represent materialized aggregate views of base
data collected from the source - Speed up query processing
- Group the base data along various dimensions,
corresponding to different sets of group-by
attributes - Compute various aggregate functions
4Introduction
- Summary table
- A smaller table of precomputed results
- Summary table management
- Recognition of which summaries to create
- Updating of these summaries
- Complete refresh
- Incremental refresh
- add only those records that are new since the
last refresh of the summary table - Select which summary tables are refreshed
- Select either specific summaries, only summaries
that use the updated tables, or all summary
tables - Redirecting of queries to these summaries
5Introduction
- Warehouse views must be updated to reflect the
changed stage of data sources - Common method
- Source changes to be deferred and applied to the
warehouse views in large batches for efficiency - The warehouse is unavailable to readers while the
views are being updated - Time required for maintenance is often a limiting
factor in the number of summary tables that can
be made available in the warehouse
6In this paper
- Using efficiently incremental maintenance
techniques - Increase the number of summary tables available
in the warehouse - Decrease the time that the warehouse is
unavailable to readers - Propose a method of maintaining aggregate views
(the summary-delta table method) - How to efficiently maintain a summary tables
while minimizing the batch window needed for
maintenance - Propagate and refresh functions
- How to maintain a large set of summary tables
defined over the same base tables
7Motivating Example
- Consider a warehouse of retail information
- Fact table point-of-sale data
- pos(storeID, itemID, date, qty, price)
- Dimension table
- stores(storeID, city, region)
- items(itemID, name, category, cost)
- COUNT() aggregate function
- Correctly maintain an aggregate view in the
presence of deletion - Determine when all tuples in a group have been
deleted
8(No Transcript)
9Maintaining a single summary table
- Maintenance process is divided into two functions
- Propagate
- Creating a summary-delta table from the deferred
set of changes - DW can continue to be made available for querying
- Refresh
- Applies the net changes represented in the
summary-delta table to the summary table - Updates the summary table to reflect the changes
in the summary delta table - Summary tables are locked designed to run
quickly
10Maintaining multiple summary tables
- Efficiently maintaining multiple summary tables
together - Summary-delta table computed for the maintenance
of one summary table often can be used to compute
summary-delta tables for other summary tables - Summary-delta table to compute other
summary-delta tables will likely require fewer
tuple accesses than computing each summary-delta
table from the changes directly
11Basic Summary-Delta Maintenance Algorithm
- Propagate function
- Preparing changes
- Computing the summary-delta table
- Pre-aggregation
- Changes to dimension tables
- Refresh function
12Propagate function Preparing changes
- Defining three virtual views
- Prepare-insertions
- Prepare-deletions
- Prepare-changes (The union of prepare-insertions
and prepare-deletions) - Applying any selections conditions and joins that
appear in the definition of the summary table - Then take a projection of the insertions/deletions
to the base data - Projected attributes include
- Each of the group-by attributes of the summary
table - Aggregate-source attributes corresponding to each
of the aggregate functions computed in the
summary table
13(No Transcript)
14Propagate functionComputing the summary-delta
table
- Summary-delta table
- Computed by aggregating the prepare-changes
virtual view - Each tuple describes the effect of the base-data
changes on the aggregate functions
15Propagate functionComputing the summary-delta
table
- The query to compute the summary-delta table
- FROM clause -gt prepare-changes
- WHERE clause is removed
- Expressions on which the aggregate functions
- -gt aggregate source attributes of
prepare-changes - COUNT aggregate functions -gt SUM
16Propagate functionPre-aggregation
- Pre-aggregation
- As a potential optimization, it is possible to
pre-aggregate the insertions and deletions before
joining with some of the dimension tables - Joins with dimension tables whose attributes are
not referenced in the aggregate functions, can be
delayed until after pre-aggregation - Reduces the number of tuples involved in the join
- Speeding up the computation of the summary-delta
table - Decision could be made in a cost-based manager by
a query optimizer
17Propagate functionChanges to dimension tables
- Changes to dimension table
- Dimension-table-specific prepare-insertions and
prepare-deletions views - Represent the changes to the aggregate functions
?gt Prepare-insertions for SiC_sales due to
insertions to items
CREATE VIEW pi_items_SiC_sales (storeID,
category, _count, _date, _quantity)
AS SELECT storeID, category, 1 AS _count, date AS
_date, qty AS _quantity FROM pos, items_ins WHERE
pos.itemID items_ins.itemID
18Refresh function
- The refresh function applies the changes
represented in the summary-delta table to the
summary table - Each corresponding tuple in the summary table
- Updated, deleted, or inserted
- Conceptually very simple
- Left outer-join between the summary-delta table
and the summary table - Updated or deleted as it joins
- Summary-delta tuple that doesnt join is inserted
into the summary table - Only complication is an occasional recomputation
of a min/max value
19(No Transcript)
20Efficiently maintaining multiple summary tables
21Efficiently maintaining multiple summary tables
- Placing generalized cube views into a lattice
- Making summary tables lattice-friendly
- Optimizing the lattice
- Summary-delta lattice
- Computing the summary-delta lattice
22Performance
- Test the performance improvement
- Summary-delta table method over recomputation
- To determine the benefits of using the lattice
structure when maintaining multiple summary
tables - Dataset
- Data size of the pos table 100,000-500,000
tuples - Size of changes1,000-5,000 tuples
- Performance of the refresh operation
- Depended heavily on the number of updates/deletes
vs. inserts to the summary tables - Two types of changes to the pos table
- Update-Generating Changes
- Insertion0Generating Changes
23Performance
24Performance
25Related Work and Conclusion
- Proposing new view maintenance techniques for
maintaining multiple summary tables (aggregate
views) over a star schema using a new
summary-delta paradigm