Title: A Transactional Model for Data Warehouse Maintenance
1 A Transactional Model for
Data Warehouse Maintenance
- Authored by
- Jun Chen, Songting Chen, Elke A. Rundensteiner
- Published in ER2002, Finland
- Database Systems Research Group
- Worcester Polytechnic
Institute
2Data Warehousing
- Data Integration from Remote Base Sources
- Difficult and Labor-Intensive
- Better Do it only ONCE and Materialize the
Results - Share Materialized Data by Many Applications
Data Warehouse
3Data Warehouse Maintenance
- Motivation Keep Data Warehouse (DW)
Update-to-Date - Base Changes over Time
- Source Data Updates
- insert, delete, update
- Source Schema Changes
- add, drop, rename
- Basic Idea Incremental instead of
Re-computation - Re-computation may take weeks
4General Maintenance Algorithms
- View Maintenance (VM)
- Incrementally incorporate source data updates
- BLT86, GMS93, ZGH95, SBC00
- View Synchronization (VS)
- Rewrite data warehouse view definition after the
schema - of the source changed
- NLR98, LNR02
- View Adaptation (VA)
- Adapt view extent after the view definition
changed - NR99, GMR01
5DW Maintenance Example
CREATE VIEW Asia_Traveller AS SELECT C.Name,
C.Address, F.FlightNo FROM Customer C,
FlightRes F WHERE C.Name F.Name
AND F.Dest Asia
View Asia_Traveller
FlightNo
Address
Name
AA8384
WPI
Dave
Select FlightNo from FlightRes where
NameSteve
Customer
FlightRes
Address
Name
Dest
FlightNo
Age
Name
WPI
Dave
Asia
AA8384
22
Dave
MA
Ellen
Europe
UA77788
22
Steve
6Maintenance Anomaly Problem
7Inside Broken Query
- Two Transactions
- Base Update Transaction
- w(Bi)c(Bi)
- DW Maintenance Transaction
- r(B1)r(B2)r(Bn)w(DW)c(DW)
- Read-write conflicts between two transactions
- Two Independent Transactions
- w(Bi) / r(Bi)
- Data Update w(Bi) Incorrect Query Results
ZGH95 - Schema Change w(Bi) Broken Query
-
8A Transactional Approach
- A Global Transaction Model
- DWMS_Transaction
- Integrates both base update transaction and its
corresponding DW maintenance transaction - w(Bi)c(Bi)r(B1)r(B2)r(Bn)w(DW)c(DW)
- Maintenance Anomaly
- Rephrased to read-write conflicts of
DWMS_Transactions - w(Bi)c(Bi)r(B1)r(B2)r(Bj)r(Bn)w(DW)c(DW)
- w(Bj)c(Bj)r(B1)r(B2)r(Bn)w(DW)c(DW)
9Serializability of DWMS_Transaction
- Theorem
- A history of DWMS_Transactions S is serializable
- iff it is equivalent to some serial schedule
S of the - same DWMS_Transactions.
- Basis for Solving Anomaly Problems
- To solve the anomaly problem,
we
need all DWMS_Transactions serializable.
10Traditional Serializability Algorithms
- Lock-based
- Reads / writes acquire locks for access to
shared resources - Transactions block each other
- Multiversion-based
- Write on a version, read on another version
- Transactions do not block each other
11Traditional Serializability Algorithms
- Lock-based
- Read / write would need to lock data in sources?
- Not desirable in DW environment
- Data sources are autonomous
- Not realistic to impose locking on them
- Multiversion-based
- Do not block each other
- Desirable in DW environment
- DW and data sources do not block each other
- Need to maintain versions somewhere
12TxnWrap A Multiversion Algorithm
CREATE VIEW Asia_Traveller AS SELECT C.Name,
C.Address, F.FlightNo FROM Customer C,
FlightRes F WHERE C.Name F.Name
AND F.Dest Asia
13Versioned Wrapper
Wrapper for Customer
Relation Customer
Meta Relation
Name Address born dead
Dave WPI 0 ?
Ellen MA 0 ?
Rel Attr Rel Attr born dead
C Name - - 0 ?
C Addr. - - 0 ?
- Semantics life time of a tuple is
- born lt time lt dead
14Source Updates on Versioned Wrapper
Relation Customer (Init)
Transaction1 1. DELETE FROM Customer C
WHERE C.Name Dave 2. INSERT (Steve,
Boston)
Transcation 2 Drop Customer.Address
15DW Maintenance Query Rewritten for Versioned
Wrapper
The maintenance query issued in Transaction2
SELECT Name, Address FROM
Customer WHERE condition Rewritten versioned
maintenance query SELECT Name, Address
FROM Customer WHERE condition and born lt 2
and dead gt 2
16Performance Evaluation
- Implementation
- In Java
- Platform Oracle, JDBC on Windows NT
- Embedded in DyDa CCZ01 System at WPI
- Testbed
- 6 data sources with one relation each
- Each relation has 4 attributes and 100,000
tuples - One materialized joined view over these data
sources - TxnWrap VS. compensation (SWEEP AAS97 DyDa)
-
17Data Update Processing
18Schema Change Processing
19Related Work
- View Maintenance
- View Maintenance / Synchronization / Adaptation
- Maintenance Anomaly
- ECA ZGH95, SWEEP AAS97 handles only
- concurrent data updates
- Compensation-based
- Performance degrades at a high load
- Multi-version Algorithms
- 2-version, n-version, unlimited-version
algorithms MPL92
20Conclusions
- Identify the Maintenance Anomaly Problem in
mixed model environment - Design a global Transaction DWMS_Transaction
model that integrates both source update
transaction and maintenance transaction. - Rephrase the maintenance anomaly in terms of
- serializability of DWMS_Transactions
- Propose multiversion algorithm to achieve
serializability - Implemented the maintenance solution in Dyda
- Achieve stable performance under various
workloads -
21Other Activities and Future Work
- Batching of updates into more complex
maintenance plans - Parallelism of maintenance processes
- Support more complex views, e.g., aggregation
- Generalize to more change types
- Provide alternate view synchronization
algorithms - Discovery of changes by non-cooperating sources
- Discovery of meta data in terms of source
relationships of distributed sources - Move beyond relational middle-layer model
-
22Questions?