A Transactional Model for Data Warehouse Maintenance - PowerPoint PPT Presentation

About This Presentation
Title:

A Transactional Model for Data Warehouse Maintenance

Description:

for Data Warehouse Maintenance. Authored by: Jun Chen, Songting Chen, Elke A. ... 13. Versioned Wrapper. Semantics: life time of a tuple is #born = time #dead ... – PowerPoint PPT presentation

Number of Views:187
Avg rating:3.0/5.0
Slides: 23
Provided by: webC6
Learn more at: http://web.cs.wpi.edu
Category:

less

Transcript and Presenter's Notes

Title: A Transactional Model for Data Warehouse Maintenance


1
A Transactional Model for
Data Warehouse Maintenance
  • Authored by
  • Jun Chen, Songting Chen, Elke A. Rundensteiner
  • Published in ER2002, Finland
  • Database Systems Research Group
  • Worcester Polytechnic
    Institute

2
Data Warehousing
  • Data Integration from Remote Base Sources
  • Difficult and Labor-Intensive
  • Better Do it only ONCE and Materialize the
    Results
  • Share Materialized Data by Many Applications

Data Warehouse
3
Data Warehouse Maintenance
  • Motivation Keep Data Warehouse (DW)
    Update-to-Date
  • Base Changes over Time
  • Source Data Updates
  • insert, delete, update
  • Source Schema Changes
  • add, drop, rename
  • Basic Idea Incremental instead of
    Re-computation
  • Re-computation may take weeks

4
General Maintenance Algorithms
  • View Maintenance (VM)
  • Incrementally incorporate source data updates
  • BLT86, GMS93, ZGH95, SBC00
  • View Synchronization (VS)
  • Rewrite data warehouse view definition after the
    schema
  • of the source changed
  • NLR98, LNR02
  • View Adaptation (VA)
  • Adapt view extent after the view definition
    changed
  • NR99, GMR01

5
DW Maintenance Example
CREATE VIEW Asia_Traveller AS SELECT C.Name,
C.Address, F.FlightNo FROM Customer C,
FlightRes F WHERE C.Name F.Name
AND F.Dest Asia
View Asia_Traveller
FlightNo
Address
Name
AA8384
WPI
Dave
Select FlightNo from FlightRes where
NameSteve
Customer
FlightRes
Address
Name
Dest
FlightNo
Age
Name
WPI
Dave
Asia
AA8384
22
Dave
MA
Ellen
Europe
UA77788
22
Steve
6
Maintenance Anomaly Problem
7
Inside Broken Query
  • Two Transactions
  • Base Update Transaction
  • w(Bi)c(Bi)
  • DW Maintenance Transaction
  • r(B1)r(B2)r(Bn)w(DW)c(DW)
  • Read-write conflicts between two transactions
  • Two Independent Transactions
  • w(Bi) / r(Bi)
  • Data Update w(Bi) Incorrect Query Results
    ZGH95
  • Schema Change w(Bi) Broken Query

8
A Transactional Approach
  • A Global Transaction Model
  • DWMS_Transaction
  • Integrates both base update transaction and its
    corresponding DW maintenance transaction
  • w(Bi)c(Bi)r(B1)r(B2)r(Bn)w(DW)c(DW)
  • Maintenance Anomaly
  • Rephrased to read-write conflicts of
    DWMS_Transactions
  • w(Bi)c(Bi)r(B1)r(B2)r(Bj)r(Bn)w(DW)c(DW)
  • w(Bj)c(Bj)r(B1)r(B2)r(Bn)w(DW)c(DW)

9
Serializability of DWMS_Transaction
  • Theorem
  • A history of DWMS_Transactions S is serializable
  • iff it is equivalent to some serial schedule
    S of the
  • same DWMS_Transactions.
  • Basis for Solving Anomaly Problems
  • To solve the anomaly problem,
    we
    need all DWMS_Transactions serializable.

10
Traditional Serializability Algorithms
  • Lock-based
  • Reads / writes acquire locks for access to
    shared resources
  • Transactions block each other
  • Multiversion-based
  • Write on a version, read on another version
  • Transactions do not block each other

11
Traditional Serializability Algorithms
  • Lock-based
  • Read / write would need to lock data in sources?
  • Not desirable in DW environment
  • Data sources are autonomous
  • Not realistic to impose locking on them
  • Multiversion-based
  • Do not block each other
  • Desirable in DW environment
  • DW and data sources do not block each other
  • Need to maintain versions somewhere

12
TxnWrap A Multiversion Algorithm

CREATE VIEW Asia_Traveller AS SELECT C.Name,
C.Address, F.FlightNo FROM Customer C,
FlightRes F WHERE C.Name F.Name
AND F.Dest Asia
13
Versioned Wrapper
Wrapper for Customer
Relation Customer
Meta Relation
Name Address born dead
Dave WPI 0 ?
Ellen MA 0 ?
Rel Attr Rel Attr born dead
C Name - - 0 ?
C Addr. - - 0 ?
  • Semantics life time of a tuple is
  • born lt time lt dead

14
Source Updates on Versioned Wrapper
Relation Customer (Init)
Transaction1 1. DELETE FROM Customer C
WHERE C.Name Dave 2. INSERT (Steve,
Boston)
Transcation 2 Drop Customer.Address
15
DW Maintenance Query Rewritten for Versioned
Wrapper
The maintenance query issued in Transaction2
SELECT Name, Address FROM
Customer WHERE condition Rewritten versioned
maintenance query SELECT Name, Address
FROM Customer WHERE condition and born lt 2
and dead gt 2
16
Performance Evaluation
  • Implementation
  • In Java
  • Platform Oracle, JDBC on Windows NT
  • Embedded in DyDa CCZ01 System at WPI
  • Testbed
  • 6 data sources with one relation each
  • Each relation has 4 attributes and 100,000
    tuples
  • One materialized joined view over these data
    sources
  • TxnWrap VS. compensation (SWEEP AAS97 DyDa)

17
Data Update Processing
18
Schema Change Processing
19
Related Work
  • View Maintenance
  • View Maintenance / Synchronization / Adaptation
  • Maintenance Anomaly
  • ECA ZGH95, SWEEP AAS97 handles only
  • concurrent data updates
  • Compensation-based
  • Performance degrades at a high load
  • Multi-version Algorithms
  • 2-version, n-version, unlimited-version
    algorithms MPL92

20
Conclusions
  • Identify the Maintenance Anomaly Problem in
    mixed model environment
  • Design a global Transaction DWMS_Transaction
    model that integrates both source update
    transaction and maintenance transaction.
  • Rephrase the maintenance anomaly in terms of
  • serializability of DWMS_Transactions
  • Propose multiversion algorithm to achieve
    serializability
  • Implemented the maintenance solution in Dyda
  • Achieve stable performance under various
    workloads

21
Other Activities and Future Work
  • Batching of updates into more complex
    maintenance plans
  • Parallelism of maintenance processes
  • Support more complex views, e.g., aggregation
  • Generalize to more change types
  • Provide alternate view synchronization
    algorithms
  • Discovery of changes by non-cooperating sources
  • Discovery of meta data in terms of source
    relationships of distributed sources
  • Move beyond relational middle-layer model

22
Questions?
Write a Comment
User Comments (0)
About PowerShow.com