Title: George Papastefanatos1, Panos Vassiliadis2,
1What-if Analysis for Data Warehouse Evolution
- George Papastefanatos1, Panos Vassiliadis2,
- Alkis Simitsis3 ,Yannis Vassiliou1
- (1) National Technical University of Athens
- gpapas,yv_at_dbnet.ece.ntua.gr
- (2) University of Ioannina
- pvassil_at_cs.uoi.gr
- (3) IBM Almaden Research Center
- asimits_at_us.ibm.com
2Outline
- Motivation
- Graph-based modeling of ETL
- Adapting ETL workflows
- Evaluation
- Conclusions
3Outline
- Motivation
- Graph-based modeling of ETL
- Adapting ETL workflows
- Evaluation
- Conclusions
4Data Warehouse environment
WWW
5ETL
WWW
6Motivation
WWW
7Evolving ETL sources
- Schema Changes on the sources of ETL processes.
Design constructs are - Added, Removed, Modified
- ETL processes affected
- Syntactically i.e., become invalid
- Semantically i.e., query must conform to the
new source database semantics - Adaptation of activities, queries and views
- time-consuming task,treated in most of the cases
manually by the administrators/developers
8We would like to know
- What part of the process is affected and how if
e.g., an attribute is deleted? - Can we predict the impact of changes?
- To what extent can readjustment be automated?
- Can we perform what-if analysis for potential
changes of source configurations?
9Contribution
- A general mechanism for performing what-if
analysis for potential changes of ETL source
configurations - A graph model for relations, queries, views, ETL
activities, and their significant properties - A framework for annotating the graph with
policies concerning the behavior of nodes in the
presence of hypothetical changes. - A set of rules that dictate the proper actions,
when additions, deletions or updates are
performed to relations, attributes, and
conditions - An experimental assessment of our proposal
10Outline
- Motivation
- Graph-based modeling of ETL
- Adapting ETL workflows
- Evaluation
- Conclusions
11Query representation
12Query representation
13Outline
- Motivation
- Graph-based modeling of ETL
- Adapting ETL workflows
- Evaluation
- Conclusions
14Annotating the graph with adaptation rules
According to prevailing policy, the proper action
is taken ? graph transformation
15Annotating the graph with adaptation rules
- Assuming that a graph construct is annotated with
a policy for a particular event (e.g., an
activity node is tuned to deny deletions of its
provider attributes) - (a) it performs the identification of the
affected subgraph - (b) if the policy is appropriate, it automates
the readjustment of the graph to fit the new
semantics imposed by the change.
16Query Adaptation - Example
Q SELECT EMP.Emp, EMP.Name FROM EMP
Q SELECT EMP.Emp, EMP.Name, EMP.Phone
FROM EMP
17Algorithm Propagate changeS (PS)
18Conflict resolution
- Graph constructs may have contradictory policies
for the same event
Rule
Policies defined on query graph structures are
stronger than policies defined on view graph
structures which in turn prevail on policies
defined on relation graph structures
19Outline
- Motivation
- Graph-based modeling of ETL
- Adapting ETL workflows
- Evaluation
- Conclusions
20Evaluation of our framework
- Reverse engineering of real-world ETL processes,
extracted from an application of the Greek public
sector - Monitored the changes that took place to the
sources of the studied data warehouse - Performed what-if analysis for several evolution
scenarios
21Configuration of our setting
7 source tables
53 ETL activities 3 lookup tables
9 target tables
22Configuration of our setting
- evolution events
- renaming source tables,
- renaming attributes of source tables,
- adding and deleting attributes from source
tables, - modifying the domain of attributes
- changing the primary key of lookup tables
23Measurements
- For each event, we counted
- (a) the number of activities affected both
semantically and syntactically, - (b) the number of activities, that have
automatically been adapted by our framework
(propagate or block policies) as opposed to those
- (c) that required administrators intervention
(i.e., a prompt policy)
24Adapted activities w.r.t. the ETL scenario size
25Adapted activities w.r.t. the complexity of
activities.
26Outline
- Motivation
- Graph-based modeling of ETL
- Adapting ETL workflows
- Evaluation
- Conclusions
27Summary
- A uniform representation for modeling relations,
queries, views and ETL activities - A framework for annotating the graph with
policies concerning the behavior of nodes in the
presence of hypothetical changes - A set of rules that dictate the proper adaptation
actions, when evolution events are performed to
ETL sources - An experimental assessment of our proposal
28On-going/Future Work
- Hecataeus A tool for visualizing and performing
what-if analysis for several evolution scenarios. - SQL extensions for annotating graph constructs
with evolution semantics - Patterns of evolution sequences
29Danke schön!Questions?
http//www.cs.uoi.gr/pvassil/projects/architectur
e_graph/
30(No Transcript)
31Back up Slides
32Related work
- DB schema Evolution
- Schema Versioning
- DW schema Evolution
- Materialized View Evolution
- Evolution wrt Model Mappings
33Scenario 1
34Evolution Metadata
- Metadata Repository maintaining
- Graph constructs
- Annotations
- What if analysis scenarios
35Extending SQL With Evolution Semantics
- Used for annotating graph constructs
ON TO THEN
E.g.
SELECT Emp, NAME, AGE FROM V ON condition
addition TO V THEN propagate, ON attribute
deletion TO V.AGE THEN block
36Hecataeus
- A tool for visualizing and performing what-if
analysis for several evolution scenarios
37Adapted activities per Event
38Evolution changes occurred on source and lookup
tables
39Annotation of graph constructs with policies for
kinds of events