Schema Evolution in Data Warehousing Environments a schema transformationbased approach presentation

About This Presentation

Transcript and Presenter's Notes

Title: Schema Evolution in Data Warehousing Environments a schema transformationbased approach

1
Schema Evolution in Data Warehousing
Environments a schema transformation-based
approach

Hao Fan Alexandra Poulovassilis
School of Computer Science Information Systems
Birkbeck college, University of London
ER 2004
8th 12th November 2004

2
Outline

Motivation
Overview of AutoMed
Expressing Schema and Data Model Evolution
Handling Schema Evolution
Conclusion

3
Motivation - 1
There are two kinds of heterogeneity in data
integrations

Model Heterogeneity
Heterogeneous data expressed in different data
models
Schema Heterogeneity
Heterogeneous data within different data
schemas expressed the same data model

4
Motivation - 2
In data warehousing environments, schema
evolution includes

Evolution of the schema
Evolution of the modelling language in which the
schema is expressed

It is important for the integration framework to
be evolvable if a schema evolves, so that the
previous data integration effort can be reused
as much as possible.
5
Overview of AutoMed
AutoMed is a heterogeneous data transformation
and integration system, which offers the
capability to handle data integration across
multiple data models.

HDM data model
The IQL query language
Transformation Pathways
The AutoMed repository

(See http//www.doc.ic.ac.uk/automed/ for details)
6
HDM Data Model
HDM is a low-level Hypergraph Data Model.
HDM Schemas consist of a set of Nodes, Edges
and Constraints.
Schemas within conceptual data models, such as
Relational, ER, XML, and Flat-text, can be
represented in terms of HDM.
7
A Simple Relational Data Model
CStab
CStab
Cstab,SID
Cstab,C03
Cstab,SName
Cstab,C02
Cstab,C01
The Extent of each construct CStab
CS01, CS02, Cstab,SID
CS01, CS01, CS02, CS02, Cstab,SNa
me CS01, Jack, CS02,
Tom, Cstab,C01 CS01, 95,
CS02, 88, Cstab,C02 CS01,
82, CS02, 94, Cstab,C03
CS01, 75, CS02, 81,
8
The IQL Query Language
IQL is a comprehension-based functional query
language. Simply, the Syntax of IQL can be
described as following queries
6. aggFun E (aggFun max min
count sum avg) 7. gc aggFun
E /group and compute a bag of pairs on their
first component and apply an aggregation function
to the second component/ 8. e Q1
Qr /comprehension/ 9. map f E

1. E1 E2 Er
/ bag union/
2. E1 -- E2 / bag monus /
3. group E
/ group a bag of pairs on their first
component/
4. sort E
5. distinct E
/remove duplicates/

9
Primitive Transformations

addC (c, q)
extendC (c)
deleteC (c, q)
contractC (c)
renameC (c, c)

For simple relational model, C can be Rel, and
Att
For XML, C can be Element, Attribute, and
NestSet
For ER, C can be Entity, Attribute, and
Constraint
For HDM, C can be Node, Edge, and Constraint

10
Transformation Pathways
11
A Data Integration Example
7. contractHierarchy (ltltSalary,
Person,id,idgtgt) 8. contractHierarchy
(ltltSalary,Job,job_id,job_idgtgt) 9. contractAtt
(ltltSalary, salarygtgt) 10. contractAtt
(ltltSalary, dept_idgtgt) 11. contractFact
(ltltSalary, id, job_idgtgt) 12. contractAtt
(ltltJob, job_descrgtgt) 13. contractDim
(ltltJob, job_idgtgt) 14. contractAtt
(ltltPerson, namegtgt) 15. contractDim
(ltltPerson, idgtgt)
1. addRel (ltltDept,idgtgt, ltltDept_idgtgt) 2. addAtt
(ltltDept,dept_namegtgt, ltlt_,Dept_id,namegtgt) 3.
addAtt (ltltDept,total_salarygtgt,

gc sum (d,s)(i,j,s)lt-ltltSalary,salarygtgt
(i',j',d)lt-ltSalary,dept_idgtgt ii'
jj') 4. delEdge (ltlt_,Dept_id,namegtgt,
ltltDept,dept_namegtgt) 5. delNode
(ltltnamegtgt,n(d,n)lt-ltltDept,dept_namegtgt) 6.
delNode (ltltDept_idgtgt, ltltDept,idgtgt)
12
The AutoMed Repository
13
Discussion AutoMed vs. CDM - 1
14
Discussion AutoMed vs. CDM - 2
Conceptual Data Model
AutoMed

No semantic mismatch
Possible to extend data warehouse views into a
different data model
Easily reuse the trans-formation and integration
efforts if a source schema is changed

Semantic mismatches
Tightly coupled with the CDM
Not straightforward to reuse the integration
effort if a source schema is changed

15
Materialised Data Integration in AutoMed
16
Expressing Schema and Data Model Evolution

Expressing the evolution of schema S to Snew
S and Snew have different schema but expressed in
the same modelling language.
Snew is an equivalent schema of S expressed in
different modelling language.
S to Snew is an evolution of both a change in the
schema and in the data model.

17
Handling Schema Evolution
Supposing evolution S ? Snew consisting of one
primitive Transformation, t , two scenarios need
to be considered

Evolution of the global schema
Evolution of a local schema

18
Evolution of the Global Schema

t is rename c c, nothing further need to do.
t is add c q , nothing further need to do at the
schema level. The new construct c must be
populated.
t is extend c , an empty new construct c is
populated.
t is delete c q or contract c , the extend of c
must be removed from the global database, and the
transformation pathway US ? GSnew can be
optimized.

GSnew
t
GS
GD
T
US
Other transformations
19
Evolution of a Local Schema

t is rename c c, Tnew rename c c T.
t is add c q , Tnew delete c q T.
t is delete c q , Tnew add c q T.
t is contract c , Tnew extend c T. Since the
extend of c is Void, the materialised data in CD
and GD must be modified so as to remove any data
derived from the old extend of c.
t is extend c , Tnew can be contract cT, but it
will lose information.(see next slide for
details)

GS
GD
US
CSi
CDi
T
Tnew
LSi
t
LSinew
LDi
20
t extend c

c appears in schema US and has the same semantics
as the newly added c in LSinew.
Removing extend c from CSi?US and propagating c
into CSi.
c does NOT appears in schema US but can be
derived from US by means of some transformation
T.
Keeping c in CSi and US, and populating c into
LDi.
Repairing other pathways LSj? CSj (i ! j),
adding c in each CSj.
c does NOT appears in schema US but CANNOT be
derived from US.
Keeping c in CSi and US, and populating c into
LDi.
Extending other pathways LSj? CSj (i ! j),
adding c in each CSj.
c appears in schema US but has the different
semantics as the newly added c in LSinew.
renaming c in LSinew to a new construct c.

21
Conclusion Future Work

Our approach is mainly automatic for handling
schema evolution, except for the aspects that
require domain or expert human knowledge
regarding the semantics of new schema constructs.
In our case, the new mappings are a composition
of the original transformation pathway and the
transformation pathway which expresses the schema
evolution.
In this paper, we use IQL. However, the AutoMed
toolkit allows any other query languages. Our
approach could be extended to other query
language.
Our approach is expected to handle the problem of
incremental view maintenance at the date and
schema levels.

Future Work
Implementing our approach and evaluating it in
the context of biological data warehousing
(ISPIDE project).
22
Acknowledge
Thank you!

Write a Comment

User Comments (0)

About PowerShow.com

Schema Evolution in Data Warehousing Environments a schema transformationbased approach PowerPoint PPT Presentation