Schema Evolution in Data Warehousing Environments a schema transformationbased approach PowerPoint PPT Presentation

presentation player overlay
1 / 22
About This Presentation
Transcript and Presenter's Notes

Title: Schema Evolution in Data Warehousing Environments a schema transformationbased approach


1
Schema Evolution in Data Warehousing
Environments a schema transformation-based
approach
  • Hao Fan Alexandra Poulovassilis
  • School of Computer Science Information Systems
  • Birkbeck college, University of London
  • ER 2004
  • 8th 12th November 2004

2
Outline
  • Motivation
  • Overview of AutoMed
  • Expressing Schema and Data Model Evolution
  • Handling Schema Evolution
  • Conclusion

3
Motivation - 1
There are two kinds of heterogeneity in data
integrations
  • Model Heterogeneity
  • Heterogeneous data expressed in different data
    models
  • Schema Heterogeneity
  • Heterogeneous data within different data
    schemas expressed the same data model

4
Motivation - 2
In data warehousing environments, schema
evolution includes
  • Evolution of the schema
  • Evolution of the modelling language in which the
    schema is expressed

It is important for the integration framework to
be evolvable if a schema evolves, so that the
previous data integration effort can be reused
as much as possible.
5
Overview of AutoMed
AutoMed is a heterogeneous data transformation
and integration system, which offers the
capability to handle data integration across
multiple data models.
  • HDM data model
  • The IQL query language
  • Transformation Pathways
  • The AutoMed repository

(See http//www.doc.ic.ac.uk/automed/ for details)
6
HDM Data Model
HDM is a low-level Hypergraph Data Model.
HDM Schemas consist of a set of Nodes, Edges
and Constraints.
Schemas within conceptual data models, such as
Relational, ER, XML, and Flat-text, can be
represented in terms of HDM.
7
A Simple Relational Data Model
CStab
CStab
Cstab,SID
Cstab,C03
Cstab,SName
Cstab,C02
Cstab,C01
The Extent of each construct CStab
CS01, CS02, Cstab,SID
CS01, CS01, CS02, CS02, Cstab,SNa
me CS01, Jack, CS02,
Tom, Cstab,C01 CS01, 95,
CS02, 88, Cstab,C02 CS01,
82, CS02, 94, Cstab,C03
CS01, 75, CS02, 81,
8
The IQL Query Language
IQL is a comprehension-based functional query
language. Simply, the Syntax of IQL can be
described as following queries
6. aggFun E (aggFun max min
count sum avg) 7. gc aggFun
E /group and compute a bag of pairs on their
first component and apply an aggregation function
to the second component/ 8. e Q1
Qr /comprehension/ 9. map f E
  • 1. E1 E2 Er
  • / bag union/
  • 2. E1 -- E2 / bag monus /
  • 3. group E
  • / group a bag of pairs on their first
    component/
  • 4. sort E
  • 5. distinct E
  • /remove duplicates/

9
Primitive Transformations
  • addC (c, q)
  • extendC (c)
  • deleteC (c, q)
  • contractC (c)
  • renameC (c, c)
  • For simple relational model, C can be Rel, and
    Att
  • For XML, C can be Element, Attribute, and
    NestSet
  • For ER, C can be Entity, Attribute, and
    Constraint
  • For HDM, C can be Node, Edge, and Constraint

10
Transformation Pathways
11
A Data Integration Example
7. contractHierarchy (ltltSalary,
Person,id,idgtgt) 8. contractHierarchy
(ltltSalary,Job,job_id,job_idgtgt) 9. contractAtt
(ltltSalary, salarygtgt) 10. contractAtt
(ltltSalary, dept_idgtgt) 11. contractFact
(ltltSalary, id, job_idgtgt) 12. contractAtt
(ltltJob, job_descrgtgt) 13. contractDim
(ltltJob, job_idgtgt) 14. contractAtt
(ltltPerson, namegtgt) 15. contractDim
(ltltPerson, idgtgt)
1. addRel (ltltDept,idgtgt, ltltDept_idgtgt) 2. addAtt
(ltltDept,dept_namegtgt, ltlt_,Dept_id,namegtgt) 3.
addAtt (ltltDept,total_salarygtgt,

gc sum (d,s)(i,j,s)lt-ltltSalary,salarygtgt
(i',j',d)lt-ltSalary,dept_idgtgt ii'
jj') 4. delEdge (ltlt_,Dept_id,namegtgt,
ltltDept,dept_namegtgt) 5. delNode
(ltltnamegtgt,n(d,n)lt-ltltDept,dept_namegtgt) 6.
delNode (ltltDept_idgtgt, ltltDept,idgtgt)
12
The AutoMed Repository
13
Discussion AutoMed vs. CDM - 1
14
Discussion AutoMed vs. CDM - 2
Conceptual Data Model
AutoMed
  • No semantic mismatch
  • Possible to extend data warehouse views into a
    different data model
  • Easily reuse the trans-formation and integration
    efforts if a source schema is changed
  • Semantic mismatches
  • Tightly coupled with the CDM
  • Not straightforward to reuse the integration
    effort if a source schema is changed

15
Materialised Data Integration in AutoMed
16
Expressing Schema and Data Model Evolution
  • Expressing the evolution of schema S to Snew
  • S and Snew have different schema but expressed in
    the same modelling language.
  • Snew is an equivalent schema of S expressed in
    different modelling language.
  • S to Snew is an evolution of both a change in the
    schema and in the data model.

17
Handling Schema Evolution
Supposing evolution S ? Snew consisting of one
primitive Transformation, t , two scenarios need
to be considered
  • Evolution of the global schema
  • Evolution of a local schema

18
Evolution of the Global Schema
  • t is rename c c, nothing further need to do.
  • t is add c q , nothing further need to do at the
    schema level. The new construct c must be
    populated.
  • t is extend c , an empty new construct c is
    populated.
  • t is delete c q or contract c , the extend of c
    must be removed from the global database, and the
    transformation pathway US ? GSnew can be
    optimized.

GSnew
t
GS
GD
T
US
Other transformations
19
Evolution of a Local Schema
  • t is rename c c, Tnew rename c c T.
  • t is add c q , Tnew delete c q T.
  • t is delete c q , Tnew add c q T.
  • t is contract c , Tnew extend c T. Since the
    extend of c is Void, the materialised data in CD
    and GD must be modified so as to remove any data
    derived from the old extend of c.
  • t is extend c , Tnew can be contract cT, but it
    will lose information.(see next slide for
    details)

GS
GD
US
CSi
CDi
T
Tnew
LSi
t
LSinew
LDi
20
t extend c
  • c appears in schema US and has the same semantics
    as the newly added c in LSinew.
  • Removing extend c from CSi?US and propagating c
    into CSi.
  • c does NOT appears in schema US but can be
    derived from US by means of some transformation
    T.
  • Keeping c in CSi and US, and populating c into
    LDi.
  • Repairing other pathways LSj? CSj (i ! j),
    adding c in each CSj.
  • c does NOT appears in schema US but CANNOT be
    derived from US.
  • Keeping c in CSi and US, and populating c into
    LDi.
  • Extending other pathways LSj? CSj (i ! j),
    adding c in each CSj.
  • c appears in schema US but has the different
    semantics as the newly added c in LSinew.
  • renaming c in LSinew to a new construct c.

21
Conclusion Future Work
  • Our approach is mainly automatic for handling
    schema evolution, except for the aspects that
    require domain or expert human knowledge
    regarding the semantics of new schema constructs.
  • In our case, the new mappings are a composition
    of the original transformation pathway and the
    transformation pathway which expresses the schema
    evolution.
  • In this paper, we use IQL. However, the AutoMed
    toolkit allows any other query languages. Our
    approach could be extended to other query
    language.
  • Our approach is expected to handle the problem of
    incremental view maintenance at the date and
    schema levels.

Future Work
Implementing our approach and evaluating it in
the context of biological data warehousing
(ISPIDE project).
22
Acknowledge
Thank you!
Write a Comment
User Comments (0)
About PowerShow.com