AutoMed - PowerPoint PPT Presentation

Provided by: Luc75

Transcript and Presenter's Notes

Title: AutoMed


1
AutoMed
  • A Heterogeneous Data Integration System

2
Outline
  • Both-As-View (BAV) approach
  • GAV & LAV approaches
  • BAV approach
  • Comparison of integration approaches
  • BAV advantages
  • The AutoMed system
  • Architecture
  • Current & future work
  • Testbeds

3
GAV & LAV Approaches
  • Global-As-View (GAV) approach: describe GS
    constructs with view definitions over LSi
    constructs
  • Local-As-View (LAV) approach: describe LSi
    constructs with view definitions over GS
    constructs

4
Global-As-View Approach (GAV)
  • student(id,name,left,degree) =
    [⟨x,y,z,w⟩ | ⟨x,y,z,w,_⟩ ∈ ug ∧ ⟨x,_,_,_,_⟩ ∉ phd] ∪
    [⟨x,y,z,w⟩ | ⟨x,y,z,w,_⟩ ∈ phd ∧ w = 'phd']
  • monitors(sno,id) =
    [⟨x,y⟩ | ⟨x,_,_,_,y⟩ ∈ ug ∧ ⟨x,_,_,_,_⟩ ∉ phd] ∪
    [⟨x,y⟩ | ⟨x,y⟩ ∈ supervises]
  • staff(sno,sname,dept) =
    [⟨x,y,z⟩ | ⟨x,y,z,w,_⟩ ∈ tutor ∧ ⟨x,_,_⟩ ∉ supervisor] ∪
    [⟨x,y,z⟩ | ⟨x,y,z⟩ ∈ supervisor]
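
A minimal sketch of how a GAV definition is evaluated, using the student view as the example. The sample tuples are invented for illustration, and the first condition on phd is read as a negated membership test, which is an assumption about the slide's notation.

```python
# Hypothetical sample extents; the tuples are not taken from the slides.
# ug(id, name, left, degree, sno) and a five-column phd relation, as on the slide.
ug = {
    (1, "Ann", 2005, "BSc", 10),
    (2, "Bob", 2006, "BSc", 11),
}
phd = {
    (2, "Bob", 2009, "phd", "t1"),
    (3, "Cat", 2008, "phd", "t2"),
}


def student_gav(ug, phd):
    """Evaluate the GAV view for student(id, name, left, degree)."""
    phd_ids = {row[0] for row in phd}
    # Undergraduates who do not also appear in phd (assumed negation).
    from_ug = {(x, y, z, w) for (x, y, z, w, _) in ug if x not in phd_ids}
    # PhD students whose degree column holds 'phd'.
    from_phd = {(x, y, z, w) for (x, y, z, w, _) in phd if w == "phd"}
    return from_ug | from_phd


students = student_gav(ug, phd)
```

The global relation is computed purely from local extents, which is why a change to ug or phd forces the view definition to be rewritten.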

5
Local-As-View Approach (LAV)
  • tutor(sno,sname) =
    [⟨x,y⟩ | ⟨x,y,_⟩ ∈ staff ∧ ⟨x,z⟩ ∈ monitors ∧
    ⟨z,_,_,w⟩ ∈ student ∧ w ≠ 'phd']
  • ug(id,name,left,degree,sno) =
    [⟨x,y,z,w,v⟩ | ⟨x,y,z,w⟩ ∈ student ∧
    ⟨v,x⟩ ∈ monitors ∧ w ≠ 'phd']
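
The LAV direction can be sketched the same way: here the local relation tutor is expressed as a query over the global relations. The sample tuples are invented for illustration only.

```python
# Hypothetical extents of the global relations; not taken from the slides.
staff = {(10, "Dr A", "CS"), (11, "Dr B", "Maths")}
monitors = {(10, 1), (11, 3)}  # (sno, id)
student = {(1, "Ann", 2005, "BSc"), (3, "Cat", 2008, "phd")}


def tutor_lav(staff, monitors, student):
    """Evaluate the LAV view for tutor(sno, sname) over the global schema."""
    return {
        (x, y)
        for (x, y, _dept) in staff
        for (sno, z) in monitors
        if sno == x
        for (sid, _name, _left, w) in student
        if sid == z and w != "phd"
    }


tutors = tutor_lav(staff, monitors, student)
```

Only staff who monitor at least one non-PhD student qualify, so "Dr B" is excluded in this sample.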

6
Both-As-View (BAV) (1/3)
  • Schema transformation approach
  • For each pair (LSi,GS): incrementally modify
    LSi/GS to match GS/LSi

7
Both-As-View (BAV) (2/3)
  • Common Data Model: Hypergraph Data Model (HDM)
  • Constructs are nodes, edges & constraints
  • It avoids the semantic mismatches that may occur
    between constructs of higher-level modelling
    languages

8
Both-As-View (BAV) (3/3)
  • Modify using primitive schema transformations
  • add/delete
  • rename
  • extend/contract
  • Supply transformations with queries
  • add(⟨⟨table,attrib3⟩⟩, q), where
    q = [{t, a1+a2} | {t,a1} ∈ ⟨⟨table,attrib1⟩⟩;
    {t,a2} ∈ ⟨⟨table,attrib2⟩⟩]
  • extend(⟨⟨table,attrib3⟩⟩, q1, q2)
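
A small sketch of what the add transformation above does: the new construct's extent is derived by the supplied query from constructs already in the schema. The dictionary representation, the add helper and the sample values are illustrative assumptions, not the AutoMed API.

```python
def add(schema, construct, query):
    """Return a copy of `schema` extended with `construct`,
    whose extent is computed by `query` over the existing constructs."""
    if construct in schema:
        raise ValueError(f"{construct} already exists")
    extended = dict(schema)
    extended[construct] = query(schema)
    return extended


# Hypothetical extents: each attribute maps a row key t to a value.
schema = {
    ("table", "attrib1"): {("k1", 1), ("k2", 10)},
    ("table", "attrib2"): {("k1", 2), ("k2", 20)},
}


def q(s):
    """q = [{t, a1+a2} | {t,a1} in attrib1; {t,a2} in attrib2]."""
    a2_by_key = dict(s[("table", "attrib2")])
    return {
        (t, a1 + a2_by_key[t])
        for (t, a1) in s[("table", "attrib1")]
        if t in a2_by_key
    }


new_schema = add(schema, ("table", "attrib3"), q)
```

Because the query fully defines the new extent, the step loses no information and can later be undone by a delete carrying the same query.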

9
Example (1/2)
  • S1 → Sg
  • add(⟨⟨monitors⟩⟩, q1)
  • add(⟨⟨monitors,sno⟩⟩, q2)
  • add(⟨⟨monitors,id⟩⟩, q3)
  • add(⟨⟨tutor,dept⟩⟩, q4)
  • rename(⟨⟨ug⟩⟩, ⟨⟨student⟩⟩)
  • rename(⟨⟨tutor⟩⟩, ⟨⟨staff⟩⟩)
  • delete(⟨⟨student,sno⟩⟩, q5)
  • S2 → Sg can be derived similarly

10
Example (2/2)
  • Automatically derivable reverse transformations
  • add(C,q)/extend(C,q1,q2): delete(C,q)/contract(C,q1,q2)
  • delete/contract: add/extend
  • rename(C1,C2): rename(C2,C1)
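
The reversal rules above are mechanical, which a short sketch can make concrete. Representing steps as tuples and writing schemes as strings are assumptions made for illustration.

```python
INVERSE = {
    "add": "delete",
    "delete": "add",
    "extend": "contract",
    "contract": "extend",
}


def reverse_step(step):
    """Reverse one primitive transformation, keeping its query arguments."""
    op, *args = step
    if op == "rename":
        old, new = args
        return ("rename", new, old)
    return (INVERSE[op], *args)


def reverse_pathway(pathway):
    """Reverse every step and apply the steps in the opposite order."""
    return [reverse_step(step) for step in reversed(pathway)]


# The S1 -> Sg pathway from the previous slide, with queries left symbolic.
s1_to_sg = [
    ("add", "<<monitors>>", "q1"),
    ("add", "<<monitors,sno>>", "q2"),
    ("add", "<<monitors,id>>", "q3"),
    ("add", "<<tutor,dept>>", "q4"),
    ("rename", "<<ug>>", "<<student>>"),
    ("rename", "<<tutor>>", "<<staff>>"),
    ("delete", "<<student,sno>>", "q5"),
]
sg_to_s1 = reverse_pathway(s1_to_sg)
```

Reversing twice gives back the original pathway, which is what makes a single pathway usable in both directions.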

11
BAV vs. LAV, GAV & GLAV
  • BAV approach subsumes other integration
    approaches
  • Can be used to derive GAV & LAV view definitions
    (ICDE'03)
  • Comparison with GAV, LAV & GLAV in DBIS'04

12
Schema Evolution
  • In GAV & LAV, view definitions have to be
    regenerated
  • The BAV approach readily supports the evolution
    of both local and global schemas
  • In particular (CAiSE'02 & ICDE'03 papers):
  • if the evolved schema is semantically equivalent
    to the original schema, schema evolution is
    automatic
  • if the evolved schema is a contraction of the
    original schema, schema evolution is automatic
  • if the evolved schema is an extension of the
    original schema, then domain knowledge may be
    required (but again the pathway can be evolved
    rather than regenerated)

13
Local Schema Evolution Example
  • Define the evolution of the global or local
    schema as a schema transformation pathway from
    the old to the new schema
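
One way to read this slide is that the pathway for the evolved local schema can be obtained by prefixing the existing pathway with the reverse of the evolution pathway. The sketch below illustrates that reading; the composition rule, the tuple representation and the sample steps are assumptions, not taken from the slides.

```python
INVERSE = {
    "add": "delete",
    "delete": "add",
    "extend": "contract",
    "contract": "extend",
}


def reverse_pathway(pathway):
    """Reverse each step and the step order; rename swaps its arguments."""
    reversed_steps = []
    for op, *args in reversed(pathway):
        if op == "rename":
            reversed_steps.append(("rename", args[1], args[0]))
        else:
            reversed_steps.append((INVERSE[op], *args))
    return reversed_steps


def evolve_local_pathway(old_to_global, old_to_new):
    """Pathway from the evolved local schema to the global schema:
    first undo the evolution, then reuse the existing pathway unchanged."""
    return reverse_pathway(old_to_new) + old_to_global


# Hypothetical case: the local schema drops an attribute (a contraction).
old_to_global = [("rename", "<<ug>>", "<<student>>")]
old_to_new = [("contract", "<<ug,left>>", "Void", "Any")]
new_to_global = evolve_local_pathway(old_to_global, old_to_new)
```

The existing pathway is kept as-is, so nothing has to be regenerated for this contraction case.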

14
Types Of Integration
  • Virtual integration
  • Materialised integration
  • Hybrid integration

15
Outline
  • Both-As-View (BAV) approach
  • GAV & LAV approaches
  • BAV approach
  • Comparison with GAV, LAV & GLAV
  • BAV advantages
  • The AutoMed system
  • Architecture
  • Current & future work
  • Testbeds

16
The AutoMed System
  • The AutoMed toolkit implements the BAV data
    integration approach
  • AutoMed repository
  • Model Definitions Repository (MDR)
  • Schema Transformation Repository (STR)
  • AutoMed query language: IQL
  • Higher-level query languages are translated to
    IQL
  • IQL is translated to the query languages of the
    datasources

17
Query Engine

18
Query Engine
  • Query Reformulator

19
Wrappers
  • Current
  • Relational (Oracle, PostgreSQL, SQLServer)
  • XML documents (DOM & SAX)
  • YATTA
  • RDF
  • Near future
  • Object-oriented (ODMG 3.0 compliant)
  • Native XML Databases (Xindice, Sedna)
  • RDF Schema Specific DataBase (RSSDB)

20
Testbeds
  • BioMap
  • http://www.biochem.ucl.ac.uk/bsm/biomap
  • Data warehouse containing diverse biological data
  • AutoMed used for the creation and maintenance of
    the data warehouse
  • ISPIDER
  • http://www.ispider.man.ac.uk
  • Develop a Grid architecture for sharing data from
    various biological data sources (such as BioMap)
  • Extend AutoMed system with Grid services

21
Development/Research Areas
  • Query engine
  • Query processing optimisation
  • Query language translation
  • Tools
  • Data warehousing & data lineage
  • Automatic schema matching (data mining)
  • Automatic integration of XML data sources
  • Unstructured/semi-structured data
  • Transformation pathway optimisation
  • Visualisation tool
  • Grid/P2P architecture

22
Project Information
  • Homepage: http://www.doc.ic.ac.uk/automed
  • Technical details
  • Papers
  • Technical reports
  • Software
  • AutoMed releases
  • Documentation

23
Project Members
  • Birkbeck College
  • Alexandra Poulovassilis (P.I.)
  • Hao Fan
  • Dean Williams
  • Lucas Zamboulis
  • Past members
  • Tanvir Amed Faqueer
  • Edgar Jasper
  • Dimitri Theodoratos
  • Imperial College
  • Peter McBrien (P.I.)
  • Mike Boyd
  • Sasivimol Kittivoravitkul
  • Nikolaos Rizopoulos
  • Nerissa Tong
  • Past members
  • Siegfried Hodgson
  • Charalambos Lazanitis

24
XML Data Transformation & Integration
  • Lucas Zamboulis, Alexandra Poulovassilis
    {lucas,ap}@dcs.bbk.ac.uk

25
Overview
  • Objective: restructuring & integration of XML
    files
  • Motivation
  • Interoperability
  • Related work on relational databases
  • Need for XML-specific solutions

26
Outline
  • Semantic Heterogeneity
  • Schema Matching
  • Ontologies
  • Structural Heterogeneity
  • XML schema type in AutoMed
  • Schema transformation
  • Schema integration

27
Semantic Heterogeneity
  • Problem definition
  • Schema Matching
  • Data mining
  • Neural networks
  • Machine learning (LSD)
  • Ontologies (RDFS/OWL)

28
Schema Matching (1/2)
  • Types
  • 1-1, 1-n, n-1, n-m
  • Subset, superset, equivalence
  • Use schema matching output to create the
    intermediate schemas used by the schema
    restructuring / schema integration algorithms

29
Schema Matching (2/2)
  • Necessary transformations
  • add attributes day, month, year in S
  • delete attribute dob from S
  • The reverse transformation pathway describes an
    n-1 match
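
A minimal sketch of this match, assuming dob holds a calendar date keyed by a person identifier (the slide's figure is not in the transcript, so the data layout is hypothetical). The forward queries split dob into three attributes; the reverse query recombines them.

```python
from datetime import date

# Hypothetical extent of attribute dob in schema S: (person key, date of birth).
dob = {("p1", date(1980, 5, 17)), ("p2", date(1975, 12, 1))}

# Queries that would accompany the three add transformations.
day = {(key, d.day) for key, d in dob}
month = {(key, d.month) for key, d in dob}
year = {(key, d.year) for key, d in dob}


def rebuild_dob(day, month, year):
    """Query for the reverse direction: three attributes map to one (n-1)."""
    days, months, years = dict(day), dict(month), dict(year)
    return {(key, date(years[key], months[key], days[key])) for key in days}


restored = rebuild_dob(day, month, year)
```

Since dob can be rebuilt exactly from the three new attributes, deleting it loses no information.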

30
Structural Heterogeneity
  • Problem: the same information can be represented
    in many different ways
  • Ancestor/descendant ↔ different branches
  • Elements & attributes not clearly distinguished
    in the XML model
  • Ordering policy

31
Aims
  • XML-specific solution
  • Insert-remove-rename operations on elements,
    attributes, edges
  • Efficient move (node/subtree) operation
  • Element-to-attribute, attribute-to-element
    transformations
  • Avoid loss of data due to structural
    incompatibilities
  • Automation

32
A Schema Type For XML
  • DTD
  • Advantage: wide adoption
  • Disadvantages
  • Non-XML format
  • Grammar
  • XML Schema
  • Advantage: XML format
  • Disadvantages
  • Grammar
  • Unnecessary complexity

33
XML DataSource Schema (1/3)
  • Basic characteristics
  • Structure-only representation
  • XML format → ease of traversal & manipulation
  • Automatically derived from an XML file
  • XMLDSS from other schema types (DTD, XML Schema)
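
To illustrate what "structure-only" and "automatically derived" can mean in practice, the sketch below extracts a structural summary from an XML document with Python's standard library. It is a simplified stand-in, not the actual XMLDSS: elements are merged by tag name, and the sample document is invented.

```python
import xml.etree.ElementTree as ET


def structural_summary(xml_text):
    """Collect element names, (element, attribute) pairs and
    parent-child edges, ignoring all data values."""
    root = ET.fromstring(xml_text)
    elements, attributes, edges = set(), set(), set()

    def visit(node):
        elements.add(node.tag)
        for name in node.attrib:
            attributes.add((node.tag, name))
        if node.text and node.text.strip():
            # Record that the element carries text, not the text itself.
            edges.add((node.tag, "PCDATA"))
        for child in node:
            edges.add((node.tag, child.tag))
            visit(child)

    visit(root)
    return elements, attributes, edges


sample = (
    '<library><book id="1"><title>A</title></book>'
    '<book id="2"><title>B</title></book></library>'
)
elements, attributes, edges = structural_summary(sample)
```

Both book elements collapse into a single schema entry, which is the point of a summary: it describes structure once, however often it repeats in the data.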

34
XML DataSource Schema (2/3)

35
XML DataSource Schema (3/3)
  • XMLDSS is being extended
  • Structural summary → schema type (persistence,
    describe multiple documents)
  • Constraints
  • Primary/foreign keys
  • Cardinality
  • Ordering
  • If present, translate DTD/XML Schema to XMLDSS

36
Schema Transformation (1/2)
  • Target schema T given
  • Source schema S is transformed to match the
    structure of T

37
Schema Transformation (2/2)
  • Schema matching phase
  • Schema transformation phase
  • id phase
  • Target schema materialisation

38
Algorithm
  • Growing phase: traverse the target schema and
    issue an add/extend transformation for every
    construct that does not exist in the source
    schema.
  • Shrinking phase: traverse the source schema and
    issue a delete/contract transformation for every
    construct that does not exist in the target
    schema.
  • Completeness of algorithm
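
The two phases can be sketched as two passes over ordered lists of constructs. The list representation and the sample constructs are assumptions; the choice between add and extend (or delete and contract) depends on whether a query can derive the extent, so it is left open here.

```python
def restructure(source, target):
    """Return the pathway that turns `source` into `target`.
    Both schemas are lists of constructs in traversal order."""
    pathway = []
    source_set, target_set = set(source), set(target)
    # Growing phase: add whatever the target has and the source lacks.
    for construct in target:
        if construct not in source_set:
            pathway.append(("add/extend", construct))
    # Shrinking phase: remove whatever the source has and the target lacks.
    for construct in source:
        if construct not in target_set:
            pathway.append(("delete/contract", construct))
    return pathway


# Hypothetical schemas: the target places an element C between A and B.
source = ["<A>", "<A,B>", "<B>"]
target = ["<A>", "<A,C>", "<C>", "<C,B>", "<B>"]
pathway = restructure(source, target)
```

Growing before shrinking means every construct needed by a delete query still exists when that query is evaluated.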

39
Transformation Types
  • AutoMed primitive transformations
  • add/extend
  • delete/contract
  • rename
  • Schema level
  • Insert, remove or rename schema constructs
  • Move element/subtree
  • Element ↔ attribute

40
Example 1
  • Insert element C
  • ext(<C>, Void, Any)
  • ext(<A,C>, Void, Any)
  • ext(<C,B>, Void, Any)
  • del(<A,B>, q)
  • Remove element C
  • add(<A,B>, q)
  • con(<C>, Void, Any)
  • con(<C,B>, Void, Any)
  • con(<A,C>, Void, Any)

41
Example 2
  • Insert/remove edge move operation

42
Example 3
  • Move
  • add(<root,B>, q3)
  • add(<B,A>, [{b,a} | {a,b} ∈ <A,B>])
  • delete(<A,B>, [{a,b} | {b,a} ∈ <B,A>])
  • Complete
  • add(<B>, <B>q1)
  • add(<A,B>, <A,B>q2)
  • delete(<A,B>, <A,B>)
  • delete(<B>, <B>)
  • rename(<B>, <B>)

Schemas
Data
43
Example 1 - revisited
  • Actually, this can also be treated with an
    add/delete transformation

44
Example 4
  • Element-to-attribute transformation
  • insert(<A,AB>, q)
  • remove(<A,B>, q)
  • remove(<B,PCDATA>, q)
  • remove(<B>, q)
  • Attribute-to-element transformation
  • insert(<B>, q)
  • insert(<A,B>, q)
  • insert(<B,PCDATA>, q)
  • remove(<A,AB>, q)
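
At the data level, the element-to-attribute direction amounts to moving a child element's text into an attribute of its parent. The sketch below shows this with Python's standard library; the function name and the sample document are hypothetical, and this is not how AutoMed itself applies the transformation.

```python
import xml.etree.ElementTree as ET


def element_to_attribute(root, parent_tag, child_tag):
    """Turn each <child_tag> child of <parent_tag> into an attribute
    of the parent. If a parent has several such children, the last
    one wins, so this only suits at-most-one child per parent."""
    for parent in list(root.iter(parent_tag)):
        for child in list(parent.findall(child_tag)):
            parent.set(child_tag, (child.text or "").strip())
            parent.remove(child)
    return root


doc = ET.fromstring("<root><A><B>x</B></A><A><B>y</B></A></root>")
element_to_attribute(doc, "A", "B")
```

The element, its edge to the parent and its text node all disappear, which mirrors the three remove steps listed above.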

45
Schema Integration Type I

46
Schema Integration Type II
  • Type I integration performs two tasks at once
  • schema integration
  • schema improvement
  • Type II
  • Augment with missing constructs
  • Remove redundant constructs

47
Schema Integration Type II
  • Improve GS as a second step

48
Materialisation
  • Strategy
  • Materialise root and its attributes
  • Consider all edges (ep,ec) in a depth-first way
  • Materialise ec and its attributes
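
The strategy above is a depth-first walk over the schema's edges. In the sketch below, the schema is given as a hypothetical mapping from each element to its ordered children and attributes, and one node is created per schema element; a real materialisation would instead evaluate queries to obtain each construct's extent.

```python
import xml.etree.ElementTree as ET


def materialise(root_name, edges, attributes):
    """Build an XML tree by visiting edges (ep, ec) depth-first."""
    # Materialise the root and its attributes first.
    root = ET.Element(root_name, dict(attributes.get(root_name, {})))

    def visit(parent_name, parent_node):
        for child_name in edges.get(parent_name, []):
            # Materialise ec and its attributes, then descend into it.
            child_node = ET.SubElement(
                parent_node, child_name, dict(attributes.get(child_name, {}))
            )
            visit(child_name, child_node)

    visit(root_name, root)
    return root


# Hypothetical schema: root has children A and B, and A has child C.
edges = {"root": ["A", "B"], "A": ["C"]}
attributes = {"root": {"id": "r1"}, "C": {"kind": "leaf"}}
tree = materialise("root", edges, attributes)
```

Visiting depth-first produces nodes in document order, so the output can be written out as it is built.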

49
Conclusions
  • XML specific solution
  • element ↔ attribute transformations
  • move operation
  • No loss of data by synthetically creating missing
    structure

50
Evaluation
  • BIOMAP
  • Integration of biological data sources
  • Relational databases, XML documents, XML databases

51
Future Work
  • Short-term
  • Use ontologies for resolving semantic
    heterogeneity
  • Extend XMLDSS
  • Native XML Databases (Xindice, Sedna)
  • XML-Enabled Databases (Oracle)
  • Long-term
  • Schema integration
  • GS improvement (type II)
  • Overlapping data identification
  • Targeted rematerialisation of GS
  • Schema evolution