Integrating Multiple Data Sources using a Standardized XML Dictionary - PowerPoint PPT Presentation

About This Presentation
Title:

Integrating Multiple Data Sources using a Standardized XML Dictionary

Description:

ABC Database X-Spec. Page 22 ... payment (attribute) in ABC with payment table ... is involved in constructing a data warehouse and other operational systems. ... – PowerPoint PPT presentation

Number of Views:122
Avg rating:3.0/5.0
Slides: 28
Provided by: RamonLa4
Category:

less

Transcript and Presenter's Notes

Title: Integrating Multiple Data Sources using a Standardized XML Dictionary


1
Integrating Multiple Data Sources using a
Standardized XML Dictionary
Ramon Lawrence University of Manitoba umlawren_at_cs.
umanitoba.ca Supervisor Dr. Ken Barker
TRLabs - Winnipeg
2
Outline
  • Introduction, Motivation, and Background
  • Integration architecture components
  • Integration architecture
  • Example integration
  • Applications to the WWW
  • Future work and conclusions
  • Demonstration of Unity

3
Introduction
  • Integration of data is required when accessing
    multiple databases within an organization or on
    the WWW.
  • Our focus is automatically combining database
    schema using schema integration.
  • Schema integration requires knowledge of data
    semantics and use of metadata.

4
Motivation
  • Organizations have several database systems which
    must interoperate.
  • Users often access multiple Web databases whose
    knowledge must be integrated and presented in a
    useful form.
  • Data warehouses and OLAP systems require data
    semantics to be understood and data to be
    cleansed and summarized.

5
Background
  • Schema integration involves combining diverse
    database schema into an integrated view by
    resolving conflicts.
  • Schema conflicts include naming, structural, and
    semantic conflicts.
  • Schema integration is required for database
    interoperability, but it is currently a manual
    process.

6
MDBS Architecture
Global Transactions
  • Global Transaction Manager (GTM)
  • processes global transactions
  • insures information in all LDBSs is consistent
  • submits subtransactions to the GTSs for each LDBS

GTM
subtransactions
  • Global Transaction Servers (GTSs)
  • one for each LDBS
  • converts subtransactions from the GTM into a form
    usable by the LDBS and vice versa
  • Local Database Systems (LDBSs)
  • databases combined into MDBS
  • unchanged as still process local transactions

Local Transactions
7
Previous Work
  • Research systems
  • integrating systems by logical rules (Sheth)
  • defining global dictionaries (Castano)
  • Carnot Project using the Cyc knowledge base
  • Industrial systems and standards
  • Metadata Interchange Specification (MDIS)
  • XML, BizTalk, E-commerce portals

8
Architecture Objective
  • The objective of our architecture is to provide a
    system for automatically integrating diverse
    relational schemas into a multidatabase
  • Desirable properties
  • individual mappings - information sources
    integrated one-at-a-time and independently
  • global view constructed for query transparency
  • handles schema conflicts - including semantic,
    structural, and naming conflicts
  • automated global integration - global view
    constructed efficiently and automatically

9
The Idea
  • The major idea is that schema conflicts can be
    resolved if we
  • eliminate all naming conflicts
  • define a language capable of determining schema
    equivalence and performing transformations
  • With these two properties, schema conflicts can
    be resolved automatically at the global level

10
Architecture Components The Global Dictionary
  • A global dictionary (GD) provides standardized
    terms to capture data semantics.
  • Hierarchy of terms related by IS-A or Has-A links
  • Contains base set of common database concepts,
    but new concepts can be added
  • A GD term is a single, unambiguous semantic
    definition.
  • Several GD entries for a single English word are
    required if the word has multiple definitions.

11
Architecture ComponentsUsing the Global
Dictionary
  • GD terms are used to build semantic names to
    describe the semantics of schema elements.
  • Semantic names have the form
  • semantic name CT CT ,CT CN
  • CT context term, CN concept name
  • each CT and CN is a single term from the GD
  • Semantic names are included in specifications
    describing a data source.

12
Architecture ComponentsX-Specs
  • Database metadata and semantic names are combined
    into specifications called X-Specs
  • stored and transmitted using XML
  • contains information on a relational schema
  • organized into database, table, and field levels
  • stores semantic names to describe and integrate
    schema elements

13
Architecture ComponentsIntegrating X-Specs
  • Each database to be integrated is described using
    a X-Spec.
  • Identical concepts in different databases are
    identified by similar semantic names.
  • Concepts with identical (or hierarchially
    related) semantic names are combined regardless
    of their physical representation in the
    individual databases.

14
Integration Architecture
  • Our integration architecture consists of two
    separate phases
  • capture process X-Specs are constructed for each
    data source independently
  • integration process X-Specs are combined using
    the integration algorithm which matches semantic
    names using the global dictionary

15
Integration ArchitectureThe Capture Process
  • Capture process involves
  • automatically extracting the schema information
    and metadata using a specification editor
  • assigning semantic names to each schema element
    (tables and fields) to capture their semantics

16
Integration ArchitectureThe Capture Process
Relational Schema
Automatic Extraction
X-Spec
Specification Editor
DBA Lookup of terms
Global Dictionary
17
Integration ArchitectureThe Integration Process
  • Integration process involves
  • automatically identifying identical concepts by
    matching semantic names
  • constructing a global view of database concepts
    consisting of a hierarchy of concept terms
  • resolving structural differences during query
    generation and submission (e.g. a concept may be
    represented as a table in one database and a
    field (attribute) in another)

18
Integration ArchitectureThe Integration Process
.
Client
Client
Integration Site
Subtransactions
X-Spec
X-Spec
..
RDBMS
RDBMS
19
Integration Architecture Benefits
  • The benefits of the two phase architecture are
  • Dynamic integration schemas integrated as needed
  • X-Specs are constructed only once and independent
    of each other
  • Automatic conflict resolution by integrating
    based on semantic name rather than physical
    structure
  • Users are isolated from system names and
    organization by querying through a global view
    using semantic names for concepts

20
Integration Example
  • Two claims databases to be integrated
  • ABC Company Claims_tb(claim_id, claimant,
    net_amount, paid_amount)
  • XYZ Company T_claims(id, customer, claim_amt),
    T_payments(cid, pid, amount)
  • First step is to construct X-Specs for each
    database.

21
Integration ExampleABC Database X-Spec
22
Integration ExampleXYZ Database X-Spec
23
Integration ExampleIntegrated View
  • Global view after integration
  • Claim
  • Id
  • Net amount
  • Customer
  • name
  • Payment
  • id
  • amount

24
Integration ExampleDiscussion
  • Important points
  • system and field names are not presented to the
    user who queries based on semantic names
  • database structure is not shown to the user
  • different physical representations for the same
    concept are combined (e.g. payment (attribute) in
    ABC with payment table in XYZ database)
  • hierarchially related concepts (customer vs.
    claimant) are combined based on their IS-A
    relationship in the global dictionary

25
Applications to the WWW
  • Integrating diverse data sources is involved in
    constructing a data warehouse and other
    operational systems.
  • The WWW is a diverse organizations of databases
    which users access.
  • Automatically integrating web data sources by a
    browser or portal reduces query complexity and
    integration of results for the user.

26
Conclusions
  • Automatic integration of database schema is
    possible by using a global dictionary of terms
    and constructing semantic names for schema
    elements.
  • Integration of data sources has applications to
    the WWW and construction of data warehouses.

27
Future Work
  • The integration architecture is evolving with
    standards on XML and captures metadata
    information in XML documents.
  • The system is being tested on sample problems,
    and a query mechanism is work-in-progress.
  • We are refining a prototype of the system called
    Unity.
Write a Comment
User Comments (0)
About PowerShow.com