A Semantic Approach to XML-Based Data Integration

1
A Semantic Approach to XML-Based Data Integration
  • A paper by Patricia Rodríguez-Gianolli and John
    Mylopoulos
  • University of Toronto
  • Speaker: Yi Lu
  • Wayne State University

2
Outline
  • Introduction
  • The DIXSE framework
  • The mapping language
  • A case study
  • System architecture of DIXSE
  • Conclusion

3
Introduction
  • Integrating data from multiple heterogeneous data
    sources has been a major focus of database
    research for more than two decades.
  • With the widespread acceptance of the Web,
    interest in data integration has been renewed,
    with a focus on semi-structured data.
  • Little has been proposed for data integration of
    XML documents.

4
Introduction
  • There are two categories of data integration:
    traditional schema integration and
    semi-structured data integration. The key to
    successful data integration is the
    identification of inter-schema relationships.
  • In traditional schema integration, the
    identification of inter-schema relationships can
    be done at different levels of abstraction.
  • Inter-schema relationship identification is done
    differently in data integration systems for
    semi-structured data, because such data lacks a
    schema.

5
Introduction
  • DIXSE stands for Data Integration for Xml based
    on Schematic knowlEdge. It blends techniques
    from conventional and semi-structured data
    integration systems and provides semi-automatic
    integration of XML data.
  • The main steps of DIXSE:
  • DIXSE semi-automatically derives a common
    semantic description from the input DTDs and
    allows the user to enrich and fine-tune this
    description.
  • DIXSE automatically generates wrappers for XML
    documents that conform to these DTDs and
    populates the conceptual schema.

6
System Information Flow in DIXSE
7
XML and Telos
8
DIXSE Framework
  • Consists of a data model and a derivation
    mechanism.
  • The data model supports concepts such as entity
    classes, attributes, and mappings for
    representing conceptual schemas as a collection
    of interrelated entity classes.
  • The mechanism exploits the schema information
    provided by the DTD and generates a DIXSE
    conceptual schema as output. A set of heuristic
    rules drives the derivation process.

9
DIXSE Framework Data Model
  • The data model represents conceptual schemas as
    collections of entity types and their
    attributes.
  • The model supports four main concepts: entity
    classes, entity attributes, mappings, and
    document types.
  • Entity classes represent types of objects or
    concepts found in XML DTDs.
  • In addition, the data model offers two
    structuring facilities to capture the semantics
    of entity attributes:
  • attribute categories
  • attribute constraints

10
Attribute Categories
  • There are three types of attributes (or
    categories): components, properties, and links.
  • An attribute is a component when it represents
    the content (or structure) of an entity.
  • It is a property when it represents information
    about the content of an entity.
  • It is a link when it represents intra-document
    or inter-document information.

11
Attribute Constraints
  • These constraints are inspired by the constraints
    that XML itself imposes on elements and
    attributes:
  • exactlyOne
  • atMostOne
  • zeroOrMore
  • oneOrMore
  • union
  • fixed
  • idref
  • xLink
  • key
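
The cardinality constraints above resemble the occurrence indicators of DTD content models. The following is a minimal Python sketch of that resemblance, not the DIXSE derivation itself. The correspondence (`?`, `*`, `+`, and no indicator) is an assumption that the slides do not state, and the function names are hypothetical. It handles only a flat, comma-separated sequence.

```python
from enum import Enum


class Constraint(Enum):
    """Cardinality constraint names taken from the slide."""
    EXACTLY_ONE = "exactlyOne"
    AT_MOST_ONE = "atMostOne"
    ZERO_OR_MORE = "zeroOrMore"
    ONE_OR_MORE = "oneOrMore"


# Assumed correspondence with DTD occurrence indicators (not from the source).
_BY_INDICATOR = {
    "?": Constraint.AT_MOST_ONE,
    "*": Constraint.ZERO_OR_MORE,
    "+": Constraint.ONE_OR_MORE,
}


def cardinality_of(particle: str) -> tuple[str, Constraint]:
    """Split a simple particle such as 'author+' into (name, constraint)."""
    particle = particle.strip()
    if particle and particle[-1] in _BY_INDICATOR:
        return particle[:-1], _BY_INDICATOR[particle[-1]]
    # No indicator: the child must occur exactly once.
    return particle, Constraint.EXACTLY_ONE


def constraints_for_sequence(content_model: str) -> dict[str, Constraint]:
    """Handle only a flat sequence like '(title, author+, abstract?)'."""
    inner = content_model.strip().strip("()")
    return dict(cardinality_of(p) for p in inner.split(",") if p.strip())


if __name__ == "__main__":
    model = "(title, author+, abstract?, keyword*)"
    for name, constraint in constraints_for_sequence(model).items():
        print(name, constraint.value)
```

Nested groups, choice groups (which would relate to the union constraint), and attribute-list declarations (fixed, idref, key) are deliberately left out of this sketch.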

12
Mappings and Documents
  • A mapping in the XML Framework describes a
    conceptual schema of the information represented
    by a given XML DTD, typically authored for a
    given context.
  • A document type describes a given XML DTD and a
    collection of mappings (i.e. conceptual schemas)
    attached to it. Contexts are represented in the
    data model as distinguishing attributes (string
    names) of document types, mappings, and entity
    classes.
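
The four concepts of the data model (entity classes, entity attributes, mappings, and document types) can be pictured as a few small record types. The sketch below is a hypothetical Python rendering for illustration only. The field names, the `mapping_for` helper, and the example values ("article.dtd", "Article", "default") are all assumptions, not part of DIXSE.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Category(Enum):
    """The three attribute categories named on the slides."""
    COMPONENT = "component"  # content or structure of an entity
    PROPERTY = "property"    # information about the content
    LINK = "link"            # intra- or inter-document information


@dataclass
class EntityAttribute:
    name: str
    category: Category
    constraint: str  # e.g. "exactlyOne", "zeroOrMore"


@dataclass
class EntityClass:
    name: str
    context: str  # contexts are plain string names
    attributes: list[EntityAttribute] = field(default_factory=list)


@dataclass
class Mapping:
    """A conceptual schema: a collection of interrelated entity classes."""
    context: str
    entity_classes: dict[str, EntityClass] = field(default_factory=dict)

    def add(self, entity: EntityClass) -> None:
        self.entity_classes[entity.name] = entity


@dataclass
class DocumentType:
    """A DTD together with the mappings attached to it."""
    dtd_name: str
    context: str
    mappings: list[Mapping] = field(default_factory=list)

    def mapping_for(self, context: str) -> Optional[Mapping]:
        # Hypothetical lookup: first mapping authored for the given context.
        return next((m for m in self.mappings if m.context == context), None)


# Hypothetical example values.
article = EntityClass("Article", context="default")
article.attributes.append(
    EntityAttribute("title", Category.COMPONENT, "exactlyOne"))
article.attributes.append(
    EntityAttribute("language", Category.PROPERTY, "atMostOne"))

mapping = Mapping(context="default")
mapping.add(article)
doc_type = DocumentType("article.dtd", context="default", mappings=[mapping])
```

Keeping the context as a plain string on each record mirrors the statement that contexts are distinguishing string names of document types, mappings, and entity classes.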

13
DIXSE Framework Mechanism
  • The DIXSE framework offers a mechanism to derive
    a default conceptual schema of the information
    represented by an XML DTD. This mapping is
    derived purely from the schematic knowledge
    offered by DTDs, and thus captures the semantics
    conveyed by the data only partially.

14
DIXSE Framework Mechanism (Cont)
15
DIXSE Framework Mechanism -- Default Mapping
Rule 1 (DR1)
16
DIXSE Framework Mechanism -- Default Mapping
Rule 2 (DR2)
17
DIXSE Framework Mechanism -- Default Mapping
Rule 3 (DR3)
18
DIXSE Framework Mechanism -- Default Mapping
Rule 4 (DR4)
19
DIXSE Framework Mechanism -- Default Mapping
Rule 5 (DR5)
20
DIXSE Framework Mechanism -- Default Mapping
Rule 6 (DR6)
21
DIXSE Framework Mechanism -- Default Mapping
Rule 7 (DR7)
22
Default Mapping for Sigmod Record
23
The Mapping Language
  • DIXSE offers the possibility of customizing the
    default mapping through DIXml, a simple mapping
    language for writing mapping specifications.
  • DIXml is a declarative mapping language for
    specifying a mapping or conceptual schema of the
    information represented by a given XML DTD. This
    specification annotates a DTD with simple
    instructions for generating entity classes from
    DTD element type declarations.

24
The Mapping Language (Cont)
  • DIXml is itself written in XML and provides its
    own vocabulary to describe DIXSE mappings. Its
    two main elements are directive and
    DIXSEmapping.
  • directive represents a DIXml directive rule,
    where the target is the value of the directive
    element's attribute and the action is the
    element's contents.
  • DIXSEmapping represents the mapping itself by
    encompassing the collection of specified
    directive rules.

25
The Mapping Language (Cont)
  • A directive consists of two parts: the target
    element and the action body. The first
    identifies the XML element addressed by the rule.
    The action body describes how this target element
    should be mapped into a DIXSE conceptual
    representation. There are five different
    directive actions, namely default, create-class,
    create-attribute, inline, and ignore.
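
To make the shape of a directive concrete, here is a minimal Python sketch that reads a DIXml-like specification into a table of target/action pairs. The element names directive and DIXSEmapping and the five action names come from the slides. Everything else is an assumption made for illustration: the attribute name `target`, the encoding of each action as a single empty child element, and the target names in the sample. The actual DIXml syntax may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; attribute name and action encoding are assumptions.
SAMPLE = """
<DIXSEmapping>
  <directive target="article"><create-class/></directive>
  <directive target="title"><create-attribute/></directive>
  <directive target="pages"><inline/></directive>
  <directive target="index"><ignore/></directive>
</DIXSEmapping>
"""

# The five directive actions named on the slide.
ACTIONS = {"default", "create-class", "create-attribute", "inline", "ignore"}


def read_directives(text: str) -> dict[str, str]:
    """Return {target element name: action name} for a mapping document."""
    root = ET.fromstring(text.strip())
    if root.tag != "DIXSEmapping":
        raise ValueError(f"unexpected root element: {root.tag}")

    rules: dict[str, str] = {}
    for directive in root.findall("directive"):
        target = directive.get("target")
        children = list(directive)
        if target is None or len(children) != 1:
            raise ValueError("a directive needs a target and one action")
        action = children[0].tag
        if action not in ACTIONS:
            raise ValueError(f"unknown action: {action}")
        rules[target] = action
    return rules


if __name__ == "__main__":
    for target, action in read_directives(SAMPLE).items():
        print(f"{target}: {action}")
```

Elements of the DTD that have no directive would presumably fall back to the default mapping; the sketch does not model that step.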

26
The Mapping Language (Cont)
27
The Mapping Language (Cont) -- The Default
Directive Action
28
The Mapping Language (Cont) -- The
Create-class Directive Action
29
The Mapping Language (Cont) -- The
Create-class Directive Action
30
The Mapping Language (Cont) -- The
Create-attribute Directive Action
31
The Mapping Language (Cont) -- The Inline
Directive Action
32
The Mapping Language (Cont) -- The Ignore
Directive Action
33
A Case Study (Cont)
34
A Case Study (Cont)
35
A Case Study (Cont)
36
System Architecture
37
System Architecture (Cont)
  • Two subsystems: the Schema Engine and the
    Document Loader.
  • The Schema Engine subsystem includes five
    components: the DTD parser, the XML parser, the
    Schema Derivator, the Schema Generator, and the
    XSL Wrapper Generator.
  • The Document Loader consists of the XSL Processor
    and the Data Integrator.
  • The communication between these two subsystems is
    accomplished through the Catalog Manager and the
    XSL Wrapper Repository.

38
Conclusion
  • The paper proposes a semantic framework for XML
    data integration called DIXSE. The DIXSE
    framework offers a tool that semi-automatically
    generates a conceptual schema from several XML
    DTDs.
  • DIXSE differs from other data integration
    systems:
  • It exploits the structural information provided
    by DTDs.
  • It is based on schema integration, like
    conventional data integration systems, but allows
    the user to enrich and fine-tune the schematic
    knowledge.
  • It employs a specialized object-based repository
    to store an integrated and semantically richer
    version of the data.