MDBS Schema Integration: The Relational Integration Model

1
MDBS Schema Integration: The Relational Integration Model
Candidacy Exam Presentation for Ramon Lawrence
University of Manitoba
umlawren_at_cs.umanitoba.ca
2
Outline
  • Introduction
  • The MDBS architecture and the Integration problem
  • A schema integration taxonomy
  • Previous Work
  • The RIM Architecture and the RIM Model
  • Future work and conclusions

3
Database Terminology
  • database system - a database and a system to
    manage the data
  • transaction - an atomic sequence of operations
    applied to the database
  • global transaction - a transaction spanning more
    than one database
  • schema integration - the process of combining
    local schemas into a global, integrated schema
  • multidatabase system (MDBS) - a collection of
    autonomous, local databases participating in a
    global database system to share data

4
MDBS Architecture
  • Global Transaction Manager (GTM)
      • processes global transactions
      • ensures information in all LDBSs is consistent
      • submits subtransactions to the GTSs for each LDBS
  • Global Transaction Servers (GTSs)
      • one for each LDBS
      • convert subtransactions from the GTM into a form
        usable by the LDBS and vice versa
  • Local Database Systems (LDBSs)
      • databases combined into the MDBS
      • unchanged, as they still process local transactions

[Figure: MDBS architecture. Labels: Global Transactions, GTM,
subtransactions, Local Transactions]
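
The division of work among the GTM, the GTSs, and the LDBSs can be sketched as follows. This is a minimal illustration under stated assumptions, not the system presented: all class and method names are hypothetical, each LDBS is modelled as an in-memory dictionary, and the consistency guarantees of the GTM (for example a commit protocol) are left out.

```python
class LocalDatabase:
    """Stand-in for an autonomous LDBS (hypothetical, in-memory)."""
    def __init__(self, name):
        self.name = name
        self.rows = {}

    def execute(self, op, key, value=None):
        # A local operation: either write a value or read one back.
        if op == "write":
            self.rows[key] = value
            return value
        return self.rows.get(key)


class GlobalTransactionServer:
    """One GTS per LDBS; turns a subtransaction into local operations."""
    def __init__(self, ldbs):
        self.ldbs = ldbs

    def run(self, subtransaction):
        return [self.ldbs.execute(*step) for step in subtransaction]


class GlobalTransactionManager:
    """Splits a global transaction into one subtransaction per LDBS."""
    def __init__(self, servers):
        self.servers = servers  # LDBS name -> GTS

    def run(self, global_transaction):
        # global_transaction: list of (ldbs_name, op, key, value) steps
        subtransactions = {}
        for ldbs_name, *step in global_transaction:
            subtransactions.setdefault(ldbs_name, []).append(tuple(step))
        return {name: self.servers[name].run(sub)
                for name, sub in subtransactions.items()}


claims, policies = LocalDatabase("claims"), LocalDatabase("policies")
gtm = GlobalTransactionManager({
    "claims": GlobalTransactionServer(claims),
    "policies": GlobalTransactionServer(policies),
})
results = gtm.run([
    ("claims", "write", "c1", 500),
    ("policies", "write", "p1", "active"),
    ("claims", "read", "c1", None),
])
```

Local transactions would call `LocalDatabase.execute` directly, which is why the LDBSs remain unchanged.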
5
The Integration Problem
  • Integrating diverse data sources is an important
    issue as organizations interconnect their
    operations and demand more from their database
    systems
  • Integration is a hard problem because structural
    and semantic conflicts exist
  • Two levels of integration
      • schema integration
      • data integration

6
Schema Integration
  • Schema integration is the process of combining
    database schemas into a coherent global view
  • Integration problems include
      • different data models
      • incompatible concept representations
      • different user or view perspectives
      • structural conflicts within a model
      • naming conflicts (homonym, synonym)
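
To make the two kinds of naming conflict concrete, here is a small sketch. The schema names, attribute names, and concepts are invented for illustration, and the sketch assumes that the concept behind each local name is already known, which in practice is the hard part.

```python
from collections import defaultdict

# (schema, local attribute name) -> concept it denotes; illustrative data only
local_names = {
    ("db_a", "net_amount"): "claim net amount",
    ("db_b", "amt"): "claim net amount",   # synonym of db_a.net_amount
    ("db_a", "date"): "claim date",
    ("db_b", "date"): "payment date",      # homonym of db_a.date
}


def find_naming_conflicts(names):
    """Return (synonyms, homonyms) found across the given local names."""
    by_concept = defaultdict(set)  # concept -> local names used for it
    by_name = defaultdict(set)     # local name -> concepts it denotes
    for (_schema, name), concept in names.items():
        by_concept[concept].add(name)
        by_name[name].add(concept)
    # Synonyms: one concept, several names. Homonyms: one name, several concepts.
    synonyms = {c: n for c, n in by_concept.items() if len(n) > 1}
    homonyms = {n: c for n, c in by_name.items() if len(c) > 1}
    return synonyms, homonyms


synonyms, homonyms = find_naming_conflicts(local_names)
```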

7
A Schema Integration Taxonomy
[Figure: schema integration taxonomy with three axes.
 Automation Level: automatic (dynamic), automatic (static),
 semi-automatic, manual.
 Conflicts Resolved: interschema, all, naming, none, structural, semantic.
 Transparency: structural, behavioral, both.]
8
Previous Work
  • semantic models
      • Batini (86), canonical models, SDM, DAPLEX
  • schema re-engineering
      • model mapping tools, schema transformations
  • metadata systems
      • rule-based systems
  • object-oriented methods
      • use as a canonical model, schema transformations
  • application-level integration
      • language systems, MSQL, IDL, higher order views

9
Previous Work (cont.)
  • Interdatabase dependencies
      • Sheth - relaxed consistency, integration rules
  • AI techniques
      • Pegasus (spheres of knowledge), knowledge
        packets, Carnot project (Cyc knowledge base)
  • Lexical semantics
      • Summary Schemas Model (Bright et al.) - user
        interface for imprecise queries
  • Industrial systems
      • Interbase

10
RIM Objective
  • The objective of the RIM model is to provide a
    system for automatically integrating diverse
    relational schemas into a multidatabase
  • Desirable properties
      • individual mappings - information sources
        integrated one-at-a-time and independently
      • global view constructed for query transparency
      • handles schema conflicts - including semantic,
        structural, and naming conflicts
      • automated global integration - global view
        constructed efficiently and automatically

11
RIM: The Idea
  • The idea behind the RIM model is that most (and
    probably all) schema conflicts can be resolved if
    we
      • eliminate all naming conflicts
      • define a language capable of determining schema
        equivalence and performing transformations
  • With these two properties, schema conflicts can
    be resolved automatically at the global level

12
RIM: The Plan
  • The first task is eliminating naming conflicts
      • use a global thesaurus/dictionary like SSM
      • map local schema names into global counterparts
      • identical concepts can be identified by global
        name
  • The integration language must be defined
      • RIM specifications - records capturing semantics
        of each LDBS in a machine-processable form
      • global names captured in RIM specs. to identify
        concepts stored in LDBS

13
RIM: The Plan (cont.)
  • Integrate RIM specifications
      • To query the MDBS, the client downloads and
        integrates only RIM specs. of LDBSs accessed
      • Global view is constructed from RIM specs. by
        automatically combining them at client site using
        global names and semantic metadata they contain
      • Use of global names allows system to determine
        identical concepts even though structural
        representations may be different
      • Semantic information captured using metadata
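
The way global names let a client line up identical concepts from different LDBSs can be sketched as below. The dictionary layout of a RIM spec, the database names, and the field names are all assumptions made for this example; the presentation does not fix a concrete format.

```python
# Each (hypothetical) RIM spec maps local (table, field) pairs to global names.
spec_a = {
    "name": "db_a",
    "fields": {
        ("claim", "net_amount"): "Claim Net Amount",
        ("claim", "id"): "Claim Id",
    },
}
spec_b = {
    "name": "db_b",
    "fields": {
        ("claims_tbl", "amt"): "Claim Net Amount",
    },
}


def build_global_view(specs):
    """Group the local fields of all specs under their shared global name."""
    view = {}
    for spec in specs:
        for (table, field), global_name in spec["fields"].items():
            view.setdefault(global_name, []).append((spec["name"], table, field))
    return view


# The client integrates only the specs of the LDBSs it intends to access.
global_view = build_global_view([spec_a, spec_b])
```

Here `net_amount` and `amt` end up under the same entry even though their local names and tables differ, because both were assigned the global name "Claim Net Amount".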

14
RIM: The Plan (cont.)
  • Querying the MDBS
      • queries are posed to the MDBS through the global
        view at each client
      • translation from the GV back to the original RIM
        spec. for each LDBS is performed
      • the translated queries are sent to each LDBS
        which transforms the query (specified using RIM)
        into a query for the LDBS
      • results are returned to the client which
        integrates them based on its GV
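
The query path described above can be sketched roughly as follows. This is a simplification with invented names: the global view is a plain dictionary, each LDBS is assumed to expose a single table, and the two translation steps (GV to RIM spec, RIM spec to local query) are collapsed into one function that emits SQL text.

```python
# Hypothetical global view: global name -> list of (db, table, local field)
GLOBAL_VIEW = {
    "Claim Id": [("db_a", "claim", "id"), ("db_b", "claims_tbl", "cid")],
    "Claim Net Amount": [("db_a", "claim", "net_amount"),
                         ("db_b", "claims_tbl", "amt")],
}


def translate(global_names, view):
    """Return {db: SQL text} selecting the local fields behind each global name."""
    per_db = {}
    for gname in global_names:
        for db, table, field in view[gname]:
            entry = per_db.setdefault(db, {"table": table, "cols": []})
            # Alias each local field to its global name so results line up.
            entry["cols"].append(f'{field} AS "{gname}"')
    return {db: f'SELECT {", ".join(e["cols"])} FROM {e["table"]}'
            for db, e in per_db.items()}


def integrate(results_per_db):
    """Union the rows returned by each LDBS, tagging every row with its source."""
    return [dict(row, source=db)
            for db, rows in results_per_db.items()
            for row in rows]


queries = translate(["Claim Id", "Claim Net Amount"], GLOBAL_VIEW)
merged = integrate({"db_a": [{"Claim Id": 1}], "db_b": [{"Claim Id": 7}]})
```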

15
RIM Architecture
  • RIM Specifications
      • constructed at each RDBS
      • local concepts mapped to global names
      • schema can be automatically extracted
  • RIM Integration
      • uses needed RIM specs.
      • constructs global view
      • resolves conflicts by
          • identifying concepts using global names
          • transforming concepts into a form consistent
            with the global view

16
RIM: Using Global Names
  • Global names attempt to capture semantics of data
    and its structure
  • Research has found that a single dictionary term
    is insufficient to capture all semantics of a
    given data item
  • Currently proposed form of a global name
      • global name = context term, concept name,
        (adjective phrases)
      • adjective phrase = adjective, preposition,
        (context term or concept name)

17
RIM: Using Global Names (cont.)
  • Here are a few examples of using global names
      • the database stores damage claim information
  • Example 1
      • attribute of claim is called net_amount in system
      • GN: Claim Net Amount
  • Example 2
      • attribute of claim is called claim_date in system
      • GN1: Claim Claim date (received by system)
      • GN2: Claim Claim date (received by company)
      • GN3: Claim Claim date (submitted by claimant)
18
RIM: The Global Dictionary
  • To match concepts across systems, a global
    dictionary is required. Global names are taken
    from this dictionary.
  • Currently developing a simplified on-line
    dictionary
      • stores hierarchy (IS-A) relationships and
        component (Part-of) relationships
      • global terms for RIM are taken from the
        dictionary
      • dictionary will allow user-defined words
  • Future work involves determining how to add
    locally defined terms into the dictionary if
    required
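
A dictionary holding IS-A and Part-of relationships could look roughly like this. The format of the real dictionary is listed as undecided, so the structure, the method names, and the sample terms here are assumptions; the sketch also assumes the IS-A links contain no cycles.

```python
class GlobalDictionary:
    """Toy dictionary with hierarchy (IS-A) and component (Part-of) links."""

    def __init__(self):
        self.terms = set()
        self.is_a = {}      # term -> more general term
        self.part_of = {}   # term -> term it is a component of

    def add_term(self, term, is_a=None, part_of=None):
        self.terms.add(term)
        if is_a is not None:
            self.is_a[term] = is_a
        if part_of is not None:
            self.part_of[term] = part_of

    def ancestors(self, term):
        """Follow IS-A links upward, most specific ancestor first."""
        chain = []
        while term in self.is_a:
            term = self.is_a[term]
            chain.append(term)
        return chain

    def is_kind_of(self, term, general):
        return term == general or general in self.ancestors(term)


d = GlobalDictionary()
d.add_term("document")
d.add_term("claim", is_a="document")
d.add_term("damage claim", is_a="claim")      # could be a user-defined word
d.add_term("net amount", part_of="claim")
```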

19
RIM: Basic Concepts
  • There are 3 basic modeling constructs in RIM
      • entity - a concept whose existence does not
        depend on any other entities
      • relationship - a combination of two or more
        entities which does not exist without them
      • attribute - a characteristic of an entity or a
        relationship
  • All entities and attributes should be
    identifiable by a global name from the dictionary.

20
RIM: RIM Specifications
  • A RIM specification consists of two parts
      • table headers - table-level information for each
        relation in database
      • table schemas - information at the attribute
        level of a database relation
  • Most of the information can be automatically
    extracted; however, the DBA must assign global
    names to local concepts manually

21
RIM: The Table Header
  • The table header provides table-level information
    for each relation and has the following fields
      • name - unique table name (local)
      • record size and count
      • foreign key list and foreign key access list
      • record insert/delete/update mechanisms
      • record name - semantic name for a table record
      • record type - entity, relationship instance, ...
      • record grouping - why are records in the table?
      • record distinction/duplicates - primary key
      • table comment

22
RIM: The Table Schema
  • The table schema contains attribute-level
    information. Some fields include
      • field name - database system name
      • semantic name - global name
      • field use
          • attribute, key, categorization, summation,
            date/time, foreign key, logical, numeric,
            reference
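
The two parts of a RIM specification map naturally onto record types. The sketch below covers only a subset of the fields listed on these slides; the class names and Python types are assumptions, since the presentation does not give a concrete encoding.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Field uses as listed on the Table Schema slide.
FIELD_USES = {"attribute", "key", "categorization", "summation", "date/time",
              "foreign key", "logical", "numeric", "reference"}


@dataclass
class FieldSchema:
    field_name: str      # database system name
    semantic_name: str   # global name, assigned by the DBA
    field_use: str

    def __post_init__(self):
        if self.field_use not in FIELD_USES:
            raise ValueError(f"unknown field use: {self.field_use!r}")


@dataclass
class TableHeader:
    name: str                 # unique local table name
    record_name: str          # semantic name for a table record
    record_type: str          # e.g. "entity" or "relationship instance"
    primary_key: List[str]    # record distinction/duplicates
    record_count: Optional[int] = None
    comment: str = ""


@dataclass
class RimTable:
    header: TableHeader
    schema: List[FieldSchema] = field(default_factory=list)

    def global_names(self):
        """Map each local field name to its global name."""
        return {f.field_name: f.semantic_name for f in self.schema}


claim_table = RimTable(
    TableHeader("claim", "Claim", "entity", ["id"]),
    [FieldSchema("id", "Claim Id", "key"),
     FieldSchema("net_amount", "Claim Net Amount", "attribute")],
)
```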

23
RIM: Semantic Conflicts
  • There are 6 basic semantic conflicts in RIM
      • attribute-entity conflict
      • attribute-relationship conflict
      • entity-relationship conflict
      • entity-entity conflict (not studied)
      • attribute-attribute conflict (not studied)
      • relationship-relationship conflict (not studied)
  • There are some basic ideas on how to automatically
    resolve the first 3 conflicts.
  • Conflict resolution is an area of future work.
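
One plausible way to detect the first three conflict types, given global names, is to look for a concept that is modelled with different constructs in different sources. The slides do not describe a detection procedure, so this is an assumption for illustration only, and the sample data is invented; it detects conflicts and does not resolve them.

```python
from collections import defaultdict
from itertools import combinations

# (source, global name, modeling construct) triples; illustrative data only
constructs = [
    ("db_a", "Claimant", "attribute"),
    ("db_b", "Claimant", "entity"),
    ("db_a", "Claim", "entity"),
    ("db_b", "Claim", "entity"),
]


def find_conflicts(triples):
    """Return {conflict type: [global names]} for cross-construct conflicts."""
    kinds = defaultdict(set)  # global name -> constructs used for it
    for _source, gname, kind in triples:
        kinds[gname].add(kind)
    conflicts = {}
    for gname, used in kinds.items():
        # Sorting yields the pair names used on the slide, e.g. "attribute-entity".
        for a, b in combinations(sorted(used), 2):
            conflicts.setdefault(f"{a}-{b}", []).append(gname)
    return conflicts


conflicts = find_conflicts(constructs)
```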

24
Conclusions
  • Current integration methodologies are
    insufficient because they rely on manual
    intervention and do not resolve all types of
    conflicts
  • The RIM model may be able to integrate diverse
    relational schemas using a global dictionary, a
    systematic method for capturing data semantics,
    and automated procedures for performing client
    run-time integration

25
Future Work
  • Determining how the RIM specifications can be
    constructed and what information can be
    automatically extracted
  • Deciding the format for the global dictionary
  • Studying conflict resolution procedures and
    testing methodology on simple integration problems