1
DB-AI perspectives on Schema/Ontology Integration
By Cartic Ramakrishnan LSIDS-UGA
2
DB Perspective (Kashyap,Sheth - 96)
  • Semantic and Schematic similarities between
    database objects using context.
  • Schematic conflicts between objects are of
    interest when there is semantic similarity
    between them.
  • How do the authors propose to do this?
  • Using the concept of Semantic proximity
  • which is essentially an abstraction/mapping
    between the domains of the 2 objects associated
    with the context of comparison.

3
DB Perspective (Kashyap,Sheth - 96)
  • They propose an explicit but partial context
    representation.
  • Also define the specificity relationships between
    contexts.
  • Contexts are organized in a meet semi-lattice and
    operations such as greatest lower bound are
    defined.

4
DB Perspective (Kashyap,Sheth - 96)
  • At the semantic level the authors represent the
    intensional description of the database objects
    using description logics.
  • The terms that are used to construct the contexts
    are obtained from a domain-specific ontology.

5
DB Perspective (Kashyap,Sheth - 96)
  • They define schema correspondences to capture the
    structural similarities between objects.
  • They combine the semantic and schematic
    similarities by defining the schema
    correspondences with respect to a context.

6
Semantic proximity
  • semPro(O1, O2) = <Context, Abstraction, (D1, D2), (S1, S2)>
  • - The first element denotes context in which
    the 2 objects O1 and O2 are being compared.
  • - The second identifies the abstraction/mapping
    between the domains of the 2 objects O1 and O2.
  • - The third component enumerates the domain
    definitions of the objects O1 and O2.
  • - The fourth component enumerates the states of
    the 2 objects O1 and O2 which are extensions of
    the objects stored in their respective databases.
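
As a rough illustration, the four components listed above could be held in a small record type. This is only a sketch: the class name `SemPro`, its field names, and the sample values are assumptions made for the example and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Any, Dict, FrozenSet, Tuple


@dataclass
class SemPro:
    """Hypothetical container for the four components of semantic proximity."""
    context: Dict[str, Any]                 # context in which O1 and O2 are compared
    abstraction: str                        # mapping between the two domains
    domains: Tuple[FrozenSet, FrozenSet]    # domain definitions of O1 and O2
    states: Tuple[FrozenSet, FrozenSet]     # extensions stored in each database


# Illustrative values only; none of them come from the source.
proximity = SemPro(
    context={"domain": {"payroll"}},
    abstraction="total 1-1 value mapping",
    domains=(frozenset({"USD"}), frozenset({"EUR"})),
    states=(frozenset({1000, 2000}), frozenset({900, 1800})),
)
```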

7
Context: semantic component
  • So what is Context?
  • Knowledge that is needed to reason about another
    system for the purpose of answering a query.
  • Meaning, content organization and properties of
    data.
  • Modeled using meta-data.
  • Can use an ontology to capture context.

8
Abstractions/Mappings: structural component
  • Abstraction refers to the relation between the
    domains of the objects.
  • Mapping between the domains is a mathematical
    expression that denotes the abstraction.
  • Abstractions themselves cannot capture semantic
    similarity. Hence they are associated with a
    context.

9
Some useful well-defined abstractions
  • Total 1-1 value mapping.
  • Partial one way mapping.
  • Generalization/Specialization.
  • Aggregation.
  • Functional dependencies.
  • ANY.
  • NONE.

10
Explicit context representation in a
multi-database environment
  • Why?
  • Reasoning requires real-world semantics.
  • Structure is not enough!
  • It cannot capture real-world semantics.
  • Computational benefits
  • Economy of representation: a focusing mechanism
  • Economy of reasoning: reasoning at the intensional
    level
  • Handling inconsistent information
  • Flexible semantics: different relations in
    different contexts.

11
Partial context representation
  • Dynamically chosen meta-attributes can be used to
    characterize the semantics of the application
    domain.
  • This leads to a partial representation of the
    context as a collection of contextual
    coordinates:
  • Context = <(C1, V1), (C2, V2), ..., (Ck, Vk)>
  • Each Ci corresponds to a role and each Vi to
    a filler for the role the object must have.
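
A minimal sketch of such a partial context, assuming each contextual coordinate (role) maps to a set of permitted fillers. The coordinate names, the values, and the helper `satisfies` are hypothetical and only serve to illustrate the idea.

```python
# A context as a collection of (coordinate, value) pairs.
# Each coordinate is a role; its value is the set of permitted fillers.
context = {
    "employer": {"UGA"},
    "position": {"researcher", "student"},
}


def satisfies(obj, ctx):
    """True if obj fills every role in ctx with one of the permitted fillers."""
    return all(role in obj and obj[role] in fillers
               for role, fillers in ctx.items())


inside = satisfies({"employer": "UGA", "position": "student"}, context)
outside = satisfies({"employer": "UGA"}, context)  # the "position" role is unfilled
```

The representation is partial in the sense that an object may carry further attributes the context says nothing about.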

12
Reasoning about contexts
  • Specificity relationship between contexts.
  • But it is possible that 2 contexts are not
    comparable to each other. Hence using the
    specificity relationship we get a partial
    ordering of contexts.
  • Can compute the greatest lower bound of 2
    contexts.
  • Get a context meet semi-lattice as a result.
  • Operations on the lattice: overlap and coherence.
  • (Refer to the paper KS96 for details)
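
The partial ordering and the greatest lower bound can be sketched as follows, assuming contexts are dictionaries from coordinates to sets of fillers. Intersecting the fillers of shared coordinates is an assumption of this sketch; the paper (KS96) gives the actual definitions. The function names and sample contexts are hypothetical.

```python
def glb(c1, c2):
    """Greatest lower bound: the most general context at least as specific as both."""
    result = {}
    for coord in c1.keys() | c2.keys():
        if coord in c1 and coord in c2:
            # Shared coordinate: keep only the fillers both contexts allow.
            result[coord] = c1[coord] & c2[coord]
        else:
            # Coordinate constrained by only one context: carry it over.
            result[coord] = c1.get(coord, c2.get(coord))
    return result


def is_more_specific(c1, c2):
    """True if c1 constrains every coordinate of c2 at least as tightly."""
    return all(coord in c1 and c1[coord] <= c2[coord] for coord in c2)


a = {"domain": {"publications", "patents"}, "year": {1995, 1996}}
b = {"domain": {"publications"}, "author": {"Sheth"}}
meet = glb(a, b)
```

Here `a` and `b` are not comparable to each other, since each constrains a coordinate the other leaves open, yet `meet` is more specific than both.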

13
Generic schema matching reference - JPE
  • Schema matching according to the authors has been
    studied in the past as a part of other systems,
    for example:
  • To find similar structures between heterogeneous
    schemas, which are then used as integration points
    in mediator architectures.

14
Why is schema matching a challenge?
  • Structural differences
  • Naming differences
  • Schemas model similar but not identical content.
  • May be expressed in different data models.
  • Synonymy and other "-nymies" (homonymy,
    hypernymy, and so on).

15
Why generic schema matching?
  • Matching is pervasive and required for several
    other systems to work (as said in the previous
    slides).
  • Therefore the goal is:
  • Given 2 input schemas in any data model and,
    optionally, auxiliary information and an input
    mapping, compute a mapping between schema
    elements of the 2 input schemas that passes user
    validation.

16
Taxonomy of schema matching techniques
  • Schema vs. Instance based
  • Element vs. Structure granularity
  • Linguistic approaches
  • Constraint-based approaches: range, type,
    cardinality, uniqueness, required-ness.
  • Matching cardinalities
  • Auxiliary information: thesauri, dictionaries,
    ontologies?
  • Individual vs. Combinational

17
Cupid approach
  • Uses a combination of some of the techniques
    shown in the previous slide
  • Linguistic matching
  • Element and structure based matching
  • Biased towards similarity of atomic elements,
    where much of the semantics is captured
  • Exploits internal structure
  • Exploits keys, constraints and views
  • Makes context-dependent matches of a shared type
    definition that is used in several larger
    structures.

18
Algorithm for Cupid
  • Interconnected elements of schemas are modeled as
    a schema tree.

19
Algorithm for Cupid (2)
  • Phase 1 of the Algorithm involves Linguistic
    matching.
  • It matches individual schema elements based on
    their names, data types and domains.
  • A thesaurus is used to match these elements like
  • Qty for Quantity
  • UoM for UnitOfMeasure.
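
A minimal sketch of thesaurus-assisted name matching, assuming a tiny hand-written thesaurus and a token-overlap (Jaccard) score. Cupid's actual linguistic phase is more elaborate; the helper names and the scoring function here are assumptions made for illustration, and data types and domains are ignored.

```python
import re

# Hypothetical thesaurus entries: abbreviation -> expanded name.
THESAURUS = {"qty": "Quantity", "uom": "UnitOfMeasure"}


def tokenize(name):
    """Expand a known abbreviation, then split the name into lower-case tokens."""
    expanded = THESAURUS.get(name.lower(), name)
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", expanded)
    return {part.lower() for part in parts}


def name_similarity(a, b):
    """Jaccard overlap of the two token sets, between 0.0 and 1.0."""
    ta, tb = tokenize(a), tokenize(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


qty_score = name_similarity("Qty", "Quantity")
uom_score = name_similarity("UoM", "UnitOfMeasure")
unrelated_score = name_similarity("Qty", "UoM")
```

With this thesaurus, `Qty` and `Quantity` reduce to the same token set, as do `UoM` and `UnitOfMeasure`, while `Qty` and `UoM` share no tokens.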

20
Algorithm for Cupid (3)
  • Phase 2 involves structure matching using a
    measure called ssim (structural similarity). How
    is it computed?
  • The structural similarity between 2 trees is
    estimated as the fraction of the leaves in the
    two sub-trees that have at least one strong link
    to some leaf in the other tree.
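
The definition above can be sketched directly. In this sketch a link counts as "strong" when a leaf-to-leaf similarity reaches a threshold; the threshold value, the function names, and the sample similarity table are assumptions made for the example.

```python
def ssim(leaves_a, leaves_b, link_strength, threshold=0.5):
    """Fraction of leaves (from both sub-trees) with a strong link to the other side."""
    strong_a = sum(
        1 for x in leaves_a
        if any(link_strength(x, y) >= threshold for y in leaves_b)
    )
    strong_b = sum(
        1 for y in leaves_b
        if any(link_strength(x, y) >= threshold for x in leaves_a)
    )
    total = len(leaves_a) + len(leaves_b)
    return (strong_a + strong_b) / total if total else 0.0


# Hypothetical leaf-level similarities, e.g. from the linguistic phase.
LINKS = {("Qty", "Quantity"): 1.0, ("UoM", "UnitOfMeasure"): 1.0}


def link_strength(x, y):
    return LINKS.get((x, y), 0.0)


score = ssim(["Qty", "UoM", "Price"], ["Quantity", "UnitOfMeasure"], link_strength)
```

In this example four of the five leaves have a strong link (`Price` has none), so the score is 4/5.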

21
Extending to general schema
  • Real-world schemas don't come in trees
  • Generic schema model that captures more semantics,
    leading to non-tree schemas.
  • Matching algorithm extended to use it by
    handling shared types and referential
    constraints.

22
Schema graphs
  • In real schemas elements/nodes are interconnected
    by three types of relationships:
  • Containment: has delete-propagation semantics
  • Aggregation
  • IsDerivedFrom: abstracts IsA and IsTypeOf
    relationships to model shared type information.

23
Matching shared types
  • An element which is a shared type can be the
    target of several IsDerivedFrom relations.

24
Matching shared types
  • Suppose we change the PurchaseOrder schema in the
    figure on the previous slide so that the Address
    field is a shared attribute element, referenced
    by both DeliverTo and InvoiceTo. Mappings will
    now have to be qualified using the context in
    which Address is being referred to. By converting
    the schema graph into a tree, all such
    context-dependent paths are materialized and the
    tree matching algorithm can be reused.
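
One way to picture the conversion is to enumerate every root-to-node path of the schema graph, so that a shared element appears once per context that references it. The sketch below assumes an acyclic graph stored as an adjacency dictionary; the function name and the `Street` and `City` children are hypothetical.

```python
def expand_paths(graph, root):
    """Return every root-to-node path; shared nodes are repeated once per context."""
    paths = []

    def visit(node, prefix):
        path = prefix + (node,)
        paths.append(path)
        for child in graph.get(node, ()):
            visit(child, path)

    visit(root, ())
    return paths


# Address is shared by DeliverTo and InvoiceTo (graph assumed acyclic).
schema = {
    "PurchaseOrder": ["DeliverTo", "InvoiceTo"],
    "DeliverTo": ["Address"],
    "InvoiceTo": ["Address"],
    "Address": ["Street", "City"],
}

paths = expand_paths(schema, "PurchaseOrder")
address_paths = [p for p in paths if p[-1] == "Address"]
```

`address_paths` then holds two entries, one reached through `DeliverTo` and one through `InvoiceTo`, so each can be mapped separately.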

25
Matching referential constraints
26
Matching referential constraints
  • Referential constraints are interpreted as
    potential join views.
  • For each foreign key introduce a node that
    represents the join of the tables involved.
  • The advantage of doing this is that it increases
    the structural similarity between the two schemas
    being matched.
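
A small sketch of the idea, assuming tables are given as lists of column names and foreign keys as (source table, column, target table) triples. Giving the join node the columns of both tables as children is an assumption of this sketch, as are the naming scheme and the sample tables.

```python
def add_join_nodes(tables, foreign_keys):
    """Return a copy of the schema with one extra node per foreign key."""
    schema = {name: list(columns) for name, columns in tables.items()}
    for source, column, target in foreign_keys:
        join_name = f"{source}_{column}_{target}_join"
        # The join node gets the qualified columns of both participating tables.
        schema[join_name] = (
            [f"{source}.{c}" for c in tables[source]]
            + [f"{target}.{c}" for c in tables[target]]
        )
    return schema


tables = {
    "Order": ["OrderID", "CustomerID"],
    "Customer": ["CustomerID", "Name"],
}
extended = add_join_nodes(tables, [("Order", "CustomerID", "Customer")])
```

The extra node gives the structure matcher a place where columns from both tables sit side by side, much as they might in a denormalized schema.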

27
PROMPT Ontology merging and alignment tool
  • Distinction between ontology merging and
    alignment:
  • Merging results in a single coherent ontology.
  • Alignment makes one ontology coherent with the
    other but keeps them separate.

28
Algorithm for PROMPT
  • The underlying knowledge model for PROMPT is the
    frame-based model and has been designed to be
    compatible with OKBC (Chaudhri et al. 1998).
    This model has:
  • Classes: collections of objects with similar
    properties, arranged in hierarchies with multiple
    inheritance.
  • Slots: named binary relations.
  • Facets: named ternary relations between a class,
    a slot and either a class or a primitive object.
  • Instances: individual class members.
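
The four building blocks can be sketched as plain data classes. This is an illustration of a frame-style model in general, not PROMPT's or OKBC's API; every class, field, and sample name below is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class FrameClass:
    name: str
    superclasses: List["FrameClass"] = field(default_factory=list)
    slots: List[str] = field(default_factory=list)
    # (slot name, facet name) -> class name or primitive value
    facets: Dict[Tuple[str, str], object] = field(default_factory=dict)

    def all_slots(self):
        """Own slots plus those inherited from every superclass."""
        collected = list(self.slots)
        for parent in self.superclasses:
            for slot in parent.all_slots():
                if slot not in collected:
                    collected.append(slot)
        return collected


@dataclass
class Instance:
    name: str
    of: FrameClass
    values: Dict[str, object] = field(default_factory=dict)


person = FrameClass("Person", slots=["name"])
employee = FrameClass("Employee", [person], ["employer"],
                      {("employer", "value-type"): "Organization"})
student = FrameClass("Student", [person], ["university"])
working_student = FrameClass("WorkingStudent", [employee, student])
member = Instance("member-1", working_student, {"name": "example"})
```

`working_student.all_slots()` collects slots along both inheritance paths, which is the multiple-inheritance behaviour the class hierarchy bullet refers to.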

29
Semi-automated approach of PROMPT
30
My idea??
  • Design methodology.