1
DB-AI perspectives on Schema/Ontology Integration
By Cartic Ramakrishnan LSIDS-UGA
2
DB Perspective (Kashyap,Sheth - 96)

Semantic and Schematic similarities between
database objects using context.
Schematic conflicts between objects are of
interest when there is semantic similarity
between them.
How do the authors propose to do this?
Using the concept of Semantic proximity
which is essentially an abstraction/mapping
between the domains of the 2 objects associated
with the context of comparison.

3
DB Perspective (Kashyap,Sheth - 96)

They propose an explicit but partial context
representation.
Also define the specificity relationships between
contexts.
Contexts are organized in a meet semi-lattice and
operations such as greatest lower bound are
defined.

4
DB Perspective (Kashyap,Sheth - 96)

At the semantic level the authors represent the
intensional description of the database objects
using description logics.
The terms that are used to construct the contexts
are obtained from domain-specific ontology.

5
DB Perspective (Kashyap,Sheth - 96)

They define schema correspondences to capture the
structural similarities between objects.
They combine the semantic and schematic
similarities by defining the schema
correspondences wrt. a context.

6
Semantic proximity

semPro(O1, O2) = <Context, Abstraction, (D1, D2), (S1, S2)>
- The first element denotes context in which
the 2 objects O1 and O2 are being compared.
- The second identifies the abstraction/mapping
between the domains of the 2 objects O1 and O2.
- The third component enumerates the domain
definitions of the objects O1 and O2.
- The fourth component enumerates the states of
the 2 objects O1 and O2 which are extensions of
the objects stored in their respective databases.
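The four components can be pictured as a small record. This is only a sketch: the class name, the field types and the grade example are assumptions made for illustration and do not come from KS96.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Set, Tuple


@dataclass
class SemPro:
    """Illustrative container for the four semPro components."""
    context: Dict[str, str]                 # context of comparison
    abstraction: Callable[[Any], Any]       # mapping between the two domains
    domains: Tuple[Set[Any], Set[Any]]      # (D1, D2): domain definitions
    states: Tuple[Set[Any], Set[Any]]       # (S1, S2): stored extensions

    def maps_domain(self) -> bool:
        """True if the abstraction sends every value of D1 into D2."""
        d1, d2 = self.domains
        return all(self.abstraction(v) in d2 for v in d1)


# Hypothetical example: letter grades in one database, grade points in another.
grade_points = {"A": 4, "B": 3, "C": 2}
sp = SemPro(
    context={"subject": "course grading"},
    abstraction=grade_points.get,
    domains=({"A", "B", "C"}, {4, 3, 2}),
    states=({"A", "B"}, {4, 3}),
)
print(sp.maps_domain())
```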

7
Context: semantic component

So what is Context?
Knowledge that is needed to reason about another
system for the purpose of answering a query.
Meaning, content organization and properties of
data.
Modeled using meta-data.
Can use an ontology to capture context.

8
Abstractions/Mappings: structural component

Abstraction refers to the relation between the
domains of the objects.
Mapping between the domains is a mathematical
expression that denotes the abstraction.
Abstractions themselves cannot capture semantic
similarity. Hence they are associated with a
context.

9
Some useful well defined abstractions

Total 1-1 value mapping.
Partial one way mapping.
Generalization/Specialization.
Aggregation.
Functional dependencies.
ANY.
NONE.

10
Explicit context representation in a
multi-database environment

Why?
Reasoning requires real-world semantics.
Structure is not enough: it cannot capture
real-world semantics.
Computational benefits:
- Economy of representation: focusing mechanism
- Economy of reasoning: reasoning at the
intensional level
Handling inconsistent information
Flexible semantics: different relations in
different contexts.

11
Partial context representation

Dynamically chosen meta-attributes can be used to
characterize the semantics of the application
domain.
This leads to a partial representation of the
context as a collection of contextual
coordinates:
Context = <(C1, V1), (C2, V2), ..., (Ck, Vk)>
Each C corresponds to a role and V corresponds to
a filler for the role the object must have.
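A partial context of this kind can be sketched as a mapping from contextual coordinates to allowed fillers. The encoding below (plain dictionaries and sets) and the employer/position example are assumptions for illustration; KS96 expresses the fillers with description logics rather than literal value sets.

```python
# A context maps each contextual coordinate C_i to the set of fillers V_i
# that an object is allowed to have for that role (illustrative encoding).
def satisfies(obj: dict, context: dict) -> bool:
    """True if the object has an allowed filler for every coordinate."""
    return all(obj.get(coord) in allowed for coord, allowed in context.items())


# Hypothetical context and objects.
ctx = {"employer": {"university"}, "position": {"professor", "lecturer"}}
alice = {"name": "Alice", "employer": "university", "position": "professor"}
bob = {"name": "Bob", "employer": "company", "position": "engineer"}

print(satisfies(alice, ctx), satisfies(bob, ctx))
```

Only the coordinates listed in the context are checked, which is what makes the representation partial: attributes such as `name` play no role in the comparison.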

12
Reasoning about contexts

Specificity relationship between contexts.
Two contexts may not be comparable to each
other, so the specificity relationship gives only
a partial ordering of contexts.
The greatest lower bound of 2 contexts can be
computed.
The result is a context meet semi-lattice.
Operations on the lattice: overlap and coherence
(Refer to the paper KS96 for details)
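One way to picture the specificity ordering and the greatest lower bound is with contexts encoded as dictionaries from coordinates to sets of allowed values. This is a simplified sketch under that assumption, not the construction from KS96: a context counts as more specific when it constrains at least the same coordinates with subsets of the values, and the bound merges coordinates and intersects shared value sets.

```python
def is_at_least_as_specific(c1: dict, c2: dict) -> bool:
    """c1 constrains every coordinate of c2 with a subset of c2's values."""
    return all(k in c1 and c1[k] <= c2[k] for k in c2)


def glb(c1: dict, c2: dict) -> dict:
    """Greatest lower bound: union of coordinates, intersection of shared values."""
    out = {}
    for k in c1.keys() | c2.keys():
        if k in c1 and k in c2:
            out[k] = c1[k] & c2[k]
        else:
            out[k] = set(c1[k] if k in c1 else c2[k])
    return out


# Hypothetical contexts that are not comparable to each other.
a = {"employer": {"university", "company"}}
b = {"position": {"professor"}}
g = glb(a, b)

print(is_at_least_as_specific(a, b), is_at_least_as_specific(b, a))
print(is_at_least_as_specific(g, a), is_at_least_as_specific(g, b))
```

In this encoding an empty intersection for some coordinate would signal that the two contexts cannot both hold for any object; how that relates to the coherence operation is left to the paper.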

13
Generic schema matching reference - JPE

Schema matching, according to the authors, has
been studied in the past as a part of other
systems.
E.g., to find similar structures between
heterogeneous schemas, which are then used as
integration points in mediator architectures.

14
Why is schema matching a challenge?

Structural differences
Naming differences
Schemas model similar but not identical content.
May be expressed in different data models.
Synonymy and other -nymies.

15
Why generic schema matching?

Matching is pervasive and required for several
other systems to work (as said in previous
slides).
Therefore the goal is:
Given 2 input schemas in any data model and,
optionally, auxiliary information and an input
mapping, compute a mapping between schema
elements of the 2 input schemas that passes user
validation.

16
Taxonomy of schema matching techniques

Schema vs. Instance based
Element vs. Structure granularity
Linguistic approaches
Constraint-based approaches: range, type,
cardinality, uniqueness, required-ness.
Matching cardinalities
Auxiliary information: thesauri, dictionary,
Ontology?
Individual vs. Combinational

17
Cupid approach

Uses a combination of some of the techniques
shown in previous slide
Linguistic matching
Element and structure based matching
Biased towards similarity of atomic elements
where much of the semantics is captured
Exploits internal structure
Exploits keys, constraints and views
Makes context dependent matches of a shared type
definition that is used in several large
structures.

18
Algorithm for Cupid

Interconnected elements of schemas are modeled as
a schema tree.

19
Algorithm for Cupid (2)

Phase 1 of the Algorithm involves Linguistic
matching.
It matches individual schema elements based on
their names, data types and domains.
A thesaurus is used to match these elements, e.g.
- Qty for Quantity
- UoM for UnitOfMeasure
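A name-based comparison with a thesaurus lookup might look like the sketch below. The thesaurus entries, the camel-case tokenizer and the token-overlap (Jaccard) score are all assumptions chosen for illustration; they stand in for Cupid's actual linguistic similarity, which also uses data types and domains.

```python
import re

# Hypothetical thesaurus entries: abbreviation -> expansion.
THESAURUS = {"qty": "quantity", "uom": "unit of measure"}


def tokens(name: str) -> set:
    """Expand known abbreviations, otherwise split a camel-case name."""
    key = name.lower()
    if key in THESAURUS:
        return set(THESAURUS[key].split())
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return {p.lower() for p in parts}


def lsim(a: str, b: str) -> float:
    """Token-overlap similarity between two element names (0.0 to 1.0)."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


print(lsim("Qty", "Quantity"))
print(lsim("UoM", "UnitOfMeasure"))
print(lsim("Qty", "UoM"))
```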

20
Algorithm for Cupid (3)

Phase 2 involves structure matching using a
measure called ssim (structural similarity). How
is it computed?
The structural similarity between 2 sub-trees is
estimated as the fraction of the leaves in the
two sub-trees that have at least one strong link
to some leaf in the other sub-tree.
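The fraction described above can be written down directly. In this sketch the strong links are simply given as a set of leaf pairs; how a link qualifies as strong (for example, a similarity threshold) is not specified here and is left as an assumption, as are the example leaf names.

```python
def ssim(leaves1: set, leaves2: set, strong_links: set) -> float:
    """Fraction of leaves in both sub-trees with a strong link to the other side.

    strong_links is a set of (leaf_from_tree1, leaf_from_tree2) pairs.
    """
    linked1 = {a for a in leaves1 if any((a, b) in strong_links for b in leaves2)}
    linked2 = {b for b in leaves2 if any((a, b) in strong_links for a in leaves1)}
    total = len(leaves1) + len(leaves2)
    return (len(linked1) + len(linked2)) / total if total else 0.0


# Hypothetical leaves of two sub-trees being compared.
item_leaves = {"Qty", "UoM", "Price"}
line_leaves = {"Quantity", "UnitOfMeasure"}
links = {("Qty", "Quantity"), ("UoM", "UnitOfMeasure")}

print(ssim(item_leaves, line_leaves, links))
```

Here four of the five leaves take part in a strong link, so the estimate is 4/5.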

21
Extending to general schema

Real-world schemas don't come as trees.
A generic schema model captures more semantics,
leading to non-tree schemas.
The matching algorithm is extended to use it by
handling shared types and referential
constraints.

22
Schema graphs

In real schemas elements/nodes are interconnected
by three types of relationships:
- Containment: has delete propagation semantics
- Aggregation
- IsDerivedFrom: abstracts IsA and IsTypeOf
relationships to model shared type information.

23
Matching shared types

An element which is a shared type can be the
target of several IsDerivedFrom relations.

24
Matching shared types

In the figure on the previous slide, suppose we
change the PurchaseOrder schema so that the
Address field is a shared attribute element,
referenced by both DeliverTo and InvoiceTo.
Mappings will then have to be qualified using the
context in which Address is being referred to.
By converting the schema graph into a tree, all
such context-dependent paths are materialized and
the tree matching algorithm can be reused.
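Materializing context-dependent paths amounts to enumerating every root-to-node path of the schema graph, so that a shared element appears once per context. The sketch below assumes an acyclic graph given as an adjacency dictionary; the Street and City children of Address are hypothetical.

```python
def materialize_paths(graph: dict, root: str) -> list:
    """Return every root-to-node path of an acyclic schema graph as a tuple."""
    paths = []

    def walk(node, prefix):
        path = prefix + (node,)
        paths.append(path)
        for child in graph.get(node, []):
            walk(child, path)

    walk(root, ())
    return paths


# Address is shared by DeliverTo and InvoiceTo.
schema = {
    "PurchaseOrder": ["DeliverTo", "InvoiceTo"],
    "DeliverTo": ["Address"],
    "InvoiceTo": ["Address"],
    "Address": ["Street", "City"],
}

for p in materialize_paths(schema, "PurchaseOrder"):
    print("/".join(p))
```

After expansion, `PurchaseOrder/DeliverTo/Address/City` and `PurchaseOrder/InvoiceTo/Address/City` are distinct tree nodes, so each can receive its own mapping.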

25
Matching referential constraints
26
Matching referential constraints

Referential constraints are interpreted as
potential join views.
For each foreign key introduce a node that
represents the join of the tables involved.
The advantage of doing this is that it increases
the structural similarity between the two schemas
being matched.
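As a rough sketch of the idea, each foreign key can be turned into an extra node whose children are the columns of both tables involved. The node naming scheme, the choice of children and the Orders/Customers tables are assumptions made for illustration, not details taken from the Cupid paper.

```python
def add_join_nodes(tables: dict, foreign_keys: list) -> dict:
    """Return a copy of the schema with one extra join node per foreign key.

    tables: table name -> list of column names
    foreign_keys: (source table, source column, target table) triples
    """
    schema = {name: list(cols) for name, cols in tables.items()}
    for src, col, dst in foreign_keys:
        join_name = f"{src}_{col}_{dst}_join"  # hypothetical naming scheme
        schema[join_name] = (
            [f"{src}.{c}" for c in tables[src]]
            + [f"{dst}.{c}" for c in tables[dst]]
        )
    return schema


# Hypothetical relational schema.
tables = {
    "Orders": ["OrderID", "CustomerID"],
    "Customers": ["CustomerID", "Name"],
}
fks = [("Orders", "CustomerID", "Customers")]

augmented = add_join_nodes(tables, fks)
print(augmented["Orders_CustomerID_Customers_join"])
```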

27
PROMPT Ontology merging and alignment tool

Distinction between Ontology merging and
alignment
Merging results in a single coherent Ontology.
Alignment makes one Ontology coherent with the
other but keeps them separate.

28
Algorithm for PROMPT

The underlying knowledge model for PROMPT is
frame-based and has been designed to be
compatible with OKBC (Chaudhri et al. 1998).
This model has:
- Classes: collections of objects with similar
properties, arranged in hierarchies with multiple
inheritance.
- Slots: named binary relations.
- Facets: named ternary relations between a class,
a slot and either a class or a primitive object.
- Instances: individual class members.
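The frame-based model can be pictured with a small data structure. This is a sketch only: the class name `Frame`, the `value-type` facet name and the example hierarchy are illustrative assumptions and do not reproduce the OKBC API or PROMPT's internals.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Frame:
    """A class frame: named slots, each carrying a dict of facets."""
    name: str
    parents: List["Frame"] = field(default_factory=list)   # multiple inheritance
    slots: Dict[str, Dict[str, object]] = field(default_factory=dict)
    instances: List[str] = field(default_factory=list)     # individual members

    def all_slots(self) -> Dict[str, Dict[str, object]]:
        """Own slots plus those inherited from every parent (own slots win)."""
        merged: Dict[str, Dict[str, object]] = {}
        for parent in self.parents:
            merged.update(parent.all_slots())
        merged.update(self.slots)
        return merged


# Hypothetical hierarchy with multiple inheritance.
person = Frame("Person", slots={"name": {"value-type": "String"}})
employee = Frame("Employee", [person], {"employer": {"value-type": "Organization"}})
student = Frame("Student", [person], {"school": {"value-type": "Organization"}})
ta = Frame("TeachingAssistant", [employee, student], instances=["ta_01"])

print(sorted(ta.all_slots()))
```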

29
Semi-automated approach of PROMPT
30
My idea??