Title: DB-AI perspectives on Schema/Ontology Integration
1DB-AI perspectives on Schema/Ontology Integration
By Cartic Ramakrishnan, LSDIS-UGA
2DB Perspective (Kashyap,Sheth - 96)
- Semantic and schematic similarities between database objects are identified using context.
- Schematic conflicts between objects are of interest when there is semantic similarity between them.
- How do the authors propose to do this?
- Using the concept of semantic proximity,
- which is essentially an abstraction/mapping between the domains of the 2 objects, associated with the context of comparison.
3DB Perspective (Kashyap,Sheth - 96)
- They propose an explicit but partial context representation.
- They also define the specificity relationships between contexts.
- Contexts are organized in a meet semi-lattice, and operations such as greatest lower bound are defined.
4DB Perspective (Kashyap,Sheth - 96)
- At the semantic level, the authors represent the intensional description of the database objects using description logics.
- The terms used to construct the contexts are obtained from a domain-specific ontology.
5DB Perspective (Kashyap,Sheth - 96)
- They define schema correspondences to capture the structural similarities between objects.
- They combine the semantic and schematic similarities by defining the schema correspondences with respect to a context.
6Semantic proximity
- semPro(O1, O2) = <Context, Abstraction, (D1, D2), (S1, S2)>
- The first element denotes the context in which the 2 objects O1 and O2 are being compared.
- The second identifies the abstraction/mapping between the domains of the 2 objects O1 and O2.
- The third component enumerates the domain definitions of the objects O1 and O2.
- The fourth component enumerates the states of the 2 objects O1 and O2, which are the extensions of the objects stored in their respective databases.
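The four components listed above can be pictured as a small record. The following is only an illustrative sketch: the class name `SemPro`, its field names, and the sample values are hypothetical and are not taken from KS96.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass
class SemPro:
    """Hypothetical container for the four components of semPro(O1, O2)."""
    context: Dict[str, Any]    # context in which O1 and O2 are compared
    abstraction: str           # abstraction/mapping between the two domains
    domains: Tuple[Any, Any]   # (D1, D2): domain definitions of O1 and O2
    states: Tuple[Any, Any]    # (S1, S2): extensions stored in each database


# Made-up example: two employee-like objects compared in a university context.
sp = SemPro(
    context={"employer": "university"},
    abstraction="total 1-1 value mapping",
    domains=({"emp_id": int}, {"worker_no": int}),
    states=([101, 102], [101, 102]),
)
```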
7Context: Semantic component
- So what is context?
- Knowledge that is needed to reason about another system for the purpose of answering a query.
- Meaning, content, organization and properties of data.
- Modeled using meta-data.
- Can use an ontology to capture context.
8Abstractions/Mappings: structural component
- Abstraction refers to the relation between the domains of the objects.
- A mapping between the domains is a mathematical expression that denotes the abstraction.
- Abstractions by themselves cannot capture semantic similarity. Hence they are associated with a context.
9Some useful well-defined abstractions
- Total 1-1 value mapping.
- Partial one way mapping.
- Generalization/Specialization.
- Aggregation.
- Functional dependencies.
- ANY.
- NONE.
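To make the first abstraction in the list concrete, here is a minimal sketch of checking whether a mapping between two domains is a total 1-1 value mapping. The function name and the grade example are hypothetical; the check simply assumes that "total" means every value of the first domain is mapped, and "1-1" means no two values share an image and every value of the second domain is reached.

```python
def is_total_one_to_one(mapping, domain1, domain2):
    """Return True if `mapping` pairs every value of domain1 with a distinct value of domain2."""
    total = set(mapping) == set(domain1)                      # every D1 value is mapped
    one_to_one = len(set(mapping.values())) == len(mapping)   # no two D1 values share an image
    onto = set(mapping.values()) == set(domain2)              # every D2 value is reached
    return total and one_to_one and onto


# Made-up example: letter grades in one database, grade points in another.
letter_to_points = {"A": 4, "B": 3, "C": 2}
full = is_total_one_to_one(letter_to_points, {"A", "B", "C"}, {4, 3, 2})
partial = is_total_one_to_one({"A": 4, "B": 3}, {"A", "B", "C"}, {4, 3, 2})
```

Here `full` is true, while `partial` is false because the value "C" is left unmapped, which is the situation a partial mapping would describe.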
10Explicit context representation in a multi-database environment
- Why?
- Reasoning requires real-world semantics.
- Structure is not enough!
- It cannot capture real-world semantics.
- Computational benefits
- Economy of representation: a focusing mechanism
- Economy of reasoning: reasoning at the intensional level
- Handling inconsistent information
- Flexible semantics: different relations in different contexts.
11Partial context representation
- Dynamically chosen meta-attributes can be used to characterize the semantics of the application domain.
- This leads to a partial representation of the context as a collection of contextual coordinates:
- Context = <(C1, V1), (C2, V2), ..., (Ck, Vk)>
- Each C corresponds to a role, and each V corresponds to a filler for that role which the object must have.
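A collection of contextual coordinates maps naturally onto a dictionary from roles to allowed fillers. The sketch below is a simplification under stated assumptions: fillers are plain sets of strings rather than description-logic expressions, and the coordinate names and the `satisfies` helper are hypothetical.

```python
# Each key is a contextual coordinate (a role); each value is the set of
# fillers that the role may take in this context.
context = {
    "employer": {"university"},
    "position": {"researcher", "lecturer"},
}


def satisfies(obj, ctx):
    """Return True if `obj` has an allowed filler for every coordinate in `ctx`."""
    return all(obj.get(coord) in allowed for coord, allowed in ctx.items())


# Attributes not named in the context (here "name") are simply ignored,
# which is what makes the representation partial.
in_context = satisfies(
    {"employer": "university", "position": "lecturer", "name": "x"}, context
)
out_of_context = satisfies({"employer": "university"}, context)
```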
12Reasoning about contexts
- Specificity relationship between contexts.
- But it is possible that 2 contexts are not comparable to each other. Hence, using the specificity relationship, we get a partial ordering of contexts.
- Can compute the greatest lower bound of 2 contexts.
- Get a context meet semi-lattice as a result.
- Operations on the lattice: overlap and coherence.
- (Refer to the paper KS96 for details)
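The specificity ordering and the greatest lower bound can be sketched for a simplified model of contexts. The assumptions here are not from KS96: a context is a dictionary from coordinate names to sets of allowed values, one context is more specific than another if it constrains every coordinate of the other at least as tightly, and the function names are hypothetical.

```python
def is_more_specific(a, b):
    """True if context `a` constrains every coordinate of `b` at least as tightly."""
    return all(coord in a and a[coord] <= b[coord] for coord, _ in b.items())


def glb(a, b):
    """Greatest lower bound: keep all coordinates, intersect values of shared ones."""
    result = {coord: set(values) for coord, values in a.items()}
    for coord, values in b.items():
        result[coord] = result[coord] & values if coord in result else set(values)
    return result


c1 = {"employer": {"university", "industry"}}
c2 = {"position": {"researcher"}}

# c1 and c2 are not comparable, so the ordering is only partial.
comparable = is_more_specific(c1, c2) or is_more_specific(c2, c1)

# Their greatest lower bound is more specific than both.
meet = glb(c1, c2)
```

Under this ordering the result of `glb` is the most general context that is still more specific than both inputs, which is the property a meet operation must have.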
13Generic schema matching reference - JPE
- According to the authors, schema matching has been studied in the past as a part of other systems. E.g.:
- To find similar structures between heterogeneous schemas, which are then used as integration points in mediator architectures.
14Why is schema matching a challenge?
- Structural differences
- Naming differences
- Schemas model similar but not identical content.
- May be expressed in different data models.
- Synonymy and other -nymies.
15Why generic schema matching?
- Matching is pervasive and required for several other systems to work (as said in previous slides).
- Therefore the goal is:
- Given 2 input schemas in any data model and, optionally, auxiliary information and an input mapping, compute a mapping between schema elements of the 2 input schemas that passes user validation.
16Taxonomy of schema matching techniques
- Schema vs. instance based
- Element vs. structure granularity
- Linguistic approaches
- Constraint-based approaches: range, type, cardinality, uniqueness, required-ness.
- Matching cardinalities
- Auxiliary information: thesauri, dictionaries, ontologies?
- Individual vs. combinational
17Cupid approach
- Uses a combination of some of the techniques shown in the previous slide:
- Linguistic matching
- Element- and structure-based matching
- Biased towards similarity of atomic elements, where much of the semantics is captured
- Exploits internal structure
- Exploits keys, constraints and views
- Makes context-dependent matches of a shared type definition that is used in several large structures.
18Algorithm for Cupid
- Interconnected elements of schemas are modeled as
a schema tree.
19Algorithm for Cupid (2)
- Phase 1 of the algorithm involves linguistic matching.
- It matches individual schema elements based on their names, data types and domains.
- A thesaurus is used to match elements such as:
- Qty for Quantity
- UoM for UnitOfMeasure.
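The thesaurus step can be illustrated with a small name-matching sketch. This is not Cupid's actual linguistic similarity: it only compares names (data types and domains are left out), the thesaurus holds just the two abbreviations from the slide, and the string ratio from Python's `difflib` stands in for whatever similarity coefficient the real algorithm computes.

```python
from difflib import SequenceMatcher

# Hypothetical thesaurus: abbreviation -> expanded name (both lower-cased).
THESAURUS = {"qty": "quantity", "uom": "unitofmeasure"}


def normalize(name):
    """Lower-case the name and expand it if the thesaurus knows it."""
    key = name.lower()
    return THESAURUS.get(key, key)


def name_similarity(a, b):
    """Similarity in [0, 1] between two element names after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


qty_score = name_similarity("Qty", "Quantity")
uom_score = name_similarity("UoM", "UnitOfMeasure")
other_score = name_similarity("Qty", "City")
```

With the thesaurus in place the two abbreviation pairs normalize to identical strings, while an unrelated pair such as Qty and City scores lower.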
20Algorithm for Cupid (3)
- Phase 2 involves structure matching using a measure called ssim (structural similarity). How is it computed?
- The structural similarity between 2 sub-trees is estimated as the fraction of the leaves in the two sub-trees that have at least one strong link to some leaf in the other sub-tree.
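The fraction described above can be written down directly. In this sketch the set of strong links is taken as given input (how a link qualifies as strong is not covered on the slide), and the function signature and leaf names are hypothetical.

```python
def ssim(leaves1, leaves2, strong_links):
    """Fraction of leaves in both sub-trees with a strong link to a leaf in the other.

    `strong_links` is a set of (leaf_from_tree1, leaf_from_tree2) pairs.
    """
    linked1 = [a for a in leaves1 if any((a, b) in strong_links for b in leaves2)]
    linked2 = [b for b in leaves2 if any((a, b) in strong_links for a in leaves1)]
    total = len(leaves1) + len(leaves2)
    return (len(linked1) + len(linked2)) / total if total else 0.0


leaves_a = ["Qty", "UoM", "Price"]
leaves_b = ["Quantity", "UnitOfMeasure"]
links = {("Qty", "Quantity"), ("UoM", "UnitOfMeasure")}

score = ssim(leaves_a, leaves_b, links)  # 4 linked leaves out of 5
```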
21Extending to general schema
- Real-world schemas don't come in trees.
- A generic schema model that captures more semantics leads to non-tree schemas.
- The matching algorithm is extended to use it by handling shared types and referential constraints.
22Schema graphs
- In real schemas, elements/nodes are interconnected by three types of relationships:
- Containment, which has delete propagation semantics
- Aggregation
- IsDerivedFrom, which abstracts IsA and IsTypeOf relationships to model shared type information.
23Matching shared types
- An element which is a shared type can be the
target of several IsDerivedFrom relations.
24Matching shared types
- In the figure on the previous slide, suppose we change the PurchaseOrder schema so that the Address field is a shared element, referenced by both DeliverTo and InvoiceTo. Mappings will then have to be qualified by the context in which Address is being referred to. Converting the schema graph into a tree materializes all such context-dependent paths, so the tree matching algorithm can be reused.
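The graph-to-tree conversion can be sketched as an enumeration of root-to-leaf paths, so that a shared element appears once per context that reaches it. This is an illustration only: it assumes an acyclic schema graph given as an adjacency dictionary, and the Street and City children of Address are made up for the example.

```python
def materialize_paths(graph, root):
    """Expand a rooted, acyclic schema graph into all root-to-leaf paths."""
    paths = []

    def walk(node, prefix):
        path = prefix + (node,)
        children = graph.get(node, [])
        if not children:           # leaf: record the full context-qualified path
            paths.append(path)
        for child in children:     # a shared child is visited once per parent
            walk(child, path)

    walk(root, ())
    return paths


schema_graph = {
    "PurchaseOrder": ["DeliverTo", "InvoiceTo"],
    "DeliverTo": ["Address"],
    "InvoiceTo": ["Address"],      # Address is shared by both parents
    "Address": ["Street", "City"],
}

paths = materialize_paths(schema_graph, "PurchaseOrder")
```

Each leaf under Address now appears twice, once qualified by DeliverTo and once by InvoiceTo, so a tree matcher can treat the two uses separately.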
25Matching referential constraints
26Matching referential constraints
- Referential constraints are interpreted as potential join views.
- For each foreign key, introduce a node that represents the join of the tables involved.
- The advantage of doing this is that it increases the structural similarity between the two schemas being matched.
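Adding a join node per foreign key might look like the sketch below. The representation (tables as a dictionary of column lists, foreign keys as triples) and the naming scheme for join nodes are assumptions made for illustration, not part of the algorithm as presented.

```python
def add_join_nodes(tables, foreign_keys):
    """Return a copy of the schema with one extra node per foreign key.

    `tables` maps table name -> list of columns.
    `foreign_keys` is a list of (child_table, fk_column, parent_table) triples.
    """
    schema = {name: list(columns) for name, columns in tables.items()}
    for child, fk_column, parent in foreign_keys:
        join_name = f"{child}_{parent}_on_{fk_column}"  # hypothetical naming scheme
        # The join node carries the columns of both tables, qualified by table name.
        schema[join_name] = (
            [f"{child}.{c}" for c in tables[child]]
            + [f"{parent}.{c}" for c in tables[parent]]
        )
    return schema


tables = {
    "Orders": ["order_id", "customer_id"],
    "Customers": ["customer_id", "name"],
}
extended = add_join_nodes(tables, [("Orders", "customer_id", "Customers")])
```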
27PROMPT: Ontology merging and alignment tool
- Distinction between ontology merging and alignment:
- Merging results in a single coherent ontology.
- Alignment makes one ontology coherent with the other but keeps them separate.
28Algorithm for PROMPT
- The underlying knowledge model for PROMPT is frame-based and has been designed to be compatible with OKBC (Chaudhri et al. 1998). This model has:
- Classes: collections of objects with similar properties, arranged in hierarchies with multiple inheritance.
- Slots: named binary relations.
- Facets: named ternary relations between a class, a slot and either a class or a primitive object.
- Instances: individual class members.
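A frame-based model of this shape can be approximated with two small classes. This sketch is not the OKBC API or PROMPT's internal representation: the class names, the storage of facets as a nested dictionary per slot, and the sample ontology are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class FrameClass:
    name: str
    superclasses: List["FrameClass"] = field(default_factory=list)  # multiple inheritance
    # slot name -> {facet name -> value}; a facet ties (class, slot) to a value
    slots: Dict[str, Dict[str, Any]] = field(default_factory=dict)

    def all_slots(self):
        """Collect own slots plus those inherited from every superclass."""
        merged = {}
        for parent in self.superclasses:
            merged.update(parent.all_slots())
        merged.update(self.slots)
        return merged


@dataclass
class Instance:
    of: FrameClass
    values: Dict[str, Any] = field(default_factory=dict)  # slot name -> filler


student = FrameClass("Student", slots={"enrolled_in": {"value_type": "Course"}})
employee = FrameClass("Employee", slots={"employer": {"value_type": "Organization"}})
assistant = FrameClass("TeachingAssistant", superclasses=[student, employee])
ta = Instance(of=assistant, values={"employer": "UGA"})
```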
29Semi-automated approach of PROMPT
30My idea??