Title: Ontology-Based Integration of Information: A Survey of Existing Approaches
- H. Wache, T. Vogele, U. Visser, H. Stuckenschmidt, G. Schuster, H. Neumann and S. Hubner
- University of Bremen
Motivation
- A vast amount of information is available on the WWW
- Growing need for
  - Finding relevant information (Information Extraction)
  - Creating new knowledge out of the available information (Web Content Mining)
  - Personalization of the web
  - Learning about customers or individual users (Web Usage Mining)
Issues
- Information is widely distributed and heterogeneous
- Schema discovery
- Wrapping
- Reorganizing data sources
- Coping with changes in sources
Issues (cont.)
- Structural (schematic) heterogeneity
  - Data is stored in different structures across the information systems
- Semantic (data) heterogeneity
  - Concerns the content of an information item and its intended meaning
  - Causes
    - Confounding conflicts
      - Items that seem to have the same meaning but differ in reality
    - Scaling conflicts
      - Different reference systems are used to measure a value (Eu, )
    - Naming conflicts
      - Naming schemes of similar items differ significantly
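A scaling conflict can be illustrated with a minimal sketch. It assumes two hypothetical sources that report the same quantity in different reference systems (metres and feet are chosen here only for illustration). The values become comparable once both are converted to one reference unit.

```python
# Hypothetical conversion factors to a common reference unit (metres).
TO_METRES = {"m": 1.0, "ft": 0.3048}

def normalize(value, unit):
    """Convert a measured value to metres so values from different sources compare."""
    return value * TO_METRES[unit]

# Two hypothetical sources describing the height of the same building.
source_a = {"height": (30.48, "m")}
source_b = {"height": (100.0, "ft")}

height_a = normalize(*source_a["height"])
height_b = normalize(*source_b["height"])

# Compare with a tolerance, since the conversion is floating-point arithmetic.
same_height = abs(height_a - height_b) < 1e-6
```

Naming and confounding conflicts cannot be resolved by a conversion table like this; they need knowledge about the intended meaning of the items.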
Solution - Ontologies
- An ontology refers to a shared understanding of a domain of interest, which may be used as a unifying framework
- Embodies some sort of world view with respect to a given domain
- The world view is conceived as
  - A set of concepts (entities, attributes, processes)
  - Their definitions
  - Their inter-relationships
- This is referred to as a conceptualization
Ontologies (cont.)
- A consensual, shared and formal description of the concepts that are important in a given domain
- Identifies classes of objects that are important in a domain and organizes these classes in a subclass hierarchy
- Each class is characterized by properties shared by all elements in that class
- Important relations between classes or between the elements of the classes are also part of an ontology
Ontology example
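As an illustration only (this is not a figure from the slides), a small ontology with a subclass hierarchy, class properties and one relation between classes might be sketched as follows. All class, property and relation names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class OntClass:
    name: str
    parent: "OntClass | None" = None
    properties: set = field(default_factory=set)

    def all_properties(self):
        """Properties of this class plus those inherited from its superclasses."""
        inherited = self.parent.all_properties() if self.parent else set()
        return inherited | self.properties

    def is_a(self, other):
        """True if this class is `other` or one of its (transitive) subclasses."""
        cls = self
        while cls is not None:
            if cls is other:
                return True
            cls = cls.parent
        return False

# Hypothetical classes organized in a subclass hierarchy.
vehicle = OntClass("Vehicle", properties={"max_speed"})
car = OntClass("Car", parent=vehicle, properties={"num_doors"})
person = OntClass("Person", properties={"name"})

# A relation between classes is also part of the ontology.
relations = [("Person", "owns", "Vehicle")]
```

Here `car.all_properties()` combines the properties of `Car` with those shared by every `Vehicle`, which is the sense in which a class is characterized by properties common to all of its elements.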
Objective
- Evaluate the use of ontologies in information integration systems
- SIMS, TSIMMIS, OBSERVER, CARNOT, InfoSleuth, KRAFT, PICSEL, DWQ, Ontobroker, SHOE
Criteria for evaluating approaches
- Use of ontologies
  - Purpose of using ontologies
  - Architecture of ontologies used
- Ontology representation
  - Kind of languages used to represent ontologies
  - General structure of ontologies
- Use of mappings
  - How information is mapped to ontologies
  - Inter-ontology mapping
- Ontology engineering
  - Support for development of ontologies
  - Support for evolution of ontologies
  - Supporting tools
Outline
- 1) Motivation
- 2) Issues
- 3) Brief introduction to Ontology
- 4) Objective
- 5) Role of Ontologies
- 6) Ontology Representation
- 7) Ontology Mappings
- 8) Ontology Engineering
5. Role of Ontologies
- Content explication
  - Ontologies are used for the explicit description of the information source
  - Approaches
    - Single ontology
    - Multiple ontologies
    - Hybrid ontology
- Query model
- Verification (query containment)
5.1 Single Ontology Approach
- SIMS
- One global ontology
  - Hierarchical terminological database
- Combination of several specialized ontologies (for modularization)
- Can be used when all information sources to be integrated provide nearly the same view on a domain
- Minimal ontology commitment
- Susceptible to changes in the information sources
5.2 Multiple Ontologies
- OBSERVER
- Each information source is described by its own ontology (source ontology)
- No shared vocabulary
- No common and minimal ontology commitment is needed
- Simplifies integration and supports changes in sources
- Difficult to compare different source ontologies
- Inter-ontology mapping is needed
5.3 Hybrid Ontologies
- COIN
- Semantics of each source is described by its own ontology
- Source ontologies are built from a global shared vocabulary
- Shared vocabulary contains basic terms of a domain
- New sources can easily be added
- Supports acquisition and evolution of ontologies
- Source ontologies are comparable because of the shared vocabulary
- Existing ontologies cannot easily be reused, but have to be redeveloped from scratch
5.4 Query Model
- Integrated global view
- Global query schema
- User formulates query in terms of the ontology
- System reformulates queries in terms of sub-queries for each source
- Structure of the query model should be more intuitive for the user
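The reformulation step can be sketched as below. This is a minimal illustration, assuming a hypothetical mapping table from global ontology terms to each source's own table and column names. None of the names come from the surveyed systems, and the SQL-like strings are built for display only (no escaping is done).

```python
# Hypothetical mapping from global ontology terms to each source's own schema.
SOURCE_MAPPINGS = {
    "source_a": {"table": "hotels", "terms": {"name": "hotel_name", "city": "town"}},
    "source_b": {"table": "lodging", "terms": {"name": "title", "city": "city"}},
}

def reformulate(global_terms, condition_term, condition_value):
    """Rewrite one query over global terms into a sub-query per source."""
    sub_queries = {}
    for source, mapping in SOURCE_MAPPINGS.items():
        terms = mapping["terms"]
        columns = ", ".join(terms[t] for t in global_terms)
        sub_queries[source] = (
            f"SELECT {columns} FROM {mapping['table']} "
            f"WHERE {terms[condition_term]} = '{condition_value}'"
        )
    return sub_queries

# The user asks for names of items in a city, using only global terms.
queries = reformulate(["name"], "city", "Bremen")
```

The user never sees `hotel_name` or `town`; the query is stated once in the vocabulary of the ontology and rewritten per source.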
5.5 Verification
- Mappings from a global schema to the local source schema
- Automatic verification
- Query containment
  - Ontology concepts corresponding to the local sub-queries are contained in the ontology concepts related to the global query
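A containment check of this kind can be sketched over an explicit subclass hierarchy. The hierarchy and the source names below are hypothetical, and a real system would rely on a reasoner instead of a parent table.

```python
# Hypothetical subclass hierarchy: child concept -> parent concept.
PARENT = {"Hotel": "Accommodation", "Hostel": "Accommodation", "Accommodation": "Thing"}

def is_contained(concept, container):
    """True if `concept` equals `container` or is a (transitive) subclass of it."""
    while concept is not None:
        if concept == container:
            return True
        concept = PARENT.get(concept)
    return False

def verify(global_concept, sub_query_concepts):
    """Return the sources whose sub-query concept is NOT contained in the global one."""
    return [source for source, concept in sub_query_concepts.items()
            if not is_contained(concept, global_concept)]

# source_b maps to a concept that is broader than the global query concept.
violations = verify("Accommodation", {"source_a": "Hotel", "source_b": "Thing"})
```

A non-empty result flags mappings that could return answers outside the global query.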
6. Ontology Representations
- Kind of languages used and general structures that can be found
- Description Logics
- Frame-Based systems
- Formal Concept Analysis
- Object Languages
- Annotated Logics
6.1 Ontology Representations (cont.)
- Description Logics: formal semantics and reasoning
  - CLASSIC, GRAIL, LOOM, OIL
  - Describe knowledge in terms of concepts and role restrictions
  - Derive classification hierarchies automatically from concepts and role restrictions
  - Decidability and completeness guarantee that the reasoning algorithm always terminates with correct answers
  - Reasoning tasks: satisfiability, subsumption (is-a), instance checking, classification
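How a classification hierarchy can be derived from concept definitions is sketched below for a deliberately tiny fragment: concepts are plain conjunctions of atomic concepts and role restrictions, so subsumption reduces to a subset test. This is an assumption made for illustration; the description logics named above are far more expressive, and all concept and role names are hypothetical.

```python
# Each hypothetical concept is a conjunction of atomic concepts and
# role restrictions written as (role, filler) pairs.
DEFINITIONS = {
    "Person": {"atoms": {"Person"}, "roles": set()},
    "Parent": {"atoms": {"Person"}, "roles": {("hasChild", "Person")}},
    "Employee": {"atoms": {"Person"}, "roles": {("worksFor", "Company")}},
    "WorkingParent": {"atoms": {"Person"},
                      "roles": {("hasChild", "Person"), ("worksFor", "Company")}},
}

def subsumes(general, specific):
    """`general` subsumes `specific` if every conjunct of `general` occurs in `specific`."""
    g, s = DEFINITIONS[general], DEFINITIONS[specific]
    return g["atoms"] <= s["atoms"] and g["roles"] <= s["roles"]

def classify():
    """Derive, for each concept, the set of other concepts that subsume it."""
    return {c: {d for d in DEFINITIONS if d != c and subsumes(d, c)}
            for c in DEFINITIONS}

hierarchy = classify()
```

Nobody stated that `WorkingParent` is a `Parent` or an `Employee`; both is-a links follow from the definitions alone.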
6.2 Ontology Representations (cont.)
- Frame-based systems
  - OKBC, Ontolingua, F-Logic
  - A frame is a structure for representing a concept or situation
  - Frames are composed of slots (attributes) for which fillers (values) have to be specified
  - Properties and restrictions can be provided for fillers
  - DLs are descendants of frame-based systems
  - Classes (objects/concepts), roles (attributes/properties)
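The slot and filler idea can be sketched in a few lines. The `Frame` class and the example slots are hypothetical and do not follow the API of OKBC, Ontolingua or F-Logic; restrictions are modelled simply as predicates on the filler.

```python
class Frame:
    """A frame: named slots whose fillers must satisfy the slot's restriction."""

    def __init__(self, name, slots):
        self.name = name
        self.slots = slots      # slot name -> restriction (a predicate on the filler)
        self.fillers = {}       # slot name -> accepted filler

    def fill(self, slot, value):
        if slot not in self.slots:
            raise KeyError(f"{self.name} has no slot {slot!r}")
        if not self.slots[slot](value):
            raise ValueError(f"{value!r} violates the restriction on {slot!r}")
        self.fillers[slot] = value

# Hypothetical frame with a type restriction and a range restriction.
hotel = Frame("Hotel", {
    "name": lambda v: isinstance(v, str),
    "stars": lambda v: isinstance(v, int) and 1 <= v <= 5,
})
hotel.fill("name", "Hotel Example")
hotel.fill("stars", 4)
```

Calling `hotel.fill("stars", 9)` raises `ValueError`, because the filler falls outside the range allowed for that slot.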
6.3 Ontology Representations (cont.)
- Formal Concept Analysis
  - Based on the calculation of a common concept hierarchy for different information sources
  - Limited expressiveness
- Object Languages
  - Designed for specific needs
  - Used in the geographic domain
  - Provide a solution for the integration of spatial and thematic information
- Annotated Logics
  - Used to resolve conflicts
  - e.g. KAMEL
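For Formal Concept Analysis, the calculation of the concept hierarchy can be sketched as a brute-force enumeration of formal concepts, that is, pairs of an object set (extent) and the attribute set shared by exactly those objects (intent). The context below is hypothetical, and the enumeration is exponential in the number of objects, so it only suits toy examples.

```python
from itertools import combinations

# Hypothetical formal context: object -> set of attributes it has.
CONTEXT = {
    "river": {"water", "flowing"},
    "lake": {"water", "standing"},
    "canal": {"water", "flowing", "artificial"},
}

def common_attributes(objects):
    """Attributes shared by all given objects (all attributes for the empty set)."""
    shared = set().union(*CONTEXT.values())
    for obj in objects:
        shared &= CONTEXT[obj]
    return shared

def objects_having(attributes):
    """Objects that have every one of the given attributes."""
    return {o for o, attrs in CONTEXT.items() if attributes <= attrs}

def formal_concepts():
    """Enumerate all (extent, intent) pairs by closing every subset of objects."""
    concepts = set()
    names = list(CONTEXT)
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            intent = common_attributes(subset)
            extent = objects_having(intent)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts

concepts = formal_concepts()
```

Ordering the resulting concepts by inclusion of their extents gives the concept hierarchy; for instance, the concept with intent `{"water"}` covers all three objects and sits above the concept with intent `{"water", "flowing"}`.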
7. Mappings: Connecting to Information Sources
- Relate the ontologies to the actual content of an information source
- Approaches
  - Structure resemblance
    - Produce a one-to-one copy of the structure of the database and encode it in a language that makes automated reasoning possible
  - Definition of terms
    - Use the ontology to define terms from the database or the database schema
7.1 Mappings (cont.)
- Structure enrichment (most common)
  - A logical model is built that resembles the structure of the information source and contains additional definitions and concepts
  - Can be done using DLs
- Meta-annotation
  - Add semantic information to an information source
  - Ontobroker, SHOE
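The difference between structure resemblance and structure enrichment can be sketched as two steps over a hypothetical relational schema: first a one-to-one copy of the tables and columns into a logical model, then the addition of a concept that exists in no table. The model format and all names are assumptions made for illustration; an actual system would encode the model in a description logic or a similar language.

```python
# Hypothetical relational schema of one information source: table -> columns.
SCHEMA = {"hotel": ["name", "city", "stars"]}

def resemble(schema):
    """One-to-one copy: each table becomes a concept, each column a role of it."""
    return {table.capitalize(): {"roles": set(columns), "defined_as": None}
            for table, columns in schema.items()}

def enrich(model, concept, parent, condition):
    """Add a concept that is not a table in the source, defined over a copied one."""
    if parent not in model:
        raise KeyError(f"unknown concept {parent!r}")
    model[concept] = {"roles": set(model[parent]["roles"]),
                      "defined_as": (parent, condition)}
    return model

model = enrich(resemble(SCHEMA), "LuxuryHotel", "Hotel", "stars >= 5")
```

After the first step the model only mirrors the database; after the second it also carries a definition that the source itself does not state.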
7.2 Inter-Ontology Mapping
- Defined mappings (KRAFT)
  - Special customized mediator agents
  - Great flexibility
  - Fails to ensure a preservation of semantics (no verification)
- Lexical relations (OBSERVER)
  - Extend a common DL model by quantified inter-ontology relationships
  - Synonym, hypernym, overlap, covering, disjoint
  - Do not have formal semantics
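How lexical relations might be used to translate a term from one ontology into another is sketched below. The relation table, the reading of `hypernym` and the "exact" flag are assumptions for illustration and are not taken from OBSERVER's implementation.

```python
# Hypothetical inter-ontology relationships:
# (term in ontology A, relation, term in ontology B).
RELATIONS = [
    ("Hotel", "synonym", "Inn"),
    ("Hostel", "hypernym", "Lodging"),   # read here as: Lodging is broader than Hostel
]

def translate(term):
    """Translate a term from ontology A into ontology B.

    Returns (translated term, exact flag). A hypernym translation is flagged
    as inexact because the broader term may also match unwanted items.
    """
    # Prefer a synonym, which preserves the meaning of the term.
    for source, relation, target in RELATIONS:
        if source == term and relation == "synonym":
            return target, True
    # Fall back to a broader term, accepting a loss of precision.
    for source, relation, target in RELATIONS:
        if source == term and relation == "hypernym":
            return target, False
    return None, False
```

For example, `translate("Hotel")` yields an exact translation, while `translate("Hostel")` falls back to the broader term and marks the result as inexact.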
7.2 Inter-Ontology Mapping (cont.)
- Top-level grounding (DWQ)
  - Relate all ontologies used to a single top-level ontology
  - Done by inheriting concepts from a common top-level ontology
  - Can resolve conflicts and ambiguities
- Semantic correspondences
  - Rely on a common vocabulary
  - Use semantic labels in order to compute correspondences
  - Subsumption reasoning can be used to establish relations between different terminologies
8. Ontological Engineering
- Development methodology
  - 1) Identify a purpose and scope
  - 2) Building the ontology
    - a) Ontology capture: knowledge acquisition
    - b) Ontology coding: developing a structured concept model
    - c) Integrating existing ontologies
  - 3) Evaluation: verification and validation
  - 4) Guidelines for each phase
8.1 Development Methodology (cont.)
- InfoSleuth
  - Semi-automatically constructs ontologies from textual databases
  - Experts provide seed words to represent high-level concepts
  - Processes the incoming documents, extracting phrases that involve seed words
  - Generates corresponding concept terms and then classifies them into ontologies
  - Needs experts for the evaluation process (phase 3)
  - Does not mention integration of existing ontologies
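The seed-word step can be sketched in a much simplified form: collect two-word phrases that end in a seed word and group them under that seed as candidate concept terms. The phrase pattern, the seed words and the documents are hypothetical; the extraction actually used by InfoSleuth is not described in these slides.

```python
import re
from collections import defaultdict

def extract_concept_terms(documents, seed_words):
    """Collect two-word phrases ending in a seed word, grouped by that seed word."""
    terms = defaultdict(set)
    for text in documents:
        words = re.findall(r"[a-z]+", text.lower())
        for modifier, head in zip(words, words[1:]):
            if head in seed_words:
                terms[head].add(f"{modifier} {head}")
    return {seed: sorted(phrases) for seed, phrases in terms.items()}

# Hypothetical seed words provided by an expert, and two incoming documents.
seeds = {"image", "sensor"}
docs = ["The infrared sensor records a thermal image.",
        "A radar image was taken by the radar sensor."]
candidates = extract_concept_terms(docs, seeds)
```

The candidates still need review by an expert, which corresponds to the evaluation step noted above.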
8.1 Development Methodology (cont.)
- SIMS
  - An independent model of each information source must be described for this system
  - A domain model is defined to describe objects and actions
  - Includes a hierarchical terminological knowledge base
  - Indications of all relationships between the nodes
  - Scalability and maintenance issues addressed
  - Graphical knowledge base builder can be used
8.2 Supporting Tools
- OntoEdit
  - Enables inspecting, browsing, codifying and modifying ontologies
  - Supports ontology development and maintenance
- SHOE's Knowledge Annotator
  - Commits each web page to one or more ontologies
  - Can define categories, relations and other components in an ontology
  - Provides integrity checks
  - Expose: parses annotated web pages
  - Parka: knowledge base
8.2 Supporting Tools (cont.)
- DWQ: i.com
  - Supporting tool for the conceptual design phase
  - Uses an extended entity-relationship conceptual data model
  - Enriches it with aggregations and inter-schema constraints
  - Serves mainly for intelligent conceptual modeling
8.3 Ontology Evolution
- Support for adding and/or removing sources
- Must be robust to changes in the information source
- SHOE is the only system that supports ontology evolution, using Expose
End
Development Methodology (cont.)
- KRAFT
  - Shared ontologies
    - Ontology scoping
    - Domain analysis
    - Ontology formalization
    - Top-level ontology
  - Extracting ontologies
    - Bottom-up approach to extract an ontology from existing shared ontologies
    - Syntactic translation from the KRAFT exportable view of the resource into the KRAFT-schema
    - Ontological upgrade: semi-automatic translation plus knowledge-based enhancement; the local ontology adds knowledge and further relationships between the entities in the translated schema
  - Lacks evaluation of the ontologies