Title: Ontological Analysis
1 Ontological Analysis and Integration of Terminologies: Towards an Environmental Reference Ontology Library
Geri Steve, Aldo Gangemi, Domenico M. Pisanelli
Istituto di Tecnologie Biomediche, CNR, Rome, Italy
http://saussure.irmkant.rm.cnr.it
{steve,gangemi,pisanelli}@saussure.irmkant.rm.cnr.it
2 Which part are you talking about?
- If my liver is part of my digestive system, and that system is part of me, is my liver part of me?
- If my liver is a part of me and I am part of the CNR, is my liver part of the CNR?
- My liver is a component of my digestive system, while I am a member of CNR. There is no rule for composing component and member relations.
- Moreover, I am a body, but I am also a person. A living person depends on a body. Nevertheless, a living person can be a member of CNR, but a body cannot.
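The liver puzzle can be made concrete with a small sketch. The Python fragment below is only an illustration and not part of ONIONS: the relation names `component_of` and `member_of`, the `FACTS` table, and the helpers `reachable` and `is_part` are assumptions introduced here. Each kind of part relation is treated as transitive on its own, and chains that mix different kinds are refused.

```python
# Hypothetical facts; each kind of "part" relation is kept separate.
FACTS = {
    "component_of": {("liver", "digestive system"), ("digestive system", "me")},
    "member_of": {("me", "CNR")},
}


def reachable(relation, start, goal, facts=FACTS):
    """Return True if `goal` can be reached from `start` by chaining
    pairs of ONE relation only (transitivity within a single relation)."""
    pairs = facts.get(relation, set())
    frontier, seen = [start], {start}
    while frontier:
        node = frontier.pop()
        for whole in (w for p, w in pairs if p == node):
            if whole == goal:
                return True
            if whole not in seen:
                seen.add(whole)
                frontier.append(whole)
    return False


def is_part(start, goal, facts=FACTS):
    """A part claim holds only if a single relation supports the whole chain."""
    return any(reachable(rel, start, goal, facts) for rel in facts)
```

With these toy facts, `is_part("liver", "me")` holds through two `component_of` links, while `is_part("liver", "CNR")` does not, because reaching the CNR would require composing `component_of` with `member_of`.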
3 Object or place?
- Is a body region an object that one could cut, or a place?
- Is a gene a DNA fragment, or a DNA region (allele)?
- Is a river an orographic object, or the geographic place of a watercourse?
- Despite many differences, these three cases seem analogous: they share a polysemy partly dependent on an abstract difference between objects and regions, and a related axiom specifying that objects must be located at some region.
4 River in the GEMET thesaurus
5 Should we worry about those things?
- Even in the presence of polysemous names, a standalone application using a local databank or terminological repository may be able to accomplish its task without serious flaws.
- However, when it is integrated with another application, semantic mismatches constitute a serious obstacle for the agent or interface that is negotiating or sharing information.
- The ever-increasing demand for data sharing has to rely on a solid conceptual foundation in order to give a semantics to the terabytes available in different databases and eventually traveling over the networks.
- Ontologies are currently recognized as the answer to the need for a conceptual foundation.
6 The advantages of ontologies
- to allow more effective data and knowledge sharing
- to facilitate knowledge re-use in decision support systems
- to give a theoretical foundation to vocabulary standardization activity
7 Our task
- We learn domain ontologies (in medicine, environment) by integrating the conceptual models that can be extracted from terminological sources.
- The goal is to build Domain Reference Ontologies in the form of modular libraries of formal theories.
- In our ONIONS methodology, ontology learning needs both incremental bottom-up learning from sources and incremental definition and reuse of general theories that can account for the intended meaning of terms.
8 ONtologic Integration Of Naïve Sources
9 Minimal history
- The ONIONS methodology for ontology integration has been developed since the early 1990s to account for the problem of conceptual heterogeneity. It addresses some problems encountered in the context of the European project GALEN and the Italian projects SOLMC (Ontological and Linguistic Tools for Conceptual Modeling) and ONTOINT (Ontological Integration of Information).
10 Some related research projects
- GALEN, GALEN-IN-USE
- CYC anatomy
- SNOMED RT
- HL7 vocabulary committee
- MED
11 What is an ontology?
- A specification of a conceptualization (Gruber, 1993)
- The subject of ontology is the study of the categories of things that exist or may exist in some domain. The product of such a study, called an ontology, is a catalog of the types of things that are assumed to exist in a domain of interest D from the perspective of a person who uses a language L for the purpose of talking about D. ... (Sowa, 1997)
- A partial and indirect specification of a conceptualization (restricted notion; Guarino, 1998)
12 What is an ontology (restricted notion)?
- An ontology is a set of axioms that account for the intended meaning (the intended models) of a vocabulary (the namespace of a logical language).
- A set of axioms usually only approximates such intended models, which in turn only approximate the conceptualization of vocabulary items.
- A conceptualization is a set of conceptual relations that range over a domain and a set of relevant states of affairs (possible worlds) for that domain.
- Therefore, a precise definition of "ontology" (in a restricted, formal sense) might be "a partial specification of the intended models of the conceptualization of a vocabulary".
13 Types of ontologies (broad notion)
- Catalog of normalized terms, e.g. a list of terms used in the reports from a laboratory: no taxonomy, no axioms, and no glosses.
- Glossed catalog, e.g. a dictionary of medicine: a catalog with glosses.
- Thesaurus, e.g. many parts of the UMLS Metathesaurus, GEMET: a hierarchical collection of terms; the hierarchical link is usually polysemous.
- Taxonomy, e.g. the ICD10: a collection of classes with a partial order induced by inclusion (classification).
- Axiomatized taxonomy, e.g. the GALEN Core Model: a taxonomy with axioms.
- Ontology library, e.g. the Ontolingua repository: a set of axiomatized taxonomies with relations among them. Each element of the library is a module, which can be included in another one. Alternatively, a single concept from a module can be used in another one. Ontology modules can be considered subdivisions of the namespace of a model.
14 From Data Integration to Conceptual Integration
- Heterogeneous texts
- Heterogeneous semi-structured texts (retrieval of web data types and descriptions)
- Heterogeneous databases (schema integration, information brokering)
- => In all these cases, heterogeneity concerns the conceptualization of the terminology used in the sources
15 Polysemy and overlapping
- The primary causes of heterogeneity arise in the union of the vocabularies of any two sources:
  - polysemy (conceptual disalignment, difference of intended meaning of one name), and
  - conceptual overlapping (different names having overlapping meaning)
- Ontologies are therefore a major component in providing semantic access to (and integration of) terminological resources.
- Incidentally, polysemy is usually found within the same source as well (views, themes, homonyms).
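As a rough illustration of the two causes of heterogeneity, the following Python sketch scans two toy vocabularies. It rests on a strong simplifying assumption that is not made in the slides: the intended meaning of a name is stood in for by a plain set of instances. The function names `find_polysemy` and `find_overlaps` and the sample data are hypothetical.

```python
def find_polysemy(vocab_a, vocab_b):
    """Names used in both sources with different intended meanings."""
    shared = vocab_a.keys() & vocab_b.keys()
    return sorted(n for n in shared if vocab_a[n] != vocab_b[n])


def find_overlaps(vocab_a, vocab_b):
    """Pairs of different names whose meanings share at least one element."""
    return sorted(
        (a, b)
        for a, ext_a in vocab_a.items()
        for b, ext_b in vocab_b.items()
        if a != b and ext_a & ext_b
    )


# Toy vocabularies: each name maps to a frozenset standing in for its meaning.
SOURCE_A = {
    "body region": frozenset({"thorax-as-object", "abdomen-as-object"}),
    "watercourse": frozenset({"Tiber", "Po"}),
}
SOURCE_B = {
    "body region": frozenset({"thorax-as-place", "abdomen-as-place"}),
    "river": frozenset({"Tiber", "Danube"}),
}
```

On this data, `find_polysemy(SOURCE_A, SOURCE_B)` reports "body region", and `find_overlaps(SOURCE_A, SOURCE_B)` reports the pair ("watercourse", "river"). Real sources rarely expose meanings as sets, which is why the methodology relies on conceptual analysis rather than on a comparison of this kind.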
16 Ontology Learning
- From Natural Language
- From Semi-structured Data
- From Structured Data
- From Terminologies
- => Integration of sources needs (Principled) Conceptual Abstraction
17 Conceptual abstraction: an example
- The domain ontology A has "body region" with the intended meaning of a loosely specified part of the body that can be cut, filled, etc.
- The domain ontology B has "body region" with the intended meaning of a region of the body at which body parts are located.
- There is a metonymy acting on "body region" in A, whose intended meaning concerns body parts located at some region, although they are denoted by referring to the region itself (the intended meaning in B).
- Hence, the metonymic name should be distinguished from the plain name, and correctly related to it.
- The distinction between objects (body parts) and regions, and the notion of a localization relation holding between objects and regions, are both necessary to make the metonymy clear, and cannot be found in the specifications given in A or B. They have to be found in some generic theory.
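The body region example can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the ONIONS formalization: the classes `Region` and `BodyPart`, the attribute `located_at`, and the two helper functions are names invented here to stand for the generic distinction between objects and regions and the localization relation between them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    name: str


@dataclass(frozen=True)
class BodyPart:
    name: str
    located_at: Region  # generic axiom: every object is located at some region


def plain_sense(region):
    """Sense in B: the term denotes the region itself."""
    return region


def metonymic_sense(region, body_parts):
    """Sense in A: the body parts located at the region named by the term."""
    return {p for p in body_parts if p.located_at == region}


# Hypothetical sample data.
THORAX = Region("thorax")
ABDOMEN = Region("abdomen")
PARTS = [BodyPart("heart", THORAX), BodyPart("liver", ABDOMEN)]
```

Here `plain_sense(THORAX)` is the region, while `metonymic_sense(THORAX, PARTS)` is the set containing only the heart. The two senses are kept distinct and related through `located_at`, which neither A nor B supplies on its own.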
18 Ontology integration: conceptual issues
- Ontology integration is, generally speaking, the construction of an ontology C that formally specifies the union of the vocabularies of two other ontologies A and B.
- To be sure that A and B can be integrated at some level, C has to commit to both A's and B's conceptualizations. In other words, the intension of the concepts in A and B should be mapped to the intension of C's concepts.
- Unfortunately, this cannot be realized using only the conceptual relations specified in A and B for local tasks (for a specific context). The methodological principle adopted here is that generic ontologies reused from the philosophical, linguistic, mathematical, and AI literature must ground the comparison of different intensions. Our approach may be called principled conceptual integration.
19 Aspects of integration
- Three aspects of an ontology are taken into account:
  - the intended models of the conceptualizations of its vocabulary
  - the domain of interest of such models, i.e. the 'topic' of the ontology
  - the namespace of the ontology
- The most interesting case is when A and B are supposed to commit to the conceptualization of the same domain of interest or of two overlapping domains. In particular, A and B may be:
20 Some integration cases for the same topic
- Alternative ontologies: the intended models of the conceptualizations of A and B are different (they partially overlap or are completely disjoint) while the domain of interest is (mostly) the same. This is a typical case that requires integration: different descriptions of the same topic are to be integrated.
- Truly overlapping ontologies: both the intended models of the conceptualizations of A and B and their domains of interest have a substantial overlap. This is another frequent case of required integration: descriptions of strongly related topics are to be integrated.
- Equivalent ontologies with vocabulary mismatches: the intended models of the conceptualizations of A and B are the same, as well as the domain of interest, but the namespaces of A and B are overlapping or disjoint. This is the case of equivalent theories with alternative vocabularies.
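The three cases can be told apart mechanically once the three aspects of an ontology are available for comparison. The Python sketch below assumes, purely for illustration, that intended models, domain of interest, and namespace can each be represented as a finite set; it also reads "(mostly) the same" as strict equality. The `Ontology` class and `integration_case` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ontology:
    models: frozenset     # stand-in for the intended models
    domain: frozenset     # stand-in for the domain of interest (the topic)
    namespace: frozenset  # the names used


def integration_case(a, b):
    """Return a label for one of the three cases, or None if none applies."""
    if a.models == b.models and a.domain == b.domain:
        if a.namespace != b.namespace:
            return "equivalent with vocabulary mismatches"
        return None  # same theory, same names: nothing to integrate
    if a.domain == b.domain:
        return "alternative"
    if a.models & b.models and a.domain & b.domain:
        return "truly overlapping"
    return None
```

The order of the checks matters: equivalence is tested first, then identity of topic, and only then partial overlap of both models and topics.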
21 Ontological integration: operational issues
- Depending on the amount of change necessary for the operational integration of A and B, different levels of interoperability can be distinguished:
- Mediation: it requires no changes to A and B, but only mapping relations that describe the equivalence (partial or total) of A's and B's elements to C's elements. This may result in weak interoperability, since usually the intended models of A and B only overlap: some concepts from A may not have a correspondent in B, and vice versa. This is the design choice for some recent information brokering architectures. However, such architectures have a weak commitment towards a principled way of conceptual integration, possibly because of its additional cost.
- Alignment: it requires some change to fill the biggest gaps of A and B with respect to an ideal C that completely integrates A and B. Therefore, alignment requires at least a partial conceptual integration. It may support a limited interoperability: for example, deep inferences may be excluded.
- Unification: it may require a major reorganization of A and B, which are 'harmonized'. Unification intervenes on the inferential features of the systems, and consists in a complete operational integration: everything that can be done in one system can be done in the other. It results in the most complete interoperability but requires a complete conceptual integration as well. From the conceptual viewpoint, unification consists in the adoption of C as a standard in the systems using A or B.
22 Ontology integration: practical issues
- Lack of hierarchies
- Ambiguous hierarchies
- Informality
- Lack of modularity
- Polysemy
- Uncertain semantics
- Prototypical descriptions
- Ontological opaqueness
- Lack of a (minimal) set of axioms
- Confusing lexical clues
- Awkward naming policy
- 'Remainder' partitions
- 'Exception' partitions
- Terminological cycles
- Meta-level soup
- Low maintenance capabilities
23 Ontologies: some desiderata
- An explicit taxonomy with subsumption among concepts
- Semantic explicitness of links
- Modularity of namespace
- A stratified design of the modules
- Absence of polysemy within a module
- Disjointness of concepts within a module and within the top-level
- A proper interface between the ontology namespace and one or more sets of lexical realizations
- Linguistically meaningful naming policy (cognitive transparency)
- Rich documentation
- Some minimal axiomatization to detail the difference among sibling concepts
- Explicit linkage to concepts and relations from generic theories
- Meta-level assignments to distinguish among the formal primitives assigned to concepts
- Languages and implementations that support the previous needs as well as the possibility of collaborative modeling
24 The ONIONS Methodology
- ONIONS implementation is meant to provide extensive axiomatization, clear semantics, and ontological depth to a domain terminology.
- Extensive axiomatization is obtained through a conceptual analysis of the terminological sources and their representation in a logical language with a rigorous semantics.
- Ontological depth is obtained by reusing a library of generic ontologies, on which the axiomatization depends. Such a library may include multiple choices among partially incompatible ontologies. In particular, we suggest the importance of: mereology, or theory of parts; topology, or theory of wholes, connexity and boundaries; morphology, or theory of form and congruence; localization, or theory of regions; time theory; actors, or theory of participants in a process; dependence theory; and the theory of environmental niches.
25 The main steps (I)
- 0. Semantically opaque hierarchies and lists are pre-processed in order to create clean taxonomies.
- 1. All concepts, relations, templates, rules, and axioms from a source ontology are represented in the ONIONS formalisms, currently Loom, Ontolingua, and OKBC.
- 2. When available, plain text descriptions are analyzed and axiomatized (text formalization).
- 3. The union of such products is integrated by means of a set of generic ontologies. This is the most characteristic activity in ONIONS, which can be briefly described as follows:
26 The main steps (II)
- 3.1. For any set of sibling concepts in a taxonomy, the conceptual difference between each of them is inferred, and such difference is formalized by axioms that reuse the relations and concepts already in the library. If no concept is available to represent the difference, new concepts are added to the library.
- 3.2. For any set of polysemous senses of a term, different concepts are stated and placed within the library according to their topic and to the available modules. (Polysemy occurs when two concepts with overlapping or disjoint intended models have the same name.)
- 3.3. Often, polysemous senses of a term, as well as different 'alternative' concepts, are metonymically related. For example: process/outcome (as in inflammation), region/object (as in body region), etc. Alternatives must be properly defined by making explicit the relationship between them, e.g. "has-product" for inflammation, "location" for body-region.
- 3.4. When stating new concepts, the relations necessary to maintain consistency with the existing concepts are instantiated. If conflicts arise with existing theories, a more general, more comprehensive theory is sought. If this is impracticable, an alternative theory is created.
27 The main steps (III)
- 3.5. Relevant integration cases. Since ONIONS requires the use of generic theories to axiomatize alternative theories, the integration of a concept C from an ontology O is performed by comparing C with the concepts D1,...,Dn already present in the evolving ontology library L, whose ontology set M1,...,Mn contains at least a significant subset of generic ontologies and the set of domain ontologies at that state in the evolution of L. The following cases appear relevant to the methodology:
- 3.5.1. C's name is polysemous in O (internal polysemy). Iterate 3.2-3.4.
- 3.5.2. C's name is a homonym of the name of a Di. (Homonymy occurs when both the intended models and the domains of two concepts with the same name are disjoint.) Homonyms must be differentiated by modifying the name, or by preventing the homonyms from being included in the same module namespace.
- 3.5.3. C's name is a synonym of the name of a Di. (Synonymy is the converse of homonymy and occurs when two concepts with different names have both the same intended model and the same domain.) Synonyms must be preserved, or included in the set of lexical realizations related to the concept.
- 3.5.4. C is subsumed by some Di in L, but it has no total mapping on any Dj in L. The gap in L must be filled by adding C as a subconcept of Di.
28 The main steps (IV)
- 3.5.5. C is an intersection between two concepts Di and Dj in L. Solved by distinguishing types and roles, or different defining elements.
- 3.5.6. C has an alternative concept Di in L (same domain, but overlapping or disjoint intended models).
- 3.5.6.1. If C metonymically depends on Di, C is properly related to Di.
- 3.5.6.2. If C and Di are different viewpoints on the same domain of interest, both concepts are kept; if need be, they are included in separate modules.
- 3.5.6.3. If the intended model of C is finer than Di's, Di is substituted with C.
- 3.5.6.4. If the intended model of C is coarser than Di's, C is ignored (but track of it is kept for mapping between sources).
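Some of the integration cases in step 3.5 can be read as a decision procedure. The Python sketch below covers only cases 3.5.2, 3.5.3, 3.5.4 and 3.5.6, compares an incoming concept with a single library concept rather than with the whole library L, and approximates intended models and domains by finite sets. All of these are simplifying assumptions made for illustration; `Concept` and `compare` are hypothetical names, not part of ONIONS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    name: str
    models: frozenset  # stand-in for the intended models
    domain: frozenset  # stand-in for the domain of interest


def compare(c, d):
    """Label how incoming concept `c` relates to library concept `d`."""
    same_name = c.name == d.name
    if same_name and not (c.models & d.models) and not (c.domain & d.domain):
        return "homonym (3.5.2): rename or keep in separate modules"
    if not same_name and c.models == d.models and c.domain == d.domain:
        return "synonym (3.5.3): add name to lexical realizations"
    if c.models < d.models:
        return "subsumed (3.5.4): add as subconcept"
    if c.domain == d.domain and c.models != d.models:
        return "alternative (3.5.6): relate, keep apart, or replace"
    return "no case matched"
```

Deciding among the sub-cases of 3.5.6 (metonymy, viewpoint, finer or coarser model) needs the generic theories in the library and is deliberately left out of this sketch.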
29 The main steps (V)
- 4. The library of generic, intermediate, and domain ontologies should be stratified: domain modules should include intermediate modules, which in turn should include generic modules, so that each set of modules can be plugged into or unplugged from its more general set without affecting the coherence of the entire library.
- 5. The source ontologies are explicitly mapped to the integrated ontology, in order to allow interoperability. The only admitted mappings are equivalent and coarser equivalent. Formally: for any source ontology SO and an ontology IO that is supposed to result (also) from the integration of SO, for any concept $C_i$ in SO, there is a $D_i$ in IO such that $C_i^I = D_i^I$ (equivalence of possible interpretations), or there is a disjunctive concept (or $D_i$ $D_j$) in IO such that $C_i^I = D_i^I \cup D_j^I$ (equivalence of possible interpretations to a disjunction of concepts, i.e. to a union of finer concepts).
- 5.1. Partial mappings must have been already resolved through the methodology; if any remain, some step in the integration procedure must be iterated.
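The two admitted mappings in step 5 can be checked with a brute-force search over toy data. The Python sketch below is an assumption-laden illustration: interpretations are finite sets, the integrated ontology is a dictionary from concept names to such sets, and the disjunction is generalized from two concepts to any number. The function `find_mapping` and the sample data are hypothetical.

```python
from itertools import combinations


def find_mapping(source_ext, integrated):
    """Return the names of integrated concepts whose union of extensions
    equals `source_ext`: one name (equivalent), several (coarser
    equivalent), or None if only a partial mapping exists."""
    for size in range(1, len(integrated) + 1):
        for names in combinations(sorted(integrated), size):
            union = frozenset().union(*(integrated[n] for n in names))
            if union == source_ext:
                return names
    return None


# Hypothetical integrated ontology: concept name -> extension.
INTEGRATED = {
    "simple fracture": frozenset({"f1"}),
    "compound fracture": frozenset({"f2"}),
    "wound": frozenset({"w1"}),
}
```

A source concept with extension `{"f1"}` maps to a single concept (equivalent); one with `{"f1", "f2"}` maps to the disjunction of the two fracture concepts (coarser equivalent); one with `{"f1", "x"}` yields `None`, which corresponds to a partial mapping that step 5.1 sends back for another iteration. The search is exponential in the number of concepts and is meant only to make the definition tangible.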
30 Ambiguous hierarchies
31 A principled formalization
(defconcept ununited-fracture
  :is-primitive (and fracture
                     (some morphology
                           (and bone
                                (or (some embodies malunion)
                                    (not integral))))
                     (some dependently-postdates fracture)
                     (all interpretant clinical-condition)))
32 Some UMLS concepts pertaining to the intersection of Amino Acid, Peptide, or Protein and Carbohydrate
- (hamster oviduct-specific glycoprotein)
- (Par j I)
- ((Man)6(GlcNAc)2Asn)
- (Zn(2)-IAA)
- (collapsing factor)
- (BDV 18K glycoprotein)
- (SI-gene-associated glycoprotein, Nicotiana)
- (FdI allergen)
- (sca gene product)
- (EPV20 protein)
- (lubricin)
- (Pluritene)
- (Par h 1 allergen)
- (Wnt11 gene product)
- (I-D-Gal-BSA)
- (mannose-bovine serum albumin conjugate)
- (acrosome granule lysin)
- (sulfatide activator)
- (vaccinia virus A34R protein)
=> More than 118,000 UMLS concepts (25%) are classified under an intersection
33 Ontological analysis of the intersection
(defconcept Amino Acid, Peptide, or Protein Carbohydrate
  "834 instances. This conjunct includes two sibling types.
   A protein containing a carbohydrate."
  :annotations ((Sugg.Name "carbohydrate-containing-protein")
                (onto-status integrated))
  :is-primitive (and protein
                     (some has-component carbohydrate))
  :context substances)
34 Morphologies
- Names of anatomical morphologies are often polysemous. They may denote:
  - both a condition and the function that caused the condition ("inflammation", "ulcer", "fracture", "wound", "hyperplasia")
  - both an object and the function that produced the object ("neoplasm", "hemorrhage")
  - both an object O and the condition created in another object O' by O ("obstruction")
- For example: "the fracture has been caused by a fall" vs. "the fracture is transverse"; "the obstruction occurred in the jejunum" vs. "the obstruction has been removed".
- Conceptual analysis brings out other issues concerning morphologies:
  - The dependence between a morphological condition, a function, and the related organ. For example, an "ulcer" (as a condition) of a stomach implies that the stomach embodies an ulceration function (an ulcer as a function).
  - The mereological import of morphologies: some are featured by an organ, some only by a part of an organ. For instance, an "ectopic heart" is wholly ectopic, but an "ulcerated stomach" is only partly ulcerated.
35 Morphologies analyzed
- a property ("color", "consistency", "thickness", "size", "number", "shape")
- a condition
- a topologically relevant condition
- an alteration of connection
- that creates a configuration (a new property) in an object ("fracture", "wound")
- in the holey interior of an object ("obstruction")
- between several objects ("fusion")
- an alteration of the boundary between an object's holey interior and the object's complement
- creating a configuration in the boundary ("cavitation", "ulcer")
- producing a substance flow ("hemorrhage", "ulcer")
- an abnormal placement ("dislocation", "ectopia", "absence")
- a form alteration condition ("deformity", "hyperplasia", "hypoplasia")
- a condition involving the alteration of several properties ("inflammation", "eruption")
- an abnormal, foreign object ("mass", "neoplasm", "calculus", "obstruction")
36 Making relations explicit
37 Medical source ontologies
- The UMLS top-level (1998 edition: 132 "semantic types", 91 "relations", and 412 "templates"),
- The SNOMED-III top-level (510 "terms" and 25 "links"),
- GMN top-level (708 "terms"),
- The ICD10 top-level (185 "terms"), and
- The GALEN Core Model v.5h (2,730 "entities", 413 "attributes" and 1,692 axioms), etc.
- The 1998 edition of the UMLS Metathesaurus (476,000 "concepts", 93,000 explicit templates, and 599,000 thesaurus-like templates)
38 The current ON9.2 library
39 The current top-level
40 Tools
- Tool for representation: ONTOLINGUA
- Tool for representation and classification: LOOM
- Tool for intermediate representation and interchange: OKBC
- Tool for browsing and editing: ONTOSAURUS
42 Results
- ON9.2: integration of the medical top levels within a library of generic theories. It includes a set of 50 modules with about 1,500 concepts. It is available in both the Ontolingua and Loom languages.
- Explicitation of the Metathesaurus terminological knowledge: intersections of UMLS semantic types, relations defined by sources (IS_A and other relations)
- Integration of the Metathesaurus intersections within ON9.2
- Contextualization of the Metathesaurus
- An integrated model of clinical guidelines
43 What is a Domain Reference Ontology?
- An ontology usable to build new ontologies in a domain, or to plug existing ontologies into it
- Our research in medical conceptual structures aims at defining a Medical Reference Ontology (library)
- The current research in environmental metadata could be reconsidered as the construction of an Environmental Reference Ontology
- We are confident that our methodology is suitable for this task without substantial revision
- Warning: at first sight, conceptual heterogeneity in the environmental domain seems harder than in medicine
44
- "Es gibt nichts Praktischeres als eine gute Theorie"
- "There is nothing more practical than a good theory"
- (Ludwig von Boltzmann)
46 References
- For generalities, the library, and conceptual investigations:
  - Gangemi A, Pisanelli DM, Steve G, "An overview of the ONIONS project: Applying ontologies to the integration of medical terminologies", Data and Knowledge Engineering, 31 (1999), 183-220
- For the investigation of the UMLS:
  - Pisanelli DM, Gangemi A, Steve G, "An Ontological Analysis of the UMLS Metathesaurus", Journal of American Medical Informatics Association, vol. 5 (symposium supplement), 1998
- For the pre-processing of informal terminological repositories:
  - Steve G, Gangemi A, Pisanelli DM, "Integrating Medical Terminologies with ONIONS Methodology", in Kangassalo H, Charrel JP (eds.) Information Modelling and Knowledge Bases VIII, Amsterdam, IOS Press 1997
- For the integration of clinical guidelines:
  - Pisanelli DM, Gangemi A, Steve G, "Toward a Standard for Guideline Representation: an Ontological Approach", Journal of American Medical Informatics Association, vol. 6 (symposium supplement), 1999