Title: Ontology and Its Applications Barry Smith http://ontologist.com
1Ontology and Its Applications
Barry Smith
http://ontologist.com
2OVERVIEW
- Part I: A Brief Overview of Developments in
Ontology at the Borderlines of Philosophy and
Computation
- Part II: Ontology and Biomedical Informatics
3IFOMIS
- now part of
- European Centre for Ontological Research,
Saarbrücken, Germany
4Institute for Formal Ontology and Medical
Information Science
- 16 staff
- 2 medical informaticians
- 1 neurologist
- 1 chemist
- 1 radiologist
- 2 computer scientists
- 9 philosophers
5The problem
- Different communities of researchers use
different and often incompatible concepts /
categories in expressing the results of their work
6Example: Medicine
- blood is a tissue
- blood is a body fluid
- How to integrate competing conceptualizations?
7- Example: Molecular Biology
- GDB
- Genome Database of Human Genome Project
- GenBank
- National Center for Biotechnology Information,
Washington DC
8What is a gene?
- GDB: a gene is a DNA fragment that can be
transcribed and translated into a protein
- GenBank: a gene is a DNA region of biological
interest with a name and that carries a genetic
trait or phenotype
9How to integrate competing conceptualizations
- for example across the granular divide between
medicine and molecular biology?
10Answer
- ONTOLOGY!
- But what does "ontology" mean?
11Three senses of ontology
- Philosophical sense
- Aristotle: an inventory of the types of entities
and relations in reality
- Quine: an inventory of ontological commitments
- Knowledge engineering sense: an ontology as a
consensus representation of the concepts used in
a given domain
- Gene Ontology sense: a controlled vocabulary for
database annotation / indexing
12Two Communities
- Reference Ontology Community: an ontology is an
inventory of the types of entities and relations
which exist in a given domain of reality
- KR Community: an ontology is a consensus
representation of the concepts used in a given
domain of discourse
13Ontology as used in KR / AI
- had its roots in Quine's doctrine of ontological
commitment and in the internal metaphysics of
Carnap/Putnam
14Quineanism
- ontology is the study of the ontological
commitments or presuppositions embodied in
scientific theories
- (or in the beliefs of those experts,
- or in the databases of that company)
15Quineanism, too, faces the integration problem
- If an ontology is the set of ontological
commitments of a theory
- how can we cope with questions pertaining to the
relations between the objects to which different
theories are committed?
- Quine can tell us what there is
- but can he tell us how it is related together?
16The problem of the unity of science
- The logical positivist solution to this problem
addressed a world in which sciences are
identified with
- printed texts
- What if sciences are identified with
- information systems
- or with
- the contents of websites?
17The Semantic Web Initiative
- The Web is a vast edifice of heterogeneous data
sources
- Needs the ability to query and integrate across
different and often incompatible conceptual
systems
18How to resolve such incompatibilities and make the
various parts of the web interoperable?
- Enforce conceptual compatibility via
standardized taxonomies applied to websites as
meta-tags formulated within the framework of a
common web language like OWL
19Tim Berners-Lee
- hyperlinked vocabularies, called ontologies,
will be used by Web authors to explicitly define
their words and concepts as they post their stuff
online.
- codes would let software "agents" analyze the
Web on our behalf, making smart inferences that
go far beyond the simple linguistic analyses
performed by today's search engines.
20A new silver bullet
21Metadata in Web commerce
- agree on a metadata standard for washing
machines as concerns size, price, etc.
- create machine-readable databases and put them
on the net
- so that consumers can query multiple sites
simultaneously
- and search for highly specific, reliable,
context-sensitive results
22Metadata in science
- agree on metadata standards for molecules
(genes, proteins, drugs), clinical phenomena,
therapies ...
- create machine-readable databases and put them
on the net
- so that biomedical researchers can query multiple
sites simultaneously
- and search for highly specific, reliable,
context-sensitive results
23A world of exhaustive, reliable metadata
- would be utopia
- (Cory Doctorow)
24Problem 1: People lie
- Cheating in assigning meta-tags can confer
benefits to the cheaters
- Metadata exists in a competitive world.
- Some people are crooks.
- Some people are cranks.
25Semantic Web effort
- thus far devoted primarily to developing systems
for standardized representation of web pages and
web processes
- (ontology of web typography)
- not to the harder task of developing ontologies
- (reliable taxonomies, term hierarchies)
- for the content of such web pages
26Problem 2: People are lazy
- Half the pages on Geocities are called "Please
title this page"
27Problem 3: People are stupid
- The vast majority of the Internet's users
- (even those who are native speakers of English)
- cannot spell or punctuate
- Will internet users learn to accurately tag
their information with whatever taxonomy and
syntax they're supposed to be using?
28even with correct XML-syntax
<BUSINESS-CARD>
<FIRSTNAME>Jules</FIRSTNAME>
<LASTNAME>Deryck</LASTNAME>
<COMPANY>Newco</COMPANY>
<MEMBEROF>XTC Group</MEMBEROF>
<JOBTITLE>Business Manager</JOBTITLE>
<TEL>32(0)3.471.99.60</TEL>
<FAX>32(0)3.891.99.65</FAX>
<GSM>32(0)465.23.04.34</GSM>
<WEBSITE>www.newco.com</WEBSITE>
<ADDRESS>
<STREET>Dendersesteenweg 17</STREET>
29errors still abound
Is "Jules" the first name of the person, or of
the business-card?
<BUSINESS-CARD>
<FIRSTNAME>Jules</FIRSTNAME>
<LASTNAME>Deryck</LASTNAME>
<COMPANY>Newco</COMPANY>
<MEMBEROF>XTC Group</MEMBEROF>
<JOBTITLE>Business Manager</JOBTITLE>
<TEL>32(0)3.471.99.60</TEL>
<FAX>32(0)3.891.99.65</FAX>
<GSM>32(0)465.23.04.34</GSM>
<WEBSITE>www.newco.com</WEBSITE>
<ADDRESS>
<STREET>Dendersesteenweg 17</STREET>
<ZIP>2630</ZIP>
30errors still abound
Is Jules or Newco the member of XTC Group?
<BUSINESS-CARD>
<FIRSTNAME>Jules</FIRSTNAME>
<LASTNAME>Deryck</LASTNAME>
<COMPANY>Newco</COMPANY>
<MEMBEROF>XTC Group</MEMBEROF>
<JOBTITLE>Business Manager</JOBTITLE>
<TEL>32(0)3.471.99.60</TEL>
<FAX>32(0)3.891.99.65</FAX>
<GSM>32(0)465.23.04.34</GSM>
<WEBSITE>www.newco.com</WEBSITE>
<ADDRESS>
<STREET>Dendersesteenweg 17</STREET>
<ZIP>2630</ZIP>
<CITY>Aartselaar</CITY>
<COUNTRY>Belgium</COUNTRY>
</ADDRESS>
</BUSINESS-CARD>
31errors still abound
<BUSINESS-CARD>
<FIRSTNAME>Jules</FIRSTNAME>
<LASTNAME>Deryck</LASTNAME>
<COMPANY>Newco</COMPANY>
<MEMBEROF>XTC Group</MEMBEROF>
<JOBTITLE>Business Manager</JOBTITLE>
<TEL>32(0)3.471.99.60</TEL>
<FAX>32(0)3.891.99.65</FAX>
<GSM>32(0)465.23.04.34</GSM>
<WEBSITE>www.newco.com</WEBSITE>
<ADDRESS>
<STREET>Dendersesteenweg 17</STREET>
<ZIP>2630</ZIP>
<CITY>Aartselaar</CITY>
<COUNTRY>Belgium</COUNTRY>
</ADDRESS>
</BUSINESS-CARD>
Do the phone numbers and address belong to Jules
or to the business?
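The point of the business-card slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses a shortened copy of the card, and the helper name flat_fields is hypothetical. The standard-library parser accepts the markup without complaint, yet all it recovers is that each field is a child of BUSINESS-CARD.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the business card from the slides (assumption: the
# remaining fields are omitted only to keep the sketch short).
CARD = """
<BUSINESS-CARD>
  <FIRSTNAME>Jules</FIRSTNAME>
  <LASTNAME>Deryck</LASTNAME>
  <COMPANY>Newco</COMPANY>
  <MEMBEROF>XTC Group</MEMBEROF>
  <JOBTITLE>Business Manager</JOBTITLE>
  <TEL>32(0)3.471.99.60</TEL>
</BUSINESS-CARD>
""".strip()


def flat_fields(xml_text):
    """Return {tag: text} for the direct children of the root element."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "").strip() for child in root}


fields = flat_fields(CARD)

# Parsing succeeds, so the syntax is correct. But every field hangs off
# the same parent: the markup does not say whether MEMBEROF or TEL
# describes the person (Jules) or the company (Newco).
for tag, value in fields.items():
    print(f"BUSINESS-CARD has {tag} = {value}")
```

Deciding who is the member of XTC Group needs information about the entities involved, which the tag structure alone does not carry.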
32Problem 4: Building good ontologies/standardized
taxonomies is very difficult
- and the constraints imposed by OWL and similar
languages make the job even harder
33Problem 5: Ontology Impedance
- semantic mismatch between ontologies
- "gene" as used in websites issued by
- biotech companies involved in gene patenting
- medical researchers interested in role of genes
in predisposition to smoking
- insurance companies
34Problem 6: The Concept Orientation
- Tom Gruber: An ontology is a specification of a
conceptualization
- Semantic Web: specify Tom's, and Dick's, and
Harry's conceptualizations carefully,
- ensure that all are formulated in a common
(XML-based) syntax
- Presto: conceptualizations will somehow become
integrated
35even a world of exhaustive, reliable metadata
- would not solve the problem of integration
36expressing different systems of concepts
- in a common syntactic environment does not
resolve conceptual incompatibilities
37different conceptualizations
38need not interconnect at all
39we cannot make incompatible terminology-systems
interconnect
just by looking at concepts, or knowledge or
language
40to decide which of a plurality of competing
conceptualizations to accept
we need some tertium quid
41we need, in other words,
to take the world itself into account
42Compare the way biologists resolve disagreements
as to whether they mean the same thing by
different words
- by pointing to the objects in their lab
44The Semantic Web
- is a machine for creating syllogisms (Clay
Shirky)
- Humans are mortal. Greeks are human. Therefore,
Greeks are mortal.
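The kind of inference meant here can be sketched as simple chaining over "all A are B" statements. The following is a minimal Python illustration, not a description of any Semantic Web reasoner; the function name entails and the tuple encoding of premises are assumptions made for the example.

```python
def entails(premises, subject, predicate):
    """True if 'all subject are predicate' follows by chaining premises.

    Each premise is a pair (a, b), read as 'all a are b'.
    """
    seen, frontier = set(), [subject]
    while frontier:
        current = frontier.pop()
        if current == predicate:
            return True
        if current in seen:
            continue
        seen.add(current)
        # Follow every premise whose subject is the current class.
        frontier.extend(b for a, b in premises if a == current)
    return False


premises = [("human", "mortal"), ("Greek", "human")]

print(entails(premises, "Greek", "mortal"))   # chains Greek -> human -> mortal
print(entails(premises, "mortal", "Greek"))   # the converse does not follow
```

The machine can only chain what it is given: if the premises use "human" in two different senses, the conclusion is derived all the same.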
45Lewis Carroll
- No interesting poems are unpopular among people
of real taste. No modern poetry is free from
affectation. All your poems are on the subject
of soap-bubbles. No affected poetry is popular
among people of real taste. No ancient poetry is
on the subject of soap-bubbles.
- Therefore: All your poems are bad.
46the promise of the Semantic Web
- it will improve all the areas of your life where
you currently use syllogisms
47Semantic Web
- compatibility problems should be solved
automatically
- (by machine)
- Hence ontologies must be applications running in
real time
48Semantic Web methodology
- Get syntax right first
- (Conceptualism: weak expressive resources, weak
Description Logics to ensure computational
tractability)
- and integration of concepts will take care of
itself
- but only at the price of Procrustean
simplification
49IFOMIS methodology
- Get ontology right first
- (use powerful logic to develop ontology as theory
of reality
- and solve tractability problems later)
- only thus will we have some hope of genuine
integration across different disciplines and data
resources
50Belnap
- it is a good thing logicians were around before
computer scientists
- if computer scientists had got there first,
then we wouldn't have numbers
- because arithmetic is undecidable
51It is a good thing
- philosophical ontology was around before
Description Logics, because otherwise
- we would have only hierarchies of concepts
together with abstract mathematical models
- and no universals or instances in reality
52Recall
- GDB: a gene is a DNA fragment that can be
transcribed and translated into a protein
- GenBank: a gene is a DNA region of biological
interest with a name and that carries a genetic
trait or phenotype
53Ontology
- fragment, region, name, carry, trait,
type
- ... part, whole, function, inhere,
substance
- are ontological terms in the sense of traditional
(philosophical) ontology
54The idea of a reference ontology
- a theory of the kinds of entities existing in
reality and of the relations between them
55The Reference Ontology Community
- IFOMIS (Saarbrücken)
- Laboratories for Applied Ontology (Trento/Rome,
Turin)
- Ontology Works (Baltimore)
- Department of Biological Structure (Seattle)
- Medical Ontology Research (Bethesda)
- The Gene Ontology / Open Biological Ontologies
Consortium
56IFOMIS's long-term goal
- Build a robust high-level reference ontology
- THE WORLD'S FIRST INDUSTRIAL-STRENGTH PHILOSOPHY
- as the basis for an ontologically coherent
unification of biomedical knowledge and
terminology
57Two upper-level reference ontologies
- BFO (Saarbrücken) Basic Formal Ontology
- DOLCE (Trento/Rome)
58Aristotle
First ontologist
59Edmund Husserl
60 Formal Ontology
- term coined by Husserl
- the theory of those ontological structures
- such as part-whole, universal-particular
- which apply to all domains whatsoever
61Husserl's Logical Investigations (1900/01)
- Aristotelian theory of universals and particulars
- theory of part and whole
- theory of ontological dependence
- the theory of boundaries and fusion
62Formal Ontology
- contrasted with material or regional ontologies
- (compare relation between pure and applied
mathematics)
- Husserl's idea
- If we can build a good formal ontology, this
should save time and effort in building reference
ontologies for each successive material domain
63 In formal ontology
- as in formal logic, we can grasp the properties
of given structures in such a way as to establish
in one go the properties of all formally similar
structures
64Compare
- pure mathematics (theories of structures such as
order, set, function, mapping) employed in every
domain
- applied mathematics, applications of these
theories re-using the same definitions,
theorems, proofs in new application domains
- physical chemistry, biophysics, etc. adding
detail
65Three levels of ontology
- 1) formal (top-level) ontology
- biomedical ontology has nothing like the
technology of definitions, theorems and proofs
provided by pure mathematics
- 2) domain ontology
- UMLS Semantic Network, GO, GALEN CORE
- 3) terminology-based ontology
- UMLS, SNOMED-CT, GALEN, FMA
67The Concept Orientation: An ontology is a
consensus representation of concepts
68"concept" runs together
- meaning shared in common by synonymous terms
- idea shared in common in the minds of those who
use these terms
- universal, type, feature or property shared in
common by entities in the world
69There are more word meanings than there are
universals / types of entities in reality
- unicorn
- devil
- canceled workshop
- prevented pregnancy
- imagined mammal
- fractured lip ...
70[Figure: space of word meanings; space of universals]
71[Figure: space of word meanings; space of universals]
72[Figure: space of word meanings; space of universals]
74if ontological relations are defined across the
whole space of word meanings
- rather than across the space of universals
instantiated in reality
- then our tools for dealing with such relations
are blunted
75meningitis is_a disease of the nervous system
is a statement about universals in reality
76A is_a B =def. A is narrower in meaning
than B
- unicorn is_a one-horned mammal
77The linguistic reading of concept
- yields a smudgy view of reality, built out of
relations like
- synonymous_with
78Goble & Shadbolt
79The concept-based approach
- can provide some half-way coherent treatment of
is_a relations
80but it can't cope at all with relations like
- part_of =def. composes, with one or more other
physical units, some larger whole
- contains =def. is the receptacle for fluids or
other substances
81connected_to =def. Directly attached to another
physical unit as tendons are connected to
muscles.
- How can a meaning or concept be directly
attached to another physical unit as tendons are
connected to muscles ?
82An example of the concept orientation
- Unified Medical Language System (UMLS)
83- UMLS Metathesaurus
- 1 million biomedical concepts
- 2.8 million concept names
- from more than 100 controlled vocabularies and
classifications
- built by US National Library of Medicine
84UMLS Source Vocabularies
- MeSH: Medical Subject Headings
- ICD: International Classification of Diseases
- GO: Gene Ontology
- FMA: Foundational Model of Anatomy
85To reap the benefits of standardization
- we need to make ONE SYSTEM out of many different
terminologies
- UMLS Semantic Network
- nearest thing to an ontology in the UMLS
86UMLS SN
- described by its authors as "An Upper Level
Ontology for the Biomedical Domain"
- (Compare the Semantic Web initiative)
87UMLS SN
- 134 Semantic Types
- 54 types of edges (relations)
- yielding a graph containing more than 6,000 edges
88Fragment of UMLS SN
91UMLS SN Top Level
- entity, event
- under entity: physical object, conceptual entity
- under physical object: organism
92conceptual entity
- Organism Attribute
- Finding
- Idea or Concept
- Occupation or Discipline
- Organization
- Group
- Group Attribute
- Intellectual Product
- Language
93- conceptual entity
- idea or concept
- functional concept
- body system
94- entity
- physical object, conceptual entity
- idea or concept
- functional concept
- body system
confusion of entity and concept
95Functional Concept
- Body system is_a Functional Concept.
- but
- Concepts do not perform functions or have
physical parts.
96This
is not a concept
97Confusion of Ontology and Epistemology
- Physical Object
- Substance
- Food, Chemical, Body Substance
98Confusion of Ontology and Epistemology
- Chemical
- Chemical Viewed Structurally, Chemical Viewed
Functionally
99- Chemical
- Chemical Viewed Structurally, Chemical Viewed
Functionally
- Inorganic Chemical, Organic Chemical, Enzyme,
Biomedical or Dental Material
100- Chemical
- Chemical Viewed Structurally, Chemical Viewed
Functionally
- Inorganic Chemical, Organic Chemical,
Biomedical or Dental Material
- Enzyme
101The Hydraulic Equation
- BP = CO × PVR
- arterial blood pressure is directly proportional
to the product of blood flow (cardiac output, CO)
and peripheral vascular resistance (PVR)
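As a worked illustration of the proportionality, here is a minimal Python sketch. The units (CO in L/min, PVR in mmHg·min/L) and the numerical values are assumptions chosen for the example, not figures from the slides; the function name arterial_pressure is likewise hypothetical.

```python
def arterial_pressure(cardiac_output_l_min, resistance_mmhg_min_per_l):
    """BP = CO x PVR.

    Assumed units: CO in L/min, PVR in mmHg*min/L, so BP comes out in mmHg.
    """
    return cardiac_output_l_min * resistance_mmhg_min_per_l


# Illustrative values only.
baseline = arterial_pressure(5.0, 18.0)
doubled_output = arterial_pressure(10.0, 18.0)

print(baseline)         # 90.0
print(doubled_output)   # doubling CO at fixed PVR doubles BP
```

Both inputs are physical quantities of the organism. Nothing in the calculation refers to a laboratory result or a diagnostic procedure.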
102Confusion of Ontology and Epistemology
- blood pressure is an Organism Function,
- cardiac output is a Laboratory or Test Result or
Diagnostic Procedure
- BP = CO × PVR thus asserts that
- blood pressure is proportional either to a
laboratory or test result or to a diagnostic
procedure
103Fragment of UMLS SN
104UMLS Semantic Network
- anatomical abnormality associated_with daily or
recreational activity
- educational activity associated_with pathologic
function
- bacterium causes experimental model of disease
106GO the Gene Ontology
- 3 large telephone directories of standardized
designations for gene functions and products
- organized into hierarchies via is_a and part_of
107When a gene is identified
- three important types of questions need to be
addressed
- 1. Where is it located in the cell?
- 2. What functions does it have on the molecular
level?
- 3. To what biological processes do these
functions contribute?
108GO's three ontologies
109GO is three ontologies
- cellular components
- molecular functions
- biological processes
- December 16, 2003
- 1372 component terms
- 7271 function terms
- 8069 process terms
110The Cellular Component Ontology (counterpart of
anatomy)
- flagellum
- chromosome
- membrane
- cell wall
- nucleus
-
111The Molecular Function Ontology
- ice nucleation
- protein stabilization
- kinase activity
- binding
-
- The Molecular Function ontology is (roughly) an
ontology of actions on the molecular level of
granularity
112Biological Process Ontology
- Examples
- glycolysis
- death
- adult walking behavior
- response to blue light
- occurrents on the level of granularity of
cells, organs and whole organisms
113Each of GO's ontologies
- is organized in a graph-theoretical structure
involving two sorts of links or edges
- is-a (is a subtype of)
- (copulation is-a biological process)
- part-of
- (cell wall part-of cell)
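A graph with two sorts of edges can be sketched as follows. This is a minimal Python illustration and not GO's actual data model: the class TermGraph and the intermediate term "reproductive behavior" are hypothetical, added only so that there is a chain to follow.

```python
from collections import defaultdict


class TermGraph:
    """Directed graph of terms with typed edges (is_a, part_of)."""

    def __init__(self):
        # (relation, child) -> set of parent terms
        self.edges = defaultdict(set)

    def add(self, child, relation, parent):
        self.edges[(relation, child)].add(parent)

    def ancestors(self, term, relation):
        """All terms reachable from term by following relation upward."""
        found, stack = set(), [term]
        while stack:
            node = stack.pop()
            for parent in self.edges.get((relation, node), ()):
                if parent not in found:
                    found.add(parent)
                    stack.append(parent)
        return found


g = TermGraph()
# "reproductive behavior" is a hypothetical intermediate term.
g.add("copulation", "is_a", "reproductive behavior")
g.add("reproductive behavior", "is_a", "biological process")
g.add("cell wall", "part_of", "cell")

print(g.ancestors("copulation", "is_a"))
print(g.ancestors("cell wall", "part_of"))
```

Keeping the two relations in separate edge sets means that an is_a query never silently follows a part_of link, which is the kind of discipline the later slides argue GO lacks.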
115GO is species-independent
- an ontology of the unchanging universal building
blocks of life
- (substances and processes)
- and of the structures they form
117The Gene Ontology
- error-prone
- in part because of its sloppy treatment of
relations
- menopause part_of death
119Primary aim of GO
- not rigorous definition and principled
classification
- but rather providing a practically useful
framework for keeping track of the biological
annotations that are applied to gene products
120Problems with GO Molecular Functions
- anti-coagulant activity (defined as "a
substance that retards or prevents coagulation")
- enzyme activity (defined as "a substance that
catalyzes")
- structural molecule (defined as "the action of
a molecule that contributes to structural
integrity")
121GO:0005199 structural constituent of cell wall
- Definition: The action of a molecule that
contributes to the structural integrity of a cell
wall.
- confuses actions, which GO includes in its
function ontology, with constituents, which GO
includes in its cellular component ontology
124cars
- red cars, Cadillacs, cars with radios
125Why do these problems arise?
- Because GO has no clear formal understanding of
the role of relations in organizing an ontology
- (thus also no clear understanding of the
difference between a function and the activity
which is the realization of a function; GO runs
these two together)
126Thesis
- GO can realize its goal more adequately (and
avoid many coding errors) by taking ontology
(especially the logic of classifications and
definitions) seriously
127Digital Anatomist
- Foundational Model of Anatomy (Department of
Biological Structure, University of Washington,
Seattle)
The first crack in the wall of the Concept
Orientation
129[Figure: fragment of the Foundational Model of Anatomy, with is_a and
part_of links among: Anatomical Space, Anatomical Structure, Organ Cavity
Subdivision, Organ Cavity, Organ, Serous Sac, Organ Component, Serous Sac
Cavity, Tissue, Serous Sac Cavity Subdivision, Pleural Sac, Pleura (Wall
of Sac), Pleural Cavity, Parietal Pleura, Visceral Pleura, Interlobar
recess, Mediastinal Pleura, Mesothelium of Pleura]
130part_of
Reference Ontology for Anatomy at every level
of granularity
131The Gene Ontology
The second crack in the wall
- European Bioinformatics Institute, ...
- Open source
- Transgranular
- Cross-Species
- Components, Processes, Functions
132But
- No logical structure
- Viciously circular definitions
- Poor rules for coding, definitions, treatment of
relations, classifications
- so highly error-prone
133New GO / OBO Reform Effort
- OBO Open Biological Ontologies
134OBO Library
- Gene Ontology
- MGED Ontology
- Cell Ontology
- Disease Ontology
- Sequence Ontology
- Fungal Ontology
- Plant Ontology
- Mouse Anatomy Ontology
- Mouse Development Ontology
- ...
135coupled with
- Relations Ontology (IFOMIS)
- suite of relations for biomedical ontology to be
submitted to CEN as basis for standardization of
biomedical ontologies
- alignment of FMA and GALEN