Title: Realist Ontology for the Semantic Web: Applications in Biomedical Informatics
1Realist Ontology for the Semantic Web
Applications in Biomedical Informatics
- Werner Ceusters
- European Centre for Ontological Research
- Universität des Saarlandes
- Saarbrücken, Germany
2Lecture overview
- Credentials
- The many faces of ontology
- Realist ontology
- Why is the concept-based approach so widespread?
- The price you pay if you go for concepts ...
- Can Description Logics save the world?
- And then there was OWL
- Take home messages
3European Centre for Ontological Research
External members
Local members
Partners
Status Dec 2, 2004
4European Centre for Ontological Research
Directors
Member representatives
Management Board
Advisory Board
Status Dec 2, 2004
5Institute for Formal Ontology and Medical Information Science
- an interdisciplinary research group
- Philosophy,
- Computer and Information Science,
- Logic,
- Medicine,
- Medical Informatics.
- a center of theoretically grounded research in both formal and applied ontology.
- Main goal: to develop a formal ontology that will be applied and tested in the domain of medical and biomedical information science.
6IFOMIS competences
Status Dec 2, 2004
7Our building
8What philosophers are good for...
9Short personal history
10Ontology
11WordNet 2.0 - 2003
Is this going to realise the Semantic
Web ?????????????
12Ontology on the web
The most cited definition: Tom Gruber, 1993
Inactive since August 7, 2004.
W3C Web Ontology initiative
Ontology from a philosophical perspective.
Important bioinformatics resource
Realist ontology in use. Barry Smith
Popular ontology editor from Manchester
SUO Upper Ontology Initiative
John Sowa's ontology page
Status Nov 29, 2004
13New search on Nov 30: 10,000 more results
- 1 What is an Ontology?
- 3 Gene Ontology Consortium
- 4 W3C Web Ontology (WebOnt) Working Group (OWL) (Closed)
- 7 Buffalo Ontology Site
- 15 MGED NETWORK Ontology Working Group (OWG)
- 20 Laboratory for Applied Ontology (LOA)
- 21 ONTOLOGY WORKS INC.
- 34 John Bateman ontology portal root
- 53 The Protégé Ontology Editor and Knowledge Acquisition System
- 59 Institute for Formal Ontology and Medical Information Science ...
- 86 Autofellatio and Ontology
- 188 EUROREC 2004, Implemantation Guidelines, ...
- 192 Foundational Ontology (Leeds)
- 676 Ontology Server research (StarLab)
14If, later, you can remember just one thing of
this presentation, then make sure it is this one
- If you use the word ontology, ALWAYS be
specific about what you mean by it.
15Tom Gruber's view
- An ontology is a specification of a conceptualization.
- The word "ontology" seems to generate a lot of
controversy in discussions about AI. It has a
long history in philosophy, in which it refers to
the subject of existence. It is also often
confused with epistemology, which is about
knowledge and knowing.
- In the context of knowledge sharing, I use the
term ontology to mean a specification of a
conceptualization. That is, an ontology is a
description (like a formal specification of a
program) of the concepts and relationships that
can exist for an agent or a community of agents.
This definition is consistent with the usage of
ontology as set-of-concept-definitions, but more
general. And it is certainly a different sense of
the word than its use in philosophy.
16The O-word in science
N. Guarino, P. Giaretta, "Ontologies and Knowledge Bases: Towards a Terminological Clarification". In: Towards Very Large Knowledge Bases: Knowledge Building and Knowledge Sharing, N. Mars (ed.), pp. 25-32. IOS Press, Amsterdam, 1995.
17The O-word in buzz-speak
- An ontology is a classification methodology for formalizing a subject's knowledge or belief system in a structured way. Dictionaries and encyclopedias are examples of ontologies. (X1)
- A terminology (or classification) is a kind of ontology by definition and it should preserve (and "understand") the relationships between the 1,000s of terms in it or else it would become a mere dictionary (or at best a thesaurus). (X2)
- Ontologies are Web pages that contain a mystical unifying force that gives differing labels common meaning. (X3)
18Ontology
- An ontology defines the terms used to describe
and represent an area of knowledge, and are used
by people, databases, and applications that need
to share domain information (a domain is a
specific subject area, such as health or
medicine).
e-Health - making healthcare better for European citizens: An action plan for a European e-Health Area. COM (2004) 356 final, 30.4.2004, p. 17
OWL Web Ontology Language Use Cases and Requirements. W3C Recommendation, 10 February 2004. http://www.w3.org/TR/webont-req/
19Ontology
- Ontologies need to specify descriptions for the following kinds of concepts:
- Classes (general things) in the many domains of interest
- The relationships that can exist among things
- The properties (or attributes) those things may have
OWL Web Ontology Language Use Cases and Requirements. W3C Recommendation, 10 February 2004. http://www.w3.org/TR/webont-req/
20Realist Ontology
21A visit to the operating theatre
A lot of objects present
22A visit to the operating theatre
A lot of processes going on
Haydom Lutheran Hospital, Tanzania
23Axiom 1
- If the picture is not a fake, we (i.e., me and this audience) KNOW that that hand, that surgeon, ... EXIST(ed), i.e. ARE (were) REAL.
- But importantly: that hand, surgeon, kocher, mask, ... EXIST(ed) independent of our knowledge about them, and also the part-relationship between that hand and that surgeon, and the processes going on, are (were) equally real.
24The realist ontological square (Ignacio Angelelli)
(Diagram: a square whose corners are Substance Universals, Quality Universals, Substance Particulars and Quality Particulars; the edge labels are differentia, exemplify, instance (twice) and inheres.)
25How to differentiate qualities from substances?
- Language may fool us
- Being pale
- Being human
- Being a person
- Being sick
- Can all be properties of particulars, namely me
and you !
- But so does logic
- Pale(x)
- Human(x)
- Person(x)
- Sick(x)
26Realist ontology
- describes what is fundamental in the totality of what exists,
- defines the most general categories to which we need to refer in constructing a description of reality,
- tells us how these categories are related,
- can be used to describe reality at any point in time.
27Basic Ontological Notions
- Identity
- How are particulars distinguished from each other?
- Unity
- How are all the parts of a particular isolated ?
- Essence
- Can a property change over time ?
- Dependence
- Can an entity exist without some others ?
28Identity & instantiation
29A practical example: OntoClean
- +I: The property carries a common identity criterion for all its instances.
- -I: The property does not carry a common identity criterion for all its instances.
- +U: The property carries a common unity criterion for all its instances.
- -U: The property does not carry a common unity criterion for all its instances.
- ~U: No instance of the property satisfies a unity criterion.
- +R: The property is essential to all its instances: an instance of a rigid property cannot stop satisfying that property.
- -R: The property is not essential to all its instances: some instances of a non-rigid property can stop satisfying that property.
- ~R: No instance of the property has it essentially: all instances of the property can stop satisfying it.
Guarino & Welty
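Meta-properties of this kind are typically used to audit a taxonomy. The following is a minimal sketch, not the authors' tooling: it assumes a toy taxonomy and hand-assigned rigidity tags (both hypothetical) and checks one well-known OntoClean constraint, namely that an anti-rigid property cannot subsume a rigid one.

```python
# Hypothetical rigidity tags for two illustrative properties.
RIGIDITY = {"Person": "rigid", "Student": "anti-rigid"}

# Hypothetical taxonomy as (child, parent) subsumption links.
SUBSUMPTIONS = [("Student", "Person"), ("Person", "Student")]


def rigidity_violations(subsumptions, rigidity):
    """Return the links in which an anti-rigid parent subsumes a rigid child."""
    return [
        (child, parent)
        for child, parent in subsumptions
        if rigidity.get(parent) == "anti-rigid" and rigidity.get(child) == "rigid"
    ]


violations = rigidity_violations(SUBSUMPTIONS, RIGIDITY)
print(violations)
```

Only the link that places Person under Student is flagged; Student under Person is acceptable because a rigid property may subsume an anti-rigid one.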
30Ontological theories
- theories between reality and the ontology (ontology as a representation)
- Granular Partition Theory (T. Bittner & B. Smith)
- Logic of Classes (B. Smith)
- Foundational relations
31Theory of granular partitions (B. Smith)
Think of it as Alberti's grid
32Granular partitions: main principles
- a partition is the drawing of a (typically complex) fiat boundary over a certain domain
- a partition typically comes with labels and/or an address system
- partitions are artefacts of our cognition
- a partition is transparent (veridical)
- bona fide objects exist independently of our partitions; fiat objects are determined by partitions
- different partitions may represent cuts through the same reality which are skew to each other
- entities (existing in reality) located in the same cell of a partition share common characteristics
33(Simplified) Logic of classes
- primitive
- entities: particulars versus universals
- relation inst such that
- all classes are universals; all instances are particulars
- some particulars are not instances, e.g. some mereological sums
- subsumption defined by resorting to instances
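Defining subsumption through instances can be sketched in a few lines. This is an illustration only: the `INST` pairs are hypothetical, and the check runs over recorded instances, whereas the definition itself quantifies over everything that exists.

```python
# Hypothetical inst relation as (particular, universal) pairs.
INST = {
    ("john", "Human"), ("john", "Mammal"),
    ("rex", "Dog"), ("rex", "Mammal"),
}


def instances_of(universal, inst=INST):
    """All particulars recorded as instances of the given universal."""
    return {particular for particular, u in inst if u == universal}


def is_a(a, b, inst=INST):
    """a is_a b iff every recorded instance of a is also an instance of b."""
    return instances_of(a, inst) <= instances_of(b, inst)


print(is_a("Human", "Mammal"), is_a("Mammal", "Human"))
```

One consequence of the purely extensional check is that a universal with no recorded instances is subsumed by everything, which is one reason a realist account requires universals to have instances.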
34Reference Ontology
- a theory of a domain of entities in the world
- based on realizing the goals of maximal expressiveness and adequacy to reality
- sacrificing computational tractability for the sake of representational adequacy
35Basic Formal Ontology
- Basic Formal Ontology consists in a series of sub-ontologies (most properly conceived as a series of perspectives on reality), the most important of which are:
- SnapBFO, a series of snapshot ontologies (O_ti), indexed by times: continuants
- SpanBFO, a single videoscopic ontology (O_v): occurrents
- Each O_ti is an inventory of all entities existing at a time. O_v is an inventory (processory) of all processes unfolding through time.
36Occurrents and continuants
Picture by Vladimir Brajic
37(No Transcript)
38SpanBFO
39A Realist Ontology
- a computer-understandable representation of some pre-existing domain of REALITY, reflecting the properties of the objects within its domain in such a way that there obtain substantial and systematic correlations between reality and the ontology itself.
40Why is the concept-based approach so widespread?
41Back to the operating theatre
He wants me to remove that blood
I must get rid of that blood
Suction, please !
42This is communication !
Give me a kocher, please.
kocher
43Triadic models of meaning: the Semiotic/Semantic triangle
Reference: Concept / Sense / Model / View / Partition
Sign: Language / Term / Symbol
Referent: Reality / Object
44Aristotle's triadic meaning model
Words spoken are signs or symbols (symbola) of affections or impressions (pathemata) of the soul (psyche); written words (graphomena) are the signs of words spoken (phoné). As writing (grammata), so also is speech not the same for all races of men. But the mental affections themselves, of which these words are primarily signs (semeia), are the same for the whole of mankind, as are also the objects (pragmata) of which those affections are representations or likenesses, images, copies (homoiomata).
Aristotle, 'On Interpretation', 1.16.a.4-9. Translated by Cooke & Tredennick, Loeb Classical Library, William Heinemann, London, UK, 1938.
pathema
semeia ? gramma/ phoné
pragma
45An interesting sidestep: understanding
- understanding: Latin substare
- literally: to stand under
- Webster's Dictionary (1961): understanding: the power to render experience intelligible by bringing perceived particulars under appropriate concepts.
- particulars: what is NOT SAID of a subject (Aristotle)
- substances: this patient, that tumor, ...
- qualities: the red of that patient's skin, his body temperature, blood pressure, ...
- processes: that incision made by that surgeon, the rise of that patient's temperature, ...
- concepts: may be taken in the above definition as Aristotle's universals: what is SAID OF a subject
- Substantial concepts: patient, tumor, ...
- Quality concepts: white, temperature
- ...
46Richards' semantic triangle
- Reference (concept): indicates the realm of memory where recollections of past experiences and contexts occur.
- Hence, as with Aristotle, the reference is mind-related: thought.
- But not the same for all; rather, individual mind-related
reference
symbol
referent
47Don't confuse with homonymy!
mole
48Different thoughts: homonymy
R2
R3
R1
mole skin lesion
mole unit
mole
mole animal
49And by the way, synonymy...
the Aristotelian view
Richards view
sweat
sweat
perspiration
perspiration
50Frege's view
- sense is an objective feature of how words are used and not a thought or concept in somebody's head
- 2 names with the same reference can have different senses (mst/ist)
- 2 names with the same sense have the same reference (synonyms)
- a name with a sense does not need to have a reference (Beethoven's 10th symphony)
sense
name
reference (referent)
51Ontology and the semantic triangle
- In Information Science
- An ontology is a description (like a formal
specification of a program) of the concepts and
relationships that can exist for an agent or a community of agents.
- In Philosophy
- Ontology is the science of what is, of the kinds
and structures of objects, properties, events,
processes and relations in every area of reality.
52Current state of the art on meaning in biomedical informatics
- A pervasive bias towards concepts
- Content wise
- Work based on ISO/TC37 that advocates the Ogden-Richards theory of meaning
- Corresponds with a linguistic reading of concept
- Architecture wise
- In Europe: work based on CEN/TC251 WG1 & WG2 that follow ISO/TC37
- In the US: HL7, inspired by Speech Act Theory
- Concepts used as elements of information models, hence mixing a linguistic and engineering reading.
53Before the introduction of concepts, it was even worse ...
- Characteristics of an ideal medical knowledge system:
- a unique code for each term (word, phrase)
- each code-term being defined
- each term independent, not defined as the result of other terms in the system
- synonyms recognisable through the codes
- to each code could be attached codes of related terms
- the system would encompass all of medicine
- the system would be in the public domain
- the format of the KB should be functionally described, independent from hard- or software
(C. Bishop, 1989)
54With concepts, it became
- Characteristics of an ideal medical knowledge system:
- a unique code for each term (word, phrase) and concept
- each code-term concept being defined
- each term concept independent, not defined as the result of other terms in the system ???
- synonyms recognisable through the codes concepts
- to each code concept could be attached codes concepts of related terms
- the system would encompass all of medicine
- the system would be in the public domain
- the format of the KB should be functionally described, independent from hard- or software
55Requirements for clinical vocabularies (1)
- Domain completeness: coverage of all possible terms that lie within a vocabulary's domain
- Non-vagueness: the term should represent the concept behind it as closely as possible
- Non-ambiguity: the same term cannot refer to more than one concept
- Non-redundancy: each concept must be represented by one unique identifier
(Cimino, 1989)
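Two of these requirements, non-ambiguity and non-redundancy, lend themselves to a mechanical check. The sketch below is an assumption-laden illustration: the rows, identifiers and concept labels are invented for the example and come from no real vocabulary.

```python
from collections import defaultdict

# Hypothetical vocabulary rows: (identifier, concept, term).
ROWS = [
    ("C1", "myocardial infarction", "heart attack"),
    ("C1", "myocardial infarction", "myocardial infarction"),
    ("C2", "nevus", "mole"),
    ("C3", "mole (unit)", "mole"),
    ("C4", "nevus", "naevus"),
]


def ambiguous_terms(rows):
    """Terms that refer to more than one concept (violates non-ambiguity)."""
    concepts = defaultdict(set)
    for _ident, concept, term in rows:
        concepts[term].add(concept)
    return {t: sorted(c) for t, c in concepts.items() if len(c) > 1}


def redundant_concepts(rows):
    """Concepts carried by more than one identifier (violates non-redundancy)."""
    idents = defaultdict(set)
    for ident, concept, _term in rows:
        idents[concept].add(ident)
    return {c: sorted(i) for c, i in idents.items() if len(i) > 1}


print(ambiguous_terms(ROWS))
print(redundant_concepts(ROWS))
```

Note that synonymy ("heart attack" and "myocardial infarction" sharing one identifier) passes both checks, as the requirements intend.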
56Requirements for clinical vocabularies (2)
- Synonymy: multiple ways for expressing a word (or concept) must be allowed
- Multiple classification: concepts must be allowed to be classified in multiple hierarchies
- Consistency of view: concepts must have the same relationships in all views
- Explicit relationships: all relationships (e.g. class, synonymy, ...) must be explicitly labelled.
57The price you pay if you go for concepts ...
58Borders classification of medicine
- Medicine
- Mental health
- Internal medicine
- Endocrinology
- Oversized endocrinology
- Gastro-enterology
- ...
- Pediatrics
- ...
- Oversized medicine
59MeSH: Medical Subject Headings
- Designed for bibliographic indexing, e.g. Index Medicus
- Basis for MEDLINE
- focuses on biomedicine and other basic healthcare sciences
- clinically very impoverished
- Consistency amongst indexers
- 60% for headings
- 30% for sub-headings
60MeSH Tree Structures - 2004
- Anatomy A
- Organisms B
- Diseases C
- Chemicals and Drugs D
- Analytical, Diagnostic and Therapeutic Techniques and Equipment E
- Psychiatry and Psychology F
- Biological Sciences G
- Physical Sciences H
- Anthropology, Education, Sociology and Social Phenomena I
- Technology and Food and Beverages J
- Humanities K
- Information Science L
- Persons M
- Health Care N
- Geographic Locations Z
What about this as a top ontology ???
61MeSH Tree Structures - 2004
- Cardiovascular Diseases C14
- Heart Diseases C14.280
- Arrhythmia C14.280.067
- Carcinoid Heart Disease C14.280.129
- Cardiomegaly C14.280.195
- Endocarditis C14.280.282
- Heart Aneurysm C14.280.358
- Heart Arrest C14.280.383
- Heart Defects, Congenital C14.280.400
- Aortic Coarctation C14.280.400.090
- Arrhythmogenic Right Ventricular Dysplasia C14.280.400.145
- Cor Triatriatum C14.280.400.200
- Coronary Vessel Anomalies C14.280.400.210
- Crisscross Heart C14.280.400.220
- Dextrocardia C14.280.400.280
62MeSH Tree Structures - 2004
- Body Regions A01
- Extremities A01.378
- Lower Extremity A01.378.610
- Buttocks A01.378.610.100
- Foot A01.378.610.250
- Ankle A01.378.610.250.149
- Forefoot, Human A01.378.610.250.300
- Heel A01.378.610.250.510
- Hip A01.378.610.400
- Knee A01.378.610.450
- Leg A01.378.610.500
- Thigh A01.378.610.750
The most abundant sort of mistakes !
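The tree numbers themselves encode the hierarchy, so the problem can be made visible mechanically. The sketch below uses only tree numbers copied from the slide above; the helper name `ancestors` is an assumption for illustration. It walks from Ankle to the root by dropping one dotted segment at a time.

```python
# Tree numbers copied from the slide above.
MESH = {
    "A01": "Body Regions",
    "A01.378": "Extremities",
    "A01.378.610": "Lower Extremity",
    "A01.378.610.250": "Foot",
    "A01.378.610.250.149": "Ankle",
}


def ancestors(tree_number, table=MESH):
    """Walk up the hierarchy by dropping the last dotted segment each time."""
    path = []
    while "." in tree_number:
        tree_number = tree_number.rsplit(".", 1)[0]
        path.append(table[tree_number])
    return path


print(ancestors("A01.378.610.250.149"))
# ['Foot', 'Lower Extremity', 'Extremities', 'Body Regions']
```

If every step were read as is_a, an ankle would be a kind of foot; the links only make sense as parthood, which the hierarchy does not distinguish from subsumption.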
63MeSH Tree Structures - 2004
- Body Regions A01
- Abdomen A01.047
- Back A01.176
- Breast A01.236
- Extremities A01.378
- Amputation Stumps A01.378.100
- Lower Extremity A01.378.610
- Upper Extremity A01.378.800
- Head A01.456
- Neck A01.598
- Pelvis A01.673
- Perineum A01.719
- Thorax A01.911
- Viscera A01.960
And here ?
64SNOMED International (1995)
- Multi-axial coding system
- morphology, disease, function, procedure, ...
- Each axis has an hierarchical structure
- Translations in other languages than English only for older versions
- Informal internal structuring
- Being translated in CG formalism, but with only internal consistency
- Possibility to generate meaningless concepts
- Mixing of hierarchies
- Bone
- Long Bone
- Periosteum
- Shaft
65Snomed International (1995) Number of records
(V3.1)
- T Topography 12,385
- M Morphology 4,991
- F Function 16,352
- L Living Organisms 24,265
- C Drugs & Biological Products 14,075
- A Physical Agents, Forces and Activities 1,355
- D Disease/Diagnosis 28,623
- P Procedures 27,033
- S Social Context 433
- J Occupations 1,886
- G General Modifiers 1,176
- TOTAL RECORDS 132,641
66Snomed International (1995): knowledge in the codes.
- posterior
- anatomic leaflet
- mitral
- cardiac valve
- cardiovascular
-
Why was this not a good idea ?
67Snomed International multiple ways to express
the same thing
- D5-46210 Acute appendicitis, NOS
- D5-46100 Appendicitis, NOS
- G-A231 Acute
- M-41000 Acute inflammation, NOS
- G-C006 In
- T-59200 Appendix, NOS
- G-A231 Acute
- M-40000 Inflammation, NOS
- G-C006 In
- T-59200 Appendix, NOS
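Why this matters for machines can be sketched as follows. The codes are copied from the slide; the `EXPANSION` table is a hypothetical hand-written decomposition and not part of SNOMED. Treating an expression as an unordered set of atoms is also a simplification that ignores which code the linker "In" actually connects.

```python
# Codes copied from the slide; EXPANSION is a hypothetical decomposition.
EXPANSION = {
    "D5-46210": ["G-A231", "M-40000", "G-C006", "T-59200"],  # Acute appendicitis
    "D5-46100": ["M-40000", "G-C006", "T-59200"],            # Appendicitis
    "M-41000": ["G-A231", "M-40000"],                        # Acute inflammation
}

# The four codings of acute appendicitis listed above.
EXPRESSIONS = [
    ["D5-46210"],
    ["D5-46100", "G-A231"],
    ["M-41000", "G-C006", "T-59200"],
    ["G-A231", "M-40000", "G-C006", "T-59200"],
]


def normalise(codes):
    """Expand composite codes recursively until only atomic codes remain."""
    atoms = set()
    for code in codes:
        if code in EXPANSION:
            atoms |= normalise(EXPANSION[code])
        else:
            atoms.add(code)
    return frozenset(atoms)


literal_forms = {tuple(e) for e in EXPRESSIONS}
normal_forms = {normalise(e) for e in EXPRESSIONS}
print(len(literal_forms), len(normal_forms))
```

Compared literally, the four records look like four different findings; only with an explicit decomposition, which the coding scheme itself does not supply, do they collapse into one.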
68The International Classification of diseases
(WHO).
- ...
- Chapter II Neoplasms (C00-D48)
- Chapter III Diseases of the Blood and Blood-forming organs and certain disorders involving the immune mechanism (D50-D89)
- Excludes: auto-immune disease (systemic) NOS (M35.9)
- ....
- Nutritional Anemias (D50-D53)
- D50 Iron deficiency anaemia
- Includes ...
- D50.0 Iron deficiency anaemia secondary to blood loss (chronic)
- Excludes ...
- D50.1 ...
- D51 Vit B12 deficiency anaemia
- Haemolytic Anemias (D55-D59)
- ...
- Chapter IV ...
69UMLS: Unified Medical Language System (NLM)
- Tool for information retrieval, made up of 4 components
- Metathesaurus: contains information about biomedical concepts and how they are represented in diverse terminological systems.
- Semantic Network: contains information about concept categories and the permissible relationships among them
- Information Sources Map: contains both human-readable and machine-processable information about all kinds of biomedical terminological systems
- Specialist lexicon: English words with POS
- The tool from and for the U.S. :-)
70UMLS Semantic Network
71Semantic Network Relationships
- Is_a
- physically related to
- spatially related to
- temporally related to
- functionally related to
- conceptually related to
72Semantic Network Biologic Function Hierarchy
73Semantic Network "affects" Hierarchy
74Axiom 2
- Concept-based terminology (and standardisation thereof) is there as a mechanism to improve understanding of messages by humans.
- It is NOT the right device
- to explain why reality is what it is, how it is organised, etc. (although it is needed to allow communication),
- to reason about reality,
- to make machines understand what is real,
- to integrate across different views, languages, conceptualisations, ...
75Why not ?
- Does not take care of universals and particulars appropriately
- Concepts do not necessarily correspond to something that (will) exist(ed)
- Sorcerer, unicorn, leprechaun, ...
- Definitions set the conditions under which terms may be used, and may not be abused as conditions an entity must satisfy to be what it is
- Language can make strings of words look as if they were terms
- Middle lobe of left lung
- ...
76Ok, then Description Logics will save us ... ?
77Description Logics
- A decidable fragment of FOL
- A propositional modal logic
- A classes and properties (concepts and roles) oriented KR language
- Subsumption and satisfiability (consistency) are the key inferences
- Most DLs are supersets of ALC
- Boolean operators on concepts
- Existential and Universal quantifiers
- OWL-DL is a large superset (SHOIN)
- Property hierarchies, transitive roles (SH)
- Inverse (I)
- Nominals (O) (hasValue and one of)
- Number restrictions (counting quantifiers)
78Snomed and DL
SNOMED-RT (2000)
DLs don't guarantee that you get parthood right!
79Use of description logics does not guarantee correct representations!
80Sloppiness in definitions
81NCI Thesaurus
- a biomedical thesaurus created specifically to
meet the needs of the National Cancer Institute. - semantically modeled cancer-related terminology
built using description logics
82NCI Thesaurus Root concepts
83Conceptual entity
- Definition: none
- Semantic type
- Conceptual entity
- Classification
- Subconcepts
- Action
- Definition: action; a thing done
- And
- Definition: an article which expresses the relation of connection or addition, used to conjoin a word with a word, ...
- Classification
- Definition: the grouping of things into classes or categories
84Definition of cancer gene
85NCI Thesaurus architecture
Findings-And- Disorders-Kind
Anatomy-Kind
Disease
Formal subsumption or inheritance
Associative relationships providing
differentiae
Kinds restrict the domain and range of
associative relationships
ISA
Breast
Breast neoplasm
Disease-has-associated-anatomy
86Ontology versus Description Logics
- In the Description Logic world
- terms and definitions come first,
- the job is to validate them and reason with them by means of a model
- but whether the model corresponds to reality is not its problem (Workshop on DL, Saarbrücken, 22-23/11/2004)
- In the realist ontology world
- robust ontology (with all its reasoning power) comes first
- terms, term-hierarchies and record architectures must be subjected to the constraints of ontological coherence
87 Thanks x there is OWL ?Where x ?
88Understanding content (1)
John Doe has a pyogenic granuloma of the left
thumb
John Doe has a pyogenic granuloma of the left
thumb
89Understanding content (2)
<record> <patient>John Doe</patient> <diagnosis>pyogenic granuloma of the left thumb</diagnosis> </record>
<record> <subject>John Doe</subject> <diagnosis>pyogenic granuloma of the left thumb</diagnosis> </record>
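What a program actually "sees" in the two records above can be sketched with the standard library XML parser. The helper `as_pairs` is an assumption introduced for illustration; it flattens each record into (tag, text) pairs and compares them.

```python
import xml.etree.ElementTree as ET

RECORD_A = (
    "<record><patient>John Doe</patient>"
    "<diagnosis>pyogenic granuloma of the left thumb</diagnosis></record>"
)
RECORD_B = (
    "<record><subject>John Doe</subject>"
    "<diagnosis>pyogenic granuloma of the left thumb</diagnosis></record>"
)


def as_pairs(xml_text):
    """Flatten a record into a set of (tag, text) pairs."""
    return {(child.tag, child.text.strip()) for child in ET.fromstring(xml_text)}


a, b = as_pairs(RECORD_A), as_pairs(RECORD_B)
shared = a & b
only_a = a - b
only_b = b - a
print(shared, only_a, only_b)
```

A human reads both records as the same statement; the parser finds only the diagnosis in common, because nothing in the markup says that "patient" and "subject" denote the same role.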
90Understanding content (3)
<129465004> <116154003>John Doe</116154003> <8319008> 17372009 <finding site> 76505004 <laterality>7771000</laterality> </finding site> </8319008> </129465004>
91XML OWL
- XML
- Pure syntax
- Simulated semantics
- OWL
- Very precise semantics
- But is the semantics of the right sort to
faithfully describe simple medical facts ?
92NCIT's Lung in OWL
<owl:Class rdf:ID="Lung">
  <rdfs:label>Lung</rdfs:label>
  <code>C12468</code>
  <hasType>primitive</hasType>
  <rdfs:subClassOf rdf:resource="Organ"/>
  <rdfs:subClassOf>
    <owl:Restriction>
      <owl:onProperty rdf:resource="rAnatomic_Structure_Has_Location"/>
      <owl:someValuesFrom rdf:resource="Thoracic_Cavity"/>
    </owl:Restriction>
  </rdfs:subClassOf>
  ...
</owl:Class>
All instances of lung must be located in at least one instance of thoracic cavity. Hence total lung excision is impossible.
93NCIT's Lung in OWL
<owl:Class rdf:ID="Lung">
  <rdfs:label>Lung</rdfs:label>
  <code>C12468</code>
  <hasType>primitive</hasType>
  <rdfs:subClassOf rdf:resource="Organ"/>
  <rdfs:subClassOf>
    <owl:Restriction>
      <owl:onProperty rdf:resource="rAnatomic_Structure_Has_Location"/>
      <owl:allValuesFrom rdf:resource="Thoracic_Cavity"/>
    </owl:Restriction>
  </rdfs:subClassOf>
  ...
</owl:Class>
Every assigned location of a lung must be an instance of the class Thoracic Cavity. This allows lungs not to be located at all.
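The difference between the existential and the universal restriction can be checked against a small, explicitly given interpretation. Everything in the sketch below (the individuals, the class Surgical_Tray, the helper names) is hypothetical. It evaluates one closed, finite model, which is not how an OWL reasoner works under the open-world assumption; the point is only to show which situations each restriction admits.

```python
# A hypothetical finite interpretation: individuals and their classes.
CLASSES = {
    "lung_in_situ": {"Lung"},
    "lung_excised": {"Lung"},
    "lung_unlocated": {"Lung"},
    "cavity_1": {"Thoracic_Cavity"},
    "tray_1": {"Surgical_Tray"},
}
# Pairs in the has-location relation of this interpretation.
HAS_LOCATION = {("lung_in_situ", "cavity_1"), ("lung_excised", "tray_1")}


def fillers(x):
    """All locations recorded for individual x."""
    return {y for subject, y in HAS_LOCATION if subject == x}


def some_values_from(x, cls):
    """Existential restriction: at least one location of x belongs to cls."""
    return any(cls in CLASSES[y] for y in fillers(x))


def all_values_from(x, cls):
    """Universal restriction: every location of x belongs to cls
    (vacuously true when x has no location at all)."""
    return all(cls in CLASSES[y] for y in fillers(x))


results = {
    lung: (some_values_from(lung, "Thoracic_Cavity"),
           all_values_from(lung, "Thoracic_Cavity"))
    for lung in ("lung_in_situ", "lung_excised", "lung_unlocated")
}
print(results)
```

The excised lung fails both restrictions, so neither axiom admits it as a lung; the unlocated lung fails the existential restriction but satisfies the universal one vacuously.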
94Take home messages
- Very few ontologies are ontologies.
- Realist ontology offers a good methodology for building consistent representations.
- DLs are helpful, but only if you know how to use them properly.
- OWL is inadequate to represent even the most obvious facts.
- Please ... be critical when buzz words are used.