Title: 10:30-12:00 How to Build an Ontology 1-2pm Best Practices and Lessons Learned 2-3pm BIRN Ontologies: An Overview
1 10:30-12:00 How to Build an Ontology
1-2pm Best Practices and Lessons Learned
2-3pm BIRN Ontologies: An Overview
2 How to Build an Ontology
3 High-quality shared ontologies build communities
- General trend on the part of NIH, FDA and other
bodies to consolidate ontology-based standards
for the communication and processing of
biomedical data.
- NCIT / caBIG / NECTAR / BIRN / OBO ...
4 TWO STRATEGIES
- Ad hoc creation of new database
schemas for each research group / research
hypothesis
vs.
- Pre-established interoperable stable reference
ontologies in terms of which all database schemas
need to be defined
5
- How to create the conditions for a step-by-step
evolution towards gold standard reference
ontologies in the biomedical domain
- ... and why we need to create these conditions
- OBO Core project
6
- Ontology =def.
- A representation of the types of entities
existing in a given domain of reality, and of the
relations between these types
7 Types have instances
- Ontologies are like science texts: they are about
types
- (Diaries, databases, clinical records are about
instances)
8 The need
- strong general-purpose classification
hierarchies created by domain specialists
- clear, rigorous definitions
- thoroughly tested in real cases
- ontologies teach us about the instances in
reality by supporting cross-disciplinary
(cross-ontology) reasoning about types
9 The actuality (too often)
- myriad special-purpose light ontologies,
prepared by ontology engineers and deposited in
internet repositories or registries
10
- these light ontologies often do not generalize
- repeat work already done by others
- are not interoperable
- reproduce the very problems of communication
which ontology was designed to solve
- contain incoherent definitions
- and incoherent documentation
11 BIRN Ontology Experiences
- In the short term, users will probably download
the data or analyses and extract the results
using their preferred methods.
- In the long term, however, that will become
infeasible
- the databases will have to be made interoperable
with standard data-mining software.
- This is where the neuroanatomy ontologies come
in.
- We will need to know what the ROI is and which
naming scheme it came from (e.g., a Brodmann's
area, or a sulcal/gyral area, etc.). We'll need
to know how it was defined (Talairach atlas? MNI
atlas? LONI atlas? Or subject-specific regions?)
and what the statistic is.
12 BIRN Ontology Experiences
- In the short term, users will probably download
the data or analyses and extract the results
using their preferred methods.
- In the long term that will become infeasible
13 The long term begins here
14 A methodology for quality-assurance of ontologies
- tested thus far in the biomedical domain on
- FMA
- GO + other OBO Ontologies
- FuGO
- SNOMED
- UMLS Semantic Network
- NCI Thesaurus
- ICF (International Classification of Functioning,
Disability and Health)
- ISO Terminology Standards
- HL7-RIM
15 A methodology for quality-assurance of ontologies
- accepted need for application of this
methodology
- FMA
- GO + other OBO Ontologies
- FuGO
- SNOMED
- UMLS Semantic Network
- NCI Thesaurus
- ICF (International Classification of Functioning,
Disability and Health)
- ISO Terminology Standards
- HL7-RIM
16 A methodology for quality-assurance of ontologies
- signs of hope
- FMA
- GO + other OBO Ontologies
- FuGO
- SNOMED
- UMLS Semantic Network
- NCI Thesaurus
- ICF (International Classification of Functioning,
Disability and Health)
- ISO Terminology Standards
- HL7-RIM
17 We know that high-quality ontologies built
according to this methodology can help in
creating high-quality mappings between human and
model organism phenotypes
18 "Alignment of Multiple Ontologies of Anatomy:
Deriving Indirect Mappings from Direct Mappings
to a Reference Ontology", Songmao Zhang, Olivier
Bodenreider, AMIA 2005
19 We also know that OWL is not enough to ensure
high-quality ontologies
- and that the use of a common syntax and logical
machinery and the careful separating out of
ontologies into namespaces does not solve the
problem of ontology integration
20 A basic distinction
- type vs. instance
- science text vs. clinical document
- man vs. Musen
21 Instances are not represented in an ontology
- It is the generalizations that are important
- (but instances must still be taken into account)
22
A 515287 DC3300 Dust Collector Fan
B 521683 Gilmer Belt
C 521682 Motor Drive Belt
23 Ontology, Types, Instances
24 Ontology: A Representation of Types
25 Ontology: A Representation of Types
- Each node of an ontology consists of
- preferred term (aka term)
- term identifier (TUI, aka CUI)
- synonyms
- definition, glosses, comments
26 Ontology: A Representation of Types
- Nodes in an ontology are connected by
relations, primarily is_a (= is subtype of) and
part_of, designed to support search, reasoning and
annotation
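The node structure of slides 25 and 26 can be sketched in code. This is a minimal illustration, not the data model of any particular ontology tool: the class name TypeNode, the helper ancestors and the identifiers T1 to T4 are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TypeNode:
    """One ontology node: it represents a type, never an instance."""
    identifier: str                                   # hypothetical identifier format
    preferred_term: str
    definition: str = ""
    synonyms: list[str] = field(default_factory=list)
    is_a: list[str] = field(default_factory=list)     # identifiers of parent types
    part_of: list[str] = field(default_factory=list)  # identifiers of wholes


def ancestors(nodes: dict[str, TypeNode], identifier: str) -> set[str]:
    """Collect every type reachable by following is_a links upward."""
    seen: set[str] = set()
    stack = list(nodes[identifier].is_a)
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(nodes[parent].is_a)
    return seen


# Illustrative content only; the terms are taken from later slides.
nodes = {
    "T1": TypeNode("T1", "anatomical structure"),
    "T2": TypeNode("T2", "cell", is_a=["T1"]),
    "T3": TypeNode("T3", "cell nucleus", part_of=["T2"]),
    "T4": TypeNode("T4", "abnormal cell", is_a=["T2"]),
}
```

Search and annotation then work on terms and synonyms, while reasoning works on the is_a and part_of links, e.g. ancestors(nodes, "T4") walks from abnormal cell up to anatomical structure.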
27 types
(figure: hierarchy of types, with labels mammal, frog and leaf class)
28 Rules for formatting terms
- Terms should be in the singular
- Terms should be lower case
- Avoid abbreviations even when it is clear in
context what they mean ("breast" for "breast
tumor")
- Avoid acronyms
- Avoid mass terms ("tissue", "brain mapping",
"clinical research" ...)
- Each term "A" in an ontology is shorthand for a
term of the form "the type A"
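Some of these formatting rules can be checked mechanically. The sketch below is a rough heuristic under stated assumptions: the function name term_format_problems is hypothetical, the plural test is a crude English suffix check, and abbreviations and mass terms are not detected at all, since they need human judgment.

```python
import re


def term_format_problems(term: str) -> list[str]:
    """Return heuristic warnings for a candidate ontology term."""
    problems = []
    if term != term.lower():
        problems.append("not lower case")
    # A run of two or more capitals standing alone looks like an acronym.
    if re.search(r"\b[A-Z]{2,}\b", term):
        problems.append("possible acronym")
    # Crude plural test on the last word; skips endings such as "testis".
    words = term.split()
    last = words[-1] if words else ""
    if last.endswith("s") and not last.endswith(("ss", "us", "is")):
        problems.append("possibly plural")
    return problems
```

A curator would treat the output as a list of candidates for review, not as a verdict: "breast tumor" passes, while "Anatomic Structures" is flagged twice.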
29 Motivation: to capture reality
- Inferences and decisions we make are based upon
what we know of reality.
- An ontology is a computable representation of the
underlying biological reality.
- Designed to enable a computer to reason over the
data we derive from this reality in (some of) the
ways that we do.
30 Concepts
- Biomedical ontology integration will never be
achieved through integration of meanings or
concepts
- The problem is precisely that different user
communities use different concepts
- Concepts are in your head and will change as your
understanding changes
31 Concepts
- Ontologies represent types, not concepts,
meanings, ideas ...
- Types exist, with their instances, in objective
reality
- including types of image, of imaging process,
of brain region, of clinical procedure, etc.
32 Rules on types
- Don't confuse types with words
- Don't confuse types with concepts
- Don't confuse types with ways of getting to know
types
- Don't confuse types with ways of talking about
types
- Don't confuse types with data about types
33 Some other simple rules for high-quality
ontologies
34 Univocity
- Terms should have the same meanings on every
occasion of use.
- They should refer to the same kinds of entities
in reality
- Basic ontological relations such as is_a and
part_of should be used in the same way by all
ontologies
35 Positivity
- Complements of types are not themselves types.
- Hence terms such as
- non-mammal
- non-membrane
- other metalworker in New Zealand
- do not designate types in reality
- There are also no conjunctive and disjunctive
types
- protoplasmic astrocyte and Schwann cell
- Purkinje neuron or dendritic shaft
36 Objectivity
- Which types exist is not a function of our
knowledge.
- Terms such as "unknown" or "unclassified" or
"unlocalized" do not designate types in reality.
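The positivity and objectivity rules lend themselves to a simple term scan. The following is a sketch only: the function name suspect_type_terms and the word list are assumptions made for illustration, and a hit means "ask a domain specialist", not "delete the term".

```python
import re

# Words that signal a residual class or a state of knowledge (illustrative list).
SUSPECT_WORDS = {"other", "unknown", "unclassified", "unlocalized"}


def suspect_type_terms(term: str) -> list[str]:
    """Return reasons why a term may fail to designate a type in reality."""
    words = re.findall(r"[a-z]+(?:-[a-z]+)*", term.lower())
    reasons = []
    if any(w.startswith("non-") for w in words):
        reasons.append("complement")
    if SUSPECT_WORDS & set(words):
        reasons.append("residual or knowledge-dependent")
    if "and" in words:
        reasons.append("conjunctive")
    if "or" in words:
        reasons.append("disjunctive")
    return reasons
```

Run over the examples on these two slides, every term is flagged, while a plain term such as "cell" is not.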
37 Single Inheritance
- No kind in a classificatory hierarchy should
have more than one is_a parent on the immediate
higher level
38 Multiple Inheritance
(figure: blue car with two is_a parents, blue thing and car, reached by edges labelled is_a1 and is_a2)
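Violations of the single-inheritance rule are easy to find once the is_a edges are listed. A minimal sketch, assuming edges are given as (child, parent) pairs; the function name multiple_inheritance and the extra edge to "vehicle" are hypothetical.

```python
def multiple_inheritance(is_a_edges):
    """Map each type with more than one direct is_a parent to its parents."""
    parents = {}
    for child, parent in is_a_edges:
        parents.setdefault(child, set()).add(parent)
    return {c: sorted(p) for c, p in parents.items() if len(p) > 1}


edges = [
    ("blue car", "blue thing"),
    ("blue car", "car"),
    ("car", "vehicle"),  # hypothetical edge, added only for contrast
]
```

Here only blue car is reported, with both of its parents, which is the situation the slide 38 figure depicts.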
39 is_a Overloading
- serves as obstacle to integration with
neighboring ontologies
- The success of ontology alignment demands that
ontological relations (is_a, part_of, ...) have
the same meanings in the different ontologies to
be aligned.
- See "Relations in Biomedical Ontologies", Genome
Biology May 2005.
- ? DISEASE MAPS
40 General Rule
- Formulate universal statements first
- Move to "A may be B in such and such a context"
later
41 Intelligibility of Definitions
- The terms used in a definition should be simpler
(more intelligible) than the term to be defined;
otherwise the definition provides no assistance
- to human understanding
- to machine processing
42 Definitions should be intelligible to both
machines and humans
- Machines can cope with the full formal
representation
- Humans need clarity and modularity
43 But
- Some terms are primitive (cannot be defined)
- AVOID CIRCULAR DEFINITIONS
- Avoid definitions of the forms
- An A is an A which is B (person =def. person with
identity documents)
- An A is the B of an A (heptolysis =def. the causes of
heptolysis)
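Both circular forms on slide 43 reuse the defined term inside its own definition, so a first-pass check is a whole-phrase text match. This is a sketch with a hypothetical function name, is_circular, and it is deliberately crude: it also fires when the term merely occurs inside a longer name, so hits are candidates for review.

```python
import re


def is_circular(term: str, definition: str) -> bool:
    """True if the defined term reappears, as a whole phrase, in its definition."""
    pattern = r"\b" + re.escape(term.lower()) + r"\b"
    return re.search(pattern, definition.lower()) is not None
```

For example, is_circular("person", "person with identity documents") holds, while "plasma membrane" defined as "a cell part that surrounds the cytoplasm" passes. A known false positive of this heuristic is "tuberculosis" defined via the species name Mycobacterium tuberculosis.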
44 Case Study: The National Cancer Institute
Thesaurus (NCIT)
- does not (yet) satisfy these and other simple
principles
45 The NCIT reflects a recognition of the need
- for high-quality shared ontologies and
terminologies the use of which by clinical
researchers in large communities can ensure
re-usability of data collected by different
research groups
46 NCIT
- "a biomedical vocabulary that provides
consistent, unambiguous codes and definitions for
concepts used in cancer research"
- "exhibits ontology-like properties in its
construction and use"
47 Goals
- to make use of current terminology best
practices to relate relevant concepts to one
another in a formal structure, so that computers
as well as humans can use the Thesaurus for a
variety of purposes, including the support of
automatic reasoning
- to speed the introduction of new concepts and
new relationships in response to the emerging
needs of basic researchers, clinical trials,
information services and other users.
48 Formal Definitions
- of 37,261 nodes, 33,720 were stipulated to be
primitive in the DL sense
- Thus only a small portion of the NCIT ontology
(the remaining 3,541 nodes, about 9.5%)
can be used for purposes of automatic
classification and error-checking by using OWL.
49 Verbal Definitions
- About half the NCIT terms are assigned verbal
definitions
- Unfortunately some are assigned more than one
50 Disease Progression
- Definition1
- Cancer that continues to grow or spread.
- Definition2
- Increase in the size of a tumor or spread of
cancer in the body.
- Definition3
- The worsening of a disease over time. This
concept is most often used for chronic and
incurable diseases where the stage of the disease
is an important determinant of therapy and
prognosis.
51 To make matters worse, Disease Progression has as
subclass
- Cancer Progression
- Definition
- The worsening of a cancer over time. This
concept is most often used for incurable cancers
where the stage of the cancer is an important
determinant of therapy and prognosis.
52 Cancer
- a process (of getting better or worse)
- an object (which can grow and spread)
53 Confuses definitions with descriptions
- Tuberculosis
- Definition
- A chronic, recurrent infection caused by the
bacterium Mycobacterium tuberculosis.
Tuberculosis (TB) may affect almost any tissue or
organ of the body with the lungs being the most
common site of infection. The clinical stages of
TB are primary or initial infection, latent or
dormant infection, and recrudescent or adult-type
TB. Ninety to 95% of primary TB infections may go
unrecognized. Histopathologically, tissue lesions
consist of granulomas which usually undergo
central caseation necrosis. Local symptoms of TB
vary according to the part affected; acute
symptoms include hectic fever, sweats, and
emaciation; serious complications include
granulomatous erosion of pulmonary bronchi
associated with hemoptysis. If untreated,
progressive TB may be associated with a high
degree of mortality. This infection is frequently
observed in immunocompromised individuals with
AIDS or a history of illicit IV drug use.
55 A better definition
- Tuberculosis
- Definition
- A chronic, recurrent infection caused by the
bacterium Mycobacterium tuberculosis.
56 NCIT inherits this ontological and terminological
incoherence from source vocabularies in UMLS
- Conceptual Entities =def.
- An organizational header for concepts
representing mostly abstract entities.
- Includes as subtypes
- action, change, color, death, event, fluid,
injection, temperature
57
- Conceptual Entities =def.
- An organizational header for concepts
representing mostly abstract entities.
- Confuses use and mention ("swimming is healthy and
has eight letters")
58 Duratec, Lactobutyrin, Stilbene Aldehyde
- are classified by the NCIT as Unclassified Drugs
and Chemicals
59 and problematic synonyms
- Anatomic Structure, System, or Substance =
Anatomic Structures and Systems
- Does "anatomic" apply only to structure or also
to system and substance?
- Biological Function = Biological Process
- some biological processes are the exercises of
biological functions
- others (e.g. pathological processes, side
effects) not
- Genetic Abnormality = Molecular Abnormality (with
subtype Molecular Genetic Abnormality)
(definitions not supplied)
60 Problematic synonyms
- Diseases and Disorders = Disease = Disorder
- Definition1 for Disease
- A disease is any abnormal condition of the body
or mind that causes discomfort, dysfunction, or
distress to the person affected or those in
contact with the person. ...
- Definition2 for Disease
- A definite pathologic process with a
characteristic set of signs and symptoms. ...
- Condition ≠ Process
- Definition2 contradicts NCIT's own classification
hierarchy
61 Three disjoint classes of plants
- Vascular Plant
- Non-vascular Plant
- Other Plant
62 Three kinds of cells
- Abnormal Cell is a top-level class (thus not
subsumed by Cell)
- Normal Cell is a subclass of Microanatomy.
- Cell is a subclass of Other Anatomic Concept (so
that cells themselves are concepts)
63 NCIT as now constituted will block automatic
reasoning
- Neither Normal Cells nor Abnormal Cells are Cells
within the context of the NCIT
64 Some consolations
- NCIT is open source
- NCIT has broad coverage
- NCIT has some formal structure (OWL-DL)
- NCIT is much, much better than (for example) the
HL7-RIM
- NCIT has realized the errors of its ways
65 The road ahead
- http://www.cbd-net.com/index.php/search/show/938464
- Review of NCI Thesaurus and Development of
Plan to Achieve OBO Compliance
- and welcome to the Pre-NCIT
- http://nciterms.nci.nih.gov/NCIBrowser/Dictionary.do
66 Fragment of Pre-NCIT Hierarchy
- Murine Tissue Type
- Body Fluids and Substances (MMHCC)
- Cardiovascular System (MMHCC)
- Blood Vessel (MMHCC)
- Heart (MMHCC)
- Digestive System (MMHCC)
67 First step
- Alignment of OBO ontologies through a common
system of formally defined relations in the
OBO-RO (OBO Relation Ontology)
- see "Relations in Biomedical Ontologies", Genome
Biology Apr. 2005
68 is_a (sensu UMLS)
- A is_a B =def.
- A is narrower in meaning than B
- grows out of the heritage of dictionaries
- (which ignore the basic distinction between types
and instances)
69 To build a high-quality shared ontology requires
hard work and staying power
You cannot cheat by borrowing from UMLS
UMLS (= the UMLS Metathesaurus) is not an ontology
70 Concepts, Concept Names, and their Identifiers in
the UMLS
- The Metathesaurus is organized by concept. One of
its primary purposes is to connect different
names for the same concept from many different
vocabularies.
- A concept is a meaning. A meaning can have many
different names. A key goal of Metathesaurus
construction is to understand the intended
meaning of each name in each source vocabulary
and to link all the names from all of the source
vocabularies that mean the same thing (the
synonyms). This is not an exact science. ...
Metathesaurus editors decide what view of
synonymy to represent in the Metathesaurus
concept structure. Please note that each source
vocabulary's view of synonymy is also present in
the Metathesaurus, irrespective of whether it
agrees or disagrees with the Metathesaurus view.
71 This strange mapping
- between names as they appear in different source
vocabularies created for widely different
purposes can still be very useful
- but the source vocabularies themselves are of
variable quality
- (not all mappings are created equal)
- and the sorts of search which the UMLS supports
reflect an already outmoded technology
72 is_a
- congenital absent nipple is_a nipple
- surgical procedure not carried out because of
patient's decision is_a surgical procedure
- cancer documentation is_a cancer
- disease prevention is_a disease
- living subject is_a information object
representing an animal or complex organism
- individual allele is_a act of observation
- limb is_a tissue
73 is_a (sensu UMLS)
- both testes is_a testis
- plant leaves is_a plant
- smoking is_a individual behavior
- walking is_a social behavior
74 is_a
- A is_a B =def.
- For all x, if x instance_of A then x instance_of
B
- cell division is_a biological process
- adult is_a child ???
75 Two kinds of entities
- occurrents (processes, events, happenings)
- cell division, ovulation, death
- continuants (objects, qualities, ...)
- cell, ovum, organism, temperature of organism,
...
76 is_a (for occurrents)
- A is_a B =def.
- For all x, if x instance_of A then x instance_of
B
- cell division is_a biological process
77 is_a (for continuants)
- A is_a B =def.
- For all x, t: if x instance_of A at t then x
instance_of B at t
- abnormal cell is_a cell
- adult human is_a human
- but not: adult is_a child
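The time-indexed definition for continuants can be tested against instance data. A minimal sketch, assuming instantiation facts are stored as (instance, type, time) triples; the function name is_a_holds, the instance "mary" and the years are hypothetical. A finite sample can refute an is_a assertion but cannot prove it, and the check is vacuously true when A has no recorded instances.

```python
def is_a_holds(facts, a, b):
    """Check: for all x, t, if x instance_of a at t then x instance_of b at t."""
    return all((x, b, t) in facts for (x, typ, t) in facts if typ == a)


# Hypothetical instance data: one continuant observed at two times.
facts = {
    ("mary", "child", 1990),
    ("mary", "human", 1990),
    ("mary", "adult", 2005),
    ("mary", "adult human", 2005),
    ("mary", "human", 2005),
}
```

On this data "adult human is_a human" survives, while "adult is_a child" is refuted, because nothing is a child at the time at which it is an adult.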
78 part_of
- "Composes, with one or more other physical units,
some larger whole."
- (UMLS Semantic Network)
- what does this relation relate?
- A is_a B =def. A is narrower in meaning than B
79 part_of as a relation between types is more
problematic than is standardly supposed
- heart part_of human being ?
- human heart part_of human being ?
- human being has_part human testis ?
- testis part_of human being ?
80 Definition of part_of as a relation between types
- A part_of B =def. all instances of A are
instance-level parts of some instance of B
- human testis part_of adult human being
81 Two kinds of parthood
- between instances
- Mary's heart part_of Mary
- this nucleus part_of this cell
- between types
- human heart part_of human
- cell nucleus part_of cell
82 part_of (for occurrents)
- A part_of B =def.
- For all x, if x instance_of A then there is some
y, y instance_of B and x part_of y
- where part_of is the instance-level part
relation
- EVERY A IS PART OF SOME B
83 part_of (for continuants)
- A part_of B =def.
- For all x, t: if x instance_of A at t then there
is some y, y instance_of B at t and x part_of y
- where part_of is the instance-level part
relation
- NOTE THE ALL-SOME STRUCTURE
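The all-some reading of type-level part_of, and the reason has_part is not simply its inverse, can be sketched over instance data. Assumptions: time indices are dropped for brevity, and the function names and the instances "john", "mary" and "johns_testis" are hypothetical.

```python
def part_of_all_some(a, b, instance_of, instance_part_of):
    """Every instance of a is an instance-level part of some instance of b."""
    a_instances = [x for x, t in instance_of if t == a]
    b_instances = [y for y, t in instance_of if t == b]
    return all(
        any((x, y) in instance_part_of for y in b_instances)
        for x in a_instances
    )


def has_part_all_some(b, a, instance_of, instance_part_of):
    """Every instance of b has some instance of a as an instance-level part."""
    a_instances = [x for x, t in instance_of if t == a]
    b_instances = [y for y, t in instance_of if t == b]
    return all(
        any((x, y) in instance_part_of for x in a_instances)
        for y in b_instances
    )


# Hypothetical instance data.
instance_of = {
    ("john", "human being"),
    ("mary", "human being"),
    ("johns_testis", "human testis"),
}
instance_part_of = {("johns_testis", "john")}
```

With this data "human testis part_of human being" holds, but "human being has_part human testis" fails, since one human being in the sample has no such part.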
84 A part_of B, B part_of C ...
- The all-some structure of such definitions allows
- cascading of inferences
- (i) within ontologies
- (ii) between ontologies
- (iii) between ontologies and EHR repositories of
instance-data
85 Cascading inferences
- Whichever A you choose, the instance of B of
which it is a part will be included in some C,
which will include as part also the A with which
you began
- The same principle applies to the other relations
in the OBO-RO
- located_in, transformation_of, derives_from,
adjacent_to, etc.
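For part_of, the cascade described on slides 84 and 85 amounts to computing a transitive closure over type-level assertions. A minimal sketch with a hypothetical function name, cascade, and the abstract type names A, B, C used on the slide; it relies on the instance-level part relation being transitive.

```python
def cascade(assertions):
    """Transitive closure of type-level part_of assertions given as (A, B) pairs."""
    closure = set(assertions)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                # Every a is part of some b, and every b is part of some d.
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure
```

From A part_of B and B part_of C the closure adds A part_of C. The two input assertions may come from different ontologies, provided both use part_of with the same all-some definition.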
86 is_a and part_of never cross categorial divides
(cf. tripartite organization of GO)
- if A is_a B
- then A is an object type iff B is an object type
- then A is a process type iff B is a process type
- then A is a characteristic type iff B is a
characteristic type
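This rule can be enforced as a consistency check once each type is assigned to a top-level category. The sketch below assumes such an assignment is available; the function name crossing_edges, the category table and the deliberately wrong third edge are hypothetical.

```python
def crossing_edges(edges, category):
    """Return the is_a / part_of edges whose two ends lie in different categories."""
    return [(a, rel, b) for a, rel, b in edges if category[a] != category[b]]


# Assumed category assignments (object vs. process), for illustration only.
category = {
    "cell": "object",
    "abnormal cell": "object",
    "cell division": "process",
    "biological process": "process",
}
edges = [
    ("abnormal cell", "is_a", "cell"),
    ("cell division", "is_a", "biological process"),
    ("cell", "part_of", "cell division"),  # deliberately wrong, hypothetical
]
```

Only the third edge is reported, since it links an object type to a process type.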
87 Kinds of relations
- Between types
- is_a, part_of, ...
- Between an instance and a type
- this explosion instance_of the type explosion
- Between instances
- Mary's heart part_of Mary
88 Continuity
- instance a continuous_with instance b
- is always symmetric
- But consider the types lymph node and lymphatic
vessel
- Each lymph node is continuous with some
lymphatic vessel, but there are lymphatic vessels
(e.g. lymphs and lymphatic trunks) which are not
continuous with any lymph nodes
- Continuity on the type level is not symmetric.
89 Adjacency as a relation between universals is not
symmetric
- Consider
- seminal vesicle adjacent_to urinary bladder
- Not: urinary bladder adjacent_to seminal vesicle
90
- Instance level
- this nucleus is adjacent to this cytoplasm
- implies
- this cytoplasm is adjacent to this nucleus
- Type level
- nucleus adjacent_to cytoplasm
- Not: cytoplasm adjacent_to nucleus
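Why symmetry is lost on the type level can be shown on a toy data set. This is a sketch under stated assumptions: the function name all_some and the instances node1, vessel1 and vessel2 are hypothetical, and the instance-level relation is made symmetric by construction.

```python
def all_some(a, b, instance_of, related):
    """Type-level reading: every instance of a is related to some instance of b."""
    a_instances = [x for x, t in instance_of if t == a]
    b_instances = [y for y, t in instance_of if t == b]
    return all(any((x, y) in related for y in b_instances) for x in a_instances)


# Hypothetical instances: one lymph node, two lymphatic vessels.
instance_of = {
    ("node1", "lymph node"),
    ("vessel1", "lymphatic vessel"),
    ("vessel2", "lymphatic vessel"),
}
pairs = {("node1", "vessel1")}
# Instance-level continuity is symmetric, so store both directions.
continuous = pairs | {(y, x) for x, y in pairs}
```

Every lymph node in the sample is continuous with some lymphatic vessel, but vessel2 is continuous with no lymph node, so the type-level assertion holds in one direction only.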
91 Applications
- Expectations of symmetry, e.g. for protein-protein
interactions, may hold only at the instance level
- if A interacts with B, it does not follow that B
interacts with A
- if A is expressed simultaneously with B, it does
not follow that B is expressed simultaneously
with A
92 Definitions of the all-some form
- allow cascading inferences
- If A R1 B and B R2 C, then we know that
- every A stands in R1 to some B, but we know also
that, whichever B this is, it can be plugged into
the R2 relation
93 GALEN: Vomitus contains carrot
- All portions of vomit contain all portions of
carrot
- All portions of vomit contain some portion of
carrot
- Some portions of vomit contain some portion of
carrot
- Some portions of vomit contain all portions of
carrot
94 transformation_of
95 transformation_of
- A transformation_of B =def.
- Every instance of A was at some earlier time an
instance of B
- adult transformation_of child
96 embryological development
97 tumor development
98 derives_from
(figure: derives_from diagram along a time axis, with labels "C1 c1 at t1", "C c at t" and "C' c' at t"; example: zygote derives_from ovum, sperm)
99 Request from Bill Bug
- How best to effectively bring together
- spatial/morphological ontologies
- neuroscience terminologies (e.g., NeuroNames)
and
- data-centric neuroanatomical indexing systems
(voxel-based 3D atlases)
- to promote optimal integration of neuroscience
data sets that include a spatial component,
however coarse.
100 A suite of defined relations between universals
Foundational: is_a, part_of
Spatial: located_in, contained_in, adjacent_to
Temporal: transformation_of, derives_from, preceded_by
Participation: has_participant, has_agent
101 Logical Theory of Spatial Relations
- RCC: Region-Connection Calculus (Leeds
Qualitative Spatial Reasoning Group)
- Cf. Dameron et al., "Modeling dependencies between
relations to ensure consistency of a cerebral
cortex anatomy knowledge base"
102 Principles
- 1 anatomical structure ? 1 region (has_location)
- Define the relationships of adjacency,
connectedness etc. using RCC-8 and its extensions
(figure: RCC-8 relations, with panels labelled NTPP, PO, TPP, EQ, DC, EC)
103 Example 1
- Reasoning with part and location at the instance
level
(figure labels: Operc. Pars of Inferior Frontal Gyrus; Inferior Frontal Gyrus)
104 Example 2
- Reasoning with location, continuity and external
connection
(figure labels: PreCentral Gyrus; PostCentral Gyrus)
105 Extension to the 3-D case
106 Most ontologies are execrable
But some good ontologies do already exist
- as far as possible don't reinvent
- use the power of combination and collaboration
- ontologies are like telephones: they are valuable
only to the degree that they are used and
networked with other ontologies
- but choose working telephones
- most UMLS telephones were broken from the start
107 Why do we need rules/standards for good ontology?
- Ontologies must be intelligible both to humans
(for annotation) and to machines (for reasoning
and error-checking)
- Unintuitive rules for
classification lead to errors
- Intuitive rules facilitate training of curators
and annotators
- Common rules allow alignment with other
ontologies
- Logically coherent rules enhance harvesting of
content through automatic reasoning systems
108 To the degree that basic rules of good ontology
are not satisfied, error checking and ontology
alignment will be achievable, at best, only
with human intervention, via force majeure,
with unstable results
109 Current practice in the domain of clinical
research
- Results of clinical trials are organized too
tightly around specific diagnostic criteria
imposed by specific, local hypotheses
- A change in criteria forces a costly
re-examination and re-coding of all existing
records to make them usable in future hypothesis
generation and testing.
110 How to solve this problem?
- Just as clinical hypotheses need to be tied to
basic science, so special-purpose application
ontologies need to be tied to general-purpose
reference ontologies
111 How to solve this problem?
- We separate
- data as interpreted in terms of current criteria
- from
- the structure of the underlying biomedical
reality
- and ensure that the first is stored and
processed always by using terms drawn from a
shared, stable representation (a reference
ontology) of the latter.
- Diagnostic criteria for a disease can then be
changed but we will still maintain access to the
data relevant to all prior diagnosed cases of the
disease in question.
112 Not only data needs to be aligned through
pre-established reference ontologies, so also
does software
- Currently, application ontologies are built
afresh for each new application
- They commonly introduce new idiosyncrasies of
terminology, format or logic, plus
simplifications or distortions of their
subject-matters.
- This may do no harm in relation to the specific
application (for example radiology, tissue
classification, cancer staging) and keeps the
software simple
113 But what happens
- when other applications want to use the data
annotated in their terms, or when we need to
extend to a larger portion of biomedical
reality?
- Now the expanded ontology will no longer
be compatible with the software designed for its
original application.
- Different groups now need to start working with
different and incompatible versions of an
ontology, engendering a spiralling complexity as
these different versions themselves become
revised and extended for different purposes.
114 The solution
- The methodology of always developing application
ontologies against the background of formally
robust reference ontologies can both counteract
these tendencies toward ontology proliferation
and ensure the interoperability of application
ontologies as they become further developed in
the future.
115 The methodology of reference ontologies
- can provide locally developed application
ontologies with cross-granular understanding of
the ways processes at the gene and protein level
are linked to clinically salient processes at
coarser granularity
- and it can allow them to take advantage of existing
logical tools and methods for reasoning across
large bodies of data.
116 An application ontology
- is comparable to an engineering artifact such as
a software tool. It is constructed for a specific
practical purpose.
- Examples
- NCIT
- FuGO: Functional Genomics Investigation Ontology
117 A reference ontology
- A reference ontology has a unified
subject-matter, which consists of entities
existing independently of the ontology, and it
seeks to optimize descriptive or representational
adequacy to this subject matter.
- A reference ontology is analogous to a scientific
theory. Thus it consists of representations of
biological reality which are correct when viewed
in light of our current understanding of reality,
and it must be subjected to updating in light of
scientific advance.
- Example: The Foundational Model of Anatomy
118 Current Best Practice
120 (figure: fragment of the FMA hierarchy, with is_a and part_of edges linking the nodes Anatomical Space, Anatomical Structure, Organ Cavity Subdivision, Organ Cavity, Organ, Serous Sac, Organ Component, Serous Sac Cavity, Tissue, Serous Sac Cavity Subdivision, Pleural Sac, Pleura (Wall of Sac), Pleural Cavity, Parietal Pleura, Visceral Pleura, Interlobar recess, Mediastinal Pleura, Mesothelium of Pleura)
121 The Foundational Model of Anatomy
- Follows formal rules for Aristotelian
definitions
- When A is_a B, the definition of A takes the
form
- an A =def. a B which ...
- a human being =def. an animal which is rational
122 FMA Example
- Cell =def. an anatomical structure which consists
of cytoplasm surrounded by a plasma membrane with
or without a cell nucleus
- Plasma membrane =def. a cell part that surrounds
the cytoplasm
123 The FMA regimentation
- Brings the advantage that each definition
reflects the position in the hierarchy to which a
defined term belongs.
- The position of a term within the hierarchy
enriches its own definition by incorporating
automatically the definitions of all the terms
above it.
- The entire information content of the FMA's term
hierarchy can be translated very cleanly into a
computer representation
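The way a term inherits the definitions of the terms above it can be sketched by storing each definition as a (genus, differentia) pair and unfolding the chain. This is an illustration only, not the FMA's actual representation; the function name expand and the dictionary layout are assumptions.

```python
def expand(term, definitions):
    """Unfold 'an A =def. a B which ...' up the is_a chain of genera."""
    clauses = []
    genus = term
    seen = set()
    # Stop at an undefined (primitive) genus, or if a definition loops back.
    while genus in definitions and genus not in seen:
        seen.add(genus)
        genus, differentia = definitions[genus]
        clauses.append(differentia)
    return f"{genus} which " + " and which ".join(reversed(clauses))


# Genus and differentia pairs taken from slides 121 and 122.
definitions = {
    "human being": ("animal", "is rational"),
    "plasma membrane": ("cell part", "surrounds the cytoplasm"),
}
```

With these entries, expand("human being", definitions) yields "animal which is rational". Adding a definition for "animal" would lengthen the result automatically, which is the enrichment by position in the hierarchy that slide 123 describes.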
124 GO now adopting structured definitions which
contain both genus and differentiae
Species =def. Genus + Differentiae
neuron cell differentiation =def. differentiation
by which a cell acquires features of a neuron
125 Ontology alignment
One of the current goals of GO is to align
Cell Types in GO with Cell Types in the Cell Ontology
cone cell fate commitment | (no Cell Ontology term in transcript)
keratinocyte differentiation | keratinocyte
adipocyte differentiation | fat_cell
dendritic cell activation | dendritic_cell
lymphocyte proliferation | lymphocyte
T-cell homeostasis | T_lymphocyte
garland cell differentiation | garland_cell
heterocyst cell differentiation | heterocyst
126 Alignment of the two ontologies will permit the
generation of consistent and complete definitions
(figure labels: GO, Cell type, New Definition)
Osteoblast differentiation: Processes whereby an
osteoprogenitor cell or a cranial neural crest
cell acquires the specialized features of an
osteoblast, a bone-forming cell which secretes
extracellular matrix.
127 Other Ontologies to be aligned with GO
- Chemical ontologies
- 3,4-dihydroxy-2-butanone-4-phosphate synthase
activity
- Anatomy ontologies
- metanephros development
- GO itself
- mitochondrial inner membrane peptidase activity
- ? OBO core
128 eventually to comprehend all of OBO
129 (figure: the FMA hierarchy fragment of slide 120, repeated)
131 The Anatomy Reference Ontology
- is organized in a graph-theoretical structure
involving two sorts of links or edges
- is-a (= is a subtype of)
- (pleural sac is-a serous sac)
- part-of
- (cervical vertebra part-of vertebral column)
132 at every level of granularity
133 Modularity
134 How does a kidney work?
NEPHRON
135 Nephron Functions
FUNCTIONAL SEGMENTS
136 Top-Level Categories in the FMA
137
- anatomical structure (cell, lung, nerve, tooth)
- result from the coordinated expression of
structural genes
- have their own 3-D shape
138
- portion of body substance
- inherits its shape from container
- portion of urine
- portion of menstrual fluid
- portion of blood
139
- anatomical space
- cavities, conduits
140
- anatomical attribute
- mass
- weight
- temperature
- your temperature
- its value now
141
- anatomical relationship
- located_in
- contained_in
- adjacent_to
- connected_to
- surrounds
- lateral_to (West_of)
- anterior_to
142
- boundary
- bona fide / fiat
www.enel.ucalgary.ca/People/Mintchev/stomach.htm
143 Connectedness and Continuity
- The body is a highly connected entity.
- Exceptions: cells floating free in blood
- continuous_with,
- attached_to (muscle to bone)
- synapsed_with (nerve to nerve and nerve to
muscle)
- Two continuants are continuous on the instance
level if and only if they share a fiat boundary.
144 basis for generalization to other species
(figure: the FMA hierarchy fragment of slide 120, repeated)
146Web-Based Representations of Neuroanatomy
148includes Neuronames
150Human Morphometry and Function BIRN Testbeds
- with thanks to Christine Fennema-Notestine and
Jessica Turner
CBiO/BIRN Workshop 2006
151BIRN Ontology Needs
- GOAL: User will employ BIRN interface and Mediator to perform scientific queries on data from
- structural and functional MRI experiments
- clinical assessments
- psychiatric interviews
- and/or behavioral experiments
- BIRN needs for common vocabularies
- Mediator needs to talk across databases to find relevant/similar information; this requires linking of concepts to table columns and values
- Query interface needs semantic network to find related information
152Example queries
- Find all datasets of schizophrenics with structural and functional imaging data related to working memory
- Find the correlation between hippocampal volume and working memory performance in AD subjects
153MBIRN priorities
- To relate clinical assessments, cognitive function, and neuroanatomy within mBIRN's multi-site AD sample, with future branching into neuropsychiatric measures
- Only a high quality reference ontology of neuro(patho)anatomy from the macroscopic to the subcellular levels of granularity can give you this
154Existing neuroanatomical ontology
- Need to create related function-based ontology
[Diagram: neuroanatomical hierarchy with nodes Brain, Cerebellum, Cerebrum, Cerebral white matter, Cerebral cortex, Frontal cortex, Temporal cortex, Mesial temporal, Superior temporal, Amygdala, Hippocampus; shown alongside the terms CVLT and Memory]
155 Need to create related function-based ontology
- UMLS: mental process is_a organism function
- Function vs. functioning
- Many entities have functions which they never realise
- A has function B: A can realise B (under which circumstances?)
156 Need to create related function-based ontology
- A function is a disposition of an independent continuant to engage in corresponding processes.
- To what extent are the various functions identified by BIRN in fact complex processes with many less complex processes as their parts?
- How are functions different from dysfunctions / malfunctions?
- Are all functions such that their execution is good for the organism?
157 Need to create related function-based ontology
- You cannot classify parts of the brain on the basis of which parts can think, remember, effect movement or perceive various kinds of sensations, just as you cannot sort anatomical entities on the basis of which can pump, digest, secrete, fertilize or stabilize.
- It is impossible to create an inheritance class subsumption hierarchy of neuroanatomical entities at any meaningful depth on the basis of function.
158[Diagram: graph linking assessment, anatomy and cognition terms. Node labels: Assessment, Brain, Neuropsychology, Cerebrum, Amnesia, Cognition, Cerebral cortex, Frontal, Temporal, Cognitive impairment, Memory, Learning, Mesial temporal, CVLT, Hippocampus, Task and score description]
159Can we reason on the basis of a graph of this sort?
[Diagram: graph with node labels Behavioral Paradigm, Assessment, SCID-Patient, SIRP, CVLT, Breathhold, Long Term memory, Working memory, Memory, Attention, Cognitive Process, Action]
160Bonfire Relations
relation: the type of relation between the concept to the left and the concept to the right
- PAR: Parent
- CHD: Child
- SIB: Sibling
- RB: Broader Relationship
- RN: Narrower Relationship
- RO: Other Relationship
161BIRN Relations
- UMLS (PAR, CHD, RN, RO, RB, SY).
- RB: has a broader relationship
- RN: has a narrower relationship
- RO: has relationship other than synonymous, narrower, or broader
- CHD: has child relationship in a Metathesaurus source vocabulary
- SIB: has sibling relationship in a Metathesaurus source vocabulary
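A small sketch of how little these relation codes say, assuming each assertion is stored as a (left, code, right) triple read as "left has <code> right". The `INVERSE` table pairs each code with its inverse (PAR/CHD and RB/RN swap, the others are symmetric); the triple layout and the helper `invert` are illustrative assumptions, not part of BIRN or Bonfire.

```python
# Relationship codes from the slide, each paired with its inverse code.
INVERSE = {
    "PAR": "CHD",  # parent <-> child
    "CHD": "PAR",
    "RB": "RN",    # broader <-> narrower
    "RN": "RB",
    "SIB": "SIB",  # sibling is symmetric
    "RO": "RO",    # "other" is symmetric
    "SY": "SY",    # synonymy is symmetric
}


def invert(triple):
    """Return the same assertion read from the other side."""
    left, code, right = triple
    return (right, INVERSE[code], left)


# Hypothetical assertion: CVLT has parent "Neuropsychological tests".
assertion = ("CVLT", "PAR", "Neuropsychological tests")
print(invert(assertion))
```

Nothing in the codes distinguishes is_a from part_of or from any other specific relation, so this table is about all the reasoning they support.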
162Circular Hierarchical Relationships in the UMLS: Etiology, Diagnosis, Treatment, Complications and Prevention
Olivier Bodenreider
- Topographic regions General terms
- Physical anatomical entity
- Anatomical spatial entity
- Anatomical surface
- Body regions
- Topographic regions
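A sketch of how such a circle can be detected mechanically, using a depth-first search over parent links. It assumes that consecutive entries in the list above are hierarchically linked and that the first and last entries name the same concept; both are readings of the slide, not facts checked against the UMLS. The names `find_cycle`, `CHAIN` and `PARENTS` are illustrative.

```python
def find_cycle(parents):
    """Return one cycle as a list of terms (first == last), or None.

    `parents` maps each term to the terms directly above it.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    colour = {}
    path = []

    def visit(term):
        colour[term] = GREY
        path.append(term)
        for parent in parents.get(term, ()):
            state = colour.get(parent, WHITE)
            if state == GREY:
                # Reached a term already on the current path: a circle.
                return path[path.index(parent):] + [parent]
            if state == WHITE:
                cycle = visit(parent)
                if cycle:
                    return cycle
        path.pop()
        colour[term] = BLACK
        return None

    for term in list(parents):
        if colour.get(term, WHITE) == WHITE:
            cycle = visit(term)
            if cycle:
                return cycle
    return None


# Assumed reading of the slide: each entry is linked to the next one.
CHAIN = [
    "Topographic regions",
    "Physical anatomical entity",
    "Anatomical spatial entity",
    "Anatomical surface",
    "Body regions",
    "Topographic regions",
]
PARENTS = {a: [b] for a, b in zip(CHAIN, CHAIN[1:])}
print(find_cycle(PARENTS))
```

A hierarchy meant to support inheritance should make this check return None.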
163MeSH
- MeSH Descriptors > Index Medicus Descriptor > Anthropology, Education, Sociology and Social Phenomena (MeSH Category) > Social Sciences > Political Systems > National Socialism
- National Socialism is_a Political Systems
- National Socialism is_a Anthropology ...
164MeSH
- National Socialism is_a MeSH Descriptor
- Cf. NeuroNames
- Ontology =def. a codification of the relationships between words and concepts
165Human BIRN data includes
- Participant demographics such as age, gender
- Clinical and psychiatric information
- Assessments used, data type
- Diagnostic information
- Behavioral data during fMRI tasks
- Need to know how to interpret that (is a button 1 response a yes or a no?)
- Raw structural and functional images
- Need information about data collection and preprocessing methods
- Single-subject and group level analyses and results
- Need information about analytic methods used
166Areas where application ontologies will be needed
- Participant demographics such as age, gender
- Clinical and psychiatric information
- Assessments used, data type
- Diagnostic information
- Behavioral data during fMRI tasks
- Need to know how to interpret that (is a button 1 response a yes or a no?)
- Raw structural and functional images
- Need information about data collection and preprocessing methods
- Single-subject and group level analyses and results
- Need information about analytic methods used
167Bottom-up search
- User's dataset contains the CVLT: what does it measure?
- Search for CVLT
- Related to PARENT concepts like "Neuropsychological tests" or "Assessment Scales" or SIBLING concepts of other tests
- What is the CVLT? This doesn't answer the user's question.
- Need relationship links to function: memory and learning
- Need relationship links to structure: anatomical regions reflected in change of performance on this measure → hippocampus
168Top-down search
- User interested in studying the relationship between hippocampal volume and memory performance in Alzheimer's disease.
- Search for measures of memory
- Would like to see memory linked to CVLT
- Would like to see memory linked to hippocampus at a very basic level
- Would like to see links to potential disorders assessed, e.g., amnesia or AD
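The two search directions can be sketched over a small set of typed links. This is only an illustration: the relation names (`measures`, `associated_with`, `disorder_of`) and the links themselves are hypothetical stand-ins for what a real ontology would have to define, and `bottom_up`, `top_down` and `LINKS` are made-up names.

```python
# Hypothetical typed links (subject, relation, object). Neither the relation
# names nor the links are taken from an actual BIRN resource.
LINKS = [
    ("CVLT", "is_a", "Neuropsychological test"),
    ("CVLT", "measures", "memory"),
    ("CVLT", "measures", "learning"),
    ("memory", "associated_with", "hippocampus"),
    ("amnesia", "disorder_of", "memory"),
    ("Alzheimer's disease", "disorder_of", "memory"),
]


def bottom_up(term):
    """Start from a term found in a dataset: what does it lead to?"""
    return [(rel, obj) for subj, rel, obj in LINKS if subj == term]


def top_down(term):
    """Start from a topic of interest: what points at it?"""
    return [(subj, rel) for subj, rel, obj in LINKS if obj == term]


print(bottom_up("CVLT"))    # the test, then what it measures
print(top_down("memory"))   # assessments and disorders linked to memory
print(bottom_up("memory"))  # the structure linked to memory
```

With only PARENT and SIBLING links, `bottom_up("CVLT")` would return the is_a row alone, which is the gap the bottom-up slide describes.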
169Ontology/Terminology Infrastructure
- GOAL: to allow database mediation and scientific queries for multi-site clinical neuroimaging studies. This requires relating database tables to concepts, and relating brain structure and function through neuroanatomical regions, neuropsychological and cognitive terms, and clinical assessments.
170Ontology/Terminology Infrastructure
- To do this, the Mediator relies in part on defined terms/concepts to define relationships between similar terms from different databases.
- If a user is interested in data related to "long delay free recall," it is important to also include information related to "memory." This type of relational knowledge is critical to find other values in other databases that have similar information.
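A minimal sketch of that kind of mediation, assuming concepts are linked to (table, column) pairs and that each concept has at most one broader concept. The table and column names, the `BROADER` map and the helpers `expand` and `columns_for` are all hypothetical; the actual BIRN Mediator is not described in enough detail here to reproduce it.

```python
# Hypothetical narrower-to-broader links between concepts.
BROADER = {
    "long delay free recall": "memory",
    "recognition": "memory",
}

# Hypothetical mapping of (table, column) pairs in two site databases
# to the concept each column records.
COLUMN_CONCEPT = {
    ("site_a.cvlt_scores", "ldfr"): "long delay free recall",
    ("site_b.neuropsych", "mem_total"): "memory",
    ("site_b.neuropsych", "age"): "age",
}


def expand(concept):
    """Return the concept followed by every broader concept above it."""
    out = [concept]
    # The second condition guards against a circular BROADER map.
    while out[-1] in BROADER and BROADER[out[-1]] not in out:
        out.append(BROADER[out[-1]])
    return out


def columns_for(concept):
    """Find columns recording the concept or any of its broader concepts."""
    wanted = set(expand(concept))
    return sorted(col for col, c in COLUMN_CONCEPT.items() if c in wanted)


print(columns_for("long delay free recall"))
```

A query for "long delay free recall" then also reaches the site_b column that is only annotated with "memory".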
171Ontology/Terminology Infrastructure
- In addition, the ontology will provide a semantic network; for a user searching for "memory" information, related information would include
- Cognitive terms, e.g., recall, recognition, short and long term memory
- Assessment terms, e.g., California Verbal Learning Test
- Disorder terms, e.g., Alzheimer's disease is a disorder of memory
How to block information overload?
172Bottom-up search
- User's resultant dataset contains the MMSE; the user asks: what does it measure?
- Search for MMSE concept
- Related to PARENT concepts like "Neuropsychological tests" or "Assessment Scales" or SIBLING concepts of other tests
- What is the MMSE? This doesn't answer the user's question.
- Need relationship links to function: general cognitive ability, cognitive impairment, dementia severity, brain damage
- Need relationship links to structure: anatomical regions reflected in change of performance on this measure, although a relatively non-specific measure
173Top-down search
- What variables exist that would provide a measure of general cognitive function and dementia severity?
- Search for measures of (general) cognitive function
- Would like to see general cognitive ability, cognitive impairment, dementia severity linked to MMSE
- Would like to see general cognitive ability, cognitive impairment, dementia severity linked to neuroanatomical regions, simply brain in this case
- Would like to see links to potential disorders measured, e.g., AD
174NeuroNames (with thanks to Onard Mejino)
- has a limited scope.
- It deals with neuroanatomical structures only at the gross level. No cellular, subcellular or macromolecular entities are represented.
- The peripheral nervous system and the spinal cord are not included.
- It represents structures from different species (human, macaque and rodent) in the same hierarchy.
175NN's main hierarchy
- is a partonomy based on mutually exclusive and exhaustive volumetric partitions, the equivalent of regional partition in the FMA.
- The partonomy supports only ONE partition view and therefore does not accommodate
- other recognized regional partitions like Brodmann areas (treated as ancillary structures)
- constitutional parts like the internal pyramidal layer of neocortex and the vasculature of neuraxis (entities that have important clinical significance)
- new partitions advanced by new technology like gene expression mappings or radiologic imaging techniques
- partitions determined by formal spatial region-based ontologies like RCC
176The Neuronames partonomy
- will serve at best as an application ontology for annotating segmented images of the brain.
- But it will still be very difficult to link the annotated image data to all the other types of data which BIRN will need to describe
- → a reference ontology of neuroanatomy is a first priority.
177Neuronames
- Univocity is not enforced in the literature of neuroanatomy: e.g. the term "Basal ganglia" represents different structures when used in association with anatomic, functional and clinical views.
- How will NN resolve or clarify this?
178Neuronames
- entities are primarily identified on the basis of stains that distinguish gray matter from white matter
- thus not on principles or rules that define the type of the entity in question, thereby yielding a partition not in accord with the standards commonly accepted for representing the rest of the body.
- gray matter and white matter are viewed as tissues. But tissue is usually defined as an aggregate of similarly specialized cells and intercellular matrix.
- yet gray matter consists not of cells but of cell bodies, white matter not of cells but of neurites
179Neuronames
- gives no explicit definitions, and the representations it gives (e.g. of the Fourth Ventricle) are often at odds with consensual usage
- hence scalability, extendability, combinability with other ontologies is limited. How then can it be used to bridge research efforts at the genomic / proteomic level with those at the clinical level?
- Information unique to neuroanatomical entities, such as axonal input/output relationships, connectivity, neuron type, neurotransmitter and receptor types, is indispensable in establishing and understanding both structural and physiological relationships among neuroanatomical entities and their relationship with the rest of the body.
180BIRNLex
- does provide definitions, normally taken over
from UMLS
181Rules for definitions
- A = child term
- B = parent term
- an A =def. a B which Cs
- If a definition is correct it should always make sense to substitute "a B which Cs" for "an A"
- A human being is subject to processes of aging
- A rational animal is subject to processes of aging
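The substitution test can be tried out with plain string replacement. This is a rough sketch under stated assumptions: `Definition`, `substitute` and `is_circular` are made-up names, the definiens is stored as a ready-made phrase (so "rational animal" stands in for "a B which Cs"), and articles such as "a"/"an" are not adjusted. The circularity check uses the CA1 entry quoted from BIRNLex elsewhere in this deck.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Definition:
    term: str       # A, the term being defined
    definiens: str  # the phrase "B which Cs" that should be able to replace A


def substitute(sentence, definition):
    """Replace the defined term by its definiens (articles are left as they are)."""
    return sentence.replace(definition.term, definition.definiens)


def is_circular(definition):
    """A definiens that reuses its own term cannot pass the substitution test."""
    return definition.term.lower() in definition.definiens.lower()


HUMAN = Definition("human being", "rational animal")
CA1 = Definition("CA1", "CA1 cytoarchitectonic field of hippocampus")

print(substitute("A human being is subject to processes of aging", HUMAN))
print(is_circular(HUMAN), is_circular(CA1))
```

A reviewer would still have to judge whether the substituted sentence makes sense; the code only automates the substitution and flags the obvious circular cases.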
182BIRNLex
- The eye =def.
- The eyeball and its constituent parts, e.g. retina
- mouse =def.
- common name for the species mus musculus
183BIRNLex
184BIRNLex
185BIRNLex
186BIRNLex
bear in mind always that your ontology needs to
be interoperable with other ontologies
188BIRNLex
- surface =def. 3D segmentation obtained by fitting a polygonal mesh around the boundary of an object of interest, creating a 3D surface
- Concept =def. Generic ideas or categories derived from common properties of objects, events, or qualities, usually represented by words or symbols
189BIRNLex
- brain imaging =def. none; synonymous with positrocephalogram, nos
- CA1 =def. CA1 cytoarchitectonic field of hippocampus
- cognitive process =def. conceptual function or thinking in all its forms
190BIRNLex and UMLS-SN
- Rest SN: Daily or Recreational Activity
- Principal Investigator SN: Professional or Occupational Group
- Left handedness SN: Organism Attribute
- Ambidextrous SN: Finding
- Brain Imaging SN: Diagnostic Procedure
- Brain Mapping SN: Diagnostic Procedure, Research Activity
- Healthy Adult SN: Finding
191BIRNLex
192Mouse BIRN Ontologies
Maryann Martone and Bill Bug
2005 All Hands Meeting
193Use of Ontologies in BIRN
- Databases
- Enforces semantic consistency within a database
- Data Sharing
- Establishes semantic relationship among concepts contained in distributed databases
- Data integration
- Bridging across multiscale and multimodal data
- Concept-based queries
- Ontologies can be used to alter semantic context to present a view of the conceptual aspects of a data set or meta-analysis result most relevant to a particular neuroscientist
194Objectives of Working Group
- Educate BIRN participants on the use of ontologies and associated tools for data integration
- Tuesday (PM) and Wednesday (AM)
- Develop a set of ontology resources for BIRN participants, based on existing ontologies where possible
- Identify areas that are not well covered by existing ontologies for possible development.
- Develop a clear set of policies and procedures for working with ontologies
- Including curation, addition of core ontologies, extension of ontologies, mapping of databases to ontologies
195Goals of OTF
- Provide a dynamic knowledge infrastructure to support integration and analysis of BIRN federated data sets, one which is conducive to accepting novel data from researchers to include in this analysis.
- Identify and assess existing ontologies and terminologies for summarizing, comparing, merging, and mining datasets. Relevant subject domains include clinical assessments, demographics, cognitive task descriptions, imaging parameters/data provenance in general, and derived (fMRI) data.
- Identify the resources needed to achieve the ontological objectives of individual test-beds and of the BIRN overall. May include finding other funding sources, making connections with industry and other consortia facing similar issues, and planning a strategy to acquire the necessary resources.
196BIRN Ontology Resources
Mouse BIRN Ontology Resource Page
http://nbirn.net/Resources/Users/Ontologies/
197Current Ontology Development by Mouse BIRN
Participants
- Developmental Ontology
- Seth Ruffins, Cal Tech
- Subcellular Anatomy
- Maryann Martone and Lisa Fong, UCSD
198Ontology for Subcellular Anatomy of Nervous System
199CCDB Dictionary
Term | Ontology | ConceptID | Semantic Type | Definition
Cerebellum | UMLS | C0007765 | Body Part, Organ, or Organ Component | Part of the metencephalon that lies in the posterior cranial fossa behind the brain stem. It is concerned with the coordination of movement. (MSH)
Glial Fibrillary Acidic Protein | UMLS | C0017626 | Amino Acid, Peptide, or Protein; Biologically Active Substance | An intermediate filament protein found only in glial cells or cells of glial origin. MW 51,000. (MSH)
Medium Spiny Neuron | Bonfire | BID000012 | Cell | Small (10-15 µm in diameter) projection neurons found in neostriatum, possessing a roughly spherical dendritic tree composed of 3-5 primary dendrites. Dendrites are covered with dendritic spines.
Purkinje cell | UMLS | C0034143 | Cell | Large branching neurons of the middle layer of cerebellar cortex, characterized by vast arrays of dendrites; the output neurons of the cerebellar cortex.
200Some Areas of Interest to BIRN
- Linking animal and human imaging data
- Navigating through multi-resolution information
- Map between Human and Animal models
- Functional assessment
[Diagram labels: brain, cerebellum, cerebellar cortex, Purkinje cell, dendritic spine; Entopeduncular nucleus; Globus pallidus, internal segment; Disease Process; Animal Model]
201Anatomical Knowledge Sources
- Foundational model of anatomy
- Neuronames (Brain Info)
- BAMS
- Adult Mouse Anatomical Dictionary (Edinburgh/GO)
Although BIRN is an open, diverse and fluid
environment, the use of ontologies for enhanced
interoperability will be pointless if we allow
ran