10:30-12:00 How to Build an Ontology 1-2pm Best Practices and Lessons Learned 2-3pm BIRN Ontologies: An Overview - PowerPoint PPT Presentation

Slides: 206
Provided by: md79
Transcript and Presenter's Notes

1
10:30-12:00 How to Build an Ontology; 1-2pm Best
Practices and Lessons Learned; 2-3pm BIRN
Ontologies: An Overview
2
How to Build an Ontology
3
High quality shared ontologies build communities
  • General trend on the part of NIH, FDA and other
    bodies to consolidate ontology-based standards
    for the communication and processing of
    biomedical data.
  • NCIT / caBIG / NECTAR / BIRN / OBO ...

4
TWO STRATEGIES
Ad hoc creation of new database
schemas for each research group / research
hypothesis
vs.
  • Pre-established interoperable stable reference
    ontologies in terms of which all database schemas
    need to be defined

5
  • How to create the conditions for a step-by-step
    evolution towards gold standard reference
    ontologies in the biomedical domain
  • ... and why we need to create these conditions
  • OBO Core project

6
  • Ontology def.
  • A representation of the types of entities
    existing in a given domain of reality, and of the
    relations between these types

7
Types have instances
  • Ontologies are like science texts: they are about
    types
  • (Diaries, databases, clinical records are about
    instances)

8
The need
  • strong general-purpose classification
    hierarchies created by domain specialists
  • clear, rigorous definitions
  • thoroughly tested in real cases
  • ontologies teach us about the instances in
    reality by supporting cross-disciplinary
    (cross-ontology) reasoning about types

9
The actuality (too often)
  • myriad special purpose light ontologies,
    prepared by ontology engineers and deposited in
    internet repositories or registries

10
  • these light ontologies often do not generalize
  • repeat work already done by others
  • are not interoperable
  • reproduce the very problems of communication
    which ontology was designed to solve
  • contain incoherent definitions
  • and incoherent documentation

11
BIRN Ontology Experiences
  • In the short-term, users will probably download
    the data or analyses and extract the results
    using their preferred methods.
  • In the long term, however, that will become
    infeasible:
  • the databases will have to be made interoperable
    with standard datamining software.
  • This is where the neuroanatomy ontologies come
    in.
  • We will need to know what the ROI is and which
    naming scheme it came from (e.g., a Brodmann's
    area, or a sulcal/gyral area, etc.). We'll need
    to know how it was defined (Talairach atlas? MNI
    atlas? LONI atlas? Or subject-specific regions?)
    and what the statistic is.

13
The long term begins here
14
A methodology for quality-assurance of ontologies
  • tested thus far in the biomedical domain on
  • FMA
  • GO and other OBO Ontologies
  • FuGO
  • SNOMED
  • UMLS Semantic Network
  • NCI Thesaurus
  • ICF (International Classification of Functioning,
    Disability and Health)
  • ISO Terminology Standards
  • HL7-RIM

15
A methodology for quality-assurance of ontologies
  • accepted need for application of this
    methodology
  • FMA
  • GO and other OBO Ontologies
  • FuGO
  • SNOMED
  • UMLS Semantic Network
  • NCI Thesaurus
  • ICF (International Classification of Functioning,
    Disability and Health)
  • ISO Terminology Standards
  • HL7-RIM

16
A methodology for quality-assurance of ontologies
  • signs of hope
  • FMA
  • GO and other OBO Ontologies
  • FuGO
  • SNOMED
  • UMLS Semantic Network
  • NCI Thesaurus
  • ICF (International Classification of Functioning,
    Disability and Health)
  • ISO Terminology Standards
  • HL7-RIM

17
We know that high-quality ontologies built
according to this methodology can help in
creating high-quality mappings between human and
model organism phenotypes
18
Alignment of Multiple Ontologies of Anatomy:
Deriving Indirect Mappings from Direct Mappings
to a Reference Ontology
Songmao Zhang, Olivier Bodenreider
AMIA 2005
19
We also know that OWL is not enough to ensure
high-quality ontologies
  • and that the use of a common syntax and logical
    machinery and the careful separating out of
    ontologies into namespaces does not solve the
    problem of ontology integration

20
A basic distinction
  • type vs. instance
  • science text vs. clinical document
  • man vs. Musen

21
Instances are not represented in an ontology
  • It is the generalizations that are important
  • (but instances must still be taken into account)

22
A 515287 DC3300 Dust Collector Fan
B 521683 Gilmer Belt
C 521682 Motor Drive Belt

23
Ontology Types Instances




24
Ontology: A Representation of Types




25
Ontology: A Representation of Types
  • Each node of an ontology consists of:
  • preferred term (aka term)
  • term identifier (TUI, aka CUI)
  • synonyms
  • definition, glosses, comments
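The node structure listed above can be sketched as a small record type. This is only an illustration: the class name `OntologyNode`, its field names and the identifier `EX:0001` are assumptions made for the example, not taken from any particular ontology or tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OntologyNode:
    """One node of an ontology, with the four components named on the slide."""
    preferred_term: str                                  # the term naming the type
    identifier: str                                      # stable term identifier
    synonyms: List[str] = field(default_factory=list)    # alternative names
    definition: str = ""                                 # definition, glosses, comments


# Hypothetical example node; "EX:0001" is a made-up identifier.
tuberculosis = OntologyNode(
    preferred_term="tuberculosis",
    identifier="EX:0001",
    synonyms=["TB"],
    definition="A chronic, recurrent infection caused by the bacterium "
               "Mycobacterium tuberculosis.",
)
```

Keeping the identifier separate from the preferred term means the term can be corrected later without breaking annotations that point at the node.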

26
Ontology: A Representation of Types
Nodes in an ontology are connected by
relations: primarily is_a (= is subtype of) and
part_of, designed to support search, reasoning and
annotation
27
types
mammal
frog
leaf class
28
Rules for formatting terms
  • Terms should be in the singular
  • Terms should be lower case
  • Avoid abbreviations even when it is clear in
    context what they mean ("breast" for "breast
    tumor")
  • Avoid acronyms
  • Avoid mass terms ("tissue", "brain mapping",
    "clinical research" ...)
  • Each term "A" in an ontology is shorthand for a
    term of the form "the type A"
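Some of these formatting rules can be checked mechanically. The sketch below is a rough heuristic under stated assumptions: the function name `term_format_problems` is hypothetical, the plural test is a crude suffix check, and abbreviations and mass terms are not detected at all because they need human judgment.

```python
import re


def term_format_problems(term: str) -> list:
    """Return a list of rule violations for a candidate preferred term."""
    problems = []
    if term != term.lower():
        problems.append("not lower case")
    # Crude acronym test: any token made of two or more capital letters.
    if re.search(r"\b[A-Z]{2,}\b", term):
        problems.append("contains an acronym")
    # Crude plural test: a final 's' that is not part of 'ss', 'us' or 'is'.
    words = term.split()
    last = words[-1].lower() if words else ""
    if last.endswith("s") and not last.endswith(("ss", "us", "is")):
        problems.append("may be plural")
    return problems


for candidate in ["cell nucleus", "Blood Vessels", "MRI scan"]:
    print(candidate, "->", term_format_problems(candidate))
```

A check like this is best used to produce a review list for curators rather than to reject terms automatically.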

29
Motivation to capture reality
  • Inferences and decisions we make are based upon
    what we know of reality.
  • An ontology is a computable representation of the
    underlying biological reality.
  • Designed to enable a computer to reason over the
    data we derive from this reality in (some of) the
    ways that we do.

30
Concepts
  • Biomedical ontology integration will never be
    achieved through integration of meanings or
    concepts
  • The problem is precisely that different user
    communities use different concepts
  • Concepts are in your head and will change as your
    understanding changes

31
Concepts
  • Ontologies represent types not concepts,
    meanings, ideas ...
  • Types exist, with their instances, in objective
    reality
  • including types of image, of imaging process,
    of brain region, of clinical procedure, etc.

32
Rules on types
  • Don't confuse types with words
  • Don't confuse types with concepts
  • Don't confuse types with ways of getting to know
    types
  • Don't confuse types with ways of talking about
    types
  • Don't confuse types with data about types

33
Some other simple rules for high quality
ontologies
34
Univocity
  • Terms should have the same meanings on every
    occasion of use.
  • They should refer to the same kinds of entities
    in reality
  • Basic ontological relations such as is_a and
    part_of should be used in the same way by all
    ontologies

35
Positivity
  • Complements of types are not themselves types.
  • Hence terms such as
  • non-mammal
  • non-membrane
  • other metalworker in New Zealand
  • do not designate types in reality
  • There are also no conjunctive and disjunctive
    types
  • protoplasmic astrocyte and Schwann cell
  • Purkinje neuron or dendritic shaft

36
Objectivity
  • Which types exist is not a function of our
    knowledge.
  • Terms such as "unknown" or "unclassified" or
    "unlocalized" do not designate types in reality.
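The positivity and objectivity rules lend themselves to a simple lexical screen. The following is a minimal sketch, assuming a hypothetical function `positivity_objectivity_problems`. It looks only at the wording of a term, so it will also flag legitimate names that happen to contain "and" or "or".

```python
import re


def positivity_objectivity_problems(term: str) -> list:
    """Flag wording that suggests a term does not designate a type in reality."""
    t = term.lower()
    problems = []
    if re.match(r"non[- ]", t):
        problems.append("complement of a type")
    if t.startswith("other "):
        problems.append("residual 'other' class")
    if re.search(r"\b(unknown|unclassified|unlocalized)\b", t):
        problems.append("depends on the state of our knowledge")
    if re.search(r"\b(and|or)\b", t):
        problems.append("conjunctive or disjunctive")
    return problems


for candidate in ["non-mammal", "other metalworker in New Zealand",
                  "Purkinje neuron or dendritic shaft", "mammal"]:
    print(candidate, "->", positivity_objectivity_problems(candidate))
```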

37
Single Inheritance
  • No kind in a classificatory hierarchy should
    have more than one is_a parent on the immediate
    higher level
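A single-inheritance check is easy to run over a list of asserted is_a edges. The sketch below assumes edges are given as (child, parent) pairs. The function name and the sample edges are illustrative only.

```python
from collections import defaultdict


def multiple_inheritance(is_a_edges):
    """Map each type with more than one asserted is_a parent to its parents."""
    parents = defaultdict(set)
    for child, parent in is_a_edges:
        parents[child].add(parent)
    return {c: sorted(p) for c, p in parents.items() if len(p) > 1}


# Illustrative edges: "blue car" is given two is_a parents.
edges = [
    ("car", "thing"),
    ("blue thing", "thing"),
    ("blue car", "car"),
    ("blue car", "blue thing"),
]
print(multiple_inheritance(edges))
```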

38
Multiple Inheritance
(Figure: thing at the top, with blue thing and car beneath it; blue car
at the bottom, attached by two links labelled is_a1 and is_a2)
39
is_a Overloading
  • serves as obstacle to integration with
    neighboring ontologies
  • The success of ontology alignment demands that
    ontological relations (is_a, part_of, ...) have
    the same meanings in the different ontologies to
    be aligned.
  • See Relations in Biomedical Ontologies, Genome
    Biology May 2005.
  • ? DISEASE MAPS

40
General Rule
  • Formulate universal statements first
  • Move to A may be B in such and such a context
    later

41
Intelligibility of Definitions
  • The terms used in a definition should be simpler
    (more intelligible) than the term to be defined
    otherwise the definition provides no assistance
  • to human understanding
  • to machine processing

42
Definitions should be intelligible to both
machines and humans
  • Machines can cope with the full formal
    representation
  • Humans need clarity and modularity

43
But
  • Some terms are primitive (cannot be defined)
  • AVOID CIRCULAR DEFINITIONS
  • Avoid definitions of the forms:
  • An A is an A which is B (person def. person with
    identity documents)
  • An A is the B of an A (heptolysis def. the causes of
    heptolysis)
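Directly circular definitions of both forms can be caught by testing whether the defined term reappears in its own definition. A minimal sketch, assuming a hypothetical helper `is_circular`. It detects only direct circularity, not cycles that run through several definitions.

```python
import re


def _tokens(text: str) -> list:
    """Lower-case word tokens, with punctuation dropped."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def is_circular(term: str, definition: str) -> bool:
    """True if the defined term occurs as a whole phrase inside its definition."""
    t, d = _tokens(term), _tokens(definition)
    n = len(t)
    return n > 0 and any(d[i:i + n] == t for i in range(len(d) - n + 1))


print(is_circular("person", "person with identity documents"))
print(is_circular("heptolysis", "the causes of heptolysis"))
print(is_circular("plasma membrane", "a cell part that surrounds the cytoplasm"))
```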

44
Case Study: The National Cancer Institute
Thesaurus (NCIT)
  • does not (yet) satisfy these and other simple
    principles

45
The NCIT reflects a recognition of the need
  • for high quality shared ontologies and
    terminologies the use of which by clinical
    researchers in large communities can ensure
    re-usability of data collected by different
    research groups

46
NCIT
  • a biomedical vocabulary that provides
    consistent, unambiguous codes and definitions for
    concepts used in cancer research
  • exhibits ontology-like properties in its
    construction and use.

47
Goals
  • to make use of current terminology best
    practices to relate relevant concepts to one
    another in a formal structure, so that computers
    as well as humans can use the Thesaurus for a
    variety of purposes, including the support of
    automatic reasoning
  • to speed the introduction of new concepts and
    new relationships in response to the emerging
    needs of basic researchers, clinical trials,
    information services and other users.

48
Formal Definitions
  • of 37,261 nodes, 33,720 were stipulated to be
    primitive in the DL sense
  • Thus only a small portion of the NCIT ontology
    can be used for purposes of automatic
    classification and error-checking by using OWL.
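The size of that "small portion" follows from the two figures on the slide. The arithmetic below assumes that every node not stipulated as primitive carries a full formal definition, which the slide implies but does not state.

```python
total_nodes = 37_261        # figure from the slide
primitive_nodes = 33_720    # figure from the slide

# Assumption: non-primitive nodes are the formally defined ones.
defined_nodes = total_nodes - primitive_nodes
defined_share = defined_nodes / total_nodes

print(defined_nodes, f"{defined_share:.1%}")
```

On these numbers fewer than one node in ten is available for automatic classification.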

49
Verbal Definitions
  • About half the NCIT terms are assigned verbal
    definitions
  • Unfortunately some are assigned more than one

50
Disease Progression
  • Definition 1:
  • Cancer that continues to grow or spread.
  • Definition 2:
  • Increase in the size of a tumor or spread of
    cancer in the body.
  • Definition 3:
  • The worsening of a disease over time. This
    concept is most often used for chronic and
    incurable diseases where the stage of the disease
    is an important determinant of therapy and
    prognosis.

51
To make matters worse, Disease Progression has as
subclass:
  • Cancer Progression
  • Definition
  • The worsening of a cancer over time. This
    concept is most often used for incurable cancers
    where the stage of the cancer is an important
    determinant of therapy and prognosis.

52
Cancer
  • a process (of getting better or worse)
  • an object (which can grow and spread)

53
Confuses definitions with descriptions
  • Tuberculosis
  • Definition
  • A chronic, recurrent infection caused by the
    bacterium Mycobacterium tuberculosis.
    Tuberculosis (TB) may affect almost any tissue or
    organ of the body with the lungs being the most
    common site of infection. The clinical stages of
    TB are primary or initial infection, latent or
    dormant infection, and recrudescent or adult-type
    TB. Ninety to 95% of primary TB infections may go
    unrecognized. Histopathologically, tissue lesions
    consist of granulomas which usually undergo
    central caseation necrosis. Local symptoms of TB
    vary according to the part affected; acute
    symptoms include hectic fever, sweats, and
    emaciation; serious complications include
    granulomatous erosion of pulmonary bronchi
    associated with hemoptysis. If untreated,
    progressive TB may be associated with a high
    degree of mortality. This infection is frequently
    observed in immunocompromised individuals with
    AIDS or a history of illicit IV drug use.

55
A better definition
  • Tuberculosis
  • Definition
  • A chronic, recurrent infection caused by the
    bacterium Mycobacterium tuberculosis.

56
NCIT inherits this ontological and terminological
incoherence from source vocabularies in UMLS
  • Conceptual Entities def.
  • An organizational header for concepts
    representing mostly abstract entities.
  • Includes as subtypes
  • action, change, color, death, event, fluid,
    injection, temperature

57
  • Conceptual Entities def.
  • An organizational header for concepts
    representing mostly abstract entities.
  • Confuses use and mention (swimming is healthy and
    has eight letters)

58
Duratec, Lactobutyrin, Stilbene Aldehyde
  • are classified by the NCIT as Unclassified Drugs
    and Chemicals

59
and problematic synonyms
  • Anatomic Structure, System, or Substance =
    Anatomic Structures and Systems
  • Does "anatomic" apply only to structure or also
    to system and substance?
  • Biological Function = Biological Process
  • some biological processes are the exercises of
    biological functions
  • others (e.g. pathological processes, side
    effects) are not
  • Genetic Abnormality = Molecular Abnormality (with
    subtype Molecular Genetic Abnormality)
    (definitions not supplied)

60
Problematic synonyms
  • Diseases and Disorders = Disease = Disorder
  • Definition 1 for Disease:
  • A disease is any abnormal condition of the body
    or mind that causes discomfort, dysfunction, or
    distress to the person affected or those in
    contact with the person. ...
  • Definition 2 for Disease:
  • A definite pathologic process with a
    characteristic set of signs and symptoms. ...
  • Condition ≠ Process
  • Definition 2 contradicts NCIT's own classification
    hierarchy

61
Three disjoint classes of plants
  • Vascular Plant
  • Non-vascular Plant
  • Other Plant

62
Three kinds of cells
  • Abnormal Cell is a top-level class (thus not
    subsumed by Cell)
  • Normal Cell is a subclass of Microanatomy.
  • Cell is a subclass of Other Anatomic Concept (so
    that cells themselves are concepts)

63
NCIT as now constituted will block automatic
reasoning
  • Neither Normal Cells nor Abnormal Cells are Cells
    within the context of the NCIT

64
Some consolations
  • NCIT is open source
  • NCIT has broad coverage
  • NCIT has some formal structure (OWL-DL)
  • NCIT is much, much better than (for example) the
    HL7-RIM
  • NCIT has realized the errors of its ways

65
The road ahead
  • http://www.cbd-net.com/index.php/search/show/938464
  • Review of NCI Thesaurus and Development of
    Plan to Achieve OBO Compliance
  • and welcome to the Pre-NCIT
  • http://nciterms.nci.nih.gov/NCIBrowser/Dictionary.do

66
Fragment of Pre-NCIT Hierarchy
  • Murine Tissue Type
      Body Fluids and Substances (MMHCC)
      Cardiovascular System (MMHCC)
        Blood Vessel (MMHCC)
        Heart (MMHCC)
      Digestive System (MMHCC)

67
First step
  • Alignment of OBO ontologies through a common
    system of formally defined relations in the
    OBO-RO (OBO Relation Ontology)
  • see Relations in Biomedical Ontologies, Genome
    Biology Apr. 2005

68
is_a (sensu UMLS)
  • A is_a B def.
  • A is narrower in meaning than B
  • grows out of the heritage of dictionaries
  • (which ignore the basic distinction between types
    and instances)

69
To build a high quality shared ontology requires
hard work and staying power
You cannot cheat by borrowing from UMLS
UMLS (= the UMLS Metathesaurus) is not an ontology
70
Concepts, Concept Names, and their Identifiers in
the UMLS
  • The Metathesaurus is organized by concept. One of
    its primary purposes is to connect different
    names for the same concept from many different
    vocabularies.
  • A concept is a meaning. A meaning can have many
    different names. A key goal of Metathesaurus
    construction is to understand the intended
    meaning of each name in each source vocabulary
    and to link all the names from all of the source
    vocabularies that mean the same thing (the
    synonyms). This is not an exact science. ...
    Metathesaurus editors decide what view of
    synonymy to represent in the Metathesaurus
    concept structure. Please note that each source
    vocabulary's view of synonymy is also present in
    the Metathesaurus, irrespective of whether it
    agrees or disagrees with the Metathesaurus view.

71
This strange mapping
  • between names as they appear in different source
    vocabularies created for widely different
    purposes can still be very useful
  • but the source vocabularies themselves are of
    variable quality
  • (not all mappings are created equal)
  • and the sorts of search which the UMLS supports
    reflect an already outmoded technology

72
is_a
  • congenital absent nipple is_a nipple
  • surgical procedure not carried out because of
    patient's decision is_a surgical procedure
  • cancer documentation is_a cancer
  • disease prevention is_a disease
  • living subject is_a information object
    representing an animal or complex organism
  • individual allele is_a act of observation
  • limb is_a tissue

73
is_a (sensu UMLS)
  • both testes is_a testis
  • plant leaves is_a plant
  • smoking is_a individual behavior
  • walking is_a social behavior

74
is_a
  • A is_a B def.
  • For all x, if x instance_of A then x instance_of
    B
  • cell division is_a biological process
  • adult is_a child ???

75
Two kinds of entities
  • occurrents (processes, events, happenings)
  • cell division, ovulation, death
  • continuants (objects, qualities, ...)
  • cell, ovum, organism, temperature of organism,
    ...

76
is_a (for occurrents)
  • A is_a B def.
  • For all x, if x instance_of A then x instance_of
    B
  • cell division is_a biological process

77
is_a (for continuants)
  • A is_a B def.
  • For all x, t: if x instance_of A at t then x
    instance_of B at t
  • abnormal cell is_a cell
  • adult human is_a human
  • but not adult is_a child
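The time-indexed definition can be tested against instance data. The sketch below assumes facts are stored as (instance, type, time) triples. The instance name and the years are invented for illustration. Instance data can refute an is_a assertion but cannot establish it, since the definition quantifies over all instances.

```python
def is_a_holds(facts, a, b):
    """Check 'A is_a B' for continuants against time-indexed instance data.

    facts is a set of (instance, type, time) triples, each read as
    'instance instance_of type at time'.
    """
    return all((x, b, t) in facts for (x, typ, t) in facts if typ == a)


# Hypothetical instance data for one person at two times.
facts = {
    ("mary", "child", 1990), ("mary", "human", 1990),
    ("mary", "adult human", 2010), ("mary", "human", 2010),
}

print(is_a_holds(facts, "adult human", "human"))
print(is_a_holds(facts, "adult human", "child"))
```

The second check fails because the instance is not a child at the time at which it is an adult, which is why the time index matters for continuants.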

78
part_of
  • Composes, with one or more other physical units,
    some larger whole.
  • (UMLS Semantic Network)
  • what does this relation relate?
  • A is_a B def. A is narrower in meaning than B

79
Part_of as a relation between types is more
problematic than is standardly supposed
  • heart part_of human being ?
  • human heart part_of human being ?
  • human being has_part human testis ?
  • testis part_of human being ?

80
Definition of part_of as a relation between types
  • A part_of B def. all instances of A are
    instance-level parts of some instance of B
  • human testis part_of adult human being

81
two kinds of parthood
  • between instances
  • Mary's heart part_of Mary
  • this nucleus part_of this cell
  • between types
  • human heart part_of human
  • cell nucleus part_of cell

82
part_of (for occurrents)
  • A part_of B def.
  • For all x, if x instance_of A then there is some
    y, y instance_of B and x part_of y
  • where part_of is the instance-level part
    relation
  • EVERY A IS PART OF SOME B

83
part_of (for continuants)
  • A part_of B def.
  • For all x, t if x instance_of A at t then there
    is some y, y instance_of B at t and x part_of y
  • where part_of is the instance-level part
    relation
  • NOTE THE ALL-SOME STRUCTURE
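The two definitions of part_of can be written in first-order notation as follows. This is a transcription of the slide wording, not an authoritative axiomatization: the predicate names are chosen for the example, and the instance-level part relation is left without a time index, as it is on the slide.

```latex
\begin{align*}
% occurrents
A \ \mathit{part\_of}\ B \;&=_{\mathrm{def}}\;
  \forall x \,\bigl[\, \mathit{instance\_of}(x, A) \rightarrow
  \exists y \,\bigl( \mathit{instance\_of}(y, B) \wedge
  \mathit{part\_of}(x, y) \bigr) \bigr] \\
% continuants
A \ \mathit{part\_of}\ B \;&=_{\mathrm{def}}\;
  \forall x \,\forall t \,\bigl[\, \mathit{instance\_of}(x, A, t) \rightarrow
  \exists y \,\bigl( \mathit{instance\_of}(y, B, t) \wedge
  \mathit{part\_of}(x, y) \bigr) \bigr]
\end{align*}
```

In both lines the universal quantifier ranges over instances of A and the existential over instances of B, which is the all-some structure.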

84
A part_of B, B part_of C ...
  • The all-some structure of such definitions allows
  • cascading of inferences
  • (i) within ontologies
  • (ii) between ontologies
  • (iii) between ontologies and EHR repositories of
    instance-data

85
Cascading inferences
  • Whichever A you choose, the instance of B of
    which it is a part will be included in some C,
    which will include as part also the A with which
    you began
  • The same principle applies to the other relations
    in the OBO-RO
  • located_at, transformation_of, derived_from,
    adjacent_to, etc.
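Because instance-level part_of is transitive, type-level part_of assertions of the all-some form can be chained. A minimal sketch of that chaining, using the abstract types A, B and C. The function name is an assumption, and the simple fixed-point loop is meant for small edge sets only.

```python
def transitive_closure(edges):
    """Add (a, d) whenever (a, b) and (b, d) are present, until nothing changes."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


part_of = {("A", "B"), ("B", "C")}
print(sorted(transitive_closure(part_of)))
```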

86
is_a and part_of never cross categorial divides
(cf. tripartite organization of GO)
  • if A is_a B
  • then A is an object type iff B is an object type
  • then A is a process type iff B is a process type
  • then A is a characteristic type iff B is a
    characteristic type
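The rule can be enforced by comparing the top-level category of the two ends of every edge. In the sketch below the category assignments and the third edge are invented for illustration. The last edge is deliberately wrong so that the check has something to report.

```python
def cross_category_edges(edges, category):
    """Return edges whose two ends fall in different top-level categories."""
    return [(a, b) for (a, b) in edges if category[a] != category[b]]


# Hypothetical category assignments.
category = {
    "cell division": "process",
    "biological process": "process",
    "abnormal cell": "object",
    "cell": "object",
}
is_a_edges = [
    ("cell division", "biological process"),
    ("abnormal cell", "cell"),
    ("cell division", "cell"),      # deliberately wrong: process is_a object
]
print(cross_category_edges(is_a_edges, category))
```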

87
Kinds of relations
  • Between types
  • is_a, part_of, ...
  • Between an instance and a type
  • this explosion instance_of the type explosion
  • Between instances
  • Mary's heart part_of Mary

88
Continuity
  • instance a continuous_with instance b
  • is always symmetric
  • But consider the types lymph node and lymphatic
    vessel
  • Each lymph node is continuous with some
    lymphatic vessel, but there are lymphatic vessels
    (e.g. lymphs and lymphatic trunks) which are not
    continuous with any lymph nodes
  • Continuity on the type level is not symmetric.
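The loss of symmetry can be reproduced with a few instances. The sketch below lifts an instance-level relation to the type level using the all-some reading. The instance names are invented, and `holds_at_type_level` is a hypothetical helper.

```python
def holds_at_type_level(relation, instance_of, a, b):
    """All-some lifting: every instance of A is related to some instance of B."""
    a_instances = [x for x, typ in instance_of.items() if typ == a]
    b_instances = [y for y, typ in instance_of.items() if typ == b]
    return all(any((x, y) in relation for y in b_instances) for x in a_instances)


# Hypothetical instances: one node, one vessel joined to it, one vessel on its own.
instance_of = {
    "node1": "lymph node",
    "vessel1": "lymphatic vessel",
    "trunk1": "lymphatic vessel",
}
pairs = {("node1", "vessel1")}
# The instance-level relation is made symmetric by construction.
continuous_with = pairs | {(y, x) for (x, y) in pairs}

print(holds_at_type_level(continuous_with, instance_of, "lymph node", "lymphatic vessel"))
print(holds_at_type_level(continuous_with, instance_of, "lymphatic vessel", "lymph node"))
```

The relation on instances is symmetric, yet the second type-level check fails because one lymphatic vessel is continuous with no lymph node.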

89
Adjacency as a relation between universals is not
symmetric
  • Consider
  • seminal vesicle adjacent_to urinary bladder
  • Not: urinary bladder adjacent_to seminal vesicle

90
  • Instance level
  • this nucleus is adjacent to this cytoplasm
  • implies
  • this cytoplasm is adjacent to this nucleus
  • Type level
  • nucleus adjacent_to cytoplasm
  • Not: cytoplasm adjacent_to nucleus

91
Applications
  • Expectations of symmetry, e.g. for protein-protein
    interactions, may hold only at the instance level
  • if A interacts with B, it does not follow that B
    interacts with A
  • if A is expressed simultaneously with B, it does
    not follow that B is expressed simultaneously
    with A

92
Definitions of the all-some form
  • allow cascading inferences
  • If A R1 B and B R2 C, then we know that
  • every A stands in R1 to some B, but we know also
    that, whichever B this is, it can be plugged into
    the R2 relation

93
GALEN: Vomitus contains carrot
  • All portions of vomit contain all portions of
    carrot
  • All portions of vomit contain some portion of
    carrot
  • Some portions of vomit contain some portion of
    carrot
  • Some portions of vomit contain all portions of
    carrot
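The four readings differ only in their quantifiers, which a small evaluation makes concrete. The portions and the single containment fact below are invented for illustration, and the function name is an assumption.

```python
def quantifier_readings(vomit, carrot, contains):
    """Evaluate the four quantifier readings of 'vomitus contains carrot'."""
    def c(v, k):
        return (v, k) in contains

    return {
        "all-all": all(c(v, k) for v in vomit for k in carrot),
        "all-some": all(any(c(v, k) for k in carrot) for v in vomit),
        "some-some": any(c(v, k) for v in vomit for k in carrot),
        "some-all": any(all(c(v, k) for k in carrot) for v in vomit),
    }


# Hypothetical data: two portions of each, one containment fact.
vomit = ["v1", "v2"]
carrot = ["k1", "k2"]
contains = {("v1", "k1")}

print(quantifier_readings(vomit, carrot, contains))
```

On this data only the some-some reading is true, so an unquantified statement at the type level leaves its intended meaning open.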

94
transformation_of
95
transformation_of
  • A transformation_of B Def.
  • Every instance of A was at some earlier time an
    instance of B
  • adult transformation_of child
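The definition can be checked against time-indexed instance data in the same way. The sketch assumes (instance, type, time) triples. The instance name and years are invented, and, as with any universally quantified definition, the data can refute the assertion but not prove it.

```python
def transformation_of_holds(facts, a, b):
    """Every instance of A at time t was an instance of B at some earlier time."""
    return all(
        any(y == x and typ_y == b and t_y < t for (y, typ_y, t_y) in facts)
        for (x, typ_x, t) in facts
        if typ_x == a
    )


# Hypothetical instance data.
facts = {("mary", "child", 1990), ("mary", "adult", 2010)}

print(transformation_of_holds(facts, "adult", "child"))
print(transformation_of_holds(facts, "child", "adult"))
```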

96
embryological development
97
tumor development
98
derives_from
(Figure: time axis with instance c of type C at t and instance c' of
type C' at t, and instance c1 of type C1 at t1; example: zygote
derives_from ovum and sperm)
99
Request from Bill Bug
  • How best to effectively bring together
  • - spatial/morphological ontologies 
  • - neuroscience terminologies (e.g., NeuroNames)
    and 
  • - data-centric neuroanatomical indexing systems
    (voxel-based 3D atlases)
  • to promote optimal integration of neuroscience
    data sets that include a spatial component,
    however coarse.

100
A suite of defined relations between universals
Foundational: is_a, part_of
Spatial: located_in, contained_in, adjacent_to
Temporal: transformation_of, derives_from, preceded_by
Participation: has_participant, has_agent
101
Logical Theory of Spatial Relations
  • RCC: Region-Connection Calculus (Leeds
    Qualitative Spatial Reasoning Group)
  • Cf. Dameron et al., "Modeling dependencies between
    relations to ensure consistency of a cerebral
    cortex anatomy knowledge base"

102
Principles
  • 1 anatomical structure ? 1 region (has_location)
  • Define the relationships of adjacency,
    connectedness etc. using RCC-8 and its extensions

NTPP
PO
TPP
EQ
DC
EC
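The RCC-8 relations can be illustrated in one dimension, where regions are closed intervals. This is a simplification assumed for the example: anatomical regions are 3-D, and the function `rcc8` is a hypothetical helper. The six labels above are complemented by the inverses TPPi and NTPPi, giving the eight relations of RCC-8.

```python
def rcc8(a, b):
    """RCC-8 relation between closed intervals a = (a1, a2) and b = (b1, b2).

    Assumes a1 < a2 and b1 < b2.
    """
    (a1, a2), (b1, b2) = a, b
    if (a1, a2) == (b1, b2):
        return "EQ"                      # equal
    if a2 < b1 or b2 < a1:
        return "DC"                      # disconnected
    if a2 == b1 or b2 == a1:
        return "EC"                      # externally connected (touch at a point)
    if b1 <= a1 and a2 <= b2:            # a lies inside b
        return "TPP" if a1 == b1 or a2 == b2 else "NTPP"
    if a1 <= b1 and b2 <= a2:            # b lies inside a
        return "TPPi" if a1 == b1 or a2 == b2 else "NTPPi"
    return "PO"                          # partial overlap


for pair in [((0, 1), (2, 3)), ((0, 1), (1, 2)), ((0, 2), (1, 3)),
             ((0, 1), (0, 2)), ((1, 2), (0, 3)), ((0, 2), (0, 2))]:
    print(pair, rcc8(*pair))
```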
103
Example 1
  • Reasoning with part and location at the instance
    level

Operc. Pars of Inferior Frontal Gyrus
Inferior Frontal Gyrus
104
Example 2
  • Reasoning with location, continuity and external
    connection

PreCentral Gyrus
PostCentral Gyrus
105
Extension to the 3-D case
106
Most ontologies are execrable
But some good ontologies do already exist
  • as far as possible don't reinvent
  • use the power of combination and collaboration
  • ontologies are like telephones: they are valuable
    only to the degree that they are used and
    networked with other ontologies
  • but choose working telephones
  • most UMLS telephones were broken from the start

107
Why do we need rules/standards for good ontology?
  • Ontologies must be intelligible both to humans
    (for annotation) and to machines (for reasoning
    and error-checking): unintuitive rules for
    classification lead to errors
  • Intuitive rules facilitate training of curators
    and annotators
  • Common rules allow alignment with other
    ontologies
  • Logically coherent rules enhance harvesting of
    content through automatic reasoning systems

108
To the degree that basic rules of good ontology
are not satisfied, error checking and ontology
alignment will be achievable, at best, only
with human intervention via force majeure
with unstable results
109
Current practice in the domain of clinical
research
  • Results of clinical trials are organized too
    tightly around specific diagnostic criteria
    imposed by specific, local, hypotheses
  • A change in criteria forces a costly
    re-examination and re-coding of all existing
    records to make them usable in future hypothesis
    generation and testing.

110
How to solve this problem?
  • Just as clinical hypotheses need to be tied to
    basic science, so special-purpose application
    ontologies need to be tied to general-purpose
    reference ontologies

111

How to solve this problem?
  • We separate
  • data as interpreted in terms of current criteria
  • from
  • the structure of the underlying biomedical
    reality
  • and ensure that the first is stored and
    processed always by using terms drawn from a
    shared, stable representation (a reference
    ontology) of the latter.
  • Diagnostic criteria for a disease can then be
    changed but we will still maintain access to the
    data relevant to all prior diagnosed cases of the
    disease in question.
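The separation can be sketched in a few lines: observations are stored in stable terms, and the diagnostic criterion is applied as a separate, replaceable step. Everything specific here is hypothetical, including the field name `marker_level`, the patient identifiers and both thresholds.

```python
# Observations recorded once, in terms drawn from a shared reference vocabulary.
# Field names and values are invented for illustration.
observations = [
    {"patient": "p1", "marker_level": 7.4},
    {"patient": "p2", "marker_level": 6.5},
    {"patient": "p3", "marker_level": 8.1},
]


def diagnosed(observations, threshold):
    """Apply a diagnostic criterion to stored observations without altering them."""
    return [o["patient"] for o in observations if o["marker_level"] >= threshold]


old_cases = diagnosed(observations, threshold=7.8)   # hypothetical earlier criterion
new_cases = diagnosed(observations, threshold=7.0)   # hypothetical revised criterion
print(old_cases, new_cases)
```

Because the stored records never mention the criterion, revising it requires no re-coding of existing data.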

112
Not only data needs to be aligned through
pre-established reference ontologies, so also
does software
  • Currently, application ontologies are built
    afresh for each new application
  • They commonly introduce new idiosyncrasies of
    terminology, format or logic, plus
    simplifications or distortions of their
    subject-matters.
  • This may do no harm in relation to the specific
    application (for example radiology, tissue
    classification, cancer staging) and keeps the
    software simple

113
But what happens
  • when other applications want to use the data
    annotated in their terms, or when we need to
    extend to a larger portion of biomedical
    reality?
  • Now the expanded ontology will no longer
    be compatible with the software designed for its
    original application.
  • Different groups now need to start working with
    different and incompatible versions of an
    ontology, engendering a spiralling complexity as
    these different versions themselves become
    revised and extended for different purposes.

114
The solution
  • The methodology of always developing application
    ontologies against the background of formally
    robust reference ontologies can both counteract
    these tendencies toward ontology proliferation
    and ensure the interoperability of application
    ontologies as they become further developed in
    the future.

115
The methodology of reference ontologies
  • can provide locally developed application
    ontologies with cross-granular understanding of
    the ways processes at the gene and protein level
    are linked to clinically salient processes at
    coarser granularity
  • and it can allow them to take advantage of existing
    logical tools and methods for reasoning across
    large bodies of data.

116
An application ontology
  • is comparable to an engineering artifact such as
    a software tool. It is constructed for a specific
    practical purpose.
  • Examples
  • NCIT
  • FuGO: Functional Genomics Investigation Ontology

117
A reference ontology
  • A reference ontology has a unified
    subject-matter, which consists of entities
    existing independently of the ontology, and it
    seeks to optimize descriptive or representational
    adequacy to this subject matter.
  • A reference ontology is analogous to a scientific
    theory. Thus it consists of representations of
    biological reality which are correct when viewed
    in light of our current understanding of reality,
    and it must be subjected to updating in light of
    scientific advance.
  • Example: The Foundational Model of Anatomy

118
Current Best Practice
119
(No Transcript)
120
(Figure: hierarchy fragment with is_a and part_of links; node labels:
Anatomical Space, Anatomical Structure, Organ Cavity Subdivision,
Organ Cavity, Organ, Serous Sac, Organ Component, Serous Sac Cavity,
Tissue, Serous Sac Cavity Subdivision, Pleural Sac, Pleura (Wall of
Sac), Pleural Cavity, Parietal Pleura, Visceral Pleura, Interlobar
recess, Mediastinal Pleura, Mesothelium of Pleura)
121
The Foundational Model of Anatomy
  • Follows formal rules for Aristotelian
    definitions
  • When A is_a B, the definition of A takes the
    form
  • an A def. a B which ...
  • a human being def. an animal which is rational
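The genus-plus-differentia form can be generated and checked mechanically. The sketch below stores each definition as a (genus, differentia) pair and compares the genus with the asserted is_a parent. The entries for "cell" and "plasma membrane" use FMA wording quoted elsewhere in this talk, while the `IS_A` table and the function names are assumptions made for the example.

```python
# term -> (genus, differentia)
DEFINITIONS = {
    "human being": ("animal", "is rational"),
    "cell": ("anatomical structure",
             "consists of cytoplasm surrounded by a plasma membrane "
             "with or without a cell nucleus"),
    "plasma membrane": ("cell part", "surrounds the cytoplasm"),
}

# Assumed is_a parents, for illustration only.
IS_A = {
    "human being": "animal",
    "cell": "anatomical structure",
    "plasma membrane": "cell part",
}


def article(noun: str) -> str:
    """Indefinite article chosen by first letter (a crude rule)."""
    return "an" if noun[0] in "aeiou" else "a"


def aristotelian(term: str) -> str:
    """Render a definition in the form 'an A def. a B which ...'."""
    genus, differentia = DEFINITIONS[term]
    return f"{article(term)} {term} def. {article(genus)} {genus} which {differentia}"


def genus_matches_parent(term: str) -> bool:
    """True if the genus used in the definition is the asserted is_a parent."""
    return DEFINITIONS[term][0] == IS_A[term]


print(aristotelian("human being"))
print(all(genus_matches_parent(t) for t in DEFINITIONS))
```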

122
FMA Example
  • Cell def. an anatomical structure which consists
    of cytoplasm surrounded by a plasma membrane with
    or without a cell nucleus
  • Plasma membrane def. a cell part that surrounds
    the cytoplasm

123
The FMA regimentation
  • Brings the advantage that each definition
    reflects the position in the hierarchy to which a
    defined term belongs.
  • The position of a term within the hierarchy
    enriches its own definition by incorporating
    automatically the definitions of all the terms
    above it.
  • The entire information content of the FMA's term
    hierarchy can be translated very cleanly into a
    computer representation

124
GO now adopting structured definitions which
contain both genus and differentiae
Species def. Genus + Differentiae
neuron cell differentiation def. differentiation
by which a cell acquires features of a neuron
125
Ontology alignment
One of the current goals of GO is to align

Cell Types in GO                  with  Cell Types in the Cell Ontology
cone cell fate commitment               retinal_cone_cell
keratinocyte differentiation            keratinocyte
adipocyte differentiation               fat_cell
dendritic cell activation               dendritic_cell
lymphocyte proliferation                lymphocyte
T-cell homeostasis                      T_lymphocyte
garland cell differentiation            garland_cell
heterocyst cell differentiation         heterocyst
126
Alignment of the two ontologies will permit the
generation of consistent and complete definitions
  • GO + Cell type = New Definition
  • Osteoblast differentiation: Processes whereby an
    osteoprogenitor cell or a cranial neural crest
    cell acquires the specialized features of an
    osteoblast, a bone-forming cell which secretes
    extracellular matrix.
127
Other Ontologies to be aligned with GO
  • Chemical ontologies
  • 3,4-dihydroxy-2-butanone-4-phosphate synthase
    activity
  • Anatomy ontologies
  • metanephros development
  • GO itself
  • mitochondrial inner membrane peptidase activity
  • → OBO core

128
eventually to comprehend all of OBO
129
[Figure: fragment of the FMA hierarchy, from Anatomical Space and Anatomical Structure down through Organ, Serous Sac and Pleural Sac to Parietal, Visceral and Mediastinal Pleura, connected by is_a and part_of links]
131
The Anatomy Reference Ontology
  • is organized in a graph-theoretical structure
    involving two sorts of links or edges
  • is-a ( is a subtype of )
  • (pleural sac is-a serous sac)
  • part-of
  • (cervical vertebra part-of vertebral column)
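
A graph with two sorts of edges can be sketched as follows. This is a minimal Python illustration, not the Anatomy Reference Ontology's own data model: `OntologyGraph` is a hypothetical class, the two edges quoted on the slide are used as given, and the remaining is_a edges are assumptions read off the labels of the FMA hierarchy figure.

```python
from collections import defaultdict


class OntologyGraph:
    """Directed graph whose edges are typed as either is_a or part_of."""

    RELATIONS = ("is_a", "part_of")

    def __init__(self):
        # One adjacency map per relation: child term -> set of parent terms.
        self._edges = {rel: defaultdict(set) for rel in self.RELATIONS}

    def add(self, child, relation, parent):
        if relation not in self._edges:
            raise ValueError(f"unknown relation: {relation}")
        self._edges[relation][child].add(parent)

    def ancestors(self, term, relation):
        """All terms reachable from `term` by following `relation` edges."""
        seen = set()
        stack = [term]
        while stack:
            node = stack.pop()
            for parent in self._edges[relation].get(node, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


graph = OntologyGraph()
# The two edges given on the slide:
graph.add("pleural sac", "is_a", "serous sac")
graph.add("cervical vertebra", "part_of", "vertebral column")
# Assumed edges, for illustration only:
graph.add("serous sac", "is_a", "organ")
graph.add("organ", "is_a", "anatomical structure")

print(sorted(graph.ancestors("pleural sac", "is_a")))
```

Keeping the two relations in separate adjacency maps means a query never mixes subtype links with parthood links by accident.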

132
at every level of granularity
133
Modularity
134
How does a kidney work?
NEPHRON
135
Nephron Functions
FUNCTIONAL SEGMENTS
136
Top-Level Categories in the FMA
137
  • anatomical structure (cell, lung, nerve, tooth)
  • result from the coordinated expression of
    structural genes
  • have their own 3-D shape

138
  • portion of body substance
  • inherits its shape from container
  • portion of urine
  • portion of menstrual fluid
  • portion of blood

139
  • anatomical space
  • cavities, conduits

140
  • anatomical attribute
  • mass
  • weight
  • temperature
  • your temperature
  • its value now

141
  • anatomical relationship
  • located_in
  • contained_in
  • adjacent_to
  • connected_to
  • surrounds
  • lateral_to (West_of)
  • anterior_to

142
  • boundary
  • bona fide / fiat

www.enel.ucalgary.ca/People/Mintchev/stomach.htm
143
Connectedness and Continuity
  • The body is a highly connected entity.
  • Exceptions: cells floating free in blood
  • continuous_with,
  • attached_to (muscle to bone)
  • synapsed_with (nerve to nerve and nerve to
    muscle)
  • Two continuants are continuous on the instance
    level if and only if they share a fiat boundary.

144
basis for generalization to other species
[Figure: fragment of the FMA hierarchy, from Anatomical Space and Anatomical Structure down through Organ, Serous Sac and Pleural Sac to Parietal, Visceral and Mediastinal Pleura]
145
[Figure: the same FMA hierarchy fragment (Anatomical Space, Anatomical Structure, Organ, Serous Sac, Pleural Sac, Pleura, Pleural Cavity and their subdivisions), with is_a and part_of links marked]
146
Web-Based Representations of Neuroanatomy
148
includes Neuronames
150
Human Morphometry and Function BIRN Testbeds
  • with thanks to Christine Fennema-Notestine and
    Jessica Turner

CBiO/BIRN Workshop 2006
151
BIRN Ontology Needs
  • GOAL: User will employ BIRN interface and
    Mediator to perform scientific queries on data
    from
  • structural and functional MRI experiments
  • clinical assessments
  • psychiatric interviews
  • and/or behavioral experiments
  • BIRN needs for common vocabularies
  • Mediator needs to talk across databases to find
    relevant/similar information; this requires
    linking of concepts to table columns and values
  • Query interface needs semantic network to find
    related information

152
Example queries
  • Find all datasets of schizophrenics with
    structural and functional imaging data related to
    working memory
  • Find the correlation between hippocampal volume
    and working memory performance in AD subjects

153
mBIRN priorities
  • To relate clinical assessments, cognitive
    function, and neuroanatomy within mBIRN's
    multi-site AD sample, with future branching into
    neuropsychiatric measures
  • Only a high quality reference ontology of
    neuro(patho)anatomy from the macroscopic to the
    subcellular levels of granularity can give you
    this

154
Existing neuroanatomical ontology
  • Need to create related function-based ontology

[Figure: neuroanatomical hierarchy with node labels Brain, Cerebellum, Cerebrum, Cerebral white matter, Cerebral cortex, Frontal cortex, Temporal cortex, Mesial temporal, Superior temporal, Amygdala, Hippocampus; additional labels: CVLT, Memory]
155
Need to create related function-based
ontology
  • UMLS: mental process is_a organism function
  • Function vs. functioning
  • Many entities have functions which they never
    realise
  • A has function B A can realise B (under which
    circumstances?)

156
Need to create related function-based
ontology
  • A function is a disposition of an independent
    continuant to engage in corresponding processes.
  • To what extent are the various functions
    identified by BIRN in fact complex processes
    with many less complex processes as their parts?
  • How are functions different from dysfunctions /
    malfunctions?
  • Are all functions such that their execution is
    good for the organism?

157
Need to create related function-based
ontology
  • You cannot classify parts of the brain on the
    basis of which parts can think, remember, effect
    movement or perceive various kinds of sensations,
    just as you cannot sort anatomical entities on
    the basis of which can pump, digest, secrete,
    fertilize or stabilize.
  • It is impossible to create an inheritance class
    subsumption hierarchy of neuroanatomical entities
    at any meaningful depth on the basis of
    function.

158
[Figure: graph with node labels Assessment, Neuropsychology, Cognition, Amnesia, Cognitive impairment, Memory, Learning, CVLT, Task and score description, Brain, Cerebrum, Cerebral cortex, Frontal, Temporal, Mesial temporal, Hippocampus]
159
Can we reason on the basis of a graph of this
sort?
[Figure: graph with node labels Behavioral Paradigm, Assessment, SCID-Patient, SIRP, CVLT, Breathhold, Long Term memory, Working memory, Memory, Attention, Cognitive Process, Action]
160
Bonfire Relations
  • relation: the type of relation between the
    concept to the left and the concept to the right
  • PAR: Parent
  • CHD: Child
  • SIB: Sibling
  • RB: Broader Relationship
  • RN: Narrower Relationship
  • RO: Other Relationship
161
BIRN Relations
  • UMLS (PAR, CHD, RN, RO, RB, SY).
  • RB: has a broader relationship
  • RN: has a narrower relationship
  • RO: has relationship other than synonymous,
    narrower, or broader
  • CHD: has child relationship in a Metathesaurus
    source vocabulary
  • SIB: has sibling relationship in a Metathesaurus
    source vocabulary

162
Circular Hierarchical Relationships in the
UMLS: Etiology, Diagnosis, Treatment,
Complications and Prevention
Olivier Bodenreider
  • Topographic regions General terms
  • Physical anatomical entity
  • Anatomical spatial entity
  • Anatomical surface
  • Body regions
  • Topographic regions
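
Circular hierarchies of this kind can be found mechanically. The following Python sketch is one possible check, not the method used in the cited work: `find_cycle` is a hypothetical helper, and the example assumes that each bullet above is a child of the next and that the first and last bullets name the same concept ("Topographic regions").

```python
def find_cycle(parents):
    """Return a list of terms forming a cycle, or None if there is none.

    `parents` maps each term to a list of its parent terms.
    """
    finished = set()                      # terms already fully explored

    def walk(term, path, on_path):
        if term in on_path:               # came back to a term on the current chain
            return path[path.index(term):] + [term]
        if term in finished:
            return None
        path.append(term)
        on_path.add(term)
        for parent in parents.get(term, ()):
            cycle = walk(parent, path, on_path)
            if cycle:
                return cycle
        path.pop()
        on_path.remove(term)
        finished.add(term)
        return None

    for start in parents:
        cycle = walk(start, [], set())
        if cycle:
            return cycle
    return None


# Assumed reading of the slide: each term is a child of the next one.
chain = [
    "Topographic regions",
    "Physical anatomical entity",
    "Anatomical spatial entity",
    "Anatomical surface",
    "Body regions",
    "Topographic regions",
]
hierarchy = {child: [parent] for child, parent in zip(chain, chain[1:])}

print(find_cycle(hierarchy))
```

A hierarchy in which this check returns anything other than `None` cannot support inheritance along is_a, since a term would end up being its own ancestor.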

163
MeSH
  • MeSH Descriptors > Index Medicus Descriptor >
    Anthropology, Education, Sociology and Social
    Phenomena (MeSH Category) > Social Sciences >
    Political Systems > National Socialism
  • National Socialism is_a Political Systems
  • National Socialism is_a Anthropology ...

164
MeSH
  • National Socialism is_a MeSH Descriptor
  • Cf. NeuroNames
  • Ontology =def. a codification of the
    relationships between words and concepts

165
Human BIRN data includes
  • Participant demographics such as age, gender,
  • Clinical and psychiatric information
  • Assessments used, data type
  • Diagnostic information
  • Behavioral data during fMRI tasks
  • Need to know how to interpret that (is a button
    1 response a yes or a no?)
  • Raw structural and functional images
  • Need information about data collection and
    preprocessing methods
  • Single-subject and group level analyses and
    results
  • Need information about analytic methods used

166
Areas where application ontologies will be needed
  • Participant demographics such as age, gender,
  • Clinical and psychiatric information
  • Assessments used, data type
  • Diagnostic information
  • Behavioral data during fMRI tasks
  • Need to know how to interpret that (is a button
    1 response a yes or a no?)
  • Raw structural and functional images
  • Need information about data collection and
    preprocessing methods
  • Single-subject and group level analyses and
    results
  • Need information about analytic methods used

167
Bottom-up search
  • User's dataset contains the CVLT: what does it
    measure?
  • Search for CVLT
  • Related to PARENT concepts like
    Neuropsychological tests or Assessment Scales
    or SIBLING concepts of other tests
  • What is the CVLT? This doesn't answer the user's
    question.
  • Need relationship links to function: memory and
    learning
  • Need relationship links to structure: anatomical
    regions reflected in change of performance on
    this measure → hippocampus

168
Top-down search
  • User interested in studying the relationship
    between hippocampal volume and memory performance
    in Alzheimer's disease.
  • Search for measures of memory
  • Would like to see memory linked to CVLT
  • Would like to see memory linked to hippocampus at
    a very basic level
  • Would like to see links to potential disorders
    assessed, e.g., amnesia or AD
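
The bottom-up and top-down searches both come down to following typed links in one direction or the other. A minimal Python sketch follows; the relation names `measures`, `associated_structure` and `impaired_in` are hypothetical labels chosen for illustration, not relations defined by BIRN or UMLS, while the terms themselves come from the slides.

```python
# (subject, relation, object) triples; relation names are illustrative assumptions.
LINKS = [
    ("CVLT", "is_a", "Neuropsychological tests"),
    ("CVLT", "measures", "memory"),
    ("CVLT", "measures", "learning"),
    ("memory", "associated_structure", "hippocampus"),
    ("memory", "impaired_in", "amnesia"),
    ("memory", "impaired_in", "Alzheimer's disease"),
]


def outgoing(term, relation):
    """Objects linked from `term` by `relation` (bottom-up direction)."""
    return sorted(o for s, r, o in LINKS if s == term and r == relation)


def incoming(term, relation):
    """Subjects linked to `term` by `relation` (top-down direction)."""
    return sorted(s for s, r, o in LINKS if o == term and r == relation)


# Bottom-up: the dataset contains the CVLT; what does it measure?
print(outgoing("CVLT", "measures"))

# Top-down: which measures of memory exist, and which structure is involved?
print(incoming("memory", "measures"))
print(outgoing("memory", "associated_structure"))
```

With only PARENT and SIBLING links, the first query would return nothing useful; the typed `measures` link is what answers the user's question.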

169
Ontology/Terminology Infrastructure
  • GOAL: to allow database mediation and scientific
    queries for multi-site clinical neuroimaging
    studies. This requires relating database tables
    to concepts, and relating brain structure and
    function through neuroanatomical regions,
    neuropsychological and cognitive terms, and
    clinical assessments.

170
Ontology/Terminology Infrastructure
  • To do this, the Mediator relies in part on
    defined terms/concepts to define relationships
    between similar terms from different databases.
  • If a user is interested in data related to "long
    delay free recall," it is important to also
    include information related to "memory." This
    type of relational knowledge is critical to find
    other values in other databases that have similar
    information.
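
One way to picture what the Mediator needs is a concept hierarchy plus a mapping from database columns to concepts. The Python sketch below is an assumption-laden illustration, not the BIRN Mediator's implementation: the site, table and column names are invented, and only the link from "long delay free recall" to "memory" comes from the slide.

```python
# Broader-concept links; this one pair is taken from the slide.
BROADER = {"long delay free recall": "memory"}

# Hypothetical mapping of (database, table, column) to the concept it records.
COLUMN_CONCEPTS = {
    ("site_a", "cvlt_scores", "ldfr"): "long delay free recall",
    ("site_b", "assessments", "memory_index"): "memory",
    ("site_b", "demographics", "age"): "age",
}


def expand_concept(concept):
    """Return the concept followed by all of its broader concepts."""
    result = [concept]
    # Stop at the top of the hierarchy, or if a link would repeat a concept.
    while result[-1] in BROADER and BROADER[result[-1]] not in result:
        result.append(BROADER[result[-1]])
    return result


def columns_for(concept):
    """Find every column annotated with the concept or a broader one."""
    wanted = set(expand_concept(concept))
    return sorted(col for col, c in COLUMN_CONCEPTS.items() if c in wanted)


for column in columns_for("long delay free recall"):
    print(column)
```

The query names only one concept, yet it reaches a column in a second database because the two columns are tied to related concepts rather than to matching column names.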

171
Ontology/Terminology Infrastructure
  • In addition, the ontology will provide a
    semantic network: for a user searching for
    "memory" information, related information would
    include
  • Cognitive terms, e.g., recall, recognition, short
    and long term memory
  • Assessment terms, e.g., California Verbal
    Learning Test
  • "Disorder of" terms, e.g., Alzheimer's disease
    is a disorder of memory

How to block information overload?
172
Bottom-up search
  • User's resultant dataset contains the MMSE; the
    user asks: what does it measure?
  • Search for MMSE concept
  • Related to PARENT concepts like
    Neuropsychological tests or Assessment Scales
    or SIBLING concepts of other tests
  • What is the MMSE? This doesn't answer the user's
    question.
  • Need relationship links to function: general
    cognitive ability, cognitive impairment, dementia
    severity, brain damage
  • Need relationship links to structure: anatomical
    regions reflected in change of performance on
    this measure, although a relatively non-specific
    measure

173
Top-down search
  • What variables exist that would provide a measure
    of general cognitive function and dementia
    severity?
  • Search for measures of (general) cognitive
    function
  • Would like to see general cognitive ability,
    cognitive impairment, dementia severity linked to
    MMSE
  • Would like to see general cognitive ability,
    cognitive impairment, dementia severity linked to
    neuroanatomical regions, simply brain in this
    case
  • Would like to see links to potential disorders
    measured, e.g., AD

174
NeuroNames (with thanks to Onard Mejino)
  • has a limited scope.
  • It deals with neuroanatomical structures only at
    the gross level. No cellular, subcellular or
    macromolecular entities are represented.
  • The peripheral nervous system and the spinal cord
    are not included.
  • It represents structures from different species
    (human, macaque and rodent) in the same
    hierarchy.

175
NN's main hierarchy
  • is a partonomy based on mutually exclusive and
    exhaustive volumetric partitions, the equivalent
    of regional partition in the FMA.
  • The partonomy supports only ONE partition view
    and therefore does not accommodate
  • other recognized regional partitions like Brodmann
    areas (treated as ancillary structures)
  • constitutional parts like the internal pyramidal
    layer of neocortex and the vasculature of
    neuraxis (entities that have important clinical
    significance)
  • new partitions advanced by new technology like
    gene expression mappings or radiologic imaging
    techniques
  • partitions determined by formal spatial
    region-based ontologies like RCC

176
The Neuronames partonomy
  • will serve at best as an application ontology
    for annotating segmented images of the brain.
  • But it will still be very difficult to link the
    annotated image data to all the other types of
    data which BIRN will need to describe
  • → a reference ontology of neuroanatomy is a first
    priority.

177
Neuronames
  • Univocity is not enforced in the literature of
    neuroanatomy; e.g. the term "Basal ganglia"
    represents different structures when used in
    association with anatomic, functional and
    clinical views.
  • How will NN resolve or clarify this?

178
Neuronames
  • entities are primarily identified on the basis of
    stains that distinguish gray matter from white
    matter
  • thus not on principles or rules that define the
    type of the entity in question, thereby yielding
    a partition not in accord with the standards
    commonly accepted for representing the rest of
    the body.
  • gray matter and white matter are viewed as
    tissues. But tissue is usually defined as an
    aggregate of similarly specialized cells and
    intercellular matrix.
  • yet gray matter consists not of cells but of cell
    bodies, white matter not of cells but of neurites

179
Neuronames
  • gives no explicit definitions, and the
    representations it gives (e.g. of the Fourth
    Ventricle) are often at odds with consensual
    usage
  • hence scalability, extendability and combinability
    with other ontologies are limited; how then can
    it be used to bridge research efforts at the
    genomic / proteomic level with those at the
    clinical level?
  • Information unique to neuroanatomical entities,
    such as axonal input/output relationships,
    connectivity, neuron type, neurotransmitter and
    receptor types, is indispensable in establishing
    and understanding both structural and
    physiological relationships among neuroanatomical
    entities and their relationship with the rest of
    the body.

180
BIRNLex
  • does provide definitions, normally taken over
    from UMLS

181
Rules for definitions
  • A: child term
  • B: parent term
  • an A =def. a B which Cs
  • If a definition is correct it should always make
    sense to substitute "a B which Cs" for "an A"
  • A human being is subject to processes of aging
  • A rational animal is subject to processes of
    aging
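
The substitution rule can be tried out mechanically. In this small Python sketch, `substitute` is a hypothetical helper that does plain string replacement; the first example is the one on this slide, and the second applies the same test to the BIRNLex definition of "mouse" quoted on a later slide.

```python
def substitute(sentence, term, definiens):
    """Replace the defined term with its definiens in a sentence."""
    return sentence.replace(term, definiens)


# A definition that passes the test: the result still makes sense.
print(substitute("A human being is subject to processes of aging",
                 "human being", "rational animal"))

# A definition that fails it: the result is about a name, not an animal.
print(substitute("A mouse is subject to processes of aging",
                 "mouse", "common name for the species mus musculus"))
```

The code cannot judge whether the output makes sense; it only produces the sentence that a human reviewer then has to read.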

182
BIRNLex
  • The eye =def.
  • The eyeball and its constituent parts, e.g.
    retina
  • mouse =def.
  • common name for the species mus musculus

183
BIRNLex
184
BIRNLex
185
BIRNLex
186
BIRNLex
bear in mind always that your ontology needs to
be interoperable with other ontologies
188
BIRNLex
  • surface =def. 3D segmentation obtained by fitting
    a polygonal mesh around the boundary of an object
    of interest, creating a 3D surface
  • Concept =def. Generic ideas or categories derived
    from common properties of objects, events, or
    qualities, usually represented by words or
    symbols

189
BIRNLex
  • brain imaging =def. none; synonymous with
    positrocephalogram, nos
  • CA1 =def. CA1 cytoarchitectonic field of
    hippocampus
  • cognitive process =def. conceptual function or
    thinking in all its forms

190
BIRNLex and UMLS-SN
  • Rest (SN: Daily or Recreational Activity)
  • Principal Investigator (SN: Professional or
    Occupational Group)
  • Left handedness (SN: Organism Attribute)
  • Ambidextrous (SN: Finding)
  • Brain Imaging (SN: Diagnostic Procedure)
  • Brain Mapping (SN: Diagnostic Procedure, Research
    Activity)
  • Healthy Adult (SN: Finding)


191
BIRNLex
192
Mouse BIRN Ontologies
Maryann Martone and Bill Bug
2005 All Hands Meeting
193
Use of Ontologies in BIRN
  • Databases
  • Enforces semantic consistency within a database
  • Data Sharing
  • Establishes semantic relationship among concepts
    contained in distributed databases
  • Data integration
  • Bridging across multiscale and multimodal data
  • Concept-based queries
  • Ontologies can be used to alter semantic context
    to present a view of the conceptual aspects of a
    data set or meta-analysis result most relevant to
    a particular neuroscientist

194
Objectives of Working Group
  • Educate BIRN participants on the use of
    ontologies and associated tools for data
    integration
  • Tuesday (PM) and Wednesday (AM)
  • Develop a set of ontology resources for BIRN
    participants, based on existing ontologies where
    possible
  • Identify areas that are not well covered by
    existing ontologies for possible development.
  • Develop a clear set of policies and procedures
    for working with ontologies
  • Including curation, addition of core ontologies,
    extension of ontologies, mapping of databases to
    ontologies

195
Goals of OTF
  • Provide a dynamic knowledge infrastructure to
    support integration and analysis of BIRN
    federated data sets, one which is conducive to
    accepting novel data from researchers to include
    in this analysis.
  • Identify and assess existing ontologies and
    terminologies for summarizing, comparing,
    merging, and mining datasets. Relevant subject
    domains include clinical assessments,
    demographics, cognitive task descriptions,
    imaging parameters/data provenance in general,
    and derived (fMRI) data.
  • Identify the resources needed to achieve the
    ontological objectives of individual test-beds
    and of the BIRN overall. May include finding
    other funding sources, making connections with
    industry and other consortia facing similar
    issues, and planning a strategy to acquire the
    necessary resources.

196
BIRN Ontology Resources
Mouse BIRN Ontology Resource Page
http://nbirn.net/Resources/Users/Ontologies/
197
Current Ontology Development by Mouse BIRN
Participants
  • Developmental Ontology
  • Seth Ruffins, Cal Tech
  • Subcellular Anatomy
  • Maryann Martone and Lisa Fong, UCSD

198
Ontology for Subcellular Anatomy of Nervous System
199
CCDB Dictionary
Term | Ontology | ConceptID | Semantic Type | Definition
Cerebellum | UMLS | C0007765 | Body Part, Organ, or Organ Component | Part of the metencephalon that lies in the posterior cranial fossa behind the brain stem. It is concerned with the coordination of movement. (MSH)
Glial Fibrillary Acidic Protein | UMLS | C0017626 | Amino Acid, Peptide, or Protein; Biologically Active Substance | An intermediate filament protein found only in glial cells or cells of glial origin. MW 51,000. (MSH)
Medium Spiny Neuron | Bonfire | BID000012 | Cell | Small (10-15 µm in diameter) projection neurons found in neostriatum, possessing a roughly spherical dendritic tree composed of 3-5 primary dendrites. Dendrites are covered with dendritic spines.
Purkinje cell | UMLS | C0034143 | Cell | Large branching neurons of the middle layer of cerebellar cortex, characterized by vast arrays of dendrites; the output neurons of the cerebellar cortex.
200
Some Areas of Interest to BIRN
  • Linking animal and human imaging data
  • Navigating through multi-resolution information
  • Map between Human and Animal models
  • Functional assessment

[Figure labels: brain, cerebellum, cerebellar cortex, Purkinje cell, dendritic spine; Entopeduncular nucleus; Globus pallidus, internal segment; Disease Process; Animal Model]
201
Anatomical Knowledge Sources
  • Foundational model of anatomy
  • Neuronames (Brain Info)
  • BAMS
  • Adult Mouse Anatomical Dictionary (Edinburgh/GO)

Although BIRN is an open, diverse and fluid
environment, the use of ontologies for enhanced
interoperability will be pointless if we allow
ran