Ontologies: uses and stakes in biology

Provided by: Chri190

1
Ontologies: uses and stakes in biology.
Ontologies are now key elements in every domain
that relies heavily on knowledge or large data
sets (not only in biology: Google, for example).
2
Ontology: definition(s).
  • An ontology: a formal and explicit description
    of every concept in a particular domain of
    knowledge.
  • An ontology: the abstraction of the known objects
    within a domain, their properties and the
    relationships between them.
  • Such an ontology (including individual instances
    of classes) constitutes a knowledge base.

3
Information versus Knowledge.
  • Information includes all primary data
    (measurements) provided by instrumentation
    (images, sequences...) as well as the secondary
    data necessary for subsequent analyses (for DNA
    chips, secondary data as defined under MIAME):
    results, materials, methods.
  • Knowledge includes the annotations of this
    information made by experts within the frame of
    the paradigm currently in use in that particular
    domain.
  • An ontology is the formal representation of such
    knowledge.

4
Definitions (easier)
  • An ontology is made of:
      • A controlled vocabulary common to experts
        (required for the sharing of knowledge).
      • A representation of the relationships between
        terms of the vocabulary; these relationships
        define knowledge.
      • Inference rules on some instantiations.
  • An ontology can be read and used by:
      • Humans.
      • Computers.
  • An ontology allows:
      • Permanence of knowledge (even in the absence of
        the specialist).
      • Humans to use the knowledge of other specialists.
      • Algorithms to use knowledge.
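The three ingredients listed above (a controlled vocabulary, relationships between terms, and inference rules applied to instances) can be sketched in a few lines of Python. This is only an illustration: the terms, the `is_a` relation, the `hexokinase` instance and the helper names are all assumptions chosen for the example, not taken from any real ontology.

```python
# Toy ontology. Every term and relation below is an illustrative assumption.

# 1. Controlled vocabulary: the only terms experts are allowed to use.
VOCABULARY = {"molecule", "protein", "enzyme", "kinase", "lipid"}

# 2. Relationships between terms: child term -> direct "is_a" parents.
IS_A = {
    "kinase": {"enzyme"},
    "enzyme": {"protein"},
    "protein": {"molecule"},
    "lipid": {"molecule"},
}

# Instantiation: a real object attached to a term of the vocabulary.
INSTANCES = {"hexokinase": "kinase"}


def ancestors(term):
    """Inference rule: is_a is transitive, so collect all indirect parents."""
    if term not in VOCABULARY:
        raise KeyError(f"{term!r} is not in the controlled vocabulary")
    found = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in IS_A.get(current, set()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def is_instance_of(name, term):
    """True if the instance belongs to `term` directly or by inference."""
    direct = INSTANCES[name]
    return term == direct or term in ancestors(direct)


print(is_instance_of("hexokinase", "protein"))
print(is_instance_of("hexokinase", "lipid"))
```

The same structure can be read by a human (the dictionaries) and used by an algorithm (the two functions), which is the point made on this slide.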

5
  • Few ontologies have been developed for the sake
    of the ontology itself.
  • The goal of an ontology is the applications that
    will use it.
  • This is the main difference from an encyclopedia,
    whose purpose is to embrace the entire knowledge
    of a domain.
  • An ontology is a means to an end.

6
Existing ontologies in biology.
  • Most ontologies in biology are in the public
    domain.
  • The list grows "every day".
  • They are supposed to be orthogonal: they do not
    cover the same subjects.
  • OBO (Open Biological Ontologies) is the
    directory that lists such ontologies.
  • Presently 45 are listed. The two major ontologies
    are:
      • GO: Gene Ontology (19.8 MB in XML format).
      • UMLS: Unified Medical Language System (20 GB).
        1 million biomedical concepts and 4.3 million
        concept names from more than 100 controlled
        vocabularies and classifications (some in
        multiple languages) used in patient records,
        administrative health data, bibliographic and
        full-text databases and expert systems.

7
Modelling and ontologies
  • Modelling consists in building an abstract and
    synthetic view of the real world in order to
    better grasp "reality" within the context of a
    goal.
  • => Such abstraction reduces complexity by
    focusing on particular aspects, with particular
    goals in mind.
  • A model formulates what is known about objects in
    a particular context and articulates knowledge.
  • A model also allows data to be exchanged with no
    concern about format.

8
  • Abstraction and the presence of a controlled
    vocabulary allow a common vision of reality to be
    shared without ambiguity.
  • => This solves the common problem of ensuring
    that all actors understand each other well and
    that they agree on a common problem and common
    means of solving it.
  • Typically, abstraction is followed by a "top
    down" procedure toward the "real world" or, when
    using a standard vocabulary, by an
    "instantiation" of the model. This means, for
    example, associating real objects with each word
    of the controlled vocabulary.

9
XML and ontologies
  • Most if not all ontologies are XML-based. XML is
    a language that uses tags to delimit entities.
  • Below is an example of what an ontology
    describing biology could look like:

    <biology>
      <molecular_genetics></molecular_genetics>
      <biochemistry>
        <proteins></proteins>
        <nucleic_acids></nucleic_acids>
        <lipids></lipids>
      </biochemistry>
      <cell_biology></cell_biology>
      <physiology></physiology>
      <medicine></medicine>
    </biology>

Note that in this example "medicine" does not
include biochemistry or molecular biology...
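
As a rough illustration of how a program could read such a hierarchy, the sketch below parses a cleaned-up copy of the example with Python's standard `xml.etree.ElementTree` module and checks which element includes which. The tag names follow the example. The helper names `parent_map` and `includes` are assumptions made for this sketch, which also assumes that each tag name occurs only once.

```python
import xml.etree.ElementTree as ET

# Cleaned-up copy of the example hierarchy (tag spelling normalised).
ONTOLOGY_XML = """
<biology>
  <molecular_genetics></molecular_genetics>
  <biochemistry>
    <proteins></proteins>
    <nucleic_acids></nucleic_acids>
    <lipids></lipids>
  </biochemistry>
  <cell_biology></cell_biology>
  <physiology></physiology>
  <medicine></medicine>
</biology>
"""


def parent_map(root):
    """Map each tag name to the tag name of its enclosing element."""
    return {child.tag: parent.tag for parent in root.iter() for child in parent}


def includes(root, outer, inner):
    """True if element `inner` is nested, at any depth, inside `outer`."""
    parents = parent_map(root)
    current = parents.get(inner)
    while current is not None:
        if current == outer:
            return True
        current = parents.get(current)
    return False


root = ET.fromstring(ONTOLOGY_XML)
print(includes(root, "biochemistry", "proteins"))  # nested inside biochemistry
print(includes(root, "medicine", "biochemistry"))  # siblings under biology
```

Nesting is the only relationship this format expresses, which is why the remark about "medicine" holds: two sibling elements know nothing about each other.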
10
Gene Ontology.
  • GO was born in 1998. Its main objective is to
    deal with information related to genes.
  • It results from a collaboration between major
    databases such as FlyBase (Drosophila) and the
    Saccharomyces Genome Database, together with other
    genomic databases for species such as Mus
    musculus, Homo sapiens, etc.
  • GO is subdivided into three main parts:
      • Molecular Function.
        Function of gene products. Examples:
        carbohydrate binding, ATPase activity.
      • Biological Process.
        General biological role of complex molecular
        functions. Examples: mitosis, purine
        metabolism.
      • Cellular Component.
        Subcellular structures, localisations and
        macromolecular complexes. Examples: nucleus,
        telomere, origin recognition complex.

11
(No Transcript)
12
  • GO attaches an "evidence code" to its
    annotations, qualifying them according to their
    quality.
  • Clearly, information described in a well-refereed
    paper cannot be used in the same way as
    information derived from an automatic data-mining
    algorithm.
  • For example, the intron-exon structure of a gene
    known from the cloning of an entire mRNA is much
    more reliable knowledge than an ab initio
    prediction produced by an HMM!
  • IC: inferred by curator
  • IDA: inferred from direct assay
  • IEA: inferred from electronic annotation
  • IEP: inferred from expression pattern
  • IGI: inferred from genetic interaction
  • IMP: inferred from mutant phenotype
  • IPI: inferred from physical interaction
  • ISS: inferred from sequence or structural
    similarity
  • NAS: non-traceable author statement
  • ND: no biological data available
  • TAS: traceable author statement
  • NR: not recorded
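
One way an application might use these codes is to keep only the annotations whose evidence it trusts. The sketch below is a minimal illustration. The code table is the list above, but the annotations, the gene names and the choice of which codes count as "experimental" are assumptions made for the example.

```python
# Evidence codes as listed on the slide.
EVIDENCE_CODES = {
    "IC": "inferred by curator",
    "IDA": "inferred from direct assay",
    "IEA": "inferred from electronic annotation",
    "IEP": "inferred from expression pattern",
    "IGI": "inferred from genetic interaction",
    "IMP": "inferred from mutant phenotype",
    "IPI": "inferred from physical interaction",
    "ISS": "inferred from sequence or structural similarity",
    "NAS": "non-traceable author statement",
    "ND": "no biological data available",
    "TAS": "traceable author statement",
    "NR": "not recorded",
}

# Assumption for this sketch: codes backed by a laboratory experiment.
EXPERIMENTAL = {"IDA", "IEP", "IGI", "IMP", "IPI"}

# Hypothetical annotations: (gene product, GO term, evidence code).
ANNOTATIONS = [
    ("geneA", "ATPase activity", "IDA"),
    ("geneB", "ATPase activity", "IEA"),
    ("geneC", "carbohydrate binding", "ISS"),
    ("geneD", "mitosis", "IMP"),
]


def filter_by_evidence(annotations, accepted):
    """Keep only the annotations whose evidence code is in `accepted`."""
    unknown = set(accepted) - set(EVIDENCE_CODES)
    if unknown:
        raise ValueError(f"unknown evidence codes: {sorted(unknown)}")
    return [a for a in annotations if a[2] in accepted]


for gene, term, code in filter_by_evidence(ANNOTATIONS, EXPERIMENTAL):
    print(gene, term, EVIDENCE_CODES[code])
```

With this grouping, the electronic (IEA) and similarity-based (ISS) annotations are dropped, which mirrors the distinction drawn above between a refereed experimental result and an automatic prediction.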

13
  • For a biologist, GO allows queries at various
    levels. For example, one can use GO for:
      • Finding all gene products in the mouse that are
        involved in signal transduction.
      • Looking for all tyrosine kinase receptors.
      • ...
  • Each gene product is linked at various depths in
    the ontology, depending on what is known. For
    example:
      • A well-known protein will be linked in several
        places in GO, usually near the terminal leaves.
      • A less well-known protein will be linked to a
        few general terms (or just one), such as
        "metabolism".
      • A predicted gene of unknown function will not
        be linked.
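
The kind of query described above can be sketched as follows: a gene product answers a query for a term if it is linked to that term or to any term below it. The hierarchy fragment, the gene names and the helper names are hypothetical, and real GO data and tools are organised differently.

```python
# Hypothetical fragment of a GO-like hierarchy: child term -> parent terms.
PARENTS = {
    "signal transduction": {"biological process"},
    "receptor tyrosine kinase signaling": {"signal transduction"},
    "metabolism": {"biological process"},
    "purine metabolism": {"metabolism"},
}

# Hypothetical annotations: gene product -> terms it is linked to.
ANNOTATIONS = {
    # Well-known protein: several links, close to the leaves.
    "well_known_protein": {
        "receptor tyrosine kinase signaling",
        "purine metabolism",
    },
    # Less well-known protein: a single general term.
    "less_known_protein": {"metabolism"},
    # Predicted gene of unknown function: not linked at all.
    "predicted_gene": set(),
}


def descendants(term):
    """Return `term` plus every term below it in the hierarchy."""
    result = {term}
    changed = True
    while changed:
        changed = False
        for child, parents in PARENTS.items():
            if child not in result and parents & result:
                result.add(child)
                changed = True
    return result


def gene_products_for(term):
    """Gene products linked to `term` or to any of its descendants."""
    wanted = descendants(term)
    return sorted(g for g, terms in ANNOTATIONS.items() if terms & wanted)


print(gene_products_for("signal transduction"))
print(gene_products_for("metabolism"))
```

Querying a general term returns both the well-known and the less well-known protein, while the unlinked predicted gene never appears, whatever the query.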

14
An "unexpected bonus" for biology.
  • Biology is a field that still lacks a good
    formalism (as opposed, for example, to physics).
  • Building ontologies allows the emergence of
    well-defined concepts and introduces some logic.
  • Finally, ontologies:
      • pinpoint contradictions,
      • highlight grey areas,
      • reveal "holes" in our present knowledge.

15
Building ontologies from texts
  • For a long time, ontologies were considered to be
    a formalisation of an expert's knowledge and
    know-how.
  • Ontologies were then derived from analyses of
    experts' behaviour.
  • More recently, ontologies have been built from the
    analysis of a corpus of texts.
  • The corpus is produced by an expert; controlled
    vocabularies and the main relationships are
    proposed by algorithms and validated by experts.
    This allows:
      • Using the terms actually employed in that
        particular domain by the majority of scientists.
      • Maintaining a strong link between the ontology
        and the textual documents that will in the end
        be analysed with the help of the ontology.
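
The "proposed by algorithms, validated by experts" loop can be illustrated with a deliberately simple heuristic: propose every two-word sequence that appears in several documents, then let an expert keep or reject each candidate. The slide does not say which algorithms are used, so the heuristic, the mini-corpus and the expert's choices below are all assumptions.

```python
import re
from collections import Counter

# Hypothetical mini-corpus standing in for the texts chosen by the expert.
CORPUS = [
    "The tyrosine kinase receptor triggers signal transduction.",
    "Signal transduction depends on the tyrosine kinase domain.",
    "Purine metabolism is independent of signal transduction.",
]


def candidate_terms(corpus, min_docs=2):
    """Propose two-word terms found in at least `min_docs` documents."""
    doc_counts = Counter()
    for text in corpus:
        words = re.findall(r"[a-z]+", text.lower())
        # A set, so each term is counted once per document.
        bigrams = {" ".join(pair) for pair in zip(words, words[1:])}
        doc_counts.update(bigrams)
    return sorted(t for t, n in doc_counts.items() if n >= min_docs)


# Step 1: the algorithm proposes candidates (some of them are noise).
proposed = candidate_terms(CORPUS)

# Step 2: a (hypothetical) expert validates the proposals.
EXPERT_REJECTED = {"the tyrosine"}
validated = [t for t in proposed if t not in EXPERT_REJECTED]

print(proposed)
print(validated)
```

The raw proposals include "the tyrosine", which is not a meaningful term. This is the reason the validation step is left to experts.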

16
Some applications of ontologies.
17
The problem
  • Microarray technology makes it possible to
    measure thousands of variables and to compare
    their values under hundreds of conditions.
  • Once microarray data are quantified, normalized
    and classified, the analysis phase is essentially
    a manual and subjective task based on visual
    inspection of classes in the light of the vast
    amount of information available for each gene.
  • Currently, data interpretation clearly
    constitutes the bottleneck of such analyses and
    there is an obvious need for tools able to fill
    the gap between data processed with mathematical
    methods and existing biological knowledge.

[Figure: microarray analysis workflow. Labels: cell in condition A, cell in condition B, mRNA, labelled cDNA, scan, quantitation, combination and normalization, classification, manual interpretation, knowledge.]
18
(No Transcript)
19
Publications
  • Large-Scale Protein Annotation through Gene
    Ontology. Genome Research, 2002.
  • Automated Gene Ontology annotation for anonymous
    sequence data. Nucleic Acids Research, 2003,
    Vol. 31, No. 13, 3712-3715.
  • The Gene Ontology Annotation (GOA) Project:
    Implementation of GO in SWISS-PROT, TrEMBL, and
    InterPro. Genome Research, 2003.
  • WILMA: automated annotation of protein
    sequences. Bioinformatics, Vol. 19, 2003.
  • Whole-genome comparative annotation and
    regulatory motif discovery in multiple yeast
    species. Annual Conference on Research in
    Computational Molecular Biology, 2003.
  • GOblet: a platform for Gene Ontology annotation
    of anonymous sequence data. Nucleic Acids Res.,
    2004 July 1, 32 (Web Server issue), W313-W317.
  • Applying Support Vector Machines for Gene
    ontology based gene function prediction. BMC
    Bioinformatics, 2004, 5, 116.
20
Ontology documents on the web.
  • Query "ontology OR ontologies" =>
      • 575 000 documents by Google
      • 952 000 documents by AllTheWeb
      • 172 196 documents by Scirus
  • Query "annotation genome ontology" =>
      • 50 300 documents by Google
      • 46 500 documents by AllTheWeb
      • 6 532 documents by Scirus
21
Conclusions: GO or no GO