Title: Ontologies: uses and stakes in biology
1 Ontologies: uses and stakes in biology
Ontologies are now key elements in every domain
that relies heavily on knowledge or large data
sets (not only in biology: Google is one example).
2 Ontology: definition(s)
- An ontology: a formal and explicit description
  of every concept in a particular domain of
  knowledge.
- An ontology: the abstraction of known objects
  within a domain, their properties and the
  relationships between them.
- Such an ontology (including individual instances
  of classes) constitutes a knowledge base.
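The second definition (objects, their properties, the relationships between them, plus instances of classes) can be pictured with a minimal sketch. The `Ontology` class and the protein/enzyme/hexokinase example are assumptions made for illustration only; they are not taken from any existing ontology or library.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Toy container: concepts, typed relationships and instances."""
    concepts: dict = field(default_factory=dict)   # concept name -> properties
    relations: set = field(default_factory=set)    # (subject, relation, object)
    instances: dict = field(default_factory=dict)  # instance name -> concept name

    def add_concept(self, name, **properties):
        self.concepts[name] = properties

    def relate(self, subject, relation, obj):
        # Both ends of a relationship must be known concepts.
        for name in (subject, obj):
            if name not in self.concepts:
                raise KeyError(f"unknown concept: {name}")
        self.relations.add((subject, relation, obj))

    def add_instance(self, instance, concept):
        # An instance can only be attached to a declared concept.
        if concept not in self.concepts:
            raise KeyError(f"unknown concept: {concept}")
        self.instances[instance] = concept


# Hypothetical content, chosen only to make the sketch concrete.
onto = Ontology()
onto.add_concept("protein", polymer_of="amino acids")
onto.add_concept("enzyme", catalyses="a chemical reaction")
onto.relate("enzyme", "is_a", "protein")
onto.add_instance("hexokinase", "enzyme")

print(onto.instances["hexokinase"])
print(sorted(onto.relations))
```

Once instances are attached to the concepts, the same structure plays the role of the knowledge base mentioned in the last bullet.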
3 Information versus knowledge
- Information includes all primary data
  (measurements) provided by instrumentation (images,
  sequences...) as well as the secondary data
  necessary for subsequent analyses (for DNA chips,
  secondary data as defined under MIAME): results,
  materials, methods.
- Knowledge includes the annotation of information
  by experts, within the frame of the paradigm
  currently in use in that particular domain.
- An ontology is the formal representation of such
  knowledge.
4 Definitions (easier)
- An ontology is made of:
  - A controlled vocabulary common to experts
    (required for the sharing of knowledge).
  - A representation of the relationships between
    terms of the vocabulary; these relationships
    define knowledge.
  - Inference rules on some instantiations.
- An ontology can be read and used by:
  - Humans.
  - Computers.
- An ontology allows:
  - Permanence of knowledge (even in the absence of
    the specialist).
  - Humans to use the knowledge of other specialists.
  - Algorithms to use knowledge.
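The three ingredients listed above (a controlled vocabulary, relationships between its terms, and inference rules) can be sketched in a few lines. The terms and the single rule used here, transitivity of "is_a", are illustrative assumptions rather than content from a real ontology.

```python
# Controlled vocabulary: the only terms the experts agree to use (illustrative).
VOCABULARY = {"molecule", "protein", "enzyme", "kinase"}

# Relationships between terms: child -> set of direct "is_a" parents.
IS_A = {
    "protein": {"molecule"},
    "enzyme": {"protein"},
    "kinase": {"enzyme"},
}


def ancestors(term):
    """Inference rule: "is_a" is transitive, so collect every indirect parent."""
    if term not in VOCABULARY:
        raise ValueError(f"{term!r} is not in the controlled vocabulary")
    found = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in IS_A.get(current, ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def is_a(term, candidate):
    """True if `term` is `candidate` or one of its (inferred) descendants."""
    return term == candidate or candidate in ancestors(term)


print(sorted(ancestors("kinase")))
print(is_a("kinase", "molecule"))
```

Nothing states directly that a kinase is a molecule; an algorithm derives it from the relationships, which is what "algorithms using knowledge" amounts to in this toy case.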
5
- Few ontologies have been developed for the sake
  of the ontology itself.
- The goal of an ontology is the applications that
  will use it.
- This is the main difference from an encyclopedia,
  the purpose of which is to embrace the entire
  knowledge of a domain.
- An ontology is a means toward a goal.
6 Existing ontologies in biology
- Most ontologies in biology are in the public domain.
- Their list is growing "every day".
- They are supposed to be orthogonal: they do not
  cover the same subjects.
- OBO (Open Biological Ontologies) is the
  directory that lists such ontologies.
- Presently 45 are listed; the two major ontologies
  are:
  - GO: Gene Ontology (19.8 MB in XML format).
  - UMLS: Unified Medical Language System (20 GB). 1
    million biomedical concepts and 4.3 million
    concept names from more than 100 controlled
    vocabularies and classifications (some in
    multiple languages) used in patient records,
    administrative health data, bibliographic and
    full-text databases and expert systems.
7 Modelling and ontologies
- Modelling consists in elaborating an abstract
  and synthetic view of the real world in order
  to better grasp "reality" within the context
  of a goal.
  => Such abstraction reduces complexity by
  focusing on particular aspects, with particular
  goals in mind.
- A model formulates what is known about objects in
  a particular context and articulates knowledge.
- A model also allows data to be exchanged with no
  concern about format.
8
- Abstraction and the presence of a controlled
  vocabulary allow a common vision of reality to
  be shared, with no ambiguity.
  => This solves the common problem of ensuring
  that all actors understand each other well and
  that they agree on a common problem and common
  means of solving it.
- Typically, abstraction is followed by a
  "top-down" procedure toward the "real world" or,
  when using a standard vocabulary, by an
  "instantiation" of the model. This means, for
  example, associating real objects with each word
  of the controlled vocabulary.
9 XML and ontologies
- Most if not all ontologies are XML based. XML is
  a language which makes use of tags to delimit
  entities.
- Below is an example of what could be an ontology
  to describe biology:

  <biology>
    <molecular_genetics></molecular_genetics>
    <biochemistry>
      <proteins></proteins>
      <nucleic_acids></nucleic_acids>
      <lipids></lipids>
    </biochemistry>
    <cell_biology></cell_biology>
    <physiology></physiology>
    <medicine></medicine>
  </biology>

Note that in this example "medicine" does not
include biochemistry or molecular biology...
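As a rough illustration of how a program reads such a tag hierarchy, the sketch below parses the example with Python's standard `xml.etree.ElementTree` module and checks which subject is nested under which. The helper names `paths` and `contains` are hypothetical, and the last tag is spelled "medicine" here.

```python
import xml.etree.ElementTree as ET

# The example hierarchy from the slide, restated as an XML string.
BIOLOGY_XML = """
<biology>
  <molecular_genetics></molecular_genetics>
  <biochemistry>
    <proteins></proteins>
    <nucleic_acids></nucleic_acids>
    <lipids></lipids>
  </biochemistry>
  <cell_biology></cell_biology>
  <physiology></physiology>
  <medicine></medicine>
</biology>
""".strip()


def paths(element, prefix=""):
    """Yield the slash-separated path of every element in the tree."""
    path = f"{prefix}/{element.tag}"
    yield path
    for child in element:
        yield from paths(child, path)


def contains(root, outer, inner):
    """True if a tag named `inner` is nested somewhere under a tag named `outer`."""
    for node in root.iter(outer):
        # iter() also yields the node itself, so skip it explicitly.
        if any(descendant is not node for descendant in node.iter(inner)):
            return True
    return False


root = ET.fromstring(BIOLOGY_XML)
all_paths = list(paths(root))
for path in all_paths:
    print(path)

print(contains(root, "biochemistry", "proteins"))
print(contains(root, "medicine", "biochemistry"))
```

The second check returns False, which is the point made in the note: in this particular tree, "medicine" has no children, so it does not include biochemistry.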
10 Gene Ontology
- GO was born in 1998. Its main objective is to
  deal with information linked to genes.
- It results from a collaboration between major
  databases such as FlyBase (Drosophila) and the
  Saccharomyces Genome Database, and other genomic
  databases covering organisms such as Mus musculus,
  Homo sapiens, etc.
- GO is subdivided into three main parts:
  - Molecular Function.
    - Function of gene products. Examples:
      carbohydrate binding, ATPase activity.
  - Biological Process.
    - General biological role of complex molecular
      functions. Examples: mitosis, purine
      metabolism.
  - Cellular Component.
    - Subcellular structures, localisations and
      macromolecular complexes. Examples: nucleus,
      telomere, origin recognition complex.
12
- GO contains an "evidence code" that qualifies its
  annotations according to their quality.
- Clearly, information described in a well-refereed
  paper cannot be used in the same way as
  information derived from an automatic data mining
  algorithm.
- For example, the intron-exon structure of a gene
  known from the cloning of an entire mRNA is much
  more reliable knowledge than an ab initio
  prediction produced by an HMM (hidden Markov
  model)!
- IC: inferred by curator
- IDA: inferred from direct assay
- IEA: inferred from electronic annotation
- IEP: inferred from expression pattern
- IGI: inferred from genetic interaction
- IMP: inferred from mutant phenotype
- IPI: inferred from physical interaction
- ISS: inferred from sequence or structural
  similarity
- NAS: non-traceable author statement
- ND: no biological data available
- TAS: traceable author statement
- NR: not recorded
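One way to act on evidence codes is to keep only the annotations whose code belongs to an accepted set. The sketch below does that; the gene products (`geneA`, `geneB`, `geneC`) and their annotations are invented for illustration, and grouping IDA, IEP, IGI, IMP and IPI as "experimental" is a choice made for this sketch.

```python
# Evidence codes as listed on the slide.
EVIDENCE_CODES = {
    "IC": "inferred by curator",
    "IDA": "inferred from direct assay",
    "IEA": "inferred from electronic annotation",
    "IEP": "inferred from expression pattern",
    "IGI": "inferred from genetic interaction",
    "IMP": "inferred from mutant phenotype",
    "IPI": "inferred from physical interaction",
    "ISS": "inferred from sequence or structural similarity",
    "NAS": "non-traceable author statement",
    "ND": "no biological data available",
    "TAS": "traceable author statement",
    "NR": "not recorded",
}

# Assumption of this sketch: codes backed by a laboratory experiment.
EXPERIMENTAL = {"IDA", "IEP", "IGI", "IMP", "IPI"}

# Hypothetical annotations: (gene product, GO term name, evidence code).
ANNOTATIONS = [
    ("geneA", "ATPase activity", "IDA"),
    ("geneA", "nucleus", "IEA"),
    ("geneB", "carbohydrate binding", "ISS"),
    ("geneC", "mitosis", "IMP"),
]


def filter_by_evidence(annotations, accepted):
    """Keep only the annotations whose evidence code is in `accepted`."""
    unknown = set(accepted) - set(EVIDENCE_CODES)
    if unknown:
        raise ValueError(f"unknown evidence codes: {sorted(unknown)}")
    return [a for a in annotations if a[2] in accepted]


for product, term, code in filter_by_evidence(ANNOTATIONS, EXPERIMENTAL):
    print(product, term, code, "=", EVIDENCE_CODES[code])
```

Dropping `IEA` in this way removes the purely automatic annotations, at the price of losing coverage for poorly studied genes.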
13
- For a biologist, GO allows queries at various
  levels. For example, one can use GO for:
  - Finding all gene products in the mouse that are
    involved in signal transduction.
  - Looking for all tyrosine kinase receptors.
  - ...
- Each gene product is linked at various depths in
  the ontology, depending on what is known. For
  example:
  - A well-known protein will be linked in several
    places in GO, usually near the terminal leaves.
  - A less well-known protein will be linked to a
    few (or just one) general terms, such as
    "metabolism".
  - A predicted gene of unknown function will not be
    linked.
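A query "at various levels" can be sketched as follows: a gene product matches a term if it is annotated to that term or to any of its descendants. The small hierarchy and the three gene products below are toy assumptions that mirror the three cases described above; they are not real GO data.

```python
# Toy hierarchy: term -> direct parents (illustrative, not the real GO graph).
PARENTS = {
    "cell communication": ["biological process"],
    "signal transduction": ["cell communication"],
    "metabolism": ["biological process"],
    "purine metabolism": ["metabolism"],
}

# Hypothetical gene products and the terms they are linked to.
PRODUCT_TERMS = {
    "well_known_protein": ["purine metabolism", "signal transduction"],
    "less_known_protein": ["metabolism"],
    "predicted_gene": [],  # unknown function: not linked at all
}


def is_under(term, target):
    """True if `term` is `target` or one of its descendants."""
    stack, seen = [term], set()
    while stack:
        current = stack.pop()
        if current == target:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, ()))
    return False


def products_under(target):
    """All gene products annotated to `target` or to a more specific term."""
    return {
        product
        for product, terms in PRODUCT_TERMS.items()
        if any(is_under(term, target) for term in terms)
    }


print(sorted(products_under("metabolism")))
print(sorted(products_under("signal transduction")))
print(sorted(products_under("biological process")))
```

Querying the general term "metabolism" returns both annotated proteins, while the predicted gene of unknown function never appears, whatever the level of the query.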
14 "Unexpected bonus" for biology
- Biology is a field that still lacks a good
  formalism (as opposed, for example, to physics).
- Building ontologies allows the emergence of
  well-defined concepts and introduces some logic.
- Finally, ontologies:
  - pinpoint contradictions,
  - highlight grey areas,
  - reveal "holes" in our present knowledge.
15 Building ontologies from texts
- For a long time, ontologies were considered to be
  a formalisation of an expert's knowledge and
  know-how.
- Ontologies were then derived from analyses of
  experts' behaviour.
- More recently, ontologies have been built from the
  analysis of a corpus of texts.
- This corpus is produced by an expert; controlled
  vocabularies and the main relationships are
  proposed by algorithms and validated by experts.
  This allows:
  - Using terms actually used in that particular
    domain by the majority of scientists.
  - Maintaining a strong link between the ontology
    and the textual documents that will in the end be
    analyzed with the help of the ontology.
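The slide does not say which algorithms propose the vocabulary. As a deliberately crude stand-in, the sketch below proposes as candidate terms the words and two-word phrases that occur in at least two documents of a corpus, leaving validation to the expert. The corpus, the stop-word list and the threshold are all assumptions made for illustration.

```python
import re
from collections import Counter

# Small stop-word list, sufficient for this toy corpus only.
STOPWORDS = {"the", "of", "a", "an", "in", "is", "and", "to", "by", "are", "that"}


def candidate_terms(corpus, min_documents=2):
    """Propose words and two-word phrases found in at least `min_documents` texts."""
    document_frequency = Counter()
    for text in corpus:
        tokens = re.findall(r"[a-z]+", text.lower())
        seen = {t for t in tokens if t not in STOPWORDS}
        # Two-word phrases: adjacent tokens, neither of them a stop word.
        for first, second in zip(tokens, tokens[1:]):
            if first not in STOPWORDS and second not in STOPWORDS:
                seen.add(f"{first} {second}")
        document_frequency.update(seen)  # each term counted once per document
    return sorted(t for t, n in document_frequency.items() if n >= min_documents)


# Hypothetical corpus of three one-sentence "documents".
corpus = [
    "The kinase phosphorylates the receptor in signal transduction.",
    "Signal transduction requires an active kinase.",
    "Purine metabolism is independent of the receptor.",
]

for term in candidate_terms(corpus):
    print(term)
```

Because every proposed term comes from the corpus itself, the link between the vocabulary and the documents to be analyzed is kept by construction; deciding that "signal" alone is not a useful term remains the expert's job.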
16 Some applications of ontologies
17 The problem
- Microarray technology makes it possible to
  measure thousands of variables and to compare
  their values under hundreds of conditions.
- Once microarray data are quantified, normalized
  and classified, the analysis phase is essentially
  a manual and subjective task based on visual
  inspection of classes in the light of the vast
  amount of information available for each gene.
- Currently, data interpretation clearly
  constitutes the bottleneck of such analyses and
  there is an obvious need for tools able to fill
  the gap between data processed with mathematical
  methods and existing biological knowledge.
[Figure: microarray workflow. mRNA from cells in condition A and condition B, labelled cDNA, scan, quantitation, combination and normalization, classification, manual interpretation, knowledge.]
19 Publications
- Large-Scale Protein Annotation through Gene
  Ontology. Genome Research, 2002.
- Automated Gene Ontology annotation for anonymous
  sequence data. Nucleic Acids Research, 2003,
  Vol. 31, No. 13, 3712-3715.
- The Gene Ontology Annotation (GOA) Project:
  Implementation of GO in SWISS-PROT, TrEMBL, and
  InterPro. Genome Research, 2003.
- WILMA-automated annotation of protein sequences.
  Bioinformatics, Vol. 19, 2003.
- Whole-genome comparative annotation and
  regulatory motif discovery in multiple yeast
  species. Annual Conference on Research in
  Computational Molecular Biology, 2003.
- GOblet: a platform for Gene Ontology annotation
  of anonymous sequence data. Nucleic Acids Res.,
  2004 July 1; 32 (Web Server issue): W313-W317.
- Applying Support Vector Machines for Gene
  ontology based gene function prediction. BMC
  Bioinformatics, 2004; 5: 116.
20 Documents on ontologies on the web
- ontology OR ontologies =>
  - 575 000 documents by Google
  - 952 000 documents by AllTheWeb
  - 172 196 documents by Scirus
- annotation genome ontology =>
  - 50 300 documents by Google
  - 46 500 documents by AllTheWeb
  - 6 532 documents by Scirus
21 Conclusions: GO or no GO