Title: EMERALD Ontology Efforts
1. EMERALD Ontology Efforts
- James Malone, Helen Parkinson
2. Overview
- Background
- Ontologies: the what and why?
- Building the best ontology we can: global collaboration
- Progress to date
- Future steps
3. Background: The Trouble with Microarray Data
- Successful analysis and reproducibility of microarray experiments depend upon the quality of the documentation and descriptions used to report them.
  - MIAME (Minimum Information About a Microarray Experiment)
  - MAGE-TAB (MicroArray Gene Expression Tabular)
- Reproducing experiments requires reporting the transformations applied to the data in a way that is
  - Unambiguous
  - Consistent
  - Understandable (in use of language and context)
4. A Common Question: What on Earth Are Ontologies?
- Originally a philosophical invention from ancient Greece
  - Used to describe the entities of existence and their relationships within this framework
- For our purposes we will use the widely cited definition of Tom Gruber: "An ontology is a specification of a conceptualization."
- In other words, an ontology describes explicitly the concepts, the objects and the relationships that hold among them in some given domain.
- This last point is important because we are defining the world within some specific scope, e.g. gene expression, genetics, biology or science generally.
- Clearly, the larger the scope or the community, the more complex the domain to model (and the more people have to agree on our model definitions!); this can make ontological modelling challenging.
5. Why Do We Need an Ontology?
- Consider some of the reasons why anyone in bioinformatics uses ontologies
- Semantics: the meaning of meaning?
  - Ontologies define the syntax and semantics of concepts, and of the relationships that hold between these concepts, for a given domain, giving a richer representation of data
- Information sharing: shared understanding
  - Explicitness helps to remove ambiguity and helps others understand what it is we mean
- Machine readable (to computer scientists the most interesting part)
  - Using languages such as the Web Ontology Language (OWL), ontologies can be interpreted by software programs
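As a rough illustration of what "machine readable" buys us, the sketch below stores a tiny class hierarchy as plain Python data and lets a program answer "is X a kind of Y?". The term names are hypothetical and are not actual NTO or OBI classes; a real ontology would be written in OWL and handled by dedicated tooling rather than a dictionary.

```python
# Hypothetical terms for illustration only; not actual NTO or OBI classes.
# Each key is a term and each value is its direct parent (an "is_a" link).
IS_A = {
    "log_transformation": "data_transformation",
    "normalization": "data_transformation",
    "quantile_normalization": "normalization",
    "loess_normalization": "normalization",
}


def ancestors(term, is_a=IS_A):
    """Return all superclasses of `term`, nearest first, by following is_a links."""
    result = []
    while term in is_a:
        term = is_a[term]
        result.append(term)
    return result


def is_kind_of(term, candidate, is_a=IS_A):
    """True if `term` is `candidate` itself or one of its subclasses."""
    return term == candidate or candidate in ancestors(term, is_a)


def descendants(term, is_a=IS_A):
    """Return every term that has `term` among its ancestors, sorted by name."""
    return sorted(t for t in is_a if term in ancestors(t, is_a))
```

With explicit links like these, a query for "normalization" can also return data annotated with any of its more specific subclasses, which is the kind of inference a free-text description cannot support.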
6. Why Do We Need an Ontology in EMERALD?
- Diversity in microarray experiment designs and applications requires that a large number of pre-processing approaches are available
- Our previous checklist: reporting of the transformations applied to the data in a way that is
  - Unambiguous
  - Consistent
  - Understandable (in use of language and context)
- Powerful querying of biological models
7. An Ontology Example: Querying and Browsing
The Arabidopsis Information Resource
8. An Ontology Example: Visualisation and Mapping
E.g. Edinburgh ATLAS
9. How Are We Building the Normalisation and Transformation Ontology (NTO)?
- NTO is a coordinated action involving members of an assembled working group from around the globe
  - Including biologists, biochemists, ontologists, computer scientists, statisticians, philosophers and MDs
- Collaboration with the OBI project (more in a minute)
- Weekly teleconference calls
- Use of SVN for version control of the ontology
- Face-to-face workshop meeting with the working group
- Encouraging submissions from potential users
- Dissemination
10. Ontology for Biomedical Investigations (OBI)
http://obi.sourceforge.net/
- OBI has the grand scope of enabling the modelling of any biomedical investigation, regardless of domain
- Orthogonal coverage, reuse of existing resources and shared frameworks
[Diagram labels: Cell Type Ontology; Chemical Entities of Biological Interest (ChEBI); OBI]
11. NTO Progress to Date
- Use case collection (recently started to place some online: https://wiki.cbil.upenn.edu/obiwiki/index.php/EvaluationPhase1Submissions)
- Still welcoming submissions (to malone_at_ebi.ac.uk or Obi-datatrfm-branch_at_lists.sourceforge.net)
[Diagram labels: Feedback and iterate; Building; Identify Scope; Capture; Coding; Integrating; Evaluation; Version 1.0; Version 1.1; Document]
12. Use Cases and Competency Questions
- Competency questions
  - Which genes have a 2-fold change in expression where MAS5 has been applied as a data transformation methodology?
  - Which pre-processed microarray data expresses values as log ratios (of two conditions) for a specified logarithmic base?
- Use case
  - An experimenter has conducted an expression microarray experiment involving two conditions with replicate assays per condition, where they have both biological and technical replicates. They are running two kinds of differential expression analyses: (a) one at the gene level and (b) one at the gene set level. In (a) the aim is to identify differentially expressed genes (e.g. via algorithms like PaGE and SAM). In (b) the aim is to identify, from an a priori given collection of gene sets (e.g. user provided, or based upon GO annotation), which of these sets are differentially expressed as a whole (e.g. via algorithms like GSEA or SAM-GSA). Before running the analyses the data is preprocessed with the following data transformation series: (i) filter out flagged reporters, (ii) normalize the individual assays, (iii) average across technical replicates (but not across biological replicates). The above steps all require annotation using the ontology.
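The three-step transformation series in the use case can be sketched as a small pipeline in which every step carries a label that an ontology term could later replace. This is only an illustration under stated assumptions: the record layout, the labels and the function names are hypothetical, values are assumed to be log2 intensities, and per-assay median centering stands in for "normalize the individual assays" because the use case does not name a method.

```python
from collections import defaultdict
from statistics import mean, median


def filter_flagged(records):
    """Step (i): drop measurements whose reporter was flagged."""
    return [r for r in records if not r["flagged"]]


def median_center(records):
    """Step (ii): normalize each assay by subtracting its own median.

    Median centering is an assumed stand-in; the use case names no method.
    """
    by_assay = defaultdict(list)
    for r in records:
        by_assay[r["assay"]].append(r["value"])
    centers = {assay: median(values) for assay, values in by_assay.items()}
    return [{**r, "value": r["value"] - centers[r["assay"]]} for r in records]


def average_technical(records):
    """Step (iii): average technical replicates, keeping biological ones apart.

    Each assay is treated as one technical replicate of a biological replicate.
    """
    groups = defaultdict(list)
    for r in records:
        key = (r["reporter"], r["condition"], r["bio_rep"])
        groups[key].append(r["value"])
    return {key: mean(values) for key, values in groups.items()}


# Hypothetical labels; in practice each would be an ontology term identifier.
PIPELINE = [
    ("flagged reporter filtering", filter_flagged),
    ("per-assay median centering", median_center),
    ("technical replicate averaging", average_technical),
]


def preprocess(records):
    """Run the steps in order and record which transformations were applied."""
    data, provenance = records, []
    for label, step in PIPELINE:
        data = step(data)
        provenance.append(label)
    return data, provenance


def is_two_fold(log2_a, log2_b):
    """True if two log2 values differ by at least a 2-fold change."""
    return abs(log2_a - log2_b) >= 1.0
```

Keeping the provenance list alongside the data is what makes a competency question such as "which data sets had a given transformation applied?" answerable by a program instead of by reading a free-text protocol.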
13. NTO Progress to Date
14. NTO Progress to Date
15. Conclusion
- An NTO would give us
  - Consistency in usage of terms through explicit definitions
  - Wider reproducibility of microarray experiments
  - Richer representations: again definitions, but also axioms, relationships and properties to describe the data
  - Reduction of disparate efforts
  - (Potentially) mappings to external resources
  - Most importantly, the biology and the data are given more relevance and increased utility
- But it relies on
  - Collaborative efforts and consensus of opinion across the domain (not always easy!)
  - Annotating or modelling data with the ontology, i.e. actual use
16. Future Steps
- Meeting with the OBI consortium and NTO working group members in January in Vancouver, Canada
- Publishing of an alpha version of NTO integrated within OBI in early 2008
- Evaluation of the alpha version with competency questions and use cases
- Run annotation using ArrayExpress as an assessment
- Iterate!!
17. Thanks
- EMERALD Consortium (www.microarray-quality.org)
- OBI Consortium (http://obi.sourceforge.net/)
  - Especially Tina Boussard, Ryan Brinkman, Melanie Courtot, Elisabetta Manduchi, Monnie McGee, Helen Parkinson, Philippe Rocca-Serra, Richard Scheuermann
- ArrayExpress team (www.ebi.ac.uk/arrayexpress/)
- All contributors and submitters to the ontology