Title: Bio-Ontologies: a New Means of Travel for Biological Facts
1. Bio-Ontologies: a New Means of Travel for Biological Facts
International Society for the History, Philosophy
and Social Studies of Biology Biannual Meeting,
Exeter 25-29 July 2007
- Sabina Leonelli
- Project: How Well Do Facts Travel?
- London School of Economics
- s.leonelli_at_lse.ac.uk
- www.sabinaleonelli.org
2. Outline
- The role of Bio-Ontologies (BOs) in biological databases
- Four interpretive steps in standardization
- The epistemic status of BO terms: situating concepts
- A new type of theory in biology? Back to Mary Hesse's network view
- Implications: data travel and use across research contexts
- Conclusion: on technology and theory-making
3. Biological Ontologies (BOs)
- Context
- Fast accumulation of data on model organisms, esp. genomics
- Fragmentation of biology into local epistemic cultures
- Common yearning for integrative understanding of organisms
- Goal: enhance availability and usability of data across research contexts
- Means: formal representations of areas of knowledge in which the essential terms are combined with structuring rules that describe the relationship between the terms. Knowledge that is structured in a bio-ontology can then be linked to the molecular databases (Bard and Rhee 2004)
- Precisely defined terms related through directed acyclic graph (DAG) structures
- Association of terms with datasets
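The two structural elements named on this slide, defined terms linked in a DAG and datasets attached to terms, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the parent links are a simplified toy hierarchy rather than the actual Gene Ontology graph, the definitions other than the G2/M one are placeholders, and `dataset_1` is a hypothetical identifier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    name: str
    definition: str
    parents: tuple = ()  # a DAG lets one term have several parents


# Toy ontology: the hierarchy is an assumption made for illustration only.
ontology = {
    t.name: t
    for t in [
        Term("biological process", "placeholder definition (hypothetical)"),
        Term("cell cycle", "placeholder definition (hypothetical)",
             ("biological process",)),
        Term("cell cycle phase transition", "placeholder definition (hypothetical)",
             ("cell cycle",)),
        Term("mitotic cell cycle", "placeholder definition (hypothetical)",
             ("cell cycle",)),
        Term("G2/M transition of mitotic cell cycle",
             "progression from G2 phase to M phase of the mitotic cell cycle",
             ("mitotic cell cycle", "cell cycle phase transition")),
    ]
}

# Association of terms with datasets (hypothetical dataset identifier).
datasets_by_term = {"G2/M transition of mitotic cell cycle": ["dataset_1"]}


def ancestors(name, onto):
    """Return every term reachable by following parent links upwards."""
    seen = set()
    stack = list(onto[name].parents)
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(onto[current].parents)
    return seen


def is_acyclic(onto):
    """A graph is a DAG only if no term is its own ancestor."""
    return all(name not in ancestors(name, onto) for name in onto)
```

Storing parents on each term keeps the "several parents, no cycles" property easy to check; a real ontology would also distinguish relation types, which this sketch leaves out.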
4. E.g. Gene Ontology: precise definition, large set of associated data
6. Search by GO
7. Search returns children
Sum of MGI data
8. Returns set of genes annotated to this term
Search returns annotations to terms and sub-terms (children)
9. BO Terms as Standards
- Standard: coordination device facilitating interdisciplinary research (Berg 2004)
- BO terms as neutral tools for scientific communication and exchange
- Data are attached to specific BO terms purely for the purposes of retrieval by biologists interested in investigating the phenomenon to which the term refers
- No theoretical interpretation involved: BO terms are broad classificatory concepts conceived to pass on information without distorting or interpreting it
- However: interpretation in standardisation is unavoidable (Bowker and Star 1999)
10. Interpreting to Standardize: 4 Steps
- Abstraction processes: masking, distorting, simplifying or eliminating characteristics of entities to be standardised (data formatting)
- De-contextualisation processes: black-boxing specific interests, methods and goals of producers of data (non-locality: decoupling marks from provenance)
- Knowledge-stabilisation processes: assemble precise definitions for each term and relation so as to mirror (what curators see as) the consensus in contemporary biology
- Situating processes: associate each dataset with a specific term (and thus a specific phenomenon)
- Standardisation processes influence the database users' understanding and use of data
11. BO Terms as Situating Concepts
- Unambiguously defined as referring to specific phenomena (knowledge-stabilisation process)
- Through gene annotation, each available dataset is associated with one or more BO terms (situating process). This makes it possible to retrieve data relevant to the phenomena captured by those terms. But also ...
- ... it fixes the biological relevance of data as evidence: BO terms determine the range of phenomena to be researched by reference to each dataset
- BO terms are situating concepts: they determine the future applicability of data by fixing the research contexts in which data can be of use
- Vs. unifying or explanatory concepts: situating concepts do not aim at explaining phenomena, but rather at describing a phenomenon so that data associated with it can easily be retrieved
12. Select data from publication or repository
- Select data about gene product TSK from publication: Suzuki et al., 2005, Plant Cell Physiol. 46:736-742. TONSOKU Is Expressed in S Phase of the Cell Cycle and Its Defect Delays Cell Cycle Progression in Arabidopsis
- Associate with term "G2/M transition of mitotic cell cycle", which is defined as "progression from G2 phase to M phase of the mitotic cell cycle"
- Looking for data on the mitotic cell cycle, researchers find gene product TSK as relevant to the G2/M transition
- Gene product TSK could be relevant to researching other parts of the mitotic cell cycle, but there is no evidence for this, so the database does not report this possibility
- The biological relevance of dataset Y is restricted to the phenomenon captured by the term "G2/M transition", thus excluding other, possibly relevant phenomena
Associate data with GO term
GO term refers to phenomenon X
Data are situated as relevant to phenomenon X and not to other phenomena
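The TSK example can be made concrete with a small retrieval sketch. It assumes a simplified hierarchy in which the G2/M and G1/S transitions are direct children of "mitotic cell cycle" (the real Gene Ontology graph has more intermediate terms and relation types), and the function names are hypothetical. The only annotation is the one the slide describes.

```python
# Simplified parent -> children links; an assumption, not the actual GO graph.
children = {
    "mitotic cell cycle": [
        "G1/S transition of mitotic cell cycle",
        "G2/M transition of mitotic cell cycle",
    ],
    "G1/S transition of mitotic cell cycle": [],
    "G2/M transition of mitotic cell cycle": [],
}

# The annotation from the slide: TSK is attached to exactly one term.
annotations = {
    "G2/M transition of mitotic cell cycle": [("TSK", "Suzuki et al. 2005")],
}


def term_and_descendants(term):
    """Collect a term plus all of its sub-terms, visiting each term once."""
    seen, stack = [], [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            stack.extend(children.get(current, []))
    return seen


def search(term):
    """Return (gene product, annotated term, source) for a term and its children."""
    results = []
    for t in term_and_descendants(term):
        for gene_product, source in annotations.get(t, []):
            results.append((gene_product, t, source))
    return results


# A broad query reaches TSK through the child term ...
print(search("mitotic cell cycle"))
# ... but a query on a sibling phenomenon returns nothing.
print(search("G1/S transition of mitotic cell cycle"))
```

The empty result for the sibling term is the situating effect in miniature: the database is silent about TSK there because no annotation exists, which is different from evidence that TSK is irrelevant to that phenomenon.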
13. A New Type of Theory in Biology? Mary Hesse's three criteria
- Network of concepts
- Situating concepts rather than unifying or explanatory concepts
- Observational and theoretical language
- Concepts are primarily meant to refer to existing phenomena: mix of observational and theoretical
- Internal coherence and economy
- Consistency among terms: terms should not have the same referents (otherwise redundant/obsolete)
- Minimalism: the most useful standards are those that consist of the minimal number of the most informative parameters (Brazma et al. 2006, 594)
14. Implications
- Data travel made easier
- Easy retrieval and comparison
- Easy to check and form new hypotheses
- Relatively simple access skills: IT skills and acquaintance with BOs
- What about data use?
- Easy retrieval of information about data provenance (evidence codes)
- BUT users need to be aware of interpretive processes involved in standardization
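As a rough illustration of how evidence codes keep provenance retrievable, the sketch below stores each annotation with its code and reference and lets a user keep only experimentally supported records. The code names are standard Gene Ontology evidence codes; the gene products, the term, the references, and the choice of which codes count as "experimental" here are hypothetical and deliberately partial.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    gene_product: str
    term: str
    evidence_code: str
    reference: str


# A few Gene Ontology evidence codes (not the full list).
EVIDENCE_CODES = {
    "IDA": "Inferred from Direct Assay",
    "IMP": "Inferred from Mutant Phenotype",
    "ISS": "Inferred from Sequence or structural Similarity",
    "TAS": "Traceable Author Statement",
    "IEA": "Inferred from Electronic Annotation",
}

# Subset treated as experimental in this sketch (an illustrative choice).
EXPERIMENTAL = {"IDA", "IMP"}

# Hypothetical records; none of these are real annotations.
annotations = [
    Annotation("geneA", "term X", "IDA", "hypothetical reference 1"),
    Annotation("geneB", "term X", "IEA", "hypothetical reference 2"),
    Annotation("geneC", "term X", "IMP", "hypothetical reference 3"),
]


def with_experimental_support(records):
    """Keep only annotations backed by an experimental evidence code."""
    return [r for r in records if r.evidence_code in EXPERIMENTAL]


def describe(record):
    """Spell out where an annotation comes from and how it was inferred."""
    meaning = EVIDENCE_CODES[record.evidence_code]
    return (f"{record.gene_product} -> {record.term} "
            f"[{record.evidence_code}: {meaning}] ({record.reference})")


for record in with_experimental_support(annotations):
    print(describe(record))
```

The code tells a user how an annotation was inferred, but not which alternative annotations a curator considered and rejected, so the interpretive steps of standardisation remain outside what such a filter can show.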
15. Conclusion: When Technology Makes a Difference to Theory-Making
- Digital technology does not guarantee objectivity
- Curators estimate the biological relevance of data as evidence for phenomena
- Curators define situating concepts
- Yet, technology efficiently mediates between different (local) expertises
- Integration of data from various sources
- Opportunity for comparisons and queries
- Differential access to information depending on expertise: layers of complexity and detail reachable through a mouse click
- BOs and bioinformatics: towards integration without unification?
16. Abstract
- Bio-ontologies are often presented as a neutral tool for the diffusion of facts about organisms to biologists: that is, as a way to standardise the terminology and relations among terms used to describe biological processes, so that the immense amount of (especially microbiological) data recently accumulated on various aspects of the main model organisms can be brought together and made accessible to the whole biological community. In this paper, I argue that bio-ontologies are not a neutral vehicle for the diffusion of evidence. Rather, they constitute a new type of biological theory, incorporating a specific perspective on biological phenomena, through which data are re-interpreted in order to fit specific research goals. Notably, one of these goals consists of integrating the available knowledge about various aspects of organisms into an overall understanding of their biology. The main issues that I shall address in this paper are thus the following: how well do biological facts circulate through bio-ontologies? How effective is the use of bio-ontologies towards obtaining integration in biology? And what kind of integration is that: is it actually possible to distinguish it from a kind of theoretical unification? In addressing these questions, I focus on the use of one of the bio-ontologies, the so-called Gene Ontology, to structure and display data about Arabidopsis thaliana within The Arabidopsis Information Resource.
17. No associated data!
19. Opens browser