BioOntologies: a New Means of Travel for Biological Facts

1
Bio-Ontologies: a New Means of Travel for
Biological Facts
International Society for the History, Philosophy
and Social Studies of Biology Biennial Meeting,
Exeter, 25-29 July 2007
  • Sabina Leonelli
  • Project: How Well Do Facts Travel?
  • London School of Economics
  • s.leonelli@lse.ac.uk
  • www.sabinaleonelli.org

2
Outline
  • The role of Bio-Ontologies (BOs) in biological
    databases
  • Four interpretive steps in standardization
  • The epistemic status of BO terms: situating
    concepts
  • A new type of theory in biology? Back to Mary
    Hesse's network view
  • Implications: data travel and use across research
    contexts
  • Conclusion: on technology and theory-making

3
Biological Ontologies (BOs)
  • Context:
  • Fast accumulation of data on model organisms,
    esp. genomics
  • Fragmentation of biology into local epistemic
    cultures
  • Common yearning for integrative understanding of
    organisms
  • Goal: enhance availability and usability of data
    across research contexts
  • Means: "formal representations of areas of
    knowledge in which the essential terms are
    combined with structuring rules that describe the
    relationship between the terms. Knowledge that is
    structured in a bio-ontology can then be linked
    to the molecular databases" (Bard and Rhee 2004)
  • Precisely defined terms related through directed
    acyclic graph (DAG) structures
  • Association of terms with datasets
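The slide describes BO terms as precisely defined and related through DAG structures. A minimal Python sketch of that idea, assuming a simple in-memory representation (the `Term` class, the `ONTOLOGY` fragment and `is_acyclic` are hypothetical names for illustration, not the Gene Ontology's actual data model or graph): each term carries a definition and a list of parent terms, and a depth-first search checks that following parent links never loops back on itself.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """A bio-ontology term: a precise definition plus links to its parent terms."""
    name: str
    definition: str
    parents: list[str] = field(default_factory=list)


# Hypothetical, simplified fragment for illustration only; not the real GO graph.
# Only the G2/M definition comes from the slides; the others are placeholders.
ONTOLOGY = {
    "mitotic cell cycle": Term("mitotic cell cycle", "(placeholder definition)"),
    "cell cycle phase transition": Term(
        "cell cycle phase transition", "(placeholder definition)"
    ),
    "G2/M transition of mitotic cell cycle": Term(
        "G2/M transition of mitotic cell cycle",
        "progression from G2 phase to M phase of the mitotic cell cycle",
        # Unlike a tree, a DAG lets one term have several parents.
        parents=["mitotic cell cycle", "cell cycle phase transition"],
    ),
}


def is_acyclic(terms: dict[str, Term]) -> bool:
    """True if following parent links never returns to a term already on the path."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited / on current path / fully explored
    colour = {name: WHITE for name in terms}

    def visit(name: str) -> bool:
        colour[name] = GREY
        for parent in terms[name].parents:
            if colour[parent] == GREY:  # back edge: we have looped round
                return False
            if colour[parent] == WHITE and not visit(parent):
                return False
        colour[name] = BLACK
        return True

    return all(colour[name] != WHITE or visit(name) for name in terms)


print(is_acyclic(ONTOLOGY))  # True
```

Allowing several parents is what distinguishes a DAG from a simple classification tree: one phenomenon can be filed under more than one more general heading.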

4
E.g., Gene Ontology: precise definition, large set
of associated data
5
(No Transcript)
6
Search by GO
7
Search returns children
Sum of MGI data
8
Returns set of genes annotated to this term
Search returns annotations to terms and sub-terms
(children)
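Slides 7 and 8 show that a search on a term also returns annotations to its sub-terms (children). A sketch of that retrieval rule in Python, under stated assumptions: the term hierarchy is a small hypothetical fragment stored as parent links, the genes `GENE_A` and `GENE_B` are made up, and only the TSK annotation echoes the example on slide 12. The hypothetical `search` function walks breadth-first from the queried term down through all its descendants and pools their annotations.

```python
from collections import defaultdict, deque

# Hypothetical, simplified fragment: term -> parent terms (not the real GO graph).
PARENTS = {
    "mitotic cell cycle": [],
    "G1/S transition of mitotic cell cycle": ["mitotic cell cycle"],
    "G2/M transition of mitotic cell cycle": ["mitotic cell cycle"],
}

# Hypothetical annotations: term -> genes annotated directly to that term.
ANNOTATIONS = {
    "mitotic cell cycle": {"GENE_A"},
    "G1/S transition of mitotic cell cycle": {"GENE_B"},
    "G2/M transition of mitotic cell cycle": {"TSK"},
}


def children_index(parents):
    """Invert parent links so we can walk from a term down to its sub-terms."""
    children = defaultdict(list)
    for term, term_parents in parents.items():
        for parent in term_parents:
            children[parent].append(term)
    return children


def search(term, parents, annotations):
    """Genes annotated to `term` or to any of its descendants (sub-terms)."""
    children = children_index(parents)
    seen, queue = {term}, deque([term])
    genes = set()
    while queue:
        current = queue.popleft()
        genes |= annotations.get(current, set())
        for child in children[current]:
            if child not in seen:  # a DAG can reach the same sub-term by two paths
                seen.add(child)
                queue.append(child)
    return genes


print(sorted(search("mitotic cell cycle", PARENTS, ANNOTATIONS)))
# ['GENE_A', 'GENE_B', 'TSK']
print(sorted(search("G2/M transition of mitotic cell cycle", PARENTS, ANNOTATIONS)))
# ['TSK']
```

The `seen` set matters because, in a DAG, the same sub-term can be reached along two different paths; without it, such a term would be visited twice.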
9
BO Terms as Standards
  • Standard: coordination device facilitating
    interdisciplinary research (Berg 2004)
  • BO terms as neutral tools for scientific
    communication and exchange
  • Data are attached to specific BO terms purely for
    the purposes of retrieval by biologists
    interested in investigating the phenomenon to
    which the term refers
  • No theoretical interpretation involved: BO terms
    are broad classificatory concepts conceived to
    pass on information without distorting or
    interpreting it
  • However: interpretation in standardisation is
    unavoidable (Bowker and Star 1999)

10
Interpreting to Standardize: 4 Steps
  • Abstraction processes: masking, distorting,
    simplifying or eliminating characteristics of
    entities to be standardised (data formatting)
  • De-contextualisation processes: black-boxing the
    specific interests, methods and goals of
    producers of data (non-locality: decoupling
    marks from provenance)
  • Knowledge-stabilisation processes: assembling
    precise definitions for each term and relation so
    as to mirror (what curators see as) the consensus
    in contemporary biology
  • Situating processes: associating each dataset
    with a specific term (and thus a specific
    phenomenon)
  • Standardisation processes influence database
    users' understanding and use of data

11
BO Terms as Situating Concepts
  • Unambiguously defined as referring to specific
    phenomena (knowledge-stabilisation process)
  • Through gene annotation, each available dataset
    is associated with one or more BO terms (situating
    process). This makes it possible to retrieve data
    relevant to the phenomena captured by those
    terms. But also...
  • ...it fixes the biological relevance of data as
    evidence: BO terms determine the range of
    phenomena to be researched by reference to each
    dataset
  • BO terms are situating concepts: they determine
    the future applicability of data by fixing the
    research contexts in which data can be of use
  • Unlike unifying or explanatory concepts, situating
    concepts do not aim at explaining phenomena, but
    rather at describing a phenomenon so that data
    associated with it can easily be retrieved

12
Select data from publication or repository
  • Select data about gene product TSK from the
    publication Suzuki et al., 2005, Plant Cell
    Physiol. 46: 736-742, "TONSOKU Is Expressed in S
    Phase of the Cell Cycle and Its Defect Delays
    Cell Cycle Progression in Arabidopsis"
  • Associate them with the term "G2/M transition of
    mitotic cell cycle", which is defined as
    "progression from G2 phase to M phase of the
    mitotic cell cycle"
  • Looking for data on the mitotic cell cycle,
    researchers find gene product TSK listed as
    relevant to the G2/M transition
  • Gene product TSK could be relevant to
    researching other parts of the mitotic cell cycle,
    but there is no evidence for this, so the
    database does not report this possibility
  • The biological relevance of dataset Y (the TSK
    data) is restricted to the phenomenon captured by
    the term "G2/M transition", thus excluding other,
    possibly relevant phenomena

Associate data with GO term
GO term refers to phenomenon X
Data are situated as relevant to phenomenon X
and not to other phenomena
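The flowchart can be read as a rule: a dataset becomes retrievable under the term it is annotated to and under every more general term above it, and under nothing else. A Python sketch of that situating step, assuming a hypothetical fragment of the hierarchy and a single annotation modelled on the TSK example (the names `PARENTS`, `DATASET_TERMS` and `situated_phenomena` are illustrative only, not part of any real database API):

```python
# Hypothetical, simplified fragment: term -> parent terms (not the real GO graph).
PARENTS = {
    "mitotic cell cycle": [],
    "G1/S transition of mitotic cell cycle": ["mitotic cell cycle"],
    "G2/M transition of mitotic cell cycle": ["mitotic cell cycle"],
}

# The curator's situating step from the slide: the TSK data go under one term.
DATASET_TERMS = {
    "TSK (Suzuki et al. 2005)": ["G2/M transition of mitotic cell cycle"],
}


def situated_phenomena(dataset, dataset_terms, parents):
    """Terms under which a dataset can be retrieved: its own terms and all their ancestors."""
    found = set()
    stack = list(dataset_terms[dataset])
    while stack:
        term = stack.pop()
        if term not in found:
            found.add(term)
            stack.extend(parents[term])  # walk upwards towards more general terms
    return found


relevant = situated_phenomena("TSK (Suzuki et al. 2005)", DATASET_TERMS, PARENTS)
print(sorted(relevant))
# ['G2/M transition of mitotic cell cycle', 'mitotic cell cycle']
print("G1/S transition of mitotic cell cycle" in relevant)  # False
```

The sibling term for the G1/S transition is never reached, which mirrors the slide's point: absent an annotation, the database does not present the TSK data as relevant to that phenomenon.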
13
A New Type of Theory in Biology? Mary Hesse's
three criteria
  • Network of concepts
  • Situating concepts rather than unifying or
    explanatory concepts
  • Observational and theoretical language
  • Concepts are primarily meant to refer to existing
    phenomena: a mix of observational and theoretical
  • Internal coherence and economy
  • Consistency: terms should not have the
    same referents (otherwise redundant/obsolete)
  • Minimalism: "the most useful standards are those
    that consist of the minimal number of the most
    informative parameters" (Brazma et al. 2006, 594)

14
Implications
  • Data travel made easier
  • Easy retrieval and comparison
  • Easy to check and form new hypotheses
  • Relatively simple access skills: IT skills,
    acquaintance with BOs
  • What about data use?
  • Easy retrieval of information about data
    provenance (evidence codes)
  • BUT users need to be aware of interpretive
    processes involved in standardization
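Evidence codes are how provenance travels with each annotation. A minimal Python sketch, assuming hypothetical annotation records (`GENE_A`, `GENE_B` and the `REF_*` references are invented); the evidence codes themselves are real Gene Ontology codes, and `EXPERIMENTAL_CODES` lists only the experimental group. It shows a user tracing where an annotation came from and filtering out annotations not backed by experiments.

```python
from dataclasses import dataclass

# GO evidence codes denoting experimental support. GO defines other groups too,
# e.g. computational analysis codes and the electronic code IEA.
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


@dataclass(frozen=True)
class Annotation:
    gene: str
    term: str
    evidence_code: str  # e.g. "IDA" = Inferred from Direct Assay, "IEA" = Inferred from Electronic Annotation
    reference: str      # where the supporting data came from


# Hypothetical records for illustration; genes and references are made up.
RECORDS = [
    Annotation("GENE_A", "mitotic cell cycle", "IMP", "REF_1"),
    Annotation("GENE_A", "G2/M transition of mitotic cell cycle", "IEA", "REF_2"),
    Annotation("GENE_B", "mitotic cell cycle", "IDA", "REF_3"),
]


def provenance(records, gene):
    """(term, evidence code, reference) for every annotation of `gene`."""
    return [(r.term, r.evidence_code, r.reference) for r in records if r.gene == gene]


def experimentally_supported(records):
    """Keep only annotations backed by an experimental evidence code."""
    return [r for r in records if r.evidence_code in EXPERIMENTAL_CODES]


for row in provenance(RECORDS, "GENE_A"):
    print(row)
print(len(experimentally_supported(RECORDS)))  # 2
```

Filtering by evidence code is one concrete way a user can look behind the standardised term, which is the awareness of interpretive processes the slide asks for.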

15
Conclusion: When Technology Makes a Difference to
Theory-Making
  • Digital technology does not guarantee
    objectivity
  • Curators estimate the biological relevance of
    data as evidence for phenomena
  • Curators define situating concepts
  • Yet, technology efficiently mediates between
    different (local) expertises
  • Integration of data from various sources
  • Opportunity for comparisons and queries
  • Differential access to information depending on
    expertise: layers of complexity and detail
    reachable through a mouse click
  • BOs and bioinformatics: towards integration
    without unification?

16
Abstract
  • Bio-ontologies are often presented as a
    neutral tool for the diffusion of facts about
    organisms to biologists, that is, as a way to
    standardise the terminology and relations among
    terms used to describe biological processes, so
    that the immense amount of (especially
    microbiological) data recently accumulated on
    various aspects of the main model organisms can
    be brought together and made accessible to the
    whole biological community. In this paper, I
    argue that bio-ontologies are not a neutral
    vehicle for the diffusion of evidence. Rather,
    they constitute a new type of biological theory,
    incorporating a specific perspective on
    biological phenomena, through which data are
    re-interpreted in order to fit specific research
    goals. Notably, one of these goals consists of
    integrating the available knowledge about various
    aspects of any organism into an overall
    understanding of its biology. The main issues
    that I shall address in this paper are thus the
    following: how well do biological facts circulate
    through bio-ontologies? How effective is the use
    of bio-ontologies towards obtaining integration
    in biology? And what kind of integration is that:
    is it actually possible to distinguish it from
    a kind of theoretical unification? In addressing
    these questions, I focus on the use of one of the
    bio-ontologies, the so-called Gene Ontology, to
    structure and display data about Arabidopsis
    thaliana within The Arabidopsis Information
    Resource.

17
No associated data!
18
(No Transcript)
19
Opens Browser