Title: EXPERIENCES IN BUILDING AN ONTOLOGY-DRIVEN IMAGE DATABASE FOR BIOLOGISTS
1. EXPERIENCES IN BUILDING AN ONTOLOGY-DRIVEN IMAGE
DATABASE FOR BIOLOGISTS
2. Outline
- Why are images important?
- What is the BioImage database?
- Why use a semantic web architecture?
- Lessons and research questions
3. Why are biological images important in the
post-genomic age?
- Images are semantic instruments for capturing
aspects of the real world, and form a vital part
of the scientific record, for which words are no
substitute
- In the post-genomic world, attention is now
focused on the organization and integration of
information within cells, for functional analyses
of gene products
- In a month a single active cell biology lab may
generate between 10 and 100 Gbytes of
multidimensional image data
4. Images are complex
- An image database must be able to store original
images in any digital format currently available
or yet to be invented, including multi-channel 3D
images, multi-channel videos, etc.
5. The need for image databases
- The value of digital image information depends
upon how easily it can be located, searched for
relevance, and retrieved
- Detailed descriptive metadata about the images
are essential
- Without them, digital image repositories become
little more than meaningless and costly data
graveyards
- Despite the growth of on-line journals that
permit the inclusion of media objects, few of
these resources are freely available, and those
that are available are difficult to locate and
are not cross-searchable
- There is thus a need for a free, publicly
available image database with rich,
well-structured, searchable metadata
- The BioImage Database seeks to fulfil that need
6. This view has a growing acceptance
7. What metadata?
- Image acquisition (who took the original
micrograph, where, when, under what conditions,
for what purpose, etc.)
- The media object itself (source and derivation,
image type, dynamic range, resolution, format,
codec, etc.)
- The denotation of the referent (e.g. the name,
age and condition of the subject)
- The connotation of the referent (the image's
interpretation, meaning, purpose or significance,
its relevance to its creator and others, and its
semantic relationship to other images)
- Field aspects of the real world that cannot
conveniently be attached to any particular object
(e.g. variations of illumination intensity or
chemo-attractant concentration across the field
of view of a light microscope image)
- Sequences of change where there is a need to
preserve the concept of object identity in the
face of radical spatio-temporal changes in
appearance
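
As a rough illustration of how the six kinds of metadata listed on this slide might sit together in one record, here is a minimal Python sketch. It is not the BioImage schema: every class name, field name, and value below is a hypothetical choice made for this example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Acquisition:
    """Who took the original image, where, when, how, and why."""
    creator: str
    location: Optional[str] = None
    date: Optional[str] = None        # free-form date string, kept simple here
    conditions: Optional[str] = None
    purpose: Optional[str] = None


@dataclass
class MediaObject:
    """Technical properties of the stored media object itself."""
    image_type: str
    file_format: str
    derived_from: Optional[str] = None  # identifier of a source image, if any
    resolution: Optional[str] = None
    codec: Optional[str] = None


@dataclass
class ImageMetadata:
    """One record grouping the categories of metadata (hypothetical layout)."""
    acquisition: Acquisition
    media: MediaObject
    denotation: dict = field(default_factory=dict)        # name, age, condition of the subject
    connotation: dict = field(default_factory=dict)       # interpretation, significance, related images
    field_aspects: list = field(default_factory=list)     # properties of the whole field of view
    change_sequences: list = field(default_factory=list)  # object identities tracked over time


# Invented example values, for illustration only.
record = ImageMetadata(
    acquisition=Acquisition(creator="A. Researcher", purpose="illustrative example"),
    media=MediaObject(image_type="multi-channel 3D image", file_format="TIFF"),
    denotation={"subject": "example cell", "condition": "fixed"},
)
print(record.media.file_format)
```

Keeping denotation and connotation as open dictionaries is only a convenience in this sketch; in an ontology-driven design their values would more plausibly be links to ontology terms than free text.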
8. Why use a semantic web architecture?
- Traditional relational databases don't meet our
needs
- Image data is complex, layered, and difficult to
model
- Images are searched primarily through their
metadata
- Metadata is time-consuming and difficult to
obtain
- Ontologies offer the promise of better retrieval
accuracy through linking to instances in an
ontology, rather than attempting to process free
text
- Ontologies offer the promise of easy
inter-operability with other systems
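
The retrieval claim on this slide can be illustrated with a toy comparison between free-text search and search over ontology-linked annotations. This is a minimal sketch under stated assumptions: the term identifiers, the tiny hierarchy, and the three image records are all invented for the example and are not taken from the BioImage ontology.

```python
# Hypothetical ontology fragment: each term points to its parent term.
PARENT = {
    "cell:Neuron": "cell:Cell",
    "cell:PurkinjeCell": "cell:Neuron",
    "cell:Macrophage": "cell:Cell",
}

# Hypothetical image records: a free-text caption plus a link to an ontology term.
IMAGES = [
    {"id": "img1", "caption": "Purkinje cell, cerebellum", "depicts": "cell:PurkinjeCell"},
    {"id": "img2", "caption": "nerve cell in culture", "depicts": "cell:Neuron"},
    {"id": "img3", "caption": "macrophage engulfing a neuron fragment", "depicts": "cell:Macrophage"},
]


def is_a(term, ancestor):
    """Walk up the hierarchy and report whether `term` is `ancestor` or a descendant of it."""
    while term is not None:
        if term == ancestor:
            return True
        term = PARENT.get(term)
    return False


def search_free_text(word):
    """Naive substring match on the caption."""
    return [img["id"] for img in IMAGES if word.lower() in img["caption"].lower()]


def search_by_term(term):
    """Match images whose linked term is the query term or one of its descendants."""
    return [img["id"] for img in IMAGES if is_a(img["depicts"], term)]


print(search_free_text("neuron"))     # ['img3']
print(search_by_term("cell:Neuron"))  # ['img1', 'img2']
```

In this toy data the free-text query misses both images of neurons, because their captions use other words, and returns an image of a macrophage instead; the term-based query finds both neurons and nothing else.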
9. The BioImage Ontology
10. Lessons learned: performance, scalability
- Retrieval is slower than it would be from a
traditional database
- Scalability remains to be tested (true for all
semantic web software)
- Query languages (RDQL) are immature when compared
to SQL
- Parsing RDF is hard and slow (RDF/XML-ABBREV
output of the Jena parser is unreliable and the
unstriped format requires multiple passes to
create XML that can easily be transformed to HTML)
11. A problem with ontologies?
- The volume of data generated in the Life Sciences
is now estimated to be doubling every month
- Already people look less and less at the raw
scientific data (unless they are their own
results)
- As this volume of data accumulates, few if any of
us will have the time or the mental capacity to
assimilate new data, structure them in a
meaningful way and extract information, without
first processing the data through an ontology or
some other similar machine-based organisational
aid
- THE ONTOLOGY WILL BE WRONG! (or we should all
pack up and go home)
12. Paradigm shifts
- Our human understanding of an area of science is
never static, but is constantly being revised by
new research
- Such revisions in understanding are either
evolutionary (incremental), following the
progressive discovery of more and more detail,
interpreted according to the prevailing paradigm,
or revolutionary, when the prevailing paradigm is
overthrown by another
- How do paradigm revolutions succeed?
- "A new scientific truth does not triumph by
convincing its opponents and making them see the
light, but rather because its opponents
eventually die, and a new generation grows up
that is familiar with it" (Max Planck, 1949)
13. Factors preventing evolution
- Ontology builders are monks (and nuns) - led by
an abbot, a relatively senior domain expert
likely to be committed to encapsulating the
dominant paradigm
- Substantial problems confront any newcomers
wishing to contribute, since ontology building is
time-consuming and expensive
- Since an ontology expresses the community
consensus, there will be massive social pressures
against change
- If large volumes of data have already been
encoded using an existing ontology, this will
make it difficult to introduce change
- The first ontology in a domain may assume a
monopolistic position that becomes unassailable,
even if it has universally acknowledged
weaknesses
- Ontologies are unlikely to evolve in response to
the same market forces that drive the development
of applications software
14. Encapsulating the dominant paradigm
- Imagine a section of an ontology describing the
development of adult mammalian bone marrow and
brain, constructed according to the pre-1980
dominant paradigm that bone marrow develops from
mesoderm, while brain develops from
ectoderm
15. An example of paradigm evolution
- Subsequently, adult mouse brain was found to
contain haemopoietic stem cells
- Bartlett (1982) hypothesised that these cells
developed from foetal haemopoietic cells that
entered the brain tissue before the blood-brain
barrier was established
- This challenge to the dominant paradigm that
brain tissues are derived exclusively from
ectoderm can be accommodated by extending the
graph
16. An example of paradigm revolution
- More recently, Brazelton et al. (2000) claimed
that haemopoietic stem cells from adult bone
marrow can develop into neural cells in adult
mouse brain
- If true, this result overthrows the paradigm that
neuronal cells can only develop from embryonic
ectoderm, requiring a new ontology incompatible
with the old
- This new ontology is no longer an extension of
the previous one, since neural cells no longer
develop only from foetal neuroepithelium
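
The difference between the two kinds of revision in slides 14 to 16 can be made concrete with a small sketch. It assumes, purely for illustration, that an ontology is a set of "develops from" edges plus a set of terms whose listed origins are claimed to be exhaustive (the "only" in the old paradigm). That modelling choice, the node labels, and the function name are assumptions of this example, not the representation used in BioImage.

```python
# Old paradigm as (tissue_or_cell, origin) "develops from" edges.
old_edges = {
    ("bone marrow", "mesoderm"),
    ("brain", "ectoderm"),
    ("neural cell", "foetal neuroepithelium"),
}

# Terms whose origins the old paradigm asserts to be complete
# ("neural cells develop ONLY from foetal neuroepithelium").
closed_terms = {"neural cell"}

# Bartlett (1982): a new kind of cell in the brain, with its own origin.
evolution_edges = {
    ("brain haemopoietic stem cell", "foetal haemopoietic cell"),
}

# Brazelton et al. (2000): a new origin for an existing, closed term.
revolution_edges = {
    ("neural cell", "adult bone marrow haemopoietic stem cell"),
}


def classify_revision(edges, closed, new_edges):
    """Return 'revolutionary' if any new edge adds an origin to a closed term,
    otherwise 'evolutionary' (the graph is simply extended)."""
    for child, origin in new_edges:
        if child in closed and (child, origin) not in edges:
            return "revolutionary"
    return "evolutionary"


print(classify_revision(old_edges, closed_terms, evolution_edges))   # evolutionary
print(classify_revision(old_edges, closed_terms, revolution_edges))  # revolutionary
```

Without the exhaustiveness claim, both revisions would look like plain graph extensions; it is the "only" in the old paradigm that makes the second one incompatible.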
17. A way forward using Named Graphs in RDF (and
OWL?)
- In response to considerable frustration and
confusion within the RDF community about the best
method of reifying RDF statements, Jeremy Carroll
et al. proposed an extension to RDF
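
One way to picture the Named Graphs idea is that each set of triples is given a name, so that further statements can be made about the set as a whole. The sketch below models this with plain Python sets rather than an RDF library, and reads the slide as suggesting that competing paradigms could be kept side by side as separate graphs; that reading, and every `ex:` identifier used here, is an assumption made for the example.

```python
# Each named graph: a graph name mapped to a set of (subject, predicate, object) triples.
graphs = {
    "ex:paradigm-old": {
        ("ex:NeuralCell", "ex:developsFrom", "ex:FoetalNeuroepithelium"),
    },
    "ex:paradigm-new": {
        ("ex:NeuralCell", "ex:developsFrom", "ex:FoetalNeuroepithelium"),
        ("ex:NeuralCell", "ex:developsFrom", "ex:AdultHaemopoieticStemCell"),
    },
}

# Statements ABOUT the graphs themselves, using the graph names as subjects.
graph_metadata = {
    ("ex:paradigm-old", "ex:assertedBy", "ex:CuratorA"),
    ("ex:paradigm-new", "ex:assertedBy", "ex:CuratorB"),
}


def graphs_asserted_by(agent):
    """Names of all graphs that the metadata attributes to `agent`."""
    return sorted(g for (g, p, o) in graph_metadata
                  if p == "ex:assertedBy" and o == agent)


def origins(term, graph_name):
    """Origins of `term` according to one named graph only."""
    return sorted(o for (s, p, o) in graphs[graph_name]
                  if s == term and p == "ex:developsFrom")


print(origins("ex:NeuralCell", "ex:paradigm-old"))
print(origins("ex:NeuralCell", "ex:paradigm-new"))
print(graphs_asserted_by("ex:CuratorB"))
```

Because each answer is scoped to a graph name, the old and new views can coexist in one store, and a query can choose which view to trust.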
18. Thanks and acknowledgements
- David Shotton and Simon Sparks for BioImage
developments (http://www.bioimage.org)
- John Pybus, our computer systems manager, for
keeping us running in spite of the problems
- Liz Mellings for unbounded patience inputting
data and testing
- The European Commission for funding the BioImage
Project (EC IST 5th Framework Contract
2001-32688, ORIEL: Online Research Information
Environment for the Life Sciences,
http://www.oriel.org)
19. End