EXPERIENCES IN BUILDING AN ONTOLOGY-DRIVEN IMAGE DATABASE FOR BIOLOGISTS

Description:

University of Pennsylvania School of Medicine. Image ... (who took the original micrograph, where, when, under what conditions, for what purpose, etc. ... – PowerPoint PPT presentation

Provided by: drda79

1
EXPERIENCES IN BUILDING AN ONTOLOGY-DRIVEN IMAGE
DATABASE FOR BIOLOGISTS
2
Outline

Why are images important?
What is the BioImage database?
Why use a semantic web architecture?
Lessons and research questions

3
Why are biological images important in the
post-genomic age?

Images are semantic instruments for capturing
aspects of the real world, and form a vital part
of the scientific record, for which words are no
substitute
In the post-genomic world, attention is now
focused on the organization and integration of
information within cells, for functional analyses
of gene products
In a month a single active cell biology lab may
generate between 10 and 100 Gbytes of
multidimensional image data

4
Images are complex

An image database must be able to store original
images in any digital format currently available
or yet to be invented, including multi-channel 3D
images, multi-channel videos, etc.

5
The need for image databases

The value of digital image information depends
upon how easily it can be located, searched for
relevance, and retrieved
Detailed descriptive metadata about the images
are essential
Without them, digital image repositories become
little more than meaningless and costly data
graveyards
Despite the growth of on-line journals that
permit the inclusion of media objects, few of
these resources are freely available, and those
that are available are difficult to locate and
are not cross-searchable
There is thus a need for a free, publicly
available image database with rich,
well-structured, searchable metadata
The BioImage Database seeks to fulfil that need

6
This view has a growing acceptance
7
What metadata?

Image acquisition (who took the original
micrograph, where, when, under what conditions,
for what purpose, etc.)
The media object itself (source and derivation,
image type, dynamic range, resolution, format,
codec, etc.),
The denotation of the referent (e.g. the name,
age and condition of the subject),
Connotation of the referent (the image's
interpretation, meaning, purpose or significance,
its relevance to its creator and others, and its
semantic relationship to other images).
Field aspects of the real world that cannot
conveniently be attached to any particular object
(e.g. variations of illumination intensity or
chemo-attractant concentration across the field
of view of a light microscope image).
Sequences of change where there is a need to
preserve the concept of object identity in the
face of radical spatio-temporal changes in
appearance.
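
The six categories above can be pictured as one structured record. The Python sketch below is illustrative only: the class, the field names and the example values are assumptions made for this example, not the BioImage schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ImageMetadata:
    """Hypothetical record grouping the metadata categories listed above."""
    # Image acquisition: who, when, for what purpose
    acquired_by: str
    acquired_on: str
    purpose: str
    # The media object itself
    image_format: str
    resolution: str
    # Denotation of the referent (what is depicted)
    subject_name: str
    subject_age: Optional[str] = None
    # Connotation of the referent (interpretation, related images)
    interpretation: str = ""
    related_images: List[str] = field(default_factory=list)
    # Field aspects that belong to no single object
    field_notes: List[str] = field(default_factory=list)
    # Sequences of change: an identifier kept stable across appearances
    object_identity: Optional[str] = None


def missing_categories(m: ImageMetadata) -> List[str]:
    """Return the optional categories that were left empty."""
    missing = []
    if not m.interpretation:
        missing.append("connotation")
    if not m.field_notes:
        missing.append("field")
    if m.object_identity is None:
        missing.append("sequence")
    return missing


# Invented example values, for illustration only.
example = ImageMetadata(
    acquired_by="A. Researcher",
    acquired_on="2003-05-01",
    purpose="teaching",
    image_format="TIFF",
    resolution="1024x1024",
    subject_name="mouse fibroblast",
)
print(missing_categories(example))
```

A check like `missing_categories` hints at why metadata is costly to obtain: the acquisition and media fields can often be filled in mechanically, while connotation, field and sequence information usually needs a person.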

8
Why use a semantic web architecture?

Traditional relational databases don't meet our
needs
Image data is complex, layered, and difficult to
model
Images are searched primarily through their
metadata
Metadata is time consuming and difficult to
obtain
Ontologies offer the promise of better retrieval
accuracy through linking to instances in an
ontology, rather than attempting to process free
text.
Ontologies offer the promise of easy
inter-operability with other systems
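
The retrieval argument can be sketched in a few lines of Python. The mini-ontology, image records and function names below are all invented for illustration, and the plain dictionary stands in for a real ontology store; the point is only that a query against a linked term can follow the hierarchy, whereas a free-text search sees only the words in the caption.

```python
# Hypothetical mini-ontology: child term -> parent term.
PARENT = {
    "neuron": "neural cell",
    "glial cell": "neural cell",
    "neural cell": "cell",
    "haemopoietic stem cell": "stem cell",
    "stem cell": "cell",
}

# Hypothetical image records: a free-text caption plus a linked ontology term.
IMAGES = [
    {"id": "img1", "caption": "Purkinje cells in cerebellum", "term": "neuron"},
    {"id": "img2", "caption": "astrocyte culture", "term": "glial cell"},
    {"id": "img3", "caption": "marrow aspirate", "term": "haemopoietic stem cell"},
]


def is_a(term, ancestor):
    """True if term equals ancestor or lies below it in the hierarchy."""
    while term is not None:
        if term == ancestor:
            return True
        term = PARENT.get(term)
    return False


def search_free_text(query):
    """Naive substring match against the caption only."""
    return [img["id"] for img in IMAGES
            if query.lower() in img["caption"].lower()]


def search_by_term(query_term):
    """Match every image whose linked term is the query term or a descendant."""
    return [img["id"] for img in IMAGES if is_a(img["term"], query_term)]


print(search_free_text("neural cell"))
print(search_by_term("neural cell"))
```

Here the free-text search for "neural cell" finds nothing because no caption contains those words, while the term-based search returns both the neuron and the glial cell images.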

9
The BioImage Ontology
10
Lessons learned: performance, scalability

Database retrieval is slower than a traditional
database would be
Scalability remains to be tested (true for all
semantic web software)
Query languages (RDQL) are immature when compared
to SQL
Parsing RDF is hard and slow (RDF-ABBREV output
of the Jena parser is unreliable and the
unstriped format requires multiple passes to
create XML that can easily be transformed to HTML)

11
A problem with ontologies?

The volume of data generated in the Life Sciences
is now estimated to be doubling every month
Already people look less and less at the raw
scientific data (unless they are their own
results)
As this volume of data accumulates, few if any of
us will have the time or the mental capacity to
assimilate new data, structure them in a
meaningful way and extract information, without
first processing the data through an ontology or
some other similar machine-based organisational
aid
THE ONTOLOGY WILL BE WRONG! (or we should all
pack up and go home)

12
Paradigm shifts

Our human understanding of an area of science is
never static, but is constantly being revised by
new research
Such revisions in understanding are either
evolutionary (incremental), following the
progressive discovery of more and more detail,
interpreted according to the prevailing paradigm,
or revolutionary, when the prevailing paradigm is
overthrown by another
How do paradigm revolutions succeed?
"A new scientific truth does not triumph by
convincing its opponents and making them see the
light, but rather because its opponents
eventually die, and a new generation grows up
that is familiar with it"
(Max Planck, 1949)

13
Factors preventing evolution

Ontology builders are monks (and nuns) - led by
an abbot, a relatively senior domain expert
likely to be committed to encapsulating the
dominant paradigm
Substantial problems confront any newcomers
wishing to contribute, since ontology building is
time-consuming and expensive
Since an ontology expresses the community
consensus, there will be massive social pressures
against change
If large volumes of data have already been
encoded using an existing ontology, this will
make it difficult to introduce change
The first ontology in a domain may assume a
monopolistic position that becomes unassailable,
even if it has universally acknowledged
weaknesses
Ontologies are unlikely to evolve in response to
the same market forces that drive the development
of applications software

14
Encapsulating the dominant paradigm

Imagine a section of an ontology describing the
development of adult mammalian bone marrow and
brain, constructed according to the pre-1980
dominant paradigm that bone marrow develops from
mesoderm, while brain develops from
ectoderm

15
An example of paradigm evolution

Subsequently, adult mouse brain was found to
contain haemopoietic stem cells
Bartlett (1982) hypothesised that these cells
developed from foetal haemopoietic cells that
entered the brain tissue before the blood-brain
barrier was established
This challenge to the dominant paradigm that
brain tissues are derived exclusively from
ectoderm can be accommodated by extending the
graph
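
A minimal Python sketch of what "extending the graph" means, assuming the ontology fragment is reduced to a set of "develops from" edges. The node names and the extra edges are illustrative assumptions, not the actual graph from the slides.

```python
# Each edge reads "(tissue, source)": tissue develops from source.
# Node names are illustrative, not taken from a real ontology.
OLD_PARADIGM = {
    ("bone marrow", "mesoderm"),
    ("brain", "ectoderm"),
}

# The hypothesis adds new nodes and edges but removes nothing.
EXTENDED = OLD_PARADIGM | {
    ("brain haemopoietic stem cells", "foetal haemopoietic cells"),
    ("foetal haemopoietic cells", "mesoderm"),
}


def is_extension(old, new):
    """A revision is an extension if every old assertion is still present."""
    return old <= new


print(is_extension(OLD_PARADIGM, EXTENDED))
```

Because the old edge set is a subset of the new one, data already annotated against the old graph remains valid under the extended graph.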

16
An example of paradigm revolution

More recently, Brazelton et al. (2000) claimed
that haemopoietic stem cells from adult bone
marrow can develop into neural cells in adult
mouse brain
If true, this result overthrows the paradigm that
neuronal cells can only develop from embryonic
ectoderm, requiring a new ontology incompatible
with the old
This new ontology is no longer an extension of
the previous one, since neural cells no longer
develop only from foetal neuroepithelium
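
Why this change cannot be absorbed by simply adding an edge can be sketched in Python by treating the word "only" as an exclusivity constraint. The data structures and names below are assumptions made for this example.

```python
def violations(edges, only_from):
    """Return edges whose source contradicts an 'only develops from' constraint."""
    return sorted(
        (target, source)
        for (target, source) in edges
        if target in only_from and source != only_from[target]
    )


# Old paradigm: one edge plus the claim that it is the only possible origin.
OLD_EDGES = {("neural cells", "foetal neuroepithelium")}
ONLY_FROM = {"neural cells": "foetal neuroepithelium"}

# The newer claim adds a second origin for neural cells.
NEW_EDGES = OLD_EDGES | {
    ("neural cells", "adult bone marrow haemopoietic stem cells"),
}

print(violations(OLD_EDGES, ONLY_FROM))
print(violations(NEW_EDGES, ONLY_FROM))
```

The new edge set still contains the old edge, but it contradicts the exclusivity constraint, so the constraint itself has to be withdrawn. That withdrawal is what makes the new ontology incompatible with the old rather than an extension of it.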

17
A way forward using Named Graphs in RDF (and
OWL?)

In response to considerable frustration and
confusion within the RDF community about the best
method of reifying RDF statements, Jeremy Carroll
et al. proposed an extension to RDF
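
The slide does not spell out how Named Graphs would be applied here. One plausible reading, sketched below in plain Python rather than with an RDF library, is that each paradigm lives in its own named graph and a separate graph carries statements about those graphs. All identifiers with the `ex:` prefix, and the status values, are hypothetical.

```python
from collections import defaultdict

# Graph name -> set of (subject, predicate, object) triples.
store = defaultdict(set)


def add(graph, s, p, o):
    """Assert a triple inside the named graph `graph`."""
    store[graph].add((s, p, o))


# Two competing paradigms, each held in its own named graph.
add("ex:paradigm1980", "ex:NeuralCell", "ex:developsOnlyFrom", "ex:Neuroepithelium")
add("ex:brazelton2000", "ex:NeuralCell", "ex:developsFrom", "ex:BoneMarrowStemCell")

# Because graphs have names, statements can be made about the graphs themselves.
add("ex:provenance", "ex:paradigm1980", "ex:status", "prevailing")
add("ex:provenance", "ex:brazelton2000", "ex:assertedBy", "Brazelton et al. (2000)")
add("ex:provenance", "ex:brazelton2000", "ex:status", "contested")


def graphs_asserting(s, p, o):
    """Names of all graphs that contain the given triple."""
    return sorted(g for g, triples in store.items() if (s, p, o) in triples)


def describe(graph):
    """Collect what the provenance graph says about a named graph."""
    return {p: o for (s, p, o) in store["ex:provenance"] if s == graph}


print(graphs_asserting("ex:NeuralCell", "ex:developsFrom", "ex:BoneMarrowStemCell"))
print(describe("ex:brazelton2000"))
```

Keeping the contradictory statements in separate graphs means neither has to be deleted, and a query can choose which graphs to trust.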

18
Thanks and acknowledgements

David Shotton and Simon Sparks for BioImage
developments (http://www.bioimage.org)
John Pybus, our computer systems manager, for
keeping us running in spite of the problems
Liz Mellings for unbounded patience inputting
data and testing
The European Commission for funding the BioImage
Project (EC IST 5th Framework Contract
2001-32688 ORIEL: Online Research Information
Environment for the Life Sciences,
http://www.oriel.org)

19
End
