Title: Helping Biodiversity Researchers to do their Work
1Helping Biodiversity Researchers to do their Work
- Collaborative e-Science and Virtual Organisations
- Richard White
- Cardiff University
- r.j.white_at_cs.cardiff.ac.uk
2Biodiversity research
- Biologists are working to understand the adaptation of organisms to their environmental niche, and to predict their interactions with their environment
- Eventually by combining knowledge at all the levels of biological organisation:
- tissue
- organ
- individual whole organism
- population
- species
- evolutionary pathways
- genome
- transcription
- proteome
- metabolic pathways
- cell
3Examples of biodiversity resources
- Scientists working with biodiversity information employ a wide variety of resources which may be available on various local and remote computer platforms.
- Data sources
- Names: Species 2000 / ITIS Catalogue of Life
- Data: GBIF, sequence databases
- Geography: gazetteers
- Collections and distributions: BioCASE, MaNIS
- Analysis tools
- Statistical and multivariate analysis
- Modelling
- Presentation and visualisation
4Use of resources together
- Scientists frequently need to use several of these resources in sequence to carry out their research.
- When working with biodiversity data, much effort is currently expended in
- initially acquiring resources
- installing and sometimes adapting them to run on the user's own machine
- converting and transporting data sets between stages of the analysis process
5Problem-Solving Environments
- A problem-solving environment (PSE) is a software workbench to help
- scientists build bigger analyses and models more easily, and thus
- make it easier to answer Big Questions, especially those with the complexity of biodiversity informatics
- A PSE allows the user to
- select appropriate local and remote resources
- arrange them into a workflow
- execute the workflow
- automatically manage access to the chosen resources
- save a workflow for modification and re-use
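The capabilities listed on this slide (select resources, arrange them into a workflow, execute it, save it for re-use) can be sketched in a few lines of Python. This is only an illustration: the `Workflow` class, the `RESOURCES` table and the three toy resources are hypothetical and are not part of BDWorld or Triana.

```python
import json

# Hypothetical resource table; real resources could be local or remote.
RESOURCES = {
    "upper": lambda text: text.upper(),
    "reverse": lambda text: text[::-1],
    "exclaim": lambda text: text + "!",
}


class Workflow:
    """A linear sequence of named resources (illustrative only)."""

    def __init__(self, steps=None):
        self.steps = list(steps or [])

    def add(self, resource_name):
        """Select a resource and append it to the workflow."""
        if resource_name not in RESOURCES:
            raise KeyError(f"unknown resource: {resource_name}")
        self.steps.append(resource_name)
        return self

    def run(self, data):
        """Execute each step in order, passing the data along."""
        for name in self.steps:
            data = RESOURCES[name](data)
        return data

    def save(self):
        """Serialise the workflow so it can be stored and re-used."""
        return json.dumps({"steps": self.steps})

    @classmethod
    def load(cls, text):
        return cls(json.loads(text)["steps"])


workflow = Workflow().add("upper").add("exclaim")
result = workflow.run("quercus robur")

# Re-use: reload the saved workflow and modify it.
reloaded = Workflow.load(workflow.save()).add("reverse")
```

Storing only the step names keeps the saved workflow independent of where each resource actually runs, which is one plausible way to support modification and re-use.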
6The Biodiversity World (BDWorld) project
- A 3-year e-Science project funded by BBSRC (UK)
- To design, build and demonstrate a working proof-of-concept PSE with appropriate data and analysis resources to support biodiversity research
- 3 example applications showing how scientists can use it to assist biodiversity research
- Our goals are to develop and enhance this approach to collaborative computing and to encourage its wide adoption, by research in the areas described later
7Example: Climate-space modelling
- Modelling and predicting changes in distribution in response to climate changes such as those brought about by global warming
- Steps involved
- Get current distribution of a species (e.g. specimen records)
- Get current or recent climate data for those localities
- Calculate a model for the climate space the species can occupy
- Predict the distribution the species would have in any specified climate (may be different to the climate used above)
- Project back on world map
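The steps above can be illustrated with a deliberately simple rectangular climate envelope. The slides do not say which modelling algorithm BDWorld uses, so the min/max envelope here is an assumed stand-in, and all climate values, grid cells and function names are made up for the example.

```python
# Climate at each locality where the species was recorded, as
# (mean annual temperature in deg C, annual rainfall in mm). Made-up values.
occurrence_climate = [(9.5, 800.0), (11.0, 650.0), (10.2, 900.0)]


def fit_envelope(samples):
    """Return per-variable (min, max) bounds over the sampled localities."""
    columns = list(zip(*samples))
    return [(min(col), max(col)) for col in columns]


def is_suitable(envelope, climate):
    """True if every climate variable falls inside the fitted bounds."""
    return all(low <= value <= high
               for (low, high), value in zip(envelope, climate))


def predict(envelope, climate_grid):
    """Map each grid cell, keyed by (lat, lon), to a suitability flag."""
    return {cell: is_suitable(envelope, climate)
            for cell, climate in climate_grid.items()}


envelope = fit_envelope(occurrence_climate)

# A scenario climate surface, possibly different from the one used for fitting.
scenario_grid = {
    (52.0, -3.0): (10.5, 700.0),
    (57.0, -4.0): (8.0, 1200.0),
    (45.0, 2.0): (13.5, 600.0),
}
prediction = predict(envelope, scenario_grid)
```

Because `prediction` is keyed by latitude and longitude, the final step (projecting back on to a world map) reduces to plotting the cells whose flag is true.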
8Example work-flow (Climate-space Modelling)
- Species 2000: submit scientific name; retrieve accepted name and synonyms for species
- Localities: retrieve distribution data for species of interest
- Climate: present or recent climate surfaces
- ClimateSpace Model: model of climatic conditions where species is currently found
- Climate: possibly different climate surfaces (e.g. predicted climate)
- Prediction: prediction of suitable regions for species of interest
- Base Maps: world or regional maps
- Projection: projection of predicted distribution on to base map
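The workflow on this slide can be read as a dependency graph: each component runs once the components feeding it have finished. A minimal sketch using Python's standard `graphlib` module (Python 3.9 or later) follows. The step identifiers are labels chosen here to mirror the slide, and the dependencies are inferred from the component descriptions rather than taken from the BDWorld implementation.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps whose output it needs (assumed wiring).
dependencies = {
    "species_2000": set(),
    "localities": {"species_2000"},
    "present_climate": set(),
    "climate_space_model": {"localities", "present_climate"},
    "scenario_climate": set(),
    "prediction": {"climate_space_model", "scenario_climate"},
    "base_maps": set(),
    "projection": {"prediction", "base_maps"},
}

# One valid execution order; independent steps may appear in any order.
order = list(TopologicalSorter(dependencies).static_order())
```

An enactment engine could run steps with no mutual dependency (for example fetching base maps and climate surfaces) in parallel, since the graph only constrains the order between connected steps.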
9BDWorld / Triana in operation 1: Workflow creation (design, editing)
10Triana screen-shots (slides 10-16, workflow creation)
17BDWorld / Triana in operation 2: Workflow execution (enactment, run-time)
18Triana screen-shots (slides 18-22, workflow execution)
23Design of architecture
- to facilitate
- resource discovery
- semantic mediation
- workflow creation and enactment
- management of data generated by workflows, etc.
24Workflows
- Resources are called into use in an appropriate sequence from an interactive workflow.
- The facility for scientists to be able to create their own workflows, without the need for regular assistance from computer scientists, is an essential part of the BDWorld system. Accessible tools for resource discovery and for workflow design, enactment and re-use are therefore required.
25Difficulties with resources
- Finding the resources
- Knowing how to use these heterogeneous resources
- Originally constructed for various reasons, often with little attention to standards or interoperability
- Have to pass data sets from one to another
- Some involve user interaction
26User interface
- The drag-and-drop metaphor needs further research into the best ways to support
- resource discovery
- resource matching
- data management (e.g. temporary storage of intermediate results)
- Perhaps using a plug-in architecture, so that third parties can extend it as required
27Extensibility
- to allow scientific and technical users to add
new resources to the environment, without the
involvement of the system programmers wherever
possible
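One common way to let users add resources without touching the core system is a registry that new code announces itself to. The sketch below assumes such a design purely for illustration; `RESOURCE_REGISTRY`, the `resource` decorator and the example resource are hypothetical names, not BDWorld's mechanism.

```python
RESOURCE_REGISTRY = {}


def resource(name):
    """Decorator that registers a callable under a resource name."""
    def register(func):
        if name in RESOURCE_REGISTRY:
            raise ValueError(f"resource already registered: {name}")
        RESOURCE_REGISTRY[name] = func
        return func
    return register


# A user-contributed resource: no change to the core code is needed.
@resource("deduplicate_localities")
def deduplicate_localities(records):
    """Remove repeated (lat, lon) records and return them sorted."""
    return sorted(set(records))


unique = RESOURCE_REGISTRY["deduplicate_localities"](
    [(51.5, -3.2), (51.5, -3.2), (50.9, -1.4)]
)
```

The environment only ever looks resources up by name, so a scientist's module can be loaded alongside the built-in ones without the system programmers being involved.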
28Flexibility
- software libraries that can be extended to
provide interfaces to new resources, but which
can also be configured for many common
requirements without the need for programmed
extensions
29Intelligent agents
- mediation or facilitation to manage the semantic
heterogeneity encountered in all aspects of the
PSE, including the names of organisms and their
components, resolution of geographical data, etc.
30Virtual Organisations
- The facilities described so far can be used by a single scientist working on their research.
- By adding further functionality to the PSE, it can be used to support collaboration between scientists in virtual organisations.
31Security and authentication
- The PSE and all the relevant resources and results will be
- accessible with a single log-on
- shareable with other members of a virtual organisation
32Management
- Mechanisms for managing the experiments and analyses which the workflows represent, including
- maintenance of logs and provenance information,
- distributed storage of intermediate data sets, etc.,
- to reduce the burden on the scientists and increase their productivity.
- This not only helps individual scientists look after their own data and results
- but also imposes order and helps scientists collaborate in the same or linked analyses
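As an illustration of the logging and provenance idea on this slide, the sketch below wraps each workflow step so that a record of who ran it, when, and on which data is kept automatically. The record fields, the `run_logged` helper and the sample data are assumptions made for the example, not a description of BDWorld's log format.

```python
import hashlib
import json
from datetime import datetime, timezone


def fingerprint(data):
    """Stable SHA-256 digest of JSON-serialisable data."""
    encoded = json.dumps(data, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def run_logged(step_name, func, data, log, user):
    """Run one workflow step and append a provenance record to the log."""
    started = datetime.now(timezone.utc).isoformat()
    result = func(data)
    log.append({
        "step": step_name,
        "user": user,
        "started": started,
        "input_sha256": fingerprint(data),
        "output_sha256": fingerprint(result),
    })
    return result


provenance_log = []
records = [{"lat": 51.5, "lon": -3.2}, {"lat": None, "lon": 0.0}]
cleaned = run_logged(
    "drop_missing_coordinates",
    lambda rows: [r for r in rows if r["lat"] is not None],
    records, provenance_log, user="scientist_a",
)
```

Recording digests rather than the data itself keeps the log small while still letting collaborators check that they are working from the same intermediate data set.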
33Role of metadata
- Metadata is needed to enable discovery of resources and to indicate how they are to be used.
- Properties to help locate appropriate resources
- Check interoperability, suggest transformations
- Provenance of data sets
- Log of work-flows executed
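To make the first two roles of metadata concrete (locating resources by their properties, then checking interoperability and suggesting transformations), here is a small sketch. The metadata fields, resource names, data-type labels and the converter name are all hypothetical.

```python
# Hypothetical metadata records: what each resource consumes and produces.
RESOURCE_METADATA = {
    "name_checker": {"kind": "data source",
                     "input": "scientific_name", "output": "accepted_name"},
    "occurrence_lookup": {"kind": "data source",
                          "input": "accepted_name", "output": "locality_list"},
    "climate_model": {"kind": "analysis tool",
                      "input": "climate_table", "output": "envelope"},
}

# Known converters between data types, keyed by (from_type, to_type).
TRANSFORMATIONS = {
    ("locality_list", "climate_table"): "sample_climate_at_localities",
}


def find_resources(**properties):
    """Discovery: names of resources whose metadata match all properties."""
    return sorted(name for name, meta in RESOURCE_METADATA.items()
                  if all(meta.get(k) == v for k, v in properties.items()))


def check_link(producer, consumer):
    """Interoperability: can producer's output feed consumer's input?"""
    out_type = RESOURCE_METADATA[producer]["output"]
    in_type = RESOURCE_METADATA[consumer]["input"]
    if out_type == in_type:
        return "compatible"
    converter = TRANSFORMATIONS.get((out_type, in_type))
    if converter is not None:
        return f"insert {converter}"
    return "incompatible"
```

For example, `check_link("occurrence_lookup", "climate_model")` reports that a conversion step is needed, which a workflow editor could offer to insert when the user connects the two resources.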
34Knowledge-base
- Metadata, thesaurus and knowledge management to support the facilities described above, in which provenance data and biodiversity-specific knowledge are recorded, maintained, and used by other components of the PSE.
- In addition, it can translate between the different terminologies used by scientists trying to collaborate
- For example, by providing concept-based cross-mapping between alternative taxonomies as described by Andrew Jones
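The general idea of concept-based cross-mapping can be shown with a toy lookup: names from two checklists are linked through the concept they denote rather than by matching strings. This is a simplified sketch and not the method referred to on the slide; the checklist contents, concept identifiers and taxon names are invented placeholders.

```python
# Each checklist maps its own names to a shared concept identifier.
# All names and identifiers below are invented for illustration.
CHECKLIST_A = {"Exampla alba": "concept-1", "Exampla nigra": "concept-2"}
CHECKLIST_B = {"Specimenia alba": "concept-1", "Exampla nigra": "concept-2"}


def cross_map(name, source, target):
    """Return the target-checklist names denoting the same concept."""
    concept = source.get(name)
    if concept is None:
        return []
    return sorted(n for n, c in target.items() if c == concept)


mapped = cross_map("Exampla alba", CHECKLIST_A, CHECKLIST_B)
unmapped = cross_map("Exampla rubra", CHECKLIST_A, CHECKLIST_B)
```

Returning a list rather than a single name leaves room for cases where one taxonomy splits a concept that another treats as a single taxon.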
35A dream
- A desktop environment in which scientists can drag and drop data sources, analysis and modelling tools, and visualisation interfaces into a desired sequence of operations which can be run automatically.
- BDWorld is just about at this stage at present.
- With the additional features described above, such an environment could be made richer, easier to use, more productive, and better able to support research groups.
- Something like a component-based visual programming environment.
- Not just for biodiversity!
36Summary
- Problem-solving Environment
- Architecture
- User interface
- Extensibility
- Virtual organisations
- Managing use of workflows and results
- Knowledge-base
37Acknowledgements
- BBSRC (UK)
- Collaborators in the BDWorld project: Universities of Reading and Southampton, Natural History Museum (London)
- Organisations that have co-operated with these research projects, especially
- Species 2000
- ILDIS
- FishBase
- Hadley Centre for Climate Prediction and Research
38Merry Christmas
- And a Convergent New Year!