Title: Designing and Building a Biodiversity Grid: Experiences from the BiodiversityWorld Project
1 Designing and Building a Biodiversity Grid: Experiences from the BiodiversityWorld Project
- Andrew C. Jones
- Cardiff University, UK
- Andrew.C.Jones_at_cs.cardiff.ac.uk
2 The BiodiversityWorld project
- 3-year e-Science project funded by the UK BBSRC research council
- Universities of Reading, Cardiff and Southampton
- The Natural History Museum (London)
3 Some Background ...
4 The GRID and e-Science
- "A computational grid is a hardware and software infrastructure that provides dependable, consistent, pervasive, and inexpensive access to high-end computational capabilities" (Foster and Kesselman, The Grid)
- "e-Science is about global collaboration in key areas of science and the next generation of infrastructure that will enable it. The infrastructure to enable this science revolution is generally referred to as the Grid" (Hey and Trefethen, The UK e-Science Core Programme and the Grid)
5 GRAB (GRid And Biodiversity)
- 6-month DTI-funded demonstrator project
- Project aim
- Assess the Grid's potential for collaborative research in biodiversity informatics
- Supporting discovery and use of diverse biodiversity-related databases
- Exploring use of Globus and SRB middleware
7 GRAB resource types
GRAB interface
- Catalogue of life: scientific and common names
- Species Information System (SIS): images and geography
- Climate: max/min temperature and annual precipitation
8 Issues in GRAB
- Problems installing Globus research software
- Essentially wanted to send distributed requests and receive responses
- Initial HTTP-based prototype worked well
- Versions of SRB then available had little to offer
- Globus 2 approach needed canned queries, temporary files, etc., and was much more difficult than the HTTP prototype
9 What we want to achieve
10 Some difficult Biodiversity questions
- How should conservation efforts be concentrated?
- (example of Biodiversity Richness and Conservation Evaluation)
- Where might a species be expected to occur, under present or predicted climatic conditions?
- (example of Bioclimatic and Ecological Niche Modelling)
- How can geographical information assist in selection among possible phylogenetic trees?
- (example of Phylogenetic Analysis and Palaeoclimate Modelling)
11 Some relevant resource types
- Data sources
- Catalogue of life
- Species Information Sources (SISs)
- Species geography
- Descriptive data
- Specimen distribution
- Geographical
- Boundaries of geographical and political units
- Climate surfaces
- Genetic sequences
- Analytic tools
- Biodiversity richness assessment: various metrics
- Bioclimatic modelling: bioclimatic envelope generation
- Phylogenetic analysis (generation of phylogenetic trees)
12 Some challenges
- Finding the resources
- Knowing how to use these heterogeneous resources
- Originally constructed for various reasons
- Often little thought was given to standards or interoperability
- So need to have appropriate associated metadata
13 Our vision (1)
- Biodiversity Problem Solving Environment
- Heterogeneous, diverse resources
- Facilitating integration of both legacy and newly-developed resources
- Flexible workflows
- Main challenges centre around metadata, interoperability, resource discovery, etc.
- High-performance computing secondary (though relevant)
14 Our vision (2)
- Distinctive features
- a biodiversity informatics GRID
- interoperability with heterogeneous data, complex in structure
- resilience to infrastructure change and interoperation with other GRIDs
- interactive collaboration a secondary concern
- We want to automate tasks such as the following analysis
18 Our architecture
19 BiodiversityWorld as a flexible PSE
20 Interoperability in BiodiversityWorld
- Initial proof-of-concept prototype used Java RMI; no serious attention to interoperability at that stage
- Have now defined BiodiversityWorld-Grid Interface (BGI), addressing need to
- wrap resources to remove needless heterogeneity
- wrap the wrapped resources (!) to insulate from infrastructure change
- use metadata to cope with remaining heterogeneity
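The two layers of wrapping described above can be sketched roughly as follows. This is an illustration only, written in Python for brevity; it is not the BGI API. All class names, the `accepted_name` operation and the species names are hypothetical. The first layer gives every native resource the same invocation shape, and the second hides the invocation mechanism, so that swapping the mechanism leaves client code unchanged.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, List, Tuple


class WrappedResource:
    """Layer 1 (hypothetical): a uniform invoke() shape over a native resource."""

    def __init__(self, name: str, operations: Dict[str, Callable[..., Any]]) -> None:
        self.name = name
        self._operations = operations

    def invoke(self, operation: str, **args: Any) -> Any:
        if operation not in self._operations:
            raise KeyError(f"{self.name} has no operation {operation!r}")
        return self._operations[operation](**args)


class Transport(ABC):
    """Layer 2 (hypothetical): hides the invocation mechanism from clients."""

    @abstractmethod
    def call(self, resource: WrappedResource, operation: str, **args: Any) -> Any:
        """Deliver one invocation to a wrapped resource and return its result."""


class InProcessTransport(Transport):
    """Stand-in mechanism; a real one would send the call over the network."""

    def call(self, resource: WrappedResource, operation: str, **args: Any) -> Any:
        return resource.invoke(operation, **args)


class LoggingTransport(Transport):
    """A second stand-in mechanism, used to show that clients survive a swap."""

    def __init__(self) -> None:
        self.log: List[Tuple[str, str]] = []

    def call(self, resource: WrappedResource, operation: str, **args: Any) -> Any:
        self.log.append((resource.name, operation))
        return resource.invoke(operation, **args)


class Client:
    """Client code depends only on the Transport abstraction."""

    def __init__(self, transport: Transport) -> None:
        self._transport = transport

    def invoke(self, resource: WrappedResource, operation: str, **args: Any) -> Any:
        return self._transport.call(resource, operation, **args)


# Made-up synonym table standing in for a native catalogue resource.
SYNONYMS = {"Exampleus oldname": "Exampleus newname"}
catalogue = WrappedResource(
    "catalogue", {"accepted_name": lambda name: SYNONYMS.get(name, name)}
)

# The same client call works unchanged over either mechanism.
first = Client(InProcessTransport()).invoke(
    catalogue, "accepted_name", name="Exampleus oldname"
)
second = Client(LoggingTransport()).invoke(
    catalogue, "accepted_name", name="Exampleus oldname"
)
```

The extra indirection costs one more call per invocation, which is why this style suits coarse-grained operations better than fine-grained ones.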
21 BiodiversityWorld architecture
Diagram components:
- User interface
- Presentation
- Workflow enactment engine
- Metadata repository
- Wrapped native resources
- BiodiversityWorld Resources
- BGI API
- BiodiversityWorld-GRID Interface (BGI)
- The GRID
22 BGI architecture
23 Some implications
- Wrapping
- Various ways of introducing resources (see later)
- Computationally intensive applications
- Assume these will lie within a single BDW resource
- Interoperability with other Grids
- Could wrap non-BDW resources
- Could rely on (e.g.) WSRF for communications with our GRID
- Highly interactive applications
- BGI OK for coarse-grained interaction; other possibilities (see later)
24 Resources for BiodiversityWorld
- Wrapped non-Java resource
- Grid software (of some sort)
25 User interaction with BDW
26 Example work-flow (Climate-space Modelling)
- Submit scientific name and retrieve accepted name and synonyms for species
- Present or recent climate surfaces
- Retrieve distribution data for species of interest
- Model of climatic conditions where species is currently found
- Prediction of suitable regions for species of interest
- Possibly different climate surfaces (e.g. predicted climate)
- World or regional maps
- Projection of predicted distribution on to base map
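As a rough illustration of how the steps of this work-flow chain together, the Python sketch below resolves a name, looks up occurrence cells, fits a very simple rectangular climate envelope (the minimum and maximum of each variable where the species occurs), and applies that envelope to a second climate surface. The envelope method is a deliberately simplified stand-in, not necessarily the model used in BiodiversityWorld. All function names, the grid cells and the climate values are invented toy data, and the final map projection step is omitted.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]                    # grid cell as (row, column)
Climate = Dict[Cell, Dict[str, float]]    # cell -> {variable name: value}
Envelope = Dict[str, Tuple[float, float]]  # variable name -> (min, max)


def resolve_name(name: str, synonyms: Dict[str, str]) -> str:
    """Map a submitted name to its accepted name (unchanged if already accepted)."""
    return synonyms.get(name, name)


def fit_envelope(occurrences: List[Cell], climate: Climate) -> Envelope:
    """Record the min and max of every climate variable over the occurrence cells."""
    envelope: Envelope = {}
    for cell in occurrences:
        for variable, value in climate[cell].items():
            low, high = envelope.get(variable, (value, value))
            envelope[variable] = (min(low, value), max(high, value))
    return envelope


def predict(envelope: Envelope, climate: Climate) -> List[Cell]:
    """Return the cells whose climate lies inside the envelope on every variable."""
    return sorted(
        cell
        for cell, values in climate.items()
        if all(low <= values[var] <= high for var, (low, high) in envelope.items())
    )


def run_workflow(
    name: str,
    synonyms: Dict[str, str],
    distribution: Dict[str, List[Cell]],
    present: Climate,
    scenario: Climate,
) -> List[Cell]:
    """Chain the steps: name -> distribution -> envelope -> predicted cells."""
    accepted = resolve_name(name, synonyms)
    envelope = fit_envelope(distribution[accepted], present)
    return predict(envelope, scenario)


# Toy inputs, invented for illustration only.
synonyms = {"Exampleus oldname": "Exampleus newname"}
distribution = {"Exampleus newname": [(0, 0), (0, 1)]}
present = {
    (0, 0): {"temp": 10.0, "precip": 500.0},
    (0, 1): {"temp": 12.0, "precip": 600.0},
    (1, 0): {"temp": 20.0, "precip": 200.0},
}
scenario = {
    (0, 0): {"temp": 13.0, "precip": 550.0},
    (0, 1): {"temp": 11.5, "precip": 580.0},
    (1, 0): {"temp": 11.0, "precip": 510.0},
}

suitable = run_workflow("Exampleus oldname", synonyms, distribution, present, scenario)
```

Passing the present-day surface as `scenario` reproduces the current climate space; passing a predicted surface gives the cells that would become suitable under that prediction.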
27 BDWorld / Triana in operation: Workflow creation (design, editing)
28-31 Triana screen-shots
32 BDWorld / Triana in operation: Workflow execution (enactment, run-time)
33-36 Triana screen-shots
37 Workflows
- Creating a workflow
- Workflows clearly good for capturing complex tasks
- Good for tweaking tasks
- But is this how users think?
- If not, we should provide an environment that supports a more exploratory approach too, e.g.
- User tries out some small subtasks
- (S)he joins results together
- System records interactions, so re-usable workflows can be composed
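The exploratory approach suggested above (try small steps, let the system record them, then replay them as a workflow) might look something like the following Python sketch. It assumes the simplest possible case, a linear chain in which each step takes the previous result. The `Recorder` class and its methods are hypothetical and do not describe Triana or BiodiversityWorld.

```python
from typing import Any, Callable, List, Tuple

Step = Callable[[Any], Any]


class Recorder:
    """Runs exploratory steps immediately and remembers them for later replay."""

    def __init__(self) -> None:
        self._steps: List[Tuple[str, Step]] = []

    def run(self, label: str, step: Step, value: Any) -> Any:
        """Execute one step now, and record it only if it succeeded."""
        result = step(value)
        self._steps.append((label, step))
        return result

    def labels(self) -> List[str]:
        """Human-readable log of the interactions recorded so far."""
        return [label for label, _ in self._steps]

    def as_workflow(self) -> Step:
        """Compose the recorded steps into one reusable function."""
        steps = list(self._steps)  # snapshot, so later steps do not alter it

        def workflow(value: Any) -> Any:
            for _, step in steps:
                value = step(value)
            return value

        return workflow


# Exploratory session: the user tries two small steps on one input.
session = Recorder()
trimmed = session.run("trim whitespace", str.strip, "  exampleus oldname ")
tidied = session.run("capitalise genus", str.capitalize, trimmed)

# The recorded steps can now be replayed on a different input.
workflow = session.as_workflow()
replayed = workflow("  otherus name ")
```

A real system would also need to record branching and joins between results, which a plain list of steps cannot express.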
38 Other aspects of user interface
- The drag-and-drop metaphor needs further research into the best ways to support
- resource discovery
- resource matching
- data management (e.g. temporary storage of intermediate results)
39 Complex interactions
- BGI not well-suited to fine-grained interaction
- Stand-alone applications difficult to wrap
- may need, e.g., screen scraping
- We're looking at
- Less portable by-pass mechanisms, e.g.
- New BGI protocol
- Existing techniques (in extremis) e.g. VNC
- Plug-ins for the BDW client
- External tools
- (which will always be needed)
40 Role of metadata
- Metadata is needed to enable discovery of resources and to indicate how they are to be used
- Properties to help locate appropriate resources
- Check interoperability, suggest transformations
- Provenance of data sets
- Log of work-flows executed
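To make the first three uses of metadata more concrete, the Python sketch below describes each resource by the data format it consumes and produces. It then uses that description to locate resources, to check whether two resources can be connected, and to suggest a transformation when their formats differ. The metadata fields, format names, resource names and the `csv_to_xml` transformation are all hypothetical. The actual BiodiversityWorld metadata is not shown in these slides.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class ResourceMetadata:
    """Hypothetical minimal metadata record for one resource."""

    name: str
    consumes: Optional[str]  # input format, or None for a pure data source
    produces: str            # output format


# Hypothetical table of known conversions: (from, to) -> transformation name.
TRANSFORMS: Dict[Tuple[str, str], str] = {
    ("csv-points", "xml-points"): "csv_to_xml",
}


def find_producers(registry: List[ResourceMetadata], fmt: str) -> List[str]:
    """Discovery: names of the resources that produce the requested format."""
    return [meta.name for meta in registry if meta.produces == fmt]


def check_link(
    upstream: ResourceMetadata,
    downstream: ResourceMetadata,
    transforms: Dict[Tuple[str, str], str] = TRANSFORMS,
) -> Tuple[bool, Optional[str]]:
    """Return (can_connect, transformation to insert, if one is needed)."""
    if downstream.consumes is None:
        return (False, None)  # a pure source accepts no input
    if upstream.produces == downstream.consumes:
        return (True, None)   # formats already match
    suggestion = transforms.get((upstream.produces, downstream.consumes))
    return (suggestion is not None, suggestion)


registry = [
    ResourceMetadata("distribution_source", None, "csv-points"),
    ResourceMetadata("envelope_model", "xml-points", "envelope"),
    ResourceMetadata("map_projector", "envelope", "map-image"),
]

# Formats differ here, so the check suggests a transformation to insert.
needs_transform = check_link(registry[0], registry[1])
```

Provenance and work-flow logs would need further fields (for example, who ran what and when), which this sketch leaves out.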
41 Architectural issues in BDW
- Globus 3 provides Grid Services, but still evolving (WSRF in Globus 4)
- Trade-off: abstraction layer (BGI) including invocation mechanism
- Insulates from change
- Wraps resources to remove needless heterogeneity
- Wraps the wrapped resources (!) to insulate from infrastructure change
- (3 implementations now: Java RMI-, Globus OGSA- and Web Services-based)
- Performance penalty
- Assume computationally intensive applications lie in a single BDW resource
- Hinders interoperation with other Grid/Web services
42 A dream
- Desktop environment in which scientists drag and drop data sources, analysis and modelling tools, and visualisation interfaces into desired sequence of operations which can be run automatically
- BDWorld just about at this stage
- With additional features (some described earlier), the environment could be made richer, more productive, and support research groups.
- Essentially a component-based visual programming environment
- Not just for biodiversity!
43 Acknowledgements
- UK DTI, EPSRC, BBSRC and EU
- Collaborators at
- Cardiff University
- Southampton University
- The University of Reading
- The Natural History Museum (London)
- Organisations that have co-operated with these research projects, especially
- Species 2000
- ILDIS
- FishBase
- Hadley Centre for Climate Prediction and Research