Helping Biodiversity Researchers to do their Work

1
Helping Biodiversity Researchers to do their Work
  • Collaborative e-Science and Virtual Organisations
  • Richard White
  • Cardiff University
  • r.j.white_at_cs.cardiff.ac.uk

2
Biodiversity research
  • Biologists are working to understand the
    adaptation of organisms to their environmental
    niche,
  • and to predict their interactions with their
    environment,
  • eventually by combining knowledge at all the
    levels of biological organisation:
  • genome
  • transcription
  • proteome
  • metabolic pathways
  • cell
  • tissue
  • organ
  • individual whole organism
  • population
  • species
  • evolutionary pathways

3
Examples of biodiversity resources
  • Scientists working with biodiversity information
    employ a wide variety of resources which may be
    available on various local and remote computer
    platforms.
  • Data sources:
  • Names: Species 2000 & ITIS Catalogue of Life
  • Data: GBIF, sequence databases
  • Geography: gazetteers
  • Collections and distributions: BioCASE, MaNIS
  • Analysis tools:
  • Statistical and multivariate analysis
  • Modelling
  • Presentation and visualisation

4
Use of resources together
  • Scientists frequently need to use several of
    these resources in sequence to carry out their
    research.
  • When working with biodiversity data, much effort
    is currently expended in:
  • initially acquiring resources
  • installing and sometimes adapting them to run on
    the user's own machine
  • converting and transporting data sets between
    stages of the analysis process

5
Problem-Solving Environments
  • A problem-solving environment (PSE) is a
    software workbench that helps scientists build
    bigger analyses and models more easily, and thus
    makes it easier to answer Big Questions,
    especially those with the complexity of
    biodiversity informatics
  • A PSE allows the user to:
  • select appropriate local and remote resources
  • arrange them into a workflow
  • execute the workflow
  • automatically manage access to the chosen
    resources
  • save a workflow for modification and re-use
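
The capabilities listed above (select resources, arrange them into a workflow, execute it, save it for re-use) can be illustrated with a minimal Python sketch. This is not how BDWorld or Triana is implemented; the `Workflow` class, the `REGISTRY` dictionary and the three toy steps are hypothetical stand-ins for real local or remote resources.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Workflow:
    """An ordered chain of named steps; each step consumes the previous output."""
    name: str
    steps: List[str] = field(default_factory=list)

    def add(self, step_name: str) -> "Workflow":
        # Arrange: append a resource to the chain by name.
        self.steps.append(step_name)
        return self

    def run(self, registry: Dict[str, Callable[[Any], Any]], data: Any) -> Any:
        # Execute: look each step up in the registry and pipe the data through.
        for step_name in self.steps:
            data = registry[step_name](data)
        return data

    def save(self) -> str:
        # Save: only the step names are stored, so the workflow can be
        # edited and re-run later against the same registry.
        return json.dumps({"name": self.name, "steps": self.steps})

    @classmethod
    def load(cls, text: str) -> "Workflow":
        raw = json.loads(text)
        return cls(name=raw["name"], steps=list(raw["steps"]))


# Hypothetical resources (assumption): plain functions standing in for services.
REGISTRY: Dict[str, Callable[[Any], Any]] = {
    "strip": lambda s: s.strip(),
    "capitalise_genus": lambda s: s[:1].upper() + s[1:].lower(),
    "split_binomial": lambda s: s.split(),
}

wf = Workflow("clean-name").add("strip").add("capitalise_genus").add("split_binomial")
result = wf.run(REGISTRY, "  puma CONCOLOR ")
restored = Workflow.load(wf.save())
```

Storing step names rather than the functions themselves is one possible design choice: it keeps a saved workflow small and lets the environment decide at run time where each resource actually lives.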

6
The Biodiversity World (BDWorld) project
  • A 3-year e-Science project funded by BBSRC (UK)
  • To design, build and demonstrate a working
    proof-of-concept PSE with appropriate data and
    analysis resources to support biodiversity research
  • 3 example applications showing how scientists can
    use it to assist biodiversity research
  • Our goals are to develop and enhance this
    approach to collaborative computing and to
    encourage its wide adoption, through research in
    the areas described later

7
Example: Climate-space modelling
  • Modelling and predicting changes in distribution
    in response to climate changes such as those
    brought about by global warming
  • Steps involved:
  • Get current distribution of a species (e.g.
    specimen records)
  • Get current or recent climate data for those
    localities
  • Calculate a model for the climate space the
    species can occupy
  • Predict the distribution the species would have
    in any specified climate (may be different to the
    climate used above)
  • Project the predicted distribution back on to a
    world map
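
The steps above can be sketched in Python under strong simplifying assumptions. The slides do not say which modelling method BDWorld uses; the sketch below swaps in a simple rectangular "climate envelope" (the minimum and maximum of each variable at occupied localities). The grid cells, variable names and climate values are invented for illustration only.

```python
from typing import Dict, List, Set, Tuple

Cell = Tuple[int, int]                      # a grid cell on a map
Climate = Dict[str, float]                  # climate variables for one cell
Envelope = Dict[str, Tuple[float, float]]   # (min, max) per variable


def fit_envelope(samples: List[Climate]) -> Envelope:
    """Model step: record the range of each variable where the species occurs."""
    variables = samples[0].keys()
    return {v: (min(s[v] for s in samples), max(s[v] for s in samples))
            for v in variables}


def is_suitable(envelope: Envelope, climate: Climate) -> bool:
    """A cell is suitable if every variable falls inside the fitted range."""
    return all(lo <= climate[v] <= hi for v, (lo, hi) in envelope.items())


def predict(envelope: Envelope, surface: Dict[Cell, Climate]) -> Set[Cell]:
    """Prediction step: apply the model to any climate surface."""
    return {cell for cell, climate in surface.items()
            if is_suitable(envelope, climate)}


# Made-up data (assumption): two occupied cells and two climate surfaces.
localities: List[Cell] = [(0, 0), (0, 1)]
current_surface: Dict[Cell, Climate] = {
    (0, 0): {"temp_c": 10.0, "rain_mm": 700.0},
    (0, 1): {"temp_c": 14.0, "rain_mm": 900.0},
    (1, 0): {"temp_c": 20.0, "rain_mm": 300.0},
    (1, 1): {"temp_c": 16.0, "rain_mm": 800.0},
}
future_surface: Dict[Cell, Climate] = {
    (0, 0): {"temp_c": 13.0, "rain_mm": 750.0},
    (0, 1): {"temp_c": 17.0, "rain_mm": 850.0},
    (1, 0): {"temp_c": 22.0, "rain_mm": 250.0},
    (1, 1): {"temp_c": 12.0, "rain_mm": 800.0},
}

envelope = fit_envelope([current_surface[cell] for cell in localities])
predicted_cells = predict(envelope, future_surface)
```

The set of predicted cells is what a final projection step would draw on a base map; real climate-space models are considerably more sophisticated than this range check.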

8
Example work-flow (Climate-space Modelling)
  • Species 2000: submit scientific name; retrieve
    accepted name and synonyms for species
  • Localities: retrieve distribution data for
    species of interest
  • Climate: present or recent climate surfaces
  • ClimateSpace Model: model of climatic conditions
    where species is currently found
  • Climate: possibly different climate surfaces
    (e.g. predicted climate)
  • Prediction: prediction of suitable regions for
    species of interest
  • Base Maps: world or regional maps
  • Projection: projection of predicted distribution
    on to base map

9
BDWorld / Triana in operation 1: Workflow creation
(design, editing)

10-16
Triana screen-shots

17
BDWorld / Triana in operation 2: Workflow
execution (enactment, run-time)

18-22
Triana screen-shots

23
Design of architecture
  • to facilitate
  • resource discovery
  • semantic mediation
  • workflow creation and enactment
  • management of data generated by workflows, etc.

24
Workflows
  • Resources are called into use in an appropriate
    sequence from an interactive workflow.
  • The facility for scientists to be able to create
    their own workflows, without the need for regular
    assistance from computer scientists, is an
    essential part of the BDWorld system. Accessible
    tools for resource discovery and for workflow
    design, enactment and re-use are therefore
    required.

25
Difficulties with resources
  • Finding the resources
  • Knowing how to use these heterogeneous resources
  • Originally constructed for various reasons, often
    with little attention to standards or
    interoperability
  • Have to pass data sets from one to another
  • Some involve user interaction

26
User interface
  • The drag-and-drop metaphor needs further research
    into the best ways to support:
  • resource discovery
  • resource matching
  • data management (e.g. temporary storage of
    intermediate results)
  • Perhaps using a plug-in architecture, so that
    third parties can extend it as required

27
Extensibility
  • to allow scientific and technical users to add
    new resources to the environment, without the
    involvement of the system programmers wherever
    possible
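
One way to picture this kind of extensibility is a registry that new resources add themselves to, so that the core environment never needs editing. The sketch below is only an illustration in Python; the `resource` decorator, `available` and `call` are hypothetical names, not part of BDWorld or Triana.

```python
from typing import Callable, Dict, List

# The registry maps a resource name to the callable that implements it.
_RESOURCES: Dict[str, Callable[..., object]] = {}


def resource(name: str):
    """Decorator (hypothetical API) that registers a callable under a name."""
    def decorator(func: Callable[..., object]) -> Callable[..., object]:
        if name in _RESOURCES:
            raise ValueError(f"resource {name!r} is already registered")
        _RESOURCES[name] = func
        return func
    return decorator


def available() -> List[str]:
    """List registered resources, e.g. to populate a tool palette."""
    return sorted(_RESOURCES)


def call(name: str, *args, **kwargs):
    """Invoke a resource by name without importing it directly."""
    return _RESOURCES[name](*args, **kwargs)


# A third party adds tools simply by decorating functions.
@resource("mean")
def mean(values):
    return sum(values) / len(values)


@resource("value_range")
def value_range(values):
    return max(values) - min(values)
```

With this arrangement, `available()` lists both tools and `call("mean", [1, 2, 3])` runs one of them by name; rejecting duplicate names is a deliberate choice so that one contributor cannot silently replace another's resource.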

28
Flexibility
  • software libraries that can be extended to
    provide interfaces to new resources, but which
    can also be configured for many common
    requirements without the need for programmed
    extensions

29
Intelligent agents
  • mediation or facilitation to manage the semantic
    heterogeneity encountered in all aspects of the
    PSE, including the names of organisms and their
    components, resolution of geographical data, etc.
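
As a small illustration of mediating between organism names, the Python sketch below resolves a user-supplied name to an accepted name through a synonym table. It is an assumption-laden toy: a real mediator would query a service such as Species 2000 rather than a hard-coded dictionary, and the single entry here is purely illustrative.

```python
from typing import Dict, Optional, Set

# Illustrative data (assumption): in practice this would come from a names service.
ACCEPTED: Set[str] = {"Puma concolor"}
SYNONYMS: Dict[str, str] = {"felis concolor": "Puma concolor"}


def normalise(name: str) -> str:
    """Collapse whitespace and ignore case so trivial variants compare equal."""
    return " ".join(name.split()).lower()


def resolve(name: str) -> Optional[str]:
    """Return the accepted name for an accepted name or known synonym, else None."""
    key = normalise(name)
    for accepted in ACCEPTED:
        if normalise(accepted) == key:
            return accepted
    return SYNONYMS.get(key)
```

Returning `None` for unknown names, rather than guessing, leaves the decision about unmatched names to the scientist or to a later, smarter mediation step.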

30
Virtual Organisations
  • The facilities described so far can be used by a
    single scientist working on their research.
  • By adding additional functionality to the PSE, it
    can be used to support collaboration between
    scientists in virtual organisations.

31
Security and authentication
  • the PSE and all the relevant resources and
    results will be:
  • accessible with a single log-on
  • shareable with other members of a virtual
    organisation

32
Management
  • Mechanisms for managing the experiments and
    analyses which the workflows represent, including:
  • maintenance of logs and provenance information,
  • distributed storage of intermediate data sets,
    etc.,
  • to reduce the burden on the scientists and
    increase their productivity.
  • This not only helps individual scientists look
    after their own data and results,
  • but also imposes order and helps scientists
    collaborate in the same or linked analyses
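
A minimal sketch of what maintaining logs and provenance information might look like in Python: each workflow step is wrapped so that a record of its input and output is appended to a log. The record fields and the `run_logged` helper are hypothetical; the slides do not describe BDWorld's actual provenance format.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class ProvenanceRecord:
    step: str            # name of the workflow step that ran
    input_digest: str    # short fingerprint of the data going in
    output_digest: str   # short fingerprint of the data coming out


def digest(data: Any) -> str:
    """Fingerprint JSON-serialisable data so identical data sets match."""
    encoded = json.dumps(data, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()[:12]


def run_logged(step: str, func: Callable[[Any], Any], data: Any,
               log: List[ProvenanceRecord]) -> Any:
    """Run one step and append a provenance record to the log."""
    result = func(data)
    log.append(ProvenanceRecord(step, digest(data), digest(result)))
    return result


# Made-up specimen records (assumption), one with a missing latitude.
log: List[ProvenanceRecord] = []
records = [
    {"species": "Puma concolor", "lat": 10.5},
    {"species": "Puma concolor", "lat": None},
]
cleaned = run_logged(
    "drop_missing_lat",
    lambda rows: [r for r in rows if r["lat"] is not None],
    records,
    log,
)
count = run_logged("count", len, cleaned, log)
```

Because the output digest of one step equals the input digest of the next, the log can be followed backwards from a result to the data set it was derived from.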

33
Role of metadata
  • Metadata is needed to enable discovery of
    resources and to indicate how they are to be
    used.
  • Properties to help locate appropriate resources
  • Check interoperability, suggest transformations
  • Provenance of data sets
  • Log of work-flows executed
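
To make the interoperability point concrete, the Python sketch below uses metadata about what each resource consumes and produces to check a proposed chain and, where formats differ, to suggest a transformation. The format labels, resource names and the `CONVERTERS` table are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ResourceMeta:
    name: str
    consumes: str   # label for the data format the resource accepts
    produces: str   # label for the data format the resource emits


# Hypothetical converters known to the environment: (from, to) -> converter name.
CONVERTERS: Dict[Tuple[str, str], str] = {
    ("csv_points", "grid"): "rasterise_points",
}


def check_chain(chain: List[ResourceMeta]) -> List[str]:
    """Report on each link: compatible, fixable by a converter, or incompatible."""
    report = []
    for upstream, downstream in zip(chain, chain[1:]):
        link = f"{upstream.name} -> {downstream.name}"
        if upstream.produces == downstream.consumes:
            report.append(f"{link}: ok")
            continue
        converter = CONVERTERS.get((upstream.produces, downstream.consumes))
        if converter is not None:
            report.append(f"{link}: insert {converter}")
        else:
            report.append(f"{link}: incompatible")
    return report


chain = [
    ResourceMeta("name_lookup", consumes="text", produces="name_list"),
    ResourceMeta("locality_search", consumes="name_list", produces="csv_points"),
    ResourceMeta("climate_model", consumes="grid", produces="model"),
    ResourceMeta("map_projection", consumes="image", produces="image"),
]
report = check_chain(chain)
```

A check like this could run while the user is still arranging the workflow, so that mismatches are flagged before anything is executed.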

34
Knowledge-base
  • Metadata, thesaurus and knowledge management to
    support the facilities described above, in which
    provenance data and biodiversity-specific
    knowledge are recorded, maintained, and used by
    other components of the PSE.
  • In addition, it can translate between the
    different terminologies used by scientists who
    are trying to collaborate
  • For example, by providing concept-based
    cross-mapping between alternative taxonomies, as
    described by Andrew Jones

35
A dream
  • A desktop environment in which scientists can
    drag and drop data sources, analysis and
    modelling tools, and visualisation interfaces
    into a desired sequence of operations which can
    be run automatically.
  • BDWorld is just about at this stage at present.
  • With the additional features described above,
    such an environment could be made richer, easier
    to use and more productive, and could support
    research groups.
  • Something like a component-based visual
    programming environment.
  • Not just for biodiversity!

36
Summary
  • Problem-solving Environment
  • Architecture
  • User interface
  • Extensibility
  • Virtual organisations
  • Managing use of workflows and results
  • Knowledge-base

37
Acknowledgements
  • BBSRC (UK)
  • Collaborators in the BDWorld project:
    Universities of Reading and Southampton, Natural
    History Museum (London)
  • Organisations that have co-operated with these
    research projects, especially:
  • Species 2000
  • ILDIS
  • FishBase
  • Hadley Centre for Climate Prediction and Research

38
Merry Christmas
  • And a Convergent New Year!