Title: GLOBAL BIODIVERSITY INFORMATION FACILITY
Slide 1: GLOBAL BIODIVERSITY INFORMATION FACILITY
WWW.GBIF.ORG
GBIF Network as a Model for GISIN
Hannu Saarenmaa, AAAS Annual Meeting, Washington, DC, February 20, 2005
Slide 2: Outline
- Sharing and using primary biodiversity data through GBIF
- Overview of the GBIF information system
- Sharing species information from Species Banks
- GISIN architecture and GISIN as a Species Bank
- Conclusion
Slide 3: 1. Sharing and using primary biodiversity data through GBIF
Slide 4: GBIF's objective is
- to establish a distributed information infrastructure that serves scientific biodiversity data
- with initial focus on primary data at specimen and observation levels, and on names
- expanding to species-level information,
- with links to molecular, genetic and ecosystem levels
- to function as a global integrator
Slide 5: Pyramid of information
- Policy and decisions can benefit from
- Knowledge and
- Information, which depend on
- Primary data
[Figure labels: CHM; Refinement, analysis, synthesis; Other information networks; GISIN; GBIF area of responsibility]
Slide 6: What primary data exists?
- 1-3 billion physical specimens in museums
- Label data to be digitised
- 300-400 million digital data records off-line
- Museums, observation networks, natural resource surveys, etc.
- 46 million records are online today through GBIF
- Using standard formats
Slide 7: What is primary data?
- Point occurrence data with the basic attributes
- Identification
- Location
- Time
[Figure labels: Secondary information; Primary data]
Slide by A. Townsend Peterson
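The three basic attributes listed above (identification, location, time) can be pictured as one small record type. The following is a minimal sketch only: the class name `OccurrenceRecord`, its field names, and the sample values are assumptions made for illustration and are not taken from any GBIF schema.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class OccurrenceRecord:
    """One point occurrence: what was identified, where, and when.

    Hypothetical structure for illustration; not a GBIF data format.
    """
    scientific_name: str  # identification
    latitude: float       # location, decimal degrees
    longitude: float      # location, decimal degrees
    observed_on: date     # time

    def has_valid_coordinates(self) -> bool:
        """Return True when both coordinates fall within their valid ranges."""
        return (-90.0 <= self.latitude <= 90.0
                and -180.0 <= self.longitude <= 180.0)


# Illustrative values only, not a real specimen or observation record.
example = OccurrenceRecord("Hydrilla verticillata", 10.5, 76.2, date(2004, 6, 15))
print(example.has_valid_coordinates())
```

A record of this shape is what distribution models consume: a set of named points in space and time.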
Slide 8: Predicting geographic distributions with primary data makes possible ...
- Projecting species invasions
- Designing reintroduction programs
- Understanding the effects of global climate change and other types of change
- Understanding rare and endangered species distributions
- Designing biodiversity conservation plans
- Many models such as Bioclim, GARP
Slide by A. Townsend Peterson
Slide 9: Hydrilla: Primary data of native range
Slide by A. Townsend Peterson
Slide 10: Hydrilla: Native modeled distribution
Slide by A. Townsend Peterson
Slide 11: Hydrilla: North America
Slide by A. Townsend Peterson
Slide 12: Hydrilla: North American infestations
Slide by A. Townsend Peterson
Slide 13: 2. Overview of the GBIF information system
Slide 14: GBIF component architecture
[Diagram. Recoverable component labels: User; Data Portal; Index; Cache; Metadata; Request Marshaller; Query Engine; Accounting; Registry (UDDI); Institutions, Providers, Services; Name provider; Data provider; Provider Services; Resource; Metadata. Labelled flows: Metadata and name query; Provider query; Available providers; Metadata response; Publish availability; Metadata and statistics; Full data query; Full data response; Synonyms. Protocols shown: DiGIR, SOAP, HTTP, other.]
Slide 16:
- Turn-key packages available implementing DiGIR/DarwinCore and BioCASe/ABCD
- Available for Linux and Windows
- Supported by helpdesk_at_gbif.org
Slide 17: GBIF (prototype) Data Portal
- Gateway to data of the providers
- Name service is a part of the data portal
- Search and browse data by name, country, etc.
- Drill in and download data, display simple maps
- Multilingual
- Maintains a cache of key data in case provider goes off-line
- Opened 6 February 2004
- Based on Java and MySQL
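The cache behaviour mentioned above (keep key data so that a search still returns something when a provider is off-line) can be sketched as follows. This is an illustration under stated assumptions, not the portal's actual Java/MySQL implementation: `CachingPortal`, `FlakyProvider`, and the record layout are hypothetical names invented for this sketch.

```python
from typing import Callable, Dict, List


class FlakyProvider:
    """Hypothetical stand-in for a remote data provider that can go off-line."""

    def __init__(self) -> None:
        self.online = True

    def __call__(self, name: str) -> List[dict]:
        if not self.online:
            raise ConnectionError("provider off-line")
        # Made-up record, for illustration only.
        return [{"name": name, "country": "FI"}]


class CachingPortal:
    """Hypothetical portal that remembers the last successful answer per name."""

    def __init__(self, fetch: Callable[[str], List[dict]]) -> None:
        self._fetch = fetch
        self._cache: Dict[str, List[dict]] = {}

    def search(self, name: str) -> List[dict]:
        try:
            records = self._fetch(name)
        except ConnectionError:
            # Provider unreachable: serve the cached copy, or nothing at all.
            return self._cache.get(name, [])
        self._cache[name] = records
        return records


provider = FlakyProvider()
portal = CachingPortal(provider)
first = portal.search("Hydrilla")   # fetched live and cached
provider.online = False
second = portal.search("Hydrilla")  # served from the cache
print(first == second)
```

The design choice here is that the cache is only a fallback: while the provider is reachable, the portal always prefers the provider's current answer.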
Slide 23: Protocols
- Simple web services
- XML messaging between computer applications
- This is open data sharing, not data exchange with trade partner agreements
- Enables search and retrieval of structured data
- Enables a single point of access (portal/search) to distributed information resources
- Created by the TDWG/CODATA subgroup on biological collection data
- Unified protocol in 2005: merger of DiGIR and BioCASe into TAPIR (TDWG Access Protocol for Information Retrieval)
Slide 24: Darwin Core and ABCD data formats
- Two XML schemata for data exchange are available to choose from
- Darwin Core is a minimal set
- 48 elements in flat structure
- Can be extended, for instance curatorial, bacteriological, observational...
- ABCD (Access to Biological Collection Data) is a superset
- 600 elements in hierarchical structure
- Can describe entire collection
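The difference between a flat and a hierarchical schema can be made concrete with a small sketch. The element names below (`record`, `Unit`, `Identification`, `Gathering`, `ScientificName`, `Country`) are chosen for illustration and should not be read as the actual Darwin Core or ABCD element names; the sample values are made up.

```python
import xml.etree.ElementTree as ET


def flat_record(fields: dict) -> ET.Element:
    """Flat style: every element is a direct child of the record."""
    record = ET.Element("record")
    for name, value in fields.items():
        ET.SubElement(record, name).text = str(value)
    return record


def nested_unit(groups: dict) -> ET.Element:
    """Hierarchical style: elements are grouped under intermediate containers."""
    unit = ET.Element("Unit")
    for group_name, fields in groups.items():
        group = ET.SubElement(unit, group_name)
        for name, value in fields.items():
            ET.SubElement(group, name).text = str(value)
    return unit


# Illustrative values only.
flat = flat_record({"ScientificName": "Hydrilla verticillata", "Country": "IN"})
nested = nested_unit({
    "Identification": {"ScientificName": "Hydrilla verticillata"},
    "Gathering": {"Country": "IN"},
})
print(ET.tostring(flat, encoding="unicode"))
print(ET.tostring(nested, encoding="unicode"))
```

In the flat form a consumer finds `Country` directly under the record; in the hierarchical form it has to know the path `Gathering/Country`, which is the price paid for being able to describe richer structures.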
Slide 25: Image data standards
- ABCD can handle links to images now
- Metadata from Dublin Core
- Annotation standards for what is in an image are needed
- JPEG2000 in future
Slide 26: Identification data standards
- DELTA
- Standard tied to aging software
- LUCID
- Data format less tied to new, evolving software
- Many electronic key products available
- SDD Structured Descriptive Data
- Character description
- New standard without software yet
Slide 27: 3. Sharing species information through Species Banks
- GBIF is starting to climb the pyramid of information
Slide 28: Encyclopedia of Life
- Imagine an electronic page for each species of organism on Earth, ...
- Linking dynamically to data, information, and knowledge sources, such as
- ARKive
- EcoPort
- GBIF data and name providers
- GenBank
- MORPHOBANK
- Tree of Life
Slide 34: Species banks
- Species home pages are mushrooming, but no standard exists for species information pages or how they can be registered, accessed and virtualised
- GBIF believes that there is not going to be a single Species Bank but a distributed cyber-infrastructure for species information and knowledge
- Integrate information for various uses like identification, invasives, pest control, taxonomic review, ...
- GBIF approach (again) is to integrate
- Standardise sharing of species home pages and their fragments and enable interoperability of their providers
- Symposium (STAG) in March 2005
Slide 35: Technical view on Species Bank
- Species Bank is an idea of federated databases serving elementary chunks of knowledge.
- Example from diagnostic knowledge: Descriplets have the form "Taxon t has value v for feature f."
- Millions of such statements, not only about diagnostics, may form a global Species Bank.
- They originate from thousands of sources
- RDF/XML, semantic web, semantic grid, CYC
- Multimedia, free text must also be supported
- Link to primary data and modelling
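The statement form "Taxon t has value v for feature f" maps naturally onto a small tuple type. The sketch below is one possible reading under stated assumptions: the `Descriplet` type, the added `source` field, the helper functions, and the placeholder taxa and sources are all hypothetical and are not part of any Species Bank specification.

```python
from typing import Iterable, List, NamedTuple, Optional


class Descriplet(NamedTuple):
    """One statement: taxon t has value v for feature f, as asserted by a source."""
    taxon: str
    feature: str
    value: str
    source: str  # assumption: each statement remembers where it came from


def select(statements: Iterable[Descriplet],
           taxon: Optional[str] = None,
           feature: Optional[str] = None) -> List[Descriplet]:
    """Return the statements matching the given taxon and/or feature."""
    return [s for s in statements
            if (taxon is None or s.taxon == taxon)
            and (feature is None or s.feature == feature)]


def candidate_taxa(statements: Iterable[Descriplet],
                   feature: str, value: str) -> List[str]:
    """Diagnostic use: which taxa are stated to show this feature value?"""
    return sorted({s.taxon for s in statements
                   if s.feature == feature and s.value == value})


# Placeholder statements from two placeholder sources; not real biological data.
bank = [
    Descriplet("Taxon A", "leaf arrangement", "whorled", "source-1"),
    Descriplet("Taxon B", "leaf arrangement", "opposite", "source-1"),
    Descriplet("Taxon A", "habitat", "freshwater", "source-2"),
    Descriplet("Taxon C", "leaf arrangement", "whorled", "source-2"),
]
print(candidate_taxa(bank, "leaf arrangement", "whorled"))
```

Because every statement is independent, statements from many sources can be pooled into one list and still be queried uniformly, which is the federated idea behind the slide.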
Slide 36: Slide courtesy of Kevin Thiele
Slide 37: 5. GISIN needs and a possible GISIN architecture
- A specialised Species Bank
Slide 38: Some use cases for GISIN
- Identify IAS at point of entry
- Decide on control measures
- Report IAS sighting, trigger alerts
- Model IAS spread and impact
- Find expertise and literature on IAS
Slide 39: GISIN technology needs
- Standards for schemata for key data types, in particular species profile and its fragments
- Tools for providers: TAPIR protocol
- Federation via a registry: UDDI
- Integration: portals
Slide 40: Database / provider types
- Six were identified by the Database Content Working Group
- Species profile or fact sheet / diagnostic
- Specimens (Darwin Core)
- Observations (Darwin Core extension)
- Expertise
- Bibliographic
- Projects / research
Slide 41: Species profile standard
- Representation of IAS fact sheets and species home pages
- Well-structured
- Consists of basic species info, plus community-specific extensions
- Distribution, identification, trophic relations, naming, expertise, IAS status, ...
- Support for distributed authoring
- Work is underway
Slide 42: IAS observation standard
- Elements important to invasive species science
- Native and non-native status
- Pathways of spread
- Host and parasitic organisms observed
- Impact, invasiveness, etc.
- Control techniques used
- Include other optional fields as guidance for new database developers
- This could be a Darwin Core extension for IAS
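The idea of an observation record that extends a core record with optional invasive-species fields can be sketched with class inheritance. This is only an illustration of the extension principle: the class names, field names, and sample values are assumptions, and the sketch does not reproduce Darwin Core or any agreed IAS standard.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Observation:
    """Core observation fields (hypothetical, kept minimal)."""
    scientific_name: str
    latitude: float
    longitude: float
    observed_on: str  # ISO 8601 date string


@dataclass
class IASObservation(Observation):
    """Core fields plus optional IAS elements; every extension field may be left out."""
    native_status: Optional[str] = None        # e.g. "native" or "non-native"
    pathway_of_spread: Optional[str] = None
    host_organisms: List[str] = field(default_factory=list)
    impact: Optional[str] = None
    control_techniques: List[str] = field(default_factory=list)

    def core(self) -> Observation:
        """Project the record back onto the core fields only."""
        return Observation(self.scientific_name, self.latitude,
                           self.longitude, self.observed_on)


# Illustrative values only, not a real sighting.
sighting = IASObservation("Hydrilla verticillata", 28.5, -81.4, "2004-06-15",
                          native_status="non-native",
                          control_techniques=["mechanical removal"])
print(sighting.core())
```

Because the extension fields all have defaults, a consumer that understands only the core record can still use the data, while IAS-aware consumers get the extra elements.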
Slide 43: GISIN registry
- In a service-oriented distributed architecture, dynamic discovery and location independence of the services are fundamental
- Need to have a registry solution as part of the architecture
- Alternatives
- Dedicated GISIN UDDI with replication to/from GBIF
- Appear in GBIF UDDI as thematic network
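Dynamic discovery through a registry can be illustrated with an in-memory sketch: providers publish what they offer, and a portal asks the registry where to find a given data type instead of hard-coding addresses. This does not use or imitate the UDDI API; `Registry`, `Service`, the data-type labels, and the example.org endpoints are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Service(NamedTuple):
    """One advertised service (hypothetical structure)."""
    provider: str
    data_type: str   # e.g. "species profile", "observations"
    endpoint: str    # where the service can currently be reached


class Registry:
    """Minimal in-memory registry: publish services, discover them by data type."""

    def __init__(self) -> None:
        self._by_type: Dict[str, List[Service]] = defaultdict(list)

    def publish(self, service: Service) -> None:
        self._by_type[service.data_type].append(service)

    def discover(self, data_type: str) -> List[Service]:
        # Return a copy so that callers cannot alter the registry's own list.
        return list(self._by_type.get(data_type, []))


registry = Registry()
registry.publish(Service("Provider A", "observations", "http://a.example.org/obs"))
registry.publish(Service("Provider B", "species profile", "http://b.example.org/profiles"))
registry.publish(Service("Provider C", "observations", "http://c.example.org/obs"))

for service in registry.discover("observations"):
    print(service.provider, service.endpoint)
```

Location independence follows from the indirection: if a provider moves, it republishes its endpoint and portals that discover services at query time need no change.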
Slide 44: A possible GISIN component architecture
[Diagram. Recoverable component labels: User; Knowledge Portal(s); Index; Cache of data and descriplets; Metadata; Accounting; Registry (UDDI); Institutions, Providers, Services; Species knowledge provider; Data/name provider; Provider Services; Resource; Metadata; Multimedia; Texts. Labelled flow: Publish availability.]
Slide 45: GISIN portal(s)
- Portals integrate data, information, and knowledge
- Integration of GBIF's primary data has been straightforward
- Finding working models for how to integrate information and knowledge is the challenge for Species Banks and GISIN
Slide 46: 6. Conclusion
Slide 47: What makes GBIF work
- Standards for data and protocols (and their interaction via web services)
- Control and ownership of data remains with providers
- Registry for advertisement of data
- Integration at portals
- GBIF is a multi-purpose, open-ended cyber-infrastructure that enables taxonomists and others to serve society in new ways
Slide 48: What can make GISIN work
- Build on what has made GBIF work, but do recognise that...
- GISIN has more targeted use cases than GBIF
- How to share and integrate species knowledge is still not well known. Research and prototyping needed.
- Think of GISIN largely as a Species Bank