Title: GLOBAL BIODIVERSITY
1. GLOBAL BIODIVERSITY INFORMATION FACILITY
Biodiversity Information in the Netherlands
(Biodiversiteitsinformatie in Nederland),
Wednesday 14 January 2004
Larry Speers, Global Biodiversity Information
Facility
LSPEERS_at_GBIF.ORG
WWW.GBIF.ORG
2. "...there will be winners and there will be
losers. The next century will be the Age of
Biology, just as this one has been the age of
physics and astronomy. Specifically, those
countries who best know how to correlate,
analyze, and communicate biological information
will be in the leading position to achieve
economic and scientific advances."
Sir Robert May, Chief Scientist, U.K., July 1998
3. What is GBIF?
- A distributed megascience facility aimed at
- Making the world's biodiversity data freely and
universally available via the Internet
- Sharing primary scientific biodiversity data to
benefit society, science and a sustainable future
4. MEGASCIENCE FORUM of the OECD (became Global
Science Forum after the GBIF recommendation was
adopted)
- Examples of Working Groups
- Neutron Sources
- Nuclear Physics
- Radio Astronomy
- Biological Informatics (1996-1999)
- Subgroup Biodiversity Informatics
- Subgroup Neuroinformatics
- Recommended that the Megascience Forum endorse
development of the Global Biodiversity
Information Facility
5. When was GBIF started?
- GBIF came into existence on 1 March 2001, when
the first 10 countries signed the Memorandum of
Understanding (MoU) and pledged a total of US$2M
- The MoU resulted from the recommendations of an
international working group / steering committee
- The group met several times between June 1996 and
December 2000, when the MoU was opened for
signature
6. GBIF Mission
- ...making the world's biodiversity data freely
and universally available via the Internet.
7. GBIF Voting Participants: 24
- Mexico
- Netherlands
- New Zealand
- Nicaragua
- Portugal
- Peru
- Slovenia
- South Africa
- Spain
- Sweden
- UK
- USA
- Australia
- Belgium
- Canada
- Costa Rica
- Denmark
- Estonia
- Finland
- France
- Germany
- Iceland
- Japan
- Republic of Korea
8. GBIF Associate Participants: 14 + 18
- ALL Species Foundation
- ASEANET
- BioNET
- BIOSIS
- CABI Bioscience
- EASIANET
- Expert Centre for Taxonomic Identification
- Inter-American Biodiversity Information Network
- Integrated Taxonomic Information System
- NatureServe
- Ocean Biogeographic Information System
- Société de Bactériologie Systématique et
Vétérinaire
- Species 2000
- Taxonomic Databases Working Group
- UNESCO Man and the Biosphere Program
- UNEP (World Conservation Monitoring Centre)
- World Federation for Culture Collections
- Wildscreen Trust
- Argentina
- Austria
- Bulgaria
- Czech Republic
- Ghana
- Madagascar
- Morocco
- Pakistan
- Poland
- Slovak Republic
- Switzerland
- Taiwan
- Tanzania
- European Commission
9. Why was GBIF established?
- Demand for Biological Information
- Biotechnology, biodiversity, climate change,
environmental problems, invasive species, human
health, sustainable development
10. Nature is so complex. We know so little.
11. Why was GBIF established?
- Demand for Biological Information
- Biotechnology, biodiversity, climate change,
environmental problems, invasive species, human
health, sustainable development
- Bioinformatics
- Computing Power
- Moore's Law
12. "With $2500 desktop PCs now delivering more raw
computing power than the first Cray,
bioinformatics is rapidly becoming the critical
technology for the 21st century biology."
- R. Robbins, Fred Hutchinson Cancer Research
Center
13. Definition
- Biodiversity informatics is the application of
information technology to biodiversity with the
emphasis on persistent data stores.
Modified from R. Robbins, Fred Hutchinson Cancer
Research Center
14. [Diagram labels: DNA; Proteins; Fundamental Dogma; Phenotypes; Populations; Species; Ecosystems; Abiotic Factors. Adapted from R. Robbins]
15. [Diagram labels: DNA; Proteins; Bioinformatics; Phenotypes; Populations; Species; Ecosystems; Abiotic Factors. Adapted from R. Robbins]
16. [Diagram labels: Bioinformatics, Persistent Primary Data Stores; GenBank, EMBL, DDBJ; Map Databases; DNA; SwissPROT, PIR; Proteins; PDB; Phenotypes; Populations; Species; Abiotic Factors; Ecosystems. Adapted from R. Robbins]
17. [Diagram labels: DNA; Proteins; Biodiversity Informatics; Phenotypes; Populations; Species; Ecosystems; Abiotic Factors. Adapted from R. Robbins]
18. [Diagram labels: Persistent Primary Data Stores; DNA; Proteins; Phenotypes; Populations; Literature, Observational Databases; Species; Abiotic Factors; Ecosystems. Adapted from R. Robbins]
19. Biodiversity Informatics as a Megascience Activity
20. Why was GBIF established?
- Demand for Biological Information
- Biotechnology, biodiversity, climate change,
environmental problems, invasive species, human
health, sustainable development
- Bioinformatics
- Computing Power
- Moore's Law
- Electronic Connectivity
- Internet
- Distributed Information Systems
22. Where is GBIF located?
- Unlike CERN, the megascience instrumentation
facility for particle physics that is located in
Switzerland, GBIF is a megascience facility that
is distributed all over the world, with its many
parts connected by the Internet
- The small, non-bureaucratic GBIF Secretariat is
hosted by the Zoological Museum of the University
of Copenhagen, Denmark
24. What does GBIF do?
- In order to promote the sharing and use of
scientific biodiversity data by everyone, it
focuses on four areas of activity:
- Data Access and Database Interoperability (DADI)
- Electronic Catalog of Names of Known Organisms
(ECAT)
- Outreach and Capacity Building (OCB)
- Digitisation of Natural History Collections
(DIGIT)
25. How does GBIF work?
- NODES Committee
- Comprises the managers of the Participant nodes
- Works with the Information and Communications
Technology (ICT) staff of the Secretariat to
develop the network of nodes
- Participant nodes share software and ideas with
each other and with data providers
- Secretariat ICT staff advise, coordinate and
provide software toolkits
26. Network Structure
27. GBIF Principles
- Equitable sharing of data
- Data providers retain control
- Protection of intellectual property rights
- Distributed network architecture
- Common standards and protocols
- Partnership with other networks
- Avoidance of duplication of effort
- Promotion of technical developments to deal with
complexity of biodiversity data
28. The following is a simple classification of the
biodiversity data for which GBIF is responsible:
- Taxonomic data, including
- Scientific names, including data on synonymy
- Vernacular names
- Taxonomic descriptions, including diagnostic keys
- Taxon occurrence information (primarily
species-level, but including data for taxa at
different ranks where appropriate)
- Specimen records (from natural history
collections)
- Observation records
- Links to other taxon-level information,
including
- Information on taxon biology and life history
- Ecological interactions
- Genetic data
- Sound and image resources
29. Characteristics of the Species Level Biodiversity
Data Domain
- Data developers are numerous, specialized and
widely distributed
- Government labs
- Universities
- Museums
- Private individuals
- Quality data are critical to environmental
decision making
- Legacy data are extremely valuable
- Data are dynamic
- Legacy data continually being updated and
enhanced
- New data continually being added
- Primary data have common core attributes
30. Primary species occurrence core data includes, but
is not limited to, the following essential details:
- Name of the taxon to which the organism has been
assigned
- Location where the specimen was collected or the
observation made
- Date on which the specimen was collected or the
observation made
- Where the specimen or record is held and how to
access more information
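The four core details listed above can be pictured as one record type. The following is only a minimal sketch, not a GBIF data standard: the class name `OccurrenceRecord`, its field names, and the sample values are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class OccurrenceRecord:
    """Hypothetical record holding the core details of one occurrence."""
    taxon_name: str              # name of the taxon the organism was assigned to
    locality: str                # where the specimen was collected or observed
    event_date: Optional[date]   # when it was collected or observed, if known
    holder: str                  # where the specimen or record is held
    access_ref: str              # how to reach more information about it

    def is_complete(self) -> bool:
        """Return True when every core detail is present."""
        return all([
            bool(self.taxon_name.strip()),
            bool(self.locality.strip()),
            self.event_date is not None,
            bool(self.holder.strip()),
            bool(self.access_ref.strip()),
        ])


# Made-up sample values, for illustration only.
record = OccurrenceRecord(
    taxon_name="Example species",
    locality="Example locality",
    event_date=date(1987, 6, 14),
    holder="Example herbarium",
    access_ref="catalogue no. 12345",
)
print(record.is_complete())
```

A check such as `is_complete` is one way a provider might flag legacy records that still lack a locality or a date before they are shared.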
31. GBIF-DIGIT Mission
To facilitate the expansion of biodiversity
knowledge by having legacy and newly acquired
primary species occurrence data digitised and
dynamically accessible.
32. What are GBIF's primary data?
- Label data on 1.5 - 3.0 billion specimens in
natural history collections
- Species level observational data sets
- Associated notes, recordings, metadata, etc.
- These data must be digitised in order to be
shared and fully utilised
GBIF-DIGIT
33. > 2 billion specimens worldwide
38. Natural History Collections Data
- Strengths
- Identification of specimens auditable
- Potential for DNA analysis
- Often long time series
- Broad taxonomic coverage
- Type specimens
- Weaknesses
- Presence-only data
- Often poorly curated
- Locality data often lack precision
- Seldom collected in a systematic way
- Often not in digital format
- Any one collection has limited taxonomic, spatial
and temporal coverage
39. Observational Data Sets
- Strengths
- Often presence-absence data
- Often collected in a systematic way
- Usually precise locality information
- Usually in digital format
- Weaknesses
- Individual identifications NOT auditable
- Generally short time series
- Limited taxonomic coverage
- Any one data set has limited taxonomic, spatial
and temporal coverage
50. TEX (University of Texas at Austin)
UADY (University of Yucatan)
ARIZ (University of Arizona)
CIDIIR (Center of Scientific Research of Durango)
51. XAL (Institute of Ecology, Xalapa)
CAS (California Academy of Sciences)
CICY (Center of Scientific Research of Yucatan)
MEXU (National University of Mexico)
52. The Virtual Herbarium of Mexico: 700,000 records
from 25 herbaria in Mexico and the United States.
53. "Taken collectively, the plant and animal
specimens in the U.S. museum collections provide
our most complete picture of the biological
diversity of the entire nation."
- U.S. Dept. of the Interior, Electronic National
Museum Proposal
54. Characteristics of a Megascience Effort
- Something that cannot be undertaken by only one
country
- expense
- no one country has access to all the data
- Some components of the research can be done at
the national or regional levels, but some must be
truly global
- Usually infrastructural in nature (e.g. CERN)
- Involves collaboration among many scientists and
others
- The topic is hugely inclusive and affects many
disciplines
55. "Interoperability must be perceived as the
sharing of information."
- Eliminating Legal and Policy Barriers to
Interoperable Government Systems - Electronic
Commerce, Law, and Information Policy
Strategies Report, June 1999
56. "The value of data lies in their use."
- Bits of Power: Issues in Global Access to
Scientific Data - National Academy Press, 1997
58. Points to Distributions
[Diagram labels: Desktop Applications; Information Retrieval API; Server; Server; Specimen Databases]
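One plausible reading of the diagram labels above is that desktop applications call an information retrieval layer, which queries several servers sitting in front of separate specimen databases and merges the point records they return. The sketch below illustrates that fan-out-and-merge idea only. It uses in-memory lists in place of real servers, and every name in it (`make_provider`, `retrieve`, the sample records) is hypothetical rather than part of any GBIF software.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]
# A provider is any callable that takes a taxon name and returns point records.
Provider = Callable[[str], List[Record]]


def make_provider(name: str, records: Iterable[Record]) -> Provider:
    """Wrap an in-memory list so it behaves like a remote specimen database."""
    stored = list(records)

    def search(taxon: str) -> List[Record]:
        # Each provider answers from its own data and tags results with its
        # name, so the source of every record stays visible to the caller.
        return [dict(r, provider=name) for r in stored if r["taxon"] == taxon]

    return search


def retrieve(taxon: str, providers: Iterable[Provider]) -> List[Record]:
    """Stand-in for the retrieval layer: query every provider, then merge."""
    merged: List[Record] = []
    for search in providers:
        merged.extend(search(taxon))
    return merged


# Made-up records held by two separate collections (illustration only).
herbarium_a = make_provider("A", [
    {"taxon": "Example species", "lat": 19.5, "lon": -96.9},
    {"taxon": "Other species", "lat": 20.0, "lon": -97.0},
])
herbarium_b = make_provider("B", [
    {"taxon": "Example species", "lat": 24.0, "lon": -104.7},
])

points = retrieve("Example species", [herbarium_a, herbarium_b])
print(len(points))
```

Because each provider keeps its own data and only answers queries, this layout is consistent with the principle stated earlier that data providers retain control.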
59. Prediction Tools
[Diagram labels: Point Data; Prediction Algorithm (GARP); Distribution Predicted for Native Region; Distribution After Climate Change; Distribution Predicted in Non-native Region]
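To make the step from point data to a predicted distribution concrete, here is a deliberately simplified sketch. It does not implement GARP, which is a genetic algorithm. Instead it swaps in a much simpler rectangular environmental envelope: it records the range of each environmental variable at known occurrence points and calls a site suitable when all of its values fall inside those ranges. The variable names and numbers are made up for illustration.

```python
from typing import Dict, List, Tuple

Env = Dict[str, float]                     # environmental values at one site
Envelope = Dict[str, Tuple[float, float]]  # per-variable (min, max) range


def fit_envelope(occurrences: List[Env]) -> Envelope:
    """Learn the range of each variable across known occurrence points."""
    if not occurrences:
        raise ValueError("at least one occurrence point is required")
    return {
        var: (min(p[var] for p in occurrences), max(p[var] for p in occurrences))
        for var in occurrences[0]
    }


def is_suitable(envelope: Envelope, site: Env) -> bool:
    """A site is predicted suitable if every variable falls inside its range."""
    return all(low <= site[var] <= high for var, (low, high) in envelope.items())


# Made-up environmental values at three occurrence points (illustration only).
points = [
    {"mean_temp_c": 18.0, "annual_rain_mm": 900.0},
    {"mean_temp_c": 21.5, "annual_rain_mm": 1200.0},
    {"mean_temp_c": 19.0, "annual_rain_mm": 1000.0},
]
envelope = fit_envelope(points)

native_site = {"mean_temp_c": 20.0, "annual_rain_mm": 1100.0}
warmer_site = {"mean_temp_c": 23.0, "annual_rain_mm": 1100.0}  # same site, 3 C warmer
print(is_suitable(envelope, native_site), is_suitable(envelope, warmer_site))
```

Applying the same fitted envelope to sites in another region, or to the same sites under changed climate values, corresponds loosely to the non-native and climate change predictions named on the slide.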
60. Why share data?
- Advantages of sharing core collection data for
individual curators
- Increased use of collections
- Increased justification for funding, collection
development, staffing etc. (Use it or lose it)
- Advantages of sharing core collection data for
individual biodiversity scientists
- Making available high quality data for use by
others
- Helps improve quality of data by making it
visible
- Increased visibility and relevance of
biodiversity community will result in increased
funding
- Advantages of sharing data for individual
institutions
- Increase value of collections by increasing
access and use
- Increased use will increase relevance and result
in increased funding
- Decrease of staff time answering queries
61. "Interoperability must be perceived as the
sharing of information."
- Eliminating Legal and Policy Barriers to
Interoperable Government Systems - Electronic
Commerce, Law, and Information Policy
Strategies Report, June 1999
62. "The most profound barriers to interoperability
are the soft human technologies implied in
fundamental policy and organizational design."
- Eliminating Legal and Policy Barriers to
Interoperable Government Systems - Electronic
Commerce, Law, and Information Policy
Strategies Report, June 1999