Title: GLOBAL BIODIVERSITY
1 GLOBAL BIODIVERSITY INFORMATION FACILITY
The magnitude of biodiversity data and the imperative role of biodiversity informatics
Converging Sciences
Trento, Italy, 17 December 2004
WWW.GBIF.ORG
2 And now for something different
- So far, this conference has largely focused on molecular and genetic biology
- Will now move to higher levels of complexity
- With strong connections to climate change and environmental problems
- I will emphasise the role of global infrastructure as an enabler for biodiversity analysis and understanding
3 What is biodiversity?
- The Convention on Biological Diversity recognises three levels of biodiversity
  - Genetic (molecular) diversity
  - Species (organismal) diversity
  - Ecosystem (ecological) diversity
- There are important emergent properties at each level
- Biodiversity has come to be associated with the species and ecosystem levels
4 Fundamental properties of biodiversity at all levels
- At all levels of organization, each biological entity is unique
- History (differentiation, development, phylogeny) must be taken into account
- Every biological entity has important contingent relations to all other entities
5 The Fundamental Role of Informatics in Biology
- Therefore the law of large numbers does not hold in biology, since every living thing is genuinely unique
- Instead of calculus, the method for manipulating information about statistically large numbers of small, independent, equivalent things ...
- Biology needs informatics, the method for manipulating information about large numbers of dependent, historically contingent, individual things
6 There is Lots of Biological Information to Manipulate
- (factoids from Bob Robbins, Fred Hutchinson Cancer Institute)
- DNA is amazingly efficient at storing information
- Typed in 10-pitch font, the 3 billion base pairs of the human genome would stretch > 8000 km
- Duplicating the information storage capacity of all the DNA in the biosphere would need 10^27 100-Gb hard disks
- Which would fill a volume > that of the Earth
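The two figures on this slide can be checked with a few lines of arithmetic. The sketch below assumes that 10-pitch type means 10 characters per inch and reads the disk count as 10^27. It also assumes, and this is not from the talk, that each disk has a 3.5-inch form factor of roughly 146 x 101.6 x 26.1 mm, and it takes the volume of the Earth as about 1.083 x 10^21 cubic metres. All function and constant names are illustrative.

```python
# Back-of-envelope check of the genome-length and disk-volume figures.
INCH_IN_M = 0.0254
CHARS_PER_INCH = 10  # "10-pitch" type: 10 characters per inch

# Assumption (not from the talk): a 3.5-inch drive, about 146 x 101.6 x 26.1 mm.
DRIVE_VOLUME_M3 = 0.146 * 0.1016 * 0.0261
EARTH_VOLUME_M3 = 1.083e21  # approximate volume of the Earth


def printed_length_km(base_pairs: float) -> float:
    """Length of a single line of type with one character per base pair."""
    inches = base_pairs / CHARS_PER_INCH
    return inches * INCH_IN_M / 1000.0


def drives_to_earth_volumes(n_drives: float) -> float:
    """Total volume of n_drives disks, expressed in Earth volumes."""
    return n_drives * DRIVE_VOLUME_M3 / EARTH_VOLUME_M3


if __name__ == "__main__":
    for bp in (3.0e9, 3.2e9):
        print(f"{bp:.1e} base pairs -> {printed_length_km(bp):,.0f} km")
    print(f"1e27 drives -> {drives_to_earth_volumes(1e27):,.0f} Earth volumes")
```

Under these assumptions, 3.0 billion characters come to roughly 7,600 km and 3.2 billion to roughly 8,100 km, so the quoted figure corresponds to a genome size slightly above 3 billion base pairs. 10^27 drives of the assumed size come to a few hundred Earth volumes, consistent with the last bullet.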
7 Information overload at the species and specimen level
- 1.8 million known species on Earth
- And estimates of the total number of species range from 10 million to 100 million
- On average, each species has 5 scientific names
- Billions of specimens in the world's natural history museums
- Which provide an invaluable historical and contemporary record of biodiversity
- One of the major roles of the Global Biodiversity Information Facility (GBIF) is to liberate these species and specimen data
8 GBIF's major goals are to
- Make the world's biodiversity data freely and universally available via the Internet
- Share primary scientific biodiversity data
- Especially georeferenced data
- ... Promote the development of biodiversity informatics around the world
- GBIF-UNESCO Chairs in Biodiversity Informatics
9 What do we mean by primary biodiversity data?
- Label data on 1.5-3.0 billion specimens in natural history collections, herbaria, botanical gardens, etc.
- Associated notes, recordings, publications, etc.
- Observational data (e.g. bird banding data)
- These data have been amassed over 300 years; most are not digital
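To make the idea of a primary record concrete, here is a minimal sketch of how one specimen label or observation might be held once digitised. The class, its field names and the example values are all hypothetical and chosen for illustration; they are not GBIF's schema or any published standard.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class OccurrenceRecord:
    """One primary biodiversity record. All field names are illustrative."""
    scientific_name: str
    basis_of_record: str                 # "specimen" or "observation"
    holder: str                          # collection or survey holding the record
    year: Optional[int] = None
    latitude: Optional[float] = None     # decimal degrees, north positive
    longitude: Optional[float] = None    # decimal degrees, east positive

    def is_georeferenced(self) -> bool:
        """True when both coordinates are present and within valid ranges."""
        if self.latitude is None or self.longitude is None:
            return False
        return -90.0 <= self.latitude <= 90.0 and -180.0 <= self.longitude <= 180.0


def georeferenced(records: List[OccurrenceRecord]) -> List[OccurrenceRecord]:
    """Keep only the records that can be placed on a map."""
    return [r for r in records if r.is_georeferenced()]


if __name__ == "__main__":
    # Made-up example records, not real data.
    records = [
        OccurrenceRecord("Aedes albopictus", "specimen", "Example Museum",
                         year=1990, latitude=29.76, longitude=-95.37),
        OccurrenceRecord("Aedes albopictus", "observation", "Example Survey"),
    ]
    print(len(georeferenced(records)), "of", len(records), "records are georeferenced")
```

Keeping the coordinates optional reflects the point made on this slide: many label records were written long before georeferencing was routine, so a usable structure has to tolerate records without coordinates.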
10 Other reasons for establishing GBIF
- To partner with other biodiversity organisations
- To establish and promote standards, protocols and ontologies for biodiversity data
- Create vigour and rigour
- To help deal with the unequal distribution of biodiversity information
11 Biodiversity and information about it are unevenly distributed.
[Map legend: biodiversity hotspot; holder of large amounts of biodiversity data]
12 How does GBIF work?
- Was established in 2001
- Is an open-ended network of Participants (currently 70)
  - 42 countries/economies; the newest is Indonesia
  - 28 international organisations
  - E.g. Species 2000, UNEP's World Conservation Monitoring Centre
- Each Participant agrees to
  - Share biodiversity data
  - Set up one or more computer nodes for accessing those data
- Data providers retain control of their data
- GBIF asserts no intellectual property rights (IPR) over the data
13 GBIF uses a web-services approach
- Uses open-source software
- Data providers make their data known through our Registry of Shared Biodiversity data
- www.gbif.net currently serving > 45 million specimen and observation records from > 330 collections
- Electronic catalogue of scientific names (ECAT) will be available as an authority file to any user
- GenBank was one of the inspirations for GBIF, but there are clear differences
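Because each species carries several scientific names on average, records filed under different names have to be reconciled before they can be analysed together. The sketch below shows one simple way an authority file could be used for that: a lookup table from every known name string to a single accepted name. The table, the function name and the normalisation rule are assumptions made for illustration and do not describe ECAT's actual content or interface. The entries are names that have been applied in the literature to the mosquito discussed later in this talk.

```python
from typing import Dict, Optional

# Toy authority file: each known name string maps to one accepted name.
# Illustrative only; this is not ECAT data.
AUTHORITY: Dict[str, str] = {
    "aedes albopictus": "Aedes albopictus",
    "stegomyia albopicta": "Aedes albopictus",
    "culex albopictus": "Aedes albopictus",
}


def accepted_name(name: str) -> Optional[str]:
    """Return the accepted name for a name string, or None if it is unknown.

    Matching ignores case and repeated whitespace, nothing more.
    """
    key = " ".join(name.lower().split())
    return AUTHORITY.get(key)


if __name__ == "__main__":
    for label in ("Stegomyia  albopicta", "CULEX ALBOPICTUS", "Aedes aegypti"):
        print(repr(label), "->", accepted_name(label))
```

A real service would also need to handle authorship strings, misspellings and names whose status is disputed; the exact-match lookup here only illustrates the basic role of an authority file.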
14 GBIF plays a critical role in e-biodiversity
[Diagram labels: GBIF; Bioinformatics Molecular Informatics; Ecoinformatics; Biodiversity Informatics]
15 What can you do with georeferenced biodiversity data?
- 20 years ago, Mexico's CONABIO set out to make a comprehensive database of Mexican plants and animals
- Gathered data from the world's natural history collections and herbaria
- Now have an unparalleled database that can be used in many ways, including
  - Predict effects of climate change
  - Determine safe sites for field trials of genetically modified organisms
  - Predict best places to set up new protected areas
- GBIF working to be a reverse global CONABIO
16 Essence of Ecological Niche Modeling
[Diagram labels: Geographic Space; Ecological Space; ecological niche modeling; occurrence points on native distribution; Native range prediction]
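The flow in this diagram (occurrence points in geographic space, a model fitted in ecological space, a prediction projected back onto the map) can be sketched with the simplest kind of niche model, a rectilinear climate envelope. This is a deliberately minimal stand-in and not necessarily the algorithm behind the maps shown in this talk. The variable names, function names and numbers are made up for illustration.

```python
from typing import Dict, List, Tuple

Env = Dict[str, float]                      # environmental values at one location
Envelope = Dict[str, Tuple[float, float]]   # (min, max) per variable


def fit_envelope(occurrence_env: List[Env]) -> Envelope:
    """Ecological space: range of each variable over the occurrence points."""
    if not occurrence_env:
        raise ValueError("need at least one occurrence point")
    variables = list(occurrence_env[0].keys())
    return {
        v: (min(p[v] for p in occurrence_env), max(p[v] for p in occurrence_env))
        for v in variables
    }


def predict_presence(envelope: Envelope, cell_env: Env) -> bool:
    """Geographic space: a cell is suitable if every variable is in range."""
    return all(lo <= cell_env[v] <= hi for v, (lo, hi) in envelope.items())


if __name__ == "__main__":
    # Made-up environmental values at known occurrence points.
    occurrences = [
        {"mean_temp_c": 18.0, "annual_rain_mm": 900.0},
        {"mean_temp_c": 24.0, "annual_rain_mm": 1600.0},
        {"mean_temp_c": 21.0, "annual_rain_mm": 1200.0},
    ]
    envelope = fit_envelope(occurrences)

    # Made-up map cells to classify.
    cells = {
        "cell_a": {"mean_temp_c": 20.0, "annual_rain_mm": 1000.0},
        "cell_b": {"mean_temp_c": 8.0, "annual_rain_mm": 1000.0},
    }
    for name, env in cells.items():
        print(name, "suitable" if predict_presence(envelope, env) else "unsuitable")
```

Projecting the same envelope onto cells outside the native range is what turns a native range prediction into an invasion risk map. More capable methods replace the simple min/max box with a fitted statistical or rule-based model, but the two-step structure stays the same.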
17 Example: Predicting the spread of an invasive disease vector
- From Townsend Peterson, University of Kansas, and his students
- The vector: the mosquito Aedes albopictus
- The disease: dengue fever
- Original vector for dengue fever was eradicated in US in 1970
- Invasion of A. albopictus has now brought the disease back to the US
18 Aedes albopictus
19 Aedes albopictus in the USA
20 Aedes albopictus US Invasion
21 Aedes albopictus World Risk Map
Levine, R. R., and M. Q. Benedict. In preparation.
22 Conclusions
- Species- and specimen-level biodiversity data are a distributed, largely non-digitised but valuable resource
- Large-scale megascience efforts are needed to mobilise these data
- Global approaches can produce infrastructures (like GenBank and GBIF) that act as essential underpinnings for whole new areas of science and economy