Title: From Field to Federation: Challenges in Biodiversity Informatics
1From Field to Federation: Challenges in Biodiversity Informatics
[Diagram: Portal; three Providers; three Public Databases; three Collection Databases]
John Wieczorek
Museum of Vertebrate Zoology
University of California, Berkeley
2Biodiversity Informatics
3Biodiversity Informatics: First Use
1992, John Whiting
Canadian Biodiversity Informatics Consortium
Canadian Museum of Nature
GPS, GIS, RDBMS, environmental economics
4Biodiversity Informatics: Definition
The creation, integration, analysis, and
understanding of information regarding biological
diversity.
5Biodiversity Informatics: The Journal
- Mammals of the World: MaNIS as an example of data integration in a distributed network environment
- Uses and Requirements of Ecological Niche Models and Related Distributional Models
- A synecological framework for systematic conservation planning
- Climate Change and Biodiversity: Some Considerations in Forecasting Shifts in Species' Potential Distributions
6Biodiversity Informatics: The Prize
Ebbe Nielsen Prize, offered by the Global Biodiversity Information Facility (GBIF)
7Pieces of the Biodiversity Informatics Puzzle
Lane, M.A. 2003. Bulletin of the American Society
for Information Science and Technology.
8Biodiversity Informatics: Challenges
Digitization, Mobilization, Applications, Integration
9Biodiversity Informatics: Solutions
Digitization, Mobilization, Applications, Integration
10Work Programmes
ECAT
DIGIT
DADI
OCB
Digitization (DIGIT)
11Biodiversity Informatics: Solutions
Digitization, Mobilization, Applications, Integration
12Digitization
13Biodiversity Informatics: Solutions
Digitization, Mobilization, Applications, Integration
14Mobilization: Standards
Information Standards: Darwin Core (DwC), ABCD
Communication Protocols: DiGIR, BioCASE, TAPIR
15Mobilization: Information Standards
Darwin Core (DwC) and Extensions
ABCD: Access to Biological Collection Data
16What data are being shared?
Who: collector, researcher
What: taxonomy, observations
Where: geography, locality
When: date observed/collected
17Mobilization: Communication Protocols
DiGIR: Distributed Generic Information Retrieval
TAPIR: TDWG Access Protocol for Information Retrieval
18Darwin Core (DwC)
A list of concepts (data fields) needed for
questions based on biodiversity occurrence data.
19DiGIR and TAPIR
- protocols for retrieving structured data from multiple, heterogeneous databases across the Internet
- reference implementation of both provider and portal software
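The portal-and-provider idea behind these protocols can be sketched in a few lines of Python. This is only an in-memory illustration, not the DiGIR or TAPIR message format: the `Provider` class, the `portal_search` function, the museum names, and the local column names are all hypothetical. Each provider keeps its own schema and a mapping from shared concept names to its local columns, and the portal sends the same query to every provider and merges the answers.

```python
from dataclasses import dataclass
from typing import Dict, List

Record = Dict[str, str]


@dataclass
class Provider:
    """Hypothetical provider: a local database plus a map from shared concepts to local columns."""
    name: str
    rows: List[Record]
    field_map: Dict[str, str]  # shared concept name -> local column name

    def search(self, concept: str, value: str) -> List[Record]:
        """Answer a query phrased in shared concepts, returning records in shared concepts."""
        local_column = self.field_map[concept]
        results = []
        for row in self.rows:
            if row.get(local_column) == value:
                shared = {c: row.get(col, "") for c, col in self.field_map.items()}
                shared["Provider"] = self.name  # keep the identity of the data source
                results.append(shared)
        return results


def portal_search(providers: List[Provider], concept: str, value: str) -> List[Record]:
    """Send the same query to every provider and merge the answers."""
    merged: List[Record] = []
    for provider in providers:
        merged.extend(provider.search(concept, value))
    return merged


# Two made-up providers whose databases use different column names.
museum_a = Provider(
    name="Museum A",
    rows=[{"sci_name": "Mus musculus", "loc": "Berkeley"},
          {"sci_name": "Rattus rattus", "loc": "Oakland"}],
    field_map={"ScientificName": "sci_name", "Locality": "loc"},
)
museum_b = Provider(
    name="Museum B",
    rows=[{"taxon": "Mus musculus", "place": "Ottawa"}],
    field_map={"ScientificName": "taxon", "Locality": "place"},
)

hits = portal_search([museum_a, museum_b], "ScientificName", "Mus musculus")
for hit in hits:
    print(hit["Provider"], hit["Locality"])
```

In this sketch the heterogeneity lives entirely in `field_map`, so the portal never needs to know how any one collection database is laid out.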
20A Simple Distributed Database Network
http://www.tdwg.org/activities/tapir/
21How are data being shared?
Distributed Data Network
Provider Software
Provider Software
Provider Software
Public Database Local Copy
Public Database Hosted Copy
Occurrence Database
Occurrence Database
Occurrence Database
Local Public Repository
22How are data being shared?
Distributed Data Network
Provider Software
Provider Software
Provider Software
Public Database Local Copy
Public Database Hosted Copy
Occurrence Database
Occurrence Database
Occurrence Database
Local Direct Access
23How are data being shared?
Distributed Data Network
Provider Software
Provider Software
Provider Software
Public Database Local Copy
Public Database Hosted Copy
Occurrence Database
Occurrence Database
Occurrence Database
Hosted Public Repository
24Biodiversity Informatics: Challenges
Digitization, Mobilization, Applications, Integration
25Applications: Services
Taxonomic: uBio
Geospatial: BioGeomancer
Niche Modeling: openModeller
Conservation: REBIOMA
26BioGeomancer Geospatial Digitization
Data Validation
27Biodiversity Informatics: Challenges
Digitization, Mobilization, Applications, Integration
28Integration: Species Pages
29AmphibiaWeb: Species Descriptions
Literature, Media, etc.
Species Descriptive Data
30Integration: Workflows
31Integration: Collaborative Distributed Databases
32Case Study: A Network of Mammal Collections
33If you talk to a man in a language he understands, that goes to his head. If you talk to him in his own language, that goes to his heart.
Nelson Mandela
34Challenge: Engender provider participation.
- Solution: Gain the trust of collections.
- begin with collections that have no doubts
- lead by example
- preserve the integrity of the original data
- concede control to data providers
- promote the value of data providers
35Challenge: Promote the importance of providers.
- Solution: Engage users, improve collections.
- preserve the identity of the data source
- accommodate the dynamic nature of data
- provide data validation tools
- provide data feedback mechanisms
36Challenge: Maintain provider interest.
- Solution: Increase the value of collections through participation.
- increase visibility
- liberate resources
- enable providers to track data access
- provide tools for collection improvement
- provide tools to ask interesting questions
37A Network of Mammal Collections
Relatively well-known and stable taxonomy.
38A Network of Mammal Collections
Relatively well-known and stable taxonomy.
Willing community with a shared vision.
39Distributed Data Network Example
Distributed Data Network
Provider Software
Provider Software
Provider Software
Public Database Local Copy
Public Database Hosted Copy
Occurrence Database
Occurrence Database
Occurrence Database
40MaNIS Query
41MaNIS Query Summary
42MaNIS Result Map
43Global Coverage, Mammals, Specimens only
44MaNIS Data Accessibility, Georeferencing
30 institutions, 2M specimens
45HerpNET Synonym Lookup
43 institutions, 4.6M specimens
46ORNIS Data Validation
36 institutions, 37M specimens and observations
47Museums Participating in MaNIS/HerpNET/ORNIS (VertNet)
Multiple projects: ORNIS, HerpNET, MaNIS
86 institutions: 74 in North America, 12 others
70 available now on MHO for searching
49Biodiversity Informatics: Current status
Digitization: progress, bottlenecks
Mobilization: have solutions
Applications: more every day
Integration: issues (persistence, performance, management, IPR)
51Web Portal
John Wieczorek
Lead Developer, REBIOMA
Museum of Vertebrate Zoology
University of California, Berkeley
52Case Study: Network of Madagascar Occurrences
53Global Objective
To serve up-to-date, validated biodiversity
occurrence data for the conservation community in
Madagascar.
54Madagascar, All Taxa, All Occurrences
56What data are being shared?
Who: collector, observer
What: taxon, observation
Where: collection, locality
When: date observed/collected
57Darwin Core (DwC)
A list of concepts (data fields) needed for
questions based on biodiversity occurrence data.
58Darwin Core (DwC)
Who?
GlobalUniqueIdentifier, BasisOfRecord, DateLastModified, ScientificName, IdentifiedBy, Collectors, CollectingMethod, Sex, LifeStage, Attributes, IndividualCount, HigherGeography, Locality, DecimalLatitude, DecimalLongitude, GeodeticDatum, MaximumUncertainty, GeoreferenceProtocol, GeoreferenceSources, EarliestDateCollected, LatestDateCollected, InformationWithheld
http://wiki.tdwg.org/DarwinCore/
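As an illustration of treating the field list on this slide as a shared vocabulary, the following Python sketch checks an occurrence record against those names. The field names are taken from the slide; the `check_record` function and every value in the example record (identifier, coordinates, the extra `Colour` field) are made up for illustration and do not come from any real dataset.

```python
# Field names as listed on the slide; treated here as the shared vocabulary.
DWC_FIELDS = [
    "GlobalUniqueIdentifier", "BasisOfRecord", "DateLastModified",
    "ScientificName", "IdentifiedBy", "Collectors", "CollectingMethod",
    "Sex", "LifeStage", "Attributes", "IndividualCount",
    "HigherGeography", "Locality", "DecimalLatitude", "DecimalLongitude",
    "GeodeticDatum", "MaximumUncertainty", "GeoreferenceProtocol",
    "GeoreferenceSources", "EarliestDateCollected", "LatestDateCollected",
    "InformationWithheld",
]


def check_record(record):
    """Return (missing, unknown) field names relative to DWC_FIELDS."""
    missing = [name for name in DWC_FIELDS if name not in record]
    unknown = [name for name in record if name not in DWC_FIELDS]
    return missing, unknown


# Hypothetical record: the values are illustrative only.
example = {
    "GlobalUniqueIdentifier": "example:collection:0001",
    "BasisOfRecord": "specimen",
    "ScientificName": "Heteropsis parva",
    "DecimalLatitude": "-18.9",
    "DecimalLongitude": "47.5",
    "Colour": "brown",  # not a field from the slide, so it is reported as unknown
}

missing, unknown = check_record(example)
print(len(missing), "fields not supplied; unknown fields:", unknown)
```

A provider could run a check like this before publishing, so that records reaching the network use only the agreed concept names.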
59Darwin Core (DwC)
What?
60Darwin Core (DwC)
Where?
61Darwin Core (DwC)
When?
62Darwin Core (DwC)
Details
63Darwin Core (DwC)
Record Metadata
64A Simple Distributed Database Network
http://www.tdwg.org/activities/tapir/
65Conservation Applications
Models Database
User Queries, Visualization
Automated Modeling
Validated Occurrence Data
Human Expert Review
Automated Validation
Distributed Data Network
Occurrence Data
Occurrence Data
Occurrence Data
68Automated Data Validation
Taxonomy: Digital Taxonomic Thesaurus
Geography: BioGeomancer
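A minimal sketch of what automated validation of a single record might look like, assuming a name lookup table and simple coordinate range checks. It does not call the Digital Taxonomic Thesaurus or BioGeomancer; the `validate_record` function, the `accepted_names` table, and the placeholder names "Aus bus" and "Aus cus" are hypothetical stand-ins.

```python
from typing import Dict, List


def validate_record(record: Dict[str, str], accepted_names: Dict[str, str]) -> List[str]:
    """Return a list of problems found in one occurrence record (empty if none)."""
    problems = []

    # Taxonomy check: look the name up in a thesaurus mapping name -> accepted name.
    name = record.get("ScientificName", "").strip()
    if not name:
        problems.append("missing ScientificName")
    elif name not in accepted_names:
        problems.append(f"unrecognized name: {name}")
    elif accepted_names[name] != name:
        problems.append(f"synonym of {accepted_names[name]}")

    # Geography check: coordinates must be numeric and within valid ranges.
    try:
        lat = float(record.get("DecimalLatitude", ""))
        lon = float(record.get("DecimalLongitude", ""))
    except ValueError:
        problems.append("coordinates missing or not numeric")
    else:
        if not -90 <= lat <= 90:
            problems.append(f"latitude out of range: {lat}")
        if not -180 <= lon <= 180:
            problems.append(f"longitude out of range: {lon}")
    return problems


# Hypothetical thesaurus: "Aus cus" is treated as a synonym of "Aus bus".
thesaurus = {"Aus bus": "Aus bus", "Aus cus": "Aus bus"}

good = {"ScientificName": "Aus bus", "DecimalLatitude": "-18.9", "DecimalLongitude": "47.5"}
bad = {"ScientificName": "Aus cus", "DecimalLatitude": "95", "DecimalLongitude": "47.5"}

print(validate_record(good, thesaurus))
print(validate_record(bad, thesaurus))
```

The function only reports problems and leaves the record unchanged, so the original data stay intact for the provider to correct.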
73Human Expert Review
Taxonomic Review Board
Experts notified of new records
Experts flag suspect records
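The notify-and-flag loop described on this slide could be modeled roughly as follows. This is a sketch under stated assumptions, not the REBIOMA implementation: the `ReviewBoard` class, the grouping of experts by taxonomic group, and all identifiers and names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


class ReviewBoard:
    """Hypothetical review board: routes new records to experts and stores their flags."""

    def __init__(self, experts: Dict[str, List[str]]):
        self.experts = experts          # taxonomic group -> expert names
        self.inbox = defaultdict(list)  # expert name -> record ids awaiting review
        self.flags = defaultdict(list)  # record id -> list of (expert, reason)

    def add_record(self, record_id: str, group: str) -> List[str]:
        """Register a new record and return the experts who were notified."""
        notified = list(self.experts.get(group, []))
        for expert in notified:
            self.inbox[expert].append(record_id)
        return notified

    def flag(self, record_id: str, expert: str, reason: str) -> None:
        """Record that an expert considers this record suspect."""
        self.flags[record_id].append((expert, reason))

    def is_suspect(self, record_id: str) -> bool:
        """A record is suspect once at least one expert has flagged it."""
        return bool(self.flags.get(record_id))


board = ReviewBoard({"mammals": ["expert_1", "expert_2"], "birds": ["expert_3"]})
notified = board.add_record("rec-1", "mammals")
board.flag("rec-1", "expert_2", "locality far outside known range")
print(notified, board.is_suspect("rec-1"), board.is_suspect("rec-2"))
```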
78REBIOMA Result Map
80Automated Modeling
Linked to environmental data
Choice of model algorithms
Choice of scenarios
Update models as data change
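One way to picture "update models as data change" is to fingerprint the validated occurrence data and rerun a model only when the fingerprint differs from the one it was last built from. The sketch below assumes that approach. The `ModelTracker` class, the `placeholder_model` function, and the algorithm and scenario names are hypothetical, and the placeholder stands in for a real niche-modeling run.

```python
import hashlib
import json
from itertools import product


def fingerprint(records):
    """Order-independent hash of a list of occurrence records (dicts of strings)."""
    lines = sorted(json.dumps(r, sort_keys=True) for r in records)
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


class ModelTracker:
    """Reruns each (algorithm, scenario) model only when the occurrence data changed."""

    def __init__(self, run_model):
        self.run_model = run_model
        self.cache = {}  # (algorithm, scenario) -> (data fingerprint, model result)

    def update(self, records, algorithms, scenarios):
        """Return the list of (algorithm, scenario) pairs that had to be rerun."""
        current = fingerprint(records)
        rerun = []
        for key in product(algorithms, scenarios):
            cached = self.cache.get(key)
            if cached is None or cached[0] != current:
                self.cache[key] = (current, self.run_model(records, *key))
                rerun.append(key)
        return rerun


def placeholder_model(records, algorithm, scenario):
    """Stand-in for a real modeling run; it only counts the input points."""
    return len(records)


tracker = ModelTracker(placeholder_model)
algorithms = ["algorithm_a", "algorithm_b"]  # hypothetical algorithm names
scenarios = ["current", "future"]            # hypothetical scenario names
data = [{"ScientificName": "Aus bus", "DecimalLatitude": "-18.9", "DecimalLongitude": "47.5"}]

first = tracker.update(data, algorithms, scenarios)
second = tracker.update(data, algorithms, scenarios)  # nothing changed
data.append({"ScientificName": "Aus bus", "DecimalLatitude": "-19.2", "DecimalLongitude": "46.8"})
third = tracker.update(data, algorithms, scenarios)   # new record triggers reruns
print(len(first), len(second), len(third))  # prints: 4 0 4
```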
82User Queries, Visualization
Heteropsis parva
2000
84Conservation Applications
Conservation Planning: Protected Area Design, Planning for Climate Change, Monitoring Biodiversity
Business: Ecotourism, Development Plans, Environmental Impact Statement
Science: Biogeographic Analyses, Identifying New Survey Areas