Title: From Field to Federation: Challenges in Biodiversity Informatics
[Figure labels: Portal; Provider (three); Public Database (three); Collection Database (three).]
John Wieczorek, Museum of Vertebrate Zoology, University of California, Berkeley
Biodiversity Informatics
Biodiversity Informatics: First Use
1992, John Whiting, Canadian Biodiversity Informatics Consortium, Canadian Museum of Nature
GPS, GIS, RDBMS, environmental economics
Biodiversity Informatics: Definition
The creation, integration, analysis, and understanding of information regarding biological diversity.
Biodiversity Informatics: The Journal
- Mammals of the World: MaNIS as an example of data integration in a distributed network environment
- Uses and Requirements of Ecological Niche Models and Related Distributional Models
- A synecological framework for systematic conservation planning
- Climate Change and Biodiversity: Some Considerations in Forecasting Shifts in Species' Potential Distributions
Biodiversity Informatics: The Prize
Ebbe Nielsen Prize, offered by the Global Biodiversity Information Facility (GBIF)
Pieces of the Biodiversity Informatics Puzzle
Lane, M.A. 2003. Bulletin of the American Society for Information Science and Technology.
Biodiversity Informatics Challenges
Digitization, Mobilization, Applications, Integration
Biodiversity Informatics Solutions
Digitization, Mobilization, Applications, Integration
Work Programmes
ECAT, DIGIT, DADI, OCB
Digitization (DIGIT)
Digitization
Mobilization: Standards
Information Standards: Darwin Core (DwC), ABCD
Communication Protocols: DiGIR, BioCASE, TAPIR
Mobilization: Information Standards
- Darwin Core (DwC) and Extensions
- ABCD: Access to Biological Collection Data
What data are being shared?
- Who: collector, researcher
- What: taxonomy, observations
- Where: geography, locality
- When: date observed/collected
Darwin Core (DwC)
A list of concepts (data fields) needed to answer questions based on biodiversity occurrence data.
Darwin Core (DwC): Who? What? Where? When? Details, Record Metadata
GlobalUniqueIdentifier, BasisOfRecord, DateLastModified, ScientificName, IdentifiedBy, Collectors, CollectingMethod, Sex, LifeStage, Attributes, IndividualCount, HigherGeography, Locality, DecimalLatitude, DecimalLongitude, GeodeticDatum, MaximumUncertainty, GeoreferenceProtocol, GeoreferenceSources, EarliestDateCollected, LatestDateCollected, InformationWithheld
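The Darwin Core slides ask who, what, where and when of one shared field list. The transcription lost the highlighting that showed which fields answer which question. As a minimal sketch, here is one plausible grouping and a helper that reports which listed concepts a record leaves empty. The grouping is our assumption and is not taken from the slides. Field names are copied from the slides. The example record is hypothetical.

```python
from typing import Dict, List

# Hypothetical grouping of the 22 concepts on the slides under their headings.
DWC_GROUPS: Dict[str, List[str]] = {
    "Record Metadata": ["GlobalUniqueIdentifier", "BasisOfRecord",
                        "DateLastModified", "InformationWithheld"],
    "Who": ["Collectors", "IdentifiedBy"],
    "What": ["ScientificName"],
    "Where": ["HigherGeography", "Locality", "DecimalLatitude",
              "DecimalLongitude", "GeodeticDatum", "MaximumUncertainty",
              "GeoreferenceProtocol", "GeoreferenceSources"],
    "When": ["EarliestDateCollected", "LatestDateCollected"],
    "Details": ["CollectingMethod", "Sex", "LifeStage", "Attributes",
                "IndividualCount"],
}

ALL_FIELDS = {name for names in DWC_GROUPS.values() for name in names}


def missing_by_group(record: Dict[str, str]) -> Dict[str, List[str]]:
    """For each group, list the concepts that are absent or empty in the record."""
    report: Dict[str, List[str]] = {}
    for group, names in DWC_GROUPS.items():
        missing = [n for n in names if not record.get(n)]
        if missing:
            report[group] = missing
    return report


def unknown_fields(record: Dict[str, str]) -> List[str]:
    """Keys in the record that are not among the listed concepts."""
    return sorted(set(record) - ALL_FIELDS)
```

A report like this shows which questions a provider's records cannot answer yet. For example, a record with a name and a textual locality but no coordinates has gaps under "Where".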
Mobilization: Communication Protocols
- DiGIR: Distributed Generic Information Retrieval
- TAPIR: TDWG Access Protocol for Information Retrieval
DiGIR and TAPIR
- protocols for retrieving structured data from multiple, heterogeneous databases across the Internet
- reference implementations of both provider and portal software
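DiGIR and TAPIR define the actual request and response formats, and those are not reproduced here. The sketch below shows only the portal-side pattern that such protocols support. The portal sends one query to several providers at once and merges the answers. Each record is tagged with its source, so the identity of the data source is preserved. The providers are local Python callables standing in for remote services, and `federated_search` is a hypothetical name.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# A record is a flat mapping of Darwin Core-style field names to values.
Record = Dict[str, str]
# A provider answers a name query with records. Real DiGIR/TAPIR providers are
# remote HTTP services; these callables are stand-ins for illustration.
Provider = Callable[[str], List[Record]]


def federated_search(
    providers: Dict[str, Provider],
    scientific_name: str,
    timeout: float = 10.0,
) -> Tuple[List[Record], List[str]]:
    """Query every provider concurrently and merge their records.

    Each merged record carries a "Provider" key naming its source. A provider
    that raises or times out is reported as unavailable instead of failing the
    whole search. Leaving the with-block still waits for slow workers to end.
    """
    merged: List[Record] = []
    unavailable: List[str] = []
    with ThreadPoolExecutor(max_workers=max(1, len(providers))) as pool:
        futures = {
            name: pool.submit(query, scientific_name)
            for name, query in providers.items()
        }
        for name, future in futures.items():
            try:
                for record in future.result(timeout=timeout):
                    merged.append({**record, "Provider": name})
            except Exception:
                unavailable.append(name)
    return merged, unavailable
```

Reporting unavailable providers, and not silently dropping them, lets a portal tell users that a result set may be incomplete.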
How are data being shared?
[Diagram labels: Distributed Data Network; Provider Software (three); Public Database, Local Copy; Public Database, Hosted Copy; Occurrence Database (three).]
The diagram is shown three times, highlighting in turn:
- Local Public Repository
- Local Direct Access
- Hosted Public Repository
Applications: Services
- Taxonomic: uBio
- Geospatial: BioGeomancer
- Niche Modeling: openModeller
BioGeomancer: Geospatial Digitization, Data Validation
Integration: Species Pages
AmphibiaWeb: Species Descriptions
[Figure labels: Literature, Media, etc.; Species Descriptive Data]
Integration: Workflows
Integration: Collaborative Distributed Databases
Case Study: A Network of Mammal Collections
"If you talk to a man in a language he understands, that goes to his head. If you talk to him in his own language, that goes to his heart." (Nelson Mandela)
Challenge: Engender provider participation.
Solution: Gain the trust of collections.
- begin with collections that have no doubts
- lead by example
- preserve the integrity of the original data
- concede control to data providers
- promote the value of data providers
Challenge: Promote the importance of providers.
Solution: Engage users, improve collections.
- preserve the identity of the data source
- accommodate the dynamic nature of data
- provide data validation tools
- provide data feedback mechanisms
Challenge: Maintain provider interest.
Solution: Increase the value of collections through participation.
- increase visibility
- liberate resources
- enable providers to track data access
- provide tools for collection improvement
- provide tools to ask interesting questions
A Network of Mammal Collections
- Relatively well-known and stable taxonomy.
- Willing community with a shared vision.
Distributed Data Network Example
[Diagram labels: Distributed Data Network; Provider Software (three); Public Database, Local Copy; Public Database, Hosted Copy; Occurrence Database (three).]
MaNIS Query
MaNIS Query Summary
MaNIS Result Map
Global Coverage, Mammals, Specimens only
MaNIS: Data Accessibility, Georeferencing
30 institutions, 2M specimens
HerpNET: Synonym Lookup
43 institutions, 4.6M specimens
ORNIS: Data Validation
36 institutions, 37M specimens and observations
Museums Participating in MaNIS/HerpNET/ORNIS (VertNet)
Multiple projects: ORNIS, HerpNET, MaNIS
86 institutions: 74 in North America, 12 others; 70 available now on MHO for searching
Case Study: Network of Madagascar Occurrences
Madagascar, All Taxa, All Occurrences
Global Objective
To serve up-to-date, validated biodiversity occurrence data for the conservation community in Madagascar.
Conservation Applications
[Diagram labels: Models Database; User Queries, Visualization; Automated Modeling; Validated Occurrence Data; Human Expert Review; Automated Validation; Distributed Data Network; Occurrence Data (three sources).]
Automated Data Validation
- Taxonomy: Digital Taxonomic Thesaurus
- Geography: BioGeomancer
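The sketch below covers the two automated checks named above. A plain set of accepted names stands in for the Digital Taxonomic Thesaurus. A rough bounding box around Madagascar stands in for BioGeomancer's geographic validation. Both stand-ins, the function name `validate` and the flag wording are hypothetical. The box coordinates are approximate, and a real check would use a coastline polygon.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical, approximate box around Madagascar:
# (min_lat, min_lon, max_lat, max_lon) in decimal degrees.
MADAGASCAR_BOX: Tuple[float, float, float, float] = (-25.7, 43.1, -11.9, 50.6)


def validate(record: Dict[str, str], accepted_names: Set[str],
             box: Tuple[float, float, float, float] = MADAGASCAR_BOX) -> List[str]:
    """Return flags for a record; an empty list means every check passed."""
    flags: List[str] = []

    # Taxonomy: the name must be known to the (stand-in) thesaurus.
    if record.get("ScientificName", "").strip() not in accepted_names:
        flags.append("unrecognized scientific name")

    # Geography: coordinates must parse, be in range, and fall in the study area.
    try:
        lat = float(record["DecimalLatitude"])
        lon = float(record["DecimalLongitude"])
    except (KeyError, ValueError):
        flags.append("missing or non-numeric coordinates")
        return flags
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        flags.append("coordinates out of range")
    elif not (box[0] <= lat <= box[2] and box[1] <= lon <= box[3]):
        # A common cause is a dropped minus sign on the latitude.
        flags.append("outside study area")
    return flags
```

The flags are not verdicts. Flagged records would go on to human expert review and would not be discarded.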
Human Expert Review
- Taxonomic Review Board
- Experts notified of new records
- Experts flag suspect records
REBIOMA Result Map
Automated Modeling
- Linked to environmental data
- Choice of model algorithms
- Choice of scenarios
- Update models as data change
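"Update models as data change" can be sketched as a cache keyed by species, algorithm and scenario. The cache re-fits a model only when the set of validated records differs from the one used last time. The `fit` callable, the default labels "default" and "current", and the class name are hypothetical. No particular modeling package, such as openModeller, is implied.

```python
import hashlib
import json
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, str]
# Hypothetical model-fitting hook: (records, algorithm, scenario) -> model.
ModelFn = Callable[[List[Record], str, str], Any]


def fingerprint(records: List[Record]) -> str:
    """Order-independent digest of a set of occurrence records."""
    canonical = sorted(json.dumps(r, sort_keys=True) for r in records)
    return hashlib.sha256("\n".join(canonical).encode("utf-8")).hexdigest()


class ModelCache:
    """Re-fit a species model only when its validated records change."""

    def __init__(self, fit: ModelFn) -> None:
        self.fit = fit
        # (species, algorithm, scenario) -> (fingerprint, model)
        self._models: Dict[Tuple[str, str, str], Tuple[str, Any]] = {}

    def get(self, species: str, records: List[Record],
            algorithm: str = "default", scenario: str = "current") -> Any:
        key = (species, algorithm, scenario)
        digest = fingerprint(records)
        cached = self._models.get(key)
        if cached is not None and cached[0] == digest:
            return cached[1]  # records unchanged: reuse the stored model
        model = self.fit(records, algorithm, scenario)
        self._models[key] = (digest, model)
        return model
```

Hashing the record set, and not relying on a timestamp, means that re-ordering records or re-harvesting unchanged data does not trigger a needless re-fit.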
User Queries, Visualization
[Figure: Heteropsis parva, 2000]
Conservation Applications
- Conservation Planning: Protected Area Design, Planning for Climate Change, Monitoring Biodiversity
- Business: Ecotourism, Development Plans, Environmental Impact Statement
- Science: Biogeographic Analyses, Identifying New Survey Areas
Web Portal
John Wieczorek, Lead Developer, REBIOMA
Museum of Vertebrate Zoology, University of California, Berkeley
REBIOMA Distribution Modeling
[Figure labels: Literature, Media, etc.; Species Descriptive Data]
Global Objective
To serve up-to-date, validated biodiversity occurrence data for the conservation community in Madagascar. How?
Biodiversity Informatics: Current Status
- Digitization: progress, bottlenecks
- Mobilization: have solutions
- Applications: more every day
- Integration: issues (persistence, performance, management, IPR)
What you don't know can hurt you: uncertainties in georeferencing
John Wieczorek, Museum of Vertebrate Zoology, University of California, Berkeley
Uncertainties
What comes out of a system depends on
- what goes into it
- what you ask of it
- what happens in between
What species occur where?
Basis for conservation, bio-prospecting, entertainment, survival?
What species occur where?
- species identification
- occurrence location
Problem: most original data are in textual form.
Problem: collection resources are scarce and can't support large-scale digitization.
Scope of the georeferencing problem
- 2.5 × 10⁹ records
- 6 records per locality
- 14 localities per hour
- 15,500 years
(based on the MaNIS Project)
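The slide's estimate follows from chaining its three rates. Working through the numbers, one assumption is needed to reach 15,500 years. Round-the-clock work gives about 3,400 calendar years. The slide's figure matches a working year of 1,920 hours (40 h × 48 weeks). That working-year figure is our inference and is not stated on the slide.

```python
# Figures from the slide (based on the MaNIS Project).
RECORDS = 2.5e9
RECORDS_PER_LOCALITY = 6
LOCALITIES_PER_HOUR = 14

# Assumption (not on the slide): a working year of 40 h x 48 weeks.
HOURS_PER_WORK_YEAR = 40 * 48

localities = RECORDS / RECORDS_PER_LOCALITY   # about 4.2e8 distinct localities
hours = localities / LOCALITIES_PER_HOUR      # about 3.0e7 hours of effort
work_years = hours / HOURS_PER_WORK_YEAR      # about 15,500 working years
round_the_clock_years = hours / (24 * 365)    # about 3,400 calendar years

print(f"{localities:.3g} localities, {hours:.3g} hours")
print(f"{work_years:,.0f} working years, {round_the_clock_years:,.0f} calendar years")
```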
What species occur where?
What can Biodiversity Informatics do?
- Taxonomic Resolution Services
- Georeferencing Services
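A taxonomic resolution service maps the name as written on a label to a currently accepted name. The sketch below assumes two small in-memory tables in place of a real name service. These tables are illustrative and are not uBio's API. "Felis concolor" is used because it appears among the example localities and is a well-known synonym of *Puma concolor*. The functions `normalize` and `resolve` are hypothetical.

```python
from typing import Dict, Optional, Set, Tuple

# Illustrative stand-ins for a taxonomic name service; entries are examples only.
ACCEPTED: Set[str] = {"Puma concolor", "Lynx rufus", "Canis lupus"}
SYNONYMS: Dict[str, str] = {"Felis concolor": "Puma concolor"}


def normalize(name: str) -> str:
    """Collapse whitespace; capitalize the genus and lower-case the epithets."""
    parts = name.split()
    if not parts:
        return ""
    return " ".join([parts[0].capitalize()] + [p.lower() for p in parts[1:]])


def resolve(name: str) -> Tuple[Optional[str], str]:
    """Return (accepted name or None, how it was resolved)."""
    n = normalize(name)
    if n in ACCEPTED:
        return n, "accepted"
    if n in SYNONYMS:
        return SYNONYMS[n], "synonym"
    return None, "unresolved"
```

A real service would also handle authorship, subgenera, misspellings and names that split into several taxa. This normalization handles none of these.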
What we have: Localities we can read
ID | Species | Locality
1 | Lynx rufus | Dawson Rd. N Whitehorse
2 | Pudu puda | cerca de Valdivia
3 | Canis lupus | 20 mi NW Duluth
4 | Felis concolor | Pichi Trafúl
5 | Lama alpaca | near Cuzco
6 | Panthera leo | San Diego Zoo
7 | Sorex lyelli | Lyell Canyon, Yosemite
8 | Orcinus orca | 1 mi W San Juan Island
9 | Ursus arctos | Bear Flat, Haines Junction
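Some of these localities follow a regular "distance, unit, direction, place" pattern, and software can split them into parts before georeferencing. Others, such as "near Cuzco", "San Diego Zoo" or "Dawson Rd. N Whitehorse", cannot be split this way. The sketch below handles only the simple pattern with eight compass directions and miles or kilometres. The pattern, names and output keys are our assumptions. Turning "Duluth" into coordinates would still require a gazetteer lookup, which is not shown.

```python
import re
from typing import Dict, Optional, Union

METERS_PER_UNIT = {"mi": 1609.344, "km": 1000.0}
BEARING_DEG = {"N": 0, "NE": 45, "E": 90, "SE": 135,
               "S": 180, "SW": 225, "W": 270, "NW": 315}

# "<distance> <unit> <direction> <named place>", e.g. "20 mi NW Duluth".
OFFSET_PATTERN = re.compile(
    r"^\s*(?P<dist>\d+(?:\.\d+)?)\s*(?P<unit>mi|km)\.?\s+"
    r"(?P<dir>NE|NW|SE|SW|N|E|S|W)\s+(?P<place>.+?)\s*$"
)


def parse_offset(locality: str) -> Optional[Dict[str, Union[float, str]]]:
    """Split an offset locality into distance (m), bearing and place, or None."""
    m = OFFSET_PATTERN.match(locality)
    if m is None:
        return None  # needs a person, or a smarter parser
    return {
        "distance_m": float(m["dist"]) * METERS_PER_UNIT[m["unit"]],
        "bearing_deg": float(BEARING_DEG[m["dir"]]),
        "place": m["place"],
    }
```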
What we want: Localities we can map
Integration: Species Pages
What is a georeference?
A numerical description of a place that can be mapped.
Davis, Yolo County, California (point method)
Coordinates: 38.5463, -121.7425
Horizontal Geodetic Datum: NAD27
What is an acceptable georeference?
A numerical description of a place that can be mapped, and that describes the spatial extent of a locality and its associated uncertainties.
Sources of uncertainty
1) Map inaccuracy
2) Extent of the reference
3) Coordinate imprecision
4) Undocumented datum
5) Distance imprecision
6) Direction imprecision
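Once each source of uncertainty above is expressed in meters, the sources can be combined into one maximum uncertainty for a point-radius georeference. This sketch assumes the contributions are simply added. Adding them is a deliberately conservative choice of ours, because errors can partly cancel. Direction imprecision is converted to meters geometrically. Two points at the same offset distance d whose bearings differ by θ lie 2·d·sin(θ/2) apart. Function and parameter names are hypothetical.

```python
import math


def direction_error_m(offset_m: float, imprecision_deg: float) -> float:
    """Chord between two points at the same offset whose bearings differ by
    imprecision_deg: 2 * d * sin(theta / 2)."""
    return 2.0 * offset_m * math.sin(math.radians(imprecision_deg) / 2.0)


def combined_uncertainty_m(map_inaccuracy: float = 0.0, extent: float = 0.0,
                           coordinate_imprecision: float = 0.0,
                           datum: float = 0.0, distance_imprecision: float = 0.0,
                           direction_imprecision: float = 0.0) -> float:
    """Conservative total of the six sources, all in meters, summed linearly."""
    parts = [map_inaccuracy, extent, coordinate_imprecision, datum,
             distance_imprecision, direction_imprecision]
    if any(p < 0 for p in parts):
        raise ValueError("uncertainty contributions must be non-negative")
    return sum(parts)
```

For example, take an offset of 1,000 m whose direction could be off by 90°. The direction term alone is about 1,414 m, which is larger than the offset itself.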
Davis, Yolo County, California (bounding-box method)
Corner coordinates: 38.5486, -121.7542 and 38.5450, -121.7394
Horizontal Geodetic Datum: NAD27
Davis, Yolo County, California (point-radius method)
Coordinates: 38.5468, -121.7469
Horizontal Geodetic Datum: NAD27
Maximum Uncertainty: 8325 m
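A bounding box can be turned into a point-radius. Take the box's center as the point and the distance from the center to the farthest corner as the radius. The sketch below does this on a spherical Earth using the haversine distance. It assumes the box does not cross the 180° meridian. Applied to the Davis corner coordinates, it gives a center of 38.5468, -121.7468 and a radius of roughly 0.7 km. The slides do not show how the 8325 m figure was derived, so this sketch is not expected to reproduce it. The function names are hypothetical.

```python
import math
from typing import Tuple

EARTH_RADIUS_M = 6371000.0  # mean Earth radius; spherical approximation


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two points in decimal degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def box_to_point_radius(lat_a: float, lon_a: float,
                        lat_b: float, lon_b: float) -> Tuple[float, float, float]:
    """Center of the box and distance (m) from it to the farthest corner."""
    lat_c = (lat_a + lat_b) / 2.0
    lon_c = (lon_a + lon_b) / 2.0
    corners = [(lat_a, lon_a), (lat_a, lon_b), (lat_b, lon_a), (lat_b, lon_b)]
    radius = max(haversine_m(lat_c, lon_c, la, lo) for la, lo in corners)
    return lat_c, lon_c, radius
```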
What is an ideal georeference?
A numerical description of a place that can be mapped, and that describes the spatial extent of a locality and its associated uncertainties as well as possible.
Davis, Yolo County, California (shape method)
20 mi E Hayfork, California (probability method)
Method Comparison (advantage; disadvantage)
- point: easy to produce; no data-quality information
- bounding-box: simple spatial queries; difficult quality assessment
- point-radius: easy quality assessment; difficult spatial queries
- shape: accurate representation; complex, uniform
- probability: accurate representation; complex, non-uniform
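The comparison lists "difficult spatial queries" as the cost of point-radius. The sketch below shows why a simple range test on the point is not enough. The circle has to be tested against the query box, and it can fall inside, overlap or fall outside. It assumes a spherical Earth and a local flat projection centered on the point. That projection is adequate for radii of a few kilometers away from the poles. The three labels and the function name are ours.

```python
import math
from typing import Tuple

EARTH_RADIUS_M = 6371000.0
Box = Tuple[float, float, float, float]  # (min_lat, min_lon, max_lat, max_lon)


def classify(lat: float, lon: float, radius_m: float, box: Box) -> str:
    """Relate a point-radius georeference to a query box:
    "inside", "overlaps" or "outside"."""
    min_lat, min_lon, max_lat, max_lon = box
    m_per_deg = math.pi / 180.0 * EARTH_RADIUS_M
    kx = m_per_deg * math.cos(math.radians(lat))  # meters per degree of longitude
    # Box edges in meters, relative to the circle's center at the origin.
    x0, x1 = (min_lon - lon) * kx, (max_lon - lon) * kx
    y0, y1 = (min_lat - lat) * m_per_deg, (max_lat - lat) * m_per_deg
    # Point of the box nearest to the center (clamp the origin into the box).
    nx = min(max(0.0, x0), x1)
    ny = min(max(0.0, y0), y1)
    if math.hypot(nx, ny) > radius_m:
        return "outside"
    if x0 <= -radius_m and x1 >= radius_m and y0 <= -radius_m and y1 >= radius_m:
        return "inside"
    return "overlaps"
```

For the Davis point-radius (8325 m), a query box of about 0.1° on a side around the city returns "overlaps". The center is in the box, but the uncertainty circle is not.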
Point-radius Method
Global Biodiversity Information Facility (GBIF)
Manual Georeferencing Tools
Semi-automated Georeferencing Tools
[Figure, panels (a) to (d)]
Rowe, 2005. Elevational gradient analysis of historical museum specimens: a cautionary tale
What species occur where?
Conclusions
1) We can help users find relevant records.
2) We can help users assess data quality and fitness for use.
3) In the end, users must exercise due diligence. Without 1) and 2), they can't.