1
From Field to Federation: Challenges in
Biodiversity Informatics
[Diagram labels: Portal; Provider (×3); Public Database (×3); Collection Database (×3)]
John Wieczorek, Museum of Vertebrate Zoology,
University of California, Berkeley
2
Biodiversity Informatics
3
Biodiversity Informatics: First Use
1992, John Whiting, Canadian Biodiversity
Informatics Consortium, Canadian Museum of Nature
GPS, GIS, RDBMS, environmental economics
4
Biodiversity Informatics: Definition
The creation, integration, analysis, and
understanding of information regarding biological
diversity.
5
Biodiversity Informatics: The Journal
  • Mammals of the World: MaNIS as an example of data
    integration in a distributed network environment
  • Uses and Requirements of Ecological Niche Models
    and Related Distributional Models
  • A synecological framework for systematic
    conservation planning
  • Climate Change and Biodiversity: Some
    Considerations in Forecasting Shifts in Species'
    Potential Distributions
6
Biodiversity Informatics: The Prize
Ebbe Nielsen Prize, offered by the Global
Biodiversity Information Facility (GBIF)
7
Pieces of the Biodiversity Informatics Puzzle
Lane, M.A. 2003. Bulletin of the American Society
for Information Science and Technology.
8
Biodiversity Informatics: Challenges
Digitization, Mobilization, Applications, Integration
9
Biodiversity Informatics: Solutions
Digitization, Mobilization, Applications, Integration
10
Work Programmes
ECAT
DIGIT
DADI
OCB
Digitization (DIGIT)
11
Biodiversity Informatics: Solutions
Digitization, Mobilization, Applications, Integration
12
Digitization
13
Biodiversity Informatics: Solutions
Digitization, Mobilization, Applications, Integration
14
Mobilization: Standards
Information Standards: Darwin Core (DwC), ABCD
Communication Protocols: DiGIR, BioCASE, TAPIR
15
Mobilization: Information Standards
Darwin Core (DwC) and Extensions
ABCD: Access to Biological Collection Data
16
What data are being shared?
  • Who: collector, researcher
  • What: taxonomy, observations
  • Where: geography, locality
  • When: date observed/collected
17
Darwin Core (DwC)
A list of concepts (data fields) needed for
questions based on biodiversity occurrence data.
18
Darwin Core (DwC)
Who?
GlobalUniqueIdentifier, BasisOfRecord, DateLastModified,
ScientificName, IdentifiedBy, Collectors, CollectingMethod,
Sex, LifeStage, Attributes, IndividualCount, HigherGeography,
Locality, DecimalLatitude, DecimalLongitude, GeodeticDatum,
MaximumUncertainty, GeoreferenceProtocol, GeoreferenceSources,
EarliestDateCollected, LatestDateCollected, InformationWithheld
19
Darwin Core (DwC)
What?
20
Darwin Core (DwC)
Where?
21
Darwin Core (DwC)
When?
22
Darwin Core (DwC)
Details
23
Darwin Core (DwC)
Record Metadata
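The Darwin Core concepts named on these slides can be carried as a simple record type. The sketch below is only an illustration under stated assumptions: it uses a subset of the field names from the slides, the class name `OccurrenceRecord` and the helper `is_mappable` are hypothetical, the unit of `MaximumUncertainty` is assumed to be meters, and the sample values are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OccurrenceRecord:
    """A subset of the Darwin Core concepts listed on the slides."""
    GlobalUniqueIdentifier: str
    BasisOfRecord: str
    ScientificName: str                           # what
    Collectors: Optional[str] = None              # who
    Locality: Optional[str] = None                # where (as text)
    DecimalLatitude: Optional[float] = None       # where (as coordinates)
    DecimalLongitude: Optional[float] = None
    GeodeticDatum: Optional[str] = None
    MaximumUncertainty: Optional[float] = None    # assumed to be in meters
    EarliestDateCollected: Optional[str] = None   # when
    LatestDateCollected: Optional[str] = None


def is_mappable(record: OccurrenceRecord) -> bool:
    """True when the record carries coordinates and a datum."""
    return (
        record.DecimalLatitude is not None
        and record.DecimalLongitude is not None
        and record.GeodeticDatum is not None
    )


# Hypothetical record with a textual locality but no coordinates yet.
text_only = OccurrenceRecord(
    GlobalUniqueIdentifier="example:0001",
    BasisOfRecord="specimen",
    ScientificName="Canis lupus",
    Locality="20 mi NW Duluth",
)
print(is_mappable(text_only))
```

A record like `text_only` can be read but not mapped until it is georeferenced.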
24
Mobilization: Communication Protocols
DiGIR: Distributed Generic Information Retrieval
TAPIR: TDWG Access Protocol for Information
Retrieval
25
DiGIR and TAPIR
  • protocols for retrieving structured data from
    multiple, heterogeneous databases across the
    Internet
  • reference implementation of both provider and
    portal software
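
As a rough illustration of the portal and provider roles described above, the sketch below sends one query to several providers and merges the answers. It is an in-memory stand-in, not the DiGIR or TAPIR wire protocol: `make_provider`, `portal_search`, the `Source` key, and the sample records are all hypothetical.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, str]
Provider = Callable[[str], List[Record]]


def make_provider(source: str, records: Iterable[Record]) -> Provider:
    """Wrap a local collection so it answers scientific-name queries."""
    rows = list(records)

    def search(scientific_name: str) -> List[Record]:
        # Tag each hit with its source so the data provider stays identifiable.
        return [
            {**row, "Source": source}
            for row in rows
            if row["ScientificName"] == scientific_name
        ]

    return search


def portal_search(providers: Iterable[Provider], scientific_name: str) -> List[Record]:
    """Send the same query to every provider and merge the responses."""
    results: List[Record] = []
    for provider in providers:
        results.extend(provider(scientific_name))
    return results


# Two made-up collections with heterogeneous holdings.
providers = [
    make_provider("Collection A", [
        {"ScientificName": "Lynx rufus", "Locality": "example locality 1"},
    ]),
    make_provider("Collection B", [
        {"ScientificName": "Lynx rufus", "Locality": "example locality 2"},
        {"ScientificName": "Canis lupus", "Locality": "example locality 3"},
    ]),
]
hits = portal_search(providers, "Lynx rufus")
print(len(hits), sorted(hit["Source"] for hit in hits))
```

Keeping the `Source` tag on every merged record is one simple way to preserve the identity of the data source in the combined result.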

26
How are data being shared?
[Diagram labels: Distributed Data Network; Provider Software (×3); Public Database, Local Copy; Public Database, Hosted Copy; Occurrence Database (×3)]
Local Public Repository
27
How are data being shared?
[Diagram labels: Distributed Data Network; Provider Software (×3); Public Database, Local Copy; Public Database, Hosted Copy; Occurrence Database (×3)]
Local Direct Access
28
How are data being shared?
[Diagram labels: Distributed Data Network; Provider Software (×3); Public Database, Local Copy; Public Database, Hosted Copy; Occurrence Database (×3)]
Hosted Public Repository
29
Biodiversity Informatics: Challenges
Digitization, Mobilization, Applications, Integration
30
Applications: Services
  • Taxonomic: uBio
  • Geospatial: BioGeomancer
  • Niche Modeling: openModeller
31
BioGeomancer: Geospatial Digitization
Data Validation
32
Biodiversity Informatics: Challenges
Digitization, Mobilization, Applications, Integration
33
Integration: Species Pages
34
AmphibiaWeb: Species Descriptions
Literature, Media, etc.
Species Descriptive Data
35
Integration: Workflows
36
Integration: Collaborative Distributed Databases
37
Case Study: A Network of Mammal Collections
38
"If you talk to a man in a language he
understands, that goes to his head. If you
talk to him in his own language, that goes to
his heart."
Nelson Mandela
39
Challenge: Engender provider participation.
Solution: Gain the trust of collections.
  • begin with collections that have no doubts
  • lead by example
  • preserve the integrity of the original data
  • concede control to data providers
  • promote the value of data providers

40
Challenge: Promote the importance of providers.
Solution: Engage users, improve collections.
  • preserve the identity of the data source
  • accommodate the dynamic nature of data
  • provide data validation tools
  • provide data feedback mechanisms

41
Challenge: Maintain provider interest.
Solution: Increase value of collections through participation.
  • increase visibility
  • liberate resources
  • enable providers to track data access
  • provide tools for collection improvement
  • provide tools to ask interesting questions

42
A Network of Mammal Collections
  • Relatively well-known and stable taxonomy.
43
A Network of Mammal Collections
  • Relatively well-known and stable taxonomy.
  • Willing community with a shared vision.
44
Distributed Data Network: Example
[Diagram labels: Distributed Data Network; Provider Software (×3); Public Database, Local Copy; Public Database, Hosted Copy; Occurrence Database (×3)]
45
MaNIS Query
46
MaNIS Query Summary
47
MaNIS Result Map
48
Global Coverage, Mammals, Specimens only
49
MaNIS: Data Accessibility, Georeferencing
30 institutions, 2M specimens
50
HerpNET: Synonym Lookup
43 institutions, 4.6M specimens
51
ORNIS: Data Validation
36 institutions, 37M specimens and observations
52
Museums Participating in MaNIS/HerpNET/ORNIS
(VertNet)
Multiple projects: ORNIS, HerpNET, MaNIS
86 institutions (74 in North America, 12 others);
70 available now on MHO for searching
54
Case Study: Network of Madagascar Occurrences
55
Madagascar, All Taxa, All Occurrences
56
Global Objective
To serve up-to-date, validated biodiversity
occurrence data for the conservation community in
Madagascar.
57
[Diagram labels, in order]
Conservation Applications
Models Database
User Queries, Visualization
Automated Modeling
Validated Occurrence Data
Human Expert Review
Automated Validation
Distributed Data Network
Occurrence Data (×3)
58
Automated Data Validation
  • Taxonomy: Digital Taxonomic Thesaurus
  • Geography: BioGeomancer
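
The two automated checks named on this slide, taxonomy and geography, can be sketched in a few lines. This is a minimal sketch with stand-ins, not the project's implementation: a plain set takes the place of a taxonomic thesaurus, a rough bounding box takes the place of a geospatial service, and `ACCEPTED_NAMES`, `MADAGASCAR_BOX` (approximate values), and `validate_record` are hypothetical names.

```python
from typing import Dict, List, Optional, Set, Tuple

# (min_lat, min_lon, max_lat, max_lon)
BoundingBox = Tuple[float, float, float, float]

# Stand-in for a taxonomic thesaurus lookup (assumption).
ACCEPTED_NAMES: Set[str] = {"Heteropsis parva"}

# Rough box around Madagascar; the corner values are approximate assumptions.
MADAGASCAR_BOX: BoundingBox = (-26.0, 43.0, -11.5, 51.0)


def validate_record(
    record: Dict[str, object],
    accepted_names: Set[str] = ACCEPTED_NAMES,
    region: BoundingBox = MADAGASCAR_BOX,
) -> List[str]:
    """Return a list of problems; an empty list means both checks passed."""
    problems: List[str] = []

    # Taxonomy check: is the name known to the thesaurus?
    if record.get("ScientificName") not in accepted_names:
        problems.append("unrecognized scientific name")

    # Geography check: are coordinates present and inside the region?
    lat: Optional[float] = record.get("DecimalLatitude")  # type: ignore[assignment]
    lon: Optional[float] = record.get("DecimalLongitude")  # type: ignore[assignment]
    if lat is None or lon is None:
        problems.append("missing coordinates")
    else:
        min_lat, min_lon, max_lat, max_lon = region
        if not (min_lat <= lat <= max_lat and min_lon <= lon <= max_lon):
            problems.append("coordinates outside region")
    return problems


good = {"ScientificName": "Heteropsis parva",
        "DecimalLatitude": -18.9, "DecimalLongitude": 47.5}
suspect = {"ScientificName": "Heteropsis parvva",
           "DecimalLatitude": 38.5463, "DecimalLongitude": -121.7425}
print(validate_record(good))
print(validate_record(suspect))
```

Records that come back with a non-empty problem list are natural candidates to pass on for human review rather than to discard.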
63
Human Expert Review
  • Taxonomic Review Board
  • Experts notified of new records
  • Experts flag suspect records
66
Global Objective
To serve up-to-date, validated biodiversity
occurrence data for the conservation community in
Madagascar.
68
REBIOMA Result Map
70
Automated Modeling
  • Linked to environmental data
  • Choice of model algorithms
  • Choice of scenarios
  • Update models as data change
72
User Queries, Visualization
Heteropsis parva
2000
74
Conservation Applications
  • Conservation Planning: Protected Area Design,
    Planning for Climate Change, Monitoring Biodiversity
  • Business: Ecotourism, Development Plans,
    Environmental Impact Statement
  • Science: Biogeographic Analyses, Identifying New
    Survey Areas
76
Web Portal
John Wieczorek, Lead Developer, REBIOMA
Museum of Vertebrate Zoology, University of California,
Berkeley
77
REBIOMA: Distribution Modeling
Literature, Media, etc.
Species Descriptive Data
78
Global Objective
To serve up-to-date, validated biodiversity
occurrence data for the conservation community in
Madagascar.
79
Global Objective
To serve up-to-date, validated biodiversity
occurrence data for the conservation community in
Madagascar. How?
91
Biodiversity Informatics: Current Status
  • Digitization: progress, bottlenecks
  • Mobilization: have solutions
  • Applications: more every day
  • Integration: issues (persistence, performance,
    management, IPR)
122
What you don't know can hurt you: uncertainties
in georeferencing
John Wieczorek, Museum of Vertebrate Zoology,
University of California, Berkeley
123
Uncertainties
What comes out of a system depends on
  • what goes into it
  • what you ask of it
  • what happens in between

124
What species occur where?
Basis for: conservation, bio-prospecting,
entertainment, survival?
125
What species occur where?
species identification
126
What species occur where?
occurrence location
127
What species occur where?
occurrence location
Problem: most original data are in textual form
Problem: collection resources are scarce and
can't support large-scale digitization
128
Scope of the georeferencing problem
  • 2.5 × 10^9 records
  • 6 records per locality
  • 14 localities per hour
  • 15,500 years
based on the MaNIS Project
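
The arithmetic behind the 15,500-year figure can be reproduced as a back-of-the-envelope check. The slide does not state how many working hours make up a year; the value of 1,920 hours (40 hours a week for 48 weeks) used below is an assumption chosen because it recovers the slide's figure.

```python
records = 2.5e9             # specimen records to georeference (from the slide)
records_per_locality = 6    # average records sharing one locality (from the slide)
localities_per_hour = 14    # georeferencing rate (from the slide)
hours_per_year = 1920       # ASSUMPTION: 40 h/week x 48 weeks, not stated on the slide

localities = records / records_per_locality
hours = localities / localities_per_hour
years = hours / hours_per_year

print(f"{localities:,.0f} localities, {hours:,.0f} hours, about {years:,.0f} person-years")
```

Under that assumption the result lands within a year of the slide's 15,500, which suggests the figure is meant as person-years of manual effort.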
129
What species occur where?
What can Biodiversity Informatics do?
Taxonomic Resolution Services
130
What species occur where?
What can Biodiversity Informatics do?
Taxonomic Resolution Services
Georeferencing Services
131
What we have: Localities we can read

ID  Species         Locality
1   Lynx rufus      Dawson Rd. N Whitehorse
2   Pudu puda       cerca de Valdivia
3   Canis lupus     20 mi NW Duluth
4   Felis concolor  Pichi Trafúl
5   Lama alpaca     near Cuzco
6   Panthera leo    San Diego Zoo
7   Sorex lyelli    Lyell Canyon, Yosemite
8   Orcinus orca    1 mi W San Juan Island
9   Ursus arctos    Bear Flat, Haines Junction
132
What we want: Localities we can map
133
Integration: Species Pages
134
What is a georeference?
A numerical description of a place that can be
mapped.
135
Davis, Yolo County, California
Coordinates: 38.5463, -121.7425
Horizontal Geodetic Datum: NAD27
point method
136
What is an acceptable georeference?
A numerical description of a place that can be
mapped and that describes the spatial extent of a
locality and its associated uncertainties.
137
Sources of uncertainty
  • 1) Map inaccuracy
  • 2) Extent of the reference
  • 3) Coordinate imprecision
  • 4) Undocumented datum
  • 5) Distance imprecision
  • 6) Direction imprecision


138
Davis, Yolo County, California
Coordinates: 38.5486, -121.7542 and 38.5450, -121.7394
Horizontal Geodetic Datum: NAD27
bounding-box method
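
A bounding-box georeference makes containment queries very simple, as the sketch below suggests. It assumes the two coordinate pairs on the slide are the northwest and southeast corners, ignores boxes that cross the antimeridian, and uses a hypothetical `BoundingBox` class.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class BoundingBox:
    """Latitude/longitude box in decimal degrees (no antimeridian crossing)."""
    north: float
    west: float
    south: float
    east: float

    def contains(self, lat: float, lon: float) -> bool:
        """A spatial query reduces to four comparisons."""
        return self.south <= lat <= self.north and self.west <= lon <= self.east

    def center(self) -> Tuple[float, float]:
        """Midpoint of the box, as (latitude, longitude)."""
        return ((self.north + self.south) / 2, (self.west + self.east) / 2)


# Values from the slide, assuming NW corner first and SE corner second.
davis = BoundingBox(north=38.5486, west=-121.7542, south=38.5450, east=-121.7394)

print(davis.contains(38.5463, -121.7425))
print(davis.center())
```

The box says where the locality might be, but on its own it carries no single number summarizing how uncertain the georeference is.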
139
Davis, Yolo County, California
Coordinates: 38.5468, -121.7469
Horizontal Geodetic Datum: NAD27
Maximum Uncertainty: 8325 m
point-radius method
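
With a point-radius georeference, testing whether a location is consistent with the record becomes a distance comparison. The sketch below is an approximation under stated assumptions: it uses the haversine formula on a sphere of mean Earth radius and ignores datum differences, and the function names `haversine_m` and `within_uncertainty` are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical approximation


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_uncertainty(center_lat: float, center_lon: float, radius_m: float,
                       lat: float, lon: float) -> bool:
    """True if (lat, lon) lies inside the point-radius circle."""
    return haversine_m(center_lat, center_lon, lat, lon) <= radius_m


# Point-radius georeference for Davis, taken from the slide.
DAVIS = (38.5468, -121.7469, 8325.0)

# The point-method coordinates for Davis fall well inside the circle.
print(within_uncertainty(*DAVIS, 38.5463, -121.7425))
# A hypothetical location roughly 20 km to the east does not.
print(within_uncertainty(*DAVIS, 38.58, -121.49))
```

The radius doubles as a quality measure: records can be filtered by maximum uncertainty before any spatial analysis is attempted.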
140
What is an ideal georeference?
A numerical description of a place that can be
mapped and that describes the spatial extent of a
locality and its associated uncertainties as
well as possible.
141
Davis, Yolo County, California
shape method
142
20 mi E Hayfork, California
probability method
143
Method Comparison
  • point: easy to produce; no data quality
  • bounding-box: simple spatial queries; difficult
    quality assessment
  • point-radius: easy quality assessment; difficult
    spatial queries
  • shape: accurate representation; complex, uniform
  • probability: accurate representation; complex,
    non-uniform
144
Point-radius Method
Global Biodiversity Information Facility (GBIF)
145
Manual Georeferencing Tools

146
Semi-automated Georeferencing Tools

147
[Figure panels (a) to (d)]
Rowe, 2005. Elevational gradient analysis of
historical museum specimens: a cautionary tale
148
Rowe, 2005. Elevational gradient analysis of
historical museum specimens: a cautionary tale
149
What species occur where?
Conclusions
  • 1) We can help users find relevant records
  • 2) We can help users assess data quality and
    fitness for use
  • 3) In the end, users must exercise due diligence.
    Without 1) and 2), they can't.