Title: The complexity of biodiversity knowledge
1The complexity of biodiversity knowledge
- Andrew C. Jones
- Cardiff University
- Andrew.C.Jones_at_cs.cardiff.ac.uk
- Malcolm Scoble
- The Natural History Museum
- M.Scoble_at_nhm.ac.uk
2Purpose of talk
- Malcolm and Andrew are both investigators in BiodiversityWorld (BDW)
- There are many problems BDW doesn't solve yet
- and the funding runs out tomorrow!
- We'll present:
- BiodiversityWorld as a framework to support biodiversity research
- Other projects in which biodiversity informatics problems have been addressed individually
- Major challenge: draw these disparate efforts together
3Part 1 (Andrew Jones)
4Why Biodiversity Informatics is hard
- Need to integrate data and tools of different kinds for interesting in silico analyses
- Various computer science issues, e.g.
- Human-Computer Interaction
- Design of environments to support scientific research
- Interoperability
- Complexity and heterogeneity of data
- Differences of scientific opinion
- Data quality problems
5The BiodiversityWorld project
- 3-year e-Science project funded by BBSRC
- Partners: The University of Reading, Cardiff University, The Natural History Museum, Southampton University
- Aim
- Build a Biodiversity Grid (Problem Solving Environment to support Biodiversity research)
- Support discovery and use of arbitrary tools and data sources for interesting in silico experiments
- Provide environment to get beyond the "cutting and pasting into Word documents" approach to data integration and analysis
6Example problems for BiodiversityWorld
- How should conservation efforts be concentrated?
- (example of Biodiversity Richness and Conservation Evaluation)
- Where might a species be expected to occur, under present or predicted climatic conditions?
- (example of Bioclimatic and Ecological Niche Modelling)
- How can geographical information assist in selection among possible phylogenetic trees?
- (example of Phylogenetic Analysis and Palaeoclimate Modelling)
7BiodiversityWorld architecture
[Architecture diagram. Labelled components: User interface; Presentation; Workflow enactment engine; Metadata repository; Wrapped resources; Native BiodiversityWorld Resources; BGI API; BiodiversityWorld-GRID Interface (BGI); The GRID]
10Some problems not fully solved in BDW
- Flexible data access
- BGI designed to make BDW maintainable, but currently assumes each resource has a predefined set of operations
- BioDA project investigated use of OGSA-DAI in BDW
- HCI issues
- A much more exploratory approach to workflow construction might be appropriate?
- Semantic interoperability and data quality
- Metadata repository: basic information only
- Only basic solution to species naming problems (SPICE)
- Other problems of descriptive terms, differences of expert opinion, etc., remain to be addressed
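To make the "predefined set of operations" limitation concrete, here is a minimal Python sketch. It is not the BGI API: the class `WrappedResource`, the `invoke` method and the `name_catalogue` resource are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class WrappedResource:
    """Hypothetical resource wrapper exposing a fixed, predeclared set of operations."""
    name: str
    operations: Dict[str, Callable[..., Any]] = field(default_factory=dict)

    def invoke(self, operation: str, *args: Any) -> Any:
        # Only operations declared when the resource was wrapped can be called;
        # anything else is rejected, which is what limits flexible data access.
        if operation not in self.operations:
            raise KeyError(f"{self.name} has no operation {operation!r}")
        return self.operations[operation](*args)


# Hypothetical resource with a single predefined operation.
names = WrappedResource(
    name="name_catalogue",
    operations={"lookup": lambda genus: [f"{genus} scoparius"]},
)

result = names.invoke("lookup", "Cytisus")
```

An ad hoc query that was not anticipated when the resource was wrapped has no entry in `operations`, so the caller cannot express it without changing the wrapper.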
11Complexity of biodiversity data: a multi-dimensional problem
- Same specimen might be described with differences of
- Terminology
- Opinion about identification
- Opinion about whether a particular feature is present
- Accuracy
- Experts may differ as to
- Circumscription associated with a given scientific name
- (So may not be describing the same concept)
- Terminology used to describe a given taxon
- Accepted name for a species in a taxonomic checklist
- There may be errors!
- ...
12SPICE for Species 2000
- BBSRC/EPSRC- and EU-funded
- SPecies 2000 Interoperability Co-ordination Environment
- Aims
- build scalable, federated scientific name catalogue organised by taxon (species, etc.)
- provide synonymy server, enriching information retrieval
- Issue: how to build an architecture to integrate specialist, heterogeneous databases, providing a consistent federated view of broader scope?
- Common Data Model sufficed
- data requirements of federation identical for each database
- small set of "canned" queries adequate for the catalogue
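The common-data-model idea can be sketched in a few lines of Python. This is an illustration only, not the SPICE implementation: the record type `NameRecord`, the two database layouts and the canned query `search_by_name` are assumptions, and the sample rows reuse the Caragana names from the LITCHI example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class NameRecord:
    """Hypothetical common-data-model record for one scientific name."""
    genus: str
    epithet: str
    status: str        # "accepted" or "synonym"
    accepted_as: str   # accepted name this record resolves to
    source: str


def wrap_db_a(rows: List[dict]) -> List[NameRecord]:
    # Database A (assumed layout) stores genus and epithet in separate fields.
    return [NameRecord(r["genus"], r["species"], r["status"], r["accepted"], "A")
            for r in rows]


def wrap_db_b(rows: List[tuple]) -> List[NameRecord]:
    # Database B (assumed layout) stores the whole name as one string.
    records = []
    for full_name, status, accepted in rows:
        genus, epithet = full_name.split(" ", 1)
        records.append(NameRecord(genus, epithet, status, accepted, "B"))
    return records


def search_by_name(catalogue: List[NameRecord], genus: str, epithet: str) -> List[str]:
    """Canned query: resolve a name (accepted or synonym) to accepted names."""
    return sorted({r.accepted_as for r in catalogue
                   if r.genus == genus and r.epithet == epithet})


catalogue = (
    wrap_db_a([{"genus": "Caragana", "species": "arborescens",
                "status": "accepted", "accepted": "Caragana arborescens"}])
    + wrap_db_b([("Caragana sibirica", "synonym", "Caragana arborescens")])
)

resolved = search_by_name(catalogue, "Caragana", "sibirica")
```

Because every wrapper emits the same record type, the federation only needs a handful of fixed queries like this one, whatever the native layout of each database.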
13SPICE internal architecture
14LITCHI
- BBSRC/EPSRC- and EU-funded
- Logic-based Integration of Taxonomic Conflicts in Heterogeneous Information systems
- Aim: detect conflicts between species checklists and either
- Assist in producing a consistent checklist, or
- Generate correspondences between checklists (cross-map)
- Addressing problems of species classification and naming variations when accessing species-related data
- More general, semantic interoperability issue
- detecting conflicts between different expert views of same subject matter
- supporting data access based on these views
15LITCHI example
- Checklist 1
- Caragana arborescens Lam. (accepted name)
- Caragana sibirica Medikus (synonym)
- Checklist 2
- Caragana sibirica Medikus (accepted name)
- Caragana arborescens Lam. (synonym)
- (Lam. = Lamarck)
A full name which is not a pro-parte name may not appear as both an accepted name and a synonym in the same checklist
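The rule above, and the cross-checklist disagreement in the example, can be sketched in Python as follows. This is not LITCHI's logic-based implementation; the function names are invented, and the sketch assumes none of the names is pro parte, so that exemption is not modelled.

```python
from typing import List, Set, Tuple

Checklist = List[Tuple[str, str]]  # (full name, "accepted" or "synonym")

checklist_1: Checklist = [("Caragana arborescens Lam.", "accepted"),
                          ("Caragana sibirica Medikus", "synonym")]
checklist_2: Checklist = [("Caragana sibirica Medikus", "accepted"),
                          ("Caragana arborescens Lam.", "synonym")]


def names_with_status(checklist: Checklist, status: str) -> Set[str]:
    return {name for name, s in checklist if s == status}


def violates_rule(checklist: Checklist) -> Set[str]:
    """Names listed as both accepted name and synonym within ONE checklist."""
    return (names_with_status(checklist, "accepted")
            & names_with_status(checklist, "synonym"))


def cross_conflicts(a: Checklist, b: Checklist) -> Set[str]:
    """Names accepted in one checklist but treated as a synonym in the other."""
    return ((names_with_status(a, "accepted") & names_with_status(b, "synonym"))
            | (names_with_status(b, "accepted") & names_with_status(a, "synonym")))


within_1 = violates_rule(checklist_1)
merged = violates_rule(checklist_1 + checklist_2)
between = cross_conflicts(checklist_1, checklist_2)
```

Each checklist is internally consistent, so `within_1` is empty. Naively merging the two breaks the rule for both names, which is why the conflict has to be detected and resolved or cross-mapped first.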
16Name relationships (LITCHI 2)
17myViews
- Not funded yet: limited proof-of-concept prototype only
- Addresses problem that an expert may wish to generate taxon descriptions which are
- Coherent
- Mapped explicitly to other taxon descriptions, and
- Based directly on existing documentation (monographs, etc.), rather than completely re-coded in some restrictive formalism with a new vocabulary
18Example: describing the same things?
- Description A
- Sarothamnus scoparius (L.) Wimm. ex Koch.
- Broom
- ... a bush which is 50-200 cm high ...
- Description B
- Cytisus scoparius
- Yellow broom
- ... a small shrub up to 6ft or more ... native in its yellow form ...
- Description C
- Cytisus scoparius (L.) Link.
- Broom
- ... a deciduous shrub growing to 2.4m by 1m at a fast rate ... scented flowers ...
- Description D
- Common Broom
- Cytisus scoparius
- ... covered in profuse golden-yellow flowers ... shrub about 1-3m tall ...
- Description E
- Broom
- Cytisus scoparius
19Things we might want to do
- In a system where
- data is held in as raw a form as possible, to avoid information loss, but
- we can impose various views and hypotheses
- we might wish to
- Create our own view of the data
- For a given piece of knowledge, we could
- accept it unaltered
- accept but re-express in our terms (e.g. different scientific name, different units ...)
- state it is equivalent to another piece of knowledge (e.g. minor differences in measurements)
- flag it as wrong
- ...
- In relation to another's view, we might
- include or ignore it
- declare some mapping applicable to a group of items (e.g. every species of Sarothamnus is mapped to Cytisus)
- ...
- Reason with differing levels of precision simultaneously (e.g. binary/continuous characters derived from same features)
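A minimal Python sketch of such a view, under stated assumptions: the raw statements are never modified, and a personal view is a set of decisions applied on top of them. The names `Statement` and `View`, and the choice of metres as "our" unit, are hypothetical; the sample values come from the broom descriptions used in this talk.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

TO_METRES = {"m": 1.0, "cm": 0.01, "ft": 0.3048}


@dataclass(frozen=True)
class Statement:
    source: int
    genus: str
    epithet: str
    prop: str
    value: Tuple


@dataclass
class View:
    genus_map: Dict[str, str] = field(default_factory=dict)       # group mapping
    ignored_sources: Set[int] = field(default_factory=set)        # ignore another's view
    flagged_wrong: Set[Statement] = field(default_factory=set)    # reject single items

    def apply(self, raw: List[Statement]) -> List[Statement]:
        accepted = []
        for s in raw:
            if s.source in self.ignored_sources or s in self.flagged_wrong:
                continue
            # Re-express in our terms: our genus name, our units (metres).
            genus = self.genus_map.get(s.genus, s.genus)
            value = s.value
            if s.prop == "size_range":
                low, high, unit = s.value
                factor = TO_METRES[unit]
                value = (round(low * factor, 4), round(high * factor, 4), "m")
            accepted.append(Statement(s.source, genus, s.epithet, s.prop, value))
        return accepted


raw = [
    Statement(14, "Sarothamnus", "scoparius", "size_range", (50, 200, "cm")),
    Statement(8, "Cytisus", "scoparius", "size_range", (1, 3, "m")),
    Statement(1, "Cytisus", "scoparius", "scent", ("absent",)),
]

view = View(genus_map={"Sarothamnus": "Cytisus"}, ignored_sources={1})
personal = view.apply(raw)
```

Keeping `raw` untouched means a second expert can apply a different `View` to the same statements without either opinion overwriting the other.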
20An experimental prototype
- Proof of concept ...
- arbitrary, small data set from various sources: Cytisus and Genista species
- No real front end or back end yet!
- Implemented in Prolog (a logic programming language)
- Formalisms to record complex assertions and their sources
- Ontological knowledge not currently separated out explicitly: rules perform inference
- User makes his/her own assertions about (for example)
- synonymy
- which assertions of others to accept
- ...
- ... both very specific and more general rules
- Main purpose: illustrate handling multiple opinions/hypotheses
21Sample knowledge base extracts
- assertion(1, association(2, 3, absent(scent(flowers)))).
- assertion(1, property(2, yellow(flowers))).
- assertion(1, label(2, common('Broom'))).
- assertion(1, label(2, species('Cytisus', 'scoparius'))).
- assertion(4, property(5, shrublet(whole))).
- assertion(4, property(5, deciduous(whole))).
- assertion(4, property(5, size(6, in, whole))).
- assertion(4, property(5, deep_yellow(flowers))).
- assertion(4, property(5, small(leaves))).
- assertion(4, label(5, species('Cytisus', 'ardoinii'))).
- assertion(4, property(7, size(6, ft, whole))).
- assertion(4, label(7, species('Cytisus', 'scoparius'))).
- assertion(12, label(13, common('Broom'))).
- assertion(12, label(13, common('Scotch Broom'))).
- assertion(12, property(13, compound('sparteine'))).
- assertion(12, property(13, compound('tyramine'))).
- assertion(12, label(13, species('Sarothamnus', 'scoparius'))).
- assertion(14, label(15, species('Sarothamnus', 'scoparius'))).
- assertion(14, property(15, size_range(50, 200, cm, whole))).
- assertion(14, property(15, bright_yellow(flowers))).
- assertion(16, label(17, species('Cytisus', 'scoparius'))).
- assertion(16, property(17, max_height(2.4, m, whole))).
- assertion(16, property(17, max_width(1, m, whole))).
- assertion(16, property(17, present(scent(flowers)))).
- assertion(8, property(9, golden_yellow(flowers))).
- assertion(8, property(9, size_range(1, 3, m, whole))).
- assertion(8, label(9, species('Cytisus', 'scoparius'))).
Reading an assertion: source 12 asserts that item 13's label is the common name 'Scotch Broom'
22Deducing from the knowledge base
- ?- display_accepted_props('Cytisus', 'ardoinii').
- shrublet(whole)
- deciduous(whole)
- size(6, in, whole)
- deep_yellow(flowers)
- small(leaves)
- Yes
- ?- display_accepted_props('Cytisus', 'scoparius').
- yellow(flowers)
- size(6, ft, whole)
- golden_yellow(flowers)
- size_range(1, 3, m, whole)
- max_height(2.4, m, whole)
- max_width(1, m, whole)
- present(scent(flowers))
- absent(spines)
- absent(scent(flowers))
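The query above can be mimicked in Python to show the shape of the lookup: find every item labelled with the species name, then collect the properties asserted for those items. This is a sketch, not the prototype's Prolog. The tuple encoding, the function `accepted_props` and the optional `accepted_sources` filter are assumptions; only a few assertions from the knowledge base are included, and `association` facts are left out.

```python
from typing import List, Optional, Set, Tuple

# (source, (kind, item, value)), mirroring assertion(Source, Fact).
assertions: List[Tuple[int, tuple]] = [
    (4, ("label", 5, ("species", "Cytisus", "ardoinii"))),
    (4, ("property", 5, "shrublet(whole)")),
    (4, ("property", 5, "deciduous(whole)")),
    (4, ("label", 7, ("species", "Cytisus", "scoparius"))),
    (4, ("property", 7, "size(6, ft, whole)")),
    (14, ("label", 15, ("species", "Sarothamnus", "scoparius"))),
    (14, ("property", 15, "bright_yellow(flowers)")),
]


def accepted_props(kb: List[Tuple[int, tuple]], genus: str, epithet: str,
                   accepted_sources: Optional[Set[int]] = None) -> List[str]:
    """Properties of every item labelled species(genus, epithet)."""
    def trusted(source: int) -> bool:
        # With no explicit choice, every source is accepted.
        return accepted_sources is None or source in accepted_sources

    wanted = ("species", genus, epithet)
    items = {item for src, (kind, item, value) in kb
             if kind == "label" and value == wanted and trusted(src)}
    return [value for src, (kind, item, value) in kb
            if kind == "property" and item in items and trusted(src)]


ardoinii = accepted_props(assertions, "Cytisus", "ardoinii")
scoparius = accepted_props(assertions, "Cytisus", "scoparius")
```

Without any synonymy, the property asserted under the label Sarothamnus scoparius does not appear in the result for Cytisus scoparius.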
23Adding synonymy (1)
- User regards any statement about a Sarothamnus species as being a statement about a Cytisus species with same epithet
- assertion(20, synonym(species('Cytisus', Epithet), _, species('Sarothamnus', Epithet), _)).
- (Could be more restrictive, e.g. apply to only particular information sources)
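The effect of this genus-level rule can be sketched in Python. The encoding is an assumption rather than the prototype's own: `genus_synonyms` holds (accepted genus, synonym genus) pairs, the rule is applied in one direction only, and `props_by_label` is a hand-made extract of the knowledge base.

```python
from typing import Dict, List, Set, Tuple

Name = Tuple[str, str]  # (genus, epithet)

# User's rule: a Sarothamnus species is the Cytisus species with the same epithet.
genus_synonyms: List[Tuple[str, str]] = [("Cytisus", "Sarothamnus")]

props_by_label: Dict[Name, List[str]] = {
    ("Cytisus", "scoparius"): ["yellow(flowers)", "size(6, ft, whole)"],
    ("Sarothamnus", "scoparius"): ["compound(sparteine)", "compound(tyramine)"],
    ("Cytisus", "ardoinii"): ["small(leaves)"],
}


def labels_for(name: Name) -> Set[Name]:
    """The name itself plus every synonym label the rules generate for it."""
    genus, epithet = name
    labels = {name}
    for accepted, synonym in genus_synonyms:
        if genus == accepted:
            labels.add((synonym, epithet))  # the epithet is carried over unchanged
    return labels


def props_with_synonyms(name: Name) -> List[str]:
    found: List[str] = []
    for label in sorted(labels_for(name)):
        found.extend(props_by_label.get(label, []))
    return found


merged = props_with_synonyms(("Cytisus", "scoparius"))
```

One rule covers every epithet in the genus, so nothing has to be restated per species; restricting it to particular sources would amount to adding a source filter to `labels_for`.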
24Adding synonymy (2)
- ?- display_accepted_props('Cytisus', 'scoparius').
- yellow(flowers)
- size(6, ft, whole)
- golden_yellow(flowers)
- size_range(1, 3, m, whole)
- max_height(2.4, m, whole)
- max_width(1, m, whole)
- present(scent(flowers))
- compound(sparteine)
- compound(tyramine)
- size_range(50, 200, cm, whole)
- bright_yellow(flowers)
- absent(spines)
- absent(scent(flowers))
- Yes
- ?- display_contradictions_for('Cytisus', 'scoparius').
- size_range(1, 3, m, whole), size_range(50, 200, cm, whole)
- present(scent(flowers)), absent(scent(flowers))
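The slides do not show the rule the prototype uses to detect contradictions. The Python sketch below assumes a purely structural one: two properties clash if they share a single-valued functor but differ in arguments, or if one says `present` and the other `absent` for the same feature. The set `SINGLE_VALUED` is an assumption made for this illustration.

```python
from itertools import combinations
from typing import List, Tuple

Prop = Tuple[str, tuple]  # (functor, arguments)

props: List[Prop] = [
    ("yellow", ("flowers",)),
    ("size_range", (1, 3, "m", "whole")),
    ("max_height", (2.4, "m", "whole")),
    ("present", ("scent(flowers)",)),
    ("size_range", (50, 200, "cm", "whole")),
    ("bright_yellow", ("flowers",)),
    ("absent", ("spines",)),
    ("absent", ("scent(flowers)",)),
]

# Assumed: properties of which a taxon should have only one value.
SINGLE_VALUED = {"size_range", "max_height", "max_width"}


def contradictions(properties: List[Prop]) -> List[Tuple[Prop, Prop]]:
    found = []
    for (f1, a1), (f2, a2) in combinations(properties, 2):
        value_clash = f1 == f2 and f1 in SINGLE_VALUED and a1 != a2
        presence_clash = {f1, f2} == {"present", "absent"} and a1 == a2
        if value_clash or presence_clash:
            found.append(((f1, a1), (f2, a2)))
    return found


clashes = contradictions(props)
```

On this extract the structural rule yields the same two pairs as the query output above. A unit-aware check would behave differently: 50-200 cm is 0.5-2 m, which overlaps 1-3 m, so whether the two size ranges count as contradictory depends on how strictly "same value" is defined.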
25Some important issues for future work
- Complexity, e.g.
- Trade-off: effective resource discovery v. computational expense of traversing rich ontology
- Scalability of taxonomic conflict detection
- May find large data sets need clever techniques such as Rete network
- Scalability of inference in myViews: caching inferred information
- Managing and ranking large result sets
- How to rank resources discovered
- How to rank conflicts
- to present users with matches they are likely to want
- Joining all these fragmentary projects up together
26Part 2 (Malcolm Scoble)
27The complexity of taxonomic/biodiversity data
Specimen (unit) data
Collection-level
Species/taxon concept
Locality
Species name
DNA barcodes
Species concepts
Observations
Date of description
Synonyms
Type specimen
Genus name (for binomial)
Date of specimen collection
Time of specimen collection
Images
Name of collector
Homonyms
Author of taxon
28Taxonomy: from a fragmented to a distributed resource
- Where we want to be
- Less fragmented: single site or distributed access
- Easier to update
- Coordinated effort
- Electronic (or dual) medium
- Free access to data
- Taxonomy easier to use
- Where we are now
- Fragmented results
- Fragmented effort
- Largely a paper medium (restricted access)
29Projects to integrate biodiversity data
- BioCISE (collection-level)
- ENHSIN (specimen (unit)-level)
- BioCASE (unit- and collection-level)
- Species 2000 (species nomenclature)
- SYNTHESYS (taxonomic infrastructure)
- ENBI (network of biodiversity information)
- EDIT (distributed approach to taxonomy)
- PBIs (inventorying the planet's biodiversity)
- CATE (Creating a Taxonomic e-Science)
30BioCASE National Node Network
Collection-level
- 31 National Nodes
- Core Meta Database is updated every night
31 All levels
A Biological Collections Service for Europe
33Creating a taxonomic e-science (CATE)
- Literature scattered over 250 years of paper publications.
- Data inaccessible other than to specialist users
- Aim: to transfer in toto the taxonomy of two groups of organisms to the web (Hawkmoths and Aroids).
- Broad aim: to encourage migration of taxonomy to the web.
- Provide data for those studying biodiversity.
- Encourage quality control, peer-review and the development of consensus taxonomies in the web environment.
- Develop means of citation for web-based revisions
Arisaema candidissimum. Photo: RBG Kew
The Hawkmoth Sphinx caligineus sinicus from Beijing, China. Photo: Tony Pittaway