The complexity of biodiversity knowledge

1
The complexity of biodiversity knowledge
  • Andrew C. Jones
  • Cardiff University
  • Andrew.C.Jones_at_cs.cardiff.ac.uk
  • Malcolm Scoble
  • The Natural History Museum
  • M.Scoble_at_nhm.ac.uk

2
Purpose of talk
  • Malcolm and Andrew are both investigators in
    BiodiversityWorld (BDW)
  • There are many problems BDW doesn't solve yet
  • and the funding runs out tomorrow!
  • We'll present:
  • BiodiversityWorld as a framework to support
    biodiversity research
  • Other projects in which biodiversity informatics
    problems have been addressed individually
  • Major challenge: draw these disparate efforts
    together

3
Part 1 (Andrew Jones)
4
Why Biodiversity Informatics is hard
  • Need to integrate data and tools of different
    kinds for interesting in silico analyses
  • Various computer science issues, e.g.
  • Human-Computer Interaction
  • Design of environments to support scientific
    research
  • Interoperability
  • Complexity and heterogeneity of data
  • Differences of scientific opinion
  • Data quality problems

5
The BiodiversityWorld project
  • 3-year e-Science project funded by BBSRC
  • Partners: The University of Reading, Cardiff
    University, The Natural History Museum,
    Southampton University
  • Aim:
  • Build a Biodiversity Grid (Problem Solving
    Environment to support Biodiversity research)
  • Support discovery and use of arbitrary tools and
    data sources for interesting in silico experiments
  • Provide an environment to get beyond the "cutting
    and pasting into Word documents" approach to data
    integration and analysis

6
Example problems for BiodiversityWorld
  • How should conservation efforts be concentrated?
  • (example of Biodiversity Richness and
    Conservation Evaluation)
  • Where might a species be expected to occur, under
    present or predicted climatic conditions?
  • (example of Bioclimatic and Ecological Niche
    Modelling)
  • How can geographical information assist in
    selection among possible phylogenetic trees?
  • (example of Phylogenetic Analysis and
    Palaeoclimate Modelling)

7
BiodiversityWorld architecture
[Architecture diagram; labels recoverable from the slide: User interface; Presentation; Workflow enactment engine; Wrapped resources; Native BiodiversityWorld Resources; Metadata repository; BGI API; BiodiversityWorld-GRID Interface (BGI); The GRID]

8
(No Transcript)
9
(No Transcript)
10
Some problems not fully solved in BDW
  • Flexible data access
  • BGI designed to make BDW maintainable, but
    currently assumes each resource has a predefined
    set of operations
  • BioDA project investigated use of OGSA-DAI in BDW
  • HCI issues
  • A much more exploratory approach to workflow
    construction might be appropriate?
  • Semantic interoperability and data quality
  • Metadata repository: basic information only
  • Only basic solution to species naming problems
    (SPICE)
  • Other problems of descriptive terms, differences
    of expert opinion, etc., remain to be addressed

11
Complexity of biodiversity data: a
multi-dimensional problem
  • Same specimen might be described with differences
    of
  • Terminology
  • Opinion about identification
  • Opinion about whether a particular feature is
    present
  • Accuracy
  • Experts may differ as to
  • Circumscription associated with a given
    scientific name
  • (So may not be describing the same concept)
  • Terminology used to describe a given taxon
  • Accepted name for a species in a taxonomic
    checklist
  • There may be errors!
  • ...

12
SPICE for Species 2000
  • BBSRC/EPSRC- and EU-funded
  • SPecies 2000 Interoperability Co-ordination
    Environment
  • Aims
  • build scalable, federated scientific name
    catalogue organised by taxon (species, etc.)
  • provide synonymy server, enriching information
    retrieval
  • Issue: how to build an architecture to integrate
    specialist, heterogeneous databases, providing a
    consistent federated view of broader scope?
  • Common Data Model sufficed:
  • data requirements of federation identical for
    each database
  • small set of canned queries adequate for the
    catalogue

13
SPICE internal architecture
14
LITCHI
  • BBSRC/EPSRC- and EU-funded
  • Logic-based Integration of Taxonomic Conflicts in
    Heterogeneous Information systems
  • Aim: detect conflicts between species checklists
    and either
  • Assist in producing a consistent checklist, or
  • Generate correspondences between checklists
    (cross-map)
  • Addressing problems of species classification
    and naming variations when accessing
    species-related data
  • More general, semantic interoperability issue
  • detecting conflicts between different expert
    views of same subject matter
  • supporting data access based on these views

15
LITCHI example
  • Checklist 1
  • Caragana arborescens Lam. (accepted name)
  • Caragana sibirica Medikus (synonym)
  • Checklist 2
  • Caragana sibirica Medikus (accepted name)
  • Caragana arborescens Lam. (synonym)
  • (Lam. = Lamarck)

A full name which is not a pro-parte name may
not appear as both an accepted name and a synonym
in the same checklist
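
The rule above can be checked mechanically. The following is a minimal Python sketch, not LITCHI's own implementation; the function name `names_breaking_rule` and the tuple layout `(full_name, status, is_pro_parte)` are assumptions made for illustration. Each checklist from the example passes on its own, and the conflict only surfaces when the two are combined into one list.

```python
from collections import defaultdict


def names_breaking_rule(checklist):
    """Return full names listed as both accepted name and synonym.

    `checklist` is a list of (full_name, status, is_pro_parte) tuples with
    status "accepted" or "synonym". This layout is a hypothetical
    simplification, not LITCHI's data model.
    """
    statuses = defaultdict(set)
    for full_name, status, is_pro_parte in checklist:
        if not is_pro_parte:  # the rule exempts pro-parte names
            statuses[full_name].add(status)
    return sorted(
        name for name, seen in statuses.items()
        if {"accepted", "synonym"} <= seen
    )


checklist_1 = [
    ("Caragana arborescens Lam.", "accepted", False),
    ("Caragana sibirica Medikus", "synonym", False),
]
checklist_2 = [
    ("Caragana sibirica Medikus", "accepted", False),
    ("Caragana arborescens Lam.", "synonym", False),
]

# Each checklist alone is consistent; the merged list is not.
print(names_breaking_rule(checklist_1))
print(names_breaking_rule(checklist_1 + checklist_2))
```

Keeping the check as a pure function over one list means the same code can validate a single checklist or a proposed merge of several.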
16
Name relationships (LITCHI 2)
17
myViews
  • Not funded yet: limited proof-of-concept
    prototype only
  • Addresses problem that an expert may wish to
    generate taxon descriptions which are
  • Coherent
  • Mapped explicitly to other taxon descriptions,
    and
  • Based directly on existing documentation
    (monographs, etc), rather than completely
    re-coded in some restrictive formalism with a new
    vocabulary

18
Example: describing the same things?
  • Description A
  • Sarothamnus scoparius (L.) Wimm. ex Koch.
  • Broom
  • ... a bush which is 50-200 cm high ...
  • Description B
  • Cytisus scoparius
  • Yellow broom
  • ... a small shrub up to 6ft or more ... native in
    its yellow form ...
  • Description C
  • Cytisus scoparius (L.) Link.
  • Broom
  • ... a deciduous shrub growing to 2.4m by 1m at a
    fast rate ... scented flowers ...
  • Description D
  • Common Broom
  • Cytisus scoparius
  • ... covered in profuse golden-yellow flowers ...
    shrub about 1-3m tall ...
  • Description E
  • Broom
  • Cytisus scoparius

19
Things we might want to do
  • In a system where
  • data is held in as raw a form as possible, to
    avoid information loss, but
  • we can impose various views and hypotheses
  • we might wish to
  • Create our own view of the data
  • For a given piece of knowledge, we could
  • accept it unaltered
  • accept but re-express in our terms (e.g.
    different scientific name, different units ...)
  • state it is equivalent to another piece of
    knowledge (e.g. minor differences in
    measurements)
  • flag it as wrong
  • ...
  • In relation to another's view, we might
  • include or ignore it
  • declare some mapping applicable to a group of
    items (e.g. every species of Sarothamnus is
    mapped to Cytisus)
  • ...
  • Reason with differing levels of precision
    simultaneously (e.g. binary/continuous characters
    derived from same features)
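
One way to picture "raw data plus a personal view" is to keep the raw assertions untouched and record the user's decisions separately. The Python sketch below is illustrative only: the names `apply_view`, `RAW` and `DECISIONS`, the decision labels, and the choice of which assertions to re-express or reject are all hypothetical. Only the measurement values are taken from the sample descriptions.

```python
FT_TO_M = 0.3048  # exact definition of the international foot

# Raw assertions, never modified: id -> (character, value, unit).
RAW = {
    "src4": ("size", 6, "ft"),
    "src16": ("max_height", 2.4, "m"),
    "src1": ("scent_absent", True, None),
}

# One user's (hypothetical) decisions; anything unlisted is accepted unaltered.
DECISIONS = {"src4": "re-express", "src1": "wrong"}


def apply_view(raw, decisions):
    """Build a personal view without altering the raw assertions."""
    view = []
    for assertion_id, (character, value, unit) in raw.items():
        decision = decisions.get(assertion_id, "accept")
        if decision == "wrong":
            continue  # flagged as wrong: left out of this view only
        if decision == "re-express" and unit == "ft":
            value, unit = round(value * FT_TO_M, 2), "m"
        view.append((assertion_id, character, value, unit))
    return view


view = apply_view(RAW, DECISIONS)
print(view)
```

Because the decisions live outside the data, a second user can apply a different set of decisions to the same raw assertions.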

20
An experimental prototype
  • Proof of concept ...
  • arbitrary, small data set from various sources:
    Cytisus and Genista species
  • No real front end or back end yet!
  • Implemented in Prolog (a logic programming
    language)
  • Formalisms to record complex assertions and
    their sources
  • Ontological knowledge not currently separated out
    explicitly: rules perform inference
  • User makes his/her own assertions about (for
    example)
  • synonymy
  • which assertions of others to accept
  • ...
  • ... both very specific and more general rules
  • Main purpose: illustrate handling multiple
    opinions/hypotheses

21
Sample knowledge base extracts
  • assertion(1, association(2, 3, absent(scent(flowers)))).
  • assertion(1, property(2, yellow(flowers))).
  • assertion(1, label(2, common('Broom'))).
  • assertion(1, label(2, species('Cytisus', 'scoparius'))).
  • assertion(4, property(5, shrublet(whole))).
  • assertion(4, property(5, deciduous(whole))).
  • assertion(4, property(5, size(6, in, whole))).
  • assertion(4, property(5, deep_yellow(flowers))).
  • assertion(4, property(5, small(leaves))).
  • assertion(4, label(5, species('Cytisus', 'ardoinii'))).
  • assertion(4, property(7, size(6, ft, whole))).
  • assertion(4, label(7, species('Cytisus', 'scoparius'))).
  • assertion(12, label(13, common('Broom'))).
  • assertion(12, label(13, common('Scotch Broom'))).
  • assertion(12, property(13, compound('sparteine'))).
  • assertion(12, property(13, compound('tyramine'))).
  • assertion(12, label(13, species('Sarothamnus', 'scoparius'))).
  • assertion(14, label(15, species('Sarothamnus', 'scoparius'))).
  • assertion(14, property(15, size_range(50, 200, cm, whole))).
  • assertion(14, property(15, bright_yellow(flowers))).
  • assertion(16, label(17, species('Cytisus', 'scoparius'))).
  • assertion(16, property(17, max_height(2.4, m, whole))).
  • assertion(16, property(17, max_width(1, m, whole))).
  • assertion(16, property(17, present(scent(flowers)))).
  • assertion(8, property(9, golden_yellow(flowers))).
  • assertion(8, property(9, size_range(1, 3, m, whole))).
  • assertion(8, label(9, species('Cytisus', 'scoparius'))).

Source 12 asserts that item 13's label is the
common name 'Scotch Broom'
22
Deducing from the knowledge base
  • ?- display_accepted_props('Cytisus', 'ardoinii').
  • shrublet(whole)
  • deciduous(whole)
  • size(6, in, whole)
  • deep_yellow(flowers)
  • small(leaves)
  • Yes
  • ?- display_accepted_props('Cytisus', 'scoparius').
  • yellow(flowers)
  • size(6, ft, whole)
  • golden_yellow(flowers)
  • size_range(1, 3, m, whole)
  • max_height(2.4, m, whole)
  • max_width(1, m, whole)
  • present(scent(flowers))
  • absent(spines)
  • absent(scent(flowers))
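
The lookup behind a query like this can be read as two steps: find every item labelled with the species name, then collect the properties asserted of those items. The prototype itself is written in Prolog; the Python sketch below only illustrates that reading. `FACTS` and `accepted_props` are hypothetical names, the fact list is a subset of the sample extracts, every assertion is treated as accepted, and association facts are ignored, so it does not reproduce the full output shown on the slide.

```python
# Subset of the sample extracts as (source, kind, item, value) tuples.
FACTS = [
    (1, "property", 2, "yellow(flowers)"),
    (1, "label", 2, ("Cytisus", "scoparius")),
    (4, "property", 5, "shrublet(whole)"),
    (4, "property", 5, "deciduous(whole)"),
    (4, "property", 5, "size(6, in, whole)"),
    (4, "property", 5, "deep_yellow(flowers)"),
    (4, "property", 5, "small(leaves)"),
    (4, "label", 5, ("Cytisus", "ardoinii")),
    (4, "property", 7, "size(6, ft, whole)"),
    (4, "label", 7, ("Cytisus", "scoparius")),
]


def accepted_props(genus, epithet, facts=FACTS):
    """List properties of every item labelled with the given species name."""
    # Step 1: items carrying the species label (one source may describe several).
    items = {
        item for _source, kind, item, value in facts
        if kind == "label" and value == (genus, epithet)
    }
    # Step 2: properties asserted of those items, in knowledge-base order.
    return [
        value for _source, kind, item, value in facts
        if kind == "property" and item in items
    ]


print(accepted_props("Cytisus", "ardoinii"))
print(accepted_props("Cytisus", "scoparius"))
```

Source 4 describes two items (5 and 7), which is why properties are attached to items rather than to sources.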

23
Adding synonymy (1)
  • User regards any statement about a Sarothamnus
    species as being a statement about a Cytisus
    species with the same epithet
  • assertion(20, synonym(species('Cytisus', Epithet), _, species('Sarothamnus', Epithet), _)).
  • (Could be more restrictive, e.g. apply to only
    particular information sources)
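
The effect of such a genus-wide synonymy assertion can be sketched as normalising every species label before the lookup. The Python below is a simplified illustration, not the prototype's Prolog rule: `GENUS_SYNONYMS`, `canonical`, `LABELS` and `PROPERTIES` are hypothetical names, and only three items from the sample extracts are included.

```python
# User-declared mapping: any Sarothamnus species is treated as the
# Cytisus species with the same epithet.
GENUS_SYNONYMS = {"Sarothamnus": "Cytisus"}

# item -> species label, and (item, property) pairs, from the sample extracts.
LABELS = {
    13: ("Sarothamnus", "scoparius"),
    17: ("Cytisus", "scoparius"),
    5: ("Cytisus", "ardoinii"),
}
PROPERTIES = [
    (13, "compound(sparteine)"),
    (17, "max_height(2.4, m, whole)"),
    (5, "small(leaves)"),
]


def canonical(genus, epithet):
    """Rewrite a name into the user's preferred genus, keeping the epithet."""
    return (GENUS_SYNONYMS.get(genus, genus), epithet)


def accepted_props(genus, epithet):
    """Collect properties of all items whose canonical label matches."""
    target = canonical(genus, epithet)
    items = {item for item, name in LABELS.items() if canonical(*name) == target}
    return [prop for item, prop in PROPERTIES if item in items]


print(accepted_props("Cytisus", "scoparius"))
```

Restricting the mapping to particular sources, as the slide suggests, would mean keying `GENUS_SYNONYMS` on the source as well as the genus.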

24
Adding synonymy (2)
  • ?- display_accepted_props('Cytisus', 'scoparius').
  • yellow(flowers)
  • size(6, ft, whole)
  • golden_yellow(flowers)
  • size_range(1, 3, m, whole)
  • max_height(2.4, m, whole)
  • max_width(1, m, whole)
  • present(scent(flowers))
  • compound(sparteine)
  • compound(tyramine)
  • size_range(50, 200, cm, whole)
  • bright_yellow(flowers)
  • absent(spines)
  • absent(scent(flowers))
  • Yes
  • ?- display_contradictions_for('Cytisus', 'scoparius').
  • size_range(1, 3, m, whole), size_range(50, 200, cm, whole)
  • present(scent(flowers)), absent(scent(flowers))
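
The slides do not state what the prototype counts as a contradiction. The Python sketch below assumes two cases: a feature asserted both present and absent, and two size ranges for the same part that still differ after conversion to a common unit. The function names, the tuple encoding of properties and the conversion table are all hypothetical.

```python
UNIT_TO_CM = {"cm": 1.0, "m": 100.0}

# Four of the accepted properties, encoded as tuples for easy comparison.
PROPS = [
    ("size_range", 1, 3, "m", "whole"),
    ("present", "scent(flowers)"),
    ("size_range", 50, 200, "cm", "whole"),
    ("absent", "scent(flowers)"),
]


def range_in_cm(prop):
    """Convert a size_range tuple to a (low, high) pair in centimetres."""
    _name, low, high, unit, _part = prop
    factor = UNIT_TO_CM[unit]
    return (low * factor, high * factor)


def contradictions(props):
    """Return pairs of properties that conflict under the assumed rules."""
    found = []
    for i, a in enumerate(props):
        for b in props[i + 1:]:
            if {a[0], b[0]} == {"present", "absent"} and a[1] == b[1]:
                found.append((a, b))  # same feature, opposite claims
            elif (a[0] == b[0] == "size_range" and a[4] == b[4]
                  and range_in_cm(a) != range_in_cm(b)):
                found.append((a, b))  # same part, different range
    return found


for pair in contradictions(PROPS):
    print(pair)
```

Under these assumptions any difference in range counts as a conflict; a more lenient rule could, for example, accept ranges that overlap.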

25
Some important issues for future work
  • Complexity, e.g.
  • Trade-off: effective resource discovery v.
    computational expense of traversing a rich
    ontology
  • Scalability of taxonomic conflict detection
  • May find large data sets need clever techniques
    such as a Rete network
  • Scalability of inference in myViews: caching
    inferred information
  • Managing and ranking large result sets
  • How to rank resources discovered
  • How to rank conflicts
  • to present users with matches they are likely to
    want
  • Joining all these fragmentary projects up together

26
Part 2 (Malcolm Scoble)
27
The complexity of taxonomic/biodiversity data
Specimen (unit) data
Collection-level
Species/taxon concept
Locality
Species name
DNA barcodes
Species concepts
Observations
Date of description
Synonyms
Type specimen
Genus name (for binomial)
Date of specimen collection
Time of specimen collection
Images
Name of collector
Homonyms
Author of taxon
28
Taxonomy: from a fragmented to a distributed
resource
  • Where we want to be
  • Less fragmented: single-site or distributed
    access
  • Easier to update
  • Coordinated effort
  • Electronic (or dual) medium
  • Free access to data
  • Taxonomy easier to use
  • Where we are now
  • Fragmented results
  • Fragmented effort
  • Largely a paper medium (restricted access)

29
Projects to integrate biodiversity data
  • BioCISE (collection-level)
  • ENHSIN (specimen (unit)-level)
  • BioCASE (unit- and collection-level)
  • Species 2000 (species nomenclature)
  • SYNTHESYS (taxonomic infrastructure)
  • ENBI (network of biodiversity information)
  • EDIT (distributed approach to taxonomy)
  • PBIs (inventorying the planet's biodiversity)
  • CATE (Creating a Taxonomic e-Science)

30
BioCASE National Node Network
Collection-level
  • 31 National Nodes
  • Core Meta Database is updated every night

31
All levels
A Biological Collections Service for Europe
32
(No Transcript)
33
Creating a taxonomic e-science (CATE)
  • Literature scattered over 250 years of paper
    publications.
  • Data inaccessible other than to specialist users
  • Aim: to transfer in toto the taxonomy of two
    groups of organisms to the web (Hawkmoths and
    Aroids).
  • Broad aim: to encourage migration of taxonomy to
    the web.
  • Provide data for those studying biodiversity.
  • Encourage quality control, peer-review and the
    development of consensus taxonomies in the web
    environment.
  • Develop means of citation for web-based revisions

Arisaema candidissimum. Photo: RBG Kew
The Hawkmoth Sphinx caligineus sinicus from
Beijing, China. Photo: Tony Pittaway