Science Environment - PowerPoint PPT Presentation

Slides: 31
Provided by: jessi161
Transcript and Presenter's Notes



1
Science Environment for Ecological Knowledge
Jessie Kennedy, School of Computing,
Napier University, Edinburgh
2
The SEEK Prototype: Ecological Niche Modeling
Geographic Space
Ecological Space
Biodiversity information e.g. data from museum
specimens, ecological surveys
ecological niche modeling
occurrence points on native distribution
Geospatial and remotely sensed data
Results taken to integrate with other data realms
(e.g., human populations, public health, etc.)
Native range prediction
3
Species prediction map
Predicted distribution: Amur snakehead (Channa
argus)
Image from http://www.lifemapper.org
4
SEEK Overview
  • Semantic Mediation System
  • Smart data discovery and integration
  • Analysis and Modelling System (Kepler)
  • Modelling scientific workflows
  • EcoGrid
  • Making diverse environmental data systems
    interoperate
  • Taxon WG
  • Taxonomic name/concept resolution server

5
Scientific workflows
EML provides semi-automated data binding
Scientific workflows represent knowledge about the process
AMS captures this knowledge
6
Kepler Ecological Niche Model
7
Metadata driven data ingestion
  • Key information needed to read and machine
    process a data file is in the metadata
  • Physical descriptors (CSV, Excel, RDBMS, etc.)
  • Logical Entity (table, image, etc) and Attribute
    (column) descriptions
  • Name
  • Type (integer, float, string, etc.)
  • Codes (missing values, nulls, etc.)
  • Integrity constraints
  • Semantic descriptions (ontology-based type
    systems)
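
A minimal sketch of metadata-driven ingestion, assuming a delimited text file. The METADATA layout and its field names are hypothetical and are not taken from the EML schema; the point is only that the reader needs nothing beyond the metadata to type the columns and decode missing-value codes.

```python
import csv
import io

# Hypothetical metadata record for one data file; the field names are
# illustrative and are not taken from the EML schema.
METADATA = {
    "delimiter": ",",
    "attributes": [
        {"name": "site", "type": str, "missing": [""]},
        {"name": "count", "type": int, "missing": ["-999", "NA"]},
        {"name": "area_m2", "type": float, "missing": ["-999", "NA"]},
    ],
}


def ingest(text, metadata):
    """Read delimited text into typed rows, driven only by the metadata."""
    attrs = metadata["attributes"]
    reader = csv.reader(io.StringIO(text), delimiter=metadata["delimiter"])
    next(reader)  # skip the header row; column order comes from the metadata
    rows = []
    for raw in reader:
        if len(raw) != len(attrs):  # a very simple integrity constraint
            raise ValueError(f"expected {len(attrs)} fields, got {len(raw)}")
        row = {}
        for attr, value in zip(attrs, raw):
            value = value.strip()
            # Missing-value codes become None; everything else is cast.
            row[attr["name"]] = (
                None if value in attr["missing"] else attr["type"](value)
            )
        rows.append(row)
    return rows


SAMPLE = "site,count,area_m2\nA,12,2.5\nB,-999,4.0\n"
rows = ingest(SAMPLE, METADATA)
```

Here the code -999 in the second row is read as a missing value rather than as a count, which is exactly the kind of information that lives only in the metadata.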

8
Ecological ontologies
  • What was measured (e.g., biomass)
  • Type of quantity measured (e.g., Energy)
  • Context of measurement (e.g., Psychotria
    limonensis)
  • How it was measured (e.g., dry weight)

9
Semantic Mediation
  • Label data with semantic types
  • Label inputs and outputs of analytical components
    with semantic types
  • Use reasoning engines to generate transformation
    steps
  • Use reasoning engine to discover relevant
    components

Data
Ontology
Workflow Components
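
The component-discovery step can be sketched roughly as follows. This is a plain subsumption check over a tiny hand-written type hierarchy, standing in for a real reasoning engine; the ontology terms and component names are hypothetical.

```python
# Hypothetical mini-ontology: each semantic type maps to its parent type.
ONTOLOGY = {
    "Biomass": "Mass",
    "DryWeight": "Biomass",
    "Mass": "Quantity",
    "SpeciesCount": "Count",
    "Count": "Quantity",
}


def is_a(semantic_type, ancestor):
    """True if semantic_type equals ancestor or is one of its descendants."""
    while semantic_type is not None:
        if semantic_type == ancestor:
            return True
        semantic_type = ONTOLOGY.get(semantic_type)
    return False


# Hypothetical workflow components, labelled with the input type they accept.
COMPONENTS = {
    "BiomassSummary": "Biomass",
    "DiversityIndex": "Count",
    "GenericPlot": "Quantity",
}


def find_components(data_type):
    """Return the components whose input type subsumes the data's type."""
    return sorted(
        name for name, accepts in COMPONENTS.items() if is_a(data_type, accepts)
    )


matches = find_components("DryWeight")
```

Data labelled DryWeight is offered to the biomass and generic components but not to the count-based one.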
10
Data integration
  • Homogeneous data integration
  • Integration of homogeneous data via EML metadata
    is relatively straightforward
  • Heterogeneous Data integration
  • Requires advanced metadata and processing
  • Attributes must be semantically typed
  • Collection protocols must be known
  • Units and measurement scale must be known
  • Measurement relationships must be known
  • e.g., that ArealDensity = Count / Area
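
As a small illustration of why measurement relationships must be known, the sketch below integrates two hypothetical datasets, assuming the relationship meant in the last bullet is that areal density equals count divided by area. The record layouts and values are invented for the example.

```python
def areal_density(count, area):
    """Areal density = count / area, in individuals per unit area."""
    if area <= 0:
        raise ValueError("area must be positive")
    return count / area


# Hypothetical records: dataset A stores raw counts with plot area (m^2),
# dataset B already stores density (individuals per m^2).
dataset_a = [{"taxon": "Picea rubens", "count": 30, "area_m2": 100.0}]
dataset_b = [{"taxon": "Picea rubens", "density_per_m2": 0.25}]

# Convert A to the representation used by B, then combine.
integrated = [
    {
        "taxon": r["taxon"],
        "density_per_m2": areal_density(r["count"], r["area_m2"]),
    }
    for r in dataset_a
] + dataset_b
```

Without the relationship, the count column of one dataset and the density column of the other could not be compared at all.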

11
Life Sciences Data
  • Much of the data gathered in ecological studies
    and used in ecological data analysis is
    bio-referenced data
  • typically organisms are referenced by a Latin
    name
  • Many analyses require integrating data
    originating in many locations and at various
    points in time
  • for most bio-referenced data, integration
    involves matching on organism name

12
Biological (scientific) Names
  • Used for communicating information about known
    organisms and groups of organisms (taxa)
  • Framework for all biologists to communicate with
  • Taxonomists apply scientific names to species and
    higher taxa in their classifications
  • Formalized and validated according to strict
    codes of nomenclature
  • (different depending on kingdom)
  • Latin name is a polynomial for species and below,
    a monomial for genus and above
  • Quoted as LatinName NameAuthors Year
  • Example: Carya floridana Sarg. 1913
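
The quoted form can be split into its parts with a simple pattern. This is only a sketch under a strong assumption, namely that the string is laid out as lower-case epithets after a capitalised first word, then an author string, then an optional four-digit year; real names (hybrids, infraspecific ranks, parenthetical authors) are more varied. The function name parse_name is hypothetical.

```python
import re

# Assumed layout: "LatinName NameAuthors Year", with authors and year optional.
NAME_PATTERN = re.compile(
    r"^(?P<latin>[A-Z][a-z]+(?: [a-z]+)*)"  # monomial or polynomial Latin name
    r"(?: (?P<authors>[A-Z].*?))?"           # author string, e.g. "Sarg."
    r"(?: \(?(?P<year>\d{4})\)?)?$"          # publication year, maybe in ()
)


def parse_name(text):
    """Split a quoted scientific name into latin, authors and year parts."""
    match = NAME_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised name: {text!r}")
    return match.groupdict()


parsed = parse_name("Carya floridana Sarg. 1913")
```

Parts that are absent from the string come back as None, so "Carya floridana" alone parses with no authors and no year.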

13
Classification, Concepts & Names
14
Classification, Concepts & Names
15
Taxonomic history of Aus L. 1758
bea and cea noted as invalid names and replaced
with beus and ceus (Pyle 1990)
16
Problems with Scientific Names
  • Often recorded inappropriately in datasets
  • No author and/or year (e.g. Carya floridana)
  • Abbreviated (e.g. C. floridana)
  • Internal code (e.g. PicRub for Picea rubens)
  • Vernacular used (e.g. Scrub Hickory)
  • Misspelled
  • Are not unique
  • Re-use of names with changed definition
  • Name is ambiguous without definition
  • Subject to name alterations and 'corrections'
    over time
  • (e.g. Code changes its rules)
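
Some of the recording problems listed above can be flagged mechanically. The checks below are rough heuristics written for this example (the function name and messages are hypothetical); they cannot detect misspellings or changed definitions, which need a reference dictionary or a concept.

```python
import re


def name_problems(recorded):
    """Return a list of likely recording problems for a name string."""
    text = recorded.strip()
    problems = []
    if re.match(r"^[A-Z]\. [a-z]+", text):
        # e.g. "C. floridana"
        problems.append("abbreviated genus")
    elif not re.match(r"^[A-Z][a-z]+ [a-z]+", text):
        # e.g. "PicRub" or "Scrub Hickory"
        problems.append("not a Latin binomial (code or vernacular?)")
    if not re.search(r"\b\d{4}\b", text):
        problems.append("no year")
    return problems


report = {
    name: name_problems(name)
    for name in ["Carya floridana Sarg. 1913", "C. floridana", "PicRub"]
}
```

A string that passes these checks is still only a name; the checks say nothing about which definition of the name was intended.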

17
Concepts
  • Full scientific name "according to" (Author,
    Publication, Date), plus Definition
  • Carya floridana Sarg. (1913) according to
    Charles Sprague Sargent, Trees & Shrubs 2:193,
    plate 177 (1913), plus Definition
  • Original concept
  • 1st use of name as described by the taxonomist
  • same author and date in the scientific name and
    the "according to"
  • same publication for original concept and name
  • Revised concept
  • Re-classification of a group
  • different author and date in the "according to"
  • Carya floridana Sarg. (1913) according to Stone,
    FNA 3:424 (1997), plus Definition
  • Should be used for communicating about groups of
    organisms
  • Full scientific name "according to" (Author,
    Publication, Date)
  • definition is clear, and can be retrieved
  • comparing or integrating data based on concepts
    is more accurate
  • Can GUIDs help?
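
One way to picture the distinction between a name and a concept is as a small data structure. The classes below are an illustrative sketch, not the TCS model; they assume author strings have already been normalised to the same abbreviation so that they can be compared directly.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AccordingTo:
    author: str
    publication: str
    year: int


@dataclass(frozen=True)
class TaxonConcept:
    latin_name: str
    name_author: str
    name_year: int
    according_to: Optional[AccordingTo] = None
    definition: Optional[str] = None

    def is_original(self):
        """Original concept: same author and date in name and according-to."""
        ref = self.according_to
        return (
            ref is not None
            and ref.author == self.name_author
            and ref.year == self.name_year
        )


original = TaxonConcept(
    "Carya floridana", "Sarg.", 1913,
    AccordingTo("Sarg.", "Trees & Shrubs", 1913),
)
revised = TaxonConcept(
    "Carya floridana", "Sarg.", 1913,
    AccordingTo("Stone", "FNA", 1997),
)
```

The two records share one scientific name but compare as different concepts, which is the situation a name-only match cannot see.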

18
Concepts
  • Concepts are described in many ways
  • Created by someone - an Author
  • Described in a Publication
  • Given a Name
  • May or may not be valid in terms of the
    nomenclatural codes
  • Depending on the taxonomist's working practice,
    defined by
  • the set of Specimens examined
  • (type specimens and others)
  • Common set of Characters
  • data recorded by taxonomists to describe
    specimens and taxa
  • context dependent: they differentiate taxa rather
    than fully describe them
  • use natural language with all its ambiguities
  • Relationships to other Taxon Concepts
  • Taxon circumscription
  • the lower level taxa
  • Congruence, overlap, etc. with taxa in other
    classifications

19
Legacy Data
  • In legacy data names often appear in place of
    concepts
  • Names are imprecise
  • are inappropriate for referring to information
    regarding taxon e.g. observational/collection
    data
  • BUT sometimes that's all we have
  • How do we interpret names?
  • potentially multiple definitions
  • the sum of all definitions that exist for the
    name
  • would that make any sense? conflicts?
  • one of the existing definitions
  • how can we choose?
  • the attributes in common to all the definitions
  • would that leave any?
  • represented by the type specimen
  • but what does that mean? very subjective..

20
Legacy Names as Concepts
  • Nominal concepts
  • Sub-set of TaxonConcepts
  • Name but no AccordingTo
  • non-unique (concept) identifier elements
  • can have a unique concept GUID
  • No definition
  • Explicitly saying it's something with this name
    but not really sure what is/was meant
  • Encourage people to understand and address the
    issue of names
  • Allowing mark-up of collections with names allows
    people to believe names are really good enough
  • Important problem - needs to be tackled sooner
    rather than later
  • will improve long term usefulness of scientific
    data
  • ease integration

21
SEEK Taxon
  • Build a Name/Concept resolution server
  • TOS (Kansas)
  • Taxonomic Concept Schema
  • TCS (Napier)
  • Exchange of taxonomic Info
  • TDWG/GBIF standard
  • Basis for TOS
  • GUIDs
  • GBIF/SEEK etc.
  • Tools to relate and compare concepts
  • Taxonomy Comparison Visualisation Tool (Napier)
  • Concept Mapper Tool (UNC)

22
Concept Comparison Visualisation
23
Taxon Concept Schema
  • TCS developed to allow exchange of taxonomic
    names/concept data
  • Based on consultation with range of users
  • understand users' notions of taxonomic concept
  • what information they consider part of a concept
  • Presentations at meetings including 2 TDWG
  • Agreement that concepts are important and
    necessary
  • Taxon Names are independent from Taxon concepts
  • Agreement that observations/identifications etc.
    should record concepts not names

24
TCS
  • XML based exchange schema
  • Not designed as the correct way to model a
    Taxon Concept
  • No rules as to what a taxon must have
  • certain things needed to be useful
  • Designed to accommodate the different ways
    concepts are described
  • Lots of optionality or flexibility in elements
  • to address different work practices in the
    community
  • Includes Taxon Names
  • are more constrained as they are governed by the
    codes of nomenclature

25
TCS
  • Considerable debate on what should be top level
    elements
  • Related closely to the question
  • What gets a GUID?
  • Taxon concepts
  • Taxon Names
  • Specimens
  • Publications
  • Taxon Relationship Assertions
  • Concepts refer to Names
  • Names must not change
  • Can't record original taxon concept

26
Exchange of Data
  • Exchange of definitional data
  • name definition
  • information on history of name and type specimen
    and publication details
  • taxon concept definition
  • Name, publication details for the defining
    source, characters, specimens, related taxa etc
  • Exchange of usage data
  • for observations/lists (should only use taxon
    concepts)
  • need only exchange references to existing taxon
    concepts
  • user readable keys, e.g. Full Scientific name
    according to Author Publication
  • GUIDs
  • for name checking purposes
  • need only exchange name without history or
    typification
  • user readable keys, e.g. Full Scientific name
  • GUIDs

27
Issues of GUIDs for integration
  • What gets a GUID?
  • TCS top level elements??
  • The physical thing or electronic record of the
    thing
  • What is data and what is metadata associated with
    the GUID?
  • Depends on your perspective on life..
  • Stability of data associated with a GUID
  • Who issues GUIDs?
  • Centralised authority of some sort: peer
    review??
  • One GUID per concept or name (no duplicates)
  • ensure business rules are applied to new
    names/concepts created
  • - bottleneck?
  • - too restrictive in what the business rules
    might be
  • Distributed free for all
  • Anyone can publish their own name/concept and
    get a GUID
  • - Mess of GUIDs to sort out
  • Which technology?
  • LSIDs, DOI etc.

28
TCS and SEEK and
  • Taxon Object Server
  • Core of concept/name resolution service
  • Kansas team has been implementing the TOS
  • Schema based on the TCS model
  • Tool to import data from TCS documents
  • EML
  • Proposed modifications to EML to accommodate
    SEEK's taxonomic resolution services in the
    future
  • User interface tools
  • Uses cut down TCS as input format
  • Inform other biology meta-data standards on
    taxonomic issues
  • Cataloguing the complete genome standard

29
Taxonomic Object Server
  • TOS Allows
  • registration, retrieval, integration of datasets
  • Matches concepts given names, other concepts and
    taxonomies
  • Allow taxonomists to
  • Author new ideas
  • Make new relationships between concepts
  • Allow researchers to
  • Easily see previous taxonomic opinions
  • Use a stable identification system to reference
    concepts (LSIDs)
  • Find concepts
  • Integration with Kepler

30
TOS operations
  • Via TCS document
  • addConcept
  • addRelationship
  • Public APIs
  • getConcept on GUID
  • getBestConcept on name string
  • getHigherTaxon on GUID and authority up tree
  • getAuthoritativeList down tree
  • findConcepts on any property(s)
  • findRelatedConcepts on GUID and relationships
  • getSynonymousNames returns name strings
  • getHigherTaxon
  • getAuthoritativeList
  • Dictionary for name-concept matching
  • N-gram matching algorithm
  • getBestConcept
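
The slides do not say which N-gram algorithm the TOS uses, so the following is only one plausible reading: character trigrams compared with the Dice coefficient, used to pick the closest dictionary entry for a possibly misspelled name. The function names, the threshold and the dictionary contents are all assumptions made for this sketch.

```python
def ngrams(text, n=3):
    """Set of character n-grams of the lower-cased, space-padded string."""
    pad = " " * (n - 1)
    padded = f"{pad}{text.lower()}{pad}"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def similarity(a, b, n=3):
    """Dice coefficient between the n-gram sets of two strings (0.0 to 1.0)."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


def get_best_concept(name, dictionary, threshold=0.5):
    """Return the GUID whose dictionary name matches best, or None."""
    best = max(dictionary, key=lambda known: similarity(name, known), default=None)
    if best is None or similarity(name, best) < threshold:
        return None
    return dictionary[best]


# Hypothetical dictionary mapping name strings to concept GUIDs.
DICTIONARY = {
    "Carya floridana": "guid-1",
    "Picea rubens": "guid-2",
}

best = get_best_concept("Carya floridanna", DICTIONARY)
```

The threshold keeps unrelated strings from being forced onto the nearest entry; where to set it is a tuning decision that depends on the dictionary.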