Experience from Mapping Existing Models to the Transfer Schema

Provided by: robertk154

1
Experience from Mapping Existing Models to the
Transfer Schema
  • Robert Kukla

2
Introduction
  • Three test databases
      • ITIS (plants part)
      • Berlin Model (mosses/higher plants)
      • Taxonomer (fishes)
  • Imported into MySQL
  • Java program to generate XML
  • Three main aspects
      • Identifying concepts
      • Extracting relationships
      • Concept details
  • No CharacterCircumscription, SpecimenCircumscription
  • No hybrids, as implications are not fully
    understood
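
As an illustration of the "Java program to generate XML" step, here is a minimal sketch that serialises one concept with the JDK's streaming XML writer. The class name, the element name TaxonConcept and the attribute names id and type are assumptions made for this example; they are not taken from the slides or checked against the transfer schema.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

final class ConceptXmlSketch {

    /**
     * Serialises one concept as an XML fragment.
     * "TaxonConcept", "id" and "type" are illustrative names only,
     * not verified against the transfer schema.
     */
    static String toXml(String id, String type, String nameSimple) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter w =
                XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            w.writeStartElement("TaxonConcept");
            w.writeAttribute("id", id);
            w.writeAttribute("type", type);
            w.writeStartElement("NameSimple");
            w.writeCharacters(nameSimple); // the writer escapes &, < and >
            w.writeEndElement();           // </NameSimple>
            w.writeEndElement();           // </TaxonConcept>
            w.flush();
            w.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not serialise concept " + id, e);
        }
        return out.toString();
    }
}
```

A streaming writer is used here because the concept counts quoted later in the deck are large; a DOM tree would work equally well for small exports.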

3
ITIS
  • Integrated Taxonomic Information System
      • authoritative taxonomic information
  • Continuously evolving
      • New records get added
      • Existing records get updated (!)
  • 331886 taxonomic units (97741 plants) - 206649
    concepts
  • Most explored DB

4
ITIS - Identifying Concepts
  • ITIS own concepts (type revision)
      • taxonomic unit
      • usage accepted
  • Synonyms (type referenced)
      • usage not accepted
      • referenced from synonym table
  • Vernaculars (type vernacular)
      • from vernacular table
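
The three rules on this slide can be sketched as a small classification function. This is only one reading of the slide: the parameter names, the literal usage values "accepted" and "not accepted", and the decision to return null for rows that match no rule are assumptions of this example.

```java
final class ItisConceptTypes {

    /** Concept types named on the slide. */
    enum ConceptType { REVISION, REFERENCED, VERNACULAR }

    /**
     * @param fromVernacularTable true if the row came from the vernacular table
     * @param usage               value of the usage field (assumed literals)
     * @param inSynonymTable      true if the unit is referenced from the synonym table
     * @return the concept type, or null if the row yields no concept
     */
    static ConceptType classify(boolean fromVernacularTable,
                                String usage,
                                boolean inSynonymTable) {
        if (fromVernacularTable) {
            return ConceptType.VERNACULAR;
        }
        if ("accepted".equals(usage)) {
            return ConceptType.REVISION;      // ITIS own concept
        }
        if ("not accepted".equals(usage) && inSynonymTable) {
            return ConceptType.REFERENCED;    // synonym
        }
        return null;                          // no rule on the slide applies
    }
}
```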

5
ITIS - Extracting Relationships
  • Concept Circumscription
      • parent_tsn field
  • Synonymy Relationships
      • Explicit synonyms
      • Vernaculars
  • Lineage Relationships
      • to concept of same name according to different
        publication
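
Deriving circumscription from the parent_tsn field amounts to inverting a child-to-parent mapping. A minimal sketch follows, assuming the rows have already been read into a map from tsn to parent_tsn (null meaning no parent). The class and method names are hypothetical, and dropping links to parents that are not in the input is a choice made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class CircumscriptionSketch {

    /**
     * Inverts a tsn -> parent_tsn mapping.
     *
     * @param parentOf each concept's tsn mapped to its parent_tsn (null = no parent)
     * @return parent tsn -> child tsns, in the iteration order of the input;
     *         links to parents missing from the input are dropped
     */
    static Map<Integer, List<Integer>> childrenByParent(Map<Integer, Integer> parentOf) {
        Map<Integer, List<Integer>> children = new TreeMap<>();
        for (Map.Entry<Integer, Integer> e : parentOf.entrySet()) {
            Integer parent = e.getValue();
            if (parent == null || !parentOf.containsKey(parent)) {
                continue; // top-level unit, or parent not part of the export
            }
            children.computeIfAbsent(parent, k -> new ArrayList<>()).add(e.getKey());
        }
        return children;
    }
}
```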

6
ITIS - Concept Details
  • Names
      • up to 4 epithets (only 3 used) plus 4 category
        indicators, to be interpreted depending on rank
      • authorTeam from separate table
      • NameSimple calculated
  • Publications
      • Multiple publications per taxon_unit
      • Not completely atomised - compromise
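
One way "NameSimple calculated" could work is to interleave each non-empty indicator with its epithet. The sketch below does only that: it does not interpret indicators by rank, as the slide says is necessary, and the array layout and the sample values in the usage note are assumptions of this example.

```java
import java.util.ArrayList;
import java.util.List;

final class NameSimpleSketch {

    /**
     * Joins indicators and epithets into a single display name.
     *
     * @param indicators category indicators, one per epithet slot (entries may be null or blank)
     * @param epithets   epithets in slot order (entries may be null or blank)
     */
    static String build(String[] indicators, String[] epithets) {
        List<String> parts = new ArrayList<>();
        for (int i = 0; i < epithets.length; i++) {
            if (isBlank(epithets[i])) {
                continue; // unused slot, e.g. the fourth epithet
            }
            if (i < indicators.length && !isBlank(indicators[i])) {
                parts.add(indicators[i].trim());
            }
            parts.add(epithets[i].trim());
        }
        return String.join(" ", parts);
    }

    private static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }
}
```

With the illustrative input indicators {null, null, "var.", null} and epithets {"Abies", "alba", "nana", null}, build returns "Abies alba var. nana".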

7
Berlin Model - Mosses/(German Higher Plants)
  • Database of Taxonomic Concepts
  • Records will not change
  • Explicit concept relationships and (name-)synonymy
  • 24368 concepts

8
Berlin Model - Identifying Concepts
  • From table pTaxon

9
Taxonomer
  • Relational data model for managing information
    relevant to taxonomic research
  • Records get added, not changed
  • Assertion: mention of a taxonomic name in the
    taxonomic literature
  • Protonym: taxonomic name in the context of its
    first publication
  • Relationships between assertions
  • 36305 assertions, 14971 concepts

10
Taxonomer - Identifying Concepts
  • Concepts (type referenced)
      • from table tbl_Assertions
      • ReliabilityID > 4 (4 - revision, 5 - original/new
        combination)

11
Taxonomer - Extracting Relationships
  • ConceptCircumscription
      • ParentAssertionID
  • Relationships
      • Table not populated

12
Taxonomer - Concept Details
  • Number of fields in the database suggested a
    complexity that was not supported by the data
    (not all fields filled)
  • Atomised name difficult to recreate as only the
    terminal epithet is stored, so it was omitted
  • Use of cheat fields for NameSimple
  • Large number of AccordingTo (> 4000)
  • Publication data transferred 1:1

13
Technical Aspects
  • Database consistency, e.g.
      • getting all publication records
      • no relationships to non-existent concepts
  • Charset
      • assume windows-1252 code page
  • Slow!
      • indexes essential
      • fewer queries with big result sets are faster
  • Recursive approach is more suitable for wrapper
      • guarantees small, consistent subset
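
The charset point can be made concrete with a short sketch: bytes read from the database are decoded as windows-1252, as the slide assumes, and then re-encoded as UTF-8 for the XML output. The class and method names are hypothetical, and UTF-8 as the output encoding is an assumption of this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetSketch {

    /** Code page the slide says to assume for the source data. */
    private static final Charset SOURCE = Charset.forName("windows-1252");

    /** Decodes raw database bytes into a Java string. */
    static String decode(byte[] raw) {
        return new String(raw, SOURCE);
    }

    /** Re-encodes raw database bytes as UTF-8, e.g. for the generated XML. */
    static byte[] toUtf8(byte[] raw) {
        return decode(raw).getBytes(StandardCharsets.UTF_8);
    }
}
```

For example, the single byte 0xFC decodes to "ü" and becomes the two bytes 0xC3 0xBC in UTF-8.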

14
Mapping software
  • Universal transformation software to convert
    relational data to XML (XMlizer)
      • Often GUI-based, filling in a skeleton XML file
      • Relate a single query (table or join) to a
        collection of XML nodes
      • Map fields from that query to attributes or child
        elements of the XML node
  • Problems
      • No mechanism to use multiple sources (queries)
        for one node
      • No conditional transformation
      • No splitting of fields
      • Limited merging of fields
  • Write our own universal mapping software
      • addresses first 2 problems
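
The first two problems (multiple sources per node, conditional transformation) can be illustrated with a small sketch. It is not the mapping software described on the slide: every class, method and field name is hypothetical, and query rows are modelled simply as maps from column name to value.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

final class ConditionalMappingSketch {

    /** Conditional rule: rows satisfying 'when' become an element named 'element'. */
    static final class Rule {
        final String element;
        final Predicate<Map<String, String>> when;

        Rule(String element, Predicate<Map<String, String>> when) {
            this.element = element;
            this.when = when;
        }
    }

    /** Returns the element name of the first matching rule, or null if none matches. */
    static String elementFor(Map<String, String> row, List<Rule> rules) {
        for (Rule rule : rules) {
            if (rule.when.test(row)) {
                return rule.element;
            }
        }
        return null;
    }

    /**
     * Merges rows from several queries into one field map per node,
     * matching rows on the value of the given key column.
     * Later sources overwrite earlier ones on conflicting columns.
     */
    static Map<String, Map<String, String>> mergeSources(
            String key, List<List<Map<String, String>>> sources) {
        Map<String, Map<String, String>> merged = new LinkedHashMap<>();
        for (List<Map<String, String>> rows : sources) {
            for (Map<String, String> row : rows) {
                merged.computeIfAbsent(row.get(key), k -> new LinkedHashMap<>())
                      .putAll(row);
            }
        }
        return merged;
    }
}
```

Rules are tried in order, so a catch-all rule placed last acts as a default; splitting and richer merging of fields are left out, as they are in the slide's own scope.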

15
Conclusion
  • Conversion of legacy data is possible, but
      • information is missing
      • information will be lost
  • Data in the original DB is open to interpretation,
    so an expert should be consulted
  • Required computing resources should not be
    underestimated