Proxy data objects - PowerPoint PPT Presentation

About This Presentation
Title:

Proxy data objects

Slides: 25
Provided by: gregorh
Transcript and Presenter's Notes

Title: Proxy data objects


1
Proxy data objects
provide optional linking to external objects and
reusable data interfaces, from which Darwin- and
LinneanCore-like protocol interfaces can be
constructed
(Talk held Wednesday, 2004-10-13, at the TDWG
2004 meeting in Christchurch, New Zealand)
2
(Submitted Abstract)
  • Ideally, biodiversity data are expressed using
    well-defined object types (with generally
    accepted and intuitive property and object
    composition concepts) and all objects required in
    relations are available in digital form,
    identifiable through resolvable globally unique
    identifiers. Reality is different: a) The complex
    models like ABCD, SDD, TCS etc. are under debate
    and not necessarily fully stable. A consequence
    is that simplified "Cores" like DarwinCore or
    LinneanCore are proposed. b) Most required data
    are not digitized at all. Wherever data should
    naturally refer to other biodiversity domains,
    alternative models are required that either link
    to something external or provide a sufficient
    internal object definition. Such a definition
    also involves a simplified set of core elements.
  • I therefore propose to combine the two problems
    and define relatively simple data interfaces
    that can serve both for protocol/query purposes
    and for the definition of local proxy objects
    (which either link to external entities or provide
    a local definition). Data interfaces shield the
    complexity of a fuller model (i.e., the full model
    can be treated as a black box). They should be
    rough enough to fit several models, but also
    detailed enough to allow the definition of proxy
    data (i.e., make a substantial semantic
    definition). Data interfaces are often implicitly
    used in current practice: 'Specify' uses a
    simplified literature and nomenclature interface,
    DarwinCore contains name, identification and
    geographical location interfaces, Taxon Concept
    Schema contains interfaces for literature and
    specimen, and Linnean Core has a literature
    interface. Agreeing on a common set of such
    interface concepts would allow building Darwin-
    and LinneanCore as well as much of SDD, TCS, etc.
    from the same building blocks, drastically
    reducing the investment needed for a full global
    biodiversity information system.

3
Outline
  • Linking and proxy objects
  • Data interface definitions
  • DarwinCore and LinneanCore from Interfaces

4
Linking
  • Organism-interaction data are primarily a
    4-tuple of links to external objects:
      • Organism name 1 (e.g., a fungus)
      • Organism name 2 (e.g., a plant)
      • Geographical location
      • Publication / data source reference
  • plus non-linking data:
      • Interaction type (controlled vocabulary)
      • Reference detail (page, document fragment, ...)
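
The record structure on this slide can be sketched as a small data type. This is only an illustration, not a schema from the talk: the class names, field names, vocabulary terms, and the `urn:example:` identifiers are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class InteractionType(Enum):
    """Hypothetical controlled vocabulary; the terms are placeholders."""
    PARASITISM = "parasitism"
    MUTUALISM = "mutualism"


@dataclass(frozen=True)
class Link:
    """A link to an external object, kept as an opaque identifier string."""
    identifier: str   # e.g. an LSID, DOI, or URL; no format is enforced here
    label: str = ""   # human-readable text that stays readable if the link breaks


@dataclass(frozen=True)
class InteractionRecord:
    # The 4-tuple of links to external objects
    organism_1: Link
    organism_2: Link
    location: Link
    source: Link
    # Non-linking data
    interaction_type: InteractionType
    reference_detail: Optional[str] = None


record = InteractionRecord(
    organism_1=Link("urn:example:name:1", "a fungus"),
    organism_2=Link("urn:example:name:2", "a plant"),
    location=Link("urn:example:place:3", "some locality"),
    source=Link("urn:example:pub:4", "some publication"),
    interaction_type=InteractionType.PARASITISM,
    reference_detail="p. 17",
)
```

Keeping a label next to each identifier is one simple way to retain some meaning when a link cannot be followed.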

5
Linking
  • Similarly, SDD descriptions need links to
    express:
      • Taxon / class name
      • Specimen unit
      • Geographical scope
      • Publication
      • Contributing IPR agents
  • Plus terminology definitions: multiple
    description projects sharing a common terminology

6
Two kinds of links?
  • Convenience links ("see also" links)
      • GenBank entries for a specimen/taxon name
      • Link to illustrating image/video
      • If the link breaks, the information content of
        the main document is usually left intact.
  • Defining links
      • Link to taxon / specimen in a description
      • Link to cited publication
      • If the link breaks, the information content of
        the main document is lost or severely damaged.

→ Recovery mechanism is highly desirable
→ Simplest: preserve some human-readable semantics
7
Requirements
  • A link should be
      • Stable
          • URLs are not stable
          • PURLs unmanageable? (usually just xURLs,
            i.e. extended-life URLs)
      • Resolvable
          • Many GUID / URN schemes are not
          • LSIDs and are both stable and resolvable
  • Links used in primary scientific data must
    further
      • Offer a recovery mechanism if a link that is
        stable in principle vanishes nevertheless!

8
Not required, but desirable
  • Object Identity
      • DOI defines identity, but is expensive
      • At least, it should be discoverable whether a
        link defines object identity or not (do
        multiple URLs / LSIDs exist for the same object?)

9
The Problem
  • For quite some time, the default will be that
    there is no external object
      • Either not yet
          • Digitization of specimens
          • Older literature
      • Or unreliable
          • Recent literature
  • Moreover, some objects will always be
    temporarily not yet available
      • Science creates new
      • Specimens still in private collection
      • Taxon names / concepts to be published
      • etc.

10
So, now we have
[Diagram labels: Link client, Linked object, Local
recovery semantics. Or, with the linked object
unavailable: Linking client, Linked object, Local
replacement]
11
Often caching is also desirable
[Diagram labels: Locally cached data, Link client,
Linked object, Local recovery semantics]
Caching: linked data temporarily unavailable
Recovery: linked data permanently unavailable
12
Simplify?
[Diagram labels: Link client, Linked object, and a
Proxy object holding locally cached interface data
(incl. recovery)]
13
A data proxy
  • May link to external data providers, especially
    for knowledge domains outside the scope of the
    current dataset
  • Supports several object linking mechanisms
    involving globally unique identifiers and
    resolving mechanisms (e.g., DOI, LSID, URL)
  • Can replace links in cases where objects are
    not (or not yet) available from an external data
    source
  • For existing links: a minimal data interface
    is cached on the assumption that access is
    asynchronous, slow, or may be temporarily
    unavailable
  • For local proxies: the same data interface allows
    a relatively simple, local object definition to
    decouple processes
  • Provides cached data and semantics to human
    readers, allowing recovery even if a link has
    become permanently broken
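
As a rough illustration of the behaviour listed above, the sketch below models a proxy as an optional link plus a small set of cached interface fields. It is not an implementation from the talk: `ProxyObject`, `resolve`, the two fetch functions, the field names, and the `urn:example:` identifier are hypothetical, and the resolver is passed in as a plain function so that no real DOI or LSID service is assumed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class ProxyObject:
    """Local stand-in for an object that may live in an external data source."""
    interface_data: Dict[str, str]   # cached or locally defined interface fields
    link: Optional[str] = None       # e.g. a DOI, LSID, or URL; None for a purely local proxy

    def resolve(self, fetch: Callable[[str], Dict[str, str]]) -> Dict[str, str]:
        """Return fresh interface data if the link resolves, else the local copy."""
        if self.link is None:
            return self.interface_data       # local replacement: nothing to resolve
        try:
            fresh = fetch(self.link)
        except (LookupError, OSError):
            return self.interface_data       # link temporarily or permanently unavailable
        self.interface_data = dict(fresh)    # refresh the cache
        return self.interface_data


def broken_fetch(link: str) -> Dict[str, str]:
    """Hypothetical resolver that always fails, simulating a vanished link."""
    raise LookupError(link)


def working_fetch(link: str) -> Dict[str, str]:
    """Hypothetical resolver that returns updated interface fields."""
    return {"author": "Doe", "year": "1999", "title": "Updated title"}


publication = ProxyObject(
    {"author": "Doe", "year": "1999", "title": "Cached title"},
    link="urn:example:pub:42",
)
recovered = publication.resolve(broken_fetch)    # falls back to the cached fields
refreshed = publication.resolve(working_fetch)   # replaces the cache with fresh data
local_only = ProxyObject({"full_name": "an unpublished taxon name"})
```

The same class covers all three cases on the slide: caching for slow or unavailable links, recovery for broken links, and purely local definitions where no external object exists.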

14
Outline
  • Linking and proxy objects
  • Data interface definitions
  • DarwinCore and LinneanCore from Interfaces

15
Interface: a good choice of term?
  • Not user interface!
  • "Interface" is used similarly to its use in
    object-oriented programming
  • However, there are no methods in data interfaces
  • Persistence interface: properties/fields/
    structures like collections
  • Data interfaces provide an additional abstraction
    layer on top of the public object model
  • Data interfaces allow programming against the
    interface instead of against the full object
    model
  • Is there a better term?
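
One way to picture "programming against the interface" is a method-free structural type. The sketch below uses Python's `typing.Protocol` for this; the interface name and the fields (author, year, title) follow the publication interface mentioned elsewhere in the talk, while the two concrete classes and `short_citation` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


class PublicationInterface(Protocol):
    """Data interface: fields only, no methods."""
    author: str
    year: str
    title: str


@dataclass
class FullPublication:
    """Stand-in for an object from a much richer model."""
    author: str
    year: str
    title: str
    journal: str
    volume: str
    pages: str
    publisher: Optional[str] = None


@dataclass
class LocalPublicationProxy:
    """Minimal local definition with only the interface fields."""
    author: str
    year: str
    title: str


def short_citation(pub: PublicationInterface) -> str:
    """Uses only interface fields, so it accepts either class above."""
    return f"{pub.author} ({pub.year}): {pub.title}"


full = FullPublication("Doe", "1999", "A title", "Some Journal", "12", "1-10")
proxy = LocalPublicationProxy("Doe", "1999", "A title")
```

Because `short_citation` touches only the three interface fields, changes to the richer model do not affect it as long as those fields remain.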

16
Complexity
  • Complex schemata are certainly necessary, but are
    no solution for projects where a knowledge domain
    is of secondary relevance!

17
Goals
  • Abstract: encapsulate complexity
      • Reduced size
      • Cover 80% of interaction needs
      • React more flexibly when changes in the main
        object model occur
  • Formalize (and standardize) reality
      • Most software needs peripheral objects from other
        knowledge domains and treats them lightly
      • TCS, SDD, ABCD/DarwinCore, LinneanCore
      • Example: Specify contains a name and publication
        'reduced data set' with editors for them,
        essentially proxy objects with a set of interface
        fields

18
Abstraction layers
Complex object model/schema
19
Data interface requirements
  • Should be stable, so applications programming
    against the interface do not break
  • Define a mapping from the complex standard to the
    interface
  • Where the interface is used as a proxy (entering
    data), the reverse mapping should also be defined
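
The two mappings can be sketched as a pair of functions. This is a minimal illustration under stated assumptions: the nested "full" record layout (`authors`, `issued`, `title.main`) is invented for the example and does not correspond to any real standard, and the interface fields are the author/year/title set used for publications in this talk.

```python
from typing import Any, Dict


def to_interface(full: Dict[str, Any]) -> Dict[str, str]:
    """Map a (hypothetical) nested full record down to flat interface fields."""
    authors = full.get("authors", [])
    return {
        "author": "; ".join(a["family_name"] for a in authors),
        "year": str(full.get("issued", {}).get("year", "")),
        "title": full.get("title", {}).get("main", ""),
    }


def from_interface(data: Dict[str, str]) -> Dict[str, Any]:
    """Reverse mapping, for data entered through the interface as a proxy."""
    names = [n.strip() for n in data.get("author", "").split(";") if n.strip()]
    full: Dict[str, Any] = {
        "authors": [{"family_name": n} for n in names],
        "title": {"main": data.get("title", "")},
    }
    if data.get("year"):
        full["issued"] = {"year": int(data["year"])}
    return full


entered = {"author": "Doe; Roe", "year": "1999", "title": "A title"}
round_trip = to_interface(from_interface(entered))

rich = {
    "authors": [{"family_name": "Doe", "given_name": "J."}],
    "issued": {"year": 2001, "month": 5},
    "title": {"main": "Another title", "sub": "With a subtitle"},
}
reduced = to_interface(rich)
```

The forward mapping is lossy by design (given names, month, and subtitle are dropped), which is why the reverse mapping can only produce a partial full record.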

20
Outline
  • Linking and proxy objects
  • Data interface definitions
  • DarwinCore and LinneanCore from Interfaces

21
Modular interface/protocol schemata
[Diagram: Darwin-Core and Linnean-Core drawn with
interface modules. Labels:
  • Taxon name interface (full/atomized taxon name)
  • Specimen curatorial interface (collection/subcoll.,
    access. no., ...)
  • Publication interface (author, year, title, ...)
  • Geographical location interface (place
    description, geogr. coord., gazetteer link, ...)]
22
Full schema partly using interfaces
[Diagram: Full taxon concept schema and Linnean-Core.
Labels:
  • Taxon name interface (full/atomized taxon name)
  • Taxon name complete
  • Specimen curatorial interface (collection/subcoll.,
    access. no., ...)
  • Publication interface (author, year, title, ...)]
23
OO schema or flattened?
[Diagram: Darwin-Core with the Geographical location
interface (place description, geogr. coord.,
gazetteer link, ...)]
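
The difference between an object-oriented and a flattened schema can be illustrated with a small sketch. The interface names and fields below loosely follow the labels on the preceding slides; which interfaces a Darwin-Core-like record combines, the class names, and the `part_field` naming rule for flattened columns are assumptions made for the example.

```python
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass
class TaxonName:
    full_name: str


@dataclass
class SpecimenCuratorial:
    collection: str
    accession_no: str


@dataclass
class GeoLocation:
    place_description: str
    coordinates: str = ""


@dataclass
class DarwinCoreLike:
    """Object-oriented form: the record is composed of interface objects."""
    taxon_name: TaxonName
    specimen: SpecimenCuratorial
    location: GeoLocation


def flatten(record: DarwinCoreLike) -> Dict[str, str]:
    """Flattened form: one level of 'part_field' keys."""
    flat: Dict[str, str] = {}
    for part, fields in asdict(record).items():
        for name, value in fields.items():
            flat[f"{part}_{name}"] = value
    return flat


nested = DarwinCoreLike(
    TaxonName("Amanita muscaria"),
    SpecimenCuratorial("Example Herbarium", "12345"),
    GeoLocation("near some village"),
)
flat = flatten(nested)
```

Both forms carry the same fields; the nested form keeps the interface boundaries visible, while the flat form resembles a simple table row.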
24
Proxies proposed in UBIF/SDD