1
Content aggregation and information re-use
Ana Macario, Bastian Onken and Hans Pfeiffenberger
Alfred Wegener Institute for Polar and Marine Research
2
About us
  • Helmholtz Association: big infrastructure labs
  • AWI - RV Polarstern (100 M) and stations
  • 400 scientists
  • 50 TB of ship- and station-generated datasets,
    among them time series up to 100 years old
  • Computer centre in charge of supplying:
  • IT part of a productive working environment
  • preservation of valuable or at least costly
    datasets - since finished Ph.D.s don't care
    (almost)
  • => mostly in that order of precedence
  • We try to take the middle ground at the
    institute as well as here

Why plankton?
45 Gt/a primary production; 50% of living matter
3
Road map
  • EU-project PlanktonNet
  • Introduction to taxonomy
  • Rich content
  • Towards NOA for PlanktonNet

4
Background
Early 2004, AWI started a small project with MBL
to archive images and taxonomic keys/descriptions
for phytoplankton found in the North Sea
  • -> 2-year EU project (acronym Plankton-Net)
    with 6 partners: AWI; Marine Biology Lab, Woods
    Hole; Station Biologique, Roscoff; Universidade
    de Lisboa; IPIMAR, Lisbon; Natural History
    Museum, London
  • -> Original scope: to create a network of
    interoperable repositories on plankton taxonomy
  • -> Motivation: to give taxonomists support in the
    hard task of identifying species and to rescue
    historically relevant collections
  • -> Scope keeps growing: an information system which
    aggregates taxonomic content, descriptions,
    assets (images, documents), environmental and
    molecular data, annotations, etc. and supplies
    an interactive environment for contributing

5
Road map
  • EU-project PlanktonNet
  • Introduction to taxonomy
  • Rich content
  • Towards NOA for PlanktonNet

6
Taxonomy and its challenges
  • Information about organisms is often linked to a
    name. This can create problems in information
    retrieval:
  • one taxon can have many names
  • the same name can refer to many taxa
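
The two retrieval problems above can be illustrated with a small in-memory index. This is only a sketch: the names and taxon identifiers are made-up placeholders, not data from uBio or PlanktonNet, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical (name, taxon_id) assertions; all values are placeholders.
ASSERTIONS = [
    ("Name A", "taxon:1"),
    ("Name B", "taxon:1"),  # synonym: a second name for the same taxon
    ("Name C", "taxon:2"),
    ("Name C", "taxon:3"),  # homonym: one name used for two taxa
]


def build_indexes(assertions):
    """Return (names_by_taxon, taxa_by_name) as dicts of sorted lists."""
    names_by_taxon = defaultdict(set)
    taxa_by_name = defaultdict(set)
    for name, taxon in assertions:
        names_by_taxon[taxon].add(name)
        taxa_by_name[name].add(taxon)
    return (
        {t: sorted(n) for t, n in names_by_taxon.items()},
        {n: sorted(t) for n, t in taxa_by_name.items()},
    )


def expand_query(name, names_by_taxon, taxa_by_name):
    """All names a search for `name` should also match (via shared taxa)."""
    result = set()
    for taxon in taxa_by_name.get(name, []):
        result.update(names_by_taxon[taxon])
    return sorted(result)


names_by_taxon, taxa_by_name = build_indexes(ASSERTIONS)
```

A search for "Name A" is expanded to include its synonym "Name B", while "Name C" resolves to two taxa and needs disambiguation before retrieval.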

7
Taxonomic Name Server
  • The uBio Taxonomic Name Server (MBL-WHOI Library,
    Woods Hole, USA), implemented as a web service,
    acts as a name thesaurus. Two services are
    offered:
  • NameBank is a repository of millions of recorded
    biological names and facts that link those names
    together
  • ClassificationBank stores multiple
    classifications and taxonomic concepts that are
    the result of expert opinions. It extends the
    functionality of NameBank.

8
What's in a name?
Scientific names evolve over time as specimens'
names are updated. With vernacular (common) names,
the problem is even more difficult because a name
may appear in several languages.
NameBank
9
What's in a classification?
  • ClassificationBank is a taxon concept server

10
Road map
  • EU-project PlanktonNet
  • Introduction to taxonomy
  • Rich content
  • Towards NOA for PlanktonNet

11
What is the content of PlanktonNet?
  • Data and metadata associated with organisms
    (taxa)
  • by value:
  • descriptive metadata (Darwin Core schema)
  • Images, SEM photos, schematic drawings, etc.
  • Annotations
  • by reference: linkout, or include via web service
  • Taxonomic keys, synonyms and classification
  • Bibliographical references
  • Geo-referenced environmental data
  • Molecular data

12
http://planktonnet.awi.de
from BioPedia, re-use via WS
to PANGAEA, WDC-Mare
quality linkouts
13
(No Transcript)
14
(No Transcript)
15
(No Transcript)
16
(No Transcript)
17
  • This is the working, local prototype (not a
    vision!)
  • It has been fitted with an OAI-PMH module so
    that it can act as a data provider

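
As a sketch of what acting as an OAI-PMH data provider means for a harvester, the snippet below builds request URLs for the six protocol verbs. The endpoint URL is a placeholder (the slides do not give the real one) and `oai_request` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Placeholder endpoint; the actual PlanktonNet OAI-PMH URL is not in the slides.
BASE_URL = "https://example.org/oai"

OAI_VERBS = {
    "Identify", "ListMetadataFormats", "ListSets",
    "ListIdentifiers", "ListRecords", "GetRecord",
}


def oai_request(verb, **params):
    """Build an OAI-PMH request URL for one of the six protocol verbs."""
    if verb not in OAI_VERBS:
        raise ValueError(f"not an OAI-PMH verb: {verb}")
    query = {"verb": verb}
    for key, value in params.items():
        if value is not None:
            # 'from' is a Python keyword, so callers pass it as 'from_'.
            query[key.rstrip("_")] = value
    return f"{BASE_URL}?{urlencode(query)}"


url = oai_request("ListRecords", metadataPrefix="oai_dc", from_="2006-01-01")
```

Only the request side is shown; a data provider answers each such request with an XML document.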
18
Road map
  • EU-project PlanktonNet
  • Introduction to taxonomy
  • Rich content
  • Towards Network Overlays for PlanktonNet

19
Rich Content for and from PlanktonNet
[Diagram: a service provider (SP) harvests the data providers (DP) PlanktonNet@AWI, PlanktonNet@Roscoff, ... via OAI-PMH]
20
Reality check
  • Highly heterogeneous information systems
  • Metadata harvesting is problematic: OAI-PMH
    compliance is lacking
  • Providing web services is not standard
  • Schema use is not standard: crosswalks are
    problematic
  • Why RDF-Ontology (and such things) when one can
    do tagging (and annotation) with Flickr?
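
Why crosswalks are problematic can be shown with a toy mapping from Darwin Core terms to Dublin Core elements. The mapping below is an assumption made for illustration, not the project's actual crosswalk; the point is that terms without a target are silently lost unless they are tracked.

```python
# Assumed mapping for illustration only; not PlanktonNet's actual crosswalk.
DWC_TO_DC = {
    "scientificName": "subject",
    "recordedBy": "creator",
    "eventDate": "date",
    "locality": "coverage",
}


def crosswalk(dwc_record):
    """Map a Darwin Core dict to Dublin Core; return (dc, unmapped_terms)."""
    dc, unmapped = {}, []
    for term, value in dwc_record.items():
        target = DWC_TO_DC.get(term)
        if target is None:
            unmapped.append(term)  # information lost in the crosswalk
        else:
            dc.setdefault(target, []).append(value)
    return dc, sorted(unmapped)


# Placeholder record values.
dc_record, lost = crosswalk(
    {"scientificName": "Name A", "eventDate": "2004-05-01", "catalogNumber": "X-1"}
)
```

Here `catalogNumber` has no Dublin Core counterpart in the assumed mapping and ends up in `lost`.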

21
(No Transcript)
22
(No Transcript)
23
Short-term goals
  • Create a central catalog with Dublin Core
    metadata as minimum and Darwin Core as an
    extended metadata format for PlanktonNet
  • Harvest all PlanktonNet data providers (with
    respective set information) using OAI-PMH
  • Long-term archival of all harvested records in a
    repository
  • Create a portal for accessing the locally
    harvested items as well as remote ones
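
The harvesting step can be sketched as parsing a `ListRecords` response into catalog entries with Dublin Core fields and set information. The sample XML is made up (identifier, set name and title are placeholders), `parse_records` is a hypothetical helper, and resumption tokens for paged responses are not handled.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Made-up sample response; all values are placeholders.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:img-1</identifier>
        <datestamp>2006-01-01</datestamp>
        <setSpec>awi</setSpec>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example plankton image</dc:title>
          <dc:type>Image</dc:type>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def parse_records(xml_text):
    """Extract identifier, sets and Dublin Core fields from each record."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        header = rec.find("oai:header", NS)
        dc = rec.find("oai:metadata/oai_dc:dc", NS)
        fields = {}
        if dc is not None:  # deleted records carry no metadata
            for el in dc:
                tag = el.tag.split("}", 1)[1]  # strip the namespace part
                fields.setdefault(tag, []).append(el.text)
        records.append({
            "identifier": header.findtext("oai:identifier", namespaces=NS),
            "sets": [s.text for s in header.findall("oai:setSpec", NS)],
            "dc": fields,
        })
    return records


catalog = parse_records(SAMPLE)
```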

24
Short-term goals (cont.)
  • Limitations:
  • Only metadata is harvested
  • Relationships limited to collection<->item
  • Restricted only to publicly available items
  • No support for collaborative work (e.g., resource
    annotation/revision)

25
Long-term goals
  • Harvesting of metadata AND data (images,
    documents, etc.) associated with a given resource
  • relevant for preservation / mirroring purposes
  • Branding as a result of targeted quality
    control of metadata from field experts
  • workflow needed
  • Versioning and traceability
  • Access control policies at item level

26
Long-term goals (cont.)
  • Expression of rich relationships
  • beyond simple collection<->item (e.g.,
    structural, equivalence and annotation types of
    relationships)
  • Combine and disseminate harvested content
    with other, re-used content in flexible ways ->
    foundation for a rich service offering
  • => Networked Overlay Architecture (NOA) with
    FEDORA
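
One way to picture relationships beyond collection<->item is a list of subject-predicate-object statements. The sketch below does not use Fedora or any NOA software; the identifiers and predicate names are hypothetical and chosen only to mirror the relationship types named above.

```python
from collections import namedtuple

Relation = namedtuple("Relation", "subject predicate object")

# Hypothetical identifiers and predicate names, for illustration only.
RELATIONS = [
    Relation("item:1", "isMemberOf", "collection:awi"),      # collection<->item
    Relation("item:1", "hasPart", "image:1a"),               # structural
    Relation("item:1", "isEquivalentTo", "item:roscoff-7"),  # equivalence
    Relation("note:9", "annotates", "item:1"),               # annotation
]


def related(resource, relations, predicate=None):
    """Return relations touching `resource`, optionally filtered by predicate."""
    return [
        r for r in relations
        if resource in (r.subject, r.object)
        and (predicate is None or r.predicate == predicate)
    ]


annotations = related("item:1", RELATIONS, "annotates")
```

Because every relationship has the same shape, new relationship types can be added without changing the data structure.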

27
Conclusions
  • Ontologists can learn from hundreds of years of
    taxonomy
  • Though an old field, information is a moving
    target (preservation vs. improvement ?)
  • Where is the (inter-)action happening ?
  • What (where and when) do we preserve ?
  • We believe that the visions and concepts of
    Fedora and NOA are appropriate to the problem
  • The scope of the problem and user ambitions have
    to be contained and satisfied in stages

28
  • Thank You !
  • Questions ??

29
Branding and taxonomy
  • Traditional field, dating back to the 4th century BC
  • Specimen identification is not straightforward:
    world-wide experts specialise at class or genus level
  • Information quality is relevant in several cases
    (e.g., harmful algae blooms and associated
    health consequences)
  • Revision/annotation as unstructured metadata
    about a resource
  • Information on both metadata provenance and
    annotation provenance is relevant for branding
  • Types of desired queries:
  • Find resources contributed by ...
  • Find resources revised / annotated by ..., etc.

=> Versioning, traceability
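
The desired queries can be sketched with a per-resource event history that records who did what to which version. The class names, actor names and the rule that annotations do not create a new version are all assumptions made for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    actor: str    # who performed the action
    action: str   # "contributed", "revised" or "annotated"
    version: int  # resource version the event applies to


@dataclass
class Resource:
    identifier: str
    history: list = field(default_factory=list)

    def record(self, actor, action):
        # Assumption: contributions and revisions create a new version,
        # annotations attach to the current one.
        current = self.history[-1].version if self.history else 0
        version = current if action == "annotated" else current + 1
        self.history.append(Event(actor, action, version))


def find_by(resources, actor, actions):
    """Identifiers of resources where `actor` performed one of `actions`."""
    return [
        r.identifier for r in resources
        if any(e.actor == actor and e.action in actions for e in r.history)
    ]


# Hypothetical resources and actors.
r1 = Resource("item:1")
r1.record("alice", "contributed")
r1.record("bob", "revised")
r1.record("carol", "annotated")
r2 = Resource("item:2")
r2.record("bob", "contributed")
```

`find_by([r1, r2], "bob", {"contributed"})` answers "find resources contributed by ...", and passing `{"revised", "annotated"}` answers the second query type.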