Discovery services: experiences from EIONET and GBIF - PowerPoint PPT Presentation

Slides: 34
Provided by: mirunab

Transcript and Presenter's Notes

1
Discovery services: experiences from EIONET and GBIF
Corrado Iannucci (c.iannucci_at_finsiel.it)
2
EIONET
  • Environmental Information and Observation
    Network, the telematic and organisational
    infrastructure of the EEA.
  • http://eionet.eu.int
  • Connects 37 countries (over 300 national
    environment agencies and other bodies dealing
    with environmental information).
  • Under construction since 1996 (most recent node:
    Tirana, 2003).
  • Includes ETC/Terrestrial Environment

3
EIONET (contd)
  • Hosts software and data
  • CIRCA
  • Reportnet
  • EUNIS
  • Data include metadata and maps
  • GEMET
  • Mapservices
  • Software released as open source.

4
GBIF
  • The mission of the Global Biodiversity Information
    Facility is:
  • facilitating digitisation and global
    dissemination of primary biodiversity data, so
    that people from all countries can benefit from
    the use of the information.
  • http://www.gbif.org/
  • Includes the EUNIS database as a provider of
    information about species, using the DiGIR
    protocol and the Darwin Core XML schema.

5
Web Service concept
  • Software application or component which supports
    direct interaction with other systems, regardless
    of the platform they are built on
  • Most modern Web applications are predisposed to
    some kind of collaboration
  • if nothing else, at least the export of data in
    RDF format
  • The formats used for data storage, discovery and
    transfer are adopted according to specific needs
  • XML, XML-RPC, SOAP, DiGIR, UDDI, ebXML etc.

6
Example 1
  • Many Providers have digitized data they want to
    expose through the Web
  • GBIF's case

7
Example 2
  • An application storing data exposes subsets of it
    in various formats
  • GEMET's case

8
Example 3
  • Data collection system which processes,
    aggregates and disseminates this information
  • REPORTNET's case

9
GBIF architecture
10
Requirements
  • Biodiversity data is shared through nodes (data
    providers)
  • Maintenance and control of data remain with the
    dataset owners
  • There are no central data banks (except caches
    and metadata)
  • Database owners must be able to block access to
    sensitive data
  • Source of data is acknowledged by all users

11
GBIF is a global integrator
12
GBIF objectives
  • To establish a distributed information
    infrastructure that serves primary biodiversity
    data
  • Specimens
  • Observations
  • Names
  • Species information
  • Images linked with specimens and observations
  • Literature
  • Metadata on the above
  • GBIF services will mainly be integrative metadata
    services, and standards

13
Data exchange standards
14
The Protocol
  • XML messaging on top of HTTP
  • Used for communication between data providers and
    data users
  • More light-weight and specialised than SOAP
  • Enables single point of access (portal/search) to
    distributed information resources
  • Resource: a collection of data objects that
    conform to a common schema (DB records, XML
    documents)
  • Distributed resources comply with a federation
    schema
  • Enables search and retrieval of structured data
  • Search for data values in context (semantics)
  • Results are presented as a structured data set
  • Makes location and technical characteristics of
    the native resource transparent to the user
  • The Distributed Generic Information Retrieval
    (DiGIR) protocol was created by the TDWG/CODATA
    subgroup on biological collection data
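
The request/response idea behind the bullets above (an XML message carrying a filter, answered with a structured record set) can be sketched as follows. This is a minimal illustration, not the DiGIR schema: the element names (request, header, search, filter, equals, records, response, record), the resource name "eunis_species" and the sample records are assumptions made up for the example. In a real deployment the message would travel over HTTP; here both sides run in one process.

```python
import xml.etree.ElementTree as ET


def build_search_request(resource, field, value, limit=10):
    """Build a DiGIR-style search message (element names are illustrative)."""
    request = ET.Element("request")
    header = ET.SubElement(request, "header")
    ET.SubElement(header, "type").text = "search"
    ET.SubElement(header, "destination", resource=resource)
    search = ET.SubElement(request, "search")
    equals = ET.SubElement(ET.SubElement(search, "filter"), "equals")
    ET.SubElement(equals, field).text = value
    ET.SubElement(search, "records", limit=str(limit), start="0")
    return ET.tostring(request, encoding="unicode")


def answer_request(message, records):
    """Provider side: apply the filter to local records, return XML."""
    root = ET.fromstring(message)
    condition = root.find("search/filter/equals")[0]  # e.g. <Genus>Lutra</Genus>
    limit = int(root.find("search/records").get("limit"))
    matches = [r for r in records if r.get(condition.tag) == condition.text]
    response = ET.Element("response")
    for rec in matches[:limit]:
        node = ET.SubElement(response, "record")
        for name, val in rec.items():
            ET.SubElement(node, name).text = val
    return ET.tostring(response, encoding="unicode")


# Made-up sample data standing in for a provider's native database.
local_records = [
    {"Genus": "Lutra", "Species": "lutra"},
    {"Genus": "Canis", "Species": "lupus"},
]
message = build_search_request("eunis_species", "Genus", "Lutra")
reply = answer_request(message, local_records)
```

The caller only sees the federation-level field name (Genus) and a structured reply; where and how the provider stores its records stays hidden behind answer_request.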

15
A simple DiGIR architecture
16
GBIF architecture
17
Data provider
  • Goal: a simple platform-independent tool for
    sharing data
  • The initial DiGIR implementation was made in PHP
    using Apache as Web server
  • Other implementations were needed since:
  • Taxonomists often record their data in
    spreadsheets, word processors, etc.
  • Management of an online database requires
    resources and knowledge
  • Result: the Data Repository Tool, a DiGIR-based
    Python provider using Zope as Web and application
    server

18
Data provider software
  • Each system entails:
  • Provider software
  • Communication with the DiGIR (or BioCASe)
    protocol
  • Data standards: Darwin Core, (ABCD,) Dublin Core
  • Configuration for each resource (local existing
    database)
  • Registration with GBIF UDDI registry
  • Turn-key package for easy installation supported
    by GBIF
  • Linux and Windows for the PHP-based code
  • Windows for the Data Repository Tool, including
    the BioCASe Python wrapper

19
GBIF Data repository tool
  • Enable data custodians to manage and publish
    their own data
  • Make available a simple data warehouse tool for
    those who want to host datasets for the community

20
Provider object: the mapping of the Darwin Core
elements to the database fields (and likewise for
metadata) is done TTW (through the Web)
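
As a rough illustration of what such a mapping does, the sketch below turns one local database row into an XML record by renaming columns to Darwin Core-style element names. The column names, the FIELD_MAP contents and the record wrapper element are hypothetical, and the element names are not checked against a particular Darwin Core version; in the Data Repository Tool the mapping itself is configured through the Web, not in code.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping: Darwin Core-style element -> local database column.
FIELD_MAP = {
    "ScientificName": "sci_name",
    "Country": "country_code",
    "Collector": "recorded_by",
}


def to_darwin_core(row, field_map=FIELD_MAP):
    """Convert one local row (a dict) into an XML record.

    Only mapped columns are exported; unmapped columns stay private,
    and mapped columns with no value are simply left out.
    """
    record = ET.Element("record")
    for element_name, column in field_map.items():
        value = row.get(column)
        if value is not None:
            ET.SubElement(record, element_name).text = str(value)
    return record


# Made-up sample row; "internal_id" has no mapping and is not published.
row = {"sci_name": "Lutra lutra", "country_code": "AL", "internal_id": 17}
record = to_darwin_core(row)
record_xml = ET.tostring(record, encoding="unicode")
```

Keeping the mapping as data rather than code is what lets a custodian adjust it without touching the provider software.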
21
Starting from a generic export file, generates a
valid document for the Repository tool
22
GEMET architecture
23
GEMET as Web service
  • The GEneral Multilingual Environmental Thesaurus
    has a new implementation in Python for Zope
  • http://www.eionet.eu.int/gemet
  • Its content can facilitate the global
    dissemination and exchange of environmental
    information
  • The administrative TTW interface ensures proper
    authorised maintenance, without the need of
    exchanging CDs or big files in different formats

24
Who will use it
  • Generic portals and Web applications
  • Display word definitions for terms that appear in
    pages
  • Build picklists for keywords (metadata)
  • Describe data in a harmonised way
  • Use the existing API and implement their own
    interfaces, regardless of system or programming
    language
  • (Environmental) search engines can better index
    data
  • End users will browse the Web interface

25
XML output with GEMET data
The first iteration exposes parts of GEMET's
structure and content to the public
26
Further steps
  • The needs of other applications will be
    identified and, based on the requests, a
    complete API will be released to allow:
  • retrieving terms and definitions, themes and
    concepts
  • updating parts of the content store remotely
  • implementing search mechanisms
  • etc.
  • Existing Python implementation exposing the
    content in XML (SKOS), XML-RPC and SOAP will ease
    the efforts
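
To show why an XML-RPC interface is usable from any system or language, the sketch below encodes and decodes a term-lookup call with Python's standard xmlrpc.client module, without contacting any server. The method name get_definition, its two parameters and the placeholder reply text are hypothetical; they are not the GEMET API.

```python
import xmlrpc.client

# Client side: encode a call to a hypothetical thesaurus method.
call_xml = xmlrpc.client.dumps(
    ("air pollution", "en"),      # example term and language code
    methodname="get_definition",  # hypothetical method name
)

# Server side: any XML-RPC library can decode the same message.
params, method = xmlrpc.client.loads(call_xml)

# The reply travels back as XML as well.
reply_xml = xmlrpc.client.dumps(("(definition text)",), methodresponse=True)
reply_params, _ = xmlrpc.client.loads(reply_xml)
definition = reply_params[0]
```

Because both directions are plain XML over HTTP, a client written in another language only needs an XML-RPC library, not any knowledge of Zope or Python.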

27
REPORTNET architecture
28
A web service in itself: the football
  • The component parts of Reportnet communicate
    inside the system through
  • XML
  • XML-RPC
  • HTTP

29
Collaboration with the outside world
  • Each component can be installed and customised
    locally and still communicate with the central
    system
  • The harmonised way of storage and data transfer
    ensures data can be picked, filtered and
    synthesized by end users and remote applications

30
Focusing on GDEM
  • Modular framework
  • Automatic quality assessment of data
  • Data conversion into various formats that are
    friendlier or easier to process
  • Feedback to data providers
  • Built-in workflow system which allows defining
    the reporting process customised for each
    dataflow
  • Total transparency for end users regarding the
    tools and platforms used (Zope, Python, Java)

31
CDR Quality Assurance
[Diagram: CDR and the Quality assessment system,
connected via XML-RPC]
CDR asks the QA service to analyse the data and
then comes back for the response. The response
takes the form of feedback to the data provider.
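
The "ask, then come back for the response" exchange can be sketched with Python's standard XML-RPC modules. Everything specific here is assumed for illustration: the QAService class, the method names analyse and get_feedback, the rule that three fields are mandatory, and the sample record. It is not the Reportnet implementation, and the server runs on a local port only so the example is self-contained.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


class QAService:
    """Hypothetical stand-in for the QA service."""

    def __init__(self):
        self._feedback = {}

    def analyse(self, record):
        # Assumed rule for illustration: these three fields are mandatory.
        missing = [f for f in ("country", "year", "value") if f not in record]
        job_id = len(self._feedback) + 1
        self._feedback[job_id] = {"ok": not missing, "missing": missing}
        return job_id  # the caller comes back later with this id

    def get_feedback(self, job_id):
        return self._feedback[job_id]


def run_demo():
    # Port 0 lets the operating system pick a free local port.
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_instance(QAService())
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        cdr = ServerProxy(f"http://127.0.0.1:{port}")
        # Step 1: CDR asks the QA service to analyse a (made-up) record.
        job_id = cdr.analyse({"country": "AL", "year": 2003})
        # Step 2: CDR goes back for the response.
        return cdr.get_feedback(job_id)
    finally:
        server.shutdown()
        server.server_close()


feedback = run_demo()
```

Splitting the exchange into two calls means the caller does not have to hold a connection open while the analysis runs; the returned feedback is what would be shown to the data provider.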
32
CDR Conversion Service
[Diagram: CDR asks the Conversion service, via
XML-RPC, which conversions are possible for the
existing formats; conversions are requested over
HTTP]
CDR finds out whether the reporting documents can
be converted. End users choose the format in which
they would rather see the data, click on a link
and convert the document.
33
[Screenshot: a reporting document that has been
quality assessed and can be converted, showing the
available conversions and the feedback resulting
from QA]