Title: Discovery services: experiences from EIONET and GBIF
1. Discovery services: experiences from EIONET and GBIF
Corrado Iannucci c.iannucci_at_finsiel.it
2. EIONET
- The European Environment Information and Observation Network, the telematic and organisational infrastructure of the EEA.
- http://eionet.eu.int
- Connects 37 countries (over 300 national environment agencies and other bodies dealing with environmental information).
- Built up since 1996 (most recent node: Tirana, 2003).
- Includes the ETC/Terrestrial Environment.
3. EIONET (contd)
- Hosts software and data:
  - CIRCA
  - Reportnet
  - EUNIS
  - ...
- Data include metadata and maps:
  - GEMET
  - Map services
  - ...
- Software released as Open Source.
4. GBIF
- The mission of the Global Biodiversity Information Facility is facilitating the digitisation and global dissemination of primary biodiversity data, so that people from all countries can benefit from the use of the information.
- http://www.gbif.org/
- Includes the EUNIS Database as a provider of information about species, using the DiGIR protocol and the Darwin Core XML schema.
5. Web Service concept
- A software application or component that supports direct interaction with other systems, regardless of the platform they are built on.
- Most modern Web applications are predisposed to some kind of collaboration; if nothing else, at least the export of data in RDF format.
- The formats used for data storage, discovery and transfer are adopted according to custom needs: XML, XML-RPC, SOAP, DiGIR, UDDI, ebXML, etc. (see the sketch below).
6. Example 1
- Many providers have digitised data that they want to expose through the Web: GBIF's case.
7. Example 2
- An application storing data exposes subsets of it in various formats: GEMET's case.
8. Example 3
- A data collection system that processes, aggregates and disseminates this information: Reportnet's case.
9. GBIF architecture
10. Requirements
- Biodiversity data is shared through nodes (data providers).
- Maintenance and control of the data remain with the dataset owners.
- There are no central data banks (except cached metadata).
- Database owners must be able to block access to sensitive data.
- The source of the data is acknowledged by all users.
11. GBIF is a global integrator
12. GBIF objectives
- To establish a distributed information infrastructure that serves primary biodiversity data:
  - Specimens
  - Observations
  - Names
  - Species information
  - Images linked with specimens and observations
  - Literature
  - Metadata on the above
- GBIF services will mainly be integrative metadata services and standards.
13. Data exchange standards
14. The Protocol
- XML messaging on top of HTTP.
- Used for communication between data providers and data users.
- More lightweight and specialised than SOAP.
- Enables a single point of access (portal/search) to distributed information resources:
  - A resource is a collection of data objects that conform to a common schema (DB records, XML documents).
  - Distributed resources comply with a federation schema.
- Enables search and retrieval of structured data:
  - Search for data values in context (semantics).
  - Results are presented as a structured data set.
- Makes the location and technical characteristics of the native resource transparent to the user.
- The Distributed Generic Information Retrieval (DiGIR) protocol was created by the TDWG/CODATA subgroup on biological collection data. A sketch of the request pattern follows below.
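As a rough illustration of the XML-over-HTTP pattern described above, the sketch below posts a simplified search message to a provider endpoint. The URL is hypothetical and the payload is a much-reduced stand-in for a real DiGIR request, which follows the full DiGIR request schema and filters on Darwin Core elements.

    import urllib.request

    PROVIDER_URL = "http://example.org/digir/DiGIR.php"  # hypothetical endpoint

    # Simplified stand-in for a DiGIR search request (not the full schema).
    request_xml = """<?xml version="1.0" encoding="UTF-8"?>
    <request>
      <search>
        <filter>
          <equals><darwin:Genus>Lynx</darwin:Genus></equals>
        </filter>
      </search>
    </request>"""

    req = urllib.request.Request(
        PROVIDER_URL,
        data=request_xml.encode("utf-8"),
        headers={"Content-Type": "text/xml"},
    )
    with urllib.request.urlopen(req) as response:
        print(response.read().decode("utf-8"))  # structured XML result set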
15. A simple DiGIR architecture
16. GBIF architecture
17. Data provider
- Goal: a simple, platform-independent tool for sharing data.
- The initial DiGIR implementation was made in PHP, using Apache as the Web server.
- Other implementations were needed, since:
  - Taxonomists often record their data in spreadsheets, word processors, etc.
  - Management of an online database requires resources and knowledge.
- Result: the Data Repository Tool, a DiGIR-based Python provider using Zope as Web and application server.
18. Data provider software
- Each system entails:
  - Provider software.
  - Communication via the DiGIR (or BioCASe) protocol.
  - Data standards: Darwin Core, (ABCD,) Dublin Core.
  - Configuration for each resource (a local existing database).
  - Registration with the GBIF UDDI registry.
- Turn-key packages for easy installation, supported by GBIF:
  - Linux and Windows for the PHP-based code.
  - Windows for the Data Repository Tool, including the BioCASe Python wrapper.
19. GBIF Data Repository Tool
- Enable data custodians to manage and publish their own data.
- Make available a simple data warehouse tool for those who want to host datasets for the community.
20. Provider object
The mapping of the Darwin Core elements to the database fields (the same applies for metadata) is done TTW (through the Web). A sketch of such a mapping follows below.
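As a rough Python sketch of what such a mapping amounts to; the column names below are invented for illustration, and in the Data Repository Tool the mapping is edited through the Web rather than in code.

    # Darwin Core element -> column in the provider's local database
    # (column names are hypothetical)
    DARWIN_CORE_MAPPING = {
        "ScientificName": "latin_name",
        "Country": "country_code",
        "CollectionCode": "collection_code",
    }

    def to_darwin_core(local_record):
        """Rename local database fields to their Darwin Core equivalents."""
        return {
            element: local_record[column]
            for element, column in DARWIN_CORE_MAPPING.items()
            if column in local_record
        }

    # Example: one row from the (hypothetical) local database
    print(to_darwin_core({"latin_name": "Lynx lynx", "country_code": "IT"}))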
21. Starting from a generic export file, the tool generates a valid document for the Repository Tool.
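A hedged sketch of that step, assuming the generic export is a CSV file and using a deliberately simplified target document; the real tool defines its own document format.

    import csv
    import xml.etree.ElementTree as ET

    def export_to_document(csv_path):
        """Turn a generic CSV export into a simple XML document (illustrative).
        Assumes the CSV column names are valid XML element names."""
        root = ET.Element("records")
        with open(csv_path, newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f):
                record = ET.SubElement(root, "record")
                for field, value in row.items():
                    ET.SubElement(record, field).text = value
        return ET.tostring(root, encoding="unicode")

    # print(export_to_document("export.csv"))  # path is illustrative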
22. GEMET architecture
23. GEMET as a Web service
- The GEneral Multilingual Environmental Thesaurus has a new implementation in Python for Zope.
- http://www.eionet.eu.int/gemet
- Its content can facilitate the global dissemination and exchange of environmental information.
- The administrative TTW interface ensures proper authorised maintenance, without the need to exchange CDs or big files in different formats.
24. Who will use it
- Generic portals and Web applications, to:
  - Display word definitions for terms that appear in pages.
  - Build picklists for keywords (metadata).
  - Describe data in a harmonised way.
  - Use the existing API and implement their own interfaces, no matter on what system or in what programming language (see the client sketch below).
- (Environmental) search engines can index data better.
- End users will browse the Web interface.
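A minimal sketch of such a client, assuming an XML-RPC endpoint path and a method name invented here for illustration; the actual GEMET API defines its own methods.

    import xmlrpc.client

    # Hypothetical endpoint path; GEMET itself lives under
    # http://www.eionet.eu.int/gemet
    gemet = xmlrpc.client.ServerProxy("http://www.eionet.eu.int/gemet/xmlrpc")

    # Hypothetical method: fetch the English definition of a term.
    definition = gemet.getTermDefinition("emission", "en")
    print(definition)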
25. XML output with GEMET data
The first iteration exposes parts of GEMET's structure and content to the public.
26. Further steps
- The needs of other applications will be identified and, based on the requests, a complete API will be released to allow:
  - retrieving terms and definitions, themes and concepts;
  - updating parts of the content store remotely;
  - implementing search mechanisms;
  - etc.
- The existing Python implementation, exposing the content in XML (SKOS), XML-RPC and SOAP, will ease the effort. A sketch of SKOS-style output follows below.
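As a rough illustration of SKOS-style XML output for a single concept, built with the Python standard library; the exact elements and URIs GEMET emits may differ.

    import xml.etree.ElementTree as ET

    SKOS = "http://www.w3.org/2004/02/skos/core#"
    RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    ET.register_namespace("skos", SKOS)
    ET.register_namespace("rdf", RDF)

    # One GEMET concept rendered as SKOS (concept URI is illustrative).
    root = ET.Element(f"{{{RDF}}}RDF")
    concept = ET.SubElement(
        root, f"{{{SKOS}}}Concept",
        {f"{{{RDF}}}about": "http://www.eionet.eu.int/gemet/concept/emission"},
    )
    ET.SubElement(concept, f"{{{SKOS}}}prefLabel").text = "emission"

    print(ET.tostring(root, encoding="unicode"))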
27. Reportnet architecture
28. A Web service in itself: the football
- The component parts of Reportnet communicate inside the system through:
  - XML
  - XML-RPC
  - HTTP
29. Collaboration with the outside world
- Each component can be installed and customised locally and still communicate with the central system.
- The harmonised way of storing and transferring data ensures that data can be picked, filtered and synthesised by end users and remote applications.
30. Focusing on GDEM
- Modular framework.
- Automatic quality assessment of data.
- Data conversions to various formats that are friendlier or easier to process.
- Feedback to data providers.
- Built-in workflow system which allows defining the reporting process, customised for each dataflow.
- Total transparency for end users regarding the tools and platforms used (Zope, Python, Java).
31. CDR Quality Assurance
(Diagram: the CDR and the quality assessment system, connected via XML-RPC.)
The CDR asks the QA service to analyse the data and comes back later for the response. The response takes the form of feedback to the data provider. A sketch of the call follows below.
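A minimal sketch of that interaction, with an endpoint and method names invented for illustration; the real Reportnet services define their own.

    import xmlrpc.client

    qa = xmlrpc.client.ServerProxy("http://qa.example.org/xmlrpc")  # hypothetical

    # CDR submits a reporting document for analysis...
    job_id = qa.analyse("http://cdr.example.org/envelope/report.xml")

    # ...and comes back later for the result, which becomes
    # feedback to the data provider.
    feedback = qa.getResult(job_id)
    print(feedback)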
32. CDR Conversion Service
(Diagram: the CDR asks the conversion service, via XML-RPC, which conversions are possible for the existing formats; end users retrieve the converted documents over HTTP.)
The CDR finds out whether the reporting documents can be converted. End users choose the format in which they would rather see the data, click on a link and convert the document. A sketch of the query follows below.
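A minimal sketch of that query, again with an invented endpoint and method name.

    import xmlrpc.client

    converter = xmlrpc.client.ServerProxy("http://converters.example.org/xmlrpc")

    # Hypothetical method: which conversions exist for documents of this schema?
    conversions = converter.getConversions("http://example.org/schemas/report.xsd")
    for conversion in conversions:
        print(conversion)  # e.g. a description and a link the end user can click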
33. Reporting document that has been quality assessed and can be converted
(Screenshot: the available conversions and the feedback resulting from QA.)