Title: EVA ESA Virtual Archive
1 EVA: ESA Virtual Archive
NP / GMP
2 AGENDA
- Presentation will cover the following topics
- EVA WG Activities in the last 2 months
- Archive Survey Approach and results
- EVA High Level Concept
- Plan for the next 3 months
3 EVA WG
- Composition
- TOS-G N. Peccia Chairman
- APP-A G. M. Pinna Member
- TOS-M H. P. de Koning Member
- SCI-S C. Arviset Member
- S. Zatti has been nominated as ADM-IT member, but in a passive role
- Scope
- This cross-functional, cross-Directorate working group shall
- provide a forum for exchange of information between the different archive developers/operators within ESA, and
- identify possible common building blocks for ESA archives.
- An initial report shall be made within 2 months.
4 EVA WG
- Activities (agreed between all participants of the 1st EVA meeting)
- Produce a survey of existing archival systems, internal and external to ESA, new initiatives and technology trends.
- Identify common services, infrastructure software (middleware) and tools that could be used as building blocks by any ESA Archive Project.
- Identify essential enabling technologies, such as information exchange protocols, metadata standards and data preservation techniques.
- Identify those items which could usefully be addressed within the framework of an EVA initiative (e.g. definition of common requirements, ESA network bandwidth, Internet 2 technology, shared computing resources in a GRID architecture, etc.).
- Present an initial report to the audience of the 1st EVA meeting by the beginning of May 2001.
5 EVA WG
- Telecons
- 4 telecons organised
- 8th March 2001
- 22nd March 2001
- 30th March 2001
- 27th April 2001
- Main activities of the WG were
- Archive Survey
- EVA Goals
6 Archive Survey
- The following archives were selected for the survey
- ISO Data Archive
- XMM SOC AMS
- ESRIN AMS
- ESOC Generic DDS
- ESRIN MUIS
- CNES Plasma Physics Archive (SIPAD)
- JPL PSD
- Envisat PDS
- ERS HDR archive in RAL
- The last three archives were dropped due to lack of data and/or time.
- It was decided in the last general meeting to complete the survey with the following archives
- HEASARC Archive
- Envisat PDS Archive
- An archive based on an ODBMS
- ESO ST Archive
7 Archive Survey - Approach
- The current survey has been performed as follows
- Each Archive has been compared against the ISO Reference Model for an Open Archival Information System (OAIS). This model provides a framework, including terminology and concepts, that facilitates the comparison of the different archives.
- Each Archive has also been compared against an additional set of characteristics, i.e.
- A spreadsheet was agreed with the following items
- Archive Data Files, Data Organisation, Data Interchange Standards used, DB used, Hardware, Software, Software Languages, Metadata (global DED, specific DED, Abstract Schema-Catalogue), Interfaces (Inputs, Outputs, Administration, Mechanism, Information Access), Interoperability with other Archives
8 OAIS Functional Model
- SIP: Submission Information Package
- AIP: Archival Information Package
- DIP: Dissemination Information Package
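The three package types can be illustrated with a minimal sketch. This is not taken from any ESA archive: the class fields and the `ingest` and `disseminate` helpers are hypothetical names chosen for illustration, and a real OAIS implementation carries far more information per package.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class SIP:
    """Submission Information Package: what a producer hands to the archive."""
    producer: str
    files: Dict[str, bytes]  # file name -> raw content


@dataclass
class AIP:
    """Archival Information Package: what the archive keeps long term."""
    package_id: str
    files: Dict[str, bytes]
    preservation_info: Dict[str, str] = field(default_factory=dict)


@dataclass
class DIP:
    """Dissemination Information Package: what a consumer receives."""
    package_id: str
    files: Dict[str, bytes]


def ingest(sip: SIP, package_id: str) -> AIP:
    """Turn a submission into an archival package, recording its provenance."""
    return AIP(package_id, dict(sip.files), {"provenance": sip.producer})


def disseminate(aip: AIP, wanted: Set[str]) -> DIP:
    """Build a delivery package holding only the requested files."""
    selected = {name: data for name, data in aip.files.items() if name in wanted}
    return DIP(aip.package_id, selected)
```

A producer would call `ingest` once per submission, and each user request would produce its own `DIP` from the stored `AIP`.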
9 Metadata
- The most common definition of metadata is "data about other data". In our survey metadata implies the following
- Keywords incorporated in an Archive catalogue, which allow intelligent queries
- Syntax and semantics description of data objects
- Preservation Description Information, which is divided into four types of information
- 1. Provenance Information (source, custody, history)
- 2. Context Information
- 3. Reference Information (identifiers)
- 4. Fixity Information (e.g. checksums)
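As an illustration of the four Preservation Description Information types, the sketch below builds a small record for a data object and uses the fixity entry to detect corruption. The function names and dictionary keys are assumptions made for this example, and SHA-256 is chosen only because it is available in the Python standard library; the survey does not prescribe a checksum algorithm.

```python
import hashlib
from typing import Dict


def fixity_digest(data: bytes) -> str:
    """Return a SHA-256 checksum of the data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


def make_pdi(data: bytes, source: str, context: str, identifier: str) -> Dict[str, str]:
    """Assemble the four PDI types for one data object (hypothetical layout)."""
    return {
        "provenance": source,         # where the data came from
        "context": context,           # how it relates to its environment
        "reference": identifier,      # identifier of the object
        "fixity": fixity_digest(data),  # checksum taken at archiving time
    }


def verify(data: bytes, pdi: Dict[str, str]) -> bool:
    """Check that the data still matches the checksum stored in its PDI."""
    return fixity_digest(data) == pdi["fixity"]
```

Calling `verify` when a file is read back from storage is one simple way to provide the file error checking that a checksum makes possible.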
10 Data Dictionary
- A Data Dictionary is a formal repository of terms used to describe data. It defines classes of data objects and their properties, i.e.
- Application class
- Data file class
- Document class
- Global representation information class (representation information common to all data files)
- Specific representation information class (representation information specific to each data file)
11 Data Dictionary
- The proposed approach to capture the syntax, semantics and inter-relationships of domain information is a three-tiered concept that provides a consistent and unambiguous data definition.
- Global Dictionary, a repository for common terms and definitions
- Specific Dictionary, a repository for major object and element definitions. It includes descriptions of key objects and relationships between objects.
- Abstract Schema, an archive catalogue acting as a repository of data object definitions that are instantiated into DB tables. It defines data object types and maps data elements across objects.
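The three tiers can be sketched as follows. This is an illustrative toy, not the DED format used by any surveyed archive: the terms, object names and the `abstract_schema` function are all hypothetical, and the "tables" are plain column lists rather than real DB tables.

```python
from typing import Dict, List

# Tier 1: Global Dictionary, common terms and their definitions.
GLOBAL_DICTIONARY: Dict[str, str] = {
    "time": "time stamp of the measurement",
    "instrument": "name of the instrument that produced the data",
}

# Tier 2: Specific Dictionary, major objects, their elements and relationships.
SPECIFIC_DICTIONARY: Dict[str, dict] = {
    "Observation": {
        "elements": ["time", "instrument"],
        "relations": {"has_file": "DataFile"},
    },
    "DataFile": {"elements": ["time"], "relations": {}},
}


def abstract_schema(specific: Dict[str, dict],
                    global_terms: Dict[str, str]) -> Dict[str, List[str]]:
    """Tier 3: derive one table definition (a column list) per object type."""
    schema: Dict[str, List[str]] = {}
    for obj, spec in specific.items():
        # Every element must be defined in the Global Dictionary.
        unknown = [e for e in spec["elements"] if e not in global_terms]
        if unknown:
            raise KeyError(f"{obj} uses undefined terms: {unknown}")
        columns = list(spec["elements"])
        # Each relationship becomes a reference column to the related object.
        columns += [f"{relation}_id" for relation in spec["relations"]]
        schema[obj.lower()] = columns
    return schema
```

The check against `GLOBAL_DICTIONARY` is what keeps definitions consistent: an object cannot use an element that has no agreed common meaning.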
12 Archive Survey
13 Archive Survey
14 Archive Survey
15 Archive Survey
16 Archive Survey - First conclusions
- Good mapping of ESA Archives to the OAIS Reference Model
- Main differences are as follows
- No clear separation between Content Information and Preservation Description Information
- Prime and backup chains are not physically separated to guarantee a smooth disaster recovery
- No file error checking when archiving (e.g. checksum)
- Administrative functions covered either via ICDs, documents produced during SR / AD Phases or done by S/W support
- Preservation Planning functions done by Archive Scientist and/or Archive Administrator
- Comparison against other archival aspects
- Main differences are as follows
- Variety of Data Interchange Standards, DBs, H/W, S/W and S/W languages used
- Limited usage of Data Dictionaries
- Metadata delivery / visualisation method different for each archive
- Limited or no interoperability
17 Archive Survey - General Comments
- ESRIN MUIS and ESRIN AMS have to be combined as one Archive because their functions are complementary and comparable with those of the ISO Data Archive and XMM SOC AMS.
- The spreadsheet used for the Archive Survey has to be modified as follows
- Inclusion of user profile
- Clear division between the mechanism to exchange data and the physical way to transport data
- Archive / file size
- Data Interchange Standards
- The table shows a confusion between catalogue access (CIP) and data interchange formats (e.g. HDF)
18 Archive Survey - General Comments
- Interoperability
- Requirements for exchange between ESA Archives have to be clarified. Interoperability is only needed for archives of data of the same discipline (therefore interagency) and not for archives of data of different disciplines in the same agency. EO missions have more problems in common with DLR or other national Agencies than with D/SCI (an exception is the exchange between EO and planetary missions, where similar user services are envisaged for Mars Express and Envisat, mutatis mutandis). The same applies to D/SCI and its relations with the scientific community.
- Catalogue interoperability should be distinguished from data interoperability.
19 EVA High Level Concept
- The EVA will develop the basic middleware infrastructure, supporting tools and protocols necessary to realise the full scientific potential of ESA archives in the future.
- To accomplish this, the EVA will first and foremost be built as a science-driven, ESA multi-site effort (i.e. involving all Directorates). This would be accomplished through development of the infrastructure software across sites and disciplines (ESRIN, ESTEC, VILSPA and ESOC).
- The EVA concept defines a set of common services, infrastructure software (middleware) and tools that can be used as building blocks by any ESA Archive Project. Although ESA projects are developed on different time scales and use different data formats, middleware formats are totally generic.
20 EVA High Level Concept
- EVA activities to fulfil its role would include
- Establishment of a common system approach to data archiving, retrieval and interchange.
- Enabling transparent access to distributed data sources
- Enabling the distributed development of a suite of commonly usable new software tools
- Co-ordinating the establishment of high speed data transfer networks that are essential to provide the connectivity among archives, computing facilities, and the widespread community of users
- Facilitating productive collaborations among ESA Centres
- Ensuring communication and possible collaborations with scientists in other disciplines facing similar problems
21 EVA High Level Concept
What is EVA not?
- A standard to be imposed for developing ESA Archives
22 EVA High Level Concept
23 EVA High Level Concept
- The major capabilities of EVA that need to be established in order to enable its goals include the ability to
- Build tools and services upon open standards (e.g. Java, XML, DEDSL, OAIS)
- Define a metadata language for catalogues and data sets
- Provide a framework for incorporating new data.
- Link with existing and future digital libraries.
- Similar initiatives are being pursued in the following projects
- The US NVO (National Virtual Observatory)
- The CERN Data Grid, in which ESA is participating via ESRIN on Earth Observation Applications.
- JPL OODT
- GSFC SML
- SPACE GRID
- The European Astronomical Virtual Observatory
24 ESA Background
- Data systems across ESA archival systems have the following characteristics
- Cross-correlation between multi-disciplinary datasets is difficult
- Geographically distributed
- Have no standard language or protocol for data interchange (except for catalogue access (CIP on MUIS) or the CEOS product format).
- No common metadata model ESA wide
- Have no system for registration of data products ESA wide
- Have different internal representations for data products
- A common Archive part that is sometimes missing or not clearly defined in our ESA archives is metadata
- (i.e. information about information, labelling, cataloguing and descriptive information)
- Each archive has to model and encode this metadata. The meaning of the metadata has to be documented. A proper language shall define the metadata on properties and relationships between archive elements.
25 ESA Middleware Service
- Candidate EVA middleware services (non-exhaustive list) are as follows
- The Data Distribution Service
- The Query Service shall manage metadata associated with resources and locate resources across geographically distributed data systems.
- The Administration Service shall provide a set of tools for logging, supervision and archive statistics.
- The User Profile Service shall manage a set of profiles. A profile is a set of resource definitions describing information about distributed data systems and their products.
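A minimal sketch of how profiles and a Query Service could fit together is given below. It assumes an in-memory registry and keyword matching only; the `ResourceProfile` and `QueryService` names, their fields and their methods are hypothetical and do not describe an existing EVA interface.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class ResourceProfile:
    """Resource definition: a data product and the site that holds it."""
    name: str
    site: str
    keywords: FrozenSet[str]


class QueryService:
    """Keeps resource metadata and locates resources across sites."""

    def __init__(self) -> None:
        self._profiles: List[ResourceProfile] = []

    def register(self, name: str, site: str, keywords: Iterable[str]) -> None:
        """Record a resource; keywords are stored in lower case."""
        normalised = frozenset(k.lower() for k in keywords)
        self._profiles.append(ResourceProfile(name, site, normalised))

    def locate(self, *keywords: str) -> List[ResourceProfile]:
        """Return every resource that carries all of the given keywords."""
        wanted = {k.lower() for k in keywords}
        return [p for p in self._profiles if wanted <= p.keywords]
```

Because each profile records its site, a caller gets back not only which products match but also where they are held, which is the locating role described for the Query Service.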
26 Networking Services
- EVA will depend critically on the nature and quality of the underlying network. When building distributed applications, one often observes unexpectedly low performance, the reasons for which are usually not obvious. Performance (bandwidth, latency), security, quality of service and reliability will all be key factors.
- Activities include
- Review the network service requirements of ESA and make detailed plans in collaboration with ESRIN, VILSPA, ESTEC and ESOC under the co-ordination of ADM-IT.
- Monitor the traffic and performance of the network, develop models, and provide tools and data for the planning of the future ESA network, concentrating especially on the requirements of handling significant volumes of data.
- Deal with the distributed security aspects.
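To show how bandwidth and latency combine when planning for large data volumes, here is a deliberately simple first-order model. It assumes transfer time is the serialisation time plus a fixed latency per round trip, and ignores protocol overhead, congestion and retransmission; the function name and parameters are illustrative, not part of any ESA planning tool.

```python
def transfer_time_s(size_bytes: float, bandwidth_bps: float,
                    latency_s: float = 0.0, round_trips: int = 1) -> float:
    """Estimate the seconds needed to move a file over a network link.

    size_bytes    -- volume of data to move, in bytes
    bandwidth_bps -- usable link bandwidth, in bits per second
    latency_s     -- delay paid once per round trip, in seconds
    round_trips   -- number of round trips (e.g. one per request)
    """
    if bandwidth_bps <= 0:
        raise ValueError("bandwidth must be positive")
    serialisation = size_bytes * 8 / bandwidth_bps  # bytes -> bits
    return round_trips * latency_s + serialisation
```

For many small requests the latency term dominates, while for bulk transfers the bandwidth term does, which is one reason low performance in distributed applications is not always explained by link capacity alone.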
27 Parallel Activities
- Parallel activities are not within the ToR of the EVA WG, but
- they are sometimes deemed related to the EVA WG
- the WG can benefit from their findings
- the WG can suggest improvements
- WG contribution to these activities is not compulsory
- Parallel activities are currently
- ESOC XML Packaging Study
- XMM SOC AMS / ESOC DDS assessment by using Tamino
- Archive Ingestion Methodology (LET-SME initiative)
- CEOS Baseband InterChange Format ICF (ESRIN)
- ESRIN DEBAT (Digital EAST based Access Tools) GSP Study
28 Parallel Activities
- ESOC Study on XML Packaging (01.06.01 - 28.02.02)
- ESRIN and SSD are involved by receiving the study outputs, by reviewing documentation and by including their Archives (IDA, MUIS, AMS) in the proposed survey.
- GMP and CA are members of the TEB
- Study Phases
- Consolidation (TN, URD, SRD)
- Comparison of Archives
- XML survey
- Specification of Mechanism
- Selection of Archive
- Prototype development
- Two totally different Archives will be selected to demonstrate interoperability
- Installation and presentation at ESOC
29 Parallel Activities - DDS with Tamino
[Diagram labels: Spacecraft; MCS; Server; XML SML; MCS Format; DDS User; SOAP-XML Internet Intranet; RDM Production System; XSLT]
30 Parallel Activities - XMM SOC AMS with Tamino
[Diagram labels: AMS Archive; ATA Request; SAS, ISS, PPS; Other Sub-Systems; XML Catalogue Retrieval; EFT; XFTS; XML SOAP; Internal, external Partner; XML Schema Validation; XML; XSLT Disseminator; XML Metadata Repository; XSCS; Arbitrary Format (XML, non-XML); XML or other Format]
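The idea of exchanging catalogue metadata as XML can be sketched with the Python standard library alone. This example does not use Tamino, SOAP, XSLT or XML Schema: the element names and the `REQUIRED` field list are assumptions, and the required-field check is only a simple stand-in for the schema validation step named in the diagram.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical minimum set of fields a catalogue entry must carry.
REQUIRED = ("identifier", "title", "format")


def to_xml(record: Dict[str, str]) -> str:
    """Serialise a flat metadata record as a small XML document."""
    root = ET.Element("catalogueEntry")
    for key, value in record.items():
        ET.SubElement(root, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


def from_xml(text: str) -> Dict[str, str]:
    """Parse a catalogue entry and reject it if required fields are missing."""
    root = ET.fromstring(text)
    record = {child.tag: child.text for child in root}
    missing = [name for name in REQUIRED if name not in record]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return record
```

Because the data file itself is only referred to by its `format` field, the same wrapper works whether the underlying product is XML or not.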
31 Parallel Activities
- GC Proposal submitted under LET-SME (Leading Edge Technologies for Small and Medium Enterprises) on Archive Ingestion Methodology
- Two Phases (100 kEuro each)
- Phase 1 includes the feasibility analysis, design and initial demonstration via prototype of the archival ingestion process
- Phase 2 shall build on the Phase 1 results to implement, test and validate the procedures and software tools that support the ingest process. Interoperability between archives shall be demonstrated.
- Proposal is under evaluation
32 Digital National Libraries
- EVA and the OAIS model were presented to the EU-DELOS Workshop on Harmonization of European and National Initiatives in Digital Libraries, held on May 11th 2001 in Brussels.
- Agenda was
- EU initiatives under FP6 (European Commission)
- National current/planned programs (Belgium, Denmark, Finland, France, Germany, Greece, Italy, Norway, Portugal, Spain, UK, ESA)
- Proposals and recommendations for FP6
- Proposals and recommendations for Harmonisation of national and EU Digital Library programs
33 Digital National Libraries
- List of participants
- Fernando Armario (Spain)
- Philippe Avenier (France)
- Tarina Ayazi (DELOS)
- Jose Borbinha (Portugal)
- Rosella Caffo (Italy)
- Panos Costantopoulos (Greece)
- Lorcan Dempsey (UK)
- Donald George (Belgium)
- Kristiina Hormia-Poutanen (Finland)
- Elizabeth Lyon (UK)
- Maria Pia Rinaldi Mariani (Italy)
- Philippe Mougnaud (European Space Agency)
- Bo Öhrström (Denmark)
- Reinhard Rutz (Germany)
- Rudi Schmiede (Germany)
- Francesco Sicilia (Italy)
- Bernard Smith (European Commission)
- Ingeborg Solvberg (Norway)
34 Plan for the next 3 months
- Technical
- The EVA Initiative WG has to issue a solid EVA concept by mid-September 2001, to be agreed by all parties in the 3rd Status Meeting.
- This concept shall clearly identify the common EVA services and the associated set of middleware tools. It shall also clearly highlight the novel EVA "add-ons" with respect to existing or ongoing similar initiatives.
- Once the different EVA tools are identified, an inventory of existing tools in other projects has to be carried out.
- Coordinate EVA activities with DATA GRID, NVO and the European Astronomical Virtual Observatory
- Managerial
- The 3rd EVA Initiative General Meeting will be held at ESOC by the end of September 2001
- Propose (if appropriate) an Implementation Plan for approval.
- The goal is to present the EVA initiative to high-level management to reach an interdirectorate agreement in the period November / December 2001.