Title: Long term preservation of digital data: The data and metadata ingest process in the SIPAD (Claude HUC)
1 Long term preservation of digital data
The data and metadata ingest process in the SIPAD
Claude HUC, Claude.huc_at_cnes.fr
2 Reminder: Collection data model
[Figure: collection tree. Levels, top to bottom: Collections (Plasma physics; VIKING, Interball, ISEE; Sounder, V4, V2, Electron, Ion), Terminal collections (C1 to C8), Data objects.]
3 Data and storage objects
Data Objects
Storage Objects
4 First full model
[Figure: full model tree. Levels, top to bottom: Collections (Sciences of the Universe; Plasma physics, Radio-astronomy; Waves, Particles), Terminal collections (C1 to C8), Data objects, Storage objects.]
5 Extension of the model
[Figure: two parallel graphs. Data graph: nodes C1, G2, Ci, Cj, C11, C12, C13, ..., then TC1 to TC4, then Data objects. Browse graph: nodes BC1, BC2, BC3, ..., BCn, BC11, ..., then BTC1, BTC2, ..., BTCn, then Browse objects.]
6 Position of this work with respect to OAIS
- OAIS definition work: 1995-2000
- At CNES, generic archive systems have been under development since 1994
- The two activities interacted, but OAIS was not completely taken into account in the development
7 Reflections on a data and metadata ingest standard
- approach tested at CNES with the SIPAD system:
  - classification of the different categories of data and metadata delivered,
  - each category is described in the form of an entity characterized by its attributes
- PVL is used for this definition
- all descriptions are gathered in a dictionary
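To make the idea of "an entity characterized by its attributes" concrete, here is a minimal sketch that reads a description written in a small PVL-like subset. It only handles `keyword = value;` statements and `BEGIN_OBJECT = name;` / `END_OBJECT = name;` blocks. The entity and attribute names in the sample (DATA_OBJECT, STORAGE_OBJECT, FILE_NAME, ...) and the nesting are assumptions chosen for illustration, not the actual SIPAD dictionary content.

```python
def parse_pvl_subset(text):
    """Parse a small PVL-like subset into nested dictionaries.

    Illustrative only: comments, units, sequences, sets and multi-line
    values of full PVL are NOT handled.
    """
    root = {"name": None, "attributes": {}, "objects": []}
    stack = [root]  # innermost open object is stack[-1]
    for raw in text.splitlines():
        line = raw.strip().rstrip(";").strip()
        if not line:
            continue
        if line == "END":  # optional end-of-module statement
            break
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a 'keyword = value' statement: {raw!r}")
        key, value = key.strip(), value.strip().strip('"')
        if key == "BEGIN_OBJECT":
            child = {"name": value, "attributes": {}, "objects": []}
            stack[-1]["objects"].append(child)
            stack.append(child)
        elif key == "END_OBJECT":
            if len(stack) == 1 or stack[-1]["name"] != value:
                raise ValueError(f"unbalanced END_OBJECT = {value}")
            stack.pop()
        else:
            stack[-1]["attributes"][key] = value  # values are kept as strings
    if len(stack) != 1:
        raise ValueError(f"missing END_OBJECT for {stack[-1]['name']}")
    return root


# Hypothetical delivery: one data object with one storage object.
SAMPLE = """
BEGIN_OBJECT = DATA_OBJECT;
  OBJECT_IDENTIFIER = "DO_0001";
  BEGIN_OBJECT = STORAGE_OBJECT;
    FILE_NAME = "do_0001.dat";
    FILE_SIZE = 2048;
  END_OBJECT = STORAGE_OBJECT;
END_OBJECT = DATA_OBJECT;
"""

parsed = parse_pvl_subset(SAMPLE)
data_object = parsed["objects"][0]
```

Each parsed object carries its name, its attributes and its nested objects, which is the shape a dictionary-based check needs as input.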
8 Classification of delivered information
- Simple entities
  - object identifier
  - file name
  - file size
  - path value
  - name of a graph node
  - link between 2 nodes in a graph
  - type of textual document (publication, DIF, software source, conventional document)
  - type of graph
  - etc.
- Complex entities
  - document
  - document collections
  - data object
  - storage object
  - data set or data collection
  - graph node
  - etc.
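The split between simple and complex entities can be sketched as a small data structure: simple entities carry a value type, and complex entities are built from the names of other entities. The specific composition shown (a storage object made of a file name, size and path; a data object made of an identifier and a storage object) is an assumption for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimpleEntity:
    name: str
    value_type: type  # Python type expected for the delivered value


@dataclass(frozen=True)
class ComplexEntity:
    name: str
    members: tuple  # names of simple or complex entities


# Hypothetical dictionary content, loosely following the classification above.
SIMPLE = {e.name: e for e in (
    SimpleEntity("OBJECT_IDENTIFIER", str),
    SimpleEntity("FILE_NAME", str),
    SimpleEntity("FILE_SIZE", int),
    SimpleEntity("PATH_VALUE", str),
)}

COMPLEX = {e.name: e for e in (
    ComplexEntity("STORAGE_OBJECT", ("FILE_NAME", "FILE_SIZE", "PATH_VALUE")),
    ComplexEntity("DATA_OBJECT", ("OBJECT_IDENTIFIER", "STORAGE_OBJECT")),
)}


def expand(name, simple, complex_entities):
    """Return the simple entities a name ultimately consists of.

    Raises KeyError for a name defined in neither table. Cyclic
    definitions are not detected in this sketch.
    """
    if name in simple:
        return [name]
    leaves = []
    for member in complex_entities[name].members:
        leaves.extend(expand(member, simple, complex_entities))
    return leaves


data_object_leaves = expand("DATA_OBJECT", SIMPLE, COMPLEX)
```

Expanding a complex entity down to its simple entities gives the flat list of values a producer would have to deliver for it.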
9 Document collection description
10 Document description
11 Data object description
12 Storage object description
13 Data object and storage object input
14 Ingest process
- For each ingestion session, the ingest system waits for at least one file with a pvl extension. It then performs the following operations:
  - verification of the conformity of the metadata described in PVL with the reference dictionary,
  - verification of the consistency of the metadata delivered: when a pvl entity gives the name of an external file, this file must be present in the ingest space,
  - start of the ingest process
- This highly flexible approach is designed so that as many new categories of delivered objects as necessary can be added
- This dictionary system will probably be transferred to XML in the next few years
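The three operations of an ingest session can be sketched as follows. This is not the SIPAD implementation: the layout of the reference dictionary, the function names, and the `parse` and `ingest` callables are all hypothetical, and PVL parsing itself is left to the caller.

```python
from pathlib import Path

# Hypothetical reference dictionary: for each entity, the attributes that
# must be present and the attributes whose value names an external file.
REFERENCE_DICTIONARY = {
    "DATA_OBJECT": {"required": {"OBJECT_IDENTIFIER"}, "file_attributes": set()},
    "STORAGE_OBJECT": {"required": {"FILE_NAME", "FILE_SIZE"},
                       "file_attributes": {"FILE_NAME"}},
}


def check_conformity(entities, dictionary):
    """Check delivered entities against the reference dictionary."""
    errors = []
    for kind, attributes in entities:
        definition = dictionary.get(kind)
        if definition is None:
            errors.append(f"unknown entity {kind}")
            continue
        for missing in sorted(definition["required"] - set(attributes)):
            errors.append(f"{kind}: missing attribute {missing}")
    return errors


def check_consistency(entities, dictionary, ingest_space):
    """Check that every external file named in the metadata is present."""
    errors = []
    root = Path(ingest_space)
    for kind, attributes in entities:
        file_attributes = dictionary.get(kind, {}).get("file_attributes", ())
        for attribute in sorted(file_attributes):
            file_name = attributes.get(attribute)
            if file_name is not None and not (root / file_name).is_file():
                errors.append(f"{kind}: external file {file_name} not found")
    return errors


def run_session(ingest_space, parse, dictionary, ingest):
    """Run one session; return a list of errors (empty if ingest started).

    parse(path) must return a list of (entity_name, attributes) pairs;
    ingest(entities) stands in for the actual ingest step.
    """
    pvl_files = sorted(Path(ingest_space).glob("*.pvl"))
    if not pvl_files:
        return ["no .pvl file delivered yet"]
    entities = [entity for path in pvl_files for entity in parse(path)]
    errors = check_conformity(entities, dictionary)
    if not errors:
        errors = check_consistency(entities, dictionary, ingest_space)
    if not errors:
        ingest(entities)
    return errors
```

Because the checks are driven entirely by the dictionary, adding a new category of delivered object in this sketch means adding one dictionary entry, with no change to the checking code.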
15 External reference processing
- Metadata more and more often contain the following types of external references:
  - multiple WWW server http addresses,
  - names and e-mail addresses of points of contact
- These external references are by nature unstable
- Hence the need to systematically replace all external physical references by a logical markup that calls on a dictionary to switch from one to the other
- This function is handled by the GLU software (Gestionnaire de Liens Universels), designed and developed by the Centre de Données Astronomiques de Strasbourg
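The principle of replacing physical references by logical ones can be illustrated with a few lines of code. The `{{ref:NAME}}` markup, the link names and the example.org addresses below are invented for this sketch; they are not the GLU syntax or its dictionary format.

```python
import re

# Hypothetical link dictionary: logical name -> current physical reference.
LINKS = {
    "ARCHIVE_SERVER": "http://archive.example.org/",
    "ARCHIVE_CONTACT": "archive-contact@example.org",
}

# Hypothetical logical markup: {{ref:NAME}}
_MARKUP = re.compile(r"\{\{ref:([A-Z0-9_]+)\}\}")


def resolve(text, links):
    """Replace each logical reference by its current physical value.

    Unknown names are left untouched so they can be spotted and fixed.
    """
    def substitute(match):
        return links.get(match.group(1), match.group(0))

    return _MARKUP.sub(substitute, text)


stored = "Data served by {{ref:ARCHIVE_SERVER}}, contact {{ref:ARCHIVE_CONTACT}}."
displayed = resolve(stored, LINKS)
```

When a server moves or a contact changes, only the link dictionary is updated; the archived metadata keep their logical references unchanged.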
16 Ideas to be developed (1)
- Define a standard object dictionary including the concepts of:
  - information packages
  - content information
  - representation information
  - preservation description information
  - packaging information
- With the possibility of enriching this dictionary, to adapt it to each context:
  - add new objects
  - modify existing objects
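A standard dictionary that each context can enrich might look like the sketch below. The dictionary layout (object name mapped to a tuple of member names), the `enrich` function and the BROWSE_OBJECT example are assumptions made for illustration.

```python
# Hypothetical standard dictionary: object name -> names of its members.
STANDARD = {
    "INFORMATION_PACKAGE": ("CONTENT_INFORMATION",
                            "PRESERVATION_DESCRIPTION_INFORMATION"),
    "CONTENT_INFORMATION": ("DATA_OBJECT", "REPRESENTATION_INFORMATION"),
}


def enrich(standard, new_objects=None, modified_objects=None):
    """Return a context-specific dictionary; the standard one is not changed.

    new_objects must not already exist in the standard dictionary, and
    modified_objects must already exist in it.
    """
    new_objects = new_objects or {}
    modified_objects = modified_objects or {}
    clashes = sorted(set(new_objects) & set(standard))
    if clashes:
        raise ValueError(f"already defined: {clashes}")
    unknown = sorted(set(modified_objects) - set(standard))
    if unknown:
        raise ValueError(f"cannot modify undefined objects: {unknown}")
    return {**standard, **modified_objects, **new_objects}


# Example context: add a browse object and extend content information.
context_dictionary = enrich(
    STANDARD,
    new_objects={"BROWSE_OBJECT": ("OBJECT_IDENTIFIER",)},
    modified_objects={"CONTENT_INFORMATION": (
        "DATA_OBJECT", "REPRESENTATION_INFORMATION", "BROWSE_OBJECT")},
)
```

Keeping additions and modifications as separate, checked inputs makes it explicit where a given context departs from the shared standard.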
17 Ideas to be developed (2)
- The ingest process may be greatly simplified and automated if the producer and the OAIS agree on the dictionary of objects to be delivered,
- This dictionary includes the objects but may also include the object organizational model,
- The construction rules are those described in the DEDSL,
- We do, however, describe objects (characterized by their nature) and not data entities (characterized by their meaning)
18 Proposals
- Test the proposed principles in several contexts and on several types of archive,
- Analyze the feasibility of drawing up a general common approach for input, based on the definition of dictionaries of delivered entities