Long-term preservation of digital data: the data and metadata ingest process in the SIPAD (Claude HUC)

CCSDS Panel 2, Ingest process, Annapolis, May 2000

1
Long-term preservation of digital data
The data and metadata ingest process in the SIPAD
Claude HUC, Claude.huc@cnes.fr
2
Reminder: collection data model
[Figure: collection tree. Levels: Plasma physics; collections (VIKING, Interball, ISEE, Sounder, V4, V2, Electron, Ion); terminal collections C1 to C8; data objects]
3
Data and storage objects
[Figure: data objects and their storage objects]
4
First full model
[Figure: full collection tree. Levels: Sciences of the Universe; Plasma physics; collections (Waves, Particles, Radio-astronomy); terminal collections C1 to C8; data objects; storage objects]
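The layered model above can be held in a small tree structure. In this minimal Python sketch, non-terminal collections contain sub-collections, terminal collections contain data objects, and each data object is stored in one or more storage objects. All class and field names (Collection, DataObject, StorageObject, size_bytes) are hypothetical. The sample tree borrows labels from the slides, but its nesting is assumed.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class StorageObject:
    """A physical file holding all or part of a data object (hypothetical fields)."""
    file_name: str
    size_bytes: int


@dataclass
class DataObject:
    """A logical archive object, stored in one or more storage objects."""
    identifier: str
    storage_objects: list[StorageObject] = field(default_factory=list)


@dataclass
class Collection:
    """A node of the collection tree; terminal collections hold data objects."""
    name: str
    children: list[Collection] = field(default_factory=list)
    data_objects: list[DataObject] = field(default_factory=list)

    @property
    def is_terminal(self) -> bool:
        return not self.children

    def iter_data_objects(self):
        """Yield every data object held by this collection or any sub-collection."""
        yield from self.data_objects
        for child in self.children:
            yield from child.iter_data_objects()

    def total_size(self) -> int:
        """Sum the sizes of all storage objects reachable from this node."""
        return sum(s.size_bytes for d in self.iter_data_objects() for s in d.storage_objects)


# Illustrative tree using labels from the slides; the nesting itself is assumed.
c1 = Collection("C1", data_objects=[DataObject("OBJ_001", [StorageObject("obj_001.dat", 1024)])])
c2 = Collection("C2", data_objects=[
    DataObject("OBJ_002", [StorageObject("obj_002a.dat", 10), StorageObject("obj_002b.dat", 20)]),
])
waves = Collection("Waves", children=[c1, c2])
root = Collection("Plasma physics", children=[waves])
```

Separating data objects (logical) from storage objects (physical files) lets a data object be re-stored or split across several files without changing its place in the collection tree.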
5
Extension of the model
[Figure: two parallel graphs. Data graph: nodes C1, ..., Ci, Cj, C11, C12, C13, ..., terminal collections TC1 to TC4, data objects. Browse graph: nodes BC1 to BCn, BC11, ..., terminal nodes BTC1 to BTCn, browse objects]
6
Situation of work with respect to OAIS
  • OAIS definition work: 1995-2000
  • At CNES, development of generic archive systems
    since 1994
  • Interaction between these activities, but OAIS
    was not fully taken into account during
    development

7
Reflections on a data and metadata ingest standard
  • Approach tested at CNES with the SIPAD system
  • Classification of the different categories of
    delivered data and metadata
  • Each category is described as an entity
    characterized by its attributes
  • PVL (Parameter Value Language) is used for these
    definitions
  • All descriptions are gathered in a dictionary
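To show how an entity described in PVL becomes structured metadata, here is a minimal Python reader. It is a sketch, not a conforming PVL parser. It handles only flat OBJECT / END_OBJECT groups of "KEYWORD = value" statements with scalar values, single-line /* */ comments, and a final END. The entity type and keywords in the sample (DATA_OBJECT, OBJECT_IDENTIFIER, FILE_NAME, FILE_SIZE) are hypothetical, not taken from the SIPAD dictionary.

```python
from __future__ import annotations

import re

# "KEYWORD = value" with an optional trailing ";" (simple scalar values only).
_ASSIGN = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)\s*=\s*(.*?)\s*;?$")


def _convert(raw: str):
    """Map a PVL scalar to Python: quoted text, integer, real, or bare symbol."""
    if len(raw) >= 2 and raw[0] == raw[-1] == '"':
        return raw[1:-1]
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw


def parse_pvl(text: str) -> list[dict]:
    """Parse top-level OBJECT ... END_OBJECT groups (no nesting) into dicts."""
    entities, current = [], None
    for raw_line in text.splitlines():
        line = re.sub(r"/\*.*?\*/", "", raw_line).strip()  # drop single-line comments
        if not line or line.upper() == "END":
            continue
        m = _ASSIGN.match(line)
        if not m:
            raise ValueError(f"unparsable PVL statement: {line!r}")
        key, value = m.group(1).upper(), _convert(m.group(2))
        if key == "OBJECT":
            current = {"_type": value}
        elif key == "END_OBJECT":
            if current is None or current["_type"] != value:
                raise ValueError(f"unmatched END_OBJECT = {value}")
            entities.append(current)
            current = None
        elif current is not None:
            current[key] = value
        else:
            raise ValueError(f"statement outside any OBJECT: {line!r}")
    if current is not None:
        raise ValueError("missing END_OBJECT")
    return entities


SAMPLE = """
/* Hypothetical delivery of one data object */
OBJECT = DATA_OBJECT
  OBJECT_IDENTIFIER = "VIKING_V4_0001"
  FILE_NAME = "v4_0001.dat"
  FILE_SIZE = 204800;
END_OBJECT = DATA_OBJECT
END
"""
```

Applied to SAMPLE, parse_pvl returns one dict whose "_type" is DATA_OBJECT, with the three attributes converted to a string, a string and an integer.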

8
Classification of delivered information
  • Simple entities
    • object identifier
    • file name
    • file size
    • path value
    • name of a graph node
    • link between two nodes in a graph
    • type of textual document (publication, DIF,
      software source, conventional document)
    • type of graph
    • etc.
  • Complex entities
    • document
    • document collection
    • data object
    • storage object
    • data set or data collection
    • graph node
    • etc.
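One way to hold the simple/complex classification in a dictionary is sketched below in Python. Each simple entity carries a validity check on a single value. Each complex entity lists its required attributes, and any of those attributes may itself be complex. The entity names and validation rules are assumptions made for illustration; only the document-type values come from the slide.

```python
from __future__ import annotations

import re

# Hypothetical simple entities: each validates one scalar value.
SIMPLE = {
    "OBJECT_IDENTIFIER": lambda v: isinstance(v, str) and re.fullmatch(r"[A-Z0-9_]+", v) is not None,
    "FILE_NAME": lambda v: isinstance(v, str) and v != "" and "/" not in v,
    "FILE_SIZE": lambda v: isinstance(v, int) and v >= 0,
    "PATH": lambda v: isinstance(v, str) and v.startswith("/"),
    "DOCUMENT_TYPE": lambda v: v in {"PUBLICATION", "DIF", "SOFTWARE_SOURCE", "CONVENTIONAL_DOCUMENT"},
}

# Hypothetical complex entities: required attributes, simple or complex.
COMPLEX = {
    "STORAGE_OBJECT": ["FILE_NAME", "FILE_SIZE", "PATH"],
    "DATA_OBJECT": ["OBJECT_IDENTIFIER", "STORAGE_OBJECT"],
    "DOCUMENT": ["OBJECT_IDENTIFIER", "DOCUMENT_TYPE", "FILE_NAME"],
}


def check(entity_type: str, value) -> list[str]:
    """Return the problems found when validating value against entity_type."""
    if entity_type in SIMPLE:
        return [] if SIMPLE[entity_type](value) else [f"{entity_type}: invalid value {value!r}"]
    if entity_type in COMPLEX:
        if not isinstance(value, dict):
            return [f"{entity_type}: expected a group of attributes"]
        problems = []
        for attr in COMPLEX[entity_type]:
            if attr not in value:
                problems.append(f"{entity_type}: missing {attr}")
            else:
                problems.extend(check(attr, value[attr]))  # recurse into nested entities
        return problems
    return [f"unknown entity type {entity_type}"]
```

Because complex entities are defined only in terms of other dictionary entries, a new category can be added by declaring it in COMPLEX without touching the checking code.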

9
Document collection description
10
Document description
11
Data object description
12
Storage object description
13
Data object and storage object input
14
Ingest process
  • For each ingestion session, the ingest system
    waits for at least one file with a .pvl
    extension. It then performs the following
    operations:
    • verification that the metadata described in
      PVL conform to the reference dictionary,
    • verification of the consistency of the
      delivered metadata: when a PVL entity gives the
      name of an external file, this file must be
      present in the ingest space,
    • start of the ingest process.
  • This highly flexible approach is designed so that
    as many new categories of delivered objects can
    be added as necessary.
  • This dictionary system will probably be
    transferred to XML in the next few years.
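The session steps above can be sketched in Python as follows. This is not the SIPAD implementation. The reference dictionary contents, the attribute names, and the rule that FILE_NAME values designate external files are assumptions. The tiny reader handles only flat OBJECT / END_OBJECT groups of "KEY = value" lines.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical reference dictionary: entity type -> allowed attribute names.
REFERENCE_DICTIONARY = {
    "DATA_OBJECT": {"OBJECT_IDENTIFIER", "FILE_NAME", "FILE_SIZE"},
    "DOCUMENT": {"OBJECT_IDENTIFIER", "DOCUMENT_TYPE", "FILE_NAME"},
}
# Attributes whose value names an external file expected in the ingest space.
FILE_ATTRIBUTES = {"FILE_NAME"}


def read_entities(pvl_file: Path) -> list[dict]:
    """Very small PVL subset reader: OBJECT/END_OBJECT groups of KEY = value lines."""
    entities, current = [], None
    for raw in pvl_file.read_text().splitlines():
        line = raw.strip().rstrip(";")
        if not line or "=" not in line:
            continue  # skips blank lines and the final END statement
        key, value = (part.strip() for part in line.split("=", 1))
        value = value.strip('"')
        if key == "OBJECT":
            current = {"_type": value}
        elif key == "END_OBJECT":
            if current is not None:
                entities.append(current)
            current = None
        elif current is not None:
            current[key] = value
    return entities


def check_conformity(entity: dict) -> list[str]:
    """Step 1: entity type and attributes must exist in the reference dictionary."""
    allowed = REFERENCE_DICTIONARY.get(entity["_type"])
    if allowed is None:
        return [f"unknown entity type {entity['_type']}"]
    return [f"{entity['_type']}: unknown attribute {k}"
            for k in entity if k != "_type" and k not in allowed]


def check_consistency(entity: dict, ingest_space: Path) -> list[str]:
    """Step 2: every file named by the metadata must be present in the ingest space."""
    return [f"missing file {entity[k]}" for k in sorted(FILE_ATTRIBUTES)
            if k in entity and not (ingest_space / entity[k]).is_file()]


def run_session(ingest_space: Path) -> tuple[bool, list[str]]:
    """Return (ready_to_ingest, problems) for one ingestion session."""
    pvl_files = sorted(ingest_space.glob("*.pvl"))
    if not pvl_files:
        return False, ["no .pvl file yet: keep waiting"]
    problems = []
    for pvl_file in pvl_files:
        for entity in read_entities(pvl_file):
            problems += check_conformity(entity)
            problems += check_consistency(entity, ingest_space)
    # Step 3 (start of the ingest process) is allowed only if nothing was reported.
    return not problems, problems
```

For example, a delivery whose .pvl file names v4_0001.dat is reported as "missing file v4_0001.dat" until that file appears in the ingest space. Checking consistency against the file system, rather than trusting the metadata alone, is what lets the ingest start automatically.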

15
External reference processing
  • Metadata increasingly contain the following
    types of external references:
    • HTTP addresses of multiple WWW servers,
    • names and e-mail addresses of points of
      contact.
  • These external references are unstable by
    nature.
  • Hence the need to systematically replace all
    external physical references with logical
    markup, resolved through a dictionary that maps
    each logical name to its current physical
    reference.
  • This function is provided by the GLU software
    (Gestionnaire de Liens Universels), designed and
    developed by the Centre de Données
    Astronomiques de Strasbourg.
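The logical-markup idea can be illustrated with a short Python sketch. Metadata store only logical names, and a single table maps each name to its current physical reference. The {{link:NAME}} syntax, the link names and the example.org addresses are hypothetical. This is not the markup used by GLU.

```python
from __future__ import annotations

import re

# Hypothetical markup {{link:NAME}}; this is NOT the GLU syntax, just an illustration.
_MARKUP = re.compile(r"\{\{link:([A-Za-z0-9_]+)\}\}")


def resolve_links(text: str, dictionary: dict[str, str]) -> str:
    """Replace every logical reference with its current physical value."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in dictionary:
            raise KeyError(f"no physical reference registered for {name!r}")
        return dictionary[name]

    return _MARKUP.sub(substitute, text)


# Metadata store only logical names; the physical targets live in one table.
METADATA = "Data server: {{link:viking_server}}; contact: {{link:viking_contact}}"
LINKS = {
    "viking_server": "http://www.example.org/viking/",
    "viking_contact": "mailto:contact@example.org",
}
```

When a server moves, only the LINKS entry changes. Every archived metadata record that refers to viking_server then resolves to the new address without being rewritten.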

16
Ideas to be developed (1)
  • Define a standard object dictionary including the
    concepts of:
    • information packages
    • content information
    • representation information
    • preservation description information
    • package information
  • With the possibility of enriching this
    dictionary to adapt it to each context:
    • add new objects
    • modify existing objects
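A standard dictionary that each archive can enrich without altering the shared version might look like the Python sketch below. The top-level objects follow the OAIS concepts listed on the slide. The attribute lists and the function name enrich are assumptions. Adding and modifying are kept as separate operations so that a context cannot silently redefine a standard object.

```python
from __future__ import annotations

from collections.abc import Mapping
from types import MappingProxyType

# Hypothetical standard dictionary built on OAIS concepts (read-only view).
STANDARD_DICTIONARY = MappingProxyType({
    "INFORMATION_PACKAGE": ("CONTENT_INFORMATION", "PRESERVATION_DESCRIPTION_INFORMATION",
                            "PACKAGE_INFORMATION"),
    "CONTENT_INFORMATION": ("DATA_OBJECT", "REPRESENTATION_INFORMATION"),
    "REPRESENTATION_INFORMATION": ("STRUCTURE_INFORMATION", "SEMANTIC_INFORMATION"),
    "PRESERVATION_DESCRIPTION_INFORMATION": ("REFERENCE", "PROVENANCE", "CONTEXT", "FIXITY"),
    "PACKAGE_INFORMATION": ("PACKAGE_IDENTIFIER",),
})


def enrich(base: Mapping, added: Mapping | None = None,
           modified: Mapping | None = None) -> Mapping:
    """Return a context-specific dictionary; the standard one is left untouched."""
    added = added or {}
    modified = modified or {}
    clash = set(added) & set(base)
    if clash:
        raise ValueError(f"already defined, use 'modified' instead: {sorted(clash)}")
    unknown = set(modified) - set(base)
    if unknown:
        raise ValueError(f"cannot modify undefined objects: {sorted(unknown)}")
    merged = dict(base)
    merged.update(added)     # new objects for this context
    merged.update(modified)  # redefinitions of existing objects
    return MappingProxyType(merged)
```

For instance, enrich(STANDARD_DICTIONARY, added={"BROWSE_OBJECT": ("FILE_NAME",)}) yields a context dictionary with one extra object, while STANDARD_DICTIONARY itself stays unchanged.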

17
Ideas to be developed (2)
  • The ingest process may be greatly simplified and
    automated if the producer and the OAIS agree on
    the dictionary of objects to be delivered,
  • This dictionary includes the objects but may also
    include the object organizational model,
  • The construction rules are those described in
    the DEDSL (Data Entity Dictionary Specification
    Language),
  • We do, however, describe objects (characterized
    by their nature) and not data entities
    (characterized by their meaning)
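If the agreed dictionary also carries the object organizational model, the archive can check a delivered hierarchy automatically before ingest. The Python sketch below assumes a hypothetical model stating which object types each type may contain, and a delivery represented as nested dicts with "type", "name" and optional "children" keys. Both representations are illustrative only.

```python
from __future__ import annotations

# Hypothetical organizational model agreed between producer and archive:
# for each object type, the object types it may contain.
ORGANIZATION = {
    "COLLECTION": {"COLLECTION", "TERMINAL_COLLECTION"},
    "TERMINAL_COLLECTION": {"DATA_OBJECT"},
    "DATA_OBJECT": {"STORAGE_OBJECT"},
    "STORAGE_OBJECT": set(),
}


def check_organization(node: dict, path: str = "") -> list[str]:
    """Walk a delivered tree and report nestings the model does not allow."""
    here = f"{path}/{node['name']}"
    allowed = ORGANIZATION.get(node["type"])
    if allowed is None:
        return [f"{here}: unknown object type {node['type']}"]
    problems = []
    for child in node.get("children", []):
        if child["type"] not in allowed:
            problems.append(f"{here}: {node['type']} may not contain {child['type']}")
        problems.extend(check_organization(child, here))
    return problems
```

A delivery that places a DATA_OBJECT directly under a COLLECTION would be reported as "/<name>: COLLECTION may not contain DATA_OBJECT" instead of being ingested into the wrong place.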

18
Proposals
  • Test the proposed principles in several contexts
    and on several types of archive,
  • Analyze the feasibility of drawing up a general
    common approach for input, based on the
    definition of dictionaries of delivered entities