Title: MOIMS Internet Packaging and Registries WG XML Formatted Data Units (XFDU) XML Packaging of Binary and Text Data
1 MOIMS Internet Packaging and Registries WG: XML Formatted Data Units (XFDU) XML Packaging of Binary and Text Data
- Lou Reich
- NASA/CSC
- MOIMS Plenary
- May 10, 2004
2 XML Packaging Standard Rationale
- Physical media → Electronic Transfer
- No standard language for metadata → XML
- Homogeneous Remote Procedure Call → CORBA, SOAP
- Little understanding of long-term preservation → OAIS RM
- Record formats → Self-describing data formats
- New Requirements
- describe multiple encodings of a data object
- better describe the relationships among a set of data objects
3 Functionality by Release
- Version 1 should include
- Support for XML document and ZIP/JAR type files
- The capabilities of current SFDU packaging and CCSDS Control Authority Concepts
- Support for data descriptions, MIME types, self-describing formats, and detached data descriptions
- Flexible Metadata Model (supports producer view of Metadata types)
- Support for the OAIS RM Information Model Concepts and Types
- Flexible linkage to Metadata
- The ability to encapsulate related files/resources into a single file/container
- The ability to reference both content and metadata resources contained in the same container or at a known URL
- The ability to allow/reverse multiple transformations on files
- Behaviors
- Web Service Interfaces
- Portable Code
4 Functionality by Release
- Version 2 Functionality Enabled
- Behavior
- Automatic execution
- Scripting (output of one behavior as input to another behavior)
- Software Updates
- Process Definition
- Relationship Definition
- Support for SOAP with attachments with no mandated packaging of files into a single object
5 Environment View of XFDU
6 Logical View
7 DIAGRAM of XFDU XML SCHEMA
8 Expressions of Interest
- Interest in participation in 2004 Prototype integration
- Planetary Data System JPL
- National Space Science Data Center (NSSDC)
- Deep Space MS Packaging Prototype JPL
- GSFC Library GSFC
- ESA Data Distribution System
- CNES Archives (e.g., SIPAD)
- General Interest But No Current Commitment
- GMSEC NASA/GSFC Code 581
- HEASARC and Virtual Observatory
- EOSDIS Metadata Clearinghouse (ECHO)
- International Virtual Observatories
9 Status of XML Formatted Data Unit Structure and Construction Rules
- Interoperability Profile developed at the RAL Workshop. The Workshop noted that agreed resources must be committed.
- Working Group editor and Toolkit prototype lead funding discontinued from 11/2003 to 2/2004
- No progress in IPR WG during that timeframe
- A new draft of the XFDU Proposed Recommendation should be approved for TSG Review this Workshop
- Only prototype and testing activities will be able to improve the current solution
10 Review of IPR Charter
11 Required Resources
- Lead agency NASA or CNES editor. Staffing needed
- WG lead (NASA 25)
- WG deputy (NASA 15)
- Recommendations Editors (CNES 30, NASA 30)
- WG Contributors 10 per WG member
- Testing Coordinator 20
- Prototype developers 50 (NASA 1, CNES 1, ESA 0.5, BNSC 0.x) starting ASAP.
- Integrators 25 for 3 months, then 15 continuing, at least 1 per environment (NASA 3, CNES 2, ESA 2)
12 Risks
- Resources, Resources, Resources
- Regain Momentum from Working Group shutdown
- We cannot progress with multi-agency testing efforts
- Programmatic Risk Management
- The Packaging Recommendation functionality has been split between two planned releases of the XFDU Packaging Recommendation to allow early prototyping of required capabilities.
- A wide variety of use cases and testing environments including but not limited to
- NASA PDS
- NASA/EOSDIS Libraries
- NASA SLE implementations
- CNES SLE implementations
- CNES Archive Ingest SIP development
- ESA Data Distribution System
- ESA CAOS
13 Registries
- Packaging partners (PDS, GSFC Library, ESA-DDS, various SLE implementations, etc.) should give us a good feel for a number of repositories.
- NASA wants to make XML descriptions of all its data available from a single logical repository.
- Work in other areas suggests that the ebXML registry will be a good fit to all CCSDS repository requirements. An Open Source implementation is available which some say is sufficiently mature for operational use. NASA/CSC is installing the ebXML software and will report back on its experience with this.
- At the Fall 2004 meeting a joint meeting with the Information Architecture BOF/WG will be essential to avoid duplication of work.
14 Backup Slides
15 CCSDS ORGANIZATIONAL VIEW
[Figure label: MOIMS]
16 Conceptual View of Information Package
[Figure labels: Package Interchange File, External Packages, Manifest, File system]
17 Logical View of XFDU Package
[Figure label: Information Package Map]
18 XML SPY DIAGRAM of XFDU XML SCHEMA
19 XML SPY DIAGRAM of XFDU XML SCHEMA
20 XML Schema for Metadata Linkage
21 XML Schema for Information Object
22 Data/Metadata Linkages Requirements
- Data Objects that are contained in the manifest are to be encoded in base64 or XML
- Data Objects that are included by reference from the manifest are to exist as files in the XFDU package or as files with known URIs either in a repository or in a location accessible via URL
- Metadata objects that are contained in the manifest are to be encoded in base64 or XML
- Metadata objects that are included by reference from the manifest are to exist as files in the XFDU package or as files with known URIs either in a repository or in a location accessible via URL
- Information Objects can reference applicable Metadata objects by ID where the name of the referencing attribute is used to classify the Metadata and the schema enables identification of the source of the metadata
- Allow metadata objects to be treated as data objects to enable direct mapping to the OAIS representation net where each metadata object is an information object containing both data object and representation information.
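The two ways of carrying an object (contained in the manifest as base64, or included by reference) can be illustrated with a minimal Python sketch. The element name dataObjEntry comes from the schema slides; the child elements binaryData and fileLocation and the attributes ID and href are hypothetical names chosen for illustration, not taken from the XFDU schema.

```python
import base64
import xml.etree.ElementTree as ET


def inline_object(obj_id, payload):
    """Build an entry whose content is carried inside the manifest as base64."""
    entry = ET.Element("dataObjEntry", {"ID": obj_id})  # attribute name is hypothetical
    content = ET.SubElement(entry, "binaryData")  # hypothetical element name
    content.text = base64.b64encode(payload).decode("ascii")
    return entry


def referenced_object(obj_id, href):
    """Build an entry that points at a file in the package or at a known URI."""
    entry = ET.Element("dataObjEntry", {"ID": obj_id})
    ET.SubElement(entry, "fileLocation", {"href": href})  # hypothetical element name
    return entry


def describe(entry):
    """Return ('inline', bytes) or ('reference', location) for an entry."""
    inline = entry.find("binaryData")
    if inline is not None:
        return ("inline", base64.b64decode(inline.text))
    return ("reference", entry.find("fileLocation").get("href"))


contained = inline_object("obj1", b"\x00\x01telemetry")
by_reference = referenced_object("obj2", "data/image.fits")
print(describe(contained))
print(describe(by_reference))
```

The same pattern would apply to metadata objects, since the requirements treat contained and referenced metadata in the same way as data.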
23 XML Schema for Metadata Linkage
24 XML Schema for Digital Object
25 Development Approach
- Develop Draft Concept Paper and XML Schemas for internal review
- Use automated tool (JAXB) to develop Java classes from XML schema
- Modify XML Schema based on internal review comments and issues based on Java class implementations
- Develop draft CCSDS White Book for Working Group Review
- Begin staged implementation of API layer and crude GUI of a packaging toolkit
- Toolkit should provide useful functionality at a very early stage for demonstration to interested parties
- Present to Working Group for review and prototype commitments
- Develop specialization of schema that all international prototyping efforts agree to support
26 Technical Drivers
- Use of XML based technologies
- Designed to be extensible to include new XML technologies as they emerge
- Linkage of data and software
- Direct mapping to OAIS Information Models
- Support both media and network exchange
- Support for multiple encoding/compression on individual objects or on entire package
- Mapping to current SFDU Packaging and Data Description Metadata where possible
- Maximal use of existing standards and tools from similar efforts
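The driver on multiple encoding/compression implies that a consumer must be able to undo a recorded chain of transformations in the opposite order. A minimal Python sketch of that idea follows. The transformation names "deflate" and "base64" and the TRANSFORMS table are illustrative assumptions, not identifiers from the XFDU schema.

```python
import base64
import zlib

# Each hypothetical transformation name maps to an (apply, reverse) pair.
TRANSFORMS = {
    "deflate": (zlib.compress, zlib.decompress),
    "base64": (base64.b64encode, base64.b64decode),
}


def apply_chain(payload, names):
    """Apply the named transformations in the order given."""
    for name in names:
        payload = TRANSFORMS[name][0](payload)
    return payload


def reverse_chain(payload, names):
    """Undo the transformations, last applied first."""
    for name in reversed(names):
        payload = TRANSFORMS[name][1](payload)
    return payload


original = b"telemetry frame " * 50
chain = ["deflate", "base64"]  # the order would be recorded with the object
stored = apply_chain(original, chain)
restored = reverse_chain(stored, chain)
print(len(original), len(stored), restored == original)
```

Recording the chain alongside the object is what lets the consumer recover the original byte stream without guessing the order.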
27 Packaging Mechanisms: Single XML Document
- Single XML document
- Simplest case
- All Binary must be encoded (base 64 or hex)
- Can be parsed and validated with standard XML parsers and shipped via standard WWW protocols
- Impractical with large binary files
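One reason the single-document form is impractical for large binary files is the size overhead of the text encoding: base64 emits 4 characters for every 3 input bytes, and hex emits 2 characters per byte. The Python sketch below embeds a binary payload in an XML document and checks the resulting size. The element names package and dataObject are hypothetical.

```python
import base64
import os
import xml.etree.ElementTree as ET


def embed(payload):
    """Return a serialized XML document carrying the payload as base64 text."""
    root = ET.Element("package")  # hypothetical element names
    obj = ET.SubElement(root, "dataObject", {"encoding": "base64"})
    obj.text = base64.b64encode(payload).decode("ascii")
    return ET.tostring(root)


def base64_length(n_bytes):
    """Length of padded base64 output for n_bytes of input."""
    return 4 * ((n_bytes + 2) // 3)


def hex_length(n_bytes):
    """Length of hexadecimal output for n_bytes of input."""
    return 2 * n_bytes


payload = os.urandom(300_000)
document = embed(payload)

# A standard XML parser can read the document back and recover the bytes.
recovered = base64.b64decode(ET.fromstring(document).find("dataObject").text)

print(len(payload), base64_length(len(payload)), hex_length(len(payload)), len(document))
```

The whole document also has to be parsed before the payload is available, which adds memory cost on top of the one-third size increase.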
28 Multi-file Packaging Approaches
- Archive Approach
- Encapsulate entire directory structure and all contained files into a single file archive using a commonly available technique such as ZIP
- Other archive formats, such as JAR, show how the inclusion of a well-known file can include related metadata
- Message Approach
- Combines SOAP (RPC for the web) and MIME types
- Uses multi-part MIME/related as a packaging format mechanism for messages that transfer multiple files
- Allow use of appropriate compression/encoding techniques for contained files.
- Use of a common manifest or table of contents object makes these two approaches symmetric
- Design Decision: XFDU version 1 must support the ZIP and single document forms. The SOAP/MIME/DIME forms should be prototyped but the underlying protocols may not be stable in the version 1 timeframe.
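The archive approach can be sketched in Python with the standard zipfile module: the files are packed into one ZIP archive together with a well-known manifest file that acts as a table of contents. The manifest path is the one listed on the interoperability profile slide; the manifest and file elements and their attributes are hypothetical and much simpler than a real XFDU manifest.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

MANIFEST_NAME = "MANIFEST/ccsdsxfdu.xml"  # name listed in the interoperability profile


def build_package(files):
    """Pack the given {path: bytes} mapping plus a manifest into one ZIP archive."""
    manifest = ET.Element("manifest")  # hypothetical element names
    for path in sorted(files):
        ET.SubElement(manifest, "file", {"href": path, "size": str(len(files[path]))})
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(MANIFEST_NAME, ET.tostring(manifest, encoding="unicode"))
        for path, payload in files.items():
            archive.writestr(path, payload)
    return buffer.getvalue()


def list_package(package):
    """Read the manifest back and return the paths it declares."""
    with zipfile.ZipFile(io.BytesIO(package)) as archive:
        manifest = ET.fromstring(archive.read(MANIFEST_NAME))
        return [entry.get("href") for entry in manifest.findall("file")]


package = build_package({"data/a.bin": b"\x00" * 100, "doc/readme.txt": b"hi"})
print(list_package(package))
```

Because the consumer reads the manifest first, the same table of contents could in principle describe the parts of a multi-part message, which is the symmetry the slide points to.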
29 High Level Entities XFDU Schema (1 of 2)
- Package Header (packHeader): Administrative metadata for the whole XFDU, such as version, operating system, hardware, author, etc., and metadata about transformations and behaviours that must be understood
- Metadata Section (MetadataSec): This section contains or references all of the metadata for all items in the XFDU package. Multiple metadata objects are allowed so that the metadata can be recorded for each separate item within the XFDU object. The metadata schema allows the package designer to define any metadata model by providing attributes for both metadata categories and a classification scheme for finer definition within categories. The model also provides predefined metadata categories and classes via enumerated attributes that follow the OAIS information model as follows
- Descriptive information is intended for the use of Finding Aids such as Catalogs or Search Engines.
- The Representation Section and its subsections, syntax information (syntaxMd), static semantics (dedMd), and unclassified metadata (otherMd)
- The classification of the PDI Section: reference, context, provenance, and fixity
30 High Level Entities XFDU Schema (2 of 2)
- Information Package Map Section (ipMapSec) outlines a hierarchical structure for the original object being encoded, by a series of nested contentUnit elements. Content units contain pointers to the data objects and to the metadata associated with those objects.
- Data Object Section (dataObjectSec) contains a number of dataObjEntry elements. A Data Object Entry contains some file content and any data required to allow the information consumer to reverse any transformations that have been performed on the object and restore it to the byte stream intended for the original designated community and described by the Representation metadata in the Content Unit
- Behavior Section (behaviorSec) can be used to associate executable behaviors with content in the XFDU object. A behavior section has an interface definition element that represents an abstract definition of the set of behaviors represented by a particular behavior section. A behavior section also has a behavior mechanism that is a module of executable code that implements and runs the behaviors defined abstractly by the interface definition.
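A skeleton of the five high-level sections, and of how a content unit points at a data object, can be sketched in Python. The section and element names (packHeader, MetadataSec, ipMapSec, contentUnit, dataObjPtr, dataObjectSec, dataObjEntry, behaviorSec) are those used on the slides. The root name XFDU, the metadataObject element, and every attribute name are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET


def skeleton():
    """Build a minimal manifest tree with the five high-level sections."""
    root = ET.Element("XFDU")  # root element name is an assumption
    ET.SubElement(root, "packHeader", {"version": "1"})  # hypothetical attribute
    metadata = ET.SubElement(root, "MetadataSec")
    # Hypothetical metadata object with a category and a classification.
    ET.SubElement(metadata, "metadataObject",
                  {"ID": "md1", "category": "REP", "classification": "SYNTAX"})
    ip_map = ET.SubElement(root, "ipMapSec")
    unit = ET.SubElement(ip_map, "contentUnit", {"ID": "cu1", "repID": "md1"})
    ET.SubElement(unit, "dataObjPtr", {"dataObjID": "do1"})  # hypothetical attribute
    data = ET.SubElement(root, "dataObjectSec")
    ET.SubElement(data, "dataObjEntry", {"ID": "do1"})
    ET.SubElement(root, "behaviorSec")
    return root


def dangling_pointers(root):
    """Return the IDs of pointers that do not resolve to a data object entry."""
    known = {entry.get("ID") for entry in root.iter("dataObjEntry")}
    return [ptr.get("dataObjID") for ptr in root.iter("dataObjPtr")
            if ptr.get("dataObjID") not in known]


tree = skeleton()
print([child.tag for child in tree])
print(dangling_pointers(tree))
```

A check such as dangling_pointers is the kind of consistency test a packaging toolkit might run before shipping a package, though nothing in the slides mandates it.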
31 Interoperability profile
- The Profile will indicate that ALL content for both metadata and data files will be referred to using dataObjPtr
- Transfer mechanism for XFDU
- we do not support processing before all the data has come down the wire; assume XFDU file is on local file system before it is opened
- via HTTP
- in SOAP with attachment where the XFDU zip file is an attachment
- Identifier uniqueness issues
- package instance identifier
- could perhaps use UUID
- registry for XML Schema
- could be simple FTP server, with front-end index file
- Unique name for manifest file
- MANIFEST/ccsdsxfdu.xml
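A consumer that follows this profile could be sketched in Python as follows: the package is opened only once it is a complete file on the local file system, the manifest is looked up under its fixed name, and a UUID serves as a candidate package instance identifier. The function names, the packageID attribute, and the one-line manifest are hypothetical. Only the manifest path and the UUID suggestion come from the slide.

```python
import os
import tempfile
import uuid
import zipfile

MANIFEST_PATH = "MANIFEST/ccsdsxfdu.xml"


def new_package_id():
    """Return a random UUID in URN form as a candidate package identifier."""
    return uuid.uuid4().urn


def write_package(path, manifest_xml, files):
    """Write the manifest and the {name: bytes} files into a ZIP file at path."""
    with zipfile.ZipFile(path, "w") as archive:
        archive.writestr(MANIFEST_PATH, manifest_xml)
        for name, payload in files.items():
            archive.writestr(name, payload)


def read_manifest(path):
    """Return the manifest bytes from a complete local package file."""
    if not zipfile.is_zipfile(path):
        raise ValueError("not a complete ZIP file: " + path)
    with zipfile.ZipFile(path) as archive:
        try:
            return archive.read(MANIFEST_PATH)
        except KeyError:
            raise ValueError(MANIFEST_PATH + " is missing") from None


package_id = new_package_id()
with tempfile.TemporaryDirectory() as workdir:
    package_path = os.path.join(workdir, "example.zip")
    # Hypothetical one-line manifest carrying the identifier.
    write_package(package_path, '<XFDU packageID="%s"/>' % package_id,
                  {"data/a.bin": b"\x00\x01"})
    manifest = read_manifest(package_path)
print(manifest.decode("ascii"))
```

Fixing the manifest name means a consumer never has to search the archive, and a random UUID avoids the need for a central authority to issue package identifiers.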