Title: DEVELOPING AN ISO REFERENCE MODEL FOR AN OPEN ARCHIVAL INFORMATION SYSTEM OAIS Presentation to Socie
1ISO Reference Model for an Open Archival Information System (OAIS)
Tutorial Presentation
Don Sawyer, NASA/National Space Science Data Center (NSSDC)
Lou Reich, Computer Sciences Corporation (CSC)
Library of Congress, June 13, 2003
2Outline
- History
- Reference Model overview
- Some Applications
- Follow-on Activities
  - Producer-Archive Ingest Methodology Abstract Standard
  - Standard Submission Information Package
  - Archive Certification
3NASA Role
- National Space Science Data Center
  - NASA's first digital archive
  - Experienced many technology changes since 1966
- Consultative Committee for Space Data Systems
  - International group of space agencies
  - Developed a variety of science discipline-independent standards
  - Became working body for ISO TC 20/SC 13 about 1990
    - TC 20: Aircraft and Space Vehicles
    - SC 13: Space Data and Information Transfer Systems
4Initial Archive Standards Proposal
- ISO suggested that SC 13 should develop archive standards
  - Address data used in conjunction with space missions
  - Address intermediate and indefinite long-term storage of digital data
5Response
- Response to Consultative Committee for Space Data Systems (CCSDS) and ISO TC 20/SC 13
  - No framework widely recognized for developing specific digital archive standards
  - Begin by developing a Reference Model to establish common terms and concepts
  - Ensure broad participation, including traditional archives
    - (Not restricted to space communities: all participation is welcome!)
  - Focus on data in electronic forms, but recognize that other forms exist in most archives
  - Follow up with additional archive standards efforts as appropriate
6What is a Reference Model?
- A framework
  - for understanding significant relationships among the entities of some environment, and
  - for the development of consistent standards or specifications supporting that environment.
- A reference model
  - is based on a small number of unifying concepts
  - is an abstraction of the key concepts, their relationships, and their interfaces both to each other and to the external environment
  - may be used as a basis for education and explaining standards to a non-specialist.
7Organizational Approach
- Organize US contribution under a framework with NASA lead
  - Established liaison with Federal Geographic Data Committee (FGDC) and National Archives and Records Administration (NARA)
  - Agency archives and users must be represented in this process
- An open process
  - Important to stimulate dialogue with broad archive/user communities
  - Results of US and international workshops put on the Web
  - Support e-mail comments/critiques
- Broad international workshops also held
  - UK and France
- Issue resolution at ISO/Consultative Committee for Space Data Systems international workshops
8Technical Approach
- Investigate other Reference Models
  - ISO Seven Layer Communications Reference Model
  - ISO Reference Model for Open Distributed Processing
  - ISO TC211 Reference Model for Geomatics
- Define what is meant by archiving of data
- Break archiving into a few functional areas (e.g., ingest, storage, access, and preservation planning)
- Define a set of interfaces between the functional areas
- Define a set of data classes for use in archiving
- Choose formal specification techniques
  - Data flow diagrams for functional models and interfaces
  - Unified Modeling Language (UML) for data classes
9Results
- Reference Model targeted to several categories of reader
  - Archive designers
  - Archive users
  - Archive managers, to clarify digital preservation issues and assist in securing appropriate resources
  - Standards developers
- Adopted terminology that crosses various disciplines
  - Traditional archivists
  - Scientific data centers
  - Digital libraries
10Reference Model Status
- Already widely adopted as starting point in digital preservation efforts
  - Digital libraries (e.g., Netherlands National Library)
  - Traditional archives (e.g., US National Archives)
  - Scientific data centers (e.g., National Space Science Data Center)
  - Commercial organizations (e.g., Aerospace Industries Association preservation working team)
- Published as final CCSDS standard (Blue Book), available from
  - http://www.ccsds.org/documents/650x0b1.pdf
- Recently published as a final ISO standard: ISO 14721:2003
11Reference Model for an Open Archival Information System: Technical Overview
12Open Archival Information System (OAIS)
- Open
  - Reference Model standard(s) are developed using a public process and are freely available
- Information
  - Any type of knowledge that can be exchanged
  - Independent of the forms (i.e., physical or digital) used to represent the information
  - Data are the representation forms of information
- Archival Information System
  - Hardware, software, and people who are responsible for the acquisition, preservation and dissemination of the information
13Document Organization
- Introduction
  - Purpose and Scope, Applicability, Rationale, Road Map for Future Work, Document Structure, and Definitions of Terms
- OAIS Concepts and Responsibilities
  - High-level view of OAIS functionality and information models
  - OAIS external environment
  - Minimum responsibilities to become an OAIS
- Detailed Models
  - Functional model descriptions and information model perspectives
- Preservation perspectives
  - Media migration, compression, format conversions, and access service preservation
- Archive Interoperability
  - Criteria to distinguish types of cooperation among archives
- Annexes
  - Scenarios of existing archives, compatibility with other standards
14Purpose, Scope, and Applicability
- Framework for understanding and applying concepts needed for long-term digital information preservation
  - Long-term is long enough to be concerned about changing technologies
  - Starting point for model addressing non-digital information
- Provides set of minimal responsibilities to distinguish an OAIS from other uses of "archive"
- Framework for comparing architectures and operations of existing and future archives
- Basis for development of additional related standards
- Addresses a full range of archival functions
- Applicable to all long-term archives and those organizations and individuals dealing with information that may need long-term preservation
- Does NOT specify an implementation
15Model View of an OAIS Environment
- Producer is the role played by those persons, or client systems, who provide the information to be preserved
- Management is the role played by those who set overall OAIS policy as one component in a broader policy domain
- Consumer is the role played by those persons, or client systems, who interact with OAIS services to find and acquire preserved information of interest
16OAIS Responsibilities
- Negotiates and accepts information from information producers
- Obtains sufficient control to ensure long-term preservation
- Determines which communities (designated) need to be able to understand the preserved information
- Ensures the information to be preserved is independently understandable to the Designated Communities
- Follows documented policies and procedures which ensure the information is preserved against all reasonable contingencies
- Makes the preserved information available to the Designated Communities in forms understandable to those communities
17OAIS Information Definition
- Information is always expressed (i.e., represented) by some type of data
- Data interpreted using its Representation Information yields Information
- Information Object preservation requires clear identification and understanding of the Data Object and its associated Representation Information
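The relationship "Data interpreted using its Representation Information yields Information" can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names mirror the OAIS terms, but the `decode` callable, the `interpret` method, and the temperature example are hypothetical and are not defined by the Reference Model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class RepresentationInformation:
    """Hypothetical stand-in: a human-readable description plus a decoder."""
    description: str
    decode: Callable[[bytes], object]


@dataclass(frozen=True)
class InformationObject:
    """A Data Object paired with the Representation Information needed to read it."""
    data_object: bytes
    representation_information: RepresentationInformation

    def interpret(self) -> object:
        # Data + Representation Information -> Information
        return self.representation_information.decode(self.data_object)


# Assumed example: the bits are ASCII text holding a temperature in degrees Celsius.
ascii_celsius = RepresentationInformation(
    description="ASCII decimal text, temperature in degrees Celsius",
    decode=lambda bits: float(bits.decode("ascii")),
)
reading = InformationObject(b"21.5", ascii_celsius)
temperature = reading.interpret()
```

Without `ascii_celsius`, the four bytes `b"21.5"` are only a bit sequence; preserving the pair is what keeps the reading understandable.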
18Information Package Definition
- An Information Package is a conceptual container holding two types of information
  - Content Information
  - Preservation Description Information (PDI)
19Information Package Variants
- Submission Information Package
  - Negotiated between Producer and OAIS
  - Sent to OAIS by a Producer
- Archival Information Package
  - Information Package used for preservation
  - Includes complete set of Preservation Description Information (PDI) for the Content Information
- Dissemination Information Package
  - Includes part or all of one or more Archival Information Packages
  - Sent to a Consumer by the OAIS
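A minimal Python sketch of how the three package variants relate, assuming a toy representation in which Content Information is a mapping of item names to bits and PDI is a mapping of strings. The functions `ingest` and `disseminate` are hypothetical names chosen for illustration; the Reference Model does not prescribe any such interface.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class InformationPackage:
    content_information: Dict[str, bytes]  # item name -> bits (illustrative)
    pdi: Dict[str, str]                    # Preservation Description Information


class SIP(InformationPackage):
    """Submission Information Package: sent to the OAIS by a Producer."""


class AIP(InformationPackage):
    """Archival Information Package: the form used for preservation."""


class DIP(InformationPackage):
    """Dissemination Information Package: sent to a Consumer by the OAIS."""


def ingest(sip: SIP, added_pdi: Dict[str, str]) -> AIP:
    # The archive completes the PDI that arrived with the submission.
    return AIP(dict(sip.content_information), {**sip.pdi, **added_pdi})


def disseminate(aips: List[AIP], wanted: List[str]) -> DIP:
    # A DIP may hold part or all of one or more AIPs.
    content: Dict[str, bytes] = {}
    pdi: Dict[str, str] = {}
    for aip in aips:
        selected = {k: v for k, v in aip.content_information.items() if k in wanted}
        if selected:
            content.update(selected)
            pdi.update(aip.pdi)  # simplification: carry the whole PDI along
    return DIP(content, pdi)


sip = SIP({"a.txt": b"hi", "b.txt": b"yo"}, {"provenance": "lab X"})
aip = ingest(sip, {"fixity": "checksum recorded at ingest"})
dip = disseminate([aip], ["a.txt"])
```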
20External Data Flow View
[Figure: external data flows among Producer, OAIS, and Consumer; labeled flows: queries, result sets, orders]
21Detailed Models
22Overview of Detailed Models
- It was decided to do both a functional and an information model of the OAIS
- Both models were tasked to
  - Use the models to better communicate OAIS concepts
  - Use a well-established, formal modeling technique
  - Stay as implementation independent as possible
  - Avoid detailed designs
23Detailed Models
24General Principles
- Define classes of information objects that illustrate information necessary to enable long-term storage and access to archives
- The class definition should be implementation independent
- Use a subset of Unified Modeling Language (UML)
25UML Notation
26Information Object
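[Figure: UML class diagram. An Information Object consists of a Data Object interpreted using Representation Information, which is itself interpreted using further Representation Information; a Data Object is either a Physical Object or a Digital Object; a Digital Object is made up of Bit Sequences.]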
27Representation Information
- The Representation Information accompanying a physical object, like a moon rock, may give additional meaning
  - It typically is a result of some analysis of the physically observable attributes of the rock
- The Representation Information accompanying a digital object, or sequence of bits, is used to provide additional meaning
  - It typically maps the bits into commonly recognized data types such as character, integer, and real, and into groups of these data types
  - It associates these with higher-level meanings which can have complex inter-relationships that are also described
28Recursive Nature of Representation Information
[Figure: UML diagram. Representation Information comprises Structure Information, Semantic Information (which adds meaning to Structure Information), and Other Representation Information; Representation Information is itself interpreted using further Representation Information.]
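The recursion can be illustrated with a small Python sketch that walks a network of Representation Information until it reaches items the reader already understands. The network contents, the function name `required_rep_info`, and the idea of passing in a set of already-understood items are assumptions made for this example, not definitions from the slides.

```python
from typing import Dict, List, Set

# Hypothetical network: each item lists the Representation Information
# needed to interpret it.
REP_INFO_NETWORK: Dict[str, List[str]] = {
    "format description": ["document format specification", "English"],
    "document format specification": ["English"],
    "English": [],
}


def required_rep_info(start: str,
                      network: Dict[str, List[str]],
                      already_understood: Set[str]) -> List[str]:
    """Return every item that must be preserved to interpret `start`,
    stopping wherever the intended readers already understand an item."""
    needed: List[str] = []
    seen: Set[str] = set()

    def visit(item: str) -> None:
        if item in seen or item in already_understood:
            return
        seen.add(item)
        needed.append(item)
        for dependency in network[item]:
            visit(dependency)  # Representation Information of Representation Information

    visit(start)
    return needed


to_preserve = required_rep_info("format description", REP_INFO_NETWORK, {"English"})
```

If the readers could not be assumed to understand English, the same call with an empty set would also return "English", which shows how the amount of Representation Information to preserve depends on who must be able to understand it.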
29Types of Information Used in OAIS
[Figure: Information Object specialized into Content Information, Preservation Description Information, Packaging Information, Descriptive Information, and other types]
30Content Information
- The information which is the primary object of preservation
- An instance of Content Information is the information that an archive is tasked to preserve
- Deciding what is the Content Information may not be obvious and may need to be negotiated with the Producer
- The Data Object in the Content Information may be either a Digital Object or a Physical Object (e.g., a physical sample, microfilm)
31Preservation Description Information
- Provenance Information
  - Describes the source of Content Information, who has had custody of it, what is its history
- Context Information
  - Describes how the Content Information relates to other information outside the Information Package
- Reference Information
  - Provides one or more identifiers, or systems of identifiers, by which the Content Information may be uniquely identified
- Fixity Information
  - Protects the Content Information from undocumented alteration
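One common way to realize Fixity Information is a cryptographic digest recorded at ingest and rechecked later. The sketch below assumes SHA-256 via Python's standard `hashlib`; the slides do not name a particular algorithm, and the function names are hypothetical. Note that a digest detects undocumented alteration rather than preventing it.

```python
import hashlib


def fixity_digest(content: bytes) -> str:
    """Compute a SHA-256 digest to store as Fixity Information (assumed choice)."""
    return hashlib.sha256(content).hexdigest()


def verify_fixity(content: bytes, recorded_digest: str) -> bool:
    """Return True when the content still matches the digest recorded earlier."""
    return fixity_digest(content) == recorded_digest


original = b"Content Information bits"
recorded = fixity_digest(original)           # stored in the PDI at ingest
unchanged_ok = verify_fixity(original, recorded)
altered_ok = verify_fixity(b"Content Information bitz", recorded)
```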
32PDI Examples
33Descriptive Information
- Contains the data that serves as the input to documents or applications called Access Aids
- Access Aids can be used by a consumer to locate, analyze, retrieve, or order information from the OAIS
34Packaging Information
- Information which, either actually or logically, binds and relates the components of the package into an identifiable entity on specific media
- Examples of Packaging Information include tape marks, directory structures and filenames
35OAIS Archival Information Package
[Figure: structure of an Archival Information Package (AIP)]
- Packaging Information delimits the AIP
  - e.g., how to find Content Information and PDI on some medium
- Package Description is derived from the AIP
  - e.g., information supporting customer searches for the AIP
- Content Information
  - e.g., a hardcopy document; a document as an electronic file together with its format description; a scientific data set consisting of image file, text file, and format descriptions file describing the other files
- Preservation Description Information (PDI) further describes the Content Information
  - e.g., how the Content Information came into being, who has held it, how it relates to other information, and how its integrity is assured
36AIP Types
- Archival Information Unit (AIU) contains a single Data Object as the Content Object
- Archival Information Collection (AIC) contains multiple AIPs in its Content Object
  - Each member of an AIC is an AIP containing Content Information and PDI
  - The AIC contains unique PDI on the collection process
[Figure: Archival Information Package specialized into Archival Information Unit and Archival Information Collection]
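Because each member of an AIC is itself an AIP, collections can nest. A minimal Python sketch of that structure follows, assuming simple dataclasses and a hypothetical helper `count_units`; the field names are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class AIU:
    """Archival Information Unit: a single Data Object plus its PDI."""
    data_object: bytes
    pdi: Dict[str, str]


@dataclass
class AIC:
    """Archival Information Collection: member AIPs plus PDI on the collection process."""
    members: List[Union["AIU", "AIC"]]
    pdi: Dict[str, str]


def count_units(package: Union[AIU, AIC]) -> int:
    """Count the AIUs reachable from a package, descending into nested collections."""
    if isinstance(package, AIU):
        return 1
    return sum(count_units(member) for member in package.members)


inner = AIC([AIU(b"b", {}), AIU(b"c", {})], {"provenance": "assembled per mission"})
outer = AIC([AIU(b"a", {}), inner], {"provenance": "assembled per discipline"})
total_units = count_units(outer)
```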
37Package Descriptions and Access Aids
- Package Descriptions are needed by an OAIS to provide visibility and access to the OAIS holdings
- Package Descriptions contain one or more Associated Descriptions which describe the AIP Content Information from the point of view of a single Access Aid
- Some examples of Access Aids include
  - Finding Aids: assist the consumer in locating information of interest
  - Ordering Aids: allow the consumer to discover the cost of and order AIUs of interest
  - Retrieval Aids: enable authorized users to retrieve the AIU described by the Unit Descriptor from Archival Storage
38Information Model Summary
- Presented a model of information objects as containing data objects and representation objects
- Classified information required for long-term archiving into 4 classes: Content Information, PDI, Packaging Information and Descriptive Information
- Described how these classes would be aggregated and related in an AIP to fully describe an instance of Content Information
- Presented information needed for Access, in addition to that needed for long-term preservation
- Put the Access-oriented structures in the context of the other data needed to operate an OAIS
39Detailed Models
40General Principles
- Highlight the major functional areas important to digital archiving
- Use functional decomposition to clarify the range of functionality that might be encountered
- Don't decompose beyond two levels to avoid becoming too implementation dependent
- Provide a useful set of terms and concepts
- Do not imply that all archives need to implement all the sub-functions
- Identify some common services which are likely to be needed, and are assumed to be available, as underlying support
41Common Services
- Modern, distributed computing applications assume a number of supporting services
- Examples of Common Services include
  - inter-process communication
  - name services
  - temporary storage allocation
  - exception handling
  - security
  - file and directory services
42Open Archival Information System: Six Functional Entities
[Figure: functional entity diagram. Entities: Ingest, Archival Storage, Data Management, Administration, Preservation Planning, Access. External roles: Producer, Consumer, Management. Labeled flows: SIP, DIP, queries, result sets, orders.]
SIP: Submission Information Package
AIP: Archival Information Package
DIP: Dissemination Information Package
43Functional Entities In An OAIS
- Ingest: This entity provides the services and functions to accept Submission Information Packages (SIPs) from Producers and prepare the contents for storage and management within the archive.
- Archival Storage: This entity provides the services and functions for the storage, maintenance and retrieval of Archival Information Packages.
- Data Management: This entity provides the services and functions for populating, maintaining, and accessing both descriptive information which identifies and documents archive holdings and internal archive administrative data.
- Administration: This entity manages the overall operation of the archive system.
- Preservation Planning: This entity monitors the environment of the OAIS and provides recommendations to ensure that the information stored in the OAIS remains accessible to the Designated User Community over the long term, even if the original computing environment becomes obsolete.
- Access: This entity supports consumers in determining the existence, description, location and availability of information stored in the OAIS, and allows consumers to request and receive information products.
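How Ingest, Archival Storage, Data Management, and Access cooperate can be sketched with a toy in-memory archive. Everything here is an assumption for illustration: the class `ToyOAIS`, its method names, and the identifier scheme are hypothetical, and Administration and Preservation Planning are left out. The Reference Model deliberately does not specify an implementation.

```python
from typing import Dict, List


class ToyOAIS:
    """In-memory sketch of four of the six functional entities."""

    def __init__(self) -> None:
        self.archival_storage: Dict[str, bytes] = {}  # AIP identifier -> preserved bits
        self.data_management: Dict[str, str] = {}     # AIP identifier -> descriptive information

    def ingest(self, sip_content: bytes, description: str) -> str:
        """Ingest: accept a SIP, store an AIP, and register its description."""
        aip_id = f"aip-{len(self.archival_storage) + 1}"
        self.archival_storage[aip_id] = sip_content
        self.data_management[aip_id] = description
        return aip_id

    def query(self, keyword: str) -> List[str]:
        """Access: answer a query with a result set of matching AIP identifiers."""
        return sorted(
            aip_id
            for aip_id, description in self.data_management.items()
            if keyword.lower() in description.lower()
        )

    def order(self, aip_id: str) -> bytes:
        """Access: fulfil an order by returning a DIP (here, simply the stored bits)."""
        return self.archival_storage[aip_id]


archive = ToyOAIS()
first_id = archive.ingest(b"spectra", "Plasma wave spectra")
archive.ingest(b"catalog", "Lunar sample catalog")
result_set = archive.query("plasma")
delivered = archive.order(first_id)
```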
44Ingest Data Flow Diagram
45Preservation Planning
46Preservation Perspectives
47Migration Context
[Figure: migration context. Labels: Data Management and Access View, Content Information Identifier, Descriptive Information Mapping, AIP Identifier, Archival Storage View, Archival Storage Mapping, Packaging Information, Preservation Description Information, Content Information.]
48Digital Migration
- Digital Migration is defined to be the transfer of digital information, while intending to preserve it, within the OAIS
  - Focus on preservation of the full information content
  - New information implementation replaces the old
  - OAIS has full control and responsibility over all aspects of the transfer
49Migration Motivators
- Motivators driving digital migrations
  - Media decay
    - Often this is superseded by escalating media drive maintenance costs
  - Increased cost effectiveness
    - More cost-effective media types with higher volumes and lower drive maintenance costs
  - New user/consumer service requirements
    - New formats more compatible with users' technology and applications
  - Proprietary software evolution
    - New software versions used to upgrade formats of the information objects being preserved
50Digital Migration Approaches
- Four primary types of digital migration in response to motivators, ordered by increasing risk of information loss
  - Refreshment
    - Media replacement with no bit changes
  - Replication
    - No change to Packaging Information or Content Information bits
  - Repackaging
    - Some bit changes in Packaging Information
  - Transformation
    - Reversible: bit changes in Content Information are reversible by an algorithm
    - Non-reversible: bit changes in Content Information are not reversible by an algorithm
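The four migration types can be read as a decision on which bits changed. The Python sketch below encodes that reading; it is an interpretation of the slide, not a rule taken from the standard. In particular, the `other_bits_changed` parameter (any change outside Packaging Information and Content Information) is an assumption used to separate Replication from Refreshment, and the function name is hypothetical.

```python
from enum import IntEnum


class Migration(IntEnum):
    # Values follow the slide's ordering by increasing risk of information loss.
    REFRESHMENT = 1
    REPLICATION = 2
    REPACKAGING = 3
    TRANSFORMATION = 4


def classify_migration(content_bits_changed: bool,
                       packaging_bits_changed: bool,
                       other_bits_changed: bool) -> Migration:
    """Pick the riskiest category that applies to a planned migration."""
    if content_bits_changed:
        return Migration.TRANSFORMATION
    if packaging_bits_changed:
        return Migration.REPACKAGING
    if other_bits_changed:
        return Migration.REPLICATION
    return Migration.REFRESHMENT


tape_copy = classify_migration(False, False, False)
format_upgrade = classify_migration(True, True, False)
riskier = format_upgrade > tape_copy
```

Using an `IntEnum` lets planned migrations be compared or sorted by the risk ordering the slide describes.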
51Access Preservation
- Effective access to digital information requires the use of software
- Application Programming Interfaces (APIs) may be cost-effectively maintained across time by an OAIS when
  - API is not too complex
  - API is applicable to a wide variety of AIUs
  - API source code may be ported to new environments
  - Extensive testing is needed to ensure against information loss
- Preservation of executables by full emulation of underlying hardware is problematic
  - Hard to know what is the information being preserved
  - May not be possible to fully emulate associated devices
52Archive Interoperability
53Archive Interoperability Motivators
- Users of multiple OAIS archives have reasons to wish for some interoperability or cooperation among the OAISs
- Consumers
  - Common finding aids to aid in locating information over several OAIS archives
  - Common Package Descriptor schema for access
  - Common DIP schema for dissemination, or a single global access site
- Producers
  - Common SIP schema for submission to different archives
  - A single depository for all their products
- Managers
  - Cost reduction through sharing of expensive hardware
  - Increasing the uniformity and quality of user interactions with the OAIS
54Categories of Archive Interactions
- Independent: no knowledge by one OAIS of standards implemented at another
- Cooperating: potentially common submission standards, and common dissemination standards, but no common access. One archive may make subscription requests for key data at the cooperating archive
- Federated: access to all federated OAISs is provided through a common set of access aids that provide visibility into all participating OAISs. Global dissemination and ingest are options
- Shared resources: an OAIS in which Management has entered into agreements with other OAISs to share resources to reduce cost. This requires various standards internal to the archive (such as ingest-storage and access-storage interface standards), but does not alter the community's view of the archive
55Federated Archives
56Three Levels of Autonomy in Associated Archives
- No interactions and therefore no association
- Associations that maintain your autonomy. You have to do certain things to participate, but you can leave the association without notice or impact to you.
- Associations that bind you by contract. To change the nature of this association you will have to re-negotiate the contract. The amount of autonomy retained depends on how difficult it is to negotiate the changes.
57Reference Model Summary
- Reference model is to be applicable to all digital archives, and their Producers and Consumers
- Identifies a minimum set of responsibilities for an archive to claim it is an OAIS
- Establishes common terms and concepts for comparing implementations, but does not specify an implementation
- Provides detailed models of both archival functions and archival information
- Discusses OAIS information migration and interoperability among OAISs
58Some Applications
59Selected OAIS Usage Examples
- Networked European Deposit Library (NEDLIB)
- Royal Library of the Netherlands
  - IBM is developing an OAIS-like implementation
- British National Library
  - Asking IBM to extend its OAIS-like implementation
- Research Libraries Group and OnLine Computer Library Center
  - Developed an OAIS-based approach to trusted repositories
  - Web page to track OAIS implementation efforts/issues
    - http://www.rlg.org/longterm/oais.html
- Library of Congress
  - Hosting METS XML data packaging approach
  - National Digital Information Infrastructure Preservation Program (NDIIPP)
60Selected OAIS Usage Examples-2
- InterPARES
  - Body of National Archives from many countries, adopted OAIS as a starting point for their modeling work
- France set up a working group within ARISTOTE
  - Interested in archive of digital information, including libraries and Dept of Justice
  - http://www.aristote.asso.fr/ (in French)
  - Astonishing unifying role from OAIS reference model
- System for Preservation and Access to Data and Information (SIPAD)
  - French space agency plasma physics archive used the OAIS as a basis for design
- National Space Science Data Center (NSSDC)
  - Evolving our archive using OAIS as a basis for a new architecture
61Selected OAIS Usage Examples-3
- National Archives and Records Administration contracted preservation work with San Diego Supercomputer Center
  - Both parties claimed use of the OAIS RM saved several weeks of effort in the specification of the task
- Similar experiences between
  - National Library of France and French space agency (CNES) representatives
  - National Center for Supercomputing Applications HDF format developers and DNA researchers
  - Life Sciences Archive developer and micro-gravity researchers
  - United States Department of Agriculture and digital preservation experts
62Follow-on Activities
- Research Libraries Group has established a web page to track OAIS implementation efforts and issues
  - http://www.rlg.org/longterm/oais.html
- CCSDS Certification Coordination Function
  - Will track and summarize various archive certification efforts
  - Will attempt to extract high-level model/checklist
- RLG is organizing a group to establish certification approaches
63Follow-on Activities - 2
- Standard Submission Information Package
  - Just getting started under CCSDS Archive Ingest Working Group
- CCSDS/ISO Producer-Archive Interface Methodology Standard
  - Provides framework for Producer/Archive interactions
  - Identifies steps and types of information exchanged during the negotiation
  - May be used as a checklist by archives
64CCSDS/ISO Producer-Archive Interface Methodology Abstract Standard: Overview
65Model View of an OAIS Environment
- Producer is the role played by those persons, or client systems, who provide the information to be preserved
- Management is the role played by those who set overall OAIS policy as one component in a broader policy domain
- Consumer is the role played by those persons, or client systems, who interact with OAIS services to find and acquire preserved information of interest
66Purpose
- Standardize the relationships and interactions between an information Producer and an Archive
  - Abstract Model
  - Terms and Concepts
- Define a methodology
  - Allows all actions to be structured within this context
  - Covers times from first contact by Producer until all information objects are received by the Archive
- Provide guidance on the specialization of the methodology to meet the needs of classes of archives, or of specific archives
67Scope
- Identifies different phases in process of transferring information between Producer and Archive
  - Defines objectives of each phase
  - Actions to be carried out during each phase
  - Expected results from end of each phase
- General framework able to be re-used for all processes related to Producer-Archive interactions
- Basis for development of additional related standards
- Basis for development of software tools to assist in different stages of the interactions between Producer and Archive
68Applicability
- All archives conformant to OAIS Reference Model
- May be of interest to archives not conformant to OAIS Reference Model
- Relevant to archives holding physical as well as digital materials
69Methodology Conformance
- When methodology is used by an archive for a particular archive project (acquiring a set of information)
  - Usage conforms when all actions in this standard have been considered and implemented as appropriate
- When methodology has been specialized or extended to be a Community Standard, it conforms when
  - All actions have been considered and incorporated appropriately, AND
  - Methodology for creating the Community Standard has addressed the various work phases defined in section 4
70Document Organization
- Section 1
  - Purpose, Scope, Applicability, Conformance
- Section 2
  - Overview of methodology, players, their relationships, activity phases
- Section 3
  - Detailed analysis of the four phases defined
    - Preliminary definition phase
    - Formal phase
    - Transfer phase
    - Validation phase
- Section 4
  - Work stages leading to a Community Standard
- Annex: Overview of OAIS Reference Model applicable to this standard
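The four phases listed for Section 3 can be tracked as a simple ordered sequence. The sketch below assumes a strictly linear progression, which is a simplification made for illustration; the names `Phase` and `next_phase` are hypothetical and do not come from the standard.

```python
from enum import Enum
from typing import Optional


class Phase(Enum):
    PRELIMINARY = "Preliminary definition phase"
    FORMAL = "Formal phase"
    TRANSFER = "Transfer phase"
    VALIDATION = "Validation phase"


# Enum members iterate in definition order, which matches the slide's order.
PHASE_ORDER = list(Phase)


def next_phase(current: Phase) -> Optional[Phase]:
    """Return the phase that follows `current`, or None after Validation."""
    position = PHASE_ORDER.index(current)
    if position + 1 < len(PHASE_ORDER):
        return PHASE_ORDER[position + 1]
    return None


after_preliminary = next_phase(Phase.PRELIMINARY)
after_validation = next_phase(Phase.VALIDATION)
```

A Producer-Archive project tracker could store the current `Phase` alongside the checklist of actions for that phase.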
71Overview Schematic
72Preliminary Phase Outline
[Figure: Preliminary Phase outline, from first contact to preliminary agreement]
- First contact
- Preliminary definition, feasibility and assessment
  - Information to be archived
  - Digital objects and standards applied
  - Quantification
  - Object references
  - Security conditions
  - Legal and contractual aspects
  - Transfer operations
  - Validation
  - Schedule
  - Permanent impact on archive
  - Summary of cost, risks
  - Critical points
- Preliminary agreement
73Example Actions
74Formal Phase Outline
75Transfer Phase Actions
76Validation Phase Actions
77Creating a Community Producer-Archive Standard
- Examples of communities creating such a standard
  - National or international standards bodies
  - National or international organizations
  - An individual archive to guide interactions with its Producers
- Work stages to be considered
  - Definition of terminology
  - Information model for community
  - Standards and tools available or required
  - Address actions defined in the Abstract Standard
- Best practices
  - Broad definition of the community
  - Include diverse representation on the writing committee
  - Publicize and seek comments from the community
  - Submit to standards body as appropriate
78Status
- Track versions of the document from
  - http://ssdoo.gsfc.nasa.gov/nost/isoas/us/overview.html
- Register to participate
- Version R-1, April 2003, just released for formal review
  - Review site
    - http://www.ccsds.org/review/RPA305/uRPA305.html
  - Document
    - http://www.ccsds.org/review/RPA305/651x0r1.pdf