Title: DEVELOPING AN ISO REFERENCE MODEL FOR AN OPEN ARCHIVAL INFORMATION SYSTEM (OAIS) Presentation to Society of American Archivists Annual Meeting Don Sawyer National Aeronautics and Space Administration National Space Science Data Center NASA/Scienc
1ISO Reference Model for an Open Archival Information System (OAIS)
Tutorial Presentation
Don Sawyer / NASA/NSSDC
Lou Reich / CSC
October 2002
2Outline of Talk
- History
- Reference Model overview
- Some Applications
- Follow-on Activities
3NASA Role
- National Space Science Data Center
- NASA's first digital archive
- Experienced many technology changes since 1966
- Consultative Committee for Space Data Systems
- International group of space agencies
- Developed variety of science discipline-independent standards
- Became working body for ISO TC 20/SC 13 about 1990
- TC 20: Aircraft and Space Vehicles
- SC 13: Space Data and Information Transfer Systems
4Initial Archive Standards Proposal
- ISO suggested that SC 13 should develop archive standards
- Address data used in conjunction with space missions
- Address intermediate and indefinite long term storage of digital data
5Response
- Response to Consultative Committee for Space Data Systems (CCSDS) and ISO TC 20/SC 13
- No framework widely recognized for developing specific digital archive standards
- Begin by developing a Reference Model to establish common terms and concepts
- Ensure broad participation, including traditional archives
- (Not restricted to space communities; all participation is welcome!)
- Focus on data in electronic forms, but recognize that other forms exist in most archives
- Follow up with additional archive standards efforts as appropriate
6What is a Reference Model?
- A framework
- for understanding significant relationships among the entities of some environment, and
- for the development of consistent standards or specifications supporting that environment.
- A reference model
- is based on a small number of unifying concepts
- is an abstraction of the key concepts, their relationships, and their interfaces both to each other and to the external environment
- may be used as a basis for education and explaining standards to a non-specialist.
7Organizational Approach
- Organize US contribution under a framework with NASA lead
- Establish liaison with Federal Geographic Data Committee (FGDC) and National Archives and Records Administration (NARA)
- Agency archives and users must be represented in this process
- An Open process
- Important to stimulate dialogue with broad archive/user communities
- Results of US and international workshops put on the Web
- Support e-mail comments/critiques
- Broad international workshops also held
- UK and France
- Issue resolution at ISO/Consultative Committee for Space Data Systems international workshops
8Technical Approach
- Investigate other Reference Models.
- ISO Seven Layer Communications Reference Model
- ISO Reference Model for Open Distributed Processing
- ISO TC211 Reference Model for Geomatics
- Define what is meant by archiving of data
- Break archiving into a few functional areas (e.g., ingest, storage, access, and preservation planning)
- Define a set of interfaces between the functional areas
- Define a set of data classes for use in archiving
- Choose formal specification techniques
- Data flow diagrams for functional models and interfaces
- Unified Modeling Language (UML) for data classes
9Results
- Reference Model targeted to several categories of reader
- Archive designers
- Archive users
- Archive managers, to clarify digital preservation issues and assist in securing appropriate resources
- Standards developers
- Adopted terminology that crosses various disciplines
- Traditional archivists
- Scientific data centers
- Digital libraries
10Reference Model Status
- Already widely adopted as starting point in digital preservation efforts
- Digital libraries (e.g., Netherlands National Library)
- Traditional archives (e.g., US National Archives)
- Scientific data centers (e.g., National Space Science Data Center)
- Commercial organizations (e.g., Aerospace Industries Association preservation working team)
- Recently approved for publication as final CCSDS and ISO (14721:2002) standards
- CCSDS version is available at
- http://www.ccsds.org/documents/pdf/CCSDS-650.0-B-1.pdf
11Reference Model for an Open Archival Information System: Technical Overview
12Open Archival Information System (OAIS)
- Open
- Reference Model standard(s) are developed using a public process and are freely available
- Information
- Any type of knowledge that can be exchanged
- Independent of the forms (i.e., physical or digital) used to represent the information
- Data are the representation forms of information
- Archival Information System
- Hardware, software, and people who are responsible for the acquisition, preservation and dissemination of the information
13Document Organization
- Introduction
- Purpose and Scope, Applicability, Rationale, Road Map for Future Work, Document Structure, and Definitions of Terms
- OAIS Concepts and Responsibilities
- High level view of OAIS functionality and information models
- OAIS external environment
- Minimum responsibilities to become an OAIS
- Detailed Models
- Functional model descriptions and information model perspectives
- Preservation perspectives
- Media migration, compression, format conversions, and access service preservation
- Archive Interoperability
- Criteria to distinguish types of cooperation among archives
- Annexes
- Scenarios of existing archives, compatibility with other standards
14Purpose, Scope, and Applicability
- Framework for understanding and applying concepts needed for long-term digital information preservation
- Long-term is long enough to be concerned about changing technologies
- Starting point for model addressing non-digital information
- Provides set of minimal responsibilities to distinguish an OAIS from other uses of "archive"
- Framework for comparing architectures and operations of existing and future archives
- Basis for development of additional related standards
- Addresses a full range of archival functions
- Applicable to all long-term archives and those organizations and individuals dealing with information that may need long-term preservation
- Does NOT specify any implementation
15Model View of an OAIS Environment
- Producer is the role played by those persons, or client systems, who provide the information to be preserved
- Management is the role played by those who set overall OAIS policy as one component in a broader policy domain
- Consumer is the role played by those persons, or client systems, who interact with OAIS services to find and acquire preserved information of interest
[Figure: the OAIS (archive) surrounded by its environment: Producer, Consumer, and Management]
16OAIS Responsibilities
- Negotiates and accepts Information from information producers
- Obtains sufficient control to ensure long-term preservation
- Determines which communities (designated) need to be able to understand the preserved information
- Ensures the information to be preserved is independently understandable to the Designated Communities
- Follows documented policies and procedures which ensure the information is preserved against all reasonable contingencies
- Makes the preserved information available to the Designated Communities in forms understandable to those communities
17OAIS Information Definition
- Information is always expressed (i.e., represented) by some type of data
- Data interpreted using its Representation Information yields Information
- Information Object preservation requires clear identification and understanding of the Data Object and its associated Representation Information
[Figure: a Data Object, interpreted using its Representation Information, yields an Information Object]
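The relationship on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names mirror the slide's terms, but the `decode` callable and the `interpret` method are hypothetical stand-ins for Representation Information, not anything defined by the Reference Model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class RepresentationInformation:
    """Hypothetical stand-in: a description plus a rule for reading the bits."""
    description: str
    decode: Callable[[bytes], str]


@dataclass(frozen=True)
class InformationObject:
    """A Data Object together with its Representation Information."""
    data_object: bytes
    representation_information: RepresentationInformation

    def interpret(self) -> str:
        # Data interpreted using its Representation Information yields Information.
        return self.representation_information.decode(self.data_object)


# Assumed example: the bits are declared to be 7-bit ASCII text.
ascii_text = RepresentationInformation(
    description="7-bit ASCII text",
    decode=lambda bits: bits.decode("ascii"),
)
info = InformationObject(b"OAIS", ascii_text)
print(info.interpret())
```

Without the `RepresentationInformation` instance the four bytes stay uninterpreted bits, which is the point the slide makes about preserving both parts together.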
18Information Package Definition
[Figure: an Information Package containing Content Information and Preservation Description Information]
- An Information Package is a conceptual container holding two types of information
- Content Information
- Preservation Description Information (PDI)
19Information Package Variants
- Submission Information Package
- Negotiated between Producer and OAIS
- Sent to OAIS by a Producer
- Archival Information Package
- Information Package used for preservation
- Includes complete set of Preservation Description Information for the Content Information
- Dissemination Information Package
- Includes part or all of one or more Archival Information Packages
- Sent to a Consumer by the OAIS
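The three package variants can be sketched as simple Python types. This is a minimal illustration, not an implementation prescribed by the Reference Model: the function names `ingest` and `disseminate`, the dictionary form of PDI, and the use of the four PDI categories (named later in this talk) as a completeness check are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Assumed completeness rule: the four PDI categories listed later in the talk.
REQUIRED_PDI = ("provenance", "context", "reference", "fixity")


@dataclass
class InformationPackage:
    content_information: bytes
    pdi: Dict[str, str] = field(default_factory=dict)


@dataclass
class SubmissionInformationPackage(InformationPackage):
    """Sent to the OAIS by a Producer; its PDI may be incomplete."""


@dataclass
class ArchivalInformationPackage(InformationPackage):
    """Used for preservation; carries a complete set of PDI."""


@dataclass
class DisseminationInformationPackage:
    """Sent to a Consumer; holds part or all of one or more AIPs."""
    parts: List[ArchivalInformationPackage]


def ingest(sip: SubmissionInformationPackage,
           extra_pdi: Dict[str, str]) -> ArchivalInformationPackage:
    """Hypothetical helper: complete the PDI and produce an AIP."""
    pdi = {**sip.pdi, **extra_pdi}
    missing = [key for key in REQUIRED_PDI if key not in pdi]
    if missing:
        raise ValueError(f"incomplete PDI, missing: {missing}")
    return ArchivalInformationPackage(sip.content_information, pdi)


def disseminate(aips: List[ArchivalInformationPackage]) -> DisseminationInformationPackage:
    """Hypothetical helper: bundle selected AIPs for a Consumer."""
    return DisseminationInformationPackage(parts=list(aips))


sip = SubmissionInformationPackage(b"observations", {"provenance": "mission team"})
aip = ingest(sip, {"context": "campaign 1", "reference": "id-0001", "fixity": "checksum"})
dip = disseminate([aip])
```

The sketch makes one design point visible: an SIP is allowed to arrive incomplete, and the archive refuses to call the result an AIP until the PDI is complete.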
20External Data Flow View
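[Figure: External data flow. The Producer sends Submission Information Packages to the OAIS, which holds Archival Information Packages; the Consumer sends queries and orders to the OAIS and receives result sets and Dissemination Information Packages]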
21Detailed Models
22Overview of Detailed Models
- It was decided to do both a functional and an information model of the OAIS
- Both models were tasked to
- Use the models to better communicate OAIS Concepts
- Use a well established, formal modeling technique
- Stay as implementation independent as possible
- Avoid detailed designs
23Detailed Models
24General Principles
- Define classes of information objects that illustrate information necessary to enable long-term storage and access to archives
- The class definitions should be implementation independent
- Use a subset of Unified Modeling Language (UML)
25UML Notation
26Information Objects
27Representation Information
- The Representation Information accompanying a physical object, like a moon rock, may give additional meaning
- It typically is a result of some analysis of the physically observable attributes of the rock
- The Representation Information accompanying a digital object, or sequence of bits, is used to provide additional meaning.
- It typically maps the bits into commonly recognized data types such as character, integer, and real and into groups of these data types.
- It associates these with higher level meanings which can have complex inter-relationships that are also described
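As a rough illustration of Representation Information for a digital object, the sketch below uses Python's standard `struct` module to map a sequence of bits into character, integer, and real values, and then attaches higher-level meanings to them. The record layout and the field meanings are invented for this example only.

```python
import struct

# Assumed structure information for a 10-byte record (big-endian, no padding):
# 2 characters, one 32-bit integer, one 32-bit real.
RECORD_FORMAT = ">2sif"

# Assumed semantic information: what each field means.
FIELD_MEANINGS = ("instrument code", "sample count", "temperature in kelvin")


def interpret(bits: bytes) -> dict:
    """Map raw bits to typed values, then to named meanings."""
    code, count, temperature = struct.unpack(RECORD_FORMAT, bits)
    return dict(zip(FIELD_MEANINGS, (code.decode("ascii"), count, temperature)))


bits = struct.pack(RECORD_FORMAT, b"MR", 42, 273.5)
record = interpret(bits)
print(record)
```

`RECORD_FORMAT` plays the part of structure information and `FIELD_MEANINGS` the part of semantic information; losing either one leaves the ten bytes uninterpretable.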
28Recursive Nature of Representation Information
- Structure Information
- Semantic Information
- Other Representation Information
29Types of Information Used in OAIS
30Content Information
- The information which is the primary object of preservation
- An instance of Content Information is the information that an archive is tasked to preserve.
- Deciding what is the Content Information may not be obvious and may need to be negotiated with the Producer
- The Data Object in the Content Information may be either a Digital Object or a Physical Object (e.g., a physical sample, microfilm)
31Preservation Description Information
- Provenance Information
- Describes the source of Content Information, who has had custody of it, what is its history
- Context Information
- Describes how the Content Information relates to other information outside the Information Package
- Reference Information
- Provides one or more identifiers, or systems of identifiers, by which the Content Information may be uniquely identified
- Fixity Information
- Protects the Content Information from undocumented alteration
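Fixity Information is the PDI category that lends itself most directly to code. The sketch below shows one common way to realize it, a cryptographic digest recorded at ingest and re-checked later. The choice of SHA-256 and the function names are assumptions for illustration; the Reference Model does not mandate any particular mechanism.

```python
import hashlib


def fixity_digest(content: bytes) -> str:
    """Compute a SHA-256 digest to record as Fixity Information."""
    return hashlib.sha256(content).hexdigest()


def verify_fixity(content: bytes, recorded_digest: str) -> bool:
    """Return True if the content still matches the recorded digest."""
    return fixity_digest(content) == recorded_digest


content = b"calibrated plasma measurements"
recorded = fixity_digest(content)

# Later audit: unchanged content verifies, altered content does not.
print(verify_fixity(content, recorded))
print(verify_fixity(content + b"!", recorded))
```

A digest only detects alteration; documenting a legitimate change is the job of Provenance Information, and the digest would be recomputed when such a change is recorded.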
32Preservation Description Information
33Descriptive Information
- Contains the data that serves as the input to documents or applications called Access Aids.
- Access Aids can be used by a consumer to locate, analyze, retrieve, or order information from the OAIS.
34Packaging Information
- Information which, either actually or logically, binds and relates the components of the package into an identifiable entity on specific media
- Examples of Packaging Information include tape marks, directory structures and filenames
35OAIS Archival Information Package
[Figure: structure of an Archival Information Package (AIP)]
- Archival Information Package (AIP) is delimited by Packaging Information (e.g., how to find Content Information and PDI on some medium)
- Package Description is derived from the AIP (e.g., information supporting customer searches for AIP)
- Content Information (e.g., hardcopy document; document as an electronic file together with its format description; scientific data set consisting of image file, text file, and format descriptions file describing the other files)
- Content Information is further described by Preservation Description Information (PDI) (e.g., how the Content Information came into being, who has held it, how it relates to other information, and how its integrity is assured)
36AIP Types
- Based on the difference in Content Object complexity
- Archival Information Units (AIUs) contain a single Data Object as the Content Object
- Archival Information Collections (AICs) contain multiple AIPs in their Content Objects
- Each member of an AIC is an AIP containing Content Information and PDI
- The AIC contains unique PDI on the collection process
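The unit/collection distinction is a recursive structure, since every member of a collection is itself an AIP. A minimal sketch follows, assuming simple dictionary PDI and a hypothetical `count_units` helper that are not part of the Reference Model.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class ArchivalInformationUnit:
    """AIP whose Content Object is a single Data Object."""
    data_object: bytes
    pdi: Dict[str, str]


@dataclass
class ArchivalInformationCollection:
    """AIP whose Content Object is a set of other AIPs."""
    members: List[Union[ArchivalInformationUnit, "ArchivalInformationCollection"]]
    pdi: Dict[str, str]  # PDI unique to the collection process


def count_units(aip) -> int:
    """Hypothetical helper: count the AIUs reachable from any AIP."""
    if isinstance(aip, ArchivalInformationUnit):
        return 1
    return sum(count_units(member) for member in aip.members)


image = ArchivalInformationUnit(b"image bits", {"provenance": "camera A"})
table = ArchivalInformationUnit(b"table bits", {"provenance": "sensor B"})
orbit = ArchivalInformationCollection([image, table], {"provenance": "grouped by orbit"})
mission = ArchivalInformationCollection([orbit], {"provenance": "grouped by mission"})
print(count_units(mission))
```

Each level keeps its own PDI, so the history of how a collection was assembled is recorded separately from the history of its members.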
37Package Descriptors and Access Aids
- Package Descriptors are needed by an OAIS to provide visibility and access to the OAIS holdings
- Package Descriptors contain one or more Associated Descriptions which describe the AIP Content Information from the point of view of a single Access Aid
- Some examples of Access Aids include
- Finding Aids - assist the consumer in locating information of interest
- Ordering Aids - allow the consumer to discover the cost of and order AIUs of interest
- Retrieval Aids - enable authorized users to retrieve the AIU described by the Unit Descriptor from Archival Storage
38Information Model Summary
- Presented a model of information objects as containing data objects and representation objects
- Classified information required for long-term archiving into 4 classes: Content Information, PDI, Packaging Information and Descriptive Information
- Described how these classes would be aggregated and related in an AIP to fully describe an instance of Content Information
- Presented information needed for Access, in addition to that needed for long-term preservation
- Put the Access oriented structures in the context of the other data needed to operate an OAIS
39Detailed Models
40General Principles
- Highlight the major functional areas important to digital archiving
- Use functional decomposition to clarify the range of functionality that might be encountered
- Don't decompose beyond two levels to avoid becoming too implementation dependent
- Provide a useful set of terms and concepts
- Do not imply that all archives need to implement all the sub-functions
- Identify some common services which are likely to be needed, and are assumed to be available, as underlying support
41Common Services
- Modern, distributed computing applications assume a number of supporting services
- Examples of Common Services include
- inter-process communication
- name services
- temporary storage allocation
- exception handling
- security
- file and directory services
42OAIS Functional Entities
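[Figure: OAIS Functional Entities. Ingest, Archival Storage, Data Management, Administration, Preservation Planning, and Access sit between PRODUCER, CONSUMER, and MANAGEMENT. SIPs flow from the Producer into Ingest; AIPs flow from Ingest to Archival Storage and from Archival Storage to Access; Descriptive Info. flows from Ingest to Data Management and from Data Management to Access; the Consumer sends queries and orders to Access and receives result sets and DIPs]
SIP: Submission Information Package
AIP: Archival Information Package
DIP: Dissemination Information Package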
43Functional Entities In An OAIS
- Ingest: This entity provides the services and functions to accept Submission Information Packages (SIPs) from Producers and prepare the contents for storage and management within the archive
- Archival Storage: This entity provides the services and functions for the storage, maintenance and retrieval of Archival Information Packages
- Data Management: This entity provides the services and functions for populating, maintaining, and accessing both descriptive information which identifies and documents archive holdings and internal archive administrative data.
- Administration: This entity manages the overall operation of the archive system
- Preservation Planning: This entity monitors the environment of the OAIS and provides recommendations to ensure that the information stored in the OAIS remains accessible to the Designated User Community over the long term even if the original computing environment becomes obsolete.
- Access: This entity supports consumers in determining the existence, description, location and availability of information stored in the OAIS and allows consumers to request and receive information products
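How four of these entities hand information to one another can be sketched as below. This is a deliberately small illustration and not a design from the Reference Model (which avoids specifying implementations): the class and method names are hypothetical, packages are reduced to raw bytes, and Administration and Preservation Planning are left out.

```python
from typing import Dict, List


class ArchivalStorage:
    """Stores, maintains, and retrieves AIPs (here reduced to bytes)."""
    def __init__(self) -> None:
        self._aips: Dict[str, bytes] = {}

    def store(self, aip_id: str, aip: bytes) -> None:
        self._aips[aip_id] = aip

    def retrieve(self, aip_id: str) -> bytes:
        return self._aips[aip_id]


class DataManagement:
    """Holds descriptive information that documents the holdings."""
    def __init__(self) -> None:
        self._descriptions: Dict[str, str] = {}

    def add_description(self, aip_id: str, text: str) -> None:
        self._descriptions[aip_id] = text

    def query(self, term: str) -> List[str]:
        term = term.lower()
        return sorted(i for i, t in self._descriptions.items() if term in t.lower())


class Ingest:
    """Accepts SIPs and prepares their contents for storage and management."""
    def __init__(self, storage: ArchivalStorage, data_management: DataManagement) -> None:
        self._storage = storage
        self._data_management = data_management

    def accept(self, aip_id: str, sip_content: bytes, description: str) -> None:
        # Real preparation (validation, PDI completion) is elided in this sketch.
        self._storage.store(aip_id, sip_content)
        self._data_management.add_description(aip_id, description)


class Access:
    """Lets Consumers find holdings (result sets) and order them (DIPs)."""
    def __init__(self, storage: ArchivalStorage, data_management: DataManagement) -> None:
        self._storage = storage
        self._data_management = data_management

    def find(self, term: str) -> List[str]:
        return self._data_management.query(term)

    def order(self, aip_id: str) -> bytes:
        return self._storage.retrieve(aip_id)


storage, data_management = ArchivalStorage(), DataManagement()
ingest_entity = Ingest(storage, data_management)
access_entity = Access(storage, data_management)
ingest_entity.accept("aip-1", b"magnetometer data", "Magnetometer survey, 1998")
result_set = access_entity.find("magnetometer")
dip = access_entity.order(result_set[0])
```

Ingest and Access never talk to each other directly in this sketch; they share only Archival Storage and Data Management, which mirrors the separation of packages and descriptive information described above.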
44Ingest Data Flow Diagram
45Preservation Planning
46Preservation Perspectives
47Migration Context
48Digital Migration
- Digital Migration is defined to be the transfer of digital information, while intending to preserve it, within the OAIS.
- Focus on preservation of the full information content
- New information implementation replaces the old
- OAIS has full control and responsibility over all aspects of the transfer
- Three major motivators are seen to drive Digital Migrations of Archival Information Packages within an OAIS
- Media Decay
- Increased Cost Effectiveness
- New Consumer Service Requirements
49Digital Migration Approaches
- Four primary types of digital migration in response to motivators, ordered by increasing risk of information loss
- Refreshment
- Media replacement with no bit changes
- Replication
- No change to Packaging Information or Content Information bits
- Repackaging
- Some bit changes in Packaging Information
- Transformation
- Reversible: Bit changes in Content Information are reversible by an algorithm
- Non-reversible: Bit changes in Content Information are not reversible by an algorithm
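The four migration types can be read as a decision rule on which bits change. The sketch below encodes that reading in Python; the function name, the boolean parameters, and in particular the `other_bits_changed` flag used to separate Replication from Refreshment are assumptions drawn from the slide's wording, not definitions from the standard.

```python
from enum import IntEnum


class MigrationType(IntEnum):
    """Ordered by increasing risk of information loss."""
    REFRESHMENT = 1
    REPLICATION = 2
    REPACKAGING = 3
    TRANSFORMATION = 4


def classify_migration(content_bits_changed: bool,
                       packaging_bits_changed: bool,
                       other_bits_changed: bool = False) -> MigrationType:
    """Classify a migration by the most sensitive kind of bits it alters."""
    if content_bits_changed:
        return MigrationType.TRANSFORMATION
    if packaging_bits_changed:
        return MigrationType.REPACKAGING
    if other_bits_changed:
        # Assumption: bits outside Packaging and Content Information changed.
        return MigrationType.REPLICATION
    return MigrationType.REFRESHMENT


# Copying a tape bit-for-bit onto a fresh tape of the same kind.
tape_copy = classify_migration(False, False)
# Moving files into a new directory layout without touching their contents.
new_layout = classify_migration(False, True)
# Converting the content to a different file format.
format_change = classify_migration(True, True)
```

Using an ordered enumeration lets a policy compare types directly, for example requiring extra review for anything above `MigrationType.REPLICATION`.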
50Access Preservation
- Effective access to digital information requires the use of software
- Application Programming Interfaces (APIs) may be cost-effectively maintained across time by an OAIS when
- API is not too complex
- API is applicable to a wide variety of AIUs
- API source code may be ported to new environments
- Extensive testing is needed to ensure against information loss
- Preservation of executables by full emulation of underlying hardware is problematic
- Hard to know what is the information being preserved
- May not be possible to fully emulate associated devices
51Archive Interoperability
52Archive Interoperability Motivators
- Users of multiple OAIS archives have reasons to wish for some interoperability or cooperation among the OAISs.
- Consumers
- Common finding aids to aid in locating information over several OAIS archives
- Common Package Descriptor schema for access
- Common DIP schema for dissemination, or a single global access site.
- Producers
- Common SIP schema for submission to different archives
- A single depository for all their products.
- Managers
- Cost reduction through sharing of expensive hardware
- Increasing the uniformity and quality of user interactions with the OAIS
53Categories of Archive Interactions
- Independent: No knowledge by one OAIS of standards implemented at another
- Cooperating: Potentially common submission standards, and common dissemination standards, but no common access. One archive may make subscription requests for key data at the cooperating archive
- Federated: Access to all federated OAISs is provided through a common set of access aids that provide visibility into all participating OAISs. Global dissemination and Ingest are options
- Shared resources: An OAIS in which Management has entered into agreements with other OAISs to share resources to reduce cost. This requires various standards internal to the archive (such as ingest-storage and access-storage interface standards), but does not alter the community's view of the archive
54Federated Archives
[Figure: Federated Archives. Labels: Local Consumer, Global Consumer, Common Catalog, OAIS 2, Ingest, Access, Administration, Dissemination Information Package (Optional)]
55Levels of Autonomy in Associated Archives
- No interactions and therefore no association
- Associations that maintain your autonomy. You have to do certain things to participate, but you can leave the association without notice or impact to you.
- Associations that bind you by contract. To change the nature of this association you will have to re-negotiate the contract. The amount of autonomy retained depends on how difficult it is to negotiate the changes.
56Reference Model Summary
- Reference model is to be applicable to all digital archives, and their Producers and Consumers
- Identifies a minimum set of responsibilities for an archive to claim it is an OAIS
- Establishes common terms and concepts for comparing implementations, but does not specify an implementation
- Provides detailed models of both archival functions and archival information
- Discusses OAIS information migration and interoperability among OAISs
57Some Applications
58Basis of Systems Architectures
- NEDLIB (Networked European Deposit Library) effort used OAIS Reference Model as a basis for the design and architecture of Deposit System for Electronic Publications (DSEP)
- National Library of Australia used it as basis for their implementation
- CEDARS: A multi-site UK project to create exemplars in Digital Archiving is using OAIS representation data as the basis for research into long term preservation
- NSSDC (National Space Science Data Center) is evolving their archive using OAIS RM as a basis for a new architecture
- SIPAD: French space agency plasma physics archive used the OAIS as a basis for design
- METS (Metadata Encoding and Transmission Standard) is using OAIS concepts in an implementation of types of Submission, Archival, and Dissemination Information Packages.
- InterPARES, a body of National Archives from many countries, adopted OAIS as a starting point for their modeling work
59Enhanced Communications and Productivity among varied Communities
- National Archives and Records Administration contracted some work on long term preservation of collections to the San Diego Supercomputer Center. Both parties claimed use of the OAIS RM saved several weeks of effort in the specification of the task
- Similar experiences between
- National Library of France and French space agency (CNES) representatives
- National Center for Supercomputing Applications HDF format developers and DNA researchers
- Life Sciences Archive developer and micro-gravity researchers
- United States Department of Agriculture and digital preservation experts
60More OAIS Accomplishments
- Royal Library of the Netherlands (RLN)
- OAIS mandated in their implementation RFP
- IBM implementing OAIS-based system for RLN (5M project)
- British National Library is following suit
- France setting up a working group within ARISTOTE
- Interested in archive of digital information, including libraries and Dept of Justice.
- http://www.aristote.asso.fr/ (in French)
- Astonishing unifying role from OAIS reference model
- OAIS likely to be used by CODATA archive task group in study on long-term preservation
- Playing significant role in Research Libraries Group and OCLC (Online Computer Library Center) digital preservation work
61Follow-on Activities
- Research Libraries Group has established a web page to track OAIS implementation efforts and issues
- http://www.rlg.org/longterm/oais.html
- CCSDS/ISO Producer-Archive Interface Methodology Standard
- Provides framework for Producer/Archive interactions
- Identifies steps and types of information exchanged during the negotiation
- May be used as a checklist by archives
- CCSDS Certification Coordination Function
- Will track and summarize various archive certification efforts
- Will attempt to extract high-level model/checklist
- RLG is organizing a group to establish certification approaches