Title: Implementing PREMIS in Container Formats
1Implementing PREMIS in Container Formats
- Rebecca Guenther, Library of Congress
- rgue_at_loc.gov
- Zhiwu Xie, Los Alamos National Laboratory
- zxie_at_lanl.gov
- IS&T's Archiving 2007
- Arlington, VA, May 23, 2007
2OUTLINE
- Introduction
- OAIS Reference Model and Containers
- METS
- MPEG-21 DID
- Implementing PREMIS in METS
- Implementing PREMIS in MPEG-21 DID
- Summary
3Digital preservation imperative and challenge
- More and more of the scholarly and cultural record exists in digital form; steps must be taken to secure its long-term future
- Significant progress has been made in raising awareness about the digital preservation imperative
- Shift in focus from articulating the problem to solving it
- Not so much "Why is digital preservation important?", but "What must be done to achieve preservation objectives?"
- Many practical challenges in implementing reliable, sustainable digital preservation programs
- One key challenge: preservation metadata
4PREMIS Working Group
- June 2003: OCLC and RLG sponsored a new international working group
- PREMIS: Preservation Metadata: Implementation Strategies
- Membership
- > 30 experts from 5 countries, representing libraries, museums, archives, government agencies, and the private sector
- Co-Chairs: Priscilla Caplan (FCLA), Rebecca Guenther (LC)
- Objective 1: Identify and evaluate alternative strategies for encoding, storing, managing, and exchanging preservation metadata
- PREMIS Survey Report (September 2004)
- Snapshot of current practices/emerging trends related to managing and using preservation metadata in digital archiving systems
- http://www.oclc.org/research/projects/pmwg/surveyreport.pdf
- Objective 2: Define implementable, core preservation metadata, with guidelines/recommendations for management and use
5PREMIS Data Model
Intellectual Entities
Rights
Agents
Objects
Events
6PREMIS Data Dictionary
- May 2005: Data Dictionary for Preservation Metadata: Final Report of the PREMIS Working Group
- 237-page report includes:
- PREMIS Data Dictionary 1.0
- Context/assumptions, data model, usage examples
- Set of XML schemas to support implementation
- Data Dictionary
- Comprehensive view of information needed to support digital preservation
- Guidelines/recommendations to support creation, use, management
- Used Framework as starting point
- Based on deep pool of institutional experience in setting up and managing operational capacity for digital preservation
- Received the 2005 Digital Preservation Award (UK) and the 2006 Society of American Archivists Publication Award
http://www.oclc.org/research/projects/pmwg/premis-final.pdf
7Some guiding principles
- "Implementable, core, preservation metadata"
- Preservation metadata: maintain viability, renderability, understandability, authenticity, identity in a preservation context
- Core: what most preservation repositories need to know to preserve digital materials over the long term
- Implementable: rigorously defined; supported by usage guidelines/recommendations; emphasis on automated workflows
- Technical neutrality
- Digital archiving system: no assumptions about specific archiving technology, system/DB architectures, preservation strategy
- Metadata management: no assumptions about whether metadata is stored locally or in an external registry; recorded explicitly or known implicitly; instantiated in one metadata element or multiple elements
- Promotes flexibility, applicability in a wide range of contexts
8Scope
- What PREMIS DD is:
- Common data model for organizing/thinking about preservation metadata
- Guidance for local implementations
- Standard for exchanging information packages between repositories
- What PREMIS DD is not:
- Out-of-the-box solution: need to instantiate as metadata elements in repository system
- All needed metadata: excludes business rules, format-specific technical metadata, descriptive metadata for access, non-core preservation metadata
- Lifecycle management of objects outside repository
- Rights management: limited to permissions regarding actions taken within repository
9PREMIS Maintenance Activity
- Web site
- Permanent Web presence, hosted by Library of Congress
- Central destination for PREMIS-related info, announcements, resources
- Home of the PREMIS Implementers Group (PIG) discussion list
- PREMIS Editorial Committee
- Set directions/priorities for PREMIS development
- Coordinate future revisions of Data Dictionary and XML schema
- Membership: Library of Congress, OCLC, FCLA, National Archives of Scotland, British Library, National Library of Australia, U. of Goettingen, LANL, Ex Libris, Library and Archives Canada
http://www.loc.gov/standards/premis/
10OAIS Reference Model and Containers
11OAIS Reference Model
- Developed by the Consultative Committee for Space Data Systems (CCSDS)
- ISO 14721:2003
- A functional model for preservation activities
- An information model specifying types of information required for long-term preservation
12OAIS Reference Model and PREMIS
- OAIS reference model specifies the Preservation Description Information (PDI)
- PREMIS used the OAIS information model as a starting point
- PREMIS Data Dictionary consolidated and further developed the conceptual types of information objects into more than 100 structured and logically integrated semantic units
- PREMIS Data Dictionary provides detailed descriptions and guidelines to implement these semantic units
- PREMIS Data Dictionary does not provide semantic units for Intellectual Entities, but provides semantic units to link to other metadata sources for Intellectual Entities
- All entities have reference (identification) information
- No packaging information that links content with metadata, but PREMIS can be used with container schemas
- PREMIS deals mostly with representation, context, provenance, and fixity information, in keeping with the PREMIS definition of preservation metadata
13PREMIS XML schemas
- One schema for each PREMIS entity in data model
- Allows user to choose which parts of PREMIS to use
- PREMIS container schema
- References schema for each entity type
- Provides a container if it is desirable to keep some or all PREMIS metadata together
- Using the container requires at least an object, which in turn requires objectIdentifier and objectCategory
- Individual schemas may be used alone or with container
- Semantic units in PREMIS schemas
- XML is faithful to data dictionary
- Only those units mandatory for all categories of objects are mandatory in object schema
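The container rule above (at least one object, each carrying objectIdentifier and objectCategory) can be checked mechanically. The following is a minimal Python sketch, not a substitute for validating against the PREMIS schemas. The namespace URI is assumed to be the one used by the PREMIS 1.x schemas, and `container_problems`, the `local` identifier type, and the sample records are hypothetical illustrations.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace URI of the PREMIS 1.x schemas.
PREMIS_NS = "http://www.loc.gov/standards/premis/v1"
NS = {"premis": PREMIS_NS}


def container_problems(premis_root):
    """List violations of the minimal container rule (hypothetical helper)."""
    problems = []
    objects = premis_root.findall("premis:object", NS)
    if not objects:
        problems.append("container has no object")
    for position, obj in enumerate(objects, start=1):
        for required in ("objectIdentifier", "objectCategory"):
            if obj.find(f"premis:{required}", NS) is None:
                problems.append(f"object {position} lacks {required}")
    return problems


# Illustrative records; identifier type and value are made up.
GOOD = f"""<premis xmlns="{PREMIS_NS}">
  <object>
    <objectIdentifier>
      <objectIdentifierType>local</objectIdentifierType>
      <objectIdentifierValue>FID1</objectIdentifierValue>
    </objectIdentifier>
    <objectCategory>file</objectCategory>
  </object>
</premis>"""
BAD = (
    f'<premis xmlns="{PREMIS_NS}">'
    "<object><objectCategory>file</objectCategory></object>"
    "</premis>"
)

print(container_problems(ET.fromstring(GOOD)))
print(container_problems(ET.fromstring(BAD)))
```

A check like this only covers the two requirements named on the slide; anything else the schemas require would still need a schema validator.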
14Need a Container in the XML Implementation
- Archival Information Package (AIP) may include much more metadata besides the preservation metadata
- A well-defined container is usually necessary to group and appropriately associate these metadata with the data object
- For example: METS or MPEG-21 DID
15
- METS records the (possibly hierarchical) structure of digital objects, the names and locations of the files that comprise those objects, and the associated metadata
- A METS document may be a unit of storage (e.g. OAIS AIP) or a transmission format (e.g. OAIS SIP or DIP)
- METS is extensible and modular
- METS uses extension wrappers or "sockets" where elements from other schemas can be plugged in
- METS uses the XML Schema facility for combining vocabularies from different namespaces
- The METS Editorial Board has endorsed PREMIS as an extension schema
- Many institutions are trying to use PREMIS within the METS context
16The structure of a METS file
17OAIS and METS
[Diagram: components of the OAIS Archival Information Package mapped to METS elements and extension schemas. Labels recovered from the figure:]
- OAIS: Archival Information Package, Packaging Information, Descriptive Information, Content Information, Data Object, Representation Information (Structure, Semantics), Preservation Description Information (Reference, Context, Provenance, and Fixity Information)
- METS primary schema: <METS>, <dmdSec>, <amdSec>, <techMD>, <rightsMD>, <sourceMD>, <digiProvMD>, <mdRef>, <fileGrp>, <file>, <structMap>
- Extension schemas: MODS, MARCXML, DC, metsRights, premis:rights, premis:event, premis:object, textMD, MIX; file formats
- Relationship labels: described by, delimited by, identifies, derived from, further described by
- Legend: black Arial = OAIS; red Times New Roman = METS primary schema; blue Times New Roman italics = extension schema
18METS extension schemas
- Wrappers or "sockets" where elements from other schemas can be plugged in
- Provides extensibility
- Uses the XML Schema facility for combining vocabularies from different namespaces
- Endorsed extension schemas
- Descriptive: MODS, DC, MARCXML
- Technical metadata: MIX (image), textMD (text)
- Preservation related: PREMIS
19Issues in using PREMIS with METS
- Which METS sections to use and how many
- Whether to record elements redundantly in PREMIS that are defined explicitly in the METS schema
- How to record elements that are also part of a format-specific technical metadata schema (e.g. MIX)
- Recording structural relationships
- How to deal with locally controlled vocabularies
- Whether to use the PREMIS container
20PREMIS and METS sections
- Flexibility of METS requires implementation decisions
- You can't put all PREMIS metadata directly under amdSec
- What sections to use for PREMIS metadata?
- Alternative 1
- Object in techMD
- Event in digiProvMD
- Rights in rightsMD
- Agent with event or rights
- Alternative 2
- Everything in digiProvMD
- Alternative 3
- Everything in techMD
- How many administrative MD sections to use?
- Experimentation will result in best practices
21Inserting technical metadata in a METS Document
    <mets>
      <amdSec>
        <techMD>
          <mdWrap>
            <xmlData>
              <!-- insert data from different namespace here -->
            </xmlData>
          </mdWrap>
        </techMD>
      </amdSec>
      <fileSec />
      <structMap />
    </mets>
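The wrapping pattern on this slide can also be produced programmatically. The sketch below uses Python's standard `xml.etree.ElementTree` to place a PREMIS fragment inside techMD/mdWrap/xmlData. The PREMIS namespace URI is assumed to be the 1.x one, `wrap_in_tech_md` is a hypothetical helper, and the `premis:object` built here is a bare placeholder rather than a valid PREMIS object.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
PREMIS_NS = "http://www.loc.gov/standards/premis/v1"  # assumed PREMIS 1.x namespace

# Ask ElementTree to serialize with readable prefixes.
ET.register_namespace("mets", METS_NS)
ET.register_namespace("premis", PREMIS_NS)


def q(namespace, tag):
    """Build a namespace-qualified tag name in ElementTree's {uri}tag form."""
    return f"{{{namespace}}}{tag}"


def wrap_in_tech_md(amd_sec, section_id, md_type, payload):
    """Append techMD/mdWrap/xmlData to amd_sec and plug payload into the socket."""
    tech_md = ET.SubElement(amd_sec, q(METS_NS, "techMD"), {"ID": section_id})
    md_wrap = ET.SubElement(tech_md, q(METS_NS, "mdWrap"), {"MDTYPE": md_type})
    xml_data = ET.SubElement(md_wrap, q(METS_NS, "xmlData"))
    xml_data.append(payload)
    return tech_md


mets = ET.Element(q(METS_NS, "mets"))
amd_sec = ET.SubElement(mets, q(METS_NS, "amdSec"))

# Placeholder payload from the other namespace (not a complete PREMIS object).
premis_object = ET.Element(q(PREMIS_NS, "object"))
characteristics = ET.SubElement(premis_object, q(PREMIS_NS, "objectCharacteristics"))
ET.SubElement(characteristics, q(PREMIS_NS, "size")).text = "184302"

wrap_in_tech_md(amd_sec, "TMD1PREMIS", "PREMIS", premis_object)
ET.SubElement(mets, q(METS_NS, "fileSec"))
ET.SubElement(mets, q(METS_NS, "structMap"))

mets_xml = ET.tostring(mets, encoding="unicode")
print(mets_xml)
```

The same helper could be pointed at digiProvMD or rightsMD by changing the element name, which is where the choice among the alternatives for PREMIS sections would show up in code.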
22
    <fileSec><fileGrp>
      <file ID="FID1" SIZE="184302" ADMID="TMD1PREMIS TMD1MIX DP1EVENT DP1AGENT"
            CHECKSUM="4638bc65c5b9715557d09ad373eefd147382ecbf" CHECKSUMTYPE="SHA-1">
        <FLocat LOCTYPE="OTHER" xlink:href="BXF22.JPG" />
      </file>
    </fileGrp></fileSec>
    <techMD ID="TMD1PREMIS">
      <mdWrap MDTYPE="PREMIS">
        <xmlData>
          <premis:object>
            <objectCharacteristics>
              <fixity>
                <messageDigestAlgorithm>SHA-1</messageDigestAlgorithm>
                <messageDigest>4638bc65c5b9715557d09ad373eefd147382ecbf</messageDigest>
                <messageDigestOriginator>EchoDep</messageDigestOriginator>
              </fixity>
              <size>184302</size>
            </objectCharacteristics>
- Elements defined in both METS and PREMIS
- METS: CHECKSUM, CHECKSUMTYPE
- attribute of <file>
- not repeatable
- PREMIS: fixity
- also includes messageDigestOriginator
- allows multiples
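When checksum and size are recorded redundantly in both places, the two copies can drift apart. The sketch below shows one way to compare the METS `<file>` attributes with the PREMIS fixity and size values reached through ADMID. It assumes fully namespace-qualified PREMIS elements and the PREMIS 1.x namespace URI; `redundancy_conflicts` and the sample document are hypothetical, with values taken from the slide.

```python
import xml.etree.ElementTree as ET

NS = {
    "mets": "http://www.loc.gov/METS/",
    "premis": "http://www.loc.gov/standards/premis/v1",  # assumed PREMIS 1.x namespace
}

SAMPLE = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
                       xmlns:premis="http://www.loc.gov/standards/premis/v1">
  <mets:amdSec>
    <mets:techMD ID="TMD1PREMIS">
      <mets:mdWrap MDTYPE="PREMIS"><mets:xmlData>
        <premis:object><premis:objectCharacteristics>
          <premis:fixity>
            <premis:messageDigestAlgorithm>SHA-1</premis:messageDigestAlgorithm>
            <premis:messageDigest>4638bc65c5b9715557d09ad373eefd147382ecbf</premis:messageDigest>
          </premis:fixity>
          <premis:size>184302</premis:size>
        </premis:objectCharacteristics></premis:object>
      </mets:xmlData></mets:mdWrap>
    </mets:techMD>
  </mets:amdSec>
  <mets:fileSec><mets:fileGrp>
    <mets:file ID="FID1" SIZE="184302" ADMID="TMD1PREMIS"
               CHECKSUM="4638bc65c5b9715557d09ad373eefd147382ecbf" CHECKSUMTYPE="SHA-1"/>
  </mets:fileGrp></mets:fileSec>
</mets:mets>"""


def redundancy_conflicts(root):
    """Return (file ID, field) pairs where METS and PREMIS values disagree."""
    by_id = {el.get("ID"): el for el in root.iter() if el.get("ID")}
    conflicts = []
    for file_el in root.iterfind(".//mets:file", NS):
        for ref in (file_el.get("ADMID") or "").split():
            section = by_id.get(ref)
            if section is None:
                continue
            # PREMIS fixity is repeatable, so collect every (algorithm, digest) pair.
            digests = {
                (fx.findtext("premis:messageDigestAlgorithm", "", NS).strip(),
                 fx.findtext("premis:messageDigest", "", NS).strip().lower())
                for fx in section.iterfind(".//premis:fixity", NS)
            }
            mets_digest = (file_el.get("CHECKSUMTYPE"),
                           (file_el.get("CHECKSUM") or "").lower())
            if digests and file_el.get("CHECKSUM") and mets_digest not in digests:
                conflicts.append((file_el.get("ID"), "checksum"))
            size = section.findtext(".//premis:size", None, NS)
            if size is not None and file_el.get("SIZE") not in (None, size.strip()):
                conflicts.append((file_el.get("ID"), "size"))
    return conflicts


print(redundancy_conflicts(ET.fromstring(SAMPLE)))
```

Because the METS attribute is not repeatable while PREMIS fixity is, the comparison treats the METS value as consistent if it matches any one of the PREMIS digests.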
23
    <fileSec><fileGrp>
      <file ID="FID1" ADMID="TMD1PREMIS DP1EVENT DP1AGENT" MIMETYPE="image/jpeg">
        <FLocat LOCTYPE="OTHER" xlink:href="BXF22.JPG"/>
      </file>
    </fileGrp></fileSec>
    <techMD ID="TMD1PREMIS">
      <mdWrap MDTYPE="PREMIS">
        <xmlData>
          <premis:object>
            <objectCharacteristics>
              <format>
                <formatDesignation>
                  <formatName>image/jpeg</formatName>
                  <formatVersion>1.02</formatVersion>
                </formatDesignation>
              </format>
            </objectCharacteristics>
- Elements defined both in METS and PREMIS
- METS: MIMETYPE
- attribute of <file>
24
    <fileSec><fileGrp>
      <file ID="FID1" ADMID="TMD1PREMIS TMD1MIX DP1EVENT DP1AGENT">
    <techMD ID="TMD1PREMIS">
      <linkingEventIdentifier>
        <linkingEventIdentifierType>ECHODEP Hub Event</linkingEventIdentifierType>
        <linkingEventIdentifierValue>echo12345</linkingEventIdentifierValue>
      </linkingEventIdentifier>
    <digiProvMD ID="DP1EVENT">
      <premis:event>
        <eventIdentifier>
          <eventIdentifierType>ECHODEP Hub Event</eventIdentifierType>
          <eventIdentifierValue>echo12345</eventIdentifierValue>
        </eventIdentifier>
        <eventType>ingestion</eventType>
        <eventDateTime>2006-05-02T15:12:53</eventDateTime>
      </premis:event>
- Elements defined both in METS and PREMIS
- METS: ID/IDREF used to associate metadata in different sections and for different files
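The object-to-event link on this slide is expressed twice: through the METS ADMID reference and through matching PREMIS identifiers. The sketch below resolves only the PREMIS side, pairing each linkingEventIdentifier with the event whose eventIdentifier has the same type and value. The namespace URI is assumed, the wrapper element `records` and the function names are hypothetical, and the identifier values come from the slide.

```python
import xml.etree.ElementTree as ET

NS = {"premis": "http://www.loc.gov/standards/premis/v1"}  # assumed PREMIS 1.x namespace

SAMPLE = """<records xmlns:premis="http://www.loc.gov/standards/premis/v1">
  <premis:object>
    <premis:linkingEventIdentifier>
      <premis:linkingEventIdentifierType>ECHODEP Hub Event</premis:linkingEventIdentifierType>
      <premis:linkingEventIdentifierValue>echo12345</premis:linkingEventIdentifierValue>
    </premis:linkingEventIdentifier>
  </premis:object>
  <premis:event>
    <premis:eventIdentifier>
      <premis:eventIdentifierType>ECHODEP Hub Event</premis:eventIdentifierType>
      <premis:eventIdentifierValue>echo12345</premis:eventIdentifierValue>
    </premis:eventIdentifier>
    <premis:eventType>ingestion</premis:eventType>
    <premis:eventDateTime>2006-05-02T15:12:53</premis:eventDateTime>
  </premis:event>
</records>"""


def pair(identifier_el, stem):
    """Return the (type, value) pair held by an identifier element."""
    return tuple(
        identifier_el.findtext(f"premis:{stem}{part}", "", NS).strip()
        for part in ("Type", "Value")
    )


def resolve_event_links(root):
    """Split linkingEventIdentifiers into resolved and dangling references."""
    events = {}
    for event in root.iterfind(".//premis:event", NS):
        identifier = event.find("premis:eventIdentifier", NS)
        if identifier is not None:
            events[pair(identifier, "eventIdentifier")] = event
    resolved, dangling = [], []
    for link in root.iterfind(".//premis:linkingEventIdentifier", NS):
        key = pair(link, "linkingEventIdentifier")
        if key in events:
            event_type = events[key].findtext("premis:eventType", "", NS).strip()
            resolved.append((key, event_type))
        else:
            dangling.append(key)
    return resolved, dangling


print(resolve_event_links(ET.fromstring(SAMPLE)))
```

A dangling entry would indicate an object pointing at an event that is not present in the package, which the METS IDREF mechanism alone would not reveal.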
25
    <structMap TYPE="physical">
      <div ORDER="1" TYPE="text">
        <fptr FILEID="FID9"/>
        <div ORDER="1" TYPE="page" LABEL="Page 1">
          <fptr FILEID="FID1"/></div>
        <div ORDER="2" TYPE="page" LABEL="Page 2">
          <fptr FILEID="FID2"/></div>
      </div>
    <relationship>
      <relationshipType>structural</relationshipType>
      <relationshipSubType>is sibling of</relationshipSubType>
      <relatedObjectIdentification>
        <relatedObjectIdentifierType>UCB</relatedObjectIdentifierType>
        <relatedObjectIdentifierValue>FID2</relatedObjectIdentifierValue>
        <relatedObjectSequence>1</relatedObjectSequence>
- Elements defined both in METS and PREMIS
- METS: structMap
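The overlap shown here means that a PREMIS "is sibling of" relationship can in principle be derived from the METS structMap rather than recorded by hand. The sketch below illustrates that idea under a simple assumption: files referenced by `<div>` elements that share a parent `<div>` are treated as siblings. The function `sibling_files` is hypothetical, and the structMap mirrors the slide with namespace prefixes added.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}

STRUCT_MAP = """<mets:structMap xmlns:mets="http://www.loc.gov/METS/" TYPE="physical">
  <mets:div ORDER="1" TYPE="text">
    <mets:fptr FILEID="FID9"/>
    <mets:div ORDER="1" TYPE="page" LABEL="Page 1"><mets:fptr FILEID="FID1"/></mets:div>
    <mets:div ORDER="2" TYPE="page" LABEL="Page 2"><mets:fptr FILEID="FID2"/></mets:div>
  </mets:div>
</mets:structMap>"""


def sibling_files(struct_map):
    """Map each file ID to the file IDs referenced by sibling <div> elements."""
    siblings = {}
    for parent in struct_map.iter(f"{{{NS['mets']}}}div"):
        # One list of file IDs per child div of this parent.
        groups = [
            [fptr.get("FILEID") for fptr in child.findall("mets:fptr", NS)]
            for child in parent.findall("mets:div", NS)
        ]
        for index, group in enumerate(groups):
            others = [fid for j, g in enumerate(groups) if j != index for fid in g]
            for fid in group:
                siblings[fid] = others
    return siblings


print(sibling_files(ET.fromstring(STRUCT_MAP)))
```

Whether to derive such relationships on demand or store them redundantly in PREMIS is the implementation decision the slide raises; this sketch does not settle it.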
26How to record elements from 2 different technical metadata schemas
- Format-specific metadata may be included in addition to PREMIS general technical metadata
- Use multiple techMD sections and specify the source in the MDTYPE attribute and/or namespace declaration
- e.g. MDTYPE="NISOIMG" or MDTYPE="PREMIS"
- Give MIX schema declaration in METS document
- MIX was recently revised to correspond with the revision of the Z39.87 standard (technical metadata for digital still images); names harmonized with corresponding PREMIS semantic units
- For digital still images, best practice may be to use PREMIS for the general semantic units defined in PREMIS and MIX for format-specific units, without redundancy
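With several techMD sections per file, the MDTYPE attribute is what tells a consumer which schema each section carries. The following Python sketch lists, for each file, the MDTYPE values of the techMD sections its ADMID points to. The function name `tech_md_types` is hypothetical, the section IDs follow the earlier slides, and the xmlData payloads are left empty for brevity.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}

SAMPLE = """<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:amdSec>
    <mets:techMD ID="TMD1PREMIS">
      <mets:mdWrap MDTYPE="PREMIS"><mets:xmlData/></mets:mdWrap>
    </mets:techMD>
    <mets:techMD ID="TMD1MIX">
      <mets:mdWrap MDTYPE="NISOIMG"><mets:xmlData/></mets:mdWrap>
    </mets:techMD>
  </mets:amdSec>
  <mets:fileSec><mets:fileGrp>
    <mets:file ID="FID1" ADMID="TMD1PREMIS TMD1MIX DP1EVENT"/>
  </mets:fileGrp></mets:fileSec>
</mets:mets>"""


def tech_md_types(root):
    """Map each file ID to the MDTYPEs of the techMD sections its ADMID references."""
    types_by_section = {
        tech.get("ID"): [wrap.get("MDTYPE") for wrap in tech.findall("mets:mdWrap", NS)]
        for tech in root.iterfind(".//mets:techMD", NS)
    }
    return {
        file_el.get("ID"): [
            md_type
            for ref in (file_el.get("ADMID") or "").split()
            for md_type in types_by_section.get(ref, [])  # non-techMD refs are skipped
        ]
        for file_el in root.iterfind(".//mets:file", NS)
    }


print(tech_md_types(ET.fromstring(SAMPLE)))
```

A consumer could use such a listing to route each section to the right parser, for example general PREMIS units versus format-specific MIX units.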
27MPEG-21 Digital Item Declaration (DID)
- ISO/IEC 21000-2: Digital Item Declaration
- A promising alternative to represent Digital Objects
- Starting to be supported by some repositories, e.g., aDORe, DSpace, Fedora
- A flexible and expressive model that easily represents compound objects (recursive item)
- Attach well-formed XML from persistent namespaces as metadata
- Strong industry support
28Abstract Model for MPEG-21 DID
- container: grouping of items and descriptor/statement constructs pertaining to the container
- item: represents a Digital Item, aka Digital Object, aka asset. Descriptor/statement constructs convey information about the Digital Item
- component: binding of descriptor/statements to datastreams
- resource: datastream
[Diagram: a container holding descriptor/statements and items; items holding descriptor/statements, further items, and components; components holding descriptor/statements and resources.]
29Implementing PREMIS in DID
- DID abstract model is an object-centric containment model
- Semantically, descriptor/statement constructs at a given level are the metadata about that level of the DID: container, item, or component
- Descriptor/statement about the DID container should be mapped to OAIS packaging information, and is therefore out of PREMIS scope
- Rights, Agents, and Events in the PREMIS model are linked to the objects, but are not about the objects
- However, the PREMIS metadata as a whole (premis:premis) is about an object (the target of the preservation)
30Mapping
- premis:premis: all rights, events, and agents go here. The top-level object goes here. Other objects may be duplicated here or linked here.
[Diagram: a DID with DIDInfo and nested items object1 through object4, each with a resource. Labels recovered from the figure: one premis:premis element and three premis:object elements attached to the items.]
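One possible reading of this mapping can be sketched in Python: the top-level item carries a Descriptor/Statement holding the whole premis:premis record, and a nested item carries its own premis:object next to its resource. This is an illustration under stated assumptions, not the aDORe implementation. The DIDL namespace URI and the `mimeType` and `ref` attributes are assumed from the DIDL schema, the PREMIS namespace URI is assumed to be the 1.x one, the helper names are hypothetical, and the PREMIS elements are empty placeholders.

```python
import xml.etree.ElementTree as ET

DIDL_NS = "urn:mpeg:mpeg21:2002:02-DIDL-NS"           # assumed DIDL namespace
PREMIS_NS = "http://www.loc.gov/standards/premis/v1"  # assumed PREMIS 1.x namespace
ET.register_namespace("didl", DIDL_NS)
ET.register_namespace("premis", PREMIS_NS)


def didl(tag):
    return f"{{{DIDL_NS}}}{tag}"


def premis(tag):
    return f"{{{PREMIS_NS}}}{tag}"


def describe(parent, payload):
    """Attach payload to parent through a Descriptor/Statement pair."""
    descriptor = ET.SubElement(parent, didl("Descriptor"))
    statement = ET.SubElement(descriptor, didl("Statement"), {"mimeType": "text/xml"})
    statement.append(payload)
    return statement


root = ET.Element(didl("DIDL"))
top_item = ET.SubElement(root, didl("Item"))

# Top-level item: the whole premis:premis record (placeholder children only).
record = ET.Element(premis("premis"))
ET.SubElement(record, premis("object"))
ET.SubElement(record, premis("event"))
describe(top_item, record)

# Nested item: its own premis:object, alongside the datastream it describes.
child_item = ET.SubElement(top_item, didl("Item"))
describe(child_item, ET.Element(premis("object")))
component = ET.SubElement(child_item, didl("Component"))
ET.SubElement(component, didl("Resource"), {"mimeType": "image/jpeg", "ref": "BXF22.JPG"})

did_xml = ET.tostring(root, encoding="unicode")
print(did_xml)
```

Because descriptors attach at every level of the containment hierarchy, no choice among metadata sections is needed here, in contrast to the METS alternatives.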
31Partial Implementation in DID
When metadata are not sufficient to form the top-level PREMIS elements, partial implementation may be done if PREMIS elements are globally defined.
[Diagram: a DID with DIDInfo and nested items object1 through object4, each with a resource. Labels recovered from the figure: premis:premis at the top, and individual elements premis:significantProperties, premis:creatingApplication, and premis:format attached to the other items.]
32Examples of PREMIS in XML containers
- PREMIS in METS
- Portrait of Louis Armstrong (Library of Congress)
- PREMIS in MPEG DID
- aDORe example (LANL)
33Proposed schema changes for new version
- Define an abstract object type to allow for better validation of object category (representation, file, bitstream)
- Define elements and types globally to allow for reuse
- Implement an extensibility mechanism to provide for further structure when needed
- Implement a mechanism to use controlled vocabularies
- Adjust schemas to support changes in version 2 of data dictionary
34Summary: container formats
- A container format is needed to package together all forms of metadata (of which PREMIS is one) and digital content
- Use of a container is compatible with, and an implementation of, the OAIS information package concept
- Co-existence with other types of metadata requires best practices for both approaches; redundancy seems to be preferred
- Changes to the next version of the PREMIS XML schemas will facilitate a phased approach to full PREMIS implementation
- Development of registries (informal or formal) for controlled vocabularies will benefit implementation
35Summary: METS vs. MPEG-21 DID
- METS and MPEG DID are similar types of container formats in that both are expressed in XML, both represent the structure of digital objects, and both include metadata
- MPEG DID doesn't have the segmentation into metadata sections that METS does, so this implementation decision need not be made in DID
- METS is open source and developed by open discussion, mainly in the cultural heritage community
- MPEG DID is an ISO standard and has industry support, but is often implemented in a proprietary way, and standards development is closed
- It would be possible to transform a METS container to an MPEG DID and vice versa; development of stylesheets will enable transformations