Title: IAC Digital Preservation Committee
1 IAC Digital Preservation Committee
- 10 April 2007
- Yale University Library
2 IAC Digital Preservation Committee
- Outline
- Charge and members
- Accomplishments
- Policy
- Best practices
- What's next
3 IAC Digital Preservation Committee
- The DPC is an Integrated Access Council committee charged to:
- Develop a digital preservation program by evaluating, compiling, documenting and articulating policies, procedures, best practices and systems in order to establish a digital preservation infrastructure at Yale University Library.
- Work from a base of clearly articulated policies, then focus on preservation program planning and, finally, make recommendations for program implementation through digital preservation projects, initiatives, and system development.
4 IAC Digital Preservation Committee
- Members
- Rebekah Irwin, BRBL
- David Gewirtz, ILTS/AMT
- Kevin Glick, MSS/A
- Audrey Novak, ILTS (Co-Chair)
- Bobbie Pilette, Preservation (Co-Chair)
- E.C. Schroeder, BRBL
- Former members
- Ann Green, ILTS/ITS, Co-Chair
- Nicole Bouche, Beinecke Library
- Gretchen Gano, Social Science Library
5 IAC Digital Preservation Committee
- Accomplishments
- Published a Digital Preservation policy that establishes a mission statement and promulgates preservation policies for institutional standards governing the quality, type and source of digital assets to be archived in the repository (revised Feb 2007).
- Published best practices addressing: Local practice for implementing PREMIS; Preservation Strategies; Persistent Identifiers; Fixity (checksums, message digests and digital signatures); Format Registries; Encoding and Transmission of Structured Metadata; and Care and Handling of Originals.
- Modeled an organizational structure for the ongoing coordination and management of digital preservation. This structure recognizes that the responsibility for the creation and administration of digital preservation services at Yale is shared by three services: Metadata, Repository and Preservation.
6 Digital Preservation Best Practices
- Digital preservation does not have established and vetted standards.
- Issues and problems associated with preserving digital resources are numerous, complex and dynamic.
- DPC best practices are an effort to parse the larger digital preservation problem space into discrete issues and to identify processes, activities and/or methodologies that are emerging as standards.
- This work by the DPC is by no means finished. More work is required to establish additional best practices for the myriad related topics and to keep these recommendations current with the latest thinking and research in this field.
- Note, too, that although informed by research, most of these best practices are untested in production preservation archives.
7 Best Practice - Care and Handling of Physical Collections
- White paper to advise Library staff on how to protect originals during digital conversion. Available on the web site for easy access.
- Sections include:
- Assessment of Physical Collections
- Criteria for Selecting Proper Scanning Equipment
- Preparing the Scanning Surface
- Specifications for Scanning
- Handling Procedures for Library Materials
8 Care and Handling of Physical Collections, continued
- Assessment of Physical Collections
- Important to include Preservation Department contact: Tara Kennedy, Field Service Librarian
- List of questions to ask before scanning an object
- Criteria for Selecting Proper Scanning Equipment
- Describes available equipment and appropriate use
- Indicates which materials can be scanned safely on each type of equipment
- Preparing the Scanning Surface
- How to clean the scanning surface (flatbed)
9 Care and Handling of Physical Collections, continued
- Specifications for Scanning
- Illumination levels and types
- Proper supports for bound materials
- Environmental considerations (dust, temperature, relative humidity)
- Handling Procedures for Library Materials
- Mostly common sense reminders, but also specific suggestions, e.g. for oversized materials
- Includes paper-based, multimedia (sound, film, historical, optical), objects
10 Best Practice - Fixity
- Fixity, in preservation terms, means that the digital object has not been changed between two points in time or events.
- Fixity checks such as checksums, message digests and digital signatures are used to verify a digital object's fixity.
- Information created by these fixity checks provides evidence for the integrity and authenticity of digital objects and is essential to enabling trust.
11 Fixity, continued
- Fixity checks are all used in the same basic way. A value is initially generated and saved. Then, in response to an event (e.g., ingest) or over time, it is recomputed and compared to the original to ensure the object (file or bitstream) has not changed.
- Not all fixity checks are the same.
- Checksums are the simplest and least reliable method. They are typically used in error detection to find accidental problems in transmission and storage. They do not account for such changes as the re-ordering of bytes or changes that cancel one another out.
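The weakness described above can be shown with a short Python sketch. It assumes a deliberately naive additive checksum (the sum of byte values modulo 256), chosen only for illustration; practical checksums such as CRC-32 catch more kinds of error. The function names and sample bytes are hypothetical.

```python
import hashlib


def additive_checksum(data: bytes) -> int:
    """Toy checksum: the sum of all byte values, modulo 256."""
    return sum(data) % 256


def md5_hex(data: bytes) -> str:
    """MD5 message digest of the data as a hexadecimal string."""
    return hashlib.md5(data).hexdigest()


original = b"digital object"
# The same bytes in a different order.
reordered = original[::-1]
# Two changes that cancel out: first byte +1, second byte -1.
offsetting = bytes([original[0] + 1, original[1] - 1]) + original[2:]

for label, variant in (("reordered", reordered), ("offsetting", offsetting)):
    same_checksum = additive_checksum(variant) == additive_checksum(original)
    same_digest = md5_hex(variant) == md5_hex(original)
    print(f"{label}: checksum unchanged={same_checksum}, "
          f"digest unchanged={same_digest}")
```

Both altered byte strings keep the same additive checksum as the original, while their MD5 digests differ.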
12 Fixity, continued
- Message digests are more secure. They are computed by applying a more complex algorithm to a file of any length to produce an effectively unique, short, uniform-length character string. Change one pixel or one note in the file and the message digest will be completely different. (Example: 93326bff6636655dcd6abff18ed2de997).
- Digital signatures combine message digests with encryption. The message digest is created and then encrypted with the private key of a private/public key pair; the matching public key is used to verify it.
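A minimal Python sketch of the two message digest properties described above: uniform length regardless of file size, and a completely different value after a small change. SHA-1 from the standard `hashlib` module is used here as one example algorithm; the sample data and helper names are hypothetical.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """SHA-1 message digest of the data as a hexadecimal string."""
    return hashlib.sha1(data).hexdigest()


def differing_positions(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


short_file = b"a"
long_file = b"a" * 1_000_000
altered = b"b" + long_file[1:]  # hypothetical one-byte change

# Each digest has the same length, whatever the size of the input.
for name, data in (("short", short_file), ("long", long_file),
                   ("altered", altered)):
    print(name, len(data), sha1_hex(data))

# A one-byte change alters the digest throughout, not in one place.
print("hex characters that differ:",
      differing_positions(sha1_hex(long_file), sha1_hex(altered)))
```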
13 Fixity, continued
- Current best practice for digital preservation repositories:
- The creation of message digests using two algorithms, MD5 and SHA-1.
- These are implemented in the widely used JHOVE format identification, validation and characterization application (e.g., in the Rescue Repository before and after ingest).
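The generate, save, recompute and compare cycle with MD5 and SHA-1 might look like the following Python sketch. It uses the standard `hashlib` module rather than JHOVE, and it is not the Rescue Repository's implementation; `compute_digests`, `verify_fixity` and the temporary file are assumptions made for illustration.

```python
import hashlib
import os
import tempfile


def compute_digests(path, chunk_size=65536):
    """Return the MD5 and SHA-1 digests of a file, read in chunks."""
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return {"md5": md5.hexdigest(), "sha1": sha1.hexdigest()}


def verify_fixity(path, stored):
    """True if the file still matches the digests saved earlier."""
    return compute_digests(path) == stored


# Demonstration on a temporary file standing in for a digital object.
with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "object.bin")
    with open(path, "wb") as handle:
        handle.write(b"master image bytes")

    stored = compute_digests(path)      # generated and saved, e.g. at ingest
    print(verify_fixity(path, stored))  # file unchanged since the digests were saved

    with open(path, "ab") as handle:    # simulate an accidental change
        handle.write(b"\x00")
    print(verify_fixity(path, stored))  # file changed since the digests were saved
```

Recording two digests means a later check fails if either value no longer matches.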
14 Best Practice - Format Registries and Tools
- What is a Format?
- A technical specification describing a standard encoding or representation of digital content stored in a file.
- A file format extension such as .jpg indicates the encoded content is a digital image.
- File encoding standards are used by programs to read the encoded information and present useable content of the file to a user's monitor or another output device.
15 Format Registries
- What is a Format Registry?
- A database that stores information about the technical specifications of an electronic file's format.
- Format registries record file format changes over time so that files remain readable in the face of technological obsolescence of a format standard.
- How does a format registry work?
- Global Digital Format Registry
16 File Format Tools
- File format identification and validation tools answer two questions:
- How can we tell a file's type?
- If we know its type, how can we be sure that it conforms to its format specification so that we know it is still useable?
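The first question, identification, is commonly approached by comparing a file's leading bytes against known format signatures. The Python sketch below illustrates that idea only; it does not reproduce how any particular tool works, it covers a handful of formats, and it performs no validation against a format specification. The table and function name are hypothetical.

```python
from typing import Optional

# Leading byte signatures for a few well-known formats.
SIGNATURES = (
    (b"\xff\xd8\xff", "JPEG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"%PDF-", "PDF"),
    (b"II*\x00", "TIFF"),  # little-endian byte order
    (b"MM\x00*", "TIFF"),  # big-endian byte order
)


def identify_format(header: bytes) -> Optional[str]:
    """Return a format name if the leading bytes match a known signature."""
    for signature, name in SIGNATURES:
        if header.startswith(signature):
            return name
    return None


# A file named "scan.jpg" could still contain PDF bytes; the content,
# not the extension, decides the answer.
print(identify_format(b"%PDF-1.4\n%..."))
print(identify_format(b"GIF89a\x01\x00\x01\x00"))
print(identify_format(b"plain text with no signature"))
```

Answering the second question requires parsing the whole file against its specification, which is the job of a validation tool.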
17 File Format Tools
- JHOVE: A widely used file type identification, validation and characterization tool developed by Harvard Univ. Library and JSTOR.
- Handles many format types (e.g., AIFF, ASCII, BYTESTREAM, GIF, HTML, JPEG, JPEG2000, PDF, TIFF, UTF8, WAV, XML).
- Is configurable in many respects, including the option to select full validation or "short" mode, in which only the header's signature is analyzed; the ability to include or exclude message digests in the output; and the choice of various output formats, including plain text and XML.
- Because JHOVE does both file type identification and validation, it is currently Yale University Library's format-related tool of choice.
18 File Format Tools
- Other tools
- DROID (Digital Record Object Identification): A file type identification tool developed by the Digital Preservation Department of the National Archives of the United Kingdom to perform automated batch file format identification, using the PRONOM registry.
- National Library of New Zealand Preservation Metadata Extract Tool: A tool that extracts metadata from file headers. This Java tool uses adapters to extract metadata from file types including MS Word, Word Perfect, Open Office, MS Works, MS Excel, MS PowerPoint, TIFF, JPEG, WAV, MP3, HTML, PDF, GIF, and BMP. This data is output in a standard XML format.
19 Best Practice - Persistent Identifiers
- A persistent identifier (PI) is a unique name (identifier) associated with an internet resource that provides a link to the content and persists over changes of server location, ownership, and other state conditions.
- A location (e.g., a given URL) is not a persistent identifier if the content moves to another location. The principal problem addressed by PIs is broken links to internet resources, i.e., the HTTP 404 error: Document not found.
- Persistent identification is not possible without an associated service. It is the service that supports persistence. The identifier takes you to the service; the service resolves to the object.
- Optimally a PI should be created and assigned when the digital object is created.
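The division of labor between identifier and service can be sketched in a few lines of Python. This is a hypothetical in-memory resolver, not the interface of any real system; the class, the identifier and the example.org URLs are all invented for illustration.

```python
class ResolutionService:
    """Hypothetical in-memory resolver mapping identifiers to locations."""

    def __init__(self):
        self._locations = {}

    def assign(self, identifier: str, location: str) -> None:
        """Assign a new identifier, ideally when the object is created."""
        if identifier in self._locations:
            raise ValueError(f"identifier already assigned: {identifier}")
        self._locations[identifier] = location

    def relocate(self, identifier: str, new_location: str) -> None:
        """Record that the object moved; the identifier itself never changes."""
        if identifier not in self._locations:
            raise KeyError(identifier)
        self._locations[identifier] = new_location

    def resolve(self, identifier: str) -> str:
        """Return the object's current location."""
        return self._locations[identifier]


service = ResolutionService()
pid = "example-prefix/object-1"  # hypothetical identifier
service.assign(pid, "http://server-a.example.org/objects/1")
print(service.resolve(pid))

# The object moves; citations of the identifier keep working.
service.relocate(pid, "http://server-b.example.org/archive/1")
print(service.resolve(pid))
```

Persistence here depends entirely on someone calling `relocate` when content moves, which is why the service, not the identifier string, is what supports persistence.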
20 Best Practice - Persistent Identifiers
- Several technologies are available to create persistent identifiers, such as:
- CNRI Handle System: A generic system for assigning names to objects and resolving them. Key is the Global Handle Registry, which manages the namespace of all handle prefixes.
- DOI (Digital Object Identifier): An application of the CNRI Handle System that associates intellectual property with structured metadata. A typical use of a DOI is to give a scientific paper or article a unique identifying number that can be resolved through the DOI resolver or the CNRI global handle resolver.
- PURL: A Persistent Uniform Resource Locator is a URL that describes an intermediate (and more persistent) location which, when retrieved, results in a standard HTTP redirect to the current location of the resource.
21 Persistent Identifiers - Handle Server
- The implementation of a CNRI handle server at YUL is tightly coupled to the implementation of the VITAL/Fedora Digital Repository Service.
- Digital objects within the Digital Repository Service will have handles such as:
- http://moonpie8085/fedora/get/hdl10079.2F-2103288706 (opaque), or
- http://hdl.rutgers.edu/1782.1/SPCOLSMAPS.Map.b1849 (semantic)
- A handle server, like a web server, requires ongoing system administration, e.g., when resources are moved.
- Continuing research in the assignment of handles to resources in other YUL repositories such as the Rescue Repository, Image Commons (DL/Insight), etc.
22 Best Practice - Maintenance Strategies
- A1. Clear Allocation of Responsibilities
- A2. Provision of the appropriate technical infrastructure
- A3. Establishment and implementation of a plan for system maintenance, support and replacement
- A4. Establishment and implementation of a plan for regular transfer of records to new storage media
- A5. Adherence to appropriate storage and handling conditions for storage media
- A6. Ensuring redundancy and regular backup
- A7. Establishment of system security
- A8. Disaster planning
23 Best Practice - Preservation Strategies
- B1. Use of standards
- B2. Data extraction and structuring
- B3. Encapsulation
- B4. Restricting the range of formats to be managed
- B5. Technology preservation
- B6. Reliance on backward compatibility
- B7. Migration
- B8. Software re-engineering
- B9. Viewers and migration at the point of access
- B10. Emulation
- B11. Non-digital approaches
- B12. Data restoration
24 Best Practice - PREMIS
- PREservation Metadata: Implementation Strategies
- Yale Working Group
- Matthew Beacom, Metadata Librarian, Catalog and Metadata Services (Co-chair)
- Rebekah Irwin, Catalog Librarian for Digital Projects, Beinecke Library (Co-chair)
- Youn Noh, Digital Resources Catalog Librarian, Catalog and Metadata Services
- George Ouellette, Senior Programmer Analyst, Library ILTS
- David Walls, Preservation Librarian, Library Preservation Dept
- Yale Advisory Group
- Reed Beaman, Associate Director for Biodiversity Informatics, Peabody Museum
- Lee Faulkner, Media Director, Digital Media Center for the Arts
- David Gewirtz, Project Manager, Library Projects, ITS
- Kevin Glick, Electronic Records Archivist, Manuscripts and Archives
- Edward Kairiss, Director, Instructional Computing Instructional Technology, ITS
- Daniel Lee, E-Publishing/Internet Marketing Manager, Yale University Press
- Thomas Raich, Associate Director, Information Technology, Art Gallery
25 Best Practice - PREMIS
- Outcome
- Develop PREMIS profiles that match specific digital collection and administrative needs
- Base profile (up to 6 elements): This base profile of elements would support digital preservation of a wide range of digital assets
- Full profile (over 200): This full profile would provide guidance to administrators of digital information assets acting as trusted custodians of material deemed to be of long-term value
26 Best Practices - Summary
- Most of these best practices are the outcome of current research projects.
- Few are tested in production preservation repositories.
- At Yale the Rescue Repository is becoming a local testbed.
- Fixity: MD5 and SHA-1 message digests
- JHOVE file format identification and validation
- Maintenance strategies
- PREMIS base profile element set
- VITAL/Fedora Digital Repository Service implementation
- Persistent identifiers through the CNRI Handle System
27 What's Next
- Goals
- Creation of a Transition Team to continue the work of the DPC and, most importantly, within a 6 month timeframe, create the roadmap for the implementation of the permanent management model for an ongoing digital preservation program.
- The recommended structure consists of a core team representing 2 FTE, composed of staff with expertise in metadata, repository and preservation services. It is modeled as a virtual Digital Curation Center (DCC). The DCC will put into practice the identified best practices and the Digital Repository Service (DRS) Preservation Archive.
- The Transition Team will prepare a business plan for the Digital Curation Center. The business plan will identify the DCC's vision, mission, goals and first-year deliverables; staffing models; budget; and timeline for creation.
28 IAC Digital Preservation Committee
- Website
- http://www.library.yale.edu/iac/dpc.html