Title: Defining File Format Obsolescence: a risky journey
1 Defining File Format Obsolescence: a risky journey
- David Pearson
- APSR Project Manager
- National Library of Australia
- dapearso_at_nla.gov.au
2 The Problem
- Format obsolescence is potentially a major problem for every repository manager. This is particularly true given:
  - Ever-increasing volume of digital material.
  - Plethora of file formats.
  - Dynamic nature of computing environments.
  - Rapid and unpredictable drivers that cause formats to become obsolete.
- High business value of the specific content of some digital materials or collections can result in policies that mandate that access be maintained to this data for extended periods of time.
- Repository managers need help to manage the quantity and diversity of file formats and their obsolescence risks.
3 AONS II
- The APSR project objective for 2006 was to:
  - Refine the Automatic Obsolescence Notification System (AONS), developed in an earlier stage of APSR, into a platform-independent downloadable tool that automatically provides information from authoritative international registries to support decisions on the preservation action required to retain access to information resources stored in repositories.
- However:
  - The international target registries could not provide machine-harvestable risk metrics.
  - In the context of AONS II, we therefore had to come up with another way of quantifying file format risk.
4 Precedents
- A number of different paradigms have informed our thinking on the nature of File Format Obsolescence:
  - The performance model developed by the National Archives of Australia.
  - The view-path model developed by the Koninklijke Bibliotheek (National Library of the Netherlands).
5 File Format Obsolescence
- There are two predominant factors which may impede the retrieval of digital information. Access to:
  - The physical storage medium.
  - The logical file content.
(Dinosaurs, media and image courtesy of National Archives of Australia).
6 Some initial thoughts on obsolescence
- We are not making judgments about which formats should be used.
- Similarly, we are not making judgments based on how hard a format will be to deal with once preservation action is needed.
- We should not only look for indicators of obsolete formats, but also obsolescence in formats.
- Risk is about the impending loss of the means of providing access.
- The same format may well have different levels of obsolescence risk in different repositories.
- It is perfectly reasonable to take into account that there may be more than one means of providing access to a file.
- The purpose of obsolescence risk assessment is to inform decisions about the need to take action.
- We are not about to be overwhelmed by the juggernaut of technical change.
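The points about repository-specific risk and multiple means of access can be illustrated with a minimal Python sketch. It assumes a view-path can be modelled as a set of software and hardware components that must all be present; the class, the function, and every component name are hypothetical and are not taken from AONS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ViewPath:
    """One hypothetical route to rendering a format."""
    name: str
    components: frozenset  # software and hardware the path depends on


def available_paths(paths, installed):
    """Return the view-paths whose components are all in `installed`."""
    return [p for p in paths if p.components <= installed]


# Illustrative data only: these component names are invented.
paths_for_format = [
    ViewPath("native", frozenset({"original-app", "legacy-os"})),
    ViewPath("viewer", frozenset({"open-source-viewer", "current-os"})),
]

repository_a = {"open-source-viewer", "current-os"}
repository_b = {"legacy-os"}

paths_a = available_paths(paths_for_format, repository_a)
paths_b = available_paths(paths_for_format, repository_b)

print([p.name for p in paths_a])  # ['viewer']
print([p.name for p in paths_b])  # []
```

In this sketch the same format is still accessible in repository A through one remaining view-path, while repository B has already lost every means of access.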
7
- The risk assessment questions must:
  - Seek answers that will indicate the likely stage of obsolescence for a file format (in a specific real-world repository).
- As a consequence of having to cater for potentially thousands of possible file formats, the questions need to be generic and somewhat simplistic.
- The questions still aim to allow a repository owner to build specific risk profiles of an individual file format.
- The risk questions are classified into two general groups:
  - Community questions (which should be answerable by reference to the digital preservation community).
  - Repository view-path questions (which relate specifically to an individual environment and depend on the sustained availability of combinations of software and hardware).
8 Community Questions
- At a community level, the questions assume certain generic information might serve as useful indicators:
  - The current level of support for rendering the format.
  - How long it has been since the format version was first released.
  - How many versions have been released since that time.
  - The range of view-paths that could be used for acceptable presentation of content.
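As a rough illustration, the four community indicators could be recorded per format version and turned into simple warning flags. This is a sketch only: the field names, the support-level labels, and both thresholds are assumptions made for the example, not metrics defined by AONS II or by any registry.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class CommunityIndicators:
    """Answers to the community questions (hypothetical field names)."""
    format_name: str
    support_level: str      # assumed labels: "widely supported", "limited", "none"
    first_released: date    # release date of this format version
    later_versions: int     # versions released since then
    known_view_paths: int   # view-paths giving acceptable presentation


def community_flags(ind, today, max_age_years=10, max_later_versions=3):
    """Return warning flags; the thresholds are illustrative assumptions."""
    flags = []
    if ind.support_level != "widely supported":
        flags.append("limited rendering support")
    age_years = (today - ind.first_released).days / 365.25
    if age_years > max_age_years:
        flags.append("old format version")
    if ind.later_versions > max_later_versions:
        flags.append("many later versions")
    if ind.known_view_paths < 2:
        flags.append("few view-paths")
    return flags


example = CommunityIndicators(
    "ExampleFormat 1.0", "limited", date(1995, 1, 1), 4, 1
)
flags = community_flags(example, today=date(2007, 1, 1))
print(flags)
```

The flags are deliberately coarse, in keeping with the need for questions that are generic enough to apply to thousands of formats.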
9-17 Step 1 - Community Information Questions
18 Repository Questions
- At a local repository level, the questions assume that it is possible for a repository manager to determine whether required view-paths for access are locally available and workable.
- Other issues where subjective judgments may be needed include:
  - Decisions about how much notice is needed in order to take manageable action.
  - The degree of rendering difficulty that the repository owner and users are willing to bear.
  - The degree of loss that is acceptable.
  - What constitutes a base format unlikely to require repeated assessment (because it can be expected to be readable in all expected computing environments).
  - Whether there may be other sources of information worth checking for indications of a looming accessibility problem.
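Two of these local judgments, the notice period and the list of base formats, lend themselves to a small Python sketch. It assumes the repository manager can supply an estimate of the months remaining before access is lost; the class, the functions, and the format names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RepositoryPolicy:
    """Local, subjective settings; all field names are hypothetical."""
    notice_months: int        # warning needed to take manageable action
    base_formats: frozenset   # formats treated as readable everywhere


def needs_assessment(format_name, policy):
    """Base formats are skipped; everything else is assessed repeatedly."""
    return format_name not in policy.base_formats


def action_due(months_until_access_lost, policy):
    """True when the estimated time left is within the required notice."""
    return months_until_access_lost <= policy.notice_months


# Illustrative values only.
policy = RepositoryPolicy(notice_months=18, base_formats=frozenset({"plain-text"}))

print(needs_assessment("plain-text", policy))      # False
print(needs_assessment("example-format", policy))  # True
print(action_due(12, policy))                      # True
print(action_due(36, policy))                      # False
```

Because both settings are local, two repositories holding the same format can reasonably reach different conclusions about when to act.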
19-25 Step 2 - Collection/Repository Information Questions
26 Some further thoughts on Obsolescence
- Some interesting points have already arisen in trying to apply the questions:
  - The approach tries to identify the need for a decision to take preservation action. We take action in order to regain or maintain access.
  - A file format is heading for obsolescence when a large part of the community of users cannot access it, or has decided to move content away from it.
  - Obsolescence may begin with inconvenience to users and end in the digital black hole of loss.
  - Not all file formats are created equal.
  - Open source renderers are a good thing, but they may not obviate the need to take preservation action.
  - Not all repository environments are maintained equally.
27 Next Steps?
- The usefulness of these questions depends on there being a community to share the output.
- Next steps:
  - Develop questions further into an acceptable standard (with partners).
  - Develop and quantify risk metrics (machine- and human-harvestable).
  - Develop automated workflows (usable by any application).
  - Develop a mechanism to share metrics (such as exporting results to a central web service or an external voting system).
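One way to make a risk profile both machine- and human-harvestable is to serialise it as JSON before sharing. The sketch below covers only the serialisation step; the profile structure and every key in it are assumptions for illustration, and no particular web service or exchange standard is implied.

```python
import json

# Hypothetical risk profile combining community and repository answers.
profile = {
    "format": "ExampleFormat 1.0",
    "repository": "example-repository",
    "community": {"support_level": "limited", "later_versions": 4},
    "repository_view": {"available_view_paths": 1, "notice_months": 18},
}


def export_profile(profile):
    """Serialise a profile to indented JSON with stable key order."""
    return json.dumps(profile, indent=2, sort_keys=True)


exported = export_profile(profile)
restored = json.loads(exported)  # what a receiving application would read

print(exported)
```

Sorting the keys keeps the output stable between runs, which makes profiles from different repositories easier to compare.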
28 Questions?
(Dinosaurs, media and image courtesy of National Archives of Australia).