Title: Towards smart storage for repository preservation services
1 Towards smart storage for repository preservation services
- Steve Hitchcock, David Tarrant, Adrian Brown (1), Ben O'Steen (2), Neil Jefferies (2) and Leslie Carr
- Preserv 2 Project
- School of Electronics and Computer Science, University of Southampton
- (1) The National Archives, Kew
- (2) Oxford University Library Services
- At iPRES 2008, The Fifth International Conference on Preservation of Digital Objects, London, 29-30 September 2008
2 Three-stage strategy for keeping your data safe
- Ability to move data freely, easily and instantly
  - OAI, ORE, Atom
- Reliable, trusted large-scale storage
  - Open storage
- Risk profiling to invoke a range of selectable services
  - Smart storage
3 About institutional repositories
- IRs in flux
  - Uncertainty in terms of target content (published papers, theses, research data, teaching materials), policy, rights, even the locus of content and responsibility for long-term management.
  - OAI-ORE (Object Reuse and Exchange) effectively frees the data from being captive to repository software.
  - Commercial repository services, from software-specific services to digital library services or more general 'cloud' or network storage services.
- Set up by institutions of higher education and research to manage and disseminate their digital intellectual outputs.
- IRs are a special type of Web site, typically based on some repository software that presents a database of records pointing to the objects deposited.
- The Preserv 2 project is investigating the provision of preservation services for IRs.
Photo: Flickr/cpikas
4 IRs are
- Open source repository software
- Open access content
- Open archives: using OAI-PMH to share data with, e.g., discovery services.
- Open repositories: using OAI-ORE enables the easy movement of data between different types of repository software.
Photo: Flickr/Rightee
5 A new open: How open storage supports preservation services
- Open storage: large-scale storage devices based on open source software.
- Open storage averts the need for a repository layer to access first-class objects; these are objects that can be addressed directly.
- In turn, these digital objects can be distributed and/or replicated over many open storage platforms.
- This in turn makes it possible to select storage with built-in preservation support.
- Resilient storage platforms may be viable for preservation services aimed at multiple repositories.
- E.g. Sun Microsystems STK5800 (codenamed Honeycomb)
- Google Repository
6 Smart storage
- Smart storage combines an underlying passive storage approach with the intelligence provided through services.
- The key to realising smart storage is to enable the services to communicate and share information with the digital content sources they may be acting on. This is done through machine-level application programming interfaces (APIs) and protocols.
7 APIs, interfaces and the Web architecture
- Major services on the Web deploy their own simple, but different, APIs, e.g.
  - Google Maps
  - Within the repository community, SWORD (Simple Web-service Offering Repository Deposit)
  - Open storage platforms such as Sun's STK5800 and the Amazon Simple Storage Service (S3)
- To take advantage of open storage, repositories have to be able to talk to these services through their APIs.
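As an illustration of the last point, here is a minimal sketch of a repository talking to several storage services through one small interface. The `ObjectStore` protocol, the `InMemoryStore` backend and the `deposit` function are hypothetical names introduced for this example. They are not the STK5800 or S3 APIs; a real adapter would wrap each platform's own client library behind the same two methods.

```python
from typing import Dict, List, Protocol


class ObjectStore(Protocol):
    """Hypothetical minimal interface a repository could code against."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class InMemoryStore:
    """Stand-in backend for illustration; a real one would call a storage API."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]


def deposit(stores: List[ObjectStore], key: str, data: bytes) -> int:
    """Write the same object to every configured store; return the copy count."""
    for store in stores:
        store.put(key, data)
    return len(stores)


# Example: replicate one deposited object over two (in-memory) platforms.
primary, mirror = InMemoryStore(), InMemoryStore()
copies = deposit([primary, mirror], "eprint/42/paper.pdf", b"%PDF-1.4 example")
```

Because the repository only depends on `put` and `get`, swapping or adding a storage platform would not require changes to the deposit logic.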
8 Smart storage example: format services
- Preservation methods affecting formats can be classified in three stages (seamless flow)
  - Format identification and characterization (which format?)
  - Preservation planning and technology watch (format risk and implications)
  - Preservation action, migration, etc. (what to do with the format)
- Format-based services tend to be ad hoc processes for which some tools are available
- E.g. PRONOM-DROID from The National Archives (UK)
  - PRONOM is an online registry of technical information, such as file format signatures
  - DROID is a downloadable file format identification tool that applies these signatures
- These and other tools could be used in a more coordinated manner.
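The idea of signature-based identification can be sketched in a few lines. This is not DROID's algorithm, and the table below is not taken from PRONOM: it is a tiny hard-coded set of well-known leading byte sequences, chosen for illustration. The function names `identify` and `characterise` are likewise hypothetical.

```python
from typing import Dict, Optional

# Tiny illustrative signature table (assumption): a real registry holds far
# richer signatures than these leading "magic" bytes.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"PK\x03\x04", "ZIP container"),
]


def identify(data: bytes) -> Optional[str]:
    """Return a format label if the data starts with a known signature."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return None


def characterise(objects: Dict[str, bytes]) -> Dict[str, int]:
    """Count identified formats across a set of named byte strings."""
    profile: Dict[str, int] = {}
    for data in objects.values():
        label = identify(data) or "unidentified"
        profile[label] = profile.get(label, 0) + 1
    return profile


sample = {
    "paper.pdf": b"%PDF-1.4\n...",
    "figure.png": b"\x89PNG\r\n\x1a\n....",
    "notes.txt": b"plain text",
}
format_profile = characterise(sample)
```

The resulting per-format counts are the kind of summary that later planning and action stages could consume.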
9 Smart storage: DROID concept
10 Smart storage: DROID scheduling/history
- Scheduling interface controls when a DROID classification needs to be performed.
- Preserv 2 has developed a scheduling service that uses the Darwin Calendar Server and iCalendar format.
- Provides a powerful scheduling service with many clients already available (Apple iCal, Mozilla Sunbird, and others) that can read and interpret the files so that past and future events can be reviewed.
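To show what a scheduled classification might look like in iCalendar form, here is a minimal sketch that writes a single event by hand. It is not the Preserv 2 scheduler's code. The UID, product identifier, summary text and target URL are all assumptions, and the sketch does no text escaping or line folding, which a full iCalendar writer would need.

```python
from datetime import datetime, timezone


def ical_stamp(moment: datetime) -> str:
    """Format a timezone-aware datetime as an iCalendar UTC date-time."""
    return moment.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")


def droid_event(uid: str, start: datetime, created: datetime, target_url: str) -> str:
    """Build a one-event calendar describing a planned classification run."""
    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//example//droid-scheduler-sketch//EN",  # hypothetical product id
        "BEGIN:VEVENT",
        f"UID:{uid}",
        f"DTSTAMP:{ical_stamp(created)}",
        f"DTSTART:{ical_stamp(start)}",
        "SUMMARY:DROID format classification",  # assumed wording
        f"URL:{target_url}",
        "END:VEVENT",
        "END:VCALENDAR",
    ]
    # iCalendar content lines are terminated by CRLF.
    return "\r\n".join(lines) + "\r\n"


start = datetime(2008, 9, 29, 9, 0, tzinfo=timezone.utc)
created = datetime(2008, 9, 1, 12, 0, tzinfo=timezone.utc)
event_text = droid_event(
    "droid-0001@example.org", start, created, "http://storage.example.org/objects/"
)
```

A calendar client that reads iCalendar files could then display this event alongside past runs, which is the review capability the slide describes.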
11 Smart storage: DROID OAI-PMH interface
- An OAI-PMH interface to open storage discovers the most recently deposited objects that are ready for format classification.
  - This could also be performed by simpler RSS or Atom-based methods.
- The interface has since been expanded to allow export of OAI-ORE resource maps in both RDF and Atom formats.
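The discovery step can be sketched with a standard OAI-PMH `ListIdentifiers` request restricted by a `from` date. The base URL and the sample response below are made up for illustration, and the helper names are hypothetical; the sketch parses an embedded response rather than contacting a server, and it ignores resumption tokens.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_identifiers_url(base_url: str, since: str, metadata_prefix: str = "oai_dc") -> str:
    """Build a ListIdentifiers request for records changed on or after `since`."""
    query = urlencode(
        {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix, "from": since}
    )
    return f"{base_url}?{query}"


def parse_headers(xml_text: str) -> list:
    """Return (identifier, datestamp) pairs, skipping deleted records."""
    root = ET.fromstring(xml_text)
    results = []
    for header in root.iterfind(".//oai:header", OAI_NS):
        if header.get("status") == "deleted":
            continue
        identifier = header.findtext("oai:identifier", namespaces=OAI_NS)
        datestamp = header.findtext("oai:datestamp", namespaces=OAI_NS)
        results.append((identifier, datestamp))
    return results


# Made-up response body used in place of a live server.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header>
      <identifier>oai:storage.example.org:101</identifier>
      <datestamp>2008-09-02</datestamp>
    </header>
    <header status="deleted">
      <identifier>oai:storage.example.org:102</identifier>
      <datestamp>2008-09-03</datestamp>
    </header>
  </ListIdentifiers>
</OAI-PMH>"""

request_url = list_identifiers_url("http://storage.example.org/oai", "2008-09-01")
new_objects = parse_headers(SAMPLE_RESPONSE)
```

Each returned identifier and datestamp pair is a candidate for the next scheduled classification run.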
12 Smart storage: DROID implementation
[Diagram. Components: user interface (e.g. iCal, Outlook, Sunbird); machine interface/API; scheduler; calendar server; DROID; DROID-OAI harvester; open storage; repository; history; messaging (Atom?); Web server (HTTP), which stores the results of DROID events. Labelled flows: schedule event; OAI-PMH; is event done? (url, date); get results of event. Legend: implemented / to be implemented.]
13 Risk profiling
- The scheduler will invoke actions based on the results of scanning by DROID allied to decision-making tools that use intelligence from planning and technology watch tools, such as
  - PRONOM,
  - Plato preservation planning tool from the EC-funded Planets project,
  - and others.
Photo: Flickr/yourbartender
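A minimal sketch of how format scan results might be combined with risk information to select an action follows. Everything specific here is an assumption made for illustration: the risk scores, the thresholds, the action names and the format labels are invented, and none of them come from PRONOM, Plato or the Preserv 2 software.

```python
# Hypothetical risk scores per format label (0 = low risk, 1 = high risk).
# In practice such values would come from planning and technology watch tools.
RISK_REGISTER = {"PDF": 0.2, "PNG": 0.1, "WordPerfect 5": 0.8}
DEFAULT_RISK = 1.0  # assumption: unknown formats are treated as highest risk


def choose_action(risk: float) -> str:
    """Map a risk score to an action using illustrative thresholds."""
    if risk >= 0.7:
        return "plan migration"
    if risk >= 0.4:
        return "monitor"
    return "no action"


def risk_profile(format_counts: dict) -> list:
    """Return (format, count, risk, action) rows, highest risk first."""
    rows = []
    for label, count in format_counts.items():
        risk = RISK_REGISTER.get(label, DEFAULT_RISK)
        rows.append((label, count, risk, choose_action(risk)))
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Example: counts as they might come out of a format identification scan.
profile = risk_profile({"PDF": 120, "WordPerfect 5": 3, "unidentified": 7})
```

Treating unidentified content as the highest risk is a design choice of this sketch: it surfaces objects that the identification step could not classify, so that they are reviewed first.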
14 Summary: smart storage in the storage scheme
Binary stream
File system: need to store multiple streams with permissions
Content addressable: adds content validation and object identifiers, metadata required to locate an object
Open: adds error correction and recovery, places processing close to storage, solves some bandwidth problems
Smart: opens up the close-to-storage approach for application development, transition to 'cloud' storage
How smart storage addresses current storage issues: see full paper
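The "content addressable" layer in the scheme above can be illustrated with a small sketch in which an object's identifier is derived from its content, so that the identifier doubles as a validation check. The `ContentStore` class is hypothetical, and the use of SHA-256 is an assumption; the slides do not name a hash function or any particular product's API.

```python
import hashlib


class ContentStore:
    """Illustrative in-memory content-addressable store."""

    def __init__(self) -> None:
        self._blobs = {}
        self._metadata = {}

    def put(self, data: bytes, **metadata) -> str:
        """Store data under an identifier derived from its content."""
        object_id = hashlib.sha256(data).hexdigest()
        self._blobs[object_id] = data
        self._metadata[object_id] = metadata
        return object_id

    def get(self, object_id: str) -> bytes:
        """Return the data, re-checking that it still matches its identifier."""
        data = self._blobs[object_id]
        if hashlib.sha256(data).hexdigest() != object_id:
            raise ValueError(f"content validation failed for {object_id}")
        return data

    def find(self, **criteria) -> list:
        """Locate objects by metadata, since identifiers alone are opaque."""
        return [
            oid
            for oid, meta in self._metadata.items()
            if all(meta.get(key) == value for key, value in criteria.items())
        ]


store = ContentStore()
thesis_id = store.put(b"thesis text", title="Thesis", creator="A. Author")
```

The `find` method reflects the point made on the slide: because the identifier is a hash, metadata is needed to locate an object in the first place.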
15 Storage can become smarter
- Openness in its various forms, the ability to move data freely and easily, needs to be supplemented by decision-making that can be automated based on the supplied intelligence and information.
- In this way, open storage can become smarter.
- http://preserv.eprints.org/
Thanks to