Title: Persistent Digital Archives and Library System PeDALS
1Persistent Digital Archives and Library System
(PeDALS)
- South Carolina Information Technology Directors
Association
- September 8, 2008
- Bill Henry, Matt Guzzi
- SC Department of Archives and History
2Background Last Year
- 2007 NHPRC grant proposal not funded
- AZ Archives submitted multi-state grant proposal
to Library of Congress
- AZ proposal had same basic goals
- SC too late for funding
- Paid own expenses to join project
3Electronic Archives Funding
- One-time funding from General Assembly
- Digitize paper records
- Capture agency website snapshots
- Purchase hardware and software
- Library of Congress approved additional funds for
project
- SC now a fully-funded partner
4What is PeDALS?
- Persistent Digital Archives and Library System
- Multi-state grant project funded by the Library
of Congress and the Institute of Museum and
Library Services
- Five state partners: Arizona, Florida, New York,
Wisconsin, South Carolina
- Project will run 18-24 months; if successful,
SCDAH intends to continue participation beyond
this period
- At the end of the project each partner will have
a functioning digital archives system
5Why is PeDALS Needed?
- An increasing number of long-term and archival
records are created and maintained only in
digital formats
- Traditional archival practices designed for paper
records won't work in a digital environment
- Need the ability to preserve electronic records so
that we can demonstrate authenticity and protect
integrity
- PeDALS is both a learning opportunity and a
chance to implement a functioning system
6Technical Goals
- To develop a curatorial rationale that can be
implemented in software to support an automated,
integrated workflow to process collections of
digital records
- To build digital stacks storage that has
appropriate controls for preservation and
disaster preparedness
7Traditional Curatorial Processes for Paper Records
- Appraisal
- Acquisition
- Arrangement and description
- Housing and storage
- Reference and access
- Preservation
8Curatorial Rationale for Digital Records
- Transformation of traditional, paper-based
practices into the digital arena
- Focus on the rules, not the records
- Automate the rules
9Digital Stacks
- More than storing the data (CD, tape, disk)
- LOCKSS
- 1. Automatic integrity checking and error detection
- 2. Secure
- 3. Geographically distributed
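The idea of automatic integrity checking across several copies can be sketched as follows. This is only an illustration of the general principle (compare digests of each copy, let the majority decide, repair the outliers); it is not the actual LOCKSS polling protocol, which is considerably more elaborate. The function names, the node names, and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib
from collections import Counter
from typing import Dict, List


def digest(content: bytes) -> str:
    """Return a SHA-256 hex digest (hash choice is an assumption)."""
    return hashlib.sha256(content).hexdigest()


def audit_and_repair(replicas: Dict[str, bytes]) -> List[str]:
    """Compare all copies of one file and repair those that disagree.

    `replicas` maps a hypothetical node name to that node's copy.
    Copies whose digest differs from the majority digest are overwritten
    with a majority copy. Returns the names of the repaired nodes.
    """
    if not replicas:
        return []
    digests = {node: digest(content) for node, content in replicas.items()}
    winner, votes = Counter(digests.values()).most_common(1)[0]
    if votes <= len(replicas) // 2:
        # No strict majority: do not guess which copy is correct.
        raise ValueError("no majority among replicas; manual review needed")
    good_copy = next(c for n, c in replicas.items() if digests[n] == winner)
    repaired = [node for node in replicas if digests[node] != winner]
    for node in repaired:
        replicas[node] = good_copy
    return repaired
```

With three copies where one has been corrupted, `audit_and_repair` returns the name of the damaged node and leaves all three copies identical. Spreading the copies across sites is what makes such a vote meaningful, since a single local failure cannot damage the majority.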
10Additional Goals
- To build a community of shared practice that
meets the needs of a wide range of repositories
- - For best practices
- - For resource sharing
- To remove barriers by keeping costs as low as
possible
11The Open Archival Information System (OAIS)
Reference Model
- OAIS is an international (ISO) standard
- Defines a minimal set of responsibilities for
long-term preservation
- Can be applied to any information or object that
needs to be retained long-term
- OAIS does not prescribe a specific design or
implementation
- http://public.ccsds.org/publications/archive/650x0b1.pdf
12View of an OAIS Environment
[Diagram: OAIS (PeDALS) shown with Producer, Consumer, and Management]
13PeDALS (OAIS) Functional Areas
- Ingest
- Archival storage
- Data management
- Administration
- Preservation planning
- Access
14PeDALS Overview - 1
- Agency records in an electronic records system
are transferred via the Internet to the PeDALS
system
- Supplemental processing checks for file integrity
and completeness prior to transfer
15PeDALS Overview - 2
- Agency records with associated metadata are
transferred to a middleware server (Microsoft
BizTalk)
- Rules-based software will transform records into
a format for long-term storage, along with a copy
for web access
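A rules-based transformation step of this kind can be pictured as a lookup table that maps each incoming file type to a preservation format and an access format. The sketch below is written in Python purely for illustration; the project itself uses BizTalk, and the rule table, the format names, and the function name are hypothetical placeholders rather than the project's actual rules.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Dict, Optional


@dataclass(frozen=True)
class Rule:
    preservation_format: str  # format kept for long-term storage
    access_format: str        # format of the copy served on the web


# Hypothetical rules keyed by file extension; real rules would come
# from the curatorial rationale, not from this table.
RULES: Dict[str, Rule] = {
    ".doc": Rule(preservation_format="pdf/a", access_format="pdf"),
    ".tif": Rule(preservation_format="tiff", access_format="jpeg"),
}


def plan_transformation(filename: str) -> Optional[Dict[str, str]]:
    """Return the planned outputs for one file, or None if no rule applies."""
    suffix = PurePosixPath(filename).suffix.lower()
    rule = RULES.get(suffix)
    if rule is None:
        return None  # no rule: leave the file for manual review
    return {
        "source": filename,
        "preservation": rule.preservation_format,
        "access": rule.access_format,
    }
```

Keeping the rules in data rather than in code reflects the "focus on the rules, not the records" idea: archivists can change a rule once and have it apply to every record of that type.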
16PeDALS Overview - 3
- Records are transferred into LOCKSS servers for
long-term preservation
- LOCKSS is a dark archive
17PeDALS Overview - 4
- Public access will be provided via the web
- Restricted records will be blocked from public
access
18Technology behind the South Carolina Digital
Archive
19PeDALS Network Architecture
- Agencies will be able to log in and
upload records to the South Carolina Digital
Archive.
- BizTalk will check the incoming records for
completeness and verify the hash value on
upload.
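The completeness and hash checks on upload can be sketched as a comparison between a manifest sent by the agency and the files actually received. This is a minimal Python illustration, not the BizTalk implementation; the manifest layout (file name mapped to expected digest), the function names, and the use of SHA-256 are assumptions, since the slides do not say which hash algorithm is used.

```python
import hashlib
from typing import Dict, List, Union


def sha256_hex(content: bytes) -> str:
    """Return a SHA-256 hex digest (algorithm choice is an assumption)."""
    return hashlib.sha256(content).hexdigest()


def check_upload(
    manifest: Dict[str, str], received: Dict[str, bytes]
) -> Dict[str, Union[List[str], bool]]:
    """Check received files against the sender's manifest.

    `manifest` maps file name -> expected hex digest.
    `received` maps file name -> file content as received.
    """
    # Completeness: every listed file arrived, and nothing extra did.
    missing = sorted(set(manifest) - set(received))
    unexpected = sorted(set(received) - set(manifest))
    # Integrity: the content of each listed file matches its digest.
    mismatched = sorted(
        name
        for name in set(manifest) & set(received)
        if sha256_hex(received[name]) != manifest[name]
    )
    return {
        "missing": missing,
        "unexpected": unexpected,
        "mismatched": mismatched,
        "ok": not (missing or unexpected or mismatched),
    }
```

A transfer would be accepted only when `ok` is true; otherwise the three lists tell the agency which files to resend.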
20Archivist Review
- Once records are received, the Archivist will
receive an email.
- The files will then be reviewed and a high-level
description will be entered in the Database
Catalog.
- The SIP (Submission Information Package) is
created.
21BizTalk
- This is where the magic happens.
22BizTalk Processes
- DIP (Dissemination Information Package) created.
- The Catalog database is updated with Access,
Description and Preservation Information.
- The Archival records are placed on the Manifest
Server for Ingest into LOCKSS.
- The public access database is updated.
23LOCKSS (Lots of Copies Keep Stuff Safe)
- Based at Stanford University.
- LOCKSS has primarily been used for scientific
journals and publications.
- Open source; uses OpenBSD, a multi-platform,
4.4BSD-based, UNIX-like operating system.
24LOCKSS
- Boots from CD; no operating system is installed
on the server.
- Communicates using a VPN (virtual private network).
- Files for LOCKSS are stored on a separate Admin
server running Linux.
- 1 LOCKSS cluster with 7 servers in our private
distributed LOCKSS network.
- Initially set up to take in 1 TB of data; can be
expanded.
25LOCKSS Storage
- Dark, secure archival storage
- LOCKSS is a sophisticated data storage system
that scans for and repairs file corruption and
other data integrity problems
- Level 4 firewalls and geographic distribution
provide added security
26Public Access Process
- BizTalk process: AIP (Archival Information
Package).
- This process moves records from LOCKSS to the
Public Access web server based on the record
access date.
27PeDALS Network Architecture
- Web server will provide Internet access to
records through a web-based search interface.
- Access to records restricted by statute or
otherwise will be blocked during the restriction
period.
- Restricted records are held in the LOCKSS dark
archive; no user copy is sent to the web server
until public access is allowed.
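The access rule described above (no user copy leaves the dark archive until the restriction period ends) amounts to a date comparison per record. The following Python sketch shows that check under stated assumptions: the `Record` fields, the function name, and the convention that a missing access date means "restricted until further notice" are all hypothetical, and the real process runs in BizTalk.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Record:
    record_id: str
    # Date on which public access is allowed. None is treated here as
    # "restricted until further notice" (an assumption for this sketch).
    access_date: Optional[date]


def releasable(records: Iterable[Record], today: date) -> List[str]:
    """Return IDs of records whose access date has been reached.

    Only these records would have a user copy sent to the public web
    server; everything else stays in the dark archive.
    """
    return [
        r.record_id
        for r in records
        if r.access_date is not None and r.access_date <= today
    ]
```

Running such a check on a schedule would let records become public automatically on their access date without an archivist having to track each restriction by hand.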
28Future Public Access
- We are currently in the process of implementing
the web component of Rediscovery.
- This will allow the public to search our
holdings.
- We are hoping to use BizTalk to automatically
populate the Rediscovery catalog.
- Public access will be granted through URLs to the
Rediscovery web component.
29PeDALS Open Archival Information System (OAIS)
Network Architecture
30Records Eligible for PeDALS
- Permanently valuable electronic records scheduled
for transfer to the SCDAH
- Pilot project agencies and records
- - Judicial Department: Supreme Court Case Files
- - Election Commission: Voter Registration
Master Files
- - Public Service Commission: Orders
- - DHEC: Electronic Index to Death Certificates
31Project Status
- Core metadata defined and data dictionary
completed
- System design completed
- Hardware and software acquired and installed
- Agency partners and records identified
- System prototype built (AZ and SC)
- BizTalk training completed
32On the Horizon
- Other states purchase and configure hardware
and software
- First ingest of records in early winter
- Develop public search website
33Post-Grant
- Move from pilot to production mode
- Develop procedures for agency participation
- Expand participation to additional agencies and
records
34PeDALS
- Bill Henry
- Electronic Records Consultant
- henry_at_scdah.state.sc.us
- (803) 896-6137
- Matt Guzzi
- Electronic Records Archivist
- guzzi_at_scdah.state.sc.us
- (803) 896-6103