Title: Getting
1Getting Digi with it at the Digital Archives
- Adam Jansen
- Deputy State Archivist
- Washington State Archives
2What is Archiving in the Electronic Age?
- Protecting machine-readable records of enduring
legal, historical or fiscal value from loss,
alteration, deterioration and technological
obsolescence in an environment independent from
that which produced the record.
3Mission of the Digital Archives
- Collect electronic records of enduring legal,
historical or fiscal value
- Maintain these records in perpetuity in a usable
state for the good of the public
- Make records that are disclosable accessible to
the public
4Public Records
- As defined in RCW 40.14
- ANY records that have been made by or received
by any agency of the state of Washington in
connection with the transaction of public
business
5Redefining Public
- Avg over 650 researchers per day
- Avg length of stay over 6 minutes
- 6 .gov, 4 .edu, 1 .org
- 13 came from Internet Search (Google, MSN,
Yahoo)
- Researchers from 131 foreign countries
- Researchers from:
- Canada, US Military, Romania, Germany, France,
Australia, Japan, UK, Netherlands, Russia,
Thailand, Portugal, Belgium, Poland, Italy,
Indonesia, Singapore, Sweden, Mexico, New
Zealand, Czech Republic, Hungary, Brazil, Norway,
Colombia, Austria, Greece, Bulgaria, China,
Yugoslavia, Philippines, Spain, South Korea,
Denmark, Oman, Pakistan, South Africa, Jamaica,
Switzerland
6Records and Information, or Why we do what we do
- If: Information is power
- And: Records are storage of information
- Then: Records must be preserved for future
generations
- Why?
- The foundation of democracy in America is
government accountability to the people
7What are the challenges (or why is it so
hard!?!?!?)
- Socio-political
- Resistance to change
- Inability to keep pace
- Technology
- Ever upwards and onwards
- Little thought on looking back
9New Federal Mandates to Manage Certain
Electronic Records
- As electronic records become more integrated into
society, producers of those records will be held
to higher standards of conduct
- Health Insurance Portability and Accountability Act
of 1996 (HIPAA)
- Gramm-Leach-Bliley Act of 1999
- Patriot Act of 2001
- Sarbanes-Oxley Act of 2002
- Help America Vote Act of 2002 (HAVA)
- More mandates to come
- Records must be managed and destroyed
methodically in the normal course of business
10Shifting Media
- Historically records were stored on paper, kept
in filing cabinets
- When the cabinet was full, records sent to file
room
- Now records stored electronically on computers
- When the computer is full add more hard
drives
- Basic skills to manage and maintain records have
been lost, replaced by infinite storage
11- So the question becomes who takes care of the
records, and do they have the knowledge?
12Why a Digital Archives?
- Comply with statutory and regulatory mandates.
- The law requires preservation of certain public
records; it doesn't specify whether those
records are paper or electronic. All records
must be given the same care.
- Avoid loss of legal and historical records
- As technology changes, the older media (5¼-inch
floppy disks, for instance) become harder to
read.
- Preserve rare and at-risk paper records
- Centralize Records
- Centralization means uniformity in maintenance
- Trained professionals serve as caretakers
- Improved access for citizens
- By centralizing historical electronic records in
one location, one-stop shopping will provide
the information more quickly and easily
13What the Digital Archives is not
- Not mass storage for active business application
data
- Not remote back-up for state and local government
network data
14The Digital Archives will
- Preserve electronic records with long-term legal,
historical and/or fiscal significance
- Assure platform-neutral retrieval 50, 100, or
more years from now
- Provide security back-up of certain permanent
electronic legal records (courts, vital records,
land records, etc.)
15Eight Requirements for Preservation
- Readable
- Retrievable
- Intelligible
- Encapsulated
- Reconstructible
- Identifiable
- Understandable
- Authentic
- From Authentic Electronic Records by Charles
Dollar
[Diagram labels: Hardware, File Format, Content Management]
16Hardware
- File Room of the 21st century
- Capacity and Speed double every 18 months
- Many choices
- Tape
- Optical
- Hard Drives
- First Immutable Law of Digital Archiving
- Whatever hardware you use today will be obsolete
within four years
17Archival Software Formats
- Native
- ASCII
- TIF
- PDF/A
- XML
- Whenever possible, seek the open standard solution!
- Remember WordStar and DBase II ???
18Content Management
- Essential to maintain control of the information
explosion
- Allows hard coded rules and information exchange
- BUT still requires a strong knowledge,
understanding and implementation of basic records
management
- Second Immutable Law of Digital Archiving
- Data is Data, a Record is a Record; it is the
content that drives retention, not the media
19The Digital Archives Experience
20Standards Driven
- Open Archival Information System (OAIS): ISO standard
for electronic records archiving
- DOD 5015.2: U.S. Department of Defense standard for
Records Management Applications
- InterPARES: international effort to define
requirements for e-archiving
21Protection from Obsolescence
- Digital Archives Multi-pronged approach
- Stored as BLOBs in DB with metadata
- Maintain native format
- Create open file format version
- Render XML formatted version, wrapped
- Acquire original hardware and software
22Ingestion Process
- MUST be flexible
- No Mandate and 3300 agencies
- Microsoft BizTalk 2004
- Transforms, adds metadata based on business rules
- Creates deep storage copy wrapping original
file in XML, with Hash
- Creates web version of original file
23Data Ingestion
- How we use it
- Design XML/Flat-file schemas for all incoming
data
- Use Maps to convert from external formats to
internal formats
- Build Orchestrations to move the data from the
data files to the database
- Image conversion
- Generate Deep Storage XML file
24Predefined Pipelines
[Diagram: field-name variants fname, firstname, First_Name, Fst_name, first;
date variants Jun-07-05, 07-Jun-05, 06/07/2005, 06/07/05, 06/07/2005]
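The field names and dates listed above suggest that a pipeline maps many incoming spellings onto one internal form. A minimal Python sketch of that idea follows; it is not the BizTalk configuration itself. The alias table, the internal name `first_name`, the MM/DD/YYYY target, and the month-first reading of numeric dates are all assumptions made for illustration.

```python
from datetime import datetime

# Hypothetical alias table: incoming column names mapped to one internal name.
FIELD_ALIASES = {
    "fname": "first_name",
    "firstname": "first_name",
    "first_name": "first_name",
    "fst_name": "first_name",
    "first": "first_name",
}

# Date layouts taken from the slide, tried in order.
# Assumption: purely numeric dates are month/day/year.
DATE_FORMATS = ("%b-%d-%y", "%d-%b-%y", "%m/%d/%Y", "%m/%d/%y")


def normalize_field(name: str) -> str:
    """Return the internal field name for an incoming column name."""
    return FIELD_ALIASES.get(name.strip().lower(), name)


def normalize_date(text: str) -> str:
    """Parse any known layout and re-emit it as MM/DD/YYYY."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).strftime("%m/%d/%Y")
        except ValueError:
            continue  # layout did not match; try the next one
    raise ValueError(f"unrecognized date layout: {text!r}")
```

Unknown field names pass through unchanged and unknown date layouts raise an error, so unexpected input from a new agency is surfaced rather than silently guessed at.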
25Deep Storage XML Schema
- Record Common
- Who
- What
- When
- Where
- Original File
- web file
- Security
- Fixity
- Birth
- Date of
- Father, Mother
- Hospital
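As an illustration only, a deep storage record along these lines could be assembled as below. The element names (`DeepStorageRecord`, `RecordCommon`, `OriginalFile`, `Fixity`) and the use of base64 for the file content are assumptions, not the Archives' actual schema. The web file, security, and record-specific sections (such as Birth) are left out for brevity.

```python
import base64
import hashlib
import xml.etree.ElementTree as ET


def build_deep_storage_record(original: bytes, filename: str, common: dict) -> str:
    """Wrap an original file and its common metadata in one XML document."""
    root = ET.Element("DeepStorageRecord")  # hypothetical element names throughout

    # "Record Common": the who/what/when/where shared by every record type.
    common_el = ET.SubElement(root, "RecordCommon")
    for key in ("Who", "What", "When", "Where"):
        ET.SubElement(common_el, key).text = common.get(key, "")

    # Original file, base64-encoded so binary content survives inside XML.
    file_el = ET.SubElement(root, "OriginalFile", name=filename)
    file_el.text = base64.b64encode(original).decode("ascii")

    # Fixity: digest of the original bytes, stored next to them.
    fixity_el = ET.SubElement(root, "Fixity", algorithm="MD5")
    fixity_el.text = hashlib.md5(original).hexdigest()

    return ET.tostring(root, encoding="unicode")
```

Because the digest travels inside the same wrapper as the file, a later reader can decode the content and check it without consulting the database.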
26Data Security
- Encrypted SSH FTP transmission
- Issue Digital Certificate
- Verify IP and computer information
- MD5 Hash on all original files
- Copy of FTP on tape prior to ingestion
- DB backups on tape
- Record Level Security for confidential Info
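The MD5 step can be sketched as follows: hash each original file on receipt, store the value, and recompute it later to confirm nothing has changed. The function names are hypothetical, and the sketch assumes the files are reachable on a local path.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large transfers need not fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path: Path, recorded_md5: str) -> bool:
    """True when the file on disk still matches the hash recorded at transfer."""
    return md5_of_file(path) == recorded_md5.lower()
```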
27Record Level Security
- Restrict records at item, field or series level
- Restrict to individual, dept, office or global
- Uses authenticated login to reveal fields
- Anonymous users see "Restricted"
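A rough sketch of the field-level case, where a login reveals fields that anonymous users see only as a placeholder. The `Record` class, the group names, and the rule that unlisted fields are public are assumptions for illustration. Item-level and series-level restriction are not shown.

```python
from dataclasses import dataclass, field

RESTRICTED = "Restricted"  # placeholder shown in place of a hidden value


@dataclass
class Record:
    fields: dict
    # Field name -> set of groups allowed to see it; unlisted fields are public.
    restricted_fields: dict = field(default_factory=dict)


def view_record(record: Record, user_groups: frozenset = frozenset()) -> dict:
    """Return the record as one user may see it, masking restricted fields."""
    visible = {}
    for name, value in record.fields.items():
        allowed = record.restricted_fields.get(name)
        if allowed is None or allowed & user_groups:
            visible[name] = value
        else:
            visible[name] = RESTRICTED
    return visible
```

An anonymous user is modelled as an empty group set, so no separate code path is needed for the unauthenticated case.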
28Digital Archives New Projects
29Capturing the Web
- Web pages are how we do business
- Universally accessible to public, 24x7
- Information repository
- Captures history, business of agency
- Important to archive news, forms
30Web Archiving
- Custom Built Solution
- Multiple streams, Assist with Archiving
- Stores all web content in database, full text
searchable
- Allows predefining of internal fragments, levels,
maximum file size, secure authentication
- Web Services allow use of current architecture
for retrieval
- Cannot capture deep web content
31Email Archiving
- Permanent, executive level correspondence
- Sent as .pst, .msg
- Store ALL email, even the junk
- Transfer from proprietary into database
- Full text search
- Attachments stored separately, migratable
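One way to picture the split between searchable text and separately stored attachments is sketched below. It assumes each message has already been exported from .pst or .msg into standard MIME form, since Python's standard library does not read those proprietary formats. The function name and the shape of the returned record are hypothetical.

```python
from email import message_from_bytes, policy


def split_message(raw: bytes) -> tuple[dict, list[tuple[str, bytes]]]:
    """Separate a message's searchable text from its attachments."""
    msg = message_from_bytes(raw, policy=policy.default)
    body_part = msg.get_body(preferencelist=("plain",))
    record = {
        "from": str(msg["From"]),
        "subject": str(msg["Subject"]),
        "body": body_part.get_content() if body_part else "",
    }
    # Each attachment is kept as (filename, raw bytes) so it can be stored,
    # and later migrated, independently of the message text.
    attachments = [
        (part.get_filename() or "unnamed", part.get_payload(decode=True))
        for part in msg.iter_attachments()
    ]
    return record, attachments
```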
32Maps and Photos
- Stores oversized maps and high resolution photos
- Converts images to compressed format for viewing
over the web
- Provides thumbnails for searching
- Uses LoC metadata indexing standards
- Search on title, description
- E-commerce to order photo-reproductions
33Third Immutable Law
- Anything that you do today will need a major
overhaul in two years
- Technology and industry changing at unprecedented
rates
- But more records are lost every day!
- Key is to be flexible and attack with forethought
34Digital Archives @ Eastern Washington
University, Cheney, Washington
- Questions?
- Adam Jansen
- Deputy State Archivist
- ajansen@secstate.wa.gov