Getting - PowerPoint PPT Presentation

About This Presentation
Title:

Getting

Description:

at the Digital Archives. Adam Jansen, Deputy State Archivist, Washington State Archives. Digital Archives @ Eastern Washington University, Cheney, Washington.

Provided by: adamj4

Transcript and Presenter's Notes

1
Getting Digi with It at the Digital Archives
  • Adam Jansen
  • Deputy State Archivist
  • Washington State Archives

2
What is Archiving in the Electronic Age?
  • Protecting machine readable records of enduring
    legal, historical or fiscal value from loss,
    alteration, deterioration and technological
    obsolescence in an environment independent from
    that which produced the record.

3
Mission of the Digital Archives
  • Collect electronic records of enduring legal,
    historical or fiscal value
  • Maintain these records in perpetuity in a useable
    state for the good of the public
  • Make disclosable records accessible to
    the public

4
Public Records
  • As defined in RCW 40.14
  • ANY records that have been made by or received
    by any agency of the state of Washington in
    connection with the transaction of public
    business

5
Redefining Public
  • Avg over 650 researchers per day
  • Avg length of stay over 6 minutes
  • 6 .gov - 4 .edu - 1 .org
  • 13 came from Internet Search (Google, MSN,
    Yahoo)
  • Researchers from 131 foreign countries
  • Researchers from
  • Canada, US Military, Romania, Germany, France,
    Australia, Japan, UK, Netherlands, Russia,
    Thailand, Portugal, Belgium, Poland, Italy,
    Indonesia, Singapore, Sweden, Mexico, New
    Zealand, Czech Republic, Hungary, Brazil, Norway,
    Colombia, Austria, Greece, Bulgaria, China,
    Yugoslavia, Philippines, Spain, South Korea,
    Denmark, Oman, Pakistan, South Africa, Jamaica,
    Switzerland

6
Records and Information (or, Why we do what we do)
  • If - Information is power
  • And - Records are storage of information
  • Then - Records must be preserved for future
    generations
  • Why?
  • The foundation of democracy in America is
    government accountability to the people

7
What are the challenges (or why is it so hard!?!?!?)
  • Socio-political
  • Resistance to change
  • Inability to keep pace
  • Technology
  • Ever upwards and onwards
  • Little thought on looking back

8
(No Transcript)
9
New Federal Mandates to Manage Certain Electronic Records
  • As electronic records become more integrated into
    society, producers of those records will be held
    to higher standards of conduct
  • Health Insurance Portability and Accountability Act
    of 1996 (HIPAA)
  • Gramm-Leach-Bliley Act of 1999
  • Patriot Act of 2001
  • Sarbanes-Oxley Act of 2002
  • Help America Vote Act of 2002 (HAVA)
  • More mandates to come
  • Records must be managed and destroyed
    methodically in normal course of business

10
Shifting Media
  • Historically records were stored on paper, kept
    in filing cabinets
  • When the cabinet was full, records sent to file
    room
  • Now records stored electronically on computers
  • When the computer is full, add more hard
    drives
  • Basic skills to manage and maintain records have
    been lost, replaced by infinite storage

11
  • So the question becomes who takes care of the
    records, and do they have the knowledge?

12
Why a Digital Archives?
  • Comply with statutory and regulatory mandates.
  • The law requires preservation of certain public
    records; it doesn't specify whether those
    records are paper or electronic. All records
    must be given the same care.
  • Avoid loss of legal and historical records
  • As technology changes, the older media (5¼-inch
    floppy disks, for instance) become harder to
    read.
  • Preserve rare and at-risk paper records
  • Centralize Records
  • Centralization means uniformity in maintenance
  • Trained professionals serve as caretakers
  • Improved access for citizens
  • By centralizing historical electronic records in
    one location, one-stop shopping will provide
    the information more quickly and easily

13
What the Digital Archives is not
  • Not mass storage for active business application
    data
  • Not remote back-up for state and local government
    network data

14
The Digital Archives will
  • Preserve electronic records with long-term legal,
    historical and/or fiscal significance
  • Assure platform-neutral retrieval 50, 100, or
    more years from now
  • Provide security back-up of certain permanent
    electronic legal records (courts, vital records,
    land records, etc.)

15
8 Requirements for Preservation
  • Readable
  • Retrievable
  • Intelligible
  • Encapsulated
  • Reconstructible
  • Identifiable
  • Understandable
  • Authentic
  • From Authentic Electronic Records by Charles
    Dollar

(Slide graphic labels: Hardware, File Format, Content Management)

16
Hardware
  • File Room of the 21st century
  • Capacity and Speed double every 18 months
  • Many choices
  • Tape
  • Optical
  • Hard Drives
  • First Immutable Law of Digital Archiving
  • What hardware you use today will be obsolete
    within four years

17
Archival Software Formats
  • Native
  • ASCII
  • TIF
  • PDF/A
  • XML
  • Whenever possible, seek the open standard solution!
  • Remember WordStar and dBase II?

18
Content Management
  • Essential to maintain control of the information
    explosion
  • Allows hard coded rules and information exchange
  • BUT still requires a strong knowledge,
    understanding and implementation of basic records
    management
  • Second Immutable Law of Digital Archiving
  • Data is data, a record is a record; it is the
    content that drives retention, not the media

19
The Digital Archives Experience
20
Standards Driven
  • Open Archival Information System (OAIS): ISO standard
    for electronic records archiving
  • DoD 5015.2: U.S. Department of Defense standard for
    records management applications
  • InterPARES: international effort to define
    requirements for e-archiving

21
Protection from Obsolescence
  • Digital Archives' multi-pronged approach
  • Stored as BLOBs in DB with metadata
  • Maintain native format
  • Create open file format version
  • Render XML formatted version, wrapped
  • Acquire original hardware and software

22
Ingestion Process
  • MUST be flexible
  • No Mandate and 3300 agencies
  • Microsoft BizTalk Server 2004
  • Transforms, adds metadata based on business rules
  • Creates deep storage copy, wrapping the original
    file in XML with a hash
  • Creates web version of original file

23
Data Ingestion
  • How we use it
  • Design XML/Flat-file schemas for all incoming
    data
  • Use Maps to convert from external formats to
    internal formats
  • Build Orchestrations to move the data from the
    data files to the database
  • Image conversion
  • Generate Deep Storage XML file

24
Predefined Pipelines
  • Field names fname, firstname, First_Name, Fst_name
    map to: first
  • Dates Jun-07-05, 07-Jun-05, 06/07/2005, 06/07/05
    map to: 06/07/2005
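
The idea on this slide, several incoming spellings of a field name or a date collapsing to one internal form, can be sketched in Python as follows. This is a minimal illustration rather than the BizTalk maps themselves: the alias table, the list of accepted date formats, and the function names are assumptions, and the sketch assumes month/day/year ordering and English month abbreviations.

```python
from datetime import datetime

# Hypothetical alias table: every known external spelling maps to one internal name.
FIELD_ALIASES = {
    "fname": "first",
    "firstname": "first",
    "first_name": "first",
    "fst_name": "first",
    "first": "first",
}

# Accepted input layouts, tried in order. Two-digit years are ambiguous
# (strptime reads "05" as 2005), so real historical records would need a rule.
DATE_FORMATS = ("%b-%d-%y", "%d-%b-%y", "%m/%d/%Y", "%m/%d/%y")


def normalize_field(name: str) -> str:
    """Map an external field name to the internal one, ignoring case."""
    key = name.strip().lower()
    if key not in FIELD_ALIASES:
        raise KeyError(f"no mapping defined for field {name!r}")
    return FIELD_ALIASES[key]


def normalize_date(text: str) -> str:
    """Convert any accepted date layout to MM/DD/YYYY."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).strftime("%m/%d/%Y")
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")
```

Unknown field names and dates raise an error instead of passing through, so a new agency format shows up as a failed ingest rather than as silently inconsistent data.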
25
Deep Storage XML Schema
  • Record Common
  • Who
  • What
  • When
  • Where
  • Original File
  • web file
  • Security
  • Fixity
  • Vital Records
  • Type
  • Birth
  • Date of
  • Father, Mother
  • Hospital

26
Data Security
  • Encrypted SSH FTP transmission
  • Issue Digital Certificate
  • Verify IP and computer information
  • MD5 Hash on all original files
  • Copy of FTP on tape prior to ingestion
  • DB backups on tape
  • Record Level Security for confidential Info

27
Record Level Security
  • Restrict records at item, field or series level
  • Restrict to individual, dept, office or global
  • Uses authenticated login to reveal fields
  • Anonymous users see "Restricted"

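A field-level version of this behavior can be sketched in Python as below. The rule layout (a per-field set of permitted individuals and departments), the user dictionary, and the function names are assumptions for illustration; the slide does not describe how the real system stores its restrictions.

```python
from typing import Optional

RESTRICTED = "Restricted"


def can_view(rule: Optional[dict], user: Optional[dict]) -> bool:
    """Decide whether a user may see a field governed by a rule.

    rule is None  -> field is unrestricted (global access).
    user is None  -> anonymous visitor, never sees restricted fields.
    """
    if rule is None:
        return True
    if user is None:
        return False
    return (
        user.get("name") in rule.get("individuals", ())
        or user.get("dept") in rule.get("depts", ())
    )


def render_record(record: dict, field_rules: dict, user: Optional[dict] = None) -> dict:
    """Return a copy of the record with hidden fields replaced by 'Restricted'."""
    return {
        name: value if can_view(field_rules.get(name), user) else RESTRICTED
        for name, value in record.items()
    }
```

For example, with a rule restricting a mother's maiden name to one department, an anonymous researcher would see the word Restricted in that field while an authenticated clerk from that department would see the value.
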
28
Digital Archives New Projects
29
Capturing the Web
  • Web pages are how we do business
  • Universally accessible to public, 24x7
  • Information repository
  • Captures history, business of agency
  • Important to archive news, forms

30
Web Archiving
  • Custom Built Solution
  • Multiple streams, Assist with Archiving
  • Stores all web content in database, full text
    searchable
  • Allows predefining of internal fragments, levels,
    maximum file size, secure authentication
  • Web Services allows use of current architecture
    for retrieval
  • Cannot capture deep web content

31
Email Archiving
  • Permanent, executive level correspondence
  • Sent as .pst, .msg
  • Store ALL email, even the junk
  • Transfer from proprietary into database
  • Full text search
  • Attachments stored separately, migratable

32
Maps and Photos
  • Stores oversized maps and high resolution photos
  • Converts images to compressed format for viewing
    over the web
  • Provides thumbnails for searching
  • Uses LoC metadata indexing standards
  • Search on title, description
  • E-commerce to order photo-reproductions

33
Third Immutable Law
  • Anything you do today will need a major
    overhaul in two years
  • Technology and industry are changing at unprecedented
    rates, but more records are lost every day!
  • Key is to be flexible and attack with forethought

34
Digital Archives @ Eastern Washington University, Cheney, Washington
Questions?
Adam Jansen, Deputy State Archivist
ajansen@secstate.wa.gov