Digital Records and Digital Archives: Preservation in Theory and Practice

1
Digital Records and Digital Archives: Preservation
in Theory and Practice
  • Richard Davis
  • Data and applications specialist
  • ULCC
  • http://ndad.ulcc.ac.uk/

2
What we will cover
  • Physical forms
  • Logical forms
  • Physical preservation
  • Migration and refreshing
  • Preservation metadata
  • Organisational issues

3
Why do they deserve attention?
  • Digital records require an intermediary
  • They don't have a fixed form
  • Their carriers are perishable
  • They fall outside record management regimes
  • Computers need experts
  • Mutual lack of understanding

4
What are the advantages?
  • Easy to copy
  • Easy to reuse
  • No worry about which is original
  • Take up less space
  • Easy to search, even without a catalogue

5
Assumptions
  • You know what digital records exist
  • You know what you want to preserve
  • You have a retention/disposal policy
  • You can separate material for preservation
  • You know what you want to do with it

6
The basic tasks
  • Protecting the media
  • Copying to new media
  • Choosing a file format
  • Migrating to new file formats
  • Creating/preserving/migrating metadata

7
What are we trying to achieve?
  • Legal protection
  • Creating historical collections
  • Enabling use/reuse
  • Informing the corporate memory
  • Marketing

8
How might we achieve it?
  • Preserve the bits
  • Preserve the data
  • Preserve the record
  • Preserve the experience

9
Preserving the bits
  • Keep the data in exactly the same format
  • Interpretation is left as a problem for others
  • Avoid risk
  • Works in some contexts
  • Still need preservation metadata and refreshing
  • Useful as an adjunct

10
Preserving the data
  • Keep the information, don't worry about
    presentation and context
  • Better than nothing
  • Often used for databases
  • Reduces long-term utility

11
Preserving the record
  • Keep the information and context
  • The ideal approach
  • Don't necessarily preserve appearance
  • Balances utility against costs

12
Preserving the experience
  • Keep everything - software, information, etc
  • May require emulators or old computers
  • A museum-like approach
  • Expensive and doesn't help re-use
  • Someone ought to do it, but not us

13
What physical form do they take?
  • Floppy disks
  • Open-reel tapes
  • Tape cartridges
  • Hard disks
  • CD-ROM or CD-R
  • Punched cards, paper tape
  • ZIP, JAZ, etc. disks

14
Media Lifetimes

15
Refreshing media
  • The process of copying to new media
  • Maybe the same, maybe different
  • Either at end of predicted lifetime, or after
    a detected failure
  • Check all copies once produced
  • Lifetime may be number of uses, not years
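
The "check all copies once produced" step can be sketched as a copy followed by a read-back comparison. This is a minimal illustration, not a procedure from the slides: the function names `sha256_of` and `refresh_copy` are hypothetical, and SHA-256 is an assumed choice of digest.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh_copy(source: Path, destination: Path) -> str:
    """Copy source to destination, then re-read the copy to confirm it matches.

    Hypothetical helper: raises OSError if the copy does not read back
    identically, otherwise returns the digest for the preservation record.
    """
    expected = sha256_of(source)
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, destination)
    actual = sha256_of(destination)
    if actual != expected:
        raise OSError(f"copy of {source} failed verification")
    return expected
```

The digest is computed from the new copy by reading it back from the destination, so a write that silently failed is caught at refresh time rather than years later.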

16
Logical Preservation
  • Selecting the right file format
  • At creation time or accession time
  • No universal solution
  • Preservation format not necessarily access format
  • Ideally will combine metadata

17
Properties of preservation formats
  • A public definition
  • Stability
  • Good conversion from ingest formats
  • Good conversion to access formats
  • Adequate representation of structure of
    information
  • NOTE: not all data types have formats which
    satisfy these properties

18
What forms do they take?
  • Documents - digital paper (including email)
  • Spreadsheets
  • Databases
  • Digital audio/video/images
  • Exotic forms: virtual worlds, games, etc.
  • Programs
  • Assemblies of the above: web sites, etc.

19
Characteristics
  • Two forms: dynamic and static
  • Static records are created once and not altered
  • Dynamic records continually change
  • Most common dynamic records are databases
  • Static records must be captured to prevent change

20
Capturing the record
  • Two basic approaches: automatic or manual
  • Automatic: system forces capture of record copy
  • Manual: users must choose what is retained
  • Each has advantages and disadvantages
  • Collecting archive is manual by definition

21
Automated approaches
  • Email: central server captures and indexes
  • Documents: EDMS/force addition of metadata
  • Databases: capture transaction logs or snapshots
  • Web sites: as databases
  • Custom applications: design it in
  • Procedures are more important than fancy software

22
Short-term storage
  • If information is only needed for 5 years, it may be
    kept in original format
  • Check that systems or software have not changed
      • Periodically try to access older information
  • Suitable if the only use of records is within the
    organisation

23
Long-term storage
  • Over 5 years, must convert to a standard format
  • Alternatively, use a standard format to create
  • Documents: PDF, plain text, XML
  • Databases/spreadsheets: CSV and SQL schema
  • Pictures: TIFF
  • Sound: PCM, AIFF
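
As an illustration of the "CSV and SQL schema" option for databases, the sketch below exports each table to a CSV file and writes the table definitions to a separate schema file. It assumes the source database is SQLite, which the slides do not specify; `export_database` and the output file names are hypothetical.

```python
import csv
import sqlite3
from pathlib import Path


def export_database(db_path: Path, out_dir: Path) -> list[str]:
    """Write schema.sql plus one CSV file per table; return the table names."""
    out_dir.mkdir(parents=True, exist_ok=True)
    connection = sqlite3.connect(db_path)
    try:
        # sqlite_master holds the original CREATE TABLE text for each table.
        tables = connection.execute(
            "SELECT name, sql FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        ).fetchall()
        schema = "\n".join(f"{sql};" for _, sql in tables)
        (out_dir / "schema.sql").write_text(schema + "\n", encoding="utf-8")
        for name, _ in tables:
            cursor = connection.execute(f'SELECT * FROM "{name}"')
            header = [column[0] for column in cursor.description]
            csv_path = out_dir / f"{name}.csv"
            with csv_path.open("w", newline="", encoding="utf-8") as handle:
                writer = csv.writer(handle)
                writer.writerow(header)   # column names first
                writer.writerows(cursor)  # then every row of the table
    finally:
        connection.close()
    return [name for name, _ in tables]
```

Keeping the schema alongside the CSV files matters because CSV alone records values but not column types, keys or relationships between tables.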

24
Migration
  • Frequency is not predictable
  • However, assume every 10 years on average
  • External factors are the usual influence
  • May be done in order to keep all files in one
    format
  • Should also be automated
  • Devise ways to check migration does not lose
    information
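
One possible way to check that a migration does not lose information is to read the records back from both the old and the new format and compare them field by field. The sketch below uses a tab-separated to CSV migration purely as a stand-in; both function names are hypothetical.

```python
import csv
import io


def migrate_tsv_to_csv(tsv_text: str) -> str:
    """Convert tab-separated text to CSV text, quoting fields where needed."""
    source = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    output = io.StringIO()
    csv.writer(output, lineterminator="\n").writerows(source)
    return output.getvalue()


def migration_is_lossless(tsv_text: str, csv_text: str) -> bool:
    """Parse both versions and compare the resulting records field by field."""
    before = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    after = list(csv.reader(io.StringIO(csv_text)))
    return before == after
```

The comparison is made on the parsed records rather than the raw bytes, since a format migration is expected to change the bytes. A naive conversion that simply replaces tabs with commas would fail this check as soon as a field itself contains a comma.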

25
Compression
  • Beware
  • Only use lossless compression
  • Example: TIFF G4 for black-and-white images
  • Other types lose information
  • Each migration causes more to be lost
  • Space is cheap; information is not
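
The difference between lossless and lossy compression can be illustrated with a toy example. Here zlib (DEFLATE) stands in for a lossless method, and a crude rounding of 8-bit samples stands in for a lossy codec; neither is taken from the slides, and `lossy_quantise` is a hypothetical illustration rather than a real codec.

```python
import zlib


def lossless_round_trip(data: bytes) -> bytes:
    """Compress and decompress with zlib; the result equals the input."""
    return zlib.decompress(zlib.compress(data, 9))


def lossy_quantise(samples: bytes, step: int) -> bytes:
    """Toy lossy step: round each 8-bit sample down to a multiple of step."""
    return bytes((value // step) * step for value in samples)
```

Applying `lossy_quantise` with one step size and then again with a different one (as might happen across two migrations between lossy formats) leaves fewer distinct sample values each time, and no later step can restore the originals. The lossless round trip can be repeated indefinitely without any change to the data.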

26
Metadata
  • Data about data
  • It isn't specific to digital records
  • Deals with:
      • resource discovery: Is there a document about
        X?
      • resource description: What is this document?
  • May be embedded (TIFF) or external (catalogues)
  • Most records contain some embedded data

27
Metadata (2)

28
Metadata examples
  • Author
  • Sensitivity/access conditions
  • Retention period
  • Subject
  • Date of creation/use/retirement
  • Keywords
  • Abstract
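
For metadata held externally rather than embedded, one simple arrangement is a sidecar file stored next to the record. The sketch below writes the fields listed above as JSON; the `.metadata.json` suffix, the key names and the function name are all assumptions made for illustration.

```python
import json
from pathlib import Path


def write_sidecar(record_path: Path, metadata: dict) -> Path:
    """Write metadata as JSON beside the record, e.g. report.pdf.metadata.json."""
    sidecar = record_path.with_name(record_path.name + ".metadata.json")
    sidecar.write_text(
        json.dumps(metadata, indent=2, sort_keys=True), encoding="utf-8"
    )
    return sidecar


# Hypothetical example values; the keys mirror the slide's list.
example_metadata = {
    "author": "A. N. Author",
    "access_conditions": "open",
    "retention_period_years": 10,
    "subject": "Committee minutes",
    "date_created": "1998-03-01",
    "keywords": ["minutes", "committee"],
    "abstract": "Minutes of the March 1998 committee meeting.",
}
```

A plain-text format such as JSON keeps the metadata readable without special software, in keeping with the properties wanted of a preservation format.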

29
Non-digital metadata
  • Most computer systems need paper to be understood
  • Manuals, specifications, reports
  • Some essential information may only be in
    people's heads
  • Most important when dealing with older records

30
Non-digital metadata (2)

30
Preservation metadata
  • Time to next refresh
  • Time of last integrity check
  • Checksum
  • Date of last migration
  • File format
  • Number of copies
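
The items above could be held as one small record per file. This is only a sketch: the class name, field names and the idea of deriving "time to next refresh" from a last-refresh date plus an interval are assumptions, not a schema from the slides.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class PreservationRecord:
    """Hypothetical per-file preservation metadata."""

    file_format: str
    checksum: str
    copies: int
    last_refresh: date
    refresh_interval_days: int
    last_integrity_check: date
    last_migration: Optional[date] = None  # None if never migrated

    def next_refresh(self) -> date:
        """Date on which the media should next be refreshed."""
        return self.last_refresh + timedelta(days=self.refresh_interval_days)

    def refresh_due(self, today: date) -> bool:
        """True once the refresh date has been reached."""
        return today >= self.next_refresh()
```

Storing the interval rather than a fixed due date means the schedule can be revised (for example, if a medium turns out to be shorter-lived than predicted) without touching each record's history.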

32
Preservation tools
  • Create multiple copies - at least two
  • Check you can read the copies!
  • For extra security, use different software
  • Store copies in different places
  • Create checksums of files (MD5)
  • Re-read every six months or yearly
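
The checksum and re-read steps can be combined into a periodic audit: build a manifest of MD5 digests once, then re-read every file against it on each check. The sketch below is illustrative and the function names are hypothetical. MD5 is used because the slide names it; it detects accidental corruption but is not a defence against deliberate tampering.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path) -> str:
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its MD5 digest."""
    return {
        path.relative_to(root).as_posix(): md5_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def audit(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return relative paths that are missing or whose digest has changed."""
    problems = []
    for relative, expected in manifest.items():
        candidate = root / relative
        if not candidate.is_file() or md5_of(candidate) != expected:
            problems.append(relative)
    return problems
```

Because the manifest uses relative paths, the same manifest can be used to audit every copy, wherever it is stored. The manifest itself is best kept apart from the records it describes, so that a single failure cannot damage both.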

33
Preservation and Access
  • Preservation systems
      • Keep information safe and secure
      • Control accessibility
      • Deliver data without interpretation
  • Access systems
      • Mediate between user needs and preservation
        system
      • Format, select and present information
      • Guide users as to what is possible
      • Relate information to context

34
Hints and tips
  • Beware of ...
      • Automated dates in documents
      • Dynamic documents, e.g. spreadsheets, OLE
      • Password-protected files
      • Hybrid assemblies and embedded objects
  • Store checksums separately from records
  • Databases may have many different views
35
Hybrid systems and embedded objects

36
Variant views
  • Some systems present many forms of the same data
    to different users
  • Example: railway timetables
  • No one person who uses the record may understand
    all of it
  • IT staff should know WHAT exists but may not know
    WHY or WHO FOR

37
Preservation by emulation
  • Emulators reproduce a whole system, or a program
  • Emulators created once, used by many
  • System emulators require us to use old interfaces
  • Program emulators may use current interfaces
  • Technology works, but does it give what we want?
  • Only solution at present for some systems

38
What is being preserved ?
  • Existence may be more important than content
  • Don't worry about emulating original views
  • Do worry about describing original views and
    constraints
  • Preserving old interfaces means denying access
  • Just because questions can be answered now does
    not mean they were always answerable

39
Getting to know your IT people
  • Style of IT support depends on size/age/type of
    organisation
  • Central control is easier to work with
  • Understand issues, but don't try to be an expert
  • They like simple, reusable formats as well
  • Try to be involved before records are created

40
And finally.
  • For now - preserve the original bits as well as the
    standard formats
  • Don't wait for all the answers before you begin
  • What you need now are the questions
  • PLAN in advance; TEST what you do
  • Remember that digital isn't that different