Working with metadata in digital archives


Slides: 18
Provided by: robw55
Learn more at: https://www.erpanet.org

Transcript and Presenter's Notes

Title: Working with metadata in digital archives


1
Working with metadata in digital archives
  • Erpanet
  • Metadata in Digital Preservation
  • Marburg, 3-5 September 2003
  • Bill Roberts
  • bill.roberts_at_tessella.com
  • Tessella Support Services plc, 3 Vineyard
    Chambers, Abingdon OX14 3PX
  • United Kingdom
  • www.tessella.com

2
Metadata functions
  • Edit
  • Import
  • Search
  • Collect
  • View
  • Store
  • Export

3
Collect metadata (1)
  • Some must be manual: assist user, prevent
    mistakes
  • Avoid duplication: record hierarchies
  • automation in user environment (business process,
    workflow etc.)
  • automatic analysis of file properties
  • processing history (virus checking results etc.)
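
The "automatic analysis of file properties" bullet can be illustrated with a minimal Python sketch. It assumes the properties of interest are size, modification time, a checksum and an extension-based type guess; the function name and dictionary keys are illustrative, not taken from any system described in the slides.

```python
import hashlib
import mimetypes
from datetime import datetime, timezone
from pathlib import Path


def collect_file_metadata(path):
    """Gather technical metadata that needs no manual input (illustrative keys)."""
    p = Path(path)
    stat = p.stat()
    modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
    # guess_type looks only at the file extension, not at the content
    mime_type, _ = mimetypes.guess_type(p.name)
    return {
        "file_name": p.name,
        "size_bytes": stat.st_size,
        "modified_utc": modified.isoformat(),
        "sha256": hashlib.sha256(p.read_bytes()).hexdigest(),
        "mime_type_guess": mime_type,
    }
```

Values gathered this way leave only the descriptive fields for the user to enter by hand.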

4
Collect metadata (2)
  • UK National Archives Digital Archive: Stellent
    OutsideIn
  • analyses file to determine type
  • could also form part of approach to extract
    metadata from content
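
One common way to determine a file's type from its content is to compare its leading bytes with known signatures. The Python sketch below shows that general idea only; it is not how Stellent OutsideIn works internally, and the signature table is a small illustrative sample.

```python
# Leading-byte signatures for a few well-known formats (illustrative subset).
SIGNATURES = [
    (b"%PDF-", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"PK\x03\x04", "ZIP container"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file"),
]


def identify_format(data: bytes) -> str:
    """Return a label for the first signature that matches the start of data."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"
```

For example, identify_format(b"%PDF-1.4") returns "PDF document", regardless of the file's extension.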

5
Collect metadata (3)
  • Pfizer Central Electronic Archive
  • Small metadata set
  • Automatic collection of metadata
  • Software agents on user servers
  • Possible to do more
  • Improve ease of use
  • Improve accuracy
  • Pfizer aiming to simplify provenance metadata

6
Import metadata (1)
  • Transfer format: XML
  • link metadata to files during transfer
  • virus checking, file format analysis etc.
  • Maintain loose coupling between components of
    system: agreed interfaces
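
As a rough sketch of linking metadata to files during transfer, the Python below reads a hypothetical XML manifest and checks each listed file against a checksum carried in the metadata. The element and attribute names (transfer, record, file, name, sha256) are assumptions made for illustration, not a format described in the slides.

```python
import hashlib
import xml.etree.ElementTree as ET


def check_transfer(manifest_xml: str, payload: dict) -> list:
    """Return (record id, file name, ok) for every file the manifest lists.

    payload maps file names to the bytes actually received.
    """
    results = []
    root = ET.fromstring(manifest_xml)
    for record in root.iter("record"):
        for file_el in record.iter("file"):
            name = file_el.get("name")
            expected = file_el.get("sha256")
            data = payload.get(name)
            # A missing file yields None, which never equals the expected digest
            actual = hashlib.sha256(data).hexdigest() if data is not None else None
            results.append((record.get("id"), name, actual == expected))
    return results
```

Because the check relies only on the manifest format, the sending and receiving components need to agree on that interface and nothing else.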

7
Import metadata (2)
  • Efficiency: large transfers
  • XML can be expensive to process
  • speed
  • memory: DOM can be 20 times larger than XML file
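
One way to avoid building a full in-memory tree for a large transfer is to stream the XML and discard each record once it has been handled. The sketch below uses iterparse from Python's standard library; the record and title element names are hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def iter_record_titles(xml_stream):
    """Yield (id, title) per <record> without keeping whole subtrees in memory."""
    for _event, elem in ET.iterparse(xml_stream, events=("end",)):
        if elem.tag == "record":
            yield elem.get("id"), elem.findtext("title")
            # Drop the record's children and attributes now that it is processed;
            # the emptied element itself remains attached to its parent.
            elem.clear()


sample = io.BytesIO(
    b"<transfer>"
    b"<record id='R1'><title>A</title></record>"
    b"<record id='R2'><title>B</title></record>"
    b"</transfer>"
)
titles = list(iter_record_titles(sample))
```

Memory use then depends mainly on the size of the largest single record rather than on the size of the whole transfer file.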

8
Storage - requirements
  • don't lose it!
  • maintain links between metadata, records and
    files
  • find what you are looking for
  • retrieve

9
Storage approaches
  • encapsulation vs. ease of access
  • volume of data
  • speed of searching vs. speed of import/export
  • typically metadata in database and files on file
    server

10
The National Archives (UK) Digital Archive approach
  • Relational database for metadata, file server for
    computer files
  • Metadata stored as XML documents in database
  • A few key elements stored in tables and indexed
    (unique identifier, PROCAT reference)
  • Links between records, files, accessions,
    metadata managed in database
  • Subset of metadata identified as searchable:
    values extracted into text-based index
  • File contents not currently searchable
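
The storage pattern on this slide can be sketched with SQLite standing in for the relational database. All table, column and element names are hypothetical (catalogue_ref stands in for the PROCAT reference), and a plain table stands in for the text-based index, whose actual technology the slides do not specify.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical subset of element names treated as searchable.
SEARCHABLE = ("title", "description")


def create_schema(conn):
    conn.executescript(
        """
        CREATE TABLE record (
            identifier    TEXT PRIMARY KEY,  -- key element, indexed
            catalogue_ref TEXT,              -- key element, indexed below
            metadata_xml  TEXT NOT NULL      -- full metadata kept as XML
        );
        CREATE INDEX idx_record_ref ON record (catalogue_ref);
        CREATE TABLE search_index (
            identifier TEXT REFERENCES record (identifier),
            field      TEXT,
            value      TEXT
        );
        """
    )


def store_record(conn, identifier, catalogue_ref, metadata_xml):
    """Store the XML whole, then copy the searchable values into the index."""
    conn.execute(
        "INSERT INTO record (identifier, catalogue_ref, metadata_xml) VALUES (?, ?, ?)",
        (identifier, catalogue_ref, metadata_xml),
    )
    root = ET.fromstring(metadata_xml)
    for tag in SEARCHABLE:
        for el in root.iter(tag):
            if el.text:
                conn.execute(
                    "INSERT INTO search_index (identifier, field, value) VALUES (?, ?, ?)",
                    (identifier, tag, el.text.strip()),
                )


def search(conn, term):
    """Return identifiers of records whose indexed values contain term."""
    rows = conn.execute(
        "SELECT DISTINCT identifier FROM search_index "
        "WHERE value LIKE ? ORDER BY identifier",
        (f"%{term}%",),
    )
    return [row[0] for row in rows]
```

Elements outside the searchable subset remain available in the stored XML but are not found by search.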

11
UK Digital Archive (2)
  • record and file metadata kept separately
  • flexible relationship between records and
    computer files
  • Unlimited depth of record hierarchy (records can
    contain sub-records)
  • metadata imported/exported as XML so
    easier/quicker to store as XML
  • designed for ease of extension to metadata
    (disadvantage of extracting metadata into
    database tables)
  • <GSMElement name="Title"> rather than <Title>
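
A generic element that carries the field name in an attribute can be read without field-specific code, which is what makes the metadata easy to extend. The Python sketch below assumes an element called GSMElement with a name attribute, wrapped in a hypothetical metadata root element.

```python
import xml.etree.ElementTree as ET


def read_elements(xml_text: str) -> dict:
    """Collect every GSMElement value under its name, whatever the names are."""
    values = {}
    for el in ET.fromstring(xml_text).iter("GSMElement"):
        values.setdefault(el.get("name"), []).append((el.text or "").strip())
    return values


doc = (
    "<metadata>"
    '<GSMElement name="Title">Minutes</GSMElement>'
    '<GSMElement name="Creator">Dept A</GSMElement>'
    "</metadata>"
)
fields = read_elements(doc)
```

A new metadata field then needs no change to the reader: it simply appears as another key.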

12
Alternatives
  • VERS approach: metadata and content files
    encapsulated together within XML file
  • +ve: record is self-contained
  • +ve: well-suited to use of digital signatures on
    both metadata and content
  • -ve: more denormalisation required for access
  • -ve: complexity of adding to or editing metadata
  • -ve: if file is needed for more than one record,
    must be duplicated
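
The general idea of encapsulating metadata and content in a single XML file can be sketched as follows. This is not the actual VERS schema; the element names are invented for illustration, and Base64 is assumed as the way to carry binary content inside XML.

```python
import base64
import xml.etree.ElementTree as ET


def encapsulate(metadata: dict, file_name: str, content: bytes) -> str:
    """Build one self-contained XML document holding metadata and content."""
    root = ET.Element("encapsulatedRecord")
    meta_el = ET.SubElement(root, "metadata")
    for key, value in metadata.items():
        ET.SubElement(meta_el, "element", {"name": key}).text = value
    content_el = ET.SubElement(
        root, "content", {"fileName": file_name, "encoding": "base64"}
    )
    content_el.text = base64.b64encode(content).decode("ascii")
    return ET.tostring(root, encoding="unicode")


def unpack(xml_text: str):
    """Recover (metadata, file name, content bytes) from the encapsulated form."""
    root = ET.fromstring(xml_text)
    metadata = {el.get("name"): el.text for el in root.find("metadata")}
    content_el = root.find("content")
    return metadata, content_el.get("fileName"), base64.b64decode(content_el.text)
```

The sketch also makes the drawbacks visible: any metadata edit means rewriting the whole document, and a file shared by two records is encoded twice.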

13
Interoperability
  • Not much experience in practice so far
  • XML helps - but not much!
  • Likely to be similar but not identical schemas
  • Different implementations of same schema
  • Short term: ad hoc mapping between schemas for
    specific systems
  • Longer term: various initiatives, but
    standardisation and semantics-based approaches
    are difficult
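
An ad hoc mapping between two similar schemas often amounts to a lookup table maintained per pair of systems. The Python sketch below uses invented field names on both sides and reports anything it cannot map instead of dropping it silently.

```python
# Hypothetical mapping from one system's field names to another's.
FIELD_MAP = {
    "Title": "dc_title",
    "Creator": "dc_creator",
    "DateCreated": "dc_date",
}


def map_record(source: dict, field_map: dict):
    """Return (mapped fields, fields with no counterpart in the target schema)."""
    mapped, unmapped = {}, {}
    for key, value in source.items():
        if key in field_map:
            mapped[field_map[key]] = value
        else:
            unmapped[key] = value
    return mapped, unmapped
```

A table like this renames fields but cannot reconcile differences in meaning, which is where the longer-term semantic approaches come in.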

14
Extending or changing the schema
  • Schema may (will!) change in future
  • No "one size fits all" approach
  • TNA plans for extensions to core metadata
    according to file type and according to function
  • Version control

15
Preservation metadata
  • Maintain ability to understand and authentically
    reproduce content files
  • PRONOM system: separate database for file
    formats/accessibility
  • KB preservation layer model approach
  • Technology watch

16
Authentication/Integrity
  • Digital signatures: has something changed? (also
    simpler hashing algorithms)
  • Digital signatures: who signed it?
  • Control access
  • Audit logs
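
The "has something changed?" question can be answered with a plain hash recorded at ingest and recomputed later. The Python sketch below assumes SHA-256 and covers only that question; establishing who signed a file needs public-key signatures, which it does not attempt.

```python
import hashlib
import hmac


def fingerprint(content: bytes) -> str:
    """Digest to record alongside the file's metadata at ingest."""
    return hashlib.sha256(content).hexdigest()


def is_unchanged(content: bytes, recorded: str) -> bool:
    """True if the content still matches the digest recorded earlier."""
    return hmac.compare_digest(fingerprint(content), recorded)
```

A bare hash proves integrity only if the recorded digest is itself protected, for example by access control and audit logging on the metadata store.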

17
Conclusions
  • Digital preservation is still a young
    discipline, so best approach not always clear
  • Do something! Learn from experience
  • Design for flexibility/replaceability: records
    must outlive any implementation