Ingest Strategies for Digital Libraries

Slides: 27
Provided by: erpa
Learn more at: https://www.erpanet.org

1
Ingest Strategies for Digital Libraries
  • Seamus Ross and
  • Adam Rusbridge
  • HATII, University of Glasgow

2
Introduction
  • Investigate Requirements, Procedures and
    Surrounding Issues of Ingest
  • Focused on Metadata Requirements
  • Prototype Repository with PHP/MySQL
  • Based Preservation Metadata on NLNZ Schema
  • Subset of MARC21 for Bibliographic Metadata
  • Pragmatic and Practical

3
Objectives
  • To investigate:
  • - Development of Workflows
  • - Representation of Complex Objects
  • - Metadata Requirements
  • - Feasibility and Implications of Manual
    Acquisition
  • - Time and Cost Requirements

4
Media Types
  • Published CD-ROMs and Floppy Disks
  • - Specialist software often required, complex
    navigational requirements
  • Unpublished CD-ROMs and Floppy Disks
  • - Unstructured, undocumented (often backup)
    collections

5
Development
  • Prototype Repository Development Targeted
  • Defined Minimum Repository Structure
  • Preservation and Bibliographic Metadata Strategy
  • Linux OS, MySQL, PHP

6
Analysis of Media
  • Disks had little or no additional information
    at submission
  • Necessary to execute and view all objects
  • Preliminary Work Included
  • - Packaging, Documentation, System Requirements
    and Entry Points
  • Search Internet for Semantic Information
  • Execute under Native Platform

7
Investigation
  • Establish Content of Media
  • - e.g. Hybrid Disks, Unrelated Objects
  • Validate Completeness and Integrity
  • - e.g. Virus Checking, Comparisons of Table of
    Contents and Checksums
  • Identification of File Types
  • - Reliance on Format Identification websites
    (www.filext.com)
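
A minimal sketch of the checksum comparison mentioned above, assuming the media has already been copied to a local directory. The function names and the choice of SHA-256 are illustrative assumptions, not details taken from the prototype repository described in these slides.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Map each file path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_manifests(expected: Dict[str, str],
                      actual: Dict[str, str]) -> Dict[str, List[str]]:
    """Report files that are missing, unexpected, or changed."""
    return {
        "missing": sorted(set(expected) - set(actual)),
        "unexpected": sorted(set(actual) - set(expected)),
        "changed": sorted(
            name for name in set(expected) & set(actual)
            if expected[name] != actual[name]
        ),
    }
```

A manifest built at delivery can be compared with one built after copying or quarantine; any non-empty list in the report flags the disk for manual review.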

8
Investigation
  • Alternative methods of Identification
  • - Binary editors required for unconventional
    extensions. Unsurprisingly, difficult to
    interpret information
  • Time-consuming and technically demanding
  • Many Open-Source Utilities Required
  • - Difficult to find, technically demanding to
    use
  • Time Consuming, Mentally Exhausting, Ambiguous
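
The manual inspection with a binary editor described above amounts to reading a file's leading bytes and matching them against known signatures. The sketch below shows that idea for a handful of widely documented signatures; the table is a small illustrative assumption, not a format registry, and short signatures such as "MZ" can produce false matches.

```python
# Leading-byte signatures for a few common formats (illustrative subset).
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"%PDF-", "PDF document"),
    (b"PK\x03\x04", "ZIP container"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"MZ", "DOS/Windows executable"),
]


def identify(header: bytes) -> str:
    """Return a label for the first signature that matches the header."""
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return "unknown"


def identify_file(path, read_size: int = 16) -> str:
    """Identify a file by its leading bytes, ignoring its extension."""
    with open(path, "rb") as handle:
        return identify(handle.read(read_size))
```

Files reported as "unknown" would still need the manual investigation the slide describes.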

9
A Workflow Model
  • [Diagram; stage labels as extracted, original
    layout and order not preserved]
  • Selection
  • Archiving/Ingest
  • Description
  • Registration
  • Ingest Prep
  • Delivery Acquisition
  • Quarantine Virus Checking
  • Verification

10-21
Representation of Objects
  • [Twelve diagram slides; only the element numbers
    1 to 6 survive in this transcript]

22
Metadata Requirements
  • Minimum metadata required on deposit
  • - In our sample set, inversely proportional to
    available documentation
  • - Creator, Title, Description, Audience, Group
  • - Automation easier with submission standards
  • - Dependent on curators of selection
  • Too easy to skip non-mandatory elements
  • Easier to extract metadata on original platform
  • How can we ensure correct representation?
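
One way to make skipped elements visible at deposit is to check each record against the minimum set before accepting it. This is a sketch only: it assumes records arrive as simple key/value mappings and treats the five elements named on this slide as mandatory, which is an assumption about how the list was meant.

```python
# Elements named on the slide, assumed here to be the mandatory minimum.
MANDATORY = ("creator", "title", "description", "audience", "group")


def missing_elements(record, required=MANDATORY):
    """Return the required elements that are absent or blank in a record."""
    return [
        name for name in required
        if not str(record.get(name) or "").strip()
    ]
```

A deposit form could refuse submission while the returned list is non-empty, rather than relying on depositors to fill in every field voluntarily.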

23
Feasibility of Manual Acquisition
  • Many utilities available, development required
  • Automation essential at the file level
  • - Too many files and errors
  • NLNZ Extract Tool 80 Technical Metadata
  • - Current limitation of formats
  • - More binary filetypes needed
  • Collaboration with File Format Registries
  • Automation of Semantic Metadata Extraction...?
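
To illustrate what file-level automation can look like, the sketch below walks a directory and records a few basic technical properties per file, collecting errors instead of stopping at the first one. It is not the NLNZ extraction tool and covers far fewer properties; the field names are assumptions chosen for the example.

```python
import os
from datetime import datetime, timezone
from pathlib import Path


def collect_technical_metadata(root):
    """Return (records, errors) for every file found under root."""
    root = Path(root)
    records, errors = [], []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                info = path.stat()
            except OSError as exc:
                # Keep going: unreadable files are logged, not fatal.
                errors.append((str(path), str(exc)))
                continue
            modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
            records.append({
                "path": path.relative_to(root).as_posix(),
                "size_bytes": info.st_size,
                "extension": path.suffix.lower(),
                "modified": modified.isoformat(),
            })
    records.sort(key=lambda r: r["path"])
    return records, errors
```

Semantic metadata, such as what a file is about or who it is for, is not addressed here and remains the open question the slide raises.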

24
Cost Requirements
  • Difficult to determine costs from pilot
  • Automation reduces costs
  • Number and Expertise of Technicians
  • Adherence to Submission Requirements or Standards
    could reduce costs
  • - Only up to the point of content selection and
    appraisal, which is often also done in-house

25
Conclusions
  • Lack of understanding and awareness must be
    addressed
  • Focused goal of Preservation Metadata
  • Collaborating system of Preservation,
    Bibliographic, Collection Management,
    Authentication and File Format Information
  • Several schemas available; which is best for
    your needs?

26
Conclusions
  • Further Investigation and Guidance necessary
  • - Infrastructures must be developed and
    implemented
  • - Institutions must tailor tried and tested
    solutions for their needs