Sometimes it takes three to Tango - PowerPoint PPT Presentation

Provided by: jeffrey172
Transcript and Presenter's Notes

1
Sometimes it takes three to Tango
  • Evaluating the role of vendor support for Open
    Source repository components in a Research
    Library environment

2
Credits
  • Co-Authors
  • David Gewirtz
  • Gretchen Gano
  • Frederick Martz
  • Co-Investigators
  • Gail Barnett, Elisabeth Beaudin, Art Belanger,
    Derek Merleaux, Roy Lechich, Ernest Marinko
  • Co-Developers
  • VTLS
  • Numerous additional library and university staff

3
Goals of the research study
  • To deploy VITAL and the FEDORA repository
    environment to test it as a potential home for
    collections within the Yale libraries
  • To explore the possible infrastructure for
    digital preservation of the Library's electronic
    collections.
  • To help librarians better understand how other
    services in the digital environment interface
    with a digital repository.

4
Why VITAL, Not Just Fedora
  • Benefits of Commercial Support
  • Reduce risk
  • More functionality
  • Better support system
  • Reduce learning curve to build collections
  • Possible development partnerships that reduce the
    cost and risk of custom development

5
Additional VITAL Features
  • Administrative tools for efficient operation
  • Batch ingest utility for bulk loading
  • Web Self-Submission interface
  • Staff or faculty create their own collections
  • Configurable templates for metadata
  • Staged submission enables staff review
  • Multiple workflows to suit different users
  • Automatic extraction of full text from PDF

6
Common Information Infrastructure
7
TANSTAAFL [1]
  • Research suggests that Yale's digital repository
    framework is likely to be based upon a solution
    that supports multiple digital repositories
    integrated through a tiered architecture.
  • Development based upon open source applications
    like Fedora requires community participation and
    contributions.
  • Hardware failure significantly delayed
    installation of new code; disaster recovery
    procedures are needed even for research projects.
  • Significant additional training is needed
    throughout the library/university on metadata and
    XML-based tools (another source of delay in the
    project).
  • [1] "There ain't no such thing as a free lunch":
    Robert Heinlein, The Moon Is a Harsh Mistress

8
Research Results
  • Three Servers configured and installed
  • Seven Collections Defined and Tested
  • Thirty-seven Use Case Scenarios
  • Twenty-one executed
  • Seventeen met basic success criteria
  • Twenty Gaps identified
  • Fourteen gaps addressed in VITAL 2.1
  • Twenty-two (*) educated librarians/faculty/staff
  • Ten have received XSLT training
  • (*) Some double counting and overlap exists
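The counts on this slide can be turned into rates for easier comparison. The sketch below assumes that the twenty-one executed scenarios are a subset of the thirty-seven defined, that the seventeen successes are a subset of those executed, and that the fourteen addressed gaps are a subset of the twenty identified; the slide does not state this nesting explicitly, and the variable names are illustrative.

```python
# Counts as reported on the slide. The subset relationships between them
# are an assumption, not something the slide states.
scenarios_defined = 37
scenarios_executed = 21
scenarios_successful = 17
gaps_identified = 20
gaps_addressed = 14


def pct(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


summary = {
    "executed_of_defined": pct(scenarios_executed, scenarios_defined),
    "successful_of_executed": pct(scenarios_successful, scenarios_executed),
    "successful_of_defined": pct(scenarios_successful, scenarios_defined),
    "gaps_addressed_of_identified": pct(gaps_addressed, gaps_identified),
}

for name, value in summary.items():
    print(f"{name}: {value}%")
```

Under these assumptions, a little over half of the defined scenarios were executed, and most of those executed met the basic success criteria.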

9
Collection Summaries
  • Finding Aids
  • Archives
  • Medical Images
  • Data Sets
  • Unicode Text
  • Annotated Historical Records
  • Images-Audio-Video

10
Finding Aids
  • Successful batch and interactive Ingest, Indexing
    and basic DC creation
  • Successful retrieval with Access Portal
  • Successful Index creation with Admin client
  • Partial success with DC regen.
  • Issues and gaps
  • Distinct Collection Styles
  • Access to XSL navigation tools
  • Support for dynamic formatting

11
Archives
  • Email archiving partially successful
  • Batch ingest and indexing successful
  • Only partial access to attachments
  • Email Retrieval partially successful
  • Individual records only (no threads)
  • Scanned reference documents partially successful
  • Drag and Drop fails for some documents
  • Logging data not clear
  • PDF viewing awkward
  • Batch ingest model configuration awkward
  • Access Controls not evaluated
  • Researcher time was limited

12
Medical Images
  • Batch ingest and indexing successful
  • Local enhanced DC partially supported
  • Regen DC works but needs batch mode
  • Local Index generation successful
  • Web submission successful
  • Issues and gaps
  • Lack of TIF thumbnail support
  • Lack of index browse by local fields
  • Could not access OCR text within PDFs

13
Data Sets
  • Successful batch ingest with externally generated
    XML
  • Requires FOXML preprocessing
  • Manual DC regen is awkward
  • Functions with a subset of preferred metadata
  • ICPSR, not SSDA
  • Important mime types not recognized
  • Problems reported with setup, but production
    usage deemed feasible
  • Depends on access to API and/or programming
    resources
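As a rough illustration of what "FOXML preprocessing" can involve, the sketch below wraps a handful of Dublin Core fields in a minimal FOXML object using only the Python standard library. It is not the preprocessing used in the study: the function name, the PID, and the sample values are hypothetical, and the element and attribute names follow the publicly documented FOXML 1.1 layout, which should be checked against the schema for the Fedora release actually in use.

```python
import xml.etree.ElementTree as ET

FOXML_NS = "info:fedora/fedora-system:def/foxml#"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Use conventional prefixes when the tree is serialized.
ET.register_namespace("foxml", FOXML_NS)
ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)


def wrap_dc_in_foxml(pid, label, dc_fields):
    """Build a minimal FOXML document holding one inline DC datastream.

    dc_fields is a list of (element_name, value) pairs, e.g. ("title", "...").
    This helper is hypothetical and only sketches the general shape.
    """
    root = ET.Element(f"{{{FOXML_NS}}}digitalObject",
                      {"VERSION": "1.1", "PID": pid})
    props = ET.SubElement(root, f"{{{FOXML_NS}}}objectProperties")
    ET.SubElement(props, f"{{{FOXML_NS}}}property",
                  {"NAME": "info:fedora/fedora-system:def/model#state",
                   "VALUE": "Active"})
    ET.SubElement(props, f"{{{FOXML_NS}}}property",
                  {"NAME": "info:fedora/fedora-system:def/model#label",
                   "VALUE": label})
    # CONTROL_GROUP "X" marks inline XML content.
    ds = ET.SubElement(root, f"{{{FOXML_NS}}}datastream",
                       {"ID": "DC", "STATE": "A", "CONTROL_GROUP": "X"})
    version = ET.SubElement(ds, f"{{{FOXML_NS}}}datastreamVersion",
                            {"ID": "DC.0", "MIMETYPE": "text/xml",
                             "LABEL": "Dublin Core Record"})
    content = ET.SubElement(version, f"{{{FOXML_NS}}}xmlContent")
    dc = ET.SubElement(content, f"{{{OAI_DC_NS}}}dc")
    for name, value in dc_fields:
        ET.SubElement(dc, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(wrap_dc_in_foxml("demo:1", "Sample data set",
                           [("title", "Sample data set"),
                            ("type", "Dataset")]))
```

A real data set object would also need datastreams for the data files themselves and whatever richer metadata the collection prefers; this sketch covers only the Dublin Core wrapper.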

14
Unicode (non-Roman) text
  • Batch Ingest not tested
  • Complex for non-standard formats
  • Lack of configuration setup documentation
  • Ingest via VITAL Manager successful
  • Retrieval via Access portal successful
  • Lack of control over Display Order cited
  • Incorrect sort for Arabic initial articles
  • Lack of support for TIFF and JPEG2000
  • Incorrect Display of single page PDFs
  • Search inconsistent for multi-word Arabic
  • Concern about OCR within PDF (see medical
    experience)
  • Expect to need Arabic Disseminators
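The sorting issue with Arabic initial articles can be illustrated with a small sort-key sketch. It is a deliberately naive assumption-laden example, not how VITAL or Fedora sorts: it strips a leading definite article (alif + lam) before comparing, compares by Unicode code point rather than with a proper collation library, and will wrongly strip words that merely begin with those two letters. The function name and sample titles are hypothetical.

```python
# Alif + lam, the Arabic definite article "al-".
ARABIC_ARTICLE = "\u0627\u0644"


def article_sort_key(title):
    """Return the title with a leading definite article removed.

    Naive: any title starting with alif + lam is treated as having an
    article, which is not always linguistically correct.
    """
    text = title.strip()
    if text.startswith(ARABIC_ARTICLE) and len(text) > len(ARABIC_ARTICLE):
        text = text[len(ARABIC_ARTICLE):]
    return text


titles = ["الكتاب", "باب", "القلم"]

# Plain sort files both article-initial titles under alif.
plain = sorted(titles)

# Key-based sort files them under the first letter after the article.
ordered = sorted(titles, key=article_sort_key)

print(plain)
print(ordered)
```

A production system would more likely store a separate sort form of each title, or use locale-aware collation, than strip prefixes at query time.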

15
Annotated Historical Records (Scanned handwritten
images with transcriptions and annotations)
  • Load Images of the original papers for subsequent
    indexing and transcription: Successful
  • Locate original document images based on the
    accession numbers recorded in the project index
    documents: Successful
  • Discover and retrieve information about
    Connecticut tribes based upon a subject and tribe:
    Successful
  • Locate original document images based on the text
    of the transcript: Successful
  • Navigate from original transcript to annotation
    using internal deep links to Fedora: Successful
  • Navigate from external Project Documentation to
    Collection content using external deep links to
    Fedora: Successful
  • Web Submission: Successful

16
Annotated Historical Records (cont)
  • Issues and gaps
  • The collection is too small to justify an
    independent repository; a method is needed to
    manage content on a small scale
  • Lack of control over presentation sequence makes
    discovery experience confusing
  • Lack of hierarchy ("part of") limits
    documentation of important relationships
  • Lack of control over Datastream PID makes
    multipart document annotation difficult

17
Images-Audio-Video
18
Lessons
  • Yale's collections require more sophisticated
    tools to ingest content from a broad spectrum of
    formats.
  • Ingestion of text in non-Western languages, such
    as Unicode-encoded Arabic text, requires special
    XSL style sheets that VTLS does not provide in
    its default system.
  • Ingest tools and functions of VITAL need to be
    extended to handle more than simple Dublin Core
    metadata and
  • Knowledge of the XML stack, such as XPath and
    XSLT, is needed to work effectively with (i.e.,
    extend) VITAL tools for batch ingest
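To give a sense of the XPath knowledge referred to here, the sketch below pulls repeated fields out of a simple Dublin Core record using the limited XPath support in Python's standard library. The sample record and the helper name are hypothetical and are not taken from the study's collections or from VITAL's tooling.

```python
import xml.etree.ElementTree as ET

# Prefix-to-namespace map used when evaluating the path expressions.
NS = {
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical sample record, for illustration only.
SAMPLE = """\
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Sample finding aid</dc:title>
  <dc:subject>Archives</dc:subject>
  <dc:subject>Manuscripts</dc:subject>
</oai_dc:dc>
"""


def dc_values(record_xml, field):
    """Return the text of every dc:<field> child of the record root."""
    root = ET.fromstring(record_xml)
    return [element.text for element in root.findall(f"dc:{field}", NS)]


print(dc_values(SAMPLE, "title"))
print(dc_values(SAMPLE, "subject"))
```

Extending batch ingest beyond simple Dublin Core would typically mean writing similar path expressions, or XSLT templates, against richer metadata schemas.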

19
Conclusion
  • Yale University Library views the collaborative
    process used to evaluate VITAL as a valuable
    model for learning new technology, and plans more
    research projects based upon this assessment
    model.
  • VTLS serves as a good example where commercial
    vendors can become contributors to open source
    projects like Fedora that benefit our community
    while still being able to profit by their work.

20
Links
  • http://www.library.yale.edu/jbarnett/FEDORA_UG_2006/Fedora_User_Group_2006.ppt
  • http://www.library.yale.edu/iac/documents/DR_Review_final_27Sept05.pdf