The Historical AND Digital Brown - PowerPoint PPT Presentation

About This Presentation
Slides: 22
Provided by: christine128

Transcript and Presenter's Notes

Title: The Historical AND Digital Brown & White


1
THE HISTORICAL AND DIGITAL Brown & White
Turning reels and paper into a CONTENTdm
collection with article-level access
Christine Guenther, Product Manager, Digital
Services, Bethlehem, PA (formerly OCLC
Preservation Service Center)
2
The Roadmap
Selection
ACCESS QUALITY
METADATA QUALITY
IMAGE QUALITY
3
Workflow Choices
Lehigh U Project
4
(No Transcript)
5
Evaluation of source material
Start with FILM or digitize ORIGINAL?
  • Print master? Preservation-grade film?
    Budget > Scan from existing film
  • Poor filming? Only poor copies? (Color
    content) > Scan direct
6
Hybrid approach: scan good film, use originals
for the other volumes
112 years, 114 volumes, about 55,000 pages
Production: 2007/2008.
7
Scanning Original BW volumes
  • Zeutschel 7000 bookscanner
  • 400 dpi capture
  • 8-bit grayscale
  • Bound volumes vs. gutter text loss
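
As a rough sizing aid, the capture settings above (400 dpi, 8-bit grayscale) can be turned into an uncompressed file size per page. This is only a sketch: the 11 x 17 inch page size is an assumption made for illustration, since the slides do not state page dimensions, and `uncompressed_bytes` is a hypothetical helper name.

```python
def uncompressed_bytes(width_in, height_in, dpi=400, bits_per_pixel=8):
    """Uncompressed raster size in bytes for one scanned page."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


# Hypothetical 11 x 17 inch newspaper page (assumed, not from the slides).
page_bytes = uncompressed_bytes(11, 17)
print(f"{page_bytes / 1e6:.1f} MB per uncompressed page")
```

Multiplying the per-page figure by the page count of a project gives a first estimate of master-file storage before any compression.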

8
Scanning Preservation Microfilm
  • NextScan Eclipse
  • 35mm roll film scanner
  • Same specifications
  • Pro/Con: scan rate is high, but post-scan image
    enhancements are necessary (deskew, crop, etc.)
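
To illustrate the kind of post-scan cropping mentioned above, here is a minimal sketch that trims a dark film border from a grayscale frame. It is not the method used in the project: it assumes the page area is brighter than the surrounding film (that is, after any polarity inversion), and the threshold value and function names are hypothetical. Deskew is not covered.

```python
def crop_box(pixels, threshold=64):
    """Return (top, left, bottom, right) of the area brighter than threshold.

    pixels is a list of rows of 8-bit gray values (0 = black).
    bottom and right are exclusive. Returns None if nothing is bright enough.
    """
    if not pixels or not pixels[0]:
        return None
    rows = [y for y, row in enumerate(pixels) if any(v > threshold for v in row)]
    cols = [x for x in range(len(pixels[0]))
            if any(row[x] > threshold for row in pixels)]
    if not rows or not cols:
        return None
    return rows[0], cols[0], rows[-1] + 1, cols[-1] + 1


def crop(pixels, box):
    """Cut the boxed region out of the frame."""
    top, left, bottom, right = box
    return [row[left:right] for row in pixels[top:bottom]]


# Tiny made-up frame: a bright 2 x 2 "page" surrounded by black film border.
frame = [
    [0, 0, 0, 0, 0],
    [0, 200, 210, 0, 0],
    [0, 190, 205, 0, 0],
    [0, 0, 0, 0, 0],
]
box = crop_box(frame)
page = crop(frame, box)
```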

9
Quality Assurance!
  • Bethlehem Digital QA team
  • View every image for proper capture and
    completeness
  • Collect issue date and page sequence data for all
    pages
  • Deal with naming irregularities typical for
    student newspapers or dupes on film
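
The page-sequence check described above could be sketched as follows, assuming the QA team records an (issue date, page number) pair for every image. The record layout, the sample dates, and the function name are assumptions for illustration, not the team's actual tooling.

```python
from collections import defaultdict


def check_page_sequence(records):
    """Flag missing and duplicated page numbers per issue.

    records: iterable of (issue_date, page_number) tuples.
    Returns {issue_date: {"missing": [...], "duplicates": [...]}}
    containing only the issues that have a problem.
    """
    pages_by_issue = defaultdict(list)
    for issue_date, page in records:
        pages_by_issue[issue_date].append(page)

    problems = {}
    for issue_date, pages in pages_by_issue.items():
        seen = set(pages)
        expected = set(range(1, max(pages) + 1))
        missing = sorted(expected - seen)
        duplicates = sorted(p for p in seen if pages.count(p) > 1)
        if missing or duplicates:
            problems[issue_date] = {"missing": missing, "duplicates": duplicates}
    return problems


# Made-up sample: one issue with a duplicate frame and a skipped page.
sample = [
    ("1950-03-01", 1), ("1950-03-01", 2), ("1950-03-01", 2), ("1950-03-01", 4),
    ("1950-03-08", 1), ("1950-03-08", 2),
]
report = check_page_sequence(sample)
```

A check like this cannot detect pages missing from the end of an issue, which is one reason a human still views every image.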

10
From pixels to content
11
Beyond OCR
  • Challenge: OCR is cost-effective, but only as
    accurate as the text quality in the source
    file allows
  • Simple OCR is not reliable for recognizing
    specific metadata such as Issue Date, and is blind
    to document structure (articles)
  • Used metadata elements similar to NDNP guidelines
    (including LCCN, geographic coverage, etc.)
  • Advanced Search goes beyond full-text
    search, offering access points for discovery.

12
Metadata collection - Results
13
Content conversion
  • CCS docWORKS software
  • CCS developed the schema in cooperation with 12
    EU and US libraries during the EU-funded METAe
    project.
  • ALTO: Analyzed Layout and Text Object
  • The Library of Congress chose CCS's ALTO-schema
    for the National Digital Newspaper Program (NDNP)

14
METS/ALTO
  • METS: contains all digital preservation data
    (bibliographical, administrative, technical,
    structural); ONE PER ISSUE
  • Passport for digital preservation
  • ALTO: contains layout information and OCR results;
    each word is mapped to a specific location in
    an image; ONE PER PAGE
  • Can also include article level information
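
To make the "each word is mapped to a specific location" point concrete, the sketch below looks up a word's bounding box in a small ALTO fragment using Python's standard XML parser. The sample XML is invented for illustration, and the version 2 namespace is an assumption; real files from a project may declare a different ALTO version.

```python
import xml.etree.ElementTree as ET

# Assumed namespace (ALTO v2); adjust to whatever the real files declare.
ALTO_NS = {"alto": "http://www.loc.gov/standards/alto/ns-v2#"}

# Invented sample page, not taken from the collection.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout>
    <Page ID="P1" WIDTH="4400" HEIGHT="6800">
      <PrintSpace>
        <TextBlock ID="TB1">
          <TextLine>
            <String CONTENT="Lehigh" HPOS="400" VPOS="300" WIDTH="520" HEIGHT="90"/>
            <SP/>
            <String CONTENT="wins" HPOS="960" VPOS="300" WIDTH="340" HEIGHT="90"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def find_word(alto_xml, word):
    """Return (hpos, vpos, width, height) for every occurrence of word."""
    root = ET.fromstring(alto_xml)
    hits = []
    for s in root.iterfind(".//alto:String", ALTO_NS):
        if s.get("CONTENT", "").lower() == word.lower():
            hits.append(tuple(int(float(s.get(k)))
                              for k in ("HPOS", "VPOS", "WIDTH", "HEIGHT")))
    return hits


boxes = find_word(SAMPLE_ALTO, "wins")
```

A viewer can use boxes like these to highlight search hits directly on the page image.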

15
Summary: The data package per page
access
  • Best Image in standard format
  • SUSTAIN
  • REPROCESS(?)
  • Well suited for oversize content
  • QUICK ACCESS w/ DETAIL
  • De-facto standard
  • PRINT
  • CONTAINED
  • Searchability: XML
  • FULL TEXT

CCS docWORKS
16
Extra feature: Segmentation
  • Layout analysis
  • Manual correction
  • Article jumps
  • Headline correction

17
Production floor in India
18
Headline correction
19
Online presentation system: CONTENTdm
20
CONTENTdm Collection Building
  • Import METS/ALTO, JPEG2000, and PDF as Compound
    Objects (one per issue)
  • Troubleshooting
  • Quality Assurance

21
Thank you!
  • For further information
  • Christine Guenther
  • Backstage Library Works, 9 South Commerce
    Way, Bethlehem, PA 18017, 1-800-773-7222,
    guenthec_at_oclc.org
  • Lara Henry
  • Sales Representative
  • lara_at_bslw.com
  • 1-800-288-1265
  • www.bslw.com