Title: A Multi-Tiered Architecture for Distributed Data Collection and Centralized Data Delivery


1
  • A Multi-Tiered Architecture for Distributed Data
    Collection and Centralized Data Delivery

Stacy Kowalczyk and James Halliday
April 28, 2008
2
Project Overview
  • IN Harmony is
      • An IMLS-funded grant
      • Awarded in Fall 2004
      • To be completed in Fall 2008
  • A partnership of
      • Indiana University Digital Library Program
      • Indiana University Lilly Library
      • Indiana State Library
      • Indiana State Museum
      • Indiana Historical Society

3
Project Goals
  • To provide a model for fostering collaborative
    digital library development by partnering with
    institutions with complementary collections
  • To digitize a portion of the sheet music from
    these collections and offer access to these
    materials free of charge on the web
  • To bring these materials and their attendant
    metadata together on a single web site, offering
    both federated searching of the entire collection
    and searching of one or more selected
    collections

4
Deliverables
  • Tools to
      • Process the images
      • Capture metadata
      • Provide search and display functions
  • 10,000 pieces of sheet music scanned and
    cataloged
      • 4,000 Indiana University Lilly Library
      • 2,000 Indiana State Library
      • 2,000 Indiana State Museum
      • 2,000 Indiana Historical Society

5
Cataloging and Imaging Workflow Goals
  • Data integrity
  • Quality of the scans
  • Quality of the metadata
  • Accuracy of the links between page images
  • Accuracy of the links between metadata and images
  • Simplicity of use
  • Balance of flexibility and constraints

April 28, 2008
IN Harmony DLP Spring Forum 2008
6
Cataloging and Imaging Use Cases
  • Catalog first
  • Scanning first
  • Metadata created in another system and imported
    into IN Harmony

7
Digitizing Quality Control
  • Two-phase quality control process
  • Automated QC process verifies
      • All TIFF tags of every digital file
      • TIFF must be uncompressed
      • File names
      • Embedded profile appropriate to its bit depth
      • Consistency of pixel dimensions within a score
      • Appropriate resolution
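
The automated checks listed above could be sketched roughly as follows. This is a minimal illustration, not the project's QC software: it works on metadata that is assumed to have been read from each TIFF already, and the file-name pattern, the `PageInfo` fields, and the 300 dpi threshold are all assumptions (the slide only says "appropriate resolution"). The embedded-profile check is left out.

```python
import re
from dataclasses import dataclass

# Hypothetical naming pattern: <score-id>-<3-digit page>.tif (an assumption).
FILENAME_RE = re.compile(r"^[A-Za-z0-9]+-\d{3}\.tif$")
MIN_DPI = 300  # assumed threshold, not taken from the slides


@dataclass
class PageInfo:
    filename: str
    compression: int  # TIFF Compression tag value; 1 means uncompressed
    width: int        # pixel width
    height: int       # pixel height
    dpi: int          # scan resolution


def check_score(pages):
    """Return a list of problems found; an empty list means the score passes."""
    problems = []
    for p in pages:
        if not FILENAME_RE.match(p.filename):
            problems.append(f"{p.filename}: unexpected file name")
        if p.compression != 1:
            problems.append(f"{p.filename}: TIFF is compressed")
        if p.dpi < MIN_DPI:
            problems.append(f"{p.filename}: resolution {p.dpi} dpi is below {MIN_DPI}")
    # All pages of one score should share the same pixel dimensions.
    if len({(p.width, p.height) for p in pages}) > 1:
        problems.append("inconsistent pixel dimensions within score")
    return problems
```

Returning a list of problems, rather than stopping at the first failure, lets a scanning operator see everything that needs fixing in one pass.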

8
Digitizing Quality Control (2)
  • Manual QC at 100% pixel display verifies
      • Correct page orientation and order
      • Correct color balance
      • Sharp and in-focus scan
      • No digital artifacts
  • When all QC is passed, derivative files are
    created
      • Large and small JPGs for screen delivery
      • PDF sized for 8.5 x 11 printing

9
Digitizing Quality Control Software
10
(No Transcript)
11
(No Transcript)
12
(No Transcript)
13
Designing the metadata model
  • User studies
  • Work with the partners
  • Define fields
  • Write cataloging guidelines with partner input
  • Representation in MODS

14
Types of fields
  • Title elements
  • Name elements
  • Publication elements
  • Subject elements
  • Identification elements
  • Note elements
  • Cover information

15
Metadata Collection Tool
16
(No Transcript)
17
(No Transcript)
18
Public Search and Discovery System Demo
19
Architecture Overview
Jim Halliday
20
IN Harmony Technical Overview
[Architecture diagram; labeled components: Mass Storage System, Web Browser, SRU and HTTP, MODS Export, Cataloging Client, FTP, Quality Control, Java Swing, Oracle, Perl Web Application, Authentication Service]
21
Getting Data Into IN Harmony
  • Two primary data sources
      • Cataloging client
      • Image QC/upload application
  • Other data sources
      • XML data exported from other cataloging systems
      • Score images exported from older systems

22
(No Transcript)
23
Image QC/upload application
  • User scans scores and uploads to IN Harmony
    server
  • User accesses Perl-based web application to
    initiate automated quality control
  • A second user proceeds with manual QC, then uses
    web application to signal that manual QC is
    finished
  • The application moves and backs up the files,
    creates derivatives, and alerts both Fedora and
    the internal database that the process is
    complete
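
The steps above amount to a small linear workflow, which could be modeled as a state machine along these lines. The stage names and the `advance` helper are hypothetical, and the slides do not say what happens when a check fails; this sketch assumes the score simply stays at its current stage until it is checked again.

```python
from enum import Enum


class Stage(Enum):
    UPLOADED = "uploaded"                  # scans are on the server
    AUTO_QC_PASSED = "auto_qc_passed"      # automated QC succeeded
    MANUAL_QC_PASSED = "manual_qc_passed"  # second user signed off
    COMPLETE = "complete"                  # files moved, derivatives made, systems notified


# Each stage has exactly one successor; COMPLETE is terminal.
NEXT_STAGE = {
    Stage.UPLOADED: Stage.AUTO_QC_PASSED,
    Stage.AUTO_QC_PASSED: Stage.MANUAL_QC_PASSED,
    Stage.MANUAL_QC_PASSED: Stage.COMPLETE,
}


def advance(stage, passed=True):
    """Move a score to the next stage, or keep it in place if a check failed."""
    if stage is Stage.COMPLETE:
        raise ValueError("score has already completed the workflow")
    # Assumption: a failed check leaves the score where it is.
    return NEXT_STAGE[stage] if passed else stage
```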

24
IN Harmony Derivatives
  • Three sizes of JPGs produced per page
      • Full (1200px high)
      • Screen (600px high)
      • Thumb (200px high)
  • Multi-page, playable PDF
      • Approx. 1MB for an average score
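
Because each derivative is defined by its height, the width follows from the master image's aspect ratio. A minimal sketch of that arithmetic, using the three heights from the slide (the function name and the rounding choice are assumptions):

```python
# Target heights in pixels, as listed on the slide.
DERIVATIVE_HEIGHTS = {"full": 1200, "screen": 600, "thumb": 200}


def derivative_sizes(width, height):
    """Return {name: (width, height)} for each derivative, keeping aspect ratio."""
    sizes = {}
    for name, target_height in DERIVATIVE_HEIGHTS.items():
        # Scale the width by the same factor as the height; never go below 1 pixel.
        target_width = max(1, round(width * target_height / height))
        sizes[name] = (target_width, target_height)
    return sizes
```

For a 3000 x 4000 pixel master, this gives 900 x 1200, 450 x 600, and 150 x 200.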

25
(No Transcript)
26
IN Harmony cataloging client
  • Standalone Java Swing based client
  • Connects to Oracle database and outputs MODS for
    Fedora ingestion
  • Implemented as a client-server application via
    web services using Axis
  • Specialized UI components (such as smart combo
    boxes) assist with quick, correct data entry

27
(No Transcript)
28
Internal IN Harmony database
  • Oracle database stores record and user data in
    our own internal format
  • Communicates with upload/QC application, and
    cataloging client
  • Cataloging client and internal scripts can output
    to MODS format for ingestion into Fedora
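
To illustrate what outputting MODS from an internal record might look like, here is a small sketch. The shape of the input dictionary is hypothetical (the slides do not describe the internal format), and only two MODS elements are emitted; a real export would cover all the field types in the metadata model.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"  # the MODS version 3 namespace


def record_to_mods(record):
    """Serialize a simple record dict (hypothetical shape) as a MODS XML string."""
    ET.register_namespace("mods", MODS_NS)
    mods = ET.Element(f"{{{MODS_NS}}}mods")

    # <mods:titleInfo><mods:title>...</mods:title></mods:titleInfo>
    title_info = ET.SubElement(mods, f"{{{MODS_NS}}}titleInfo")
    ET.SubElement(title_info, f"{{{MODS_NS}}}title").text = record["title"]

    # One <mods:name type="personal"> per contributor.
    for person in record.get("names", []):
        name = ET.SubElement(mods, f"{{{MODS_NS}}}name", {"type": "personal"})
        ET.SubElement(name, f"{{{MODS_NS}}}namePart").text = person

    return ET.tostring(mods, encoding="unicode")
```

Usage: `record_to_mods({"title": "Example Waltz", "names": ["Doe, Jane"]})` returns a single `mods:mods` element as a string.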

29
(No Transcript)
30
IN Harmony authentication
  • CAS (IU's Central Authentication Service) is
    used to authenticate all users
  • Non-IU users must create IU Guest Accounts to
    authenticate
  • All account/password maintenance is in users'
    control

31
(No Transcript)
32
Fedora and IN Harmony
  • Fedora used as a single storage and
    infrastructure solution for Digital Library
    Program projects at IU
  • Data (score images and metadata) ingested into
    Fedora and referenced as METS objects
  • Master images sent to IU's mass storage system
  • Derivatives stored internally
  • Objects indexed using Lucene for SRU-based
    searching

33
Fedora Object Model
Collection
Sheet music
Copy
Page
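
Assuming the four levels nest in the order listed (a collection holds sheet music, each piece has one or more copies, and each copy has pages), the object model could be sketched with plain data classes. All field names here are hypothetical; the slide gives only the level names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Page:
    order: int     # position within the copy
    image_id: str  # identifier of the page image


@dataclass
class Copy:
    owner: str     # holding institution
    pages: List[Page] = field(default_factory=list)


@dataclass
class SheetMusic:
    title: str
    copies: List[Copy] = field(default_factory=list)


@dataclass
class Collection:
    name: str
    items: List[SheetMusic] = field(default_factory=list)

    def page_count(self):
        """Total number of pages across every copy of every piece."""
        return sum(len(copy.pages) for item in self.items for copy in item.copies)
```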
34
(No Transcript)
35
IN Harmony end-user interface
  • Java Struts based web application
  • Offers searching, browsing, and record display
  • Each partner institution is offered a
    personalized view of their data only
  • Interaction with Fedora
      • Application sends CQL queries to Fedora and
        retrieves MODS data which is transformed via XSLT
      • PURLs (persistent URLs) are used to access
        image derivatives
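
A CQL query is typically carried to an SRU server as the `query` parameter of a `searchRetrieve` request. The sketch below builds such a URL using standard SRU 1.1 parameter names; the base URL, the `dc.title` index, and the `mods` record schema name are assumptions, since the slides do not give the server's actual configuration.

```python
from urllib.parse import urlencode


def sru_search_url(base_url, cql_query, start=1, maximum=10):
    """Build an SRU 1.1 searchRetrieve URL for a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,         # the CQL query, URL-encoded below
        "startRecord": start,       # 1-based index of the first record
        "maximumRecords": maximum,  # page size
        "recordSchema": "mods",     # assumed schema short name
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint and index name, for illustration only.
url = sru_search_url("http://example.org/sru", 'dc.title = "waltz"')
```

The XML response would then be passed through XSLT for display, as the slide describes.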

36
METS Navigator
  • METS Navigator is used to page through scores
    online
  • Uses METS structmap to facilitate navigation
  • Allows views of multiple sizes of images
  • Released by IU as open source; see
    http://metsnavigator.sourceforge.net
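
To show how a METS structMap can drive page-by-page navigation, the sketch below reads page divisions from a tiny hand-written METS fragment and returns them in reading order. This is not METS Navigator's code; the sample document, the `TYPE="page"` convention, and the function name are assumptions, while the element and attribute names (`structMap`, `div`, `fptr`, `ORDER`, `LABEL`, `FILEID`) come from the METS schema.

```python
import xml.etree.ElementTree as ET

METS_NS = {"mets": "http://www.loc.gov/METS/"}

# Hand-written sample; pages are deliberately listed out of order.
SAMPLE = """<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:structMap TYPE="physical">
    <mets:div TYPE="score">
      <mets:div TYPE="page" ORDER="2" LABEL="Page 1"><mets:fptr FILEID="f2"/></mets:div>
      <mets:div TYPE="page" ORDER="1" LABEL="Cover"><mets:fptr FILEID="f1"/></mets:div>
    </mets:div>
  </mets:structMap>
</mets:mets>"""


def page_sequence(mets_xml):
    """Return (order, label, file id) tuples for each page div, sorted by ORDER."""
    root = ET.fromstring(mets_xml)
    pages = []
    for div in root.findall(".//mets:structMap//mets:div[@TYPE='page']", METS_NS):
        fptr = div.find("mets:fptr", METS_NS)
        pages.append((int(div.get("ORDER")), div.get("LABEL"), fptr.get("FILEID")))
    return sorted(pages)
```

A viewer can then step through the returned list and look up each file ID in the METS file section to find the image at the requested size.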

37
IN Harmony Technical Overview
[Architecture diagram; labeled components: Mass Storage System, Web Browser, SRU and HTTP, MODS Export, Cataloging Client, FTP, Quality Control, Java Swing, Oracle, Perl Web Application, Authentication Service]
38
IN Harmony Links
  • IN Harmony Public Interface
  • IN Harmony Project Information
  • Cataloging Tool (release date: June 2008)

39
  • Questions?