Title: A Multi-Tiered Architecture for Distributed Data Collection and Centralized Data Delivery
1 A Multi-Tiered Architecture for Distributed Data Collection and Centralized Data Delivery
Stacy Kowalczyk and James Halliday
April 28, 2008
2 Project Overview
- IN Harmony is
- An IMLS-funded grant
- Awarded in Fall 2004
- To be completed in Fall 2008
- A partnership of
- Indiana University Digital Library Program
- Indiana University Lilly Library
- Indiana State Library
- Indiana State Museum
- Indiana Historical Society
3 Project Goals
- To provide a model for fostering collaborative digital library development by partnering with institutions with complementary collections
- To digitize a portion of the sheet music from these collections and offer access to these materials free of charge on the web
- To bring these materials and their attendant metadata together on a single web site, offering both federated searching of the entire collection and searching of one or more selected collections
4 Deliverables
- Tools to
- Process the images
- Capture metadata
- Provide search and display functions
- 10,000 pieces of sheet music scanned and cataloged
- 4,000 Indiana University Lilly Library
- 2,000 Indiana State Library
- 2,000 Indiana State Museum
- 2,000 Indiana Historical Society
5 Cataloging and Imaging Workflow Goals
- Data integrity
- Quality of the scans
- Quality of the metadata
- Accuracy of the links between page images
- Accuracy of the links between metadata and images
- Simplicity of use
- Balance of flexibility and constraints
April 28, 2008
IN Harmony DLP Spring Forum 2008
6 Cataloging and Imaging Use Cases
- Catalog first
- Scanning first
- Metadata created in another system and imported into IN Harmony
7 Digitizing Quality Control
- Two-phase quality control process
- Automated QC process verifies
- All TIFF tags of every digital file
- TIFF must be uncompressed
- File names
- Embedded profile appropriate to its bit depth
- Consistency of pixel dimensions within a score
- Appropriate resolution
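The automated checks listed above could be sketched roughly as follows. This is a minimal illustration, not the project's QC software: the `PageScan` fields, the file-name pattern, the minimum resolution, and the profile names are all assumptions, since the slides do not state the actual policy values.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class PageScan:
    """TIFF properties for one scanned page (field names are hypothetical)."""
    filename: str
    compression: str   # "none" means uncompressed
    bit_depth: int     # total bits per pixel, e.g. 8 or 24
    icc_profile: str   # name of the embedded color profile
    width: int         # pixels
    height: int        # pixels
    dpi: int


# Assumed policy values; the real thresholds are not given in the slides.
MIN_DPI = 400
EXPECTED_PROFILE = {8: "Gray Gamma 2.2", 24: "Adobe RGB (1998)"}


def check_score(score_id: str, pages: List[PageScan]) -> List[str]:
    """Return a list of problems found; an empty list means the score passes."""
    problems = []
    # Assumed naming convention: <score_id>-<3-digit page number>.tif
    name_pattern = re.compile(rf"{re.escape(score_id)}-\d{{3}}\.tif")
    for page in pages:
        if page.compression != "none":
            problems.append(f"{page.filename}: TIFF is compressed")
        if not name_pattern.fullmatch(page.filename):
            problems.append(f"{page.filename}: unexpected file name")
        if EXPECTED_PROFILE.get(page.bit_depth) != page.icc_profile:
            problems.append(f"{page.filename}: profile does not match bit depth")
        if page.dpi < MIN_DPI:
            problems.append(f"{page.filename}: resolution below {MIN_DPI} dpi")
    # Consistency is treated here as exact equality of pixel dimensions.
    if len({(p.width, p.height) for p in pages}) > 1:
        problems.append(f"{score_id}: inconsistent pixel dimensions")
    return problems
```

Returning a list of problems rather than a single pass/fail flag lets the operator see every failed check for a score in one run.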
8 Digitizing Quality Control (2)
- Manual QC at 100% pixel display verifies
- Correct page orientation and order
- Correct color balance
- Sharp and in-focus scan
- No digital artifacts
- When all QC is passed, derivative files are created
- Large and small JPGs for screen delivery
- PDF sized for 8.5 x 11 printing
9 Digitizing Quality Control Software
13 Designing the metadata model
- User studies
- Work with the partners
- Define fields
- Write cataloging guidelines with partner input
- Representation in MODS
14 Types of fields
- Title elements
- Name elements
- Publication elements
- Subject elements
- Identification elements
- Note elements
- Cover information
15 Metadata Collection Tool
18 Public Search and Discovery System Demo
19 Architecture Overview
Jim Halliday
20 IN Harmony Technical Overview
[Architecture diagram; component labels: Mass Storage System, Web Browser, SRU and HTTP, MODS Export, Cataloging Client, FTP, Quality Control, Java Swing, Oracle, Perl Web Application, Authentication Service]
21 Getting Data Into IN Harmony
- 2 primary data sources
- Cataloging client
- Image QC/upload application
- Other data sources
- XML data exported from other cataloging systems
- Score images exported from older systems
23 Image QC/upload application
- User scans scores and uploads to IN Harmony server
- User accesses Perl-based web application to initiate automated quality control
- A second user proceeds with manual QC, then uses web application to signal that manual QC is finished
- The application moves and backs up the files, creates derivatives, and alerts both Fedora and the internal database that the process is complete
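The sequence of steps above can be read as a small state machine in which a score only moves forward when the current check passes. The sketch below is illustrative only: the stage names and the `advance` function are hypothetical and are not taken from the Perl application.

```python
from enum import Enum


class Stage(Enum):
    """Hypothetical workflow stages for one uploaded score."""
    UPLOADED = "uploaded"
    AUTO_QC_PASSED = "auto_qc_passed"
    MANUAL_QC_PASSED = "manual_qc_passed"
    COMPLETE = "complete"  # files moved, derivatives made, Fedora and DB notified


# Each stage has exactly one successor; COMPLETE is terminal.
NEXT_STAGE = {
    Stage.UPLOADED: Stage.AUTO_QC_PASSED,
    Stage.AUTO_QC_PASSED: Stage.MANUAL_QC_PASSED,
    Stage.MANUAL_QC_PASSED: Stage.COMPLETE,
}


def advance(current: Stage, step_succeeded: bool) -> Stage:
    """Move to the next stage if the current step succeeded, else stay put."""
    if current not in NEXT_STAGE:
        raise ValueError(f"{current.name} is a terminal stage")
    return NEXT_STAGE[current] if step_succeeded else current
```

Keeping the transitions in a single table makes it impossible to skip manual QC, which matches the two-person review described in the slide.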
24 IN Harmony Derivatives
- Three sizes of JPGs produced per page
- Full (1200px high)
- Screen (600px high)
- Thumb (200px high)
- Multi-page, playable PDF
- Approx. 1MB for an average score
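Because each JPG size is defined by its height, the width follows from the aspect ratio of the master scan. A minimal sketch of that arithmetic, assuming the aspect ratio is preserved (the function and dictionary names are illustrative, not from the project):

```python
# Target heights in pixels, taken from the slide.
DERIVATIVE_HEIGHTS = {"full": 1200, "screen": 600, "thumb": 200}


def derivative_sizes(width: int, height: int) -> dict:
    """Return (width, height) for each derivative, preserving aspect ratio."""
    sizes = {}
    for name, target_height in DERIVATIVE_HEIGHTS.items():
        # Scale the width by the same factor as the height; never below 1 px.
        scaled_width = max(1, round(width * target_height / height))
        sizes[name] = (scaled_width, target_height)
    return sizes
```

For a 3000 x 4000 pixel master, `derivative_sizes(3000, 4000)` gives 900 x 1200, 450 x 600, and 150 x 200.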
26 IN Harmony cataloging client
- Standalone Java Swing based client
- Connects to Oracle database and outputs MODS for Fedora ingestion
- Implemented as a client-server application via web services using Axis
- Specialized UI components (such as smart combo boxes) assist with quick, correct data entry
28 Internal IN Harmony database
- Oracle database stores record and user data in our own internal format
- Communicates with upload/QC application and cataloging client
- Cataloging client and internal scripts can output to MODS format for ingestion into Fedora
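To illustrate what exporting an internal record to MODS might look like, here is a minimal sketch. The record dictionary and its keys are hypothetical stand-ins for the internal format, which the slides do not describe; only the MODS namespace and element names (`titleInfo`, `title`, `name`, `namePart`, `identifier`) come from the MODS schema.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def _tag(local_name: str) -> str:
    """Build a namespace-qualified tag in ElementTree's {uri}name form."""
    return f"{{{MODS_NS}}}{local_name}"


def record_to_mods(record: dict) -> str:
    """Serialize a (hypothetical) internal record as a small MODS document."""
    mods = ET.Element(_tag("mods"))
    title_info = ET.SubElement(mods, _tag("titleInfo"))
    ET.SubElement(title_info, _tag("title")).text = record["title"]
    for person in record.get("names", []):
        name = ET.SubElement(mods, _tag("name"))
        ET.SubElement(name, _tag("namePart")).text = person
    ET.SubElement(mods, _tag("identifier"), type="local").text = record["id"]
    return ET.tostring(mods, encoding="unicode")


# Made-up example record, for illustration only.
example = {"id": "example-0001", "title": "Example Waltz", "names": ["Doe, Jane"]}
mods_xml = record_to_mods(example)
```

A real export would cover the full field set (publication, subject, note, and cover elements), but the shape of the mapping is the same.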
30 IN Harmony authentication
- CAS (IU's Central Authentication Service) is used to authenticate all users
- Non-IU users must create IU Guest Accounts to authenticate
- All account and password maintenance is in the user's control
32 Fedora and IN Harmony
- Fedora used as a single storage and infrastructure solution for Digital Library Program projects at IU
- Data (score images and metadata) ingested into Fedora and referenced as METS objects
- Master images sent to IU's mass storage system
- Derivatives stored internally
- Objects indexed using Lucene for SRU-based searching
33 Fedora Object Model
Collection
Sheet music
Copy
Page
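Reading the four labels above as a containment hierarchy (a collection holds sheet music, each piece has one or more copies, each copy has pages) is an assumption about the diagram. Under that assumption, the model could be sketched with plain data classes; all class and field names are illustrative, not the Fedora content model itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Page:
    sequence: int   # position of the page within the copy
    image_id: str   # hypothetical identifier of the page image


@dataclass
class Copy:
    holder: str     # institution that holds this physical copy
    pages: List[Page] = field(default_factory=list)


@dataclass
class SheetMusic:
    title: str
    copies: List[Copy] = field(default_factory=list)


@dataclass
class Collection:
    name: str
    items: List[SheetMusic] = field(default_factory=list)


def page_count(collection: Collection) -> int:
    """Count every page of every copy of every piece in the collection."""
    return sum(
        len(copy.pages) for item in collection.items for copy in item.copies
    )
```

Separating `SheetMusic` from `Copy` lets two partners hold copies of the same piece without duplicating its descriptive record.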
35 IN Harmony end-user interface
- Java Struts based web application
- Offers searching, browsing, and record display
- Each partner institution is offered a personalized view of their data only
- Interaction with Fedora
- Application sends CQL queries to Fedora and retrieves MODS data, which is transformed via XSLT
- PURLs (persistent URLs) are used to access image derivatives
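A CQL query travels to an SRU server as the `query` parameter of a `searchRetrieve` request. The sketch below shows how such a request URL could be assembled; the base URL and the `title` index name are placeholders, and the SRU version the project used is assumed to be 1.1.

```python
from urllib.parse import urlencode


def build_sru_url(base_url: str, cql_query: str, max_records: int = 10) -> str:
    """Build an SRU searchRetrieve URL carrying a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",          # assumed; not stated in the slides
        "query": cql_query,
        "maximumRecords": str(max_records),
    }
    # urlencode percent-encodes the quotes and equals sign in the CQL string.
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint and index name, for illustration only.
url = build_sru_url("http://example.org/sru", 'title = "waltz"')
```

The response would be an XML document of matching records (MODS in this architecture), which the web application can then pass through XSLT for display.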
36 METS Navigator
- METS Navigator is used to page through scores online
- Uses METS structMap to facilitate navigation
- Allows views of multiple sizes of images
- Released by IU as open source; see http://metsnavigator.sourceforge.net
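To show how a structMap can drive page-by-page navigation, the sketch below reads page divisions from a small METS fragment and returns them in display order. It does not reproduce METS Navigator's own code; the sample document, its `TYPE` values, and the file IDs are made up for illustration, while `structMap`, `div`, `fptr`, `ORDER`, `LABEL`, and `FILEID` are standard METS names.

```python
import xml.etree.ElementTree as ET

METS_NS = {"mets": "http://www.loc.gov/METS/"}

# Made-up METS fragment; pages are deliberately listed out of order.
SAMPLE_METS = """<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:structMap TYPE="physical">
    <mets:div TYPE="score">
      <mets:div TYPE="page" ORDER="2" LABEL="Page 1">
        <mets:fptr FILEID="img-0002"/>
      </mets:div>
      <mets:div TYPE="page" ORDER="1" LABEL="Cover">
        <mets:fptr FILEID="img-0001"/>
      </mets:div>
    </mets:div>
  </mets:structMap>
</mets:mets>"""


def page_sequence(mets_xml: str) -> list:
    """Return (label, file id) pairs sorted by the ORDER attribute."""
    root = ET.fromstring(mets_xml)
    pages = []
    for div in root.findall(".//mets:structMap//mets:div[@TYPE='page']", METS_NS):
        fptr = div.find("mets:fptr", METS_NS)
        pages.append((int(div.get("ORDER")), div.get("LABEL"), fptr.get("FILEID")))
    return [(label, file_id) for _, label, file_id in sorted(pages)]
```

Sorting on `ORDER` rather than document position means the viewer shows pages in the intended sequence even if the divisions were written out of order.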
37 IN Harmony Technical Overview
[Architecture diagram, repeated from slide 20]
38 IN Harmony Links
- IN Harmony Public Interface
- IN Harmony Project Information
- Cataloging Tool: release date June 2008