Title: Building Digital Libraries Made Easy: Toward Open Digital Libraries ICADL 2002
1Building Digital Libraries Made Easy: Toward Open Digital Libraries
ICADL 2002, Singapore, Dec. 2002
- Edward A. Fox
- (with Hussein Suleman, Ming Luo)
- fox_at_vt.edu, http://fox.cs.vt.edu
- CS DLRL Internet TIC
- NDLTD CITIDEL NSDL
- Virginia Tech, Blacksburg, VA, USA
2Acknowledgements (Selected)
- Sponsors: ACM, Adobe, DLF, IBM, Mellon Foundation, Microsoft, NSF (Grants CDA-9312611; DUE-0121741, 0136690, 0121679; IIS-0080748, 0086227, 0002935, and 9986089), OCLC, SOLINET, UNESCO, US Dept. Ed. (FIPSE), VTLS, ...
- Faculty/Staff (now): Boots Cassel, Su-Shing Chen, Debra Dudley, Jeremy Frumkin, Joe Futrelle, Lee Giles, Martin Halbert, Rex Hartson, John Impagliazzo, Deborah Knox, JAN Lee, Kurt Maly, Gail McMillan, Eric Morgan, Manuel Perez, Muhammad Zubair, ...
- Students: Fernando Das Neves, Marcos Goncalves, Rohit Kelapure, Aaron Krowne, Paul Mather, Ryan Richardson, Priya Shivakumar, Wensi Xi, Liang Xu, Baoping Zhang, ...
3Outline
- Overview, Problem
- Experience Case Study Projects
- Open Archives Initiative
- Hussein Suleman Dissertation
- DL in a Box, OCKHAM
- Summary and Conclusion
4Overview
- We
- address the problem of how to develop DLs
- build on experience in building many DLs
- strive for simplicity as per OCKHAM initiative
- build upon the Open Archives Initiative
- demonstrate our approach in diverse situations
- and invite all to
- use DL-in-a-box and
- help build Open Digital Libraries.
5Problem
- Why do DL developers continue to reinvent the wheel? The top 10 reasons are:
- The library budget won't allow purchase of a commercial DL system.
- Unless the development effort is local, there won't be any control.
- DLs are extensions of DBMSs, so they are simple applications to develop.
- Since DLs operate on the Web, one must adopt the newest W3C proposal.
6Problem (cont'd)
- Since technology moves so quickly, it is essential to follow the latest fad.
- CS students always develop from scratch.
- This team knows it can do it better.
- This system must have more capabilities than any other system.
- This DL has to be more flexible and extensible.
- This is the right system architecture at last!
7Outline
- Overview, Problem
- Experience Case Study Projects
- Open Archives Initiative
- Hussein Suleman Dissertation
- DL in a Box, OCKHAM
- Summary and Conclusion
8Experience Case Study Projects
- AmericanSouth.org
- NDLTD
- CSTC
- JERIC
- CITIDEL
- NSDL
- Digital Library in a Box
9AmericanSouth.org
- Domain: culture and history of the southern region of America (USA)
- Genre: diverse distributed collections at a dozen universities
- Submission and Collection: local sites -> Emory University (for SOLINET)
10Networked Digital Library of Theses and Dissertations (NDLTD)
- Domain: graduate education and research
- Genre: electronic theses and dissertations (ETDs)
- Submission and Collection: local sites -> www.ndltd.org, www.theses.org
11Computer Science Teaching Center (CSTC)
- Domain: teaching computer science
- Genre: courseware
- Submission and Collection: www.cstc.org
12CS Teaching Center (CSTC) Lessons Learned
- Instead of building large, expensive multimedia packages that become obsolete and are difficult to re-use, concentrate on small knowledge units.
- Learners benefit from having well-crafted modules that have been reviewed and tested.
- Use digital libraries to build a powerful base of support for learners, upon which a variety of courses, self-study tutorials, and reference resources can be built.
14Browsing (2)
18ACM Journal of Educational Resources in Computing (JERIC)
- Domain: teaching computer science
- Genre: courseware, scholarly articles
- Submission and Collection: CSTC, ACM Digital Library
19JERIC
- Journal of Educational Resources in Computing
- Accessible from www.cstc.org, www.acm.org, and www.citidel.org
- ACM and SIGCSE support
- Refereed and interactive
- Part of ACM Digital Library
20Computing and Information Technology Interactive Digital Educational Library (CITIDEL)
- Domain: computing / information technology
- Genre: one-stop shopping for teachers and learners: courseware (CSTC, JERIC), leading DLs (ACM, IEEE-CS, DBLP, CiteSeer), PlanetMath.org, technical reports, ...
- Submission and Collection: sub/partner collections -> www.citidel.org
21CITIDEL Team
- An NSDL Collection Track project
- Led by Virginia Tech, with co-PIs
- Fox (director, DL systems)
- Lee (history)
- Perez (user interface, Spanish support)
- Partners
- College of New Jersey (Knox)
- Hofstra (Impagliazzo)
- Villanova (Cassel)
- Penn State (Giles)
22Summary of Spring 2001 Survey of CITIDEL-related Collections and their Sizes
Size of Collection | 1-5 items | 6-100 items | 101-999 items | 1000+ items
Number of Collections Identified | 100-300 | 50 | 20-35 | 10-25
23Multi-dimensional Categorization
24CITIDEL Collection Sources
[Diagram. Labels: Experts' finding aids; IEEE-CS; CSTC; Research Index; ACM; NCSTRL; metadata; fulltext; NEC's data; data processed w. R.I.; Borner's info viz software repository; JERIC; SIGCSE proceedings; ACM DL. Connectors are labeled "include".]
25CITIDEL Collection Building
[Diagram. Labels: Nominating; Submitting; Creating; Searching, Browsing; Crawling; Composing; Classifying; GetSmart; Crawlifier; VIADUCT. Connectors are labeled "thru", "or thru", "after", "include after", "aided by", and "using".]
26Overview of CITIDEL architecture
27Distributed repository structure
28Digital library architecture for local and
interoperable CITIDEL services
29National Science Digital Library (NSDL)
- Domain: undergraduate and K-12 education, etc.
- Genre: educational resources
- Submission and Collection: sites of 90 projects -> www.nsdl.org
30NSDL Information Architecture (developed by the Technical Infrastructure Workgroup)
[Diagram. Labels: User Interfaces; Core NSDL Bus; Usage Enhancement; Collection Building.]
31Digital Library in a Box
- Domain: helping DL projects
- Genre: any domain, but especially those involved in NSDL (since funding is in part through NSDL, with U. FL and NCSA)
- Software and Documentation: http://dlbox.nudl.org
32Outline
- Overview, Problem
- Experience Case Study Projects
- Open Archives Initiative
- Hussein Suleman Dissertation
- DL in a Box, OCKHAM
- Summary and Conclusion
33Open Archives Initiative OAI www.openarchives.org
openarchives_at_openarchives.org
34The World According to OAI
Service Providers
Discovery
Current Awareness
Preservation
Data Providers
35Technical Umbrella for Practical Interoperability
[Diagram: metadata harvesting as a technical umbrella that can be exploited by different communities. Community labels: Reference Libraries; Museums; Publishers; E-Print Archives.]
36Tiered Model of Interoperability
Mediator services
Metadata harvesting
Document models
37OAI Black Box Perspective
[Diagram. Labels: Services (Browse, Summarize, Search, Visualize); Metadata; Docs; seven boxes labeled "DO".]
38Aggregation through OAI Harvesting
IEEE-CS, ACM, ...
39Protocol for Metadata Harvesting
- Service Requests
- Identify
- ListMetadataFormats
- ListSets
- GetRecord
- ListIdentifiers
- ListRecords
- Metadata Multiplicity
- Date/Time Ranges
- Sets (with semantics depending on local data providers)
- Resumption Tokens
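The six service requests and their selection arguments can be illustrated with a small request builder. This is a minimal sketch, not part of the original slides: the base URL and the function name `build_request` are assumptions made for illustration, while the verb and argument names (`metadataPrefix`, `from`, `until`, `set`, `resumptionToken`) are those of OAI-PMH.

```python
from urllib.parse import urlencode

# The six OAI-PMH service requests (verbs) listed on this slide.
VERBS = {
    "Identify", "ListMetadataFormats", "ListSets",
    "GetRecord", "ListIdentifiers", "ListRecords",
}

def build_request(base_url, verb, **args):
    """Return an OAI-PMH GET URL for the given verb and arguments.

    Python reserves `from`, so callers pass `from_`; the trailing
    underscore is stripped before the arguments are encoded.
    """
    if verb not in VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    params = {"verb": verb}
    for name, value in args.items():
        if value is not None:
            params[name.rstrip("_")] = value
    # A resumption token continues an earlier list request and is sent
    # on its own, without the original selection arguments.
    if "resumptionToken" in params:
        params = {"verb": verb, "resumptionToken": params["resumptionToken"]}
    return f"{base_url}?{urlencode(params)}"

# Hypothetical base URL, for illustration only.
BASE = "http://example.org/oai"
print(build_request(BASE, "Identify"))
print(build_request(BASE, "ListRecords", metadataPrefix="oai_dc",
                    from_="2002-01-01", until="2002-12-31", set="cs"))
```

The date range and set arguments narrow a list request; what a given set name means is up to each data provider, as the slide notes.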
40NDLTD OAI Example
41Outline
- Overview, Problem
- Experience Case Study Projects
- Open Archives Initiative
- Hussein Suleman Dissertation
- DL in a Box, OCKHAM
- Summary and Conclusion
42Open Digital Library (ODL) Hypothesis (Hussein Suleman)
- Can we leverage the successful model of the OAI Protocol for Metadata Harvesting to alleviate our architectural problems?
- Maybe, if Digital Libraries can be modeled as networks of extended Open Archives, where each extended Open Archive is a source of data and/or a provider of services.
43Example Architecture (NDLTD)
[Diagram. Legend: User Interface; OAI/ODL archive; OAI/ODL protocol. Labels: User Interface; Browse; Search; Recent; Union Catalog; MIT Filter; sites Virginia Tech, PhysNet, Humboldt, Duisburg, CalTech, Dresden, MIT.]
44ODL Demonstration - FrontPage
45ODL Demonstration - Search
46ODL Demonstration - Browse
47Hussein Suleman's Thesis Summary
- Open Digital Libraries (ODLs)
- Open Archives Initiative (OAI)
- Protocol for Metadata Harvesting (PMH)
- Extending OAI-PMH provides the glue for building componentized DLs.
- Lightweight protocols connect the components to support modular systems with good efficiency.
48Research in a Nutshell
- We build extensible modular systems with customizable services.
- This supports interoperability and allows distributed development.
- This is in use in www.cstc.org, AmericanSouth.org, www.citidel.org, ...
- Components include search, browse, annotate, editorial support, union, filter, what's-new, submit, rate, recommend, ...
49?
users
digital objects
50componentized digital library
51open digital library
52ODL Component Requirements
- Search
- Retrieve a list of items
- Index new items
- Annotate
- Add annotation to item
- Retrieve a list of annotations for an item
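The two requirement sets above map naturally onto two small interfaces. The following is a minimal in-memory sketch, assuming hypothetical class and method names (`SearchComponent`, `AnnotateComponent`, and so on) that do not come from the ODL software itself; in ODL the components sit behind protocol requests rather than direct method calls.

```python
class SearchComponent:
    """Indexes new items and retrieves a list of matching items."""

    def __init__(self):
        self._index = {}  # word -> set of item identifiers

    def index(self, item_id, text):
        # Index a new item under each of its lower-cased words.
        for word in text.lower().split():
            self._index.setdefault(word, set()).add(item_id)

    def search(self, query):
        # Retrieve the items that contain every word of the query.
        words = query.lower().split()
        if not words:
            return []
        hits = set.intersection(*(self._index.get(w, set()) for w in words))
        return sorted(hits)


class AnnotateComponent:
    """Adds annotations to items and lists the annotations of an item."""

    def __init__(self):
        self._notes = {}  # item identifier -> list of annotation texts

    def annotate(self, item_id, text):
        self._notes.setdefault(item_id, []).append(text)

    def annotations(self, item_id):
        return list(self._notes.get(item_id, []))


search = SearchComponent()
search.index("etd-1", "Open digital libraries")
search.index("etd-2", "Digital preservation")
notes = AnnotateComponent()
notes.annotate("etd-1", "Useful overview")
print(search.search("digital"), notes.annotations("etd-1"))
```

Keeping each component this narrow is what lets them be replaced or recombined independently.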
53Open Digital Library Components
- Running now
- XML-File (data provider from file system)
- Union, search, browse, recent, filter
- E-journal/review, Submit, Edit, Annotation
- Class projects
- High performance multilingual search
- Recommender, Rating, Mirroring (see JCDL '02)
- Working with NCSA from DB, unstructured text
- Others discussed
- Classification/categorization
- DL-Viz interconnection (VIDI Jun Wang ETD)
54Open Digital Library Extended
[Diagram. Service roles: Metadata Search Service Provider; Metadata Browse Service Provider; What's New Service Provider; Annotation Search Service Provider; Recommend/Rate Service Provider. Components: DBBrowse (browse engine); IRDB-1 and IRDB-2 (search engines); Recommend; What's New engine; Rate engine; Annotation engine; DBUnion (archive merger component, harvesting from data providers); Filter; Submit archive; XML File collection data providers 1-3; OAI-PMH data provider; OAIB (NCSA, from RDBMS).]
55Example Open Digital Library
Digital Library for the Networked Digital Library of Theses and Dissertations (www.ndltd.org)
[Diagram. Labels: Students and researchers; User Interface (Recent, Browse, Search); ODLRecent; ODLBrowse; ODLSearch; ODLUnion; Union; Filter; ETD collections. Connectors are labeled "PMH".]
56Example Open Digital Library
Digital Library for the Computer Science Teaching
Center (www.cstc.org)
57CSTC User Interface
58OPEN ARCHIVE
59Layer 1 OAI PMH
- Protocol for Metadata Harvesting
- Transfer stream of metadata from one archive or component to another
- Service Requests
- Identify, ListSets, ListMetadataFormats
- GetRecord, ListIdentifiers, ListRecords
60Layer 2 Extended OAI-PMH
- OAI-PMH extensions for general-purpose inter-component communication
- Added generic containers to every response for additional information
- Added PutRecord to submit a record
- Increased granularity to support times as well as dates (same as OAI-PMH v2.0)
- Ignored DC requirement
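The finer granularity can be shown with the two datestamp forms that OAI-PMH v2.0 defines: a day-level `YYYY-MM-DD` and a seconds-level `YYYY-MM-DDThh:mm:ssZ`, both in UTC. The helper below is a small sketch; the function name `oai_datestamp` is an assumption for illustration.

```python
from datetime import datetime, timezone

def oai_datestamp(moment, granularity="seconds"):
    """Format a timezone-aware datetime as a UTC datestamp.

    "day" gives YYYY-MM-DD; "seconds" gives YYYY-MM-DDThh:mm:ssZ.
    """
    if moment.tzinfo is None:
        raise ValueError("moment must be timezone-aware")
    moment = moment.astimezone(timezone.utc)
    if granularity == "day":
        return moment.strftime("%Y-%m-%d")
    if granularity == "seconds":
        return moment.strftime("%Y-%m-%dT%H:%M:%SZ")
    raise ValueError(f"unsupported granularity: {granularity}")

when = datetime(2002, 12, 11, 9, 30, 0, tzinfo=timezone.utc)
print(oai_datestamp(when, "day"))
print(oai_datestamp(when, "seconds"))
```

With seconds-level stamps, a component that harvests from another several times a day can ask only for what changed since its last request.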
61Layer 3 ODL Protocols
- Specialized protocol semantics for different components, e.g.
- Search component uses ODLSearch protocol
- ListRecords and ListIdentifiers embed query terms in the set parameter
- Annotation component uses ODLAnnotate protocol
- ListRecords and ListIdentifiers specify the item for which annotations are requested in the set parameter
- PutRecord adds an annotation to an item
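The idea of reusing the `set` parameter can be sketched as follows. The slides state only that query terms (for search) or the target item (for annotations) travel in `set`; the exact layout of the value used here (`query/start/stop`), the base URLs, and the function names are assumptions made for illustration and should not be read as the ODL specification.

```python
from urllib.parse import urlencode, urlparse, parse_qs

def search_request(base_url, query, start=1, stop=10):
    """Build a ListIdentifiers request that carries a query in `set`.

    The "query/start/stop" layout is hypothetical.
    """
    set_spec = f"{query}/{start}/{stop}"
    params = {"verb": "ListIdentifiers", "metadataPrefix": "oai_dc",
              "set": set_spec}
    return f"{base_url}?{urlencode(params)}"

def annotations_request(base_url, item_id):
    """Build a ListRecords request for the annotations of one item."""
    params = {"verb": "ListRecords", "metadataPrefix": "oai_dc",
              "set": item_id}
    return f"{base_url}?{urlencode(params)}"

# Hypothetical component endpoints, for illustration only.
search_url = search_request("http://example.org/search", "digital library")
annotate_url = annotations_request("http://example.org/annotate",
                                   "oai:example:1")
print(search_url)
print(annotate_url)
# Decoding the query string recovers the embedded values.
print(parse_qs(urlparse(search_url).query)["set"])
```

Because both requests are still ordinary ListIdentifiers or ListRecords calls, a generic OAI-PMH client can talk to a search or annotation component without new verbs.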
62Performance Optimizations
- Caching of responses
- Persistent CGI mechanisms
- FastCGI
- SpeedyCGI
- Request multiple records in a single operation
(proposed)
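Caching of responses, the first optimization listed, can be sketched as a small time-limited cache in front of whatever function performs the request. The class name, the default lifetime, and the injected `fetch` and `clock` callables are assumptions for illustration, not the mechanism used in the ODL components.

```python
import time

class ResponseCache:
    """Cache responses by request URL for a limited time."""

    def __init__(self, fetch, ttl_seconds=300, clock=time.monotonic):
        self._fetch = fetch    # function: request URL -> response text
        self._ttl = ttl_seconds
        self._clock = clock    # injectable so tests can control time
        self._store = {}       # request URL -> (time stored, response)

    def get(self, request_url):
        now = self._clock()
        entry = self._store.get(request_url)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]    # still fresh: skip the component call
        response = self._fetch(request_url)
        self._store[request_url] = (now, response)
        return response

calls = []

def slow_component(url):
    # Stand-in for a real request to another component.
    calls.append(url)
    return f"<response for {url}>"

cache = ResponseCache(slow_component, ttl_seconds=60)
cache.get("http://example.org/oai?verb=Identify")
cache.get("http://example.org/oai?verb=Identify")
print(len(calls))  # the second call was served from the cache
```

Since every inter-component call is a plain request URL, the URL itself works as the cache key.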
63What have we accomplished?
- Complete protocol-level separation among components within the DL
- Seamless integration with little glue
- Simple extensions of OAI-PMH
- Modular and portable components
- Efficient in speed but not as efficient in storage
64Outline
- Overview, Problem
- Experience Case Study Projects
- Open Archives Initiative
- Hussein Suleman Dissertation
- DL in a Box, OCKHAM
- Summary and Conclusion
65Digital Library In A Box
- http://dlbox.nudl.org
- Part of NSF's National Science Digital Library (www.nsdl.org)
- Offers shrink-wrap Open Digital Library components, open source software
- Users install ready-made digital library solutions, or build their own from snap-together components.
67OCKHAM
- Simplicity (à la Occam's razor)
- Support by Mellon and DLF
- Next meeting in Atlanta Jan. 8, 2003
- Four main ideas
- Components
- Lightweight protocols
- Open reference models (e.g., 5S, OAIS)
- Community perspective and involvement
68 5S Layers
Societies
Scenarios
Spaces
Structures
Streams
69Outline
- Overview, Problem
- Experience Case Study Projects
- Open Archives Initiative
- Hussein Suleman Dissertation
- DL in a Box, OCKHAM
- Summary and Conclusion
70Summary and Conclusion
- It is possible to build DLs easily.
- The ODL approach to this has been developed and validated in a number of settings.
- Everyone is invited to
- Use ODL components
- Refine or add ODL components, protocols
- Join ODL and OCKHAM
- For more information see
71(Somewhat) Open Issues
- Is this scalable? Portable? Extensible?
- Can we define all popular DL services using such a methodology? (completeness problem)
- Can we define DLs as configurations of ODL components? (composition problem)
- Is OAI-PMH a good baseline protocol? Can we design a better baseline protocol upon which to base harvesting and repository access?
- To what degree is an ODL network equivalent to a monolithic system? (comparison problem)
72Ultimate Goal
- Package different configurations into instant DL systems or subsystems
- DL building = component configuration
- All DLs speak the same language(s)
- Basic services are trivial to provide, so more effort is spent on advanced capabilities of DLs
73Selected Links
- CITIDEL www.citidel.org
- NCSTRL www.ncstrl.org
- NDLTD www.ndltd.org
- NSDL www.nsdl.org
- Open Archives Initiative
- www.openarchives.org
- www.openarchives.org/OAI/openarchivesprotocol.htm
- www.dlib.vt.edu/projects/OAI/
74More Links
- Hussein Suleman's Dissertation
- http://purl.org/net/hsdiss/odl.pdf
- Repository Explorer
- http://purl.org/net/oai_explorer
- DL Courseware: http://ei.cs.vt.edu/dlib
- Virginia Tech Digital Library Research Laboratory (DLRL): www.dlib.vt.edu
- Listservs
- dl-in-a-box-l_at_listserv.vt.edu
- ockham-sys_at_listserv.cc.emory.edu