Title: Next Generation Digital Libraries Supporting Interoperability, Semantics, and Quality
1Part 4
- Next Generation Digital Libraries Supporting Interoperability, Semantics, and Quality
8OAI, ODL, DL-in-a-box
- Open Archives Initiative
- since 1999, www.openarchives.org
- Open Digital Libraries
- since 2001, from www.dlib.vt.edu
- with Hussein Suleman (now U. Cape Town)
- DL-in-a-box
- NSDL support since 2001
- Aimed to help new collections / services projects
- http://dlbox.nudl.org
9Open Archives Initiative (OAI)
- Advocacy for interoperability
- Standard for transferring metadata among digital libraries
- Protocol for Metadata Harvesting (PMH)
- Simplicity
- Generality
- Extensibility
- Support for PMH => Open Archive (OA)
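As a rough illustration of the simplicity point, a PMH request is an ordinary HTTP GET whose query string carries a verb and its arguments. The sketch below only builds such request URLs. The base URL is a hypothetical placeholder and `harvest_url` is a helper name introduced here for illustration; the verbs `Identify` and `ListRecords` and the `oai_dc` metadata prefix come from the protocol itself.

```python
from urllib.parse import urlencode

# Hypothetical repository endpoint; substitute a real data provider's base URL.
BASE_URL = "http://example.org/oai"


def harvest_url(verb, **arguments):
    """Build an OAI-PMH request URL for the given verb and its arguments."""
    params = {"verb": verb}
    params.update(arguments)
    return BASE_URL + "?" + urlencode(params)


# Ask the repository to describe itself.
identify = harvest_url("Identify")

# Harvest all records as unqualified Dublin Core.
list_records = harvest_url("ListRecords", metadataPrefix="oai_dc")
```

A harvester would fetch these URLs and parse the XML responses; that part is left out to keep the sketch short.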
10OAI Technical Umbrella for Practical Interoperability
Metadata Harvesting
Reference Libraries
Museums
Publishers
E-Print Archives
that can be exploited by different communities
11OAI Repository Perspective
Required Protocol
Set Structure
URI Scheme
MDO (8 shown)
Required DC
DO (4 shown)
12OAI Black Box Perspective
13Tiered Model of Interoperability
Mediator services
Metadata harvesting
Document models
14The World According to OAI
Service Providers
Discovery
Current Awareness
Preservation
Data Providers
15?
users
digital objects
16Monolithic and/or Custom-built web-based application
?
?
digital library
17componentized digital library
18open digital library
19Protocol for Metadata Harvesting
20OPEN ARCHIVE
21Open Digital Library Deployments
- NDLTD (www.ndltd.org)
- Computer Science Teaching Center (www.cstc.org)
- Computing and Information Technology Interactive Digital Educational Library (www.citidel.org)
- Open Archives Distributed (NSF, DFG) enhancements to PhysNet
- OCKHAM
- Open to others through DL-in-a-box
22Open Digital Library
- Network of Extended Open Archives where each node acts as a provider of data, services, or both.
- Component = Node
- Protocol = Arc
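The node-and-arc view can be sketched as a small graph structure. This is a minimal sketch, not the ODL implementation: the class name, method names, and the example wiring are all assumptions made for illustration.

```python
from collections import defaultdict


class OpenDigitalLibrary:
    """A network whose nodes are components and whose arcs are protocols."""

    def __init__(self):
        self.roles = {}                # node name -> set of roles it plays
        self.arcs = defaultdict(list)  # consumer -> [(provider, protocol)]

    def add_node(self, name, roles):
        # Each node provides data, services, or both.
        allowed = {"data", "service"}
        if not roles or not set(roles) <= allowed:
            raise ValueError("roles must be a non-empty subset of %r" % allowed)
        self.roles[name] = set(roles)

    def connect(self, consumer, provider, protocol="PMH"):
        # An arc records which protocol the consumer uses to reach the provider.
        for node in (consumer, provider):
            if node not in self.roles:
                raise KeyError("unknown node: " + node)
        self.arcs[consumer].append((provider, protocol))

    def providers_of(self, consumer):
        return [provider for provider, _ in self.arcs[consumer]]


# Illustrative wiring only: a search service reads from a union of two archives.
odl = OpenDigitalLibrary()
odl.add_node("archive-a", {"data"})
odl.add_node("archive-b", {"data"})
odl.add_node("union", {"data", "service"})
odl.add_node("search", {"service"})
odl.connect("union", "archive-a")
odl.connect("union", "archive-b")
odl.connect("search", "union")
```

Because the union node is itself a data provider, the search node reaches both archives through a single arc.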
23Open Digital Library Components
- Running now
- XML-File (data provider from file system)
- Search: simple or in-memory (Essex) or generalized
- Union, browse, recent, filter
- E-journal/review, Submit, Edit, Annotation
- Recommender, Rating, Mirroring (see JCDL02)
- Working with NCSA from DB, unstructured text
- Others in process
- Classification/categorization
- Registry (and other connections with web services)
24Example Open Digital Library
Diagram labels: USER INTERFACE; Recent (ODLRecent); Browse (ODLBrowse); Search (ODLSearch); Union and Filter (ODLUnion); links labeled PMH
ETD DL for the Networked Digital Library of Theses and Dissertations (www.ndltd.org)
Students and researchers
ETD collections
25Open Digital Library Extended
As Metadata Search Service Provider
As Metadata Browse Service Provider
As What's New Service Provider
As Annotation Search Service Provider
As Recommend Rate Service Provider
DBBrowse Browse Engine
IRDB-1 Search Engine
Recommend
IRDB-2 Search Engine
What's New Engine
Rate Engine
XML File Coll. Data Provider 1
DBUnion Archive Merger Component
Annotation Engine
Harvest from data providers
XML File Coll. Data Provider 2
Filter
XML File Coll. Data Provider 3
OAI-PMH Data Provider
Submit Archive
OAIB (NCSA from RDBMS)
26CITIDEL Technology Features
- Component architecture (Open Digital Library)
- Re-use and compose re-deployable digital library components.
- Built Using Open Standards Technologies
- OAI: Used to collect DL Resources and DL Interoperability
- XSL and XML: Interface rendering with multi-lingual community based translation of screens and content (Spanish, )
- Perl: Component Integration
- ESSEX Search Engine Functionality
- Very fast, utilizing in-memory processing
- Includes snap-shots for persistence
- Multi-scheming
- Integrates multiple classifications / views
through maps, closure
27Multi-dimensional Categorization
28OCKHAM Initiative, Contact Info
- Supported by DL Federation, Mellon, NSF,
- P2P University Network involving
- Emory, Notre Dame, U. Arizona, Virginia Tech,
- PI: Martin Halbert
- Phone: 404-727-2204
- Email: mhalber_at_emory.edu
- OCKHAM URL
- http://ockham.library.emory.edu
29The Problem
- Digital library development is complex and expensive.
- Various DL development communities (in the USA at least) are not working together well.
- Results exhibit much incompatibility, little common practice, slow progress, and no leverage on investment.
- If this continues, we are just going to languish and fester.
30Lightweight Protocols
- Lightweight (relatively small and simple) protocols seem to have clear advantages over full protocols that attempt to be comprehensive.
- The success of protocols considered lightweight is illuminating.
- Examples: TCP/IP, HTTP, LDAP, and the OAI PMH
31Reference Models
- Reference Model: a common vocabulary and description of components, services, and inter-relationships that comprise a system under consideration
- Useful as a tool to foster consensus and common understanding in a time of rapid change and/or disagreement
- Explored in CS6604 class project with 2 focus groups: librarians, education experts
32Current Focus: Peer-to-Peer (P2P) Lightweight (Protocol) Reference Models
- Builds on successful example of the OAI PMH: clearly understood minimalist concept of metadata distribution, implemented in simple protocols (e.g., ODL)
- Leads to developing simple reference models of specific subsystems, with associated simple protocols and standards
- Testing in NSDL, connecting university libraries to support teaching and learning
33OCKHAM Proposed Services
- Alerting
- Browsing
- Cataloging
- Conversion
- OAI Z39.50
- Pathfinding
- Registry prototype in CS6604 now
- (plus others such as from adapted ODL)
34DL Student Research Gonçalves
- 5S as a basis for developing digital libraries
- Theory
- Syntax, Semantics Definitions, Relationships
- Specification of requirements
- Generation of systems
- Quality
35Motivation for 5S
- DLs are not benefiting from formal theories as have other CS fields: DB, IR, PL, etc.
- DL construction is difficult and ad hoc, lacking support for tailoring/customization.
- Conceptual modeling, requirements analysis, and methodological approaches are rarely supported in DL development.
- Lack of specific DL models, formalisms, languages
37 5S Layers
Societies
Scenarios
Spaces
Structures
Streams
38 5S Model Examples, Objectives
39Intra-Model Relationships: Streams
- Participant concepts: text, image, video, audio
- Relations
- contains $\subseteq$ (video $\times$ image) $\cup$ (video $\times$ audio)
- Streams define the basic content types over which digital objects are built, the latter being the ultimate carriers of the information in the DL.
- However, some complex types of streams (e.g., video) may themselves be associated with simpler types of streams (e.g., images, audio).
- This relation indicates that a video contains an image as one of its frames or a specific audio recording.
41DL Services/Activities Taxonomy (Gonçalves)
Infrastructure Services
- Repository-Building
- Creational: Acquiring, Cataloging, Crawling (focused), Describing, Digitizing, Federating, Harvesting, Purchasing, Submitting
- Preservational: Conserving, Converting, Copying/Replicating, Emulating, Renewing, Translating (format)
- Add Value: Annotating, Classifying, Clustering, Evaluating, Extracting, Indexing, Measuring, Publicizing, Rating, Reviewing (peer), Surveying, Translating (language)
Information Satisfaction Services
- Browsing, Collaborating, Customizing, Filtering, Providing access, Recommending, Requesting, Searching, Visualizing
42Services, Definitions, Parameters
- In the table each service is characterized by
- parameters (input, output)
- of the initial and final events
- of the scenarios that compose those services and
- respective pre- and post-conditions which are represented in terms of rules on DL relations.
- All other previous definitions and keys apply here.
- That set is complemented with the following definitions
43Services Related Definitions
- A query q is the representation of user interest or information need.
- Hyptxt is a hypertext wherein anchor is a node.
- A log_entry is a descriptive metadata specification about an event of a scenario.
- Let $\{do_i\} = \{do_1, do_2, \ldots, do_n\}$ be a set of digital objects and $Ct = \{c_1, c_2, \ldots, c_n\}$ a set of labels for categories. A classifier $class_{Ct}: \{do_i\} \rightarrow 2^{Ct}$ is a function that maps a digital object to a set of categories.
- A cluster $clu_k = \{do_{1k}, do_{2k}, \ldots, do_{nk}\}$ is a subset of a set of digital objects.
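The classifier and cluster definitions can be made concrete with a small sketch. Everything here is an illustrative assumption: digital objects are stood in for by title strings, and the keyword rule is a toy classifier, not a method from the slides. The point is only the shape of the function, from one digital object to a set of category labels.

```python
def make_keyword_classifier(rules):
    """Return a classifier that maps a digital object to a set of category labels.

    `rules` maps each category label to a keyword (a hypothetical toy rule).
    """
    def classify(digital_object):
        text = digital_object.lower()
        return {label for label, keyword in rules.items() if keyword in text}
    return classify


# Hypothetical digital objects, represented only by their titles.
digital_objects = {
    "Video retrieval in digital libraries",
    "Metadata harvesting with PMH",
    "A survey of recommender systems",
}

# Hypothetical category labels and their trigger keywords.
classify = make_keyword_classifier({"IR": "retrieval", "DL": "digital librar"})

# One object may receive several labels, or none (the empty set).
labels = classify("Video retrieval in digital libraries")

# A cluster is simply a subset of the set of digital objects.
cluster = {"Video retrieval in digital libraries", "Metadata harvesting with PMH"}
```

Returning a set rather than a single label mirrors the codomain $2^{Ct}$ in the definition.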
46DL Services I/O Behavior
- Regarding the prior figure, which shows
- Instantiations of the Services Definition model
- Inputs and outputs of examples of infrastructure and information satisfaction DL services
- Key
- $C_{DL}$: collection
- $I_{C_{DL}}$: index for collection $C_{DL}$
- $do_i$: digital object
- Soc: Society
48Defining Quality in Digital Libraries
50Completeness of Metadata (1)
- Degree of completeness of a metadata specification $ms_x$
- $Completeness(ms_x) = 1 - \dfrac{\text{no. of missing attributes in } ms_x}{\text{total attributes of the schema to which } ms_x \text{ conforms}}$
- According to 5S definition of conformance
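The completeness measure is simple enough to compute directly. The sketch below assumes a record is a plain dictionary and treats an absent or empty attribute as missing; both choices, and the sample record, are assumptions made here. The schema used is the 15-element Dublin Core set.

```python
# The 15 elements of unqualified Dublin Core.
DUBLIN_CORE = [
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
]


def completeness(record, schema_attributes):
    """Return 1 - (missing attributes / total attributes in the schema)."""
    if not schema_attributes:
        raise ValueError("schema must define at least one attribute")
    # Assumption: an attribute that is absent or empty counts as missing.
    missing = [name for name in schema_attributes if not record.get(name)]
    return 1 - len(missing) / len(schema_attributes)


# Hypothetical record with 12 of the 15 attributes filled in.
record = {name: "some value" for name in DUBLIN_CORE[:12]}
score = completeness(record, DUBLIN_CORE)


def average_completeness(records, schema_attributes):
    """Average the per-record scores over a whole collection of records."""
    return sum(completeness(r, schema_attributes) for r in records) / len(records)
```

Averaging the per-record score over a collection gives a single figure for the whole archive.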
51Completeness of Metadata (2)
- Example of application
- OCLC NDLTD Union
- average of completeness of all metadata specifications (records)
- of the NDLTD Union Archive
- administered by OCLC
- as of Feb. 23, 2004
- with respect to the Dublin Core metadata standard (15 attributes)
52Completeness of Metadata (3)
53Collection Completeness (1)
- Defn: A complete DL collection $C_x$ is one which contains all the pertinent existing digital objects.
- $completeness(C_x) = |C_x| \,/\, |\text{ideal collection}|$
- can be defined as the ratio between the size of $C_x$ and the size of the ideal real-world collection
54Collection Completeness (2)
- Example of use: Computing collections
- The ACM Guide is a collection of bibliographic references and abstracts of works published by ACM and other publishers.
- The Guide can be considered a good approximation of an ideal computing collection: it contains most of the different types of computing-related literature (about 735K works)
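Collection completeness, defined as the size of a collection over the size of the ideal collection, reduces to one division once an ideal collection is chosen. In this sketch the ideal size is the approximate 735K figure quoted for the ACM Guide, while the collection size is a made-up number used purely for illustration.

```python
def collection_completeness(collection_size, ideal_size):
    """Ratio between the size of a collection and the size of the ideal one."""
    if ideal_size <= 0:
        raise ValueError("ideal collection must be non-empty")
    return collection_size / ideal_size


# Approximate size of the ACM Guide, taken as the ideal computing collection.
ACM_GUIDE_SIZE = 735_000

# Hypothetical computing collection holding 147,000 works.
ratio = collection_completeness(147_000, ACM_GUIDE_SIZE)
```

The result is only as good as the approximation of the ideal collection, which is why the choice of reference collection matters.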
56Reliability (1)
- Scope: operations of DL
- Defn: the probability that the service will not fail during a given period of time [Hansen83]
- Example of use: CITIDEL services
- Example details: using log analysis April 1
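One plausible way to estimate reliability from logs is the fraction of logged requests that did not fail during the period. This estimator is an assumption made here, not necessarily the computation used for CITIDEL, and the log entry format (a service name plus a `failed` flag) is hypothetical.

```python
def estimated_reliability(log_entries, service):
    """Estimate reliability as 1 - (failed requests / total requests)."""
    entries = [e for e in log_entries if e["service"] == service]
    if not entries:
        raise ValueError("no log entries for service: " + service)
    failures = sum(1 for e in entries if e["failed"])
    return 1 - failures / len(entries)


# Hypothetical log for one period: four search requests, one browse request.
log = [
    {"service": "search", "failed": False},
    {"service": "search", "failed": False},
    {"service": "search", "failed": True},
    {"service": "search", "failed": False},
    {"service": "browse", "failed": False},
]

search_reliability = estimated_reliability(log, "search")
browse_reliability = estimated_reliability(log, "browse")
```

Computing the figure per service keeps an unreliable service from being hidden by a heavily used, stable one.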
57Reliability (2)
58Extensibility, Reusability (1)
- Scope: Design and Implementation of DL services
- Two main classes
- Composability of services
- Extensibility
- Reusability
- Quality aspects of models and implementations
- completeness, consistency, correctness, soundness
59Extensibility, Reusability (2)
- $\text{Micro-Reusability}(Serv) = \dfrac{\sum_{sm_x \in SM,\ se_i \in Serv,\ sm_x \text{ runs } se_i} LOC(sm_x) \cdot reused(se_i)}{\sum_{sm \in SM} LOC(sm)}$
- where LOC corresponds to the number of lines of code of a service manager
- $\text{Macro-Reusability}(Serv) = \dfrac{\sum_{se_i \in Serv} reused(se_i)}{|Serv|}$, where reused is an indicator function defined as
- 1, if $\exists\, sm_j : se_j$ reuses $s_i$
- 0, otherwise
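Both reusability measures can be sketched in a few lines. The sketch simplifies the definitions by assuming exactly one service manager per service, so each service carries its own LOC count and a single reused flag. The `Service` type and the sample figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Service:
    name: str
    loc: int      # lines of code of the service manager that runs the service
    reused: bool  # indicator: True if built by reusing an existing component


def macro_reusability(services):
    """Fraction of services that are reused."""
    return sum(1 for s in services if s.reused) / len(services)


def micro_reusability(services):
    """Fraction of all lines of code that belong to reused services."""
    total_loc = sum(s.loc for s in services)
    reused_loc = sum(s.loc for s in services if s.reused)
    return reused_loc / total_loc


# Hypothetical deployment: two componentized services, two custom ones.
services = [
    Service("search", 300, True),
    Service("browse", 100, True),
    Service("annotate", 600, False),
    Service("login", 1000, False),
]

macro = macro_reusability(services)
micro = micro_reusability(services)
```

The two numbers can diverge: here half the services are reused, but they account for only a fifth of the code.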
60Extensibility, Reusability (3)
- Example ETANA-DL
- Consider
- Services
- Use of existing ODL component
- Lines of Code (LOC)
- Reused from component
- Added for implementation
62Extensibility, Reusability (5)
- Macro-Reusability(ETANA DL Services)
- 3/13 ≈ 0.23
- only a few important services are componentized
- Micro-Reusability
- 3630/11910 ≈ 0.305
- we can re-use a very significant percentage of DL code by implementing common DL services as components
63Review of Gonçalves Achievements in Past Year
- Book Chapters
- Fox, E. A., Gonçalves, M. A., Luo, M., Chen, Y., Krowne, A., Zhang, B., McDevitt, K., Pérez-Quiñones, M., Cassel, L. N. Harvesting: Broadening the Field of Distributed Information Retrieval. In Multimedia Distributed Information Retrieval, eds. Fabio Crestani, Mark Sanderson, and Jamie Callan, 2003.
- Fox, E., McMillan, G., Suleman, H., Gonçalves, M. Networked Digital Library of Theses and Dissertations. Invited chapter for Digital Libraries: Policy, Planning, and Practice, eds. Judith Andrews and Derek Law, Ashgate Publishing, 2003.
- Journal papers
- 5S TOIS paper (April 2004 issue)
- S. Perugini, M. A. Gonçalves, and E. A. Fox. A Connection-Centric Survey of Recommender Systems Research. Journal of Intelligent Information Systems, Jun, 2004.
- Zhu, Q., Gonçalves, M. A., Fox, E. A. 5SGraph: A Domain-Specific Visual Modeling Tool for Digital Libraries. Journal of the American Society for Information Science and Technology, submitted 2003, in revision.
- Baoping Zhang, Marcos Andre Goncalves, Yuxin Chen, Edward A. Fox, and Pavel Calado, "Combining Support Vector Machines and Structural Rules for Effective Filtering of OAI-Based Repositories", submitted to Journal of Digital Libraries (Springer Verlag) Special Issue on Asian Digital Libraries, 2004.
64- Conference papers
- Pável P. Calado, Marcos André Gonçalves, Edward A. Fox, Berthier Ribeiro-Neto, Alberto H. F. Laender, Altigran S. da Silva, Davi C. Reis, Pablo A. Roberto, Monique V. Vieira, and Juliano P. Lage. The Web-DL Environment for Building Digital Libraries from the Web. JCDL'2003, Third Joint ACM / IEEE-CS Joint Conference on Digital Libraries, May 27-31, 2003, Houston.
- Marcos André Gonçalves, Ganesh Panchanathan, Unnikrishnan Ravindranathan, Aaron Krowne, Edward A. Fox, Filip Jagodzinski, and Lillian Cassel. The XML Log Standard for Digital Libraries: Analysis, Evolution, and Deployment. Proc. JCDL'2003, Third Joint ACM / IEEE-CS Joint Conference on Digital Libraries, May 27-31, 2003, Houston.
- Qinwei Zhu, Marcos André Gonçalves, Rao Shen, Lillian Cassel, Edward A. Fox. Visual Semantic Modeling of Digital Libraries. ECDL'2003, 7th European Conference on Research and Advanced Technology for Digital Libraries, 17-22 August, 2003, Trondheim, Norway.
- Rohit Kelapure, Marcos André Gonçalves, Edward A. Fox. Scenario-Based Generation of Digital Library Services. ECDL'2003, 7th European Conference on Research and Advanced Technology for Digital Libraries, 17-22 August, Trondheim, Norway.
- Marco Cristo, Pavel Calado, Edleno Moura, Nivio Ziviani, Berthier Ribeiro-Neto, and Marcos André Gonçalves. Combining Link-Based and Content-Based Methods for Web Document Classification. CIKM 2003, 3-8 November, New Orleans, Louisiana, USA, 2003.
- Baoping Zhang, Marcos Andre Goncalves, and Edward A. Fox. An OAI-based Filtering Service for CITIDEL from NDLTD. ICADL 2003, 6th International Conference of Asian Digital Libraries, 8-11 December, Kuala Lumpur, Malaysia, 2003.
- U. Ravindranathan, R. Shen, M. A. Goncalves, W. Fan, E. A. Fox, and J. W. Flanagan. ETANA-DL: A Digital Library for Integrated Handling of Heterogeneous Archaeological Data. To be presented at ACM-IEEE Joint Conference on Digital Libraries (JCDL 2004), Tucson, AZ, June 7-11, 2004.
65- Conference papers
- M. A. Goncalves, E. A. Fox, A. Krowne, P. Calado, A. H. F. Laender, A. S. da Silva, and B. Ribeiro-Neto. The Effectiveness of Automatically Structured Queries in Digital Libraries. To be presented at ACM-IEEE Joint Conference on Digital Libraries (JCDL 2004), Tucson, AZ, June 7-11, 2004.
- Alberto H. F. Laender, M. A. Goncalves, Pablo A. Roberto. BDBComp: Building a Digital Library for the Brazilian Computer Science Community. To be presented at ACM-IEEE Joint Conference on Digital Libraries (JCDL 2004), Tucson, AZ, June 7-11, 2004.
- U. Ravindranathan, R. Shen, M. A. Goncalves, W. Fan, E. A. Fox, and J. W. Flanagan. Prototyping Digital Libraries Handling Heterogeneous Data Sources - The ETANA-DL Case Study. European Conference on Digital Libraries (ECDL 2004), Bath, UK, September 12-17, 2004. (submitted)
- Other publications
- R. da S. Torres, C. B. Medeiros, M. A. Goncalves, and E. A. Fox. An OAI-based Digital Library Framework for Biodiversity Information Systems. Department of Computer Science, Virginia Tech, Technical Report No. TR-04-01, 2004.
- R. da S. Torres, C. B. Medeiros, M. A. Goncalves, and E. A. Fox. An OAI Compliant Content-Based Image Search Component. Demo to be presented at ACM-IEEE Joint Conference on Digital Libraries (JCDL 2004), Tucson, AZ, June 7-11, 2004.
- R. da S. Torres, C. B. Medeiros, Renata Q. Dividino, Mauricio A. Figueiredo, M. A. Goncalves, E. A. Fox, and R. Richardson. Using Digital Library Components for Biodiversity Systems. Poster to be presented at ACM-IEEE Joint Conference on Digital Libraries (JCDL 2004), Tucson, AZ, June 7-11, 2004.
- U. Ravindranathan, R. Shen, M. A. Goncalves, W. Fan, E. A. Fox, and J. W. Flanagan. ETANA-DL: Managing Complex Information Applications: An Archaeology Digital Library. Demo to be presented at ACM-IEEE Joint Conference on Digital Libraries (JCDL 2004), Tucson, AZ, June 7-11, 2004.
- Qinwei Zhu, Marcos André Gonçalves, E. Fox. 5SGraph Demo: A Graphical Modeling Tool for Digital Libraries. Proc. JCDL'2003, Third Joint ACM / IEEE-CS Joint Conference on Digital Libraries, May 27-31, 2003, Houston.
66Proposed Outline of Dissertation (Marcos André Gonçalves)
- Chapter 1: Introduction and Motivation
- Chapter 2: Background and Related Work
- Chapter 3: Streams, Structures, Spaces, Scenarios and Societies: the 5S Formal Model for Digital Libraries
- Chapter 4: Towards a Digital Library Theory: A Formal Digital Library Ontology based on 5S
- Chapter 5: Applications of the 5S Model/Ontology
- 5.1 Declarative Specification of DLs: the 5S Language
- 5.2 Semantic Visual Modeling of DLs: the 5SGraph Tool
- 5.3 (Semi-) Automatic Generation of Componentized DLs: The 5SGen Tool
- 5.4 Evaluating DLs: The XML Log Standard for DLs
- 5.5 Formally comparing Architectures: Fedora and Buckets (time permitting)
- Chapter 6: Defining Quality in Digital Libraries
- Chapter 7: Conclusions and Future Work
- Appendix 1: Mathematical Preliminaries
67Questions/Discussion?