Next Generation Digital - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Next Generation Digital


1
Part 4
  • Next Generation Digital Libraries Supporting
    Interoperability, Semantics, and Quality

2
(No Transcript)
3
(No Transcript)
4
(No Transcript)
5
(No Transcript)
6
(No Transcript)
7
(No Transcript)
8
OAI, ODL, DL-in-a-box
  • Open Archives Initiative
  • since 1999, www.openarchives.org
  • Open Digital Libraries
  • since 2001, from www.dlib.vt.edu
  • with Hussein Suleman (now U. Cape Town)
  • DL-in-a-box
  • NSDL support since 2001
  • Aimed to help new collections / services projects
  • http://dlbox.nudl.org

9
Open Archives Initiative (OAI)
  • Advocacy for interoperability
  • Standard for transferring metadata among digital
    libraries
  • Protocol for Metadata Harvesting (PMH)
  • Simplicity
  • Generality
  • Extensibility
  • Support for PMH => Open Archive (OA)
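
To make the harvesting idea above concrete, here is a minimal sketch of how a PMH request might be assembled. The verb `ListRecords` and the arguments `metadataPrefix` and `from` come from the OAI-PMH specification; the base URL and the helper name `build_pmh_request` are assumptions made only for illustration, and nothing is sent over the network.

```python
from urllib.parse import urlencode

# Hypothetical repository endpoint, used only for illustration.
BASE_URL = "http://example.org/oai"


def build_pmh_request(base_url, verb, **arguments):
    """Return an OAI-PMH request URL for the given verb and arguments."""
    params = {"verb": verb}
    # Skip arguments the caller left unset.
    params.update({k: v for k, v in arguments.items() if v is not None})
    return f"{base_url}?{urlencode(params)}"


# Ask for Dublin Core records changed since a given date.
# "from" is a Python keyword, so it is passed through a dict.
url = build_pmh_request(BASE_URL, "ListRecords",
                        metadataPrefix="oai_dc", **{"from": "2004-01-01"})
print(url)
```

A harvester would fetch this URL over HTTP and parse the XML response; that part is left out to keep the sketch self-contained.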

10
OAI Technical Umbrella for Practical Interoperability
[Figure labels: Metadata Harvesting; Reference Libraries; Museums;
Publishers; E-Print Archives; "that can be exploited by different
communities"]
11
OAI Repository Perspective
[Figure labels: Required Protocol; Set Structure; URI Scheme; MDO (x8);
Required DC; DO (x4)]
12
OAI Black Box Perspective
13
Tiered Model of Interoperability
Mediator services
Metadata harvesting
Document models
14
The World According to OAI
Service Providers
Discovery
Current Awareness
Preservation
Data Providers
15
?
users
digital objects
16
Monolithic and/or Custom-built web-based application
?
?
digital library
17
componentized digital library
18
open digital library
19
Protocol for Metadata Harvesting
20
OPEN ARCHIVE
21
Open Digital Library Deployments
  • NDLTD (www.ndltd.org)
  • Computer Science Teaching Center (www.cstc.org)
  • Computing and Information Technology Interactive
    Digital Educational Library (www.citidel.org)
  • Open Archives Distributed (NSF, DFG)
    enhancements to PhysNet
  • OCKHAM
  • Open to others through DL-in-a-box

22
Open Digital Library
  • Network of Extended Open Archives where each node
    acts as either a provider of data, services or
    both.
  • Component Node
  • Protocol Arc
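
The node-and-arc view above can be sketched in a few lines. This is an illustrative toy under stated assumptions, not the ODL implementation: the class names `DataProvider` and `Union`, the method `list_records`, and the record identifiers are all hypothetical, and a plain method call stands in for a PMH protocol arc.

```python
class DataProvider:
    """A component node that only provides data."""

    def __init__(self, name, records):
        self.name = name
        self.records = records  # maps record identifier -> metadata dict

    def list_records(self):
        return dict(self.records)


class Union:
    """A component node that is both a service and a provider.

    It harvests from its sources (one protocol arc per source) and
    exposes the merged result through the same interface.
    """

    def __init__(self, sources):
        self.sources = sources

    def list_records(self):
        merged = {}
        for source in self.sources:
            merged.update(source.list_records())
        return merged


a = DataProvider("A", {"oai:a:1": {"title": "Thesis one"}})
b = DataProvider("B", {"oai:b:1": {"title": "Thesis two"}})
c = DataProvider("C", {"oai:c:1": {"title": "Thesis three"}})

# Because Union exposes the provider interface, unions can be chained.
nested = Union([Union([a, b]), c])
print(sorted(nested.list_records()))
```

The point of the design is that every node speaks the same interface, so services can be stacked on other services as easily as on raw collections.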

23
Open Digital Library Components
  • Running now:
  • XML-File (data provider from file system)
  • Search: simple, in-memory (ESSEX), or generalized
  • Union, browse, recent, filter
  • E-journal/review, Submit, Edit, Annotation
  • Recommender, Rating, Mirroring (see JCDL'02)
  • Working with NCSA: from DB, unstructured text
  • Others in process:
  • Classification/categorization
  • Registry (and other connections with web services)

24
Example Open Digital Library
[Figure: ETD DL for the Networked Digital Library of Theses and
Dissertations (www.ndltd.org). Labels: user interface (students and
researchers); Recent (ODLRecent); Browse (ODLBrowse); Search (ODLSearch);
Union (ODLUnion); Filter; PMH links between components; ETD collections]
25
Open Digital Library Extended
  • Service provider roles: Metadata Search, Metadata Browse, What's
    New, Annotation Search, Recommend and Rate
  • Service components: DBBrowse (browse engine), IRDB-1 (search
    engine), IRDB-2 (search engine), Recommend, What's New engine, Rate
    engine, Annotation engine
  • Merging and filtering: DBUnion (archive merger component), Filter,
    harvest from data providers
  • Data providers: XML File Coll. data providers 1, 2, and 3, OAI-PMH
    data provider, Submit archive, OAIB (NCSA, from RDBMS)
26
CITIDEL Technology Features
  • Component architecture (Open Digital Library)
  • Re-use and compose re-deployable digital library
    components.
  • Built using open standards and technologies
  • OAI: used to collect DL resources and for DL
    interoperability
  • XSL and XML: interface rendering with
    multi-lingual, community-based translation of
    screens and content (Spanish, ...)
  • Perl: component integration
  • ESSEX: search engine functionality
  • Very fast, utilizing in-memory processing
  • Includes snapshots for persistence
  • Multi-scheming
  • Integrates multiple classifications / views
    through maps, closure

27
Multi-dimensional Categorization
28
OCKHAM Initiative, Contact Info
  • Supported by DL Federation, Mellon, NSF,
  • P2P University Network involving
  • Emory, Notre Dame, U. Arizona, Virginia Tech,
  • PI: Martin Halbert
  • Phone: 404-727-2204
  • Email: mhalber_at_emory.edu
  • OCKHAM URL: http://ockham.library.emory.edu

29
The Problem
  • Digital library development is complex and
    expensive.
  • Various DL development communities (in the USA at
    least) are not working together well.
  • Results exhibit much incompatibility, little
    common practice, slow progress, and no leverage
    on investment.
  • If this continues, we are just going to languish
    and fester.

30
Lightweight Protocols
  • Lightweight (relatively small and simple)
    protocols seem to have clear advantages over
    "full" protocols that attempt to be
    comprehensive.
  • The success of protocols considered lightweight
    is illuminating.
  • Examples: TCP/IP, HTTP, LDAP, and the OAI PMH

31
Reference Models
  • Reference Model: a common vocabulary and
    description of the components, services, and
    inter-relationships that comprise a system under
    consideration
  • Useful as a tool to foster consensus and common
    understanding in a time of rapid change and/or
    disagreement
  • Explored in a CS6604 class project with 2 focus
    groups: librarians and education experts

32
Current Focus: Peer-to-Peer (P2P) Lightweight
(Protocol) Reference Models
  • Builds on the successful example of the OAI PMH:
    a clearly understood, minimalist concept of
    metadata distribution, implemented in simple
    protocols (e.g., ODL)
  • Leads to developing simple reference models of
    specific subsystems, with associated simple
    protocols and standards
  • Testing in NSDL, connecting university libraries
    to support teaching and learning

33
OCKHAM Proposed Services
  • Alerting
  • Browsing
  • Cataloging
  • Conversion
  • OAI Z39.50
  • Pathfinding
  • Registry prototype in CS6604 now
  • (plus others such as from adapted ODL)

34
DL Student Research: Gonçalves
  • 5S as a basis for developing digital libraries
  • Theory
  • Syntax, Semantics Definitions, Relationships
  • Specification of requirements
  • Generation of systems
  • Quality

35
Motivation for 5S
  • DLs are not benefiting from formal theories as
    other CS fields have: DB, IR, PL, etc.
  • DL construction is difficult and ad hoc, and
    lacks support for tailoring/customization.
  • Conceptual modeling, requirements analysis, and
    methodological approaches are rarely supported in
    DL development.
  • Lack of specific DL models, formalisms, languages

36
(No Transcript)
37
5S Layers
Societies
Scenarios
Spaces
Structures
Streams
38
5S Model Examples, Objectives
39
Intra-Model Relationships: Streams
  • Participant concepts: text, image, video,
    audio
  • Relations:
  • contains $\subseteq$ (video $\times$ image) $\cup$ (video $\times$ audio)
  • Streams define the basic content types over which
    digital objects are built, the latter being the
    ultimate carriers of the information in the DL.
  • However, some complex types of streams (e.g.,
    video) may themselves be associated with simpler
    types of streams (e.g., images, audio).
  • This relation indicates that a video contains an
    image as one of its frames, or a specific audio
    recording.

40
(No Transcript)
41
DL Services/Activities Taxonomy (Gonçalves)
  • Information Satisfaction Services: Browsing, Collaborating,
    Customizing, Filtering, Providing access, Recommending, Requesting,
    Searching, Visualizing
  • Infrastructure Services, Add Value: Annotating, Classifying,
    Clustering, Evaluating, Extracting, Indexing, Measuring,
    Publicizing, Rating, Reviewing (peer), Surveying, Translating
    (language)
  • Infrastructure Services, Repository-Building, Preservational:
    Conserving, Converting, Copying/Replicating, Emulating, Renewing,
    Translating (format)
  • Infrastructure Services, Repository-Building, Creational:
    Acquiring, Cataloging, Crawling (focused), Describing, Digitizing,
    Federating, Harvesting, Purchasing, Submitting
42
Services, Definitions, Parameters
  • In the table, each service is characterized by:
  • parameters (input, output) of the initial and
    final events of the scenarios that compose those
    services, and
  • the respective pre- and post-conditions, which
    are represented in terms of rules on DL relations.
  • All other previous definitions and keys apply
    here.
  • That set is complemented with the following
    definitions.

43
Services Related Definitions
  • A query q is the representation of a user
    interest or information need.
  • Hyptxt is a hypertext wherein an anchor is a node.
  • A log_entry is a descriptive metadata
    specification about an event of a scenario.
  • Let $\{do_i\} = \{do_{i1}, do_{i2}, \ldots, do_{in}\}$ be a set of
    digital objects and $C_t = \{c_1, c_2, \ldots, c_n\}$ a set
    of labels for categories. A classifier
    $class_{C_t}: \{do_i\} \to 2^{C_t}$ is a function that maps a
    digital object to a set of categories.
  • A cluster $clu_k = \{do_{1k}, do_{2k}, \ldots, do_{nk}\}$ is a
    subset of a set of digital objects.
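
The classifier and cluster definitions above can be illustrated with a small sketch. The keyword-matching rule, the category labels, and the sample objects are assumptions chosen only to show the shape of the function (digital object in, set of category labels out); they are not the classification method used in the presentation.

```python
def classify(digital_object, category_keywords):
    """Map one digital object to the set of category labels it matches."""
    text = digital_object["title"].lower()
    return {label
            for label, keywords in category_keywords.items()
            if any(keyword in text for keyword in keywords)}


# Hypothetical category labels C_t with keyword rules.
categories = {"IR": ["retrieval", "search"], "DL": ["digital library"]}

# Hypothetical set of digital objects.
objects = [
    {"id": "do1", "title": "Search in a Digital Library"},
    {"id": "do2", "title": "Image Retrieval"},
    {"id": "do3", "title": "Compilers"},
]

# The classifier returns an element of the power set 2^{C_t},
# so an object may receive several labels or none.
labels = {do["id"]: classify(do, categories) for do in objects}

# A cluster is simply a subset of the set of digital objects.
cluster_ir = {do_id for do_id, cats in labels.items() if "IR" in cats}
print(labels, cluster_ir)
```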

44
(No Transcript)
45
(No Transcript)
46
DL Services I/O Behavior
  • Regarding the prior figure, which shows:
  • Instantiations of the Services Definition model
  • Inputs and outputs of examples of infrastructure
    and information satisfaction DL services
  • Key:
  • $C_{DL}$: collection
  • $I_{C_{DL}}$: index for collection $C_{DL}$
  • $do_i$: digital object
  • Soc: society

47
(No Transcript)
48
Defining Quality in Digital Libraries
49
(No Transcript)
50
Completeness of Metadata (1)
  • Degree of completeness of a metadata
    specification $ms_x$:
  • $Completeness(ms_x) = 1 - \frac{\text{no. of missing attributes in } ms_x}{\text{total attributes of the schema to which } ms_x \text{ conforms}}$
  • According to the 5S definition of conformance
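
As a rough illustration of the completeness measure above, the sketch below scores one record against a schema. It assumes the schema is the 15-element Dublin Core set and treats an attribute as missing when it is absent or empty; the function name and the sample record are hypothetical.

```python
# The 15 elements of the Dublin Core Metadata Element Set.
DUBLIN_CORE = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def completeness(record, schema=DUBLIN_CORE):
    """Return 1 - (missing attributes / total attributes in the schema)."""
    missing = [attribute for attribute in schema if not record.get(attribute)]
    return 1 - len(missing) / len(schema)


# Hypothetical record with 3 of the 15 attributes filled in.
record = {"title": "An Example Thesis", "creator": "A. Student", "date": "2004"}
print(round(completeness(record), 2))


def average_completeness(records, schema=DUBLIN_CORE):
    """Average completeness over a collection of records."""
    return sum(completeness(r, schema) for r in records) / len(records)
```

Averaging this score over all records gives a collection-level figure of the kind described for a union archive.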

51
Completeness of Metadata (2)
  • Example of application:
  • OCLC NDLTD Union
  • average completeness of all metadata
    specifications (records)
  • of the NDLTD Union Archive
  • administered by OCLC
  • as of Feb. 23, 2004
  • with respect to the Dublin Core metadata standard
    (15 attributes)

52
Completeness of Metadata (3)
53
Collection Completeness (1)
  • Defn: A complete DL collection $C_x$ is one which
    contains all the pertinent existing digital
    objects.
  • $completeness(C_x) = |C_x| \, / \, |\text{ideal collection}|$
  • can be defined as the ratio between the size of
    $C_x$ and the size of the ideal real-world
    collection
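
A minimal sketch of this ratio follows. The function name and the sizes used in the example are hypothetical; in practice the hard part is choosing a credible stand-in for the ideal collection, not the arithmetic.

```python
def collection_completeness(collection_size, ideal_size):
    """Ratio between the size of a collection and the ideal collection."""
    if ideal_size <= 0:
        raise ValueError("ideal collection size must be positive")
    return collection_size / ideal_size


# Hypothetical sizes, for illustration only.
ratio = collection_completeness(250, 1000)
print(ratio)
```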

54
Collection Completeness (2)
  • Example of use: computing collections
  • The ACM Guide is a collection of bibliographic
    references and abstracts of works published by
    ACM and other publishers.
  • The Guide can be considered a good approximation
    of an ideal computing collection: it contains
    most of the different types of computing-related
    literature (about 735K works).

55
(No Transcript)
56
Reliability (1)
  • Scope: operations of DL
  • Defn: the probability that the service will not
    fail during a given period of time [Hansen83]
  • Example of use: CITIDEL services
  • Example details: using log analysis April 1
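
One simple way to estimate such a probability from logs is sketched below. This is an assumption, not the procedure used for CITIDEL: it takes the fraction of logged requests to a service that did not fail within the logged period, and the log entry fields (`service`, `status`) are hypothetical.

```python
def estimate_reliability(log_entries, service):
    """Estimate reliability as 1 - failures / requests for one service."""
    relevant = [entry for entry in log_entries if entry["service"] == service]
    if not relevant:
        return None  # no evidence for this service in the logged period
    failures = sum(1 for entry in relevant if entry["status"] == "failure")
    return 1 - failures / len(relevant)


# Hypothetical log for one period.
log = [
    {"service": "searching", "status": "ok"},
    {"service": "searching", "status": "ok"},
    {"service": "searching", "status": "failure"},
    {"service": "searching", "status": "ok"},
    {"service": "browsing", "status": "ok"},
]
print(estimate_reliability(log, "searching"))
```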

57
Reliability (2)
58
Extensibility, Reusability (1)
  • Scope: design and implementation of DL services
  • Two main classes:
  • Composability of services
  • Extensibility
  • Reusability
  • Quality aspects of models and implementations
  • completeness, consistency, correctness, soundness

59
Extensibility, Reusability (2)
  • $\text{Micro-Reusability}(Serv) = \frac{\sum_{sm_x \in SM,\ se_i \in Serv,\ sm_x \text{ runs } se_i} LOC(sm_x) \cdot reused(se_i)}{\sum_{sm \in SM} LOC(sm)}$,
  • where LOC corresponds to the number of lines of
    code of a service manager
  • $\text{Macro-Reusability}(Serv) = \frac{\sum_{se_i \in Serv} reused(se_i)}{|Serv|}$,
    where reused is an indicator function defined as
  • $reused(se_i) = 1$, if $\exists\, sm_j, se_j$ such that $se_j$ reuses $se_i$
  • $reused(se_i) = 0$, otherwise
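
The two measures above can be computed with a short sketch. It assumes each service is run by one service manager whose lines of code are known, and that the `reused` flag is already determined; the service names and LOC figures are hypothetical.

```python
# Hypothetical services: each entry records whether the service is reused
# and the lines of code (LOC) of the service manager that runs it.
services = [
    {"name": "searching", "reused": True, "loc": 300},
    {"name": "browsing", "reused": False, "loc": 100},
    {"name": "annotating", "reused": False, "loc": 100},
    {"name": "submitting", "reused": False, "loc": 100},
]


def macro_reusability(services):
    """Fraction of services that are reused."""
    return sum(1 for s in services if s["reused"]) / len(services)


def micro_reusability(services):
    """Fraction of all service-manager LOC that belongs to reused services."""
    reused_loc = sum(s["loc"] for s in services if s["reused"])
    total_loc = sum(s["loc"] for s in services)
    return reused_loc / total_loc


print(macro_reusability(services), micro_reusability(services))
```

The two numbers can differ widely: a single reused service counts little toward the macro measure but a lot toward the micro measure if its code base is large.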

60
Extensibility, Reusability (3)
  • Example: ETANA-DL
  • Consider
  • Services
  • Use of existing ODL component
  • Lines of Code (LOC)
  • Reused from component
  • Added for implementation

61
(No Transcript)
62
Extensibility, Reusability (5)
  • Macro-Reusability(ETANA-DL Services)
  • 3/13 ≈ 0.23
  • only a few important services are componentized
  • Micro-Reusability
  • 3630/11910 ≈ 0.305
  • we can re-use a very significant percentage of
    DL code by implementing common DL services as
    components

63
Review of Gonçalves Achievements in Past Year
  • Book Chapters
  • Fox, E. A., Gonçalves, M. A., Luo, M., Chen, Y.,
    Krowne, A., Zhang, B., McDevitt, K.,
    Pérez-Quiñones, M., Cassel, L. N. Harvesting:
    Broadening the Field of Distributed Information
    Retrieval. In Multimedia Distributed Information
    Retrieval, eds. Fabio Crestani, Mark Sanderson,
    and Jamie Callan, 2003.
  • Fox, E., McMillan, G., Suleman, H., Gonçalves,
    M., Networked Digital Library of Theses and
    Dissertations. Invited chapter for Digital
    Libraries Policy, Planning, and Practice, eds.
    Judith Andrews and Derek Law, Ashgate Publishing,
    2003
  • Journal papers
  • 5S TOIS paper (April 2004, issue)
  • S. Perugini, M. A. Gonçalves, and E. A. Fox. A
    Connection-Centric Survey of Recommender Systems
    Research. Journal of Intelligent Information
    Systems, Jun, 2004.
  • Zhu, Q., Gonçalves, M. A., Fox, E. A. 5SGraph:
    A Domain-Specific Visual Modeling Tool for
    Digital Libraries. Journal of the American
    Society for Information Science and Technology,
    submitted 2003, in revision
  • Baoping Zhang, Marcos Andre Goncalves, Yuxin
    Chen, Edward A. Fox, and Pavel Calado, "Combining
    Support Vector Machines and Structural Rules for
    Effective Filtering of OAI-Based Repositories",
    submitted to Journal of Digital Libraries
    (Springer Verlag) Special Issue on Asian Digital
    Libraries, 2004

64
  • Conference papers
  • Pável P. Calado, Marcos André Gonçalves, Edward
    A. Fox, Berthier Ribeiro-Neto, Alberto H. F.
    Laender, Altigran S. da Silva, Davi C. Reis,
    Pablo A. Roberto,Monique V. Vieira, and Juliano
    P. Lage. The Web-DL Environment for Building
    Digital Libraries from the Web. JCDL'2003, Third
    Joint ACM / IEEE-CS Joint Conference on Digital
    Libraries, May 27-31, 2003, Houston.
  • Marcos André Gonçalves, Ganesh Panchanathan,
    Unnikrishnan Ravindranathan, Aaron Krowne, Edward
    A. Fox, Filip Jagodzinski, and Lillian Cassel.
    The XML Log Standard for Digital Libraries:
    Analysis, Evolution, and Deployment. Proc.
    JCDL'2003, Third Joint ACM / IEEE-CS Joint
    Conference on Digital Libraries, May 27-31, 2003,
    Houston.
  • Qinwei Zhu, Marcos André Gonçalves, Rao Shen,
    Lillian Cassel, Edward A. Fox. Visual Semantic
    Modeling of Digital Libraries. ECDL'2003, 7th
    European Conference on Research and Advanced
    Technology for Digital Libraries, 17-22 August,
    2003, Trondheim, Norway.
  • Rohit Kelapure, Marcos André Gonçalves, Edward A.
    Fox. Scenario-Based Generation of Digital Library
    Services. ECDL'2003, 7th European Conference on
    Research and Advanced Technology for Digital
    Libraries, 17-22 August, Trondheim, Norway
  • Marco Cristo, Pavel Calado, Edleno Moura, Nivio
    Ziviani, Berthier Ribeiro-Neto, and Marcos André
    Gonçalves. Combining Link-Based and Content-Based
    Methods for Web Document Classification. CIKM
    2003, 3-8 November, New Orleans, Louisiana, USA,
    2003.
  • Baoping Zhang, Marcos Andre Goncalves, and Edward
    A. Fox. An OAI-based Filtering Service for
    CITIDEL from NDLTD. ICADL 2003, 6th International
    Conference of Asian Digital Libraries, 8-11
    December, Kuala Lumpur, Malaysia, 2003
  • U. Ravindranathan, R. Shen, M. A. Goncalves, W.
    Fan, E. A. Fox, and J. W. Flanagan. ETANA-DL: A
    Digital Library for Integrated Handling of
    Heterogeneous Archaeological Data. To be
    presented at ACM-IEEE Joint Conference on Digital
    Libraries (JCDL 2004), Tucson, AZ, June 7-11,
    2004.

65
  • Conference papers
  • M. A. Goncalves, E. A. Fox, A. Krowne, P. Calado,
    A. H. F. Laender, A. S. da Silva, and B.
    Ribeiro-Neto. The Effectiveness of Automatically
    Structured Queries in Digital Libraries. To be
    presented at ACM-IEEE Joint Conference on Digital
    Libraries (JCDL 2004), Tucson, AZ, June 7-11,
    2004.
  • Alberto H. F. Laender, M. A. Goncalves, Pablo A.
    Roberto. BDBComp: Building a Digital Library for
    the Brazilian Computer Science Community. To be
    presented at ACM-IEEE Joint Conference on Digital
    Libraries (JCDL 2004), Tucson, AZ, June 7-11,
    2004.
  • U. Ravindranathan, R. Shen, M. A. Goncalves, W.
    Fan, E. A. Fox, and J. W. Flanagan. Prototyping
    Digital Libraries Handling Heterogeneous Data
    Sources - The ETANA-DL Case Study. European
    Conference on Digital Libraries (ECDL 2004),
    Bath, UK, September 12-17, 2004. (submitted)
  • Other publications
  • R. da S. Torres, C. B. Medeiros, M. A. Goncalves,
    and E. A. Fox. An OAI-based Digital Library
    Framework for Biodiversity Information Systems.
    Department of Computer Science, Virginia Tech,
    Technical Report No. TR-04-01, 2004.
  • R. da S. Torres, C. B. Medeiros, M. A. Goncalves,
    and E. A. Fox. An OAI Compliant Content-Based
    Image Search Component. Demo to be presented at
    ACM-IEEE Joint Conference on Digital Libraries
    (JCDL 2004), Tucson, AZ, June 7-11, 2004.
  • R. da S. Torres, C. B. Medeiros, Renata Q.
    Dividino, Mauricio A. Figueiredo, M. A.
    Goncalves, E. A. Fox, and R. Richardson. Using
    Digital Library Components for Biodiversity
    Systems. Poster to be presented at ACM-IEEE Joint
    Conference on Digital Libraries (JCDL 2004),
    Tucson, AZ, June 7-11, 2004.
  • U. Ravindranathan, R. Shen, M. A. Goncalves, W.
    Fan, E. A. Fox, and J. W. Flanagan. ETANA-DL:
    Managing Complex Information Applications: An
    Archaeology Digital Library. Demo to be presented
    at ACM-IEEE Joint Conference on Digital Libraries
    (JCDL 2004), Tucson, AZ, June 7-11, 2004.
  • Qinwei Zhu, Marcos André Gonçalves, E. Fox.
    5SGraph Demo: A Graphical Modeling Tool for
    Digital Libraries. Proc. JCDL'2003, Third Joint
    ACM / IEEE-CS Joint Conference on Digital
    Libraries, May 27-31, 2003, Houston.

66
Proposed Outline of Dissertation (Marcos André
Gonçalves)
  • Chapter 1: Introduction and Motivation
  • Chapter 2: Background and Related Work
  • Chapter 3: Streams, Structures, Spaces,
    Scenarios and Societies: the 5S Formal Model for
    Digital Libraries
  • Chapter 4: Towards a Digital Library Theory: A
    Formal Digital Library Ontology based on 5S
  • Chapter 5: Applications of the 5S Model/Ontology
  • 5.1 Declarative Specification of DLs: the 5S
    Language
  • 5.2 Semantic Visual Modeling of DLs: the 5SGraph
    Tool
  • 5.3 (Semi-) Automatic Generation of Componentized
    DLs: The 5SGen Tool
  • 5.4 Evaluating DLs: The XML Log Standard for DLs
  • 5.5 Formally Comparing Architectures: Fedora and
    Buckets (time permitting)
  • Chapter 6: Defining Quality in Digital Libraries
  • Chapter 7: Conclusions and Future Work
  • Appendix 1: Mathematical Preliminaries

67
Questions/Discussion?