Implementation of Digital Libraries Michael L. Nelson Old Dominion University mln@cs.odu.edu http://www.cs.odu.edu/~mln/ - PowerPoint PPT Presentation


1
Implementation of Digital Libraries
Michael L. Nelson
Old Dominion University
mln@cs.odu.edu
http://www.cs.odu.edu/~mln/
  • Congreso Internacional de Información en Salud
  • Lima, Peru
  • May 28, 2004

2
Acknowledgements
  • ODU: K. Maly, M. Zubair, J. Bollen
  • LANL: R. Luce, X. Liu
  • NASA: G. Roncaglia, J. Rocker, C. Mackey
  • Cornell: C. Lagoze, S. Warner
  • MAGiC (UK): Paul Needham
  • and, of course, Herbert Van de Sompel (LANL)
  • the OpenURL slides are nicked from his
    presentations

3
Outline
  • A bit of history
  • Core technologies & Issues
  • OAI-PMH
  • deep web
  • OpenURL
  • Handles / DOIs
  • Object Models
  • Example implementations
  • Download and go

covered only briefly
4
OAI-PMH
5
Background
  • I met Herbert Van de Sompel in April 1999...
  • we spoke of a demonstration project he had in
    mind, for which he had received sponsorship from
    Paul Ginsparg and Rick Luce
  • We wanted to demonstrate a multi-disciplinary DL
    that leveraged the large number of high quality,
    yet often isolated, tech report servers, e-print
    servers, etc.
  • most digital libraries (DLs) had grown up along
    single disciplines or institutions
  • little to no interoperability: isolated DL
    "gardens"
  • Universal Preprint Service
  • Demonstrated at Santa Fe NM, October 21-22, 1999
  • http://web.archive.org/web/*/http://ups.cs.odu.edu/
  • D-Lib Magazine, 6(2) 2000 (2 articles)
  • http://www.dlib.org/dlib/february00/02contents.html
  • UPS was soon renamed the Open Archives Initiative
    (OAI): http://www.openarchives.org/

6
Result: OAI
  • The OAI was the result of the demonstration and
    discussion during the Santa Fe meeting
  • OAI: a bunch of people, a religion, a cult, etc.
  • OAI Protocol For Metadata Harvesting (OAI-PMH):
    the protocol created and maintained by the OAI
  • Initial focus was on federating collections of
    scholarly e-print materials
  • however, interest grew and the scope and
    application of OAI-PMH expanded to become a
    generic bulk metadata transport protocol
  • Note:
  • OAI-PMH is only about metadata -- not full text!
  • but what is metadata vs. full-text?
  • OAI is neutral with respect to the nature of the
    metadata or the resources the metadata describes
  • read: commercial publishers have an interest in
    OAI-PMH too...

7
OAI-PMH Mechanics
Request is encoded in http
Response is encoded in XML
XML Schemas for the responses are defined in the
OAI-PMH document
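The request side of these mechanics can be sketched in a few lines. This is a minimal illustration, assuming the standard OAI-PMH 2.0 argument names (`verb`, `metadataPrefix`, `from`); the helper name `build_oai_request` is made up for this example, and the baseURL is the NACA one quoted later in these slides. It only builds the URL and does not contact the server.

```python
from urllib.parse import urlencode


def build_oai_request(base_url, verb, arguments=None):
    """Return an OAI-PMH GET request URL (hypothetical helper, not part of any library)."""
    # Every OAI-PMH request is a baseURL plus key=value pairs; the verb is mandatory.
    params = {"verb": verb}
    params.update(arguments or {})
    return base_url + "?" + urlencode(params)


url = build_oai_request(
    "http://naca.larc.nasa.gov/oai2.0/",
    "ListRecords",
    {"metadataPrefix": "oai_dc", "from": "2004-01-01"},
)
print(url)
```

The response to such a request would be an XML document, which any XML parser can consume.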
8
Overview of OAI-PMH Verbs
Verb                 Function                               Group
Identify             description of archive                 archival metadata
ListMetadataFormats  metadata formats supported by archive  archival metadata
ListSets             sets defined by archive                archival metadata
ListIdentifiers      OAI unique ids contained in archive    harvesting verbs
ListRecords          listing of N records                   harvesting verbs
GetRecord            listing of a single record             harvesting verbs
most verbs take arguments: dates, sets, ids,
metadata formats and resumption token (for flow
control)
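Flow control with a resumption token might look like the sketch below. It assumes OAI-PMH 2.0 behaviour (a follow-up request carries only the verb and the token, and an empty or absent token ends the list). The names `harvest_identifiers`, `fake_fetch` and `_page` are invented for illustration; `fake_fetch` stands in for a real HTTP call so the example runs offline.

```python
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def harvest_identifiers(fetch, metadata_prefix="oai_dc"):
    """Collect all identifiers, following resumption tokens until the list ends.

    `fetch` is any callable that takes a dict of request arguments and
    returns the XML response as a string.
    """
    identifiers = []
    args = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(args))
        for header in root.iter(OAI_NS + "header"):
            identifiers.append(header.findtext(OAI_NS + "identifier"))
        token = root.find(".//" + OAI_NS + "resumptionToken")
        # A missing or empty token marks the final chunk.
        if token is None or not (token.text or "").strip():
            return identifiers
        args = {"verb": "ListIdentifiers", "resumptionToken": token.text.strip()}


def _page(ids, token):
    """Build a tiny fake ListIdentifiers response (test data only)."""
    headers = "".join(
        f"<header><identifier>{i}</identifier><datestamp>2004-05-28</datestamp></header>"
        for i in ids
    )
    return (
        '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">'
        f"<ListIdentifiers>{headers}<resumptionToken>{token}</resumptionToken>"
        "</ListIdentifiers></OAI-PMH>"
    )


def fake_fetch(args):
    """Stand-in repository that serves two chunks."""
    if "resumptionToken" in args:
        return _page(["oai:foo.edu:3"], "")
    return _page(["oai:foo.edu:1", "oai:foo.edu:2"], "page2")


print(harvest_identifiers(fake_fetch))
```

Passing the fetch function in as a parameter keeps the paging logic separate from the transport, so the same loop could be pointed at a real repository.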
9
OAI-PMH Data Model
item = identifier
record = identifier + metadata format + datestamp
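One way to picture this data model in code is shown below. It is only a sketch: the class and field names (`Item`, `Record`, `metadata_prefix`) are chosen for this example and are not taken from the protocol document.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Record:
    """A record is pinned down by identifier + metadata format + datestamp."""
    identifier: str
    metadata_prefix: str
    datestamp: str


@dataclass
class Item:
    """An item is known only by its identifier; it can yield one record per format."""
    identifier: str
    records: dict = field(default_factory=dict)

    def add(self, metadata_prefix, datestamp):
        # Keep one record per metadata format for this item.
        record = Record(self.identifier, metadata_prefix, datestamp)
        self.records[metadata_prefix] = record
        return record


item = Item("oai:foo.edu:1234")
dc = item.add("oai_dc", "2004-05-28")
print(dc)
```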
10
Data Providers / Service Providers
11
Aggregators
  • aggregators allow for
  • scalability for OAI-PMH
  • load balancing
  • community building
  • discovery

[Diagram labels: service providers (harvesters); aggregator; data providers (repositories)]
12
Aggregators
  • Frequently interchangeable terms
  • aggregators: likely to be community /
    institutionally focused
  • caches: store a copy, less likely to be
    community-oriented
  • proxies: less likely to store a copy, may gateway
    between OAI-PMH and other protocols
  • Dienst / OAI Gateway: Harrison, Nelson, Zubair,
    JCDL '03
  • To learn more about aggregators, caches &
    proxies
  • http://www.openarchives.org/OAI/2.0/guidelines-aggregator.htm
  • http://www.cs.odu.edu/~mln/jcdl03/

13
Example Aggregators
  • Arc - http://arc.cs.odu.edu/
  • first described hierarchical harvesting in
    D-Lib Magazine, 7(4) 2001
  • http://www.dlib.org/dlib/april01/liu/04liu.html
  • Celestial - http://celestial.eprints.org/
  • among other services, it provides a history of
    harvests (successful vs. errors)
  • http://celestial.eprints.org/cgi-bin/status

14
OAI-PMH 2.0 Registration
  • unregistered because:
  • testing / development
  • not for public harvesting
  • public, but low-profile
  • never got around to it
  • ???

??? unregistered repositories
150 repositories registered
DP:SP = 5:1
Data Providers: http://www.openarchives.org/Register/BrowseSites.pl
Service Providers: http://www.openarchives.org/service/listproviders.html
15
Registration is Nice, But Not Required
  • OAI-PMH is (becoming) the http for digital
    libraries
  • there is no central registry of http servers
  • remember the NCSA What's New page? (ca. 1994)
  • There will never be registration support in
    OAI-PMH
  • registries are a type of service provider, built
    on top of OAI-PMH
  • registration will be an integral part of
    community building
  • friends

16
NASA <friends> example
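A harvester could read a friends list out of an Identify response roughly as follows. This sketch assumes the friends container from the OAI-PMH 2.0 implementation guidelines (namespace `http://www.openarchives.org/OAI/2.0/friends/`, one `baseURL` element per friend). The sample response and all `example.org` addresses are invented; they are not NASA's actual response.

```python
import xml.etree.ElementTree as ET

FRIENDS_NS = "{http://www.openarchives.org/OAI/2.0/friends/}"

# Invented Identify response, trimmed to the parts that matter here.
SAMPLE_IDENTIFY = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <Identify>
    <repositoryName>Example Repository</repositoryName>
    <baseURL>http://repo.example.org/oai</baseURL>
    <description>
      <friends xmlns="http://www.openarchives.org/OAI/2.0/friends/">
        <baseURL>http://friend-one.example.org/oai</baseURL>
        <baseURL>http://friend-two.example.org/oai</baseURL>
      </friends>
    </description>
  </Identify>
</OAI-PMH>"""


def list_friends(identify_xml):
    """Return the baseURLs a repository advertises in its friends container."""
    root = ET.fromstring(identify_xml)
    # Only baseURL elements in the friends namespace match, so the
    # repository's own baseURL (OAI-PMH namespace) is left out.
    return [el.text.strip() for el in root.iter(FRIENDS_NS + "baseURL") if el.text]


print(list_friends(SAMPLE_IDENTIFY))
```

Following these links from one repository to the next is how discovery can work without a central registry.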
17
NACA Technical Report Server
  • publicly available
  • began in 1996
  • details in NASA TM-1999-209127
  • scanned reports from 1917-1958
  • NACA: predecessor to NASA
  • contents mirrored with the MAGiC project
  • a UK-based grey-literature preservation project
  • OAI-PMH used to mirror contents

http://naca.larc.nasa.gov/
http://naca.larc.nasa.gov/oai2.0/
18
NACA Report 1345 as seen through its native
DL: http://naca.larc.nasa.gov/
19
NACA Report 1345 as seen through
MAGiC: http://www.magic.ac.uk/
20
NACA Report 1345 as seen through
Scirus (Elsevier): http://www.scirus.com/
21
NACA Report 1345 as seen through my.OAI (FS
Consulting): http://www.myoai.com/
22
NASA Technical Report Server
  • replacement for the previous distributed
    searching version of NTRS
  • MySQL
  • Va Tech harvester
  • modified bucket
  • details in Nelson, Rocker, Harrison, Library
    Hi-Tech, 21(2) (March 2003)
  • a service provider & aggregator
  • same OAI baseURL as used for interactive searching

http://ntrs.nasa.gov/
23
NASA Technical Report Server
  • advanced, fielded search
  • explicit query routing
  • 12 NASA repositories
  • 4 non-NASA repositories
  • turned off by default
  • >600k abstracts, >300k full-text

24
Service Providers
  • It is clear that SPs are proliferating, despite
    (because of?) the inherent bias toward DPs in the
    protocol
  • easy to be a DP -> many DPs -> SPs eventually
    emerge
  • hard to be a DP -> SPs starve
  • currently 5x more DPs than SPs
  • SPs are beginning to offer increasingly
    sophisticated services
  • competitive market originally envisioned for SPs
    is emerging

25
Community Building
www.ndltd.org
26
OAI-PMH & The Deep Web
27
Exposing Repository Contents
  • DP9: Webcrawler access to OAI-PMH repositories
  • http://dlib.cs.odu.edu/dp9/
  • JCDL '02: http://www.cs.odu.edu/~liu_x/dp9/dp9.pdf
  • An Apache module for OAI-PMH
  • http://www.modoai.org/
  • Extensible Repository Resource Locators (ERRoLs)
    for OAI Identifiers
  • http://www.oclc.org/research/projects/oairesolver/default.htm

28
Race for This New Market
  • Yahoo! & University of Michigan
  • http://www.umich.edu/news/index.html?Releases/2004/Mar04/r031004
  • Google & CrossRef
  • http://www.nature.com/nature/focus/accessdebate/17.html

29
OpenURL
slides from Herbert Van de Sompel, LANL
30
Origins & Motivation
  • The Context: Library Automation Environment anno
    1998
  • distributed information environment
  • local & remote A&I databases
  • rapidly growing e-journal collection
  • need to interlink the available information
  • The Problem
  • links are delivered by info providers
  • links are not sensitive to the user's context
  • appropriate copy problem
  • links dependent on business agreements between
    information vendors
  • links don't cover the complete collection

31
Origins & Motivation
  • The Context: Library Automation Environment anno
    1998
  • distributed information environment
  • local & remote A&I databases
  • rapidly growing e-journal collection
  • need to interlink the available information
  • The REAL Problem
  • libraries have no say in linking
  • libraries are losing a core part of the
    "organizing information" task
  • expensive collection is not used optimally
  • users are not well served

32
Origins & Motivation
  • The Solution
  • In information services
  • DO NOT provide a link which is an actual service
    related to a referenced item (e.g. a link from a
    record in an A&I database to the corresponding
    full-text)
  • BUT rather provide
  • a link that transports metadata about the
    referenced item
  • to
  • others that are better placed to provide service
    links

OpenURL
Linking server operated by library
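The idea of a link that transports metadata rather than pointing at one fixed target can be sketched as follows. The key names (`genre`, `issn`, `volume`, `issue`, `date`) follow the OpenURL 0.1 style; the function name `make_openurl` and both resolver addresses are hypothetical. The metadata describes the D-Lib Magazine issue cited earlier in these slides.

```python
from urllib.parse import urlencode


def make_openurl(resolver_base, **metadata):
    """Attach citation metadata to a library's resolver address (illustrative only)."""
    return resolver_base + "?" + urlencode(metadata)


# The same citation metadata, sent to two different (made-up) library resolvers.
citation = dict(genre="article", issn="1082-9873", volume="6", issue="2", date="2000")
link_a = make_openurl("http://resolver.library-a.example.org/menu", **citation)
link_b = make_openurl("http://resolver.library-b.example.org/menu", **citation)
print(link_a)
print(link_b)
```

The metadata is identical in both links; only the resolver differs, which is what lets each library decide which services to offer its own users.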
33
non-OpenURL linking
[Diagram labels: resource; reference; resolution of metadata into link; link to referenced work; resource]
34
OpenURL linking
[Diagram labels: reference; provision of OpenURL; transportation of metadata & identifiers; user-specific, context-sensitive resolution of metadata & identifiers into services]
35
  • default links
  • restricted in nature
  • action-radius restricted by business agreements
  • not context-sensitive

[Diagram (Herbert Van de Sompel): resource1, resource2, resource3 on the metadata plane]
36

[Diagram (Herbert Van de Sompel): extended services plane with service component1 and service component2; metadata plane with resource1, resource2, resource3]
37
NISO OpenURL Standardization Charge
  • Use existing OpenURL Framework as starting
    point
  • notion of context-sensitive services
  • notion of transporting contextual metadata
    packages to obtain context-sensitive services
  • Define syntax and transport-method for
    contextual metadata packages
  • Ensure extensibility
  • must support future applications
  • must support other information communities
  • => Generalize and Standardize

38
NISO OpenURL Standardization Charge
  • Therefore, to be addressed were
  • OpenURL Framework beyond scholarly resources
  • contextual metadata packages
  • Syntax for contextual metadata packages
  • Transport of contextual metadata packages

39
OpenURL Status
  • (Nearly) a NISO standard
  • check for details
  • http://library.caltech.edu/openurl/

40
Naming: Handles & DOIs
41
Naming
  • Fundamental to other technologies (OAI-PMH,
    OpenURL, etc.)
  • Options
  • URNs
  • Persistent URLs (PURLs)
  • http://purl.org/
  • Handles
  • http://www.handle.net/
  • Digital Object Identifiers
  • http://www.doi.org/
  • ARK
  • http://www.cdlib.org/inside/diglib/ark/

42
Inverted Archives
  • Unit of discourse is no longer an archive or
    service, but a DOI which has services linked from
    it
  • cf.
  • UPS demonstration prototype
  • Smart Objects, Dumb Archives (SODA) model

43
Example
http://dx.doi.org/10.1145/374308.374342
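The structure of this example can be taken apart with a short sketch. It assumes the usual DOI shape of a prefix starting with "10." followed by a slash and a suffix; the helper names `split_doi` and `doi_to_url` are made up for illustration, and no percent-encoding of unusual suffix characters is attempted.

```python
def split_doi(doi):
    """Split a DOI into (prefix, suffix) at the first slash."""
    prefix, slash, suffix = doi.partition("/")
    if not slash or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix


def doi_to_url(doi, resolver="http://dx.doi.org/"):
    """Turn a DOI into a resolvable URL by prepending the resolver address."""
    split_doi(doi)  # validate before building the link
    return resolver + doi


print(split_doi("10.1145/374308.374342"))
print(doi_to_url("10.1145/374308.374342"))
```

The name stays the same if the resolver address changes; only the `resolver` argument would need to be updated.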
44
Object Models
45
Popular Object Models
  • METS
  • used in DSpace, Fedora
  • http://www.loc.gov/standards/mets/
  • MPEG-21 DIDL
  • http://xml.coverpages.org/mpeg21-didl.html
  • used in LANL DLs
  • http://www.dlib.org/dlib/november03/bekaert/11bekaert.html
  • http://www.dlib.org/dlib/february04/bekaert/02bekaert.html
  • http://lib-www.lanl.gov/~herbertv/papers/jcdl2004-submitted-draft.pdf

46
Object Models & OAI-PMH
[Diagram labels: resource; item (oai:foo.edu:1234); records (METS)]
Move from simple metadata files pointing to
resources
to records as modeled representations of
resources
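The item identifier in this diagram follows the colon-delimited oai-identifier form (scheme, repository namespace, local id), as in oai:foo.edu:1234. A small sketch of taking such an identifier apart is given below; the function name `parse_oai_identifier` and the returned dictionary keys are assumptions made for this example.

```python
def parse_oai_identifier(identifier):
    """Split an identifier of the form oai:<namespace>:<local-id> into its parts."""
    # Split on the first two colons only, so a local id may itself contain colons.
    parts = identifier.split(":", 2)
    if len(parts) != 3 or parts[0] != "oai" or not parts[1] or not parts[2]:
        raise ValueError(f"not an oai-identifier: {identifier!r}")
    return {"scheme": parts[0], "namespace": parts[1], "local_id": parts[2]}


print(parse_oai_identifier("oai:foo.edu:1234"))
```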
47
Download and Go!
48
Where Do You Want to Build?
[Diagram labels: user; service provider; data providers (several); local context-sensitive services; software labels: CDSware, EPrints.org, CDSware]
49
Fedora
  • joint project between Cornell & UVa
  • funded by the Mellon Foundation
  • a repository management system
  • focuses on complex digital objects and their
    behaviors
  • more info
  • http://www.fedora.info/
  • D-Lib Magazine, 9(4)
  • http://www.dlib.org/dlib/april03/staples/04staples.html

50
DSpace
  • MIT & HP Labs
  • constructed to capture all the output of MIT's
    faculty
  • now generalized to the DSpace Federation
  • 8 top universities in the US & Canada
  • More info
  • http://www.dspace.org/
  • http://sourceforge.net/projects/dspace/
  • D-Lib Magazine 9(1)
  • http://www.dlib.org/dlib/january03/smith/01smith.html

51
EPrints.org
  • developed at Southampton University
  • part of larger suite of institutional/author
    self-archiving tools and services
  • e.g. citebase & paracite
  • widely adopted -- 100 sites
  • http://software.eprints.org/ep2
  • more info
  • http://www.eprints.org/
  • http://www.arl.org/sparc/core/index.asp?pageg206

52
CDSware
  • developed at CERN
  • data provider & service provider
  • large-scale use @ CERN (> 600k records)
  • in use at a few non-CERN sites
  • free & paid support models
  • more info
  • http://cdsware.cern.ch/

53
Kepler
  • P2P publishing for academia
  • community servers for coordination, management
  • archivelets for individual laptops, PCs
  • more info
  • http://kepler.cs.odu.edu/
  • D-Lib Magazine 7(4)
  • http://www.dlib.org/dlib/april01/maly/04maly.html

54
OpenResolver
  • developed by UKOLN
  • open source
  • OpenURL 0.1 format resolver
  • NISO 1.0 format???
  • more info
  • Ariadne, 28
  • http://www.ariadne.ac.uk/issue28/resolver/
  • ftp://ftp.ukoln.ac.uk/metadata/tools/openresolver/
  • http://www.ukoln.ac.uk/distributed-systems/openurl/

55
Conclusions
56
Why The OAI-PMH is NOT Important
  • Users don't care
  • OAI-PMH is middleware
  • if done right, the uninterested user should never
    have to know
  • Using OAI-PMH does not ensure a good SP
  • OAI-PMH is (or is becoming) HTTP for DLs
  • few people get excited about http now
  • http & OAI-PMH are core technologies whose
    presence is now assumed

57
Digital Library Technologies
  • http
  • XML
  • OAI-PMH
  • OpenURL ?

58
Other Uses For the OAI-PMH
  • Assumptions
  • Traditional DLs / SPs will continue on their
    present path of increasing sophistication
  • citation indexing, search results viz,
    personalization, recommendations, subject-based
    filtering, etc.
  • growth rates remain the same (5x as many DPs as SPs)
  • Premise: OAI-PMH is applicable to any scenario
    that needs to update / synchronize distributed
    state
  • Future opportunities are possible by creatively
    interpreting the OAI-PMH data model
  • See Van de Sompel, Young & Hickey, D-Lib Magazine
    July 2003, http://www.dlib.org/dlib/july03/young/07young.html
  • Nelson, 2nd OAI Workshop, http://agenda.cern.ch/askArchive.php?base=agenda&categ=a02333&id=a02333s5t8/transparencies
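The premise about synchronizing distributed state can be illustrated with a datestamp-driven update of a local mirror. This is a sketch under stated assumptions: changes arrive as (identifier, datestamp, deleted) tuples, as a harvester might extract them from record headers, and the function name `sync` and the mirror layout are invented here.

```python
def sync(mirror, changes):
    """Apply harvested changes to a local mirror; return the newest datestamp seen.

    `mirror` maps identifier -> {"datestamp": str, "deleted": bool}.
    """
    newest = None
    for identifier, datestamp, deleted in changes:
        known = mirror.get(identifier)
        # ISO 8601 dates of the same granularity compare correctly as strings.
        if known is None or datestamp > known["datestamp"]:
            mirror[identifier] = {"datestamp": datestamp, "deleted": deleted}
        if newest is None or datestamp > newest:
            newest = datestamp
    return newest


mirror = {"oai:foo.edu:1": {"datestamp": "2004-01-10", "deleted": False}}
last_seen = sync(mirror, [
    ("oai:foo.edu:1", "2004-03-02", True),   # deleted at the source
    ("oai:foo.edu:2", "2004-02-15", False),  # new at the source
    ("oai:foo.edu:1", "2004-01-01", False),  # stale change, ignored
])
print(last_seen, mirror)
```

The returned datestamp can serve as the `from` argument of the next harvest, so each round transfers only what changed since the last one.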

59
OpenURL Framework evolution
60
The Future: Community Building
  • Ultimately, protocols and metadata formats are
    not what makes a difference
  • Rather, it is the critical mass afforded by a
    common set of utilities (cf. http, Dublin Core, XML)
  • The best current example: The Open Language
    Archives Community
  • http://www.language-archives.org/
  • OAI-PMH provides the basis for communication
    between strangers, but allows even richer
    communication between friends

61
Further Reading
  • Gerry McKiernan, Library Hi-Tech News
  • http://www.public.iastate.edu/~gerrymck/OAI-SP-I.pdf
  • http://www.public.iastate.edu/~gerrymck/OAI-SP-II.pdf
  • http://www.public.iastate.edu/~gerrymck/OAI-SP-III.pdf
  • Open Archives Forum OAI-PMH Tutorial
  • http://www.oaforum.org/tutorial/
  • A Survey of Digital Library Aggregation
    Services
  • http://www.diglib.org/pubs/brogan/
  • Open Access News
  • http://www.earlham.edu/~peters/fos/fosblog.html
  • Guide To Institutional Repository Software
  • http://www.soros.org/openaccess/software/

62
Great Stuff I Did Not Cover
  • OAI-PMH
  • Static Repositories
  • http://www.openarchives.org/OAI/2.0/guidelines-static-repository.htm
  • OAI-Rights
  • http://www.openarchives.org/documents/OAIRightsWhitePaper.html
  • http://www.openarchives.org/news/oairightspress030929.html
  • Digital Preservation
  • http://www.digitalpreservation.gov/