Title: Implementation of Digital Libraries Michael L. Nelson Old Dominion University mln@cs.odu.edu http://www.cs.odu.edu/~mln/
1Implementation of Digital Libraries
Michael L. Nelson
Old Dominion University
mln@cs.odu.edu
http://www.cs.odu.edu/~mln/
- Congreso Internacional de Información en Salud
- Lima, Peru
- May 28, 2004
2Acknowledgements
- ODU: K. Maly, M. Zubair, J. Bollen
- LANL: R. Luce, X. Liu
- NASA: G. Roncaglia, J. Rocker, C. Mackey
- Cornell: C. Lagoze, S. Warner
- MAGiC (UK): Paul Needham
- and, of course, Herbert Van de Sompel (LANL)
- the OpenURL slides are nicked from his presentations
3Outline
- A bit of history
- Core technologies & issues
- OAI-PMH
- deep web
- OpenURL
- Handles / DOIs
- Object Models
- Example implementations
- Download and go
covered only briefly
4OAI-PMH
5Background
- I met Herbert Van de Sompel in April 1999...
- we spoke of a demonstration project he had in mind, for which he had received sponsorship from Paul Ginsparg and Rick Luce
- We wanted to demonstrate a multi-disciplinary DL that leveraged the large number of high quality, yet often isolated, tech report servers, e-print servers, etc.
- most digital libraries (DLs) had grown up along single disciplines or institutions
- little to no interoperability: isolated DL gardens
- Universal Preprint Service
- Demonstrated at Santa Fe NM, October 21-22, 1999
- http://web.archive.org/web/*/http://ups.cs.odu.edu/
- D-Lib Magazine, 6(2) 2000 (2 articles)
- http://www.dlib.org/dlib/february00/02contents.html
- UPS was soon renamed the Open Archives Initiative (OAI)
- http://www.openarchives.org/
6Result: OAI
- The OAI was the result of the demonstration and discussion during the Santa Fe meeting
- OAI: a bunch of people, a religion, a cult, etc.
- OAI Protocol For Metadata Harvesting (OAI-PMH): the protocol created and maintained by the OAI
- Initial focus was on federating collections of scholarly e-print materials
- however, interest grew and the scope and application of OAI-PMH expanded to become a generic bulk metadata transport protocol
- Note
- OAI-PMH is only about metadata -- not full text!
- but what is metadata vs. full-text?
- OAI is neutral with respect to the nature of the metadata or the resources the metadata describes
- read: commercial publishers have an interest in OAI-PMH too...
7OAI-PMH Mechanics
Request is encoded in http
Response is encoded in XML
XML Schemas for the responses are defined in the OAI-PMH document
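A minimal sketch of these mechanics, assuming a hypothetical repository at repository.example.org: the request is an ordinary HTTP GET URL whose query string carries the verb and its arguments, and the response is XML in the OAI-PMH 2.0 namespace. The helper name build_request and the sample response are illustrative only, and the response is trimmed to the elements used here.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

def build_request(base_url, verb, **arguments):
    """Encode an OAI-PMH request as an HTTP GET URL (hypothetical helper)."""
    params = {"verb": verb}
    # Leave out any argument that was not supplied.
    params.update({k: v for k, v in arguments.items() if v is not None})
    return base_url + "?" + urlencode(params)

# Hypothetical baseURL; a real repository publishes its own.
BASE_URL = "http://repository.example.org/oai"

identify_url = build_request(BASE_URL, "Identify")
record_url = build_request(BASE_URL, "GetRecord",
                           identifier="oai:example.org:1234",
                           metadataPrefix="oai_dc")

# A trimmed, made-up Identify response, used instead of a network call.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2004-05-28T12:00:00Z</responseDate>
  <request verb="Identify">http://repository.example.org/oai</request>
  <Identify>
    <repositoryName>Example Repository</repositoryName>
    <baseURL>http://repository.example.org/oai</baseURL>
    <protocolVersion>2.0</protocolVersion>
  </Identify>
</OAI-PMH>"""

root = ET.fromstring(SAMPLE_RESPONSE)
repository_name = root.findtext(f"{OAI_NS}Identify/{OAI_NS}repositoryName")

print(identify_url)
print(record_url)
print(repository_name)
```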
8Overview of OAI-PMH Verbs
Verb                 Function
Identify             description of archive
ListMetadataFormats  metadata formats supported by archive
ListSets             sets defined by archive
ListIdentifiers      OAI unique ids contained in archive
ListRecords          listing of N records
GetRecord            listing of a single record

archival metadata: Identify, ListMetadataFormats, ListSets
harvesting verbs: ListIdentifiers, ListRecords, GetRecord

most verbs take arguments: dates, sets, ids, metadata formats, and a resumption token (for flow control)
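The flow-control role of the resumption token can be sketched as follows. This is an illustration, not a full harvester: the fetch callable, the two canned pages and the token value "page2" are all made up so the loop runs without a network. The one protocol detail relied on is that a follow-up request carries only the verb and the resumptionToken.

```python
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

def list_identifiers(fetch, metadata_prefix="oai_dc", set_spec=None):
    """Yield every identifier, following resumption tokens until none is left.

    `fetch` is a hypothetical callable: it takes a dict of request arguments
    and returns the XML response text (e.g. a thin wrapper around HTTP GET).
    """
    args = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    if set_spec:
        args["set"] = set_spec
    while True:
        root = ET.fromstring(fetch(args))
        for header in root.iter(OAI_NS + "header"):
            yield header.findtext(OAI_NS + "identifier")
        token = root.find(".//" + OAI_NS + "resumptionToken")
        # A missing or empty token means the list is complete.
        if token is None or not (token.text or "").strip():
            break
        # resumptionToken is exclusive: resend only the verb and the token.
        args = {"verb": "ListIdentifiers",
                "resumptionToken": token.text.strip()}

def _page(ids, token):
    """Build a made-up ListIdentifiers response page for the demo."""
    headers = "".join(
        f"<header><identifier>{i}</identifier>"
        f"<datestamp>2004-05-28</datestamp></header>" for i in ids)
    return ('<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">'
            f"<ListIdentifiers>{headers}"
            f"<resumptionToken>{token}</resumptionToken>"
            "</ListIdentifiers></OAI-PMH>")

PAGES = {
    None: _page(["oai:example.org:1", "oai:example.org:2"], "page2"),
    "page2": _page(["oai:example.org:3"], ""),
}

def fake_fetch(args):
    """Stand-in for the network: pick a canned page by resumption token."""
    return PAGES[args.get("resumptionToken")]

ids = list(list_identifiers(fake_fetch))
print(ids)
```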
9OAI-PMH Data Model
item = identifier
record = identifier + metadata format + datestamp
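One way to picture this data model, as a sketch with made-up identifiers and class names: an item is named by its identifier alone, while each record of that item is pinned down by the identifier together with a metadata format and a datestamp, so one item can yield several records.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Record:
    """Illustrative record key: identifier + metadata format + datestamp."""
    identifier: str       # names the item the record was generated from
    metadata_prefix: str  # metadata format, e.g. "oai_dc"
    datestamp: str        # date of creation, modification or deletion

item_id = "oai:example.org:1234"  # hypothetical identifier

dc_record = Record(item_id, "oai_dc", "2004-05-28")
marc_record = Record(item_id, "marc21", "2004-05-28")

# Group records by the item they belong to.
records_by_item = {}
for record in (dc_record, marc_record):
    records_by_item.setdefault(record.identifier, []).append(record)

print(len(records_by_item), "item,", len(records_by_item[item_id]), "records")
```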
10Data Providers / Service Providers
11Aggregators
- aggregators allow for
- scalability for OAI-PMH
- load balancing
- community building
- discovery
service providers (harvesters)
data providers (repositories)
aggregator
12Aggregators
- Frequently interchangeable terms
- aggregators: likely to be community / institutionally focused
- caches: store a copy, less likely to be community-oriented
- proxies: less likely to store a copy, may gateway between OAI-PMH and other protocols
- Dienst / OAI Gateway: Harrison, Nelson, Zubair, JCDL 03
- To learn more about aggregators, caches & proxies
- http://www.openarchives.org/OAI/2.0/guidelines-aggregator.htm
- http://www.cs.odu.edu/~mln/jcdl03/
13Example Aggregators
- Arc - http://arc.cs.odu.edu/
- first described hierarchical harvesting in D-Lib Magazine, 7(4) 2001
- http://www.dlib.org/dlib/april01/liu/04liu.html
- Celestial - http://celestial.eprints.org/
- among other services, it provides a history of harvests (successful vs. errors)
- http://celestial.eprints.org/cgi-bin/status
14OAI-PMH 2.0 Registration
- unregistered because
- testing / development
- not for public harvesting
- public, but low-profile
- never got around to it
- ???
??? unregistered repositories
150 repositories registered
DP:SP = 5:1
Data Providers: http://www.openarchives.org/Register/BrowseSites.pl
Service Providers: http://www.openarchives.org/service/listproviders.html
15Registration is Nice, But Not Required
- OAI-PMH is (becoming) the http for digital libraries
- there is no central registry of http servers
- remember the NCSA What's New page? (ca. 1994)
- There will never be registration support in OAI-PMH
- registries are a type of service provider, built on top of OAI-PMH
- registration will be an integral part of community building
- friends
16NASA <friends> example
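As a sketch of how a harvester might discover other repositories without a registry, the snippet below reads the baseURLs listed in a friends container inside an Identify response. The sample response and its URLs are made up and trimmed, and the friends namespace shown is an assumption based on the OAI implementation guidelines; check it against the current guidelines before relying on it.

```python
import xml.etree.ElementTree as ET

# Assumed namespace of the optional <friends> description container.
FRIENDS_NS = "{http://www.openarchives.org/OAI/2.0/friends/}"

# Made-up, trimmed Identify response for a hypothetical repository.
SAMPLE_IDENTIFY = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <Identify>
    <repositoryName>Example Repository</repositoryName>
    <baseURL>http://a.example.org/oai</baseURL>
    <description>
      <friends xmlns="http://www.openarchives.org/OAI/2.0/friends/">
        <baseURL>http://b.example.org/oai</baseURL>
        <baseURL>http://c.example.org/oai</baseURL>
      </friends>
    </description>
  </Identify>
</OAI-PMH>"""

def friend_base_urls(identify_xml):
    """Return the baseURLs a repository lists as its friends."""
    root = ET.fromstring(identify_xml)
    # Only baseURL elements in the friends namespace match; the
    # repository's own baseURL lives in the OAI-PMH namespace.
    return [el.text.strip() for el in root.iter(FRIENDS_NS + "baseURL")]

friends = friend_base_urls(SAMPLE_IDENTIFY)
print(friends)
```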
17NACA Technical Report Server
- publicly available
- began in 1996
- details in NASA TM-1999-209127
- scanned reports from 1917-1958
- NACA: predecessor to NASA
- contents mirrored with the MAGiC project
- a UK-based grey-literature preservation project
- OAI-PMH used to mirror contents
http://naca.larc.nasa.gov/
http://naca.larc.nasa.gov/oai2.0/
18NACA Report 1345 as seen through its native DL http://naca.larc.nasa.gov/
19NACA Report 1345 as seen through MAGiC http://www.magic.ac.uk/
20NACA Report 1345 as seen through Scirus (Elsevier) http://www.scirus.com/
21NACA Report 1345 as seen through my.OAI (FS Consulting) http://www.myoai.com/
22NASA Technical Report Server
- replacement for the previous distributed searching version of NTRS
- MySQL
- Va Tech harvester
- modified bucket
- details in Nelson, Rocker, Harrison, Library Hi-Tech, 21(2) (March 2003)
- a service provider & aggregator
- same OAI baseURL as used for interactive searching
http://ntrs.nasa.gov/
23NASA Technical Report Server
- advanced, fielded search
- explicit query routing
- 12 NASA repositories
- 4 non-NASA repositories
- turned off by default
- >600k abstracts, >300k full-text
24Service Providers
- It is clear that SPs are proliferating, despite (because of?) the inherent bias toward DPs in the protocol
- easy to be a DP -> many DPs -> SPs eventually emerge
- hard to be a DP -> SPs starve
- currently 5x more DPs than SPs
- SPs are beginning to offer increasingly sophisticated services
- competitive market originally envisioned for SPs is emerging
25Community Building
www.ndltd.org
26OAI-PMH & The Deep Web
27Exposing Repository Contents
- DP9: Webcrawler access to OAI-PMH repositories
- http://dlib.cs.odu.edu/dp9/
- JCDL 02: http://www.cs.odu.edu/~liu_x/dp9/dp9.pdf
- An Apache module for OAI-PMH
- http://www.modoai.org/
- Extensible Repository Resource Locators (ERRoLs) for OAI Identifiers
- http://www.oclc.org/research/projects/oairesolver/default.htm
28Race for This New Market
- Yahoo! & University of Michigan
- http://www.umich.edu/news/index.html?Releases/2004/Mar04/r031004
- Google & CrossRef
- http://www.nature.com/nature/focus/accessdebate/17.html
29OpenURL
slides from Herbert Van de Sompel, LANL
30Origins & Motivation
- The Context: Library Automation Environment anno 1998
- distributed information environment
- local & remote A&I databases
- rapidly growing e-journal collection
- need to interlink the available information
- The Problem
- links are delivered by info providers
- links are not sensitive to user's context
- appropriate copy problem
- links dependent on business agreements between information vendors
- links don't cover the complete collection
31Origins & Motivation
- The Context: Library Automation Environment anno 1998
- distributed information environment
- local & remote A&I databases
- rapidly growing e-journal collection
- need to interlink the available information
- The REAL Problem
- libraries have no say in linking
- libraries are losing core part of the organizing information task
- expensive collection is not used optimally
- users are not well served
32Origins & Motivation
- The Solution
- In information services
- DO NOT provide a link which is an actual service related to a referenced item (e.g. a link from a record in an A&I database to the corresponding full-text)
- BUT rather provide
- a link that transports metadata about the referenced item
- to
- others that are better placed to provide service links
OpenURL
Linking server operated by library
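A rough sketch of "a link that transports metadata": the information service appends citation metadata as key/value pairs to the address of the user's own library linking server. The keys follow the OpenURL 0.1 style (genre, issn, volume, issue, date); both resolver addresses are hypothetical, and the citation is the D-Lib Magazine issue mentioned earlier.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def make_openurl(resolver_base, metadata):
    """Attach citation metadata to a library's linking-server address."""
    return resolver_base + "?" + urlencode(metadata)

# Metadata about the referenced item, not a link to the item itself.
citation = {
    "genre": "journal",
    "issn": "1082-9873",   # D-Lib Magazine
    "volume": "6",
    "issue": "2",
    "date": "2000",
}

# Hypothetical linking servers operated by two different libraries.
openurl_a = make_openurl("http://resolver.library-a.example.edu/menu", citation)
openurl_b = make_openurl("http://resolver.library-b.example.edu/menu", citation)

# The same metadata travels to whichever resolver serves the user.
metadata_a = parse_qs(urlsplit(openurl_a).query)
metadata_b = parse_qs(urlsplit(openurl_b).query)

print(openurl_a)
print(openurl_b)
```

The design point is that the information provider only needs to know the user's resolver address; which services to offer is decided by the library.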
33non-OpenURL linking
[Diagram: a reference in one resource; resolution of metadata into link; link to referenced work in another resource]
34OpenURL linking
[Diagram: reference; provision of OpenURL; transportation of metadata & identifiers; user-specific, context-sensitive resolution of metadata & identifiers into services]
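The resolving side can be sketched in the same spirit. This is a toy, not how any particular linking server works: the holdings tables, service names and URLs are invented, and a real resolver consults far richer knowledge bases. It only shows that identical metadata resolves into different services depending on the library's context.

```python
from urllib.parse import urlsplit, parse_qs

def resolve(openurl, holdings):
    """Turn the metadata carried by an OpenURL into a list of services.

    `holdings` is a made-up description of one library's context:
    which ISSNs it licenses full text for, and where its ILL form lives.
    """
    query = parse_qs(urlsplit(openurl).query)
    metadata = {key: values[0] for key, values in query.items()}
    services = []
    issn = metadata.get("issn")
    if issn in holdings["full_text"]:
        services.append(("full text", holdings["full_text"][issn]))
    # Every user can at least request the item through the library.
    services.append(("interlibrary loan", holdings["ill_form"]))
    return services

library_a = {"full_text": {"1082-9873": "http://www.dlib.org/"},
             "ill_form": "http://library-a.example.edu/ill"}
library_b = {"full_text": {},
             "ill_form": "http://library-b.example.edu/ill"}

QUERY = "genre=journal&issn=1082-9873&volume=6&issue=2&date=2000"
services_a = resolve("http://resolver.library-a.example.edu/menu?" + QUERY,
                     library_a)
services_b = resolve("http://resolver.library-b.example.edu/menu?" + QUERY,
                     library_b)

print(services_a)
print(services_b)
```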
35- default links
- restricted in nature
- action-radius restricted by business agreements
- not context-sensitive
[Diagram: metadata plane with resource1, resource2, resource3 (Herbert Van de Sompel)]
36 extended services plane
[Diagram: service component1, service component2; metadata plane with resource1, resource2, resource3 (Herbert Van de Sompel)]
37NISO OpenURL Standardization Charge
- Use existing OpenURL Framework as starting point
- notion of context-sensitive services
- notion of transporting contextual metadata packages to obtain context-sensitive services
- Define syntax and transport-method for contextual metadata packages
- Ensure extensibility
- must support future applications
- must support other information communities
- => Generalize and Standardize
38NISO OpenURL Standardization Charge
- Therefore, to be addressed were
- OpenURL Framework beyond scholarly resources
- contextual metadata packages
- Syntax for contextual metadata packages
- Transport of contextual metadata packages
39OpenURL Status
- (Nearly) a NISO standard
- check for details
- http://library.caltech.edu/openurl/
40Naming: Handles / DOIs
41Naming
- Fundamental to other technologies (OAI-PMH, OpenURL, etc.)
- Options
- URNs
- Persistent URLs (PURLs)
- http://purl.org/
- Handles
- http://www.handle.net/
- Digital Object Identifiers
- http://www.doi.org/
- ARK
- http://www.cdlib.org/inside/diglib/ark/
42Inverted Archives
- Unit of discourse is no longer an archive or service, but a DOI which has services linked from it
- cf.
- UPS demonstration prototype
- Smart Objects, Dumb Archives (SODA) model
43Example
http://dx.doi.org/10.1145/374308.374342
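A small sketch of how the example is put together: a DOI is a prefix and a suffix separated by the first "/", and prepending a resolver address turns the name into a URL. The helper names are illustrative, and the validation is deliberately minimal.

```python
from urllib.parse import quote

def split_doi(doi):
    """Split a DOI into (prefix, suffix) at the first slash."""
    prefix, slash, suffix = doi.partition("/")
    if not (prefix.startswith("10.") and slash and suffix):
        raise ValueError(f"does not look like a DOI: {doi!r}")
    return prefix, suffix

def doi_to_url(doi, resolver="http://dx.doi.org/"):
    """Build a resolvable URL by prepending the resolver address."""
    split_doi(doi)  # raises ValueError for malformed input
    # Percent-encode anything unsafe in a URL path; keep the slash.
    return resolver + quote(doi, safe="/")

prefix, suffix = split_doi("10.1145/374308.374342")
url = doi_to_url("10.1145/374308.374342")
print(prefix, suffix)
print(url)
```

Because the name and the resolver are separate, the same DOI keeps working if the content moves: only the resolver's record changes.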
44Object Models
45Popular Object Models
- METS
- used in DSpace, Fedora
- http://www.loc.gov/standards/mets/
- MPEG-21 DIDL
- http://xml.coverpages.org/mpeg21-didl.html
- used in LANL DLs
- http://www.dlib.org/dlib/november03/bekaert/11bekaert.html
- http://www.dlib.org/dlib/february04/bekaert/02bekaert.html
- http://lib-www.lanl.gov/~herbertv/papers/jcdl2004-submitted-draft.pdf
46Object Models & OAI-PMH
resource
item
oai:foo.edu:1234
records
METS
Move from simple metadata files pointing to resources to records as modeled representations of resources
47Download and Go!
48Where Do You Want to Build?
[Diagram labels: user; local context-sensitive services; service provider; data providers (x5, ...); CDSware; EPrints.org]
49Fedora
- joint project between Cornell & UVa
- funded by the Mellon Foundation
- a repository management system
- focuses on complex digital objects and their behaviors
- more info
- http://www.fedora.info/
- D-Lib Magazine, 9(4)
- http://www.dlib.org/dlib/april03/staples/04staples.html
50DSpace
- MIT & HP Labs
- constructed to capture all the output of MIT's faculty
- now generalized to the DSpace Federation
- 8 top universities in the US & Canada
- More info
- http://www.dspace.org/
- http://sourceforge.net/projects/dspace/
- D-Lib Magazine 9(1)
- http://www.dlib.org/dlib/january03/smith/01smith.html
51EPrints.org
- developed at Southampton University
- part of larger suite of institutional/author self-archiving tools and services
- e.g. citebase & paracite
- widely adopted -- 100 sites
- http://software.eprints.org/ep2
- more info
- http://www.eprints.org/
- http://www.arl.org/sparc/core/index.asp?page=g206
52CDSware
- developed at CERN
- data provider & service provider
- large-scale use @ CERN (> 600k records)
- in use at a few non-CERN sites
- free & paid support models
- more info
- http://cdsware.cern.ch/
53Kepler
- P2P publishing for academia
- community servers for coordination, management
- archivelets for individual laptops, PCs
- more info
- http://kepler.cs.odu.edu/
- D-Lib Magazine 7(4)
- http://www.dlib.org/dlib/april01/maly/04maly.html
54OpenResolver
- developed by UKOLN
- open source
- OpenURL 0.1 format resolver
- NISO 1.0 format???
- more info
- Ariadne, 28
- http://www.ariadne.ac.uk/issue28/resolver/
- ftp://ftp.ukoln.ac.uk/metadata/tools/openresolver/
- http://www.ukoln.ac.uk/distributed-systems/openurl/
55Conclusions
56Why The OAI-PMH is NOT Important
- Users don't care
- OAI-PMH is middleware
- if done right, the uninterested user should never have to know
- Using OAI-PMH does not ensure a good SP
- OAI-PMH is (or is becoming) HTTP for DLs
- few people get excited about http now
- http & OAI-PMH are core technologies whose presence is now assumed
57Digital Library Technologies
- http
- XML
- OAI-PMH
- OpenURL ?
58Other Uses For the OAI-PMH
- Assumptions
- Traditional DLs / SPs will continue on their present path of increasing sophistication
- citation indexing, search results viz, personalization, recommendations, subject-based filtering, etc.
- growth rates remain the same (5x DPs as SPs)
- Premise: OAI-PMH is applicable to any scenario that needs to update / synchronize distributed state
- Future opportunities are possible by creatively interpreting the OAI-PMH data model
- See Van de Sompel, Young & Hickey, D-Lib Magazine July 2003, http://www.dlib.org/dlib/july03/young/07young.html
- Nelson, 2nd OAI Workshop, http://agenda.cern.ch/askArchive.php?base=agenda&categ=a02333&id=a02333s5t8/transparencies
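The premise can be illustrated with a toy model of datestamp-based synchronization. Nothing here talks to a real repository: the in-memory source dictionary stands in for a data provider, changes_since stands in for a harvest restricted by a from date, and a value of None stands in for a deleted record. The "state" being kept in sync could be anything, which is the point of the premise.

```python
def changes_since(source, from_date=None):
    """Stand-in for a selective harvest: everything changed on or after
    `from_date`, as (identifier, datestamp, value) tuples."""
    return [(identifier, datestamp, value)
            for identifier, (datestamp, value) in sorted(source.items())
            if from_date is None or datestamp >= from_date]

def synchronize(mirror, source, last_harvest=None):
    """Bring `mirror` up to date and return the newest datestamp seen."""
    newest = last_harvest
    for identifier, datestamp, value in changes_since(source, last_harvest):
        if value is None:
            mirror.pop(identifier, None)   # deletion propagates
        else:
            mirror[identifier] = value     # creation or update
        # ISO 8601 dates compare correctly as plain strings.
        if newest is None or datestamp > newest:
            newest = datestamp
    return newest

# Made-up state held by the "data provider".
source = {"oai:example.org:1": ("2004-05-01", "v1"),
          "oai:example.org:2": ("2004-05-02", "v1")}
mirror = {}

first_stamp = synchronize(mirror, source)
after_first = dict(mirror)

# Later, one entry is updated and the other deleted at the source.
source["oai:example.org:1"] = ("2004-05-10", "v2")
source["oai:example.org:2"] = ("2004-05-11", None)

second_stamp = synchronize(mirror, source, first_stamp)
print(after_first, first_stamp)
print(mirror, second_stamp)
```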
59OpenURL Framework evolution
60The Future: Community Building
- Ultimately, protocols and metadata formats are not what makes a difference
- Rather, what matters is the critical mass afforded by a common set of utilities (cf. http, Dublin Core, XML)
- The best current example: The Open Language Archives Community
- http://www.language-archives.org/
- OAI-PMH provides the basis for communication between strangers, but allows even richer communication between friends
61Further Reading
- Gerry McKiernan, Library Hi-Tech News
- http://www.public.iastate.edu/~gerrymck/OAI-SP-I.pdf
- http://www.public.iastate.edu/~gerrymck/OAI-SP-II.pdf
- http://www.public.iastate.edu/~gerrymck/OAI-SP-III.pdf
- Open Archives Forum OAI-PMH Tutorial
- http://www.oaforum.org/tutorial/
- A Survey of Digital Library Aggregation Services
- http://www.diglib.org/pubs/brogan/
- Open Access News
- http://www.earlham.edu/~peters/fos/fosblog.html
- Guide To Institutional Repository Software
- http://www.soros.org/openaccess/software/
62Great Stuff I Did Not Cover
- OAI-PMH
- Static Repositories
- http://www.openarchives.org/OAI/2.0/guidelines-static-repository.htm
- OAI-Rights
- http://www.openarchives.org/documents/OAIRightsWhitePaper.html
- http://www.openarchives.org/news/oairightspress030929.html
- Digital Preservation
- http://www.digitalpreservation.gov/