Title: U.S. Government Use of the OAI-PMH
1U.S. Government Use of the OAI-PMH
- Michael L. Nelson
- Old Dominion University
- Norfolk, Virginia, USA
- mln_at_cs.odu.edu
- http://www.cs.odu.edu/~mln/
ISTEC / NSF Ibero-American Digital Library Joint Project Development Symposium
Campinas, Brazil - March 20, 2003
2Acknowledgements
- ODU: K. Maly, M. Zubair, J. Bollen, X. Liu
- LANL: R. Luce, X. Liu
- NASA: G. Roncaglia, J. Rocker
- MAGiC (UK): Paul Needham
3Outline
- Review of data provider / service provider model
- including aggregators
- Role of registration for repositories
- NASA projects
- OSTI demo project
- Technical Report Interchange (TRI)
- NASA, DOE, DOD
4Disclaimer: Scientific and Technical Information (STI)
- This talk will cover US Government focused / sponsored STI only
- This talk will not cover American Memory
- a cultural history project from the Library of Congress (LoC)
- http://memory.loc.gov/
- the LoC played a significant role in the definition and early adoption of the OAI-PMH
5Acronym Review
NASA
- LaRC: Langley Research Center
- CASI (Center for AeroSpace Information): http://www.sti.nasa.gov/
Department of Energy
- LANL: Los Alamos National Laboratory
- Sandia: Sandia National Laboratory
- OSTI (Office of Scientific and Technical Information): http://www.osti.gov/
Department of Defense
- AFRL: Air Force Research Laboratory
- DTIC (Defense Technical Information Center): http://www.dtic.mil/
6Data Providers / Service Providers
7Aggregators
- aggregators allow for
- scalability for OAI-PMH
- load balancing
- community building
- discovery
[Diagram: an aggregator sits between data providers (repositories) and service providers (harvesters)]
8Aggregators
- Frequently interchangeable terms
- aggregators: likely to be community / institutionally focused
- caches: store a copy, less likely to be community-oriented
- proxies: less likely to store a copy, may gateway between OAI-PMH and other protocols
- Dienst / OAI Gateway: Harrison, Nelson, Zubair, JCDL 03
- To learn more about aggregators, caches, proxies
- http://www.openarchives.org/OAI/2.0/guidelines-aggregator.htm
- http://www.cs.odu.edu/~mln/jcdl02/
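As a rough illustration of the aggregator role described above, the following Python sketch models a repository that harvests from upstream data providers and can in turn be harvested by others. The class names (`Record`, `Repository`, `Aggregator`), the identifiers, and the in-memory storage are assumptions made for this example; a real aggregator would speak OAI-PMH over HTTP.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Record:
    identifier: str  # e.g. "oai:ltrs:1" (made-up identifier)
    datestamp: str   # ISO 8601 date, so string order equals date order
    title: str


@dataclass
class Repository:
    """A data provider that answers ListRecords-style requests from memory."""
    name: str
    records: Dict[str, Record] = field(default_factory=dict)

    def add(self, record: Record) -> None:
        self.records[record.identifier] = record

    def list_records(self, from_date: str = "") -> List[Record]:
        # Mimics selective harvesting with the OAI-PMH 'from' argument.
        hits = [r for r in self.records.values() if r.datestamp >= from_date]
        return sorted(hits, key=lambda r: r.identifier)


@dataclass
class Aggregator(Repository):
    """Harvests upstream repositories and is itself harvestable."""
    upstream: List[Repository] = field(default_factory=list)

    def harvest(self, from_date: str = "") -> int:
        harvested = 0
        for repo in self.upstream:
            for record in repo.list_records(from_date):
                self.add(record)  # store a copy, keyed by identifier
                harvested += 1
        return harvested
```

Because `Aggregator` is itself a `Repository`, aggregators can be stacked, which is one way to picture hierarchical harvesting: a service provider harvests one aggregator instead of every data provider.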
9Example Aggregators
- Arc - http://arc.cs.odu.edu/
- first described hierarchical harvesting in D-Lib Magazine, 7(4) 2001
- http://www.dlib.org/dlib/april01/liu/04liu.html
- Celestial - http://celestial.eprints.org/
- among other services, it provides a history of harvests (successful vs. errors)
- http://celestial.eprints.org/cgi-bin/status
10OAI-PMH 2.0 Registration
- unregistered because
- testing / development
- not for public harvesting
- public, but low-profile
- never got around to it
- ???
??? unregistered repositories
75 repositories registered
Data Providers: http://www.openarchives.org/Register/BrowseSites.pl
Service Providers: http://www.openarchives.org/service/listproviders.html
DPSP 51
11Registration is Nice... But Not Required
- OAI-PMH is (becoming) the http for digital libraries
- there is no central registry of http servers
- remember the NCSA What's New page? (ca. 1994)
- There will never be registration support in OAI-PMH
- registries are a type of service provider, built on top of OAI-PMH
- registration will be an integral part of community building
- friends
12<friends>
- A lightweight, optional, DP-centric method to communicate the existence of others
- http://techreports.larc.nasa.gov/ltrs/oai2.0/?verb=Identify
- ..
- <description>
- <friends ..namespace stuff..>
- <baseURL>http://naca.larc.nasa.gov/oai2.0</baseURL>
- <baseURL>http://ntrs.nasa.gov/oai2.0</baseURL>
- <baseURL>http://horus.riacs.edu/perl/oai/</baseURL>
- <baseURL>http://ston.jsc.nasa.gov/collections/TRS/oai/</baseURL>
- </friends>
- </description>
- ..
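A harvester could discover these neighbouring repositories by reading the friends container out of an Identify response. The sketch below is one minimal way to do that with Python's standard library; the function name `extract_friends` and the trimmed `SAMPLE` response are assumptions for illustration, and the namespace URI is the one given in the OAI friends guidelines.

```python
import xml.etree.ElementTree as ET
from typing import List

FRIENDS_NS = "http://www.openarchives.org/OAI/2.0/friends/"


def extract_friends(identify_xml: str) -> List[str]:
    """Return every baseURL listed in <friends> containers of an Identify response."""
    root = ET.fromstring(identify_xml)
    urls: List[str] = []
    for friends in root.iter(f"{{{FRIENDS_NS}}}friends"):
        for base_url in friends.findall(f"{{{FRIENDS_NS}}}baseURL"):
            if base_url.text:
                urls.append(base_url.text.strip())
    return urls


# Trimmed, hand-written Identify response (not a captured server reply).
SAMPLE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <Identify>
    <repositoryName>Example repository</repositoryName>
    <description>
      <friends xmlns="http://www.openarchives.org/OAI/2.0/friends/">
        <baseURL>http://naca.larc.nasa.gov/oai2.0</baseURL>
        <baseURL>http://ntrs.nasa.gov/oai2.0</baseURL>
      </friends>
    </description>
  </Identify>
</OAI-PMH>
"""

friend_urls = extract_friends(SAMPLE)
```

Each returned baseURL can be queried with `?verb=Identify` in turn, so a harvester can walk from friend to friend without any central registry.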
13NASA <friends> example
14Langley Technical Report Server
- publicly available
- began as an anonymous ftp server in 1992; http access in 1993
- model for other technical report servers at other NASA centers
- details in NASA TM-109162
- mostly LaTeX, MS Word, other systems
- some scanned reports
http://techreports.larc.nasa.gov/ltrs/
http://techreports.larc.nasa.gov/ltrs/oai2.0/
15NACA Technical Report Server
- publicly available
- began in 1996
- details in NASA TM-1999-209127
- scanned reports from 1917-1958
- NACA: predecessor to NASA
- contents mirrored with the MAGiC project
- a UK-based grey-literature preservation project
- OAI-PMH used to mirror contents
http://naca.larc.nasa.gov/
http://naca.larc.nasa.gov/oai2.0/
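Mirroring a collection over OAI-PMH comes down to repeated ListRecords requests, optionally restricted with `from` and continued with resumption tokens. The sketch below shows that loop under stated assumptions: it is not the actual NACA/MAGiC mirroring code, the helper names are invented here, and `fake_fetch` stands in for a real HTTP call (for example `urllib.request.urlopen(url).read().decode("utf-8")`).

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, Optional
from urllib.parse import urlencode

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc",
                     from_date: Optional[str] = None,
                     token: Optional[str] = None) -> str:
    """Build a ListRecords request; resumptionToken is an exclusive argument."""
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            params["from"] = from_date
    return base_url + "?" + urlencode(params)


def harvest_identifiers(base_url: str, fetch: Callable[[str], str],
                        from_date: Optional[str] = None) -> Iterator[str]:
    """Yield record identifiers, following resumption tokens until exhausted."""
    token = ""
    while True:
        url = list_records_url(base_url, from_date=from_date, token=token)
        root = ET.fromstring(fetch(url))
        for header in root.iter(OAI + "header"):
            identifier = header.findtext(OAI + "identifier")
            if identifier:
                yield identifier.strip()
        token_el = next(root.iter(OAI + "resumptionToken"), None)
        token = (token_el.text or "").strip() if token_el is not None else ""
        if not token:  # missing or empty token means the list is complete
            break


def _page(identifier: str, token: str) -> str:
    # Hand-written two-page response used only for this offline demonstration.
    return (
        '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>'
        f"<record><header><identifier>{identifier}</identifier></header></record>"
        f"<resumptionToken>{token}</resumptionToken>"
        "</ListRecords></OAI-PMH>"
    )


def fake_fetch(url: str) -> str:
    if "resumptionToken=page2" in url:
        return _page("oai:example:2", "")
    return _page("oai:example:1", "page2")
```

Passing the date of the previous harvest as `from_date` keeps each mirroring run incremental, since only records changed since then are returned.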
16NACA Report 1345 as seen through its native DL http://naca.larc.nasa.gov/
17NACA Report 1345 as seen through MAGiC http://www.magic.ac.uk/
18NACA Report 1345 as seen through Scirus (Elsevier) http://www.scirus.com/
19NACA Report 1345 as seen through my.OAI (FS Consulting) http://www.myoai.com/
20NTRS OAI Architecture
[Diagram: a user sends a query (e.g. "search for cfd applications") to NTRS, which holds a local copy of metadata harvested from the nodes LTRS, ATRS, GTRS, CASITRS, . . .]
- all searching, browsing, etc. performed on the local copy of metadata at NTRS
- metadata harvested offline, through OAI interface
- each node independently maintained
- individual nodes can still support direct user interaction
- content (reports) remain archived at the local sites
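The architecture above separates metadata (copied centrally) from content (left at the nodes). A minimal sketch of that split, assuming an in-memory store and made-up record URLs under example.org, might look like this; `HarvestedRecord` and `MetadataStore` are hypothetical names, not part of NTRS.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class HarvestedRecord:
    node: str        # originating node, e.g. "LTRS"
    identifier: str  # OAI identifier of the record
    title: str
    url: str         # the report itself stays at the originating site


class MetadataStore:
    """Local copy of harvested metadata; all searching happens here."""

    def __init__(self) -> None:
        self._records: Dict[str, HarvestedRecord] = {}

    def ingest(self, records: Iterable[HarvestedRecord]) -> None:
        # Called offline, after a harvest of one node has completed.
        for record in records:
            self._records[record.identifier] = record

    def search(self, query: str) -> List[HarvestedRecord]:
        # Naive match: every query term must appear in the title.
        terms = query.lower().split()
        hits = [r for r in self._records.values()
                if all(t in r.title.lower() for t in terms)]
        return sorted(hits, key=lambda r: r.identifier)
```

A search never contacts the nodes; only following a hit's `url` does, which is why a slow or offline node does not affect query response time.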
21NASA Technical Report Server
- (nearly) publicly available
- replacement for the current distributed searching version of NTRS
- MySQL
- Va Tech harvester
- modified bucket
- details in Nelson, Rocker, Harrison, Library Hi-Tech, 21(2) (March 2003)
- a service provider aggregator
- same OAI baseURL as used for interactive searching
http://ntrs.nasa.gov/
22NASA Technical Report Server
- advanced, fielded search
- explicit query routing
- 10 NASA repositories
- 4 non-NASA repositories
- turned off by default
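Explicit query routing, with some repositories switched off unless the user selects them, could be sketched roughly as below. The catalog contents are placeholders (`EXTERNAL-A` stands in for an unnamed non-NASA repository) and the functions `route` and `routed_search` are assumptions for illustration, not the NTRS implementation.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical catalog: repository name -> searched by default?
CATALOG: Dict[str, bool] = {
    "LTRS": True,
    "GTRS": True,
    "EXTERNAL-A": False,  # stand-in for a non-NASA repository
}


def route(catalog: Dict[str, bool],
          requested: Optional[Iterable[str]] = None) -> List[str]:
    """Pick the repositories a query is sent to."""
    if requested is None:
        # No explicit choice: use only the repositories enabled by default.
        return sorted(name for name, on in catalog.items() if on)
    requested = list(requested)
    unknown = [name for name in requested if name not in catalog]
    if unknown:
        raise ValueError(f"unknown repositories: {unknown}")
    return sorted(set(requested))


def routed_search(records: Iterable[Tuple[str, str]], query: str,
                  targets: Iterable[str]) -> List[Tuple[str, str]]:
    """Search (repository, title) pairs, restricted to the routed targets."""
    wanted = set(targets)
    needle = query.lower()
    return [(repo, title) for repo, title in records
            if repo in wanted and needle in title.lower()]
```

For example, `route(CATALOG)` selects only the default repositories, while `route(CATALOG, ["EXTERNAL-A", "LTRS"])` widens the search at the user's request.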
23non-NASA repositories
> 0.5M records
24NASA DLs in the Larger STI Realm
[Diagram: NTRS linked to other DLs: DOE, DOD, Universities, Publishers, International, . . .]
this could be a fully connected graph
NTRS could also be a data provider from the point of view of other DLs, allowing the harvesting of NASA report metadata.
NTRS could also harvest metadata from other DLs, and provide access to non-NASA content.
We hope to influence the direction of the science.gov effort to use OAI-PMH
25OSTI Energy Citations Database
- OAI-PMH support just recently added (Feb 2003)
- not yet officially announced
- 20k records, 8k full-text
- other OSTI collections planned
http://www.osti.gov/energycitations/
26Technical Report Interchange
- Goal: share technical reports between 4 US government labs without creating new digital libraries for users to learn!
- NASA Langley Research Center
- Air Force Research Laboratory
- Los Alamos National Laboratory (DOE)
- Sandia National Laboratory (DOE)
- Solution: use cooperating OAI-PMH caches at each site to
- export local contents
- ingest remote contents
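The export/ingest pairing can be illustrated with a small peer-to-peer sketch. It assumes, purely for this example, that each cache exports only the records its own site produced, so that ingested records are never passed on a second time; `TriCache` and `synchronize` are hypothetical names, and the real TRI modules also handle metadata mappings that are not modelled here.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TriCache:
    site: str
    local: Dict[str, str] = field(default_factory=dict)   # identifier -> title
    remote: Dict[str, str] = field(default_factory=dict)  # copies from peers

    def export(self) -> Dict[str, str]:
        """Expose only locally produced records (assumption of this sketch)."""
        return dict(self.local)

    def ingest(self, peer: "TriCache") -> int:
        """Copy a peer's exported records; return how many were new."""
        new = {k: v for k, v in peer.export().items() if k not in self.remote}
        self.remote.update(new)
        return len(new)

    def catalog(self) -> Dict[str, str]:
        """Everything a user at this site can find in their familiar library."""
        return {**self.local, **self.remote}


def synchronize(caches: List[TriCache]) -> None:
    """Let every cache ingest from every other cache once."""
    for target in caches:
        for source in caches:
            if source is not target:
                target.ingest(source)
```

After one round of `synchronize`, users at each site see all four labs' reports through their own existing digital library, which is the stated goal.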
27TRI Production System - Status
[Diagram: the LaRC, LANL, Sandia, and AFRL TRI Systems and the ODU TRI System (Listener), connected by arrows for records coming in from and going out to other TRI systems; legend: Proposed / In Production]
Slide from M. Zubair, ODU
28Mappings in TRI
Details in Liu, et al., ECDL 2002; the above table is also taken from the same paper
29A Single TRI Module
Slide from M. Zubair, ODU
30The Future: Community Building
- Ultimately, protocols and metadata formats are not what makes a difference
- Rather, the critical mass afforded by a common set of utilities (cf. http, Dublin Core, XML)
- The best current example: The Open Language Archives Community
- http://www.language-archives.org/
- OAI-PMH provides the basis for communication between strangers, but allows even richer communication between friends
31STI Communities
- Government produced/sponsored STI
- http://ntrs.nasa.gov/
- http://www.osti.gov/energycitations/
- http://dlib.cs.odu.edu/tri/
- Academia
- self-archiving vs. institutional archives
- http://www.soros.org/openaccess/
- http://www.ecs.soton.ac.uk/~harnad/Tp/resolution.htm
- Commercial publishers
- e.g. BioMed Central
- http://www.biomedcentral.com/