Title: OAI
1 OAI & NSDL Research at Grainger
Briefing to UIUC Library Faculty, 15 April 2003
- Timothy W. Cole (t-cole3_at_uiuc.edu)
- William H. Mischo (w-mischo_at_uiuc.edu)
- http://dli.grainger.uiuc.edu/Publications/TWCole/LibFac2003/
2 Projects
- Open Archives Initiative
- Illinois OAI Metadata Harvesting (Mellon)
- IMLS Digital Collections Content (IMLS)
- Grainger OAI Resources in Science & Engineering
- National Science Digital Library
- 2nd Generation Math Resources (NSF / DUE)
3 OAI Protocol for Metadata Harvesting
- Harvesting approach to interoperability at metadata level
- Divides world into Metadata Providers & Service Providers
- Builds on HTTP, XML, Dublin Core
- http://www.openarchives.org/
4 OAI is a tool
- All about moving metadata (not data) around
- A building block, usable by many communities; supports new models of scholarly communication
- Can facilitate, and in some cases enable, advanced digital library services & functions
- Assumes widely distributed content, but centralized indexing(!); requires critical mass
- Providers build once, share many times
- Purpose of OAI is to foster interoperability
5 Harvesting vs. Federation
- Competing approaches to interoperability
- Federation is when services are run remotely on remote data (e.g. Broadcast Searching)
- Harvesting is when data/metadata is transferred from the remote source to the destination where the services are located (e.g. Union Catalogs)
- Federation requires more effort at each remote source but is easier for the central system, and vice versa for harvesting
- OAI focuses on harvesting
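The contrast between the two approaches can be sketched in a few lines of Python. This is only a toy illustration: the in-memory `SOURCES` dictionary, its records, and the function names are all invented here, and a real system would contact remote servers over a network.

```python
# Toy in-memory "remote sources"; the records are invented for illustration.
SOURCES = {
    "library_a": ["Sqrt primary definition", "Union catalog notes"],
    "library_b": ["Square root history", "Broadcast searching overview"],
}

def federated_search(query):
    """Federation: the query is sent to every remote source at search time."""
    hits = []
    for name, records in SOURCES.items():
        # Each source has to run its own search service on its own data.
        hits += [(name, r) for r in records if query.lower() in r.lower()]
    return hits

def harvest():
    """Harvesting: copy all metadata to a central index ahead of time."""
    return [(name, r) for name, records in SOURCES.items() for r in records]

def harvested_search(central_index, query):
    """The search runs locally on the harvested copy; sources are not contacted."""
    return [(n, r) for n, r in central_index if query.lower() in r.lower()]

central_index = harvest()
print(federated_search("square"))
print(harvested_search(central_index, "square"))
```

Both searches return the same hit here; the difference is where the work happens. Federation pushes search effort onto each source, while harvesting leaves sources with only the job of exposing their metadata.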
6 Reliance on HTTP, XML, DC
- OAI is a REpresentational State Transfer (REST) protocol, i.e., URL-based
- Z39.50, Web services, SOAP are RPC-based
- OAI requests are sent via the HTTP protocol using GET or POST
- OAI responses are valid XML documents
- XML allows validation, increases reliability of what's harvested (in terms of structure)
- DC is OAI's Lowest Common Denominator
- Communities encouraged to use additional schemas
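Because requests are URL-based, an OAI GET request is just a base URL plus query parameters. A minimal sketch, assuming a hypothetical repository at `http://example.org/oai` (the base URL and the helper name `oai_request_url` are not from the slides; `oai_dc` is the protocol's metadata prefix for unqualified Dublin Core):

```python
from urllib.parse import urlencode

BASE_URL = "http://example.org/oai"  # hypothetical repository base URL

def oai_request_url(verb, **arguments):
    """Build the GET URL for an OAI request: base URL + verb + arguments."""
    params = {"verb": verb}
    params.update(arguments)
    return BASE_URL + "?" + urlencode(params)

url = oai_request_url("ListRecords", metadataPrefix="oai_dc")
print(url)  # http://example.org/oai?verb=ListRecords&metadataPrefix=oai_dc
```

The same parameters could be sent in the body of a POST instead; the response in either case is an XML document.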
7 How OAI Works
- OAI Verbs
  - Identify
  - ListMetadataFormats
  - ListSets
  - ListIdentifiers
  - ListRecords
  - GetRecord
[Diagram: the Service Provider (harvester) sends an HTTP Request (OAI verb) to the Metadata Provider (repository), which returns an HTTP Response (valid XML).]
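The harvester's side of this exchange can be sketched with Python's standard XML parser. The sample response below is hand-written for illustration and follows the OAI-PMH 2.0 response layout; the repository URL, identifier, and record content are invented, and `parse_records` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# The six protocol verbs listed on the slide.
OAI_VERBS = {"Identify", "ListMetadataFormats", "ListSets",
             "ListIdentifiers", "ListRecords", "GetRecord"}

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hand-written sample response; all values are invented for illustration.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2003-04-15T12:00:00Z</responseDate>
  <request verb="ListRecords" metadataPrefix="oai_dc">http://example.org/oai</request>
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:1</identifier>
        <datestamp>2003-04-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample record</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

def parse_records(xml_text):
    """Return the echoed verb and a list of (identifier, titles) pairs."""
    root = ET.fromstring(xml_text)
    verb = root.find("oai:request", NS).get("verb")
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI verb: {verb}")
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        identifier = rec.findtext("oai:header/oai:identifier", namespaces=NS)
        titles = [t.text for t in rec.iterfind(".//dc:title", NS)]
        records.append((identifier, titles))
    return verb, records

print(parse_records(SAMPLE_RESPONSE))
```

This sketch only checks well-formedness and the verb name; validating the response against the protocol's XML schema would be a separate step.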
8 As Compared to Z39.50
9 Mellon-OAI Project
- Create a web portal to scholarly information resources in cultural heritage harvested via OAI
- Primary objectives
  - Develop & make available OAI harvesting tools
  - Build harvesting and search services
  - Investigate viability and utility of searching OAI harvested resources
  - Explore issues of advanced search/indexing/display
  - Explore user needs & metadata usage patterns
  - Identify critical issues and best practices for using OAI with cultural heritage material
10 Mellon-OAI Achievements
- Developed harvesting tools (Open Source)
- Refined data provider tools (Open Source)
- Investigated logistics of harvesting activities
- Investigated metadata provider usage of DC, EAD
- Created XSL stylesheets for metadata transformations (MARC to DC; EAD to DC)
- Experimented w/ configurations to address scalability & performance issues
- Usability testing with students in College of Education
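The project's transformations were written as XSL stylesheets, which are not reproduced in these slides. To show the general idea of a MARC-to-DC crosswalk, here is a simplified Python stand-in rather than XSL. The tag table is a small subset chosen for illustration, it ignores subfields, and the sample record is invented.

```python
# Simplified tag-to-element table; a real crosswalk handles subfields
# and many more tags than these five.
MARC_TO_DC = {
    "100": "creator",
    "245": "title",
    "260": "publisher",
    "520": "description",
    "650": "subject",
}

def marc_to_dc(marc_fields):
    """Map a list of (tag, value) pairs to a dict of DC element -> values."""
    dc = {}
    for tag, value in marc_fields:
        element = MARC_TO_DC.get(tag)
        if element is None:
            continue  # unmapped tags are dropped in this sketch
        dc.setdefault(element, []).append(value)
    return dc

sample = [  # invented record for illustration
    ("245", "Letters from the prairie"),
    ("100", "Doe, Jane"),
    ("650", "Illinois -- History"),
    ("650", "Correspondence"),
    ("999", "local note"),
]
dc_record = marc_to_dc(sample)
print(dc_record)
```

Repeated MARC fields become repeated DC elements, and anything outside the table is lost, which is one reason DC serves as a lowest common denominator rather than a full replacement for richer formats.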
11 Metadata aggregation
- 39 providers (OAI-compliant and surrogates)
- Metadata describing resources of 580 institutions (CIMI, CDP)
- 1.1 million original records
- 2.6 million including item-level records derived from EAD finding aids
12 IMLS Digital Collections Content
- Build registry of all National Leadership Grant collections with digital content.
- Assist & guide NLG projects in making item-level metadata sharable using OAI.
- Build repository, search & discovery tools for integrated access to content of NLG collections.
- Research best practices for sharing metadata about diverse digital content supporting interests of diverse user communities.
- Collaboration between UIUC Library, GSLIS, IMLS
13 Project Sites
- UIUC OAI Cultural Heritage Repository
- Mellon-OAI Project Site
- IMLS DCC Project Site
14 National Science Foundation NSDL Program
- National Science, Technology, Engineering, Mathematics Digital Library.
- http://www.nsdl.org/
- Coverage: K to Grey.
- National system for distributed science education characterized by a set of exemplary resource collections and services.
- Highly competitive grants: 3 years, 339 proposals, 105 funded; three main categories: collections, services and targeted research.
15 2nd Generation Math Resources
- Collaboration with UIUC Library, Wolfram Research Inc., COE Dept of Theoretical and Applied Mechanics.
- Project Objectives
  - Adding interactive and graphical content to two feature-rich Wolfram sites.
  - Generating and extracting OAI-compliant metadata, establishing OAI Provider site, adding mathematics controlled vocabulary terms.
  - Developing courseware and problem libraries for TAM courses.
16 Providing Metadata to NSDL
- Exposing metadata via OAI
  - Preferred method for bringing metadata into the NSDL repository (requires little manual intervention)
- Sending metadata via ftp
- Enabling metadata "scraping"
- Creating and editing metadata directly in the NSDL metadata repository
- See also NSDL Metadata Primer
17 Wolfram Functions Web Site: Source HTML Page Derived Metadata
<dc:identifier> <dc:description> <dc:date> <dc:rights>
18 Wolfram Functions Web Site: Source HTML Head Extracted Metadata
<dc:title> <dc:description> <dc:subject> <dc:subject> <dc:format>
    <html>
    <head>
    <title>Square root: Primary</title>
    <meta name='Description'
          content='Primary definition ...'>
    <meta name='Keywords'
          content='Sqrt, square root, ...'>
    <meta http-equiv='Content-Type'
          content='text/html; charset=iso-...'>
    </head>
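The mapping from HTML head elements to Dublin Core fields shown on this slide can be sketched with Python's standard `html.parser` module. This is not the project's actual extraction tool; the class name `HeadMetadataParser` and the shortened sample page are assumptions made for illustration.

```python
from html.parser import HTMLParser

class HeadMetadataParser(HTMLParser):
    """Collect title, description, keywords and content type as DC fields."""

    def __init__(self):
        super().__init__()
        self.dc = {}            # DC element name -> list of values
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)     # attribute names arrive lowercased
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = (attrs.get("name") or "").lower()
            http_equiv = (attrs.get("http-equiv") or "").lower()
            content = attrs.get("content") or ""
            if name == "description":
                self.dc.setdefault("description", []).append(content)
            elif name == "keywords":
                # Each comma-separated keyword becomes one dc:subject.
                for kw in content.split(","):
                    if kw.strip():
                        self.dc.setdefault("subject", []).append(kw.strip())
            elif http_equiv == "content-type":
                self.dc.setdefault("format", []).append(content)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title and data.strip():
            self.dc.setdefault("title", []).append(data.strip())

# Shortened sample page, modeled loosely on the slide; values are illustrative.
SAMPLE_HTML = """<html><head>
<title>Square root</title>
<meta name='Description' content='Primary definition'>
<meta name='Keywords' content='Sqrt, square root'>
<meta http-equiv='Content-Type' content='text/html; charset=iso-8859-1'>
</head><body></body></html>"""

parser = HeadMetadataParser()
parser.feed(SAMPLE_HTML)
parser.close()
print(parser.dc)
```

Splitting the keywords on commas is what turns one `Keywords` meta tag into the two `dc:subject` values.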
19 Sample Metadata File for a Wolfram Functions Web Page
    <oai_dc:dc ...>
      <dc:title>Square root: Primary definition (formula ...)</dc:title>
      <dc:subject>Sqrt</dc:subject>
      <dc:subject>square root</dc:subject>
      ...
      <dc:description>Primary definition (2 formulas)</dc:description>
      <dc:description><math> ... </math></dc:description>
      <dc:date>2001-10-29</dc:date>
      <dc:publisher>Wolfram Research, Inc.</dc:publisher>
      <dc:type>Text</dc:type>
      <dc:format>text/html; charset=iso-8859-1</dc:format>
      <dc:identifier>http://functions.wolfram.com/Sqrt/02/0001/</dc:identifier>
      <dc:identifier>http://functions.wolfram/01.01.02.0001.01</dc:identifier>
      <dc:language>en</dc:language>
      <dc:rights>© 2002 Wolfram Research, Inc.</dc:rights>
    </oai_dc:dc>
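A record of this shape can be generated with Python's standard `xml.etree.ElementTree`. A minimal sketch, using a few of the field values from the sample above; the helper name `build_oai_dc` is hypothetical, and the namespace URIs are the standard ones for `oai_dc` and Dublin Core 1.1.

```python
import xml.etree.ElementTree as ET

OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"

# Ask ElementTree to serialize with the conventional prefixes.
ET.register_namespace("oai_dc", OAI_DC)
ET.register_namespace("dc", DC)

def build_oai_dc(fields):
    """Build an oai_dc record from (element_name, value) pairs, in order."""
    root = ET.Element(f"{{{OAI_DC}}}dc")
    for name, value in fields:
        child = ET.SubElement(root, f"{{{DC}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")

record_xml = build_oai_dc([
    ("subject", "Sqrt"),
    ("subject", "square root"),
    ("date", "2001-10-29"),
    ("publisher", "Wolfram Research, Inc."),
    ("type", "Text"),
    ("language", "en"),
])
print(record_xml)
```

Taking a list of pairs rather than a dictionary keeps repeated elements such as `dc:subject` and preserves their order.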
20 The NSDL metadata repository
Core Integration Project: Cornell, Columbia, DLESE. The metadata repository is a resource for service providers. It holds information about every collection and item known to the NSDL.
[Diagram labels: Users, Services, Metadata repository, Collections]
From "The NSDL Metadata Strategy", a presentation by William Y. Arms and Diane I. Hillmann. Available: http://nsdl.comm.nsdlib.org/allprojects01/metastrategy.ppt
21 Working Assumptions
- The WWW is the primary medium (for now)
- Content is a mix of born digital and analog
- There is no lack of great piles of stuff
- There is a need for piles of great stuff
- The unit of content can and will shrink
- Users will increasingly be creators, and vice versa
- While much of the use will be free, there is a need to explore multiple models of sustainability
- Experimental nature of distributed digital library building: one library, many portals
22 Related Links
- http://mathworld.wolfram.com/
- http://functions.wolfram.com/
- OAI Resources in Science & Engineering