Title: OAI Protocol for Metadata Harvesting


1
OAI Protocol for Metadata Harvesting
Tim Brody
Intelligence, Agents, Multimedia Group, University of Southampton
OpCit: http://opcit.eprints.org/ www.ecs.soton.ac.uk
BCS Metadata Meeting, London, 29th May 2002
(Many slides borrowed from Michael L. Nelson)
2
OAI 2.0
  • Public, stable version not released yet (but very
    close)
  • Beta released mid-May
  • Public release scheduled 1st June
  • 2.0 implementations in the pipeline
    • British Library, Cornell Univ, Ex Libris, my.OAI,
      Humboldt Univ, InQuirion Pty Ltd, Library of
      Congress, NASA, OCLC, Old Dominion Univ, U. of
      Illinois, U. of Southampton, UCLA, Johns Hopkins
      U., Indiana U., NYU, UKOLN, Virginia Tech

3
Open Archives Initiative
4
Metadata Harvesting
  • Move away from distributed searching
  • Extract metadata from various sources
  • Build services on local copies of metadata
  • Resources remain at remote repositories

[Diagram: a user query (e.g. "search for cfd applications") runs against a local copy of metadata that was harvested offline from several nodes. All searching, browsing, etc. is performed on the local metadata. Each node is independently maintained, and individual nodes can still support direct user interaction.]
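The harvesting model on this slide can be sketched in a few lines. This is only an illustration: the node names, identifiers, titles, and the functions `harvest_all` and `search` are all made up for the example, and a real service would harvest over OAI-PMH rather than from in-memory dictionaries.

```python
def harvest_all(nodes):
    """Copy metadata from every node into one local store (done offline)."""
    local = {}
    for node_name, records in nodes.items():
        for identifier, title in records.items():
            local[identifier] = {"source": node_name, "title": title}
    return local


def search(local, term):
    """Search only the local copy; the remote nodes are not contacted."""
    term = term.lower()
    return sorted(i for i, meta in local.items() if term in meta["title"].lower())


# Hypothetical nodes, each independently maintained.
NODES = {
    "node-a": {"a:1": "CFD applications in aerodynamics", "a:2": "Protein folding"},
    "node-b": {"b:1": "Survey of CFD applications", "b:2": "Metadata formats"},
}

local_copy = harvest_all(NODES)
hits = search(local_copy, "cfd applications")
```

The resources themselves stay at the remote nodes; the local store holds only metadata and a pointer back to the source.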
5
Metadata Harvesting
  • Repositories (archives etc.): low implementation
    cost
  • Services: higher implementation cost
  • Similar to web search model
  • DP9 gateway makes it exactly the same

6
Santa Fe convention
OAI-PMH v.1.0/1.1
OAI-PMH v.2.0
7
OAI-PMH v.2.0 06/2002
  • Goal: recurrent exchange of metadata about
    resources between systems
  • Input
    • OAI-PMH v.1.0 01/01 09/02
    • feedback on OAI-implementers
    • deliberations by OAI-tech 09/01 -
    • alpha test group of OAI-PMH v.2.0 03/02 -

8
OAI-PMH v.2.0 06/2002
  • low-barrier interoperability specification
  • metadata harvesting model: data provider /
    service provider
  • metadata about resources
  • autonomous protocol
  • distinction between protocol and periphery
  • community-specific extensions
  • HTTP based
  • XML responses
  • unqualified Dublin Core
  • stable (1.0 characterized as experimental)

9
OAI Data Model: Resources / Items / Records
  • item: identifier
  • record: identifier, metadata format, datestamp
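The item/record distinction can be written down as two small types. This is a sketch under assumptions: the class and field names (`Item`, `Record`, `metadata_prefix`) are illustrative and not taken from the slides, and only the three record attributes named above are modelled.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Record:
    """Metadata about one item, expressed in one metadata format."""
    identifier: str       # identifier of the item the record belongs to
    metadata_prefix: str  # metadata format, e.g. "oai_dc"
    datestamp: str        # date of last change, e.g. "2001-12-14"


@dataclass
class Item:
    """An item has one identifier and may expose several records."""
    identifier: str
    records: dict = field(default_factory=dict)  # metadata_prefix -> Record

    def add(self, record: Record) -> None:
        if record.identifier != self.identifier:
            raise ValueError("record belongs to a different item")
        self.records[record.metadata_prefix] = record


item = Item("oai:arXiv:cs/0112017")
item.add(Record("oai:arXiv:cs/0112017", "oai_dc", "2001-12-14"))
```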
10
Overview of OAI Verbs
  • two groups of verbs: archival metadata verbs and
    harvesting verbs
  • most verbs take arguments: dates, sets, ids,
    metadata formats and resumption token (for flow
    control)
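Because the protocol is HTTP based, a request is just a base URL plus a verb and its arguments. The helper below is a minimal sketch: `build_request` is a hypothetical name, the base URL is the one from the example responses later in the deck, and the `metadataPrefix` value `oai_dc` is an assumption.

```python
from urllib.parse import urlencode

# The six verbs covered on the following slides.
VERBS = {
    "Identify", "ListMetadataFormats", "ListSets",
    "ListIdentifiers", "ListRecords", "GetRecord",
}


def build_request(base_url, verb, args=None):
    """Return a GET URL for one OAI-PMH request."""
    if verb not in VERBS:
        raise ValueError(f"{verb} is not a valid OAI-PMH verb")
    params = {"verb": verb}
    params.update(args or {})
    # urlencode percent-escapes characters such as ':' and '/'.
    return base_url + "?" + urlencode(params)


url = build_request(
    "http://arXiv.org/oai2",
    "GetRecord",
    {"identifier": "oai:arXiv:cs/0112017", "metadataPrefix": "oai_dc"},
)
```

Arguments are passed as a dictionary rather than keyword arguments because `from` is a reserved word in Python.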
11
Identify
1.1
  • Arguments
    • none
  • Errors
    • none
2.0
  • Arguments
    • none
  • Errors
    • badArgument

12
ListMetadataFormats
1.1
  • Arguments
    • identifier (OPTIONAL)
  • Errors
    • id does not exist
2.0
  • Arguments
    • identifier (OPTIONAL)
  • Errors
    • badArgument
    • noMetadataFormats
    • idDoesNotExist

13
ListSets
1.1
  • Arguments
    • resumptionToken (EXCLUSIVE)
  • Errors
    • no set hierarchy
2.0
  • Arguments
    • resumptionToken (EXCLUSIVE)
  • Errors
    • badArgument
    • badResumptionToken
    • noSetHierarchy

14
ListIdentifiers
1.1
  • Arguments
    • from (OPTIONAL)
    • until (OPTIONAL)
    • set (OPTIONAL)
    • resumptionToken (EXCLUSIVE)
  • Errors
    • no records match
2.0
  • Arguments
    • from (OPTIONAL)
    • until (OPTIONAL)
    • set (OPTIONAL)
    • resumptionToken (EXCLUSIVE)
    • metadataPrefix (REQUIRED)
  • Errors
    • badArgument
    • cannotDisseminateFormat
    • badResumptionToken
    • noSetHierarchy
    • noRecordsMatch
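The OPTIONAL / REQUIRED / EXCLUSIVE labels in the 2.0 ListIdentifiers argument list translate into a small check on the data provider side. The function below is a sketch, not a reference implementation: `check_list_identifiers_args` is a hypothetical name, and it covers only the argument-shape rules that lead to badArgument, not the other error codes.

```python
ALLOWED = {"from", "until", "set", "resumptionToken", "metadataPrefix"}


def check_list_identifiers_args(args):
    """Return "badArgument" if the argument set is illegal, else None.

    `args` holds the request arguments without the verb itself.
    """
    if set(args) - ALLOWED:
        return "badArgument"  # unknown argument
    if "resumptionToken" in args:
        # EXCLUSIVE: must be the only argument besides the verb.
        return None if len(args) == 1 else "badArgument"
    if "metadataPrefix" not in args:
        return "badArgument"  # REQUIRED argument missing
    return None
```

For example, a request carrying both `resumptionToken` and `from` is rejected, while `metadataPrefix` plus `from` is accepted.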

15
ListRecords
1.1
  • Arguments
    • from (OPTIONAL)
    • until (OPTIONAL)
    • set (OPTIONAL)
    • resumptionToken (EXCLUSIVE)
    • metadataPrefix (REQUIRED)
  • Errors
    • no records match
    • metadata format cannot be disseminated
2.0
  • Arguments
    • from (OPTIONAL)
    • until (OPTIONAL)
    • set (OPTIONAL)
    • resumptionToken (EXCLUSIVE)
    • metadataPrefix (REQUIRED)
  • Errors
    • noRecordsMatch
    • cannotDisseminateFormat
    • badResumptionToken
    • noSetHierarchy
    • badArgument

16
GetRecord
1.1
  • Arguments
    • identifier (REQUIRED)
    • metadataPrefix (REQUIRED)
  • Errors
    • id does not exist
    • metadata format cannot be disseminated
2.0
  • Arguments
    • identifier (REQUIRED)
    • metadataPrefix (REQUIRED)
  • Errors
    • badArgument
    • cannotDisseminateFormat
    • idDoesNotExist
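The three 2.0 error codes for GetRecord map naturally onto three checks in a data provider. The sketch below assumes a toy in-memory repository (a dict from identifier to a dict of metadata formats); the function name `get_record`, the repository layout, and the order in which the checks run are all assumptions made for illustration.

```python
def get_record(repo, args):
    """Return ("record", metadata) or ("error", code) for a GetRecord request."""
    # Both arguments are REQUIRED and no others are allowed.
    if set(args) != {"identifier", "metadataPrefix"}:
        return ("error", "badArgument")
    formats = repo.get(args["identifier"])
    if formats is None:
        return ("error", "idDoesNotExist")
    if args["metadataPrefix"] not in formats:
        return ("error", "cannotDisseminateFormat")
    return ("record", formats[args["metadataPrefix"]])


# Hypothetical repository content.
REPO = {"oai:arXiv:cs/0112017": {"oai_dc": "<dc>...</dc>"}}
```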

17
Response: no errors
<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH>
  <responseDate>2002-02-08T08:55:46Z</responseDate>
  <request verb="GetRecord">http://arXiv.org/oai2</request>
  <GetRecord>
    <record>
      <header>
        <identifier>oai:arXiv:cs/0112017</identifier>
        <datestamp>2001-12-14</datestamp>
        <setSpec>cs</setSpec>
        <setSpec>math</setSpec>
      </header>
      <metadata>
        ..
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>
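A harvester typically reads the record header from such a response before touching the metadata. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`. It assumes the simplified, namespace-free XML shown on the slide; a real OAI-PMH 2.0 response declares an XML namespace, so the element paths would need namespace prefixes. `parse_header` and `SAMPLE` are names invented for the example.

```python
import xml.etree.ElementTree as ET

# The slide's example response, with the elided metadata left empty.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH>
  <responseDate>2002-02-08T08:55:46Z</responseDate>
  <request verb="GetRecord">http://arXiv.org/oai2</request>
  <GetRecord>
    <record>
      <header>
        <identifier>oai:arXiv:cs/0112017</identifier>
        <datestamp>2001-12-14</datestamp>
        <setSpec>cs</setSpec>
        <setSpec>math</setSpec>
      </header>
      <metadata/>
    </record>
  </GetRecord>
</OAI-PMH>"""


def parse_header(xml_bytes):
    """Extract identifier, datestamp and set memberships from a GetRecord response."""
    root = ET.fromstring(xml_bytes)
    header = root.find("GetRecord/record/header")
    if header is None:
        raise ValueError("no GetRecord/record/header element in response")
    return {
        "identifier": header.findtext("identifier"),
        "datestamp": header.findtext("datestamp"),
        "setSpecs": [s.text for s in header.findall("setSpec")],
    }


header = parse_header(SAMPLE)
```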
18
Response: with error
<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH>
  <responseDate>2002-02-08T08:55:46Z</responseDate>
  <request>http://arXiv.org/oai2</request>
  <error code="badVerb">ShowMe is not a valid OAI-PMH verb</error>
</OAI-PMH>
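Since errors arrive inside a normal XML response, a client has to look for the error element itself. The sketch below turns it into a Python exception. `OAIError` and `raise_on_error` are hypothetical names, and the XML is again the namespace-free form from the slide rather than a fully qualified 2.0 response.

```python
import xml.etree.ElementTree as ET


class OAIError(Exception):
    """Carries the error code and message from an OAI-PMH response."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


def raise_on_error(xml_bytes):
    """Return the parsed root element, or raise OAIError on the first error found."""
    root = ET.fromstring(xml_bytes)
    error = root.find("error")
    if error is not None:
        raise OAIError(error.get("code"), (error.text or "").strip())
    return root


ERROR_SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH>
  <responseDate>2002-02-08T08:55:46Z</responseDate>
  <request>http://arXiv.org/oai2</request>
  <error code="badVerb">ShowMe is not a valid OAI-PMH verb</error>
</OAI-PMH>"""
```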
19
resumptionToken Flow-Control
  • Idempotency of resumptionToken: return same
    incomplete list when rT is re-issued
    • while no changes occur in the repo: strict
    • while changes occur in the repo: all items with
      unchanged datestamp
  • new attributes for the resumptionToken
    • expirationDate
    • completeListSize
    • cursor
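On the harvester side, flow control amounts to a loop that keeps re-issuing the request with the latest resumptionToken until none is returned. The sketch below avoids any network access: `fetch` is a hypothetical callable standing in for the HTTP request and XML parsing, `fake_fetch` and `PAGES` are invented test data, and treating an empty token as the end of the list is an assumption about how the final chunk is signalled.

```python
def harvest(fetch, first_args):
    """Yield every record of a list request, following resumptionTokens.

    `fetch(args)` returns (records, token); token is None or "" once the
    list is complete.
    """
    args = dict(first_args)
    while True:
        records, token = fetch(args)
        yield from records
        if not token:
            return
        # resumptionToken is EXCLUSIVE: send it as the only argument.
        args = {"resumptionToken": token}


# Hypothetical repository that serves five records in three chunks.
PAGES = {
    None: (["rec1", "rec2"], "t1"),
    "t1": (["rec3", "rec4"], "t2"),
    "t2": (["rec5"], ""),
}


def fake_fetch(args):
    # Idempotent: the same token always yields the same chunk.
    return PAGES[args.get("resumptionToken")]


all_records = list(harvest(fake_fetch, {"metadataPrefix": "oai_dc"}))
```

Because a token can be re-issued and returns the same incomplete list, a harvester that loses a response can simply retry the last request instead of starting over.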

20
Adoption
  • evolution
    • from talking about OAI-PMH
    • to talking about projects that use OAI-PMH
    • to talking about projects and failing to mention
      they use OAI-PMH
  • => OAI-PMH becomes part of the infrastructure

21
Data Providers (a.k.a. repositories)
  • 49 registered repositories 11/2001
  • 65 registered repositories 03/2002
  • 77 registered repositories 05/2002
  • 5 million records
  • many unregistered repositories
  • private implementations (e.g. RDN)

22
Service Providers
  • Arc: cross-searching of registered repositories
    http://arc.cs.odu.edu
  • CiteBase: research literature search, citation
    ranking http://citebase.eprints.org
  • OLAC: cross-searching of Language Archive
    Community repositories
    http://www.language-archives.org/index.html

23
Service Providers
  • Scirus: scientific search engine, Elsevier
    http://www.scirus.com
  • my.OAI: user-tailorable cross-searching of
    registered repositories, FS Consulting, Inc.
    http://www.myoai.com
  • Growing interest from web search engines

24
OAI-PMH tools
  • Repository Explorer: interactive exploration of
    repositories, Virginia Tech
    http://www.purl.org/NET/oai_explorer
  • eprints.org: generic OAI-PMH compliant repository
    software, U of Southampton
    http://www.eprints.org
  • ALCME: repository and harvester software, OCLC
    http://alcme.oclc.org/index.html
  • APIs, other tools @ www.openarchives.org

25
http://www.openarchives.org/
openarchives@openarchives.org