New Developments in OAI - PowerPoint PPT Presentation

1
New Developments in OAI
  • Michael L. Nelson
  • Old Dominion University
  • http://www.cs.odu.edu/~mln/
  • mln@cs.odu.edu
  • OA-Forum
  • May 13-14, 2002
  • Pisa, Italy

Many slides borrowed from Herbert Van de Sompel and Carl Lagoze
2
N.B.
  • OAI-PMH 2.0 is not scheduled for public beta
    release until May 19, 2002
  • some of the details of this presentation are
    still subject to change!
  • final public release of 2.0 scheduled for June 1

3
What's New in 2.0?!
  • Good news: OAI-PMH is still
  • Six Verbs and DC
  • Incremental improvements
  • single XML schema
  • ambiguities removed
  • more expressive options
  • cleaner separation of roles and responsibilities
  • Bad news: not backwards compatible with 1.1

4
Open Archives Initiative
5
The Rise and Fall of Distributed Searching
  • wholesale distributed searching, popular at the
    time, is attractive in theory but troublesome in
    practice
  • Davis & Lagoze, JASIS 51(3), pp. 273-80
  • Powell & French, Proc. 5th ACM DL, pp. 264-265
  • distributed searching of N nodes still viable,
    but only for small values of N
  • NCSTRL: N > 100, bad
  • NTRS/NIX: N < 20, ok (but could be better)

6
The Rise and Fall of Distributed Searching
  • Other problems of distributed searching (from
    STARTS)
  • source-metadata problem
  • how do you know which nodes to search?
  • query-language problem
  • syntax varies and drifts over time between the
    various nodes
  • rank-merging problem
  • how do you meaningfully merge multiple result
    sets?
  • Temptations
  • centralize all functions
  • everything will be done at X
  • standardize on a single product
  • everyone will use system Y

7
Metadata Harvesting
  • Move away from distributed searching
  • Extract metadata from various sources
  • Build services on local copies of metadata
  • data remains at remote repositories

[Diagram: a user query (e.g. "search for cfd applications") runs against a
local copy of metadata that is harvested offline from each independently
maintained node. All searching, browsing, etc. is performed on the local
metadata; individual nodes can still support direct user interaction.]
8
Santa Fe convention
OAI-PMH v.1.0/1.1
OAI-PMH v.2.0
9
Santa Fe Convention 02/2000
  • goal: optimize discovery of e-prints
  • input:
  • the UPS prototype
  • RePEc / SODA data provider / service provider
    model
  • Dienst protocol
  • deliberations at Santa Fe meeting 10/99

10
OAI-PMH v.1.0 01/2001
  • goal: optimize discovery of document-like
    objects
  • input:
  • SFC
  • DLF meetings on metadata harvesting
  • deliberations at Cornell meeting 09/00
  • alpha test group of OAI-PMH v.1.0

11
OAI-PMH v.1.0 01/2001
  • low-barrier interoperability specification
  • metadata harvesting model: data provider /
    service provider
  • focus on document-like objects
  • autonomous protocol
  • HTTP based
  • XML responses
  • unqualified Dublin Core
  • experimental: 12-18 months

12
pre-2.0 OAI Timeline Highlights
  • October 21-22, 1999 - initial UPS meeting
  • February 15, 2000 - Santa Fe Convention published
    in D-Lib Magazine
  • precursor to the OAI metadata harvesting protocol
  • June 3, 2000 - workshop at ACM DL 2000 (Texas)
  • August 25, 2000 - OAI steering committee formed,
    DLF/CNI support
  • September 7-8, 2000 - technical meeting at
    Cornell University
  • defined the core of the current OAI metadata
    harvesting protocol
  • September 21, 2000 - workshop at ECDL 2000
    (Portugal)
  • November 1, 2000 - Alpha test group announced
    (15 organizations)
  • January 23, 2001 - OAI protocol 1.0 announced,
    OAI Open Day in the U.S. (Washington DC)
  • purpose: freeze protocol for 12-16 months,
    generate critical mass
  • February 26, 2001 - OAI Open Day in Europe
    (Berlin)
  • July 3, 2001 - OAI protocol 1.1 announced
  • to reflect changes in the W3C's latest XML Schema
    recommendation
  • September 8, 2001 - workshop at ECDL 2001
    (Darmstadt)

13
OAI-PMH v.2.0 06/2002
  • goal: recurrent exchange of metadata about
    resources between systems
  • input:
  • OAI-PMH v.1.0
  • feedback on OAI-implementers
  • deliberations by OAI-tech 09/01 -
  • alpha test group of OAI-PMH v.2.0 03/02 -

14
OAI-PMH v.2.0 06/2002
  • low-barrier interoperability specification
  • metadata harvesting model: data provider /
    service provider
  • metadata about resources
  • autonomous protocol
  • HTTP based
  • XML responses
  • unqualified Dublin Core
  • stable

15
process leading to OAI-PMH v.2.0
  • creation of OAI-tech
  • pre-alpha phase
  • alpha-phase
  • beta-phase

16
creation of OAI-tech 06/01
  • created for 1 year period
  • charge:
  • review functionality and nature of OAI-PMH v.1.0
  • investigate extensions
  • release stable version of OAI-PMH by 05/02
  • determine need for infrastructure to support
    broad adoption of the protocol
  • communication: listserv, SourceForge, conference
    calls

17
OAI-tech
US representatives:
  • Thomas Krichel (Long Island U)
  • Jeff Young (OCLC)
  • Tim Cole (U of Illinois at Urbana-Champaign)
  • Hussein Suleman (Virginia Tech)
  • Simeon Warner (Cornell U)
  • Michael Nelson (NASA)
  • Caroline Arms (LoC)
  • Mohammad Zubair (Old Dominion U)
  • Steven Bird (U Penn.)
European representatives:
  • Andy Powell (Bath U., UKOLN)
  • Mogens Sandfaer (DTV)
  • Thomas Baron (CERN)
  • Les Carr (U of Southampton)
18
pre-alpha phase 09/01 - 02/02
  • review process by OAI-tech
  • identification of issues
  • conference call to filter/combine issues
  • white paper per issue
  • on-line discussion per white paper
  • proposal for resolution of issue by OAI-exec
  • discussion of proposal and closure of issue
  • conference call to resolve open issues

19
pre-alpha phase 02/02
  • creation of revised protocol document
  • in-person meeting: Lagoze, Van de Sompel,
    Nelson, Warner
  • autonomous decisions
  • internal vetting of protocol document

20
alpha phase 02/02 - 05/02
  • alpha-1 release to OAI-tech March 1st 2002
  • OAI-tech extended with alpha testers
  • discussions/implementations by OAI-tech
  • ongoing revision of protocol document

21
OAI-PMH 2.0 alpha testers (1/2)
  • The British Library
  • Cornell U. -- NSDL project, e-print arXiv
  • Ex Libris
  • FS Consulting Inc -- harvester for my.OAI
  • Humboldt-Universität zu Berlin
  • InQuirion Pty Ltd, RMIT University
  • Library of Congress
  • NASA
  • OCLC

22
OAI-PMH 2.0 alpha testers (2/2)
  • Old Dominion U. -- Arc, DP9
  • U. of Illinois at Urbana-Champaign
  • U. of Southampton -- OAIA, CiteBase, eprints.org
  • UCLA, Johns Hopkins U., Indiana U., NYU -- sheet
    music collection
  • UKOLN, U. of Bath -- RDN
  • Virginia Tech -- repository explorer

23
beta phase 05/02
  • beta release on May 1st 2002 to
  • registered data providers and service providers
  • interested parties
  • fine tuning of protocol document
  • preparation for the release of 2.0 conformant
    tools by alpha testers

24
What's new in OAI-PMH v.2.0?
  • quick recap
  • general changes to improve solidity of protocol
  • corrections
  • new functionality

25
Overview of OAI Verbs
archival metadata
harvesting verbs
most verbs take arguments: dates, sets, ids,
metadata formats and resumption token (for flow
control)
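As a rough illustration of how a verb and its arguments become a request, here is a minimal sketch that encodes them as the query string of an HTTP GET. The base URL and the helper name `build_request` are assumptions made for the example; they do not come from the slides.

```python
from urllib.parse import urlencode


def build_request(base_url, verb, arguments=None):
    """Return an OAI-PMH style GET URL for a verb and its arguments.

    `arguments` is a plain dict because `from` is a reserved word in
    Python and cannot be passed as a keyword argument.
    """
    params = {"verb": verb}
    # Skip arguments that were not supplied so they are not sent at all.
    params.update({k: v for k, v in (arguments or {}).items() if v is not None})
    return base_url + "?" + urlencode(params)


url = build_request(
    "http://example.org/oai",  # hypothetical repository base URL
    "ListRecords",
    {"metadataPrefix": "oai_dc", "from": "2002-01-01", "set": "physics:hep"},
)
print(url)
```

`urlencode` percent-encodes reserved characters, so the colon in the set name is sent as `%3A`.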
26
Identify
1.1
  • Arguments: none
  • Errors: none
2.0
  • Arguments: none
  • Errors: badArgument

27
ListMetadataFormats
1.1
  • Arguments: identifier (OPTIONAL)
  • Errors: id does not exist
2.0
  • Arguments: identifier (OPTIONAL)
  • Errors: badArgument, noMetadataFormats, idDoesNotExist

28
ListSets
1.1
  • Arguments: resumptionToken (EXCLUSIVE)
  • Errors: no set hierarchy
2.0
  • Arguments: resumptionToken (EXCLUSIVE)
  • Errors: badArgument, badResumptionToken, noSetHierarchy

29
ListIdentifiers
1.1
  • Arguments: from (OPTIONAL), until (OPTIONAL), set (OPTIONAL),
    resumptionToken (EXCLUSIVE)
  • Errors: no records match
2.0
  • Arguments: from (OPTIONAL), until (OPTIONAL), set (OPTIONAL),
    resumptionToken (EXCLUSIVE), metadataPrefix (REQUIRED)
  • Errors: badArgument, cannotDisseminateFormat, badGranularity,
    badResumptionToken, noSetHierarchy, noRecordsMatch

30
ListRecords
1.1
  • Arguments: from (OPTIONAL), until (OPTIONAL), set (OPTIONAL),
    resumptionToken (EXCLUSIVE), metadataPrefix (REQUIRED)
  • Errors: no records match, metadata format cannot be disseminated
2.0
  • Arguments: from (OPTIONAL), until (OPTIONAL), set (OPTIONAL),
    resumptionToken (EXCLUSIVE), metadataPrefix (REQUIRED)
  • Errors: noRecordsMatch, cannotDisseminateFormat, badGranularity,
    badResumptionToken, noSetHierarchy, badArgument

31
GetRecord
1.1
  • Arguments: identifier (REQUIRED), metadataPrefix (REQUIRED)
  • Errors: id does not exist, metadata format cannot be disseminated
2.0
  • Arguments: identifier (REQUIRED), metadataPrefix (REQUIRED)
  • Errors: badArgument, cannotDisseminateFormat, idDoesNotExist
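The 2.0 argument lists for the six verbs can be written down as a small rule table. The sketch below is one possible way a repository might check a request before processing it; the table layout and the function name `check_arguments` are illustrative assumptions, and EXCLUSIVE is read as "must be the only argument besides the verb".

```python
# Argument rules for the six OAI-PMH 2.0 verbs, as listed on the slides.
RULES = {
    "Identify": {"required": set(), "optional": set(), "exclusive": None},
    "ListMetadataFormats": {"required": set(), "optional": {"identifier"},
                            "exclusive": None},
    "ListSets": {"required": set(), "optional": set(),
                 "exclusive": "resumptionToken"},
    "ListIdentifiers": {"required": {"metadataPrefix"},
                        "optional": {"from", "until", "set"},
                        "exclusive": "resumptionToken"},
    "ListRecords": {"required": {"metadataPrefix"},
                    "optional": {"from", "until", "set"},
                    "exclusive": "resumptionToken"},
    "GetRecord": {"required": {"identifier", "metadataPrefix"},
                  "optional": set(), "exclusive": None},
}


def check_arguments(verb, args):
    """Return None if the arguments are legal, else an error code string."""
    rules = RULES.get(verb)
    if rules is None:
        return "badVerb"
    names = set(args)
    exclusive = rules["exclusive"]
    if exclusive is not None and exclusive in names:
        # An exclusive argument may not be combined with any other argument.
        return None if names == {exclusive} else "badArgument"
    if not rules["required"] <= names:
        return "badArgument"  # a required argument is missing
    if names - rules["required"] - rules["optional"]:
        return "badArgument"  # an argument this verb does not accept
    return None


print(check_arguments("ListIdentifiers", {"from": "2002-01-01"}))
print(check_arguments("GetRecord", {"identifier": "oai:arXiv:cs/0112017",
                                    "metadataPrefix": "oai_dc"}))
```

The first call reports `badArgument` because `metadataPrefix` is now required for ListIdentifiers; the second passes.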

32
general changes
  • clear distinction between protocol and periphery
  • fixed protocol document
  • extensible implementation guidelines
  • e.g. sample metadata formats, description
    containers, about containers
  • allows for OAI guidelines and community
    guidelines

33
general changes
  • clear separation of OAI-PMH and HTTP
  • OAI-PMH error handling
  • all OK at HTTP level? => 200 OK
  • something wrong at OAI-PMH level? => OAI-PMH
    error (e.g. badVerb)

34
OAI Data Model: Resources / Items / Records
item: identifier
record: identifier, metadata format, datestamp
35
general changes
  • better definitions of harvester, repository,
    item, unique identifier, record, set, selective
    harvesting
  • oai_dc schema builds on DCMI XML Schema for
    unqualified Dublin Core
  • usage of must, must not etc. as in RFC2119
  • wording on response compression

36
general changes
  • all protocol responses can be validated with a
    single XML Schema
  • easier for data providers
  • no redundancy in type definitions
  • SOAP-ready
  • clean for error handling

37
response no errors
<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH>
  <responseDate>2002-02-08T08:55:46Z</responseDate>
  <request verb="GetRecord" ...>http://arXiv.org/oai2</request>
  <GetRecord>
    <record>
      <header>
        <identifier>oai:arXiv:cs/0112017</identifier>
        <datestamp>2001-12-14</datestamp>
        <setSpec>cs</setSpec>
        <setSpec>math</setSpec>
      </header>
      <metadata>
        ..
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>
38
response with error
<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH>
  <responseDate>2002-02-08T08:55:46Z</responseDate>
  <request>http://arXiv.org/oai2</request>
  <error code="badVerb">ShowMe is not a valid OAI-PMH verb</error>
</OAI-PMH>
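Because protocol errors now arrive inside an ordinary 200 OK response, a harvester has to look in the XML body for an error element rather than rely on the HTTP status. A minimal sketch of that check follows; the `OAIError` class and `check_response` helper are hypothetical names, and the sample omits the XML declaration and namespace for brevity.

```python
import xml.etree.ElementTree as ET


class OAIError(Exception):
    """Raised when a response body carries an OAI-PMH level error."""


def check_response(xml_text):
    """Parse a response and raise OAIError if it contains an <error>."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        # Compare local names so this works with or without a namespace.
        if element.tag.rsplit("}", 1)[-1] == "error":
            raise OAIError(element.get("code"), (element.text or "").strip())
    return root


SAMPLE = """<OAI-PMH>
  <responseDate>2002-02-08T08:55:46Z</responseDate>
  <request>http://arXiv.org/oai2</request>
  <error code="badVerb">ShowMe is not a valid OAI-PMH verb</error>
</OAI-PMH>"""

try:
    check_response(SAMPLE)
except OAIError as exc:
    code, message = exc.args
    print(code, "-", message)
```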
39
corrections
  • all dates/times are UTC, encoded in ISO8601,
    Z-notation
  • 1957-03-20T20:30:00.00Z
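To make the UTC rule concrete, the sketch below converts a local, timezone-aware time into Z-notation at seconds granularity. The helper name `to_utc_datestamp` and the UTC-5 offset used in the example are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone


def to_utc_datestamp(moment):
    """Format a timezone-aware datetime as UTC in ISO 8601 Z-notation."""
    if moment.tzinfo is None:
        # A naive datetime has no defined relation to UTC, so refuse it.
        raise ValueError("a timezone-aware datetime is required")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# 15:30 local time in a zone five hours behind UTC.
local = datetime(1957, 3, 20, 15, 30, 0, tzinfo=timezone(timedelta(hours=-5)))
print(to_utc_datestamp(local))  # 1957-03-20T20:30:00Z
```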

40
resumptionToken
  • idempotency of resumptionToken: return same
    incomplete list when rT is reissued
  • while no changes occur in the repo: strict
  • while changes occur in the repo: all items with
    unchanged datestamp
  • new attributes for the resumptionToken
  • expirationDate
  • completeListSize
  • cursor

41
new functionality
  • harvesting granularity
  • mandatory support of YYYY-MM-DD
  • optional support of YYYY-MM-DDThh:mm:ssZ
  • granularity of from and until must be the same
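The granularity rules above lend themselves to a simple check on the `from` and `until` arguments. The following is only a sketch: the function names and the plain-English reasons it returns are made up for the example and are not protocol error codes.

```python
import re

DAY = re.compile(r"^\d{4}-\d{2}-\d{2}$")
SECONDS = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z$")


def granularity(value):
    """Return the granularity pattern a datestamp uses, or None if malformed."""
    if DAY.match(value):
        return "YYYY-MM-DD"
    if SECONDS.match(value):
        return "YYYY-MM-DDThh:mm:ssZ"
    return None


def check_range(from_, until, supports_seconds=False):
    """Return None if the date range is acceptable, else a short reason."""
    found = [granularity(v) for v in (from_, until) if v is not None]
    if None in found:
        return "malformed datestamp"
    if len(set(found)) > 1:
        return "from and until use different granularities"
    if found and found[0] == "YYYY-MM-DDThh:mm:ssZ" and not supports_seconds:
        return "repository only supports day granularity"
    return None


print(check_range("2002-01-01", "2002-05-13"))
print(check_range("2002-01-01", "2002-05-13T00:00:00Z", supports_seconds=True))
```

Only syntax is checked here; whether the digits form a real calendar date is left out to keep the sketch short.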

42
new functionality
  • Identify: more expressive

<Identify>
  <repositoryName>Library of Congress 1</repositoryName>
  <baseURL>http://memory.loc.gov/cgi-bin/oai</baseURL>
  <protocolVersion>2.0</protocolVersion>
  <adminEmail>dwoo@loc.gov</adminEmail>
  <adminEmail>caar@loc.gov</adminEmail>
  <deletedRecord>transient</deletedRecord>
  <earliestDatestamp>1990-02-01T00:00:00Z</earliestDatestamp>
  <granularity>YYYY-MM-DDThh:mm:ssZ</granularity>
  <compression>deflate</compression>
43
new functionality
  • header contains set membership of item

<record>
  <header>
    <identifier>oai:arXiv:cs/0112017</identifier>
    <datestamp>2001-12-14</datestamp>
    <setSpec>cs</setSpec>
    <setSpec>math</setSpec>
  </header>
  <metadata>
    ..
  </metadata>
</record>
44
new functionality
  • ListIdentifiers returns headers

<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH>
  <responseDate>2002-02-08T08:55:46Z</responseDate>
  <request verb="..." ...>http://arXiv.org/oai2</request>
  <ListIdentifiers>
    <header>
      <identifier>oai:arXiv:hep-th/9801001</identifier>
      <datestamp>1999-02-23</datestamp>
      <setSpec>physic:hep</setSpec>
    </header>
    <header>
      <identifier>oai:arXiv:hep-th/9801002</identifier>
      <datestamp>1999-03-20</datestamp>
      <setSpec>physic:hep</setSpec>
      <setSpec>physic:exp</setSpec>
    </header>
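Since ListIdentifiers now returns full headers, a harvester can read identifier, datestamp and set membership without fetching the record. Here is a hedged sketch of that parsing step. The sample is a completed version of the excerpt above, and it declares the OAI-PMH 2.0 XML namespace, which the slide excerpt leaves out; `parse_headers` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Namespace of OAI-PMH 2.0 responses (not shown in the slide excerpt).
NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header>
      <identifier>oai:arXiv:hep-th/9801001</identifier>
      <datestamp>1999-02-23</datestamp>
      <setSpec>physic:hep</setSpec>
    </header>
    <header>
      <identifier>oai:arXiv:hep-th/9801002</identifier>
      <datestamp>1999-03-20</datestamp>
      <setSpec>physic:hep</setSpec>
      <setSpec>physic:exp</setSpec>
    </header>
  </ListIdentifiers>
</OAI-PMH>"""


def parse_headers(xml_text):
    """Return one dict per <header> with identifier, datestamp and sets."""
    root = ET.fromstring(xml_text)
    headers = []
    for header in root.iterfind(".//oai:header", NS):
        headers.append({
            "identifier": header.findtext("oai:identifier", namespaces=NS),
            "datestamp": header.findtext("oai:datestamp", namespaces=NS),
            "sets": [s.text for s in header.findall("oai:setSpec", NS)],
        })
    return headers


for entry in parse_headers(SAMPLE):
    print(entry["identifier"], entry["datestamp"], entry["sets"])
```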
45
new functionality
  • ListIdentifiers mandates metadataPrefix as
    argument

http://www.perseus.tufts.edu/cgi-bin/pdataprov?
verb=ListIdentifiers&metadataPrefix=olac&
from=2001-01-01&until=2001-01-01&
set=Perseus:collection:PersInfo
46
new functionality
  • character set for metadataPrefix and setSpec
    extended to URL-safe characters

A-Z a-z 0-9 _ ! ( ) - .
  • identifierType anyURI
  • repositoryName string

47
in the periphery
  • introduction of provenance container to
    facilitate tracing of harvesting history

<about>
  <provenance>
    <originDescription>
      <baseURL>http://an.oa.org</baseURL>
      <identifier>oai:r1:plog/9801001</identifier>
      <datestamp>2001-08-13T13:00:02Z</datestamp>
      <metadataPrefix>oai_dc</metadataPrefix>
      <harvestDate>2001-08-15T12:01:30Z</harvestDate>
    </originDescription>
    <originDescription>
    </originDescription>
  </provenance>
</about>
48
in the periphery
  • introduction of friends container to facilitate
    discovery of repositories

<description>
  <Friends>
    <baseURL>http://cav2001.library.caltech.edu/perl/oai</baseURL>
    <baseURL>http://formations2.ulst.ac.uk/perl/oai</baseURL>
    <baseURL>http://cogprints.soton.ac.uk/perl/oai</baseURL>
    <baseURL>http://wave.ldc.upenn.edu/OLAC/dp/aps.php4</baseURL>
  </Friends>
</description>
49
in the periphery
  • revision of oai-identifier
  • guidelines for collection-level and set-level
    metadata

50
future
  • OAI-PMH
  • communities
  • adoption

51
the OAI-PMH
  • release of OAI-PMH v.2.0 06/2002
  • no backwards compatibility with v.1.0/1.1
  • stable
  • migration process for registered repos
  • ? formal standardization ?
  • ? SOAP version, web services framework: SOAP,
    WSDL, UDDI ?

52
communities
  • proliferation of community-specific add-ons for:
  • collection and set level metadata
  • expressive metadata formats (e.g. qualified DC
    XML Schema)
  • shared set-structures
  • machine readable rights (about the metadata)

53
adoption
  • evolution
  • from talking about OAI-PMH
  • to talking about projects that use OAI-PMH
  • to talking about projects and failing to mention
    they use OAI-PMH
  • => OAI-PMH becomes part of the infrastructure

54
indicators of adoption of OAI-PMH
  • data providers
  • service providers
  • tools
  • structural support

55
data providers
  • 49 registered repositories 11/2001
  • 65 registered repositories 03/2002
  • 77 registered repositories 05/2002
  • 5 million records
  • many unregistered repositories

56
service providers
  • Arc: cross-searching of registered repositories
    (Old Dominion U)
  • http://arc.cs.odu.edu
  • OLAC: cross-searching of Language Archive
    Community repositories
  • http://www.language-archives.org/index.html

57
service providers
  • Scirus: scientific search engine (Elsevier)
  • http://www.scirus.com
  • my.OAI: user-tailorable cross-searching of
    registered repositories (FS Consulting, Inc.)
  • http://www.myoai.com
  • growing interest from web search engines

58
OAI-PMH tools
  • Repository Explorer: interactive exploration of
    repositories (Virginia Tech)
  • http://www.purl.org/NET/oai_explorer
  • eprints.org: generic OAI-PMH compliant
    repository software (U of Southampton)
  • http://www.eprints.org
  • ALCME: repository and harvester software (OCLC)
  • http://alcme.oclc.org/index.html

59
exploration
  • Kepler (Old Dominion U)
  • your personal OAI data provider: Kepler
    archivelet
  • the Kepler service provider harvests from
    archivelets that register
  • archivelet downloadable
  • http://www.dlib.org/dlib/april01/maly/04maly.html

60
exploration
  • DP9 (Old Dominion U)
  • provides entry page to repositories for
    web-crawlers
  • provides bookmarkable URL for OAI record
  • provides resolution of OAI identifier into
    metadata
  • software downloadable

61
http://www.openarchives.org
openarchives@openarchives.org
62
Emergency Backup Slides
63
resumptionToken
scenario: harvesting 277 records in 3
separate 100-record chunks
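The scenario can be simulated in a few lines. The sketch below uses an in-memory stand-in for a repository, so there is no network traffic; the token format (the next cursor position as a string) and the names `make_repository` and `harvest` are assumptions for illustration, and real tokens are opaque to the harvester. An empty token marks the final chunk.

```python
def make_repository(total, chunk_size):
    """Return a fake list_records(token) over `total` numbered records."""
    records = [f"record-{n}" for n in range(1, total + 1)]

    def list_records(token=None):
        cursor = int(token) if token else 0
        chunk = records[cursor:cursor + chunk_size]
        next_cursor = cursor + len(chunk)
        # An empty token tells the harvester that the list is complete.
        next_token = str(next_cursor) if next_cursor < total else ""
        info = {"token": next_token, "cursor": cursor,
                "completeListSize": total}
        return chunk, info

    return list_records


def harvest(list_records):
    """Follow resumption tokens until the complete list has been read."""
    harvested, chunk_sizes, token = [], [], None
    while True:
        chunk, info = list_records(token)
        harvested.extend(chunk)
        chunk_sizes.append(len(chunk))
        if not info["token"]:
            return harvested, chunk_sizes
        token = info["token"]


records, sizes = harvest(make_repository(277, 100))
print(len(records), sizes)  # 277 [100, 100, 77]
```

Reissuing the same token returns the same chunk in this sketch, which mirrors the idempotency requirement for an unchanged repository.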
64
Open Archives Initiative (OAI): exposure of metadata for harvesting
Open Archival Information System (OAIS): insuring long-term preservation
of archival materials
OAIS w/ an OAI interface
http://www.dlib.org/dlib/april01/04editorial.html
http://www.dlib.org/dlib/may01/05letters.html
http://ssdoo.gsfc.nasa.gov/nost/isoas/us/overview.html
65
Field of Dreams
  • It should be easy to be a data provider, even if
    it makes more work for the service provider.
  • if enough data providers exist, the service
    providers will come (DPs >> SPs)
  • Open-source / freely available tools
  • drop-in data providers
  • industrial strength: http://www.eprints.org/
  • personal size: http://kepler.cs.odu.edu/
  • tools to make your existing DL a data provider
  • http://www.openarchives.org/tools/tools.htm
  • also OAI-implementers mailing list / mail
    archive!
  • service providers
  • only bits and pieces currently publicly
    available...