the OAI Protocol for Metadata Harvesting

1
the OAI Protocol for Metadata Harvesting
2
the joint impact of these and future
initiatives can be substantially higher when
interoperability between them [e-print archives]
can be established
The Open Archives Initiative has been set up to
create a forum to discuss and solve matters of
interoperability between preprint solutions, as a
way to promote their global acceptance.
Paul Ginsparg, Rick Luce and Herbert Van de Sompel
3
Luce, Van de Sompel, Ginsparg
4
federated services
5
metadata harvesting via OAI-PMH
metadata
FTXT
6
federated services via OAI-PMH
metadata
7
Santa Fe convention
OAI-PMH v.1.0/1.1
OAI-PMH v.2.0
8
the OAI-PMH
service provider
data provider
6
9
Core concepts in OAI-PMH
  • low-barrier interoperability
  • data-provider / service-provider model
  • metadata harvesting model (OAI-PMH, HTTP based)
  • shared metadata format (Dublin Core) and parallel,
    community-specific metadata formats
  • acceptable use (community specific, oai-rights)
10
OAI-PMH data model
entry point to all records pertaining to the
resource
metadata pertaining to the resource
11
OAI-PMH harvesting tools
service provider
data provider
  • Supporting protocol requests
  • Identify
  • ListMetadataFormats
  • ListSets

12
supporting protocol requests
service provider
data provider
ListMetadataFormats
  • ListMetadataFormats / Time / Request
  • REPEAT
  • Format prefix
  • Format XML schema
  • /REPEAT

13
Supporting Identify
  • Purpose
  • Return general information about the archive and
    its policies (e.g., datestamp granularity)
  • Parameters
  • None
  • Sample URL
  • http://memory.loc.gov/cgi-bin/oai2_0?verb=Identify
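The sample URLs on these slides all follow one pattern: a base URL, a `verb`, and optional arguments in the query string. A minimal sketch of building such a URL is shown here; the function name `build_request_url` is hypothetical and not part of the protocol.

```python
from urllib.parse import urlencode


def build_request_url(base_url, verb, **arguments):
    """Return a GET URL for one OAI-PMH request (hypothetical helper)."""
    query = {"verb": verb}
    # Leave out any argument the caller passed as None.
    query.update({k: v for k, v in arguments.items() if v is not None})
    return base_url + "?" + urlencode(query)


print(build_request_url("http://memory.loc.gov/cgi-bin/oai2_0", "Identify"))
```

Because `from` is a reserved word in Python, that argument would have to be passed as `**{"from": "2002-01-01"}`.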

14
Identify
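Sample Identify response (fields recoverable from the slide)
  • repositoryName: Library of Congress 1
  • baseURL: http://memory.loc.gov/cgi-bin/oai
  • protocolVersion: 2.0
  • adminEmail: r.e.gillian_at_larc.nasa.gov
  • adminEmail: rgillian_at_visi.net
  • earliestDatestamp: 1990-02-01T00:00:00Z
  • deletedRecord: transient
  • granularity: YYYY-MM-DDThh:mm:ssZ
  • compression: deflate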
15
harvesting granularity
  • harvesting granularity
  • mandatory support of YYYY-MM-DD
  • optional support of YYYY-MM-DDThh:mm:ssZ
  • other granularities considered, but ultimately
    rejected
  • granularity of from and until must be the same
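A harvester can check the last rule before sending a request. The sketch below assumes the two datestamp shapes named on the slide; the helper names `granularity` and `same_granularity` are made up for illustration, and the patterns check shape only, not calendar validity.

```python
import re

# Shapes of the two granularities named on the slide.
DAY = re.compile(r"^\d{4}-\d{2}-\d{2}$")
SECONDS = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z$")


def granularity(stamp):
    """Classify a datestamp string by its granularity."""
    if DAY.match(stamp):
        return "YYYY-MM-DD"
    if SECONDS.match(stamp):
        return "YYYY-MM-DDThh:mm:ssZ"
    raise ValueError(f"not a recognised datestamp: {stamp!r}")


def same_granularity(from_stamp, until_stamp):
    """Return True when from and until use the same granularity."""
    return granularity(from_stamp) == granularity(until_stamp)
```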

16
Supporting ListMetadataFormats
  • Purpose
  • List metadata formats supported by the repository
    as well as their schema locations and namespaces
  • Parameters
  • identifier: for a specific record (O)
  • Sample URL
  • http://memory.loc.gov/cgi-bin/oai2_0?verb=ListMetadataFormats

17
Supporting ListSets
  • Purpose
  • Provide a listing of sets in which records may be
    organized (may be hierarchical, overlapping, or
    flat)
  • Parameters
  • None
  • Sample URL
  • http://memory.loc.gov/cgi-bin/oai2_0?verb=ListSets

18
OAI-PMH harvesting tools
service provider
data provider
  • Supporting protocol requests
  • Identify
  • ListMetadataFormats
  • ListSets
  • Harvesting protocol requests
  • ListRecords
  • ListIdentifiers
  • GetRecord

19
OAI-PMH harvesting tools
service provider
data provider
Datestamp Identifier Set
Records
20
harvesting requests
service provider
data provider
ListRecords: from=a, until=b, set=klm, metadataPrefix=dc
  • ListRecords / Time / Request
  • REPEAT
  • Identifier
  • Datestamp
  • Metadata
  • /REPEAT

21
Harvesting GetRecord
  • Purpose
  • Returns the metadata (specific format) for a
    single item in the form of a record
  • Parameters
  • identifier: unique id for item (R)
  • metadataPrefix: identifier of metadata format
    for the record (R)
  • Sample URL
  • http://memory.loc.gov/cgi-bin/oai2_0?verb=GetRecord&identifier=oai:lcoa1.loc.gov:loc.gdc/lhbcb.00835&metadataPrefix=oai_dc

22
Harvesting ListRecords
  • Purpose
  • Retrieves the metadata (specific format) for
    multiple items in the form of records
  • Parameters
  • from: start datestamp (O)
  • until: end datestamp (O)
  • set: set to harvest from (O)
  • resumptionToken: flow control mechanism (X)
  • metadataPrefix: metadata format (R)
  • Sample URL
  • http://memory.loc.gov/cgi-bin/oai2_0?verb=ListRecords&metadataPrefix=oai_dc
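On the harvester side, a ListRecords response has to be unpacked into identifier, datestamp, and metadata for each record. The sketch below uses the standard-library XML parser on an inline sample; the sample record (`oai:example.org:1`, "Hypothetical title") and the function name `parse_records` are invented for illustration.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Made-up response with a single Dublin Core record.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2002-02-08T08:55:46Z</responseDate>
  <request verb="ListRecords" metadataPrefix="oai_dc">http://example.org/oai</request>
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:1</identifier>
        <datestamp>2001-12-14</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Hypothetical title</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def parse_records(xml_text):
    """Return one dict per record: identifier, datestamp, Dublin Core titles."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.findall("oai:ListRecords/oai:record", NS):
        header = record.find("oai:header", NS)
        records.append({
            "identifier": header.findtext("oai:identifier", namespaces=NS),
            "datestamp": header.findtext("oai:datestamp", namespaces=NS),
            "titles": [t.text for t in record.findall(".//dc:title", NS)],
        })
    return records


print(parse_records(SAMPLE))
```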

23
Harvesting ListIdentifiers
  • Purpose
  • List headers for all items corresponding to the
    specified parameters
  • Parameters
  • from: start datestamp (O)
  • until: end datestamp (O)
  • set: set to harvest from (O)
  • metadataPrefix: metadata format to list
    identifiers for (R)
  • resumptionToken: flow control mechanism (X)
  • Sample URL
  • http://memory.loc.gov/cgi-bin/oai2_0?verb=ListIdentifiers&metadataPrefix=oai_dc

24
header
  • header contains set membership of item

identifier: oai:arXiv:cs/0112017
datestamp: 2001-12-14
setSpec: cs
setSpec: math
...
25
ListIdentifiers
  • ListIdentifiers returns headers

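responseDate: 2002-02-08T08:55:46Z
request: http://arXiv.org/oai2
header: identifier oai:arXiv:hep-th/9801001, datestamp 1999-02-23, setSpec physic:hep
header: identifier oai:arXiv:hep-th/9801002, datestamp 1999-03-20, setSpec physic:hep, setSpec physic:exp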
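Since each header carries the item's set membership, a harvester can group identifiers by set without fetching any metadata. The sketch below rebuilds the slide's two headers as XML; the surrounding markup (for example the `request` attributes) is assumed rather than taken from the slide, and `identifiers_by_set` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Values from the slide; element layout is an assumption.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2002-02-08T08:55:46Z</responseDate>
  <request verb="ListIdentifiers" metadataPrefix="oai_dc">http://arXiv.org/oai2</request>
  <ListIdentifiers>
    <header>
      <identifier>oai:arXiv:hep-th/9801001</identifier>
      <datestamp>1999-02-23</datestamp>
      <setSpec>physic:hep</setSpec>
    </header>
    <header>
      <identifier>oai:arXiv:hep-th/9801002</identifier>
      <datestamp>1999-03-20</datestamp>
      <setSpec>physic:hep</setSpec>
      <setSpec>physic:exp</setSpec>
    </header>
  </ListIdentifiers>
</OAI-PMH>"""


def identifiers_by_set(xml_text):
    """Map each setSpec to the identifiers whose headers list it."""
    root = ET.fromstring(xml_text)
    by_set = defaultdict(list)
    for header in root.findall("oai:ListIdentifiers/oai:header", NS):
        identifier = header.findtext("oai:identifier", namespaces=NS)
        for spec in header.findall("oai:setSpec", NS):
            by_set[spec.text].append(identifier)
    return dict(by_set)


print(identifiers_by_set(SAMPLE))
```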
26
OAI-PMH identifiers
  • Not (necessarily) identifier of the resource
  • Each item must have a globally unique identifier
  • identifiers must follow rules for valid URIs
  • Example
  • oai
  • oai:etd.vt.edu:etd-1234567890
  • Each identifier must resolve to a single item and
    always to the same item
  • Can't reuse OAI item identifiers
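The example identifier on this slide appears to follow a three-part layout: the `oai` scheme, an archive name, and a local record id. The slide only requires a valid, unique URI, so the sketch below is an assumption about that one convention, and `split_oai_identifier` is a hypothetical helper.

```python
def split_oai_identifier(identifier):
    """Split 'oai:<archive>:<record>' into (archive, record).

    Assumes the three-part 'oai' layout; other valid URIs are rejected.
    """
    # Split on the first two colons only, so the record part may contain more.
    parts = identifier.split(":", 2)
    if len(parts) != 3 or parts[0] != "oai" or not parts[1] or not parts[2]:
        raise ValueError(f"not an oai-style identifier: {identifier!r}")
    return parts[1], parts[2]


print(split_oai_identifier("oai:etd.vt.edu:etd-1234567890"))
```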

27
OAI-PMH datestamps
  • Needed for every OAI record to support
    incremental harvesting
  • Must be updated when addition or modification or
    deletion made in order to ensure changes are
    correctly propagated to harvesters
  • Also for dynamically generated metadata formats
  • Different from dates within the metadata: the OAI
    datestamp is used only for harvesting
  • Can be either YYYY-MM-DD or YYYY-MM-DDThh:mm:ssZ
    (must be GMT timezone)
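For incremental harvesting, a harvester typically turns the time of its last run into a `from` datestamp. The sketch below formats a timezone-aware Python `datetime` in either granularity, converting to UTC first; `to_datestamp` is a hypothetical helper, not something the protocol defines.

```python
from datetime import datetime, timedelta, timezone


def to_datestamp(moment, granularity="YYYY-MM-DD"):
    """Format an aware datetime as an OAI-PMH datestamp in UTC."""
    if moment.tzinfo is None:
        # Refuse naive values rather than guess their timezone.
        raise ValueError("moment must be timezone-aware")
    utc = moment.astimezone(timezone.utc)
    if granularity == "YYYY-MM-DD":
        return utc.strftime("%Y-%m-%d")
    if granularity == "YYYY-MM-DDThh:mm:ssZ":
        return utc.strftime("%Y-%m-%dT%H:%M:%SZ")
    raise ValueError(f"unknown granularity: {granularity!r}")


# A local time one hour ahead of UTC, converted before formatting.
local = datetime(2002, 2, 8, 9, 55, 46, tzinfo=timezone(timedelta(hours=1)))
print(to_datestamp(local, "YYYY-MM-DDThh:mm:ssZ"))
```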

28
OAI-PMH request format
  • requests must be submitted using the GET or POST
    methods of HTTP
  • repositories must support both methods
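With POST, the same arguments travel in a form-encoded request body instead of the query string. The sketch below only constructs the request object and does not send it; `build_post_request` is a hypothetical name.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_post_request(base_url, verb, **arguments):
    """Build (but do not send) a POST request for one OAI-PMH verb."""
    body = urlencode({"verb": verb, **arguments}).encode("ascii")
    return Request(
        base_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


req = build_post_request("http://example.org/oai", "ListRecords", metadataPrefix="oai_dc")
print(req.get_method(), req.data)
```

Passing the returned object to `urllib.request.urlopen` would perform the actual request.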

29
OAI-PMH response format
  • formatted as HTTP responses
  • content type must be text/xml
  • status codes (distinguished from OAI-PMH
    errors), e.g. 302 (redirect), 503 (service not
    available)

30
OAI-PMH response format
  • response format: well-formed XML
  • XML declaration (<?xml version="1.0" encoding="UTF-8" ?>)
  • OAI-PMH root element
  • three child elements
  • responseDate (UTC datetime)
  • request (request that generated this response)
  • either (a) error (in case of an error or exception
    condition) or (b) an element with the name of the
    OAI-PMH request
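The envelope described above can be inspected in a few lines. The sketch below reads `responseDate` and `request` and reports the names of whatever follows them (an `error` element or the verb element); the sample reuses values from the error slide, and `describe_response` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"

SAMPLE_ERROR = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2002-02-08T08:55:46Z</responseDate>
  <request>http://arXiv.org/oai2</request>
  <error code="badVerb">ShowMe is not a valid OAI-PMH verb</error>
</OAI-PMH>"""


def describe_response(xml_text):
    """Summarise the envelope of an OAI-PMH response."""
    root = ET.fromstring(xml_text)
    if root.tag != OAI + "OAI-PMH":
        raise ValueError("root element is not OAI-PMH")
    # Strip the namespace from each child tag, keeping document order.
    names = [child.tag.split("}", 1)[-1] for child in root]
    return {
        "responseDate": root.findtext(OAI + "responseDate"),
        "request": root.findtext(OAI + "request"),
        "payload": names[2:],  # e.g. ["error"] or ["ListRecords"]
    }


print(describe_response(SAMPLE_ERROR))
```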

31
OAI-PMH response, no errors
responseDate: 2002-02-08T08:55:46Z
request: http://arXiv.org/oai2
identifier: oai:arXiv:cs/0112017
datestamp: 2001-12-14
setSpec: cs
setSpec: math
...

32
OAI-PMH response, error
responseDate: 2002-02-08T08:55:46Z
request: http://arXiv.org/oai2
error code="badVerb": ShowMe is not a valid OAI-PMH verb
33
OAI-PMH errors
  • repositories must indicate OAI-PMH errors
  • inclusion of one or more error elements
  • defined errors
  • badArgument
  • badResumptionToken
  • badVerb
  • cannotDisseminateFormat
  • idDoesNotExist
  • noRecordsMatch
  • noMetadataFormats
  • noSetHierarchy
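Because OAI-PMH errors arrive inside a normal XML response rather than as HTTP status codes, a harvester has to look for `error` elements itself. The sketch below turns the first one into a Python exception; `OAIError` and `raise_on_error` are hypothetical names.

```python
import xml.etree.ElementTree as ET

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

SAMPLE_ERROR = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2002-02-08T08:55:46Z</responseDate>
  <request>http://arXiv.org/oai2</request>
  <error code="badVerb">ShowMe is not a valid OAI-PMH verb</error>
</OAI-PMH>"""


class OAIError(Exception):
    """Raised when a response carries an OAI-PMH error element."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code


def raise_on_error(xml_text):
    """Return the parsed root, or raise OAIError for the first error found."""
    root = ET.fromstring(xml_text)
    errors = root.findall("oai:error", NS)
    if errors:
        first = errors[0]
        raise OAIError(first.get("code"), (first.text or "").strip())
    return root


try:
    raise_on_error(SAMPLE_ERROR)
except OAIError as exc:
    print(exc.code, "-", exc)
```

A harvester may want to treat some codes, such as `noRecordsMatch` during an incremental harvest, as an empty result rather than a failure; that policy is left to the caller here.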

34
Flow control
  • flow control on two protocol levels
  • HTTP (503, Retry-After)
  • OAI-PMH, resumptionToken
  • HTTP Retry-After mechanism can be used in order
    to delay requests of clients
  • resumptionTokens are used to return parts
    (incomplete lists) of the result.
  • client receives a resumptionToken which can be
    used to issue another request in order to
    receive further parts of the result
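On the HTTP level, a harvester that receives a 503 needs to work out how long to wait. In HTTP, a Retry-After header may carry either a number of seconds or a date, and the sketch below handles both; `retry_delay_seconds` is a hypothetical helper, and the current time is passed in so that the function stays easy to test.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_delay_seconds(retry_after, now):
    """Return how many seconds to wait, given a Retry-After header value."""
    value = retry_after.strip()
    if value.isdigit():
        # Delay expressed directly in seconds.
        return float(value)
    # Otherwise assume an HTTP date such as "Fri, 08 Feb 2002 08:56:46 GMT".
    target = parsedate_to_datetime(value)
    return max(0.0, (target - now).total_seconds())


now = datetime(2002, 2, 8, 8, 55, 46, tzinfo=timezone.utc)
print(retry_delay_seconds("Fri, 08 Feb 2002 08:56:46 GMT", now))
```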

35
Flow control
  • four of the request types return a list of
    entries
  • three of them may return large lists
  • OAI-PMH supports partitioning of responses
  • decision on partitioning: repository
  • response to a request includes
  • incomplete list
  • resumption token (plus expiration date, size of
    complete list, cursor; all optional)
  • new request with same request type
  • resumption token as parameter
  • all other parameters omitted!
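The resumption loop can be sketched as follows. The network call is replaced by an injected `fetch` function so the example runs offline; `harvest_identifiers`, `fake_fetch`, the token value `page2`, and the `oai:x:N` identifiers are all invented for illustration.

```python
import xml.etree.ElementTree as ET

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def harvest_identifiers(fetch, metadata_prefix):
    """Yield identifiers from ListIdentifiers, following resumption tokens.

    `fetch` takes a dict of request arguments and returns response XML text.
    """
    params = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(params))
        for ident in root.findall(".//oai:header/oai:identifier", NS):
            yield ident.text
        token = root.findtext(".//oai:resumptionToken", namespaces=NS)
        if not token or not token.strip():
            return  # no token, or an empty one: the list is complete
        # Follow-up request: same verb, the token, and nothing else.
        params = {"verb": "ListIdentifiers", "resumptionToken": token.strip()}


def _page(ids, token=""):
    """Build a made-up response page for the demonstration."""
    headers = "".join(
        f"<header><identifier>{i}</identifier>"
        f"<datestamp>2002-01-01</datestamp></header>"
        for i in ids
    )
    return (
        '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">'
        f"<ListIdentifiers>{headers}"
        f"<resumptionToken>{token}</resumptionToken>"
        "</ListIdentifiers></OAI-PMH>"
    )


calls = []


def fake_fetch(params):
    """Stand-in for an HTTP request; serves two pages and records each call."""
    calls.append(dict(params))
    if "resumptionToken" in params:
        return _page(["oai:x:3"])
    return _page(["oai:x:1", "oai:x:2"], token="page2")


print(list(harvest_identifiers(fake_fetch, "oai_dc")))
```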

36
Flow control
Service Provider
Data Provider
Harvester
Repository
37
record-level about container
  • provenance container to facilitate tracing of
    harvesting history


baseURL: http://an.oa.org
identifier: oai:r1:plog/9801001
datestamp: 2001-08-13T13:00:02Z
metadata format: oai_dc
harvest date: 2001-08-15T12:01:30Z



38
record-level about container
  • rights container to express rights pertaining to
    metadata
  • W3C XML schema defines format for
    package to be included in container

39
repository-level description container
  • friends container to facilitate dynamic discovery
    of repositories

baseURL: http://cav2001.library.caltech.edu/perl/oai
baseURL: http://formations2.ulst.ac.uk/perl/oai
baseURL: http://cogprints.soton.ac.uk/perl/oai
baseURL: http://wave.ldc.upenn.edu/OLAC/dp/aps.php4
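A harvester can use the friends container to discover further repositories to visit. The sketch below extracts the base URLs from a standalone `friends` element built from the slide's values; in practice the container would sit inside a `description` element of an Identify response, and `friend_base_urls` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

FRIENDS = "{http://www.openarchives.org/OAI/2.0/friends/}"

SAMPLE = """<friends xmlns="http://www.openarchives.org/OAI/2.0/friends/">
  <baseURL>http://cav2001.library.caltech.edu/perl/oai</baseURL>
  <baseURL>http://formations2.ulst.ac.uk/perl/oai</baseURL>
  <baseURL>http://cogprints.soton.ac.uk/perl/oai</baseURL>
  <baseURL>http://wave.ldc.upenn.edu/OLAC/dp/aps.php4</baseURL>
</friends>"""


def friend_base_urls(xml_text):
    """Return the base URLs listed in a friends container, in document order."""
    root = ET.fromstring(xml_text)
    return [el.text.strip() for el in root.iter(FRIENDS + "baseURL") if el.text]


for url in friend_base_urls(SAMPLE):
    print(url)
```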

40
more info
  • Protocol document
  • http://www.openarchives.org/OAI/openarchivesprotocol.html
  • Validation tool
  • http://re.cs.uct.ac.za/
  • Repository and harvesting tools
  • http://www.openarchives.org/tools/tools.html
  • Registries of public OAI-PMH repositories
  • http://re.cs.uct.ac.za/
  • http://gita.grainger.uiuc.edu/registry/
  • http://www.openarchives.org/Register/BrowseSites

41
The OAI-PMH protocol is a low-barrier
interoperability specification for the recurrent
exchange of metadata between systems
  • Things become really cool when allowing
    flexibility re the interpretation of metadata.
  • Indeed in OAI-PMH metadata is XML-formatted
    data pertaining to the resource