OAI from the needle box

Learn more at: https://openlib.org

1
OAI from the needle box
Thomas Krichel
Palmer School of Library and Information Science, Long Island University
With apologies to Carl Lagoze
Humboldt Universität Berlin, March 20, 2002
2
Where I come from...
  • Trained economist
  • Early (1991) visionary of free online scholarship
  • Creator of NetEc in 1993
  • Principal founder of RePEc in 1997
  • Largest distributed academic DL in the world
  • Collection that is open for contribution and usage
  • Grown to over 200 archives, over 10 partly
    interoperable user services

3
Metadata collection process
  • Metadata is expensive to collect.
  • Free online scholarship requires academic
    self-documentation
  • Building free metadata collection is difficult
  • no established business model
  • no established funding channels
  • Only a collaborative effort will succeed.

4
The example of eprint servers
  • attractive building block for the transformation
    of scholarly communication
  • but isolated efforts do not make for a scholarly
    communication system
  • need to federate archives
  • need to interoperate with other scholarly
    communication components

5
Example e-print accessibility
6
Example e-print accessibility
7
metadata harvesting
metadata
e-print
8
metadata harvesting
metadata
e-print
9
other examples
  • within the area of scholarly communication
  • already implemented in RePEc
  • Sharing of log data between service providers
  • Provision of non-document data for document data
    providers
  • personal data
  • institutional data

10
core concepts in OAI 1.1
  • low-barrier interoperability
  • data-provider / service-provider model
  • metadata harvesting model
  • OAI 1.1 protocol: HTTP based
  • shared metadata format: Dublin Core
  • parallel metadata formats: community specific

11
harvester / repository
12
OAI protocol requests
service provider
data provider
  • Supporting protocol requests: Identify,
    ListMetadataFormats, ListSets
  • Harvesting protocol requests: ListRecords,
    ListIdentifiers, GetRecord

13
HTTP encoding - requests
BASE-URL -----------> an.oa.org/OAI-script
keyword arguments --> verb=ListIdentifiers&set=S1
GET: http://an.oa.org/OAI-script?verb=ListIdentifiers&set=S1
POST:
POST http://an.oa.org/OAI-script HTTP/1.0
Content-Length: 78
Content-Type: application/x-www-form-urlencoded

verb=ListIdentifiers&set=S1
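The two encodings above can be sketched in Python with the standard library only. The helper names build_get_url and build_post_body are hypothetical, the base URL is the example one from the slide rather than a live endpoint, and nothing is sent over the network.

```python
from urllib.parse import urlencode

# Example base URL from the slide; not a live endpoint.
BASE_URL = "http://an.oa.org/OAI-script"


def build_get_url(base_url, **arguments):
    """Append the keyword arguments to the base URL as a query string."""
    return base_url + "?" + urlencode(arguments)


def build_post_body(**arguments):
    """Encode the same arguments as an application/x-www-form-urlencoded body."""
    return urlencode(arguments).encode("ascii")


if __name__ == "__main__":
    # GET: arguments travel in the URL.
    print(build_get_url(BASE_URL, verb="ListIdentifiers", set="S1"))
    # POST: arguments travel in the request body instead.
    print(build_post_body(verb="ListIdentifiers", set="S1"))
```

Both forms carry the same keyword arguments; only their position in the HTTP request differs. Reserved characters such as the colons in an identifier are percent-encoded by urlencode.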
14
HTTP encoding - responses
<?xml version="1.0" encoding="UTF-8"?>
<GetRecord xmlns="http://oai.namespace.uri"
    xmlns:xsi="http://w3.namespace.uri"
    xsi:schemaLocation="http://oai.namespace.uri http://oai.schemaURL">
  <responseDate>2000-19-01T19:30:30-04:00</responseDate>
  <requestURL>http://an.oa.org/OAI-script?verb=GetRecord&amp;identifier=oai%3AarXiv%3A0001&amp;metadataPrefix=oai_dc</requestURL>
  <record> record contents </record>
  additional records
</GetRecord>
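A harvester might read such a response along the following lines. This is a minimal sketch: the namespace URI is the placeholder used on the slide, the responseDate value in the sample is invented for illustration, and summarize_response is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit, parse_qs

# Placeholder namespace URI as used on the slide, not the real OAI namespace.
OAI_NS = "http://oai.namespace.uri"

# Sample response modelled on the slide; the responseDate is invented.
SAMPLE_RESPONSE = """<?xml version="1.0" encoding="UTF-8"?>
<GetRecord xmlns="http://oai.namespace.uri">
  <responseDate>2002-03-20T19:30:30-04:00</responseDate>
  <requestURL>http://an.oa.org/OAI-script?verb=GetRecord&amp;identifier=oai%3AarXiv%3A0001&amp;metadataPrefix=oai_dc</requestURL>
  <record>record contents</record>
</GetRecord>"""


def summarize_response(xml_text):
    """Return the verb, the echoed request arguments and the record count."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    ns = {"oai": OAI_NS}
    request_url = root.findtext("oai:requestURL", namespaces=ns)
    # parse_qs undoes the percent-encoding of the identifier.
    query = parse_qs(urlsplit(request_url).query)
    return {
        "verb": root.tag.split("}", 1)[-1],  # strip the "{namespace}" prefix
        "responseDate": root.findtext("oai:responseDate", namespaces=ns),
        "arguments": {name: values[0] for name, values in query.items()},
        "records": len(root.findall("oai:record", ns)),
    }


if __name__ == "__main__":
    print(summarize_response(SAMPLE_RESPONSE))
```

The root element is named after the verb, and the requestURL element echoes the request, so a response can be interpreted without remembering what was asked.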
15
record
<record>
  <header>
    <identifier>oai:eg:001</identifier>
    <datestamp>1999-01-01</datestamp>
  </header>
  <metadata>
    <dc xmlns="http://purl.org/dc">
      <title>My Example</title>
    </dc>
  </metadata>
  <about>
    <ea xmlns="http://www.arXiv.org/ea">
      <usage>No restrictions</usage>
    </ea>
  </about>
</record>
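The three parts of a record (header, metadata, about) can be pulled apart as in this sketch. It assumes the record looks exactly like the slide's example, with an un-namespaced record element and the two namespace URIs shown there; read_record is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# The slide's example record, written out as well-formed XML.
SAMPLE_RECORD = """<record>
  <header>
    <identifier>oai:eg:001</identifier>
    <datestamp>1999-01-01</datestamp>
  </header>
  <metadata>
    <dc xmlns="http://purl.org/dc">
      <title>My Example</title>
    </dc>
  </metadata>
  <about>
    <ea xmlns="http://www.arXiv.org/ea">
      <usage>No restrictions</usage>
    </ea>
  </about>
</record>"""

# Prefixes are local to this script; only the URIs come from the record.
NAMESPACES = {"dc": "http://purl.org/dc", "ea": "http://www.arXiv.org/ea"}


def read_record(xml_text):
    """Extract one field from each of the header, metadata and about parts."""
    record = ET.fromstring(xml_text)
    return {
        "identifier": record.findtext("header/identifier"),
        "datestamp": record.findtext("header/datestamp"),
        "title": record.findtext("metadata/dc:dc/dc:title", namespaces=NAMESPACES),
        "usage": record.findtext("about/ea:ea/ea:usage", namespaces=NAMESPACES),
    }


if __name__ == "__main__":
    print(read_record(SAMPLE_RECORD))
```

The header carries what a harvester needs for bookkeeping (identifier and datestamp), while the metadata and about parts each bring their own namespace.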
16
selective harvesting - datestamps
17
selective harvesting - sets
S2
18
Communication re OAI
  • lists: subscribe via http://www.openarchives.org
  • oai-general list
  • oai-implementers list
  • web: http://www.openarchives.org
  • FAQ: http://www.openarchives.org/faq.htm
  • mail: openarchives@openarchives.org

19
revision of specifications
  • Version 1.1: specifications frozen for 12-18
    months
  • stable for experimentation, not definitive
  • minimize risk for early adopters
  • maximize chances for future interoperability
    across communities

The technical committee is working on the
definitive specifications, due out on 2002-05-01.
20
The technical committee
  • Herbert Van de Sompel (LANL)
  • Carl Lagoze (Cornell U)
  • Thomas Krichel (Long Island U, RePEc)
  • Jeff Young (OCLC)
  • Tim Cole (U of Illinois at Urbana Champaign)
  • Hussein Suleman (Virginia Tech)
  • Simeon Warner (Cornell U, arXiv)
  • Michael Nelson (NASA, NACA)
  • Caroline Arms (Library of Congress)
  • Muhammad Zubair (Old Dominion U, ARC)
  • Steven Bird (U Penn, Open Language Archive Community)
  • Robert Tansley (MIT, DSpace)
  • Andy Powell (UKOLN, UK)
  • Mogens Sandfær (DTV, Denmark)
  • Thomas Severiens (Oldenburg U, Physnet)
  • Thomas Baron (CERN)
  • Les Carr (U of Southampton)
  • Thomas Place (Tilburg U)

21
Issues in front of the committee
  • Error Handling
  • SOAP
  • Harvesting Granularity
  • Mandatory DC
  • Set Semantics and Collection Description
  • XML Schema
  • Result Set Filtering
  • Flow Control, Result Set Cardinality, Response
    Level Container
  • Awareness Mechanisms
  • Multiple Metadata Return and "Best" Metadata
    Selection
  • Machine Readable Rights Management
  • From GetRecord to GetRecords
  • Deduplication Issues
  • idempotency of base-urls
  • xml format for mini-archives
  • response compression
22
Thank you for your attention!
  • Thomas Krichel
  • Palmer School of Library and Information Science
  • 720 Northern Boulevard
  • Brookville NY 11548-1300
  • USA
  • http://openlib.org/home/krichel
  • Krichel@openlib.org

23
Error handling
  • badArgument
  • badGranularity
  • badResumptionToken
  • badVerb
  • cannotDisseminateFormat
  • idDoesNotExist
  • noRecordsMatch
  • noSetHierarchy
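
How a data provider might choose between two of these codes can be sketched as follows. The table of accepted arguments is an assumption made for illustration (it is not taken from the slides or quoted from the specification), and check_request is a hypothetical helper.

```python
# Assumed, illustrative table of the arguments each verb accepts.
ALLOWED_ARGUMENTS = {
    "Identify": set(),
    "ListMetadataFormats": {"identifier"},
    "ListSets": {"resumptionToken"},
    "ListIdentifiers": {"from", "until", "set", "metadataPrefix", "resumptionToken"},
    "ListRecords": {"from", "until", "set", "metadataPrefix", "resumptionToken"},
    "GetRecord": {"identifier", "metadataPrefix"},
}


def check_request(arguments):
    """Return an error code from the list above, or None if the request passes."""
    verb = arguments.get("verb")
    if verb not in ALLOWED_ARGUMENTS:
        # Missing or unknown verb.
        return "badVerb"
    unexpected = set(arguments) - {"verb"} - ALLOWED_ARGUMENTS[verb]
    if unexpected:
        # The verb is known but an argument is not accepted for it.
        return "badArgument"
    return None


if __name__ == "__main__":
    print(check_request({"verb": "ListIdentifiers", "set": "S1"}))
    print(check_request({"verb": "Harvest"}))
    print(check_request({"verb": "Identify", "set": "S1"}))
```

The remaining codes depend on the repository's contents (for example idDoesNotExist or noRecordsMatch) and cannot be decided from the request alone.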

24
SOAP
  • SOAP is a mechanism to transmit service requests
    over the Internet.
  • As yet it is not a fully matured protocol.
  • A SOAP compatible version of the protocol may be
    written later.

25
Harvesting granularity
  • The from and until arguments may allow
    finer-grained timestamps, down to one second.
  • Level supported is chosen by the data provider
    and set in the response to the Identify verb.
  • All times expressed in UTC.
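
A harvester could prepare from and until values at either granularity as in this sketch. The two string layouts (date only, and date plus time with a trailing Z) are assumed here as the ISO 8601 forms for day and second granularity; format_datestamp is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone


def format_datestamp(moment, granularity="day"):
    """Render a timezone-aware datetime in UTC at day or second granularity."""
    utc = moment.astimezone(timezone.utc)
    if granularity == "day":
        return utc.strftime("%Y-%m-%d")
    if granularity == "second":
        return utc.strftime("%Y-%m-%dT%H:%M:%SZ")
    raise ValueError("unsupported granularity: " + granularity)


if __name__ == "__main__":
    # A local time four hours behind UTC, converted before formatting.
    local = datetime(2002, 3, 20, 19, 30, 30, tzinfo=timezone(timedelta(hours=-4)))
    print(format_datestamp(local, "day"))
    print(format_datestamp(local, "second"))
```

Converting to UTC before formatting matters because a local evening time can fall on the next calendar day in UTC, which would change a day-granularity datestamp.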