1
Open Archives Initiative Protocol for Metadata Harvesting
  • Iztok Kavkler, University of Ljubljana
  • Some slides by:
    • Stefaan Ternier, KUL
    • Bram Vandenputte, KUL
    • Joris Klerkx, KUL

2
What is OAI?
  • Harvesting standard, documented at
    http://www.openarchives.org/OAI/openarchivesprotocol.html
  • Six service verbs:
    • Identify
    • ListMetadataFormats
    • GetRecord
    • ListRecords
    • ListIdentifiers
    • ListSets
  • Allows multiple metadata formats
    • DC (Dublin Core) format is mandatory
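
Each verb is sent as an ordinary HTTP request with a `verb` parameter plus verb-specific arguments. A minimal sketch of how a harvester might assemble such request URLs follows; the class and method names are illustrative and not part of any OAI library.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Builds OAI-PMH request URLs. Names here are illustrative, not a library API. */
final class OaiRequestUrls {

    /** The six verbs defined by OAI-PMH 2.0. */
    enum Verb { Identify, ListMetadataFormats, ListSets, ListIdentifiers, ListRecords, GetRecord }

    /** Appends the verb and the URL-encoded arguments to the repository base URL. */
    static String build(String baseUrl, Verb verb, Map<String, String> args) {
        StringBuilder url = new StringBuilder(baseUrl).append("?verb=").append(verb.name());
        for (Map.Entry<String, String> arg : args.entrySet()) {
            url.append('&').append(arg.getKey()).append('=')
               .append(URLEncoder.encode(arg.getValue(), StandardCharsets.UTF_8));
        }
        return url.toString();
    }

    public static void main(String[] argv) {
        // Base URL of a locally deployed repository (an assumption for this example).
        String base = "http://localhost:8080/OAILREApp/oaiHandler";
        System.out.println(build(base, Verb.Identify, Map.of()));

        Map<String, String> args = new LinkedHashMap<>();
        args.put("metadataPrefix", "oai_dc");
        args.put("set", "exercises");
        System.out.println(build(base, Verb.ListRecords, args));
    }
}
```

Arguments are URL-encoded because OAI identifiers contain characters such as ":" and "/".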

3
How OAI works
  • OAI VERBS
  • Identify
  • ListMetadataFormats
  • GetRecord
  • ListIdentifiers
  • ListRecords
  • ListSets

[Figure: the service provider (harvester) sends an HTTP request carrying an OAI verb to the metadata provider (repository), which answers with an HTTP response containing valid XML.]

4
Try it
  • Install Apache Tomcat or any other Java servlet container
  • Download the WAR file from
    http://fire.eun.org/Iztok/OAILREApp.war
  • Deploy the WAR
  • Demo HTML page:
    http://localhost:8080/OAILREApp/
  • Or type a service verb, e.g.
    http://localhost:8080/OAILREApp/oaiHandler?verb=Identify

5
The raw XML
  • By default, the resulting XML has a stylesheet attached for pretty rendering
  • To remove the stylesheet, comment out the line
      OAIHandler.styleSheet=testoai/oaicat.xsl
    in the file oaicat.properties (in the WAR file or the web-app dir)

6
OAI XML example
    <OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" ...>
      <responseDate>2007-06-11T06:48:58Z</responseDate>
      <request metadataPrefix="oai_lom"
               verb="ListRecords">http://localhost:8080/OAILREApp/oaiHandler</request>
      <ListRecords>
        <record>
          <header>
            <identifier>oai:oai.xyz-repository.com:exercises/112553</identifier>
            <datestamp>2007-06-09T22:38:28Z</datestamp>
            <setSpec>exercises</setSpec>
          </header>
          <metadata>
            <lom xmlns=...> ... </lom>
          </metadata>
        </record>
        ....
        <resumptionToken expirationDate="2007-06-11T07:48:58Z"
                         completeListSize="42" cursor="10">1181544538265</resumptionToken>
      </ListRecords>
    </OAI-PMH>
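
On the harvester side, a response like this can be read with the JDK's built-in DOM parser. The sketch below pulls out the record identifiers and the resumption token. The class name, the shortened sample document and the LOM namespace value in it are assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

/** Reads identifiers and the resumption token from a ListRecords response (illustrative). */
final class OaiResponseReader {
    static final String OAI_NS = "http://www.openarchives.org/OAI/2.0/";

    // Shortened stand-in for a real response; the LOM namespace value is an assumption.
    static final String SAMPLE =
        "<OAI-PMH xmlns=\"" + OAI_NS + "\"><ListRecords><record>"
        + "<header><identifier>oai:oai.xyz-repository.com:exercises/112553</identifier>"
        + "<datestamp>2007-06-09T22:38:28Z</datestamp></header>"
        + "<metadata><lom xmlns=\"http://ltsc.ieee.org/xsd/LOM\"><general>"
        + "<identifier><entry>sample1</entry></identifier></general></lom></metadata>"
        + "</record><resumptionToken cursor=\"10\">1181544538265</resumptionToken>"
        + "</ListRecords></OAI-PMH>";

    /** Parses the XML with namespaces enabled; parse failures become unchecked exceptions. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a parseable OAI response", e);
        }
    }

    /** Collects header identifiers; the namespace filter skips identifier elements inside metadata. */
    static List<String> identifiers(Document doc) {
        NodeList nodes = doc.getElementsByTagNameNS(OAI_NS, "identifier");
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            ids.add(nodes.item(i).getTextContent().trim());
        }
        return ids;
    }

    /** Returns the resumption token, or null when it is absent or empty (list complete). */
    static String resumptionToken(Document doc) {
        NodeList nodes = doc.getElementsByTagNameNS(OAI_NS, "resumptionToken");
        if (nodes.getLength() == 0) return null;
        String token = nodes.item(0).getTextContent().trim();
        return token.isEmpty() ? null : token;
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        System.out.println(identifiers(doc));
        System.out.println(resumptionToken(doc));
    }
}
```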

7
OAICat - a Java implementation
  • OAICat home:
    http://www.oclc.org/research/software/oai/cat.htm
  • Takes care of
    • web service details
    • OAI XML specification
  • The implementer has to provide three classes:
    • RepositoryOAICatalog
    • RepositoryRecordFactory
    • Repository2oai_dc (lom, ...) - usually more than one

8
A sample implementation
  • (Source code and libs in http://fire.eun.org/Iztok/OAILREApp.zip)
  • Create a new web module
  • Add servlet oaiHandler to web.xml:
      <servlet>
        <servlet-name>LreOAIHandler</servlet-name>
        <servlet-class>ORG.oclc.oai.server.OAIHandler</servlet-class>
        <load-on-startup>5</load-on-startup>
      </servlet>
      <servlet-mapping>
        <servlet-name>LreOAIHandler</servlet-name>
        <url-pattern>/oaiHandler</url-pattern>
      </servlet-mapping>

9
(cont)
  • Define the properties file location:
      <context-param>
        <param-name>properties</param-name>
        <param-value>oaicat.properties</param-value>
      </context-param>
  • Welcome file for testing:
      <welcome-file-list>
        <welcome-file>testoai/index.html</welcome-file>
      </welcome-file-list>

10
Sample record
  • A record with basic fields: id, url, title, descr and date
  • SampleOAICatalog contains an array with 3 sample records
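
A rough idea of what such a record and its in-memory catalog could look like is sketched below. This is not the source shipped in OAILREApp.zip: the field types, the `SampleRecords` holder, the `findById` helper and all record values are placeholders invented for illustration.

```java
import java.time.Instant;

/** Hypothetical shape of the sample record: id, url, title, descr and date. */
final class SampleRecord {
    final String id;
    final String url;
    final String title;
    final String descr;
    final Instant date;

    SampleRecord(String id, String url, String title, String descr, Instant date) {
        this.id = id;
        this.url = url;
        this.title = title;
        this.descr = descr;
        this.date = date;
    }
}

/** Stand-in for the array of three records held by the catalog (placeholder data). */
final class SampleRecords {
    static final SampleRecord[] RECORDS = {
        new SampleRecord("001", "http://example.org/objects/001", "First sample",
                "Placeholder description 1", Instant.parse("2007-06-09T22:38:28Z")),
        new SampleRecord("002", "http://example.org/objects/002", "Second sample",
                "Placeholder description 2", Instant.parse("2007-06-10T08:00:00Z")),
        new SampleRecord("003", "http://example.org/objects/003", "Third sample",
                "Placeholder description 3", Instant.parse("2007-06-11T06:48:58Z")),
    };

    /** Linear lookup by local id; returns null when no record matches. */
    static SampleRecord findById(String id) {
        for (SampleRecord record : RECORDS) {
            if (record.id.equals(id)) return record;
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(findById("002").title);
    }
}
```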

11
SampleOAICatalog.listIdentifiers
  • Parameters:
    • from: date to harvest from (String in ISO 8601 format)
      • date or datetime, depending on the repository's granularity
    • to: date to harvest to
    • set: a set name; list only records from this set (if null, list all records)
      • set names classify objects in natural groups
      • every record may belong to multiple sets (or none)
    • metadataPrefix: list only records that support this format (sample formats: oai_dc, oai_lom, ...)
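
The date and set parameters can be applied as a simple filter over the records. The sketch below is one possible reading, not OAICat code: `Item` is a stand-in record type, the upper bound is called `until` as in the OAI-PMH argument of that name, a day-granularity bound is taken to cover the whole UTC day, and filtering by metadataPrefix is left out.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Selects records by date range and set (illustrative; not part of OAICat). */
final class HarvestFilter {

    /** Minimal stand-in record: a datestamp plus the sets it belongs to. */
    static final class Item {
        final String id;
        final Instant datestamp;
        final Set<String> sets;

        Item(String id, String datestamp, Set<String> sets) {
            this.id = id;
            this.datestamp = Instant.parse(datestamp);
            this.sets = sets;
        }
    }

    /** Inclusive lower bound; a date without time means the start of that day in UTC. */
    static Instant lowerBound(String from) {
        return from.length() == 10
                ? LocalDate.parse(from).atStartOfDay(ZoneOffset.UTC).toInstant()
                : Instant.parse(from);
    }

    /** Exclusive upper bound; a date without time covers the whole day in UTC. */
    static Instant upperBoundExclusive(String until) {
        return until.length() == 10
                ? LocalDate.parse(until).plusDays(1).atStartOfDay(ZoneOffset.UTC).toInstant()
                : Instant.parse(until).plusSeconds(1);
    }

    /** Returns the items inside [from, until] that belong to the set; a null set matches all. */
    static List<Item> select(List<Item> all, String from, String until, String set) {
        Instant lower = lowerBound(from);
        Instant upper = upperBoundExclusive(until);
        List<Item> result = new ArrayList<>();
        for (Item item : all) {
            boolean inRange = !item.datestamp.isBefore(lower) && item.datestamp.isBefore(upper);
            boolean inSet = set == null || item.sets.contains(set);
            if (inRange && inSet) result.add(item);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Item> all = List.of(
                new Item("a", "2007-06-09T22:38:28Z", Set.of("exercises")),
                new Item("b", "2007-06-10T08:00:00Z", Set.of()));
        for (Item item : select(all, "2007-06-09", "2007-06-10", "exercises")) {
            System.out.println(item.id);
        }
    }
}
```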

12
SampleOAICatalog.listIdentifiers
  • Must return a map with two fields:
    • headers: a String iterator of OAI headers
    • identifiers: a String iterator of OAI identifiers
  • Both are created by the call (rec is a SampleRecord):
      String[] header = getRecordFactory().createHeader(rec);
      headers.add(header[0]);
      identifiers.add(header[1]);
  • Create the result:
      Map<String, Object> listIdMap = new HashMap<String, Object>();
      listIdMap.put("headers", headers.iterator());
      listIdMap.put("identifiers", identifiers.iterator());
      return listIdMap;
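
Put together, the method body might look like the following self-contained sketch. The local `createHeader` is only a stand-in for `getRecordFactory().createHeader(rec)`; the header XML it emits, the repository identifier and the input map are assumptions for the example, and XML escaping is omitted.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Self-contained sketch of the listIdentifiers result map (not the OAICat source). */
final class ListIdentifiersSketch {

    /** Stand-in for the record factory call: returns {header XML, OAI identifier}. */
    static String[] createHeader(String localId, String datestamp) {
        String oaiId = "oai:oai.xyz-repository.com:" + localId;
        String xml = "<header><identifier>" + oaiId + "</identifier>"
                + "<datestamp>" + datestamp + "</datestamp></header>";
        return new String[] { xml, oaiId };
    }

    /** Builds the map with the "headers" and "identifiers" iterators. */
    static Map<String, Object> listIdentifiers(Map<String, String> datestampsByLocalId) {
        List<String> headers = new ArrayList<>();
        List<String> identifiers = new ArrayList<>();
        for (Map.Entry<String, String> rec : datestampsByLocalId.entrySet()) {
            String[] header = createHeader(rec.getKey(), rec.getValue());
            headers.add(header[0]);
            identifiers.add(header[1]);
        }
        Map<String, Object> listIdMap = new HashMap<>();
        listIdMap.put("headers", headers.iterator());
        listIdMap.put("identifiers", identifiers.iterator());
        return listIdMap;
    }

    public static void main(String[] args) {
        Map<String, String> records = new LinkedHashMap<>();
        records.put("exercises/112553", "2007-06-09T22:38:28Z");
        Iterator<?> ids = (Iterator<?>) listIdentifiers(records).get("identifiers");
        while (ids.hasNext()) System.out.println(ids.next());
    }
}
```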

13
getRecordFactory().createHeader(rec)
  • Creates the header by calling the methods in SampleRecordFactory:
    • String getOAIIdentifier(Object rec)
      returns the full OAI identifier, e.g. oai:oay.rep.com:id001
    • String getDatestamp(Object rec)
      returns the date in ISO 8601 format
    • Iterator<String> getSetSpecs(Object rec)
        ArrayList<String> list = new ArrayList<String>();
        list.add(...);
        return list.iterator();
    • Iterator<String> getAbouts(Object rec)
    • String fromOAIIdentifier(String id)
      helper method: converts the id to a local id
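
The two identifier methods are inverses of each other. A minimal sketch, assuming the global identifier is just a fixed repository prefix followed by the local id: the prefix value and the decision to reject foreign identifiers with an exception are choices made for this example, not documented OAICat behaviour.

```java
/** Converts between local ids and global OAI identifiers (illustrative sketch). */
final class OaiIdentifiers {
    // Assumed repository prefix; a real repository would use its own domain.
    static final String PREFIX = "oai:oai.xyz-repository.com:";

    /** Local id to global OAI identifier. */
    static String getOAIIdentifier(String localId) {
        return PREFIX + localId;
    }

    /** Global OAI identifier back to the local id; rejects identifiers of other repositories. */
    static String fromOAIIdentifier(String oaiId) {
        if (!oaiId.startsWith(PREFIX)) {
            throw new IllegalArgumentException("Not an identifier of this repository: " + oaiId);
        }
        return oaiId.substring(PREFIX.length());
    }

    public static void main(String[] args) {
        String global = getOAIIdentifier("exercises/112553");
        System.out.println(global);
        System.out.println(fromOAIIdentifier(global));
    }
}
```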

14
SampleOAICatalog.listSets
  • takes no parameters, returns the list of all sets
    in this repository
  • each ListIdentifiers or ListRecords query may
    contain a set name, limiting the results to just
    one set

15
SampleOAICatalog.getSchemaLocations
  • like GetRecord, but returns the Vector of all metadata schema locations the record supports
  • to obtain them, just call getRecordFactory().getSchemaLocations(rec)

16
SampleOAICatalog.getRecord
  • String getRecord(String id, String metadataPrefix)
  • find the record and convert it to an XML string (<record> element)
  • id is in global format; to get the local value call
    getRecordFactory().fromOAIIdentifier(id)
  • throw IdDoesNotExistException if the record is not found
  • to generate the XML use constructRecord:
    constructRecord(rec, metadataPrefix)

17
SampleOAICatalog.listRecords
  • just like ListIdentifiers, only it generates a list of XML <record> elements
  • return a map with one element:
      Map<String, Object> listRecMap = new HashMap<String, Object>();
      listRecMap.put("records", records.iterator());
      return listRecMap;

18
Crosswalks
  • Conversions of the native record type to XML, e.g. Sample2oai_lom or Sample2oai_dc
  • Only two methods per implementation:
    • boolean isAvailableFor(Object rec)
    • String createMetadata(Object rec)
        SampleRecord record = (SampleRecord) rec;
        return LOMFormat.writeStringWithSchema(record.toLOM());
  • throw CannotDisseminateFormatException if the metadata is not available in this format
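
For the mandatory Dublin Core format, a crosswalk can be as small as the sketch below. It does not extend the OAICat crosswalk base class: `Rec` is a stand-in record, `IllegalArgumentException` stands in for CannotDisseminateFormatException, and the schema location attributes are omitted. Only the two namespace URIs are the standard oai_dc and Dublin Core ones.

```java
/** Stand-alone sketch of an oai_dc crosswalk (does not use the OAICat classes). */
final class DcCrosswalkSketch {

    /** Minimal stand-in record with the two fields mapped here. */
    static final class Rec {
        final String title;
        final String url;

        Rec(String title, String url) {
            this.title = title;
            this.url = url;
        }
    }

    /** The format is available only for records of the expected type that have a title. */
    static boolean isAvailableFor(Object rec) {
        return rec instanceof Rec && ((Rec) rec).title != null;
    }

    /** Serializes the record as an oai_dc element with title and identifier. */
    static String createMetadata(Object rec) {
        if (!isAvailableFor(rec)) {
            // Stand-in for CannotDisseminateFormatException.
            throw new IllegalArgumentException("oai_dc is not available for this record");
        }
        Rec r = (Rec) rec;
        return "<oai_dc:dc xmlns:oai_dc=\"http://www.openarchives.org/OAI/2.0/oai_dc/\""
                + " xmlns:dc=\"http://purl.org/dc/elements/1.1/\">"
                + "<dc:title>" + escape(r.title) + "</dc:title>"
                + "<dc:identifier>" + escape(r.url) + "</dc:identifier>"
                + "</oai_dc:dc>";
    }

    /** Escapes the characters that are not allowed in XML text content. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        System.out.println(createMetadata(new Rec("Fractions & decimals", "http://example.org/1")));
    }
}
```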

19
SampleRecord.toLOM
  • uses the LOM-j lib to quickly hack together LOM:
    http://sourceforge.net/projects/lom-j/
  • automatic serialization/deserialization of LOM and DC XML formats
  • Example:
      lom.newGeneral().newIdentifier(0).newCatalog().setString("lre");
      lom.newGeneral().newIdentifier(0).newEntry().setString("sample" + id);
      lom.newTechnical().newLocation(-1).setString(url);
      lom.newGeneral().newTitle().newString(0).newLanguage().setValue("en");
      lom.newGeneral().newTitle().newString(0).setString(title);

20
Resumption
  • A repository usually has a fixed limit on the number of records returned in one call
  • if more are available, it returns a resumption token, which lets the harvester request the next batch
  • Implemented by the functions listIdentifiers(String resumptionToken) and listRecords(String resumptionToken)
  • see XYZOAICatalog for details
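
The paging idea can be sketched without any OAI library: the repository hands out a token that maps to the offset of the next batch, and the harvester keeps asking until no token comes back. Everything below is an assumption for illustration, including the page size, the counter-based tokens and the "resumptionToken" map key, which is not necessarily the key OAICat uses.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Illustrative paging with resumption tokens (not the OAICat implementation). */
final class ResumptionSketch {
    static final int PAGE_SIZE = 10; // assumed per-call limit

    private final List<String> identifiers;
    private final Map<String, Integer> offsets = new HashMap<>(); // token -> next offset
    private long tokenCounter = 0;

    ResumptionSketch(List<String> identifiers) {
        this.identifiers = new ArrayList<>(identifiers);
    }

    /** First call: starts at the beginning of the list. */
    Map<String, Object> listIdentifiers() {
        return page(0);
    }

    /** Follow-up call: continues where the batch that issued the token stopped. */
    Map<String, Object> listIdentifiers(String resumptionToken) {
        Integer offset = offsets.get(resumptionToken);
        if (offset == null) {
            throw new IllegalArgumentException("Unknown resumption token: " + resumptionToken);
        }
        return page(offset);
    }

    /** Returns one batch and, if more items remain, a token for the next one. */
    private Map<String, Object> page(int offset) {
        int end = Math.min(offset + PAGE_SIZE, identifiers.size());
        Map<String, Object> result = new HashMap<>();
        result.put("identifiers", new ArrayList<>(identifiers.subList(offset, end)).iterator());
        if (end < identifiers.size()) {
            String token = Long.toString(++tokenCounter);
            offsets.put(token, end);
            result.put("resumptionToken", token);
        }
        return result;
    }

    /** Harvester side: follows tokens until the repository stops returning one. */
    static List<String> harvestAll(ResumptionSketch repository) {
        List<String> harvested = new ArrayList<>();
        Map<String, Object> page = repository.listIdentifiers();
        while (true) {
            Iterator<?> it = (Iterator<?>) page.get("identifiers");
            while (it.hasNext()) harvested.add((String) it.next());
            String token = (String) page.get("resumptionToken");
            if (token == null) return harvested;
            page = repository.listIdentifiers(token);
        }
    }

    public static void main(String[] args) {
        List<String> ids = new ArrayList<>();
        for (int i = 1; i <= 25; i++) ids.add("oai:oai.xyz-repository.com:exercises/" + i);
        System.out.println(harvestAll(new ResumptionSketch(ids)).size());
    }
}
```

Tokens stay valid after use in this sketch, so a harvester can repeat a failed request; a real repository would also expire them, as the expirationDate attribute in the XML example suggests.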

21
References
  • http://www.openarchives.org/OAI/openarchivesprotocol.html
  • http://www.fmf.uni-lj.si/kavkler/
  • http://www.oclc.org/research/software/oai/cat.htm
  • http://www.cs.kuleuven.ac.be/hmdb/SqiOaiMelt
  • http://sourceforge.net/projects/lom-j/
  • SIO/Trubar OAI URL: http://sio.edus.si/LreTomcat/