Title: OAI-PMH and METS for automated export of items from DSpace
1. OAI-PMH and METS for automated export of items from DSpace
Stuart Lewis, University of Wales Aberystwyth
2. Contents
- The problem
- The bigger picture
- Solution overview
- How this worked with DSpace
- The finished solution
3. The problem
- The federal University of Wales
- National Library of Wales
- Thesis collection agreement
- Now moving towards electronic theses
4. The problem
- Harvest electronic theses from repositories across Wales
  - DSpace
  - GNU EPrints
- Import items into NLW repository
  - Fedora
5. Repository Bridge
- JISC funded
- 1 year project
- Collaboration between
  - University of Wales Aberystwyth
  - University of Wales Swansea
  - National Library of Wales
6. The bigger picture
- EThOS
  - Electronic Theses Online Service
  - UK Database of Theses
- They also want to import our theses
- Welsh consortium
7. The solution
- OAI-PMH
  - Open Archives Initiative
  - Protocol for Metadata Harvesting
- Harvest daily any new items
- Ingest in Fedora
- Available for harvesting by EThOS
8. The solution
- Decide which sets (collections) to harvest from
  - request?verb=ListSets
- Find new items in those sets
  - request?verb=ListIdentifiers&set=hdl_2160_20&from=2006-04-20

<set>
  <setSpec>hdl_2160_20</setSpec>
  <setName>Advanced Reasoning Group Theses</setName>
</set>

<header>
  <identifier>oai:cadair.aber.ac.uk:2160/64</identifier>
  <datestamp>2006-03-21T11:14:52Z</datestamp>
  <setSpec>hdl_2160_21</setSpec>
</header>
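The requests on this slide are plain HTTP GET URLs: a `verb` parameter plus verb-specific arguments. The following is a minimal sketch of how a harvester might assemble them. The class name `OaiRequests`, its method names, and the base URL `http://repository.example.org/oai/request` are hypothetical. The `metadataPrefix` argument does not appear on the slide; it is included here because OAI-PMH 2.0 requires it for ListIdentifiers and GetRecord.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Illustrative builder for OAI-PMH request URLs (names are hypothetical). */
final class OaiRequests {

    /** Joins a verb and its arguments into a query string on the base URL. */
    static String build(String baseUrl, String verb, Map<String, String> args) {
        StringJoiner query = new StringJoiner("&", baseUrl + "?", "");
        query.add("verb=" + encode(verb));
        for (Map.Entry<String, String> arg : args.entrySet()) {
            query.add(encode(arg.getKey()) + "=" + encode(arg.getValue()));
        }
        return query.toString();
    }

    /** Lists the sets (collections) the repository exposes. */
    static String listSets(String baseUrl) {
        return build(baseUrl, "ListSets", new LinkedHashMap<>());
    }

    /** Lists identifiers in one set that changed on or after the given date. */
    static String listIdentifiers(String baseUrl, String metadataPrefix,
                                  String set, String from) {
        Map<String, String> args = new LinkedHashMap<>();
        args.put("metadataPrefix", metadataPrefix);
        args.put("set", set);
        args.put("from", from);
        return build(baseUrl, "ListIdentifiers", args);
    }

    /** Fetches a single record by its OAI identifier. */
    static String getRecord(String baseUrl, String metadataPrefix, String identifier) {
        Map<String, String> args = new LinkedHashMap<>();
        args.put("identifier", identifier);
        args.put("metadataPrefix", metadataPrefix);
        return build(baseUrl, "GetRecord", args);
    }

    /** Percent-encodes reserved characters such as ':' and '/'. */
    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String base = "http://repository.example.org/oai/request"; // hypothetical endpoint
        System.out.println(listSets(base));
        System.out.println(listIdentifiers(base, "oai_dc", "hdl_2160_20", "2006-04-20"));
        System.out.println(getRecord(base, "oai_dc", "oai:cadair.aber.ac.uk:2160/64"));
    }
}
```

The identifier is percent-encoded because it contains `:` and `/`, which are reserved characters in a query string.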
9. The solution
- Harvest each item
  - request?verb=GetRecord&identifier=oai:cadair.aber.ac.uk:2160/64
- Ingest into Fedora
  - Parse metadata
  - Download items (files)
  - Ingest into Fedora
- Data available for EThOS
  - Ready to be queried via OAI-PMH
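The "Parse metadata" step starts with reading the record header out of the OAI-PMH response. The sketch below does this with the JDK's built-in DOM parser. The class name `OaiHeaderParser` and the `SAMPLE` response are illustrative; the sample is a hand-written fragment using the header values shown earlier in the deck, not real repository output.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Illustrative reader for OAI-PMH record headers (names are hypothetical). */
final class OaiHeaderParser {

    /** Namespace of the OAI-PMH 2.0 response elements. */
    static final String OAI_NS = "http://www.openarchives.org/OAI/2.0/";

    /** Hand-written sample response, trimmed to the header only. */
    static final String SAMPLE =
          "<OAI-PMH xmlns=\"http://www.openarchives.org/OAI/2.0/\">"
        + "<GetRecord><record><header>"
        + "<identifier>oai:cadair.aber.ac.uk:2160/64</identifier>"
        + "<datestamp>2006-03-21T11:14:52Z</datestamp>"
        + "<setSpec>hdl_2160_21</setSpec>"
        + "</header></record></GetRecord>"
        + "</OAI-PMH>";

    /** Returns the text of the first OAI-PMH element with this local name, or null. */
    static String firstText(String xml, String localName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for namespace-based lookup
            Document doc = factory.newDocumentBuilder()
                                  .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagNameNS(OAI_NS, localName);
            return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse OAI-PMH response", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstText(SAMPLE, "identifier"));
        System.out.println(firstText(SAMPLE, "datestamp"));
        System.out.println(firstText(SAMPLE, "setSpec"));
    }
}
```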
13. Which metadata format?
- EThOS defines its own metadata set
  - UKETD (United Kingdom Electronic Thesis and Dissertation)
  - qualified Dublin Core
  - E.g. <dc:creator>Bell, Jonathan</dc:creator>
- Additional schema
- DSpace crosswalk file
- metadataPrefix=uketd_dc

<uketd_dc:uketddc
    xmlns:uketd_dc="http://naca.central.cranfield.ac.uk/ethos-oai/2.0/"
    xmlns:uketdterms="http://naca.central.cranfield.ac.uk/ethos-oai/terms/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://naca.central.cranfield.ac.uk/ethos-oai/2.0/ http://naca.central.cranfield.ac.uk/ethos-oai/2.0/uketd_dc.xsd">
14. Which metadata format?
- Fedora (and the NLW)
  - Prefer METS (Metadata Encoding and Transmission Standard)
  - and MODS (Metadata Object Description Schema)

<mods:name>
  <mods:role>
    <mods:roleTerm type="text">author</mods:roleTerm>
  </mods:role>
  <mods:namePart>Bell, Jonathan</mods:namePart>
</mods:name>
15. Which metadata format?
- DSpace support for METS and MODS
  - Patch for DSpace version 1.3.2
  - Built into DSpace version 1.4
- metadataPrefix=mets
- dc2mods.cfg

contributor.author = <mods:name><mods:role><mods:roleTerm type="text">author</mods:roleTerm></mods:role><mods:namePart>%s</mods:namePart></mods:name>
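The dc2mods.cfg entry maps a Dublin Core field to a MODS fragment with a slot for the field's value. The sketch below shows how such a template could be filled in, assuming the value placeholder in the mapping is `%s`. The class `Dc2ModsTemplate` and its methods are hypothetical and are not DSpace's own crosswalk code; the XML escaping is an addition of this sketch.

```java
/** Illustrative template filler for a dc2mods-style mapping (names are hypothetical). */
final class Dc2ModsTemplate {

    /** MODS template for contributor.author; "%s" is assumed to mark the value slot. */
    static final String AUTHOR_TEMPLATE =
          "<mods:name><mods:role>"
        + "<mods:roleTerm type=\"text\">author</mods:roleTerm>"
        + "</mods:role><mods:namePart>%s</mods:namePart></mods:name>";

    /** Escapes the characters that would otherwise break the surrounding XML. */
    static String escapeXml(String value) {
        return value.replace("&", "&amp;")
                    .replace("<", "&lt;")
                    .replace(">", "&gt;");
    }

    /** Substitutes the escaped metadata value for the placeholder. */
    static String fill(String template, String value) {
        return template.replace("%s", escapeXml(value));
    }

    public static void main(String[] args) {
        System.out.println(fill(AUTHOR_TEMPLATE, "Bell, Jonathan"));
    }
}
```

`String.replace` is used instead of `String.format` so that a literal `%` elsewhere in a template cannot cause a formatting error.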
16. But which one?
- metadataPrefix=uketd_dc
- Or
- metadataPrefix=mets
17. But which one?
- Combine the two!
- metadataPrefix=uketd_mets
18. Two dmdSecs
- METS holds metadata within Descriptive MetaData Sections
- Let's use two of them!
  - <dmdSec ID="DMD_hdl_2160_24_mods">
  - <dmdSec ID="DMD_hdl_2160_24_uketd">
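Each descriptive section needs its own ID so that other parts of the METS document can point at it. The sketch below derives IDs in the pattern shown on this slide from an item handle and joins several of them into a space-separated reference list, which is how a METS `DMDID` attribute refers to more than one section. The naming rule is inferred from the two example IDs only, and the class `MetsIds` is hypothetical.

```java
/** Illustrative helper for dmdSec IDs (naming rule inferred from the slide's examples). */
final class MetsIds {

    /** Builds an ID such as DMD_hdl_2160_24_mods from handle "2160/24" and scheme "mods". */
    static String dmdId(String handle, String scheme) {
        return "DMD_hdl_" + handle.replace('/', '_') + "_" + scheme;
    }

    /** Builds a space-separated list of dmdSec IDs, one per metadata scheme. */
    static String dmdIdRefs(String handle, String... schemes) {
        StringBuilder refs = new StringBuilder();
        for (String scheme : schemes) {
            if (refs.length() > 0) {
                refs.append(' ');
            }
            refs.append(dmdId(handle, scheme));
        }
        return refs.toString();
    }

    public static void main(String[] args) {
        System.out.println(dmdId("2160/24", "mods"));
        System.out.println(dmdIdRefs("2160/24", "uketd", "mods"));
    }
}
```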
19. Licence encoding
- METS holds rights within Administrative MetaData Sections
  - <amdSec>
- Use the licence bitstream text

Bundle[] bundles = item.getBundles();
for (int i = 0; i < bundles.length; i++)
{
    // Assume license will be in its own bundle
    Bitstream[] bitstreams = bundles[i].getBitstreams();
    // Skip empty bundles so that bitstreams[0] is always valid
    if (bitstreams.length > 0
            && bitstreams[0].getFormat().getID() == licenseFormat)
    {
        // Return the license
        return bitstreams[0].retrieve();
    }
}
20. Bitstreams (files)
- METS describes files within File Sections
  - <fileSec>
- All bitstreams except licence
- Includes filtered text

<fileSec>
  <fileGrp USE="ORIGINAL">
    <file ID="f2160_24_1" MIMETYPE="application/pdf" SIZE="1728984"
          CHECKSUM="eb5d0b9d5104212aed5f0cd76e99cf11" CHECKSUMTYPE="MD5"
          OWNERID="http://cadair.aber.ac.uk/dspace/bitstream/2160/24/1/Thesis.pdf"
          GROUPID="GROUP_f2160_24_1">
      <FLocat LOCTYPE="URL" xlink:type="simple"
              xlink:href="http://cadair.aber.ac.uk/dspace/bitstream/2160/24/1/Thesis.pdf" />
    </file>
  </fileGrp>
  <fileGrp USE="TEXT">
    <file ID="f2160_24_3" MIMETYPE="text/plain" SIZE="616933"
          CHECKSUM="f0bc8e3529302854ea9bedbac30ec0dd" CHECKSUMTYPE="MD5"
          OWNERID="http://cadair.aber.ac.uk/dspace/bitstream/2160/24/3/Thesis.pdf.txt"
          GROUPID="GROUP_f2160_24_1">
      <FLocat LOCTYPE="URL" xlink:type="simple"
              xlink:href="http://cadair.aber.ac.uk/dspace/bitstream/2160/24/3/Thesis.pdf.txt" />
    </file>
  </fileGrp>
</fileSec>
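Each file entry carries an MD5 checksum, so the importing side can confirm that a downloaded bitstream matches what the repository advertised. The sketch below shows one way to compute that value with the JDK's `MessageDigest`. The class `Md5Checksum` is hypothetical, and comparing the result against the CHECKSUM attribute is a suggested use rather than something the slides describe.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Illustrative MD5 helper for checking downloaded files (names are hypothetical). */
final class Md5Checksum {

    /** Returns the MD5 digest of the data as 32 lowercase hex characters. */
    static String md5Hex(byte[] data) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            StringBuilder hex = new StringBuilder(32);
            for (byte b : md5.digest(data)) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5
            throw new IllegalStateException(e);
        }
    }

    /** True when the data hashes to the checksum quoted in the METS file entry. */
    static boolean matches(byte[] data, String expectedChecksum) {
        return md5Hex(data).equalsIgnoreCase(expectedChecksum);
    }

    public static void main(String[] args) {
        byte[] sample = "abc".getBytes(StandardCharsets.UTF_8);
        System.out.println(md5Hex(sample));
    }
}
```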
21. File structure map
- METS supports file structure relationships

  <structMap>
    <div>
      <div></div>
    </div>
  </structMap>

- Useful for file hierarchy, e.g. sections and subsections, or pages of a book
- DSpace does not hold structural information
22. File structure map
- Can't do it alphabetically
  - Appendix A
  - Appendix B
  - Chapter 1
  - Chapter 2
  - Chapter 3
  - Chapter 4
23. File structure map
- Decided upon order of item upload
- Not perfect, but does the job!
- Future requirement for DSpace?

<structMap>
  <div DMDID="DMD_hdl_123456789_24_uketd DMD_hdl_2160_24_mods"
       ADMID="rights_123456789_24_mods TMD_hdl_2160_24">
    <fptr FILEID="f2160_24_1" />
    <fptr FILEID="f2160_24_3" />
  </div>
</structMap>
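The ordering problem is easy to reproduce: sorting file names alphabetically places the appendices ahead of the chapters, while sorting by upload position keeps the order in which the depositor added the files. The sketch below contrasts the two. `UploadedFile` and its `uploadPosition` field are hypothetical stand-ins and not DSpace classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

/** Illustrative comparison of alphabetical order and upload order. */
final class FileOrdering {

    /** Hypothetical stand-in for a stored file: a name plus its upload position. */
    static final class UploadedFile {
        final int uploadPosition;
        final String name;

        UploadedFile(int uploadPosition, String name) {
            this.uploadPosition = uploadPosition;
            this.name = name;
        }
    }

    /** File names sorted alphabetically. */
    static List<String> alphabetical(List<UploadedFile> files) {
        List<String> names = new ArrayList<>();
        for (UploadedFile file : files) {
            names.add(file.name);
        }
        Collections.sort(names);
        return names;
    }

    /** File names sorted by the position in which they were uploaded. */
    static List<String> byUploadOrder(List<UploadedFile> files) {
        List<UploadedFile> sorted = new ArrayList<>(files);
        sorted.sort(Comparator.comparingInt(file -> file.uploadPosition));
        List<String> names = new ArrayList<>();
        for (UploadedFile file : sorted) {
            names.add(file.name);
        }
        return names;
    }

    /** Sample item: two chapters uploaded first, then an appendix. */
    static List<UploadedFile> sample() {
        return Arrays.asList(
            new UploadedFile(3, "Appendix A"),
            new UploadedFile(1, "Chapter 1"),
            new UploadedFile(2, "Chapter 2"));
    }

    public static void main(String[] args) {
        System.out.println(alphabetical(sample()));
        System.out.println(byUploadOrder(sample()));
    }
}
```

Upload order only helps when the depositor adds files in reading order, which is why the slide calls it "not perfect".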
24. The final solution
- Fedora ingests two copies of the metadata
  - It uses MODS for itself
  - It provides qDC for EThOS
- MODS from
  - org.dspace.app.mets.METSExport
  - public static void writeMETS
- qDC from
  - uk.bl.ethos.UKETDDCCrosswalk.java
  - public String writeMetadataWithSchema
25. Final METS document
Header
GetRecord
Metadata
dmdSec MDTYPE="MODS"
dmdSec MDTYPE="OTHER" OTHERMDTYPE="UKETD_DC"
amdSec <mods:useAndReproduction>
fileSec
structMap
26. The end!
- Stuart Lewis
  - Stuart.Lewis_at_aber.ac.uk
- Repository Bridge
  - http://www.inf.aber.ac.uk/bridge/
- Any questions?