Title: DSpace Batch Ingest
1. DSpace Batch Ingest
- Prepared by Sarah Kim and Lorrie Dong
- Edited by Patricia Galloway
- INF 392K: Problems in the Permanent Retention of Electronic Records, Spring 2008
- School of Information, University of Texas at Austin
2. What is Batch Ingest?
- General DSpace submission process: single-item submission through a user-friendly web interface.
- What do you do if you have 5,000 items to submit?
- Batch ingest allows you to ingest as many items as you want, in groups.
- How?
  - Access DSpace through UNIX.
  - Only authorized DSpace administrators can access the School of Information DSpace file store and database directly through UNIX. (Contact Shane Williams and Sam Burns.)
  - Use Linux command lines.
  - Please follow the DSpace batch ingest format rules. See the "DSpace simple archive format" section in DSpace System Documentation: Application Layer, http://www.dspace.org/
3. Batch Ingest Process Overview
[Diagram: each collection (Collection 1, Collection 2) holds item folders such as Item_001; each item folder holds contents, dublin_core.xml, file1, and file2.]
- Off-line: prepare item folders and a command line for each collection.
- On-line: upload the item folders into DSpace via Linux, then run the Linux command line for each collection.
- Done.
4. Step 1: Creating a structure in DSpace and your file-preparation workspace
- Structure the community and collections in DSpace. You need collection IDs for the command lines; DSpace generates them when a collection is created.
- On the computer where you are preparing materials for ingest, organize your files by giving each collection a specifically named directory (or folder).
- Each item in a collection will be represented as a subdirectory (or folder) in which a specific set of files should be placed (see next slide).
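The workspace layout described above can be sketched in Python. This is only an illustration, not part of the DSpace tooling: the function name `make_collection_workspace` is hypothetical, the collection directory name is borrowed from the example command line later in this deck, and the `item_001`, `item_002`, ... naming follows the example item folder slide.

```python
from pathlib import Path
import tempfile


def make_collection_workspace(root, collection_name, item_count):
    """Create <root>/<collection_name>/item_001 ... item_NNN.

    Returns the list of item folder paths that were created.
    (Hypothetical helper; DSpace itself does not provide it.)
    """
    collection_dir = Path(root) / collection_name
    item_dirs = []
    for n in range(1, item_count + 1):
        # Zero-padded names keep the folders sorted: item_001, item_002, ...
        item_dir = collection_dir / f"item_{n:03d}"
        item_dir.mkdir(parents=True, exist_ok=True)
        item_dirs.append(item_dir)
    return item_dirs


if __name__ == "__main__":
    # Build a throwaway workspace and list the folders it contains.
    with tempfile.TemporaryDirectory() as tmp:
        for d in make_collection_workspace(tmp, "Wesker-2007_April", 3):
            print(d.relative_to(tmp))
```

Each collection gets its own call, so the item folder names can repeat from one collection directory to the next.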
5. Step 2: Prepare item folders
- Create and organize item folders for each collection.
- Each item (folder) should contain:
  - contents: a text file listing the names of the files included in the item folder. Do not list the dublin_core.xml and contents file names themselves. Although this is a plain text file that you will make with Notepad or another text editor, it must not have the .txt extension; if your editor adds one, rename the file so that it has no extension.
  - dublin_core.xml: a Qualified Dublin Core metadata file that pertains to the entire item.
  - file1: an original bitstream (there can be several in an item; an item can contain, for example, all the bitstreams that make up a website).
  - file2: any access copy or copies that may be needed to provide access.
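Writing the contents file by hand is error-prone for large batches, so it may help to generate it. The sketch below assumes one file name per line, skips dublin_core.xml and contents as the slide requires, and also skips hidden files (names starting with a dot), which is an extra assumption. The function name `write_contents_file` is hypothetical.

```python
from pathlib import Path
import tempfile

# Names that must never be listed in the contents file.
EXCLUDED = {"contents", "dublin_core.xml"}


def write_contents_file(item_dir):
    """Write <item_dir>/contents (no extension) and return the listed names."""
    item_dir = Path(item_dir)
    names = sorted(
        p.name
        for p in item_dir.iterdir()
        if p.is_file() and p.name not in EXCLUDED and not p.name.startswith(".")
    )
    # One file name per line; the file itself has no .txt extension.
    (item_dir / "contents").write_text(
        "".join(f"{name}\n" for name in names), encoding="utf-8"
    )
    return names


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        item = Path(tmp) / "item_001"
        item.mkdir()
        for name in ("file1.pdf", "file2.jpg", "dublin_core.xml"):
            (item / name).write_text("placeholder", encoding="utf-8")
        print(write_contents_file(item))
```

Running it a second time on the same folder gives the same result, because the existing contents file is excluded from the listing.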
6. Example item folder
Item folders in EACH collection should be named using the same names: item_001, item_002, item_003, and so forth.
[Image: example contents file]
7. Example dublin_core.xml
  <dublin_core>
    <dcvalue element="title" qualifier="none">DSpace Batch Ingest</dcvalue>
    <dcvalue element="contributor" qualifier="author">Kim, Sarah</dcvalue>
    <dcvalue element="description" qualifier="abstract">This is a Power Point about how to prepare items for DSpace batch ingest created for Dr. Galloway's class, Problems in Permanent Retention of Electronic Records, Spring 2008.</dcvalue>
    <dcvalue element="subject" qualifier="none">Digital Preservation</dcvalue>
    <dcvalue element="date" qualifier="created">2008-03-21</dcvalue>
    <dcvalue element="language" qualifier="iso">en_US</dcvalue>
  </dublin_core>
Attention! All letters within < > should be LOWER case.
dublin_core.xml can be created and edited with XML editors or with Notepad.
Reference: see "Dublin Core Elements with Qualifiers supported by DSpace", http://www.dspace.org/index.php?option=com_content&task=view&id=141
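For many items, dublin_core.xml can also be produced by a script rather than typed in Notepad. The sketch below uses Python's standard `xml.etree.ElementTree`; the function name `build_dublin_core` and the (element, qualifier, value) tuple layout are assumptions made for this illustration. It lower-cases the element and qualifier names, in line with the lower-case rule above, and lets the library escape symbols such as `&` in the values.

```python
import xml.etree.ElementTree as ET


def build_dublin_core(fields):
    """Return dublin_core.xml text for a list of (element, qualifier, value)."""
    root = ET.Element("dublin_core")
    for element, qualifier, value in fields:
        dcvalue = ET.SubElement(
            root,
            "dcvalue",
            # Everything inside the tag is forced to lower case.
            {"element": element.lower(), "qualifier": qualifier.lower()},
        )
        # The value keeps its case; ElementTree escapes &, < and >.
        dcvalue.text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Values taken from the example slide above.
    print(build_dublin_core([
        ("title", "none", "DSpace Batch Ingest"),
        ("contributor", "author", "Kim, Sarah"),
        ("subject", "none", "Digital Preservation"),
        ("date", "created", "2008-03-21"),
        ("language", "iso", "en_US"),
    ]))
```

The returned text can be saved as dublin_core.xml inside the item folder. The output is a single line without indentation, which is still well-formed XML.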
8. Step 3: Prepare Linux command lines
- General format of a single command line:
  /opt/dspace/bin/dsrun org.dspace.app.itemimport.ItemImport --add --eperson=<eperson> --collection=<collection> --source=<source> --mapfile=<name of mapfile>
- Example:
  /opt/dspace/bin/dsrun org.dspace.app.itemimport.ItemImport --add --eperson=srhkim_at_gmail.com --collection=2081/2254 --source=Wesker-2007_April --mapfile=20070406.ingest.map
- In the example, --eperson is the e-person's e-mail address, --collection is the collection ID, and --source is the source location.
- Each collection must have its own command line with a unique collection ID.
- There is no particular rule for naming the source and map files. However, DSpace administrators usually use <date>.ingest.map for the map file name. The map file can be used to remove the materials added through the batch ingest if something goes wrong. The source location should be the name of the collection directory that contains the item folders.
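Because every collection needs its own command line, it can be convenient to assemble the lines from their parts. The sketch below only builds the text of a command; it does not run anything. The helper name `build_ingest_command` is hypothetical, and the dsrun path, class name, and options are the ones shown on this slide (plus the --test option described under Step 4).

```python
import shlex

DSRUN = "/opt/dspace/bin/dsrun"
IMPORT_CLASS = "org.dspace.app.itemimport.ItemImport"


def build_ingest_command(eperson, collection_id, source_dir, mapfile, test=False):
    """Return one batch-ingest command line as a shell-safe string."""
    args = [
        DSRUN,
        IMPORT_CLASS,
        "--add",
        f"--eperson={eperson}",
        f"--collection={collection_id}",
        f"--source={source_dir}",
        f"--mapfile={mapfile}",
    ]
    if test:
        # A test ingest adds --test at the end of the command line.
        args.append("--test")
    # Quote each argument so names containing spaces survive the shell.
    return " ".join(shlex.quote(a) for a in args)


if __name__ == "__main__":
    print(build_ingest_command(
        "srhkim_at_gmail.com", "2081/2254",
        "Wesker-2007_April", "20070406.ingest.map",
    ))
```

With the values from the example, the function reproduces the example command line exactly; passing `test=True` gives the matching test command.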
9. Step 4: Conduct Batch Ingest
1. Set up an appointment with an authorized iSchool DSpace administrator, Shane Williams or Sam Burns.
2. If you have not been working on an iSchool server for file preparation, create a source directory for the collection and upload the items to an iSchool server workspace.
3. Conduct a test with a small number of item folders by running a test command line. (For a test ingest, add --test at the end of each command line.)
4. Fix errors if there are any. (The DSpace administrator will inform you of detected errors. Unqualified DC elements, capitalization in DC elements, and unrecognizable symbols can cause errors.)
5. Run the prepared command lines for the actual batch ingest. (During the actual ingest, DSpace may reject individual items if they have errors. If any are rejected, the ingest process can be stopped, the errors can be fixed, and the process can be resumed; you don't have to start over again.)
6. Quality assurance: after ingesting, visit the collection in DSpace to ensure the ingest completed successfully.
- Steps 2, 3, and 5 in this list need to be conducted by the iSchool DSpace administrator.
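Some of the errors mentioned above can be caught before the appointment with the administrator. The sketch below is a rough pre-flight check, not a replacement for the --test run: it assumes the contents file holds one file name per line, and it only looks for missing files, XML that does not parse, and upper-case letters inside tags. The function name `check_item_folder` is hypothetical.

```python
from pathlib import Path
import tempfile
import xml.etree.ElementTree as ET


def check_item_folder(item_dir):
    """Return a list of problems found in one item folder (empty if none)."""
    item_dir = Path(item_dir)
    problems = []

    contents = item_dir / "contents"
    if not contents.is_file():
        problems.append("missing contents file")
    else:
        for line in contents.read_text(encoding="utf-8").splitlines():
            name = line.strip()
            if name and not (item_dir / name).is_file():
                problems.append(f"listed file not found: {name}")

    dc = item_dir / "dublin_core.xml"
    if not dc.is_file():
        problems.append("missing dublin_core.xml")
        return problems
    try:
        root = ET.parse(dc).getroot()
    except ET.ParseError as exc:
        problems.append(f"dublin_core.xml is not well-formed: {exc}")
        return problems
    for node in root.iter():
        # Tag name, attribute names, and element/qualifier values must be lower case.
        tokens = [node.tag, *node.attrib.keys(),
                  node.get("element", ""), node.get("qualifier", "")]
        if any(t != t.lower() for t in tokens):
            problems.append(f"upper-case letters inside <{node.tag}> tag")
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        item = Path(tmp) / "item_001"
        item.mkdir()
        (item / "contents").write_text("file1.pdf\n", encoding="utf-8")
        print(check_item_folder(item))  # reports the problems it finds
```

An empty list means only that these particular checks passed; DSpace may still reject an item for other reasons, such as an unsupported element and qualifier combination.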
10. Questions or assistance
- Lorrie Dong: lorrie.d_at_gmail.com
- Sarah Kim: srhkim_at_gmail.com
- Batch Ingest Process Guide:
- https://pacer.ischool.utexas.edu/handle/2081/9226
- https://pacer.ischool.utexas.edu/handle/2081/8870