Title: DSpace Batch Ingest


1
DSpace Batch Ingest
  • Prepared by Sarah Kim and Lorrie Dong
  • Edited by Patricia Galloway
  • INF 392K Problems in the Permanent Retention of
    Electronic Records
  • Spring 2008
  • School of Information, University of Texas at
    Austin

2
What is Batch Ingest?
  • General DSpace submission process: single-item
    submission through a user-friendly web interface
  • What do you do if you have 5,000 items to submit?
  • Batch ingest allows you to ingest as many
    items as you want in groups
  • How?
  • Access DSpace through UNIX
  • Only authorized DSpace administrators can
    access the School of Information DSpace file
    store and database directly through UNIX.
    (Contact Shane Williams and Sam Burns.)
  • Use Linux command lines
  • Please follow the DSpace batch ingest format
    rules. See the "DSpace simple archive format"
    section in DSpace System Documentation:
    Application Layer, http://www.dspace.org/

3
Batch Ingest Process Overview
  • Off-line: Prepare item folders and a command
    line for each collection
  • Each collection (Collection 1, Collection 2)
    holds item folders such as Item_001, each
    containing contents, dublin_core.xml, file1,
    and file2
  • On-line: Upload item folders into DSpace via
    Linux
  • On-line: Run the Linux command line for each
    collection
  • Done
4
Step 1: Creating a structure in DSpace and your
file-preparation workspace
  • Structure the community and collections in
    DSpace. You need collection IDs for the command
    lines; DSpace generates an ID when a
    collection is created.
  • On the computer where you are preparing materials
    for ingest, organize your files by giving each
    collection a specifically named directory (or
    folder).
  • Each item in a collection will be represented as
    a subdirectory (or folder) in which a specific
    set of files should be placed (see next slide).

5
Step 2: Prepare item folders
  • Create and organize item folders for each
    collection.
  • Each item (folder) should contain:
  • contents: a text file listing the names of the
    files included in the item folder. It does not
    list the dublin_core.xml and contents file
    names. Although it is a plain-text file that you
    will make with Notepad or another text editor,
    it must not have the .txt extension, so if your
    editor adds one, rename the file to have no
    extension.
  • dublin_core.xml: a Qualified Dublin Core metadata
    file that pertains to the entire item
  • file1: the original bitstream (there can be
    several in an item; an item can contain, for
    example, all the bitstreams that make up a
    website)
  • file2: any access copy or copies that may be
    needed to provide access

6
Example item folder
Item folders in EACH collection should be
named using the same names: item_001, item_002,
item_003, and so forth.
Example contents file
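The contents file described in Step 2 can be generated rather than typed by hand. The following is a minimal Python sketch, not part of the original slides: the function name write_contents_file is hypothetical, and it assumes one file name per line with nothing else on the line.

```python
from pathlib import Path
import tempfile

# Names that must NOT be listed in the contents file (per Step 2).
EXCLUDED = {"contents", "dublin_core.xml"}


def write_contents_file(item_dir: Path) -> list:
    """Write an extension-less 'contents' file listing the item's files."""
    names = sorted(
        p.name for p in item_dir.iterdir()
        if p.is_file() and p.name not in EXCLUDED
    )
    # One file name per line; the file itself has no .txt extension.
    (item_dir / "contents").write_text("\n".join(names) + "\n", encoding="utf-8")
    return names


if __name__ == "__main__":
    # Build a throwaway item_001 folder to show the result.
    with tempfile.TemporaryDirectory() as tmp:
        item = Path(tmp) / "item_001"
        item.mkdir()
        for name in ("dublin_core.xml", "file1", "file2"):
            (item / name).write_text("", encoding="utf-8")
        print(write_contents_file(item))
```

Running the function on each item folder before upload avoids the common mistake of saving the list as contents.txt.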
7
Example dublin_core.xml
<dublin_core>
  <dcvalue element="title" qualifier="none">DSpace Batch Ingest</dcvalue>
  <dcvalue element="contributor" qualifier="author">Kim, Sarah</dcvalue>
  <dcvalue element="description" qualifier="abstract">This is a Power Point about how to prepare items for DSpace batch ingest created for Dr. Galloway's class, Problems in Permanent Retention of Electronic Records, Spring 2008.</dcvalue>
  <dcvalue element="subject" qualifier="none">Digital Preservation</dcvalue>
  <dcvalue element="date" qualifier="created">2008-03-21</dcvalue>
  <dcvalue element="language" qualifier="iso">en_US</dcvalue>
</dublin_core>
Attention! All letters within <> should be LOWER case.
dublin_core.xml can be created and edited with
XML editors or with Notepad.
Reference: See Dublin Core Elements with Qualifiers
supported by DSpace,
http://www.dspace.org/index.php?option=com_content&task=view&id=141
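The lower-case rule can be checked automatically before a file is handed to the administrator. The sketch below is illustrative only and uses Python's standard xml.etree.ElementTree parser; the function name find_case_errors is hypothetical, and the check covers tag names, attribute names, and attribute values, not the text between the tags.

```python
import xml.etree.ElementTree as ET


def find_case_errors(xml_text: str) -> list:
    """Return descriptions of tags, attribute names, or attribute values
    that contain upper-case letters in a dublin_core.xml document."""
    problems = []
    root = ET.fromstring(xml_text)
    for node in root.iter():
        if node.tag != node.tag.lower():
            problems.append(f"tag <{node.tag}>")
        for key, value in node.attrib.items():
            if key != key.lower() or value != value.lower():
                problems.append(f'attribute {key}="{value}" in <{node.tag}>')
    return problems


if __name__ == "__main__":
    sample = (
        "<dublin_core>"
        '<dcvalue element="Title" qualifier="none">DSpace Batch Ingest</dcvalue>'
        "</dublin_core>"
    )
    for problem in find_case_errors(sample):
        print("upper case found in", problem)
```

Element text such as en_US is deliberately left alone, because the rule on the slide applies only to what appears inside the angle brackets.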
8
Step 3: Prepare Linux command lines
  • General format of a single command line:
  • /opt/dspace/bin/dsrun
    org.dspace.app.itemimport.ItemImport --add
    --eperson=eperson --collection=collection
    --source=source --mapfile=name of mapfile
  • Example:
  • /opt/dspace/bin/dsrun
    org.dspace.app.itemimport.ItemImport --add
    --eperson=srhkim@gmail.com
    --collection=2081/2254 --source=Wesker-2007_April
    --mapfile=20070406.ingest.map
  • In the example, --eperson takes the E-person's
    e-mail address, --collection takes the collection
    ID, and --source takes the source location.
  • Each collection must have its own command
    line with a unique collection ID.
  • There is no particular rule for naming the
    source and map files. However, DSpace
    administrators usually use date.ingest.map
    for the map file name. The map file can be used
    to remove the materials added through the batch
    ingest if something goes wrong. The source
    location should be the name of the collection
    directory that contains the item folders.
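When there are many collections, the command lines can be generated instead of typed. The following is a minimal Python sketch under stated assumptions: it uses only the options named on this slide and the --test option from Step 4, written in the --option=value form, while the function name build_ingest_command and the e-mail address admin@example.edu are placeholders invented for illustration.

```python
import shlex

DSRUN = "/opt/dspace/bin/dsrun"  # path as given on the slide
IMPORT_CLASS = "org.dspace.app.itemimport.ItemImport"


def build_ingest_command(eperson, collection, source, mapfile, test=False):
    """Return the ItemImport command line as a list of arguments."""
    args = [
        DSRUN, IMPORT_CLASS, "--add",
        f"--eperson={eperson}",
        f"--collection={collection}",
        f"--source={source}",
        f"--mapfile={mapfile}",
    ]
    if test:
        args.append("--test")  # test ingest, as described in Step 4
    return args


if __name__ == "__main__":
    # Hypothetical batch: one (collection ID, source directory, map file)
    # entry per collection, because each collection needs its own command.
    batch = [("2081/2254", "Wesker-2007_April", "20070406.ingest.map")]
    for collection, source, mapfile in batch:
        cmd = build_ingest_command(
            "admin@example.edu", collection, source, mapfile, test=True
        )
        print(shlex.join(cmd))
```

Printing the commands rather than executing them fits the workflow here, since only the DSpace administrator runs them on the server.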
9
Step 4: Conduct Batch Ingest
  • Set up an appointment with an authorized iSchool
    DSpace administrator, Shane Williams or Sam
    Burns.
  • If you have not been working on an iSchool server
    for file preparation, create a source directory
    for the collection and upload the items to an
    iSchool server workspace.
  • Conduct a test with a small number of item
    folders by running a test command line. (For a
    test ingest, add --test at the end of each
    command line.)
  • Fix errors if there are any. (The DSpace
    administrator will inform you of detected errors.
    Unqualified DC elements, capitalization in DC
    elements, and unrecognizable symbols can cause
    errors.)
  • Run the prepared command lines for the actual
    batch ingest. (During the actual ingest,
    DSpace may reject individual items if they have
    errors. If any are rejected, the ingest process
    can be stopped, the errors can be fixed, and the
    process can be resumed; you don't have to start
    over again.)
  • Quality assurance: after ingesting, visit the
    collection in DSpace to ensure the ingest
    completed successfully.
  • Steps 2, 3, and 5 in this list need to be
    conducted by the iSchool DSpace administrator.
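Some structural errors can be caught on your own computer before the test ingest. The sketch below is a hypothetical pre-flight check in Python, not a DSpace tool: the function names are invented, and it assumes each line of the contents file starts with a file name, with anything after a tab character ignored.

```python
from pathlib import Path


def check_item_folder(item_dir: Path) -> list:
    """Return a list of problems found in one item folder (empty if none)."""
    problems = []
    if not (item_dir / "dublin_core.xml").is_file():
        problems.append("missing dublin_core.xml")
    if (item_dir / "contents.txt").exists():
        problems.append("contents file must not have a .txt extension")
    contents = item_dir / "contents"
    if not contents.is_file():
        problems.append("missing contents file")
        return problems
    for line in contents.read_text(encoding="utf-8").splitlines():
        # Assumption: the file name is the first tab-separated field.
        name = line.split("\t")[0].strip()
        if name and not (item_dir / name).is_file():
            problems.append(f"listed file not found: {name}")
    return problems


def check_collection(source_dir: Path) -> dict:
    """Map each problematic item folder name to its list of problems."""
    report = {}
    for item_dir in sorted(p for p in source_dir.iterdir() if p.is_dir()):
        problems = check_item_folder(item_dir)
        if problems:
            report[item_dir.name] = problems
    return report
```

Calling check_collection(Path("Wesker-2007_April")) on a source directory returns an empty dictionary when every item folder passes; it does not replace the --test run, which exercises DSpace itself.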

10
Questions or assistance
  • Lorrie Dong: lorrie.d@gmail.com
  • Sarah Kim: srhkim@gmail.com
  • Batch Ingest Process Guide:
  • https://pacer.ischool.utexas.edu/handle/2081/9226
  • https://pacer.ischool.utexas.edu/handle/2081/8870