International Conference on Developing Digital Institutional Repositories: Experiences and Challenges, December 9-10, 2004, Hong Kong. DSpace in Action: Implementing the HKUST Institutional Repository System

1
International Conference on Developing Digital Institutional Repositories: Experiences and Challenges
December 9-10, 2004, Hong Kong
DSpace in Action: Implementing the HKUST Institutional Repository System
  • Presented by K.T. Lam
  • Head of Library Systems
  • The Hong Kong University of Science and
    Technology Library
  • lblkt@ust.hk

2
Table of Contents
  • From Idea to Creation
  • Why have an IR?
  • IR Software Selection
  • Major Features
  • Future Improvements
  • Conclusions

3
From Idea to Creation
  • The idea of establishing an IR originated from a
    staff development workshop at HKUST Library on 26
    November 2002, where Kimberly Douglas was invited
    to speak on E-prints, OAI and Institutional
    Repository.
  • After the workshop, a Task Force was formed to
    investigate the idea.
  • After two months of software evaluation, DSpace
    was selected to build the Repository.

4
From Idea to Creation (cont.)
  • The IR System at HKUST was brought to life in
    February 2003, with the following configuration
    and data content:
  • DSpace Version 1.01
  • Server with Intel Pentium III 733 MHz, 512 MB
    RAM, and Red Hat Linux Release 7.3
  • 105 Computer Science Technical Reports

5
From Idea to Creation (cont.)
  • Background / Experience Facilitating the Creation
  • HKUST Library is an early supporter of the Open
    Access concept: it joined SPARC (Scholarly
    Publishing and Academic Resources Coalition) in
    2001
  • Experience in conducting digital library
    projects with CJK capabilities:
  • Electronic Course Reserve - 1993
  • Digital University Archives and Electronic Theses
    - 1997
  • etc.

6
From Idea to Creation (cont.)
  • Why have an IR?
  • To create a permanent record of the scholarly
    output of HKUST
  • No access was available to some scholarly works
    published by our own faculty
  • Collections of working papers, technical reports
    and research reports were scattered around
  • Some of our scholarly works are in the public
    domain

7
From Idea to Creation (cont.)
  • Why have an IR? (cont.)
  • To make HKUST's scholarly output more globally
    and openly accessible
  • To support the international Open Access effort.
  • The mission of disseminating knowledge is
    only half complete if it is not widely and
    readily available to society - Berlin
    Declaration (http://www.zim.mpg.de/openaccess-berlin/berlindeclaration.html)

8
From Idea to Creation (cont.)
  • IR Software Selection
  • The July/August 2004 issue of Library Technology
    Reports provides a very detailed discussion of
    institutional repository systems and their functional
    requirements

9
From Idea to Creation (cont.)
  • IR Software Selection (cont.)
  • Decision at the first meeting of the IR Task
    Force in mid-December 2002:
  • follow Caltech's model, i.e. base our IR on
    open source software with an OAI-PMH interface.
  • We therefore evaluated two IR systems: EPrints
    and DSpace

10
From Idea to Creation (cont.)
  • IR Software Selection (cont.)
  • EPrints
  • Developed by University of Southampton
  • The very first open source IR software, available
    since 2000
  • Written in Perl, with MySQL database and Apache
    Web server

11
From Idea to Creation (cont.)
  • IR Software Selection (cont.)
  • DSpace
  • Jointly developed by MIT Libraries and
    Hewlett-Packard Company
  • Open source software
  • Released on SourceForge during our system
    evaluation period in late December 2002
  • Written in Java, with PostgreSQL database, Lucene
    search engine, and a Tomcat web servlet container

12
From Idea to Creation (cont.)
  • IR Software Selection (cont.)
  • We chose DSpace (almost two years ago) because:
  • DSpace's development drew on the experience
    gained from EPrints, the very first and most
    popular open source IR software at that time
  • EPrints did not fully support Unicode and
    is not Java- or servlet-based
  • Both EPrints and DSpace are open source software,
    fulfill our functional requirements, and follow
    state-of-the-art library standards

13
Current Configuration of IR at HKUST
  • As of 4 December 2004:
  • Home URL: http://repository.ust.hk/
  • IR software: DSpace Version 1.2
  • System software: Fedora Core 2 Linux, Tomcat 5.0,
    JDK 1.4.2
  • Server: Intel Pentium 4 2.4 GHz, 1 GB RAM
  • Content: 1,650 documents from 38 departments
  • Usage: documents were accessed 9,051 times
    in the previous month

14
(No Transcript)
15
Growth (May 2003 to September 2004)
16
Major Features
  • This section covers the following topics:
  • Data structure
  • Document submission form
  • Add item form
  • CJK support
  • OAI data provider
  • SRW/U interface
  • Google pilot project
  • Authentication and authorization

17
Major Features (cont.)
  • Data Structure
  • Document Types
  • Preprints, technical reports, working papers,
    conference papers, journal articles,
    presentations, book chapters, patents, theses,
    etc.
  • Document Formats
  • Mainly PDF files, with some PowerPoint files

18
Major Features (cont.)
  • Data Structure (cont.)
  • DSpace data model
  • Communities (and sub-communities)
  • Collections
  • Items
  • Metadata
  • Bundles of bitstreams
  • HKUST implementation: items are grouped by
    departments (i.e. communities) and then by
    document types (i.e. collections).
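
The data model above is a set of nested containers. The Python sketch below pictures it that way under simplifying assumptions: the class names echo the slide's terms, but the fields and the `count_items` helper are hypothetical illustrations, not DSpace's actual Java API. At HKUST each community is a department and each collection a document type.

```python
from dataclasses import dataclass, field


@dataclass
class Bitstream:
    name: str       # file name, e.g. "report.pdf"
    mime_type: str  # e.g. "application/pdf"


@dataclass
class Bundle:
    name: str  # e.g. "ORIGINAL" for the deposited files
    bitstreams: list[Bitstream] = field(default_factory=list)


@dataclass
class Item:
    # Dublin Core field name -> list of values, e.g. {"dc.title": ["..."]}
    metadata: dict[str, list[str]]
    bundles: list[Bundle] = field(default_factory=list)


@dataclass
class Collection:
    name: str  # at HKUST, a document type such as "Technical Reports"
    items: list[Item] = field(default_factory=list)


@dataclass
class Community:
    name: str  # at HKUST, a department
    collections: list[Collection] = field(default_factory=list)
    sub_communities: list["Community"] = field(default_factory=list)


def count_items(community: Community) -> int:
    """Count the items in a community, including its sub-communities."""
    own = sum(len(c.items) for c in community.collections)
    return own + sum(count_items(s) for s in community.sub_communities)
```

A recursive count like `count_items` is one way to compute the per-community item totals that the Future Improvements slides note DSpace does not display.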

19
(No Transcript)
20
(No Transcript)
21
Major Features (cont.)
  • Document Submission Form
  • Faculty are apathetic about self-submission
  • DSpace's submission and workflow functions are
    too lengthy and might scare off faculty
  • A simple and effortless submission form was
    needed as a quick medium for submitting documents

22
Major Features (cont.)
  • Document Submission Form (cont.)
  • Decided to develop our own form
  • Requires only very minimal data entry
  • Non-exclusive distribution license agreement
  • Library IR staff enhance the metadata of the
    submissions and then add them to DSpace
  • -------
  • Written in Perl
  • Submitted data are stored in the DSpace Simple Archive
    Format
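
In the Simple Archive Format, each item is its own directory holding a `dublin_core.xml` metadata file, a `contents` file listing the bitstream file names one per line, and the files themselves. The HKUST form was written in Perl; the Python sketch below only illustrates the layout, and the function name `write_saf_item` and its arguments are hypothetical.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path


def write_saf_item(item_dir: Path,
                   dc_fields: list[tuple[str, str, str]],
                   files: list[Path]) -> None:
    """Write one item directory in DSpace Simple Archive Format.

    dc_fields holds (element, qualifier, value) triples; the qualifier
    "none" marks an unqualified Dublin Core element.
    """
    item_dir.mkdir(parents=True, exist_ok=True)

    # dublin_core.xml: one <dcvalue> element per metadata value.
    root = ET.Element("dublin_core")
    for element, qualifier, value in dc_fields:
        dcvalue = ET.SubElement(root, "dcvalue",
                                element=element, qualifier=qualifier)
        dcvalue.text = value
    ET.ElementTree(root).write(item_dir / "dublin_core.xml",
                               encoding="utf-8", xml_declaration=True)

    # Copy each bitstream into the item directory and list it in "contents".
    for f in files:
        shutil.copy(f, item_dir / f.name)
    contents = "".join(f"{f.name}\n" for f in files)
    (item_dir / "contents").write_text(contents, encoding="utf-8")
```

A parent directory holding several such item folders is the input that DSpace's batch item importer expects.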

23
(No Transcript)
24
(No Transcript)
25
Major Features (cont.)
  • Add Item Form
  • Locally developed JSP application used by
    Library IR staff to add items to DSpace
  • Allows IR staff to:
  • Create a new item from scratch
  • Enhance the metadata of a faculty submission and
    then add the item to DSpace

26
(No Transcript)
27
(No Transcript)
28
Major Features (cont.)
  • CJK (Chinese, Japanese, Korean) Support
  • DSpace supports Unicode
  • Problem - the Lucene search engine is unable to
    search by CJK characters
  • Solved by replacing DSpace's tokenizer with a
    CJKTokenizer, but this has an interesting side effect
  • Problem - the URL of a query containing CJK characters
    is not properly encoded
  • Solved by setting Tomcat's URIEncoding="UTF-8" and
    adding URLEncode() to one line of the Java source
    code
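
The two CJK fixes above can be illustrated with a short Python sketch. Lucene's CJKTokenizer indexes runs of CJK characters as overlapping two-character tokens (bigrams), since CJK text has no spaces between words; `cjk_bigram_tokens` below imitates that idea with simplified character ranges and is not Lucene's code. `search_url` shows the UTF-8 percent-encoding a query URL needs; the `query` parameter name and example base URLs are assumptions. On the server side, the `URIEncoding="UTF-8"` attribute on the HTTP `<Connector>` in Tomcat's `server.xml` makes Tomcat decode those bytes as UTF-8 instead of its default ISO-8859-1.

```python
import re
from urllib.parse import quote

# Simplified CJK ranges: Hiragana/Katakana, CJK Unified Ideographs, Hangul.
CJK = "\u3040-\u30ff\u4e00-\u9fff\uac00-\ud7af"
TOKEN_RE = re.compile(f"[{CJK}]+|[A-Za-z0-9]+")
CJK_RUN_RE = re.compile(f"[{CJK}]+")


def cjk_bigram_tokens(text: str) -> list[str]:
    """Split text into lowercase Latin words and overlapping CJK bigrams."""
    tokens: list[str] = []
    for run in TOKEN_RE.findall(text):
        if CJK_RUN_RE.fullmatch(run):
            if len(run) == 1:
                tokens.append(run)  # a lone CJK character is kept as-is
            else:
                # "香港科技" -> "香港", "港科", "科技"
                tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
        else:
            tokens.append(run.lower())
    return tokens


def search_url(base: str, query: str) -> str:
    """Append a UTF-8 percent-encoded query, e.g. 香 -> %E9%A6%99."""
    return f"{base}?query={quote(query, safe='', encoding='utf-8')}"
```

Because documents and queries go through the same bigram tokenizer, a two-character query such as 科技 can match inside a longer phrase. In Java, the analogous encoding call is `java.net.URLEncoder.encode(query, "UTF-8")`.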

29
(No Transcript)
30
(No Transcript)
31
(No Transcript)
32
Major Features (cont.)
  • OAI Data Provider
  • DSpace is OAI-compliant
  • This means that OAI harvesters can easily collect
    the metadata (in Dublin Core format) from various
    IRs (including HKUST's) for their value-added
    indexing/searching services.
  • For example, OAIster
  • OAI path to the IR at HKUST:
  • http://repository.ust.hk/dspace-oai/request?
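
A harvester talks to the OAI path above using plain HTTP GET requests. The Python sketch below builds OAI-PMH 2.0 `ListRecords` requests for unqualified Dublin Core (`metadataPrefix=oai_dc`) and extracts titles and the paging `resumptionToken` from a response. It assumes the endpoint accepts standard OAI-PMH parameters appended to the path shown on the slide. The helper names are hypothetical, and no network call is made.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urlencode

# OAI path from the slide, without the trailing "?".
BASE_URL = "http://repository.ust.hk/dspace-oai/request"

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base: str, resumption_token: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request for unqualified Dublin Core."""
    if resumption_token:
        # Follow-up pages send only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return f"{base}?{urlencode(params)}"


def parse_titles(xml_text: str) -> tuple[list[str], Optional[str]]:
    """Return the dc:title values and the next resumptionToken (or None)."""
    root = ET.fromstring(xml_text)
    titles = [t.text or "" for t in root.iterfind(".//dc:title", NS)]
    token = root.find(".//oai:resumptionToken", NS)
    next_token = token.text if token is not None and token.text else None
    return titles, next_token
```

A harvester would fetch `list_records_url(BASE_URL)`, parse the page, and keep requesting with the returned token until it comes back empty.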

33
(No Transcript)
34
Major Features (cont.)
  • SRW/U Interface
  • SRW (Search/Retrieve Web Service) and SRU
    (Search/Retrieve via URL)
  • Retains the core functionality of Z39.50, but in the
    form of web services
  • This means search service providers can broadcast
    a search to various IRs and deliver the search
    results in their own GUI
  • SRW/U interface for the IR at HKUST:
  • Based on OCLC's SRW/U software
  • URL: http://repository.ust.hk/SRW/
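
An SRU request is simply a URL carrying a CQL query. The Python sketch below builds an SRU 1.1 `searchRetrieve` URL and reads the hit count from a response. The exact endpoint path beneath `http://repository.ust.hk/SRW/` is not given in the slides, so `endpoint` is left as a parameter; `recordSchema="dc"`, the example endpoints and the example query are assumptions.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

SRW_NS = {"srw": "http://www.loc.gov/zing/srw/"}


def sru_search_url(endpoint: str, cql_query: str,
                   start: int = 1, maximum: int = 10) -> str:
    """Build an SRU 1.1 searchRetrieve URL for a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,            # CQL, e.g. dc.title = "water"
        "startRecord": start,
        "maximumRecords": maximum,
        "recordSchema": "dc",  # assumption: the server offers a "dc" schema
    }
    return f"{endpoint}?{urlencode(params)}"


def number_of_records(response_xml: str) -> int:
    """Read the total hit count from a searchRetrieveResponse."""
    root = ET.fromstring(response_xml)
    node = root.find("srw:numberOfRecords", SRW_NS)
    return int(node.text) if node is not None and node.text else 0
```

A search service could send the same CQL query to several IRs' endpoints and merge the returned records into its own interface, as described above.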

35
The results of an SRW/U search, with XSLT
transformation
36
Major Features (cont.)
  • Google Pilot Project
  • Initiated in March 2004 by the DSpace user
    community under the leadership of MacKenzie Smith
  • To improve access to DSpace IRs from within
    Google
  • HKUST is a participant in this project
  • Result: a restrict=dspace search filter was created
    for use in the Google URL. For example:
  • http://www.google.com/search?restrict=dspace&q=collaboration

37
(No Transcript)
38
Major Features (cont.)
  • Authentication and Authorization
  • Authentication - by EPerson record created
    through user registration
  • Authorization - based on the policy settings on
    the object (community, collection, item,
    bitstream, etc.)
  • Authentication and authorization (AA) are not a big
    concern for our IR, because:
  • We do not use DSpace's submission and workflow
    functions
  • It is open to the public
  • AA is only required when our Library IR staff
    access DSpace's administration functions
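
Authorization that hangs off individual objects, as described above, can be sketched as a list of policies per object. The Python below is loosely modelled on DSpace's resource policies (an action granted to a group or an EPerson, with an Anonymous group standing for the public); the class names, fields and the administrator shortcut are simplifying assumptions, not DSpace's implementation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Policy:
    action: str                    # e.g. "READ", "WRITE", "ADD"
    group: Optional[str] = None    # e.g. "Anonymous"
    eperson: Optional[str] = None  # e.g. an e-mail address


@dataclass
class DSpaceObject:
    kind: str  # "community", "collection", "item", "bitstream", ...
    policies: list[Policy] = field(default_factory=list)


def is_authorized(obj: DSpaceObject, action: str,
                  eperson: Optional[str] = None,
                  groups: frozenset[str] = frozenset()) -> bool:
    """True if some policy on obj grants action to this user."""
    if "Administrator" in groups:
        return True  # simplifying assumption: administrators may do anything
    for p in obj.policies:
        if p.action != action:
            continue
        if p.group == "Anonymous" or (p.group is not None and p.group in groups):
            return True
        if eperson is not None and p.eperson == eperson:
            return True
    return False
```

For an open repository, an Anonymous READ policy on each bitstream is enough for the public; only administrative actions need a logged-in user.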

39
Major Features (cont.)
  • DSpace Authentication and Authorization (cont.)
  • We have, however, customized DSpace to allow for
    campus-wide LDAP authentication
  • Mainly for a different project that also uses
    DSpace (Digital University Archives)
  • Transparent creation of EPerson records on the fly
    during authentication
  • We have also investigated the feasibility of
    hooking DSpace up to Yale's Central Authentication
    Service
  • With little success, due to the cumbersome
    stage transfer from authentication to
    authorization
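
On-the-fly EPerson creation, as described above, follows a simple pattern: verify the credentials against the campus directory, then look up the EPerson and create one on the user's first login. Python's standard library has no LDAP client, so the sketch below takes a hypothetical `ldap_bind` callable in place of a real directory bind, and an in-memory dictionary stands in for DSpace's EPerson table.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class EPerson:
    netid: str
    email: str


class LdapAuthenticator:
    """Authenticate against a directory, creating EPersons on first login.

    ldap_bind is a hypothetical stand-in for a real LDAP bind: it returns
    the user's e-mail address if the credentials are valid, else None.
    """

    def __init__(self, ldap_bind: Callable[[str, str], Optional[str]]) -> None:
        self._bind = ldap_bind
        self._epeople: dict[str, EPerson] = {}  # stands in for the EPerson table

    def authenticate(self, netid: str, password: str) -> Optional[EPerson]:
        email = self._bind(netid, password)
        if email is None:
            return None  # the directory rejected the credentials
        person = self._epeople.get(netid)
        if person is None:
            # First successful login: create the EPerson transparently.
            person = EPerson(netid=netid, email=email)
            self._epeople[netid] = person
        return person
```

Because the record is created only after a successful bind, users never go through DSpace's own registration step.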

40
(No Transcript)
41
Future Improvements
  • Flat community/collection structure - 2 levels
    only, not deep enough
  • Linked collection - a collection that belongs to
    more than one community
  • Unable to search across multiple collections from
    multiple communities
  • Query syntax is not apparent to users, e.g.:
  • water rapid for exact word match
  • "vapor generator" for phrase search

42
Future Improvements (cont.)
  • Insufficient capability for sorting search
    results
  • Unable to display the number of items in a
    community and in a collection
  • We have developed a JSP page to display the size
    of the Repository
  • Does not have the capability of transferring an
    item from one collection to another nor a
    collection from one community to another
  • DSpace is open source software; its success
    depends on contributions from its user community

43
Conclusions
  • DSpace was selected about two years ago to build
    the HKUST IR.
  • The IR makes HKUST's scholarly research more openly
    and globally accessible.
  • Installing DSpace is straightforward, but
    tailoring it to work effectively in your
    institutional environment is not trivial.

44
Conclusions (cont.)
  • Customization
  • CJK support with UTF-8 encoding
  • Driven by the fact that faculty are apathetic
    about self-submission, a simple document
    submission form was developed.
  • Developed the Add Item Form to allow IR staff
    to add items to DSpace without the need for batch
    importing

45
Conclusions (cont.)
  • By having the following implementations:
  • DSpace's built-in OAI support
  • OCLC's SRW/U on DSpace
  • Google's DSpace search filter
  • documents in the Repository are more fully
    exposed on the Internet for easy harvesting,
    searching and discovery

46
Conclusions (cont.)
  • Finally, many, many thanks to the DSpace team from
    MIT and HP for developing this high quality open
    source product!

Thank you!