Title: International Conference on Developing Digital Institutional Repositories: Experiences and Challenges December 9-10, 2004, Hong Kong DSpace in Action Implementing the HKUST Institutional Repository System
1. International Conference on Developing Digital Institutional Repositories: Experiences and Challenges
December 9-10, 2004, Hong Kong
DSpace in Action: Implementing the HKUST Institutional Repository System
- Presented by K.T. Lam
- Head of Library Systems
- The Hong Kong University of Science and Technology Library
- lblkt_at_ust.hk
2. Table of Contents
- From Idea to Creation
- Why have an IR?
- IR Software Selection
- Major Features
- Future Improvements
- Conclusions
3. From Idea to Creation
- The idea of establishing an IR originated from a staff development workshop at HKUST Library on 26 November 2002, where Kimberly Douglas was invited to speak on E-prints, OAI and Institutional Repository.
- After the workshop, a Task Force was formed to investigate the idea.
- After two months of software evaluation, DSpace was selected to build the Repository.
4. From Idea to Creation (cont.)
- The IR System at HKUST was brought to life in February 2003, with the following configuration and data content:
- DSpace Version 1.01
- Server with Intel Pentium III 733 MHz, 512 MB RAM, and RedHat Linux Release 7.3
- 105 Computer Science Technical Reports
5. From Idea to Creation (cont.)
- Background / Experience Facilitating the Creation
- HKUST Library is an early supporter of the Open Access concept: joined SPARC (Scholarly Publishing and Academic Resources Coalition) in 2001
- Experience of conducting digital library projects, with CJK capabilities
- Electronic Course Reserve - 1993
- Digital University Archives and Electronic Theses - 1997
- etc.
6. From Idea to Creation (cont.)
- Why have an IR?
- To create a permanent record of the scholarly output of HKUST
- No available access to some scholarly works published by our own faculty
- Collections of working papers, technical reports, research reports floating around
- Some of our scholarly works are in the public domain
7. From Idea to Creation (cont.)
- Why have an IR? (cont.)
- To make HKUST's scholarly output more globally and openly accessible
- To support the international Open Access effort.
- The mission of disseminating knowledge is only half complete if it is not widely and readily available to society - Berlin Declaration (http://www.zim.mpg.de/openaccess-berlin/berlindeclaration.html)
8. From Idea to Creation (cont.)
- IR Software Selection
- The July/August 2004 issue of Library Technology
Reports provides a very detailed discussion on
institutional repository systems and functional
requirements
9. From Idea to Creation (cont.)
- IR Software Selection (cont.)
- Decision in the first meeting of the IR Task Force in mid-December 2002: follow Caltech's model, i.e. base our IR on open source software with an OAI-PMH interface.
- We therefore evaluated two IR systems: EPrints and DSpace
10. From Idea to Creation (cont.)
- IR Software Selection (cont.)
- EPrints
- Developed by University of Southampton
- The very first open source IR software, since 2000
- Written in Perl, with MySQL database and Apache Web server
11. From Idea to Creation (cont.)
- IR Software Selection (cont.)
- DSpace
- Jointly developed by MIT Libraries and Hewlett-Packard Company
- Open source software
- Released on SourceForge during our system evaluation period in late December 2002
- Written in Java, with PostgreSQL database, Lucene search engine, and a Tomcat web servlet container
12. From Idea to Creation (cont.)
- IR Software Selection (cont.)
- We chose DSpace (almost two years ago) because:
- DSpace began its development with the experience gained from EPrints, the very first and most popular open source IR software at that time
- EPrints did not have full support for Unicode and was not Java- and servlet-based
- Both EPrints and DSpace are open source software, fulfill our functional requirements, and follow state-of-the-art library standards
13. Current Configuration of IR at HKUST
- As of 4 December 2004:
- Home URL: http://repository.ust.hk/
- IR Software: DSpace Version 1.2
- System Software: Fedora Core 2 Linux, Tomcat 5.0, JDK 1.4.2
- Server: Intel Pentium 4 2.4GHz, 1GB RAM
- Content: 1650 documents from 38 Departments
- Usage: Documents were accessed 9,051 times in the previous month
15. Growth (May 2003 to September 2004)
16. Major Features
- This section covers the following topics:
- Data structure
- Document submission form
- Add item form
- CJK support
- OAI data provider
- SRW/U interface
- Google pilot project
- Authentication and authorization
17. Major Features (cont.)
- Data Structure
- Document Types
- Preprints, technical reports, working papers, conference papers, journal articles, presentations, book chapters, patents, theses, etc.
- Document Formats
- Mainly PDF files; also contains PowerPoint files
18. Major Features (cont.)
- Data Structure (cont.)
- DSpace data model
- Communities (and sub-communities)
- Collections
- Items
- Metadata
- Bundles of bitstreams
- HKUST implementation: Items are grouped by Departments (i.e. communities) and then by Document Types (i.e. collections).
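To make the hierarchy concrete, here is a minimal sketch of the data model in Python. It is an illustration only, not DSpace's actual Java classes: the class and field names, the `item_count` helper, and all sample data are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Bitstream:
    name: str        # e.g. the file name of a PDF
    size_bytes: int


@dataclass
class Bundle:
    name: str
    bitstreams: List[Bitstream] = field(default_factory=list)


@dataclass
class Item:
    # Metadata kept as Dublin Core element -> list of values (illustrative shape).
    metadata: Dict[str, List[str]]
    bundles: List[Bundle] = field(default_factory=list)


@dataclass
class Collection:
    name: str
    items: List[Item] = field(default_factory=list)


@dataclass
class Community:
    name: str
    collections: List[Collection] = field(default_factory=list)
    sub_communities: List["Community"] = field(default_factory=list)

    def item_count(self) -> int:
        """Count items in this community, including its sub-communities."""
        own = sum(len(c.items) for c in self.collections)
        return own + sum(s.item_count() for s in self.sub_communities)


# HKUST-style grouping: department (community) -> document type (collection).
# The item below is made-up sample data.
report = Item(
    metadata={"title": ["A Made-Up Technical Report"], "creator": ["Doe, Jane"]},
    bundles=[Bundle("ORIGINAL", [Bitstream("report.pdf", 123456)])],
)
cs_dept = Community(
    "Computer Science",
    collections=[
        Collection("Technical Reports", items=[report]),
        Collection("Conference Papers"),
    ],
)
```

Calling `cs_dept.item_count()` walks the tree from the department down to its items.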
21. Major Features (cont.)
- Document Submission Form
- Faculty are apathetic about self-submission
- DSpace's submission and workflow functions are too lengthy and might scare off faculty
- In need of a simple and effortless submission form, as a quick medium for submitting documents
22. Major Features (cont.)
- Document Submission Form (cont.)
- Decided to develop our own form
- Requires only very minimal data entry
- Non-exclusive distribution license agreement
- Library IR staff enhance the metadata of the submissions and then add them to DSpace
- Written in Perl
- Submitted data stored in DSpace Simple Archive Format
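As a rough illustration of what a Simple Archive Format item looks like on disk, the sketch below writes one item directory containing a `dublin_core.xml` file, the submitted file, and a `contents` file listing the file names. The HKUST form was written in Perl; this Python version, the function name `write_saf_item`, and the sample values are assumptions made for this example only.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Dict, List, Tuple


def write_saf_item(archive_dir: Path, item_name: str,
                   dc_values: List[Tuple[str, str, str]],
                   files: Dict[str, bytes]) -> Path:
    """Write one item directory: dublin_core.xml, the files, and 'contents'."""
    item_dir = archive_dir / item_name
    item_dir.mkdir(parents=True, exist_ok=True)

    # Each metadata value becomes <dcvalue element="..." qualifier="...">.
    root = ET.Element("dublin_core")
    for element, qualifier, value in dc_values:
        dcvalue = ET.SubElement(root, "dcvalue",
                                element=element, qualifier=qualifier)
        dcvalue.text = value
    ET.ElementTree(root).write(str(item_dir / "dublin_core.xml"),
                               encoding="utf-8", xml_declaration=True)

    # Store the submitted files next to the metadata.
    for file_name, data in files.items():
        (item_dir / file_name).write_bytes(data)

    # 'contents' lists one file name per line.
    (item_dir / "contents").write_text("\n".join(files) + "\n",
                                       encoding="utf-8")
    return item_dir
```

A call such as `write_saf_item(Path("archive"), "item_000", [("title", "none", "A Made-Up Report")], {"report.pdf": b"%PDF-1.4"})` produces a directory that staff could review and enrich before it is added to the repository.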
25. Major Features (cont.)
- Add Item Form
- Locally developed JSP application to add items to DSpace by Library IR staff
- Allows IR staff to
- Create new item from scratch
- Enhance the metadata from faculty submission and then add the item to DSpace
28. Major Features (cont.)
- CJK (Chinese, Japanese, Korean) Support
- DSpace supports Unicode
- Problem: Lucene search engine is unable to search by CJK characters
- Solved by replacing DSpace's Tokenizer with a CJKTokenizer, but this has an interesting side effect
- Problem: URL of query containing CJK characters is not properly encoded
- Solved by setting Tomcat URIEncoding="UTF-8" and adding URLEncode() to one line of the Java source code
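The idea behind a CJK-aware tokenizer can be sketched as follows: text written without spaces between words is indexed as overlapping two-character tokens (bigrams), while Latin-script words are kept whole. This is a simplified Python illustration of that technique, not the Lucene CJKTokenizer itself; the character ranges and function names are assumptions chosen for this example.

```python
from typing import List


def is_cjk(ch: str) -> bool:
    """Rough check covering common CJK ideographs, kana, and Hangul syllables."""
    cp = ord(ch)
    return (0x4E00 <= cp <= 0x9FFF      # CJK Unified Ideographs
            or 0x3040 <= cp <= 0x30FF   # Hiragana and Katakana
            or 0xAC00 <= cp <= 0xD7AF)  # Hangul syllables


def bigram_tokens(text: str) -> List[str]:
    """Split text into lowercased Latin words and overlapping CJK bigrams."""
    tokens: List[str] = []
    run = ""             # characters of the run currently being collected
    run_is_cjk = False   # whether that run consists of CJK characters

    def flush() -> None:
        nonlocal run
        if not run:
            return
        if run_is_cjk:
            if len(run) == 1:
                tokens.append(run)
            else:
                # Overlapping pairs: ABCD -> AB, BC, CD
                tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
        else:
            tokens.append(run.lower())
        run = ""

    for ch in text:
        if is_cjk(ch):
            kind = True
        elif ch.isalnum():
            kind = False
        else:
            flush()          # whitespace or punctuation ends the current run
            continue
        if run and kind != run_is_cjk:
            flush()          # script changed, so close the previous run
        run += ch
        run_is_cjk = kind
    flush()
    return tokens
```

With this scheme a query is tokenized the same way as the indexed text, so a two-character CJK word matches wherever that pair of characters occurs.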
32. Major Features (cont.)
- OAI Data Provider
- DSpace is OAI-compliant
- This means that OAI harvesters can easily collect the metadata (in Dublin Core format) from various IRs (including HKUST's) for their value-added indexing/searching services.
- For example: OAIster
- OAI Path to IR at HKUST
- http://repository.ust.hk/dspace-oai/request?
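A harvester's side of this exchange can be sketched in a few lines of Python: build an OAI-PMH `ListRecords` request for Dublin Core (`oai_dc`) metadata, then pull titles out of the XML response. The base URL is the one given on the slide; the sample response, its identifier, and the helper names are made up for illustration, and no network call is made.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional
from urllib.parse import urlencode

BASE_URL = "http://repository.ust.hk/dspace-oai/request"  # from the slide

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc",
                     set_spec: Optional[str] = None) -> str:
    """Build a ListRecords request URL, optionally limited to one set."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def titles_by_identifier(response_xml: str) -> Dict[str, List[str]]:
    """Map each record identifier to its dc:title values."""
    root = ET.fromstring(response_xml)
    result: Dict[str, List[str]] = {}
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier",
                                     default="", namespaces=NS)
        result[identifier] = [t.text or ""
                              for t in record.iterfind(".//dc:title", NS)]
    return result


# Made-up response fragment, shaped like an OAI-PMH ListRecords reply.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1234/1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A Made-Up Technical Report</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""
```

A real harvester would fetch the URL returned by `list_records_url(BASE_URL)` and also follow resumption tokens for large result sets, which this sketch leaves out.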
34. Major Features (cont.)
- SRW/U Interface
- Search and Retrieval for the Web (or by URL)
- Retains core functionality of Z39.50 but in the form of web services
- This means search service providers can broadcast a search to various IRs and deliver the search results in their own GUI
- SRW/U Interface for the IR at HKUST
- Based on OCLC's SRW/U software
- URL: http://repository.ust.hk/SRW/
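The "broadcast" idea can be sketched as building the same SRU `searchRetrieve` request for several repositories. The parameter names (`operation`, `version`, `query`, `startRecord`, `maximumRecords`) come from the SRU specification; the second endpoint, the CQL query, and the function names are assumptions for illustration, and a real deployment would likely need a database-specific path under the base URL.

```python
from typing import List
from urllib.parse import urlencode


def sru_search_url(base_url: str, cql_query: str,
                   start_record: int = 1, max_records: int = 10) -> str:
    """Build an SRU searchRetrieve URL carrying a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,
        "startRecord": start_record,
        "maximumRecords": max_records,
    }
    return base_url + "?" + urlencode(params)


def broadcast_urls(base_urls: List[str], cql_query: str) -> List[str]:
    """Prepare the same search for every repository in the list."""
    return [sru_search_url(base, cql_query) for base in base_urls]


ENDPOINTS = [
    "http://repository.ust.hk/SRW/",  # from the slide
    "http://ir.example.org/SRW/",     # hypothetical second repository
]
requests_to_send = broadcast_urls(ENDPOINTS, 'dc.title = "vapor generator"')
```

A search service would fetch each URL, parse the XML responses, and merge the records into its own result display.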
35. The results of an SRW/U search, with XSLT transformation
36. Major Features (cont.)
- Google Pilot Project
- Initiated in March 2004 by the DSpace user community under the leadership of MacKenzie Smith
- To improve access to DSpace IRs from within Google
- HKUST is a participant in this project
- Result: created a restrict=dspace search filter for use in the Google URL. For example:
- http://www.google.com/search?restrict=dspace&q=collaboration
38. Major Features (cont.)
- Authentication and Authorization
- Authentication: by EPerson record created through user registration
- Authorization: based on the policy settings on the object (community, collection, item, bitstream, etc.)
- AA is not a big concern for our IR
- We do not use DSpace's submission and workflow functions
- It is open to the public
- AA is only required when our library IR staff access DSpace's administration functions
39. Major Features (cont.)
- DSpace Authentication and Authorization (cont.)
- We have, however, customized DSpace to allow for campus-wide LDAP authentication
- Mainly for a different project that also uses DSpace (Digital University Archives).
- Transparent creation of EPerson record on-the-fly during authentication
- We have also investigated the feasibility of hooking DSpace up with Yale's Central Authentication Service
- With only little success, due to cumbersome stage transfer from authentication to authorization
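The on-the-fly EPerson creation can be pictured with a small Python sketch: credentials are checked against the campus directory, and a local account record is created the first time a valid user logs in. This is not the HKUST customization (which lives in DSpace's Java code); the `check_directory` callable stands in for an LDAP bind, and every name and account here is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class EPerson:
    """Local account record, loosely modelled on the EPerson concept."""
    netid: str
    email: str


def authenticate(netid: str, password: str,
                 check_directory: Callable[[str, str], Optional[str]],
                 epersons: Dict[str, EPerson]) -> Optional[EPerson]:
    """Return the user's EPerson, creating it on first successful login."""
    # check_directory is a stand-in for an LDAP bind; it returns the
    # user's email address on success and None on failure.
    email = check_directory(netid, password)
    if email is None:
        return None
    if netid not in epersons:
        # Transparent creation: the user never fills in a registration form.
        epersons[netid] = EPerson(netid=netid, email=email)
    return epersons[netid]


def fake_directory(netid: str, password: str) -> Optional[str]:
    """Toy directory with one made-up account, used only for this example."""
    accounts = {"jdoe": ("s3cret", "jdoe@example.org")}
    entry = accounts.get(netid)
    if entry is not None and entry[0] == password:
        return entry[1]
    return None
```

Authorization is a separate step: once the EPerson exists, access still depends on the policies attached to each object.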
41. Future Improvements
- Flat community-collection structure: 2-level only, not deep enough
- Linked collection: a collection that belongs to more than one community
- Unable to search across multiple collections from multiple communities
- Query syntax not apparent to users, e.g.
- water rapid for exact word match
- "vapor generator" for phrase search
42. Future Improvements (cont.)
- Insufficient capability for sorting search results
- Unable to display the number of items in a community and in a collection
- We have developed a JSP page to display the size of the Repository
- Does not have the capability of transferring an item from one collection to another, nor a collection from one community to another
- DSpace is open source software; its success depends on contributions from its user community
43. Conclusions
- DSpace was selected about two years ago to build the HKUST IR.
- Make HKUST's scholarly research more openly and globally accessible.
- Installing DSpace is straightforward, but tailoring it to work effectively in your institutional environment is not trivial.
44. Conclusions (cont.)
- Customization
- CJK support with UTF-8 encoding
- Driven by the fact that faculty are apathetic about self-submission, a simple document submission form was developed.
- Developed the Add Item Form to allow IR staff to add items to DSpace without the need for batch importing
45. Conclusions (cont.)
- By having the following implementations:
- DSpace's built-in OAI support
- OCLC's SRW/U on DSpace
- Google's DSpace search filter
- documents in the Repository are more fully exposed on the Internet for easy harvesting, searching and discovery
46. Conclusions (cont.)
- Finally, many many thanks to the DSpace team from
MIT and HP for developing this high quality open
source product!
Thank you!