Title: Establishing National Digital Repository System employing Harvesting Model
1Establishing National Digital Repository System
employing Harvesting Model
- Surinder Kumar
- Technical Director, NIC, New Delhi
- suri_at_nic.in, 011-24305503
2IRs (contd.)
- At present, the University of Southampton's worldwide registry of OAI-compliant open access repositories lists more than 1000 repositories, of which around 50 are from India. To present these as a single virtual archive and to provide seamless search across them, it is becoming essential to connect research repositories and resource discovery services into a network that forms a national digital repository system. Examples are CARL, ARROW and DRIVER.
3National Digital Repository System
- To build an appropriate NDRS, the existing infrastructure needs to be analyzed.
- Technology components:
- Requisite hardware
- OS
- IR software such as DSpace and EPrints
- Interoperability among IRs has been demonstrated through the OAI-PMH protocol developed by the OAI.
4Technical Model of NDRS
- Alma Swan and Chris Awre describe three models in Linking UK Repositories. These are:
- Centralized Model
- Distributed Model
- Harvesting Model
5Centralized Model
- Metadata and content are submitted directly to a central server.
- Advantages
- Complete control of the whole process, from article deposition through to the user interface
- Software selection
- Ability to manage preservation issues
- Disadvantages
- It is an expensive option
- It may bypass the existing institutional repositories
6Distributed Model
- All metadata and content remain in their source locations, and the metadata is searched on the fly.
- Advantages
- Metadata is always up to date, because searches go directly to the source locations
- Considerably less expensive than the centralized model
- Disadvantages
- No enhancement of metadata
- Network dependent
- Not many IRs support Z39.50 or SRU/W
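To illustrate what searching on the fly involves, the sketch below builds the SRU searchRetrieve request that a distributed search would have to send to every participating repository for each user query. The endpoint URLs and the names sru_search_url and federated_urls are assumptions made for illustration; operation, version, query and maximumRecords are standard SRU 1.2 request parameters, and the query is written in CQL.

```python
from urllib.parse import urlencode


def sru_search_url(base_url, cql_query, max_records=10):
    """Build an SRU 1.2 searchRetrieve URL for one repository endpoint."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.2",
        "query": cql_query,          # CQL query, e.g. dc.title=repository
        "maximumRecords": max_records,
    }
    return f"{base_url}?{urlencode(params)}"


def federated_urls(endpoints, cql_query):
    """Return one request URL per repository; all must answer at search time."""
    return {name: sru_search_url(url, cql_query) for name, url in endpoints.items()}


# Hypothetical endpoints, used only to show the fan-out.
ENDPOINTS = {
    "ir_a": "http://ir-a.example.org/sru",
    "ir_b": "http://ir-b.example.org/sru",
}
urls = federated_urls(ENDPOINTS, "dc.title=repository")
```

Every search depends on all of these requests succeeding at query time, which is why the model is network dependent and only works for IRs that expose such an interface.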
7Harvesting Model
- It is a hybrid model: metadata is harvested into a central searchable server, while the content (full text) stays distributed and is served by the individual repositories. Under this model, a service provider harvests metadata from existing institutional repositories using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). The service provider can enhance the quality of the metadata and offer various services from its central server. The metadata can be further exposed via OAI-PMH, SRU/W and RSS feeds for use by other service providers.
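A minimal sketch of the harvesting step, assuming a repository that answers OAI-PMH ListRecords requests in unqualified Dublin Core (oai_dc). The function names, the sample response and the example.org URLs are illustrative assumptions; the verb, the metadataPrefix argument and the XML namespaces come from the OAI-PMH 2.0 specification. Only metadata is stored centrally, and the dc:identifier link points back to the full text at the source repository.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespaces defined by OAI-PMH 2.0 and unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build a ListRecords request URL for one repository."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def parse_records(xml_text):
    """Extract identifier, datestamp, titles and links from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        header = rec.find("oai:header", NS)
        records.append({
            "id": header.findtext("oai:identifier", namespaces=NS),
            "datestamp": header.findtext("oai:datestamp", namespaces=NS),
            "titles": [e.text for e in rec.iterfind(".//dc:title", NS)],
            # dc:identifier usually carries the URL of the item at the source IR.
            "links": [e.text for e in rec.iterfind(".//dc:identifier", NS)],
        })
    return records


# Hand-written sample response, standing in for what a repository would return.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:ir.example.org:123</identifier>
        <datestamp>2010-04-06</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample article</dc:title>
          <dc:identifier>http://ir.example.org/handle/123</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

records = parse_records(SAMPLE)
```

In a real harvester the XML would be fetched from list_records_url(...) over HTTP, and the parsed records would be loaded into the central search index.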
8Harvesting Model-advantages
- Advantages
- OAI-PMH is a standard protocol which is easy to implement
- Unqualified Dublin Core is the minimum required for OAI compliance; however, more complex metadata schemas can also be employed
- The institutional archives already use software which supports OAI-PMH
- Harvesting can be carried out by automatic scheduled tasks
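As an illustration of how an unattended, scheduled harvest could walk through a whole repository, the sketch below follows OAI-PMH resumption tokens until the last page. The function name and the fetch hook are assumptions; fetch is any callable that takes a URL and returns the response body as text, so the loop can be exercised without a network. The ListIdentifiers verb and the rule that a follow-up request carries only the verb and the resumptionToken come from the OAI-PMH 2.0 specification.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def harvest_identifiers(base_url, fetch, metadata_prefix="oai_dc"):
    """Yield every record identifier, following resumption tokens page by page.

    `fetch` is a hypothetical hook: a callable mapping a URL to response text.
    """
    params = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(f"{base_url}?{urlencode(params)}"))
        for header in root.iter(f"{OAI}header"):
            yield header.findtext(f"{OAI}identifier")
        token = root.find(f".//{OAI}resumptionToken")
        if token is None or not (token.text or "").strip():
            break  # no token, or an empty one, marks the last page
        # Follow-up requests carry only the verb and the token.
        params = {"verb": "ListIdentifiers", "resumptionToken": token.text.strip()}


# Two hand-written pages standing in for a repository's responses.
PAGE_1 = (
    '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListIdentifiers>'
    "<header><identifier>oai:ir.example.org:1</identifier></header>"
    "<resumptionToken>t1</resumptionToken>"
    "</ListIdentifiers></OAI-PMH>"
)
PAGE_2 = (
    '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListIdentifiers>'
    "<header><identifier>oai:ir.example.org:2</identifier></header>"
    "<resumptionToken/>"
    "</ListIdentifiers></OAI-PMH>"
)


def fake_fetch(url):
    """Serve the second page only when the token from the first page is sent."""
    return PAGE_2 if "resumptionToken=t1" in url else PAGE_1


identifiers = list(harvest_identifiers("http://ir.example.org/oai", fake_fetch))
```

A scheduler such as cron could run a job of this kind for each registered repository at a fixed interval.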
9Harvesting Model-disadvantages
- Only unqualified Dublin Core is mandated for harvesting, and it lacks the rich semantics of other metadata schemas
- The metadata exposed by the service provider may not always be the latest, and changes made to metadata at the source may not be reflected on the central server.
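One common way to narrow this staleness gap is incremental harvesting: each run asks only for records changed since the previous harvest and removes records the source has flagged as deleted. The sketch below assumes a central index held as a plain dict; the function names and the sample data are illustrative. The from argument and the status="deleted" header attribute are part of OAI-PMH 2.0, although whether a repository reports deletions at all depends on its declared deleted-record support.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def incremental_url(base_url, last_harvest, metadata_prefix="oai_dc"):
    """Request only records created, changed or deleted since `last_harvest`."""
    return f"{base_url}?" + urlencode({
        "verb": "ListRecords",
        "metadataPrefix": metadata_prefix,
        "from": last_harvest,  # date in YYYY-MM-DD form
    })


def apply_updates(index, xml_text):
    """Merge one ListRecords response into `index`, a dict keyed by OAI identifier."""
    root = ET.fromstring(xml_text)
    for rec in root.iter(f"{OAI}record"):
        header = rec.find(f"{OAI}header")
        oai_id = header.findtext(f"{OAI}identifier")
        if header.get("status") == "deleted":
            index.pop(oai_id, None)  # the source repository withdrew the record
        else:
            index[oai_id] = header.findtext(f"{OAI}datestamp")
    return index


# Hand-written response: record 1 was deleted, record 3 is new.
RESPONSE = (
    '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>'
    '<record><header status="deleted">'
    "<identifier>oai:ir.example.org:1</identifier>"
    "<datestamp>2010-05-01</datestamp></header></record>"
    "<record><header>"
    "<identifier>oai:ir.example.org:3</identifier>"
    "<datestamp>2010-05-01</datestamp></header></record>"
    "</ListRecords></OAI-PMH>"
)

index = {"oai:ir.example.org:1": "2010-01-01", "oai:ir.example.org:2": "2010-01-01"}
apply_updates(index, RESPONSE)
```

The shorter the interval between such runs, the smaller the window in which the central server can lag behind the source repositories.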
10NDRS-Accepted Harvesting Model
- The OAI-PMH harvesting model clearly has more advantages than the other models
- It has gained worldwide acceptance
- It makes it easy to share information about scholarly resources and to offer enhanced resource discovery tools.
- It has been adopted by thousands of institutions around the world.
11NDRS-benefits
- A National Digital Repository System would offer a number of benefits to end users as well as to the various stakeholders of the institutions.
- Benefits to IR administrators
- The IR administrator would only maintain the content of the repository while offering metadata to the service provider.
- NDRS would be in a better position to provide long-term preservation through appropriate metadata provision and/or content packaging
- It would offer enhanced metadata to end users
12NDRS-benefits (contd.)
- End users as readers and searchers
- NDRS would give end users access to a large number of repositories from one place, rather than requiring them to visit each repository individually.
- It would push content to end users through RSS/Atom feeds.
- It would provide document delivery services to end users
13NDRS-benefits (contd.)
- End users as content managers
- NDRS would provide a means of exposing authors' work so that it is widely available to their peers throughout the globe.
- It would be able to provide preservation and metadata enhancement capabilities to support long-term storage of, and access to, the content.
14NDRS-benefits (contd.)
- Content aggregators
- NDRS would offer value-added services of its own to enhance the aggregated metadata and supply it back to the repository concerned.
- It would provide a single point of information for statistics about access to and downloads of data.
- It would offer a single point of access to multiple sources of research and other materials to aid discovery.
- It would be able to provide selected collections with value-added services built on top of them.
15Impediments in implementing NDRS
- Technical issues at the data provider level, such as installation of IR software, server setup and malfunctioning, backup of data and updating of IR software.
- Technical issues at the service provider level, where successful harvesting depends on an error-free network, proper use of Dublin Core metadata fields and data sets, and correct use of date stamps.
- Coordination among IR members
- Federated authentication and authorization
- Long-term preservation, formats, migration and access
- Sustainability in providing long-term access to NDRS
16Current Scenarios of Institutional Repositories
in India
- The Registry of Open Access Repositories (ROAR) lists 52 registered repositories from India; the actual number may be higher, as some repositories have not yet been registered with ROAR.
- Analysis of IRs in India
- Of the 52, 13 were not functional at the time of writing
- A number of them are not being updated
- On closer examination, IR content in India has not yet reached critical mass
17Current Scenario (contd.)
- According to the 2010 Webometrics survey, which ranks the world's open access repositories by visibility, quality and available items [18], seven repositories from India are listed. Their details are given in the following table.
18
Sr No. | Rank | Name of IR | Date of establishment (DD-MM-YYYY) | No. of records
1 | 82 | Indian Institute of Science | 05-04-2004 | 19477
2 | 148 | OpenMed, National Informatics Centre | 22-03-2005 | 2645
3 | 180 | Indian Statistical Institute Digital Library | 17-01-2004 | 188
4 | 218 | Indian Institute of Astrophysics | 11-11-2004 | 2468
5 | 245 | National Institute of Oceanography Digital Library | 06-04-2010 | 3528
6 | 278 | Raman Research Institute Digital Library | 19-04-2005 | 3731
7 | 278 | National Aerospace Laboratories Institutional Repository | 09-11-2004 | 3164
19Current Scenario-service providers
- There are 9 service providers in the country harvesting data. The majority of them follow OAI-PMH, and the harvesting software used is PKP Harvester. Of the 9, four are not functional, although they are highly cited in the literature.
20Proposed NDRS
- To establish successful, well-populated national-level repositories, we need to look at the prevailing information systems in our country, for example ICMR, CSIR, ICAR, ENVIS, the Department of Atomic Energy and ISRO. The onus should be on these national information systems to make publications arising out of publicly funded research available free of cost to researchers.
22NDRS-Recommendations
- There is a need for a national body in the country, like JISC in the UK, which provides advisory as well as technical services to individual repositories.
- National-level organizations should be given responsibility for setting up national resource centres that harvest data from their respective institutional repositories.
- Develop strategies to make institutional repositories a permanent and sustainable part of the national and local research infrastructure.
- Issue guidelines to institutional members on mediated or voluntary deposit and on the need for mandatory deposit of papers and dissertations.
- Develop guidelines for metadata entry and the best practices to be followed.
23Conclusion
- The new challenge is to create an environment based on the OAI protocol so that publicly funded research is made available to the whole community.
- A national-level body is needed so that the development of institutional repositories becomes more coherent. It could provide advisory services and promote adoption of the guidelines and best practices followed by systems such as DRIVER, DAREnet and HAL.
24Thanks