Title: From eprint archives to open archives and OAI: the Open Citation project
1. From eprint archives to open archives and OAI: the Open Citation project
- By the Open Citation Project team
- Presented by Steve Hitchcock, Southampton University
- These slides were prepared for the JISC/NSF Digital Libraries Initiative (DLI) All Projects Meeting, Edinburgh, 24-25 June 2002
- OpCit is a joint JISC-NSF International Digital Libraries Project, 1999-2002
2. About this presentation
- The aims are to:
  - Report progress since the Stratford All Projects Meeting in 2000
  - Demonstrate new services developed by the project
  - Highlight the role of the project in the Open Archives Initiative
  - Outline key tasks remaining
  - Look beyond the Open Citation Project
3. Recap 1: principal partners
- Southampton University, IAM (Intelligence, Agents, Multimedia) Research Group, PI Stevan Harnad
  - Citation-ranked search, EPrints.org, user surveys
- Cornell University, Digital Library Research Group, PI Carl Lagoze
  - Architecture for reference linking, experiments with the ACM Digital Library and D-Lib Magazine, OAI technical support center
- arXiv.org, Paul Ginsparg
  - Now based at Cornell University. Still the largest archive of freely accessible author-deposited scientific papers
4. The Open Citation Project deliverables
- The Open Citation Project (OpCit) is developing software and services to support the Open Archives Initiative (OAI). OpCit can help OAI data providers and service providers with:
  - Citebase: citation-ranked search
  - EPrints.org software: free software to build and manage OAI-compliant eprint archives
  - API for reference linking: an interface on which reference linking applications can be built
5. Recap 2: last time at Stratford
- Reference links on PDF copies of papers
6. Citebase: a new interface to the scholarly literature
7. Citebase: a citation-ranked search engine
- http://citebase.eprints.org/
- Google for the refereed literature
- Citebase is based on a citation database. It:
  - Harvests metadata using OAI-PMH
  - Extracts reference lists from arXiv papers
  - Provides impact-ranked (and otherwise ranked) search based on reference data
  - Re-exports metadata and references
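Impact ranking of this kind rests on counting how often each paper is cited across the harvested reference lists. The following is a minimal sketch, not Citebase's actual code. It assumes each paper has already been reduced to an identifier plus the identifiers it cites; the data layout and function names are hypothetical.

```python
from collections import Counter


def citation_counts(reference_lists):
    """Count incoming citations for each identifier.

    reference_lists maps a citing paper's identifier to the identifiers it
    cites (a hypothetical layout chosen for illustration).
    """
    counts = Counter()
    for citing, cited_ids in reference_lists.items():
        # Count each cited work once per citing paper and skip self-citation.
        for cited in set(cited_ids):
            if cited != citing:
                counts[cited] += 1
    return counts


def rank_by_citations(hits, counts):
    """Order search hits by descending citation count, breaking ties by identifier."""
    return sorted(hits, key=lambda pid: (-counts[pid], pid))


refs = {
    "paper-a": ["paper-b", "paper-c"],
    "paper-d": ["paper-b"],
    "paper-e": ["paper-b", "paper-c", "paper-b"],
}
counts = citation_counts(refs)
print(rank_by_citations(["paper-c", "paper-b", "paper-a"], counts))
```

Raw citation counts are only the simplest impact measure. The same ranking function accepts any other per-paper score in place of `counts`.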
8. Evaluating Citebase
- http://citebase.eprints.org/survey/
- The evaluation is aimed at users of arXiv, and at all others who use bibliographic services to access the refereed journal literature. You can contribute (June-July 2002) using the form linked above.
- Aims of the evaluation:
  - Discover users' awareness of related services
  - Assess usability with a practical exercise
  - Invite users' views on the main features
  - Assess the level of user satisfaction with the service
9. Citebase: further developments
- OpenURL-enabled: pointing Citebase links at library and journal services
- Google interface using DP9: getting Citebase results, and open archives, into Google
- Metadata format and XML schema for citations: making citation metadata harvestable via OAI-PMH. Possible formats include:
  - Academic Metadata Format: a local profile format, used in collaborative experiments within OpCit
  - OpenURL metadata: moving towards NISO standardisation
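Pointing a citation link at library services means expressing the cited work as an OpenURL that the reader's local link resolver can interpret. The following is a minimal sketch that assumes OpenURL 0.1-style keys (`sid`, `genre`, `aulast`, `atitle`, `title`, `volume`, `spage`, `date`). The resolver address and the `sid` value are hypothetical placeholders, not Citebase's actual settings.

```python
from urllib.parse import urlencode

# OpenURL 0.1-style keys this sketch knows how to emit, in output order.
OPENURL_KEYS = ["genre", "aulast", "atitle", "title", "volume", "spage", "date"]


def make_openurl(resolver_base, citation, sid="citebase:demo"):
    """Build an OpenURL 0.1-style link for one cited article.

    resolver_base and sid are placeholders: a real deployment would use the
    institution's own resolver address and an agreed source identifier.
    """
    params = [("sid", sid)]
    # Only include fields that are actually present in the citation record.
    params += [(key, citation[key]) for key in OPENURL_KEYS if citation.get(key)]
    return resolver_base + "?" + urlencode(params)


print(make_openurl(
    "http://resolver.example.edu/openurl",  # hypothetical resolver
    {"genre": "article", "aulast": "Smith", "atitle": "An example article",
     "title": "Journal of Examples", "volume": "12", "spage": "34", "date": "2001"},
))
```

Keeping the resolver address as a parameter is what allows one service to send each reader to their own library's resolver.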
10. Recap 3: API for reference linking
- getLinkedText: contents of the paper, reference-linked, plus lots of metadata for the paper
- getReferenceList: this paper's references
- getCurrentCitationList: the list of works citing this paper (to the best of current knowledge)
- getMyData: metadata for this paper
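The four calls can be pictured as methods on a per-paper object. This is a minimal Python sketch of that idea, not the project's Java API. Only the method names come from the slide. The class name, its fields and the in-memory data are hypothetical, and the camelCase names are kept deliberately to mirror the slide.

```python
from dataclasses import dataclass, field


@dataclass
class PaperSurrogate:
    """Hypothetical in-memory stand-in for one paper's reference-linking data."""
    metadata: dict
    text: str
    references: list = field(default_factory=list)  # (marker in text, target URL) pairs
    citations: list = field(default_factory=list)   # identifiers of citing works

    def getMyData(self):
        # Metadata for this paper (a copy, so callers cannot mutate it).
        return dict(self.metadata)

    def getReferenceList(self):
        # This paper's references, as link targets.
        return [target for _, target in self.references]

    def getCurrentCitationList(self):
        # Works known to cite this paper so far; the list can grow over time.
        return list(self.citations)

    def getLinkedText(self):
        # Full text with each reference marker turned into a hyperlink, plus metadata.
        # Simplification: plain string replacement of each marker.
        linked = self.text
        for marker, target in self.references:
            linked = linked.replace(marker, f'<a href="{target}">{marker}</a>')
        return {"text": linked, "metadata": self.getMyData()}
```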
11. Surrogates in the API
- Based on an automatic analysis of the work, a surrogate for a scholarly work (and for other works, in the case of citations) consists of the following three XML files:
  - Bibliographic data for the scholarly work
  - References contained in that work, and their contexts within the full text
  - Citations of that work
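A surrogate split into three XML documents might be produced as follows. This is a minimal sketch only. The element names (`bibdata`, `references`, `reference`, `text`, `context`, `citations`, `citation`) are invented for illustration, because the slide does not give the project's actual schema.

```python
import xml.etree.ElementTree as ET


def surrogate_to_xml(bib, references, citations):
    """Return the three XML strings of a (hypothetical) surrogate.

    bib: dict of bibliographic fields
    references: list of (reference_text, context_snippet) pairs
    citations: list of identifiers of citing works
    """
    bib_el = ET.Element("bibdata")
    for name, value in bib.items():
        ET.SubElement(bib_el, name).text = value

    refs_el = ET.Element("references")
    for ref_text, context in references:
        ref_el = ET.SubElement(refs_el, "reference")
        ET.SubElement(ref_el, "text").text = ref_text
        # The context records where in the full text the reference is cited.
        ET.SubElement(ref_el, "context").text = context

    cites_el = ET.Element("citations")
    for citing_id in citations:
        ET.SubElement(cites_el, "citation").text = citing_id

    return tuple(ET.tostring(el, encoding="unicode") for el in (bib_el, refs_el, cites_el))
```

Keeping the three parts in separate files means the citation list can be regenerated as new citing works appear, without touching the bibliographic data or the reference analysis.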
12. API evaluation
- API tested on D-Lib Magazine and the ACM Digital Library. Try the demo at http://cs-tr.cs.cornell.edu/RefLinkingDemo/
- Performance (in terms of accuracy of data extracted):
  - Reference analysis: 86.7%
  - Item analysis (bib data, contexts, and references for a given paper): 82.42%
- Implementability:
  - Simple interface: Surrogate s = new Surrogate(some-url)
  - Portable: written in Java, has run on Solaris, Win2K, and NT4
  - Installation: API source code plus public domain jar files
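The slide reports accuracy as a percentage of extracted data judged correct, but it does not describe the scoring procedure. The following is a minimal sketch of one simple way such a figure could be computed. It assumes, hypothetically, that accuracy is the share of hand-checked gold-standard items that the extractor reproduces exactly.

```python
def extraction_accuracy(extracted, gold):
    """Percentage of gold-standard items reproduced exactly by the extractor.

    Both arguments map an item key (for example, a reference's position in a
    paper) to its value. Keys missing from `extracted` count as errors.
    """
    if not gold:
        raise ValueError("gold standard is empty")
    correct = sum(1 for key, value in gold.items() if extracted.get(key) == value)
    return 100.0 * correct / len(gold)


gold = {1: "Smith 1999", 2: "Jones 2000", 3: "Lee 2001", 4: "Wu 2002"}
print(extraction_accuracy({1: "Smith 1999", 2: "Jones 2000", 3: "Lee 2001"}, gold))
```

Exact matching is the strictest choice. A real evaluation might instead score individual fields, or accept near-matches.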
13. EPrints.org software
- http://www.eprints.org/
- Generates eprint archives that are compliant with the Open Archives Initiative Protocol for Metadata Harvesting. EPrints is free (GPL) software. It is aimed at organisations and communities.
- EPrints v. 2.0 was released in February 2002 (now at v. 2.0.1, which fixes bugs and typos). Features:
  - Internationalised metadata stored as Unicode
  - Support for multiple archives on one server
  - Improved user interface
14. OpCit and OAI
- OAI Aggregator (Celestial): collecting and caching the results from OAI data providers to improve the efficiency of data harvesting, http://celestial.eprints.org
- OAI infrastructure: proxies, caches, gateways. Improve the interoperability, scalability and reliability of OAI services. Joint work with Old Dominion University; see the paper at http://arxiv.org/abs/cs.DL/0205071
- OAI Registration and Validation performed at Cornell: http://www.openarchives.org/Register/BrowseSites.pl
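An OAI-PMH harvest issues a ListRecords request and follows resumptionTokens until the list is exhausted. An aggregator such as Celestial puts a cache in front of this, so repeated harvests need not go back to the data provider. The following is a minimal sketch of that loop, assuming OAI-PMH 2.0 responses carrying `oai_dc` metadata. The `fetch` function is supplied by the caller so that the example runs offline. The plain dict cache keyed by request URL is a simplification and is not Celestial's actual design. For example, real resumptionTokens may expire.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"


def harvest(base_url, fetch, cache=None):
    """Yield (identifier, titles) for every record in a ListRecords harvest.

    fetch(url) -> XML text is supplied by the caller (for example, an HTTP GET).
    cache, if given, is a dict mapping request URLs to earlier responses.
    """
    params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    while True:
        url = base_url + "?" + urlencode(params)
        if cache is not None and url in cache:
            xml_text = cache[url]  # served from the cache, no request to the provider
        else:
            xml_text = fetch(url)
            if cache is not None:
                cache[url] = xml_text
        root = ET.fromstring(xml_text)
        for record in root.iter(OAI + "record"):
            ident = record.findtext(f"{OAI}header/{OAI}identifier")
            titles = [t.text for t in record.iter(DC + "title")]
            yield ident, titles
        token = root.findtext(f"{OAI}ListRecords/{OAI}resumptionToken")
        if not token:
            break  # missing or empty token: the list is complete
        # resumptionToken is an exclusive argument, so it replaces metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": token}
```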
15. EPrints and OAI
- EPrints feeds repository URLs straight into the OAI registration process (if the EPrints administrator so chooses)
- A scan of the OAI database of registered sites shows that many sites use EPrints software to create repositories
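Registration starts from a repository's answer to the OAI-PMH Identify verb. The following is a minimal sketch of the kind of check a registration or validation step might make on that answer, assuming an OAI-PMH 2.0 response. The slides do not say which checks the actual OAI registration service performs. The checks below reflect only the fields that OAI-PMH 2.0 lists as required in Identify.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
REQUIRED = ("repositoryName", "baseURL", "protocolVersion", "earliestDatestamp",
            "deletedRecord", "granularity", "adminEmail")


def check_identify(xml_text, expected_base_url):
    """Return a list of problems in an OAI-PMH Identify response (empty if none)."""
    root = ET.fromstring(xml_text)
    ident = root.find(OAI + "Identify")
    if ident is None:
        return ["no Identify element (not an OAI-PMH 2.0 Identify response?)"]
    problems = [f"missing {name}" for name in REQUIRED if not ident.findtext(OAI + name)]
    # Only flag the version if one is present (a missing one is reported above).
    if ident.findtext(OAI + "protocolVersion") not in (None, "", "2.0"):
        problems.append("protocolVersion is not 2.0")
    base = ident.findtext(OAI + "baseURL")
    if base and base != expected_base_url:
        problems.append(f"baseURL {base!r} does not match {expected_base_url!r}")
    return problems
```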
16. A repository administrator's view of OAI
- "As we have introduced our repository to our faculty and staff, we have emphasized the point that because they would be depositing their material in an OAI-compliant archive, it would automatically and painlessly be discoverable from various other points around the globe. Luckily, we were right."
- Roy Tennant, eScholarship, California Digital Library, June 2002, http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/2085.html
17. OpCit user surveys and data mining
- Maximising impact: maximising access
- Results from "Mining the Social Life of an Eprint Archive": http://opcit.eprints.org/tdb198/opcit/
- When interoperability is not enough: show authors what users do when open access services are available
18. Key project tasks remaining
- OpCit formally ends in September 2002. Before then:
  - Evaluation and reporting of the results
  - Programmer's guide to using the API
  - Journal and conference papers
  - Final reports to JISC and NSF
19. Beyond OpCit
- Beyond the project, the following will continue to be developed:
  - Citebase
  - EPrints.org
  - OAI
- They will also be variously applied in the JISC FAIR programme (starting 2002), http://www.jisc.ac.uk/dner/development/programmes/fair.html:
  - Targeting Academic Research for Deposit and Disclosure (lead institution Southampton University)
  - e-prints UK (RDN, King's College London): citation analysis service for eprints database
  - Machine-readable rights metadata (Loughborough University)
20. What we have achieved, what we have learned
- OAI is gathering momentum
- Software for building OAI repositories is available
- Institutional archives are being created, but need to be filled by authors
- Attracting authors requires evidence of services that will improve the visibility and impact of their works
- Citation-ranked search and reference linking are examples of OAI services that do this
- The infrastructure supporting OAI services continues to be enhanced
- Resource discovery and current awareness are exemplar OAI services now. Future services may include preservation management and personalization
21. Credits
- Other contributors to the project include:
  - Technical development at Southampton is directed by Les Carr
  - Research at Cornell: Donna Bergmark
  - EPrints.org software is being developed by Chris Gutteridge
  - Citebase is produced and managed by Tim Brody
  - Project manager is Steve Hitchcock
- A copy of these slides can be found on the OpCit Web site, http://opcit.eprints.org/. Look for Papers and Presentations
- Contact: Steve Hitchcock, sh94r_at_ecs.soton.ac.uk