Title: Repositories, Data and the RQF
1Repositories, Data and the RQF (or How to tackle some of the RQF with FezFedora)
Andrew Bennett
University of Queensland Library
2A Brief Overview
- Some architectural decisions: a view from 6000 metres up and making systems play together
- The Institutional Repository: a few things to know about UQ eSpace
- Enabling UQ eSpace for RQF: where DID we get all that data from?
- Bundling it all up: checking, de-duplication and verification of publication data
- What do you mean it's all over? What else to do with some of the data until next time?
www.uq.edu.au
3Some Early Decisions
We decided that we would NOT attempt to make the repository do everything.
- Library: Would concentrate on developing and enhancing the functionality of the Institutional Repository and work with the Office of the DVC Research to identify and ingest as much data as possible to save manual re-entry.
- Information Technology Services: Would develop the institutional Evidence Portfolio System and systems to facilitate feedback from the academic community. Would also work on functionality to provide the institutional submission to the DEST IMS.
- Office of the DVC Research: Staff from here would take a coordinating role and work with both development teams to analyse and determine the required functionality and to recruit, check and, if necessary, create content.
4Early information workflow
5Early information workflow
Where we started
6Core Systems and Data Sources
Light - Weight DSS
7Core Systems and Data Sources
Light - Weight DSS
9UQ eSpace Repository: Publications Data Consolidation Project
UQ eSpace was to become the authoritative UQ source of bibliographic data for RQF. UQ eSpace will also eventually replace functionality currently provided from ResearchMaster as the primary data source for the DEST HERDC submission.
Why was eSpace population so important for the RQF?
- Body of Work is one very significant area where RQF requirements were fairly predictable
- RQF submission of bibliographic data was critically dependent on the success of this project
- RQF Review Panel access to Research Outputs was critically dependent on the success of this project
- We already had a reasonable start on data from our practice runs and trial assessments
10UQ eSpace Repository: Publications Data Consolidation Project
11Sources of Data
Data was ingested into the repository from multiple sources.
- Existing publications records: The repository contained records created as part of our 2005 and 2006 trial runs of the internal UQ Research Assessment Exercise (RAE).
- Thomson National Citation Report data set: Bibliographic and citation performance data was purchased for Australian universities up to October 2006.
- Publications records from our research management solution: Approximately 55,000 publication records were ingested from the University's research management system (ResearchMaster).
- Data from academic CVs, citation analyses and EndNote libraries: Additional records were able to be imported from other sources including literature searches, citation reports, EndNote libraries and curricula vitae. This required specialised tools and filters to be built, which are now part of the release of FezFedora.
13UQ eSpace Repository: Record De-duplication and checking
14Deduplication and Checking
In many instances multiple records now existed with overlapping and sometimes conflicting data.
- Needed a mechanism to try to match records which were duplicates. Algorithmic matching was based on a combination of factors including ISI-LOC, and fuzzy logic on keyword/title/name/publication.
- Developed a mechanism to present lists of duplicates. Some records could be automatically merged but others required human oversight.
- Matching the right fields from different records. A major factor in the merge process was ensuring that when a duplicate record was discarded, no data was lost.
- Matching repository content models to the RQF Technical Specification. Many of the existing content models needed to be updated to handle new fields and display methods.
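The matching and merging steps can be sketched roughly as follows. This is a minimal illustration in Python, not the FezFedora implementation: the field names (`isi_loc`, `title`, `year`), the 0.9 similarity threshold and the function names are all assumptions made for the example, and the standard-library `difflib` stands in for whatever fuzzy matching was actually used.

```python
from difflib import SequenceMatcher


def normalise(text):
    """Lower-case the text and replace punctuation with spaces, collapsing runs of whitespace."""
    cleaned = "".join(c if c.isalnum() else " " for c in text.lower())
    return " ".join(cleaned.split())


def is_duplicate(a, b, threshold=0.9):
    """Decide whether two record dicts probably describe the same publication."""
    # An exact identifier match (hypothetical 'isi_loc' field) is treated as decisive.
    if a.get("isi_loc") and a.get("isi_loc") == b.get("isi_loc"):
        return True
    # If both years are known and differ, do not treat the records as duplicates.
    if a.get("year") and b.get("year") and a["year"] != b["year"]:
        return False
    # Otherwise fall back to fuzzy similarity of the normalised titles.
    ratio = SequenceMatcher(
        None, normalise(a.get("title", "")), normalise(b.get("title", ""))
    ).ratio()
    return ratio >= threshold


def merge(keep, discard):
    """Fold 'discard' into 'keep' without losing data.

    Returns (merged, conflicts). Empty fields in 'keep' are filled from
    'discard'; fields where both records hold different values are left
    unchanged and reported in 'conflicts' for a human to resolve.
    """
    merged = dict(keep)
    conflicts = {}
    for field, value in discard.items():
        if not value:
            continue
        if not merged.get(field):
            merged[field] = value
        elif merged[field] != value:
            conflicts[field] = (merged[field], value)
    return merged, conflicts
```

Records with an empty `conflicts` result are candidates for automatic merging; anything else would go onto the list presented for human oversight.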
20UQ eSpace Repository: Publications Data Consolidation Project
22Passing Data back to the EPS
The Evidence Portfolio System needs to extract and update data based on records in the repository.
- Simple web services were created to expose records to EPS via XML. A number of services make records available based on arguments passed, including a list of authors, single records, collections and communities.
- XACML-based policies protect data from unauthorised changes. The data checking team is able to edit only the fields that need to be changed.
- Workflow tracking and management. The data checking team uses our issue tracking system, and messages can be exchanged between that and the EPS to indicate records needing updates or which have been completed.
- Academic community is able to view/check publications in the EPS. Each academic is presented with a single view of all their RQF-eligible publications in one place in the EPS, yet the underlying data is sourced from the institutional repository.
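To illustrate the kind of XML a record service like this might return for a list of authors, here is a small Python sketch. The element names, record fields, identifiers and the `records_for_authors` function are invented for the example; the slides do not describe the actual repository or EPS schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory stand-in for repository records.
RECORDS = [
    {"pid": "UQ:1", "title": "First paper", "authors": ["a.smith", "b.jones"]},
    {"pid": "UQ:2", "title": "Second paper", "authors": ["c.lee"]},
]


def records_for_authors(author_ids, records=RECORDS):
    """Return an XML string listing every record with at least one requested author."""
    wanted = set(author_ids)
    root = ET.Element("records")
    for rec in records:
        if wanted & set(rec["authors"]):
            node = ET.SubElement(root, "record", pid=rec["pid"])
            ET.SubElement(node, "title").text = rec["title"]
            for author in rec["authors"]:
                ET.SubElement(node, "author").text = author
    return ET.tostring(root, encoding="unicode")
```

A consumer such as the EPS would parse the returned document and read each `record` element; services for single records, collections or communities could follow the same pattern with a different filter.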
23Fitting It All Together
24Not quite the end of the story
- Users of the DEST IMS need to be able to directly access the published outputs which are held in the repository. The DEST IMS communicates directly with the repository via basic authentication using a secret username and password. A web service in the repository can then present the PDF datastream directly into the web browser of the DEST Assessor without displaying any local repository metadata or graphical presentation.
- Publisher outputs are stored on the record in the repository. The same XACML policy engine which protects key fields can also be used to selectively hide the datastream containing a published version of the output from all BUT the authorised DEST user.
OF COURSE... HARVESTING or OBTAINING the published versions is a completely different kettle of fish...
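The access pattern described on this slide might look something like the sketch below: check HTTP Basic credentials, then return only the raw PDF bytes with no repository page around them. It is a simplified stand-in, not real XACML and not the repository's actual service; the account name, password, identifiers and function names are all hypothetical.

```python
import base64
import hmac

# Hypothetical credentials and datastream store, for illustration only.
ASSESSOR_USER = "dest-ims"
ASSESSOR_PASSWORD = "change-me"
DATASTREAMS = {("UQ:1", "published.pdf"): b"%PDF-1.4 example bytes"}


def is_authorised(authorization_header):
    """Check an HTTP Basic 'Authorization' header against the assessor account."""
    if not authorization_header or not authorization_header.startswith("Basic "):
        return False
    try:
        decoded = base64.b64decode(authorization_header[6:], validate=True).decode("utf-8")
    except ValueError:  # covers malformed base64 and invalid UTF-8
        return False
    user, sep, password = decoded.partition(":")
    if not sep:
        return False
    # compare_digest performs the comparisons in a timing-resistant way.
    user_ok = hmac.compare_digest(user.encode(), ASSESSOR_USER.encode())
    password_ok = hmac.compare_digest(password.encode(), ASSESSOR_PASSWORD.encode())
    return user_ok and password_ok


def fetch_published_pdf(pid, authorization_header):
    """Return (status, headers, body); the PDF is hidden from everyone but the assessor."""
    if not is_authorised(authorization_header):
        return 401, {"WWW-Authenticate": 'Basic realm="repository"'}, b""
    body = DATASTREAMS.get((pid, "published.pdf"))
    if body is None:
        return 404, {}, b""
    return 200, {"Content-Type": "application/pdf"}, body
```

Because basic authentication sends credentials in an easily decoded form, a deployment along these lines would normally sit behind HTTPS.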
25A couple of final words on IRs
- Scalability and performance become even more critical when you are relying on your IR so heavily. Test your system and architecture to see how it scales. Be sure that your hardware and system are capable of scaling to not just hold 5,000 objects, but also serve them up in a timely fashion. What about 10,000? 55,000? FezFedora is currently being tested with an ingest of over 200,000 records!
- Backup, archiving and preservation remain critical. Issues of backup, archiving and preservation of records which have been submitted to RQF require careful consideration. Can you correct/edit records for the public repository yet retain the archival version submitted? What strategies have you in place to ensure that the full-text versions remain readable and usable? AONS2 of course is a good answer.
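One way to think about the question of editing the public record while retaining the version submitted is to freeze a checksummed snapshot at submission time. The Python sketch below is only an illustration of that idea under assumed names (`VersionedRecord`, `freeze`, `verify`); it does not describe how FezFedora or Fedora versioning actually works.

```python
import copy
import hashlib
import json


def _checksum(metadata):
    """Stable SHA-256 digest of a metadata dict (keys sorted for determinism)."""
    payload = json.dumps(metadata, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class VersionedRecord:
    """Keeps frozen snapshots while allowing the public copy to be edited."""

    def __init__(self, metadata):
        self.current = dict(metadata)
        self.snapshots = {}  # label -> (frozen copy, checksum)

    def freeze(self, label):
        """Store an independent copy of the current metadata under 'label'."""
        frozen = copy.deepcopy(self.current)
        digest = _checksum(frozen)
        self.snapshots[label] = (frozen, digest)
        return digest

    def edit(self, **changes):
        """Change the public copy only; existing snapshots are untouched."""
        self.current.update(changes)

    def verify(self, label):
        """Confirm that a snapshot still matches the checksum taken when frozen."""
        frozen, digest = self.snapshots[label]
        return _checksum(frozen) == digest
```

For example, calling `freeze("rqf-submission")` before later corrections leaves the submitted metadata retrievable and verifiable alongside the edited public record.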
26Other benefits of all this data
- Development of the University's Research Profile system. The migration of some content from ResearchMaster to the repository has created an opportunity to redevelop some of the other services which make use of publications information.
- Critical mass of publications in your IR. Using your IR to support RQF will most likely end up filling it with an enormous amount of valuable publications information. Great for recruiting further content or for showcasing aspects of your institution's research. If you are also able to expose the metadata to harvesters such as Google, OAIster etc., you will see an enormous increase in traffic to your IR and records too.
- Use the IR to make your research more accessible. By depositing publications in your IR and including legal versions of the outputs, you dramatically increase the accessibility of your research outputs.
27All done... Thank you!
For more information about UQ eSpace or this presentation:
Andrew Bennett or Belinda Weaver
The University of Queensland Library
The University of Queensland
Brisbane QLD 4072 Australia
Telephone 07 33464342
Web http://espace.library.uq.edu.au
Email a.bennett_at_library.uq.edu.au