Title: CC-interop: A Post Mortem, Strathclyde University, 13th October 2004
CC-interop: A Post Mortem
http://ccinterop.cdlr.strath.ac.uk/
George Macgregor and Gordon Dunsire, Centre for Digital Library Research, Department of Computer and Information Sciences
- Introduction
- George: Introduction / Background and Work Package A (Work Package C)
- Gordon: Work Package B and the Future!
- All project reports are available at http://ccinterop.cdlr.strath.ac.uk/
- What is a Distributed Catalogue?
- Each institution has a database located at the heart of its Library Management System
- This database can be accessed from outside the institution
- Searches can be performed using Z39.50
- - Z39.50 is an information retrieval protocol
- - A broadcast search can be conducted (this involves searching multiple databases / targets simultaneously: a Virtual Union Catalogue, or CLUMP if you prefer!)
- Software gathers results from the remote databases and presents them to the user
- A search can cover a sub-set of the databases available (e.g. CAIRNS has 1-19, InforM25 1-36 in and around the London area)
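The broadcast search described above can be sketched roughly as follows. This is only an illustration: the targets are hypothetical in-memory lists standing in for remote catalogues, and no real Z39.50 connection is made. The names `TARGETS`, `search_target` and `broadcast_search` are assumptions introduced for the example.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for remote library catalogues; a real
# clump would reach each target over Z39.50 rather than a local dict.
TARGETS = {
    "library_a": ["Union catalogues in practice", "Digital libraries"],
    "library_b": ["Digital libraries", "Cataloguing rules"],
    "library_c": ["Scottish history"],
}


def search_target(name, term):
    """Search one target and return (target name, matching titles)."""
    hits = [t for t in TARGETS[name] if term.lower() in t.lower()]
    return name, hits


def broadcast_search(term, targets):
    """Send the same query to every selected target and gather the results."""
    with ThreadPoolExecutor(max_workers=len(targets)) as pool:
        results = pool.map(lambda name: search_target(name, term), targets)
        return dict(results)


# Search only a sub-set of the available targets.
results = broadcast_search("digital", ["library_a", "library_b"])
```

Each target is queried in its own thread, so the overall wait is governed by the slowest target rather than the sum of all of them.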
- What is COPAC?
- COPAC: the CURL OPAC
- Institutional databases copied and fused together
- Thus producing a single, mammoth database
- Weekly data loads
- 26 UK library members, including the BL
- Administered by MIMAS, Manchester Computing
- On behalf of the JISC
[Diagram: the USER connects over the Internet either to the clump software (user interface), which in turn connects over the Internet to remote databases in library systems, or to COPAC (a single, large database).]
- CC-interop Project
- CC-interop: COPAC/Clumps Continuing Technical Cooperation Project
- Funded by JISC via the JISC Committee for the Information Environment
- Duration: May 2002 - June 2004 (Final Report submitted to JISC in July 2004)
- Three work packages
- - WP A: M25 Systems Team and MIMAS
- - WP B: CDLR and RIDING
- - WP C: CERLIM and project partners
- WP A
- Thorough technical investigations of cross-searching/linking between different architectures
- Tasks
- - Comparing how searches are carried out at the target database
- - Analysis of the record retrieval process
- - Performance testing
- - Detailed technical analysis of combined architecture options
WP B
Using CAIRNS (CDLR) and RIDING clumps with the SCONE Collection Level Description (CLD) service for:
- Investigating and specifying collection description standards requirements
- Looking at CLD schemas in relation to both the clumps and COPAC
- Looking at the intelligent selection of databases in clumps by CLDs, based on dynamic landscaping
- Working towards guidelines for coping with variations in cataloguing and indexing practices to facilitate interoperability between the clumps and COPAC
- WP C
- User Behaviour Study: areas such as
- - What do users do when they search large union catalogues?
- - Do they understand what it is they are searching?
- - Do they find what they are looking for?
- - What features would they like to see?
- CERLIM (MMU)
- - 11 user sessions at 3 partner sites
- - Pre-search questionnaire
- - Recorded searches of local clump and COPAC (Snag It)
- - Interview immediately after to discuss their experience
- - 3 focus groups of librarians
- - Set of 10 questions about a range of issues
- Report available on the project web site!
- But, to what end?
- To continue work undertaken by previous JISC funded programmes, eLib Phase 3, etc. Component of the Research Libraries Network (RLN)
- UK National Catalogue (formerly known as UKNUC)
- - Still on the JISC agenda
- - Likely to incorporate national, university and large public libraries
- - Likely to be a mix of physical and distributed architectures
- To complement the Serials Union Catalogue
- - SUNCAT project at EDINA
- WP A
- As mentioned, the primary remit of WP A was to investigate interoperability/interlinking between union catalogues of distributed and non-distributed architectures
- This entailed
- Investigating whether both models could be connected (i.e. adding a clump to COPAC and vice versa)
- Investigating relevant issues pertaining to searching performance, results issues, landscaping, etc.
- WP A Method (cont.)
- InforM25 Copy (CC25) added as COPAC Z-target
- Deployment of JAFER as middleware: Java Access to Electronic Resources, developed at Oxford for JISC 5/99
- Free Open Source software
- Customised for the purposes of CC-interop (logging facilities augmented, Extensible Stylesheet Language Transformations (XSLT), concatenations (mini-clump))
WP A Method (cont.)
- COPAC Interface Copy: enable independent logging, etc.
- Results Display Issues: detailed analysis of COPAC search result manipulation and display issues. Could they be applied in a distributed environment?
- Outputs and Results (WP A)
- Semantic interoperability and index composition
- Technical interoperability relatively easy, but limited semantic interoperability
- Disparate cataloguing and indexing practices impairing semantic interoperability (detailed findings, analysis and conclusions outlined in the CAIRNS final report)
- COPAC exploits features peculiar to physical union models (COPAC can enrich indexing, thus potentially improving the retrieval of relevant records)
- Outputs and Results (WP A) (cont.)
- Technical interoperability
- JAFER meets many of the needs for distributed catalogue services and could be used by the clumps. Further exploitation of JAFER recommended in the IE. (JAFER is being further investigated by CREE (Contextual Resource Evaluation Environment) as we speak)
- Technically possible to landscape using JAFER as middleware
- Query reconfiguration can be carried out within the middleware to ensure optimal searching of different Z-targets (although this functionality would not be necessary if there was wider adoption of the Bath Profile)
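As a rough illustration of query reconfiguration in middleware, the sketch below rewrites one logical query into a Prefix Query Format (PQF) string per target, substituting a different Bib-1 use attribute where a target is assumed not to support the requested one. It does not reflect how JAFER itself works: the target names, the `TARGET_FALLBACKS` table and the `reconfigure` function are all hypothetical.

```python
# Bib-1 "use" attribute numbers for a few common access points.
BIB1_USE = {"title": 4, "author": 1003, "isbn": 7, "any": 1016}

# Hypothetical per-target quirks: the access points each target cannot search,
# mapped to the access point the middleware should use instead.
TARGET_FALLBACKS = {
    "target_x": {},                   # assumed to support everything above
    "target_y": {"author": "any"},    # assumed to lack an author index
}


def reconfigure(access_point, term, target):
    """Return a PQF query string adjusted for the given target."""
    fallbacks = TARGET_FALLBACKS.get(target, {})
    effective = fallbacks.get(access_point, access_point)
    use = BIB1_USE[effective]
    escaped = term.replace('"', '\\"')
    return f'@attr 1={use} "{escaped}"'


query_x = reconfigure("author", "Dunsire", "target_x")
query_y = reconfigure("author", "Dunsire", "target_y")
```

If every target supported the same attribute combinations, as a shared profile would encourage, the fallback table would be empty and the same query string could be sent everywhere.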
- Outputs and Results (WP A) (cont.)
- Results processing
- Problems with record matching, de-duplication, consolidation and ranking in most distributed services
- COPAC on-the-fly routines could feasibly be applied to the clumps (such routines would possibly benefit from revision to reflect rapidly changing user behaviour: see WP C and the work of CIBER)
- Further testing is needed, as the algorithms developed by COPAC would add value to results display
- Transaction time: is a trade-off needed?
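To make record matching and consolidation concrete, here is a deliberately simple sketch. It is not COPAC's algorithm: the match key (ISBN if present, otherwise normalised title plus year), the record fields and the placeholder ISBN values are all assumptions chosen for illustration.

```python
import re
from collections import defaultdict


def match_key(record):
    """Build a crude match key: ISBN if present, else normalised title + year."""
    isbn = re.sub(r"[^0-9Xx]", "", record.get("isbn", ""))
    if isbn:
        return ("isbn", isbn.upper())
    title = re.sub(r"[^a-z0-9 ]", "", record["title"].lower())
    title = " ".join(title.split())
    return ("title", title, record.get("year"))


def consolidate(records):
    """Merge records sharing a match key, collecting the holding libraries."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record)
    merged = []
    for group in groups.values():
        first = group[0]
        merged.append({
            "title": first["title"],
            "holdings": sorted({r["source"] for r in group}),
        })
    return merged


# Made-up records from three hypothetical targets (ISBNs are placeholders).
records = [
    {"title": "Digital Libraries", "isbn": "0-00-000000-0", "source": "library_a"},
    {"title": "Digital libraries.", "isbn": "0000000000", "source": "library_b"},
    {"title": "Scottish History", "year": 1999, "source": "library_a"},
    {"title": "scottish  history", "year": 1999, "source": "library_c"},
]
merged = consolidate(records)
```

Doing this on the fly adds processing to every search, which is where the transaction-time trade-off arises.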
- Outputs and Results (WP A) (cont.)
- Response Times
- 90% of responses were received in under 1 second, with some responding in less than 0.125 seconds. Broadly fast times, worthy of further investigation
- No servers showed slower response times during what would be considered peak periods of heavy use of the local OPAC (i.e. mid-morning to early evening)
- Generally good performance: response problems are the result of non-response and how this is handled by the client software
- Further investigation: short time-outs, MORE user research, response times, Boolean, quick and dirty Z installations
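One way client software might handle non-responding targets is a short time-out: present whatever has arrived and report the stragglers rather than blocking. The sketch below assumes simulated targets that simply sleep. The delays, the time-out value and the function names are illustrative, not project measurements or any client's real behaviour.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def query_target(name, delay):
    """Hypothetical target: sleeps for `delay` seconds, then answers."""
    time.sleep(delay)
    return f"results from {name}"


def search_with_timeout(targets, timeout):
    """Query all targets at once; give up on any that miss the time-out."""
    pool = ThreadPoolExecutor(max_workers=len(targets))
    futures = {pool.submit(query_target, n, d): n for n, d in targets.items()}
    done, not_done = wait(futures, timeout=timeout)
    # Do not block on stragglers; report them as non-responding instead.
    pool.shutdown(wait=False)
    answered = {futures[f]: f.result() for f in done}
    timed_out = sorted(futures[f] for f in not_done)
    return answered, timed_out


# Simulated delays in seconds (assumed values, not project measurements).
answered, timed_out = search_with_timeout(
    {"fast_a": 0.01, "fast_b": 0.05, "silent_c": 1.0}, timeout=0.5
)
```

The enquirer sees the results from the responsive targets after at most the time-out period, together with a list of targets that did not answer.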
Over to Gordon..
- WP B: Using CAIRNS, RIDING clumps with SCONE
- Investigating and specifying collection description standards requirements
- Looking at CLD schemas in relation to both the clumps and COPAC
- Looking at the intelligent selection of databases in clumps by CLDs, based on dynamic landscaping
- Working towards guidelines for coping with variations in cataloguing and indexing practices to facilitate interoperability
CLD requirements
- Comparison of SCONE with other schemas (IESR, UKOLN, other clumps).
- General agreement on approach (all based on Heaney's work).
- SCONE schema modified to include additional attributes (e.g. music notations, education levels).
- Report published on similarities and differences between schemas and Heaney, including a comparative data dictionary.
- Modified SCONE schema tested against RIDING clump Collection Level Descriptions.
SCONE CLDs for COPAC collections
- Functional granularity: define collections for functional purposes.
- Can treat aggregations of metadata as collections: the same CLD schema is used in SCONE.
- But SCONE only relates metadata collections to the collections they describe: inter-collection links (super- and sub-collections) are confined to the collections themselves (to keep it simple!)
- So aggregation relationships for metadata (super- and sub-catalogues) can only be expressed via the collections described, using functional granularity.
- Thus COPAC generates a "COPAC collection" CLD.
SCONE CLDs for COPAC collections
The COPAC collection is a super-collection of member collections (e.g. Edinburgh, Aberdeen, Glasgow university libraries). Member collections have their own catalogues (OPACs).
[Diagram: COPAC collection; COPAC (catalogue); EUL collection; EUL catalogue. Key: italics = functional granularity CLD; no italics = existing collection CLD.]
Scottish collection in another clump
[Diagram: InforM25 collection; InforM25 (clump); GCL collection; GCL catalogue; MacColl collection.]
The (Ewan) MacColl collection is of Scottish interest. It is part of Goldsmiths College Library, which is a member of the InforM25 clump.
Hybrid union catalogues can be complex
[Diagram labels: Item-level metadata; Local catalogue; Z39.50 catalogue; Metadata repository; Harvested Union cat. A and B; Distributed Union cat. A and B; Physical Union cat.; UKNUC.]
Single item metadata can be aggregated repeatedly in physical and distributed union catalogues, with potentially confusing and inefficient results for the enquirer.
Solution: display nearest catalogue
[Diagram: COPAC collection; COPAC (catalogue); EUL collection; EUL catalogue.]
Further work is required to define "nearest" in terms of aggregation/granularity level in relation to collection/sub-collection, and taking into account that catalogues are often not completely co-extensive with the collections they describe (i.e. not all items are catalogued).
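One possible reading of "nearest catalogue" is sketched below: start at a collection and walk up through its super-collections until a collection with its own catalogue is found. This is only an assumption, since the definition of "nearest" is left as further work. The `COLLECTIONS` table reuses names from the slides, but its parent links and the choice to give the MacColl collection no catalogue of its own are illustrative.

```python
# Each collection names its super-collection (if any) and its own catalogue
# (if any). Treating "nearest" as "fewest steps up the hierarchy" is an
# assumption, as are the parent links recorded here.
COLLECTIONS = {
    "COPAC collection": {"parent": None, "catalogue": "COPAC (catalogue)"},
    "EUL collection": {"parent": "COPAC collection", "catalogue": "EUL catalogue"},
    "InforM25 collection": {"parent": None, "catalogue": "InforM25 (clump)"},
    "GCL collection": {"parent": "InforM25 collection", "catalogue": "GCL catalogue"},
    "MacColl collection": {"parent": "GCL collection", "catalogue": None},
}


def nearest_catalogue(name):
    """Walk up from a collection until one with its own catalogue is found."""
    current = name
    while current is not None:
        entry = COLLECTIONS[current]
        if entry["catalogue"]:
            return entry["catalogue"]
        current = entry["parent"]
    return None


maccoll = nearest_catalogue("MacColl collection")
eul = nearest_catalogue("EUL collection")
```

Under this reading an enquirer interested in the MacColl collection is shown the GCL catalogue rather than the whole InforM25 clump, and an enquirer interested in the EUL collection is shown the EUL catalogue rather than COPAC.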
Functional model of the (Scottish) information environment
[Diagram: Entry stage: initial landscape (Scottish Cultural Portal; SCONE). Survey stage: collection descriptions service (SCONE; Landscaper), drawing on collection-level descriptions.]
Functional model of the (Scottish) information environment
[Diagram: Discover stage: distributed union catalogue (CAIRNS); harvested union catalogue (HaIRST); union catalogue (COPAC). Detail stage: item metadata.]
Dynamic and static landscaping
- Intelligent selection of catalogues for item-level searching.
- Two modes of selection: static (selection is preset by system); dynamic (selection is under control of enquirer).
- Static selection is available via mini-clumps (preset groups of Z39.50 catalogues) and mini-landscapes (preset groups of CLDs).
- Dynamic selection is a browse and check facility in CAIRNS.
- Also IR facility for CLDs in SCONE: title, location town, subject (LCSH), subject strength (RCO Conspectus), education level, language, etc.
- CC-interop developed an m2m interface between SCONE and CAIRNS: dynamic landscape selects Z39.50 catalogues.
- Similar interface selects online (one-by-one) catalogues (more).
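Dynamic landscaping can be pictured as filtering collection-level descriptions by the enquirer's criteria and passing the matching catalogues on for item-level searching. The sketch below is a minimal illustration under assumed data: the field names, strength values and catalogue identifiers are hypothetical, and they are not taken from the SCONE schema or the SCONE to CAIRNS interface.

```python
# Hypothetical collection-level descriptions; field names and values are
# illustrative and are not taken from the SCONE schema.
CLDS = [
    {"title": "Collection one", "subject": "Music", "strength": 4,
     "z_target": "catalogue_a"},
    {"title": "Collection two", "subject": "Music", "strength": 2,
     "z_target": "catalogue_b"},
    {"title": "Collection three", "subject": "History", "strength": 5,
     "z_target": "catalogue_c"},
    {"title": "Collection four", "subject": "Music", "strength": 5,
     "z_target": None},  # described, but no Z39.50 catalogue
]


def landscape(clds, subject, min_strength):
    """Select Z39.50 targets whose CLDs match the enquirer's criteria."""
    targets = []
    for cld in clds:
        if (cld["subject"] == subject and cld["strength"] >= min_strength
                and cld["z_target"] and cld["z_target"] not in targets):
            targets.append(cld["z_target"])
    return targets


selected = landscape(CLDS, subject="Music", min_strength=3)
```

A static landscape would be the same list of targets fixed in advance by the system instead of computed from the enquirer's choices.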
Cataloguing to improve interoperability
- Variation between and within stated cataloguing standards, and divergent practice, degrade interoperability.
- Two groups of cataloguers, including CURL, InforM25, SCURL and others, invited to discuss at open meetings.
- Agree something should be done, but need national framework to achieve effective results.
- More communication would have significant impact.
- Further via SDDL (Scottish framework?), lis-ukbibs/AACR/DDC work.