Title: A New Kind of Catalog
1. A New Kind of Catalog
- Charley Pennell
- Principal Cataloger for Metadata
- North Carolina State University
- North Carolina Library Association 2007
2. Where is this talk headed?
- Local motivation
- National trends
- What is Endeca?
- Features
- Does Endeca work?
- Where are we going from here?
- Where is everybody else going?
3. Why a new catalog? What was wrong with the old one?
4. A little TRLN catalog primer
- TRLN libraries (Duke, NCCU, NCSU, UNC-CH) jointly develop and maintain BIS, 1985-1992
- DRA implemented for catalog (UNC and Duke continue with Acq/Serials modules), 1991-1993
- No integrated keyword/browse capability, 1993-1999
- Web2 catalog implemented, 1999-
- Sirsi and DRA merge in 2002; Taos DOA
5. A little TRLN catalog primer (2)
- NCSU and NCCU to Unicorn, Duke to Aleph, UNC-CH to Millennium, 2003-2004
- Sirsi/Dynix merger, 2004: vendor focus shifts (even more) toward the school/public market
- While agreeing to continue to support Web2, S/D is increasingly looking to merge all product catalogs into a single interface
6. What was the catalog lacking?
- Simplicity: a simple, hopefully uncluttered interface
- Interactivity: ways to interact with results to get better results
- Forgiveness: just fix my typos and case errors, don't make me feel stupid!
- Response time: always
- Real-time sorting: the limit is how many?!!
- Relevance ranking: as if!
- Web services: use the Web to repurpose data, enable mash-ups, add-ons, and improvements
7. Which interface is ready for immediate use?
9. So, why DOES everyone think that the catalog sucks/stinks?
- "Most integrated library systems, as they are currently configured and used, should be removed from public view."
  -- Roy Tennant, OCLC
10. The old model
11. The integrated library system
- Historically, the ILS developed as an inventory control system for use by library staff only
- First library automation systems (Plessey, CLSI, Geac, Innovative) were designed around circulation or acquisitions functions
- Interaction time was calibrated to the slow pace of backroom work, where the audience was basically captive
- Staff focus on known-item searching, not resource discovery
12. The catalog as part of the ILS
- The first integrated OPACs were veneers on top of existing inventory management systems: patrons and staff competed for system resources! They still do!
- First OPACs allowed for browse only; early keyword searching was restricted to certain fields (A/T/S) only
- Libraries with no IT support were stuck with what their vendor provided, and with the vendor's enhancement process for improvements
- Libraries with IT support created their own systems: BIS, NOTIS, Claremont Colleges, Georgetown, PALS, DOBIS/LIBIS
13. The state of the ILS in 2007
- Customer demands for increasing functionality in a marketplace with little to spend have reduced the ILS vendor pool through mergers and buyouts
- New functionality (multi-search, ERMS, E-Ref, ILL, etc.) is increasingly being met by stand-alone and third-party applications
- Increasing competition from open source (Koha, Evergreen, Scriblio, LibraryThing) and e-commerce
- Q: Is our dogged adherence to MARC the only thing keeping the remaining ILS vendors afloat?
14. The state of the catalog in 2007
- Library users' search expectations have been conditioned by interactions with commercial Websites and Google, with which libraries can barely afford to compete, but must
- Libraries are becoming increasingly virtual as users interact with us online (e-resources, Second Life)
- User expectations for online experiences are more interactive, instantaneous, and inviting
15. Perhaps most importantly
- The information resources in the catalog represent a shrinking percentage of what end users need or want
Calhoun's Aristotelian vs. Copernican views of the catalog
16. What do users want from the OPAC?
- Make subject searching in online catalogs easier using post-Boolean probabilistic searching with automatic spelling correction, term weighting, intelligent stemming, relevance feedback, and output ranking
- Streamline users' book selection decisions at the catalog by adding tables of contents and back-of-the-book indexes to cataloging (i.e., metadata) records
- Reduce the many failed subject searches by expanding the online catalog with full texts: journal and newspaper articles, encyclopedias, dissertations, government documents, etc.
- Increase finding strategies in online catalogs through the library classification
  -- Markey, Karen (2007). The online library catalog: Paradise lost and paradise regained? D-Lib Magazine, 13(1/2).
17.
- Many researchers express surprise at the brevity (from one to three words) of the queries people submit to online systems. Belkin tells why so few words make up their queries: "Precisely because of the inquirer's lack of knowledge about a problem area, it is impossible to specify what would resolve it." For Belkin, the saving grace is the inquirer's ability to recognize what he or she wants or does not want during the course of the search. Therein lies an important solution to the problem: information systems that report results for easy eyeballing and instantaneous recognition of relevant possibilities.
  -- Karen Markey
18. What is an Endeca?
20.
- A software company based in Cambridge, MA
- A search and information access technology provider for a number of major e-commerce websites
- Developers of the Endeca Information Access Platform
21. Endeca features
- Commercial-strength search/sort speeds
- Site customizable relevance ranking
- Faceted browse
- True browsing (LC classification)
- Spell-checking
- Did you mean?
- Automatic word stemming
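Endeca's own spelling-correction method isn't described in this talk. As a rough illustration of the "Did you mean?" idea, the sketch below uses Python's standard `difflib` to swap each unknown query word for its closest indexed term. The vocabulary, the 0.8 similarity cutoff, and the function name are assumptions made for the example.

```python
import difflib

# Hypothetical index vocabulary; a real engine would build this from the indexed records.
VOCABULARY = ["catalog", "cataloging", "metadata", "subject", "faceted", "browse", "relevance"]


def did_you_mean(query, vocabulary=VOCABULARY, cutoff=0.8):
    """Suggest a corrected query, or return None if every term is already indexed.

    Case is ignored; each unknown term is replaced by its closest vocabulary
    entry when difflib's similarity ratio is at least `cutoff`, otherwise
    it is left unchanged.
    """
    known = set(vocabulary)
    suggestion, changed = [], False
    for term in query.lower().split():
        if term not in known:
            close = difflib.get_close_matches(term, vocabulary, n=1, cutoff=cutoff)
            if close:
                term, changed = close[0], True
        suggestion.append(term)
    return " ".join(suggestion) if changed else None


print(did_you_mean("Facted catalgo"))  # prints: faceted catalog
```

Returning `None` when nothing changed lets the interface show the "Did you mean?" prompt only when it has something to offer.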
22. Endeca at NCSU Libraries
- Went live in January 2006
- Works with a text version of a daily snapshot of the Libraries' MARC and other metadata
- Used to improve the discovery portion of the library catalog
- Interoperates with the ILS for holdings and current availability status
- Web2 interface still present for known-item and authority searching
23. Implementation timeline
- License / negotiation: Spring 2005
- Acquire: Summer 2005
- Implementation
  - August 2005: vendor training
  - September 2005: finalize requirements
  - October 2005-January 2006: design and development
  - January 12, 2006: go-live date
- Widen to TRLN partners: Winter 2008
24. Implementation Team
- Implementation Team brought together from IT, DLI, Cataloging, Collections, Reference, and Circulation
- Worked on indexing, UI, usability testing, etc.
- Areas of contention
  - Number of initial search boxes (1 or 2)
  - Order and grouping of facets
  - Placement of classification hierarchies and breadcrumbs
  - Use of search and browse on tabs
- Visualization aided by Tito's wireframes
26. Brief view vs. Full view gives the user a choice about displaying holdings.
Reduces complexity of continuing and online resources.
8th (and final) revision: aggregate holdings information by library.
27. NCSU Endeca features
(Annotated screenshot; labeled areas: Breadcrumbs, Call browse, Results, Facets)
28. Features we started with
- Faceted browse
- Availability facet
- Breadcrumbs
- Spell check / Did you mean
- Hierarchical subject browse based on LCC
- Fuzzy link to live Web2 data
- New book browse for titles added in the last week only
29. Features that we've added
- New book browse based on relative date (last week, last month, last three months)
- RSS feeds based on user results
- Search within results
- Send search to TRLN partners
- Static unique link to live Web2 data
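A relative-date "new book" browse can be modeled as buckets computed from each record's date cataloged every time the daily snapshot is re-indexed. The sketch below assumes that model. The bucket widths (7, 30, and 90 days), the cumulative behavior, and the function name are hypothetical choices, not NCSU's documented configuration.

```python
from datetime import date, timedelta

# Hypothetical bucket windows matching the slide's labels; "last month" is
# approximated as 30 days and "last three months" as 90 days.
NEWNESS_BUCKETS = [
    ("Last week", timedelta(days=7)),
    ("Last month", timedelta(days=30)),
    ("Last three months", timedelta(days=90)),
]


def newness_facets(date_cataloged, today):
    """Return every "new" facet value that applies to a record.

    The buckets are cumulative: a title cataloged three days ago is also new
    within the last month and the last three months.
    """
    age = today - date_cataloged
    if age < timedelta(0):
        return []  # a cataloging date in the future is treated as bad data
    return [label for label, window in NEWNESS_BUCKETS if age <= window]


print(newness_facets(date(2007, 2, 1), today=date(2007, 3, 1)))
# prints: ['Last month', 'Last three months']
```

Because the buckets are relative to `today`, the values have to be recomputed on every load, which a daily snapshot does anyway.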
30. Relevance ranking
- Based on a locally customizable algorithm
- Most relevant: query exactly as entered
- For multi-term searches: phrase match
- Field match
  - title match more relevant than notes match
- Other factors
  - number of fields matched
  - weighted frequency
  - static ordering (publication date, circulation stats)
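These factors are listed without saying how they combine. One plausible reading, assumed here, is a tiered sort where each factor only breaks ties left by the factors before it. The sketch encodes that as a Python tuple sort key. The field weights, the `Record` type, and every function name are hypothetical, and Endeca's own ranking may combine these factors differently.

```python
import re
from dataclasses import dataclass, field

# Hypothetical field weights: a title match outranks a notes match.
FIELD_WEIGHTS = {"title": 3, "author": 2, "subject": 2, "notes": 1}


@dataclass
class Record:
    rec_id: str
    fields: dict = field(default_factory=dict)  # field name -> text
    pub_year: int = 0  # static tie-breaker


def tokenize(text):
    return re.findall(r"\w+", text.lower())


def contains_phrase(tokens, phrase):
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


def relevance_key(record, query):
    """Build a sort key; earlier elements dominate later ones."""
    terms = tokenize(query)
    tokens = {name: tokenize(text) for name, text in record.fields.items()}
    exact = any(toks == terms for toks in tokens.values())  # query exactly as entered
    phrase = len(terms) > 1 and any(contains_phrase(t, terms) for t in tokens.values())
    matched = [name for name, toks in tokens.items() if any(t in toks for t in terms)]
    best_field = max((FIELD_WEIGHTS.get(name, 1) for name in matched), default=0)
    weighted_freq = sum(
        FIELD_WEIGHTS.get(name, 1) * sum(toks.count(t) for t in terms)
        for name, toks in tokens.items()
    )
    return (exact, phrase, best_field, len(matched), weighted_freq, record.pub_year)


def rank(records, query):
    return sorted(records, key=lambda r: relevance_key(r, query), reverse=True)


records = [
    Record("a", {"title": "Library systems", "notes": "Includes a chapter on the online catalog"}, 2001),
    Record("b", {"title": "The online catalog"}, 1995),
    Record("c", {"title": "Online catalog"}, 1990),
    Record("d", {"notes": "Catalog of online resources"}, 2006),
]
print([r.rec_id for r in rank(records, "online catalog")])  # prints: ['c', 'b', 'a', 'd']
```

Record "d" is the newest, but it ranks last because it fails the phrase test, and the static publication-date factor only decides otherwise-equal ties.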
31. Faceting at the NCSU Libraries
- Follows on what we have learned from the commercial Web search model
- Mines metadata already available via the MARC record, local class number, ILS item categories, circ status, and date stamping
- Required massive clean-up of 6xx subdivisions
- Allows both pre- and post-coordinate limits
- Uses table mapping to enable drilling down through call number results
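The table mapping for call-number drill-down can be pictured as a lookup from class letters and number ranges to captions. The sketch below uses a tiny, illustrative excerpt of LC Classification, where a real table would cover the full schedules. It turns a call number into a broad-to-narrow path and counts results at a chosen depth, the way a drill-down facet would. All table names and function names are hypothetical.

```python
import re
from collections import Counter

# Illustrative excerpt only: a real table would cover the full LC schedules.
LCC_CLASSES = {"H": "Social sciences", "Q": "Science"}
LCC_SUBCLASSES = {"HF": "Commerce", "QA": "Mathematics", "QC": "Physics"}
LCC_RANGES = [("QA", 75.5, 76.95, "Electronic computers. Computer science")]

# Class letters followed by the class number, e.g. "QA76.9" in "QA76.9.D3 S56 2005".
CALL_NUMBER = re.compile(r"^([A-Z]{1,3})\s*(\d+(?:\.\d+)?)")


def lcc_path(call_number):
    """Map an LC call number to a broad-to-narrow list of captions."""
    m = CALL_NUMBER.match(call_number.strip().upper())
    if not m:
        return []  # not an LC call number (e.g. "Online resource")
    letters, number = m.group(1), float(m.group(2))
    path = []
    if letters[0] in LCC_CLASSES:
        path.append(LCC_CLASSES[letters[0]])
    if letters in LCC_SUBCLASSES:
        path.append(LCC_SUBCLASSES[letters])
    for prefix, low, high, caption in LCC_RANGES:
        if prefix == letters and low <= number <= high:
            path.append(caption)
    return path


def drilldown_counts(call_numbers, depth):
    """Count results under each hierarchy node at the given depth."""
    counts = Counter()
    for cn in call_numbers:
        path = lcc_path(cn)
        if len(path) >= depth:
            counts[tuple(path[:depth])] += 1
    return counts


print(lcc_path("QA76.9.D3 S56 2005"))
# prints: ['Science', 'Mathematics', 'Electronic computers. Computer science']
```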
32. Facet refinements
- Availability
- Author
- Library
- Format
- Language
- New(ness)
- LC Classification
- Subject Topic
- Subject Genre
- Subject Region
- Subject Era
33. A single facet need not represent data from a single field
- Single Unicorn item types (Book, Kit, Manuscript, Map, Data set)
- Multiple Unicorn item types (Audio, Microform, Thesis/Dissertation, Software & Multimedia, Videos)
- Leader byte 07 (Bib lvl): Journal, Magazine
- Library (Online)
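A Format facet like the one above can be assembled from three sources. The minimal sketch below maps item types, MARC leader byte 07, and library location into a single set of facet values. The Unicorn item-type codes and the `ONLINE` and `DHHILL` library codes are hypothetical, since the talk doesn't list the real ones. Leader/07 value `s` is the MARC bibliographic-level code for a serial.

```python
# Hypothetical Unicorn item-type codes; the real codes are not given in the talk.
ITEM_TYPE_TO_FORMAT = {
    # one item type -> one facet value
    "BOOK": "Book", "KIT": "Kit", "MANUSCRIPT": "Manuscript",
    "MAP": "Map", "DATASET": "Data set",
    # several item types -> one facet value
    "CD": "Audio", "CASSETTE": "Audio",
    "MICROFICHE": "Microform", "MICROFILM": "Microform",
    "THESIS": "Thesis/Dissertation", "DISSERTATION": "Thesis/Dissertation",
    "DVD": "Videos", "VHS": "Videos",
}
SERIAL = "s"               # MARC leader/07 (bibliographic level) code for a serial
ONLINE_LIBRARY = "ONLINE"  # hypothetical library code for electronic resources


def format_facets(leader, item_types, libraries):
    """Combine item types, leader/07, and library into one Format facet."""
    formats = {ITEM_TYPE_TO_FORMAT[t.upper()] for t in item_types if t.upper() in ITEM_TYPE_TO_FORMAT}
    if len(leader) > 7 and leader[7] == SERIAL:
        formats.add("Journal, Magazine")
    if any(lib.upper() == ONLINE_LIBRARY for lib in libraries):
        formats.add("Online")
    return sorted(formats)


print(format_facets("00000cas a2200000 a 4500", ["BOOK"], ["ONLINE"]))
# prints: ['Book', 'Journal, Magazine', 'Online']
```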
34. Ranking facet results by number of postings makes sense in a short list, but not in a long list
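One way to act on this observation is to rank short facet lists by postings and fall back to alphabetical order for long ones, where users are more likely scanning for a known name. The sketch below assumes a 10-value cutoff, since the slide gives no number. The function name is hypothetical.

```python
def order_facet_values(counts, short_list_limit=10):
    """Order facet values (a mapping of value -> number of postings) for display.

    Short lists are ranked by postings, most first (ties alphabetical);
    long lists fall back to case-insensitive alphabetical order.
    """
    if len(counts) <= short_list_limit:
        return sorted(counts, key=lambda value: (-counts[value], value.casefold()))
    return sorted(counts, key=str.casefold)


print(order_facet_values({"English": 40, "Spanish": 5, "French": 5}))
# prints: ['English', 'French', 'Spanish']
```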
35. The author facet is less useful in some types of searches...
36. ...than others!
37. Technical overview
(Diagram: NCSU exports and reformats raw MARC data as flat text files; within the Endeca Information Access Platform, the Data Foundry parses the text files into indices for the MDEX Engine; the NCSU Web Application communicates with the MDEX Engine over HTTP.)
38. MARC ingest
- MARC → flat text file(s) for ingest by Endeca.
- Transformation accomplished with MARC4J.
- Opportunity to manipulate data on the back-end.
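NCSU's transformation used MARC4J (Java), and its output layout isn't shown here. As an illustration of the same idea, the Python sketch below takes an already-parsed record, represented as hypothetical `(tag, [(code, value), ...])` tuples rather than MARC4J objects, and writes tab-delimited `name<TAB>value` lines. The field map and the punctuation clean-up are assumptions. They show where back-end data manipulation can happen.

```python
# Hypothetical field map: MARC tag -> (output field name, subfields kept, joiner).
FIELD_MAP = {
    "100": ("Author", "a", " "),
    "245": ("Title", "ab", " "),
    "650": ("Subject", "avxyz", " -- "),
}
TRAILING_PUNCT = " /:;,."


def clean(value):
    # Strip ISBD-style trailing punctuation; note this also removes the period
    # from abbreviations such as "Jr.", a trade-off a real pipeline would handle.
    return value.rstrip(TRAILING_PUNCT)


def to_flat_text(record_id, fields):
    """Render one parsed record as tab-delimited lines, ending with a blank line."""
    lines = [f"RecordID\t{record_id}"]
    for tag, subfields in fields:
        if tag not in FIELD_MAP:
            continue  # fields without a mapping are dropped
        name, wanted, joiner = FIELD_MAP[tag]
        value = joiner.join(clean(v) for code, v in subfields if code in wanted)
        if value:
            lines.append(f"{name}\t{value}")
    return "\n".join(lines) + "\n\n"


record = [
    ("100", [("a", "Doe, Jane.")]),
    ("245", [("a", "Faceted catalogs /")]),
    ("500", [("a", "Includes index.")]),
    ("650", [("a", "Online library catalogs"), ("x", "Subject access.")]),
]
print(to_flat_text("rec001", record), end="")
```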
39. Transformed data
40. The end result
41. Other Endeca library catalogs
- Phoenix Public Library: http://www.phoenixpubliclibrary.org/
- McMaster University: http://libcat.mcmaster.ca
- Florida Center for Library Automation: http://catalog.fcla.edu/
- Individual Florida universities: http://fs.catalog.fcla.edu/, etc.
42. Does Endeca work?
43. Problems: authority control
- Endeca is a keyword search engine; browse can only be effected using sort options
- There is no authority control within Endeca itself; rather, it relies on AC within the ILS
- To make use of available metadata, subjects were split along subdivisions; authors were not
- Talks were held with the vendor to explain the potential for drawing on authority x-refs to collocate searches
44. Problems: subject context
- Problems with wrong delimiter values (esp. $v)
- Problems maintaining context in atomized LCSH
  - One-way relationships
    - English language $v Dictionaries $x Spanish
  - Chronological headings devoid of geographic context
    - Cuba $x History $y Revolution, 1959
  - Phrase headings expressed in multiple subdivisions
    - Prisoners $x Abuse of
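To see how context is lost, consider splitting each subject heading into the Topic, Genre, Region, and Era facets by subfield code: $a and $x as topic, $v as genre, $y as era, $z as region, with a 651 $a treated as a region. This mapping is an assumption, not the NCSU configuration. Once atomized, "Revolution, 1959" sits in the Era facet with nothing tying it to Cuba, and "Spanish" becomes a bare topic.

```python
# Assumed subfield-to-facet mapping, using the facet names listed earlier.
SUBFIELD_FACETS = {
    "a": "Subject Topic",
    "x": "Subject Topic",
    "v": "Subject Genre",
    "y": "Subject Era",
    "z": "Subject Region",
}


def atomize(tag, subfields):
    """Split one 650/651 heading into facet values, keyed by facet name."""
    facets = {}
    for code, value in subfields:
        # In a 651 the $a is itself a place name, so it belongs with the regions.
        facet = "Subject Region" if (tag == "651" and code == "a") else SUBFIELD_FACETS.get(code)
        if facet:
            facets.setdefault(facet, []).append(value)
    return facets


print(atomize("651", [("a", "Cuba"), ("x", "History"), ("y", "Revolution, 1959")]))
# prints: {'Subject Region': ['Cuba'], 'Subject Topic': ['History'], 'Subject Era': ['Revolution, 1959']}
```

Keeping the full, unsplit heading in a separate field alongside the facets is one way to preserve the context that atomizing removes.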
45. Problems: subject hierarchies
- Chronological hierarchy not built into $y
  - 19th century does not subsume 1800-1809, 1801-1861, 1809-1817, 1815-1861, 1817-1825, Civil War, 1861-1865, etc.
  - Geological periods exist as text only (Ordovician, Pleistocene, etc.)
  - Some chronological headings are expressed as text in 650 $a
    - Middle Ages
    - Nineteen sixties
- Geographic hierarchy not consistent between 651 and 650
  - $z North Carolina $z Raleigh
  - $a Raleigh (N.C.)
- BT/NT/RT relationships from authority file lacking
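Building the missing chronological hierarchy starts with turning $y strings into year spans. The minimal sketch below assumes that "19th century" means 1800-1899 (and likewise for other centuries), and that any other four-digit years mark a span's endpoints. Text-only periods such as "Middle Ages" or "Ordovician" yield no span, which is exactly the gap described above. The function names are hypothetical.

```python
import re

RANGE = re.compile(r"(\d{4})-(\d{4})")
CENTURY = re.compile(r"(\d{1,2})(?:st|nd|rd|th) century", re.IGNORECASE)
YEAR = re.compile(r"\b(\d{4})\b")


def year_span(value):
    """Turn a chronological subdivision into (first_year, last_year), or None."""
    m = RANGE.search(value)
    if m:
        return int(m.group(1)), int(m.group(2))
    m = CENTURY.search(value)
    if m:
        start = (int(m.group(1)) - 1) * 100  # assumes "19th century" = 1800-1899
        return start, start + 99
    m = YEAR.search(value)
    if m:
        return int(m.group(1)), int(m.group(1))
    return None  # text-only periods such as "Middle Ages" or "Ordovician"


def subsumes(broader, narrower):
    """True if the broader heading's span fully contains the narrower one's."""
    outer, inner = year_span(broader), year_span(narrower)
    if outer is None or inner is None:
        return False
    return outer[0] <= inner[0] and inner[1] <= outer[1]


print([s for s in ["1800-1809", "Civil War, 1861-1865", "1898-1901", "Middle Ages"]
       if subsumes("19th century", s)])  # prints: ['1800-1809', 'Civil War, 1861-1865']
```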
46. Some potential solutions
- Search behavior education
- FAST (Faceted Application of Subject Terminology)
- Web2 x-refs to redirect searches to Endeca
- Combining $z hierarchies
- Hierarchy lists
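Combining $z hierarchies could begin by normalizing both forms of a place name noted under subject hierarchies (650 $z North Carolina $z Raleigh vs. 651 $a Raleigh (N.C.)) to one broad-to-narrow path. The sketch below assumes a hand-built lookup from parenthetical qualifiers to the names used in $z. Its two entries are illustrative, and it handles only those simple 650 $z and 651 $a cases. The function and table names are hypothetical.

```python
import re

# Hypothetical lookup from parenthetical qualifiers to the names used in $z.
QUALIFIERS = {"N.C.": "North Carolina", "Va.": "Virginia"}
QUALIFIED_NAME = re.compile(r"^(?P<place>.+?) \((?P<qualifier>[^)]+)\)$")


def geographic_path(tag, subfields):
    """Return a broad-to-narrow place path for one subject heading.

    650 headings: the $z subdivisions, in order.
    651 headings: the $a name, expanded when its qualifier is in QUALIFIERS,
    so "Raleigh (N.C.)" becomes ("North Carolina", "Raleigh").
    """
    if tag == "651":
        name = next((v for c, v in subfields if c == "a"), "")
        m = QUALIFIED_NAME.match(name)
        if m and m.group("qualifier") in QUALIFIERS:
            return (QUALIFIERS[m.group("qualifier")], m.group("place"))
        return (name,) if name else ()
    return tuple(v for c, v in subfields if c == "z")


a = geographic_path("650", [("a", "Architecture"), ("z", "North Carolina"), ("z", "Raleigh")])
b = geographic_path("651", [("a", "Raleigh (N.C.)")])
print(a == b)  # prints: True
```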
47. What do our users think?
48.
- "The new Endeca system is incredible. It would be difficult to exaggerate how much better it is than our old online card catalog (and therefore that of most other universities). I've found myself searching the catalog just for fun, whereas before it was a chore to find what I needed."
  -- NCSU Undergrad, Statistics
- "The new library catalog search features are a big improvement over the old system. Not only is the search extremely fast, but seemingly it's much more intelligent as well."
  -- NCSU faculty, Psychology
49. Usability testing
50. Usability testing
51. Usage statistics
52. Newness wearing off?
- March 06-May 06
- July 06-January 07
53. July 06-Jan 07
54. July 06-Jan 07
55. July 06-Jan 07
56. Where are we going from here?
57. Future directions
- Additional hierarchies (geographic names, dates)
- Make use of NAF and SAF, particularly their cross-reference structure
- Massage underlying metadata
- Addition of Date Cataloged: Done!
- Addition of LC Class numbers to e-resources: Done!
- FRBR work numbers/records? Tested!
- FAST headings?
- Accommodation of true browse for all indexes
58. Future opportunities
- Expanding the scope of the implementation to the 10M records in TRLN (Duke, NCCU, NCSU, UNC-Chapel Hill)
- Enrich the catalog through external web services: book jackets, reviews, TOC, etc. (Amazon, OCLC, LibraryThing, Bowker Syndetics)
- Build use-case-based cross-application shopping cart functionality
- Integrate the catalog with other tools through web services: Free the Data
59. Web services
61. Mobile device searching
63. Where is everybody else going?
- Catalogs detaching themselves from the ILS
- Detached data lends itself to experimentation
- Don't have to throw out the baby with the bathwater when better interfaces come out
- Data itself safe and secure in the ILS
- MARC becoming superfluous; MARC's granularity: NOT!
- Social interaction: reviews, folksonomic tags, ratings
64. Phoenix Public Library on Endeca
65. III's new faceted catalog, Encore
66. Ex Libris Primo at Vanderbilt
67. Athens County, OH: Koha Zoom (open source)
68. Georgia PINES: Evergreen (open source)
69. Casey Bisson's Scriblio
70. Danbury Public powered by LibraryThing
71. OCLC WorldCat Local at UW
72. Thanks for listening!
- Charley Pennell
- Principal Cataloger for Metadata
- NCSU Libraries
- North Carolina State University
- Raleigh, NC 27695-7111
- cpennell_at_ncsu.edu
- More info at http://www.lib.ncsu.edu/endeca/