A New Kind of Catalog - PowerPoint PPT Presentation

Provided by: charley150
Learn more at: https://www.lib.ncsu.edu
Transcript and Presenter's Notes

Title: A New Kind of Catalog


1
A New Kind of Catalog
  • Charley Pennell
  • Principal Cataloger for Metadata
  • North Carolina State University
  • North Carolina Library Association 2007

2
Where is this talk headed?
  • Local motivation
  • National trends
  • What is Endeca?
  • Features
  • Does Endeca work?
  • Where are we going from here?
  • Where is everybody else going?

3
Why a new catalog? What was wrong with the old
one?
4
A little TRLN catalog primer
  • TRLN libraries (Duke, NCCU, NCSU, UNC-CH) jointly
    develop and maintain BIS, 1985-1992
  • DRA implemented for catalog (UNC & Duke continue
    Acq/Serials modules), 1991-1993
  • No integrated keyword/browse capability,
    1993-1999
  • Web2 catalog implemented, 1999-
  • Sirsi & DRA merge in 2002; Taos DOA

5
A little TRLN catalog primer 2
  • NCSU & NCCU to Unicorn; Duke to Aleph; UNC-CH to
    Millennium, 2003-2004
  • Sirsi/Dynix merger, 2004: vendor focus shifts
    (even more) toward school/public market
  • While agreeing to continue to support Web2, S/D
    increasingly looking to merge all product
    catalogs into single interface

6
What was the catalog lacking?
  • Simplicity: a simple, hopefully uncluttered
    interface
  • Interactivity: ways to interact with results to
    get better results
  • Forgiveness: just fix my typos and case errors,
    don't make me feel stupid!
  • Response time: always
  • Real-time sorting: the limit is how many?!!
  • Relevance ranking: as if!
  • Web services: use the Web to repurpose data,
    enable mash-ups, add-ons & improvements

7
Which interface is ready for immediate use?
8
(No Transcript)
9
So, why DOES everyone think that the catalog
sucks / stinks?
  • "Most integrated library systems, as they are
    currently configured and used, should be removed
    from public view."
  • - Roy Tennant, OCLC

10
The old model
11
The integrated library system
  • Historically, the ILS developed as an inventory
    control system for use by library staff only
  • First library automation systems (Plessey, CLSI,
    Geac, Innovative) were designed around
    circulation or acquisitions functions
  • Interaction time was calibrated to the slow pace
    of backroom work where the audience was basically
    captive
  • Staff focus on known-item searching, not resource
    discovery

12
The catalog as part of the ILS
  • The first integrated OPACs were veneers on top of
    existing inventory management systems: patrons &
    staff competed for system resources! They still
    do!
  • First OPACs allowed for browse only; early
    keyword searching restricted to certain fields
    (A/T/S) only
  • Libraries with no IT support were stuck with what
    their vendor provided and the enhancement process
    for improvements
  • Libraries with IT support created their own
    systems: BIS, NOTIS, Claremont Colleges,
    Georgetown, PALS, DOBIS/LIBIS

13
The state of the ILS in 2007
  • Customer demands for increasing functionality in
    a marketplace with little to spend have reduced
    the ILS vendor pool through mergers and buyouts
  • New functionality (multi-search, ERMS, E-Ref,
    ILL, etc.) increasingly being met by stand-alone
    and third party applications
  • Increasing competition from open source (Koha,
    Evergreen, Scriblio, LibraryThing) and e-commerce
  • Q: Is our dogged adherence to MARC the only thing
    keeping the remaining ILS vendors afloat?

14
The state of the catalog 2007
  • Library users' search expectations have been
    conditioned by interactions with commercial
    Websites and Google, with which libraries can
    barely afford to compete, but must
  • Libraries are becoming increasingly virtual as
    users interact with us online (e-resources,
    Second Life)
  • Users expect online experiences to be more
    interactive, instantaneous, and inviting

15
Perhaps most importantly
  • The information resources represented in the
    catalog represent a shrinking percentage of what
    end users need or want

Calhoun's Aristotelian vs. Copernican views of
the catalog
16
What do users want from the OPAC?
  • Make subject searching in online catalogs easier
    using post-Boolean probabilistic searching with
    automatic spelling correction, term weighting,
    intelligent stemming, relevance feedback, and
    output ranking
  • Streamline users' book selection decisions at the
    catalog by adding tables of contents and
    back-of-the-book indexes to cataloging (i.e.,
    metadata) records
  • Reduce the many failed subject searches by
    expanding the online catalog with full texts:
    journal and newspaper articles, encyclopedias,
    dissertations, government documents, etc.
  • Increase finding strategies in online catalogs
    through the library classification
  • -- Markey, Karen (2007). The online library
    catalog: Paradise lost and paradise regained,
    D-Lib Magazine, 13(1/2).

17
  • Many researchers express surprise at the brevity
    (from one to three words) of the queries people
    submit to online systems. Belkin tells why so
    few words make up their queries: "Precisely
    because of the inquirer's lack of knowledge about
    a problem area, it is impossible to specify what
    would resolve it." For Belkin, the saving grace
    is the inquirer's ability to recognize what he or
    she wants or does not want during the course of
    the search. Therein lies an important solution to
    the problem: information systems that report
    results for easy eyeballing and instantaneous
    recognition of relevant possibilities.
  • - Karen Markey

18
What is an Endeca?
19
(No Transcript)
20
  • A software company based in Cambridge, MA
  • A search and information access technology
    provider for a number of major e-commerce
    websites
  • Developers of the Endeca Information Access
    Platform

21
Endeca features
  • Commercial-strength search/sort speeds
  • Site customizable relevance ranking
  • Faceted browse
  • True browsing (LC classification)
  • Spell-checking
  • Did you mean?
  • Automatic word stemming

22
Endeca at NCSU Libraries
  • Went live in January 2006
  • Works with a text version of a daily snapshot of
    the Libraries' MARC & other metadata
  • Used to improve the discovery portion of the
    library catalog
  • Interoperates with ILS for holdings, current
    availability status
  • Web2 interface still present for known-item &
    authority searching

23
Implementation timeline
  • License / negotiation: Spring 2005
  • Acquire: Summer 2005
  • Implementation:
  • August 2005: vendor training
  • September 2005: finalize requirements
  • October 2005 - January 2006: design and
    development
  • January 12, 2006: go-live date
  • Widen to TRLN partners: Winter 2008

24
Implementation Team
  • Implementation Team brought together from IT,
    DLI, Cataloging, Collections, Reference,
    Circulation
  • Worked on indexing, UI, usability testing, etc.
  • Areas of contention:
  • Number of initial search boxes (1 or 2)
  • Order, grouping of facets
  • Placement of classification hierarchies,
    breadcrumbs
  • Use of search and browse on tabs
  • Visualization aided by Tito's wireframes

25
(No Transcript)
26
Brief view vs. Full view: gives user choice about
displaying holdings.
Reduces complexity of continuing and online
resources.
8th (and Final) Revision: aggregate holdings
information by library.
27
NCSU Endeca features
Breadcrumbs
Call browse
Results
Facets
28
Features we started with
  • Faceted browse
  • Availability facet
  • Breadcrumbs
  • Spell check / Did you mean
  • Hierarchical subject browse based on LCC
  • Fuzzy link to live Web2 data
  • New book browse for titles added in last week
    only

29
Features that we've added
  • New book browse based on relative date (last
    week, last month, last three months)
  • RSS feeds based on user results
  • Search within results
  • Send search to TRLN partners
  • Static unique link to live Web2 data

30
Relevance ranking
  • Based on locally customizable algorithm
  • Most relevant: query exactly as entered
  • For multi-term searches: phrase match
  • Field match: title match more relevant than
    notes match
  • Other factors: number of fields matched,
    weighted frequency, static ordering (publication
    date, circulation stats)
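
The ranking order on this slide can be pictured with a small sketch. It is not Endeca's algorithm or NCSU's configuration: the field weights, the `Record` class, and the use of publication year as the only static tie-breaker are assumptions made for illustration, and matching is plain substring matching.

```python
from dataclasses import dataclass

# Hypothetical weights: a title match counts for more than a notes match.
FIELD_WEIGHTS = {"title": 3.0, "subject": 2.0, "notes": 1.0}


@dataclass
class Record:
    title: str
    subject: str = ""
    notes: str = ""
    year: int = 0  # stand-in for static ordering (publication date)


def score(record, query):
    """Build a sort key for one record; larger tuples rank higher."""
    q = query.lower().strip()
    terms = q.split()
    fields = {name: getattr(record, name).lower() for name in FIELD_WEIGHTS}

    # 1. The query exactly as entered appears as a phrase in some field.
    phrase_hit = any(q in text for text in fields.values())
    # 2. Field match: sum the weights of fields that contain every term.
    matched = [n for n, text in fields.items() if all(t in text for t in terms)]
    field_score = sum(FIELD_WEIGHTS[n] for n in matched)
    # 3. Number of fields matched, then 4. static ordering by year.
    return (phrase_hit, field_score, len(matched), record.year)


def rank(records, query):
    """Return records sorted from most to least relevant."""
    return sorted(records, key=lambda r: score(r, query), reverse=True)


a = Record(title="Civil war history", year=1990)
b = Record(title="History of the war", notes="Civil war history notes", year=2005)
c = Record(title="War and civil society: a history", year=2000)
ranked = rank([c, b, a], "civil war history")
```

Here `a` ranks first (phrase match in the title), `b` second (phrase match only in notes), and `c` last (all terms present but no phrase match).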

31
Faceting at the NCSU Libraries
  • Follows on what we have learned from the
    commercial Web search model
  • Mines metadata already available via MARC record,
    local class number, ILS item categories, circ
    status, and date stamping
  • Required massive clean-up of 6xx subdivisions
  • Allows both pre- and post-coordinate limits
  • Uses table mapping to enable drilling down
    through call number results
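
As a rough illustration of faceting over metadata that is already in the records, the sketch below counts postings per facet value and narrows a result set by selected values. The record keys and sample values are hypothetical, and this is an in-memory toy rather than how the MDEX Engine works.

```python
from collections import Counter

# Hypothetical flattened records; each key stands in for one facet dimension.
RECORDS = [
    {"format": "Book", "language": "English", "library": "Main"},
    {"format": "Book", "language": "Spanish", "library": "Main"},
    {"format": "Videos", "language": "English", "library": "Branch"},
    {"format": "Book", "language": "English", "library": "Online"},
]


def refine(records, **selected):
    """Keep only records that match every selected facet value (drill-down)."""
    return [r for r in records if all(r.get(k) == v for k, v in selected.items())]


def facet_counts(records, dimension):
    """Count postings per value of one facet within the current result set."""
    return Counter(r[dimension] for r in records if dimension in r)


books = refine(RECORDS, format="Book")
language_counts = facet_counts(books, "language")
```

Because `refine` can run before or after a keyword search, the same function covers both pre- and post-coordinate limits in this toy model.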

32
Facet refinements
  • Availability
  • Author
  • Library
  • Format
  • Language
  • New(ness)
  • LC Classification
  • Subject Topic
  • Subject Genre
  • Subject Region
  • Subject Era

33
A single facet need not represent data from a
single field
  • Single Unicorn item types: Book, Kit, Manuscript,
    Map, Data set
  • Multiple Unicorn item types: Audio, Microform,
    Thesis/Dissertation, Software & Multimedia,
    Videos
  • Leader byte 07 (Bib lvl): Journal, Magazine
  • Library (Online)
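
A minimal sketch of how one Format facet might draw on several sources at once. The item-type codes, the "ONLINE" library code, the "Online" facet value, and the rule that a serial leader overrides the item type are all assumptions for illustration; the slides do not give NCSU's actual mapping.

```python
# Hypothetical ILS item-type codes that map one-to-one onto a facet value.
SINGLE_TYPE = {"BOOK": "Book", "KIT": "Kit", "MAP": "Map"}
# Hypothetical item-type codes where several types share one facet value.
MULTI_TYPE = {
    "CD": "Audio", "CASSETTE": "Audio",
    "MICROFILM": "Microform", "MICROFICHE": "Microform",
    "DVD": "Videos", "VHS": "Videos",
}


def format_facet(item_type, leader, library):
    """Derive Format facet values from item type, MARC leader, and library."""
    values = []
    # MARC leader position 07 is the bibliographic level; 's' means serial.
    if len(leader) > 7 and leader[7] == "s":
        values.append("Journal, Magazine")
    elif item_type in SINGLE_TYPE:
        values.append(SINGLE_TYPE[item_type])
    elif item_type in MULTI_TYPE:
        values.append(MULTI_TYPE[item_type])
    # The owning library can contribute a value of its own.
    if library == "ONLINE":
        values.append("Online")
    return values


dvd = format_facet("DVD", "00000ngm a2200000 a 4500", "MAIN")
ejournal = format_facet("BOOK", "00000nas a2200000 a 4500", "ONLINE")
```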

34
Ranking facet results by number of postings makes
sense in a short list, but not in a long list
35
The author facet is less useful in some types of
searches
36
than others!
37
Technical overview
[Diagram: raw MARC data → NCSU exports and
reformats → flat text files → Information Access
Platform (Data Foundry parses text files → indices
→ MDEX Engine) ↔ HTTP ↔ NCSU Web Application ↔ HTTP]
38
MARC ingest
  • MARC → flat text file(s) for ingest by Endeca.
  • Transformation accomplished with MARC4J.
  • Opportunity to manipulate data on the back-end.
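
NCSU did this transformation with MARC4J (Java). The Python sketch below is only a stand-in to show the idea: a record is modeled as already-parsed (tag, subfields) pairs, and the output property names, the pipe-delimited layout, and the choice of subfields are assumptions, not the real NCSU or Endeca file format.

```python
# Hypothetical mapping: MARC tag -> (output property name, subfield codes kept).
FIELD_MAP = {
    "100": ("Author", "a"),
    "245": ("Title", "ab"),
    "650": ("Subject", "avxyz"),
}


def flatten(record_id, fields):
    """Turn parsed MARC fields into 'Name|value' lines, one per kept field."""
    lines = ["Id|" + record_id]
    for tag, subfields in fields:
        if tag not in FIELD_MAP:
            continue  # unmapped tags are simply dropped in this sketch
        name, codes = FIELD_MAP[tag]
        # Back-end clean-up: strip trailing ISBD punctuation from each piece.
        parts = [text.rstrip(" /:;,") for code, text in subfields if code in codes]
        if parts:
            lines.append(name + "|" + " ".join(parts))
    return "\n".join(lines)


record = [
    ("100", [("a", "Melville, Herman,"), ("d", "1819-1891.")]),
    ("245", [("a", "Moby Dick /"), ("c", "Herman Melville.")]),
    ("650", [("a", "Whaling"), ("v", "Fiction")]),
    ("300", [("a", "xii, 634 p.")]),
]
flat = flatten("rec1", record)
```

The punctuation stripping is one example of the kind of back-end manipulation the last bullet refers to.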

39
Transformed data
40
The end result
  • Video

41
Other Endeca library catalogs
  • Phoenix Public Library
    http://www.phoenixpubliclibrary.org/
  • McMaster University http://libcat.mcmaster.ca
  • Florida Center for Library Automation
    http://catalog.fcla.edu/
  • Individual Florida universities
    http://fs.catalog.fcla.edu/, etc.

42
Does Endeca work?
43
Problems: authority control
  • Endeca is a keyword search engine; browse can
    only be effected using sort options
  • There is no authority control within Endeca
    itself; rather, it relies on AC within the ILS
  • To make use of available metadata, subjects were
    split along subdivisions. Authors were not.
  • Talks were held with the vendor to explain the
    potential for drawing on authority x-refs to
    collocate searches

44
Problems: subject context
  • Problems with wrong delimiter values (esp. $v)
  • Problems maintaining context in atomized LCSH
  • One-way relationships
  • e.g., English language $v Dictionaries $x Spanish
  • Chronological headings devoid of geographic
    context
  • e.g., Cuba $x History $y Revolution, 1959
  • Phrase headings expressed in multiple
    subdivisions
  • e.g., Prisoners $x Abuse of
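
The loss of context can be seen in a small sketch that splits a heading along its subdivisions, writing subfield delimiters as `$`. The mapping from subfield codes to the Subject Topic/Genre/Era/Region facets is an assumption based on the facet names in this talk, not NCSU's actual indexing rules.

```python
# Assumed mapping of subdivision codes to the subject facets named in the talk.
FACET_FOR_CODE = {
    "x": "Subject Topic",
    "v": "Subject Genre",
    "y": "Subject Era",
    "z": "Subject Region",
}


def parse_heading(heading):
    """Split 'Cuba $x History' into (code, text) pairs; the lead part is $a."""
    first, *rest = heading.split("$")
    pairs = [("a", first.strip())]
    for chunk in rest:
        if chunk:
            pairs.append((chunk[0], chunk[1:].strip()))
    return pairs


def atomize(heading, tag="650"):
    """Scatter one heading across facets, losing the links between its parts."""
    # In a 651 the main heading is a place; in a 650 it is a topic.
    main_facet = "Subject Region" if tag == "651" else "Subject Topic"
    out = []
    for code, text in parse_heading(heading):
        facet = main_facet if code == "a" else FACET_FOR_CODE.get(code)
        if facet:
            out.append((facet, text))
    return out


cuba = atomize("Cuba $x History $y Revolution, 1959", tag="651")
abuse = atomize("Prisoners $x Abuse of")
```

After atomizing, "Revolution, 1959" sits in the era facet with nothing tying it to Cuba, and "Abuse of" becomes a topic value that makes no sense on its own.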

45
Problems: subject hierarchies
  • Chronological hierarchy not built into $y
  • 19th century does not subsume 1800-1809,
    1801-1861, 1809-1817, 1815-1861, 1817-1825, Civil
    War, 1861-1865, etc.
  • Geological periods exist as text only
    (Ordovician, Pleistocene, etc.)
  • Some chronological headings are expressed as text
    in 650 $a
  • Middle Ages
  • Nineteen sixties
  • Geographic hierarchy not consistent between 651
    and 650
  • $z North Carolina $z Raleigh
  • $a Raleigh (N.C.)
  • BT/NT/RT relationships from authority file
    lacking
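
One way to picture the chronological problem: because era subdivisions are plain strings, "19th century" and "Civil War, 1861-1865" have no machine-readable relationship. The sketch below derives centuries from any four-digit years it finds. It is a hypothetical helper, not something described in the talk, and it assumes that 1800-1899 counts as the 19th century.

```python
import re


def centuries(era):
    """Return the set of centuries covered by four-digit years in an era string."""
    years = [int(y) for y in re.findall(r"\b(\d{4})\b", era)]
    if not years:
        # Text-only eras ('Middle Ages', 'Ordovician') would need a lookup list.
        return set()
    start, end = min(years), max(years)
    # Assumption: years 1800-1899 are treated as the 19th century.
    return set(range(start // 100 + 1, end // 100 + 2))


civil_war = centuries("Civil War, 1861-1865")
napoleonic = centuries("1789-1815")
middle_ages = centuries("Middle Ages")
```

With a derived value like this, a "19th century" limit could pick up the narrower date ranges; text-only headings still fall through and need a hand-built hierarchy list.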

46
Some potential solutions
  • Search behavior education
  • FAST (Faceted Application of Subject Terminology)
  • Web2 x-refs to redirect searches to Endeca
  • Combining z hierarchies
  • Hierarchy lists

47
What do our users think?
48
  • The new Endeca system is incredible. It would be
    difficult to exaggerate how much better it is
    than our old online card catalog (and therefore
    that of most other universities). I've found
    myself searching the catalog just for fun,
    whereas before it was a chore to find what I
    needed.
  • - NCSU Undergrad, Statistics
  • The new library catalog search features are a
    big improvement over the old system. Not only is
    the search extremely fast, but seemingly it's
    much more intelligent as well.
  • - NCSU faculty, Psychology

49
Usability testing
50
Usability testing
51
Usage statistics
52
Newness wearing off?
  • March 06 - May 06
  • July 06-January 07

53
July 06 - Jan 07
54
July 06 - Jan 07
55
July 06 - Jan 07
56
Where are we going from here?
57
Future directions
  • Additional hierarchies (geographic names, dates)
  • Make use of NAF, SAF, particularly
    cross-reference structure
  • Massage underlying metadata
  • Addition of Date Cataloged: Done!
  • Addition of LC Class numbers to e-resources:
    Done!
  • FRBR work numbers/records? Tested!
  • FAST headings?
  • Accommodation of true browse for all indexes

58
Future opportunities
  • Expanding the scope of the implementation to the
    10M records in TRLN (Duke, NCCU, NCSU, UNC-Chapel
    Hill)
  • Enrich catalog through external web services:
    book jackets, reviews, TOC, etc. (Amazon, OCLC,
    LibraryThing, Bowker Syndetics)
  • Build use-case based cross-application shopping
    cart functionality
  • Integrate catalog w/other tools through web
    services: Free the Data

59
Web services
60
(No Transcript)
61
Mobile device searching
62
(No Transcript)
63
Where is everybody else going?
  • Catalogs detaching themselves from ILS
  • Detached data lends itself to experimentation
  • Don't have to throw out baby with bathwater when
    better interfaces come out
  • Data itself safe and secure in ILS
  • MARC becoming superfluous? MARC's granularity:
    NOT!
  • Social interaction: reviews, folksonomic tags,
    ratings

64
Phoenix Public Library on Endeca
65
III's new faceted catalog, Encore
66
ExLibris Primo at Vanderbilt
67
Athens County, OH: Koha Zoom (open source)
68
Georgia PINES: Evergreen (open source)
69
Casey Bisson's Scriblio
70
Danbury Public powered by LibraryThing
71
OCLC WorldCat Local at UW
72
Thanks for listening!
  • Charley Pennell
  • Principal Cataloger for Metadata
  • NCSU Libraries
  • North Carolina State University
  • Raleigh, NC 27695-7111
  • cpennell_at_ncsu.edu
  • More info at http://www.lib.ncsu.edu/endeca/