Title: Endeca and faceted browsing: Giving the user a useful catalog
1Endeca and faceted browsing: Giving the user a useful catalog
- Scott Warren
- NCSU Libraries
- South Carolina Library Association Annual Meeting
- June 7, 2007
2Outline
- Problem and Context
- Online searching, shopping, and examples
- Demo
- Faceted Navigation
- Implementation Challenges
- Facet Usage Statistics
- Reflections
3The Context
4Online Catalogs
"Most integrated library systems, as they are currently configured and used, should be removed from public view." - Roy Tennant, CDL
5What is the problem?
- Existing catalogs are hard to use
- known item searching works pretty well, but
- users often do keyword searching and get large result sets returned in system sort order (last in, etc.)
- catalogs are unforgiving on spelling errors, stemming
- Authority searching is completely mystifying
6Catalog metadata is buried
- Subject headings are not leveraged in searching
- they should be browsed or linked from, not searched
- Data from the item record is not leveraged
- should be able to filter by item type, location, circulation status, popularity
7Word of the Day for Saturday, May 5, 2007
- moil \MOYL\, intransitive verb
- 1. To work with painful effort; to labor; to toil; to drudge.
- 2. To churn or swirl about continuously.
- 3. Toil; hard work; drudgery.
- 4. Confusion; turmoil.
8What's the big picture?
- Improve the quality of the library catalog user experience.
- Exploit our existing metadata infrastructure (make MARC work harder).
- Build a more flexible catalog tool that can be integrated with discovery tools of the future.
9What is Endeca?
- Software company based in Cambridge, MA
- Search/information access technology provider for a number of major e-commerce websites
- Developers of the Endeca Information Access Platform
10Why Endeca?
- Customized relevance ranking of results
- Better subject access by leveraging available metadata through facets
- Improved response time
- Enhanced natural language searching through spell correction, etc.
- Browse
11A question
- How is the new generation of library catalog being developed?
- informed and enhanced by search technologies developed outside of the library
- based on how our users know how to search, not on how we want them to search
- What does search look like for our users?
12Examples
15Faceted Navigation on the Web
17Facet
Value
21Faceted Navigation in Libraries
24Demonstration
25Faceted Navigation
27What is Faceted Navigation?
- Search and browse in a single interface
- Facets can vary in scope
- What is the item about?
- What kind of item is it?
- Where is it?
- Enables users to narrow results
- Macroscopic behavior of results set
- Clues to being on the right path
28Origins of Facets
- 1930s Ranganathan
- Colon Classification
29Cartesian Coordinates
30Coordinate System
(x, y, z) corresponds to (Library, LCSH, Format)
- Example points: (Branch 1, History, Book), (Branch 2, History, DVD)
- Multiple records could be associated with each coordinate point.
- Each point is associated with at least one record.
[Figure: three-dimensional axes labeled Library (Branch 1, Branch 2), LCSH (Art, History), and Format (Book, DVD), with the point (Branch 1, History, Book) marked]
31Another way to think about it
- 11-dimensional lattice space
- All points associated with at least one item/record
- Records can be associated with more than one point
- Keyword search selects the subset of points with the word(s) in the record
- Facets shown are those dimensions corresponding to the points in that set (nonzero values).
- Choosing a facet value is equivalent to slicing through the multidimensional lattice on a plane along that facet value and reducing the lattice's dimension by 1.
- Choose enough facets and you will get down to a few items (never a null set)
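The lattice picture above can be sketched in a few lines of Python. This is an illustrative toy, not Endeca's implementation: the sample records, the three facets (borrowed from the coordinate example), and the helper names `keyword_search`, `facet_counts`, and `refine` are all assumptions made for this sketch.

```python
from collections import Counter

# Toy records: each one sits at a point (Library, LCSH, Format) in the lattice.
# All titles and values are made up for illustration.
RECORDS = [
    {"title": "A history of textiles", "Library": "Branch 1", "LCSH": "History", "Format": "Book"},
    {"title": "Art history on film", "Library": "Branch 2", "LCSH": "History", "Format": "DVD"},
    {"title": "Modern art", "Library": "Branch 1", "LCSH": "Art", "Format": "Book"},
    {"title": "History of art", "Library": "Branch 2", "LCSH": "Art", "Format": "Book"},
]
FACETS = ("Library", "LCSH", "Format")


def keyword_search(records, word):
    """Select the subset of records whose title contains the word."""
    word = word.lower()
    return [r for r in records if word in r["title"].lower().split()]


def facet_counts(records):
    """Count records per facet value; only nonzero values ever appear."""
    return {facet: Counter(r[facet] for r in records) for facet in FACETS}


def refine(records, facet, value):
    """Slice the current result set along one facet value."""
    return [r for r in records if r[facet] == value]


hits = keyword_search(RECORDS, "history")   # keyword search picks a subset
counts = facet_counts(hits)                 # facet values to display
narrowed = refine(hits, "Format", "Book")   # user clicks Format: Book
```

Because the refinement values offered to the user come from `facet_counts` on the current result set, every choice leads to at least one record, which is the "never a null set" property.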
32Implementation
33Implementation Challenges
- Facet selection
- Interface design
- Data issues
34Endeca at NCSU
- Endeca used to improve the discovery portion of the library catalog
- Endeca software indexes 1.6 million MARC records exported nightly from Sirsi Unicorn ILS
- Backend functions of ILS remain intact
35Facets Implemented at NCSU
- Availability
- Author
- Library
- Format
- Language
- Browse New
- LC Classification
- Subject Topic
- Subject Genre
- Subject Region
- Subject Era
36Facet Selection
37Interface Design
- Iterative approach using wireframes
- Eight major revisions in a four-month period
- Still lots of room for improvement
38Technical Overview
- Endeca co-exists with SirsiDynix Unicorn ILS and Web2 online catalog
- Endeca handles keyword search
- Web2 handles authority search and detail page display
- Endeca indexes MARC records exported nightly from Unicorn
- Endeca serves as the discovery portion of the ILS
39Technical Overview
[Diagram: Raw MARC data -> NCSU exports and reformats -> Flat text files -> Data Foundry (parse text files) -> Indices -> MDEX Engine -> HTTP -> NCSU Web Application -> HTTP. The Data Foundry and MDEX Engine are parts of the Information Access Platform.]
40Technical Overview
Offline - Nightly
[Diagram: Raw MARC data -> NCSU exports and reformats -> Flat text files -> Data Foundry (parse text files) -> Indices -> MDEX Engine -> HTTP -> NCSU Web Application -> HTTP]
41Technical Overview
Always Online
[Diagram: Raw MARC data -> NCSU exports and reformats -> Flat text files -> Data Foundry (parse text files) -> Indices -> MDEX Engine -> HTTP -> NCSU Web Application -> HTTP]
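The "exports and reformats" step in the overview turns catalog records into flat text files for indexing. The slides do not show the actual file layout, so the following is only a minimal sketch under assumed conditions: the records are already decoded into dictionaries, the column names in `FIELDS` are hypothetical, and tab-delimited output is an arbitrary choice.

```python
import csv
import io

# Hypothetical, already-decoded records; real input would come from the ILS export.
records = [
    {"id": "1001", "title": "Textile science", "format": "Book", "library": "Textiles"},
    {"id": "1002", "title": "Polymer chemistry", "format": "Online", "library": "Online Resources"},
]
FIELDS = ["id", "title", "format", "library"]  # assumed column layout


def to_flat_text(records, fields=FIELDS):
    """Write a header row plus one tab-delimited line per record."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


flat = to_flat_text(records)
```

Each column that should become a facet (here `format` and `library`) stays a separate field, so the indexing step can treat it as its own dimension.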
42Implementation Team
- Seven member team
- 5 IT staff,
- 1 cataloging librarian,
- 1 reference librarian
- Timeline
- License / negotiation: Spring 2005
- Software acquisition: Summer 2005
- Implementation: Aug 2005 to Jan 2006
43Data Issues
- ILS data with MARC-8 encoding -> Text data with UTF-8 encoding
- Data consistency between ILS and Endeca catalog indexes (updates!)
- Data issues revealed by exposing metadata (e.g., subject headings) in facets
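One way to reason about the consistency problem between the ILS and the separate index is to compare successive exports and list what was added, deleted, or changed. The sketch below is an assumption about how such a check could look, not a description of what NCSU built; the record IDs, the sample text, and the helpers `checksum` and `diff_snapshots` are hypothetical.

```python
import hashlib


def checksum(record_text):
    """Fingerprint a record's exported text so changes can be detected."""
    return hashlib.sha256(record_text.encode("utf-8")).hexdigest()


def diff_snapshots(previous, current):
    """Compare two {record_id: checksum} snapshots of an export."""
    prev_ids, curr_ids = set(previous), set(current)
    added = sorted(curr_ids - prev_ids)
    deleted = sorted(prev_ids - curr_ids)
    changed = sorted(i for i in prev_ids & curr_ids if previous[i] != current[i])
    return added, deleted, changed


# Made-up snapshots of two nightly exports.
yesterday = {"1001": checksum("Textile science"), "1002": checksum("Polymer chemistry")}
today = {"1001": checksum("Textile science, 2nd ed."), "1003": checksum("Fiber physics")}

added, deleted, changed = diff_snapshots(yesterday, today)
```

A report like this makes it visible when the index has drifted from the ILS between nightly exports.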
44Outcomes
45Added search tools
- Automatic spell correction
- "Did you mean" suggestions
- Automatic stemming
- Bookmark-ability
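To make the "Did you mean" idea concrete, here is a small sketch that suggests close matches from a list of indexed terms. It uses Python's standard `difflib` module as a stand-in; it is not how Endeca computes its suggestions, and the vocabulary and the `did_you_mean` helper are hypothetical.

```python
import difflib

VOCABULARY = ["library", "liberty", "textiles", "history"]  # hypothetical index terms


def did_you_mean(term, vocabulary=VOCABULARY, limit=3):
    """Return close matches for a term that is not in the vocabulary."""
    term = term.lower()
    if term in vocabulary:
        return []  # the term is spelled as indexed; nothing to suggest
    return difflib.get_close_matches(term, vocabulary, n=limit, cutoff=0.6)


suggestions = did_you_mean("libary")
```

The best match comes first in the returned list, so an interface could offer `suggestions[0]` as the "Did you mean" link.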
46True browse
- Regain ability to browse catalog without entering
any search terms
47July 06 - Jan 07
52Dimension | Value | Requests
New | NEW | 56,286
Format | Book | 16,188
LC Classification | Q - Science | 12,462
Library | Textiles | 11,160
Library | D.H. Hill | 11,060
Availability | Available | 9,276
Library | Online Resources | 8,164
LC Classification | T - Technology | 8,052
Subject Topic | History | 7,915
Format | Online | 7,858
LC Classification | P - Language and literature | 7,005
LC Classification | H - Social Sciences | 6,953
Language | English | 6,854
Subject Region | United States | 6,298
Format | Journal, Magazine, or Serial | 4,621
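The rows above can also be rolled up by dimension to see which facets account for the most requests. The sketch below only sums the fifteen values listed on the slide, so the totals are partial: any requests for values outside this top-fifteen list are not included.

```python
from collections import defaultdict

# (dimension, value, requests) rows copied from the usage table above.
ROWS = [
    ("New", "NEW", 56286),
    ("Format", "Book", 16188),
    ("LC Classification", "Q - Science", 12462),
    ("Library", "Textiles", 11160),
    ("Library", "D.H. Hill", 11060),
    ("Availability", "Available", 9276),
    ("Library", "Online Resources", 8164),
    ("LC Classification", "T - Technology", 8052),
    ("Subject Topic", "History", 7915),
    ("Format", "Online", 7858),
    ("LC Classification", "P - Language and literature", 7005),
    ("LC Classification", "H - Social Sciences", 6953),
    ("Language", "English", 6854),
    ("Subject Region", "United States", 6298),
    ("Format", "Journal, Magazine, or Serial", 4621),
]

# Sum requests per dimension, then rank dimensions from most to least used.
totals = defaultdict(int)
for dimension, _value, requests in ROWS:
    totals[dimension] += requests

ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

Within these rows, New leads with 56,286 requests, followed by LC Classification (34,472), Library (30,384), and Format (28,667).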
53Usability testing
- 10 undergraduate students
- 5 with new Endeca-based interface
- 5 with old catalog interface
- Identical searching tasks
- Data collected
- Task difficulty/failure
- Task duration
56Usability testing
- For students, relevance ranking is key.
- July 06 - Jan 07: 19 continued to page 2
- Faceted navigation is intuitive, even for students who don't use it.
- Beware of library jargon
- keyword anywhere, keyword in subject
- User behavior is influenced by previous experience.
57Reflections
- Faceted navigation enables new ways to discover resources
- Library collections often contain rich descriptive metadata: exploit this!
- We have much to learn about how to optimize these interfaces for the user
- Great for collection analysis
58Analyzing collections
59Conclusions
60Features Not Supported
- Work level aggregations / roll-up
- Customization / personalization
- Folksonomies / user contributed content
- Recommender functionality
- Shopping cart functionality
61QuickSearch
62Future directions
- Experiment with FRBR search/display through partnership with OCLC.
- Integrate catalog w/other tools through web services
- OpenSearch, RSS
- Enrich catalog through external web services
- book jackets, reviews, etc. (Amazon/OCLC)
- Build modular shopping cart functionality.
- Use Endeca to index local collections.
63Big Issues
- Benchmarking
- Just how much better is it? For whom? When is it not better?
- Natural Language
- Revolutionary War problem
- Experimenting
- What is the optimal interface?
- Power Search?
64Big Wins
- Relevance ranking
- Speed / performance
- Locally managed presentation interface
- Persistent parameter based entry points
- Proving it could be done
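Bookmark-ability and persistent parameter-based entry points both rest on the same idea: the whole search state lives in the URL. The sketch below shows one way to encode and decode that state with Python's standard `urllib.parse`. The base address and the parameter names `q` and `facet` are hypothetical and do not reflect the actual parameters of the NCSU catalog or of Endeca.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "https://catalog.example.edu/search"  # hypothetical address


def build_url(keywords, facets):
    """Encode keywords and selected (facet, value) pairs as query parameters."""
    params = [("q", keywords)]
    params += [("facet", f"{name}:{value}") for name, value in facets]
    return f"{BASE_URL}?{urlencode(params)}"


def parse_url(url):
    """Recover the keywords and facet selections from a bookmarked URL."""
    query = parse_qs(urlsplit(url).query)
    keywords = query.get("q", [""])[0]
    facets = [tuple(item.split(":", 1)) for item in query.get("facet", [])]
    return keywords, facets


url = build_url("civil war", [("Format", "Book"), ("Library", "D.H. Hill")])
state = parse_url(url)
```

A URL with facet selections and no keywords covers the pure browse case, so a link to, say, all books in one library can be placed on any web page as an entry point.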