The Failure of Clustering in Search Interfaces

About This Presentation

Description:

The Failure of Clustering in Search Interfaces, or When/How/Why Clustering Can Be Successful in Search Interfaces. Marti Hearst, UC Berkeley, Oct 6, 2004. PowerPoint PPT presentation

Slides: 99

Provided by: peopleIsc4

Learn more at: https://people.ischool.berkeley.edu

Transcript and Presenter's Notes

Title: The Failure of Clustering in Search Interfaces

1
The Failure of Clustering in Search Interfaces
or When/How/Why Clustering Can Be Successful
in Search Interfaces
Marti Hearst, UC Berkeley, Oct 6, 2004
http://www.sims.berkeley.edu/hearst
2
Main Points

Grouping search results is desirable
However, getting good groups is difficult
Furthermore, incorporation of groups into
interfaces has not been done well
Good news: improvements are happening

3
Talk Outline

Why search interfaces are difficult to define
Definition of categories and clusters
Studies showing failure of clustering in
interfaces
A new development in clustering in web search
How to remedy these problems

4
Clustering Interface Problems

Big problem
Clusters used primarily as part of a
visualization
This just doesn't work
Every usability study says so
Lots of dots scattered about the screen are
meaningless to users
There is no inherent spatial relationship among
the documents
Need text to understand content
Another big problem
Clustering images according to an approximation
of visual similarity
This just doesn't work
What limited studies have been done say so
Instead group according to textual categories

5
Search interfaces are difficult to design

Content and queries are hugely varying
The scope of what people search for is all of
human knowledge and experience (!)
Interfaces must accommodate human differences in
Knowledge / life experience
Cultural background and expectations
Reading / scanning ability and style
Methods of looking for things (pilers vs. filers)

6
Abstractions Are Difficult to Represent

Text describes abstract concepts
Difficult to show the contents of text in a
visual or compact manner
Exercise
How would you show the preamble of the US
Constitution visually?
How would you show the contents of Joyce's
Ulysses visually? How would you distinguish it
from Homer's The Odyssey or McCourt's Angela's
Ashes?
The point: it is difficult to show text without
using text

7
Lack of Technical Understanding

Most people don't understand the underlying
methods by which search engines work.
Without appropriate explanations, most of 14
people had strong misconceptions about
ANDing vs. ORing of search terms
Some assumed the ANDing search engine indexed a
smaller collection; most had no explanation at
all
For empty results for the query "to be or not to be"
9 of 14 could not give an explanation that
remotely resembled stop word removal
For term order variation ("boat fire" vs. "fire
boat")
Only 5 out of 14 expected different results
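
The three behaviors participants struggled with (ANDing vs. ORing, stop word removal, and term order) can be illustrated with a toy index. This is only a minimal sketch, not how any particular search engine works: the stop word list, the three sample documents, and the function names are all made up for illustration.

```python
# Toy inverted index. STOP_WORDS and DOCS are illustrative assumptions.
STOP_WORDS = {"to", "be", "or", "not", "the", "a", "of", "in"}

DOCS = {
    1: "fire boat crews trained in the harbor",
    2: "boat fire damages marina",
    3: "fire safety tips",
}

def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]

def build_index(docs):
    """Map each remaining word to the set of documents containing it."""
    index = {}
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index.setdefault(word, set()).add(doc_id)
    return index

def search(index, query, mode="AND"):
    """AND intersects the posting sets; OR unions them."""
    terms = tokenize(query)
    if not terms:
        # Every query word was a stop word, so nothing is left to match.
        return set()
    postings = [index.get(term, set()) for term in terms]
    if mode == "AND":
        return set.intersection(*postings)
    return set.union(*postings)

index = build_index(DOCS)
print(search(index, "boat fire", "AND"))   # same set as "fire boat"
print(search(index, "boat fire", "OR"))    # a superset of the AND result
print(search(index, "to be or not to be")) # empty: only stop words
```

In this sketch term order cannot matter, because the index stores sets of documents rather than word positions. An engine that keeps positions or ranks by phrase match could return different results for "boat fire" and "fire boat", which is what 5 of the 14 participants expected.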

Muramatsu and Pratt, Transparent Queries:
Investigating Users' Mental Models of Search
Engines, SIGIR 2001.
8
Other Issues

Vocabulary Disconnect
If you ask a set of people to describe a set of
things there is little overlap in the results.
If one person assigns a name, the probability of
it NOT matching another person's is about
80%
It is difficult to represent content compactly
Small details matter
People are reluctant to change search interfaces

Furnas et al., The Vocabulary Problem in
Human-System Communication. Commun. ACM 30(11):
964-971 (1987)
9
The Need to Group

Interviews with lay users often reveal a desire
for better organization of retrieval results
Useful for suggesting where to look next
People prefer links over generating search terms
But only when the links are for what they want
Two main approaches for text and images
Group items according to pre-defined categories
Group items into automatically-created clusters

Ojakaar and Spool, Users Continue After Category
Links, UIETips Newsletter,
http://world.std.com/uieweb/Articles/, 2001
10
Categories

Human-created
But often automatically assigned to items
Arranged in hierarchy, network, or facets
Can assign multiple categories to items
Or place items within categories
Usually restricted to a fixed set
So help reduce the space of concepts
Intended to be readily understandable
To those who know the underlying domain
Provide a novice with a conceptual structure
There are many already made up!
However, until recently, their use in interfaces
has been
Under-investigated
Not met their promise

11
Category System Examples
13
Category System Examples
eat.epicurious.com
15
Example of Faceted Metadata: Medical Subject
Headings (MeSH)

Facets
1. Anatomy [A]
2. Organisms [B]
3. Diseases [C]
4. Chemicals and Drugs [D]
5. Analytical, Diagnostic and Therapeutic
Techniques and Equipment [E]
6. Psychiatry and Psychology [F]
7. Biological Sciences [G]
8. Physical Sciences [H]
9. Anthropology, Education, Sociology and
Social Phenomena [I]
10. Technology and Food and Beverages [J]
11. Humanities [K]
12. Information Science [L]
13. Persons [M]
14. Health Care [N]
15. Geographic Locations [Z]

16
Each Facet Has Hierarchy

1. Anatomy [A]
   Body Regions [A01]
   Musculoskeletal System [A02]
   Digestive System [A03]
   Respiratory System [A04]
   Urogenital System [A05]
2. [B]
3. [C]
4. [D]
5. [E]
6. [F]
7. [G]
8. Physical Sciences [H]
9. [I]
10. [J]
11. [K]
12. [L]
13. [M]
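
One way to picture faceted hierarchical metadata is as a set of category codes per item, where a code's prefix names its ancestors. The sketch below uses only the facet and subcategory codes listed on these two slides. The three documents and their assignments are hypothetical, and treating "starts with the code" as "falls under the category" is a simplifying assumption.

```python
# Labels copied from the slides; DOCS and its assignments are hypothetical.
LABELS = {
    "A": "Anatomy",
    "A01": "Body Regions",
    "A02": "Musculoskeletal System",
    "A03": "Digestive System",
    "A04": "Respiratory System",
    "A05": "Urogenital System",
    "C": "Diseases",
}

# Each item may carry several category codes, from one or more facets.
DOCS = {
    "doc1": {"A02", "C"},
    "doc2": {"A04"},
    "doc3": {"A03", "A04"},
}

def in_category(assigned_codes, code):
    """True if any assigned code is the category itself or lies beneath it."""
    return any(assigned.startswith(code) for assigned in assigned_codes)

def browse(docs, *codes):
    """Return the items that fall under every selected category."""
    return sorted(
        doc_id
        for doc_id, assigned in docs.items()
        if all(in_category(assigned, code) for code in codes)
    )

print(browse(DOCS, "A"))       # everything under Anatomy
print(browse(DOCS, "A04"))     # narrowed to Respiratory System
print(browse(DOCS, "A", "C"))  # Anatomy combined with Diseases
```

Selecting a code from a second facet narrows the result set without leaving the first facet, which is the property that distinguishes facets from a single fixed hierarchy.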

17
Clustering

"The art of finding groups in data"
(Kaufman and Rousseeuw)
Groups are formed according to associations and
commonalities among the data's features.
There are dozens of algorithms, more all the time
Most need a way of determining similarity or
difference between a pair of items
In text clustering, documents are usually represented
as a vector of weighted features, which are some
transformation of the words
Similarity between documents is a weighted
measure of feature overlap
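
As a rough illustration of "a vector of weighted features" and "a weighted measure of feature overlap", the sketch below weights words by TF-IDF and compares documents by cosine similarity. Both are common choices but are assumptions here, since the slide does not name a specific weighting or similarity measure. The sample documents are made up.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Turn each document into a sparse {word: weight} vector."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each word appears.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append(
            {word: count * math.log(n_docs / df[word]) for word, count in tf.items()}
        )
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

docs = [
    "star galaxy telescope",
    "galaxy star cluster",
    "film star award",
]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]))  # positive: the two share "galaxy"
print(cosine(vecs[0], vecs[2]))  # zero: they share only "star"
```

The word "star" occurs in every document, so its weight is log(3/3) = 0 and it contributes nothing to any similarity. A word that cannot distinguish documents should not make them look alike.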

18
Clustering

Potential benefits
Find the main themes in a set of documents
Potentially useful if the user wants a summary of
the main themes in the subcollection
Potentially harmful if the user is interested in
less dominant themes
More flexible than pre-defined categories
There may be important themes that have not been
anticipated
Disambiguate ambiguous terms
E.g., ACL
Clustering retrieved documents tends to group
those relevant to a complex query together

Hearst and Pedersen, Reexamining the Cluster
Hypothesis: Scatter/Gather on Retrieval Results,
SIGIR '96
19
Scatter/Gather Clustering

Developed at PARC in the late 80s/early 90s
Top-down approach
Start with k seeds (documents) to represent k
clusters
Each document is assigned to the cluster with the
most similar seed
To choose the seeds:
Cluster in a bottom-up manner
Hierarchical agglomerative clustering
Start with n documents, compare all by pairwise
similarity, combine the two most similar
documents to make a cluster
Now compare both clusters and individual
documents to find the most similar pair to
combine
Continue until k clusters remain
Use the centroid of each of these as seeds
Centroid: average of the weighted vectors
Can recluster a cluster to produce a hierarchy of
clusters
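
The steps on this slide can be sketched as follows. This is a simplified illustration, not the PARC implementation: it compares clusters by the cosine similarity of their centroids (the slide does not say how a cluster is compared to another item), it runs the bottom-up phase over every document, and all function names are made up. Documents are assumed to be dense, equal-length weight vectors.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def centroid(vectors):
    """Component-wise average of a list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def agglomerate(vectors, k):
    """Bottom-up phase: merge the most similar pair until k clusters remain."""
    clusters = [[v] for v in vectors]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = cosine(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or sim > best[0]:
                    best = (sim, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters

def scatter(vectors, k):
    """Top-down phase: use cluster centroids as seeds, then assign each
    document (by index) to the seed it is most similar to."""
    seeds = [centroid(cluster) for cluster in agglomerate(vectors, k)]
    groups = [[] for _ in seeds]
    for idx, vec in enumerate(vectors):
        nearest = max(range(len(seeds)), key=lambda s: cosine(vec, seeds[s]))
        groups[nearest].append(idx)
    return groups

vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(scatter(vectors, 2))
```

Comparing all pairs on every merge is expensive, so a sketch like this is only practical on a small set of documents. Calling scatter again on the vectors of one resulting group corresponds to reclustering a cluster to produce a hierarchy.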

Cutting, Karger, Pedersen, Tukey, Scatter/Gather:
A Cluster-based Approach to Browsing Large
Document Collections, SIGIR 1992
21
The Scatter/Gather Interface
22
S/G Example: query on "star"

Encyclopedia text
14 sports
8 symbols 47 film, tv
68 film, tv (p) 7 music
97 astrophysics
67 astronomy (p) 12 stellar phenomena
10 flora/fauna 49 galaxies, stars
29 constellations
7 miscellaneous
Clustering and re-clustering is entirely
automated

25
S/G Example: query on "star"

Newspaper/Magazine text
22 products / business
41 software / computers 35 hollywood
58 restaurants / food (reviews) 54
astronomers/movies
98 movies / tv (reviews) 9 film mini-reviews
31 wall street / finance
Topics quite different from encyclopedia text

26
Two Queries: Two Clusterings
The main differences are the clusters that are
central to the query
27
Clustering Example: Medical Text

Query "mastectomy" on a breast cancer collection
250 documents retrieved
Summary of cluster themes (subjective)
prophylactic mastectomy (preventative)
prostheses and reconstruction
conservative vs radical surgery
side effects of surgery
psychological effects of surgery
The first two clusters found themes for which
there was no corresponding MeSH category

Hearst, The Use of Categories and Clusters for
Organizing Retrieval Results, in Natural Language
Information Retrieval, Kluwer, 1999
28
A Clustering Failure

Query: implant and prosthesis
Four clusters returned
use of implants to administer radiation dosages
complications resulting from breast implants
other issues surrounding breast implants
other kinds of prostheses
Reclustering clusters 2 and 3 does not find
cohesive subgroups
An examination of the documents indicates that a
valid subdivision was possible
type of surgical procedure
risk factors
This seems to happen when there are too many
features in common
Perhaps a better clustering algorithm can help in
this case

29
Clustering Algorithm Problems

Doesn't work well if data is too homogeneous or
too heterogeneous
Often is difficult to interpret quickly
Automatically generated labels are unintuitive
and occur at different levels of description
Often the top-level can be ok, but the subsequent
levels are very poor
Need a better way to handle items that fall into
more than one cluster

30
Visualizing Clustering Results

Use clustering to map the entire huge
multidimensional document space into a huge
number of small clusters.
Use dimension reduction and then project these
onto a 2D/3D graphical representation

31
Clustering Multi-Dimensional Document
Space (image from Wise et al. 95)
33
Kohonen Feature Maps on Text (from Chen et al.,
JASIS 49(7))
34
Is it useful?

4 Clustering Visualization Usability Studies

35
Clustering for Search: Study 1

This study compared
a system with 2D graphical clusters
a system with 3D graphical clusters
a system that shows textual clusters
Novice users
Only textual clusters were helpful (and they were
difficult to use well)

Kleiboemer, Lazear, and Pedersen. Tailoring a
retrieval system for naive users. SDAIR '96
36
Clustering Study 2: Kohonen Feature Maps

Comparison: Kohonen Map and Yahoo
Task
"Window shop" for interesting home page
Repeat with other interface
Results
Starting with map: could repeat in Yahoo (8/11)
Starting with Yahoo: unable to repeat in map (2/14)

Chen, Houston, Sewell, Schatz, Internet Browsing
and Searching: User Evaluations of Category Map
and Concept Space Techniques. JASIS 49(7)
582-603 (1998)
37
Kohonen Feature Maps (Lin 92, Chen et al. 97)
38
Study 2 (cont.)

Participants liked
Correspondence of region size to documents
Overview (but also wanted zoom)
Ease of jumping from one topic to another
Multiple routes to topics
Use of category and subcategory labels

Chen, Houston, Sewell, Schatz, Internet Browsing
and Searching: User Evaluations of Category Map
and Concept Space Techniques. JASIS 49(7)
582-603 (1998)
39
Study 2 (cont.)

Participants wanted
hierarchical organization
other ordering of concepts (alphabetical)
integration of browsing and search
correspondence of color to meaning
more meaningful labels
labels at same level of abstraction
fit more labels in the given space
combined keyword and category search
multiple category assignment (sports + entertain)
(These can all be addressed with faceted
hierarchical categories)

Chen, Houston, Sewell, Schatz, Internet Browsing
and Searching: User Evaluations of Category Map
and Concept Space Techniques. JASIS 49(7)
582-603 (1998)
40
Clustering Study 3: NIRVE

Each rectangle is a cluster. Larger clusters
closer to the pole. Similar clusters near one
another. Opening a cluster causes a projection
that shows the titles.

41
Study 3

This study compared
3D graphical clusters
2D graphical clusters
textual clusters
15 participants, between-subject design
Tasks
Locate a particular document
Locate and mark a particular document
Locate a previously marked document
Locate all clusters that discuss some topic
List more frequently represented topics

Visualization of search results: a comparative
evaluation of text, 2D, and 3D interfaces
Sebrechts, Cugini, Laskowski, Vasilakis and
Miller, SIGIR 99.
42
Study 3

Results (time to locate targets)
Text clusters fastest
2D next
3D last
With practice (6 sessions) 2D neared text
results; 3D still slower
Computer experts were just as fast with 3D
Certain tasks equally fast with 2D and text
Find particular cluster
Find an already-marked document
But anything involving text (e.g., find title)
much faster with text.
Spatial location rotated, so users lost context
Helpful viz features
Color coding (helped text too)
Relative vertical locations

Visualization of search results: a comparative
evaluation of text, 2D, and 3D interfaces
Sebrechts, Cugini, Laskowski, Vasilakis and
Miller, SIGIR 99.
43
Clustering Study 4

Compared several factors
Findings
Topic effects dominate (this is a common finding)
Strong difference in results based on spatial
ability
No difference between librarians and other people
No evidence of usefulness for the cluster
visualization

Aspect windows, 3-D visualizations, and indirect
comparisons of information retrieval systems,
Swan, Allan, SIGIR 1998.
44
Summary: Visualizing for Search Using Clusters

Huge 2D maps may be an inappropriate focus for
information retrieval
cannot see what the documents are about
space is difficult to browse for IR purposes
(tough to visualize abstract concepts)
Perhaps more suited for pattern discovery and
gist-like overviews

45
How do people want to search and browse images?

Ethnographic studies of people who use images
intensely find
Finding specific objects is easy
"Find images of the Empire State Building"
Browsing is hard
In a usability study with architects, to our
surprise we found their response to an
image-browsing interface mock-up was they wanted
to see more text (categories).

Elliott, A. (2001). "Flamenco Image Browser:
Using Metadata to Improve Image Search During
Architectural Design," in the Proceedings of CHI
2001.
46
Clustering in Image Search

Using Visual Content
Extract color, texture, shape
QBIC (Flickner et al. 95)
Blobworld (Carson et al. 99)
Body Plans (Forsyth and Fleck 00)
Piction: images + text (Srihari et al. 91, 99)
Two uses
Show a clustered similarity space
Show those images similar to a selected one

47
K. Rodden, Evaluating Similarity-Based
Visualisations as Interfaces for Image Browsing,
PhD thesis, 2001; K. Rodden, W. Basalaj, D.
Sinclair, and K. Wood, Does Organisation by
Similarity Assist Image Browsing?, CHI 2001
50
Image Clustering Study Results

Searching was faster with the random arrangement
Preference for the clustered arrangement was not
overwhelmingly stronger than for random
2 out of 10 participants preferred random and 3
had no preference
Median satisfaction for clustered was 4.5 and for
random was 4.0

K. Rodden, Evaluating Similarity-Based
Visualisations as Interfaces for Image Browsing,
PhD thesis, 2001; K. Rodden, W. Basalaj, D.
Sinclair, and K. Wood, Does Organisation by
Similarity Assist Image Browsing?, CHI 2001
51
An Alternative

In the Flamenco project, we have shown that
hierarchical faceted metadata, paired with a good
interface, is highly effective for browsing image
collections
Flamenco.berkeley.edu
(But that's a different talk)

52
Study 5: Comparing Textual Cluster Interfaces to
Category Interfaces

DynaCat system
Decide on important question types in advance
What are the adverse effects of drug D?
What is the prognosis for treatment T?
Make use of MeSH categories
Retain only those types of categories known to be
useful for this type of query.

Pratt, W., Hearst, M, and Fagan, L. A
Knowledge-Based Approach to Organizing Retrieved
Documents. AAAI-99
53
DynaCat Interface
Pratt, W., Hearst, M, and Fagan, L. A
Knowledge-Based Approach to Organizing Retrieved
Documents. AAAI-99
54
DynaCat Study

Design
Three queries
24 cancer patients
Compared three interfaces
ranked list, clusters, categories
Results
Participants strongly preferred categories
Participants found more answers using categories
Participants took same amount of time with all
three interfaces

Pratt, W., Hearst, M, and Fagan, L. A
Knowledge-Based Approach to Organizing Retrieved
Documents. AAAI-99
55
Study 6: Categories vs. Lists

One study found users preferred one level of
categories over lists, and were faster at finding
answers
Only 13 top-level categories shown
Secondary-level categories not very accurate
However, the queries appeared to be somewhat
set up to optimize the usefulness of the clusters
Example
Query word: indian
Task: find Indian motorcycles
Query: alaska
Task: find yachting adventures in Alaska

Chen, Dumais, Bringing order to the web:
Automatically categorizing search results. CHI
2000
56
What about Textual Displays of Clusters?

Text-based clustering is more promising
Text-based clustering on the Web
In the early days, Excite had a mockup on about
10 documents that pretended to do Scatter/Gather
(when it was called Architext)
Quickly removed it and started providing standard
search
For a while NorthernLight had a clustering
interface
Didn't really get anywhere
The latest entry is Vivisimo
Has a lot of problems
BUT there's a new development from Vivisimo
called Clusty
Seems to have much improved clustering and
interface

57
An Analysis of Vivisimo

Query barcelona
Query dog pregnancy

61
An Analysis of Vivisimo

Query barcelona
Hotels and Travel Guide are both at top level
Also, Barcelona City
But Travel Guide contains
Hotels
Spain, Spanish
Not really helping to make useful distinctions

64
An Analysis of Vivisimo

Query: pregnant dog
What does the category "pregnant" mean here?
Why does it have a subcategory of "whelping", when
there is also a main category of "whelping"?
And what is the relationship to "Pregnancy and Birth"?
The pages shown don't seem strongly related to
one another
How to follow up?
There is a "find in clusters" box, but it is not very
helpful because there are no hints about which words might
work

65
Search within Results
66
Then along came Clusty

Announced less than a week ago
Produced by Vivisimo
Much better interface
Much better clusters

72
Clusty Improvements

Labels tend to be more at the same level of
description
Subcategories are more cautious, reflecting
groups of very similar documents
Do a better job of really showing subcategories
Nice interface touches
Better use of color for distinguishing
Small icons are inviting
Incorporation of encyclopedia results high up
Search results are better
(Not always: "pregnant dog" is not much better)
Using metasearch
May be throwing out some docs to get more
distribution in the types of results found
Looks like they are focusing on term proximity to
get more meaningful grouping
Don't allow very many results

76
Clusty Improvements

Doing sense disambiguation for abbreviations like
ACL
However, no good followup for how to make use of
this
E.g., to search on ACL (meaning comp ling) plus
some other concepts
On the other hand, using multiple terms is how
most disambiguation is done now
ACL disambiguation
Jaguar prey
So not clear if there is a net benefit
Trying to approximate faceted queries
Under Jaguar query, for history, show both
history of band with history of car and video
game

78
Analysis

Is it really helping? Or are the categories now
too general and overlapping?
The main effect seems to be that the search
results are better due to the metasearch and term
proximity

80
More Analysis

Reflects the frequency of topics in the data
So no discussion of nukes in the Spain categories
No discussion of hotels in the North Korea
categories
Is this good or bad? It depends.

87
More analysis

Adding a related term (Degas, Cezanne) brings up
relations between the two that don't appear with
the general term Degas alone
Impressionists
Pissarro, in particular (should be under
impressionists)
Also leads to messier results

88
Summary

Grouping search results is desirable
Often requested by lay users
Very positive results for category interface
However, getting good groups is difficult
Two main approaches
Predefined category sets
Automatically created clusters
Furthermore, incorporation of groups into
interfaces has not been done well
Notable Failures in Search Interfaces
Visualization of clusters
Unintuitive clusters and labels
Clustering of images according to visual
attributes
Poor incorporation of categories into search
interfaces (not covered)
Good news improvements are happening
Improved clustering that takes better account of
good display principles as seen in Clusty
Flamenco Flexible search and navigation via
faceted category hierarchies (not discussed here)

89
A Promising Direction: Combining Categories and
Clusters

Mehran Sahami's work on combining categories and
clusters
Ray Larson's work on clustering results of
categorization
Would be interesting to cluster MeSH category
labels
Work using UMLS to select subsets of MeSH has
been successful for analysis tasks

90
Conclusions

In order to use clustering in an interface, must
pay attention to what makes the groupings
intuitive
Much work has been too much of a "science
project"
Up to now, clustering hasn't succeeded on web
search results, but Clusty shows marked
improvements that are promising

91
Thank you!

Marti Hearst
www.sims.berkeley.edu/hearst

92
More Recent Attempts

Analyzing retrieval results
KartOO http://www.kartoo.com/
Grokker http://www.groxis.com/service/grok

97
References

Chen, Houston, Sewell, and Schatz, JASIS 49(7)
Chen and Yu, Empirical studies of information
visualization: a meta-analysis, IJHCS 53(5), 2000
Dumais, Cutrell, Cadiz, Jancke, Sarin and
Robbins, Stuff I've Seen: A system for personal
information retrieval and re-use. SIGIR 2003.
Hearst, English, Sinha, Swearingen, Yee. Finding
the Flow in Web Site Search, CACM 45(9), 2002.
Hearst, User Interfaces and Visualization,
Chapter 10 of Modern Information Retrieval,
Baeza-Yates and Ribeiro-Neto (Eds),
Addison-Wesley 1999.
Johnson, Manning, Hagen, and Dorsey. Specialize
Your Site's Search. Forrester Research, (Dec.
2001), Cambridge, MA

98
References

Sebrechts, Cugini, Laskowski, Vasilakis and
Miller, Visualization of search results: a
comparative evaluation of text, 2D, and 3D
interfaces, SIGIR 99.
Swan and Allan, Aspect windows, 3-D
visualizations, and indirect comparisons of
information retrieval systems, SIGIR 1998.
Yee, Swearingen, Li, Hearst, Faceted Metadata for
Image Search and Browsing, Proceedings of CHI 2003
