Designing and Evaluating Search Interfaces

1
Designing and Evaluating Search Interfaces
Prof. Marti Hearst, School of Information, UC
Berkeley
2
Outline

Why is Supporting Search Difficult?
What Works?
How to Evaluate?

3
Why is Supporting Search Difficult?

Everything is fair game
Abstractions are difficult to represent
The vocabulary disconnect
Users' lack of understanding of the technology
Clutter vs. Information

4
Everything is Fair Game

The scope of what people search for is all of
human knowledge and experience.
Other interfaces are more constrained
(word processing, formulas, etc.)
Interfaces must accommodate human differences in:
Knowledge / life experience
Cultural background and expectations
Reading / scanning ability and style
Methods of looking for things (pilers vs. filers)

5
Abstractions Are Hard to Represent

Text describes abstract concepts
Difficult to show the contents of text in a
visual or compact manner
Exercise
How would you show the preamble of the US
Constitution visually?
How would you show the contents of Joyce's
Ulysses visually? How would you distinguish it
from Homer's The Odyssey or McCourt's Angela's
Ashes?
The point: it is difficult to show text without
using text

6
Vocabulary Disconnect

If you ask a set of people to describe a set of
things there is little overlap in the results.

7
The Vocabulary Problem

Data sets examined (and number of participants)
Main verbs used by typists to describe the kinds
of edits that they do (48)
Commands for a hypothetical message decoder
computer program (100)
First word used to describe 50 common objects
(337)
Categories for 64 classified ads (30)
First keywords for each of a set of recipes
(24)

Furnas, Landauer, Gomez, Dumais: The Vocabulary
Problem in Human-System Communication. Commun.
ACM 30(11): 964-971 (1987)
8
The Vocabulary Problem

These are really bad results
If one person assigns the name, the probability
of it NOT matching with another person's is about
80%
What if we pick the most commonly chosen words as
the standard? Still not good

Furnas, Landauer, Gomez, Dumais: The Vocabulary
Problem in Human-System Communication. Commun.
ACM 30(11): 964-971 (1987)
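The agreement figures on this slide can be made concrete with a small calculation. The following is a minimal sketch, not the procedure used by Furnas et al.: the word list is made-up illustrative data and both function names are hypothetical. It measures how often two people pick the same name, and how often one person matches the single most popular name.

```python
from collections import Counter
from itertools import combinations


def pairwise_agreement(names):
    """Fraction of distinct pairs of people who chose the same name."""
    pairs = list(combinations(names, 2))
    if not pairs:
        return 0.0
    same = sum(1 for a, b in pairs if a == b)
    return same / len(pairs)


def agreement_with_most_common(names):
    """Fraction of people whose choice matches the most popular name."""
    counts = Counter(names)
    return counts.most_common(1)[0][1] / len(names)


# Hypothetical answers from ten people asked to name one editing operation.
names = ["delete", "remove", "erase", "delete", "cut",
         "omit", "remove", "kill", "strike", "drop"]

# Only 2 of the 45 possible pairs agree in this toy sample.
print(pairwise_agreement(names))
print(agreement_with_most_common(names))
```

Even picking the most popular word as the standard only matches a minority of people in this toy sample, which mirrors the "still not good" point above.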
9
Lack of Technical Understanding

Most people don't understand the underlying
methods by which search engines work.

10
People Don't Understand Search Technology

A study of 100 randomly-chosen people found
14% never type a URL directly into the address
bar
Several tried to use the address bar, but did it
wrong
Put spaces between words
Combinations of dots and spaces
"nursing spectrum.com", "consumer reports.com"
Several use search form with no spaces
"plumberslocal9", "capitalhealthsystem"
People do not understand the use of quotes
Only 16% use quotes
Of these, some use them incorrectly
Around all of the words, making results too
restrictive
"lactose intolerance recipies"
Here the excludes the recipes
People don't make use of advanced features
Only 1% used "find in page"
Only 2% used Google cache

Hargittai, Classifying and Coding Online Actions,
Social Science Computer Review 22(2), 2004,
210-227.
11
People Don't Understand Search Technology

Without appropriate explanations, most of 14
people had strong misconceptions about
ANDing vs. ORing of search terms
Some assumed the ANDing search engine indexed a
smaller collection; most had no explanation at
all
For empty results for the query "to be or not to be"
9 of 14 could not explain in a way that
remotely resembled stop word removal
For term order variation: "boat fire" vs. "fire
boat"
Only 5 out of 14 expected different results
Understanding was vague, e.g.
"Lycos separates the two words and searches for
the meaning, instead of what're your looking for.
Google understands the meaning of the phrase."

Muramatsu & Pratt, Transparent Queries:
Investigating Users' Mental Models of Search
Engines, SIGIR 2001.
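The three behaviors participants struggled to explain (ANDing vs. ORing, stop word removal, and term order) can be shown with a toy inverted index. This is a minimal sketch under stated assumptions: the stop word list, the three documents, and the function names are all invented for illustration, and real engines are far more elaborate.

```python
# Illustrative stop word list; real engines used their own, longer lists.
STOP_WORDS = {"to", "be", "or", "not", "the", "a", "of"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [t for t in text.lower().split() if t not in STOP_WORDS]


def build_index(docs):
    """Map each remaining term to the set of document ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index.setdefault(term, set()).add(doc_id)
    return index


def search(index, query, mode="AND"):
    """Return matching document ids; term order is ignored in this sketch."""
    terms = tokenize(query)
    if not terms:
        return set()  # every query term was a stop word
    postings = [index.get(t, set()) for t in terms]
    if mode == "AND":
        return set.intersection(*postings)
    return set.union(*postings)


docs = {1: "fire boat rescue", 2: "boat show", 3: "to be or not to be"}
index = build_index(docs)

print(search(index, "boat fire", "AND"))   # only the document with both terms
print(search(index, "boat fire", "OR"))    # documents with either term
print(search(index, "to be or not to be")) # empty: all terms were removed
```

In this sketch "boat fire" and "fire boat" return identical results because no positional information is kept; an engine that indexes phrases or term positions could rank them differently.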
12
What Works?
13
Cool Doesn't Cut It

It's very difficult to design a search interface
that users prefer over the standard
Some ideas have a strong WOW factor
Examples
Kartoo
Groxis
Hyperbolic tree
But they don't pass the "will you use it" test
Even some simpler ideas fall by the wayside
Example
Visual ranking indicators for results set
listings

14
Early Visual Rank Indicators
17
Metadata Matters

When used correctly, text to describe text,
images, video, etc. works well
Searchers often turn into browsers with
appropriate links
However, metadata has many perils
The Kosher Recipe Incident

18
Small Details Matter

UIs for search especially require great care in
small details
In part due to the text-heavy nature of search
A tension between more information and
introducing clutter
How and where to place things is important
People tend to scan or skim
Only a small percentage reads instructions

19
Small Details Matter

UIs for search especially require endless tiny
adjustments
In part due to the text-heavy nature of search
Example
In an earlier version of the Google Spellchecker,
people didn't always see the suggested correction
Used a long sentence at the top of the page
"If you didn't find what you were looking for"
People complained they got results, but not the
right results.
In reality, the spellchecker had suggested an
appropriate correction.

Interview with Marissa Mayer by Mark Hurst
http://www.goodexperience.com/columns/02/1015google.html
20
Small Details Matter

The fix
Analyzed logs, saw people didn't see the
correction
clicked on first search result,
didn't find what they were looking for (came
right back to the search page)
scrolled to the bottom of the page, did not find
anything
and then complained directly to Google
Solution was to repeat the spelling suggestion at
the bottom of the page.
More adjustments
The message is shorter, and different on the top
vs. the bottom

Interview with Marissa Mayer by Mark Hurst
http://www.goodexperience.com/columns/02/1015google.html
22
Small Details Matter

Layout, font, and whitespace for
information-centric interfaces require very
careful design
Example
Photo thumbnails
Search results summaries

23
What Works for Search Interfaces?

Query term highlighting
in results listings
in retrieved documents
Term Suggestions (if done right)
Sorting of search results according to important
criteria (date, author)
Grouping of results according to well-organized
category labels (see Flamenco)
DWIM only if highly accurate
Spelling correction/suggestions
Simple relevance feedback (more-like-this)
Certain types of term expansion
So far not really visualization

Hearst et al., Finding the Flow in Web Site
Search, CACM 45(9), 2002.
24
Highlighting Query Terms

Boldface or color
Adjacency of terms with relevant context is a
useful cue.
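Query term highlighting is simple to prototype. The sketch below is illustrative only: the function name and the bracket markers are assumptions (a real results page would emit boldface or color markup instead), and it matches whole words case-insensitively.

```python
import re


def highlight(snippet, query, open_mark="[", close_mark="]"):
    """Wrap each whole-word occurrence of a query term in marker strings."""
    terms = [re.escape(t) for t in query.split() if t]
    if not terms:
        return snippet
    pattern = re.compile(r"\b(" + "|".join(terms) + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: open_mark + m.group(0) + close_mark, snippet)


print(highlight("Fire boat crews fought the fire.", "fire boat"))
# [Fire] [boat] crews fought the [fire].
```

Because the original casing of each hit is preserved, adjacent highlighted terms stay readable in context, which is the cue the slide describes.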

27
Highlighted query term hits using Google toolbar
28
How to Introduce New Features?

Example: Yahoo shortcuts
Search engines now provide groups of enriched
content
Automatically infer related information, such as
sports statistics
Accessed via keywords
User can quickly specify very specific
information
"united 570" (flight arrival time)
"map san francisco"
We're heading back to command languages!
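Keyword-triggered shortcuts amount to a small command language layered over the query box. The following sketch shows one way such matching could work; the two patterns, the shortcut names, and the function name are hypothetical and are not a description of how Yahoo implemented the feature.

```python
import re

# Hypothetical shortcut patterns, tried in order.
SHORTCUTS = [
    ("map", re.compile(r"^map\s+(?P<place>.+)$", re.IGNORECASE)),
    ("flight", re.compile(r"^(?P<airline>[a-z]+)\s+(?P<number>\d{1,4})$",
                          re.IGNORECASE)),
]


def match_shortcut(query):
    """Return (shortcut_name, extracted_fields), or None for a plain query."""
    q = query.strip()
    for name, pattern in SHORTCUTS:
        m = pattern.match(q)
        if m:
            return name, m.groupdict()
    return None


print(match_shortcut("united 570"))
print(match_shortcut("map san francisco"))
print(match_shortcut("lactose intolerance recipes"))  # falls through to search
```

Queries that match no pattern fall through to ordinary search, so users who never learn the keywords lose nothing.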

33
Introducing New Features

A general technique: scaffolding
Scaffolding
Facilitate a student's ability to build on prior
knowledge and internalize new information.
The activities provided in scaffolding
instruction are just beyond the level of what the
learner can do already.
Learning the new concept moves the learner up one
step on the conceptual ladder

34
Scaffolding Example

The problem: how do people learn about these
fantastic but unknown options?
Example: scaffolding the definition function
Where to put a suggestion for a definition?
Google used to simply hyperlink it next to the
statistics for the word.
Now a hint appears to alert people to the
feature.

35
Unlikely to notice the function here
36
Scaffolding to teach what is available
37
Query Term Suggestions
38
Query Reformulation

Query reformulation
After receiving unsuccessful results, users
modify their initial queries and submit new ones
intended to more accurately reflect their
information needs.
Web search logs show that searchers often
reformulate their queries
A study of 985 Web user search sessions found
33% went beyond the first query
Of these, 35% retained the same number of terms
while 19% had 1 more term and 16% had 1 fewer

Spink, Jansen & Ozmultu, Use of query
reformulation and relevance feedback by Excite
users, Internet Research 10(4), 2001
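The term-count breakdown reported above can be computed from a query log by comparing each query with the one before it. This is a minimal sketch with a made-up session; the category labels and function names are assumptions for illustration, not the coding scheme used in the cited study.

```python
from collections import Counter


def classify_reformulation(previous, current):
    """Label a reformulation by how the number of query terms changed."""
    diff = len(current.split()) - len(previous.split())
    if diff == 0:
        return "same number of terms"
    if diff == 1:
        return "one more term"
    if diff == -1:
        return "one fewer term"
    return "other"


def tally_session(queries):
    """Count reformulation types across consecutive queries in one session."""
    return Counter(
        classify_reformulation(prev, cur)
        for prev, cur in zip(queries, queries[1:])
    )


# Hypothetical session of five queries (four reformulations).
session = ["jaguar", "jaguar car", "jaguar car price",
           "jaguar price", "jaguar dealer"]
print(tally_session(session))
```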
39
Query Reformulation

Many studies show that if users engage in
relevance feedback, the results are much better.
In one study, participants did 17-34% better with
RF
They also did better if they could see the RF
terms than if the system did it automatically
(DWIM)
But the effort required for doing so is usually a
roadblock.
Before the web and in most research, searchers
have to select MANY relevant documents or MANY
terms.

Koenemann & Belkin, A Case for Interaction: A
Study of Interactive Information Retrieval
Behavior and Effectiveness, CHI '96
40
Query Reformulation

What happens when the web search engine suggests
new terms?
Web log analysis study using the Prisma term
suggestion system

Anick, Using Terminological Feedback for Web
Search Refinement: A Log-based Study, SIGIR '03.
41
Query Reformulation Study

Feedback terms were displayed to 15,133 user
sessions.
Of these, 14% used at least one feedback term
For all sessions, 56% involved some degree of
query refinement
Within this subset, use of the feedback terms was
25%
By user id, 16% of users applied feedback terms
at least once on any given day
Looking at a 2-week session of feedback users
Of the 2,318 users who used it once, 47% used it
again in the same 2-week window.
Comparison was also done to a baseline group that
was not offered feedback terms.
Both groups ended up making a page-selection
click at the same rate.

Anick, Using Terminological Feedback for Web
Search Refinement: A Log-based Study, SIGIR '03.
42
Query Reformulation Study
Anick, Using Terminological Feedback for Web
Search Refinement: A Log-based Study, SIGIR '03.
43
Query Reformulation Study

Other observations
Users prefer refinements that contain the initial
query terms
Presentation order does have an influence on term
uptake

Anick, Using Terminological Feedback for Web
Search Refinement: A Log-based Study, SIGIR '03.
44
Query Reformulation Study

Types of refinements

Anick, Using Terminological Feedback for Web
Search Refinement: A Log-based Study, SIGIR '03.
45
Prognosis: Query Reformulation

Researchers have always known it can be helpful,
but the methods proposed for user interaction
were too cumbersome
Had to select many documents and then do feedback
Had to select many terms
Was based on statistical ranking methods which
are hard for people to understand
RF is promising for web-based searching
The dominance of AND-based searching makes it
easier to understand the effects of RF
Automated systems built on the assumption that
the user will only add one term now work
reasonably well
This kind of interface is simple

46
Supporting the Search Process

We should differentiate among searching
The Web
Personal information
Large collections of like information
Different cues useful for each
Different interfaces needed
Examples
The Stuff I've Seen Project
The Flamenco Project

47
The Stuff I've Seen project

Did intense studies of how people work
Used the results to design an integrated search
framework
Did extensive evaluations of alternative designs
The following slides are modifications of ones
supplied by Sue Dumais, reproduced with
permission.

Dumais, Cutrell, Cadiz, Jancke, Sarin and
Robbins, Stuff I've Seen: A system for personal
information retrieval and re-use. SIGIR 2003.
48
Searching Over Personal Information

Many locations, interfaces for finding things
(e.g., web, mail, local files, help, history,
notes)

Slide adapted from Sue Dumais.
49
The Stuff I've Seen project

Unified index of items touched recently by user
All types of information, e.g., files of all
types, email, calendar, contacts, web pages, etc.
Full-text index of content plus metadata
attributes (e.g., creation time, author, title,
size)
Automatic and immediate update of index
Rich UI possibilities, since it's your content
Search only over things already seen
Re-use vs. initial discovery

Slide adapted from Sue Dumais.
50
SIS Interface
Slide adapted from Sue Dumais
51
Search With SIS
Slide adapted from Sue Dumais
52
Evaluating SIS

Internal deployment
1500 downloads
Users include program management, test, sales,
development, administrative, executives, etc.
Research techniques
Free-form feedback
Questionnaires
Structured interviews
Usage patterns from log data
UI experiments (randomly deploy different
versions)
Lab studies for richer UI (e.g., timeline,
trends)
But even here must work with users' own content

Slide adapted from Sue Dumais
53
SIS Usage Data

Detailed analysis for 234 people, 6 weeks usage
Personal store characteristics
5k-100k items; index < 150 meg
Query characteristics
Short queries (1.59 words)
Few advanced operators or fielded search in query
box (7.5%)
Frequent use of query iteration (48%)
50% of refined queries involve filters; type, date
most common
35% of refined queries involve changes to query
13% of refined queries involve re-sort
Query content
Importance of people
29% of the queries involve people's names

Slide adapted from Sue Dumais
54
SIS Usage Data, cont'd

Characteristics of items opened
File types opened
76% Email
14% Web pages
10% Files
Age of items opened
7% today
22% within the last week
46% within the last month
Ease of finding information
Easier after SIS for web, email, files
Non-SIS search decreases for web, email, files

log(Freq) = -0.68 * log(DaysSinceSeen) + 2.02
Slide adapted from Sue Dumais
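The fitted line on this slide describes how quickly the chance of reopening an item falls off with age. Reading it as log(Freq) = -0.68 * log(DaysSinceSeen) + 2.02, the sketch below evaluates it for a few ages. Two things are assumptions here and are not stated in the transcript: that the operators are "=" and "+", and that the logarithms are base 10. The function name is hypothetical.

```python
import math

SLOPE = -0.68      # coefficient on log(DaysSinceSeen), from the slide
INTERCEPT = 2.02   # constant term, from the slide


def predicted_frequency(days_since_seen):
    """Evaluate the log-log fit, assuming base-10 logarithms."""
    if days_since_seen <= 0:
        raise ValueError("days_since_seen must be positive")
    return 10 ** (SLOPE * math.log10(days_since_seen) + INTERCEPT)


for days in (1, 7, 30, 365):
    print(days, round(predicted_frequency(days), 1))
```

Under these assumptions the relationship is a power law: each tenfold increase in age multiplies the predicted frequency by 10 ** -0.68, roughly one fifth, which is consistent with the slide's emphasis on recently seen items.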
55
SIS Usage, cont'd

UI Usage
Small effects of Top/Side, Previews
Sort order
Date by far the most common sort field, even for
people who had Okapi Rank as default
Importance of time
Few searches for "best match"; many other
criteria

Number of Queries Issued
Slide adapted from Sue Dumais
56
Web Sites and Collections

A report by Forrester Research in 2001 showed
that while 76% of firms rated search as
"extremely important", only 24% consider their Web
site's search to be "extremely useful".

Johnson, K., Manning, H., Hagen, P.R., and
Dorsey, M. Specialize Your Site's Search.
Forrester Research, (Dec. 2001), Cambridge, MA
www.forrester.com/ER/Research/Report/Summary/0,1338,13322,00
57
There are many ways to do it wrong

Examples
Melvyl online catalog
no way to browse enormous category listings
Audible.com, BooksOnTape.com, and
BrillianceAudio
no way to browse a given category and
simultaneously select unabridged versions
Amazon.com
has finally gotten browsing over multiple kinds
of features working; this is a recent development
but still restricted on what can be added into
the query

72
The Flamenco Project

Incorporating Faceted Hierarchical Metadata into
Interfaces for Large Collections
Key Goals
Support integrated browsing and keyword search
Provide an experience of browsing the shelves
Add power and flexibility without introducing
confusion or a feeling of clutter
Allow users to take the path most natural to them
Method
User-centered design, including needs assessment
and many iterations of design and testing

Yee, Swearingen, Li, Hearst, Faceted Metadata for
Image Search and Browsing, Proceedings of CHI
2003.
73
Some Challenges

Users don't like new search interfaces.
How to show lots more information without
overwhelming or confusing?
Our approach
Integrate the search seamlessly into the
information architecture.
Use proper HCI methodologies.
Use faceted metadata

74
The Flamenco Interface

Hierarchical facets
Chess metaphor
Opening
Middle game
End game
Tightly Integrated Search
Expand as well as Refine
Intermediate pages for large categories
For this design, small details really matter
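The refine-and-expand interaction over facets can be sketched in a few lines. This is an illustration under stated assumptions, not Flamenco's implementation: the four items and their facet values are invented, the facets are flat rather than hierarchical, and the function names are hypothetical. The counts returned by `query_previews` correspond to the numbers shown next to each facet value before the user clicks.

```python
from collections import Counter

# Invented sample collection with three flat facets per item.
ITEMS = [
    {"title": "Harbor at Dusk", "artist": "Artist A", "media": "woodcut", "location": "US"},
    {"title": "Mountain Pass", "artist": "Artist B", "media": "woodcut", "location": "Japan"},
    {"title": "City Street", "artist": "Artist A", "media": "etching", "location": "US"},
    {"title": "River Bend", "artist": "Artist C", "media": "woodcut", "location": "US"},
]


def refine(items, selections):
    """Keep items that match every selected facet value."""
    return [it for it in items
            if all(it.get(facet) == value for facet, value in selections.items())]


def expand(selections, facet):
    """Drop one facet constraint, widening the result set."""
    return {f: v for f, v in selections.items() if f != facet}


def query_previews(items, facet):
    """Count how many of the current items fall under each value of a facet."""
    return Counter(it[facet] for it in items)


selections = {"media": "woodcut", "location": "US"}
current = refine(ITEMS, selections)
print([it["title"] for it in current])
print(query_previews(current, "artist"))
print(len(refine(ITEMS, expand(selections, "location"))))
```

Because previews are computed from the current result set, a link is only offered when it leads to at least one item, which is one way to avoid dead ends.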

84
What is Tricky About This?

It is easy to do it poorly
Yahoo directory structure
It is hard not to be overwhelming
Most users prefer simplicity unless complexity
really makes a difference
It is hard to make it flow
Can it feel like browsing the shelves?

85
Using HCI Methodology

Identify Target Population
Architects, city planners
Needs assessment.
Interviewed architects and conducted contextual
inquiries.
Lo-fi prototyping.
Showed paper prototype to 3 professional
architects.
Design / Study Round 1.
Simple interactive version. Users liked metadata
idea.
Design / Study Round 2
Developed 4 different detailed versions;
evaluated with 11 architects; results somewhat
positive but many problems identified. Matrix
emerged as a good idea.
Metadata revision.
Compressed and simplified the metadata
hierarchies

86
Using HCI Methodology

Design / Study Round 3.
New version based on results of Round 2
Highly positive user response
Identified new user population/collection
Students and scholars of art history
Fine arts images
Study Round 4
Compare the metadata system to a strong,
representative baseline

87
Most Recent Usability Study

Participants & Collection
32 Art History Students
35,000 images from SF Fine Arts Museum
Study Design
Within-subjects
Each participant sees both interfaces
Balanced in terms of order and tasks
Participants assess each interface after use
Afterwards they compare them directly
Data recorded in behavior logs, server logs,
paper surveys; one or two experienced testers at
each trial.
Used 9 point Likert scales.
Session took about 1.5 hours; pay was $15/hour
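A within-subjects design that is "balanced in terms of order and tasks" is commonly achieved by counterbalancing. The sketch below shows one simple scheme, cycling participants through every combination of interface order and task-set order. It is an assumption about how such balancing could be done, not a record of the study's actual assignment; the task-set labels "A" and "B" and the function name are hypothetical.

```python
from collections import Counter
from itertools import product


def counterbalance(participants, interfaces=("FC", "Baseline"),
                   task_sets=("A", "B")):
    """Assign each participant an interface order and a task-set order."""
    conditions = list(product(
        [interfaces, interfaces[::-1]],
        [task_sets, task_sets[::-1]],
    ))
    plan = {}
    for i, person in enumerate(participants):
        interface_order, task_order = conditions[i % len(conditions)]
        plan[person] = {"interface_order": interface_order,
                        "task_order": task_order}
    return plan


participants = [f"P{i:02d}" for i in range(1, 33)]  # 32 participants
plan = counterbalance(participants)
counts = Counter((p["interface_order"], p["task_order"]) for p in plan.values())
print(counts)
```

With 32 participants and four conditions, each condition receives eight people, so neither interface benefits systematically from being seen first or from being paired with an easier task set.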

88
The Baseline System

Floogle
Take the best of the existing keyword-based image
search systems

89
Comparison of Common Image Search Systems
System     Collection         Results/page  Categories?  Familiar
Google     Web                20            No           27
AltaVista  Web                15            No           8
Corbis     Photos             9-36          No           8
Getty      Photos, Art        12-90         Yes          6
MS Office  Photos, Clip art   6-100         Yes          N/A
Thinker    Fine arts images   10            Yes          4
BASELINE   Fine arts images   40            Yes          N/A
94
Evaluation Quandary

How to assess the success of browsing?
Timing is usually not a good indicator
People often spend longer when browsing is going
well.
Not the case for directed search
Can look for comprehensiveness and correctness
(precision and recall)
But subjective measures seem to be most
important here.
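For directed search, the comprehensiveness and correctness measures mentioned above are straightforward to compute once relevance judgments exist. A minimal sketch, with made-up image ids and a hypothetical function name:

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one task, given sets of item ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical task: the participant collected images 1-4; images 2, 4, 5 are relevant.
p, r = precision_recall([1, 2, 3, 4], [2, 4, 5])
print(p, r)
```

Neither number captures whether a browsing session was enjoyable or informative, which is why the slide treats subjective measures as the more important signal for browsing.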

95
Hypotheses

We attempted to design tasks to test the
following hypotheses
Participants will experience greater search
satisfaction, feel greater confidence in the
results, produce higher recall, and encounter
fewer dead ends using FC over Baseline
FC will be perceived to be more useful and flexible
than Baseline
Participants will feel more familiar with the
contents of the collection after using FC
Participants will use FC to create multi-faceted
queries

96
Four Types of Tasks

Unstructured (3): Search for images of interest
Structured Task (11-14): Gather materials for an
art history essay on a given topic, e.g.
Find all woodcuts created in the US
Choose the decade with the most
Select one of the artists in this period and
show all of their woodcuts
Choose a subject depicted in these works and find
another artist who treated the same subject in a
different way.
Structured Task (10): Compare related images
Find images by artists from 2 different countries
that depict conflict between groups.
Unstructured (5): Search for images of interest

97
Other Points

Participants were NOT walked through the
interfaces.
The wording of Task 2 reflected the metadata; not
the case for Task 3
Within tasks, queries were not different in
difficulty (t's < 1.7, p > 0.05 according to
post-task questions)
Flamenco is an order of magnitude slower than
Floogle on average.
In task 2 users were allowed 3 more minutes in FC
than in Baseline.
Time spent in tasks 2 and 3 were significantly
longer in FC (about 2 min more).

98
Results

Participants felt significantly more confident
they had found all relevant images using FC
(Task 2: t(62) = 2.18, p < .05; Task 3: t(62) = 2.03,
p < .05)
Participants felt significantly more satisfied
with the results
(Task 2: t(62) = 3.78, p < .001; Task 3: t(62) = 2.03,
p < .05)
Recall scores
Task 2a: In Baseline 57% of participants found all
relevant results, in FC 81% found all.
Task 2b: In Baseline 21% found all relevant, in
FC 77% found all.

99
Post-Interface Assessments
All significant at p < .05 except "simple" and
"overwhelming"
100
Perceived Uses of Interfaces
Baseline
FC
101
Post-Test Comparison
FC vs. Baseline
Which Interface Preferable For:
Find images of roses
Find all works from a given period
Find pictures by 2 artists in same media
Overall Assessment:
More useful for your tasks
Easiest to use
Most flexible
More likely to result in dead ends
Helped you learn more
Overall preference
102
Facet Usage

Facets driven largely by task content
Multiple facets 45% of time in structured tasks
For unstructured tasks,
Artists (17%)
Date (15%)
Location (15%)
Others ranged from 5-12%
Multiple facets 19% of time
From end game, expansion from
Artists (39%)
Media (29%)
Shapes (19%)

103
Qualitative Observations

Baseline
Simplicity, similarity to Google a plus
Also noted the usefulness of the category links
FC
Starting page well-organized, gave ideas for
what to search for
Query previews were commented on explicitly by 9
participants
Commented on matrix prompting where to go next
3 were confused about what the matrix shows
Generally liked the grouping and organizing
End game links seemed useful; 9 explicitly
remarked positively on the guidance provided
there.
Often get requests to use the system in future

104
Study Results Summary

Overwhelmingly positive results for the faceted
metadata interface.
Somewhat heavy use of multiple facets.
Strong preference over the current state of the
art.
This result not seen in similarity-based image
search interfaces.
Hypotheses are supported.

105
Summary

Usability studies done on 3 collections
Recipes: 13,000 items
Architecture Images: 40,000 items
Fine Arts Images: 35,000 items
Conclusions
Users like and are successful with the dynamic
faceted hierarchical metadata, especially for
browsing tasks
Very positive results, in contrast with studies
on earlier iterations
Note: it seems you have to care about the
contents of the collection to like the interface

106
Using DWIM

DWIM: Do What I Mean
Refers to systems that try to be smart by
guessing users' unstated intentions or desires
Examples
Automatically augment my query with related terms
Automatically suggest spelling corrections
Automatically load web pages that might be
relevant to the one I'm looking at
Automatically file my incoming email into folders
Pop up a paperclip that tells me what kind of
help I need.
THE CRITICAL POINT
Users love DWIM when it really works
Users DESPISE it when it doesn't
unless not very intrusive

107
DWIM that Works

Amazon's "customers who bought X also bought Y"
And many other recommendation-related features

108
DWIM Example: Spelling Correction/Suggestion

Google's spelling suggestions are highly accurate
But this wasn't always the case.
Google introduced a version that wasn't very
accurate. People hated it. They pulled it.
(According to a talk by Marissa Mayer of Google.)
Later they introduced a version that worked well.
People love it.
But don't get too pushy.
For a while if the user got very few results, the
page was automatically replaced with the results
of the spelling correction
This was removed, presumably due to negative
responses

Information from a talk by Marissa Mayer of Google
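The "suggest, but don't get pushy" behavior can be sketched with Python's standard `difflib` module. This is an illustration only and has no connection to how Google's spelling system works: the vocabulary, the similarity cutoff, and the function name are assumptions. The function returns a suggestion for the interface to display and never replaces the user's query on its own.

```python
import difflib

# Hypothetical vocabulary of terms known to the collection.
VOCABULARY = ["lactose", "intolerance", "recipes"]


def suggest_spelling(query, vocabulary, cutoff=0.8):
    """Return a corrected query to offer the user, or None if none is needed."""
    corrected = []
    changed = False
    for word in query.lower().split():
        if word in vocabulary:
            corrected.append(word)
            continue
        # Only accept close matches; a low cutoff would make the feature intrusive.
        matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
        if matches:
            corrected.append(matches[0])
            changed = True
        else:
            corrected.append(word)
    return " ".join(corrected) if changed else None


print(suggest_spelling("lactose intolerance recipies", VOCABULARY))
print(suggest_spelling("lactose intolerance", VOCABULARY))
```

Keeping the cutoff high trades coverage for accuracy, in line with the point that DWIM is welcome only when it is right almost all of the time.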
109
What We've Covered

Introduction
Why is designing for search difficult?
How to Design for Search
HCI and iterative design
What works?
Small details matter
Scaffolding
The Role of DWIM
Core Problems
Query specification and refinement
Browsing and searching collections

110
Final Words

User interfaces for search remain a fascinating
and challenging field
Search has taken a primary role in the web and
internet business
Thus, we can continue to expect fascinating
developments, and maybe some breakthroughs, in
the next few years!

111
Thank you!

Marti Hearst
http://www.ischool.berkeley.edu/hearst

112
References

Anick, Using Terminological Feedback for Web
Search Refinement: A Log-based Study, SIGIR '03.
Bates, The Berry-Picking Search: UI Design, in
User Interface Design, Thimbleby (Ed.),
Addison-Wesley 1990
Chen, Houston, Sewell, and Schatz, JASIS 49(7)
Chen and Yu, Empirical studies of information
visualization: a meta-analysis, IJHCS 53(5), 2000
Dumais, Cutrell, Cadiz, Jancke, Sarin and
Robbins, Stuff I've Seen: A system for personal
information retrieval and re-use. SIGIR 2003.
Furnas, Landauer, Gomez, Dumais: The Vocabulary
Problem in Human-System Communication. Commun.
ACM 30(11): 964-971 (1987)
Hargittai, Classifying and Coding Online Actions,
Social Science Computer Review 22(2), 2004,
210-227.
Hearst, English, Sinha, Swearingen, Yee. Finding
the Flow in Web Site Search, CACM 45(9), 2002.
Hearst, User Interfaces and Visualization,
Chapter 10 of Modern Information Retrieval,
Baeza-Yates and Ribeiro-Neto (Eds),
Addison-Wesley 1999.
Johnson, Manning, Hagen, and Dorsey. Specialize
Your Site's Search. Forrester Research, (Dec.
2001), Cambridge, MA

113
References

Koenemann & Belkin, A Case for Interaction: A
Study of Interactive Information Retrieval
Behavior and Effectiveness, CHI '96
Marissa Mayer Interview by Mark Hurst
http://www.goodexperience.com/columns/02/1015google.html
Muramatsu & Pratt, Transparent Queries:
Investigating Users' Mental Models of Search
Engines, SIGIR 2001.
O'Day & Jeffries, Orienteering in an information
landscape: how information seekers get from here
to there, Proceedings of InterCHI '93.
Rose & Levinson, Understanding User Goals in Web
Search, Proceedings of WWW '04
Russell, Stefik, Pirolli, Card, The Cost
Structure of Sensemaking, Proceedings of
InterCHI '93.
Sebrechts, Cugini, Laskowski, Vasilakis and
Miller, Visualization of search results: a
comparative evaluation of text, 2D, and 3D
interfaces, SIGIR '99.
Swan and Allan, Aspect windows, 3-D
visualizations, and indirect comparisons of
information retrieval systems, SIGIR 1998.
Spink, Jansen & Ozmultu, Use of query
reformulation and relevance feedback by Excite
users, Internet Research 10(4), 2001
Yee, Swearingen, Li, Hearst, Faceted Metadata for
Image Search and Browsing, Proceedings of CHI 2003