Title: Interfaces for Information Retrieval

Slide 1: Interfaces for Information Retrieval
Ray Larson and Warren Sack
IS202: Information Organization and Retrieval
Fall 2001, UC Berkeley, SIMS
Lecture authors: Marti Hearst, Ray Larson, Warren Sack
Slide 2: Today
- What is HCI?
- Interfaces for IR using the standard model of IR
- Interfaces for IR using new models of IR and/or different models of interaction
Slide 3: Human-Computer Interaction (HCI)
- Human
  - the end-user of a program
- Computer
  - the machine the program runs on
- Interaction
  - the user tells the computer what they want
  - the computer communicates results
(slide adapted from James Landay)
Slide 4: What is HCI?
(slide by James Landay)
Slide 5: (no transcript; image-only slide)
Slide 6: Shneiderman on HCI
- Well-designed interactive computer systems:
  - Promote positive feelings of success, competence, and mastery
  - Allow users to concentrate on their work, rather than on the system
Slide 7: Usability Design Goals
- Ease of learning
  - faster the second time and so on...
- Recall
  - remember how from one session to the next
- Productivity
  - perform tasks quickly and efficiently
- Minimal error rates
  - if errors occur, good feedback so the user can recover
- High user satisfaction
  - confident of success
(slide by James Landay)
Slide 8: Who builds UIs?
- A team of specialists:
  - graphic designers
  - interaction / interface designers
  - technical writers
  - marketers
  - test engineers
  - software engineers
(slide by James Landay)
Slide 9: How to Design and Build UIs
- Task analysis
- Rapid prototyping
- Evaluation
- Implementation
Iterate at every stage!
(slide adapted from James Landay)
Slide 10: Task Analysis
- Observe existing work practices
- Create examples and scenarios of actual use
- Try out new ideas before building software
Slide 11: Task: Information Access
- The standard interaction model for information access (sketched as a loop below):
  - (1) start with an information need
  - (2) select a system and collections to search on
  - (3) formulate a query
  - (4) send the query to the system
  - (5) receive the results
  - (6) scan, evaluate, and interpret the results
  - (7) stop, or
  - (8) reformulate the query and go to step 4
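Read as control flow, steps 1-8 form a simple loop. The sketch below is schematic only: the search, evaluation, and reformulation helpers are hypothetical stand-ins for illustration, not any real system's API.

```python
# Schematic sketch of the standard interaction model (steps 1-8 above).
# All helpers are hypothetical stand-ins, not a real system's API.
from dataclasses import dataclass

@dataclass
class Judgment:
    satisfied: bool          # step 7: stop?
    new_query: str = ""      # step 8: reformulation, if not satisfied

def search(query: str, collection: list[str]) -> list[str]:
    # steps 4-5: send the query, receive results (toy substring match)
    return [doc for doc in collection if query.lower() in doc.lower()]

def evaluate(results: list[str], need: str) -> Judgment:
    # step 6: scan/evaluate/interpret -- trivially "satisfied if nonempty"
    return Judgment(satisfied=bool(results), new_query=need.split()[0])

def information_access(need: str, collection: list[str]) -> list[str]:
    query = need                              # steps 1-3: need -> query
    while True:
        results = search(query, collection)   # steps 4-5
        judgment = evaluate(results, need)    # step 6
        if judgment.satisfied:                # step 7: stop
            return results
        query = judgment.new_query            # step 8: go to step 4

print(information_access("query reformulation",
                         ["notes on query reformulation", "usability tests"]))
```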
Slide 12: HCI interface questions using the standard model of IR
- Where does a user start? Faced with a large set of collections, how can a user choose one to begin with?
- How will a user formulate a query?
- How will a user scan, evaluate, and interpret the results?
- How can a user reformulate a query?
Slide 13: Interface design: Is it always HCI or the highway?
- No, there are other ways to design interfaces, including using methods from:
  - Art
  - Architecture
  - Sociology
  - Anthropology
  - Narrative theory
  - Geography
Slide 14: Information Access: Is the standard IR model always the model?
- No, other models have been proposed and explored, including:
  - Berrypicking (Bates, 1989)
  - Sensemaking (Russell et al., 1993)
  - Orienteering (O'Day and Jeffries, 1993)
  - Intermediaries (Maglio and Barrett, 1996)
  - Social Navigation (Dourish and Chalmers, 1994)
  - Agents (e.g., Maes, 1992)
- And don't forget experiments like (Blair and Maron, 1985)
Slide 15: IR & HCI
- Question 1: Where does the user start?
Slide 16: Dialog box for choosing sources in the old Lexis-Nexis interface
Slide 17: Where does a user start?
- Supervised (Manual) Category Overviews
  - Yahoo!
  - HiBrowse
  - MeSHBrowse
- Unsupervised (Automated) Groupings
  - Clustering
  - Kohonen Feature Maps
Slide 18: (no transcript; image-only slide)
Slide 19: Incorporating Categories into the Interface
- Yahoo is the standard method
- Problems:
  - Hard to search; meant to be navigated
  - Only one category per document (usually)
Slide 20: More Complex Example: MeSH and MEDLINE
- MeSH Category Hierarchy
  - Medical Subject Headings
  - 18,000 labels
  - manually assigned
  - 8 labels/article on average
  - avg depth 4.5, max depth 9
- Top-level categories (a multi-column list on the original slide):
  - anatomy, animals, disease, drugs, diagnosis, psych, biology, physics, related disc, technology, humanities
Slide 21: MeSHBrowse (Korn & Shneiderman 95): Only the relevant subset of the hierarchy is shown at one time.
Slide 22: HiBrowse (Pollitt 97): Browsing several different subsets of category metadata simultaneously.
Slide 23: Large Category Sets
- Problems for user interfaces:
  - Too many categories to browse
  - Too many docs per category
  - Docs belong to multiple categories
  - Need to integrate search
  - Need to show the documents
Slide 24: Text Clustering
- Finds overall similarities among groups of documents
- Finds overall similarities among groups of tokens
- Picks out some themes, ignores others
Slide 25: Scatter/Gather
- Cutting, Pedersen, Tukey & Karger 92, 93; Hearst & Pedersen 95
- How it works (a rough code sketch follows):
  - Cluster sets of documents into general themes, like a table of contents
  - Display the contents of the clusters by showing topical terms and typical titles
  - User chooses subsets of the clusters and re-clusters the documents within
  - Resulting new groups have different themes
- Originally used to give a collection overview
- Evidence suggests it is more appropriate for displaying retrieval results in context
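A rough Scatter/Gather-style sketch. The original system used its own fast clustering algorithms; TF-IDF plus KMeans from scikit-learn is only a stand-in here, and the documents, k values, and chosen clusters are all illustrative.

```python
# Scatter: cluster docs into themes; Gather: re-cluster a chosen subset.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

def scatter(docs: list[str], k: int):
    """Cluster docs into k themes; return labels plus top terms per theme."""
    vec = TfidfVectorizer(stop_words="english")
    X = vec.fit_transform(docs)
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    terms = np.array(vec.get_feature_names_out())
    themes = [list(terms[np.argsort(c)[::-1][:3]]) for c in km.cluster_centers_]
    return km.labels_, themes

docs = ["database reliability recovery", "database query optimization",
        "user interface design", "interface evaluation usability",
        "retrieval ranking models", "retrieval evaluation recall"]
labels, themes = scatter(docs, k=3)
print(themes)                           # topical terms per cluster

# "Gather": the user picks two clusters; re-cluster just those documents.
subset = [d for d, lab in zip(docs, labels) if lab in (0, 1)]
print(scatter(subset, k=2)[1])          # new, finer-grained themes
```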
Slide 26: (no transcript; image-only slide)
Slide 27: Another use of clustering
- Use clustering to map the entire huge multidimensional document space into a huge number of small clusters
- Project these onto a 2D graphical representation (a generic sketch follows)
- Group by doc: SPIRE / Kohonen maps
- Group by words: Galaxy of News / HotSauce / Semio
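A generic sketch of the "project a high-dimensional document space to 2D" step. TruncatedSVD is only an illustrative stand-in: SPIRE and Kohonen feature maps each use their own projection and clustering methods.

```python
# Reduce a sparse TF-IDF document space to 2D layout coordinates.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

docs = ["database reliability", "database recovery", "interface design",
        "usability evaluation", "retrieval ranking", "query reformulation"]
X = TfidfVectorizer().fit_transform(docs)            # high-dimensional space
xy = TruncatedSVD(n_components=2).fit_transform(X)   # 2D layout coordinates
for doc, (x, y) in zip(docs, xy):
    print(f"({x:+.2f}, {y:+.2f})  {doc}")
```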
Slide 28: Clustering Multi-Dimensional Document Space (image from Wise et al. 95)
Slide 29: Kohonen Feature Maps on Text (from Chen et al., JASIS 49(7))
Slide 30: Summary: Clustering
- Advantages
  - Get an overview of main themes
  - Domain independent
- Disadvantages
  - Many of the ways documents could group together are not shown
  - Not always easy to understand what the clusters mean
  - Different levels of granularity
Slide 31: IR & HCI
- Question 2: How will a user formulate a query?
Slide 32: Query Specification
- Interaction styles (Shneiderman 97):
  - Command Language
  - Form Fill
  - Menu Selection
  - Direct Manipulation
  - Natural Language
- What about gesture, eye-tracking, or implicit inputs like reading habits?
Slide 33: Command-Based Query Specification
- Pattern: command attribute value connector
- Example: find pa shneiderman and tw user (parsed in the toy sketch after the questions below)
- What are the attribute names?
- What are the command names?
- What are allowable values?
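To make the burden concrete, here is a toy parser for queries in the pattern above. The field codes and their meanings ("pa" as personal author, "tw" as title word) are assumptions for illustration, not any particular system's actual codes.

```python
# Toy parser for the command-language pattern "command attribute value
# connector ..." -- the user must memorize commands and attribute codes.
FIELDS = {"pa": "personal_author", "tw": "title_word"}  # assumed meanings

def parse(command: str) -> list[tuple[str, str]]:
    """Turn 'find pa shneiderman and tw user' into (field, value) pairs."""
    tokens = command.split()
    if tokens[0] != "find":
        raise ValueError("only the 'find' command is supported in this toy")
    clauses, i = [], 1
    while i < len(tokens):
        clauses.append((FIELDS[tokens[i]], tokens[i + 1]))
        i += 3                    # step over the 'and' connector
    return clauses

print(parse("find pa shneiderman and tw user"))
# -> [('personal_author', 'shneiderman'), ('title_word', 'user')]
```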
Slide 34: Form-Based Query Specification (AltaVista)
Slide 35: Form-Based Query Specification (Melvyl)
Slide 36: Form-Based Query Specification (Infoseek)
Slide 37: Direct Manipulation Specification: VQUERY (Jones 98)
Slide 38: Menu-Based Query Specification (Young & Shneiderman 93)
Slide 39: IR & HCI
- Question 3: How will a user scan, evaluate, and interpret the results?
Slide 40: Display of Retrieval Results
- Goal: minimize time/effort for deciding which documents to examine in detail
- Idea: show the roles of the query terms in the retrieved documents, making use of document structure
Slide 41: Putting Results in Context
- Interfaces should:
  - give hints about the roles terms play in the collection
  - give hints about what will happen if various terms are combined
  - show explicitly why documents are retrieved in response to the query
  - summarize compactly the subset of interest
Slide 42: Putting Results in Context
- Visualizations of query term distribution
  - KWIC, TileBars, SeeSoft
- Visualizing shared subsets of query terms
  - InfoCrystal, VIBE, Lattice Views
- Table of contents as context
  - Superbook, Cha-Cha, DynaCat
- Organizing results with tables
  - Envision, SenseMaker
- Using hyperlinks
  - WebCutter
Slide 43: KWIC (Keyword in Context)
- An old standard, ignored by internet search engines
  - used in some intranet engines, e.g., Cha-Cha (a minimal sketch follows)
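A minimal sketch of a KWIC display: each hit of a query term is shown with a fixed window of surrounding context. The window size and sample text are arbitrary.

```python
# Keyword in context: show every hit of a term with surrounding characters.
def kwic(text: str, term: str, window: int = 30) -> list[str]:
    hits, low, needle, i = [], text.lower(), term.lower(), 0
    while (i := low.find(needle, i)) != -1:
        start = max(0, i - window)
        hits.append("..." + text[start:i + len(needle) + window] + "...")
        i += len(needle)          # continue scanning past this hit
    return hits

doc = ("The user interface determines how the user formulates a query "
       "and how the user interprets the retrieved results.")
for line in kwic(doc, "user"):
    print(line)
```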
Slide 44: TileBars
- Graphical representation of term distribution and overlap
- Simultaneously indicate (see the data sketch after this list):
  - relative document length
  - query term frequencies
  - query term distributions
  - query term overlap
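A rough sketch of the data behind a TileBar: one row per query term, one cell per document segment, shaded where the term occurs. TileBars proper segments documents into topical "tiles" with TextTiling; fixed-size tiles here are a simplification, and the document and terms are illustrative.

```python
# Count each query term's hits per fixed-size tile of the document.
def tilebar(doc: str, terms: list[str], tile_size: int = 10) -> list[list[int]]:
    words = doc.lower().split()
    tiles = [words[i:i + tile_size] for i in range(0, len(words), tile_size)]
    # rows = query terms, columns = tiles, cell = term frequency in tile
    return [[tile.count(t.lower()) for tile in tiles] for t in terms]

doc = ("database systems store records reliably and efficiently but the "
       "reliability of the database depends on recovery logs while the "
       "user interface of the database matters too")
terms = ["database", "reliability"]
for term, row in zip(terms, tilebar(doc, terms, tile_size=5)):
    print(f"{term:12s}", "".join("#" if c else "." for c in row))
```

Printed side by side, the rows show document length (number of tiles), per-term frequency and distribution, and where the terms overlap.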
Slide 45: TileBars Example
- Query terms: DBMS (Database Systems), Reliability. What roles do they play in retrieved documents?
- Example interpretations of the bars:
  - Mainly about both DBMS and reliability
  - Mainly about DBMS, discusses reliability
  - Mainly about, say, banking, with a subtopic discussion on DBMS/reliability
  - Mainly about high-tech layoffs
Slide 46: (no transcript; image-only slide)
Slide 47: SeeSoft: Showing text content using a linear representation and brushing-and-linking (Eick & Wills 95)
Slide 48: David Small: Virtual Shakespeare
Slide 49: (no transcript; image-only slide)
Slide 50: (no transcript; image-only slide)
Slide 51: Other Approaches
- Show how often each query term occurs in retrieved documents:
  - VIBE (Korfhage 91)
  - InfoCrystal (Spoerri 94)
Slide 52: VIBE (Olson et al. 93, Korfhage 93)
Slide 53: InfoCrystal (Spoerri 94)
Slide 54: Problems with InfoCrystal
- can't see overlap of terms within docs
- quantities not represented graphically
- more than 4 terms hard to handle
- no help in selecting terms to begin with
Slide 55: Cha-Cha (Chen & Hearst 98)
- Shows a table-of-contents-like view, like Superbook
- Takes advantage of human-created structure within hyperlinks to create the TOC
Slide 56: IR & HCI
- Question 4: How can a user reformulate a query?
Slide 57: (diagram of the standard information access cycle, with labels "Information need", "Collections", "text input", and "Query Modification" marking where reformulation feeds back into the loop)
Slide 58: Query Modification
- Problem: how to reformulate the query?
- Thesaurus expansion
  - Suggest terms similar to query terms
- Relevance feedback (a Rocchio-style sketch follows)
  - Suggest terms (and documents) similar to retrieved documents that have been judged to be relevant
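The slide names no formula; Rocchio-style feedback is one standard way to realize the idea: move the query vector toward judged-relevant documents and away from non-relevant ones, then surface the terms whose weights grew. The alpha/beta/gamma values, documents, and judgments below are illustrative.

```python
# Rocchio-style relevance feedback over TF-IDF vectors (illustrative).
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["database reliability recovery", "database query optimization",
        "reliability of storage systems", "user interface usability"]
vec = TfidfVectorizer(stop_words="english")
X = vec.fit_transform(docs).toarray()

q = vec.transform(["database reliability"]).toarray()[0]
relevant, nonrelevant = [0, 2], [3]            # the user's judgments

alpha, beta, gamma = 1.0, 0.75, 0.15           # conventional demo weights
q_new = (alpha * q
         + beta * X[relevant].mean(axis=0)
         - gamma * X[nonrelevant].mean(axis=0))
q_new = np.clip(q_new, 0.0, None)              # drop negative term weights

terms = vec.get_feature_names_out()
print([t for t, old, new in zip(terms, q, q_new) if new > old + 1e-9])
# terms like 'recovery' and 'storage' surface as feedback suggestions
```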
Slide 59: Using Relevance Feedback
- Known to improve results
  - in TREC-like conditions (no user involved)
- What about with a user in the loop?
Slide 60: (no transcript; image-only slide)
Slide 61: Terms available for relevance feedback made visible (from Koenemann & Belkin, 1996)
Slide 62: How much of the guts should the user see?
- Opaque (black box)
  - (like web search engines)
- Transparent
  - (see available terms after the relevance feedback)
- Penetrable
  - (see suggested terms before the relevance feedback)
- Which do you think worked best?
Slide 63: Effectiveness Results
- Subjects using relevance feedback performed 17-34% better than those without it
- Subjects in the penetrable case did 15% better as a group than those in the opaque and transparent cases
Slide 64: Summary: HCI interface questions using the standard model of IR
- Where does a user start? Faced with a large set of collections, how can a user choose one to begin with?
- How will a user formulate a query?
- How will a user scan, evaluate, and interpret the results?
- How can a user reformulate a query?
Slide 65: Standard Model
- Assumptions:
  - Maximizing precision and recall simultaneously
  - The information need remains static
  - The value is in the resulting document set
Slide 66: Problems with the Standard Model
- Users learn during the search process:
  - Scanning titles of retrieved documents
  - Reading retrieved documents
  - Viewing lists of related topics/thesaurus terms
  - Navigating hyperlinks
- Some users don't like long, disorganized lists of documents
Slide 67: Berrypicking as an Information-Seeking Strategy (Bates 89)
- Standard IR model
  - assumes the information need remains the same throughout the search process
- Berrypicking model
  - interesting information is scattered like berries among bushes
  - the query is continually shifting
  - people are learning as they go
Slide 68: A sketch of a searcher moving through many actions towards a general goal of satisfactory completion of research related to an information need (after Bates 89). (Diagram: a wandering path through successive queries Q0, Q1, Q2, Q3, Q4, Q5.)
Slide 69: Implications
- Interfaces should make it easy to store intermediate results
- Interfaces should make it easy to follow trails with unanticipated results
Slide 70: Information Access: Is the standard IR model always the model?
- No, other models have been proposed and explored, including:
  - Berrypicking (Bates, 1989)
  - Sensemaking (Russell et al., 1993)
  - Orienteering (O'Day and Jeffries, 1993)
  - Intermediaries (Maglio and Barrett, 1996)
  - Social Navigation (Dourish and Chalmers, 1994)
  - Agents (e.g., Maes, 1992)
- And don't forget experiments like (Blair and Maron, 1985)
Slide 71: Next Time
- Abbe Don, guest speaker
  - Information architecture and novel interfaces for information access
  - See the Apple Guides paper listed on the IS202 assignments page, along with the other readings
- Also, a request from Abbe:
  - look at the following websites:
    - www.disney.com
    - www.sony.com
    - www.nickelodeon.com
  - go at least "3 levels" deep to get a sense of how the sites are organized