1
Interfaces for Information Retrieval
Ray Larson and Warren Sack
IS202: Information Organization and Retrieval
Fall 2001, UC Berkeley, SIMS
Lecture authors: Marti Hearst, Ray Larson, Warren Sack
2
Today
  • What is HCI?
  • Interfaces for IR using the standard model of IR
  • Interfaces for IR using new models of IR and/or
    different models of interaction

3
Human-Computer Interaction (HCI)
  • Human
  • the end-user of a program
  • Computer
  • the machine the program runs on
  • Interaction
  • the user tells the computer what they want
  • the computer communicates results
  • (slide adapted from James Landay)

4
What is HCI?
(slide by James Landay)
5
(No Transcript)
6
Shneiderman on HCI
  • Well-designed interactive computer systems:
  • Promote positive feelings of success, competence, and mastery
  • Allow users to concentrate on their work, rather than on the system

7
Usability Design Goals
  • Ease of learning
  • faster the second time and so on...
  • Recall
    remember how to use the system from one session to the next
  • Productivity
  • perform tasks quickly and efficiently
  • Minimal error rates
    if errors occur, give good feedback so the user can recover
  • High user satisfaction
  • confident of success

(slide by James Landay)
8
Who builds UIs?
  • A team of specialists
  • graphic designers
  • interaction / interface designers
  • technical writers
  • marketers
  • test engineers
  • software engineers

(slide by James Landay)
9
How to Design and Build UIs
  • Task analysis
  • Rapid prototyping
  • Evaluation
  • Implementation

Iterate at every stage!
(slide adapted from James Landay)
10
Task Analysis
  • Observe existing work practices
  • Create examples and scenarios of actual use
  • Try out new ideas before building software

11
Task: Information Access
  • The standard interaction model for information
    access
  • (1) start with an information need
  • (2) select a system and collections to search on
  • (3) formulate a query
  • (4) send the query to the system
  • (5) receive the results
  • (6) scan, evaluate, and interpret the results
  • (7) stop, or
  • (8) reformulate the query and go to step 4 (this loop is sketched below)
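A minimal Python sketch of this loop follows; the helper functions and the toy reformulation rule are placeholders invented for illustration, not part of the lecture.

```python
# A minimal sketch of the standard interaction model (steps 1-8).
# The helpers below are hypothetical stand-ins for the real interface,
# search system, and user judgment.

def formulate_query(information_need):            # steps 1-3
    return information_need.lower().split()

def run_query(collection, query):                  # steps 4-5
    return [doc for doc in collection
            if any(term in doc.lower() for term in query)]

def results_are_satisfactory(results):             # step 6 (toy stopping rule)
    return len(results) > 0

def information_access(information_need, collection):
    query = formulate_query(information_need)
    while query:
        results = run_query(collection, query)
        if results_are_satisfactory(results):      # step 7: stop
            return results
        query = query[:-1]                          # step 8: (toy) reformulation
    return []

docs = ["User interfaces for information retrieval", "Database reliability"]
print(information_access("retrieval interfaces", docs))
```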

12
HCI Interface questions using the standard model
of IR
  • Where does a user start? Faced with a large set
    of collections, how can a user choose one to
    begin with?
  • How will a user formulate a query?
  • How will a user scan, evaluate, and interpret the
    results?
  • How can a user reformulate a query?

13
Interface design: Is it always HCI or the highway?
  • No, there are other ways to design interfaces,
    including using methods from
  • Art
  • Architecture
  • Sociology
  • Anthropology
  • Narrative theory
  • Geography

14
Information Access: Is the standard IR model always the model?
  • No, other models have been proposed and explored
    including
  • Berrypicking (Bates, 1989)
  • Sensemaking (Russell et al., 1993)
  • Orienteering (O'Day and Jeffries, 1993)
  • Intermediaries (Maglio and Barrett, 1996)
  • Social Navigation (Dourish and Chalmers, 1994)
  • Agents (e.g., Maes, 1992)
  • And don't forget experiments like (Blair and Maron, 1985)

15
IR & HCI
  • Question 1: Where does the user start?

16
Dialog box for choosing sources in the old Lexis-Nexis interface
17
Where does a user start?
  • Supervised (Manual) Category Overviews
  • Yahoo!
  • HiBrowse
  • MeSHBrowse
  • Unsupervised (Automated) Groupings
  • Clustering
  • Kohonen Feature Maps

18
(No Transcript)
19
Incorporating Categories into the Interface
  • Yahoo is the standard method
  • Problems
  • Hard to search, meant to be navigated.
  • Only one category per document (usually)

20
More Complex Example: MeSH and MEDLINE
  • MeSH Category Hierarchy
  • Medical Subject Headings
  • 18,000 labels
  • manually assigned
  • 8 labels/article on average
  • avg depth 4.5, max depth 9
  • Top-Level Categories: anatomy, animals, disease, drugs, diagnosis, psych, biology, physics, technology, humanities, related disciplines

21
MeSHBrowse (Korn & Shneiderman '95): Only the relevant subset of the hierarchy is shown at one time.
22
HiBrowse (Pollitt '97): Browsing several different subsets of category metadata simultaneously.
23
Large Category Sets
  • Problems for User Interfaces
  • Too many categories to browse
  • Too many docs per category
  • Docs belong to multiple categories
  • Need to integrate search
  • Need to show the documents

24
Text Clustering
  • Finds overall similarities among groups of
    documents
  • Finds overall similarities among groups of tokens
  • Picks out some themes, ignores others

25
Scatter/Gather
  • Cutting, Pedersen, Tukey & Karger '92, '93; Hearst & Pedersen '95
  • How it works
  • Cluster sets of documents into general themes,
    like a table of contents
  • Display the contents of the clusters by showing
    topical terms and typical titles
  • User chooses subsets of the clusters and
    re-clusters the documents within
  • Resulting new groups have different themes
  • Originally used to give collection overview
  • Evidence suggests it is more appropriate for displaying retrieval results in context (one round is sketched below)
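A rough sketch of one scatter step and one gather step, using scikit-learn's TfidfVectorizer and KMeans as stand-ins for the clustering used in the original papers; this illustrates the interaction idea, not the authors' algorithm.

```python
# One round of Scatter/Gather, approximated with scikit-learn.
# The original systems used their own fast clustering algorithms;
# KMeans over tf-idf vectors stands in for them here.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

def scatter(docs, k=3, n_terms=5):
    """Cluster docs into k themes, each summarized by topical terms."""
    vec = TfidfVectorizer(stop_words="english")
    X = vec.fit_transform(docs)
    km = KMeans(n_clusters=min(k, len(docs)), n_init=10, random_state=0).fit(X)
    terms = np.array(vec.get_feature_names_out())
    clusters = []
    for c in range(km.n_clusters):
        members = [d for d, label in zip(docs, km.labels_) if label == c]
        topical = terms[np.argsort(km.cluster_centers_[c])[::-1][:n_terms]]
        clusters.append({"topical_terms": list(topical), "docs": members})
    return clusters

def gather(clusters, selected, k=3):
    """Pool the documents of the user-selected clusters and re-scatter them."""
    pooled = [d for i in selected for d in clusters[i]["docs"]]
    return scatter(pooled, k=k)
```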

26
(No Transcript)
27
Another use of clustering
  • Use clustering to map the huge, multidimensional document space into a large number of small clusters.
  • Project these onto a 2D graphical
    representation
  • Group by doc: SPIRE / Kohonen maps
  • Group by words: Galaxy of News / HotSauce / Semio (a projection sketch follows below)
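A hedged sketch of that pipeline, using KMeans for the many small clusters and truncated SVD for the 2D layout; SPIRE and the Kohonen-map work used different algorithms (e.g., self-organizing maps), so this only illustrates the general idea.

```python
# Illustrative only: many small clusters, plus a 2D position per document.
# The named systems used other projections (e.g., Kohonen self-organizing
# maps); TruncatedSVD is used here just to make the idea concrete.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD

def cluster_map(docs, n_clusters=50):
    X = TfidfVectorizer(stop_words="english").fit_transform(docs)
    n_clusters = min(n_clusters, X.shape[0])   # no more clusters than documents
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(X)
    coords = TruncatedSVD(n_components=2).fit_transform(X)
    # each document gets a cluster id and an (x, y) position for plotting
    return [(doc, int(label), tuple(xy)) for doc, label, xy in zip(docs, labels, coords)]
```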

28
Clustering Multi-Dimensional Document Space (image from Wise et al. '95)
29
Kohonen Feature Maps on Text (from Chen et al., JASIS 49(7))
30
Summary: Clustering
  • Advantages
  • Get an overview of main themes
  • Domain independent
  • Disadvantages
  • Many of the ways documents could group together
    are not shown
  • Not always easy to understand what they mean
  • Different levels of granularity

31
IR & HCI
  • Question 2: How will a user formulate a query?

32
Query Specification
  • Interaction Styles (Shneiderman 97)
  • Command Language
  • Form Fill
  • Menu Selection
  • Direct Manipulation
  • Natural Language
  • What about gesture, eye-tracking, or implicit
    inputs like reading habits?

33
Command-Based Query Specification
  • command attribute value connector
  • find pa shneiderman and tw user
  • What are the attribute names?
  • What are the command names?
  • What are allowable values? (a toy parser for the example above is sketched below)
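A toy parser for this command shape, based only on the single example above; the grammar, and the meaning of attribute codes like pa and tw, are assumptions for illustration rather than the actual system's syntax.

```python
# Toy parser for "command attribute value [connector attribute value ...]".
# Grammar and attribute codes are assumed from the single slide example.
CONNECTORS = {"and", "or", "not"}

def parse_command(line):
    tokens = line.split()
    command, rest = tokens[0], tokens[1:]
    clauses, connectors = [], []
    i = 0
    while i < len(rest):
        if rest[i] in CONNECTORS:
            connectors.append(rest[i])
            i += 1
        clauses.append((rest[i], rest[i + 1]))   # (attribute, value) pair
        i += 2
    return {"command": command, "clauses": clauses, "connectors": connectors}

print(parse_command("find pa shneiderman and tw user"))
# {'command': 'find', 'clauses': [('pa', 'shneiderman'), ('tw', 'user')],
#  'connectors': ['and']}
```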

34
Form-Based Query Specification (Altavista)
35
Form-Based Query Specification (Melvyl)
36
Form-based Query Specification (Infoseek)
37
Direct Manipulation Spec.: VQUERY (Jones '98)
38
Menu-based Query Specification (Young & Shneiderman '93)
39
IR & HCI
  • Question 3: How will a user scan, evaluate, and interpret the results?

40
Display of Retrieval Results
  • Goal: minimize time/effort for deciding which documents to examine in detail
  • Idea: show the roles of the query terms in the retrieved documents, making use of document structure

41
Putting Results in Context
  • Interfaces should
  • give hints about the roles terms play in the
    collection
  • give hints about what will happen if various
    terms are combined
  • show explicitly why documents are retrieved in
    response to the query
  • summarize compactly the subset of interest

42
Putting Results in Context
  • Visualizations of Query Term Distribution
  • KWIC, TileBars, SeeSoft
  • Visualizing Shared Subsets of Query Terms
  • InfoCrystal, VIBE, Lattice Views
  • Table of Contents as Context
  • Superbook, Cha-Cha, DynaCat
  • Organizing Results with Tables
  • Envision, SenseMaker
  • Using Hyperlinks
  • WebCutter

43
KWIC (Keyword in Context)
  • An old standard, ignored by Internet search engines
  • used in some intranet engines, e.g., Cha-Cha (a minimal KWIC routine is sketched below)
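A minimal keyword-in-context routine; the window size and punctuation handling are arbitrary choices made here for illustration.

```python
# Minimal KWIC: show each occurrence of a query term with a window of
# surrounding words. Window size is an arbitrary illustrative choice.
def kwic(text, term, window=4):
    words = text.split()
    lines = []
    for i, word in enumerate(words):
        if word.lower().strip(".,;:!?") == term.lower():
            left = " ".join(words[max(0, i - window):i])
            right = " ".join(words[i + 1:i + 1 + window])
            lines.append(f"... {left} [{word}] {right} ...")
    return lines

doc = "TileBars show the roles that query terms play in retrieved documents."
print(kwic(doc, "terms"))
```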

44
TileBars
  • Graphical Representation of Term Distribution and
    Overlap
  • Simultaneously Indicate
  • relative document length
  • query term frequencies
  • query term distributions
  • query term overlap (the underlying count grid is sketched below)
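A rough sketch of the data behind such a display: split the document into segments and count each query term per segment; each count becomes one tile, shaded darker for higher counts. TileBars actually segments documents by topic with TextTiling, so the fixed-size segments here are a simplification.

```python
# The grid behind a TileBar: rows are document segments, columns are
# query terms, and each cell holds the term's count in that segment.
# Fixed-size segments stand in for TileBars' topic-based segmentation.
def tilebar_grid(doc_text, query_terms, seg_len=50):
    words = doc_text.lower().split()
    segments = [words[i:i + seg_len] for i in range(0, len(words), seg_len)]
    return [[seg.count(term.lower()) for term in query_terms] for seg in segments]

# Relative document length is the number of rows; term distribution and
# overlap are read off by scanning a column or comparing columns per row.
```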

45
TileBars Example
Query terms: DBMS (Database Systems), Reliability. What roles do they play in the retrieved documents?
  • Mainly about both DBMS and reliability
  • Mainly about DBMS, discusses reliability
  • Mainly about, say, banking, with a subtopic discussion on DBMS/reliability
  • Mainly about high-tech layoffs
46
(No Transcript)
47
SeeSoft: Showing text content using a linear representation and brushing and linking (Eick & Wills '95)
48
David Small: Virtual Shakespeare
49
(No Transcript)
50
(No Transcript)
51
Other Approaches
  • Show how often each query term occurs in
    retrieved documents
  • VIBE (Korfhage 91)
  • InfoCrystal (Spoerri 94)

52
VIBE (Olson et al. 93, Korfhage 93)
53
InfoCrystal (Spoerri 94)
54
Problems with InfoCrystal
  • can't see overlap of terms within docs
  • quantities not represented graphically
  • more than 4 terms hard to handle
  • no help in selecting terms to begin with

55
Cha-Cha (Chen & Hearst '98)
  • Shows table-of-contents-like view, like
    Superbook
  • Takes advantage of human-created structure within
    hyperlinks to create the TOC

56
IR & HCI
  • Question 4: How can a user reformulate a query?

57
(Diagram of the standard information access cycle: information need, collections, text query input, query modification.)
58
Query Modification
  • Problem: how to reformulate the query?
  • Thesaurus expansion
  • Suggest terms similar to query terms
  • Relevance feedback
  • Suggest terms (and documents) similar to retrieved documents that have been judged relevant (a Rocchio-style sketch follows below)
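A common way to make relevance feedback concrete is Rocchio's formula, sketched here over term-weight vectors with NumPy; the alpha/beta/gamma weights are conventional textbook defaults, not values from the lecture.

```python
# Rocchio-style relevance feedback: move the query vector toward the
# centroid of judged-relevant documents and away from non-relevant ones.
# alpha, beta, gamma are conventional defaults assumed for illustration.
import numpy as np

def rocchio(query_vec, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    q = alpha * np.asarray(query_vec, dtype=float)
    if len(relevant):
        q = q + beta * np.mean(np.asarray(relevant, dtype=float), axis=0)
    if len(nonrelevant):
        q = q - gamma * np.mean(np.asarray(nonrelevant, dtype=float), axis=0)
    return np.clip(q, 0.0, None)   # negative term weights are usually dropped

# Terms whose weight grows after feedback are natural candidates to show
# the user as suggested expansion terms.
```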

59
Using Relevance Feedback
  • Known to improve results
  • in TREC-like conditions (no user involved)
  • What about with a user in the loop?

60
(No Transcript)
61
Terms available for relevance feedback made visible (from Koenemann & Belkin, 1996)
62
How much of the guts should the user see?
  • Opaque (black box)
  • (like web search engines)
  • Transparent
  • (see available terms after the r.f.)
  • Penetrable
  • (see suggested terms before the r.f.)
  • Which do you think worked best?

63
Effectiveness Results
  • Subjects with R.F. performed 17-34% better than those without R.F.
  • Subjects in the penetrable case did 15% better as a group than those in the opaque and transparent cases.

64
Summary: HCI Interface questions using the standard model of IR
  • Where does a user start? Faced with a large set
    of collections, how can a user choose one to
    begin with?
  • How will a user formulate a query?
  • How will a user scan, evaluate, and interpret the
    results?
  • How can a user reformulate a query?

65
Standard Model
  • Assumptions
  • Maximizing precision and recall simultaneously
  • The information need remains static
  • The value is in the resulting document set

66
Problem with Standard Model
  • Users learn during the search process
  • Scanning titles of retrieved documents
  • Reading retrieved documents
  • Viewing lists of related topics/thesaurus terms
  • Navigating hyperlinks
  • Some users don't like long disorganized lists of documents

67
Berrypicking as an Information Seeking Strategy (Bates 89)
  • Standard IR model
  • assumes the information need remains the same
    throughout the search process
  • Berrypicking model
  • interesting information is scattered like berries
    among bushes
  • the query is continually shifting
  • People are learning as they go

68
A sketch of a searcher moving through many
actions towards a general goal of satisfactory
completion of research related to an information
need. (after Bates 89)
(Successive queries Q0 through Q5 mark points along the searcher's path.)
69
Implications
  • Interfaces should make it easy to store
    intermediate results
  • Interfaces should make it easy to follow trails
    with unanticipated results

70
Information Access: Is the standard IR model always the model?
  • No, other models have been proposed and explored
    including
  • Berrypicking (Bates, 1989)
  • Sensemaking (Russell et al., 1993)
  • Orienteering (O'Day and Jeffries, 1993)
  • Intermediaries (Maglio and Barrett, 1996)
  • Social Navigation (Dourish and Chalmers, 1994)
  • Agents (e.g., Maes, 1992)
  • And don't forget experiments like (Blair and Maron, 1985)

71
Next Time
  • Abbe Don, Guest speaker
  • Information architecture and novel interfaces for
    information access.
  • See Apple Guides paper listed on IS202
    assignments page, along with other readings
  • Also, here is a request from Abbe
  • look at the following websites
  • www.disney.com
  • www.sony.com
  • www.nickelodeon.com
  • go at least "3 levels" deep to get a sense of how
    the sites are organized.