Tutorial: User Interfaces

About This Presentation
Title:

Tutorial: User Interfaces

Description:

Tutorial: User Interfaces & Visualization for Information Access. Prof. Marti Hearst, University of California, Berkeley. http://www.sims.berkeley.edu/~hearst


Transcript and Presenter's Notes

Title: Tutorial: User Interfaces


1
Tutorial: User Interfaces & Visualization for
Information Access
  • Prof. Marti Hearst
  • University of California, Berkeley
  • http://www.sims.berkeley.edu/~hearst
  • SIGIR 2000

2
Outline
  • Search Interfaces Today
  • HCI Foundations
  • The Information Seeking Process
  • Visualizing Text Collections
  • Incorporating Context and Tasks
  • Promising Future Directions

3
Introductory Remarks
  • Much of HCI is art, still not science
  • In this tutorial, I discuss user studies whenever
    available
  • I do not have time to do justice to most of the
    topics.

4
What do current search interfaces do well?
5
Web search interfaces & solutions
  • Single-word queries
  • Standard IR assumed long queries
  • Web searches average 1.5-2.5 words
  • Problems
  • One word can have many meanings
  • What context is the word used in?
  • Which of many articles to retrieve?

6
(No Transcript)
7
(No Transcript)
8
(No Transcript)
9
(No Transcript)
10
Web search interfaces & solutions
  • Single-word queries
  • Solutions
  • Incorporation of manually-created categories
  • Provides useful starting points
  • Disambiguates the term(s)
  • Development of sideways hierarchy representation
  • Ranking that emphasizes starting points
  • Link analysis finds server home pages, etc
  • Use of behavior of other users
  • Suggests related pages
  • Popularity of pages

11
Web search interfaces are interesting in terms of
what they do NOT do
  • Current Web interfaces
  • Are the results of each site experimenting
  • Only those ideas that work for most people
    survive
  • Only very simple ideas remain
  • Abandoned strategies
  • Scores and graphical bars that show degree of
    match
  • Associated term graph (AltaVista)
  • Suggested terms for expansion (Excite)
  • Why did these die?

12
What is lacking in Web search?
13
What is lacking?
  • Support for complex information needs
  • Info on the construction on highway 80
  • Research chemo vs. surgery
  • How does the 6th circuit tend to rule on
    intellectual property cases?
  • What is the prior art for this invention?

14
What is lacking?
  • Integration of search and analysis
  • Support for a series of searches
  • Backing up and moving forward
  • Suggested next steps
  • Comparisons and contrasts
  • Personal prior history and interests
  • More generally
  • CONTEXT
  • INTERACTIVITY

15
What is lacking?
  • Question Answering
  • Answers, not documents!
  • Active area of research and industry
  • Not always appropriate
  • Should I have chemo or surgery?
  • Who will win the election?

16
(No Transcript)
17
(No Transcript)
18
Question Answering: State-of-the-Art
  • Kupiec SIGIR 93, Srihari & Li NAACL 00, Cardie et
    al. NAACL 00
  • Goal: Find a paragraph, phrase, or sentence that
    (hopefully) answers the question.
  • Approach
  • Identify certain types of noun phrases
  • People
  • Dates
  • Hook these up with question types
  • Who
  • When
  • Match keywords in the question to keywords in
    candidate answers that contain the right kind of
    NP (see the sketch below)
  • Use syntactic or simple frame semantics to help
    with matching (optional)

19
Question Answering: Example
  • Q: What Pulitzer Prize-winning author ran
    unsuccessfully for mayor of New York?
  • Relevant sentences from encyclopedia
  • In 1968 Norman Mailer won the Pulitzer Prize
    for ...
  • In 1975, Mailer ran for the office of New York
    City mayor.

20
The future of search tools: A Prediction of a
Dichotomy
  • Information Intensive
  • Business analysis
  • Scientific research
  • Planning design
  • Quick lookup
  • Question answering
  • Context-dependent info (location, time)

21
Human-Computer Interaction
22
What is HCI?
  • HCI = Human-Computer Interaction
  • A discipline concerned with
  • design
  • evaluation
  • implementation
  • of interactive computing systems for human use
  • The study of major phenomena surrounding the
    interaction of humans with computers.

23
Shneiderman on HCI
  • Well-designed interactive computer systems
    promote
  • Positive feelings of success, competence, and
    mastery.
  • Allow users to concentrate on their work, rather
    than on the system.

24
What is HCI?
(Diagram: Humans and Technology performing a Task, embedded in Organizational & Social Issues)
25
User-centered Design
  • Focus first on what people need to do, not what
    the system needs to do.
  • Formulate typical scenarios of use.
  • Take into account
  • Cognitive constraints
  • Organizational/Social constraints
  • Keep users involved throughout the project

26
Waterfall Design Model (from Software
Engineering)
Initiation → Application Description
Analysis → Requirements Specification
Design → System Design
Implementation → Product
?
27
UI Design Iteration
Design → Prototype → Evaluate → (repeat)
28
Comparing Design Processes
  • Waterfall model
  • The customer is not the user
  • User-centered design
  • Assess what the user needs
  • Design for this
  • Redesign if user needs are not met

29
Steps in Standard UI Design
  • Needs Assessment / Task Analysis
  • Low-fidelity Prototype Evaluation
  • Redesign
  • Interactive Prototype
  • Heuristic Evaluation
  • Redesign
  • Revised Interactive Prototype
  • Pilot User Study
  • Redesign
  • Revised Interactive Prototype
  • Larger User Study

30
Task Analysis
  • Observe existing work practices
  • Create examples and scenarios of actual use
  • Try out new ideas before building software

31
Rapid Prototyping
  • Build a mock-up of design
  • Low fidelity techniques
  • paper sketches
  • cut, copy, paste
  • video segments
  • Interactive prototyping tools
  • Visual Basic, HyperCard, Director, etc.
  • UI builders
  • NeXT, etc.

32
Usability EvaluationStandard Techniques
  • User studies
  • Have people use the interface to complete some
    tasks
  • Requires an implemented interface
  • "Discount" vs. Scientific Results
  • Heuristic Evaluation
  • Usability expert assesses guidelines

33
Cognitive Considerations: Norman's Action Cycle
  • Human action has two aspects
  • execution and evaluation
  • Execution: doing something
  • Evaluation: comparison of what happened to what
    was desired

34
Action Cycle
Goals
Evaluation
Execution
The World
35
Action Cycle
Goals
Evaluation: evaluation of interpretations,
interpreting the perception, perceiving the state
of the world
Execution: intention to act, sequence of actions,
execution of the sequence of actions
The World
36
Norman's Action Cycle
  • Execution has three stages
  • Start with a goal
  • Translate into an intention
  • Translate into a sequence of actions
  • Now execute the actions
  • Evaluation has three stages
  • Perceive world
  • Interpret what was perceived
  • Compare with respect to original intentions

37
Gulf of Evaluation
  • The amount of effort a person must exert to
    interpret
  • the physical state of the system
  • how well the expectations and intentions have
    been met
  • We want a small gulf!

38
Mental Models
  • People have mental models of how things work
  • how does your car start?
  • how does an ATM machine work?
  • how does your computer boot?
  • Allows people to make predictions about how
    things will work

39
Strategy for Design
  • Provide a good conceptual model
  • allows users to predict consequences of actions
  • communicated through the image of the system
  • relations between users' intentions, required
    actions, and results should be
  • sensible
  • consistent
  • meaningful (non-arbitrary)

40
Design Guidelines
Shneiderman (8 design rules)
  • Consistency
  • Shortcuts (for experts)
  • Feedback
  • Closure
  • Error prevention
  • Easy reversal of actions
  • User control
  • Low memory burden

There are hundreds of design guidelines listings!
41
Design Guidelines for Search UIs
  • I think the most important are
  • Reduce memory burden / Provide feedback
  • Previews
  • History
  • Context
  • User control
  • Query modification
  • Flexible manipulation of results
  • Easy reversal of actions

42
Designing for Error
  • Norman on designing for error
  • Understand the causes of error and design to
    minimize these causes
  • Make it possible to reverse actions
  • Make it hard to do non-reversible actions
  • Make it easy to discover the errors that do occur
  • Change attitude towards errors
  • A user is attempting to do a task, getting
    there by imperfect approximations; actions are
    approximations to what is actually desired.

43
HCI Intro Summary
  • UI design involves users
  • UI design is iterative
  • An art, not a science
  • Evaluation is key
  • Design guidelines
  • are useful
  • but application to information-centric systems
    can be difficult

44
Recommended HCI Books
  • Alan Dix et al., Human-Computer Interaction, 2nd
    edition (Feb 1998) Prentice Hall
  • Ben Shneiderman, Designing the User Interface:
    Strategies for Effective Human-Computer
    Interaction, 3rd ed., Addison-Wesley, 1998.
  • Jakob Nielsen, Usability Engineering, Morgan
    Kaufmann, 1994
  • Holtzblatt and Beyer, Making Customer-Centered
    Design Work for Teams, CACM, 36 (10), October
    1993.
  • www.useit.com
  • world.std.com/uieweb
  • usableweb.com

45
Supporting the Information Seeking Process
  • Two parts to the process
  • search and retrieval
  • analysis and synthesis of search results

46
Standard IR Model
  • Assumptions
  • Maximizing precision and recall simultaneously
  • The information need remains static
  • The value is in the resulting document set

47
Problem with Standard Model
  • Users learn during the search process
  • Scanning titles of retrieved documents
  • Reading retrieved documents
  • Viewing lists of related topics/thesaurus terms
  • Navigating hyperlinks
  • Some users don't like long, disorganized lists of
    documents

48
A sketch of a searcher moving through many
actions towards a general goal of satisfactory
completion of research related to an information
need. (after Bates 89)
(Diagram: a winding path through queries Q0 through Q5)
49
Berry-picking model (Bates 90)
  • The query is continually shifting
  • Users may move through a variety of sources
  • New information may yield new ideas and new
    directions
  • The query is not satisfied by a single, final
    retrieved set, but rather by a series of
    selections and bits of information found along
    the way.

50
Implications
  • Interfaces should make it easy to store
    intermediate results
  • Interfaces should make it easy to follow trails
    with unanticipated results
  • Makes evaluation more difficult.

51
Orienteering (O'Day & Jeffries 93)
  • Interconnected but diverse searches on a single,
    problem-based theme
  • Focus on information delivery rather than search
    performance
  • Classifications resulting from an extended
    observational study
  • 15 clients of professional intermediaries
  • financial analyst, venture capitalist, product
    marketing engineer, statistician, etc.

52
Orienteering (O'Day & Jeffries 93)
  • Identified three main search types
  • Monitoring
  • Following a plan
  • Exploratory
  • A series of interconnected but diverse searches
    on one problem-based theme
  • Changes in direction caused by triggers
  • Each stage followed by reading, assimilation, and
    analysis of resulting material.

53
Orienteering (O'Day & Jeffries 93)
  • Defined three main search types
  • monitoring
  • a well-known topic over time
  • e.g., research four competitors every quarter
  • following a plan
  • a typical approach to the task at hand
  • e.g., improve business process X
  • exploratory
  • explore topic in an undirected fashion
  • get to know an unfamiliar industry

54
Orienteering (O'Day & Jeffries 93)
  • Trends
  • A series of interconnected but diverse searches
    on one problem-based theme
  • This happened in all three search modes
  • Each analyst did at least two search types
  • Each stage followed by reading, assimilation, and
    analysis of resulting material

55
Orienteering (O'Day & Jeffries 93)
  • Searches tended to trigger new directions
  • Overview, then detail, repeat
  • Information need shifted between search requests
  • Context of problem and previous searches were
    carried to next stage of search
  • The value was contained in the accumulation of
    search results, not the final result set
  • These observations verified Bates' predictions.

56
Orienteering (O'Day & Jeffries 93)
  • Triggers: motivation to switch from one strategy
    to another
  • next logical step in a plan
  • encountering something interesting
  • explaining change
  • finding missing pieces

57
Stop Conditions (O'Day & Jeffries 93)
  • Stopping conditions not as clear as for triggers
  • People stopped searching when
  • no more compelling triggers
  • finished an appropriate amount of searching for
    the task
  • specific inhibiting factor
  • e.g., learning market was too small
  • lack of increasing returns
  • 80/20 rule
  • Missing information/inferences ok
  • business world different than scholarship

58
After the Search Analyzing and Synthesizing
Search Results
  • Orienteering: Post-Search Behaviors
  • Read and Annotate
  • Analyze: 80% fell into six main types

59
Post-Search Analysis Types (O'Day & Jeffries 93)
  • Trends
  • Comparisons
  • Aggregation and Scaling
  • Identifying a Critical Subset
  • Assessing
  • Interpreting
  • The rest
  • cross-reference
  • summarize
  • find evocative visualizations
  • miscellaneous

60
SenseMaking (Russell et al. 93)
  • The process of encoding retrieved information to
    answer task-specific questions
  • Combine
  • internal cognitive resources
  • external retrieved resources
  • Create a good representation
  • an iterative process
  • contend with a cost/benefit tradeoff

61
UIs for Supporting the Search Process
62
Infogrid (design mockup) (Rao et al. 92)
63
InfoGrid/Protofoil (Rao et al. 92)
  • A general search interface architecture
  • Item stash -- store retrieved docs
  • Search Event -- current query
  • History -- history of queries
  • Result Item -- view selected doc's metadata

64
Infogrid Design Mockups (Rao et al. 92)
65
DLITE (Cousins 97)
  • Drag and Drop interface
  • Reify queries, sources, retrieval results
  • Animation to keep track of activity

66
DLITE
  • UI to a digital library
  • Direct manipulation interface
  • Workcenter approach
  • experts create workcenters
  • tools specialized for the workcenter's task
  • contents persistent
  • Web browser used to display document or
    collection metadata

67
(No Transcript)
68
Interaction
  • Pointing at object brings up tooltip -- metadata
  • Activating object -- component specific action
  • 5 types for result set component
  • Drag-and-drop data onto program
  • Animation used to show what happens with
    drag-and-drop (e.g. waggling)

69
User Reaction to DLITE
  • Two participant pools
  • 7 Stanford CS
  • 11 NASA researchers & librarians
  • Requires learning, initially unfamiliar
  • Many requested help pages
  • After the model was understood, few errors
  • Overall positive attitude, even stronger after a
    two week delay
  • Successfully remembered most features after 2
    week lag

70
Keeping Track of History
  • Techniques
  • List of prior queries and results (standard)
  • Slide sorter view, snapshots of earlier
    interactions
  • Graphical hierarchy for web browsing

71
Keeping Track of History
  • PadPrints (Hightower et al. 98)
  • Tree-based history of recently visited web pages;
    history map placed to the left of the browser window
  • Zoomable, can shrink sub-hierarchies
  • Node: title + thumbnail

72
PadPrints (Hightower et al. 98)
73
PadPrints (Hightower et al. 98)
74

Initial User Study of Browser History Mechanism
  • 13.4% unable to find recently visited pages
  • only 0.1% use the History button, 42% use Back
  • problems with history list (according to authors)
  • incomplete, lose out on every branch
  • textual (not necessarily a problem! )
  • pull down menu cumbersome -- cannot see history
    along with current document

75
User Study of Padprints
  • Changed the task to involve revisiting web pages
  • CHI database, National Park Service website
  • Only correctly answered questions considered
  • 20-30% fewer pages accessed
  • faster response time for tasks that involve
    revisiting pages
  • slightly better user satisfaction ratings

76
Info Seeking Summary
  • The standard model (issue query, get results,
    repeat) is not fully adequate
  • Berry picking view offers an alternative to the
    standard IR model
  • Interfaces can be devised to support the
    interactive process over time
  • More work needs to be done

77
Interactive Query Modification
78
Query Modification
  • Problem: how to reformulate the query?
  • Thesaurus expansion
  • Suggest terms similar to query terms
  • Relevance feedback
  • Suggest terms (and documents) similar to
    retrieved documents that have been judged to be
    relevant

79
Relevance Feedback
  • Usually do both (a sketch follows this slide)
  • expand query with new terms
  • re-weight terms in query
  • There are many variations
  • usually positive weights for terms from relevant
    docs
  • sometimes negative weights for terms from
    non-relevant docs
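One common instantiation of "expand and re-weight" is Rocchio-style feedback. The sketch below assumes simple term-weight dictionaries and typical Rocchio coefficients; it is not the exact formula used by any particular system discussed in this tutorial.

```python
# A minimal Rocchio-style sketch of "re-weight and expand" relevance feedback.
from collections import defaultdict

def rocchio(query_vec, relevant_docs, nonrelevant_docs,
            alpha=1.0, beta=0.75, gamma=0.15):
    """query_vec and each doc are dicts mapping term -> weight."""
    new_q = defaultdict(float)
    for term, w in query_vec.items():
        new_q[term] += alpha * w
    for doc in relevant_docs:                      # positive weights from relevant docs
        for term, w in doc.items():
            new_q[term] += beta * w / max(len(relevant_docs), 1)
    for doc in nonrelevant_docs:                   # optional negative weights
        for term, w in doc.items():
            new_q[term] -= gamma * w / max(len(nonrelevant_docs), 1)
    # Negative weights are usually dropped.
    return {t: w for t, w in new_q.items() if w > 0}

q = {"recall": 1.0, "automobile": 1.0}
rel = [{"automobile": 0.8, "defect": 0.6, "recall": 0.5}]
print(rocchio(q, rel, []))
```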

80
Using Relevance Feedback
  • Known to improve results
  • in TREC-like conditions (no user involved)
  • What about with a user in the loop?
  • How might you measure this?
  • Let's examine a user study of relevance feedback
    by Koenemann & Belkin 1996.

81
Questions being Investigated (Koenemann & Belkin 96)
  • How well do users work with statistical ranking
    on full text?
  • Does relevance feedback improve results?
  • Is user control over operation of relevance
    feedback helpful?
  • How do different levels of user control affect
    results?

82
How much of the guts should the user see?
  • Opaque (black box)
  • (like web search engines)
  • Transparent
  • (see available terms after the r.f.)
  • Penetrable
  • (see suggested terms before the r.f.)
  • Which do you think worked best?

83
(No Transcript)
84
Terms available for relevance feedback made
visible (from Koenemann & Belkin)
85
Details on User Study (Koenemann & Belkin 96)
  • Subjects have a tutorial session to learn the
    system
  • Their goal is to keep modifying the query until
    they've developed one that gets high precision
  • This is an example of a routing query (as opposed
    to ad hoc)
  • Reweighting
  • They did not reweight query terms
  • Instead, only term expansion
  • pool all terms in rel docs
  • take top n terms, where
  • n = 3 + (number-marked-relevant-docs × 2)
  • (the more marked docs, the more terms added to
    the query; see the sketch below)
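Reading the expansion rule above as n = 3 plus two terms per marked relevant document, a minimal sketch of the term-pooling step might look like the following. The tokenized documents and the stopword handling are assumptions added for illustration, not details from the study.

```python
# Sketch of term expansion from marked relevant documents.
from collections import Counter

def expansion_terms(marked_relevant_docs, stopwords=frozenset()):
    """Each document is a list of tokens; returns the terms to add to the query."""
    pool = Counter()
    for doc in marked_relevant_docs:
        pool.update(t.lower() for t in doc if t.lower() not in stopwords)
    n = 3 + 2 * len(marked_relevant_docs)   # more marked docs -> more added terms
    return [term for term, _ in pool.most_common(n)]

docs = [["tobacco", "advertising", "teens", "ban"],
        ["tobacco", "youth", "advertising", "lawsuit"]]
print(expansion_terms(docs, stopwords={"the", "a"}))
```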

86
Details on User Study (Koenemann & Belkin 96)
  • 64 novice searchers
  • 43 female, 21 male, native English
  • TREC test bed
  • Wall Street Journal subset
  • Two search topics
  • Automobile Recalls
  • Tobacco Advertising and the Young
  • Relevance judgements from TREC and experimenter
  • System was INQUERY (vector space with some bells
    and whistles)

87
Sample TREC query
88
Evaluation
  • Precision at 30 documents (sketched below)
  • Baseline (Trial 1)
  • How well does initial search go?
  • One topic has more relevant docs than the other
  • Experimental condition (Trial 2)
  • Subjects get tutorial on relevance feedback
  • Modify query in one of four modes
  • no r.f., opaque, transparent, penetrable
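Precision at 30 documents is simply the fraction of the top 30 retrieved documents judged relevant; a minimal sketch:

```python
def precision_at_k(ranked_doc_ids, relevant_ids, k=30):
    """Fraction of the top-k retrieved documents that are relevant."""
    top_k = ranked_doc_ids[:k]
    return sum(1 for d in top_k if d in relevant_ids) / k

print(precision_at_k(["d3", "d7", "d1"], {"d3", "d1"}, k=3))  # 0.666...
```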

89
Precision vs. RF condition (from Koenemann & Belkin 96)
90
Effectiveness Results
  • Subjects with R.F. showed 17-34% better
    performance than those with no R.F.
  • Subjects in the penetrable case did 15% better as
    a group than those in the opaque and transparent
    cases.

91
Number of iterations in formulating queries (from Koenemann & Belkin 96)
92
Behavior Results
  • Search times approximately equal
  • Precision increased in first few iterations
  • Penetrable case required fewer iterations to
    make a good query than transparent and opaque
  • R.F. queries much longer
  • but fewer terms in penetrable case -- users were
    more selective about which terms were added in.

93
Relevance Feedback Summary
  • Iterative query modification can improve
    precision and recall for a standing query
  • In at least one study, users were able to make
    good choices by seeing which terms were suggested
    for R.F. and selecting among them
  • So more like this can be useful!
  • But usually requires more than one document,
    unlike how web versions work.

94
(No Transcript)
95
Alternative Notions of Relevance Feedback
96
Social and Implicit Relevance Feedback
  • Find people whose taste is similar to yours.
    Will you like what they like?
  • Follow a users actions in the background. Can
    this be used to predict what the user will want
    to see next?
  • Track what lots of people are doing. Does this
    implicitly indicate what they think is good and
    not good?

97
Collaborative Filtering (social filtering)
  • If Pam liked the paper, I'll like the paper
  • If you liked Star Wars, you'll like Independence
    Day
  • Rating based on ratings of similar people (see
    the sketch below)
  • Ignores the text, so works on text, sound,
    pictures etc.
  • But: initial users can bias ratings of future
    users
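A minimal sketch of user-based collaborative filtering as described above, predicting a rating from the ratings of similar people. The data, names, and the cosine-similarity choice are illustrative assumptions, not details of any system mentioned here.

```python
# Toy user-based collaborative filtering.
import math

ratings = {
    "pam": {"star wars": 5, "independence day": 4, "paper": 5},
    "sam": {"star wars": 5, "independence day": 5},
    "you": {"star wars": 5},
}

def similarity(a, b):
    """Cosine similarity computed over the items both users rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    return dot / (math.sqrt(sum(v * v for v in a.values())) *
                  math.sqrt(sum(v * v for v in b.values())))

def predict(user, item):
    """Similarity-weighted average of other users' ratings for the item."""
    num = den = 0.0
    for other, r in ratings.items():
        if other == user or item not in r:
            continue
        s = similarity(ratings[user], r)
        num += s * r[item]
        den += abs(s)
    return num / den if den else None

print(predict("you", "independence day"))
```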

98
Social Filtering
  • Ignores the content, only looks at who judges
    things similarly
  • Works well on data relating to taste
  • something that people are good at predicting
    about each other too
  • Does it work for topic?
  • GroupLens results suggest otherwise (preliminary)
  • Perhaps for quality assessments
  • What about for assessing if a document is about a
    topic?

99
Learning interface agents
  • Use machine learning to improve performance
  • learn user behavior, preferences
  • Useful when
  • 1) past behavior is a useful predictor of the
    future
  • 2) wide variety of behaviors amongst users
  • Examples
  • mail clerk: sort incoming messages into the right
    mailboxes
  • calendar manager: automatically schedule meeting
    times?

100
Example Systems
  • Example Systems
  • WebWatcher
  • Letizia
  • Vary according to
  • User states topic or not
  • User rates pages or not

101
WebWatcher (Freitag et al.)
  • A "tour guide" agent for the WWW.
  • User tells it what kind of information is wanted
  • System tracks web actions
  • Highlights hyperlinks that it computes will be of
    interest.
  • Strategy for giving advice is learned from
    feedback from earlier tours.
  • Uses WINNOW as a learning algorithm (sketched below)
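Winnow itself is a simple mistake-driven learner with multiplicative weight updates. The sketch below shows the basic promotion/demotion variant on toy binary features; how WebWatcher actually extracted features from pages and links is not shown, and the parameters here are illustrative assumptions.

```python
# Minimal Winnow (promotion/demotion variant) over binary features.

def winnow_train(examples, n_features, threshold=None, alpha=2.0, epochs=10):
    """examples: list of (binary feature list, label in {0, 1})."""
    theta = threshold if threshold is not None else n_features / 2
    w = [1.0] * n_features
    for _ in range(epochs):
        for x, y in examples:
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else 0
            if pred == y:
                continue                     # no mistake, no update
            for i, xi in enumerate(x):
                if xi:
                    w[i] = w[i] * alpha if y == 1 else w[i] / alpha
    return w, theta

# Toy data: feature 0 is the only one that matters.
data = [([1, 0, 1], 1), ([1, 1, 0], 1), ([0, 1, 1], 0), ([0, 0, 1], 0)]
weights, theta = winnow_train(data, n_features=3)
print(weights, theta)
```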

102
(No Transcript)
103
Letizia (Lieberman 95)
(Diagram: Letizia observes the user, applies heuristics to build a user profile, and offers recommendations)
  • Recommends web pages during browsing based on
    user profile
  • Learns user profile using simple heuristics
  • Passive observation, recommend on request
  • Provides relative ordering of link
    interestingness
  • Assumes recommendations near current page are
    more valuable than others

104
Letizia (Lieberman 95)
  • Infers user preferences from behavior
  • Interesting pages
  • record in hot list
  • save as a file
  • follow several links from pages
  • returning several times to a document
  • Not Interesting
  • spend a short time on document
  • return to previous document without following
    links
  • passing over a link to a document (selecting links
    above and below it); see the scoring sketch below
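These behavioral cues can be folded into a single interest score per page. The following sketch is only illustrative; the event names and weights are assumptions, not Letizia's actual heuristics.

```python
# Illustrative interest score combining behavioral signals for one page.

def interest_score(events):
    """events: dict of observed behaviors for one page (hypothetical keys)."""
    score = 0.0
    if events.get("bookmarked"):                       # recorded in hot list
        score += 2.0
    if events.get("saved_to_file"):
        score += 2.0
    score += 0.5 * events.get("links_followed_from_page", 0)
    score += 0.5 * events.get("return_visits", 0)
    if 0 < events.get("seconds_on_page", 0) < 5:       # very short dwell time
        score -= 1.0
    if events.get("passed_over_link"):                 # chose links above/below it
        score -= 0.5
    return score

print(interest_score({"bookmarked": True, "links_followed_from_page": 3,
                      "seconds_on_page": 40}))
```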

105
Consequences of passive observation
  • No ability to fine-tune profile or express
    interest without visiting appropriate pages
  • Weak heuristics
  • Must click through multiple uninteresting pages
    en route to interestingness
  • Hierarchies tend to get more hits near root
  • But page read time does seem to robustly
    indicate interest (across many pages and many
    users)

106
MARS (Rui et al. 97)
Relevance feedback based on image similarity
107
Time Series R.F. (Keogh & Pazzani 98)
108
Social and Implicit Relevance Feedback
  • Several different criteria to consider
  • Implicit vs. Explicit judgements
  • Individual vs. Group judgements
  • Standing vs. Dynamic topics
  • Similarity of the items being judged vs.
    similarity of the judges themselves

109
  • Classifying R.F. Systems: Amazon.com
  • Books on related topics
  • Books bought by others who bought this
  • Community, implicit, standing, judges items,
    similar items

110
Classifying R.F. Systems
  • Standard Relevance Feedback
  • Individual, explicit, dynamic, item comparison
  • Standard Filtering (NewsWeeder)
  • Individual, explicit, standing profile, item
    comparison
  • Standard Routing
  • Community (gold standard), explicit, standing
    profile, item comparison

111
Classifying R.F. Systems
  • Letizia and WebWatcher
  • Individual, implicit, dynamic, item comparison
  • Ringo and GroupLens
  • Group, explicit, standing query, judge-based
    comparison

112
Query Modification Summary
  • Relevance feedback is an effective means for
    user-directed query modification.
  • Modification can be done with either direct or
    indirect user input
  • Modification can be done based on an individual's
    or a group's past input.

113
Information Visualization
114
Visualization Success Stories
115
Visualization Success Stories
Illustration of John Snow's deduction that a
cholera epidemic was caused by a bad water pump,
circa 1854. Horizontal lines indicate location
of deaths.
From Visual Explanations by Edward Tufte,
Graphics Press, 1997
116
Visualizing Text Collections
  • Some Visualization Principles
  • Why Text is Tough
  • Visualizing Collection Overviews
  • Evaluations involving Users

117
Preattentive Processing
  • A limited set of visual properties are processed
    preattentively
  • (without need for focusing attention).
  • This is important for design of visualizations
  • what can be perceived immediately
  • what properties are good discriminators
  • what can mislead viewers

All Preattentive Processing figures from Healey
97 (on the web)
118
Example: Color Selection
Viewer can rapidly and accurately
determine whether the target (red circle) is
present or absent. Difference detected in color.
119
Example: Shape Selection
Viewer can rapidly and accurately
determine whether the target (red circle) is
present or absent. Difference detected in form
(curvature)
120
Pre-attentive Processing
  • < 200-250 ms qualifies as pre-attentive
  • eye movements take at least 200ms
  • yet certain processing can be done very quickly,
    implying low-level processing in parallel

121
Example: Conjunction of Features
Viewer cannot rapidly and accurately
determine whether the target (red circle) is
present or absent when target has two or more
features, each of which are present in the
distractors. Viewer must search sequentially.
122
SUBJECT PUNCHED QUICKLY OXIDIZED TCEJBUS DEHCNUP
YLKCIUQ DEZIDIXO CERTAIN QUICKLY PUNCHED METHODS
NIATREC YLKCIUQ DEHCNUP SDOHTEM SCIENCE ENGLISH
RECORDS COLUMNS ECNEICS HSILGNE SDROCER
SNMULOC GOVERNS PRECISE EXAMPLE MERCURY SNREVOG
ESICERP ELPMAXE YRUCREM CERTAIN QUICKLY PUNCHED
METHODS NIATREC YLKCIUQ DEHCNUP SDOHTEM GOVERNS
PRECISE EXAMPLE MERCURY SNREVOG ESICERP ELPMAXE
YRUCREM SCIENCE ENGLISH RECORDS COLUMNS ECNEICS
HSILGNE SDROCER SNMULOC SUBJECT PUNCHED QUICKLY
OXIDIZED TCEJBUS DEHCNUP YLKCIUQ
DEZIDIXO CERTAIN QUICKLY PUNCHED METHODS NIATREC
YLKCIUQ DEHCNUP SDOHTEM SCIENCE ENGLISH RECORDS
COLUMNS ECNEICS HSILGNE SDROCER SNMULOC
123
Accuracy Ranking of Quantitative Perceptual
Tasks (Mackinlay 88, from Cleveland & McGill)
Most accurate to least accurate:
Position, Length, Angle, Slope, Area, Volume,
Color, Density
124
Why Text is Tough to Visualize
  • Text is not pre-attentive
  • Text consists of abstract concepts
  • Text represents similar concepts in many
    different ways
  • space ship, flying saucer, UFO, figment of
    imagination
  • Text has very high dimensionality
  • Tens or hundreds of thousands of features
  • Many subsets can be combined together

125
Why Text is Tough
The Dog.
126
Why Text is Tough
The Dog.
The dog cavorts.
The dog cavorted.
127
Why Text is Tough
The man.
The man walks.
128
Why Text is Tough
The man walks the cavorting dog.
So far, we can sort of show this in pictures.
129
Why Text is Tough
As the man walks the cavorting dog,
thoughts arrive unbidden of the previous spring,
so unlike this one, in which walking was marching
and dogs were baleful sentinels outside unjust
halls.
How do we visualize this?
130
Why Text is Tough
  • Abstract concepts are difficult to visualize
  • Combinations of abstract concepts are even more
    difficult to visualize
  • time
  • shades of meaning
  • social and psychological concepts
  • causal relationships

131
Why Text is Tough
  • Language only hints at meaning
  • Most meaning of text lies within our minds and
    common understanding
  • How much is that doggy in the window?
  • "how much": social system of barter and trade (not
    the size of the dog)
  • "doggy": implies childlike, plaintive, probably
    cannot do the purchasing on their own
  • "in the window": implies behind a store window,
    not really inside a window; requires notion of
    window shopping

132
Why Text is Tough
  • General categories have no standard ordering
    (nominal data)
  • Categorization of documents by single topics
    misses important distinctions
  • Consider an article about
  • NAFTA
  • The effects of NAFTA on truck manufacture
  • The effects of NAFTA on productivity of truck
    manufacture in the neighboring cities of El Paso
    and Juarez

133
Why Text is Tough
  • I saw Pathfinder on Mars with a telescope.
  • Pathfinder photographed Mars.
  • The Pathfinder photograph mars our perception of
    a lifeless planet.
  • The Pathfinder photograph from Ford has arrived.
  • The Pathfinder forded the river without marring
    its paint job.

134
Why Text is Easy
  • Text is highly redundant
  • When you have lots of it
  • Pretty much any simple technique can pull out
    phrases that seem to characterize a document
  • Instant summary
  • Extract the most frequent words from a text
  • Remove the most common English words (see the
    sketch below)
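The "instant summary" recipe above takes only a few lines; the stopword list here is a tiny illustrative subset, and the sample text is just a stand-in.

```python
# Frequent-word "instant summary": count words, drop common English words.
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "that", "was", "his",
             "he", "is", "for", "it", "with", "on", "i"}

def frequent_terms(text, k=15):
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return counts.most_common(k)

sample = "And Jacob said unto Joseph, God Almighty appeared unto me ..."
print(frequent_terms(sample))
```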

135
Guess the Texts
  • 64 president
  • 38 jones
  • 38 information
  • 32 evidence
  • 31 lewinsky
  • 28 oic
  • 28 investigation
  • 26 court
  • 26 clinton
  • 22 office
  • 21 discovery
  • 20 sexual
  • 20 case
  • 17 testimony
  • 16 judge
  • 478 said
  • 233 god
  • 201 father
  • 187 land
  • 181 jacob
  • 160 son
  • 157 joseph
  • 134 abraham
  • 121 earth
  • 119 man
  • 118 behold
  • 113 years
  • 104 wife
  • 101 name
  • 94 pharaoh

136
Text Collection Overviews
  • How can we show an overview of the contents of a
    text collection?
  • Show info external to the docs
  • e.g., date, author, source, number of inlinks
  • does not show what they are about
  • Show the meanings or topics in the docs
  • a list of titles
  • results of clustering words or documents
  • organize according to categories (next time)

137
Visualizing Collection Clusters
  • Scatter/Gather
  • show main themes as groups of text summaries
  • Scatter Plots
  • show docs as points; closeness indicates nearness
    in cluster space
  • show main themes of docs as visual clumps or
    mountains
  • Kohonen Feature maps
  • show main themes as adjacent polygons
  • BEAD
  • show main themes as links within a force-directed
    placement network

138
Text Clustering
  • Finds overall similarities among groups of
    documents
  • Finds overall similarities among groups of tokens
  • Picks out some themes, ignores others

139
Clustering for Collection Overviews
  • Two main steps
  • cluster the documents according to the words they
    have in common
  • map the cluster representation onto an
    (interactive) 2D or 3D representation (see the
    sketch below)
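A minimal sketch of these two steps using scikit-learn (an assumption for illustration; the systems discussed in this section predate it): cluster documents by the words they share, then project the vectors to 2D for display.

```python
# Cluster documents and project them to 2D for an overview display.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD

docs = [
    "stars and galaxies in astronomy",
    "film and television stars",
    "constellations of stars in the night sky",
    "movie star wins award",
]

vectors = TfidfVectorizer().fit_transform(docs)                 # docs x terms
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)
coords = TruncatedSVD(n_components=2).fit_transform(vectors)    # 2D layout

for doc, label, (x, y) in zip(docs, labels, coords):
    print(f"cluster {label}  ({x:.2f}, {y:.2f})  {doc}")
```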

140
Scatter/Gather (Cutting, Pedersen, Tukey & Karger
92, 93; Hearst & Pedersen 95)
  • First use of text clustering in the interface
  • Showing clusters to users had not been done
  • Focus on interaction
  • Show topical terms and typical titles
  • Allow users to change the views
  • Did not emphasize visualization

141
Scatter/Gather
142
S/G Example: query on "star"
  • Encyclopedia text
  • 14 sports
  • 8 symbols
  • 47 film, tv
  • 68 film, tv (p)
  • 7 music
  • 97 astrophysics
  • 67 astronomy (p)
  • 12 stellar phenomena
  • 10 flora/fauna
  • 49 galaxies, stars
  • 29 constellations
  • 7 miscellaneous
  • Clustering and re-clustering is entirely
    automated

143
Northern Light used to cluster exclusively. Now
combines categorization with clustering
144
Northern Light second-level clusters: are these
really about NLP? Note that the next level
corresponds to URLs
145
Scatter Plot of Clusters (Chen et al. 97)
146
BEAD (Chalmers 97)
147
BEAD (Chalmers 96)
An example layout produced by Bead, seen in
overview, of 831 bibliography entries. The
dimensionality (the number of unique words in
the set) is 6925. A search for "cscw" or
"collaborative" shows the pattern of occurrences
coloured dark blue, mostly to the right. The
central rectangle is the visualizer's motion
control.
148
Example: Themescapes (Wise et al. 95)
149
Clustering for Collection Overviews
  • Since text has tens of thousands of features
  • the mapping to 2D loses a tremendous amount of
    information
  • only very coarse themes are detected

150
Galaxy of News Rennison 95
151
Galaxy of News Rennison 95
152
Kohonen Feature Maps (Lin 92, Chen et al. 97)
(594 docs)
153
How Useful is Collection Cluster Visualization
for Search?
  • Three studies find negative results

154
Study 1
  • Kleiboemer, Lazear, and Pedersen. Tailoring a
    retrieval system for naive users. In Proc. of
    the 5th Annual Symposium on Document Analysis and
    Information Retrieval, 1996
  • This study compared
  • a system with 2D graphical clusters
  • a system with 3D graphical clusters
  • a system that shows textual clusters
  • Novice users
  • Only textual clusters were helpful (and they were
    difficult to use well)

155
Study 2: Kohonen Feature Maps
  • H. Chen, A. Houston, R. Sewell, and B. Schatz,
    JASIS 49(7)
  • Comparison: Kohonen Map and Yahoo
  • Task
  • Window shop for interesting home page
  • Repeat with other interface
  • Results
  • Starting with map, could repeat in Yahoo (8/11)
  • Starting with Yahoo, unable to repeat in map (2/14)

156
Study 2 (cont.)
  • Participants liked
  • Correspondence of region size to documents
  • Overview (but also wanted zoom)
  • Ease of jumping from one topic to another
  • Multiple routes to topics
  • Use of category and subcategory labels

157
Study 2 (cont.)
  • Participants wanted
  • hierarchical organization
  • other ordering of concepts (alphabetical)
  • integration of browsing and search
  • correspondence of color to meaning
  • more meaningful labels
  • labels at same level of abstraction
  • fit more labels in the given space
  • combined keyword and category search
  • multiple category assignment (e.g., sports + entertainment)

158
Study 3: NIRVE
  • NIRVE Interface by Cugini et al. 96. Each
    rectangle is a cluster. Larger clusters closer
    to the pole. Similar clusters near one
    another. Opening a cluster causes a projection
    that shows the titles.

159
Study 3
  • Visualization of search results: a comparative
    evaluation of text, 2D, and 3D interfaces.
    Sebrechts, Cugini, Laskowski, Vasilakis, and
    Miller, Proceedings of SIGIR 99, Berkeley, CA,
    1999.
  • This study compared
  • 3D graphical clusters
  • 2D graphical clusters
  • textual clusters
  • 15 participants, between-subject design
  • Tasks
  • Locate a particular document
  • Locate and mark a particular document
  • Locate a previously marked document
  • Locate all clusters that discuss some topic
  • List more frequently represented topics

160
Study 3
  • Results (time to locate targets)
  • Text clusters fastest
  • 2D next
  • 3D last
  • With practice (6 sessions), 2D neared text
    results; 3D still slower
  • Computer experts were just as fast with 3D
  • Certain tasks equally fast with 2D & text
  • Find particular cluster
  • Find an already-marked document
  • But anything involving text (e.g., find title)
    much faster with text.
  • Spatial location rotated, so users lost context
  • Helpful viz features
  • Color coding (helped text too)
  • Relative vertical locations

161
Visualizing Clusters
  • Huge 2D maps may be inappropriate focus for
    information retrieval
  • cannot see what the documents are about
  • space is difficult to browse for IR purposes
  • (tough to visualize abstract concepts)
  • Perhaps more suited for pattern discovery and
    gist-like overviews

162
Co-Citation Analysis
  • Has been around since the 50s. (Small, Garfield,
    White & McCain)
  • Used to identify core sets of
  • authors, journals, articles for particular fields
  • Not for general search
  • Main Idea
  • Find pairs of papers that are cited together by
    third papers
  • Look for commonalities (see the sketch below)
  • A nice demonstration by Eugene Garfield at
  • http://165.123.33.33/eugene_garfield/papers/mapsciworld.html
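A minimal sketch of one standard form of this analysis: counting how often two papers are cited together by the same citing paper. The citation data below is invented for illustration.

```python
# Basic co-citation counting over a toy citation graph.
from collections import Counter
from itertools import combinations

# citing paper -> papers it cites
citations = {
    "p1": ["salton68", "vanrijsbergen79"],
    "p2": ["salton68", "vanrijsbergen79", "robertson76"],
    "p3": ["salton68", "robertson76"],
}

cocitation = Counter()
for cited in citations.values():
    for a, b in combinations(sorted(set(cited)), 2):
        cocitation[(a, b)] += 1           # a and b were cited together

for pair, count in cocitation.most_common():
    print(pair, count)
```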

163
Co-citation analysis (From Garfield 98)
164
Co-citation analysis (From Garfield 98)
165
Co-citation analysis (From Garfield 98)
166
Context
167
Types of Context
  • Personal situation
  • Where you are
  • What time it is
  • Your general preferences
  • Context of other documents
  • Context of what you have done so far in the
    search process

168
Putting Results in Context
  • Visualizations of Query Term Distribution
  • KWIC, TileBars, SeeSoft
  • Table of Contents as Context
  • Superbook, Cha-Cha, DynaCat
  • Visualizing Shared Subsets of Query Terms
  • InfoCrystal, VIBE, Lattice Views
  • Dynamic Metadata as Query Previews

169
KWIC (Keyword in Context)
  • An old standard, ignored by internet search
    engines
  • used in some intranet engines, e.g., Cha-Cha (see
    the sketch below)
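A minimal KWIC sketch: show each occurrence of the query term with a window of surrounding words, the way keyword-in-context displays present hits.

```python
# Keyword-in-context display for one query term.

def kwic(text, keyword, window=4):
    words = text.split()
    lines = []
    for i, w in enumerate(words):
        if w.lower().strip(".,;:") == keyword.lower():
            left = " ".join(words[max(0, i - window):i])
            right = " ".join(words[i + 1:i + 1 + window])
            lines.append(f"{left:>40} [{w}] {right}")
    return lines

text = ("The cholera epidemic was traced to a water pump; removing the pump "
        "handle ended the epidemic.")
for line in kwic(text, "pump"):
    print(line)
```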

170
Table-of-Contents Views
  • Superbook (Remde et al., 87)
  • Functions
  • Word Lookup
  • Show a list of query words, stems, and word
    combinations
  • Table of Contents: dynamic fisheye view of the
    hierarchical topics list
  • Search words can be highlighted here too
  • Page of Text: show selected page and highlighted
    search terms
  • See UI/IR textbook chapter for information on
    interesting user study

171
Superbook (http://superbook.bellcore.com/SB)
172
Egan et al. Study
  • Goal: compare Superbook with paper book
  • Tasks
  • structured search: find answer to a specific
    question using an unfamiliar reference text
  • open-book essay: synthesize material from
    different places in the document
  • incidental learning: how much useful information
    about the document is acquired while doing other
    tasks
  • subjective ratings: user reactions to the form
    and content

173
Egan et al. Study
  • Factors for structured search
  • Does the user's question correspond to the
    author's organization of the material?
  • Half the study search questions contained cues as
    to which topic heading to use, half did not
  • Does the user's query as stated contain some of
    the same words as those used by the author?
  • Half the questions contained words taken from the
    text surrounding the target text, half did not

174
Egan et al. Study
  • Example search questions
  • Find the section discussing the basic concept
    that the value of any expression, however
    complicated, is a data structure.
  • The dataset "murder" contains murder rates per
    100,000 population. Find the section that says
    which states are included in this dataset.
  • Find the section that describes pie charts and
    states whether or not they are a good means for
    analyzing data.
  • Find the section that describes the first thing
    you have to do to get S to print pictoral output.
  • blue boldface: terms taken from text
  • pink italics: terms taken from topic
    heading

175
Egan et al. Study
  • Hypotheses
  • Conventional document would require good cues
    from the topic headings, but Superbook would not.
  • Word lookup function hypothesized to allow
    circumvention of the author's organization scheme.
  • Superbook's search facility would result in
    open-book essays that include more information.

176
Egan et al. Study
  • Source text: statistics package manual (562 pp.)
  • Compare
  • Superbook vs. paper versions
  • Four sets of search questions of mixed type
  • 20 university students with stats background
  • Superbook training tutorial
  • 15 minutes per structured query
  • One open-book essay retained

177
Egan et al. Study
  • Results: Superbook had an advantage in
  • overall average accuracy (75% vs. 62%)
  • Superbook did better on questions with words from
    text but not in topic headings
  • Print version did better on questions with no
    search hits
  • speed (5.4 vs. 5.6 min/query on average)
  • Superbook faster for text-only cues
  • Paper faster for questions with no hits
  • essay creation
  • average score of 5.8 vs. 3.6 points out of 7
  • average 8.8 facts vs. 6.0 out of 15

178
Egan et al. Study
  • Results
  • Subjective ratings
  • Superbook users rated it easier than paper (5.8
    vs. 3.1 out of 7)
  • Superbook users gave higher ratings on the stat
    system
  • Incidental learning
  • Superbook users recalled more chapter headings
  • maybe because these were continually displayed
  • No other differences were significant
  • Problems with study
  • Did not compare against non-hypertext
    computerized version
  • Did not show if/how hyperlinks affected results

179
Cha-Cha (Chen & Hearst 98)
  • Shows table-of-contents-like view, like
    Superbook
  • Takes advantage of human-created structure within
    hyperlinks to create the TOC

180
(No Transcript)
181
DynaCat (Pratt, Hearst, Fagan 99)
  • Decide on important question types in advance
  • What are the adverse effects of drug D?
  • What is the prognosis for treatment T?
  • Make use of MeSH categories
  • Retain only those types of categories known to be
    useful for this type of query.

182
DynaCat (Pratt, Hearst, Fagan 99)
183
DynaCat Study
  • Design
  • Three queries
  • 24 cancer patients
  • Compared three interfaces
  • ranked list, clusters, categories
  • Results
  • Participants strongly preferred categories
  • Participants found more answers using categories
  • Participants took same amount of time with all
    three interfaces
  • Similar results have been verified by another
    study by Chen and Dumais (CHI 2000)

184
Cat-a-Cone: Multiple Simultaneous Categories
  • Key Ideas
  • Separate documents from category labels
  • Show both simultaneously
  • Link the two for iterative feedback
  • Distinguish between
  • Searching for Documents vs.
  • Searching for Categories

185
Cat-a-Cone Interface
186
(Diagram: query terms support search over the Collection and browsing of the Category Hierarchy; Retrieved Documents link back to categories)
187
Proposed Advantages
  • Integrate category selection with viewing of
    categories
  • Show all categories in context
  • Show relationship of retrieved documents to the
    category structure
  • But was not evaluated with user study

188
Our new project: FLAMENCO
  • FLexible Access using MEtadata in Novel
    COmbinations
  • Main idea
  • Preview and postview information
  • Determined dynamically and (semi) automatically,
    based on current task

189
Recap
  • Search Interfaces Today
  • HCI Foundations
  • The Information Seeking Process
  • Visualizing Text Collections
  • Incorporating Context and Tasks
  • Promising Future Directions

190
The future of search tools: A Prediction of a
Dichotomy
  • Information Intensive
  • Business analysis
  • Scientific research
  • Planning design
  • Quick lookup
  • Question answering
  • Context-dependent info (location, time)

191
My Predictions of Future Trends in Search
Interfaces
  • Specialization
  • Single topic search (vortals)
  • Task-oriented search
  • Personalization
  • Question-Answering
  • Visualization???

192
References
  • See the bibliography of Chapter 10 of Modern
    Information Retrieval, Ricardo Baeza-Yates &
    Berthier Ribeiro-Neto (Eds.). This chapter is
    called User Interfaces and Visualization, by
    Marti Hearst. Available at www.sims.berkeley.edu/
    ~hearst/irbook/chapters/chap10.html