Title: Marti Hearst
1Visualization in Text Analysis Problems
VAC Consortium Meeting, Stanford, May 24, 2006
- Marti Hearst
- School of Information, UC Berkeley
2Outline
- Some Visualization Design Principles
- Illustrated with a new example
- Why Text is Tricky to Visualize
- How to do good visualization design with text
while meeting analysts' needs?
- Focus on Flexibility with Reproducibility
- Examples from 4 different domains
3What Makes for a Good Visualization?
- Visually illuminates important aspects of the
underlying data and domain.
- Supports the users' tasks (better than without
the visualization).
- Adheres to good design principles.
4Example from Software Engineering: Marat
Boshernitsan, UC Berkeley PhD Dissertation 2006
- Problem: need to make complex changes throughout
code.
- Example: convert from one API to another.
5A Typical Solution
- Either requires programmers to understand and
manipulate abstract syntax trees
- Or requires learning another programming language
(or both)!
6First Attempt
7Second Attempt
8A Better Solution
- Build on how programmers think about programming.
- Operate on the textual representation of code.
9Users Operate on Familiar Visual Representation
of Code
10Context-and-Domain Sensitive Visual Cues
11Lessons from this Example
- User-centered Design
- This was the third attempt.
- First 2 attempts did not accurately reflect how
users think about the problem.
- Careful design of labels and interaction cues
- Very intelligent backend, but user-activated.
- Visually and interactively reflects how
programmers think about programming.
12What Makes for a Good Visualization for
Analysts?
- Visually illuminates important aspects of the
underlying data and domain.
- Supports the users' tasks (better than without
the visualization).
- Adheres to good design principles.
13Goals vs. Tasks
- Analysts' Goals
- Understand current and past situations
- Predict and anticipate future situations
- Observations by Pirolli & Card 05
- Different analysts starting with people,
organizations, tasks, and time
- predict coup likelihood
- understand bio-warfare threats
- understand relations within cartel
14Goals vs. Tasks
- Analysts' tasks
- Explore
- Extract
- Filter
- Link
- Arrange
- Compare
- Hypothesize
- (A combination of Foraging and Sensemaking)
- Should do the tasks only to support the goals.
15Design Principles for Analysts
- Experienced analysts notice what is missing or
unexpected (Wright et al. 06)
- Thus consistency and reproducibility are
important.
16Design Principles for Analysts
- Analysts must guard against confirmation bias
(Pirolli & Card 05).
- Thus it is important for analysts to
- Be able to easily arrange and re-arrange,
- View information flexibly from many angles,
- While at the same time retaining consistency and
reproducibility.
- However, it's hard to do this with text.
17Working with Text: Text is especially difficult
to visualize
- Very high dimensionality
- Tens to hundreds of thousands of features
- Compositional
- Can be combined together in innumerable ways
- Abstract
- And so difficult to visualize
- Not pre-attentive
- Must foveate to read
- Subtle
- Small differences matter
- Unordered
18Text Meaning is NOT pre-attentive
SUBJECT PUNCHED QUICKLY OXIDIZED TCEJBUS DEHCNUP
YLKCIUQ DEZIDIXO CERTAIN QUICKLY PUNCHED METHODS
NIATREC YLKCIUQ DEHCNUP SDOHTEM SCIENCE ENGLISH
RECORDS COLUMNS ECNEICS HSILGNE SDROCER
SNMULOC GOVERNS PRECISE EXAMPLE MERCURY SNREVOG
ESICERP ELPMAXE YRUCREM CERTAIN QUICKLY PUNCHED
METHODS NIATREC YLKCIUQ DEHCNUP SDOHTEM GOVERNS
PRECISE EXAMPLE MERCURY SNREVOG ESICERP ELPMAXE
YRUCREM SCIENCE ENGLISH RECORDS COLUMNS ECNEICS
HSILGNE SDROCER SNMULOC SUBJECT PUNCHED QUICKLY
OXIDIZED TCEJBUS DEHCNUP YLKCIUQ
DEZIDIXO CERTAIN QUICKLY PUNCHED METHODS NIATREC
YLKCIUQ DEHCNUP SDOHTEM SCIENCE ENGLISH RECORDS
COLUMNS ECNEICS HSILGNE SDROCER SNMULOC
19Why Text is Tough
- Abstract concepts are difficult to visualize
- Combinations of abstract concepts are even more
difficult to visualize
- time
- shades of meaning
- social and psychological concepts
- causal relationships
20Why Text is Tough
The dog.
21Why Text is Tough
The dog.
The dog cavorts.
The dog cavorted.
22Why Text is Tough
The man.
The man walks.
23Why Text is Tough
The man walks the cavorting dog.
So far, we can sort of show this in pictures.
24Why Text is Tough
As the man walks the cavorting dog,
thoughts arrive unbidden of the previous spring,
so unlike this one, in which walking was marching
and dogs were baleful sentinels outside unjust
halls.
How do we visualize this?
25Why Text is Tough
- Language only hints at meaning
- Most meaning of text lies within our minds and
common understanding
- "How much is that doggy in the window?"
- "how much": social system of barter and trade (not
the size of the dog)
- "doggy" implies childlike, plaintive, probably
cannot do the purchasing on their own
- "in the window" implies behind a store window,
not really inside a window; requires notion of
window shopping
26Why Text is Tough
- General categories have no standard ordering
(nominal data)
- Categorization of documents by single topics
misses important distinctions
- Consider an article about
- NAFTA
- The effects of NAFTA on truck manufacture
- The effects of NAFTA on productivity of truck
manufacture in the neighboring cities of El Paso
and Juarez
27Why Text is Tough
- Other issues about language
- Ambiguous (many different meanings for the same
words and phrases)
- Same meaning implied by different combinations
- Different combinations imply different meanings
28Why Text is (Deceptively) Easy
- Text is easier when you have a lot of it
- Web search now usually treats a query as a conjunction of its terms
- Text has a lot of redundancy
- A very simple algorithm can
- Pull out important phrases
- Find meaningfully related words
- Create a summary from document
- Group related documents
29Why Text is Easy
- Pretty much any simple technique can pull out
phrases that seem to characterize a document
- Most frequent words from an IR lecture (count, word):
- 109 slide, 69 to, 37 view, 37 version, 37 graphic, 37 first
- 37 back, 36 previous, 36 next, 32 of, 31 the
- 30 recall, 28 relevant, 27 precision, 25 retrieved, 25 documents
- 21 and, 18 evaluate, 15 a, 13 what, 13 vs, 13 how
- 12 trec, 12 is, 12 high, 12 for, 10 relevance
- 10 queries, 10 on, 9 information, 8 x, 8 why
- 8 as, 8 answer, 7 search, 7 maron, 7 document
- 7 blair, 6 top, 6 results, 6 measure
- 6 length, 6 in, 6 evaluation, 6 curves
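One such simple technique is plain word counting. The sketch below is an assumption about how counts like those above could be produced, not the method actually used for the slide; the function name `top_words`, the tokenizer regex, and the sample sentence are illustrative only.

```python
import re
from collections import Counter


def top_words(text, n=10):
    """Return the n most frequent lowercase word tokens with their counts."""
    # Crude tokenizer (an assumption): runs of ASCII letters and digits.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(tokens).most_common(n)


# Hypothetical stand-in for the lecture text.
sample = ("recall and precision: recall of relevant documents, "
          "precision of retrieved documents")

for word, count in top_words(sample, 4):
    print(count, word)
```

Note that a function word ("of") ranks alongside the content words, which is the same effect visible in the raw counts on the slide.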
30Why Text is Easy
- Same text, removing most frequent words in
language and most frequent in this text
- 30 recall, 28 relevant, 27 precision, 25 retrieved, 25 documents
- 18 evaluate, 13 vs, 12 trec, 12 high, 10 relevance
- 10 queries, 9 information, 8 x, 8 answer, 7 search
- 7 maron, 7 document, 7 blair, 6 top, 6 results
- 6 measure, 6 length, 6 evaluation, 6 curves
- These words can act as a simple summary of the
document
- People are good at inferring (sometimes
inventing) the commonalities
- People are bad at realizing what they are not
seeing
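The two filtering steps named on this slide can be sketched as follows. This is a minimal illustration under stated assumptions: the stop list `STOP_WORDS` is a tiny hand-picked set, and `summary_terms` with its `drop_top` parameter is a hypothetical helper, not the procedure actually used to make the slide.

```python
import re
from collections import Counter

# Illustrative, deliberately tiny stop list (assumption); real lists are longer.
STOP_WORDS = {"a", "and", "as", "for", "how", "in", "is", "of", "on",
              "the", "to", "what", "why"}


def summary_terms(text, drop_top=0, n=10):
    """Count words, then drop language-frequent and text-frequent ones."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    # Step 1: remove words that are frequent in the language as a whole.
    content = Counter({w: c for w, c in counts.items() if w not in STOP_WORDS})
    # Step 2: remove the drop_top most frequent remaining words, which tend
    # to be boilerplate of this particular text (e.g. "slide", "view").
    overused = {w for w, _ in content.most_common(drop_top)}
    kept = Counter({w: c for w, c in content.items() if w not in overused})
    return kept.most_common(n)


# Hypothetical miniature "lecture" text.
text = "slide slide slide recall recall the the the the precision"
print(summary_terms(text, drop_top=1))
```

Choosing `drop_top` by hand is itself a judgment call, which fits the slide's warning: the surviving words look like a summary, but the reader cannot see what was filtered out.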
31Simple Text Analysis can Mislead
- Most frequent words
- Biases towards concepts with unique identifiers.
From Spink, Wolfram, Jansen, Saracevic, JASIS 01
32Major Trends vs. Minor Discoveries
- With text, it's easy to extract and show the
largest, main trends
- But often we want the rare but unexpected and
important event
- Russian oil company example
- Schwarzenegger and Enron
- Cigarettes and kids
- Person on the periphery who is working stealthily
to influence things
- This is really difficult to solve!
33Design Principles for Analysts
- Experienced analysts notice what is missing or
unexpected.
- Analysts must guard against confirmation bias.
- Need to be able to easily arrange and re-arrange,
- View information flexibly from many angles,
- While at the same time retaining consistency and
reproducibility.
- Interfaces should reflect the domain and data.
- How to achieve this with text collections?
- Must transform text in understandable ways
- Must provide multiple, consistent views that
nevertheless allow for new discovery and insight
34Why Emphasize Flexibility?
- Can't view representations of all the text
content at once.
- Instead, need ways to flexibly navigate, group,
organize, explore
- See important pieces over time.
35The Importance of Flexibility
- Russell, Slaney, Qu, Houston 05
- The ease of viewing and manipulation in the
system strongly influenced the kind of analysis
operations done.
36Examples of Flexibility on Text Data
- PaperLens (Conference proceedings)
- TAKMI (Customer service requests)
- Faceted Browsing (e-commerce)
- Flamenco
- eBay Express
- FaThumb
- TRIST and Sandbox (Analysts)
37Flexible views
- InfoVis 2004 contest
- Visualize 8 years of conference proceedings
- Tasks
- Static Overview of 10 years of InfoVis
- Characterize the research areas and their
evolution
- The people in InfoVis
- Which papers/authors are most often referenced?
- How many papers conducted a user study?
- PaperLens: integrated solution by Lee, Czerwinski,
Robertson, Bederson
- Uses graphical elements and brushing and linking
to flexibly elucidate a collection's contents.
- http://www.cs.umd.edu/hcil/InfovisRepository/contest-2004/index.shtml
40Flexibility in Foraging and Analysis
- TAKMI, by Nasukawa and Nagano, 01
- The system integrates
- Analysis tasks (customer service help)
- Content analysis
- Information Visualization
41Flexibility in Analysis: TAKMI, by Nasukawa and
Nagano, 2001
- Documents containing "Windows 98"
42Flexibility in Analysis: TAKMI, by Nasukawa and
Nagano, 2001
- Patent documents containing "inkjet", organized
by entity and year
43Flexibility in Category Navigation
- Browsing Information Collections using
(Hierarchical) Faceted Metadata
44What are facets?
- Sets of categories, each of which describes a
different aspect of the objects in the
collection.
- Each of these can be hierarchical.
- (Not necessarily mutually exclusive nor
exhaustive, but often that is a goal.)
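As a rough illustration of this definition, the sketch below models a small collection with three flat facets and shows selection plus per-value counts, the kind of preview a faceted browser displays. The recipe records, facet names, and the helpers `select` and `facet_counts` are all hypothetical; hierarchy within a facet is left out for brevity, and this is not how Flamenco or any other named system is implemented.

```python
from collections import Counter

# Hypothetical recipe records; each facet maps to a set of category values,
# so an item may carry several values in one facet (not mutually exclusive).
RECIPES = [
    {"title": "Lentil soup",
     "facets": {"course": {"soup"}, "ingredient": {"lentil", "onion"},
                "cuisine": {"indian"}}},
    {"title": "Minestrone",
     "facets": {"course": {"soup"}, "ingredient": {"bean", "onion"},
                "cuisine": {"italian"}}},
    {"title": "Tiramisu",
     "facets": {"course": {"dessert"}, "ingredient": {"coffee"},
                "cuisine": {"italian"}}},
]


def select(items, **chosen):
    """Keep items that carry the chosen value in every chosen facet."""
    return [it for it in items
            if all(v in it["facets"].get(f, set()) for f, v in chosen.items())]


def facet_counts(items, facet):
    """Count how many items fall under each value of one facet."""
    return Counter(v for it in items for v in it["facets"].get(facet, ()))


soups = select(RECIPES, course="soup")
print([r["title"] for r in soups])
print(facet_counts(soups, "cuisine"))
```

Because every selection is just a set of facet values, the same view can be reproduced later by reapplying the same choices, and rearranged by choosing a different facet first.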
45Facet example: Recipes
46Nobel Prize Winners Collection
54New Site: eBay Express
62Is This Visualization?
- Prior experience and other people's attempts seem
to suggest that fewer graphics and more text is
better.
- Details of layout, font and color contrast, label
selection, and interaction make all the
difference.
63Earlier Variation on the Idea
64Mobile Variation
- FaThumb: Karlson, Robertson, Robbins, Czerwinski,
Smith 06
- Well-received, but visualization part not looked
at.
65Flexibility in Sensemaking
- DLITE by Cousins et al. 97
- Sandbox by Wright et al. 06
66Flexibility in Sensemaking
TRIST (The Rapid Information Scanning Tool) is
the work space for Information Retrieval and
Information Triage.
TRIST, Jonkers et al. 05
User Defined and Automatic Categorization
Comparative Analysis of Answers and Content
Rapid Scanning with Context
Launch Queries
Entities
Query History
Dimensions
Annotated Document Browser
Linked Multi-Dimensional Views Speed Scanning
67Flexibility for Sensemaking Support
Quick Emphasis of Items of Importance.
Dynamic Analytical Models.
Direct interaction with Gestures (no dialog, no
controls).
Assertions with Proving/Disproving Gates.
68Communication-Centric Text
- Email, conversations, blogs
- The first thought is usually nodes and links
- Doesn't have the desired flexibility
- Some alternatives
- The Network
- Multivariate Networks
69Re-envisioning Networks
- Viewing people's shared workplaces, hometowns,
schools over time.
- www.theyrule.net
70Re-envisioning Networks
- First cut
- Hastings, Snow, and King 05
71Re-envisioning Networks
- Better version
- Hastings, Snow, and King 05
72Re-envisioning Networks
- Wattenberg 06
- OLAP on directed labeled graphs
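One way to read "OLAP on directed labeled graphs" is as a roll-up: nodes that share attribute values are merged, and the edges between the merged groups are counted. The sketch below illustrates that reading only; it is not Wattenberg's implementation, and the node names, attributes, edges, and the `roll_up` helper are all hypothetical.

```python
from collections import Counter

# Hypothetical people, each labeled with two categorical attributes.
NODES = {
    "ann": {"gender": "F", "location": "A"},
    "bob": {"gender": "M", "location": "A"},
    "cat": {"gender": "F", "location": "B"},
    "dan": {"gender": "M", "location": "B"},
}

# Hypothetical directed edges, e.g. "sender wrote to recipient".
EDGES = [("ann", "bob"), ("ann", "cat"), ("bob", "cat"),
         ("dan", "ann"), ("cat", "ann")]


def roll_up(nodes, edges, *attrs):
    """Aggregate edges between groups of nodes sharing the given attributes."""
    def group(name):
        return tuple(nodes[name][a] for a in attrs)
    return Counter((group(src), group(dst)) for src, dst in edges)


# The same graph viewed from two angles; each view is reproducible
# because it depends only on the chosen attributes.
print(roll_up(NODES, EDGES, "gender"))
print(roll_up(NODES, EDGES, "location"))
```

Switching the attribute list changes the view without changing the underlying data, which is the kind of flexibility with reproducibility the talk argues for.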
73Network Flexibility
74Martin Wattenberg, Visual Exploration of
Multivariate Graphs
[Figure labels: M, F; Location A through Location E]
75Re-envisioning Networks
- Idea: vary these approaches to apply to email and
other communication text.
76Summary: Text Viz Design Guidelines
- An emphasis on flexible views on text data
- Emphasize brushing and linking using appropriate
visual cues.
- Interaction flow should guide the user but also
be flexible.
- Information structure should be consistent and
reproducible.
- Other guidelines
- Make text visible.
- Visual components should reflect the data and
tasks.
77Thank you!
- www.sims.berkeley.edu/hearst