Title: Search, Text Mining, Web Site Usability
1 Search, Text Mining, Web Site Usability
2 BAILANDO Projects
- Better Access to Information
- using Language Analysis and
- Novel Dynamic Organizations
3 Current BAILANDO Projects
- CHA-CHA & FLAMENCO: Better Search Interfaces
- LINDI: UI support for Search, Text Data Mining
- TANGO: Automated Web Site Usability
4 Search UIs
- Combine Browsing & Search
- Place Search Results in Context
- Large Category Hierarchies
5 Cha-Cha (Students: Mike Chen, Jamie Laflen, Jason Hong, Jimmy Lin, Shiang Chen)
6 Medical Category Hierarchy
7 DynaCat (Pratt, Hearst, & Fagan 99)
8 DynaCat Study
- Design
- Three queries
- 24 cancer patients
- Compared three interfaces: ranked list, clusters, categories
- Results
- Participants strongly preferred categories
- Participants found more answers using categories
- Participants took the same amount of time with all three interfaces
- Similar results have been verified by another study by Chen and Dumais (CHI 2000)
9 Cat-a-Cone Interface (Hearst & Karadi 97)
10 FLAMENCO: Improving Search via Large Category Hierarchies
- How to show intersections across category types?
- How to preview related categories in a
user-tailored, dynamic manner?
11 Text Data Mining
- Relationships between information in documents
can create new facts, not previously known.
12 Imagine
- You are a medical researcher
- Your patient has
- spinal inflammation
- numbness in fingers
- low TC levels
- negative results for all tests
- How can you help her?
13 Idea
- A new way of searching text.
- Link pieces of information together to formulate hypotheses
14 LINDI: Linking Information for New DIscoveries
- Three main parts
- Search UI for building and reusing hypothesis-seeking strategies.
- Statistical language analysis techniques for interpreting the text.
- Backend for interfacing with various databases and translating different formats.
15 Gathering Evidence
Spinal Inflammation
Numbness in fingers
Low TC Levels
16 Gathering Evidence
Find diseases associated with each
Spinal Inflammation
Numbness in fingers
Low TC Levels
17 Gathering Evidence
Find unanticipated commonalities
Spinal Inflammation
Numbness in fingers
Low TC Levels
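The two evidence-gathering steps (find what is associated with each symptom, then look for commonalities) can be sketched as set operations. This is only an illustration, not LINDI's implementation: the `ASSOCIATIONS` table and the placeholder names "condition A" through "condition E" are hypothetical stand-ins for whatever a literature search would return.

```python
from collections import Counter

# Hypothetical index: symptom -> conditions reported alongside it.
# A real system would fill this from searches over the literature.
ASSOCIATIONS = {
    "spinal inflammation": {"condition A", "condition B", "condition C"},
    "numbness in fingers": {"condition B", "condition C", "condition D"},
    "low TC levels": {"condition C", "condition E"},
}

def common_conditions(associations):
    """Conditions associated with every symptom (strict intersection)."""
    return set.intersection(*associations.values())

def rank_by_support(associations):
    """Conditions ordered by how many symptoms they are associated with."""
    counts = Counter(c for conds in associations.values() for c in conds)
    return counts.most_common()

if __name__ == "__main__":
    print(common_conditions(ASSOCIATIONS))
    print(rank_by_support(ASSOCIATIONS))
```

The ranked variant is useful when no single condition covers every symptom; partial overlaps can still suggest a hypothesis worth checking.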
18 Supporting Cascaded Search Operations
20 New Language Analysis
- First use category labels to retrieve candidate documents
- Then use language analysis to detect causal relationships between concepts
- Title: Magnesium deficiency implicated in increased stress levels.
- Interpretation: <nutrient><reduction> related-to <increase><symptom>
- Use these to find relationships and formulate hypotheses
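As a rough illustration of the title-to-interpretation step, the sketch below maps the example title onto the label sequence shown on the slide. It uses a tiny hand-written lexicon (`LEXICON`, hypothetical) in place of the statistical language analysis the project proposes, so it shows only the shape of the output, not the method.

```python
# Hypothetical mini-lexicon: surface phrase -> semantic label.
# A stand-in for real language analysis, for illustration only.
LEXICON = {
    "magnesium": "nutrient",
    "deficiency": "reduction",
    "implicated in": "related-to",
    "increased": "increase",
    "stress levels": "symptom",
}

def interpret(title):
    """Return the semantic labels found in the title, in textual order."""
    text = title.lower().rstrip(".")
    found = []
    for phrase, label in LEXICON.items():
        pos = text.find(phrase)
        if pos != -1:
            found.append((pos, label))
    # Sort by position so the labels follow the order of the title.
    return [label for _, label in sorted(found)]

if __name__ == "__main__":
    print(interpret("Magnesium deficiency implicated in increased stress levels."))
```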
21 Statistical Semantic Parsing
- Modern statistical techniques
- Mainly applied to syntactic structure
- Probabilistic knowledge representation
- Represent hypotheses with different degrees of
certainty.
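One simple way to attach a degree of certainty to a hypothesis is sketched below. The slides do not say which probabilistic representation is intended; the noisy-OR combination used here is an assumption chosen for brevity, and the `Hypothesis` class and its fields are hypothetical.

```python
from dataclasses import dataclass, field
from math import prod

@dataclass
class Hypothesis:
    """A candidate relationship plus per-source confidence values in [0, 1]."""
    statement: str
    evidence: list = field(default_factory=list)

    def certainty(self):
        # Noisy-OR (an assumed combination rule): the hypothesis is doubted
        # only if every independent piece of evidence is wrong.
        return 1.0 - prod(1.0 - p for p in self.evidence)

if __name__ == "__main__":
    h = Hypothesis("nutrient reduction related-to symptom increase", [0.5, 0.5])
    print(h.certainty())
```

With no evidence the certainty is 0, and each additional supporting source can only raise it, which matches the intuition of accumulating evidence across documents.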
22 Automating Assessment of Web Site Usability
23 Why Worry?
- Problem: IBM's extranet
- Heavy use of help and search
- Unhappy users
- Solution
- Massive web site redesign
- Focus on info-organization, not the purchasing process.
- Cost "in the millions"
- Results
- Not announced or trumped up
- Use of "help" decreased 84%
- Sales increased 400%
24 Web TANGO: Tool for Assessing NaviGation & Organization
- Goal: automated support for comparing design alternatives
- How: Assess usability of the information architecture
- Approximate people's information-seeking behavior (Monte Carlo simulation)
- Output: quantitative usability metrics
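The idea of approximating information-seeking behavior by Monte Carlo simulation can be sketched as random walks over a site's link graph. This is not the TANGO simulator: the site map `SITE`, the uniform random click model, and the two metrics reported are all assumptions made for illustration.

```python
import random

# Hypothetical site: each page maps to the pages it links to.
SITE = {
    "home": ["products", "support", "about"],
    "products": ["laptops", "servers", "home"],
    "support": ["downloads", "contact", "home"],
    "about": ["home"],
    "laptops": ["home"],
    "servers": ["home"],
    "downloads": ["home"],
    "contact": ["home"],
}

# Alternative design: link the target directly from the home page.
ALT_SITE = dict(SITE, home=SITE["home"] + ["downloads"])

def simulate_visit(site, start, target, max_clicks, rng):
    """One simulated visitor clicking uniformly at random.

    Returns the number of clicks used, or None if the visitor gives up.
    """
    page, clicks = start, 0
    while page != target:
        if clicks == max_clicks:
            return None
        page = rng.choice(site[page])
        clicks += 1
    return clicks

def estimate(site, start, target, trials=10000, max_clicks=10, seed=0):
    """Estimate success rate and mean clicks-to-target over many visitors."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    results = [simulate_visit(site, start, target, max_clicks, rng)
               for _ in range(trials)]
    successes = [r for r in results if r is not None]
    success_rate = len(successes) / trials
    mean_clicks = sum(successes) / len(successes) if successes else float("nan")
    return success_rate, mean_clicks

if __name__ == "__main__":
    print("original:", estimate(SITE, "home", "downloads"))
    print("alternative:", estimate(ALT_SITE, "home", "downloads"))
```

Running the same simulated task against both link structures yields quantitative metrics that can be compared across design alternatives. A more realistic click model would weight links by how well their labels match the visitor's goal.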
25 Guidelines
- There are many usability guidelines
- A survey of 21 sets of web guidelines found little overlap (Ratner et al. 96)
- Why?
- Our hypothesis: not empirically validated
- So let's figure out what works!
26 An Empirical Study
Which features distinguish well-designed web pages?
27 Methodology
- Data collection
- 1108 pages
- 163 sites
- 3 levels per site
- 14 metrics
- About 85% accurate
- Text cluster and text positioning counts less accurate
28 Metrics
29 Preliminary Results
- Linear regression to predict Webby judges' ratings
- Top 30 vs bottom 30
- Prediction accuracy
- 72% if categories not taken into account
- 83% if categories assessed separately
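The general shape of the analysis, fitting a linear regression on page metrics and checking how often it separates highly rated from poorly rated pages, can be sketched as follows. The two metrics, the eight data rows, and the 0.5 decision threshold are invented for illustration; they are not the study's 14 metrics or its data, and the accuracy printed has no bearing on the figures above.

```python
def fit_linear(X, y):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    rows = [[1.0] + list(r) for r in X]  # prepend an intercept column
    n = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(A[k][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for k in range(n):
            if k != col:
                f = A[k][col] / A[col][col]
                A[k] = [a - f * b for a, b in zip(A[k], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]

def predict(w, row):
    """Predicted rating score for one page's metric values."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], row))

# Hypothetical pages: (word count, link count), label 1 = top-rated, 0 = bottom.
PAGES = [
    ((300, 20), 1), ((350, 25), 1), ((280, 30), 1), ((320, 22), 1),
    ((900, 5), 0), ((1100, 8), 0), ((800, 3), 0), ((1000, 10), 0),
]

X = [metrics for metrics, _ in PAGES]
y = [label for _, label in PAGES]
weights = fit_linear(X, y)

# Classify a page as top-rated when the predicted score exceeds 0.5.
correct = sum((predict(weights, m) > 0.5) == bool(label) for m, label in PAGES)
accuracy = correct / len(PAGES)

if __name__ == "__main__":
    print("weights:", weights)
    print("training accuracy:", accuracy)
```

Assessing categories separately, as in the last bullet, would amount to fitting one such model per site category instead of a single model over all pages.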
30 Goals
- Create empirical foundations for what is still guesswork
- Next step
- A free online tool
- Long term goal
- A Monte Carlo simulator for comparing potential designs
31 For More Information
- http://webtango.berkeley.edu
- hearst_at_sims.berkeley.edu