Title: Automating Assessment of Web Site Usability
Marti Hearst, University of California, Berkeley
2. The Usability Gap
- 196M new Web sites in the next 5 years (Nielsen '99)
- Most sites have inadequate usability (Forrester, Spool, Hurst: users can't find what they want 39-66% of the time)
4. Usability affects the bottom line
- IBM case study 1999
- Spent millions to redesign site
- 84% decrease in help usage
- 400% increase in sales
- Attributed to improvements in information architecture
5. Usability affects the bottom line
- Creative Good study 1999
- Studied 10 e-commerce sites
- 59% of attempts failed
- If 25% of these had succeeded: estimated additional $3.9B in sales
6. Talk Outline
- Web Site Design
- Automated Usability Evaluation
- Our approach: WebTANGO
- Some Empirical Results
- Wrap-up
- Joint work with Melody Ivory and Rashmi Sinha
7. Web Site Design (Newman et al. 00)
- Information design
- structure, categories of information
- Navigation design
- interaction with information structure
- Graphic design
- visual presentation of information and navigation (color, typography, etc.)
Courtesy of Mark Newman
8. Web Site Design (Newman et al. 00)
- Information Architecture
- includes management and more responsibility for content
- User Interface Design
- includes testing and evaluation
Courtesy of Mark Newman
9. Web Site Design Process
[Process diagram] Start → Discovery: assemble information relevant to the project
Courtesy of Mark Newman
11. Usability Evaluation: Standard Techniques
- User studies
- Potential users use the interface to complete some tasks
- Requires an implemented interface
- "Discount" Usability Evaluation
- Heuristic Evaluation
- Usability expert assesses guidelines
12. Automated UE
- We looked at 124 methods
- AUE is greatly under-explored
- Only 36 of all methods
- Fewer methods for the web (28)
- Most techniques require some testing
- Only 18 are free from user testing
- Only 6 for the web
13. Survey of Automated UE
- Predominant methods (Web)
- Structural analysis (4)
- Bobby; Scholtz & Laskowski 98; Stein 97
- Guideline reviews (11)
- Log file analysis (9)
- Chi et al. 00, Drott 98, Fuller & de Graaff 96, Guzdial et al., Sullivan 97, Theng & Marsden 98
- Simulation (2)
- WebCriteria (Max), Chi et al. 00
14. Existing Metrics
- Web metric analysis tools report on what is easy to measure
- Predicted download time
- Depth/breadth of site
- We want to worry about
- Content
- User goals/tasks
- We also want to compare alternative designs.
15. WebTANGO: Tool for Assessing NaviGation and Organization
- Goal: automated support for comparing design alternatives
- How: assess usability of the information architecture
- Approximate information-seeking behavior
- Output: quantitative usability metrics
16. Benefits/Tradeoffs
- Benefits
- Less expensive than traditional methods
- Use early in design process
- Tradeoffs
- Accuracy?
- Validate methodology with user studies
- Illustrates different problems than traditional methods
- For comparison purposes only
- Does not capture subjective measures
17. Information-Centric Sites
18. Guidelines
- There are many usability guidelines
- A survey of 21 sets of Web guidelines found little overlap (Ratner et al. 96)
- Why?
- Our hypothesis: they have not been empirically validated
- So let's figure out what works!
19. An Empirical Study
Which features distinguish well-designed web pages?
20. Methodology
- Collect quantitative measures from 2 groups
- Ranked: sites rated favorably via expert review or user ratings
- Unranked: sites that have not been rated favorably
- Statistically compare the groups
- Predict group membership
21. Quantitative Measures
- Identified 42 aspects from the literature
- Page composition (e.g., words, links, images)
- Page formatting (e.g., fonts, lists, colors)
- Overall page characteristics (e.g., information layout quality, download speed)
22. Metrics
- Word Count
- Body Text Percentage
- Emphasized Body Text Percentage
- Text Positioning Count
- Text Cluster Count
- Link Count
- Page Size
- Graphic Percentage
- Graphics Count
- Color Count
- Font Count
- Reading Complexity
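As an illustration of how metrics like these can be computed, here is a minimal sketch (not the actual WebTANGO metrics tool) that tallies a few of the counts above from raw HTML using Python's standard-library parser; the sample page is invented:

```python
# Sketch only: counting words, links, images, and font tags on one page
# with the stdlib HTMLParser. Not the WebTANGO implementation.
from html.parser import HTMLParser

class PageMetrics(HTMLParser):
    """Accumulates simple page-composition counts while parsing."""
    def __init__(self):
        super().__init__()
        self.word_count = 0
        self.link_count = 0
        self.graphics_count = 0
        self.font_count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a" and any(k == "href" for k, _ in attrs):
            self.link_count += 1          # Link Count
        elif tag == "img":
            self.graphics_count += 1      # Graphics Count
        elif tag == "font":
            self.font_count += 1          # Font Count (rough proxy)

    def handle_data(self, data):
        self.word_count += len(data.split())   # Word Count

html = '<p>Welcome to our <a href="/help">help</a> pages.</p><img src="x.gif">'
m = PageMetrics()
m.feed(html)
print(m.word_count, m.link_count, m.graphics_count)  # 5 1 1
```

A real tool must also handle frames, style sheets, and scripts, which this sketch ignores.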
23. Data Collection
- Collected data for 2,015 information-centric pages from 463 sites
- Education, government, newspaper, etc.
- Data constraints
- At least 30 words
- No e-commerce pages
- Exhibit high self-containment (i.e., no style sheets, scripts, applets, etc.)
- 1,054 pages fit constraints (52%)
24. Data Collection
- Ranked pages
- Favorably assessed by expert review or user rating on expert-chosen sites
- Sources
- Yahoo! 101 (ER)
- Web 100 (UR)
- PC Mag Top 100 (ER)
- WiseCats Top 100 (ER)
- Webby Awards (ER), People's Voice (UR)
25. Data Collection
- Unranked pages
- Not favorably assessed by expert review or user rating on expert-chosen sites
- Do not assume unranked means unfavorable
- Sources
- WebCriteria's Industry Benchmark
- Yahoo Business & Economy category
- Others
26. Data Analysis
- 428 pages analyzed
- 214 ranked pages
- 214 of the 840 unranked pages, chosen randomly
27. Findings
- Several features are significantly associated with ranked sites
- Several pairs of features correlate strongly
- Correlations mean different things in ranked vs. unranked pages
- Significant features are partially successful at predicting whether a site is ranked
28. Significant Differences
29. Significant Differences
- Ranked pages have:
- More text clustering (facilitates scanning)
- More links (facilitates info-seeking)
- More bytes (more content → facilitates info-seeking)
- More images (clustered graphics → facilitates scanning)
- More colors (facilitates scanning)
- Lower reading complexity (close to the best numbers in the Spool study → facilitates scanning)
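The slides do not say which reading-complexity formula was used. As one concrete possibility, the widely used Flesch-Kincaid grade level can be computed as below; the syllable counter is a rough heuristic, and the sample sentence is invented:

```python
# Illustrative only: Flesch-Kincaid grade level as one possible
# "reading complexity" metric. Not necessarily the study's formula.
import re

def count_syllables(word):
    # Heuristic: count runs of consecutive vowels (y included).
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fk_grade(text):
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    n = len(words)
    # Standard Flesch-Kincaid grade-level coefficients.
    return 0.39 * (n / sentences) + 11.8 * (syllables / n) - 15.59

print(round(fk_grade("The cat sat on the mat. It was warm."), 1))
```

Short sentences of monosyllabic words score very low (even negative), matching the finding that ranked pages keep reading complexity down.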
30. Metric Correlations
31. Metric Correlations
- Created hypotheses based on correlations
- Ranked pages
- Colored display text
- Link clustering
- → Both patterns on all pages in random sample
- Unranked pages
- Display text coloring plus body text emphasis or clustering
- Link coloring or clustering
- Image links, simulated image maps, bulleted links
- → At least 2 patterns in 70% of random sample
- Confirmed by sampling
32. Two Examples
33. Ranked Page
Colored display text; link clustering
34. Unranked Page
Body text emphasis; image links
35. Predicting Web Page Rating
- Linear regression
- Explains 10% of the difference between groups
- 63% accuracy (better at predicting unranked pages)
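The regression setup can be sketched as follows on toy data; the feature values, labels, and the 0.5 threshold are our invention, not the study's:

```python
# Sketch of the prediction step: fit a least-squares line to a 0/1
# "ranked" label on a single metric, then classify by thresholding
# the fitted value at 0.5. Toy data, not the study's 1,054 pages.
def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope          # intercept, slope

# Hypothetical text-cluster counts per page; label 1 = ranked.
clusters = [1, 2, 2, 3, 4, 5, 5, 6]
ranked   = [0, 0, 0, 0, 1, 1, 1, 1]
a, b = fit_line(clusters, ranked)
predict = lambda x: 1 if a + b * x >= 0.5 else 0
accuracy = sum(predict(x) == y for x, y in zip(clusters, ranked)) / len(ranked)
print(accuracy)  # 1.0 on this separable toy data
```

On the real data the same style of model reached only 63% accuracy, since the groups overlap far more than this toy example does.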
36. Predicting Web Page Rating
- Home vs. non-home pages
- Text cluster count predicts home page ranking
- 66% accuracy
- Consistent with the primary goal of home pages
- Non-home page prediction
- Consistent with full-sample results
- 4 of 6 metrics (link count, text positioning count, color count, reading complexity)
37. Another Rating System
- Web site ratings from RateItAll.com
- User ratings on a 5-point scale (1 = Terrible!, 5 = Great!)
- No rating criteria
- Small set of 59 pages (61% ranked)
- 54% of pages classified consistently
- Only 17% unranked with high rating → unranked sites properly labeled
- 29% ranked with medium rating → difference between expert and non-expert review
- Ranking predicted by graphics count with 70% accuracy
- → Carefully design studies with non-experts
38. Second Study (new results)
- Better rating data
- Webby Awards
- Sites organized into categories
- New metrics-computation tool
- More quantitative measures
- Processes style sheets, inline frames
- Larger sample of pages
39. Webby Awards 2000
- 27 categories
- We used finance, education, community, living, health, services
- 100 judges
- 6 criteria
- 3 rounds of judging
- We used the first round only
- 2000 sites initially
40. Webby Awards 2000
- 6 criteria
- Content
- Structure & navigation
- Visual design
- Functionality
- Interactivity
- Overall experience
- Factor analysis: first factor accounted for 91% of the variance
- Judgements somewhat normally distributed, with skew
41. New Metrics
42. Methodology
- Data collection
- 1,108 pages
- 163 sites
- 3 levels per site
- 14 metrics
- About 85% accurate
- Text cluster and text positioning counts less accurate
43. Preliminary Results
- Linear regression to predict Webby judges' ratings
- Top 30 vs. bottom 30
- Prediction accuracy
- 72% if categories not taken into account
- 83% if categories assessed separately
44. Significant Metrics by Category
45. Category-Based Profiles
- K-means clustering of good sites, according to the metrics
- Preliminary results suggest the sites do cluster
- Can use clusters to create profiles of good and poor sites for each category
- These can be used as empirically verified guidelines
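The clustering step can be illustrated with a minimal k-means sketch. Here k = 2 on a single invented metric, whereas the study clustered sites on all 14 metrics:

```python
# Minimal k-means sketch (k=2, one dimension). The data values below
# are hypothetical "word count" figures, not from the study.
def kmeans_1d(points, iters=10):
    c1, c2 = min(points), max(points)        # simple initialization
    for _ in range(iters):
        # Assign each point to its nearest center, then recompute means.
        g1 = [p for p in points if abs(p - c1) <= abs(p - c2)]
        g2 = [p for p in points if abs(p - c1) > abs(p - c2)]
        c1, c2 = sum(g1) / len(g1), sum(g2) / len(g2)
    return sorted([c1, c2])

centers = kmeans_1d([80, 90, 100, 400, 420, 450])
print(centers)
```

The two resulting centers would serve as the "profile" values for the two groups; with 14 metrics the centers become 14-dimensional vectors, but the algorithm is the same.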
46. Ramifications
- It is remarkable that such simple metrics predict so well
- Perhaps good design is good overall
- There may be other factors
- A foundation for a new methodology
- Empirical, bottom-up
- Does this reflect cognitive principles?
- But there is no one path to good design
47. Longer-Term Goal: A Simulator for Comparing Site Designs
48. Monte Carlo Simulation
- Have a model of the information structure
- Have a set of user goals
- Want to assess the navigation structure
- Compare alternatives/tradeoffs
- Identify bottlenecks
- Identify critically important pages/links
- Check all pairs of start/end points
- Check overall reachability before and after a change
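The reachability check in the last bullet can be sketched as a breadth-first search over the site's link graph; the page names below are hypothetical (echoing the Renter Support example on the next slides):

```python
# Sketch: BFS reachability over a site's link graph, to check which
# pages can still reach a target before and after a design change.
from collections import deque

def reachable_from(graph, start):
    """Return the set of pages reachable from `start` by following links."""
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in graph.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

# Hypothetical site: page -> list of linked pages.
site = {
    "home": ["products", "support"],
    "products": ["renter-support"],
    "support": [],
    "renter-support": [],
}
print("renter-support" in reachable_from(site, "home"))     # True
print("renter-support" in reachable_from(site, "support"))  # False
```

Running this for every start page gives the all-pairs check; rerunning it on the modified graph shows whether a proposed change cuts off any goal page.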
49. [Figure] One Monte Carlo simulation step for Design 1, Task 1. The simulation starts from the home page; the target information is at Renter Support.
50. [Figure] Monte Carlo simulation results for Design 1, Task 1. Simulation runs start from all pages in the site. Average navigation times are shown for Tasks 2 & 3.
51. Monte Carlo Simulation
- At each step in the simulation
- Assume a probability distribution over a set of next choices
- The next choice is a function of
- The current goal
- The understandability of the choice
- Prior interaction history
- The overall complexity of the page
- Varying the distribution corresponds to varying properties of the links
- Spot-check important choices
53. In Summary
- Automated usability assessment should help close the Web usability gap
- We can empirically distinguish between highly rated Web pages and other pages
- Empirical validation of design guidelines
- Can build profiles of good vs. poor sites
- We are validating expert judgements against usability assessments via a user study
- Web use simulation is an under-explored and promising new approach
54. Current Projects
- Automating Web Usability (Tango)
- Melody Ivory, Rashmi Sinha
- Text Data Mining (Lindi)
- Barbara Rosario, Steve Tu
- Metadata in Search Interfaces (Flamenco)
- Ame Elliott, Andy Chou
- Web Intranet Search (Cha-Cha)
- Mike Chen, Jamie Laflen
55. More Information
- http://www.cs.berkeley.edu/~ivory/web
- http://www.sims.berkeley.edu/~hearst
57. Automated Usability Evaluation
- Logging/capture
- Pro: easy
- Con: requires an implemented system
- Con: don't know the user task (Web)
- Con: doesn't present alternatives
- Con: doesn't distinguish error from success
- Analytical modeling
- Pro: doable at design phase
- Con: models an expert
- Con: academic exercise
- Simulation
58. Research Issues: Navigation Predictions
- Develop a model for predicting link selection
- Requirements
- Information need (task metadata)
- Representation of pages (page metadata)
- Method for selecting links (relevance ranking)
- Maintaining the user's conceptual model during site traversal (scent: Fur97, LC98, Pir97)
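As a minimal stand-in for the relevance-ranking requirement (real scent models such as Fur97 and Pir97 are far richer), links can be ordered by word overlap between the information need and their anchor text; the need and link texts below are invented:

```python
# Sketch: rank candidate links by word overlap with the information need.
# A toy substitute for a real relevance/scent model, not the talk's method.
def score_links(need, links):
    need_words = set(need.lower().split())
    # Sort by descending overlap; Python's sort is stable, so ties
    # keep their original page order.
    return sorted(
        links,
        key=lambda text: -len(need_words & set(text.lower().split())),
    )

need = "find renter support contact"
links = ["About Us", "Renter Support", "Site Map", "Contact Support"]
print(score_links(need, links))
# ['Renter Support', 'Contact Support', 'About Us', 'Site Map']
```

In a full simulator this score would feed the next-choice distribution, combining with page metadata and interaction history as the requirements above list.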