Automating Assessment of Web Site Usability

1
Automating Assessment of Web Site Usability

Marti Hearst University of California, Berkeley
2
The Usability Gap
3
The Usability Gap
  • 196M new Web sites expected in the next 5 years (Nielsen 99)
  • Most sites have inadequate usability (Forrester, Spool, Hurst: users can't find what they want 39-66% of the time)
4
Usability affects the bottom line
  • IBM case study 1999
  • Spent millions to redesign site
  • 84% decrease in help usage
  • 400% increase in sales
  • Attributed to improvements in information
    architecture

5
Usability affects the bottom line
  • IBM case study 1999
  • Spent millions to redesign site
  • 84% decrease in help usage
  • 400% increase in sales
  • Attributed to improvements in information
    architecture

Creative Good Study 1999
  • Studied 10 e-commerce sites
  • 59% of attempts failed
  • If 25% of these had succeeded: an estimated additional $3.9B in sales
6
Talk Outline
  • Web Site Design
  • Automated Usability Evaluation
  • Our approach
  • WebTANGO
  • Some Empirical Results
  • Wrap-up
  • Joint work with Melody Ivory and Rashmi Sinha

7
Web Site Design (Newman et al. 00)
  • Information design
  • structure, categories of information
  • Navigation design
  • interaction with information structure
  • Graphic design
  • visual presentation of information and navigation
    (color, typography, etc.)

Courtesy of Mark Newman
8
Web Site Design (Newman et al. 00)
  • Information Architecture
  • includes management and more responsibility for
    content
  • User Interface Design
  • includes testing and evaluation

Courtesy of Mark Newman
9
Web Site Design Process
(Process diagram: Start → Discovery: assemble information relevant to the project)
Courtesy of Mark Newman
10
(No Transcript)
11
Usability Evaluation: Standard Techniques
  • User studies
  • Potential users use the interface to complete
    some tasks
  • Requires an implemented interface
  • "Discount" Usability Evaluation
  • Heuristic Evaluation
  • A usability expert assesses the interface against guidelines

12
Automated UE
  • We looked at 124 methods
  • AUE is greatly under-explored
  • Only 36% of all methods
  • Fewer methods for the web (28%)
  • Most techniques require some testing
  • Only 18% are free from user testing
  • Only 6% for the web

13
Survey of Automated UE
  • Predominant methods (Web)
  • Structural analysis (4)
  • Bobby, Scholtz & Laskowski 98, Stein 97
  • Guideline Reviews (11)
  • Log file analysis (9)
  • Chi et al. 00, Drott 98, Fuller & de Graaff 96, Guzdial et al., Sullivan 97, Theng & Marsden 98
  • Simulation (2)
  • WebCriteria (Max), Chi et al. 00

14
Existing Metrics
  • Web metric analysis tools report on what is easy
    to measure
  • Predicted download time
  • Depth/breadth of site
  • We want to worry about
  • Content
  • User goals/tasks
  • We also want to compare alternative designs.

15
WebTANGO: Tool for Assessing NaviGation and Organization
  • Goal: automated support for comparing design alternatives
  • How: assess usability of the information architecture
  • Approximate information-seeking behavior
  • Output: quantitative usability metrics

16
Benefits/Tradeoffs
  • Benefits
  • Less expensive than traditional methods
  • Use early in design process
  • Tradeoffs
  • Accuracy?
  • Validate methodology with user studies
  • Illustrate different problems than traditional
    methods
  • For comparison purposes only
  • Does not capture subjective measures

17
Information-Centric Sites
18
Guidelines
  • There are many usability guidelines
  • A survey of 21 sets of web guidelines found
    little overlap (Ratner et al. 96)
  • Why?
  • Our hypothesis: the guidelines are not empirically validated
  • So let's figure out what works!

19
An Empirical Study
Which features distinguish well-designed web
pages?
20
Methodology
  • Collect quantitative measures from 2 groups
  • Ranked: sites rated favorably via expert review or user ratings
  • Unranked: sites that have not been rated favorably
  • Statistically compare the groups (see the sketch below)
  • Predict group membership

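A minimal sketch of the group-comparison step, assuming SciPy; the metric name and values are purely illustrative, not figures from the study.

```python
# Sketch: compare one quantitative measure between ranked and unranked pages.
# The data values below are illustrative placeholders, not study data.
from scipy import stats

ranked_link_counts = [42, 55, 61, 38, 47, 59, 50, 44]    # hypothetical values
unranked_link_counts = [18, 25, 30, 22, 27, 19, 33, 24]   # hypothetical values

# Two-sample t-test: is the mean link count significantly different?
t_stat, p_value = stats.ttest_ind(ranked_link_counts, unranked_link_counts)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Link count differs significantly between the two groups.")
```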
21
Quantitative Measures
  • Identified 42 aspects from the literature
  • Page Composition (e.g., words, links, images)
  • Page Formatting (e.g., fonts, lists, colors)
  • Overall Page Characteristics
  • (e.g., information layout quality, download
    speed)

22
Metrics
  • Word Count
  • Body Text Percentage
  • Emphasized Body Text Percentage
  • Text Positioning Count
  • Text Cluster Count
  • Link Count
  • Page Size
  • Graphic Percentage
  • Graphics Count
  • Color Count
  • Font Count
  • Reading Complexity

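A minimal sketch of how a few of these page-level metrics could be extracted from raw HTML with Python's standard-library parser; this crude version is illustrative only and not the project's actual metrics tool.

```python
# Sketch: compute a few simple page metrics (word count, link count,
# graphics count) from raw HTML.
from html.parser import HTMLParser

class PageMetrics(HTMLParser):
    def __init__(self):
        super().__init__()
        self.word_count = 0
        self.link_count = 0
        self.graphics_count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.link_count += 1      # count hyperlinks
        elif tag == "img":
            self.graphics_count += 1  # count images

    def handle_data(self, data):
        self.word_count += len(data.split())  # count words in text content

sample_html = "<html><body><p>Welcome to the site.</p><a href='/help'>Help</a><img src='logo.gif'></body></html>"
m = PageMetrics()
m.feed(sample_html)
print(m.word_count, m.link_count, m.graphics_count)
```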
23
Data Collection
  • Collected data for 2,015 information-centric
    pages from 463 sites
  • Education, government, newspaper, etc.
  • Data constraints (see the filter sketch below)
  • At least 30 words
  • No e-commerce pages
  • Exhibit high self-containment (i.e., no style
    sheets, scripts, applets, etc.)
  • 1,054 pages fit constraints (52%)

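A minimal sketch of applying these constraints as a filter; the field names on each page record are hypothetical.

```python
# Sketch: keep only pages that satisfy the stated data constraints.
# Field names (word_count, is_ecommerce, external_deps) are hypothetical.
def fits_constraints(page):
    return (page["word_count"] >= 30        # at least 30 words
            and not page["is_ecommerce"]    # no e-commerce pages
            and not page["external_deps"])  # self-contained: no style sheets,
                                            # scripts, applets, etc.

pages = [
    {"word_count": 120, "is_ecommerce": False, "external_deps": False},
    {"word_count": 12,  "is_ecommerce": False, "external_deps": True},
]
usable = [p for p in pages if fits_constraints(p)]
print(f"{len(usable)} of {len(pages)} pages fit the constraints")
```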
24
Data Collection
  • Ranked pages
  • Favorably assessed by expert review or user
    rating on expert-chosen sites
  • Sources (ER = expert review, UR = user rating)
  • Yahoo! 101 (ER)
  • Web 100 (UR)
  • PC Mag Top 100 (ER)
  • WiseCats Top 100 (ER)
  • Webby Awards (ER), People's Voice (UR)

25
Data Collection
  • Unranked
  • Not favorably assessed by expert review or user
    rating on expert-chosen sites
  • Do not assume unranked means unfavorable
  • Sources
  • WebCriteria's Industry Benchmark
  • Yahoo Business & Economy category
  • Others

26
Data Analysis
  • 428 pages analyzed
  • 214 ranked pages
  • 214 chosen randomly from the 840 unranked pages

27
Findings
  • Several features are significantly associated
    with ranked sites
  • Several pairs of features correlate strongly
  • Correlations mean different things in ranked vs.
    unranked pages
  • Significant features are partially successful at predicting whether a site is ranked

28
Significant Differences
29
Significant Differences
  • Ranked pages
  • More text clustering (facilitates scanning)
  • More links (facilitate info-seeking)
  • More bytes (more content → facilitates info-seeking)
  • More images (clustering graphics → facilitates scanning)
  • More colors (facilitates scanning)
  • Lower reading complexity (close to best numbers in the Spool study → facilitates scanning)

30
Metric Correlations
31
Metric Correlations
  • Created hypotheses based on correlations
  • Ranked Pages
  • Colored display text
  • Link clustering
  • → Both patterns on all pages in random sample
  • Unranked Pages
  • Display text coloring plus body text emphasis or
    clustering
  • Link coloring or clustering
  • Image links, simulated image maps, bulleted links
  • → At least 2 patterns in 70% of random sample
  • Confirmed by sampling

32
Two Examples
33
Ranked Page
Colored display text Link clustering
34
Unranked Page
Body text emphasis Image links
35
Predicting Web Page Rating
  • Linear regression (see the sketch below)
  • Explains 10% of the difference between groups
  • 63% accuracy (better at predicting unranked pages)

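A minimal sketch of one plausible reading of this step, assuming scikit-learn: a linear regression is fit to a 0/1 ranked label and thresholded at 0.5. The feature matrix is synthetic, not the study's data.

```python
# Sketch: predict ranked (1) vs. unranked (0) from page metrics.
# Synthetic data; feature means are invented for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
# Columns: text cluster count, link count, color count, reading complexity
X_ranked = rng.normal([8, 50, 6, 9], 2, size=(50, 4))
X_unranked = rng.normal([4, 30, 4, 12], 2, size=(50, 4))
X = np.vstack([X_ranked, X_unranked])
y = np.array([1] * 50 + [0] * 50)

model = LinearRegression().fit(X, y)
predicted = (model.predict(X) >= 0.5).astype(int)   # threshold for group membership
print("R^2:", round(model.score(X, y), 2))
print("Accuracy:", round((predicted == y).mean(), 2))
```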
36
Predicting Web Page Rating
  • Home vs. Non-home pages
  • Text cluster count predicts home page ranking
  • 66% accuracy
  • Consistent with primary goal of home pages
  • Non-home page prediction
  • Consistent with full sample results
  • 4 of 6 metrics (link count, text positioning
    count, color count, reading complexity)

37
Another Rating System
  • Web site ratings from RateItAll.com
  • User ratings on a 5-point scale (1 = Terrible!, 5 = Great!)
  • No rating criteria
  • Small set of 59 pages (61% ranked)
  • 54% of pages classified consistently
  • Only 17% of unranked pages had a high rating → unranked sites properly labeled
  • 29% of ranked pages had a medium rating → difference between expert and non-expert review
  • Ranking predicted by graphics count with 70% accuracy
  • → Carefully design studies with non-experts

38
Second study (new results)
  • Better rating data
  • Webby Awards
  • Sites organized into categories
  • New metrics computation tool
  • More quantitative measures
  • Process style sheets, inline frames
  • Larger sample of pages

39
Webby Awards 2000
  • 27 categories
  • We used finance, education, community, living,
    health, services
  • 100 judges
  • 6 criteria
  • 3 rounds of judging
  • We used first round only
  • 2000 sites initially

40
Webby Awards 2000
  • 6 criteria
  • Content
  • Structure & navigation
  • Visual design
  • Functionality
  • Interactivity
  • Overall experience
  • Factor analysis: the first factor accounted for 91% of the variance (see the sketch below)
  • Judgements somewhat normally distributed, with skew

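A minimal sketch of checking how much variance a single underlying factor explains, using PCA as a stand-in for factor analysis; the ratings matrix is synthetic.

```python
# Sketch: how much variance in the six judging criteria is explained by
# one underlying factor? PCA used as a stand-in; ratings are synthetic.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
overall_quality = rng.normal(size=(200, 1))       # latent "goodness" per site
noise = 0.3 * rng.normal(size=(200, 6))
ratings = overall_quality + noise                 # six correlated criteria scores

pca = PCA(n_components=6).fit(ratings)
print("Variance explained by first factor:",
      round(pca.explained_variance_ratio_[0], 2))
```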
41
New Metrics
42
Methodology
  • Data collection
  • 1108 pages
  • 163 sites
  • 3 levels per site
  • 14 metrics
  • About 85% accurate
  • Text cluster and text positioning counts less
    accurate

43
Preliminary Results
  • Linear regression to predict Webby judges' ratings
  • Top 30 vs. bottom 30 sites
  • Prediction accuracy
  • 72% if categories not taken into account
  • 83% if categories assessed separately

44
Significant Metrics by Category
45
Category-based Profiles
  • K-means clustering of good sites, according to the metrics (see the sketch below)
  • Preliminary results suggest the sites do cluster
  • Can use clusters to create profiles of good and poor sites for each category
  • These can be used as empirically verified guidelines

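A minimal sketch of the clustering step, assuming scikit-learn; the metrics matrix, cluster count, and profile comments are synthetic and illustrative.

```python
# Sketch: cluster highly rated sites by their metric vectors to derive
# per-category profiles. Synthetic data, illustrative cluster count.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
# Rows = sites, columns = metrics (e.g., word count, link count, color count)
metrics = np.vstack([
    rng.normal([300, 40, 6], 10, size=(30, 3)),   # hypothetical news-like profile
    rng.normal([120, 15, 3], 10, size=(30, 3)),   # hypothetical form-like profile
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(metrics)
for label in range(2):
    profile = metrics[kmeans.labels_ == label].mean(axis=0)
    print(f"Cluster {label} profile (mean metrics): {profile.round(1)}")
```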
46
Ramifications
  • It is remarkable that such simple metrics predict
    so well
  • Perhaps good design is good overall
  • There may be other factors
  • A foundation for a new methodology
  • Empirical, bottom-up
  • Does this reflect cognitive principles?
  • But there is no single path to good design

47
Longer-Term Goal: A Simulator for Comparing Site Designs

48
Monte Carlo Simulation
  • Have a model of information structure
  • Have a set of user goals
  • Want to assess navigation structure
  • Compare alternatives/tradeoffs
  • Identify bottlenecks
  • Identify critically important pages/links
  • Check all pairs of start/end points
  • Check overall reachability before and after a change (see the sketch below)

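A minimal sketch of the reachability check over a site's link graph; the page names and links are hypothetical.

```python
# Sketch: breadth-first reachability over a site graph, to compare
# before/after a design change. Pages and links are hypothetical.
from collections import deque

site = {
    "home": ["products", "support"],
    "products": ["pricing"],
    "support": ["faq", "contact"],
    "pricing": [], "faq": [], "contact": [],
}

def reachable(graph, start):
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in graph[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

print(sorted(reachable(site, "home")))  # pages a user can reach from the home page
```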
49
One Monte Carlo simulation step for Design 1,
Task 1. Simulation starts from the home page and
the target information is at Renter Support.
50
Monte Carlo simulation results for Design 1, Task
1. Simulation runs start from all pages in the
site. Average navigation times are shown for Tasks 2 and 3.
51
Monte Carlo Simulation
  • At each step in the simulation
  • Assume a probability distribution over a set of next choices (see the sketch below)
  • The next choice is a function of
  • The current goal
  • The understandability of the choice
  • Prior interaction history
  • The overall complexity of the page
  • Varying the distribution corresponds to varying
    properties of the links
  • Spot-check important choices

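A minimal sketch of one simulation run under these assumptions; the site graph, link-choice probabilities, and give-up threshold are all hypothetical.

```python
# Sketch: one Monte Carlo navigation run. At each page the next link is
# drawn from a probability distribution over the available choices.
import random

# For each page: outgoing links and the probability a user pursuing this
# goal chooses each one (hypothetical values).
choices = {
    "Home":     [("Products", 0.5), ("Support", 0.4), ("About", 0.1)],
    "Products": [("Pricing", 0.7), ("Home", 0.3)],
    "Support":  [("Renter Support", 0.6), ("Home", 0.4)],
    "About":    [("Home", 1.0)],
    "Pricing":  [("Home", 1.0)],
}

def simulate(start, target, max_steps=20):
    """One navigation run; returns the step count, or None if the user gives up."""
    page, steps = start, 0
    while page != target and steps < max_steps:
        links, probs = zip(*choices[page])
        page = random.choices(links, weights=probs)[0]
        steps += 1
    return steps if page == target else None

runs = [simulate("Home", "Renter Support") for _ in range(1000)]
hits = [r for r in runs if r is not None]
print(f"Success rate: {len(hits) / len(runs):.2f}")
if hits:
    print(f"Average steps to reach the target: {sum(hits) / len(hits):.1f}")
```

Varying the per-link probabilities corresponds to varying the properties of the links, and re-running the simulation compares designs or tasks.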
53
In Summary
  • Automated Usability Assessment should help close
    the Web Usability Gap
  • We can empirically distinguish between highly
    rated web pages and other pages
  • Empirical validation of design guidelines
  • Can build profiles of good vs. poor sites
  • We are validating expert judgements with usability assessments via a user study
  • Web use simulation is an under-explored and
    promising new approach

54
Current Projects
  • Automating Web Usability (Tango)
  • Melody Ivory, Rashmi Sinha
  • Text Data Mining (Lindi)
  • Barbara Rosario, Steve Tu
  • Metadata in Search Interfaces (Flamenco)
  • Ame Elliott, Andy Chou
  • Web Intranet Search (Cha-Cha)
  • Mike Chen, Jamie Laflen

55
  • More information
  • http://www.cs.berkeley.edu/~ivory/web
  • http://www.sims.berkeley.edu/~hearst

56
(No Transcript)
57
Automated Usability Evaluation
  • Logging/capture
  • Pro: Easy
  • Con: Requires an implemented system
  • Con: Don't know the user's task (web)
  • Con: Don't present alternatives
  • Con: Don't distinguish error from success
  • Analytical Modeling
  • Pro: Doable at design phase
  • Con: Models an expert
  • Con: Academic exercise
  • Simulation

58
Research Issues: Navigation Predictions
  • Develop a model for predicting link selection (see the sketch below)
  • Requirements
  • Information need (task metadata)
  • Representation of pages (page metadata)
  • Method for selecting links (relevance ranking)
  • Maintaining the user's conceptual model during site traversal (scent: Fur97, LC98, Pir97)
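A minimal sketch of one way such a link-selection model could be assembled: per-link "scent" scores turned into a choice distribution with a softmax. The word-overlap scoring is a deliberately naive stand-in for a real relevance ranking.

```python
# Sketch: turn per-link relevance ("scent") into selection probabilities.
# The word-overlap score is a naive placeholder for real relevance ranking.
import math

def scent(link_text, goal):
    link_words = set(link_text.lower().split())
    goal_words = set(goal.lower().split())
    return len(link_words & goal_words)   # crude overlap with the information need

def selection_probabilities(links, goal, temperature=0.5):
    scores = [scent(text, goal) / temperature for text in links]
    total = sum(math.exp(s) for s in scores)
    return {text: math.exp(s) / total for text, s in zip(links, scores)}

links = ["Renter Support", "Product Pricing", "About Us"]
print(selection_probabilities(links, "support for renters"))
```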