Title: Empirical Foundations for Web Site Usability
1 Empirical Foundations for Web Site Usability
Marti Hearst, Melody Ivory, Rashmi Sinha
University of California, Berkeley
2 The Usability Gap
- 196M new Web sites in the next 5 years (Nielsen 99)
- Most sites have inadequate usability (Forrester, Spool, Hurst): users can't find what they want 39-66% of the time
4 The Problem
- NON-professionals need to create websites
- Guidelines are helpful, but
  - Sometimes imprecise
  - Sometimes conflict
  - Usually not empirically founded
5 Ultimate Goal: Tools to Help Non-Professional Designers
- Examples
  - A grammar checker to assess guideline conformance
    - Imperfect
    - Only suggestions, not dogma
  - Automatic comparison to highly usable pages/sites
  - Automatic template suggestions
6 A View of Web Site Structure (Newman et al. 00)
- Information design
  - structure, categories of information
- Navigation design
  - interaction with information structure
- Graphic design
  - visual presentation of information and navigation (color, typography, etc.)
Courtesy of Mark Newman
7 A View of Web Site Design (Newman et al. 00)
- Information Architecture
  - includes management and more responsibility for content
- User Interface Design
  - includes testing and evaluation
Courtesy of Mark Newman
8 The Goal
- Eventually want to assess navigation structure and graphic design at the page and site level.
- Farther down the line: information design and scent
- Note: we are NOT suggesting we can characterize
  - Aesthetics
  - Subjective preferences
9 The Investigation
- Can we place web design guidelines onto an empirical foundation?
- Can we build models of good design by looking at existing designs?
10 Example Empirical Investigation
- Is it all about the content?
11 Webby Awards 2000
- 27 topical categories
  - We used finance, education, community, living, health, services
- 100 judges
  - International Academy of Digital Arts & Sciences
- 3 rounds of judging
  - 2000 sites initially
12 Webby Awards 2000
- 6 criteria
  - Content
  - Structure & navigation
  - Visual design
  - Functionality
  - Interactivity
  - Overall experience
- Scale: 1-10 (highest)
  - Nearly normally distributed across judged sites
- What are Webby judgements about?
13 Webby Awards 2000
- The best predictor of the overall score is the score for content
- The worst predictor is visual design
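One way to ask which criterion best predicts the overall score is to correlate each criterion with it. The sketch below is only an illustration under that assumption: the slides do not say which statistic was used, and the helper names (`pearson`, `rank_predictors`) and the toy scores are invented for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def rank_predictors(criteria, overall):
    """Order criterion names from strongest to weakest |correlation| with overall."""
    return sorted(
        criteria,
        key=lambda name: abs(pearson(criteria[name], overall)),
        reverse=True,
    )


# Made-up judge scores for five sites (NOT the Webby data).
TOY_OVERALL = [8, 6, 7, 4, 9]
TOY_CRITERIA = {
    "content": [8, 6, 7, 5, 9],
    "visual_design": [5, 7, 6, 6, 5],
}
```

Calling `rank_predictors(TOY_CRITERIA, TOY_OVERALL)` lists the criteria by how closely they track the overall score in the toy data.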
14 So Webbys focus on content!
15 Comparing Two Categories
[Figure: news vs. arts]
16 Guidelines
- There are MANY usability guidelines
- A survey of 21 sets of web guidelines found little overlap (Ratner et al. 96)
- Why?
  - One idea: because they are not empirically validated
- So let's figure out what works!
17 Another Empirical Study
Which features distinguish well-designed web pages?
18 Quantitative Metrics
- Identified 42 attributes from the literature
- Roughly characterized as
  - Page Composition (e.g., words, links, images)
  - Page Formatting (e.g., fonts, lists, colors)
  - Overall Page Characteristics (e.g., information layout quality, download speed)
19 Metrics Used in Study
- Word Count
- Body Text Percentage
- Emphasized Body Text Percentage
- Text Positioning Count
- Text Cluster Count
- Link Count
- Page Size
- Graphic Percentage
- Graphics Count
- Color Count
- Font Count
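A few of these metrics can be approximated with simple counts over the page's HTML. The sketch below is not the study's tool: the class and function names are hypothetical, the counting rules are assumptions (for example, colors and fonts are read only from old-style `<font>` tags and style sheets are ignored), and only five of the listed metrics are covered.

```python
from html.parser import HTMLParser


class PageMetrics(HTMLParser):
    """Hypothetical collector for a few of the listed page-level metrics."""

    def __init__(self):
        super().__init__()
        self.word_count = 0
        self.link_count = 0
        self.graphics_count = 0
        self.colors = set()
        self.fonts = set()
        self._hidden_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("script", "style"):
            self._hidden_depth += 1
        elif tag == "a" and attrs.get("href"):
            self.link_count += 1
        elif tag == "img":
            self.graphics_count += 1
        elif tag == "font":
            # Assumption: only <font color=... face=...> is inspected; CSS is ignored.
            if attrs.get("color"):
                self.colors.add(attrs["color"].lower())
            if attrs.get("face"):
                self.fonts.add(attrs["face"].lower())

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._hidden_depth:
            self._hidden_depth -= 1

    def handle_data(self, data):
        # Count whitespace-separated tokens of visible text as words.
        if not self._hidden_depth:
            self.word_count += len(data.split())


def measure(html):
    """Return a dict of simple counts for one HTML page."""
    parser = PageMetrics()
    parser.feed(html)
    parser.close()
    return {
        "word_count": parser.word_count,
        "link_count": parser.link_count,
        "graphics_count": parser.graphics_count,
        "color_count": len(parser.colors),
        "font_count": len(parser.fonts),
    }
```

Percentage metrics such as Body Text Percentage or Graphic Percentage would need definitions that the slides do not give, so they are left out here.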
20 Data Collection
- Collected data for 1898 pages from 163 sites
  - Attempted to collect from 3 levels within each site
- Six Webby categories
  - Health, Living, Community, Education, Finance, Services
- Data constraints
  - At least 30 words
  - No pages with forms
  - Exhibit high self-containment (i.e., no scripts, applets, etc.)
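The data constraints amount to a page filter. A minimal sketch follows, assuming the constraints can be checked directly on the HTML; the function name `is_eligible` and the set of tags treated as breaking self-containment are assumptions, since the slide only says "no scripts, applets, etc."

```python
from html.parser import HTMLParser

MIN_WORDS = 30  # from the slide: at least 30 words
# Assumption: tags taken to break "self-containment"; the slide does not list them all.
ACTIVE_TAGS = {"script", "applet", "object", "embed"}


class _ConstraintScanner(HTMLParser):
    """Hypothetical scanner that records what the filter needs to know."""

    def __init__(self):
        super().__init__()
        self.words = 0
        self.has_form = False
        self.has_active_content = False
        self._hidden_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            self.has_form = True
        if tag in ACTIVE_TAGS:
            self.has_active_content = True
        if tag in ("script", "style"):
            self._hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._hidden_depth:
            self._hidden_depth -= 1

    def handle_data(self, data):
        if not self._hidden_depth:
            self.words += len(data.split())


def is_eligible(html):
    """True if the page meets all three data constraints."""
    scanner = _ConstraintScanner()
    scanner.feed(html)
    scanner.close()
    return (
        scanner.words >= MIN_WORDS
        and not scanner.has_form
        and not scanner.has_active_content
    )
```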
21 Method
- The Webby factor
  - A principal components analysis of the 6 judgement criteria accounted for 91% of the variance
- Two comparisons
  - Model 1: Top 33% of sites vs. the rest (using the overall Webby score)
  - Model 2: Top 33% of sites vs. bottom 33% (using the Webby factor)
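The two comparisons differ only in how sites are grouped once they are ranked by a score. The sketch below shows one possible grouping; the function names, the rounding of the group size, and the handling of ties are assumptions, and the score passed in could be either the overall Webby score or a factor score.

```python
def top_vs_rest(scores, fraction=1 / 3):
    """Model 1 style labels: True for sites in the top `fraction`, False otherwise."""
    n_top = round(len(scores) * fraction)
    ranked = sorted(scores, key=scores.get, reverse=True)
    top = set(ranked[:n_top])
    return {site: site in top for site in scores}


def top_vs_bottom(scores, fraction=1 / 3):
    """Model 2 style labels: 'top' or 'bottom'; middle sites are left out."""
    n_group = round(len(scores) * fraction)
    ranked = sorted(scores, key=scores.get, reverse=True)
    labels = {site: "top" for site in ranked[:n_group]}
    labels.update({site: "bottom" for site in ranked[len(ranked) - n_group:]})
    return labels


# Made-up scores for six hypothetical sites.
TOY_SCORES = {"a": 9.1, "b": 7.5, "c": 6.0, "d": 5.2, "e": 4.4, "f": 2.0}
```

With the toy scores, `top_vs_bottom(TOY_SCORES)` keeps four of the six sites and drops the middle third, which is the difference between the two comparisons.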
22 Questions
- Can we use the metrics to predict membership in top vs. other groups?
- Do we see a difference in how the metrics behave in different content categories?
23 Findings
- We can accurately classify web pages
  - Linear discriminant analysis
- Model 1: For top vs. rest
  - 67% correct overall
  - 73% correct when taking categories into account
- Model 2: For top vs. bottom
  - 65% correct overall
  - 80% correct using categories
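For readers unfamiliar with linear discriminant analysis, the two-group case can be sketched in a few lines: project each page's metric vector onto a direction that separates the group means relative to the within-group scatter, then threshold the projection. This is a generic Fisher discriminant, not the study's analysis; the midpoint threshold assumes equal priors, and the function names, the two features, and the toy pages are all invented for illustration.

```python
def mean_vector(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def pooled_scatter(groups, means):
    """Sum of within-group scatter matrices."""
    d = len(means[0])
    s = [[0.0] * d for _ in range(d)]
    for rows, mu in zip(groups, means):
        for row in rows:
            diff = [x - m for x, m in zip(row, mu)]
            for i in range(d):
                for j in range(d):
                    s[i][j] += diff[i] * diff[j]
    return s


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_lda(top_rows, other_rows):
    """Return (weights, threshold) for a two-group Fisher discriminant."""
    mu_top, mu_other = mean_vector(top_rows), mean_vector(other_rows)
    s_w = pooled_scatter([top_rows, other_rows], [mu_top, mu_other])
    w = solve(s_w, [a - b for a, b in zip(mu_top, mu_other)])
    midpoint = [(a + b) / 2 for a, b in zip(mu_top, mu_other)]
    threshold = sum(wi * mi for wi, mi in zip(w, midpoint))  # equal priors assumed
    return w, threshold


def predict_top(model, row):
    w, threshold = model
    return sum(wi * xi for wi, xi in zip(w, row)) > threshold


# Made-up pages as (graphic percentage, color count); NOT the study's data.
TOY_TOP = [(10, 5), (12, 6), (8, 7), (11, 4)]
TOY_OTHER = [(30, 2), (28, 3), (35, 1), (32, 3)]
```

Fitting with `fit_lda(TOY_TOP, TOY_OTHER)` and calling `predict_top` on each page gives a percent-correct figure of the kind reported above, although accuracy on the fitting data itself is optimistic.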
24 Findings
- Top 33% vs. bottom 33% via Webby factor
  - Linear discriminant analysis
  - Works better when subdivided by category
25 Why does this work?
- Content is the most important predictor of overall score
- BUT there is some predictive power in the visual design / navigation criteria
- Also, it may just be that good design is good design all over
- This result is found in other domains
  - automatic essay grading, for one
26 Deeper Analysis
- Which metrics matter?
  - Linear regression analysis (backward elimination until adjusted R² reduced)
  - All metrics played a role
- Compared small, medium, and large pages
- Across the board
  - good pages had significantly smaller graphics percentage
  - good pages had less emphasized body text
  - good pages had more colors (on text)
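Backward elimination against adjusted R² can be sketched as a loop that keeps dropping the least useful predictor while the adjusted R² does not fall. The adjusted R² formula is standard; the loop is one reading of "until adjusted R² reduced", and the function names and the caller-supplied `score` callback (which would fit a regression on a metric subset and return its adjusted R²) are assumptions, not the study's procedure.

```python
def adjusted_r2(r2, n_samples, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)


def backward_eliminate(features, score):
    """Drop predictors one at a time while adjusted R^2 does not decrease.

    `score(subset)` is a caller-supplied function returning the adjusted R^2
    of a regression fit on that list of feature names.
    """
    current = list(features)
    best = score(current)
    while len(current) > 1:
        # Try removing each remaining feature and keep the best resulting model.
        candidates = [
            (score([f for f in current if f != drop]), drop) for drop in current
        ]
        candidate_score, drop = max(candidates)
        if candidate_score < best:
            break  # every removal reduces adjusted R^2, so stop
        best = candidate_score
        current.remove(drop)
    return current, best
```

Because adjusted R² penalizes each extra predictor, a metric that adds nothing to the fit is removed, while removing a metric that carries signal lowers the score and ends the loop.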
27 Small pages (66 words on average)
- Good small pages have (according to beta coefficients)
  - slightly more content
  - smaller page sizes
  - fewer graphics
  - more font variations
- This suggests good small pages
  - Have faster download times (corroborated by a download time metric)
  - Use different fonts for headers vs. the rest of the text
28 Medium pages (230 words on average)
- Good medium pages emphasize less of the body text
- Good medium pages appear to organize text into clusters (e.g., lists and shaded table areas)
- Good medium pages use colors to distinguish headers from body text
29 Large pages (827 words on average)
- Good large pages have
  - more headers
  - more links
- Good large pages are larger but have fewer graphics
  - probably attributable to style sheets
30 Future work
- Distinguish according to page role
  - Home page vs. content vs. index
- Better metrics
  - Separate info design, navigation design, graphic design
- Site level as well as page level
- Compare against results of live user studies
31 Future work
- Category-based profiles
  - Can use clustering to create profiles of good and poor sites for each category
  - These can be used to suggest alternative designs
- More information: CHI 2001 paper
32 More metrics
35 Ramifications
- It is remarkable that such simple metrics predict so well
  - Perhaps good design is good overall
  - There may be other factors
- A foundation for a new methodology
  - Empirical, bottom up
- But, there is no one path to good design!
36 Related Work
- Some tools report on easy-to-measure attributes
  - Compare number of links & graphics to thresholds
  - Stein (Rating Game), Theng & Marsden, Thimbleby (Gentler)
  - These are not empirically validated
- Accessibility compliance
  - CAST (Bobby), Scholtz & Laskowski
- Perceptually based heuristics
  - Faraday (Design Advisor)
37 Related Work
- Web log analysis
  - Traffic-based and time-based analysis
  - Drott, Etgen & Cantor, Fuller & deGraaff, Hochheiser & Shneiderman, Sullivan
- Simulators
  - Webcriteria (Max Site Profiler) makes predictions via a pre-defined path
  - Chi, Pirolli, Pitkow generate navigation paths from server logs
38 In Summary
- Automated Usability Assessment should help close the Web Usability Gap
- We can empirically distinguish between highly rated web pages and other pages
  - Empirical validation of design guidelines
  - Can build profiles of good vs. poor sites
- Are validating expert judgements with usability assessments via a user study
- Eventually want to build tools to help end-users assess their designs
39 More information
- http://webtango.berkeley.edu
- http://www.sims.berkeley.edu/hearst