Title: Evaluation with Users
1. Evaluation with Users
- SEng 5115, University of Minnesota
- John Kruse
- Spring 2008
2. Evaluation with Users: The Gold Standard
- Big investment, big potential
- Big investment is debatable
- Potential isn't
- Many issues
- dealing with human subjects
- which users? which tasks?
- when in the process?
- what to measure?
- how to measure?
3. An early test
4. Why not just design?
- The SuperBook story
- Landauer, The Trouble with Computers, at Bellcore
- SuperBook was an online version of a manual thousands of pages long
- First try wasn't better than paper
- even though designed by knowledgeable experts
- 2nd was better than paper, 3rd lots better
5. Why not just interview?
- People don't know what they know
- People don't remember what they did
- What people say about their work is different from what they actually do
- Occasional mismatches between what people like and what works
- People like what works on the desktop
- But no correlation on the web
6. Why not just inspect?
- Heuristic evaluation (HE) is good, cheap
- Tends not to catch the domain-specific stuff
- Unless you have double experts (domain and usability experts)
- Tends not to have a task focus
- What is good about a Web site?
- Users' behavior is chaotic
- HE and testing complement each other
7. Early in the process
- Early is important
- Low investment
- Less inertia -> time to change
- Mock-ups and drawings are OK
- issues in how to handle user choice
- Partial prototypes when necessary
8. Early, on-the-fly prototyping
- Paper prototypes: testing & redesign
- Revise during test session
- Allow entire team to participate
- building prototypes
- watching users
9. Late in the process
- Can measure productivity, timing, etc.
- May require more elaborate prototype
- or actual code
- Post-release usability sessions are useful
- Observational or designed test
- Flexibility of development organization's response is limited
- Cost of fixing errors goes up
- If late testing is the only testing, is it worth it?
10. User testing as team building
- Prototypes provide a medium for people to work together
- User testing can be fun
- Even if users are abusing your finest work
- Make observers stay away (behind glass)
- Managers have seen teams crystallize around paper prototyping & user testing
11. What to measure (1 of 3)
- Feasibility of a product approach
- Utility & acceptance
- Microwave cakes
- Ease of initial use & learning
- Intuitiveness
- Need for manual
- Icon interpretation
- Problems, questions, reactions
- What users are thinking
12. What to measure (2 of 3)
- Ease of remembering
- Is it retained, or does it conflict with previous or interleaved learning experiences?
- Efficiency of use / productivity
- Mostly later, for usability measurement
- Limited applicability early in design
- Thinking-aloud-related interrupts interfere with timing
- But timing can be done early in some cases
- Parts of workflow besides software
13. What to measure (3 of 3)
- Affective reactions
- Do they like it?
- Which parts do they like?
- Measuring affect & choice can be tricky
- Observation during use is the fundamental method
- Forced choice and ratings can help
14. Affective reactions
- It may relate to self-evaluation & perceived competency more than aesthetics
- In task-oriented systems
- On the Web, marketing considerations
- If there are problems, then probe
- The person's reactions to similar products
- The background and experience of the person
- Their expectations about the technology
15. Home page visual impact
- 5-second test: show users the home page
- What does the owner do? What is the company's business?
- What attracted your attention? What did it mean, or how did it make you feel?
- What can you do on this page?
16. Wizard of Oz
- UI can be simple
- All you need to do is envision the use
- Smoke & mirrors are quite adequate
"3 books detected: War & Peace. HOLD for M. Smith"
17. Example: Concept feasibility & productivity
- Library workstation with RFID
- Study the utility of the device
- Study acceptability of the approach
- Is multiple-book-at-a-time check-in better than one book at a time?
- Wizard of Oz study (a minimal sketch of the wizard side follows)
- Visual Basic, monitor, speakers, cardboard box
- Marked books
- Experimenter
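The original mock-up was driven by Visual Basic behind a cardboard box; as a rough, hypothetical illustration of the wizard's side, here is a minimal Python sketch in which the hidden experimenter presses keys to fake the RFID reader's output on the participant-facing display (all messages and keys are invented):

```python
# Wizard of Oz sketch (hypothetical): the experimenter, hidden from the
# participant, triggers canned "RFID detection" messages by keypress.
CANNED_EVENTS = {
    "1": '1 book detected: "War & Peace"',
    "3": '3 books detected: "War & Peace" ... HOLD for M. Smith',
    "e": "Read error: please re-stack the books",
}

def wizard_console() -> None:
    """Experimenter side: each keypress drives the participant display."""
    print("Wizard keys:", ", ".join(CANNED_EVENTS), "(q to quit)")
    while True:
        key = input("wizard> ").strip().lower()
        if key == "q":
            break
        # In the real setup this would update a second, participant-facing
        # screen; here we just print what the participant would see.
        print("[participant display]", CANNED_EVENTS.get(key, "(no event)"))

if __name__ == "__main__":
    wizard_console()
```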
18. Library Wizard of Oz (cont)
- Participants thought it was very realistic
- Multiple was not better
- Handling exceptions is disruptive
- Discovered early & cheaply
- Continues to be a business goal
19. Productivity data
- Specific measurements
- Median/mean time for task
- Comparison of alternatives (see the sketch below)
- Focus on one type of test at a time
- Think-aloud can slow down completion times
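A minimal sketch of the kind of summary meant here, using only the standard library (the timing values are invented for illustration):

```python
# Summarize task-completion times for two design alternatives.
from statistics import mean, median

times_sec = {  # invented per-participant completion times, in seconds
    "one-at-a-time": [42.1, 38.7, 55.0, 47.3, 40.9],
    "multiple":      [51.6, 60.2, 49.8, 58.1, 66.4],
}

for condition, samples in times_sec.items():
    print(f"{condition:>13}: mean={mean(samples):.1f}s  "
          f"median={median(samples):.1f}s  n={len(samples)}")
```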
20. Experimental design
- Most user testing is not rigorous hypothesis-testing experimentation
- has too low an n
- lacks good control conditions
- Usually is formative evaluation
- Summative evaluation
- Usually experiments
- Work out the statistics involved
- Statistics cookbooks
21. Experimental design principles
- Counterbalancing (see the sketch below)
- Logically remove the possibility of competing explanations
- If you want to study system X vs. system Y
- Do NOT have all your participants do X, then Y
- Fatigue will decrease performance
- Practice will increase it
- You CAN'T really predict which will win
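A minimal counterbalancing sketch (participant IDs hypothetical): alternate which system each participant sees first, so fatigue and practice effects fall on X and Y equally often:

```python
# Counterbalance the order of conditions X and Y across participants.
participants = ["P1", "P2", "P3", "P4", "P5", "P6"]

for i, p in enumerate(participants):
    # Even-numbered participants see X first; odd-numbered see Y first.
    order = ("X", "Y") if i % 2 == 0 else ("Y", "X")
    print(f"{p}: {order[0]} first, then {order[1]}")
```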
22. Testing usability of icons (1)
- 17 icons for UI, 4 sets created
- Ease of learning
- Show icons, ask "what do you think?"
- Present task/description, have user pick from the entire set
- Since icons are not seen in isolation
- Present all names, all icons, have users match
23. Testing usability of icons (2)
- Efficiency (a timing sketch follows)
- Users who had learned the icons
- Given a name, then timed on Y/N discrimination
- Given a random set, asked to click on a specified icon
- Subjective satisfaction
- Rate each one: easy ... difficult
- Select the preferred one from 4 alternatives
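The timed yes/no discrimination could be harnessed in many ways; a minimal console sketch (icon names and trial count hypothetical, with text labels standing in for the real icon images):

```python
# Timed Y/N icon discrimination (sketch): show a target name and an
# "icon" (a text stand-in), and time the participant's same/different call.
import random
import time

ICONS = ["print", "save", "undo", "share"]  # stand-ins for the real set

def run_trial() -> tuple[bool, float]:
    target = random.choice(ICONS)
    shown = random.choice(ICONS)
    print(f"Target name: {target!r}   Icon shown: {shown!r}")
    start = time.perf_counter()
    said_same = input("Same? (y/n) ").strip().lower() == "y"
    elapsed = time.perf_counter() - start
    return said_same == (target == shown), elapsed

if __name__ == "__main__":
    for _ in range(3):
        correct, rt = run_trial()
        print(f"  correct={correct}  rt={rt:.2f}s")
```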
24. General user test guidelines
- Plan ahead of time
- what data to record
- what instructions to deliver
- what to do if the user falls off the prototype
- when to provide help, and what help
- Know your objectives
- but never lose sight of the user
25. General guidelines
- Do a pilot study
- Get professional help for big studies
- In general, it is better if developers & designers aren't present during testing
- too much bias
- subtle cues
- stay behind one-way glass
26. Documents for user testing
- Observer briefing
- Welcome for participants
- Introduction
- Informed consent
- Training materials
- Test task(s)
- Data collection sheet
- Data summary sheet
- Data analysis sheet
- Pre-test questionnaire
- Post-test questionnaire
27. Tasks
- Keep close to the real tasks
- May need to shorten some for time reasons
- Task selection heuristics
- Common tasks
- Areas of risk
- Safety, New approaches, Uncertainties
- Design the tasks
- Iteratively
- Get them reviewed
28. Test tasks (1)
- What you want them to accomplish
- Typically not how
- Example: Create a form letter using MS Word that produces thank-you letters to 3 recipients, each with an individualized:
- Salutation ("Dear Aunt Abigail")
- Name of gift (necktie)
- Attribute of gift (color)
29. Home health clinical test tasks (1)
- Set up the new patient Jinny's visit calendar for a visit frequency of 3x/week for 2 weeks
- Locate today's visit schedule: all patients to be seen today by Ruth, a nurse
- Initiate a visit note for Sam, an existing patient, from today's schedule
- ...and define the discipline, program, activities & type of services to be provided
30. User-generated tasks
- Users behave differently when they care about the task
- On the web, with eCommerce, lots of potential tasks
- Interview them; let them define tasks that they can do with a web site.
(Spool, User Interface Engineering)
31. Observers
- Better to be there
- Than hear/read about it afterwards
- Seeing video clips can be very persuasive
- Better to be few & unobtrusive
- No reactions to users' choices
- No talking
- Unless behind one-way mirror
A marketing manager at a session, standing right behind the user, saw a new design (two columns reversed) and said aloud, "Amy, you've got it all wrong!" Social situation: use process to tell this higher-ranking customer that it is not OK to talk, and why. Amy had good reasons for showing things in this order (counterbalancing), and for not biasing the user.
32. Users
- Real users, as much as possible
- If real users are scarce, try surrogates
- If 3M people can't use it, then maybe it's too hard
- Availability of users might influence the testing approach
- Recruiting is a non-trivial effort
- Money always helps
33. Welcome, orientation
- Welcome, description of project
- Description may be truncated
- Brief intro to usability testing concept
- Test is of software under real-world conditions
- You won't help them
- Unless necessary
- Explanations afterwards
- How long it will take
34. Participant (human subjects)
- Remind them that you are not testing them
- You are testing your own product
- But tell them you would rather not help them
- Informed, voluntary consent
- Understand that they can quit at any time
- Explain test in lay terms
- Privacy: anonymity, use of image/voice
35. Informed consent
- This is very important
- Participant is volunteer
- Can leave at any time
- Is video being collected?
- What will it be used for?
- Who will see it?
- If you want to record, get permission
36. Pre-test questionnaire
- About the user's background
- Experience with OS, software, etc.
- Experience with job
- How many hours per day do you use a computer?
- What is your educational level (degrees,
certificates, fields of study)?
37. Training materials
- What do they need to know?
- What is real-world?
- What will they get at work for training?
- Don't just look at the best or worst case
- Instruct about software conventions
- Task orientation
- Explain things very thoroughly
- Practice, demonstrate (?)
38. Thinking-aloud method (1)
- User is asked to think aloud
- Running commentary, like at a sports event
- What they are thinking, or looking for, or trying to do
- What they like, don't like, good, bad; anything is OK and helpful
- They should guess, ask questions
- You won't answer them
39. Thinking-aloud method (2)
- They should not let this interfere with their normal process
- Generally, don't explain decisions or make design suggestions until after; then it's welcome
- Have them practice it
- Or have the first task be reasonably easy
40. How to help users
- Not too soon
- Encourage them to try things out
- Be encouraging in general
- Tell them they can't make a mistake
- You will learn from everything they do or say
- When to help
- If they get stuck & stay stuck
- When they look upset
41. How to help users (cont)
- Don't give answers if at all possible
- Ask the user questions
- General at first
- Ones that will get them thinking about their conceptual model
- Then more specific (leading) questions
- Give them hints
- General at first, then more specific
- Is their conceptual model OK?
42. Making users comfortable
- Break after every task
- Recap, offer a drink or a break
- Answer users' questions if possible
- Don't let users start designing
- Until after they have completed their tasks
43. Pairs of participants: a thinking-aloud variant
- Thinking Aloud is difficult for people to do
- Users can work in pairs
- They talk to each other
- This is more comfortable for them
44. Test facilitator's role
- Flight attendant
- Responsibility for comfort & safety of subjects
- Prevent distress, embarrassment
- Scientist
- During
- Maintain objectivity
- Gather data
- Before & after
- Plan
- Reports
45. Data collection sheet
- For quick observation, without recording
- where in the program
- success, comments they made
- failure, kind of failure
46. Web usability testing: data
- Things happen fast
- Abbreviate, or
- Prepare checkoff sheets with likely actions
- Prepare page miniatures, to make notes on
47. Special materials: greeking test
- Or "mumble text"
- For layout
- On a web site
- Greek all the text, but keep graphics (a greeking sketch follows this slide)
- Evaluate alternative web page designs
- Does layout communicate function?
- Or, does it matter where items go?
- Ksdiudhk dkji
- Mm Mmmmm mmmm
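Greeking can be automated by replacing letters while preserving word length, capitalization pattern, and punctuation, so only the layout survives; a minimal sketch:

```python
# Greek page text for layout testing: keep word shapes, destroy meaning.
import random
import string

def greek(text: str) -> str:
    out = []
    for ch in text:
        if ch.islower():
            out.append(random.choice(string.ascii_lowercase))
        elif ch.isupper():
            out.append(random.choice(string.ascii_uppercase))
        else:
            out.append(ch)  # spaces, digits, punctuation stay put
    return "".join(out)

print(greek("Evaluate alternative web page designs."))
```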
48. Videotaping
- Usually too much work to go back over it
- Good for driving home your points
- Developers who weren't there
- Disagreement on interpretation
- Management
- Split screen useful
- Camera for screen, camera for face
49. Modern data capture
- Morae, from TechSmith
- Screen, cursor, clicks
- Face & voice
- Keystrokes
- Pages with timestamps
- Optional: concurrent observers' time-stamped events
(Start Ron video with Morae)
50. Post-test questionnaire
- Rating scales are good
- Then ask them to talk & elaborate
- How hard was it? (Extremely easy ... Moderately difficult ... Extremely difficult)
- Would you use it? (Never ... sometimes ... always)
- Did you understand the part where...?
- Did you like it, find it attractive, etc.?
- Anything missing from it?
51. Reporting the findings
- Say something positive
- Make recommendations to improve things
- For summative evaluation
- Common Industry Format for Usability Test Reports, Version 1.1, October 28, 1999
- Produced by the Industry Usability Reporting project, www.nist.gov/iusr
- For formative evaluation
- Write for your audience
52. Observing what didn't happen (1)
- Establish expectations for user behavior
- e.g., this link will be followed for this reason by this kind of user
- Note when it does not happen; explain
- Look for what didn't take place in the debriefing afterwards
- e.g., users in a study didn't look in online books' TOCs
53. Observing what didn't happen (2)
- Users at an antique fair did not use the "Community" link
- They used the search facility, found people's web sites
- Said "I didn't know she had a web site -- I know her"
- Look for behavior that doesn't make sense
54. Empirical studies of usability testing
- Usability test of a web site, by 9 teams
- All teams given the same objectives for the same interface
- Each team then conducted a study using their organization's standard procedures and techniques.
- Molich's comparative usability evaluation study
55. Results of the study of usability testing
- More than 300 problems found in total
- Most were "reasonable and in accordance with generally accepted advice on usable design."
- There wasn't a single problem that every team reported.
56. Tasks as the basis of usability testing
- 9 teams created 51 different tasks for the same UI
- Each task was well designed & valid
- but there was little agreement on which tasks were critical
- If each team used the same best practices, they should all have derived the same tasks from the test scenario
57. What to do to improve things
- Task design is important
- Agree on them
- Maybe goals are more important
- Better result reporting is needed
- The teams' reports differed widely
- Ranged from 5 pages to 52 pages
- Iterations are useful
- With intervening design changes
- Culture & attitude of continuous testing
58. Empirical results: How much is enough?
- For applications, rule of thumb: 8 is plenty
- ~80% of problems found with 4-5 users (see the arithmetic sketch below)
- For some web sites, 8 is not enough
- Task: purchase a CD online (general)
- Important new problems with each of 18 users
- 247 total obstacles-to-purchase, ~5 new per user
(Spool et al., User Interface Engineering, CHI 2001: "We conducted usability tests on an e-commerce web site using a very straightforward task: buying a CD from an online music store. We chose users who had a history of purchasing music online. We asked these users to make a shopping list of CDs they wanted to buy and gave them money to spend on these items.")
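The "4-5 users find about 80%" rule of thumb is usually traced to the Nielsen-Landauer problem-discovery model: if a single user hits a given problem with probability L (about 0.31 in their data), the share found by n users is 1 - (1 - L)^n. A quick check of the arithmetic, and of why Spool's data breaks the rule:

```python
# Nielsen-Landauer problem-discovery curve: expected share of problems
# found after n users, with per-user detection probability L ~= 0.31.
L = 0.31
for n in (1, 3, 5, 8, 18):
    print(f"n={n:2d}: {1 - (1 - L) ** n:.0%}")
# n=5 gives ~84%, the source of the rule of thumb. Spool's CD-shopping
# result (new problems with every one of 18 users) implies a much lower
# per-problem L, so the curve flattens far more slowly.
```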
59. Is 8 really enough?
- It depends
- For iterative UCD, 5-6 users is OK
- To find all usability problems requires a large n
- Sample of tasks matters: need good coverage
- Can you do repeated trials with the same user?
- Or do they learn all the workarounds?
60. How many users?
- One at a time, at 3M and Microsoft
- Then made changes
- Then ran another
- Achieved good results
- Some opportunities just make sense
- We've had 3 observers in the room
- With no obvious ill effects
- Except I needed to moderate
61. Other techniques: surveys
- QUIS: Questionnaire for User Interaction Satisfaction, Univ. of Maryland
- How long have you worked on this system?
- How many operating systems have you worked with?
- Overall reaction to the system
- Terrible ... Wonderful
- Frustrating ... Satisfying
- Characters, screen layouts, terminology
- Questions about all aspects of a system
62. Logging to study use
- User actions & performance (a tallying sketch follows this list)
- Page visits
- High-frequency search terms
- Search results, success, etc.
- High-frequency error messages
- Special Events
- Back button
- History / Bookmarks / Favorites
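Most of these tallies fall out of one pass over the log; a minimal sketch, assuming a hypothetical pre-parsed log of (page, search query) records:

```python
# Tally page visits and top search terms from a pre-parsed log (invented).
from collections import Counter

log = [
    ("/home", None), ("/search", "renew card"), ("/search", "hours"),
    ("/home", None), ("/search", "hours"), ("/help", None),
]

page_visits = Counter(page for page, _ in log)
search_terms = Counter(q for _, q in log if q)

print("Top pages:       ", page_visits.most_common(3))
print("Top search terms:", search_terms.most_common(3))
```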
63. Project assignment
- Take a few good questions
- Describe how you want the data to look
- Or at least what the comparisons will be
- To help you make business decisions
- Don't talk about how you would process it
- How would you summarize large numbers of episodes of use, with people taking different paths and dropping out at different points? (One answer is sketched below.)
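One common answer to the closing question is a step funnel: count how far each episode got and report the drop-off at each step; a minimal sketch with hypothetical step names and invented episodes:

```python
# Funnel summary: how many episodes of use reached each step.
FUNNEL = ["browse", "add_to_cart", "checkout", "purchase"]
episodes = [
    ["browse"],
    ["browse", "add_to_cart"],
    ["browse", "add_to_cart", "checkout"],
    ["browse", "add_to_cart", "checkout", "purchase"],
]

for step in FUNNEL:
    reached = sum(step in ep for ep in episodes)
    print(f"{step:12s} reached by {reached}/{len(episodes)} episodes")
```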
64. Framework for Logging Usability Data (FLUD)
- Design of a File Format for Logging Website Interaction
- National Institute of Standards and Technology Special Publication 500-248
- Web Metrics Testbed
- tools and techniques that support rapid, remote, and automated testing and evaluation of website usability: http://zing.ncsl.nist.gov/WebTools/
65. Opentracker.net
- Website statistics provide insight
- Make daily decisions based on customer behavior
- Traffic statistics are a form of direct feedback
- Generate marketing numbers, not guesswork
- Learn what customers do & adjust content to meet their needs
- Adjust strategies according to what works
- Identify non-effective strategies and drop them
- Profit from informed advertising and content-management decisions
66. Other data-gathering techniques
- Online/telephone consultants
- Online suggestion box
- Interviews, focus panels
- Eye movements
- Example from useit.com:
- "Using Eye Tracking to Compare Web Page Designs: A Case Study". Agnieszka Bojko, Journal of Usability Studies, Volume 1, Issue 3, May 2006, pp. 112-120.
67. Field experiments & observation
- Productivity experiments on book sorting at a library
- Act as though you had some device
- Do the device's work ahead of time
- Ask the user to do the newly-defined (partial) task
- Observational studies
- Where does the time go?
- What's it worth to automate a step?
68. Summary
- Get real (representative) users
- Orient them
- Testing the product, not them
- They can quit. OK to record?
- Talking aloud
- Tell them
- what you want them to accomplish
- not how to do it
- Let them do it
- Train only as much as is realistic
- Help only as necessary, asking & hinting at first
- Note what they do