Title: Usability Evaluation of Computer Assisted Survey Instruments
Usability Evaluation of Computer Assisted Survey Instruments
Mick P. Couper
Survey Research Center, University of Michigan
and Joint Program in Survey Methodology
Goals of This Presentation
- Argue that design is important in both interviewer-administered and self-administered surveys
- Show examples of design problems
- Introduce notion of human-computer interaction (HCI) research and usability
- Discuss application of usability or user-centered design to surveys
First, Some Terminology
- Computer assisted interviewing (CAI)
  - CATI
  - CAPI
  - CASI, audio-CASI
- Computerized self-administered questionnaires (CSAQs)
  - Disk-by-mail
  - E-mail
  - Web surveys
  - Touchtone data entry (TDE), interactive voice response (IVR), etc.
Design Implications of These Differences
- CATI and CAPI have different design implications
  - e.g., laptops for CAPI designed for portability: smaller screen size, reduced keyboard, no mouse
- CASI, audio-CASI occupy a middle position
  - self-administered, often by people with no prior computer experience
  - but interviewer present to train and assist
  - standardized interface
- CSAQs present the biggest design challenge
Interaction in Computer Assisted Interviewing (CAI)
- Interviewer as intermediary or agent
- Handling two interactions simultaneously
- Some training in use of computer and interacting with respondent
- Interface is the same across all interviews
Interaction in Computerized Self-Administered Questionnaires (CSAQs)
- No trained person present to assist
- Wide variation in skills, experience and motivation
- Respondent must infer meaning and task from instrument or questionnaire
- Interface often not standardized; depends on hardware/software used by respondent
(Why) Is Instrument Design Important?
- Traditional focus on question wording
- Computer assisted methods have added concerns about program correctness
- Instrument design often placed in the hands of programmers
- End-user (interviewer or respondent) may be largely ignored in the development of computerized instruments
Effect of Design: GVU Web User Surveys
Eighth Survey (1997)
Ninth Survey (1998)
Source: http://www.gvu.gatech.edu/user_surveys/
Effect of Design: Web Surveys
- Georgia Tech University's GVU eighth (1997) and ninth (1998) WWW user surveys
- Percent of responses in two categories:
  - age 5-10
  - "rather not say"
Effect of Design: Michigan Daily Survey
Short Box Version
Effect of Design: Michigan Daily Survey
Long Box Version
Michigan Daily Web Survey: Invalid Entries by Version
- Cases randomly assigned to treatment: n = 348 radio button, 242 long box, 71 short box
- Invalid entries defined as anything other than 0-10, DK and NA (see the sketch below)
- Percent of entries for Q64 which were invalid, by method
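A minimal sketch of that validity rule in Python (the entry strings and the DK/NA tokens here are illustrative assumptions, not the survey's actual processing code):

```python
# Sketch: classify open-entry responses as valid or invalid,
# using the rule above: valid = integers 0-10, "DK", or "NA".
def is_valid(entry: str) -> bool:
    token = entry.strip().upper()
    if token in ("DK", "NA"):          # don't know / no answer
        return True
    return token.isdigit() and 0 <= int(token) <= 10

entries = ["7", "ten", "DK", "11", "0", ""]
invalid = [e for e in entries if not is_valid(e)]
print(invalid)  # ['ten', '11', '']
```

The point of the comparison is that a radio button makes such invalid entries impossible by construction, while a text box leaves validity up to the respondent.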
Effect of Design: MSI Web Survey
Single Column Version
Effect of Design: MSI Web Survey
Double Column Version
MSI Web Survey of Michigan Students
- Sample randomly assigned to versions: n ≈ 1,300 in each group
- Significantly more items endorsed in top half of single column and in left of two columns
- Example: percent endorsing "To increase enjoyment of music or food"
  - appears on left of two columns, and lower half of single column
Effects of Design in Interviewer-Administered Surveys
- Likely to be more subtle than in self-administered surveys
  - trained and experienced interviewers often compensate for poor design
- Instrument can affect interviewer behavior (and thus data quality) in two main ways:
  - delivery of question
  - recording of response
Effect of Design: CAPI Survey
Source: National Health Interview Survey
Effect of Design: NHIS
- Additional names on follow-up screen, reached by pressing Sh-F6 to activate bottom screen, then PgDn to see balance of list
- Interviewers required to read all names
- In 18 interviews observed in usability laboratory, no interviewer successfully used this function
- Trace file analysis of over 16,000 field interviews revealed that in households with 4+ persons, the Sh-F6 function was used less than 6% of the time
New Version of NHIS Screen
Effect of Inconsistent Design in CAI
- Frazis and Stewart (1998) examined the following question in the Current Population Survey (CPS)
- The rest of the instrument uses 1 = yes and 2 = no
- Only those who earlier report less than a high school education are asked this question

E2. Did you ever get a High School diploma by completing High School OR through a GED or other equivalent?
  1) Yes, completed high school
  2) Yes, GED or other equivalent
  3) No
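To see the hazard, a hypothetical sketch: an interviewer habituated to the instrument-wide convention keys 2 intending "No", but on E2 that code records a GED. The dictionaries below are illustrative assumptions, not the actual CPS software:

```python
# Illustration only: instrument-wide convention vs. item E2's codes.
usual_codes = {1: "Yes", 2: "No"}                  # rest of the instrument
e2_codes = {1: "Yes, completed high school",
            2: "Yes, GED or other equivalent",
            3: "No"}

keyed = 2  # interviewer reflexively keys "No" as on every other item
print(usual_codes[keyed])  # -> No
print(e2_codes[keyed])     # -> Yes, GED or other equivalent (spurious GED)
```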
Effect of Inconsistent Design in CAI
- Of those asked this question, 12.2% selected option 2
- Using external data, Frazis and Stewart estimate that almost all of these responses are spurious
- Estimate of 4.8 million additional GEDs in U.S. from above question
- True population estimate is closer to 400,000
Usability
- Also known as human-computer interaction (HCI) research or user-centered design (UCD)
- Is more than aesthetics or user-friendliness
- Is grounded in theory, with contributions from cognitive and social psychology, communication, computer science, ethnography, sociology
- Based on empirical research
- Well-developed and growing literature to guide practice on general design
Usability for Computer Assisted Surveys
- No need to reinvent the wheel
- Many of the principles can be readily applied
- Procedures translate easily to CSAQ and CAI applications
- Research and testing methods are extensions of familiar techniques:
  - cognitive laboratory methods
  - field observation
  - experimentation
- Much of the investment already made
Good Design
- May not only reduce errors (improve data quality) directly, but may also:
  - reduce (initial and refresher) training costs
  - reduce interviewer frustration (→ lower turnover)
  - reduce respondent burden
  - reduce time of interview
Approaches to Evaluating Usability
- Expert methods or usability inspection methods
  - cognitive walkthroughs
  - heuristic evaluation
- End-user methods
  - usability testing
  - field observations
- Analysis of production data or paradata
  - keystroke files
  - section- and item-level timers (see the timing sketch below)
  - call-record data
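As one illustration of timer-based paradata, a minimal Python sketch deriving item-level durations from entry time stamps (the log format and values are assumptions, not an actual CAI timer file):

```python
# Sketch: derive item-level durations from (item, entry_time) stamps.
from datetime import datetime

# Hypothetical time-stamp log: item name and moment the item was entered.
log = [("Q1", "10:00:02"), ("Q2", "10:00:31"), ("Q3", "10:01:10")]

stamps = [(item, datetime.strptime(t, "%H:%M:%S")) for item, t in log]
for (item, start), (_, nxt) in zip(stamps, stamps[1:]):
    print(item, (nxt - start).total_seconds(), "seconds")
# Q1 29.0 seconds
# Q2 39.0 seconds
```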
Heuristic Evaluation
- Steps in heuristic evaluation:
  - develop set of usability heuristics
  - usability experts review instrument
  - rate problems according to predefined heuristics
  - produce unduplicated list of problems identified (organized as sketched below)
  - rank severity of problems
- Example of heuristics used by Sweet et al. (1996) for evaluation of web-based CSAQ:
  - speak the user's language; consistency; minimize the user's memory load; flexibility and efficiency of use, etc.
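A hypothetical way to represent the last two steps (the unduplicated, severity-ranked problem list); the fields and severity scale below are assumptions, not Sweet et al.'s protocol:

```python
# Sketch: unduplicated, severity-ranked list of heuristic-review findings.
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen -> hashable, so duplicates collapse in a set
class Finding:
    heuristic: str      # e.g., "consistency", "minimize memory load"
    description: str    # the problem, phrased identically by all reviewers
    severity: int       # 1 = cosmetic ... 4 = catastrophic

reports = [
    Finding("consistency", "Yes/No codes flipped on income item", 4),
    Finding("consistency", "Yes/No codes flipped on income item", 4),  # duplicate
    Finding("minimize memory load", "Codes not shown on follow-up screen", 2),
]
for f in sorted(set(reports), key=lambda f: -f.severity):
    print(f.severity, f.heuristic, "-", f.description)
```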
End-User Methods of Usability Evaluation
- Laboratory-based methods
  - formal experiments of alternative designs
  - observations of use of instrument
  - scenario-based approaches
- Field-based methods
  - observation in field
- User debriefings
  - feedback from interviewers and/or respondents
Usability Laboratory at SRC
- Extension of cognitive laboratory facilities and procedures
- Designed to facilitate study of CAI
- Focus on end-user evaluation in interviewer-administered surveys
- Examine both interviewer-computer interaction and interviewer-respondent interaction
- Supports both observational and experimental work
Layout of SRC Usability Laboratory
Usability Laboratory Tools
- Scan-converted video image of computer screen
- Video of interviewer's hands on keyboard
- Video of interviewer-respondent interaction
- Video editing and coding software
  - tag and code notable incidents
  - time markers to find incidents of a certain type
  - ability to edit and create a "greatest hits" video
Issues in Usability Research and Testing
- Expert versus novice users
- Number of subjects
- Scripts versus scenarios versus natural interaction
- Field versus laboratory
- Focus on instrument, interface or interaction
- When to test usability
- Reducing, analyzing qualitative data
- Use of performance measures, objective outcomes
Usability Evaluation of NHIS Cancer Supplement
- Real-time test of usability evaluation for an upcoming supplement to the NHIS
- Total of 4 weeks from beginning recruitment and setup to delivery of report
- 11 interviews observed in lab and analyzed on videotape
- These interviews revealed almost all of the major design problems subsequently identified in a 300-case field pretest
Problematic Item in Cancer Supplement
Use of Paradata to Identify Usability Issues
- Paradata:
  - auxiliary data about the process of data collection
  - produced as an automatic byproduct of the CAI and CSAQ process
- Keystroke files, trace files, transaction logs (see the sketch below)
- Time stamps
- Review of interviewer notes, problem reports, help-desk logs, etc.
- Item missing data, breakoffs, etc.
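A minimal sketch of the kind of trace-file analysis used in the next example (the record layout, action codes, and follow-up item name are assumed for illustration):

```python
# Sketch: count backups into each item from a hypothetical
# line-oriented trace file of "timestamp item action" records.
from collections import Counter

trace_lines = [
    "10:02:11 FSSI ENTER",
    "10:02:40 FSSI-FU ENTER",   # hypothetical follow-up item
    "10:02:55 FSSI BACKUP",     # interviewer backs up from the follow-up
    "10:03:02 FSSI ENTER",
]

backups = Counter(
    line.split()[1] for line in trace_lines if line.split()[2] == "BACKUP"
)
print(backups)  # Counter({'FSSI': 1})
```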
Example From NHIS Trace File Analysis
Trace Files on FSSI Item
- This question follows a series of 4 questions with the following format on various sources of income:
  - (1) Yes
  - (2) No
- Despite message to interviewers, still over 1,100 backups to this item
- 87% of backups to FSSI are from a follow-up question asked only of those who answered (2)
  - of these, 97% returned to FSSI and changed the answer from (2) to (3)
Experimental Work on CAI Usability
- Several small-scale experiments in usability lab to test alternative designs
- Variety of approaches used, for example:
  - novice users recruited as subjects
  - interviewers using scenarios
  - mock interviews with respondent following script
- Examples of studies:
  - item-by-item versus grid-based design of household roster
  - alternative approaches to multiple-response items
  - use of mouse versus keyboard for audio-CASI
Test of Item Versus Grid for Roster
- 12 interviewers used both versions, order randomized
- Grid version took 7% less time on average than item version
  - larger gain for persons other than the first
  - duration measured in seconds per eligible household member
- Results replicated in field using CATI by Fuchs (1999)
Test of Mouse Versus Keyboard for Audio-CASI
- 40 low-literacy women recruited from soup kitchens and homeless shelters to complete audio-CASI instrument
- Each subject completed the interview using each input mode, with order randomized
- Performance data favored keyboard:
  - 5.7 minutes for keyboard
  - 9.4 minutes for mouse
Respondent Preference for Input Mode
Possible Reason for Poor Performance of Mouse
Triangulation of Methods
- Recommend a variety of methods
- For example, trace file analysis confirmed what we observed in the laboratory
- Expert review identified various screen design issues for closer examination in laboratory
- Observations led to a set of experiments on alternative grid designs
- Prototyping and rapid evaluation of alternative designs to inform standards or guidelines
Implementing User-Centered Design
- 1. Acknowledge need for user-centered design
- 2. Make use of your experts
- 3. Involve the users
- 4. Allocate time and resources
- 5. Start early
- 6. Foster communication
- 7. Become familiar with HCI findings
- 8. Take a long-term view
1. Acknowledge the Need for UCD
- Need to overcome organizational resistance
- May be seen as one more (unnecessary) thing to worry about
- Viewed as adding time and money to development process
- Possible threat to role of programming staff
- "Things have always worked fine; why change now?"
- "Why should we care?"
2. Make Use of Experts
- Experts in interface design (human factors, HCI specialists)
- Experts in pretesting and evaluation (cognitive laboratory researchers, survey methodologists)
- Experts in use of the systems (interviewers, field staff)
3. Involve the Users
- Interviewers are one of the most important resources in an organization; make use of them
- Involve interviewers in design decisions
- Bring interviewers and respondents in to test and evaluate designs, and observe them in a natural setting
- Listen to interviewer and respondent comments about design issues
- Use paradata to evaluate how systems are being used
4. Allocate Time and Resources
- Not everybody can build a usability lab, hire usability experts and conduct extensive end-user testing
- But everyone can at least evaluate usability of systems during pretesting and development using less expensive methods
- Systematize the process
5. Start Early
- After interviewer training or during production is the wrong place to start
- Usability considerations should be part of all stages, from conceptualization and design to implementation
- Different activities can occur at different stages of the process:
  - guidelines to assist initial development
  - testing to evaluate near-final instruments
6. Foster Communication
- Between programmers/authors/designers and substantive experts
- Between authors/designers and users
- Establish feedback mechanisms and system for reporting design problems
- Listen to users, and act on their suggestions
- Programmers must be exposed to how their programs are used
7. Become Familiar with HCI Findings
- Train programming staff in user-centered design
- Read the literature
- Communicate with others, share experiences, report successes and failures
- Attend conferences on computer assisted survey methods
8. Take a Long-Term View
- Will not get it right the first time; learn from mistakes
- Software and hardware constantly evolving; anticipate new developments
- Evaluate, debrief, document for the future
- Develop guidelines, specifications
- Develop expertise and appreciation for UCD among all levels of staff
Summary
- Design is important!
  - both in self-administered and interviewer-administered surveys
  - increasingly so, given rapid proliferation of CAI and CSAQ methods and increasing complexity of survey instruments
- We need to develop ways to ...
  - evaluate design
  - improve design
  - implement good design
Thank you!