Title: A Laboratory Evaluation of Six Electronic Voting Machines
1. A Laboratory Evaluation of Six Electronic Voting Machines
Fred Conrad, University of Michigan
2. Multi-institution, Multi-disciplinary Project
- University of Michigan: Frederick Conrad, Emilia Peytcheva, Michael Traugott
- University of Maryland: Paul Herrnson, Ben Bederson
- Georgetown University: Michael Hanmer
- University of Rochester: Richard Niemi
3. Agenda
- The problem
  - Usability can affect election outcomes!
- Method
  - Anything unique about what we did?
- Some results
  - Satisfaction
  - Performance
- Implications
4. Acknowledgements
- Wil Dijkstra, Ralph Franklin, Brian Lewis, Esther Park, Roma Sharma, Dale Vieriegge
- National Science Foundation
  - Grant IIS-0306698
- Survey Research Center
  - Institute for Social Research, University of Michigan
- Partners
  - Federal Election Commission (FEC), Maryland State Board of Elections, National Institute of Standards and Technology (NIST)
- Vendors
  - Diebold, Hart InterCivic, ESS, NEDAP, Avante
  - Note: Sequoia declined invitation to participate
5. Scope and limits of current work
- Today's talk presents a small-scale study that was designed to demonstrate potential challenges and inform future work
- It does not address system accuracy, affordability, accessibility, durability, or ballot design
- The voting systems tested were those available when the study was conducted; some machines may have been deployed with different options, and some may since have been updated
6. Voter intent and e-voting
- Hanging chads in Florida 2000 came to symbolize ambiguity about voter intent
- E-voting (e.g., touch screen user interfaces) can eliminate this kind of ambiguity
- With e-voting, no uncertainty about whether a vote is recorded
  - Though whether or not a voter pressed a button on a touch screen can be ambiguous
- E-voting may introduce usability problems that threaten the credibility of voting tallies
7. Usability ≠ Security
- Much of the e-voting controversy surrounds security
  - Are the systems vulnerable to systematic, widespread fraud?
- We propose that at least as serious a threat to the integrity of elections is usability
  - Are voters ever unable to enact their intentions because of how the user interface is designed?
  - Are they ever discouraged by the experience?
- Procuring e-voting systems may depend on usability, security, and cost, among other criteria
8. Usability is only one characteristic of overall performance
- Our focus on usability is not intended to suggest that other dimensions of system performance are not important
- We are simply focusing on usability
  - Other dimensions: Accuracy, Accessibility, Affordability, Durability, Security, Transportability
- We did not test with disabled users
9. Some Hypotheses
- Voters will make more errors
  - If they have limited computer experience
    - Unfamiliar with interface and input conventions: scroll bars, check boxes, focus of attention, keyboard
  - For some voting tasks than others
    - e.g., writing in votes, changing votes
- Voters will be less satisfied
  - The more effort required to vote
    - e.g., more actions like touching the touch screen
10. Current Project
- Examines usability of 6 e-voting systems
  - 5 commercial products (used in 2004)
  - 1 research prototype
- Field (n ≈ 1500) and laboratory (n = 42)
  - Breadth vs. depth
- Focus today on laboratory study
11. The machines
- Selected to represent specific features
- Vendors (with the exception of NEDAP) implemented ballots for best presentation
- Photos that follow were taken by our research group, not provided by vendors
12. Avante Vote Trakker
[Image removed to reduce file size; contact author for complete presentation]
13. Diebold AccuVote TS
[Image removed to reduce file size; contact author for complete presentation]
14. ESS Optical Scan
[Image removed to reduce file size; contact author for complete presentation]
15. Hart InterCivic eSlate
[Image removed to reduce file size; contact author for complete presentation]
16. NEDAP LibertyVote
[Image removed to reduce file size; contact author for complete presentation]
17. UMD Zoomable System (www.cs.umd.edu/bederson/voting)
[Image removed to reduce file size; contact author for complete presentation]
18. General approach (lab and field)
- Before voting, users indicate intentions by circling choices in each contest
  - In some contests, instructed how to vote
- All users asked to vote on all 6 machines
  - With one of two ballot designs
    - Office Block
    - Straight Party option
  - In 1 of 6 random orders (Latin Square; see the sketch below)
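As an illustration of the counterbalancing, here is a minimal sketch of how a 6-order Latin square could be constructed so that each machine appears in each serial position exactly once. The cyclic construction is illustrative; the study's actual square may have differed.

```python
# Minimal sketch: a cyclic Latin square for counterbalancing machine order.
# Each machine appears exactly once in each of the 6 serial positions.
# Illustrative only; the study's actual square may have differed.
machines = ["Avante", "Diebold", "ESS", "Hart", "NEDAP", "UMD"]

def latin_square(items):
    """Build a cyclic Latin square: row r, column c holds items[(r + c) % n]."""
    n = len(items)
    return [[items[(row + col) % n] for col in range(n)] for row in range(n)]

for i, order in enumerate(latin_square(machines), start=1):
    print(f"Order {i}: {' -> '.join(order)}")
```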
19. General approach (cont'd)
- Tasks
  - Change a vote
  - Write in a vote
  - Abstain (undervote) in one contest
  - Two contests required voting for 2 candidates
- Users complete a satisfaction questionnaire after each machine
20. Lab Study Design

Ballot Design    Computer Experience: Low    Computer Experience: High
Office Block     21                          9
Straight Party   10                          2

n = number of voters; "High" = uses a computer more than twice a week
21. Lab Study Design and Procedure
- 42 people recruited via newspaper ads
- 31 with limited computer experience
- 29 over 50 years old
22. Why did we oversample older users with little computer experience?
- Because e-voting systems must be usable by anyone who wants to vote
- If anyone is unable to enact their intentions because of the user interface, the technology is failing
- We wanted to focus, in our small sample, on those people most likely to have problems
23. More about users
- Visited lab in Ann Arbor, MI in July and August, 2004
- Paid $50 for 2 hours
- Previously voted in an election
  - 95% reported voting previously
  - 7% reported using touch screens when they voted
- Prior voting experience
  - Paper: 43%
  - Punch card: 69%
  - Lever machine: 48%
  - Dials and knobs: 19%
  - Touch screen: 7%
24. Design and Procedure (cont'd)
- All machines in a single large room
- 2 video cameras on rolling tripods
  - 1 per 3 machines
- Proprietary designs ruled out use of direct screen capture, e.g., a scan converter or Morae
25. Satisfaction Results
- Preview
  - Left-most bar (Diebold)
  - Right-most bar (Hart)
- Consistent with data from field study (n ≈ 1500)
- Provides face validity for lab results with small sample
26. "The voting system was easy to use"
27. "I felt comfortable using the system"
28. "Correcting my mistakes was easy"
29. "Casting a write-in vote was easy to do"
30. "Changing a vote was easy to do"
31. Why the differences in satisfaction?
- We believe the answer lies in the details of the interaction
- Thus, we focus on the subset of voters using these two machines
  - Office block ballot
  - Limited computer experience
  - n = 21
- Represents 20% of (what we project will be) ≈ 13,000 codable behaviors
32. Focus on subgroup of users

Ballot Design    Computer Experience: Low    Computer Experience: High
Office Block     21                          9
Straight Party   10                          2

n = number of voters
33. Coding the Video
[Image removed to reduce file size; contact author for complete presentation]
34. Coding the Video (2)
[Image removed to reduce file size; contact author for complete presentation]
35. Sequential analysis
- Goal is to identify and count event patterns
- Order is critical because each event provides context for the events that follow and precede it
  - E.g., trouble changing votes when the original vote must be deselected
  - How many times did voters press a new candidate without first deselecting?
  - How often did they do this before consulting Help?
  - How often did they do this after consulting Help?
- Tree analysis example (see the counting sketch below)
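To make the pattern counting concrete, here is a minimal sketch of how coded event sequences could be queried for the questions above. The event labels ("select_new", "deselect", "help") are hypothetical stand-ins for the project's actual coding scheme.

```python
# Hypothetical event labels standing in for the project's video-coding scheme.
def select_without_deselect(seq):
    """Count presses of a new candidate NOT immediately preceded by a deselect."""
    return sum(ev == "select_new" and (i == 0 or seq[i - 1] != "deselect")
               for i, ev in enumerate(seq))

def split_by_help(seq, event="select_new"):
    """Split counts of `event` into (before, after) the first Help request."""
    cut = seq.index("help") if "help" in seq else len(seq)
    return seq[:cut].count(event), seq[cut:].count(event)

events = ["select_new", "select_new", "help", "deselect", "select_new"]
print(select_without_deselect(events))  # 2 (the third press followed a deselect)
print(split_by_help(events))            # (2, 1)
```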
36. Number of Actions
- For every touch screen action there are two actions with the rotary wheel
  - Touch screen: press screen with finger
  - Rotary wheel: move wheel and press Enter
- Empirically, people take proportionally more actions
  - Diebold: 1.89 actions per task
  - Hart: 3.92 actions per task
37. Number of Actions
[Chart removed: actions per task for Getting started, Change vote, and Write-in]
38. Duration
- Voting duration (minutes) varied substantially by machine (illustrative computation below)
  - Diebold: 4.68 (sd = 1.27)
  - Hart: 10.56 (sd = 4.53)
- Presumably due to the larger number of actions in Hart than Diebold
- And possibly more thorough ballot review
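For readers replicating this kind of summary, a minimal sketch of the per-machine duration statistics; the duration lists below are made-up placeholder values, and only the means and SDs reported above come from the study.

```python
# Minimal sketch of a per-machine duration summary. The lists below are
# placeholder values, NOT study data; only the slide's means/SDs are real.
from statistics import mean, stdev

durations_min = {
    "Diebold": [3.5, 4.2, 5.1, 4.7, 5.9],
    "Hart": [7.0, 9.4, 12.1, 15.8, 8.5],
}
for machine, mins in durations_min.items():
    print(f"{machine}: mean = {mean(mins):.2f} min, sd = {stdev(mins):.2f}")
```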
39. Accuracy
- Varies by machine and voting task
- 2 Candidates (State Representative)
  - Inaccurate enough for concern
  - Errors of omission: voted for just one candidate
- Write-In (Member, Library Board)
  - Quite inaccurate for Hart
  - Errors of commission: name spelled wrong
  - Errors of omission: no write-in vote (in the end)
- Changing Vote (Probate Court Judge)
  - Overall accurate, but slightly less accurate for Diebold
  - Error of commission: unintended candidate remains selected
40. Voting Accuracy
[Chart removed: accuracy for the Change vote, Write-in, and 2 Candidates tasks]
41. Number of Actions: Getting Started
[Image removed to reduce file size; contact author for complete presentation]
- 8 actions minimally required to access the system: 4 selections and 4 Enter presses
42. Number of Actions: Getting Started
[Image removed to reduce file size; contact author for complete presentation]
- 2 actions required to access the system: insert access card and press Next
43. Access examples
- Hart
  - Voter is not able to select digits with the rotary wheel, attempts to press the (non-touch) screen, requests help
  - Help does not help
  - Voter figures it out
- Diebold
  - Voter slides access card into reader
  - Presses Next
44. Number of Actions: Vote Change
- Diebold requires deselecting the current vote in order to change it
  - Clicking on an already-checked check box
  - Likely to be opaque to non-computer users
  - Despite manufacturer-provided instructions
- On only 11/21 occasions did voters correctly deselect on the first try
- On 10/21 they touched the second candidate without first deselecting the original selection (see the sketch below)
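To show why pressing a second candidate can silently fail, here is a hedged sketch of a checkbox-style contest in which a new press is ignored while another candidate is still selected. This is our reading of the interaction described above, not vendor code.

```python
# Hedged sketch of the deselect-before-change interaction; not vendor code.
class Contest:
    def __init__(self, max_votes=1):
        self.max_votes = max_votes
        self.selected = set()

    def press(self, candidate):
        """Toggle a candidate; a press on a NEW candidate is ignored
        when the contest is already at its vote limit."""
        if candidate in self.selected:
            self.selected.remove(candidate)   # deselect
        elif len(self.selected) < self.max_votes:
            self.selected.add(candidate)      # select
        # else: the press silently does nothing -- the usability trap

contest = Contest()
contest.press("Candidate A")   # voter selects A
contest.press("Candidate B")   # ignored: A is still selected
assert contest.selected == {"Candidate A"}
contest.press("Candidate A")   # deselect first...
contest.press("Candidate B")   # ...then the change registers
assert contest.selected == {"Candidate B"}
```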
45. Number of Actions: Vote Change
- Changing votes is essential for correcting errors and expressing a change of heart
- Example of problem changing vote
  - Voter 27
46. Number of Actions: Write-in
- Write-in votes generally involve as many actions as letters in the name
  - Double this if navigation and selection are required (see the arithmetic below)
- Example of problems correcting write-in mistakes
  - Voter 38
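The action-count arithmetic is easy to make concrete; the name below is hypothetical.

```python
# Back-of-the-envelope action counts for a write-in; the name is hypothetical.
name = "JANE SMITH"
letters = len(name.replace(" ", ""))  # 9 letters
touch_actions = letters               # touch screen: one press per letter
wheel_actions = 2 * letters           # rotary wheel: navigate + Enter per letter
print(touch_actions, wheel_actions)   # 9 18
```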
47. Review
- Both machines offer similar ballot review
  - Displays voter's choices and highlights unselected contests
- In both cases, ballot review spans two pages
48. Review: Hart
[Image removed to reduce file size; contact author for complete presentation]
49. Review: Diebold
[Image removed to reduce file size; contact author for complete presentation]
50. How often do voters review their votes?
- On how many occasions did voters cast the ballot without reviewing all choices (i.e., without displaying the second review page)? (rates computed below)
  - Hart: 8/34
  - Diebold: 17/29
- Diebold review much briefer than Hart, suggesting cursory review
  - Hart: 55.5 seconds
  - Diebold: 9.8 seconds
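Turning the raw counts above into rates makes the contrast plainer; the computation adds nothing beyond the slide's numbers.

```python
# Convert the slide's raw counts into skip rates.
hart_skipped, hart_total = 8, 34
diebold_skipped, diebold_total = 17, 29
print(f"Hart:    {hart_skipped / hart_total:.0%} cast without full review")       # 24%
print(f"Diebold: {diebold_skipped / diebold_total:.0%} cast without full review")  # 59%
```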
51. Review Example 1
- Diebold
  - Voter seems to accidentally skip one contest, resulting in an undervote
  - Completes ballot and system displays review screen
  - She immediately presses Cast Ballot and says, "That one I felt confident in; didn't even need to go over it"
52. Review Example 2
- Hart
  - Voter seems to accidentally skip two contests, resulting in two undervotes
  - Completes ballot and system displays the first of two review screens
  - He selects the first undervote (in red text) and the system displays the relevant contest in the ballot
  - He selects the intended candidates, i.e., votes for the candidates circled in the voter info booklet, and the system displays the first review screen
  - He repeats for the second undervote
53. Review screens
- Some designs promote more review and correction of errors than others
  - Hart: review screens visually distinct from ballot screens, and if the voter presses Cast Vote after the first review screen, the system displays the second screen
  - Diebold: review screens hard to distinguish from ballot screens, and if the voter presses Cast Ballot without scrolling to see the lower part of the screen, the system casts the ballot
- More review and correction surely improves voting accuracy
  - But involves more work, which may lead to lower satisfaction
54. Summary
- User satisfaction and performance are related to particular features
  - Touch screen involves fewer actions and seemed more intuitive to these users than the wheel-plus-Enter sequence
  - Deselecting a choice in order to change it seemed counterintuitive to many voters and was responsible for at least one incident of casting an unintended vote
  - Review screens designed to promote review (distinct from ballot, hard to cast vote mid-review) led to more review and correction
55. Summary (cont'd)
- These users were more successful on some tasks with Hart and on others with Diebold
- Fit between features and tasks is a more appropriate level of analysis than the overall machine
56. Conclusions
- In a situation designed to maximize usability problems, the machines mostly fared well
- But they did exhibit some usability problems, and accuracy was not perfect
  - Both unintended votes and no votes
- A substantial proportion of voters did not review their ballots
- It seems likely that non-computer users will not recognize interface conventions
  - E.g., deselection and scrolling
- Even very low error rates -- for just computer novices -- can matter in very close elections
57. Conclusions (cont'd)
- We cannot compare voters' performance with new technology to older techniques
- But we will be able to use performance with the ESS (paper ballot, optical scan) as a rough baseline
- Certainly, voting systems are now being scrutinized in a way they were not before
58. Implications
- Most of these design problems can be addressed by applying usability engineering techniques
- But industry and election officials need to make this a priority
- EAC/NIST developing usability guidelines
- Unparalleled design challenge
  - Systems should be usable by all citizens all the time, even if used only once every few years
60. Additional Slides (if time permits)
- User Interface Can Affect Outcome
- Variance
- Bias
- Some usability measures
- Measures (cont'd)
61. User Interface Can Affect Outcome
- Ballot Design
  - Butterfly ballot
- Interaction
  - Casting ballot too soon
  - Changing votes
  - Writing in votes
  - Navigating between contests
  - Reviewing votes
- Frustration, Increased Cynicism
  - Abandonment
  - Lower turnout in future
  - Voters might question results
62. Variance
- Interface-related error is not systematic
  - All candidates should suffer equally from this (all else being equal)
  - E.g., if it is difficult to change votes, it doesn't matter which selections require change
- But it is unlikely that error for different candidates is exactly complementary
63. Bias
- Interface systematically prevents votes from being cast for a particular candidate
- Results either in no vote being cast or the voter choosing an unintended candidate
  - E.g., the butterfly ballot may have led Jewish voters who intended to vote for Al Gore to vote for Pat Buchanan
64. Some usability measures
- Satisfaction
- Accuracy
  - Do voters vote for whom they intend?
  - In lab, compare circled choices to observable screen actions (see the sketch below)
  - In field, compare circled choices to ballot images and audit trails
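A minimal sketch of the lab accuracy comparison: intended (circled) choices against observed on-screen selections. The contests and names are hypothetical examples of the two data sources.

```python
# Hypothetical example: circled intentions vs. observed on-screen selections.
intended = {"Governor": "Smith",
            "State Rep": {"Lee", "Chan"},
            "Library Board": "write-in: Doe"}
observed = {"Governor": "Smith",
            "State Rep": {"Lee"},            # omission: only one of two selected
            "Library Board": "write-in: Doe"}

errors = sorted(c for c in intended if intended[c] != observed.get(c))
accuracy = 1 - len(errors) / len(intended)
print(f"accuracy = {accuracy:.0%}; errors in: {errors}")  # 67%; ['State Rep']
```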
65. Measures (cont'd)
- Number of Actions
  - Presses and clicks
  - Substantive actions, e.g., requests for system help, revisions of earlier selections
- Duration
  - Per task
  - Overall