Title: Designing Speech Interfaces for Kiosks
1 Designing Speech Interfaces for Kiosks
Max Van Kleek, Buddhika Kottahachchi, Tyler Horton, Paul Cavallaro
2 AGENDA
- Background
- Motivation
- Design
- Current Implementation
- Demo (Video)
- Evaluation
- Conclusions and Future Work
3 Background - OK-Net (Oxygen Kiosk Network)
4 Background - Smart Kiosk Information Navigation and Noteposting Interface (SKINNI)
Provide timely, relevant information to visitors and members of the CSAIL community through a touchscreen GUI
7 MOTIVATION
- Searching for specific information via touchscreen GUIs feels tedious and error-prone
  - more time-consuming than desirable
  - poor pointing accuracy
  - widgets behave differently on touchscreens
  - no tactile feedback
- Optimizing the GUI for touchscreens, and adding shortcuts to allow searching/rapid information access, yielded limited success
  - screen clutter
  - new vs. experienced users
  - forced user to use attached keyboard
8 MOTIVATION
Searching for Howard Shrobe:
- Touch Directory Pane
- (Scan list of names, realize they are alphabetical by last name)
- Touch scrollbar Down arrow
- Touch scrollbar Down arrow
- Attempt to drag scroll box downward (fails)
- Touch S shortcut at top of screen
- (Scan list of names)
- Touch scrollbar Down arrow
- Touch scrollbar Down arrow
- Touch scrollbar Down arrow
- Touch scrollbar Down arrow
- Touch scrollbar Down arrow
- Touch row corresponding to Howard Shrobe
9 MOTIVATION
Searching for Howard Shrobe:
- Touch Directory Pane
- (Scan list of names, realize they are alphabetical by last name)
- Touch text field corresponding to Last name
- (Move hand / glance from screen to keyboard)
- Type S, h, r, o, b, e
- Touch row corresponding to Howard Shrobe
Much shorter, but much less frequently used: it is awkward, since eyes and hands keep switching between screen and keyboard
10 Kiosk, Kiosk on the wall... What's the best interface of them all?
11 DESIGN - Speech Challenges
- Robustness
  - Speaker independence
  - Speech disfluencies and accents
  - Signal capture in noisy environments
  ...achieving good recognition accuracy.
- Usability
  - Low threshold of use
  - Initial learning curve
  - Visibility of system state
  - Handling misrecognition errors gracefully
  - Managing user expectations
Related work:
- ESPRIT MASK project, Gavin et al. (1996)
- Smart Kiosk project, Christian et al. (2000)
12 DESIGN - Galaxy
- Galaxy gives us...
  - Speaker independence
  - Handling of speech disfluencies/accents
- SpeechBuilder gives us...
  - Ease of speech domain definition/manipulation
- Distributed architecture lends itself well to kiosks
  - Thin clients dependent on more powerful servers
13 IMPLEMENTATION - Architecture
14 IMPLEMENTATION - Speech Domain
- Constrained domain
  - Only directory field and map queries
- Iterative design
  - Initial domain extended through informal user survey
<opt speechbuilder="4.0">
<class type="Action" name="show_room">
  <entry>where is room thirty two two two six A</entry>
  <entry>can you please (show me | tell me) a map of where room thirty two two two six A</entry>
  <entry>can you please (show me | tell me) a map of where is Ben Bitdiddle office is</entry>
  <entry>Do you know where is Ben Bitdiddle office is</entry>
</class>
<class type="Key" name="Person">
  <entry>Hal Abelson</entry>
  <entry>Bryan Adams</entry>
  <entry>Edward Adelson</entry>
  . .
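To make the domain definition on this slide concrete, here is a minimal Python sketch of how such a file could be read and its example sentences expanded. It is not SpeechBuilder's own loader. It assumes that a parenthesised group such as "(show me | tell me)" lists alternatives separated by "|" (the slide text does not confirm the separator), and the names DOMAIN_XML, expand, and load_domain are hypothetical. The embedded XML is a shortened, closed-off miniature of the slide's excerpt.

```python
import itertools
import re
import xml.etree.ElementTree as ET

# Hypothetical, well-formed miniature of the slide's domain excerpt.
DOMAIN_XML = """
<opt speechbuilder="4.0">
  <class type="Action" name="show_room">
    <entry>where is room thirty two two two six A</entry>
    <entry>can you please (show me | tell me) a map of room thirty two two two six A</entry>
  </class>
  <class type="Key" name="Person">
    <entry>Hal Abelson</entry>
    <entry>Bryan Adams</entry>
  </class>
</opt>
"""

# Matches one non-nested "(a | b)" group; the "|" separator is an assumption.
GROUP = re.compile(r"\(([^()]*)\)")


def expand(template):
    """Return every utterance obtained by picking one option per group."""
    parts = GROUP.split(template)  # odd indices hold the group contents
    choices = [
        [opt.strip() for opt in part.split("|")] if i % 2 else [part.strip()]
        for i, part in enumerate(parts)
    ]
    return [
        " ".join(word for word in combo if word)
        for combo in itertools.product(*choices)
    ]


def load_domain(xml_text):
    """Map (class type, class name) to the expanded entries of that class."""
    root = ET.fromstring(xml_text.strip())
    return {
        (cls.get("type"), cls.get("name")): [
            utterance
            for entry in cls.findall("entry")
            for utterance in expand(entry.text)
        ]
        for cls in root.findall("class")
    }
```

Calling load_domain(DOMAIN_XML) yields the "Action" sentences with each alternative spelled out, plus the list of "Key" values (here, person names) that can fill a query.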
15 IMPLEMENTATION - Innovation
- Speech state feedback GUI
  - Provides immediate visual feedback of the system state
    - What was recognized?
    - Is the system ready for interaction?
    - Is the system busy?
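The three questions on this slide map naturally onto a small state model. The Python sketch below is only an illustration of that idea, not the kiosk's actual code: the class names, event methods, and banner wording are all hypothetical, and no real GUI toolkit is involved.

```python
from enum import Enum


class SpeechState(Enum):
    """States the feedback area can show (banner wording is hypothetical)."""
    READY = "Ready: please speak"
    BUSY = "Busy: processing..."
    RECOGNIZED = "Heard"


class FeedbackPanel:
    """Tracks the banner text a feedback GUI would display (no real widgets)."""

    def __init__(self):
        self.state = SpeechState.READY
        self.last_hypothesis = None

    def on_speech_start(self):
        # The recognizer has begun working on an utterance.
        self.state = SpeechState.BUSY

    def on_result(self, hypothesis):
        # Show the user what the recognizer believes was said.
        self.last_hypothesis = hypothesis
        self.state = SpeechState.RECOGNIZED

    def on_reset(self):
        # The system is ready for the next interaction.
        self.state = SpeechState.READY

    def banner(self):
        if self.state is SpeechState.RECOGNIZED:
            return f'{self.state.value}: "{self.last_hypothesis}"'
        return self.state.value
```

Showing the recognized text lets a user see why a query failed, and the ready/busy banner tells them when to speak.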
16 IMPLEMENTATION - Innovation
- Advantages
  - User is made aware of what the system is trying to do
  - Reasons for recognition failures can be determined
  - Initial familiarization process is much smoother
  - User retention increases
- Disadvantages
  - Isn't helpful for visually impaired users
  - Takes up display space
17 DEMO
18 EVALUATION - Methodology
- Informal user study
- 10 subjects (lab members, not representative)
- Task
  - Look up the phone number for 18 randomly selected lab members
    - First 6 using the speech interface
    - Second 6 using the touchscreen interface
    - Final 6 using the preferred interface
- Metric
  - Time taken
    - From when the name to be looked up is provided to the subject
    - To when the subject retrieves the number from the kiosk
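The time-taken metric can be written down as a short Python sketch. This is an illustration of the definition on this slide, not the authors' analysis script: the function names and the trial-tuple layout are assumptions, and timestamps are taken to be seconds on a common clock.

```python
from statistics import mean


def task_time(name_given_at, number_retrieved_at):
    """Seconds from the name being given to the number being retrieved."""
    if number_retrieved_at < name_given_at:
        raise ValueError("retrieval cannot precede the prompt")
    return number_retrieved_at - name_given_at


def mean_time_by_interface(trials):
    """Average task time per interface.

    trials: iterable of (interface, name_given_at, number_retrieved_at);
    this tuple layout is hypothetical.
    """
    times = {}
    for interface, start, end in trials:
        times.setdefault(interface, []).append(task_time(start, end))
    return {interface: mean(values) for interface, values in times.items()}
```

For example, with placeholder timestamps that are not data from the study, mean_time_by_interface([("speech", 0.0, 10.0), ("touch", 0.0, 30.0)]) gives one mean per interface, which can then be compared across the speech and touchscreen conditions.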
19 EVALUATION - Results
- Subjects were not aware of supported query forms
  - recognition rate in the first 2 queries: 50%
  - thereafter: 72%
- 8/10 subjects preferred the speech interface
- When recognition was successful, performance was consistently better!
20 CONCLUSIONS
- Users are receptive to using speech interfaces
- Failed recognition imposes severe penalties on performance
- Ramp-up time can be reduced and user retention increased by providing appropriate feedback
21 FUTURE WORK
- Improve recognition rates
  - Improve speech domain
  - Update voice models (current ones from phone data)
- Further evaluation
- Extend speech interface to support all functionality exposed via touchscreen interface
- Conversation support
  - dialog and discourse management
- Multi-language support
  - Stata visitors come from all over the world