Title: Designing Speech Interfaces for Kiosks
1 Designing Speech Interfaces for Kiosks
Max Van Kleek, Buddhika Kottahachchi, Tyler Horton, Paul Cavallaro
2 AGENDA
- Background
- Motivation
- Design
- Current Implementation
- Demo (Video)
- Evaluation
- Conclusions & Future Work
4 Background - Smart Kiosk Information Navigation and Noteposting Interface (SKINNI)
Provide timely, relevant information to visitors and members of the CSAIL community through a touchscreen GUI
7 MOTIVATION
- Searching for specific information via touchscreen GUIs feels tedious and error-prone
  - more time-consuming than desirable
  - poor pointing accuracy
  - widgets behave differently on touchscreens
  - no tactile feedback
- Optimizing the GUI for touchscreens and adding shortcuts for searching/rapid information access yielded limited success
  - screen clutter
  - new vs. experienced users
  - forced users to use the attached keyboard
11 DESIGN - Speech Challenges
- Robustness
  - Speaker independence
  - Speech disfluencies and accents
  - Signal capture in noisy environments
  ...achieving good recognition accuracy
- Usability
  - Low threshold of use
  - Initial learning curve
  - Visibility of system state
  - Handling misrecognition errors gracefully
  - Managing user expectations
Related work
- ESPRIT MASK project (Gavin et al., 1996)
- Smart Kiosk project (Christian et al., 2000)
12 DESIGN - Galaxy
- Galaxy gives us...
  - Speaker independence
  - Handling of speech disfluencies/accents
- SpeechBuilder gives us...
  - Ease of speech domain definition/manipulation
- Distributed architecture lends itself well to kiosks
  - Thin clients dependent on more powerful servers
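The slide does not spell out how the distributed architecture routes requests. The following is a minimal Python sketch, assuming a hub-and-spoke layout in which the thin kiosk client hands every request to a central hub that forwards it to specialized servers. The `Hub` class, the dict-based frame format, and the `fake_recognizer`, `toy_parser`, and `directory_server` handlers are all hypothetical stand-ins, not Galaxy's actual API, and the phone extension is a placeholder.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

# Hypothetical frame format: a plain dict of key/value pairs.
Frame = Dict[str, object]
Handler = Callable[[Frame], Frame]


@dataclass
class Hub:
    """Toy central hub that forwards frames to named servers (not Galaxy's API)."""
    servers: Dict[str, Handler] = field(default_factory=dict)

    def register(self, name: str, handler: Handler) -> None:
        self.servers[name] = handler

    def dispatch(self, name: str, frame: Frame) -> Frame:
        # The thin client never calls a server directly; the hub decides where
        # the work runs, so heavy servers can live on more powerful machines.
        if name not in self.servers:
            raise KeyError(f"no server registered for {name!r}")
        return self.servers[name](frame)


KNOWN_PEOPLE = ["Ben Bitdiddle", "Alyssa P Hacker"]  # hypothetical directory


def fake_recognizer(frame: Frame) -> Frame:
    # Stand-in: a real recognizer would decode audio; here "audio" is already text.
    return {"hypothesis": frame["audio"]}


def toy_parser(frame: Frame) -> Frame:
    # Stand-in parser: return the first known name found in the hypothesis.
    text = str(frame["text"]).lower()
    person: Optional[str] = next(
        (p for p in KNOWN_PEOPLE if p.lower() in text), None)
    return {"person": person}


def directory_server(frame: Frame) -> Frame:
    phone_book = {"Ben Bitdiddle": "x3-1234"}  # placeholder, not real data
    return {"phone": phone_book.get(frame["person"], "unknown")}


def kiosk_query(hub: Hub, audio: str) -> Frame:
    """One thin-client round trip: recognize, parse, then look up."""
    hyp = hub.dispatch("recognizer", {"audio": audio})["hypothesis"]
    person = hub.dispatch("parser", {"text": hyp})["person"]
    return hub.dispatch("directory", {"person": person})


if __name__ == "__main__":
    hub = Hub()
    hub.register("recognizer", fake_recognizer)
    hub.register("parser", toy_parser)
    hub.register("directory", directory_server)
    print(kiosk_query(hub, "what is Ben Bitdiddle's phone number"))
```

Because the kiosk only ever talks to the hub, any server in this sketch could be moved to a different machine without changing the client code.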
13 IMPLEMENTATION - Architecture
14 IMPLEMENTATION - Speech Domain
- Constrained domain
  - Only directory field and map queries
- Iterative design
  - Initial domain extended through informal user survey
<opt speechbuilder="4.0">
<class type="Action" name="show_room">
  <entry>where is room thirty two two two six A</entry>
  <entry>can you please (show me | tell me) a map of where room thirty two two two six A</entry>
  <entry>can you please (show me | tell me) a map of where is Ben Bitdiddle office is</entry>
  <entry>Do you know where is Ben Bitdiddle office is</entry>
</class>
<class type="Key" name="Person">
  <entry>Hal Abelson</entry>
  <entry>Bryan Adams</entry>
  <entry>Edward Adelson</entry>
  ...
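To illustrate how a domain file of this shape separates actions (example sentences) from keys (slot values), here is a small Python sketch using the standard-library `xml.etree.ElementTree`. It is not how SpeechBuilder compiles a domain. It assumes that a parenthesized group such as `(show me | tell me)` lists alternatives separated by `|`, and it uses a shortened, well-formed stand-in for the excerpt; the `expand`, `load_domain`, and `spot_keys` helpers and the extra name "Alyssa P Hacker" are hypothetical.

```python
import itertools
import re
import xml.etree.ElementTree as ET

# Shortened, well-formed stand-in for the domain file (assumed "|" notation).
DOMAIN_XML = """
<opt speechbuilder="4.0">
  <class type="Action" name="show_room">
    <entry>where is room thirty two two two six A</entry>
    <entry>can you please (show me | tell me) a map of where is Ben Bitdiddle office is</entry>
  </class>
  <class type="Key" name="Person">
    <entry>Ben Bitdiddle</entry>
    <entry>Alyssa P Hacker</entry>
  </class>
</opt>
"""

ALT = re.compile(r"\(([^()]*)\)")


def expand(entry: str) -> list[str]:
    """Expand each '(a | b)' group into every alternative sentence."""
    # With one capture group, re.split alternates: literal, group, literal, ...
    parts = ALT.split(entry)
    choices = [
        [p] if i % 2 == 0 else [alt.strip() for alt in p.split("|")]
        for i, p in enumerate(parts)
    ]
    # Join each combination and normalize whitespace.
    return [" ".join("".join(combo).split()) for combo in itertools.product(*choices)]


def load_domain(xml_text: str) -> dict[tuple[str, str], list[str]]:
    """Map (class type, class name) to its expanded entries."""
    root = ET.fromstring(xml_text.strip())
    domain = {}
    for cls in root.iter("class"):
        entries = [s for e in cls.iter("entry") for s in expand(e.text or "")]
        domain[(cls.get("type"), cls.get("name"))] = entries
    return domain


def spot_keys(utterance: str, domain: dict[tuple[str, str], list[str]]) -> dict[str, str]:
    """Naive slot filling: first value of each Key class found in the utterance."""
    text = utterance.lower()
    found = {}
    for (ctype, cname), values in domain.items():
        if ctype != "Key":
            continue
        for value in values:
            if value.lower() in text:
                found[cname] = value
                break
    return found


if __name__ == "__main__":
    domain = load_domain(DOMAIN_XML)
    for sentence in domain[("Action", "show_room")]:
        print(sentence)
    print(spot_keys("do you know where Ben Bitdiddle office is", domain))
```

Listing a few names per Key class and a handful of sentences per Action, as in the excerpt, keeps the domain small; that fits the slide's choice of a constrained domain extended iteratively from a user survey.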
15 IMPLEMENTATION - Innovation
- Speech state feedback GUI
  - Provides immediate visual feedback of the system state
    - What was recognized?
    - Is the system ready for interaction?
    - Is the system busy?
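The three questions above can be read as states of a small state machine behind the feedback widget. The following is a minimal Python sketch under that assumption. The state names, the allowed transitions (including the fall-back from BUSY to READY on a failed recognition), and the display strings are hypothetical choices, not the kiosk's actual implementation.

```python
from enum import Enum, auto


class SpeechState(Enum):
    READY = auto()       # "Is the system ready for interaction?"
    BUSY = auto()        # "Is the system busy?" (recognizing or looking up)
    RECOGNIZED = auto()  # "What was recognized?" (show the hypothesis)


# Hypothetical legal transitions; BUSY -> READY covers a failed recognition.
TRANSITIONS = {
    SpeechState.READY: {SpeechState.BUSY},
    SpeechState.BUSY: {SpeechState.RECOGNIZED, SpeechState.READY},
    SpeechState.RECOGNIZED: {SpeechState.READY},
}


class FeedbackPanel:
    """Keeps the on-screen indicator in sync with the speech system's state."""

    def __init__(self) -> None:
        self.state = SpeechState.READY
        self.last_hypothesis = ""

    def set_state(self, new_state: SpeechState, hypothesis: str = "") -> str:
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state.name} -> {new_state.name}")
        self.state = new_state
        if new_state is SpeechState.RECOGNIZED:
            self.last_hypothesis = hypothesis
        return self.render()

    def render(self) -> str:
        # Text the widget would display for the current state.
        if self.state is SpeechState.READY:
            return "Ready: speak now"
        if self.state is SpeechState.BUSY:
            return "Busy: please wait"
        return f"Heard: {self.last_hypothesis!r}"


if __name__ == "__main__":
    panel = FeedbackPanel()
    print(panel.render())
    print(panel.set_state(SpeechState.BUSY))
    print(panel.set_state(SpeechState.RECOGNIZED, "where is room thirty two two two six A"))
    print(panel.set_state(SpeechState.READY))
```

Showing the recognized hypothesis, rather than only the final answer, is what lets a user tell a misrecognition apart from a missing directory entry.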
16 IMPLEMENTATION - Innovation
- Advantages
  - User is made aware of what the system is trying to do
  - Reasons for recognition failures can be determined
  - Initial familiarization process is much smoother
  - User retention increases
- Disadvantages
  - Not helpful for visually impaired users
  - Takes up display space
17 DEMO
18 EVALUATION - Methodology
- Informal user study
- 10 subjects (lab members, not representative)
- Task
  - Look up the phone number for 18 randomly selected lab members
    - First 6 using the speech interface
    - Second 6 using the touchscreen interface
    - Final 6 using the preferred interface
- Metric
  - Time taken
    - From when the name to be looked up is provided to the subject
    - To when the subject retrieves the number from the kiosk
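The task design and timing metric above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' procedure: `assign_blocks` draws 18 distinct names and splits them into three blocks of 6, and `timed_lookup` measures the interval the metric describes. The directory contents, the seeded random generator (used only for reproducibility), and the `give_name` / `wait_for_number` callbacks are hypothetical.

```python
import random
import time
from typing import Callable, Dict, List, Optional, Sequence


def assign_blocks(directory: Sequence[str], seed: Optional[int] = None,
                  per_block: int = 6) -> Dict[str, List[str]]:
    """Draw 3 * per_block distinct names and split them into three blocks."""
    rng = random.Random(seed)  # seeded only so a session can be reproduced
    names = rng.sample(list(directory), 3 * per_block)
    return {
        "speech": names[:per_block],
        "touchscreen": names[per_block:2 * per_block],
        "preferred": names[2 * per_block:],
    }


def timed_lookup(give_name: Callable[[], None],
                 wait_for_number: Callable[[], None]) -> float:
    """Seconds from handing the subject a name to the subject reporting the number."""
    start = time.perf_counter()
    give_name()        # experimenter provides the name to look up
    wait_for_number()  # returns once the subject reads back the phone number
    return time.perf_counter() - start


if __name__ == "__main__":
    lab = [f"member{i:02d}" for i in range(40)]  # hypothetical directory
    for interface, names in assign_blocks(lab, seed=0).items():
        print(interface, names)
```

Sampling all 18 names at once guarantees that no name repeats across the speech, touchscreen, and preferred blocks.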
19 EVALUATION - Results
- Subjects were not aware of supported query forms
  - Recognition rate in the first 2 queries: 50%
  - Thereafter: 72%
- 8/10 subjects preferred the speech interface
- When recognition was successful, performance was consistently better!
20 CONCLUSIONS
- Users are receptive to using speech interfaces
- Failed recognition imposes severe penalties on performance
- Ramp-up time can be reduced and user retention increased by providing appropriate feedback
21 FUTURE WORK
- Improve recognition rates
  - Improve speech domain
  - Update voice models (current ones are trained on phone data)
- Further evaluation
- Extend speech interface to support all functionality exposed via the touchscreen interface
- Conversation support
  - Dialog and discourse management
- Multi-language support
  - Stata Center visitors come from all over the world