Title: Voice as a User Interface: Tuning Speech
1. Voice as a User Interface: Tuning Speech
- Baiju D. Mandalia, PhD
- Senior Technical Staff Member, IBM Corporation
2. Voice as a User Interface
- Human factors standards
- Error handling
- Prompt Design
- Dialog Design
- Grammar Design
- Tuning voice applications
3. ETSI ES 202 077 for Spoken Commands
- Context-independent common commands (Main Menu, Operator, Goodbye, etc.)
- Context-dependent common commands (Help, Repeat)
- Core commands (Yes, No, Stop)
- Digits (including Zero, Oh, Double, etc.)
- Name and digit dialing (Home, Work, Mobile)
- Basic call handling (Answer, Divert all calls to, Transfer, etc.)
- Media control (Play, Pause, Continue, etc.)
- Browsable list navigation (Next, Continue, Details)
- Editing commands (Delete, Save, Record, etc.)
- Device settings (Volume up/Louder, Volume down/Quieter)
- Word-spotting mode (Wake-up, to activate the other modal functions in the list above)
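As a rough illustration of how an application might organize such a vocabulary, the sketch below maps a few of the example words above to their categories. The table layout and the `categorize` helper are assumptions made for this example; they are not taken from the ETSI standard or from any product.

```python
# Hypothetical category table. The example words come from the slide above;
# the structure itself is only an illustration.
COMMAND_CATEGORIES = {
    "context-independent": {"main menu", "operator", "goodbye"},
    "context-dependent": {"help", "repeat"},
    "core": {"yes", "no", "stop"},
    "media control": {"play", "pause", "continue"},
}


def categorize(utterance):
    """Return the category of a spoken command, or None if it is not in the table."""
    # Normalize case and collapse repeated whitespace before the lookup.
    phrase = " ".join(utterance.lower().split())
    for category, phrases in COMMAND_CATEGORIES.items():
        if phrase in phrases:
            return category
    return None
```

A dialog manager could use a lookup like this to decide whether a command is always active (context-independent) or should be interpreted relative to the current dialog state (context-dependent).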
4. Dialog Components for Handling Errors
5. Prompt Design
- Pre-recorded audio vs. text-to-speech
- Guiding the caller
- SSML
- Dictionaries
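To make the SSML bullet concrete, here is a minimal sketch that builds a text-to-speech prompt with Python's standard `xml.etree.ElementTree`. The function name `build_prompt`, the prompt wording, and the pause length are assumptions for illustration. The `interpret-as="characters"` value is also an assumption: the set of supported `say-as` values depends on the synthesizer in use.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_prompt(lead_text, code, pause_ms=300):
    """Build an SSML prompt that reads `code` one character at a time."""
    speak = ET.Element("speak", {
        "version": "1.0",
        "xmlns": SSML_NS,
        "xml:lang": "en-US",
    })
    speak.text = lead_text + " "
    # Ask the synthesizer to spell the code out (support varies by engine).
    say_as = ET.SubElement(speak, "say-as", {"interpret-as": "characters"})
    say_as.text = code
    # A short pause gives the caller time to absorb the code.
    pause = ET.SubElement(speak, "break", {"time": f"{pause_ms}ms"})
    pause.tail = " Say yes to confirm."
    return ET.tostring(speak, encoding="unicode")


prompt = build_prompt("Your confirmation number is", "4071")
```

Generating the markup programmatically keeps the spoken wording in one place, which makes it easier to keep text-to-speech prompts consistent with any pre-recorded audio.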
6. Dialog Design
- Consistency
  - Make sure the caller experience is uniform throughout the call
- Barge-in
  - Design the dialog so that experienced users can interact more efficiently by barging in over prompts
- Using N-best
  - Exploit the N-best list to disambiguate during complex tasks
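One way N-best results can drive disambiguation is sketched below. The `Hypothesis` type, the `choose_action` function, and the threshold values are all assumptions chosen for illustration; a real recognizer returns its own result structure and needs its own tuned thresholds.

```python
from typing import NamedTuple


class Hypothesis(NamedTuple):
    text: str
    confidence: float  # assumed to be on a 0.0 to 1.0 scale


def choose_action(nbest, accept=0.80, margin=0.15):
    """Accept the top hypothesis, or ask the caller to choose among close rivals."""
    if not nbest:
        return ("reprompt", [])
    ranked = sorted(nbest, key=lambda h: h.confidence, reverse=True)
    top = ranked[0]
    # Rivals are hypotheses scoring within `margin` of the top one.
    rivals = [h for h in ranked[1:] if top.confidence - h.confidence < margin]
    if top.confidence >= accept and not rivals:
        return ("accept", [top.text])
    # A one-item list here amounts to a simple yes/no confirmation.
    return ("disambiguate", [top.text] + [h.text for h in rivals])
```

With this shape, a prompt such as "Did you say Austin or Boston?" can be built from the returned list instead of forcing the caller to repeat the whole utterance.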
7. Grammar Design
- Unknown pronunciations
- Acoustic Confusability
- Grammar Coverage
- Complexity
- Dynamic Application Development
- Weighting of more common words
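The weighting bullet can be illustrated with a small generator for a W3C SRGS XML grammar, where each `item` inside a `one-of` may carry a `weight` attribute. The function name, rule name, phrases, and weight values below are assumptions for this sketch; how strongly a recognizer honors weights depends on the engine.

```python
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"


def build_weighted_grammar(rule_id, weighted_phrases):
    """Build an SRGS XML grammar with one weighted alternative per phrase."""
    grammar = ET.Element("grammar", {
        "xmlns": SRGS_NS,
        "version": "1.0",
        "xml:lang": "en-US",
        "mode": "voice",
        "root": rule_id,
    })
    rule = ET.SubElement(grammar, "rule", {"id": rule_id, "scope": "public"})
    one_of = ET.SubElement(rule, "one-of")
    for phrase, weight in weighted_phrases:
        # A higher weight marks the alternative as more likely to be spoken.
        item = ET.SubElement(one_of, "item", {"weight": str(weight)})
        item.text = phrase
    return ET.tostring(grammar, encoding="unicode")


# Illustrative weights only: the common request gets the larger value.
srgs = build_weighted_grammar("request", [("check balance", 5.0), ("close account", 1.0)])
```

Building the grammar at run time like this also fits the dynamic application development bullet, since the phrase list and weights could come from usage statistics.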
8. Tuning Voice Applications
- Timeouts
- Lexicons
- Weighting
- Confidence levels
- Speed vs Accuracy
- Sensitivity
- Acoustic model adaptation
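Choosing a confidence level is a trade-off: a higher threshold rejects more correct results, while a lower one accepts more wrong ones. The sketch below sweeps candidate thresholds over labelled test results to show that trade-off. The function name and the sample data are made up for illustration and do not come from any tool.

```python
def sweep_thresholds(results, thresholds):
    """Count errors at each confidence threshold.

    `results` is a list of (confidence, was_correct) pairs from a labelled test set.
    Returns a list of (threshold, false_accepts, false_rejects) tuples.
    """
    rows = []
    for t in thresholds:
        # Wrong results at or above the threshold would be accepted.
        false_accepts = sum(1 for conf, ok in results if conf >= t and not ok)
        # Correct results below the threshold would be rejected.
        false_rejects = sum(1 for conf, ok in results if conf < t and ok)
        rows.append((t, false_accepts, false_rejects))
    return rows


# Made-up sample data for illustration.
sample = [(0.95, True), (0.80, True), (0.55, True), (0.60, False), (0.30, False)]
table = sweep_thresholds(sample, [0.5, 0.7, 0.9])
```

Reading down such a table makes it easier to pick the threshold where the cost of false accepts and false rejects balances for a given application.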
9. WebSphere Voice Server Tuning Tools
10. Tuning Tools Features
- Eclipse-based
- User-friendly graphical interface
- Tightly integrated for repetitive testing
- Assistive tools for tuning, such as the pronunciation builder
11. Validating Grammars on the MRCP Server
12. Enumerating a Grammar (Random)
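Random enumeration means generating sample sentences that a grammar accepts, which helps spot unintended phrases. The sketch below shows the idea on a toy grammar held in a Python dictionary. The grammar contents, the `$rule` reference convention, and the function name are assumptions for this example and do not reflect how the tuning tool itself is implemented.

```python
import random

# Toy grammar: each rule maps to a list of alternatives, and each alternative
# is a list of tokens. A token starting with "$" refers to another rule.
GRAMMAR = {
    "command": [["$action", "$target"], ["help"]],
    "action": [["call"], ["transfer", "to"]],
    "target": [["home"], ["work"], ["operator"]],
}


def random_sentence(grammar, rule, rng):
    """Expand `rule` by choosing one alternative at random, recursively."""
    words = []
    for token in rng.choice(grammar[rule]):
        if token.startswith("$"):
            words.extend(random_sentence(grammar, token[1:], rng))
        else:
            words.append(token)
    return words


rng = random.Random(7)  # fixed seed so a run can be repeated
samples = [" ".join(random_sentence(GRAMMAR, "command", rng)) for _ in range(5)]
```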
13. Testing Grammars with Text
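Testing with text means checking whether a typed phrase is covered by a grammar before any audio is involved. A minimal sketch of that check on a toy grammar follows. The grammar, the `$rule` reference convention, and the function names are assumptions for illustration; the matcher is a simple recursive one and does not handle left-recursive rules.

```python
# Toy grammar: a token starting with "$" refers to another rule.
GRAMMAR = {
    "command": [["$action", "$target"], ["help"]],
    "action": [["call"], ["transfer", "to"]],
    "target": [["home"], ["work"], ["operator"]],
}


def end_positions(grammar, rule, tokens, start):
    """Return every index where a match of `rule` beginning at `start` can end."""
    ends = set()
    for alternative in grammar[rule]:
        positions = {start}
        for symbol in alternative:
            next_positions = set()
            for pos in positions:
                if symbol.startswith("$"):
                    next_positions |= end_positions(grammar, symbol[1:], tokens, pos)
                elif pos < len(tokens) and tokens[pos] == symbol:
                    next_positions.add(pos + 1)
            positions = next_positions
        ends |= positions
    return ends


def in_grammar(grammar, root, text):
    """True if the whole phrase is accepted by the grammar's root rule."""
    tokens = text.lower().split()
    return len(tokens) in end_positions(grammar, root, tokens, 0)
```

Running a list of expected in-grammar and out-of-grammar phrases through a check like this is a quick way to catch coverage gaps.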
14. Pronunciation Builder
15. Testing Grammars with Speech
16. Voice Trace Analyzer for Tuning
- Set the Voice Server trace specification.
- Run your voice application (generate trace data).
- Run the WVS Collector tool (optional for the Integrated Runtime Environment).
- Import the data into the Voice Trace Analyzer.
17. Voice Trace Analyzer Views
18. Transcriptions within the Tool
- In Grammar
- Accuracy
- Correct Accept (CA)
- Correct Reject (CR)
- False Accept (FA)
  - FA-In
  - FA-Out
- False Reject (FR)
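The categories above can be computed from transcribed utterances. The sketch below uses the definitions commonly applied in speech tuning, which are an assumption here since the slide does not spell them out: CA is an in-grammar utterance accepted with the right result, FA-In is an in-grammar utterance accepted with the wrong result, FA-Out is an out-of-grammar utterance that was accepted, CR is an out-of-grammar utterance that was rejected, and FR is an in-grammar utterance that was rejected. The record fields and function names are hypothetical.

```python
from collections import Counter


def classify(transcript, in_grammar, accepted, recognized=None):
    """Label one utterance as CA, CR, FA-In, FA-Out, or FR."""
    if accepted:
        if not in_grammar:
            return "FA-Out"  # should have been rejected
        return "CA" if recognized == transcript else "FA-In"
    return "FR" if in_grammar else "CR"


def tally(records):
    """Count how many utterances fall into each category."""
    return Counter(classify(**record) for record in records)


# Made-up records for illustration.
records = [
    dict(transcript="call home", in_grammar=True, accepted=True, recognized="call home"),
    dict(transcript="call home", in_grammar=True, accepted=True, recognized="call work"),
    dict(transcript="um hello", in_grammar=False, accepted=True, recognized="help"),
    dict(transcript="um hello", in_grammar=False, accepted=False),
    dict(transcript="call work", in_grammar=True, accepted=False),
]
counts = tally(records)
```

Tracking these counts before and after a grammar or threshold change shows whether a tuning step actually helped.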