Title: Understanding and Optimizing Speech Recognition
1Understanding and Optimizing Speech Recognition
- Vocera Customer Webinar Series
- July 29, 2009
2Objectives
- Understand how the Nuance engine works
- Measure the performance of the Nuance engine
- Analyze reports from the Vocera Report Server
- Maximize speech recognition performance with best
practices
3Speakers
- Kathy Brown
- Vocera, Inc.
- Director of Professional Services
- Steve Blair
- Vocera, Inc.
- Director of Engineering
4Measuring Speech Recognition
5Some Speech Rec Questions
- What is good speech recognition?
- How do you know you have it?
- What is bad speech recognition?
- What can you do about it?
6Measuring Speech Recognition
- Two methodologies
- Supervised
- Unsupervised
- Unsupervised: automated reporting
- Reports generated automatically from call log data
- VRS Reports
- Supervised: manual reporting
- Manually transcribe what the user actually said
- Compare transcribed utterance to call log data and grammars
Transcription: written text corresponding to the actual spoken utterance
7Typical VRS Speech Rec Summary
         Count    Percent
        27,243     69.10%
         8,388     21.30%
         3,036      7.70%
           729      1.80%
            19      0.00%
Total   39,415       100%
Overall speech recognition statistics
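As a quick check on the summary figures, the percentages can be recomputed from the counts. This is only a sketch: the category labels for each row are not available in this transcript, so the rows are identified by position, and rounding to one decimal place is an assumption about how the report was formatted.

```python
# Counts copied from the summary; row labels are not available here,
# so each row is identified only by its position in the report.
counts = [27243, 8388, 3036, 729, 19]

total = sum(counts)

# Assumption: the report rounds each share to one decimal place.
percentages = [round(100 * c / total, 1) for c in counts]

for count, pct in zip(counts, percentages):
    print(f"{count:>7,}  {pct:5.1f}%")
print(f"{total:>7,}  total")
```

The counts add up to the reported total, which suggests the five rows are exhaustive categories of the same set of recognition events.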
8Measuring Accuracy
- Unsupervised reporting has built-in limitations
- You must compare the Nuance interpretation to the user utterance to determine accuracy
- This is a manual process
- Unsupervised reporting is an automated process
You cannot measure accuracy without transcriptions
9Types of Utterances
- The first step in measuring accuracy is to separate utterances into two groups
- In-Grammar
- Out-of-Grammar
10In and Out of Grammar
- In-grammar
- User says something that is in the defined grammar
- Example: "Call Brian Sturges"
- Out-of-grammar
- User says something that is not in the defined grammar
- Example: "Give me Brian Sturges"
11Why Separate Utterances?
- Distinction between in- and out-of-grammar is critically important
- In-grammar utterances
- When not correctly recognized, need recognizer and dictionary tuning
- Out-of-grammar utterances
- Indicate training, usability, and database issues
12In-Grammar Measurements
- In-Grammar Correct Accept (IG-CA)
- The recognizer returned the correct answer
- In-Grammar False Accept (IG-FA)
- The recognizer returned an incorrect answer
- In-Grammar False Reject (IG-FR)
- The recognizer could not find a good match and
rejected rather than return an answer
13In-Grammar Measurements
- In-Grammar Correct Accept
14Out-of-Grammar Measurements
- Out-of-Grammar Correct Reject (OG-CR)
- The recognizer correctly rejected the input
- Out-of-Grammar False Accept (OG-FA)
- The recognizer returned a wrong answer because
the input was not in the grammar
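The five categories can be sketched as a small classifier over transcribed utterances. This is a minimal illustration, not the method used in the study: the function name, the sample data, and the use of plain string equality to decide whether a result is correct are all assumptions made for the example.

```python
from collections import Counter
from typing import Optional


def classify(transcript: str, in_grammar: bool, result: Optional[str]) -> str:
    """Return the measurement category for one utterance.

    transcript: what the user actually said (from manual transcription)
    in_grammar: whether the transcript is covered by the defined grammar
    result:     what the recognizer returned, or None if it rejected
    """
    if in_grammar:
        if result is None:
            return "IG-FR"  # rejected something it should have recognized
        # Assumption: a result is "correct" when it matches the transcript.
        return "IG-CA" if result == transcript else "IG-FA"
    # Out-of-grammar: any returned answer counts as a false accept.
    return "OG-CR" if result is None else "OG-FA"


# Hypothetical sample data for illustration only.
samples = [
    ("call brian sturges", True, "call brian sturges"),
    ("call brian sturges", True, "call ryan sturges"),
    ("call brian sturges", True, None),
    ("give me brian sturges", False, None),
    ("give me brian sturges", False, "call brian sturges"),
]

tally = Counter(classify(*s) for s in samples)
in_grammar_total = tally["IG-CA"] + tally["IG-FA"] + tally["IG-FR"]
in_grammar_accuracy = tally["IG-CA"] / in_grammar_total
print(dict(tally), in_grammar_accuracy)
```

Note that the classifier needs the transcript as an input, which is why these measurements require supervised (manual) reporting.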
15Out-of-Grammar Measurements
- Out-of-Grammar Correct Reject
- Out-of-Grammar False Accept
16True Recognition Accuracy
- Typical recognition accuracy is usually much greater than 70%
More representative speech recognition accuracy
Actual customer data from medium-sized hospital
17Limits of VRS Reports
- VRS reports do not measure accuracy!
- Why?
TIME                     Sun Apr 20 08:05:59 2008
BEGIN_TIME               14.25
SESSION_ID               3d53190a_000009d8_480b5bd7_84a2_0000
START_OF_SPEECH_DELAY    4.36
SPEECH_DURATION          3.2
STATUS                   RECOGNITION
LATENCY                  0.219
PERCENT_CPU_RT           0.274689
SERVER_HOSTNAME          absmc-vcdb2
NUM_RESULTS              2
RESULT0                  call patricia shiferaw please
CONFIDENCE0              47
PROBABILITY0             -11201
NUM_NL_INTERPRETATIONS0  1
NL_INTERPRETATION00      <snAction Call> <snName1 u-pshiferaw>
CONFIDENCE00snAction     65
CONFIDENCE00snName1      48
User utterance
Nuance call log data
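A call log record like the one above can be read into key/value pairs with a few lines of code. This is a rough sketch that assumes one `KEY value` pair per line; the original layout of the log is not preserved in this transcript, and the function name `parse_record` is made up for the example.

```python
def parse_record(text: str) -> dict:
    """Split each non-empty line into a field name and its value.

    Assumption: one field per line, name and value separated by whitespace.
    """
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value.strip()
    return fields


# A subset of the fields shown on the slide.
record = """
STATUS RECOGNITION
NUM_RESULTS 2
RESULT0 call patricia shiferaw please
CONFIDENCE0 47
NL_INTERPRETATION00 <snAction Call> <snName1 u-pshiferaw>
"""

fields = parse_record(record)
print(fields["RESULT0"], int(fields["CONFIDENCE0"]))
```

Every field here describes what the recognizer decided and how confident it was. Nothing in the record states what the user actually said, so accuracy cannot be computed from these fields alone.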
18Speech Recognition Baseline
- Performed intensive survey
- Collected 41,845 in-field utterances and Nuance logs from 5 large Vocera hospitals (8 sites)
- Transcribed 7,932 utterances
- Analyzed results for accuracy (intent = result?)
- Result highlights
- 18% of utterances are out-of-grammar
- 89% of in-grammar utterances were successfully recognized
- 71% of utterances were CALL subject
19What You Can Do: Optimizing Speech Recognition
20Use ASNs Sparingly
- Alternate Spoken Names (ASNs) increase the size of the grammar
- Larger grammars increase the likelihood of misrecognition
- Incorrect ASNs create problems that are hard to debug
21When to Use an ASN
- Use ASNs for nicknames
- Victoria Noel, who is known as Tori
- Charles Gregory Drew, who goes by his middle name, Greg
- Ernest Alexander, who is called Skip by everyone
- Use ASNs for titles
- Benjamin Spock, known as Doctor Spock
- Dwight Eisenhower, called General Eisenhower
22Other Uses of ASNs
- Name most commonly used
- Adult I C U Lead → Seventh Floor Lead
- Maiden name
- Rebecca Barry → Rebecca Nunn
- Many users cannot pronounce the name
- Mazowiecki Popieluszko → Mazzy Pop
23Quasi-Phonetic Spellings
- Carefully use ASNs to spell non-English names phonetically
- Spell as English words or names
- Spell as phonetically simple syllables
- Pseudo-phonetic spellings are a slippery slope!
- Examples of useful phonetic ASNs
- Pacita
- Nuance thinks it is Pacheetah
- Could be spelled as Paseeta
- Cindy Landola
- Could be spelled as Lantola
24How to Add ASNs
- Identify a commonly mispronounced name
- Record the user's pronunciation
- Make sure the name is in the database
- Call the name to see if it works for you
- Respell the name phonetically
- Call the name to check your work
Test, test, and test again!
25When to Use Learned Names
- Learned names
- Effective when only a few users mispronounce the name
- Alternate spoken names
- Effective when many users mispronounce the name
26When to Use Department Names
- Department name allows use of first name only
- Michael Gebrekristos → Michael in Administration
- Aster Cabansag → Aster in E K G
- Department name disambiguates similar names
- Peri Gunay in Imaging
- Eric Auneit in Transport Services
- Assigning departments to each user is akin to
adding an ASN for each user
27Specify Length of Phone Extensions
- Specify the number of digits in fixed-length extensions and pager numbers
- Recognition improves significantly by limiting the length of the digit string
- Accuracy degrades 1% per digit
- Set values in properties.txt
- TelExtensionLength
- Number of digits in a phone extension
- TelPagerNumberLength
- Number of digits in an inside pager number
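One way to see why a fixed length helps is to count how many digit strings the recognizer would have to tell apart. The sketch below is simple arithmetic and not a description of how the Nuance engine handles digits; the four-digit extension and the one-to-seven-digit range are hypothetical values chosen for illustration, and the function name is made up.

```python
def candidate_count(min_len: int, max_len: int) -> int:
    """Number of distinct digit strings with length in [min_len, max_len]."""
    return sum(10 ** n for n in range(min_len, max_len + 1))


# Hypothetical site with four-digit extensions.
fixed = candidate_count(4, 4)

# Hypothetical unrestricted case: anything from one to seven digits.
variable = candidate_count(1, 7)

print(f"fixed length 4:      {fixed:>10,} candidates")
print(f"variable length 1-7: {variable:>10,} candidates")
```

Under these assumptions, fixing the length shrinks the set of candidate strings by roughly three orders of magnitude, which is consistent with the claim that limiting the length of the digit string improves recognition.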
29Speech Recognition Support
- Vocera Professional Services Documents
- Vocera Speech Recognition Professional Services Support Data Sheet
- Vocera Professional Services Brochure
Download both from Live Meeting by clicking the Handouts button in the top right area of your screen.
These documents can also be found on the Vocera customer portal at http://vocera.com/portal/View/Default.aspx?id=112&parent=ServiceandSupport
30Contact Information and Questions
Vocera Speech Recognition Questions
Vocera Professional Services or your Vocera Sales Associate, toll free 1 888 9 Vocera (1 888 986 2372)
Webinar Information
To register for upcoming webinars or view recordings of previous webinars, visit http://vocera.com/news/webinars.aspx
About Vocera Webinars
If you have questions about the Vocera Customer Webinar Series, please send an email to webinars_at_vocera.com
31Thank You