Title: Task oriented application of automatic speech recognition
Task-specific voice control and dialog system
- The goal is to integrate a speech recognition system into a task-specific application so that it performs a useful task
- The system consists of
- A speech recognizer
- A language analyzer
- An expert system
- A physical system controlled by the voice commands
- A text-to-speech synthesizer
Fig: Block diagram of a task-specific voice control and dialog system
- Voice input -> Speech recognizer (vocabulary and grammar model): converts the input into grammatically correct text -> Text
- Text -> Language analyzer (semantic rules): extracts meaning from the text -> Meaning
- Meaning -> Expert system: selects the desired action (output action), issues commands to the system and receives data from it -> Text reply
- Text reply -> Text-to-speech synthesizer (pronunciation rules): converts the text reply into machine-generated speech -> Voice output
- System under voice control: executes commands and reports status
- Speech recognizer
- The function of this block is to convert the speech input into grammatically correct text
- It is constrained by the recognizer vocabulary and grammar model
- The text string is sent to the language analyzer
- Language analyzer
- Extracts the meaning from the text with the help of semantic rules
- The decoded meaning is sent to the expert system
- Expert system
- First selects the desired action, then issues appropriate commands to the physical system under voice control to carry out the action, and then receives data on the command status (e.g. command carried out successfully or unsuccessfully)
- Finally constructs a textual reply
- Text-to-speech synthesizer
- The text reply is converted into a speech message using appropriate word pronunciation rules and played back to the user
- The system in the figure performs the specific task of interest
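The flow through the four blocks can be sketched in a few lines of Python. This is only an illustrative toy under stated assumptions: typed text stands in for the voice input, and the grammar, the semantic rules, the `Lamp` device and all function names are hypothetical, not part of the original slides.

```python
# Hypothetical toy vocabulary/grammar: the recognizer accepts only these strings.
GRAMMAR = {"lights on", "lights off"}

# Hypothetical semantic rules: recognized text -> (device, action).
SEMANTIC_RULES = {
    "lights on": ("lights", "on"),
    "lights off": ("lights", "off"),
}


class Lamp:
    """Stand-in for the physical system under voice control."""

    def __init__(self):
        self.state = "off"

    def execute(self, action):
        # Carry out the command and report the status back to the expert system.
        if action in ("on", "off"):
            self.state = action
            return True
        return False


def recognize(utterance):
    """Speech recognizer stand-in: normalize the input and constrain it to the grammar."""
    text = " ".join(utterance.lower().split())
    return text if text in GRAMMAR else None


def analyze(text):
    """Language analyzer: extract the meaning using the semantic rules."""
    return SEMANTIC_RULES.get(text)


def expert_system(meaning, system):
    """Select the action, issue the command, read the status, build a text reply."""
    device, action = meaning
    ok = system.execute(action)
    return f"{device} turned {action}" if ok else f"could not turn {device} {action}"


def synthesize(reply):
    """Text-to-speech stand-in: returns the string that would be spoken."""
    return f"[spoken] {reply}"


def handle(utterance, system):
    text = recognize(utterance)
    if text is None:
        return synthesize("sorry, I did not understand")
    return synthesize(expert_system(analyze(text), system))
```

For example, `handle("Lights  ON", Lamp())` passes through all four stages, while an out-of-grammar input is stopped at the recognizer and answered with a spoken apology.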
Characteristics of speech recognition applications
- Beneficial to the user
- User friendly
- Accurate
- Real time
- The proposed system must provide a real benefit to the user in the form of
- Increased productivity
- Ease of use
- A better machine interface or a more natural way of communication
- If the application is not useful to the user, it will not succeed over time
- The system must be user friendly. The user should feel comfortable with it; it must provide friendly and helpful voice prompts and an effective means of communication.
- The system must be accurate.
- The recognition system must respond in real time; the response should be very fast.
Methods of handling recognition errors
- Four ways to deal with the errors
- Fail-soft methods
- Self-detection/correction of errors
- Verification or multilevel decision before proceeding
- Rejection/pass on to operator
- Fail-soft methods
- The cost (in terms of time) of a recognition error is low
- Hence the error is acceptable
- The error will be detected and corrected at a later stage
- The user can enter a correction mode to backtrack to the point where the error was made
- Self-detection/correction of errors
- The recognition system uses known task constraints (a given database) to automatically detect and correct recognition errors
- Ex. Spelling of a name from a finite list of names
- Verification or multilevel decision before proceeding
- The recognition system asks the user for help whenever it has difficulty resolving small differences between candidate strings, even though their likelihood scores are high
- The recognizer asks the user to verify the first-choice decision; if it is not verified, the recognizer asks the user to verify the second choice
- Rejection/pass on to operator
- By recording all spoken inputs in digital format, the system can reduce the error rate by rejecting a small but finite percentage of the spoken strings and passing such strings on to a human operator, who makes the final decision after listening to the spoken input
- By using all four techniques, the accuracy of the speech recognizer approaches 100%
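How verification and rejection might be combined can be sketched as a small decision rule over the recognizer's ranked candidates. This is a minimal sketch, not the method from the slides: the confidence scores in [0, 1], both threshold values, and the names `decide` and `verify_with_user` are all assumptions made for illustration.

```python
def decide(n_best, accept_margin=0.2, reject_score=0.3):
    """Return (decision, candidates) for a list of (string, score) pairs.

    Scores are assumed to be confidences in [0, 1]; both thresholds are
    illustrative values, not taken from the source.
    """
    ranked = sorted(n_best, key=lambda pair: pair[1], reverse=True)
    best_text, best_score = ranked[0]
    if best_score < reject_score:
        # Rejection: pass the recorded utterance on to a human operator.
        return "operator", []
    if len(ranked) > 1 and best_score - ranked[1][1] < accept_margin:
        # Verification: the top two strings are too close to call.
        return "verify", [text for text, _ in ranked[:2]]
    # Clear winner: proceed without asking the user.
    return "accept", [best_text]


def verify_with_user(candidates, confirm):
    """Offer the first choice, then the second; `confirm` returns True or False."""
    for text in candidates:
        if confirm(text):
            return text
    return None  # nothing verified; the caller may fall back to the operator
```

A clear winner is accepted directly, two close candidates are offered to the user in rank order, and a low-scoring utterance is handed to the operator.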
Broad classes of speech recognition applications
- Five broad classes
- Office or business system
- Manufacturing
- Telephone or telecommunications
- Medical
- Other
- Office or business system
- Data entry
- Database management and control
- Keyboard enhancement
- Manufacturing
- Eyes-free, hands-free monitoring of manufacturing for quality control
- Telephone or telecommunications
- Many applications are feasible over dialed-up telephones
- Automation of operator-assisted services
- Telemarketing
- Call distribution by voice
- Medical
- The primary application is voice creation and editing of specialized medical reports
- Other
- Voice-controlled and voice-operated games and toys
- Voice recognition aids for the handicapped
- Voice control in a moving vehicle
- Climate control
Command and control applications
- The user can control machines using simple commands
- Voice repertory dialer: a dialer that allows a caller to place calls by speaking the name of someone in the repertory (a stored accumulation of names) rather than dialing the digit code
- Used in mobile phones and within a car (eyes-free and hands-free)
- A repertory dialer needs a speaker-trained set of vocabulary patterns corresponding to the repertory names (and their phone numbers)
- It also needs a speaker-independent set of vocabulary patterns corresponding to the digits and a set of command words for controlling normal telephone features (off-hook, dial, repeat, hang up)
Automated call type recognition
- The automation of operator-assisted calls
- Ex. Calls made from a pay phone that normally require operator assistance, including collect calls, person-to-person calls, third-party billing calls, operator-assisted calls and credit card calls
- The service has five options, so a vocabulary consisting of only five command words is adequate
- "Collect" to make collect calls
- "Person" to make person-to-person calls
- "Third number" to make third-party billing calls
- "Operator" to make operator-assisted calls
- "Calling card" to make calling card calls
- The system is speaker independent and can work over the standard dialed-up telephone network
- If the customer obeys the voice prompt and speaks one of the command words, the accuracy of the system is more than 99%
- The customer has to use the specific command word
- Otherwise, a keyword spotting technique has to be used to find the command words embedded within the sentence
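The idea of finding a command word embedded in a longer sentence can be illustrated at the text level. Real keyword spotting works on the acoustic signal; the sketch below only searches an already recognized transcript, and the function name `spot_command` and the rule of rejecting ambiguous sentences are assumptions made for this example.

```python
import re

# The five command words of the call-type service.
COMMANDS = ["collect", "person", "third number", "operator", "calling card"]


def spot_command(transcript):
    """Return the single command found in the transcript, or None.

    None is returned when no command, or more than one distinct command,
    appears in the sentence.
    """
    # Keep only letters so punctuation does not hide a command word.
    text = " ".join(re.findall(r"[a-z]+", transcript.lower()))
    found = [c for c in COMMANDS if re.search(rf"\b{re.escape(c)}\b", text)]
    return found[0] if len(found) == 1 else None
```

With this rule, "I'd like to make a collect call, please" yields `collect`, while a sentence containing two different commands is treated as ambiguous and left for another error-handling method.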
Call distribution by voice commands
- A call is placed that would normally be answered by an operator, who then distributes the call to the appropriate location (person) based on the user's responses to the questions asked by the attendant
- In this application the attendant function is automated via voice processing
- The voice response system poses a series of menu-based questions and, based on the user's responses, routes the call appropriately
- Ex. Railway system
Directory listing retrieval
- Provides access to directory information from a spoken, spelled name
- To access the directory information for a name in the directory, the user spells the name, using the word "stop" between the last name and the initials, as in Rabiner-stop-LR-stop
- The speech recognizer determines the name in the given directory that best matches the spoken input and then speaks the directory information for that name to the user
- Because of similar-sounding letters there may be errors, but the telephone directory provides a task syntax that automatically detects and corrects improperly recognized letters
- The system can handle common misspellings of names with a single insertion or deletion of a letter
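The spelled-name lookup can be sketched as follows. This is a simplified illustration under stated assumptions: the recognizer output is taken to be a list of letter tokens with the word "stop" as separator, the directory entries and field names are invented, and only the single insertion or deletion case is handled (correcting confusable letters through the task syntax is not modeled here).

```python
def parse_spelled(tokens):
    """Split recognized tokens into (last_name, initials) at the word 'stop'."""
    fields, current = [], []
    for token in tokens:
        if token.lower() == "stop":
            fields.append("".join(current))
            current = []
        else:
            current.append(token.upper())
    if current:
        fields.append("".join(current))
    last_name = fields[0] if fields else ""
    initials = fields[1] if len(fields) > 1 else ""
    return last_name, initials


def within_one_insert_or_delete(a, b):
    """True if a == b or one string is the other with a single letter removed."""
    if a == b:
        return True
    if abs(len(a) - len(b)) != 1:
        return False
    longer, shorter = (a, b) if len(a) > len(b) else (b, a)
    return any(longer[:i] + longer[i + 1:] == shorter for i in range(len(longer)))


def lookup(tokens, directory):
    """Return the directory entries that match the spelled input."""
    last_name, initials = parse_spelled(tokens)
    return [
        entry for entry in directory
        if entry["initials"] == initials
        and within_one_insert_or_delete(last_name, entry["last_name"])
    ]


# Hypothetical directory; the "info" values are made up.
DIRECTORY = [
    {"last_name": "RABINER", "initials": "LR", "info": "extension 100"},
    {"last_name": "SMITH", "initials": "J", "info": "extension 200"},
]
```

Spelling "R A B I N E R stop L R stop" finds the first entry, and so do the misspellings "R A B N E R" (one letter dropped) and "R A B I N N E R" (one letter added).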
Credit card sales validation
- If a merchant needs credit card validation and does not have an automatic card reader or modem dialer, the merchant must call a specific number and provide an attendant with a 10-digit merchant identification number, a 15-digit credit card number, and the amount of the transaction in rupees
- In this case a speech recognition system uses a connected digit recognizer to recognize the merchant identification number and the credit card number, and a connected word recognizer for the transaction amount
- For the amount, the vocabulary size for recognition is larger than that needed for the credit card number
- This is because the same amount can be spoken in various ways (e.g. Rs. 137 as "Rs. one three seven", "Rs. one thirty seven", etc.)
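Why the amount needs a larger vocabulary than the card number can be seen in a small text-level sketch that maps several spoken forms of the same amount to one value. It is a toy under stated assumptions: it works on recognized words rather than speech, covers only units, teens, tens and "hundred", and the function names are hypothetical.

```python
UNITS = {w: i for i, w in enumerate(
    "zero one two three four five six seven eight nine".split())}
TEENS = {w: i + 10 for i, w in enumerate(
    "ten eleven twelve thirteen fourteen fifteen sixteen seventeen "
    "eighteen nineteen".split())}
TENS = {w: (i + 2) * 10 for i, w in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split())}


def groups_to_int(tokens):
    """Concatenate spoken digit groups: ['one', 'thirty', 'seven'] -> '1' + '37'."""
    pieces = []
    pending_tens = None  # a tens word waiting to see whether a unit follows
    for token in tokens:
        if token in UNITS and pending_tens is not None and UNITS[token] != 0:
            pieces.append(str(pending_tens + UNITS[token]))  # "thirty seven" -> 37
            pending_tens = None
            continue
        if pending_tens is not None:
            pieces.append(str(pending_tens))  # the tens word stood alone
            pending_tens = None
        if token in TENS:
            pending_tens = TENS[token]
        elif token in UNITS:
            pieces.append(str(UNITS[token]))
        elif token in TEENS:
            pieces.append(str(TEENS[token]))
        else:
            raise ValueError(f"word outside the amount vocabulary: {token}")
    if pending_tens is not None:
        pieces.append(str(pending_tens))
    return int("".join(pieces))


def spoken_amount_to_int(phrase):
    """Map a spoken amount (without the currency word) to an integer."""
    tokens = [t for t in phrase.lower().replace("-", " ").split() if t != "and"]
    if "hundred" in tokens:
        # "one hundred thirty seven": hundreds part plus the remainder.
        i = tokens.index("hundred")
        hundreds = groups_to_int(tokens[:i]) if tokens[:i] else 1
        rest = groups_to_int(tokens[i + 1:]) if tokens[i + 1:] else 0
        return hundreds * 100 + rest
    return groups_to_int(tokens)
```

"one three seven", "one thirty seven" and "one hundred thirty seven" all map to 137. The card number needs only the ten digit words, whereas even this toy amount vocabulary has 29 words (ten units, ten teens, eight tens and "hundred").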
End of Chapter 9