Task oriented application of automatic speech recognition - PowerPoint PPT Presentation


About This Presentation

Title:

Task oriented application of automatic speech recognition

Description:

Task oriented application of automatic speech recognition Chapter9 – PowerPoint PPT presentation

Slides: 32

Provided by: Sant134

Transcript and Presenter's Notes

Title: Task oriented application of automatic speech recognition

1
Task oriented application of automatic speech
recognition

Chapter 9

2
Task specific voice control and dialog system

The goal is to integrate a speech recognition system into a
task-specific application so that it performs a useful
task
The system consists of
A speech recognizer
A language analyzer
An expert system
A physical system controlled by the voice
commands
A text to speech synthesizer

3
Fig: Block diagram of a task-specific voice
control and dialog system
Voice I/p -> Speech recognizer (vocabulary, grammar model) -> Text -> Language analyzer (semantic rules) -> Meaning -> Expert system -> Text reply -> Text to speech synthesizer (pronunciation rules) -> Speech -> Voice O/p
The expert system issues commands to the system under voice control and receives data from it; the system executes the commands (output action) and reports status.
4

Speech recognizer
The function of this block is to convert the speech
input into grammatically correct text.
It is constrained by the recognizer vocabulary
and grammar model.
The resulting text string is sent to the language analyzer

Language analyzer
Extracts the meaning from the text with the help
of semantic rules
The decoded meaning is sent to the expert system
Expert system
First selects the desired action, then issues the
appropriate commands to the physical system under
voice control to carry out the action, and then
receives data on the command status

Ex. the command was carried out successfully or
unsuccessfully. It then constructs a textual
reply
Text to speech synthesizer
The text reply is converted into a speech message
using the appropriate word pronunciation rules and
played back to the user
Together, the blocks in the figure perform the
specific task of interest
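The data flow through the four blocks can be mimicked in a short Python sketch. It is a toy under loud assumptions: the "recognizer" receives an already transcribed string and only enforces a small vocabulary, the semantic rules and the light/fan device are invented, all function and class names (`speech_recognizer`, `language_analyzer`, `expert_system`, `text_to_speech`, `PhysicalSystem`) are hypothetical, and the synthesizer returns the reply text instead of audio.

```python
from typing import Optional, Tuple

# Hypothetical recognizer vocabulary; any other word makes the input invalid.
VOCABULARY = {"turn", "on", "off", "the", "light", "fan"}
DEVICES = {"light", "fan"}            # things the expert system can control
ACTIONS = {"on": True, "off": False}  # meaning of the action words

Meaning = Optional[Tuple[str, bool]]


def speech_recognizer(utterance: str) -> Optional[str]:
    """Stand-in recognizer: accepts a transcribed utterance only if every
    word is in the vocabulary (a crude vocabulary constraint)."""
    words = utterance.lower().split()
    if words and all(w in VOCABULARY for w in words):
        return " ".join(words)
    return None


def language_analyzer(text: str) -> Meaning:
    """Semantic rule: the text must name exactly one device and one action."""
    words = text.split()
    devices = [w for w in words if w in DEVICES]
    actions = [ACTIONS[w] for w in words if w in ACTIONS]
    if len(devices) == 1 and len(actions) == 1:
        return devices[0], actions[0]
    return None


class PhysicalSystem:
    """The system under voice control: executes commands, reports status."""

    def __init__(self) -> None:
        self.state = {device: False for device in DEVICES}

    def execute(self, device: str, on: bool) -> bool:
        self.state[device] = on
        return True  # status report: command carried out successfully


def expert_system(meaning: Meaning, plant: PhysicalSystem) -> str:
    """Select the action, command the physical system, build a text reply."""
    if meaning is None:
        return "Sorry, I did not understand."
    device, on = meaning
    word = "on" if on else "off"
    if plant.execute(device, on):
        return f"The {device} is now {word}."
    return f"Could not switch the {device} {word}."


def text_to_speech(reply: str) -> str:
    """Placeholder synthesizer: returns the text that would be spoken."""
    return reply


def dialog_turn(utterance: str, plant: PhysicalSystem) -> str:
    text = speech_recognizer(utterance)
    meaning = language_analyzer(text) if text is not None else None
    return text_to_speech(expert_system(meaning, plant))


plant = PhysicalSystem()
print(dialog_turn("Turn on the light", plant))  # The light is now on.
print(dialog_turn("open the door", plant))      # Sorry, I did not understand.
```

Keeping each block a separate function mirrors the figure: each one could be replaced by a real component without changing the others.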

7
Characteristics of speech recognition applications

Beneficial to the user
User friendly
Accurate
Real time

The proposed system must provide a real benefit to
the user in the form of
Increased productivity
Ease of use
A better man-machine interface or a more natural way of
communication
If the application is not useful to the user, it
will not succeed over time

The system must be user friendly: the user should
feel comfortable with it, it must provide friendly and
helpful voice prompts, and it must provide an
effective means of communication.
The system must be accurate.
The recognition system must respond in real time,
i.e., its response should be very fast.

10
Methods of handling recognition errors

Four ways to deal with the errors
Fail soft methods
Self-detection/correction of errors
Verification or multilevel decision before
proceeding
Rejection/pass to operator

Fail soft methods
Used when the cost (in terms of time) of a recognition
error is low
Hence the error is acceptable
The error will be detected and corrected at a
later stage
The user can enter a correction mode to
backtrack to the point where the error was made
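A fail-soft correction mode can be sketched as an entry loop in which every recognized item is accepted at once and a spoken command word backtracks over accepted items. The command word `back` and the function name `enter_with_correction` are hypothetical; a real system would also re-prompt the user after each backtrack.

```python
from typing import List


def enter_with_correction(recognized: List[str], back_word: str = "back") -> List[str]:
    """Fail-soft entry: each recognized item is accepted immediately; the
    hypothetical command word `back_word` undoes the most recent item, so the
    user can backtrack to the point where an error was made."""
    accepted: List[str] = []
    for item in recognized:
        if item == back_word:
            if accepted:          # nothing to undo at the very start
                accepted.pop()
        else:
            accepted.append(item)
    return accepted


# The user meant "4 2 7" but the recognizer heard "1" instead of "2";
# the user notices after "7", backtracks twice and re-enters "2 7".
print(enter_with_correction(["4", "1", "7", "back", "back", "2", "7"]))  # ['4', '2', '7']
```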

Self-detection/correction of errors
The recognition system uses known task
constraints (a given database) to automatically
detect and correct recognition errors
Ex. Spelling a name that must come from a finite list of names
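Self-correction against a finite list can be sketched by choosing the list entry closest to the recognized letter string. Levenshtein (edit) distance is one common way to measure "closest" and is an assumption here, not necessarily the method of the original system; the name list and function names are hypothetical.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of letter insertions, deletions
    and substitutions that turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def correct_spelling(recognized: str, names: List[str]) -> str:
    """Task constraint: the answer must be a name from the list, so return
    the listed name closest to the recognized letter string."""
    return min(names, key=lambda name: edit_distance(recognized, name))


NAMES = ["RABINER", "JUANG", "SCHAFER"]      # hypothetical finite name list
print(correct_spelling("RABIMER", NAMES))    # RABINER (N misrecognized as M)
```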

Verification or multilevel decision before
proceeding
The recognition system asks the user for help
whenever the likelihood of a recognition error is
high, i.e., when it is difficult to resolve small
differences between candidate strings
The recognizer asks the user to verify its first-
choice decision; if it is not verified, the
recognizer asks the user to verify the second
choice
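Verification can be sketched with two small functions: one decides when to ask for help and one walks the ranked candidate list, asking the user to confirm each choice in turn. The trigger used here (the two best scores lie within a margin), the assumption that scores are log-likelihoods where higher is better, the margin and `max_tries` values, and the function names are all hypothetical.

```python
from typing import Callable, List, Optional


def needs_verification(scores: List[float], margin: float = 1.0) -> bool:
    """Assumed trigger: ask for help when the best two log-likelihood scores
    are within `margin` of each other, i.e. the decision is unreliable."""
    ranked = sorted(scores, reverse=True)
    return len(ranked) > 1 and ranked[0] - ranked[1] < margin


def verify_choices(candidates: List[str], confirm: Callable[[str], bool],
                   max_tries: int = 2) -> Optional[str]:
    """Ask the user to confirm the ranked candidates one at a time (first
    choice, then second choice, ...). Returns the confirmed string, or None
    if none of the first `max_tries` candidates is confirmed."""
    for candidate in candidates[:max_tries]:
        if confirm(candidate):
            return candidate
    return None


n_best = ["Boston", "Austin", "Houston"]      # ranked recognizer output
scores = [-102.3, -102.9, -140.0]             # hypothetical log-likelihoods
if needs_verification(scores):
    # Simulated user who actually said "Austin" and answers yes or no.
    print(verify_choices(n_best, lambda c: c == "Austin"))  # Austin
```

Returning `None` after `max_tries` leaves room for the next strategy, such as passing the input to an operator.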

Rejection/pass on to operator
By recording all spoken inputs in digital format,
the system can reduce the error rate by rejecting
a small but finite percentage of the spoken
strings and passing them on to a human
operator, who makes the final decision after
listening to the spoken input
By using all four techniques, the accuracy of the
speech recognizer approaches 100%
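The effect of rejection can be illustrated by thresholding a confidence score: strings below the threshold are not decided automatically but handed, with their recording, to an operator. The sketch assumes every recognition result comes with a confidence between 0 and 1; the threshold, the function names, the recording identifier, and the test data are made up for illustration.

```python
from typing import List, Tuple


def decide(hypothesis: str, confidence: float, recording_id: str,
           threshold: float = 0.5) -> Tuple[str, str]:
    """Accept the recognized string if its confidence reaches the threshold;
    otherwise reject it and pass the stored recording to a human operator."""
    if confidence >= threshold:
        return ("accept", hypothesis)
    return ("operator", recording_id)


def reject_and_measure(results: List[Tuple[float, bool]],
                       threshold: float) -> Tuple[float, float]:
    """results: (confidence, was_correct) pairs from a labeled test set.
    Returns (rejection rate, error rate among the accepted strings)."""
    accepted = [ok for conf, ok in results if conf >= threshold]
    rejection_rate = (len(results) - len(accepted)) / len(results)
    error_rate = accepted.count(False) / len(accepted) if accepted else 0.0
    return rejection_rate, error_rate


# Made-up test results: one error, and it has a low confidence score.
results = [(0.95, True), (0.91, True), (0.88, True), (0.97, True), (0.35, False),
           (0.90, True), (0.86, True), (0.93, True), (0.89, True), (0.92, True)]
print(reject_and_measure(results, 0.0))  # (0.0, 0.1): no rejection, 10% errors
print(reject_and_measure(results, 0.5))  # (0.1, 0.0): 10% rejected, no errors
```

With real data the trade-off is rarely this clean; the threshold is tuned so that the rejected fraction stays small while the error rate on accepted strings drops.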

15
Broad classes of speech recognition applications

Five broad classes
Office or business system
Manufacturing
Telephone or telecommunications
Medical
Other

Office or business system
Data entry
Database management and control
Keyboard enhancement
Manufacturing
Eyes-free, hands-free monitoring of manufacturing
processes for quality control

Telephone or telecommunications
Many applications are feasible over dialed-up
telephone lines
Automation of operator assisted services
Telemarketing
Call distribution by voice

Medical
The primary application is voice
creation and editing of specialized medical
reports
Other
Voice controlled and operated games and toys
Voice recognition aids for the handicapped
Voice control in a moving vehicle
Climate control

19
Command and control applications

The user controls machines using simple spoken
commands
Voice repertory dialer: allows a caller
to place a call by speaking the name of someone
in the repertory (the stored list of names) rather than
dialing the digit code.
Used in mobile phones and in cars (eyes-free and
hands-free operation)

A repertory dialer needs a speaker-trained set of
vocabulary patterns corresponding to the repertory
names (and their phone numbers)
It also needs a speaker-independent set of vocabulary
patterns corresponding to the digits, and a set of
command words for controlling normal telephone
features (off-hook, dial, repeat, hang up)
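The two vocabularies a repertory dialer needs can be sketched as a small class: a per-user table filled in by training (the speaker-trained names) and fixed digit and command tables shared by all users (speaker-independent). The sketch assumes recognition has already produced word strings; `RepertoryDialer`, its methods, and the sample name and number are hypothetical, and real training would store acoustic patterns rather than text.

```python
from typing import Dict, List, Optional


class RepertoryDialer:
    """Toy repertory dialer working on already recognized word strings."""

    # Speaker-independent vocabulary: the same for every user.
    DIGITS: Dict[str, str] = {
        "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
        "four": "4", "five": "5", "six": "6", "seven": "7", "eight": "8",
        "nine": "9",
    }
    COMMANDS = {"off-hook", "dial", "repeat", "hang up"}

    def __init__(self) -> None:
        # Speaker-trained vocabulary: names this user has trained, with numbers.
        self.repertory: Dict[str, str] = {}
        self.last_number: Optional[str] = None

    def train(self, name: str, number: str) -> None:
        """Stand-in for training; a real dialer stores acoustic patterns."""
        self.repertory[name] = number

    def dial(self, words: List[str]) -> str:
        """Dial by repertory name, or by a spoken digit string."""
        spoken = " ".join(words)
        if spoken in self.repertory:
            number = self.repertory[spoken]
        elif words and all(w in self.DIGITS for w in words):
            number = "".join(self.DIGITS[w] for w in words)
        else:
            raise ValueError(f"not a repertory name or digit string: {spoken!r}")
        self.last_number = number
        return number

    def command(self, word: str) -> Optional[str]:
        """Handle a telephone-control command word; 'repeat' redials."""
        if word not in self.COMMANDS:
            raise ValueError(f"unknown command: {word!r}")
        return self.last_number if word == "repeat" else None


dialer = RepertoryDialer()
dialer.train("mom", "5551234")                 # hypothetical entry
print(dialer.dial(["mom"]))                    # 5551234
print(dialer.dial(["nine", "one", "one"]))     # 911
print(dialer.command("repeat"))                # 911
```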

21
Automated call type recognition

The automation of operator-assisted
calls
Ex. Calls made from a pay phone that normally
require operator assistance, including collect
calls, person-to-person calls, third-party
billing calls, operator-assisted calls, and credit
card calls

There are five options for this service, so a vocabulary
of only five words is adequate:
Collect: to make collect calls
Person: to make person-to-person calls
Third number: to make third-party billing calls
Operator: to make operator-assisted calls
Calling card: to make calling card calls

The system is speaker independent and can work
over the standard dialed-up telephone network
If the customer obeys the voice prompt and speaks
one of the command words, the accuracy of the
system is more than 99%
The customer therefore has to use a specific command word;
otherwise, a keyword spotting technique has to
be used to find the command word embedded
within the sentence
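The five-word call-type vocabulary, with a fallback that searches a longer sentence for exactly one command phrase, can be sketched on transcribed text. This is only a text-level stand-in: real keyword spotting operates on the speech signal. The function names and the rule of rejecting a sentence with zero or several keywords (so the system can re-prompt) are assumptions.

```python
from typing import Dict, List, Optional

# The five command phrases and the call type each one selects.
CALL_TYPES: Dict[str, str] = {
    "collect": "collect call",
    "person": "person-to-person call",
    "third number": "third-party billing call",
    "operator": "operator-assisted call",
    "calling card": "calling card call",
}


def contains_phrase(words: List[str], phrase: List[str]) -> bool:
    """True if the phrase occurs as a contiguous run of words."""
    n = len(phrase)
    return any(words[i:i + n] == phrase for i in range(len(words) - n + 1))


def recognize_call_type(transcript: str) -> Optional[str]:
    """Isolated command word first; otherwise spot a keyword in the sentence.
    Returns None if no keyword, or more than one keyword, is found."""
    words = transcript.lower().replace(",", " ").split()
    spoken = " ".join(words)
    if spoken in CALL_TYPES:                 # the user obeyed the prompt
        return CALL_TYPES[spoken]
    found = [call_type for phrase, call_type in CALL_TYPES.items()
             if contains_phrase(words, phrase.split())]
    return found[0] if len(found) == 1 else None


print(recognize_call_type("Calling card"))                     # calling card call
print(recognize_call_type("I'd like to make a collect call"))  # collect call
```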

24
Call distribution by voice commands

A call is placed that would normally be answered by
an operator, who then distributes the call to the
appropriate location (person) based on the caller's
responses to the questions asked by the attendant
In this application the attendant function is
automated via voice processing

The voice response system poses a series of menu-
based questions and, based on the user's responses,
routes the call appropriately
Ex. Railway system
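Menu-based call distribution amounts to walking a tree of questions with the caller's answers. In the sketch below the railway menu, its options and destinations, the `ask` callback, and the choice to fall back to a human attendant on an unrecognized answer are all hypothetical.

```python
from typing import Callable

# Hypothetical railway menu: each inner node has a question and its options;
# each leaf (a string) is the destination the call is routed to.
RAILWAY_MENU = {
    "prompt": "Say 'reservation' or 'enquiry'.",
    "options": {
        "reservation": {
            "prompt": "Say 'booking' or 'cancellation'.",
            "options": {"booking": "booking desk",
                        "cancellation": "cancellation desk"},
        },
        "enquiry": "train enquiry desk",
    },
}


def route_call(ask: Callable[[str], str], menu: dict = RAILWAY_MENU) -> str:
    """Pose menu questions until a destination is reached. An answer that is
    not one of the offered options sends the call to a human attendant."""
    node = menu
    while isinstance(node, dict):
        answer = ask(node["prompt"]).strip().lower()
        node = node["options"].get(answer)
        if node is None:
            return "human attendant"
    return node


answers = iter(["Reservation", "booking"])          # simulated caller responses
print(route_call(lambda question: next(answers)))   # booking desk
```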

26
Directory listing retrieval
27

Provides access to directory information from a
spoken, spelled name
To access the directory information for a name in
the directory, the user spells the name, saying the
word "stop" between the last name and the
initials, as in Rabiner-stop-LR-stop
The speech recognizer determines the name in the
given directory that best matches the spoken
input and then speaks the directory information
for that name to the user.

Similar-sounding letters can cause recognition
errors, but the telephone directory provides a task
syntax that automatically detects and corrects
improperly recognized letters
The system can also handle common misspellings of names
that involve a single insertion or deletion of a letter
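The spelled-name lookup can be sketched in two steps: split the recognized letters at the word "stop" into last name and initials, then accept a directory entry whose initials match and whose last name is within one letter insertion, deletion, or substitution of the recognized string. Apart from the Rabiner-LR example, the directory entries and extensions are invented, the function names are hypothetical, and an ambiguous result is simply rejected here.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical directory: (last name, initials) -> extension.
DIRECTORY: Dict[Tuple[str, str], str] = {
    ("RABINER", "LR"): "x1234",
    ("JUANG", "BH"): "x5678",
    ("RABIN", "MO"): "x9012",
}


def parse_spelled(tokens: List[str]) -> Tuple[str, str]:
    """Split spelled letters at the word 'stop': last name, then initials."""
    groups: List[str] = []
    current: List[str] = []
    for token in tokens:
        if token.lower() == "stop":
            groups.append("".join(current))
            current = []
        else:
            current.append(token.upper())
    if current:                       # tolerate a missing final 'stop'
        groups.append("".join(current))
    last = groups[0] if groups else ""
    initials = groups[1] if len(groups) > 1 else ""
    return last, initials


def within_one_edit(a: str, b: str) -> bool:
    """True if a and b differ by at most one letter substitution,
    insertion or deletion."""
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a                   # make a the shorter (or equal) string
    i = j = edits = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            i += 1
            j += 1
            continue
        edits += 1
        if edits > 1:
            return False
        if len(a) == len(b):
            i += 1                    # substitution: skip the letter in both
        j += 1                        # extra letter in the longer string
    return edits + (len(b) - j) <= 1


def lookup(tokens: List[str]) -> Optional[str]:
    """Return the entry matching the spelled name, or None if no entry or
    several entries match."""
    last, initials = parse_spelled(tokens)
    matches = [ext for (name, inits), ext in DIRECTORY.items()
               if inits == initials and within_one_edit(last, name)]
    return matches[0] if len(matches) == 1 else None


print(lookup("R A B I N E R stop L R stop".split()))  # x1234
print(lookup("R A B I N R stop L R stop".split()))    # x1234 (letter E dropped)
```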

29
Credit card sales validation

When a merchant needs credit card (CC) validation but does not have an
automatic card reader or modem dialer,
he must call a specific number and provide an
attendant with a 10-digit merchant identification
number, a 15-digit CC number, and the amount
of the transaction in rupees

In this case, the speech recognition system uses a
connected digit recognizer to recognize the
merchant identification number and the CC number, and a
connected word recognizer for the transaction
amount
For the amount, the recognition vocabulary is
larger than that needed for the CC number,
because the same amount can be spoken in various ways
(Ex. Rs. 137 as "Rs. one three seven" or "Rs. one thirty
seven", etc.)
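The difference between the two recognizers can be sketched on transcribed words: a connected digit string needs only the digit words plus a length check (10 digits for the merchant ID, 15 for the card number), whereas the amount also needs the teens and tens words, so that both "one three seven" and "one thirty seven" map to 137. The word lists and function names are assumptions for illustration, and forms such as "one hundred thirty seven" are deliberately not handled.

```python
from typing import Dict, List

UNITS: Dict[str, int] = {"zero": 0, "oh": 0, "one": 1, "two": 2, "three": 3,
                         "four": 4, "five": 5, "six": 6, "seven": 7,
                         "eight": 8, "nine": 9}
TEENS: Dict[str, int] = {"ten": 10, "eleven": 11, "twelve": 12, "thirteen": 13,
                         "fourteen": 14, "fifteen": 15, "sixteen": 16,
                         "seventeen": 17, "eighteen": 18, "nineteen": 19}
TENS: Dict[str, int] = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
                        "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90}
CURRENCY = {"rupees", "rs", "rs."}

DIGIT_VOCABULARY = set(UNITS)
AMOUNT_VOCABULARY = set(UNITS) | set(TEENS) | set(TENS) | CURRENCY


def digit_string(words: List[str], expected_length: int) -> str:
    """Connected-digit result, checked against the known length
    (10 for the merchant ID, 15 for the card number)."""
    if not all(w in UNITS for w in words):
        raise ValueError("non-digit word in a digit string")
    digits = "".join(str(UNITS[w]) for w in words)
    if len(digits) != expected_length:
        raise ValueError(f"expected {expected_length} digits, got {len(digits)}")
    return digits


def parse_amount(words: List[str]) -> int:
    """Read an amount spoken as digit groups, so that both
    'one three seven' and 'one thirty seven' give 137."""
    words = [w.lower() for w in words if w.lower() not in CURRENCY]
    digits = ""
    i = 0
    while i < len(words):
        w = words[i]
        if w in UNITS:
            digits += str(UNITS[w])
        elif w in TEENS:
            digits += str(TEENS[w])
        elif w in TENS:
            value = TENS[w]
            nxt = words[i + 1] if i + 1 < len(words) else ""
            if UNITS.get(nxt, 0) > 0:    # 'thirty seven' -> 37
                value += UNITS[nxt]
                i += 1
            digits += str(value)
        else:
            raise ValueError(f"word outside the amount vocabulary: {w!r}")
        i += 1
    if not digits:
        raise ValueError("no amount spoken")
    return int(digits)


print(parse_amount("Rs one three seven".split()))       # 137
print(parse_amount("rupees one thirty seven".split()))  # 137
print(len(DIGIT_VOCABULARY), len(AMOUNT_VOCABULARY))    # 11 32
```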

31
End of Chapter 9
