Task oriented application of automatic speech recognition - PowerPoint PPT Presentation

Provided by: Sant134

1
Task oriented application of automatic speech
recognition
  • Chapter 9

2
Task specific voice control and dialog system
  • To integrate a speech recognition system into a
    task specific application to perform a useful
    task
  • System consists of
  • A speech recognizer
  • A language analyzer
  • An expert system
  • A physical system being controlled by the voice
    commands
  • Text to speech synthesizer

3
Fig: Block diagram of a task-specific voice control and dialog system
  • Voice input → Speech recognizer: converts the input
    into grammatically correct text, using the
    vocabulary and grammar model
  • Text → Language analyzer: extracts meaning from the
    text, using semantic rules
  • Meaning → Expert system: selects the desired action
    (output action), issues commands to the system and
    receives data from it
  • System under voice control: executes commands and
    reports status
  • Text reply → Text-to-speech synthesizer: converts
    the text reply into machine-generated speech, using
    pronunciation rules
  • Speech → Voice output
4
  • Speech recognizer
  • The function of this block is to convert speech
    input into grammatically correct text.
  • It is constrained by the recognizer vocabulary
    and grammar model.
  • The text string is sent to a language analyzer

5
  • Language analyzer
  • Extracts the meaning from the text with the help
    of semantic rules
  • The decoded meaning is sent to the expert system
  • Expert system
  • First selects the desired action, then issues
    appropriate commands to the physical system under
    voice control to carry out the action, and then
    receives data on the command status

6
  • Ex. the status may report that the command was
    carried out successfully or unsuccessfully; the
    expert system then constructs a textual reply
  • Text to speech synthesizer
  • A text reply is converted into a speech message
    with appropriate word pronunciation rules and
    played back to the user
  • The system in the figure performs the specific
    task of interest
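
The flow through the four blocks can be sketched in a few lines of Python. This is only a toy, text-only illustration: the recognizer is replaced by a vocabulary filter on typed text, and every name here (VOCABULARY, Plant, dialog_turn, the lamp/fan devices) is a hypothetical choice made for the example, not something from the slides.

```python
from typing import List, Optional

# Hypothetical recognizer vocabulary for a toy appliance-control task.
VOCABULARY = {"turn", "on", "off", "the", "lamp", "fan"}

def recognize(utterance: str) -> List[str]:
    """Stand-in for the speech recognizer: keep only in-vocabulary words."""
    return [w for w in utterance.lower().split() if w in VOCABULARY]

def analyze(words: List[str]) -> Optional[dict]:
    """Language analyzer: one semantic rule, a device plus a target state."""
    device = next((w for w in words if w in ("lamp", "fan")), None)
    state = next((w for w in words if w in ("on", "off")), None)
    if device is None or state is None:
        return None
    return {"device": device, "state": state}

class Plant:
    """The physical system under voice control (simulated)."""
    def __init__(self) -> None:
        self.states = {"lamp": "off", "fan": "off"}

    def execute(self, device: str, state: str) -> bool:
        self.states[device] = state
        return True  # command status reported back to the expert system

def expert_system(meaning: Optional[dict], plant: Plant) -> str:
    """Select the action, issue the command, build a textual reply."""
    if meaning is None:
        return "Sorry, I did not understand the command."
    ok = plant.execute(meaning["device"], meaning["state"])
    if ok:
        return f"The {meaning['device']} is now {meaning['state']}."
    return f"The {meaning['device']} could not be turned {meaning['state']}."

def synthesize(text: str) -> str:
    """Stand-in for the text-to-speech synthesizer."""
    return f"[speech] {text}"

def dialog_turn(utterance: str, plant: Plant) -> str:
    """One full pass: recognizer, analyzer, expert system, synthesizer."""
    return synthesize(expert_system(analyze(recognize(utterance)), plant))
```

Calling dialog_turn("Turn on the lamp", Plant()) walks one command through all four stages and returns the reply that would be played back to the user.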

7
Characteristics of speech recognition applications
  • Beneficial to the user
  • User friendly
  • Accurate
  • Real time

8
  • The proposed system must provide a real benefit to
    the user in the form of
  • Increased productivity
  • Ease of use
  • Better machine interface or a more natural way of
    communication
  • If the application is not useful to the user, it
    will not succeed over time

9
  1. The system must be user friendly. The user should
    feel comfortable; the system must provide friendly
    and helpful voice prompts and an effective means
    of communication.
  2. The system must be accurate.
  3. The recognition system must respond in real time.
    The response should be very fast.

10
Methods of handling recognition errors
  • Four ways to deal with the errors
  • Fail soft methods
  • Self-detection/correction of errors
  • Verification or multilevel decision before
    proceeding
  • Rejection/pass to operator

11
  • Fail soft methods
  • The cost (in terms of time) of a recognition error
    is low
  • Hence the error is acceptable
  • The error will be detected and corrected at a
    later stage
  • The user can enter a correction mode to
    backtrack to the point where the error was made
12
  • Self-detection/correction of errors
  • The recognition system utilizes known task
    constraints (given database) to automatically
    detect and correct recognition errors
  • Ex. Spelling of a name from a finite list of names
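
As a rough illustration of using a task constraint to correct errors, the sketch below snaps a misrecognized spelled name onto the closest entry of a finite name list using Python's standard difflib module. The name list, the cutoff value, and the function name are assumptions made for the example; a real system would work on acoustic scores rather than on letter strings.

```python
import difflib

# Hypothetical finite list of names known to the task.
DIRECTORY = ["RABINER", "JUANG", "LEVINSON", "SONDHI"]

def correct_spelled_name(letters, names=DIRECTORY, cutoff=0.6):
    """Return the directory name closest to the recognized letters.

    Returns None when no name is similar enough, so the caller can
    fall back to another error-handling method.
    """
    hypothesis = "".join(letters).upper()
    matches = difflib.get_close_matches(hypothesis, names, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

For instance, if the letter B is misrecognized as D, the string R-A-D-I-N-E-R is still resolved to the only plausible directory entry.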

13
  • Verification or multilevel decision before
    proceeding
  • The recognition system asks the user for help
    whenever competing candidate strings have high
    likelihood scores and it is difficult to resolve
    the small differences between them
  • The recognizer asks the user to verify the first
    choice decision; if it is not verified, the
    recognizer asks the user to verify the second
    choice
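
The verification step can be sketched as follows, assuming the recognizer returns a list of (string, score) candidates with higher scores being better. The margin parameter and the confirm callback (which would play a prompt such as "Did you say ...?" and return the user's yes/no answer) are hypothetical names introduced for this example.

```python
def verify_with_user(candidates, confirm, margin=0.05):
    """Pick a candidate, asking the user only when the top scores are close.

    candidates: list of (text, score) pairs, higher score is better.
    confirm:    callable taking a candidate text, returning True or False.
    margin:     score gap above which the first choice is accepted as is.
    """
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    if not ranked:
        return None
    # A clear winner is accepted without bothering the user.
    if len(ranked) == 1 or ranked[0][1] - ranked[1][1] > margin:
        return ranked[0][0]
    # Otherwise verify the first choice, then the second choice.
    for text, _score in ranked[:2]:
        if confirm(text):
            return text
    return None  # neither choice verified; hand over to another method
```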

14
  • Rejection/pass on to operator
  • By recording all spoken inputs in digital format,
    the system can reduce the error rate by rejecting
    a small but finite percentage of the spoken
    strings, and passing on such strings to a human
    operator who makes the final decision based on
    listening to the spoken input
  • By using all four techniques, the accuracy of the
    speech recognizer approaches 100%
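
A minimal sketch of the rejection idea, assuming each recognition result carries a confidence value between 0 and 1 and an identifier for its stored recording. The threshold value and the tuple layout are assumptions for illustration only.

```python
def triage(results, threshold=0.8):
    """Split results into accepted strings and recordings for the operator.

    results: list of (recording_id, hypothesis, confidence) tuples.
    Returns (accepted, to_operator), where accepted holds hypotheses and
    to_operator holds the ids of recordings a human should listen to.
    """
    accepted = []
    to_operator = []
    for recording_id, hypothesis, confidence in results:
        if confidence >= threshold:
            accepted.append(hypothesis)
        else:
            # Low confidence: reject and pass the recording to a human.
            to_operator.append(recording_id)
    return accepted, to_operator
```

Raising the threshold lowers the error rate among accepted strings at the price of sending more recordings to the operator.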

15
Broad classes of speech recognition applications
  • Five broad classes
  • Office or business system
  • Manufacturing
  • Telephone or telecommunications
  • Medical
  • Other

16
  • Office or business system
  • Data entry
  • Database management and control
  • Keyboard enhancement
  • Manufacturing
  • Eyes-free, hands-free monitoring of manufacturing
    for quality control

17
  • Telephone or telecommunications
  • Many applications are feasible over dialed-up
    telephone lines
  • Automation of operator assisted services
  • Telemarketing
  • Call distribution by voice

18
  • Medical
  • The primary application is voice creation and
    editing of specialized medical reports
  • Other
  • Voice controlled and operated games and toys
  • Voice recognition aids for the handicapped
  • Voice control in a moving vehicle
  • Climate control

19
Command and control applications
  • The user can control machines using simple
    commands
  • Voice repertory dialer: allows a caller to place
    calls by speaking the name of someone in the
    repertory (a stored list of names) rather than
    dialing the digit code.
  • Used in mobile phones and within cars (eyes-free
    and hands-free)

20
  • A repertory dialer needs a speaker-trained set of
    vocabulary patterns corresponding to the repertory
    names (and their phone numbers)
  • It also needs a speaker-independent set of
    vocabulary patterns corresponding to the digits
    and a set of command words for controlling normal
    telephone features (off-hook, dial, repeat, hang up)

21
Automated call type recognition
  • The automation of operator-assisted calls
  • Ex. Calls made from a pay phone that normally
    require operator assistance, including collect
    calls, person-to-person calls, third-party
    billing calls, operator-assisted calls and credit
    card calls

22
  • There are five options for this service, so a
    vocabulary of only five command words is adequate
  • "Collect" to make collect calls
  • "Person" to make person-to-person calls
  • "Third number" to make third-party billing calls
  • "Operator" to make operator-assisted calls
  • "Calling card" to make calling card calls

23
  • The system is speaker independent and can work
    over the standard dialed-up telephone network
  • If the customer obeys the voice prompt and speaks
    one of the command words, the accuracy of the
    system is more than 99%
  • Customers have to use the specific command words
  • Otherwise, a keyword spotting technique has to
    be used to find the command words embedded
    within the sentence
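
The idea of spotting a command embedded in a sentence can be illustrated on text. Note that real keyword spotting operates on the acoustic signal; the sketch below only searches a transcript string for the five command words listed earlier, and the function name and the rule of returning None for ambiguous input are assumptions made for the example.

```python
# The five command words or phrases of the call-type service.
COMMANDS = ("collect", "person", "third number", "operator", "calling card")

def spot_command(transcript):
    """Return the single command found in the transcript, else None.

    None is returned both when no command is present and when more
    than one different command is present (ambiguous request).
    """
    words = [w.strip(".,!?") for w in transcript.lower().split()]
    padded = " " + " ".join(words) + " "
    # Match whole words or phrases only, by padding with spaces.
    hits = [c for c in COMMANDS if f" {c} " in padded]
    return hits[0] if len(hits) == 1 else None
```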

24
Call distribution by voice commands
  • A call is placed that would normally be answered
    by an operator, who then distributes the call to
    the appropriate location (person) based on the
    user's responses to the questions asked by the
    attendant
  • In this application the attendant function is
    automated via voice processing

25
  • The voice response system poses a series of
    menu-based questions and, based on the user's
    responses, routes the call appropriately
  • Ex. Railway system
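
Menu-based routing can be sketched as a walk down a tree of prompts. The menu contents below (reservations, enquiries, and so on) are invented purely for illustration and do not describe any actual railway system; the recognized responses are assumed to arrive as plain words.

```python
# Hypothetical two-level menu for a railway call-distribution service.
MENU = {
    "prompt": "Say 'reservations' or 'enquiries'.",
    "options": {
        "reservations": {"route": "reservations desk"},
        "enquiries": {
            "prompt": "Say 'arrivals' or 'departures'.",
            "options": {
                "arrivals": {"route": "arrivals desk"},
                "departures": {"route": "departures desk"},
            },
        },
    },
}

def route_call(responses, menu=MENU):
    """Follow the recognized responses down the menu tree.

    Returns the destination, or None if a response is not a valid
    option or the caller stopped before reaching a destination.
    """
    node = menu
    for word in responses:
        if "route" in node:
            break  # destination already reached; ignore extra words
        node = node["options"].get(word.lower())
        if node is None:
            return None
    return node.get("route")
```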

26
Directory listing retrieval
27
  • Provides access to directory information from a
    spoken, spelled name
  • To access the directory information for a name in
    the directory, the user spells the name, using the
    word "stop" after the last name and after the
    initials, as in Rabiner-stop-LR-stop
  • The speech recognizer determines the name in the
    given directory which best matches the spoken
    input and then speaks the directory information
    for that name to the user.

28
  • Due to similar-sounding letters there may be
    errors, but the telephone directory provides a
    task syntax that automatically detects and
    corrects improperly recognized letters
  • The system can handle common misspellings of names
    with a single insertion or deletion of a letter
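
Tolerance to a single inserted or deleted letter can be checked directly on letter strings, as in the sketch below. The function names and the sample directory are hypothetical, and an actual system would combine this constraint with the recognizer's letter scores rather than apply it to a final string.

```python
def within_one_indel(spoken, name):
    """True if the strings are equal or differ by one inserted/deleted letter."""
    if spoken == name:
        return True
    short, long_ = sorted((spoken, name), key=len)
    if len(long_) - len(short) != 1:
        return False
    # Removing some single letter from the longer string must give the shorter.
    return any(long_[:i] + long_[i + 1:] == short for i in range(len(long_)))

def lookup(spoken, directory):
    """Return all directory names reachable with at most one indel."""
    spoken = spoken.upper()
    return [name for name in directory if within_one_indel(spoken, name)]
```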

29
Credit card sales validation
  • A merchant may need credit card validation without
    having an automatic card reader or modem dialer
  • The merchant must then call a specific number and
    provide an attendant with a 10-digit merchant
    identification number, a 15-digit credit card
    number and the amount of the transaction in rupees

30
  • In this case a speech recognition system uses a
    connected digit recognizer to recognize the
    merchant identification number and credit card
    number, and a connected word recognizer for the
    transaction amount
  • For the amount, the vocabulary size for recognition
    is larger than that needed for the credit card
    number
  • This is because the same amount can be spoken in
    various ways (Rs. 137 as "rupees one three seven",
    "rupees one thirty seven", etc.)
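
To show why the amount needs a larger vocabulary than the digit strings, the sketch below maps several spoken forms of the same amount onto one number. It is a simplified illustration for small amounts only; the word lists, the set of ignored filler words, and the function names are assumptions, and it handles at most one "hundred".

```python
UNITS = {w: i for i, w in enumerate(
    "zero one two three four five six seven eight nine".split())}
TEENS = {w: i + 10 for i, w in enumerate(
    "ten eleven twelve thirteen fourteen fifteen sixteen "
    "seventeen eighteen nineteen".split())}
TENS = {w: (i + 2) * 10 for i, w in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split())}
FILLER = {"rupees", "rs", "and"}  # words ignored by this sketch

def _concat(tokens):
    """Join number words digit-group by digit-group ('one thirty seven' -> '137')."""
    out = ""
    i = 0
    while i < len(tokens):
        t = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if t in TENS and nxt in UNITS and nxt != "zero":
            out += str(TENS[t] + UNITS[nxt])  # e.g. thirty seven -> 37
            i += 2
            continue
        if t in TENS:
            out += str(TENS[t])
        elif t in TEENS:
            out += str(TEENS[t])
        elif t in UNITS:
            out += str(UNITS[t])
        else:
            raise ValueError(f"unknown word: {t}")
        i += 1
    return out

def spoken_amount(phrase):
    """Convert a spoken amount to an integer number of rupees."""
    tokens = [t.strip(".,") for t in phrase.lower().replace("-", " ").split()]
    tokens = [t for t in tokens if t and t not in FILLER]
    if "hundred" in tokens:
        i = tokens.index("hundred")
        head, tail = tokens[:i], tokens[i + 1:]
        return int(_concat(head) or "1") * 100 + int(_concat(tail) or "0")
    return int(_concat(tokens))
```

All of "rupees one three seven", "Rs. one thirty seven" and "one hundred and thirty seven" come out as 137, which is the normalization the connected word recognizer's output would need before validation.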

31
End of Chapter 9