Speech Recognition - PowerPoint PPT Presentation

Description: Some of the major open problems from an algorithmic viewpoint ... "how to recognize speech" vs. "how to wreck a nice beach", incomplete information ...

Provided by: csMon

Transcript and Presenter's Notes



1
Speech Recognition
  • Algorithmic Aspects in Speech Recognition, Adam
    L. Buchsbaum, Raffaele Giancarlo
  • Presents the main fields of speech recognition
  • The general problem areas:
  • Graph searching
  • Automata manipulation
  • Shortest-path finding
  • Finite-state automata minimization
  • Some of the major open problems from an
    algorithmic viewpoint:
  • Asymptotically efficient: handle very large
    instances
  • Practically efficient: run in real time

2
Block diagram of speech recognizer
3
IWR, CSR
  • IWR: Isolated Word Recognition
  • Words spoken in isolation and belonging to a
    fixed dictionary
  • Lexicon: typical pronunciations of each word in
    the dictionary
  • Search algorithm: output the word that maximizes
    a given objective function (likelihood of a word
    given the observation sequence)
  • CSR: Continuous Speech Recognition
  • Lexicon: same as IWR
  • Language model: gives a stochastic description of
    the language, and possibly a probabilistic
    description of which specific words can follow
    another word or group of words
  • Search algorithm: find a grammatically correct
    sentence that maximizes a given objective
    function (likelihood of a sentence given the
    observation sequence)
  • Coarticulation effects: "how to recognize
    speech" vs. "how to wreck a nice beach";
    incomplete information
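As a sketch of the IWR search step above, the toy Python below scores the observation sequence against one model per dictionary word and outputs the argmax. The template-distance scorer is a hypothetical stand-in for a real acoustic model (e.g. a per-word HMM), not something from the talk:

```python
# Toy isolated-word recognizer: each dictionary word has a model that
# assigns a (log-)likelihood to the observation sequence; the search
# algorithm simply outputs the highest-scoring word.
def recognize_isolated_word(observations, word_models):
    """word_models: dict mapping word -> function(obs) -> log-likelihood."""
    return max(word_models, key=lambda w: word_models[w](observations))

def template_scorer(template):
    # Illustrative scorer: negative squared distance to a reference
    # template, used here as a crude log-likelihood proxy.
    def score(obs):
        d = sum((a - b) ** 2 for a, b in zip(obs, template))
        d += abs(len(obs) - len(template))  # length-mismatch penalty
        return -d
    return score

models = {
    "yes": template_scorer([1.0, 2.0, 3.0]),
    "no":  template_scorer([3.0, 2.0, 1.0]),
}
print(recognize_isolated_word([1.1, 2.1, 2.9], models))  # closer to "yes"
```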

4
Major methods for speech recognition
  • Template-based approach
  • Small dictionaries, mainly for IWR
  • Reference templates (a sequence of feature
    vectors representing a unit of speech to be
    recognized)
  • Distance measure
  • e.g., log spectral distance, likelihood
    distortions
  • Stochastic approach (maximum likelihood)
  • Dominant method
  • Notation:
  • X: observation sequence
  • W: unknown sentence
  • Output the sentence W* that maximizes Pr(W|X):
  • Pr(W*|X) = max_W Pr(W|X)
  • Pr(W|X) = Pr(X|W) Pr(W) / Pr(X)
  • W* = argmax_W Pr(X|W) Pr(W)
    for fixed X
  • (argmax_W f(W) is the W* such that f(W*) =
    max_W f(W))
  • Defn: Cs = -log Pr, e.g., Cs(W) = -log Pr(W)
  • W* = argmin_W [Cs(W) + Cs(X|W)]
  • Solution of the equation:
  • Language Modeling and Acoustic
    Modeling
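The equivalence between maximizing Pr(X|W) Pr(W) and minimizing the summed costs Cs(W) + Cs(X|W) can be checked on a toy example. The probabilities below are illustrative, not from any real model:

```python
import math

# With Cs(.) = -log Pr(.), argmax_W Pr(X|W) Pr(W) equals
# argmin_W [Cs(W) + Cs(X|W)], since -log is strictly decreasing.
language_model = {"how to recognize speech": 0.6,    # Pr(W), illustrative
                  "how to wreck a nice beach": 0.4}
acoustic_model = {"how to recognize speech": 0.05,   # Pr(X|W), illustrative
                  "how to wreck a nice beach": 0.04}

def cost(p):
    return -math.log(p)

best = min(language_model,
           key=lambda w: cost(language_model[w]) + cost(acoustic_model[w]))
print(best)  # the sentence with the largest product Pr(X|W) Pr(W)
```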

5
Modeling Tools
  • HMM (Hidden Markov Model)
  • Quintuple λ = (N, M, A, B, π)
  • N: the number of states
  • M: the number of symbols that each state can
    output or recognize
  • A: N×N state-transition matrix; a(i,j) is the
    probability of moving from state i to state j
  • B: observation probability distribution; b_i(d)
    is the probability of recognizing or generating
    the symbol d when in state i
  • π: the initial state probability distribution,
    such that π_i is the probability of being in
    state i at time 1
  • MS (Markov Source)
  • E: transitions between states
  • V: set of states
  • Σ: alphabet, including the null symbol
  • One-to-one mapping M from E to V×Σ×V
  • M(t) = (i, a, j):
  • i is the predecessor state of t
  • t outputs symbol a
  • j is the successor state of t
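The HMM quintuple λ = (N, M, A, B, π) can be written down directly. A minimal NumPy sketch with illustrative numbers, plus a forward-recursion check that the model assigns a probability to an observation sequence:

```python
import numpy as np

# A tiny HMM matching the quintuple lambda = (N, M, A, B, pi) above;
# all numbers are illustrative.
N, M = 2, 3                       # 2 states, 3 observable symbols
A  = np.array([[0.7, 0.3],        # a(i,j): state-transition probabilities
               [0.4, 0.6]])
B  = np.array([[0.5, 0.4, 0.1],   # b_i(d): symbol-emission probabilities
               [0.1, 0.3, 0.6]])
pi = np.array([0.8, 0.2])         # initial state distribution

# Sanity checks: every distribution sums to 1.
assert np.allclose(A.sum(axis=1), 1.0)
assert np.allclose(B.sum(axis=1), 1.0)
assert np.isclose(pi.sum(), 1.0)

# Pr(X | lambda) for the symbol sequence X = [0, 2] via the forward
# recursion: alpha_t(j) = sum_i alpha_{t-1}(i) a(i,j) b_j(x_t).
obs = [0, 2]
alpha = pi * B[:, obs[0]]
for d in obs[1:]:
    alpha = (alpha @ A) * B[:, d]
print(alpha.sum())
```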

6
Viterbi
  • Viterbi Algorithm
  • Compute the optimal state sequence Q =
    (q_1, .., q_T) through λ that matches X (that
    is, maximize Pr(Q|X, λ))
  • δ_t(i): the highest probability along a single
    path that accounts for the first t observations
    and ends in state i
  • ψ_t(i): the state at time t-1 that led to
    state i at time t along that path
  • Initialization
  • Induction
  • Termination
  • Backtracking
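The four steps above can be sketched as a short Python/NumPy implementation; the two-state model at the bottom is illustrative, not from the talk:

```python
import numpy as np

# Minimal Viterbi decoder. delta[t, i] is the best single-path
# probability of the first t+1 observations ending in state i;
# psi[t, i] stores the back-pointer (best predecessor of i at time t).
def viterbi(obs, A, B, pi):
    T, N = len(obs), len(pi)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = pi * B[:, obs[0]]                  # initialization
    for t in range(1, T):                         # induction
        trans = delta[t - 1][:, None] * A         # trans[i, j]
        psi[t] = trans.argmax(axis=0)
        delta[t] = trans.max(axis=0) * B[:, obs[t]]
    q = [int(delta[-1].argmax())]                 # termination
    for t in range(T - 1, 0, -1):                 # backtracking
        q.append(int(psi[t][q[-1]]))
    return list(reversed(q)), delta[-1].max()

A  = np.array([[0.7, 0.3], [0.4, 0.6]])
B  = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
pi = np.array([0.8, 0.2])
path, p = viterbi([0, 1, 2], A, B, pi)
print(path, p)
```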

7
Acoustic Word Models via Acoustic Phone Models
  • Tree representation
  • Static data structure
  • Lexicon
  • Over the alphabet of feature vectors

8
MS, HMM
  • Circles represent states; arcs represent
    transitions.
  • Arcs are labeled f/p, denoting that the
    associated transition outputs phone f and occurs
    with probability p
  • For each phone f in the alphabet, build an HMM:
  • A directed graph having a minimum of four and a
    maximum of seven states, with exactly one source,
    one sink, self-loops, and no back arcs
  • Gives an acoustic model describing the different
    ways in which one can pronounce the given phone
  • Technically, this HMM is a device for computing
    how likely it is that a given observation
    sequence acoustically matches the given phone
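The topology just described (one source, one sink, self-loops, no back arcs) can be sketched as a left-to-right transition matrix. The loop/advance probabilities are illustrative, and real phone HMMs may also allow skip arcs:

```python
import numpy as np

# Left-to-right HMM transition matrix: each non-final state either
# stays put (self-loop) or advances to the next state; the sink absorbs.
# "No back arcs" means the matrix is upper triangular.
def left_to_right_hmm(n_states, stay=0.5):
    A = np.zeros((n_states, n_states))
    for i in range(n_states - 1):
        A[i, i] = stay           # self-loop
        A[i, i + 1] = 1 - stay   # advance to the next state
    A[-1, -1] = 1.0              # sink state absorbs
    return A

A = left_to_right_hmm(5)
assert np.allclose(A, np.triu(A))       # no back arcs
assert np.allclose(A.sum(axis=1), 1.0)  # rows are distributions
print(A.shape)
```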

9
MS HMM
10
Conclusion
  • Language Model:
  • Pr(W) = Pr(w_1 .. w_j) =
    Pr(w_1) Pr(w_2|w_1) .. Pr(w_j|w_1 .. w_{j-1})
  • Approximation: Pr(w_j|w_1 .. w_{j-1}) ≈
    Pr(w_j|w_{j-k+1} .. w_{j-1})
  • 20,000 words, k = 2: 400 million vertices and
    arcs in the model
  • Possible solution: group words into equivalence
    classes (how to divide them?)
  • Heuristic approach
  • Layered solution
  • Shortest-path finding
  • Automata manipulation
  • Redundancy problem and size reduction
  • Training with efficiency
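The k = 2 (bigram) approximation above can be estimated from counts. A minimal sketch on a toy corpus (real language models add smoothing for unseen pairs, which is omitted here):

```python
from collections import defaultdict

# Bigram language model: Pr(w_j | w_1 .. w_{j-1}) ~= Pr(w_j | w_{j-1}),
# estimated as count(prev, cur) / count(prev). Toy training corpus.
corpus = ["how to recognize speech".split(),
          "how to wreck a nice beach".split()]

bigram_counts = defaultdict(lambda: defaultdict(int))
for sentence in corpus:
    for prev, cur in zip(sentence, sentence[1:]):
        bigram_counts[prev][cur] += 1

def bigram_prob(prev, cur):
    total = sum(bigram_counts[prev].values())
    return bigram_counts[prev][cur] / total if total else 0.0

print(bigram_prob("how", "to"))        # "how" is always followed by "to"
print(bigram_prob("to", "recognize"))  # "to" splits between two successors
```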

11
Application
  • AT&T Watson Advanced Speech Application Platform:
    http://www.att.com/aspg/blasr.html
  • BBN Speech Products:
    http://www.bbn.com/speech_prods/
  • DragonDictate from Dragon Systems, Inc.:
    http://www.dragonsys.com/
