Isolated-Word Speech Recognition Using Hidden Markov Models

1
Isolated-Word Speech Recognition Using Hidden
Markov Models
  • 6.962 Week 10 Presentation

Irina Medvedev
Massachusetts Institute of Technology
April 19, 2001
2
Outline
  • Markov Processes, Chains, Models
  • Isolated-Word Speech Recognition
  • Feature Analysis
  • Unit Matching
  • Training
  • Recognition
  • Conclusions

3
Markov Process
  • The process, x(t), is first-order Markov if, for
    any set of ordered times, $t_1 < t_2 < \dots < t_n$,
    $p(x(t_n) \mid x(t_{n-1}), \dots, x(t_1)) = p(x(t_n) \mid x(t_{n-1}))$
  • The current value of a Markov process contains
    all of the memory necessary to predict the future
  • The past does not add any additional information
    about the future

4
Markov Process
  • The transition probability density provides an
    important statistical description of a Markov
    process and is defined as
    $p(x(t) \mid x(s))$ for $s < t$
  • A complete specification of a Markov process
    consists of
      • the first-order density, $p(x(t))$
      • the transition density, $p(x(t) \mid x(s))$

5
Markov Chains
  • A Markov chain can be used to describe a system
    which, at any time, belongs to one of N distinct
    states, $1, 2, \dots, N$
  • At regularly spaced times, the system may stay
    in the same state or transition to a different
    state
  • State at time t is denoted by $q_t$

Fully Connected Markov Model
6
Markov Chains
  • State transitions are made according to a set of
    probabilities associated with each state
  • These probabilities are stored in the state
    transition matrix $A = [a_{ij}]$, an $N \times N$ matrix,
    where N is the number of states in the Markov
    chain.
  • The state transition probabilities are
    $a_{ij} = P(q_t = j \mid q_{t-1} = i)$, $1 \le i, j \le N$,
    and have the properties $a_{ij} \ge 0$ and
    $\sum_{j=1}^{N} a_{ij} = 1$
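
The two properties of the transition matrix can be checked directly, and the matrix can then be used to draw a state sequence. The following is a minimal sketch; the 3-state matrix values and the function names are illustrative assumptions, not taken from the slides.

```python
import random

# Hypothetical 3-state transition matrix (illustrative values only).
# Row i holds P(next state = j | current state = i).
A = [
    [0.8, 0.1, 0.1],
    [0.2, 0.6, 0.2],
    [0.3, 0.3, 0.4],
]

def is_stochastic(matrix, tol=1e-9):
    """Check that every entry is non-negative and every row sums to 1."""
    return all(
        all(a >= 0.0 for a in row) and abs(sum(row) - 1.0) < tol
        for row in matrix
    )

def simulate_chain(matrix, start, steps, rng):
    """Draw a state sequence of length `steps`, beginning in state `start`."""
    states = [start]
    for _ in range(steps - 1):
        row = matrix[states[-1]]
        states.append(rng.choices(range(len(row)), weights=row)[0])
    return states

rng = random.Random(0)
path = simulate_chain(A, start=0, steps=10, rng=rng)
```

Each step looks only at the current state's row, which is the first-order Markov property in code form.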

7
Hidden Markov Models
  • Hidden Markov Models (HMMs) are used when the
    states are not observable events.
  • Instead, the observation is a probabilistic
    function of the state rather than the state
    itself
  • The states are described by a probability model
  • The HMM is a doubly embedded stochastic process

8
HMM Example: Coin Toss
  • How do we build an HMM to explain the observed
    sequence of heads and tails?
  • Choose a 2-state model
  • Several possibilities exist

1-coin model: states are observable
2-coin model: states are hidden
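
To make the 2-coin model concrete, the sketch below generates a head/tail sequence from two biased coins, where the choice of coin at each toss follows a Markov chain. All numeric values and names (`A`, `P_HEADS`, `PI`, `sample_hmm`) are assumptions chosen for illustration.

```python
import random

# Hypothetical parameters for a 2-coin model (illustrative values only).
A = [[0.7, 0.3],      # transition probabilities between coin 0 and coin 1
     [0.4, 0.6]]
P_HEADS = [0.9, 0.2]  # P(heads | coin 0), P(heads | coin 1)
PI = [0.5, 0.5]       # initial state distribution

def sample_hmm(length, rng):
    """Return (hidden coin sequence, observed H/T sequence)."""
    states, observations = [], []
    state = rng.choices([0, 1], weights=PI)[0]
    for _ in range(length):
        states.append(state)
        # The observation depends only on which coin is currently in use.
        observations.append("H" if rng.random() < P_HEADS[state] else "T")
        state = rng.choices([0, 1], weights=A[state])[0]
    return states, observations

states, observations = sample_hmm(20, random.Random(1))
```

An observer sees only `observations`; the list `states` is what the model treats as hidden.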
9
Hidden Markov Models
  • Hidden Markov Models are characterized by
      • N, the number of states in the model
      • A, the state transition matrix
      • B, the observation probability distribution in each state, $b_j(o)$
      • $\pi$, the initial state distribution
  • Model parameter set: $\lambda = (A, B, \pi)$

10
Left-Right HMM
  • Can only transition to a higher state or stay in
    the same state
  • The no-skip constraint allows a state to transition
    only to the next state or to remain in the same state
  • Zeros in the state transition matrix represent
    illegal state transitions

4-state left-right HMM with no skip transitions
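
A no-skip left-right transition matrix can be built as follows. This is a sketch under stated assumptions: the self-transition probability `stay_prob` and the choice to make the last state absorbing are illustrative, not values from the slides.

```python
def left_right_matrix(num_states, stay_prob=0.5):
    """Build a no-skip left-right transition matrix.

    Each state either stays put (probability stay_prob) or moves to the
    next state. The last state is absorbing. All other entries are zero,
    which marks the illegal transitions.
    """
    A = [[0.0] * num_states for _ in range(num_states)]
    for i in range(num_states - 1):
        A[i][i] = stay_prob
        A[i][i + 1] = 1.0 - stay_prob
    A[-1][-1] = 1.0
    return A

A = left_right_matrix(4)
# A == [[0.5, 0.5, 0.0, 0.0],
#       [0.0, 0.5, 0.5, 0.0],
#       [0.0, 0.0, 0.5, 0.5],
#       [0.0, 0.0, 0.0, 1.0]]
```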
11
Isolated-Word Speech Recognition
  • Recognize one word at a time
  • Assume incoming signal is of the form
    silence - speech - silence
  • Stages: Feature Analysis, Unit Matching, Training, Recognition

12
Feature Analysis
  • We perform feature analysis to extract
    observation vectors upon which all processing
    will be performed
  • The discrete-time speech signal is $s[n]$,
    $n = 0, \dots, V-1$, with discrete Fourier transform $S[k]$
  • To reduce the dimensionality of the V-dim speech
    vector, we use cepstral coefficients, which serve
    as the feature observation vector for all future
    processing

13
Cepstral Coefficients
  • Feature vectors are cepstral coefficients
    obtained from the sampled speech vector,
    $c[l] = \frac{1}{V} \sum_{k=0}^{V-1} \log \hat{P}[k] \, e^{j 2\pi k l / V}$,
    where $\hat{P}[k]$ is the periodogram estimate of
    the power spectral density of the speech
  • We eliminate the zeroth component and keep
    cepstral coefficients 1 through L-1
  • Dimensionality reduction
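
The cepstral computation can be sketched as the inverse DFT of the log periodogram. The code below uses a plain O(V²) DFT from the standard library for clarity. The function names, the `floor` guard against log(0), and the synthetic test frame are assumptions for illustration.

```python
import cmath
import math

def dft(x):
    """Direct discrete Fourier transform (O(V^2), fine for short frames)."""
    v = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / v) for t in range(v))
        for k in range(v)
    ]

def cepstral_coefficients(frame, num_coeffs, floor=1e-12):
    """Real cepstrum of one frame: inverse DFT of the log periodogram.

    Returns coefficients 1..num_coeffs; the zeroth coefficient is dropped.
    """
    v = len(frame)
    periodogram = [abs(s) ** 2 / v for s in dft(frame)]
    log_spec = [math.log(max(p, floor)) for p in periodogram]
    cepstrum = [
        sum(log_spec[k] * cmath.exp(2j * math.pi * k * q / v) for k in range(v)).real / v
        for q in range(num_coeffs + 1)
    ]
    return cepstrum[1:]

# Synthetic 64-sample frame (hypothetical stand-in for a speech frame).
frame = [math.sin(2 * math.pi * 5 * t / 64) + 0.5 * math.sin(2 * math.pi * 12 * t / 64)
         for t in range(64)]
features = cepstral_coefficients(frame, num_coeffs=12)
```

Keeping only the first few coefficients is where the dimensionality reduction happens: a 64-sample frame is summarized by 12 numbers here.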

14
Properties of Cepstral Coefficients
  • Serve to undo the convolution between the pitch
    excitation and the vocal tract response
  • High-order cepstral components carry
    speaker-dependent pitch information, which is not
    relevant for speech recognition
  • Cepstral coefficients are well approximated by a
    Gaussian probability density function (pdf)
  • Correlation values of cepstral coefficients are
    very low

15
Modeling of Cepstral Coefficients
  • HMM assumes that the Markovian states generate
    the cepstral vectors
  • Each state j represents a Gaussian source with
    mean vector $\mu_j$ and covariance matrix $\Sigma_j$
  • Each feature vector of cepstral coefficients can
    be modeled as a sample vector of an L-dim
    Gaussian random vector with mean vector $\mu_j$ and
    diagonal covariance matrix $\Sigma_j$

16
Formulation of the Feature Vectors
17
Unit Matching
  • Initial goal: obtain an HMM for each speech
    recognition unit
  • Large vocabulary (300 words): recognition
    units are phonemes
  • Small vocabulary (10 words): recognition
    units are words
  • We will consider an isolated-word speech
    recognition system for a small vocabulary of M
    words

18
Notation
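  • Observation vector is $O = (o_1, o_2, \dots, o_T)$, where each
    $o_t$ is a cepstral feature vector and $T$ is the
    number of feature vectors in an observation
  • State sequence is $q = (q_1, q_2, \dots, q_T)$, where each
    $q_t \in \{1, \dots, N\}$
  • State index: $j = 1, \dots, N$
  • Word index: $m = 1, \dots, M$
  • Time index: $t = 1, \dots, T$
  • The term model will be used for both the HMM and
    the parameter set describing the HMM, $\lambda$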

19
Training
  • We need to obtain an HMM for each of the M words
  • The process of building the HMMs is called
    training
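  • Each HMM is characterized by the number of
    states, N, and the model parameter set, $\lambda_m$
  • Each cepstral feature vector, $o_t$, in state
    $j$ can be modeled by an L-dim Gaussian pdf
    $b_j(o_t) = (2\pi)^{-L/2} |\Sigma_j|^{-1/2} \exp\left(-\tfrac{1}{2} (o_t - \mu_j)^T \Sigma_j^{-1} (o_t - \mu_j)\right)$,
    where $\mu_j$ is the mean vector and $\Sigma_j$ is the
    covariance matrix in state $j$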
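
With a diagonal covariance matrix, as assumed earlier in the deck, the Gaussian log-density reduces to a sum over the L dimensions. A minimal sketch follows; the function name and the use of the log domain are illustrative choices.

```python
import math

def log_gaussian_diag(o, mean, var):
    """Log of an L-dim Gaussian pdf with diagonal covariance diag(var).

    o, mean and var are equal-length sequences; var holds the per-dimension
    variances (the diagonal of the covariance matrix).
    """
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(o, mean, var)
    )

# Density of a 2-dim standard Gaussian evaluated at its mean.
value = log_gaussian_diag([0.0, 0.0], [0.0, 0.0], [1.0, 1.0])
```

Working with the log-density avoids numerical underflow when many frames are multiplied together later on.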

20
Training
  • A Gaussian pdf is completely characterized by
    the mean vector and covariance matrix
  • The model parameter set can be modified to
    $\lambda_m = (A, \mu_j, \Sigma_j, \pi)$
  • The training procedure is the same for each
    word. For convenience, we will drop the
    subscript $m$ from $\lambda_m$

21
Building the HMM
  • To build the HMM, we need to determine the
    parameter set that maximizes the likelihood of
    the observation for that word.
  • Objective: $\max_{\lambda} \max_{q} P(O, q \mid \lambda)$
  • The double maximization can be performed by
    optimizing over the state sequence and the model
    individually

22
Uniform Segmentation
Determining the initial state sequence
50 segments → 8 states
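
Uniform segmentation can be sketched as an even split of the frames across the states, in left-to-right order. The sketch assumes the 50 segments are the feature vectors of one utterance; the function name is hypothetical.

```python
def uniform_segmentation(num_frames, num_states):
    """Assign frame t to state floor(t * num_states / num_frames)."""
    return [t * num_states // num_frames for t in range(num_frames)]

states = uniform_segmentation(50, 8)
counts = [states.count(j) for j in range(8)]
# counts == [7, 6, 6, 6, 7, 6, 6, 6]
```

Since 50 is not a multiple of 8, some states receive 7 frames and others 6. The assignment is only a starting point that the later Viterbi step refines.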
23
Maximization over the Model
  • Given the initial state sequence, we maximize
    over the model
  • The maximization entails estimating the model
    parameters from the observation given the state
    sequence
  • Estimation is performed using the Baum-Welch
    re-estimation formulas

24
Re-estimation Formulas
$N_j$ is the number of feature vectors in state $j$
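
The formulas themselves did not survive in this transcript. As a hedged illustration only, the sketch below computes the usual estimates when each frame is assigned to exactly one state: per-state sample means, per-dimension variances, and transition frequencies. This hard-assignment form is an assumption and may differ from the formulas on the original slide.

```python
def reestimate_gaussians(observations, states, num_states):
    """Per-state mean and diagonal variance from a hard state assignment.

    Assumes every state has at least one frame assigned to it.
    """
    dim = len(observations[0])
    means, variances, counts = [], [], []
    for j in range(num_states):
        members = [o for o, q in zip(observations, states) if q == j]
        n_j = len(members)  # number of feature vectors in state j
        mean = [sum(o[d] for o in members) / n_j for d in range(dim)]
        var = [sum((o[d] - mean[d]) ** 2 for o in members) / n_j for d in range(dim)]
        means.append(mean)
        variances.append(var)
        counts.append(n_j)
    return means, variances, counts

def reestimate_transitions(states, num_states):
    """Transition probabilities as relative frequencies of observed moves."""
    moves = [[0] * num_states for _ in range(num_states)]
    for prev, cur in zip(states, states[1:]):
        moves[prev][cur] += 1
    # A state with no outgoing moves gets an all-zero row in this sketch.
    return [[c / sum(row) if sum(row) else 0.0 for c in row] for row in moves]

obs = [[0.0], [2.0], [10.0], [12.0]]
seq = [0, 0, 1, 1]
means, variances, counts = reestimate_gaussians(obs, seq, 2)
A = reestimate_transitions(seq, 2)
```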
25
Model Estimation
26
Maximization over the state sequence
  • Given the model, we maximize over the state
    sequence
  • The probability expression can be rewritten as
    $P(O, q \mid \lambda) = \pi_{q_1} b_{q_1}(o_1) \prod_{t=2}^{T} a_{q_{t-1} q_t} b_{q_t}(o_t)$

27
Maximization over the state sequence
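  • Applying the logarithm transforms the
    maximization of a product into a maximization of
    a sum,
    $\log \pi_{q_1} + \log b_{q_1}(o_1) + \sum_{t=2}^{T} \left[ \log a_{q_{t-1} q_t} + \log b_{q_t}(o_t) \right]$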
  • We are still looking for the state sequence that
    maximizes the expression
  • The optimal state sequence can be determined
    using the Viterbi algorithm

28
Trellis Structure of HMMs
  • Redrawing the HMM as a trellis makes it easy to
    see the state sequence as a path through the
    trellis
  • The optimal state sequence is determined by the
    Viterbi algorithm as the single best path that
    maximizes $P(O, q \mid \lambda)$
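
The best-path search through the trellis can be sketched in the log domain as follows. The input layout (`log_b[t][j]` holding the log observation density of frame t in state j) and the helper names are assumptions for illustration.

```python
import math

NEG_INF = float("-inf")

def safe_log(p):
    """log(p), with log(0) mapped to -inf so illegal moves are never chosen."""
    return math.log(p) if p > 0.0 else NEG_INF

def viterbi(log_b, A, pi):
    """Return the best state path and its log score.

    log_b[t][j] is log b_j(o_t); A and pi hold ordinary probabilities.
    """
    T, N = len(log_b), len(pi)
    delta = [[NEG_INF] * N for _ in range(T)]   # best score ending in (t, j)
    back = [[0] * N for _ in range(T)]          # best predecessor of (t, j)
    for j in range(N):
        delta[0][j] = safe_log(pi[j]) + log_b[0][j]
    for t in range(1, T):
        for j in range(N):
            best_i = max(range(N), key=lambda i: delta[t - 1][i] + safe_log(A[i][j]))
            back[t][j] = best_i
            delta[t][j] = delta[t - 1][best_i] + safe_log(A[best_i][j]) + log_b[t][j]
    # Trace the best path backwards from the best final state.
    last = max(range(N), key=lambda j: delta[T - 1][j])
    path = [last]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, delta[T - 1][last]

# Hypothetical 2-state left-right example with four frames.
A = [[0.5, 0.5], [0.0, 1.0]]
pi = [1.0, 0.0]
log_b = [[math.log(0.9), math.log(0.1)],
         [math.log(0.9), math.log(0.1)],
         [math.log(0.1), math.log(0.9)],
         [math.log(0.1), math.log(0.9)]]
path, score = viterbi(log_b, A, pi)
# path == [0, 0, 1, 1]
```

Each column of `delta` is one time step of the trellis, so the cost grows as T·N² rather than with the number of possible paths.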

29
Training Procedure
[Flowchart: Cepstral Calculation → Uniform Segmentation → Estimation of $\lambda$ (Baum-Welch) → State Sequence Segmentation (Viterbi) → Converged? No: repeat the estimation step. Yes: stop.]
30
Recognition
  • We have a set of HMMs, one for each word
  • Objective: choose the word model that maximizes
    the probability of the observation given the
    model (maximum likelihood detection rule)
  • Classifier for observation $O$ is
    $\hat{m} = \arg\max_{1 \le m \le M} P(O \mid \lambda_m)$
  • The likelihood can be written as a summation
    over all state sequences,
    $P(O \mid \lambda_m) = \sum_{q} P(O, q \mid \lambda_m)$
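
For a very short observation, the summation over all state sequences can be carried out by brute force and compared with the single best path. The sketch below is for illustration only, since the cost grows as N^T. The input layout (`obs_probs[t][j]` holding $b_j(o_t)$) and all numeric values are assumptions.

```python
import itertools

def joint_prob(obs_probs, A, pi, path):
    """P(O, q | lambda) for one state sequence q (given as `path`)."""
    p = pi[path[0]] * obs_probs[0][path[0]]
    for t in range(1, len(path)):
        p *= A[path[t - 1]][path[t]] * obs_probs[t][path[t]]
    return p

def all_paths(num_states, length):
    """Every possible state sequence of the given length."""
    return itertools.product(range(num_states), repeat=length)

def full_likelihood(obs_probs, A, pi):
    """Sum of P(O, q | lambda) over all state sequences."""
    return sum(joint_prob(obs_probs, A, pi, q)
               for q in all_paths(len(pi), len(obs_probs)))

def best_path_likelihood(obs_probs, A, pi):
    """Largest single term of that sum."""
    return max(joint_prob(obs_probs, A, pi, q)
               for q in all_paths(len(pi), len(obs_probs)))

# Hypothetical 2-state left-right model and three frames.
A = [[0.5, 0.5], [0.0, 1.0]]
pi = [1.0, 0.0]
obs_probs = [[0.9, 0.1], [0.9, 0.1], [0.1, 0.9]]
full = full_likelihood(obs_probs, A, pi)        # 0.243
best = best_path_likelihood(obs_probs, A, pi)   # 0.18225
```

In this example one path carries three quarters of the total likelihood, which is the kind of situation in which a best-path approximation works well.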

31
Recognition
  • Replace the full likelihood by an approximation
    that takes into account only the most probable
    state sequence capable of producing the
    observation
  • Treating the most probable state sequence as the
    best path in the HMM trellis allows us to use the
    Viterbi algorithm to maximize the above
    probability
  • The best-path classifier for observation $O$ is
    $\hat{m} = \arg\max_{1 \le m \le M} \max_{q} P(O, q \mid \lambda_m)$
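
A best-path classifier can be sketched by scoring the observation against each word model with the Viterbi recursion and selecting the maximum. The two word models, their parameter values, and the dictionary layout are hypothetical and serve only to make the example runnable.

```python
import math

NEG_INF = float("-inf")

def safe_log(p):
    return math.log(p) if p > 0.0 else NEG_INF

def log_gauss(o, mean, var):
    """Log of a diagonal-covariance Gaussian pdf."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
               for x, m, v in zip(o, mean, var))

def best_path_log_score(obs, model):
    """max over q of log P(O, q | model), via the Viterbi recursion."""
    A, pi = model["A"], model["pi"]
    means, variances = model["means"], model["vars"]
    N = len(pi)
    delta = [safe_log(pi[j]) + log_gauss(obs[0], means[j], variances[j])
             for j in range(N)]
    for o in obs[1:]:
        delta = [
            max(delta[i] + safe_log(A[i][j]) for i in range(N))
            + log_gauss(o, means[j], variances[j])
            for j in range(N)
        ]
    return max(delta)

def recognize(obs, models):
    """Return the word whose model gives the largest best-path score."""
    return max(models, key=lambda word: best_path_log_score(obs, models[word]))

# Two hypothetical 2-state word models with 1-dim features.
shared = {"A": [[0.5, 0.5], [0.0, 1.0]], "pi": [1.0, 0.0], "vars": [[1.0], [1.0]]}
models = {
    "yes": dict(shared, means=[[0.0], [5.0]]),
    "no": dict(shared, means=[[5.0], [0.0]]),
}
word = recognize([[0.1], [0.2], [4.9], [5.1]], models)
# word == "yes"
```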

32
Recognition
[Block diagram: Cepstral Calculation → Select Maximum → Index of recognized word]
33
Conclusion
  • Introduced hidden Markov models
  • Described process of isolated-word speech
    recognition
  • Feature vectors, unit matching, training, recognition
  • Other considerations
  • Artificial Neural Networks (ANNs) for speech
    recognition
  • Hybrid HMM/ANN models
  • Minimum classification error HMM design