Linear Dynamic Model (LDM) for Automatic Speech Recognition - PowerPoint PPT Presentation

1
Linear Dynamic Model (LDM) for Automatic Speech Recognition
PhD Candidate: Tao Ma
Advised by: Dr. Joseph Picone
Institute for Signal and Information Processing (ISIP), Mississippi State University
2
  • An Example of a Kalman Filter (the LDM is also known
    as a Kalman filter model)
  • In control systems engineering, the Kalman filter
    has been used successfully to model systems with
    noisy observations

Figure: A Kalman filter models the evolution of position from noisy observations
  • Filtering: position at the present time (removes the noise effect)
  • Predicting: position at a future time
  • Smoothing: position at a time in the past
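
The filtering and predicting tasks in the figure can be made concrete with a small sketch. The following Python is a minimal scalar Kalman filter that tracks a position from noisy readings. It assumes a random-walk motion model with made-up noise variances (q, r) and an initial guess (x0, p0). The function name kalman_track is hypothetical. Smoothing requires an additional backward pass, so it is left out here.

```python
def kalman_track(observations, q=0.01, r=1.0, x0=0.0, p0=10.0):
    """Scalar Kalman filter under an assumed random-walk position model:
        x_t = x_{t-1} + process noise       (variance q)
        y_t = x_t     + observation noise   (variance r)
    Returns the filtered positions and a (mean, variance) prediction for
    the next time step."""
    x, p = x0, p0
    filtered = []
    for y in observations:
        # Predict: the random walk keeps the mean; uncertainty grows by q.
        p = p + q
        # Update (filtering): blend prediction and observation via the gain k.
        k = p / (p + r)
        x = x + k * (y - x)
        p = (1.0 - k) * p
        filtered.append(x)
    # Predicting a future position: under a random walk the best guess is the
    # latest filtered mean, with uncertainty increased by q.
    return filtered, (x, p + q)


noisy = [5.8, 4.1, 5.5, 4.7, 5.3, 4.9, 5.2]  # made-up readings of a position near 5
estimates, (next_mean, next_var) = kalman_track(noisy)
print(estimates[-1], next_mean, next_var)
```

The filtered estimate settles close to 5, even though individual readings are off by as much as 0.9.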
3
  • Outline
  • Why Linear Dynamic Model (LDM)?
  • Linear Dynamic Model
  • Pilot experiment: LDM phone classification on
    Aurora 4
  • Hybrid HMM/LDM decoder architecture for LVCSR
  • Future work

4
  • HMM Speech Recognition System

Figure: HMM-based speech recognition system (Hidden Markov Models)
5
  • Is HMM a perfect model for ASR?
  • Progress on improving the accuracy of HMM-based
    systems has slowed in the past decade
  • Theoretical drawbacks of HMMs
  • False assumption that frames are conditionally
    independent (given the state) and stationary
  • Spatial correlation between feature dimensions is
    ignored (diagonal covariance matrices)
  • Limited discrete state space

Figure: recognition accuracy over time on clean and noisy data
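
The cost of a diagonal covariance matrix can be shown numerically. The following Python sketch scores two 2-D feature vectors under two Gaussians. The first has a full covariance with strongly correlated dimensions. The second has the same variances, but the correlation is dropped. The means, covariances and points are made-up values, and gauss2_logpdf is a hypothetical helper.

```python
import math


def gauss2_logpdf(x, mean, cov):
    """Log density of a 2-D Gaussian with covariance [[a, b], [b, d]]."""
    (a, b), (_, d) = cov
    det = a * d - b * b
    dx0, dx1 = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form dx^T cov^{-1} dx, using the closed-form 2x2 inverse.
    quad = (d * dx0 * dx0 - 2.0 * b * dx0 * dx1 + a * dx1 * dx1) / det
    return -0.5 * (quad + math.log(det) + 2.0 * math.log(2.0 * math.pi))


mean = (0.0, 0.0)
full = [[1.0, 0.9], [0.9, 1.0]]   # correlated feature dimensions
diag = [[1.0, 0.0], [0.0, 1.0]]   # same variances, correlation ignored

for point in [(1.0, 1.0), (1.0, -1.0)]:
    print(point, gauss2_logpdf(point, mean, full), gauss2_logpdf(point, mean, diag))
```

The full-covariance model scores (1, 1) much higher than (1, -1). The diagonal model assigns both points exactly the same score because it cannot see the correlation.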
6
  • Motivation of Linear Dynamic Model (LDM) Research
  • Motivation
  • A model that reflects the characteristics of
    speech signals will ultimately lead to large ASR
    performance improvements
  • The LDM incorporates the correlation between
    frames of speech signals, which has the potential
    to increase recognition accuracy
  • The filtering characteristic of the LDM has the
    potential to improve the noise robustness of
    speech recognition
  • Fast-growing computational capacity (thanks to
    Intel) makes it realistic to build a two-way
    HMM/LDM hybrid speech engine

7
  • State Space Model
  • The Linear Dynamic Model (LDM) is derived from the
    State Space Model
  • Equations of the State Space Model

where
  • y: observation feature vector
  • x: corresponding internal state vector
  • h(): function relating y and x at the current time
  • f(): function relating the current state to all
    previous states
  • epsilon, eta: noise components
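
The state space equations can be written as follows. This is a reconstruction under two assumptions. First, the noise is taken to be additive. Second, epsilon is assumed to enter the observation equation and eta the state equation, following the order of the definitions above.

```latex
\begin{aligned}
y_t &= h(x_t) + \epsilon_t \\
x_t &= f(x_{t-1}, x_{t-2}, \ldots, x_1) + \eta_t
\end{aligned}
```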
8
  • Linear Dynamic Model
  • Equations of the Linear Dynamic Model (LDM)
  • The current state is determined only by the
    previous state
  • H, F are linear transform matrices
  • epsilon and eta are driving (noise) components

where
  • y: observation feature vector
  • x: corresponding internal state vector
  • H: linear transform matrix between y and x
  • F: linear transform matrix between the current
    state and the previous state
  • epsilon, eta: driving components
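
Making h() and f() linear gives the LDM equations as they are commonly written: x_t = F x_{t-1} + eta_t and y_t = H x_t + epsilon_t. The following Python sketch draws a state sequence and an observation sequence from such a model. It makes three assumptions: the noise is zero-mean, the noise is isotropic Gaussian, and H and F are small made-up matrices. simulate_ldm is a hypothetical name.

```python
import random


def simulate_ldm(H, F, x0, steps, state_std, obs_std, seed=0):
    """Sample from a linear dynamic model (noise model is an assumption):
        x_t = F x_{t-1} + eta_t,      eta_t     ~ N(0, state_std^2 I)
        y_t = H x_t     + epsilon_t,  epsilon_t ~ N(0, obs_std^2 I)
    H and F are lists of rows; x0 is the initial state."""
    rng = random.Random(seed)

    def matvec(M, v):
        return [sum(m * vi for m, vi in zip(row, v)) for row in M]

    x = list(x0)
    states, observations = [], []
    for _ in range(steps):
        # State equation: the new state depends only on the previous state.
        x = [xi + rng.gauss(0.0, state_std) for xi in matvec(F, x)]
        # Observation equation: a noisy linear projection of the current state.
        y = [yi + rng.gauss(0.0, obs_std) for yi in matvec(H, x)]
        states.append(x)
        observations.append(y)
    return states, observations


# Hypothetical 2-D state observed through a 3-D feature vector.
F = [[0.9, 0.1], [0.0, 0.8]]
H = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
states, obs = simulate_ldm(H, F, x0=[1.0, 1.0], steps=5, state_std=0.05, obs_std=0.1)
```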
9
  • Kalman filtering for state inference (E-Step of
    EM training)

Figure: for a speech sound, the human sound production system compared with the Kalman filtering estimate
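
The forward Kalman pass fits in a few lines of Python. For readability, this sketch assumes a scalar state, a scalar observation and zero-mean Gaussian noise. In the vector case, the products become matrix products and dividing by s becomes multiplying by a matrix inverse. The function and argument names are illustrative. Besides the filtered state estimates, the pass also gives the log-likelihood of the segment.

```python
import math


def kalman_filter(ys, h, f, q, r, m0, v0):
    """Forward Kalman filter for a scalar LDM (assumed 1-D state and observation):
        x_t = f * x_{t-1} + eta_t,      eta_t ~ N(0, q)
        y_t = h * x_t + epsilon_t,      epsilon_t ~ N(0, r)
    The first state has prior N(m0, v0). Returns filtered means, filtered
    variances, and the log-likelihood log p(y_1, ..., y_T)."""
    means, variances = [], []
    loglik = 0.0
    m_pred, v_pred = m0, v0
    for t, y in enumerate(ys):
        if t > 0:
            # Time update: push the previous filtered estimate through f.
            m_pred = f * means[-1]
            v_pred = f * f * variances[-1] + q
        # Innovation (prediction error) and its variance.
        e = y - h * m_pred
        s = h * h * v_pred + r
        loglik += -0.5 * (math.log(2.0 * math.pi * s) + e * e / s)
        # Measurement update with Kalman gain k.
        k = v_pred * h / s
        means.append(m_pred + k * e)
        variances.append((1.0 - k * h) * v_pred)
    return means, variances, loglik


means, variances, ll = kalman_filter([0.9, 1.1, 0.8, 1.0], h=1.0, f=0.95,
                                     q=0.05, r=0.2, m0=0.0, v0=1.0)
```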
10
  • RTS smoother for better inference
  • Rauch-Tung-Striebel (RTS) smoother
  • An additional backward pass to minimize inference
    error
  • During EM training, it computes the expected
    state statistics

Figure: standard Kalman filter vs. Kalman filter with RTS smoother
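
The following Python is a minimal scalar sketch of the RTS backward pass. It includes a compact forward pass so that the block runs on its own. It assumes a scalar LDM with zero-mean Gaussian noise, and its names are illustrative. A full E-step also needs the lag-one cross-covariance E[x_t x_{t-1}], which this sketch omits.

```python
def kalman_rts(ys, h, f, q, r, m0, v0):
    """Kalman filter plus Rauch-Tung-Striebel smoother for an assumed scalar LDM
    x_t = f x_{t-1} + eta_t (var q), y_t = h x_t + epsilon_t (var r), with
    prior N(m0, v0) on the first state. Returns filtered and smoothed
    means and variances."""
    mf, vf = [], []   # filtered moments
    mp, vp = [], []   # one-step predicted moments
    for t, y in enumerate(ys):
        m_pred = m0 if t == 0 else f * mf[-1]
        v_pred = v0 if t == 0 else f * f * vf[-1] + q
        k = v_pred * h / (h * h * v_pred + r)
        mp.append(m_pred)
        vp.append(v_pred)
        mf.append(m_pred + k * (y - h * m_pred))
        vf.append((1.0 - k * h) * v_pred)
    # Backward pass: revise each filtered estimate using the smoothed estimate
    # of the next frame, so every frame uses the whole segment.
    ms, vs = mf[:], vf[:]
    for t in range(len(ys) - 2, -1, -1):
        j = vf[t] * f / vp[t + 1]  # smoother gain
        ms[t] = mf[t] + j * (ms[t + 1] - mp[t + 1])
        vs[t] = vf[t] + j * j * (vs[t + 1] - vp[t + 1])
    return mf, vf, ms, vs


mf, vf, ms, vs = kalman_rts([1.0, 2.0, 0.5, 1.5], h=1.0, f=0.9,
                            q=0.1, r=0.5, m0=0.0, v0=1.0)
```

The smoothed variances are never larger than the filtered ones. The last frame is unchanged because no later observations exist to refine it.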
11
  • Maximum Likelihood Parameter Estimation (M-Step
    of EM training)

Figure: LDM parameters estimated for each phone (e.g. aa, ae, ah, ao, aw, ay, b, ch, d, dh, eh, er)
Nothing but matrix multiplication!
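
Once the smoother has produced the expected state statistics, the M-step updates have a closed form. The following Python shows the standard maximum-likelihood updates for a scalar LDM. It assumes zero-mean noise; an LDM may also estimate noise means. In the vector case, each division becomes a multiplication by a matrix inverse, for example H = (sum y_t E[x_t]^T)(sum E[x_t x_t^T])^{-1}. The function name m_step is hypothetical.

```python
def m_step(ys, ex, exx, exx_lag):
    """Closed-form M-step for an assumed scalar LDM with zero-mean noise.
    ys[t]      : observation y_t
    ex[t]      : E[x_t]          (from the smoother)
    exx[t]     : E[x_t^2]
    exx_lag[t] : E[x_t x_{t-1}]  (only entries t >= 1 are used)
    Returns updated (h, f, r, q)."""
    T = len(ys)
    # Observation map: correlation of y with x over the power of x.
    h = sum(y * m for y, m in zip(ys, ex)) / sum(exx)
    # State transition: lag-one correlation over the power of x_{t-1}.
    f = sum(exx_lag[1:]) / sum(exx[:-1])
    # Noise variances: what the updated maps leave unexplained.
    r = sum(y * y - h * y * m for y, m in zip(ys, ex)) / T
    q = (sum(exx[1:]) - f * sum(exx_lag[1:])) / (T - 1)
    return h, f, r, q
```

As a sanity check, suppose the states are known exactly, so E[x_t] = x_t and E[x_t^2] = x_t^2, and the data are noise-free. Then the updates recover h and f exactly, and both noise variances come out as zero.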

12
  • LDM for Speech Classification


Figure: HMM-based recognition vs. LDM-based recognition. In both
panels, MFCC features are scored against phone models (e.g. aa, ch,
eh) to produce a hypothesis; the LDM panel labels the hidden state x
and the observation y.
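
LDM-based classification can be sketched as a simple procedure. Score each aligned phone segment under every phone's LDM, using the Kalman filter log-likelihood. Then pick the phone with the highest score. The following minimal Python sketch assumes scalar features and made-up parameters for three phones. ldm_loglik and classify_segment are hypothetical names. A real system would use MFCC vectors and EM-trained parameters.

```python
import math


def ldm_loglik(ys, h, f, q, r, m0, v0):
    """Log-likelihood of a segment under an assumed scalar LDM, computed with
    a Kalman filter by summing log N(e_t; 0, s_t) over the innovations."""
    m, v, ll = m0, v0, 0.0
    for t, y in enumerate(ys):
        if t > 0:
            m, v = f * m, f * f * v + q   # time update
        s = h * h * v + r                 # innovation variance
        e = y - h * m                     # innovation
        ll -= 0.5 * (math.log(2.0 * math.pi * s) + e * e / s)
        k = v * h / s
        m, v = m + k * e, (1.0 - k * h) * v  # measurement update
    return ll


def classify_segment(ys, models):
    """Score one phone segment under every phone's LDM; return the best label."""
    scores = {label: ldm_loglik(ys, **params) for label, params in models.items()}
    return max(scores, key=scores.get), scores


# Hypothetical per-phone parameters; real ones would come from EM training.
shared = dict(q=0.01, r=0.05, m0=1.0, v0=0.1)
models = {
    "aa": dict(h=1.0, f=0.9, **shared),
    "ch": dict(h=-1.0, f=0.9, **shared),
    "eh": dict(h=1.0, f=-0.9, **shared),
}
label, scores = classify_segment([1.0, 0.9, 0.81, 0.73], models)
```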

13
  • Challenges of Applying LDM to ASR
  • Segment-based model
  • Frame-to-phoneme alignment is needed before
    classification
  • EM training is sensitive to state initialization
  • Each phoneme is modeled by an LDM; EM training
    finds the set of parameters for that specific LDM
  • No good mechanism for state initialization yet
  • 23x more parameters than an HMM
  • Currently a monophone model; building a triphone
    model for LVCSR would need more training data

14
  • Pilot experiment: phone classification on Aurora 4
  • Aurora 4: Wall Street Journal data with six kinds
    of noise
  • Airport, Babble, Car, Restaurant, Street, and
    Train
  • Frame-to-phone alignments are generated by the
    ISIP decoder (forced alignment mode)
  • Adding a language model gives 93% accuracy on
    clean data
  • 40 phones, one-vs-all classifier

15
  • Hybrid HMM/LDM decoder architecture for LVCSR

Figure: hybrid decoder architecture (labels include Confidence Measurement and Best Hypothesis)
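
The figure's labels suggest a second pass in which LDM scores are combined with HMM scores to pick a best hypothesis and measure confidence. The following Python sketch assumes a generic N-best rescoring design with a linear interpolation of the two scores and a softmax-normalized confidence. This design is an assumption and is not taken from the ISIP decoder. rescore_nbest, its weight parameter and the ldm_score callable are all hypothetical.

```python
import math


def rescore_nbest(nbest, ldm_score, weight=0.5):
    """nbest: list of (hypothesis, hmm_log_score) pairs from a first HMM pass.
    ldm_score: hypothetical callable returning an LDM log score for a hypothesis.
    Returns the best hypothesis and a posterior-style confidence per hypothesis."""
    # Interpolate first-pass HMM scores with second-pass LDM scores.
    combined = [(hyp, (1.0 - weight) * hmm + weight * ldm_score(hyp))
                for hyp, hmm in nbest]
    # Softmax over combined scores (subtract the max for numerical stability).
    top = max(s for _, s in combined)
    z = sum(math.exp(s - top) for _, s in combined)
    confidences = {hyp: math.exp(s - top) / z for hyp, s in combined}
    best = max(confidences, key=confidences.get)
    return best, confidences


# Made-up scores for two competing hypotheses.
nbest = [("a b", -10.0), ("a d", -10.5)]
ldm = {"a b": -12.0, "a d": -9.0}
best, conf = rescore_nbest(nbest, ldm.get)
```

In this example, the HMM pass slightly prefers "a b". The LDM scores reverse that order, so "a d" ends up with the higher combined score and confidence.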
16
  • Status and future work
  • Development of the HMM/LDM hybrid decoder is
    still in progress
  • The HMM/LDM hybrid decoder is expected to be
    completed in 2009
  • The ISIP HMM/SVM hybrid decoder serves as the
    reference for the implementation
  • Future work
  • Research has demonstrated nonlinear effects in
    speech signals
  • Investigate the possibility of replacing Kalman
    filtering with nonlinear filtering (such as the
    Unscented Kalman Filter or Extended Kalman Filter)

17
  • Thank you!

Questions?
18
  • References
  • Digalakis, V., Segment-based Stochastic Models
    of Spectral Dynamics for Continuous Speech
    Recognition, Ph.D. Dissertation, Boston
    University, Boston, Massachusetts, USA, 1992.
  • Digalakis, V., Rohlicek, J. and Ostendorf, M.,
    ML Estimation of a Stochastic Linear System with
    the EM Algorithm and Its Application to Speech
    Recognition, IEEE Transactions on Speech and
    Audio Processing, vol. 1, no. 4, pp. 431-442,
    October 1993.
  • Frankel, J., Linear Dynamic Models for Automatic
    Speech Recognition, Ph.D. Dissertation, The
    Centre for Speech Technology Research, University
    of Edinburgh, Edinburgh, UK, 2003.