Title: A Survey of Large Margin Hidden Markov Model
- Xinwei Li, Hui Jiang
- York University
Reference Papers
- Xinwei Li, M.S. thesis, Sep. 2005: Large Margin HMMs for SR
- Xinwei Li, ICASSP 2005: Large Margin HMMs for SR
- Chaojun Liu, ICASSP 2005: Discriminative Training of CDHMMs for Maximum Relative Separation Margin
- Xinwei Li, ASRU 2005: A Constrained Joint Optimization Method for LME
- Hui Jiang, SAP 2006: Large Margin HMMs for SR
- Jinyu Li, ICSLP 2006: Soft Margin Estimation of HMM Parameters
Outline
- Large Margin HMMs
- Analysis of Margin in CDHMM
- Optimization methods for Large Margin HMM estimation
- Soft Margin Estimation for HMM
Large Margin HMMs for ASR
- In ASR, given any speech utterance X, a speech recognizer chooses the word W as output based on the plug-in MAP decision rule given below.
- For a speech utterance Xi, assuming its true word identity is Wi, the multiclass separation margin for Xi is defined in terms of the discriminant function, where Ω denotes the set of all possible words.
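- The decision rule, discriminant function, and margin equations were images in the original slides; the following is a sketch of the standard definitions used in the cited LME papers (the notation λ_W for the HMM of word W and Λ for the whole parameter set is assumed here):

  \hat{W} = \arg\max_{W \in \Omega} \; p(W)\, p(X \mid \lambda_W)   (plug-in MAP decision rule)

  F(X_i \mid \lambda_W) = \log\big[ p(W)\, p(X_i \mid \lambda_W) \big]   (discriminant function)

  d(X_i) = F(X_i \mid \lambda_{W_i}) \;-\; \max_{W \in \Omega,\, W \neq W_i} F(X_i \mid \lambda_W)   (separation margin)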
Large Margin HMMs for ASR
- According to statistical learning theory (Vapnik), the generalization error rate of a classifier on new test sets is theoretically bounded by a quantity related to its margin.
- Motivated by the large margin principle, even for those utterances in the training set that all have positive margin, we may still want to maximize the minimum margin to build an HMM-based large margin classifier for ASR.
Large Margin HMMs for ASR
- Given a set of training data D = {X1, X2, ..., XT}, we usually know the true word identities of all utterances in D, denoted L = {W1, W2, ..., WT}.
- First, from all utterances in D, we identify a subset of utterances S, where ε > 0 is a preset positive number (see the sketch below).
- We call S the support vector set; each utterance in S is called a support token and has a relatively small positive margin among all utterances in the training set D.
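- A sketch of the support set definition implied above (the exact slide equation is not recoverable):

  S = \{\, X_i \;\mid\; X_i \in D,\; 0 \le d(X_i) \le \varepsilon \,\}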
Large Margin HMMs for ASR
- This idea leads to estimating the HMM models Λ based on the criterion of maximizing the minimum margin of all support tokens, which is named large margin estimation (LME) of HMMs.
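- A sketch of the LME criterion stated above (Λ denotes the whole set of CDHMM parameters):

  \tilde{\Lambda} = \arg\max_{\Lambda} \; \min_{X_i \in S} \; d(X_i \mid \Lambda)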
Analysis of Margin in CDHMM
- Adopting the Viterbi method to approximate the summation over all paths by the single optimal Viterbi path, the discriminant function can be expressed as sketched below.
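- The expanded form was an equation image; a sketch of the usual Viterbi-approximated discriminant function (transition probabilities a, mixture weights ω, and the optimal state/mixture sequences s*, l* are assumed notation):

  F(X_i \mid \lambda_W) \approx \log p(W) + \sum_{t=1}^{T_i} \Big[ \log a_{s_{t-1}^{*} s_t^{*}} + \log \omega_{s_t^{*} l_t^{*}} + \log \mathcal{N}\big(x_{it} ;\, \mu_{s_t^{*} l_t^{*}},\, \Sigma_{s_t^{*} l_t^{*}}\big) \Big]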
Analysis of Margin in CDHMM
- Here, we only consider estimating the mean vectors. In this case, the discriminant function can be represented as a summation of quadratic terms related to the mean values of the CDHMMs.
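- A sketch of this representation, assuming diagonal-covariance Gaussian mixtures as in the cited papers; "const" collects every term that does not depend on the means:

  F(X_i \mid \lambda_W) \approx \text{const} - \frac{1}{2} \sum_{t=1}^{T_i} \sum_{d=1}^{D} \frac{\big(x_{itd} - \mu_{s_t^{*} l_t^{*} d}\big)^{2}}{\sigma_{s_t^{*} l_t^{*} d}^{2}}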
Analysis of Margin in CDHMM
- As a result, the decision margin can be represented in a standard diagonal quadratic form.
- Thus, for each feature vector xit, we can divide all of its dimensions into two parts; each feature dimension contributes to the decision margin separately.
Analysis of Margin in CDHMM
- After some algebraic manipulation, the margin separates into a linear function and a quadratic function of the mean parameters, as sketched below.
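- A sketch of the decomposition referred to above (μ_td, σ²_td denote the mean/variance aligned to frame t and dimension d by the true model, ν_td, τ²_td those aligned by the best competing model; these symbol names are assumptions):

  d(X_i) \approx \sum_{t=1}^{T_i} \sum_{d=1}^{D} \left[ \frac{(x_{itd} - \nu_{td})^{2}}{2\,\tau_{td}^{2}} - \frac{(x_{itd} - \mu_{td})^{2}}{2\,\sigma_{td}^{2}} \right] + \text{const}

- Expanding the squares, each dimension contributes a term that is linear in the means, x_{itd}\,\mu_{td}/\sigma_{td}^{2} - x_{itd}\,\nu_{td}/\tau_{td}^{2}, plus a term that is quadratic in the means, \nu_{td}^{2}/(2\tau_{td}^{2}) - \mu_{td}^{2}/(2\sigma_{td}^{2}).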
Optimization methods for LM HMM estimation
- An iterative localized optimization method
- A constrained joint optimization method
- A semidefinite programming method
Iterative localized optimization
- To increase the margin without limit while keeping the margins positive for all samples, both of the competing models must be moved together.
- If we keep one of the models fixed, the other model cannot be moved very far under the constraint that all samples must have positive margin; otherwise the margin of some tokens would become negative.
- Instead of optimizing the parameters of all models at the same time, only one selected model is adjusted in each optimization step.
- The process then iterates, updating another model, until the optimal margin is achieved.
Iterative localized optimization
- How do we select the target model in each step?
- The model should be the one relevant to the support token with the minimum margin.
- The minimax optimization can then be re-formulated as sketched below.
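- The re-formulated objective was an equation image; a sketch of the localized update consistent with the description (λ_ν is the single model selected at iteration n, all other models are held fixed at their current values):

  \lambda_{\nu}^{(n+1)} = \arg\max_{\lambda_{\nu}} \; \min_{X_i \in S} \; d\big(X_i \mid \lambda_{\nu},\, \{\lambda_{W}^{(n)}\}_{W \neq \nu}\big)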
Iterative localized optimization
- The minimum-margin objective is approximated by a summation of exponential functions.
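- A sketch of this smoothing (the standard "softmin" approximation; η > 0 is an assumed scaling constant controlling the tightness of the approximation):

  \min_{X_i \in S} d(X_i) \approx -\frac{1}{\eta} \log \sum_{X_i \in S} \exp\big(-\eta\, d(X_i)\big)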
Constrained Joint optimization
- Introduce some constraints to make the optimization problem bounded.
- In this way, the optimization can be performed jointly with respect to all model parameters.
Constrained Joint optimization
- One constraint is introduced to bound the margin contribution from the linear part.
- A second constraint is introduced to bound the margin contribution from the quadratic part (see the sketch below).
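- The exact constraints are given in the ASRU'05 paper and did not survive this export; one illustrative construction (the radius R and the variance normalization are assumptions) is a norm ball on the variance-normalized means, which bounds the quadratic contribution directly and the linear contribution via the Cauchy-Schwarz inequality:

  \sum_{k} \sum_{d} \frac{\mu_{kd}^{2}}{\sigma_{kd}^{2}} \le R^{2}, \quad k \text{ ranging over all Gaussian components being updated}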
Constrained Joint optimization
- Reformulate the large margin estimation as the following constrained minimax optimization problem.
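- A sketch of the constrained problem (with the constraints introduced above):

  \tilde{\Lambda} = \arg\max_{\Lambda} \; \min_{X_i \in S} \; d(X_i \mid \Lambda) \quad \text{subject to the constraints,} \qquad \text{equivalently} \quad \arg\min_{\Lambda} \; \max_{X_i \in S} \big[ -d(X_i \mid \Lambda) \big]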
Constrained Joint optimization
- The constrained minimization problem can be transformed into an unconstrained minimization problem.
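- One common device for this step (a sketch; the paper's exact transformation may differ) is to fold the constraints into the smoothed objective as penalty terms, with β > 0 an assumed penalty weight and g_j(Λ) ≤ 0 standing for the j-th constraint:

  \tilde{\Lambda} = \arg\min_{\Lambda} \; \frac{1}{\eta} \log \sum_{X_i \in S} \exp\big(-\eta\, d(X_i \mid \Lambda)\big) + \beta \sum_{j} \max\big(0,\, g_j(\Lambda)\big)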
Soft Margin estimation
- Model separation measure and frame selection
- SME objective function and sample selection
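- A sketch of the SME objective as commonly stated in the cited ICSLP'06 paper (ρ is the soft margin, λ a balancing coefficient, d(X_i) the frame-selected separation measure, N the number of training utterances):

  \min_{\Lambda,\, \rho} \; \frac{\lambda}{\rho} + \frac{1}{N} \sum_{i=1}^{N} \max\big(0,\; \rho - d(X_i \mid \Lambda)\big)

- Only samples with d(X_i) ≤ ρ incur a loss, which is the sample selection referred to above; in practice ρ is often preset rather than optimized jointly (see the next slide).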
Soft Margin estimation
- Difference between SME and LME:
- LME neglects the misclassified samples; consequently, LME often needs a very good preliminary estimate from the training set.
- SME works on all the training data, both the correctly classified and the misclassified samples.
- SME, however, must first choose a margin ρ heuristically.