Title: CSE 552/652
Slide 1: CSE 552/652
- Hidden Markov Models for Speech Recognition
- Spring, 2005
- Oregon Health & Science University
- OGI School of Science & Engineering
- John-Paul Hosom
- Lecture Notes for April 20
- Alternative Duration Modeling;
  Initializing an HMM
Slide 2: Review: Viterbi Search
(1) Initialization
(2) Recursion
Slide 3: Review: Viterbi Search
(3) Termination
(4) Backtracking
Note: usually this algorithm is done in the log domain, to avoid underflow errors.
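For reference, the standard forms of these four steps (in Rabiner's notation) are:

    \delta_1(i) = \pi_i \, b_i(o_1), \qquad \psi_1(i) = 0
    \delta_t(j) = \max_{1 \le i \le N} \left[ \delta_{t-1}(i)\, a_{ij} \right] b_j(o_t), \qquad
    \psi_t(j) = \arg\max_{1 \le i \le N} \left[ \delta_{t-1}(i)\, a_{ij} \right]
    P^* = \max_{1 \le i \le N} \delta_T(i), \qquad q_T^* = \arg\max_{1 \le i \le N} \delta_T(i)
    q_t^* = \psi_{t+1}(q_{t+1}^*), \qquad t = T-1, T-2, \dots, 1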
Slide 4: Duration Modeling (Rabiner 6.9)
An exponential (geometric) duration distribution is implicit in the transition probabilities.
However, a phoneme tends to have, on average, a Gamma-shaped duration distribution.
[Figure: probability of being in a phoneme as a function of duration, comparing the implicit exponential decay with a Gamma distribution.]
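To make the implicit distribution explicit: with self-transition probability a_ii, the probability of remaining in state i for exactly d frames is (a standard result, stated here as a reminder)

    p_i(d) = a_{ii}^{\,d-1} (1 - a_{ii}),

which decays exponentially (geometrically) with d.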
Slide 5: Duration Modeling: the Semi-Markov Model
One method of correction is a semi-Markov model (also called a Continuously Variable Duration HMM or an Explicit State-Duration Density HMM).
[Figure: a two-state standard HMM (states S1, S2; self-loops a11, a22; transitions a12, a21; one observation ot emitted per state visit) compared with a two-state semi-Markov model (no self-loops; duration densities pS1(d), pS2(d); each visit to a state emits a block of observations ot, ot+1, ..., ot+d-1).]
Note: self-loops are not allowed in an SMM.
In an SMM, one state generates multiple (d) observation vectors; the probability of generating exactly d vectors is determined by the function pj(d). This function may be continuous (e.g. Gamma) or discrete.
Slide 6: Duration Modeling: the Semi-Markov Model
Assume that r states have been visited during the first t observations, with states Q = q1, q2, ..., qr having durations d1, d2, ..., dr such that d1 + d2 + ... + dr = t. Then the probability of being in state i at time t and observing Q is given below, where pq(d) describes the probability of being in state q exactly d times.
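In standard form (following Rabiner's formulation, with q_r = i), this probability is:

    \pi_{q_1}\, p_{q_1}(d_1) \prod_{s=1}^{d_1} b_{q_1}(o_s) \;\cdot\;
    a_{q_1 q_2}\, p_{q_2}(d_2) \prod_{s=d_1+1}^{d_1+d_2} b_{q_2}(o_s) \;\cdots\;
    a_{q_{r-1} q_r}\, p_{q_r}(d_r) \prod_{s=t-d_r+1}^{t} b_{q_r}(o_s)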
Slide 7: Duration Modeling: the Semi-Markov Model
This makes the Viterbi search look like the recursion below, where D is the maximum duration allowed by any pj(d).
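In standard form (again following Rabiner), the semi-Markov Viterbi recursion is:

    \delta_t(j) = \max_{i \ne j}\; \max_{1 \le d \le D}
    \left[ \delta_{t-d}(i)\; a_{ij}\; p_j(d) \prod_{s=t-d+1}^{t} b_j(o_s) \right]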
Advantages of the SMM: better modeling of phonetic durations.
Disadvantages: a D²/2 increase in computation time (as claimed by Rabiner); fewer data with which to estimate aij; more parameters (pj(d)) to compute.
Slide 8: Duration Modeling: the Semi-Markov Model
Example:

              state M   state H   state L
    P(sun)      0.4       0.75      0.25
    P(rain)     0.6       0.25      0.75

    πM = 0.50   πH = 0.20   πL = 0.30

[Figure: state diagram with the transition and duration probabilities; the loose values 0.3, 0.2, 0.2, 0.1, 0.1, 0.1 belong to this diagram.]

What is the probability of the observation sequence s s r s r (s = sun, r = rain) and the state sequence M(d=3) H(d=1) L(d=1)?

    0.5 × 0.3 × (0.4 × 0.4 × 0.6) × 0.5 × 0.1 × 0.75 × 0.3 × 0.1 × 0.75
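Reading the product factor by factor (this mapping of numbers to parameters is an interpretation of the arithmetic; the transition values a_MH = 0.5 and a_HL = 0.3 are inferred from it rather than shown in the table above):

    \pi_M\, p_M(3)\, \big[b_M(s)\, b_M(s)\, b_M(r)\big]\, a_{MH}\, p_H(1)\, b_H(s)\, a_{HL}\, p_L(1)\, b_L(r)
    \approx 1.2 \times 10^{-5}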
Slide 9: Duration Modeling: Duration Penalties
Duration penalties: assume uniform transition probabilities, but then apply penalties if, during the search, the hypothesized duration is shorter or longer than specified limits:

    pj(d) = penalty_long^(dj - maxdurj)    if dj > maxdurj
          = penalty_short^(mindurj - dj)   if dj < mindurj
          = 1.0                            otherwise

where penalty_long and penalty_short are values less than 1.0, dj is the hypothesized duration of state j, and mindurj and maxdurj are duration limits specific to state j. (These are likelihoods instead of probabilities.) This is no longer guaranteed to find the best state sequence, but usually does.
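A minimal sketch of this penalty in Python (the function name and the example limit values are illustrative, not part of the course code):

    def duration_penalty(d, min_dur, max_dur, penalty_short=0.5, penalty_long=0.5):
        """Likelihood-style duration penalty: < 1.0 outside [min_dur, max_dur]."""
        if d > max_dur:
            return penalty_long ** (d - max_dur)    # shrinks as duration overshoots
        if d < min_dur:
            return penalty_short ** (min_dur - d)   # shrinks as duration undershoots
        return 1.0                                  # no penalty within the limits

    # e.g. a state with limits [3, 10] frames:
    # duration_penalty(12, 3, 10) == 0.25;  duration_penalty(5, 3, 10) == 1.0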
Slide 10: Duration Modeling
Does duration modeling matter?
No: no matter which type of duration model you use, you get similar ASR performance.
Yes: duration can be critical to phonemic distinction, and all HMM (and SMM, etc.) systems lack the ability to model this.
Slide 11: How To Start Training an HMM?
- Q1: How do we compute initial πi and aij values?
- Assign random, equally-likely, or other values.
  (This works fine for πi and aij, but not for bj(ot).)
[Figure: waveform of "yes" with phoneme labels pau, y, E, s, pau.]
Slide 12: How To Start Training an HMM?
Q2: How do we create initial bj(ot) values?
- Initializing bj(ot) requires segmentation of the training data.
- (2a) Don't worry about the content of the training data; divide it into equal-length segments and compute bj(ot) for each segment. This is a "flat start" (a small sketch follows).
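A minimal sketch of a flat start in Python, assuming the utterance's feature frames are in a NumPy array (the function and variable names are illustrative):

    import numpy as np

    def flat_start_segments(features, n_states):
        """Divide an utterance's frames into n_states equal-length segments."""
        boundaries = np.linspace(0, len(features), n_states + 1).astype(int)
        return [features[boundaries[i]:boundaries[i + 1]] for i in range(n_states)]

    # Each segment's frames are then used to estimate that state's initial bj(ot),
    # e.g. a Gaussian with the segment's mean and variance.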
Slide 13: How To Start Training an HMM?
- Initializing bj(ot) requires segmentation of the training data.
- (2b) Better solution:
  Use manually-aligned data, if available. Split each phoneme into X equal parts to create X states per phoneme.
[Figure: manually-aligned waveform of "yes" with labels pau, y, E1, E2, s, pau (the E phoneme split into two states).]
Slide 14: How To Start Training an HMM?
- Initializing bj(ot) requires segmentation of the training data.
- (2c) Intermediate solution:
  Use force-aligned data. We know the phoneme sequence, so use Viterbi on an existing HMM to determine the best alignment.
Slide 15: How To Start Training an HMM?
- Given a segmentation corresponding to one state, split that segment (state) into mixture components using VQ (a small sketch follows). The clusters may be independent of time!
[Figure: 2-dimensional feature vectors from one state clustered into 3 groups.]
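A minimal sketch of this VQ step, written as a simple k-means clustering with NumPy (the names, the number of iterations, and the assumption of float-valued frames are illustrative; any VQ implementation would do):

    import numpy as np

    def vq_cluster(frames, n_mixtures, n_iters=20, seed=0):
        """Cluster one state's feature frames into n_mixtures groups (simple k-means)."""
        rng = np.random.default_rng(seed)
        centers = frames[rng.choice(len(frames), n_mixtures, replace=False)]
        for _ in range(n_iters):
            # assign each frame to its nearest center
            dists = ((frames[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = dists.argmin(axis=1)
            # move each center to the mean of its assigned frames
            for m in range(n_mixtures):
                if np.any(labels == m):
                    centers[m] = frames[labels == m].mean(axis=0)
        return labels, centers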
Slide 16: How To Start Training an HMM?
- For each mixture component in each segment, compute the means and the diagonals of the covariance matrices (a small sketch follows below).

    okmd(t) = the dth dimension of observation o(t) assigned to the mth mixture component in the kth state

    Cov(X,Y) = E[(X - μX)(Y - μY)] = E[XY] - μX μY
    Cov(X,X) = E[X²] - μX² = ΣX²/N - (ΣX/N)² = σ²(X)
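A minimal NumPy sketch of this computation, using the sum and sum-of-squares form of the variance shown above (names are illustrative):

    import numpy as np

    def mixture_stats(frames):
        """Mean and diagonal covariance of the frames assigned to one mixture component."""
        n = len(frames)
        mean = frames.sum(axis=0) / n
        # diagonal of the covariance matrix: E[X^2] - (E[X])^2, per dimension
        diag_cov = (frames ** 2).sum(axis=0) / n - mean ** 2
        return mean, diag_cov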
Slide 17: How To Start Training an HMM?
- Q3: How do we improve the initial aij and bj(ot) estimates?
- Viterbi segmentation (k-means segmentation), sketched below:
  1. Assume training data and an initial model.
  2. Use Viterbi to determine the best state sequence through the data.
  3. For each state (segment):
     - for each observation, assign o(t) to the most likely mixture component using bj(ot)
     - update cjm, μjm, Σjm, aij
  4. If the new model is very different from the current model,
     set the current model to the new model and go to (2).
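A minimal Python-style sketch of this loop; viterbi_align, reestimate, and model_distance are hypothetical helper functions supplied by the caller, not code provided with the course:

    def viterbi_segmentation_training(model, data, viterbi_align, reestimate,
                                      model_distance, tol=1e-3, max_iters=20):
        """Segmental k-means: realign with Viterbi, re-estimate, repeat until the
        model stops changing."""
        for _ in range(max_iters):
            alignments = [viterbi_align(model, utt) for utt in data]   # step 2
            new_model = reestimate(model, data, alignments)            # step 3
            if model_distance(new_model, model) < tol:                 # step 4
                return new_model
            model = new_model
        return model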
Slide 18: How To Start Training an HMM?
- How does assignment and updating work?
  1. VQ to create clusters; a cluster's weight is the ratio of points in the cluster to the total points in the state.
  2. Estimate bj() by computing means and covariances.
  3. Perform a Viterbi search to get the best state alignment.
[Figure: scatter plot of feature points within one state; after re-alignment, the white points are reassigned to the neighboring state.]
Slide 19: How To Start Training an HMM?
- How does assignment and updating work? (continued)
  4. Assign each observation to the mixture component that yields the greatest probability of that observation.
  5. Update the means, covariances, mixture weights, and transition probabilities.
  6. Repeat from (3) until convergence.
Slide 20: How To Start Training an HMM?
- How is updating done? There are two cases:
- Discrete HMM (VQ)
- Continuous HMM (GMM)
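For reference, the standard Viterbi-style (count-based) re-estimation formulas for these two cases are as follows (the usual forms, not necessarily the exact notation used on the original slide).

Continuous HMM (GMM):

    \hat{a}_{ij} = \frac{\#\{\text{transitions from state } i \text{ to state } j\}}{\#\{\text{transitions out of state } i\}}

    \hat{c}_{jm} = \frac{N_{jm}}{\sum_{m'} N_{jm'}}, \qquad
    \hat{\mu}_{jm} = \frac{1}{N_{jm}} \sum_{o(t) \in (j,m)} o(t), \qquad
    \hat{\Sigma}_{jm} = \frac{1}{N_{jm}} \sum_{o(t) \in (j,m)} \big(o(t) - \hat{\mu}_{jm}\big)\big(o(t) - \hat{\mu}_{jm}\big)^{\top}

where N_{jm} is the number of observations assigned to mixture m of state j.

Discrete HMM (VQ):

    \hat{b}_j(k) = \frac{N_{jk}}{N_j}

where N_{jk} is the number of state-j observations that quantize to codeword k and N_j is the total number of observations in state j.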
Slide 21: How To Start Training an HMM?
A 2-state HMM (the states correspond to the phonemes "y" and "E"); each state has 2 mixture components, and each observation has 2 dimensions.
Use a flat start to select the initial states, then use VQ to cluster into the initial 4 groups.
Slide 22: How To Start Training an HMM?
Compute aij and bj(); use Viterbi to segment the utterance; re-cluster the points according to the highest probability.
Slide 23: How To Start Training an HMM?
Re-compute aij and bj(), re-segment; re-compute aij and bj(), re-segment; ... eventually the segmentation converges.
Slide 24: How To Start Training an HMM?
Viterbi segmentation can be used to bootstrap another method, EM, for locally maximizing the likelihood P(O|λ). We'll talk later about implementing EM using the forward-backward (also known as Baum-Welch) procedure. Then embedded training will relax one of the constraints for further improvement.
All methods provide a locally-optimal solution; there is no known globally-optimal (closed-form) solution for HMM parameter estimation. The better the initial estimates of λ (in particular bj(ot)), the better the final result.
Slide 25: Multiple Training Files
So far, we've implicitly assumed a single set of observations for training. Most systems are trained on multiple sets of observations (files). This makes it necessary to use accumulators.
Initialize an HMM:
  for each file:
    compute initial state boundaries (e.g. flat start)
    add information to the accumulators (sum, sum of squares, count)
  compute the mean and standard deviation for each GMM
[Figure: three training files, each aligned to the phoneme sequence .pau y eh s .pau.]
Slide 26: Multiple Training Files
Iteratively improve an HMM (see the sketch below):
  for each iteration:
    reset the accumulators
    for each file:
      get state parameter info. from the previous iteration
      add new state information to the accumulators
    compute the mean and standard deviation for each GMM
    update the estimates of the state parameters
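A minimal sketch of the accumulator idea for one Gaussian (the class and method names are illustrative; a real system keeps one accumulator per mixture component of each state):

    import numpy as np

    class GaussianAccumulator:
        """Accumulates sum, sum of squares, and count across many files,
        then produces a mean and standard deviation at the end of the pass."""
        def __init__(self, dim):
            self.sum = np.zeros(dim)
            self.sum_sq = np.zeros(dim)
            self.count = 0

        def add(self, frames):                 # frames: (n_frames, dim) from one file
            self.sum += frames.sum(axis=0)
            self.sum_sq += (frames ** 2).sum(axis=0)
            self.count += len(frames)

        def finish(self):
            mean = self.sum / self.count
            std = np.sqrt(self.sum_sq / self.count - mean ** 2)
            return mean, std

    # Per iteration: re-create the accumulators, call add() once per file with the
    # frames aligned to this state, then call finish() to update the model.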
Slide 27: Viterbi Search Project
- Second project: given an existing HMM, implement a Viterbi search to find the likelihood of an utterance and the best state sequence.
- Template code is available to read in features, read in HMM values, and provide some context and a starting point.
- The features that will be given to you are real, in that they are 7 PLP coefficients plus 7 delta values from utterances of "yes" and "no", sampled every 10 msec.
- Also given to you is the logAdd() function, but you must implement the multi-dimensional GMM code (see the formula from Lecture 5, slide 18; a reminder of the standard form follows below). Assume a diagonal covariance matrix.
- All necessary files (template, HMM, vector files) are located on the class web site.
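For reference, the standard diagonal-covariance GMM observation likelihood (check Lecture 5, slide 18 for the exact form used in this course) is

    b_j(o_t) = \sum_{m=1}^{M} c_{jm} \prod_{d=1}^{D}
    \frac{1}{\sqrt{2\pi\sigma_{jmd}^2}}
    \exp\!\left( -\frac{(o_{td} - \mu_{jmd})^2}{2\sigma_{jmd}^2} \right)

with D = 14 dimensions here (7 PLP + 7 delta); in the log domain, the sum over mixture components is computed with logAdd().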
Slide 28: Viterbi Search Project
- Search the files with the HMMs for "yes" and "no", and print out the final likelihood scores and most likely state sequences:
    input1.txt hmm_yes.10    input1.txt hmm_no.10
    input2.txt hmm_yes.10    input2.txt hmm_no.10
    input3.txt hmm_yes.10    input3.txt hmm_no.10
- Then, use the results to perform ASR:
  (1) Is input1.txt more likely to be "yes" or "no"?
  (2) Is input2.txt more likely to be "yes" or "no"?
  (3) Is input3.txt more likely to be "yes" or "no"?
- Due on May 9: send your source code and results (including final scores and most likely state sequences) to hosom at cslu.ogi.edu; late responses are generally not accepted.