An Initial Study on Minimum Phone Error Discriminative Learning of Acoustic Model for Mandarin Large Vocabulary Continuous Speech Recognition
- Jen-Wei Kuo
- National Taiwan Normal University
Outline
- Minimum Phone Error (MPE) Training
- Objective Function From Minimum Overall Risk
- Function Maximization
- Update Formula
- MAP updates (I-Smoothing)
- MPE Linear Transform based Adaptation
- Model Space Adaptation (MPE-LR)
- Feature Space Adaptation (MPE-LT)
- Experiments
Notations
Minimum Overall Risk
- Loss function
- Overall risk
- Classifier design: conventional MAP decoding, hypothesis testing, WER minimization (sausage), MBR recognition
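The decision-theoretic setup behind this slide can be sketched as follows (a standard formulation; notation assumed: $O$ the acoustic observations, $W$ a candidate word sequence, $\ell$ a loss function):

```latex
\mathcal{R}(W \mid O) \;=\; \sum_{W'} \ell(W, W')\, P(W' \mid O),
\qquad
\hat{W} \;=\; \arg\min_{W}\; \mathcal{R}(W \mid O)
```

With the zero-one loss, minimizing the risk reduces to conventional MAP decoding, $\hat{W} = \arg\max_W P(W \mid O)$; with a Levenshtein loss over a sausage (confusion network) it yields WER-minimizing MBR recognition.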
Overall Risk Criterion Estimation
- ORCE was first proposed by Na (Eurospeech '95), using the zero-one loss function.
- Kaiser introduced the Levenshtein distance as the loss function in ORCE (ICSLP '00, Speech Communication '02).
average loss of all hypotheses
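In training rather than decoding, ORCE minimizes this average loss over the training set; a sketch of the criterion (notation assumed: $O_r$ the $r$-th training utterance, $s_r$ its reference, $s$ ranging over hypotheses):

```latex
\mathcal{F}_{\mathrm{ORCE}}(\lambda)
\;=\; \sum_{r=1}^{R} \sum_{s} P_\lambda(s \mid O_r)\, \ell(s, s_r)
```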
Maximize the Average Accuracy
- ORCE can be regarded as maximizing the average accuracy of all possible hypotheses.
- It tries to increase the weight of paths with higher accuracy and reduce the weight of those with lower accuracy.
- The higher/lower the accuracy of a path, the more positive/negative its contribution.
average accuracy of all hypotheses
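Since accuracy is the complement of loss, minimizing the average loss is equivalent to maximizing the posterior-weighted average accuracy; a sketch of that criterion, assuming an accuracy function $A(s, s_r)$:

```latex
\mathcal{F}(\lambda)
\;=\; \sum_{r=1}^{R}
\frac{\sum_{s} p_\lambda(O_r \mid s)\, P(s)\, A(s, s_r)}
     {\sum_{s} p_\lambda(O_r \mid s)\, P(s)}
```

Differentiating shows each path's contribution is weighted by $A(s, s_r) - \bar{A}_r$, its accuracy minus the average, which is why above-average paths are boosted and below-average paths suppressed.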
Minimum Phone Error (MPE)
- The improvements from ORCE to MPE:
  - The use of lattices
  - MAP estimation of parameters (I-smoothing)
  - The setting of the smoothing constants in the EB update equations
  - The emphasis on phone error rather than word error
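MPE scores hypotheses by phone-level accuracy rather than a word-level 0/1 loss. A minimal sketch of the raw phone accuracy via exact Levenshtein alignment (function names are illustrative; the lattice-based MPE implementation approximates this with a time-overlap heuristic rather than a full alignment):

```python
def levenshtein(ref, hyp):
    """Edit distance between two phone sequences
    (substitution, insertion, deletion each cost 1)."""
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # delete all of ref[:i]
    for j in range(n + 1):
        d[0][j] = j  # insert all of hyp[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost)  # match / substitution
    return d[m][n]

def raw_phone_accuracy(ref, hyp):
    """Raw accuracy used as the MPE 'gain':
    number of reference phones minus phone errors."""
    return len(ref) - levenshtein(ref, hyp)
```

A perfect hypothesis scores `len(ref)`; each insertion, deletion, or substitution subtracts one, so a very bad hypothesis can score negative.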
Expectation Maximization
Berlin Chen
How is it done?
Lower Bound
Construct a tractable lower bound
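For any distribution $q$ over the hidden variables $z$, the log-likelihood admits a tractable lower bound (a standard construction; notation assumed):

```latex
\log p(O \mid \theta)
\;=\; \log \sum_{z} p(O, z \mid \theta)
\;\ge\; \sum_{z} q(z) \log \frac{p(O, z \mid \theta)}{q(z)}
\;=:\; \mathcal{L}(q, \theta)
```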
Three Steps for EM
- Step 1. Draw a lower bound
  - Use Jensen's inequality
- Step 2. Find the best lower bound
  - Let the lower bound touch the objective function at the current guess
- Step 3. Maximize the best lower bound
  - Obtain the new guess
  - Go to Step 1 until convergence
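The three steps above can be sketched on a toy model: a two-component, unit-variance 1-D Gaussian mixture with equal weights, re-estimating only the means (all names are illustrative, not from the slides):

```python
import math

def em_gmm_1d(data, mu, n_iter=50):
    """EM for a two-component 1-D Gaussian mixture with unit variances
    and equal weights, updating only the means `mu = [mu0, mu1]`."""
    for _ in range(n_iter):
        # Steps 1-2: the best lower bound touches the log-likelihood at the
        # current guess when q(z) is the posterior responsibility.
        resp = []
        for x in data:
            p0 = math.exp(-0.5 * (x - mu[0]) ** 2)
            p1 = math.exp(-0.5 * (x - mu[1]) ** 2)
            resp.append(p0 / (p0 + p1))  # responsibility of component 0
        # Step 3: maximizing the bound gives responsibility-weighted means,
        # which become the new guess for the next iteration.
        r0 = sum(resp)
        r1 = len(data) - r0
        mu = [sum(r * x for r, x in zip(resp, data)) / r0,
              sum((1 - r) * x for r, x in zip(resp, data)) / r1]
    return mu
```

The responsibility computation realizes Steps 1-2, and the weighted-mean update realizes Step 3; iterating the loop is the "go to Step 1" of the slide.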
Step 1. Draw a lower bound
Apply Jensen's inequality
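Concretely, multiplying and dividing by $q(z)$ inside the sum and applying Jensen's inequality (the log of an expectation is at least the expectation of the log, since $\log$ is concave):

```latex
\log \sum_{z} q(z)\,\frac{p(O, z \mid \theta)}{q(z)}
\;\ge\;
\sum_{z} q(z) \log \frac{p(O, z \mid \theta)}{q(z)}
```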
Step 2. Find the best lower bound
- Let the lower bound touch the objective function at the current guess
- Find the best bound at the current guess
Step 2. Find the best lower bound
Set the derivative to zero
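Setting the derivative of the bound with respect to $q(z)$ to zero (with a Lagrange multiplier enforcing $\sum_z q(z) = 1$) gives the posterior, at which the bound is tight:

```latex
q^{*}(z) \;=\; p(z \mid O, \theta_{\mathrm{old}}),
\qquad
\mathcal{L}(q^{*}, \theta_{\mathrm{old}}) \;=\; \log p(O \mid \theta_{\mathrm{old}})
```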
Step 2. Find the best lower bound
Q function
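Substituting $q^{*}$ back into the bound, the part that depends on $\theta$ is exactly the familiar EM auxiliary (Q) function; the remaining entropy term is constant in $\theta$:

```latex
\mathcal{L}(q^{*}, \theta)
\;=\; \underbrace{\sum_{z} p(z \mid O, \theta_{\mathrm{old}}) \log p(O, z \mid \theta)}_{Q(\theta,\, \theta_{\mathrm{old}})}
\;-\; \sum_{z} p(z \mid O, \theta_{\mathrm{old}}) \log p(z \mid O, \theta_{\mathrm{old}})
```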
Strong-sense auxiliary function
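In Povey's terminology, $G(\theta, \theta')$ is a strong-sense auxiliary function for $\mathcal{F}(\theta)$ around $\theta'$ if increasing $G$ guarantees increasing $\mathcal{F}$:

```latex
\mathcal{F}(\theta) - \mathcal{F}(\theta')
\;\ge\;
G(\theta, \theta') - G(\theta', \theta')
\quad \text{for all } \theta
```

The EM Q-function is of this kind, which is why an EM step can never decrease the likelihood.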
Weak-sense auxiliary function and smooth function
- Weak-sense auxiliary function
- Smooth function
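The MPE objective does not admit a convenient strong-sense bound, so the derivation relaxes the requirement to gradient equality at the current point; a smooth function can then be added freely for stability (a sketch of the two defining conditions):

```latex
\left.\frac{\partial G(\theta, \theta')}{\partial \theta}\right|_{\theta = \theta'}
= \left.\frac{\partial \mathcal{F}(\theta)}{\partial \theta}\right|_{\theta = \theta'}
\quad \text{(weak-sense)},
\qquad
\left.\frac{\partial S(\theta, \theta')}{\partial \theta}\right|_{\theta = \theta'} = 0
\quad \text{(smooth)}
```

$G + S$ is then still a weak-sense auxiliary function, so a parameter value that maximizes it is a stationary point of $\mathcal{F}$; the smooth term only controls step size.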
Weak-sense auxiliary function for MPE
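Following Povey's formulation, the weak-sense auxiliary function for MPE weights each lattice arc's log-likelihood by how much its average accuracy exceeds the utterance average (notation assumed: $\gamma_q$ the arc posterior, $c(q)$ the average accuracy of paths through arc $q$, $\bar{c}_r$ the average over all paths in the lattice):

```latex
G(\lambda, \lambda')
\;=\; \sum_{r=1}^{R} \sum_{q \in \mathrm{lat}_r} \gamma_q^{\mathrm{MPE}} \log p\bigl(O_r^{(q)} \mid q; \lambda\bigr),
\qquad
\gamma_q^{\mathrm{MPE}} \;=\; \gamma_q \left( c(q) - \bar{c}_r \right)
```

Arcs better than average get a positive weight (numerator-like statistics), arcs worse than average a negative one (denominator-like statistics).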
EBW equivalent smooth function
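The Extended Baum-Welch update that results, for the mean and variance of Gaussian $m$ in state $j$ (standard form; $\theta(\cdot)$ are weighted-sum statistics, $\gamma$ occupancies, $D_{jm}$ the per-Gaussian smoothing constant):

```latex
\hat{\mu}_{jm}
= \frac{\theta^{\mathrm{num}}_{jm}(O) - \theta^{\mathrm{den}}_{jm}(O) + D_{jm}\,\mu_{jm}}
       {\gamma^{\mathrm{num}}_{jm} - \gamma^{\mathrm{den}}_{jm} + D_{jm}},
\qquad
\hat{\sigma}^2_{jm}
= \frac{\theta^{\mathrm{num}}_{jm}(O^2) - \theta^{\mathrm{den}}_{jm}(O^2) + D_{jm}\bigl(\sigma^2_{jm} + \mu^2_{jm}\bigr)}
       {\gamma^{\mathrm{num}}_{jm} - \gamma^{\mathrm{den}}_{jm} + D_{jm}}
- \hat{\mu}^2_{jm}
```

$D_{jm}$ is commonly set proportional to the denominator occupancy and floored so that the updated variance stays positive.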
MAP updates (I-smoothing)
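I-smoothing gives the update a MAP flavor by adding $\tau$ "points" of ML statistics to the numerator counts before applying the EBW update (a sketch following Povey's formulation; the second-order statistics $\theta(O^2)$ are adjusted analogously):

```latex
\gamma^{\mathrm{num}}_{jm} \leftarrow \gamma^{\mathrm{num}}_{jm} + \tau,
\qquad
\theta^{\mathrm{num}}_{jm}(O) \leftarrow \theta^{\mathrm{num}}_{jm}(O)
+ \frac{\tau}{\gamma^{\mathrm{ML}}_{jm}}\,\theta^{\mathrm{ML}}_{jm}(O)
```

This backs off toward the ML estimate for Gaussians with little discriminative data, with $\tau$ controlling the strength of the prior.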
Experiments: Baseline
- Character error rate for ML training
Experiments
- Character error rate under different settings