Title: Learning Structured Models for Phone Recognition
- Slav Petrov, Adam Pauls, Dan Klein
Acoustic Modeling
Motivation
- Standard acoustic models impose many structural constraints
- We propose an automatic approach that learns this structure from data
- We use the TIMIT dataset
- MFCC features
- Full-covariance Gaussians (Young and Woodland, 1994)
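The setup above scores each MFCC frame with a full-covariance Gaussian. As a minimal sketch using only the standard library, the log-density can be computed through a Cholesky factorization Sigma = L L^T: then log|Sigma| = 2 * sum_i log L_ii, and the Mahalanobis term is the squared norm of L^-1 (x - mu). The function names and the list-of-lists matrix type are illustrative assumptions, not the authors' code; a real system would use a numerical library.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def cholesky(a: Sequence[Sequence[float]]) -> Matrix:
    """Return lower-triangular L with L @ L^T == a (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("covariance matrix is not positive definite")
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


def gaussian_log_density(x: Sequence[float], mean: Sequence[float],
                         cov: Sequence[Sequence[float]]) -> float:
    """log N(x; mean, cov) for one frame x under a full-covariance Gaussian."""
    n = len(x)
    L = cholesky(cov)
    diff = [xi - mi for xi, mi in zip(x, mean)]
    # Forward substitution solves L z = diff, so z . z = diff^T cov^-1 diff.
    z = [0.0] * n
    for i in range(n):
        z[i] = (diff[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    mahalanobis = sum(zi * zi for zi in z)
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return -0.5 * (n * math.log(2.0 * math.pi) + log_det + mahalanobis)


# Usage: a correlated 2-D Gaussian (real MFCC frames have more dimensions).
print(gaussian_log_density([1.0, 0.0], [0.0, 0.0], [[2.0, 1.0], [1.0, 2.0]]))
```

Factoring once per Gaussian and reusing L across frames is the usual way to keep full covariances affordable.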
Phone Classification
[Figure: an acoustic segment to be labeled with its phone, e.g. /æ/]
HMMs for Phone Classification
[Figure: an HMM capturing the temporal structure of a phone]
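In phone classification each phone has its own HMM, and a segment is labeled with the phone whose model gives it the highest likelihood, summing over hidden state paths with the forward algorithm. The sketch below illustrates this on toy one-dimensional frames with unit-variance Gaussian emissions. The left-to-right topology, the 0.5 transition probabilities, the phone labels in the usage lines and all function names are illustrative assumptions (the talk's own model is fully connected with full-covariance Gaussians over MFCC vectors), and there is no explicit exit state.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

NEG_INF = -math.inf
# (log initial probs, log transition matrix, emission log-density function)
PhoneHMM = Tuple[List[float], List[List[float]], Callable[[int, float], float]]


def logsumexp(values: Sequence[float]) -> float:
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_log_likelihood(frames: Sequence[float], model: PhoneHMM) -> float:
    """log P(frames | model), summed over all hidden state paths (forward algorithm)."""
    log_init, log_trans, emit = model
    n = len(log_init)
    alpha = [log_init[s] + emit(s, frames[0]) for s in range(n)]
    for x in frames[1:]:
        alpha = [logsumexp([alpha[r] + log_trans[r][s] for r in range(n)]) + emit(s, x)
                 for s in range(n)]
    return logsumexp(alpha)


def classify(frames: Sequence[float], models: Dict[str, PhoneHMM]) -> str:
    """Label a segment with the phone whose HMM assigns it the highest likelihood."""
    return max(models, key=lambda phone: forward_log_likelihood(frames, models[phone]))


def gauss_logpdf(x: float, mean: float, var: float = 1.0) -> float:
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def left_to_right_hmm(means: List[float]) -> PhoneHMM:
    """Toy 1-D model: start in state 0; each state either loops or moves one state right."""
    n = len(means)
    log_init = [0.0] + [NEG_INF] * (n - 1)
    log_trans = [[NEG_INF] * n for _ in range(n)]
    for s in range(n):
        if s + 1 < n:
            log_trans[s][s] = log_trans[s][s + 1] = math.log(0.5)
        else:
            log_trans[s][s] = 0.0
    return log_init, log_trans, lambda s, x: gauss_logpdf(x, means[s])


models = {"aa": left_to_right_hmm([0.0, 1.0]), "iy": left_to_right_hmm([5.0, 6.0])}
print(classify([0.1, 0.2, 1.0, 0.9], models))  # aa
```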
Standard subphone/mixture HMM
[Figure: temporal structure (subphone states) and Gaussian mixtures]
Model | Error rate (%)
HMM Baseline | 25.1
Our Model
[Figure: the standard model compared with ours, which is fully connected and uses single Gaussians]
Hierarchical Baum-Welch Training
[Figure: error rate across split rounds; intermediate values shown: 32.1, 28.7]
Model | Error rate (%)
HMM Baseline | 25.1
5 Split rounds | 21.4
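Hierarchical training repeatedly splits every substate in two and re-estimates the model with Baum-Welch. A minimal sketch of the split step follows, under our own assumptions: children of substate s get indices 2s and 2s+1, their means are nudged apart by a small random offset (scaled by a hypothetical `eps`) to break symmetry, and the parent's initial and transition probabilities are divided evenly between the children. The EM re-estimation that would follow each split is not shown, and all names are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SubstateHMM:
    init: List[float]               # P(first substate = s)
    trans: List[List[float]]        # trans[r][s] = P(next substate = s | current = r)
    means: List[List[float]]        # one Gaussian mean vector per substate
    covs: List[List[List[float]]]   # one full covariance matrix per substate


def split_substates(m: SubstateHMM, eps: float = 0.01,
                    rng: Optional[random.Random] = None) -> SubstateHMM:
    """Split every substate s into children 2s and 2s+1.

    Children copy the parent's covariance, get means shifted by +/- the same small
    random offset (so they average to the parent), and share the parent's
    probability mass evenly, which keeps every distribution normalized.
    """
    rng = rng or random.Random(0)
    n = len(m.init)
    init, means, covs = [], [], []
    for s in range(n):
        offset = [eps * rng.uniform(-1.0, 1.0) for _ in m.means[s]]
        for sign in (1.0, -1.0):
            init.append(m.init[s] / 2.0)
            means.append([mu + sign * d for mu, d in zip(m.means[s], offset)])
            covs.append([row[:] for row in m.covs[s]])
    # Child r inherits its parent's outgoing distribution, halved across each target's two children.
    trans = [[m.trans[r // 2][s // 2] / 2.0 for s in range(2 * n)] for r in range(2 * n)]
    return SubstateHMM(init, trans, means, covs)


model = SubstateHMM(init=[1.0], trans=[[1.0]], means=[[0.0, 0.0]],
                    covs=[[[1.0, 0.0], [0.0, 1.0]]])
for _ in range(5):
    model = split_substates(model)
    # ... Baum-Welch (EM) re-estimation on the training data would run here ...
print(len(model.init))  # 32
```

If a phone starts from a single substate, five split rounds give up to 2^5 = 32 substates per phone before any merging.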
Phone Classification Results
Method | Error rate (%)
GMM Baseline (Sha and Saul, 2006) | 26.0
HMM Baseline (Gunawardana et al., 2005) | 25.1
SVM (Clarkson and Moreno, 1999) | 22.4
Hidden CRF (Gunawardana et al., 2005) | 21.7
Our Work | 21.4
Large Margin GMM (Sha and Saul, 2006) | 21.1
Phone Recognition
Standard State-Tied Acoustic Models
No more State-Tying
No more Gaussian Mixtures
Fully connected internal structure
Fully connected external structure
Refinement of the /ih/-phone
Refinement of the /l/-phone
Hierarchical Refinement Results
Model | Error rate (%)
HMM Baseline | 41.7
5 Split Rounds | 28.4
Merging
- Not all phones are equally complex
- Compute the loss in log likelihood from merging each split back
[Figure: the split model compared with the model merged at one node]
Merging Criterion
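Once the loss in log likelihood from undoing each split has been estimated (the slides approximate it by merging at one node at a time; that estimate is not reproduced here), the merge step itself is simple. The sketch assumes the losses are given, undoes the fraction of splits that cost the least (the 50% default is our assumption), and, as one reasonable choice, replaces each merged pair by a moment-matched Gaussian, weighting the two children by their expected occupancy counts. The split identifiers and function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def choose_merges(loss: Dict[str, float], fraction: float = 0.5) -> List[str]:
    """Return the splits to undo: the `fraction` whose merge loses the least log likelihood."""
    ranked = sorted(loss, key=loss.get)
    return ranked[: int(len(ranked) * fraction)]


def merge_gaussians(w1: float, mean1: Sequence[float], cov1: Sequence[Sequence[float]],
                    w2: float, mean2: Sequence[float], cov2: Sequence[Sequence[float]]
                    ) -> Tuple[List[float], List[List[float]]]:
    """Moment-matched Gaussian for two sibling substates with occupancy weights w1, w2."""
    total = w1 + w2
    p1, p2 = w1 / total, w2 / total
    mean = [p1 * a + p2 * b for a, b in zip(mean1, mean2)]
    d1 = [a - m for a, m in zip(mean1, mean)]
    d2 = [b - m for b, m in zip(mean2, mean)]
    n = len(mean)
    # Each child's covariance plus the spread of its mean around the merged mean.
    cov = [[p1 * (cov1[i][j] + d1[i] * d1[j]) + p2 * (cov2[i][j] + d2[i] * d2[j])
            for j in range(n)] for i in range(n)]
    return mean, cov


losses = {"aa-0": 5.0, "s-3": 0.1, "ih-1": 2.0, "l-2": 0.4}
print(choose_merges(losses))  # ['s-3', 'l-2']
```

Ranking by estimated loss is what lets simple phones give back substates they do not need, while complex phones keep theirs.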
Split and Merge Results
Model | Error rate (%)
Split Only | 28.4
Split and Merge | 27.3
HMM states per phone
Alignment
Training alignment | Error rate (%)
Hand Aligned | 27.3
Auto Aligned | 26.3
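"Auto aligned" presumably means the phone boundaries are chosen by the model instead of being taken from TIMIT's hand labels. One standard way to do this, shown as a generic sketch rather than the authors' exact procedure, is Viterbi forced alignment: given the known transcription, pick the monotonic segmentation of frames into phones (each phone at least one frame) that maximizes the total log score. The per-frame score table is an assumed input; in practice it would come from the phone models' emission densities.

```python
import math
from typing import List, Sequence


def force_align(frame_scores: Sequence[Sequence[float]]) -> List[int]:
    """Viterbi forced alignment.

    frame_scores[t][i] is the log score of frame t under the i-th phone of the known
    transcription. Returns the start frame of each phone in the best monotonic
    segmentation, where every phone covers at least one frame.
    """
    T, N = len(frame_scores), len(frame_scores[0])
    if N > T:
        raise ValueError("more phones than frames")
    # best[t][i]: best total score of frames 0..t with frame t assigned to phone i
    best = [[-math.inf] * N for _ in range(T)]
    back = [[0] * N for _ in range(T)]  # phone index used at frame t-1
    best[0][0] = frame_scores[0][0]
    for t in range(1, T):
        for i in range(N):
            stay = best[t - 1][i]
            advance = best[t - 1][i - 1] if i > 0 else -math.inf
            back[t][i] = i - 1 if advance > stay else i
            best[t][i] = max(stay, advance) + frame_scores[t][i]
    # Trace back from the last frame, which must belong to the last phone.
    starts = [0] * N
    i = N - 1
    for t in range(T - 1, 0, -1):
        prev = back[t][i]
        if prev != i:
            starts[i] = t  # phone i begins at frame t
        i = prev
    return starts


scores = [[0.0, -5.0, -5.0], [0.0, -5.0, -5.0], [-5.0, 0.0, -5.0],
          [-5.0, 0.0, -5.0], [-5.0, -5.0, 0.0], [-5.0, -5.0, 0.0]]
print(force_align(scores))  # [0, 2, 4]
```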
Alignment State Distribution
Inference
- State sequence:
  d1-d6-d6-d4-ae5-ae2-ae3-ae0-d2-d2-d3-d7-d5
- Phone sequence:
  d-d-d-d-ae-ae-ae-ae-d-d-d-d-d
- Transcription:
  d-ae-d
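Viterbi decoding finds the single best state sequence, but the goal is the best transcription, which sums over all state sequences that project to it. Variational inference approximates this sum.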
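The projection from substates to a transcription can be sketched as below, assuming (as on the slide) that each substate is named by its phone followed by an integer index. The second function uses a tiny hand-made distribution over state sequences, with made-up probabilities, to show why the Viterbi answer can be wrong: the single most likely state path projects to one transcription, while the transcription with the most total probability is another. Enumerating all paths like this is only feasible for toy examples, which is why an approximation is needed. Function names are hypothetical.

```python
from collections import defaultdict
from itertools import groupby
from typing import Dict, List, Sequence, Tuple


def state_to_phone(state: str) -> str:
    # "ae5" -> "ae": strip the trailing substate index
    return state.rstrip("0123456789")


def project(states: Sequence[str]) -> Tuple[List[str], List[str]]:
    """Map a state sequence to its frame-level phone sequence and its transcription."""
    phones = [state_to_phone(s) for s in states]
    # Collapsing runs also merges two adjacent tokens of the same phone (a simplification).
    transcription = [p for p, _ in groupby(phones)]
    return phones, transcription


def viterbi_vs_best_transcription(path_probs: Dict[Tuple[str, ...], float]):
    """Compare projecting the single best path with summing paths per transcription."""
    best_path = max(path_probs, key=path_probs.get)
    totals: Dict[Tuple[str, ...], float] = defaultdict(float)
    for path, p in path_probs.items():
        totals[tuple(project(path)[1])] += p
    return tuple(project(best_path)[1]), max(totals, key=totals.get)


states = "d1-d6-d6-d4-ae5-ae2-ae3-ae0-d2-d2-d3-d7-d5".split("-")
phones, transcription = project(states)
print(transcription)  # ['d', 'ae', 'd']

toy = {("d1", "t1"): 0.4, ("d1", "ae1"): 0.3, ("d2", "ae2"): 0.3}
print(viterbi_vs_best_transcription(toy))  # (('d', 't'), ('d', 'ae'))
```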
Variational Inference
[Figure: variational approximation]
Decoding | Error rate (%)
Viterbi | 26.3
Variational | 25.1
Phone Recognition Results
Method | Error rate (%)
State-Tied Triphone HMM (HTK) (Young and Woodland, 1994) | 27.7
Gender Dependent Triphone HMM (Lamel and Gauvain, 1993) | 27.1
Our Work | 26.1
Bayesian Triphone HMM (Ming and Smith, 1998) | 25.6
Heterogeneous classifiers (Halberstadt and Glass, 1998) | 24.4
Conclusions
- Minimalist, automatic approach
  - Unconstrained
  - Accurate
- Phone classification
  - Competitive with state-of-the-art discriminative methods despite being generative
- Phone recognition
  - Better than standard state-tied triphone models
Thank you!
- http://nlp.cs.berkeley.edu