Title: Optimal Feature
1 Optimal Feature
- Shang-Ming Lee
- sammy_at_speech.ee.ntu.edu.tw
2 Outline
- Preview of some experimental results
- Review of some concepts
- Review of some techniques
- A new technique from ICSLP
- Summary and future work
4 Num100 Experiment Results
5 Num100 Experiment Results
- Thoughts
- Use more diagonal mixtures to model covariance
- Saturated, or sparse data?
- Can any transform let diagonal models achieve fullC performance?
- Is fullC the upper bound without any modification (robustness, adaptation)?
6 Goal
- What is optimal?
- Uncorrelated
- Discriminative
- Compact
7 Outline
- Preview of some experimental results
- Review of some concepts
- Review of some techniques
- A new technique from ICSLP
- Summary and future work
8 Motivation
- There may be redundancies in the feature space
- Spatial
- The DCT does not guarantee uncorrelated results
- Temporal
- May come from the sampling rate, the window size, endpoint detection, etc.
- Curse of dimensionality (uncorrelated noise)
9 Motivation (cont.)
- Side effects of correlation
- Why do we use Gaussian mixtures?
- To fit non-Gaussian pdfs (series expansion)
- What happens when a correlated Gaussian is fitted by uncorrelated Gaussians?
- Other work coping with correlation
- Full covariance
- State-specific rotation
- Semi-tied covariance
- Etc.
10 Correlated Gaussian and a single mixture
11 Correlated Gaussian fitted by 3 mixtures
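Slides 10 and 11 are figure slides whose plots did not survive export. As a stand-in, here is a minimal sketch of the same comparison, assuming numpy and scikit-learn (neither is part of the original slides): a correlated 2-D Gaussian is fitted with 1 and 3 diagonal-covariance mixtures, with a full-covariance Gaussian as the reference; more diagonal mixtures should close most of the likelihood gap.

```python
# Sketch: how well do diagonal-covariance mixtures fit a correlated Gaussian?
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.8],
                [0.8, 1.0]])              # strongly correlated dimensions
X = rng.multivariate_normal(np.zeros(2), cov, size=5000)

for n_mix in (1, 3):
    gmm = GaussianMixture(n_components=n_mix, covariance_type='diag',
                          random_state=0).fit(X)
    print(f"{n_mix} diag mix: avg log-likelihood = {gmm.score(X):.3f}")

# One full-covariance Gaussian is the exact model here, i.e. the upper bound.
full = GaussianMixture(n_components=1, covariance_type='full').fit(X)
print(f"1 full-cov Gaussian: avg log-likelihood = {full.score(X):.3f}")
```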
12 Motivation (cont.)
13 Reference papers
- 1. George Saon, "Minimum Bayes Error Feature Selection"
- 2. George Saon, "Maximum Likelihood Discriminant Feature Spaces", ICASSP 2000
- 3. Mark Gales, "Semi-Tied Covariance Matrices for Hidden Markov Models", IEEE Transactions on Speech and Audio Processing
- 4. Gopinath, "Maximum Likelihood Modeling with Gaussian Distributions for Classification", ICASSP 1998
- 5. Duda, "Pattern Classification and Scene Analysis"
- 6. M. Thomae, "A New Approach to Discriminative Feature Extraction Using Model Transformation", ICASSP 2000
- 7. Schukat-Talamazzini, "Optimal Linear Feature Transformations for Semi-Continuous Hidden Markov Models", ICASSP 1995
14 Outline
- Preview of some experimental results
- Review of some concepts
- Review of some techniques
- A new technique from ICSLP
- Summary and future work
15 Traditional LDA
- Fisher's Linear Discriminant
- Find a transform θ that separates the different classes well (criterion below)
- Within-class scatter W
- Between-class scatter B
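The equation on this slide did not survive export; it is presumably the standard Fisher criterion, as in Duda [5]: choose θ to maximize

$$ J(\theta) = \frac{\left|\theta^{T} B\,\theta\right|}{\left|\theta^{T} W\,\theta\right|} $$

whose maximizing directions are the leading eigenvectors of $W^{-1}B$.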
16 Traditional LDA
- Projection based
- Data driven (the criterion is not model based)
- Strict constraint on the distributions (a single global covariance)
- Separates the data, but does not consider maximum likelihood
17 Heteroscedastic extension
- A model-based generalization of LDA, derived in the maximum-likelihood framework
- Handles unequal-variance classifier models
- Can be treated as a constrained ML problem
18 HDA (cont.)
- Define an objective function (one common form below)
- No closed form (use gradient descent)
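The slide's equation was lost in export; a common form of the HDA objective, following Saon [2], is shown below. Notation is assumed: θ is the p×n projection, N_j the sample count of class j, W_j its within-class scatter, and B the between-class scatter.

$$ H(\theta) = \sum_{j} N_j \log \frac{\left|\theta B\,\theta^{T}\right|}{\left|\theta W_j\,\theta^{T}\right|} $$

Unlike Fisher's single ratio, each class keeps its own scatter W_j (hence "heteroscedastic"); the price is that no closed-form maximizer exists, so gradient descent is used.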
19 Maximum likelihood linear transform
- Consider the diagonal case
- Maximize the log likelihood -> minimize the difference (an assumed form of the objective follows)
- Simpler case (DHDA)
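The objective is again missing; in Gopinath's MLLT [4] the square transform A is chosen so that as little likelihood as possible is lost when the transformed covariances are forced to be diagonal, i.e. (form assumed from [4], with occupancy counts N_j):

$$ \mathcal{L}(A) = \sum_{j} N_j \left( \log\left|A\right| - \tfrac{1}{2}\,\log\left|\operatorname{diag}\!\left(A\,\Sigma_j\,A^{T}\right)\right| \right) $$

If A diagonalizes every $\Sigma_j$ exactly, nothing is lost and diagonal mixtures match the full-covariance model, which connects back to the fullC question on slide 5.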
20 A New Approach to Discriminative Feature Extraction Using Model Transformation
- An alternative to LDA
- Model based
- In the MCE sense
21 ELDA-MT
- Objective
- Adjust the transform matrix W such that
- The correct class and y move toward each other
- The best rival class and y move away from each other
- (with a weighting factor)
22 ELDA-MT (cont.)
- Discriminant measure (a typical MCE form is sketched below)
- d is desired to be as negative as possible
- k: rival class, c: correct class
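The measure itself was lost in export; a typical MCE-style misclassification measure consistent with the legend is shown below (assumed form, with m_c and m_k the nearest prototypes of the correct class c and the best rival class k):

$$ d(y) = \left\|y - m_c\right\|^{2} - \left\|y - m_k\right\|^{2} $$

d < 0 means y = Wx lies closer to its correct class than to its strongest rival, so driving d as negative as possible is exactly the "move toward / move away" objective of slide 21.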
23 ELDA-MT (cont.)
24 ELDA-MT (cont.)
- Iteratively finding W (a gradient sketch follows)
- Reduced feature space (for the W calculation)
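The update rule is not preserved. As a minimal sketch of the iterative idea (not Thomae's exact ELDA-MT algorithm), the step below descends a smoothed loss σ(d) built on the measure d above, holding the prototypes fixed within each sweep; the function names and the learning rate eta are illustrative assumptions.

```python
# Illustrative MCE-style sweep updating the transform W (not the paper's
# exact algorithm). prototypes[c] is class c's prototype in y-space.
import numpy as np

def sigmoid(d):
    return 1.0 / (1.0 + np.exp(-d))

def mce_sweep(W, X, labels, prototypes, eta=1e-3):
    for x, c in zip(X, labels):
        y = W @ x
        m_c = prototypes[c]
        # Best rival: the closest prototype of any other class.
        rivals = {k: m for k, m in prototypes.items() if k != c}
        k = min(rivals, key=lambda r: np.sum((y - rivals[r]) ** 2))
        m_k = rivals[k]
        d = np.sum((y - m_c) ** 2) - np.sum((y - m_k) ** 2)
        # For fixed prototypes: dd/dW = 2 (m_k - m_c) x^T, scaled by sigma'(d).
        grad = sigmoid(d) * (1.0 - sigmoid(d)) * 2.0 * np.outer(m_k - m_c, x)
        W -= eta * grad              # pulls y toward m_c, pushes it from m_k
    return W
```

Re-deriving the prototypes in the new y-space after each sweep gives the iterative scheme the slide alludes to.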
25 ELDA-MT (cont.)
- Original prototype generation
26 Outline
- Preview of some experimental results
- Review of some concepts
- Review of some techniques
- A new technique from ICSLP
- Summary and future work
27 Minimum Bayes Error Feature Selection
- The counterpart to maximum-likelihood feature spaces
- Minimum Bayes error
- Direct approach
- Indirect approach
- Uses bounds on the Bayes error
28 Minimum Bayes Error Feature Selection
- 1. Max-divergence bound
- Interclass divergence (relative entropy)
- For the Gaussian case the divergence has a closed form (below), but maximizing it over projections does not
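For two Gaussians the symmetric interclass divergence has the standard closed form

$$ D_{ij} = \tfrac{1}{2}\operatorname{tr}\!\left[\left(\Sigma_i - \Sigma_j\right)\left(\Sigma_j^{-1} - \Sigma_i^{-1}\right)\right] + \tfrac{1}{2}\left(\mu_i - \mu_j\right)^{T}\left(\Sigma_i^{-1} + \Sigma_j^{-1}\right)\left(\mu_i - \mu_j\right) $$

so it is only the maximization of the projected divergence over θ that must be done numerically, as the next slide describes.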
29 Minimum Bayes Error Feature Selection
- Numerical optimization (Newton-Raphson)
- For which we calculate the derivative
- And use the LDA result as the initialization (sketch after this list)
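As a rough sketch of that recipe (a quasi-Newton optimizer standing in for Newton-Raphson; the scipy usage and the sum-of-pairwise-divergences objective are assumptions, not details taken from the paper):

```python
# Sketch: maximize summed pairwise divergences of the projected class
# Gaussians, starting from an LDA projection. Illustrative only.
import numpy as np
from scipy.optimize import minimize

def projected_divergence(theta, means, covs):
    """Sum of symmetric divergences D_ij in the space y = theta @ x."""
    proj = [(theta @ m, theta @ S @ theta.T) for m, S in zip(means, covs)]
    total = 0.0
    for i in range(len(proj)):
        for j in range(i + 1, len(proj)):
            (mi, Si), (mj, Sj) = proj[i], proj[j]
            Si_inv, Sj_inv = np.linalg.inv(Si), np.linalg.inv(Sj)
            dm = mi - mj
            total += 0.5 * np.trace((Si - Sj) @ (Sj_inv - Si_inv))
            total += 0.5 * dm @ (Si_inv + Sj_inv) @ dm
    return total

def mbe_project(theta_lda, means, covs):
    """theta_lda: (p, n) LDA result used as the starting point."""
    p, n = theta_lda.shape
    obj = lambda t: -projected_divergence(t.reshape(p, n), means, covs)
    res = minimize(obj, theta_lda.ravel(), method='BFGS')
    return res.x.reshape(p, n)
```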
30 Minimum Bayes Error Feature Selection
- 2. Min Bhattacharyya bound
- The bound and the Bhattacharyya distance are given below
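The bound referred to is the standard Bhattacharyya bound on the two-class Bayes error (as in Duda [5]), with class priors P_1 and P_2:

$$ P_{\mathrm{err}} \le \sqrt{P_1 P_2}\; e^{-\rho}, \qquad \rho = \tfrac{1}{8}\left(\mu_1 - \mu_2\right)^{T}\left[\tfrac{\Sigma_1 + \Sigma_2}{2}\right]^{-1}\left(\mu_1 - \mu_2\right) + \tfrac{1}{2}\ln\frac{\left|\tfrac{\Sigma_1 + \Sigma_2}{2}\right|}{\sqrt{\left|\Sigma_1\right|\left|\Sigma_2\right|}} $$

Minimizing the bound in the projected space therefore means maximizing ρ of the projected Gaussians.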
31 Min Bayes Error Feature Selection
- Experiment
- 2.3K context-dependent states
- 134K diagonal Gaussian mixtures
- 70 hours of training data
- Supervectors
- Every 9 consecutive frames are concatenated (see the sketch below)
- Clustered to train one full-covariance Gaussian per state (class)
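A minimal sketch of the supervector construction; only the "9 consecutive frames" detail comes from the slide, the function name and frame layout are assumptions:

```python
# Stack every 9 consecutive feature frames into one supervector.
import numpy as np

def make_supervectors(frames, context=9):
    """frames: (T, d) array -> (T - context + 1, context * d) supervectors."""
    T, d = frames.shape
    return np.stack([frames[t:t + context].ravel()
                     for t in range(T - context + 1)])
```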
32 Min Bayes Error Feature Selection
33 Outline
- Preview of some experimental results
- Review of some concepts
- Review of some techniques
- A new technique from ICSLP
- Summary and future work
34 Summary and Future Work
- Traditional ML-based techniques
- Few have closed-form solutions
- HDA does not
- MCE-based techniques
- Seem to require numerical methods
- The definition of the measure is the open question
35 Summary and Future Work
- Optimal features
- Transformation based -> linear
- Different optima lead to different criteria
- ML
- MCE
- They should be incorporated with the acoustic models
36 Summary and Future Work
- Future work
- Feature-cluster-based LDA (11/31)
- Cluster features into classes and apply LDA
- Model-based LDA (12/15)
- Viterbi forced alignment, then mixture-based LDA
- Min Bayes Error (-)
- As in the papers