Title: Performance Improvement of GMM Computation in Sphinx 3.6
1. Performance Improvement of GMM Computation in Sphinx 3.6
- Arthur Chan
- Carnegie Mellon University
- Mar 10, 2005
2. This seminar
- Not very refined. Some info is missing.
- 30 slides.
- Outline
  - Overview of GMM Computation in Sphinx 3.X (X<5) (<- This part is not new.)
  - 3 Improvements with Experimental Results (<- This part is new.)
  - Discussion
3. Mechanism of GMM Computation in S3.X (X<5)
4. Computation at every frame in Sphinx
5. Computation of GMMs in a Continuous HMM ASR system
- Order of Computation
  - Frames x GMMs x Gaussians x Feature length
- Typical Numbers
  - Frames: 1000
  - GMMs: 5000
  - Gaussians: 8 to 64
  - Feature length: 39
- Not practical to fully compute them.
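
As a rough worked example of the product above, the following sketch multiplies out the slide's typical numbers. The function name is only illustrative, and the count is of per-dimension Gaussian terms with no pruning at all.

```python
# Typical numbers quoted on the slide.
frames = 1000
gmms = 5000
feature_length = 39


def per_dimension_evaluations(gaussians_per_gmm):
    """Per-dimension Gaussian terms evaluated when every GMM is fully computed."""
    return frames * gmms * gaussians_per_gmm * feature_length


low = per_dimension_evaluations(8)    # 8 Gaussians per GMM
high = per_dimension_evaluations(64)  # 64 Gaussians per GMM
print(f"{low:.2e} to {high:.2e} per-dimension evaluations")
```

That is on the order of 10^9 to 10^10 terms for 1000 frames, which is why the rest of the talk is about skipping work.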
6. Overview of GMM Computation in Sphinx 3.X (X<5)
- Philosophy
  - No single technique will give the best accuracy/speed trade-off.
  - Techniques in the literature can be categorized and combined in a systematic manner.
- Four-Level Categorization of GMM Computation Techniques
  - Frame-level (Down-sampling)
  - GMM-level (CI-based GMM Selection)
  - Gaussian-level (VQ-based and SVQ-based Gaussian Selection)
  - Component-level (Sub-vector quantization)
- 3.4: 75-80% speed gain with 5-10% rel. degradation.
7. Fast GMM Computation Level 1: Frame Selection
- Compute GMMs in every other frame only
- Improvement: skip the GMM computation only if the current frame is similar to the previous frame
8. Fast GMM Computation Level 2: Senone/GMM Selection
[Figure label: GMM]
- Compute a GMM only when its base phone is highly likely
- Others are backed off to the base-phone scores.
- Similar to
  - Julius (Akinobu 1999)
  - Microsoft's Rich Get Richer (RGR) heuristics
9. Fast GMM Computation Level 3: Gaussian Selection
[Figure labels: Gaussian, GMM]
10. Fast GMM Computation Level 4: LDA
[Figure labels: Gaussian, Feature Component]
11. Frame-level and GMM-level Techniques in S3.X (X<5)
- Frame-level
  - Skipping Frames
    - Only compute GMMs for 1 out of N frames
    - The other frames copy the scores of the most recently computed frame.
- GMM-level
  - Use the CI GMM score as an approximate score
  - If a CD GMM has a good CI GMM score (within a beam)
    - Compute the full CD score
  - If not
    - Back off to the CI score.
  - A good CI GMM score is defined as
    - Within the beam of the best CI GMM score.
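
The two techniques above can be sketched as follows. This is a minimal illustration, not the Sphinx 3 implementation: the function names, the `cd_to_ci` mapping, and the convention that the beam is a positive width in the log domain are all assumptions made for the example.

```python
def select_cd_gmms(ci_scores, cd_to_ci, ci_beam):
    """Return ids of CD GMMs whose base-phone (CI) score lies inside the beam.

    ci_scores: log-likelihood of each CI GMM for the current frame.
    cd_to_ci:  cd_to_ci[cd] is the CI GMM that CD GMM `cd` backs off to.
    ci_beam:   beam width in the log domain (positive number; assumed convention).
    """
    threshold = max(ci_scores) - ci_beam
    return [cd for cd, ci in enumerate(cd_to_ci) if ci_scores[ci] > threshold]


def score_frame(ci_scores, cd_to_ci, ci_beam, full_cd_score):
    """Full score for the selected CD GMMs, CI back-off score for the rest."""
    selected = set(select_cd_gmms(ci_scores, cd_to_ci, ci_beam))
    return [
        full_cd_score(cd) if cd in selected else ci_scores[cd_to_ci[cd]]
        for cd in range(len(cd_to_ci))
    ]


def score_with_frame_skipping(frames, score_fn, n):
    """Score 1 frame out of every n; the others copy the latest computed scores."""
    scores, last = [], None
    for t, frame in enumerate(frames):
        if t % n == 0:
            last = score_fn(frame)
        scores.append(last)
    return scores
```

Note that every backed-off CD GMM under the same base phone receives the same score, which is the weakness the next slide points out.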
12. Weaknesses of the Frame-level and GMM-level Techniques
- Frame-level
  - Deteriorates performance significantly (>10%)
  - Hard to tune.
- GMM-level
  - The number of GMMs computed varies from frame to frame.
    - -> Worst-case performance is poor
  - CI score is used to back off
    - -> Search performance degrades because a lot of scores are the same.
13. Baseline Experiments
14. Baseline experiments
- Tested on 3 tasks
- Tested in a tough condition
  - Manually tuned
  - Tuned on the test set (Sorry, couldn't get the dev. set.)
  - Optimized one dimension at a time.
  - Very close to optimal
- Goal
  - faster.
  - graceful degradation (<5%)
15. Tasks evaluated (General Description)
Task\Info   Vocab   F (kHz)   Description
Comm.       2-3k    8         Telephone channel. Not very noisy.
WSJ 5k      5k      16        WSJ dictation. Clean.
ICSI Meet   12k     11.025    Meeting task. Noisy. Very challenging task for AM or LM.
16. Tasks evaluated (Baseline Speed/Accuracy on 2.2GHz P4)
Entries: error rate in % (speed as a multiple of real time)
Task\Rec.       S3.X (slow)     S3.X (untuned)   S3.X (tuned)
Comm. 2-3k      10.42 (7-8x?)   11.91 (3.63x)    12.851 (0.89x)
WSJ 5k          6.24 (4.28x)    6.18 (1.07x)     6.73 (0.64x)
ICSI Meet 12k   28.42 (8.25x)   30.63 (3.77x)    32.90 (1.48x)
17. Proposed Methods
18. Proposed Methods (A glance)
- The goals of the 3 methods
  - Method 1: Try to reduce the variance of GMM computation time.
  - Method 2: Try to make CI-GMMS better behaved.
  - Method 2 and a half: Try to make down-sampling better behaved.
    - Didn't work. We will try to analyse why.
  - Method 3: An idea inspired by the analysis.
19. Method 1: Use a fixed upper bound for GMMs computed in each frame
- Only compute the CD scores if
  - the corresponding CI score is within the CI beam AND
  - the number of CD GMMs computed would not exceed a certain number.
- Advantages
  - Per-utterance GMM computation can be more predictable.
  - Get a better bargain in trading off computation.
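
A minimal sketch of the two conditions above. The slide does not say which CD GMMs are kept once the cap is reached; ranking them by their base-phone score is an assumption made here, as are the function name, the `cd_to_ci` mapping, and the log-domain beam width.

```python
def select_cd_gmms_capped(ci_scores, cd_to_ci, ci_beam, max_cd):
    """Pick at most `max_cd` CD GMMs whose CI score is inside the CI beam.

    ci_scores: log-likelihood of each CI GMM for the current frame.
    cd_to_ci:  cd_to_ci[cd] is the CI GMM that CD GMM `cd` backs off to.
    ci_beam:   beam width in the log domain (positive number; assumed convention).
    max_cd:    upper bound on CD GMMs fully computed in this frame.
    """
    threshold = max(ci_scores) - ci_beam
    in_beam = [cd for cd, ci in enumerate(cd_to_ci) if ci_scores[ci] > threshold]
    # Assumption: when the cap bites, prefer CD GMMs with the best base-phone score.
    in_beam.sort(key=lambda cd: ci_scores[cd_to_ci[cd]], reverse=True)
    return in_beam[:max_cd]
```

With the cap in place, the per-frame cost is bounded by `max_cd` full GMM evaluations regardless of how many base phones fall inside the beam.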
20. Method 1 Results
Task\Rec.       S3.X (tuned BL)   Method 1
Comm. 2-3k      12.851 (0.89x)    12.834 (0.73x)
WSJ 5k          6.73 (0.64x)      Doesn't help
ICSI Meet 12k   32.90 (1.48x)     33.76 (1.15x)
21. Method 2: Use the best Gaussian index from the previous frame.
- Best Gaussian Index: What does it mean?
  - Index of the best Gaussian score in a GMM.
- Why is it useful?
  - Two major reasons from the literature
    - 1. In reality, the best Gaussian score dominates the GMM score. (up to 95-99%)
    - 2. Usually, the collision rate of the best Gaussian indices in the current and previous frames is quite high. (Literature says 70%)
  - (Q: Are these assumptions really correct?)
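
To make assumption 1 concrete, here is a small sketch comparing the full GMM log-likelihood with the score of its best Gaussian alone. It assumes diagonal covariances and uses hypothetical function names; it is an illustration, not Sphinx code.

```python
import math


def log_gaussian(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x, weights, means, variances):
    """Return (full log-likelihood, best-Gaussian score, best Gaussian index)."""
    comps = [
        math.log(w) + log_gaussian(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    best = max(range(len(comps)), key=comps.__getitem__)
    peak = comps[best]
    # Log-sum-exp over all weighted Gaussians, stabilised by the best one.
    total = peak + math.log(sum(math.exp(c - peak) for c in comps))
    return total, peak, best
```

The gap `total - peak` is the log-likelihood distortion incurred by keeping only the best Gaussian; it is never negative and shrinks as one Gaussian dominates.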
22. Method 2 (Algorithm)
- In CIGMMS,
  - for those non-computed senones (those backed off to CI)
    - If the best index of the previous frame is available, assume it is the current best index
    - Compute the GMM score with that index
- This improves the smoothing performance of CIGMMS
  - Better accuracy
  - We can use a tighter beam.
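
The algorithm above might look like the sketch below. The data layout is hypothetical: each senone is a list of callables returning the weighted log-likelihood of one Gaussian, and `prev_best` is a dictionary carried from frame to frame. Keeping an old index alive across several approximated frames is also an assumption of this sketch.

```python
import math


def log_sum_exp(values):
    """Numerically stable log of a sum of exponentials."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def method2_scores(x, gmms, selected, ci_backoff, prev_best):
    """Score all senones for one frame.

    gmms:       senone id -> list of callables, one per weighted Gaussian.
    selected:   senone ids chosen by CI-based GMM selection for this frame.
    ci_backoff: senone id -> CI score used when nothing better is available.
    prev_best:  senone id -> best Gaussian index seen so far (updated in place).
    """
    scores = {}
    for sid, components in gmms.items():
        if sid in selected:
            comp = [g(x) for g in components]
            prev_best[sid] = max(range(len(comp)), key=comp.__getitem__)
            scores[sid] = log_sum_exp(comp)
        elif sid in prev_best:
            # Not selected: evaluate only the previously best Gaussian.
            scores[sid] = components[prev_best[sid]](x)
        else:
            scores[sid] = ci_backoff[sid]
    return scores
```

The middle branch costs one Gaussian evaluation instead of a full GMM, and it gives non-selected senones distinct scores instead of a shared CI score.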
23. Results
Task\Rec.       S3.X Method 1         S3.X Method 1+2   S3.X Method 1+2, small beam
Comm. 2-3k      12.834 (0.73x)        12.650 (0.73x)    12.834 (0.64x)
WSJ 5k          6.73 (0.64x)          6.707 (0.64x)     6.73 (0.62x)
ICSI Meet 12k   35.35 (SVQ) (0.90x)   34.79 (0.93x)     35.35 (0.88x)
24. Method 2 and a half (Algorithm)
- In Frame-Dropping
  - When the last index is available, assume it is the current best index.
  - Compute GMM.
25. Results
- Not shown
  - Because there is no improvement
- Why doesn't a better approximation give any gain?
26. Comparison of Different Types of GMM Score Approximation
- GMM scores
  - Use current best index
    - not feasible because the whole GMM needs to be computed first.
  - Use previous score
    - but the current frame information is not used.
  - Use previous best index
    - If the two assumptions are true, this is a good method.
  - Use corresponding CI score
    - Replaces the CD score by the CI score. Hurts the best-performing senones.
27. Analysis 1: Log-likelihood distortion if the current best index is used. (Is assumption 1 correct?)
mix            2      4      8      16     32
Comm           1e-8   1e-8   1e-7   1e-7   1e-7
Comm (50bst)   1e-8   1e-8   1e-7   1e-7   1e-7
ICSI           1e-8   1e-8   1e-8   1e-8   1e-8
ICSI (50bst)   1e-8   1e-8   1e-8   1e-8   1e-8
28. Analysis 2: Is the collision rate always 70%?
- On average, YES
- For the top senones in a noisy task, NO
  - In the ICSI task, the hit rate for the top 50 senones will sometimes drop to 50%
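
One simple way to measure such a hit rate for a single senone is sketched below. It assumes the best Gaussian index has been logged for every frame; the function name is hypothetical and this is not necessarily how the figures above were obtained.

```python
def best_index_hit_rate(best_indices):
    """Fraction of consecutive frame pairs that share the same best Gaussian index.

    best_indices: best Gaussian index of one senone, one entry per frame.
    """
    pairs = list(zip(best_indices, best_indices[1:]))
    if not pairs:
        return 0.0
    return sum(a == b for a, b in pairs) / len(pairs)


# Example: the index changes once in four transitions.
print(best_index_hit_rate([3, 3, 3, 1, 1]))
```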
29. Analysis 3: Relative magnitude of distortion caused by different approximations
- If distortion by using the current index is 1
- In Frame dropping (significant degradation)
  - Distortion by using previous index is
    - Comm.: 20 (in 2 mix), 40 (in 32 mix)
    - ICSI: 10 (in 2 mix), 20 (in 32 mix)
  - Distortion by using previous score
    - Not tested because I don't have time.
    - Ad-hoc observation: < using previous index
    - but >> better than CI score.
- In CI-GMM Selection, not much degradation
  - But
    - Distortion by using the CI score is 100 times that of using the previous index
    - 200-1000
30. Some thoughts
- Why doesn't frame dropping work if distortion is not high?
- Why does CI GMM Selection work if distortion is so high?
- My Answer
  - It doesn't matter which approximation was used
  - What matters is whether the best scores are computed.
  - CI GMMS still keeps the best GMM scores.
  - Frame dropping always throws away the N best GMM scores.
31. Method 3
- Motivations
  - At every frame, the best senone scores still need to be computed, even in frames that would otherwise be ignored.
- Concerns
  - But how to preserve the effectiveness of down-sampling?
32. Method 3
- Another very simple idea.
- Trick: Use CIGMMS for every frame.
  - But for alternate frames, or frames we want to ignore,
    - Multiply the CI-GMMS beam by a factor F (0 < F < 1).
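
A minimal sketch of the trick, assuming the beam is a positive width in the log domain and that the "frames we want to ignore" are all frames except every `skip_every`-th one. The function names and the strict comparison are choices made for this illustration, not the Sphinx 3.6 code.

```python
def ci_beam_for_frame(t, ci_beam, skip_every, factor):
    """Beam width for frame t: full width on kept frames, tightened otherwise."""
    if not 0.0 <= factor <= 1.0:
        raise ValueError("factor must lie in [0, 1]")
    if t % skip_every == 0:
        return ci_beam
    return factor * ci_beam


def select_cd_gmms(ci_scores, cd_to_ci, beam):
    """CD GMMs whose base-phone score is strictly inside the beam.

    The strict comparison is deliberate (an assumption of this sketch):
    a zero-width beam then selects nothing, so every CD GMM backs off.
    """
    threshold = max(ci_scores) - beam
    return [cd for cd, ci in enumerate(cd_to_ci) if ci_scores[ci] > threshold]


ci_scores = [-1.0, -5.0, -50.0]   # toy CI log-likelihoods
cd_to_ci = [0, 0, 1, 1, 2]        # toy CD-to-CI mapping
for t in range(2):
    beam = ci_beam_for_frame(t, 10.0, skip_every=2, factor=0.2)
    print(t, beam, select_cd_gmms(ci_scores, cd_to_ci, beam))
```

In this sketch, any factor above zero still computes the CD GMMs under the best-scoring base phone on the tightened frames, while a factor of 1 leaves ordinary CI-based selection on every frame.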
33. Idea 3 (Results)
Task\Rec.    S3.X Method 1+2   S3.X Method 1+2+3
Comm. 2-3k   12.834 (0.64x)    13.11 (0.56x)
WSJ 5k       6.73 (0.63x)      6.90 (0.59x)
ICSI 12k     35.35 (0.89x)     36.43 (0.73x)
34. Idea 3 (Discussion)
- Advantage of the scheme
  - Best senone scores are still computed when F > 0
  - More tunable
    - Tightening factor is a real number
  - Preserves the properties of CI-GMMS and Down-sampling.
    - When F=0, equivalent to down-sampling
    - When F=1, equivalent to CI-based GMM Selection
  - A smoothing between Frame-level and GMM-level.
- Idea is dynamic beam
35. Summary
Rec\Task       Comm 2-3k        WSJ 5k         ICSI 12k
BL (untuned)   11.91 (3.63x)    6.18 (1.07x)   30.63 (3.77x)
BL (tuned)     12.851 (0.89x)   6.73 (0.64x)   32.90 (1.48x)
Meth 1         12.84 (0.73x)    Doesn't help   33.76 (1.15x)
SVQ            -                -              35.35 (0.90x)
Meth 2         12.84 (0.64x)    6.73 (0.63x)   35.35 (0.88x)
Meth 3         13.11 (0.56x)    6.90 (0.59x)   36.43 (0.73x)
36. Conclusion
- Only 20-25% gain obtained from 3 computation improvements. (90% last time)
- Pruned and non-pruned conditions are different scenarios
- Jointly optimizing two levels would give around a 5-10% solid gain.
- It's time to leave GMM computation and work on some other things.
37. Side note: Snapshots of Recent Development of Sphinx 3.6
- The use of per-frame CI GMM scores is still not optimal
  - Jim, why don't you use lexical retrieval? It's very easy to implement.
- Still no improvement in search
  - Alex, seriously, when can you implement a search using lexical tree copies?
- ICSI/CALO Meeting task gives us a lot of fun/pain.
  - Sphinx 3: 20-30% improvement doesn't always show up.
  - Arthur, do you want to say something?
- Some S3 and ST's functions look really funny/awful.
  - Yitao, Sigh.
  - Dave, Evandro, (Shake their heads)
38. Acknowledgement
- Thanks
  - Ravi
  - Alex
  - Evandro
  - Dave
39. Q & A