Performance Improvement of GMM Computation in Sphinx 3.6 - PowerPoint PPT Presentation

About This Presentation

Title:

Performance Improvement of GMM Computation in Sphinx 3.6

Slides: 40

Provided by: Arthu61

Learn more at: http://www.cs.cmu.edu

Transcript and Presenter's Notes

1
Performance Improvement of GMM Computation in
Sphinx 3.6

Arthur Chan
Carnegie Mellon University
Mar 10, 2005

2
This seminar

Not very refined. Some information is missing.
30 slides.
Outline
Overview of GMM Computation in Sphinx 3.X (X<5) (<- This part is not new.)
3 Improvements with Experimental Results (<- This part is new.)
Discussion

3
Mechanism of GMM Computation in S3.X (X<5)
4
Computation at every frame in Sphinx
5
Computation of GMMs in a Continuous HMM ASR system

Order of Computation
Frames x GMMs x Gaussians x Feature length
Typical Numbers
Frames 1000
GMMs 5000
Gaussians 8 to 64
Feature length 39
Not practical to fully compute them.
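To see the scale, the typical numbers above can simply be multiplied out. The short Python sketch below does that, counting one per-dimension Gaussian term as one unit of work (an assumption; actual cost depends on the implementation):

```python
def full_gmm_ops(frames: int, gmms: int, gaussians: int, feat_len: int) -> int:
    """Frames x GMMs x Gaussians x feature length."""
    return frames * gmms * gaussians * feat_len


# Typical numbers from the slide: 1000 frames, 5000 GMMs,
# 8 to 64 Gaussians per GMM, 39-dimensional features.
low = full_gmm_ops(1000, 5000, 8, 39)
high = full_gmm_ops(1000, 5000, 64, 39)
print(f"{low:.3e} to {high:.3e} per-dimension Gaussian terms")
# prints: 1.560e+09 to 1.248e+10 per-dimension Gaussian terms
```

Even at the low end this is over a billion terms for a 1000-frame utterance, which is the motivation for the approximations that follow.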

6
Overview of GMM Computation in Sphinx 3.X (X<5)

Philosophy
No single technique will give the best
accuracy/speed trade-off.
Techniques in the literature can be categorized
and combined in a systematic manner.
Four Level Categorization of GMM Computation
Techniques
Frame-level (Down-sampling)
GMM-level (CI-based GMM Selection)
Gaussian-level (VQ-based and SVQ-based Gaussian
Selection)
Component-level (Sub-vector quantization)
3.475-80 speed gain with 5-10% relative degradation.

7
Fast GMM Computation Level 1: Frame Selection
- Compute GMMs in every other frame only.
- Improvement: compute GMMs only if the current frame is similar to the previous frame.
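A minimal Python sketch of the basic frame-level scheme, assuming a hypothetical `score_frame` callback that returns one score per GMM for a feature vector. With `n = 2` it computes every other frame and copies the most recently computed scores into skipped frames; the similarity-based refinement is not modeled:

```python
from typing import Callable, List, Sequence

Scores = List[float]


def downsampled_scores(frames: Sequence[Sequence[float]],
                       score_frame: Callable[[Sequence[float]], Scores],
                       n: int = 2) -> List[Scores]:
    """Frame-level down-sampling: compute GMM scores for 1 out of every n
    frames and copy the most recently computed scores into the others.

    `score_frame` is a hypothetical callback returning one score per GMM.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    out: List[Scores] = []
    last: Scores = []
    for t, x in enumerate(frames):
        if t % n == 0:
            last = score_frame(x)      # full GMM computation on this frame
        out.append(list(last))         # skipped frames reuse the last scores
    return out
```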
8
Fast GMM Computation Level 2: Senone/GMM Selection
- Compute a GMM only when its base phone is highly likely.
- Others are backed off to the base-phone scores.
- Similar to:
  - Julius (Akinobu 1999)
  - Microsoft's Rich Get Richer (RGR) heuristics
9
Fast GMM Computation Level 3: Gaussian Selection
[Figure: Gaussians within a GMM]
10
Fast GMM Computation Level 4: LDA
[Figure: Gaussian and its feature components]
11
Frame-level and GMM-level Techniques in S3.X (X<5)

Frame-level
Skipping frames
Only compute GMMs for 1 out of N frames.
Copy the scores of the most recently computed frame.
GMM-level
Use CI GMM as an approximate score
If a CD GMM has good CI GMM scores (within a
beam)
Compute the full CD score
If not
Back off to the CI score.
A good CI GMM score is defined as one
within the beam of the best CI GMM score.
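The GMM-level rule above can be sketched for one frame as follows. This is an illustration, not the Sphinx 3 source: it assumes log-domain scores, a positive beam width, a hypothetical `cd_to_ci` list mapping each CD GMM (senone) to its base-phone CI GMM, and a hypothetical `compute_cd` callback for the full CD evaluation:

```python
from typing import Callable, List


def ci_gmm_selection(ci_scores: List[float], cd_to_ci: List[int],
                     compute_cd: Callable[[int], float],
                     beam: float) -> List[float]:
    """CI-based GMM selection for one frame (log-domain scores).

    A CD GMM is fully computed only if the score of its base CI GMM is
    within `beam` of the best CI score; otherwise it backs off to that
    CI score.
    """
    best_ci = max(ci_scores)
    cd_scores: List[float] = []
    for cd, ci in enumerate(cd_to_ci):
        if ci_scores[ci] >= best_ci - beam:
            cd_scores.append(compute_cd(cd))   # full CD computation
        else:
            cd_scores.append(ci_scores[ci])    # back off to the CI score
    return cd_scores
```

Because the selection depends on how many CI GMMs fall inside the beam, the number of `compute_cd` calls changes from frame to frame, which is the worst-case weakness listed next.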

12
Weaknesses of the Frame-level and GMM-level
Techniques

Frame-level
Degrades performance significantly (>10%).
Hard to tune.
GMM-level
The number of GMMs computed varies from frame to frame.
-> Worst-case performance is poor.
CI scores are used for back-off.
-> Search performance degrades because many scores are the same.

13
Baseline Experiments
14
Baseline experiments

Tested on 3 tasks
Tested under tough conditions
Manually tuned
Tuned on the test set (sorry, couldn't get the dev. set)
Optimized one dimension at a time.
Very close to optimal
Goal
Faster.
Graceful degradation (<5%)

15
Tasks evaluated (General Description)
Task      | Vocab | Sampling rate (kHz) | Description
Comm.     | 2-3k  | 8                   | Telephone channel; not very noisy
WSJ 5k    | 5k    | 16                  | WSJ dictation; clean
ICSI Meet | 12k   | 11.025              | Meeting task; noisy; very challenging task for AM or LM
16
Tasks evaluated (Baseline Speed/Accuracy on 2.2GHz P4)
Entries: WER % (x real time)
Task\Rec.     | S3.X (slow)   | S3.X (untuned) | S3.X (tuned)
Comm. 2-3k    | 10.42 (7-8x?) | 11.91 (3.63x)  | 12.851 (0.89x)
WSJ 5k        | 6.24 (4.28x)  | 6.18 (1.07x)   | 6.73 (0.64x)
ICSI Meet 12k | 28.42 (8.25x) | 30.63 (3.77x)  | 32.90 (1.48x)
17
Proposed Methods
18
Proposed Methods (A glance)

The goals of the 3 methods
Method 1: Reduce the variance of GMM computation time.
Method 2: Make CI-GMMS better behaved.
Method 2 and a half: Make down-sampling better behaved.
Didn't work. We will try to analyse why.
Method 3: An idea inspired by the analysis.

19
Method 1: Use a fixed upper bound on the number of GMMs computed in each frame

Only compute the CD scores if
the corresponding CI score is within the CI beam AND
the number of CD GMMs computed would not exceed a certain number.
Advantages
Per-utterance GMM computation becomes more predictable.
A better bargain in the trade-off between computation and accuracy.
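A sketch of Method 1 in the same style as a plain CI-GMMS loop (log-domain scores; `cd_to_ci`, `compute_cd` and `max_cd` are hypothetical names). The slide does not say which CD GMMs are kept when the cap is hit; here, as an assumption, candidates with the best base-phone CI scores are computed first:

```python
from typing import Callable, List


def capped_ci_gmm_selection(ci_scores: List[float], cd_to_ci: List[int],
                            compute_cd: Callable[[int], float],
                            beam: float, max_cd: int) -> List[float]:
    """CI-based GMM selection with a per-frame upper bound (log domain).

    A CD GMM is fully computed only if its base CI score is within `beam`
    of the best CI score AND at most `max_cd` CD GMMs are computed.
    """
    best_ci = max(ci_scores)
    # Default: every CD GMM backs off to its CI score.
    scores = [ci_scores[ci] for ci in cd_to_ci]
    # CD GMMs whose base CI GMM is inside the CI beam.
    candidates = [cd for cd, ci in enumerate(cd_to_ci)
                  if ci_scores[ci] >= best_ci - beam]
    # Assumption: when over the cap, prefer the best CI scores (stable sort).
    candidates.sort(key=lambda cd: ci_scores[cd_to_ci[cd]], reverse=True)
    for cd in candidates[:max_cd]:
        scores[cd] = compute_cd(cd)
    return scores
```

The cap bounds the worst-case number of `compute_cd` calls per frame, which is what makes per-utterance cost more predictable.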

20
Method 1 Results
Task\Rec.     | S3.X (tuned BL) | Method 1
Comm. 2-3k    | 12.851 (0.89x)  | 12.834 (0.73x)
WSJ 5k        | 6.73 (0.64x)    | Doesn't help
ICSI Meet 12k | 32.90 (1.48x)   | 33.76 (1.15x)
21
Method 2: Use the best Gaussian index from the previous frame.

Best Gaussian index: what does it mean?
The index of the best-scoring Gaussian in a GMM.
Why is it useful?
Two major reasons from the literature:
1. In practice, the best Gaussian score dominates the GMM score (up to 95-99%).
2. The collision rate of the best Gaussian indices in the current and previous frames is usually quite high (the literature says about 70%).
(Q: Are these assumptions really correct?)
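Assumption 1 is what makes a single-index shortcut reasonable. In the log domain a GMM score is a log-sum-exp over the weighted Gaussian terms, and the best-Gaussian approximation keeps only the largest term; its error always lies between 0 and log K for K Gaussians. A small Python sketch, assuming the per-component values log(w_k N_k(x)) are already available:

```python
import math
from typing import Sequence


def log_gmm_exact(log_terms: Sequence[float]) -> float:
    """Exact log GMM score: log sum_k exp(log_terms[k]), computed stably."""
    top = max(log_terms)
    return top + math.log(sum(math.exp(v - top) for v in log_terms))


def log_gmm_best(log_terms: Sequence[float]) -> float:
    """Best-Gaussian approximation: keep only the dominant term."""
    return max(log_terms)


terms = [-50.0, -60.0, -70.0]   # log(w_k N_k(x)) for K = 3 Gaussians
print(log_gmm_exact(terms) - log_gmm_best(terms))   # about 4.54e-05
```

When one term dominates by a wide margin, as in this toy case, the max is almost exact; when several terms are close, the error grows toward log K.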

22
Method 2 (Algorithm)

In CIGMMS,
for senones that were not computed (backed off to CI):
If the best index from the previous frame is available,
assume it is the current best index.
Compute the GMM score with that Gaussian.
This improves the smoothing performance of CIGMMS:
Better accuracy.
We can use a tighter beam.
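One way to sketch Method 2 in Python, assuming diagonal-covariance Gaussians and that each senone remembers the best Gaussian index from its last full evaluation (`prev_best`; this bookkeeping and all function names are hypothetical). A senone that CI-GMMS would back off to the CI score instead gets a single-Gaussian evaluation at the remembered index:

```python
import math
from typing import Optional, Sequence, Tuple

Vector = Sequence[float]


def log_gaussian_diag(x: Vector, mean: Vector, var: Vector, log_weight: float) -> float:
    """log(w * N(x; mean, diag(var))) for one mixture component."""
    ll = log_weight
    for xi, mi, vi in zip(x, mean, var):
        ll -= 0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return ll


def score_senone_full(x: Vector, means: Sequence[Vector], variances: Sequence[Vector],
                      log_weights: Sequence[float]) -> Tuple[float, int]:
    """Full evaluation: exact log GMM score plus the best Gaussian index,
    which is remembered for the next frame."""
    comps = [log_gaussian_diag(x, m, v, w)
             for m, v, w in zip(means, variances, log_weights)]
    best = max(range(len(comps)), key=comps.__getitem__)
    top = comps[best]
    return top + math.log(sum(math.exp(c - top) for c in comps)), best


def score_senone_backed_off(x: Vector, means: Sequence[Vector], variances: Sequence[Vector],
                            log_weights: Sequence[float],
                            prev_best: Optional[int], ci_score: float) -> float:
    """Method 2 sketch for a senone that CI-GMMS did not select."""
    if prev_best is None:
        return ci_score          # no history: keep the plain CI back-off
    k = prev_best                # assume last frame's best is still the best
    return log_gaussian_diag(x, means[k], variances[k], log_weights[k])
```

Under these assumptions the backed-off path costs one Gaussian evaluation instead of K, while still using the current frame's features.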

23
Results
Task\Rec.     | S3.X Method 1       | S3.X Method 1+2 | S3.X Method 1+2 (small beam)
Comm. 2-3k    | 12.834 (0.73x)      | 12.650 (0.73x)  | 12.834 (0.64x)
WSJ 5k        | 6.73 (0.64x)        | 6.707 (0.64x)   | 6.73 (0.62x)
ICSI Meet 12k | 35.35 (SVQ) (0.90x) | 34.79 (0.93x)   | 35.35 (0.88x)
24
Method 2 and a half (Algorithm)

In frame dropping,
when the last best index is available, assume it is the current best index.
Compute the GMM score.

25
Results

Not shown,
because there is no improvement.
Why doesn't a better approximation give any gain?

26
Comparison of Different Types of GMM Score Approximation

GMM score approximations
Use current best index:
not practical, because the whole GMM needs to be computed first.
Use previous score:
but the current frame's information is not used.
Use previous best index:
if the two assumptions are true, this is a good method.
Use corresponding CI score:
replace the CD score by the CI score. Hurts the best-performing senones.
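The four options can be compared on a toy example. The Python sketch below, with made-up per-Gaussian log scores at frames t and t-1 and a made-up CI score, measures how far each approximation lands from the exact log-likelihood. Since the exact score is at least the largest term, which is at least any single term, "current best index" can never be less accurate than "previous best index"; its drawback is cost, not accuracy:

```python
import math
from typing import Dict, Sequence


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def approximation_distortions(cur: Sequence[float], prev: Sequence[float],
                              ci_score: float) -> Dict[str, float]:
    """Absolute log-likelihood error of each approximation for one GMM.

    cur[k] and prev[k] are log(w_k N_k(x)) at frames t and t-1;
    ci_score is a (hypothetical) score of the senone's CI GMM at frame t.
    """
    exact = logsumexp(cur)
    prev_best = max(range(len(prev)), key=prev.__getitem__)
    return {
        "current best index": abs(exact - max(cur)),
        "previous score": abs(exact - logsumexp(prev)),
        "previous best index": abs(exact - cur[prev_best]),
        "CI score": abs(exact - ci_score),
    }


d = approximation_distortions([-50.0, -53.0, -70.0], [-52.0, -49.0, -71.0], -60.0)
for name, err in d.items():
    print(f"{name:>20}: {err:.4f}")
```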

27
Analysis 1: Log-likelihood distortion if the current best index is used (Is assumption 1 correct?)
Mixtures                | 2    | 4    | 8    | 16   | 32
Comm                    | 1e-8 | 1e-8 | 1e-7 | 1e-7 | 1e-7
Comm (50 best senones)  | 1e-8 | 1e-8 | 1e-7 | 1e-7 | 1e-7
ICSI                    | 1e-8 | 1e-8 | 1e-8 | 1e-8 | 1e-8
ICSI (50 best senones)  | 1e-8 | 1e-8 | 1e-8 | 1e-8 | 1e-8
28
Analysis 2: Is the collision rate always 70%?

On average, YES.
For the top senones in a noisy task, NO.
In the ICSI task, the hit rate for the top 50 senones sometimes drops to 50%.
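The collision (hit) rate of Analysis 2 can be measured per GMM roughly as below. This is a sketch of one plausible definition, the fraction of consecutive frame pairs whose best Gaussian index is unchanged, and may differ from the exact measure used in these experiments:

```python
from typing import Sequence


def best_index_collision_rate(frames: Sequence[Sequence[float]]) -> float:
    """Fraction of consecutive frame pairs (t-1, t) in which the best
    Gaussian index of one GMM is unchanged.

    frames[t][k] is log(w_k N_k(x_t)) for Gaussian k at frame t.
    """
    if len(frames) < 2:
        raise ValueError("need at least two frames")
    best = [max(range(len(f)), key=f.__getitem__) for f in frames]
    hits = sum(1 for a, b in zip(best, best[1:]) if a == b)
    return hits / (len(best) - 1)
```

Averaging this over all senones versus only over the top-scoring senones per frame corresponds to the two cases distinguished above.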

29
Analysis 3: Relative magnitude of the distortion caused by different approximations

Taking the distortion from using the current index as 1:
In frame dropping (significant degradation),
distortion from using the previous index is
Comm.: 20 (2 mix), 40 (32 mix)
ICSI: 10 (2 mix), 20 (32 mix)
Distortion from using the previous score:
not tested due to lack of time.
Ad-hoc observation: < using previous index,
but >> better than using the CI score.
In CI-GMM selection, not much degradation.
But
distortion from using the CI score is about 100 times larger
than using the previous index
(200-1000).

30
Some thoughts

Why doesn't frame dropping work when its distortion is so low?
Why does CI GMM selection work when its distortion is so high?
My answer:
It doesn't matter which approximation is used.
What matters is whether the best scores are computed.
CI GMMS still keeps the best GMM scores.
Frame dropping always throws away the N-best GMM scores.

31
Method 3

Motivations
At every frame, the best senone scores still need to be computed, even in frames we want to ignore.
Concerns
But how do we preserve the effectiveness of down-sampling?

32
Method 3

Another very simple idea.
Trick: use CIGMMS for every frame.
But for alternate frames, or frames we want to ignore,
multiply the CI-GMMS beam by a factor F (0 < F < 1).
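A minimal Python sketch of the trick, in the shape of a plain CI-GMMS loop (log-domain scores; `cd_to_ci`, `compute_cd`, `factor` and `period` are hypothetical names). Every frame runs CI-GMMS; frames that down-sampling would have skipped (here, as an assumption, those with `t % period != 0`) use the beam multiplied by `factor`:

```python
from typing import Callable, List


def method3_frame(t: int, ci_scores: List[float], cd_to_ci: List[int],
                  compute_cd: Callable[[int], float],
                  beam: float, factor: float, period: int = 2) -> List[float]:
    """CI-GMMS on every frame with a frame-dependent beam (log domain).

    Frames that plain down-sampling would skip (t % period != 0, an
    assumption) use beam * factor; other frames use the full beam.
    """
    if not 0.0 <= factor <= 1.0:
        raise ValueError("factor must be in [0, 1]")
    eff_beam = beam * factor if t % period else beam
    best_ci = max(ci_scores)
    scores: List[float] = []
    for cd, ci in enumerate(cd_to_ci):
        if ci_scores[ci] >= best_ci - eff_beam:
            scores.append(compute_cd(cd))      # full CD evaluation
        else:
            scores.append(ci_scores[ci])       # back off to the CI score
    return scores
```

In this simplified sketch a zero-width beam still evaluates the senones of the best CI phone, so `factor = 0` only approximates plain down-sampling; `factor = 1` reduces to ordinary CI-based GMM selection on every frame.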

33
Idea 3 (Results)
Task\Rec.  | S3.X Method 1+2 | S3.X Method 1+2+3
Comm. 2-3k | 12.834 (0.64x)  | 13.11 (0.56x)
WSJ 5k     | 6.73 (0.63x)    | 6.90 (0.59x)
ICSI 12k   | 35.35 (0.89x)   | 36.43 (0.73x)
34
Idea 3 (Discussion)

Advantages of the scheme
Best senone scores are still computed when F > 0.
More tunable:
the tightening factor is a real number.
Preserves the properties of CI-GMMS and down-sampling:
when F = 0, equivalent to down-sampling;
when F = 1, equivalent to CI-based GMM selection.
A smoothing between the frame level and the Gaussian level.
The idea is essentially a dynamic beam.

35
Summary
Rec\Task     | Comm 2-3k      | WSJ 5k       | ICSI 12k
BL (untuned) | 11.91 (3.63x)  | 6.18 (1.07x) | 30.63 (3.77x)
BL (tuned)   | 12.851 (0.89x) | 6.73 (0.64x) | 32.90 (1.48x)
Meth 1       | 12.84 (0.73x)  | Doesn't help | 33.76 (1.15x)
SVQ          | -              | -            | 35.35 (0.90x)
Meth 2       | 12.84 (0.64x)  | 6.73 (0.63x) | 35.35 (0.88x)
Meth 3       | 13.11 (0.56x)  | 6.90 (0.59x) | 36.43 (0.73x)
36
Conclusion

Only a 20-25% gain was obtained from the 3 computation improvements (90% last time).
Pruned and non-pruned conditions are different scenarios.
Jointly optimizing two levels would give around a 5-10% solid gain.
It's time to leave GMM computation and work on some other things.

37
Side note: Snapshots of Recent Development of Sphinx 3.6