Performance Improvement of GMM Computation in Sphinx 3.6

Provided by: Arthu61
Learn more at: http://www.cs.cmu.edu
Transcript and Presenter's Notes

Title: Performance Improvement of GMM Computation in Sphinx 3.6


1
Performance Improvement of GMM Computation in
Sphinx 3.6
  • Arthur Chan
  • Carnegie Mellon University
  • Mar 10, 2005

2
This seminar
  • Not very refined. Some info is missing.
  • 30 slides.
  • Outline
  • Overview of GMM Computation in Sphinx 3.X (X<5)
    (<- This part is not new.)
  • 3 Improvements with Experimental Results (<- This
    part is new.)
  • Discussion

3
Mechanism of GMM Computation in S3.X (X<5)
4
Computation at every frame in Sphinx
5
Computation of GMMs in a Continuous HMM ASR system
  • Order of Computation
  • Frames x GMMs x Gaussians x Feature length
  • Typical Numbers
  • Frames: 1000
  • GMMs: 5000
  • Gaussians: 8 to 64
  • Feature length: 39
  • Not practical to compute all of them fully.
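
To make the order of computation concrete, the short Python sketch below multiplies out the typical numbers on this slide. Counting one "operation" as one per-dimension term of a Gaussian is an assumption made here for illustration only; the slide does not define the unit.

```python
# Typical numbers taken from the slide.
FRAMES = 1000
GMMS = 5000
FEATURE_LENGTH = 39


def full_gmm_cost(gaussians_per_gmm: int) -> int:
    """Per-dimension terms needed if every Gaussian of every GMM
    is evaluated in every frame (counting unit is an assumption)."""
    return FRAMES * GMMS * gaussians_per_gmm * FEATURE_LENGTH


low = full_gmm_cost(8)    # 8 Gaussians per GMM
high = full_gmm_cost(64)  # 64 Gaussians per GMM
print(f"{low:.3e} to {high:.3e} per-dimension terms")
```

With these numbers the full computation needs between roughly 1.6 billion and 12.5 billion per-dimension terms for 1000 frames.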

6
Overview of GMM Computation in Sphinx 3.X (X<5)
  • Philosophy
  • No single technique will give the best
    accuracy/speed trade-off.
  • Techniques in the literature can be categorized
    and combined in a systematic manner.
  • Four Level Categorization of GMM Computation
    Techniques
  • Frame-level (Down-sampling)
  • GMM-level (CI-based GMM Selection)
  • Gaussian-level (VQ-based and SVQ-based Gaussian
    Selection)
  • Component-level (Sub-vector quantization)
  • 3.4: 75-80% speed gain with 5-10% rel.
    degradation.

7
Fast GMM Computation, Level 1: Frame Selection
  • Compute GMMs in every other frame only
  • Improvement: Compute GMM only if current
    frame is similar to previous frame
8
Fast GMM Computation, Level 2: Senone/GMM
Selection
  • Compute a GMM only when its base phone is highly
    likely
  • Others are backed off to the base-phone
    scores.
  • Similar to
  • Julius (Akinobu 1999)
  • Microsoft's Rich Get Richer (RGR) heuristics
9
Fast GMM Computation, Level 3: Gaussian Selection
[Figure labels: Gaussian, GMM]
10
Fast GMM Computation, Level 4: LDA
[Figure labels: Gaussian, Feature Component]
11
Frame-level and GMM-level Techniques in S3.X
(X<5)
  • Frame-level
  • Skipping frames
  • Only compute GMMs for 1 out of every N frames.
  • For skipped frames, copy the scores of the most recently computed frame.
  • GMM-level
  • Use the CI GMM score as an approximate score
  • If a CD GMM has a good CI GMM score (within a
    beam)
  • Compute the full CD score
  • If not
  • Back off to the CI score.
  • A good CI GMM score is defined as
  • Within the beam of the best CI GMM score.
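
The two techniques on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the Sphinx code: the function names, the use of log-domain scores, and the beam expressed as a positive log-domain width are all assumptions.

```python
def frames_to_compute(num_frames, n):
    """Frame-level skipping: indices of the frames whose GMMs are
    computed (1 out of every n); the others reuse earlier scores."""
    return [t for t in range(num_frames) if t % n == 0]


def select_cd_gmms(ci_scores, cd_to_ci, ci_beam):
    """GMM-level selection. A CD GMM is computed fully only if its
    parent CI GMM scores within `ci_beam` of the best CI score;
    otherwise it is backed off to the CI score.
    Scores are assumed to be log-likelihoods (higher is better)."""
    best_ci = max(ci_scores)
    compute, back_off = [], []
    for cd, ci in enumerate(cd_to_ci):
        if ci_scores[ci] >= best_ci - ci_beam:
            compute.append(cd)
        else:
            back_off.append(cd)
    return compute, back_off


ci_scores = [-10.0, -12.0, -30.0]  # hypothetical log scores of 3 CI GMMs
cd_to_ci = [0, 0, 1, 2, 2]         # hypothetical parent CI GMM of each CD GMM
compute, back_off = select_cd_gmms(ci_scores, cd_to_ci, ci_beam=5.0)
print(frames_to_compute(7, 3))
print(compute, back_off)
```

In this toy example the third CI GMM falls outside the beam, so its two CD GMMs receive the CI score instead of a full evaluation.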

12
Weaknesses of the Frame-level and GMM-level
Techniques
  • Frame-level
  • Deteriorates performance significantly (>10%)
  • Hard to tune.
  • GMM-level
  • The number of GMMs computed varies from frame to
    frame.
  • -> Worst-case performance is poor
  • CI score is used to back off
  • -> Search performance degrades because a lot of
    scores are the same.

13
Baseline Experiments
14
Baseline experiments
  • Tested on 3 tasks
  • Tested in a tough condition
  • Manually tuned
  • Tuned on the test set (Sorry, couldn't get the dev.
    set.)
  • Optimized one dimension at a time.
  • Very close to optimal
  • Goal
  • faster.
  • graceful degradation (<5%)

15
Tasks evaluated (General Description)
Task\Info    Vocab   F (kHz)   Description
Comm.        2-3k    8         Telephone channel. Not very noisy.
WSJ 5k       5k      16        WSJ dictation. Clean.
ICSI Meet    12k     11.025    Meeting task. Noisy. Very challenging task for AM or LM.
16
Tasks evaluated (Baseline Speed/Accuracy on
2.2GHz P4)
Task\Rec.        S3.X (slow)      S3.X (untuned)   S3.X (tuned)
Comm. 2-3k       10.42 (7-8x?)    11.91 (3.63x)    12.851 (0.89x)
WSJ 5k           6.24 (4.28x)     6.18 (1.07x)     6.73 (0.64x)
ICSI Meet 12k    28.42 (8.25x)    30.63 (3.77x)    32.90 (1.48x)
17
Proposed Methods
18
Proposed Methods (A glance)
  • The goals of the 3 methods
  • Method 1: Try to reduce the variance of GMM
    computation time.
  • Method 2: Try to make CI-GMMS better behaved.
  • Method 2 and a half: Try to make down-sampling
    better behaved.
  • Didn't work. We will try to analyse why.
  • Method 3: An idea inspired by the analysis.

19
Method 1: Use a fixed upper bound for GMMs
computed in each frame
  • Only compute the CD scores if
  • The corresponding CI score is within the CI beam, AND
  • The number of CD GMMs computed would not exceed a
    certain number.
  • Advantages
  • Per-utterance GMM computation can be more predictable.
  • Gets a better bargain in trading off computation.
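
The two conditions of Method 1 can be sketched as follows. This is a hedged illustration: the name `max_cd` is hypothetical, the beam is assumed to be a positive log-domain width, and visiting the CD GMMs in index order is an assumption, since the slide does not say which in-beam GMMs are dropped once the cap is reached.

```python
def select_cd_gmms_capped(ci_scores, cd_to_ci, ci_beam, max_cd):
    """Compute a CD GMM only if (a) its parent CI score is within the
    CI beam AND (b) fewer than `max_cd` CD GMMs have been selected in
    this frame. Everything else is backed off to the CI score.
    Visiting order (index order) is an assumption of this sketch."""
    best_ci = max(ci_scores)
    compute, back_off = [], []
    for cd, ci in enumerate(cd_to_ci):
        within_beam = ci_scores[ci] >= best_ci - ci_beam
        if within_beam and len(compute) < max_cd:
            compute.append(cd)
        else:
            back_off.append(cd)
    return compute, back_off


ci_scores = [-10.0, -12.0, -30.0]  # hypothetical log scores of 3 CI GMMs
cd_to_ci = [0, 0, 1, 2, 2]         # hypothetical parent CI GMM of each CD GMM
capped = select_cd_gmms_capped(ci_scores, cd_to_ci, ci_beam=5.0, max_cd=2)
print(capped)
```

Because at most `max_cd` CD GMMs are evaluated per frame, the per-frame cost has a fixed ceiling, which is what makes the computation time more predictable.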

20
Method 1 Results
Task\Rec.        S3.X (tuned BL)   Method 1
Comm. 2-3k       12.851 (0.89x)    12.834 (0.73x)
WSJ 5k           6.73 (0.64x)      Doesn't help
ICSI Meet 12k    32.90 (1.48x)     33.76 (1.15x)
21
Method 2: Use the best Gaussian index from the
previous frame.
  • Best Gaussian index: what does it mean?
  • The index of the Gaussian with the best score in a GMM.
  • Why is it useful?
  • Two major reasons from the literature
  • 1. In reality, the best Gaussian score dominates
    the GMM score. (up to 95-99%)
  • 2. Usually, the collision rate of the best
    Gaussian indices in the current and previous
    frames is quite high. (The literature says 70%.)
  • (Q: Are these assumptions really correct?)
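
The first assumption can be checked numerically on a toy GMM. The sketch below is illustrative only: it assumes diagonal-covariance Gaussians and made-up parameters, and compares the full GMM log score (log of the weighted sum) with the log score of the single best Gaussian.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def component_log_scores(x, weights, means, variances):
    """Weighted log score of each Gaussian in the GMM."""
    return [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]


def log_sum_exp(values):
    """Numerically stable log of a sum of exponentials."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


# Hypothetical 2-component GMM and feature vector.
x = [0.1, -0.2]
scores = component_log_scores(
    x,
    weights=[0.5, 0.5],
    means=[[0.0, 0.0], [3.0, 3.0]],
    variances=[[1.0, 1.0], [1.0, 1.0]],
)
best_index = max(range(len(scores)), key=scores.__getitem__)
full_score = log_sum_exp(scores)
best_share = math.exp(scores[best_index] - full_score)  # best Gaussian's share
print(best_index, full_score - scores[best_index], best_share)
```

When one Gaussian is much closer to the feature vector than the others, the full score and the best-Gaussian score are nearly identical, which is the property Method 2 relies on.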

22
Method 2 (Algorithm)
  • In CIGMMS,
  • for each non-computed senone (one that was backed off to
    CI)
  • If the best index of the previous frame is available,
    assume it is the current best index
  • Compute the GMM score with that index
  • This improves the smoothing performance of CIGMMS
  • Better accuracy
  • We can use a tighter beam.
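
A minimal sketch of this algorithm for a single senone is given below. The names (`score_senone`, `prev_best`), the dictionary layout of a GMM, the diagonal-covariance form, and carrying the previous index forward unchanged after an approximation are all assumptions of this sketch rather than details taken from the slides.

```python
import math


def component_log_score(x, gmm, k):
    """Weighted log score of Gaussian k (diagonal covariance assumed)."""
    mean, var = gmm["means"][k], gmm["variances"][k]
    log_density = -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )
    return math.log(gmm["weights"][k]) + log_density


def score_senone(x, gmm, in_ci_beam, prev_best, ci_score):
    """Return (score, best index to remember for the next frame)."""
    if in_ci_beam:
        # Full computation: evaluate every Gaussian, remember the best one.
        scores = [component_log_score(x, gmm, k)
                  for k in range(len(gmm["weights"]))]
        best = max(range(len(scores)), key=scores.__getitem__)
        top = scores[best]
        total = top + math.log(sum(math.exp(s - top) for s in scores))
        return total, best
    if prev_best is not None:
        # Method 2: evaluate only the Gaussian that was best last frame.
        return component_log_score(x, gmm, prev_best), prev_best
    # No index available: fall back to the CI score as before.
    return ci_score, None


gmm = {"weights": [0.5, 0.5],
       "means": [[0.0], [5.0]],
       "variances": [[1.0], [1.0]]}  # hypothetical 1-D GMM
full = score_senone([0.0], gmm, True, None, -99.0)
approx = score_senone([0.0], gmm, False, 1, -99.0)
backed_off = score_senone([0.0], gmm, False, None, -99.0)
print(full, approx, backed_off)
```

The approximation costs one Gaussian evaluation instead of the whole mixture, while still using the current frame's features, unlike a copied CI score.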

23
Results
Task\Rec.        S3.X Method 1          S3.X Method 1+2   S3.X Method 1+2, small beam
Comm. 2-3k       12.834 (0.73x)         12.650 (0.73x)    12.834 (0.64x)
WSJ 5k           6.73 (0.64x)           6.707 (0.64x)     6.73 (0.62x)
ICSI Meet 12k    35.35 (SVQ) (0.90x)    34.79 (0.93x)     35.35 (0.88x)
24
Method 2 and a half (Algorithm)
  • In frame dropping,
  • When the last best index is available, assume it is the
    current best index.
  • Compute GMM.

25
Results
  • Not shown
  • Because there is no improvement
  • Why doesn't a better approximation give any gain?

26
Comparison of Different Types of GMM Score
Approximation
  • GMM scores
  • Use current best index
  • Not practical, because the whole GMM needs to be
    computed first.
  • Use previous score
  • But the current frame's information is not used.
  • Use previous best index
  • If the two assumptions are true, this is a good
    method.
  • Use corresponding CI score
  • Replaces the CD score with the CI score. Hurts the
    best-performing senones.

27
Analysis 1: Log-likelihood distortion if the current best
index is used. (Is assumption 1 correct?)
mix               2       4       8       16      32
Comm              1e-8    1e-8    1e-7    1e-7    1e-7
Comm (50 best)    1e-8    1e-8    1e-7    1e-7    1e-7
ICSI              1e-8    1e-8    1e-8    1e-8    1e-8
ICSI (50 best)    1e-8    1e-8    1e-8    1e-8    1e-8
28
Analysis 2: Is the collision rate always 70%?
  • On average, YES
  • For the top senones in a noisy task, NO
  • In the ICSI task, the hit rate for the top 50
    senones will sometimes drop to 50%
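
One plausible way to measure the collision (hit) rate for a single GMM is sketched below. The definition used here, the fraction of consecutive frame pairs whose best Gaussian index is unchanged, is an assumption; the slides do not spell out the exact measurement.

```python
def collision_rate(best_indices):
    """Fraction of consecutive frame pairs in which the best Gaussian
    index of a GMM stays the same (assumed definition)."""
    pairs = list(zip(best_indices, best_indices[1:]))
    if not pairs:
        return 0.0
    return sum(1 for prev, cur in pairs if prev == cur) / len(pairs)


# Hypothetical best-Gaussian indices of one GMM over five frames.
rate = collision_rate([3, 3, 3, 1, 1])
print(rate)
```

Averaging this quantity over all senones, or only over the top-scoring senones of each frame, would give the two views contrasted on this slide.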

29
Analysis 3: Relative magnitude of
distortion caused by different approximations
  • If the distortion from using the current index is 1
  • In frame dropping (significant degradation)
  • Distortion from using the previous index is
  • Comm.: 20 (in 2 mix), 40 (in 32 mix)
  • ICSI: 10 (in 2 mix), 20 (in 32 mix)
  • Distortion from using the previous score
  • Not tested because I didn't have time.
  • Ad-hoc observation: < using previous index
  • but >> better than CI score.
  • In CI-GMM selection, not much degradation
  • But
  • Distortion from using the CI score is 100 times
    that of using the previous index
  • 200-1000

30
Some thoughts
  • Why doesn't frame dropping work if its distortion is
    not high?
  • Why does CI GMM selection work if its distortion is so
    high?
  • My Answer
  • It doesn't matter which approximation is used.
  • What matters is whether the best scores are
    computed.
  • CI GMMS still keeps the best GMM scores.
  • Frame dropping always throws away the N best
    GMM scores.

31
Method 3
  • Motivations
  • At every frame, the best senone scores still need to
    be computed, even in frames that are to be ignored.
  • Concerns
  • But how to preserve the effectiveness of
    down-sampling?

32
Method 3
  • Another very simple idea.
  • Trick: Use CIGMMS for every frame.
  • But for alternate frames, or frames we want to
    ignore,
  • Multiply the CI-GMMS beam by a factor F (0 < F < 1).
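
The trick can be sketched as a per-frame beam schedule. This is an illustration under assumptions: the beam is taken to be a positive log-domain width, `period` is a hypothetical parameter naming which frames keep the full beam, and the toy scores are made up.

```python
def ci_beam_for_frame(t, base_beam, period, factor):
    """Full CI-GMMS beam on every `period`-th frame; on the frames that
    down-sampling would have skipped, the beam is tightened by `factor`."""
    if not 0.0 <= factor <= 1.0:
        raise ValueError("factor must lie in [0, 1]")
    return base_beam if t % period == 0 else base_beam * factor


def count_cd_computed(ci_scores, cd_to_ci, beam):
    """Number of CD GMMs whose parent CI score is within the beam."""
    best_ci = max(ci_scores)
    return sum(1 for ci in cd_to_ci if ci_scores[ci] >= best_ci - beam)


ci_scores = [-10.0, -12.0, -30.0]  # hypothetical log scores of 3 CI GMMs
cd_to_ci = [0, 0, 1, 2, 2]         # hypothetical parent CI GMM of each CD GMM
counts = [
    count_cd_computed(ci_scores, cd_to_ci,
                      ci_beam_for_frame(t, base_beam=5.0, period=2, factor=0.2))
    for t in range(4)
]
print(counts)
```

On the tightened frames fewer CD GMMs are computed, but those under the best CI GMM are never dropped. In this sketch that remains true even at a factor of 0, so the limit only approximates plain down-sampling.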

33
Idea 3 (Results)
Task\Rec.     S3.X Method 1+2    S3.X Method 1+2+3
Comm. 2-3k    12.834 (0.64x)     13.11 (0.56x)
WSJ 5k        6.73 (0.63x)       6.90 (0.59x)
ICSI 12k      35.35 (0.89x)      36.43 (0.73x)
34
Idea 3 (Discussion)
  • Advantage of the scheme
  • Best senone scores are still computed when F > 0
  • More tunable
  • Tightening factor is a real number
  • Preserves the properties of CI-GMMS and
    down-sampling.
  • When F = 0, equivalent to down-sampling
  • When F = 1, equivalent to CI-based GMM selection
  • A smoothing between frame-level and
    GMM-level.
  • Idea is dynamic beam

35
Summary
Rec\Task        Comm 2-3k         WSJ 5k          ICSI 12k
BL (untuned)    11.91 (3.63x)     6.18 (1.07x)    30.63 (3.77x)
BL (tuned)      12.851 (0.89x)    6.73 (0.64x)    32.90 (1.48x)
Meth 1          12.84 (0.73x)     Doesn't help    33.76 (1.15x)
SVQ             -                 -               35.35 (0.90x)
Meth 2          12.84 (0.64x)     6.73 (0.63x)    35.35 (0.88x)
Meth 3          13.11 (0.56x)     6.90 (0.59x)    36.43 (0.73x)
36
Conclusion
  • Only a 20-25% gain was obtained from the 3 computation
    improvements. (90% last time)
  • Pruned and non-pruned conditions are different
    scenarios
  • Jointly optimizing two
    levels would give around a 5-10% solid gain.
  • It's time to leave GMM computation and work on some
    other things.

37
Side note: Snapshots of Recent Development of
Sphinx 3.6
  • The use of per-frame CI GMM scores is still not
    optimal
  • Jim, why don't you use lexical retrieval? It's
    very easy to implement.
  • Still no improvement in search
  • Alex, seriously, when can you implement a
    search using lexical tree copies?
  • ICSI/CALO Meeting task gives us a lot of fun/pain.
  • Sphinx 3: the 20-30% improvement doesn't always show
    up.
  • Arthur, do you want to say something?
  • Some S3 and ST functions look really
    funny/awful.
  • Yitao, Sigh.
  • Dave, Evandro, (Shake their heads)

38
Acknowledgement
  • Thanks
  • Ravi
  • Alex
  • Evandro
  • Dave

39
Q & A