LOG-ENERGY DYNAMIC RANGE NORMALIZATION FOR ROBUST SPEECH RECOGNITION

1
LOG-ENERGY DYNAMIC RANGE NORMALIZATION FOR ROBUST
SPEECH RECOGNITION
  • Weizhong Zhu and Douglas O'Shaughnessy
  • INRS-EMT, University of Quebec
  • Montreal, Quebec, H5A 1K6, Canada
  • Presenter: Chen, Hung-Bin

ICASSP 2005
2
Outline
  • Introduction
  • Observation
  • Energy dynamic range normalization method 1
  • Energy dynamic range normalization method 2
  • Experiment
  • Conclusion

3
Introduction
  • Automatic Speech Recognition (ASR) has been in
    commercial application for decades but still has
    severe limitations.
  • Accuracy of speech recognition degrades rapidly
    when speech is distorted by noise.
  • Methods to overcome the effects of noise must be
    applied in order to achieve good recognition
    accuracy in real speech recognition applications
    where various types of noises may exist.
  • Robust speech recognition is one of the most
    challenging areas of speech recognition.

4
Introduction (cont.)
  • Methods of robust speech recognition can be
    classified into two approaches.
  • Front-end processing suppresses the noise to
    obtain more robust parameters.
  • Back-end processing compensates for noise and
    adapts the parameters inside the HMM system.
  • In this paper, we focus on the first approach.
  • We try to find a more effective way to remove the
    effects of additive noise for the log-energy
    feature.
  • We propose a log-energy dynamic range
    normalization (ERN) method to minimize mismatch
    between training and testing data.
  • The dynamic range of the log-energy feature
    sequence of an utterance is normalized to a
    target dynamic range.

5
Observation
  • Comparing the log-energy feature sequence of
    noisy speech at 10 dB SNR with that of clean
    speech shows two differences:
  • 1. The minimum value is elevated.
  • 2. Valleys are buried by additive noise energy,
    while peaks are not affected as much.
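
The two observations can be reproduced on synthetic data. The sketch below is only an illustration under stated assumptions: the test signal (an amplitude-modulated tone), the white Gaussian noise, the 8 kHz sampling rate, the frame sizes and the helper names `frame_log_energy_db` and `add_noise_at_snr` are all hypothetical choices, not taken from the paper, and log-energy is expressed in dB.

```python
import numpy as np

def frame_log_energy_db(signal, frame_len=200, hop=80, floor=1e-10):
    """Per-frame log-energy in dB: 10*log10(sum of squared samples)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    energies = np.array([
        np.sum(signal[i * hop : i * hop + frame_len] ** 2)
        for i in range(n_frames)
    ])
    return 10.0 * np.log10(np.maximum(energies, floor))

def add_noise_at_snr(clean, snr_db, rng):
    """Add white Gaussian noise scaled so the overall SNR equals snr_db."""
    noise = rng.standard_normal(len(clean))
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    scale = np.sqrt(clean_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return clean + scale * noise

rng = np.random.default_rng(0)
t = np.arange(16000) / 8000.0                    # 2 s at an assumed 8 kHz rate
envelope = np.abs(np.sin(2 * np.pi * 1.5 * t))   # alternating loud and quiet stretches
clean = envelope * np.sin(2 * np.pi * 440.0 * t)
noisy = add_noise_at_snr(clean, snr_db=10.0, rng=rng)

clean_le = frame_log_energy_db(clean)
noisy_le = frame_log_energy_db(noisy)
valley_lift = noisy_le.min() - clean_le.min()         # how far the minimum rises
peak_shift = abs(noisy_le.max() - clean_le.max())     # how far the maximum moves
print(f"valley lifted by {valley_lift:.1f} dB, peak shifted by {peak_shift:.1f} dB")
```

On this toy signal the minimum rises by several dB while the maximum barely moves, which is the mismatch pattern described above.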

6
Energy dynamic range normalization
  • The larger difference on valleys leads to a
    mismatch between the clean and noisy speech.
  • To minimize the mismatch, we suggest an algorithm
    that scales the log-energy feature sequence of
    clean speech, lifting valleys while keeping
    peaks unchanged.

7
Energy dynamic range normalization
  • We define the log-energy dynamic range of the
    sequence as follows
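
The defining equation is not reproduced in this transcript. As a minimal sketch, assuming the dynamic range is simply the gap between the largest and smallest frame log-energy of the utterance, expressed in dB, it could be computed as follows. The function name `dynamic_range_db` and the sample values are hypothetical.

```python
import numpy as np

def dynamic_range_db(log_energy_db):
    """Dynamic range of a log-energy sequence (assumed: max minus min, in dB)."""
    log_energy_db = np.asarray(log_energy_db, dtype=float)
    return float(log_energy_db.max() - log_energy_db.min())

# Illustrative values only, not taken from the paper.
clean_like = [2.0, 15.0, 30.0, 8.0, 27.0]
noisy_like = [16.0, 19.0, 30.5, 17.0, 27.5]
print(dynamic_range_db(clean_like))  # 28.0
print(dynamic_range_db(noisy_like))  # 14.5
```

Under this assumed definition, additive noise shrinks the dynamic range mainly by raising the minimum.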

8
Energy dynamic range normalization
  • The steps of the proposed log-energy feature
    dynamic range normalization algorithm are as
    follows
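
The individual steps are not reproduced in this transcript. The sketch below is one plausible linear scaling that matches the description elsewhere in the slides (valleys are lifted, the maximum is unchanged, and the result fits a target dynamic range). The exact form, the function name `ern_linear` and the assumption that log-energy is in dB are mine, not confirmed by the source.

```python
import numpy as np

def ern_linear(log_energy_db, target_range_db):
    """Lift valleys linearly so that max - min <= target_range_db.

    Assumed form of the linear scaling, not the equation from the slides.
    The maximum of the sequence is left unchanged.
    """
    e = np.asarray(log_energy_db, dtype=float)
    e_max, e_min = e.max(), e.min()
    target_min = e_max - target_range_db
    if e_min >= target_min:
        # Dynamic range is already within the target: nothing to do.
        return e.copy()
    # Each frame is lifted in proportion to its distance below the maximum,
    # so the minimum lands on target_min and the maximum gets zero lift.
    lift = (target_min - e_min) * (e_max - e) / (e_max - e_min)
    return e + lift

# Illustrative values only, not taken from the paper.
frames = [2.0, 15.0, 30.0, 8.0, 27.0]
normalized = ern_linear(frames, target_range_db=14.0)
print(normalized)  # values: 16.0, 22.5, 30.0, 19.0, 28.5
```

Because the lift is a linear function of each frame's value, the ordering of frames is preserved; only the spread below the maximum is compressed.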

9
Energy dynamic range normalization
  • The linear scaling equation may not be the best
    solution.
  • We modify the linear scaling equation into a
    non-linear scaling equation.

10
Energy dynamic range normalization
  • Figure 2 shows a schematic representation of the
    scaling effect of the proposed algorithm.
  • The scaling effect on a frame decreases as its
    log-energy value goes up, and the maximum of the
    sequence is unchanged.

11
Experiment
  • The proposed method was evaluated on the Aurora 2
    database.
  • All recognition tests were conducted using the
    HTK recognition toolkit with the setting defined
    for evaluation.
  • Speech models are eleven whole-word HMMs, fixed
    to 16 states with 3 diagonal Gaussian mixtures
    per state.
  • Two silence models are defined.
  • Data in Test A have Subway, Babble, Car and
    Exhibition noise added.
  • Data in Test B have Restaurant, Street, Airport
    and Station noise added.
  • In Test C, besides the additive noise, channel
    distortion is also included.

12
Recognition results
  • The results in this section are reported in
    terms of relative improvement (R.I.),
  • where NewScore and Baseline are the recognition
    accuracies for each test using the proposed and
    reference algorithms, respectively.
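
The R.I. equation itself is not reproduced in this transcript. A minimal sketch, assuming the definition commonly used for Aurora-style evaluations (the fraction of the baseline error that the new method removes, in percent), is shown below. The function name `relative_improvement` and the example numbers are hypothetical.

```python
def relative_improvement(new_score, baseline):
    """Relative improvement in percent (assumed definition).

    Both arguments are recognition accuracies in percent. The result is
    the share of the baseline error (100 - baseline) that is removed.
    """
    if baseline >= 100.0:
        raise ValueError("baseline accuracy must be below 100%")
    return 100.0 * (new_score - baseline) / (100.0 - baseline)

# Illustrative numbers only: raising accuracy from 60% to 80%
# removes half of the baseline error.
print(relative_improvement(new_score=80.0, baseline=60.0))  # 50.0
```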

13
Experiment
  • Table 1 shows the relative improvements for
    different target log-energy dynamic ranges.

14
Experiment
  • The results of relative improvement in different
    target dynamic ranges using this non-linear
    normalization method are shown in Table 2.
  • It achieves its highest overall relative
    improvement, 30.83%, when the target range is
    set to 14 dB.

15
Experiment
  • Performance comparisons between the linear and
    non-linear normalization methods, in terms of
    average relative improvement at different SNR
    levels, are shown in Table 3.
  • The mean recognition accuracy for each test set
    is obtained by averaging the recognition
    accuracies measured at 20, 15, 10, 5 and 0 dB
    SNR.
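
The averaging rule above can be sketched as follows. The dictionary layout, the function name `mean_accuracy` and the accuracy values are made up for illustration; any conditions outside the five listed SNR levels are simply ignored.

```python
SNR_LEVELS_DB = (20, 15, 10, 5, 0)

def mean_accuracy(accuracy_by_snr):
    """Average accuracy over the 20, 15, 10, 5 and 0 dB conditions only."""
    return sum(accuracy_by_snr[snr] for snr in SNR_LEVELS_DB) / len(SNR_LEVELS_DB)

# Made-up accuracies (%), for illustration only, not results from the paper.
example = {"clean": 99.0, 20: 95.0, 15: 90.0, 10: 80.0, 5: 60.0, 0: 30.0, -5: 10.0}
print(mean_accuracy(example))  # 71.0
```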

16
Experiment
  • Experiment 2 is reported in Table 4.
  • In experiment 2, we answer two questions:
  • (1) What are the results of techniques such as
    cepstral mean and variance normalization?
  • (2) Can the proposed algorithms be combined with
    these techniques to get an even better result?
  • CMN refers to cepstral mean normalization,
    applied to all 13 parameters.
  • CVN refers to cepstral variance normalization,
    applied to all 13 parameters.
  • ERN(L) and ERN(N) refer to the proposed linear
    and non-linear methods, respectively.
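
For reference, cepstral mean and variance normalization can be sketched as below. This is a generic per-utterance version, not the configuration used in the experiments: the function name `cmvn`, the per-utterance statistics and the random stand-in features are assumptions.

```python
import numpy as np

def cmvn(features, normalize_variance=True, eps=1e-8):
    """Per-utterance cepstral mean (and optionally variance) normalization.

    features: array of shape (num_frames, num_params), e.g. 13 parameters.
    """
    feats = np.asarray(features, dtype=float)
    out = feats - feats.mean(axis=0)             # CMN: zero mean per parameter
    if normalize_variance:
        out = out / (feats.std(axis=0) + eps)    # CVN: unit variance per parameter
    return out

# Synthetic stand-in for one utterance with 13 parameters per frame.
rng = np.random.default_rng(0)
utterance = rng.normal(loc=3.0, scale=2.0, size=(100, 13))
normalized = cmvn(utterance)
print(normalized.mean(axis=0).round(6), normalized.std(axis=0).round(6))
```

Passing `normalize_variance=False` gives CMN alone, which corresponds to the mean-only condition in the list above.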

17
Experiment
18
Conclusion
  • A log-energy dynamic range normalization
    technique is introduced to improve ASR
    performance in noisy conditions.
  • Reducing mismatch in log-energy leads to a large
    recognition improvement.
  • It is also confirmed that the proposed algorithm
    can be combined with the cepstral mean or
    variance normalization techniques to achieve an
    even better result.