Title: LOW-RESOURCE NOISE-ROBUST FEATURE POST-PROCESSING ON AURORA 2.0
1 LOW-RESOURCE NOISE-ROBUST FEATURE POST-PROCESSING ON AURORA 2.0
- Chia-Ping Chen, Jeff Bilmes and Katrin Kirchhoff
- SSLI Lab
- Department of Electrical Engineering
- University of Washington
- Presenter: Shih-Hsiang
ICSLP 2002
2 Introduction
- The performance of ASR systems often decreases dramatically when the noise level increases
- The degradation is minor when the signal-to-noise ratio (SNR) is high, but quite significant at low SNR levels
- In the past, a variety of techniques have been proposed
- Principal component analysis and a discriminative neural network (Ellis et al. 2001)
- Missing data theory (Cooke et al. 2001)
- Voice activity detector (VAD) and variable frame rate are used to drop noisy feature vectors and reduce insertion errors (John et al. 2001)
- Nonlinear spectral subtraction, noise masking, feature filters, and model adaptation (Lieb et al. 2001)
- Data-driven temporal filters, on-line mean and variance normalization, voice activity detection, and server-side discriminative features are integrated to improve noise robustness (Morgan et al. 2001)
- etc.
3 Literature Review
- Ellis et al. 2001
- John et al. 2001
- Variable frame rate processing
- An observation vector is discarded if it does not differ much from the previous observation vector. In our implementation of VFR, frame-to-frame variation is estimated as the Euclidean norm of the sub-vector corresponding to the delta-cepstrum.
- Voice activity detection
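The frame-dropping rule described above can be sketched as follows. This is only an illustration, not the implementation of John et al.: the function name, the layout of the feature matrix (which columns hold the delta-cepstrum), the threshold value, and the choice to always keep the first frame are all assumptions made for this example.

```python
import numpy as np

def variable_frame_rate(frames, delta_slice, threshold):
    """Drop frames whose delta-cepstrum norm falls below a threshold.

    frames      : (T, D) array, one feature vector per row
    delta_slice : slice selecting the delta-cepstrum columns (assumed layout)
    threshold   : hypothetical tuning parameter, not taken from the slides
    """
    frames = np.asarray(frames, dtype=float)
    # Frame-to-frame variation: Euclidean norm of the delta-cepstrum sub-vector.
    variation = np.linalg.norm(frames[:, delta_slice], axis=1)
    keep = variation >= threshold
    if frames.shape[0] > 0:
        keep[0] = True  # assumption: the first frame is always retained
    return frames[keep], keep
```

A call such as `variable_frame_rate(feats, slice(13, 26), 0.5)` would treat columns 13 to 25 as the delta-cepstrum; both the column range and the threshold are placeholders.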
4 Literature Review
5 Proposed method
- The first step is standard mean subtraction (MS)
- The second step is variance normalization (VN)
- The third step is auto-regressive moving-average (ARMA) filtering
- The filter is applied to the feature vector (cepstral coefficient); M is the order of the ARMA filter
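The three steps above can be sketched in Python as shown below. The filter equation itself is not reproduced in this transcript, so the recursion used here is an assumption: each output frame is taken as the average of the M previous output frames and the current plus M following normalized input frames, divided by 2M+1. Per-utterance statistics, leaving the first and last M frames unfiltered, and the function name `mva` are likewise assumptions for this sketch.

```python
import numpy as np

def mva(features, order=2):
    """Mean subtraction, variance normalization, then ARMA smoothing.

    features : (T, D) array of cepstral coefficients for one utterance
    order    : M, the order of the ARMA filter
    """
    x = np.asarray(features, dtype=float)

    # Step 1: mean subtraction (MS), per feature dimension.
    x = x - x.mean(axis=0)

    # Step 2: variance normalization (VN); guard against zero variance.
    std = x.std(axis=0)
    std[std == 0] = 1.0
    x = x / std

    # Step 3: ARMA filtering (assumed form, see the note above).
    T = x.shape[0]
    y = x.copy()  # edge frames keep their normalized values (assumption)
    for t in range(order, T - order):
        past_outputs = y[t - order:t].sum(axis=0)        # M filtered frames
        present_future = x[t:t + order + 1].sum(axis=0)  # M+1 input frames
        y[t] = (past_outputs + present_future) / (2 * order + 1)
    return y
```

With `order=0` the filter reduces to plain mean and variance normalization, which gives a quick way to check the first two steps in isolation.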
6 Choosing a proper order M of the filter
The transfer function is
The frequency response of the ARMA filter of
order M is
The number of zeros in the frequency response of the ARMA filter is approximately proportional to its order. This suggests that a large M will perform poorly, since it could filter out important speech information.
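The relation between order and zeros can be checked numerically. The sketch below assumes the recursion y[t] = (y[t-1] + ... + y[t-M] + x[t] + ... + x[t+M]) / (2M+1), which is not given in this transcript. Under that assumption the transfer function is H(z) = (1 + z + ... + z^M) / (2M+1 - z^-1 - ... - z^-M), and its numerator vanishes at the frequencies 2*pi*k/(M+1) for k = 1..M, so the number of zeros grows with M. Function names are hypothetical.

```python
import numpy as np

def arma_frequency_response(order, omega):
    """Evaluate the assumed ARMA transfer function on the unit circle."""
    z = np.exp(1j * np.asarray(omega, dtype=float))
    numerator = sum(z ** j for j in range(order + 1))
    denominator = (2 * order + 1) - sum(z ** (-i) for i in range(1, order + 1))
    return numerator / denominator

def zero_frequencies(order):
    """Frequencies in (0, 2*pi) where the assumed numerator is zero."""
    k = np.arange(1, order + 1)
    return 2 * np.pi * k / (order + 1)
```

Evaluating `np.abs(arma_frequency_response(M, np.linspace(0, np.pi, 512)))` for a few values of M gives the gain curves; the gain at zero frequency is 1 for every M under this assumed form.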
7 Gain and phase shifts of the ARMA filter
8 The time sequences of the cepstral coefficient c1 for the digit string 5376869 corrupted with different levels of noise
9 Evaluation
- Evaluated on the Aurora 2.0 noisy digits database
- Two training sets and three test sets
- Training sets: clean training set only / multi-condition speech
- Test sets: stationary-noise sets / non-stationary-noise sets / convolutional noise
- 7 different noise levels
- Clean, 20 dB, 15 dB, 10 dB, 5 dB, 0 dB, -5 dB
- Recognizer
- Simple HMM-based system using whole-word models
- Zero to Nine and Oh: 16 states per word, 3 Gaussian mixture components per state
- Silence: 3 states
10 Recognition results
Word accuracies (as percentages)
Top: multi-condition training. Bottom: clean training.
11 A comparison of different orders of ARMA filtering
- A small M will retain the short-term cepstral information but is more vulnerable to noise
- A large M will make the processed features less corrupted by noise, but the short-term cepstral information will be lost.
Top: multi-condition training. Bottom: clean training.
12 Testing the effectiveness of the proposed technique
- The results show that while variance normalization and mean subtraction improve performance over the baseline, the addition of the ARMA filter provides significant further improvements
13 Comparison of different filters
- causal ARMA filter
- non-causal MA filter
- causal MA filter
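To make the difference between these filter types concrete, here is a minimal sketch of the three variants. The exact definitions used in the comparison are not given in this transcript, so the window lengths, the normalization constants, and the handling of edge frames below are assumptions chosen so that each filter has unit gain on a constant input. Function names are hypothetical.

```python
import numpy as np

def causal_arma(x, order):
    """Average of the M previous outputs and the current input (assumed form)."""
    x = np.asarray(x, dtype=float)
    y = x.copy()
    for t in range(order, len(x)):
        y[t] = (y[t - order:t].sum(axis=0) + x[t]) / (order + 1)
    return y

def noncausal_ma(x, order):
    """Centered moving average over frames t-M .. t+M (uses future frames)."""
    x = np.asarray(x, dtype=float)
    y = x.copy()
    for t in range(order, len(x) - order):
        y[t] = x[t - order:t + order + 1].mean(axis=0)
    return y

def causal_ma(x, order):
    """Moving average over frames t-M .. t (past and present frames only)."""
    x = np.asarray(x, dtype=float)
    y = x.copy()
    for t in range(order, len(x)):
        y[t] = x[t - order:t + 1].mean(axis=0)
    return y
```

Feeding a single impulse through each function shows the distinction: the causal filters respond only at and after the impulse, while the non-causal MA filter also responds on the frames before it.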
14 Comparison of different filters (cont.)
Top: multi-condition training. Bottom: clean training.