LOW-RESOURCE NOISE-ROBUST FEATURE POST-PROCESSING ON AURORA 2.0

1
LOW-RESOURCE NOISE-ROBUST FEATURE POST-PROCESSING
ON AURORA 2.0

Chia-Ping Chen, Jeff Bilmes and Katrin Kirchhoff
SSLI Lab
Department of Electrical Engineering
University of Washington
Presenter: Shih-Hsiang

ICSLP 2002
2
Introduction

The performance of ASR systems often degrades
dramatically as the noise level increases
The degradation is minor when the signal-to-noise
ratio (SNR) is high, but quite significant at low
SNR levels
A variety of techniques have been proposed in
the past, including
Principal component analysis and a discriminative
neural network (Ellis et al. 2001)
Missing data theory (Cooke et al. 2001)
Voice activity detection (VAD) and variable frame
rate (VFR) processing, used to drop noisy feature
vectors and reduce insertion errors (John et al. 2001)
Nonlinear spectral subtraction, noise masking,
feature filters, and model adaptation (Lieb et
al. 2001)
Data-driven temporal filters, online mean and
variance normalization, voice activity detection,
and server-side discriminative features, integrated
to improve noise robustness (Morgan et al. 2001)
etc.

3
Literature Review

Ellis et al. 2001
John et al. 2001
Variable frame rate processing
An observation vector is discarded if it does not
differ much from the previous observation vector.
In their implementation of VFR, frame-to-frame
variation is estimated as the Euclidean norm of
the sub-vector corresponding to the
delta-cepstrum.
Voice activity detection
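The variable-frame-rate rule described above can be sketched as follows. This is a minimal NumPy illustration, not John et al.'s implementation: it assumes a (T, D) feature matrix whose delta-cepstrum occupies the columns given by the hypothetical `delta_cols` slice, and a hypothetical fixed `threshold`; the original work may use a different (for example, accumulated or adaptive) decision rule.

```python
import numpy as np

def variable_frame_rate(features: np.ndarray, delta_cols: slice, threshold: float) -> np.ndarray:
    """Drop frames whose delta-cepstrum norm is below `threshold`.

    features:   (T, D) array of observation vectors, one row per frame.
    delta_cols: columns holding the delta-cepstrum sub-vector (hypothetical layout).
    threshold:  hypothetical fixed decision threshold.
    """
    # Frame-to-frame variation, estimated as the Euclidean norm of the delta sub-vector
    variation = np.linalg.norm(features[:, delta_cols], axis=1)
    # A frame whose delta is small "does not differ much" from its predecessor
    keep = variation >= threshold
    keep[0] = True  # always keep the first frame so a non-empty input stays non-empty
    return features[keep]
```

Using the delta sub-vector avoids an explicit difference against the previous kept frame, since the delta-cepstrum already measures local frame-to-frame change.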

4
Literature Review

Morgan et al. (2001)

5
Proposed method

The first step is standard mean subtraction (MS)
The second step is variance normalization (VN)
The third step is auto-regression moving-average
(ARMA) filtering
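Written out for a single feature dimension, the three steps can be sketched as below. This is a reconstruction under stated assumptions: $x_t$ is the cepstral value at frame $t$ of an utterance with $T$ frames, the statistics are utterance-level, and the ARMA filter of order $M$ averages the $M$ previous outputs with the current and $M$ future inputs; the paper's exact normalization window may differ.

```latex
\begin{aligned}
\text{MS:}\quad   & \bar{x}_t = x_t - \mu,
                  && \mu = \frac{1}{T}\sum_{t=1}^{T} x_t \\
\text{VN:}\quad   & \hat{x}_t = \frac{\bar{x}_t}{\sigma},
                  && \sigma^2 = \frac{1}{T}\sum_{t=1}^{T} \bar{x}_t^{\,2} \\
\text{ARMA:}\quad & y_t = \frac{y_{t-M} + \cdots + y_{t-1} + \hat{x}_t + \cdots + \hat{x}_{t+M}}{2M+1}
\end{aligned}
```

The ARMA numerator has $2M+1$ terms in total ($M$ past outputs, $M+1$ inputs), so a constant input passes through with unit gain.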

Input: the feature vector (cepstral coefficients) of each frame
M: the order of the ARMA filter
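A minimal NumPy sketch of the three-step chain, as we understand it, follows. Assumptions: statistics are computed per dimension over the whole utterance; the ARMA step averages the M previous outputs with the current and M future inputs; and the first and last M frames are passed through unchanged, a boundary rule chosen for this sketch. `ms_vn_arma` and `eps` are illustrative names.

```python
import numpy as np

def ms_vn_arma(features: np.ndarray, M: int = 2, eps: float = 1e-8) -> np.ndarray:
    """Mean subtraction, variance normalization, then ARMA smoothing.

    features: (T, D) array with one cepstral vector per frame; statistics are
    computed per dimension over the whole utterance (assumption).
    """
    # Step 1 (MS): remove the per-dimension utterance mean
    x = features - features.mean(axis=0)
    # Step 2 (VN): scale each dimension to unit variance; eps guards against division by zero
    x = x / (x.std(axis=0) + eps)
    # Step 3 (ARMA): average M past outputs with the current and M future inputs
    T = x.shape[0]
    y = x.copy()  # boundary frames (first and last M) pass through unchanged (assumption)
    for t in range(M, T - M):
        y[t] = (y[t - M:t].sum(axis=0) + x[t:t + M + 1].sum(axis=0)) / (2 * M + 1)
    return y
```

Because MS and VN remove any per-dimension offset and scale, an affine change of the input such as `3 * features + 1` leaves the output essentially unchanged; the ARMA step then smooths the normalized trajectories over time.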
6
Choosing a proper order M of the filter
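The transfer function is
$H(z) = \frac{1 + z + \cdots + z^{M}}{(2M+1) - (z^{-1} + z^{-2} + \cdots + z^{-M})}$
The frequency response of the ARMA filter of
order M is
$H(e^{j\omega}) = \frac{\sum_{k=0}^{M} e^{jk\omega}}{(2M+1) - \sum_{k=1}^{M} e^{-jk\omega}}$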
The number of zeros in the frequency response of
the ARMA filter is approximately proportional to
its order. This suggests that a large M will
perform poorly, since the filter could remove
important speech information
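A short worked derivation supports the zero-count argument. It assumes the order-M ARMA filter averages the M previous outputs with the current and M future inputs, so that its numerator is $1 + z + \cdots + z^{M}$ and its denominator is $(2M+1) - (z^{-1} + \cdots + z^{-M})$.

```latex
\begin{aligned}
\sum_{k=0}^{M} e^{jk\omega}
  &= \frac{e^{j(M+1)\omega} - 1}{e^{j\omega} - 1}
   = e^{jM\omega/2}\,\frac{\sin\!\big((M+1)\omega/2\big)}{\sin(\omega/2)}, \\
\Big|(2M+1) - \sum_{k=1}^{M} e^{-jk\omega}\Big|
  &\ge (2M+1) - M = M + 1 > 0, \\
H(e^{j\omega}) = 0
  &\iff \omega = \frac{2\pi k}{M+1}, \quad k = 1, \dots, M, \\
\#\{\text{zeros in } (0, \pi]\}
  &= \Big\lfloor \tfrac{M+1}{2} \Big\rfloor, \\
H(e^{j0}) &= \frac{M+1}{(2M+1) - M} = 1.
\end{aligned}
```

Under this form the zero count grows like M/2, while the gain at zero frequency stays at 1 for every M, so slowly varying cepstral trajectories are preserved regardless of the order.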
7
Gain and phase shifts of the ARMA filter
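Gain and phase curves of this kind can be computed numerically from the closed-form frequency response. This sketch assumes the filter averages the M previous outputs with the current and M future inputs, and a frame rate of 100 frames/s (10 ms shift) for converting to modulation frequency in Hz; the function names are illustrative.

```python
import numpy as np

def arma_frequency_response(M: int, omega) -> np.ndarray:
    """H(e^{jw}) of the order-M ARMA filter (assumed form, see lead-in)."""
    omega = np.asarray(omega, dtype=float)
    k_ma = np.arange(M + 1)       # current frame and M future inputs
    k_ar = np.arange(1, M + 1)    # M past outputs
    num = np.exp(1j * np.outer(omega, k_ma)).sum(axis=1)
    den = (2 * M + 1) - np.exp(-1j * np.outer(omega, k_ar)).sum(axis=1)
    return num / den

def zeros_in_upper_half(M: int) -> int:
    """Number of distinct zero frequencies in (0, pi] of 1 + z + ... + z^M (M >= 1)."""
    angles = np.sort(np.abs(np.angle(np.roots(np.ones(M + 1)))))
    # Conjugate zeros share the same |angle|; count distinct values only
    return int(1 + np.count_nonzero(np.diff(angles) > 1e-6))

frame_rate_hz = 100.0  # assumed 10 ms frame shift
mod_freqs_hz = np.array([1.0, 5.0, 10.0, 25.0])
omega = 2 * np.pi * mod_freqs_hz / frame_rate_hz
for M in (1, 2, 3):
    H = arma_frequency_response(M, omega)
    gain_db = 20 * np.log10(np.maximum(np.abs(H), 1e-12))  # clip exact zeros
    phase_rad = np.angle(H)
    print(M, zeros_in_upper_half(M), np.round(gain_db, 1), np.round(phase_rad, 2))
```

The phase is nonzero because the filter looks M frames ahead and feeds back past outputs; only the magnitude matters for the zero-count argument.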
8
The time sequences of the cepstral coefficient c1
for the digit string 5376869 corrupted with
different levels of noise
9
Evaluation

Evaluated on the Aurora 2.0 noisy digits database
Two training sets and three test sets
Training sets: clean-condition training /
multi-condition training
Test sets: stationary noise / non-stationary
noise / convolutional noise
7 SNR conditions
Clean, 20 dB, 15 dB, 10 dB, 5 dB, 0 dB, -5 dB
Recognizer
Simple HMM-based system using whole-word models
Digits "zero" to "nine" and "oh": 16 states per
word, 3 Gaussian mixture components per state
Silence model: 3 states
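What an SNR condition such as 10 dB means can be illustrated with a generic mixing function. This is a simplified sketch, not the Aurora 2.0 construction procedure (which used its own filtering and speech-level measurement, not reproduced here); `mix_at_snr` is a hypothetical helper, and random vectors stand in for real speech and noise.

```python
import numpy as np

def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Add `noise` to `speech`, scaled so that 10*log10(P_speech / P_noise) = snr_db.

    Powers are plain mean squares over the whole signal (simplifying assumption).
    """
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # Required noise power is p_speech / 10^(snr_db / 10); scale the amplitude accordingly
    scale = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return speech + scale * noise

rng = np.random.default_rng(0)
speech = rng.normal(size=8000)  # stand-in signals, not real speech
noise = rng.normal(size=8000)
for snr_db in (20, 15, 10, 5, 0, -5):
    noisy = mix_at_snr(speech, noise, snr_db)
    measured = 10 * np.log10(np.mean(speech ** 2) / np.mean((noisy - speech) ** 2))
    print(snr_db, round(measured, 3))
```

Each 5 dB step lowers the SNR by scaling the added noise amplitude by a factor of about 1.78; at -5 dB the noise power exceeds the speech power by a factor of about 3.2.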

10
Recognition results
Word accuracies (%)
Top: multi-condition training; bottom: clean
training
11
A comparison of different orders of the ARMA
filter

A small M will retain the short-term cepstral
information but is more vulnerable to noise
A large M will make the processed features less
corrupted by noise, but the short-term cepstral
information will be lost.
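The trade-off between noise suppression and loss of short-term information can be quantified from the filter's impulse response. This sketch assumes the filter averages the M previous outputs with the current and M future inputs; the M-frame look-ahead is removed by a delay, which changes only the phase. It reads "short-term cepstral information" as fast modulation components, measured here at a quarter of the frame rate; that interpretation and the function names are ours.

```python
import numpy as np

def arma_impulse_response(M: int, n: int = 200) -> np.ndarray:
    """Impulse response of the order-M ARMA filter, delayed by M frames to be causal.

    Causal form: y[t] = (y[t-M] + ... + y[t-1] + x[t-M] + ... + x[t]) / (2M + 1),
    with zero initial conditions; the response decays geometrically, so n = 200 suffices.
    """
    x = np.zeros(n)
    x[0] = 1.0
    y = np.zeros(n)
    for t in range(n):
        past_outputs = y[max(0, t - M):t].sum()
        inputs = x[max(0, t - M):t + 1].sum()
        y[t] = (past_outputs + inputs) / (2 * M + 1)
    return y

def noise_power_gain(M: int) -> float:
    """Output variance for unit-variance white-noise input (sum of h[n]^2)."""
    h = arma_impulse_response(M)
    return float(np.sum(h ** 2))

def gain_at(M: int, omega: float) -> float:
    """|H(e^{j omega})| computed from the truncated impulse response."""
    h = arma_impulse_response(M)
    return float(np.abs(np.sum(h * np.exp(-1j * omega * np.arange(h.size)))))

for M in (1, 2, 3, 4):
    print(f"M={M}: noise power gain={noise_power_gain(M):.3f}, "
          f"gain at quarter frame rate={gain_at(M, np.pi / 2):.3f}")
```

For M = 1 and M = 2 the white-noise power gains are 1/3 and 1/5, while the gain at a quarter of the frame rate falls from about 0.45 (M = 1) to about 0.16 (M = 2) and to zero for M = 3, where that frequency coincides with a filter zero.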

Top: multi-condition training; bottom: clean
training
12
Testing the effectiveness of the proposed technique

The results show that while variance
normalization and mean subtraction improve
performance over the baseline, adding the
ARMA filter provides significant further
improvements

13
Comparison of different filters