Title: THE MAP-SPACE DENOISING ALGORITHM FOR NOISE ROBUST SPEECH RECOGNITION
1THE MAP-SPACE DENOISING ALGORITHM FOR NOISE
ROBUST SPEECH RECOGNITION
Stereo-based Piecewise Affine Compensation for
Environment
ShihHsiang 2006
2Reference
- K. Daoudi and C. Cerisara, The MAP-SPACE denoising algorithm for noise robust speech recognition, in Proc. ASRU, Cancun, Mexico, 2005.
- C. Cerisara and K. Daoudi, Evaluation of the SPACE denoising algorithm on Aurora2, in Proc. ICASSP, Toulouse, France, 2005.
3Outline
- Introduction
- The SPACE algorithm
- The MAP-SPACE algorithm
- Experiments
- Conclusions
4Introduction
- Robustness techniques can be roughly classified into two categories
  - Signal processing
    - Achieve noise robustness or denoise the signal
  - Adaptation techniques
    - Initial canonical models are transformed to represent the new environment
    - Require a relatively large amount of adaptation data
    - Need a transcription of the speech data in unsupervised mode
- In this paper, the authors propose an algorithm called MAP-SPACE, which can be seen as a hybrid between a denoising technique and an adaptation technique
5The SPACE algorithm
- The first step of the algorithm is to model the noisy speech y by a mixture of I Gaussians
- The second step is to model the clean speech x by a mixture of I Gaussians
- In both mixtures, the weight of each Gaussian acts as its prior
- Assume that the acoustic region modeled by the i-th noisy Gaussian is related to the one modeled by the i-th clean Gaussian by a certain transformation
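For reference, the two mixtures can be written as follows. The symbols for the weights, means and covariances are not preserved in this transcript, so the ones used here (w, mu, Sigma, and their primed counterparts) are assumptions chosen for illustration.

```latex
p(y) = \sum_{i=1}^{I} w_i \, \mathcal{N}(y;\, \mu_i, \Sigma_i),
\qquad
p(x) = \sum_{i=1}^{I} w'_i \, \mathcal{N}(x;\, \mu'_i, \Sigma'_i)
```

Gaussian i of the noisy mixture is paired with Gaussian i of the clean mixture, which is what allows one transformation to be attached to each index i.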
6The SPACE algorithm (cont.)
[Figure: the clean GMM and the noisy GMM; a transformation between each pair of corresponding Gaussians is found through stereo data]
7The SPACE algorithm (cont.)
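- Assume the relationship between noisy and clean speech is deterministic and affine in each acoustic region
- In acoustic region i, the mapping transformation is x = Ai y + bi
- This mapping is the basis of their denoising algorithm, that is, they assume it holds whenever y falls in region i
- The clean feature estimate is then given by the mappings of all regions, each weighted by the posterior probability of its region given y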
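The denoising step can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes diagonal covariances, a diagonal Ai stored as a vector `a[i]`, and a clean estimate formed as the posterior-weighted sum of the per-region affine maps. All function and parameter names are hypothetical.

```python
import math

def log_gauss_diag(y, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector y."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (yn - m) ** 2 / v)
        for yn, m, v in zip(y, mean, var)
    )

def posteriors(y, weights, means, variances):
    """P(i | y) for each Gaussian i of the noisy GMM."""
    logs = [math.log(w) + log_gauss_diag(y, m, v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)  # subtract the max before exponentiating, for stability
    exps = [math.exp(l - top) for l in logs]
    total = sum(exps)
    return [e / total for e in exps]

def denoise(y, weights, means, variances, a, b):
    """Assumed estimate: sum_i P(i | y) * (a[i] * y + b[i]), element-wise."""
    post = posteriors(y, weights, means, variances)
    return [
        sum(p * (a_i[n] * y[n] + b_i[n]) for p, a_i, b_i in zip(post, a, b))
        for n in range(len(y))
    ]
```

When a frame clearly belongs to one noisy Gaussian, its posterior is close to 1 and the estimate reduces to that region's affine map applied to y.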
8The SPACE algorithm (cont.)
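- In order to estimate the parameters Ai and bi, the MMSE criterion is used
- The covariances are assumed to be diagonal, so Ai is a diagonal matrix with entries ai,n and bi is a vector with entries bi,n
- For each i, the objective function to minimize is then a sum over n of terms Fi,n, each involving only ai,n and bi,n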
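One plausible form of the per-dimension objective is given below as a sketch. It assumes that each stereo frame pair (x_t, y_t) contributes to region i with weight gamma_i(t) = P(i | y_t), which is the SPLICE-style choice and is not confirmed by the surviving slide text. T, t and gamma_i(t) are notation introduced here.

```latex
F_i = \sum_{n} F_{i,n},
\qquad
F_{i,n} = \sum_{t=1}^{T} \gamma_i(t)\,\bigl(x_{t,n} - a_{i,n}\, y_{t,n} - b_{i,n}\bigr)^2,
\qquad
\gamma_i(t) = P(i \mid y_t)
```

Because each F_{i,n} depends only on the pair (a_{i,n}, b_{i,n}), the terms can be minimized independently of one another.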
9The SPACE algorithm (cont.)
- The problem is then equivalent to minimizing Fi,n w.r.t. ai,n and bi,n, for each i and n
- After a change of notation, each of these becomes a small least-squares problem whose solution is available in closed form
- Solving it for every n yields Ai and bi for each i
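The per-dimension minimization can be illustrated with a short sketch. It assumes that Fi,n is a posterior-weighted squared error between the clean feature and its affine prediction from the noisy feature. That assumption, the weights `gamma`, and the function names are hypothetical and are not taken from the slides.

```python
def fit_affine_1d(x, y, gamma):
    """Weighted least squares for one Gaussian i and one dimension n.

    Minimizes sum_t gamma[t] * (x[t] - a * y[t] - b) ** 2 over (a, b)
    by solving the two normal equations directly.
    """
    s = sum(gamma)
    sy = sum(g * yt for g, yt in zip(gamma, y))
    sx = sum(g * xt for g, xt in zip(gamma, x))
    syy = sum(g * yt * yt for g, yt in zip(gamma, y))
    sxy = sum(g * xt * yt for g, xt, yt in zip(gamma, x, y))
    det = s * syy - sy * sy
    if det == 0.0:
        raise ValueError("degenerate data: weighted variance of y is zero")
    a = (s * sxy - sy * sx) / det
    b = (sx - a * sy) / s
    return a, b

def fit_affine_diag(clean, noisy, gamma):
    """Fit every dimension independently; returns the diagonal of Ai and the vector bi."""
    dims = len(noisy[0])
    pairs = [
        fit_affine_1d([x[n] for x in clean], [y[n] for y in noisy], gamma)
        for n in range(dims)
    ]
    return [a for a, _ in pairs], [b for _, b in pairs]
```

Here `clean` and `noisy` are time-aligned stereo frames, and `gamma[t]` would be the posterior of Gaussian i for frame t. Running `fit_affine_diag` once per Gaussian gives one (Ai, bi) pair per acoustic region.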
10The MAP-SPACE Algorithm
- The originality of the SPACE algorithm resides in its ability to be easily modified to handle a new environment
- Assume that SPACE has been performed for some initial noisy training condition, and that test observations from a new environment are given
- Use these observations in a MAP criterion to adapt the initial noisy speech GMM to the new environment
- Such adaptation keeps the correspondence between the initial and new model parameters
- That is, once the adapted noisy speech GMM is available, a new estimate is obtained from it for each i
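The MAP adaptation step can be illustrated as follows. The slides do not preserve the update formula, so this sketch uses the common relevance-factor MAP update of the means only. That choice, the parameter `tau`, and all names are assumptions rather than the authors' method.

```python
def map_adapt_means(means, frames, resp, tau=10.0):
    """MAP update of the noisy GMM means from untranscribed test frames.

    means[i][n]  : means of the initial noisy GMM (used as the prior)
    frames[t][n] : test observations from the new environment
    resp[t][i]   : posterior P(i | y_t) under the initial noisy GMM
    tau          : relevance factor (hypothetical default); must be > 0
    """
    adapted = []
    for i, mu in enumerate(means):
        occupancy = sum(r[i] for r in resp)
        new_mu = []
        for n in range(len(mu)):
            first_order = sum(r[i] * f[n] for r, f in zip(resp, frames))
            # Interpolate between the prior mean and the data mean.
            new_mu.append((tau * mu[n] + first_order) / (tau + occupancy))
        adapted.append(new_mu)
    return adapted
```

Gaussian i of the adapted model is still Gaussian i of the initial model, only moved, so the index-based pairing with the clean GMM is preserved. No transcription is involved, since only frame-level posteriors are needed.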
11The MAP-SPACE Algorithm (cont.)
- For faster implementation, this estimate can be approximated
- MAP-SPACE has two major advantages w.r.t. traditional adaptation techniques
  - No transcription of adaptation data is required and no assumption on noise alteration is made
  - The amount of adaptation data required to achieve a good estimate is much lower than in traditional adaptation
- As compared to SPLICE, MAP-SPACE has the major advantage of making no assumption on the type of noise which corrupts the testing data
12Experiment
- The results are obtained using pseudo-clean trained HMMs
  - The HMMs modeling the acoustic units are trained using the denoised features instead of the clean ones
  - This strategy provides an approximate modeling of the residual noise
- Experiments have been conducted on the clean part of Aurora 2 test set A, which is artificially corrupted by adding various types of noise (from NOISEX) at different SNRs
- A mixture of 8 Gaussians is used for each of the noisy and clean speech GMMs
- Adaptation of the noisy GMM is performed using the whole noisy test set
13Experiment (cont.)
- Experiments on white noise with different SNRs
  - Matched-condition scores are the best (but impracticable)
  - Clean models give very poor performance: 24.7 and 9.8
- Experiments on different noise types (5 dB)
  - Training is done on white noise and testing on several different noise types
14Conclusions
- MAP-SPACE is a combination of an extension of SPLICE and traditional adaptation techniques
- MAP-SPACE has shown its robustness to SNR and noise type mismatch
- In this paper, the authors only aim to show that their algorithm is effective at compensating for the distortion caused by various kinds of noise
15Practical Implementation (on AURORA2)
- The pseudo-clean features are obtained by the following steps
- For each noisy condition of the training corpus, a GMM is trained using the Maximum Likelihood criterion
- The corresponding clean GMM is trained on the corresponding clean sentences using the MMSE criterion
- For each testing condition
  - First estimate the SNR (detect the closest training environment)
    - For each test sentence, the energy is computed on a sliding window of 64 ms length
    - SNR = the highest energy / the lowest energy (in the window)
    - The closest corresponding SNR of the training corpus is found (4 noisy GMMs)
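The SNR estimation step can be sketched as below. Several details are assumptions made for illustration: the 8 kHz sample rate, a window that slides one sample at a time, the conversion of the energy ratio to dB, and the list of training SNRs. The function names are hypothetical.

```python
import math

def estimate_snr_db(samples, sample_rate=8000, window_ms=64):
    """Crude SNR estimate: highest over lowest sliding-window energy, in dB."""
    win = int(sample_rate * window_ms / 1000)
    if win <= 0 or len(samples) < win:
        raise ValueError("signal shorter than one window")
    energy = sum(s * s for s in samples[:win])
    lowest = highest = energy
    for t in range(win, len(samples)):
        # Slide by one sample: add the entering sample, drop the leaving one.
        energy += samples[t] ** 2 - samples[t - win] ** 2
        energy = max(energy, 0.0)  # guard against rounding below zero
        lowest = min(lowest, energy)
        highest = max(highest, energy)
    eps = 1e-12  # avoids division by zero on digital silence
    return 10.0 * math.log10((highest + eps) / (lowest + eps))

def closest_training_snr(snr_db, training_snrs=(5, 10, 15, 20)):
    """Pick the training SNR (in dB) nearest to the estimate."""
    return min(training_snrs, key=lambda s: abs(s - snr_db))
```

The lowest-energy window is taken as a stand-in for noise alone and the highest-energy window for speech plus noise, so the ratio is only a rough proxy for the true SNR. It is used here just to narrow the choice down to the noisy GMMs trained at the nearest SNR.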
16Practical Implementation (on AURORA2) (cont.)
- In the second step, the four noisy GMMs for these four training conditions are compared, and the one that maximizes the likelihood is chosen
- The test corpus is then denoised using the parameters of this GMM and its corresponding clean GMM
- Evaluation on AURORA2 test set A
  - Multi-style training always outperforms SPACE
  - There is no apparent stability in the SPACE behavior when the number of Gaussians varies
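The environment selection in the second step, choosing among the candidate noisy GMMs the one that maximizes the likelihood of the test data, can be sketched as follows. Diagonal covariances are assumed, and the dictionary layout and function names are hypothetical.

```python
import math

def gmm_log_likelihood(frames, weights, means, variances):
    """Total log-likelihood of the frames under a diagonal-covariance GMM."""
    total = 0.0
    for y in frames:
        logs = []
        for w, m, v in zip(weights, means, variances):
            ll = math.log(w)
            for yn, mn, vn in zip(y, m, v):
                ll -= 0.5 * (math.log(2.0 * math.pi * vn) + (yn - mn) ** 2 / vn)
            logs.append(ll)
        top = max(logs)  # log-sum-exp over the mixture components
        total += top + math.log(sum(math.exp(l - top) for l in logs))
    return total

def select_environment(frames, candidate_gmms):
    """candidate_gmms maps a condition name to (weights, means, variances)."""
    return max(
        candidate_gmms,
        key=lambda name: gmm_log_likelihood(frames, *candidate_gmms[name]),
    )
```

The selected name identifies both the noisy GMM and its paired clean GMM, whose parameters are then used to denoise the test data.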
17Joint Modeling of Clean and Noisy Speech
Distribution
- There may be different reasons that explain the bad results of SPACE and MAP-SPACE
  - The most important one is that the Gaussian correspondence hypothesis is not verified
  - This means that the MMSE criterion is not the best way to build such correspondences
- Joint modeling of clean and noisy speech distribution
  - Modeling P(x,y)
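A short worked equation may help show why modeling P(x,y) gives the Gaussian correspondence directly. If each mixture component i is a joint Gaussian over the stacked vector (x, y), standard Gaussian conditioning yields an affine map from y to x for that component. The block notation below is an assumption made for illustration, and the slides do not state whether SPACE-JM uses exactly this estimator.

```latex
\begin{pmatrix} x \\ y \end{pmatrix} \Big|\, i \;\sim\;
\mathcal{N}\!\left(
\begin{pmatrix} \mu_{x,i} \\ \mu_{y,i} \end{pmatrix},
\begin{pmatrix} \Sigma_{xx,i} & \Sigma_{xy,i} \\ \Sigma_{yx,i} & \Sigma_{yy,i} \end{pmatrix}
\right)
\;\Longrightarrow\;
E[x \mid y, i] = \mu_{x,i} + \Sigma_{xy,i}\,\Sigma_{yy,i}^{-1}\,(y - \mu_{y,i})
```

This has the form Ai y + bi with Ai = \Sigma_{xy,i} \Sigma_{yy,i}^{-1} and bi = \mu_{x,i} - Ai \mu_{y,i}, so the clean and noisy parts of component i correspond by construction instead of being matched afterwards.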
18Comparison of SPACE and SPACE-JM
Average accuracy over the 4 noises and 5 SNRs (5 dB, 10 dB, 15 dB, 20 dB and clean) of Aurora2 test set A
Average accuracy over the 4 noises of Aurora2 test set A at SNR = 0 dB
- SPACE-JM results are much more stable than SPACE results
- The results suggest not only that a better Gaussian correspondence is achieved by SPACE-JM, but also that it is robust to SNR changes