Title: Speech Recognition in Adverse Environments
1Speech Recognition in Adverse Environments
- Juan Arturo Nolazco-Flores
- Dpto. de Ciencias Computacionales
- ITESM, campus Monterrey
2Talk Overview
- Introduction
- Parallel Model Combination (PMC)
- SS-PMC
- Comments and Conclusions
3END!
4Introduction
- Problem
- Automatic Speech Recognition (ASR) performance is
highly degraded when speech is corrupted by
noise (additive noise, convolutional noise,
etc.).
- Fact
- To be usable in real conditions, ASR
must tackle this problem.
- Knowledge
- ASR can be improved by either
- Enhancing speech before recognition
- Training models in the same environment in which the ASR
is going to be used.
5Recognition using CD-HMM
Recogniser
7Enhancing Speech
- Features
- Models are trained with clean speech.
- Corrupted speech is enhanced.
- There are a number of well-studied techniques
- Subtract a noise estimate obtained during
non-speech activity.
- Adaptive noise cancelling (ANC).
- Successful for low to medium SNR (> 0 dB).
8- Problems
- Enhancers are not perfect; therefore
- the speech is distorted and
- there is residual noise.
9Training models in the same environment
- ASR systems which use this technique can deal
with low to high SNR (> 0 dB).
- For example, for an isolated digit recognition
task where digits are corrupted by
helicopter (Lynx) noise, you can get the following
performance
- For TIMIT
- Problem
- There are many possible environments (not
practical).
10- However, with continuous HMMs it is possible to
combine the clean speech model and the noise model
to obtain a noisy speech model.
- Techniques
- Model Decomposition
- Parallel Model Combination
11Parallel Model Combination (PMC)
- Introduction
- Scheme
- Diagram
12Introduction
- It is an artificial way to simulate that the
system has been trained in the adverse
environment in which it is going to work.
- The clean speech CHMM and the noise CHMM
(estimated from the noise before the word is
uttered) are combined to obtain models adapted to
the adverse environment.
- The combination is based on the assumption that
the pdfs of the state distributions are
completely defined by their means and variances.
13Scheme
- For simplicity, it is convenient to combine these
models in a linear domain.
- Problem
- High-performance speech recognition is obtained
in a non-linear domain (e.g. the mel-cepstral
domain).
- Solution
- Transform coefficients to a linear domain.
14Diagram
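[Diagram: the clean speech HMM and the noise HMM are each mapped from the cepstral to the log domain (C -> log) and then through exp() into the linear domain, where they are combined; log() and C() map the result back to give the PMC HMM, which simulates training in noise.]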
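The chain in the diagram can be sketched in a few lines. This is a minimal illustration and not necessarily the exact equations behind the slides: it assumes the widely used log-normal approximation, diagonal covariances, and that the cepstral-to-log transform (C) has already been applied, so it works directly on per-channel log-spectral means and variances. The function names and the `gain` parameter are hypothetical.

```python
import math

def log_to_linear(mu_log, var_log):
    """Gaussian in the log domain -> mean/variance of the matching lognormal."""
    mu_lin = math.exp(mu_log + 0.5 * var_log)
    var_lin = mu_lin ** 2 * (math.exp(var_log) - 1.0)
    return mu_lin, var_lin

def linear_to_log(mu_lin, var_lin):
    """Inverse mapping: lognormal mean/variance -> log-domain Gaussian."""
    var_log = math.log(var_lin / mu_lin ** 2 + 1.0)
    mu_log = math.log(mu_lin) - 0.5 * var_log
    return mu_log, var_log

def pmc_combine(speech, noise, gain=1.0):
    """Combine one speech state and one noise state.

    speech, noise: lists of (mean, variance) per filterbank channel,
    both in the log-spectral domain. `gain` (assumed) scales the speech level.
    """
    combined = []
    for (ms, vs), (mn, vn) in zip(speech, noise):
        s_mu, s_var = log_to_linear(ms, vs)
        n_mu, n_var = log_to_linear(mn, vn)
        # Additive noise: speech and noise are summed in the linear domain.
        y_mu = gain * s_mu + n_mu
        y_var = gain ** 2 * s_var + n_var
        combined.append(linear_to_log(y_mu, y_var))
    return combined
```

To reach a cepstral-domain PMC HMM, the combined log-spectral parameters would still need to be passed through the C() transform shown in the diagram.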
15SS-PMC
- Introduction
- Hypothesis proof
- SS Combination Development
- Diagram
- Results
16Introduction
- How can we improve recognition performance in
highly adverse environments (SNR < 0 dB)?
- Thus, PMC does not represent a solution for
highly adverse environments.
(Upper boundary conditions)
17- On the other hand, we know that the enhancer
returns cleaner, but distorted, speech.
- Therefore the question is
- Is it possible to improve recognition performance
if the models were trained with this cleaner
speech?
18Hypothesis
- Training HMMs with enhanced speech makes the HMMs
learn both the speech distortion and the residual
noise.
- If we show that this hypothesis is true, we can
be confident that we can indeed improve
recognition performance.
19- In order to prove this hypothesis
- An enhancement scheme was selected.
- Models were trained with the enhanced speech.
- Recognition performance was evaluated under the same
conditions.
- The recognition performance obtained in this
experiment is compared with the recognition
performance obtained when models are trained in
the same environment.
20Hypothesis Proof
- Introduction
- Spectral Subtraction definition
- Experiments and results
- Conclusions
21Introduction
- Since it is a simple (and successful) scheme,
Spectral Subtraction (SS) was selected.
22Spectral Subtraction Definition
- Before filterbank
- After filterbank.
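The two placements can be illustrated with a short sketch. The slide's own definitions did not survive in this transcript, so the code below assumes a common power-domain form of spectral subtraction with an over-subtraction factor `alpha` and a spectral floor `beta`; the floor, the function names, and the filterbank weight matrix are assumptions made for illustration only.

```python
def spectral_subtract(noisy, noise_est, alpha=1.0, beta=0.01):
    """Subtract alpha * noise estimate from each value, flooring at beta * noisy."""
    return [max(y - alpha * n, beta * y) for y, n in zip(noisy, noise_est)]

def apply_filterbank(power, weights):
    """Weighted sum of power-spectrum bins for each filterbank channel."""
    return [sum(w * p for w, p in zip(row, power)) for row in weights]

def ss_before_filterbank(noisy, noise_est, weights, alpha=1.0, beta=0.01):
    """Subtract per spectral bin, then integrate with the filterbank."""
    cleaned = spectral_subtract(noisy, noise_est, alpha, beta)
    return apply_filterbank(cleaned, weights)

def ss_after_filterbank(noisy, noise_est, weights, alpha=1.0, beta=0.01):
    """Integrate noisy speech and noise estimate first, then subtract per channel."""
    return spectral_subtract(
        apply_filterbank(noisy, weights),
        apply_filterbank(noise_est, weights),
        alpha,
        beta,
    )
```

The two variants generally differ because the floor is a non-linear operation: a bin that is floored before the filterbank contributes a small positive value, whereas after the filterbank the same bin's negative difference is absorbed into the channel sum.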
23Experiments and Results.
- CHMMs were trained on speech enhanced by SS.
- Recognition performance was evaluated on speech
enhanced by SS under the same conditions.
24Example 1
- Task: isolated digit recognition
- Training: using enhanced speech
- Noise: helicopter
- Database: NOISEX-92
- Real noise is artificially added to clean speech,
so that no Lombard effect can bias recognition
performance.
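Adding recorded noise to clean speech at a chosen SNR can be sketched as follows. This is an illustrative recipe, not the procedure actually used for these experiments: it assumes the SNR is defined over the whole utterance as the ratio of average powers, and the function name `mix_at_snr` is hypothetical.

```python
import math

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that speech power / noise power matches snr_db, then add.

    speech, noise: sequences of samples; noise must be at least as long
    as speech (assumption of this sketch).
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    speech_power = sum(s * s for s in speech) / len(speech)
    noise_power = sum(n * n for n in noise) / len(noise)
    # Required noise power is speech_power / 10^(SNR/10).
    scale = math.sqrt(speech_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return [s + scale * n for s, n in zip(speech, noise)]
```

Because the clean recordings are made without noise present, the speaker does not change the way they speak, which is why this kind of mixing avoids the Lombard effect mentioned above.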
25Results
Training Models in Noise (PMC)
These values represent the upper bound for the
ASR system.
26Training Models in Noise (PMC)
27Example 2
- Vocabulary: 30 words (numbers, e.g. "dos mil
quinientos dólares", two thousand five hundred dollars).
28Example 3
29Conclusions
- The hypothesis was shown to be true.
- A new research area is open
- Try these experiments using other databases.
- How can we combine CHMMs so that we do not need
to train for all enhancement conditions?
- Are all enhancement techniques suited for CHMM
combination?
30- Now we know that ASR can be improved by
- Enhancing speech before recognition
- Training CHMMs in the same environment in which the ASR is
going to be used.
- Training CHMMs with the same enhancement technique
that is used to get cleaner speech at
recognition.
- Advantage
- Moreover, training with a better enhancement
technique means potentially better recognition
performance.
31SS Model Combination
- Introduction
- Spectral Subtraction Scheme
32Introduction
- It was shown that, when CHMMs are trained and tested
under the same enhancement condition,
recognition performance is improved.
- How can we combine CHMMs without having to train
for each enhancement and noise condition?
- Observation: for CHMMs, the state pdfs are
completely defined by their means and variances.
33Spectral Subtraction Scheme
Assuming Y and YD can be modelled as parametric
distributions with means E[Y] and E[YD] and
variances V[Y] and V[YD],
it can be shown that these parameters
are distorted as follows
pdf of Y
34Proof
where
Re-arranging
35Hence
36A(a,P(Y))
Assuming that Y is lognormal
Making
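The closed-form expressions of this derivation are not recoverable from the transcript. As a rough numerical illustration only, the sketch below estimates by sampling how spectral subtraction shifts the mean and variance of a lognormal Y. It assumes YD = max(Y - alpha * N, beta * Y) with a fixed noise estimate N; the floor `beta`, the function name, and all parameter values are assumptions, not the author's equations.

```python
import math
import random

def ss_moments(mu_log, sigma_log, noise_est, alpha=1.0, beta=0.01,
               n=100_000, seed=0):
    """Monte Carlo estimate of E[YD] and V[YD] for lognormal Y.

    Y has log-domain mean mu_log and standard deviation sigma_log.
    YD = max(Y - alpha * noise_est, beta * Y) is the assumed SS output.
    """
    rng = random.Random(seed)
    total = 0.0
    total_sq = 0.0
    for _ in range(n):
        y = rng.lognormvariate(mu_log, sigma_log)
        yd = max(y - alpha * noise_est, beta * y)
        total += yd
        total_sq += yd * yd
    mean = total / n
    var = total_sq / n - mean * mean
    return mean, var
```

With `alpha=0` the function returns sample moments of Y itself, which can be checked against the lognormal mean exp(mu_log + sigma_log**2 / 2); increasing `alpha` shows how the distorted mean E[YD] moves away from E[Y].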
37Diagram
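[Diagram: blocks shown are the clean speech HMM and the noise HMM, each passing through C -> log and exp() into the linear domain; an "Adaptation calculations" block and a PMC block; then log() and C() giving the SS-PMC HMM. Speech is pre-processed using SS.]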
38Results
[Figure: recognition results for four conditions: no compensation scheme; Spectral Subtraction; PMC; Spectral Subtraction and parallel model combination.]
39Comments and Conclusions
- Training and recognition with the same
speech enhancement scheme have not been tried
before, so a new area of research is now open.
- How can we combine CHMMs so that we do not need
to train for all enhancement conditions?
- Are all enhancement techniques suited for CHMM
combination?
- We showed how to combine clean speech and noise
CHMMs for the SS scheme.
- It was shown that the equations for CHMM combination,
when the SS scheme is used, are straightforward.
40- We expect that training with a better enhancement
technique will also give better recognition
performance.
- Future work
- Develop equations and experiments for other
enhancement techniques.
- Obtain the optimal alpha for the SS scheme.