Enhanced Speech Models for Robust Speech Recognition




1
Enhanced Speech Models for Robust Speech Recognition
  • Juan Arturo Nolazco-Flores
  • Dpto. de Ciencias Computacionales
  • ITESM, campus Monterrey

2
Talk Overview
  • Introduction
  • Enhanced-Speech Models
  • Comments and Conclusions

3
Questions?
4
Introduction
  • Problem: Automatic Speech Recognition (ASR) performance is
    highly degraded when speech is corrupted by
    noise (additive noise, convolutional noise,
    etc.).
  • Fact: To be usable in real conditions, ASR
    systems must tackle this problem.
  • Knowledge: ASR can be improved by either
  • Enhancing speech before recognition, or
  • Training models in the same environment in which the ASR
    system is going to be used.
  • Challenge: Find a simple and efficient technique to solve
    this problem.

5
Recognition using CD-HMM
Recogniser
6
Recognition under Adverse Environments
TIMIT 6632
Digits 10
7
(No Transcript)
8
Enhancing Speech
  • Features
  • Models are trained with clean speech.
  • Corrupted speech is enhanced.
  • There are a number of well-studied techniques:
  • Subtract an estimated noise found during
    non-speech activity.
  • Adaptive noise cancelling (ANC).
  • Successful for low to medium SNR (> 5 dB).
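
Adaptive noise cancelling can be illustrated with a basic LMS filter. This is a minimal sketch, not taken from the slides: it assumes a second (reference) microphone that picks up only the noise, and the names `lms_noise_canceller`, `n_taps` and `mu` are hypothetical.

```python
import numpy as np

def lms_noise_canceller(primary, reference, n_taps=8, mu=0.005):
    """LMS adaptive noise canceller (illustrative sketch).

    primary   : speech plus noise that has passed through an unknown path
    reference : noise-only signal, correlated with the noise in `primary`
    Returns the error signal, which serves as the enhanced speech estimate.
    """
    w = np.zeros(n_taps)                 # adaptive filter weights
    enhanced = np.zeros(len(primary))
    for n in range(n_taps - 1, len(primary)):
        x = reference[n - n_taps + 1:n + 1][::-1]  # newest sample first
        e = primary[n] - w @ x           # subtract the predicted noise
        w += 2.0 * mu * e * x            # LMS weight update
        enhanced[n] = e
    return enhanced
```

The step size `mu` trades convergence speed against residual noise; the value above is only a placeholder for unit-variance reference noise.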

9
  • Problems
  • Enhancers are not perfect, therefore
  • the speech is distorted and
  • there is residual noise.

10
Training models in the same environment
  • ASR systems that use this technique can deal
    with low to high SNR (> 0 dB).
  • For example, for an isolated digit recognition
    task where digits are corrupted by
    helicopter (Lynx) noise, you can get the following
    performance
  • For TIMIT
  • Problem
  • There are many possible environments (not
    practical).

11
  • However, with continuous HMMs it is possible to
    combine the clean speech model and the noise model
    to obtain a noisy speech model.
  • Techniques
  • Model Decomposition
  • Parallel Model Combination, PMC (Mark Gales,
    1996).
  • Cepstrum-Domain Model Combination, CDMC (Kim and
    Rose, 2002).

12
Changing to linear domain using PMC
  • Introduction
  • Scheme
  • Diagram

13
Introduction
  • It is an artificial way to simulate that the
    system has been trained in the adverse
    environment in which it is going to work.
  • The clean speech CHMM and the noise CHMM
    (estimated from the noise before the word is
    uttered) are combined in the linear domain to
    obtain models adapted to the adverse environment.
  • The combination is based on the assumption that
    the pdfs of the state distributions are
    completely defined by their means and variances.

14
Scheme
  • For simplicity, it is convenient to combine these
    models in a linear domain.
  • Problem: High performance speech recognition is obtained
    in a non-linear domain (e.g. mel-cepstral domain,
    auditory-based coefficients).
  • Solution: Transform coefficients to a linear domain.
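
The change of domain can be written out explicitly. The slides do not give the formulas, so what follows is a sketch of the usual log-normal PMC mappings under assumed notation: C is the DCT matrix, superscripts c and l mark the cepstral and log-spectral domains, subscripts s and n mark the clean speech and noise models, and g is a gain-matching term.

```latex
\begin{align}
\mu^{l} &= C^{-1}\mu^{c}, &
\Sigma^{l} &= C^{-1}\Sigma^{c}\,(C^{-1})^{\top} \\
\mu_i &= \exp\!\big(\mu^{l}_i + \tfrac{1}{2}\Sigma^{l}_{ii}\big), &
\Sigma_{ij} &= \mu_i\,\mu_j\big[\exp(\Sigma^{l}_{ij}) - 1\big] \\
\hat{\mu} &= g\,\mu_{s} + \mu_{n}, &
\hat{\Sigma} &= g^{2}\,\Sigma_{s} + \Sigma_{n} \\
\hat{\mu}^{l}_i &= \log\hat{\mu}_i
   - \tfrac{1}{2}\log\!\Big(\frac{\hat{\Sigma}_{ii}}{\hat{\mu}_i^{2}} + 1\Big), &
\hat{\Sigma}^{l}_{ij} &= \log\!\Big(\frac{\hat{\Sigma}_{ij}}{\hat{\mu}_i\,\hat{\mu}_j} + 1\Big) \\
\hat{\mu}^{c} &= C\,\hat{\mu}^{l}, &
\hat{\Sigma}^{c} &= C\,\hat{\Sigma}^{l}\,C^{\top}
\end{align}
```

The second and fourth lines are exact only if the linear-domain variables are log-normal, which is the assumption that lets mean and variance define each state pdf.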

15
Diagram
Clean speech HMM -> C^-1() -> exp() -> linear domain
Noise HMM -> C^-1() -> exp() -> linear domain
Linear domain (combination) -> log() -> C() -> PMC HMM
Simulates training in noise.
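
The data flow in this diagram can be sketched for a single Gaussian state. This is a minimal illustration using the log-normal approximation, not the author's implementation; it assumes a square orthonormal DCT for C (so C^-1 is the transpose), and the function names and the `gain` parameter are hypothetical.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix C (n x n); its inverse is C.T."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    C[0, :] /= np.sqrt(2.0)
    return C

def cep_to_linear(mu_c, cov_c, C):
    """Cepstral Gaussian -> linear-domain mean/covariance: C^-1(), then exp()."""
    Cinv = C.T
    mu_l = Cinv @ mu_c
    cov_l = Cinv @ cov_c @ Cinv.T
    mu = np.exp(mu_l + 0.5 * np.diag(cov_l))
    cov = np.outer(mu, mu) * (np.exp(cov_l) - 1.0)
    return mu, cov

def linear_to_cep(mu, cov, C):
    """Linear-domain mean/covariance -> cepstral Gaussian: log(), then C()."""
    cov_l = np.log(cov / np.outer(mu, mu) + 1.0)
    mu_l = np.log(mu) - 0.5 * np.diag(cov_l)
    return C @ mu_l, C @ cov_l @ C.T

def pmc_combine(speech, noise, gain=1.0):
    """Combine (mean, cov) cepstral models of clean speech and noise."""
    C = dct_matrix(len(speech[0]))
    mu_s, cov_s = cep_to_linear(*speech, C)
    mu_n, cov_n = cep_to_linear(*noise, C)
    # Speech and noise are assumed additive and independent in the linear domain.
    mu = gain * mu_s + mu_n
    cov = gain ** 2 * cov_s + cov_n
    return linear_to_cep(mu, cov, C)
```

A full system would repeat this for every state and mixture component of the clean speech CHMM, pairing each with the noise model state.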
16
Enhanced Speech Models
  • Introduction
  • Hypothesis proof
  • Enhanced-Speech Models Combination
  • Changing to linear domain using PMC
  • Diagram
  • Results

17
Introduction
  • When we train in the same environment, we
    obtain the following upper-bound values
  • Since PMC and CDMC (Cepstrum-Domain Model
    Combination) try to simulate recognition in
    the same environment, these are the best
    results that can be expected from this kind of technique.

18
Introduction
  • How can we improve recognition performance in
    adverse environments?

19
  • Fact
  • The enhancer returns cleaner, but distorted,
    speech.
  • Therefore the question is
  • Is it possible to improve recognition performance
    if the models were trained with this enhanced
    speech?

20
Hypothesis
  • Enhanced-Speech models improve ASR performance in
    noisy environments.

21
In order to prove this hypothesis
  • A signal enhancement scheme has to be selected.
  • Models have to be trained with the enhanced
    speech.
  • Observation vectors input to the recogniser have
    to be processed by the selected enhancement
    scheme.

22
Hypothesis Proof
  • Introduction
  • Spectral Subtraction definition
  • Experiments and results
  • Conclusions

23
Introduction
  • Since it is a simple (and successful) scheme,
    Spectral Subtraction (SS) was selected.

24
Spectral Subtraction Definition
  • Before filterbank
  • After filterbank.
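
The equations for this slide were not transcribed. As a hedged illustration, a common power-domain form of spectral subtraction is sketched below; the same function applies to FFT-bin powers (before the filterbank) or to filterbank energies (after it). The over-subtraction factor `alpha` is mentioned later in the talk, while the spectral floor `beta` and the function names are assumptions of this sketch.

```python
import numpy as np

def estimate_noise_power(power_frames, n_noise_frames):
    """Average the first n_noise_frames frames, assumed to contain no speech."""
    return power_frames[:n_noise_frames].mean(axis=0)

def spectral_subtraction(power_frames, noise_power, alpha=1.0, beta=0.01):
    """Subtract alpha * noise_power from every frame.

    power_frames : array (n_frames, n_bins) of non-negative energies
    noise_power  : array (n_bins,) noise estimate
    Values are floored at beta * input to avoid negative energies.
    """
    subtracted = power_frames - alpha * noise_power
    return np.maximum(subtracted, beta * power_frames)
```

Flooring is one reason the enhanced speech is distorted and still carries residual noise, which is the effect the enhanced-speech models are meant to absorb.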

25
Experiments and Results.
  • CHMMs were trained with speech enhanced by SS.
  • Recognition performance was evaluated on speech
    enhanced by SS under the same conditions.

26
Example 1
  • Task: isolated digit recognition
  • Vocabulary size: 10
  • Training: using enhanced speech
  • Noise: helicopter (Lynx)
  • Database: NOISEX-92
  • Real noise is artificially added to clean speech,
    so that no Lombard effect can bias recognition
    performance.
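
Adding recorded noise to clean speech at a chosen SNR can be sketched as follows. This is an assumed procedure for illustration only; the slides do not say how the mixing was done, and `add_noise_at_snr` is a hypothetical helper.

```python
import numpy as np

def add_noise_at_snr(clean, noise, snr_db):
    """Scale `noise` and add it to `clean` so the mixture has the target SNR.

    The noise is repeated or truncated to match the length of `clean`.
    SNR is measured over the whole utterance (a simplifying assumption).
    """
    noise = np.resize(noise, clean.shape)
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    return clean + scale * noise
```

Because the speech is recorded in quiet and the noise is added afterwards, the speaker's articulation is unaffected by the noise.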

27
Std. HMM
  • bPSS

Training Models in Noise (PMC)
Enhanced-Speech Models
28
Example 2
  • Task: continuous digit recognition
  • Vocabulary size: 30 words
  • Training: using enhanced speech
  • Noise: white
  • White noise is artificially added to clean
    speech, so that no Lombard effect can bias
    recognition performance.

29
Results
Std. HMM
Noisy Speech Models (PMC)
Enhanced-Speech Models
30
Example 3
  • Task: continuous speech recognition
  • Vocabulary size: 6233 words
  • Training: using enhanced speech
  • Noise: white
  • Database: TIMIT
  • Real noise is artificially added to clean speech,
    so that no Lombard effect can bias recognition
    performance.

31
Results
Std. HMM
Noisy Speech Models (PMC)
Enhanced-Speech Models
32
Conclusions
  • The hypothesis was proved to be true.
  • Challenge
  • Try these experiments using other databases.
  • How can we combine
  • the enhancement scheme,
  • the noise model
  • and the clean models
  • so that we do not need to train for all
    enhancement conditions?

33
Conclusions
  • Are all the enhancement schemes suited for
    combination?

34
Conclusions
  • Now we know that ASR can be improved by
  • Enhancing speech before recognition
  • Training CHMMs in the same environment in which the ASR
    system is going to be used.
  • Training CHMMs with the same enhancement technique
    that is used to get cleaner speech at
    recognition.
  • Advantage
  • Moreover, training with a better enhancement
    technique means potentially better recognition
    performance.

35
ES-SS Model Combination
  • Introduction
  • ES-Spectral Subtraction Scheme

36
Introduction
  • How can we combine CHMMs without having to train
    for each enhancement and noise condition?
  • Observation: For CHMMs, the state pdfs are
    completely defined by their means and variances.

37
ES-Spectral Subtraction Scheme
Assuming Y and Y_D can be modelled as parametric
distributions with means E[Y] and E[Y_D] and
variances V[Y] and V[Y_D],
it can be shown that these parameters
are distorted as follows
pdf of Y
38
Proof
where
Re-arranging
39
Hence
40
A(a,P(Y))
Assuming that Y is lognormal
Making
( )
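
The closed-form expressions on these slides were not transcribed. As a rough numerical stand-in, the effect of spectral subtraction on the mean and variance of a log-normal variable can be estimated by sampling. Everything here is an assumption of the sketch: the distorted variable (YD on the slides) is taken to be max(Y - alpha * N, beta * Y) with a fixed noise estimate N, and the function names are hypothetical.

```python
import numpy as np

def lognormal_params(mean, var):
    """Parameters (mu, sigma^2) of log(Y) for a log-normal Y with given moments."""
    sigma2 = np.log(var / mean ** 2 + 1.0)
    mu = np.log(mean) - 0.5 * sigma2
    return mu, sigma2

def ss_moments_mc(mean_y, var_y, noise_est, alpha=1.0, beta=0.01,
                  n=200_000, seed=0):
    """Monte Carlo estimate of the mean and variance after spectral subtraction."""
    rng = np.random.default_rng(seed)
    mu, sigma2 = lognormal_params(mean_y, var_y)
    y = rng.lognormal(mean=mu, sigma=np.sqrt(sigma2), size=n)
    y_d = np.maximum(y - alpha * noise_est, beta * y)  # assumed SS rule
    return y_d.mean(), y_d.var()
```

Such an estimate could serve as a sanity check on an analytical derivation, since with `alpha=0` it should return the original mean and variance up to sampling error.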
41
ES-PMC Diagram
Blocks: Clean speech HMM, Noise HMM, Adaptation calculations, PMC, ES-PMC HMM
Mappings: C->log, exp(), log(), C()
Speech is pre-processed using SS.
42
Results
  • No compensation scheme
  • Spectral Subtraction
  • PMC
  • Spectral Subtraction and parallel model combination
43
Results
  • No compensation scheme
  • Spectral Subtraction
  • PMC
  • Spectral Subtraction and parallel model combination
44
Results
  • No compensation scheme
  • Spectral Subtraction
  • PMC
  • Spectral Subtraction and parallel model combination
45
Results
  • No compensation scheme
  • Spectral Subtraction
  • PMC
  • Spectral Subtraction and parallel model combination
46
Comments and Conclusions
  • Since training and recognition with the same
    speech enhancement scheme had not been tried
    before, a new area of research has been
    opened.
  • How can we combine CHMMs so that we do not need
    to train for all enhancement conditions?
  • Are all the enhancement techniques suited for CHMM
    combination?
  • We showed how to combine enhanced-speech, noise and
    clean CHMMs for the SS scheme.
  • It was shown that the equations for ES-PMC-SS were
    straightforward.

47
  • We expect that training with a better enhancement
    technique will also yield better recognition
    performance.
  • Future work
  • Develop equations and experiments for other
    enhancement techniques.
  • Obtain the optimal alpha for the SS scheme.
  • Compensate in the cepstrum domain.