1
Separation and Robust Recognition of Noisy, Convolutive Speech Mixtures using Time-Frequency Masking and Missing Data Techniques
Dorothea Kolossa, Aleksander Klimas, Reinhold Orglmeister
Berlin University of Technology
2
Overview
  • Introduction
  • Time-Frequency-Masking
  • Missing Data Techniques
  • Interfacing Time-Frequency Masking and Speech
    Recognition
  • Transformation of uncertain features to
    recognition domain
  • Modified missing data technique
  • Experiments and Results
  • Conclusions

3
Time-Frequency Masking
Time-frequency masking can provide effective source separation.
[Figure: speech signals, masking function, and mixture; the masking function applied to the mixture spectrogram recovers the individual speech signals.]
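As a minimal sketch of the operation described here (assumptions: a soft mask in [0, 1], a 512-sample STFT window, and placeholder variable names; this is not the authors' code):

```python
import numpy as np
from scipy.signal import stft, istft

fs = 12000                       # sampling rate of the in-car recordings
x_mix = np.random.randn(2 * fs)  # stand-in for one recorded mixture channel

# Mixture spectrogram X(w, t)
f, t, X = stft(x_mix, fs=fs, nperseg=512)

# Masking function in [0, 1] per time-frequency bin; a placeholder here,
# in the presentation it is derived from ICA subband energies (next slide).
mask = np.ones_like(np.abs(X))

S_hat = mask * X                             # masked source estimate S(w, t)
_, s_hat = istft(S_hat, fs=fs, nperseg=512)  # separated signal in time domain
```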
4
Time-Frequency Masking
  • Criterion suitable for convolutive mixtures: ICA subband energies
  • Motivation for the criterion:
  • ICA is inherently noise robust
  • The frequency-variant ICA model is capable of capturing reverberation effects
  • Audio example: car data at 100 km/h (input, no mask, masked); a sketch of the mask construction follows below
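A hedged sketch of how a mask based on ICA subband energies could look; the function name, array layout, and the optional energy margin are illustrative assumptions rather than the authors' implementation:

```python
import numpy as np

def ica_energy_mask(Y, margin_db=0.0):
    """Binary masks derived from the subband energies of the ICA outputs.

    Y: complex array of shape (n_sources, n_freq, n_frames) holding the
    STFTs of the ICA-separated channels.  The mask of output i keeps a
    time-frequency bin only where that output carries more energy than
    every other output (optionally by `margin_db`).
    """
    energy = np.abs(Y) ** 2
    margin = 10.0 ** (margin_db / 10.0)
    masks = np.zeros(energy.shape)
    for i in range(energy.shape[0]):
        others = np.delete(energy, i, axis=0).max(axis=0)
        masks[i] = (energy[i] > margin * others).astype(float)
    return masks
```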
5
Time-Frequency Masking
  • Time-frequency masking: average SNR gain of 3.4 dB over ICA alone with the ICA-band-energy-based mask
  • Robust with respect to noise and reverberation
  • But the performance improvement does not translate into improved speech recognition performance
  • This is most likely due to feature distortions
  • Human performance on recognizing the masked speech, however, suggests that sufficient information is present despite the distortion
  • The solution suggested here is based on missing data techniques (e.g. Barker, Green & Cooke, 2001)

6
Missing Data Techniques
  • Dealing with missing frequency bins
  • Missing Data Techniques
  • Integration over Uncertainty Ranges
  • Marginalization
  • Data Imputation

[Diagram: the conventional approach passes a point estimate, x1(t), x2(t) → source separation → S(ω) → HMM speech recognition; the proposed approach passes an uncertainty range, x1(t), x2(t) → source separation → S(ω), σ_S(ω) → HMM speech recognition.]
7
Missing Data Techniques
Using Variance in Recognition
8
Missing Data Marginalization
Assign an uncertainty range for each unreliable feature x_u.
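A minimal sketch of bounded marginalization under assumed conditions: diagonal-covariance Gaussian state models, one uncertainty range [low, high] per unreliable feature, and illustrative argument names:

```python
import numpy as np
from scipy.stats import norm

def marginalized_log_likelihood(x, low, high, reliable, mean, var):
    """State log-likelihood with bounded marginalization (sketch).

    Reliable dimensions contribute the usual Gaussian density; each
    unreliable dimension contributes the density integrated over its
    uncertainty range [low[d], high[d]].
    """
    std = np.sqrt(var)
    ll = 0.0
    for d in range(len(x)):
        if reliable[d]:
            ll += norm.logpdf(x[d], loc=mean[d], scale=std[d])
        else:
            mass = norm.cdf(high[d], mean[d], std[d]) - norm.cdf(low[d], mean[d], std[d])
            ll += np.log(max(mass, 1e-30))   # guard against vanishing mass
    return ll
```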
9
Missing Data Imputation
Data imputation: when no information is available (the feature is completely uncertain), the recognizer model mean is used.
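A minimal sketch of this imputation rule, assuming a per-component reliability flag and a diagonal-covariance state model (names are illustrative):

```python
import numpy as np

def impute(x, reliable, state_mean):
    """Classical data imputation (sketch): keep reliable feature
    components, replace completely uncertain ones with the recognizer
    state model mean."""
    reliable = np.asarray(reliable, dtype=bool)
    return np.where(reliable, x, state_mean)
```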
10
Missing Data Techniques
Application of Missing Data Techniques to Speech
Recognition
Source separation works in the frequency domain; speech recognition works best on MFCCs or derived features -> mismatch between domains.
Usual compromise:
[Diagram: x1(t), x2(t) → source separation → S(ω), σ_S(ω) (uncertainty range, STFT domain) → missing-data HMM speech recognition.]
Problem: recognition performs significantly better in other domains, so that the missing-feature approach performs worse overall than feature reconstruction (Raj, Seltzer & Stern 2004).
11
Uncertain Feature Transformation
Possible solution:
[Diagram: instead of passing S(ω), σ_S(ω) from the STFT domain directly to the missing-data HMM recognizer, an uncertain feature transformation converts them to the recognition domain: x1(t), x2(t) → source separation → S(ω), σ_S(ω) (STFT domain) → uncertain feature transformation → Scep, σ_Scep (e.g. MFCC domain) → missing-data HMM speech recognition.]
12
Uncertain Feature Transformation
Transforming uncertain features: flow diagram (all grey boxes contain purely linear transforms).
[Flow diagram: microphone signals Xi → preprocessing (ICA) → S(ω,t), σ_S(ω,t) → |·| → σ_abs(ω,t) → mel filter bank → Smel(ω,t), σ_mel(ω,t) → log → Slog(ω,t), σ_log(ω,t) → DCT → Scep(τ,t), σ_cep(τ,t) → Δ / acceleration → dcep(τ,t), ddcep(τ,t), σ_d, σ_dd → feature vector.]
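Because the mel filter bank, the DCT, and the delta/acceleration stages are linear, means and variances can be propagated through them exactly; the sketch below assumes diagonal (independent-component) covariances, a simplification made here for illustration:

```python
import numpy as np

def propagate_linear(mean_x, var_x, A):
    """Propagate mean and variance through a linear transform y = A @ x.

    Exact for the linear pipeline stages (mel filter bank, DCT, deltas):
    mean_y = A mean_x, and with a diagonal input covariance
    var_y = (A * A) var_x.  A full-covariance treatment would instead
    use A @ Sigma_x @ A.T.
    """
    mean_y = A @ mean_x
    var_y = (A ** 2) @ var_x
    return mean_y, var_y

# e.g. mel stage (filter bank matrix `mel_fb` assumed to be given):
# S_mel, var_mel = propagate_linear(S_abs, var_abs, mel_fb)
```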
13
Uncertain Feature Transformation
Transformation of variance in nonlinearities: analytical integration must be done beforehand, manually; often no analytic solution exists, or it may be hard to find.
14
Uncertain Feature Transformation
Transformation of variance in nonlinearities: integrals to solve.
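The integrals themselves are not preserved in this transcript; as a hedged reconstruction, propagating a Gaussian feature through a nonlinearity g requires moments of the form

```latex
\mu_g = \int g(x)\,\mathcal{N}(x;\mu,\sigma^2)\,dx,
\qquad
\sigma_g^2 = \int \bigl(g(x)-\mu_g\bigr)^2\,\mathcal{N}(x;\mu,\sigma^2)\,dx .
```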
15
Uncertain Feature Transformation
Transformation of variance in nonlinearities: the absolute-value nonlinearity.
16
Uncertain Feature Transformation
Transformation of variance in nonlinearities: the logarithm (Gales 1996). The analytical solution for the MFCC transform is used for comparison.
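As a hedged sketch of the kind of closed-form result meant here (the slide's own equations are not in the transcript): treating a mel filter bank output with mean m and variance v as log-normally distributed, the log-domain moments follow in closed form, in the style of Gales' model-based compensation:

```latex
\sigma_{\log}^2 = \ln\!\left(1 + \frac{v}{m^2}\right),
\qquad
\mu_{\log} = \ln(m) - \tfrac{1}{2}\,\sigma_{\log}^2 .
```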
17
Uncertain Feature Transformation
Transformation of variance in nonlinearities: alternatives to analytical integration:
  • Monte Carlo simulation: computationally too expensive
  • Pseudo-Monte-Carlo: interesting
  • Unscented transform
18
Uncertain Feature Transformation
  • Transforming variance via the unscented transform (see the sketch below)
  • Generate a set of sigma points which capture the statistics (in contrast to Monte Carlo methods, far fewer points are needed)
  • Propagate the sigma points through the nonlinearity
  • Accurate up to those moments which are correctly represented by the sigma points
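A minimal sketch of the unscented transform for a diagonal-covariance feature vector; the scaling parameter kappa, the diagonal assumption, and the example nonlinearity are illustrative choices, not taken from the presentation:

```python
import numpy as np

def unscented_transform(mean, var, f, kappa=1.0):
    """Propagate mean and (diagonal) variance through a nonlinearity f
    using 2n+1 sigma points (basic Julier/Uhlmann form)."""
    mean = np.asarray(mean, dtype=float)
    var = np.asarray(var, dtype=float)
    n = mean.size
    spread = np.sqrt((n + kappa) * var)

    sigma_points = [mean]
    weights = [kappa / (n + kappa)]
    for i in range(n):
        offset = np.zeros(n)
        offset[i] = spread[i]
        sigma_points += [mean + offset, mean - offset]
        weights += [0.5 / (n + kappa)] * 2

    y = np.array([f(p) for p in sigma_points])   # propagate each point
    w = np.array(weights)[:, None]
    mean_y = (w * y).sum(axis=0)
    var_y = (w * (y - mean_y) ** 2).sum(axis=0)  # diagonal output variance
    return mean_y, var_y

# Example: propagate mel-domain statistics through the log nonlinearity.
mu_log, var_log = unscented_transform([2.0, 5.0], [0.1, 0.5], np.log)
```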

19
Uncertain Feature Transformation
Transforming uncertain features: flow diagram (repeated from slide 12; all grey boxes contain purely linear transforms).
[Flow diagram as on slide 12: microphone signals Xi → preprocessing (ICA) → |·| → mel filter bank → log → DCT → Δ / acceleration → feature vector, with means and variances propagated at every stage.]
20
Modified Imputation
  • Using variance in recognition, either
  • define an uncertainty interval as a function of the variance and perform the integration,
  • or
  • maximize the likelihood to obtain modified imputation equations (see the next slides).

21
Modified Imputation
Using variance in imputation: maximization leads to an observation estimate which, for single Gaussians, can be found in closed form (see the reconstruction below).
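The maximized expression is not preserved in this transcript; a hedged reconstruction, with observation S, observation variance σ_S², state mean μ_q and state variance σ_q², maximizes the product of the two Gaussians and yields the precision-weighted combination

```latex
\hat{s} \;=\; \arg\max_{x}\; \mathcal{N}(x;\,S,\,\sigma_S^2)\;\mathcal{N}(x;\,\mu_q,\,\sigma_q^2)
        \;=\; \frac{\sigma_q^2\,S + \sigma_S^2\,\mu_q}{\sigma_q^2 + \sigma_S^2}.
```

In the limits this behaves as expected: for σ_S → 0 the observation is kept unchanged, while for σ_S → ∞ the estimate falls back to the state mean, i.e. classical data imputation.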
22
Modified Imputation
Using variance in imputation: for mixture-of-Gaussians models the estimate is approximated per mixture component; finally, the state output probability p(o|q) is replaced accordingly.
23
Modified Imputation
Imputation for the case of uncertain information
24
Experiments
Reverberant Room Recordings
25
Experiments
In-Car Recordings
  • Outline
  • 8-channel microphone array
  • Speech reproduced with an artificial mouth from CD with TI digits
  • Simultaneous recording with 4 cardioid microphones
  • 2 reference signals
  • Speech
  • TI digits, 10 different speakers, 2 min each
  • Setup
  • Car: Mercedes S 320
  • Pre-amplifier: MidiMan
  • Recorder: 16-channel, 12 kHz, 16 bit
  • Artificial heads: HEAD acoustics

26
Results
Percent correct on the connected digits task:

                      config a   config b   car 0 km/h   car 100 km/h @ -9.6 dB SNR
noisy                   56.1       49.8        50.6           21.5
TF-masked               59.3       50.8        48.5           20.6
dB gain                 5.5 dB     7.1 dB      15.4 dB        12.3 dB
analytic integration    88.5       89.0        80.9           66.2
unscented               87.2       86.1        80.1           68.4

Configurations a and b: t_rev = 300 ms; in-car: t_rev = 70 ms.
27
Conclusions
  • Integration of ICA and speech recognition:
  • ICA results can be improved by time-frequency masking
  • Speech recognition results can suffer despite improvements in SNR
  • To improve recognition performance, variance information on spectral features can be derived in the frequency domain
  • For transforming variances to the cepstral domain, analytical integration was compared with the unscented transform; the results are similar
  • The unscented transform may provide a generally applicable solution for coupling time-frequency signal processing to speech recognition in its optimal domain of operation

28
Literature
J. Barker, P. Green and M.P. Cooke: Linking Auditory Scene Analysis and Robust ASR by Missing Data Techniques, Proceedings WISP 2001, Stratford, UK. Available at http://hoarsenet.org/spandh/projects/respite/publications/publications.html
M. Gales: Model-Based Techniques for Noise Robust Speech Recognition, PhD thesis, Cambridge University, 1996.
D. Kolossa and R. Orglmeister: Nonlinear Postprocessing for Blind Speech Separation, Proceedings ICA 2004, Lecture Notes in Computer Science.
B. Raj, M. Seltzer and R. Stern: Reconstruction of Missing Features for Robust Speech Recognition, Speech Communication 43, pp. 275-296, 2004.
R. Stern: Signal Separation Motivated by Human Auditory Perception: Applications to Automatic Speech Recognition, in Speech Separation by Humans and Machines, P. Divenyi (Ed.), Kluwer 2005.
29
Thank you!