Dealing with Acoustic Noise Part 1: Spectral Estimation - PowerPoint PPT Presentation


Description:

Noise from the Perspective of the Brainstem. Something happened!! (VAD) ... Is there something new in the signal (hypothesis H1), or not (hypothesis H0) ...


1
Dealing with Acoustic Noise Part 1: Spectral Estimation

Mark Hasegawa-Johnson
University of Illinois
Lectures at CLSP WS06
July 20, 2006

2
Noise from the Perspective of the Brainstem

Something happened!! (VAD)
What happened?? (Recognition)

3
Noise from the Perspective of the Brainstem

Something happened!! (VAD)
Which pixels belong to the new event (auditory
scene analysis)?
What are their amplitudes (spectral estimation)?
What happened?? (Recognition)

4
A Speech Recognition Model (Mixture Gaussian)

Problem: find the most probable class ($C_n$) given
measurements of a function ($f(S)$) of the speech
signal ($S$). For example, $f(S)$ might be the PLP
coefficients.
Solution: choose $n$ such that $p(C_n,S) > p(C_m,S)$ for all
$m \neq n$.
What is $p(C_n,S)$? The most effective current
computational model is the mixture Gaussian: the
weighted sum of terms $\exp(-\|\mu - f(S)\|_W^2)$, where
$\|x\|_W^2 = x^T W x$. $2W$ is called the precision matrix.
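A minimal Python sketch of this decision rule follows. It assumes a diagonal $W$ and made-up mixture parameters: the class names, weights, means $\mu$ and $W$ diagonals below are hypothetical. Each component contributes $\exp(-\|\mu - f(S)\|_W^2)$, scaled by $\prod_i \sqrt{w_i/\pi}$ so that it is a proper Gaussian density with precision matrix $2W$. The mixture weights are assumed to already include the class prior $p(C_n)$.

```python
import math

def component_density(feat, mu, w_diag):
    """exp(-||mu - feat||_W^2) for diagonal W, normalized so the component
    is a Gaussian density whose precision matrix is 2W."""
    quad = sum(w * (m - x) ** 2 for x, m, w in zip(feat, mu, w_diag))
    norm = math.prod(math.sqrt(w / math.pi) for w in w_diag)
    return norm * math.exp(-quad)

def class_score(feat, mixture):
    """p(C_n, f(S)) as a weighted sum of Gaussian components.
    `mixture` is a list of (weight, mu, w_diag); weights include p(C_n)."""
    return sum(c * component_density(feat, mu, w) for c, mu, w in mixture)

def classify(feat, models):
    """Return the class label n that maximizes p(C_n, f(S))."""
    return max(models, key=lambda n: class_score(feat, models[n]))

# Hypothetical two-class model over a 2-D feature f(S) (illustrative values only).
models = {
    "vowel": [(0.3, [1.0, 0.0], [0.5, 0.5]), (0.2, [2.0, 1.0], [1.0, 1.0])],
    "fricative": [(0.5, [-1.0, 2.0], [0.5, 0.5])],
}
print(classify([1.2, 0.1], models))   # prints: vowel
```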

5
How Should the Model React to Additive Noise?

Suppose that we only have a noisy measurement, X
... where V is independent noise. Then Cn should
maximize

6
Answer: By Computing $f_{MMSE}$
7
Definition of $f_{MMSE}$
8
Classical Estimators: Maximum Likelihood
9
Classical Estimators: Maximum A Posteriori
10
Classical Estimators: Minimum Mean Squared Error
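For reference, here are the standard textbook definitions of the estimators named in the slide titles above. They are written for a noisy observation $X$ of speech $S$. This is a summary of the usual definitions, not a reconstruction of the slides' own equations.

```latex
\hat{S}_{ML} = \arg\max_{S} \, p(X \mid S), \qquad
\hat{S}_{MAP} = \arg\max_{S} \, p(S \mid X), \qquad
\hat{S}_{MMSE} = E[S \mid X] = \int S \, p(S \mid X) \, dS,
\qquad
f_{MMSE} = E[f(S) \mid X] = \int f(S) \, p(S \mid X) \, dS .
```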
11
Functions of Random Variables

Since $f_{MMSE}(S) \neq f(S_{MMSE})$ in general, it is necessary to find
the probability density function of $f(S)$
directly. Fortunately, the PDF of $f(S)$ can
always be computed from the PDF of $S$ by a change of variables.
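The following toy illustration uses a hypothetical three-point posterior $p(s \mid X)$ (the values are invented) and $f(S) = S^2$. It computes the distribution of $f(S)$ from the distribution of $S$ by adding up the probability of every value that maps to the same output, which is the discrete version of a change of variables. It then shows that $E[f(S) \mid X]$ differs from $f(E[S \mid X])$.

```python
from collections import defaultdict

# Hypothetical posterior p(s | X) over three amplitude values (illustration only).
posterior = {-2.0: 0.25, 0.0: 0.5, 2.0: 0.25}

def expect(g, dist):
    """E[g(S) | X] for a discrete distribution {value: probability}."""
    return sum(p * g(s) for s, p in dist.items())

def pmf_of_function(f, dist):
    """PMF of Y = f(S): add the probability of every s that maps to each y."""
    out = defaultdict(float)
    for s, p in dist.items():
        out[f(s)] += p
    return dict(out)

def f(s):
    """Example feature: the power of the sample."""
    return s ** 2

s_mmse = expect(lambda s: s, posterior)    # E[S | X]
f_of_s_mmse = f(s_mmse)                    # f(E[S | X])
pmf_f = pmf_of_function(f, posterior)      # distribution of f(S)
f_mmse = expect(lambda y: y, pmf_f)        # E[f(S) | X], from the PMF of f(S)

print(s_mmse, f_of_s_mmse, f_mmse)         # 0.0 0.0 2.0
```

The posterior mean amplitude is zero, so plugging it into $f$ gives zero power. The MMSE estimate of the power itself is 2.0.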

12
PDF of Speech: Time Domain

Speech samples $s[n]$ are often modeled as
Gaussian, because Gaussian PDFs are easy to
manipulate.
In fact, noise tends to be Gaussian, but the
speech PDF is actually a mixture of Gaussian
small-amplitude samples (the noise bits?) and
Laplacian high-amplitude samples (the actual
speech bits?).
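Below is a sketch of such a mixture PDF. It assumes zero-mean components and illustrative parameter values: the weight `w`, the Gaussian standard deviation `sigma` and the Laplacian scale `b` are hypothetical and were not fitted to speech. The sketch checks that the mixture integrates to one. It also shows that at large amplitudes the Laplacian component dominates.

```python
import math

def gaussian_pdf(s, sigma):
    """Zero-mean Gaussian density."""
    return math.exp(-0.5 * (s / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def laplacian_pdf(s, b):
    """Zero-mean Laplacian density with scale b."""
    return math.exp(-abs(s) / b) / (2 * b)

def speech_pdf(s, w=0.6, sigma=0.05, b=0.3):
    """Hypothetical mixture: weight w on small-amplitude Gaussian samples,
    1 - w on high-amplitude Laplacian samples."""
    return w * gaussian_pdf(s, sigma) + (1 - w) * laplacian_pdf(s, b)

# Riemann-sum check that the mixture integrates to one over [-5, 5].
step = 1e-3
total = sum(speech_pdf(-5 + i * step) for i in range(10001)) * step
print(round(total, 3))                                      # 1.0

# At amplitude 1.0 the Laplacian density is far larger than the Gaussian one.
print(laplacian_pdf(1.0, 0.3) > gaussian_pdf(1.0, 0.05))    # True
```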

13
PDF of Speech: Filter Outputs, e.g., STFT

The STFT is just a filter with complex-valued filter
coefficients.
The Central Limit Theorem says $S_k$ should be a 2D
(complex) Gaussian, if the window is infinitely long.
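The sketch below computes one STFT coefficient directly as a filter output: frame samples are weighted by complex coefficients $w[n]e^{-j2\pi kn/N}$ and summed. The function name `stft_bin` and the rectangular window in the usage lines are illustrative choices, not part of the original slides.

```python
import cmath
import math

def stft_bin(frame, k, window):
    """S_k for one frame: sum_n window[n] * frame[n] * exp(-j 2 pi k n / N),
    i.e. a complex-coefficient FIR filter evaluated once per frame."""
    N = len(frame)
    return sum(window[n] * frame[n] * cmath.exp(-2j * math.pi * k * n / N)
               for n in range(N))

N = 8
ones = [1.0] * N
rect = [1.0] * N
print(abs(stft_bin(ones, 0, rect)))            # 8.0: DC bin of a constant frame
print(abs(stft_bin(ones, 1, rect)) < 1e-12)    # True: no energy at bin 1
```

Because $S_k$ is a weighted sum of many samples, this is exactly the kind of sum to which the Central Limit Theorem argument applies.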

14
Err. Is it REALLY Gaussian?
15
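If we ignore the previous slide and pretend that
$S_k$ is a zero-mean complex Gaussian with $\lambda_S = E[|S_k|^2]$, then it is possible to
analytically derive the PDFs of $|S_k|^2$, $|S_k|$, and the
phase of $S_k$. They are:
$p(|S_k|^2 = u) = \frac{1}{\lambda_S} e^{-u/\lambda_S}$ (exponential),
$p(|S_k| = a) = \frac{2a}{\lambda_S} e^{-a^2/\lambda_S}$ (Rayleigh),
$p(\angle S_k = \theta) = \frac{1}{2\pi}$ for $-\pi \le \theta < \pi$ (uniform).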
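Here is a quick Monte Carlo sanity check, under the assumption that $S_k$ is a zero-mean circular complex Gaussian. The value $\lambda_S = 2$ is arbitrary. Each of the real and imaginary parts then has variance $\lambda_S/2$. The sample statistics should match the exponential mean $\lambda_S$, the Rayleigh mean $\sqrt{\pi\lambda_S}/2$, the exponential CDF value $1 - 1/e$ at $\lambda_S$, and a phase mean of zero.

```python
import math
import random

random.seed(0)
lam_s = 2.0                  # lambda_S = E[|S_k|^2], chosen only for illustration
n = 200_000
sd = math.sqrt(lam_s / 2)    # real and imaginary parts each have variance lambda_S / 2
samples = [complex(random.gauss(0.0, sd), random.gauss(0.0, sd)) for _ in range(n)]

mean_power = sum(abs(s) ** 2 for s in samples) / n                 # exponential: mean lambda_S
mean_mag = sum(abs(s) for s in samples) / n                        # Rayleigh: sqrt(pi lambda_S) / 2
frac_below = sum(abs(s) ** 2 < lam_s for s in samples) / n         # exponential CDF: 1 - 1/e
mean_phase = sum(math.atan2(s.imag, s.real) for s in samples) / n  # uniform: mean 0

print(f"E|S|^2 ~ {mean_power:.3f} (theory {lam_s:.3f})")
print(f"E|S|   ~ {mean_mag:.3f} (theory {math.sqrt(math.pi * lam_s) / 2:.3f})")
print(f"P(|S|^2 < lambda_S) ~ {frac_below:.3f} (theory {1 - math.exp(-1):.3f})")
print(f"mean phase ~ {mean_phase:.3f} (theory 0)")
```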
16
Classical Spectral Estimation Assumptions

One signal is ongoing (call it the noise), so
its $\lambda_N$ is known (averaged over time prior to
voice activity detection).
The MMSE estimate of the other signal, $S$,
combines two types of information:
a priori knowledge, $\lambda_S = E[|S_k|^2]$;
a priori SNR, $\xi_k = \lambda_S / \lambda_N$;
maximum likelihood estimator, $\hat{S}_{ML} = |X_k|^2 - \lambda_N$;
maximum likelihood SNR, $\gamma_k = |X_k|^2 / \lambda_N$.
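The sketch below shows how these per-bin quantities could be computed from power spectra. It assumes that $|X_k|^2$ arrives as plain Python lists and that the prior speech power $\lambda_S$ is supplied from elsewhere. The helper names `noise_power` and `snr_terms` and all numbers are hypothetical.

```python
def noise_power(noise_frames):
    """lambda_N per bin: average |X_k|^2 over frames assumed to be noise-only
    (e.g. frames seen before voice activity detection fires)."""
    n_frames = len(noise_frames)
    n_bins = len(noise_frames[0])
    return [sum(frame[k] for frame in noise_frames) / n_frames for k in range(n_bins)]

def snr_terms(x_power, lam_n, lam_s):
    """Per-bin a priori SNR xi_k, ML SNR gamma_k, and ML estimate |X_k|^2 - lambda_N."""
    xi = [s / n for s, n in zip(lam_s, lam_n)]
    gamma = [x / n for x, n in zip(x_power, lam_n)]
    s_ml = [x - n for x, n in zip(x_power, lam_n)]
    return xi, gamma, s_ml

noise_frames = [[1.0, 2.0], [3.0, 2.0]]   # hypothetical noise-only |X_k|^2 frames
lam_n = noise_power(noise_frames)         # [2.0, 2.0]
x_power = [6.0, 2.0]                      # hypothetical noisy |X_k|^2
lam_s = [4.0, 1.0]                        # hypothetical a priori lambda_S
xi, gamma, s_ml = snr_terms(x_power, lam_n, lam_s)
print(xi, gamma, s_ml)                    # [2.0, 0.5] [3.0, 1.0] [4.0, 0.0]
```

Note that $|X_k|^2 - \lambda_N$ can come out negative in a given frame. Implementations commonly floor it at zero, but this sketch deliberately does not.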

17
Classical Spectral Estimation Results: Wiener Filter (Norbert Wiener, 1949)
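As a reference point, consider independent additive noise with known spectra. In that case the standard Wiener gain per bin is $\lambda_S/(\lambda_S+\lambda_N) = \xi_k/(1+\xi_k)$, written in terms of the a priori SNR defined earlier. A minimal sketch applying it to a complex STFT coefficient follows. The function names are hypothetical.

```python
def wiener_gain(xi):
    """Wiener gain lambda_S / (lambda_S + lambda_N), with xi = lambda_S / lambda_N."""
    return xi / (1.0 + xi)

def wiener_estimate(x_k, xi):
    """Estimate of the complex coefficient S_k from the noisy X_k."""
    return wiener_gain(xi) * x_k

print(wiener_gain(1.0))                # 0.5: equal speech and noise power halves the amplitude
print(wiener_estimate(4 + 2j, 3.0))    # (3+1.5j)
```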
18
Classical Spectral Estimation Results: MMSE Spectral Amplitude Estimate (Ephraim and Malah, 1984)
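The sketch below implements the Ephraim and Malah (1984) amplitude gain:

$G = \frac{\sqrt{\pi}}{2}\frac{\sqrt{v_k}}{\gamma_k}e^{-v_k/2}\left[(1+v_k)I_0(v_k/2) + v_kI_1(v_k/2)\right]$, where $v_k = \xi_k\gamma_k/(1+\xi_k)$.

The amplitude estimate is $G|X_k|$. The Bessel functions are evaluated by their power series here only to stay dependency-free. This is an illustrative choice, and it overflows for very large $v_k$.

```python
import math

def bessel_i(order, x, tol=1e-16, max_terms=500):
    """Modified Bessel function I_order(x) for order 0 or 1, from the series
    sum_k (x/2)^(2k+order) / (k! (k+order)!)."""
    term = (x / 2.0) ** order / math.factorial(order)
    total = term
    for k in range(1, max_terms):
        term *= (x / 2.0) ** 2 / (k * (k + order))
        total += term
        if term <= tol * total:
            break
    return total

def mmse_stsa_gain(xi, gamma):
    """MMSE short-time spectral amplitude gain G(xi, gamma); the amplitude
    estimate is G * |X_k| and the noisy phase is kept.
    Plain floats: very large v would overflow I_0 and I_1."""
    v = xi * gamma / (1.0 + xi)
    return (math.sqrt(math.pi) / 2.0) * (math.sqrt(v) / gamma) * math.exp(-v / 2.0) * (
        (1.0 + v) * bessel_i(0, v / 2.0) + v * bessel_i(1, v / 2.0))

print(mmse_stsa_gain(1.0, 1.0))       # low SNR: gain well below 1
print(mmse_stsa_gain(100.0, 101.0))   # high SNR: close to the Wiener gain 100/101
```

In production code, exponentially scaled Bessel functions (for example SciPy's `scipy.special.i0e` and `i1e`) avoid the overflow.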
19
Classical Spectral Estimation Results: MMSE Log-Amplitude Estimate (Ephraim and Malah, 1985)
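The log-amplitude gain of Ephraim and Malah (1985) is:

$G = \frac{\xi_k}{1+\xi_k}\exp\left(\frac{1}{2}\int_{v_k}^{\infty}\frac{e^{-t}}{t}\,dt\right)$, with $v_k = \xi_k\gamma_k/(1+\xi_k)$.

The integral is the exponential integral $E_1(v_k)$. The sketch below evaluates $E_1$ with its power series for $v \le 1$ and a continued fraction (modified Lentz) for $v > 1$. This split is an implementation choice for a dependency-free example.

```python
import math

EULER_GAMMA = 0.5772156649015329

def expint_e1(x, tol=1e-15, max_iter=200):
    """Exponential integral E1(x) = integral_x^inf e^-t / t dt, for x > 0."""
    if x <= 1.0:
        # Series: E1(x) = -gamma - ln x - sum_{k>=1} (-x)^k / (k * k!)
        total = 0.0
        term = 1.0
        for k in range(1, max_iter):
            term *= -x / k            # term is now (-x)^k / k!
            total += term / k
            if abs(term / k) < tol:
                break
        return -EULER_GAMMA - math.log(x) - total
    # Continued fraction evaluated with the modified Lentz method (x > 1).
    tiny = 1e-300
    b = x + 1.0
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, max_iter):
        a = -float(i * i)
        b += 2.0
        d = 1.0 / (a * d + b)
        c = b + a / c
        delta = c * d
        h *= delta
        if abs(delta - 1.0) < tol:
            break
    return h * math.exp(-x)

def log_stsa_gain(xi, gamma):
    """MMSE log-spectral amplitude gain; multiply by |X_k| to get the amplitude."""
    v = xi * gamma / (1.0 + xi)
    return xi / (1.0 + xi) * math.exp(0.5 * expint_e1(v))

print(log_stsa_gain(1.0, 1.0))        # above the Wiener gain 0.5
print(log_stsa_gain(100.0, 101.0))    # essentially the Wiener gain 100/101
```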
20
How does it sound?
21
How does it sound?

The MVDR (minimum variance distortionless response) beamformer eliminates high-frequency noise;
MMSE-logSA eliminates low-frequency noise.
MMSE-logSA adds reverberation at low frequencies, but
the reverberation does not seem to affect speech
recognition accuracy.

22
What about, oh, say PLP?
23
Loudness Spectrum

Perceptual LPC (PLP) begins by computing an
estimate of the perceptual loudness spectrum.
Step 1: filter the signal using complex-valued
Bark-scale critical-band filters $h_k[m]$.
Step 2: compress the amplitudes with a
nonlinearity.
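The sketch below illustrates the two ingredients. The first is the Bark warping from Hermansky's PLP, $6\,\mathrm{asinh}(f/600)$ with $f$ in Hz, used here to place hypothetical filter centres one Bark apart. The second is cube-root compression of critical-band power, $|X_k|^2 \to |X_k|^{2/3}$. The 4 kHz upper limit and the one-Bark spacing are illustrative assumptions. The critical-band filters themselves are not implemented.

```python
import math

def hz_to_bark(f_hz):
    """PLP Bark warping (Hermansky): 6 * asinh(f / 600), f in Hz."""
    return 6.0 * math.asinh(f_hz / 600.0)

def bark_to_hz(bark):
    """Inverse of hz_to_bark."""
    return 600.0 * math.sinh(bark / 6.0)

def loudness(band_power):
    """Cube-root compression of critical-band power: |X_k|^2 -> |X_k|^(2/3)."""
    return band_power ** (1.0 / 3.0)

# Hypothetical filterbank layout: one critical-band centre per Bark up to 4 kHz.
centres_hz = [bark_to_hz(z) for z in range(1, int(hz_to_bark(4000.0)) + 1)]
print([round(f) for f in centres_hz])
print(loudness(27.0))   # cube root of a band power of 27
```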

24
MMSE Estimate of the Perceptual Loudness Spectrum

$g_{PLP}$ requires numerical integration (of $u^{1/3}e^{-u}$).
Numerical integration is a lot cheaper than it
used to be (e.g., via a lookup table).
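The exact form of $g_{PLP}$ is not reproduced in this transcript, so the sketch below only illustrates the lookup-table idea on the integrand named above. It precomputes $F(a) = \int_0^a u^{1/3}e^{-u}\,du$ by a cumulative trapezoid rule and then answers queries by linear interpolation. The grid limits and step are hypothetical choices. As a check, $F(\infty) = \Gamma(4/3)$, which is also $E[|S_k|^{2/3}]/\lambda_S^{1/3}$ when $|S_k|^2$ is exponential with mean $\lambda_S$.

```python
import bisect
import math

def build_table(upper=40.0, step=1e-3):
    """Cumulative trapezoid table of F(a) = integral_0^a u^(1/3) e^(-u) du."""
    grid = [0.0]
    values = [0.0]
    prev = 0.0                       # integrand value at u = 0
    n = int(round(upper / step))
    for i in range(1, n + 1):
        u = i * step
        cur = u ** (1.0 / 3.0) * math.exp(-u)
        values.append(values[-1] + 0.5 * step * (prev + cur))
        grid.append(u)
        prev = cur
    return grid, values

def lookup(a, grid, values):
    """Linear interpolation in the precomputed table, clamped to its range."""
    if a <= grid[0]:
        return values[0]
    if a >= grid[-1]:
        return values[-1]
    i = bisect.bisect_right(grid, a)
    t = (a - grid[i - 1]) / (grid[i] - grid[i - 1])
    return values[i - 1] + t * (values[i] - values[i - 1])

grid, values = build_table()
print(lookup(40.0, grid, values), math.gamma(4.0 / 3.0))   # both close to 0.893
```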

25
A Conservative Computational Auditory Model
[Figure: block diagram from input x(t) to speech features. Blocks: Basilar Membrane: Mechanical Filterbank (von Békésy); Variable-Threshold Synapses: Amplitude Compression (Ghitza, 1986); Auditory Nerve Carries the Loudness Spectrum $|X_k|^{2/3}$ (Fletcher; Hermansky); Average Background Loudness $\lambda_N$; Change Detection; MMSE Estimator of New Event $E[|S_k|^{2/3} \mid X]$; PLP: Perceptual Formant Extraction (Hermansky); Tandem Features: Perceptual Magnet Effect (Niyogi). Group labels: Auditory Scene Analysis; Speech Feature Extraction.]
26
Change Detection (VAD)