Title: Dealing with Acoustic Noise Part 1: Spectral Estimation
1. Dealing with Acoustic Noise, Part 1: Spectral Estimation
- Mark Hasegawa-Johnson
- University of Illinois
- Lectures at CLSP WS06
- July 20, 2006
2. Noise from the Perspective of the Brainstem
- Something happened!! (VAD)
- What happened?? (Recognition)
3. Noise from the Perspective of the Brainstem
- Something happened!! (VAD)
- Which pixels belong to the new event (auditory scene analysis)?
- What are their amplitudes (spectral estimation)?
- What happened?? (Recognition)
4. A Speech Recognition Model (Mixture Gaussian)
- Problem: find the most probable class ($C_n$) given measurements of a function ($f(S)$) of the speech signal ($S$). For example, $f(S)$ might be the PLP coefficients.
- Solution: choose $n$ such that $p(C_n, S) > p(C_m, S)$ for $n \ne m$.
- What is $p(C_n, S)$? The most effective current computational model is the mixture Gaussian: the weighted sum of $\exp(-\|\mu - f(S)\|_W^2)$, where $\|x\|_W^2 = x^T W x$. $2W$ is called the precision matrix.
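A minimal sketch of this decision rule, assuming a diagonal $W$ and assuming that each mixture weight already absorbs the Gaussian normalization constant and the class prior. The names `weighted_sq_norm`, `mixture_score`, and `classify` are hypothetical, chosen for illustration only.

```python
import math

def weighted_sq_norm(x, w_diag):
    """||x||_W^2 = x^T W x, for a diagonal W given by its diagonal entries (assumption)."""
    return sum(w * xi * xi for xi, w in zip(x, w_diag))

def mixture_score(f, components):
    """Weighted sum of exp(-||mu - f||_W^2) over (weight, mu, w_diag) components."""
    return sum(
        c * math.exp(-weighted_sq_norm([m - fi for m, fi in zip(mu, f)], w_diag))
        for c, mu, w_diag in components
    )

def classify(f, class_models):
    """Return the class label whose mixture score for the feature vector f is largest."""
    return max(class_models, key=lambda label: mixture_score(f, class_models[label]))

# Toy example: two classes, one component each, unit variance (2W = identity).
models = {
    "a": [(1.0, [0.0, 0.0], [0.5, 0.5])],
    "b": [(1.0, [3.0, 3.0], [0.5, 0.5])],
}
label = classify([0.2, -0.1], models)
```

A full covariance would replace the diagonal sum with a matrix product; the diagonal case is used here only to keep the sketch short.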
5. How Should the Model React to Additive Noise?
- Suppose that we only have a noisy measurement, $X = S + V$,
- ... where $V$ is independent noise. Then $C_n$ should maximize
6. Answer: By Computing $f_{MMSE}$
7. Definition of $f_{MMSE}$
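For reference, the standard definition of a minimum mean squared error estimate of $f(S)$ from the noisy measurement $X$ is the conditional mean. The notation below is an assumption about how the slide wrote it, not a recovery of the slide's own equation.

```latex
f_{MMSE}(X) = E\left[ f(S) \mid X \right] = \int f(S)\, p(S \mid X)\, dS
```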
8. Classical Estimators: Maximum Likelihood
9. Classical Estimators: Maximum A Posteriori
10. Classical Estimators: Minimum Mean Squared Error
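The textbook definitions of the three estimators, written for a signal $S$ observed through a noisy measurement $X$, are summarized below. The symbols $\hat{S}_{ML}$, $\hat{S}_{MAP}$, and $\hat{S}_{MMSE}$ are assumed notation.

```latex
\hat{S}_{ML}   = \arg\max_{S}\; p(X \mid S)
\hat{S}_{MAP}  = \arg\max_{S}\; p(S \mid X) = \arg\max_{S}\; p(X \mid S)\, p(S)
\hat{S}_{MMSE} = E\left[ S \mid X \right] = \int S\, p(S \mid X)\, dS
```

The ML estimate uses only the measurement, the MAP estimate adds the prior $p(S)$, and the MMSE estimate averages over the whole posterior rather than picking its peak.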
11. Functions of Random Variables
- Since $f_{MMSE}(S) \ne f(S_{MMSE})$, it is necessary to find the probability density function of $f(S)$ directly. Fortunately, the PDF of $f(S)$ can always be computed from the PDF of $S$, as follows:
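One common way to write this, assuming a scalar $S$ and a differentiable $f$ whose solutions of $f(s) = y$ are isolated, is the change-of-variables formula. The symbol $Y = f(S)$ is introduced here for illustration.

```latex
p_Y(y) = \int p_S(s)\, \delta\big(y - f(s)\big)\, ds
       = \sum_{s \,:\, f(s) = y} \frac{p_S(s)}{\left| f'(s) \right|}
```

For a monotonic $f$ the sum has a single term, so $p_Y(y) = p_S\big(f^{-1}(y)\big)\, \left| \frac{d f^{-1}(y)}{dy} \right|$.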
12. PDF of Speech: Time Domain
- Speech samples $s[n]$ are often modeled as Gaussian, because Gaussian PDFs are easy to manipulate.
- In fact, noise tends to be Gaussian, but
- Speech PDF is actually a mixture of Gaussian small-amplitude samples (the noise bits?) and Laplacian high-amplitude samples (the actual speech bits?)
13. PDF of Speech: Filter Outputs, e.g., STFT
- STFT is just a filter with complex-valued filter coefficients
- Central Limit Theorem says $S_k$ should be a 2D Gaussian, if the window is infinitely long
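The filter view of the STFT can be sketched as follows, assuming the band filter for bin $k$ is the analysis window modulated by a complex exponential. The function names and the parameter `n_fft` are hypothetical. The filter output matches the usual STFT coefficient up to a unit-magnitude phase factor, so the magnitudes agree.

```python
import cmath
import math

def stft_bin_filter(window, k, n_fft):
    """Complex FIR coefficients h_k[m] = w[m] * exp(+j*2*pi*k*m/n_fft)."""
    return [w * cmath.exp(2j * math.pi * k * m / n_fft) for m, w in enumerate(window)]

def filter_output(signal, h, n):
    """Filter output at time n: sum over m of h[m] * signal[n - m] (zero outside the signal)."""
    return sum(h[m] * signal[n - m] for m in range(len(h)) if 0 <= n - m < len(signal))

# Toy example: a cosine at bin 2 of an 8-point analysis with a rectangular window.
n_fft = 8
window = [1.0] * n_fft
signal = [math.cos(2 * math.pi * 2 * n / n_fft) for n in range(16)]
s_bin2 = filter_output(signal, stft_bin_filter(window, 2, n_fft), 7)
s_bin1 = filter_output(signal, stft_bin_filter(window, 1, n_fft), 7)
```

Each output sample is a weighted sum of many input samples, which is why the Central Limit Theorem argument applies to its real and imaginary parts.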
14. Err. Is it REALLY Gaussian?
15. If we ignore the previous slide, and pretend that $S_k$ is a complex Gaussian, then it is possible to analytically derive the PDFs of $|S_k|^2$, $|S_k|$, and the phase of $S_k$. They are
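For a zero-mean circular complex Gaussian with power $E[|S_k|^2]$, written $\lambda_S$ here, the standard results are an exponential power, a Rayleigh amplitude, and a uniform phase. The variable names $P$, $A$, and $\theta$ are assumed for illustration.

```latex
p_{|S_k|^2}(P) = \frac{1}{\lambda_S} \exp\left( -\frac{P}{\lambda_S} \right), \quad P \ge 0
p_{|S_k|}(A)   = \frac{2A}{\lambda_S} \exp\left( -\frac{A^2}{\lambda_S} \right), \quad A \ge 0
p_{\angle S_k}(\theta) = \frac{1}{2\pi}, \quad -\pi \le \theta < \pi
```

Under this model the phase is independent of the amplitude.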
16. Classical Spectral Estimation: Assumptions
- One signal is ongoing (call it the noise), so its power $\lambda_N$ is known (averaged over time prior to voice activity detection).
- The MMSE estimate of the other signal, $S$, combines two types of information:
  - A priori knowledge, $\lambda_S = E[|S_k|^2]$
    - A priori SNR: $\xi_k = \lambda_S / \lambda_N$
  - Maximum likelihood estimator, $|S_{ML}|^2 = |X_k|^2 - \lambda_N$
    - Maximum likelihood SNR: $\gamma_k = |X_k|^2 / \lambda_N$
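These quantities can be sketched for a single frequency bin as below. All function names are hypothetical. Flooring the maximum likelihood power at zero is an added assumption, since the difference between the measured power and the noise power can be negative.

```python
def noise_power(noise_frames_abs2):
    """Average the measured power over frames assumed to contain noise only."""
    return sum(noise_frames_abs2) / len(noise_frames_abs2)

def a_priori_snr(lambda_s, lambda_n):
    """A priori SNR: expected speech power over noise power."""
    return lambda_s / lambda_n

def ml_snr(x_abs2, lambda_n):
    """Maximum likelihood SNR: measured power over noise power."""
    return x_abs2 / lambda_n

def ml_power_estimate(x_abs2, lambda_n):
    """Measured power minus noise power, floored at zero (the floor is an assumption)."""
    return max(x_abs2 - lambda_n, 0.0)

# Toy example: two noise-only frames, then one frame with measured power 6.0.
lam_n = noise_power([1.0, 3.0])
gamma = ml_snr(6.0, lam_n)
s_ml_power = ml_power_estimate(6.0, lam_n)
```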
17. Classical Spectral Estimation Results: Wiener Filter (Norbert Wiener, 1949)
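For reference, the usual frequency-domain form of the Wiener filter, assuming speech power $\lambda_S$, noise power $\lambda_N$, and a priori SNR $\xi_k = \lambda_S / \lambda_N$, scales each noisy coefficient $X_k$ by a real gain between 0 and 1.

```latex
\hat{S}_k = \frac{\lambda_S}{\lambda_S + \lambda_N}\, X_k = \frac{\xi_k}{1 + \xi_k}\, X_k
```

The gain depends only on the a priori SNR, not on the measurement in the current frame.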
18. Classical Spectral Estimation Results: MMSE Spectral Amplitude Estimate (Ephraim and Malah, 1984)
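For reference, the commonly cited form of this amplitude estimator is given below, assuming $\xi_k$ denotes the a priori SNR, $\gamma_k$ the maximum likelihood SNR $|X_k|^2 / \lambda_N$, and $I_0$, $I_1$ the modified Bessel functions of order zero and one. The auxiliary variable $v_k$ is standard in the literature but is an assumption with respect to this slide.

```latex
v_k = \frac{\xi_k}{1 + \xi_k}\, \gamma_k
\hat{A}_k = \frac{\sqrt{\pi}}{2}\, \frac{\sqrt{v_k}}{\gamma_k}\,
            \exp\left( -\frac{v_k}{2} \right)
            \left[ (1 + v_k)\, I_0\!\left( \frac{v_k}{2} \right)
                   + v_k\, I_1\!\left( \frac{v_k}{2} \right) \right] |X_k|
```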
19. Classical Spectral Estimation Results: MMSE Log Amplitude Estimate (Ephraim and Malah, 1985)
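For reference, the commonly cited form of the log-amplitude estimator is the Wiener gain multiplied by an exponential-integral correction. The notation assumes a priori SNR $\xi_k$, maximum likelihood SNR $\gamma_k$, and $v_k = \xi_k \gamma_k / (1 + \xi_k)$.

```latex
\hat{A}_k = \frac{\xi_k}{1 + \xi_k}\,
            \exp\left( \frac{1}{2} \int_{v_k}^{\infty} \frac{e^{-t}}{t}\, dt \right) |X_k|
```

As $v_k$ grows the integral vanishes and the gain approaches the Wiener gain $\xi_k / (1 + \xi_k)$.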
20. How does it sound?
21. How does it sound?
- MVDR Beamformer eliminates high-frequency noise, MMSE-logSA eliminates low-frequency noise
- MMSE-logSA adds reverberation at low frequencies; reverberation seems not to affect speech recognition accuracy
22. What about, oh, say, PLP?
23. Loudness Spectrum
- Perceptual LPC (PLP) begins by computing an estimate of the perceptual loudness spectrum.
- Step 1: filter the signal using complex-valued Bark-scale critical-band filters $h_k[m]$
- Step 2: compress the amplitudes with a nonlinearity
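A minimal sketch of Step 2, assuming the nonlinearity is the cube root of power, that is, the band amplitude raised to the power 2/3 (the loudness spectrum form used elsewhere in these slides). The band filter outputs from Step 1 are taken as given, and the function name `loudness_spectrum` is hypothetical.

```python
def loudness_spectrum(band_outputs):
    """Compress each complex band output X_k to |X_k| ** (2/3), the cube root of its power."""
    return [abs(x_k) ** (2.0 / 3.0) for x_k in band_outputs]

# Toy example: three complex critical-band outputs.
loud = loudness_spectrum([8 + 0j, 0 + 1j, 0j])
```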
24. MMSE Estimate of the Perceptual Loudness Spectrum
- $g_{PLP}$ requires numerical integration (of $u^{1/3} e^{-u}$)
- Numerical integration is a lot cheaper than it used to be (e.g., via lookup table).
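The lookup-table idea can be sketched as follows: tabulate the running integral of $u^{1/3} e^{-u}$ once, then interpolate at run time. This is only an illustration of the integration step, not the full estimator. The grid size, upper limit, and function names are assumptions.

```python
import math

def build_table(upper=30.0, steps=30000):
    """Cumulative midpoint-rule integral of u**(1/3) * exp(-u) from 0 to each grid point."""
    du = upper / steps
    table = [0.0]
    for i in range(steps):
        u = (i + 0.5) * du
        table.append(table[-1] + u ** (1.0 / 3.0) * math.exp(-u) * du)
    return du, table

def lookup(du, table, x):
    """Linearly interpolate the tabulated integral at x, clamping x to the table range."""
    pos = min(max(x / du, 0.0), len(table) - 1)
    i = min(int(pos), len(table) - 2)
    frac = pos - i
    return table[i] + frac * (table[i + 1] - table[i])

du, table = build_table()
full_integral = table[-1]  # approximates the gamma function at 4/3
```

The complete integral equals the gamma function evaluated at 4/3, which gives a convenient check on the table's accuracy.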
25. A Conservative Computational Auditory Model
(Block diagram; box labels:)
- $x(t)$
- Basilar Membrane: Mechanical Filterbank (von Bekesy)
- Variable Threshold Synapses: Amplitude Compression (Ghitza, 1986)
- Auditory Nerve Carries the Loudness Spectrum $|X_k|^{2/3}$ (Fletcher; Hermansky)
- Auditory Scene Analysis
- Average Background Loudness $\lambda_N$
- Change Detection
- MMSE Estimator of New Event: $E[|S_k|^{2/3} \mid X]$
- Speech Feature Extraction
- PLP: Perceptual Formant Extraction (Hermansky)
- Tandem Features: Perceptual Magnet Effect (Niyogi)
26. Change Detection (VAD)
- Is there something new in the signal (hypothesis $H_1$), or not (hypothesis $H_0$)?
- $p_1 = p(H_1)$ a priori, $p_0 = p(H_0)$ a priori
- Solution: compute the log likelihood ratio
- ... which has a simple form in terms of $\gamma_k$
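A minimal sketch of such a detector, assuming the complex Gaussian model for both hypotheses and independent frequency bins. Under those assumptions the per-bin log likelihood ratio is the maximum likelihood SNR times the Wiener gain, minus the log of one plus the a priori SNR. The function names and the default prior are hypothetical.

```python
import math

def log_likelihood_ratio(gammas, xis):
    """Sum over bins of gamma_k * xi_k / (1 + xi_k) - log(1 + xi_k)."""
    return sum(g * xi / (1.0 + xi) - math.log1p(xi) for g, xi in zip(gammas, xis))

def detect_change(gammas, xis, p1=0.5):
    """Return True when the posterior log odds of H1 over H0 are positive."""
    log_prior_odds = math.log(p1 / (1.0 - p1))
    return log_likelihood_ratio(gammas, xis) + log_prior_odds > 0.0

# Toy example: one bin with a priori SNR 1 and maximum likelihood SNR 2.
llr = log_likelihood_ratio([2.0], [1.0])
```

Lowering `p1` raises the evidence needed before a new event is declared.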