Dealing with Acoustic Noise Part 1: Spectral Estimation

1
Dealing with Acoustic Noise Part 1: Spectral Estimation
  • Mark Hasegawa-Johnson
  • University of Illinois
  • Lectures at CLSP WS06
  • July 20, 2006

2
Noise from the Perspective of the Brainstem
  • Something happened!! (VAD)
  • What happened?? (Recognition)

3
Noise from the Perspective of the Brainstem
  • Something happened!! (VAD)
  • Which pixels belong to the new event (auditory
    scene analysis)?
  • What are their amplitudes (spectral estimation)?
  • What happened?? (Recognition)

4
A Speech Recognition Model (Mixture Gaussian)
  • Problem: find the most probable class ($C_n$) given
    measurements of a function ($f(S)$) of the speech
    signal ($S$). For example, $f(S)$ might be the PLP
    coefficients.
  • Solution: choose $n$ such that $p(C_n, S) > p(C_m, S)$ for
    all $m \ne n$.
  • What is $p(C_n, S)$? The most effective current
    computational model is the mixture Gaussian: the
    weighted sum of $\exp(-\|\mu - f(S)\|_W^2)$, where
    $\|x\|_W^2 = x^T W x$. $2W$ is called the precision matrix.
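A minimal sketch of the mixture-Gaussian score described on this slide, assuming the mixture weights, means, and W matrices are already known. The function names and all toy numbers are hypothetical, and normalization constants are omitted, as they are on the slide.

```python
import math

def weighted_sq_norm(x, W):
    """Return x^T W x for a vector x and a square matrix W (nested lists)."""
    n = len(x)
    return sum(x[i] * W[i][j] * x[j] for i in range(n) for j in range(n))

def mixture_score(f, weights, means, Ws):
    """Weighted sum of exp(-||mu - f||_W^2) over the mixture components."""
    total = 0.0
    for c, mu, W in zip(weights, means, Ws):
        diff = [m - v for m, v in zip(mu, f)]
        total += c * math.exp(-weighted_sq_norm(diff, W))
    return total

# Toy two-class example; every number here is made up for illustration.
identity_half = [[0.5, 0.0], [0.0, 0.5]]  # W = I/2, so the precision 2W = I
classes = {
    "C1": ([1.0], [[0.0, 0.0]], [identity_half]),
    "C2": ([1.0], [[3.0, 3.0]], [identity_half]),
}
f_S = [0.5, 0.2]  # hypothetical feature vector f(S)

# Choose the class whose score is largest.
best = max(classes, key=lambda name: mixture_score(f_S, *classes[name]))
```

Here each class has a single component, so the decision reduces to picking the nearest mean under the W-weighted norm.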

5
How Should the Model React to Additive Noise?
  • Suppose that we only have a noisy measurement, $X = S + V$,
    where $V$ is independent noise. Then $C_n$ should
    maximize

6
Answer: By Computing $f_{MMSE}$
7
Definition of $f_{MMSE}$
8
Classical Estimators: Maximum Likelihood
9
Classical Estimators: Maximum A Posteriori
10
Classical Estimators: Minimum Mean Squared Error
11
Functions of Random Variables
  • Since $f_{MMSE}(S) \ne f(S_{MMSE})$ in general, it is necessary to find
    the probability density function of $f(S)$
    directly. Fortunately, the PDF of $f(S)$ can
    always be computed from the PDF of $S$, as follows:
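The slide's own equation is not preserved in this transcript. For reference, the standard change-of-variables rule for a scalar, differentiable, strictly monotonic $f$ (these restrictions are assumptions made here, not statements from the slide) is:

```latex
p_{F}(y) \;=\; p_{S}\!\left(f^{-1}(y)\right)\,
\left|\frac{d\,f^{-1}(y)}{dy}\right| ,
\qquad y = f(s)
```

When $f$ is not monotonic, the right-hand side becomes a sum of such terms over every $s$ that satisfies $f(s) = y$.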

12
PDF of Speech: Time Domain
  • Speech samples $s_n$ are often modeled as
    Gaussian, because Gaussian PDFs are easy to
    manipulate.
  • In fact, noise tends to be Gaussian, but
    the speech PDF is actually a mixture of Gaussian
    small-amplitude samples (the noise bits?) and
    Laplacian high-amplitude samples (the actual
    speech bits?).

13
PDF of Speech: Filter Outputs, e.g., STFT
  • STFT is just a filter with complex-valued filter
    coefficients
  • The Central Limit Theorem says $S_k$ should be a 2D
    Gaussian, if the window is infinitely long

14
Err. Is it REALLY Gaussian?
15
If we ignore the previous slide, and pretend that
$S_k$ is a complex Gaussian, then it is possible to
analytically derive the PDFs of $|S_k|^2$, $|S_k|$, and
the phase of $S_k$. They are:
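The slide's equations are not preserved in this transcript. As a sketch, the textbook results for a zero-mean circular complex Gaussian with $E[|S_k|^2] = \lambda$ are an exponential PDF for the power, a Rayleigh PDF for the magnitude, and a uniform PDF for the phase. The function names and the Monte Carlo check below are illustrative assumptions, not part of the original slides.

```python
import math
import random

def pdf_power(u, lam):
    """Exponential PDF of |S_k|^2 when E[|S_k|^2] = lam."""
    return math.exp(-u / lam) / lam if u >= 0 else 0.0

def pdf_magnitude(a, lam):
    """Rayleigh PDF of |S_k|."""
    return (2.0 * a / lam) * math.exp(-a * a / lam) if a >= 0 else 0.0

def pdf_phase(theta):
    """Uniform PDF of the phase on [-pi, pi)."""
    return 1.0 / (2.0 * math.pi) if -math.pi <= theta < math.pi else 0.0

# Monte Carlo sanity check with a fixed seed (hypothetical lam value).
rng = random.Random(0)
lam = 2.0
sigma = math.sqrt(lam / 2.0)  # real and imaginary parts each have variance lam/2
samples = [complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
           for _ in range(50000)]
mean_power = sum(abs(s) ** 2 for s in samples) / len(samples)
mean_mag = sum(abs(s) for s in samples) / len(samples)

# Theoretical means implied by the PDFs above.
expected_power = lam
expected_mag = math.sqrt(math.pi * lam) / 2.0
```

The sample means should land close to `expected_power` and `expected_mag`, which is a quick way to confirm that the three PDFs are mutually consistent.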
16
Classical Spectral Estimation: Assumptions
  • One signal is ongoing (call it the noise), so
    its $\lambda_N$ is known (averaged over time prior to
    voice activity detection).
  • The MMSE estimate of the other signal, $S$,
    combines two types of information:
  • a priori knowledge, $\lambda_S = E[|S_k|^2]$
  • a priori SNR, $\xi_k = \lambda_S / \lambda_N$
  • Maximum likelihood estimator, $|S_{ML}|^2 = |X_k|^2 - \lambda_N$
  • Maximum likelihood SNR, $\gamma_k = |X_k|^2 / \lambda_N$
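The quantities in this list can be computed per frequency bin as in the sketch below. The function name is hypothetical, and flooring the maximum likelihood power estimate at zero is an added assumption (the slide gives only the subtraction).

```python
def snr_quantities(X_k, lam_S, lam_N):
    """Return (a priori SNR, maximum likelihood SNR, ML power estimate).

    X_k   : complex noisy spectral coefficient
    lam_S : prior speech power E[|S_k|^2]
    lam_N : noise power, assumed known from noise-only frames
    """
    xi = lam_S / lam_N                     # a priori SNR
    gamma = abs(X_k) ** 2 / lam_N          # maximum likelihood SNR
    # ML power estimate |X_k|^2 - lam_N; the floor at 0 is an assumption
    # added here so that the estimated power is never negative.
    s_ml_power = max(abs(X_k) ** 2 - lam_N, 0.0)
    return xi, gamma, s_ml_power

xi, gamma, s_ml_power = snr_quantities(3 + 4j, lam_S=8.0, lam_N=4.0)
```

For this toy bin, $|X_k|^2 = 25$, so the a priori SNR is 2, the maximum likelihood SNR is 6.25, and the ML power estimate is 21.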

17
Classical Spectral Estimation Results: Wiener Filter
(Norbert Wiener, 1949)
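The slide's equation is not preserved in this transcript. A minimal sketch of the usual frequency-domain Wiener gain, written in terms of the a priori SNR (speech power over noise power), is shown below; the function names are hypothetical.

```python
def wiener_gain(xi):
    """Wiener gain xi / (1 + xi), where xi is the a priori SNR."""
    return xi / (1.0 + xi)

def wiener_estimate(X_k, lam_S, lam_N):
    """Scale the noisy coefficient X_k by the Wiener gain for this bin."""
    return wiener_gain(lam_S / lam_N) * X_k

S_hat = wiener_estimate(2 + 2j, lam_S=3.0, lam_N=1.0)  # gain is 0.75
```

The gain depends only on the prior powers, not on the observed $|X_k|^2$, which is one way it differs from the amplitude estimators that follow.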
18
Classical Spectral Estimation Results: MMSE
Spectral Amplitude Estimate (Ephraim and Malah,
1984)
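The slide's equation is not preserved in this transcript. For reference, the amplitude estimator is commonly quoted in the following form, with $\xi_k$ the a priori SNR, $\gamma_k = |X_k|^2/\lambda_N$, and $I_0$, $I_1$ the modified Bessel functions of the first kind; the notation is assumed to match the preceding slides.

```latex
\hat{A}_k \;=\; \frac{\sqrt{\pi}}{2}\,\frac{\sqrt{v_k}}{\gamma_k}\,
\exp\!\left(-\frac{v_k}{2}\right)
\left[(1+v_k)\,I_0\!\left(\frac{v_k}{2}\right)
      + v_k\,I_1\!\left(\frac{v_k}{2}\right)\right] |X_k| ,
\qquad
v_k \;=\; \frac{\xi_k}{1+\xi_k}\,\gamma_k
```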
19
Classical Spectral Estimation Results: MMSE Log
Amplitude Estimate (Ephraim and Malah, 1985)
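The slide's equation is not preserved in this transcript. For reference, the log-amplitude estimator is commonly quoted in the following form, with $\xi_k$ the a priori SNR and $\gamma_k = |X_k|^2/\lambda_N$; the notation is assumed to match the preceding slides.

```latex
\hat{A}_k \;=\; \frac{\xi_k}{1+\xi_k}\,
\exp\!\left(\frac{1}{2}\int_{v_k}^{\infty}\frac{e^{-t}}{t}\,dt\right) |X_k| ,
\qquad
v_k \;=\; \frac{\xi_k}{1+\xi_k}\,\gamma_k
```

The leading factor is the Wiener gain, so this estimator can be read as the Wiener gain multiplied by a correction that approaches 1 as $v_k$ grows.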
20
How does it sound?
21
How does it sound?
  • The MVDR beamformer eliminates high-frequency noise;
    MMSE-logSA eliminates low-frequency noise
  • MMSE-logSA adds reverberation at low frequencies;
    the reverberation seems not to affect speech
    recognition accuracy

22
What about, oh, say, PLP?
23
Loudness Spectrum
  • Perceptual LPC (PLP) begins by computing an
    estimate of the perceptual loudness spectrum.
  • Step 1: filter the signal using complex-valued
    Bark-scale critical-band filters $h_{km}$
  • Step 2: compress the amplitudes with a
    nonlinearity
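The two steps above can be sketched as follows. The filter coefficients are placeholders rather than real Bark-scale critical-band filters, and the choice of $|y|^{2/3}$ as the nonlinearity is an assumption inferred from the loudness spectrum $|X_k|^{2/3}$ mentioned later in the deck.

```python
def filter_complex(x, h):
    """Step 1: causal FIR filtering of x with complex coefficients h."""
    return [sum(h[m] * x[n - m] for m in range(len(h)) if n - m >= 0)
            for n in range(len(x))]

def loudness(y):
    """Step 2: compress amplitudes, here as |y|^(2/3) (cube root of power)."""
    return [abs(v) ** (2.0 / 3.0) for v in y]

# Placeholder two-tap complex filter; NOT a real critical-band filter.
h_k = [1j, 2.0]
x = [1.0, 0.0, 0.0]          # unit impulse, so the output equals h_k
y_k = filter_complex(x, h_k)
L_k = loudness(y_k)
```

A full front end would apply one such filter per critical band and then compress each band's output.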

24
MMSE Estimate of the Perceptual Loudness Spectrum
  • gPLP requires numerical integration (of $u^{1/3} e^{-u}$)
  • Numerical integration is a lot cheaper than it
    used to be (e.g., via lookup table).
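A minimal sketch of the lookup-table idea, assuming the quantity needed is the cumulative integral $F(x) = \int_0^x u^{1/3} e^{-u}\,du$. The integration limits, table size, and function names are assumptions; the slide states only the integrand. The substitution $u = t^3$ is used so that the integrand is smooth at zero.

```python
import math

def build_table(u_max=40.0, steps=4000):
    """Tabulate F(x) = integral_0^x u^(1/3) e^(-u) du on a grid in t = u^(1/3)."""
    t_max = u_max ** (1.0 / 3.0)
    dt = t_max / steps
    # After u = t^3, du = 3 t^2 dt, the integrand becomes 3 t^3 exp(-t^3).
    g = lambda t: 3.0 * t ** 3 * math.exp(-t ** 3)
    table = [0.0]
    for i in range(steps):
        a, b = i * dt, (i + 1) * dt
        table.append(table[-1] + 0.5 * (g(a) + g(b)) * dt)  # trapezoid rule
    return t_max, table

def lookup(x, t_max, table):
    """Linearly interpolate F at x, clamping x to the tabulated range."""
    steps = len(table) - 1
    pos = min(max(x, 0.0) ** (1.0 / 3.0) / t_max, 1.0) * steps
    i = min(int(pos), steps - 1)
    return table[i] + (pos - i) * (table[i + 1] - table[i])

t_max, table = build_table()
full_integral = lookup(40.0, t_max, table)  # close to Gamma(4/3)
```

The table is built once; each later evaluation costs one cube root and one interpolation, which is the sense in which the integration becomes cheap.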

25
A Conservative Computational Auditory Model
Block diagram with the following labels:
  • Input: x(t)
  • Auditory Nerve: Carries The Loudness Spectrum
    $|X_k|^{2/3}$ (Fletcher, Hermansky)
  • PLP: Perceptual Formant Extraction (Hermansky)
  • Tandem Features: Perceptual Magnet Effect (Niyogi)
  • Average Background Loudness $\lambda_N$
  • Change Detection
  • Variable Threshold Synapses: Amplitude
    Compression (Ghitza, 1986)
  • MMSE Estimator of New Event: $E[|S_k|^{2/3} \mid X]$
  • Basilar Membrane: Mechanical Filterbank (von
    Bekesy)
  • Speech Feature Extraction
  • Auditory Scene Analysis
26
Change Detection (VAD)
  • Is there something new in the signal (hypothesis
    H1), or not (hypothesis H0)?
  • $p_1 = p(H_1)$ a priori, $p_0 = p(H_0)$ a priori
  • Solution: compute the log likelihood ratio
  • which has a simple form, in terms of $\gamma_k$
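The slide's equation is not preserved in this transcript. As a sketch, if the coefficient $X_k$ is modeled as complex Gaussian with power $\lambda_N$ under H0 and $\lambda_N + \lambda_S$ under H1, the log likelihood ratio per bin is $\gamma_k \xi_k/(1+\xi_k) - \log(1+\xi_k)$, with $\gamma_k = |X_k|^2/\lambda_N$ and $\xi_k = \lambda_S/\lambda_N$. Summing over bins (which assumes independent bins) and comparing against $\log(p_0/p_1)$ are assumptions made here; the function names are hypothetical.

```python
import math

def log_likelihood_ratio(gamma_k, xi_k):
    """log p(X_k | H1) / p(X_k | H0) under the complex Gaussian model."""
    return gamma_k * xi_k / (1.0 + xi_k) - math.log(1.0 + xi_k)

def detect(gammas, xis, p1=0.5):
    """Decide H1 when the summed log likelihood ratio exceeds log(p0 / p1).

    Summing over bins treats the bins as independent (an assumption).
    """
    llr = sum(log_likelihood_ratio(g, x) for g, x in zip(gammas, xis))
    return llr > math.log((1.0 - p1) / p1)

loud_frame = detect([10.0], [1.0])   # strong observation relative to noise
quiet_frame = detect([0.5], [1.0])   # observation near the noise floor
```

With equal priors the threshold is zero, so the decision depends only on whether the observed SNR term outweighs the $\log(1+\xi_k)$ penalty.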