1
A noise-estimation algorithm for highly
non-stationary environments

Sundarrajan Rangachari, Philipos C. Loizou
Department of Electrical Engineering, University
of Texas at Dallas, P.O. Box 830688, EC 33
Richardson, TX 75083-0688, USA
Presenter: Shih-Hsiang

SPEECH COMMUNICATION Vol. 48(2), 2006
2
References

Doblinger, G., 1995. Computationally efficient
speech enhancement by spectral minima tracking in
subbands. Proc. Eurospeech 2, 1513-1516.
Hirsch, H., Ehrlicher, C., 1995. Noise estimation
techniques for robust speech recognition. Proc.
IEEE Internat. Conf. on Acoust. Speech Signal
Process., 153-156.
Martin, R., 2001. Noise power spectral density
estimation based on optimal smoothing and minimum
statistics. IEEE Trans. Speech Audio Process. 9
(5), 504-512.
Cohen, I., 2002. Noise estimation by minima
controlled recursive averaging for robust speech
enhancement. IEEE Signal Process. Lett. 9 (1),
12-15.
Hu, Y., Loizou, P., 2004. Speech enhancement
based on wavelet thresholding the multitaper
spectrum. IEEE Trans. Speech Audio Process. 12
(1), 59-67.

3
Introduction

In most speech-enhancement algorithms, it is
assumed that an estimate of the noise spectrum is
available
This estimate is critical for the performance of
speech-enhancement algorithms
The noise estimate can have a major impact on the
quality of the enhanced signal
If the noise estimate is too low, annoying
residual noise will be audible
If the noise estimate is too high, speech will be
distorted
The simplest approach is to estimate and update
the noise spectrum during the silent segments of
the signal, using a voice activity detection
(VAD) algorithm
This works satisfactorily only in stationary
noise, not in more realistic (non-stationary)
environments
Hence there is a need to update the noise
spectrum continuously over time

4
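Proposed noise-estimation algorithm: Computing
the smoothed speech power spectrum
Let the noisy speech signal in the time domain be
denoted as
$y(n) = x(n) + d(n)$
where $y(n)$ is the noisy speech, $x(n)$ the
clean speech, and $d(n)$ the additive noise
The smoothed power spectrum of noisy speech is
computed using the following first-order
recursive equation
$P(\lambda,k) = \eta P(\lambda-1,k) + (1-\eta)|Y(\lambda,k)|^2$
where $P(\lambda,k)$ is the smoothed power
spectrum, $|Y(\lambda,k)|^2$ the short-time power
spectrum of the noisy speech, $\lambda$ the frame
index, $k$ the frequency index, and $\eta$ the
smoothing constant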
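The first-order recursion on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the function name `smooth_power_spectrum`, the list-of-frames layout, and the choice to initialise the recursion with the first frame are all hypothetical; the default of 0.7 for the smoothing constant follows the parameter list given later in the slides.

```python
def smooth_power_spectrum(frames_power, eta=0.7):
    """Recursively smooth per-bin power along the frame axis.

    frames_power: list of frames, each a list of |Y|^2 values (one per bin).
    eta: smoothing constant in [0, 1); larger values smooth more heavily.
    """
    smoothed = []
    prev = None
    for frame in frames_power:
        if prev is None:
            # Assumption: start the recursion from the first periodogram.
            cur = list(frame)
        else:
            # P(lam, k) = eta * P(lam - 1, k) + (1 - eta) * |Y(lam, k)|^2
            cur = [eta * p + (1.0 - eta) * y for p, y in zip(prev, frame)]
        smoothed.append(cur)
        prev = cur
    return smoothed
```

For example, with eta = 0.5 a bin whose power goes from 1.0 to 2.0 is smoothed to 1.5 in the second frame.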
5
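Proposed noise-estimation algorithm: Tracking the
minimum of noisy speech
Let $P_{\min}(\lambda,k)$ denote the local minimum
of the noisy speech power spectrum
If $P_{\min}(\lambda-1,k) < P(\lambda,k)$:
$P_{\min}(\lambda,k) = \gamma P_{\min}(\lambda-1,k) + \frac{1-\gamma}{1-\beta}\,(P(\lambda,k) - \beta P(\lambda-1,k))$
otherwise:
$P_{\min}(\lambda,k) = P(\lambda,k)$
$\beta$ and $\gamma$ are constants which are
determined experimentally. The look-ahead factor
$\beta$ controls the adaptation time of the local
minimum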
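A minimal sketch of minimum tracking for a single frequency bin follows. It assumes the non-linear recursion of continuous spectral-minima tracking (Doblinger, 1995): the minimum follows the smoothed power immediately when the power drops, and rises only slowly, at a rate set by two constants, when the power increases. The function name `track_minimum`, the initialisation with the first frame, and the default constants (0.998 and 0.8, taken from the parameter list later in the slides) are assumptions made for illustration.

```python
def track_minimum(P, gamma=0.998, beta=0.8):
    """Track the local minimum of a smoothed power sequence for one bin.

    P: list of smoothed power values, one per frame.
    gamma, beta: experimentally chosen constants; beta is the look-ahead
    factor that controls how fast the minimum adapts.
    """
    if not P:
        return []
    p_min = [P[0]]  # assumption: initialise with the first frame
    for lam in range(1, len(P)):
        if p_min[-1] < P[lam]:
            # Power rose above the minimum: let the minimum creep upwards.
            new = (gamma * p_min[-1]
                   + (1.0 - gamma) / (1.0 - beta) * (P[lam] - beta * P[lam - 1]))
        else:
            # Power fell to or below the minimum: follow it immediately.
            new = P[lam]
        p_min.append(new)
    return p_min
```

Because the update uses only the previous frame, no minimum-search window has to be stored.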
6
Proposed noise-estimation algorithm: Tracking the
minimum of noisy speech
Plot of the noisy speech power spectrum and the
local minimum obtained using (3) for speech
degraded by babble noise at 5 dB SNR, at
frequency bin k = 5
7
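Proposed noise-estimation algorithm:
Speech-presence probability
Let the ratio of noisy speech power spectrum and
its local minimum be defined as
$S_r(\lambda,k) = P(\lambda,k) / P_{\min}(\lambda,k)$
The power spectrum of noisy speech will be nearly
equal to its local minimum when speech is absent,
so a ratio close to 1 indicates speech absence
The speech-presence probability, $p(\lambda,k)$, is
updated using the following first-order recursion
$p(\lambda,k) = \alpha_p\, p(\lambda-1,k) + (1-\alpha_p)\, I(\lambda,k)$
where $\alpha_p$ is a smoothing constant and
$I(\lambda,k)$ is 1 when $S_r(\lambda,k)$ exceeds a
frequency-dependent threshold and 0 otherwise
The above recursion implicitly exploits the
correlation of speech presence in adjacent
frames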
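The ratio test and the first-order recursion for the speech-presence probability can be sketched as below for one frequency bin. This is an illustration under assumptions: a hard speech-present/absent decision is made by comparing the ratio with a threshold, and that decision is then smoothed over frames. The name `speech_presence_probability`, the threshold value `delta=2.0`, and the zero initial probability are hypothetical choices; 0.2 is the smoothing constant listed later in the slides.

```python
def speech_presence_probability(P, P_min, delta=2.0, alpha_p=0.2):
    """Estimate the speech-presence probability per frame for one bin.

    P: smoothed noisy-speech power per frame.
    P_min: tracked local minimum per frame (same length as P).
    delta: decision threshold on the ratio P / P_min (illustrative value).
    alpha_p: smoothing constant of the probability recursion.
    """
    prob = 0.0  # assumption: speech is taken to be absent before the first frame
    out = []
    for power, minimum in zip(P, P_min):
        ratio = power / minimum if minimum > 0 else float("inf")
        indicator = 1.0 if ratio > delta else 0.0
        # Smoothing over frames exploits the correlation of speech
        # presence in adjacent frames.
        prob = alpha_p * prob + (1.0 - alpha_p) * indicator
        out.append(prob)
    return out
```

A ratio near 1 keeps the probability low, while several consecutive frames above the threshold drive it towards 1.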
8
Proposed noise-estimation algorithm:
Speech-presence probability
Top panel: plot of the estimated speech-presence
probability based on the ratio $S_r(\lambda,k)$.
Bottom panel: spectrogram of the clean signal.
9
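Proposed noise-estimation algorithm: Computing
frequency-dependent smoothing constants
Using the speech-presence probability estimate,
we compute the time-frequency dependent smoothing
factor as follows
$\alpha_s(\lambda,k) = \alpha_d + (1-\alpha_d)\, p(\lambda,k)$
where $\alpha_d$ is a constant
Note that $\alpha_s(\lambda,k)$ takes values in the
range $\alpha_d \le \alpha_s(\lambda,k) \le 1$
Finally, the noise spectrum estimate is updated as
$D(\lambda,k) = \alpha_s(\lambda,k)\, D(\lambda-1,k) + (1-\alpha_s(\lambda,k))\, |Y(\lambda,k)|^2$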
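The per-bin noise update can be sketched as follows, assuming the smoothing factor is the linear map alpha_s = alpha_d + (1 - alpha_d) * p of the speech-presence probability p, which yields the stated range between alpha_d and 1. The function name `update_noise` and its argument names are hypothetical; 0.85 is the constant given in the parameter list later in the slides.

```python
def update_noise(D_prev, Y_power, p, alpha_d=0.85):
    """One noise-estimate update for a single time-frequency bin.

    D_prev: previous noise power estimate.
    Y_power: current noisy-speech power |Y|^2.
    p: speech-presence probability in [0, 1].
    alpha_d: constant lower bound of the smoothing factor.
    """
    # Smoothing factor lies between alpha_d (speech absent) and 1 (speech present).
    alpha_s = alpha_d + (1.0 - alpha_d) * p
    # With alpha_s close to 1 the estimate is held; otherwise it moves
    # towards the current noisy power.
    return alpha_s * D_prev + (1.0 - alpha_s) * Y_power
```

When p = 1 the estimate is frozen at its previous value, so speech energy does not leak into the noise estimate; when p = 0 the estimate moves towards the current frame at the fastest allowed rate.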
10
Proposed noise-estimation algorithm
Plot of the true noise spectrum and the noise
spectrum estimated using the proposed method for
speech degraded by babble noise at 5 dB SNR, at a
single frequency f = 250 Hz.
11
Comparison with existing algorithms: Minimum
statistics (MS) (Martin, 2001)

the power spectral densities of the noise signal
Equivalent degrees of freedom
12
Comparison with existing algorithms: Minimum
statistics (MS) (Martin, 2001)
Comparison between the noise spectrum estimated
using the proposed algorithm (thick line) and
Martin's (2001) algorithm (dashed line) for a
sentence corrupted by car noise (t < 1.8 s)
followed by a sentence corrupted by
multi-talker babble (t > 1.8 s).
13
Comparison with existing algorithms: Continuous
minima tracking (Doblinger, 1995)

Drawback: the noise estimate increases whenever
the noisy speech power increases
14
Comparison with existing algorithms: Continuous
minima tracking (Doblinger, 1995)
Top panel: plot of the true noise spectrum and the
noise spectrum estimated using the proposed
method. Bottom panel: plot of the true noise
spectrum and the noise spectrum estimated using
(Doblinger, 1995). Arrows indicate regions where
noise is overestimated.
15
Comparison with existing algorithms: Weighted
average technique (Hirsch and Ehrlicher, 1995)

The estimated noise magnitude in the i-th subband
is updated from the spectral magnitude of the l-th
frame only when that magnitude falls below a
threshold derived from the past noise estimate
It fails when there is a sudden increase in
noise level. The noisy speech spectrum will then
never be smaller than the threshold, since the
threshold is based on past noise estimates that
are very low. Thus, the noise estimate will not
be updated as long as the noise power remains at
that high level
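The failure mode described on this slide can be reproduced with a small sketch. It assumes the weighted-average scheme takes the form of a first-order recursive average that is applied only when the spectral magnitude is below a multiple of the previous noise estimate; the function name `weighted_average_noise` and the values `alpha=0.9` and `beta=2.0` are illustrative assumptions, not settings taken from Hirsch and Ehrlicher (1995).

```python
def weighted_average_noise(magnitudes, alpha=0.9, beta=2.0):
    """Threshold-gated recursive noise estimate for one subband.

    magnitudes: spectral magnitude per frame.
    alpha: weight of the previous estimate (illustrative value).
    beta: threshold factor applied to the previous estimate (illustrative).
    """
    if not magnitudes:
        return []
    noise = magnitudes[0]  # assumption: the first frame contains noise only
    estimates = []
    for x in magnitudes:
        if x < beta * noise:
            # Magnitude is below the threshold: treat the frame as noise.
            noise = alpha * noise + (1.0 - alpha) * x
        # Otherwise the frame is treated as speech and the estimate is held.
        estimates.append(noise)
    return estimates


# Noise level jumps from 1 to 10 and stays there: the estimate never follows.
stuck = weighted_average_noise([1.0] * 5 + [10.0] * 5)
```

In this example every frame after the jump exceeds the threshold of about 2, so the estimate stays near 1 for as long as the noise remains at the higher level.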
16
Comparison with existing algorithms: Weighted
average technique (Hirsch and Ehrlicher, 1995)
Comparison of the estimated noise spectrum
(f = 500 Hz) of the proposed method (dashed line)
with that of Hirsch and Ehrlicher (1995) (solid
line) for noisy speech at 20 dB SNR (t < 1.8 s)
followed by noisy speech at 5 dB SNR (t > 1.8 s).
17
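Comparison with existing algorithms: Minima
controlled recursive averaging (MCRA)
(Cohen, 2002)

Given two hypotheses for the l-th frame and the
k-th subband
$H_0(k,l): Y(k,l) = D(k,l)$ (speech absence)
$H_1(k,l): Y(k,l) = X(k,l) + D(k,l)$ (speech presence)
Let $\lambda_d(k,l) = E[|D(k,l)|^2]$ denote the
variance of the noise in the k-th band
Under speech absence:
$\hat\lambda_d(k,l+1) = \alpha_d \hat\lambda_d(k,l) + (1-\alpha_d)|Y(k,l)|^2$
Under speech presence:
$\hat\lambda_d(k,l+1) = \hat\lambda_d(k,l)$
where $\alpha_d$ is a smoothing constant
Let $p(k,l) = P(H_1(k,l) \mid Y(k,l))$ denote the
conditional signal presence probability. Then
$\hat\lambda_d(k,l+1) = \tilde\alpha_d(k,l)\, \hat\lambda_d(k,l) + (1-\tilde\alpha_d(k,l))\, |Y(k,l)|^2$
where
$\tilde\alpha_d(k,l) = \alpha_d + (1-\alpha_d)\, p(k,l)$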
18
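Comparison with existing algorithms: Minima
controlled recursive averaging (Cohen, 2002)
Let the local energy of the noisy speech be
obtained by smoothing the magnitude squared of
its STFT in time and frequency. In frequency,
use a window function b whose length is 2w+1
$S_f(k,l) = \sum_{i=-w}^{w} b(i)\, |Y(k-i,l)|^2$
In time, the smoothing is performed by
first-order recursive averaging, given by
$S(k,l) = \alpha_s S(k,l-1) + (1-\alpha_s)\, S_f(k,l)$
Track the minimum of the local energy,
$S_{\min}(k,l)$
Speech presence is determined by the ratio
between the local energy of the noisy speech and
its minimum within a specified time window
$S_r(k,l) = S(k,l) / S_{\min}(k,l)$
The conditional signal presence probability is
calculated as follows
$\hat p(k,l) = \alpha_p\, \hat p(k,l-1) + (1-\alpha_p)\, I(k,l)$
where $I(k,l)$ is 1 when $S_r(k,l)$ exceeds a
threshold and 0 otherwise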
19
Comparison with existing algorithms: Minima
controlled recursive averaging (Cohen, 2002)

The local minimum in (Cohen, 2002) was found by
tracking the minimum of noisy speech over a
search window spanning L frames. This has some
drawbacks
The minimum is sensitive to outliers
The minima tracking may lag by as many as 2L
frames
In this paper
The estimate of the noise spectrum in the
proposed method is not influenced by the
minimum-search window
The threshold used in the proposed method for
identifying speech presence/absence regions is
frequency dependent, while that of Cohen (2002)
is fixed for all frequencies

20
Experimental setup

The proposed noise estimator was combined with a
Wiener-type speech-enhancement algorithm (Hu and
Loizou, 2004)
Estimate the spectral gain function

where $C(\lambda,k)$ is the estimated clean speech
spectrum, whose computation uses a small positive
number $\nu = 0.001$
$\mu_{\max}$ is the maximum allowable value of
$\mu$, which was set to 10
$\mu_0 = (1 + 4\mu_{\max})/5$ and
$s = 25/(\mu_{\max} - 1)$
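The two constants on this slide, (1 + 4 * mu_max) / 5 and 25 / (mu_max - 1), are consistent with a factor mu that falls linearly as the SNR rises. The sketch below makes that reading explicit. It is an assumption, not something stated on the slide: mu is taken to equal mu_max at or below -5 dB, 1 at or above 20 dB, and mu0 - SNR / s in between, which is the only linear rule with these constants that is continuous at both ends. The function name `oversubtraction_factor` is hypothetical.

```python
def oversubtraction_factor(snr_db, mu_max=10.0):
    """Map an SNR estimate (dB) to the factor mu used in the gain function.

    Assumed breakpoints: mu = mu_max for SNR <= -5 dB, mu = 1 for
    SNR >= 20 dB, and a straight line in between.
    """
    mu0 = (1.0 + 4.0 * mu_max) / 5.0
    s = 25.0 / (mu_max - 1.0)
    if snr_db >= 20.0:
        return 1.0
    if snr_db <= -5.0:
        return mu_max
    # Linear interpolation; equals mu_max at -5 dB and 1 at 20 dB.
    return mu0 - snr_db / s
```

With mu_max = 10 this gives mu0 = 8.2 at 0 dB, and the line meets the two clamped values exactly at -5 dB and 20 dB.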
21
Experimental setup (cont.)

Obtain the enhanced spectrum $X(\lambda,k)$
Other parameters
$\alpha_d = 0.85$, $\alpha_p = 0.2$, $\beta = 0.8$,
$\gamma = 0.998$, $\eta = 0.7$
The frequency-dependent threshold is defined in
terms of LF and MF, the bins corresponding to 1
and 3 kHz, and $F_s$, the sampling frequency
22
Experimental results: Subjective evaluation

Using formal listening tests
Single noise
Sentences were degraded by either multi-talker
babble noise or factory noise
Triplet noise
Three different noise types (multi-talker
babble, factory noise, and white noise) appear in
sequence without any pauses in between
The listeners were asked to select, from each
pair of stimuli presented, the sentence which was
more natural, easier to listen to, and free of
artifacts
A preference score of 100 would indicate that
listeners preferred the proposed method over the
other methods all the time

23
Experimental results: Subjective evaluation
The results are attributed to the fact that the
proposed noise-estimation algorithm adapts
quickly to highly non-stationary environments
24
Experimental results: Objective evaluation

Mean squared error (MSE) between the true noise
power spectrum and the estimated noise power
spectrum, computed over the total number of
frames
Log-likelihood ratio (LLR) measure, computed from
the linear prediction coefficient vector of the
enhanced speech frame, the linear prediction
coefficient vector of the original (clean) speech
frame, and the autocorrelation matrix of the
original (clean) speech frame
The LLR is a spectral distance measure which
mainly models the mismatch between the formants
of the original and enhanced signals
25
Experimental results: Objective evaluation

Segmental SNR, computed over the set of frames
that contain speech
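A common way to compute segmental SNR is to average the per-frame SNR in dB over the frames that contain speech. The sketch below follows that common definition, which is assumed here rather than taken from the slide. The function name `segmental_snr`, the non-overlapping framing, and the small `eps` guard against a zero error energy are illustrative choices.

```python
import math


def segmental_snr(clean, enhanced, frame_len, speech_frames, eps=1e-12):
    """Average per-frame SNR (dB) over the frames that contain speech.

    clean, enhanced: time-domain samples of equal length.
    frame_len: number of samples per (non-overlapping) frame.
    speech_frames: indices of the frames that contain speech.
    """
    total = 0.0
    for m in speech_frames:
        start = m * frame_len
        x = clean[start:start + frame_len]
        x_hat = enhanced[start:start + frame_len]
        signal_energy = sum(v * v for v in x)
        error_energy = sum((a - b) ** 2 for a, b in zip(x, x_hat))
        # eps keeps the ratio finite when the two frames are identical.
        total += 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))
    return total / len(speech_frames)
```

Restricting the average to speech frames keeps silent segments, where the per-frame SNR is poorly defined, from dominating the score.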
26
Experimental results: Objective evaluation (MSE)
The MSE results are not consistent with the
preference outcomes, in that lower MSE values did
not correspond to higher preference. This
indicates that the MSE measure might not be a
reliable measure for assessing the performance of
noise-estimation algorithms
1. This measure is sensitive to outlier values
2. It treats noise overestimation and noise
underestimation errors the same
27
Experimental results: Objective evaluation (LLR
and segmental SNR)
The segmental SNR values and the LLR values shown
in Table 3 were found to be more consistent with
the subjective evaluation results
28
Experimental results
Panel A: clean speech. Panel B: noisy speech.
Panel C: Martin (2001). Panel D: Cohen (2003).
Panel E: proposed method.
29
Conclusions

The noise estimate was updated continuously in
every frame using time-frequency smoothing
factors calculated from the speech-presence
probability in each frequency bin of the noisy
speech spectrum
The speech-presence probability was estimated
using the ratio of the noisy speech power
spectrum to its local minimum
The update of the noise estimate was faster in
very rapidly varying non-stationary noise
environments