Title: A noise-estimation algorithm for highly non-stationary environments
1. A noise-estimation algorithm for highly non-stationary environments
- Sundarrajan Rangachari, Philipos C. Loizou
- Department of Electrical Engineering, University of Texas at Dallas, P.O. Box 830688, EC 33, Richardson, TX 75083-0688, USA
- Presenter: Shih-Hsiang
- Speech Communication, Vol. 48(2), 2006
2. References
- Doblinger, G., 1995. Computationally efficient speech enhancement by spectral minima tracking in subbands. Proc. Eurospeech 2, 1513-1516.
- Hirsch, H., Ehrlicher, C., 1995. Noise estimation techniques for robust speech recognition. Proc. IEEE Internat. Conf. on Acoust. Speech Signal Process., 153-156.
- Martin, R., 2001. Noise power spectral density estimation based on optimal smoothing and minimum statistics. IEEE Trans. Speech Audio Process. 9 (5), 504-512.
- Cohen, I., 2002. Noise estimation by minima controlled recursive averaging for robust speech enhancement. IEEE Signal Process. Lett. 9 (1), 12-15.
- Hu, Y., Loizou, P., 2004. Speech enhancement based on wavelet thresholding the multitaper spectrum. IEEE Trans. Speech Audio Process. 12 (1), 59-67.
3. Introduction
- Most speech-enhancement algorithms assume that an estimate of the noise spectrum is available
- This estimate is critical for the performance of speech-enhancement algorithms
- The noise estimate can have a major impact on the quality of the enhanced signal
- If the noise estimate is too low, annoying residual noise will be audible
- If the noise estimate is too high, speech will be distorted
- The simplest approach is to estimate and update the noise spectrum during the silent segments of the signal
- This requires a voice activity detection (VAD) algorithm
- It works satisfactorily only in stationary noise, and does not work well in more realistic environments (non-stationary noise)
- Hence there is a need to update the noise spectrum continuously over time
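4. Proposed noise-estimation algorithms: Compute smoothed speech power spectrum
Let the noisy speech signal in the time domain be denoted as
  y(n) = x(n) + d(n)
where y(n) is the noisy speech, x(n) is the clean speech, and d(n) is the additive noise.
The smoothed power spectrum of noisy speech is computed using the following first-order recursive equation
  P(λ,k) = η P(λ-1,k) + (1-η) |Y(λ,k)|^2
where P(λ,k) is the smoothed power spectrum, |Y(λ,k)|^2 is the short-time power spectrum of the noisy speech, η is the smoothing constant, λ is the frame index, and k is the frequency index.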
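The first-order recursion on this slide can be sketched as below. This is a minimal illustration, not the authors' code: it assumes the recursion has the form P(λ,k) = η P(λ-1,k) + (1-η) |Y(λ,k)|^2, that the smoothing constant is the 0.7 listed among the slides' parameters, and that the recursion is initialised with the first frame. The function name and the list-of-lists layout are hypothetical.

```python
def smooth_power_spectrum(frames, eta=0.7):
    """Recursively smooth per-frame power spectra.

    frames: list of frames, each a list of |Y(lambda, k)|^2 values over k.
    eta:    smoothing constant (assumed value, see lead-in).
    """
    smoothed = []
    prev = None
    for frame in frames:
        if prev is None:
            # Assumption: initialise with the first frame's power spectrum.
            current = list(frame)
        else:
            # P(lambda,k) = eta * P(lambda-1,k) + (1 - eta) * |Y(lambda,k)|^2
            current = [eta * p + (1.0 - eta) * y for p, y in zip(prev, frame)]
        smoothed.append(current)
        prev = current
    return smoothed
```

A larger η gives a smoother but slower-reacting spectrum; a smaller η follows the instantaneous power more closely.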
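5. Proposed noise-estimation algorithms: Tracking the minimum of noisy speech
The local minimum of the noisy speech power spectrum, P_min(λ,k), is tracked as
  if P_min(λ-1,k) < P(λ,k):
    P_min(λ,k) = γ P_min(λ-1,k) + ((1-γ)/(1-β)) (P(λ,k) - β P(λ-1,k))
  else:
    P_min(λ,k) = P(λ,k)        (3)
β and γ are constants which are determined experimentally. The look-ahead factor β controls the adaptation time of the local minimum.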
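A minimal sketch of this kind of continuous minimum tracking for a single frequency bin follows. It assumes the non-linear rule in which the minimum drops immediately to the current power when the power falls below it, and otherwise rises slowly at a rate set by γ and the look-ahead factor β. The defaults 0.8 and 0.998 are taken from the parameter list given later in the slides; the function name and the initialisation are assumptions.

```python
def track_local_minimum(P, beta=0.8, gamma=0.998):
    """Track the local minimum of a smoothed power sequence P (one bin)."""
    p_min = [P[0]]  # assumption: start the minimum at the first value
    for lam in range(1, len(P)):
        if p_min[-1] < P[lam]:
            # Power is above the minimum: let the minimum creep upwards.
            nxt = (gamma * p_min[-1]
                   + ((1.0 - gamma) / (1.0 - beta)) * (P[lam] - beta * P[lam - 1]))
        else:
            # Power fell to or below the minimum: follow it immediately.
            nxt = P[lam]
        p_min.append(nxt)
    return p_min
```

Because no fixed search window is involved, the minimum is available in every frame without waiting for a window to fill.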
6. Proposed noise-estimation algorithms: Tracking the minimum of noisy speech
Figure: noisy speech power spectrum and local minimum obtained using (3), for speech degraded by babble noise at 5 dB SNR, at frequency bin k = 5
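7. Proposed noise-estimation algorithms: Speech-presence probability
Let the ratio of noisy speech power spectrum and its local minimum be defined as
  S_r(λ,k) = P(λ,k) / P_min(λ,k)
The power spectrum of noisy speech will be nearly equal to its local minimum when speech is absent, so a small ratio indicates speech absence. The ratio is compared against a frequency-dependent threshold δ(k): I(λ,k) = 1 (speech present) if S_r(λ,k) > δ(k), otherwise I(λ,k) = 0 (speech absent).
The speech-presence probability, p(λ,k), is updated using the following first-order recursion
  p(λ,k) = α_p p(λ-1,k) + (1-α_p) I(λ,k)
where α_p is a smoothing constant.
The above recursion implicitly exploits the correlation of speech presence in adjacent frames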
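The ratio test and the probability recursion can be sketched for one frequency bin as follows. This is an illustration under stated assumptions: the ratio is thresholded into a 0/1 speech indicator, the threshold `delta` is a hypothetical placeholder value, the smoothing constant defaults to the 0.2 listed in the slides' parameters, and the probability starts at zero. Local minima are assumed to be strictly positive.

```python
def speech_presence_probability(P, P_min, delta=2.0, alpha_p=0.2):
    """Recursive speech-presence probability for one frequency bin.

    P, P_min: smoothed power and its local minimum, one value per frame.
    delta:    decision threshold on the ratio (hypothetical value).
    """
    probabilities = []
    prev = 0.0  # assumption: speech absent before the first frame
    for power, minimum in zip(P, P_min):
        ratio = power / minimum
        indicator = 1.0 if ratio > delta else 0.0
        # Smooth the hard decision over time.
        prev = alpha_p * prev + (1.0 - alpha_p) * indicator
        probabilities.append(prev)
    return probabilities
```

With a small smoothing constant the probability follows the hard decision quickly while still carrying some memory of the previous frame.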
8. Proposed noise-estimation algorithms: Speech-presence probability
Figure: top panel, estimated speech-presence probability based on the ratio S_r(λ,k); bottom panel, spectrogram of the clean signal.
9. Proposed noise-estimation algorithms: Computing frequency-dependent smoothing constants
Using the speech-presence probability estimate p(λ,k), we compute the time-frequency dependent smoothing factor as follows
  α_s(λ,k) = α_d + (1-α_d) p(λ,k)
where α_d is a constant.
Note that α_s(λ,k) takes values in the range α_d ≤ α_s(λ,k) ≤ 1
Finally, the noise spectrum estimate D(λ,k) is updated as
  D(λ,k) = α_s(λ,k) D(λ-1,k) + (1-α_s(λ,k)) |Y(λ,k)|^2
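A minimal sketch of one update step for a single time-frequency point is given below. It assumes the smoothing factor interpolates linearly between a constant α_d and 1 according to the speech-presence probability, and that the noise estimate is a recursive average of the noisy power with that factor. The default 0.85 comes from the slides' parameter list; the function name is hypothetical.

```python
def update_noise_estimate(D_prev, Y_power, p, alpha_d=0.85):
    """One noise-update step for a single (frame, bin) pair.

    D_prev:  previous noise power estimate.
    Y_power: current noisy-speech power |Y|^2.
    p:       speech-presence probability in [0, 1].
    """
    # Smoothing factor lies between alpha_d (speech absent) and 1 (speech present).
    alpha_s = alpha_d + (1.0 - alpha_d) * p
    D_new = alpha_s * D_prev + (1.0 - alpha_s) * Y_power
    return D_new, alpha_s
```

When p is close to 1 the estimate is effectively frozen, and when p is close to 0 it moves toward the current noisy power at the rate set by α_d.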
10. Proposed noise-estimation algorithms
Figure: true noise spectrum and the noise spectrum estimated using the proposed method, for speech degraded by babble noise at 5 dB SNR, at a single frequency f = 250 Hz.
11. Comparison with existing algorithms: Minimum statistics (MS) (Martin, 2001)
Quantities in the slide's equation: the power spectral density of the noise signal and the equivalent degrees of freedom
12. Comparison with existing algorithms: Minimum statistics (MS) (Martin, 2001)
Figure: comparison between the noise spectrum estimated using the proposed algorithm (thick line) and Martin's (Martin, 2001) algorithm (dashed line) for a sentence corrupted by car noise (t < 1.8 s) followed by a sentence corrupted by multi-talker babble (t > 1.8 s).
13. Comparison with existing algorithms: Continuous minima tracking (Doblinger, 1995)
Drawback: the noise estimate increases whenever the noisy speech power increases
14. Comparison with existing algorithms: Continuous minima tracking (Doblinger, 1995)
Figure: top panel, true noise spectrum and noise spectrum estimated using the proposed method; bottom panel, true noise spectrum and noise spectrum estimated using (Doblinger, 1995). Arrows indicate regions where noise is overestimated.
15. Comparison with existing algorithms: Weighted average technique (Hirsch and Ehrlicher, 1995)
It fails when there is a sudden increase in noise level. The noisy speech spectrum will then never be smaller than the threshold, because the threshold is based on past noise estimates that are already very low. Thus, the noise estimate will not be updated as long as the noise power remains at that high level.
Quantities in the slide's update rule: the spectral magnitude and the estimated noise magnitude, each in the i-th subband of the l-th frame
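The failure mode described on this slide can be reproduced with a small sketch. It assumes a thresholded weighted-average rule in which the noise magnitude is updated only when the current spectral magnitude is below a multiple of the previous noise estimate. The slide's own equation is not available here, so the rule's exact form, the function name, and the values `alpha = 0.9` and `beta = 2.0` are all assumptions chosen for illustration.

```python
def weighted_average_noise(X, alpha=0.9, beta=2.0):
    """Thresholded weighted-average noise estimate for one subband.

    X:     spectral magnitudes, one per frame.
    alpha: averaging weight (hypothetical value).
    beta:  threshold multiplier on the previous estimate (hypothetical value).
    """
    estimates = []
    prev = X[0]  # assumption: initialise with the first frame
    for x in X:
        if x < beta * prev:
            # Frame looks like noise: fold it into the running average.
            prev = alpha * prev + (1.0 - alpha) * x
        # Otherwise the frame is treated as speech and the estimate is kept.
        estimates.append(prev)
    return estimates


# Noise level jumps from 1.0 to 10.0 and stays there.
stuck = weighted_average_noise([1.0] * 5 + [10.0] * 5)
```

In this example every frame after the jump exceeds the threshold, so `stuck` stays near 1.0 for as long as the noise remains at the higher level.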
16. Comparison with existing algorithms: Weighted average technique (Hirsch and Ehrlicher, 1995)
Figure: comparison of the estimated noise spectrum (f = 500 Hz) of the proposed method (dashed line) with that of Hirsch and Ehrlicher (1995) (solid line) for noisy speech at 20 dB SNR (t < 1.8 s) followed by noisy speech at 5 dB SNR (t > 1.8 s).
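17. Comparison with existing algorithms: Minima Controlled Recursive Averaging (MCRA) (Cohen, 2002)
Given two hypotheses for the k-th subband of the l-th frame
  H0(k,l): Y(k,l) = D(k,l)              (speech absence)
  H1(k,l): Y(k,l) = X(k,l) + D(k,l)     (speech presence)
Let λ_d(k,l) = E[|D(k,l)|^2] denote the variance of the noise in the k-th band
  H0(k,l): λ_d(k,l+1) = α_d λ_d(k,l) + (1-α_d) |Y(k,l)|^2    (speech absence)
  H1(k,l): λ_d(k,l+1) = λ_d(k,l)                              (speech presence)
where α_d is a smoothing constant.
Let p(k,l) = P(H1(k,l) | Y(k,l)) denote the conditional signal presence probability. Then
  λ_d(k,l+1) = α~_d(k,l) λ_d(k,l) + (1 - α~_d(k,l)) |Y(k,l)|^2
where
  α~_d(k,l) = α_d + (1-α_d) p(k,l)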
18. Comparison with existing algorithms: Minima Controlled Recursive Averaging (MCRA) (Cohen, 2002)
Let the local energy of the noisy speech be obtained by smoothing the magnitude squared of its STFT in time and frequency.
- In frequency, use a window function b whose length is 2w+1
- In time, the smoothing is performed by first-order recursive averaging
Track the minimum of the local energy.
Speech presence is determined by the ratio between the local energy of the noisy speech and its minimum within a specified time window.
The conditional signal presence probability is then calculated from this speech-presence decision.
19. Comparison with existing algorithms: Minima Controlled Recursive Averaging (MCRA) (Cohen, 2002)
- The local minimum in (Cohen, 2002) was found by tracking the minimum of noisy speech over a search window spanning L frames. This has some drawbacks
- The minimum is sensitive to outliers
- The minima tracking may lag by as many as 2L frames
- In this paper
- The estimate of the noise spectrum in the proposed method is not influenced by the minimum-search window
- The threshold used in our method for identifying speech-presence/absence regions is frequency dependent, while that of Cohen (2002) is fixed for all frequencies
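The two window-related drawbacks can be illustrated with a deliberately simplified sliding-window minimum. This is not the search procedure of Cohen (2002); it is a plain minimum over the most recent L frames, with a hypothetical function name, used only to show how a fixed window delays the reaction to a rising noise floor and how a single outlier dominates the minimum.

```python
def windowed_minimum(S, L):
    """Return, for each frame l, the minimum of S over the last L frames."""
    return [min(S[max(0, l - L + 1):l + 1]) for l in range(len(S))]


# Noise floor jumps from 1.0 to 5.0 at frame 10.
S = [1.0] * 10 + [5.0] * 10
minimum = windowed_minimum(S, L=8)
# First frame at which the tracked minimum reflects the new floor.
first_updated = next(l for l, m in enumerate(minimum) if m == 5.0)
print(first_updated)  # 17
```

In this simplified version the minimum only reflects the new floor once the whole window lies after the jump, L - 1 frames late. A single low outlier likewise holds the minimum down for the full window length.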
20. Experimental
- Combined with a Wiener-type speech-enhancement algorithm (Hu and Loizou, 2004)
- Estimate the spectral gain function, in which C(λ,k) is the estimated clean speech spectrum, computed using a small positive number v = 0.001
- µmax is the maximum allowable value of µ, which was set to 10, with µ0 = (1 + 4 µmax)/5 and s = 25/(µmax - 1)
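The constants µ0 and s are consistent with a factor µ that falls linearly with SNR (in dB) from µmax at -5 dB to 1 at 20 dB. The sketch below assumes exactly that piecewise-linear form, and a Wiener-type gain of the form C / (C + µ D), where D is the noise estimate. Neither form is spelled out in the slides, so both should be read as assumptions, and the function names are hypothetical.

```python
def control_factor(snr_db, mu_max=10.0):
    """Assumed piecewise-linear mapping from SNR (dB) to the factor mu."""
    mu0 = (1.0 + 4.0 * mu_max) / 5.0
    s = 25.0 / (mu_max - 1.0)
    if snr_db <= -5.0:
        return mu_max
    if snr_db >= 20.0:
        return 1.0
    return mu0 - snr_db / s


def wiener_type_gain(C, D, mu):
    """Assumed gain form: clean-speech estimate C, noise estimate D."""
    return C / (C + mu * D)
```

With µmax = 10 this gives µ0 = 8.2 and s = 25/9, and the mapping is continuous at both -5 dB and 20 dB.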
21. Experimental (cont.)
- Obtain the enhanced spectrum X(λ,k)
- Other parameters
- α_d = 0.85, α_p = 0.2, β = 0.8, γ = 0.998, η = 0.7
- The frequency-dependent threshold is defined over bands delimited by LF and MF, the bins corresponding to 1 and 3 kHz, and by Fs, the sampling frequency
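A small sketch of how the band edges LF and MF might be obtained, and how a band-dependent threshold could be looked up, is shown below. The sampling frequency, FFT size, helper names, and above all the three threshold values are placeholders for illustration; the slides do not give the threshold values.

```python
def frequency_to_bin(freq_hz, fs_hz, n_fft):
    """Index of the FFT bin closest to freq_hz."""
    return int(round(freq_hz * n_fft / fs_hz))


def band_threshold(k, lf_bin, mf_bin, thresholds):
    """Pick the threshold for bin k from (low, mid, high) band values."""
    low, mid, high = thresholds
    if k <= lf_bin:
        return low
    if k <= mf_bin:
        return mid
    return high


fs_hz, n_fft = 8000, 256                 # hypothetical analysis settings
LF = frequency_to_bin(1000, fs_hz, n_fft)
MF = frequency_to_bin(3000, fs_hz, n_fft)
placeholder_thresholds = (2.0, 3.0, 4.0)  # illustrative values only
```

With these settings the 1 kHz and 3 kHz edges fall on bins 32 and 96.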
22. Experimental Result: Subjective evaluation
- Using formal listening tests
- Single noise
- Sentences were degraded by either multi-talker babble noise or factory noise
- Triplet noise
- Three different noise types (multi-talker babble, factory noise, and white noise) appear in that order without any pauses in between
- The listeners were asked to select, from the pair of stimuli presented, the sentence that was more natural, easier to listen to, and free of artifacts
- A preference score of 100 would indicate that listeners preferred the proposed method over the other methods all the time
23. Experimental Result: Subjective evaluation
The subjective results are attributed to the fact that the proposed noise-estimation algorithm adapts quickly to highly non-stationary environments
24. Experimental Result: Objective evaluation
- Mean squared error (MSE) between the true noise spectrum and the estimated noise spectrum, computed from the estimated noise power spectrum and the true noise power spectrum over the total number of frames
- Log-likelihood ratio (LLR) measure, computed from the linear prediction coefficient vector of the enhanced speech frame, the linear prediction coefficient vector of the original (clean) speech frame, and the autocorrelation matrix of the original (clean) speech frame
The LLR is a spectral distance measure which mainly models the mismatch between the formants of the original and enhanced signals
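The two measures can be sketched as follows. The slide's equations are not available here, so the sketch assumes commonly used forms: a per-frame squared error normalised by the true noise energy and averaged over frames, and an LLR defined as the log of the ratio of two quadratic forms in the clean-speech autocorrelation matrix. The function names and the normalisation are assumptions.

```python
import math


def relative_mse(true_frames, est_frames):
    """Average over frames of the normalised squared error across bins."""
    total = 0.0
    for d_true, d_est in zip(true_frames, est_frames):
        err = sum((t - e) ** 2 for t, e in zip(d_true, d_est))
        ref = sum(t ** 2 for t in d_true)
        total += err / ref
    return total / len(true_frames)


def quadratic_form(a, R):
    """Compute a R a^T for a coefficient vector a and square matrix R."""
    n = len(a)
    return sum(a[i] * R[i][j] * a[j] for i in range(n) for j in range(n))


def log_likelihood_ratio(a_clean, a_enhanced, R_clean):
    """LLR between clean and enhanced LPC vectors for one frame."""
    return math.log(quadratic_form(a_enhanced, R_clean)
                    / quadratic_form(a_clean, R_clean))
```

Note that `relative_mse` returns the same value whether the estimate is too high or too low by the same amount, which is why such a measure cannot distinguish overestimation from underestimation.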
25. Experimental Result: Objective evaluation
Quantity in the slide's equation: the set of frames that contain speech
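Assuming the measure on this slide is the segmental SNR mentioned in the results, a conventional version restricted to speech frames can be sketched as below. The definition used here (per-frame SNR in dB between the clean frame and the error, averaged over the speech frames) and the function name are assumptions, and frames with zero error are not handled.

```python
import math


def segmental_snr(clean_frames, enhanced_frames, speech_frames):
    """Mean per-frame SNR (dB) over the indices listed in speech_frames."""
    total = 0.0
    for m in speech_frames:
        signal = sum(x * x for x in clean_frames[m])
        error = sum((x - y) ** 2
                    for x, y in zip(clean_frames[m], enhanced_frames[m]))
        total += 10.0 * math.log10(signal / error)
    return total / len(speech_frames)
```

Restricting the average to speech frames keeps silent frames, where the clean energy is near zero, from dominating the score.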
26. Experimental Result: Objective evaluation (MSE)
The MSE results are not consistent with the preference outcomes, in that lower MSE values did not correspond to better preference scores. This indicates that the MSE might not be a reliable measure for assessing the performance of noise-estimation algorithms.
1. This measure is sensitive to outlier values
2. It treats noise overestimation and noise underestimation errors the same
27. Experimental Result: Objective evaluation (LLR and SNR)
The segmental SNR values and the LLR values shown in Table 3 were found to be more consistent with the subjective evaluation results
28. Experimental Result
Figure: Panel A, clean speech; Panel B, noisy speech; Panel C, Martin (2001); Panel D, Cohen (2003); Panel E, proposed method
29. Conclusions
- The noise estimate was updated continuously in every frame using time-frequency smoothing factors calculated based on the speech-presence probability in each frequency bin of the noisy speech spectrum
- The speech-presence probability was estimated using the ratio of the noisy speech power spectrum to its local minimum
- The update of the noise estimate was faster for very rapidly varying non-stationary noise environments