Title: Modeling of Mel Frequency Features for Non Stationary Noise
1 Modeling of Mel Frequency Features for Non
Stationary Noise
I. Andrianakis, P. R. White
Signal Processing and Control Group, Institute of
Sound and Vibration Research, University of
Southampton
2 Outline
- Introduction.
- Mel Frequency Log Spectrum and Cepstrum.
- Distribution of the MFLS and MFC coefficients.
- Physical interpretation of the distributions.
- Modeling of data with Gaussian Mixture Models and
the EM algorithm.
- Results.
- Summary and further work.
3 Introduction
- When working with speech or noise, one often
wishes to extract some salient features of the
signals, so that instead of working with the
whole data set one can concentrate on a smaller
set that conveys the most significant information.
- Such features are the Mel Frequency Log Spectral
Coefficients (MFLSCs) and the Mel Frequency
Cepstral Coefficients (MFCCs).
- Their favourable property is that they focus
mostly on low frequency components, where most of
the car or train noise energy exists, while
compacting the higher frequencies, which usually
carry less energy.
- We shall present some results from our research
on the application of MFLSCs and MFCCs to noise
signals and their modelling with Gaussian Mixture
Models.
4 Mel Frequency Log Spectrum and Cepstrum
[Block diagram: Noise -> STFT -> |.|^2 -> Mel Frequency Filter Banks -> Log( . ) -> Mel Frequency Log Spectrum -> DCT( . ) -> Mel Frequency Cepstrum]
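The processing chain on this slide can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the analytic mel formula, the triangular filter shape, the Hann window, the frame length, the hop size and the sampling rate are all assumptions, and the input is synthetic white noise rather than recorded car or train noise.

```python
import numpy as np

def hz_to_mel(f):
    # Common analytic mel approximation (an assumption; the slides give no formula).
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, fs):
    """Triangular filters whose centres are equally spaced on the mel scale."""
    edges_hz = mel_to_hz(np.linspace(0.0, hz_to_mel(fs / 2.0), n_filters + 2))
    bin_freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    bank = np.zeros((n_filters, bin_freqs.size))
    for i in range(n_filters):
        lo, mid, hi = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        rising = (bin_freqs - lo) / (mid - lo)
        falling = (hi - bin_freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def dct2_matrix(n):
    """Orthonormal DCT-II matrix C, so that C @ x is the DCT of x."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (m + 0.5) * k / n)
    c[0] /= np.sqrt(2.0)
    return c

def mel_features(x, fs, n_fft=512, hop=256, n_filters=20):
    """Return (log_spectrum, cepstrum), each of shape (n_frames, n_filters)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2        # STFT, then |.|^2
    mel_energy = power @ mel_filterbank(n_filters, n_fft, fs).T
    log_spec = np.log(mel_energy + 1e-12)                    # Log(.), small floor avoids log(0)
    cepstrum = log_spec @ dct2_matrix(n_filters).T           # DCT(.)
    return log_spec, cepstrum

rng = np.random.default_rng(0)
fs = 8000                                    # hypothetical sampling rate
noise = rng.standard_normal(2 * fs)          # 2 s of synthetic white noise
mfls, mfcc = mel_features(noise, fs)
print(mfls.shape, mfcc.shape)
```

Each row of `mfls` is one frame of the Mel Frequency Log Spectrum and each row of `mfcc` the corresponding Mel Frequency Cepstrum; the columns are the coefficients whose distributions are examined later.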
5 Rationale Behind the Use of Mel Frequency
Features
Mel frequency warping focuses on low
frequencies (< 1 kHz), where the filter bank
spacing is linear. Energy above 1 kHz is
compacted, as the filters have logarithmically
increasing pass bands. This makes the features
suitable for representing ambient noise (e.g. in
cars and trains), because its energy is
concentrated in the lower frequencies.
6 Rationale Behind the Use of Mel Frequency
Features (II)
Filter banks are closely spaced where the
signal's energy is higher.
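To make the filter spacing concrete, the sketch below places filter centres uniformly on the mel scale and reports where they fall in Hz. It assumes the widely used analytic approximation mel = 2595 log10(1 + f/700), which is only approximately linear below 1 kHz; the slides do not state which mel formula was used, and the number of filters and the 4 kHz upper limit are illustrative choices.

```python
import math

def hz_to_mel(f_hz):
    # A common analytic mel approximation; assumed here, not taken from the slides.
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def filter_centres(n_filters, f_max_hz):
    """Centre frequencies (Hz) of n_filters spaced uniformly on the mel scale."""
    step = hz_to_mel(f_max_hz) / (n_filters + 1)
    return [mel_to_hz(step * (i + 1)) for i in range(n_filters)]

centres = filter_centres(n_filters=20, f_max_hz=4000.0)
below_1k = sum(1 for c in centres if c < 1000.0)
gaps = [b - a for a, b in zip(centres, centres[1:])]

print(f"{below_1k} of {len(centres)} centres lie below 1 kHz")
print(f"narrowest gap {gaps[0]:.0f} Hz, widest gap {gaps[-1]:.0f} Hz")
```

The gaps between neighbouring centres grow with frequency, so a disproportionate share of the filters covers the low band where car and train noise carries most of its energy.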
7 Comparison With LPC
[Figure: PSD against frequency (Hz) for train and car noise; curves labelled "13 LPC Spectrum" and "20 Mel Spectrum".]
8 Distribution of the Mel Frequency Coefficients
- We are concerned with the form of the probability
distribution of the Mel Frequency
features, that is, the Mel Log Spectrum and the
Mel Cepstrum.
- In the following, we shall present the
distribution of MF Log Spectrum Coefficients and
MF Cepstral Coefficients for various types of
signals.
- We shall also try to give a physical explanation
for the form of the distribution in each case.
9 Stationary Noise
- This is a segment of car noise and its respective
spectrogram.
- The signal looks fairly stationary in its mean
and variance, while the spectrogram shows that
its frequency components do not vary with time
either.
We shall now proceed to examine the distribution
of its Mel Frequency Features.
10 Mel Log Spectrum
Below we can see the evolution with time of the
previous signal's Mel Log Spectrum, the kurtosis
of its coefficients and some characteristic
distributions.
The coefficients follow an almost Gaussian
distribution.
[Figure: distributions of coefficients 1, 5, 16 and 20.]
11 Mel Cepstrum
This is the evolution with time of the Mel
Cepstrum, the kurtosis of its coefficients and
some selected distributions.
The coefficients are again almost Gaussian. The
high kurtosis of coefficients 1 and 2 is due to a
few outliers.
[Figure: distributions of coefficients 1, 2, 12 and 15.]
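The kurtosis used on these slides as a measure of Gaussianity can be computed per coefficient as sketched below. The sketch uses the excess-kurtosis convention (a Gaussian scores about 0), which is an assumption, since the slides do not say which convention their plots use. The three columns are synthetic stand-ins for feature trajectories: one Gaussian, one long-tailed (Laplacian), and one Gaussian with a handful of hypothetical outliers inserted.

```python
import random

def excess_kurtosis(values):
    """Fourth standardised moment minus 3, so a Gaussian scores about 0."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 * m2) - 3.0

def kurtosis_per_coefficient(frames):
    """frames: list of feature vectors (one per time frame), all the same length."""
    return [excess_kurtosis([f[k] for f in frames]) for k in range(len(frames[0]))]

rng = random.Random(0)
n_frames = 20000
frames = []
for t in range(n_frames):
    gaussian = rng.gauss(0.0, 1.0)
    laplacian = rng.expovariate(1.0) - rng.expovariate(1.0)   # long-tailed
    with_outliers = rng.gauss(0.0, 1.0)
    if t % 4000 == 0:                                         # 5 outliers in 20000 frames
        with_outliers = 15.0
    frames.append([gaussian, laplacian, with_outliers])

k_gauss, k_laplace, k_outliers = kurtosis_per_coefficient(frames)
print(f"Gaussian:          {k_gauss:+.2f}")
print(f"long-tailed:       {k_laplace:+.2f}")
print(f"Gaussian+outliers: {k_outliers:+.2f}")
```

Because the fourth moment weights large deviations so heavily, a few extreme frames are enough to raise the kurtosis of an otherwise Gaussian coefficient, which is consistent with the remark about coefficients 1 and 2.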
12 Non-Stationary Noise
We shall now proceed to examine how the
distributions vary in the case of non-stationary
noise. This is a segment of train noise, in which
a number of amplitude fluctuations occur due to
events such as rail changes and other trains
passing by.
13 Mel Log Spectrum
The Mel Log Spectrum now varies with time,
reflecting the different sound events. The
kurtosis also increases for the higher
coefficients.
The first few coefficients are close to Gaussian,
but the higher ones develop longer tails.
[Figure: distributions of coefficients 1, 7, 11 and 19.]
14 Mel Cepstrum
The sound events are now reflected in the first
few Cepstrum coefficients.
Unlike the Log Spectrum, the first coefficients
now have longer tails, while the higher ones tend
to Gaussian.
[Figure: distributions of coefficients 1, 2, 4 and 11.]
15 Log Spectrum Distribution - Physical
Interpretation
The lower Mel Log Spectrum coefficients represent
the lower frequencies of the spectrum, where
noise energy is always present. Thus, they assume
consistently high values with few fluctuations,
which keeps their distributions close to Gaussian.
Higher coefficients assume high values only
temporarily, due to non-stationary events. This
results in their distributions having longer
tails. When energy is present at high frequencies
for prolonged periods, they can even be bimodal.
[Figure: distributions of coefficients 1 and 19.]
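This interpretation can be illustrated with a toy simulation of one high-band log-energy coefficient: it sits at a noise floor most of the time and is raised while an event is active. The levels, the spread and the event fractions below are arbitrary illustrative values, and frames are drawn independently, which ignores the time structure of real events.

```python
import random

def excess_kurtosis(values):
    """Fourth standardised moment minus 3, so a Gaussian scores about 0."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 * m2) - 3.0

def simulate_coefficient(active_fraction, n_frames=50000, seed=0):
    """Log-energy of one high band: floor at 0, raised to 6 while an event is active."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_frames):
        level = 6.0 if rng.random() < active_fraction else 0.0
        out.append(rng.gauss(level, 1.0))
    return out

brief = excess_kurtosis(simulate_coefficient(active_fraction=0.05))
prolonged = excess_kurtosis(simulate_coefficient(active_fraction=0.40))
print(f"events in 5% of frames:  excess kurtosis {brief:+.2f}")
print(f"events in 40% of frames: excess kurtosis {prolonged:+.2f}")
```

Rare events give a single mode with a long tail and a clearly positive excess kurtosis, while prolonged events produce two comparable modes, for which the excess kurtosis turns negative.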
16 Cepstrum Distribution - Physical Interpretation
The lower Cepstrum coefficients reflect the
fluctuations of the amplitude and of the spectral
envelope. As both of these vary in non-stationary
signals, so do the lower MFCCs, resulting in
distributions with long tails. Higher
coefficients, however, convey mostly information
about harmonic components, which are not as
dominant in the more broadband noise of trains
and cars and certainly do not fluctuate fast.
[Figure: distributions of coefficients 1 and 11.]
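A small numerical check shows why amplitude and envelope changes land in the low-order cepstral coefficients. The sketch applies an orthonormal DCT-II to a hypothetical smooth log spectrum, then to the same spectrum with a constant offset (an overall amplitude change) and with an added linear tilt (an envelope change). The spectra are invented for illustration, and the code uses zero-based indices, which may not match the coefficient numbering in the figures.

```python
import math

def dct2(x):
    """Orthonormal DCT-II of a sequence."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[m] * math.cos(math.pi * (m + 0.5) * k / n) for m in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

n_bands = 20
# Hypothetical log spectrum that falls with band index (more energy at low frequencies).
base = [10.0 - 0.3 * b for b in range(n_bands)]
louder = [v + 2.0 for v in base]                       # overall amplitude change
tilted = [v + 0.1 * b for b, v in enumerate(base)]     # spectral envelope (tilt) change

c_base, c_louder, c_tilted = dct2(base), dct2(louder), dct2(tilted)
d_louder = [abs(a - b) for a, b in zip(c_louder, c_base)]
d_tilted = [abs(a - b) for a, b in zip(c_tilted, c_base)]

print("amplitude change:", [round(d, 3) for d in d_louder[:4]])
print("tilt change:     ", [round(d, 3) for d in d_tilted[:4]])
```

The amplitude change moves only coefficient 0, and the tilt moves mainly coefficients 0 and 1, so events that alter level or envelope leave the higher coefficients almost untouched.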
17 Modelling the Data
The previous analysis showed that the
distribution of Mel Log Spectrum and Mel Cepstrum
coefficients deviates from the normal, especially
in the case of non-stationary noise, which is of
most interest. To model the coefficients
successfully we used Gaussian Mixture Models,
which are capable of approximating irregularly
shaped distributions. An algorithm that allows us
to fit mixtures of Gaussians to our data is the
Expectation Maximization (EM) algorithm.
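As a minimal sketch of why a mixture can follow a shape that a single Gaussian cannot, the code below evaluates a one-dimensional two-component mixture density. The weights, means and standard deviations are hypothetical values chosen to give a long right tail; they are not fitted to any data from this work.

```python
import math

def gaussian_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def gmm_pdf(x, weights, means, stds):
    """Density of a 1-D Gaussian mixture; the weights must sum to 1."""
    return sum(w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, stds))

# Hypothetical parameters: a dominant narrow component plus a weak, wide one
# that supplies a long right tail.
weights, means, stds = [0.9, 0.1], [0.0, 3.0], [1.0, 2.5]

for x in (0.0, 4.0, 8.0):
    single = gaussian_pdf(x, 0.0, 1.0)
    mixture = gmm_pdf(x, weights, means, stds)
    print(f"x={x:4.1f}  single Gaussian {single:.2e}  mixture {mixture:.2e}")
```

Near the mode the two densities are similar, but far into the tail the mixture keeps appreciable probability mass where the single Gaussian has essentially none.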
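18 The Expectation Maximization Algorithm for
Gaussian Mixture Models
We assume the probabilistic model
$p(x \mid \Theta) = \sum_{i=1}^{M} \alpha_i \, p_i(x \mid \theta_i)$,
where $\Theta = (\alpha_1, \dots, \alpha_M, \theta_1, \dots, \theta_M)$ and $\sum_{i=1}^{M} \alpha_i = 1$.
We assume a latent random variable $y$ that
determines which component distribution each $x$
comes from. We then find the expected value of
the log likelihood with respect to $y$, given $x$
and an initial guess of the parameters
$\Theta^{g}$. That is
$Q(\Theta, \Theta^{g}) = E\left[ \log p(x, y \mid \Theta) \mid x, \Theta^{g} \right]$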
19 The Expectation Maximization Algorithm for
Gaussian Mixture Models (II)
This was the Expectation step. In the
Maximization step we maximize the expected value
$Q(\Theta, \Theta^{g})$ of the log likelihood
with respect to the parameters $\Theta$, i.e.
$\Theta^{new} = \arg\max_{\Theta} Q(\Theta, \Theta^{g})$.
The two steps are repeated until convergence. For
an excellent tutorial on EM see J. Bilmes, "A
Gentle Tutorial of the EM Algorithm and its
Application to Parameter Estimation for Gaussian
Mixture and Hidden Markov Models".
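The two steps can be sketched for a one-dimensional Gaussian mixture as follows. This is a minimal illustration under stated assumptions, not the implementation used for the results: the quantile-based initialisation, the fixed number of iterations, the variance floor and the synthetic bimodal data are all choices made for the example.

```python
import math
import random

def gaussian_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def fit_gmm_em(data, n_components, n_iter=100):
    """Fit a 1-D Gaussian mixture with EM.

    Returns (weights, means, variances, log_lik), where log_lik is the
    log-likelihood evaluated in the last E-step.
    """
    n = len(data)
    ordered = sorted(data)
    mean_all = sum(data) / n
    var_all = sum((x - mean_all) ** 2 for x in data) / n
    # Initial guess: equal weights, means at evenly spaced quantiles, pooled variance.
    weights = [1.0 / n_components] * n_components
    means = [ordered[(2 * k + 1) * n // (2 * n_components)] for k in range(n_components)]
    variances = [var_all] * n_components
    log_lik = float("-inf")
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each sample.
        resp = []
        log_lik = 0.0
        for x in data:
            joint = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(joint) + 1e-300          # guard against log(0)
            log_lik += math.log(total)
            resp.append([j / total for j in joint])
        # M-step: closed-form updates that maximise the expected log-likelihood.
        for k in range(n_components):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var_k, 1e-6)      # floor to avoid a collapsing component
    return weights, means, variances, log_lik

rng = random.Random(42)
# Synthetic bimodal data standing in for a non-Gaussian coefficient.
data = [rng.gauss(0.0, 1.0) for _ in range(700)] + [rng.gauss(5.0, 1.5) for _ in range(300)]

results = {k: fit_gmm_em(data, k) for k in (1, 2, 3)}
for k, (w, mu, var, ll) in results.items():
    print(f"{k} component(s): log-likelihood {ll:.1f}")
```

On this synthetic data the step from one to two components raises the log-likelihood substantially, because a single Gaussian cannot represent the second mode.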
20 Fitting GMM to the Data
Here we present some results of fitting GMMs to
various distributions.
[Figure panels: Single Gaussian, Two Gaussians, Three Gaussians.]
21 Summary
- Today we have discussed:
- The distribution of the Mel Frequency Log
Spectral and Cepstral Coefficients.
- The form this assumes in the presence of
non-stationary noise, together with a physical
explanation.
- How it can be modeled with Gaussian Mixture
Models via the EM algorithm.
- Finally, we showed some results of fitting GMMs
to our data.
22 Further Work
Examine the distribution of Mel Frequency
features for noisy speech and see how these are
altered by the presence of different noise types.
Construct Optimal Estimators for clean speech Mel
features, given the noisy ones and the noise
models.
Use HMMs with Gaussian Mixture Models for
accommodating the different noise states.