
Title: Modeling of Mel Frequency Features for Non Stationary Noise


1
Modeling of Mel Frequency Features for Non
Stationary Noise
I. Andrianakis, P. R. White
Signal Processing and Control Group, Institute of
Sound and Vibration Research, University of
Southampton
2
Outline
  • Introduction.
  • Mel Frequency Log Spectrum and Cepstrum.
  • Distribution of the MFLS and MFC coefficients.
  • Physical Interpretation of the distributions.
  • Modeling of data with Gaussian Mixture Models and
    the EM algorithm.
  • Results.
  • Summary and further work.

3
Introduction
  • When working with speech or noise, one often
    wishes to extract some salient features of the
    signals so that, instead of working with the
    whole data set, one can concentrate on a smaller
    set that conveys the most significant information.
  • Such features are the Mel Frequency Log Spectral
    and Cepstral Coefficients.
  • Their favourable property is that they focus
    mostly on low frequency components, where most of
    the car or train noise energy exists, while
    compacting the higher frequencies, which usually
    carry less energy.
  • We shall present some results from our research
    on the application of MFLSCs and MFCCs to noise
    signals and their modelling with Gaussian Mixture
    Models.

4
Mel Frequency Log Spectrum and Cepstrum

[Block diagram: Noise -> STFT -> |.|^2 -> Mel Frequency Filter Banks -> Log( . ) -> Mel Frequency Log Spectrum -> DCT( . ) -> Mel Frequency Cepstrum]
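
The processing chain on this slide (STFT, squared magnitude, Mel filter bank, log, DCT) can be sketched for a single frame as follows. This is a minimal illustration in plain Python, not the authors' implementation: the Hann window, the 2595/700 Mel formula, the triangular filter shape, the unnormalised DCT-II, the 8 kHz sampling rate and the 256-sample frame are all assumptions, since the slides do not specify them.

```python
import cmath
import math

def hz_to_mel(f_hz):
    # One common analytic Mel approximation (assumed; the slides give no formula).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame):
    """|DFT|^2 of one Hann-windowed frame, bins 0..N/2 (naive O(N^2) DFT)."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2.0 * math.pi * i / n))
                for i, x in enumerate(frame)]
    spec = []
    for k in range(n // 2 + 1):
        s = sum(w * cmath.exp(-2j * math.pi * k * i / n)
                for i, w in enumerate(windowed))
        spec.append(abs(s) ** 2)
    return spec

def mel_filterbank(n_filters, n_bins, fs):
    """Triangular filters whose edges are equally spaced on the Mel scale."""
    top = hz_to_mel(fs / 2.0)
    edges_hz = [mel_to_hz(top * j / (n_filters + 1)) for j in range(n_filters + 2)]
    bin_hz = [k * (fs / 2.0) / (n_bins - 1) for k in range(n_bins)]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges_hz[j - 1], edges_hz[j], edges_hz[j + 1]
        row = []
        for f in bin_hz:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))   # rising slope
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def dct2(x):
    """Unnormalised DCT-II."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * k * (i + 0.5) / n) for i in range(n))
            for k in range(n)]

def mel_features(frame, fs, n_filters=20):
    """Return (Mel log spectrum, Mel cepstrum) for one frame."""
    p = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(p), fs)
    floor = 1e-12  # avoids log(0) for empty or silent bands
    mfls = [math.log(sum(w * v for w, v in zip(row, p)) + floor) for row in bank]
    mfcc = dct2(mfls)
    return mfls, mfcc

# Usage: a 200 Hz tone sampled at an assumed 8 kHz.
fs = 8000
frame = [math.sin(2.0 * math.pi * 200.0 * i / fs) for i in range(256)]
mfls, mfcc = mel_features(frame, fs)
print(len(mfls), len(mfcc))
```

Applying `mel_features` to successive overlapping frames would give the time evolution of the coefficients; a practical version would use an FFT rather than the naive DFT shown here.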
5
Rationale Behind the Use of Mel Frequency
Features

Mel frequency warping focuses on low
frequencies (< 1 kHz), where the filter bank
spacing is linear. Energy above 1 kHz is
compacted, as the filters have logarithmically
increasing pass bands. This makes it suitable for
representing ambient noise (e.g. in cars and
trains), because the energy is concentrated in the
lower frequencies.
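
The linear-then-logarithmic spacing described here can be illustrated with a small sketch. The function name and all numbers except the 1 kHz break point (10 linear and 10 logarithmic filters, a 4 kHz upper limit) are hypothetical values chosen for illustration, not the settings used in the presentation.

```python
def centre_frequencies(n_linear=10, n_log=10, f_break=1000.0, f_max=4000.0):
    """Filter centres: equal steps up to f_break, equal ratios from f_break to f_max."""
    step = f_break / n_linear
    linear = [step * (i + 1) for i in range(n_linear)]        # equal spacing in Hz
    ratio = (f_max / f_break) ** (1.0 / n_log)                # constant ratio between neighbours
    log_part = [f_break * ratio ** (i + 1) for i in range(n_log)]
    return linear + log_part

centres = centre_frequencies()
spacings = [b - a for a, b in zip(centres, centres[1:])]
for c, s in zip(centres, spacings):
    print(f"centre {c:8.1f} Hz, distance to next {s:7.1f} Hz")
```

Below the break point every filter is the same distance from its neighbour; above it the distance grows with frequency, so a wide high-frequency range is covered by comparatively few coefficients.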
6
Rationale Behind the Use of Mel Frequency
Features (II)

Filter banks are closely spaced where the
signal's energy is higher.
7
Comparison With LPC
[Figure: PSD against frequency (Hz) for train and car noise; curves: 13 LPC Spectrum and 20 Mel Spectrum]
8
Distribution of the Mel Frequency Coefficients
  • We are concerned with the form of the probability
    distribution of the Mel Frequency
    features, that is, the Mel Log Spectrum and the
    Mel Cepstrum.
  • In the following, we shall present the
    distribution of MF Log Spectrum Coefficients and
    MF Cepstral Coefficients for various types of
    signals.
  • We shall also try to give a physical explanation
    for the form of the distribution for each case.

9
Stationary Noise
  • This is a segment of car noise and its respective
    spectrogram.
  • The signal looks fairly stationary in its mean
    and variance, while the spectrogram shows that
    its frequency components do not vary with time
    either.


We shall proceed now to examine the distribution
of its Mel Frequency Features.
10
Mel Log Spectrum
Below we can see the evolution with time of the
previous signal's Mel Log Spectrum, the kurtosis
of its coefficients and some characteristic
distributions.

The coefficients follow an almost Gaussian
distribution.
[Figure: distributions of coefficients 1, 5, 16 and 20]
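
Kurtosis is the measure used on these slides to judge how close a coefficient's distribution is to Gaussian. A minimal sketch of the computation follows, assuming the plain (non-excess) definition, for which a Gaussian gives a value of about 3; the slides do not say which convention was used. Both data sets are synthetic stand-ins, not the noise recordings from the presentation.

```python
import random

def kurtosis(samples):
    """Fourth standardised moment; about 3 for Gaussian data."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((x - mean) ** 2 for x in samples) / n
    m4 = sum((x - mean) ** 4 for x in samples) / n
    return m4 / (m2 ** 2)

rng = random.Random(0)

# Stationary case: a single Gaussian.
gaussian = [rng.gauss(0.0, 1.0) for _ in range(20000)]

# Made-up non-stationary case: mostly quiet background with occasional
# large excursions, which lengthens the tails of the distribution.
bursty = [rng.gauss(0.0, 1.0) if rng.random() < 0.95 else rng.gauss(0.0, 5.0)
          for _ in range(20000)]

print("gaussian:", round(kurtosis(gaussian), 2))
print("bursty:  ", round(kurtosis(bursty), 2))
```

A value well above 3 signals longer tails than a Gaussian, which is the behaviour reported later for non-stationary noise.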
11
Mel Cepstrum
This is the evolution with time of the Mel
Cepstrum, the kurtosis of its coefficients and
some selected distributions.

The coefficients are again almost Gaussian. The
high kurtosis for coefficients 1 and 2 is due to
a few outliers.
[Figure: distributions of coefficients 1, 2, 12 and 15]
12
Non-Stationary Noise
We shall now examine how the
distributions vary in the case of non-stationary
noise. This is a segment of train noise, where
a number of amplitude fluctuations occur due to
events such as rail changes and other trains
passing by.

13
Mel Log Spectrum
The Mel Log Spectrum now varies with time,
reflecting the different sound events. The
kurtosis also increases for the higher
coefficients.

The first few coefficients are close to Gaussian, but
the higher ones develop longer tails.
[Figure: distributions of coefficients 1, 7, 11 and 19]
14
Mel Cepstrum
The sound events are now reflected in the first
few Cepstrum coefficients.

Unlike the Log Spectrum, the first coefficients
now have longer tails, while the higher ones tend
towards Gaussian.
[Figure: distributions of coefficients 1, 2, 4 and 11]
15
Log Spectrum Distribution - Physical
Interpretation

The lower Mel Log Spectrum coefficients represent the
lower frequencies of the spectrum, where there is
always noise energy present. Thus, they take
consistently high values with few fluctuations,
which keeps their distributions close to Gaussian. Higher
coefficients take high values only temporarily,
due to non-stationary events. This results in
their distributions having longer tails. When
energy is present at high frequencies for
prolonged periods, they can even be bimodal.
[Figure: distributions of coefficients 1 and 19]
16
Cepstrum Distribution - Physical Interpretation

The lower Cepstrum coefficients reflect
fluctuations in the amplitude and the spectral envelope. As
both of these vary in non-stationary signals, so
do the lower MFCCs, resulting in distributions
with long tails. Higher coefficients, however,
convey mostly information about harmonic
components, which are less dominant in the more
broadband noise of trains and cars and certainly do not
fluctuate quickly.
[Figure: distributions of coefficients 1 and 11]
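
The link between the lower cepstral coefficients and the amplitude and envelope can be seen from the DCT itself. The sketch below assumes an unnormalised DCT-II over 20 log-spectral bands and uses two made-up log spectra: a uniform level change and a slow spectral tilt. It is an illustration of the transform's behaviour, not a result from the presentation.

```python
import math

def dct2(x):
    """Unnormalised DCT-II."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * k * (i + 0.5) / n) for i in range(n))
            for k in range(n)]

n_bands = 20
flat = [0.0] * n_bands

# Uniform gain change: the same offset in every log band.
louder = [v + 2.0 for v in flat]

# Slow change of the spectral envelope: a gentle tilt across the bands.
tilted = [v + 0.1 * i for i, v in enumerate(flat)]

c_louder = dct2(louder)
c_tilted = dct2(tilted)

print("level change:", [round(c, 3) for c in c_louder[:6]])
print("tilt:        ", [round(c, 3) for c in c_tilted[:6]])
```

A pure level change lands entirely in the first coefficient, and a smooth tilt is concentrated in the lowest few, with magnitudes falling off quickly at higher orders. Amplitude and envelope fluctuations from sound events therefore move mainly the low-order coefficients.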
17
Modelling the Data

The previous analysis showed that the
distribution of the Mel Log Spectrum and Mel Cepstrum
coefficients deviates from the normal, especially
in the case of non-stationary noise, which is of
most interest. To model the coefficients
successfully we used Gaussian
Mixture Models, which are capable of
approximating irregularly shaped
distributions. An algorithm that allows us to
fit mixtures of Gaussians to our data is the
Expectation Maximization (EM) algorithm.
18
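The Expectation Maximization Algorithm for
Gaussian Mixture Models
We assume the probabilistic model
$p(x \mid \Theta) = \sum_{i=1}^{M} \alpha_i \, p_i(x \mid \theta_i)$,
where $\Theta = (\alpha_1, \dots, \alpha_M, \theta_1, \dots, \theta_M)$ and $\sum_{i=1}^{M} \alpha_i = 1$.
We assume a latent random variable $y$ that
determines which component distribution each $x$ comes from. We
then find the expected value of the complete-data log
likelihood with respect to $y$, given $x$
and an initial guess of the parameters $\Theta^{g}$. That
is,
$Q(\Theta, \Theta^{g}) = E\left[ \log p(x, y \mid \Theta) \mid x, \Theta^{g} \right]$.
19
The Expectation Maximization Algorithm for
Gaussian Mixture Models (II)
This was the Expectation step. In the
Maximization step we maximize the expected value
with respect to $\Theta$, i.e.
$\Theta^{\text{new}} = \arg\max_{\Theta} Q(\Theta, \Theta^{g})$.
The two steps are repeated until convergence. For an excellent
tutorial on EM see J. Bilmes, "A Gentle Tutorial
of the EM Algorithm and its Application to Parameter Estimation for
Gaussian Mixture and Hidden Markov Models".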
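
The two steps described on slides 18 and 19 can be sketched for one-dimensional data as follows. This is a minimal illustration under stated assumptions, not the code used for the results: the function names, the initialisation (equal weights, the overall data variance for every component, caller-supplied means), the fixed iteration count and the variance floor are all hypothetical choices, and the data are synthetic.

```python
import math
import random

def gaussian_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def em_gmm_1d(data, means, n_iter=100, var_floor=1e-6):
    """Fit a 1-D Gaussian mixture by EM, starting from the given means."""
    k = len(means)
    n = len(data)
    means = list(means)
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    variances = [overall_var] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each sample,
        # computed with the current parameter guess.
        resp = []
        for x in data:
            joint = [w * gaussian_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = sum(joint) + 1e-300  # guards against division by zero
            resp.append([j / total for j in joint])
        # M-step: parameters that maximise the expected log likelihood.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, var_floor)
    return weights, means, variances

# Usage: synthetic bimodal data, two components.
rng = random.Random(1)
data = ([rng.gauss(-2.0, 0.5) for _ in range(600)]
        + [rng.gauss(3.0, 1.0) for _ in range(400)])
weights, means, variances = em_gmm_1d(data, means=[min(data), max(data)])
print(weights, means, variances)
```

A fixed number of iterations stands in for the convergence test mentioned on the slide; monitoring the change in log likelihood between iterations would be the usual alternative.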
20
Fitting GMM to the Data
Here we present some results of fitting GMMs to
various distributions.
[Figure panels: Single Gaussian, Two Gaussians, Three Gaussians]
21
Summary
  • Today we have discussed:
  • The distribution of the Mel Frequency Log
    Spectral and Cepstral Coefficients.
  • The form this assumes in the presence of
    non-stationary noise providing also a physical
    explanation.
  • How it can be modeled with Gaussian Mixture
    models via the EM algorithm.
  • And finally, we showed some results of fitting GMMs
    to our data.

22
Further Work
  • Examine the distribution of Mel Frequency
    features for noisy speech and see how these are
    altered by the presence of different noise types.
  • Construct optimal estimators for clean speech Mel
    features, given the noisy ones and the noise
    models.
  • Use HMMs with Gaussian Mixture Models for
    accommodating the different noise states.