Title: Understand what is Noise
1Understand what is Noise
- Presenter: Shih-Hsiang
Spoken Language Processing, Chapter (number missing)
Advanced Digital Signal Processing and Noise Reduction, Chapter 2
Robustness Techniques for Speech Recognition, Berlin Chen, 2004
2Introduction - Noise
- What is Noise?
  - An unwanted signal that interferes with the communication, measurement, or processing of an information-bearing signal
  - Noise is present in various degrees in almost all environments
  - Noise can cause transmission errors and may even disrupt a communication process
- What kinds of noise occur in the real world?
3Different kinds of noise
Depending on its source
- Acoustic noise
  - Emanates from moving, vibrating, or colliding sources
  - e.g. moving cars, air conditioners, computer fans, traffic, people talking in the background, wind, rain
- Electromagnetic noise
  - Present at all frequencies (from electric devices)
  - e.g. radio and television transmitters and receivers
- Electrostatic noise
  - Generated by the presence of a voltage, with or without current flow
  - e.g. fluorescent lighting
- Channel distortions, echo, and fading
  - Caused by non-ideal characteristics of the communication channel
  - e.g. radio channels
- Processing noise
  - Results from the digital/analog processing of signals
4Different kinds of noise (cont.)
Depending on its frequency or time characteristics
- Narrow-band noise
  - a noise process with a narrow bandwidth
- Band-limited white noise
  - a noise with a flat spectrum and a limited bandwidth that usually covers the limited spectrum of the device
- White noise (theoretical concept)
  - has the same power at all frequencies
- Coloured noise
  - non-white noise, or any wideband noise whose spectrum has a non-flat shape
  - e.g. pink noise, brown noise
- Impulsive noise
  - consists of short-duration pulses of random amplitude and random duration
- Transient noise
  - consists of relatively long-duration noise pulses
5Color of Noise
- White Noise
  - a signal with a flat frequency spectrum in linear space
- Pink Noise
  - the frequency spectrum of pink noise is flat in logarithmic space
  - power density decreases 3 dB per octave with increasing frequency
- Brown Noise
  - power density decreases 6 dB per octave with increasing frequency
- Blue Noise (Azure Noise)
  - power density increases 3 dB per octave with increasing frequency
- Purple Noise (Violet Noise)
  - power density increases 6 dB per octave with increasing frequency
- Gray Noise
  - noise subjected to a psychoacoustic equal-loudness curve over a given range of frequencies
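The dB-per-octave slopes above can be illustrated with a minimal sketch that shapes white Gaussian noise in the frequency domain. This is only one possible way to generate coloured noise; the function names `colored_noise` and `spectral_slope_db_per_octave` are hypothetical, and gray noise is not covered because it needs an equal-loudness curve.

```python
import numpy as np

def colored_noise(n_samples, db_per_octave, seed=0):
    """Shape white noise so its power density changes by `db_per_octave`
    (0 white, -3 pink, -6 brown, +3 blue, +6 violet)."""
    rng = np.random.default_rng(seed)
    spectrum = np.fft.rfft(rng.standard_normal(n_samples))
    freqs = np.fft.rfftfreq(n_samples)
    # Power ~ f**alpha, where 10*log10(2**alpha) equals db_per_octave.
    alpha = db_per_octave / (10.0 * np.log10(2.0))
    gain = np.zeros_like(freqs)              # gain[0] = 0 drops the DC bin
    gain[1:] = freqs[1:] ** (alpha / 2.0)    # amplitude gain = sqrt(power gain)
    shaped = np.fft.irfft(spectrum * gain, n=n_samples)
    return shaped / np.std(shaped)           # normalise to unit variance

def spectral_slope_db_per_octave(x):
    """Least-squares slope of the periodogram (dB) against log2(frequency)."""
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x))
    keep = freqs > 0
    slope, _ = np.polyfit(np.log2(freqs[keep]), 10 * np.log10(power[keep]), 1)
    return slope

pink = colored_noise(2 ** 16, -3.0)
brown = colored_noise(2 ** 16, -6.0)
print(round(spectral_slope_db_per_octave(pink), 1))
print(round(spectral_slope_db_per_octave(brown), 1))
```

The measured slopes should come out close to the requested values, which is a quick way to check which "colour" a noise recording resembles.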
6Spectrogram
- The spectrogram shows the energy in a signal at
each frequency and at each time
Dark areas of spectrogram show high intensity
7White Noise
8Pink Noise
9Brown Noise
10Blue Noise
11Purple Noise
12Gray Noise
13Noises in Aurora 2
Babble
Airport
Exhibition
Car
14Noises in Aurora 2 (cont.)
Street
Restaurant
Train
Subway
15Noise in Speech Recognition
Time-Domain
Frequency-Domain
16Additive Noises / Convolutional Noises
- Additive noises can be stationary or non-stationary
- Stationary noises
  - the power spectral density does not change over time
  - the noises are also narrow-band noises
  - e.g. computer fans, air conditioning, car noise
- Non-stationary noises
  - the statistical properties change over time
  - wide-band noise
  - e.g. machine guns, door slams, keyboard clicks, radio/TV, and other speakers' voices (babble noise)
- Convolutional noises (channel noises) mainly result from channel distortion and are stationary in most cases
  - e.g. reverberation, the frequency response of the microphone, transmission lines, etc.
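The two kinds of corruption can be combined in one simple model: the clean speech is convolved with a channel impulse response and noise is then added at a chosen SNR. The sketch below assumes that model; the function name `corrupt` and its parameters are hypothetical, and the random "speech" is a stand-in for a real waveform.

```python
import numpy as np

def corrupt(speech, noise, channel, snr_db):
    """Return (noisy, filtered, scaled_noise) for y = speech * channel + noise,
    where * is convolution and the noise is scaled to the requested SNR."""
    filtered = np.convolve(speech, channel)[: len(speech)]  # convolutional noise
    noise = noise[: len(filtered)]
    p_speech = np.mean(filtered ** 2)
    p_noise = np.mean(noise ** 2)
    # Choose the scale so that 10*log10(p_speech / p_scaled_noise) == snr_db.
    scale = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    scaled_noise = scale * noise                             # additive noise
    return filtered + scaled_noise, filtered, scaled_noise

rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)    # stand-in for a speech waveform
noise = rng.standard_normal(16000)     # stationary white noise
channel = np.array([1.0, 0.5, 0.25])   # toy channel impulse response
noisy, filtered, scaled_noise = corrupt(speech, noise, channel, snr_db=10.0)
```

A non-stationary additive noise would simply use a `noise` array whose statistics change over time; the channel term stays the same.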
17Reconstruction of incomplete spectrograms for
robust speech recognition
Bhiksha Raj Ramakrishnan, Ph.D. dissertation, ECE Dept., CMU, Apr. 2000. Advisor: Richard Stern
- Presenter: Shih-Hsiang
18Introduction
- The performance of ASR systems degrades greatly when the speech has been corrupted by noise
  - Train with the same level of noise?
- Two approaches to reduce the mismatch
  - Data-compensation methods
  - Classifier-compensation methods (model adaptation)
Figure: the training data distribution and the testing data distribution are no longer similar
19Introduction (cont.)
- Drawbacks of the above approaches
  - Most of them assume the noise is stationary
  - They assume the effect of the noise can be represented by a linear transform of the parameters
  - They are effective in the context of their intended purposes
- The human auditory system preferentially processes the high-energy components of the speech signal while suppressing the weaker components
- Humans are able to comprehend speech that has undergone considerable spectral excision
20Introduction (cont.)
- Two new approaches have been developed
- Multi-band approaches (Hermansky)
  - Different frequency bands of the speech signal may be corrupted at different SNRs
  - Use divide-and-conquer
  - De-weight the noisy bands
- Missing-feature approaches (Cooke)
  - Low-SNR regions are selectively erased or labeled as unreliable
  - Recognition is performed on the basis of the incomplete data
21Introduction (cont.)
- Advantages of missing-feature approaches
  - Make no assumptions about the corrupting noise
  - Do not need any knowledge of the noise
  - Remarkably robust to high levels of noise corruption
- Missing-feature methods
  - Classifier modification methods
    - Model the effect of the incompleteness of the data
  - Spectrogram reconstruction methods
    - Estimate the missing components of incomplete spectrograms and reconstruct them
22Introduction (cont.)
- Classifier modification methods
- Spectrogram reconstruction methods (today's topic)
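23Background Information: Multivariate Gaussian Distribution
- When X = (X1, ..., XL) is an L-dimensional random vector with mean μ and covariance Σ, the multivariate Gaussian pdf has the form
$$f(x) = \frac{1}{(2\pi)^{L/2}\,|\Sigma|^{1/2}} \exp\left(-\tfrac{1}{2}(x-\mu)^{T}\Sigma^{-1}(x-\mu)\right)$$
- Conditional distributions
  - If X is partitioned as (X1, X2), with means μ1, μ2 and covariance blocks Σ11, Σ12, Σ21, Σ22, then X1 conditional on X2 = a is multivariate normal
$$X_1 \mid X_2 = a \;\sim\; \mathcal{N}\left(\mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(a-\mu_2),\; \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\right)$$
  - mean shift: $\Sigma_{12}\Sigma_{22}^{-1}(a-\mu_2)$
  - regression coefficients: $\Sigma_{12}\Sigma_{22}^{-1}$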
24Background Information: Multivariate Gaussian Distribution (cont.)
observed data
missing data
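25Background Information: Maximum A Posteriori (MAP) Estimation
In MAP estimation the missing data are estimated to maximize their likelihood, conditioned on the value of the observed data
$$\hat{S}_m = \arg\max_{S_m} P(S_m \mid S_o)$$
When $P(S_o, S_m)$ is Gaussian, we get
$$\hat{S}_m = \mu_m + C_{mo}\,C_{oo}^{-1}\,(S_o - \mu_o)$$
where $C_{mo}$ is the cross-covariance between the missing and observed components and $C_{oo}$ is the covariance of the observed components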
26Background Information: Maximum A Posteriori Estimation (cont.)
Figure. The same Gaussian sliced at X = 2. The flat surface in the figure represents the distribution of all vectors whose X component is 2. This distribution peaks at Y = Y1. Thus Y1 is the MAP estimate of Y when X is 2.
Figure. The solid horizontal line shows the observed value of X. The circle at the intersection of the solid diagonal line and the dotted line shows where the distribution of vectors with X = 2 peaks. This is the MAP estimate of Y when X = 2. The solid diagonal line shows how the position of this peak varies with the value of X.
Figure. Gaussian distribution of a 2-dimensional random vector. The mean of the Gaussian is at (1, 1). The X and Y components each have variance 1.0, and the covariance between X and Y is 0.5.
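As a worked check of the figures, here is a short calculation assuming the standard conditional-mean form of the MAP estimate for a jointly Gaussian pair. The symbols μ and σ are notation introduced here, with the numbers taken from the captions (mean (1, 1), variances 1.0, covariance 0.5, observed X = 2).

```latex
\hat{Y}_{\mathrm{MAP}}
  = \mu_Y + \frac{\sigma_{XY}}{\sigma_{XX}}\,(x - \mu_X)
  = 1 + \frac{0.5}{1.0}\,(2 - 1)
  = 1.5
```

Under these assumptions Y1 = 1.5, and the peak moves by 0.5 in Y for each unit change in the observed X.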
27Background Information: Spectrogram
- It is a pictorial representation of the short-time periodogram
short-time Fourier transform
where
Px(l,ω) represents the power at frequency ω at time instant l in the signal
S(l,k) represents the kth component of the lth log-spectral vector
28Background Information: Spectrogram (cont.)
- Wide-band spectrograms: shorter windows (< 10 ms)
  - have good time resolution
- Narrow-band spectrograms: longer windows (> 20 ms)
  - the harmonics can be clearly seen
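The window-length trade-off can be tried with a minimal log-power spectrogram. This is a sketch only: the Hamming window, 5 ms hop, and 1024-point FFT are assumptions, and `log_spectrogram` is a hypothetical helper name.

```python
import numpy as np

def log_spectrogram(signal, sample_rate, window_ms, hop_ms=5.0, n_fft=1024):
    """Log power of the short-time Fourier transform, shape (frames, n_fft//2 + 1)."""
    win_len = int(round(sample_rate * window_ms / 1000.0))
    hop = int(round(sample_rate * hop_ms / 1000.0))
    window = np.hamming(win_len)
    n_frames = 1 + (len(signal) - win_len) // hop
    frames = np.stack([signal[i * hop: i * hop + win_len] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    return np.log(power + 1e-12)   # small floor avoids log(0)

sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000.0 * t)                        # 1 kHz test tone
wide = log_spectrogram(tone, sample_rate, window_ms=8.0)     # wide-band
narrow = log_spectrogram(tone, sample_rate, window_ms=25.0)  # narrow-band
print(wide.shape, narrow.shape)
```

With the short window each frame covers less time, so events are localised better. With the long window the spectral peak is sharper, which is why harmonics become visible.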
29Background Information: Mel Spectrogram
- The mel spectrogram consists of a sequence of log-mel-spectral vectors
- Px(l,k) is the kth component of the mel spectrum in the lth analysis window
- mk(j) is the jth DFT coefficient of the impulse response of the kth mel filter
- K is the total number of mel filters
30Background Information: Mel Spectrogram (cont.)
31Background Information: Effect of noise on the spectrogram
- When the speech signal is corrupted by additive noise
- Assume that the noise is uncorrelated with the speech signal
time domain
frequency domain
spectrogram
mel-spectrogram
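The relations for the four domains listed above are not legible in this transcript. A hedged reconstruction under the stated assumption (additive noise uncorrelated with the speech) is given below. The notation is introduced here and may differ from the dissertation's: x is clean speech, n is noise, y is noisy speech, P is short-time power, and S is the log-mel spectrum.

```latex
\begin{align}
y(t) &= x(t) + n(t) && \text{(time domain)} \\
Y(l,\omega) &= X(l,\omega) + N(l,\omega) && \text{(frequency domain)} \\
P_y(l,\omega) &= P_x(l,\omega) + P_n(l,\omega)
   + 2\,\mathrm{Re}\{X(l,\omega)\,N^{*}(l,\omega)\}
   \approx P_x(l,\omega) + P_n(l,\omega) && \text{(spectrogram)} \\
S_y(l,k) &\approx \log\!\left(e^{S_x(l,k)} + e^{S_n(l,k)}\right) && \text{(log-mel spectrogram)}
\end{align}
```

The cross term vanishes on average when speech and noise are uncorrelated, which is why powers add. The last line follows because mel filtering is a weighted sum of power-spectrum bins.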
32Background Information: Effect of noise on the spectrogram (cont.)
Figure panels:
- Speech corrupted to 15 dB by additive white noise
- Regions deleted where the local SNR is less than 0 dB
- Speech corrupted to 10 dB by additive white noise
- Regions deleted where the local SNR is less than 0 dB
33Recognizing speech with incomplete spectrograms
Modify the manner in which the classifier, or
recognizer
- A speech recognition system is a statistical pattern classifier
- There are two possible approaches to handling incomplete spectrograms
  - Data imputation approach
  - Marginalization approach
language model
acoustic model
- Decompose S into its observed and missing components as S = [So, Sm]
- Sm is not known and therefore its likelihood cannot be computed
34Spectrogram reconstruction methods for missing
data
Modify the manner in Data-compensate
- Estimate the missing regions of incomplete spectrograms to reconstruct complete spectrograms
- Geometrical reconstruction methods
  - Linear interpolation
  - Nonlinear interpolation with polynomial functions
- Cluster-based reconstruction methods
  - Single-cluster-based reconstruction
  - Multiple-cluster-based reconstruction
- Covariance-based reconstruction methods
35Spectrogram reconstruction methods for missing data (cont.): Geometrical reconstruction methods
- Interpolate between adjacent observed elements in the spectrogram to reconstruct a missing element
  - adjacent along the frequency axis
  - adjacent along the time axis
- The interpolation used could be
  - simple linear interpolation
  - a higher-order functional form such as polynomials, rational functions, or splines
36Spectrogram reconstruction methods for missing data (cont.): Geometrical reconstruction methods
- Linear interpolation
  - Given any sequence of numbers s1, s2, ..., sM, where the samples in the interval [l1, l2] are unknown or missing
37Spectrogram reconstruction methods for missing data (cont.): Geometrical reconstruction methods
- Linear interpolation (cont.)
  - Linear along frequency
  - Linear along time
s(l,k): the kth component of the lth spectral vector in the spectrogram
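Linear interpolation of missing spectrogram elements along either axis can be sketched with `np.interp`. This is an illustration under assumptions rather than the dissertation's exact procedure: `interpolate_linear` is a hypothetical name, the mask is True where an element is observed, and gaps at the edges are filled with the nearest observed value because `np.interp` clamps outside the known range.

```python
import numpy as np

def interpolate_linear(spec, mask, axis=0):
    """Fill missing elements (mask False) of a (frames, bins) spectrogram.

    axis=0 interpolates along time, axis=1 along frequency."""
    out = np.array(spec, dtype=float)   # copy, so the input is left untouched
    mask = np.asarray(mask, dtype=bool)
    # Iterate over 1-D tracks: one per frequency bin (time) or one per frame (frequency).
    tracks, known_rows = (out.T, mask.T) if axis == 0 else (out, mask)
    positions = np.arange(tracks.shape[1])
    for track, known in zip(tracks, known_rows):
        if known.any() and not known.all():
            # `track` is a view into `out`, so this assignment updates `out`.
            track[~known] = np.interp(positions[~known],
                                      positions[known], track[known])
    return out

spec = np.array([[0.0, 10.0],
                 [99.0, 99.0],   # placeholder values, marked missing below
                 [4.0, 30.0]])
mask = np.array([[True, True],
                 [False, False],
                 [True, True]])
print(interpolate_linear(spec, mask, axis=0)[1])  # [ 2. 20.]
```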
38Spectrogram reconstruction methods for missing data (cont.): Geometrical reconstruction methods
- Nonlinear interpolation (1) with polynomial functions
  - Lagrange's formula: given a set of m points on a plane, (x1, y1), (x2, y2), ..., (xm, ym)
39Spectrogram reconstruction methods for missing data (cont.): Geometrical reconstruction methods
- Nonlinear interpolation (1) with polynomial functions (cont.)
  - Nonlinear along frequency
  - Nonlinear along time
s(l,k): the kth component of the lth spectral vector in the spectrogram
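Lagrange's formula builds the unique polynomial of degree at most m - 1 through m points. The sketch below evaluates it at a single position, for example a missing time or frequency index. The function name is hypothetical, and how many neighbouring observed points to use is left open here.

```python
def lagrange_interpolate(xs, ys, x):
    """Evaluate the Lagrange polynomial through (xs[i], ys[i]) at position x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        basis = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                basis *= (x - xj) / (xi - xj)   # basis is 1 at xi, 0 at other xj
        total += yi * basis
    return total

# Three observed points lying on y = x**2; estimate the value at x = 1.5.
print(lagrange_interpolate([0, 1, 2], [0, 1, 4], 1.5))  # 2.25
```

High-order polynomials can oscillate strongly between points, which is one way a more detailed interpolation model can do worse than a linear one.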
40Spectrogram reconstruction methods for missing data (cont.): Geometrical reconstruction methods
- Experimental results
  - Mean squared error (MSE) is used to measure the accuracy of the reconstructed spectrogram
  - The greater the MSE, the greater the divergence between the reconstructed and uncorrupted spectrograms
Terms in the MSE formula:
- True uncorrupted spectrogram
- Reconstructed spectrogram
- The number of missing elements in the spectrogram
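Based on the three terms named above, the error measure can be sketched as the squared difference between the true and reconstructed spectrograms, averaged over the missing elements. That the average runs over missing elements only is an assumption drawn from those labels, and `reconstruction_mse` is a hypothetical name.

```python
import numpy as np

def reconstruction_mse(true_spec, recon_spec, mask):
    """MSE over the missing elements; mask is True where an element was observed."""
    missing = ~np.asarray(mask, dtype=bool)
    if not missing.any():
        raise ValueError("no missing elements to evaluate")
    diff = (np.asarray(true_spec, dtype=float)[missing]
            - np.asarray(recon_spec, dtype=float)[missing])
    return float(np.mean(diff ** 2))

true_spec = np.array([[1.0, 2.0], [3.0, 4.0]])
recon_spec = np.array([[1.0, 4.0], [3.0, 4.0]])
mask = np.array([[True, False], [True, False]])
print(reconstruction_mse(true_spec, recon_spec, mask))  # 2.0
```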
41Spectrogram reconstruction methods for missing data (cont.): Geometrical reconstruction methods
- Experimental results (cont.)
Figure panels:
- original
- 50% randomly deleted
- linear interpolation along time
- linear interpolation along frequency
- nonlinear interpolation (1) along frequency
- nonlinear interpolation (1) along time
- nonlinear interpolation (2) along frequency
- nonlinear interpolation (2) along time
42Spectrogram reconstruction methods for missing data (cont.): Geometrical reconstruction methods
- Experimental results (cont.)
MSE along time
MSE along frequency
Accuracy along time
Accuracy along frequency
43Spectrogram reconstruction methods for missing data (cont.): Geometrical reconstruction methods
- Summary
  - Linear interpolation can be quite effective
  - More detailed models are more likely to be erroneous
  - Interpolation along time is generally more effective than interpolation along frequency
    - Not enough frequency components
- Several drawbacks
  - When the fraction of missing elements is very high, there might not be sufficient information remaining in the picture to reconstruct the missing elements properly
  - If the observed elements in the spectrogram were distorted, all missing elements reconstructed on their basis would be distorted similarly
44Spectrogram reconstruction methods for missing data (cont.): Cluster-based reconstruction methods
- Use the vector statistics of the spectral vectors to reconstruct the complete spectrogram
- Spectral vectors are assumed to be segregated into a set of clusters
MAP estimate for the missing component
complete component
observed component
45Spectrogram reconstruction methods for missing data (cont.): Cluster-based reconstruction methods
- Single-cluster-based reconstruction methods
  - The tth spectral vector in the spectrogram is denoted S(t)
  - The missing components of the tth spectral vector are denoted Sm(t)
  - The observed components of the tth spectral vector are denoted So(t)
  - S(t) = At [So(t), Sm(t)], where At is the permutation matrix
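With a single Gaussian cluster, each incomplete vector can be filled in using the Gaussian conditional mean of the missing components given the observed ones. The sketch below assumes the cluster mean `mu` and covariance `cov` have already been estimated, for instance from clean training spectrograms. Boolean indexing stands in for the permutation matrix At, and all names are hypothetical.

```python
import numpy as np

def reconstruct_single_cluster(spec, mask, mu, cov):
    """MAP-fill the missing elements of each spectral vector (row) of `spec`.

    mask is True where an element is observed; mu and cov describe one Gaussian."""
    spec = np.asarray(spec, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    mu = np.asarray(mu, dtype=float)
    cov = np.asarray(cov, dtype=float)
    out = spec.copy()
    for t in range(spec.shape[0]):
        obs, mis = mask[t], ~mask[t]
        if not mis.any():
            continue                    # nothing to reconstruct in this vector
        if not obs.any():
            out[t, mis] = mu[mis]       # nothing observed: fall back to the mean
            continue
        c_oo = cov[np.ix_(obs, obs)]    # covariance of the observed components
        c_mo = cov[np.ix_(mis, obs)]    # cross-covariance missing vs observed
        out[t, mis] = mu[mis] + c_mo @ np.linalg.solve(c_oo, spec[t, obs] - mu[obs])
    return out

mu = np.array([1.0, 1.0])
cov = np.array([[1.0, 0.5], [0.5, 1.0]])
spec = np.array([[2.0, 0.0]])           # second element is a placeholder
mask = np.array([[True, False]])
print(reconstruct_single_cluster(spec, mask, mu, cov))  # [[2.  1.5]]
```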
46Spectrogram reconstruction methods for missing data (cont.): Cluster-based reconstruction methods
- Single-cluster-based reconstruction methods (cont.)
- Experimental results
→ complete spectrogram
Figure panels: original; 50% randomly deleted; reconstructed
47Spectrogram reconstruction methods for missing data (cont.): Cluster-based reconstruction methods
- Experimental results (cont.)
Accuracy
MSE
48Spectrogram reconstruction methods for missing data (cont.): Cluster-based reconstruction methods
- Multiple-cluster-based reconstruction methods
  - Two steps to estimate the missing portions of an incomplete vector
    - Cluster membership of the vector: decide which cluster the vector belongs to
    - Once the cluster membership of the vector is established, the distribution of that cluster is used to obtain MAP estimates for the missing components
49Spectrogram reconstruction methods for missing data (cont.): Cluster-based reconstruction methods
- Multiple-cluster-based reconstruction methods (cont.)
  - Step 1: Decide which cluster the vector belongs to
The cluster membership
kth cluster
a priori probability, P(k)
negative of the log-likelihood
50Spectrogram reconstruction methods for missing data (cont.): Cluster-based reconstruction methods
- Multiple-cluster-based reconstruction methods (cont.)
  - Step 2: MAP estimates
- Experimental results (oracle experimental upper bound)
Figure panels: original; 70% randomly deleted; reconstructions with codebook sizes 1, 8, 64, and 512
The codebook size is the number of clusters used in the representation
51Spectrogram reconstruction methods for missing data (cont.): Cluster-based reconstruction methods
- Experimental results (cont.)
Accuracy
MSE
52Spectrogram reconstruction methods for missing data (cont.): Cluster-based reconstruction methods
- Cluster marginal reconstruction: identifying cluster membership based on the observed components alone
  - Needed because we have no knowledge of the entire S(t) when some components of S(t) are missing
  - Step 1: Decide which cluster the vector belongs to
    - Identify the cluster membership of the vector based on the observed components of the vector alone
    - marginalization
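For Gaussian clusters, marginalising over the missing components amounts to dropping their rows and columns from the mean and covariance. Cluster membership can then be scored on the observed components alone. The sketch below assumes each cluster has a prior, a mean, and a full covariance. The function names are hypothetical.

```python
import numpy as np

def marginal_log_likelihood(x_obs, mu_obs, cov_obs):
    """Log density of the observed components under one Gaussian cluster."""
    diff = x_obs - mu_obs
    _, logdet = np.linalg.slogdet(cov_obs)
    maha = diff @ np.linalg.solve(cov_obs, diff)
    return -0.5 * (len(x_obs) * np.log(2.0 * np.pi) + logdet + maha)

def pick_cluster(vec, observed, priors, means, covs):
    """Index of the cluster maximising log P(k) + log P(observed part | k)."""
    vec = np.asarray(vec, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    scores = []
    for prior, mu, cov in zip(priors, means, covs):
        mu, cov = np.asarray(mu, dtype=float), np.asarray(cov, dtype=float)
        ll = marginal_log_likelihood(vec[observed], mu[observed],
                                     cov[np.ix_(observed, observed)])
        scores.append(np.log(prior) + ll)
    return int(np.argmax(scores))

means = [np.zeros(3), np.full(3, 5.0)]
covs = [np.eye(3), np.eye(3)]
vec = np.array([4.8, np.nan, 5.1])       # middle component is missing
observed = np.array([True, False, True])
print(pick_cluster(vec, observed, [0.5, 0.5], means, covs))  # 1
```

Once the cluster is chosen, its mean and covariance can be used for the MAP estimate of the missing components, as in the single-cluster case.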
53Spectrogram reconstruction methods for missing data (cont.): Cluster-based reconstruction methods
- Cluster marginal reconstruction: identifying cluster membership based on the observed components alone (cont.)
- Experimental results
Figure panels: original; 70% randomly deleted; reconstructions with codebook sizes 1, 8, 64, and 512
54Spectrogram reconstruction methods for missing data (cont.): Cluster-based reconstruction methods
- Experimental results (cont.)
Wrongly identified cluster
MSE
Accuracy
55Spectrogram reconstruction methods for missing data (cont.): Cluster-based reconstruction methods
- Summary
  - Cluster-based reconstruction methods can be very effective in reconstructing missing regions of the spectrogram
  - When cluster memberships are identified based only on the observed components, the result is similar to single-cluster-based reconstruction
  - A single Gaussian model for the distribution is a good method
56Spectrogram reconstruction methods for missing data (cont.): Covariance-based reconstruction
- Consider the sequence of spectral vectors that constitute a spectrogram to be the output of a Gaussian wide-sense stationary (WSS) random process
  - The mean of the spectral vectors and the covariances between elements in the spectrogram are independent of their position in the spectrogram
- WSS gives us the following properties
  - The mean of a component does not depend on where it occurs
  - The covariance between the components of two vectors depends only on the distance between them
57Spectrogram reconstruction methods for missing data (cont.): Covariance-based reconstruction
apply WSS properties
58Spectrogram reconstruction methods for missing data (cont.): Covariance-based reconstruction
apply E[S(t,k)] = μ(k)
The MAP estimate for Sm can then be computed
- A 4-second utterance has 400 frames
- Each spectral vector has 20 frequency components
- There are 8000 components in all in the spectrogram
→ the computational cost is very high
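To make the cost concrete, a back-of-the-envelope sketch follows. It assumes that joint estimation works with a dense covariance over all spectrogram components, stored in double precision. The helper name is hypothetical.

```python
def joint_map_cost(n_frames, n_bins, bytes_per_value=8):
    """Size of a dense covariance over every element of the spectrogram."""
    n = n_frames * n_bins                # total number of components
    cov_entries = n * n                  # dense n x n covariance matrix
    return {
        "components": n,
        "covariance_entries": cov_entries,
        "covariance_mib": cov_entries * bytes_per_value / 2 ** 20,
    }

cost = joint_map_cost(n_frames=400, n_bins=20)
print(cost["components"])          # 8000
print(cost["covariance_entries"])  # 64000000
```

Solving a dense linear system of this size scales roughly with the cube of the number of components, which is why reconstructing elements from a small neighbourhood is attractive.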
59Spectrogram reconstruction methods for missing data (cont.): Covariance-based reconstruction
- Reconstructing missing elements individually
Let S(t,k) be an element of the vector of missing components
Com(t,k) is the cross-covariance between So and Sm
expected value of S(t,k)
Not all components of So contribute equally to the estimate of S(t,k)
60Spectrogram reconstruction methods for missing data (cont.): Covariance-based reconstruction
- Jointly reconstructing all missing elements in a
vector
The missing element vector for the second
spectral vector is constructed as
The neighborhood of observed vector for Sm(2)
The mean vector for So(2) and Sm(2)
The cross covariance between Sm(2) and So(2)
The autocovariance matrix of So(2)
The second spectral vector would be obtained as
61Spectrogram reconstruction methods for missing data (cont.): Covariance-based reconstruction
Figure panels: original; 90% randomly deleted; reconstructed individually; reconstructed jointly by vector
MSE
Accuracy
62Estimating the location of corrupt regions in
spectrograms
- It is a difficult task to estimate the reliable and unreliable regions
- Use a spectrographic mask to distinguish the regions
  - Binary information about every element in the spectrogram
- The ability of missing-feature methods depends critically on the accuracy of the spectrographic masks used
  - False alarm: a reliable element declared unreliable
  - Miss: an unreliable element tagged as reliable
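The two error types can be measured by comparing an estimated mask with an oracle mask. The sketch below uses the definitions above with True meaning "reliable". Expressing each error as a fraction of the relevant oracle class is an assumption, and `mask_error_rates` is a hypothetical name.

```python
import numpy as np

def mask_error_rates(estimated, oracle):
    """Return (false_alarm_rate, miss_rate) for boolean masks (True = reliable)."""
    est = np.asarray(estimated, dtype=bool)
    ora = np.asarray(oracle, dtype=bool)
    n_reliable = int(ora.sum())
    n_unreliable = int((~ora).sum())
    # False alarm: reliable in the oracle mask but declared unreliable.
    false_alarm = (~est & ora).sum() / n_reliable if n_reliable else 0.0
    # Miss: unreliable in the oracle mask but tagged as reliable.
    miss = (est & ~ora).sum() / n_unreliable if n_unreliable else 0.0
    return float(false_alarm), float(miss)

oracle = np.array([[True, True, False, False]])
estimated = np.array([[True, False, True, False]])
print(mask_error_rates(estimated, oracle))  # (0.5, 0.5)
```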
63Estimating the location of corrupt regions in
spectrograms (cont.)
- Recognition performance degrades very quickly with an increasing fraction of false alarms
- Missing-feature methods are much less sensitive to misses
64Estimating the location of corrupt regions in
spectrograms (cont.)
- Using spectral subtraction
  - Typical values of λ and β are 0.95 and 2
  - The initial portion of any utterance is assumed to contain only noise
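The slide's formula is not legible in this transcript, so the following is only a rough sketch of one common way to derive a mask by spectral subtraction. It is not the dissertation's exact rule. The assumptions are: the noise power is initialised from the first frames; it is updated by a recursive average with weight 0.95 on the previous estimate, but only in bins whose power stays below 2 times the current noise estimate; and an element is marked reliable when its estimated local SNR exceeds 0 dB. All names and the default `n_init` are hypothetical.

```python
import numpy as np

def estimate_mask(power_spec, n_init=10, smooth=0.95, beta=2.0, snr_threshold_db=0.0):
    """Boolean mask (True = reliable) from a (frames, bins) power spectrogram."""
    power_spec = np.asarray(power_spec, dtype=float)
    noise = power_spec[:n_init].mean(axis=0)   # initial frames assumed noise-only
    mask = np.zeros(power_spec.shape, dtype=bool)
    for t, frame in enumerate(power_spec):
        # Assumed update rule: track the noise only where the frame looks like noise.
        looks_like_noise = frame < beta * noise
        noise = np.where(looks_like_noise,
                         smooth * noise + (1.0 - smooth) * frame, noise)
        speech = np.maximum(frame - noise, 1e-12)   # spectral subtraction, floored
        snr_db = 10.0 * np.log10(speech / np.maximum(noise, 1e-12))
        mask[t] = snr_db > snr_threshold_db
    return mask

power = np.ones((20, 4))      # flat noise floor with power 1
power[12:16, 2] = 100.0       # a burst of "speech" energy in one bin
mask = estimate_mask(power)
print(int(mask.sum()))        # 4
```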
65Estimating the location of corrupt regions in
spectrograms (cont.)
Figure panels:
- Spectrographic mask estimated using spectral subtraction for speech corrupted to 10 dB
- Oracle spectrographic mask
- Spectrographic masks estimated by spectral subtraction
- Oracle spectrographic mask
66Estimating the location of corrupt regions in
spectrograms (cont.)
- Using a Bayesian classifier
classification vector
67Estimating the location of corrupt regions in
spectrograms (cont.)
Figure panels:
- Spectrographic mask estimated using a classifier for speech corrupted to 10 dB
- Oracle spectrographic mask
- Spectrographic masks estimated by a classifier
- Oracle spectrographic mask