Signal Processing And Analysis Methods For Speech Recognition - PowerPoint PPT Presentation


Slides: 62

Provided by: thak5

Transcript and Presenter's Notes

1
Signal Processing And Analysis Methods For Speech
Recognition
2
Introduction

Spectral analysis is the process of describing the
speech signal by a set of parameters for further
processing,
e.g. short-term energy, zero-crossing rate, level-
crossing rate, and so on
Methods for spectral analysis are therefore
considered the core of the signal-processing front
end in a speech recognition system
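
Two of the parameters named above can be sketched in a few lines. This is only an illustration: the function names, the framing helper, and the choice of frame length and hop are assumptions, not part of the original slides.

```python
def short_time_energy(frame):
    """Sum of squared samples in one analysis frame."""
    return sum(x * x for x in frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def frames(signal, frame_len, hop):
    """Split a signal into overlapping frames (hypothetical helper)."""
    return [
        signal[start:start + frame_len]
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]
```

Each frame of speech would then be reduced to a small parameter vector such as (energy, zero-crossing rate) before any pattern comparison.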

3
Spectral Analysis methods

Two methods
The Filter Bank spectrum
The Linear Predictive coding (LPC)

4
Spectral Analysis models

Pattern recognition model
Acoustic phonetic model

5
Spectral Analysis Model
Parameter measurement is common to both
models
6
Pattern recognition Model

The three basic steps in pattern recognition
model are
1. parameter measurement
2. pattern comparison
3. decision making

7
1. Parameter measurement

The goal is to represent the relevant acoustic
events in the speech signal in terms of a compact,
efficient set of speech parameters
The choice of which parameters to use is dictated
by other considerations,
e.g.
computational efficiency,
type of implementation,
available memory
The way in which the representation is computed is
based on signal processing considerations

8
Acoustic phonetic Model
9
Spectral Analysis

Two methods
The Filter Bank spectrum
The Linear Predictive coding (LPC)

10
The Filter Bank spectrum
(Figure: the digital input is passed through a bank
of bandpass filters to give the spectral
representation.)
The coverage of the bandpass filters spans the
frequency range of interest in the signal
11
1. The Bank of Filters Front end Processor

One of the most common approaches for processing
the speech signal is the bank-of-filters model
This method takes a speech signal as input and
passes it through a set of filters in order to
obtain the spectral representation of each
frequency band of interest.

E.g.
100-3000 Hz for a telephone-quality signal
100-8000 Hz for a broadband signal
The individual filters generally do overlap in
frequency
The output of the ith bandpass filter is
$X_n(e^{j\omega_i})$,
where $\omega_i = 2\pi f_i / F_s$ is the normalized
frequency and $F_s$ is the sampling frequency

Each bandpass filter processes the speech signal
independently to produce the spectral
representation Xn

14
The Bank of Filters Front end Processor
15
The Bank of Filters Front end Processor
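The sampled speech signal, s(n), is passed
through a bank of Q bandpass filters, giving the
signals
$s_i(n) = s(n) * h_i(n) = \sum_{m=0}^{M_i-1} h_i(m)\, s(n-m), \quad 1 \le i \le Q$
where $h_i(m)$ is the impulse response of the ith
bandpass filter, with a duration of $M_i$ samples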
16
The Bank of Filters Front end Processor

The bank-of-filters approach obtains the energy
value of the speech signal in each band through
the following steps
Signal enhancement and noise elimination: makes
the speech signal more evident to the bank of
filters.
Set of bandpass filters: separates the signal into
frequency bands (uniform or nonuniform filters).

Nonlinearity: the filtered signal in every band
is passed through a nonlinear function (for
example a full-wave or half-wave rectifier),
which shifts the bandpass spectrum to the
low-frequency band.

18
The Bank of Filters Front end Processor

Lowpass filter: eliminates the high-frequency
components generated by the nonlinear
function.
Sampling rate reduction and amplitude
compression: the resulting signals are
represented more economically by resampling
at a reduced rate and compressing the signal
dynamic range.

The role of the final lowpass filter is to
eliminate the undesired spectral peaks
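
The processing chain for one channel can be sketched as follows. This is a minimal illustration under stated assumptions: the bandpass filter is a Hamming-windowed ideal filter, the lowpass filter is a plain moving average, the compression is a logarithm, and all function names, filter lengths, and the decimation factor are hypothetical choices rather than values from the slides.

```python
import math


def hamming(length):
    """Hamming window of the given length."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]


def bandpass_fir(f_lo, f_hi, fs, length=101):
    """Windowed ideal bandpass filter (one possible design, assumed here)."""
    mid = (length - 1) / 2
    w = hamming(length)
    taps = []
    for n in range(length):
        t = n - mid
        if t == 0:
            ideal = 2 * (f_hi - f_lo) / fs
        else:
            ideal = (math.sin(2 * math.pi * f_hi * t / fs)
                     - math.sin(2 * math.pi * f_lo * t / fs)) / (math.pi * t)
        taps.append(ideal * w[n])
    return taps


def convolve(h, x):
    """Direct-form FIR filtering: y(n) = sum_m h(m) x(n - m)."""
    return [sum(h[m] * x[n - m] for m in range(len(h)) if 0 <= n - m)
            for n in range(len(x))]


def channel_output(signal, f_lo, f_hi, fs, smooth_len=80, decimate=40):
    band = convolve(bandpass_fir(f_lo, f_hi, fs), signal)  # bandpass filter
    rectified = [abs(v) for v in band]                      # full-wave rectifier
    lowpass = [1.0 / smooth_len] * smooth_len               # crude lowpass filter
    envelope = convolve(lowpass, rectified)
    reduced = envelope[::decimate]                          # sampling rate reduction
    return [math.log(v + 1e-10) for v in reduced]           # amplitude compression
```

Running `channel_output` once per band, with the band edges taken from the filter bank design, would give one slowly varying log-energy track per channel.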
19
The Bank of Filters Front end Processor
Assume that the output of the ith bandpass filter
is a pure sinusoid at frequency $\omega_i$,
$s_i(n) = \alpha_i \sin(\omega_i n)$. If a full-wave
rectifier is used as the nonlinearity, its output
is $v_i(n) = |s_i(n)|$
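
The effect of full-wave rectification on such a sinusoid can be worked out from the Fourier series of $|\sin\theta|$. The following is a sketch only: the amplitude symbol $\alpha_i$ and the output name $v_i(n)$ are notation assumed here, and the expansion is the idealized one that ignores aliasing of the higher harmonics in discrete time.

```latex
|\sin\theta| = \frac{2}{\pi} - \frac{4}{\pi}\sum_{k=1}^{\infty}\frac{\cos(2k\theta)}{4k^2-1}
\quad\Longrightarrow\quad
v_i(n) = |\alpha_i \sin(\omega_i n)|
       = \frac{2|\alpha_i|}{\pi}
       - \frac{4|\alpha_i|}{\pi}\sum_{k=1}^{\infty}\frac{\cos(2k\omega_i n)}{4k^2-1}
```

The constant term $2|\alpha_i|/\pi$ is proportional to the amplitude of the band signal and is what the lowpass filter retains; the terms at $2\omega_i, 4\omega_i, \dots$ are the undesired spectral peaks it removes.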
20
The Bank of Filters Front end Processor

21
Types of Filter Bank Used For Speech Recognition

uniform filter bank
Non uniform filter bank

22
uniform filter bank

The most common filter bank is the uniform filter
bank
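The center frequency, $f_i$, of the ith bandpass
filter is defined as
$f_i = \frac{F_s}{N}\, i, \quad 1 \le i \le Q$
where $F_s$ is the sampling rate, N is the number of
uniformly spaced filters required to span the
frequency range of the speech, and Q is the number
of filters used in the bank of filters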

23
uniform filter bank

The actual number of filters used in the filter
bank, Q, satisfies $Q \le N/2$
$b_i$ is the bandwidth of the ith filter, with
$b_i \ge F_s/N$
Equality means there is no frequency overlap between
adjacent filter channels; inequality means adjacent
channels overlap

24
uniform filter bank

If $b_i < F_s/N$, then certain portions of the
speech spectrum would be missing from the
analysis, and the resulting speech spectrum would
not be very meaningful
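
The uniform layout can be illustrated numerically. This sketch takes $f_i = (F_s/N)\, i$ as the center-frequency rule and $b_i \ge F_s/N$ as the no-gap condition; the function names and the example values (8 kHz sampling, N = 16) are assumptions made for illustration.

```python
def uniform_center_frequencies(fs, n_span, q):
    """Center frequencies f_i = (fs / n_span) * i for i = 1..q."""
    if q > n_span // 2:
        raise ValueError("Q must not exceed N/2")
    return [fs / n_span * i for i in range(1, q + 1)]


def has_gaps(fs, n_span, bandwidth):
    """True if adjacent channels leave part of the spectrum uncovered."""
    return bandwidth < fs / n_span


centers = uniform_center_frequencies(8000, 16, 4)
# centers are spaced 500 Hz apart, so 400 Hz wide filters would leave gaps
```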

25
nonuniform filter bank

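An alternative to the uniform filter bank is the
nonuniform filter bank
The criterion is to space the filters uniformly
along a logarithmic frequency scale.
For a set of Q bandpass filters with center
frequencies $f_i$ and bandwidths $b_i$, $1 \le i \le Q$, we set
$b_1 = C$
$b_i = \alpha\, b_{i-1}, \quad 2 \le i \le Q$
$f_i = f_1 + \sum_{j=1}^{i-1} b_j + \frac{b_i - b_1}{2}$
where $C$ and $f_1$ are the bandwidth and center
frequency of the first filter, and $\alpha$ is the
logarithmic growth factor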

26
nonuniform filter bank
27

The most commonly used value is $\alpha = 2$
This gives an octave-band spacing of adjacent
filters
And $\alpha = 4/3$ gives 1/3-octave filter spacing
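
A short sketch of a logarithmically spaced bank, assuming the recursion $b_1 = C$, $b_i = \alpha b_{i-1}$ and $f_i = f_1 + \sum_{j<i} b_j + (b_i - b_1)/2$. The function name and the example values ($f_1$ = 300 Hz, $C$ = 200 Hz, four filters) are illustrative assumptions.

```python
def log_spaced_bank(f1, c, alpha, q):
    """Return (center, bandwidth) pairs with b_1 = c and b_i = alpha * b_{i-1}."""
    bandwidths = [c * alpha ** i for i in range(q)]
    bank = []
    for i, b in enumerate(bandwidths):
        # Center of filter i: start at f1, step over all earlier bands,
        # then move half of the extra width of this band.
        center = f1 + sum(bandwidths[:i]) + (b - c) / 2
        bank.append((center, b))
    return bank


octave_bank = log_spaced_bank(300, 200, 2, 4)
# centers 300, 600, 1200, 2400 Hz with bandwidths 200, 400, 800, 1600 Hz
```

With $\alpha = 2$ both the bandwidths and the center frequencies double from one filter to the next, which is the octave spacing described above.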

28
Implementations of Filter Banks

Depending on the design method, the filter bank
can be implemented in various ways.
Design methods for digital filters fall into two
classes
Infinite impulse response (IIR) (recursive
filters)
Finite impulse response (FIR) (nonrecursive
filters)

The FIR (finite impulse response) filter, or
nonrecursive filter
The present output depends on the present input
sample and previous input samples
The impulse response is restricted to a finite
number of samples

Advantages
Always stable; noise effects are less severe
Excellent design methods are available for
various kinds of FIR filters
Phase response can be made exactly linear
Disadvantages
Costly to implement
Memory requirement and execution time are high
Require powerful computational facilities

The IIR (infinite impulse response) filter, or
recursive filter
The present output sample depends on the
present input, past input samples, and past output
samples
The impulse response extends over an infinite
duration

Advantages
Simple to design
Efficient
Disadvantages
Phase response is nonlinear
More sensitive to noise
Can be unstable
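
The difference between the two filter classes shows up directly in their impulse responses. The sketch below uses a 3-tap FIR filter and a one-pole IIR filter; both filters and their coefficients are hypothetical examples chosen for illustration.

```python
def fir_filter(b, x):
    """Nonrecursive: y(n) = sum_k b[k] * x(n - k)."""
    return [sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
            for n in range(len(x))]


def one_pole_iir(a, x):
    """Recursive: y(n) = x(n) + a * y(n - 1)."""
    y, prev = [], 0.0
    for sample in x:
        prev = sample + a * prev
        y.append(prev)
    return y


impulse = [1.0] + [0.0] * 7
fir_response = fir_filter([0.25, 0.5, 0.25], impulse)  # zero after 3 samples
iir_response = one_pole_iir(0.5, impulse)              # decays as 0.5**n
unstable_response = one_pole_iir(2.0, impulse)         # grows as 2**n
```

The FIR response ends after as many samples as there are taps, while the recursive filter keeps responding indefinitely and diverges once the feedback coefficient has magnitude 1 or more.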

33
FIR Filters
34
FIR Filters

A less expensive implementation can be derived by
representing each bandpass filter by a fixed
lowpass window $w(n)$ modulated by the complex
exponential $e^{j\omega_i n}$, i.e. $h_i(n) = w(n)\, e^{j\omega_i n}$

35
Frequency Domain Interpretation For Short Term
Fourier Transform
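(A) $S_n(e^{j\omega_i}) = \sum_{m} s(m)\, w(n-m)\, e^{-j\omega_i m}$
At $n = n_0$:
$S_{n_0}(e^{j\omega_i}) = \mathrm{FT}[\, s(m)\, w(n_0-m) \,]\big|_{\omega=\omega_i}$
where FT[.] denotes the Fourier transform. $S_{n_0}(e^{j\omega_i})$
is the conventional Fourier transform of the
windowed signal, $s(m)w(n_0-m)$, evaluated at the
frequency $\omega = \omega_i$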
36
Frequency Domain Interpretation For Short Term
Fourier Transform
Shows which parts of s(m) are used in the
computation of the short-time Fourier transform
37
Frequency Domain Interpretation For Short Term
Fourier Transform

Since $w(m)$ is an FIR filter of length L, from
the definition of $S_n(e^{j\omega_i})$ we can state that
If L is large relative to the signal periodicity,
then $S_n(e^{j\omega_i})$ gives good frequency resolution
If L is small relative to the signal periodicity,
then $S_n(e^{j\omega_i})$ gives poor frequency resolution
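
The dependence on L can be made concrete with the usual approximation that the main lobe of a Hamming window of length L is about $8\pi/L$ radians wide, i.e. $4 F_s / L$ in hertz. The function name, the 10 kHz sampling rate, and the 100 Hz pitch used in the comments are assumptions for illustration, not values from the slides.

```python
def hamming_mainlobe_width_hz(window_len, fs):
    """Approximate main-lobe width of a Hamming window: 4 * fs / L."""
    return 4.0 * fs / window_len


long_window = hamming_mainlobe_width_hz(500, 10000)   # 80 Hz
short_window = hamming_mainlobe_width_hz(50, 10000)   # 800 Hz
# Harmonics of a 100 Hz pitch are 100 Hz apart: the long window can
# separate them, while the short window smears several of them together.
```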

38
Frequency Domain Interpretation For Short Term
Fourier Transform
An L = 500-point Hamming window is applied to a
section of voiced speech. The periodicity of
the signal is seen in the windowed time waveform
as well as in the short-time spectrum, in
which the fundamental frequency and its harmonics
show up as narrow peaks at equally spaced
frequencies.
39
Frequency Domain Interpretation For Short Term
Fourier Transform
For short windows, the time sequence $s(m)w(n-m)$
doesn't show the signal periodicity, nor does
the signal spectrum. The spectrum does, however,
show the broad spectral envelope very well.
40
Frequency Domain Interpretation For Short Term
Fourier Transform
Shows irregular series of local peaks and
valleys due to the random nature of the unvoiced
speech
41
Frequency Domain Interpretation For Short Term
Fourier Transform
Using the shorter window smoothes out the random
fluctuations in the short time spectral
magnitude and shows the broad spectral envelope
very well
42
Linear Filtering Interpretation of the short-time
Fourier Transform

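The linear filtering interpretation of the
short-time Fourier transform:
$S_n(e^{j\omega_i})$ is a convolution of the lowpass
window, $w(n)$, with the speech signal, $s(n)$,
modulated to the center frequency $\omega_i$

From (A):
$S_n(e^{j\omega_i}) = \sum_{m} \left[ s(m)\, e^{-j\omega_i m} \right] w(n-m) = \left[ s(n)\, e^{-j\omega_i n} \right] * w(n)$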
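
The equivalence of the two interpretations can be checked numerically. The sketch below evaluates the short-time Fourier transform once from its definition and once as "modulate, then convolve with the window"; the function names, the test signal, and the 16-point Hamming window are assumptions made for illustration.

```python
import cmath
import math


def stft_direct(s, w, n, omega):
    """S_n(e^{j omega}) = sum_m s(m) w(n - m) e^{-j omega m}."""
    total = 0j
    for m in range(len(s)):
        if 0 <= n - m < len(w):
            total += s[m] * w[n - m] * cmath.exp(-1j * omega * m)
    return total


def stft_filtering(s, w, n, omega):
    """Modulate s(n) by e^{-j omega n}, then convolve with the window w(n)."""
    modulated = [s[m] * cmath.exp(-1j * omega * m) for m in range(len(s))]
    return sum(w[k] * modulated[n - k]
               for k in range(len(w)) if 0 <= n - k < len(s))


speech = [math.sin(0.2 * m) + 0.5 * math.sin(0.9 * m) for m in range(64)]
window = [0.54 - 0.46 * math.cos(2 * math.pi * k / 15) for k in range(16)]
direct_value = stft_direct(speech, window, 40, 0.3 * math.pi)
filtered_value = stft_filtering(speech, window, 40, 0.3 * math.pi)
# The two values agree up to floating-point rounding.
```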

43
FFT Implementation of Uniform Filter Bank Based
on the Short-Time FT
44
FFT Implementation of Uniform Filter Bank Based
on the Short-Time FT
45
FFT Implementation of Uniform Filter Bank Based
on The Short Time FT

The FFT implementation is more efficient than
the direct form structure
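
The efficiency gain can be seen by comparing a direct evaluation of all N uniformly spaced channels with a radix-2 FFT of the same windowed frame. This is a generic textbook FFT written for illustration, not the specific structure shown on the slides; the function names are assumptions.

```python
import cmath


def dft_direct(x):
    """N separate channel sums: about N**2 complex multiplications."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]


def fft(x):
    """Recursive radix-2 FFT (len(x) must be a power of two): about N log2 N."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out
```

Both functions return the same N channel values for one windowed frame; only the operation count differs.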

46
Nonuniform FIR Filter Bank Implementations
The most general form of a nonuniform FIR filter
bank
47
Nonuniform FIR Filter Bank Implementations

The kth bandpass filter impulse response, $h_k(n)$,
represents a filter with center frequency $\omega_k$
and bandwidth $\Delta\omega_k$.
The set of Q bandpass filters covers the
frequency range of interest for the intended
speech recognition application

48
Nonuniform FIR Filter Bank Implementations

Each band pass filter is implemented via a direct
convolution
Each band pass filter is designed via the
windowing design method
The composite frequency response of the Q-channel
filter bank is independent of the number and
distribution of the individual filters

49
Nonuniform FIR Filter Bank Implementations
A filter bank with the three filters has the
exact same composite frequency response as the
filter bank with the seven filters shown in
figure above
50
Nonuniform FIR Filter Bank Implementations

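The impulse response of the kth bandpass filter
$h_k(n) = w(n)\, \tilde{h}_k(n)$
where $\tilde{h}_k(n)$ is the impulse response of the
ideal bandpass filter and $w(n)$ is the FIR window
The frequency response of the kth bandpass filter
$H_k(e^{j\omega}) = W(e^{j\omega}) \circledast \tilde{H}_k(e^{j\omega})$
where $\circledast$ denotes convolution in frequency

51
Nonuniform FIR Filter Bank Implementations

Thus the frequency response of the composite
filter bank
$H(e^{j\omega}) = \sum_{k=1}^{Q} H_k(e^{j\omega}) = \sum_{k=1}^{Q} W(e^{j\omega}) \circledast \tilde{H}_k(e^{j\omega})$    (1)

52
Nonuniform FIR Filter Bank Implementations

The ideal filters sum to
$\hat{H}(e^{j\omega}) = \sum_{k=1}^{Q} \tilde{H}_k(e^{j\omega}) = 1$ for $\omega_{min} \le \omega \le \omega_{max}$, and 0 otherwise
where $\omega_{min}$ is the lowest frequency in the filter
bank and $\omega_{max}$ is the highest frequency
Equation (1) can be written as
$H(e^{j\omega}) = W(e^{j\omega}) \circledast \hat{H}(e^{j\omega})$
which is independent of the number of ideal
filters, Q, and their distribution in
frequency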

53
FFT-Based Nonuniform Filter Banks

By combining two or more uniform channels,
nonuniformity can be created
Consider taking an N-point DFT of the sequence
x(n)
$X_k = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi n k / N}, \quad 0 \le k \le N-1$

54
FFT-Based Nonuniform Filter Banks

If channels k and k+1 are added, the equivalent
kth channel value, $X'_k = X_k + X_{k+1}$, can be
obtained by weighting the sequence x(n) by the
complex sequence $2\, e^{-j\pi n/N} \cos(\pi n/N)$.
If more than two channels are combined, then a
different equivalent weighting sequence results
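
The two-channel case can be verified numerically, using the identity $1 + e^{-j2\pi n/N} = 2\, e^{-j\pi n/N} \cos(\pi n/N)$. The function names below are illustrative assumptions.

```python
import cmath
import math


def dft_bin(x, k):
    """Single DFT channel X_k of the sequence x."""
    n_points = len(x)
    return sum(x[n] * cmath.exp(-2j * math.pi * n * k / n_points)
               for n in range(n_points))


def combined_channel(x, k):
    """X_k + X_{k+1} computed from one DFT bin of a reweighted sequence."""
    n_points = len(x)
    weighted = [x[n] * 2 * cmath.exp(-1j * math.pi * n / n_points)
                * math.cos(math.pi * n / n_points)
                for n in range(n_points)]
    return dft_bin(weighted, k)
```

Summing the two bins directly and taking bin k of the weighted sequence give the same value, so a wider channel costs one weighting pass rather than an extra filter.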

55
Tree Structure Realizations of Nonuniform Filter
Banks

In this method the speech signal is filtered in
stages, and the sampling rate is successively
reduced at each stage

56
Tree Structure Realizations of Nonuniform Filter
Banks
57
Tree Structure Realizations of Nonuniform Filter
Banks

The original speech signal, s(n), is filtered
initially into two bands, a low band and a high
band
The high band is downsampled by 2 and represents
the highest octave band ($\pi/2 \le \omega \le \pi$) of the filter
bank.
The low band is similarly down sampled by 2 and
fed into second filtering stage in which the
signal is again split into two equal bands.
Again the high band of the stage 2 is down
sampled by 2 and is used as a next highest filter
bank output.

58
Tree Structure Realizations of Nonuniform Filter
Banks

The low band is also downsampled by 2 and fed
into a third stage of filters
The third-stage outputs, after downsampling by a
factor of 2, are used as the two lowest filter bands
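
The three-stage tree can be sketched as below. The 2-tap Haar averaging and differencing filters are used purely as a stand-in for proper half-band lowpass and highpass filters, and the function names are assumptions; only the split-and-downsample structure is taken from the slides.

```python
def split_and_downsample(x):
    """Split into low/high half-bands and keep every second sample of each.

    Uses 2-tap Haar filters only as a placeholder for real half-band filters.
    """
    low = [(x[i] + x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]
    high = [(x[i] - x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]
    return low, high


def octave_tree(signal, stages=3):
    """Return band signals ordered from the highest octave to the lowest band."""
    bands = []
    low = signal
    for _ in range(stages):
        low, high = split_and_downsample(low)
        bands.append(high)   # high band of this stage is a filter bank output
    bands.append(low)        # final low band is the lowest output
    return bands
```

With three stages the tree produces four bands whose sampling rates are 1/2, 1/4, 1/8 and 1/8 of the original rate.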

59
Summary of considerations for speech recognition
filter banks

1st. Type of digital filter used (IIR (recursive)
or FIR (nonrecursive))
IIR: Advantage: simple to implement and
efficient.
Disadvantage: phase response is nonlinear
FIR: Advantage: phase response is linear
Disadvantage: expensive to implement

60
Summary of considerations for speech recognition
filter banks

2nd. The number of filters to be used in the
filter bank.
For uniform filter banks the number of filters,
Q, cannot be too small, or else the ability of
the filter bank to resolve the speech spectrum is
greatly impaired. Values of Q less than 8 are
generally avoided
The value of Q cannot be too large, because the
filter bandwidths would eventually be too narrow
for some talkers (e.g. high-pitched female
voices), i.e. no prominent harmonics would fall
within the band (in practical systems, $Q \le 32$).

61
Summary of considerations for speech recognition
filter banks