Title: Speech Recognition Chapter 3
1 Speech Recognition, Chapter 3
2 Speech Front-Ends
- Linear Prediction Analysis
- Linear-Prediction Based Processing
- Cepstral Analysis
- Auditory Signal Processing
3 Linear Prediction Analysis
- Introduction
- Linear Prediction Model
- Linear Prediction Coefficients Computation
- Linear Prediction for Automatic Speech Recognition
- Linear Prediction in Speech Processing
- How good is the LP Model?
4 Signal Processing Front End
Converts the speech waveform into some type of parametric representation.
[Block diagram: the speech samples s_k pass through the signal processing front end (either a filterbank or a linear prediction front end producing linear prediction coefficients) and come out as the observation sequence O = o(1) o(2) ... o(T).]
5 Introduction
- Over short intervals, linear prediction provides a good model of the speech signal.
- Mathematically precise and simple.
- Easy to implement in software or hardware.
- Works well for recognition applications.
- It also has applications in formant and pitch estimation, speech coding and synthesis.
6 Linear Prediction Model
- Basic idea: each speech sample is approximated as a linear combination of the M previous samples, s(n) ≈ sum_{k=1..M} a_k s(n-k).
- The a_k are called LP (Linear Prediction) coefficients.
- By including the excitation signal, we obtain s(n) = sum_{k=1..M} a_k s(n-k) + G u(n),
- where u(n) is the normalised excitation and G is the gain of the excitation.
7
- In the z-domain (Sec. 1.1.4, p. 15, Deller), S(z) = sum_{k=1..M} a_k z^{-k} S(z) + G U(z),
- leading to the transfer function H(z) = S(z) / (G U(z)) = 1 / (1 - sum_{k=1..M} a_k z^{-k}) = 1 / A(z) (Fig. 3.27).
8
- The LP model retains the spectral magnitude, but it is minimum phase (Sec. 1.1.7, Deller).
- However, in practice, phase is not very important for speech perception.
- Observation: H(z) also models the glottal filter G(z) and the lip radiation R(z).
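To make the all-pole model concrete, here is a minimal NumPy/SciPy sketch that synthesises one frame with H(z) = G / A(z) and then recovers the scaled excitation by inverse filtering with A(z). The resonance frequencies, pole radius, gain and impulse-train excitation are illustrative values, not taken from the chapter.

```python
import numpy as np
from scipy.signal import lfilter

# Hypothetical all-pole model: two resonances at 500 Hz and 1500 Hz,
# pole radius 0.95, sampling rate fs = 10 kHz (all values illustrative).
fs = 10_000
poles = []
for f in (500.0, 1500.0):
    z = 0.95 * np.exp(2j * np.pi * f / fs)
    poles += [z, np.conj(z)]
A = np.real(np.poly(poles))     # A(z) = 1 - a_1 z^-1 - ... - a_M z^-M
a = -A[1:]                      # LP coefficients a_k
G = 0.1                         # excitation gain

# Normalised excitation u(n): a 100 Hz impulse train over one 20 ms frame.
n = np.arange(int(0.02 * fs))
u = (n % (fs // 100) == 0).astype(float)

# Synthesis with H(z) = G / A(z):  s(n) = sum_k a_k s(n-k) + G u(n)
s = lfilter([G], A, u)

# Inverse filtering with A(z) recovers the scaled excitation exactly.
e = lfilter(A, [1.0], s)
print(np.allclose(e, G * u))    # True: the residual equals G u(n)
```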
9 Linear Prediction Coefficients Computation
- Introduction
- Methodologies
10 Linear Prediction Coefficients Computation
- LP coefficients can be obtained by solving the following system of equations, derived by minimising the mean squared prediction error (Sec. 3.3.2, proof).
11 Methodologies
- Autocorrelation Method
- Covariance Method (not commonly used in speech recognition)
12 Autocorrelation Method
- Assumption: each frame is independent (Fig. 3.29).
- Solution (Juang, Sec. 3.3.3, pp. 105-106): the LP coefficients satisfy
  sum_{k=1..M} a_k R(|i-k|) = R(i),  i = 1, ..., M,   (2)
- where R(i) = sum_{n=0..N-1-i} s(n) s(n+i) is the short-term autocorrelation of the frame.
- These equations are known as the Yule-Walker equations (a small sketch of their solution follows below).
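A minimal sketch of the autocorrelation method described above: compute the short-term autocorrelation R(0..M) of a windowed frame and solve the Toeplitz system (2) with SciPy. The test frame and the order M = 8 are made up for illustration.

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def autocorrelation_lpc(frame, M=8):
    """Solve the Yule-Walker equations sum_k a_k R(|i-k|) = R(i), i = 1..M."""
    N = len(frame)
    # Short-term autocorrelation R(0), ..., R(M)
    R = np.array([np.dot(frame[:N - i], frame[i:]) for i in range(M + 1)])
    # Toeplitz system: first column/row is R(0..M-1), right-hand side is R(1..M)
    a = solve_toeplitz((R[:M], R[:M]), R[1:])
    return a, R

# Illustrative frame: a decaying resonance plus a little noise
rng = np.random.default_rng(0)
n = np.arange(200)
frame = np.exp(-0.01 * n) * np.sin(2 * np.pi * 0.07 * n) + 0.01 * rng.standard_normal(200)
a, R = autocorrelation_lpc(frame * np.hamming(200), M=8)
print(a)
```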
14 Features of the autocorrelation matrix
- Symmetric.
- The diagonal elements are all the same.
- Toeplitz matrix.
15
- This matrix is known as a Toeplitz matrix. A linear system with such a matrix can be solved very efficiently.
- Examples (Fig. 3.32 and 3.33)
- Example (Fig. 3.34)
- Example (Fig. 3.35)
- Example (Fig. 3.36)
16 Linear Prediction for Automatic Speech Recognition
[Block diagram of the LPC front end: preemphasis (flattens the spectrum), frame blocking, windowing (to minimise signal discontinuity), autocorrelation analysis (equation (2), usually M = 8), LPC analysis (Durbin algorithm), conversion to cepstral coefficients, parameter weighting (to minimise noise sensitivity), and temporal derivatives (to incorporate signal dynamics).]
17 Preemphasis
- The transfer function of the glottis can be modelled as a two-pole low-pass filter with both poles close to z = 1, G(z) ≈ 1 / (1 - c z^{-1})^2 with c ≈ 1.
- The radiation effect of the lips can be modelled as a single zero, R(z) ≈ 1 - z^{-1}.
18 Hence, to obtain the transfer function of the vocal tract, the remaining glottal pole must be cancelled; this is done with a first-order preemphasis filter of the form P(z) = 1 - a z^{-1}, with a close to 1.
19 Preemphasis should be applied only to sonorant sounds.
This process can be automated by choosing the preemphasis coefficient from the frame itself, a = R(1)/R(0), where R(·) is the autocorrelation function (a sketch follows below).
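A small sketch of preemphasis, assuming the usual textbook choices: a fixed coefficient of 0.95, or an adaptive coefficient a = R(1)/R(0) computed from the frame itself.

```python
import numpy as np

def preemphasis(s, a=0.95):
    """First-order preemphasis filter y(n) = s(n) - a * s(n-1)."""
    return np.append(s[0], s[1:] - a * s[:-1])

def adaptive_preemphasis(s):
    """Set the coefficient from the frame itself: a = R(1)/R(0).

    For sonorant (strongly correlated) frames a is close to 1 and the spectrum
    is flattened; for unvoiced frames a is small and little preemphasis is applied.
    """
    R0 = np.dot(s, s)
    R1 = np.dot(s[:-1], s[1:])
    a = R1 / R0 if R0 > 0 else 0.0
    return preemphasis(s, a), a
```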
20 Frame blocking: frames of N samples, with a frame shift of M samples.
21
- Minimize signal discontinuities at the edges of the frames.
- A typical window is the Hamming window (sketch below).
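A short sketch of frame blocking and Hamming windowing; the values N = 200 and M = 80 (20 ms frames every 8 ms at a 10 kHz sampling rate) are typical, assumed values.

```python
import numpy as np

def frame_and_window(signal, N=200, M=80):
    """Split the signal into N-sample frames every M samples and apply a Hamming window."""
    num_frames = 1 + (len(signal) - N) // M     # assumes len(signal) >= N
    window = np.hamming(N)                      # tapers the frame edges
    frames = np.empty((num_frames, N))
    for t in range(num_frames):
        frames[t] = window * signal[t * M : t * M + N]
    return frames
```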
23 LPC Analysis
- Converts the autocorrelation coefficients into an LPC parameter set.
- LPC parameter set:
- LPC coefficients
- Reflection (PARCOR) coefficients
- Log area ratio coefficients
- The formal method for obtaining the LPC parameter set is known as Durbin's method.
24 Durbin's method
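A sketch of the Levinson-Durbin recursion, which converts the autocorrelation values R(0), ..., R(M) into LP coefficients, reflection (PARCOR) coefficients, and the final prediction error energy.

```python
import numpy as np

def durbin(R, M):
    """Levinson-Durbin recursion for the Yule-Walker equations.

    R : autocorrelation values R(0), ..., R(M)
    Returns (a, k, E): LP coefficients a_1..a_M, reflection (PARCOR)
    coefficients k_1..k_M, and the final prediction error energy E.
    """
    a = np.zeros(M + 1)                # a[i] holds a_i (a[0] unused)
    k = np.zeros(M + 1)
    E = R[0]                           # zeroth-order prediction error
    for i in range(1, M + 1):
        # Reflection coefficient for order i
        k[i] = (R[i] - np.dot(a[1:i], R[i - 1:0:-1])) / E
        a_prev = a.copy()
        a[i] = k[i]
        a[1:i] = a_prev[1:i] - k[i] * a_prev[i - 1:0:-1]
        E *= 1.0 - k[i] ** 2           # error energy shrinks at every order
    return a[1:], k[1:], E
```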
26 LPC (Typical values)
27 LPC Parameter Conversion
- Conversion to cepstral coefficients.
- Robust feature set for speech recognition.
- Algorithm (a sketch follows below).
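A sketch of the standard recursion for converting LP coefficients into LP cepstral coefficients. Producing Q > M coefficients is allowed; the c_0 term (related to the log gain) is left out.

```python
import numpy as np

def lpc_to_cepstrum(a, Q=12):
    """LP cepstral coefficients c_1..c_Q from LP coefficients a_1..a_M.

    Recursion: c_m = a_m + sum_{k=1..m-1} (k/m) c_k a_{m-k}, with a_j = 0 for j > M.
    """
    M = len(a)
    a_ext = np.zeros(Q)
    a_ext[:min(M, Q)] = np.asarray(a)[:min(M, Q)]   # a_ext[j-1] = a_j, zero beyond M
    c = np.zeros(Q)
    for m in range(1, Q + 1):
        c[m - 1] = a_ext[m - 1] + sum(
            (k / m) * c[k - 1] * a_ext[m - k - 1] for k in range(1, m)
        )
    return c
```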
28 Parameter Weighting
- The cepstral coefficients are weighted because the low-order coefficients are highly sensitive to the overall spectral slope and the high-order ones to noise.
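A sketch of cepstral weighting with the commonly used raised-sine lifter w_k = 1 + (Q/2) sin(πk/Q); this particular window is a standard choice and is assumed here rather than taken from the slide.

```python
import numpy as np

def lifter(c):
    """Weight cepstral coefficients c_1..c_Q with w_k = 1 + (Q/2) * sin(pi * k / Q)."""
    Q = len(c)
    k = np.arange(1, Q + 1)
    w = 1.0 + (Q / 2.0) * np.sin(np.pi * k / Q)
    return w * c
```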
29 Temporal Cepstral Derivative
- First- or second-order derivatives are enough.
- The derivative can be approximated as follows (sketch below).
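A sketch of the usual regression approximation of the first-order temporal derivative over a window of 2K + 1 frames; K = 2 is a typical, assumed value.

```python
import numpy as np

def delta(cepstra, K=2):
    """First-order temporal derivative of a cepstral trajectory.

    cepstra : array of shape (T, Q), one cepstral vector per frame.
    Returns delta_c(t) = sum_{k=1..K} k (c(t+k) - c(t-k)) / (2 sum_{k=1..K} k^2),
    with the first and last frames repeated at the edges.
    """
    cepstra = np.atleast_2d(np.asarray(cepstra, dtype=float))
    T = len(cepstra)
    padded = np.concatenate((np.repeat(cepstra[:1], K, axis=0),
                             cepstra,
                             np.repeat(cepstra[-1:], K, axis=0)))
    norm = 2.0 * sum(k * k for k in range(1, K + 1))
    d = np.zeros_like(cepstra)
    for k in range(1, K + 1):
        d += k * (padded[K + k:K + k + T] - padded[K - k:K - k + T])
    return d / norm
```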
32 Given signal.
33 Hamming windowed. Large prediction errors occur because the speech is predicted from previous samples that are arbitrarily set to zero.
34 Large prediction errors occur because the speech is predicted from previous samples that are arbitrarily set to zero.
35 Unvoiced signals are not position sensitive; they do not show any special effect at the frame edges.
36 Observe the whitening phenomenon in the error spectrum.
37 Observe the whitening phenomenon in the error spectrum.
38 Observe the periodic behaviour of the error waveform, which is taken as the basis for pitch estimators.
39
- Observe that a sharp decrease in the prediction error is obtained for small values of M (M = 1, ..., 4).
- Observe that the unvoiced signal has a higher RMS error.
40 Observe the ability of the all-pole model to match the spectrum.
41 Linear Prediction in Speech Processing
- LPC for Vocal Tract Shape Estimation
- LPC for Pitch Detection
- LPC for Formant Detection
42 LPC for Vocal Tract Shape Estimation
[Block diagram, analogous to the recognition front end: preemphasis (to make the signal free of glottis and radiation effects), windowing (to minimise signal discontinuity), parameter calculation, conversion to cepstral coefficients with weighting (to minimise noise sensitivity), and vocal tract shape estimation.]
43 Parameter Calculation
- Durbin's method (as in speech recognition)
- If this method is used, the autocorrelation analysis must be performed first.
- Lattice filter
44 Lattice Filter
- The reflection coefficients are obtained directly from the signal, avoiding the autocorrelation analysis.
- Methods
- Itakura-Saito (PARCOR)
- Burg
- New forms
- Advantage
- Easier to implement in hardware
- Disadvantage
- Needs around 5 times more computation.
45 Itakura-Saito (PARCOR)
At each stage the reflection coefficient is the normalised cross-correlation between the forward and backward prediction errors, where the correlation accumulates over time (n).
It can be shown that the PARCOR coefficients obtained by the Itakura-Saito method are exactly the same as the reflection coefficients obtained by the Levinson-Durbin algorithm.
Example
46 Burg
At each stage the reflection coefficient is chosen to minimise the sum of the forward and backward prediction error energies (a sketch follows below).
Example
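A sketch of Burg's method as it is usually formulated: at each stage the reflection coefficient minimises the sum of the forward and backward prediction error energies, and the predictor coefficients are updated with the same recursion as Levinson-Durbin. Written here for illustration; details may differ from the lecture's derivation.

```python
import numpy as np

def burg(x, M):
    """Burg's lattice method: reflection and LP coefficients directly from the signal."""
    f = np.asarray(x, dtype=float)      # forward prediction error
    b = f.copy()                        # backward prediction error
    k = np.zeros(M)                     # reflection (PARCOR) coefficients
    a = np.zeros(M + 1)                 # predictor coefficients a_1..a_M (a[0] unused)
    for m in range(1, M + 1):
        fm, bm = f[1:], b[:-1]          # align f_{m-1}(n) with b_{m-1}(n-1)
        # k_m minimises the sum of forward and backward error energies
        k[m - 1] = 2.0 * np.dot(fm, bm) / (np.dot(fm, fm) + np.dot(bm, bm))
        # Lattice error update
        f, b = fm - k[m - 1] * bm, bm - k[m - 1] * fm
        # Same order-update as Levinson-Durbin
        a_prev = a.copy()
        a[m] = k[m - 1]
        a[1:m] = a_prev[1:m] - k[m - 1] * a_prev[m - 1:0:-1]
    return k, a[1:]
```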
47 Example (figures): Itakura-Saito and Burg.
48 New Forms
- Strobach, "New forms of Levinson and Schur algorithms", IEEE Signal Processing Magazine, pp. 12-36, 1991.
49 Vocal Tract Shape Estimation
From the relation between the reflection coefficients and the areas of adjacent tube sections, k_m = (A_{m+1} - A_m) / (A_{m+1} + A_m),
we obtain A_m = A_{m+1} (1 - k_m) / (1 + k_m).
Therefore, by setting the lips area to an arbitrary value we can obtain the vocal tract configuration relative to this initial condition.
This technique has been successfully used to train deaf persons.
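A sketch of the area-function recursion implied above, assuming the convention k_m = (A_{m+1} - A_m) / (A_{m+1} + A_m); other texts use the opposite sign, in which case the ratio is inverted.

```python
import numpy as np

def vocal_tract_areas(k, lip_area=1.0):
    """Relative vocal tract area function from reflection coefficients.

    Assumes k_m = (A_{m+1} - A_m) / (A_{m+1} + A_m), so that
    A_m = A_{m+1} * (1 - k_m) / (1 + k_m).  The lip area is set arbitrarily;
    only the shape (the ratios between sections) is meaningful.
    """
    M = len(k)
    areas = np.zeros(M + 1)
    areas[-1] = lip_area                      # section at the lips (arbitrary reference)
    for m in range(M - 1, -1, -1):
        areas[m] = areas[m + 1] * (1.0 - k[m]) / (1.0 + k[m])
    return areas                              # areas from the glottis end to the lips
```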
50 LPC for Pitch Detection
[Block diagram: speech sampled at 10 kHz, low-pass filter at 800 Hz, downsampler 5:1, LPC analysis, inverse filtering with A(z), autocorrelation, peak finding, V/UV decision and pitch.]
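A rough sketch of the pipeline in the diagram above (a SIFT-style detector). The filter design, the order-4 LPC analysis, the pitch range and the voiced/unvoiced threshold are illustrative choices, not values from the chapter.

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import butter, lfilter

def sift_pitch(frame, fs=10_000, fmin=60.0, fmax=400.0, vuv_threshold=0.3):
    """SIFT-style pitch estimate for one frame; returns f0 in Hz, or 0.0 if unvoiced.

    A frame of 30-40 ms works better than 20 ms for low-pitched voices.
    """
    # 1. Low-pass filter at 800 Hz and downsample 5:1 (10 kHz -> 2 kHz).
    b, a = butter(4, 800.0 / (fs / 2.0))
    x = lfilter(b, a, frame)[::5]
    fs_d = fs // 5
    # 2. Low-order LPC analysis (order 4) via the autocorrelation method.
    M = 4
    R = np.array([np.dot(x[:len(x) - i], x[i:]) for i in range(M + 1)])
    lpc = solve_toeplitz((R[:M], R[:M]), R[1:])
    # 3. Inverse filter with A(z) to obtain the spectrally flat residual.
    e = lfilter(np.concatenate(([1.0], -lpc)), [1.0], x)
    # 4. Autocorrelation of the residual and peak search in the pitch range.
    r = np.correlate(e, e, mode="full")[len(e) - 1:]
    lo, hi = int(fs_d / fmax), int(fs_d / fmin)
    lag = lo + int(np.argmax(r[lo:hi]))
    # 5. Voiced/unvoiced decision from the normalised autocorrelation peak.
    return fs_d / lag if r[lag] > vuv_threshold * r[0] else 0.0
```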
51 LPC for Formant Detection
[Block diagram: sampled speech, LPC analysis, LPC spectrum, peak emphasis (second derivative), peak finding, formants.]
52 LPC Spectrum
- LP assumes that the vocal tract system can be modelled with an all-pole system, H(z) = G / A(z).
- The spectrum can be obtained by evaluating H(z) on the unit circle, z = e^{jω}.
- In order to emphasise formant peaks we can set ...
53 Therefore
- Spectrum (DTFT): |H(e^{jω})| = G / |A(e^{jω})|.
- Spectrum (DFT): evaluate A(z) at N equally spaced points on the unit circle by taking the DFT of the coefficient sequence (1, -a_1, ..., -a_M).
- In order to increase the spectral resolution we pad this sequence with zeros.
- In order to use an FFT algorithm, the padded length is chosen as a power of two.
54
- Calculate the spectral magnitude |A(k)| with a DFT of the zero-padded coefficient sequence.
- Invert the spectral magnitude to obtain G / |A(k)|.
- This spectrum is called the LPC spectrum (a sketch follows below).
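A minimal sketch of the LPC spectrum computation just described: zero-pad the inverse-filter coefficients (1, -a_1, ..., -a_M) to the FFT length, take the magnitude of the FFT, and invert it.

```python
import numpy as np

def lpc_spectrum(a, G=1.0, nfft=256):
    """LPC spectrum |H(e^{jw})| = G / |A(e^{jw})| on an nfft-point frequency grid."""
    A = np.zeros(nfft)
    A[0] = 1.0
    A[1:len(a) + 1] = -np.asarray(a)          # inverse filter A(z) = 1 - sum a_k z^-k
    spectrum = G / np.abs(np.fft.rfft(A))     # invert the spectral magnitude
    return spectrum                           # nfft//2 + 1 points from 0 to fs/2
```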
55 How good is the LP Model?
- As shown by the physiological analysis of the vocal tract, the speech model is in general a pole-zero system.
- However, it can be shown (see the proof below) that the LP model is good for estimating the magnitude of a pole-zero system.
56 Proof
- According to Lemma 1 and Lemma 2, the pole-zero system can be written as the product of an all-pole (minimum-phase) component and an all-pass component.
- The LP estimates are calculated so that they correspond to the all-pole component of this model.
57
- Since the all-pass component has unit magnitude, the magnitude of the full system equals that of its minimum-phase, all-pole component.
- Therefore, if the estimates are exact, then at least we obtain a model with the correct magnitude.
58 Lemma 1
- Lemma 1 (System Decomposition)
- Any causal rational system
- can be decomposed into the product of a minimum-phase component and an all-pass component (proof below).
59 Proof
For the case of two poles and two zeros,
let us define
Re-arranging this equation,
60 With the knowledge that
Hence
61 Therefore
End of proof.
62 Lemma 2
- Lemma 2: The minimum-phase component can be expressed as an all-pole system.
- In theory the model order goes to infinity; in practice it is limited.
63 Linear Prediction Based Processing
- Criticisms of the Linear Prediction Model
- Perceptual Linear Prediction (PLP)
- LP Cepstra
64 Criticisms of the Linear Prediction Model
- The LP spectrum approximates the speech spectrum equally well at all frequencies of the analysis band.
- This property is inconsistent with human hearing.
65 Perceptual Linear Prediction (PLP)
[Block diagram: critical-band spectral analysis, equal-loudness pre-emphasis, intensity-loudness power law, IDFT, solution of the Yule-Walker equations.]
66 Critical Band Analysis
[Block diagram: speech signal frame, windowing (20 ms Hamming window), DFT (200 samples plus 56 zeros of padding for fs = 10 kHz), short-term spectra, critical-band spectral resolution.]
67 Critical-Band Spectral Resolution
- Frequency warping (Hertz -> Bark)
- Convolution with a filter-bank approximation of the masking curve, followed by downsampling
68 Equal Loudness Pre-emphasis
Approximates the unequal sensitivity of human hearing at different frequencies.
69 Intensity-Loudness Power Law
Approximates the non-linear relation between the intensity of a sound and its perceived loudness (sketch below).
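A small sketch of two of the perceptual operations above: the Hertz-to-Bark warping used in the critical-band analysis and the cube-root intensity-to-loudness compression. The specific warping formula and the 0.33 exponent follow Hermansky's PLP formulation and are assumed here.

```python
import numpy as np

def hz_to_bark(f_hz):
    """Hertz -> Bark warping used in PLP (Hermansky): Bark = 6 * asinh(f / 600)."""
    return 6.0 * np.arcsinh(np.asarray(f_hz) / 600.0)

def intensity_to_loudness(power_spectrum):
    """Cube-root intensity-loudness power law: perceived loudness ~ intensity ** 0.33."""
    return np.power(power_spectrum, 0.33)
```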
70 Cepstral Analysis
- Introduction
- Homomorphic Processing
- Cepstral Spectrum
- Cepstrum
- Mel-Cepstrum
- Cepstrum in Speech Processing
71 Introduction
When speech is pre-emphasised, the excitation is not necessary for estimating the vocal tract function.
Therefore, it is desirable to separate the excitation information from the vocal tract information.
72 If we think of the speech spectrum as a signal, we can observe that it is composed of the product of a slowly varying signal (the vocal tract envelope) and a rapidly varying signal (the excitation).
Therefore, we can try to take advantage of this structure. The formal technique which exploits this feature is called homomorphic processing.
73 Homomorphic Processing
- It is a technique for filtering non-linearly combined signals.
- In homomorphic processing, the non-linearly related signals are transformed to a linear (additive) domain, filtered there, and transformed back.
[Block diagram: F(z), H, H^{-1}.]
74 In order to obtain a linear combination, a complex logarithm transformation is applied to the speech spectrum.
[Block diagram: S(z), log, exp.]
75 Cepstral Spectrum
Definition: the cepstral (log) spectrum is log|S(ω, t)|, where S(ω, t) is the STFT.
76 Cepstrum
Definition: the (real) cepstrum is the inverse Fourier transform of the log magnitude spectrum, c(n, t) = IDFT{ log|S(k, t)| } (a sketch follows below).
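A minimal sketch of the real cepstrum of one frame: the inverse FFT of the log magnitude spectrum. A small floor is added before the logarithm to avoid log(0).

```python
import numpy as np

def real_cepstrum(frame, nfft=512):
    """Real cepstrum: c(n) = IDFT{ log|DFT{s(n)}| }."""
    spectrum = np.abs(np.fft.fft(frame, nfft))
    log_spectrum = np.log(spectrum + 1e-12)       # cepstral (log) spectrum
    return np.real(np.fft.ifft(log_spectrum))     # quefrency-domain signal
```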
77 Cepstrum in Speech Processing
- Pitch Estimation
- Formant Estimation
- Pitch and Formant Estimation
78 Pitch Estimation
[Block diagram: sampled speech, cepstrum, high-pass liftering, peak emphasis (second derivative), peak finding, pitch.]
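A sketch of the pitch estimator in the diagram above: compute the cepstrum, keep only the high-quefrency region (high-pass liftering), and pick the dominant peak. The pitch range of 60-400 Hz and the FFT length are illustrative.

```python
import numpy as np

def cepstral_pitch(frame, fs=10_000, fmin=60.0, fmax=400.0, nfft=1024):
    """Estimate f0 (Hz) from the dominant high-quefrency cepstral peak."""
    spectrum = np.abs(np.fft.fft(frame * np.hamming(len(frame)), nfft))
    cepstrum = np.real(np.fft.ifft(np.log(spectrum + 1e-12)))
    # High-pass liftering: keep only quefrencies in the plausible pitch-period range
    lo, hi = int(fs / fmax), int(fs / fmin)
    peak = lo + int(np.argmax(cepstrum[lo:hi]))
    return fs / peak
```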
79 Formant Estimation
[Block diagram: sampled speech, cepstrum, low-pass liftering, peak emphasis (second derivative), peak finding, formants.]
80 Pitch and Formant Estimation
[Block diagram: sampled speech, cepstrum; a high-pass liftered branch (peak emphasis, peak finding) gives the pitch, and a low-pass liftered branch (peak emphasis, peak finding) gives the formants.]