Speech Signal Representations I

1
Speech Signal Representations I

Seminar Speech Recognition 2002
F.R. Verhage

2
Speech Signal Representations I

Decomposition of the speech signal (x[n]) as a source (e[n]) passed through a linear time-varying filter (h[n]).
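
The decomposition can be illustrated with a minimal sketch. It assumes the filter stays fixed over a short segment, so that x[n] is the convolution of e[n] with h[n]; the impulse-train excitation and the three-tap impulse response are made-up toy values, not taken from the slides.

```python
def convolve(e, h):
    """Full discrete convolution: x[n] = sum_k h[k] * e[n - k]."""
    x = [0.0] * (len(e) + len(h) - 1)
    for n, e_n in enumerate(e):
        for k, h_k in enumerate(h):
            x[n + k] += e_n * h_k
    return x

# Toy voiced excitation (assumption): one impulse every 4 samples.
e = [1.0 if n % 4 == 0 else 0.0 for n in range(12)]
# Toy filter impulse response (assumption): short decaying response.
h = [1.0, 0.5, 0.25]

x = convolve(e, h)
print(x[:8])  # each impulse in e is replaced by a copy of h
```

A recognizer is interested in h[n] (the filter) and tries to discard e[n] (the source), which is what the estimation methods on the following slides aim at.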

3
Speech Signal Representations I

Estimation of the filter, inspired by
  Speech production models
    Linear Predictive Coding (LPC)
    Cepstral analysis
  Speech perception models (part II)
    Mel-frequency cepstrum
    Perceptual Linear Prediction (PLP)
Speech recognizers estimate filter characteristics and ignore the source

4
Speech Signal Representations I: Short-Time Fourier Analysis

Spectrogram
Representation of a signal highlighting several of its properties, based on short-time Fourier analysis
Two dimensions: time (horizontal) and frequency (vertical)
Third dimension: gray or color level indicating energy

5
Speech Signal Representations I: Short-Time Fourier Analysis

Spectrogram
Narrow band
  Long windows (> 20 ms) → narrow bandwidth
  Lower time resolution, better frequency resolution
Wide band
  Short windows (< 10 ms) → wide bandwidth
  Good time resolution, lower frequency resolution
Pitch synchronous
  Requires knowledge of local pitch period
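
The trade-off between the two spectrogram types can be made concrete with a small sketch. It computes the short-time energy per frame and DFT bin for a long (30 ms) and a short (5 ms) window; the 8 kHz sampling rate, the 1 kHz test tone, the 10 ms hop and the choice of a Hamming window are assumptions picked for illustration only.

```python
import cmath
import math

def hamming(N):
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]

def spectrogram(x, frame_len, hop):
    """Energy |X_m[k]|^2 per frame m (time axis) and DFT bin k (frequency axis)."""
    w = hamming(frame_len)
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):
        seg = [x[start + n] * w[n] for n in range(frame_len)]
        energies = []
        for k in range(frame_len // 2 + 1):
            X_k = sum(s * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n, s in enumerate(seg))
            energies.append(abs(X_k) ** 2)
        frames.append(energies)
    return frames

fs = 8000                                                        # assumed sampling rate (Hz)
x = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(800)]  # 100 ms, 1 kHz tone

narrow = spectrogram(x, frame_len=240, hop=80)   # 30 ms window: narrow band
wide = spectrogram(x, frame_len=40, hop=80)      # 5 ms window: wide band

# Spacing between DFT bins is fs / frame_len: fine for long windows, coarse for short ones.
print(len(narrow[0]), "bins,", fs / 240, "Hz apart")
print(len(wide[0]), "bins,", fs / 40, "Hz apart")
```

The long window separates frequencies far more finely, but each frame averages over 30 ms of signal; the short window does the opposite.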

6
Speech Signal Representations I: Short-Time Fourier Analysis

Spectrogram

7
Speech Signal Representations I: Short-Time Fourier Analysis

Window analysis
Series of short segments (analysis frames)
Short enough that the signal is approximately stationary within a frame
Frame length usually constant, 20-30 ms
Frames may overlap
Different types of window functions (w_m[n])
  Rectangular (equivalent to applying no window function)
  Hamming
  Hanning
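
As a rough sketch of the framing step: the signal is cut into overlapping frames and each frame is multiplied by a window function. The 25 ms frame length, 10 ms hop and 8 kHz sampling rate are assumed values within the range given above, and the function names are hypothetical.

```python
import math

def rectangular(N):
    return [1.0] * N

def hamming(N):
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]

def hanning(N):
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]

def analysis_frames(x, fs, frame_ms=25.0, hop_ms=10.0, window=hamming):
    """Split x into overlapping frames and apply the window to each frame."""
    N = int(round(fs * frame_ms / 1000.0))     # frame length in samples
    hop = int(round(fs * hop_ms / 1000.0))     # frame shift in samples
    w = window(N)
    return [[x[start + n] * w[n] for n in range(N)]
            for start in range(0, len(x) - N + 1, hop)]

fs = 8000                      # assumed sampling rate (Hz)
x = [1.0] * fs                 # one second of a constant test signal
out = analysis_frames(x, fs)
print(len(out), len(out[0]))   # number of frames, samples per frame
```

With a constant input, each output frame is simply the window shape itself, which makes the tapering at the frame edges easy to inspect.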

8
Speech Signal Representations I: Short-Time Fourier Analysis

Window analysis
Window size N must be long enough relative to the pitch period M
  Rectangular: N ≥ M
  Hamming, Hanning: N ≥ 2M
Pitch period not known in advance → prepare for the lowest pitch (longest pitch period) → at least 20 ms for rectangular or 40 ms for Hamming/Hanning (50 Hz)
But longer windows give a more averaged spectrum instead of distinct spectra → rectangular window has better time resolution
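
The 20 ms and 40 ms figures follow directly from the pitch period at 50 Hz. A small sketch of the arithmetic, where the function name and the 16 kHz sampling rate are assumptions for illustration:

```python
def min_window_length(lowest_pitch_hz, fs, window="rectangular"):
    """Minimum window length as (milliseconds, samples).

    Uses N >= M for a rectangular window and N >= 2M for Hamming/Hanning,
    where M is the longest expected pitch period.
    """
    period_ms = 1000.0 / lowest_pitch_hz          # longest pitch period
    factor = 1 if window == "rectangular" else 2
    length_ms = factor * period_ms
    return length_ms, int(round(length_ms * fs / 1000.0))

print(min_window_length(50, 16000))               # (20.0, 320)
print(min_window_length(50, 16000, "hamming"))    # (40.0, 640)
```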

16
Speech Signal Representations I: Short-Time Fourier Analysis

Window analysis
Frequency response not completely zero outside main lobe → spectral leakage
Second lobe of a Hamming window is approx. 43 dB below main lobe → less spectral leakage
Hamming, Hanning, triangular windows offer less spectral leakage → rectangular windows are rarely used despite their better time resolution
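
The leakage difference can be checked numerically. The sketch below evaluates the magnitude response of a window on a dense frequency grid, walks down the main lobe to its first minimum, and reports the highest remaining lobe relative to the main-lobe peak. The window length of 64 samples and the 1024-point grid are arbitrary assumptions; the exact figures vary slightly with both.

```python
import cmath
import math

def peak_sidelobe_db(w, nfft=1024):
    """Highest sidelobe of window w relative to its main-lobe peak, in dB."""
    mag = [abs(sum(w_n * cmath.exp(-2j * math.pi * k * n / nfft)
                   for n, w_n in enumerate(w)))
           for k in range(nfft // 2 + 1)]
    k = 0
    while mag[k + 1] < mag[k]:      # walk down the main lobe to its first minimum
        k += 1
    return 20 * math.log10(max(mag[k:]) / mag[0])

N = 64
rect = [1.0] * N
ham = [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]

rect_db = peak_sidelobe_db(rect)
ham_db = peak_sidelobe_db(ham)
print(round(rect_db, 1), round(ham_db, 1))
```

The rectangular window should come out at roughly 13 dB below the main lobe, and the Hamming window close to the 43 dB quoted above.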

21
Speech Signal Representations I: Short-Time Fourier Analysis

Short-time spectrum of male voice speech
Time signal /ah/, local pitch 110 Hz
30 ms rectangular window
15 ms rectangular window
30 ms Hamming window
15 ms Hamming window

22
Speech Signal Representations I: Short-Time Fourier Analysis

Short-time spectrum of female voice speech
Time signal /aa/, local pitch 200 Hz
30 ms rectangular window
15 ms rectangular window
30 ms Hamming window
15 ms Hamming window

23
Speech Signal Representations I: Short-Time Fourier Analysis

Short-time spectrum of unvoiced speech
Time signal
30 ms rectangular window
15 ms rectangular window
30 ms Hamming window
15 ms Hamming window

24
Speech Signal Representations I: Linear Predictive Coding

LPC, a.k.a. auto-regressive (AR) modeling
An all-pole filter is a good approximation of speech, with p as the order of the LPC analysis
Predicts the current sample as a linear combination of the past p samples
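
Written out, the model reads as follows. The equations were not preserved in the transcript, so this is the standard textbook form; the symbols a_k for the predictor coefficients and the tilde for the predicted sample are assumed notation.

```latex
% All-pole filter of order p
H(z) = \frac{X(z)}{E(z)} = \frac{1}{1 - \sum_{k=1}^{p} a_k z^{-k}}

% Prediction of the current sample from the past p samples
\tilde{x}[n] = \sum_{k=1}^{p} a_k \, x[n-k]

% Prediction error
e[n] = x[n] - \tilde{x}[n] = x[n] - \sum_{k=1}^{p} a_k \, x[n-k]
```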

25
Speech Signal Representations I: Linear Predictive Coding

To estimate the predictor coefficients (a_k), use a short-term analysis technique
Per segment, minimize the total squared prediction error
Take the derivative with respect to each a_k and equate it to 0; this gives a set of p linear equations, the Yule-Walker equations
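
The step from the squared error to the linear equations can be sketched as follows. The slide's own equations did not survive in the transcript, so this is the usual derivation; the segment index m and the correlation symbol \phi_m are assumed notation.

```latex
% Total squared prediction error over segment m
E_m = \sum_{n} e_m^2[n]
    = \sum_{n} \Bigl( x_m[n] - \sum_{j=1}^{p} a_j \, x_m[n-j] \Bigr)^{2}

% Setting the derivative with respect to each a_i to zero
\frac{\partial E_m}{\partial a_i} = 0
\;\Longrightarrow\;
\sum_{n} x_m[n-i] \Bigl( x_m[n] - \sum_{j=1}^{p} a_j \, x_m[n-j] \Bigr) = 0,
\qquad i = 1, \dots, p

% With \phi_m[i,j] = \sum_n x_m[n-i] \, x_m[n-j], the p linear equations are
\sum_{j=1}^{p} a_j \, \phi_m[i,j] = \phi_m[i,0],
\qquad i = 1, \dots, p
```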

26
Speech Signal Representations I: Linear Predictive Coding

Solution of the Yule-Walker equations
Any standard matrix inversion package
Due to the special form of the matrix, efficient solutions exist
  Covariance method: uses the Cholesky decomposition
  Autocorrelation method: uses windows, results in equations with Toeplitz matrices, solved by the Durbin recursion algorithm
  Lattice method: equivalent to the Levinson-Durbin recursion; often used in fixed-point implementations because lack of precision doesn't result in unstable filters
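
For the autocorrelation method, the recursion can be sketched in a few lines. This is a generic textbook form of the Levinson-Durbin recursion, not code from the presentation; the function names are hypothetical and the predictor convention is x̃[n] = Σ a_k x[n-k].

```python
def autocorrelation(x, p):
    """r[k] = sum_n x[n] * x[n + k] for k = 0..p (x is one windowed frame)."""
    return [sum(x[n] * x[n + k] for n in range(len(x) - k)) for k in range(p + 1)]

def levinson_durbin(r, p):
    """Solve the Toeplitz normal equations for predictor coefficients a_1..a_p.

    Returns (a, E): the coefficients and the final prediction error energy.
    """
    a = [0.0] * (p + 1)            # a[0] is unused
    E = r[0]
    for i in range(1, p + 1):
        # Reflection coefficient for order i
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / E
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        E *= (1.0 - k * k)         # error energy shrinks (or stays equal) at each order
    return a[1:], E

# Autocorrelation of an ideal first-order process with coefficient 0.5 (toy values).
a, E = levinson_durbin([1.0, 0.5, 0.25], p=2)
print(a, E)   # [0.5, 0.0] 0.75
```

The intermediate values k are the reflection coefficients used by the lattice formulation; keeping each of them below 1 in magnitude is what keeps the resulting filter stable.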

29
Speech Signal Representations I: Linear Predictive Coding

Spectral analysis via LPC
All-pole (IIR) filter
Spectral peaks near the roots of the denominator (the poles)
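
A small sketch of how the LPC spectrum follows from the coefficients: the magnitude response is evaluated on the unit circle, and the peak lands near the angle of the pole. The two coefficients below are constructed by hand from an assumed pole pair at radius 0.95 and angle π/4; they are not estimated from speech.

```python
import cmath
import math

def lpc_spectrum(a, gain=1.0, n_points=256):
    """|H(e^{jw})| = gain / |1 - sum_k a_k e^{-jwk}| for w in [0, pi)."""
    mags = []
    for i in range(n_points):
        w = math.pi * i / n_points
        denom = 1.0 - sum(a_k * cmath.exp(-1j * w * (k + 1))
                          for k, a_k in enumerate(a))
        mags.append(gain / abs(denom))
    return mags

# Assumed pole pair r * e^{+-j*theta}, giving denominator 1 - 2r cos(theta) z^-1 + r^2 z^-2.
r, theta = 0.95, math.pi / 4
a = [2 * r * math.cos(theta), -r * r]

mags = lpc_spectrum(a)
peak = max(range(len(mags)), key=mags.__getitem__)
print(peak, "of", len(mags))   # theta = pi/4 corresponds to index 64 on this grid
```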

30
Speech Signal Representations I: Linear Predictive Coding

Prediction error
Should be (approximately) the excitation
  Unvoiced speech, expect white noise: OK
  Voiced speech, expect impulse train: NOK
    All-pole assumption not altogether valid
    Real speech not perfectly periodic
    Pitch synchronous analysis gives better results
LPC order
  Larger p gives lower prediction errors
  Too large a p results in fitting the individual harmonics → separation between filter and source will not be so good
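
In the idealized case the relation between prediction error and excitation is exact, which the sketch below illustrates: an impulse train is passed through a known all-pole filter, and inverse filtering with the same coefficients returns the impulse train. The coefficients and the 20-sample pitch period are made-up values; with real voiced speech the coefficients are estimates and the residual is less clean, for the reasons listed above.

```python
def all_pole_filter(e, a):
    """Synthesis: x[n] = e[n] + sum_k a_k * x[n - k]."""
    x = []
    for n, e_n in enumerate(e):
        past = sum(a_k * x[n - 1 - k] for k, a_k in enumerate(a) if n - 1 - k >= 0)
        x.append(e_n + past)
    return x

def prediction_error(x, a):
    """Analysis (inverse filter): e[n] = x[n] - sum_k a_k * x[n - k]."""
    return [x[n] - sum(a_k * x[n - 1 - k] for k, a_k in enumerate(a) if n - 1 - k >= 0)
            for n in range(len(x))]

a = [1.3, -0.8]                                          # toy stable all-pole filter
e = [1.0 if n % 20 == 0 else 0.0 for n in range(100)]    # toy impulse-train excitation

x = all_pole_filter(e, a)
residual = prediction_error(x, a)
print(max(abs(r_n - e_n) for r_n, e_n in zip(residual, e)))  # close to zero
```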

31
Speech Signal Representations I: Linear Predictive Coding