Speech Signal Representations I - PowerPoint PPT Presentation

Slides: 36
Provided by: FRVer3
Transcript and Presenter's Notes

Title: Speech Signal Representations I


1
Speech Signal Representations I
  • Seminar Speech Recognition 2002
  • F.R. Verhage

2
Speech Signal Representations I
  • Decomposition of the speech signal (x[n]) as a
    source (e[n]) passed through a linear
    time-varying filter (h[n]).
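
A minimal sketch of this decomposition for a single short frame, in which the filter is treated as fixed. The sampling rate, the 100 Hz pitch and the toy 500 Hz resonance are illustrative assumptions, not values from the slides.

```python
import numpy as np

fs = 8000                      # sampling rate in Hz (assumed)
n = np.arange(int(0.05 * fs))  # 50 ms of sample indices

# Source e[n]: impulse train with a 100 Hz pitch (one impulse every 80 samples).
e = np.zeros(len(n))
e[:: fs // 100] = 1.0

# Filter h[n]: one decaying resonance near 500 Hz (a toy vocal tract response).
k = np.arange(200)
h = np.exp(-k / 40.0) * np.cos(2 * np.pi * 500 * k / fs)

# Speech-like signal x[n]: the source convolved with the filter.
x = np.convolve(e, h)[: len(n)]
```

Each impulse in e[n] starts a new copy of h[n], so x[n] is a sum of shifted filter responses.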

3
Speech Signal Representations I
  • Estimation of the filter, inspired by
      • Speech production models
          • Linear Predictive Coding (LPC)
          • Cepstral analysis
      • Speech perception models (part II)
          • Mel-frequency cepstrum
          • Perceptual Linear Prediction (PLP)
  • Speech recognizers estimate filter
    characteristics and ignore the source

4
Speech Signal Representations I: Short-Time Fourier Analysis
  • Spectrogram
      • Representation of a signal highlighting several
        of its properties based on short-time Fourier
        analysis
      • Two-dimensional: time horizontal and frequency
        vertical
      • Third dimension: gray or color level indicating
        energy

5
Speech Signal Representations I: Short-Time Fourier Analysis
  • Spectrogram
      • Narrow band
          • Long windows (> 20 ms) → narrow bandwidth
          • Lower time resolution, better frequency
            resolution
      • Wide band
          • Short windows (< 10 ms) → wide bandwidth
          • Good time resolution, lower frequency resolution
      • Pitch synchronous
          • Requires knowledge of local pitch period
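
The trade-off between narrow-band and wide-band analysis can be sketched with a bare-bones short-time Fourier transform. The helper name `stft_magnitude`, the 16 kHz sampling rate, the 5 ms hop and the 1 kHz test tone are assumptions made for illustration only.

```python
import numpy as np

def stft_magnitude(x, fs, win_ms, hop_ms=5.0):
    """Magnitude spectrogram: rows are frames (time), columns are frequency bins."""
    win_len = int(round(fs * win_ms / 1000))
    hop = int(round(fs * hop_ms / 1000))
    window = np.hamming(win_len)
    frames = [x[s:s + win_len] * window
              for s in range(0, len(x) - win_len + 1, hop)]
    return np.abs(np.fft.rfft(np.array(frames), axis=1))

fs = 16000
t = np.arange(fs) / fs                # one second of samples
x = np.sin(2 * np.pi * 1000 * t)      # 1 kHz test tone

narrow = stft_magnitude(x, fs, win_ms=30)  # long window: bins fs/480 Hz apart
wide = stft_magnitude(x, fs, win_ms=5)     # short window: bins fs/80 = 200 Hz apart
```

The long window yields many closely spaced frequency bins per frame, while the short window yields few, widely spaced bins but each frame covers a much shorter stretch of time.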

6
Speech Signal Representations I: Short-Time Fourier Analysis
  • Spectrogram

7
Speech Signal Representations I: Short-Time Fourier Analysis
  • Window analysis
      • Series of short segments, analysis frames
      • Short enough so that the signal is stationary
      • Usually constant, 20-30 ms
      • Overlaps possible
  • Different types of window functions (w_m[n])
      • Rectangular (equal to no window function)
      • Hamming
      • Hanning
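
The three window functions can be written out directly. This is a sketch using the common symmetric definitions, which are also the ones NumPy implements; the function names and the 25 ms frame at 16 kHz are illustrative assumptions.

```python
import numpy as np

def rectangular(N):
    # All ones: multiplying by it is the same as applying no window.
    return np.ones(N)

def hamming(N):
    n = np.arange(N)
    return 0.54 - 0.46 * np.cos(2 * np.pi * n / (N - 1))

def hanning(N):
    n = np.arange(N)
    return 0.5 - 0.5 * np.cos(2 * np.pi * n / (N - 1))

# Cut one 25 ms analysis frame out of a signal and window it.
fs = 16000
N = int(0.025 * fs)
signal = np.random.default_rng(0).standard_normal(fs)
frame = signal[1000:1000 + N] * hamming(N)
```

The Hamming window tapers to 0.08 at its edges and the Hanning window to 0, whereas the rectangular window cuts the signal off abruptly.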

8
Speech Signal Representations I: Short-Time Fourier Analysis
  • Window analysis
      • Window size must be long enough
        (N: window length, M: pitch period)
          • Rectangular: N ≥ M
          • Hamming, Hanning: N ≥ 2M
      • Pitch period not known in advance →
        prepare for lowest pitch →
        at least 20 ms for rectangular or 40 ms for
        Hamming/Hanning (lowest pitch 50 Hz)
      • But longer windows give a more average spectrum
        instead of distinct spectra →
        rectangular window has better time resolution
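
The window-length rule above is simple arithmetic on the lowest expected pitch. A small sketch follows; the function name `min_window_ms` is hypothetical.

```python
def min_window_ms(lowest_pitch_hz, window="rectangular"):
    """Shortest usable window in ms: one pitch period for a rectangular
    window, two pitch periods for Hamming or Hanning."""
    period_ms = 1000.0 / lowest_pitch_hz
    factor = 1 if window == "rectangular" else 2
    return factor * period_ms

rect_ms = min_window_ms(50, "rectangular")  # 20.0
hamm_ms = min_window_ms(50, "hamming")      # 40.0
```

With a lowest pitch of 50 Hz the pitch period is 20 ms, which reproduces the 20 ms and 40 ms figures on the slide.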

16
Speech Signal Representations I: Short-Time Fourier Analysis
  • Window analysis
      • Frequency response not completely zero outside
        main lobe → spectral leakage
      • Second lobe of a Hamming window is approx. 43 dB
        below main lobe → less spectral leakage
      • Hamming, Hanning, triangular windows offer less
        spectral leakage →
        rectangular windows are rarely used despite their
        better time resolution
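
The side-lobe levels can be checked numerically. The sketch below zero-pads a window heavily, walks down the main lobe to its first null, and reports the highest lobe beyond it. The helper name `peak_sidelobe_db`, the window length of 64 and the FFT size are assumptions.

```python
import numpy as np

def peak_sidelobe_db(window, nfft=1 << 16):
    """Level of the highest side lobe relative to the main-lobe peak, in dB."""
    spectrum = np.abs(np.fft.rfft(window, nfft))
    spectrum_db = 20 * np.log10(spectrum / spectrum[0] + 1e-12)
    # Walk down the main lobe until the magnitude stops decreasing (first null).
    k = 1
    while k < len(spectrum_db) - 1 and spectrum_db[k + 1] < spectrum_db[k]:
        k += 1
    # Everything beyond the first null belongs to the side lobes.
    return spectrum_db[k:].max()

rect_db = peak_sidelobe_db(np.ones(64))
hamming_db = peak_sidelobe_db(np.hamming(64))
```

For the rectangular window the result should be near -13 dB, and for the Hamming window near the approx. 43 dB attenuation quoted on the slide.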

21
Speech Signal Representations I: Short-Time Fourier Analysis
  • Short-time spectrum of male voice speech
      • Time signal /ah/, local pitch 110 Hz
      • 30 ms rectangular window
      • 15 ms rectangular window
      • 30 ms Hamming window
      • 15 ms Hamming window

22
Speech Signal Representations I: Short-Time Fourier Analysis
  • Short-time spectrum of female voice speech
      • Time signal /aa/, local pitch 200 Hz
      • 30 ms rectangular window
      • 15 ms rectangular window
      • 30 ms Hamming window
      • 15 ms Hamming window

23
Speech Signal Representations I: Short-Time Fourier Analysis
  • Short-time spectrum of unvoiced speech
      • Time signal
      • 30 ms rectangular window
      • 15 ms rectangular window
      • 30 ms Hamming window
      • 15 ms Hamming window

24
Speech Signal Representations I: Linear Predictive Coding
  • LPC a.k.a. auto-regressive (AR) modeling
  • All-pole filter is a good approximation of speech,
    with p as the order of the LPC analysis
  • Predicts current sample as linear combination of
    past p samples
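
Written out, the all-pole model and the predictor take the following form. The sign convention (predictor coefficients entering the denominator with a minus sign) and the symbols G for the gain and \tilde{x}[n] for the prediction are assumptions; other texts use the opposite sign.

```latex
% All-pole model of order p with gain G
H(z) = \frac{G}{1 - \sum_{k=1}^{p} a_k z^{-k}}

% Prediction of the current sample from the past p samples
\tilde{x}[n] = \sum_{k=1}^{p} a_k \, x[n-k]

% Prediction error (residual)
e[n] = x[n] - \tilde{x}[n] = x[n] - \sum_{k=1}^{p} a_k \, x[n-k]
```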

25
Speech Signal Representations I: Linear Predictive Coding
  • To estimate predictor coefficients (a_k), use
    short-term analysis technique
  • Per segment, minimize the total squared
    prediction error (minimum squared error)
  • Take the derivative and equate it to 0; the result
    is a set of p linear equations: the Yule-Walker
    equations
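
The derivation can be sketched as follows, assuming the prediction error is e[n] = x[n] - \sum_k a_k x[n-k] and writing \phi[i,k] for the correlation terms (a symbol introduced here for illustration). The sums over n run over the samples of one analysis segment.

```latex
% Total squared prediction error over one segment
E = \sum_{n} e^2[n] = \sum_{n} \Bigl( x[n] - \sum_{k=1}^{p} a_k \, x[n-k] \Bigr)^2

% Setting the derivative with respect to each a_i to zero
\frac{\partial E}{\partial a_i}
  = -2 \sum_{n} x[n-i] \Bigl( x[n] - \sum_{k=1}^{p} a_k \, x[n-k] \Bigr) = 0,
  \qquad i = 1, \dots, p

% The resulting p linear equations
\sum_{k=1}^{p} a_k \, \phi[i,k] = \phi[i,0],
  \qquad \phi[i,k] = \sum_{n} x[n-i] \, x[n-k]
```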

26
Speech Signal Representations I: Linear Predictive Coding
  • Solution of the Yule-Walker equations
      • Any standard matrix inversion package
      • Due to the special form of the matrix, efficient
        solutions exist
          • Covariance method: using the Cholesky
            decomposition
          • Autocorrelation method: using windows, results in
            equations with Toeplitz matrices, solved by the
            Durbin recursion algorithm
          • Lattice method: equivalent to Levinson-Durbin
            recursion; often used in fixed-point
            implementations because lack of precision doesn't
            result in unstable filters
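
A minimal sketch of the autocorrelation method with the Levinson-Durbin recursion, assuming the predictor convention x̃[n] = Σ a_k x[n-k]. The function names and the random test frame are illustrative, not taken from the slides.

```python
import numpy as np

def autocorrelation(frame, p):
    """Autocorrelation r[0..p] of one (already windowed) frame."""
    return np.array([np.dot(frame[:len(frame) - k], frame[k:])
                     for k in range(p + 1)])

def levinson_durbin(r, p):
    """Solve the Toeplitz normal equations sum_k a_k r[|i-k|] = r[i].

    Returns predictor coefficients a[1..p], reflection coefficients
    k[1..p], and the final prediction error energy."""
    a = np.zeros(p + 1)          # a[0] is unused
    reflection = np.zeros(p)
    err = r[0]
    for i in range(1, p + 1):
        # Part of r[i] not yet explained by the order i-1 predictor.
        acc = r[i] - np.dot(a[1:i], r[i - 1:0:-1])
        k = acc / err
        a_new = a.copy()
        a_new[i] = k
        a_new[1:i] = a[1:i] - k * a[i - 1:0:-1]
        a = a_new
        reflection[i - 1] = k
        err *= 1.0 - k * k       # error energy shrinks at every order
    return a[1:], reflection, err

frame = np.random.default_rng(0).standard_normal(400) * np.hamming(400)
r = autocorrelation(frame, 4)
a, reflection, err = levinson_durbin(r, 4)
```

The recursion needs on the order of p² operations instead of the p³ of a general matrix solver, and the reflection coefficients it produces along the way are the parameters of the lattice form.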

29
Speech Signal Representations I: Linear Predictive Coding
  • Spectral analysis via LPC
      • All-pole (IIR) filter
      • Peaks near the roots of the denominator (the poles)
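
A sketch of how the LPC spectrum follows from the predictor coefficients: evaluate the denominator polynomial on the unit circle with an FFT and invert it. The convention H(z) = G / (1 - Σ a_k z^-k), the function name and the example pole pair are assumptions.

```python
import numpy as np

def lpc_spectrum_db(a, nfft=512, gain=1.0):
    """Magnitude response in dB of gain / (1 - sum_k a_k z^-k) on nfft/2 + 1 bins."""
    denominator = np.fft.rfft(np.concatenate(([1.0], -np.asarray(a))), nfft)
    return 20 * np.log10(gain / np.abs(denominator))

# One resonance: a complex pole pair at radius 0.95 and angle pi/4.
radius, theta = 0.95, np.pi / 4
a = [2 * radius * np.cos(theta), -radius ** 2]

spectrum = lpc_spectrum_db(a)
peak_bin = int(np.argmax(spectrum))   # expected near bin 512 / 8 = 64
```

The closer the pole radius is to 1, the sharper and higher the peak at the pole angle.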

30
Speech Signal Representations I: Linear Predictive Coding
  • Prediction error
      • Should be (approximately) the excitation
      • Unvoiced speech, expect white noise: OK
      • Voiced speech, expect impulse train: not OK
          • All-pole assumption not altogether valid
          • Real speech not perfectly periodic
          • Pitch synchronous analysis gives better results
  • LPC order
      • Larger p gives lower prediction errors
      • Too large a p results in fitting the individual
        harmonics → separation between filter and source
        will not be so good

31
Speech Signal Representations I: Linear Predictive Coding
  • Prediction error
      • Inverse LPC filter gives residual signal
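
Inverse filtering can be sketched as an FIR convolution with the coefficients [1, -a_1, ..., -a_p], assuming the convention x[n] = Σ a_k x[n-k] + e[n]. The synthetic signal below is an idealized case in which the all-pole model holds exactly, so the residual recovers the excitation; real speech will only do so approximately.

```python
import numpy as np

def lpc_residual(x, a):
    """Residual e[n] = x[n] - sum_k a_k x[n-k], i.e. x filtered by the inverse LPC filter."""
    x = np.asarray(x, dtype=float)
    inverse_filter = np.concatenate(([1.0], -np.asarray(a, dtype=float)))
    return np.convolve(x, inverse_filter)[:len(x)]

# Synthesize a signal from a known impulse train and a stable all-pole filter.
a = [1.3, -0.4]                 # poles at 0.8 and 0.5
excitation = np.zeros(200)
excitation[::50] = 1.0
x = np.zeros(200)
for n in range(200):
    past = sum(a[k] * x[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0)
    x[n] = excitation[n] + past

residual = lpc_residual(x, a)
```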

32
Speech Signal Representations I: Linear Predictive Coding
  • Alternatives for the predictor coefficients
      • Line Spectral Frequencies
          • Local sensitivity
          • Efficiency
      • Reflection Coefficients
          • Guaranteed stable → useful for coefficients
            interpolated over time
      • Log-area ratios
          • Flat spectral sensitivity
      • Roots of the polynomial
          • Represent resonance frequencies and bandwidths
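
For the last representation, a sketch of how pole positions map to resonance frequencies and bandwidths. It uses the common approximation that a pole at radius ρ and angle θ gives a resonance at θ·fs/(2π) Hz with bandwidth -ln(ρ)·fs/π Hz. The function name and the convention A(z) = 1 - Σ a_k z^-k are assumptions.

```python
import numpy as np

def resonances(a, fs):
    """Resonance frequencies and bandwidths (Hz) from predictor coefficients."""
    polynomial = np.concatenate(([1.0], -np.asarray(a, dtype=float)))
    roots = np.roots(polynomial)
    roots = roots[np.imag(roots) > 0]        # one root per conjugate pair
    freqs = np.angle(roots) * fs / (2 * np.pi)
    bandwidths = -np.log(np.abs(roots)) * fs / np.pi
    order = np.argsort(freqs)
    return freqs[order], bandwidths[order]

# One pole pair at radius 0.95 and angle pi/4, sampled at 8 kHz.
radius, theta, fs = 0.95, np.pi / 4, 8000
a = [2 * radius * np.cos(theta), -radius ** 2]
freqs, bandwidths = resonances(a, fs)        # about 1000 Hz and 131 Hz
```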

33
Speech Signal Representations I: Cepstral Processing
  • A homomorphic transformation converts a
    convolution into a sum
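
The real cepstrum is one such transformation: taking the log of the magnitude spectrum turns the product of source and filter spectra into a sum. A minimal sketch follows; the toy source, the toy filter and the FFT size are assumptions chosen so that neither spectrum has zeros.

```python
import numpy as np

def real_cepstrum(x, nfft):
    """Inverse FFT of the log magnitude spectrum."""
    spectrum = np.fft.rfft(x, nfft)
    return np.fft.irfft(np.log(np.abs(spectrum)), nfft)

# Toy source: an impulse plus a weaker echo 20 samples later.
e = np.zeros(32)
e[0], e[20] = 1.0, 0.5
# Toy filter: a decaying exponential.
h = 0.9 ** np.arange(64)

x = np.convolve(e, h)            # convolution in the time domain
nfft = 256                       # long enough to hold all 95 samples of x

c_x = real_cepstrum(x, nfft)
c_sum = real_cepstrum(e, nfft) + real_cepstrum(h, nfft)   # matches c_x
```

Because the two components add in the cepstral domain, they can be separated there by simple windowing when they occupy different quefrency ranges.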
