LING124 Feature extraction

Provided by: hahn7

1
LING124 Feature extraction
  • October 14, 2008

2
Class outline
  • Feature extraction (Front-end)
  • Pre-emphasis
  • Framing and windowing
  • Mel-Frequency Cepstral Coefficients
  • MFSC, MFCC
  • Deltas, double-deltas
  • Typical feature parameters

3
Feature extraction
  • Capture essential acoustic information for speech
    recognition

4
Pre-emphasis
  • Increase the magnitude of some range of
    frequencies with respect to other frequencies
  • Voiced speech has a downward spectral tilt: most of its
    energy is at low frequencies, so high-frequency formants
    have smaller amplitude than low-frequency formants
  • So we apply a high-pass filter to the speech signal
    before we extract any acoustic features

5
Pre-emphasis (2)
  • FIR high-pass filter
  • $y[n] = x[n] - a\,x[n-1]$
  • $0.9 \le a \le 1.0$
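
The filter above can be sketched in a few lines of Python. This is a minimal illustration, not the course's own code: the function name `pre_emphasize` is made up here, the default a = 0.97 is the value listed under the typical feature parameters, and passing the first sample through unchanged is an assumption (there is no x[-1] to subtract).

```python
def pre_emphasize(samples, a=0.97):
    """Apply y[n] = x[n] - a * x[n-1]; the first sample is passed through unchanged (assumption)."""
    if not samples:
        return []
    out = [samples[0]]
    for n in range(1, len(samples)):
        out.append(samples[n] - a * samples[n - 1])
    return out

# A constant (zero-frequency) signal is strongly attenuated,
# while a signal alternating at the highest representable rate is boosted.
dc = pre_emphasize([1.0] * 5)
alternating = pre_emphasize([1.0, -1.0, 1.0, -1.0, 1.0])
```

After the first sample, the constant input shrinks to about 0.03 per sample while the alternating input grows to about 1.97 in magnitude, which is the high-pass behaviour the slide describes.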

6
Framing and windowing
  • Divide speech signal into successive overlapping
    frames
  • Frame size: 10-25 milliseconds
  • Frame shift: 5-10 milliseconds
  • Multiply individual frames by a specific window
    function
  • Hanning window (N = number of samples in frame)
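
A rough sketch of framing and windowing follows. The slide's window formula did not survive in this transcript, so the code assumes the common symmetric Hann definition w[n] = 0.5 - 0.5 cos(2πn/(N-1)). The names `hann_window` and `frame_signal` are illustrative, and dropping trailing samples that do not fill a whole frame is also an assumption.

```python
import math

def hann_window(N):
    """Symmetric Hann window: w[n] = 0.5 - 0.5 * cos(2*pi*n / (N - 1)), n = 0..N-1."""
    if N == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / (N - 1)) for n in range(N)]

def frame_signal(samples, sample_rate, frame_ms=25.0, shift_ms=10.0):
    """Split samples into overlapping frames and multiply each frame by a Hann window."""
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    shift = int(round(sample_rate * shift_ms / 1000.0))
    window = hann_window(frame_len)
    frames = []
    start = 0
    # Trailing samples that do not fill a whole frame are dropped (assumption).
    while start + frame_len <= len(samples):
        frame = samples[start:start + frame_len]
        frames.append([s * w for s, w in zip(frame, window)])
        start += shift
    return frames

# One second of a constant signal at 16 kHz: 400-sample frames, 160-sample shift.
frames = frame_signal([1.0] * 16000, 16000)
```

With a 25 ms frame and a 10 ms shift, consecutive frames share 15 ms of signal, and the window tapers each frame to zero at both ends.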

7
Spectrum
  • Spectrum represents frequency components of
    speech signal
  • Spectrum is derived by applying (discrete)
    Fourier transform to a windowed segment of speech
    signal
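
To make the definition concrete, here is a direct (slow, O(N²)) DFT written with the standard library only. It is a sketch for illustration: real front-ends use an FFT, and the name `power_spectrum` is an assumption. The test tone is left unwindowed so that its peak lands exactly on one bin.

```python
import cmath
import math

def power_spectrum(frame):
    """Return |X[k]|^2 for k = 0..N/2, computed directly from the DFT definition."""
    N = len(frame)
    spectrum = []
    for k in range(N // 2 + 1):
        X_k = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
        spectrum.append(abs(X_k) ** 2)
    return spectrum

# A cosine with exactly 4 cycles in a 32-sample frame should peak in bin 4.
N = 32
tone = [math.cos(2.0 * math.pi * 4 * n / N) for n in range(N)]
spec = power_spectrum(tone)
peak_bin = max(range(len(spec)), key=lambda k: spec[k])
```

Bin k corresponds to the frequency k × (sampling rate) / N, so only bins 0 to N/2 (up to the Nyquist frequency) are kept for a real-valued signal.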

8
Mel-Frequency Spectral Coefficients
  • Perception of pitch is semi-logarithmic
  • Recall mel scale and bark scale
  • Frequency-dependent smearing
  • Ignore distinctions between frequencies that are
    less than 90-120 mels apart
  • Break the spectrum into bands of equal mel
    interval (90 mels)
  • Weighted average of spectral coefficients in each
    band
  • The logs of the resulting band averages make up the
    Mel-Frequency Spectral Coefficients (MFSC)
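
As an illustration of "bands of equal mel interval", the sketch below places band edges every 90 mels and converts them back to Hz. It assumes the widely used mel formula 2595 log10(1 + f/700). The helper names and the 8000 Hz upper limit are illustrative choices, not values taken from the slides.

```python
import math

def hz_to_mel(f):
    """Convert frequency in Hz to mels (2595 * log10(1 + f / 700))."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def band_edges_hz(max_freq_hz, step_mels=90.0):
    """Band edges in Hz, spaced every step_mels on the mel scale, from 0 up to max_freq_hz."""
    edges = []
    m = 0.0
    max_mel = hz_to_mel(max_freq_hz)
    while m <= max_mel:
        edges.append(mel_to_hz(m))
        m += step_mels
    return edges

edges = band_edges_hz(8000.0)
# Bands are equally wide in mels but get wider in Hz as frequency increases.
widths = [hi - lo for lo, hi in zip(edges, edges[1:])]
```

The growing widths in Hz are the "frequency-dependent smearing" from the slide: resolution is fine at low frequencies and coarse at high frequencies.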

9
Mel filter bank
  • Each filter's center frequency follows the
    mel-scale
  • $\mathrm{mel}(f) = 2595 \log_{10}(1 + f/700)$, with frequency f in Hz
  • The edges of a filter coincide with the center
    frequencies of adjacent filters
  • maxFreq (the upper edge of the last filter) can be no
    higher than the Nyquist frequency (half the sampling rate)
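
A possible sketch of such a filter bank follows, using triangular filters whose edges sit at the neighbouring filters' centers. The triangular shape, the function names, and the parameter values (20 filters, 512-point DFT, 16 kHz) are assumptions for illustration. The weighted average and the log follow the MFSC description on the previous slide.

```python
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(num_filters, fft_size, sample_rate, min_freq=0.0, max_freq=None):
    """Triangular filters over DFT bins 0..fft_size/2, centers equally spaced in mels."""
    nyquist = sample_rate / 2.0
    if max_freq is None:
        max_freq = nyquist
    if max_freq > nyquist:
        raise ValueError("max_freq must not exceed the Nyquist frequency")
    lo, hi = hz_to_mel(min_freq), hz_to_mel(max_freq)
    # num_filters + 2 points: left edge, the filter centers, right edge.
    points_hz = [mel_to_hz(lo + i * (hi - lo) / (num_filters + 1))
                 for i in range(num_filters + 2)]
    bin_hz = [k * sample_rate / fft_size for k in range(fft_size // 2 + 1)]
    bank = []
    for j in range(1, num_filters + 1):
        left, center, right = points_hz[j - 1], points_hz[j], points_hz[j + 1]
        weights = []
        for f in bin_hz:
            if left < f <= center:
                weights.append((f - left) / (center - left))    # rising slope
            elif center < f < right:
                weights.append((right - f) / (right - center))  # falling slope
            else:
                weights.append(0.0)
        bank.append(weights)
    return bank

def mfsc(power_spec, bank, floor=1e-10):
    """Log of the weighted average of the power spectrum in each mel band."""
    out = []
    for weights in bank:
        total = sum(weights)
        avg = sum(w * p for w, p in zip(weights, power_spec)) / total if total > 0 else 0.0
        out.append(math.log(max(avg, floor)))  # floor avoids log(0)
    return out

bank = mel_filterbank(num_filters=20, fft_size=512, sample_rate=16000)
flat = [1.0] * (512 // 2 + 1)
coeffs = mfsc(flat, bank)
```

For a flat power spectrum of 1.0, every band average is 1.0 and every coefficient is log(1) = 0, which is a quick sanity check on the weights.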

10
Mel filter-bank processing

11
Mel-Frequency Cepstral Coefficients
  • Discrete Cosine Transform of MFSC
  • Represents a signal as a sum of cosine functions
  • cf. DFT uses both sine and cosine functions
  • This can be interpreted as applying an inverse
    discrete Fourier transform to the MFSC
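
The transform can be sketched as follows. The slides do not say which DCT variant or scaling is used, so this assumes the unnormalized DCT-II, which is a common choice for MFCCs. The name `dct_ii` and the choice of 13 output coefficients are illustrative.

```python
import math

def dct_ii(values, num_coeffs):
    """Unnormalized DCT-II: c[k] = sum_j values[j] * cos(pi * k * (j + 0.5) / M)."""
    M = len(values)
    return [
        sum(values[j] * math.cos(math.pi * k * (j + 0.5) / M) for j in range(M))
        for k in range(num_coeffs)
    ]

# A flat log-spectrum has no spectral shape, so only c[0] is non-zero.
flat_mfsc = [2.0] * 20
cepstrum = dct_ii(flat_mfsc, 13)
```

Low-order coefficients describe the smooth overall shape of the log-spectrum, which is why only the first dozen or so are usually kept.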

12
Delta, double-delta
  • Mel-frequency cepstral coefficients do not
    include temporal information
  • How spectral shape changes over time is useful in
    speech recognition
  • Just as velocity and acceleration are the first and
    second derivatives of position, we use the first
    derivatives (deltas) and second derivatives
    (double-deltas) of the MFCCs for temporal information
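
One way to approximate these derivatives is a regression over neighbouring frames. The slides do not specify a formula, so this sketch assumes the commonly used form d[t] = Σ n (c[t+n] - c[t-n]) / (2 Σ n²) with a window of ±2 frames, repeating the first and last frames at the edges. The function name `deltas` is illustrative.

```python
def deltas(frames, width=2):
    """Regression-based time derivative of a sequence of feature vectors."""
    T = len(frames)
    denom = 2.0 * sum(n * n for n in range(1, width + 1))
    result = []
    for t in range(T):
        vec = []
        for i in range(len(frames[t])):
            num = 0.0
            for n in range(1, width + 1):
                ahead = frames[min(t + n, T - 1)][i]  # repeat last frame at the end
                behind = frames[max(t - n, 0)][i]     # repeat first frame at the start
                num += n * (ahead - behind)
            vec.append(num / denom)
        result.append(vec)
    return result

# A feature that rises by 1.0 per frame has delta 1.0 in the middle of the sequence;
# applying the same function again gives the double-delta, which is 0.0 there.
ramp = [[float(t)] for t in range(10)]
d = deltas(ramp)
dd = deltas(d)
```

Near the start and end of the sequence the repeated edge frames pull the estimates away from the true slope, so the first and last few delta values are less reliable.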

13
Typical feature parameters
  • Pre-emphasis coefficient: 0.97
  • Frame size: 25 ms
  • Frame shift: 10 ms
  • 12 MFCCs and log-energy
  • Deltas, double-deltas
  • A feature vector of 39 components: (12 + 1) static
    features, plus their deltas and double-deltas
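
The arithmetic behind the 39 components, (12 + 1) × 3, can be checked with a small sketch. The inputs are placeholder numbers rather than real speech features. For brevity the derivative is a plain central difference, which is a simplification, and both function names are illustrative.

```python
def simple_delta(frames):
    """Central difference (c[t+1] - c[t-1]) / 2, repeating the edge frames."""
    T = len(frames)
    return [
        [(a - b) / 2.0 for a, b in zip(frames[min(t + 1, T - 1)], frames[max(t - 1, 0)])]
        for t in range(T)
    ]

def assemble_features(mfccs, log_energies):
    """Stack 12 MFCCs + log-energy with their deltas and double-deltas: (12 + 1) * 3 = 39."""
    static = [list(c) + [e] for c, e in zip(mfccs, log_energies)]
    d = simple_delta(static)
    dd = simple_delta(d)
    return [s + d1 + d2 for s, d1, d2 in zip(static, d, dd)]

# Placeholder inputs (not real speech): 5 frames of 12 MFCCs plus one log-energy each.
mfccs = [[0.1 * t] * 12 for t in range(5)]
log_energies = [1.0 + t for t in range(5)]
features = assemble_features(mfccs, log_energies)
```

Each frame's vector is laid out as 13 static values (indices 0 to 12), 13 deltas (13 to 25), and 13 double-deltas (26 to 38).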