Gammachirp Auditory Filter PowerPoint PPT Presentation


Title: Gammachirp Auditory Filter


1
Gammachirp Auditory Filter
  • Alex Park
  • May 7th, 2003

2
Project Overview
  • Goal
  • Investigate use of (non-linear) auditory filters
    for speech analysis
  • Background
  • Sound analysis in auditory periphery similar to
    wavelet transform
  • Comparison
  • Traditional Short-Time Fourier analysis
  • Gammatone wavelet based analysis (auditory
    filter)
  • Extension
  • Gammachirp filter has level-dependent parameters
    which can model non-linear characteristics of
    auditory periphery
  • Implementation
  • Specifics of Gammachirp implementation
  • How to incorporate level dependency

3
Auditory Physiology
  • Sound pressure variation in the air is
    transmitted through the outer and middle ears
    to one end of the cochlea
  • The basilar membrane, which runs the length of
    the cochlea, maps frequency to the place of
    maximal displacement

[Figure: auditory pathway, labeled Outer ear, Middle ear, Cochlea, Basilar Membrane (low freq, 200 Hz, to high freq, 20 kHz), Auditory Nerve, Cortex]
4
Motivation: Why better auditory models?
  • Automatic Speech Recognition (ASR)
  • ASR systems perform adequately in clean
    conditions
  • Robustness is a major problem: degradation in low
    SNR conditions is much worse than for humans
  • Hearing research
  • Build better hearing aids and cochlear implants
  • Hearing impaired subjects with damaged cochlea
    have trouble understanding speech in noisy
    environments
  • Current hearing aids perform linear
    amplification, which amplifies noise as well as
    the signal
  • Is the lack of compressive non-linearity in the
    front-end a common link?

5
Non-stationary Nature of Speech
  • Why is speech a good candidate for local
    frequency analysis?

[Figure: waveform of the word "tapestry"]
6
Time-Frequency Representation
  • The most common way of representing changing
    spectral content is the Short-Time Fourier
    Transform (STFT)

7
Spectrogram from STFT
[Figure: STFT spectrogram of the word "tapestry"]
8
STFT Characteristics
  • We can think of the STFT as filtering using the
    following basis
  • In the frequency domain, we are using a
    filterbank consisting of linearly spaced,
    constant bandwidth filters
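
The basis equation from this slide is not preserved in the transcript. In the usual formulation the STFT basis functions are windowed complex exponentials, w(t - tau) * exp(j * omega * t), so every analysis filter shares the same window and therefore the same bandwidth. The sketch below is a minimal NumPy illustration of that point, not the presenter's code; the window type, frame length, hop size, and test tone are all assumptions chosen for the example.

```python
import numpy as np

def stft(x, fs, win_len=256, hop=128):
    """Minimal STFT sketch: Hann-windowed frames followed by a real FFT."""
    window = np.hanning(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack([x[i * hop: i * hop + win_len] * window
                       for i in range(n_frames)])
    spectrum = np.fft.rfft(frames, axis=1)        # (n_frames, win_len // 2 + 1)
    freqs = np.fft.rfftfreq(win_len, d=1.0 / fs)  # bin center frequencies in Hz
    return spectrum, freqs

fs = 16000                              # assumed sampling rate
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 1000 * t)        # 1 kHz test tone (assumed input)
S, freqs = stft(x, fs)
spacing = np.diff(freqs)                # identical for every pair of adjacent bins
```

Because `spacing` is the same everywhere (fs / win_len), the bins form the linearly spaced, constant-bandwidth filterbank described above.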

9
Auditory Filterbanks
  • Unlike the STFT, physiological data indicates
    that auditory filters
  • are spaced more closely at lower freq than at
    high freq
  • have narrower bandwidths at lower frequencies
    (constant-Q)
  • The Gammatone filter bank, proposed by Patterson,
    models these characteristics using a wavelet
    transform.
  • The mother wavelet, or kernel function, is the
    product of a Gamma envelope and a tone carrier

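
The kernel equation itself did not survive in this transcript. The form commonly given in the gammatone literature (for example Slaney, 1993) is shown here as a reference; the symbol names (a, n, b, f_r, phi) are notation assumptions, not necessarily those used on the slide.

```latex
g_t(t) \;=\;
  \underbrace{a\, t^{\,n-1}\, e^{-2\pi b\,\mathrm{ERB}(f_r)\, t}}_{\text{Gamma envelope}}
  \;\underbrace{\cos\!\left(2\pi f_r t + \phi\right)}_{\text{tone carrier}},
  \qquad t \ge 0
```

Here n is the filter order, f_r the center frequency, and b scales the equivalent rectangular bandwidth ERB(f_r).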
10
Gammatone Characteristics
  • Unlike the STFT, the Gammatone filterbank uses
    the following basis
  • The corresponding frequency responses are
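
The basis and frequency-response plots for this slide are not preserved. As a rough illustration of what such a filterbank looks like, the sketch below builds gammatone impulse responses at ERB-spaced center frequencies. It assumes the Glasberg and Moore ERB formula, a 4th-order filter, and b = 1.019, which are conventional choices in the gammatone literature and are not stated in the slides; the function names are made up for this example.

```python
import numpy as np

def erb(f):
    """Equivalent rectangular bandwidth in Hz (Glasberg and Moore formula, assumed)."""
    return 24.7 * (4.37 * f / 1000.0 + 1.0)

def erb_space(f_lo, f_hi, n):
    """n center frequencies equally spaced on the ERB-rate scale."""
    to_rate = lambda f: 21.4 * np.log10(4.37 * f / 1000.0 + 1.0)
    from_rate = lambda r: (10.0 ** (r / 21.4) - 1.0) * 1000.0 / 4.37
    return from_rate(np.linspace(to_rate(f_lo), to_rate(f_hi), n))

def gammatone_ir(fc, fs, dur=0.05, order=4, b=1.019):
    """Gammatone impulse response: Gamma envelope times a tone carrier."""
    t = np.arange(int(dur * fs)) / fs
    envelope = t ** (order - 1) * np.exp(-2 * np.pi * b * erb(fc) * t)
    g = envelope * np.cos(2 * np.pi * fc * t)
    return g / np.max(np.abs(g))          # peak-normalize for plotting

fs = 16000                                # assumed sampling rate
cfs = erb_space(100.0, 6000.0, 16)        # dense at low freq, sparse at high freq
bank = np.stack([gammatone_ir(fc, fs) for fc in cfs])
```

Plotting `np.abs(np.fft.rfft(bank, axis=1))` shows filters that are narrow and closely spaced at low frequencies and broader at high frequencies.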

11
What are we missing?
  • The Gammatone filterbank has constant-Q
    bandwidths and logarithmic spacing of center
    frequencies
  • Also, the Gamma envelope decays exponentially,
    giving effectively compact support in time
  • But, the filters are 1) symmetric and 2) linear
  • Psychophysical experiments indicate that auditory
    filter shapes are
  • 1) Asymmetric
  • Sharper drop-off on high frequency side
  • 2) Non-linear
  • Filter shape and gain change depending on input
    level
  • Compressive non-linearity of the cochlea
  • Important for hearing in noise and for dynamic
    range

12
Gammachirp Characteristics
  • The Gammachirp filter developed by Irino and
    Patterson uses a modified version of the
    Gammatone kernel, which adds a chirp term
    alongside the Gamma envelope and tone carrier
  • Frequency response is asymmetric and can fit the
    passive filter response
  • Level-dependent parameters can fit changes due to
    stimulus level
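
The gammachirp kernel equation is not preserved in this transcript. The form usually given in Irino and Patterson's papers is reproduced here for reference; the symbol names are notation assumptions.

```latex
g_c(t) \;=\;
  \underbrace{a\, t^{\,n-1}\, e^{-2\pi b\,\mathrm{ERB}(f_r)\, t}}_{\text{Gamma envelope}}
  \;\cos\!\big(\underbrace{2\pi f_r t}_{\text{tone carrier}}
  \;+\; \underbrace{c \ln t}_{\text{chirp term}} \;+\; \phi\big),
  \qquad t > 0
```

Setting c = 0 removes the chirp term and recovers the Gammatone kernel, so c is the parameter that controls the asymmetry.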

13
Implementation
  • Looking in the frequency domain, the Gammachirp
    can be obtained by cascading a fixed Gammatone
    filter with an asymmetric filter
  • To fit psychophysical data, a fixed Gammachirp is
    cascaded with level-dependent asymmetric IIR
    filters
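
The frequency-domain cascade can be sketched numerically. The example assumes the published gammachirp magnitude response, in which the asymmetric filter has the form exp(c * theta(f)) with theta(f) = arctan((f - f_r) / (b * ERB(f_r))); the slide does not show this expression, so treat it and the parameter values (n = 4, b = 1.019, c = -2) as assumptions. The level-dependent IIR stage is not modeled here.

```python
import numpy as np

def erb(f):
    """Equivalent rectangular bandwidth in Hz (Glasberg and Moore formula, assumed)."""
    return 24.7 * (4.37 * f / 1000.0 + 1.0)

def gammatone_mag(f, fr, n=4, b=1.019):
    """Peak-normalized gammatone magnitude response (positive-frequency term only)."""
    u = (f - fr) / (b * erb(fr))
    return (1.0 + u ** 2) ** (-n / 2.0)

def asymmetric_fn(f, fr, c, b=1.019):
    """Asymmetric function exp(c * theta(f)); c < 0 boosts the low-frequency side."""
    theta = np.arctan((f - fr) / (b * erb(fr)))
    return np.exp(c * theta)

def gammachirp_mag(f, fr, c=-2.0, n=4, b=1.019):
    """Gammachirp magnitude as a cascade: gammatone times asymmetric function."""
    return gammatone_mag(f, fr, n, b) * asymmetric_fn(f, fr, c, b)

f = np.linspace(0.0, 2000.0, 20001)       # 0.1 Hz grid
fr = 1000.0
response = gammachirp_mag(f, fr)
f_peak = f[np.argmax(response)]           # shifted below fr when c < 0
```

Under these assumptions the peak moves from f_r to f_r + c * b * ERB(f_r) / n, and the response falls off faster above the peak than below it, which matches the asymmetry described on the earlier slide.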

14
Comparison: Tone vs. Passive Chirp Outputs
  • Gammatone output seems to have better frequency
    resolution
  • Passive Gammachirp output seems to have better
    time resolution

15
Comparison: Tone vs. Active Chirp Outputs
16
Incorporating level dependency
  • As illustrated in the previous slide, passive
    Gammachirp output offers little advantage on
    clean speech using fixed stimulus levels
  • We can incorporate parameter control via feedback
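
The slides do not say how the feedback loop is built, so the following is only a toy sketch of the general idea: estimate the level of each filtered frame and use it to set the chirp parameter for the next frame. The linear level-to-parameter mapping, its constants, and every function name here are hypothetical placeholders, not values from Irino and Patterson or from this project.

```python
import numpy as np

def frame_level_db(frame, ref=1.0, floor=1e-12):
    """RMS level of a frame in dB relative to `ref`."""
    rms = np.sqrt(np.mean(frame ** 2))
    return 20.0 * np.log10(max(rms, floor) / ref)

def chirp_param_from_level(level_db, c0=-1.0, slope=-0.02, lo=-3.0, hi=0.0):
    """HYPOTHETICAL linear map from output level to chirp parameter c."""
    return float(np.clip(c0 + slope * level_db, lo, hi))

def feedback_control(frames, filter_frame, c_init=-1.0):
    """Filter each frame with the current c, then update c from the output level."""
    c = c_init
    used = []                       # value of c applied to each frame
    for frame in frames:
        out = filter_frame(frame, c)
        used.append(c)
        c = chirp_param_from_level(frame_level_db(out))
    return used

# Usage with a stand-in filter (identity) just to exercise the loop.
t = np.arange(256) / 16000.0
quiet = 0.01 * np.sin(2 * np.pi * 1000 * t)
loud = 1.0 * np.sin(2 * np.pi * 1000 * t)
c_track = feedback_control([quiet, loud, quiet], lambda frame, c: frame)
```

In a real system `filter_frame` would apply the level-dependent gammachirp stage; with the placeholder mapping above, a louder previous frame yields a more negative c for the next one.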

17
Sample outputs
[Figure: sample outputs for clean speech and at 40 dB, 30 dB, and 20 dB SNR]
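
The slides do not state how the noisy test conditions were generated. One common way to produce a signal at a target SNR is to scale a noise sequence against the measured signal power, sketched below. White Gaussian noise, the function name, and the stand-in input signal are assumptions made for this example.

```python
import numpy as np

def add_noise_at_snr(signal, snr_db, rng=None):
    """Add white Gaussian noise scaled so the result has the requested SNR in dB."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.standard_normal(len(signal))
    p_signal = np.mean(signal ** 2)
    p_noise = np.mean(noise ** 2)
    # Choose the scale so that p_signal / (scale**2 * p_noise) == 10**(snr_db / 10).
    scale = np.sqrt(p_signal / (p_noise * 10.0 ** (snr_db / 10.0)))
    return signal + scale * noise

fs = 16000                                   # assumed sampling rate
t = np.arange(fs) / fs
clean = np.sin(2 * np.pi * 440 * t)          # stand-in for a speech waveform
rng = np.random.default_rng(0)
conditions = {snr: add_noise_at_snr(clean, snr, rng) for snr in (40, 30, 20)}
```

Each entry of `conditions` can then be passed through the filterbank under test and compared with the clean output.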
18
References
  • Bleeck, S., Patterson, R.D., and Ives, T. (2003)
    Auditory Image Model for Matlab. Centre for the
    Neural Basis of Hearing.
    http://www.mrc-cbu.cam.ac.uk/cnbh/aimmanual/Introduction/
  • Irino, T. and Patterson, R.D. (2001). A
    compressive gammachirp auditory filter for both
    physiological and psychophysical data, J.
    Acoust. Soc. Am. 109, 2008-2022.
  • Pickles, J.O. (1988). An Introduction to the
    Physiology of Hearing (Academic, London).
  • Slaney, M. (1993). An efficient implementation
    of the Patterson-Holdsworth auditory filterbank,
    Apple Computer Technical Report 35.
  • Slaney, M. (1998). Auditory Toolbox for
    Matlab, Interval Research Technical Report
    1998-010.
    http://rvl4.ecn.purdue.edu/malcolm/interval/1998-010/

19
Sidenote
[Figure: outputs for clean speech and at 40 dB and 30 dB SNR]