SPECTRUM? - PowerPoint PPT Presentation


Provided by: Hyne3

Transcript and Presenter's Notes

1
SPECTRUM?
  • Hynek Hermansky
  • with Jordan Cohen, Sangita Sharma, and Pratibha Jain

2
/u/ /o/ /a/ /e/ /iy/
limited commercial success - John Pierce, 1969
3
SHORT TERM SPECTRUM
4
Cortical receptive fields
5
ASR from TempoRAl Patterns (TRAP)
6
WHY 200-1000 ms?
[Figure: time-frequency plane; axes: frequency, time; label: 200-1000 ms]
  • because that's where the information is (coarticulation)
      • mutual information studies (Bilmes, Yang et al.)
  • psychophysics of hearing
      • 200 ms critical time window (forward masking,
        perception of loudness, perception of gaps, ...)
  • physiology of hearing
      • time component of cortical receptive fields (Klein)
  • because it works
      • ETSI Aurora work
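
To make the idea of a long temporal window concrete, here is a minimal sketch of cutting fixed-length temporal vectors out of the energy trajectory of a single frequency band. It is not taken from the slides: the function name `temporal_patterns`, the 10 ms frame shift, the 1000 ms span, and the edge-padding rule are all assumptions chosen for illustration (the slide only gives the 200-1000 ms range).

```python
def temporal_patterns(band_energies, frame_shift_ms=10, span_ms=1000):
    """Return one fixed-length temporal vector per frame of a single band.

    band_energies: energies of ONE frequency band, one value per frame.
    frame_shift_ms and span_ms are illustrative assumptions, not values
    stated on the slide.
    """
    half = (span_ms // frame_shift_ms) // 2  # frames on each side of the center
    n = len(band_energies)
    patterns = []
    for center in range(n):
        window = []
        for t in range(center - half, center + half + 1):
            # Assumed padding rule: repeat the edge value where the
            # window runs past the start or end of the signal.
            t_clamped = min(max(t, 0), n - 1)
            window.append(band_energies[t_clamped])
        patterns.append(window)
    return patterns
```

With the assumed defaults, each vector holds 101 values (50 frames before the center, the center frame, and 50 frames after), so it covers about one second of a single band.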

7
WHY narrow frequency bands?
[Figure: time-frequency plane; axes: frequency, time; label: 1-3 Bark]
  • psychophysics of hearing
      • independence of processing within critical bands
  • physiology of hearing
      • mechanical selectivity of cochlea
      • cortical receptive fields (e.g. Shamma)
  • because it works
      • multi-band ASR (Bourlard and Dupont, Hermansky et al.)
      • decrease in ASR accuracy for wider frequency
        spans (Jain and Hermansky - Eurospeech 2003)
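
To give a feel for how wide a 1-3 Bark band is in Hz, the sketch below converts between Hz and Bark and computes the edges of a band centered on a given frequency. The slides do not say which Bark formula is meant; the warping used here, 6 * asinh(f / 600), is the one used in PLP analysis and is an assumption, as are the function names.

```python
import math


def hz_to_bark(f_hz):
    """Bark warping as used in PLP analysis (assumed, not from the slide)."""
    return 6.0 * math.asinh(f_hz / 600.0)


def bark_to_hz(z_bark):
    """Inverse of hz_to_bark."""
    return 600.0 * math.sinh(z_bark / 6.0)


def band_edges_hz(center_hz, width_bark):
    """Lower and upper edge (Hz) of a band width_bark wide around center_hz."""
    z = hz_to_bark(center_hz)
    low_z = max(z - width_bark / 2.0, 0.0)  # do not go below 0 Hz
    high_z = z + width_bark / 2.0
    return bark_to_hz(low_z), bark_to_hz(high_z)
```

For example, `band_edges_hz(1000.0, 3.0)` returns the edges of a 3 Bark band around 1 kHz under this warping; other Bark approximations would give somewhat different edges.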

8
Which features?
[Figure: axes: frequency, time; labels: data-guided processing, features]
  • no knowledge is better than wrong knowledge
  • data cannot lie
  • speech evolved to be heard
  • data-derived processing is consistent with
    human-like processing (minus the irrelevant
    components of the human cognitive processing)

9
WHY data-guided processing?
[Figure: axes: frequency, time; labels: data-guided (trained on data) processing, features]
  • some function of class posteriors
  • class posteriors form the most efficient feature
    set (e.g. Fukunaga)
  • posteriors of which classes?
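
As one possible reading of "some function of class posteriors", the sketch below turns raw classifier scores into posteriors with a softmax and then takes their logarithm as the feature vector. The slide names neither the classifier nor the function; the softmax, the log, the floor value, and the names `softmax` and `posterior_features` are all assumptions for illustration.

```python
import math


def softmax(scores):
    """Turn raw class scores into posteriors that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def posterior_features(scores, floor=1e-10):
    """Log posteriors as features (the log is an assumed choice of function)."""
    posteriors = softmax(scores)
    # Floor each posterior so the log stays finite for near-zero values.
    return [math.log(max(p, floor)) for p in posteriors]
```

The open question on the slide, which classes to compute posteriors for, is untouched by this sketch: `scores` may come from any classifier over any class inventory.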

10
Speech Events
class (phoneme?) detection