About This Presentation
Title:

Motivations

Description:

Mark Hasegawa-Johnson. Ozgur Cetin. Kate Saenko. November 12, ... UIUC (Hasegawa-Johnson et al.) MIT (Livescu, ... Mark Hasegawa-Johnson, U. Illinois at ...


Transcript and Presenter's Notes

1
(No Transcript)
2
Motivations
  • Why articulatory feature-based ASR?
    • Improved modeling of co-articulatory pronunciation phenomena
    • Takes advantage of knowledge of human perception and production
    • Application to audio-visual modeling
    • Application to multilingual ASR
    • Evidence of improved ASR performance with feature-based models
      • In noise (Kirchhoff et al. 2002)
      • For hyperarticulated speech (Soltau et al. 2002)
    • Potential savings in training data
  • Why this workshop project?
    • Growing number of sites investigating complementary aspects of this
      idea (a non-exhaustive list):
      • U. Edinburgh (King et al.)
      • UIUC (Hasegawa-Johnson et al.)
      • MIT (Livescu, Glass, Saenko)
    • Recently developed tools (e.g. graphical models) allow systematic
      exploration of the model space

3
(No Transcript)
4
(No Transcript)
5
(No Transcript)
6
Recent related work
  • Product observation models combining phones and features,
    p(obs|s) = p(obs|ph_s) × ∏_i p(obs|f_s,i), improve ASR in some
    conditions
    • Kirchhoff et al. 2002, Metze et al. 2002, Stueker et al. 2002
  • Lexical access from manual transcriptions of Switchboard words using
    the DBN model above (Livescu & Glass 2004, 2005)
    • Improves over phone-based pronunciation models (50% → 25% error)
    • Preliminary result: articulatory phonology features preferable to
      IPA-style (place/manner) features
  • JHU WS04 project (Hasegawa-Johnson et al. 2004)
    • Can combine landmark and IPA-style features at the acoustic level
      with articulatory phonology features at the pronunciation level
  • Articulatory recognition using DBN and ANN/DBN models (Wester et al.
    2004, Frankel et al. 2005)
    • Modeling inter-feature dependencies is useful; asynchrony may also
      be useful
  • Lipreading using a multistream DBN model with SVM feature detectors
    • Improves over viseme-based models in medium-vocabulary word ranking
      and a realistic small-vocabulary task (Saenko et al. 2005)
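The product observation model above scores a state by multiplying a phone-based likelihood with per-feature likelihoods, which becomes a sum in the log domain. A minimal sketch of that combination, assuming toy 1-D Gaussian observation models (the state, means, and variances below are made up for illustration, not taken from any of the cited systems):

```python
import math

def log_gaussian(x, mean, var):
    """Log density of a 1-D Gaussian observation model."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def product_model_loglik(obs, phone_model, feature_models):
    """Product observation model in the log domain:
    log p(obs|s) = log p(obs|ph_s) + sum_i log p(obs|f_s,i)."""
    ll = log_gaussian(obs, *phone_model)
    for mean, var in feature_models:
        ll += log_gaussian(obs, mean, var)
    return ll

# Hypothetical state: one phone model plus two articulatory feature
# streams (e.g. place and voicing); parameters are invented.
phone = (0.0, 1.0)
features = [(0.1, 1.2), (-0.2, 0.8)]
print(product_model_loglik(0.05, phone, features))
```

In a real system each factor would be a trained GMM or classifier posterior per stream rather than a single Gaussian; the point is only that the streams combine by addition of log-likelihoods.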

7
(No Transcript)
8
(No Transcript)
9
Goals for 2006 workshop
  • To build complete articulatory feature-based ASR systems
    • Using multistream DBN structures
    • For both audio-only and audio-visual ASR
  • To develop a thorough understanding of the design issues involved
    • Asynchrony modeling
    • Context modeling
    • Speaker dependency
    • Generative observation modeling vs. discriminative feature
      classification
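To make the asynchrony-modeling design issue concrete: in a multistream model, each feature stream traverses its own state sequence, and the streams may be allowed to desynchronize up to some bound. A small sketch (the function and the bounded-asynchrony constraint are illustrative assumptions, not the workshop's actual model) that counts the joint paths two streams can take under such a bound:

```python
def bounded_async_paths(len_a, len_b, max_async):
    """Count joint state paths for two feature streams, where each step
    advances stream A, stream B, or both, and the streams' progress may
    never differ by more than max_async states (the asynchrony bound)."""
    def count(i, j):
        if abs(i - j) > max_async:
            return 0  # constraint violated: streams too far apart
        if i == len_a and j == len_b:
            return 1  # both streams finished together
        total = 0
        if i < len_a:
            total += count(i + 1, j)      # A advances alone
        if j < len_b:
            total += count(i, j + 1)      # B advances alone
        if i < len_a and j < len_b:
            total += count(i + 1, j + 1)  # streams advance in lockstep
        return total
    return count(0, 0)

# With max_async=0 the streams are fully synchronous: only one path.
print(bounded_async_paths(3, 3, 0))  # → 1
print(bounded_async_paths(3, 3, 1))  # allowing slack admits many more paths
```

Setting the bound trades flexibility (capturing real articulator asynchrony) against model size, which is one reason the design space needs systematic exploration.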

10
Potential participants and contributors
  • Local participants
    • Karen Livescu, MIT: feature-based ASR structures, graphical models,
      GMTK
    • Mark Hasegawa-Johnson, U. Illinois at Urbana-Champaign:
      discriminative feature classification, JHU WS04
    • Simon King, U. Edinburgh: articulatory feature recognition, ANN/DBN
      structures
    • Ozgur Cetin, ICSI Berkeley: multistream/multirate modeling,
      graphical models, GMTK
    • Florian Metze: articulatory features in HMM framework
    • Jeff Bilmes, U. Washington: graphical models, GMTK
    • Kate Saenko, MIT: visual feature classification, AVSR
    • Others?
  • Satellite/advisory contributors
    • Jim Glass, MIT

11
Resources
  • Tools
    • GMTK
    • HTK
    • Intel AVCSR toolkit
  • Data
    • Audio-only
      • Svitchboard (CSTR Edinburgh): small-vocab, continuous,
        conversational
      • PhoneBook: medium-vocab, isolated-word, read
      • (Switchboard rescoring? LVCSR)
    • Audio-visual
      • AVTIMIT (MIT): medium-vocab, continuous, read, added noise
      • Digit strings database (MIT): continuous, read, naturalistic
        setting (noise and video background)
    • Articulatory measurements
      • X-ray microbeam database (U. Wisconsin): many speakers,
        large-vocab, isolated-word and continuous
      • MOCHA (QMUC, Edinburgh): few speakers, medium-vocab, continuous
    • Others?
  • Manual transcriptions: ICSI Berkeley Switchboard transcription
    project

12
Thanks! Questions? Comments?