About This Presentation
Title:

Motivations

Description:

Mark Hasegawa-Johnson. Ozgur Cetin. Kate Saenko. November 12, ... UIUC (Hasegawa-Johnson et al.) MIT (Livescu, ... Mark Hasegawa-Johnson, U. Illinois at ...


Transcript and Presenter's Notes

1
(No Transcript)
2
Motivations
  • Why articulatory feature-based ASR?
    • Improved modeling of co-articulatory pronunciation phenomena
    • Takes advantage of knowledge of human perception and production
    • Application to audio-visual modeling
    • Application to multilingual ASR
    • Evidence of improved ASR performance with feature-based models
      • In noise (Kirchhoff et al. 2002)
      • For hyperarticulated speech (Soltau et al. 2002)
    • Potential savings in training data
  • Why this workshop project?
    • Growing number of sites investigating complementary aspects of this
      idea (a non-exhaustive list):
      • U. Edinburgh (King et al.)
      • UIUC (Hasegawa-Johnson et al.)
      • MIT (Livescu, Glass, Saenko)
    • Recently developed tools (e.g. graphical models) allow systematic
      exploration of the model space

3
(No Transcript)
4
(No Transcript)
5
(No Transcript)
6
Recent related work
  • Product observation models combining phones and features,
    p(obs|s) = p(obs|ph_s) × ∏_i p(obs|f_s,i), improve ASR in some
    conditions
    • Kirchhoff et al. 2002, Metze et al. 2002, Stueker et al. 2002
  • Lexical access from manual transcriptions of Switchboard words using
    the DBN model above (Livescu & Glass 2004, 2005)
    • Improves over phone-based pronunciation models (50% → 25% error)
    • Preliminary result: articulatory phonology features preferable to
      IPA-style (place/manner) features
  • JHU WS04 project (Hasegawa-Johnson et al. 2004)
    • Can combine landmark and IPA-style features at the acoustic level
      with articulatory phonology features at the pronunciation level
  • Articulatory recognition using DBN and ANN/DBN models (Wester et al.
    2004, Frankel et al. 2005)
    • Modeling inter-feature dependencies is useful; asynchrony may also
      be useful
  • Lipreading using a multistream DBN model with SVM feature detectors
    • Improves over viseme-based models in medium-vocabulary word ranking
      and a realistic small-vocabulary task (Saenko et al. 2005)
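The product observation model above scores a state by multiplying a phone-based likelihood with per-feature likelihoods, which becomes a sum in the log domain. A minimal sketch of that combination, assuming toy 1-D Gaussian observation models (the state, means, and variances below are made up for illustration, not taken from any of the cited systems):

```python
import math

def log_gaussian(x, mean, var):
    """Log density of a 1-D Gaussian observation model."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def product_model_loglik(obs, phone_model, feature_models):
    """Product observation model in the log domain:
    log p(obs|s) = log p(obs|ph_s) + sum_i log p(obs|f_s,i)."""
    ll = log_gaussian(obs, *phone_model)
    for mean, var in feature_models:
        ll += log_gaussian(obs, mean, var)
    return ll

# Hypothetical state: one phone model plus two articulatory feature
# streams (e.g. place and voicing); parameters are invented.
phone = (0.0, 1.0)
features = [(0.1, 1.2), (-0.2, 0.8)]
print(product_model_loglik(0.05, phone, features))
```

In a real system each factor would be a trained GMM or classifier posterior per stream rather than a single Gaussian; the point is only that the streams combine by addition of log-likelihoods.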

7
(No Transcript)
8
(No Transcript)
9
Goals for 2006 workshop
  • To build complete articulatory feature-based ASR systems
    • Using multistream DBN structures
    • For both audio-only and audio-visual ASR
  • To develop a thorough understanding of the design issues involved
    • Asynchrony modeling
    • Context modeling
    • Speaker dependency
    • Generative observation modeling vs. discriminative feature
      classification
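To make the asynchrony-modeling design issue concrete: in a multistream model, each feature stream traverses its own state sequence, and the streams may be allowed to desynchronize up to some bound. A small sketch (the function and the bounded-asynchrony constraint are illustrative assumptions, not the workshop's actual model) that counts the joint paths two streams can take under such a bound:

```python
def bounded_async_paths(len_a, len_b, max_async):
    """Count joint state paths for two feature streams, where each step
    advances stream A, stream B, or both, and the streams' progress may
    never differ by more than max_async states (the asynchrony bound)."""
    def count(i, j):
        if abs(i - j) > max_async:
            return 0  # constraint violated: streams too far apart
        if i == len_a and j == len_b:
            return 1  # both streams finished together
        total = 0
        if i < len_a:
            total += count(i + 1, j)      # A advances alone
        if j < len_b:
            total += count(i, j + 1)      # B advances alone
        if i < len_a and j < len_b:
            total += count(i + 1, j + 1)  # streams advance in lockstep
        return total
    return count(0, 0)

# With max_async=0 the streams are fully synchronous: only one path.
print(bounded_async_paths(3, 3, 0))  # → 1
print(bounded_async_paths(3, 3, 1))  # allowing slack admits many more paths
```

Setting the bound trades flexibility (capturing real articulator asynchrony) against model size, which is one reason the design space needs systematic exploration.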

10
Potential participants and contributors
  • Local participants
    • Karen Livescu, MIT: feature-based ASR structures, graphical models,
      GMTK
    • Mark Hasegawa-Johnson, U. Illinois at Urbana-Champaign:
      discriminative feature classification, JHU WS04
    • Simon King, U. Edinburgh: articulatory feature recognition, ANN/DBN
      structures
    • Ozgur Cetin, ICSI Berkeley: multistream/multirate modeling,
      graphical models, GMTK
    • Florian Metze: articulatory features in HMM framework
    • Jeff Bilmes, U. Washington: graphical models, GMTK
    • Kate Saenko, MIT: visual feature classification, AVSR
    • Others?
  • Satellite/advisory contributors
    • Jim Glass, MIT

11
Resources
  • Tools
    • GMTK
    • HTK
    • Intel AVCSR toolkit
  • Data
    • Audio-only
      • Svitchboard (CSTR Edinburgh): small-vocab, continuous,
        conversational
      • PhoneBook: medium-vocab, isolated-word, read
      • (Switchboard rescoring? LVCSR)
    • Audio-visual
      • AVTIMIT (MIT): medium-vocab, continuous, read, added noise
      • Digit strings database (MIT): continuous, read, naturalistic
        setting (noise and video background)
    • Articulatory measurements
      • X-ray microbeam database (U. Wisconsin): many speakers,
        large-vocab, isolated-word and continuous
      • MOCHA (QMUC, Edinburgh): few speakers, medium-vocab, continuous
    • Others?
  • Manual transcriptions: ICSI Berkeley Switchboard transcription
    project

12
Thanks! Questions? Comments?