Audiovisual-to-articulatory speech inversion using Hidden Markov Models
Athanassios Katsamanis, George Papandreou, Petros Maragos
School of E.C.E., National Technical University of Athens, Athens 15773, Greece
  • Speech inversion: recover the vocal tract geometry from the speech signal and the speaker's face.
  • Applications in language tutoring and speech therapy.

Qualisys-Movetrack (QSMT) database: audio, Electromagnetic Articulography (EMA), and video streams.
Evaluation: see the generalization-error and trajectory figures below.
  • We use multistream HMMs: the visual-to-articulatory mapping is expected to be nonlinear, and the visual stream is incorporated following the audiovisual ASR paradigm (see the sketch after this list).
  • We apply CCA to train a linear mapping at each HMM state between audiovisual and articulatory data.
  • Performance is improved compared to a global linear model and to audio-only or visual-only HMMs.
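A minimal sketch of the stream-weighted emission likelihood used in the audiovisual ASR paradigm; the single-Gaussian emissions per stream and the weights w_a, w_v are illustrative assumptions, not the poster's exact configuration:

    from scipy.stats import multivariate_normal

    def multistream_log_likelihood(y_a, y_v, state, w_a=0.7, w_v=0.3):
        # Stream-weighted emission log-likelihood of one multistream HMM state.
        # y_a: audio features (e.g. MFCCs); y_v: visual features (face markers).
        # `state` is assumed to hold single-Gaussian parameters per stream;
        # the weights w_a, w_v (stream exponents) are illustrative values.
        ll_a = multivariate_normal.logpdf(y_a, mean=state["mu_a"], cov=state["cov_a"])
        ll_v = multivariate_normal.logpdf(y_v, mean=state["mu_v"], cov=state["cov_v"])
        return w_a * ll_a + w_v * ll_v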

[Diagram: determination of the multistream HMM state sequence from the audio stream y_a (spectral characteristics/MFCC, stream weight w_a) and the visual stream y_v (3-D marker coordinates, stream weight w_v); the articulatory parameters x are estimated per state. Zero states correspond to the case of a global linear model.]
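The state sequence in the diagram can be obtained, for example, by Viterbi decoding over such stream-weighted log-likelihoods; the poster does not specify the decoder, so this is a generic sketch:

    import numpy as np

    def viterbi(log_B, log_pi, log_A):
        # Most likely HMM state sequence given per-frame emission log-likelihoods.
        # log_B: (T, N) stream-weighted log-likelihoods (see the sketch above)
        # log_pi: (N,) initial state log-probabilities
        # log_A: (N, N) transition log-probabilities
        T, N = log_B.shape
        delta = np.empty((T, N))
        psi = np.zeros((T, N), dtype=int)
        delta[0] = log_pi + log_B[0]
        for t in range(1, T):
            scores = delta[t - 1][:, None] + log_A      # (from-state, to-state)
            psi[t] = np.argmax(scores, axis=0)
            delta[t] = scores[psi[t], np.arange(N)] + log_B[t]
        states = np.empty(T, dtype=int)
        states[-1] = int(np.argmax(delta[-1]))
        for t in range(T - 2, -1, -1):
            states[t] = psi[t + 1, states[t + 1]]
        return states  # each state then selects a per-state audiovisual-to-articulatory mapping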
  • Why Canonical Correlation Analysis (CCA)?
  • It leads to optimal reduced-rank linear regression models.
  • It improves predictive performance when training data are limited (see the sketch after this list).
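A minimal numpy sketch of CCA-based reduced-rank linear regression from audiovisual features A to articulatory parameters X; the helper names and the eigendecomposition-based whitening are illustrative choices, and at full rank the mapping coincides with ordinary least squares:

    import numpy as np

    def _inv_sqrt(S, eps=1e-8):
        # Symmetric inverse square root via eigendecomposition.
        w, V = np.linalg.eigh(S)
        w = np.maximum(w, eps)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    def _sqrt(S, eps=1e-8):
        # Symmetric square root via eigendecomposition.
        w, V = np.linalg.eigh(S)
        w = np.maximum(w, eps)
        return V @ np.diag(np.sqrt(w)) @ V.T

    def cca_reduced_rank_fit(A, X, rank):
        # Rank-constrained linear map from features A (n x p) to targets X (n x q),
        # built by truncating the SVD of the whitened cross-covariance
        # (whose singular values are the canonical correlations).
        mu_a, mu_x = A.mean(axis=0), X.mean(axis=0)
        Ac, Xc = A - mu_a, X - mu_x
        n = A.shape[0]
        Saa, Sxx, Sax = Ac.T @ Ac / n, Xc.T @ Xc / n, Ac.T @ Xc / n
        Saa_is, Sxx_is = _inv_sqrt(Saa), _inv_sqrt(Sxx)
        U, d, Vt = np.linalg.svd(Saa_is @ Sax @ Sxx_is)
        k = rank
        B = Saa_is @ U[:, :k] @ np.diag(d[:k]) @ Vt[:k, :] @ _sqrt(Sxx)
        return mu_a, mu_x, B

    # Usage: X_hat = mu_x + (A_new - mu_a) @ B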

Maximum A Posteriori (MAP) articulatory parameter estimate, where Q_i is the covariance of the approximation error at state i and the prior of x is taken to be Gaussian, with its parameters determined in the training phase.
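Assuming a state-dependent linear prediction A_i y + b_i of x from the audiovisual observation y, with approximation-error covariance Q_i, and a Gaussian prior N(mu_x, Sigma_x) on x (the symbols A_i, b_i, mu_x and Sigma_x are notation introduced here for illustration), the MAP estimate takes the standard Gaussian-fusion form:

    \hat{x}_{\mathrm{MAP}} =
      \left( Q_i^{-1} + \Sigma_x^{-1} \right)^{-1}
      \left( Q_i^{-1} (A_i y + b_i) + \Sigma_x^{-1} \mu_x \right)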
Acknowledgements: This research was co-financed partially by the E.U.-European Social Fund (75%) and the Greek Ministry of Development-GSRT (25%) under Grant ?????-2003??866, and partially by the European research project ASPI under Grant FP6-021324. We would also like to thank O. Engwall from KTH for providing us with the QSMT database.
[Figure: Generalization error of the linear regression model vs. model order for varying training set size. Upper row: tongue position from face expression. Lower row: face expression from tongue position.]
[Figure: Measured (black) and predicted (light color) articulatory trajectories.]