Audio-Visual Speech and Speaker Recognition - PowerPoint PPT Presentation

Transcript and Presenter's Notes



1
Audio-Visual Speech and Speaker Recognition
  • Gérard Chollet, Guido Aversano, Hervé Bredin, Fabian Brugger, Maurice Charbit, Jérôme Darbon, Walid Karam, Chafic Mokbel, Santa Rossi
  • Eduardo Sanchez, Marc Sigelle
  • Georges Yazbek, Leila Zouari

2
Talking Faces
  • Recognition of face features (lips, jaw, eyebrows, gaze, eye blinks, ...) in synchrony with speech
  • Tracking of lip movements
  • Recognition of visemes
  • Lip reading: how well do hard-of-hearing people perform?
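Lip reading works on visemes rather than phonemes because several phonemes produce the same visible mouth shape. A minimal sketch of what that means for recognition, assuming a small, hypothetical phoneme-to-viseme table with ARPAbet-style symbols (published tables differ in the number and make-up of classes): two words are indistinguishable to a purely visual recognizer when their viseme sequences coincide.

```python
# Hypothetical, partial phoneme-to-viseme table (ARPAbet-style symbols), for illustration only.
PHONEME_TO_VISEME = {
    "p": "bilabial", "b": "bilabial", "m": "bilabial",
    "f": "labiodental", "v": "labiodental",
    "th": "dental", "dh": "dental",
    "t": "alveolar", "d": "alveolar", "s": "alveolar", "z": "alveolar", "n": "alveolar",
    "k": "velar", "g": "velar",
    "ae": "open", "iy": "spread", "uw": "rounded",
}


def to_visemes(phonemes):
    """Map a phoneme sequence to viseme classes; phonemes missing from the table become 'other'."""
    return [PHONEME_TO_VISEME.get(p, "other") for p in phonemes]


def visually_confusable(phonemes_a, phonemes_b):
    """True if a purely visual recognizer sees the same viseme sequence for both inputs."""
    return to_visemes(phonemes_a) == to_visemes(phonemes_b)


if __name__ == "__main__":
    pat, bat, mad, fat = ["p", "ae", "t"], ["b", "ae", "t"], ["m", "ae", "d"], ["f", "ae", "t"]
    print(visually_confusable(pat, bat))  # True: /p/ and /b/ look alike
    print(visually_confusable(pat, mad))  # True: /m/ looks like /p/, /d/ like /t/
    print(visually_confusable(pat, fat))  # False: /f/ is labiodental
```

The two modalities are complementary here: the acoustic signal separates /p/ from /b/ through voicing, while the lips help with place contrasts such as /b/ versus /d/, which acoustic noise tends to mask.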

3
F. J. Huang and T. Chen, "Real-Time Lip-Synch Face Animation driven by human voice", IEEE Workshop on Multimedia Signal Processing, Los Angeles, California, Dec. 1998.
Audio-visual recognition of spectrally reduced speech, Frédéric Berthommier
4
Speechreading
A human listener can use visual cues, such as lip and tongue movements, to improve speech understanding, especially in a noisy environment. The process of combining the audio and visual modalities is referred to as speechreading, or lipreading.
There are many applications in which speech must be recognized in extremely adverse acoustic environments, for example: detecting a person's speech from a distance or through a glass window, understanding a person speaking in a very noisy crowd, and monitoring speech in a TV broadcast when the audio link is weak or corrupted.
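The benefit of the visual channel in noise is often exploited by weighting the two streams. A minimal sketch, assuming each modality already yields a log-likelihood per candidate unit and that the audio weight λ follows a hypothetical linear rule in the estimated SNR (the thresholds, class labels and scores below are made up): the fused score is λ log p(a|c) + (1 − λ) log p(v|c), so the visual stream takes over as the audio becomes unreliable.

```python
def audio_weight(snr_db, low_db=-5.0, high_db=25.0, floor=0.1, ceil=0.9):
    """Hypothetical rule: the audio weight rises linearly with SNR between low_db and high_db."""
    t = (snr_db - low_db) / (high_db - low_db)
    t = min(max(t, 0.0), 1.0)  # clamp to [0, 1]
    return floor + t * (ceil - floor)


def fuse(audio_loglik, visual_loglik, lam):
    """Per-class fused score: lam * log p(a|c) + (1 - lam) * log p(v|c)."""
    return {c: lam * audio_loglik[c] + (1.0 - lam) * visual_loglik[c] for c in audio_loglik}


def classify(audio_loglik, visual_loglik, snr_db):
    """Pick the class with the highest fused score at the given SNR."""
    scores = fuse(audio_loglik, visual_loglik, audio_weight(snr_db))
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Made-up scores: the audio model prefers "ba", the visual model prefers "da".
    audio = {"ba": -8.0, "da": -12.0}
    visual = {"ba": -15.0, "da": -9.0}
    for snr in (30.0, 10.0, -10.0):
        print(snr, audio_weight(snr), classify(audio, visual, snr))
```

With these scores the decision is "ba" at 30 dB (λ = 0.9) and "da" at -10 dB (λ = 0.1). How λ should really be chosen or adapted is a design question of its own; the linear ramp is only a placeholder.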
5
2001: A Space Odyssey
6
Audio-Visual Speech Recognition (Ref ?)
7
Audio-Visual Speech Recognition (Ref ?)
8
Audio-Visual Speech Recognition (Ref ?)
9
Audio-Visual Speech Recognition (Ref ?)
10
Coupled HMM
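A coupled HMM, the model family of the Nefian et al. publications listed below, keeps one hidden chain per modality; each chain's state at time t is conditioned on the states of both chains at time t−1, which lets the audio and visual streams drift out of step while remaining coupled. A minimal sketch of the forward pass over the joint state space, assuming the two chains' transitions are conditionally independent given the previous joint state and that per-frame emission likelihoods are supplied directly: α_t(i,j) = b_a,i(t) · b_v,j(t) · Σ_k,l α_(t−1)(k,l) · P(i | k,l) · P(j | k,l). All parameter names and values are illustrative.

```python
from itertools import product


def hmm_forward(pi, trans, emit):
    """Single-chain forward pass: P(observations) for an ordinary HMM (used as a sanity check)."""
    n = len(pi)
    alpha = [pi[i] * emit[0][i] for i in range(n)]
    for t in range(1, len(emit)):
        alpha = [emit[t][i] * sum(alpha[k] * trans[k][i] for k in range(n)) for i in range(n)]
    return sum(alpha)


def chmm_forward(pi_a, pi_v, trans_a, trans_v, emit_a, emit_v):
    """Forward pass of a two-chain (audio, visual) coupled HMM.

    pi_a[i], pi_v[j]           initial probabilities of the audio / visual chain
    trans_a[k][l][i]           P(audio state i at t | audio k, visual l at t-1)
    trans_v[k][l][j]           P(visual state j at t | audio k, visual l at t-1)
    emit_a[t][i], emit_v[t][j] likelihood of the frame-t audio / visual observation
    Returns the joint likelihood of the audio and visual observation sequences.
    """
    joint = list(product(range(len(pi_a)), range(len(pi_v))))
    alpha = {(i, j): pi_a[i] * pi_v[j] * emit_a[0][i] * emit_v[0][j] for i, j in joint}
    for t in range(1, len(emit_a)):
        # Each chain's next state depends on the joint previous state (k, l): this is the coupling.
        alpha = {
            (i, j): emit_a[t][i] * emit_v[t][j]
            * sum(alpha[k, l] * trans_a[k][l][i] * trans_v[k][l][j] for k, l in joint)
            for i, j in joint
        }
    return sum(alpha.values())


if __name__ == "__main__":
    # Made-up 2-state chains with the coupling switched off: audio depends only on k,
    # visual only on l, so the coupled model must reduce to two independent HMMs.
    A = [[0.7, 0.3], [0.4, 0.6]]
    V = [[0.8, 0.2], [0.1, 0.9]]
    trans_a = [[A[k] for _ in range(2)] for k in range(2)]
    trans_v = [[V[l] for l in range(2)] for _ in range(2)]
    pi_a, pi_v = [0.6, 0.4], [0.5, 0.5]
    emit_a = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
    emit_v = [[0.3, 0.7], [0.6, 0.4], [0.9, 0.1]]
    print(chmm_forward(pi_a, pi_v, trans_a, trans_v, emit_a, emit_v))
    print(hmm_forward(pi_a, A, emit_a) * hmm_forward(pi_v, V, emit_v))  # same value up to rounding
```

The demo checks a property that must hold: without coupling, the joint likelihood factorizes into the product of two independent HMM likelihoods. Enumerating the joint states costs on the order of (N_a · N_v)² operations per frame, which is fine for a sketch but not for large models.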
11
OpenCV
Open-source code for audio-visual continuous speech recognition (AVCSR) can be downloaded from
http://sourceforge.net/projects/opencvlibrary/
12
Publications 1
  • Ara V. Nefian, Lu Hong Liang, Xiao Xing Liu, Xiaobo Pi and Kevin Murphy, "Dynamic Bayesian networks for audio-visual speech recognition", EURASIP Journal on Applied Signal Processing, vol. 2002, no. 11, pp. 1274-1288, 2002.
  • Xiao Xing Liu, Yibao Zhao, Xiaobo Pi, Lu Hong Liang and Ara V. Nefian, "Audio-visual continuous speech recognition using a coupled hidden Markov model", IEEE International Conference on Spoken Language Processing, pp. 213-216, September 2002.
  • Lu Hong Liang, Xiao Xing Liu, Yibao Zhao, Xiaobo Pi and Ara V. Nefian, "Speaker independent audio-visual continuous speech recognition", IEEE International Conference on Multimedia and Expo, vol. 2, pp. 25-28, August 2002.

13
Publications 2
  • Ara V. Nefian, Lu Hong Liang, Xiao Xing Liu, Xiaobo Pi, Crusoe Mao and Kevin Murphy, "A coupled HMM for audio-visual speech recognition", International Conference on Acoustics, Speech and Signal Processing, vol. II, pp. 2013-2016, Orlando, Florida, May 2002.
  • Gerasimos Potamianos, Chalapathy Neti, Giridharan Iyengar, Andrew W. Senior and Ashish Verma, "A cascade visual front end for speaker independent automatic speechreading", International Journal of Speech Technology, Special Issue on Multimedia, vol. 4, pp. 193-208, 2001.

14
Biblio
Adjoudani, A. and Benoit, C. (1996). On the integration of auditory and visual parameters in an HMM-based ASR. In Stork, D.G. and Hennecke, M.E. (Eds.), Speechreading by Humans and Machines. Berlin, Germany: Springer, pp. 461-471.
Bregler, C. and Konig, Y. (1994). 'Eigenlips' for robust speech recognition. Proceedings International Conference on Acoustics, Speech, and Signal Processing (ICASSP)'94, Adelaide, Australia, pp. 669-672.
15
Biblio
Brooke, N.M. (1996). Talking heads and speech recognizers that can see: The computer processing of visual speech signals. In Stork, D.G. and Hennecke, M.E. (Eds.), Speechreading by Humans and Machines. Berlin, Germany: Springer, pp. 351-371.
Chen, T. (2001). Audiovisual speech processing: Lip reading and lip synchronization. IEEE Signal Processing Magazine, 18(1):9-21.
16
Dupont, S. and Luettin, J. (2000). Audio-visual speech modeling for continuous speech recognition. IEEE Transactions on Multimedia, 2(3):141-151.
Gray, M.S., Movellan, J.R., and Sejnowski, T.J. (1997). Dynamic features for visual speech-reading: A systematic comparison. In Mozer, M.C., Jordan, M.I., and Petsche, T. (Eds.), Advances in Neural Information Processing Systems 9. Cambridge, MA: MIT Press, pp. 751-757.
17
Biblio
Neti, C., Potamianos, G., Luettin, J., Matthews, I., Glotin, H., Vergyri, D., Sison, J., Mashari, A., and Zhou, J. (2000). Audio-Visual Speech Recognition. Summer Workshop 2000 Final Technical Report, Center for Language and Speech Processing, The Johns Hopkins University, Baltimore, MD (http://www.clsp.jhu.edu/ws2000/final_reports/avsr/).
Petajan, E.D. (1984). Automatic lipreading to enhance speech recognition. Proceedings Global Telecommunications Conference (GLOBECOM)'84, Atlanta, GA, pp. 265-272.
18
Rogozan, A., Deléglise, P., and Alissali, M. (1997). Adaptive determination of audio and visual weights for automatic speech recognition. Proceedings European Tutorial Research Workshop on Audio-Visual Speech Processing (AVSP)'97, Rhodes, Greece, pp. 61-64.
Summerfield, A.Q. (1987). Some preliminaries to a comprehensive account of audio-visual speech perception. In Dodd, B. and Campbell, R. (Eds.), Hearing by Eye: The Psychology of Lip-Reading. Hillsdale, NJ: Lawrence Erlbaum Associates, pp. 97-113.
19
Summerfield, Q., MacLeod, A., McGrath, M., and Brooke, M. (1989). Lips, teeth, and the benefits of lipreading. In Young, A.W. and Ellis, H.D. (Eds.), Handbook of Research on Face Processing. Amsterdam, The Netherlands: Elsevier Science Publishers, pp. 223-233.
Teissier, P., Robert-Ribes, J., Schwartz, J.-L., and Guerin-Dugue, A. (1999). Comparing models for audiovisual fusion in a noisy-vowel recognition task. IEEE Transactions on Speech and Audio Processing, 7(6):629-642.
20
Wark, T. and Sridharan, S. (1998). A syntactic approach to automatic lip feature extraction for speaker identification. Proceedings International Conference on Acoustics, Speech, and Signal Processing (ICASSP)'98, Seattle, WA, pp. 3693-3696.
A HYBRID ANN/HMM AUDIO-VISUAL SPEECH RECOGNITION SYSTEM
Martin Heckmann, Frédéric Berthommier, Kristian Kroschel
21
A HYBRID ANN/HMM AUDIO-VISUAL SPEECH RECOGNITION SYSTEM
Martin Heckmann, Frédéric Berthommier, Kristian Kroschel
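In a hybrid ANN/HMM recognizer, a neural network takes the place of the HMM's Gaussian mixtures: it estimates state posteriors P(q|x_t), which are divided by the state priors P(q) to obtain scaled likelihoods p(x_t|q)/p(x_t) for the usual Viterbi search. A minimal sketch of that general scheme, not of the specific system by Heckmann, Berthommier and Kroschel; the network outputs, the priors and the two-state left-to-right topology below are made-up stand-ins.

```python
import math


def scaled_log_likelihoods(posteriors, priors, eps=1e-10):
    """log P(q|x_t) - log P(q) for every frame t and state q.

    By Bayes' rule this equals log p(x_t|q) - log p(x_t); the p(x_t) term is shared by
    all states in a frame, so it does not change which path the decoder selects.
    """
    return [[math.log(max(p, eps)) - math.log(prior) for p, prior in zip(frame, priors)]
            for frame in posteriors]


def viterbi(log_init, log_trans, log_emit):
    """Most likely state sequence given log initial, log transition and log emission scores."""
    n = len(log_init)
    delta = [log_init[i] + log_emit[0][i] for i in range(n)]
    backpointers = []
    for t in range(1, len(log_emit)):
        ptr, new_delta = [], []
        for j in range(n):
            best = max(range(n), key=lambda k: delta[k] + log_trans[k][j])
            ptr.append(best)
            new_delta.append(delta[best] + log_trans[best][j] + log_emit[t][j])
        backpointers.append(ptr)
        delta = new_delta
    path = [max(range(n), key=lambda i: delta[i])]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    return path[::-1]


if __name__ == "__main__":
    NEG_INF = float("-inf")
    # Stand-in for network outputs: P(state | frame) for 4 frames and 2 states.
    posteriors = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]
    priors = [0.5, 0.5]  # hypothetical state priors, e.g. counted on training alignments
    log_init = [0.0, NEG_INF]  # left-to-right model: always start in state 0
    log_trans = [[math.log(0.5), math.log(0.5)], [NEG_INF, 0.0]]
    emissions = scaled_log_likelihoods(posteriors, priors)
    print(viterbi(log_init, log_trans, emissions))  # [0, 0, 1, 1]
```

Dividing by the priors matters when states are unbalanced in the training data: otherwise a network trained mostly on one state would bias the decoder toward it.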
23
C. Bregler, S. Manke, H. Hild, and A. Waibel, "Bimodal sensor integration on the example of speech-reading", in Proc. IEEE Int. Conf. on Neural Networks, 1993, pp. 667-671.
A. Rogozan and P. Deléglise, "Adaptive fusion of acoustic and visual sources for automatic speech recognition", Speech Communication, vol. 26, pp. 149-161, 1998.