Title: Work-Package 5: Multimodal Processing and Interaction, E-TEAMS Overview
1. Work-Package 5: Multimodal Processing and Interaction, E-TEAMS Overview
- Leaders
- Petros Maragos, ICCS-NTUA
- Alexandros Potamianos, TSI-TUC
2. WP5 Outline: Description of Work in JPA3
- T1. Book on Multimodal Processing and Interaction
- T2. Audio-Visual Speech Analysis and Recognition
- T2.1 Audio-Visual Feature Extraction and Fusion
- T2.2 Dynamic Models for AV-ASR, Evaluation
- T2.3 Audio-Visual to Articulatory Speech Inversion
- T3. Multimodal Integration for MM Analysis and Recognition
- T3.1 Video Analysis and Integration of Asynchronous Time-evolving Modalities
- T3.2 Multimodal Saliency
- T3.3 Integrated Multimedia Content Analysis
- T4. Interfaces to Multimedia
- T4.1 Multimodal Dialogue Interfaces
- T4.2 Eye-tracking Interfaces for Information Retrieval
- T4.3 Mobile Interfaces
- T5. Coordination of research and Dissemination of
results
3. e-Teams: Goals and Objectives
- E-Team 10: Audio-Visual Speech Analysis and Recognition
- AV Feature Extraction and Feature Fusion
- Dynamical Models for AV-ASR, Evaluation
- Audio-Visual to Articulatory Speech Inversion
- E-Team 11: Multimodal Processing and Multimedia Understanding
- Video Analysis and Integration of Asynchronous Time-evolving Modalities
- Audio-Visual Attention Modeling and Salient Event Detection
- Integrated Multimedia Content Analysis
- E-Team 12: Multimodal Interfaces
- Multimodal Recognition and Dialogue Systems
- Mobile Services
- Novel Interfaces (Eye-tracking)
4. e-Team 10: AV Speech Analysis and Recognition
- Partners
- P. Maragos, G. Papandreou, A. Katsamanis, V. Pitsikalis (ICCS-NTUA)
- Alex Potamianos (TSI-TUC)
- Khalid Daoudi, Eduardo Sanchez-Soto (IRIT)
- Yves Laprie (INRIA-Parole)
- Guillaume Gravier, Patrick Gros (INRIA-Texmex)
- Costas Kotropoulos, N. Nikolaidis, I. Pitas (AUTH)
- Ron Kimmel (Technion)
5. e-Team 10: AV Speech Analysis and Recognition
- Research areas include
- Active-Appearance (and other Deformable) Models and Statistical Approaches for Face (or only mouth area) detection, modelling and feature extraction
- Nonlinear Speech Modelling for better audio articulatory feature extraction
- A-V Feature Fusion
- Audio-Visual to Articulatory Speech Inversion
- Application areas include
- Audio-Visual Automatic Speech Recognition (including Lip-Reading)
- Collection of AV Databases and Evaluations
- Applications of AV Articulatory Speech Inversion
6. e-Team 10: AV Speech Analysis and Recognition
- The main goals of e-team 10 are
- Goal 1: Contribute to the update of the State-of-the-Art Surveys of the WP5 MUSCLE Book
- Goal 2: Co-author new research chapters of the WP5 MUSCLE Book
- Goal 3: Co-author conference and journal papers on some focus theme with multiple MUSCLE partners (improve integration)
- Goal 4: Collaboration on common research agendas for AV-ASR and AV speech inversion
7. e-Team 10: AV Speech Analysis and Recognition
- Recent Work
- Audio-Visual Speech Recognition (TUC, NTUA)
- Multimodal Feature Fusion (TUC, IRIT, NTUA)
- Audio-Visual Speech Inversion (INRIA-Parole, NTUA, KTH-Speech)
- Contribution to MUSCLE Book
- AV-ASR showcase proposal
- Future Plans
- Continued collaboration in aforementioned research areas
- Book project first draft by June
- Workshop in Athens, April 2007 (joint with e-teams 11, 12)
8. e-Team 11: Multimodal Processing and Understanding
- Partners
- P. Maragos, G. Evangelopoulos, K. Rapantzikos, S. Kollias (NTUA)
- Patrick Gros, Ewa Kijak, Guillaume Gravier (INRIA-Texmex)
- Costas Kotropoulos, N. Nikolaidis, I. Pitas (AUTH)
- Andreas Rauber (TU Wien)
- Alex Potamianos (TUC)
- Sanni Siltanen (VTT)
- Fred Stentiford, Wole Oyekoya (UCL)
- Enis Cetin (Bilkent)
9. e-Team 11: Multimodal Processing and Understanding
- Research areas include
- Stochastic modeling with several data streams / several temporal rates / weakly synchronized data
- Audio-Visual Cooperative Feature Extraction and Salient Event Detection
- Audio-Visual Dialogue Understanding
- Image and Text Integration
- Audio and Text Integration
- Application areas include
- Understand (and structure) TV and other MM documents, and prepare these documents for applications (repurposing, archiving)
- Event Detection and Segmentation in Sports videos
- Salient Event Detection and Dialogue Detection in Movie videos
- Speech Transcription and NLP
- Music genre analysis and music retrieval
10. e-Team 11: Multimodal Processing and Understanding
- The main goals of e-team 11 are
- Goal 1: Contribute to the update of the State-of-the-Art Surveys of the WP5 MUSCLE Book.
- Goal 2: Co-author new research chapters of the WP5 MUSCLE Book.
- Goal 3: Co-author conference and journal papers on some focus theme with multiple MUSCLE partners (improve integration).
- Goal 4: Collaboration on a common research agenda for multimodal feature fusion, saliency detection and multimodal processing.
11. e-Team 11: Multimodal Processing and Understanding
- Recent Work
- Annotated Movie Information Database (AUTH)
- Audio-Visual Saliency Detection (AUTH, INRIA-Texmex, NTUA, TUC)
- Contribution to MUSCLE Book (NTUA, TUC, AUTH, INRIA-Texmex, TU Wien, Bilkent)
- Movie summarization showcase proposal
- Future Plans
- Closer collaboration between partners on common movie DB
- Book project first draft by June
- Workshop in Athens, April 2007 (joint with e-teams 10, 12)
12. e-Team 12: Multimodal Interfaces
- Partners
- Alex Potamianos, Manolis Perakakis, Michalis Toutoudakis, TUC
- Petros Maragos, Nassos Katsamanis, George Papandreou, NTUA
- Sanni Siltanen, Santtu Toivonen, VTT
- Fred Stentiford, UCL
- Ugur Gudukbay, Ozgur Ulusoy, Enis Cetin, Yigithan Dedeoglu, Serkan Genc, Bilkent University
- Costas Kotropoulos, AUTH
- Andreas Rauber, TU Wien
13. e-Team 12: Multimodal Interfaces
- Research areas
- multimodality
- annotation of multimedia databases
- search
- interface efficiency
- eye-tracking interfaces
- speech interfaces
- mobile interfaces
- Application areas
- search/information retrieval on image and video databases
- search/information retrieval on the web
- information-seeking spoken dialogue systems
- mobile services portal/applications
- search/information retrieval for audio data
14. e-Team 12: Multimodal Interfaces
- The main goals of e-team 12 are
- Goal 1: Contribute to the update of the State-of-the-Art Surveys of the WP5 MUSCLE Book.
- Goal 2: Co-author new research chapters of the WP5 MUSCLE Book.
- Goal 3: Co-author conference and journal papers on some focus theme with multiple MUSCLE partners (improve integration).
- Goal 4: Collaboration on a common research agenda for multimodal feature fusion, saliency detection and multimodal processing.
15. e-Team 12: Multimodal Interfaces
- Recent Work
- Multimodal Spoken Interfaces (TUC, NTUA)
- Mobile Interfaces (TUC, VTT)
- Contribution to MUSCLE Book (TUC, UCL, VTT)
- "Augmented assembly using a multimodal interface" showcase proposal
- Future Plans
- Improve integration/collaboration between partners
- Book project first draft by June
- Workshop in Athens, April 2007 (joint with e-teams 10, 11)
16. BOOK
- Title: Multimodal Processing and Interaction: Audio, Video, Text
- Contents
- State-of-the-Art Reviews of WP6 and WP10 (updated)
- Contributed Research Chapters: New Work
- Agenda
- Scope and Thematic Areas discussed during Audio-Conf Meetings
- Each interested participant emails preliminary title and abstract
- Table-of-Contents of selected chapters is discussed with all participants
- Publisher is contacted
17. Multimodal Processing and Interaction: Audio, Video, Text
- PART I: Review of the State-of-the-Art
- Cross-Modal Integration for Performance Improving in Multimedia: State-of-the-Art Review
- Human-Computer Interfaces for Multimedia Retrieval: State-of-the-Art Review
- PART II: New Research Directions
- Integrated Multimedia Analysis and Recognition
- Stochastic Models for Multimodal Video Analysis
- Adaptive Multimodal Fusion by Uncertainty Compensation with Application to Audiovisual Speech Recognition
- Movie Analysis with Emphasis to Dialogue Detections
- Using HMM for Action Recognition in Audio-Visual Streams
- Surveillance Using Both Video and Audio
- Audiovisual Attention Modeling and Salient Event Detection
18. Multimodal Processing and Interaction: Audio, Video, Text
- PART II (cont.): New Research Directions
- Searching Multimedia Content
- Interactive Image Retrieval using a Hybrid Visual and Conceptual Content Representation
- Multi-Modal Analysis of Text and Audio Features for Music Information Retrieval
- Toward the Integration of NLP and ASR: POS Tagging and Transcription
- Interfaces to Multimedia Content
- Design Principles for Multimodal Spoken Dialogue Systems
- Eye Tracking for Image Retrieval
- Natural/Novel User Interfaces for Mobile Devices
19. WP5 e-Team Scientific Talks
- WP5 e-team 10 scientific talk: "Stream weight computation for Audio-Visual Speech Recognition", by Eduardo Sanchez-Soto, IRIT (duration 15')
- WP5 e-team 11 scientific talk: "Dialogue Detection in Movies", by D. Ververidis, AUTH (duration 15')
- WP5 e-team 12 scientific talk: "Augmented reality visualization: Constructing the mobile user interface", by Sanni Siltanen, VTT (duration 15')
20. WP5 Scientific Talks (FRIDAY)
- WP5 scientific talk: "Multimodal Fusion: Application to AV-ASR and AV Speech Inversion", by George Papandreou, NTUA (duration 15')
- WP5 scientific talk: "A Natural Language Interface for a Video Database Management System", by Ugur Gudukbay, Bilkent U. (duration 15')
- WP5 scientific talk: "Modality selection in Multimodal Dialogue Systems", by Alex Potamianos, TUC (duration 15')