WorkPackage 5: Multimodal Processing and Interaction E-TEAMS Overview

Slides: 21

Provided by: cvspC

Transcript and Presenter's Notes

1
Work-Package 5: Multimodal Processing and Interaction
E-TEAMS Overview

Leaders
Petros Maragos, ICCS-NTUA
Alexandros Potamianos, TSI-TUC

2
WP5 Outline: Description of Work in JPA3

T1. Book on Multimodal Processing and Interaction
T2. Audio-Visual Speech Analysis and Recognition
  T2.1 Audio-Visual Feature Extraction and Fusion
  T2.2 Dynamic Models for AV-ASR, Evaluation
  T2.3 Audio-Visual to Articulatory Speech Inversion
T3. Multimodal Integration for MM Analysis and Recognition
  T3.1 Video Analysis and Integration of Asynchronous Time-evolving Modalities
  T3.2 Multimodal Saliency
  T3.3 Integrated Multimedia Content Analysis
T4. Interfaces to Multimedia
  T4.1 Multimodal Dialogue Interfaces
  T4.2 Eye-tracking Interfaces for Information Retrieval
  T4.3 Mobile Interfaces
T5. Coordination of Research and Dissemination of Results

3
e-Teams: Goals and Objectives

E-Team 10: Audio-Visual Speech Analysis and Recognition
  AV Feature Extraction and Feature Fusion
  Dynamical Models for AV-ASR, Evaluation
  Audio-Visual to Articulatory Speech Inversion
E-Team 11: Multimodal Processing and Multimedia Understanding
  Video Analysis and Integration of Asynchronous Time-evolving Modalities
  Audio-Visual Attention Modeling and Salient Event Detection
  Integrated Multimedia Content Analysis
E-Team 12: Multimodal Interfaces
  Multimodal Recognition and Dialogue Systems
  Mobile Services
  Novel Interfaces (Eye-tracking)

4
e-Team 10: AV Speech Analysis and Recognition

Partners
  P. Maragos, G. Papandreou, A. Katsamanis, V. Pitsikalis (ICCS-NTUA)
  Alex Potamianos (TSI-TUC)
  Khalid Daoudi, Eduardo Sanchez-Soto (IRIT)
  Yves Laprie (INRIA-Parole)
  Guillaume Gravier, Patrick Gros (INRIA-Texmex)
  Costas Kotropoulos, N. Nikolaidis, I. Pitas (AUTH)
  Ron Kimmel (Technion)

5
e-Team 10: AV Speech Analysis and Recognition

Research areas include:
  Active-Appearance (and other Deformable) Models and Statistical Approaches for Face (or only mouth area) detection, modelling and feature extraction
  Nonlinear Speech Modelling for better audio articulatory feature extraction
  A-V Feature Fusion
  Audio-Visual to Articulatory Speech Inversion
Application areas include:
  Audio-Visual Automatic Speech Recognition (including Lip-Reading)
  Collection of AV Databases and Evaluations
  Applications of AV Articulatory Speech Inversion

6
e-Team 10: AV Speech Analysis and Recognition

The main goals of e-team 10 are:
  Goal 1: Contribute to the update of the State-of-the-Art Surveys of the WP5 MUSCLE Book.
  Goal 2: Co-author new research chapters of the WP5 MUSCLE Book.
  Goal 3: Co-author conference and journal papers on a focus theme with multiple MUSCLE partners (improve integration).
  Goal 4: Collaborate on common research agendas for AV-ASR and AV speech inversion.

7
e-Team 10: AV Speech Analysis and Recognition

Recent Work
  Audio-Visual Speech Recognition (TUC, NTUA)
  Multimodal Feature Fusion (TUC, IRIT, NTUA)
  Audio-Visual Speech Inversion (INRIA-Parole, NTUA, KTH-Speech)
  Contribution to MUSCLE Book
  AV-ASR showcase proposal
Future Plans
  Continued collaboration in the aforementioned research areas
  Book project: first draft by June
  Workshop in Athens, April 2007 (joint with e-team 11, 12)

8
e-Team 11: Multimodal Processing and Understanding

Partners
  P. Maragos, G. Evangelopoulos, K. Rapantzikos, S. Kollias (NTUA)
  Patrick Gros, Ewa Kijak, Guillaume Gravier (INRIA-Texmex)
  Costas Kotropoulos, N. Nikolaidis, I. Pitas (AUTH)
  Andreas Rauber (TU Wien)
  Alex Potamianos (TUC)
  Sanni Siltanen (VTT)
  Fred Stentiford, Wole Oyekoya (UCL)
  Enis Cetin (Bilkent)
9
e-Team 11: Multimodal Processing and Understanding

Research areas include:
  Stochastic modeling with several data streams / several temporal rates / weakly synchronized data
  Audio-Visual Cooperative Feature Extraction and Salient Event Detection
  Audio-Visual Dialogue Understanding
  Image and Text Integration
  Audio and Text Integration
Application areas include:
  Understanding (and structuring) TV and other MM documents, and preparing these documents for applications (repurposing, archiving)
  Event Detection and Segmentation in Sports Videos
  Salient Event Detection and Dialogue Detection in Movies
  Speech Transcription and NLP
  Music Genre Analysis and Music Retrieval

10
e-Team 11: Multimodal Processing and Understanding

The main goals of e-team 11 are:
  Goal 1: Contribute to the update of the State-of-the-Art Surveys of the WP5 MUSCLE Book.
  Goal 2: Co-author new research chapters of the WP5 MUSCLE Book.
  Goal 3: Co-author conference and journal papers on a focus theme with multiple MUSCLE partners (improve integration).
  Goal 4: Collaborate on a common research agenda for multimodal feature fusion, saliency detection and multimodal processing.

11
e-Team 11: Multimodal Processing and Understanding

Recent Work
  Annotated Movie Information Database (AUTH)
  Audio-Visual Saliency Detection (AUTH, INRIA-Texmex, NTUA, TUC)
  Contribution to MUSCLE Book (NTUA, TUC, AUTH, INRIA-Texmex, TU Wien, Bilkent)
  Movie summarization showcase proposal
Future Plans
  Closer collaboration between partners on a common movie DB
  Book project: first draft by June
  Workshop in Athens, April 2007 (joint with e-team 11, 12)

12
e-Team 12: Multimodal Interfaces

Partners
  Alex Potamianos, Manolis Perakakis, Michalis Toutoudakis (TUC)
  Petros Maragos, Nassos Katsamanis, George Papandreou (NTUA)
  Sanni Siltanen, Santtu Toivonen (VTT)
  Fred Stentiford (UCL)
  Ugur Gudukbay, Ozgur Ulusoy, Enis Cetin, Yigithan Dedeoglu, Serkan Genc (Bilkent University)
  Costas Kotropoulos (AUTH)
  Andreas Rauber (TU Wien)

13
e-Team 12: Multimodal Interfaces

Research areas
  multimodality
  annotation of multimedia databases
  search
  interface efficiency
  eye-tracking interfaces
  speech interfaces
  mobile interfaces
Application areas
  search/information retrieval on image and video databases
  search/information retrieval on the web
  information-seeking spoken dialogue systems
  mobile services portal/applications
  search/information retrieval for audio data

14
e-Team 12: Multimodal Interfaces

The main goals of e-team 12 are:
  Goal 1: Contribute to the update of the State-of-the-Art Surveys of the WP5 MUSCLE Book.
  Goal 2: Co-author new research chapters of the WP5 MUSCLE Book.
  Goal 3: Co-author conference and journal papers on a focus theme with multiple MUSCLE partners (improve integration).
  Goal 4: Collaborate on a common research agenda for multimodal feature fusion, saliency detection and multimodal processing.

15
e-Team 12: Multimodal Interfaces

Recent Work
  Multimodal Spoken Interfaces (TUC, NTUA)
  Mobile Interfaces (TUC, VTT)
  Contribution to MUSCLE Book (TUC, UCL, VTT)
  "Augmented assembly using a multimodal interface" showcase proposal
Future Plans
  Improve integration/collaboration between partners
  Book project: first draft by June
  Workshop in Athens, April 2007 (joint with e-team 11, 12)

16
BOOK

Title: Multimodal Processing and Interaction: Audio, Video, Text
Contents
  State-of-the-Art Reviews of WP6 and WP10 (updated)
  Contributed Research Chapters: New Work
Agenda
  Scope and thematic areas are discussed during audio-conference meetings
  Each interested participant emails a preliminary title and abstract
  Table of contents of selected chapters is discussed with all participants
  Publisher is contacted

17
Multimodal Processing and Interaction: Audio, Video, Text

PART I: Review of the State-of-the-Art
  Cross-Modal Integration for Performance Improving in Multimedia: State-of-the-Art Review
  Human-Computer Interfaces for Multimedia Retrieval: State-of-the-Art Review
PART II: New Research Directions
  Integrated Multimedia Analysis and Recognition
    Stochastic Models for Multimodal Video Analysis
    Adaptive Multimodal Fusion by Uncertainty Compensation with Application to Audiovisual Speech Recognition
    Movie Analysis with Emphasis to Dialogue Detections
    Using HMM for Action Recognition in Audio-Visual Streams
    Surveillance Using Both Video and Audio
    Audiovisual Attention Modeling and Salient Event Detection

18
Multimodal Processing and Interaction: Audio, Video, Text

PART II (cont.): New Research Directions
  Searching Multimedia Content
    Interactive Image Retrieval using a Hybrid Visual and Conceptual Content Representation
    Multi-Modal Analysis of Text and Audio Features for Music Information Retrieval
    Toward the Integration of NLP and ASR: POS Tagging and Transcription
  Interfaces to Multimedia Content
    Design Principles for Multimodal Spoken Dialogue Systems
    Eye Tracking for Image Retrieval
    Natural/Novel User Interfaces for Mobile Devices

19
WP5 e-Team Scientific Talks

WP5 e-team 10 scientific talk: "Stream weight computation for Audio-Visual Speech Recognition", by Eduardo Sanchez-Soto, IRIT (duration 15')
WP5 e-team 11 scientific talk: "Dialogue Detection in Movies", by D. Ververidis, AUTH (duration 15')
WP5 e-team 12 scientific talk: "Augmented reality visualization: Constructing the mobile user interface", by Sanni Siltanen, VTT (duration 15')

20
WP5 Scientific Talks (FRIDAY)

WP5 scientific talk: "Multimodal Fusion: Application to AV-ASR and AV Speech Inversion", by George Papandreou, NTUA (duration 15')
WP5 scientific talk: "A Natural Language Interface for a Video Database Management System", by Ugur Gudukbay, Bilkent U. (duration 15')
WP5 scientific talk: "Modality selection in Multimodal Dialogue Systems", by Alex Potamianos, TUC (duration 15')