WorkPackage 5: Multimodal Processing and Interaction ETEAMS Overview

1
Work-Package 5: Multimodal Processing and
Interaction E-TEAMS Overview
  • Leaders
  • Petros Maragos, ICCS-NTUA
  • Alexandros Potamianos, TSI-TUC

2
WP5 Outline: Description of Work in JPA3
  • T1. Book on Multimodal Processing and Interaction
  • T2. Audio-Visual Speech Analysis and Recognition
  • T2.1 Audio-Visual Feature Extraction and Fusion
  • T2.2 Dynamic Models for AV-ASR, Evaluation
  • T2.3 Audio-Visual to Articulatory Speech
    Inversion
  • T3. Multimodal Integration for MM Analysis and
    Recognition
  • T3.1 Video Analysis and Integration of
    Asynchronous Time-evolving Modalities
  • T3.2 Multimodal Saliency
  • T3.3 Integrated Multimedia Content Analysis
  • T4. Interfaces to Multimedia
  • T4.1 Multimodal Dialogue Interfaces
  • T4.2 Eye-tracking Interfaces for Information
    Retrieval
  • T4.3 Mobile Interfaces
  • T5. Coordination of research and Dissemination of
    results

3
e-Teams: Goals and Objectives
  • E-Team 10: Audio-Visual Speech Analysis and
    Recognition
  • AV Feature Extraction and Feature Fusion
  • Dynamical Models for AV-ASR, Evaluation
  • Audio-Visual to Articulatory Speech Inversion.
  • E-Team 11: Multimodal Processing and Multimedia
    Understanding
  • Video Analysis and Integration of Asynchronous
    Time-evolving Modalities
  • Audio-Visual Attention Modeling and Salient Event
    Detection
  • Integrated Multimedia Content Analysis
  • E-Team 12: Multimodal Interfaces
  • Multimodal Recognition and Dialogue Systems
  • Mobile Services
  • Novel Interfaces (Eye-tracking)

4
e-Team 10: AV Speech Analysis & Recogn.
  • Partners
  • P. Maragos, G. Papandreou, A. Katsamanis, V.
    Pitsikalis (ICCS-NTUA)
  • Alex Potamianos (TSI-TUC)
  • Khalid Daoudi, Eduardo Sanchez-Soto (IRIT)
  • Yves Laprie (INRIA-Parole)
  • Guillaume Gravier, Patrick Gros (INRIA-Texmex)
  • Costas Kotropoulos, N. Nikolaidis, I. Pitas
    (AUTH)
  • Ron Kimmel (Technion)

5
e-Team 10: AV Speech Analysis & Recogn.
  • Research areas include
  • Active-Appearance (and other Deformable) Models
    and Statistical Approaches for Face (or only
    mouth area) detection, modelling and feature
    extraction
  • Nonlinear Speech Modelling for better audio
    and articulatory feature extraction
  • A-V Feature Fusion
  • Audio-visual to Articulatory Speech Inversion
  • Application areas include
  • Audio-Visual Automatic Speech Recognition
    (including Lip-Reading)
  • Collection of AV Databases and Evaluations
  • Applications of AV articulatory Speech Inversion.

6
e-Team 10: AV Speech Analysis & Recogn.
  • The main goals of e-team 10 are:
  • Goal 1: Contribute to the Update of the
    State-of-Art Surveys of the WP5 MUSCLE Book
  • Goal 2: Co-Author New Research Chapters of the
    WP5 MUSCLE Book
  • Goal 3: Co-author conference and journal Papers
    on some focus theme with multiple MUSCLE partners
    (improve integration)
  • Goal 4: Collaboration on common research agendas
    for AV-ASR and AV speech inversion

7
e-Team 10: AV Speech Analysis & Recogn.
  • Recent Work
  • Audio-Visual Speech Recognition (TUC, NTUA)
  • Multimodal Feature Fusion (TUC, IRIT, NTUA)
  • Audio-Visual Speech Inversion (INRIA-Parole,
    NTUA, KTH-Speech)
  • Contribution to MUSCLE Book
  • AV-ASR showcase proposal
  • Future Plans
  • Continued collaboration in aforementioned
    research areas
  • Book project: first draft by June
  • Workshop in Athens, April 2007 (joint with e-teams
    11 and 12)

8
e-Team 11: Multimodal Proc. & Understanding
  • Partners
  • P. Maragos, G. Evangelopoulos, K. Rapantzikos, S.
    Kollias (NTUA)
  • Patrick Gros, Ewa Kijak, Guillaume Gravier
    (INRIA-Texmex)
  • Costas Kotropoulos, N. Nikolaidis, I. Pitas
    (AUTH)
  • Andreas Rauber (TU Wien)
  • Alex Potamianos (TUC)
  • Sanni Siltanen (VTT)
  • Fred Stentiford, Wole Oyekoya (UCL)
  • Enis Cetin (Bilkent)

9
e-Team 11: Multimodal Proc. & Understanding
  • Research areas include
  • Stochastic modeling with several data streams /
    several temporal rates / weakly synchronized data
  • Audio-Visual Cooperative Feature Extraction and
    Salient Event Detection
  • Audio-visual Dialogue Understanding
  • Image and Text Integration
  • Audio and Text Integration
  • Application areas include
  • Understand (and structure) TV and other MM
    documents, and prepare these documents for
    applications (repurposing, archiving)
  • Event Detection and Segmentation in Sports videos
  • Salient Event Detection and Dialogue Detection in
    Movies videos
  • Speech Transcription and NLP
  • Music genre analysis and music retrieval

10
e-Team 11: Multimodal Proc. & Understanding
  • The main goals of e-team 11 are:
  • Goal 1: Contribute to the Update of the
    State-of-Art Surveys of the WP5 MUSCLE Book.
  • Goal 2: Co-Author New Research Chapters of the
    WP5 MUSCLE Book.
  • Goal 3: Co-author conference and journal Papers
    on some focus theme with multiple MUSCLE partners
    (improve integration).
  • Goal 4: Collaboration on a common research agenda
    for multimodal feature fusion, saliency detection
    and multimodal processing.

11
e-Team 11: Multimodal Proc. & Understanding
  • Recent Work
  • Annotated Movie Information Database (AUTH)
  • Audio-Visual Saliency Detection (AUTH,
    INRIA-Texmex, NTUA, TUC)
  • Contribution to MUSCLE Book (NTUA, TUC, AUTH,
    INRIA-TexMex, TUWien, Bilkent)
  • Movie summarization showcase proposal
  • Future Plans
  • Closer collaboration between partners on common
    movie DB
  • Book project: first draft by June
  • Workshop in Athens, April 2007 (joint with e-teams
    10 and 12)

12
e-Team 12: Multimodal Interfaces
  • Partners
  • Alex Potamianos, Manolis Perakakis, Michalis
    Toutoudakis, TUC
  • Petros Maragos, Nassos Katsamanis, George
    Papandreou, NTUA
  • Sanni Siltanen, Santtu Toivonen, VTT
  • Fred Stentiford, UCL
  • Ugur Gudukbay, Ozgur Ulusoy, Enis Cetin, Yigithan
    Dedeoglu, Serkan Genc, Bilkent University
  • Costas Kotropoulos, AUTH
  • Andreas Rauber, TU Wien

13
e-Team 12: Multimodal Interfaces
  • Research areas
  • multimodality
  • annotation of multimedia databases
  • search
  • interface efficiency
  • eye-tracking interfaces
  • speech interfaces
  • mobile interfaces
  • Application areas
  • search/information retrieval on image and video
    databases
  • search/information retrieval on the web
  • information-seeking spoken dialogue systems
  • mobile services portal/applications
  • search/information retrieval for audio data

14
e-Team 12: Multimodal Interfaces
  • The main goals of e-team 12 are:
  • Goal 1: Contribute to the Update of the
    State-of-Art Surveys of the WP5 MUSCLE Book.
  • Goal 2: Co-Author New Research Chapters of the
    WP5 MUSCLE Book.
  • Goal 3: Co-author conference and journal Papers
    on some focus theme with multiple MUSCLE partners
    (improve integration).
  • Goal 4: Collaboration on a common research agenda
    for multimodal feature fusion, saliency detection
    and multimodal processing.

15
e-Team 12: Multimodal Interfaces
  • Recent Work
  • Multimodal Spoken Interfaces (TUC, NTUA).
  • Mobile Interfaces (TUC, VTT)
  • Contribution to MUSCLE Book (TUC, UCL, VTT)
  • Augmented assembly using a multimodal interface
    showcase proposal
  • Future Plans
  • Improve integration/collaboration between
    partners
  • Book project: first draft by June
  • Workshop in Athens, April 2007 (joint with e-teams
    10 and 11)

16
BOOK
  • Title: Multimodal Processing and Interaction:
    Audio, Video, Text
  • Contents
  • State-of-Art Reviews of WP6 and WP10 (updated)
  • Contributed Research Chapters: New Work
  • Agenda
  • Scope and Thematic Areas discussed during
    Audio-Conf Meetings
  • Each interested participant emails preliminary
    title and abstract
  • Table-of-Contents of selected chapters is
    discussed with all participants
  • Publisher is contacted

17
Multimodal Processing and Interaction: Audio,
Video, Text
  • PART I: Review of the State-of-the-Art
  • Cross-Modal Integration for Performance Improving
    in Multimedia: State-of-the-Art Review
  • Human-Computer Interfaces for Multimedia
    Retrieval: State-of-the-Art Review
  • PART II: New Research Directions
  • Integrated Multimedia Analysis and Recognition
  • Stochastic Models for Multimodal Video Analysis
  • Adaptive Multimodal Fusion by Uncertainty
    Compensation with Application to Audiovisual
    Speech Recognition
  • Movie Analysis with Emphasis to Dialogue
    Detections
  • Using HMM for Action Recognition in Audio-Visual
    streams
  • Surveillance Using Both Video and Audio
  • Audiovisual Attention Modeling and Salient Event
    Detection

18
Multimodal Processing and Interaction: Audio,
Video, Text
  • PART II (cont.): New Research Directions
  • Searching Multimedia Content
  • Interactive Image Retrieval using a Hybrid Visual
    and Conceptual Content Representation
  • Multi-Modal Analysis of Text and Audio Features
    for Music Information Retrieval
  • Toward the Integration of NLP and ASR: POS
    Tagging and Transcription
  • Interfaces to Multimedia Content
  • Design Principles for Multimodal Spoken Dialogue
    Systems
  • Eye Tracking for Image Retrieval
  • Natural/Novel User Interfaces for Mobile Devices

19
WP5 e-Team Scientific Talks
  • WP5 e-team 10 scientific talk: "Stream weight
    computation for Audio-Visual Speech Recognition",
    by Eduardo Sanchez-Soto, IRIT (duration 15')
  • WP5 e-team 11 scientific talk: "Dialogue Detection
    in Movies", by D. Ververidis, AUTH (duration 15')
  • WP5 e-team 12 scientific talk: "Augmented
    reality visualization: Constructing the mobile
    user interface", by Sanni Siltanen, VTT (duration
    15')

20
WP5 Scientific Talks (FRIDAY)
  • WP5 scientific talk: "Multimodal Fusion:
    Application to AV-ASR and AV Speech Inversion",
    by George Papandreou, NTUA (duration 15')
  • WP5 scientific talk: "A Natural Language
    Interface for a Video Database Management
    System", by Ugur Gudukbay, Bilkent U. (duration 15')
  • WP5 scientific talk: "Modality selection in
    Multimodal Dialogue Systems", by Alex Potamianos,
    TUC (duration 15')