Title: VACE Multimodal Meeting Corpus
Lei Chen, Travis Rose, Fey Parrill, Xu Han, Jilin Tu, Zhongqiang Huang, Mary Harper, Francis Quek, David McNeill, Ronald Tuttle, and Thomas Huang
- We acknowledge support from:
- NSF STIMULATE program, Grant No. IRI-9618887, "Gesture, Speech, and Gaze in Discourse Segmentation"
- NSF KDI program, Grant No. BCS-9980054, "Cross-Modal Analysis of Signal and Sense: Multimedia Corpora and Tools for Gesture, Speech, and Gaze Research"
- NSF ITR program, Grant No. IIS-0219875, "Beyond the Talking Head and Animated Icon: Behaviorally Situated Avatars for Tutoring"
- ARDA VACE II program, "From Video to Information: Cross-Modal Analysis of Planning Meetings"
- Francis Quek
- Professor of Computer Science
- Director, Center for Human Computer Interaction
- Virginia Tech
2 Corpus Rationale
- A quest for meaning: embodied cognition and language production drive our research
- Analysis of natural human-human meetings
- Resource in support of research in:
- Multimodal language analysis
- Speech recognition and analysis
- Vision-based communicative behavior analysis
3 Why Multimodal Language Analysis?
- S1: you know like those fireworks?
- S2: well if we're trying to drive 'em / out here we need to put 'em up here
- S1: yeah well what I'm saying is we should
- S2: in front
- S1: we should do it we should make it a line through the rooms / so that they explode like here then here then here then here
4 Multimodal Language Example
5 Embodied Communicative Behavior
- Constructed dynamically at the moment of speaking (thinking for speaking)
- Dependent on cultural, personal, social, and cognitive differences
- Speaker is often unaware of gestures
- Reveals the contrastive foci of the language stream (Hajičová, Halliday et al.)
- Is co-expressive (co-temporal) with speech
- Is multiply determined
- Temporal synchrony is critical for analysis
6 In a Nutshell
- Gesture/Speech Framework (McNeill 1992, 2000, 2001; Quek et al. 1999-2003)
7 ARDA/VACE Program
- ARDA is to the intelligence community what DARPA is to the military
- Interest is in the exploitation of video data (Video Analysis and Content Exploitation)
- A key VACE challenge: Meeting Analysis
- Our key theme: Multimodal communication analysis
8 From Video to Information: Cross-Modal Analysis for Planning Meetings
9 Team
- Multimodal Meeting Analysis: A Cross-Disciplinary Enterprise
10 Overarching Approach
- Coordinated multidisciplinary research
- Corpus assembly
- Data is transcribed and coded for relevant speech/language structure
- War-gaming (planning) scenarios are captured to provide real planning behavior in a controlled experimental context (reducing many unknowns)
- The meeting room is multiply instrumented with cross-calibrated video, synchronized audio/video, and motion tracking
- All data components are time-aligned across the dataset
- Multimodal video processing research
- Research on posture, head position/orientation, gesture tracking, hand-shape recognition, and multimodal integration
- Research on tools for analysis, coding, and interpretation
- Speech analysis research in support of multimodality
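Because every data component is time-aligned across the dataset, downstream analysis needs all modalities resampled onto a common clock. A minimal sketch of that step; the function name, the (timestamp, value) stream representation, and the nearest-sample policy are assumptions for illustration, not the project's actual tooling:

```python
import bisect

def align_streams(streams, t0, t1, rate):
    """Resample several {name: [(timestamp, value), ...]} streams onto
    one common clock from t0 to t1 at `rate` Hz, taking the sample
    nearest in time to each clock tick."""
    n = int((t1 - t0) * rate) + 1
    ticks = [t0 + i / rate for i in range(n)]
    aligned = {}
    for name, samples in streams.items():
        times = [t for t, _ in samples]
        row = []
        for t in ticks:
            i = bisect.bisect_left(times, t)
            # pick whichever neighbor is closer in time
            if i == 0:
                j = 0
            elif i == len(times) or t - times[i - 1] <= times[i] - t:
                j = i - 1
            else:
                j = i
            row.append(samples[j][1])
        aligned[name] = row
    return ticks, aligned
```

With audio samples at 0.0 s and 1.0 s and motion-capture samples at 0.2 s and 0.9 s, a 1 Hz common clock picks the nearest sample from each stream at each tick.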
11 Scenarios
- Each Scenario to have Five Participants
- Roles Tailored to Available Participant Expertise
- Five Initial Scenarios
- Delta II Rocket Launch
- Foreign Material Exploitation
- Intervention to Support Democratic Movement
- Humanitarian Assistance
- Scholarship Selection
12 Scenarios (cont'd)
- Planned Scenarios (to be Developed)
- Lost Aircraft Crisis Response
- Hostage Rescue
- Downed Pilot Search Rescue
- Bomb Shelter Design
13 Scenario Development
- Humanitarian Assistance Walkthrough
- Purpose: Develop Plan for Immediate Military Support to Dec 04 Asian Tsunami Victims
- Considerable Open-Source Information from the Internet for Scenario Development
- Roles
- Medical Officer
- Task Force Commander
- Intel Officer
- Operations Officer
- Weather Officer
Mission, Goals, and Priorities Provided for Each Role
14 Meulaboh, Indonesia
As intelligence officer, your role is to provide
intelligence support to OPERATION UNIFIED
ASSISTANCE. While the extent of damage is still
unknown, early reporting indicates that coastal
areas throughout South Asia have been affected.
Communications have been lost with entire towns.
Currently, the only means of determining the
magnitude of destruction is from overhead assets.
Data from the South Asia and Sri Lanka region
has already been received from civilian remote
sensing satellites. Although the US military
will be operating in the region on a strictly
humanitarian mission, the threat of hostile
action against US personnel by terrorist
factions opposed to the US remains. As intel
officer, you are responsible for briefing the
nature of the terrorist threat in the region.
[Satellite imagery of Meulaboh: before and after the tsunami]
15 Corpus Assembly
16 Data Acquisition and Processing
- Multi-modal Elicitation Experiment
- Video Processing: 10-camera calibration, vector extraction, hand tracking, gaze tracking, head modeling, head tracking, body tracking
- Motion Capture Interpretation
- Speech/Audio Processing: automatic transcript, word/syllable alignment to audio, audio feature extraction
- Speech Transcription and Psycholinguistic Coding
- Time-Aligned Multimedia Transcription
17 Meeting Room and Camera Configuration
[Figure: meeting-room layout with participant positions A-H and cameras C1-C10]
Stereo camera pairs:
1: C9-C3   2: C1-C3   3: C9-C1   4: C4-C8    5: C4-C6   6: C6-C8
7: C7-C10  8: C2-C5   9: C2-C4   10: C3-C5   11: C7-C9  12: C8-C10
Camera coverage (participant positions seen by each camera):
C1: D,E,F   C2: H,G,F   C3: F,E   C4: H,A   C5: F,G,H
C6: B,A,H   C7: D,C,B   C8: B,A   C9: D,E   C10: B,C,D
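Once a camera pair is cross-calibrated, a point seen in both views can be reconstructed in 3-D by triangulation. A minimal sketch using linear (DLT) triangulation; the projection matrices and function name here are hypothetical, not the project's actual calibration code:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3-D point from a calibrated
    camera pair.  P1, P2 are 3x4 projection matrices; x1, x2 are the
    (u, v) observations of the same point in each view."""
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # the homogeneous 3-D point is the null vector of A
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # dehomogenize
```

For noisy real observations the SVD solution minimizes the algebraic error; a bundle-adjustment refinement would typically follow.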
18 Cam1
19 Global Pairwise Camera Calibration
- 48 calibration dots for calibration
- 18 Vicon markers for coordinate-system transformation: Y = RX + T
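The transformation Y = RX + T from the calibration frame to the Vicon coordinate system can be estimated from the corresponding marker positions by the Kabsch/Procrustes method; a sketch under that assumption (the slides do not specify the algorithm actually used):

```python
import numpy as np

def rigid_transform(X, Y):
    """Least-squares rotation R and translation T with Y ~ R X + T,
    estimated from matched marker positions (Kabsch/Procrustes).
    X, Y: (N, 3) arrays of corresponding points in the two frames."""
    cx, cy = X.mean(axis=0), Y.mean(axis=0)
    H = (X - cx).T @ (Y - cy)           # cross-covariance of centered points
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against a reflection
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T
    T = cy - R @ cx
    return R, T
```

With exact correspondences this recovers R and T exactly; with measurement noise it gives the least-squares rigid fit.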
20 Error Distributions in Meeting Room Area (camera pair 512)
[Figure: error-distribution histograms for the X, Y, and Z directions]
Direction   Minimum     Maximum     Mean
X           0.4000 mm   0.5886 mm   0.4755 mm
Y           0.3077 mm   0.6925 mm   0.4529 mm
Z           0.3804 mm   0.5064 mm   0.4317 mm
21 VICON Motion Capture
- Motion capture technology
- Near-IR cameras
- Retro-reflective markers
- Datastation PC workstation
- Vicon modes of operation
- Individual points (as seen in calibration)
- Kinematic models
- Individual objects
22 VICON Motion Capture
- Learning about MoCap
- 11/03: Initial installation
- 6/04: Pilot scenario, using kinematic models
- 10/04: Follow-up training using object models
- 11/04: Rehearsed using Vicon with object models
- 1/05: Data captured for FME scenario
- Export position information for each participant's head, hand, and body position/orientation
- Post-processing of motion capture data: about 1 hour per minute for a 5-participant meeting
- Incorporating MoCap into workflow
- Labeling of point clusters is labor-intensive
- 3 work-study students @ 20 hours/wk process 60 minutes (1 dataset) per week
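The labor-intensive point-cluster labeling above is commonly bootstrapped by carrying labels from one frame to the next before manual cleanup. A toy illustration; the function name, the nearest-neighbour policy, and the 30 mm jump threshold are assumptions, not the Vicon workflow itself:

```python
import math

def propagate_labels(labeled_prev, unlabeled, max_jump=30.0):
    """Carry marker labels from the previous frame to the current one
    by greedy nearest-neighbour matching.
    labeled_prev: {label: (x, y, z)}; unlabeled: list of (x, y, z).
    Markers that move more than max_jump mm are left unmatched."""
    out = {}
    taken = set()
    for label, p in labeled_prev.items():
        best, best_d = None, max_jump
        for i, q in enumerate(unlabeled):
            if i in taken:
                continue
            d = math.dist(p, q)
            if d < best_d:
                best, best_d = i, d
        if best is not None:
            out[label] = unlabeled[best]
            taken.add(best)
    return out
```

Unmatched points (occlusions, ghost reflections) are exactly where the human labeling effort mentioned in the slides goes.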
23 Speech Processing Tasks
- Formulate an audio workflow to support the efficient and effective construction of a large, high-quality multimodal corpus
- Implement support tools to achieve this goal
- Package time-aligned word transcriptions into appropriate data formats that can be efficiently shared and used
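As one way of packaging time-aligned words, each forced-alignment record can be serialized one row per word. The CSV schema below is hypothetical, for illustration only; the corpus's real interchange format is not shown in the slides:

```python
import csv
import io

def package_alignment(words, speaker, out):
    """Write forced-alignment output, one row per word:
    speaker, start (s), end (s), word.  Hypothetical schema."""
    w = csv.writer(out)
    w.writerow(["speaker", "start", "end", "word"])
    for start, end, word in words:
        w.writerow([speaker, f"{start:.3f}", f"{end:.3f}", word])

# example: two aligned words from a hypothetical S1 utterance
buf = io.StringIO()
package_alignment([(12.310, 12.550, "we"), (12.550, 12.940, "should")],
                  "S1", buf)
```

Keeping start/end times in seconds per word is what allows the transcript to be time-aligned against the video and motion-capture streams.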
24 Audio Processing
Pipeline: Audio Recording and Meeting Metadata Annotation -> Audio Segmentation -> Manual Transcription -> OOV Word Resolution -> Forced Alignment -> Corpus Integration
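The audio-segmentation stage of a pipeline like this is often approximated by finding runs of high-energy frames. A toy stand-in (the threshold, frame conventions, and minimum run length are assumptions, not the project's segmenter):

```python
def segment_by_energy(frames, threshold, min_len=3):
    """Split a frame-level energy track into speech segments: runs of
    at least min_len consecutive frames above `threshold`.
    Returns (start_frame, end_frame) pairs, end exclusive."""
    segments, start = [], None
    for i, e in enumerate(frames):
        if e > threshold and start is None:
            start = i                      # run begins
        elif e <= threshold and start is not None:
            if i - start >= min_len:       # keep only long-enough runs
                segments.append((start, i))
            start = None
    if start is not None and len(frames) - start >= min_len:
        segments.append((start, len(frames)))
    return segments
```

Segments too short to be speech (here, under 3 frames) are dropped, which suppresses clicks and breath noise.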
25 VACE Metadata Approach
26 Data Collection Status
- Pilot, June 04
- Low audio volume: sound mixer purchased
- Video frame drop-out: purchased high-grade DV tapes
- AFIT 02-07-05: Democratic movement assistance
- AFIT 02-07-05: Democratic movement assistance, session 2
- Audio clipping in close-in mics; may be able to salvage data using the desktop mics
- AFIT 02-24-05: Humanitarian Assistance (Tsunami)
- AFIT 03-04-05: Humanitarian Assistance (Tsunami)
- AFIT 03-18-05: Scholarship selection
- AFIT 04-08-05: Humanitarian Assistance (Tsunami)
- AFIT 04-08-05: Card game
- AFIT 04-25-05: Problem-solving task (cause of deterioration of the Lincoln Memorial)
- AFIT 06-??-05: Problem-solving task
27 Some Multimodal Meeting Room Results
28 Lance Armstrong Episode: F1 vs F2
NIST Microcorpus, July 29, 2003: Meeting Dynamics
29 Gaze - NIST July 29, 2003 Data
Gaze direction tracks social patterns (interactive gaze) and engagement of objects (instrumental gaze), which may be a form of pointing as well as perception.
[Figure: gaze source/target timelines showing instrumental gaze and interactive gaze occurrences over a 5 min. sample]
30 Gaze - AFIT Data
[Figure: gazer vs. gazee timeline]
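Gazer/gazee plots like these require deciding, per frame, which participant the tracked gaze vector points at. A 2-D toy sketch; the cone half-angle and all names are assumptions, not the project's gaze coder:

```python
import math

def gaze_target(gazer_pos, gaze_dir, others, cone_deg=15.0):
    """Pick which participant (if any) a gazer is looking at: the one
    whose direction from the gazer lies closest to the tracked gaze
    vector, within a cone_deg half-angle cone.  Positions are (x, y);
    gaze_dir is a unit vector.  Returns a name or None."""
    best, best_ang = None, cone_deg
    for name, pos in others.items():
        dx, dy = pos[0] - gazer_pos[0], pos[1] - gazer_pos[1]
        norm = math.hypot(dx, dy)
        if norm == 0:
            continue
        cosang = (dx * gaze_dir[0] + dy * gaze_dir[1]) / norm
        ang = math.degrees(math.acos(max(-1.0, min(1.0, cosang))))
        if ang < best_ang:
            best, best_ang = name, ang
    return best
```

Frames where no participant falls inside the cone would be candidates for instrumental gaze (engagement with objects) rather than interactive gaze.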
31 F-formation Analysis
- "An F-formation arises when two or more people cooperate together to maintain a space between them to which they all have direct and exclusive equal access." (A. Kendon, 1977)
- An F-formation is discovered from tracking gaze direction in a social group.
- It is not only about shared space.
- It reveals common ground and has an associated meaning.
- The cooperative property is crucial.
- It is useful for detecting units of thematic content being jointly developed in a conversation.
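One crude computational reading of this idea: link participants whose orientations converge on each other, then close the links into groups. Purely illustrative, a stand-in for the F-formation coding shown on the following slides; the cone threshold and grouping rule are assumptions:

```python
import math

def mutually_oriented(positions, facings, cone_deg=30.0):
    """Group participants whose orientations converge: i and j are
    linked when each faces the other to within cone_deg, and links
    are closed transitively into groups (union-find).
    positions, facings: {name: (x, y)}; facings are unit vectors."""
    def faces(i, j):
        dx = positions[j][0] - positions[i][0]
        dy = positions[j][1] - positions[i][1]
        norm = math.hypot(dx, dy)
        fx, fy = facings[i]
        cosang = (dx * fx + dy * fy) / norm
        return math.degrees(math.acos(max(-1.0, min(1.0, cosang)))) <= cone_deg

    names = list(positions)
    parent = {n: n for n in names}
    def find(n):
        while parent[n] != n:
            n = parent[n]
        return n
    for i in names:
        for j in names:
            if i < j and faces(i, j) and faces(j, i):
                parent[find(j)] = find(i)   # merge the two groups
    groups = {}
    for n in names:
        groups.setdefault(find(n), set()).add(n)
    return [g for g in groups.values() if len(g) > 1]
```

The mutual-facing test captures the "cooperative property" the slide stresses: one person merely looking at another does not create an F-formation.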
32 NIST F-Formation Coding (76.11s-92.27s)
33 NIST F-Formation Coding (92.27s-108.97s)
34 Summary
- Corpus collection based on sound scientific foundations
- Data includes audio, video, motion capture, speech transcription, and manual codings
- A suite of tools for visualizing and coding the co-temporal data has been developed
- Research results demonstrate multimodal discourse segmentation and meeting dynamics analysis