1
A Multimodal Analysis of Floor Control in Meetings
  • Lei Chen, Mary Harper, Amy Franklin, R. Travis
    Rose, Irene Kimbara, Zhongqiang Huang, Francis
    Quek

2
Multimodal Study of Floor Control
  • An underlying mechanism is employed to control the distribution of the floor among participants; understanding floor control in dialogs and meetings is helpful for discerning their structure.
  • In a meeting for which the current floor holder is known, it is interesting to predict:
  • Whether floor control will change
  • Who the next floor holder will be
  • We investigate multimodal cues for floor control
    in two VACE meetings

3
Importance of Floor Control
  • The floor holder represents a primary thread in summarizing meetings, so identifying the primary channels (audio and visual) is important:
  • Camera focus
  • Special purpose signal processing
  • More natural human-like conversational agents
  • Using human conversational principles related to
    the distribution of floor control
  • Automatic meeting analysis
  • Floor control information can contribute to
    revealing the topic flow and interaction patterns
    that emerge during meetings.

4
Prior Work
  • Conversation Analysis Research: Sacks et al. (1974) posited that a conversation is built on turn constructional units (TCUs), complete units with respect to intonation contours, syntax, and semantics. A transition relevance place (TRP) raises the likelihood that another speaker can take over the floor and start speaking. Many cues are used by participants to predict the end of TCUs (Duncan, 1972; Argyle and Cook, 1974).
  • Dialog-based Research: prosody (Caspers, 2000; Wichmann and Caspers, 2001), gaze (Novick et al., 1996), dialog acts (Shriberg et al., 2004)
  • Multiparty Meetings: Padilha and Carletta (2002, 2003), Novick (2005), Vertegaal et al. (2001)
  • Meeting Collections: the ISL audio corpus, the ICSI audio corpus, the NIST audio-visual corpus, and the MM4 audio-visual corpus

5
VACE II Meeting Room Data
6
Data and Annotations
  • Two VACE meetings
  • Jan. 07: foreign weapon testing (41.6 minutes, 5 participants, 9,871 words)
  • March 18: scholarship selection (44.4 minutes, 5 participants, 7,547 words)
  • Multimodal annotation

7
January 7th Excerpt
8
March 18th Excerpt
9
Word and SU Annotations
  • Words (Purdue and U Chicago)
  • Segment into speech and non-speech chunks (IHM)
  • Transcribe speech chunks using LDC Quick
    Transcription (QTR) guidelines
  • Obtain time alignments given pronunciations for
    all words using ASR
  • SUs (Purdue)
  • Use the EARS MDE annotation specification V6.2
  • SU segmentation
  • Types: statement, question, backchannel, incomplete
  • Used a hidden-event LM to automatically provide SU hypotheses that were then hand corrected (see the sketch below)
  • The Anvil interface allowed us to view time-aligned transcripts while consulting audio and video cues when annotating the SUs in the meetings.

10
Gesture and Gaze Annotations
  • Gesture and gaze coding was done on MacVissta
    under Mac OS X. The display and annotation tool
    supports the simultaneous display of multiple
    MPEG-4 videos (representing different camera
    angles) and enables the annotator to select an
    appropriate view from any of the videos to
    produce more accurate gaze/gesture coding.
  • 10 cameras were used to record the meeting participants from different viewing angles, supporting the annotation of each participant's gaze direction and gestures.
  • Annotators had access to time aligned word
    transcriptions and all of the videos when
    producing gaze and gesture annotations.

11
Gaze Annotations
  • Gaze was annotated by researchers in the McNeill
    Lab at U. Chicago
  • Gaze target plus start and end times were marked
  • Based on markup of major saccades (intervals between fixations) spanning at least 3 frames of video (a resolution insufficient for micro-saccades)
  • Gaze targets were segmented into areas and objects, which we collapsed into: each participant, paper, table, whiteboard, neutral space, and other (see the sketch below)
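  • A small sketch of this collapsing step; the raw target names and the mapping below are hypothetical examples, only the collapsed categories come from this slide:

    # Collapsed categories from this slide; raw targets are illustrative.
    PARTICIPANTS = {"A", "B", "C", "D", "E"}
    CATEGORIES = {"paper", "table", "whiteboard", "neutral space", "other"}
    RAW_TO_CATEGORY = {"notes": "paper", "laptop": "other", "door": "other"}

    def collapse_gaze_target(raw_target):
        """Map an annotated gaze target onto the categories used in the study."""
        if raw_target in PARTICIPANTS:
            return raw_target              # gaze at a specific participant is kept
        if raw_target in CATEGORIES:
            return raw_target
        return RAW_TO_CATEGORY.get(raw_target, "other")

    print(collapse_gaze_target("C"))       # C
    print(collapse_gaze_target("notes"))   # paper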

12
Gesture Annotations
  • Gesture was annotated by researchers in the
    McNeill Lab at U. Chicago
  • Gesture types that were annotated and used in our investigations:
  • Emblematic gestures: e.g., a thumbs up means "good" in some cultures.
  • Four gesticulation types were also annotated and used in our investigations:
  • Metaphoric: e.g., gestures containing smooth, continuous motions (such as sweeping, arcing, or dragging) for continuous change
  • Iconic: e.g., "and he bends it way back" while making a gesture of appearing to grip something and pull it back.
  • Deictic: gestures used to point to entities during a communication.
  • Beat: simple rhythmical hand motions.
  • Note that fidgeting and instrumental movements were excluded.

13
(No Transcript)
14
Floor Annotations
  • Six types of floor annotations (Purdue)
  • Control: who has control of the floor and which participants comprise the floor
  • Sidebar: used to represent sub-floors that have split off from the main thread of the meeting. Again, we want to record who has control and which participants are involved.
  • Backchannel: an SU type involving utterances like "yeah" spoken while another participant controls the floor.
  • Challenge: an attempt to grab the floor.
  • Cooperative: an utterance inserted into the middle of the floor controller's utterance (like a backchannel but with propositional content)
  • Other: other vocalizations, e.g., self talk, that do not contribute to any current floor thread.
  • The Anvil interface allowed us to view time-aligned transcripts and SU annotations while consulting audio and video cues when annotating the floor events in the meetings. (A minimal data-structure sketch of these event types follows.)
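  • One possible way to represent these floor annotations in code, as a sketch; the class and field names are ours, not those of the actual annotation files:

    from dataclasses import dataclass, field
    from enum import Enum

    class FloorEventType(Enum):
        CONTROL = "control"
        SIDEBAR = "sidebar"
        BACKCHANNEL = "backchannel"
        CHALLENGE = "challenge"
        COOPERATIVE = "cooperative"
        OTHER = "other"

    @dataclass
    class FloorEvent:
        event_type: FloorEventType
        holder: str                      # participant controlling the floor, if any
        participants: list = field(default_factory=list)  # who takes part in the floor/sidebar
        start: float = 0.0               # seconds from the start of the meeting
        end: float = 0.0

    def floor_holders_in_order(events):
        """The sequence of floor holders, taken from Control events in time order."""
        controls = sorted((e for e in events if e.event_type is FloorEventType.CONTROL),
                          key=lambda e: e.start)
        return [e.holder for e in controls]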

15
Cooperative Example
16
Challenge Example
17
Questions
  • Audio
  • How frequently do verbal backchannels occur in
    meetings?
  • Are discourse markers (e.g., right, so, well) used more frequently at the beginning, middle, or end of a control event? (See the sketch after this list.)
  • Gaze
  • When a holder finishes his/her turn, does he/she
    gaze at the next floor holder more often than at
    other potential targets?
  • When a holder takes control of the floor, does
    he/she gaze at the previous floor holder more
    often than at other potential targets?
  • Do we observe frequent mutual gaze breaks between two adjacent floor holders during floor changes?
  • Gesture
  • How frequently does the previous floor holder
    make floor yielding gestures such as pointing to
    the next floor holder?
  • How frequently does the next floor holder make
    floor grabbing gestures to gain control of the
    floor?
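  • As an illustration of the discourse-marker question above, a toy sketch that buckets a DM's onset into beginning / middle / end of its enclosing control event; the thirds-based split is an assumption of the sketch, not necessarily the criterion used in the study:

    def dm_position(dm_time, floor_start, floor_end):
        """Bucket a discourse marker's onset time relative to its control event."""
        rel = (dm_time - floor_start) / (floor_end - floor_start)
        if rel < 1 / 3:
            return "beginning"
        if rel < 2 / 3:
            return "middle"
        return "end"

    print(dm_position(10.5, 10.0, 40.0))   # beginning
    print(dm_position(38.0, 10.0, 40.0))   # end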

18
Measurement Study
  • Goals
  • To gain insight into mechanisms governing floor
    control in meetings
  • To identify useful multimodal cues for an
    automatic floor control identification system
  • Measurements
  • Basic meeting statistics
  • Speech events
  • Verbal backchannels
  • Discourse markers (DM)
  • Gaze events
  • Gaze distribution at floor transitions
  • Meeting manager's gaze
  • Gesture events

19
(No Transcript)
20
(No Transcript)
21
Floor Transition Types
  • Change: there is a clear floor transition between two adjacent floor holders, with a gap between the adjacent floors.
  • Overlap: there is a clear floor transition between two adjacent floor holders, but the next holder begins talking before the previous holder stops speaking. (See the timing sketch after this list.)
  • Stop: the previous floor holder clearly gives up the floor, and there is no intended next holder, so the floor is open to all participants.
  • Self-select: without being explicitly yielded the floor by the previous holder, a participant takes control of the floor.
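  • A timing-based sketch of the change/overlap distinction; the boolean flag stands in for the annotators' judgment of whether the previous holder yielded the floor to an intended next holder (which is what separates stop/self-select from the other two types):

    def classify_transition(prev_end, next_start, prev_yielded_to_next):
        """Label a floor transition from the end time of the previous floor and
        the start time of the next floor (both in seconds)."""
        if not prev_yielded_to_next:
            # No intended next holder: the next holder self-selected to take the floor.
            return "self-select"
        # Clear hand-off between two adjacent floor holders.
        return "overlap" if next_start < prev_end else "change"

    print(classify_transition(12.4, 13.0, True))    # change: gap between floors
    print(classify_transition(12.4, 12.1, True))    # overlap: next starts early
    print(classify_transition(12.4, 15.0, False))   # self-select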

22
Distribution of Floor Transition Types
23
Verbal Backchannels and Nods
24
Discourse Markers
25
Gaze Patterns Current to Next Holder
26
(No Transcript)
27
(No Transcript)
28
Mutual Gaze Break
29
Meeting Manager Role
  • The ostensible meeting manager for each meeting is participant E; however, participant E in the March 18 meeting does not appear to embrace that role.
  • In the Jan07 meeting, there were 53 floor exchanges (Change and Overlap only) in which E was neither the previous nor the next floor holder.
  • In these 53 cases, E gazes at the next floor holder 21 times.
  • If we rule out the cases where other participants also look at the next floor holder, E still gazes at the next floor holder 11 times (20.75%; see the sketch below), suggesting that the gaze of the meeting manager plays a role in predicting the next floor holder.
  • In the Mar18 meeting, there are 100 cases in which E is not a floor holder. In these 100 cases, E gazes at the next floor holder only 6 times. In fact, E tends to gaze largely at his papers or the whiteboard.
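  • A small sketch of how the 20.75% figure above is computed; the counts are the Jan07 numbers from this slide, and the function itself is just illustrative:

    def manager_gaze_rate(gazes_at_next_holder, exchanges):
        """Fraction of floor exchanges in which the manager gazed at the next holder."""
        return gazes_at_next_holder / exchanges

    # Jan07: of the 53 exchanges where E held neither adjacent floor, 11 remain
    # after removing cases where other participants also look at the next holder.
    print(f"{manager_gaze_rate(11, 53):.2%}")   # 20.75%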

30
(No Transcript)
31
Gestures for Yielding and Grabbing the Floor
32
(No Transcript)
33
Conclusions
  • Presented a floor control annotation
    specification and conducted an analysis of two
    VACE meetings
  • Identified some multimodal cues that will be
    helpful for predicting floor control events
  • DMs occur frequently at the beginning of a floor
  • The previous holder often gazes at the next floor
    holder and vice versa during floor transitions
  • The mutual gaze break patterns previously
    observed in dialogs are also found in the Jan07
    meeting.
  • An active meeting manager plays a role in floor
    transitions
  • Gestures, especially floor capturing gestures,
    play a role in floor transitions

34
Acknowledgements
  • Discussions with David McNeill and Susan Duncan
    at U Chicago, Liz Shriberg at ICSI/SRI, and
    Felicia Roberts at Purdue University
  • This work was supported by
  • ARDA VACE II
  • DARPA EARS and GALE