Universit - PowerPoint PPT Presentation

1
Multimodal Expressive Embodied Conversational
Agents
Catherine Pelachaud
Elisabetta Bevacqua, Nicolas Ech Chafai (FT), Maurizio Mancini,
Magalie Ochs (FT), Christopher Peters, Radek Niewiadomski
  • Université Paris 8

2
ECAs Capabilities
  • Anthropomorphic autonomous figures
  • New form of human-machine interaction
  • Study of human communication, human-human
    interaction
  • ECAs ought to be endowed with dialogic and
    expressive capabilities
  • Perception: an ECA must be able to pay attention
    to and perceive the user and the context she is
    placed in.

3
ECAs capabilities
  • Interaction
  • speaker and addressee emit signals
  • speaker perceives feedback from addressee
  • speaker may decide to adapt to addressee's
    feedback
  • consider social context
  • Generation: expressive, synchronized visual and
    acoustic behaviors
  • produce expressive behaviours
  • words, voice, intonation,
  • gaze, facial expression, gesture
  • body movements, body posture

4
Synchrony tool - BEAT
  • Cassell et al, Media Lab MIT
  • Decomposition of text into theme and rheme
  • Linked to WordNet
  • Computation of
  • intonation
  • gaze
  • gesture

5
Virtual Training Environments: MRE (J. Gratch, L.
Johnson, S. Marsella, USC)
6
Interactive System
  • Real estate agent
  • Gesture synchronized with speech and intonation
  • Small talk
  • Dialog partner

7
MAX, S. Kopp, U of Bielefeld
Gesture understanding and imitation
8
Gilbert and George at the Bank (Upenn, 1994)
9
(No Transcript)
10
Greta
11
Problem to Be Solved
  • Human communication is endowed with three devices
    to express communicative intention:
  • Verbs and formulas
  • Intonation and paralinguistics
  • Facial expression, gaze, gesture, body movement,
    posture
  • Problem: for any communicative act, the Speaker
    has to decide
  • Which nonverbal behaviors to show
  • How to execute them

12
Verbal and Nonverbal Communication
  • Suppose I want to advise a friend to take her
    umbrella because it is raining.
  • Which signals do I use?
  • Verbal signal: use of a syntactically complex
    sentence
  • "Take your umbrella because it is raining"
  • Verbal + nonverbal signals
  • "Take your umbrella" + pointing to the window to
    show the rain, by a gesture or by gaze

13
Multimodal Signals
  • The whole body communicates by using
  • Verbal acts (words and sentences)
  • Prosody, intonation (nonverbal vocal signals)
  • Gesture (hand and arm movements)
  • Facial action (smile, frown)
  • Gaze (eyes and head movements)
  • Body orientation and posture (trunk and leg
    movements)
  • All these systems of signals have to cooperate in
    expressing the overall meaning of the
    communicative act.

14
Multimodal Signals
  • Accompany flow of speech
  • Synchronized at the verbal level
  • Punctuate accented phonemic segments and pauses
  • Substitute for word(s)
  • Emphasize what is being said
  • Regulate the exchange of speaking turn

15
Synchronization
  • There exists an isomorphism between patterns of
    speech, intonation and facial actions
  • Different levels of synchrony
  • Phoneme level (blink)
  • Word level (eyebrow)
  • Phrase level (hand gesture)
  • Interactional synchrony: synchrony between
    speaker and addressee

16
Taxonomy of Communicative Functions (I. Poggi)
  • The speaker may provide three broad types of
    information:
  • Information about the world: deictic, iconic
    (adjectival), ...
  • Information about the speaker's mind:
  • belief (certainty, adjectival)
  • goal (performative, rheme/theme, turn-system,
    belief relation)
  • emotion
  • meta-cognitive
  • Information about the speaker's identity (sex,
    culture, age)

17
Multimodal Signals (Isabella Poggi)
  • Characterization of multimodal signals by their
    placement with respect to the linguistic utterance
    and their significance in transmitting
    information. E.g.:
  • A raised eyebrow may signal surprise, emphasis,
    question mark, suggestion
  • A smile may express happiness, be a polite
    greeting, be a backchannel signal
  • Two pieces of information are needed to
    characterize multimodal signals:
  • Their meaning
  • Their visual action

18
Lexicon (meaning, signal)
  • Expression meaning
  • deictic: this, that, here, there
  • adjectival: small, difficult
  • certainty: certain, uncertain
  • performative: greet, request
  • topic comment: emphasis
  • belief relation: contrast, ...
  • turn allocation: take/give turn
  • affective: anger, fear, happy-for, sorry-for,
    envy, relief, ...
  • Expression signal
  • deictic: gaze direction
  • certainty: certain (palm up open hand), uncertain
    (raised eyebrow)
  • adjectival: small (small eye aperture)
  • belief relation: contrast (raised eyebrow)
  • performative: suggest (small raised eyebrow, head
    aside), assert (horizontal ring)
  • emotion: sorry-for (head aside, inner eyebrow up),
    joy (raising fist up)
  • emphasis: raised eyebrows, head nod, beat

19
Representation Language
  • Affective Presentation Markup Language APML
  • describes the communicative functions
  • works at meaning level and not the signal level
  • Example:
    <APML>
    <turn-allocation type="take turn"> <performative type="greet">
    Good Morning, Angela. </performative>
    <affective type="happy"> It is so
    <topic-comment type="comment"> wonderful </topic-comment>
    to see you again. </affective> <certainty type="certain"> I was
    <topic-comment type="comment"> sure </topic-comment>
    we would do so, one day! </certainty>
    </turn-allocation> </APML>
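
As a minimal sketch of how the tags in such an APML fragment could be read back as communicative functions, the Python below parses the example above with the standard-library XML parser. The function name `communicative_functions` and the output layout are illustrative assumptions, not part of the Greta system; the fragment is the slide's example with its angle brackets and quotes restored.

```python
import xml.etree.ElementTree as ET

# The slide's APML example, with markup characters restored.
APML_TEXT = """<APML>
<turn-allocation type="take turn"><performative type="greet">
Good Morning, Angela.</performative>
<affective type="happy">It is so
<topic-comment type="comment">wonderful</topic-comment>
to see you again.</affective>
<certainty type="certain">I was
<topic-comment type="comment">sure</topic-comment>
we would do so, one day!</certainty>
</turn-allocation></APML>"""


def communicative_functions(apml_text):
    """Return (tag, type, spanned text) for every tagged element, in document order."""
    root = ET.fromstring(apml_text)
    functions = []
    for element in root.iter():
        if element is root:
            continue  # skip the enclosing <APML> element itself
        # itertext() yields all text inside the element, nested tags included.
        words = " ".join("".join(element.itertext()).split())
        functions.append((element.tag, element.get("type"), words))
    return functions


for tag, kind, words in communicative_functions(APML_TEXT):
    print(f"{tag:16} {kind:10} {words}")
```

Each tuple pairs a meaning-level tag with the words it spans, which is the information a later stage would need to time the corresponding signals against the speech.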

20
Facial Description Language
  • Facial expressions defined as (meaning, signal)
    pairs stored in a library
  • Hierarchical set of classes
  • Facial basis (FB) class: basic facial movement
  • An FB may be represented as a set of MPEG-4
    compliant FAPs or, recursively, as a combination
    of other FBs using the '+' and '*' operators
  • FB = {fap3 = v1, ..., fap69 = vk}
  • FB' = c1*FB1 + c2*FB2
  • where c1 and c2 are constants and FB1 and FB2 can
    be
  • Previously defined FBs
  • FBs of the form {fap3 = v1, ..., fap69 = vk}

21
Facial basis class
  • Examples of facial basis class
  • Eyebrow: small_frown, left_raise, right_raise
  • Eyelid: upper_lid_raise
  • Mouth: left_corner_stretch, left_corner_raise



22
Facial Displays
  • Every facial display (FD) is made up of one or
    more FBs
  • FD = FB1 + FB2 + FB3 + ... + FBn
  • surprise = raise_eyebrow + raise_lid + open_mouth
  • worried = (surprise * 0.7) + sadness
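
The composition of facial displays from facial bases can be sketched as follows. This is a minimal illustration that assumes displays are built by adding FBs and scaling them by a constant. The `FacialBasis` class, the FAP names used as keys and all numeric values are hypothetical placeholders, not real MPEG-4 FAP tables or Greta code.

```python
class FacialBasis:
    """A facial basis (FB): a mapping from FAP name to displacement value."""

    def __init__(self, faps):
        self.faps = dict(faps)

    def __add__(self, other):
        # FB1 + FB2: sum values FAP by FAP; a FAP missing on one side counts as 0.
        combined = dict(self.faps)
        for name, value in other.faps.items():
            combined[name] = combined.get(name, 0) + value
        return FacialBasis(combined)

    def __mul__(self, constant):
        # FB * c: scale every FAP value by the constant c.
        return FacialBasis({name: value * constant for name, value in self.faps.items()})

    __rmul__ = __mul__  # allow c * FB as well


# Placeholder FAP names and values, chosen only for illustration.
raise_eyebrow = FacialBasis({"fap31": 100, "fap32": 100})
raise_lid = FacialBasis({"fap19": -50, "fap20": -50})
open_mouth = FacialBasis({"fap3": 200})
sadness = FacialBasis({"fap31": 30, "fap32": 30})

surprise = raise_eyebrow + raise_lid + open_mouth
worried = (surprise * 0.7) + sadness

print(sorted(surprise.faps.items()))
print(sorted(worried.faps.items()))
```

Overloading `+` and `*` keeps the Python definitions close to the notation on the slide, so a display such as `worried` reads the same way in both places.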

23
Facial Displays
  • Probabilistic mapping between the tags and
    signals
  • E.g. happy_for: (smile*0.5, 0.3), (smile, 0.25),
    (smile*2 + raised_eyebrow, 0.35), (nothing, 0.1)
  • Definition of a function class for addressee
    association (meaning, signal)
  • Class: communicative function
  • Certainty
  • Adjectival
  • Performative
  • Affective
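
A probabilistic mapping of this kind can be sketched as a weighted random choice. The sketch assumes the happy_for entries are (signal, probability) pairs and reads the second entry as (smile, 0.25), which makes the four probabilities sum to 1. The function name `choose_signal` is hypothetical, and the signal expressions are kept as plain strings rather than evaluated.

```python
import random

# (signal, probability) alternatives for the happy_for tag, as read from the slide.
HAPPY_FOR = [
    ("smile*0.5", 0.3),
    ("smile", 0.25),
    ("smile*2 + raised_eyebrow", 0.35),
    ("nothing", 0.1),
]


def choose_signal(alternatives, rng=random):
    """Pick one signal, each with the probability attached to it."""
    signals = [signal for signal, _ in alternatives]
    weights = [weight for _, weight in alternatives]
    return rng.choices(signals, weights=weights, k=1)[0]


rng = random.Random(0)  # seeded so that repeated runs give the same sequence
print([choose_signal(HAPPY_FOR, rng) for _ in range(5)])
```

Because the choice is random, the same tag can surface as different signals on different occurrences, which is what keeps the agent's behavior from looking mechanical.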

24
Facial Temporal Course
25
Gestural Lexicon
  • Certainty
  • Certain: palm up open hand
  • Uncertain: showing empty hands while lowering
    forearms
  • Belief-relation
  • List of items of same class: numbering on fingers
  • Temporal relation: fist with extended hand moves
    back and forth behind one's shoulder
  • Turn-taking
  • Hold the floor: raise hand, palm toward hearer
  • Performative
  • Assert: horizontal ring
  • Reproach: extended index, palm to left, rotating
    up and down on wrist
  • Emphasis: beat

26
Gesture Specification Language
  • Scripting language for hand-arm gestures, based
    on formational parameters (Stokoe)
  • Hand shape: specified using HamNoSys (Prillwitz
    et al.)
  • Arm position: concentric squares in front of
    agent (McNeill)
  • Wrist orientation: palm and finger base
    orientation
  • Gestures are defined by a sequence of timed key
    poses (gesture frames)
  • Gestures are broken down temporally into distinct
    (optional) phases
  • Gesture phases: preparation, stroke, hold,
    retraction
  • Change of formational components over time

27
Gesture specification example: Certain
28
Gesture Temporal Course
[Figure: gesture phases over time: rest position, preparation, stroke (from stroke start to stroke end), retraction, rest position]
29
ECA architecture
30
ECA Architecture
  • Input to the system: APML annotated text
  • Output of the system: animation files and a WAV
    file for the audio
  • System
  • Interprets APML tagged dialogs, i.e. all
    communicative functions
  • Looks up in a library the mapping between the
    meaning (specified by the XML tag) and signals
  • Decides which signals to convey on which
    modalities
  • Synchronizes the signals with speech at different
    levels (word, phoneme or utterance)

31
Behavioral Engine
32
Modules
  • APML Parser: XML parser
  • TTS Festival: manages the speech synthesis and
    gives us the list of phonemes and phoneme
    durations
  • Expr2Signal Converter: given a communicative
    function and its meaning, this module returns the
    list of facial signals
  • Conflicts Resolver: resolves the conflicts that
    may happen when more than one facial signal
    should be activated on the same facial part
  • Face Generator: converts the facial signals into
    MPEG-4 FAP values
  • Viseme Generator: converts each phoneme, given by
    Festival, into a set of FAPs
  • MPEG4 FAP Decoder: an MPEG-4 compliant Facial
    Animation Engine
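
The slide does not say how the Conflicts Resolver chooses between competing signals. One simple possibility, shown here purely as an assumption, is to give every signal a priority and keep the highest-priority signal per facial part. The `FacialSignal` type, the `priority` field and the sample signals are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FacialSignal:
    name: str
    face_part: str  # facial part the signal acts on, e.g. "eyebrow" or "mouth"
    priority: int   # assumed ranking; a higher value wins a conflict


def resolve_conflicts(signals):
    """Keep one signal per facial part: the one with the highest priority."""
    winners = {}
    for signal in signals:
        current = winners.get(signal.face_part)
        # On equal priority the earlier signal is kept.
        if current is None or signal.priority > current.priority:
            winners[signal.face_part] = signal
    return list(winners.values())


requested = [
    FacialSignal("raised_eyebrow", "eyebrow", 1),
    FacialSignal("smile", "mouth", 1),
    FacialSignal("frown", "eyebrow", 2),
]
print(sorted(s.name for s in resolve_conflicts(requested)))  # ['frown', 'smile']
```

Signals on different facial parts never compete in this scheme, so a smile and a frown can still be shown together.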

33
TTS Festival
  • Drives the synchronization of facial expressions
  • Synchronization implemented at word level
  • Timing of a facial expression is tied to the text
    embedded between the markers
  • Use of the tree structure of Festival to compute
    expression durations

34
Expr2Signal Converter
  • Instantiation of APML tags: meaning of a given
    communicative function
  • Converts markers into facial signals
  • Use of a library containing the lexicon of the
    type (meaning, facial expressions)

35
Gaze Model
  • Based on the communicative functions model of
    Isabella Poggi
  • This model predicts what the value of gaze should
    be in order to convey a given meaning in a given
    conversational context.
  • For example:
  • if the agent wants to emphasize a given word, the
    model will output that the agent should gaze at
    her conversant.

36
Gaze Model
  • Very deterministic behavior model: to every
    Communicative Function associated with a meaning
    corresponds the same signal (with probabilistic
    changes)
  • Event-driven model: the associated signals are
    computed only when a Communicative Function is
    specified
  • the corresponding behavior may vary only when a
    Communicative Function is specified

37
Gaze Model
  • Several drawbacks, as there is no temporal
    consideration
  • No consideration of past and current gaze
    behavior to compute the new one
  • No consideration of how long the current gaze
    state of S (speaker) and L (listener) has lasted

38
Gaze Algorithm
  • Two steps
  • Communicative prediction
  • Apply the communicative function model to compute
    the gaze behavior that conveys a given meaning
    for S and L
  • Statistical prediction
  • The communicative gaze model is probabilistically
    modified by a statistical model defined with
    constraints:
  • what the communicative gaze behavior of S and L
    is
  • which gaze behavior S and L were in
  • the duration of the current state of S and L

39
Temporal Gaze Parameters
  • The gaze behaviors depend on the communicative
    functions, the general purpose of the conversation
    (persuasive discourse, teaching...), personality,
    cultural roots, social relations...
  • A very (too) complex model
  • We propose parameters that control the overall
    gaze behavior:
  • T^{S1,L1}_max: maximum duration the mutual gaze
    state may remain active
  • T^{S1}_max: maximum duration of gaze state S1
  • T^{L1}_max: maximum duration of gaze state L1
  • T^{S0}_max: maximum duration of gaze state S0
  • T^{L0}_max: maximum duration of gaze state L0
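
How such maximum-duration parameters could constrain a communicative gaze prediction is sketched below. This is a deliberately simplified, deterministic stand-in for the statistical model described on the slides. It assumes that state 1 means "gazing at the other party" and 0 means "looking away" (inferred from (S1, L1) being the mutual gaze state), and that the speaker is the one who breaks an over-long mutual gaze. The names `GazeLimits` and `next_gaze_state` and all durations are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GazeLimits:
    t_s1_l1_max: float  # max duration of mutual gaze (S1, L1)
    t_s1_max: float     # max duration of speaker state S1
    t_l1_max: float     # max duration of listener state L1
    t_s0_max: float     # max duration of speaker state S0
    t_l0_max: float     # max duration of listener state L0


def next_gaze_state(current, predicted, elapsed_s, elapsed_l, limits):
    """Return the next (S, L) gaze state.

    current, predicted: (s, l) pairs with values 0 or 1.
    elapsed_s, elapsed_l: time each party has spent in its current state.
    """
    s_now, l_now = current
    s_next, l_next = predicted
    # A party that has stayed in one state for its maximum duration must switch.
    s_max = limits.t_s1_max if s_now == 1 else limits.t_s0_max
    l_max = limits.t_l1_max if l_now == 1 else limits.t_l0_max
    if elapsed_s >= s_max:
        s_next = 1 - s_now
    if elapsed_l >= l_max:
        l_next = 1 - l_now
    # Mutual gaze has lasted as long as the shorter of the two elapsed times.
    mutual_time = min(elapsed_s, elapsed_l)
    if current == (1, 1) and (s_next, l_next) == (1, 1) and mutual_time >= limits.t_s1_l1_max:
        s_next = 0  # assumption: the speaker looks away first
    return (s_next, l_next)


limits = GazeLimits(t_s1_l1_max=3.0, t_s1_max=5.0, t_l1_max=6.0, t_s0_max=2.0, t_l0_max=2.0)
print(next_gaze_state((1, 1), (1, 1), 3.5, 4.0, limits))  # (0, 1)
```

Replacing the hard switches with probabilities that grow as a state approaches its maximum duration would bring the sketch closer to the probabilistic modification the slides describe.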

40
Mutual Gaze
41
Gaze Aversion
42
Gesture Planner
  • Adaptive instantiation
  • Preparation and retraction phase adjustments
  • Transition key and rest gesture insertion
  • Joint-chain follow-through
  • Forward time shifting of child joints
  • Stroke of gesture on stressed word
  • Stroke expansion
  • During planning phase, identify rheme clauses
    with closely repeated emphases/pitch accents
  • Indicate secondary accents by repeating the
    stroke of the primary gesture with decreasing
    amplitude

43
Gesture Planner
  • Determination of gesture
  • Look in dictionary
  • Selection of gesture
  • Gestures associated with most embedded tags have
    priority (except beat) adjectival, deictic
  • Duration of gesture
  • Coarticulation between successive gestures close
    in time
  • Hold for gestures belonging to higher up tag
    hierarchy (e.g. performative, belief-relation)
  • Otherwise go to rest position

44
Behavior Expressivity
  • Behavior is related to (Wallbott, 1998):
  • the quality of the mental state (e.g. emotion) it
    refers to
  • its quantity (somehow linked to the intensity
    factor of the mental state)
  • Behaviors encode
  • content information (the "what" is communicated)
  • expressive information (the "how" it is
    communicated)
  • Behavior expressivity refers to the manner of
    execution of the behavior

45
Expressivity Dimensions
  • Spatial: amplitude of movement
  • Temporal: duration of movement
  • Power: dynamic property of movement
  • Fluidity: smoothness and continuity of movement
  • Repetitiveness: tendency to rhythmic repeats
  • Overall Activation: quantity of movement across
    modalities

46
Overall Activation
  • Threshold filter on atomic behaviors during
    APML tag matching
  • Determines the number of nonverbal signals to
    be executed.
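
A threshold filter of this kind might look like the sketch below. It assumes each candidate behavior carries an activation threshold in [0, 1] and is kept only when the overall activation reaches that threshold. The behavior names, the thresholds and the function `filter_behaviors` are hypothetical.

```python
# Hypothetical candidate behaviors for one APML tag, each with an assumed
# activation threshold; neither the names nor the numbers come from the slides.
CANDIDATES = [
    ("head_nod", 0.2),
    ("raised_eyebrows", 0.5),
    ("beat_gesture", 0.8),
]


def filter_behaviors(candidates, overall_activation):
    """Keep the behaviors whose threshold is at or below the activation level."""
    return [name for name, threshold in candidates if threshold <= overall_activation]


print(filter_behaviors(CANDIDATES, 0.6))  # ['head_nod', 'raised_eyebrows']
```

Raising the overall activation lets more behaviors through, so the same tagged text produces more nonverbal signals.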

47
Spatial Parameter
  • Amplitude of movement controlled through
    asymmetric scaling of the reach space that is
    used to find IK goal positions
  • Expand or condense the entire space in front of
    agent

48
Temporal parameter
  • Determine the speed of the arm movement of a
    gesture's meaning-carrying stroke phase
  • Modify speed of stroke

[Figure: stroke shift / velocity control of a beat gesture; Y position of wrist w.r.t. shoulder (cm) against frame]
49
Fluidity
  • Continuity control of TCB interpolation
    splines and gesture-to-gesture coarticulation
  • Continuity of arms' trajectory paths
  • Control the velocity profiles of an action

[Figure: X position of wrist w.r.t. shoulder (cm) against frame]
50
Power
  • Tension and Bias control of TCB splines
  • Overshoot reduction
  • Acceleration and deceleration of limbs

Hand shape control for gestures that do not need
hand configuration to convey their meaning
(beats).
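
For readers unfamiliar with TCB (Kochanek-Bartels) splines, the sketch below computes the incoming and outgoing tangents at a key pose from the tension, continuity and bias values, using the standard formulation, and evaluates a cubic Hermite segment with them. It works on a single scalar coordinate such as the wrist's Y position. How Greta maps Fluidity and Power onto these three values is not given on the slides, so the parameter values here are arbitrary.

```python
def tcb_tangents(p_prev, p_curr, p_next, tension=0.0, continuity=0.0, bias=0.0):
    """Incoming and outgoing tangents at p_curr for a Kochanek-Bartels spline."""
    t, c, b = tension, continuity, bias
    before = p_curr - p_prev  # chord arriving at the key pose
    after = p_next - p_curr   # chord leaving the key pose
    incoming = 0.5 * (1 - t) * ((1 + b) * (1 - c) * before + (1 - b) * (1 + c) * after)
    outgoing = 0.5 * (1 - t) * ((1 + b) * (1 + c) * before + (1 - b) * (1 - c) * after)
    return incoming, outgoing


def hermite(p0, p1, m0, m1, s):
    """Cubic Hermite interpolation between p0 and p1 for s in [0, 1]."""
    s2, s3 = s * s, s * s * s
    return ((2 * s3 - 3 * s2 + 1) * p0 + (s3 - 2 * s2 + s) * m0
            + (-2 * s3 + 3 * s2) * p1 + (s3 - s2) * m1)


# With all parameters at 0 the tangents match a Catmull-Rom spline.
print(tcb_tangents(0.0, 1.0, 3.0))
# Tension 1 flattens the tangents to zero, giving a tight stop at the key pose.
print(tcb_tangents(0.0, 1.0, 3.0, tension=1.0))
# Continuity -1 makes incoming and outgoing tangents differ: a visible corner.
print(tcb_tangents(0.0, 1.0, 3.0, continuity=-1.0))
```

Lower continuity produces sharper direction changes between poses, while tension and bias change how tightly and on which side the curve bends, which is what makes these three values usable as handles for fluidity and power.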
51
Repetitivity
  • Technique of stroke expansion: consecutive
    emphases are realized gesturally by repeating the
    stroke of the first gesture.
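
Stroke expansion can be sketched as generating one amplitude per repeated stroke. The Gesture Planner slide states that repeats have decreasing amplitude; the geometric decay used here, the default factor of 0.7 and the function name `expand_stroke` are assumptions for illustration only.

```python
def expand_stroke(primary_amplitude, repeats, decay=0.7):
    """Amplitudes for the primary stroke followed by `repeats` weaker repetitions."""
    # Each repetition is scaled by `decay` relative to the one before it.
    return [primary_amplitude * decay ** k for k in range(repeats + 1)]


print(expand_stroke(1.0, 2, decay=0.5))  # [1.0, 0.5, 0.25]
```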

52
Multiple Modality Example: Abrupt
Overall Activity: 0.6; Spatial: 0; Temporal: 1;
Fluidity: -1; Power: 1; Repetition: -1
53
Multiple Modality Example: Vigorous
Overall Activity: 1; Spatial: 1; Temporal: 1;
Fluidity: 1; Power: 0; Repetition: 1
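
The two parameter settings above can be captured as small immutable records. The sketch assumes every dimension lies in [-1, 1], which is consistent with the values shown but not stated on the slides. The class name `ExpressivityProfile` and its field names are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ExpressivityProfile:
    overall_activation: float
    spatial: float
    temporal: float
    fluidity: float
    power: float
    repetition: float

    def __post_init__(self):
        # Assumed valid range for every dimension: [-1, 1].
        for field in fields(self):
            value = getattr(self, field.name)
            if not -1.0 <= value <= 1.0:
                raise ValueError(f"{field.name}={value} is outside [-1, 1]")


# Values taken from the "Abrupt" and "Vigorous" example slides.
ABRUPT = ExpressivityProfile(0.6, 0.0, 1.0, -1.0, 1.0, -1.0)
VIGOROUS = ExpressivityProfile(1.0, 1.0, 1.0, 1.0, 0.0, 1.0)

print(ABRUPT)
print(VIGOROUS)
```

Keeping a profile in one record makes it easy to pass the same six values to every modality, so face and gesture stay coherent with each other.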
54
Evaluation of Expressive Gesture
  • (H1) The chosen implementation for mapping single
    dimensions of expressivity onto animation
    parameters is appropriate - a change in a single
    dimension can be recognized and correctly
    attributed by users.
  • (H2) Combining parameters in such a way that they
    reflect a given communicative intent will result
    in a more believable overall impression of the
    agent.
  • 106 subjects from 17 to 26 years old

55
Perceptual Test Studies
  • Evaluation of the adequacy of the implementation
    of each parameter
  • check whether subjects could perceive and
    distinguish the six different expressivity
    parameters and indicate their direction of
    change.
  • Result: good recognition for spatial and temporal
    parameters; lower recognition for fluidity and
    power parameters, as they are inter-dependent.
  • Evaluation task: does setting appropriate values
    for the expressivity parameters create behaviors
    that are judged as exhibiting the corresponding
    expressivity?
  • 3 different types of behaviors: abrupt, sluggish,
    vigorous
  • users prefer the coherent performance for
    vigorous and abrupt

56
Interaction
  • Interaction: two or more parties exchange
    messages.
  • Interaction is by no means a one-way
    communication channel between parties.
  • Within an interaction, parties take turns in
    playing the roles of the speaker and of the
    addressee.

57
Interaction
  • Speaker and addressee adapt their behaviors to
    each other
  • Speaker monitors addressee's attention and
    interest in what he has to say
  • Addressee selects feedback behaviors to show the
    speaker that he is paying attention

58
Interaction
  • Speaker
  • Pointless for a speaker to engage in an act of
    communication if the addressee does not pay or
    intend to pay attention
  • Important for the speaker to assess the
    addressee's engagement:
  • when starting an interaction: assess the
    possibility of engagement in interaction
    (establish phase)
  • when interaction is going on: check if engagement
    is lasting and sustaining conversation (maintain
    phase)

59
Interaction
  • addressee
  • attention: pay attention to the signals produced
    by speaker to perceive, process and memorize them
  • perception of signals
  • comprehension: understand meaning attached to
    signals
  • internal reaction: the comprehension of the
    meaning may create cognitive and emotional
    reactions
  • decision: communication or not of the internal
    reaction
  • generation: display behaviors

60
Backchannel
  • Types of backchannels (I. Poggi)
  • attention
  • comprehension
  • belief
  • interest
  • agreement
  • positive/negative
  • any combination of the above: pay attention but
    not understand, understand but not believe, etc.

61
Backchannel
  • Depending on the type of speech act it responds
    to, a signal will be interpreted as a backchannel
    or not.
  • backchannel: a signal of agreement / disagreement
    that follows the expression of opinions,
    evaluations, planning
  • not a backchannel: a signal of comprehension /
    incomprehension after an explicit question such as
    "Did you understand?"

62
Backchannel
  • Polysemy of backchannel signals
  • a signal may provide different types of
    information
  • a frown: negative feedback for understanding,
    believing and agreeing

63
Backchannel signals of gaze
  • gaze
  • show direction of attention
  • inform on level of engagement or on intention to
    maintain engagement
  • indicate degree of intimacy
  • but also
  • monitor the gaze behavior of others to establish
    their intention to engage or remain engaged
  • shared attention: situation involving mutual gaze
    at each other or joint gaze at the same object

64
Backchannel modelling
  • Reactive model
  • generates an instinctive feedback without
    reasoning
  • simple backchannel or mimicry
  • spontaneous - sincere
  • Cognitive model
  • conscious decision to provide backchannel to
    provoke a particular effect on the speaker or to
    reach a specific goal
  • deliberate - possibly pretended
  • it can be shifted to automatic (ex. when
    listening to a bore)

65
Backchannel Demo
66
A reactive backchannel
  • Currently, our model is reactive in nature
  • Dependent on perception
  • Speaker interprets addressee's behavior
  • Speaker generates or alters its own behavior
  • Our focus: interest and attention at the signal
    level (not at the cognitive level)

67
Organization of the communication: Attraction of
attention
  • Communicative agents: the agents provide
    information to the user, and should ensure that
    the user pays attention
  • Animation expressivity: principle of staging,
    so that a single idea is clearly expressed at
    each instant of time
  • Animation specificity: animators' creativity, no
    realistic constraints for animators

What types of gesture properties could guarantee
the user's attention?
France Telecom
68
Organization of the communication: Attraction of
attention
  • Corpus: videos from traditional animation that
    illustrate different types of conversational
    interaction
  • the modulations of gesture expressivity over time
    play a role in managing communication, thus
    serving as a pragmatic tool

69
Emotion
  • elicited by the evaluation of events, objects,
    actions
  • integration of emotions in a dialog system
    (Artimis, FT)
  • identify under which circumstances a dialog agent
    should express emotions

70
Emotion
  • BDI representation
  • based on OCC model appraisal variables (Ortony
    et al., 1988)
  • Desirability/Undesirability: achievement of or
    threat to the agent's choice
  • Degree of realization: degree of certainty of
    the choice's achievement
  • Probability of an event: probability of
    feasibility of an event
  • Agency: the agent who is the actor of the event

71
Emotion
  • complex emotions
  • superposition of 2 emotions: evaluation of an
    event can happen under different angles
  • masking of one emotion by another: consideration
    of social context
  • [Figure: joy + disappointment -> masking]

72
Video: Masking of Disappointment by Joy
73
Conclusion
  • Creation of a virtual agent able to
  • communicate nonverbally
  • show emotions
  • use expressive gestures
  • perceive and be attentive
  • maintain the attention
  • Two studies on expressivity
  • from manual annotation of video corpus
  • from mimicry of movement analysis