Hierarchical Hidden Markov Models - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Hierarchical Hidden Markov Models


1
Université Catholique de Louvain, Faculté des Sciences Appliquées
Laboratoire de Télécommunications et Télédétection (TELE)
  • Hierarchical Hidden Markov Models
  • A common framework for complex recognition and
    planning under uncertainty
  • Facial Animation (F-X Fanard)
  • Emotion Recognition (Olivier Martin)
  • Gesture Recognition (Kosta Gaitanis)

2
Outline
  • HMM & DBN: what is what & why?
  • HHMM: a common framework
  • Applications
  • Kosta: Gesture Recognition
  • F-X: Facial Animation
  • Olivier: Emotion Recognition

3
What are Bayesian Networks?
  • A Bayesian Network is a graph
  • A set of nodes: stochastic variables
  • A set of directed links: causal relationships
  • The graph has no cycles (DAG)
  • Each node has a conditional probability table that
    quantifies the effect that the parents have on
    the node (causality)
  • P(O1, …, ON) = Πi P(Oi | Parents(Oi))
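The factorization above can be sketched in a few lines. The tiny Rain/Sprinkler/WetGrass network and its CPT numbers below are illustrative assumptions, not taken from the slides:

```python
# Chain-rule factorization of a Bayesian network:
# P(O1, ..., ON) = prod_i P(Oi | Parents(Oi)).
# Toy network: Rain -> WetGrass <- Sprinkler (hypothetical CPTs).

p_rain = {True: 0.2, False: 0.8}
p_sprinkler = {True: 0.3, False: 0.7}
# P(WetGrass | Rain, Sprinkler), keyed by (wet, rain, sprinkler)
p_wet = {
    (True,  True,  True):  0.99, (False, True,  True):  0.01,
    (True,  True,  False): 0.90, (False, True,  False): 0.10,
    (True,  False, True):  0.85, (False, False, True):  0.15,
    (True,  False, False): 0.01, (False, False, False): 0.99,
}

def joint(rain, sprinkler, wet):
    """Joint probability as the product of the node CPTs (DAG factorization)."""
    return p_rain[rain] * p_sprinkler[sprinkler] * p_wet[(wet, rain, sprinkler)]

# Sanity check: the joint sums to 1 over all assignments
total = sum(joint(r, s, w) for r in (True, False)
            for s in (True, False) for w in (True, False))
print(round(total, 10))  # 1.0
```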

4
An example
5
Bayesian Network Classifiers
  • U = {O1, …, ON, S}
  • Oi are the observation variables.
  • S is the state variable.
  • Goal: infer P(S | O1, …, ON) using Bayes' rule
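As a minimal sketch of this inference step, a naive classifier (observations assumed independent given S; all tables and values below are made up for illustration) applies Bayes' rule and normalizes:

```python
# Infer P(S | O1, ..., ON) via Bayes' rule in a naive Bayesian network
# classifier. Priors and likelihood tables are hypothetical.

prior = {"happy": 0.6, "sad": 0.4}
# One P(Oi = value | S) table per observation variable
likelihood = [
    {"happy": {"smile": 0.8, "frown": 0.2}, "sad": {"smile": 0.1, "frown": 0.9}},
    {"happy": {"loud": 0.5, "quiet": 0.5},  "sad": {"loud": 0.2, "quiet": 0.8}},
]

def posterior(observations):
    """P(S | O1..ON) proportional to P(S) * prod_i P(Oi | S), then normalized."""
    scores = {}
    for s, p_s in prior.items():
        p = p_s
        for table, obs in zip(likelihood, observations):
            p *= table[s][obs]
        scores[s] = p
    z = sum(scores.values())
    return {s: p / z for s, p in scores.items()}

post = posterior(["smile", "quiet"])
print(max(post, key=post.get))  # happy
```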

6
What is a Dynamic Bayesian Network?
  • For each time slice, a Bayesian network

[Figure: two time slices, St and St+1, each connected to observation nodes O1–O6]
7
Hidden Markov Model
  • The simplest Dynamic Bayesian Network
  • Defined by λ = (A, B, π)
  • A = {ajk}: state transition probabilities,
    ajk = P(St = k | St-1 = j)
  • B = {bj(k)}: observation probabilities,
    bj(k) = P(Ot = vk | St = j)
  • π = {πj}: initial state distribution, πj = P(S1 = j)

[Figure: state chain St-1 → St → St+1 (N possible states), each state emitting
Ot-1, Ot, Ot+1 (M possible values)]
8
HMM: 3 problems
  • The evaluation problem (analysis:
    forward/backward)
  • Given an HMM λ and a sequence of observations O,
    what is the probability that the observations are
    generated by the model, P(O | λ)?
  • The decoding problem (analysis: Viterbi)
  • Given an HMM λ and a sequence of observations O,
    what is the most likely state sequence in the
    model that produced the observations?
  • The learning problem (synthesis: Baum-Welch)
  • Given an HMM λ and a sequence of observations O,
    how should we adjust the model parameters λ in
    order to maximize P(O | λ)?
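The evaluation problem above has a standard linear-time solution, the forward algorithm. A minimal sketch, using a made-up 2-state, 2-symbol model (the numbers are assumptions, not from the slides):

```python
# Forward algorithm: computes P(O | lambda) for lambda = (A, B, pi)
# on a toy 2-state HMM with binary observations.

A  = [[0.7, 0.3], [0.4, 0.6]]   # A[j][k] = P(S_t = k | S_{t-1} = j)
B  = [[0.9, 0.1], [0.2, 0.8]]   # B[j][v] = P(O_t = v | S_t = j)
pi = [0.5, 0.5]                 # pi[j]   = P(S_1 = j)

def forward(obs):
    """Return P(O | lambda) by summing the final forward variables alpha_T(j)."""
    alpha = [pi[j] * B[j][obs[0]] for j in range(len(pi))]
    for o in obs[1:]:
        alpha = [sum(alpha[j] * A[j][k] for j in range(len(pi))) * B[k][o]
                 for k in range(len(pi))]
    return sum(alpha)

p = forward([0, 0, 1])
print(p)
```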

9
An example
HMM: P(S | O1, …, O6)   BN: P(S | O1, O2, O5)
· P(S | O4) · P(S | O3, O6)
  • If every observation variable may take
    10 different values:
  • HMM: 10^6 values
  • BN: 1110 values
  • Naive BN: 60 values
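The three counts can be checked directly, assuming six observation variables with 10 values each: one joint table for the HMM, one table per group {O1, O2, O5}, {O4}, {O3, O6} for the BN, and one 1-D table per variable for the naive model:

```python
# Per-state conditional-table sizes for the three models on the slide.

values = 10
full_hmm = values ** 6                        # one table over all six observations
bn = values ** 3 + values ** 1 + values ** 2  # groups {O1,O2,O5}, {O4}, {O3,O6}
naive = 6 * values                            # one independent table per observation
print(full_hmm, bn, naive)  # 1000000 1110 60
```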

10
Why Dynamic Bayesian Networks?
  • Handle temporal aspects (like HMMs)
  • Handle dependencies within variables, allowing for
    efficient inference mechanisms
  • Trade-off between precision and computational
    time for real-time applications: hybrid
    inference methods (Rao-Blackwellised Particle
    Filters)

11
Why Dynamic Bayesian Networks ?
  • Handle Multimodal Fusion & Fission, simply by
    adding/subtracting edges between vertices.
  • Intuitive and comprehensive representation
    (unlike NN, SVM, …)
  • Take all scenarios into account (unlike
    rule-based systems)
  • No need for much learning: learning is used to
    refine a priori information about the model
    (unlike NN, SVM, etc.)

12
Smoothed learning
  • Too few samples for unbiased learning?
  • → Compute the a posteriori probability by machine
    learning, taking a priori information into
    account intelligently (learning coefficient µ)
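One common reading of such a scheme is to blend the a priori CPT value with the maximum-likelihood estimate from the (few) samples; the interpolation rule and numbers below are an assumption, not necessarily the slides' exact method:

```python
# Smoothed learning sketch: interpolate between the a priori probability and
# the empirical (maximum-likelihood) frequency with a coefficient mu.
# The blend rule is an illustrative assumption.

def smoothed_estimate(prior_p, counts, total, mu):
    """mu * prior + (1 - mu) * empirical frequency; falls back to the prior
    when no samples are available."""
    ml = counts / total if total else prior_p
    return mu * prior_p + (1 - mu) * ml

# 3 successes in 4 samples, prior belief 0.5, trusting the prior at mu = 0.8
p = smoothed_estimate(0.5, 3, 4, 0.8)
print(p)  # 0.55
```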

13
References
  • Dynamic Bayesian Networks
  • Friedman, "Bayesian Network Classifiers"
  • http://www.cs.uu.nl/docs/vakken/pn/bayesclass.pdf
  • Hidden Markov Models
  • Rabiner, "A tutorial on Hidden Markov Models and
    selected applications in Speech Recognition"
  • www.ai.mit.edu/courses/6.867-f02/papers/rabiner.pdf

14
HHMM: A common framework
  • Kosta Gaitanis, UCL
  • Louvain-la-Neuve, Belgium

15
Outline
  • HHMM: extending the classical HMM
  • Inference
  • Brief presentation of classical methods
  • Linear-time inference in the HHMM
  • Multiple actors: extending the HHMM

16
From the HMM to the Hierarchical HMM
  • An HMM generates symbols
  • Some problems inherently have a hierarchical
    structure
  • We want to exploit and model this hierarchical
    structure
  • Solution: an HMM that generates another HMM!
  • Advantage: the observations are correlated at
    higher levels → longer periods of time
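The "HMM that generates another HMM" idea can be sketched generatively: each top-level state runs a child model to completion before the parent transitions, so observations stay correlated over whole segments. The activities, sub-sequences, and the deterministic transitions below are illustrative assumptions:

```python
# Generative sketch of a 2-level HHMM: a top-level state selects a sub-model,
# which emits symbols until it terminates; only then does the top level move.
# Toy deterministic sub-models for clarity.

sub_hmm = {
    "walk": ["step_l", "step_r"],        # child model for the "walk" state
    "jump": ["crouch", "up", "land"],    # child model for the "jump" state
}
top_transition = {"walk": "jump", "jump": "walk"}

def generate(start, n_segments):
    """Observations are correlated over each whole segment: the parent state
    changes only when the child model reaches its termination."""
    out, state = [], start
    for _ in range(n_segments):
        out.extend(sub_hmm[state])       # run the child model to termination
        state = top_transition[state]
    return out

print(generate("walk", 2))  # ['step_l', 'step_r', 'crouch', 'up', 'land']
```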

17
DBN representation of a HHMM
[Figure: DBN with a state node at each level k, a termination node, and a noisy
observation node; a simple HMM forms the bottom level]
2 types of inference: bottom-up (recognition) and top-down (planning)
18
Example: moving in an airport
The same model can be used for planning AND for
recognition!
19
Inference in the HHMM
  • Exact inference (Pearl, JTree, loopy, …)
  • Exponential complexity w.r.t. the number of arcs
  • Approximate inference
  • Distribution approximations
  • → non-Gaussian
  • Sampling methods (FP, MCMC, …)
  • → Poor precision with a large number of nodes
  • Hybrid inference (RBPF, …)
  • Good tradeoff between complexity and precision
  • Intelligent sampling can reduce network
    connectivity → exact inference can be possible
    for the rest of the network
20
Linear Time Inference in the HHMM
The original HHMM
The HHMM after sampling the termination nodes
  • Rao-Blackwellised Particle Filter
  • The particle filter samples the termination nodes and
    the horizontal transitions
  • Exact inference calculates the belief states of
    the other variables using Bayes' rule → linear
    complexity (tree structure)
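A heavily simplified sketch of the Rao-Blackwellised idea: sample only the hard discrete switch (here a termination flag) per particle, and update the remaining belief exactly by Bayes' rule. The model, numbers, and reset behaviour are illustrative assumptions, and particle weighting/resampling is omitted:

```python
# RBPF sketch: per particle, sample the termination node; conditionally on the
# sample, the rest of the filtering step is exact. Weighting/resampling of
# particles is omitted for brevity.
import random

random.seed(1)

P_END = 0.3                      # prob. the current sub-model terminates
A = [[0.8, 0.2], [0.2, 0.8]]     # bottom-level transition (when not terminating)
B = [[0.9, 0.1], [0.1, 0.9]]     # observation model

def exact_update(belief, obs, reset):
    """Exact Bayes-rule filtering step for the non-sampled variables."""
    if reset:                    # termination sampled: restart from uniform
        pred = [0.5, 0.5]
    else:
        pred = [sum(belief[j] * A[j][k] for j in range(2)) for k in range(2)]
    post = [pred[k] * B[k][obs] for k in range(2)]
    z = sum(post)
    return [p / z for p in post]

def rbpf_step(particles, obs):
    """Each particle samples the termination node; the rest stays exact."""
    return [exact_update(belief, obs, random.random() < P_END)
            for belief in particles]

particles = [[0.5, 0.5] for _ in range(100)]
for obs in [0, 0, 1]:
    particles = rbpf_step(particles, obs)
avg = [sum(p[k] for p in particles) / len(particles) for k in range(2)]
print(avg)
```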

21
Taking into account multiple actors
  • An actor can be:
  • A point (edge of mouth, edge of body, …)
  • A part of the body (mouth, hand, leg, …)
  • A group of parts of the body (legs, hands,
    upper/lower face)
  • A person
  • An object
  • In general, anything that can be observed is
    considered as an actor
  • Conditional dependence between actors at
    different levels of the hierarchy
  • If each actor needs a separate HHMM →
    exponential complexity, oops

22
Extending the HHMM for multiple actors
  • Idea: the dependence between actors is modeled
    only at one level.
  • Complexity stays linear because the substructures
    are independent (tree structure)
  • A coordination node models structures such as AND
    / OR

23
Applications
  • Any problem that has:
  • Noisy data
  • Correlated observations
  • A natural hierarchical decomposition
  • And needs:
  • Dynamic recognition (bottom-up) or planning
    (top-down)
  • Flexible modeling of actions
  • Learning

24
Gesture Recognition using the HHMM
  • Kosta Gaitanis, UCL
  • Louvain-la-Neuve, Belgium

25
Applications: Gesture Recognition
  • Goal: What is this man doing?
  • Walking, jumping, sitting, taking an object, …
  • Data acquisition: natural gesture (Alterface)
  • Positions and speeds of 5 crucial body points
    (head, hands, feet)

26
Hierarchical Decomposition
  • Natural hierarchical decomposition of the body
  • The variables are independent at lower levels but
    dependent at higher levels
  • Only one object at the top of the hierarchy

27
Modelling an action
28
Modelling Actions in a HHMM
29
Learning
  • Higher levels can be modelled easily using a
    priori knowledge.
  • Lower levels are more complex:
  • Production states
  • From verbally stated actions to point movements
  • Observation error model
  • Create a model for the errors made during data
    acquisition
  • Inversions (left/right hand, head/hand, leg/hand,
    …)
  • Self-occlusions

30
Observation Error Model
  • Self-occlusion
  • Bayesian Networks can still infer with
    missing data
  • Naive missing-data inference
  • Inference with estimated data
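The "naive" case above is simply marginalization: when a feature is occluded, a Bayesian network drops it from the evidence instead of guessing a value. The two-feature hand model and its numbers are assumptions for illustration:

```python
# Inference with missing data: occluded features are marginalized out.
# Toy naive model over two hand features (hypothetical tables).

prior = {"open_hand": 0.5, "fist": 0.5}
p_feat = {   # P(feature value | state), one table per feature
    "thumb": {"open_hand": {"out": 0.9, "in": 0.1},
              "fist":      {"out": 0.2, "in": 0.8}},
    "palm":  {"open_hand": {"flat": 0.8, "curled": 0.2},
              "fist":      {"flat": 0.1, "curled": 0.9}},
}

def posterior(evidence):
    """Features absent from `evidence` contribute a factor of 1, since their
    conditional distributions sum to 1 over all values."""
    scores = {}
    for s, p in prior.items():
        for feat, val in evidence.items():
            p *= p_feat[feat][s][val]
        scores[s] = p
    z = sum(scores.values())
    return {s: p / z for s, p in scores.items()}

full = posterior({"thumb": "out", "palm": "flat"})
occluded = posterior({"palm": "flat"})   # thumb self-occluded: just drop it
print(full["open_hand"], occluded["open_hand"])
```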

31
Application Specific Model
32
MPEG-4 Facial Animation
  • Fanard François-Xavier, UCL
  • Louvain-la-Neuve, Belgium

33
Facial Definition Parameters (FDPs)
  • Set of 84 feature points placed on the face
  • Divided into 10 subgroups
  • Help to:
  • customize an animation according to personal
    characteristics
  • reproduce the topography of the bone structure of
    a particular face, on which a specific texture may
    be applied (the skin, the eyes, the beard, ...)
  • Sent once per animation session

34
Facial Animation Parameter Units (FAPUs)
  • Independence between emotions/animations and
    facial models
  • Fractions of distances between key facial
    characteristics (iris-iris distance, …)
  • Computed on a neutral face

35
Facial Animation Parameters (FAPs)
  • Correspond to facial muscle actions
  • Reproduce basic facial actions (expressions,
    emotions & articulation)
  • 68 FAPs divided into 2 levels
  • High level (2):
  • the visemes: mouth movements during elocution
    (predefined)
  • the expressions (neutral, anger, disgust, joy,
    sadness, fear & surprise)

[Images: Neutral, Joy, Surprise, Anger]
36
Facial Animation Parameter (FAP)
  • Corresponds to facial muscle actions
  • Reproduce basic facial actions (expressions,
    emotions & articulation)
  • 68 FAPs divided into 2 levels
  • High level (2)
  • Low level (66):
  • raise_b_midlip, stretch_l_cornerlip,
    raise_b_lip_lm, …

[Image: Surprise]
open_jaw, raise_b_midlip, stretch_l_cornerlip,
stretch_r_cornerlip, raise_b_lip_lm, raise_b_lip_rm,
close_t_l_eyelid, close_t_r_eyelid, close_b_l_eyelid,
close_b_r_eyelid, raise_l_i_eyebrow, raise_r_i_eyebrow,
raise_l_m_eyebrow, raise_r_m_eyebrow, raise_l_o_eyebrow,
raise_r_o_eyebrow, squeeze_l_eyebrow, squeeze_r_eyebrow,
stretch_l_cornerlip_o, stretch_r_cornerlip_o
37
Facial Animation Parameter (FAP)
  • Corresponds to facial muscle actions
  • Reproduce basic facial actions (expressions,
    emotions & articulation)
  • 68 FAPs divided into 2 levels
  • High level (2):
  • the visemes: mouth movements during elocution
    (21 predefined)
  • the expressions (neutral, anger, disgust, joy,
    sadness, fear & surprise)
  • Low level (66):
  • raise_b_midlip, stretch_l_cornerlip,
    raise_b_lip_lm, …
  • Applicable to most of the FDPs
  • Transmitted continuously during model animation
  • Mesh deformations

38
Requirements for facial animation
  • The animation requires a large set of different
    emotions
  • Noise introduction
  • The animation needs to look real
  • Handling of temporal relations
  • The possibility of structuring the FDP
    subgroups
  • Introduction of hierarchy (and/or)
  • Creating a facial expression is a complex task
  • Model learning

→ Hierarchical Hidden Markov Model (HHMM)
39
HHMM for facial animation
[Figure: DBN spanning levels 4 down to 0, with EXPRESSION nodes at times t and
t+1; observation nodes (FAP groups 3.2, 3.4, 3.6) at level 0]
40
HHMM for facial animation
[Figure: DBN spanning levels 4 down to 0, with EXPRESSION nodes at times t and
t+1 and an additional speech node at level 3; observation nodes (FAP groups
3.2, 3.4, 3.6) at level 0]
41
Towards Multimodal Emotion Recognition using
Bayesian Networks
  • Olivier Martin, UCL
  • Louvain-la-Neuve, Belgium

42
Goal
  • Recognize the user's emotional state using a
    combination of facial, vocal and gestural
    information.
  • (Use the recognised emotional state to understand
    the user's interactions in interactive applications)

43
Outline
  • Facial, Vocal & Gestural modalities for emotion
    recognition
  • Multimodal Fusion & Fission
  • Collaborations inside SIMILAR

44
System Overview: Facial Layer (Feature
Extraction by the team of Alice Caplier,
LIS-INPG, Grenoble, France)
[Figure: (x,y) coordinates of facial feature points feed three Bayesian
networks, estimating the lips state, eye state and eyebrow state]
45
System Overview: Vocal Layer (Feature Extraction
by the team of Thierry Dutoit at Multitel, Mons,
Belgium)
[Figure: for each vocal feature (energy, speaking rate, pitch, noise), the
mean µ, variance s², min & max feed a Bayesian network producing the vocal
information. Example outputs: Angry? Stressed?]
46
System Overview: Gestural Layer (Real-time
tracking of gestural features provided by
Alterface, Belgium)
[Figure: position & speed features feed a Bayesian network producing the
gestural information]
47
Multimodal Fusion: A Second-Stage Module
[Figure: lips, left/right eyebrow and left/right eye states, plus vocal and
gestural information, feed a Bayesian network producing an emotional belief
vector. N states → N variables!]
48
Examples of Multimodal Fission in Multimodal
Emotion Recognition
  • Speech detection: decrease the mouth influence when
    the user is speaking and turn on the vocal
    modality influence.
  • Hand tracking: detect occlusions, turn off the
    occluded feature's influence (and find the semantic
    meaning of the occlusion).

49
Multimodal fission: an example
  • P(Ck | Ck-1, Ak) where Ak = {F1,k,
    F2,k, G1,k, V1,k}
  • Observations:
  • Fi: mouth corners
  • G1: a gestural feature
  • V1: a vocal feature
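The fission idea amounts to making the evidence set Ak context-dependent. A minimal sketch, using the slide's variable names; the exact selection logic (which edges are added or dropped in each situation) is an illustrative assumption:

```python
# Multimodal fission sketch: the evidence set A_k used in P(C_k | C_{k-1}, A_k)
# changes with the context (speaking, occlusion). Selection rules are assumed.

def active_evidence(speaking, right_corner_occluded):
    """Return the observation variables kept as evidence for this time step."""
    feats = {"F1", "F2", "G1"}       # mouth corners + gestural feature, default on
    if speaking:
        feats.add("V1")              # turn on the vocal modality influence
    if right_corner_occluded:
        feats.discard("F2")          # an occluded corner carries no reliable signal
    return feats

print(sorted(active_evidence(speaking=False, right_corner_occluded=True)))
```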

[Figure: three network configurations — user not speaking, user speaking, and
user with his hand in front of the right mouth corner — each linking Ck-1 and
Ck to a different subset of F1, F2, G1, V1]
50
Using 5 distances
51
Learning
  • Learning using Anthony's way to simulate disgust
  • P(Di | disgust) and P(Di | neutral)
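Given the two learned distributions, classification can be done by comparing likelihoods over the 5 distances. The Gaussian form and all parameter values below are made-up assumptions, not the learned models from the slides:

```python
# Likelihood-ratio classification over 5 facial distances Di, assuming each
# P(Di | class) is Gaussian with hypothetical (mean, std) parameters.
import math

params = {
    "disgust": [(1.0, 0.2), (0.8, 0.2), (1.2, 0.3), (0.9, 0.2), (1.1, 0.25)],
    "neutral": [(1.3, 0.2), (1.0, 0.2), (1.0, 0.3), (1.2, 0.2), (1.0, 0.25)],
}

def log_lik(distances, cls):
    """Sum of Gaussian log-densities, one per distance Di."""
    return sum(-((x - m) ** 2) / (2 * s * s)
               - math.log(s * math.sqrt(2 * math.pi))
               for x, (m, s) in zip(distances, params[cls]))

def classify(distances):
    """Pick the class with the higher likelihood (equal priors assumed)."""
    if log_lik(distances, "disgust") > log_lik(distances, "neutral"):
        return "disgust"
    return "neutral"

print(classify([1.0, 0.8, 1.25, 0.9, 1.1]))  # measurements near the disgust means
```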

52
Results
  • 0% disgust recognition for Bert
  • Bert is not a very good actor! … and most of the
    people claiming to be doing emotion recognition
    are NOT doing it
53
Results (2)
  • 96% disgust recognition for Alex
  • When asked to show disgust, Alex activates the
    same muscles as Anthony
  • AND
  • → My system works

54
Collaborations inside SIMILAR
  • Thanks to:
  • LIS-INPG, Grenoble, France
  • Facial feature extraction
  • Multitel, Mons, Belgium
  • Vocal feature extraction
  • Alterface, Louvain-la-Neuve, Belgium
  • Gestural feature extraction