Hierarchical Hidden Markov Models

Provided by: FX06

Transcript and Presenter's Notes

Title: Hierarchical Hidden Markov Models

1
Université Catholique de Louvain Faculté des
Sciences Appliquées
Laboratoire de Télécommunications et
Télédétection (TELE)

Hierarchical Hidden Markov Models
Common Framework for complex recognition and
planning under uncertainty
Facial Animation (F-X Fanard)
Emotion Recognition (Olivier Martin)
Gesture Recognition (Kosta Gaitanis)

2
Outline

HMM and DBN: what is what, and why?
HHMM: a common framework
Applications
Kosta: Gesture Recognition
F-X: Facial Animation
Olivier: Emotion Recognition

3
What are Bayesian Networks ?

A Bayesian Network is a graph
A set of nodes: stochastic variables
A set of directed links: causal relationships
The graph has no cycles (DAG)
Each node has a conditional probability table that
quantifies the effect that the parents have on
the node (causality)

$P(O_1, \dots, O_N) = \prod_i P(O_i \mid \mathrm{Parents}(O_i))$
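
A minimal sketch of this factorisation, assuming a hypothetical three-node network (rain and sprinkler as parents of wet grass). The network, the variable names and all probabilities are made up for illustration and do not come from the slides.

```python
# Hypothetical network: rain -> wet <- sprinkler (all variables binary).
# Every number below is an illustration value, not data from the slides.
p_rain = {True: 0.2, False: 0.8}        # node without parents
p_sprinkler = {True: 0.1, False: 0.9}   # node without parents
# Conditional probability table of "wet" given its parents (rain, sprinkler)
p_wet_given = {
    (True, True): 0.99,
    (True, False): 0.90,
    (False, True): 0.80,
    (False, False): 0.05,
}


def joint(rain, sprinkler, wet):
    """Joint probability as the product of each node given its parents."""
    p_wet = p_wet_given[(rain, sprinkler)]
    return p_rain[rain] * p_sprinkler[sprinkler] * (p_wet if wet else 1.0 - p_wet)


# Summing the joint over all configurations must give 1.
total = sum(
    joint(r, s, w)
    for r in (True, False)
    for s in (True, False)
    for w in (True, False)
)
```

Only 2 + 2 + 4 table rows are stored here, instead of one entry per joint configuration.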

4
An example
5
Bayesian Network Classifiers

$U = \{O_1, \dots, O_N, S\}$
The $O_i$ are the observation variables.
$S$ is the state variable
Goal: infer $P(S \mid O_1, \dots, O_N)$ using Bayes' rule
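
A minimal sketch of this inference step, assuming the naive structure in which the observations are conditionally independent given S (an assumption made here for brevity; a general network classifier would use larger factors). The function name, states, observation values and probabilities are all hypothetical.

```python
def posterior(prior, likelihoods, observations):
    """Return P(S | O_1..O_N) for a naive Bayesian network classifier.

    prior:        dict state -> P(S = state)
    likelihoods:  one dict per observation variable,
                  state -> {value -> P(O_i = value | S = state)}
    observations: the observed values, one per variable
    """
    unnorm = {}
    for state, p_state in prior.items():
        p = p_state
        for table, value in zip(likelihoods, observations):
            p *= table[state][value]
        unnorm[state] = p          # P(S) * prod_i P(O_i | S)
    evidence = sum(unnorm.values())  # P(O_1..O_N)
    return {state: p / evidence for state, p in unnorm.items()}


# Made-up example with two states and two observation variables.
prior = {"joy": 0.5, "anger": 0.5}
likelihoods = [
    {"joy": {"up": 0.8, "down": 0.2}, "anger": {"up": 0.1, "down": 0.9}},
    {"joy": {"raised": 0.5, "lowered": 0.5}, "anger": {"raised": 0.2, "lowered": 0.8}},
]
post = posterior(prior, likelihoods, ["up", "raised"])
```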

6
What is a Dynamic Bayesian Network?

For each time slice, a Bayesian network

[Figure: two time slices with state nodes $S_t$ and $S_{t+1}$, each connected to observation nodes O1 to O6]
7
Hidden Markov Model

The simplest Dynamic Bayesian Network
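Defined by $\lambda = (A, B, \pi)$
$A = \{a_{jk}\}$: state transition probabilities, $a_{jk} = P(S_t = k \mid S_{t-1} = j)$
$B = \{b_j(k)\}$: probability of observations, $b_j(k) = P(O_t = v_k \mid S_t = j)$
$\pi = \{\pi_j\}$: initial state distribution, $\pi_j = P(S_1 = j)$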

[Figure: chain of hidden states $S_{t-1}$, $S_t$, $S_{t+1}$ (N possible states), each emitting an observation $O_{t-1}$, $O_t$, $O_{t+1}$ (M possible values)]
8
HMM: 3 problems

The evaluation problem (analysis: forward/backward)
Given an HMM $\lambda$ and a sequence of observations O,
what is the probability that the observations are
generated by the model, $P(O \mid \lambda)$?
The decoding problem (analysis: Viterbi)
Given an HMM $\lambda$ and a sequence of observations O,
what is the most likely state sequence in the
model that produced the observations?
The learning problem (synthesis: Baum-Welch)
Given an HMM $\lambda$ and a sequence of observations O,
how should we adjust the model parameters $\lambda$ in
order to maximize $P(O \mid \lambda)$?
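
The first two problems can be sketched in a few lines for a discrete HMM. This is an illustrative sketch only: the function names and the two-state example parameters are assumptions, no scaling is applied (long sequences would underflow), and the learning problem (Baum-Welch) is left out.

```python
def forward(pi, A, B, obs):
    """Evaluation problem: P(O | lambda) via the forward recursion."""
    n = len(pi)
    alpha = [pi[j] * B[j][obs[0]] for j in range(n)]
    for o in obs[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
            for j in range(n)
        ]
    return sum(alpha)


def viterbi(pi, A, B, obs):
    """Decoding problem: most likely state sequence for the observations."""
    n = len(pi)
    delta = [pi[j] * B[j][obs[0]] for j in range(n)]
    back = []  # back[t][j] = best predecessor of state j at step t + 1
    for o in obs[1:]:
        step, new_delta = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: delta[i] * A[i][j])
            step.append(best_i)
            new_delta.append(delta[best_i] * A[best_i][j] * B[j][o])
        back.append(step)
        delta = new_delta
    state = max(range(n), key=lambda j: delta[j])
    path = [state]
    for step in reversed(back):  # follow the back-pointers
        state = step[state]
        path.append(state)
    return path[::-1]


# Made-up model: 2 hidden states, 3 observation symbols.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]            # A[j][k] = P(S_t = k | S_{t-1} = j)
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]  # B[j][k] = P(O_t = v_k | S_t = j)
obs = [0, 1, 2]
likelihood = forward(pi, A, B, obs)
best_path = viterbi(pi, A, B, obs)
```

Both recursions cost O(N² T) for N states and T observations, instead of enumerating all N^T state sequences.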

9
An example
HMM: $P(S \mid O_1, \dots, O_6)$
BN: $P(S \mid O_1, O_2, O_5) \cdot P(S \mid O_4) \cdot P(S \mid O_3, O_6)$

If every observation variable may take
10 different values
HMM: $10^6$ values
BN: 1110 values
Naive BN: 60 values
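
The three counts can be reproduced by counting one entry per joint configuration of the observations appearing in each factor. This counting convention is inferred from the totals on the slide; the variable names are illustrative.

```python
values_per_variable = 10

# Full model: a single factor over all six observations O1..O6.
hmm_full = values_per_variable ** 6

# Factored BN: factors over {O1, O2, O5}, {O4} and {O3, O6}.
bn_factored = sum(values_per_variable ** size for size in (3, 1, 2))

# Naive BN: one factor per observation variable.
naive = 6 * values_per_variable
```

The factored network needs about a thousand entries where the unfactored model needs a million.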

10
Why Dynamic Bayesian Networks ?

Handle temporal aspects (like HMM)
Handle dependencies within variables, allowing for
efficient inference mechanisms
Trade-off between precision and computational
time for real-time applications: hybrid
inference methods (Rao-Blackwellised Particle
Filters)

11
Why Dynamic Bayesian Networks ?

Handle Multimodal Fusion & Fission, simply by
adding/removing edges between vertices.
Intuitive and comprehensible representation
(unlike NN, SVM, ...)
Take all scenarios into account (unlike
rule-based)
No need for much learning: learning is used to
refine a priori information about the model
(unlike NN, SVM, etc.)

12
Smoothed learning

Too few samples for unbiased learning?
→ Compute the a posteriori probability by machine
learning, taking a priori information into
account intelligently (learning coefficient µ)
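
The slide does not give the update formula. One common way to realise this idea, shown here purely as an assumption, is to treat µ as the number of virtual samples granted to the a priori distribution. The function name and all numbers are hypothetical.

```python
def smoothed_estimate(prior, counts, mu):
    """Blend an a priori distribution with observed counts.

    Assumed form (not taken from the slides):
        estimate[k] = (mu * prior[k] + counts[k]) / (mu + n)
    mu = 0 gives plain relative frequencies; a large mu keeps the prior.
    """
    n = sum(counts.values())
    return {k: (mu * prior[k] + counts.get(k, 0)) / (mu + n) for k in prior}


prior = {"joy": 0.5, "anger": 0.3, "fear": 0.2}
counts = {"joy": 1, "anger": 3}   # only 4 samples; "fear" was never observed
estimate = smoothed_estimate(prior, counts, mu=6.0)
```

With so few samples, plain frequencies would assign probability 0 to "fear"; the smoothed estimate keeps it possible.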

13
References

Dynamic Bayesian Networks
Friedman, Bayesian Network Classifiers
http://www.cs.uu.nl/docs/vakken/pn/bayesclass.pdf
Hidden Markov Models
Rabiner, A Tutorial on Hidden Markov Models and
Selected Applications in Speech Recognition
www.ai.mit.edu/courses/6.867-f02/papers/rabiner.pdf

14
HHMM: A common framework

Kosta Gaitanis, UCL
Louvain-la-Neuve, Belgium

15
Outline

HHMM: extending the classical HMM
Inference
Brief presentation of classical methods
Linear time inference in the HHMM
Multiple actors: extending the HHMM

16
From the HMM to the Hierarchical HMM

An HMM generates symbols
Some problems have an inherently hierarchical
structure
We want to exploit and model this hierarchical
structure
Solution: an HMM that generates another HMM!
Advantage: the observations are correlated at
higher levels → over longer periods of time
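
A minimal generative sketch of this idea, assuming a hypothetical two-level model: each top-level state owns a small HMM, and when that sub-HMM terminates, control returns to the top level. The state names, symbols, probabilities and the `generate` helper are all invented for illustration.

```python
import random

# Top-level transitions, applied only when a sub-HMM terminates.
TOP_TRANS = {
    "walk": {"walk": 0.3, "sit": 0.7},
    "sit": {"walk": 0.6, "sit": 0.4},
}
# Each top-level state owns a sub-HMM: start state, transitions,
# emission tables, and a per-step termination probability.
SUB = {
    "walk": {
        "start": "left",
        "trans": {"left": {"right": 1.0}, "right": {"left": 1.0}},
        "emit": {"left": {"L": 0.9, "R": 0.1}, "right": {"L": 0.1, "R": 0.9}},
        "end": 0.25,
    },
    "sit": {
        "start": "down",
        "trans": {"down": {"down": 1.0}},
        "emit": {"down": {"S": 1.0}},
        "end": 0.5,
    },
}


def draw(dist, rng):
    """Sample one key of `dist` (dict outcome -> probability)."""
    keys = list(dist)
    return rng.choices(keys, weights=[dist[k] for k in keys])[0]


def generate(n_symbols, top="walk", seed=0):
    """Return n_symbols pairs (top-level state, emitted symbol)."""
    rng = random.Random(seed)
    out = []
    sub = SUB[top]["start"]
    while len(out) < n_symbols:
        model = SUB[top]
        out.append((top, draw(model["emit"][sub], rng)))
        if rng.random() < model["end"]:       # the sub-HMM terminates
            top = draw(TOP_TRANS[top], rng)   # the upper level moves on
            sub = SUB[top]["start"]
        else:
            sub = draw(model["trans"][sub], rng)
    return out


sequence = generate(20)
```

Symbols emitted under the same top-level state share that state, which is how the upper level correlates observations over longer stretches of time.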

17
DBN representation of an HHMM
[Figure labels: state at level k; termination node; observation (noisy); simple HMM]
2 types of inference: bottom-up (recognition) and top-down (planning)
18
Example: moving in an airport
The same model can be used for planning AND for
recognition!
19
Inference in the HHMM

Exact Inference (Pearl, JTree, loopy, ...)
Exponential complexity w.r.t. the number of arcs
Approximate Inference
Distribution approximations
→ non-Gaussian
Sampling methods (FP, MCMC, ...)
→ Poor precision with a large number of nodes
Hybrid Inference (RBPF, ...)
Good trade-off between complexity and precision
Intelligent sampling can reduce network
connectivity → exact inference can be possible
for the rest of the network

20
Linear Time Inference in the HHMM
[Figures: the original HHMM; the HHMM after sampling the termination nodes]

Rao-Blackwellised Particle Filter
The Particle Filter samples the termination nodes and
the horizontal transitions
Exact Inference calculates the belief states of
the other variables using Bayes' rule → linear
complexity (tree structure)

21
Taking into account multiple actors

An actor can be
A point (edge of mouth, edge of body, ...)
A part of the body (mouth, hand, leg, ...)
A group of parts of the body (legs, hands,
upper/lower face)
A person
An object
In general, anything that can be observed is
considered as an actor
Conditional dependence between actors at
different levels of the hierarchy
If each actor needs a separate HHMM
→ exponential complexity, oops

22
Extending the HHMM for multiple actors

Idea: the dependence between actors is modeled
only at one level.
Complexity stays linear because the substructures
are independent (tree structure)
A coordination node models structures such as AND
/ OR

23
Applications

Any problem that has
Noisy data
Correlated observations
Natural hierarchical decomposition
And needs
Dynamic Recognition (Bottom-up) or Planning
(Top-down)
Flexible modeling of actions
Learning

24
Gesture Recognition using the HHMM

Kosta Gaitanis, UCL
Louvain-la-Neuve, Belgium

25
Applications: Gesture Recognition

Goal: what is this man doing?
Walking, jumping, sitting, taking an object, ...
Data acquisition: Natural Gesture (Alterface)
Positions and speed of 5 crucial body points
(head, hands, feet)

26
Hierarchical Decomposition

Natural hierarchical decomposition of the body
The variables are independent at lower levels but
dependent at higher levels
Only one object at the top of the hierarchy

27
Modelling an action
28
Modelling Actions in a HHMM
29
Learning

Higher levels can be modelled easily using a
priori knowledge.
Lower levels are more complex
Production States
From verbally stated actions to point movements
Observation error model
Create a model for the errors made during data
acquisition
Inversions (left/right hand, head/hand, leg/hand, ...)
Self-Occlusions

30
Observation Error Model

Self Occlusion
Bayesian Networks can still infer with
missing data
Naive missing data inference
Inference with estimated data

31
Application Specific Model
32
MPEG-4 Facial Animation

Fanard François-Xavier, UCL
Louvain-la-Neuve, Belgium

33
Facial Definition Parameters (FDPs)

Set of 84 feature points placed on the face
Divided in 10 subgroups
Help to
customize an animation according to personal
characteristics
reproduce the structural topography of a
particular face on which a specific texture may
be applied (the skin, the eyes, the beard, ...)
Sent once per animation session

34
Facial Animation Parameter Units (FAPUs)

Independence between emotions/animations and
facial models
Fractions of distances between key facial
characteristics (iris/iris distance, ...)
Computed on a neutral face

35
Facial Animation Parameters (FAPs)

Correspond to facial muscle actions
Reproduce basic facial actions (expressions,
emotions & articulation)
68 FAPs divided in 2 levels
High level (2)
the visemes: mouth movements during elocution
(predefined)
the expressions (neutral, anger, disgust, joy,
sadness, fear & surprise)

[Images: Neutral, Joy, Surprise, Anger]
36
Facial Animation Parameter (FAP)

Correspond to facial muscle actions
Reproduce basic facial actions (expressions,
emotions & articulation)
68 FAPs divided in 2 levels
High level (2)
Low level (66)
raise_b_midlip, stretch_l_cornerlip,
raise_b_lip_lm, ...

Surprise
open_jaw raise_b_midlip stretch_l_cornerlip stretch_r_cornerlip
raise_b_lip_lm raise_b_lip_rm
close_t_l_eyelid close_t_r_eyelid close_b_l_eyelid close_b_r_eyelid
raise_l_i_eyebrow raise_r_i_eyebrow raise_l_m_eyebrow raise_r_m_eyebrow
raise_l_o_eyebrow raise_r_o_eyebrow squeeze_l_eyebrow squeeze_r_eyebrow
stretch_l_cornerlip_o stretch_r_cornerlip_o
37
Facial Animation Parameter (FAP)

Correspond to facial muscle actions
Reproduce basic facial actions (expressions,
emotions & articulation)
68 FAPs divided in 2 levels
High level (2)
the visemes: mouth movements during elocution
(21 predefined)
the expressions (neutral, anger, disgust, joy,
sadness, fear & surprise)
Low level (66)
raise_b_midlip, stretch_l_cornerlip,
raise_b_lip_lm, ...
Applicable to most of the FDPs
Transmitted continuously during model animation
Mesh deformations

38
Requirements for facial animation

The animation requires a large set of different
emotions
Noise introduction
The animation needs to be real
Handling of temporal relations
The possibility of structuring the FDP
subgroups
Introduction of hierarchy (and/or)
Creating a facial expression is a complex task
Model learning

Hierarchical Hidden Markov Model (HHMM)
39
HHMM for facial animation
[Figure: HHMM with levels 4 down to 0; top-level nodes EXPRESSION (t) and EXPRESSION (t+1); level-1 nodes labelled 3.2, 3.4, 3.6; observations below level 0]
40
HHMM for facial animation
[Figure: same HHMM with an added "speech" node; levels 4 down to 0; top-level nodes EXPRESSION (t) and EXPRESSION (t+1); level-1 nodes labelled 3.2, 3.4, 3.6; observations below level 0]
41
Towards Multimodal Emotion Recognition using
Bayesian Networks

Olivier Martin, UCL
Louvain-la-Neuve, Belgium

42
Goal

Recognize the user's emotional state using a
combination of facial, vocal and gestural
information.
(Use the recognised emotional state to understand
the user's interactions in interactive applications)

43
Outline

Facial, Vocal & Gestural modalities for emotion
recognition
Multimodal Fusion & Fission
Collaborations inside SIMILAR

44
System Overview: Facial Layer (feature
extraction by the team of Alice Caplier,
LIS-INPG, Grenoble, France)
[Figure: facial feature points (x,y); three Bayesian networks labelled lips state, eye state and eyebrow state]
45
System Overview: Vocal Layer (feature extraction
by the team of Thierry Dutoit at Multitel, Mons,
Belgium)
[Figure: four vocal features (energy, speaking rate, pitch, noise), each described by µ, σ², min and max; a Bayesian network labelled vocal information; examples: Angry? Stressed?]
46
System Overview: Gestural Layer (real-time
tracking of gestural features provided by
Alterface, Belgium)
[Figure: position & speed features; a Bayesian network labelled gestural information; examples]
47
Multimodal Fusion: A Second-Stage Module
[Figure: lips states, left/right eyebrow states, left/right eye states, vocal info and gestural info; a Bayesian network; emotional belief vector; annotation "N states N variables !"]
48
Examples of Multimodal Fission in Multimodal
Emotion Recognition

Speech detection to decrease mouth influence when
the user is speaking and turn on the vocal
modality influence.
Hands tracking to detect occlusions, turn off
occluded feature influence (and find semantic
meaning of the occlusion).

49
Multimodal fission: an example

$P(C_k \mid C_{k-1}, A_k)$ where $A_k = \{F_{1,k}, F_{2,k}, G_{1,k}, V_{1,k}\}$
Observations
$F_i$: mouth corners
$G_1$: a gestural feature
$V_1$: a vocal feature

[Figure: three time slices $C_{k-1}$, $C_k$, $C_{k+1}$, each with nodes F1, F2, G1 and V1, captioned: user has his hand in front of the right mouth corner; user is not speaking; user is speaking]
50
Using 5 distances
51
Learning