Learning and Recognizing Human Dynamics in Video Sequences Christoph Bregler presentation

1
Learning and Recognizing Human Dynamics in Video
Sequences
Christoph Bregler

Alvina Goh
Reading group 07/06/06

2
Motivation

Seeing only lights attached to the joints of an
actor, human observers were able to distinguish
gaits, dance styles, stair climbing, and even the
gender and identity of the actor.
This paper attempts to find the right balance of
supplied structure and learned parameters.
Guiding principles
no early commitment to specific hypotheses
higher level hypothesis should be able to
disambiguate lower level estimates
low computation and representation costs
mid and higher level models should be learnable

3
Motivation

Human motion is represented at many levels of
abstraction.
This paper describes a way of combining cues from
the lowest level to the highest level in order to
do activity recognition.
The paper proposes representing motion data by
"movemes" (analogous to phonemes in speech
recognition), so that a complex activity (a
"word") can be composed out of simple movemes.

4
Probabilistic Compositional Framework

Low-level primitives: areas of coherent motion
An image region belonging to a rigid body segment
is one area of coherent motion
Mid-level categories: simple movements
These are represented by linear dynamical
systems
High-level complex gestures: sequences of simple
movements
These are represented by Hidden Markov Models as
successive phases of simple movements

5
Probabilistic Compositional Framework
Each dynamical model corresponds to the emission
probability of one state of a hidden Markov model
Temporal sequences of blob tracks are grouped
into linear stochastic dynamical models.
Each blob is represented by a probability
distribution over coherent motion (rigid/affine),
color (HSV values), and spatial support regions.
At each pixel, the spatio-temporal image
gradients and the color value are represented as
random variables
6
Probabilistic Compositional Framework

Example of one leg during a walk cycle
One coherent blob for upper leg, another for the
lower leg
One dynamical system when the leg has ground
support, another when swinging above ground
State space: translation and angular velocities
One cyclic HMM with 2 states
Given a sequence of images, we
need to find the corresponding blob estimates,
linear dynamical systems, and HMMs for a set of
different gaits, and then
classify using the posterior probability,
i.e., the HMM with the highest score is the most
likely complex gesture performed in the image
sequence

7
1st and 2nd Levels
Each blob is represented by a probability
distribution over coherent motion (rigid/affine),
color (HSV values), and spatial support regions.
At each pixel, the spatio-temporal image
gradients and the color value are represented as
random variables
8
Classification of Pixels into Blobs

For each pixel location (x,y), we need to
estimate the label S(x,y), which indicates which
blob the pixel belongs to. (assuming there are K
blobs)
For each one of the K blobs, we need to estimate
the motion, color and spatial distribution.
To estimate the labels S(x,y) ∈ {1, 2, ..., K}
and the model parameters for motion, color, and
spatial distribution simultaneously, EM is used.

9
Representation of Mixture of Blobs

The set of blob hypotheses for a given image
frame I(t) is represented as a mixture of
multivariate Gaussians θ(t)
Each θ_k(t) contains the parameters for coherent
motion and color, and the center of mass and
second moments of the blob. A background class
with uniform distribution is also defined.
Likelihood of an image frame I(t) conditional on
a mixture of blobs hypothesis is

We want to maximize this cost function
Spatial proximity prior for blob k
10
Before we can maximize the cost

We need to model
This term is defined using the spatial-temporal
image gradient (motion) and color values.
Optical flow
How do we model the pdf for optical flow?
This is done with a zero-mean Gaussian
distribution as described in the paper
E. Simoncelli, E. Adelson, and D. Heeger,
"Probability distributions of optical flow," in
Proceedings of the IEEE Computer Vision and
Pattern Recognition Conference, pp. 310-315,
1991.
This defines
which we use for

11
Expectation step

Estimation of the support layer for each blob,
which is the posterior probability
Note that we are calculating the expected
membership.
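
The expected membership described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes the per-pixel log-likelihoods under each blob model are already available in a hypothetical array `log_lik`, and that `weights` holds the mixing weights of the K blobs.

```python
import numpy as np

def e_step(log_lik, weights):
    """Posterior support map for each blob.

    log_lik: (K, H, W) array, log P(pixel data at (x,y) | blob k)  [assumed input]
    weights: (K,) mixing weights that sum to 1                      [assumed input]
    Returns a (K, H, W) array whose values sum to 1 over k at every pixel.
    """
    log_post = log_lik + np.log(weights)[:, None, None]
    log_post = log_post - log_post.max(axis=0, keepdims=True)  # numerical stability
    post = np.exp(log_post)
    return post / post.sum(axis=0, keepdims=True)

# Toy example: K = 2 blobs, an image of 1 x 2 pixels.
log_lik = np.log(np.array([[[0.2, 0.5]],
                           [[0.6, 0.5]]]))
support = e_step(log_lik, np.array([0.5, 0.5]))
```

Each entry of `support` is a soft membership rather than a hard label, which matches the note that the expected membership is calculated.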

12
Maximization step

Seek to maximize the expected log-likelihood.
This is equivalent to minimizing the following
Minimizing (8) subject to the constraint
Σ_k w_k = 1 is equivalent to assigning
Minimizing (9) is equivalent to computing the
weighted means and covariances for the support
layer
Minimizing (10) is done by extending the
Lucas-Kanade motion estimation described in the
paper "Good Features to Track" by Shi and Tomasi,
CVPR 1994
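
The weighted means and covariances of the support layer, together with the mixing weights, can be sketched as follows. This is an illustrative sketch covering only the spatial part of the update; the motion update (the Lucas-Kanade extension) is not shown. The function name `m_step_spatial` and the array layout are assumptions made for the example.

```python
import numpy as np

def m_step_spatial(post):
    """Update mixing weights, centers of mass, and second moments.

    post: (K, H, W) support maps (posterior of each blob at each pixel).
    Returns weights (K,), means (K, 2) in (x, y) order, covariances (K, 2, 2).
    Assumes every blob has non-zero total support.
    """
    K, H, W = post.shape
    ys, xs = np.mgrid[0:H, 0:W]
    coords = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)  # (N, 2)
    resp = post.reshape(K, -1)                                         # (K, N)
    mass = resp.sum(axis=1)                                            # (K,)
    weights = mass / mass.sum()          # mixing weights, sum to 1
    means = (resp @ coords) / mass[:, None]
    covs = np.empty((K, 2, 2))
    for k in range(K):
        d = coords - means[k]
        covs[k] = (resp[k][:, None] * d).T @ d / mass[k]
    return weights, means, covs

# Toy example: blob 0 owns pixels x = 0, 1 and blob 1 owns x = 2, 3.
post = np.zeros((2, 1, 4))
post[0, 0, :2] = 1.0
post[1, 0, 2:] = 1.0
weights, means, covs = m_step_spatial(post)
```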

13
A side note

Black ink: high probability
The support map assigns high probability to a
motion model at regions with high gradients, as
these can be uniquely matched to specific motion
models. At non-textured regions, equal
probability is assigned to several motion models.
This approach can be viewed as an edge-based
tracker at regions with high edge gradients, and
a region-based tracker at regions with high
texture.

14
Considering Past Estimates

Since EM converges only to a local maximum, it is
important to initialize the starting point
intelligently.
Given past estimates of the blob parameters
θ(t-1), a Kalman filter is used to predict the
mean and covariance of θ(t). The state space of
the filter is θ(t).
The EM starting point is the predicted Kalman
state.
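
The prediction step that supplies the EM starting point can be illustrated with a generic Kalman prediction. The constant-velocity transition matrix `F`, the process noise `Q_noise`, and the two-component state below are assumptions chosen for the example, not the filter design used in the paper.

```python
import numpy as np

def kalman_predict(mean, cov, F, Q_noise):
    """One Kalman prediction step: propagate the state mean and covariance."""
    pred_mean = F @ mean
    pred_cov = F @ cov @ F.T + Q_noise
    return pred_mean, pred_cov

# Hypothetical state: x-position and x-velocity of a blob center.
F = np.array([[1.0, 1.0],
              [0.0, 1.0]])        # constant velocity, unit time step (assumed)
Q_noise = 0.01 * np.eye(2)        # assumed process noise
mean_prev = np.array([10.0, 2.0])
cov_prev = np.eye(2)

mean_pred, cov_pred = kalman_predict(mean_prev, cov_prev, F, Q_noise)
# mean_pred would be used as the EM starting point for the next frame.
```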

15
3rd and 4th Levels
Each dynamical model corresponds to the emission
probability of one state of a hidden Markov model
Temporal sequences of blob tracks are grouped
into linear stochastic dynamical models.
16
Classification of Blobs into Dynamical Systems

This is similar to the lower levels, where we
introduced the hidden variables S_k(t,x,y),
indicating the probability that a pixel belongs
to blob k.
We now introduce the variable D_m(t,k), which
assigns a sequence of blobs θ_k(t), θ_k(t-1), ...,
θ_k(t-d) to a dynamical system m.
To do so, we assume the following discrete
second-order stochastic dynamical system
(moveme)
The state variable Q(t) is the motion estimate
of the specific blob θ_k(t),
w is the system noise, and
C_m = B_m B_m^T is the system covariance
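
To make the idea of a second-order stochastic dynamical system concrete, the sketch below simulates one. It assumes the common autoregressive form Q(t) = A0 Q(t-1) + A1 Q(t-2) + B w(t) with w drawn from a standard normal distribution; the matrix names `A0` and `A1` and all numeric values are assumptions for illustration only.

```python
import numpy as np

def simulate_moveme(A0, A1, B, q0, q1, steps, rng):
    """Simulate Q(t) = A0 Q(t-1) + A1 Q(t-2) + B w(t), with w ~ N(0, I)."""
    traj = [np.asarray(q0, dtype=float), np.asarray(q1, dtype=float)]
    for _ in range(steps):
        w = rng.standard_normal(B.shape[1])
        traj.append(A0 @ traj[-1] + A1 @ traj[-2] + B @ w)
    return np.array(traj)

rng = np.random.default_rng(0)
A0 = np.array([[1.6]])     # hypothetical parameters giving a damped oscillation
A1 = np.array([[-0.8]])
B = np.array([[0.1]])
C = B @ B.T                # system covariance C = B B^T
traj = simulate_moveme(A0, A1, B, [0.0], [1.0], steps=50, rng=rng)
```

With `B` set to zero the recursion becomes deterministic, which is a convenient way to check the transition matrices in isolation.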

17
Classification of Complex Gestures

Hidden Markov Models are used to represent
complex gestures composed of simple dynamical
systems. Each state of the HMM corresponds to the
validity of one dynamical system. The emission
probabilities are given by the stochastic
dynamical systems.
We want to compute the globally best segmentation
across time. This is done using dynamic
programming.
Estimate the probability that HMM_i fits a track
P(D_m(t,k)) is the probability that dynamical
system m fits blob k at time t. tr_{n,m} is the
HMM transition probability from state n to state
m.
Compare across all the complex-category HMM_i and
an outlier model HMM_0, then classify.
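
The dynamic-programming search for the best segmentation can be sketched with a standard Viterbi recursion in the log domain. This is a generic illustration under assumed inputs: `log_emit[t][m]` stands in for the log-probability that dynamical system m fits the track at time t, and the numbers in the example are invented.

```python
import math

def viterbi(log_emit, log_trans, log_init):
    """Return the best state sequence and its log score.

    log_emit[t][m]:  log-probability that dynamical system m fits at time t
    log_trans[n][m]: log transition probability from state n to state m
    log_init[m]:     log-probability of starting in state m
    """
    T, M = len(log_emit), len(log_init)
    score = [log_init[m] + log_emit[0][m] for m in range(M)]
    back = []
    for t in range(1, T):
        prev, score, ptr = score, [], []
        for m in range(M):
            n_best = max(range(M), key=lambda n: prev[n] + log_trans[n][m])
            ptr.append(n_best)
            score.append(prev[n_best] + log_trans[n_best][m] + log_emit[t][m])
        back.append(ptr)
    last = max(range(M), key=lambda m: score[m])
    path = [last]
    for ptr in reversed(back):          # follow back-pointers to the start
        path.append(ptr[path[-1]])
    return path[::-1], score[last]

log = math.log
emit = [[log(0.9), log(0.1)], [log(0.8), log(0.2)], [log(0.1), log(0.9)]]
trans = [[log(0.7), log(0.3)], [log(0.3), log(0.7)]]
init = [log(0.5), log(0.5)]
path, best = viterbi(emit, trans, init)
```

Running this recursion once per gesture HMM and once for the outlier model, then picking the highest score, gives one way to carry out the comparison described above.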

18
Hybrid Dynamical Models

We need to estimate the system parameters of each
dynamical model
and the entries tr_{m,n} of the HMM transition
probability matrix.
However, if we are only given the motion
trajectories Q(1), Q(2), ..., Q(T), we do not
know the partition into subsequences. If we knew
the partition, calculating the system parameters
would be easy.
Proceed in an EM manner by maximizing the log
likelihood of a set of M dynamical systems and
the corresponding HMM

19
Expectation step

Estimation of the partition of the training set.
Find the probability D_m(t) that training example
Q(t) was generated by dynamical system m.
This is computed with dynamic programming, with
linear complexity

20
Maximization step

Seek to maximize the following expected
log-likelihood for each model m
This is done by solving a linear equation.
The new estimate of the HMM transition
probabilities is also computed with EM.
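
One way to read "solving a linear equation" is as a weighted least-squares fit of the dynamical system parameters, with the E-step probabilities D_m(t) as weights. The sketch below takes that reading as an assumption, together with the second-order form Q(t) ≈ A0 Q(t-1) + A1 Q(t-2); the function name `fit_moveme` is hypothetical.

```python
import numpy as np

def fit_moveme(Q, d):
    """Weighted least-squares fit of Q(t) ~ A0 Q(t-1) + A1 Q(t-2).

    Q: (T, n) trajectory; d: (T,) weights D_m(t) from the E-step.
    Returns A0, A1 (each n x n) and the weighted residual covariance C.
    """
    Q = np.asarray(Q, dtype=float)
    n = Q.shape[1]
    X = np.hstack([Q[1:-1], Q[:-2]])    # rows are [Q(t-1), Q(t-2)]
    Y = Q[2:]                           # targets Q(t)
    w = np.asarray(d, dtype=float)[2:]
    sw = np.sqrt(w)[:, None]
    coef, *_ = np.linalg.lstsq(sw * X, sw * Y, rcond=None)   # shape (2n, n)
    A0, A1 = coef[:n].T, coef[n:].T
    resid = Y - X @ coef
    C = (w[:, None] * resid).T @ resid / w.sum()
    return A0, A1, C

# Noise-free scalar trajectory generated with A0 = 1.6, A1 = -0.8.
Q = [[0.0], [1.0]]
for _ in range(20):
    Q.append([1.6 * Q[-1][0] - 0.8 * Q[-2][0]])
A0_hat, A1_hat, C_hat = fit_moveme(Q, np.ones(len(Q)))
```

On this noise-free trajectory the fit should recover the generating parameters, which makes it a useful sanity check before applying soft weights from a real E-step.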

21
Experiments: Training and Validation of Gait
Models

33 sequences of 5 subjects
Running, Walking, Skipping
Sequences start at different phases
4 dynamical models per gait
Uniform partition where each model is assigned
1/4

22
Experiments: Recognition of Gaits

Apply the learned dynamical models and HMMs to
unseen data
The outlier model has 1 state and a
constant-velocity dynamical model
The HMM with the highest likelihood gives the
final gait classification

23
Experiments: Recognition of Gaits
24
Conclusion

The approach decomposes the domain and
incorporates different levels of abstraction
using mixture models, EM, recursive Kalman and
Markov estimation.
How much data is needed to build the recognizer?
How much computational time is required?
