Learning and Recognizing Human Dynamics in Video Sequences, Christoph Bregler


1
Learning and Recognizing Human Dynamics in Video Sequences
Christoph Bregler
  • Alvina Goh
  • Reading group 07/06/06

2
Motivation
  • Given only lights attached to the joints of an
    actor, human observers can distinguish gaits,
    dance styles, stair climbing, and even the gender
    and identity of the actor.
  • This paper attempts to find the right balance of
    supplied structure and learned parameters.
  • Guiding principles:
  • no early commitment to specific hypotheses
  • higher-level hypotheses should be able to
    disambiguate lower-level estimates
  • low computation and representation costs
  • mid- and higher-level models should be learnable

3
Motivation
  • Human motion is represented at many levels of
    abstraction.
  • This paper describes a way of combining cues from
    the lowest level to the highest level in order to
    do activity recognition.
  • By suggesting the idea of representing motion
    data by movemes (like phonemes in speech
    recognition), it is possible to compose a complex
    activity (word) out of simple movemes.

4
Probabilistic Compositional Framework
  • Low-level primitives: areas of coherent motion
  • An image region belonging to a rigid body segment
    is one coherent motion
  • Mid-level categories: simple movements
  • These are represented by linear dynamical
    systems
  • High-level complex gestures: a sequence of simple
    movements
  • These are represented by Hidden Markov Models as
    successive phases of simple movements

5
Probabilistic Compositional Framework
  • Each dynamical model corresponds to the emission
    probability of a state of a hidden Markov model.
  • Temporal sequences of blob tracks are grouped
    into linear stochastic dynamical models.
  • Each blob is represented by a probability
    distribution over coherent motion (rigid/affine),
    color (HSV values), and spatial support regions.
  • At each pixel, the spatio-temporal image
    gradients and the color value are represented as
    random variables.
6
Probabilistic Compositional Framework
  • Example of one leg during a walk cycle
  • One coherent blob for upper leg, another for the
    lower leg
  • One dynamical system when the leg has ground
    support, another when swinging above ground
  • State space: translation and angular velocities
  • One cyclic HMM with 2 states
  • Given a sequence of images, we need to find the
    corresponding blob estimates, linear dynamical
    systems, and HMMs for a set of different gaits,
  • then classify using the posterior probability,
  • i.e., the HMM with the highest score is the most
    likely complex gesture performed in the image
    sequence

7
1st and 2nd Levels
  • Each blob is represented by a probability
    distribution over coherent motion (rigid/affine),
    color (HSV values), and spatial support regions.
  • At each pixel, the spatio-temporal image
    gradients and the color value are represented as
    random variables.
8
Classification of Pixels into Blobs
  • For each pixel location (x,y), we need to
    estimate the label S(x,y), which indicates which
    blob the pixel belongs to. (assuming there are K
    blobs)
  • For each one of the K blobs, we need to estimate
    the motion, color and spatial distribution.
  • EM is used to estimate the labels
    S(x,y) ∈ {1, 2, ..., K} and the model parameters
    for motion, color and spatial distribution
    simultaneously.

9
Representation of Mixture of Blobs
  • The set of blob hypotheses for a given image frame
    I(t) is represented as a mixture of multivariate
    Gaussians θ(t)
  • Each θ_k(t) contains the parameters for coherent
    motion and color, and the center of mass and
    second moments of the blob. A background class
    with uniform distribution is also defined.
  • The likelihood of an image frame I(t) conditional
    on a mixture-of-blobs hypothesis is the cost
    function we want to maximize.
  • It includes a spatial proximity prior for blob k.
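The equation on this slide did not survive in the transcript. As a hedged sketch only, a generic mixture likelihood consistent with the description above would look as follows. The symbol θ_k(t) for the parameters of blob k, the index k = 0 for the uniform background class, and the exact factorization are assumptions, not a copy of the paper's equation.

```latex
% Assumed generic form: product over pixels of a mixture over K blobs
% plus a background class (k = 0).
P\bigl(I(t) \mid \theta(t)\bigr)
  = \prod_{x,y} \sum_{k=0}^{K}
    \underbrace{P\bigl(I(t,x,y) \mid \theta_k(t)\bigr)}_{\text{motion and color term}}
    \;
    \underbrace{P\bigl(S(x,y) = k \mid \theta_k(t)\bigr)}_{\text{spatial proximity prior for blob } k}
```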
10
Before we can maximize the cost
  • We need to model the likelihood of each pixel
    observation given a blob.
  • This term is defined using the spatio-temporal
    image gradient (motion) and color values.
  • Optical flow
  • How do we model the pdf for optical flow?
  • This is done with a zero-mean Gaussian
    distribution as described in the paper
  • E. Simoncelli, E. Adelson, and D. Heeger,
    "Probability distributions of optical flow," in
    Proceedings of the IEEE Computer Vision and
    Pattern Recognition Conference, pp. 310-315,
    1991.
  • This defines
  • which we use for

11
Expectation step
  • Estimation of the support layer for each blob,
    which is the posterior probability
  • Note that we are calculating the expected
    membership.
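The expectation step above can be illustrated with a minimal sketch. It assumes a single scalar feature per pixel and 1-D Gaussian blobs, which is a simplification of the motion, color and spatial terms used in the slides; the function names and the example numbers are hypothetical.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def e_step(values, blobs, weights, background_density, background_weight):
    """For each value, return posterior memberships [background, blob 1..K]."""
    posteriors = []
    for v in values:
        # Joint probability of the observation and each class.
        joint = [background_weight * background_density]
        joint += [w * gaussian_pdf(v, m, s2) for w, (m, s2) in zip(weights, blobs)]
        total = sum(joint)
        # Normalizing gives the expected (soft) membership, not a hard label.
        posteriors.append([j / total for j in joint])
    return posteriors

# Hypothetical data: two blobs near 0 and 3, plus one outlier at 10.
values = [0.1, 0.2, 2.9, 3.1, 10.0]
blobs = [(0.0, 0.25), (3.0, 0.25)]  # (mean, variance) per blob
post = e_step(values, blobs, weights=[0.45, 0.45],
              background_density=0.05, background_weight=0.1)
```

Each row of `post` sums to one; the outlier ends up mostly in the uniform background class, which is the role the slides assign to that class.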

12
Maximization step
  • Seek to maximize the expected log-likelihood.
    This is equivalent to minimizing the following
  • Minimizing (8) subject to the constraint
    Σ_k w_k = 1 is equivalent to assigning
  • Minimizing (9) is equivalent to computing the
    weighted means and covariances for the support
    layer
  • Minimizing (10) is done by extending the
    Lucas-Kanade motion estimation described in "Good
    Features to Track" by Shi and Tomasi, CVPR 1994
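The weighted means and covariances mentioned for (9) can be sketched as follows, assuming 2-D pixel coordinates and one posterior weight per pixel taken from the expectation step. The function name and the toy data are hypothetical.

```python
def weighted_moments(points, weights):
    """Weighted mean and 2x2 covariance of 2-D points.

    points  -- list of (x, y) pixel coordinates
    weights -- posterior membership of each pixel in one blob
    """
    total = sum(weights)
    mx = sum(w * x for w, (x, _) in zip(weights, points)) / total
    my = sum(w * y for w, (_, y) in zip(weights, points)) / total
    cxx = sum(w * (x - mx) ** 2 for w, (x, _) in zip(weights, points)) / total
    cyy = sum(w * (y - my) ** 2 for w, (_, y) in zip(weights, points)) / total
    cxy = sum(w * (x - mx) * (y - my) for w, (x, y) in zip(weights, points)) / total
    return (mx, my), ((cxx, cxy), (cxy, cyy))

# Four corner pixels of a square, all fully assigned to the blob.
corners = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
mean, cov = weighted_moments(corners, [1.0, 1.0, 1.0, 1.0])
# Same pixels, but only the bottom edge supports the blob.
mean_edge, cov_edge = weighted_moments(corners, [1.0, 1.0, 0.0, 0.0])
```

Pixels with a low posterior contribute little to the center of mass and second moments, so the blob's spatial model follows its support layer.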

13
A side note
  • Black ink: high probability
  • Support map has high probability for the motion
    model at regions with high gradients as they can
    be uniquely matched to specific motion models. At
    non-textured regions, equal probability is
    assigned to several motion models.
  • This approach can be viewed as an edge-based
    tracker at regions with high edge gradients, and
    a region-based tracker at regions with high
    texture.

14
Considering Past Estimates
  • Since EM converges only to a local maximum, it is
    important to initialize the starting point
    intelligently.
  • Given past estimates of the blob parameters
    θ(t-1), a Kalman filter is used to predict the
    mean and covariance of θ(t). θ(t) is the state
    space of the filter.
  • The EM starting point is the predicted Kalman
    state.
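The prediction step of a Kalman filter can be sketched as below. This is only an illustration under stated assumptions: the state is reduced to a single blob coordinate and its velocity, and the constant-velocity transition matrix F and process noise Q are hypothetical choices, not values from the slides.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    """Transpose a matrix given as a list of rows."""
    return [list(row) for row in zip(*a)]

def kalman_predict(mean, cov, F, Q):
    """Predict the next state mean and covariance: F m and F P F^T + Q."""
    pred_mean = [sum(F[i][j] * mean[j] for j in range(len(mean)))
                 for i in range(len(F))]
    fpft = mat_mul(mat_mul(F, cov), transpose(F))
    pred_cov = [[fpft[i][j] + Q[i][j] for j in range(len(Q))]
                for i in range(len(Q))]
    return pred_mean, pred_cov

# Assumed state: [position, velocity] of one blob coordinate, time step 1.
F = [[1.0, 1.0], [0.0, 1.0]]   # constant-velocity transition (assumption)
Q = [[0.1, 0.0], [0.0, 0.1]]   # process noise (assumption)
pred_mean, pred_cov = kalman_predict([10.0, 2.0], [[1.0, 0.0], [0.0, 1.0]], F, Q)
```

The predicted mean would serve as the EM starting point for the next frame, and the predicted covariance indicates how far the new estimate is expected to deviate from it.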

15
3rd and 4th Levels
  • Each dynamical model corresponds to the emission
    probability of a state of a hidden Markov model.
  • Temporal sequences of blob tracks are grouped
    into linear stochastic dynamical models.
16
Classification of Blobs into Dynamical Systems
  • This is similar to the lower levels, where we
    introduced the hidden variables S_k(t,x,y)
    indicating the probability that a pixel belongs
    to blob k.
  • We now introduce the variable D_m(t,k), which
    groups a sequence of blobs θ_k(t), θ_k(t-1), ...,
    θ_k(t-d) to a dynamical system m.
  • In order to do so, we assume the following
    discrete 2nd-order stochastic dynamical system
    (moveme)
  • The state variable Q(t) is the motion estimate
    of the specific blob θ_k(t),
  • w is system noise, and
  • C_m = B_m B_m^T is the system covariance
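The system equation itself is missing from the transcript. A generic discrete second-order stochastic linear system that matches the bullets above would read as follows; the names A_m^0 and A_m^1 for the two transition matrices and the unit-covariance assumption on w are assumptions made for this sketch.

```latex
% Assumed generic form of a discrete 2nd-order stochastic linear system
% for dynamical model m.
Q(t) = A_m^{0}\, Q(t-1) + A_m^{1}\, Q(t-2) + B_m\, w,
\qquad w \sim \mathcal{N}(0, I),
\qquad C_m = B_m B_m^{\mathsf{T}}
```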

17
Classification of Complex Gestures
  • Hidden Markov Models are used to represent
    complex gestures composed of simple dynamical
    systems. The state of the HMM corresponds to the
    validity of the dynamical system. The emission
    probabilities are represented by the stochastic
    dynamical system.
  • We want to compute the globally best segmentation
    across time. This is done using dynamic
    programming.
  • Estimate the likelihood that HMM_i fits a track
  • P(D_m(t,k)) is the probability that dynamical
    system m fits blob k at time t. tr_{n,m} is the
    HMM transition probability between states n and m.
  • Compare across all the complex category models
    HMM_i and an outlier model HMM_0, then classify.
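The dynamic programming step can be sketched with a standard log-domain Viterbi recursion. This is a generic illustration, not the paper's exact recursion: the emission values stand in for how well each dynamical system fits the track at each time step, and all numbers are hypothetical.

```python
import math

def viterbi(log_emission, log_trans, log_init):
    """Best state sequence and its log score.

    log_emission[t][m] -- log fit of dynamical system m at time t
    log_trans[n][m]    -- log transition probability from state n to m
    log_init[m]        -- log prior of starting in state m
    """
    num_states = len(log_init)
    score = [log_init[m] + log_emission[0][m] for m in range(num_states)]
    back = []
    for t in range(1, len(log_emission)):
        new_score, pointers = [], []
        for m in range(num_states):
            # Best predecessor state for ending in state m at time t.
            best_n = max(range(num_states), key=lambda n: score[n] + log_trans[n][m])
            new_score.append(score[best_n] + log_trans[best_n][m] + log_emission[t][m])
            pointers.append(best_n)
        score = new_score
        back.append(pointers)
    last = max(range(num_states), key=lambda m: score[m])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return score[last], path

# Hypothetical 2-state cyclic model (e.g. ground support vs. swing).
emission = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8], [0.9, 0.1]]
trans = [[0.7, 0.3], [0.3, 0.7]]
log_e = [[math.log(p) for p in row] for row in emission]
log_tr = [[math.log(p) for p in row] for row in trans]
score, path = viterbi(log_e, log_tr, [math.log(0.5), math.log(0.5)])
```

Running the same recursion once per gait model and once for the outlier model, then taking the model with the highest score, corresponds to the comparison in the last bullet.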

18
Hybrid Dynamical Models
  • We need to estimate the system parameters of each
    dynamical model
  • and the entries tr_{m,n} of the HMM transition
    probability matrix.
  • However, if we are only given the motion
    trajectories Q(1), Q(2), ..., Q(T), we do not
    know the partition into subsequences. If we knew
    the partition, calculating the system parameters
    would be easy.
  • Proceed in an EM manner by maximizing the log
    likelihood of a set of M dynamical systems and
    the corresponding HMM

19
Expectation step
  • Estimation of the partition of the training set.
  • Find the probability D_m(t) that training example
    Q(t) was generated by dynamical system m.
  • This is computed with dynamic programming, with
    linear complexity.

20
Maximization step
  • Seek to maximize the following expected
    log-likelihood for each model m
  • This is done by solving a linear equation.
  • New estimate of the HMM transition probability is
    also computed with EM.
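The linear solve in the maximization step can be illustrated for the simplest case. The sketch assumes a scalar state, so the second-order model becomes q(t) = a0 q(t-1) + a1 q(t-2) + noise, and it weights each time step by the membership D_m(t) from the expectation step. The function name and test sequence are hypothetical.

```python
def fit_second_order(q, weights):
    """Weighted least-squares fit of q(t) = a0*q(t-1) + a1*q(t-2).

    weights[t] plays the role of D_m(t): how strongly time step t
    is assigned to this dynamical model.
    """
    s11 = s12 = s22 = b1 = b2 = 0.0
    for t in range(2, len(q)):
        w, x1, x2, y = weights[t], q[t - 1], q[t - 2], q[t]
        s11 += w * x1 * x1
        s12 += w * x1 * x2
        s22 += w * x2 * x2
        b1 += w * x1 * y
        b2 += w * x2 * y
    # Solve the 2x2 normal equations with Cramer's rule.
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("normal equations are singular")
    a0 = (b1 * s22 - b2 * s12) / det
    a1 = (s11 * b2 - s12 * b1) / det
    return a0, a1

# Noise-free sequence generated with a0 = 1.6, a1 = -0.8 (damped oscillation).
q = [1.0, 0.5]
for _ in range(20):
    q.append(1.6 * q[-1] - 0.8 * q[-2])
a0, a1 = fit_second_order(q, [1.0] * len(q))
```

With vector-valued states the same normal equations apply with matrices in place of scalars; the scalar case is used here only to keep the sketch short.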

21
Experiments: Training and Validation of Gait
Models
  • 33 sequences of 5 subjects
  • Running, Walking, Skipping
  • Sequences start at different phases
  • 4 dynamical models per gait
  • Uniform initial partition, where each model is
    assigned 1/4 of the sequence

22
Experiments: Recognizing Gaits
  • Apply the learned dynamical models and HMM on
    unseen data
  • Outlier model with 1 state and constant velocity
    dynamical model
  • Highest likelihood is the final gait
    classification

23
Experiments: Recognizing Gaits
24
Conclusion
  • The approach decomposes the domain and
    incorporates different levels of abstraction
    using mixture models, EM, recursive Kalman and
    Markov estimation.
  • How much data is needed to build the recognizer?
  • How much computational time is required?