Title: Continuous-state Graphical Models for Object Localization, Pose Estimation and Tracking
1. Continuous-state Graphical Models for Object Localization, Pose Estimation and Tracking
Leonid Sigal
Department of Computer Science, Brown University
http://www.cs.brown.edu/people/ls/
2. Big picture
- Computer Vision: build tools that allow computers to reason about the world based on visual inputs (i.e., build computational models of the world)
- Build models of objects that allow us to reason about the position, configuration and interactions between these objects (e.g., cars, buildings, people)
3. Pose Estimation
Find the pose of the body
such that the model explains the image data.
4. Tracking
- Estimate the pose at every time instance
- Tracking can simply be pose estimation at every frame, but this is very costly and ambiguous
- Using temporal information is useful for:
  - localizing the body
  - resolving pose ambiguities
5. Applications
- Navigation: automated vehicle navigation, obstacle avoidance, robotics
- Human-Computer Interaction: smart homes
- Entertainment: animation, games
- Clinical: rehabilitation medicine
- Security: surveillance
- Understanding: gesture/activity recognition
Matrix Trilogy, Warner Bros. Studios
Smart Rooms Project, MIT
SmartTer Project, EPFL
6. Don't believe me?
Within five years "you could use gesture recognition to get rid of the remote control," body tracking (the whole body, not just the hands) could drive demand for Intel's important new generation of semiconductors.
Justin Rattner, Intel CTO
7. Why is it hard?
- Appearance/size/shape of people can vary
- Occlusions
- High dimensionality
- Loss of depth information
- Loose clothing
- Motion blur
8. Approach
- Break up a very hard problem into smaller, manageable pieces
- Use continuous-state graphical models to model the person
Statistical Graphical Models: 1,220,000
Sigal, Black, AMDO06
Fashion Models: 27,600,000
9. Contributions
- Define a new and very rich model for reasoning about people and their pose (the loose-limbed body model)
- Introduce a hierarchical strategy for 3D reasoning
10. Related Work: Generative Approaches
- Local stochastic search (top-down)
- Part-based approaches (bottom-up)
Felzenszwalb, Huttenlocher, 00
Ramanan, Forsyth, Zisserman, 05
11. Loose-limbed Body Model: Preview
12. Graphical Model (Toy Example)
- Encodes conditional independence between random variables
[Figure: toy graphical model; example label probabilities 0.4, 0.3, 0.0, 0.1, 0.2]
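13. Inference in a Graphical Model
- Goal: find the most likely values for all unknown variables (X1, X2, X3)
- Brute-force algorithm: evaluate the joint probability of every assignment; by the Hammersley-Clifford theorem it factors into clique potentials, grouped here into a prior term and a likelihood term:
  $p(X_1 = \text{bucket}, X_2 = \text{bucket}, X_3 = \text{bucket})$, $p(X_1 = \text{bucket}, X_2 = \text{bucket}, X_3 = \text{tree trunk})$, ..., $p(X_1 = \text{face}, X_2 = \text{thigh}, X_3 = \text{calf})$, ..., $p(X_1 = \text{calf}, X_2 = \text{calf}, X_3 = \text{calf})$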
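The brute-force procedure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the per-variable likelihood table `LIK` and the prior (neighbouring variables should be different parts) are hypothetical numbers, not values from the talk. The point is the cost: with 3 variables and 5 labels, all 5^3 = 125 joint assignments are scored.

```python
import itertools

LABELS = ["bucket", "tree trunk", "face", "thigh", "calf"]

# Hypothetical local evidence p(image | Xi = label) for X1, X2, X3.
LIK = [
    {"bucket": 0.1, "tree trunk": 0.1, "face": 0.9, "thigh": 0.2, "calf": 0.2},
    {"bucket": 0.1, "tree trunk": 0.2, "face": 0.3, "thigh": 0.6, "calf": 0.5},
    {"bucket": 0.2, "tree trunk": 0.1, "face": 0.1, "thigh": 0.4, "calf": 0.7},
]


def toy_score(assignment):
    """Unnormalized p(X1, X2, X3 | image) = prior * likelihood (hypothetical)."""
    prior = 1.0
    for a, b in zip(assignment, assignment[1:]):
        prior *= 0.1 if a == b else 1.0  # neighbours rarely share a label
    likelihood = 1.0
    for i, label in enumerate(assignment):
        likelihood *= LIK[i][label]
    return prior * likelihood


def brute_force_map(score, n_vars, labels):
    """Score every joint assignment: M ** N evaluations."""
    best, best_score, n_evals = None, float("-inf"), 0
    for assignment in itertools.product(labels, repeat=n_vars):
        n_evals += 1
        s = score(assignment)
        if s > best_score:
            best, best_score = assignment, s
    return best, n_evals


best, n_evals = brute_force_map(toy_score, 3, LABELS)
print(best, n_evals)  # ('face', 'thigh', 'calf') 125
```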
14. Belief Propagation
- What is BP?
  - An efficient algorithm for doing inference in graphical models
- For example, on a tree with N variables, each taking M values:
  - Brute-force algorithm: $O(M^N)$
  - BP: $O(NM^2)$
- In this simple example:
  - Brute force: $5^3 = 125$
  - BP: $3 \times 5^2 = 75$
- Real life: $10^{28}$ times faster
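On a chain (the simplest tree), BP with max-product messages finds the same most likely assignment as brute force while only visiting (N - 1) × M^2 state pairs; on a chain it coincides with the Viterbi algorithm. The sketch below assumes random, hypothetical potentials and checks BP against exhaustive search.

```python
import itertools
import random


def chain_map_bp(unary, pairwise):
    """Max-product BP on a chain X1 - X2 - ... - XN: O(N M^2).

    unary[i][s]    : local evidence for variable i in state s
    pairwise[s][t] : compatibility of neighbouring states (s, t)
    """
    n, m = len(unary), len(unary[0])
    msg = list(unary[0])  # best prefix score ending in each state
    back = []             # argmax pointers for backtracking
    for i in range(1, n):
        new_msg, ptr = [], []
        for t in range(m):
            scores = [msg[s] * pairwise[s][t] for s in range(m)]
            s_best = max(range(m), key=scores.__getitem__)
            new_msg.append(scores[s_best] * unary[i][t])
            ptr.append(s_best)
        msg = new_msg
        back.append(ptr)
    # Backtrack from the best state of the last variable.
    state = max(range(m), key=msg.__getitem__)
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return tuple(reversed(path))


def chain_map_brute_force(unary, pairwise):
    """Enumerate all M^N assignments."""
    n, m = len(unary), len(unary[0])

    def score(x):
        p = 1.0
        for i, s in enumerate(x):
            p *= unary[i][s]
        for s, t in zip(x, x[1:]):
            p *= pairwise[s][t]
        return p

    return max(itertools.product(range(m), repeat=n), key=score)


rng = random.Random(0)
unary = [[rng.random() for _ in range(5)] for _ in range(3)]  # N = 3, M = 5
pairwise = [[rng.random() for _ in range(5)] for _ in range(5)]
print(chain_map_bp(unary, pairwise) == chain_map_brute_force(unary, pairwise))
```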
15. Step 1 of 2: Message Propagation
- X1 → X2: "I am sure I am a face, so I think you should be some other part of the body"
- $P(X_2 = \text{bucket}) = 0.0$, $P(X_2 = \text{tree trunk}) = 0.0$, $P(X_2 = \text{face}) = 0.0$, $P(X_2 = \text{thigh}) = 0.6$, $P(X_2 = \text{calf}) = 0.4$
16. Step 2 of 2: Belief Estimation
- Merge:
  - information from neighbors (messages)
  - local information (likelihood)
- Result: a distribution (belief) over X2
- The most likely value for X2
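Belief estimation multiplies local evidence by all incoming messages and normalizes. The sketch below reuses the X1 → X2 message from the previous step; the local likelihood `local_x2` is a hypothetical example chosen so that local evidence changes the answer (the message alone favours "thigh").

```python
LABELS = ["bucket", "tree trunk", "face", "thigh", "calf"]


def belief(local, incoming_messages):
    """Belief(X) proportional to local evidence times the product of incoming messages."""
    b = list(local)
    for msg in incoming_messages:
        b = [bi * mi for bi, mi in zip(b, msg)]
    z = sum(b)
    return [bi / z for bi in b]


msg_from_x1 = [0.0, 0.0, 0.0, 0.6, 0.4]  # X1 -> X2 message from step 1
local_x2 = [0.1, 0.1, 0.1, 0.2, 0.5]     # hypothetical local likelihood for X2

b_x2 = belief(local_x2, [msg_from_x1])
most_likely = LABELS[max(range(len(b_x2)), key=b_x2.__getitem__)]
print(b_x2, most_likely)  # thigh ~0.375, calf ~0.625 -> calf
```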
17. Loopy Graphical Models
- Belief Propagation can be used to get an approximate solution
- Exact solution is intractable
- Inference on a graph:
  - BP is O(NMC)
[Figure: loopy graph over X1, ..., X5 with image observations I1, ..., I5]
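On a graph with loops the same message updates can simply be iterated ("loopy BP"); the resulting beliefs are approximate. This is a minimal sum-product sketch, assuming discrete states, a small hypothetical 3-node cycle and hypothetical attractive potentials; it is not the continuous-state inference used later in the talk.

```python
def loopy_bp(n_states, unary, edges, pairwise, n_iters=50):
    """Sum-product loopy BP on a pairwise MRF (approximate when the graph has loops).

    unary[i]         : local evidence for node i (one value per state)
    edges            : undirected edges (i, j)
    pairwise[(i, j)] : matrix psi[s_i][s_j] for edge (i, j)
    """
    nbrs = {i: [] for i in range(len(unary))}
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)

    def psi(i, j, si, sj):
        if (i, j) in pairwise:
            return pairwise[(i, j)][si][sj]
        return pairwise[(j, i)][sj][si]

    # msgs[(i, j)][s_j]: message from node i to node j, initialised uniform.
    msgs = {(i, j): [1.0 / n_states] * n_states for i in nbrs for j in nbrs[i]}
    for _ in range(n_iters):
        new = {}
        for (i, j) in msgs:
            out = []
            for sj in range(n_states):
                total = 0.0
                for si in range(n_states):
                    prod = unary[i][si] * psi(i, j, si, sj)
                    for k in nbrs[i]:
                        if k != j:
                            prod *= msgs[(k, i)][si]
                    total += prod
                out.append(total)
            z = sum(out)
            new[(i, j)] = [v / z for v in out]
        msgs = new  # synchronous (parallel) update

    beliefs = []
    for i in range(len(unary)):
        b = list(unary[i])
        for k in nbrs[i]:
            b = [bv * mv for bv, mv in zip(b, msgs[(k, i)])]
        z = sum(b)
        beliefs.append([bv / z for bv in b])
    return beliefs


attract = [[0.8, 0.2], [0.2, 0.8]]  # hypothetical: neighbours prefer equal states
edges = [(0, 1), (1, 2), (2, 0)]    # a single loop
beliefs = loopy_bp(2, [[0.7, 0.3], [0.5, 0.5], [0.5, 0.5]], edges,
                   {e: attract for e in edges})
print(beliefs)
```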
18. Generative Approaches
- Local stochastic search (top-down)
- Part-based approaches (bottom-up)
Felzenszwalb, Huttenlocher, 00
Ramanan, Forsyth, Zisserman, 05
19. Tree-structured Body Model
[Figure: tree-structured (kinematic) body model over parts X1, ..., X10, each with an image observation I; each part state Xi takes one of M discrete values s1, s2, ..., sM]
Felzenszwalb, Huttenlocher, 00
20. State-space
Felzenszwalb, Huttenlocher, 00
22. State-space
This example: 13 × 10 positions × 8 rotations × 5 scales = 5,200 states
Felzenszwalb, Huttenlocher, 00
23. State-space
Real case: 100 × 100 positions × 20 rotations × 5 scales = 1,000,000 states
Felzenszwalb, Huttenlocher, 00
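A quick check of the state-space arithmetic. It assumes, consistent with the O(NM^2) BP cost given earlier, that one message between two parts has to consider every pair of their states, which is why the real case is so expensive.

```python
def state_space_size(nx, ny, rotations, scales):
    """Number of discrete states M for one body part."""
    return nx * ny * rotations * scales


for name, dims in [("this example", (13, 10, 8, 5)), ("real case", (100, 100, 20, 5))]:
    m = state_space_size(*dims)
    # One BP message compares every state of a part with every state of its neighbour.
    print(f"{name}: M = {m:,}, pairs per message = {m * m:,}")
# this example: M = 5,200, pairs per message = 27,040,000
# real case: M = 1,000,000, pairs per message = 1,000,000,000,000
```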
24. Tree-structured Body Model
[Figure: tree-structured (kinematic) body model over parts X1, ..., X10; each part state Xi takes one of M discrete values s1, s2, ..., sM]
Felzenszwalb, Huttenlocher, 00
25. Inference in Tree-structured Model
[Figure: posterior factored into prior and likelihood terms; neighboring parts Xi and Xj]
- Prior:
  - Body parts are connected at joints
  - It models the relative positions of Xi and Xj
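The prior over relative positions can be made concrete with a hypothetical pairwise potential. In this sketch (an assumption, not the model used in the talk) each part state is only a 2D position, and Xj is expected at a fixed `offset` from Xi with Gaussian tolerance `sigma`; rotation and scale are ignored.

```python
import math


def joint_prior(xi, xj, offset, sigma):
    """Hypothetical psi(Xi, Xj): high when Xj sits near Xi + offset (a 'joint')."""
    dx = xj[0] - (xi[0] + offset[0])
    dy = xj[1] - (xi[1] + offset[1])
    return math.exp(-(dx * dx + dy * dy) / (2.0 * sigma ** 2))


torso = (100.0, 100.0)
offset = (30.0, -10.0)  # hypothetical expected shoulder-to-arm displacement (pixels)
print(joint_prior(torso, (130.0, 90.0), offset, 5.0))  # 1.0: exactly at the joint
print(joint_prior(torso, (140.0, 90.0), offset, 5.0))  # smaller: 10 px away
```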
26. Inference in Tree-structured Model
[Figure: posterior factored into prior and likelihood terms]
28. Inference in Tree-structured Model
[Figure: posterior factored into prior and likelihood terms]
- Likelihood: image features such as color and edges
29. Inference in Tree-structured Model
[Figure: posterior factored into prior and likelihood terms]
- Inference in this model can be done exactly using standard Belief Propagation (BP): message passing, followed by belief computation
30. Tree-structured model limitations
[Figure: posterior factored into prior and likelihood terms]
- To make inference tractable, dynamic programming is used, which:
  - only works for a tree-structured model
  - requires a relatively coarse discretization
  - requires a very simple form of the prior, which reduces the cost from $O(M^2 N)$ to $O(MN)$
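Why a simple prior helps: Felzenszwalb and Huttenlocher use generalized distance transforms, so that a min-sum message under a quadratic deformation cost takes O(M) rather than O(M^2) time. The sketch below is the standard 1D version for a unit-weight squared distance over integer states; treating a part's state as one-dimensional is a simplifying assumption.

```python
import random


def distance_transform_1d(f):
    """d[p] = min_q (f[q] + (p - q) ** 2) for every p, in O(M) time.

    Lower envelope of parabolas rooted at each q; f must contain finite values.
    A naive min-sum message would need the O(M^2) double loop instead.
    """
    n = len(f)
    v = [0] * n          # roots of the parabolas kept in the lower envelope
    z = [0.0] * (n + 1)  # boundaries between consecutive envelope parabolas
    k = 0
    z[0], z[1] = float("-inf"), float("inf")
    for q in range(1, n):
        # Intersection of the parabola rooted at q with the rightmost kept one.
        s = ((f[q] + q * q) - (f[v[k]] + v[k] * v[k])) / (2 * q - 2 * v[k])
        while s <= z[k]:
            k -= 1
            s = ((f[q] + q * q) - (f[v[k]] + v[k] * v[k])) / (2 * q - 2 * v[k])
        k += 1
        v[k] = q
        z[k] = s
        z[k + 1] = float("inf")
    d = [0.0] * n
    k = 0
    for p in range(n):
        while z[k + 1] < p:
            k += 1
        d[p] = (p - v[k]) ** 2 + f[v[k]]
    return d


rng = random.Random(1)
f = [rng.randint(0, 50) for _ in range(12)]
naive = [min(f[q] + (p - q) ** 2 for q in range(len(f))) for p in range(len(f))]
print(distance_transform_1d(f) == naive)
```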
31. Comparison
32. Loose-limbed Body Model
[Figure: loose-limbed body model over parts X1, ..., X10 with kinematic constraints; each part state is continuous, Xi ∈ R^5, rather than one of a discrete set s1, ..., sM]
Sigal, Isard, Sigelman, Black, NIPS03
33. Inference in Loose-Limbed Model
[Figure: posterior factored into prior and likelihood terms]
34. Inference in Loose-Limbed Model
[Figure: posterior factored into prior and likelihood terms; message passing and belief]
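35. Inference in Loose-Limbed Model
[Figure: posterior factored into prior and likelihood terms; message passing and belief]
- Inference in this model could in principle be done using standard Belief Propagation (BP)
- However, the integration in the message updates cannot be done analytically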
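For continuous states, a message is an integral, $m_{ij}(x_j) = \int \psi_{ij}(x_i, x_j)\, \phi_i(x_i) \prod_{k \ne j} m_{ki}(x_i)\, dx_i$, which can be approximated with samples. The sketch assumes samples of $x_i$ from the normalized product $\phi_i \prod_k m_{ki}$ are already available (here simply drawn from a standard normal) and a hypothetical Gaussian $\psi_{ij}$ in 1D; the message then becomes a kernel mixture. It only shows the Monte Carlo idea behind PAMPAS and NBP, not either algorithm (which must also sample from products of mixtures).

```python
import math
import random


def gaussian_pdf(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def particle_message(samples_xi, offset, std):
    """Monte Carlo message: mixture with one Gaussian kernel per sample of x_i.

    Hypothetical conditional psi(x_j | x_i) = N(x_j; x_i + offset, std^2).
    """
    n = len(samples_xi)

    def message(xj):
        return sum(gaussian_pdf(xj, xi + offset, std) for xi in samples_xi) / n

    return message


rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) for _ in range(2000)]
m = particle_message(samples, offset=1.0, std=0.5)
# In this all-Gaussian toy case the exact message is N(1, 1 + 0.5^2), for comparison.
print(m(1.0), gaussian_pdf(1.0, 1.0, math.sqrt(1.25)))
```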
36. Inference in Loose-Limbed Model
- In tree-structured graphical models, exact inference can be computed using BP
- But not in our case, where:
  - variables are continuous
  - likelihoods (or priors) are not Gaussian
  - the graph contains loops
- This forces the use of approximate BP inference algorithms:
  - PAMPAS (M. Isard, 03)
  - Non-Parametric BP (E. Sudderth, A. Ihler, W. Freeman, A. Willsky, 03)
[Figure: loopy loose-limbed body graph over parts X1, ..., X10]
37. Loose-limbed Body Model
[Figure: loose-limbed body model over parts X1, ..., X10 with kinematic constraints; each part state Xi ∈ R^5]
Sigal, Isard, Sigelman, Black, NIPS03
38. How good are tree-structured models?
- The model always prefers the undesired hypothesis
39. Tree-structured approaches
- Assume the likelihood factors over parts: $F(X) = \prod_i F_i(X_i)$
- This assumption breaks down when parts can occlude each other
40. Occlusion-sensitive Likelihoods
- We introduce explicit occlusion modeling into the likelihood function in the form of hidden per-pixel binary (visibility) variables
- This ensures that the likelihood can still be factored into per-part terms $F_i$, even in the presence of occlusions
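One way to picture a per-pixel visibility variable is as a switch on each pixel's contribution to a part's likelihood. The sketch below is a hypothetical simplification, not the formulation of the CVPR06 model: `pixel_ratios` are assumed foreground/background likelihood ratios for the pixels a part covers, and occluded pixels contribute log(1) = 0, so they neither support nor penalise the hypothesis.

```python
import math


def part_log_likelihood(pixel_ratios, visible):
    """Hypothetical occlusion-sensitive log-likelihood for one part.

    pixel_ratios[k] : p(feature_k | part) / p(feature_k | background), k-th covered pixel
    visible[k]      : binary visibility variable (1 = visible, 0 = occluded by a
                      part in front); occluded pixels are skipped (ratio treated as 1)
    """
    return sum(math.log(r) for r, v in zip(pixel_ratios, visible) if v)


ratios = [2.0, 2.0, 0.25]
print(part_log_likelihood(ratios, [1, 1, 1]))  # all pixels count
print(part_log_likelihood(ratios, [1, 1, 0]))  # third pixel occluded: ignored
print(part_log_likelihood(ratios, [0, 0, 0]))  # fully occluded part: 0
```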
41. Occlusion-sensitive Likelihoods (for the torso)
- The torso term $F_1$ depends on per-pixel visibility variables Vi, which answer:
  - Is there any part in front of the torso?
  - Is there any part behind the torso?
42. Occlusion-sensitive model
- The model always prefers the true hypothesis
43. Occlusion-sensitive Loose-limbed Body Model
[Figure: loose-limbed body model over parts X1, ..., X10 with Xi ∈ R^5, kinematic constraints, and additional occlusion constraints]
Sigal, Black, CVPR06
44. 2D Pose Estimation
[Figure: most likely sample and distribution at frames 2, 24 and 49 for Loose-limbed (No Occlusions), Pictorial Structures and Loose-limbed (Occlusion-sensitive)]
46. 2D Pose Estimation
47. Quantitative Evaluation
- Synchronized marker-based motion capture and multiocular video dataset
- Currently downloaded by more than 60 groups around the world
- Workshops at NIPS06 and CVPR07
48. Quantitative Comparison
- All algorithms were run on the same data
[Table: comparison of algorithms in subject-specific and motion-specific settings]
(*) Beyond Trees: Common-Factor Models for 2D Human Pose Recovery, Lan and Huttenlocher, ICCV 2005.
49. Inferring 2D pose
- The occlusion-sensitive loose-limbed body model allows us to infer the 2D pose reliably (at about 50% overhead)
  - even when motions are complex
[Figure: moving-camera sequence]
50. Summary so far
Occlusion-sensitive Loose-limbed body model
51. Hierarchical Graphical Model Structure
[Figure: three-layer hierarchy: 3D pose, 2D pose, image]
54. Inferring 3D pose from 2D pose
- We obtain estimates of the 2D joint positions automatically
- We learn a direct probabilistic mapping from 2D pose to 3D pose
55. Inferring 3D pose from 2D pose
Mixture of Experts (MoE)
Sminchisescu et al., 05
Waterhouse et al., 96
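56. Inferring 3D pose from 2D pose
- We want to estimate a distribution/mapping $p(\text{3D pose} \mid \text{2D pose}) = p(Y \mid X)$, where $X \in \mathbb{R}^n$ is the 2D pose and $Y \in \mathbb{R}^m$ is the 3D pose
- Problem: $p(Y \mid X)$ is a non-linear mapping, and it is not one-to-one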
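57. Mixture of Experts (MoE)
- We want to estimate a distribution/mapping $p(\text{3D pose} \mid \text{2D pose}) = p(Y \mid X)$, where $X \in \mathbb{R}^n$ is the 2D pose and $Y \in \mathbb{R}^m$ is the 3D pose
- Solution: $p(Y \mid X)$ may be approximated by a set of locally linear mappings (experts)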
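A common form of such a model is $p(Y \mid X) = \sum_k g_k(X)\, \mathcal{N}(Y; A_k X + b_k, \Sigma_k)$, with softmax gates $g_k$. The sketch below assumes scalar X and Y and hypothetical, hand-set parameters (nothing is learned); two experts with opposite slopes give a bimodal $p(Y \mid X)$, i.e. the one-to-many case.

```python
import math


def softmax(scores):
    m = max(scores)
    e = [math.exp(s - m) for s in scores]
    z = sum(e)
    return [v / z for v in e]


def moe_density(y, x, experts):
    """p(y | x) = sum_k g_k(x) N(y; a_k x + b_k, s_k^2) for scalar x, y.

    Each expert is a dict of hypothetical parameters:
      'w', 'c' : gate score w * x + c (softmax across experts)
      'a', 'b' : linear expert mean a * x + b
      's'      : expert standard deviation
    """
    gates = softmax([e["w"] * x + e["c"] for e in experts])
    dens = 0.0
    for g, e in zip(gates, experts):
        mean = e["a"] * x + e["b"]
        dens += g * math.exp(-0.5 * ((y - mean) / e["s"]) ** 2) / (e["s"] * math.sqrt(2 * math.pi))
    return dens


experts = [
    {"w": 0.0, "c": 0.0, "a": 1.0, "b": 0.0, "s": 0.1},   # hypothetical expert 1
    {"w": 0.0, "c": 0.0, "a": -1.0, "b": 0.0, "s": 0.1},  # hypothetical expert 2
]
# Same 2D input x = 1, two plausible outputs y = +1 and y = -1.
print(moe_density(1.0, 1.0, experts), moe_density(-1.0, 1.0, experts), moe_density(0.0, 1.0, experts))
```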
58. How well does the MoE model work?
- View only: 22 mm
- Pose only: 59 mm
- Overall: 64 mm
59. Hierarchical 3D Pose Estimation from Single-View Monocular Images
[Figure: image, 2D pose estimation and 3D pose estimation (most likely sample and distribution) at frames 10, 20 and 50]
60. Hierarchical 3D Pose Estimation from Single-View Monocular Images
[Figure: 2D pose estimation and 3D pose estimation results]
61. Summary so far
Hidden Markov Model (HMM)
62. Hierarchical Graphical Model Structure
[Figure: hierarchy (3D, 2D, image) replicated over time steps t-1, t and t+1]
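Linking the hierarchy over time turns it into a hidden-Markov-style model, and temporal information can then resolve ambiguous frames. As a stand-in for the much richer continuous model, the sketch below runs the forward (filtering) recursion of a discrete HMM with two hypothetical pose states; the prior, transition and likelihood numbers are illustrative assumptions.

```python
def hmm_forward_filter(prior, transition, likelihoods):
    """Filtered posteriors p(x_k | image_1..k) of a discrete HMM.

    prior[s]          : p(x_1 = s)
    transition[s][t]  : p(x_k = t | x_{k-1} = s)
    likelihoods[k][s] : p(image_k | x_k = s)
    """
    n = len(prior)
    posts = []
    for k, lik in enumerate(likelihoods):
        if k == 0:
            pred = list(prior)
        else:
            # Predict: push the previous posterior through the dynamics.
            prev = posts[-1]
            pred = [sum(prev[s] * transition[s][t] for s in range(n)) for t in range(n)]
        # Update: weight the prediction by the current frame's likelihood.
        unnorm = [pred[s] * lik[s] for s in range(n)]
        z = sum(unnorm)
        posts.append([u / z for u in unnorm])
    return posts


# Frame 2 is ambiguous on its own (likelihood 0.5 / 0.5); tracking still prefers state 0.
posts = hmm_forward_filter([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [[0.8, 0.2], [0.5, 0.5]])
print(posts)  # approximately [[0.8, 0.2], [0.74, 0.26]]
```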
63. Benefits of tracking
[Figure: frames 49 and 50; 2D pose estimation compared with 3D pose estimation with tracking]
64. Other Applications: Multiocular Imagery
Sigal, Bhatia, Roth, Black, Isard, CVPR04
65. Other Applications: Vehicle Detection and Tracking
Sigal, Zhu, Comaniciu, Black, IWCM04
66. Contributions
- Introduced the loose-limbed body model, which
  - can deal with continuous-state estimation
  - can encode a rich set of constraints (occlusions, penetrations, action-specific kinematics)
- Introduced a tractable inference approach for this model
- Used hierarchical representation and inference to manage the complexity of the problem
- Quantitative evaluation of human pose estimation
67. Future Work: Better inference methods
- Particle message passing does not deal well with multiple modes in the distribution
  - Mixture tracking
- Inference approaches are relatively slow
  - Hybrid Monte Carlo filtering
Vermaak, Doucet, Perez, CVPR03
Choo, Fleet, ICCV01
68. Future Work: Learning model structure
- Learning the model structure (useful for deriving motion-specific models)
  - Kernel Generalized Variance
Bach, Jordan, NIPS03
[Figure: learned structures for walking and stretching]
69. Future Work: Deeper Hierarchical Models
[Figure: hierarchy: 3D, 2D, image]
70. Future Work: Deeper Hierarchical Models
[Figure: hierarchy: 3D, 2D, features, image]
71. Future Work: Deeper Hierarchical Models
[Figure: hierarchy: scene, 3D, 2D, features, image; related ideas: natural language processing, visual grammars]
F. Han and S.-C. Zhu, ICCV05
72. Collaborators and Colleagues
- Michael J. Black
  - Alex Balan
  - Stefan Roth
  - Sidharth Bhatia
  - Ben Sigelman
- Michael Isard
- Horst Haussecker
- Trista Chen
- Konstantin Radyushkin
73. Thank you!