Ankur Agarwal and Bill Triggs - PowerPoint PPT Presentation

1
Learning to Reconstruct 3D Human Pose and Motion
from Silhouettes

Ankur Agarwal and Bill Triggs
LEAR
GRAVIR-CNRS-INRIA, Grenoble

Pattern Recognition and Machine Learning in
Computer Vision Workshop 05 May 2004
2
Goal

Recover 3D human body pose from image silhouettes
3D pose = joint angles
Use either individual images or video sequences
Applications
motion capture
human-computer interaction
action recognition
visual surveillance

3
2 Broad Classes of Approaches

Model based approaches
Presuppose an explicitly known parametric body
model
Inverting kinematics / Numerical optimization
Sub-case: model based tracking
Learning based approaches
Avoid accurate 3D modeling/rendering
e.g. Example based methods

4
Model Free Learning based Approach

Recovers 3D pose (joint angles) by direct
regression on robust silhouette descriptors
Sparse kernel-based regressor trained using human
motion capture data

Advantages
no need to build an explicit 3D model
easily adapted to different people / appearances
may be more robust than model based approach
Disadvantages
harder to interpret than an explicit model, and may
be less accurate

5
The Basic Idea

To learn a compact system that directly outputs
pose from an image
Represent the input (image) by a descriptor
vector z.
Write the multi-parameter output (pose) as a
vector x.
Learn a regressor
$x = F(z) + \epsilon$
Note this assumes a functional relationship
between z and x, which might not really be the
case.

6
Silhouette Descriptors
7
Why Use Silhouettes ?

Captures most of the available pose information
Can (often) be extracted from real images
Insensitive to colour, texture, clothing
No prior labeling (e.g. of limbs) required
Limitations
Artifacts like attached shadows are
common
Depth ordering / sidedness information
is lost

8
Ambiguities

Which arm / leg is forwards? Front or back
view?
Where is occluded arm? How much is knee
bent?
Silhouette-to-pose problem is inherently
multi-valued
Single-valued regressors sometimes behave
erratically

9
Shape Context Histograms

Need to capture silhouette shape but be robust
against occlusions/segmentation failures
Avoid global descriptors like moments
Use Shape Context Histograms: distributions of
local shape context responses

10
Shape Context Histograms Encode Locality

First 2 principal components of Shape Context
(SC) distribution from combined training data,
with k-means centres superimposed, and an SC
distribution from a single silhouette.
SCs implicitly encode position on the silhouette:
a form resembling an average over all human
silhouettes is discernible
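
The histogram construction described on the last two slides can be sketched as follows. This is a minimal illustration, assuming each local shape context is assigned to its single nearest k-means centre (hard voting); the slides do not state the voting rule. The function name, the array sizes and the random data are hypothetical placeholders.

```python
import numpy as np

def shape_context_histogram(local_descriptors, centres):
    """Vote each local descriptor into its nearest centre, then normalise.

    local_descriptors: (n_points, d) local shape contexts of one silhouette.
    centres:           (n_centres, d) k-means centres from training data.
    Returns a histogram of length n_centres that sums to 1.
    """
    # Squared Euclidean distances, shape (n_points, n_centres).
    diff = local_descriptors[:, None, :] - centres[None, :, :]
    d2 = (diff ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    hist = np.bincount(nearest, minlength=len(centres)).astype(float)
    return hist / hist.sum()

# Hypothetical sizes and random data, for illustration only.
rng = np.random.default_rng(0)
centres = rng.normal(size=(100, 60))
descriptors = rng.normal(size=(400, 60))
z = shape_context_histogram(descriptors, centres)
```

Because every edge point votes independently, losing part of the silhouette only removes some votes instead of corrupting the whole descriptor, which is the robustness argument made above.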

11
Nonlinear Regression
12
Regression Model

Predict output vector x (here 3D human pose),
given input vector z (here a shape context
histogram)
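$x = \sum_{k=1}^{p} a_k f_k(z) + \epsilon \equiv A f(z) + \epsilon$
$\{ f_k(z) \mid k = 1 \ldots p \}$: basis functions
$A \equiv (a_1 \; a_2 \; \cdots \; a_p)$
$f(z) = (f_1(z) \; f_2(z) \; \cdots \; f_p(z))^T$
Kernel bases $f_k = K(z, z_k)$ for given centre
points $z_k$ and kernel $K$.
e.g. $K(z, z_k) = \exp(-\beta \|z - z_k\|^2)$
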
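
A minimal sketch of evaluating this model with Gaussian kernel bases is given here. It assumes descriptors are stored as rows of a NumPy array and that the weight matrix A has one column per basis function; the function names are hypothetical.

```python
import numpy as np

def gaussian_kernel_basis(Z, centres, beta):
    """Return F with F[k, i] = exp(-beta * ||z_i - z_k||^2), shape (p, n).

    Z:       (n, d) input descriptors, one per row.
    centres: (p, d) kernel centre points z_k.
    """
    diff = centres[:, None, :] - Z[None, :, :]
    d2 = (diff ** 2).sum(axis=2)
    return np.exp(-beta * d2)

def predict(A, Z, centres, beta):
    """Evaluate x = A f(z) for every row z of Z; returns shape (n, m)."""
    return (A @ gaussian_kernel_basis(Z, centres, beta)).T
```

Each column of the returned basis matrix is the vector f(z) for one input, so prediction is a single matrix product.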
13
Regularized Least Squares

$A := \arg\min_A \{ \sum_{i=1}^{n} \|A f(z_i) - x_i\|^2 + R(A) \}$
$\;\; = \arg\min_A \{ \|A F - X\|^2 + R(A) \}$
$R(A)$: regularizer / penalty function to control
overfitting
Ridge Regression
$R(A) = \mathrm{trace}(A^T A)$

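
For the ridge penalty the minimiser has a closed form, sketched here under two assumptions that the slide does not spell out: F holds the basis vectors f(z_i) as columns and X holds the poses x_i as columns, and the penalty carries a weight lam (lam = 1 gives exactly trace(A^T A)). Setting the gradient to zero gives A (F F^T + lam I) = X F^T.

```python
import numpy as np

def ridge_fit(F, X, lam):
    """Minimise ||A F - X||^2 + lam * trace(A^T A) in closed form.

    F: (p, n) basis responses, X: (m, n) target poses.
    Returns A with shape (m, p).
    """
    p = F.shape[0]
    # F F^T + lam I is symmetric, so solve the transposed system for A^T.
    gram = F @ F.T + lam * np.eye(p)
    return np.linalg.solve(gram, F @ X.T).T
```

Solving the linear system avoids forming an explicit matrix inverse, which is usually the more stable choice.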
14
Relevance Vector Machine: a brief introduction

A sparse Bayesian approach to classification and
regression, proposed in M. Tipping, NIPS 01.
Gaussian priors on each parameter (or group of
parameters)
Non-convex priors of the form
$R(a) = \nu \log |a|$ ($dR/da = \nu / a$)
$R(A) = \nu \sum_k \log \|a_k\|$
$\nu$: pruning/shrinkage strength
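
One way to see how a logarithmic penalty produces sparsity is an iteratively reweighted ridge scheme, sketched here. This is not the Bayesian evidence procedure of the RVM and is not claimed to be the authors' optimiser; it is a majorise-minimise illustration. Since log ||a|| is concave in ||a||^2, it is bounded above by a quadratic with weight 1 / (2 ||a_0||^2) at the current estimate a_0, so each step is a weighted ridge problem. The function name, the pruning threshold and the parameter `nu` are assumptions.

```python
import numpy as np

def sparse_log_penalty_fit(F, X, nu, n_iter=50, prune_tol=1e-3, eps=1e-12):
    """Approximately minimise ||A F - X||^2 + nu * sum_k log ||a_k||.

    F: (p, n) basis responses, X: (m, n) targets, a_k: k-th column of A.
    Returns A with shape (m, p); pruned columns are exactly zero.
    """
    m, p = X.shape[0], F.shape[0]
    active = np.arange(p)
    weights = np.full(p, 1e-6)  # nearly unregularised first step
    A = np.zeros((m, p))
    for _ in range(n_iter):
        Fa = F[active]
        # Weighted ridge step on the active set: A_a (Fa Fa^T + D) = X Fa^T.
        gram = Fa @ Fa.T + np.diag(weights[active])
        Aa = np.linalg.solve(gram, Fa @ X.T).T
        norms = np.linalg.norm(Aa, axis=0)
        keep = norms > prune_tol
        A = np.zeros((m, p))
        A[:, active[keep]] = Aa[:, keep]
        active = active[keep]
        if active.size == 0:
            break
        # Quadratic bound on nu * log ||a||: weight = nu / (2 ||a||^2).
        weights[active] = nu / (2.0 * norms[keep] ** 2 + eps)
    return A
```

Columns with a small norm receive a large weight on the next step and shrink further, while columns with a large norm are barely penalised, so weak basis functions are driven to zero and removed.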

15
Contd.

Advantage: sparse solutions
With kernel bases, only relevant examples are
retained
With linear bases ($f(z) = z$), relevant features
are selected

16
Pose from Static Images
17
Training & Test Data

For the movements, we use real human motion
capture data
captures typical human movements, not just
possible ones
from www.ict.usc.edu/graphics/animWeb/humanoid
Unfortunately we don't have the corresponding
silhouettes, so we synthesize realistic ones
using the POSER human modeler from Curious Labs
somewhat artificial, but gives ground truth for
testing and allows a wide range of training
viewpoints.
Also test on real sequences of another person
(without ground truth)

18
Methods Tested

Regressors: we tested both ridge regression and
RVM
Basis: we tested both a linear basis (in our
nonlinear SC histogram descriptors) and Gaussian
kernels of various widths.
Performance is very similar for all methods;
Gaussian kernels are a little better than the
linear basis.
The RVM regressors are much sparser than ridge
regressors, with very similar performance.

19
Synthetic Spiral Walk Test Sequence
Single image, RVM with Gaussian kernel, sparsity
6% (2636 examples, 156 support vectors). Mean
angular error per d.o.f. is 6.0°
20
Spiral Walk Test Sequence
Mostly OK, but 15 glitches owing to pose
ambiguities
21
Some statistics

Mean RMS reconstruction error over all joints:
6.02°
Graphs for left hip angle and overall heading
angle

22
Glitches

Results are OK most of the time, but there are
frequent glitches:
the regressor either chooses the wrong case of an
ambiguous pair, or remains undecided.
Problem is especially evident for heading angle,
the most visible pose variable.
For heading, we can quantify the conflict:
it has a 360° range, so we actually regress
(cos θ, sin θ)
denormalization of this unit vector is a sign of
conflict
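
A minimal sketch of decoding the heading and measuring that conflict is shown here, assuming the regressor outputs a (cos, sin) pair per frame. The function name is hypothetical, and the slides do not say how small a norm counts as a glitch.

```python
import numpy as np

def decode_heading(cs):
    """Decode regressed (cos, sin) pairs.

    cs: array of shape (..., 2).
    Returns (heading in degrees in [0, 360), norm of the regressed vector).
    A norm well below 1 suggests the regressor averaged conflicting solutions.
    """
    cs = np.asarray(cs, dtype=float)
    norm = np.linalg.norm(cs, axis=-1)
    heading = np.degrees(np.arctan2(cs[..., 1], cs[..., 0])) % 360.0
    return heading, norm

# Averaging two plausible headings (10 and 170 degrees) gives a short vector.
a, b = np.radians(10.0), np.radians(170.0)
mixed = 0.5 * np.array([np.cos(a) + np.cos(b), np.sin(a) + np.sin(b)])
heading, norm = decode_heading(mixed)
```

In this example the averaged vector points at a heading that matches neither candidate and its norm is far below 1, which is the denormalization signal described above.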

23
(No Transcript)
24
Real Image example
25
Understanding the Problem

x vs z is actually a multi-branched surface
Functional treatment can lead to learning the
mean of possible solutions, or zig-zagging
between different solutions (in kernel spaces)
Real solution: multi-valued regression
A possible solution: resolve ambiguities using
temporal information

26
Pose from Video Sequences
27
Tracking Framework

Reduce glitches by embedding problem in a
tracking framework.
Idea: use temporal information as a
hint to select the correct solution
To include state information, we use the familiar
(dynamical prediction) + (observation update)
framework, but implement both parts using learned
regression models.

28
Joint Regression equations

Dynamics
2nd order linear autoregressive model
$\check{x}_t = A x_{t-1} + B x_{t-2}$
State-sensitive observation update
Nonlinear dependence on state prediction $\check{x}_t$
$x_t = C \check{x}_t + \sum_k d_k f_k(\check{x}_t, z_t) + \epsilon$
Kernel selects examples close in both z and x
space
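
One tracking step under these equations can be sketched as follows. The product-of-Gaussians kernel over the predicted state and the descriptor is an assumption chosen to match "close in both z and x space"; the slides do not give the kernel's exact form. All function and parameter names are hypothetical, and the matrices A, B, C, D are taken as already learned.

```python
import numpy as np

def predict_state(A, B, x_prev, x_prev2):
    """Second-order autoregressive prediction: A x_{t-1} + B x_{t-2}."""
    return A @ x_prev + B @ x_prev2

def joint_kernel(x_pred, z, x_centres, z_centres, beta_x, beta_z):
    """Gaussian kernel over (x, z): large only for examples close in both.

    x_centres: (p, dx) example poses, z_centres: (p, dz) example descriptors.
    Returns the p kernel responses f_k(x_pred, z).
    """
    dx = ((x_centres - x_pred) ** 2).sum(axis=1)
    dz = ((z_centres - z) ** 2).sum(axis=1)
    return np.exp(-beta_x * dx - beta_z * dz)

def update_state(C, D, x_pred, z, x_centres, z_centres, beta_x, beta_z):
    """Observation update: C x_pred + sum_k d_k f_k(x_pred, z).

    D has shape (dx, p); its columns are the weight vectors d_k.
    """
    f = joint_kernel(x_pred, z, x_centres, z_centres, beta_x, beta_z)
    return C @ x_pred + D @ f
```

When two training examples share the same silhouette descriptor but have different poses, the state term in the kernel suppresses the one far from the prediction, which is how the temporal hint selects between ambiguous solutions.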

29
Results with Joint Regression

Ensures temporal smoothness, handles ambiguities
Mean RMS reconstruction error over all joints:
4.1°
Graphs for left hip angle and overall heading
angle

30
Spiral Walk Test Sequence
RMS reconstruction error is about 4 degrees per
joint angle
31
Real Images Test Sequence
Weakness: a weak observation may let the
dynamical model dominate.
32
Conclusion