Ankur Agarwal and Bill Triggs - PowerPoint PPT Presentation

1
Learning to Reconstruct 3D Human Pose and Motion
from Silhouettes
  • Ankur Agarwal and Bill Triggs
  • LEAR
  • GRAVIR-CNRS-INRIA, Grenoble

Pattern Recognition and Machine Learning in
Computer Vision Workshop, 5 May 2004
2
Goal
  • Recover 3D human body pose from image silhouettes
  • 3D pose = joint angles
  • Use either individual images or video sequences
  • Applications
  • motion capture
  • human-computer interaction
  • action recognition
  • visual surveillance

3
2 Broad Classes of Approaches
  • Model based approaches
  • Presuppose an explicitly known parametric body
    model
  • Inverting kinematics / Numerical optimization
  • Subcase: model-based tracking
  • Learning based approaches
  • Avoid accurate 3D modeling/rendering
  • e.g. example-based methods

4
Model Free Learning based Approach
  • Recovers 3D pose (joint angles) by direct
    regression on robust silhouette descriptors
  • Sparse kernel-based regressor trained using human
    motion capture data
  • Advantages
  • no need to build an explicit 3D model
  • easily adapted to different people / appearances
  • may be more robust than model based approach
  • Disadvantages
  • harder to interpret than explicit model, and may
    be less accurate

5
The Basic Idea
  • To learn a compact system that directly outputs
    pose from an image
  • Represent the input (image) by a descriptor
    vector z.
  • Write the multi-parameter output (pose) as a
    vector x.
  • Learn a regressor
  • $x = F(z) + \epsilon$
  • Note: this assumes a functional relationship
    between z and x, which might not really be the
    case.

6
Silhouette Descriptors
7
Why Use Silhouettes?
  • Capture most of the available pose information
  • Can (often) be extracted from real images
  • Insensitive to colour, texture, clothing
  • No prior labeling (e.g. of limbs) required
  • Limitations
  • Artifacts like attached shadows are common
  • Depth ordering / sidedness information is lost

8
Ambiguities
  • Which arm / leg is forwards? Front or back
    view?
  • Where is occluded arm? How much is knee
    bent?
  • Silhouette-to-pose problem is inherently
    multi-valued
  • Single-valued regressors sometimes behave
    erratically

9
Shape Context Histograms
  • Need to capture silhouette shape but be robust
    against occlusions/segmentation failures
  • Avoid global descriptors like moments
  • Use Shape Context Histograms: distributions of
    local shape context responses

10
Shape Context Histograms Encode Locality
  • First 2 principal components of Shape Context
    (SC) distribution from combined training data,
    with k-means centres superimposed, and an SC
    distribution from a single silhouette.
  • SCs implicitly encode position on the silhouette:
    an average, overall human-silhouette-like form
    is discernible
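
The descriptor pipeline on the last two slides can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the log-polar binning (5 radial by 12 angular bins), the radial range, and the hard nearest-centre voting are all assumptions, and the function names `shape_contexts` and `silhouette_descriptor` are hypothetical. The centres would normally come from k-means on shape contexts pooled over the training silhouettes.

```python
import numpy as np

def shape_contexts(points, n_r=5, n_theta=12):
    """Log-polar shape context for every boundary point of one silhouette.

    points: (m, 2) array of silhouette boundary coordinates, m >= 2.
    Returns an (m, n_r * n_theta) array, one normalized histogram per point.
    """
    points = np.asarray(points, dtype=float)
    m = len(points)
    diff = points[None, :, :] - points[:, None, :]      # diff[i, j] = p_j - p_i
    dist = np.linalg.norm(diff, axis=2)
    angle = np.arctan2(diff[..., 1], diff[..., 0]) % (2 * np.pi)

    off_diag = ~np.eye(m, dtype=bool)
    dist = dist / dist[off_diag].mean()                 # scale invariance

    # Log-spaced radial edges (assumed range: 1/8 to 2 mean distances);
    # points outside that range are clipped into the first or last ring.
    r_edges = np.logspace(np.log10(0.125), np.log10(2.0), n_r + 1)
    r_bin = np.clip(np.searchsorted(r_edges, dist, side="right") - 1, 0, n_r - 1)
    t_bin = np.minimum((angle / (2 * np.pi) * n_theta).astype(int), n_theta - 1)

    sc = np.zeros((m, n_r * n_theta))
    for i in range(m):
        idx = (r_bin[i] * n_theta + t_bin[i])[off_diag[i]]   # skip the point itself
        sc[i] = np.bincount(idx, minlength=n_r * n_theta)
    return sc / sc.sum(axis=1, keepdims=True)

def silhouette_descriptor(points, centres):
    """Histogram of shape contexts over fixed centres: the descriptor z."""
    sc = shape_contexts(points)
    d2 = ((sc[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    votes = np.bincount(d2.argmin(axis=1), minlength=len(centres)).astype(float)
    return votes / votes.sum()
```

Because each boundary point votes independently, a locally corrupted silhouette only disturbs the votes of the affected points, which is the robustness argument made for avoiding global descriptors.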

11
Nonlinear Regression
12
Regression Model
  • Predict output vector x (here 3D human pose),
    given input vector z (here a shape context
    histogram)
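  • $x = \sum_{k=1}^{p} a_k f_k(z) + \epsilon = A\,f(z) + \epsilon$
  • $f_k(z),\ k = 1, \dots, p$: basis functions
  • $A = (a_1\ a_2\ \cdots\ a_p)$
  • $f(z) = (f_1(z)\ f_2(z)\ \cdots\ f_p(z))^\top$
  • Kernel bases $f_k(z) = K(z, z_k)$ for given centre
    points $z_k$ and kernel $K$.
  • e.g. $K(z, z_k) = \exp(-\beta \|z - z_k\|^2)$
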
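
As a rough sketch of the regression model on this slide, the Gaussian kernel basis and the prediction x = A f(z) could be evaluated as below. The function names and array layouts are assumptions made for illustration; descriptors are stored one per row and the weight matrix A has one column per basis function.

```python
import numpy as np

def gaussian_basis(Z, centres, beta):
    """Return F with F[k, i] = K(z_i, z_k) = exp(-beta * ||z_i - z_k||^2).

    Z: (n, m) descriptors, one row per example.
    centres: (p, m) kernel centre points z_k.
    """
    d2 = ((centres[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-beta * d2)

def predict(A, Z, centres, beta):
    """Evaluate x = A f(z) for every row z of Z; returns one pose per row."""
    return (A @ gaussian_basis(Z, centres, beta)).T
```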
13
Regularized Least Squares
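  • $A = \arg\min_A \left\{ \sum_{i=1}^{n} \|A f(z_i) - x_i\|^2 + R(A) \right\}$
  • $\phantom{A} = \arg\min_A \left\{ \|A F - X\|^2 + R(A) \right\}$,
    with $F = (f(z_1) \cdots f(z_n))$ and $X = (x_1 \cdots x_n)$
  • $R(A)$: regularizer / penalty function to control
    overfitting
  • Ridge Regression
  • $R(A) = \operatorname{trace}(A^\top A)$
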
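
For the ridge case the minimizer has a closed form, A = X Fᵀ (F Fᵀ + λI)⁻¹. The sketch below assumes an explicit weight `lam` on the trace penalty (the slide shows no weight, so this parameter is an assumption) and stores training examples as columns of F and X.

```python
import numpy as np

def ridge_fit(F, X, lam=1e-3):
    """Minimize ||A F - X||^2 + lam * trace(A^T A) in closed form.

    F: (p, n) basis responses, one column f(z_i) per training example.
    X: (d, n) target poses, one column x_i per training example.
    Returns A with shape (d, p).
    """
    p = F.shape[0]
    G = F @ F.T + lam * np.eye(p)          # symmetric positive definite
    return np.linalg.solve(G, F @ X.T).T   # A = X F^T G^{-1}
```

Solving the linear system instead of forming the inverse explicitly is the usual choice for numerical stability.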
14
Relevance Vector Machine: a brief introduction
  • A sparse Bayesian approach to classification and
    regression, proposed in M. Tipping, NIPS 01.
  • Gaussian priors on each parameter (or group of
    parameters)
  • Non-convex priors of the form
  • $R(a) = \nu \log |a|$ (so $dR/da = \nu / a$)
  • $R(A) = \nu \sum_k \log \|a_k\|$
  • $\nu$: pruning/shrinkage strength

15
Contd.
  • Advantage: sparse solutions
  • With kernel bases, only relevant examples are
    retained
  • With linear bases ($f(z) = z$), relevant features
    are selected
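
To illustrate how a log-type penalty on the columns of A produces sparse solutions, here is a generic iteratively reweighted ridge scheme. It is a stand-in chosen for brevity, not the RVM training algorithm used in the talk: each pass bounds the log penalty by a quadratic around the current column norms and solves the resulting weighted ridge problem. The name `sparse_fit`, the penalty strength `nu`, the floor `eps`, the iteration count, and the small ridge used for the starting point are all assumptions.

```python
import numpy as np

def sparse_fit(F, X, nu=1.0, n_iter=30, eps=1e-12):
    """Approximately minimize ||A F - X||^2 + nu * sum_k log ||a_k||.

    F: (p, n) basis responses; X: (d, n) targets. Returns A, shape (d, p).
    Columns of A that end up near zero correspond to pruned basis functions.
    """
    p = F.shape[0]
    FFt, FXt = F @ F.T, F @ X.T
    A = np.linalg.solve(FFt + 1e-6 * np.eye(p), FXt).T    # near-least-squares start
    for _ in range(n_iter):
        col_norm2 = (A ** 2).sum(axis=0)                   # ||a_k||^2 per column
        # Quadratic bound on the log penalty: small columns get large weights.
        D = np.diag(nu / (2.0 * (col_norm2 + eps)))
        A = np.linalg.solve(FFt + D, FXt).T
    return A
```

With a linear basis the surviving columns pick out relevant input features; with a kernel basis they pick out relevant training examples, matching the two cases listed on the slide.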

16
Pose from Static Images
17
Training & Test Data
  • For the movements, we use real human motion
    capture data
  • captures typical human movements, not just
    possible ones
  • from www.ict.usc.edu/graphics/animWeb/humanoid
  • Unfortunately we don't have the corresponding
    silhouettes, so we synthesize realistic ones
  • POSER human modeler from Curious Labs
  • somewhat artificial, but gives ground truth for
    testing, allows a wide range of training
    viewpoints.
  • Also test on real sequences of another person
    (without ground truth)

18
Methods Tested
  • Regressors: we tested both ridge regression and
    RVM
  • Basis: we tested both linear basis (in our
    nonlinear SC Histogram descriptors) and Gaussian
    kernels of various widths.
  • Performance is very similar for all methods
  • Gaussian kernels are a little better than the
    linear basis.
  • The RVM regressors are much sparser than ridge
    regressors, with very similar performance.

19
Synthetic Spiral Walk Test Sequence
Single image, RVM with Gaussian kernel, sparsity
6% (2636 examples, 156 support vectors). Mean
angular error per d.o.f. is 6.0°
20
Spiral Walk Test Sequence
Mostly OK, but 15% glitches owing to pose
ambiguities
21
Some statistics
  • Mean RMS reconstruction error over all joints:
    6.02°
  • Graphs for left hip angle and overall heading
    angle
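
One plausible way to compute such an error figure is sketched below. The exact definition used for the reported number is not given on the slide, so the choice here (RMS over joint angles per frame, then the mean over frames, with differences wrapped to ±180°) is an assumption, as are the function names.

```python
import numpy as np

def angle_diff_deg(a, b):
    """Signed difference a - b in degrees, wrapped into [-180, 180)."""
    return (np.asarray(a) - np.asarray(b) + 180.0) % 360.0 - 180.0

def mean_rms_error_deg(pred, truth):
    """RMS over joint angles for each frame, then the mean over frames.

    pred, truth: (n_frames, n_angles) arrays of joint angles in degrees.
    """
    err = angle_diff_deg(pred, truth)
    return np.sqrt((err ** 2).mean(axis=1)).mean()
```

The wrapping matters for angles such as heading, where 350° and 10° are only 20° apart.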

22
Glitches
  • Results are OK most of the time, but there are
    frequent glitches
  • regressor either chooses wrong case of an
    ambiguous pair, or remains undecided.
  • Problem is especially evident for heading angle,
    the most visible pose variable.
  • For heading, we can quantify the conflict
  • it has a 360° range, so we actually regress
    $(\cos\theta, \sin\theta)$
  • denormalization of this unit vector is a sign of
    conflict
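
A small sketch of the heading trick: the angle is recovered from the regressed (cos, sin) pair with `arctan2`, and the length of the pair serves as a conflict indicator. The function name and the interpretation of any particular norm value as "conflicted" are assumptions for illustration.

```python
import numpy as np

def decode_heading(cs):
    """Turn regressed (cos, sin) pairs into headings plus a conflict score.

    cs: (n, 2) array of regressor outputs, ideally unit vectors.
    Returns (heading in degrees within [0, 360), vector norm). A norm well
    below 1 suggests the regressor averaged two conflicting headings.
    """
    cs = np.asarray(cs, dtype=float)
    heading = np.degrees(np.arctan2(cs[:, 1], cs[:, 0])) % 360.0
    return heading, np.linalg.norm(cs, axis=1)

# A confident estimate at 30 degrees, and the average of two rival
# hypotheses at 30 and 150 degrees, whose vector is visibly shortened.
a, b = np.radians(30.0), np.radians(150.0)
confident = [np.cos(a), np.sin(a)]
undecided = [(np.cos(a) + np.cos(b)) / 2, (np.sin(a) + np.sin(b)) / 2]
headings, norms = decode_heading([confident, undecided])
```

In the second case the decoded heading matches neither hypothesis, and the shortened vector is what flags the problem.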

23
(No Transcript)
24
Real Image example
25
Understanding the Problem
  • x vs z is actually a multi-branched surface
  • Functional treatment can lead to learning the
    mean of possible solutions, or zig-zagging
    between different solutions (in kernel spaces)
  • Real solution: multi-valued regression
  • A possible solution: resolve ambiguities using
    temporal information

26
Pose from Video Sequences
27
Tracking Framework
  • Reduce glitches by embedding problem in a
    tracking framework.
  • Idea: use temporal information as a hint to
    select the correct solution
  • To include state information, we use the familiar
    (dynamical prediction) + (observation update)
    framework, but implement both parts using learned
    regression models.

28
Joint Regression equations
  • Dynamics
  • 2nd order linear autoregressive model
  • $\check{x}_t = A x_{t-1} + B x_{t-2}$ (state prediction)
  • State-sensitive observation update
  • Nonlinear dependence on state prediction
  • $x_t = C \check{x}_t + \sum_k d_k f_k(\check{x}_t, z_t) + \epsilon$
  • Kernel selects examples close in both z and x
    space
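
The two ingredients above can be sketched as follows: a least-squares fit of the second-order autoregressive dynamics, and a kernel that responds only when a training example is close in both pose and descriptor. This is an illustrative outline, not the authors' code. The function names, the product-of-Gaussians form of the joint kernel, and its separate widths `beta_x` and `beta_z` are assumptions.

```python
import numpy as np

def fit_ar2(X):
    """Least-squares fit of x_t ~ A x_{t-1} + B x_{t-2}.

    X: (T, d) pose sequence, one row per frame. Returns (A, B), each (d, d).
    """
    past = np.hstack([X[1:-1], X[:-2]])          # rows [x_{t-1}, x_{t-2}]
    W, *_ = np.linalg.lstsq(past, X[2:], rcond=None)
    d = X.shape[1]
    return W[:d].T, W[d:].T

def predict_ar2(A, B, x_prev, x_prev2):
    """Dynamical prediction of the current state from the two previous ones."""
    return A @ x_prev + B @ x_prev2

def joint_kernel(x_pred, z, x_k, z_k, beta_x, beta_z):
    """Gaussian kernel that is large only when both state and descriptor match."""
    return np.exp(-beta_x * np.sum((x_pred - x_k) ** 2)
                  - beta_z * np.sum((z - z_k) ** 2))
```

Feeding the predicted state into the kernel is what lets the observation update prefer the branch of an ambiguous silhouette that agrees with recent motion.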

29
Results with Joint Regression
  • Ensures temporal smoothness, handles ambiguities
  • Mean RMS reconstruction error over all joints:
    4.1°
  • Graphs for left hip angle and overall heading
    angle

30
Spiral Walk Test Sequence
RMS reconstruction error is about 4 degrees per
joint angle
31
Real Images Test Sequence
Weakness: a weak observation may lead to
domination of the dynamical model.
32
Conclusion
  • Advantages
  • Compact model for direct regression on image
    observations
  • No explicit 3D model → easy adaptability
  • Exploits temporal coherency in sequences by
    explicitly modeling dynamics
  • Potentially self-initialized tracking
  • (84% correctness using automatic initialization)
  • Disadvantages
  • Requires segmentation

33
Approximating the prior with quadratic bridges
in the RVM training algorithm