Title: Simultaneous Segmentation and 3D Pose Estimation of Humans or Detection Segmentation = Tracking?
- Philip H.S. Torr
- Pawan Kumar, Pushmeet Kohli, Matt Bray (Oxford Brookes University)
- Andrew Zisserman (Oxford)
- Arasanathan Thayananthan, Bjorn Stenger, Roberto Cipolla (Cambridge)
Algebra
- Unifying conjecture:
- Tracking = Detection = Recognition
- Detection = Segmentation
- therefore
- Tracking (pose estimation) = Segmentation?
Objective
Aim to get a clean segmentation of a human.
(Figure panels: image, segmentation, pose estimate??)
Developments
- ICCV 2003: pose estimation as fast nearest-neighbour search plus dynamics (inspired by Gavrila and by Toyama and Blake)
- BMVC 2004: parts-based chamfer matching to make the space of templates more flexible (à la the pictorial structures of Huttenlocher)
- CVPR 2005: ObjCut, combining segmentation and detection
- ECCV 2006: interpolation of poses using the MVRVM (Agarwal and Triggs)
- ECCV 2006: combination of pose estimation and segmentation using graph cuts
Tracking as Detection (Stenger et al., ICCV 2003)
- Detection has become very efficient,
- e.g. real-time face detection, pedestrian detection
- Example: pedestrian detection (Gavrila and Philomin, 1999)
- Find a match among a large number of exemplar templates
- Issues
- Number of templates needed
- Efficient search
- Robust cost function
Cascaded Classifiers
1280x1024 image, 11 subsampling levels, 80s. Average number of filters per patch: 6.7.
- After the first filter: 19.8% of patches remaining
- After filter 10: 0.74% of patches remaining
- After filter 20: 0.06% of patches remaining
- After filter 30: 0.01% of patches remaining
- After filter 70: 0.007% of patches remaining
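The cascade idea can be sketched as follows: every patch passes through an ordered list of filters and is discarded at the first one that rejects it, so cheap early stages do most of the pruning and the average number of filters per patch stays small. This is a minimal Python sketch, assuming each filter is a hypothetical boolean function of a patch; the toy "patches" (plain scores) and thresholds are ours, not the detector behind the numbers above.

```python
from typing import Callable, List, Sequence, Tuple

Patch = float                      # toy stand-in for an image patch (assumption)
Filter = Callable[[Patch], bool]   # True means "keep the patch for the next stage"

def run_cascade(patches: Sequence[Patch],
                filters: Sequence[Filter]) -> Tuple[List[Patch], List[int], float]:
    """Pass patches through the cascade, dropping each one at its first rejection.

    Returns the surviving patches, the number of patches left after each
    stage, and the average number of filters evaluated per patch."""
    survivors = list(patches)
    remaining_after: List[int] = []
    evaluations = 0
    for f in filters:
        evaluations += len(survivors)        # each surviving patch is tested once here
        survivors = [p for p in survivors if f(p)]
        remaining_after.append(len(survivors))
    mean_filters = evaluations / len(patches) if patches else 0.0
    return survivors, remaining_after, mean_filters

# Toy usage: "patches" are scores 0..99 and each stage is a stricter threshold.
stages = [lambda p, t=t: p >= t for t in (50, 80, 95)]
kept, left, mean = run_cascade(list(range(100)), stages)
print(left, mean)  # [50, 20, 5] 1.7
```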
Hierarchical Detection
- Efficient template matching (Huttenlocher and Olson; Gavrila)
- Idea: when matching similar objects, speed up the search by forming a template hierarchy found by clustering
- Match prototypes first; search a sub-tree only if the cost is below a threshold
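The coarse-to-fine rule above can be sketched as a depth-first traversal: a node's prototype is scored first, and if its cost is not below the node's threshold the entire sub-tree is skipped. A minimal Python sketch, assuming a hypothetical `Node` type, per-node thresholds, and a 1-D toy cost in place of a chamfer score; how the real hierarchy is clustered and thresholded is not shown.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Template = Tuple[float, ...]

@dataclass
class Node:
    template: Template                 # cluster prototype (a real template at the leaves)
    threshold: float                   # explore this node only if cost < threshold
    children: List["Node"] = field(default_factory=list)

def hierarchical_match(root: Node, cost: Callable[[Template], float]):
    """Depth-first coarse-to-fine search with sub-tree pruning.

    Returns (sorted list of (cost, leaf template), number of cost evaluations)."""
    matches: List[Tuple[float, Template]] = []
    evaluations = 0
    stack = [root]
    while stack:
        node = stack.pop()
        c = cost(node.template)
        evaluations += 1
        if c >= node.threshold:
            continue                   # reject this prototype and every template below it
        if node.children:
            stack.extend(node.children)
        else:
            matches.append((c, node.template))
    matches.sort()
    return matches, evaluations

def leaf(x: float) -> Node:
    return Node((x,), threshold=1.0)

# Toy 1-D "templates" in two clusters, around 0 and around 5.
tree = Node((2.5,), 10.0, [
    Node((0.1,), 1.0, [leaf(0.0), leaf(0.2)]),
    Node((5.1,), 1.0, [leaf(5.0), leaf(5.1), leaf(5.2), leaf(5.3)]),
])
observed = 0.05
matches, n_eval = hierarchical_match(tree, lambda t: abs(t[0] - observed))
print([t for _, t in matches], n_eval)  # [(0.0,), (0.2,)] 5
```

Here the second cluster's four leaves are never scored because its prototype already fails; with thousands of templates this pruning is where the speed-up comes from.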
Trees
- These search trees are the same as those used for efficient nearest-neighbour search.
- Add a dynamic model and
- Detection = Tracking = Recognition
Evaluation at Multiple Resolutions
- One traversal of the tree per time step
- Tree: 9000 templates of a pointing hand (rigid)
Templates at Level 1
Templates at Level 2
Templates at Level 3
Comparison with Particle Filters
- This method is grid-based:
- No need to render the model online
- Behaves like an efficient search
- Can always be used as a proposal process for a particle filter if need be.
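Because the templates form a fixed grid over pose space, tracking with them can be sketched as a grid-based Bayesian filter: the posterior over templates is predicted with a dynamic model and then re-weighted by each template's likelihood. The Python sketch below assumes a hypothetical dense transition matrix and precomputed likelihoods; a tree-based variant could avoid scoring sub-trees whose prototypes already have negligible likelihood.

```python
from typing import List, Sequence

def grid_filter_step(prior: Sequence[float],
                     transition: Sequence[Sequence[float]],
                     likelihood: Sequence[float]) -> List[float]:
    """One predict/update step of a grid-based Bayesian filter over K templates.

    prior[i]        : p(x_{t-1} = i | z_{1:t-1})
    transition[i][j]: p(x_t = j | x_{t-1} = i)   (dynamic model, assumed given)
    likelihood[j]   : p(z_t | x_t = j)           (e.g. from edge and colour terms)
    """
    k = len(prior)
    predicted = [sum(prior[i] * transition[i][j] for i in range(k)) for j in range(k)]
    unnormalised = [p * l for p, l in zip(predicted, likelihood)]
    total = sum(unnormalised)
    if total == 0.0:
        raise ValueError("every template has zero posterior mass")
    return [u / total for u in unnormalised]

# Three templates, a "sticky" dynamic model, and one observation favouring template 1.
T = [[0.8, 0.2, 0.0],
     [0.1, 0.8, 0.1],
     [0.0, 0.2, 0.8]]
posterior = grid_filter_step([1 / 3, 1 / 3, 1 / 3], T, [1.0, 2.0, 1.0])
```

Unlike a particle filter, nothing is sampled: every template in the grid gets an exact posterior value, which is also why the same scores can serve as a proposal distribution if a particle filter is preferred.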
Interpolation, MVRVM, ECCV 2006
Code available.
Energy being Optimized, link to graph cuts
- Combination of
- Edge term (quickly evaluated using chamfer matching)
- Interior term (quickly evaluated using integral images)
- Note that the possible templates are a bit like cuts that we put down; one could think of this whole process as a constrained search for the best graph cut.
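The interior term can be made cheap in the spirit of integral images: cumulative sums of a per-pixel score are computed once per frame, and any template's interior score is then a few lookups. A minimal Python sketch, assuming the per-pixel score is something like a colour log-likelihood ratio and that each silhouette is stored as horizontal runs (our assumed representation); it uses a row-wise (1-D) integral image rather than the full 2-D one.

```python
from typing import List, Sequence, Tuple

Run = Tuple[int, int, int]  # (row, x_start, x_end), x_end exclusive; assumed silhouette format

def row_integral_image(score: Sequence[Sequence[float]]) -> List[List[float]]:
    """prefix[r][x] = sum of score[r][0:x]; the leading 0 makes empty runs cost 0."""
    prefix = []
    for row in score:
        acc = [0.0]
        for v in row:
            acc.append(acc[-1] + v)
        prefix.append(acc)
    return prefix

def interior_term(prefix: List[List[float]], runs: Sequence[Run]) -> float:
    """Sum of per-pixel scores inside the silhouette: O(number of runs), not O(area)."""
    return sum(prefix[r][x1] - prefix[r][x0] for r, x0, x1 in runs)

# Per-pixel scores, e.g. log p(colour | foreground) - log p(colour | background).
score = [[1.0, 2.0, 3.0],
         [4.0, 5.0, 6.0]]
prefix = row_integral_image(score)
print(interior_term(prefix, [(0, 1, 3), (1, 0, 2)]))  # 14.0
```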
Likelihood: Edges
(Figure panels: 3D model, input image, edge detection, projected contours, robust edge matching)
Chamfer Matching
(Figure panels: input image, Canny edges, distance transform, projected contours)
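The chamfer pipeline can be sketched in a few lines: compute a distance transform of the edge map once per frame, then score each projected contour by the average distance at its points. This Python sketch uses the city-block metric and a truncation threshold `tau` as assumed choices (the original work may use a different metric and robust function), and takes the Canny edge map as given.

```python
from typing import List, Sequence, Tuple

INF = float("inf")

def distance_transform(edges: Sequence[Sequence[bool]]) -> List[List[float]]:
    """Two-pass city-block (L1) distance transform: distance from each pixel
    to the nearest edge pixel."""
    h, w = len(edges), len(edges[0])
    d = [[0.0 if edges[y][x] else INF for x in range(w)] for y in range(h)]
    for y in range(h):                      # forward pass: top-left to bottom-right
        for x in range(w):
            if y > 0:
                d[y][x] = min(d[y][x], d[y - 1][x] + 1)
            if x > 0:
                d[y][x] = min(d[y][x], d[y][x - 1] + 1)
    for y in reversed(range(h)):            # backward pass: bottom-right to top-left
        for x in reversed(range(w)):
            if y < h - 1:
                d[y][x] = min(d[y][x], d[y + 1][x] + 1)
            if x < w - 1:
                d[y][x] = min(d[y][x], d[y][x + 1] + 1)
    return d

def chamfer_cost(dt: List[List[float]], contour: Sequence[Tuple[int, int]],
                 tau: float = INF) -> float:
    """Mean distance-transform value at projected contour points (x, y),
    truncated at tau so that a few missing edges cannot dominate the cost."""
    return sum(min(dt[y][x], tau) for x, y in contour) / len(contour)

edges = [[False, False, False],
         [False, True,  False],
         [False, False, False]]
dt = distance_transform(edges)
print(chamfer_cost(dt, [(0, 0), (1, 1)]))           # 1.0
print(chamfer_cost(dt, [(0, 0), (1, 1)], tau=1.0))  # 0.5
```

The distance transform is computed once per image, so scoring each additional template costs only one lookup per contour point.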
Likelihood: Colour
(Figure panels: 3D model, input image, projected silhouette, skin colour model, template matching)
Template Matching
- Template matching = constrained search for a cut/segmentation?
- Detection = Segmentation?
Objective
Aim to get a clean segmentation of a human.
(Figure panels: image, segmentation, pose estimate??)
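MRF for Interactive Image Segmentation (Boykov and Jolly, ICCV 2001)
Energy of the MRF, where $\mathcal{N}$ is the set of neighbouring pixel pairs:
$E(\mathbf{x}) = \sum_i \phi(D \mid x_i) + \sum_{(i,j) \in \mathcal{N}} \big[ \phi(D \mid x_i, x_j) + \psi(x_i, x_j) \big]$
- $\phi(D \mid x_i)$: unary likelihood
- $\phi(D \mid x_i, x_j)$: contrast term (pair-wise)
- $\psi(x_i, x_j)$: uniform prior, Potts model (pair-wise)
Maximum-a-posteriori (MAP) solution: $\mathbf{x}^{*} = \arg\min_{\mathbf{x}} E(\mathbf{x})$
(Figure panels: data (D), unary likelihood, pair-wise terms, MAP solution)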
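To make the terms concrete, here is a minimal sketch of a binary segmentation energy of this form on a 4-connected grid: a unary cost per label, plus a Potts penalty and a contrast-sensitive penalty whenever neighbouring labels differ. The parameter names `potts`, `lam` and `sigma` and the exponential contrast term are our assumptions. The brute-force search only stands in for graph cuts on a toy image; because these pairwise terms are submodular, a single s-t min-cut finds the exact MAP on real images.

```python
import itertools
import math
from typing import Sequence, Tuple

def mrf_energy(labels: Sequence[int], intensity, unary,
               potts: float = 0.1, lam: float = 1.0, sigma: float = 1.0) -> float:
    """E(x) = sum_i unary[i][x_i] + sum over 4-neighbour pairs with x_i != x_j of
    (potts + lam * exp(-(I_i - I_j)^2 / (2 sigma^2))).  Labels: 0 = background, 1 = foreground."""
    h, w = len(intensity), len(intensity[0])
    energy = 0.0
    for y in range(h):
        for x in range(w):
            xi = labels[y * w + x]
            energy += unary[y][x][xi]
            for ny, nx in ((y, x + 1), (y + 1, x)):      # right and down: each pair once
                if ny < h and nx < w and labels[ny * w + nx] != xi:
                    g = intensity[y][x] - intensity[ny][nx]
                    energy += potts + lam * math.exp(-g * g / (2 * sigma ** 2))
    return energy

def brute_force_map(intensity, unary, **params) -> Tuple[int, ...]:
    """Exhaustive arg min over all 2^n labellings (toy sizes only; use a min-cut in practice)."""
    n = len(intensity) * len(intensity[0])
    return min(itertools.product((0, 1), repeat=n),
               key=lambda lab: mrf_energy(lab, intensity, unary, **params))

# 1x3 image: a strong edge between pixels 1 and 2, and an ambiguous middle pixel.
intensity = [[0.0, 0.0, 10.0]]
unary = [[(0.0, 5.0), (1.0, 1.0), (5.0, 0.0)]]   # (cost if background, cost if foreground)
print(brute_force_map(intensity, unary))  # (0, 0, 1)
```

The ambiguous middle pixel joins the background because cutting across the strong intensity edge is almost free, which is exactly what the contrast term is for.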
However
- This energy formulation rarely provides realistic (target-like) results.
Shape-Priors and Segmentation
- Combine object detection with segmentation
- Obj-Cut, Kumar et al., CVPR 05
- Zhao and Davis, ICCV 05
- Obj-Cut
- Shape-Prior Layered Pictorial Structure (LPS)
- Learned exemplars for parts of the LPS model
- Obtained impressive results
(Figure panels: layer 1, layer 2, LPS model)
LPS for Detection
- Learning
- Learnt automatically using a set of examples
- Detection
- Tree of chamfers to detect parts, assembled with a pictorial structure and belief propagation.
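Assembling detected parts with a pictorial structure amounts to min-sum belief propagation on a tree, which is exact. A minimal Python sketch for the simplest tree, a chain of parts, assuming each part has a hypothetical per-location cost (for instance a chamfer score) and consecutive parts pay a deformation cost; on a chain the messages reduce to this dynamic program. The real LPS model has more structure (layers, learned exemplars) that is not shown.

```python
from typing import Callable, List, Sequence, Tuple

def chain_min_sum(unary: Sequence[Sequence[float]],
                  pairwise: Callable[[int, int], float]) -> Tuple[List[int], float]:
    """Exact MAP placement of a chain of parts by min-sum message passing.

    unary[p][l]   : cost of putting part p at location l (e.g. chamfer score)
    pairwise(a, b): deformation cost of part p at a and part p+1 at b
    Returns the best location per part and the total cost."""
    n_locs = len(unary[0])
    msg = list(unary[0])                  # best cost of parts 0..p given part p's location
    backptrs: List[List[int]] = []
    for p in range(1, len(unary)):
        new_msg, arg = [], []
        for b in range(n_locs):
            best_a = min(range(n_locs), key=lambda a: msg[a] + pairwise(a, b))
            arg.append(best_a)
            new_msg.append(msg[best_a] + pairwise(best_a, b) + unary[p][b])
        msg = new_msg
        backptrs.append(arg)
    best = [min(range(n_locs), key=lambda l: msg[l])]
    total = msg[best[0]]
    for arg in reversed(backptrs):        # trace the optimal locations back to part 0
        best.append(arg[best[-1]])
    best.reverse()
    return best, total

# Three parts, three candidate locations, deformation cost = |a - b|.
unary = [[0, 5, 5], [5, 5, 0], [5, 0, 5]]
print(chain_min_sum(unary, lambda a, b: abs(a - b)))  # ([0, 2, 1], 3)
```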
Solve via Integer Programming
- SDP formulation (Torr 2001, AI Stats)
- SOCP formulation (Kumar, Torr and Zisserman, this conference)
- LBP (Huttenlocher, many others)
Obj-Cut
(Figure panels: image; likelihood ratio (colour); likelihood: distance from ?; shape prior: distance from ?)
Integrating Shape-Prior in MRFs
(Figure: MRF for segmentation, with pixels, labels, unary potential, pairwise potential and a Potts-model prior)
Integrating Shape-Prior in MRFs
(Figure: pose-specific MRF, with pixels, labels, unary potential, pairwise potential, a Potts-model prior, and the pose parameters)
Do we really need accurate models?
(Figure: cow instance built from layer 1 and layer 2 under transformations, e.g. T1 with P(T1) = 0.9)
Do we really need accurate models?
- The segmentation boundary can be extracted from edges
- A rough 3D shape prior is enough for region disambiguation
Energy of the Pose-specific MRF
(Figure: energy to be minimized, annotated with its unary term, pairwise potential, Potts model and shape prior)
But what should be the value of ??
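One way to write an energy with these four labelled terms, in our own notation since the slide's symbols did not survive: $\mathbf{x}$ is the binary labelling, $D$ the image data, $\mathcal{N}$ the set of neighbouring pixel pairs, and $\Theta$ the pose parameters. This is a sketch of the structure, not a transcription of the slide.

$$
\Psi(\mathbf{x}, \Theta) = \sum_{i} \Big[ \underbrace{\phi(D \mid x_i)}_{\text{unary term}} + \underbrace{\phi(x_i \mid \Theta)}_{\text{shape prior}} \Big] + \sum_{(i,j) \in \mathcal{N}} \Big[ \underbrace{\phi(D \mid x_i, x_j)}_{\text{pairwise potential}} + \underbrace{\psi(x_i, x_j)}_{\text{Potts model}} \Big]
$$

Dropping the shape-prior term recovers the standard contrast-sensitive Potts MRF; the prior is the only term that depends on the pose.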
The different terms of the MRF
(Figure panels: original image; likelihood of being foreground given a foreground histogram; Grimson-Stauffer segmentation; shape prior model; shape prior (distance transform); likelihood of being foreground given all the terms; resulting graph-cuts segmentation)
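The shape-prior panels turn a distance transform of the projected pose model into per-pixel unary costs. One common choice, assumed here rather than taken from the slide, is a sigmoid of the distance: p(foreground | pose) = 1 / (1 + exp(mu * (d - d_r))), where d is a pixel's distance from the projected model and `d_r`, `mu` are hypothetical width and softness parameters. The costs are negative log-probabilities, computed with a numerically stable softplus.

```python
import math
from typing import Tuple

def softplus(z: float) -> float:
    """log(1 + exp(z)) without overflow for large |z|."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def shape_prior_costs(d: float, d_r: float = 3.0, mu: float = 1.0) -> Tuple[float, float]:
    """Unary shape-prior costs (background, foreground) for a pixel at distance d
    from the projected pose model, assuming p(fg) = 1 / (1 + exp(mu * (d - d_r)))."""
    z = mu * (d - d_r)
    cost_fg = softplus(z)     # -log p(fg) = log(1 + exp(z))
    cost_bg = softplus(-z)    # -log p(bg) = log(1 + exp(-z))
    return cost_bg, cost_fg

# Pixels on, at the edge of, and far outside the projected model.
for d in (0.0, 3.0, 10.0):
    print(d, shape_prior_costs(d))
```

A soft sigmoid rather than a hard mask matches the "rough 3D shape prior" point: near the model boundary the prior is uncertain and leaves the decision to the colour and edge terms.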
Can segment multiple views simultaneously
Solve via gradient descent
- Comparable to level set methods
- Could use other approaches (e.g. ObjCut)
- Need a graph cut per function evaluation
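Pose optimisation by gradient descent can be sketched with central finite differences. The callable `energy(theta)` is a hypothetical stand-in for "solve the pose-specific MRF by min-cut and return the minimum energy", so the returned call count is also the number of graph cuts. The quadratic toy energy is ours, used only to show convergence.

```python
from typing import Callable, List, Sequence, Tuple

def finite_difference_descent(energy: Callable[[List[float]], float],
                              theta: Sequence[float], step: float = 0.1,
                              eps: float = 1e-3, iters: int = 200) -> Tuple[List[float], int]:
    """Minimise energy(theta) by gradient descent with central differences.

    Each energy() call stands for one graph cut, so 'calls' counts max-flow solves."""
    theta = list(theta)
    calls = 0
    for _ in range(iters):
        grad = []
        for k in range(len(theta)):
            up, down = theta.copy(), theta.copy()
            up[k] += eps
            down[k] -= eps
            grad.append((energy(up) - energy(down)) / (2 * eps))
            calls += 2
        theta = [t - step * g for t, g in zip(theta, grad)]
    return theta, calls

# Toy stand-in for min_x E(x, theta): a quadratic bowl centred on pose (1, -2).
toy_energy = lambda th: (th[0] - 1.0) ** 2 + (th[1] + 2.0) ** 2
theta_opt, n_cuts = finite_difference_descent(toy_energy, [0.0, 0.0])
```

The probes at theta plus and minus `eps` differ by a tiny pose change, so consecutive graph cuts are near-identical problems; that is the situation in which reusing the previous solution pays off.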
Formulating the Pose Inference Problem
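Writing $\Psi(\mathbf{x}, \Theta)$ for the pose-specific MRF energy of a labelling $\mathbf{x}$ under pose parameters $\Theta$ (our notation), the inference problem can be stated as a nested minimisation: an inner minimisation over labellings for a fixed pose, solvable exactly by one graph cut, and an outer search over poses.

$$
\Theta^{*} = \arg\min_{\Theta} F(\Theta), \qquad F(\Theta) = \min_{\mathbf{x}} \Psi(\mathbf{x}, \Theta), \qquad \mathbf{x}^{*} = \arg\min_{\mathbf{x}} \Psi(\mathbf{x}, \Theta^{*})
$$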
But
- To compute the MAP of E(x) w.r.t. the pose, the unary terms must be changed at EACH iteration and the max-flow recomputed!
Dynamic Graph Cuts
(Figure: problems PA and PB, contrasting a computationally expensive operation with a cheaper operation)
Dynamic Image Segmentation
(Figure panels: image, segmentation obtained, flows in n-edges)
Our Algorithm
Dynamic Graph Cut vs Active Cuts
- Our method: flow recycling
- AC: cut recycling
- Both methods: tree recycling
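Flow recycling can be illustrated on the terminal edges, which are the only edges whose capacities change when just the unary terms change, as in the pose search. A minimal sketch under the standard construction where each pixel node i has edges s->i and i->t; the function name and interface are hypothetical, and the max-flow solver that would resume augmenting from the updated residual graph is not shown.

```python
from typing import Tuple

def recycle_terminal_edges(res_s: float, res_t: float,
                           old_cap_s: float, old_cap_t: float,
                           new_cap_s: float, new_cap_t: float) -> Tuple[float, float, float]:
    """Update one pixel node's terminal residuals after its unary costs change,
    keeping the flow already pushed by the previous max-flow.

    res_s, res_t: residual capacities of s->i and i->t in the old solution.
    Returns (new res_s, new res_t, shift). If a new capacity is below the flow
    already sent, 'shift' is added to both terminal edges: every s-t cut severs
    exactly one of them, so all cuts grow by the same constant and the minimum
    cut (the segmentation) is unchanged."""
    r_s = res_s + (new_cap_s - old_cap_s)
    r_t = res_t + (new_cap_t - old_cap_t)
    shift = max(0.0, -r_s, -r_t)
    return r_s + shift, r_t + shift, shift

# s->i was saturated (flow 5 of capacity 5); its capacity drops to 3.
print(recycle_terminal_edges(0, 2, 5, 3, 3, 3))
```

Only the changed edges are touched and the remaining flow is kept, so the solver starts from an almost-optimal state instead of from zero.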
Experimental Analysis
Running time of the dynamic algorithm on an MRF consisting of 2×10^5 latent variables connected in a 4-neighborhood.
Segmentation Comparison
(Figure panels: Grimson-Stauffer, Bathia04, our method)
Face Detector and ObjCut
Segmentation
Conclusion
- Combining pose inference and segmentation is worth investigating.
- Tracking = Detection
- Detection = Segmentation
- Tracking = Segmentation.
- Segmentation = SFM ??