Simultaneous Segmentation and 3D Pose Estimation of Humans, or Detection + Segmentation = Tracking?

1
Simultaneous Segmentation and 3D Pose Estimation of Humans
or
Detection + Segmentation = Tracking?

Philip H.S. Torr
Pawan Kumar, Pushmeet Kohli, Matt Bray
Oxford Brookes University
Andrew Zisserman
Oxford
Arasanathan Thayananthan, Bjorn Stenger, Roberto Cipolla
Cambridge

2
Algebra

Unifying Conjecture
Tracking = Detection = Recognition
Detection = Segmentation
therefore
Tracking (pose estimation) = Segmentation?

3
Objective
Aim to get a clean segmentation of a human
Image
Segmentation
Pose Estimate??
4
Developments

ICCV 2003: pose estimation as fast nearest neighbour plus dynamics (inspired by Gavrila, and by Toyama and Blake)
BMVC 2004: parts-based chamfer to make the space of templates more flexible (à la pictorial structures of Huttenlocher)
CVPR 2005: ObjCut, combining segmentation and detection
ECCV 2006: interpolation of poses using the MVRVM (Agarwal and Triggs)
ECCV 2006: combination of pose estimation and segmentation using graph cuts

5
Tracking as Detection (Stenger et al., ICCV 2003)

Detection has become very efficient, e.g. real-time face detection, pedestrian detection
Example: pedestrian detection (Gavrila and Philomin, 1999)
Find a match among a large number of exemplar templates
Issues:
Number of templates needed
Efficient search
Robust cost function

6
Cascaded Classifiers
7
1280x1024 image, 11 subsampling levels, 80s
Average number of filters per patch: 6.7
First filter: 19.8 patches remaining
8
1280x1024 image, 11 subsampling levels, 80s
Average number of filters per patch: 6.7
Filter 10: 0.74 patches remaining
9
1280x1024 image, 11 subsampling levels, 80s
Average number of filters per patch: 6.7
Filter 20: 0.06 patches remaining
10
1280x1024 image, 11 subsampling levels, 80s
Average number of filters per patch: 6.7
Filter 30: 0.01 patches remaining
11
1280x1024 image, 11 subsampling levels, 80s
Average number of filters per patch: 6.7
Filter 70: 0.007 patches remaining
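
The cascade idea on slides 6 to 11 can be sketched as follows. This is only a minimal illustration, not the detector used in the talk: the three stage functions and the toy patches are hypothetical. It shows why the average number of filters evaluated per patch stays small: most patches are rejected by an early stage and never reach the later ones.

```python
from typing import Callable, List, Sequence, Tuple

Patch = Sequence[float]
Stage = Callable[[Patch], bool]  # True means the patch survives this stage


def run_cascade(patches: List[Patch], stages: List[Stage]) -> Tuple[List[int], float]:
    """Return indices of surviving patches and the mean number of stages run per patch."""
    survivors = []
    evaluations = 0
    for idx, patch in enumerate(patches):
        passed = True
        for stage in stages:
            evaluations += 1
            if not stage(patch):
                passed = False
                break  # rejected: later stages are never evaluated
        if passed:
            survivors.append(idx)
    mean_evals = evaluations / len(patches) if patches else 0.0
    return survivors, mean_evals


# Hypothetical stages, ordered from cheap and permissive to strict.
stages = [
    lambda p: sum(p) > 1.0,
    lambda p: max(p) > 0.8,
    lambda p: min(p) > 0.1,
]
patches = [[0.1, 0.2], [0.9, 0.5], [0.95, 0.08], [0.6, 0.7]]
survivors, mean_evals = run_cascade(patches, stages)
print(survivors, mean_evals)
```

Only the second patch passes all three stages; the others are dropped after one, three and two stage evaluations respectively.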
12
Hierarchical Detection

Efficient template matching (Huttenlocher and Olson; Gavrila)
Idea: when matching similar objects, speed up by forming a template hierarchy found by clustering
Match prototypes first; search a sub-tree only if the prototype's cost is below a threshold
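
A minimal sketch of the coarse-to-fine search described on this slide, under simplifying assumptions: templates are single numbers rather than edge maps, the cost is an absolute difference, and the class `TemplateNode`, the function `match_tree` and all thresholds are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class TemplateNode:
    template: float          # stand-in for a real template (e.g. an edge map)
    threshold: float         # descend or accept only if cost is below this
    children: List["TemplateNode"] = field(default_factory=list)


def match_tree(roots: List[TemplateNode],
               cost: Callable[[float], float]) -> Tuple[List[Tuple[float, float]], int]:
    """Depth-first search; a sub-tree is visited only if its prototype matches well enough."""
    matches = []
    evaluations = 0
    stack = list(reversed(roots))
    while stack:
        node = stack.pop()
        evaluations += 1
        c = cost(node.template)
        if c >= node.threshold:
            continue                      # prune the whole sub-tree
        if node.children:
            stack.extend(reversed(node.children))
        else:
            matches.append((node.template, c))
    return matches, evaluations


def leaves(values):
    return [TemplateNode(v, threshold=0.6) for v in values]


# Two prototype clusters, each with three leaf templates (8 nodes in total).
roots = [
    TemplateNode(2.0, threshold=2.5, children=leaves([1.0, 2.0, 3.0])),
    TemplateNode(8.0, threshold=2.5, children=leaves([7.0, 8.0, 9.0])),
]
observation = 2.8
matches, evaluations = match_tree(roots, lambda t: abs(t - observation))
print(matches, evaluations)
```

In this toy run the second cluster is rejected at its prototype, so only 5 of the 8 templates are evaluated.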

13
Trees

These search trees are the same as those used for efficient nearest-neighbour search.
Add a dynamic model and
Detection = Tracking = Recognition

14
Evaluation at Multiple Resolutions

One traversal of tree per time step

15
Evaluation at Multiple Resolutions
Tree: 9000 templates of a pointing hand, rigid
16
Templates at Level 1
17
Templates at Level 2
18
Templates at Level 3
19
Comparison with Particle Filters

This method is grid based
No need to render the model online
Like efficient search
Can always use this as a proposal process for a particle filter if need be.

20
Interpolation, MVRVM, ECCV 2006
Code available.
21
Energy being Optimized, link to graph cuts

Combination of:
Edge term (quickly evaluated using chamfer)
Interior term (quickly evaluated using integral images)
Note that possible templates are a bit like cuts that we put down; one could think of this whole process as a constrained search for the best graph cut.
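
To illustrate why an interior term is quick to evaluate with integral images, here is a small pure-Python sketch. The function names and the toy image are assumptions made for this example; the talk does not specify an implementation. After one pass over the image, the sum inside any axis-aligned box costs four lookups.

```python
def integral_image(img):
    """ii[y][x] holds the sum of img[0:y][0:x]; one extra row and column of zeros."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii, top, left, bottom, right):
    """Sum over rows top..bottom-1 and columns left..right-1, using four lookups."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


# Toy "skin likelihood" image (hypothetical values).
img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
print(box_sum(ii, 1, 1, 3, 3))  # sum of the lower-right 2x2 block
```

A template interior that is approximated by a few rectangles can then be scored with a handful of such box sums, independent of the rectangle sizes.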

22
Likelihood: Edges
3D Model
Input Image
Edge Detection
Projected Contours
Robust Edge Matching
23
Chamfer Matching
Input image
Canny edges
Distance transform
Projected Contours
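
The chamfer pipeline on this slide (edges, distance transform, projected contour) can be sketched in a few lines. This is an illustrative version under stated assumptions: it uses a city-block distance computed by breadth-first search rather than any particular library routine, it assumes the edge map contains at least one edge pixel, and the optional `truncate` parameter is a hypothetical way of making the cost robust to missing edges.

```python
from collections import deque


def distance_transform(edges):
    """City-block distance from every pixel to the nearest edge pixel (multi-source BFS)."""
    h, w = len(edges), len(edges[0])
    dist = [[None] * w for _ in range(h)]
    queue = deque()
    for y in range(h):
        for x in range(w):
            if edges[y][x]:
                dist[y][x] = 0
                queue.append((y, x))
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and dist[ny][nx] is None:
                dist[ny][nx] = dist[y][x] + 1
                queue.append((ny, nx))
    return dist


def chamfer_cost(dist, contour, offset, truncate=None):
    """Mean distance-transform value under the contour points placed at `offset`."""
    oy, ox = offset
    total = 0
    for y, x in contour:
        d = dist[y + oy][x + ox]
        if truncate is not None:
            d = min(d, truncate)  # cap the penalty for points far from any edge
        total += d
    return total / len(contour)


# Toy edge map: a vertical edge in column 2 of a 5x5 image.
edges = [[1 if x == 2 else 0 for x in range(5)] for _ in range(5)]
dist = distance_transform(edges)
contour = [(0, 0), (1, 0), (2, 0)]            # a short vertical contour template
print(chamfer_cost(dist, contour, (1, 2)))    # placed on the edge
print(chamfer_cost(dist, contour, (1, 0)))    # placed two columns away
```

The distance transform is computed once per image, so each template placement only costs one lookup per contour point.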
24
Likelihood: Colour
3D Model
Input Image
Projected Silhouette
Skin Colour Model
Template Matching
25
Template Matching

Template Matching = constrained search for a cut/segmentation?
Detection = Segmentation?

26
Objective
Aim to get a clean segmentation of a human
Image
Segmentation
Pose Estimate??
27
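MRF for Interactive Image Segmentation, Boykov and Jolly, ICCV 2001

$E_{\mathrm{MRF}}(x) = \text{unary likelihood} + \text{contrast term} + \text{uniform prior (Potts model)}$
The contrast term and the uniform prior are the pair-wise terms.
Maximum-a-posteriori (MAP) solution: $x^* = \arg\min_x E(x)$
Figure labels: Data (D), Unary likelihood, Pair-wise Terms, MAP Solution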
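
A toy version of this energy may make the roles of the terms concrete. Everything below is an assumption made for illustration: the four-pixel chain, the unary costs, the weights, and the exponential form chosen for the contrast term. The MAP labelling is found by brute-force enumeration, which only works for tiny problems; it stands in for the graph cut used in practice.

```python
import math
from itertools import product


def mrf_energy(labels, unary, pixels, edges, potts=1.0, contrast_weight=2.0, sigma=0.3):
    """Unary costs plus, for each neighbouring pair with different labels,
    a Potts penalty and a contrast-dependent penalty."""
    energy = sum(unary[i][lab] for i, lab in enumerate(labels))
    for i, j in edges:
        if labels[i] != labels[j]:
            contrast = math.exp(-((pixels[i] - pixels[j]) ** 2) / (2 * sigma ** 2))
            energy += potts + contrast_weight * contrast
    return energy


def map_brute_force(unary, pixels, edges, **kwargs):
    """Enumerate all binary labellings and return the one with minimum energy."""
    return min(product((0, 1), repeat=len(unary)),
               key=lambda x: mrf_energy(x, unary, pixels, edges, **kwargs))


# Four pixels in a chain; label 0 = background, 1 = foreground.
pixels = [0.1, 0.2, 0.9, 1.0]                    # intensities (hypothetical)
unary = [(0.2, 2.0), (1.0, 0.9), (1.4, 0.3), (1.6, 0.1)]  # (cost of 0, cost of 1)
edges = [(0, 1), (1, 2), (2, 3)]

with_pairwise = map_brute_force(unary, pixels, edges)
unary_only = map_brute_force(unary, pixels, edges, potts=0.0, contrast_weight=0.0)
print(with_pairwise, unary_only)
```

With the pair-wise terms switched off, the second pixel follows its slightly misleading unary cost; with them on, the label boundary moves to the strong intensity edge between the second and third pixels.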
28
However

This energy formulation rarely provides realistic
(target-like) results.

29
Shape-Priors and Segmentation

Combine object detection with segmentation
Obj-Cut, Kumar et al., CVPR 05
Zhao and Davis, ICCV 05
Obj-Cut
Shape-Prior Layered Pictorial Structure (LPS)
Learned exemplars for parts of the LPS model
Obtained impressive results

Layer 1
Layer 2
LPS model
30
LPS for Detection

Learning
Learnt automatically using a set of examples
Detection
Tree of chamfers to detect parts, assemble with pictorial structure and belief propagation.

31
Solve via Integer Programming

SDP formulation (Torr 2001, AISTATS)
SOCP formulation (Kumar, Torr and Zisserman, this conference)
LBP (Huttenlocher, many)

32
Obj-Cut
Image
Likelihood Ratio (Colour)
Likelihood Distance from ?
Distance from ?
Shape Prior
33
Integrating Shape-Prior in MRFs
Pairwise potential
Pixels
Labels
Unary potential
Prior Potts model
MRF for segmentation
34
Integrating Shape-Prior in MRFs
Pairwise potential
Pixels
Labels
Unary potential
Prior Potts model
?
Pose parameters
Pose-specific MRF
35
Do we really need accurate models?
Cow Instance
Layer 2
Transformations
T1, P(T1) = 0.9
Layer 1
36
Do we really need accurate models?

Segmentation boundary can be extracted from edges
A rough 3D shape prior is enough for region disambiguation

37
Energy of the Pose-specific MRF
Energy to be minimized
Pairwise potential
Unary term
Potts model
Shape prior
But what should be the value of the pose parameters?
38
The different terms of the MRF
Likelihood of being foreground given a foreground
histogram
Likelihood of being foreground given all the terms
Shape prior model
Grimson-Stauffer segmentation
Shape prior (distance transform)
Resulting Graph-Cuts segmentation
Original image
39
Can segment multiple views simultaneously
40
Solve via gradient descent

Comparable to level set methods
Could use other approaches (e.g. Objcut)
Need a graph cut per function evaluation

41
Formulating the Pose Inference Problem
42
But

To compute the MAP of E(x) w.r.t. the pose, the unary terms change at EACH iteration and the max-flow must be recomputed!

43
Dynamic Graph Cuts
PA
cheaper operation
PB
computationally expensive operation
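
The saving from reusing the previous solution can be illustrated with a small max-flow sketch. It is a simplification, not the dynamic graph cut algorithm itself: it uses plain breadth-first augmenting paths, and it only handles capacities that increase between problems, in which case the old flow stays valid and augmentation simply resumes from the old residual graph. Handling capacity decreases needs extra bookkeeping that is not shown. The class `FlowNetwork` and the toy graph are hypothetical.

```python
from collections import defaultdict, deque


class FlowNetwork:
    def __init__(self):
        self.residual = defaultdict(lambda: defaultdict(int))
        self.flow_value = 0

    def add_capacity(self, u, v, cap):
        """Add (or increase) capacity on edge u->v; any existing flow stays valid."""
        self.residual[u][v] += cap
        self.residual[v][u] += 0  # make sure the reverse residual entry exists

    def _augment(self, s, t):
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, r in self.residual[u].items():
                if r > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return 0
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(self.residual[u][v] for u, v in path)
        for u, v in path:
            self.residual[u][v] -= push
            self.residual[v][u] += push
        return push

    def maxflow(self, s, t):
        """Augment from the current residual graph; return the number of paths used."""
        paths = 0
        while True:
            pushed = self._augment(s, t)
            if pushed == 0:
                return paths
            self.flow_value += pushed
            paths += 1


def build(extra=0):
    net = FlowNetwork()
    for u, v, cap in [("s", "a", 3 + extra), ("s", "b", 2),
                      ("a", "t", 2 + extra), ("b", "t", 3), ("a", "b", 1)]:
        net.add_capacity(u, v, cap)
    return net


net = build()
first_paths = net.maxflow("s", "t")
first_flow = net.flow_value

# The terminal capacities of node "a" change slightly (like changed unary terms).
net.add_capacity("s", "a", 1)
net.add_capacity("a", "t", 1)
recycled_paths = net.maxflow("s", "t")    # resumes from the old flow

fresh = build(extra=1)
fresh_paths = fresh.maxflow("s", "t")     # same problem solved from scratch
print(first_paths, recycled_paths, fresh_paths, net.flow_value, fresh.flow_value)
```

Both runs reach the same flow value on the updated problem, but the recycled run needs a single augmenting path where the fresh run needs three.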
44
Dynamic Image Segmentation
Image
Segmentation Obtained
Flows in n-edges
45
Our Algorithm
46
Dynamic Graph Cut vs Active Cuts

Our method: flow recycling
AC: cut recycling
Both methods: tree recycling

47
Experimental Analysis
Running time of the dynamic algorithm
MRF consisting of 2×10^5 latent variables connected in a 4-neighborhood.
48
Segmentation Comparison
Grimson-Stauffer
Bhatia04
Our method
49
Face Detector and ObjCut
50
Segmentation
51
Segmentation
52
Conclusion