Simultaneous Segmentation and 3D Pose Estimation of Humans or Detection + Segmentation = Tracking?

Simultaneous Segmentation and 3D Pose Estimation of Humans or Detection + Segmentation = Tracking? Philip H.S. Torr, Pawan Kumar, Pushmeet Kohli, Matt Bray
1
Simultaneous Segmentation and 3D Pose Estimation
of Humans or Detection + Segmentation = Tracking?
  • Philip H.S. Torr
  • Pawan Kumar, Pushmeet Kohli, Matt Bray
  • Oxford Brookes University
  • Andrew Zisserman
  • Oxford
  • Arasanathan Thayananthan, Bjorn Stenger, Roberto
    Cipolla
  • Cambridge

2
Algebra
  • Unifying Conjecture
  • Tracking = Detection = Recognition
  • Detection = Segmentation
  • therefore
  • Tracking (pose estimation) = Segmentation?

3
Objective
Aim to get a clean segmentation of a human
Image
Segmentation
Pose Estimate??
4
Developments
  • ICCV 2003: pose estimation as fast nearest
    neighbour plus dynamics (inspired by Gavrila and
    by Toyama and Blake)
  • BMVC 2004: parts-based chamfer matching to make the
    space of templates more flexible (a la pictorial
    structures of Huttenlocher)
  • CVPR 2005: ObjCut, combining segmentation and
    detection.
  • ECCV 2006: interpolation of poses using the MVRVM
    (Agarwal and Triggs)
  • ECCV 2006: combination of pose estimation and
    segmentation using graph cuts.

5
Tracking as Detection (Stenger et al ICCV 2003)
  • Detection has become very efficient,
  • e.g. real-time face detection, pedestrian
    detection
  • Example: pedestrian detection (Gavrila and
    Philomin, 1999)
  • Find a match among a large number of exemplar
    templates
  • Issues:
  • Number of templates needed
  • Efficient search
  • Robust cost function

6
Cascaded Classifiers
7
1280x1024 image, 11 subsampling levels, 80 s
Average number of filters per patch: 6.7
First filter: 19.8% of patches remaining
8
1280x1024 image, 11 subsampling levels, 80 s
Average number of filters per patch: 6.7
Filter 10: 0.74% of patches remaining
9
1280x1024 image, 11 subsampling levels, 80 s
Average number of filters per patch: 6.7
Filter 20: 0.06% of patches remaining
10
1280x1024 image, 11 subsampling levels, 80 s
Average number of filters per patch: 6.7
Filter 30: 0.01% of patches remaining
11
1280x1024 image, 11 subsampling levels, 80 s
Average number of filters per patch: 6.7
Filter 70: 0.007% of patches remaining
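
The early-rejection behaviour reported on the cascade slides can be illustrated with a minimal sketch. It assumes each filter is a cheap boolean test and that a patch is dropped at the first filter it fails; the toy filters and integer "patches" below are hypothetical stand-ins, not the classifiers used in the talk.

```python
def run_cascade(patches, filters):
    """Return (surviving patches, average number of filters evaluated per patch)."""
    survivors = []
    evaluations = 0
    for patch in patches:
        accepted = True
        for passes in filters:
            evaluations += 1
            if not passes(patch):
                accepted = False  # rejected: later filters are never run
                break
        if accepted:
            survivors.append(patch)
    average = evaluations / len(patches) if patches else 0.0
    return survivors, average


# Hypothetical toy data: patches are integers, filters are threshold tests.
toy_filters = [lambda p: p > 10, lambda p: p % 2 == 0, lambda p: p > 50]
toy_patches = list(range(100))
survivors, avg_filters = run_cascade(toy_patches, toy_filters)
print(len(survivors), avg_filters)
```

Because most patches are rejected by the first one or two tests, the average number of filters evaluated per patch stays far below the length of the cascade.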
12
Hierarchical Detection
  • Efficient template matching (Huttenlocher and
    Olson, Gavrila)
  • Idea: when matching similar objects, speed up by
    forming a template hierarchy found by clustering
  • Match prototypes first; search a sub-tree only if
    the prototype's cost is below a threshold
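
A minimal sketch of this coarse-to-fine search, assuming a tree whose inner nodes hold cluster prototypes and whose leaves hold the actual templates. The `TemplateNode` class, the scalar "templates" and the absolute-difference cost are hypothetical placeholders for real templates and a chamfer-style cost.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TemplateNode:
    template: float    # stand-in for a template (a prototype at inner nodes)
    threshold: float   # descend or accept only if the cost is below this value
    children: List["TemplateNode"] = field(default_factory=list)


def search(node, cost_fn, matches=None):
    """Depth-first search that prunes every sub-tree whose prototype matches badly."""
    if matches is None:
        matches = []
    cost = cost_fn(node.template)
    if cost >= node.threshold:
        return matches                      # prune: children are never evaluated
    if not node.children:
        matches.append((cost, node.template))
    for child in node.children:
        search(child, cost_fn, matches)
    return matches


# Hypothetical toy hierarchy over 1D "templates".
tree = TemplateNode(5.0, 10.0, [
    TemplateNode(2.5, 2.0, [TemplateNode(2.0, 0.5), TemplateNode(3.0, 0.5)]),
    TemplateNode(7.5, 2.0, [TemplateNode(7.0, 0.5), TemplateNode(8.0, 0.5)]),
])
observation = 3.2
evaluated = []


def toy_cost(template):
    evaluated.append(template)
    return abs(template - observation)


matches = search(tree, toy_cost)
print([t for _, t in matches], evaluated)
```

In this toy run the sub-tree under prototype 7.5 is pruned, so only five of the seven nodes are ever evaluated.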

13
Trees
  • These search trees are the same as used for
    efficient nearest neighbour.
  • Add a dynamic model and
  • Detection = Tracking = Recognition

14
Evaluation at Multiple Resolutions
  • One traversal of tree per time step

15
Evaluation at Multiple Resolutions
Tree: 9000 templates of a pointing hand, rigid
16
Templates at Level 1
17
Templates at Level 2
18
Templates at Level 3
19
Comparison with Particle Filters
  • This method is grid based
  • No need to render the model online
  • Like efficient search
  • Can always use this as a proposal process for a
    particle filter if need be.

20
Interpolation, MVRVM, ECCV 2006
Code available.
21
Energy being Optimized, link to graph cuts
  • Combination of:
  • Edge term (quickly evaluated using chamfer)
  • Interior term (quickly evaluated using integral
    images)
  • Note that the possible templates are a bit like
    cuts that we put down; one could think of this
    whole process as a constrained search for the
    best graph cut.
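
As an illustration of why the interior term is cheap, here is a minimal integral-image sketch in plain Python. It assumes the interior term reduces to sums of per-pixel scores over axis-aligned boxes; the function names are illustrative and not taken from the talk.

```python
def integral_image(img):
    """ii[y][x] holds the sum of img over rows < y and columns < x."""
    height, width = len(img), len(img[0])
    ii = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii, top, left, bottom, right):
    """Sum over rows top..bottom-1 and columns left..right-1 using four lookups."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


scores = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]
ii = integral_image(scores)
print(box_sum(ii, 1, 1, 3, 3))
```

After one pass to build the table, every box sum costs four lookups regardless of the box size, which is what makes evaluating many templates affordable.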

22
Likelihood: Edges
3D Model
Input Image
Edge Detection
Projected Contours
Robust Edge Matching
23
Chamfer Matching
Input image
Canny edges
Distance transform
Projected Contours
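
The pipeline on this slide (edges, distance transform, projected contour) can be sketched as below. This is a simplified illustration under stated assumptions: a city-block distance transform computed by breadth-first search stands in for whatever transform the authors used, the edge map contains at least one edge pixel, all shifted template points stay inside the image, and the truncation value is a hypothetical way of making the cost robust.

```python
from collections import deque


def distance_transform(edges):
    """City-block distance from each pixel to the nearest edge pixel."""
    height, width = len(edges), len(edges[0])
    dist = [[None] * width for _ in range(height)]
    queue = deque()
    for y in range(height):
        for x in range(width):
            if edges[y][x]:
                dist[y][x] = 0
                queue.append((y, x))
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < height and 0 <= nx < width and dist[ny][nx] is None:
                dist[ny][nx] = dist[y][x] + 1
                queue.append((ny, nx))
    return dist


def chamfer_cost(dist, template_points, offset, truncate=5):
    """Mean truncated distance from the shifted template points to the edges."""
    oy, ox = offset
    total = sum(min(dist[y + oy][x + ox], truncate) for y, x in template_points)
    return total / len(template_points)


# Toy edge map: a vertical edge segment in column 2, rows 1 to 3.
edges = [[0, 0, 0, 0, 0],
         [0, 0, 1, 0, 0],
         [0, 0, 1, 0, 0],
         [0, 0, 1, 0, 0],
         [0, 0, 0, 0, 0]]
dist = distance_transform(edges)
template = [(0, 0), (1, 0), (2, 0)]   # a vertical 3-pixel contour
print(chamfer_cost(dist, template, (1, 2)), chamfer_cost(dist, template, (1, 0)))
```

The distance transform is computed once per image; scoring a template at any position is then just a handful of lookups.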
24
Likelihood: Colour
3D Model
Input Image
Projected Silhouette
Skin Colour Model
Template Matching
25
Template Matching
  • Template Matching = constrained search for a
    cut/segmentation?
  • Detection = Segmentation?

26
Objective
Aim to get a clean segmentation of a human
Image
Segmentation
Pose Estimate??
27
MRF for Interactive Image Segmentation (Boykov
and Jolly, ICCV 2001)
Energy_MRF = Unary likelihood + Pair-wise Terms
Pair-wise Terms = Contrast Term + Uniform Prior (Potts Model)
Maximum-a-posteriori (MAP) solution: x* = arg min_x E(x)
Figure panels: Data (D), Unary likelihood, MAP Solution
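
To make the three terms concrete, here is a toy sketch on a 1D chain of four pixels. The specific unary cost, the Gaussian form of the contrast term, the parameter values, and the brute-force search are all assumptions chosen for illustration; real images need a graph cut rather than enumeration.

```python
import math
from itertools import product


def energy(labels, intensities, potts=0.2, contrast=1.0, sigma=0.3):
    """Unary likelihood plus contrast-weighted Potts penalty on a 1D chain."""
    # Assumed unary cost: bright pixels prefer label 1 (foreground).
    unary = sum((1.0 - v) if x == 1 else v for x, v in zip(labels, intensities))
    pairwise = 0.0
    for i in range(len(labels) - 1):
        if labels[i] != labels[i + 1]:
            diff = intensities[i] - intensities[i + 1]
            # Cutting across a strong intensity edge is cheaper than across a weak one.
            pairwise += potts + contrast * math.exp(-diff * diff / (2 * sigma * sigma))
    return unary + pairwise


def map_labelling(intensities):
    """Brute-force arg min of the energy; only feasible for tiny problems."""
    candidates = product((0, 1), repeat=len(intensities))
    return min(candidates, key=lambda x: energy(x, intensities))


pixels = [0.1, 0.2, 0.9, 0.8]
best = map_labelling(pixels)
print(best, energy(best, pixels))
```

The minimiser places the label boundary at the large intensity jump, where the contrast term makes a discontinuity cheapest.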
28
However
  • This energy formulation rarely provides realistic
    (target-like) results.

29
Shape-Priors and Segmentation
  • Combine object detection with segmentation
  • Obj-Cut, Kumar et al., CVPR 05
  • Zhao and Davis, ICCV 05
  • Obj-Cut:
  • Shape prior: Layered Pictorial Structure (LPS)
  • Learned exemplars for parts of the LPS model
  • Obtained impressive results



Layer 1
Layer 2
LPS model
30
LPS for Detection
  • Learning:
  • Learnt automatically using a set of examples
  • Detection:
  • Tree of chamfers to detect parts, assembled with
    pictorial structure and belief propagation.

31
Solve via Integer Programming
  • SDP formulation (Torr 2001, AISTATS)
  • SOCP formulation (Kumar, Torr and Zisserman, this
    conference)
  • LBP (Huttenlocher, many)

32
Obj-Cut
Image
Likelihood Ratio (Colour)
Likelihood Distance from ?
Distance from ?
Shape Prior
33
Integrating Shape-Prior in MRFs
Pairwise potential
Pixels
Labels
Unary potential
Prior Potts model
MRF for segmentation
34
Integrating Shape-Prior in MRFs
Pairwise potential
Pixels
Labels
Unary potential
Prior Potts model
?
Pose parameters
Pose-specific MRF
35
Do we really need accurate models?
Cow Instance
Layer 2
Transformations
T1, P(T1) = 0.9
Layer 1
36
Do we really need accurate models?
  • Segmentation boundary can be extracted from edges
  • Rough 3D Shape-prior enough for region
    disambiguation

37
Energy of the Pose-specific MRF
Energy to be minimized
Pairwise potential
Unary term
Potts model
Shape prior
But what should be the value of the pose parameters?
38
The different terms of the MRF
Likelihood of being foreground given a foreground
histogram
Likelihood of being foreground given all the terms
Shape prior model
Grimson-Stauffer segmentation
Shape prior (distance transform)
Resulting Graph-Cuts segmentation
Original image
39
Can segment multiple views simultaneously
40
Solve via gradient descent
  • Comparable to level set methods
  • Could use other approaches (e.g. Objcut)
  • Need a graph cut per function evaluation

41
Formulating the Pose Inference Problem
42
But
  • to compute the MAP of E(x) w.r.t. the pose, the
    unary terms change at EACH iteration and the
    max-flow must be recomputed!

43
Dynamic Graph Cuts
PA
cheaper operation
PB
computationally expensive operation
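
The idea of reusing the previous flow when only the unary (terminal) capacities change can be sketched as follows. This is a simplified illustration, not the authors' implementation: it uses plain breadth-first augmenting paths, integer capacities, and a toy three-node graph. It also assumes the standard trick that adding the same constant to both terminal edges of a node leaves the minimum cut unchanged, which is used here to repair a residual capacity that has become negative. All function names are hypothetical.

```python
from collections import deque

S, T = "s", "t"


def build_residual(source_cap, sink_cap, edges):
    """Residual graph as a dict of dicts; pairwise edges are undirected."""
    res = {S: {}, T: {}}
    for node in source_cap:
        res[node] = {}
    for node in source_cap:
        res[S][node], res[node][S] = source_cap[node], 0
        res[node][T], res[T][node] = sink_cap[node], 0
    for (i, j), cap in edges.items():
        res[i][j] = cap
        res[j][i] = cap
    return res


def augment(res):
    """Push flow along shortest augmenting paths; return the flow added."""
    added = 0
    while True:
        parent = {S: None}
        queue = deque([S])
        while queue and T not in parent:
            u = queue.popleft()
            for v, cap in res[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if T not in parent:
            return added
        path, v = [], T
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(res[u][w] for u, w in path)
        for u, w in path:
            res[u][w] -= bottleneck
            res[w][u] += bottleneck
        added += bottleneck


def update_terminal(res, node, delta_source, delta_sink):
    """Change a node's terminal capacities while keeping the existing flow."""
    res[S][node] += delta_source
    res[node][T] += delta_sink
    if res[S][node] < 0:               # capacity fell below the flow already sent
        res[node][T] -= res[S][node]   # add the same constant to both t-edges
        res[S][node] = 0
    if res[node][T] < 0:
        res[S][node] -= res[node][T]
        res[node][T] = 0


def source_side(res):
    """Nodes reachable from the source in the residual graph (one side of the cut)."""
    seen, queue = {S}, deque([S])
    while queue:
        u = queue.popleft()
        for v, cap in res[u].items():
            if cap > 0 and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


src = {"a": 5, "b": 2, "c": 1}
snk = {"a": 1, "b": 3, "c": 6}
pairwise = {("a", "b"): 2, ("b", "c"): 2}

res = build_residual(src, snk, pairwise)
augment(res)                                  # expensive: solve from scratch
update_terminal(res, "b", 6 - src["b"], 0 - snk["b"])
extra = augment(res)                          # cheaper: only the missing flow

fresh = build_residual({**src, "b": 6}, {**snk, "b": 0}, pairwise)
augment(fresh)
print(source_side(res) == source_side(fresh), extra)
```

In this toy case the second solve only has to push one extra unit of flow, yet it yields the same segmentation as solving the updated problem from scratch.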
44
Dynamic Image Segmentation
Image
Segmentation Obtained
Flows in n-edges
45
Our Algorithm
46
Dynamic Graph Cut vs Active Cuts
  • Our method: flow recycling
  • AC: cut recycling
  • Both methods: tree recycling

47
Experimental Analysis
Running time of the dynamic algorithm
MRF consisting of 2 x 10^5 latent variables
connected in a 4-neighborhood.
48
Segmentation Comparison
Grimson-Stauffer
Bhatia04
Our method
49
Face Detector and ObjCut
50
Segmentation
51
Segmentation
52
Conclusion
  • Combining pose inference and segmentation is worth
    investigating.
  • Tracking = Detection
  • Detection = Segmentation
  • Tracking = Segmentation.
  • Segmentation = SFM ??