Title: Part-based Object Detection/Recognition
1Part-based Object Detection/Recognition
2Recap of last Talk
- We can extract salient features (e.g. SIFT features) from images
- These salient features can be used for object recognition using the bag-of-words model
- The bag-of-words model, however, ignores the position of feature points
3Part-Based Object Recognition/Detection
- Representation
- Models to represent the geometric relationship between parts
- Learning
- Find the parameters for the models
- Recognition
- Recognize/locate the objects
4Representation
- Object as set of parts
- Generative representation
- Model
- Relative locations between parts
- Appearance of part
- Issues
- How to model location
- How to represent appearance
- Sparse or dense (pixels or regions)
- How to handle occlusion/clutter
Figure from Fischler73
5Today's Coverage
- Constellation Model
- Fergus, R., Perona, P. and Zisserman, A. Object Class Recognition by Unsupervised Scale-Invariant Learning, CVPR 2003. (Winner of the CVPR 2003 Best Paper Prize)
- Pictorial Structure Representation
- P. Felzenszwalb and D. Huttenlocher. Pictorial Structures for Object Recognition. International Journal of Computer Vision, 61(1):55-79, January 2005.
- Shape Matching
- A. C. Berg, T. L. Berg, J. Malik. Shape Matching and Object Recognition using Low Distortion Correspondence, CVPR 2005.
7Part-based Matching
- Model with P parts
- Image with N possible locations for each part
- N^P combinations!
- Use a vector h with |h| = P, where h(i) = k means the ith part of the object is assigned to the kth detected feature, and h(i) = 0 means the ith part is occluded
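The size of this search space can be illustrated with a minimal sketch. The function names below are hypothetical, and the sketch assumes that several parts may pick the same feature; with the extra "occluded" choice the count becomes (N+1)^P rather than N^P.

```python
from itertools import product


def enumerate_hypotheses(num_parts, num_features):
    """Return an iterator over every hypothesis vector h (a tuple of length P).

    h[i] == k (1-based) assigns part i to detected feature k;
    h[i] == 0 marks part i as occluded.
    """
    return product(range(num_features + 1), repeat=num_parts)


def count_hypotheses(num_parts, num_features, allow_occlusion=True):
    """Count hypotheses without enumerating them."""
    choices = num_features + 1 if allow_occlusion else num_features
    return choices ** num_parts
```

For example, with P = 6 parts and N = 30 features, count_hypotheses(6, 30, allow_occlusion=False) is 30^6 = 729,000,000, which is why exhaustive search quickly becomes expensive.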
8Model Representation(1)
- In a query image, assume that we identify N interesting feature points with locations X, scales S, and appearances A
- We now make a Bayesian decision, R
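As a reminder of what the decision looks like, the likelihood ratio R can be written as follows. This follows the formulation in Fergus et al. (CVPR 2003) as best recalled here; theta denotes the object model parameters and theta_bg the background model parameters.

```latex
R = \frac{p(\text{Object} \mid X, S, A)}{p(\text{No object} \mid X, S, A)}
  = \frac{p(X, S, A \mid \text{Object})\, p(\text{Object})}
         {p(X, S, A \mid \text{No object})\, p(\text{No object})}
  \approx \frac{p(X, S, A \mid \theta)\, p(\text{Object})}
               {p(X, S, A \mid \theta_{bg})\, p(\text{No object})}
```

The object is declared present when R exceeds a chosen threshold.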
9Model Representation(2)
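The likelihood is obtained by summing over all hypotheses h and factoring into the four terms covered on the following slides. The factorization below is a sketch based on Fergus et al. (CVPR 2003); H denotes the set of all valid hypothesis vectors.

```latex
p(X, S, A \mid \theta)
  = \sum_{h \in H} p(X, S, A, h \mid \theta)
  = \sum_{h \in H}
      \underbrace{p(A \mid X, S, h, \theta)}_{\text{appearance}}\,
      \underbrace{p(X \mid S, h, \theta)}_{\text{shape}}\,
      \underbrace{p(S \mid h, \theta)}_{\text{relative scale}}\,
      \underbrace{p(h \mid \theta)}_{\text{occlusion}}
```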
10Model Representation(3)
- Appearance Model
- Notes
- G is the Gaussian Distribution
- Each part p has a Gaussian density
- d_p is the pth entry of the occlusion vector d, where d = sign(h)
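A minimal sketch of how the occlusion vector gates the appearance term is given below. It assumes 1-D appearance descriptors and scalar variances for brevity, and it assumes the part densities are compared against a single background Gaussian, as in the constellation model paper; the function and parameter names are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian G(x | mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


def appearance_ratio(h, descriptors, part_params, bg_params):
    """Product over parts of (G_part / G_background) ** d_p, with d = sign(h).

    h[p] == k (1-based) assigns part p to descriptors[k - 1]; 0 means occluded.
    part_params[p] and bg_params are (mean, variance) pairs.
    """
    ratio = 1.0
    for p, k in enumerate(h):
        d_p = 1 if k > 0 else 0  # occluded parts contribute a factor of 1
        if d_p:
            a = descriptors[k - 1]
            mean, var = part_params[p]
            ratio *= gaussian_pdf(a, mean, var) / gaussian_pdf(a, *bg_params)
    return ratio
```

A hypothesis that assigns each part to a feature resembling that part's mean appearance scores higher than one that swaps the assignments.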
11Model Representation(3)
- Shape Model
- A joint Gaussian distribution
- The covariance matrix is a full matrix
- α is the image area over which background features are assumed to be uniformly spread; f is the number of foreground (non-occluded) parts in the hypothesis
12Model Representation(4)
- Relative Scale Model
- Occlusion Model
- where M is the mean of the Poisson distribution
13Learning(1)
- Task: estimate the model parameters
- Maximum-likelihood (ML) estimate using training images
14Learning(2)
- Varying levels of supervision
- Unsupervised
- Image labels
- Object centroid/bounding box
- Segmented object
- Manual correspondence (typically sub-optimal)
- Generative models naturally incorporate labelling information (or lack of it)
- If the correspondence between parts and detected features in the training data is not given, use the EM algorithm to estimate the parameters
Contains a motorbike
15Learning using EM
- Task: estimation of model parameters
- Chicken-and-egg type problem, since we initially know neither:
- Model parameters
- Assignment of regions to parts
- Let the assignments be a hidden variable and use the EM algorithm to learn them and the model parameters
16Learning procedure
- Find regions and their location and appearance
- Initialize model parameters
- Use EM and iterate to convergence
- E-step: Compute assignments for which regions belong to which part
- M-step: Update model parameters
- Trying to maximize likelihood: consistency in shape and appearance
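The E-step/M-step loop above can be sketched on a toy problem. This is a simplified stand-in, not the constellation model's actual learning code: it assumes 1-D appearance descriptors, one Gaussian per part, no shape or scale terms, and independent soft assignments per region. The function name and the quantile-based initialization are assumptions made for this illustration.

```python
import numpy as np


def em_assign_regions(descriptors, n_parts, n_iters=50):
    """Toy EM: soft-assign 1-D region descriptors to parts.

    Returns (means, variances, weights, responsibilities).
    """
    x = np.asarray(descriptors, dtype=float)
    # Initialize model parameters: means spread over the data quantiles,
    # a shared variance, and uniform mixing weights.
    means = np.quantile(x, (np.arange(n_parts) + 0.5) / n_parts)
    variances = np.full(n_parts, x.var() + 1e-6)
    weights = np.full(n_parts, 1.0 / n_parts)
    for _ in range(n_iters):
        # E-step: responsibility of each part for each region.
        diff = x[:, None] - means[None, :]
        log_p = (-0.5 * (diff ** 2 / variances + np.log(2 * np.pi * variances))
                 + np.log(weights))
        log_p -= log_p.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from the soft assignments.
        nk = resp.sum(axis=0) + 1e-12
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk + 1e-6
        weights = nk / nk.sum()
    return means, variances, weights, resp
```

On data drawn from two well-separated groups, the estimated means settle near the group centers and the responsibilities become nearly hard assignments.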
18Pictorial Structure
- Represent an object by a collection of parts arranged in a deformable configuration
- Model appearance of each part separately
- Deformable configuration using spring-like connections between pairs of parts
- Matching as an optimization problem
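The optimization problem can be written as an energy minimization. The form below follows Felzenszwalb and Huttenlocher (IJCV 2005) as best recalled here: l_i is the location of part v_i, m_i(l_i) is the cost of a poor appearance match at that location, d_ij is the spring-like deformation cost, and E is the set of connected part pairs.

```latex
L^{*} = \arg\min_{L} \left( \sum_{i=1}^{n} m_i(l_i)
        + \sum_{(v_i, v_j) \in E} d_{ij}(l_i, l_j) \right)
```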
19Efficient Optimization Search
- Previous methods have used heuristics or local search techniques that do not find an optimal solution and depend on having a good initialization
- This paper presents an efficient search for the optimal solution given that
- the graph G is acyclic (i.e., forms a tree)
- the relationship between connected pairs of parts is a Mahalanobis distance between transformed locations
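Why a tree makes the optimum tractable can be seen in a small dynamic-programming sketch. This is a naive min-sum pass over a tree with a discrete set of locations; it does not include the distance-transform speed-up that the paper derives from the Mahalanobis form of the deformation cost. The function name, argument layout, and cost callbacks are assumptions for illustration.

```python
def match_tree(match_cost, deform_cost, children, root=0):
    """Exact minimum-cost placement of tree-structured parts.

    match_cost[j][l]: appearance cost of placing part j at location index l.
    deform_cost(parent, child, l_parent, l_child): pairwise spring cost.
    children: dict mapping a part to the list of its child parts.
    Returns (best_cost, {part: location index}).
    """
    n_locs = len(match_cost[root])
    # tables[c][lp] = (best cost of c's subtree, best location of c)
    # given that c's parent sits at location lp.
    tables = {}

    def subtree_cost(j):
        cost = list(match_cost[j])
        for c in children.get(j, []):
            child_cost = subtree_cost(c)
            tables[c] = [
                min((child_cost[lc] + deform_cost(j, c, lj, lc), lc)
                    for lc in range(n_locs))
                for lj in range(n_locs)
            ]
            for lj in range(n_locs):
                cost[lj] += tables[c][lj][0]
        return cost

    root_cost = subtree_cost(root)
    best_root = min(range(n_locs), key=root_cost.__getitem__)
    # Backtrack from the root to recover each child's best location.
    placement = {root: best_root}
    stack = [root]
    while stack:
        j = stack.pop()
        for c in children.get(j, []):
            placement[c] = tables[c][placement[j]][1]
            stack.append(c)
    return root_cost[best_root], placement
```

Each edge is processed once, so the work grows linearly with the number of parts and quadratically with the number of candidate locations, instead of exponentially as in exhaustive search.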
20Mahalanobis Distance (Bob Fisher)
- The distance between two N-dimensional points, scaled by the statistical variation in each component of the point. For example, if X and Y are two points from the same distribution, which has covariance matrix C, then the Mahalanobis distance is given by
- $\sqrt{(X-Y)^\top C^{-1} (X-Y)}$
- The Mahalanobis distance is the same as the Euclidean distance if the covariance matrix is the identity matrix.
- A common usage in computer vision systems is for comparing feature vectors whose elements are quantities having different ranges and amounts of variation, such as a 2-vector recording the properties of area and perimeter.
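The definition above can be checked numerically with a short NumPy sketch. The function name is hypothetical, and the covariance matrix is assumed to be invertible.

```python
import numpy as np


def mahalanobis(x, y, cov):
    """Mahalanobis distance between points x and y under covariance cov."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    # Solve cov @ z = diff instead of forming the explicit inverse.
    z = np.linalg.solve(np.asarray(cov, dtype=float), diff)
    return float(np.sqrt(diff @ z))
```

With the identity covariance the result equals the Euclidean distance; with a diagonal covariance each component is scaled by its standard deviation, so a component with large variance contributes less to the distance.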
22A 1-1 matching problem
Figure: a query image compared against a database of templates; the best matching template is a helicopter
23A 1-1 matching problem
- Find a correspondence between the query image and each template
- Evaluate correspondence based on
- Similarity of appearance near feature points
- Similarity in configuration of the feature points (distortion)
24An Integer Quadratic Programming Problem
- Use a binary matrix x to represent a correspondence, x(i,j) = 1 iff template point i maps to query point j
- Finding the best correspondence is an integer quadratic programming problem
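To make the structure of the cost concrete, here is a minimal sketch that scores a binary correspondence matrix with a linear (appearance) term and a quadratic (pairwise distortion) term, then solves a tiny instance by brute force. The array layout and function names are assumptions for illustration, each template point is assumed to map to exactly one query point, and brute force is only a stand-in for the approximate solver a real system would need.

```python
import itertools

import numpy as np


def correspondence_cost(x, appearance, pairwise):
    """Cost of a binary correspondence matrix x of shape (m, n).

    appearance[i, j]: cost of matching template point i to query point j.
    pairwise[i, j, k, l]: distortion cost incurred when i maps to j and
    k maps to l at the same time.
    """
    linear = np.sum(appearance * x)
    quadratic = np.einsum("ij,ijkl,kl->", x, pairwise, x)
    return float(linear + quadratic)


def brute_force_match(appearance, pairwise):
    """Try every assignment of template points to query points."""
    m, n = appearance.shape
    best_cost, best_assignment = float("inf"), None
    for assignment in itertools.product(range(n), repeat=m):
        x = np.zeros((m, n))
        x[np.arange(m), assignment] = 1
        cost = correspondence_cost(x, appearance, pairwise)
        if cost < best_cost:
            best_cost, best_assignment = cost, assignment
    return best_cost, best_assignment
```

The brute-force loop visits n^m assignments, which is only feasible for a handful of points; it is included to show what is being minimized.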
25Measuring Distortion (Similarity in Configuration)
Figure: vectors R_ij and S_i'j' between pairs of feature points in the template and query images
- Measure distortion in vectors between pairs of feature points
- R and S have the same length for rotations
- R and S have the same direction for scalings
26Modeling Distortion
- Formulation
- d_a penalizes the change in direction
- d_r penalizes the change in length
- are constants
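The two penalties can be illustrated with a small sketch for 2-D vectors. The specific forms used here (absolute angle between the vectors, relative change in length, and a weight gamma that mixes them) are assumptions chosen for illustration, not the exact formulation or constants from the paper; r is assumed to be non-zero.

```python
import math


def distortion(r, s, gamma=0.5):
    """Return (d_a, d_r, combined) for 2-D vectors r and s.

    d_a: absolute angle in radians between r and s (change in direction).
    d_r: relative change in length, | |s| - |r| | / |r|.
    combined: gamma * d_a + (1 - gamma) * d_r (hypothetical weighting).
    """
    len_r = math.hypot(r[0], r[1])
    len_s = math.hypot(s[0], s[1])
    cross = r[0] * s[1] - r[1] * s[0]
    dot = r[0] * s[0] + r[1] * s[1]
    d_a = abs(math.atan2(cross, dot))
    d_r = abs(len_s - len_r) / len_r
    return d_a, d_r, gamma * d_a + (1 - gamma) * d_r
```

A pure rotation leaves d_r at zero and is penalized only through d_a, while a pure scaling leaves d_a at zero and is penalized only through d_r.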
27Experiment: Face Detection Result
28Caltech 101 Recognition Results
102-way alternative forced choice test (15 training examples per class)
- Chance: 1%
- N.N. whole image: 16%
- Discriminative version of Constellation Model: 27%
- N.N. Geometric Blur Descriptors: 38%
- Low Distortion Correspondence (GB + IQP): 45%
Figure: 102-way confusion matrix (scale 0 to 100)
29Summary
- Constellation Model
- Gaussian model for each part's appearance
- Gaussian model for shape
- Gaussian model for relative scale
- Poisson model for occlusion
- Pictorial Structure
- Model shape using pairs of parts
- Geometry Distortion
- One-to-one match, instead of matching an image to a model built from a large number of training images
- Model distortion using rotation distortion and length distortion for pairs of parts