Title: Sharing features for multi-class object detection
Slide 1: Sharing features for multi-class object detection
- Antonio Torralba, Kevin Murphy and Bill Freeman
- MIT Computer Science and Artificial Intelligence Laboratory (CSAIL)
- Oct. 10, 2004
Slide 2: Antonio Torralba, Kevin Murphy
Slide 3: Goal
- We want a machine to be able to identify
thousands of different objects as it looks around
the world.
Slide 4: Multi-class object detection
Slide 5: Where is the field?
But the problem of multi-class and multi-view
object detection is still largely unsolved.
Slide 6: Why multi-object detection is a hard problem
Styles, lighting conditions, etc., etc., etc. We need to detect Nclasses × Nviews × Nstyles, in clutter, with lots of variability within classes and across viewpoints.
Slide 7: Standard approach for multi-class object detection (vision community)
Using a set of independent binary classifiers is
the dominant strategy
Slide 8: Characteristics of one-vs-all multi-class approaches: cost
Computational cost grows linearly with Nclasses × Nviews × Nstyles. Surely, this will not scale well to 30,000 object classes.
Slide 9: Characteristics of one-vs-all approaches: representation
- Part-based object representation (looking for meaningful parts): M. Weber, M. Welling and P. Perona.
- Ullman, Vidal-Naquet, and Sali, 2004: features of intermediate complexity are most informative for (single-object) classification.
Slide 10: Other vision-related approaches
Slide 11: Multi-class classifiers (machine learning community)
- Error-correcting output codes (Dietterich & Bakiri, 1995): but they only use classification decisions (+1/-1), not real values.
- Reducing multi-class to binary (Allwein et al., 2000): showed that the best code matrix is problem-dependent, but did not address how to design the code matrix.
- Bunching algorithm (Dekel and Singer, 2002): also learns the code matrix and classifies, but is more complicated than our algorithm and has not been applied to object detection.
- Multitask learning (Caruana, 1997): trains tasks in parallel to improve generalization and shares features, but has not been applied to object detection, nor in a boosting framework.
Slide 12: Our approach
- Share features across objects, automatically selecting the best sharing pattern.
- Benefits of shared features:
  - Efficiency: sharing computations across classes.
  - Accuracy.
  - Generalization ability: sharing generic knowledge about detecting objects (e.g., from the background).
Slide 13: Independent features
(Figure: each of 6 object classes, e.g. object class 1, has its own 4 hyperplanes.)
Total number of hyperplanes (features): 4 × 6 = 24. Scales linearly with the number of classes.
Slide 14: Shared features
Total number of shared hyperplanes (features): 8. May scale sub-linearly with the number of classes, and may generalize better.
Slide 15: Aside: the sharing structure is a graph, not a tree
(Figure: sharing graph for the 8 features, with feature groups 1-2, 3-4, 5-6, and 7-8 linking the classes that share them.)
Slide 16: At the algorithmic level
- Our approach is a variation on boosting that allows for sharing features in a natural way.
- So let's review boosting (AdaBoost demo).
Slide 17: Boosting demo: application to vision
Slide 18: Additive models for classification
Slide 19: Feature sharing in additive models
- It is simple to share terms between additive models.
- Each term h_m can be mapped to a single feature.
(Figure: classifiers for groups G1, G1,2, and G2, where the G1,2 terms are shared between classes 1 and 2; see the sketch below.)
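To make the sharing idea concrete, here is a minimal sketch of the additive-model view in the notation used later in the talk (the sharing-set symbol S(m) is mine):

```latex
% Strong classifier for class c: a sum of M weak-learner terms
%   H(v, c) = \sum_{m=1}^{M} h_m(v, c)
% Feature sharing: a term h_m is informative only for the classes in its
% sharing set S(m); for every other class it contributes a constant, so the
% same underlying feature is reused by all classes in S(m).
\[
  H(v, c) = \sum_{m=1}^{M} h_m(v, c),
  \qquad
  h_m(v, c) = \text{const} \quad \text{for } c \notin S(m).
\]
```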
Slide 20: Flavors of boosting
- Different boosting algorithms use different loss functions or minimization procedures (Freund & Schapire, 1995; Friedman, Hastie & Tibshirani, 1998).
- We base our approach on gentle boosting because it learns faster than the others (Friedman, Hastie & Tibshirani, 1998; Lienhart, Kuranov & Pisarevsky, 2003).
Slide 21: Multi-class boosting
We use the exponential multi-class cost function
$$J = \sum_{c=1}^{C} E\!\left[ e^{-z^c H(v,c)} \right],$$
where the sum runs over classes, $H(v,c)$ is the classifier output for class $c$, and $z^c \in \{-1,+1\}$ encodes membership in class $c$.
Slide 22: Weak learners are shared
At each boosting round, we add a perturbation or
weak learner which is shared across some
classes
Slide 23: Use Newton's method to select weak learners
Treat h_m as a perturbation, and expand the loss J to second order in h_m: replace the classifier H(v,c) by the perturbed classifier H(v,c) + h_m(v,c) in the cost function, and take a Newton step. This reduces each boosting round to minimizing a squared error in which the training examples are reweighted according to how badly they are currently classified.
Slide 24: Multi-class boosting
At each round we minimize the weighted squared error over the training data (see the sketch below):
$$J_{wse} = \sum_{c=1}^{C} \sum_{i=1}^{N} w_i^c \left( z_i^c - h_m(v_i,c) \right)^2, \qquad w_i^c = e^{-z_i^c H(v_i,c)}.$$
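A minimal NumPy sketch of this step (the array layout and names H, z, h are my own, not from the talk): compute the per-example, per-class weights from the current strong classifier and score a candidate weak learner by its weighted squared error.

```python
import numpy as np

def weighted_squared_error(H, z, h):
    """Gentle-boost style cost for one candidate weak learner.

    H : (N, C) current strong classifier outputs H(v_i, c)
    z : (N, C) class membership labels in {-1, +1}
    h : (N, C) candidate weak learner outputs h_m(v_i, c)
    """
    w = np.exp(-z * H)                 # w_i^c = exp(-z_i^c H(v_i, c))
    return np.sum(w * (z - h) ** 2)    # J_wse = sum_c sum_i w_i^c (z_i^c - h)^2
```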
Slide 25: Specialize weak learners to decision stumps
(Figure: the decision stump h_m(v,c) plotted as a function of the feature output v.)
Slide 26: Find weak learner parameters analytically
(Figure: the stump h_m(v,c) as a function of the feature output v, with its parameters fitted in closed form.)
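As a sketch of what "analytically" means here: if the shared stump takes the value a above a threshold θ on feature f and b below it for the classes in the sharing set S, and a class-specific constant k^c for the remaining classes, then minimizing the weighted squared error gives weighted means of the labels (this parameterization is my paraphrase, not a quote from the slides):

```latex
% Shared regression stump on feature f with threshold \theta:
%   h_m(v,c) = a    if  v^f > \theta     and  c \in S
%   h_m(v,c) = b    if  v^f \le \theta   and  c \in S
%   h_m(v,c) = k^c  if  c \notin S
% Minimizing J_wse gives weighted means of the labels z_i^c:
\[
a = \frac{\sum_{c \in S}\sum_i w_i^c\, z_i^c\, \delta(v_i^f > \theta)}
         {\sum_{c \in S}\sum_i w_i^c\, \delta(v_i^f > \theta)},
\qquad
b = \frac{\sum_{c \in S}\sum_i w_i^c\, z_i^c\, \delta(v_i^f \le \theta)}
         {\sum_{c \in S}\sum_i w_i^c\, \delta(v_i^f \le \theta)},
\qquad
k^c = \frac{\sum_i w_i^c z_i^c}{\sum_i w_i^c} \quad (c \notin S).
\]
```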
Slide 27: Joint boosting: select the sharing pattern and weak learner to minimize cost
Conceptually:
for all features
  for all class sharing patterns
    find the optimal decision stump, h_m(v,c)
  end
end
Select the h_m(v,c) and sharing pattern that minimize the weighted squared error J_wse for this boosting round.
Slide 28: Example of a selected weak learner, h_m(v,c)
(Figure: the selected weak learner h_m(v,c) for objects 1-5.)
Algorithm details in Torralba, Murphy and Freeman, CVPR 2004.
Slide 29: Approximate best sharing
To avoid exploring all 2^C - 1 possible sharing patterns, use best-first search to grow a list of shared classes, S (see the sketch below):
S = {}
while length(S) < Nc
  for each object class ci not in S
    consider adding ci to the set of shared classes, S
    for all features hm
      evaluate the cost J of hm shared over S ∪ {ci}
    end
  end
  S = S ∪ {c_min_cost}
end
Pick the sharing pattern S and feature hm which gave the minimum multi-class cost J.
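A compact Python sketch of this greedy search (the helper fit_best_stump and the data layout are placeholders, not from the talk): at each step, try adding every remaining class to the current sharing set, keep the cheapest addition, and remember the best (stump, sharing pattern) seen overall.

```python
def greedy_sharing_search(classes, fit_best_stump):
    """Best-first approximation to the optimal sharing pattern.

    classes        : list of class labels
    fit_best_stump : callable(sharing_set) -> (stump, cost); fits the best
                     shared decision stump for that sharing set and returns
                     its multi-class cost J (placeholder for the real fit).
    """
    shared = []                          # S: classes committed to the sharing set
    best = (None, None, float("inf"))    # best (stump, sharing set, cost) so far

    while len(shared) < len(classes):
        trials = []
        # Try adding each remaining class to the current sharing set.
        for c in (c for c in classes if c not in shared):
            stump, cost = fit_best_stump(shared + [c])
            trials.append((cost, c, stump))
        cost, c_min, stump = min(trials, key=lambda t: t[0])  # cheapest addition
        shared.append(c_min)
        if cost < best[2]:
            best = (stump, list(shared), cost)

    return best                          # stump, sharing pattern, and cost J
```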
Slide 30: The heuristic for approximate best sharing works well
C = 9 classes, D = 2 dimensions, synthetic data
Slide 31: Effect of the pattern of feature sharing on the number of features required (synthetic example)
Slide 32: Effect of the pattern of feature sharing on the number of features required (synthetic example, best-first search heuristic)
Slide 33: Database of 2,500 images with annotated object instances
Slide 34: Now, apply this to images. Image features (weak learners)
32×32 training image of an object
Slide 35: The candidate features
(Figure: each candidate feature is a template patch paired with a position mask.)
Slide 36: The candidate features
(Figure: template patch and position mask.)
Dictionary of 2,000 candidate patches and position masks, randomly sampled from the training images (see the sketch below).
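A rough sketch of how such a patch-plus-mask feature could be evaluated (the exact filtering and normalization used in the paper differ in detail; the names patch and mask are mine): correlate the window with the template and pool the response under the position mask.

```python
import numpy as np
from scipy.signal import correlate2d

def feature_response(image, patch, mask):
    """Response of one candidate feature: a patch template plus a position mask.

    image : 2-D grayscale array (e.g. a 32x32 training window)
    patch : small 2-D template sampled from a training image
    mask  : 2-D spatial mask selecting where the patch is expected to appear
    """
    # Correlate the window with the template (absolute response).
    resp = np.abs(correlate2d(image, patch, mode="same"))
    # Pool the correlation energy under the position mask.
    return float(np.sum(resp * mask))
```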
Slide 37: Multi-class object detection
We use 20 - 50 training samples per object, and
about 20 times as many background examples as
object examples.
Slide 38: Feature sharing at each boosting round during training
Slide 39: Feature sharing at each boosting round during training
Slide 40: Example shared feature (weak classifier)
Response histograms for background (blue) and class members (red). At each round of running joint boosting on the training set, we get a feature and a sharing pattern.
Slide 41: How the features were shared across objects (features sorted left-to-right from generic to specific)
Slide 42: Performance evaluation
(Figure: ROC curve plotting correct detection rate against false alarm rate; the shaded area under the ROC shown is 0.9.)
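For reference, the area-under-ROC numbers reported on the following slides can be computed from raw detector scores along these lines (a generic sketch using scikit-learn, not the authors' evaluation code):

```python
from sklearn.metrics import roc_auc_score, roc_curve

# scores: detector outputs on test windows; labels: 1 = object, 0 = background
def evaluate(scores, labels):
    auc = roc_auc_score(labels, scores)        # area under the ROC curve
    fpr, tpr, _ = roc_curve(labels, scores)    # false alarm vs. detection rate
    return auc, fpr, tpr
```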
Slide 43: Performance improvement over training
Significant benefit to sharing features using
joint boosting.
Slide 44: ROC curves for our 21-object database
- How will this work under training-starved or feature-starved conditions?
- Presumably, in the real world, we will always be starved for training data and for features.
Slide 45: 70 features, 20 training examples (left)
Slide 46: 15 features, 20 training examples (middle); 70 features, 20 training examples (left)
Slide 47: 15 features, 2 training examples (right); 70 features, 20 training examples (left); 15 features, 20 training examples (middle)
Slide 48: Scaling
Joint boosting shows sub-linear scaling of features with objects (for area under ROC = 0.9). Results are averaged over 8 training sets and different combinations of objects. Error bars show variability.
Slide 49: Red: shared features. Blue: independent features.
Slide 50: Red: shared features. Blue: independent features.
Slide 51: What makes good features?
- Depends on whether we are doing single-class or
multi-class detection
Slide 52: Generic vs. specific features
Slide 53: Shared feature vs. non-shared feature
Slide 54: Shared feature vs. non-shared feature
Slide 55: Qualitative comparison of features for single-class and multi-class detectors
Slide 56: An application of feature sharing: object clustering
Count the number of common features between objects (see the sketch below).
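A sketch of this idea under an assumed encoding (a binary class-by-feature usage matrix, which is mine, not from the talk): count pairwise feature overlaps and cluster on the resulting similarity matrix.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def cluster_objects(uses_feature, n_clusters=4):
    """Cluster objects by how many boosting features they share.

    uses_feature : (C, M) binary matrix; entry (c, m) = 1 if class c uses feature m
    """
    shared = uses_feature @ uses_feature.T    # (C, C) counts of common features
    dist = shared.max() - shared              # turn similarity into a distance
    np.fill_diagonal(dist, 0)
    # Condensed distance vector for scipy's hierarchical clustering.
    iu = np.triu_indices(dist.shape[0], k=1)
    Z = linkage(dist[iu], method="average")
    return fcluster(Z, t=n_clusters, criterion="maxclust")
```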
Slide 57: Multi-view object detection: train for object and orientation
Sharing features is a natural approach to
view-invariant object detection.
(Figure: view-invariant features vs. view-specific features.)
Slide 58: Multi-view object detection
Sharing is not a tree; it also depends on 3D symmetries.
Slide 59: Multi-view object detection
Slide 60: Multi-view object detection
Strong learner H response for a car, as a function of the assumed view angle.
Slide 61: Visual summary
Slide 62: Features and object units
Slide 64: Summary
- Feature sharing is essential for scaling up object detection to many objects and viewpoints.
- Joint boosting generalizes boosting.
- Initial results (up to 30 objects) show the desired scaling behavior.
- The shared features:
  - generalize better,
  - allow learning from fewer examples,
  - with fewer features.