Title: Sharing features for multi-class object detection
1. Sharing features for multi-class object detection
- Antonio Torralba, Kevin Murphy and Bill Freeman
- MIT Computer Science and Artificial Intelligence Laboratory (CSAIL)
- Sept. 1, 2004
2. The lead author, Antonio Torralba
3. Goal
- We want a machine to be able to identify
thousands of different objects as it looks around
the world.
4. Multi-class object detection
5. Need to detect as well as recognize
6. There are efficient solutions for detecting individual object classes. But the problem of multi-class and multi-view object detection is still largely unsolved.
7. Why multi-object detection is a hard problem
Styles, lighting conditions, etc.
Need to detect N_classes × N_views × N_styles. There is a lot of variability within classes and across viewpoints.
8. Existing approaches for multi-class object detection (vision community)
Using a set of independent binary classifiers is the dominant strategy.
9. Promising approaches
10. Characteristics of one-vs-all multi-class approaches: cost
Computational cost grows linearly with N_classes × N_views × N_styles. Surely, this will not scale well to 30,000 object classes.
11. Characteristics of one-vs-all multi-class approaches: representation
What is the best representation to detect a traffic sign?
A very regular object: template matching will do the job.
Some of these parts can only be used for this object.
12. Meaningful parts, in one-vs-all approaches
Part-based object representation (looking for meaningful parts):
- M. Weber, M. Welling and P. Perona
- Ullman, Vidal-Naquet and Sali, 2004: features of intermediate complexity are most informative for (single-object) classification.
13. Multi-class classifiers (machine learning community)
- Error-correcting output codes (Dietterich & Bakiri, 1995). But these use only classification decisions (+1/-1), not real values.
- Reducing multi-class to binary (Allwein et al., 2000). Showed that the best code matrix is problem-dependent, but does not address how to design the code matrix.
- Bunching algorithm (Dekel and Singer, 2002). Also learns the code matrix and classifies, but is more complicated than our algorithm and has not been applied to object detection.
- Multitask learning (Caruana, 1997). Trains tasks in parallel to improve generalization and shares features, but has not been applied to object detection, nor in a boosting framework.
14. Our approach
- Share features across objects, automatically selecting the best sharing pattern.
- Benefits of shared features:
  - Efficiency
  - Accuracy
  - Generalization ability
15. Algorithm goals, for object recognition
We want to find the vocabulary of parts that can be shared. We want to share across different objects the generic knowledge about detecting objects (e.g., telling them apart from the background). We want to share computations across classes, so that the computational cost is < O(number of classes).
16. Independent features
Total number of hyperplanes (features): 4 × 6 = 24. Scales linearly with the number of classes.
17. Shared features
Total number of shared hyperplanes (features): 8.
May scale sub-linearly with the number of classes, and may generalize better.
18. Note: sharing is a graph, not a tree
[Figure: a sharing graph over the objects R, b, and 3.]
This defines a vocabulary of parts shared across objects.
19. At the algorithmic level
- Our approach is a variation on boosting that allows for sharing features in a natural way.
- So let's review boosting (AdaBoost demo).
20. Boosting demo
21. Joint boosting, outside of the context of images: additive models for classification
22. Feature sharing in additive models
- Simple to have sharing between additive models.
- Each term h_m can be mapped to a single feature.
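Written out, a sketch in the paper's notation (v is the feature vector, c the class, M the number of boosting rounds):

    H(v, c) = \sum_{m=1}^{M} h_m(v, c)

Each weak learner h_m is fit jointly for a subset of classes, so a single feature evaluation is reused by every class that shares that term.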
23. Flavors of boosting
- Different boosting algorithms use different loss functions or minimization procedures (Freund & Schapire, 1995; Friedman, Hastie & Tibshirani, 1998).
- We base our approach on gentle boosting, which learns faster than the others (Friedman, Hastie & Tibshirani, 1998; Lienhart, Kuranov & Pisarevsky, 2003).
24. Joint Boosting
We use the exponential multi-class cost function

    J = \sum_{c=1}^{C} E\left[ e^{-z^c H(v,c)} \right]

where the sum runs over the classes, H(v, c) is the classifier output for class c, and z^c \in \{-1, +1\} encodes membership in class c.
25. Newton's method
Treat h_m as a perturbation, and expand the loss J to second order in h_m around the current classifier H:

    J(H + h_m) \approx E\left[ e^{-z^c H(v,c)} \, (z^c - h_m(v,c))^2 \right]

The exponential factor reweights the training examples, and what remains to minimize is a squared error.
26. Joint Boosting
Weighted squared error over the training data:

    J_{wse} = \sum_{c=1}^{C} \sum_{i=1}^{N} w_i^c \, (z_i^c - h_m(v_i, c))^2, \qquad w_i^c = e^{-z_i^c H(v_i, c)}
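As a minimal numpy sketch of this reweighting (the array names are ours, not the talk's: H holds the current strong-classifier outputs per example and class, Z the matching +1/-1 membership labels, and hm a candidate weak learner's outputs on the same grid):

```python
import numpy as np

def weighted_squared_error(H, Z, hm):
    """Gentle-boost criterion: each (example, class) pair is weighted
    by exp(-z * H), and the weak learner is scored against the labels
    by weighted least squares."""
    W = np.exp(-Z * H)          # w_i^c = exp(-z_i^c * H(v_i, c))
    return float(np.sum(W * (Z - hm) ** 2))
```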
27. For a trial sharing pattern, set weak learner parameters to optimize overall classification
[Figure: the weak learner h_m(v, c) as a function of the feature output v.]
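Concretely, the weak learners here are regression stumps on a single feature f; a sketch of the shared form used in the paper (the threshold \theta, regression parameters a and b, and the per-class constants k^c all come out of the weighted least-squares fit above):

    h_m(v, c) = a \, \delta(v^f > \theta) + b,  for c in the sharing subset S
    h_m(v, c) = k^c,                            for c not in S

Classes outside the sharing subset receive only a class-specific constant at this round, so a class pays no feature-computation cost for features it does not share.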
28. Joint Boosting: select the sharing pattern and weak learner to minimize the cost.
Algorithm details are in Torralba, Murphy & Freeman, CVPR 2004.
29. Approximate best sharing
- Exhaustive search would require exploring 2^C - 1 possible sharing patterns. Instead we use a first-best (greedy) search; a minimal code sketch follows below.
  1) Start with S empty; fit stumps for each class independently.
  2) Take the best class c_i and set S = S ∪ {c_i}; then fit stumps for S ∪ {c_i} for each c_i not in S. Repeat step 2 until |S| = N_classes.
  3) Select the sharing pattern along the way with the smallest WLS error.
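A minimal Python sketch of this first-best search. Here wls_error is a hypothetical helper standing in for "fit the best shared stump for subset S and return its weighted least-squares error"; it is not code from the paper:

```python
import numpy as np

def first_best_sharing(classes, wls_error):
    """Greedy search over sharing patterns: grow one nested chain of
    subsets instead of scanning all 2^C - 1 of them, then keep the
    subset along the chain with the smallest WLS error."""
    S, remaining = [], set(classes)
    best_S, best_err = None, np.inf
    while remaining:
        # Try adding each remaining class to the current shared set S.
        errs = {c: wls_error(S + [c]) for c in remaining}
        c_best = min(errs, key=errs.get)      # best class to add
        S = S + [c_best]
        remaining.remove(c_best)
        if errs[c_best] < best_err:           # remember best subset seen
            best_S, best_err = list(S), errs[c_best]
    return best_S, best_err
```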
30. Effect of the pattern of feature sharing on the number of features required (synthetic example)
31. Effect of the pattern of feature sharing on the number of features required (synthetic example)
32. 2-D synthetic example
3 classes + 1 background class
33. No feature sharing
Three one-vs-all binary classifiers.
This is with only 8 separation lines.
34. With feature sharing
Some lines can be shared across classes.
This is with only 8 separation lines.
35. The shared features
36. Comparison of the classifiers
Shared features: note the better isolation of individual classes.
Non-shared features.
37. Now, apply this to images: image features (weak learners)
A 32x32 training image of an object.
38. The candidate features
[Figure: a template patch and its position mask.]
39. The candidate features
[Figure: a template patch and its position mask.]
Dictionary of 2000 candidate patches and position masks, randomly sampled from the training images.
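A minimal sketch of one such candidate feature under our reading of the talk (patch correlation pooled by a position mask); the names and the exact normalization are our assumptions, not the authors' code:

```python
import numpy as np
from scipy.signal import correlate2d

def feature_response(image, patch, mask):
    """One candidate feature: match a small template patch everywhere
    in the 32x32 window, then pool the response with a position mask
    saying where in the window the patch is expected to appear."""
    resp = np.abs(correlate2d(image, patch, mode="same"))  # template match
    return float(np.sum(resp * mask))                      # spatial pooling

# Usage sketch: dictionary entries are patches and masks sampled from
# training images (2000 of them in the talk).
rng = np.random.default_rng(0)
image = rng.random((32, 32))
patch = image[10:18, 12:20]                  # an 8x8 fragment as template
mask = np.zeros((32, 32)); mask[8:20, 10:22] = 1.0
v = feature_response(image, patch, mask)
```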
40. Database of 2500 images
Annotated instances
41. Multi-class object detection
We use 20 to 50 training samples per object, and about 20 times as many background examples as object examples.
42. Feature sharing at each boosting round during training
43. Feature sharing at each boosting round during training
44. Example shared feature (weak classifier)
Response histograms for background (blue) and class members (red).
At each round of running joint boosting on the training set, we get a feature and a sharing pattern.
45. Shared feature vs. non-shared feature
46. Shared feature vs. non-shared feature
47. How the features were shared across objects (features sorted left-to-right from generic to specific)
48. Performance evaluation
[ROC plot: correct detection rate vs. false alarm rate; the area under the ROC (0.9 shown) is the summary measure.]
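For reference, the area under the ROC can be computed directly as a rank statistic; a minimal sketch (ours, not the talk's evaluation code):

```python
import numpy as np

def area_under_roc(pos_scores, neg_scores):
    """AUC = probability that a random object window outscores a
    random background window (ties count half)."""
    pos = np.asarray(pos_scores)[:, None]
    neg = np.asarray(neg_scores)[None, :]
    return float(np.mean((pos > neg) + 0.5 * (pos == neg)))
```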
49. Performance improvement over training
Significant benefit to sharing features using
joint boosting.
50. ROC curves for our 21-object database
- How will this work under training-starved or feature-starved conditions?
- Presumably, in the real world, we will always be starved for training data and features.
51. 70 features, 20 training examples (left)
52. 15 features, 20 training examples (middle); 70 features, 20 training examples (left)
53. 15 features, 2 training examples (right); 70 features, 20 training examples (left); 15 features, 20 training examples (middle)
54. Scaling
Joint Boosting shows sub-linear scaling of features with objects (for area under ROC = 0.9). Results are averaged over 8 training sets and different combinations of objects. Error bars show variability.
55. Red: shared features. Blue: independent features.
56. Red: shared features. Blue: independent features.
57. Examples of correct detections
58. What makes good features?
- Depends on whether we are doing single-class or multi-class detection.
59. Generic vs. specific features
60. Qualitative comparison of features, for single-class and multi-class detectors
61. Multi-view object detection: train for object and orientation
Sharing features is a natural approach to view-invariant object detection.
[Figure: view-invariant features vs. view-specific features.]
62. Multi-view object detection
Sharing is not a tree. It also depends on 3D symmetries.
63. Multi-view object detection
64. Multi-view object detection
Strong learner H response for a car, as a function of the assumed view angle.
65. Visual summary
66. Features
Object units
67. (No transcript)
68. Summary
- Argued that feature sharing will be an essential part of scaling up object detection to hundreds or thousands of objects (and viewpoints).
- We introduced joint boosting, a generalization of boosting that incorporates feature sharing in a natural way.
- Initial results (up to 30 objects) show the desired scaling behavior for features vs. objects.
- The shared features are observed to generalize better, allowing learning from fewer examples, using fewer features.
69. End