Learning shared representations for object recognition

Transcript and Presenter's Notes
1
Learning shared representations for object
recognition
Antonio Torralba, CSAIL, Massachusetts Institute of Technology
In collaboration with Erik Sudderth, Kevin Murphy, William Freeman, Aude Oliva
2
Collaborators
Erik Sudderth, Berkeley
Kevin Murphy, UBC
Aude Oliva, MIT
William Freeman, MIT
3
Standard approach for object detection
Object detection and recognition is formulated as
a classification problem.
The image is partitioned into a set of
overlapping windows
and a decision is taken at each window about whether it contains a target object or not.
Decision boundary
Where are the screens?
Bag of image patches
4
Face detection
  • Human Face Detection in Visual Scenes - Rowley,
    Baluja, Kanade (1995)
  • Graded Learning for Object Detection - Fleuret,
    Geman (1999)
  • Robust Real-time Object Detection - Viola, Jones
    (2001)
  • Feature Reduction and Hierarchy of Classifiers
    for Fast Object Detection in Video Images -
    Heisele, Serre, Mukherjee, Poggio (2001)

5
The single class age
6
Multiclass object detection
Using a set of independent binary classifiers is
the dominant strategy
  • Viola-Jones extension for dealing with rotations

- two cascades for each view
  • Schneiderman-Kanade multiclass object detection

a) One detector for each class
7
Object detection and the head in the coffee beans problem
8
Head in the coffee beans problem
Can you find the head in this image?
9
Head in the coffee beans problem
Can you find the head in this image?
10
Symptoms of local detectors
False alarms occur in image regions in which it is impossible for the target to be present.
11
Failure modes for object presence detection
Low probability of keyboard presence
High probability of keyboard presence
12
The system does not care about the scene, but we
do
We know there is a keyboard present in this scene
even if we cannot see it clearly.
13
Some symptoms of one-vs-all multiclass approaches
What is the best representation to detect a
traffic sign?
A very regular object: template matching will do the job.
Some of these parts cannot be used for anything other than this object.
14
Some symptoms of one-vs-all multiclass approaches
Part-based object representation (looking for
meaningful parts)
  • A. Agarwal and D. Roth
  • M. Weber, M. Welling and P. Perona


These studies try to recover parts that are
meaningful. But is this the right thing to do?
The derived parts may be too specific, and they
are not likely to be useful in a general system.
15
Some symptoms of one-vs-all multiclass approaches
Computational cost grows linearly with Nclasses × Nviews × Nstyles.
16
Green pastures for research in multiclass object
detection
  • Transfer knowledge between objects
  • More efficient representations
  • Better generalization
  • Discovering commonalities
  • Context among objects
  • More efficient search
  • Robust systems
  • Scene understanding

17
Links to datasets
The next tables summarize some of the available
datasets for training and testing object
detection and recognition algorithms. These lists
are far from exhaustive.
Databases for object localization:
  • CMU/MIT frontal faces | vasc.ri.cmu.edu/idb/html/face/frontal_images, cbcl.mit.edu/software-datasets/FaceData2.html | Patches | Frontal faces
  • Graz-02 Database | www.emt.tugraz.at/pinz/data/GRAZ_02/ | Segmentation masks | Bikes, cars, people
  • UIUC Image Database | l2r.cs.uiuc.edu/cogcomp/Data/Car/ | Bounding boxes | Cars
  • TU Darmstadt Database | www.vision.ethz.ch/leibe/data/ | Segmentation masks | Motorbikes, cars, cows
  • LabelMe dataset | people.csail.mit.edu/brussell/research/LabelMe/intro.html | Polygonal boundaries | >500 categories
Databases for object recognition:
  • Caltech 101 | www.vision.caltech.edu/Image_Datasets/Caltech101/Caltech101.html | Segmentation masks | 101 categories
  • COIL-100 | www1.cs.columbia.edu/CAVE/research/softlib/coil-100.html | Patches | 100 instances
  • NORB | www.cs.nyu.edu/ylclab/data/norb-v1.0/ | Bounding boxes | 50 toys
On-line annotation tools:
  • ESP game | www.espgame.org | Global image descriptions | Web images
  • LabelMe | people.csail.mit.edu/brussell/research/LabelMe/intro.html | Polygonal boundaries | High resolution images
Collections:
  • PASCAL | http://www.pascal-network.org/challenges/VOC/ | Segmentation, boxes | Various
18
Bryan Russell, Antonio Torralba, Bill Freeman
Google search: LabelMe
19
LabelMe Screen Shot
20
Some stats
21
Online resources
http://people.csail.mit.edu/torralba/iccv2005/
22
What do we do with many classes?
Styles, lighting conditions, etc, etc, etc
Need to detect Nclasses × Nviews × Nstyles, in clutter. Lots of variability within classes, and across viewpoints.
23
Shared features
  • Is learning the 1000th object class easier than learning the first?
  • Can we transfer knowledge from one object to
    another?
  • Are the shared properties interesting by
    themselves?


24
Multitask learning
R. Caruana. Multitask Learning. ML 1997
MTL improves generalization by leveraging the
domain-specific information contained in the
training signals of related tasks. It does this
by training tasks in parallel while using a
shared representation.
[Diagram: separate single-task networks vs. one multitask network with a shared representation]
Sejnowski & Rosenberg 1986; Hinton 1986; Le Cun et al. 1989; Suddarth & Kergosien 1990; Pratt et al. 1991; Sharkey & Sharkey 1992
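To make the mechanism concrete, here is a toy sketch (synthetic data, assumed shapes) of several tasks trained through one shared hidden representation, each with its own output head:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((32, 20))                # batch of inputs
Y = rng.standard_normal((32, 3))                 # targets for 3 related tasks

W_shared = 0.1 * rng.standard_normal((20, 16))   # shared representation
W_heads = 0.1 * rng.standard_normal((16, 3))     # one output head per task

H = np.tanh(X @ W_shared)                        # shared features, all tasks
loss = ((H @ W_heads - Y) ** 2).mean()           # sum of per-task losses
# Gradients of every task's loss flow into W_shared, so each task acts as
# an inductive bias (an extra "training signal") for the others.
```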
25
Multitask learning
R. Caruana. Multitask Learning. ML 1997
Primary task: detect doorknobs
Tasks used:
  • horizontal location of right door jamb
  • width of left door jamb
  • width of right door jamb
  • horizontal location of left edge of door
  • horizontal location of right edge of door
  • horizontal location of doorknob
  • single or double door
  • horizontal location of doorway center
  • width of doorway
  • horizontal location of left door jamb

26
Sharing invariances
S. Thrun. Is Learning the n-th Thing Any Easier Than Learning The First? NIPS 1996. Knowledge is transferred between tasks via a learned model of the invariances of the domain: object recognition is invariant to rotation, translation, scaling, lighting, etc. These invariances are common to all object recognition tasks.
Toy world
With sharing
Without sharing
27
Sharing transformations
  • Miller, E., Matsakis, N., and Viola, P. (2000).
    Learning from one example through shared
    densities on transforms. In IEEE Computer Vision
    and Pattern Recognition.

Transformations are shared and can be learnt from
other tasks.
28
Models of object recognition
I. Biederman, "Recognition-by-components: A theory of human image understanding," Psychological Review, 1987.
M. Riesenhuber and T. Poggio, "Hierarchical models of object recognition in cortex," Nature Neuroscience, 1999.
T. Serre, L. Wolf and T. Poggio, "Object recognition with features inspired by visual cortex," CVPR 2005.
29
Sharing in constellation models
Pictorial Structures: Fischler & Elschlager, IEEE Trans. Comp. 1973
SVM Detectors: Heisele, Poggio, et al., NIPS 2001
Constellation Model: Fergus, Perona, Zisserman, CVPR 2003
Model-Guided Segmentation: Mori, Ren, Efros, Malik, CVPR 2004
30
Variational EM
Random initialization
Fei-Fei, Fergus, Perona, ICCV 2003
(Attias, Hinton, Beal, etc.)
Slide from Fei Fei Li
31
Grand piano
Slide from Fei Fei Li
32
Reusable Parts
Krempp, Geman, Amit. "Sequential Learning of Reusable Parts for Object Detection." TR 2002.
Goal: look for a vocabulary of edges that reduces the number of features.
[Figure: examples of reused parts; number of features vs. number of classes]
33
Sharing patches
  • Bart and Ullman, 2004

For a new class, use only features similar to features that were good for other classes.
Proposed dog features
34
Additive models and boosting
  • Independent binary classifiers

Class 1
Class 2
Class 3
  • Binary classifiers that share features

Class 1
Class 2
Class 3
35
Boosting
  • Boosting fits the additive model

    H(v) = Σ_m h_m(v)

by minimizing the exponential loss over the N training samples:

    J = Σ_i exp(−z_i H(v_i)),   z_i ∈ {+1, −1}

The exponential loss is a differentiable upper bound on the misclassification error (a minimal sketch follows).
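A minimal sketch of this stagewise fit, assuming simple threshold stumps as weak learners and illustrative feature/threshold grids (not the exact detector features used in the talk):

```python
import numpy as np

def fit_additive_model(X, z, n_rounds=200):
    """Greedy stagewise fit of H(v) = sum_m h_m(v) under the exponential
    loss, with axis-aligned stumps. Labels z must be +1/-1."""
    n, d = X.shape
    H = np.zeros(n)
    model = []
    for _ in range(n_rounds):
        w = np.exp(-z * H)                        # per-sample boosting weights
        best = None
        for j in range(d):                        # candidate features
            for t in np.percentile(X[:, j], [10, 30, 50, 70, 90]):
                for s in (+1, -1):                # stump polarity
                    h = s * np.where(X[:, j] > t, 1.0, -1.0)
                    err = np.sum(w * (h != z)) / w.sum()
                    if best is None or err < best[0]:
                        best = (err, j, t, s)
        err, j, t, s = best
        err = np.clip(err, 1e-8, 1 - 1e-8)
        alpha = 0.5 * np.log((1 - err) / err)     # AdaBoost step size
        H += alpha * s * np.where(X[:, j] > t, 1.0, -1.0)
        model.append((alpha, j, t, s))
    return model
```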
36
Why boosting?
  • A simple algorithm for learning robust classifiers
    • Freund & Schapire, 1995
    • Friedman, Hastie, Tibshirani, 1998
  • Provides an efficient algorithm for sparse visual feature selection
    • Tieu & Viola, 2000
    • Viola & Jones, 2003
  • Easy to implement; does not require external optimization tools.

37
Weak detectors
  • Part based: similar to part-based generative models. We create weak detectors by using parts and voting for the object center location.

Screen model
Car model
These features are used for the detector on the
course web site.
38
Weak detectors
  • Tieu and Viola, CVPR 2000
  • Viola and Jones, ICCV 2001
  • Carmichael, Hebert 2004
  • Yuille, Snow, Nitzbert, 1998
  • Amit, Geman 1998
  • Papageorgiou, Poggio, 2000
  • Heisele, Serre, Poggio, 2001
  • Agarwal, Awan, Roth, 2004
  • Schneiderman, Kanade 2004

39
Weak detectors
First we collect a set of part templates from a
set of training objects. Vidal-Naquet, Ullman
(2003)

40
Weak detectors
We now define a family of weak detectors as the normalized correlation of the image with a part template, thresholded so that each weak detector performs better than chance (sketched below).
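A sketch of one such weak detector, assuming a normalized cross-correlation score against a part template with a hypothetical threshold theta; the voting for the object center used in the talk is omitted:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def weak_detector(image, template, theta):
    """Thresholded normalized cross-correlation with a part template.
    `theta` is a hypothetical threshold tuned on training data."""
    wins = sliding_window_view(image, template.shape)      # (H', W', h, w)
    wins = wins - wins.mean(axis=(-2, -1), keepdims=True)  # zero-mean windows
    t = template - template.mean()
    num = (wins * t).sum(axis=(-2, -1))
    den = np.sqrt((wins ** 2).sum(axis=(-2, -1)) * (t ** 2).sum()) + 1e-8
    return np.where(num / den > theta, 1.0, -1.0)          # +1/-1 per location

scores = weak_detector(np.random.rand(64, 64), np.random.rand(8, 8), 0.6)
```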
41
Example screen detection
[Figure: feature output, thresholded output, and the strong classifier response as features are added; final classification by the strong classifier at iteration 200]
42
Multi-class Boosting
We use the exponential multi-class cost function

    J = Σ_{c=1..C} Σ_{i=1..N} exp(−z_i^c H(v_i, c))

where C is the number of classes, H(v_i, c) is the classifier output for class c, and z_i^c ∈ {+1, −1} encodes membership of sample i in class c.
Freund & Schapire, 1995; Friedman, Hastie, Tibshirani, 1998
43
Weak learners are shared
At each boosting round, we add a perturbation, or weak learner, which is shared across some classes:

    H(v, c) ← H(v, c) + h_m(v, c)

We add the weak classifier that provides the best reduction of the exponential cost.
Freund & Schapire, 1995; Friedman, Hastie, Tibshirani, 1998
44
Summary of our algorithm for finding shared
features
  • It is an iterative algorithm that adds one feature at each iteration.
  • At each iteration, the algorithm selects, from a dictionary of features, the best feature and the set of object classes to which that feature should be applied (see the sketch below).
  • All the training samples are reweighted to increase the weight of samples that the previously selected features labeled incorrectly.

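The sketch below illustrates one round of this selection in much-simplified form: for each candidate stump, classes are added greedily to the sharing set while the multi-class exponential cost keeps decreasing. It is an illustration of the greedy search, not the authors' exact GentleBoost implementation:

```python
import numpy as np

def joint_boost_round(X, Z, H, thresholds):
    """One round of shared-feature selection (simplified sketch).
    Z[i, c] in {+1, -1} marks membership in class c; H[i, c] is the
    current additive output for class c."""
    n, d = X.shape
    C = Z.shape[1]
    W = np.exp(-Z * H)                       # per-sample, per-class weights

    def cost(j, t, subset):
        idx = (X[:, j] > t).astype(float)
        total = 0.0
        for c in range(C):
            if c in subset:                  # shared regression stump a*idx + b
                w, z = W[:, c], Z[:, c]
                a = (w * z * idx).sum() / ((w * idx).sum() + 1e-8)
                b = (w * z * (1 - idx)).sum() / ((w * (1 - idx)).sum() + 1e-8)
                h = a * idx + b * (1 - idx)
            else:                            # unshared classes: no update here
                h = 0.0                      # (the paper fits a constant k_c)
            total += (W[:, c] * np.exp(-Z[:, c] * h)).sum()
        return total

    best = None
    for j in range(d):                       # candidate features
        for t in thresholds:                 # candidate thresholds
            subset, cur = frozenset(), np.inf
            while len(subset) < C:           # greedily grow the sharing set
                c_best = min((c for c in range(C) if c not in subset),
                             key=lambda c: cost(j, t, subset | {c}))
                new = cost(j, t, subset | {c_best})
                if new >= cur:
                    break
                subset, cur = subset | {c_best}, new
            if best is None or cur < best[0]:
                best = (cur, j, t, subset)
    return best   # (cost, feature, threshold, classes sharing the feature)

# Example call, with all-zero initial outputs and a coarse threshold grid:
# best = joint_boost_round(X, Z, np.zeros(Z.shape), np.linspace(0, 1, 5))
```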
45
Specific feature
pedestrian
chair
Traffic light
sign
face
Background class
Non-shared feature: this feature is too specific to faces.
46
Shared feature
shared feature
47
Shared vs. specific features
48
Shared vs. specific features
49
How the features are shared across objects
(features sorted left-to-right from generic to
specific)
Torralba, Murphy, Freeman. CVPR 2004.
50
Red: shared features. Blue: independent features.
Sharing features shows sub-linear scaling of the number of features with the number of objects (for area under ROC = 0.9). Results are averaged over 8 training sets and different combinations of objects. Error bars show variability.
51
Red: shared features. Blue: independent features.
53
An application of feature sharing: object clustering
Count the number of common features between objects (see the sketch below).
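A toy version of this clustering, with a placeholder binary matrix recording which boosting features each object uses; objects that share many features end up in the same cluster:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
usage = (rng.random((12, 200)) < 0.2).astype(int)  # object x feature usage

shared = usage @ usage.T                   # common-feature counts per pair
dist = (shared.max() - shared).astype(float)
cond = dist[np.triu_indices(12, k=1)]      # condensed distance vector
labels = fcluster(linkage(cond, method="average"), t=3, criterion="maxclust")
```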
54
Multi-view object detection: train for object and orientation
Sharing features is a natural approach to
view-invariant object detection.
View invariant features
View specific features
55
Multi-view object detection
Sharing is not a tree; it also depends on 3D symmetries.


56
Multi-view object detection
Strong learner H response for a car, as a function of the assumed view angle
57
Generalization as a function of object
similarities
[Graphs: performance vs. number of training samples per class; each point is the average over the 12 classes]
58
PASCAL dataset
59
From shared to specific features
Face detection and recognition
60
Hierarchical Topic Models
  • Topic models typically use a bag of words approximation.
  • Learning topics allows transfer of information within a corpus of related documents.
  • Mixing proportions capture the distinctive features of particular documents.

[LDA plate diagram: Pr(topic | doc) and Pr(word | topic), with plates over K topics, N words, and J documents]
Latent Dirichlet Allocation (LDA): Blei, Ng, Jordan, JMLR 2003
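For concreteness, a small sketch of the LDA generative process, with assumed hyperparameters and placeholder sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
K, V, N = 5, 1000, 100          # topics, vocabulary size, words per document
alpha, eta = 0.5, 0.01          # Dirichlet hyperparameters (assumed values)

phi = rng.dirichlet(eta * np.ones(V), size=K)    # Pr(word | topic), K rows

def sample_document():
    theta = rng.dirichlet(alpha * np.ones(K))    # Pr(topic | doc)
    z = rng.choice(K, size=N, p=theta)           # topic assignment per word
    return np.array([rng.choice(V, p=phi[t]) for t in z])

doc = sample_document()          # one synthetic bag of words
```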
61
Hierarchical Topic Models
Pr(x = word | doc) = Σ_topic Pr(x = word | z = topic) · Pr(z = topic | doc)

[LDA plate diagram: Pr(topic | doc) and Pr(word | topic), with plates over K topics, N words, and J documents]
Latent Dirichlet Allocation (LDA): Blei, Ng, Jordan, JMLR 2003
62
Hierarchical Topic Models
[LDA plate diagram: Pr(topic | doc) and Pr(word | topic), with plates over K topics, N words, and J documents]
Some previous work on bag of features models:
  • Object Recognition (Sivic et al., ICCV 2005)
  • Scene Recognition (Fei-Fei et al., CVPR 2005)
Latent Dirichlet Allocation (LDA): Blei, Ng, Jordan, JMLR 2003
63
Hierarchical Sharing and Context
E. Sudderth, A. Torralba, W. T. Freeman, and A. Willsky. ICCV 2005.
  • Scenes share objects
  • Objects share parts
  • Parts share features

64
From images to visual words
Maximally Stable Extremal Regions
Linked Sequences of Canny Edges
Affinely Adapted Harris Corners
  • Some invariance to lighting and pose variations
  • Dense, multiscale over-segmentation of the image
65
From images to visual words
SIFT Descriptors
  • Normalized histograms of orientation energy
  • Compute a 1,000-word dictionary via K-means (see the sketch below)
  • Map each feature to its nearest visual word

Lowe, IJCV 2004
Appearance of feature i in image j
2D position of feature i in image j
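A minimal sketch of the dictionary step, with random arrays standing in for the 128-D SIFT descriptors:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
descriptors = rng.random((20_000, 128))   # placeholder SIFT descriptors

# Build the 1,000-word visual dictionary by K-means clustering.
kmeans = KMeans(n_clusters=1000, n_init=1, random_state=0).fit(descriptors)

def to_visual_words(desc):
    """Map each descriptor (rows of an n x 128 array) to its nearest word."""
    return kmeans.predict(desc)

words = to_visual_words(rng.random((200, 128)))  # one word index per feature
```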
66
Object models
Constellation model
Bag of words
Structured clusters
E. Sudderth, A. Torralba, W. T. Freeman, and A. Willsky. ICCV 2005.
67
Counting Objects & Parts
How many parts?
68
Generative Model for Objects
69
Graphical Model for Objects
For each of J images, sample a reference position.
[Plate diagram of the object model: per-image reference position, part assignments z with mixture weights, and per-part location and appearance parameters; plates over K parts, N features, and J images]
70
Parametric Object Model
  • For a fixed reference position, the generative model is equivalent to a finite mixture model:

    p(w, y) = Σ_{k=1..K} π_k · Pr(w | appearance of part k) · Pr(y | location of part k)

    with mixture weights π_k over the K parts (see the sketch below).
  • How many parts should we choose?
    • Too few reduces model accuracy.
    • Too many causes overfitting and poor generalization.

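A sketch of this finite mixture with placeholder parameters (K = 3 parts, a V-word appearance vocabulary, Gaussian location densities):

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
K, V = 3, 100                                  # parts, appearance vocabulary
pi = np.array([0.5, 0.3, 0.2])                 # mixture weights over parts
appearance = rng.dirichlet(np.ones(V), size=K)  # Pr(w | part), K rows
means = np.array([[0.0, 0.0], [5.0, 1.0], [2.0, 4.0]])
covs = [np.eye(2)] * K                         # per-part location Gaussians

def feature_likelihood(w, y):
    """p(w, y) = sum_k pi_k * Pr(w | part k) * N(y; mu_k, Sigma_k)."""
    return sum(pi[k] * appearance[k, w] *
               multivariate_normal.pdf(y, means[k], covs[k])
               for k in range(K))

p = feature_likelihood(w=7, y=[4.5, 1.2])
```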
71
Dirichlet Process Object Model
  • The Dirichlet process allows using an infinite mixture: Dirichlet processes define priors over the mixture weights π_k.
  • Some weights are effectively zero, which corresponds to having a finite number of parts (automatically selected from the data); see the stick-breaking sketch below.

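A small stick-breaking sketch of this prior, with an assumed concentration parameter, showing that only a few weights are appreciably non-zero:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = 2.0                     # concentration parameter (assumed value)
K = 50                          # truncation level, for illustration only

v = rng.beta(1.0, alpha, size=K)                          # stick proportions
pi = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))  # weights pi_k

print(np.round(pi[:10], 3))                   # a few dominant weights ...
print((pi > 1e-3).sum(), "weights above 1e-3 out of", K)  # ... rest ~ zero
```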
72
Dirichlet Process Object Model
[Plate diagram of the Dirichlet process object model: the parametric model with the finite part mixture replaced by a DP prior over infinitely many parts; plates over N features and J images]
73
Decomposing Faces into Parts
[Figure: number of parts learned as a function of the number of training images (4, 16, and 64 images)]
74
Multiclass object model
  • We want to model N object classes jointly
  • We want an efficient representation
  • We want to transfer between categories
  • Furthermore:
  • We do not know how many parts to share.
  • We do not know how many parts each object should use (each object needs a different number of parts).

75
Learning Shared Parts
  • Objects are often locally similar in appearance
  • Discover parts shared across categories
  • Need unsupervised methods for part discovery

Sharing features in a discriminative framework
(Torralba, Murphy, Freeman, CVPR 2004)
76
HDP Object Model
  • We learn the number of parts.
  • Each object uses a different number of parts.
  • The model assumes a known number of object
    categories.

77
HDP Object Model
There is no context, so the model is happy to create impossible part combinations.
78
HDP Object Model
A global Dirichlet process learns the number of shared parts. The reference position allows a consistent spatial model, and objects reuse the global parts in different proportions.
[Plate diagram of the HDP object model: parts location model and parts appearance model; a joint model of O objects, with plates over N features, J images, and O object categories]
79
Learning HDPs: Gibbs Sampling
[Diagram: the reference positions and part assignments are sampled (the latter via implicit table assignments), while the mixture weights and the part location and appearance parameters are integrated out]
80
Sharing Parts: 16 Categories
  • Caltech 101 Dataset (Li & Perona)
  • Horses (Borenstein & Ullman)
  • Cat & dog faces (Vidal-Naquet & Ullman)
  • Bikes from Graz-02 (Opelt & Pinz)
  • Google

81
Visualization of Shared Parts
Pr(position | part)
Pr(appearance | part)
82
Visualization of Shared Parts
Pr(position | part)
Pr(appearance | part)
83
Visualization of Part Densities
MDS Embedding of Pr(part | object)
84
Detection Task
versus
85
Detection Results
Detection vs. Training Set Size
6 Training Images per Category
86
Recognition Task
versus
87
Recognition Results
6 Training Images per Category
Recognition vs. Training Set Size
88
Context
What do you think are the hidden objects?
1
2
89
Context
What do you think are the hidden objects?
Even without local object models, we can make
reasonable detections!
90
The multiple personalities of a blob
91
The multiple personalities of a blob
Human vision: Biederman; Bar & Ullman; Palmer; ...
92
Context relationships between objects
First detect simple objects (reliable detectors) that provide strong contextual constraints on the target (screen → keyboard → mouse).
93
Global context: location priming
How far can we go without object detectors?
  • Context features that represent the scene instead of other objects.
  • The global features can provide:
    • Object presence
    • Location priming (see the sketch below)
    • Scale priming
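A toy sketch of location priming, with placeholder maps: a local detector's score map is modulated by a location prior that, as in the talk, mostly constrains the vertical position:

```python
import numpy as np

rng = np.random.default_rng(0)
local_scores = rng.random((60, 80))          # placeholder detector score map

# Vertical location prior predicted from global features (assumed Gaussian).
rows = np.arange(60)
row_prior = np.exp(-0.5 * ((rows - 40) / 6.0) ** 2)
prior = np.tile((row_prior / row_prior.sum())[:, None], (1, 80))

combined = local_scores * prior              # context-modulated detections
y, x = np.unravel_index(np.argmax(combined), combined.shape)
```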
94
Object global features
First we create a dictionary of scene features and associated object locations.
[Figure: feature maps with associated screen locations]
Only the vertical position of the object is well constrained by the global features.
95
Object global features
How to compute the global features
96
Car detection with global features
[Figure: features selected by boosting for the car class, shown by boosting round]
97
Combining global and local
[Figure: ROC curves for the same total number of features (100 boosting rounds) for car, building, road, screen, keyboard, mouse, and desk, comparing global and local features vs. only local features]
98
Clustering of objects with local and global
feature sharing
Clustering with local features
Clustering with global and local features
Objects are similar if they share local features
and they appear in the same contexts.
99
Conclusions
  • Sharing information at multiple levels leads to reduced computation and better generalization.
  • What are the object representations that allow transfer between classes?