Learning shared representations for object recognition

Transcript and Presenter's Notes
1
Learning shared representations for object
recognition
Antonio Torralba, CSAIL, Massachusetts Institute of Technology
In collaboration with Erik Sudderth, Kevin Murphy, William Freeman, Aude Oliva
2
Collaborators
Erik Sudderth, Berkeley
Kevin Murphy, UBC
Aude Oliva, MIT
William Freeman, MIT
3
Standard approach for object detection
Object detection and recognition is formulated as
a classification problem.
The image is partitioned into a set of
overlapping windows
and a decision is taken at each window about whether it contains a target object or not.
Decision boundary
Where are the screens?
Bag of image patches
4
Face detection
  • Human Face Detection in Visual Scenes - Rowley,
    Baluja, Kanade (1995)
  • Graded Learning for Object Detection - Fleuret,
    Geman (1999)
  • Robust Real-time Object Detection - Viola, Jones
    (2001)
  • Feature Reduction and Hierarchy of Classifiers
    for Fast Object Detection in Video Images -
    Heisele, Serre, Mukherjee, Poggio (2001)

5
The single class age
6
Multiclass object detection
Using a set of independent binary classifiers is
the dominant strategy
  • Viola-Jones extension for dealing with rotations

- two cascades for each view
  • Schneiderman-Kanade multiclass object detection

a) One detector for each class
7
Object detection and the head in the coffee beans problem
8
Head in the coffee beans problem
Can you find the head in this image?
9
Head in the coffee beans problem
Can you find the head in this image?
10
Symptoms of local detectors
False alarms occur in image regions in which it is impossible for the target to be present.
11
Failure modes for object presence detection
Low probability of keyboard presence
High probability of keyboard presence
12
The system does not care about the scene, but we
do
We know there is a keyboard present in this scene
even if we cannot see it clearly.
13
Some symptoms of one-vs-all multiclass approaches
What is the best representation to detect a
traffic sign?
A very regular object: template matching will do the job.
Some of these parts cannot be used for anything other than this object.
14
Some symptoms of one-vs-all multiclass approaches
Part-based object representation (looking for
meaningful parts)
  • A. Agarwal and D. Roth
  • M. Weber, M. Welling and P. Perona


These studies try to recover parts that are
meaningful. But is this the right thing to do?
The derived parts may be too specific, and they
are not likely to be useful in a general system.
15
Some symptoms of one-vs-all multiclass approaches
Computational cost grows linearly with Nclasses × Nviews × Nstyles.
16
Green pastures for research in multiclass object
detection
  • Transfer knowledge between objects
  • More efficient representations
  • Better generalization
  • Discovering commonalities
  • Context among objects
  • More efficient search
  • Robust systems
  • Scene understanding

17
Links to datasets
The next tables summarize some of the available
datasets for training and testing object
detection and recognition algorithms. These lists
are far from exhaustive.
Databases for object localization:
  • CMU/MIT frontal faces | vasc.ri.cmu.edu/idb/html/face/frontal_images, cbcl.mit.edu/software-datasets/FaceData2.html | Patches | Frontal faces
  • Graz-02 Database | www.emt.tugraz.at/pinz/data/GRAZ_02/ | Segmentation masks | Bikes, cars, people
  • UIUC Image Database | l2r.cs.uiuc.edu/cogcomp/Data/Car/ | Bounding boxes | Cars
  • TU Darmstadt Database | www.vision.ethz.ch/leibe/data/ | Segmentation masks | Motorbikes, cars, cows
  • LabelMe dataset | people.csail.mit.edu/brussell/research/LabelMe/intro.html | Polygonal boundaries | >500 categories
Databases for object recognition:
  • Caltech 101 | www.vision.caltech.edu/Image_Datasets/Caltech101/Caltech101.html | Segmentation masks | 101 categories
  • COIL-100 | www1.cs.columbia.edu/CAVE/research/softlib/coil-100.html | Patches | 100 instances
  • NORB | www.cs.nyu.edu/ylclab/data/norb-v1.0/ | Bounding boxes | 50 toys
On-line annotation tools:
  • ESP game | www.espgame.org | Global image descriptions | Web images
  • LabelMe | people.csail.mit.edu/brussell/research/LabelMe/intro.html | Polygonal boundaries | High resolution images
Collections:
  • PASCAL | http://www.pascal-network.org/challenges/VOC/ | Segmentation, boxes | Various
18
Bryan Russell, Antonio Torralba, Bill Freeman
Google search: LabelMe
19
LabelMe Screen Shot
20
Some stats
21
Online resources
http://people.csail.mit.edu/torralba/iccv2005/
22
What do we do with many classes?
Styles, lighting conditions, etc, etc, etc
Need to detect Nclasses × Nviews × Nstyles, in clutter. Lots of variability within classes, and across viewpoints.
23
Shared features
  • Is learning the 1000th object class easier than learning the first?
  • Can we transfer knowledge from one object to
    another?
  • Are the shared properties interesting by
    themselves?


24
Multitask learning
R. Caruana. Multitask Learning. ML 1997
MTL improves generalization by leveraging the
domain-specific information contained in the
training signals of related tasks. It does this
by training tasks in parallel while using a
shared representation.
[Diagram: separate single-task networks vs. one multitask network with a shared representation]
Sejnowski & Rosenberg 1986; Hinton 1986; Le Cun et al. 1989; Suddarth & Kergosien 1990; Pratt et al. 1991; Sharkey & Sharkey 1992
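To make the mechanism concrete, here is a toy sketch (synthetic data, assumed shapes) of several tasks trained through one shared hidden representation, each with its own output head:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((32, 20))                # batch of inputs
Y = rng.standard_normal((32, 3))                 # targets for 3 related tasks

W_shared = 0.1 * rng.standard_normal((20, 16))   # shared representation
W_heads = 0.1 * rng.standard_normal((16, 3))     # one output head per task

H = np.tanh(X @ W_shared)                        # shared features, all tasks
loss = ((H @ W_heads - Y) ** 2).mean()           # sum of per-task losses
# Gradients of every task's loss flow into W_shared, so each task acts as
# an inductive bias (an extra "training signal") for the others.
```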
25
Multitask learning
R. Caruana. Multitask Learning. ML 1997
Primary task: detect doorknobs
Tasks used:
  • horizontal location of right door jamb
  • width of left door jamb
  • width of right door jamb
  • horizontal location of left edge of door
  • horizontal location of right edge of door
  • horizontal location of doorknob
  • single or double door
  • horizontal location of doorway center
  • width of doorway
  • horizontal location of left door jamb

26
Sharing invariances
S. Thrun. Is Learning the n-th Thing Any Easier Than Learning The First? NIPS 1996. Knowledge is transferred between tasks via a learned model of the invariances of the domain: object recognition is invariant to rotation, translation, scaling, lighting, etc. These invariances are common to all object recognition tasks.
Toy world
With sharing
Without sharing
27
Sharing transformations
  • Miller, E., Matsakis, N., and Viola, P. (2000).
    Learning from one example through shared
    densities on transforms. In IEEE Computer Vision
    and Pattern Recognition.

Transformations are shared and can be learnt from
other tasks.
28
Models of object recognition
I. Biederman, "Recognition-by-components: A theory of human image understanding," Psychological Review, 1987.
M. Riesenhuber and T. Poggio, "Hierarchical models of object recognition in cortex," Nature Neuroscience, 1999.
T. Serre, L. Wolf and T. Poggio, "Object recognition with features inspired by visual cortex," CVPR 2005.
29
Sharing in constellation models
Pictorial Structures: Fischler & Elschlager, IEEE Trans. Comp. 1973
SVM Detectors: Heisele, Poggio, et al., NIPS 2001
Constellation Model: Fergus, Perona, Zisserman, CVPR 2003
Model-Guided Segmentation: Mori, Ren, Efros, Malik, CVPR 2004
30
Variational EM
Random initialization
Fei-Fei, Fergus, Perona, ICCV 2003
(Attias, Hinton, Beal, etc.)
Slide from Fei Fei Li
31
Grand piano
Slide from Fei Fei Li
32
Reusable Parts
Krempp, Geman, Amit. "Sequential Learning of Reusable Parts for Object Detection." TR 2002.
Goal: look for a vocabulary of edges that reduces the number of features.
[Figure: examples of reused parts; number of features vs. number of classes]
33
Sharing patches
  • Bart and Ullman, 2004

For a new class, use only features similar to features that were good for other classes.
Proposed dog features
34
Additive models and boosting
  • Independent binary classifiers

Class 1
Class 2
Class 3
  • Binary classifiers that share features

Class 1
Class 2
Class 3
35
Boosting
  • Boosting fits the additive model

    H(v) = Σ_m h_m(v)

by minimizing the exponential loss over the N training samples:

    J = Σ_i exp(−z_i H(v_i)),   z_i ∈ {+1, −1}

The exponential loss is a differentiable upper bound on the misclassification error (a minimal sketch follows).
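A minimal sketch of this stagewise fit, assuming simple threshold stumps as weak learners and illustrative feature/threshold grids (not the exact detector features used in the talk):

```python
import numpy as np

def fit_additive_model(X, z, n_rounds=200):
    """Greedy stagewise fit of H(v) = sum_m h_m(v) under the exponential
    loss, with axis-aligned stumps. Labels z must be +1/-1."""
    n, d = X.shape
    H = np.zeros(n)
    model = []
    for _ in range(n_rounds):
        w = np.exp(-z * H)                        # per-sample boosting weights
        best = None
        for j in range(d):                        # candidate features
            for t in np.percentile(X[:, j], [10, 30, 50, 70, 90]):
                for s in (+1, -1):                # stump polarity
                    h = s * np.where(X[:, j] > t, 1.0, -1.0)
                    err = np.sum(w * (h != z)) / w.sum()
                    if best is None or err < best[0]:
                        best = (err, j, t, s)
        err, j, t, s = best
        err = np.clip(err, 1e-8, 1 - 1e-8)
        alpha = 0.5 * np.log((1 - err) / err)     # AdaBoost step size
        H += alpha * s * np.where(X[:, j] > t, 1.0, -1.0)
        model.append((alpha, j, t, s))
    return model
```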
36
Why boosting?
  • A simple algorithm for learning robust classifiers
    • Freund & Schapire, 1995
    • Friedman, Hastie, Tibshirani, 1998
  • Provides an efficient algorithm for sparse visual feature selection
    • Tieu & Viola, 2000
    • Viola & Jones, 2003
  • Easy to implement; does not require external optimization tools.

37
Weak detectors
  • Part based: similar to part-based generative models. We create weak detectors by using parts and voting for the object center location.

Screen model
Car model
These features are used for the detector on the
course web site.
38
Weak detectors
  • Tieu and Viola, CVPR 2000
  • Viola and Jones, ICCV 2001
  • Carmichael, Hebert 2004
  • Yuille, Snow, Nitzbert, 1998
  • Amit, Geman 1998
  • Papageorgiou, Poggio, 2000
  • Heisele, Serre, Poggio, 2001
  • Agarwal, Awan, Roth, 2004
  • Schneiderman, Kanade 2004

39
Weak detectors
First we collect a set of part templates from a
set of training objects. Vidal-Naquet, Ullman
(2003)

40
Weak detectors
We now define a family of weak detectors as the normalized correlation of the image with a part template, thresholded so that each weak detector performs better than chance (sketched below).
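A sketch of one such weak detector, assuming a normalized cross-correlation score against a part template with a hypothetical threshold theta; the voting for the object center used in the talk is omitted:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def weak_detector(image, template, theta):
    """Thresholded normalized cross-correlation with a part template.
    `theta` is a hypothetical threshold tuned on training data."""
    wins = sliding_window_view(image, template.shape)      # (H', W', h, w)
    wins = wins - wins.mean(axis=(-2, -1), keepdims=True)  # zero-mean windows
    t = template - template.mean()
    num = (wins * t).sum(axis=(-2, -1))
    den = np.sqrt((wins ** 2).sum(axis=(-2, -1)) * (t ** 2).sum()) + 1e-8
    return np.where(num / den > theta, 1.0, -1.0)          # +1/-1 per location

scores = weak_detector(np.random.rand(64, 64), np.random.rand(8, 8), 0.6)
```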
41
Example screen detection
[Figure: feature output, thresholded output, and the strong classifier response as features are added; final classification by the strong classifier at iteration 200]
42
Multi-class Boosting
We use the exponential multi-class cost function

    J = Σ_{c=1..C} Σ_{i=1..N} exp(−z_i^c H(v_i, c))

where C is the number of classes, H(v_i, c) is the classifier output for class c, and z_i^c ∈ {+1, −1} encodes membership of sample i in class c.
Freund & Schapire, 1995; Friedman, Hastie, Tibshirani, 1998
43
Weak learners are shared
At each boosting round, we add a perturbation, or weak learner, which is shared across some classes:

    H(v, c) ← H(v, c) + h_m(v, c)

We add the weak classifier that provides the best reduction of the exponential cost.
Freund & Schapire, 1995; Friedman, Hastie, Tibshirani, 1998
44
Summary of our algorithm for finding shared
features
  • It is an iterative algorithm that adds one feature at each iteration.
  • At each iteration, the algorithm selects, from a dictionary of features, the best feature and the set of object classes to which that feature should be applied (see the sketch below).
  • All the training samples are reweighted to increase the weight of samples that the previously selected features labeled incorrectly.

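The sketch below illustrates one round of this selection in much-simplified form: for each candidate stump, classes are added greedily to the sharing set while the multi-class exponential cost keeps decreasing. It is an illustration of the greedy search, not the authors' exact GentleBoost implementation:

```python
import numpy as np

def joint_boost_round(X, Z, H, thresholds):
    """One round of shared-feature selection (simplified sketch).
    Z[i, c] in {+1, -1} marks membership in class c; H[i, c] is the
    current additive output for class c."""
    n, d = X.shape
    C = Z.shape[1]
    W = np.exp(-Z * H)                       # per-sample, per-class weights

    def cost(j, t, subset):
        idx = (X[:, j] > t).astype(float)
        total = 0.0
        for c in range(C):
            if c in subset:                  # shared regression stump a*idx + b
                w, z = W[:, c], Z[:, c]
                a = (w * z * idx).sum() / ((w * idx).sum() + 1e-8)
                b = (w * z * (1 - idx)).sum() / ((w * (1 - idx)).sum() + 1e-8)
                h = a * idx + b * (1 - idx)
            else:                            # unshared classes: no update here
                h = 0.0                      # (the paper fits a constant k_c)
            total += (W[:, c] * np.exp(-Z[:, c] * h)).sum()
        return total

    best = None
    for j in range(d):                       # candidate features
        for t in thresholds:                 # candidate thresholds
            subset, cur = frozenset(), np.inf
            while len(subset) < C:           # greedily grow the sharing set
                c_best = min((c for c in range(C) if c not in subset),
                             key=lambda c: cost(j, t, subset | {c}))
                new = cost(j, t, subset | {c_best})
                if new >= cur:
                    break
                subset, cur = subset | {c_best}, new
            if best is None or cur < best[0]:
                best = (cur, j, t, subset)
    return best   # (cost, feature, threshold, classes sharing the feature)

# Example call, with all-zero initial outputs and a coarse threshold grid:
# best = joint_boost_round(X, Z, np.zeros(Z.shape), np.linspace(0, 1, 5))
```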
45
Specific feature
pedestrian
chair
Traffic light
sign
face
Background class
Non-shared feature: this feature is too specific to faces.
46
Shared feature
shared feature
47
Shared vs. specific features
48
Shared vs. specific features
49
How the features are shared across objects
(features sorted left-to-right from generic to
specific)
Torralba, Murphy, Freeman. CVPR 2004.
50
Red: shared features. Blue: independent features.
Sharing features shows sub-linear scaling of the number of features with the number of objects (for area under ROC = 0.9). Results are averaged over 8 training sets and different combinations of objects. Error bars show variability.
51
Red: shared features. Blue: independent features.
53
An application of feature sharing: object clustering
Count the number of common features between objects (see the sketch below).
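A toy version of this clustering, with a placeholder binary matrix recording which boosting features each object uses; objects that share many features end up in the same cluster:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
usage = (rng.random((12, 200)) < 0.2).astype(int)  # object x feature usage

shared = usage @ usage.T                   # common-feature counts per pair
dist = (shared.max() - shared).astype(float)
cond = dist[np.triu_indices(12, k=1)]      # condensed distance vector
labels = fcluster(linkage(cond, method="average"), t=3, criterion="maxclust")
```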
54
Multi-view object detection: train for object and orientation
Sharing features is a natural approach to
view-invariant object detection.
View invariant features
View specific features
55
Multi-view object detection
Sharing is not a tree; it also depends on 3D symmetries.


56
Multi-view object detection
Strong learner H response for a car, as a function of the assumed view angle
57
Generalization as a function of object
similarities
[Graphs: performance vs. number of training samples per class; each point is the average over the 12 classes]
58
PASCAL dataset
59
From shared to specific features
Face detection and recognition
60
Hierarchical Topic Models
  • Topic models typically use a bag of words approximation.
  • Learning topics allows transfer of information within a corpus of related documents.
  • Mixing proportions capture the distinctive features of particular documents.

[LDA plate diagram: Pr(topic | doc) and Pr(word | topic), with plates over K topics, N words, and J documents]
Latent Dirichlet Allocation (LDA): Blei, Ng, Jordan, JMLR 2003
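For concreteness, a small sketch of the LDA generative process, with assumed hyperparameters and placeholder sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
K, V, N = 5, 1000, 100          # topics, vocabulary size, words per document
alpha, eta = 0.5, 0.01          # Dirichlet hyperparameters (assumed values)

phi = rng.dirichlet(eta * np.ones(V), size=K)    # Pr(word | topic), K rows

def sample_document():
    theta = rng.dirichlet(alpha * np.ones(K))    # Pr(topic | doc)
    z = rng.choice(K, size=N, p=theta)           # topic assignment per word
    return np.array([rng.choice(V, p=phi[t]) for t in z])

doc = sample_document()          # one synthetic bag of words
```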
61
Hierarchical Topic Models
Pr(x = word | doc) = Σ_topic Pr(x = word | z = topic) · Pr(z = topic | doc)

[LDA plate diagram: Pr(topic | doc) and Pr(word | topic), with plates over K topics, N words, and J documents]
Latent Dirichlet Allocation (LDA): Blei, Ng, Jordan, JMLR 2003
62
Hierarchical Topic Models
[LDA plate diagram: Pr(topic | doc) and Pr(word | topic), with plates over K topics, N words, and J documents]
Some previous work on bag of features models:
  • Object Recognition (Sivic et al., ICCV 2005)
  • Scene Recognition (Fei-Fei et al., CVPR 2005)
Latent Dirichlet Allocation (LDA): Blei, Ng, Jordan, JMLR 2003
63
Hierarchical Sharing and Context
E. Sudderth, A. Torralba, W. T. Freeman, and A. Willsky. ICCV 2005.
  • Scenes share objects
  • Objects share parts
  • Parts share features

64
From images to visual words
Maximally Stable Extremal Regions
Linked Sequences of Canny Edges
Affinely Adapted Harris Corners
  • Some invariance to lighting and pose variations
  • Dense, multiscale over-segmentation of the image
65
From images to visual words
SIFT Descriptors
  • Normalized histograms of orientation energy
  • Compute a 1,000-word dictionary via K-means (see the sketch below)
  • Map each feature to its nearest visual word

Lowe, IJCV 2004
Appearance of feature i in image j
2D position of feature i in image j
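A minimal sketch of the dictionary step, with random arrays standing in for the 128-D SIFT descriptors:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
descriptors = rng.random((20_000, 128))   # placeholder SIFT descriptors

# Build the 1,000-word visual dictionary by K-means clustering.
kmeans = KMeans(n_clusters=1000, n_init=1, random_state=0).fit(descriptors)

def to_visual_words(desc):
    """Map each descriptor (rows of an n x 128 array) to its nearest word."""
    return kmeans.predict(desc)

words = to_visual_words(rng.random((200, 128)))  # one word index per feature
```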
66
Object models
Constellation model
Bag of words
Structured clusters
E. Sudderth, A. Torralba, W. T. Freeman, and A. Willsky. ICCV 2005.
67
Counting Objects & Parts
How many parts?
68
Generative Model for Objects
69
Graphical Model for Objects
For each of J images, sample a reference position.
[Plate diagram of the object model: per-image reference position, part assignments z with mixture weights, and per-part location and appearance parameters; plates over K parts, N features, and J images]
70
Parametric Object Model
  • For a fixed reference position, the generative model is equivalent to a finite mixture model:

    p(w, y) = Σ_{k=1..K} π_k · Pr(w | appearance of part k) · Pr(y | location of part k)

    with mixture weights π_k over the K parts (see the sketch below).
  • How many parts should we choose?
    • Too few reduces model accuracy.
    • Too many causes overfitting and poor generalization.

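A sketch of this finite mixture with placeholder parameters (K = 3 parts, a V-word appearance vocabulary, Gaussian location densities):

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
K, V = 3, 100                                  # parts, appearance vocabulary
pi = np.array([0.5, 0.3, 0.2])                 # mixture weights over parts
appearance = rng.dirichlet(np.ones(V), size=K)  # Pr(w | part), K rows
means = np.array([[0.0, 0.0], [5.0, 1.0], [2.0, 4.0]])
covs = [np.eye(2)] * K                         # per-part location Gaussians

def feature_likelihood(w, y):
    """p(w, y) = sum_k pi_k * Pr(w | part k) * N(y; mu_k, Sigma_k)."""
    return sum(pi[k] * appearance[k, w] *
               multivariate_normal.pdf(y, means[k], covs[k])
               for k in range(K))

p = feature_likelihood(w=7, y=[4.5, 1.2])
```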
71
Dirichlet Process Object Model
  • The Dirichlet process allows using an infinite mixture: Dirichlet processes define priors over the mixture weights π_k.
  • Some weights are effectively zero, which corresponds to having a finite number of parts (automatically selected from the data); see the stick-breaking sketch below.

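A small stick-breaking sketch of this prior, with an assumed concentration parameter, showing that only a few weights are appreciably non-zero:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = 2.0                     # concentration parameter (assumed value)
K = 50                          # truncation level, for illustration only

v = rng.beta(1.0, alpha, size=K)                          # stick proportions
pi = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))  # weights pi_k

print(np.round(pi[:10], 3))                   # a few dominant weights ...
print((pi > 1e-3).sum(), "weights above 1e-3 out of", K)  # ... rest ~ zero
```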
72
Dirichlet Process Object Model
[Plate diagram of the Dirichlet process object model: the parametric model with the finite part mixture replaced by a DP prior over infinitely many parts; plates over N features and J images]
73
Decomposing Faces into Parts
[Figure: number of parts learned as a function of the number of training images (4, 16, and 64 images)]
74
Multiclass object model
  • We want to model N object classes jointly
  • We want an efficient representation
  • We want to transfer between categories
  • Furthermore:
  • We do not know how many parts to share.
  • We do not know how many parts each object should use (each object needs a different number of parts).

75
Learning Shared Parts
  • Objects are often locally similar in appearance
  • Discover parts shared across categories
  • Need unsupervised methods for part discovery

Sharing features in a discriminative framework
(Torralba, Murphy, Freeman, CVPR 2004)
76
HDP Object Model
  • We learn the number of parts.
  • Each object uses a different number of parts.
  • The model assumes a known number of object
    categories.

77
HDP Object Model
There is no context, so the model is happy to create impossible part combinations.
78
HDP Object Model
A global Dirichlet process learns the number of shared parts. The reference position allows a consistent spatial model, and objects reuse the global parts in different proportions.
[Plate diagram of the HDP object model: parts location model and parts appearance model; a joint model of O objects, with plates over N features, J images, and O object categories]
79
Learning HDPs: Gibbs Sampling
[Diagram: the reference positions and part assignments are sampled (the latter via implicit table assignments), while the mixture weights and the part location and appearance parameters are integrated out]
80
Sharing Parts: 16 Categories
  • Caltech 101 Dataset (Li & Perona)
  • Horses (Borenstein & Ullman)
  • Cat & dog faces (Vidal-Naquet & Ullman)
  • Bikes from Graz-02 (Opelt & Pinz)
  • Google

81
Visualization of Shared Parts
Pr(position | part)
Pr(appearance | part)
82
Visualization of Shared Parts
Pr(position | part)
Pr(appearance | part)
83
Visualization of Part Densities
MDS Embedding of Pr(part | object)
84
Detection Task
versus
85
Detection Results
Detection vs. Training Set Size
6 Training Images per Category
86
Recognition Task
versus
87
Recognition Results
6 Training Images per Category
Recognition vs. Training Set Size
88
Context
What do you think are the hidden objects?
1
2
89
Context
What do you think are the hidden objects?
Even without local object models, we can make
reasonable detections!
90
The multiple personalities of a blob
91
The multiple personalities of a blob
Human vision: Biederman; Bar & Ullman; Palmer; ...
92
Context relationships between objects
First detect simple objects (reliable detectors) that provide strong contextual constraints on the target (screen → keyboard → mouse).
93
Global context: location priming
How far can we go without object detectors?
  • Context features that represent the scene instead of other objects.
  • The global features can provide:
    • Object presence
    • Location priming (see the sketch below)
    • Scale priming
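A toy sketch of location priming, with placeholder maps: a local detector's score map is modulated by a location prior that, as in the talk, mostly constrains the vertical position:

```python
import numpy as np

rng = np.random.default_rng(0)
local_scores = rng.random((60, 80))          # placeholder detector score map

# Vertical location prior predicted from global features (assumed Gaussian).
rows = np.arange(60)
row_prior = np.exp(-0.5 * ((rows - 40) / 6.0) ** 2)
prior = np.tile((row_prior / row_prior.sum())[:, None], (1, 80))

combined = local_scores * prior              # context-modulated detections
y, x = np.unravel_index(np.argmax(combined), combined.shape)
```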
94
Object global features
First we create a dictionary of scene features and associated object locations.
[Figure: feature maps with associated screen locations]
Only the vertical position of the object is well constrained by the global features.
95
Object global features
How to compute the global features
96
Car detection with global features
[Figure: features selected by boosting for the car class, shown by boosting round]
97
Combining global and local
[Figure: ROC curves for the same total number of features (100 boosting rounds) for car, building, road, screen, keyboard, mouse, and desk, comparing global and local features vs. only local features]
98
Clustering of objects with local and global
feature sharing
Clustering with local features
Clustering with global and local features
Objects are similar if they share local features
and they appear in the same contexts.
99
Conclusions
  • Sharing information at multiple levels leads to reduced computation and better generalization.
  • What are the object representations that allow transfer between classes?