Title: Object Recognition
1. Object Recognition
Outline:
- Introduction
- Representation: Concepts
- Representation: Features
- Learning & Recognition
- Recognition & Segmentation
2. Credits
Major sources of material, including figures and slides:
- Riesenhuber & Poggio. Hierarchical models of object recognition in cortex. Nature Neuroscience, 1999.
- B. Mel. SEEMORE. Neural Computation, 1997.
- Ullman, Vidal-Naquet & Sali. Visual features of intermediate complexity and their use in classification. Nature Neuroscience, 2002.
- David G. Lowe. Distinctive Image Features from Scale-Invariant Keypoints. Int. J. of Computer Vision, 2004.
- various resources on the WWW
3. Why is it difficult?
Because appearance varies drastically with:
- position/pose/scale
- lighting/shadows
- articulation/expression
- partial occlusion
⇒ we need invariant recognition!
4. The Classical View
Historically: Image → Feature Extraction → Segmentation → Recognition
Problem: bottom-up segmentation only works in a very limited range of situations. This architecture is fundamentally flawed!
Two ways out: 1) direct recognition, 2) integration of segmentation and recognition.
5. Ventral Stream
From V1/V2 (edges, bars) to IT (objects, faces): larger receptive fields, higher complexity, higher invariance along the hierarchy.
(Figures: K. Tanaka (IT), D. van Essen (V2))
6. Basic Models
Seminal work by Fukushima; a newer version by Riesenhuber and Poggio.
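To make the hierarchy concrete, here is a minimal sketch of the alternating template-matching and MAX-pooling stages used in such models (in the spirit of Riesenhuber & Poggio's HMAX; the 3x3 templates, pooling size, and function names are illustrative assumptions, not the published model):

```python
import numpy as np

def s_layer(image, templates):
    """'Simple'-cell stage: template matching. Each 3x3 template is
    correlated with the image at every location (illustrative sizes)."""
    H, W = image.shape
    out = np.zeros((len(templates), H - 2, W - 2))
    for t, tmpl in enumerate(templates):
        for y in range(H - 2):
            for x in range(W - 2):
                out[t, y, x] = (image[y:y+3, x:x+3] * tmpl).sum()
    return out

def c_layer(s_maps, pool=2):
    """'Complex'-cell stage: MAX pooling over a local neighborhood,
    which buys tolerance to position (and, across scales, to size)."""
    T, H, W = s_maps.shape
    cropped = s_maps[:, :H - H % pool, :W - W % pool]
    return cropped.reshape(T, H // pool, pool, W // pool, pool).max(axis=(2, 4))
```

Stacking several such S/C pairs yields features of increasing complexity and invariance, mirroring the ventral stream picture of the previous slide.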
7. Questions
- What are the intermediate features?
- How/why are they learned?
- How is the invariance computation implemented?
- What nonlinearities at what level (dendrites?)
- How is invariance learned? (temporal continuity? role of eye movements?)
- The basic model is feedforward; what do feedback connections do? (attention/segmentation/Bayesian inference?)
8. Representation: Concepts
- 3-d models (won't talk about these)
- view-based:
  - holistic descriptions of a view
  - invariant features/histogram techniques
  - spatial constellations of localized features
9. Holistic Descriptions I: Templates
Idea:
- compare image (regions) directly to a template
- image patches and object templates are represented as high-dimensional vectors
- simple comparison metrics (Euclidean distance, normalized correlation, ...); see the sketch below
Problem:
- such metrics are not robust w.r.t. even small changes in position/aspect/scale or to deformations
⇒ difficult to achieve invariance
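A minimal sketch of direct template comparison with normalized correlation (function names are mine; illustrative only):

```python
import numpy as np

def normalized_correlation(patch, template):
    """Normalized cross-correlation of two same-sized 2-D arrays;
    1.0 means a perfect (up to brightness/contrast) match."""
    p = patch.ravel() - patch.mean()
    t = template.ravel() - template.mean()
    return float(p @ t / (np.linalg.norm(p) * np.linalg.norm(t) + 1e-12))

def match_template(image, template):
    """Slide the template over the image; return the best location.
    Note how even a one-pixel shift or small deformation of the object
    changes the score: exactly the robustness problem mentioned above."""
    H, W = image.shape
    h, w = template.shape
    best, best_pos = -np.inf, (0, 0)
    for y in range(H - h + 1):
        for x in range(W - w + 1):
            s = normalized_correlation(image[y:y+h, x:x+w], template)
            if s > best:
                best, best_pos = s, (y, x)
    return best, best_pos
```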
10. Holistic Descriptions II: Eigenspace Approach
Somewhat better: eigenspace approaches
- perform Principal Component Analysis (PCA) on training images (e.g., Eigenfaces)
- compare images by projecting onto a subset of the PCs (sketch below)
Examples: Murase & Nayar (1995), Turk & Pentland (1992)
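A hedged sketch of the eigenspace pipeline (PCA via SVD, then nearest neighbor in the projected space; all names are mine):

```python
import numpy as np

def fit_eigenspace(train_images, k):
    """PCA on vectorized training images (n_images x n_pixels):
    returns the mean image and the top-k principal components."""
    X = train_images.reshape(len(train_images), -1).astype(float)
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]                      # rows of Vt are the "eigenfaces"

def project(image, mean, components):
    """Coordinates of an image in the k-dimensional eigenspace."""
    return components @ (image.ravel().astype(float) - mean)

def recognize(query, mean, components, gallery_coords):
    """Compare images by Euclidean distance between their projections."""
    q = project(query, mean, components)
    return int(np.argmin(np.linalg.norm(gallery_coords - q, axis=1)))
```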
11. Assessment
- quite successful for segmented and carefully aligned images (e.g., eyes and nose at the same pixel coordinates in all images)
- but similar problems as above:
  - not well suited for clutter
  - problems with occlusions
- some notable extensions try to deal with this (e.g., Leonardis, 1996, 1997)
12. Feature Histograms
Idea: reach invariance by computing invariant features.
Examples: Mel (1997), Schiele & Crowley (1997, 2000)
Histogram pooling: throw occurrences of a simple feature from all image regions together into one bin (sketch below).
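A sketch of histogram pooling with one simple feature, local gradient orientation (a stand-in for the richer feature sets of Mel or Schiele & Crowley; names and parameters are mine):

```python
import numpy as np

def orientation_histogram(image, n_bins=8):
    """Pool a simple feature (local gradient orientation) over the whole
    image into one histogram; all spatial information is discarded."""
    gy, gx = np.gradient(image.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % np.pi               # orientation in [0, pi)
    bins = (ang / np.pi * n_bins).astype(int).clip(0, n_bins - 1)
    hist = np.bincount(bins.ravel(), weights=mag.ravel(), minlength=n_bins)
    return hist / (hist.sum() + 1e-12)             # normalize

def histogram_match(h1, h2):
    """Histogram intersection: a simple similarity between two images."""
    return float(np.minimum(h1, h2).sum())
```

Because the pooling discards where each feature came from, two objects in one image produce a single merged histogram, which is the binding problem discussed on the next slide.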
13. Assessment
- works very well for segmented images with only one object, but...
Problem:
- histograms of simple features over the whole image lead to a superposition catastrophe; there is no binding mechanism
- with several objects in the scene, the histogram contains all their features, with no representation of which features came from the same object
- the system breaks down for clutter or complex backgrounds
14. B. Mel (1997)
15. Training and test images, performance
(Figure: panels A–E)
16. Feature Constellations
Observation: holistic templates and histogram techniques can't handle cluttered scenes well.
Idea: how about constellations of features? E.g., a face is a constellation of eyes, nose, mouth, etc.
17. Representation: Features
Only local features are discussed:
- image patches
- wavelet bases, e.g., Haar, Gabor
- complex features, e.g., SIFT (Scale-Invariant Feature Transform)
18. Image Patches
Ullman, Vidal-Naquet & Sali (2002): select informative fragments by their merit for classification; a fragment's likelihood ratio under class vs. non-class images determines its weight.
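A hedged reconstruction of the quantities named on the slide, following Ullman et al. (2002) (notation mine): a fragment $F$ is compared against the class $C$ via its likelihood ratio, whose log serves as the classification weight, while the merit used for fragment selection is the mutual information between fragment presence and class membership:

```latex
R(F) = \frac{p(F \mid C)}{p(F \mid \bar{C})}, \qquad
w(F) = \log R(F), \qquad
\mathrm{merit}(F) = I(F;C) = \sum_{f,c} p(f,c)\,\log\frac{p(f,c)}{p(f)\,p(c)}
```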
19. Intermediate complexity is best (a trivial result, really)
20. Recognition examples
21. Gabor Wavelets
(Figure: image space vs. frequency space)
- in frequency space, a Gabor wavelet is a Gaussian
- different wavelets are scaled/rotated versions of a mother wavelet
22. Gabor Wavelets as Filters
Gabor filters have a sin() and a cos() part; compute the correlation of the image with the filter at every location x0 (sketch below).
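A minimal sketch of such a filter pair and its response at one location (wavelength, sigma, and function names are illustrative assumptions):

```python
import numpy as np

def gabor_kernel(ksize, wavelength, theta, sigma):
    """Even (cos) and odd (sin) Gabor kernels: a plane wave with wave
    vector k (set by wavelength and orientation theta) under a Gaussian."""
    half = ksize // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)     # rotate coordinates
    env = np.exp(-(x**2 + y**2) / (2 * sigma**2))  # Gaussian envelope
    phase = 2 * np.pi * xr / wavelength
    return env * np.cos(phase), env * np.sin(phase)

def gabor_response(image, even, odd, y0, x0):
    """Correlation of the image with both filter parts at (y0, x0)
    (top-left corner of the window); the magnitude of the combined
    complex response is what a 'jet' entry stores."""
    h, w = even.shape
    patch = image[y0:y0 + h, x0:x0 + w].astype(float)
    c, s = float((patch * even).sum()), float((patch * odd).sum())
    return np.hypot(c, s)
```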
23. Tiling of Frequency Space: Jets
(Figure: measured frequency tuning of biological neurons (left) and dense coverage (right))
Applying different Gabor filters (with different k) to the same image location gives a vector of filter responses: a "jet".
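The jet is then just the stacked magnitudes across scales and orientations; a sketch reusing gabor_kernel and gabor_response from the previous block (parameter values are illustrative):

```python
import numpy as np

def compute_jet(image, y0, x0, wavelengths=(4, 8, 16), n_orient=8, ksize=21):
    """Vector of Gabor response magnitudes over scales and orientations
    at a single image location: one 'jet' with 3*8 = 24 entries here."""
    jet = []
    for lam in wavelengths:
        for j in range(n_orient):
            theta = np.pi * j / n_orient
            even, odd = gabor_kernel(ksize, lam, theta, sigma=lam / 2)
            jet.append(gabor_response(image, even, odd, y0, x0))
    return np.asarray(jet)
```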
24. SIFT Features
Step 1: find scale-space extrema (sketch below).
25. SIFT Features (cont.)
Step 2: apply contrast and curvature requirements.
26. SIFT Features (cont.)
Step 3: the local image descriptor extracted at keypoints is a 128-dimensional vector.
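A coarse sketch of step 1 (plus a crude version of step 2's contrast test), following the difference-of-Gaussians construction of Lowe (2004); the parameters and the single flat scale stack (no octaves) are simplifying assumptions, and gaussian_filter comes from scipy:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_extrema(image, sigma0=1.6, n_scales=4, thresh=0.03):
    """Step 1 of SIFT: build a difference-of-Gaussians stack and keep
    pixels that are extrema among their 26 neighbors in space and scale."""
    sigmas = [sigma0 * (2 ** (i / 2)) for i in range(n_scales + 1)]
    blurred = [gaussian_filter(image.astype(float), s) for s in sigmas]
    dog = np.stack([b1 - b0 for b0, b1 in zip(blurred, blurred[1:])])
    keypoints = []
    for s in range(1, dog.shape[0] - 1):
        for y in range(1, dog.shape[1] - 1):
            for x in range(1, dog.shape[2] - 1):
                v = dog[s, y, x]
                if abs(v) < thresh:
                    continue                      # crude contrast test (step 2)
                cube = dog[s-1:s+2, y-1:y+2, x-1:x+2]
                if v == cube.max() or v == cube.min():
                    keypoints.append((s, y, x))   # scale-space extremum
    return keypoints
```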
27. Learning and Recognition
- top-down model matching
  - Elastic Graph Matching
- bottom-up indexing
  - with or without shared features
28. Elastic Graph Matching (EGM)
Representation: graph nodes labelled with jets (Gabor filter responses at different scales/orientations).
Matching: minimize a cost function that punishes dissimilarities of the Gabor responses and distortions of the graph, using stochastic optimization techniques (see the sketch below).
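A hedged reconstruction of such a cost function (notation mine, in the spirit of the EGM literature): jet similarity $S$ is rewarded at each node $n$, geometric distortion of each edge $(n,m)$ relative to the model graph is penalized, and $\lambda$ trades the two off:

```latex
E = -\sum_{n \in \text{nodes}} S\!\left(J^{\text{image}}_n, J^{\text{model}}_n\right)
    + \lambda \sum_{(n,m) \in \text{edges}}
      \left\| (\vec{x}_n - \vec{x}_m) - (\vec{x}^{\,\text{model}}_n - \vec{x}^{\,\text{model}}_m) \right\|^2
```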
29. Bunch Graphs
Idea: add invariance by labelling graph nodes with a collection, or "bunch", of different feature exemplars (Wiskott et al., 1995, 1997).
Advantage: finding the facial features can be decoupled from identification.
Matching uses a MAX rule (below).
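The MAX rule, written out with the notation of the sketch above: the similarity at node $n$ is the best match over all exemplars in that node's bunch, so any exemplar (e.g., any of several eye shapes) can account for the feature:

```latex
S_{\text{bunch}}\!\left(J^{\text{image}}_n\right)
  = \max_{b \,\in\, \text{bunch}(n)} S\!\left(J^{\text{image}}_n, J^{b}_n\right)
```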
30. Indexing Methods
- when you want to recognize very many objects, it is inefficient to check each model individually by searching for all of its features in a top-down fashion
- better: indexing methods
- also: share features among object models
31. Recognition with SIFT Features
Recognition: extract SIFT features, match each to its nearest neighbor in a database of stored features, and use a Hough transform to pool the votes (sketch below).
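A sketch of the matching stage with Lowe's distance-ratio test; the vote pooling here is simplified to per-object counting, whereas a full system would bin votes over pose via the Hough transform (names and the simplification are mine):

```python
import numpy as np
from collections import Counter

def match_and_vote(query_descs, db_descs, db_labels, ratio=0.8):
    """Match each query descriptor to its nearest database descriptor,
    keep only distinctive matches (Lowe's ratio test), and pool votes
    per object model."""
    votes = Counter()
    for d in query_descs:
        dists = np.linalg.norm(db_descs - d, axis=1)
        i1, i2 = np.argsort(dists)[:2]
        if dists[i1] < ratio * dists[i2]:      # distinctive match only
            votes[db_labels[i1]] += 1
    return votes.most_common()
```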
32. Recognition with Gabor Jets and Color Features
33. Scaling Behavior when Sharing Features between Models
- recognition speed is limited more by the number of features than by the number of object models; a modest number of features is o.k.
- can incorporate many feature types
- can incorporate stereo (reasoning about occlusions)
34. Hierarchies of Features
- long history of using hierarchies:
  - Fukushima's Neocognitron (1983)
  - Nelson & Selinger (1998, 1999)
- advantages of using a hierarchy:
  - faster learning and processing
  - better grip on correlated deformations
  - easier to find the proper specificity vs. invariance tradeoff(?)
35. Feature Learning
- unsupervised clustering is not necessarily optimal for discrimination
- use a big bag of features and fish out the useful ones (e.g., via boosting; Viola, 1997); this takes very long to train, since you have to consider every feature from that big bag
- note: the usefulness of one feature depends on which other ones you are already using (see the selection sketch below)
- learn higher-level features as (nonlinear) combinations of lower-level features (Perona et al., 2000); also takes very long to train, and only up to 5 features, but a locality constraint could be used
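To make "fishing out the useful ones" concrete, a toy sketch of greedy forward selection (a strong simplification of boosting-style selection; everything here is illustrative). Because each round re-scores every remaining candidate against the features already chosen, it shows both the training cost and the dependence between features noted above:

```python
import numpy as np

def greedy_select(responses, labels, n_select):
    """responses: (n_images, n_features) binary feature detections,
    labels: (n_images,) 0/1 class labels. Greedily add the feature that
    most improves a simple majority vote; every remaining candidate is
    re-scored each round (slow), and a feature's value depends on the
    features already chosen."""
    chosen = []
    scores = np.zeros(len(labels))
    for _ in range(n_select):
        best_f, best_acc = None, -1.0
        for f in range(responses.shape[1]):
            if f in chosen:
                continue
            trial = scores + responses[:, f]
            acc = np.mean((trial > (len(chosen) + 1) / 2) == labels)
            if acc > best_acc:
                best_f, best_acc = f, acc
        chosen.append(best_f)
        scores += responses[:, best_f]
    return chosen
```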
36. Feedback
Question: why all the feedback connections in the brain? Are they important for on-line processing?
Neuroscience: object recognition happens within 150 ms (Thorpe et al., 1996), but IT neurons show interesting temporal response properties (Oram & Richmond, 1999), and some V1 neurons restore a line behind an occluder.
Idea: a feed-forward architecture can't correct errors made at early stages later on; a feedback architecture can! High-level hypotheses try to reinforce their lower-level evidence while hypotheses compete at all levels.
37. Recognition & Segmentation
- basic idea: integrate recognition with segmentation in a feedback architecture
- object hypotheses reinforce their supporting evidence and inhibit competing evidence, suppressing features that do not belong to them (the idea goes back to at least the PDP books); a toy sketch follows below
- at the same time, restore missing features due to partial occlusion (associative memory property)
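A toy sketch of such dynamics in the spirit of interactive activation from the PDP books (the update rule and all parameters are illustrative assumptions, not a published model):

```python
import numpy as np

def interactive_activation(W, features, n_steps=50, lr=0.1, inhibition=0.5):
    """W[h, f] = 1 if object hypothesis h expects feature f, else 0.
    Hypotheses gather support from active features; active hypotheses
    reinforce their expected features top-down and implicitly suppress
    the rest; hypotheses inhibit each other (competition at all levels)."""
    f = features.astype(float).copy()   # bottom-up feature evidence in [0, 1]
    h = np.zeros(W.shape[0])            # object hypothesis activations
    for _ in range(n_steps):
        support = W @ f                                    # bottom-up support
        h += lr * (support - inhibition * (h.sum() - h))   # lateral inhibition
        h = np.clip(h, 0.0, 1.0)
        f += lr * (W.T @ h - 0.5)       # top-down reinforce/suppress features
        f = np.clip(f, 0.0, 1.0)        # occluded features can be restored
    return h, f
```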
38. Current Work in this Area
- mostly demonstrates how recognition can aid segmentation
- what is missing is a clear and elegant demonstration of a truly integrated system that shows how the two kinds of processing help each other
- maybe don't treat them as two kinds of processing, but as one inference problem
- how best to do this? the million dollar question