Title: Object Recognition
1. Object Recognition
Outline:
- Introduction
- Representation: Concepts
- Representation: Features
- Learning & Recognition
- Recognition & Segmentation
2. Credits
Major sources of material, including figures and slides:
- Riesenhuber & Poggio. Hierarchical models of object recognition in cortex. Nature Neuroscience, 1999.
- B. Mel. SEEMORE. Neural Computation, 1997.
- Ullman, Vidal-Naquet & Sali. Visual features of intermediate complexity and their use in classification. Nature Neuroscience, 2002.
- David G. Lowe. Distinctive Image Features from Scale-Invariant Keypoints. Int. J. of Computer Vision, 2004.
- various resources on the WWW
3. Why is it difficult?
Because appearance varies drastically with:
- position/pose/scale
- lighting/shadows
- articulation/expression
- partial occlusion
⇒ we need invariant recognition!
4. The Classical View
Historically: Image → Feature Extraction → Segmentation → Recognition
Problem: bottom-up segmentation only works in a very limited range of situations. This architecture is fundamentally flawed!
Two ways out: 1) direct recognition, 2) integration of segmentation and recognition.
5. Ventral Stream
From V1/V2 (edges, bars) to IT (objects, faces): larger receptive fields, higher complexity, higher invariance along the hierarchy.
(Figures: K. Tanaka (IT), D. van Essen (V2))
6. Basic Models
Seminal work by Fukushima; a newer version by Riesenhuber and Poggio.
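To make the hierarchy concrete, here is a minimal sketch of the alternating template-matching and MAX-pooling stages used in such models (in the spirit of Riesenhuber & Poggio's HMAX; the 3x3 templates, pooling size, and function names are illustrative assumptions, not the published model):

```python
import numpy as np

def s_layer(image, templates):
    """'Simple'-cell stage: template matching. Each 3x3 template is
    correlated with the image at every location (illustrative sizes)."""
    H, W = image.shape
    out = np.zeros((len(templates), H - 2, W - 2))
    for t, tmpl in enumerate(templates):
        for y in range(H - 2):
            for x in range(W - 2):
                out[t, y, x] = (image[y:y+3, x:x+3] * tmpl).sum()
    return out

def c_layer(s_maps, pool=2):
    """'Complex'-cell stage: MAX pooling over a local neighborhood,
    which buys tolerance to position (and, across scales, to size)."""
    T, H, W = s_maps.shape
    cropped = s_maps[:, :H - H % pool, :W - W % pool]
    return cropped.reshape(T, H // pool, pool, W // pool, pool).max(axis=(2, 4))
```

Stacking several such S/C pairs yields features of increasing complexity and invariance, mirroring the ventral stream picture of the previous slide.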
7. Questions
- What are the intermediate features?
- How/why are they learned?
- How is the invariance computation implemented?
- What nonlinearities at what level (dendrites?)
- How is invariance learned? (temporal continuity? role of eye movements?)
- The basic model is feedforward; what do feedback connections do? (attention/segmentation/Bayesian inference?)
8. Representation: Concepts
- 3-d models (won't talk about these)
- view-based:
  - holistic descriptions of a view
  - invariant features/histogram techniques
  - spatial constellations of localized features
9. Holistic Descriptions I: Templates
Idea:
- compare image (regions) directly to a template
- image patches and object templates are represented as high-dimensional vectors
- simple comparison metrics (Euclidean distance, normalized correlation, ...); see the sketch below
Problem:
- such metrics are not robust w.r.t. even small changes in position/aspect/scale or to deformations
⇒ difficult to achieve invariance
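A minimal sketch of direct template comparison with normalized correlation (function names are mine; illustrative only):

```python
import numpy as np

def normalized_correlation(patch, template):
    """Normalized cross-correlation of two same-sized 2-D arrays;
    1.0 means a perfect (up to brightness/contrast) match."""
    p = patch.ravel() - patch.mean()
    t = template.ravel() - template.mean()
    return float(p @ t / (np.linalg.norm(p) * np.linalg.norm(t) + 1e-12))

def match_template(image, template):
    """Slide the template over the image; return the best location.
    Note how even a one-pixel shift or small deformation of the object
    changes the score: exactly the robustness problem mentioned above."""
    H, W = image.shape
    h, w = template.shape
    best, best_pos = -np.inf, (0, 0)
    for y in range(H - h + 1):
        for x in range(W - w + 1):
            s = normalized_correlation(image[y:y+h, x:x+w], template)
            if s > best:
                best, best_pos = s, (y, x)
    return best, best_pos
```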
10. Holistic Descriptions II: Eigenspace Approach
Somewhat better: eigenspace approaches
- perform Principal Component Analysis (PCA) on training images (e.g., Eigenfaces)
- compare images by projecting onto a subset of the PCs (sketch below)
Examples: Murase & Nayar (1995), Turk & Pentland (1992)
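A hedged sketch of the eigenspace pipeline (PCA via SVD, then nearest neighbor in the projected space; all names are mine):

```python
import numpy as np

def fit_eigenspace(train_images, k):
    """PCA on vectorized training images (n_images x n_pixels):
    returns the mean image and the top-k principal components."""
    X = train_images.reshape(len(train_images), -1).astype(float)
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]                      # rows of Vt are the "eigenfaces"

def project(image, mean, components):
    """Coordinates of an image in the k-dimensional eigenspace."""
    return components @ (image.ravel().astype(float) - mean)

def recognize(query, mean, components, gallery_coords):
    """Compare images by Euclidean distance between their projections."""
    q = project(query, mean, components)
    return int(np.argmin(np.linalg.norm(gallery_coords - q, axis=1)))
```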
11. Assessment
- quite successful for segmented and carefully aligned images (e.g., eyes and nose at the same pixel coordinates in all images)
- but similar problems as above:
  - not well suited for clutter
  - problems with occlusions
- some notable extensions try to deal with this (e.g., Leonardis, 1996, 1997)
12. Feature Histograms
Idea: reach invariance by computing invariant features.
Examples: Mel (1997), Schiele & Crowley (1997, 2000)
Histogram pooling: throw occurrences of a simple feature from all image regions together into one bin (sketch below).
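A sketch of histogram pooling with one simple feature, local gradient orientation (a stand-in for the richer feature sets of Mel or Schiele & Crowley; names and parameters are mine):

```python
import numpy as np

def orientation_histogram(image, n_bins=8):
    """Pool a simple feature (local gradient orientation) over the whole
    image into one histogram; all spatial information is discarded."""
    gy, gx = np.gradient(image.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % np.pi               # orientation in [0, pi)
    bins = (ang / np.pi * n_bins).astype(int).clip(0, n_bins - 1)
    hist = np.bincount(bins.ravel(), weights=mag.ravel(), minlength=n_bins)
    return hist / (hist.sum() + 1e-12)             # normalize

def histogram_match(h1, h2):
    """Histogram intersection: a simple similarity between two images."""
    return float(np.minimum(h1, h2).sum())
```

Because the pooling discards where each feature came from, two objects in one image produce a single merged histogram, which is the binding problem discussed on the next slide.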
13. Assessment
- works very well for segmented images with only one object, but...
Problem:
- histograms of simple features over the whole image lead to a superposition catastrophe; there is no binding mechanism
- with several objects in the scene, the histogram contains all their features, with no representation of which features came from the same object
- the system breaks down for clutter or complex backgrounds
14. B. Mel (1997)
15. Training and test images, performance
(Figure: panels A–E)
16. Feature Constellations
Observation: holistic templates and histogram techniques can't handle cluttered scenes well.
Idea: how about constellations of features? E.g., a face is a constellation of eyes, nose, mouth, etc.
17. Representation: Features
Only local features are discussed:
- image patches
- wavelet bases, e.g., Haar, Gabor
- complex features, e.g., SIFT (Scale-Invariant Feature Transform)
18. Image Patches
Ullman, Vidal-Naquet & Sali (2002): select informative fragments by their merit for classification; a fragment's likelihood ratio under class vs. non-class images determines its weight.
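A hedged reconstruction of the quantities named on the slide, following Ullman et al. (2002) (notation mine): a fragment $F$ is compared against the class $C$ via its likelihood ratio, whose log serves as the classification weight, while the merit used for fragment selection is the mutual information between fragment presence and class membership:

```latex
R(F) = \frac{p(F \mid C)}{p(F \mid \bar{C})}, \qquad
w(F) = \log R(F), \qquad
\mathrm{merit}(F) = I(F;C) = \sum_{f,c} p(f,c)\,\log\frac{p(f,c)}{p(f)\,p(c)}
```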
19. Intermediate complexity is best (a trivial result, really)
20. Recognition examples
21. Gabor Wavelets
(Figure: image space vs. frequency space)
- in frequency space, a Gabor wavelet is a Gaussian
- different wavelets are scaled/rotated versions of a mother wavelet
22. Gabor Wavelets as Filters
Gabor filters have a sin() and a cos() part; compute the correlation of the image with the filter at every location x0 (sketch below).
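A minimal sketch of such a filter pair and its response at one location (wavelength, sigma, and function names are illustrative assumptions):

```python
import numpy as np

def gabor_kernel(ksize, wavelength, theta, sigma):
    """Even (cos) and odd (sin) Gabor kernels: a plane wave with wave
    vector k (set by wavelength and orientation theta) under a Gaussian."""
    half = ksize // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)     # rotate coordinates
    env = np.exp(-(x**2 + y**2) / (2 * sigma**2))  # Gaussian envelope
    phase = 2 * np.pi * xr / wavelength
    return env * np.cos(phase), env * np.sin(phase)

def gabor_response(image, even, odd, y0, x0):
    """Correlation of the image with both filter parts at (y0, x0)
    (top-left corner of the window); the magnitude of the combined
    complex response is what a 'jet' entry stores."""
    h, w = even.shape
    patch = image[y0:y0 + h, x0:x0 + w].astype(float)
    c, s = float((patch * even).sum()), float((patch * odd).sum())
    return np.hypot(c, s)
```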
23. Tiling of Frequency Space: Jets
(Figure: measured frequency tuning of biological neurons (left) and dense coverage (right))
Applying different Gabor filters (with different k) to the same image location gives a vector of filter responses: a "jet".
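The jet is then just the stacked magnitudes across scales and orientations; a sketch reusing gabor_kernel and gabor_response from the previous block (parameter values are illustrative):

```python
import numpy as np

def compute_jet(image, y0, x0, wavelengths=(4, 8, 16), n_orient=8, ksize=21):
    """Vector of Gabor response magnitudes over scales and orientations
    at a single image location: one 'jet' with 3*8 = 24 entries here."""
    jet = []
    for lam in wavelengths:
        for j in range(n_orient):
            theta = np.pi * j / n_orient
            even, odd = gabor_kernel(ksize, lam, theta, sigma=lam / 2)
            jet.append(gabor_response(image, even, odd, y0, x0))
    return np.asarray(jet)
```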
24. SIFT Features
Step 1: find scale-space extrema (sketch below).
25. SIFT Features (cont.)
Step 2: apply contrast and curvature requirements.
26. SIFT Features (cont.)
Step 3: the local image descriptor extracted at keypoints is a 128-dimensional vector.
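A coarse sketch of step 1 (plus a crude version of step 2's contrast test), following the difference-of-Gaussians construction of Lowe (2004); the parameters and the single flat scale stack (no octaves) are simplifying assumptions, and gaussian_filter comes from scipy:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_extrema(image, sigma0=1.6, n_scales=4, thresh=0.03):
    """Step 1 of SIFT: build a difference-of-Gaussians stack and keep
    pixels that are extrema among their 26 neighbors in space and scale."""
    sigmas = [sigma0 * (2 ** (i / 2)) for i in range(n_scales + 1)]
    blurred = [gaussian_filter(image.astype(float), s) for s in sigmas]
    dog = np.stack([b1 - b0 for b0, b1 in zip(blurred, blurred[1:])])
    keypoints = []
    for s in range(1, dog.shape[0] - 1):
        for y in range(1, dog.shape[1] - 1):
            for x in range(1, dog.shape[2] - 1):
                v = dog[s, y, x]
                if abs(v) < thresh:
                    continue                      # crude contrast test (step 2)
                cube = dog[s-1:s+2, y-1:y+2, x-1:x+2]
                if v == cube.max() or v == cube.min():
                    keypoints.append((s, y, x))   # scale-space extremum
    return keypoints
```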
27. Learning and Recognition
- top-down model matching
  - Elastic Graph Matching
- bottom-up indexing
  - with or without shared features
28. Elastic Graph Matching (EGM)
Representation: graph nodes labelled with jets (Gabor filter responses at different scales/orientations).
Matching: minimize a cost function that punishes dissimilarities of the Gabor responses and distortions of the graph, using stochastic optimization techniques (see the sketch below).
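A hedged reconstruction of such a cost function (notation mine, in the spirit of the EGM literature): jet similarity $S$ is rewarded at each node $n$, geometric distortion of each edge $(n,m)$ relative to the model graph is penalized, and $\lambda$ trades the two off:

```latex
E = -\sum_{n \in \text{nodes}} S\!\left(J^{\text{image}}_n, J^{\text{model}}_n\right)
    + \lambda \sum_{(n,m) \in \text{edges}}
      \left\| (\vec{x}_n - \vec{x}_m) - (\vec{x}^{\,\text{model}}_n - \vec{x}^{\,\text{model}}_m) \right\|^2
```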
29. Bunch Graphs
Idea: add invariance by labelling graph nodes with a collection, or "bunch", of different feature exemplars (Wiskott et al., 1995, 1997).
Advantage: finding the facial features can be decoupled from identification.
Matching uses a MAX rule (below).
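The MAX rule, written out with the notation of the sketch above: the similarity at node $n$ is the best match over all exemplars in that node's bunch, so any exemplar (e.g., any of several eye shapes) can account for the feature:

```latex
S_{\text{bunch}}\!\left(J^{\text{image}}_n\right)
  = \max_{b \,\in\, \text{bunch}(n)} S\!\left(J^{\text{image}}_n, J^{b}_n\right)
```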
30. Indexing Methods
- when you want to recognize very many objects, it is inefficient to check each model individually by searching for all of its features in a top-down fashion
- better: indexing methods
- also: share features among object models
31. Recognition with SIFT Features
Recognition: extract SIFT features, match each to its nearest neighbor in a database of stored features, and use a Hough transform to pool the votes (sketch below).
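A sketch of the matching stage with Lowe's distance-ratio test; the vote pooling here is simplified to per-object counting, whereas a full system would bin votes over pose via the Hough transform (names and the simplification are mine):

```python
import numpy as np
from collections import Counter

def match_and_vote(query_descs, db_descs, db_labels, ratio=0.8):
    """Match each query descriptor to its nearest database descriptor,
    keep only distinctive matches (Lowe's ratio test), and pool votes
    per object model."""
    votes = Counter()
    for d in query_descs:
        dists = np.linalg.norm(db_descs - d, axis=1)
        i1, i2 = np.argsort(dists)[:2]
        if dists[i1] < ratio * dists[i2]:      # distinctive match only
            votes[db_labels[i1]] += 1
    return votes.most_common()
```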
32. Recognition with Gabor Jets and Color Features
33. Scaling Behavior when Sharing Features between Models
- recognition speed is limited more by the number of features than by the number of object models; a modest number of features is o.k.
- can incorporate many feature types
- can incorporate stereo (reasoning about occlusions)
34. Hierarchies of Features
- long history of using hierarchies:
  - Fukushima's Neocognitron (1983)
  - Nelson & Selinger (1998, 1999)
- advantages of using a hierarchy:
  - faster learning and processing
  - better grip on correlated deformations
  - easier to find the proper specificity vs. invariance tradeoff(?)
35. Feature Learning
- unsupervised clustering is not necessarily optimal for discrimination
- use a big bag of features and fish out the useful ones (e.g., via boosting; Viola, 1997); this takes very long to train, since you have to consider every feature from that big bag
- note: the usefulness of one feature depends on which other ones you are already using (see the selection sketch below)
- learn higher-level features as (nonlinear) combinations of lower-level features (Perona et al., 2000); also takes very long to train, and only up to 5 features, but a locality constraint could be used
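To make "fishing out the useful ones" concrete, a toy sketch of greedy forward selection (a strong simplification of boosting-style selection; everything here is illustrative). Because each round re-scores every remaining candidate against the features already chosen, it shows both the training cost and the dependence between features noted above:

```python
import numpy as np

def greedy_select(responses, labels, n_select):
    """responses: (n_images, n_features) binary feature detections,
    labels: (n_images,) 0/1 class labels. Greedily add the feature that
    most improves a simple majority vote; every remaining candidate is
    re-scored each round (slow), and a feature's value depends on the
    features already chosen."""
    chosen = []
    scores = np.zeros(len(labels))
    for _ in range(n_select):
        best_f, best_acc = None, -1.0
        for f in range(responses.shape[1]):
            if f in chosen:
                continue
            trial = scores + responses[:, f]
            acc = np.mean((trial > (len(chosen) + 1) / 2) == labels)
            if acc > best_acc:
                best_f, best_acc = f, acc
        chosen.append(best_f)
        scores += responses[:, best_f]
    return chosen
```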
36. Feedback
Question: why all the feedback connections in the brain? Are they important for on-line processing?
Neuroscience: object recognition happens within 150 ms (Thorpe et al., 1996), but IT neurons show interesting temporal response properties (Oram & Richmond, 1999), and some V1 neurons restore a line behind an occluder.
Idea: a feed-forward architecture can't correct errors made at early stages later on; a feedback architecture can! High-level hypotheses try to reinforce their lower-level evidence while hypotheses compete at all levels.
37. Recognition & Segmentation
- basic idea: integrate recognition with segmentation in a feedback architecture
- object hypotheses reinforce their supporting evidence and inhibit competing evidence, suppressing features that do not belong to them (the idea goes back to at least the PDP books); a toy sketch follows below
- at the same time, restore missing features due to partial occlusion (associative memory property)
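A toy sketch of such dynamics in the spirit of interactive activation from the PDP books (the update rule and all parameters are illustrative assumptions, not a published model):

```python
import numpy as np

def interactive_activation(W, features, n_steps=50, lr=0.1, inhibition=0.5):
    """W[h, f] = 1 if object hypothesis h expects feature f, else 0.
    Hypotheses gather support from active features; active hypotheses
    reinforce their expected features top-down and implicitly suppress
    the rest; hypotheses inhibit each other (competition at all levels)."""
    f = features.astype(float).copy()   # bottom-up feature evidence in [0, 1]
    h = np.zeros(W.shape[0])            # object hypothesis activations
    for _ in range(n_steps):
        support = W @ f                                    # bottom-up support
        h += lr * (support - inhibition * (h.sum() - h))   # lateral inhibition
        h = np.clip(h, 0.0, 1.0)
        f += lr * (W.T @ h - 0.5)       # top-down reinforce/suppress features
        f = np.clip(f, 0.0, 1.0)        # occluded features can be restored
    return h, f
```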
38. Current Work in this Area
- mostly demonstrates how recognition can aid segmentation
- what is missing is a clear and elegant demonstration of a truly integrated system that shows how the two kinds of processing help each other
- maybe don't treat them as two kinds of processing, but as one inference problem
- how best to do this? the million dollar question