Visual Object Recognition - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Visual Object Recognition

1
Visual Object Recognition

Bastian Leibe
Computer Vision Laboratory
ETH Zurich
Chicago, 14.07.2008

Kristen Grauman
Department of Computer Sciences
University of Texas at Austin
2
Outline

Detection with Global Appearance & Sliding Windows
Local Invariant Features: Detection & Description
Specific Object Recognition with Local Features
(Coffee Break)
Visual Words: Indexing, Bags of Words Categorization
Matching Local Features
Part-Based Models for Categorization
Current Challenges and Research Directions

3
Recognition with Local Features

Image content is transformed into local features that are invariant to translation, rotation, and scale
Goal: Verify if they belong to a consistent configuration

Local Features, e.g. SIFT
Slide credit: David Lowe
4
Finding Consistent Configurations

Global spatial models
Generalized Hough Transform [Lowe99]
RANSAC [Obdrzalek02, Chum05, Nister06]
Basic assumption: object is planar
Assumption is often justified in practice
Valid for many structures on buildings
Sufficient for small viewpoint variations on 3D objects

5
Hough Transform

Origin: Detection of straight lines in clutter
Basic idea: each candidate point votes for all lines that it is consistent with.
Votes are accumulated in a quantized array
Local maxima correspond to candidate lines
Representation of a line
Usual form y = a x + b has a singularity around 90º.
Better parameterization: x cos(θ) + y sin(θ) = ρ
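
The voting scheme on this slide can be sketched in a few lines. This is only an illustration under stated assumptions, not the presenters' code: it assumes NumPy, uses the normal form x cos(θ) + y sin(θ) = ρ with 1-degree and 1-pixel bins by default, and the names hough_lines and strongest_line are made up for this example.

```python
import numpy as np

def hough_lines(points, n_theta=180, rho_res=1.0):
    """Accumulate votes over (rho, theta) for lines x*cos(theta) + y*sin(theta) = rho."""
    points = np.asarray(points, dtype=float)
    thetas = np.arange(n_theta) * np.pi / n_theta            # angles in [0, pi)
    # No point can produce |rho| larger than this bound.
    rho_max = np.ceil(np.hypot(*np.abs(points).max(axis=0)))
    n_rho = int(np.ceil(2 * rho_max / rho_res)) + 1
    acc = np.zeros((n_rho, n_theta), dtype=int)               # quantized vote array
    cols = np.arange(n_theta)
    for x, y in points:
        # One vote per theta: the rho of the line through (x, y) with that normal angle.
        rhos = x * np.cos(thetas) + y * np.sin(thetas)
        rows = np.round((rhos + rho_max) / rho_res).astype(int)
        acc[rows, cols] += 1
    return acc, thetas, rho_max

def strongest_line(points, n_theta=180, rho_res=1.0):
    """Return (rho, theta) of the accumulator cell with the most votes."""
    acc, thetas, rho_max = hough_lines(points, n_theta, rho_res)
    row, col = np.unravel_index(acc.argmax(), acc.shape)
    return row * rho_res - rho_max, thetas[col]
```

Each point casts one vote per θ bin, so the cost is proportional to the number of points times the number of angle bins. Taking only the global maximum is the simplest possible peak detector; finding several lines would need a local-maximum search over acc.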

6
Examples

Hough transform for a square (left) and a circle (right)

7
Hough Transform: Noisy Line

Problem: Finding the true maximum

[Figure: tokens and the corresponding votes]
Slide credit: David Lowe
8
Hough Transform: Noisy Input

Problem: Lots of spurious maxima

[Figure: tokens and the corresponding votes]
Slide credit: David Lowe
9
Generalized Hough Transform [Ballard81]

Generalization for an arbitrary contour or shape
Choose a reference point for the contour (e.g. center)
For each point on the contour, remember where it is located w.r.t. the reference point
Remember radius r and angle φ relative to the contour tangent
Recognition: whenever you find a contour point, calculate the tangent angle and vote for all possible reference points
Instead of a reference point, can also vote for a transformation
=> The same idea can be used with local features!

Slide credit: Bernt Schiele
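
A minimal sketch of the training and recognition steps above, under simplifying assumptions that are not from the slides: only translation is handled (no rotation or scale), each contour point stores its offset to the reference point as a Cartesian vector rather than as radius and angle, and tangent directions are quantized into 36 bins. All function names are hypothetical.

```python
import numpy as np
from collections import defaultdict

def tangent_bin(phi, n_bins):
    """Quantize a tangent direction (defined modulo pi) into one of n_bins bins."""
    return int((phi % np.pi) / np.pi * n_bins) % n_bins

def build_r_table(contour, tangents, reference, n_bins=36):
    """Training: per tangent bin, remember the offsets from contour point to reference."""
    table = defaultdict(list)
    for (x, y), phi in zip(contour, tangents):
        table[tangent_bin(phi, n_bins)].append((reference[0] - x, reference[1] - y))
    return table

def vote_for_reference(contour, tangents, table, shape, n_bins=36):
    """Recognition: every contour point votes for all reference points stored for its bin."""
    acc = np.zeros(shape, dtype=int)                  # indexed as acc[y, x]
    for (x, y), phi in zip(contour, tangents):
        for dx, dy in table.get(tangent_bin(phi, n_bins), []):
            cx, cy = int(round(x + dx)), int(round(y + dy))
            if 0 <= cx < shape[1] and 0 <= cy < shape[0]:
                acc[cy, cx] += 1
    return acc
```

The peak of the returned array is the most likely position of the reference point in the new image. Points that share a tangent bin also cast votes for wrong positions, but those votes are scattered and do not accumulate.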
10
Gen. Hough Transform with Local Features

For every feature, store possible occurrences:
Object identity
Pose
Relative position

For new image, let the matched features vote for possible object positions

11
When is the Hough transform useful?

Textbooks wrongly imply that it is useful mostly for finding lines
In fact, it can be very effective for recognizing arbitrary shapes or objects
The key to efficiency is to have each feature (token) determine as many parameters as possible
For example, lines can be detected much more efficiently from small edge elements (or points with local gradients) than from just points
For object recognition, each token should predict location, scale, and orientation (4D array)
Bottom line: The Hough transform can extract feature groupings from clutter in linear time!

Slide credit: David Lowe
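
To make the 4D voting concrete, here is a hedged sketch in which every feature match predicts a full object pose and is hashed into one pose bin. The feature layout (x, y, scale, orientation), the bin sizes, and the names pose_bins and best_pose are assumptions made for this example. Practical systems usually also vote into neighbouring bins to soften quantization effects, which is omitted here.

```python
import math
from collections import defaultdict

def pose_bins(matches, loc_bin=32.0, n_ori=12):
    """Hash every match into a 4D pose bin (x, y, log2 scale, orientation).

    Each feature is (x, y, scale, orientation). Model x, y are measured
    relative to the object's reference point (an assumption of this sketch).
    """
    bins = defaultdict(list)
    for i, (m, f) in enumerate(matches):
        s = f[2] / m[2]                                # predicted object scale
        dtheta = (f[3] - m[3]) % (2 * math.pi)         # predicted object rotation
        c, sn = math.cos(dtheta), math.sin(dtheta)
        # Predicted image position of the object's reference point.
        rx = f[0] - s * (c * m[0] - sn * m[1])
        ry = f[1] - s * (sn * m[0] + c * m[1])
        key = (math.floor(rx / loc_bin), math.floor(ry / loc_bin),
               round(math.log2(s)), int(dtheta / (2 * math.pi) * n_ori) % n_ori)
        bins[key].append(i)
    return bins

def best_pose(matches, **kw):
    """Return (bin key, indices of supporting matches) for the fullest bin."""
    bins = pose_bins(matches, **kw)
    return max(bins.items(), key=lambda kv: len(kv[1]))
```

Because each match performs a single hash-table insertion, the total work grows linearly with the number of matches, which is the property the slide highlights.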
12
3D Object Recognition

Gen. HT for Recognition [Lowe99]
Typically only 3 feature matches needed for recognition
Extra matches provide robustness
Affine model can be used for planar objects

Slide credit: David Lowe
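
A 2D affine map has six parameters and each point match contributes two equations, so three non-collinear matches determine it and any extra matches over-constrain the fit. The sketch below solves for the map in the least-squares sense; it assumes NumPy, and fit_affine is a name chosen for this example.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2D affine map with dst ~ A @ src + t, from >= 3 point matches."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    # Each row [x, y, 1] multiplies a 3x2 parameter matrix to give one target point.
    X = np.hstack([src, np.ones((len(src), 1))])
    params, *_ = np.linalg.lstsq(X, dst, rcond=None)
    A = params[:2].T        # 2x2 linear part
    t = params[2]           # translation
    return A, t
```

With exactly three matches the system is solved exactly; with more, the residuals of the fit can be used to reject matches that disagree with the model.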
13
View Interpolation

Training
Training views from similar viewpoints are clustered based on feature matches.
Matching features between adjacent views are linked.
Recognition
Feature matches may be spread over several training viewpoints.
=> Use the known links to transfer votes to other viewpoints.

[Lowe01]
Slide credit: David Lowe
14
Recognition Using View Interpolation
[Lowe01]
Slide credit: David Lowe
15
Location Recognition
Training
[Lowe04]
Slide credit: David Lowe
16
Applications

Sony Aibo (Evolution Robotics)
SIFT usage
Recognize docking station
Communicate with visual cards
Other uses
Place recognition
Loop closure in SLAM

Slide credit: David Lowe
17
RANSAC (RANdom SAmple Consensus) [Fischler81]

Randomly choose a minimal subset of data points necessary to fit a model (a sample)
Points within some distance threshold t of the model are a consensus set. The size of the consensus set is the model's support.
Repeat for N samples; the model with the biggest support is the most robust fit
Points within distance t of the best model are inliers
Fit the final model to all inliers

Slide credit: David Lowe
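
The steps above can be sketched for the simplest case, fitting a 2D line, where the minimal sample is two points. This is an illustration under assumptions, not the presenters' implementation: it assumes NumPy, represents a line by a unit normal n and offset d with n·p = d, and the names fit_line and ransac_line are hypothetical.

```python
import numpy as np

def fit_line(pts):
    """Total-least-squares line through pts: unit normal n and offset d with n @ p = d."""
    c = pts.mean(axis=0)
    # The right singular vector with the smallest singular value is the line normal.
    n = np.linalg.svd(pts - c)[2][-1]
    return n, n @ c

def ransac_line(pts, t=1.0, n_samples=100, seed=None):
    """Return (n, d, inlier mask) for the line with the largest consensus set."""
    rng = np.random.default_rng(seed)
    pts = np.asarray(pts, dtype=float)
    best = np.zeros(len(pts), dtype=bool)
    for _ in range(n_samples):
        # Minimal sample: two distinct data points define a candidate line.
        sample = pts[rng.choice(len(pts), size=2, replace=False)]
        n, d = fit_line(sample)
        inliers = np.abs(pts @ n - d) < t      # consensus set of this candidate
        if inliers.sum() > best.sum():         # keep the model with the biggest support
            best = inliers
    n, d = fit_line(pts[best])                 # final fit to all inliers
    return n, d, best
```

The threshold t and the number of samples are the two knobs. The sample count can be chosen from the expected inlier fraction rather than fixed by hand.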
18
Slide credit: David Forsyth
19
RANSAC: How many samples?

How many samples are needed?
Suppose w is the fraction of inliers (points from the line).
n points are needed to define a hypothesis (2 for lines)
k samples are chosen.
Prob. that a single sample of n points is correct: w^n
Prob. that all k samples fail: (1 - w^n)^k
=> Choose k high enough to keep this below the desired failure rate.

Slide credit: David Lowe
20
RANSAC: Computed k (p = 0.99)
Slide credit: David Lowe
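
The values of k follow from requiring that the probability of all k samples failing, (1 - w^n)^k, stays at or below 1 - p, which gives k >= log(1 - p) / log(1 - w^n). A small helper for evaluating this, with a name chosen for this example:

```python
import math

def ransac_num_samples(w, n, p=0.99):
    """Smallest k such that (1 - w**n)**k <= 1 - p.

    w: fraction of inliers (0 < w < 1), n: points per sample,
    p: desired probability of drawing at least one all-inlier sample.
    """
    if not (0.0 < w < 1.0 and 0.0 < p < 1.0):
        raise ValueError("w and p must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - p) / math.log(1.0 - w ** n))
```

For line fitting (n = 2) with half of the points being outliers (w = 0.5), this gives k = 17. The required k grows quickly as n increases or w decreases.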
21
After RANSAC

RANSAC divides data into inliers and outliers and yields an estimate computed from a minimal set of inliers
Improve this initial estimate with estimation over all inliers (e.g. with standard least-squares minimization)
But this may change the inliers, so alternate fitting with re-classification as inlier/outlier

Slide credit: David Lowe
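
The alternation described above can be sketched for a 2D line as follows. The setup is an assumption for illustration: NumPy, a line written as n·p = d with unit normal n, a total-least-squares fit via SVD, and a hypothetical function name refine_line. The loop stops when the inlier set no longer changes.

```python
import numpy as np

def refine_line(pts, inliers, t=1.0, max_iter=20):
    """Alternate least-squares fitting and inlier re-classification until stable."""
    pts = np.asarray(pts, dtype=float)
    inliers = np.asarray(inliers, dtype=bool)
    for _ in range(max_iter):
        # Fit: total-least-squares line through the current inliers.
        c = pts[inliers].mean(axis=0)
        n = np.linalg.svd(pts[inliers] - c)[2][-1]
        d = n @ c
        # Re-classify: every point within distance t of the new line is an inlier.
        new_inliers = np.abs(pts @ n - d) < t
        converged = np.array_equal(new_inliers, inliers)
        inliers = new_inliers
        if converged or inliers.sum() < 2:
            break
    return n, d, inliers
```

Starting from the consensus set of the best RANSAC sample, a few iterations are usually enough in practice, although convergence is not guaranteed in general, hence the iteration cap.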
22
Example: Finding Feature Matches

Find best stereo match within a square search window (here 300 pixels²)
Global transformation model: epipolar geometry

from Hartley & Zisserman
Slide credit: David Lowe
23
Example: Finding Feature Matches

Find best stereo match within a square search window (here 300 pixels²)
Global transformation model: epipolar geometry

before RANSAC
after RANSAC
from Hartley & Zisserman
Slide credit: David Lowe
24
Comparison

Gen. Hough Transform
Advantages
Very effective for recognizing arbitrary shapes or objects
Can handle high percentage of outliers (>95%)
Extracts groupings from clutter in linear time
Disadvantages
Quantization issues
Only practical for small number of dimensions (up to 4)
Improvements available
Probabilistic Extensions
Continuous Voting Space

RANSAC
Advantages
General method suited to large range of problems
Easy to implement
Independent of number of dimensions
Disadvantages
Only handles moderate number of outliers (<50%)
Many variants available, e.g.
PROSAC (Progressive RANSAC) [Chum05]
Preemptive RANSAC [Nister05]

25
Example Applications

Mobile tourist guide
Self-localization
Object/building recognition
Photo/video augmentation

[Quack, Leibe, Van Gool, CIVR08]
26
Web Demo: Movie Poster Recognition
50,000 movie posters indexed
Query-by-image from mobile phone available in Switzerland
27
Application: Large-Scale Retrieval
Query
Results from 5k Flickr images (demo available for 100k set)
[Philbin CVPR07]
28
Application: Image Auto-Annotation
Moulin Rouge
Old Town Square (Prague)
Tour Montparnasse
Colosseum
Viktualienmarkt
Maypole
Left: Wikipedia image. Right: closest match from Flickr
[Quack CIVR08]
29
Outline