Title: Visual Object Recognition
1Visual Object Recognition
- Bastian Leibe
- Computer Vision Laboratory
- ETH Zurich
- Chicago, 14.07.2008
- Kristen Grauman
- Department of Computer Sciences
- University of Texas at Austin
2Outline
- Detection with Global Appearance & Sliding
Windows
- Local Invariant Features: Detection & Description
- Specific Object Recognition with Local Features
- Coffee Break
- Visual Words: Indexing, Bags of Words
Categorization
- Matching Local Features
- Part-Based Models for Categorization
- Current Challenges and Research Directions
K. Grauman, B. Leibe
3Recognition with Local Features
- Image content is transformed into local features
that are invariant to translation, rotation, and
scale
- Goal: Verify if they belong to a consistent
configuration
Local Features, e.g. SIFT
Slide credit: David Lowe
4Finding Consistent Configurations
- Global spatial models
- Generalized Hough Transform [Lowe99]
- RANSAC [Obdrzalek02, Chum05, Nister06]
- Basic assumption: object is planar
- Assumption is often justified in practice
- Valid for many structures on buildings
- Sufficient for small viewpoint variations on 3D
objects
5Hough Transform
- Origin: Detection of straight lines in clutter
- Basic idea: each candidate point votes for all
lines that it is consistent with.
- Votes are accumulated in quantized array
- Local maxima correspond to candidate lines
- Representation of a line
- Usual form $y = a x + b$ has a singularity around
90°.
- Better parameterization: $x \cos(\theta) + y \sin(\theta) = \rho$
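The voting scheme for lines can be sketched as follows, using the $(\rho, \theta)$ parameterization. This is a minimal illustration only: the function names `hough_lines` and `strongest_line`, the bin sizes, and the use of NumPy are assumptions made for this sketch and do not come from the slides.

```python
import numpy as np


def hough_lines(points, n_theta=180, rho_res=1.0):
    """Accumulate votes in a quantized (rho, theta) array for 2D points."""
    points = np.asarray(points, dtype=float)
    # Sample theta uniformly in [0, pi).
    thetas = np.deg2rad(np.arange(n_theta) * (180.0 / n_theta))
    # |rho| can never exceed the distance of the farthest point from the origin.
    rho_max = np.hypot(points[:, 0], points[:, 1]).max()
    n_rho = int(np.ceil(2 * rho_max / rho_res)) + 1
    acc = np.zeros((n_rho, n_theta), dtype=int)
    for x, y in points:
        # Each point votes for every line (rho, theta) that passes through it.
        rhos = x * np.cos(thetas) + y * np.sin(thetas)
        rho_idx = np.round((rhos + rho_max) / rho_res).astype(int)
        acc[rho_idx, np.arange(n_theta)] += 1
    return acc, thetas, rho_max


def strongest_line(acc, thetas, rho_max, rho_res=1.0):
    """Return (rho, theta) of the accumulator cell with the most votes."""
    i, j = np.unravel_index(np.argmax(acc), acc.shape)
    return i * rho_res - rho_max, thetas[j]
```

Because the accumulator is quantized, the recovered line is only accurate to within half a bin in $\rho$, and neighboring cells can tie with the true maximum.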
6Examples
- Hough transform for a square (left) and a circle
(right)
7Hough Transform: Noisy Line
- Problem: Finding the true maximum
[Figure panels: Tokens, Votes]
Slide credit: David Lowe
8Hough Transform: Noisy Input
- Problem: Lots of spurious maxima
[Figure panels: Tokens, Votes]
Slide credit: David Lowe
9Generalized Hough Transform [Ballard81]
- Generalization for an arbitrary contour or shape
- Choose reference point for the contour (e.g.
center)
- For each point on the contour remember where it
is located w.r.t. the reference point
- Remember radius r and angle φ relative to the
contour tangent
- Recognition: whenever you find a contour point,
calculate the tangent angle and vote for all
possible reference points
- Instead of reference point, can also vote for
transformation
- ⇒ The same idea can be used with local features!
Slide credit: Bernt Schiele
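A simplified sketch of the training and recognition steps above, assuming a translation-only model: instead of radius and angle relative to the tangent, it stores the plain offset to the reference point, indexed by the quantized tangent angle. The names `angle_bin`, `build_r_table`, and `vote_reference_points`, and the choice of 36 angle bins, are hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict

import numpy as np


def angle_bin(phi, n_bins):
    """Quantize an angle in radians into one of n_bins bins."""
    return int((phi % (2 * math.pi)) / (2 * math.pi) * n_bins) % n_bins


def build_r_table(contour, tangent_angles, reference, n_bins=36):
    """Training: for each tangent-angle bin, remember offsets to the reference."""
    table = defaultdict(list)
    for (x, y), phi in zip(contour, tangent_angles):
        table[angle_bin(phi, n_bins)].append((reference[0] - x, reference[1] - y))
    return table


def vote_reference_points(contour, tangent_angles, table, shape, n_bins=36):
    """Recognition: every contour point votes for all stored reference offsets."""
    acc = np.zeros(shape, dtype=int)
    for (x, y), phi in zip(contour, tangent_angles):
        for dx, dy in table.get(angle_bin(phi, n_bins), []):
            rx, ry = int(round(x + dx)), int(round(y + dy))
            if 0 <= rx < shape[0] and 0 <= ry < shape[1]:
                acc[rx, ry] += 1
    return acc
```

The maximum of the returned accumulator is the most likely reference point position. Handling rotation and scale would require extra accumulator dimensions, which this sketch leaves out.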
10Gen. Hough Transform with Local Features
- For every feature, store possible occurrences
- For new image, let the matched features vote for
possible object positions
- Object identity
- Pose
- Relative position
11When is the Hough transform useful?
- Textbooks wrongly imply that it is useful mostly
for finding lines
- In fact, it can be very effective for recognizing
arbitrary shapes or objects
- The key to efficiency is to have each feature
(token) determine as many parameters as possible
- For example, lines can be detected much more
efficiently from small edge elements (or points
with local gradients) than from just points
- For object recognition, each token should predict
location, scale, and orientation (4D array)
- Bottom line: The Hough transform can extract
feature groupings from clutter in linear time!
Slide credit: David Lowe
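The idea that each token predicts location, scale, and orientation can be sketched as a vote into a 4D hash table, one vote per feature match. Everything below is an illustrative assumption: the feature tuple layout `(x, y, scale, orientation)`, with model positions given relative to the object reference point, the bin widths (32 pixels, factor 2 in scale, 30 degrees), and the names `pose_bin`, `hough_pose_votes`, and `best_pose_bin`.

```python
import math
from collections import defaultdict


def pose_bin(model_feat, image_feat, loc_bin=32.0, ori_bin_deg=30.0):
    """Quantized 4D pose (x, y, log2 scale, orientation) predicted by one match."""
    mx, my, m_scale, m_ori = model_feat   # position relative to object reference
    ix, iy, i_scale, i_ori = image_feat   # position in the new image
    scale = i_scale / m_scale
    rot = (i_ori - m_ori) % (2 * math.pi)
    c, s = math.cos(rot), math.sin(rot)
    # Object reference position in the image implied by this single match.
    ref_x = ix - scale * (c * mx - s * my)
    ref_y = iy - scale * (s * mx + c * my)
    n_ori = int(round(360.0 / ori_bin_deg))
    return (
        math.floor(ref_x / loc_bin),
        math.floor(ref_y / loc_bin),
        round(math.log2(scale)),
        math.floor(math.degrees(rot) / ori_bin_deg) % n_ori,
    )


def hough_pose_votes(matches, **bin_args):
    """One pass over the matches: linear time in the number of tokens."""
    votes = defaultdict(list)
    for idx, (model_feat, image_feat) in enumerate(matches):
        votes[pose_bin(model_feat, image_feat, **bin_args)].append(idx)
    return votes


def best_pose_bin(votes):
    """Return (bin, match indices) for the bin with the most votes."""
    return max(votes.items(), key=lambda kv: len(kv[1]))
```

Using a dictionary rather than a dense 4D array keeps memory proportional to the number of occupied bins. A fuller version would typically also vote into neighboring bins to soften quantization effects.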
123D Object Recognition
- Gen. HT for Recognition
- Typically only 3 feature matches needed for
recognition
- Extra matches provide robustness
- Affine model can be used for planar objects
[Lowe99]
Slide credit: David Lowe
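An affine model has six parameters, and each match supplies two equations, which is why three matches suffice and extra matches over-determine the system. The sketch below fits the affine model by linear least squares. The function name `fit_affine` and the use of `numpy.linalg.lstsq` are assumptions for illustration, not the procedure from [Lowe99].

```python
import numpy as np


def fit_affine(model_pts, image_pts):
    """Least-squares affine map: image ~= M @ model + t, from >= 3 matches."""
    model_pts = np.asarray(model_pts, dtype=float)
    image_pts = np.asarray(image_pts, dtype=float)
    if len(model_pts) < 3:
        raise ValueError("an affine model needs at least 3 matches")
    # Each row [x, y, 1] times the 3x2 parameter matrix gives one image point.
    A = np.hstack([model_pts, np.ones((len(model_pts), 1))])
    params, *_ = np.linalg.lstsq(A, image_pts, rcond=None)
    M = params[:2].T   # 2x2 linear part (rotation, scale, shear)
    t = params[2]      # translation
    return M, t
```

With exactly three non-collinear matches the fit is exact. With more matches the residuals indicate how well the planar assumption holds.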
13View Interpolation
- Training
- Training views from similar viewpoints are
clustered based on feature matches.
- Matching features between adjacent views are
linked.
- Recognition
- Feature matches may be spread over several
training viewpoints.
- ⇒ Use the known links to transfer votes to
other viewpoints.
[Lowe01]
Slide credit: David Lowe
14Recognition Using View Interpolation
[Lowe01]
Slide credit: David Lowe
15Location Recognition
Training
[Lowe04]
Slide credit: David Lowe
16Applications
- Sony Aibo (Evolution Robotics)
- SIFT usage
- Recognize docking station
- Communicate with visual cards
- Other uses
- Place recognition
- Loop closure in SLAM
Slide credit: David Lowe
17RANSAC (RANdom SAmple Consensus) [Fischler81]
- Randomly choose a minimal subset of data points
necessary to fit a model (a sample)
- Points within some distance threshold t of model
are a consensus set. Size of consensus set is
model's support.
- Repeat for N samples; model with biggest support
is most robust fit
- Points within distance t of best model are
inliers
- Fit final model to all inliers
Slide credit: David Lowe
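The steps above can be sketched for the simplest case, fitting a 2D line, where the minimal sample is two points. The names `line_through` and `ransac_line`, the default parameter values, and the total-least-squares final fit via SVD are assumptions made for this illustration.

```python
import numpy as np


def line_through(p, q):
    """Line through p and q as (unit normal, offset d), where normal @ x = d."""
    direction = q - p
    normal = np.array([-direction[1], direction[0]])
    normal = normal / np.linalg.norm(normal)
    return normal, normal @ p


def ransac_line(points, n_samples=100, t=1.0, seed=0):
    """Return (normal, d, inlier mask) for the line with the largest support."""
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    best_inliers = np.zeros(len(points), dtype=bool)
    for _ in range(n_samples):
        # Minimal sample: two distinct points define a line hypothesis.
        i, j = rng.choice(len(points), size=2, replace=False)
        if np.allclose(points[i], points[j]):
            continue
        normal, d = line_through(points[i], points[j])
        # Consensus set: points within distance t of the hypothesized line.
        inliers = np.abs(points @ normal - d) < t
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Final model: total least squares over all inliers of the best hypothesis.
    centroid = points[best_inliers].mean(axis=0)
    _, _, vt = np.linalg.svd(points[best_inliers] - centroid)
    normal = vt[-1]
    return normal, normal @ centroid, best_inliers
```

The threshold `t` and the number of samples are the two parameters that matter in practice. The sketch assumes at least one valid hypothesis is drawn, so the inlier set is never empty.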
18
Slide credit: David Forsyth
19RANSAC: How many samples?
- How many samples are needed?
- Suppose w is fraction of inliers (points from
line).
- n points needed to define hypothesis (2 for
lines)
- k samples chosen.
- Prob. that a single sample of n points is
correct: $w^n$
- Prob. that all samples fail is: $(1 - w^n)^k$
- ⇒ Choose k high enough to keep this below desired
failure rate.
Slide credit: David Lowe
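The required number of samples follows by solving the failure probability for k. In the derivation below, the symbol $p$ for the desired probability that at least one sample is outlier-free is an assumption of this sketch; the failure rate is then $1 - p$.

```latex
\begin{align*}
  P(\text{one sample is all inliers}) &= w^n \\
  P(\text{all } k \text{ samples fail}) &= (1 - w^n)^k \\
  (1 - w^n)^k \le 1 - p
    \;&\Longrightarrow\;
    k \ge \frac{\log(1 - p)}{\log(1 - w^n)} \\
  \text{Example: } w = 0.5,\; n = 2,\; p = 0.99:\quad
    k &\ge \frac{\log 0.01}{\log 0.75} \approx 16.01
    \;\Longrightarrow\; k = 17
\end{align*}
```

The inequality flips direction when dividing because $\log(1 - w^n)$ is negative.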
20RANSAC: Computed k (p = 0.99)
Slide credit: David Lowe
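The values of k for a given sample size and inlier fraction can be recomputed with a few lines of code. This is a sketch under the assumption that p denotes the desired probability of drawing at least one all-inlier sample; the function name `ransac_num_samples` and the particular grid of sample sizes and inlier fractions are hypothetical.

```python
import math


def ransac_num_samples(w, n, p=0.99):
    """Smallest k such that (1 - w**n)**k <= 1 - p."""
    if not 0.0 < w <= 1.0:
        raise ValueError("w must be in (0, 1]")
    if not 0.0 < p < 1.0:
        raise ValueError("p must be in (0, 1)")
    good = w ** n                      # probability that one sample is all inliers
    if good >= 1.0:
        return 1                       # no outliers: a single sample suffices
    # log1p keeps precision when `good` is very small.
    return math.ceil(math.log(1.0 - p) / math.log1p(-good))


if __name__ == "__main__":
    inlier_fractions = [0.95, 0.9, 0.8, 0.7, 0.6, 0.5]
    print("n   " + "  ".join(f"w={w:<4}" for w in inlier_fractions))
    for n in (2, 3, 4, 5, 6, 7, 8):
        row = "  ".join(f"{ransac_num_samples(w, n):<6}" for w in inlier_fractions)
        print(f"{n:<3} {row}")
```

The required k grows quickly as the inlier fraction drops or the sample size increases, which is why RANSAC works best with minimal samples and a moderate outlier rate.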
21After RANSAC
- RANSAC divides data into inliers and outliers and
yields estimate computed from minimal set of
inliers
- Improve this initial estimate with estimation
over all inliers (e.g. with standard
least-squares minimization)
- But this may change inliers, so alternate fitting
with re-classification as inlier/outlier
Slide credit: David Lowe
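The alternation between fitting and re-classification can be sketched for a 2D line as follows. The names `fit_line_tls` and `refine_line`, the iteration cap, and the use of total least squares via SVD as the "standard least-squares" step are assumptions of this illustration.

```python
import numpy as np


def fit_line_tls(points):
    """Total-least-squares line as (unit normal, offset d), normal @ x = d."""
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid)
    normal = vt[-1]                    # direction of least variance
    return normal, normal @ centroid


def refine_line(points, inliers, t=1.0, max_iters=10):
    """Alternate least-squares fitting with inlier/outlier re-classification."""
    points = np.asarray(points, dtype=float)
    inliers = np.asarray(inliers, dtype=bool)
    for _ in range(max_iters):
        normal, d = fit_line_tls(points[inliers])
        new_inliers = np.abs(points @ normal - d) < t
        # Stop when the inlier set is stable or too small to fit a line.
        if np.array_equal(new_inliers, inliers) or new_inliers.sum() < 2:
            break
        inliers = new_inliers
    return normal, d, new_inliers
```

The returned mask is always the classification under the returned model, so the two stay consistent even if the iteration cap is reached.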
22Example: Finding Feature Matches
- Find best stereo match within a square search
window (here 300 pixels²)
- Global transformation model: epipolar geometry
from Hartley & Zisserman
Slide credit: David Lowe
23Example: Finding Feature Matches
- Find best stereo match within a square search
window (here 300 pixels²)
- Global transformation model: epipolar geometry
before RANSAC
after RANSAC
from Hartley & Zisserman
Slide credit: David Lowe
24Comparison
- Gen. Hough Transform
  - Advantages
    - Very effective for recognizing arbitrary shapes
      or objects
    - Can handle high percentage of outliers (>95%)
    - Extracts groupings from clutter in linear time
  - Disadvantages
    - Quantization issues
    - Only practical for small number of dimensions (up
      to 4)
  - Improvements available
    - Probabilistic Extensions
    - Continuous Voting Space
- RANSAC
  - Advantages
    - General method suited to large range of problems
    - Easy to implement
    - Independent of number of dimensions
  - Disadvantages
    - Only handles moderate number of outliers (<50%)
  - Many variants available, e.g.
    - PROSAC (Progressive RANSAC) [Chum05]
    - Preemptive RANSAC [Nister05]
25Example Applications
- Mobile tourist guide
- Self-localization
- Object/building recognition
- Photo/video augmentation
[Quack, Leibe, Van Gool, CIVR08]
26Web Demo: Movie Poster Recognition
50,000 movie posters indexed
Query-by-image from mobile phone available in
Switzerland
27Application: Large-Scale Retrieval
Query
Results from 5k Flickr images (demo available for
100k set)
[Philbin CVPR07]
28Application: Image Auto-Annotation
Moulin Rouge
Old Town Square (Prague)
Tour Montparnasse
Colosseum
Viktualienmarkt Maypole
Left: Wikipedia image. Right: closest match from
Flickr
[Quack CIVR08]
29Outline
- Detection with Global Appearance & Sliding
Windows
- Local Invariant Features: Detection & Description
- Specific Object Recognition with Local Features
- Coffee Break
- Visual Words: Indexing, Bags of Words
Categorization
- Matching Local Features
- Part-Based Models for Categorization
- Current Challenges and Research Directions