Title: Visual Object Recognition

1. Visual Object Recognition
- Bastian Leibe
- Computer Vision Laboratory
- ETH Zurich
- Chicago, 14.07.2008
- Kristen Grauman
- Department of Computer Sciences
- University of Texas at Austin
2. Outline
- Detection with Global Appearance: Sliding Windows
- Local Invariant Features: Detection & Description
- Specific Object Recognition with Local Features
- Coffee Break
- Visual Words: Indexing, Bags-of-Words Categorization
- Matching Local Features
- Part-Based Models for Categorization
- Current Challenges and Research Directions
K. Grauman, B. Leibe
3. Recognition of Object Categories
- We no longer have exact correspondences
- On a local level, we can still detect similar parts
- Represent objects by their parts
  → Bag-of-features
- How can we improve on this?
  → Encode structure
Slide credit: Rob Fergus
4. Part-Based Models
- Fischler & Elschlager, 1973
- Model has two components:
  - parts (2D image fragments)
  - structure (configuration of parts)
5. Different Connectivity Structures
[Figure: graphical models with different part-connectivity structures and their matching complexities: O(N^6), O(N^2), O(N^3), O(N^2). Models: Fergus et al. 03, Fei-Fei et al. 03; Leibe et al. 04, 08, Crandall et al. 05, Fergus et al. 05; Felzenszwalb & Huttenlocher 05; Bouchard & Triggs 05; Carneiro & Lowe 06; Csurka 04, Vasconcelos 00]
from Carneiro & Lowe, ECCV'06
6. Spatial Models Considered Here
- Fully connected shape model
  - e.g. Constellation Model
  - Parts fully connected
  - Recognition complexity: O(N^P)
  - Method: exhaustive search
- Star shape model
  - e.g. ISM
  - Parts mutually independent
  - Recognition complexity: O(NP)
  - Method: Generalized Hough Transform
Slide credit: Rob Fergus
7. Constellation Model
- Joint model for appearance and shape
8. Constellation Model
9. Constellation Model: Learning Procedure
- Goal: find regions and their location, scale, and appearance
- Initialize model parameters
- Use EM and iterate to convergence:
  - E-step: compute assignments for which regions are foreground/background
  - M-step: update model parameters
- Trying to maximize likelihood consistency in shape and appearance
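The E/M loop above can be illustrated with a deliberately simplified sketch: a two-component, 1-D Gaussian mixture that soft-assigns candidate regions to foreground vs. background. This is not the constellation model itself (which jointly models part shape and appearance); the function name and the scalar appearance score are assumptions for illustration only.

```python
import numpy as np

def em_foreground_background(x, iters=50):
    """Toy EM in the spirit of the learning loop above: each candidate
    region's 1-D appearance score x[i] is explained by either a
    'background' (component 0) or a 'foreground' (component 1) Gaussian.
    E-step: soft-assign regions; M-step: re-fit means/variances/weights."""
    mu = np.array([x.min(), x.max()], dtype=float)  # crude initialization
    sig = np.array([x.std() + 1e-3] * 2)
    pi = np.array([0.5, 0.5])
    for _ in range(iters):
        # E-step: responsibilities r[i, k] = p(component k | region i)
        lik = pi * np.exp(-0.5 * ((x[:, None] - mu) / sig) ** 2) / sig
        r = lik / lik.sum(axis=1, keepdims=True)
        # M-step: maximum-likelihood updates given the soft assignments
        n = r.sum(axis=0)
        mu = (r * x[:, None]).sum(axis=0) / n
        sig = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / n) + 1e-6
        pi = n / len(x)
    return r.argmax(axis=1), mu
```

Regions with scores near one mode end up softly assigned to that component; the hard labels returned here correspond to the converged responsibilities.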
10. Example: Motorbikes
11. Example: Motorbikes (2)
12. Example: Spotted Cats
13. Discussion: Constellation Model
- Advantages:
  - Works well for many different object categories
  - Can adapt well to categories where
    - shape is more important, or
    - appearance is more important
  - Everything is learned from training data
  - Weakly-supervised training possible
- Disadvantages:
  - Model contains many parameters that need to be estimated
  - Cost increases exponentially with the number of parts
  → Fully connected model restricted to a small number of parts
14. Implicit Shape Model (ISM)
- Basic ideas:
  - Learn an appearance codebook
  - Learn a star-topology structural model
    - Features are considered independent given the object center
- Algorithm: probabilistic Generalized Hough Transform
  - Exact correspondences → probabilistic match to object part
  - NN matching → soft matching
  - Feature location on object → part location distribution
  - Uniform votes → probabilistic vote weighting
  - Quantized Hough array → continuous Hough space
15. Codebook Representation
- Extraction of local object features
  - Interest points (e.g. Harris detector)
  - Sparse representation of the object appearance
- Collect features from the whole training set
- Example
16. Agglomerative Clustering
- Algorithm (average-link):
  - Start with each patch as a cluster of its own
  - Repeat: merge the two most similar clusters X and Y, where the similarity between two clusters is defined as the average similarity between their members
  - Until: no remaining pair of clusters is more similar than a cutoff threshold
- Commonly used similarity measures:
  - Normalized correlation
  - Euclidean distance
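The average-link procedure above can be sketched directly. This is a minimal, clarity-over-efficiency version assuming patch descriptors arrive as rows of a NumPy array and normalized correlation as the similarity measure; the function name and threshold parameter are illustrative.

```python
import numpy as np

def average_link_clustering(patches, sim_threshold):
    """Greedy average-link agglomerative clustering.
    patches: (n, d) array of patch descriptors.
    Similarity: normalized correlation (mean-centered cosine)."""
    # Normalize each descriptor so a dot product equals normalized correlation.
    X = patches - patches.mean(axis=1, keepdims=True)
    X /= np.linalg.norm(X, axis=1, keepdims=True) + 1e-12
    S = X @ X.T                       # pairwise member similarities
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > 1:
        # Find the pair of clusters with the highest average similarity.
        best, best_pair = -np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                s = S[np.ix_(clusters[a], clusters[b])].mean()
                if s > best:
                    best, best_pair = s, (a, b)
        if best < sim_threshold:      # Until: no pair is similar enough
            break
        a, b = best_pair
        clusters[a] += clusters.pop(b)  # merge Y into X
    return clusters
```

The cluster means of the result would then be stored as the appearance codebook (next slide); a production implementation would cache and incrementally update the inter-cluster similarities instead of recomputing them.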
17. Appearance Codebook
- Clustering results:
  - Visual similarity preserved
  - Wheel parts, window corners, fenders, ...
- Store cluster centers as the Appearance Codebook
18. Generalized Hough Transform with Local Features
- For every feature, store possible occurrences:
  - Object identity
  - Pose
  - Relative position
- For a new image, let the matched features vote for possible object positions
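The voting step can be sketched under simplifying assumptions: 2-D object position only, each matched feature's vote mass split uniformly over its stored occurrences, and a coarse binned accumulator. Function and variable names are made up for illustration; the real ISM uses continuous voting and probabilistic weights.

```python
from collections import defaultdict

def hough_vote(matches, occurrences, bin_size=10.0):
    """matches: list of (codebook_id, (fx, fy)) features found in a test image.
    occurrences: dict codebook_id -> list of (dx, dy) training offsets from
    the feature to the object center. Returns the best coarse center and
    its accumulated vote mass."""
    acc = defaultdict(float)
    for cid, (fx, fy) in matches:
        occs = occurrences.get(cid, [])
        if not occs:
            continue
        w = 1.0 / len(occs)  # split this feature's vote mass over its occurrences
        for dx, dy in occs:
            cx, cy = fx + dx, fy + dy  # vote for this object-center position
            acc[(int(cx // bin_size), int(cy // bin_size))] += w
    if not acc:
        return None, 0.0
    bx, by = max(acc, key=acc.get)
    # Report the center of the winning bin (coarse, accurate to ~bin_size/2).
    return (bx * bin_size + bin_size / 2, by * bin_size + bin_size / 2), acc[(bx, by)]
```

Features that agree on an object center pile their votes into the same bin, so consistent configurations produce a sharp accumulator maximum even amid clutter.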
19. Implicit Shape Model: Representation
- Learn appearance codebook
  - Extract local features at interest points
  - Agglomerative clustering → codebook
- Learn spatial distributions
  - Match codebook to training images
  - Record matching positions on the object, plus local figure-ground labels
20. Leibe04, Leibe08
21. Implicit Shape Model: Recognition
- Interest points
Leibe04, Leibe08
22. Implicit Shape Model: Recognition
- Interest points
Leibe04, Leibe08
23. Leibe04, Leibe08
24. Example Results on Cows
25. Example Results on Cows
26. Example Results on Cows
27. Example Results on Cows
28. Example Results on Cows: 1st hypothesis
29. Example Results on Cows: 2nd hypothesis
30. Example Results on Cows: 3rd hypothesis
31. Scale-Invariant Voting
- Scale-invariant feature selection
  - Scale-invariant interest points
  - Rescale extracted patches
  - Match to constant-size codebook
- Generate scale votes
  - Scale as 3rd dimension in the voting space
  - Search for maxima in the 3D voting space
32. Scale Voting: Adaptive Search Window
- Voting equations
  → Relative error, proportional to the hypothesis scale
  → Vote density decreases with increasing scale
- Adapt the search window:
  - Increase its size with the hypothesis scale
  - Intuitive interpretation: detection tolerance
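The voting equations referenced on this slide are not reproduced in the text; reconstructed from the ISM formulation (assuming a detected feature at position $(x_f, y_f)$ with scale $s_f$, matched to a stored training occurrence at $(x_{\mathrm{occ}}, y_{\mathrm{occ}})$ with scale $s_{\mathrm{occ}}$), they read:

```latex
x_{\mathrm{vote}} = x_f - x_{\mathrm{occ}} \left( s_f / s_{\mathrm{occ}} \right)
\qquad
y_{\mathrm{vote}} = y_f - y_{\mathrm{occ}} \left( s_f / s_{\mathrm{occ}} \right)
\qquad
s_{\mathrm{vote}} = s_f / s_{\mathrm{occ}}
```

A fixed error in the stored occurrence is multiplied by $s_f / s_{\mathrm{occ}}$, which is why the voting error is relative and proportional to the hypothesis scale, and why the search window must grow with it.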
33. Scale Voting: Efficient Computation
- Mean-shift formulation for refinement
- Scale-adaptive balloon density estimator applied to the scale votes
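A balloon density estimator in the above spirit can be sketched as follows: the kernel bandwidth grows with the hypothesis scale, so detection tolerance adapts to object size. The uniform kernel, the isotropic distance over (x, y, s), and the `b0` parameter are simplifying assumptions for illustration, not the paper's exact kernel.

```python
import numpy as np

def balloon_score(hyp, votes, weights, b0=1.0):
    """Scale-adaptive balloon density estimate at hypothesis hyp = (x, y, s).
    votes: (n, 3) array of (x, y, s) votes; weights: (n,) vote weights.
    The bandwidth b grows linearly with the hypothesis scale s."""
    hyp = np.asarray(hyp, dtype=float)
    b = b0 * hyp[2]                       # bandwidth proportional to scale
    d2 = ((votes - hyp) ** 2).sum(axis=1) / b ** 2
    K = np.where(d2 < 1.0, 1.0, 0.0)      # uniform kernel with radius b
    return (weights * K).sum() / b ** 3   # normalize by kernel volume ~ b^3
```

Mean-shift refinement would then iteratively move the hypothesis toward the weighted mean of the votes inside the current kernel until the score stops increasing.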
34. Leibe04, Leibe08
35. Detection Results
- Qualitative performance:
  - Recognizes different kinds of objects
  - Robust to clutter, occlusion, noise, low contrast
36. Figure-Ground Segregation
- Problem extensively studied in psychophysics
  - Experiments with ambiguous figure-ground stimuli
- Results:
  - Evidence that object recognition can and does operate before figure-ground organization
  - Interpreted as the Gestalt cue "familiarity"
M.A. Peterson, "Object Recognition Processes Can and Do Operate Before Figure-Ground Organization", Current Directions in Psychological Science, 3:105-111, 1994.
37. ISM Top-Down Segmentation
Leibe04, Leibe08
38. Segmentation: Probabilistic Formulation
- Influence of a patch on the object hypothesis (vote weight)
- Backprojection to features f and pixels p
Leibe04, Leibe08
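The two quantities named above can be written out; this is a reconstruction from the cited ISM papers, with $C_i$ the codebook entries, $f$ a feature observed at location $\ell$, and $(o_n, x)$ the object hypothesis. The vote weight marginalizes the soft codebook match:

```latex
p(o_n, x \mid f, \ell) = \sum_i p(o_n, x \mid C_i, \ell)\, p(C_i \mid f)
```

and the backprojected per-pixel figure probability sums, over all features $(f, \ell)$ containing pixel $\mathbf{p}$, the stored figure-ground label weighted by each vote's contribution to the hypothesis:

```latex
p(\mathbf{p} = \mathrm{figure} \mid o_n, x)
= \sum_{\mathbf{p} \in (f,\ell)} \sum_i
  p(\mathbf{p} = \mathrm{figure} \mid o_n, x, C_i, \ell)\,
  \frac{p(o_n, x \mid C_i, \ell)\, p(C_i \mid f)\, p(f, \ell)}{p(o_n, x)}
```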
39. Segmentation: Probabilistic Formulation
- Hypothesis generation
- Segmentation
Leibe04, Leibe08
40. Derivation: Top-Down Segmentation
Leibe04, Leibe08
41. Derivation: Top-Down Segmentation
Leibe04, Leibe08
42. Derivation: Top-Down Segmentation
- Hypothesis generation
- Segmentation
Leibe04, Leibe08
43. Derivation: Top-Down Segmentation
- Hypothesis generation
- Segmentation
Leibe04, Leibe08
44. Derivation: Top-Down Segmentation
- Hypothesis generation
- Segmentation
Leibe04, Leibe08
45. Leibe04, Leibe08
46. Segmentation
- Interpretation of the p(figure) map:
  - Per-pixel confidence in the object hypothesis
  - Use for hypothesis verification
Leibe04, Leibe08
47. Example Results: Motorbikes
48. Example Results: Cows
- Training: 112 hand-segmented images
- Results on novel sequences
  - Single-frame recognition (no temporal continuity used!)
Leibe04, Leibe08
49. Example Results: Chairs
- Dining room chairs
- Office chairs
50. Inferring Other Information: Part Labels
Thomas07
51. Inferring Other Information: Part Labels (2)
Thomas07
52. Inferring Other Information: Depth Maps
- Depth from a single image
Thomas07
53. Application: Pedestrian Detection
- Estimating articulation
- Rotation-invariant detection
Leibe, Seemann, Schiele, CVPR'05
Mikolajczyk, Leibe, Schiele, CVPR'06
54. Outline
- Detection with Global Appearance: Sliding Windows
- Local Invariant Features: Detection & Description
- Specific Object Recognition with Local Features
- Coffee Break
- Visual Words: Indexing, Bags-of-Words Categorization
- Matching Local Features
- Part-Based Models for Categorization
- Current Challenges and Research Directions