Building local part models for category-level recognition, Cordelia Schmid, INRIA

1
Building local part models for category-level recognition
Cordelia Schmid, INRIA

Joint work with J. Zhang, M. Marszalek, S.
Lazebnik, J. Ponce

2
Motivation

Invariant local descriptors
=> robust recognition of specific objects or
scenes

3
Motivation

Recognition of textures and object classes
=> description of intra-class variation,
selection of discriminant features, spatial
relations

texture recognition
car detection
4
Local parts: textons / visual words

Clusters of local descriptors

5
Semi-local parts

Descriptors + geometric relations

6
Overview

Textons / visual words + SVM
Semi-local parts + maximum entropy framework

7
Visual words + SVM
Extract invariant regions
Compute invariant descriptors
Find clusters and signatures
Compute distance matrix
Classification (SVM)
8
Region extraction
Affine-invariant Harris detector [Mikolajczyk and Schmid '02]
Affine-invariant Laplacian detector [Lindeberg '98]
9
Region extraction

Scale/affine rectification process

Rectified patches (rotational ambiguity)
10
Descriptors: SIFT [Lowe '99]

Distribution of the gradient over an image patch

[Figure: image patch -> gradient -> 3D histogram over location (x, y) and gradient orientation]
4x4 location grid and 8 orientations (128
dimensions)
Very good performance in image matching
[Mikolajczyk and Schmid '03]
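
The 4x4 location grid with 8 orientation bins can be illustrated with a heavily simplified sketch. It is not Lowe's full descriptor: Gaussian weighting, interpolation between bins, and clipping are left out, and the function name `sift_like_descriptor` and the 16x16 toy patch are assumptions made only for this example.

```python
import math

def sift_like_descriptor(patch, grid=4, n_orient=8):
    """Simplified SIFT-style descriptor for a square grayscale patch.

    patch: list of equal-length rows of floats (side divisible by `grid`).
    Returns grid * grid * n_orient values with unit L2 norm.
    """
    size = len(patch)
    cell = size // grid
    hist = [0.0] * (grid * grid * n_orient)
    # Border pixels are skipped because central differences need neighbors.
    for y in range(1, size - 1):
        for x in range(1, size - 1):
            dx = patch[y][x + 1] - patch[y][x - 1]
            dy = patch[y + 1][x] - patch[y - 1][x]
            mag = math.hypot(dx, dy)
            if mag == 0.0:
                continue
            # Quantize the gradient direction into one of n_orient bins.
            angle = math.atan2(dy, dx) % (2 * math.pi)
            o = int(angle / (2 * math.pi) * n_orient) % n_orient
            # Locate the spatial cell and accumulate the gradient magnitude.
            gy, gx = y // cell, x // cell
            hist[(gy * grid + gx) * n_orient + o] += mag
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm > 0 else hist

# Toy patch: a horizontal intensity ramp, so every gradient points along +x.
patch = [[float(x) for x in range(16)] for _ in range(16)]
desc = sift_like_descriptor(patch)
print(len(desc))  # 128
```

With the default parameters the output has 4 * 4 * 8 = 128 entries, which matches the dimensionality quoted on the slide.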
11
Descriptors: SPIN [Lazebnik '03]
12
Region extraction and description

Combination of detectors and descriptors
H = Harris, L = Laplacian; SIFT, SPIN
Different levels of invariance
HA: Harris affine
HSR: Harris scale + rotation
HS: Harris scale

13
Signature and EMD

Hierarchical clustering
Signature: cluster centers and relative weights
$S = \{(m_1, w_1), \ldots, (m_k, w_k)\}$, where $m_i$ is a cluster center and $w_i$ its relative weight
Earth Mover's Distance (EMD)
robust distance, optimizes the flow between
distributions
computed from ground distances $d(m_i, m'_j)$
can match signatures of different size
not sensitive to the number of clusters

$D(S, S') = \sum_{i,j} f_{ij} \, d(m_i, m'_j) \,/\, \sum_{i,j} f_{ij}$
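
The slide does not spell out how the flow $f_{ij}$ is obtained. In the standard transportation-problem formulation of the Earth Mover's Distance (an assumption about the intended definition), two signatures $S = \{(m_i, w_i)\}_{i=1}^{k}$ and $S' = \{(m'_j, w'_j)\}_{j=1}^{k'}$ are compared by solving:

```latex
\begin{aligned}
\min_{f}\ & \sum_{i=1}^{k} \sum_{j=1}^{k'} f_{ij}\, d(m_i, m'_j) \\
\text{s.t.}\ & f_{ij} \ge 0, \\
& \sum_{j} f_{ij} \le w_i, \qquad \sum_{i} f_{ij} \le w'_j, \\
& \sum_{i,j} f_{ij} = \min\Big(\sum_i w_i,\ \sum_j w'_j\Big)
\end{aligned}
```

Because the constraints involve only the weights, $k$ and $k'$ need not be equal, which is why signatures of different size can be matched.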
14
Vocabulary + χ² distance

K-means clustering of the training images
cluster centers => vocabulary
Frequency histogram of the clusters for each
image
Histogram comparison with χ² distance
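
A minimal sketch of the histogram step, assuming the vocabulary (cluster centers) has already been obtained by k-means. The helper names and the toy 2-D "descriptors" are hypothetical, and the distance uses one common form of the χ² distance, $\frac{1}{2}\sum_i (u_i - w_i)^2 / (u_i + w_i)$, which may differ from the slides by a constant factor.

```python
def nearest_word(descriptor, vocabulary):
    """Index of the vocabulary center closest to `descriptor` (squared L2)."""
    return min(
        range(len(vocabulary)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(descriptor, vocabulary[k])),
    )

def word_histogram(descriptors, vocabulary):
    """Normalized frequency histogram of visual words for one image."""
    counts = [0.0] * len(vocabulary)
    for d in descriptors:
        counts[nearest_word(d, vocabulary)] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total else counts

def chi2_distance(h1, h2):
    """Chi-square distance between two histograms; empty bins are skipped."""
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)

# Toy data: three 2-D cluster centers and two images with four descriptors each.
vocabulary = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
image_a = [[0.1, 0.0], [0.9, 1.0], [1.0, 0.9], [0.0, 0.1]]
image_b = [[0.0, 0.9], [0.1, 1.0], [0.9, 0.9], [0.1, 0.1]]

h_a = word_histogram(image_a, vocabulary)
h_b = word_histogram(image_b, vocabulary)
print(h_a, h_b, chi2_distance(h_a, h_b))
```

Unlike signatures compared with EMD, both histograms here must be built over the same vocabulary so that bins correspond.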

15
Classification

Distance D(I1,I2) between two images
Gaussian kernel
Binary or multi-class SVM
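
One way to turn pairwise image distances into a Gaussian-style kernel is sketched below. The slide only names the kernel, so the form $K = \exp(-D / A)$ and the default choice of $A$ as the mean off-diagonal distance are assumptions, as are the function name and the toy distance matrix.

```python
import math

def gaussian_kernel_matrix(dist, scale=None):
    """Map a symmetric distance matrix D to K[i][j] = exp(-D[i][j] / A).

    If `scale` (A) is not given, it defaults to the mean off-diagonal
    distance, which is an assumption made for this sketch.
    """
    n = len(dist)
    if scale is None:
        off_diag = [dist[i][j] for i in range(n) for j in range(n) if i != j]
        scale = sum(off_diag) / len(off_diag)
    return [[math.exp(-dist[i][j] / scale) for j in range(n)] for i in range(n)]

# Toy distance matrix between three images (EMD or chi-square distances).
D = [
    [0.0, 0.2, 0.8],
    [0.2, 0.0, 0.6],
    [0.8, 0.6, 0.0],
]
K = gaussian_kernel_matrix(D)
for row in K:
    print([round(v, 3) for v in row])
```

The resulting matrix can be handed to any SVM implementation that accepts a precomputed kernel, for example scikit-learn's `SVC(kernel="precomputed")`.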

16
In-depth study of the approach

Evaluation of different parameters of the system
Comparison with existing methods (textures,
categories)
Influence of the background features
Winner of several competitions of the PASCAL
challenge

17
Evaluation of detectors and descriptors

The combination of detectors and descriptors
gives best results
Laplacian + SIFT is acceptable with less
computational cost

18
Evaluation of invariance

Best level of invariance depends on the dataset
Affine is rarely an advantage

19
Evaluation of different kernels

EMD and χ² kernel give comparable results

20
Texture classification

Lazebnik et al. (CVPR 2003)
RIFT + SPIN for sparse interest point description
EMD + KNN classifier
VZ-Joint (Varma and Zisserman, CVPR 2003)
Image patch descriptor for dense description
(every pixel)
χ² distance + KNN classifier
Hayman et al. (ECCV 2004)
VZ-Joint for dense description (every pixel)
χ² kernel + SVM
Global Gabor mean + STD (Manjunath et al., PAMI
1996)
Gabor features for single feature vector
description
Mahalanobis distance + KNN classifier

21
UIUC textures: textured surfaces

25 classes, 40 sample images each

22
Comparison on UIUC
23
CUReT texture dataset
felt
plaster
styrofoam

61 classes, 92 sample images each
significant illumination changes, viewpoint
changes

24
Comparison on CUReT
25
Category classification

Constellation model [Fergus '03]
Bag of features [Csurka '04]
Matching kernels [Wallraven '03, Grauman '03]
Feature selection [Dorko '03, Opelt '04]

26
Xerox 7 categories
bikes
books
building
cars
people
phones
trees
27
Misclassified images of Xerox7
books - misclassified into faces, faces, buildings
buildings - misclassified into faces, trees, trees
cars - misclassified into buildings, phones,
phones
28
Graz bike and people database
bikes
people
background
29
Misclassified images of Graz dataset
misclassified bikes
misclassified people
30
Comparison on the CalTech database
31
Category PASCAL dataset
bikes
cars
motorbikes
people
test set 1
test set 2
training
32
Influence of background

Three types of background
Original background features
Random background features
Constant background features
Three test groups
Background features only
Foreground features with different types of
background for training and testing
Foreground features with different types of
background for training, tested on the original
test set

33
BF training/testing
34
FF + BF training/testing
35
Training (FF + BF) / original test set
36
Semi-local parts + maximum entropy framework

Semi-local parts: a higher-level image
representation
Combination of appearance and spatial layout
Maximum entropy: a probabilistic framework for
combining parts and inter-part relations
Discriminative framework
No independence assumptions
Many kinds of features, relations can be combined
within a single framework
Optimization problem is convex, finding exact
optimum is tractable
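
A minimal sketch of how a maximum entropy (exponential) model turns feature values into class probabilities, $P(c \mid I) \propto \exp\big(\sum_k \lambda_{c,k} f_k(I)\big)$. The per-class weight layout, the function name, and all numbers are assumptions for illustration. The weights would normally be fitted by maximizing the training log-likelihood, which is the convex problem mentioned above.

```python
import math

def class_posteriors(features, weights):
    """Posterior over classes under an exponential (maximum entropy) model.

    features: list of feature values f_k(I) for one image.
    weights:  dict mapping class name -> list of weights lambda_{c,k}.
    """
    scores = {
        c: sum(lam * f for lam, f in zip(lams, features))
        for c, lams in weights.items()
    }
    # Subtract the maximum score before exponentiating for numerical stability.
    top = max(scores.values())
    exp_scores = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(exp_scores.values())
    return {c: v / z for c, v in exp_scores.items()}

# Hypothetical feature values (e.g. part repeatabilities) and weights.
features = [3.0, 0.0, 1.0]
weights = {
    "bird": [0.5, -0.2, 0.1],
    "background": [-0.5, 0.3, 0.0],
}
posteriors = class_posteriors(features, weights)
print(posteriors)
```

No independence assumption between features is needed: correlated features simply share the weight mass during fitting.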

37
Semi-Local Parts

Geometric invariance (scale, similarity, affine)
Robustness to clutter, occlusion, intra-class
variability
Weakly supervised learning

38
Learning a Part Vocabulary

Ideal approach: simultaneous correspondence
search across entire training set

39
Two-Image Matching

Goal: find collections of local regions that
can be mapped onto each other using a single
rigid transformation
Implementation: local search based on geometric
and photometric consistency constraints
Returns multiple correspondence hypotheses
Automatically determines number of regions in
correspondence
Works on unsegmented, cluttered images (weakly
supervised learning)

40
Scale-Invariant Parts

Contour-based detector [Jurie and Schmid '04]

41
Learning a Part Vocabulary

Match multiple pairs of training images from the
same class to produce candidate parts
Perform part selection (validation)
Match candidate part against validation set (both
positive and negative images)
Validation score: χ² distance between
repeatability histograms for positive and
negative images
Learn a probabilistic model of the object class
Naïve Bayes
Exponential model

42
Feature Functions

(Absolute) repeatability of a detected part
instance: number of detected regions, denoted
$\rho_k(I)$
Single-part features
Overlap features

43
CalTech Database

Four classes: airplanes, cars (rear), faces,
motorbikes
100 training images per class
50 initial images (50 largest candidate parts
retained)
50 validation (20 highest-scoring parts retained)
200 test images per class
300 total

44
CalTech Database Parts
45
CalTech Results
46
Airplane Part Detection
misclassified image
47
Car Part Detection
misclassified image
48
Face Part Detection
misclassified image
49
Motorbike Part Detection
misclassified image
50
The Birds Database

Six classes: egret, mandarin duck, snowy owl,
puffin, toucan, wood duck
50 training images per class
20 initial images (50 largest candidate parts
retained)
30 validation (20 highest-scoring parts retained)
50 test images per class
100 total

51
Bird Parts
52
Birds Database Results
53
Bird Part Detection
54
Bird Part Detection (cont.)
55
Misclassified Images
56
Classification rate vs. dictionary size
Birds
Caltech
57
Conclusion