Building local part models for category-level recognition
Cordelia Schmid, INRIA

Transcript and Presenter's Notes
1
Building local part models for category-level recognition
Cordelia Schmid, INRIA
  • Joint work with J. Zhang, M. Marszalek, S.
    Lazebnik, J. Ponce

2
Motivation
  • Invariant local descriptors → robust recognition of specific
    objects or scenes


3
Motivation
  • Recognition of textures and object classes
  • → description of intra-class variation,
    selection of discriminative features, spatial
    relations

Examples: texture recognition, car detection
4
Local parts: textons / visual words
  • Clusters of local descriptors

5
Semi-local parts
  • Descriptors + geometric relations

6
Overview
  • Textons / visual words + SVM
  • Semi-local parts + maximum entropy framework

7
Visual words + SVM
  • Pipeline (sketched below): extract invariant regions →
    compute invariant descriptors → find clusters and signatures →
    compute distance matrix → SVM classification
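A minimal sketch of this bag-of-features pipeline, assuming OpenCV's SIFT detector as a stand-in for the affine-invariant Harris/Laplacian detectors, k-means as a stand-in for the clustering step, and an arbitrary vocabulary size of 200; the SVM stage is sketched after the Classification slide further below.

```python
# Sketch only: SIFT keypoints stand in for the affine-invariant
# Harris/Laplacian regions; k-means stands in for the clustering step.
import numpy as np
import cv2                                   # opencv-python
from sklearn.cluster import KMeans

sift = cv2.SIFT_create()

def extract_descriptors(image_gray):
    """128-d SIFT descriptors for one grayscale (uint8) image."""
    _, desc = sift.detectAndCompute(image_gray, None)
    return desc if desc is not None else np.zeros((0, 128), np.float32)

def build_vocabulary(descriptor_sets, k=200):
    """Cluster all training descriptors into k visual words."""
    return KMeans(n_clusters=k, n_init=10).fit(np.vstack(descriptor_sets))

def word_histogram(descriptors, vocabulary):
    """Normalised frequency histogram of visual words for one image."""
    words = vocabulary.predict(descriptors)
    hist = np.bincount(words, minlength=vocabulary.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)
```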
8
Region extraction
  • Affine-invariant Harris detector [Mikolajczyk and Schmid '02]
  • Affine-invariant Laplacian detector [Lindeberg '98]
9
Region extraction
  • Scale/affine rectification process

Rectified patches (rotational ambiguity)
10
Descriptors: SIFT [Lowe '99]
  • Distribution of the gradient over an image patch, stored as a
    3D histogram over x, y and gradient orientation (see the sketch below)
  • 4x4 location grid and 8 orientations (128 dimensions)
  • Very good performance in image matching
    [Mikolajczyk and Schmid '03]
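An illustrative sketch of the SIFT idea only, not Lowe's exact implementation (no Gaussian weighting, no trilinear interpolation, no rotation to the keypoint orientation): gradient magnitudes are accumulated into 8 orientation bins over a 4x4 grid of cells, giving a 128-dimensional descriptor.

```python
import numpy as np

def sift_like_descriptor(patch):
    """patch: square grayscale array whose side is divisible by 4."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ori = np.mod(np.arctan2(gy, gx), 2 * np.pi)      # orientation in [0, 2*pi)
    cell = patch.shape[0] // 4
    desc = np.zeros((4, 4, 8))                       # 4x4 cells x 8 orientation bins
    for i in range(4):
        for j in range(4):
            m = mag[i*cell:(i+1)*cell, j*cell:(j+1)*cell].ravel()
            o = ori[i*cell:(i+1)*cell, j*cell:(j+1)*cell].ravel()
            bins = np.minimum((o / (2 * np.pi) * 8).astype(int), 7)
            desc[i, j] = np.bincount(bins, weights=m, minlength=8)
    desc = desc.ravel()                              # 128 dimensions
    return desc / (np.linalg.norm(desc) + 1e-8)      # L2-normalised
```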
11
Descriptors: SPIN [Lazebnik '03]
12
Region extraction and description
  • Combination of detectors and descriptors
  • Detectors: H = Harris, L = Laplacian; descriptors: SIFT, SPIN
  • Different levels of invariance
  • HA = Harris affine
  • HSR = Harris scale + rotation
  • HS = Harris scale

13
Signature and EMD
  • Hierarchical clustering
  • Signature: S = {(m_1, w_1), ..., (m_k, w_k)}, where m_i is a
    cluster center and w_i its relative weight
  • Earth mover's distance (see the sketch below)
  • robust distance, optimizes the flow between distributions
  • computed from ground distances d(m_i, m'_j):
    D(S, S') = Σ_{i,j} f_{ij} d(m_i, m'_j) / Σ_{i,j} f_{ij}
  • can match signatures of different sizes
  • not sensitive to the number of clusters
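A hedged sketch of the signature distance using the POT optimal-transport library as a stand-in for the authors' own EMD implementation; the weights are normalised so the total flow sums to 1, in which case the denominator of D(S, S') equals 1.

```python
import numpy as np
import ot                                     # POT: pip install pot

def emd_distance(centers1, weights1, centers2, weights2):
    """D(S, S') = sum_ij f_ij d(m_i, m'_j) / sum_ij f_ij."""
    w1 = np.asarray(weights1, float); w1 = w1 / w1.sum()
    w2 = np.asarray(weights2, float); w2 = w2 / w2.sum()
    ground = ot.dist(np.asarray(centers1, float),
                     np.asarray(centers2, float),
                     metric='euclidean')      # ground distances d(m_i, m'_j)
    return ot.emd2(w1, w2, ground)            # cost of the optimal flow f_ij
```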
14
Vocabulary - χ² distance
  • K-means clustering of descriptors from the training images
  • cluster centers → vocabulary
  • Frequency histogram of the clusters for each image
  • Histogram comparison with the χ² distance (see the sketch below)
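A minimal sketch of the χ² distance between two normalised visual-word histograms, using one common convention with a factor of 1/2; the small epsilon is only there to avoid division by zero.

```python
import numpy as np

def chi_square_distance(h1, h2, eps=1e-10):
    """0.5 * sum_i (h1_i - h2_i)^2 / (h1_i + h2_i), for normalised histograms."""
    h1 = np.asarray(h1, float)
    h2 = np.asarray(h2, float)
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))
```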

15
Classification
  • Distance D(I1, I2) between two images (EMD or χ²)
  • Gaussian kernel built from the distance (see the sketch below)
  • Binary or multi-class SVM
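A sketch of the classification step: the distance matrix is turned into a Gaussian-style kernel K = exp(-D / A) and passed to an SVM as a precomputed kernel. The scale A, and setting it to the mean training distance, are illustrative assumptions rather than the authors' exact settings.

```python
import numpy as np
from sklearn.svm import SVC

def kernel_from_distances(D, A=None):
    """Turn a distance matrix into a Gaussian-style kernel exp(-D / A)."""
    A = D.mean() if A is None else A
    return np.exp(-D / A)

# Usage sketch: D_train is an (n_train x n_train) EMD or chi-square distance
# matrix, D_test an (n_test x n_train) matrix against the training images.
# clf = SVC(kernel='precomputed').fit(kernel_from_distances(D_train), y_train)
# y_pred = clf.predict(kernel_from_distances(D_test, A=D_train.mean()))
```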

16
In-depth study of the approach
  • Evaluation of different parameters of the system
  • Comparison with existing methods (textures,
    categories)
  • Influence of the background features
  • Winner of several competitions of the PASCAL
    challenge

17
Evaluation of detectors and descriptors
  • The combination of detectors and descriptors
    gives the best results
  • Laplacian + SIFT is acceptable, at lower
    computational cost

18
Evaluation of invariance
  • Best level of invariance depends on the dataset
  • Affine is rarely an advantage

19
Evaluation of different kernels
  • The EMD kernel and the χ² kernel give comparable results

20
Texture classification
  • Lazebnik et al. (CVPR 2003)
  • RIFT + SPIN for sparse interest point description
  • EMD + KNN classifier
  • VZ-Joint (Varma and Zisserman, CVPR 2003)
  • Image patch descriptor for dense description
    (every pixel)
  • χ² distance + KNN classifier
  • Hayman et al. (ECCV 2004)
  • VZ-Joint for dense description (every pixel)
  • χ²-kernel SVM
  • Global Gabor mean/std (Manjunath et al., PAMI 1996)
  • Gabor features for a single feature vector description
  • Mahalanobis distance + KNN classifier

21
UIUC texture database: textured surfaces
  • 25 classes, 40 sample images each

22
Comparison on UIUC
23
CUReT texture dataset
Example classes: felt, plaster, styrofoam
  • 61 classes, 92 sample images each
  • significant illumination changes, viewpoint
    changes

24
Comparison on CUReT
25
Category classification
  • Constellation model [Fergus '03]
  • Bag of features [Csurka '04]
  • Matching kernels [Wallraven '03, Grauman '03]
  • Feature selection [Dorko '03, Opelt '04]

26
Xerox 7 categories
Classes: bikes, books, buildings, cars, people, phones, trees
27
Misclassified images of Xerox7
books - misclassified into faces, faces, buildings
buildings - misclassified into faces, trees, trees
cars - misclassified into buildings, phones,
phones
28
Graz bike and people database
Classes: bikes, people, background
29
Misclassified images of Graz dataset
misclassified bikes
misclassified people
30
Comparison on the CalTech database
31
Category classification: PASCAL dataset
Classes: bikes, cars, motorbikes, people
Sets: training, test set 1, test set 2
32
Influence of background
  • Three types of background features
  • Original background features
  • Random background features
  • Constant background features
  • Three test groups
  • Background features (BF) only
  • Foreground features (FF) with different types of
    background for training and testing
  • Foreground features with different types of
    background for training, but testing on the original
    test set

33
BF training/testing
34
FF+BF training/testing
35
Training on FF+BF / original test set
36
Semi-local parts + maximum entropy framework
  • Semi-local parts: a higher-level image
    representation
  • Combination of appearance and spatial layout
  • Maximum entropy: a probabilistic framework for
    combining parts and inter-part relations (see the sketch below)
  • Discriminative framework
  • No independence assumptions
  • Many kinds of features and relations can be combined
    within a single framework
  • The optimization problem is convex, so finding the exact
    optimum is tractable
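A hedged sketch of how such an exponential (maximum-entropy) model, P(c | I) ∝ exp(Σ_k λ_k f_k(I, c)), can be fitted: multinomial logistic regression maximises the same convex likelihood. The feature matrix below is assumed to already hold the part-based feature-function values; this is a stand-in, not the authors' exact training procedure.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_maxent(features, labels):
    """features: (n_images x n_features) values of the feature functions
    (e.g. part repeatabilities); labels: class of each image."""
    # Multinomial softmax regression maximises the same convex
    # log-likelihood as the maximum-entropy exponential model,
    # so the optimum it finds is global.
    return LogisticRegression(max_iter=1000).fit(features, labels)

def class_posteriors(model, features):
    """P(c | I), proportional to exp(sum_k lambda_k f_k(I, c))."""
    return model.predict_proba(features)
```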

37
Semi-Local Parts
  • Geometric invariance (scale, similarity, affine)
  • Robustness to clutter, occlusion, intra-class
    variability
  • Weakly supervised learning

38
Learning a Part Vocabulary
  • Ideal approach: simultaneous correspondence
    search across the entire training set

39
Two-Image Matching
  • Goal: find collections of local regions that
    can be mapped onto each other using a single
    rigid transformation (see the sketch after this list)
  • Implementation: local search based on geometric
    and photometric consistency constraints
  • Returns multiple correspondence hypotheses
  • Automatically determines the number of regions in
    correspondence
  • Works on unsegmented, cluttered images (weakly
    supervised learning)
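A hedged sketch of the geometric-consistency idea only, not the authors' local search: candidate region matches between the two images are kept when they agree with a single robustly estimated similarity transform. The function name and tolerance parameter are illustrative assumptions.

```python
import numpy as np
import cv2                                    # opencv-python

def consistent_matches(pts1, pts2, tol=5.0):
    """pts1, pts2: (N x 2) arrays of matched region centers in image 1 / 2."""
    pts1 = np.asarray(pts1, np.float32)
    pts2 = np.asarray(pts2, np.float32)
    # Robustly fit one rotation + scale + translation mapping image 1 to 2.
    M, inliers = cv2.estimateAffinePartial2D(
        pts1, pts2, method=cv2.RANSAC, ransacReprojThreshold=tol)
    if M is None:                             # no consistent transform found
        return None, np.zeros(len(pts1), dtype=bool)
    return M, inliers.ravel().astype(bool)    # transform + consistent matches
```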

40
Scale-Invariant Parts
  • Contour-based detector [Jurie and Schmid '04]

41
Learning a Part Vocabulary
  • Match multiple pairs of training images from the
    same class to produce candidate parts
  • Perform part selection (validation)
  • Match each candidate part against the validation set (both
    positive and negative images)
  • Validation score: χ² distance between
    repeatability histograms for positive and
    negative images (see the sketch below)
  • Learn a probabilistic model of the object class
  • Naïve Bayes
  • Exponential model
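A minimal sketch of the validation score under the reading above: build repeatability histograms of a candidate part over positive and negative validation images and compare them with the χ² distance. The number of bins is an arbitrary illustrative choice.

```python
import numpy as np

def validation_score(pos_counts, neg_counts, bins=10):
    """pos_counts / neg_counts: per-image repeatability of one candidate part
    on positive / negative validation images; higher = more discriminative."""
    top = max(np.max(pos_counts), np.max(neg_counts)) + 1
    hp, _ = np.histogram(pos_counts, bins=bins, range=(0, top))
    hn, _ = np.histogram(neg_counts, bins=bins, range=(0, top))
    hp = hp / (hp.sum() + 1e-10)              # normalised histograms
    hn = hn / (hn.sum() + 1e-10)
    return 0.5 * np.sum((hp - hn) ** 2 / (hp + hn + 1e-10))
```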

42
Feature Functions
  • (Absolute) repeatability of a detected part
    instance: the number of detected regions of part k in image I
  • Single-part features
  • Overlap features

43
CalTech Database
  • Four classes: airplanes, cars (rear), faces,
    motorbikes
  • 100 training images per class
  • 50 initial images (50 largest candidate parts
    retained)
  • 50 validation (20 highest-scoring parts retained)
  • 200 test images per class
  • 300 images per class in total

44
CalTech Database Parts
45
CalTech Results
46
Airplane Part Detection
misclassified image
47
Car Part Detection
misclassified image
48
Face Part Detection
misclassified image
49
Motorbike Part Detection
misclassified image
50
The Birds Database
  • Six classes: egret, mandarin duck, snowy owl,
    puffin, toucan, wood duck
  • 50 training images per class
  • 20 initial images (50 largest candidate parts
    retained)
  • 30 validation (20 highest-scoring parts retained)
  • 50 test images per class
  • 100 images per class in total

51
Bird Parts
52
Birds Database Results
53
Bird Part Detection
54
Bird Part Detection (cont.)
55
Misclassified Images
56
Classification rate vs. dictionary size (Birds and Caltech databases)
57
Conclusion
  • Both local part models perform well for category
    recognition
  • Classification results with both methods are
    excellent
  • Future work: localization; preliminary results
    with semi-local parts