Title: Lecture 9: Bag of Words models
Slide 1: Lecture 9: Bag of Words models
Slide 2: Admin
- Assignment 2 due
- Assignment 3 out shortly
Slide 4: Analogy to documents
Of all the sensory impressions proceeding to the
brain, the visual experiences are the dominant
ones. Our perception of the world around us is
based essentially on the messages that reach the
brain from our eyes. For a long time it was
thought that the retinal image was transmitted
point by point to visual centers in the brain;
the cerebral cortex was a movie screen, so to
speak, upon which the image in the eye was
projected. Through the discoveries of Hubel and
Wiesel we now know that behind the origin of the
visual perception in the brain there is a
considerably more complicated course of events.
By following the visual impulses along their path
to the various cell layers of the optical cortex,
Hubel and Wiesel have been able to demonstrate
that the message about the image falling on the
retina undergoes a step-wise analysis in a system
of nerve cells stored in columns. In this system
each cell has its specific function and is
responsible for a specific detail in the pattern
of the retinal image.
Slides 5-6: A clarification: the definition of BoW
- Looser definition
  - Independent features
- Stricter definition
  - Independent features
  - Histogram representation
Slide 8: Representation
[diagram: 1. feature detection and representation; 2. codewords dictionary formation; 3. image representation]
Slides 9-12: 1. Feature detection and representation
- Regular grid
  - Vogel & Schiele, 2003
  - Fei-Fei & Perona, 2005
- Interest point detector
  - Csurka, Bray, Dance & Fan, 2004
  - Fei-Fei & Perona, 2005
  - Sivic, Russell, Efros, Freeman & Zisserman, 2005
- Other methods
  - Random sampling (Vidal-Naquet & Ullman, 2002)
  - Segmentation-based patches (Barnard, Duygulu, Forsyth, de Freitas, Blei & Jordan, 2003)
Slide 13: 1. Feature detection and representation
- Detect patches (Mikolajczyk & Schmid '02; Matas, Chum, Urban & Pajdla '02; Sivic & Zisserman '03)
- Normalize patch
- Compute SIFT descriptor (Lowe '99)
Slide credit: Josef Sivic
Slide 14: 1. Feature detection and representation
Slide 15: 2. Codewords dictionary formation
Slide 16: 2. Codewords dictionary formation
- Vector quantization
Slide credit: Josef Sivic
Slide 17: 2. Codewords dictionary formation
- Fei-Fei et al. 2005
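In practice the dictionary is usually built by running k-means over a large sample of local descriptors; each cluster center becomes a codeword. A minimal NumPy sketch (the function name, K, and iteration count here are illustrative choices, not from the original demo):

    import numpy as np

    def build_codebook(descriptors, K=300, iters=20, seed=0):
        # descriptors: (num_patches, dim) array of local features, e.g. SIFT
        rng = np.random.default_rng(seed)
        centers = descriptors[rng.choice(len(descriptors), K, replace=False)].astype(float)
        for _ in range(iters):
            # assign each descriptor to its nearest center (vector quantization)
            d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
            labels = d2.argmin(axis=1)
            # move each center to the mean of its assigned descriptors
            for k in range(K):
                members = descriptors[labels == k]
                if len(members):
                    centers[k] = members.mean(axis=0)
        return centers  # rows are the codewords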
Slide 18: Image patch examples of codewords
Sivic et al. 2005
Slide 19: 3. Image representation
[histogram: codeword frequency for an image; x-axis: codewords, y-axis: frequency]
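Step 3 then maps every patch to its nearest codeword and counts occurrences. A sketch continuing the code above (again illustrative, not the demo's actual code):

    def bow_histogram(descriptors, codebook):
        # nearest codeword for each patch
        d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        words = d2.argmin(axis=1)
        hist = np.bincount(words, minlength=len(codebook)).astype(float)
        return hist / hist.sum()  # frequency over codewords (the histogram above)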
Slide 20: Representation (recap)
[same diagram as slide 8]
Slide 21: Learning and Recognition
category models (and/or) classifiers
Slide 22: Learning and Recognition
- Generative method
  - Graphical models
- Discriminative method
  - SVM
category models (and/or) classifiers
Slide 23: 2 generative models
- Naïve Bayes classifier
  - Csurka, Bray, Dance & Fan, 2004
- Hierarchical Bayesian text models (pLSA and LDA)
  - Background: Hofmann 2001; Blei, Ng & Jordan, 2004
  - Object categorization: Sivic et al. 2005; Sudderth et al. 2005
  - Natural scene categorization: Fei-Fei et al. 2005
Slide 24: First, some notation
- w_n: each patch in an image
  - w_n = [0,0,...,1,...,0,0]^T (indicator vector over codewords)
- w: a collection of all N patches in an image
  - w = [w_1, w_2, ..., w_N]
- d_j: the j-th image in an image collection
- c: category of the image
- z: theme or topic of the patch
Slide 25: Case 1: the Naïve Bayes model
[graphical model: c -> w_n, plate over the N patches]
Csurka et al. 2004
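In the notation of slide 24, this is the standard Naïve Bayes decision rule: the N patches are assumed independent given the class, so

    c^* = \arg\max_c p(c \mid w) \propto \arg\max_c p(c) \prod_{n=1}^{N} p(w_n \mid c),

where p(w_n | c) is estimated from the codeword counts of the training images of class c.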
Slide 26: Csurka et al. 2004
Slide 27: Csurka et al. 2004
Slide 28: Case 2: Hierarchical Bayesian text models
- Probabilistic Latent Semantic Analysis (pLSA): Hofmann, 2001
- Latent Dirichlet Allocation (LDA): Blei et al., 2001
Slide 29: Case 2: Hierarchical Bayesian text models
- Probabilistic Latent Semantic Analysis (pLSA): Sivic et al. ICCV 2005
Slide 30: Case 2: Hierarchical Bayesian text models
- Latent Dirichlet Allocation (LDA): Fei-Fei et al. ICCV 2005
Slide 31: Case 2: the pLSA model
Slide 32: Case 2: the pLSA model
Slide credit: Josef Sivic
Slide 33: Case 2: Recognition using pLSA
Slide credit: Josef Sivic
Slide 34: Case 2: Learning the pLSA parameters
- n(w_i, d_j): observed counts of word i in document j
- Maximize likelihood of data using EM
- M: number of codewords; N: number of images
Slide credit: Josef Sivic
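Concretely, pLSA models each word-document co-occurrence as a mixture over K topics z_k:

    P(w_i \mid d_j) = \sum_{k=1}^{K} P(w_i \mid z_k)\, P(z_k \mid d_j)

and EM alternates the standard Hofmann updates (stated here for reference):

    E-step:  P(z_k \mid w_i, d_j) = \frac{P(w_i \mid z_k) P(z_k \mid d_j)}{\sum_l P(w_i \mid z_l) P(z_l \mid d_j)}
    M-step:  P(w_i \mid z_k) \propto \sum_j n(w_i, d_j)\, P(z_k \mid w_i, d_j)
             P(z_k \mid d_j) \propto \sum_i n(w_i, d_j)\, P(z_k \mid w_i, d_j)

For recognition (slide 33), the topic mixture P(z | d_new) of a new image is fit by the same EM with the P(w | z) tables held fixed.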
Slide 35: Demo
Slide 36: Task: face detection, with no labeling
Slide 37: Demo: feature detection
- Output of a crude feature detector
  - Find edges
  - Draw points randomly from the edge set
  - Draw from a uniform distribution to get scale
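A sketch of such a crude detector in Python, assuming OpenCV for the edge step (the thresholds and scale range are illustrative; the original demo code may differ):

    import numpy as np
    import cv2

    def crude_features(gray, n_points=200, seed=0):
        rng = np.random.default_rng(seed)
        edges = cv2.Canny(gray, 100, 200)    # step 1: find edges
        ys, xs = np.nonzero(edges)           # candidate locations on edges
        idx = rng.choice(len(xs), size=min(n_points, len(xs)), replace=False)
        scales = rng.uniform(5, 30, size=len(idx))   # step 3: uniform scales
        # step 2: randomly drawn edge points, each paired with a random scale
        return [(int(xs[i]), int(ys[i]), float(s)) for i, s in zip(idx, scales)]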
Slide 38: Demo: learnt parameters
- Learning the model: do_plsa(config_file_1)
- Evaluate and visualize the model: do_plsa_evaluation(config_file_1)
- Codeword distributions per theme (topic)
- Theme distributions per image
Slide 39: Demo: recognition examples
Slide 40: Demo: categorization results
- Performance of each theme
Slide 41: Learning and Recognition (recap)
- Generative method
  - Graphical models
- Discriminative method
  - SVM
category models (and/or) classifiers
Slide 42: Discriminative methods based on bag-of-words representation
[figure: decision boundary in BoW feature space separating zebra from non-zebra images]
Slide 43: Discriminative methods based on bag-of-words representation
- Grauman & Darrell, 2005, 2006
  - SVM with pyramid match kernels
- Others
  - Csurka, Bray, Dance & Fan, 2004
  - Serre & Poggio, 2005
Slide 44: Summary: Pyramid match kernel
- Optimal partial matching between sets of features
Grauman & Darrell, 2005. Slide credit: Kristen Grauman
Slides 45-46: Pyramid Match (Grauman & Darrell 2005)
- Histogram intersection
Slide credit: Kristen Grauman
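Histogram intersection between two histograms with B bins is

    I(H_P, H_Q) = \sum_{j=1}^{B} \min\big(H_P(j), H_Q(j)\big),

which counts how many features can be matched at the current bin resolution.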
Slide 47: Pyramid match kernel
- Weights inversely proportional to bin size
- Normalize kernel values to avoid favoring large sets
Slide credit: Kristen Grauman
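Putting the pieces together, the Grauman & Darrell kernel sums the number of new matches formed at each pyramid level, weighted inversely to bin size:

    K(P, Q) = \sum_{i=0}^{L} w_i N_i,
    N_i = I\big(H_i(P), H_i(Q)\big) - I\big(H_{i-1}(P), H_{i-1}(Q)\big),
    w_i \propto 1/2^i,

so matches found at finer resolution (smaller bins, harder matches) count more. The normalization in the last bullet can be done by dividing by the self-similarities, \sqrt{K(P,P)\,K(Q,Q)}.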
Slide 48: Example pyramid match (Level 0)
Slide credit: Kristen Grauman
Slide 49: Example pyramid match (Level 1)
Slide credit: Kristen Grauman
Slide 50: Example pyramid match (Level 2)
Slide credit: Kristen Grauman
Slide 51: Example pyramid match (pyramid match vs. optimal match)
Slide credit: Kristen Grauman
Slide 52: Summary: Pyramid match kernel
- Optimal partial matching between sets of features
- N_i: number of new matches at level i; w_i: difficulty of a match at level i
Slide credit: Kristen Grauman
Slide 53: Object recognition results
- ETH-80 database: 8 object classes (Eichhorn and Chapelle 2004)
- Features
  - Harris detector
  - PCA-SIFT descriptor, d = 10
Slide credit: Kristen Grauman
Slide 54: Object recognition results
- Caltech objects database: 101 object classes
- Features
  - SIFT detector
  - PCA-SIFT descriptor, d = 10
- 30 training images / class
- 43% recognition rate (chance performance: 1%)
- 0.002 seconds per match
Slide credit: Kristen Grauman
Slide 56: What about spatial info?
Slide 57: What about spatial info?
- Feature level
  - Spatial influence through correlogram features: Savarese, Winn and Criminisi, CVPR 2006
Slides 58-59: What about spatial info?
- Feature level
- Generative models
  - Sudderth, Torralba, Freeman & Willsky, 2005, 2006
  - Niebles & Fei-Fei, CVPR 2007
Slide 60: 3D scene models
Sudderth, Torralba, Freeman & Willsky, CVPR '06
[figure: object locations; object parts; visual words belonging to each object part; projection of scene onto image plane]
Slide 61: Lazebnik, Schmid & Ponce, 2006
Slide credit: S. Lazebnik
Slide 62: [figure] Slide credit: S. Lazebnik
Slide 63: Invariance issues
- Scale and rotation
  - Implicit
  - Detectors and descriptors
Kadir and Brady, 2003
Slide 64: Invariance issues
- Scale and rotation
- Occlusion
  - Implicit in the models
  - Codeword distribution: small variations
  - (In theory) theme (z) distribution: different occlusion patterns
Slide 65: Invariance issues
- Scale and rotation
- Occlusion
- Translation
  - Encode (relative) location information
  - Sudderth, Torralba, Freeman & Willsky, 2005, 2006
  - Niebles & Fei-Fei, 2007
Slide 66: Invariance issues
- Scale and rotation
- Occlusion
- Translation
- View point (in theory)
  - Codewords: detector and descriptor
  - Theme distributions: different view points
Fergus, Fei-Fei, Perona & Zisserman, 2005
Slide 67: Model properties
- Intuitive
  - Analogy to documents
Slide 68: Model properties
- Intuitive
  - Analogy to documents
  - Analogy to human vision
Olshausen and Field, 2004; Fei-Fei and Perona, 2005
Slide 69: Model properties
- Intuitive
- Generative models
  - Convenient for weakly- or un-supervised, incremental training
  - Prior information
  - Flexibility (e.g. HDP)
Sivic, Russell, Efros, Freeman & Zisserman, 2005; Li, Wang & Fei-Fei, CVPR 2007
Slide 70: Model properties
- Intuitive
- Generative models
- Discriminative method
  - Computationally efficient
Grauman et al. CVPR 2005
Slide 71: Model properties
- Intuitive
- Generative models
- Discriminative method
- Learning and recognition relatively fast
  - Compared to other methods
Slide 72: Weakness of the model
- No rigorous geometric information about the object components
  - It's intuitive to most of us that objects are made of parts, yet the model carries no such information
- Not extensively tested yet for
  - View point invariance
  - Scale invariance
- Segmentation and localization unclear
Slide 73: Model: Parts and Structure
Slide 74: Representation
- Object as set of parts
  - Generative representation
- Model
  - Relative locations between parts
  - Appearance of part
- Issues
  - How to model location
  - How to represent appearance
  - Sparse or dense (pixels or regions)
  - How to handle occlusion/clutter
Figure from Fischler & Elschlager '73
Slide 75: The correspondence problem
- Model with P parts
- Image with N possible assignments for each part
- Consider the mapping to be 1-1 (see the counting argument below)
[figure: model parts matched against image features]
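The counting argument: with N candidate features and P parts, a 1-1 assignment can be chosen in

    N(N-1)\cdots(N-P+1) = \frac{N!}{(N-P)!} \approx N^P

ways, so brute-force correspondence search is exponential in the number of parts; the connectivity structures discussed below exist precisely to cut this cost.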
Slide 76: The correspondence problem
- 1-1 mapping
  - Each part assigned to a unique feature
- As opposed to:
  - 1-many
    - Bag-of-words approaches
    - Sudderth, Torralba & Freeman '05
    - Loeff, Sorokin, Arora and Forsyth '05
  - Conditional Random Field
    - Quattoni, Collins and Darrell '04
Slide 77: History of Parts and Structure approaches
- Fischler & Elschlager 1973
- Yuille '91
- Brunelli & Poggio '93
- Lades, v.d. Malsburg et al. '93
- Cootes, Lanitis, Taylor et al. '95
- Amit & Geman '95, '99
- Perona et al. '95, '96, '98, '00, '03, '04, '05
- Felzenszwalb & Huttenlocher '00, '04, '08
- Crandall & Huttenlocher '05, '06
- Leibe & Schiele '03, '04
- Many papers since 2000
Slide 78: Sparse representation
Pros:
- Computationally tractable (10^5 pixels -> 10^1 to 10^2 parts)
- Generative representation of class
- Avoids modeling global variability
- Success in specific object recognition
Cons:
- Throws away most image information
- Parts need to be distinctive to separate from other classes
Slide 79: Connectivity of parts
- Complexity is given by the size of the maximal clique in the graph
- Consider a 3-part model
  - Each part has a set of N possible locations in the image
  - The locations of parts 2 and 3 are independent, given the location of L
  - Each part has an appearance term, independent between parts
[factor graph: variables L, 2, 3; appearance factors A(L), A(2), A(3); shape factors S(L), S(L,2), S(L,3)]
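Written out, the independencies in this factor graph give a star model whose best configuration is

    \max_{L,2,3} A(L)\,A(2)\,A(3)\,S(L,2)\,S(L,3)
      = \max_L A(L)\,\Big[\max_2 A(2)\,S(L,2)\Big]\,\Big[\max_3 A(3)\,S(L,3)\Big],

so parts 2 and 3 can each be maximized separately for every candidate location of L: O(N^2) work instead of the O(N^3) of exhaustive search over all triples.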
Slide 80: from "Sparse Flexible Models of Local Features", Gustavo Carneiro and David Lowe, ECCV 2006
[figure: different connectivity structures and their matching complexities]
- Felzenszwalb & Huttenlocher '00: O(N^2)
- Fergus et al. '03; Fei-Fei et al. '03: O(N^6)
- Crandall et al. '05; Fergus et al. '05: O(N^2)
- Crandall et al. '05: O(N^3)
- Csurka '04; Vasconcelos '00
- Bouchard & Triggs '05
- Carneiro & Lowe '06
Slide 81: How much does shape help?
- Crandall, Felzenszwalb, Huttenlocher CVPR '05
- Shape variance increases with increasing model complexity
- Do get some benefit from shape
Slide 82: Some class-specific graphs
- Articulated motion
  - People
  - Animals
- Special parameterisations
  - Limb angles
Images from Kumar, Torr and Zisserman '05; Felzenszwalb & Huttenlocher '05
Slide 83: Hierarchical representations
- Pixels -> pixel groupings -> parts -> object
- Multi-scale approach increases the number of low-level features
- Amit and Geman '98
- Bouchard & Triggs '05
- Felzenszwalb, McAllester & Ramanan '08
Images from Amit '98, Bouchard '05, Felzenszwalb '08
Slide 84: Stochastic Grammar of Images, S.C. Zhu et al. and D. Mumford
Slide 85: Context and Hierarchy in a Probabilistic Image Model, Jin & Geman (2006)
[hierarchy figure, coarse to fine levels:
- e.g. animals, trees, rocks
- e.g. contours, intermediate objects
- e.g. linelets, curvelets, T-junctions
- e.g. discontinuities, gradient
annotation: "animal head" instantiated by "tiger head"]
Slide 86: How to model location?
- Explicit: probability density functions
- Implicit: voting scheme
- Invariance
  - Translation
  - Scaling
  - Similarity/affine
  - Viewpoint
Slide 87: Explicit shape model
- Cartesian
  - E.g. Gaussian distribution
  - Parameters of model: mu and Sigma
  - Independence corresponds to zeros in Sigma
  - Burl et al. '96; Weber et al. '00; Fergus et al. '03
- Polar
  - Convenient for invariance to rotation
  - Mikolajczyk et al., CVPR '06
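For the Cartesian case, stacking the P part locations into X = (x_1, ..., x_P), the shape density is a joint Gaussian

    p(X) = \mathcal{N}(X \mid \mu, \Sigma),

where \mu is the mean configuration and zero entries in \Sigma encode independence between pairs of parts.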
Slide 88: Implicit shape model
- Use Hough-space voting to find the object
- Leibe and Schiele '03, '05
Learning:
- Learn appearance codebook
  - Cluster over interest points on training images
- Learn spatial distributions
  - Match codebook to training images
  - Record matching positions on the object
  - Centroid is given
Recognition: match interest points to the codebook and vote for the centroid
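A minimal sketch of the voting step at recognition time (the data structures and names here are illustrative; a real implementation of Leibe & Schiele also handles scale and searches a continuous vote space):

    import numpy as np
    from collections import defaultdict

    def hough_vote(matches, offsets, bin_size=10):
        # matches: list of (x, y, word_id) interest points matched to the codebook
        # offsets: dict word_id -> list of (dx, dy) displacements to the object
        #          centroid, recorded on the training images
        votes = defaultdict(float)
        for x, y, w in matches:
            occ = offsets.get(w, [])
            for dx, dy in occ:
                cx, cy = x + dx, y + dy                 # predicted centroid
                key = (int(cx // bin_size), int(cy // bin_size))
                votes[key] += 1.0 / len(occ)            # spread each point's vote
        # maxima of the vote map are object hypotheses; return the strongest bin
        return max(votes, key=votes.get) if votes else None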
Slide 89: Multiple view points
- Thomas, Ferrari, Leibe, Tuytelaars, Schiele, and L. Van Gool. "Towards Multi-View Object Class Detection", CVPR '06
- Hoiem, Rother, Winn. "3D LayoutCRF for Multi-View Object Class Recognition and Segmentation", CVPR '07
Slide 90: Appearance representation
- Lepetit and Fua, CVPR 2005
Figure from Winn & Shotton, CVPR '06
Slide 91: Region operators
- Local maxima of an interest-operator function
- Can give scale/orientation invariance
Figures from Kadir, Zisserman and Brady '04
Slide 92: Occlusion
- Explicit
  - Additional match of each part to a missing state
- Implicit
  - Truncated minimum probability of appearance
[figure: log probability vs. appearance space, peaked at mu_part and truncated at a minimum value]
Slide 93: Background clutter
- Explicit model
  - Generative model for clutter as well as the foreground object
- Use a sub-window
  - At the correct position, no clutter is present
Slide 94: Efficient search methods
- Interpretation tree (Grimson '87)
  - Condition on assigned parts to give search regions for the remaining ones
  - Branch & bound, A*
Slide 95: Distance transforms
- Felzenszwalb and Huttenlocher '00, '05
- Distance transforms reduce matching from O(N^2 P) to O(N P) for tree-structured models
- Potentials must take a certain form, e.g. Gaussian
- Permits exhaustive search for each part's location
- No need for feature detectors in recognition
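The workhorse is the 1-D generalized squared-distance transform, D(p) = min_q [(p - q)^2 + f(q)], computed in O(n) via the lower envelope of parabolas; for a tree model it lets each part's message to its parent be computed over all N locations at once, giving the O(NP) total. A Python sketch of the Felzenszwalb-Huttenlocher algorithm:

    import numpy as np

    def dt1d(f):
        # D(p) = min_q (p - q)^2 + f(q), computed in O(n)
        n = len(f)
        v = np.zeros(n, dtype=int)          # parabola roots in the lower envelope
        z = np.empty(n + 1)                 # boundaries between parabolas
        z[0], z[1] = -np.inf, np.inf
        k = 0
        for q in range(1, n):
            # intersection of the parabola from q with the current rightmost one
            s = ((f[q] + q * q) - (f[v[k]] + v[k] * v[k])) / (2 * q - 2 * v[k])
            while s <= z[k]:
                k -= 1                      # the new parabola hides the old one
                s = ((f[q] + q * q) - (f[v[k]] + v[k] * v[k])) / (2 * q - 2 * v[k])
            k += 1
            v[k] = q
            z[k], z[k + 1] = s, np.inf
        d = np.empty(n)
        k = 0
        for p in range(n):                  # read off the envelope
            while z[k + 1] < p:
                k += 1
            d[p] = (p - v[k]) ** 2 + f[v[k]]
        return d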
Slide 96: Demo Web Page
Slide 97: Parts and Structure models: Summary
- Correspondence problem
  - Efficient methods for large numbers of parts and positions in the image
- Challenge to get a representation with the desired invariance
- Future directions
  - Multiple views
  - Approaches to learning
  - Multiple category training