Title: Lecture 9: Bag of Words models
Slide 1: Lecture 9: Bag of Words models
Slide 2: Admin
- Assignment 2 due
- Assignment 3 out shortly
Slide 4: Analogy to documents
Of all the sensory impressions proceeding to the
brain, the visual experiences are the dominant
ones. Our perception of the world around us is
based essentially on the messages that reach the
brain from our eyes. For a long time it was
thought that the retinal image was transmitted
point by point to visual centers in the brain;
the cerebral cortex was a movie screen, so to
speak, upon which the image in the eye was
projected. Through the discoveries of Hubel and
Wiesel we now know that behind the origin of the
visual perception in the brain there is a
considerably more complicated course of events.
By following the visual impulses along their path
to the various cell layers of the optical cortex,
Hubel and Wiesel have been able to demonstrate
that the message about the image falling on the
retina undergoes a step-wise analysis in a system
of nerve cells stored in columns. In this system
each cell has its specific function and is
responsible for a specific detail in the pattern
of the retinal image.
Slides 5-6: A clarification: the definition of BoW
- Looser definition
  - Independent features
- Stricter definition
  - Independent features
  - Histogram representation
Slide 8: Representation
[diagram: 1. feature detection and representation; 2. codewords dictionary formation; 3. image representation]
Slides 9-12: 1. Feature detection and representation
- Regular grid
  - Vogel & Schiele, 2003
  - Fei-Fei & Perona, 2005
- Interest point detector
  - Csurka, Bray, Dance & Fan, 2004
  - Fei-Fei & Perona, 2005
  - Sivic, Russell, Efros, Freeman & Zisserman, 2005
- Other methods
  - Random sampling (Vidal-Naquet & Ullman, 2002)
  - Segmentation-based patches (Barnard, Duygulu, Forsyth, de Freitas, Blei & Jordan, 2003)
Slide 13: 1. Feature detection and representation
- Detect patches (Mikolajczyk & Schmid '02; Matas, Chum, Urban & Pajdla '02; Sivic & Zisserman '03)
- Normalize patch
- Compute SIFT descriptor (Lowe '99)
Slide credit: Josef Sivic
Slide 14: 1. Feature detection and representation
Slide 15: 2. Codewords dictionary formation
Slide 16: 2. Codewords dictionary formation
- Vector quantization
Slide credit: Josef Sivic
Slide 17: 2. Codewords dictionary formation
- Fei-Fei et al. 2005
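In practice the dictionary is usually built by running k-means over a large sample of local descriptors; each cluster center becomes a codeword. A minimal NumPy sketch (the function name, K, and iteration count here are illustrative choices, not from the original demo):

    import numpy as np

    def build_codebook(descriptors, K=300, iters=20, seed=0):
        # descriptors: (num_patches, dim) array of local features, e.g. SIFT
        rng = np.random.default_rng(seed)
        centers = descriptors[rng.choice(len(descriptors), K, replace=False)].astype(float)
        for _ in range(iters):
            # assign each descriptor to its nearest center (vector quantization)
            d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
            labels = d2.argmin(axis=1)
            # move each center to the mean of its assigned descriptors
            for k in range(K):
                members = descriptors[labels == k]
                if len(members):
                    centers[k] = members.mean(axis=0)
        return centers  # rows are the codewords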
Slide 18: Image patch examples of codewords
Sivic et al. 2005
Slide 19: 3. Image representation
[histogram: codeword frequency for an image; x-axis: codewords, y-axis: frequency]
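Step 3 then maps every patch to its nearest codeword and counts occurrences. A sketch continuing the code above (again illustrative, not the demo's actual code):

    def bow_histogram(descriptors, codebook):
        # nearest codeword for each patch
        d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        words = d2.argmin(axis=1)
        hist = np.bincount(words, minlength=len(codebook)).astype(float)
        return hist / hist.sum()  # frequency over codewords (the histogram above)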
Slide 20: Representation (recap)
[same diagram as slide 8]
Slide 21: Learning and Recognition
category models (and/or) classifiers
Slide 22: Learning and Recognition
- Generative method
  - Graphical models
- Discriminative method
  - SVM
category models (and/or) classifiers
Slide 23: 2 generative models
- Naïve Bayes classifier
  - Csurka, Bray, Dance & Fan, 2004
- Hierarchical Bayesian text models (pLSA and LDA)
  - Background: Hofmann 2001; Blei, Ng & Jordan, 2004
  - Object categorization: Sivic et al. 2005; Sudderth et al. 2005
  - Natural scene categorization: Fei-Fei et al. 2005
Slide 24: First, some notation
- w_n: each patch in an image
  - w_n = [0,0,...,1,...,0,0]^T (indicator vector over codewords)
- w: a collection of all N patches in an image
  - w = [w_1, w_2, ..., w_N]
- d_j: the j-th image in an image collection
- c: category of the image
- z: theme or topic of the patch
Slide 25: Case 1: the Naïve Bayes model
[graphical model: c -> w_n, plate over the N patches]
Csurka et al. 2004
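In the notation of slide 24, this is the standard Naïve Bayes decision rule: the N patches are assumed independent given the class, so

    c^* = \arg\max_c p(c \mid w) \propto \arg\max_c p(c) \prod_{n=1}^{N} p(w_n \mid c),

where p(w_n | c) is estimated from the codeword counts of the training images of class c.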
Slide 26: Csurka et al. 2004
Slide 27: Csurka et al. 2004
Slide 28: Case 2: Hierarchical Bayesian text models
- Probabilistic Latent Semantic Analysis (pLSA): Hofmann, 2001
- Latent Dirichlet Allocation (LDA): Blei et al., 2001
Slide 29: Case 2: Hierarchical Bayesian text models
- Probabilistic Latent Semantic Analysis (pLSA): Sivic et al. ICCV 2005
Slide 30: Case 2: Hierarchical Bayesian text models
- Latent Dirichlet Allocation (LDA): Fei-Fei et al. ICCV 2005
Slide 31: Case 2: the pLSA model
Slide 32: Case 2: the pLSA model
Slide credit: Josef Sivic
Slide 33: Case 2: Recognition using pLSA
Slide credit: Josef Sivic
Slide 34: Case 2: Learning the pLSA parameters
- n(w_i, d_j): observed counts of word i in document j
- Maximize likelihood of data using EM
- M: number of codewords; N: number of images
Slide credit: Josef Sivic
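Concretely, pLSA models each word-document co-occurrence as a mixture over K topics z_k:

    P(w_i \mid d_j) = \sum_{k=1}^{K} P(w_i \mid z_k)\, P(z_k \mid d_j)

and EM alternates the standard Hofmann updates (stated here for reference):

    E-step:  P(z_k \mid w_i, d_j) = \frac{P(w_i \mid z_k) P(z_k \mid d_j)}{\sum_l P(w_i \mid z_l) P(z_l \mid d_j)}
    M-step:  P(w_i \mid z_k) \propto \sum_j n(w_i, d_j)\, P(z_k \mid w_i, d_j)
             P(z_k \mid d_j) \propto \sum_i n(w_i, d_j)\, P(z_k \mid w_i, d_j)

For recognition (slide 33), the topic mixture P(z | d_new) of a new image is fit by the same EM with the P(w | z) tables held fixed.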
Slide 35: Demo
Slide 36: Task: face detection, with no labeling
Slide 37: Demo: feature detection
- Output of a crude feature detector
  - Find edges
  - Draw points randomly from the edge set
  - Draw from a uniform distribution to get scale
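A sketch of such a crude detector in Python, assuming OpenCV for the edge step (the thresholds and scale range are illustrative; the original demo code may differ):

    import numpy as np
    import cv2

    def crude_features(gray, n_points=200, seed=0):
        rng = np.random.default_rng(seed)
        edges = cv2.Canny(gray, 100, 200)    # step 1: find edges
        ys, xs = np.nonzero(edges)           # candidate locations on edges
        idx = rng.choice(len(xs), size=min(n_points, len(xs)), replace=False)
        scales = rng.uniform(5, 30, size=len(idx))   # step 3: uniform scales
        # step 2: randomly drawn edge points, each paired with a random scale
        return [(int(xs[i]), int(ys[i]), float(s)) for i, s in zip(idx, scales)]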
Slide 38: Demo: learnt parameters
- Learning the model: do_plsa(config_file_1)
- Evaluate and visualize the model: do_plsa_evaluation(config_file_1)
- Codeword distributions per theme (topic)
- Theme distributions per image
Slide 39: Demo: recognition examples
Slide 40: Demo: categorization results
- Performance of each theme
Slide 41: Learning and Recognition (recap)
- Generative method
  - Graphical models
- Discriminative method
  - SVM
category models (and/or) classifiers
Slide 42: Discriminative methods based on bag-of-words representation
[figure: decision boundary in BoW feature space separating zebra from non-zebra images]
Slide 43: Discriminative methods based on bag-of-words representation
- Grauman & Darrell, 2005, 2006
  - SVM with pyramid match kernels
- Others
  - Csurka, Bray, Dance & Fan, 2004
  - Serre & Poggio, 2005
Slide 44: Summary: Pyramid match kernel
- Optimal partial matching between sets of features
Grauman & Darrell, 2005. Slide credit: Kristen Grauman
Slides 45-46: Pyramid Match (Grauman & Darrell 2005)
- Histogram intersection
Slide credit: Kristen Grauman
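Histogram intersection between two histograms with B bins is

    I(H_P, H_Q) = \sum_{j=1}^{B} \min\big(H_P(j), H_Q(j)\big),

which counts how many features can be matched at the current bin resolution.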
Slide 47: Pyramid match kernel
- Weights inversely proportional to bin size
- Normalize kernel values to avoid favoring large sets
Slide credit: Kristen Grauman
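Putting the pieces together, the Grauman & Darrell kernel sums the number of new matches formed at each pyramid level, weighted inversely to bin size:

    K(P, Q) = \sum_{i=0}^{L} w_i N_i,
    N_i = I\big(H_i(P), H_i(Q)\big) - I\big(H_{i-1}(P), H_{i-1}(Q)\big),
    w_i \propto 1/2^i,

so matches found at finer resolution (smaller bins, harder matches) count more. The normalization in the last bullet can be done by dividing by the self-similarities, \sqrt{K(P,P)\,K(Q,Q)}.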
Slide 48: Example pyramid match (Level 0)
Slide credit: Kristen Grauman
Slide 49: Example pyramid match (Level 1)
Slide credit: Kristen Grauman
Slide 50: Example pyramid match (Level 2)
Slide credit: Kristen Grauman
Slide 51: Example pyramid match (pyramid match vs. optimal match)
Slide credit: Kristen Grauman
Slide 52: Summary: Pyramid match kernel
- Optimal partial matching between sets of features
- N_i: number of new matches at level i; w_i: difficulty of a match at level i
Slide credit: Kristen Grauman
Slide 53: Object recognition results
- ETH-80 database: 8 object classes (Eichhorn and Chapelle 2004)
- Features
  - Harris detector
  - PCA-SIFT descriptor, d = 10
Slide credit: Kristen Grauman
Slide 54: Object recognition results
- Caltech objects database: 101 object classes
- Features
  - SIFT detector
  - PCA-SIFT descriptor, d = 10
- 30 training images / class
- 43% recognition rate (chance performance: 1%)
- 0.002 seconds per match
Slide credit: Kristen Grauman
Slide 56: What about spatial info?
Slide 57: What about spatial info?
- Feature level
  - Spatial influence through correlogram features: Savarese, Winn and Criminisi, CVPR 2006
Slides 58-59: What about spatial info?
- Feature level
- Generative models
  - Sudderth, Torralba, Freeman & Willsky, 2005, 2006
  - Niebles & Fei-Fei, CVPR 2007
Slide 60: 3D scene models
Sudderth, Torralba, Freeman & Willsky, CVPR '06
[figure: object locations; object parts; visual words belonging to each object part; projection of scene onto image plane]
Slide 61: Lazebnik, Schmid & Ponce, 2006
Slide credit: S. Lazebnik
Slide 62: [figure] Slide credit: S. Lazebnik
Slide 63: Invariance issues
- Scale and rotation
  - Implicit
  - Detectors and descriptors
Kadir and Brady, 2003
Slide 64: Invariance issues
- Scale and rotation
- Occlusion
  - Implicit in the models
  - Codeword distribution: small variations
  - (In theory) theme (z) distribution: different occlusion patterns
Slide 65: Invariance issues
- Scale and rotation
- Occlusion
- Translation
  - Encode (relative) location information
  - Sudderth, Torralba, Freeman & Willsky, 2005, 2006
  - Niebles & Fei-Fei, 2007
Slide 66: Invariance issues
- Scale and rotation
- Occlusion
- Translation
- View point (in theory)
  - Codewords: detector and descriptor
  - Theme distributions: different view points
Fergus, Fei-Fei, Perona & Zisserman, 2005
Slide 67: Model properties
- Intuitive
  - Analogy to documents
Slide 68: Model properties
- Intuitive
  - Analogy to documents
  - Analogy to human vision
Olshausen and Field, 2004; Fei-Fei and Perona, 2005
Slide 69: Model properties
- Intuitive
- Generative models
  - Convenient for weakly- or un-supervised, incremental training
  - Prior information
  - Flexibility (e.g. HDP)
Sivic, Russell, Efros, Freeman & Zisserman, 2005; Li, Wang & Fei-Fei, CVPR 2007
Slide 70: Model properties
- Intuitive
- Generative models
- Discriminative method
  - Computationally efficient
Grauman et al. CVPR 2005
Slide 71: Model properties
- Intuitive
- Generative models
- Discriminative method
- Learning and recognition relatively fast
  - Compared to other methods
Slide 72: Weakness of the model
- No rigorous geometric information about the object components
  - It's intuitive to most of us that objects are made of parts, yet the model carries no such information
- Not extensively tested yet for
  - View point invariance
  - Scale invariance
- Segmentation and localization unclear
Slide 73: Model: Parts and Structure
Slide 74: Representation
- Object as set of parts
  - Generative representation
- Model
  - Relative locations between parts
  - Appearance of part
- Issues
  - How to model location
  - How to represent appearance
  - Sparse or dense (pixels or regions)
  - How to handle occlusion/clutter
Figure from Fischler & Elschlager '73
Slide 75: The correspondence problem
- Model with P parts
- Image with N possible assignments for each part
- Consider the mapping to be 1-1 (see the counting argument below)
[figure: model parts matched against image features]
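The counting argument: with N candidate features and P parts, a 1-1 assignment can be chosen in

    N(N-1)\cdots(N-P+1) = \frac{N!}{(N-P)!} \approx N^P

ways, so brute-force correspondence search is exponential in the number of parts; the connectivity structures discussed below exist precisely to cut this cost.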
Slide 76: The correspondence problem
- 1-1 mapping
  - Each part assigned to a unique feature
- As opposed to:
  - 1-many
    - Bag-of-words approaches
    - Sudderth, Torralba & Freeman '05
    - Loeff, Sorokin, Arora and Forsyth '05
  - Conditional Random Field
    - Quattoni, Collins and Darrell '04
Slide 77: History of Parts and Structure approaches
- Fischler & Elschlager 1973
- Yuille '91
- Brunelli & Poggio '93
- Lades, v.d. Malsburg et al. '93
- Cootes, Lanitis, Taylor et al. '95
- Amit & Geman '95, '99
- Perona et al. '95, '96, '98, '00, '03, '04, '05
- Felzenszwalb & Huttenlocher '00, '04, '08
- Crandall & Huttenlocher '05, '06
- Leibe & Schiele '03, '04
- Many papers since 2000
Slide 78: Sparse representation
Pros:
- Computationally tractable (10^5 pixels -> 10^1 to 10^2 parts)
- Generative representation of class
- Avoids modeling global variability
- Success in specific object recognition
Cons:
- Throws away most image information
- Parts need to be distinctive to separate from other classes
Slide 79: Connectivity of parts
- Complexity is given by the size of the maximal clique in the graph
- Consider a 3-part model
  - Each part has a set of N possible locations in the image
  - The locations of parts 2 and 3 are independent, given the location of L
  - Each part has an appearance term, independent between parts
[factor graph: variables L, 2, 3; appearance factors A(L), A(2), A(3); shape factors S(L), S(L,2), S(L,3)]
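Written out, the independencies in this factor graph give a star model whose best configuration is

    \max_{L,2,3} A(L)\,A(2)\,A(3)\,S(L,2)\,S(L,3)
      = \max_L A(L)\,\Big[\max_2 A(2)\,S(L,2)\Big]\,\Big[\max_3 A(3)\,S(L,3)\Big],

so parts 2 and 3 can each be maximized separately for every candidate location of L: O(N^2) work instead of the O(N^3) of exhaustive search over all triples.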
Slide 80: from "Sparse Flexible Models of Local Features", Gustavo Carneiro and David Lowe, ECCV 2006
[figure: different connectivity structures and their matching complexities]
- Felzenszwalb & Huttenlocher '00: O(N^2)
- Fergus et al. '03; Fei-Fei et al. '03: O(N^6)
- Crandall et al. '05; Fergus et al. '05: O(N^2)
- Crandall et al. '05: O(N^3)
- Csurka '04; Vasconcelos '00
- Bouchard & Triggs '05
- Carneiro & Lowe '06
Slide 81: How much does shape help?
- Crandall, Felzenszwalb, Huttenlocher CVPR '05
- Shape variance increases with increasing model complexity
- Do get some benefit from shape
Slide 82: Some class-specific graphs
- Articulated motion
  - People
  - Animals
- Special parameterisations
  - Limb angles
Images from Kumar, Torr and Zisserman '05; Felzenszwalb & Huttenlocher '05
Slide 83: Hierarchical representations
- Pixels -> pixel groupings -> parts -> object
- Multi-scale approach increases the number of low-level features
- Amit and Geman '98
- Bouchard & Triggs '05
- Felzenszwalb, McAllester & Ramanan '08
Images from Amit '98, Bouchard '05, Felzenszwalb '08
Slide 84: Stochastic Grammar of Images, S.C. Zhu et al. and D. Mumford
Slide 85: Context and Hierarchy in a Probabilistic Image Model, Jin & Geman (2006)
[hierarchy figure, coarse to fine levels:
- e.g. animals, trees, rocks
- e.g. contours, intermediate objects
- e.g. linelets, curvelets, T-junctions
- e.g. discontinuities, gradient
annotation: "animal head" instantiated by "tiger head"]
Slide 86: How to model location?
- Explicit: probability density functions
- Implicit: voting scheme
- Invariance
  - Translation
  - Scaling
  - Similarity/affine
  - Viewpoint
Slide 87: Explicit shape model
- Cartesian
  - E.g. Gaussian distribution
  - Parameters of model: mu and Sigma
  - Independence corresponds to zeros in Sigma
  - Burl et al. '96; Weber et al. '00; Fergus et al. '03
- Polar
  - Convenient for invariance to rotation
  - Mikolajczyk et al., CVPR '06
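For the Cartesian case, stacking the P part locations into X = (x_1, ..., x_P), the shape density is a joint Gaussian

    p(X) = \mathcal{N}(X \mid \mu, \Sigma),

where \mu is the mean configuration and zero entries in \Sigma encode independence between pairs of parts.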
Slide 88: Implicit shape model
- Use Hough-space voting to find the object
- Leibe and Schiele '03, '05
Learning:
- Learn appearance codebook
  - Cluster over interest points on training images
- Learn spatial distributions
  - Match codebook to training images
  - Record matching positions on the object
  - Centroid is given
Recognition: match interest points to the codebook and vote for the centroid
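A minimal sketch of the voting step at recognition time (the data structures and names here are illustrative; a real implementation of Leibe & Schiele also handles scale and searches a continuous vote space):

    import numpy as np
    from collections import defaultdict

    def hough_vote(matches, offsets, bin_size=10):
        # matches: list of (x, y, word_id) interest points matched to the codebook
        # offsets: dict word_id -> list of (dx, dy) displacements to the object
        #          centroid, recorded on the training images
        votes = defaultdict(float)
        for x, y, w in matches:
            occ = offsets.get(w, [])
            for dx, dy in occ:
                cx, cy = x + dx, y + dy                 # predicted centroid
                key = (int(cx // bin_size), int(cy // bin_size))
                votes[key] += 1.0 / len(occ)            # spread each point's vote
        # maxima of the vote map are object hypotheses; return the strongest bin
        return max(votes, key=votes.get) if votes else None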
Slide 89: Multiple view points
- Thomas, Ferrari, Leibe, Tuytelaars, Schiele, and L. Van Gool. "Towards Multi-View Object Class Detection", CVPR '06
- Hoiem, Rother, Winn. "3D LayoutCRF for Multi-View Object Class Recognition and Segmentation", CVPR '07
Slide 90: Appearance representation
- Lepetit and Fua, CVPR 2005
Figure from Winn & Shotton, CVPR '06
Slide 91: Region operators
- Local maxima of an interest-operator function
- Can give scale/orientation invariance
Figures from Kadir, Zisserman and Brady '04
Slide 92: Occlusion
- Explicit
  - Additional match of each part to a missing state
- Implicit
  - Truncated minimum probability of appearance
[figure: log probability vs. appearance space, peaked at mu_part and truncated at a minimum value]
Slide 93: Background clutter
- Explicit model
  - Generative model for clutter as well as the foreground object
- Use a sub-window
  - At the correct position, no clutter is present
Slide 94: Efficient search methods
- Interpretation tree (Grimson '87)
  - Condition on assigned parts to give search regions for the remaining ones
  - Branch & bound, A*
Slide 95: Distance transforms
- Felzenszwalb and Huttenlocher '00, '05
- Distance transforms reduce matching from O(N^2 P) to O(N P) for tree-structured models
- Potentials must take a certain form, e.g. Gaussian
- Permits exhaustive search for each part's location
- No need for feature detectors in recognition
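The workhorse is the 1-D generalized squared-distance transform, D(p) = min_q [(p - q)^2 + f(q)], computed in O(n) via the lower envelope of parabolas; for a tree model it lets each part's message to its parent be computed over all N locations at once, giving the O(NP) total. A Python sketch of the Felzenszwalb-Huttenlocher algorithm:

    import numpy as np

    def dt1d(f):
        # D(p) = min_q (p - q)^2 + f(q), computed in O(n)
        n = len(f)
        v = np.zeros(n, dtype=int)          # parabola roots in the lower envelope
        z = np.empty(n + 1)                 # boundaries between parabolas
        z[0], z[1] = -np.inf, np.inf
        k = 0
        for q in range(1, n):
            # intersection of the parabola from q with the current rightmost one
            s = ((f[q] + q * q) - (f[v[k]] + v[k] * v[k])) / (2 * q - 2 * v[k])
            while s <= z[k]:
                k -= 1                      # the new parabola hides the old one
                s = ((f[q] + q * q) - (f[v[k]] + v[k] * v[k])) / (2 * q - 2 * v[k])
            k += 1
            v[k] = q
            z[k], z[k + 1] = s, np.inf
        d = np.empty(n)
        k = 0
        for p in range(n):                  # read off the envelope
            while z[k + 1] < p:
                k += 1
            d[p] = (p - v[k]) ** 2 + f[v[k]]
        return d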
Slide 96: Demo Web Page
Slide 97: Parts and Structure models: Summary
- Correspondence problem
  - Efficient methods for large numbers of parts and positions in the image
- Challenge to get a representation with the desired invariance
- Future directions
  - Multiple views
  - Approaches to learning
  - Multiple category training