Lecture 9: Bag of Words models

Transcript and Presenter's Notes
1
Lecture 9 Bag of Words models
2
Admin
  • Assignment 2 due
  • Assignment 3 out shortly

3
(No Transcript)
4
Analogy to documents
Of all the sensory impressions proceeding to the
brain, the visual experiences are the dominant
ones. Our perception of the world around us is
based essentially on the messages that reach the
brain from our eyes. For a long time it was
thought that the retinal image was transmitted
point by point to visual centers in the brain;
the cerebral cortex was a movie screen, so to
speak, upon which the image in the eye was
projected. Through the discoveries of Hubel and
Wiesel we now know that behind the origin of the
visual perception in the brain there is a
considerably more complicated course of events.
By following the visual impulses along their path
to the various cell layers of the optical cortex,
Hubel and Wiesel have been able to demonstrate
that the message about the image falling on the
retina undergoes a step-wise analysis in a system
of nerve cells stored in columns. In this system
each cell has its specific function and is
responsible for a specific detail in the pattern
of the retinal image.
5
A clarification: definition of BoW
  • Looser definition
  • Independent features

6
A clarification: definition of BoW
  • Looser definition
  • Independent features
  • Stricter definition
  • Independent features
  • histogram representation

7
(No Transcript)
8
Representation
[Figure: the three-step pipeline: 1. feature detection and representation; 2. codewords dictionary formation; 3. image representation]
9
1. Feature detection and representation
10
1. Feature detection and representation
  • Regular grid
  • Vogel & Schiele, 2003
  • Fei-Fei & Perona, 2005

11
1. Feature detection and representation
  • Regular grid
  • Vogel & Schiele, 2003
  • Fei-Fei & Perona, 2005
  • Interest point detector
  • Csurka, et al. 2004
  • Fei-Fei & Perona, 2005
  • Sivic, et al. 2005

12
1. Feature detection and representation
  • Regular grid
  • Vogel & Schiele, 2003
  • Fei-Fei & Perona, 2005
  • Interest point detector
  • Csurka, Bray, Dance & Fan, 2004
  • Fei-Fei & Perona, 2005
  • Sivic, Russell, Efros, Freeman & Zisserman, 2005
  • Other methods
  • Random sampling (Vidal-Naquet & Ullman, 2002)
  • Segmentation-based patches (Barnard, Duygulu,
    Forsyth, de Freitas, Blei & Jordan, 2003)

13
1. Feature detection and representation
Detect patches [Mikolajczyk and Schmid 02; Matas,
Chum, Urban & Pajdla 02; Sivic & Zisserman 03]
Normalize patch
Compute SIFT descriptor [Lowe 99]
Slide credit Josef Sivic
14
1. Feature detection and representation
15
2. Codewords dictionary formation
16
2. Codewords dictionary formation
Vector quantization
Slide credit Josef Sivic
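A minimal sketch of this vector-quantization step, assuming local descriptors (e.g. 128-d SIFT vectors) have already been extracted from training patches; the function name build_codebook and the choice of scikit-learn's KMeans are illustrative, not part of the original demo:

```python
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(descriptors, n_codewords=200, seed=0):
    """Cluster local descriptors (one per row) into a codeword dictionary.

    descriptors: (num_patches, d) array, e.g. 128-d SIFT vectors.
    Returns the fitted KMeans model; its cluster centers are the codewords.
    """
    kmeans = KMeans(n_clusters=n_codewords, n_init=10, random_state=seed)
    kmeans.fit(descriptors)
    return kmeans

# Example with random stand-ins for real SIFT descriptors:
rng = np.random.default_rng(0)
codebook = build_codebook(rng.normal(size=(10000, 128)), n_codewords=50)
print(codebook.cluster_centers_.shape)  # (50, 128)
```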
17
2. Codewords dictionary formation
Fei-Fei et al. 2005
18
Image patch examples of codewords
Sivic et al. 2005
19
3. Image representation
[Figure: bag-of-words image representation: histogram of codeword frequencies over the dictionary]
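Given such a codebook, the image representation above is just a normalized histogram of codeword assignments; a minimal sketch continuing the KMeans assumption (bow_histogram is an illustrative name):

```python
import numpy as np

def bow_histogram(codebook, image_descriptors):
    """Map one image's patch descriptors to a codeword-frequency histogram."""
    labels = codebook.predict(image_descriptors)          # nearest codeword per patch
    counts = np.bincount(labels, minlength=codebook.n_clusters)
    return counts / max(counts.sum(), 1)                  # normalize to frequencies
```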
20
Representation
[Figure: the three-step pipeline: 1. feature detection and representation; 2. codewords dictionary formation; 3. image representation]
21
Learning and Recognition
category models (and/or) classifiers
22
Learning and Recognition
  • Generative method
  • - graphical models
  • Discriminative method
  • - SVM

category models (and/or) classifiers
23
Two generative models
  • Naïve Bayes classifier
  • Csurka, Bray, Dance & Fan, 2004
  • Hierarchical Bayesian text models (pLSA and LDA)
  • Background: Hofmann 2001; Blei, Ng & Jordan, 2004
  • Object categorization: Sivic et al. 2005;
    Sudderth et al. 2005
  • Natural scene categorization: Fei-Fei et al. 2005

24
First, some notation
  • w_n: each patch in an image, encoded as an indicator
    vector over the dictionary, w_n = [0,0,...,1,...,0,0]^T
  • w: the collection of all N patches in an image,
    w = [w_1, w_2, ..., w_N]
  • d_j: the jth image in an image collection
  • c: category of the image
  • z: theme or topic of the patch

25
Case 1: the Naïve Bayes model
[Graphical model: category c generates each of the N patches w]
Csurka et al. 2004
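The decision rule behind this model, written in the notation of slide 24 (the slide's own equations are images and do not survive in this transcript; this is the standard Naïve Bayes form):

```latex
c^\ast = \arg\max_c \; p(c \mid \mathbf{w})
       \propto \arg\max_c \; p(c)\, p(\mathbf{w} \mid c)
       = \arg\max_c \; p(c) \prod_{n=1}^{N} p(w_n \mid c)
```

Each patch w_n contributes an independent likelihood term, which is exactly the bag-of-words independence assumption.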
26
Csurka et al. 2004
27
Csurka et al. 2004
28
Case 2: Hierarchical Bayesian text models
Probabilistic Latent Semantic Analysis (pLSA)
Hofmann, 2001
Latent Dirichlet Allocation (LDA)
Blei et al., 2001
29
Case 2: Hierarchical Bayesian text models
Probabilistic Latent Semantic Analysis (pLSA)
Sivic et al. ICCV 2005
30
Case 2: Hierarchical Bayesian text models
Latent Dirichlet Allocation (LDA)
Fei-Fei et al. ICCV 2005
31
Case 2: the pLSA model
32
Case 2: the pLSA model
Slide credit Josef Sivic
33
Case 2: recognition using pLSA
Slide credit Josef Sivic
34
Case 2: learning the pLSA parameters
n(w_i, d_j): observed counts of word i in document j
Maximize likelihood of data using EM
M: number of codewords; N: number of images
Slide credit Josef Sivic
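For reference, the standard pLSA forms these slides rely on (Hofmann 2001); the slide's equations are images and are lost in the transcript. With topics z_k, k = 1, ..., K:

```latex
p(w_i \mid d_j) = \sum_{k=1}^{K} p(w_i \mid z_k)\, p(z_k \mid d_j),
\qquad
\mathcal{L} = \prod_{i=1}^{M} \prod_{j=1}^{N} p(w_i \mid d_j)^{\, n(w_i,\, d_j)}
```

EM alternates between computing the posterior p(z_k | w_i, d_j) (E-step) and re-estimating p(w_i | z_k) and p(z_k | d_j) from the resulting expected counts (M-step).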
35
Demo
  • Course website

36
Task: face detection, with no labeling
37
Demo: feature detection
  • Output of a crude feature detector (a sketch follows below)
  • Find edges
  • Draw points randomly from edge set
  • Draw from uniform distribution to get scale
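A rough sketch of such a crude detector, assuming scikit-image for the edge map (the actual demo code is MATLAB; all names and parameter values here are illustrative):

```python
import numpy as np
from skimage.feature import canny

def crude_feature_detector(gray_image, n_points=200,
                           scale_range=(4.0, 32.0), seed=0):
    """Sample keypoints at random edge pixels, with random scales."""
    rng = np.random.default_rng(seed)
    edges = canny(gray_image)                   # boolean edge map
    ys, xs = np.nonzero(edges)                  # edge pixel coordinates
    n = min(n_points, len(xs))
    idx = rng.choice(len(xs), size=n, replace=False)
    scales = rng.uniform(*scale_range, size=n)  # scale drawn uniformly
    return np.stack([xs[idx], ys[idx], scales], axis=1)  # rows of (x, y, scale)
```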

38
Demo: learnt parameters
  • Learning the model: do_plsa(config_file_1)
  • Evaluate and visualize the model:
    do_plsa_evaluation(config_file_1)

Codeword distributions per theme (topic)
Theme distributions per image
39
Demo: recognition examples
40
Demo: categorization results
  • Performance of each theme

41
Learning and Recognition
  • Generative method
  • - graphical models
  • Discriminative method
  • - SVM

category models (and/or) classifiers
42
Discriminative methods based on bag-of-words representation
[Figure: decision boundary in BoW feature space separating zebra from non-zebra images]
43
Discriminative methods based on bag-of-words representation
  • Grauman & Darrell, 2005, 2006
  • SVM w/ Pyramid Match kernels
  • Others
  • Csurka, Bray, Dance & Fan, 2004
  • Serre & Poggio, 2005

44
Summary: Pyramid match kernel
optimal partial matching between sets of features
Grauman & Darrell, 2005. Slide credit Kristen Grauman
45
Pyramid Match (Grauman & Darrell 2005)
Histogram intersection
Slide credit Kristen Grauman
46
Pyramid Match (Grauman & Darrell 2005)
Histogram intersection
Slide credit Kristen Grauman
47
Pyramid match kernel
  • Weights inversely proportional to bin size
  • Normalize kernel values to avoid favoring large
    sets

Slide credit Kristen Grauman
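In symbols (following Grauman & Darrell 2005; the slide's equations are images): let I_i be the histogram intersection of the two feature sets at pyramid level i, whose bins have side length 2^i. Then

```latex
I(H, H') = \sum_{j} \min\big(H(j),\, H'(j)\big),
\qquad
K_\Delta = \sum_{i=0}^{L} \frac{1}{2^i} \big(I_i - I_{i-1}\big),
\quad I_{-1} \equiv 0
```

The 1/2^i weight is inversely proportional to bin size, and I_i - I_{i-1} counts matches that first appear at level i; normalizing by each set's self-match score keeps large sets from being favored.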
48
Example pyramid match
Level 0
Slide credit Kristen Grauman
49
Example pyramid match
Level 1
Slide credit Kristen Grauman
50
Example pyramid match
Level 2
Slide credit Kristen Grauman
51
Example pyramid match
[Figure: pyramid match compared against the optimal match]
Slide credit Kristen Grauman
52
Summary: Pyramid match kernel
optimal partial matching between sets of features, approximated
level by level: each term counts the number of new matches at
level i, weighted by the difficulty of a match at level i
Slide credit Kristen Grauman
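A toy sketch of the computation for scalar features (real use is d-dimensional, with shifted grids and normalization; parameter names are illustrative):

```python
import numpy as np

def pyramid_match(x, y, n_levels=5, grid_max=16.0):
    """Unnormalized pyramid match kernel between two 1-D feature sets.

    x, y: arrays of scalar features in [0, grid_max).
    Level i uses bins of width 2**i; new matches get weight 1/2**i.
    """
    k, prev_inter = 0.0, 0.0
    for i in range(n_levels):
        width = 2.0 ** i
        bins = np.arange(0.0, grid_max + width, width)
        hx, _ = np.histogram(x, bins=bins)
        hy, _ = np.histogram(y, bins=bins)
        inter = np.minimum(hx, hy).sum()        # histogram intersection I_i
        k += (inter - prev_inter) / (2.0 ** i)  # new matches, weighted
        prev_inter = inter
    return k

print(pyramid_match(np.array([1.0, 5.0, 12.0]),
                    np.array([1.3, 9.0, 12.2])))
```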
53
Object recognition results
  • ETH-80 database: 8 object classes
  • (Eichhorn and Chapelle 2004)
  • Features:
  • Harris detector
  • PCA-SIFT descriptor, d = 10

Slide credit Kristen Grauman
54
Object recognition results
  • Caltech objects database: 101 object classes
  • Features:
  • SIFT detector
  • PCA-SIFT descriptor, d = 10
  • 30 training images / class
  • 43% recognition rate
  • (chance performance: 1%)
  • 0.002 seconds per match

Slide credit Kristen Grauman
55
(No Transcript)
56
What about spatial info?
57
What about spatial info?
  • Feature level
  • Spatial influence through correlogram features:
    Savarese, Winn and Criminisi, CVPR 2006

58
What about spatial info?
  • Feature level
  • Generative models
  • Sudderth, Torralba, Freeman & Willsky, 2005, 2006
  • Niebles & Fei-Fei, CVPR 2007

59
What about spatial info?
  • Feature level
  • Generative models
  • Sudderth, Torralba, Freeman & Willsky, 2005, 2006
  • Niebles & Fei-Fei, CVPR 2007

60
3D scene models
Sudderth, Torralba, Freeman & Willsky, CVPR 06
[Figure: object locations; object parts; visual words
belonging to each object part; projection of the scene
onto the image plane]
61
Lazebnik, Schmid & Ponce, 2006
Slide credit S. Lazebnik
62
Slide credit S. Lazebnik
63
Invariance issues
  • Scale and rotation
  • Implicit
  • Detectors and descriptors

Kadir and Brady, 2003
64
Invariance issues
  • Scale and rotation
  • Occlusion
  • Implicit in the models
  • Codeword distribution: small variations
  • (In theory) theme (z) distribution: different
    occlusion patterns

65
Invariance issues
  • Scale and rotation
  • Occlusion
  • Translation
  • Encode (relative) location information
  • Sudderth, Torralba, Freeman & Willsky, 2005, 2006
  • Niebles & Fei-Fei, 2007

66
Invariance issues
  • Scale and rotation
  • Occlusion
  • Translation
  • View point (in theory)
  • Codewords: detector and descriptor
  • Theme distributions: different view points

Fergus, Fei-Fei, Perona & Zisserman, 2005
67
Model properties
  • Intuitive
  • Analogy to documents

68
Model properties
  • Intuitive
  • Analogy to documents
  • Analogy to human vision

Olshausen and Field, 2004, Fei-Fei and Perona,
2005
69
Model properties
Sivic, Russell, Efros, Freeman & Zisserman, 2005
  • Intuitive
  • Generative models
  • Convenient for weakly- or un-supervised,
    incremental training
  • Prior information
  • Flexibility (e.g. HDP)

Li, Wang & Fei-Fei, CVPR 2007
70
Model properties
  • Intuitive
  • Generative models
  • Discriminative method
  • Computationally efficient

Grauman et al. CVPR 2005
71
Model properties
  • Intuitive
  • Generative models
  • Discriminative method
  • Learning and recognition relatively fast
  • Compared to other methods

72
Weakness of the model
  • No rigorous geometric information of the object
    components
  • It's intuitive to most of us that objects are
    made of parts, yet the model encodes no such information
  • Not extensively tested yet for
  • View point invariance
  • Scale invariance
  • Segmentation and localization unclear
73
Model: Parts and Structure
74
Representation
  • Object as set of parts
  • Generative representation
  • Model
  • Relative locations between parts
  • Appearance of part
  • Issues
  • How to model location
  • How to represent appearance
  • Sparse or dense (pixels or regions)
  • How to handle occlusion/clutter

Figure from Fischler & Elschlager 73
75
The correspondence problem
  • Model with P parts
  • Image with N possible assignments for each part
  • Consider the mapping to be 1-1

[Figure: candidate assignments between model parts and image features]
  • N^P combinations! (e.g. P = 5 parts with N = 100
    candidates each gives 100^5 = 10^10 assignments)

76
The correspondence problem
  • 1-1 mapping
  • Each part assigned to a unique feature
  • As opposed to:
  • 1-Many
  • Bag of words approaches
  • Sudderth, Torralba & Freeman 05
  • Loeff, Sorokin, Arora and Forsyth 05
  • Conditional Random Field
  • - Quattoni, Collins and Darrell, 04

77
History of Parts and Structure approaches
  • Fischler & Elschlager 1973
  • Yuille 91
  • Brunelli & Poggio 93
  • Lades, v.d. Malsburg et al. 93
  • Cootes, Lanitis, Taylor et al. 95
  • Amit & Geman 95, 99
  • Perona et al. 95, 96, 98, 00, 03, 04, 05
  • Felzenszwalb & Huttenlocher 00, 04, 08
  • Crandall & Huttenlocher 05, 06
  • Leibe & Schiele 03, 04
  • Many papers since 2000

78
Sparse representation
  + Computationally tractable (10^5 pixels → 10^1 - 10^2 parts)
  + Generative representation of class
  + Avoids modeling global variability
  + Success in specific object recognition
  - Throws away most image information
  - Parts need to be distinctive to separate from other classes
79
Connectivity of parts
  • Complexity is given by the size of the maximal
    clique in the graph
  • Consider a 3-part model
  • Each part has a set of N possible locations in the
    image
  • Locations of parts 2 and 3 are independent, given
    the location of landmark part L
  • Each part has an appearance term, independent
    between parts (see the factorization below)

Shape model and factor graph
[Figure: factor graph over variables L, 2, 3 with shape
factors S(L), S(L,2), S(L,3) and appearance factors
A(L), A(2), A(3)]
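Written out, the factor graph above corresponds to the star-shaped factorization (notation follows the factors named on the slide):

```latex
p(x_L, x_2, x_3) \propto
\underbrace{A(x_L)\, A(x_2)\, A(x_3)}_{\text{appearance}} \;
\underbrace{S(x_L)\, S(x_L, x_2)\, S(x_L, x_3)}_{\text{shape}}
```

Because parts 2 and 3 decouple given x_L, the best configuration is found in O(N^2): for each of the N candidate locations of L, maximize over parts 2 and 3 independently.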
80
Different connectivity structures
(from Sparse Flexible Models of Local Features,
Gustavo Carneiro and David Lowe, ECCV 2006)
  • Tree (Felzenszwalb & Huttenlocher 00): O(N^2)
  • Fully connected (Fergus et al. 03; Fei-Fei et al. 03): O(N^6)
  • Star (Crandall et al. 05; Fergus et al. 05): O(N^2)
  • k-fan (Crandall et al. 05): O(N^3)
  • Bag of features (Csurka 04; Vasconcelos 00)
  • Hierarchy (Bouchard & Triggs 05)
  • Sparse flexible model (Carneiro & Lowe 06)
81
How much does shape help?
  • Crandall, Felzenszwalb & Huttenlocher, CVPR 05
  • Shape variance increases with increasing model
    complexity
  • Do get some benefit from shape

82
Some class-specific graphs
  • Articulated motion
  • People
  • Animals
  • Special parameterisations
  • Limb angles

Images from Kumar, Torr and Zisserman 05;
Felzenszwalb & Huttenlocher 05
83
Hierarchical representations
  • Pixels → Pixel groupings → Parts → Object
  • Multi-scale approach increases the number of
    low-level features
  • Amit and Geman 98
  • Bouchard & Triggs 05
  • Felzenszwalb, McAllester & Ramanan 08

Images from Amit 98, Bouchard 05, Felzenszwalb 08
84
Stochastic Grammar of Images,
S.-C. Zhu and D. Mumford
85
Context and Hierarchy in a Probabilistic Image Model,
Jin & Geman (2006)
e.g. animals, trees, rocks
e.g. contours, intermediate objects
e.g. linelets, curvelets, T-junctions
e.g. discontinuities, gradient
animal head instantiated by tiger head
86
How to model location?
  • Explicit: probability density functions
  • Implicit: voting scheme
  • Invariance
  • Translation
  • Scaling
  • Similarity/affine
  • Viewpoint

87
Explicit shape model
  • Cartesian
  • E.g. Gaussian distribution
  • Parameters of model: μ and Σ
  • Independence corresponds to zeros in Σ
  • Burl et al. 96, Weber et al. 00, Fergus et al. 03
  • Polar
  • Convenient for invariance to rotation

Mikolajczyk et al., CVPR 06
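Concretely, the Cartesian case places a joint Gaussian over the stacked part locations (the μ and Σ above; the standard form used in the cited constellation models):

```latex
\mathbf{x} = (x_1, \dots, x_P) \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma}),
\qquad
\Sigma_{pq} = 0 \iff x_p \text{ and } x_q \text{ are independent}
```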
88
Implicit shape model
  • Use Hough space voting to find the object
  • Leibe and Schiele 03, 05

Learning
  • Learn appearance codebook
  • Cluster over interest points on training images
  • Learn spatial distributions
  • Match codebook to training images
  • Record matching positions on the object
  • Centroid is given

Recognition
[Figure: recognition starts from interest points, matches
them to the codebook, and casts probabilistic votes]
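A compressed sketch of the voting step (illustrative names; the actual Leibe and Schiele system uses soft codebook activations and mean-shift mode finding rather than a discrete accumulator):

```python
import numpy as np

def hough_vote(matches, offsets, image_shape):
    """Accumulate object-center votes from matched codebook entries.

    matches:  iterable of (x, y, codeword_id) interest-point matches
    offsets:  dict codeword_id -> list of (dx, dy) center offsets
              recorded on the training images
    """
    acc = np.zeros(image_shape)                    # Hough accumulator
    for x, y, cw in matches:
        entries = offsets.get(cw, [])
        for dx, dy in entries:
            cx, cy = int(x + dx), int(y + dy)      # predicted object center
            if 0 <= cy < image_shape[0] and 0 <= cx < image_shape[1]:
                acc[cy, cx] += 1.0 / len(entries)  # each codeword votes once in total
    return acc  # local maxima are object-center hypotheses
```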
89
Multiple view points
Thomas, Ferrari, Leibe, Tuytelaars, Schiele, and
L. Van Gool. Towards Multi-View Object Class
Detection, CVPR 06
Hoiem, Rother, Winn, 3D LayoutCRF for Multi-View
Object Class Recognition and Segmentation, CVPR
07
90
Appearance representation
  • SIFT
  • Decision trees

Lepetit and Fua CVPR 2005
  • HoG detectors

Figure from Winn & Shotton, CVPR 06
91
Region operators
  • Local maxima of interest operator function
  • Can give scale/orientation invariance

Figures from Kadir, Zisserman and Brady 04
92
Occlusion
  • Explicit
  • Additional match of each part to a missing state
  • Implicit
  • Truncated minimum probability of appearance

[Figure: log probability of appearance over appearance
space, truncated to a floor value around μ_part]
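One plausible reading of the implicit scheme, inferred from the figure's surviving axis labels (an assumption, not stated on the slide): the log appearance probability is floored at a constant τ so that a single occluded part cannot veto an otherwise good match,

```latex
\log \tilde{p}(a \mid \text{part}) = \max\big(\log p(a \mid \text{part}),\; \tau\big)
```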
93
Background clutter
  • Explicit model
  • Generative model for clutter as well as
    foreground object
  • Use a sub-window
  • At correct position, no clutter is present

94
Efficient search methods
  • Interpretation tree (Grimson 87)
  • Condition on assigned parts to give search
    regions for remaining ones
  • Branch & bound, A*

95
Distance transforms
  • Felzenszwalb and Huttenlocher 00, 05
  • Distance transforms
  • O(N^2 P) → O(N P) for tree-structured models
  • Potentials must take a certain form, e.g. Gaussian
  • Permits exhaustive search over each part's
    location
  • No need for feature detectors in recognition
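The energy being minimized has the standard pictorial-structure form below; when the pairwise term d_pq is (squared) distance-like, the inner minimization over each child's location l_q is a generalized distance transform, computable in O(N) rather than O(N^2), giving O(NP) overall on a tree:

```latex
E(l_1, \dots, l_P) = \sum_{p} m_p(l_p) + \sum_{(p,q) \in T} d_{pq}(l_p - l_q),
\qquad
D_q(l_p) = \min_{l_q} \big( m_q(l_q) + d_{pq}(l_p - l_q) \big)
```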

96
Demo Web Page
97
Parts and Structure models: Summary
  • Correspondence problem
  • Efficient methods for large numbers of parts and
    positions in the image
  • Challenge to get a representation with the desired
    invariance
  • Future directions:
  • Multiple views
  • Approaches to learning
  • Multiple category training