Title: Probabilistic Graphical Models for Scene and Object Recognition
1. Probabilistic Graphical Models for Scene and Object Recognition
- Kevin Murphy, MIT CSAIL (Computer Science and Artificial Intelligence Laboratory)
2. Goal: build a learning machine
- How can a machine learn a model of the world?
- How can it use this model to act?
(Diagram: Model ↔ World)
4. Goal: build a learning machine
- How can we train a machine to estimate the hidden state of the world from noisy data?
(Diagram: Observations → Estimate of hidden state)
5. A trainable integrated vision system
- Object detection
- Scene classification
- Place recognition
Joint work with Antonio Torralba and Bill Freeman
6. What is this?
7. Temporal context
8. What is this?
9-11. Global scene context
Torralba, IJCV 2003
12. The need for contextual reasoning
- Local evidence is often ambiguous
- Must fuse multiple sources of information
- (Not just for computer vision)
13. The need for probabilistic reasoning
- Use probabilistic models
  - "Probability theory is nothing but common sense reduced to calculation." (Pierre-Simon Laplace)
- Use probabilistic graphical models
  - "Graphical models provide a natural tool for dealing with two problems that occur throughout applied mathematics and engineering: uncertainty and complexity." (Michael I. Jordan)
14-15. Outline
1. Probabilistic graphical models
2. Place / scene recognition
16. Probabilistic graphical models
- A family of probabilistic models defined by a graph
- Undirected: Markov random fields (MRFs)
- Directed: Bayesian networks
17. Bayesian networks
- Qualitative part: a directed acyclic graph (DAG)
  - Nodes = random variables
  - Edges = direct influence
- Quantitative part: conditional probability distributions (CPDs), P(Xi | XPa(i))
- Together, these define the joint probability distribution in factored form
(Example network: Earthquake and Burglary → Alarm; Earthquake → Radio)
Pearl, 1988
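The factored joint of slide 17 can be made concrete with a small sketch. The CPT numbers below are invented for illustration (the slide gives only the graph structure), and the check confirms that the factorization P(E) P(B) P(A|E,B) P(R|E) defines a valid joint distribution:

```python
from itertools import product

# Hypothetical CPTs for the classic alarm network (numbers are made up
# for illustration; the slide only gives the graph).
P_E = {True: 0.01, False: 0.99}                  # P(Earthquake)
P_B = {True: 0.02, False: 0.98}                  # P(Burglary)
P_A = {(True, True): 0.95, (True, False): 0.3,   # P(Alarm=1 | E, B)
       (False, True): 0.9, (False, False): 0.01}
P_R = {True: 0.8, False: 0.001}                  # P(Radio=1 | E)

def joint(e, b, a, r):
    """Joint probability in factored form:
    P(E, B, A, R) = P(E) P(B) P(A | E, B) P(R | E)."""
    pa = P_A[(e, b)] if a else 1 - P_A[(e, b)]
    pr = P_R[e] if r else 1 - P_R[e]
    return P_E[e] * P_B[b] * pa * pr

# Sanity check: the factored joint sums to 1 over all 16 assignments.
total = sum(joint(e, b, a, r)
            for e, b, a, r in product([True, False], repeat=4))
print(round(total, 10))  # 1.0
```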
18. Applications of PGMs
- State estimation: P(H | v)
- Speech recognition (HMMs)
- Computational biology
- Error-correcting codes
- Medical and fault diagnosis
- Computer vision
19. Outline
1. Probabilistic graphical models
2. Place / scene recognition
Torralba, Murphy, Freeman, Rubin, ICCV 2003
20. Scene classification
- Office
- Corridor
- Street
21. Place recognition
- Office 610
- Office 615
- Draper street
- 59 other places
22. Wearable test-bed v1
23. Heads-up display
(Annotated view: Office (610), bookshelf, screen, desk)
24. Wearable test-bed v2
Antonio Torralba
25. Why a wearable test-bed?
- Aid for the visually impaired
  - Blindsight Corp.
- Proxy for a mobile robot
  - Avoids the control problem
  - Easy to use indoors and outdoors
- Challenging, but realistic, test conditions for a vision/learning system
26. Global image features: the "gist" of the scene
- Average filter outputs at multiple scales, orientations and locations
- Dimensionality reduction via PCA (from 384 to 80 dimensions)
Oliva and Torralba, 2001
27. Example visual gists
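As a sketch of the PCA step (384 → 80 dimensions), with random data standing in for the pooled filter responses:

```python
import numpy as np

# Reduce 384 pooled filter responses to an 80-dimensional gist vector.
# Data here is random; in the real system the rows would be the pooled
# filter outputs, one row per image.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 384))          # 500 images x 384 features

Xc = X - X.mean(axis=0)                      # center the data
# Principal directions = top right singular vectors of the centered
# data matrix (equivalently, top eigenvectors of the covariance).
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
W = Vt[:80]                                  # first 80 principal components
gist = Xc @ W.T                              # projected gist vectors

print(gist.shape)  # (500, 80)
```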
28. Gist classifier
- Mixture of Gaussians (MoG) over the gist, conditioned on location
- Learn with EM
(Graphical model: Location → gist)
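A minimal sketch of the gist classifier, simplified to one Gaussian per place (the talk fits a multi-component mixture with EM; a single component admits the closed-form fit used here). All data below is synthetic:

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy gist vectors for three places (stand-ins for real 80-d gists).
classes = ["office", "corridor", "street"]
centers = {"office": 0.0, "corridor": 3.0, "street": 6.0}
train = {c: rng.normal(centers[c], 1.0, size=(200, 5)) for c in classes}

# Fit one diagonal Gaussian per class (a 1-component "mixture").
params = {c: (X.mean(0), X.var(0) + 1e-6) for c, X in train.items()}

def log_lik(x, mu, var):
    # Diagonal-Gaussian log-likelihood.
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

def classify(x):
    # Uniform class prior, so argmax of the class-conditional likelihood.
    return max(classes, key=lambda c: log_lik(x, *params[c]))

print(classify(np.full(5, 6.1)))  # street
```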
29. Temporal context helps
30. Temporal classifier
- Hidden Markov model (HMM)
(Chain: L_{t-1} → L_t, with each place L_t emitting the gist v^g_t)
31. Observation model
- P(v^g_t | L_t): mixture of Gaussians
32. Transition matrix = topological map
- P(L_t | L_{t-1}): learn by counting observed transitions
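Learning the transition matrix by counting, as slide 32 describes, can be sketched as follows (the place sequence is a toy stand-in, and the add-one smoothing is an assumption, not something the slide specifies):

```python
import numpy as np

# Learn P(L_t | L_{t-1}) by counting observed place transitions.
places = ["office610", "corridor", "office615"]
idx = {p: i for i, p in enumerate(places)}
seq = ["office610", "office610", "corridor", "corridor",
       "office615", "corridor", "office610"]

counts = np.zeros((3, 3))
for prev, cur in zip(seq[:-1], seq[1:]):
    counts[idx[prev], idx[cur]] += 1

# Add-one smoothing so unseen transitions keep a little mass, then
# normalise each row into a stochastic matrix (a topological map).
A = (counts + 1) / (counts + 1).sum(axis=1, keepdims=True)
print(np.allclose(A.sum(axis=1), 1.0))  # True
```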
33. Place recognition over time
34. Performance in a novel environment
35. Place and scene recognition
- Factorial HMM
(Two coupled chains: scene-type S_t and place L_t, jointly emitting the gist v^g_t)
36. Performance in a novel environment
- Place
- Scene-type
37. Indoor/outdoor classification
- Place
- Scene-type
- Indoor/outdoor
38. Place/scene recognition demo
39. ER1 mobile robot test-bed
Roth, Murphy, Kaelbling, in progress
40. HMM beats MoG
(Results for office, corridor and street sequences)
41. Generative vs discriminative
- Generative (models P(v^g | S)): mixtures of Gaussians
- Discriminative (models P(S | v^g)): neural net, SVM, boosted decision stumps
(Graphical models: S → v^g vs v^g → S)
42. Generative vs discriminative
(Results for office, corridor and street)
- Baseline
- MoG
- Discriminative
- HMM
43. Discriminative temporal classifier v1
- c.f. input-output HMM (Bengio and Frasconi, 1996)
(Chain: S_{t-1} → S_t, with v^g_t as an input to S_t)
44. Label-bias problem
- Backwards information is blocked by a hidden child
McCallum, Freitag, Pereira, 2000
45. Discriminative temporal classifier v2
- Conditional random field (CRF)
Lafferty, McCallum, Pereira, 2001
46. CRF beats HMM
(Results for office, corridor and street)
- Baseline
- MoG
- Boosted stumps
- HMM
- CRF
47-51. 4 kinds of PGM
52. Outline
1. Probabilistic graphical models
2. Place / scene recognition
3. Space-efficient learning
Binder, Murphy, Russell, IJCAI 1997
53. Parameter estimation in CRFs
54. Potential functions
55. Partition function
(Equations defined over the chain S_{t-1}, S_t with observations v^g_t)
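Slides 53-55 show equations that did not survive transcription. A chain CRF of the kind described here (scene labels S_t, gist observations v^g_t) is standardly defined by transition potentials ψ, observation potentials φ, and partition function Z; this is a reconstruction of the textbook form, since the slides' exact parameterization is not recoverable:

```latex
p(S_{1:T} \mid v^g_{1:T})
  = \frac{1}{Z(v^g_{1:T})}
    \prod_{t=2}^{T} \psi(S_{t-1}, S_t)
    \prod_{t=1}^{T} \phi(S_t, v^g_t),
\qquad
Z(v^g_{1:T})
  = \sum_{S_{1:T}} \prod_{t=2}^{T} \psi(S_{t-1}, S_t)
    \prod_{t=1}^{T} \phi(S_t, v^g_t)
```

The partition function Z normalizes over all label sequences, which is what makes parameter estimation non-trivial.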
56-57. Log-linear observation model
- Features: output of the boosted scene-type classifier
58-59. Log-linear transition model
- Features: indicator functions
60. Parameter estimation in CRFs
- Estimate the weights w in the potentials ψ, φ
- Use (generalized) iterative scaling: slow
- Use (conjugate) gradient ascent on the log-likelihood function: faster
- Both algorithms find the globally optimal w
  - No missing data (supervised learning)
  - Convex loss function
61-65. Gradient of log-likelihood
- Observed number of state transitions minus the expected number of transitions
- Need to compute marginals and pairwise marginals
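The gradient shown on slides 61-65 reduces to the familiar "observed minus expected counts" form; for a log-linear transition weight w_{ij} this is:

```latex
\frac{\partial \ell}{\partial w_{ij}}
  = \sum_{t} \Big[ \mathbb{1}(s_{t-1} = i,\; s_t = j)
      - p(S_{t-1} = i,\; S_t = j \mid v^g_{1:T}) \Big]
```

The first term is the observed number of i → j transitions in the training labels; the second is the expected number under the current model, which is why the pairwise marginals must be computed.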
66. Belief propagation for chains
- Forwards: compute messages α_t
- Backwards: compute messages β_t
- Combine: beliefs b_t ∝ α_t β_t
(Example chain S_1, ..., S_24 with messages α_t and beliefs b_t at each node)
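The forwards-backwards pass of slide 66 in a few lines of numpy (each message is normalized for numerical stability, which leaves the combined beliefs unchanged); the transition matrix and observation likelihoods below are illustrative:

```python
import numpy as np

def forwards_backwards(A, pi, B):
    """Belief propagation on a chain (forwards-backwards).
    A: (S,S) transition matrix, pi: (S,) prior,
    B: (T,S) observation likelihoods p(v_t | S_t = s).
    Returns smoothed beliefs b_t(s) = p(S_t = s | v_{1:T})."""
    T, S = B.shape
    alpha = np.zeros((T, S))
    beta = np.ones((T, S))
    alpha[0] = pi * B[0]
    alpha[0] /= alpha[0].sum()               # normalise for stability
    for t in range(1, T):                    # forwards pass
        alpha[t] = (alpha[t - 1] @ A) * B[t]
        alpha[t] /= alpha[t].sum()
    for t in range(T - 2, -1, -1):           # backwards pass
        beta[t] = A @ (B[t + 1] * beta[t + 1])
        beta[t] /= beta[t].sum()
    b = alpha * beta                         # combine
    return b / b.sum(axis=1, keepdims=True)

# Toy 2-state example (numbers are illustrative).
A = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.5, 0.5])
B = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]])
b = forwards_backwards(A, pi, B)
print(b.shape)  # (3, 2)
```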
67. Inference complexity
- Time: O(S^2 T), one matrix-vector multiply per time-step
- Space: O(S T), store α_t for t = 1..T until the backwards pass
68. Learning complexity
- Time: O(N S^2 T), for N iterations (calls to forwards-backwards)
- Space: O(S T), store α_t for t = 1..T until the backwards pass
- But the sufficient statistics have size only O(S^2)!
69. Running out of space
- S can be large
  - Discretization of a continuous state-space
  - Product of many variables (e.g., words x phones x sub-phones)
- T can be large
  - Video, speech, bio-sequences
- Difficult to train complex temporal models on long sequences
70. Trading time for space
- FwdBack: O(S T) space, O(S^2 T) time
- VarElim: O(S^2) space, O(S^2 T^2) time (Darwiche, 2001)
- Island: O(S log_k T) space, O(S^2 T log_k T) time (Binder, Murphy, Russell, 1997)
71. Island algorithm in practice
- DBN for DNA splice-site detection
  - Number of states S ≈ 10^6
  - Sequence length T ≈ 10^5
- Space decreased by a factor of ~10^3
- Time increased by a factor of ~2
- Incorporated into the GMTK speech toolbox
72. The island algorithm
- Store messages at k+1 "islands"
- Call recursively on each segment
(Example chain S_1, ..., S_24 with α and b messages stored only at the islands)
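A sketch of the island idea with k = 2: instead of storing all T forward messages, store only the message at the segment midpoint and recurse on each half. This is a simplified reconstruction for illustration, not the exact algorithm from the paper:

```python
import numpy as np

def filter_msg(alpha, A, B_seg):
    """Forwards recursion across a segment, keeping only the final
    message (O(S) space; intermediate alphas are discarded)."""
    for Bt in B_seg:
        alpha = (alpha @ A) * Bt
        alpha = alpha / alpha.sum()
    return alpha

def backward_msg(beta, A, B_seg):
    """Backwards recursion across a segment, keeping only the final message."""
    for Bt in B_seg[::-1]:
        beta = A @ (Bt * beta)
        beta = beta / beta.sum()
    return beta

def island(alpha_in, beta_in, A, B, out):
    """Island algorithm with k = 2: store messages only at the segment
    midpoint, recurse on each half. O(S log T) space instead of the
    O(S T) of plain forwards-backwards, at ~log T times the work."""
    T = len(B)
    if T <= 2:                                   # base case: plain fwd-back
        alphas, a = [], alpha_in
        for Bt in B:
            a = (a @ A) * Bt
            a = a / a.sum()
            alphas.append(a)
        beta = beta_in
        for t in range(T - 1, -1, -1):
            g = alphas[t] * beta
            out[t] = g / g.sum()                 # smoothed marginal
            beta = A @ (B[t] * beta)
            beta = beta / beta.sum()
        return
    m = T // 2                                   # the "island"
    alpha_mid = filter_msg(alpha_in, A, B[:m])
    beta_mid = backward_msg(beta_in, A, B[m:])
    island(alpha_in, beta_mid, A, B[:m], out[:m])
    island(alpha_mid, beta_in, A, B[m:], out[m:])

# Toy chain; with this dummy-prior convention, p(S_1) = pi @ A.
rng = np.random.default_rng(2)
S, T = 3, 16
A = rng.random((S, S)); A = A / A.sum(axis=1, keepdims=True)
B = rng.random((T, S))
pi = np.full(S, 1.0 / S)
out = np.zeros((T, S))
island(pi, np.ones(S), A, B, out)
print(np.allclose(out.sum(axis=1), 1.0))  # True
```

The recursion recomputes the forward pass once per level, which is the time-for-space trade the slides quantify.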
73. Complexity analysis
- Space: O(S) stored per level of recursion; over the log_k T levels, O(S log_k T) total
- Time: O(S^2 T) at the top level, O(S^2 2 (T/2)) at the next, O(S^2 4 (T/4)) after that, and so on: O(S^2 T log_k T) total
74. Complexity analysis (continued)
- With k = √T islands the recursion has depth 2, giving O(S √T) space and O(S^2 T · 2) time
75. Outline
1. Probabilistic graphical models
2. Place / scene recognition
3. Space-efficient learning
4. Object detection
Murphy, Torralba, Freeman, NIPS 2003
76. Object recognition / detection
- Lowe, 2004
- Nene, Nayar and Murase, 1996
- Leibe and Schiele, 2003
- Agarwal and Roth, 2002
78. Instance recognition (Nene, Nayar and Murase, 1996)
80. Instance detection (Lowe, 2004)
82. Class recognition (Leibe and Schiele, 2003)
84. Class detection (Agarwal and Roth, 2002)
85. Standard model
- Train a classifier for object vs background
- Slide each classifier across an image pyramid
Rowley, Baluja and Kanade, 1995; Schneiderman and Kanade, 2000; Papageorgiou and Poggio, 2000; Viola and Jones, 2001; Agarwal and Roth, 2002; et al.
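The standard model's scan can be sketched as follows; the "classifier" here is a trivial stand-in (mean brightness), and the two-level pyramid uses crude subsampling rather than proper smoothing:

```python
import numpy as np

def sliding_window_scores(image, clf, patch=(8, 8), step=4, scales=(1.0, 0.5)):
    """Slide a patch classifier across an image pyramid and return
    (score, x, y, scale) for every window. `clf` maps a flattened
    patch to a real-valued score."""
    ph, pw = patch
    hits = []
    for s in scales:
        # Crude pyramid level: subsample instead of blur-and-downsample.
        im = image[::int(1 / s), ::int(1 / s)]
        H, W = im.shape
        for y in range(0, H - ph + 1, step):
            for x in range(0, W - pw + 1, step):
                win = im[y:y + ph, x:x + pw].ravel()
                hits.append((clf(win), x, y, s))
    return hits

# Toy "classifier": mean brightness of the window.
image = np.zeros((32, 32))
image[8:16, 8:16] = 1.0                      # a bright square "object"
hits = sliding_window_scores(image, lambda w: w.mean())
best = max(hits)                             # highest-scoring window
print(best[1:])  # (8, 8, 1.0)
```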
86. Standard model as a PGM
(Per class c = 1..C: patch feature vectors → classifier outputs)
87. Feature vectors
88. Feature vectors, v_i^c ∈ R^720
1. Apply a filter
2. Compute the energy and kurtosis
3. Apply a spatial mask (from a dictionary of 30 spatial masks)
4. Compute the average response (e.g., 57.3)
c.f. Viola and Jones, 2001
89-90. Classifier
- Support vector machine
- Neural network
- Naive Bayes
- Boosted decision stumps
(Per class c: feature vectors v_1^c, ..., v_N^c → classifier outputs d_1^c, ..., d_N^c)
91. Examples of features selected by boosting
- Screen
- Pedestrian
- Building
92. Output of classifiers
93. Find local maxima
94. Apply threshold
95. Final hypothesis
(Per class c: a set of detections X_i^c)
96. Characteristics of the standard model
- Feedforward (no iteration)
- Only uses local evidence
- Classes are treated independently
97-98. Local features are ambiguous
99. Add global features
100. How to use global features?
- P(detector output | local features, gist)?
101-102. Use global features to predict location (Torralba, IJCV 2003)
103. Training
- Regression from the gist to the expected object location
104. Testing
- Scenes are arranged in horizontal layers
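Slide 103's training step (regression from gist to expected location) might look like the following ridge-regression sketch on synthetic data; the choice of regressor and the penalty are assumptions, since the slide does not specify them:

```python
import numpy as np

# Regress the expected vertical position of an object class from the
# gist vector. Synthetic data: a random linear map plus noise stands
# in for real (gist, object-location) training pairs.
rng = np.random.default_rng(3)
G = rng.standard_normal((300, 20))                # gist vectors
w_true = rng.standard_normal(20)
y = G @ w_true + 0.1 * rng.standard_normal(300)   # vertical positions

lam = 1e-2                                        # ridge penalty (assumed)
w = np.linalg.solve(G.T @ G + lam * np.eye(20), G.T @ y)

y_hat = G @ w                                     # predicted layer
resid = np.mean((y_hat - y) ** 2)
print(resid < 0.05)  # True: prediction tracks the true position
```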
105. Combining
- Output of the boosted classifier
- Deviation from the predicted location
(Per class c: detector outputs d^c combined with location priming)
106. Demo
107. How many objects?
108. The number of objects is a random variable
109. Object-presence detection
- E^c = 1 if N^c > 0 (present); E^c = 0 if N^c = 0 (absent)
110. Keyboard-presence detection
- Useful for image retrieval
(Example images labelled E = 0, E = 1, E = 0)
111. Max detection
112. Keyboard present?
113. Add global features
114. Detectors vs detectors + gist (ROC: detection rate vs false-alarm rate, keyboards)
115. Detectors vs detectors + gist (ROC, screens)
116. Detectors vs detectors + gist
- deskFrontal, carSide, bookshelf, keyboard, screenFrontal, personWalking
117. Evaluation on the UIUC test-set
- Our method vs Agarwal and Roth (ECCV '02)
(Plot: recall vs 1 - precision)
118. Outline
1. Probabilistic graphical models
2. Place / scene recognition
3. Space-efficient learning
4. Object detection
5. Scene recognition + object detection
Murphy, Torralba, Freeman, NIPS 2003
119. Many object types co-occur
120. Don't want to model the correlation directly
121. Scene-type is a hidden common cause
- Office
- Street
122. Objects are conditionally independent given the scene-type
(Per class c: presence variables, all children of the scene node)
123-124. Global information only
(Graphical model: gist v^g → Scene → E_kbd, E_car)
- No temporal integration
125. Bringing back time
126. Predicting object presence given place
- Place tracking
- Estimated presence vs true presence
127. Putting it all together: local + global + temporal
128. Joint place recognition, scene classification and object detection
(Hierarchy: Place → Scene → Objects)
129. Outline
- Probabilistic graphical models
- Place and scene recognition
  - Temporal context
  - Efficient inference and learning
- Object detection
  - Standard approach
  - Global scene context
- Joint scene recognition and object detection
- Future work
130. Spatial relations
131. Adding relational constraints: local + global + temporal + relational
132. Difficulties
- The graph has cycles
  - Intractable to do exact inference/learning
  - Use loopy belief propagation?
- O(N^2) intra-class cross-arcs
  - Hope N is small (use a hierarchy to group detections if not)
- O(C^2) inter-class cross-arcs
  - Make the structure conditional on the scene-type
- The graph does not have a fixed size/structure
133. Future work
- More expressive probability models
- Dynamic numbers of objects/relations
- Efficient approximate inference/learning
- Semi-supervised learning
134. Summary
- Probabilistic graphical models provide a "plug and play" methodology for combining learnable components in a coherent way.
- Numerous application areas
  - Computer vision
  - Computational biology
  - Natural language processing
  - etc.
136. Computing the feature vector
- Bank of 12 filters
1. Filter the patch
2. Raise the response to a power
3. Apply a spatial mask (dictionary of 30 spatial masks)
4. Compute the average response, e.g. v_p = 57.3
137. (1) Convolve the patch with a filter
- Bank of 12 filters: Gaussian derivatives (long edges), Laplacian, corner detectors
138-139. (2) Raise to a power, pointwise
- The histogram of filter-bank responses can be characterized by its variance σ^2 and kurtosis K
- γ = 2 or 4, giving variance- and kurtosis-like statistics (useful for texture analysis)
140. (3) Apply a spatial mask
- Dictionary of 30 spatial masks (c.f. Viola and Jones)
141. (4) Compute the average response
- 12 filters x 30 masks x 2 powers = 720 features per patch
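The four steps can be combined into one function; the filters, masks and patch below are random placeholders, and the loop-based convolution is kept tiny for clarity:

```python
import numpy as np

def patch_features(patch, filters, masks, powers=(2, 4)):
    """Sketch of the 12 x 30 x 2 = 720-dimensional feature vector:
    filter the patch, raise responses to a power (variance- and
    kurtosis-like statistics), weight by a spatial mask, average."""
    feats = []
    for f in filters:
        # 'valid' 2-D correlation via explicit loops, kept tiny for clarity.
        fh, fw = f.shape
        H, W = patch.shape
        resp = np.array([[np.sum(patch[i:i + fh, j:j + fw] * f)
                          for j in range(W - fw + 1)]
                         for i in range(H - fh + 1)])
        for g in powers:
            e = resp ** g                       # pointwise power
            for m in masks:
                feats.append(np.mean(m * e))    # masked average
    return np.array(feats)

rng = np.random.default_rng(4)
filters = [rng.standard_normal((3, 3)) for _ in range(12)]
# Masks must match the response-map size; real masks encode position.
masks = [rng.random((6, 6)) for _ in range(30)]
v = patch_features(rng.standard_normal((8, 8)), filters, masks)
print(v.shape)  # (720,)
```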
142. Training data
- Hand-annotated 20 object types in ~2500 images acquired with a wearable web-cam and a digital camera
- 50-200 positive patches (~30x50 pixels) per class
- ~1000 negative patches (randomly sampled)
- GUI annotation tool by E. Pasztor
143. Benefit of location priming
- car, keyboard, screen, pedestrian
144. Location priming kills false positives
- Location alone / detector alone / both
145. Performance of our standard model
146. Boosting
- Sequentially construct weighted combinations of simple ("weak") classifiers
- Freund and Schapire; Friedman, Hastie and Tibshirani; et al.
147-149. Boosting
- Sequentially fit an additive function (strong learner = sum of weak learners of the feature vector)
- At each round t, minimize the residual loss between the desired output and the current fit
- We use regression stumps as weak learners
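A compact sketch of boosting with regression stumps, here as L2-boosting on the residual (the talk's exact loss may differ); the data is synthetic:

```python
import numpy as np

def fit_stump(X, r):
    """Least-squares regression stump: pick the (feature, threshold)
    split that best fits the residual r, with a constant on each side."""
    best = None
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            left = X[:, j] <= thr
            if left.all() or (~left).all():
                continue
            a, b = r[left].mean(), r[~left].mean()
            err = np.sum((r - np.where(left, a, b)) ** 2)
            if best is None or err < best[0]:
                best = (err, j, thr, a, b)
    return best[1:]

def boost(X, y, rounds=20, lr=0.5):
    """Sequentially fit an additive function: at each round, fit a
    stump to the residual and add a damped copy of it."""
    F = np.zeros(len(y))
    stumps = []
    for _ in range(rounds):
        j, thr, a, b = fit_stump(X, y - F)       # fit the residual
        F += lr * np.where(X[:, j] <= thr, a, b)
        stumps.append((j, thr, a, b))
    return F, stumps

rng = np.random.default_rng(5)
X = rng.random((200, 3))
y = np.where(X[:, 0] > 0.5, 1.0, -1.0)           # labels in {-1, +1}
F, _ = boost(X, y)
print(np.mean(np.sign(F) == y))
```

Because each stump picks a single feature, the rounds of boosting double as feature selection, which is the property slide 150 highlights.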
150. Advantages of boosting
- Creates very accurate, very fast classifiers
- Training is fast and easy to implement
- Can handle high-dimensional data (stumps perform feature selection)
151. Converting boosting output to a probability distribution
- P(present | b) = σ(λ^T [1, b]): a sigmoid applied to the boosting output b, with learned weights λ and an offset/bias term
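Slide 151's conversion is just a sigmoid with a learned weight and offset; the two numbers below are illustrative, not fitted:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Convert a raw boosting score b into P(object present | b) with a
# learned weight and offset (Platt-style calibration; the values of
# w and w0 here are made up for illustration).
w, w0 = 1.5, -0.2

def prob_present(b):
    return sigmoid(w0 + w * b)

print(round(prob_present(0.0), 3))  # 0.45
```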
152. Place and scene recognition
(Factorial HMM: scene S_t and place L_t chains, emitting the gist v^g_t)
153. Collaborators
- William Freeman
- Antonio Torralba
- Leslie Kaelbling
- Dan Roth