Title: Probabilistic Graphical Models for Scene and Object Recognition
1. Probabilistic Graphical Models for Scene and Object Recognition
- Kevin Murphy, MIT CSAIL (Computer Science and Artificial Intelligence Laboratory)
2. Goal: build a learning machine
- How can a machine learn a model of the world?
- How can it use this model to act?
(Diagram: Model ↔ World)
4. Goal: build a learning machine
- How can we train a machine to estimate the hidden state of the world from noisy data?
(Diagram: Observations → Estimate of hidden state)
5. A trainable integrated vision system
- Object detection
- Scene classification
- Place recognition
Joint work with Antonio Torralba and Bill Freeman
6. What is this?
7. Temporal context
8. What is this?
9-11. Global scene context
Torralba, IJCV 2003
12. The need for contextual reasoning
- Local evidence is often ambiguous
- Must fuse multiple sources of information
- (Not just for computer vision)
13. The need for probabilistic reasoning
- Use probabilistic models
  - "Probability theory is nothing but common sense reduced to calculation." (Pierre-Simon Laplace)
- Use probabilistic graphical models
  - "Graphical models provide a natural tool for dealing with two problems that occur throughout applied mathematics and engineering: uncertainty and complexity." (Michael I. Jordan)
14-15. Outline
1. Probabilistic graphical models
2. Place / scene recognition
16. Probabilistic graphical models
- A family of probabilistic models defined by a graph
- Undirected: Markov random fields (MRFs)
- Directed: Bayesian networks
17. Bayesian networks
- Qualitative part: a directed acyclic graph (DAG)
  - Nodes = random variables
  - Edges = direct influence
- Quantitative part: conditional probability distributions (CPDs), P(Xi | XPa(i))
- Together, these define the joint probability distribution in factored form
(Example network: Earthquake and Burglary → Alarm; Earthquake → Radio)
Pearl, 1988
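The factored joint of slide 17 can be made concrete with a small sketch. The CPT numbers below are invented for illustration (the slide gives only the graph structure), and the check confirms that the factorization P(E) P(B) P(A|E,B) P(R|E) defines a valid joint distribution:

```python
from itertools import product

# Hypothetical CPTs for the classic alarm network (numbers are made up
# for illustration; the slide only gives the graph).
P_E = {True: 0.01, False: 0.99}                  # P(Earthquake)
P_B = {True: 0.02, False: 0.98}                  # P(Burglary)
P_A = {(True, True): 0.95, (True, False): 0.3,   # P(Alarm=1 | E, B)
       (False, True): 0.9, (False, False): 0.01}
P_R = {True: 0.8, False: 0.001}                  # P(Radio=1 | E)

def joint(e, b, a, r):
    """Joint probability in factored form:
    P(E, B, A, R) = P(E) P(B) P(A | E, B) P(R | E)."""
    pa = P_A[(e, b)] if a else 1 - P_A[(e, b)]
    pr = P_R[e] if r else 1 - P_R[e]
    return P_E[e] * P_B[b] * pa * pr

# Sanity check: the factored joint sums to 1 over all 16 assignments.
total = sum(joint(e, b, a, r)
            for e, b, a, r in product([True, False], repeat=4))
print(round(total, 10))  # 1.0
```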
18. Applications of PGMs
- State estimation: P(H | v)
- Speech recognition (HMMs)
- Computational biology
- Error-correcting codes
- Medical and fault diagnosis
- Computer vision
19. Outline
1. Probabilistic graphical models
2. Place / scene recognition
Torralba, Murphy, Freeman, Rubin, ICCV 2003
20. Scene classification
- Office
- Corridor
- Street
21. Place recognition
- Office 610
- Office 615
- Draper street
- 59 other places
22. Wearable test-bed v1
23. Heads-up display
(Annotated view: Office (610), bookshelf, screen, desk)
24. Wearable test-bed v2
Antonio Torralba
25. Why a wearable test-bed?
- Aid for the visually impaired
  - Blindsight Corp.
- Proxy for a mobile robot
  - Avoids the control problem
  - Easy to use indoors and outdoors
- Challenging, but realistic, test conditions for a vision/learning system
26. Global image features: the "gist" of the scene
- Average filter outputs at multiple scales, orientations and locations
- Dimensionality reduction via PCA (from 384 to 80 dimensions)
Oliva and Torralba, 2001
27. Example visual gists
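As a sketch of the PCA step (384 → 80 dimensions), with random data standing in for the pooled filter responses:

```python
import numpy as np

# Reduce 384 pooled filter responses to an 80-dimensional gist vector.
# Data here is random; in the real system the rows would be the pooled
# filter outputs, one row per image.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 384))          # 500 images x 384 features

Xc = X - X.mean(axis=0)                      # center the data
# Principal directions = top right singular vectors of the centered
# data matrix (equivalently, top eigenvectors of the covariance).
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
W = Vt[:80]                                  # first 80 principal components
gist = Xc @ W.T                              # projected gist vectors

print(gist.shape)  # (500, 80)
```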
28. Gist classifier
- Mixture of Gaussians (MoG) over the gist, conditioned on location
- Learn with EM
(Graphical model: Location → gist)
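A minimal sketch of the gist classifier, simplified to one Gaussian per place (the talk fits a multi-component mixture with EM; a single component admits the closed-form fit used here). All data below is synthetic:

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy gist vectors for three places (stand-ins for real 80-d gists).
classes = ["office", "corridor", "street"]
centers = {"office": 0.0, "corridor": 3.0, "street": 6.0}
train = {c: rng.normal(centers[c], 1.0, size=(200, 5)) for c in classes}

# Fit one diagonal Gaussian per class (a 1-component "mixture").
params = {c: (X.mean(0), X.var(0) + 1e-6) for c, X in train.items()}

def log_lik(x, mu, var):
    # Diagonal-Gaussian log-likelihood.
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

def classify(x):
    # Uniform class prior, so argmax of the class-conditional likelihood.
    return max(classes, key=lambda c: log_lik(x, *params[c]))

print(classify(np.full(5, 6.1)))  # street
```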
29. Temporal context helps
30. Temporal classifier
- Hidden Markov model (HMM)
(Chain: L_{t-1} → L_t, with each place L_t emitting the gist v^g_t)
31. Observation model
- P(v^g_t | L_t): mixture of Gaussians
32. Transition matrix = topological map
- P(L_t | L_{t-1}): learn by counting observed transitions
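Learning the transition matrix by counting, as slide 32 describes, can be sketched as follows (the place sequence is a toy stand-in, and the add-one smoothing is an assumption, not something the slide specifies):

```python
import numpy as np

# Learn P(L_t | L_{t-1}) by counting observed place transitions.
places = ["office610", "corridor", "office615"]
idx = {p: i for i, p in enumerate(places)}
seq = ["office610", "office610", "corridor", "corridor",
       "office615", "corridor", "office610"]

counts = np.zeros((3, 3))
for prev, cur in zip(seq[:-1], seq[1:]):
    counts[idx[prev], idx[cur]] += 1

# Add-one smoothing so unseen transitions keep a little mass, then
# normalise each row into a stochastic matrix (a topological map).
A = (counts + 1) / (counts + 1).sum(axis=1, keepdims=True)
print(np.allclose(A.sum(axis=1), 1.0))  # True
```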
33. Place recognition over time
34. Performance in a novel environment
35. Place and scene recognition
- Factorial HMM
(Two coupled chains: scene-type S_t and place L_t, jointly emitting the gist v^g_t)
36. Performance in a novel environment
- Place
- Scene-type
37. Indoor/outdoor classification
- Place
- Scene-type
- Indoor/outdoor
38. Place/scene recognition demo
39. ER1 mobile robot test-bed
Roth, Murphy, Kaelbling, in progress
40. HMM beats MoG
(Results for office, corridor and street sequences)
41. Generative vs discriminative
- Generative (models P(v^g | S)): mixtures of Gaussians
- Discriminative (models P(S | v^g)): neural net, SVM, boosted decision stumps
(Graphical models: S → v^g vs v^g → S)
42. Generative vs discriminative
(Results for office, corridor and street)
- Baseline
- MoG
- Discriminative
- HMM
43. Discriminative temporal classifier v1
- c.f. input-output HMM (Bengio and Frasconi, 1996)
(Chain: S_{t-1} → S_t, with v^g_t as an input to S_t)
44. Label-bias problem
- Backwards information is blocked by a hidden child
McCallum, Freitag, Pereira, 2000
45. Discriminative temporal classifier v2
- Conditional random field (CRF)
Lafferty, McCallum, Pereira, 2001
46. CRF beats HMM
(Results for office, corridor and street)
- Baseline
- MoG
- Boosted stumps
- HMM
- CRF
47-51. 4 kinds of PGM
52. Outline
1. Probabilistic graphical models
2. Place / scene recognition
3. Space-efficient learning
Binder, Murphy, Russell, IJCAI 1997
53. Parameter estimation in CRFs
54. Potential functions
55. Partition function
(Equations defined over the chain S_{t-1}, S_t with observations v^g_t)
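Slides 53-55 show equations that did not survive transcription. A chain CRF of the kind described here (scene labels S_t, gist observations v^g_t) is standardly defined by transition potentials ψ, observation potentials φ, and partition function Z; this is a reconstruction of the textbook form, since the slides' exact parameterization is not recoverable:

```latex
p(S_{1:T} \mid v^g_{1:T})
  = \frac{1}{Z(v^g_{1:T})}
    \prod_{t=2}^{T} \psi(S_{t-1}, S_t)
    \prod_{t=1}^{T} \phi(S_t, v^g_t),
\qquad
Z(v^g_{1:T})
  = \sum_{S_{1:T}} \prod_{t=2}^{T} \psi(S_{t-1}, S_t)
    \prod_{t=1}^{T} \phi(S_t, v^g_t)
```

The partition function Z normalizes over all label sequences, which is what makes parameter estimation non-trivial.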
56-57. Log-linear observation model
- Features: output of the boosted scene-type classifier
58-59. Log-linear transition model
- Features: indicator functions
60. Parameter estimation in CRFs
- Estimate the weights w in the potentials ψ, φ
- Use (generalized) iterative scaling: slow
- Use (conjugate) gradient ascent on the log-likelihood function: faster
- Both algorithms find the globally optimal w
  - No missing data (supervised learning)
  - Convex loss function
61-65. Gradient of log-likelihood
- Observed number of state transitions minus the expected number of transitions
- Need to compute marginals and pairwise marginals
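The gradient shown on slides 61-65 reduces to the familiar "observed minus expected counts" form; for a log-linear transition weight w_{ij} this is:

```latex
\frac{\partial \ell}{\partial w_{ij}}
  = \sum_{t} \Big[ \mathbb{1}(s_{t-1} = i,\; s_t = j)
      - p(S_{t-1} = i,\; S_t = j \mid v^g_{1:T}) \Big]
```

The first term is the observed number of i → j transitions in the training labels; the second is the expected number under the current model, which is why the pairwise marginals must be computed.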
66. Belief propagation for chains
- Forwards: compute messages α_t
- Backwards: compute messages β_t
- Combine: beliefs b_t ∝ α_t β_t
(Example chain S_1, ..., S_24 with messages α_t and beliefs b_t at each node)
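The forwards-backwards pass of slide 66 in a few lines of numpy (each message is normalized for numerical stability, which leaves the combined beliefs unchanged); the transition matrix and observation likelihoods below are illustrative:

```python
import numpy as np

def forwards_backwards(A, pi, B):
    """Belief propagation on a chain (forwards-backwards).
    A: (S,S) transition matrix, pi: (S,) prior,
    B: (T,S) observation likelihoods p(v_t | S_t = s).
    Returns smoothed beliefs b_t(s) = p(S_t = s | v_{1:T})."""
    T, S = B.shape
    alpha = np.zeros((T, S))
    beta = np.ones((T, S))
    alpha[0] = pi * B[0]
    alpha[0] /= alpha[0].sum()               # normalise for stability
    for t in range(1, T):                    # forwards pass
        alpha[t] = (alpha[t - 1] @ A) * B[t]
        alpha[t] /= alpha[t].sum()
    for t in range(T - 2, -1, -1):           # backwards pass
        beta[t] = A @ (B[t + 1] * beta[t + 1])
        beta[t] /= beta[t].sum()
    b = alpha * beta                         # combine
    return b / b.sum(axis=1, keepdims=True)

# Toy 2-state example (numbers are illustrative).
A = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.5, 0.5])
B = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]])
b = forwards_backwards(A, pi, B)
print(b.shape)  # (3, 2)
```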
67. Inference complexity
- Time: O(S^2 T), one matrix-vector multiply per time-step
- Space: O(S T), store α_t for t = 1..T until the backwards pass
68. Learning complexity
- Time: O(N S^2 T), for N iterations (calls to forwards-backwards)
- Space: O(S T), store α_t for t = 1..T until the backwards pass
- But the sufficient statistics have size only O(S^2)!
69. Running out of space
- S can be large
  - Discretization of a continuous state-space
  - Product of many variables (e.g., words x phones x sub-phones)
- T can be large
  - Video, speech, bio-sequences
- Difficult to train complex temporal models on long sequences
70. Trading time for space
- FwdBack: O(S T) space, O(S^2 T) time
- VarElim: O(S^2) space, O(S^2 T^2) time (Darwiche, 2001)
- Island: O(S log_k T) space, O(S^2 T log_k T) time (Binder, Murphy, Russell, 1997)
71. Island algorithm in practice
- DBN for DNA splice-site detection
  - Number of states S ≈ 10^6
  - Sequence length T ≈ 10^5
- Space decreased by a factor of ~10^3
- Time increased by a factor of ~2
- Incorporated into the GMTK speech toolbox
72. The island algorithm
- Store messages at k+1 "islands"
- Call recursively on each segment
(Example chain S_1, ..., S_24 with α and b messages stored only at the islands)
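A sketch of the island idea with k = 2: instead of storing all T forward messages, store only the message at the segment midpoint and recurse on each half. This is a simplified reconstruction for illustration, not the exact algorithm from the paper:

```python
import numpy as np

def filter_msg(alpha, A, B_seg):
    """Forwards recursion across a segment, keeping only the final
    message (O(S) space; intermediate alphas are discarded)."""
    for Bt in B_seg:
        alpha = (alpha @ A) * Bt
        alpha = alpha / alpha.sum()
    return alpha

def backward_msg(beta, A, B_seg):
    """Backwards recursion across a segment, keeping only the final message."""
    for Bt in B_seg[::-1]:
        beta = A @ (Bt * beta)
        beta = beta / beta.sum()
    return beta

def island(alpha_in, beta_in, A, B, out):
    """Island algorithm with k = 2: store messages only at the segment
    midpoint, recurse on each half. O(S log T) space instead of the
    O(S T) of plain forwards-backwards, at ~log T times the work."""
    T = len(B)
    if T <= 2:                                   # base case: plain fwd-back
        alphas, a = [], alpha_in
        for Bt in B:
            a = (a @ A) * Bt
            a = a / a.sum()
            alphas.append(a)
        beta = beta_in
        for t in range(T - 1, -1, -1):
            g = alphas[t] * beta
            out[t] = g / g.sum()                 # smoothed marginal
            beta = A @ (B[t] * beta)
            beta = beta / beta.sum()
        return
    m = T // 2                                   # the "island"
    alpha_mid = filter_msg(alpha_in, A, B[:m])
    beta_mid = backward_msg(beta_in, A, B[m:])
    island(alpha_in, beta_mid, A, B[:m], out[:m])
    island(alpha_mid, beta_in, A, B[m:], out[m:])

# Toy chain; with this dummy-prior convention, p(S_1) = pi @ A.
rng = np.random.default_rng(2)
S, T = 3, 16
A = rng.random((S, S)); A = A / A.sum(axis=1, keepdims=True)
B = rng.random((T, S))
pi = np.full(S, 1.0 / S)
out = np.zeros((T, S))
island(pi, np.ones(S), A, B, out)
print(np.allclose(out.sum(axis=1), 1.0))  # True
```

The recursion recomputes the forward pass once per level, which is the time-for-space trade the slides quantify.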
73. Complexity analysis
- Space: O(S) stored per level of recursion; over the log_k T levels, O(S log_k T) total
- Time: O(S^2 T) at the top level, O(S^2 2 (T/2)) at the next, O(S^2 4 (T/4)) after that, and so on: O(S^2 T log_k T) total
74. Complexity analysis (continued)
- With k = √T islands the recursion has depth 2, giving O(S √T) space and O(S^2 T · 2) time
75. Outline
1. Probabilistic graphical models
2. Place / scene recognition
3. Space-efficient learning
4. Object detection
Murphy, Torralba, Freeman, NIPS 2003
76. Object recognition / detection
- Lowe, 2004
- Nene, Nayar and Murase, 1996
- Leibe and Schiele, 2003
- Agarwal and Roth, 2002
78. Instance recognition (Nene, Nayar and Murase, 1996)
80. Instance detection (Lowe, 2004)
82. Class recognition (Leibe and Schiele, 2003)
84. Class detection (Agarwal and Roth, 2002)
85. Standard model
- Train a classifier for object vs background
- Slide each classifier across an image pyramid
Rowley, Baluja and Kanade, 1995; Schneiderman and Kanade, 2000; Papageorgiou and Poggio, 2000; Viola and Jones, 2001; Agarwal and Roth, 2002; et al.
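The standard model's scan can be sketched as follows; the "classifier" here is a trivial stand-in (mean brightness), and the two-level pyramid uses crude subsampling rather than proper smoothing:

```python
import numpy as np

def sliding_window_scores(image, clf, patch=(8, 8), step=4, scales=(1.0, 0.5)):
    """Slide a patch classifier across an image pyramid and return
    (score, x, y, scale) for every window. `clf` maps a flattened
    patch to a real-valued score."""
    ph, pw = patch
    hits = []
    for s in scales:
        # Crude pyramid level: subsample instead of blur-and-downsample.
        im = image[::int(1 / s), ::int(1 / s)]
        H, W = im.shape
        for y in range(0, H - ph + 1, step):
            for x in range(0, W - pw + 1, step):
                win = im[y:y + ph, x:x + pw].ravel()
                hits.append((clf(win), x, y, s))
    return hits

# Toy "classifier": mean brightness of the window.
image = np.zeros((32, 32))
image[8:16, 8:16] = 1.0                      # a bright square "object"
hits = sliding_window_scores(image, lambda w: w.mean())
best = max(hits)                             # highest-scoring window
print(best[1:])  # (8, 8, 1.0)
```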
86. Standard model as a PGM
(Per class c = 1..C: patch feature vectors → classifier outputs)
87. Feature vectors
88. Feature vectors, v_i^c ∈ R^720
1. Apply a filter
2. Compute the energy and kurtosis
3. Apply a spatial mask (from a dictionary of 30 spatial masks)
4. Compute the average response (e.g., 57.3)
c.f. Viola and Jones, 2001
89-90. Classifier
- Support vector machine
- Neural network
- Naive Bayes
- Boosted decision stumps
(Per class c: feature vectors v_1^c, ..., v_N^c → classifier outputs d_1^c, ..., d_N^c)
91. Examples of features selected by boosting
- Screen
- Pedestrian
- Building
92. Output of classifiers
93. Find local maxima
94. Apply threshold
95. Final hypothesis
(Per class c: a set of detections X_i^c)
96. Characteristics of the standard model
- Feedforward (no iteration)
- Only uses local evidence
- Classes are treated independently
97-98. Local features are ambiguous
99. Add global features
100. How to use global features?
- P(detector output | local features, gist)?
101-102. Use global features to predict location (Torralba, IJCV 2003)
103. Training
- Regression from the gist to the expected object location
104. Testing
- Scenes are arranged in horizontal layers
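Slide 103's training step (regression from gist to expected location) might look like the following ridge-regression sketch on synthetic data; the choice of regressor and the penalty are assumptions, since the slide does not specify them:

```python
import numpy as np

# Regress the expected vertical position of an object class from the
# gist vector. Synthetic data: a random linear map plus noise stands
# in for real (gist, object-location) training pairs.
rng = np.random.default_rng(3)
G = rng.standard_normal((300, 20))                # gist vectors
w_true = rng.standard_normal(20)
y = G @ w_true + 0.1 * rng.standard_normal(300)   # vertical positions

lam = 1e-2                                        # ridge penalty (assumed)
w = np.linalg.solve(G.T @ G + lam * np.eye(20), G.T @ y)

y_hat = G @ w                                     # predicted layer
resid = np.mean((y_hat - y) ** 2)
print(resid < 0.05)  # True: prediction tracks the true position
```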
105. Combining
- Output of the boosted classifier
- Deviation from the predicted location
(Per class c: detector outputs d^c combined with location priming)
106. Demo
107. How many objects?
108. The number of objects is a random variable
109. Object-presence detection
- E^c = 1 if N^c > 0 (present); E^c = 0 if N^c = 0 (absent)
110. Keyboard-presence detection
- Useful for image retrieval
(Example images labelled E = 0, E = 1, E = 0)
111. Max detection
112. Keyboard present?
113. Add global features
114. Detectors vs detectors + gist (ROC: detection rate vs false-alarm rate, keyboards)
115. Detectors vs detectors + gist (ROC, screens)
116. Detectors vs detectors + gist
- deskFrontal, carSide, bookshelf, keyboard, screenFrontal, personWalking
117. Evaluation on the UIUC test-set
- Our method vs Agarwal and Roth (ECCV '02)
(Plot: recall vs 1 - precision)
118. Outline
1. Probabilistic graphical models
2. Place / scene recognition
3. Space-efficient learning
4. Object detection
5. Scene recognition + object detection
Murphy, Torralba, Freeman, NIPS 2003
119. Many object types co-occur
120. Don't want to model the correlation directly
121. Scene-type is a hidden common cause
- Office
- Street
122. Objects are conditionally independent given the scene-type
(Per class c: presence variables, all children of the scene node)
123-124. Global information only
(Graphical model: gist v^g → Scene → E_kbd, E_car)
- No temporal integration
125. Bringing back time
126. Predicting object presence given place
- Place tracking
- Estimated presence vs true presence
127. Putting it all together: local + global + temporal
128. Joint place recognition, scene classification and object detection
(Hierarchy: Place → Scene → Objects)
129. Outline
- Probabilistic graphical models
- Place and scene recognition
  - Temporal context
  - Efficient inference and learning
- Object detection
  - Standard approach
  - Global scene context
- Joint scene recognition and object detection
- Future work
130. Spatial relations
131. Adding relational constraints: local + global + temporal + relational
132. Difficulties
- The graph has cycles
  - Intractable to do exact inference/learning
  - Use loopy belief propagation?
- O(N^2) intra-class cross-arcs
  - Hope N is small (use a hierarchy to group detections if not)
- O(C^2) inter-class cross-arcs
  - Make the structure conditional on the scene-type
- The graph does not have a fixed size/structure
133. Future work
- More expressive probability models
- Dynamic numbers of objects/relations
- Efficient approximate inference/learning
- Semi-supervised learning
134. Summary
- Probabilistic graphical models provide a "plug and play" methodology for combining learnable components in a coherent way.
- Numerous application areas
  - Computer vision
  - Computational biology
  - Natural language processing
  - etc.
136. Computing the feature vector
- Bank of 12 filters
1. Filter the patch
2. Raise the response to a power
3. Apply a spatial mask (dictionary of 30 spatial masks)
4. Compute the average response, e.g. v_p = 57.3
137. (1) Convolve the patch with a filter
- Bank of 12 filters: Gaussian derivatives (long edges), Laplacian, corner detectors
138-139. (2) Raise to a power, pointwise
- The histogram of filter-bank responses can be characterized by its variance σ^2 and kurtosis K
- γ = 2 or 4, giving variance- and kurtosis-like statistics (useful for texture analysis)
140. (3) Apply a spatial mask
- Dictionary of 30 spatial masks (c.f. Viola and Jones)
141. (4) Compute the average response
- 12 filters x 30 masks x 2 powers = 720 features per patch
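The four steps can be combined into one function; the filters, masks and patch below are random placeholders, and the loop-based convolution is kept tiny for clarity:

```python
import numpy as np

def patch_features(patch, filters, masks, powers=(2, 4)):
    """Sketch of the 12 x 30 x 2 = 720-dimensional feature vector:
    filter the patch, raise responses to a power (variance- and
    kurtosis-like statistics), weight by a spatial mask, average."""
    feats = []
    for f in filters:
        # 'valid' 2-D correlation via explicit loops, kept tiny for clarity.
        fh, fw = f.shape
        H, W = patch.shape
        resp = np.array([[np.sum(patch[i:i + fh, j:j + fw] * f)
                          for j in range(W - fw + 1)]
                         for i in range(H - fh + 1)])
        for g in powers:
            e = resp ** g                       # pointwise power
            for m in masks:
                feats.append(np.mean(m * e))    # masked average
    return np.array(feats)

rng = np.random.default_rng(4)
filters = [rng.standard_normal((3, 3)) for _ in range(12)]
# Masks must match the response-map size; real masks encode position.
masks = [rng.random((6, 6)) for _ in range(30)]
v = patch_features(rng.standard_normal((8, 8)), filters, masks)
print(v.shape)  # (720,)
```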
142. Training data
- Hand-annotated 20 object types in ~2500 images acquired with a wearable web-cam and a digital camera
- 50-200 positive patches (~30x50 pixels) per class
- ~1000 negative patches (randomly sampled)
- GUI annotation tool by E. Pasztor
143. Benefit of location priming
- car, keyboard, screen, pedestrian
144. Location priming kills false positives
- Location alone / detector alone / both
145. Performance of our standard model
146. Boosting
- Sequentially construct weighted combinations of simple ("weak") classifiers
- Freund and Schapire; Friedman, Hastie and Tibshirani; et al.
147-149. Boosting
- Sequentially fit an additive function (strong learner = sum of weak learners of the feature vector)
- At each round t, minimize the residual loss between the desired output and the current fit
- We use regression stumps as weak learners
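A compact sketch of boosting with regression stumps, here as L2-boosting on the residual (the talk's exact loss may differ); the data is synthetic:

```python
import numpy as np

def fit_stump(X, r):
    """Least-squares regression stump: pick the (feature, threshold)
    split that best fits the residual r, with a constant on each side."""
    best = None
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            left = X[:, j] <= thr
            if left.all() or (~left).all():
                continue
            a, b = r[left].mean(), r[~left].mean()
            err = np.sum((r - np.where(left, a, b)) ** 2)
            if best is None or err < best[0]:
                best = (err, j, thr, a, b)
    return best[1:]

def boost(X, y, rounds=20, lr=0.5):
    """Sequentially fit an additive function: at each round, fit a
    stump to the residual and add a damped copy of it."""
    F = np.zeros(len(y))
    stumps = []
    for _ in range(rounds):
        j, thr, a, b = fit_stump(X, y - F)       # fit the residual
        F += lr * np.where(X[:, j] <= thr, a, b)
        stumps.append((j, thr, a, b))
    return F, stumps

rng = np.random.default_rng(5)
X = rng.random((200, 3))
y = np.where(X[:, 0] > 0.5, 1.0, -1.0)           # labels in {-1, +1}
F, _ = boost(X, y)
print(np.mean(np.sign(F) == y))
```

Because each stump picks a single feature, the rounds of boosting double as feature selection, which is the property slide 150 highlights.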
150. Advantages of boosting
- Creates very accurate, very fast classifiers
- Training is fast and easy to implement
- Can handle high-dimensional data (stumps perform feature selection)
151. Converting boosting output to a probability distribution
- P(present | b) = σ(λ^T [1, b]): a sigmoid applied to the boosting output b, with learned weights λ and an offset/bias term
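Slide 151's conversion is just a sigmoid with a learned weight and offset; the two numbers below are illustrative, not fitted:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Convert a raw boosting score b into P(object present | b) with a
# learned weight and offset (Platt-style calibration; the values of
# w and w0 here are made up for illustration).
w, w0 = 1.5, -0.2

def prob_present(b):
    return sigmoid(w0 + w * b)

print(round(prob_present(0.0), 3))  # 0.45
```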
152. Place and scene recognition
(Factorial HMM: scene S_t and place L_t chains, emitting the gist v^g_t)
153. Collaborators
- William Freeman
- Antonio Torralba
- Leslie Kaelbling
- Dan Roth