Probabilistic Graphical Models for Scene and Object Recognition

Transcript and Presenter's Notes
1
Probabilistic Graphical Models for Scene and
Object Recognition
  • Kevin Murphy, MIT CSAIL (Computer Science and Artificial Intelligence Laboratory)

2
Goal: build a learning machine
  • How can a machine learn a model of the world?
  • How can it use this model to act?

Model
World
3
Goal: build a learning machine
  • How can a machine learn a model of the world?
  • How can it use this model to act?

Model
World
4
Goal: build a learning machine
  • How can we train a machine to estimate the hidden
    state of the world from noisy data?

Hidden state
Estimate
Observations
5
A trainable integrated vision system
Object detection
Scene classification
Place recognition
Joint work with Antonio Torralba and Bill Freeman
6
What is this?
7
Temporal context
8
What is this?
9
Global scene context
10
Global scene context
11
Global scene context
Torralba, IJCV 2003
12
The need for contextual reasoning
  • Local evidence often ambiguous
  • Must fuse multiple sources of information
  • (Not just for computer vision)

13
The need for probabilistic reasoning
  • Use probabilistic models
  • "Probability theory is nothing but common sense reduced to calculation." (Pierre-Simon Laplace)
  • Use probabilistic graphical models
  • "Graphical models provide a natural tool for dealing with two problems that occur throughout applied mathematics and engineering -- uncertainty and complexity." (Michael I. Jordan)

14
Outline
15
Outline
1. Probabilistic graphical models
2. Place / scene recognition
16
Probabilistic graphical models
Probabilistic models
Graphical models
Undirected (Markov random fields, MRFs)
Directed (Bayesian networks)
17
Bayesian networks
  • Qualitative part: directed acyclic graph (DAG)
  • Nodes = random variables
  • Edges = direct influence
  • Quantitative part: conditional probability distributions (CPDs)
  • P(X_i | X_{Pa_i})
  • Together, these define the joint probability distribution in factored form

[Example: Pearl's alarm network, with nodes Earthquake, Burglary, Alarm, Radio]
Pearl, 1988
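Written out for the alarm network above (assuming the classic structure, in which Alarm depends on Burglary and Earthquake and the Radio report depends on Earthquake), the factored joint is

  P(B, E, A, R) = P(B) P(E) P(A | B, E) P(R | E)

one CPD per node, conditioned on its parents.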
18
Applications of PGMs
  • State estimation: P(H | v)
  • Speech recognition (HMMs)
  • Computational biology
  • Error-correcting codes
  • Medical and fault-diagnosis
  • Computer vision

19
Outline
1. Probabilistic graphical models
2. Place / scene recognition
Torralba, Murphy, Freeman, Rubin, ICCV 2003
20
Scene classification
Office
Corridor
Street
21
Place recognition
Office 610
Office 615
Draper street
59 other places
22
Wearable test-bed v1
23
Heads-up display
Office (610)
Bookshelf
Screen
Screen
Desk
24
Wearable test-bed v2
Antonio Torralba
25
Why wearable test-bed?
  • Aid for visually impaired
  • Blindsight Corp.
  • Proxy for mobile robot
  • Avoids control problem
  • Easy to use indoors and outdoors
  • Challenging, but realistic, test conditions for the vision/learning system

26
Global image features: the "gist" of the scene
  • Average filter outputs at multiple scales, orientations and locations
  • Dimensionality reduction via PCA (from 384 to 80)

Oliva & Torralba, 2001
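A minimal sketch of this kind of gist descriptor, assuming a Gabor-style filter bank (4 scales x 6 orientations) averaged over a 4x4 grid, which gives the 384 raw dimensions mentioned above; the exact filters and grid in Oliva & Torralba's implementation differ in detail:

```python
import numpy as np
from scipy.signal import fftconvolve
from skimage.filters import gabor_kernel
from sklearn.decomposition import PCA

def gist_descriptor(img, n_scales=4, n_orient=6, grid=4):
    """img: 2-D grayscale array -> (n_scales * n_orient * grid * grid,) vector."""
    feats = []
    for s in range(n_scales):
        freq = 0.25 / (2 ** s)                       # one frequency band per scale
        for o in range(n_orient):
            k = np.real(gabor_kernel(freq, theta=o * np.pi / n_orient))
            resp = np.abs(fftconvolve(img, k, mode='same'))
            h, w = resp.shape
            for i in range(grid):                    # average the response in each grid cell
                for j in range(grid):
                    cell = resp[i*h//grid:(i+1)*h//grid, j*w//grid:(j+1)*w//grid]
                    feats.append(cell.mean())
    return np.array(feats)                           # 4 * 6 * 16 = 384 features

# Dimensionality reduction over a training set (384 -> 80, as on the slide):
# X = np.stack([gist_descriptor(im) for im in training_images])
# gists = PCA(n_components=80).fit_transform(X)
```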
27
Example visual gists
28
Gist classifier
Mixture of Gaussians (MoG), learned with EM
[Model: Location → gist observation]
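A compact sketch of such a classifier, assuming one diagonal-covariance mixture per location fitted with scikit-learn's EM implementation (the number of components and the uniform class prior are assumptions, not taken from the talk):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_mog_per_location(gists, labels, n_components=3):
    """Fit one mixture of Gaussians (via EM) to the gist vectors of each location."""
    return {loc: GaussianMixture(n_components, covariance_type='diag').fit(gists[labels == loc])
            for loc in np.unique(labels)}

def classify_gist(models, g):
    """Return the location with the highest class-conditional likelihood p(g | location)."""
    return max(models, key=lambda loc: models[loc].score_samples(g[None, :])[0])
```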
29
Temporal context helps
30
Temporal classifier
[HMM: hidden locations L_{t-1} → L_t, with gist observations v^g_{t-1}, v^g_t]
Hidden Markov model (HMM)
31
Observation model
[Same HMM; the observation model P(v^g_t | L_t) is a mixture of Gaussians]
32
Transition matrix = topological map
Learn by counting observed transitions
[Diagram: the transition structure L_{t-1} → L_t corresponds to a topological map of the environment]
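A one-function sketch of that counting step (the add-one smoothing constant alpha is an assumption):

```python
import numpy as np

def learn_transition_matrix(location_seq, n_locations, alpha=1.0):
    """Estimate P(L_t = j | L_{t-1} = i) by counting transitions in a labelled sequence."""
    counts = np.full((n_locations, n_locations), alpha)
    for prev, cur in zip(location_seq[:-1], location_seq[1:]):
        counts[prev, cur] += 1
    return counts / counts.sum(axis=1, keepdims=True)   # each row sums to 1
```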
33
Place recognition over time
34
Performance in novel environment
?
35
Place and scene recognition
Factorial HMM
[Two coupled chains: scene-type S_{t-1} → S_t and place L_{t-1} → L_t, with gist observations v^g_{t-1}, v^g_t]
36
Performance in novel environment
Place
Scene-type
37
Indoor/outdoor classification
Place
Scene-type
Indoor/outdoor
38
Place/scene recognition demo
39
ER1 mobile robot test-bed
Roth, Murphy, Kaelbling, in progress
40
HMM beats MoG
Office
Corridor
Street
  • Baseline
  • MoG
  • HMM

41
Generative vs discriminative
  • Generative
  • Mixtures of Gaussians
  • Discriminative
  • Neural net
  • SVM
  • Boosted decision stumps

[Generative: S → v^g, modelling P(v^g | S); Discriminative: v^g → S, modelling P(S | v^g)]
42
Generative vs discriminative
Office
Corridor
Street
  • Baseline
  • MoG
  • Discriminative
  • HMM

43
Discriminative temporal classifier v1
cf. input-output HMM
[Diagram: chain S_{t-1} → S_t with gist inputs v^g_{t-1}, v^g_t feeding each state]
Bengio & Frasconi, 1996
44
Label-bias problem
  • Backwards information blocked by hidden child

[Diagram: the X marks the blocked backward flow in the chain S_{t-1} → S_t with inputs v^g_{t-1}, v^g_t]
McCallum, Freitag, Pereira, 2000
45
Discriminative temporal classifier v2
Conditional random field (CRF)
[Chain CRF: undirected edges between S_{t-1} and S_t, conditioned on observations v^g_{t-1}, v^g_t]
McCallum, Freitag, Pereira, 2000
46
CRF beats HMM
Office
Corridor
Street
  • Baseline
  • MoG
  • Boosted stumps
  • HMM
  • CRF

47
4 kinds of PGM
48
4 kinds of PGM
49
4 kinds of PGM
50
4 kinds of PGM
51
4 kinds of PGM
52
Outline
1. Probabilistic graphical models
3. Space-efficient learning
2. Place / scene recognition
Binder, Murphy, Russell, IJCAI 1997
53
Parameter estimation in CRFs
[Chain CRF over scene-types S_{t-1}, S_t with gist observations v^g_{t-1}, v^g_t]
54
Potential functions
[Node potentials φ(S_t, v^g_t) and edge potentials ψ(S_{t-1}, S_t) on the chain S_1 ... S_T]
55
Partition function
[The normalization constant Z(v^g_{1:T}) sums over all state sequences]
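In standard chain-CRF notation (reconstructing the formulas these two slides show graphically), the model and its partition function are

  p(S_{1:T} | v^g_{1:T}) = (1 / Z(v^g_{1:T})) ∏_t φ(S_t, v^g_t) ∏_t ψ(S_{t-1}, S_t)

  Z(v^g_{1:T}) = Σ_{S_{1:T}} ∏_t φ(S_t, v^g_t) ∏_t ψ(S_{t-1}, S_t)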
56
Loglinear observation model
57
Loglinear observation model
Output of boosted scene-type classifier
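A natural log-linear form for the node potential, with the boosted scene-type classifier outputs as features (the exact feature construction is an assumption; only the use of boosted outputs is stated on the slide):

  φ(S_t = k, v^g_t) = exp( w_k · b_t )

where b_t is the vector of boosted classifier outputs computed from v^g_t and w_k are the weights for scene-type k.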
58
Loglinear transition model
59
Loglinear transition model
Indicator function
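Correspondingly, a log-linear edge potential built from indicator functions, with one weight per pair of scene-types:

  ψ(S_{t-1} = i, S_t = j) = exp( Σ_{i',j'} w_{i'j'} δ(S_{t-1} = i') δ(S_t = j') ) = exp( w_{ij} )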
60
Parameter estimation in CRFs
  • Estimate the weights w in the potentials φ and ψ
  • Use (generalized) iterative scaling
  • Slow
  • Use (conjugate) gradient ascent on the log-likelihood function
  • Faster
  • Both algorithms find globally optimal w
  • No missing data (supervised learning)
  • Convex loss function

61
Gradient of log-likelihood
62
Gradient of log-likelihood
Number of state transitions
63
Gradient of log-likelihood
Expected number of transitions
64
Gradient of log-likelihood
Expected number of transitions
65
Gradient of log-likelihood
Need to compute marginals and pairwise marginals
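Written out for a transition weight w_{ij}, the gradient these slides build up is the familiar "observed minus expected" form:

  ∂ℓ/∂w_{ij} = Σ_t δ(S_{t-1} = i) δ(S_t = j) − Σ_t p(S_{t-1} = i, S_t = j | v^g_{1:T})

The first sum is the observed number of i → j transitions; the second is the expected number under the current model, which is why node and pairwise marginals are needed.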
66
Belief propagation for chains
  • Forwards: compute α messages (α_1, ..., α_12, ..., α_24)
  • Backwards: compute β messages (β_24, ..., β_12, ..., β_1)
  • Combine: the belief at S_t is proportional to α_t · β_t
[Diagram: chain S_1 ... S_12 ... S_24 with forward messages α_t and backward messages β_t]
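A minimal numerical sketch of these three steps for a discrete chain (HMM or chain CRF), returning both the node marginals and the pairwise marginals needed for the gradient above; phi and psi stand in for whatever local evidence and transition potentials are used:

```python
import numpy as np

def forwards_backwards(phi, psi):
    """phi: (T, S) local evidence, psi: (S, S) transition potential.
    Returns node marginals gamma (T, S) and pairwise marginals xi (T-1, S, S)."""
    T, S = phi.shape
    alpha = np.zeros((T, S))
    beta = np.ones((T, S))
    alpha[0] = phi[0] / phi[0].sum()
    for t in range(1, T):                        # forwards pass
        a = phi[t] * (alpha[t - 1] @ psi)
        alpha[t] = a / a.sum()                   # normalize for numerical stability
    for t in range(T - 2, -1, -1):               # backwards pass
        b = psi @ (phi[t + 1] * beta[t + 1])
        beta[t] = b / b.sum()
    gamma = alpha * beta                         # combine
    gamma /= gamma.sum(axis=1, keepdims=True)
    xi = np.zeros((T - 1, S, S))
    for t in range(T - 1):
        x = psi * np.outer(alpha[t], phi[t + 1] * beta[t + 1])
        xi[t] = x / x.sum()
    return gamma, xi
```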
67
Inference complexity
  • Time: O(S² T)
  • Matrix-vector multiply per time-step
  • Space: O(S T)
  • Store α_t, t = 1..T, until the backwards pass

68
Learning complexity
  • Time: O(N S² T)
  • N iterations (calls to forwards-backwards)
  • Space: O(S T)
  • Store α_t, t = 1..T, until the backwards pass
  • But the sufficient statistics have size only O(S²)!

69
Running out of space
  • S can be large
  • Discretization of continuous state-space
  • Product of many variables (e.g., words × phones × sub-phones)
  • T can be large
  • Video, speech, bio-sequences
  • Difficult to train complex temporal models on
    long sequences

70
Trading time for space
  • Fwd-Back: O(S T) space, O(S² T) time
  • VarElim: O(S²) space, O(S² T²) time (Darwiche, 2001)
  • Island: O(S log_k T) space, O(S² T log_k T) time (Binder, Murphy & Russell, 1997)

71
Island algorithm in practice
  • DBN for DNA splice-site detection
  • number of states S ≈ 10^6
  • sequence length T ≈ 10^5
  • Space decreased by ~1000×
  • Time increased by ~2×
  • Incorporated into GMTk speech toolbox

72
The island algorithm
  • Store messages at k+1 islands
  • Call recursively on each segment

[Diagram: the chain S_1 ... S_12 ... S_24 is split at islands; the α and β messages are stored only at the islands, and each segment in between is processed recursively]
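A rough, unoptimized sketch of that recursion, reusing the forwards-backwards building blocks from the earlier slide; segment boundaries, the base-case size, and the normalization details are implementation choices, not taken from the original paper:

```python
import numpy as np

def island_smoother(phi, psi, k=2, base=4):
    """Memory-light smoothing for a discrete chain.
    phi: (T, S) local evidence, psi: (S, S) transition potential.
    Stores messages only at island slices, recursing on each segment."""
    T, S = phi.shape
    gamma = np.zeros((T, S))

    def fwd(alpha, t):                        # one normalized forward step into slice t
        a = phi[t] * (alpha @ psi)
        return a / a.sum()

    def bwd(beta, t):                         # one normalized backward step onto slice t
        b = psi @ (phi[t + 1] * beta)
        return b / b.sum()

    def solve(lo, hi, alpha_lo, beta_hi):
        n = hi - lo + 1
        if n <= base:                         # base case: plain forward-backward on the segment
            alphas = [alpha_lo]
            for t in range(lo + 1, hi + 1):
                alphas.append(fwd(alphas[-1], t))
            betas = [beta_hi]
            for t in range(hi - 1, lo - 1, -1):
                betas.append(bwd(betas[-1], t))
            betas.reverse()
            for i, t in enumerate(range(lo, hi + 1)):
                g = alphas[i] * betas[i]
                gamma[t] = g / g.sum()
            return
        cuts = [lo + (i * n) // k for i in range(1, k)]   # k-1 interior islands
        island_alpha, a = {lo: alpha_lo}, alpha_lo
        for t in range(lo + 1, hi + 1):                   # forward sweep, store only at islands
            a = fwd(a, t)
            if t in cuts:
                island_alpha[t] = a
        island_beta, b = {hi: beta_hi}, beta_hi
        for t in range(hi - 1, lo - 1, -1):               # backward sweep, store only at islands
            b = bwd(b, t)
            if t in cuts:
                island_beta[t] = b
        bounds = [lo] + cuts + [hi]
        for left, right in zip(bounds[:-1], bounds[1:]):  # recurse on each segment
            solve(left, right, island_alpha[left], island_beta[right])

    solve(0, T - 1, phi[0] / phi[0].sum(), np.ones(S))
    return gamma
```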
73
Complexity analysis
  • Space per level: O(S), O(S), O(S), ...; total O(S log_k T)
  • Time per level: O(S² T), O(S² · 2 · (T/2)), O(S² · 4 · (T/4)), ...; total O(S² T log_k T)
  • (log_k T levels of recursion)
74
Complexity analysis
  • Space per level: O(S), O(S), O(S), ...; total O(S log_k T)
  • Time per level: O(S² T), O(S² · 2 · (T/2)), O(S² · 4 · (T/4)), ...; total O(S² T log_k T)
  • (log_k T levels of recursion)
  • With k ∝ √T: space O(S · 2 · √T), time O(S² T · 2)
75
Outline
1. Probabilistic graphical models
4. Object detection
3. Space-efficient learning
2. Place / scene recognition
Murphy, Torralba, Freeman, NIPS 2003
76
Object recognition/ detection
Lowe, 2004
Nene, Nayar & Murase, 1996
Leibe & Schiele, 2003
Agarwal & Roth, 2002
77
Object recognition/ detection
Lowe, 2004
Nene, Nayar & Murase, 1996
Leibe & Schiele, 2003
Agarwal & Roth, 2002
78
Instance recognition
Nene, Nayar & Murase, 1996
79
Object recognition/ detection
Lowe, 2004
Nene, Nayar & Murase, 1996
Leibe & Schiele, 2003
Agarwal & Roth, 2002
80
Instance detection
Lowe, 2004
81
Object recognition/ detection
Lowe, 2004
Nene, Nayar & Murase, 1996
Leibe & Schiele, 2003
Agarwal & Roth, 2002
82
Class recognition
Leibe & Schiele, 2003
83
Object recognition/ detection
Lowe, 2004
Nene, Nayar & Murase, 1996
Leibe & Schiele, 2003
Agarwal & Roth, 2002
84
Class detection
Agarwal & Roth, 2002
85
Standard model
  • Train classifier for object vs background
  • Slide each classifier across image pyramid

Rowley, Baluja & Kanade, 1995; Schneiderman & Kanade, 2000; Papageorgiou & Poggio, 2000; Viola & Jones, 2001; Agarwal & Roth, 2002; et al.
86
Standard model as PGM
Output of classifier
Patch feature vector
Class 1
Class C
87
Feature vectors
Output of classifier
Patch feature vector
Class 1
Class C
88
Feature vectors, v_i^c ∈ R^720
1. Apply filter
2. Energy, kurtosis
3. Apply spatial mask (dictionary of 30 spatial masks)
4. Average response (e.g., 57.3)
cf. Viola & Jones, 2001
89
Classifier
Output of classifier
Patch feature vector
Class 1
Class C
90
Classifier
  • Support Vector Machine
  • Neural network
  • Naïve Bayes
  • Boosted decision stumps

[Diagram: classifier outputs d_1^1 ... d_N^1, ..., d_1^C ... d_N^C computed from patch feature vectors v_1^1 ... v_N^1, ..., v_1^C ... v_N^C, for Class 1 through Class C]
91
Examples of features selected by boosting
Screen
Pedestrian
Building
92
Output of classifiers
. . .
Class 1
Class C
93
Find local maxima
. . .
Class 1
Class C
94
Apply threshold
. . .
Class 1
Class C
95
Final hypothesis
X_1^1, X_2^1, X_1^c
. . .
Class 1
Class C
96
Characteristics of standard model
  • Feedforward (no iteration)
  • Only uses local evidence
  • Classes are treated independently

97
Local features are ambiguous
98
Local features are ambiguous
99
Add global features
. . .
Class 1
Class C
100
How to use global features?
P(class | local features, gist) = ?
d1
dc
. . .
Class 1
Class C
101
Use global features to predict location
Torralba, IJCV 2003
102
Use global features to predict location
Torralba, IJCV 2003
103
Training
Regression

104
Testing
  • Scenes are arranged in horizontal layers

105
Combining
Output of boosted classifier
Deviation from predicted location
d1
dc
. . .
Class 1
Class C
106
Demo
107
How many objects?
X_1^1, X_2^1, X_1^c
. . .
Class 1
Class C
108
Number of objects is a random variable
. . .
Class 1
Class C
109
Object-presence detection
E_c = 1 if N_c > 0 (present); E_c = 0 if N_c = 0 (absent)
. . .
Class 1
Class C
110
Keyboard-presence detection
  • Useful for image retrieval

E = 0
E = 1
E = 0
111
Max detection
. . .
Class 1
Class C
112
Keyboard present?
113
Add global features
. . .
Class 1
Class C
114
Detectors vs Detectors + Gist
Detection rate
False alarm rate
Keyboards
115
Detectors vs Detectors + Gist
Detection rate
False alarm rate
Screens
116
Detectors vs Detectors + Gist
deskFrontal
carSide
bookshelf
keyboard
screenFrontal
personWalking
117
Evaluation on UIUC test-set
Our method
Agarwal & Roth (ECCV 2002)
Recall
1-precision
118
Outline
1. Probabilistic graphical models
4. Object detection
3. Space-efficient learning
2. Place / scene recognition
5. Scene recognition + object detection
Murphy, Torralba, Freeman, NIPS 2003
119
Many object types co-occur
120
Don't want to model correlation directly
121
Scene-type is a hidden common cause
Office
Street
122
Objects are conditionally independent given
scene-type
. . .
Class 1
Class C
123
Global information only
. . .
Class 1
Class C
124
Global information only
[Model: Scene node with gist observation v^g; object-presence nodes E_kbd, E_car conditioned on the Scene]
No temporal integration
125
Bringing back time
. . .
Class 1
Class C
126
Predicting object presence given place
Place tracking
Estimated presence
True presence
127
Putting it all together
Local + Global + Temporal
. . .
Class 1
Class C
128
Joint place recognition, scene classification and
object detection
Objects
Scene
Place
129
Outline
  • Probabilistic graphical models
  • Place and scene recognition
  • Temporal context
  • Efficient inference and learning
  • Object detection
  • Standard approach
  • Global scene context
  • Joint scene recognition and object detection
  • Future work

130
Spatial relations
131
Adding relational constraints
Local + Global + Temporal + Relational
. . .
Class 1
Class C
132
Difficulties
  • Graph has cycles
  • Intractable to do exact inference/learning
  • Use loopy belief propagation?
  • O(N²) intra-class cross-arcs
  • Hope N is small (use a hierarchy to group if not)
  • O(C²) inter-class cross-arcs
  • Make structure conditional on scene-type
  • Graph does not have fixed size/structure

133
Future work
  • More expressive probability models
  • Dynamic numbers of objects/relations
  • Efficient approximate inference / learning
  • Semi-supervised learning

134
Summary
  • Probabilistic graphical models provide a "plug-and-play" methodology for combining learnable components in a coherent way.
  • Numerous application areas
  • Computer vision
  • Computational biology
  • Natural language processing
  • etc.

135
(No Transcript)
136
Computing the feature vector
Bank of 12 filters
1. Filter
2. Raise to a power
3. Apply spatial mask (dictionary of 30 spatial masks)
4. Average response (e.g., v_p = 57.3)
137
1. Convolve patch with filter
Convolution with a bank of 12 filters: long edges, Gaussian derivatives, Laplacian, corner
138
2. Raise to a power, pointwise
Bank of 12 filters
Histogram of filter-bank responses can be characterized by σ² and K
139
2. Raise to a power, pointwise
Bank of 12 filters
Histogram of filter-bank responses can be characterized by σ² and K
γ = 2 or 4
Variance (γ = 2), Kurtosis (γ = 4): useful for texture analysis
140
3. Apply spatial mask
Bank of 12 filters
Dictionary of 30 spatial masks
cf. Viola & Jones
141
4. Compute average response
Bank of 12 filters
Average response, e.g., v_p = 57.3
Dictionary of 30 spatial masks
12 × 30 × 2 = 720 features per patch
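A schematic version of the whole pipeline; the filter bank and mask dictionary below are placeholders for the 12 filters and 30 masks shown on the preceding slides:

```python
import numpy as np
from scipy.signal import fftconvolve

def patch_features(patch, filters, masks, powers=(2, 4)):
    """patch: 2-D array; filters: list of 12 kernels; masks: list of 30 masks the
    same size as the patch. Returns len(filters) * len(powers) * len(masks) features."""
    feats = []
    for f in filters:
        resp = fftconvolve(patch, f, mode='same')          # 1. convolve with filter
        for gamma in powers:                               # 2. raise to a power (2 ~ energy, 4 ~ kurtosis)
            powered = np.abs(resp) ** gamma
            for m in masks:                                # 3. apply spatial mask
                feats.append(float(np.mean(powered * m)))  # 4. average response
    return np.array(feats)                                 # 12 * 2 * 30 = 720 features
```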
142
Training data
  • Hand-annotated 20 object types in 2500 images acquired with a wearable web-cam and a digital camera
  • 50-200 positive patches (~30×50 pixels) per class
  • 1000 negative patches (randomly sampled)

GUI Annotation tool by E. Pasztor
143
Benefit of location priming
car
keyboard
  • Gist
  • Detector
  • Both

screen
pedestrian
144
Location priming kills false positives
Location
Detector
Both
145
Performance of our standard model
146
Boosting
  • Sequentially construct weighted combinations of
    simple (weak) classifiers

Freund & Schapire; Friedman, Hastie & Tibshirani; et al.
147
Boosting
  • Sequentially fit an additive function

Strong learner
Weak learner
Feature vector
148
Boosting
  • Sequentially fit an additive function
  • At each round t, we minimize the residual loss

input
Desired output
149
Boosting
  • Sequentially fit an additive function
  • At each round t, we minimize the residual loss
  • We use regression stumps as weak learners
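In symbols (the standard additive-model formulation that these slides annotate with "strong learner", "weak learner", "input", and "desired output"):

  H(v) = Σ_t h_t(v)

  h_t = argmin_h Σ_i L( y_i, H_{t-1}(v_i) + h(v_i) )

with regression stumps h(v) = a · δ(v_f > θ) + b as the weak learners: v_f is a single feature, θ a threshold, and a, b regression coefficients fit at each round.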

150
Advantages of boosting
  • Creates very accurate, very fast classifiers
  • Training is fast and easy to implement
  • Can handle high-dimensional data (stumps perform feature selection)

151
Converting boosting output to a probability
distribution
P(P_i = 1 | b) = σ(wᵀ [1, b])
σ: sigmoid
w: weights
the leading 1 provides the offset/bias term
152
Place and scene recognition
[Factorial HMM: scene-type chain S_{t-1} → S_t and place chain L_{t-1} → L_t, with gist observations v^g_{t-1}, v^g_t]
153
Collaborators
  • William Freeman
  • Antonio Torralba
  • Leslie Kaelbling
  • Dan Roth