Regularization of energy-based representations - PowerPoint PPT Presentation

Learn more at: http://www.ai.mit.edu

Transcript and Presenter's Notes

Title: Regularization of energy-based representations

1
Regularization of energy-based representations
  • Minimize the total energy E(u) = λ E_p(u) + (1-λ) E_d(u,d)
  • E_p(u): stabilizing function, a smoothness constraint
  • Membrane stabilizer:
    E_p(u) = 0.5 Σ_{i,j} (u_{i,j+1} - u_{i,j})² + (u_{i+1,j} - u_{i,j})²
  • Thin plate stabilizer:
    E_p(u) = 0.5 Σ_{i,j} (u_{i,j+1} + u_{i,j-1} - 2u_{i,j})² + (u_{i+1,j} + u_{i-1,j} - 2u_{i,j})²
             + 2(u_{i+1,j+1} - u_{i,j+1} - u_{i+1,j} + u_{i,j})²
  • Linear combinations of the two are also possible
  • E_d(u,d): energy function, measures the compatibility between the reconstruction u and the observed data d
  • E_d(u,d) = 0.5 Σ_{i,j} c_{i,j} (d_{i,j} - u_{i,j})²
  • c_{i,j} is the inverse of the variance of measurement d_{i,j} (a sketch of these energies follows)
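
A minimal NumPy sketch of these energies (the function names and the free-boundary handling at the grid edges are illustrative assumptions, not from the slides):

import numpy as np

def membrane_energy(u):
    """E_p(u) = 0.5 * sum of squared first differences along i and j."""
    dj = u[:, 1:] - u[:, :-1]          # u_{i,j+1} - u_{i,j}
    di = u[1:, :] - u[:-1, :]          # u_{i+1,j} - u_{i,j}
    return 0.5 * (np.sum(dj**2) + np.sum(di**2))

def thin_plate_energy(u):
    """E_p(u) = 0.5 * sum of squared second differences plus twice the cross term."""
    djj = u[:, 2:] - 2 * u[:, 1:-1] + u[:, :-2]    # u_{i,j+1} - 2u_{i,j} + u_{i,j-1}
    dii = u[2:, :] - 2 * u[1:-1, :] + u[:-2, :]    # u_{i+1,j} - 2u_{i,j} + u_{i-1,j}
    dij = u[1:, 1:] - u[1:, :-1] - u[:-1, 1:] + u[:-1, :-1]  # cross difference
    return 0.5 * (np.sum(djj**2) + np.sum(dii**2) + 2 * np.sum(dij**2))

def data_energy(u, d, c):
    """E_d(u,d) = 0.5 * sum c_{i,j} (d_{i,j} - u_{i,j})^2."""
    return 0.5 * np.sum(c * (d - u)**2)

def total_energy(u, d, c, lam):
    """E(u) = lam * E_p(u) + (1 - lam) * E_d(u,d), with a membrane prior here."""
    return lam * membrane_energy(u) + (1 - lam) * data_energy(u, d, c)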

2
Stabilizing function: membrane stabilizer
E_p(u) = 0.5 Σ_{i,j} (u_{i,j+1} - u_{i,j})² + (u_{i+1,j} - u_{i,j})²
[Figure: surface of grid values u_{i,j} over axes i and j]
5
Stabilizing function: membrane stabilizer
E_p(u) = 0.5 Σ_{i,j} (u_{i,j+1} - u_{i,j})² + (u_{i+1,j} - u_{i,j})²
[Figure: grid of values u_{i,j}, axes i and j]
  • ATOM: expand each squared difference into its atomic product terms

6
Stabilizing function: membrane stabilizer
E_p(u) = 0.5 Σ_{i,j} (u_{i,j+1} - u_{i,j})² + (u_{i+1,j} - u_{i,j})²
  • ATOM
  • (u_{i,j+1} - u_{i,j})² = u_{i,j}u_{i,j} + u_{i,j+1}u_{i,j+1} - u_{i,j}u_{i,j+1} - u_{i,j+1}u_{i,j}
[Figure: stencil so far: 1 at (i,j), -1 at (i,j+1)]
7
Stabilizing function: membrane stabilizer
E_p(u) = 0.5 Σ_{i,j} (u_{i,j+1} - u_{i,j})² + (u_{i+1,j} - u_{i,j})²
  • ATOM
  • (u_{i+1,j} - u_{i,j})² = u_{i,j}u_{i,j} + u_{i+1,j}u_{i+1,j} - u_{i,j}u_{i+1,j} - u_{i+1,j}u_{i,j}
[Figure: stencil so far: 2 at (i,j), -1 at (i,j+1) and (i+1,j)]
8
Stabilizing function: membrane stabilizer
E_p(u) = 0.5 Σ_{i,j} (u_{i,j+1} - u_{i,j})² + (u_{i+1,j} - u_{i,j})²
  • ATOM
  • The j-direction term at the neighbouring site, (u_{i,j} - u_{i,j-1})², contributes another u_{i,j}u_{i,j} and the cross terms -u_{i,j}u_{i,j-1} - u_{i,j-1}u_{i,j}
[Figure: stencil so far: 3 at (i,j), -1 at (i,j+1), (i+1,j) and (i,j-1)]
9
Stabilizing function: membrane stabilizer
E_p(u) = 0.5 Σ_{i,j} (u_{i,j+1} - u_{i,j})² + (u_{i+1,j} - u_{i,j})²
  • ATOM
  • The i-direction term at the neighbouring site, (u_{i,j} - u_{i-1,j})², contributes the final u_{i,j}u_{i,j} and its cross terms
[Figure: complete stencil: 4 at (i,j), -1 at each of the four neighbours]
11
Stabilizing function: membrane stabilizer
E_p(u) = 0.5 Σ_{i,j} (u_{i,j+1} - u_{i,j})² + (u_{i+1,j} - u_{i,j})²
[Figure: the complete 5-point stencil, 4 at the centre and -1 at each neighbour, applied at every site of the vector u]
  • E_p(u) = 0.5 uᵀ A_p u   (see the sketch below)
  • Rows of A_p have the form
  • [0 ... 0 -1 0 ... 0 -1 4 -1 0 ... 0 -1 0 ...]

12
Stabilizing function: thin plate stabilizer
E_p(u) = 0.5 Σ_{i,j} (u_{i,j+1} + u_{i,j-1} - 2u_{i,j})² + (u_{i+1,j} + u_{i-1,j} - 2u_{i,j})²
         + 2(u_{i+1,j+1} - u_{i,j+1} - u_{i+1,j} + u_{i,j})²
[Figure: grid of values u_{i,j}, axes i and j]
16
Stabilizing function: thin plate stabilizer
E_p(u) = 0.5 Σ_{i,j} (u_{i,j+1} + u_{i,j-1} - 2u_{i,j})² + (u_{i+1,j} + u_{i-1,j} - 2u_{i,j})²
         + 2(u_{i+1,j+1} - u_{i,j+1} - u_{i+1,j} + u_{i,j})²
ATOM stencil:
            1
        2  -8   2
    1  -8  20  -8   1
        2  -8   2
            1
  • E_p(u) = 0.5 uᵀ A_p u   (see the sketch below)
  • Rows of A_p have the form
  • [0 0 1 0 0 ... 0 2 -8 2 0 0 ... 1 -8 20 -8 1 0 ... 0 2 -8 2 0 ... 0 1 0 ...]
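
The same construction extends to the thin plate, with second differences plus the doubled cross term; a sketch (again assuming free boundaries):

import scipy.sparse as sp

def thin_plate_Ap(n_i, n_j):
    """Assemble A_p for the thin plate prior on an n_i x n_j grid."""
    Ii, Ij = sp.identity(n_i), sp.identity(n_j)
    D1i = sp.diags([-1, 1], [0, 1], shape=(n_i - 1, n_i))        # first difference
    D1j = sp.diags([-1, 1], [0, 1], shape=(n_j - 1, n_j))
    D2i = sp.diags([1, -2, 1], [0, 1, 2], shape=(n_i - 2, n_i))  # second difference
    D2j = sp.diags([1, -2, 1], [0, 1, 2], shape=(n_j - 2, n_j))
    Dii = sp.kron(D2i, Ij)               # u_{i+1,j} - 2u_{i,j} + u_{i-1,j}
    Djj = sp.kron(Ii, D2j)               # u_{i,j+1} - 2u_{i,j} + u_{i,j-1}
    Dij = sp.kron(D1i, D1j)              # cross difference
    return (Dii.T @ Dii + Djj.T @ Djj + 2 * Dij.T @ Dij).tocsr()

Ap = thin_plate_Ap(7, 7)
print(Ap[24].toarray())  # interior row shows 20 at the centre, with -8, 2 and 1 around it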

17
Stabilizing function: examples (1-D)
[Figure: panels showing the data points, the membrane fit, the thin plate fit, and a thin plate + membrane combination]
18
Stabilizing function: examples (2-D)
[Figure: samples from u fitted with membrane, thin plate, and membrane + thin plate stabilizers]
20
Energy function
  • Data on grid:
  • d_{i,j} = u_{i,j} + e_{i,j}, where e_{i,j} is N(0, σ²)
  • E_d(u,d) = 0.5 Σ_{i,j} c_{i,j} (d_{i,j} - u_{i,j})², with c_{i,j} = σ⁻²
  • Data off grid (the weights h are interpolation coefficients; see the sketch after this list):
  • d_k = h_{0,0} u_{i,j} + h_{0,1} u_{i,j+1} + h_{1,0} u_{i+1,j} + h_{1,1} u_{i+1,j+1} + e_k
  • E_d(u,d) = 0.5 Σ_k c_k (d_k - H_k u)²
  • In all examples here we assume the data lie on the grid:
  • E_d(u,d) = 0.5 (u-d)ᵀ A_d (u-d)
  • A_d = σ⁻² I: the measurement variance is assumed constant for all data
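
If the h's are the standard bilinear interpolation coefficients (an assumption; the slides only name them h), one observation row H_k can be built as in this sketch (the helper name is illustrative):

import numpy as np

def bilinear_row(x, y, n_i, n_j):
    """Return flattened indices and weights h for d_k = H_k u at fractional (x, y)."""
    i, j = int(np.floor(x)), int(np.floor(y))
    a, b = x - i, y - j                      # fractional offsets in [0, 1)
    idx = [(i, j), (i, j + 1), (i + 1, j), (i + 1, j + 1)]
    h = [(1 - a) * (1 - b), (1 - a) * b, a * (1 - b), a * b]   # h00, h01, h10, h11
    flat = [p * n_j + q for p, q in idx]     # row-major flattening of the grid
    return flat, h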

21
Overall energy
  • E(u) = λ E_p(u) + (1-λ) E_d(u,d)   (λ is the regularization factor)
  •      = 0.5 λ uᵀ A_p u + 0.5 (1-λ)(u-d)ᵀ A_d (u-d)
  •      = 0.5 uᵀ A u - uᵀ b + const
  • where
  • A = λ A_p + (1-λ) A_d
  • b = (1-λ) A_d d
  • The solution for u can be obtained directly by minimizing E(u):
  • u = A⁻¹ b   (see the sketch below)
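
With A assembled sparsely, the direct solution is one sparse solve; a sketch for the on-grid case, reusing membrane_Ap from the earlier sketch (sigma and lam assumed known):

import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def map_estimate(d, lam, sigma, Ap):
    """Minimize E(u) = lam*E_p + (1-lam)*E_d by solving A u = b."""
    n = d.size
    Ad = sp.identity(n) / sigma**2          # A_d = sigma^{-2} I
    A = lam * Ap + (1 - lam) * Ad
    b = (1 - lam) * (Ad @ d.ravel())        # b = (1-lam) A_d d
    return spsolve(A.tocsc(), b).reshape(d.shape)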

22
Minimizing the overall energy: 1-D (λ = 0.5)
[Figure: membrane and thin plate fits, each from noisy observations and from noise-free observations]
23
Minimizing the overall energy: 2-D (λ = 0.5)
[Figure: original vs. noisy image; zero-mean, unit-variance Gaussian noise added to all elements]
24
Minimizing the overall energy: 2-D (λ = 0.5)
[Figure: original vs. reconstructions from the noisy image with membrane and thin plate stabilizers]
25
Minimizing the overall energy: 2-D (λ = 0.5)
[Figure: second example; original vs. noisy image, zero-mean unit-variance Gaussian noise added to all elements]
26
Minimizing the overall energy: 2-D (λ = 0.5)
[Figure: second example; original vs. membrane and thin plate reconstructions from the noisy image]
27
Minimizing energy by relaxation
  • Direct computation of A⁻¹ is inefficient
  • Large matrices: for a 256×256 grid, A has size 65536 × 65536
  • Sparseness of A is not utilized: only a small fraction of its elements are non-zero
  • Relaxation replaces the inversion of A with many local estimates:
  • u_i = a_{i,i}⁻¹ (b_i - Σ_{j≠i} a_{i,j} u_j)
  • Updates can be done in parallel
  • All local computations are very simple
  • Can be slow to converge (a sketch of the sweep follows)
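
A sketch of the relaxation sweep. The slide's parallel update corresponds to a Jacobi sweep (all u_i updated from the old values); this sketch does the sequential Gauss-Seidel variant, updating in place:

import numpy as np

def relax(A, b, n_sweeps):
    """Sweeps of u_i <- a_ii^{-1} (b_i - sum_{j != i} a_ij u_j)."""
    A = A.tocsr()
    diag = A.diagonal()
    u = np.zeros_like(b)
    for _ in range(n_sweeps):
        for i in range(len(b)):
            start, end = A.indptr[i], A.indptr[i + 1]
            s = A.data[start:end] @ u[A.indices[start:end]]  # includes the j = i term
            u[i] = (b[i] - (s - diag[i] * u[i])) / diag[i]
    return u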

28
Minimizing energy by relaxation: 1-D (λ = 0.5)
[Figure: membrane fit after 100, 500, and 1000 iterations]
29
Minimizing energy by relaxation: 1-D (λ = 0.5)
[Figure: thin plate fit after 1000, 10000, and 100000 iterations; much slower to converge]
30
Minimizing energy by relaxation: 2-D (λ = 0.5)
[Figure: original and membrane reconstructions after 1000, 10000, and 100000 iterations]
31
Minimizing energy by relaxation: 2-D (λ = 0.5)
[Figure: original and thin plate reconstructions after 1000, 10000, and 100000 iterations; much slower to converge]
32
Prior Models
  • A Boltzmann distribution based on the stabilizing
    function
  • P(u) K.exp(-Ep(u)/Tp)
  • K is a normalizing constant, Tp is temperature
  • Samples can be generated by repeated sampling of
    local distributions P(uiu)
  • P(uiu) Ziexp(-ai,i-1(ui ui)/2Tp)
  • ui ai,i-1(bi Sai,juj)
  • This is the local estimate of ui in the
    relaxation method
  • The variance of the local sample is Tp/ai,i
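
A sketch of the local Gibbs sampler for the prior (for the pure prior b = 0, so the local mean reduces to -a_ii^{-1} Σ_{j≠i} a_ij u_j; the u0 and rng arguments are illustrative conveniences):

import numpy as np

def gibbs_sample_prior(Ap, Tp, n_sweeps, u0=None, rng=None):
    """Sample P(u) = K exp(-E_p(u)/T_p) by sweeping the local Gaussians P(u_i | u)."""
    rng = rng or np.random.default_rng(0)
    Ap = Ap.tocsr()
    diag = Ap.diagonal()
    u = np.zeros(Ap.shape[0]) if u0 is None else u0.astype(float).copy()
    for _ in range(n_sweeps):
        for i in range(len(u)):
            start, end = Ap.indptr[i], Ap.indptr[i + 1]
            s = Ap.data[start:end] @ u[Ap.indices[start:end]]
            mean = -(s - diag[i] * u[i]) / diag[i]            # local estimate, b = 0
            u[i] = rng.normal(mean, np.sqrt(Tp / diag[i]))    # variance T_p / a_ii
    return u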

33
Samples from the prior distribution: 1-D
[Figure: samples from the membrane-stabilizer Boltzmann distribution and from the thin-plate-stabilizer Boltzmann distribution]
34
Samples from the prior distribution: 2-D
[Figure: sample from the membrane prior]
35
Samples from the prior distribution: 2-D
[Figure: sample from the thin plate prior]
36
Sampling prior distributions
  • Samples are fractal
  • They tend to favour high frequencies
  • Multi-grid sampling is used to get smoother samples
Initially, generate a sample on a very coarse grid
37
Sampling prior distributions
Interpolate from the coarse grid to a finer grid, and use the interpolated values to initialize Gibbs sampling on the less coarse grid
38
Sampling prior distributions
Repeat the process on a finer grid
39
Sampling prior distributions
Final sample for the entire grid (a coarse-to-fine sketch follows)
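
A coarse-to-fine sketch, reusing membrane_Ap and gibbs_sample_prior from the earlier sketches (the level sizes and sweep count are illustrative):

import numpy as np
from scipy.ndimage import zoom

def multigrid_sample(sizes, Tp, n_sweeps):
    """Sample a coarse grid, interpolate, and reuse as initialization for the finer grid."""
    u = np.zeros((sizes[0], sizes[0]))
    for n in sizes:
        u = zoom(u, n / u.shape[0], order=1)   # bilinear interpolation to the finer grid
        Ap = membrane_Ap(n, n)                 # prior matrix from the earlier sketch
        u = gibbs_sample_prior(Ap, Tp, n_sweeps, u0=u.ravel()).reshape(n, n)
    return u

sample = multigrid_sample([8, 16, 32, 64], Tp=1.0, n_sweeps=50)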
40
Multigrid sampling of the prior distribution
[Figure: multigrid samples from the membrane prior and from the thin plate prior]
41
Sensor models
  • Sparse data model:
  • Uses a simple energy function
  • Assumes the data points all lie on the grid
  • Only the sparse data model is used in the examples
  • Others, such as force field models, optical flow, image intensity etc., are not simulated for this presentation
  • The measurement variance is assumed constant for all data points

42
Posterior model
  • Simple Bayes rule:
  • P(u|d) = K exp(-E_p(u)/T_p - E_d(u,d))
  • Also a Gibbs distribution
  • 1/T_p plays the role of the regularization factor:
  • T_p = (1-λ)/λ
  • In the following figures only the thin plate prior is considered

43
Sampling the posterior model (T = 1)
44
MAP estimation from the Gibbs posterior
  • Restate the Gibbs posterior distribution as
  • P(u) = K exp(-E(u)/T)
  • E(u) is the total energy
  • T is again a temperature, not to be confused with the prior temperature T_p
  • Reduce T with the iterations
  • An iteration is defined as a complete sweep through the data
  • Convergence to the MAP estimate is guaranteed as T goes to 0, provided T does not decrease faster than 1/log(iter), where iter is the iteration number
  • In practice, much faster cooling is possible
  • For the simple sparse-data sensor model, the MAP estimate must be identical to that obtained using relaxation or matrix inversion (a sketch follows)
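
A sketch of annealed Gibbs sampling for the MAP estimate, with A and b as in the direct-solution sketch; the 1/log schedule is the guaranteed one, and faster cooling is used in practice:

import numpy as np

def annealed_map(A, b, T0, n_sweeps, rng=None):
    """Sample P(u) = K exp(-E(u)/T) while cooling T toward 0."""
    rng = rng or np.random.default_rng(0)
    A = A.tocsr()
    diag = A.diagonal()
    u = np.zeros(len(b))
    for k in range(1, n_sweeps + 1):
        T = T0 / np.log(k + 1)                  # guaranteed cooling schedule
        for i in range(len(u)):
            start, end = A.indptr[i], A.indptr[i + 1]
            s = A.data[start:end] @ u[A.indices[start:end]]
            mean = (b[i] - (s - diag[i] * u[i])) / diag[i]   # same local mean as relaxation
            u[i] = rng.normal(mean, np.sqrt(T / diag[i]))
    return u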

45
MAP estimates from the posterior: 1-D
[Figure: relaxation after 100000 iterations vs. annealed Gibbs sampling after 100000 iterations]
46
MAP estimates from the posterior: 2-D
[Figure: exact MAP solution vs. the MAP solution from annealed Gibbs sampling]
47
The contaminated Gaussian sensor model
  • Also a sparse data sensor model
  • Assumes the measurement error has two modes:
  • 1. A high-probability, low-variance Gaussian
  • 2. A low-probability, high-variance Gaussian
  • P(d_{i,j} | u) = (1-ε) N(u_{i,j}, σ₁²) + ε N(u_{i,j}, σ₂²)
  • 0.05 < ε < 0.1 and σ₂² >> σ₁²
  • The posterior probability is also a mixture of Gaussians:
  • (1-ε) P₁(d_{i,j} | u) + ε P₂(d_{i,j} | u)   (a likelihood sketch follows)
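
A sketch of the two-mode likelihood (the default values for eps, s1 and s2 are illustrative picks within the stated ranges):

import numpy as np

def contaminated_loglik(d, u, eps=0.07, s1=1.0, s2=10.0):
    """log P(d | u) for the contaminated Gaussian sensor model."""
    def normal(x, s):
        return np.exp(-0.5 * (x / s)**2) / (np.sqrt(2 * np.pi) * s)
    r = d - u
    return np.sum(np.log((1 - eps) * normal(r, s1) + eps * normal(r, s2)))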

48
Samples from the posterior using the contaminated Gaussian sensor model
49
MAP estimates with the contaminated Gaussian: 1-D
[Figure: MAP estimate using the single Gaussian sensor model vs. the contaminated Gaussian sensor model]
  • For the contaminated Gaussian there is no closed-form MAP estimate
  • Annealed Gibbs sampling provides a MAP estimate

50
MAP estimates with the contaminated Gaussian: 2-D
[Figure: MAP estimate using a single Gaussian sensor model vs. a contaminated Gaussian sensor model]
  • For the contaminated Gaussian, the MAP estimate is obtained using annealed Gibbs sampling

51
Why Bayesian?
  • The Bayesian and regularization solutions are identical for some models
  • The Bayesian approach provides several other advantages:
  • It handles complex sensor models, e.g. the contaminated Gaussian model
  • It provides uncertainty estimates
  • It provides a handle to estimate the optimal regularization factor
  • It provides the formalism for methods such as Kalman filtering
  • Etc.

52
Why Bayesian? Uncertainty measurement
  • The blue curve is the MAP estimate
  • The red curves show one standard deviation on either side

53
Why Bayesian? Uncertainty measurement (T = 1)
  • The figure is actually a sandwich
  • The surface in the middle is the MAP estimate
  • The outer surfaces indicate one standard deviation

54
Why Bayesian? Uncertainty measurement
  • Variance field
  • For the thin plate prior, the variance is constant except at the boundaries
  • The variance of the posterior deviates from the thin plate variance only at measured data points
  • Other prior distributions would have prettier variance and covariance fields

55
Why Bayesian? Optimizing the regularization factor
  • P(u|d) is a Gaussian
  • It has two terms, 1/sqrt(2πσ²) and exp(-0.5(u - ū)²/σ²)
  • -log P(u|d) therefore has two terms:
  • E1(d) = 0.5 log(2πσ²)
  • E2(d) = 0.5 (u - ū)²/σ²
  • Both terms are functions of σ²
  • σ² is a function of the regularization factor λ
  • As λ increases, E1(d) increases but E2(d) decreases
  • There is a specific value of λ at which E1(d) + E2(d) is minimal
  • This is the maximum likelihood estimate of λ

56
Why Bayesian? Optimizing the regularization factor
[Figure: MAP estimates for λ = 0.25, λ = 0.5, and λ = 0.75]
  • The black curve is the MAP estimate without measurement noise

57
Why Bayesian? Optimizing the regularization factor
[Figure: MAP estimates for λ = 0.25, λ = 0.5, and λ = 0.75, together with the no-measurement-noise estimate]
58
Why Bayesian? Optimizing the regularization factor: 1-D
[Figure: E1 + E2, E1, and E2 plotted against log(T)]
  • The optimal log(T) is around -1.9

59
Why Bayesian? Optimizing the regularization factor: 1-D
[Figure: estimate at the optimal T (= 0.1496)]

60
Why Bayesian? Optimizing the regularization factor: 2-D
[Figure: E1 + E2, E1, and E2 plotted against T]
  • The x axis is T (not log(T)); the optimal T is about 2.7
  • Estimating E1 and E2 requires computing the determinants of A and A_p
  • A_p is singular for the thin plate prior
  • Diagonal loading is not sufficient
  • Compute the determinant from the eigenvalues to avoid underflow/overflow (a sketch follows)
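
One way to get the log-determinant stably: sum the logs of the eigenvalues instead of forming the product of the eigenvalues, which would under- or overflow for a 65536 × 65536 matrix. Dropping the (near-)zero eigenvalues of the singular thin-plate A_p amounts to using its pseudo-determinant, which is an assumption about how the slides handle the singularity:

import numpy as np

def log_det(M, tol=1e-10):
    """log|M| via eigenvalues; A and A_p are symmetric, so eigvalsh applies."""
    w = np.linalg.eigvalsh(M.toarray())
    w = w[w > tol]                       # drop (near-)zero modes of the singular A_p
    return np.sum(np.log(w))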

61
Why Bayesian? Optimizing the regularization factor: 2-D
[Figure: reconstruction with no observation noise vs. the maximum likelihood estimate]
62
Why Bayesian? Kalman filtering
[Text]

63
Why Bayesian? Kalman filtering
[Animation 1]

64
Why Bayesian? Kalman filtering
[Animation 2]