Regularization of energy-based representations - PowerPoint PPT Presentation

Learn more at: http://www.ai.mit.edu

Transcript and Presenter's Notes

Title: Regularization of energy-based representations

1
Regularization of energy-based representations
  • Minimize the total energy E(u) = λ E_p(u) + (1-λ) E_d(u,d)
  • E_p(u): stabilizing function, a smoothness constraint
  • Membrane stabilizer:
    E_p(u) = 0.5 Σ_{i,j} (u_{i,j+1} - u_{i,j})² + (u_{i+1,j} - u_{i,j})²
  • Thin plate stabilizer:
    E_p(u) = 0.5 Σ_{i,j} (u_{i,j+1} + u_{i,j-1} - 2u_{i,j})² + (u_{i+1,j} + u_{i-1,j} - 2u_{i,j})²
             + 2(u_{i+1,j+1} - u_{i,j+1} - u_{i+1,j} + u_{i,j})²
  • Linear combinations of the two are also possible
  • E_d(u,d): energy function, measures the compatibility between the reconstruction u and the observed data d
  • E_d(u,d) = 0.5 Σ_{i,j} c_{i,j} (d_{i,j} - u_{i,j})²
  • c_{i,j} is the inverse of the variance of measurement d_{i,j} (a sketch of these energies follows)
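
A minimal NumPy sketch of these energies (the function names and the free-boundary handling at the grid edges are illustrative assumptions, not from the slides):

import numpy as np

def membrane_energy(u):
    """E_p(u) = 0.5 * sum of squared first differences along i and j."""
    dj = u[:, 1:] - u[:, :-1]          # u_{i,j+1} - u_{i,j}
    di = u[1:, :] - u[:-1, :]          # u_{i+1,j} - u_{i,j}
    return 0.5 * (np.sum(dj**2) + np.sum(di**2))

def thin_plate_energy(u):
    """E_p(u) = 0.5 * sum of squared second differences plus twice the cross term."""
    djj = u[:, 2:] - 2 * u[:, 1:-1] + u[:, :-2]    # u_{i,j+1} - 2u_{i,j} + u_{i,j-1}
    dii = u[2:, :] - 2 * u[1:-1, :] + u[:-2, :]    # u_{i+1,j} - 2u_{i,j} + u_{i-1,j}
    dij = u[1:, 1:] - u[1:, :-1] - u[:-1, 1:] + u[:-1, :-1]  # cross difference
    return 0.5 * (np.sum(djj**2) + np.sum(dii**2) + 2 * np.sum(dij**2))

def data_energy(u, d, c):
    """E_d(u,d) = 0.5 * sum c_{i,j} (d_{i,j} - u_{i,j})^2."""
    return 0.5 * np.sum(c * (d - u)**2)

def total_energy(u, d, c, lam):
    """E(u) = lam * E_p(u) + (1 - lam) * E_d(u,d), with a membrane prior here."""
    return lam * membrane_energy(u) + (1 - lam) * data_energy(u, d, c)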

2
Stabilizing function: membrane stabilizer
E_p(u) = 0.5 Σ_{i,j} (u_{i,j+1} - u_{i,j})² + (u_{i+1,j} - u_{i,j})²
[Figure: surface of grid values u_{i,j} over axes i and j]
5
Stabilizing function: membrane stabilizer
E_p(u) = 0.5 Σ_{i,j} (u_{i,j+1} - u_{i,j})² + (u_{i+1,j} - u_{i,j})²
[Figure: grid of values u_{i,j}, axes i and j]
  • ATOM: expand each squared difference into its atomic product terms

6
Stabilizing function: membrane stabilizer
E_p(u) = 0.5 Σ_{i,j} (u_{i,j+1} - u_{i,j})² + (u_{i+1,j} - u_{i,j})²
  • ATOM
  • (u_{i,j+1} - u_{i,j})² = u_{i,j}u_{i,j} + u_{i,j+1}u_{i,j+1} - u_{i,j}u_{i,j+1} - u_{i,j+1}u_{i,j}
[Figure: stencil so far: 1 at (i,j), -1 at (i,j+1)]
7
Stabilizing function: membrane stabilizer
E_p(u) = 0.5 Σ_{i,j} (u_{i,j+1} - u_{i,j})² + (u_{i+1,j} - u_{i,j})²
  • ATOM
  • (u_{i+1,j} - u_{i,j})² = u_{i,j}u_{i,j} + u_{i+1,j}u_{i+1,j} - u_{i,j}u_{i+1,j} - u_{i+1,j}u_{i,j}
[Figure: stencil so far: 2 at (i,j), -1 at (i,j+1) and (i+1,j)]
8
Stabilizing function: membrane stabilizer
E_p(u) = 0.5 Σ_{i,j} (u_{i,j+1} - u_{i,j})² + (u_{i+1,j} - u_{i,j})²
  • ATOM
  • The j-direction term at the neighbouring site, (u_{i,j} - u_{i,j-1})², contributes another u_{i,j}u_{i,j} and the cross terms -u_{i,j}u_{i,j-1} - u_{i,j-1}u_{i,j}
[Figure: stencil so far: 3 at (i,j), -1 at (i,j+1), (i+1,j) and (i,j-1)]
9
Stabilizing function: membrane stabilizer
E_p(u) = 0.5 Σ_{i,j} (u_{i,j+1} - u_{i,j})² + (u_{i+1,j} - u_{i,j})²
  • ATOM
  • The i-direction term at the neighbouring site, (u_{i,j} - u_{i-1,j})², contributes the final u_{i,j}u_{i,j} and its cross terms
[Figure: complete stencil: 4 at (i,j), -1 at each of the four neighbours]
11
Stabilizing function: membrane stabilizer
E_p(u) = 0.5 Σ_{i,j} (u_{i,j+1} - u_{i,j})² + (u_{i+1,j} - u_{i,j})²
[Figure: the complete 5-point stencil, 4 at the centre and -1 at each neighbour, applied at every site of the vector u]
  • E_p(u) = 0.5 uᵀ A_p u   (see the sketch below)
  • Rows of A_p have the form
  • [0 ... 0 -1 0 ... 0 -1 4 -1 0 ... 0 -1 0 ...]

12
Stabilizing function: thin plate stabilizer
E_p(u) = 0.5 Σ_{i,j} (u_{i,j+1} + u_{i,j-1} - 2u_{i,j})² + (u_{i+1,j} + u_{i-1,j} - 2u_{i,j})²
         + 2(u_{i+1,j+1} - u_{i,j+1} - u_{i+1,j} + u_{i,j})²
[Figure: grid of values u_{i,j}, axes i and j]
16
Stabilizing function: thin plate stabilizer
E_p(u) = 0.5 Σ_{i,j} (u_{i,j+1} + u_{i,j-1} - 2u_{i,j})² + (u_{i+1,j} + u_{i-1,j} - 2u_{i,j})²
         + 2(u_{i+1,j+1} - u_{i,j+1} - u_{i+1,j} + u_{i,j})²
ATOM stencil:
            1
        2  -8   2
    1  -8  20  -8   1
        2  -8   2
            1
  • E_p(u) = 0.5 uᵀ A_p u   (see the sketch below)
  • Rows of A_p have the form
  • [0 0 1 0 0 ... 0 2 -8 2 0 0 ... 1 -8 20 -8 1 0 ... 0 2 -8 2 0 ... 0 1 0 ...]
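
The same construction extends to the thin plate, with second differences plus the doubled cross term; a sketch (again assuming free boundaries):

import scipy.sparse as sp

def thin_plate_Ap(n_i, n_j):
    """Assemble A_p for the thin plate prior on an n_i x n_j grid."""
    Ii, Ij = sp.identity(n_i), sp.identity(n_j)
    D1i = sp.diags([-1, 1], [0, 1], shape=(n_i - 1, n_i))        # first difference
    D1j = sp.diags([-1, 1], [0, 1], shape=(n_j - 1, n_j))
    D2i = sp.diags([1, -2, 1], [0, 1, 2], shape=(n_i - 2, n_i))  # second difference
    D2j = sp.diags([1, -2, 1], [0, 1, 2], shape=(n_j - 2, n_j))
    Dii = sp.kron(D2i, Ij)               # u_{i+1,j} - 2u_{i,j} + u_{i-1,j}
    Djj = sp.kron(Ii, D2j)               # u_{i,j+1} - 2u_{i,j} + u_{i,j-1}
    Dij = sp.kron(D1i, D1j)              # cross difference
    return (Dii.T @ Dii + Djj.T @ Djj + 2 * Dij.T @ Dij).tocsr()

Ap = thin_plate_Ap(7, 7)
print(Ap[24].toarray())  # interior row shows 20 at the centre, with -8, 2 and 1 around it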

17
Stabilizing function: examples (1-D)
[Figure: panels showing the data points, the membrane fit, the thin plate fit, and a thin plate + membrane combination]
18
Stabilizing function: examples (2-D)
[Figure: samples from u fitted with membrane, thin plate, and membrane + thin plate stabilizers]
20
Energy function
  • Data on grid:
  • d_{i,j} = u_{i,j} + e_{i,j}, where e_{i,j} is N(0, σ²)
  • E_d(u,d) = 0.5 Σ_{i,j} c_{i,j} (d_{i,j} - u_{i,j})², with c_{i,j} = σ⁻²
  • Data off grid (the weights h are interpolation coefficients; see the sketch after this list):
  • d_k = h_{0,0} u_{i,j} + h_{0,1} u_{i,j+1} + h_{1,0} u_{i+1,j} + h_{1,1} u_{i+1,j+1} + e_k
  • E_d(u,d) = 0.5 Σ_k c_k (d_k - H_k u)²
  • In all examples here we assume the data lie on the grid:
  • E_d(u,d) = 0.5 (u-d)ᵀ A_d (u-d)
  • A_d = σ⁻² I: the measurement variance is assumed constant for all data
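
If the h's are the standard bilinear interpolation coefficients (an assumption; the slides only name them h), one observation row H_k can be built as in this sketch (the helper name is illustrative):

import numpy as np

def bilinear_row(x, y, n_i, n_j):
    """Return flattened indices and weights h for d_k = H_k u at fractional (x, y)."""
    i, j = int(np.floor(x)), int(np.floor(y))
    a, b = x - i, y - j                      # fractional offsets in [0, 1)
    idx = [(i, j), (i, j + 1), (i + 1, j), (i + 1, j + 1)]
    h = [(1 - a) * (1 - b), (1 - a) * b, a * (1 - b), a * b]   # h00, h01, h10, h11
    flat = [p * n_j + q for p, q in idx]     # row-major flattening of the grid
    return flat, h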

21
Overall energy
  • E(u) = λ E_p(u) + (1-λ) E_d(u,d)   (λ is the regularization factor)
  •      = 0.5 λ uᵀ A_p u + 0.5 (1-λ)(u-d)ᵀ A_d (u-d)
  •      = 0.5 uᵀ A u - uᵀ b + const
  • where
  • A = λ A_p + (1-λ) A_d
  • b = (1-λ) A_d d
  • The solution for u can be obtained directly by minimizing E(u):
  • u = A⁻¹ b   (see the sketch below)
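
With A assembled sparsely, the direct solution is one sparse solve; a sketch for the on-grid case, reusing membrane_Ap from the earlier sketch (sigma and lam assumed known):

import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def map_estimate(d, lam, sigma, Ap):
    """Minimize E(u) = lam*E_p + (1-lam)*E_d by solving A u = b."""
    n = d.size
    Ad = sp.identity(n) / sigma**2          # A_d = sigma^{-2} I
    A = lam * Ap + (1 - lam) * Ad
    b = (1 - lam) * (Ad @ d.ravel())        # b = (1-lam) A_d d
    return spsolve(A.tocsc(), b).reshape(d.shape)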

22
Minimizing the overall energy: 1-D (λ = 0.5)
[Figure: membrane and thin plate fits, each from noisy observations and from noise-free observations]
23
Minimizing the overall energy: 2-D (λ = 0.5)
[Figure: original vs. noisy image; zero-mean, unit-variance Gaussian noise added to all elements]
24
Minimizing the overall energy: 2-D (λ = 0.5)
[Figure: original vs. reconstructions from the noisy image with membrane and thin plate stabilizers]
25
Minimizing the overall energy: 2-D (λ = 0.5)
[Figure: second example; original vs. noisy image, zero-mean unit-variance Gaussian noise added to all elements]
26
Minimizing the overall energy: 2-D (λ = 0.5)
[Figure: second example; original vs. membrane and thin plate reconstructions from the noisy image]
27
Minimizing energy by relaxation
  • Direct computation of A⁻¹ is inefficient
  • Large matrices: for a 256×256 grid, A has size 65536 × 65536
  • Sparseness of A is not utilized: only a small fraction of its elements are non-zero
  • Relaxation replaces the inversion of A with many local estimates:
  • u_i = a_{i,i}⁻¹ (b_i - Σ_{j≠i} a_{i,j} u_j)
  • Updates can be done in parallel
  • All local computations are very simple
  • Can be slow to converge (a sketch of the sweep follows)
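
A sketch of the relaxation sweep. The slide's parallel update corresponds to a Jacobi sweep (all u_i updated from the old values); this sketch does the sequential Gauss-Seidel variant, updating in place:

import numpy as np

def relax(A, b, n_sweeps):
    """Sweeps of u_i <- a_ii^{-1} (b_i - sum_{j != i} a_ij u_j)."""
    A = A.tocsr()
    diag = A.diagonal()
    u = np.zeros_like(b)
    for _ in range(n_sweeps):
        for i in range(len(b)):
            start, end = A.indptr[i], A.indptr[i + 1]
            s = A.data[start:end] @ u[A.indices[start:end]]  # includes the j = i term
            u[i] = (b[i] - (s - diag[i] * u[i])) / diag[i]
    return u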

28
Minimizing energy by relaxation: 1-D (λ = 0.5)
[Figure: membrane fit after 100, 500, and 1000 iterations]
29
Minimizing energy by relaxation: 1-D (λ = 0.5)
[Figure: thin plate fit after 1000, 10000, and 100000 iterations; much slower to converge]
30
Minimizing energy by relaxation: 2-D (λ = 0.5)
[Figure: original and membrane reconstructions after 1000, 10000, and 100000 iterations]
31
Minimizing energy by relaxation: 2-D (λ = 0.5)
[Figure: original and thin plate reconstructions after 1000, 10000, and 100000 iterations; much slower to converge]
32
Prior Models
  • A Boltzmann distribution based on the stabilizing
    function
  • P(u) K.exp(-Ep(u)/Tp)
  • K is a normalizing constant, Tp is temperature
  • Samples can be generated by repeated sampling of
    local distributions P(uiu)
  • P(uiu) Ziexp(-ai,i-1(ui ui)/2Tp)
  • ui ai,i-1(bi Sai,juj)
  • This is the local estimate of ui in the
    relaxation method
  • The variance of the local sample is Tp/ai,i
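
A sketch of the local Gibbs sampler for the prior (for the pure prior b = 0, so the local mean reduces to -a_ii^{-1} Σ_{j≠i} a_ij u_j; the u0 and rng arguments are illustrative conveniences):

import numpy as np

def gibbs_sample_prior(Ap, Tp, n_sweeps, u0=None, rng=None):
    """Sample P(u) = K exp(-E_p(u)/T_p) by sweeping the local Gaussians P(u_i | u)."""
    rng = rng or np.random.default_rng(0)
    Ap = Ap.tocsr()
    diag = Ap.diagonal()
    u = np.zeros(Ap.shape[0]) if u0 is None else u0.astype(float).copy()
    for _ in range(n_sweeps):
        for i in range(len(u)):
            start, end = Ap.indptr[i], Ap.indptr[i + 1]
            s = Ap.data[start:end] @ u[Ap.indices[start:end]]
            mean = -(s - diag[i] * u[i]) / diag[i]            # local estimate, b = 0
            u[i] = rng.normal(mean, np.sqrt(Tp / diag[i]))    # variance T_p / a_ii
    return u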

33
Samples from the prior distribution: 1-D
[Figure: samples from the membrane-stabilizer Boltzmann distribution and from the thin-plate-stabilizer Boltzmann distribution]
34
Samples from the prior distribution: 2-D
[Figure: sample from the membrane prior]
35
Samples from the prior distribution: 2-D
[Figure: sample from the thin plate prior]
36
Sampling prior distributions
  • Samples are fractal
  • They tend to favour high frequencies
  • Multi-grid sampling is used to get smoother samples
Initially, generate a sample on a very coarse grid
37
Sampling prior distributions
Interpolate from the coarse grid to a finer grid, and use the interpolated values to initialize Gibbs sampling on the less coarse grid
38
Sampling prior distributions
Repeat the process on a finer grid
39
Sampling prior distributions
Final sample for the entire grid (a coarse-to-fine sketch follows)
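
A coarse-to-fine sketch, reusing membrane_Ap and gibbs_sample_prior from the earlier sketches (the level sizes and sweep count are illustrative):

import numpy as np
from scipy.ndimage import zoom

def multigrid_sample(sizes, Tp, n_sweeps):
    """Sample a coarse grid, interpolate, and reuse as initialization for the finer grid."""
    u = np.zeros((sizes[0], sizes[0]))
    for n in sizes:
        u = zoom(u, n / u.shape[0], order=1)   # bilinear interpolation to the finer grid
        Ap = membrane_Ap(n, n)                 # prior matrix from the earlier sketch
        u = gibbs_sample_prior(Ap, Tp, n_sweeps, u0=u.ravel()).reshape(n, n)
    return u

sample = multigrid_sample([8, 16, 32, 64], Tp=1.0, n_sweeps=50)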
40
Multigrid sampling of the prior distribution
[Figure: multigrid samples from the membrane prior and from the thin plate prior]
41
Sensor models
  • Sparse data model:
  • Uses a simple energy function
  • Assumes the data points all lie on the grid
  • Only the sparse data model is used in the examples
  • Others, such as force field models, optical flow, image intensity etc., are not simulated for this presentation
  • The measurement variance is assumed constant for all data points

42
Posterior model
  • Simple Bayes rule:
  • P(u|d) = K exp(-E_p(u)/T_p - E_d(u,d))
  • Also a Gibbs distribution
  • 1/T_p plays the role of the regularization factor:
  • T_p = (1-λ)/λ
  • In the following figures only the thin plate prior is considered

43
Sampling the posterior model (T = 1)
44
MAP estimation from the Gibbs posterior
  • Restate the Gibbs posterior distribution as
  • P(u) = K exp(-E(u)/T)
  • E(u) is the total energy
  • T is again a temperature, not to be confused with the prior temperature T_p
  • Reduce T with the iterations
  • An iteration is defined as a complete sweep through the data
  • Convergence to the MAP estimate is guaranteed as T goes to 0, provided T does not decrease faster than 1/log(iter), where iter is the iteration number
  • In practice, much faster cooling is possible
  • For the simple sparse-data sensor model, the MAP estimate must be identical to that obtained using relaxation or matrix inversion (a sketch follows)
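
A sketch of annealed Gibbs sampling for the MAP estimate, with A and b as in the direct-solution sketch; the 1/log schedule is the guaranteed one, and faster cooling is used in practice:

import numpy as np

def annealed_map(A, b, T0, n_sweeps, rng=None):
    """Sample P(u) = K exp(-E(u)/T) while cooling T toward 0."""
    rng = rng or np.random.default_rng(0)
    A = A.tocsr()
    diag = A.diagonal()
    u = np.zeros(len(b))
    for k in range(1, n_sweeps + 1):
        T = T0 / np.log(k + 1)                  # guaranteed cooling schedule
        for i in range(len(u)):
            start, end = A.indptr[i], A.indptr[i + 1]
            s = A.data[start:end] @ u[A.indices[start:end]]
            mean = (b[i] - (s - diag[i] * u[i])) / diag[i]   # same local mean as relaxation
            u[i] = rng.normal(mean, np.sqrt(T / diag[i]))
    return u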

45
MAP estimates from the posterior: 1-D
[Figure: relaxation after 100000 iterations vs. annealed Gibbs sampling after 100000 iterations]
46
MAP estimates from the posterior: 2-D
[Figure: exact MAP solution vs. the MAP solution from annealed Gibbs sampling]
47
The contaminated Gaussian sensor model
  • Also a sparse data sensor model
  • Assumes the measurement error has two modes:
  • 1. A high-probability, low-variance Gaussian
  • 2. A low-probability, high-variance Gaussian
  • P(d_{i,j} | u) = (1-ε) N(u_{i,j}, σ₁²) + ε N(u_{i,j}, σ₂²)
  • 0.05 < ε < 0.1 and σ₂² >> σ₁²
  • The posterior probability is also a mixture of Gaussians:
  • (1-ε) P₁(d_{i,j} | u) + ε P₂(d_{i,j} | u)   (a likelihood sketch follows)
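
A sketch of the two-mode likelihood (the default values for eps, s1 and s2 are illustrative picks within the stated ranges):

import numpy as np

def contaminated_loglik(d, u, eps=0.07, s1=1.0, s2=10.0):
    """log P(d | u) for the contaminated Gaussian sensor model."""
    def normal(x, s):
        return np.exp(-0.5 * (x / s)**2) / (np.sqrt(2 * np.pi) * s)
    r = d - u
    return np.sum(np.log((1 - eps) * normal(r, s1) + eps * normal(r, s2)))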

48
Samples from the posterior using the contaminated Gaussian sensor model
49
MAP estimates with the contaminated Gaussian: 1-D
[Figure: MAP estimate using the single Gaussian sensor model vs. the contaminated Gaussian sensor model]
  • For the contaminated Gaussian there is no closed-form MAP estimate
  • Annealed Gibbs sampling provides a MAP estimate

50
MAP estimates with the contaminated Gaussian: 2-D
[Figure: MAP estimate using a single Gaussian sensor model vs. a contaminated Gaussian sensor model]
  • For the contaminated Gaussian, the MAP estimate is obtained using annealed Gibbs sampling

51
Why Bayesian?
  • The Bayesian and regularization solutions are identical for some models
  • The Bayesian approach provides several other advantages:
  • It handles complex sensor models, e.g. the contaminated Gaussian model
  • It provides uncertainty estimates
  • It provides a handle to estimate the optimal regularization factor
  • It provides the formalism for methods such as Kalman filtering
  • Etc.

52
Why Bayesian? Uncertainty measurement
  • The blue curve is the MAP estimate
  • The red curves show one standard deviation on either side

53
Why Bayesian? Uncertainty measurement (T = 1)
  • The figure is actually a sandwich
  • The surface in the middle is the MAP estimate
  • The outer surfaces indicate one standard deviation

54
Why Bayesian? Uncertainty measurement
  • Variance field
  • For the thin plate prior, the variance is constant except at the boundaries
  • The variance of the posterior deviates from the thin plate variance only at measured data points
  • Other prior distributions would have prettier variance and covariance fields

55
Why Bayesian? Optimizing the regularization factor
  • P(u|d) is a Gaussian
  • It has two terms, 1/sqrt(2πσ²) and exp(-0.5(u - ū)²/σ²)
  • -log P(u|d) therefore has two terms:
  • E1(d) = 0.5 log(2πσ²)
  • E2(d) = 0.5 (u - ū)²/σ²
  • Both terms are functions of σ²
  • σ² is a function of the regularization factor λ
  • As λ increases, E1(d) increases but E2(d) decreases
  • There is a specific value of λ at which E1(d) + E2(d) is minimal
  • This is the maximum likelihood estimate of λ

56
Why Bayesian? Optimizing the regularization factor
[Figure: MAP estimates for λ = 0.25, λ = 0.5, and λ = 0.75]
  • The black curve is the MAP estimate without measurement noise

57
Why Bayesian? Optimizing the regularization factor
[Figure: MAP estimates for λ = 0.25, λ = 0.5, and λ = 0.75, together with the no-measurement-noise estimate]
58
Why Bayesian? Optimizing the regularization factor: 1-D
[Figure: E1 + E2, E1, and E2 plotted against log(T)]
  • The optimal log(T) is around -1.9

59
Why Bayesian? Optimizing the regularization factor: 1-D
[Figure: estimate at the optimal T (= 0.1496)]

60
Why Bayesian? Optimizing the regularization factor: 2-D
[Figure: E1 + E2, E1, and E2 plotted against T]
  • The x axis is T (not log(T)); the optimal T is about 2.7
  • Estimating E1 and E2 requires computing the determinants of A and A_p
  • A_p is singular for the thin plate prior
  • Diagonal loading is not sufficient
  • Compute the determinant from the eigenvalues to avoid underflow/overflow (a sketch follows)
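
One way to get the log-determinant stably: sum the logs of the eigenvalues instead of forming the product of the eigenvalues, which would under- or overflow for a 65536 × 65536 matrix. Dropping the (near-)zero eigenvalues of the singular thin-plate A_p amounts to using its pseudo-determinant, which is an assumption about how the slides handle the singularity:

import numpy as np

def log_det(M, tol=1e-10):
    """log|M| via eigenvalues; A and A_p are symmetric, so eigvalsh applies."""
    w = np.linalg.eigvalsh(M.toarray())
    w = w[w > tol]                       # drop (near-)zero modes of the singular A_p
    return np.sum(np.log(w))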

61
Why Bayesian? Optimizing the regularization factor: 2-D
[Figure: reconstruction with no observation noise vs. the maximum likelihood estimate]
62
Why Bayesian? Kalman filtering
[Text]

63
Why Bayesian? Kalman filtering
[Animation 1]

64
Why Bayesian? Kalman filtering
[Animation 2]