Title: Expectation Maximization
1. Expectation Maximization
- Frank Dellaert
- Many slides adapted from Jean Ponce and David Forsyth
2. Missing variable problems
- In many problems, the maximum likelihood inference problem would be easy if some variables were known:
  - fitting: if we knew which line each token came from, it would be easy to determine the line parameters
  - segmentation: if we knew which segment each pixel came from, it would be easy to determine the segment parameters
  - etc.
- This sort of thing happens in statistics, too
3. Missing variable problems
- Strategy
  - estimate values for the missing variables
  - plug these in, now estimate the parameters
  - re-estimate the missing variables, continue
- E.g.,
  - guess which line gets which point
  - now fit the lines
  - now reallocate points to lines, using our knowledge of the lines
  - now refit, etc.
- Like K-means!
4. Missing variables - strategy
- EM
  - replace the missing variables with their expected values, given fixed values of the parameters
  - fix the missing variables, choose the parameters to maximise the likelihood given those fixed values
- E.g., iterate till convergence:
  - allocate each point to a line with a weight, which is the probability of the point given the line
  - refit the lines to the weighted set of points
- Converges to a local extremum
5. Segmentation Demo
Lost in Translation
Tokyo City Hall by Kenzo Tange
6. Hidden variable labels!
7. RGB Clusters
8. Low Error -> High Probability
9. EM
- E-step
  - calculate errors
  - calculate probabilities
- M-step
  - re-calculate the RGB clusters
- Demo with fixed sigma = 20 (segment01.m)
10. MATLAB code essentials

% init models
sigma = 20;
m(:,g) = 128 + 50*randn(3,1);
pi(g) = 1/nrClusters;
% E-step: calculate errors
Eg = zeros(h,w);
e = (I(:,:,c) - m(c,g))/sigma;
Eg = Eg + e.*e;
% unnormalized probabilities
qg = pi(g)*exp(-0.5*Eg);
% normalize probabilities
pg = qg./sumq;
% M-step
P = pg;
R = P.*I(:,:,1); G = P.*I(:,:,2); B = P.*I(:,:,3);
pi(g) = sum(P(:));
m(:,g) = [sum(R(:)); sum(G(:)); sum(B(:))]/pi(g);
pi = pi/sum(pi);
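Assembled into a complete loop, the essentials above might look like the sketch below (the test image, the number of clusters, and the iteration count are assumptions; this is not the original segment01.m):

% Minimal EM sketch for RGB clustering with fixed sigma (assumed setup,
% not the original segment01.m)
I = double(imread('peppers.png'));        % any RGB image; this filename is an assumption
[h, w, ~] = size(I);
nrClusters = 5; sigma = 20; nrIter = 10;
m = 128 + 50*randn(3, nrClusters);        % random initialization of cluster means
pi = ones(1, nrClusters)/nrClusters;      % uniform mixing weights
for it = 1:nrIter
  q = zeros(h, w, nrClusters);
  for g = 1:nrClusters                    % E-step: unnormalized responsibilities
    Eg = zeros(h, w);
    for c = 1:3
      e = (I(:,:,c) - m(c,g))/sigma;
      Eg = Eg + e.*e;
    end
    q(:,:,g) = pi(g)*exp(-0.5*Eg);
  end
  p = q ./ sum(q, 3);                     % normalize over clusters
  for g = 1:nrClusters                    % M-step: re-estimate means and weights
    P = p(:,:,g);
    pi(g) = sum(P(:));
    m(:,g) = [sum(sum(P.*I(:,:,1))); sum(sum(P.*I(:,:,2))); sum(sum(P.*I(:,:,3)))]/pi(g);
  end
  pi = pi/sum(pi);
end

With sigma held fixed, only the means and the mixing weights are re-estimated, exactly as on the previous slide.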
11. Expectation-Maximization
- See my TR online
- We have parameters Θ and data U
- Also hidden nuisance variables J
- We want to find the optimal Θ
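Written out in this notation, EM alternates between taking an expectation over the hidden variables J under the current parameter estimate and maximizing over Θ (this is the standard statement of the algorithm rather than a transcription of the slide):

\[ Q(\Theta;\Theta^t) \;=\; \sum_J P(J \mid U, \Theta^t)\,\log P(U, J \mid \Theta), \qquad \Theta^{t+1} \;=\; \arg\max_{\Theta}\, Q(\Theta;\Theta^t) \]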
12. Example Mixture
13. Posterior in Parameter Space
14. EM
15. Line Fitting
- Parameters θ = (φ, c)
- c: distance to the origin
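With this parameterization (the normal form below is an assumption, since the slide only shows a figure), the line is the set of points satisfying

\[ x\cos\phi + y\sin\phi + c = 0, \]

and the signed distance of a point \((x_i, y_i)\) to the line is \( d_i = x_i\cos\phi + y_i\sin\phi + c \).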
16. Lines and robustness
- We have one line, and n points
- Some come from the line, some from noise
- This is a mixture model
- We wish to determine
  - the line parameters
  - p(comes from line)
17. Estimating the mixture model
- Introduce a hidden variable δ_i for each point: 1 if the point is on the line, 0 if off
- If these are known, the objective function is (see the reconstruction below)
- Here K is a normalising constant, k_n is the noise intensity
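The formula itself did not survive the transcript. A plausible reconstruction, assuming a Gaussian model for on-line points and uniform noise of intensity k_n (as in the Forsyth and Ponce treatment), is the complete-data log-likelihood

\[ \log P(\text{data}, \delta \mid \theta) \;=\; \sum_i \Big[ \delta_i \Big( \log\lambda - \frac{(x_i\cos\phi + y_i\sin\phi + c)^2}{2\sigma^2} \Big) + (1-\delta_i)\big(\log(1-\lambda) + \log k_n\big) \Big] + K, \]

where λ is the mixing weight that appears on the next slide.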
18. Substituting for delta (no need to know formulas by heart)
- We shall substitute the expected value of δ_i, for a given θ - recall θ = (φ, c, λ)
- E[δ_i] = 1 · P(δ_i = 1 | θ) + 0 · P(δ_i = 0 | θ) = P(δ_i = 1 | θ)
- Notice that if k_n is small and positive, then if the distance is small this value is close to 1, and if it is large, close to zero
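The slide does not spell this probability out; under the same Gaussian-plus-uniform-noise assumption as above it takes the form

\[ E[\delta_i] \;=\; \frac{\lambda\, e^{-d_i^2/2\sigma^2}}{\lambda\, e^{-d_i^2/2\sigma^2} + (1-\lambda)\, k_n'}, \]

where d_i is the distance of point i to the line and k_n' absorbs the noise intensity and the Gaussian normalizing constant. This matches the behaviour described: for small positive k_n' the weight is close to 1 when the distance is small and close to 0 when it is large.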
19. Algorithm for line fitting
- Obtain some start point
- Now compute the E[δ_i]'s using the formula above
- Now compute the maximum likelihood estimate of the parameters (see the sketch below):
  - φ, c come from fitting to the weighted points
  - λ comes by counting
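A minimal MATLAB sketch of one such iteration, under the same assumptions as the formulas above (the constants sigma and kn, and the use of a weighted total-least-squares fit for φ and c, are illustrative choices rather than code from the slides):

% One EM iteration for robust line fitting (sketch); x and y are column
% vectors of point coordinates. The noise constant kn, sigma, and the
% weighted total-least-squares fit are assumptions, not the slides' code.
function [phi, c, lambda] = em_line_step(x, y, phi, c, lambda, sigma, kn)
  % E-step: expected deltas (weights) from the current line
  d   = x*cos(phi) + y*sin(phi) + c;               % signed point-line distances
  on  = lambda * exp(-0.5*(d/sigma).^2);           % line term
  off = (1 - lambda) * kn;                         % noise term
  w   = on ./ (on + off);                          % E[delta_i] for each point

  % M-step: weighted total least squares for (phi, c)
  sw = sum(w);
  mx = sum(w.*x)/sw;  my = sum(w.*y)/sw;           % weighted centroid
  Sxx = sum(w.*(x-mx).^2); Syy = sum(w.*(y-my).^2);
  Sxy = sum(w.*(x-mx).*(y-my));
  [V, D] = eig([Sxx Sxy; Sxy Syy]);
  [~, k] = min(diag(D));                           % normal = eigenvector of smallest eigenvalue
  n = V(:,k);
  phi = atan2(n(2), n(1));
  c = -(mx*cos(phi) + my*sin(phi));                % line passes through the weighted centroid

  % lambda by "counting": average weight
  lambda = sw / numel(x);
end

Iterating this step until the weights stop changing is the whole algorithm; λ is re-estimated simply as the average weight, which is the counting step above.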
21. The expected values of the deltas at the maximum (notice the one value close to zero).
22. Close-up of the fit
23. Other examples
- Segmentation
  - a segment is a Gaussian that emits feature vectors (which could contain colour, or colour and position, or colour, texture and position)
  - segment parameters are the mean and (perhaps) the covariance
  - if we knew which segment each point belonged to, estimating these parameters would be easy
  - the rest is on the same lines as fitting a line
- Fitting multiple lines
  - rather like fitting one line, except there are more hidden variables
  - easiest is to encode them as an array of hidden variables, which represent a table with a one where the ith point comes from the jth line, zeros otherwise (see the sketch below)
  - the rest is on the same lines as above
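As a small illustration of that table encoding (the line representation, the fixed sigma, and the variable names are the same illustrative assumptions as in the earlier sketches):

% Responsibility table for k lines: W(i,j) is the expected value of the
% hidden indicator "point i comes from line j" (assumed Gaussian model
% with mixing weights pis(j) and fixed sigma); x, y are column vectors.
function W = line_responsibilities(x, y, phi, c, pis, sigma)
  n = numel(x); k = numel(phi);
  W = zeros(n, k);
  for j = 1:k
    d = x*cos(phi(j)) + y*sin(phi(j)) + c(j);   % distances of all points to line j
    W(:,j) = pis(j) * exp(-0.5*(d/sigma).^2);   % unnormalized responsibilities
  end
  W = W ./ sum(W, 2);                           % each row now sums to one
end

In the M-step, column j of this table then serves as the weight vector when line j is refit.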
24. Issues with EM
- Local maxima
  - can be a serious nuisance in some problems
  - no guarantee that we have reached the right maximum
- Starting
  - using k-means to cluster the points is often a good idea (see the sketch below)
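For instance, a k-means initialization of the multi-line fit might look like the following (kmeans is from MATLAB's Statistics and Machine Learning Toolbox; the per-cluster total-least-squares fit and the variable names are illustrative assumptions):

% k-means initialization for multi-line EM: hard-assign points to k groups,
% fit a line to each group, and use the cluster fractions as initial weights.
% x and y are column vectors of point coordinates (assumed).
k = 4;
[idx, ~] = kmeans([x y], k);              % hard assignment of points to k groups
phi = zeros(1,k); c = zeros(1,k); pis = zeros(1,k);
for j = 1:k
  xj = x(idx == j); yj = y(idx == j);
  mx = mean(xj); my = mean(yj);
  S = cov([xj yj]);                       % total-least-squares fit inside cluster j
  [V, D] = eig(S);
  [~, s] = min(diag(D));
  n = V(:,s);
  phi(j) = atan2(n(2), n(1));
  c(j) = -(mx*cos(phi(j)) + my*sin(phi(j)));
  pis(j) = numel(xj)/numel(x);            % initial mixing weight = cluster fraction
end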
25. Local maximum
26. ...which is an excellent fit to some points
27. ...and the deltas for this maximum
28. A dataset that is well fitted by four lines
29. Result of EM fitting, with one line (or at least, one available local maximum).
30. Result of EM fitting, with two lines (or at least, one available local maximum).
31. Seven lines can produce a rather logical answer
32. Some generalities
- Many, but not all, problems that can be attacked with EM can also be attacked with RANSAC
  - you need to be able to get a parameter estimate from a manageably small number of random choices
  - RANSAC is usually better
- Didn't present EM in its most general form
  - in the general form, the likelihood may not be a linear function of the missing variables
  - in that case, one takes an expectation of the likelihood, rather than substituting expected values of the missing variables
  - this issue doesn't seem to arise in vision applications