Title: Expectation Maximization for GMM
1. Expectation Maximization for GMM
- Comp344 Tutorial
- Kai Zhang
2. GMM
- Model the data distribution by a combination of Gaussian functions
- Given a set of sample points, how can we estimate the parameters of the GMM?
3. EM Basic Idea
- Given data X and an initial parameter θ_t
- Assume a hidden variable Y
- 1. Study how Y is distributed based on the current knowledge (X and θ_t), i.e., p(Y | X, θ_t)
- Compute the expectation of the joint data likelihood under this distribution (called the Q function)
- 2. Maximize this expectation w.r.t. the to-be-determined parameter θ_{t+1}
- Iterate steps 1 and 2 until convergence (both steps are written out below)
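In standard EM notation (the slide's own equation images are not reproduced here), the two steps are:

Q(\theta \mid \theta_t) = \mathbb{E}_{Y \sim p(Y \mid X, \theta_t)}\!\left[ \log p(X, Y \mid \theta) \right],
\qquad
\theta_{t+1} = \arg\max_{\theta} \, Q(\theta \mid \theta_t)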
4. EM with GMM
- In the context of GMM
- X: the data points
- Y: which Gaussian component created which data point
- θ: the parameters of the mixture model (mixing weights π_k, means µ_k, covariances Σ_k)
- Constraint: the π_k's must sum up to 1, so that p(x) is a pdf (see the density below)
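In symbols (standard GMM notation; the slide's own formula is assumed to have this form):

p(x \mid \theta) = \sum_{k=1}^{m} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k),
\qquad
\sum_{k=1}^{m} \pi_k = 1, \;\; \pi_k \ge 0,
\qquad
\theta = \{\pi_k, \mu_k, \Sigma_k\}_{k=1}^{m}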
5. The Q Function under GMM
- How do we write the Q function under the GMM setting?
- The likelihood of a data set is the product of all the individual sample likelihoods, so (see the equation below)
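With n i.i.d. samples, this reads (standard notation, filling in the missing equation):

L(\theta) = \prod_{i=1}^{n} p(x_i \mid \theta)
\quad\Longrightarrow\quad
\log L(\theta) = \sum_{i=1}^{n} \log \sum_{k=1}^{m} \pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)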
6. The Q Function and the Updates
- The Q function specific to GMM is given below
- Plugging in the definition of p(x | θ_k) and computing the derivative w.r.t. the parameters, we obtain the iteration procedure
- E step: compute the posteriors of Y under the current parameters
- M step: re-estimate the parameters from those posteriors (both steps are written out below)
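The derivation itself is not reproduced here; the standard form of the Q function and of the resulting updates (n samples, m components) is:

Q(\theta \mid \theta_t) = \sum_{i=1}^{n} \sum_{k=1}^{m} \gamma_{ik}
\left[ \log \pi_k + \log \mathcal{N}(x_i \mid \mu_k, \Sigma_k) \right]

E step (posteriors under the current parameters θ_t):

\gamma_{ik} = p(k \mid x_i, \theta_t)
= \frac{\pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)}
       {\sum_{j=1}^{m} \pi_j \, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)}

M step (re-estimation):

N_k = \sum_{i=1}^{n} \gamma_{ik}, \quad
\pi_k^{\mathrm{new}} = \frac{N_k}{n}, \quad
\mu_k^{\mathrm{new}} = \frac{1}{N_k} \sum_{i=1}^{n} \gamma_{ik} \, x_i, \quad
\Sigma_k^{\mathrm{new}} = \frac{1}{N_k} \sum_{i=1}^{n} \gamma_{ik}
\, (x_i - \mu_k^{\mathrm{new}})(x_i - \mu_k^{\mathrm{new}})^{\top}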
7. Posteriors
- Intuitive meaning of the posterior γ_ik
- It is the posterior probability that x_i was created by the kth Gaussian component (a soft membership)
- The meaning of N_k
- Note that N_k is the summation of all the γ_ik that share the same k
- So it measures the strength of the kth Gaussian component (see the formulas below)
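In the notation above, the two quantities the slide refers to are:

\gamma_{ik} = p(k \mid x_i, \theta_t) \;\; \text{(soft membership of } x_i\text{)},
\qquad
N_k = \sum_{i=1}^{n} \gamma_{ik} \;\; \text{(effective number of points claimed by component } k\text{)}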
8. Comments
- GMM can be viewed as performing
- density estimation, in the form of a combination of a number of Gaussian functions
- clustering, where clusters correspond to the Gaussian components, and cluster assignment can be achieved through the Bayes rule
- GMM produces exactly what is needed in the Bayes decision rule: the prior probability and the class-conditional probability
- So GMM + Bayes rule can compute the posterior probability, hence solving the clustering problem (a sketch of this assignment step follows)
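A minimal sketch of that assignment step (not the tutorial's code; the function and variable names are illustrative, and the parameters are assumed to come from an already-fitted GMM):

```python
import numpy as np
from scipy.stats import multivariate_normal

def assign_clusters(X, pis, mus, covs):
    """X: (n, d) data; pis: (m,) priors; mus: (m, d) means; covs: (m, d, d) covariances."""
    m = len(pis)
    # Class-conditional densities p(x_i | k) for every point and every component.
    cond = np.stack([multivariate_normal.pdf(X, mean=mus[k], cov=covs[k])
                     for k in range(m)], axis=1)          # shape (n, m)
    joint = cond * pis                                    # p(k) * p(x_i | k)
    post = joint / joint.sum(axis=1, keepdims=True)       # Bayes rule: p(k | x_i)
    return post.argmax(axis=1), post                      # hard labels, soft memberships
```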
9. Illustration
Class \ Point (conditional prob.)   x1 (i=1)          x2 (i=2)
Class 1, k=1 (prior P1)             P11 = P(x1|k1)    P21 = P(x2|k1)
Class 2, k=2 (prior P2)             P12 = P(x1|k2)    P22 = P(x2|k2)
- Each row sums up to 1 (a Gaussian curve evaluated at the sample points)
- Each column, together with the priors, can be used to compute the posterior probability
10. Illustration
Conditional probability p(x_i | c):
        x1           x2           x3          x4          x5
c1      P11 = 0.35   P21 = 0.35   P31 = 0.1   P41 = 0.1   P51 = 0.1
c2      P12 = 0.05   P22 = 0.05   P32 = 0.3   P42 = 0.3   P52 = 0.3

Prior probability:
c1: P1 = 2/5
c2: P2 = 3/5

Posterior probability p(c | x_i) (by the Bayes rule):
        x1      x2      x3      x4      x5
c1      14/17   14/17   2/11    2/11    2/11
c2      3/17    3/17    9/11    9/11    9/11

Updated prior probability (average posterior over the five points):
c1: (28/17 + 6/11)/5
c2: (6/17 + 27/11)/5

Updated conditional probability: estimate the mean and covariance of each component from the points weighted by their posteriors
c1: x1 (14/17), x2 (14/17), x3 (2/11), x4 (2/11), x5 (2/11)
c2: x1 (3/17),  x2 (3/17),  x3 (9/11), x4 (9/11), x5 (9/11)
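The fractions above can be checked with a few lines (a sketch; the numbers are exactly the ones in the tables, no new data):

```python
from fractions import Fraction as F

# Class-conditional probabilities p(x_i | c) from the table above.
cond  = {'c1': [F(35, 100), F(35, 100), F(1, 10), F(1, 10), F(1, 10)],
         'c2': [F(5, 100),  F(5, 100),  F(3, 10), F(3, 10), F(3, 10)]}
prior = {'c1': F(2, 5), 'c2': F(3, 5)}

# Bayes rule per point: p(c | x_i) = p(c) p(x_i | c) / sum over c' of p(c') p(x_i | c').
post = {'c1': [], 'c2': []}
for i in range(5):
    z = sum(prior[c] * cond[c][i] for c in prior)
    for c in prior:
        post[c].append(prior[c] * cond[c][i] / z)

print(post['c1'])                           # 14/17, 14/17, 2/11, 2/11, 2/11
print(post['c2'])                           # 3/17, 3/17, 9/11, 9/11, 9/11
print({c: sum(post[c]) / 5 for c in post})  # updated priors: (28/17 + 6/11)/5 and (6/17 + 27/11)/5
```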
11. Initialization
- Perform an initial clustering and divide the data into m clusters (e.g., simply cut one dimension into m segments)
- For the kth cluster
- Its mean becomes the kth Gaussian component mean (µ_k)
- Its covariance becomes the kth Gaussian component covariance (Σ_k)
- The fraction of samples it contains becomes the prior of the kth Gaussian component (π_k); a code sketch of this step follows
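A minimal sketch of this initialization (an assumed implementation, not the tutorial's code; it cuts the first dimension into m equal-frequency segments):

```python
import numpy as np

def init_gmm(X, m):
    """Initialize (priors, means, covariances) from a crude clustering of X, shape (n, d)."""
    # Cut the first dimension into m segments and label each point by its segment.
    edges = np.quantile(X[:, 0], np.linspace(0.0, 1.0, m + 1))
    labels = np.clip(np.searchsorted(edges, X[:, 0], side='right') - 1, 0, m - 1)
    pis  = np.array([np.mean(labels == k) for k in range(m)])           # sample fractions -> pi_k
    mus  = np.array([X[labels == k].mean(axis=0) for k in range(m)])    # cluster means -> mu_k
    covs = np.array([np.cov(X[labels == k].T) for k in range(m)])       # cluster covariances -> Sigma_k
    return pis, mus, covs
```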
12. EM Iterations
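The slide shows the iterations graphically; below is a minimal, self-contained sketch of the loop that combines the E and M steps written out earlier (illustrative names, not the tutorial's code):

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, pis, mus, covs, n_iter=100, tol=1e-6):
    """Run EM for a GMM from the given initialization; X has shape (n, d)."""
    n, m = X.shape[0], len(pis)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E step: responsibilities gamma_ik = p(k | x_i, theta_t).
        cond = np.stack([multivariate_normal.pdf(X, mean=mus[k], cov=covs[k])
                         for k in range(m)], axis=1)      # (n, m)
        joint = cond * pis
        gamma = joint / joint.sum(axis=1, keepdims=True)
        # M step: re-estimate priors, means, and covariances.
        Nk = gamma.sum(axis=0)                            # effective counts N_k
        pis = Nk / n
        mus = (gamma.T @ X) / Nk[:, None]
        covs = np.stack([((X - mus[k]).T * gamma[:, k]) @ (X - mus[k]) / Nk[k]
                         for k in range(m)])
        # Stop once the log-likelihood no longer improves.
        ll = np.log(joint.sum(axis=1)).sum()
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return pis, mus, covs
```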
13. Applications: Image Segmentation