Title: Gaussian Mixture Models
1. Gaussian Mixture Models
Clustering Methods Part 8
Ville Hautamäki
- Speech and Image Processing Unit, Department of Computer Science, University of Joensuu, FINLAND
2. Preliminaries
- We assume that the dataset X has been generated by a parametric distribution p(X).
- Estimation of the parameters of p is known as density estimation.
- We consider the Gaussian distribution.
Figures taken from http://research.microsoft.com/cmbishop/PRML/
3. Typical parameters (1)
- Mean (µ): the average value of p(X), also called the expectation.
- Variance (σ²): a measure of the variability in p(X) around the mean.
4. Typical parameters (2)
- Covariance measures how much two variables vary together.
- Covariance matrix: the collection of covariances between all dimensions.
- The diagonal of the covariance matrix contains the variances of each attribute.
5. One-dimensional Gaussian
- Parameters to be estimated are the mean (µ) and the variance (σ²).
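For reference, the one-dimensional Gaussian density with these parameters can be written in its standard form as
\[ \mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{(x-\mu)^2}{2\sigma^2} \right) \]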
6. Multivariate Gaussian (1)
- In the multivariate case we have a covariance matrix instead of a variance.
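In the D-dimensional case the standard density is
\[ \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \Sigma) = \frac{1}{(2\pi)^{D/2}\,|\Sigma|^{1/2}} \exp\!\left( -\frac{1}{2} (\mathbf{x}-\boldsymbol{\mu})^{\top} \Sigma^{-1} (\mathbf{x}-\boldsymbol{\mu}) \right) \]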
7. Multivariate Gaussian (2)
[Figure: Gaussian densities with diagonal, single (spherical) and full covariance matrices]
Complete data log likelihood
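Assuming N i.i.d. observations x_1, ..., x_N from a single Gaussian, this log likelihood takes the standard form
\[ \ln p(X \mid \boldsymbol{\mu}, \Sigma) = \sum_{n=1}^{N} \ln \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}, \Sigma) = -\frac{ND}{2}\ln(2\pi) - \frac{N}{2}\ln|\Sigma| - \frac{1}{2}\sum_{n=1}^{N} (\mathbf{x}_n-\boldsymbol{\mu})^{\top}\Sigma^{-1}(\mathbf{x}_n-\boldsymbol{\mu}) \]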
8. Maximum Likelihood (ML) parameter estimation
- Maximize the log likelihood formulation.
- Setting the gradient of the complete data log likelihood to zero, we can find the closed-form solution.
- In the case of the mean, this is the sample average.
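Setting the gradient to zero gives the familiar closed-form ML estimates
\[ \boldsymbol{\mu}_{ML} = \frac{1}{N}\sum_{n=1}^{N} \mathbf{x}_n, \qquad \Sigma_{ML} = \frac{1}{N}\sum_{n=1}^{N} (\mathbf{x}_n-\boldsymbol{\mu}_{ML})(\mathbf{x}_n-\boldsymbol{\mu}_{ML})^{\top} \]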
9. When one Gaussian is not enough
- Real-world datasets are rarely unimodal!
10. Mixtures of Gaussians
11. Mixtures of Gaussians (2)
- In addition to the mean and covariance parameters (now M of each), we have the mixing coefficients πk.
- The following properties hold for the mixing coefficients (written out below).
- The mixing coefficient can be seen as the prior probability of component k.
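Written out, the mixture density and the constraints on the mixing coefficients are
\[ p(\mathbf{x}) = \sum_{k=1}^{M} \pi_k\, \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_k, \Sigma_k), \qquad 0 \le \pi_k \le 1, \qquad \sum_{k=1}^{M} \pi_k = 1 \]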
12. Responsibilities (1)
[Figure: complete data, incomplete data, and responsibilities]
- Component labels (red, green and blue) cannot be observed.
- We have to calculate approximations (responsibilities).
13. Responsibilities (2)
- The responsibility describes how probable it is that observation vector x comes from component k.
- In clustering, responsibilities take only the values 0 and 1, and thus define a hard partitioning.
14. Responsibilities (3)
- We can express the marginal density p(x) as a sum over the mixture components.
- From this, we can find the responsibility of the kth component for x using Bayes' theorem (both are written out below).
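In standard notation, the marginal density and the resulting responsibility of component k for an observation x are
\[ p(\mathbf{x}) = \sum_{j=1}^{M} \pi_j\, \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_j, \Sigma_j), \qquad \gamma_k(\mathbf{x}) = \frac{\pi_k\, \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_k, \Sigma_k)}{\sum_{j=1}^{M} \pi_j\, \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_j, \Sigma_j)} \]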
15. Expectation Maximization (EM)
- Goal: maximize the log likelihood of the whole data.
- Once the responsibilities are calculated, we can maximize individually for the means, the covariances and the mixing coefficients!
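The quantity being maximized is the log likelihood of the whole data under the mixture model,
\[ \ln p(X \mid \boldsymbol{\pi}, \boldsymbol{\mu}, \Sigma) = \sum_{n=1}^{N} \ln\!\left( \sum_{k=1}^{M} \pi_k\, \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_k, \Sigma_k) \right) \]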
16. Exact update equations
New mean estimates
Covariance estimates
Mixing coefficient estimates
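In their standard form, with γ(z_nk) denoting the responsibilities and N_k = Σ_n γ(z_nk), these updates are
\[ \boldsymbol{\mu}_k^{new} = \frac{1}{N_k}\sum_{n=1}^{N} \gamma(z_{nk})\,\mathbf{x}_n, \qquad \Sigma_k^{new} = \frac{1}{N_k}\sum_{n=1}^{N} \gamma(z_{nk})\,(\mathbf{x}_n-\boldsymbol{\mu}_k^{new})(\mathbf{x}_n-\boldsymbol{\mu}_k^{new})^{\top}, \qquad \pi_k^{new} = \frac{N_k}{N} \]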
17. EM Algorithm
- Initialize parameters
- While not converged:
  - E step: calculate responsibilities.
  - M step: estimate new parameters.
  - Calculate the log likelihood of the new parameters.
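As an illustration of this loop, here is a minimal NumPy sketch of EM for a full-covariance GMM. It is only a sketch: the helper names (gaussian_pdf, em_gmm), the initialization, the small ridge added to the covariances and the convergence test are choices made here, not part of the slides.

```python
import numpy as np

def gaussian_pdf(X, mean, cov):
    """Multivariate Gaussian density evaluated at each row of X."""
    D = X.shape[1]
    diff = X - mean
    exponent = -0.5 * np.sum(diff @ np.linalg.inv(cov) * diff, axis=1)
    norm = np.sqrt((2 * np.pi) ** D * np.linalg.det(cov))
    return np.exp(exponent) / norm

def em_gmm(X, M, n_iter=100, tol=1e-6, seed=0):
    """Fit an M-component GMM to data X of shape (N, D) with plain EM."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    # Initialization: M random data points as means, shared sample covariance, uniform weights.
    means = X[rng.choice(N, M, replace=False)]
    covs = np.array([np.cov(X.T) + 1e-6 * np.eye(D) for _ in range(M)])
    weights = np.full(M, 1.0 / M)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E step: responsibilities gamma[n, k] of component k for point n.
        dens = np.column_stack(
            [weights[k] * gaussian_pdf(X, means[k], covs[k]) for k in range(M)])
        gamma = dens / dens.sum(axis=1, keepdims=True)
        # M step: re-estimate means, covariances and mixing coefficients.
        Nk = gamma.sum(axis=0)
        means = (gamma.T @ X) / Nk[:, None]
        for k in range(M):
            diff = X - means[k]
            covs[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(D)
        weights = Nk / N
        # Log likelihood of the new parameters; stop when the improvement is small.
        new_dens = np.column_stack(
            [weights[k] * gaussian_pdf(X, means[k], covs[k]) for k in range(M)])
        ll = np.sum(np.log(new_dens.sum(axis=1)))
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return weights, means, covs
```

Calling em_gmm(X, M=3) on an (N, D) array returns the fitted mixing coefficients, means and covariance matrices; the small ridge added to each covariance is one common way to reduce the risk of the singularities discussed at the end of these slides.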
18. Example of EM
19. Computational complexity
- Hard clustering with the MSE criterion is NP-complete.
- Can we find the optimal GMM in polynomial time?
- Finding the optimal GMM is in class NP.
20. Some insights
- In GMM we need to estimate the parameters, which are all real numbers.
- Number of parameters: M + MD + M(D + D(D-1)/2), i.e. M mixing coefficients, M D-dimensional means and M full covariance matrices.
- Hard clustering has no parameters, just the set partitioning (remember the optimality criteria!).
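As an illustrative example (the numbers are chosen here, not taken from the slides): a full-covariance GMM with M = 16 components in D = 10 dimensions has
\[ 16 + 16\cdot 10 + 16\cdot\frac{10\cdot 11}{2} = 16 + 160 + 880 = 1056 \]
real-valued parameters.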
21. Some further insights (2)
- Both optimization functions are mathematically rigorous!
- Solutions minimizing MSE are always meaningful.
- Maximization of the log likelihood might lead to a singularity!
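To see why a singularity can arise: if the mean of component k collapses onto a single data point x_n and its covariance shrinks towards zero (say Σ_k = σ_k²I), that component's density at x_n grows without bound, and so does the log likelihood:
\[ \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_k = \mathbf{x}_n,\ \sigma_k^2 I) = \frac{1}{(2\pi)^{D/2}\,\sigma_k^{D}} \;\longrightarrow\; \infty \quad \text{as } \sigma_k \to 0 \]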
22. Example of singularity