1
Gaussian Mixture Models
Clustering Methods Part 8
Ville Hautamäki
  • Speech and Image Processing Unit, Department of
    Computer Science
  • University of Joensuu, FINLAND

2
Preliminaries
  • We assume that the dataset X has been generated
    by a parametric distribution p(X).
  • Estimation of the parameters of p is known as
    density estimation.
  • We consider the Gaussian distribution.

Figures taken from
http://research.microsoft.com/cmbishop/PRML/
3
Typical parameters (1)
  • Mean (µ): the average value of p(X), also called
    the expectation.
  • Variance (σ²): a measure of the variability of
    p(X) around the mean (see the definitions below).

4
Typical parameters (2)
  • Covariance: measures how much two variables vary
    together.
  • Covariance matrix: the collection of covariances
    between all pairs of dimensions.
  • The diagonal of the covariance matrix contains
    the variance of each attribute (see below).
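
In symbols (standard definitions, filled in for completeness), for dimensions i and j of a random vector x with mean µ:

  \mathrm{cov}[x_i, x_j] = \mathbb{E}\!\left[(x_i-\mu_i)(x_j-\mu_j)\right], \qquad
  \Sigma = \mathbb{E}\!\left[(\mathbf{x}-\boldsymbol{\mu})(\mathbf{x}-\boldsymbol{\mu})^{\mathrm{T}}\right]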

5
One-dimensional Gaussian
  • Parameters to be estimated are the mean (µ) and
    the variance (σ²); the density is written out below.
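
The one-dimensional Gaussian density (presumably the equation shown on the original slide; reproduced here in its standard form):

  \mathcal{N}(x \mid \mu, \sigma^2) =
  \frac{1}{\sqrt{2\pi\sigma^2}}
  \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)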

6
Multivariate Gaussian (1)
  • In the multivariate case we have a covariance
    matrix (Σ) instead of a single variance (see below).
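
In standard form (presumably what the original slide showed), with D-dimensional mean vector µ and D×D covariance matrix Σ:

  \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \Sigma) =
  \frac{1}{(2\pi)^{D/2}\,|\Sigma|^{1/2}}
  \exp\!\left(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{\mathrm{T}}\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})\right)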

7
Multivariate Gaussian (2)
Figure: Gaussian density contours with single
(spherical), diagonal, and full covariance matrices.
Complete data log likelihood
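
The complete data log likelihood referred to here, written out in its standard form for a single Gaussian and i.i.d. data X = {x_1, ..., x_N} (the equation itself did not survive in this transcript):

  \ln p(X \mid \boldsymbol{\mu}, \Sigma) =
  \sum_{n=1}^{N} \ln \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}, \Sigma)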
8
Maximum Likelihood (ML) parameter estimation
  • Maximize the log likelihood formulation.
  • Setting the gradient of the complete data log
    likelihood to zero, we can find the closed-form
    solution (written out below).
  • In the case of the mean, this is the sample
    average.
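
The resulting closed-form estimates (standard maximum likelihood results, filled in here because the slide equations are missing from the transcript):

  \boldsymbol{\mu}_{\mathrm{ML}} = \frac{1}{N}\sum_{n=1}^{N} \mathbf{x}_n, \qquad
  \Sigma_{\mathrm{ML}} = \frac{1}{N}\sum_{n=1}^{N}
    (\mathbf{x}_n - \boldsymbol{\mu}_{\mathrm{ML}})(\mathbf{x}_n - \boldsymbol{\mu}_{\mathrm{ML}})^{\mathrm{T}}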

9
When one Gaussian is not enough
  • Real-world datasets are rarely unimodal!

10
Mixtures of Gaussians
11
Mixtures of Gaussians (2)
  • In addition to the mean and covariance parameters
    (now M of each), we have mixing coefficients πk.

The following properties hold for the mixing
coefficients (listed below); πk can be seen as the
prior probability of component k.
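
The properties in question (standard constraints on the mixing coefficients):

  0 \le \pi_k \le 1, \qquad \sum_{k=1}^{M} \pi_k = 1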
12
Responsibilities (1)
Figure: complete data (points coloured by their
component labels), incomplete data, and
responsibilities.
  • Component labels (red, green and blue) cannot be
    observed.
  • We have to calculate approximations
    (responsibilities).

13
Responsibilities (2)
  • The responsibility describes how probable it is
    that observation vector x was generated by
    component k.
  • In hard clustering, responsibilities take only the
    values 0 and 1, and thus define a hard
    partitioning.

14
Responsibilities (3)
We can express the marginal density p(x) as a sum
over the components (see below).

From this, we can find the responsibility of the
kth component for x using Bayes' theorem:
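
Written out (the standard GMM formulas, since the slide equations are not in the transcript):

  p(\mathbf{x}) = \sum_{k=1}^{M} \pi_k\, \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_k, \Sigma_k), \qquad
  \gamma_k(\mathbf{x}) =
  \frac{\pi_k\, \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_k, \Sigma_k)}
       {\sum_{j=1}^{M} \pi_j\, \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_j, \Sigma_j)}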
15
Expectation Maximization (EM)
  • Goal: maximize the log likelihood of the whole
    data (written out below).
  • Once the responsibilities have been calculated, we
    can maximize individually for the means, covariances
    and the mixing coefficients!
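
The objective being maximized (the standard GMM log likelihood):

  \ln p(X \mid \boldsymbol{\pi}, \boldsymbol{\mu}, \Sigma) =
  \sum_{n=1}^{N} \ln \left\{ \sum_{k=1}^{M} \pi_k\, \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_k, \Sigma_k) \right\}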

16
Exact update equations
New mean estimates
Covariance estimates
Mixing coefficient estimates
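
The update equations themselves (standard EM M-step results, reproduced here because the slide equations did not survive in this transcript), with N_k = \sum_{n} \gamma(z_{nk}):

  \boldsymbol{\mu}_k^{\mathrm{new}} = \frac{1}{N_k}\sum_{n=1}^{N} \gamma(z_{nk})\,\mathbf{x}_n

  \Sigma_k^{\mathrm{new}} = \frac{1}{N_k}\sum_{n=1}^{N} \gamma(z_{nk})
    (\mathbf{x}_n - \boldsymbol{\mu}_k^{\mathrm{new}})(\mathbf{x}_n - \boldsymbol{\mu}_k^{\mathrm{new}})^{\mathrm{T}}

  \pi_k^{\mathrm{new}} = \frac{N_k}{N}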
17
EM Algorithm
  • Initialize parameters.
  • While not converged:
  • E step: calculate responsibilities.
  • M step: estimate new parameters.
  • Calculate the log likelihood of the new
    parameters. (A code sketch of this loop follows.)
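
A minimal NumPy sketch of this loop (not from the original slides; the function name gmm_em, the initialization scheme, and the log-likelihood stopping test are my own simplifying assumptions):

import numpy as np

def gmm_em(X, M, max_iter=100, tol=1e-6, seed=0):
    """EM for a Gaussian mixture with full covariance matrices (illustrative sketch)."""
    N, D = X.shape
    rng = np.random.default_rng(seed)
    # Initialize parameters: means picked from the data, identity covariances, uniform weights.
    mu = X[rng.choice(N, size=M, replace=False)].copy()
    Sigma = np.stack([np.eye(D) for _ in range(M)])
    pi = np.full(M, 1.0 / M)
    prev_ll = -np.inf

    for _ in range(max_iter):
        # E step: responsibilities gamma[n, k] = pi_k N(x_n | mu_k, Sigma_k) / p(x_n).
        weighted = np.empty((N, M))
        for k in range(M):
            diff = X - mu[k]
            maha = np.einsum('nd,de,ne->n', diff, np.linalg.inv(Sigma[k]), diff)
            norm = np.sqrt((2.0 * np.pi) ** D * np.linalg.det(Sigma[k]))
            weighted[:, k] = pi[k] * np.exp(-0.5 * maha) / norm
        p_x = weighted.sum(axis=1, keepdims=True)   # marginal density p(x_n)
        gamma = weighted / p_x                      # responsibilities

        # M step: re-estimate means, covariances and mixing coefficients.
        Nk = gamma.sum(axis=0)
        mu = (gamma.T @ X) / Nk[:, None]
        for k in range(M):
            diff = X - mu[k]
            Sigma[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k]
        pi = Nk / N

        # Log likelihood under the parameters used in the E step; stop when it barely improves.
        ll = np.log(p_x).sum()
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return pi, mu, Sigma

Usage would be along the lines of pi, mu, Sigma = gmm_em(X, M=3) for an (N, D) data array X.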

18
Example of EM
19
Computational complexity
  • Hard clustering with the MSE criterion is
    NP-complete.
  • Can we find the optimal GMM in polynomial time?
  • Finding the optimal GMM is in class NP.

20
Some insights
  • In a GMM we need to estimate the parameters, all
    of which are real numbers.
  • Number of parameters (for full covariance
    matrices): M mixing coefficients, M·D mean
    components, and M·D(D+1)/2 covariance entries
    (see the example below).
  • Hard clustering has no real-valued parameters,
    just a set partitioning (remember the optimality
    criteria!).
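
For example (an illustrative count, assuming full covariance matrices), with M = 3 components in D = 2 dimensions:

  M + M D + M\,\frac{D(D+1)}{2} \;=\; 3 + 3\cdot 2 + 3\cdot\frac{2\cdot 3}{2} \;=\; 3 + 6 + 9 \;=\; 18

real-valued parameters in total.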

21
Some further insights (2)
  • Both optimization functions are mathematically
    rigorous!
  • Solutions minimizing MSE are always meaningful.
  • Maximization of the log likelihood might lead to a
    singularity!

22
Example of singularity
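
The figure itself did not survive in this transcript; a standard way to illustrate the problem is the following. If one component collapses onto a single data point, \boldsymbol{\mu}_k = \mathbf{x}_n, with spherical covariance \Sigma_k = \sigma_k^2 I, then

  \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_k, \sigma_k^2 I) =
  \frac{1}{(2\pi)^{D/2}\,\sigma_k^{D}} \longrightarrow \infty
  \quad \text{as } \sigma_k \to 0,

so the log likelihood can be made arbitrarily large and the maximum likelihood solution degenerates.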