Title: Collapsed Variational Dirichlet Process Mixture Models
1Collapsed Variational Dirichlet Process Mixture
Models
- Kenichi Kurihara, Max Welling and Yee W. Teh
- Published on IJCAI 07
Discussion led by Qi An
2Outline
- Introduction
- Four approximations to DP
- Variational Bayesian Inference
- Optimal cluster label reordering
- Experimental results
- Discussion
3Introduction
- DP is suitable for many density estimation and
data clustering applications. - Gibbs sampling solution is not efficient enough
to scale up to the large scale problems. - Truncated stick-breaking approximation is
formulated in the space of explicit,
non-exchangeable cluster labels.
4Introduction
- This paper
- propose an improved VB algorithm based on
integrating out mixture weights - compare the stick-breaking representation against
the finite symmetric Dirichlet approximation - maintain optimal ordering of cluster labels in
the stick-breaking VB algorithm
5Approximations to DP
- Truncated stick-breaking representation
The joint distribution can be expressed as
6Approximations to DP
- Finite symmetric Dirichlet approximation
The joint distribution can be expressed as
The essential difference from TSB representation
is that the cluster labels remain interchangeable
under this formulation.
7Dirichlet process is most naturally defined on a
partition space while both TSB and FSD are
defined over the cluster label space. Moreover,
TSB and FSD also live in different spaces
8Marginalization
- In variational Bayesian approximation, we assume
a factorized form for the posterior distribution. - However it is not a good assumption since changes
in p will have a considerable impact on z.
If we can integrate out p , the joint
distribution is given by
For the TSB representation
For the FSD representation
a
9VB inference
- We can then apply the VB inference on the four
approximations
The lower bound is given by
The approximated posterior distribution for TSB
and FSD are
Depending on marginalization or not, v and p may
be integrated out.
10Gaussian approximation
- For collapsed approximations, the computation for
q(zij) seems intractable due to the exponentially
large space of assignments for all other zij.
With central limit theory and Taylor expansion,
the expectation over zij will be approximated
with those expectations
11Optimal cluster label reordering
- For FSB representation, the prior assumes a
certain ordering of the clusters. - The authors claims the optimal relabelling of the
clusters is given by ordering the cluster sizes
in decreasing order.
12Experimental results
13Experimental results
14Discussion
- There is very little difference between
variational Bayesian inference in the reordered
stick-breaking representation and the finite
mixture model with symmetric Dirichlet priors. - Label reordering is important for the
stick-breaking representation - Variational approximation are much more efficient
computationally than Gibbs sampling, with almost
no loss in accuracy