1
Chinese Restaurants and Stick-Breaking: An
Introduction to the Dirichlet Process
  • Teg Grenager
  • NLP Group Lunch
  • February 24, 2005

2
Agenda
  • Motivation
  • Mixture Models
  • Dirichlet Process
  • Gibbs Sampling
  • Applications

3
Clustering
  • Goal: learn a partition of the data, such that
  • Data within classes are similar
  • Classes are different from each other
  • Two very different approaches:
  • Agglomerative: build up clusters by iteratively
    sticking similar things together
  • Mixture model: learn a generative model over the
    data, treating the classes as hidden variables

4
Agglomerative Clustering
[Figure: dendrogram; the maximum-distance cutoff
determines the number of clusters]
  • Pros: doesn't need a generative model (number of
    clusters, parametric distribution)
  • Cons: ad hoc, no probabilistic foundation,
    intractable for large data sets
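For concreteness, here is a toy single-linkage sketch of
the merge loop described on the previous slide; the
function name and stopping rule are illustrative, not
from the talk.

    import numpy as np

    def single_linkage(points, max_dist):
        # Start with each point in its own cluster.
        clusters = [[i] for i in range(len(points))]
        while len(clusters) > 1:
            # Single linkage: the distance between two clusters is
            # the distance between their nearest members.
            best = None
            for a in range(len(clusters)):
                for b in range(a + 1, len(clusters)):
                    d = min(np.linalg.norm(points[i] - points[j])
                            for i in clusters[a] for j in clusters[b])
                    if best is None or d < best[0]:
                        best = (d, a, b)
            if best[0] > max_dist:
                break                    # closest pair is too far apart
            _, a, b = best
            clusters[a] += clusters.pop(b)
        return clusters

    clusters = single_linkage(np.random.randn(20, 2), max_dist=0.5)

The nested search over cluster pairs is exactly what makes
the approach intractable for large data sets.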

5
Mixture Model Clustering
  • Examples: k-means, mixture of Gaussians, Naïve
    Bayes
  • Pros: sound probabilistic foundation, efficient
    even for large data sets
  • Cons: requires a generative model, including the
    number of clusters (mixture components)

6
Problem
7
Big Idea
  • Want to use a generative model, but don't want to
    decide the number of clusters in advance
  • Suggestion: put each datum in its own cluster
  • Problem: the probability of two clusters colliding
    is zero under any density function, so there is no
    stickiness
  • Solution: instead of a density function, use a
    statistical process under which the probability of
    two clusters falling together is strictly positive
  • Best of both worlds: stickiness with a variable
    number of clusters

8
Finite Mixture Model
[Figure: plate diagrams of finite mixture models, with
Gaussian and Naïve Bayes component distributions]
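A minimal sketch of the finite mixture generative story
the figure depicts, using unit-variance Gaussian
components (all concrete values here are illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    K, N = 3, 100

    # Mixing proportions over the K components.
    pi = rng.dirichlet(np.full(K, 1.0))
    # One parameter per component, drawn from a base distribution
    # (here, a Gaussian component mean).
    theta = rng.normal(0.0, 10.0, size=K)
    # Class label ci ~ Multinomial(pi); observation xi ~ N(theta_ci, 1).
    c = rng.choice(K, size=N, p=pi)
    x = rng.normal(theta[c], 1.0)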
9
Dirichlet Priors (Review)
  • A distribution over possible parameter vectors of
    the multinomial distribution
  • Thus values must lie in the k-dimensional simplex
  • Beta distribution is the 2-parameter special case
  • Expectation
  • A conjugate prior to the multinomial
  • Explicit formulation is ugly!
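The conjugacy, at least, is pleasant: the posterior after
observing multinomial counts is again Dirichlet, with the
counts added to the prior parameters. A short numpy
illustration (values illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    alpha = np.array([2.0, 2.0, 2.0])   # Dirichlet prior for a 3-outcome multinomial
    counts = np.array([10, 2, 0])       # observed outcome counts

    posterior = alpha + counts          # conjugacy: posterior is Dirichlet(alpha + counts)
    print(posterior / posterior.sum())  # posterior mean, E[theta_i] = alpha_i / sum_j alpha_j
    theta = rng.dirichlet(posterior)    # one sample of the multinomial parameters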

10
Infinite Mixture Model
[Plate diagram: mixing proportions and component
parameters generate a class label ci and an observation
xi, repeated for i = 1, …, N]
11
Chinese Restaurant Process
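The CRP seats customers one at a time: customer i joins
an occupied table with probability proportional to the
number already seated there, or opens a new table with
probability proportional to α. A minimal sampler (names
illustrative):

    import numpy as np

    def crp(n, alpha, rng):
        tables = []        # tables[k] = number of customers at table k
        assignments = []
        for i in range(n):
            # Table k with prob tables[k]/(i + alpha);
            # a new table with prob alpha/(i + alpha).
            probs = np.array(tables + [alpha]) / (i + alpha)
            k = rng.choice(len(probs), p=probs)
            if k == len(tables):
                tables.append(0)       # open a new table
            tables[k] += 1
            assignments.append(k)
        return assignments, tables

    rng = np.random.default_rng(0)
    _, tables = crp(10000, 1.0, rng)
    # The number of occupied tables grows like O(alpha log n); cf. slide 14.
    print(len(tables), 1.0 * np.log(10000))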
12
DP Mixture Model
13
Stick-breaking Process
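Stick-breaking constructs the mixing weights of G
directly: repeatedly break off a Beta(1, α) fraction of
the remaining stick. A truncated sketch (the truncation
level is illustrative):

    import numpy as np

    def stick_breaking(alpha, num_weights, rng):
        # beta_k ~ Beta(1, alpha); pi_k = beta_k * prod_{j<k} (1 - beta_j)
        betas = rng.beta(1.0, alpha, size=num_weights)
        remaining = np.concatenate([[1.0], np.cumprod(1.0 - betas)[:-1]])
        return betas * remaining

    rng = np.random.default_rng(0)
    weights = stick_breaking(alpha=1.0, num_weights=20, rng=rng)
    print(weights.sum())   # approaches 1 as the truncation level grows
    # A draw G ~ DP(alpha, G0) is then sum_k pi_k * delta(theta_k), theta_k ~ G0.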
14
Properties of the DP
  • Let (Θ, Σ) be a measurable space, G0 be a
    probability measure on the space, and α be a
    positive real number
  • A Dirichlet process is any distribution of a
    random probability measure G over (Θ, Σ) such
    that, for all finite partitions (A1, …, Ar) of Θ,
    (G(A1), …, G(Ar)) ~ Dirichlet(αG0(A1), …, αG0(Ar))
  • Draws from G are generally not distinct, because G
    is discrete with probability one
  • The number of distinct values grows as O(log n)

15
Infinite Exchangeability
  • In general, an infinite set of random variables
    is said to be infinitely exchangeable if for
    every finite subset x1, …, xn and for any
    permutation π we have
    p(x1, …, xn) = p(xπ(1), …, xπ(n))
  • Note that infinite exchangeability is not the
    same as being independent and identically
    distributed (i.i.d.)!
  • Using De Finetti's theorem, it is possible to show
    that our draws θ1, θ2, … are infinitely
    exchangeable
  • Thus the mixture components may be sampled in any
    order

16
Mixture Model Inference
  • We want to find a clustering of the data: an
    assignment of values to the hidden class variables
  • Sometimes we also want the component parameters
  • In most finite mixture models, this can be found
    with EM
  • The Dirichlet process is a non-parametric prior,
    and doesn't permit EM
  • We use Gibbs sampling instead

17
Gibbs Sampling 1
  • Algorithm 1: integrate out G, and sample the θi
    directly, each conditioned on everything else (a
    sketch follows below)
  • This is inefficient, because we update cluster
    information for one datum at a time
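A sketch of Algorithm 1 for a conjugate toy model:
likelihood N(θ, 1) and base measure G0 = N(0, s0²). The
function name and hyperparameters are illustrative; see
Neal (2000) for the general case.

    import numpy as np
    from scipy.stats import norm

    def gibbs_algorithm1(x, alpha, s0=10.0, iters=50, rng=None):
        rng = rng or np.random.default_rng(0)
        n = len(x)
        theta = rng.normal(0.0, s0, size=n)      # initialize each theta_i from G0
        for _ in range(iters):
            for i in range(n):
                others = np.delete(theta, i)
                # Weight of tying theta_i to each existing theta_j, plus
                # the weight of a fresh draw (prior predictive of x_i).
                w = np.append(norm.pdf(x[i], others, 1.0),
                              alpha * norm.pdf(x[i], 0.0, np.sqrt(1.0 + s0**2)))
                j = rng.choice(n, p=w / w.sum())
                if j < n - 1:
                    theta[i] = others[j]         # reuse an existing value
                else:
                    # New value from the posterior given x_i alone.
                    v = s0**2 / (s0**2 + 1.0)
                    theta[i] = rng.normal(v * x[i], np.sqrt(v))
        return theta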

18
Gibbs Sampling 2
  • Algorithm 2:
  • Reintroduce a cluster variable ci which takes on
    values that are the names c of the clusters
  • Store the parameters that are shared by all data
    in class c in a new variable φc

19
Gibbs Sampling 2 (cont.)
  • Algorithm 2 (continued):
  • For i = 1, …, N, sample ci from
    P(ci = c | c−i, xi) ∝ n−i,c ∫ F(xi, φ) dH−i,c(φ)
    for existing clusters c, and
    P(ci = cnew | c−i, xi) ∝ α ∫ F(xi, φ) dG0(φ)
    for a new cluster, where n−i,c is the number of
    observations with cj = c for j ≠ i, and H−i,c is
    the posterior distribution of φc based on the
    prior G0 and all observations for which j ≠ i and
    cj = c
  • Repeat
  • Works well (a sketch of the indicator update
    follows below)
  • Note: can also use variational methods (other
    than EM)
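Under the same conjugate Gaussian toy model, the
indicator update can be computed in closed form, since
H−i,c is itself Gaussian. A sketch of one update (all
names illustrative):

    import numpy as np
    from scipy.stats import norm

    def sample_indicator(i, x, c, alpha, s0=10.0, rng=None):
        rng = rng or np.random.default_rng(0)
        others = [j for j in range(len(x)) if j != i]
        labels = sorted({c[j] for j in others})
        weights, choices = [], []
        for lab in labels:
            xs = np.array([x[j] for j in others if c[j] == lab])
            prec = 1.0 / s0**2 + len(xs)         # posterior precision of phi_c
            m, v = xs.sum() / prec, 1.0 / prec   # H_{-i,c} = N(m, v)
            # Existing cluster: n_{-i,c} times the predictive density of x_i.
            weights.append(len(xs) * norm.pdf(x[i], m, np.sqrt(1.0 + v)))
            choices.append(lab)
        # New cluster: alpha times the prior predictive under G0.
        weights.append(alpha * norm.pdf(x[i], 0.0, np.sqrt(1.0 + s0**2)))
        choices.append(max(labels, default=-1) + 1)
        w = np.array(weights)
        c[i] = choices[rng.choice(len(w), p=w / w.sum())]
        return c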

20
NLP Applications
  • Clustering
  • Document clustering for topic, genre, sentiment, …
  • Word clustering for POS, WSD, synonymy, …
  • Topic clustering across documents (see Blei et
    al., 2004 and Teh et al., 2004)
  • Noun coreference: we don't know how many entities
    there are
  • Other identity-uncertainty problems: deduping,
    etc.
  • Grammar induction
  • Sequence modeling: the infinite HMM
  • Topic segmentation (see Grenager et al., 2005)
  • Sequence models for POS tagging
  • Others?

21
Nested CRP
[Figure: the nested CRP illustrated as restaurant
choices on Day 1, Day 2, and Day 3]
22
Nested CRP (cont.)
  • To generate a document, given a tree with L levels
    (a sketch follows below):
  • Choose a path from the root of the tree to a leaf
  • Draw a vector θ of topic mixing proportions from
    an L-dimensional Dirichlet
  • Generate the words in the document from a mixture
    of the topics along the path, with mixing
    proportions θ
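A sketch of this generative step, assuming the tree, its
per-node topics, and a CRP step over children are already
available; the node and topic interfaces here are
hypothetical, not from Blei et al.:

    import numpy as np

    def generate_document(root, L, gamma, num_words, rng):
        # 1. Choose a path from the root to a leaf; at each node the child
        #    is picked by a CRP over that node's children (hypothetical method).
        path, node = [root], root
        for _ in range(L - 1):
            node = node.choose_child_crp(rng)
            path.append(node)
        # 2. Draw topic mixing proportions theta from an L-dimensional Dirichlet.
        theta = rng.dirichlet(np.full(L, gamma))
        # 3. Emit each word from the topic at a level sampled from theta
        #    (topic.sample_word is likewise hypothetical).
        levels = rng.choice(L, size=num_words, p=theta)
        return [path[k].topic.sample_word(rng) for k in levels]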

23
Nested CRP (cont.)
24
References
  • Seminal
  • T.S. Ferguson. A Bayesian analysis of some
    nonparametric problems. Annals of Statistics,
    1:209-230, 1973.
  • C.E. Antoniak. Mixtures of Dirichlet processes
    with applications to Bayesian nonparametric
    problems. Annals of Statistics, 2:1152-1174, 1974.
  • Foundational
  • M.D. Escobar and M. West. Bayesian density
    estimation and inference using mixtures. Journal
    of the American Statistical Association,
    90:577-588, 1995.
  • S.N. MacEachern and P. Müller. Estimating mixture
    of Dirichlet process models. Journal of
    Computational and Graphical Statistics,
    7:223-238, 1998.
  • R.M. Neal. Markov chain sampling methods for
    Dirichlet process mixture models. Journal of
    Computational and Graphical Statistics,
    9:249-265, 2000.
  • C.E. Rasmussen. The Infinite Gaussian Mixture
    Model. NIPS, 2000.
  • H. Ishwaran and L. James. Gibbs sampling methods
    for stick-breaking priors. Journal of the
    American Statistical Association, 96:161-173,
    2001.
  • NLP
  • D.M. Blei, T.L. Griffiths, M.I. Jordan, and J.B.
    Tenenbaum. Hierarchical topic models and the
    nested Chinese restaurant process. NIPS, 2004.
  • Y.W. Teh, M.I. Jordan, M.J. Beal, and D.M. Blei.
    Hierarchical Dirichlet processes. NIPS, 2004.