1
Chinese Restaurants and Stick-Breaking: An
Introduction to the Dirichlet Process
  • Teg Grenager
  • NLP Group Lunch
  • February 24, 2005

2
Agenda
  • Motivation
  • Mixture Models
  • Dirichlet Process
  • Gibbs Sampling
  • Applications

3
Clustering
  • Goal: learn a partition of the data, such that
  • Data within classes are similar
  • Classes are different from each other
  • Two very different approaches:
  • Agglomerative: build up clusters by iteratively
    sticking similar things together
  • Mixture model: learn a generative model over the
    data, treating the classes as hidden variables

4
Agglomerative Clustering
[Figure: dendrogram; the maximum-distance cutoff
determines the number of clusters]
  • Pros: doesn't need a generative model (number of
    clusters, parametric distribution)
  • Cons: ad hoc, no probabilistic foundation,
    intractable for large data sets
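For concreteness, here is a toy single-linkage sketch of
the merge loop described on the previous slide; the
function name and stopping rule are illustrative, not
from the talk.

    import numpy as np

    def single_linkage(points, max_dist):
        # Start with each point in its own cluster.
        clusters = [[i] for i in range(len(points))]
        while len(clusters) > 1:
            # Single linkage: the distance between two clusters is
            # the distance between their nearest members.
            best = None
            for a in range(len(clusters)):
                for b in range(a + 1, len(clusters)):
                    d = min(np.linalg.norm(points[i] - points[j])
                            for i in clusters[a] for j in clusters[b])
                    if best is None or d < best[0]:
                        best = (d, a, b)
            if best[0] > max_dist:
                break                    # closest pair is too far apart
            _, a, b = best
            clusters[a] += clusters.pop(b)
        return clusters

    clusters = single_linkage(np.random.randn(20, 2), max_dist=0.5)

The nested search over cluster pairs is exactly what makes
the approach intractable for large data sets.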

5
Mixture Model Clustering
  • Examples: k-means, mixture of Gaussians, Naïve
    Bayes
  • Pros: sound probabilistic foundation, efficient
    even for large data sets
  • Cons: requires a generative model, including the
    number of clusters (mixture components)

6
Problem
7
Big Idea
  • Want to use a generative model, but don't want to
    decide the number of clusters in advance
  • Suggestion: put each datum in its own cluster
  • Problem: the probability of two clusters colliding
    is zero under any density function, so there is no
    stickiness
  • Solution: instead of a density function, use a
    statistical process under which the probability of
    two clusters falling together is strictly positive
  • Best of both worlds: stickiness with a variable
    number of clusters

8
Finite Mixture Model
[Figure: plate diagrams of finite mixture models, with
Gaussian and Naïve Bayes component distributions]
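A minimal sketch of the finite mixture generative story
the figure depicts, using unit-variance Gaussian
components (all concrete values here are illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    K, N = 3, 100

    # Mixing proportions over the K components.
    pi = rng.dirichlet(np.full(K, 1.0))
    # One parameter per component, drawn from a base distribution
    # (here, a Gaussian component mean).
    theta = rng.normal(0.0, 10.0, size=K)
    # Class label ci ~ Multinomial(pi); observation xi ~ N(theta_ci, 1).
    c = rng.choice(K, size=N, p=pi)
    x = rng.normal(theta[c], 1.0)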
9
Dirichlet Priors (Review)
  • A distribution over possible parameter vectors of
    the multinomial distribution
  • Thus values must lie in the k-dimensional simplex
  • Beta distribution is the 2-parameter special case
  • Expectation
  • A conjugate prior to the multinomial
  • Explicit formulation is ugly!
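The conjugacy, at least, is pleasant: the posterior after
observing multinomial counts is again Dirichlet, with the
counts added to the prior parameters. A short numpy
illustration (values illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    alpha = np.array([2.0, 2.0, 2.0])   # Dirichlet prior for a 3-outcome multinomial
    counts = np.array([10, 2, 0])       # observed outcome counts

    posterior = alpha + counts          # conjugacy: posterior is Dirichlet(alpha + counts)
    print(posterior / posterior.sum())  # posterior mean, E[theta_i] = alpha_i / sum_j alpha_j
    theta = rng.dirichlet(posterior)    # one sample of the multinomial parameters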

10
Infinite Mixture Model
[Plate diagram: mixing proportions and component
parameters generate a class label ci and an observation
xi, repeated for i = 1, …, N]
11
Chinese Restaurant Process
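The CRP seats customers one at a time: customer i joins
an occupied table with probability proportional to the
number already seated there, or opens a new table with
probability proportional to α. A minimal sampler (names
illustrative):

    import numpy as np

    def crp(n, alpha, rng):
        tables = []        # tables[k] = number of customers at table k
        assignments = []
        for i in range(n):
            # Table k with prob tables[k]/(i + alpha);
            # a new table with prob alpha/(i + alpha).
            probs = np.array(tables + [alpha]) / (i + alpha)
            k = rng.choice(len(probs), p=probs)
            if k == len(tables):
                tables.append(0)       # open a new table
            tables[k] += 1
            assignments.append(k)
        return assignments, tables

    rng = np.random.default_rng(0)
    _, tables = crp(10000, 1.0, rng)
    # The number of occupied tables grows like O(alpha log n); cf. slide 14.
    print(len(tables), 1.0 * np.log(10000))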
12
DP Mixture Model
13
Stick-breaking Process
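Stick-breaking constructs the mixing weights of G
directly: repeatedly break off a Beta(1, α) fraction of
the remaining stick. A truncated sketch (the truncation
level is illustrative):

    import numpy as np

    def stick_breaking(alpha, num_weights, rng):
        # beta_k ~ Beta(1, alpha); pi_k = beta_k * prod_{j<k} (1 - beta_j)
        betas = rng.beta(1.0, alpha, size=num_weights)
        remaining = np.concatenate([[1.0], np.cumprod(1.0 - betas)[:-1]])
        return betas * remaining

    rng = np.random.default_rng(0)
    weights = stick_breaking(alpha=1.0, num_weights=20, rng=rng)
    print(weights.sum())   # approaches 1 as the truncation level grows
    # A draw G ~ DP(alpha, G0) is then sum_k pi_k * delta(theta_k), theta_k ~ G0.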
14
Properties of the DP
  • Let (Θ, Σ) be a measurable space, G0 be a
    probability measure on the space, and α be a
    positive real number
  • A Dirichlet process is any distribution of a
    random probability measure G over (Θ, Σ) such
    that, for all finite partitions (A1, …, Ar) of Θ,
    (G(A1), …, G(Ar)) ~ Dirichlet(αG0(A1), …, αG0(Ar))
  • Draws from G are generally not distinct, because G
    is discrete with probability one
  • The number of distinct values grows as O(log n)

15
Infinite Exchangeability
  • In general, an infinite set of random variables
    is said to be infinitely exchangeable if for
    every finite subset x1, …, xn and for any
    permutation π we have
    p(x1, …, xn) = p(xπ(1), …, xπ(n))
  • Note that infinite exchangeability is not the
    same as being independent and identically
    distributed (i.i.d.)!
  • Using De Finetti's theorem, it is possible to show
    that our draws θ1, θ2, … are infinitely
    exchangeable
  • Thus the mixture components may be sampled in any
    order

16
Mixture Model Inference
  • We want to find a clustering of the data: an
    assignment of values to the hidden class variables
  • Sometimes we also want the component parameters
  • In most finite mixture models, this can be found
    with EM
  • The Dirichlet process is a non-parametric prior,
    and doesn't permit EM
  • We use Gibbs sampling instead

17
Gibbs Sampling 1
  • Algorithm 1: integrate out G, and sample the θi
    directly, each conditioned on everything else (a
    sketch follows below)
  • This is inefficient, because we update cluster
    information for one datum at a time
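A sketch of Algorithm 1 for a conjugate toy model:
likelihood N(θ, 1) and base measure G0 = N(0, s0²). The
function name and hyperparameters are illustrative; see
Neal (2000) for the general case.

    import numpy as np
    from scipy.stats import norm

    def gibbs_algorithm1(x, alpha, s0=10.0, iters=50, rng=None):
        rng = rng or np.random.default_rng(0)
        n = len(x)
        theta = rng.normal(0.0, s0, size=n)      # initialize each theta_i from G0
        for _ in range(iters):
            for i in range(n):
                others = np.delete(theta, i)
                # Weight of tying theta_i to each existing theta_j, plus
                # the weight of a fresh draw (prior predictive of x_i).
                w = np.append(norm.pdf(x[i], others, 1.0),
                              alpha * norm.pdf(x[i], 0.0, np.sqrt(1.0 + s0**2)))
                j = rng.choice(n, p=w / w.sum())
                if j < n - 1:
                    theta[i] = others[j]         # reuse an existing value
                else:
                    # New value from the posterior given x_i alone.
                    v = s0**2 / (s0**2 + 1.0)
                    theta[i] = rng.normal(v * x[i], np.sqrt(v))
        return theta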

18
Gibbs Sampling 2
  • Algorithm 2:
  • Reintroduce a cluster variable ci which takes on
    values that are the names c of the clusters
  • Store the parameters that are shared by all data
    in class c in a new variable φc

19
Gibbs Sampling 2 (cont.)
  • Algorithm 2 (continued):
  • For i = 1, …, N, sample ci from
    P(ci = c | c−i, xi) ∝ n−i,c ∫ F(xi, φ) dH−i,c(φ)
    for existing clusters c, and
    P(ci = cnew | c−i, xi) ∝ α ∫ F(xi, φ) dG0(φ)
    for a new cluster, where n−i,c is the number of
    observations with cj = c for j ≠ i, and H−i,c is
    the posterior distribution of φc based on the
    prior G0 and all observations for which j ≠ i and
    cj = c
  • Repeat
  • Works well (a sketch of the indicator update
    follows below)
  • Note: can also use variational methods (other
    than EM)
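Under the same conjugate Gaussian toy model, the
indicator update can be computed in closed form, since
H−i,c is itself Gaussian. A sketch of one update (all
names illustrative):

    import numpy as np
    from scipy.stats import norm

    def sample_indicator(i, x, c, alpha, s0=10.0, rng=None):
        rng = rng or np.random.default_rng(0)
        others = [j for j in range(len(x)) if j != i]
        labels = sorted({c[j] for j in others})
        weights, choices = [], []
        for lab in labels:
            xs = np.array([x[j] for j in others if c[j] == lab])
            prec = 1.0 / s0**2 + len(xs)         # posterior precision of phi_c
            m, v = xs.sum() / prec, 1.0 / prec   # H_{-i,c} = N(m, v)
            # Existing cluster: n_{-i,c} times the predictive density of x_i.
            weights.append(len(xs) * norm.pdf(x[i], m, np.sqrt(1.0 + v)))
            choices.append(lab)
        # New cluster: alpha times the prior predictive under G0.
        weights.append(alpha * norm.pdf(x[i], 0.0, np.sqrt(1.0 + s0**2)))
        choices.append(max(labels, default=-1) + 1)
        w = np.array(weights)
        c[i] = choices[rng.choice(len(w), p=w / w.sum())]
        return c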

20
NLP Applications
  • Clustering
  • Document clustering for topic, genre, sentiment, …
  • Word clustering for POS, WSD, synonymy, …
  • Topic clustering across documents (see Blei et
    al., 2004 and Teh et al., 2004)
  • Noun coreference: we don't know how many entities
    there are
  • Other identity-uncertainty problems: deduping,
    etc.
  • Grammar induction
  • Sequence modeling: the infinite HMM
  • Topic segmentation (see Grenager et al., 2005)
  • Sequence models for POS tagging
  • Others?

21
Nested CRP
[Figure: the nested CRP illustrated as restaurant
choices on Day 1, Day 2, and Day 3]
22
Nested CRP (cont.)
  • To generate a document, given a tree with L levels
    (a sketch follows below):
  • Choose a path from the root of the tree to a leaf
  • Draw a vector θ of topic mixing proportions from
    an L-dimensional Dirichlet
  • Generate the words in the document from a mixture
    of the topics along the path, with mixing
    proportions θ
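A sketch of this generative step, assuming the tree, its
per-node topics, and a CRP step over children are already
available; the node and topic interfaces here are
hypothetical, not from Blei et al.:

    import numpy as np

    def generate_document(root, L, gamma, num_words, rng):
        # 1. Choose a path from the root to a leaf; at each node the child
        #    is picked by a CRP over that node's children (hypothetical method).
        path, node = [root], root
        for _ in range(L - 1):
            node = node.choose_child_crp(rng)
            path.append(node)
        # 2. Draw topic mixing proportions theta from an L-dimensional Dirichlet.
        theta = rng.dirichlet(np.full(L, gamma))
        # 3. Emit each word from the topic at a level sampled from theta
        #    (topic.sample_word is likewise hypothetical).
        levels = rng.choice(L, size=num_words, p=theta)
        return [path[k].topic.sample_word(rng) for k in levels]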

23
Nested CRP (cont.)
24
References
  • Seminal
  • T.S. Ferguson. A Bayesian analysis of some
    nonparametric problems. Annals of Statistics,
    1:209-230, 1973.
  • C.E. Antoniak. Mixtures of Dirichlet processes
    with applications to Bayesian nonparametric
    problems. Annals of Statistics, 2:1152-1174, 1974.
  • Foundational
  • M.D. Escobar and M. West. Bayesian density
    estimation and inference using mixtures. Journal
    of the American Statistical Association,
    90:577-588, 1995.
  • S.N. MacEachern and P. Müller. Estimating mixture
    of Dirichlet process models. Journal of
    Computational and Graphical Statistics,
    7:223-238, 1998.
  • R.M. Neal. Markov chain sampling methods for
    Dirichlet process mixture models. Journal of
    Computational and Graphical Statistics,
    9:249-265, 2000.
  • C.E. Rasmussen. The Infinite Gaussian Mixture
    Model. NIPS, 2000.
  • H. Ishwaran and L. James. Gibbs sampling methods
    for stick-breaking priors. Journal of the
    American Statistical Association, 96:161-173,
    2001.
  • NLP
  • D.M. Blei, T.L. Griffiths, M.I. Jordan, and J.B.
    Tenenbaum. Hierarchical topic models and the
    nested Chinese restaurant process. NIPS, 2004.
  • Y.W. Teh, M.I. Jordan, M.J. Beal, and D.M. Blei.
    Hierarchical Dirichlet processes. NIPS, 2004.