Title: Bayesian Generalized Product Partition Model
1 Bayesian Generalized Product Partition Model
- By David Dunson and Ju-Hyun Park
- Presentation by Eric Wang 2/15/08
2 Outline
- Introduce Product Partition Models (PPM).
- Relate PPM to the Dirichlet process (DP) via the Blackwell-MacQueen Polya urn scheme.
- Introduce predictor dependence into PPM to form the Generalized PPM (GPPM).
- Discussion and results.
- Conclusion.
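3 Product Partition Model
- A PPM is formally defined as
  $$f(y \mid S^*) = \prod_{h=1}^{k} f_h(y_h), \qquad \pi(S^*) = c_0 \prod_{h=1}^{k} c(S_h) \qquad (1)$$
- Where $S^* = \{S_1, \dots, S_k\}$ is a partition of $\{1, \dots, n\}$, $c(S_h)$ is a nonnegative cohesion, and $c_0$ is a normalizing constant.
- Let $y_h = \{y_i : i \in S_h\}$ denote the data for subjects in cluster $h$, $h = 1, \dots, k$.
- The prior probability of the partition $S^*$ is therefore proportional to the product of the cohesions of its subsets.
- The posterior on $S^*$ after seeing data $y$ is also a PPM, with updated cohesion $c(S_h) f_h(y_h)$:
  $$\pi(S^* \mid y) \propto \prod_{h=1}^{k} c(S_h)\, f_h(y_h)$$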
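4 Product Partition Model
- A PPM can also be induced hierarchically:
  $$y_i \mid \theta, S \sim f(\theta_{S_i}), \qquad S_i \sim \sum_{h=1}^{k} \pi_h \delta_h, \qquad \theta_h \sim G_0$$
- Where $S_i = h$ if $i \in S_h$, and $\theta = (\theta_1, \dots, \theta_k)$.
- Taking $k = \infty$ induces a nonparametric PPM.
- A prior on the weights $\pi = (\pi_1, \dots, \pi_k)$ imposes a particular form on the cohesion; a convenient choice corresponds to the Dirichlet process.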
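5 Relating DP and PPM
- In DP, $y_i \sim f(\theta_i)$, $\theta_i \sim G$, $G \sim \mathrm{DP}(\alpha G_0)$.
- G appears explicitly in the stick-breaking construction. If it is marginalized out, it yields the Blackwell-MacQueen (1973) formulation
  $$(\theta_i \mid \theta_1, \dots, \theta_{i-1}) \sim \frac{\alpha}{\alpha + i - 1}\, G_0 + \frac{1}{\alpha + i - 1} \sum_{j=1}^{i-1} \delta_{\theta_j}$$
- Where $\theta_i$ is the parameter value taken by the ith data point.
- The joint distribution of a particular set $\theta = (\theta_1, \dots, \theta_n)$ is therefore the product of these sequential conditionals:
  $$\pi(\theta) = \prod_{i=1}^{n} \frac{\alpha\, G_0(\theta_i) + \sum_{j<i} \delta_{\theta_j}(\theta_i)}{\alpha + i - 1}$$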
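The Blackwell-MacQueen urn can be simulated directly, which makes the clustering behaviour of the DP concrete. The following is a minimal Python sketch, not code from the paper; the function name `polya_urn_labels`, the concentration value, and the seed are assumptions chosen for illustration. It draws cluster labels only; the unique parameter values would be drawn iid from the base measure, one per cluster.

```python
import random

def polya_urn_labels(n, alpha, seed=0):
    """Draw cluster labels for n subjects from the Blackwell-MacQueen urn.

    Subject i (0-based) starts a new cluster with probability
    alpha / (alpha + i) and joins existing cluster h with probability
    n_h / (alpha + i), where n_h is the current size of cluster h.
    """
    rng = random.Random(seed)
    labels = []   # labels[i] = cluster index of subject i
    sizes = []    # sizes[h]  = number of subjects currently in cluster h
    for i in range(n):
        # Unnormalized weights: existing clusters first, new cluster last.
        weights = sizes + [alpha]
        h = rng.choices(range(len(weights)), weights=weights)[0]
        if h == len(sizes):
            sizes.append(1)      # open a new cluster
        else:
            sizes[h] += 1        # join an existing cluster
        labels.append(h)
    return labels, sizes

labels, sizes = polya_urn_labels(n=50, alpha=1.0)
```

Larger values of `alpha` tend to open more clusters, while the "rich get richer" weights `n_h` favour clusters that are already large.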
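6 Relating DP and PPM
- It can be shown directly that the Blackwell-MacQueen formulation leads to
  $$\pi(\theta) = c_0 \prod_{h=1}^{k} \Big\{ \alpha\, G_0(\theta_h^*) \prod_{j=2}^{k_h} (j-1)\, \delta_{\theta_h^*}(\theta_{h,j}) \Big\} \qquad (2)$$
- Where $k_h$ is the number of data points taking unique value $\theta_h^*$.
- $\theta_{h,j}$ is the value of the jth subject in cluster h, with subjects re-sorted by their ids.
- Also, $c_0 = \Gamma(\alpha) / \Gamma(\alpha + n)$ is a normalizing constant and the cohesion is $c(S_h) = \alpha\, (k_h - 1)!$. Then the induced prior on the partition is
  $$\pi(S^*) = c_0 \prod_{h=1}^{k} c(S_h) \qquad (3)$$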
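7 Relating DP and PPM
- From slide 3, writing the prior and likelihood together
  $$\pi(S^*)\, f(y \mid S^*) = c_0 \prod_{h=1}^{k} c(S_h)\, f_h(y_h)$$
- Notice that from (1), G can be marginalized out to get the same form
  $$\pi(S^*)\, f(y \mid S^*) = c_0 \prod_{h=1}^{k} \alpha\, (k_h - 1)! \int \prod_{i \in S_h} f(y_i \mid \theta)\, dG_0(\theta) \qquad (4)$$
- Specifically, integrate over all possible unique values which can be taken by $\theta$ for subset h.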
8 Relating DP and PPM
- Therefore, DP is a special case of PPM with cohesion $c(S_h) = \alpha\, (k_h - 1)!$ and normalizing constant $c_0 = \Gamma(\alpha) / \Gamma(\alpha + n)$.
- However, (2) follows the premise of DP that the data are exchangeable and does not incorporate dependence on predictors.
- Next, PPMs will be generalized such that predictor dependence is incorporated.
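The claim that the DP is a PPM can be checked numerically for small n. The sketch below assumes the standard DP cohesion $\alpha (k_h - 1)!$ and uses the identity $\Gamma(\alpha + n) / \Gamma(\alpha) = \alpha (\alpha + 1) \cdots (\alpha + n - 1)$ for the inverse normalizing constant. It enumerates every partition of n subjects and confirms that the product-form probabilities sum to one. The helper names are hypothetical and the code is an illustration, not the authors' implementation.

```python
from math import factorial, prod

def set_partitions(items):
    """Yield every partition of `items` as a list of blocks (lists)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing block in turn ...
        for j in range(len(partition)):
            yield partition[:j] + [[first] + partition[j]] + partition[j + 1:]
        # ... or into a new block of its own.
        yield [[first]] + partition

def dp_cohesion(block, alpha):
    """DP cohesion c(S_h) = alpha * (k_h - 1)! for a block of size k_h."""
    return alpha * factorial(len(block) - 1)

def dp_partition_probs(n, alpha):
    """Return (partition, probability) pairs under the DP-induced PPM."""
    # 1 / c_0 = alpha * (alpha + 1) * ... * (alpha + n - 1)
    inv_c0 = prod(alpha + i for i in range(n))
    return [(p, prod(dp_cohesion(b, alpha) for b in p) / inv_c0)
            for p in set_partitions(list(range(n)))]

probs = dp_partition_probs(n=5, alpha=1.5)
total = sum(pr for _, pr in probs)   # equals 1 up to rounding error
```

Swapping `dp_cohesion` for any other nonnegative function of a block gives a different PPM; only the normalizing constant has to be recomputed by summing over partitions.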
9 Generalized PPM
- The goal of the paper is to formulate (1) such that the cohesion depends on the subjects' predictors.
- This can be done following a process very similar to the non-predictor case above.
- Once again, the connection between DP and PPM will be used; the resulting model will henceforth be referred to as the GPPM.
- The formulation is interesting because the predictors will be treated as random variables rather than known fixed values (as in the kernel stick-breaking process, KSBP).
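10 GPPM
- Consider the following hierarchical model
  $$y_i \mid x_i, \theta_i \sim f_1(y_i \mid x_i, \theta_i), \qquad x_i \mid \gamma_i \sim f_2(x_i \mid \gamma_i), \qquad (\theta_i, \gamma_i) \mid G \sim G, \qquad G \sim \mathrm{DP}(\alpha G_0)$$
- Where $G_0 = G_{0\theta} \otimes G_{0\gamma}$ constitutes a base measure on $\theta$ and $\gamma$, the parameters of the data and predictor, respectively.
- This model will segment data 1, ..., n into k clusters. As before, $S_i = h$ denotes that subject i belongs to cluster h.
- $\theta^* = (\theta_1^*, \dots, \theta_k^*)$ and $\gamma^* = (\gamma_1^*, \dots, \gamma_k^*)$ denote the unique values of the parameters associated with the subjects and their predictors.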
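11 GPPM
- The joint distribution of $(\theta, \gamma)$ can be developed in a similar manner to (2)
  $$\pi(\theta, \gamma) = c_0 \prod_{h=1}^{k} \Big\{ \alpha\, G_0(\theta_h^*, \gamma_h^*) \prod_{j=2}^{k_h} (j-1)\, \delta_{(\theta_h^*, \gamma_h^*)}(\theta_{h,j}, \gamma_{h,j}) \Big\} \qquad (5)$$
- The conditional distribution of $\theta$ given predictors $X$ is
  $$\pi(\theta \mid X) \propto \prod_{h=1}^{k} \Big\{ \alpha\, G_{0\theta}(\theta_h^*) \Big[ \int \prod_{i \in S_h} f_2(x_i \mid \gamma)\, dG_{0\gamma}(\gamma) \Big] \prod_{j=2}^{k_h} (j-1)\, \delta_{\theta_h^*}(\theta_{h,j}) \Big\} \qquad (6)$$
  where $f_2(x_i \mid \gamma)$ is the predictor density and $G_{0\theta}$, $G_{0\gamma}$ are the components of the base measure on the data and predictor parameters.
- For comparison, (2) is
  $$\pi(\theta) = c_0 \prod_{h=1}^{k} \Big\{ \alpha\, G_0(\theta_h^*) \prod_{j=2}^{k_h} (j-1)\, \delta_{\theta_h^*}(\theta_{h,j}) \Big\} \qquad (2)$$
- The cohesion in (6) is
  $$c(S_h, x_h) = \alpha\, (k_h - 1)! \int \prod_{i \in S_h} f_2(x_i \mid \gamma)\, dG_{0\gamma}(\gamma) \qquad (7)$$
- (7) meets the criterion originally set out: the cohesion depends on the predictors of the subjects in cluster h.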
12 GPPM
- Some thoughts on GPPM so far:
- As noted earlier, the posterior distribution of a PPM is still in the class of PPMs, but with updated cohesion.
- Similarly, the posterior of a GPPM will also take the form of a GPPM.
- (2) and (6) are quite similar. The extra portion of (6) is the marginal probability of the predictors in cluster h.
- If the predictor density $f_2$ does not depend on the cluster-specific parameter $\gamma$, then the GPPM reverts to the Blackwell-MacQueen formulation, seen clearly in the following theorem.
13 Generalized Polya Urn Scheme
- The following theorem (Theorem 1) shows that the GPPM can induce a Blackwell-MacQueen Polya urn scheme, generalized for predictor dependence.
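As a hedged sketch of the kind of statement such a theorem makes, the generalized urn can be written as follows. The notation is assumed rather than taken from the slides: $\theta_h^*$ is the unique value of cluster h, $k_h^{(i)}$ is the size of cluster h excluding subject i, $x_h^{(i)}$ are the predictors in cluster h excluding subject i, $f_2$ is the predictor density, and $G_{0\theta}$, $G_{0\gamma}$ are the two components of the base measure.

```latex
\begin{aligned}
(\theta_i \mid \theta^{(i)}, X) &\sim w_{i0}(X)\, G_{0\theta}
    + \sum_{h=1}^{k^{(i)}} w_{ih}(X)\, \delta_{\theta_h^*}, \\
w_{i0}(X) &\propto \alpha \int f_2(x_i \mid \gamma)\, dG_{0\gamma}(\gamma), \\
w_{ih}(X) &\propto k_h^{(i)} \int f_2(x_i \mid \gamma)\,
    dG_{0\gamma}\big(\gamma \mid x_h^{(i)}\big).
\end{aligned}
```

When $f_2$ does not depend on $\gamma$, both integrals equal the same constant and the weights reduce to $\alpha$ and $k_h^{(i)}$, which is the ordinary Blackwell-MacQueen urn.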
14 Generalized Polya Urn Scheme
- By the above theorem, data point i will do either 1) or 2):
- 1) Draw a previously unseen unique value, with probability proportional to the concentration parameter $\alpha$ times the marginal density of its predictor under the base measure.
- 2) Draw a previously used unique value equal to the parameters of cluster h, with probability proportional to the number of data points which have previously chosen that unique value times the marginal likelihood of its predictor value given the predictors already in cluster h.
- Further, since the predictors are treated as random variables, updating the posteriors on each cluster's predictor parameters means that the GPPM is a flexible, nonparametric way to adapt the distance measure in predictor space.
- In this paper G is always integrated out; however, Dunson alludes to variational techniques which could still be developed in similar fashion, following the fast variational DP proposed by Kurihara et al. (2006).
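The two urn options can be made concrete with a deliberately simplified predictor model. The sketch below assumes a scalar predictor with $x \mid \mu \sim N(\mu, s^2)$ and base measure $\mu \sim N(m_0, t^2)$. This conjugate toy model, the function names, and all hyperparameter values are assumptions for illustration only; the paper's example uses a multivariate Normal-Wishart specification instead.

```python
import math

def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def predictive_pdf(x_new, xs, m0, t2, s2):
    """Predictive density of x_new given predictors xs already in a cluster.

    Assumed toy model: x | mu ~ N(mu, s2), mu ~ N(m0, t2). With xs empty this
    is the marginal density of x_new under the base measure.
    """
    post_prec = 1.0 / t2 + len(xs) / s2
    post_mean = (m0 / t2 + sum(xs) / s2) / post_prec
    return normal_pdf(x_new, post_mean, s2 + 1.0 / post_prec)

def generalized_urn_weights(x_new, clusters, alpha, m0=0.5, t2=1.0, s2=0.01):
    """Return normalized weights [w_0, w_1, ..., w_k] for a new subject.

    w_0 (new cluster)      is proportional to alpha * marginal density of x_new;
    w_h (existing cluster) is proportional to n_h * predictive density of x_new.
    `clusters` is a list of lists of predictor values, one list per cluster.
    """
    raw = [alpha * predictive_pdf(x_new, [], m0, t2, s2)]
    raw += [len(xs) * predictive_pdf(x_new, xs, m0, t2, s2) for xs in clusters]
    total = sum(raw)
    return [w / total for w in raw]

clusters = [[0.10, 0.12, 0.15], [0.80, 0.85, 0.90]]
weights = generalized_urn_weights(0.11, clusters, alpha=1.0)
```

A subject with predictor 0.11 is pulled strongly toward the first cluster, whose members have nearby predictors. Making `s2` very large flattens the predictor density, and the weights fall back to the ordinary urn proportions `alpha : n_1 : n_2`.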
15 Generalized Polya Urn Scheme
- Consider, for example, a Normal-Wishart prior on the predictor parameters, specified by two multiplicative constants and a Wishart distribution with given degrees of freedom and mean.
- Notice that this formulation adds another multiplier to the precision of the predictor distribution. This multiplier is analogous to the kernel width in KSBP, and encourages tight local clustering in predictor space.
- The marginal distributions on the predictors from Theorem 1 take the forms shown on the next slide.
16 Generalized Polya Urn Scheme
- The marginal distribution of the predictor in the first weight is a non-central multivariate t-distribution, with degrees of freedom, mean, and scale determined by the prior hyperparameters.
- The marginal distribution of the predictor in the second weight has the same functional form but with updated hyperparameters, which depend on the empirical mean of the predictors in cluster h, computed without predictor i.
17 Generalized Polya Urn Scheme
- Posterior updating in this model is straightforward using MCMC. The conditional posterior of the parameters combines the base prior updated with the data likelihood and the weights from Theorem 1.
- The indicators are updated separately from the cluster parameters. The membership indicators are sampled from their multinomial posterior.
- Next, update the cluster parameters conditioned on the indicators and the number of clusters k.
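The indicator update can be sketched as a single multinomial draw. The code below assumes the usual collapsed-sampler form, in which the posterior probability of each option is proportional to its urn weight times the likelihood of $y_i$ under that option. For the "new cluster" option this is the marginal likelihood under the base measure. The function name and the numbers in the usage example are hypothetical.

```python
import math
import random

def sample_indicator(log_prior_weights, log_likelihoods, rng):
    """Sample a cluster index from weights proportional to prior * likelihood.

    Both arguments are lists of the same length; entry 0 stands for "open a
    new cluster" and entries 1..k for the existing clusters. Working in logs
    and subtracting the maximum avoids numerical underflow.
    """
    log_post = [lw + ll for lw, ll in zip(log_prior_weights, log_likelihoods)]
    m = max(log_post)
    unnorm = [math.exp(v - m) for v in log_post]
    total = sum(unnorm)
    probs = [u / total for u in unnorm]
    index = rng.choices(range(len(probs)), weights=probs)[0]
    return index, probs

rng = random.Random(1)
# Hypothetical numbers: urn weights for [new, cluster 1, cluster 2] and the
# log-likelihood of y_i under each option.
index, probs = sample_indicator(
    [math.log(0.05), math.log(0.60), math.log(0.35)],
    [-3.0, -0.5, -4.0],
    rng,
)
```

After all indicators have been resampled, each cluster's parameters would be redrawn from their conditional posterior given the subjects currently assigned to that cluster.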
18 Results
- Dunson and Park demonstrate results on conditional density regression problems, using a model defined by a data likelihood for the response given a p-dimensional predictor and the parameters of cluster h.
- Results are demonstrated on 3 datasets:
- Simulated single Gaussian (p = 2)
- Simulated mixture of two Gaussians (p = 2)
- Epidemiology data (p = 3)
19 Results
- Simulated single Gaussian data, 500 data points.
- The predictor is generated iid from a uniform distribution over (0,1).
- Data were simulated from a single Gaussian conditional on the predictor.
- The algorithm was run for 10,000 iterations with a 1,000 iteration burn-in. Mixing was fast and the estimates were good.
[Figure: raw data, y plotted against x.]
[Figure: conditional distributions of y for two different values of x.] The dotted line is the truth, the solid line is the estimate, and the dashed lines are 99% credible intervals.
20 Results
- Simulated mixture of two Gaussians, 500 data points.
- The predictor is generated iid from a uniform distribution over (0,1).
- Data were simulated from a two-component Gaussian mixture conditional on the predictor.
[Figure: two columns of plots, PPM (left) and GPPM (right).]
Here, the left column of plots is for a PPM (non-generalized), while the right column is for the GPPM on the same dataset. Notice the much better fit in the bottom plots, and that the GPPM is not dragged toward 0 as the second peak appears when [...] approaches 0.
21 Results
- Epidemiologic application.
- DDE has been shown to increase the rate of pre-term birth. The two predictors correspond to the DDE dose for child i and the mother's age after normalization, respectively.
- Dataset size was 2,313 subjects.
- MCMC for the GPPM was run for 30,000 iterations with a 10,000 iteration burn-in.
- The results confirmed earlier findings of a slightly decreasing trend in the response as the DDE level rises.
- These findings are similar to previous KSBP work on the same dataset, but the implementation was simpler.
22 Results
[Figure: raw data.]
Dashed lines indicate 99% credible intervals.
23 Conclusion
- A GPPM was formulated beginning with the Blackwell-MacQueen Polya urn scheme.
- The GPPM incorporates predictor dependence by treating the predictor as a random variable.
- It is similar in spirit to the KSBP, but is able to bypass issues such as kernel width selection and the inability to implement a continuous distribution in predictor space.
- Future research directions could explore Dunson's mention of a variational method similar to the formulation proposed in this paper.