Bayesian Generalized Product Partition Model

1
Bayesian Generalized Product Partition Model
  • By David Dunson and Ju-Hyun Park
  • Presentation by Eric Wang 2/15/08

2
Outline
  • Introduce Product Partition Models (PPM).
  • Relate PPM to DP via the Blackwell-MacQueen Polya
    Urn scheme.
  • Introduce predictor dependence into PPM to form
    Generalized PPM (GPPM).
  • Discussion and Results
  • Conclusion

3
Product Partition Model
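  • A PPM is formally defined as
    $\pi(S^*) = c_0 \prod_{h=1}^{k} c(S_h^*)$   (1)
  • Where $S^* = (S_1^*, \ldots, S_k^*)$ is a partition of
    $\{1, \ldots, n\}$, $c(\cdot)$ is a nonnegative cohesion, and
    $c_0$ is a normalizing constant.
  • Let $y_h$ denote the data
    for subjects in cluster h, h = 1, ..., k.
  • The probability of a partition is therefore
    proportional to the product of the cohesions of
    its subsets.
  • The posterior on $S^*$ after seeing data $y$
    is also a PPM, with posterior cohesion
    $c(S_h^*) f_h(y_h)$, where $f_h(y_h)$ is the marginal
    likelihood of the data in cluster h.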
4
Product Partition Model
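  • A PPM can also be induced hierarchically
    $y_i \mid S_i = h \sim f(\theta_h)$, $S_i \sim \sum_{h=1}^{k} \pi_h \delta_h$, $\theta_h \sim G_0$
  • Where $S_i = h$ if subject i is in subset $S_h^*$
    of the partition.
  • Taking $k \to \infty$ induces a nonparametric PPM.
  • A prior on the weights $\pi = (\pi_1, \ldots, \pi_k)$
    imposes a particular form on the cohesion; a
    convenient choice corresponds to the Dirichlet
    Process.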

5
Relating DP and PPM
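  • In DP, $y_i \sim f(\phi_i)$, $\phi_i \sim G$, $G \sim DP(\alpha G_0)$.
  • G is seen in stick breaking. If it is
    marginalized out, it yields the
    Blackwell-MacQueen (1973) formulation
    $\phi_i \mid \phi_1, \ldots, \phi_{i-1} \sim \frac{\alpha}{\alpha + i - 1} G_0 + \frac{1}{\alpha + i - 1} \sum_{j=1}^{i-1} \delta_{\phi_j}$
  • Where $\phi_i$ is the parameter value taken by the ith
    observation.
  • The joint distribution of a particular set
    $\phi_1, \ldots, \phi_n$ is therefore the product of these
    sequential conditionals,
    $\pi(\phi_1, \ldots, \phi_n) = \prod_{i=1}^{n} \pi(\phi_i \mid \phi_1, \ldots, \phi_{i-1})$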
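
The sequential urn draws described on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the base measure is taken to be a standard normal, and the names `polya_urn_draws` and `base_draw` are hypothetical, not from the paper. With probability alpha / (alpha + i) the next draw is a fresh value from the base measure, otherwise it copies one of the i earlier draws chosen uniformly.

```python
import random

def polya_urn_draws(n, alpha, base_draw, rng):
    """Draw phi_1..phi_n sequentially from a Blackwell-MacQueen urn (alpha > 0)."""
    draws = []
    for i in range(n):  # i = number of draws made so far
        if rng.random() < alpha / (alpha + i):
            # New unique value from the base measure G_0.
            draws.append(base_draw(rng))
        else:
            # Copy an earlier draw; each has probability 1 / (alpha + i) overall.
            draws.append(rng.choice(draws))
    return draws

rng = random.Random(0)
# Assumed base measure: standard normal.
phis = polya_urn_draws(200, 1.0, lambda r: r.gauss(0.0, 1.0), rng)
n_unique = len(set(phis))  # number of clusters induced by ties among the draws
```

Ties among the draws define the clusters, so `n_unique` is the number of clusters k; it grows slowly with n for a fixed concentration parameter.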

6
Relating DP and PPM
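  • It can be shown directly that the
    Blackwell-MacQueen formulation leads to
    $\pi(\phi_1, \ldots, \phi_n) = c_0 \prod_{h=1}^{k} \alpha\, \Gamma(k_h)\, G_0(\theta_h)$   (2)
  • Where $k_h$ is the number of observations taking unique
    value $\theta_h$, h = 1, ..., k.
  • $\theta_h$ is the unique value shared by the subjects
    in cluster h, with subjects re-sorted by their ids
  • Also, $c_0 = \Gamma(\alpha) / \Gamma(\alpha + n)$ is a
    normalizing constant and the cohesion is
    $c(S_h^*) = \alpha\, \Gamma(k_h)$. Then
    $\pi(S^*) = c_0 \prod_{h=1}^{k} c(S_h^*)$   (3)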
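
As a sanity check on the product form, the sketch below evaluates the partition probability implied by the Blackwell-MacQueen urn, assuming the standard DP values: cohesion alpha * Gamma(k_h) for a cluster of size k_h and normalizing constant Gamma(alpha) / Gamma(alpha + n). The helper names are hypothetical. Summing over every partition of a small set should give 1.

```python
from math import exp, lgamma, log

def dp_log_partition_prob(cluster_sizes, alpha):
    """log pi(S*) = log c_0 + sum_h log(alpha * Gamma(k_h)) for a DP-induced PPM."""
    n = sum(cluster_sizes)
    log_c0 = lgamma(alpha) - lgamma(alpha + n)
    return log_c0 + sum(log(alpha) + lgamma(k_h) for k_h in cluster_sizes)

def set_partitions(items):
    """Yield every partition of a list as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing block in turn...
        for idx in range(len(partition)):
            yield partition[:idx] + [[first] + partition[idx]] + partition[idx + 1:]
        # ...or let it start a new block.
        yield [[first]] + partition

partitions = list(set_partitions(list(range(5))))
total = sum(exp(dp_log_partition_prob([len(b) for b in p], 1.5)) for p in partitions)
```

Working on the log scale with `lgamma` avoids overflow in the Gamma functions once n grows beyond toy sizes.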
7
Relating DP and PPM
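  • From slide 3, writing the prior and likelihood
    together
    $\pi(S^* \mid y) \propto \prod_{h=1}^{k} c(S_h^*)\, f_h(y_h)$
  • Notice that from (1), G can be marginalized out
    to get the same form
  • Specifically, integrate over all possible unique
    values which can be taken by $\theta_h$ for subset h,
    $f_h(y_h) = \int \prod_{i \in S_h^*} f(y_i \mid \theta_h)\, dG_0(\theta_h)$   (4)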
8
Relating DP and PPM
  • Therefore, DP is a special case of PPM with
    cohesion $c(S_h^*) = \alpha\, \Gamma(k_h)$, where $k_h$ is
    the size of cluster h, and
    normalizing constant
    $c_0 = \Gamma(\alpha) / \Gamma(\alpha + n)$.
  • However, (2) follows the premise of DP that data
    are exchangeable and does not incorporate
    dependence on predictors.
  • Next, PPMs will be generalized such that
    predictor dependence is incorporated.

9
Generalized PPM
  • The goal of the paper is to formulate (1) such
    that the cohesion depends on the subjects'
    predictors
  • This can be done following a process very similar
    to the non-predictor case above.
  • Once again, the connection between DP and PPM
    will be used; the resulting model will henceforth
    be referred to as GPPM
  • The formulation is interesting because the
    predictors
    will be treated as random variables rather than
    known fixed values (as in KSBP).

10
GPPM
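  • Consider the following hierarchical model
    $y_i \sim f(y_i \mid \phi_i)$, $x_i \sim f(x_i \mid \gamma_i)$, $(\phi_i, \gamma_i) \sim G$, $G \sim DP(\alpha G_0)$
  • Where $G_0 = G_{0\phi} \otimes G_{0\gamma}$
    constitutes a base measure on $\phi$
    and $\gamma$, the parameters of the data and
    predictor, respectively.
  • This model will segment data 1, ..., n into k
    clusters. As before, $S_i = h$ denotes that
    subject i belongs to cluster h.
  • $\theta_h$ and $\gamma_h$, h = 1, ..., k,
    denote the unique values of the parameters
    associated with the subject and its predictor.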

11
GPPM
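  • The joint distribution of $(\phi_i, \gamma_i)$, i = 1, ..., n, can be
    developed in a similar manner to (2)   (5)
  • The conditional distribution of the partition $S^*$ given
    predictors $X$ is
    $\pi(S^* \mid X) \propto \prod_{h=1}^{k} c(S_h^*, X)$   (6)
  • For comparison, (2) is the corresponding expression
    without predictors.
  • The cohesion in (6) is
    $c(S_h^*, X) = \alpha\, \Gamma(k_h) \int \prod_{i \in S_h^*} f(x_i \mid \gamma_h)\, dG_{0\gamma}(\gamma_h)$   (7)
    where $k_h$ is the size of cluster h, $f(x_i \mid \gamma_h)$ is the
    predictor likelihood, and $G_{0\gamma}$ is the base measure on
    the predictor parameters.
  • (7) meets the criteria originally set out: the
    cohesion depends on the subjects' predictors.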
12
GPPM
  • Some thoughts on GPPM so far
  • As noted earlier, the posterior distribution of
    a PPM is still in the class of PPMs, but with
    updated cohesion.
  • Similarly, the posterior of a GPPM will also
    take the form of a GPPM
  • (2) and (6) are quite similar. The extra portion
    of (6) is the marginalized probability of the
    predictors.
  • If the predictor term is the same for every
    cluster, then the
    GPPM reverts to the Blackwell-MacQueen
    formulation, seen clearly in the following
    theorem.

13
Generalized Polya Urn Scheme
  • The following theorem shows that the GPPM can
    induce a Blackwell-MacQueen Polya Urn scheme,
    generalized for predictor dependence

14
Generalized Polya Urn Scheme
  • By the above theorem, observation i will do either
    1) or 2)
  • 1) Draw a previously unseen unique value, with
    weight proportional to the concentration parameter
    and the marginal likelihood of its predictor
    under the base measure on the predictor
  • 2) Draw a previously used unique value
    equal to the parameters of cluster h, with weight
    proportional to the number of observations which
    have previously chosen that unique value and the
    marginal likelihood of its predictor value given
    the predictors already in that cluster.
  • Further, since the predictors are treated as
    random variables, updating the posteriors on each
    cluster's predictor parameters means that GPPM is
    a flexible, non-parametric way to adapt the
    distance measure in predictor space.
  • In this paper G is always integrated out;
    however, Dunson alludes to variational techniques
    which could still be developed in similar fashion
    following the fast variational DP proposed by
    Kurihara et al. (2006).
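
The two-way choice described above can be sketched numerically. The example below is not the paper's model: it assumes a one-dimensional predictor with a normal likelihood of known variance `sigma_sq` and a conjugate normal prior on each cluster mean (`m0`, `s0_sq`), chosen only because the predictive density is then a simple normal. All function and parameter names are hypothetical. The weight for a new cluster is proportional to alpha times the prior predictive density of the predictor. The weight for existing cluster h is proportional to its size times the predictive density given that cluster's predictors.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mean, var):
    return exp(-0.5 * (x - mean) ** 2 / var) / sqrt(2.0 * pi * var)

def predictive_pdf(x, members, m0, s0_sq, sigma_sq):
    """Predictive density of x given a cluster's predictors (prior predictive if empty)."""
    n = len(members)
    post_var = 1.0 / (1.0 / s0_sq + n / sigma_sq)
    post_mean = post_var * (m0 / s0_sq + sum(members) / sigma_sq)
    return normal_pdf(x, post_mean, post_var + sigma_sq)

def urn_weights(x_new, clusters, alpha, m0=0.0, s0_sq=1.0, sigma_sq=0.01):
    """Normalized weights: index 0 is a new cluster, index h >= 1 is existing cluster h."""
    raw = [alpha * predictive_pdf(x_new, [], m0, s0_sq, sigma_sq)]
    raw += [len(c) * predictive_pdf(x_new, c, m0, s0_sq, sigma_sq) for c in clusters]
    total = sum(raw)
    return [w / total for w in raw]

# Two existing clusters in predictor space; the new predictor sits next to the first.
clusters = [[0.10, 0.12, 0.08], [0.90, 0.95]]
weights = urn_weights(0.11, clusters, alpha=1.0)
```

A small `sigma_sq` makes the predictive densities sharply peaked, so nearly all of the weight goes to the cluster whose predictors are close to the new one.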

15
Generalized Polya Urn Scheme
  • Consider, for example, a Normal-Wishart prior on
    the predictor as follows
  • Where two of the hyperparameters are multiplicative
    constants and the precision matrix follows a
    Wishart distribution with specified degrees of
    freedom and mean
  • Notice that this formulation adds another
    multiplier to the precision of the
    predictor distribution. This multiplier is
    analogous to the kernel width in KSBP, and
    encourages tight local clustering in predictor
    space.
  • The marginal distributions on the predictors from
    Theorem 1 take the forms shown on the next slide.

16
Generalized Polya Urn Scheme
  • The marginal distribution of the predictor in the
    first weight is a non-central multivariate
    t-distribution, with degrees of freedom, mean, and
    scale determined by the prior hyperparameters
  • The marginal distribution of the predictor in the
    second weight has the same functional form but
    with updated hyperparameters, which depend on the
    empirical mean of the
    predictors in cluster h, without predictor i.
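
A univariate analogue shows how such a t-distributed marginal arises and how its hyperparameters update. The sketch swaps the slide's multivariate Normal-Wishart prior for a one-dimensional Normal-Gamma prior and omits the slide's extra precision multiplier. The hyperparameter names `mu0`, `kappa0`, `a0`, `b0` and their default values are assumptions for illustration only. With no members the function returns the prior predictive density (the first weight); with members it returns the updated predictive density (the second weight).

```python
from math import exp, lgamma, log, pi

def student_t_pdf(x, dof, loc, scale_sq):
    """Density of a location-scale Student-t distribution."""
    z = (x - loc) ** 2 / (dof * scale_sq)
    log_norm = (lgamma((dof + 1.0) / 2.0) - lgamma(dof / 2.0)
                - 0.5 * log(dof * pi * scale_sq))
    return exp(log_norm - 0.5 * (dof + 1.0) * log(1.0 + z))

def normal_gamma_predictive(x, members, mu0=0.0, kappa0=1.0, a0=2.0, b0=1.0):
    """Predictive density of x given the cluster's other predictors (b0 is a rate)."""
    n = len(members)
    xbar = sum(members) / n if n else 0.0          # empirical mean of the cluster
    ss = sum((v - xbar) ** 2 for v in members)     # within-cluster sum of squares
    kappa_n = kappa0 + n
    mu_n = (kappa0 * mu0 + n * xbar) / kappa_n
    a_n = a0 + 0.5 * n
    b_n = b0 + 0.5 * ss + 0.5 * kappa0 * n * (xbar - mu0) ** 2 / kappa_n
    scale_sq = b_n * (kappa_n + 1.0) / (a_n * kappa_n)
    return student_t_pdf(x, 2.0 * a_n, mu_n, scale_sq)

members = [0.4, 0.5, 0.6]
density_at_mean = normal_gamma_predictive(0.5, members)
density_far_away = normal_gamma_predictive(3.0, members)
```

The degrees of freedom grow with the cluster size, so the predictive density tightens around the cluster's empirical mean as more predictors join the cluster.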
17
Generalized Polya Urn Scheme
  • Posterior updating in this model is
    straightforward using MCMC. The conditional
    posterior of the parameters is expressed in terms
    of the base prior updated with
    the data likelihood
    and the weights from Theorem 1
  • The indicators $S_i$ are updated separately from
    the cluster parameters $\theta_h$. The membership
    indicators are sampled from their multinomial
    posterior
  • Next, update the parameters conditioned on the
    indicators and number of clusters k.
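
The indicator update can be sketched as a single categorical draw. The function below is a hypothetical helper, not code from the paper. It assumes the caller supplies, for subject i, the log of each urn weight (index 0 for a new cluster, index h for existing cluster h) and the log likelihood of the subject's response under each option. The two are added and normalized on the log scale before sampling.

```python
import random
from math import exp, log

def sample_membership(log_prior_weights, log_likelihoods, rng):
    """Sample a cluster index with probability proportional to weight * likelihood."""
    log_post = [lp + ll for lp, ll in zip(log_prior_weights, log_likelihoods)]
    peak = max(log_post)                       # subtract the max for numerical stability
    probs = [exp(v - peak) for v in log_post]
    total = sum(probs)
    probs = [p / total for p in probs]
    u, cumulative = rng.random(), 0.0
    for index, p in enumerate(probs):
        cumulative += p
        if u < cumulative:
            return index, probs
    return len(probs) - 1, probs               # guard against rounding in the cumulative sum

rng = random.Random(1)
log_weights = [log(0.2), log(0.5), log(0.3)]   # new cluster, cluster 1, cluster 2
log_liks = [-50.0, -1.0, -60.0]                # response fits cluster 1 far better
index, probs = sample_membership(log_weights, log_liks, rng)
```

In a full sampler this draw would be repeated for every subject in each iteration, followed by the update of the cluster parameters given the indicators.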
18
Results
  • Dunson and Park demonstrate results using the
    following model on conditional density regression
    problems
  • The model is specified in terms of a p-dimensional
    predictor, the data likelihood, and the
    parameters of cluster h.
  • Results are demonstrated on 3 datasets
  • Simulated single Gaussian (p = 2)
  • Simulated mixture of two Gaussians (p = 2)
  • Epidemiology data (p = 3)

19
Results
  • Simulated single Gaussian data, 500 data points
  • The predictor is generated iid from a uniform
    distribution over (0,1).
  • The response was simulated from a single Gaussian.
  • The algorithm was run for 10,000 iterations with a
    1,000 iteration burn-in. Mixing was fast and the
    estimates were good.

Figure: raw data (y plotted against x).
Below are conditional distributions of y for two
different values of x. The dotted line is the
truth, the solid line is the estimate, and the
dashed lines are 99% credible intervals.
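
The run length, burn-in, and 99% intervals quoted on this slide can be illustrated with a short sketch of the post-processing step. The chain below is a deterministic stand-in, not real sampler output, and both helper names are hypothetical. The interval is equal-tailed and uses a nearest-rank rule that rounds outward, so it is slightly conservative.

```python
from math import ceil, floor

def discard_burn_in(chain, burn_in):
    """Drop the first `burn_in` draws of an MCMC chain."""
    return chain[burn_in:]

def credible_interval(draws, level=0.99):
    """Equal-tailed credible interval from posterior draws (nearest rank, rounded outward)."""
    ordered = sorted(draws)
    tail = (1.0 - level) / 2.0
    last = len(ordered) - 1
    return ordered[floor(tail * last)], ordered[ceil((1.0 - tail) * last)]

# Stand-in for 10,000 draws of one quantity, e.g. a density value at a fixed (x, y).
chain = [float(i) for i in range(10000)]
kept = discard_burn_in(chain, 1000)
interval = credible_interval(kept)
```

Applying `credible_interval` pointwise over a grid of y values, at a fixed x, would give bands like the dashed lines described in the figure.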
20
Results
  • Simulated two-Gaussian mixture results, 500 data points
  • The predictor is generated iid from a uniform distribution
    over (0,1).
  • The response was simulated from a mixture of two
    Gaussians.

Figure: two columns of plots, PPM (left) and GPPM (right).
Here, the left column of plots is for a PPM
(non-generalized), while the right column is
the GPPM on the same dataset. Notice the much better
fit in the bottom plots, and that the GPPM is
not dragged toward 0 as the second peak appears
when the predictor approaches 0.
21
Results
  • Epidemiologic Application
  • DDE has been shown to increase the rate of pre-term
    birth. Two predictors
    correspond to DDE dose for child i and mother's
    age after normalization, respectively.
  • Dataset size was 2,313 subjects.
  • MCMC for the GPPM was run for 30,000 iterations with
    a 10,000 iteration burn-in.
  • The results confirmed earlier findings of a
    slightly decreasing trend in the response as DDE
    level rises.
  • These findings are similar to previous KSBP work
    on the same dataset, but the implementation was
    simpler.

22
Results
Figure: raw data. Dashed lines indicate 99% credible intervals.
23
Conclusion
  • A GPPM was formulated beginning with the
    Blackwell-MacQueen Polya Urn scheme.
  • The GPPM incorporates predictor dependence by
    treating the predictor as a random variable.
  • It is similar in spirit to the KSBP, but is able
    to bypass issues such as kernel width selection
    and the inability to implement a continuous
    distribution in predictor space.
  • Future research directions could explore Dunson's
    mention of a variational method for the
    formulation proposed in this paper.