Bayesian Hierarchical Clustering - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Bayesian Hierarchical Clustering


1
Bayesian Hierarchical Clustering
  • Paper by K. Heller and Z. Ghahramani
  • ICML 2005
  • Presented by David Williams
  • Paper Discussion Group (10.07.05)

2
Outline
  • Traditional Hierarchical Clustering
  • Bayesian Hierarchical Clustering
  • Algorithm
  • Results
  • Potential Application

3
Hierarchical Clustering
  • Given a set of data points, the output is a tree
  • Leaves are the data points
  • Internal nodes are nested clusters
  • Examples
  • Evolutionary tree of living organisms
  • Internet newsgroups
  • Newswire documents

4
Traditional Hierarchical Clustering
  • Bottom-up agglomerative algorithm (a minimal sketch follows this list)
  • Begin with each data point in its own cluster
  • Iteratively merge the two closest clusters
  • Stop when a single cluster remains
  • Closeness is based on a given distance measure (e.g., Euclidean distance between cluster means)
  • Limitations
  • No guide to choosing the correct number of clusters, or where to prune the tree
  • The distance metric must be chosen by hand (especially hard for data such as images or sequences)
  • No principled way to evaluate how good the result is, to compare it to other models, to make predictions, or to cluster new data with an existing hierarchy
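
For concreteness, here is a minimal Python sketch of the agglomerative procedure above, using centroid linkage. It is illustrative only and not from the paper; a practical implementation would use, e.g., scipy.cluster.hierarchy.linkage.

    import numpy as np

    def agglomerative(X):
        """Naive bottom-up clustering: X is an (n, d) array of points."""
        # Begin with each data point in its own cluster.
        clusters = {i: [i] for i in range(len(X))}
        merges = []
        next_id = len(X)
        while len(clusters) > 1:
            # Merge the pair of clusters whose centroids are closest.
            ids = sorted(clusters)
            a, b = min(
                ((p, q) for i, p in enumerate(ids) for q in ids[i + 1:]),
                key=lambda pq: np.linalg.norm(
                    X[clusters[pq[0]]].mean(axis=0)
                    - X[clusters[pq[1]]].mean(axis=0)))
            merges.append((a, b, next_id))
            clusters[next_id] = clusters.pop(a) + clusters.pop(b)
            next_id += 1
        return merges  # the merge sequence defines the tree

For example, agglomerative(np.random.randn(10, 2)) produces nine merges, ending in a single root cluster.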

5
Bayesian Hierarchical Clustering (BHC)
  • Basic idea
  • Use marginal likelihoods to decide which clusters to merge
  • Ask: what is the probability that all the data in a potential merge were generated from the same mixture component? Compare this against the exponentially many alternative clusterings at lower levels of the tree
  • The generative model used is a Dirichlet Process Mixture Model (DPM)

6
BHC Algorithm Overview
  • One-pass, bottom-up method
  • Initializes each data point in its own cluster, then iteratively merges pairs of clusters
  • Uses a statistical hypothesis test to choose which clusters to merge
  • At each stage, the algorithm considers merging all pairs of existing trees (see the skeleton below)
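
A minimal Python skeleton of this one-pass loop, offered as a sketch only: merge_score stands in for the posterior merge probability rk derived on the following slides, and make_leaf/make_node are hypothetical helper names, not from the paper.

    def make_leaf(x):
        # A leaf tree holding a single data point.
        return {"points": [x], "children": None, "r": None}

    def make_node(left, right, r):
        # An internal node merging two sub-trees, tagged with its merge score.
        return {"points": left["points"] + right["points"],
                "children": (left, right), "r": r}

    def bhc(points, merge_score):
        # One tree per data point to start.
        trees = [make_leaf(x) for x in points]
        while len(trees) > 1:
            # Consider merging all pairs of existing trees; take the best.
            (i, j), r = max(
                (((i, j), merge_score(trees[i], trees[j]))
                 for i in range(len(trees))
                 for j in range(i + 1, len(trees))),
                key=lambda pair: pair[1])
            right = trees.pop(j)  # pop the larger index first
            left = trees.pop(i)
            trees.append(make_node(left, right, r))
        return trees[0]  # root of the final hierarchy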

7
BHC Algorithm Merging
  • Two hypotheses are compared
  • 1. All the data in the pair of trees to be merged were generated i.i.d. from the same probabilistic model with unknown parameters (e.g., a Gaussian)
  • 2. The data contain two or more clusters

8
Hypothesis H1
  • Probability of the data under H1 (formula below)
  • p(θ|β) is the prior over the parameters θ
  • Dk is the data in the two trees to be merged
  • The integral is tractable when a conjugate prior is employed
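
The formula this slide displayed, reconstructed from the paper:

$$
p(D_k \mid H_1^k) = \int p(D_k \mid \theta)\, p(\theta \mid \beta)\, d\theta
= \int \Big[ \prod_{x^{(i)} \in D_k} p(x^{(i)} \mid \theta) \Big] p(\theta \mid \beta)\, d\theta
$$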

9
Hypothesis H2
  • Probability of the data under H2 is a product over the two sub-trees (formulas below)
  • πk is the prior that all points belong to one cluster
  • p(Dk | Tk), the probability of the data in tree Tk, combines both hypotheses
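
The formulas this slide displayed, reconstructed from the paper: under H2 the data factor over the two sub-trees, and the probability of the data in tree Tk mixes the two hypotheses:

$$
p(D_k \mid H_2^k) = p(D_i \mid T_i)\, p(D_j \mid T_j)
$$

$$
p(D_k \mid T_k) = \pi_k\, p(D_k \mid H_1^k) + (1 - \pi_k)\, p(D_i \mid T_i)\, p(D_j \mid T_j)
$$

where $\pi_k = p(H_1^k)$ and $T_i$, $T_j$ are the sub-trees merged to form $T_k$.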

10
Merging Clusters
  • From Bayes' rule, the posterior probability of the merged hypothesis is rk (formula below)
  • The pair of trees with the highest rk is merged
  • A natural place to cut the final tree is at nodes where rk < 0.5
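
The posterior merge probability, reconstructed from the paper:

$$
r_k = \frac{\pi_k\, p(D_k \mid H_1^k)}
           {\pi_k\, p(D_k \mid H_1^k) + (1 - \pi_k)\, p(D_i \mid T_i)\, p(D_j \mid T_j)}
$$

Note that the denominator is exactly $p(D_k \mid T_k)$ from the previous slide.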

11
Dirichlet Process Mixture Models (DPMs)
  • The probability of a new data point belonging to an existing cluster is proportional to the number of points already in that cluster
  • α controls the probability of the new point creating a new cluster (see below)
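
In equations, this is the standard Chinese restaurant process predictive rule for a DPM (reconstructed here; the slide showed it as an image). After observing $n$ points, of which $n_c$ are in cluster $c$:

$$
p(\text{join existing cluster } c) = \frac{n_c}{n + \alpha}, \qquad
p(\text{create a new cluster}) = \frac{\alpha}{n + \alpha}
$$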

12
Merged Hypothesis Prior
  • A DPM with concentration parameter α defines a prior on all partitions of the nk data points in Dk
  • The prior on the merged hypothesis, πk, is the relative mass of all nk points belonging to one cluster versus all other partitions of those nk points consistent with the tree structure (recursion below)
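
The recursion that computes $\pi_k$, from the paper: initialize each leaf $i$ with $d_i = \alpha$ and $\pi_i = 1$; then for each internal node $k$ with children $\mathrm{left}_k$, $\mathrm{right}_k$ and $n_k$ data points,

$$
d_k = \alpha\, \Gamma(n_k) + d_{\mathrm{left}_k}\, d_{\mathrm{right}_k}, \qquad
\pi_k = \frac{\alpha\, \Gamma(n_k)}{d_k}
$$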

13
DPM
  • Other quantities needed for the posterior merged-hypothesis probabilities can also be written down and computed under the DPM (see the math and proofs in the paper)

14
Results
  • Some sample results

15
(Results figure; no transcript available)
16
(Results figure; no transcript available)
17
Unique Aspects of Algorithm
  • It is a hierarchical way of organizing nested clusters, not a hierarchical generative model
  • It is derived from DPMs
  • The hypothesis test is not one vs. two clusters at each stage; it is one cluster vs. many other clusterings
  • It is not iterative and does not require sampling

18
Summary
  • Defines a probabilistic model of the data; can compute the probability of a new data point belonging to any cluster in the tree.
  • Provides a model-based criterion for deciding on merging clusters.
  • Bayesian hypothesis testing is used to decide which merges are advantageous, and to decide the appropriate depth of the tree.
  • The algorithm can be interpreted as an approximate inference method for a DPM; it gives a new lower bound on the marginal likelihood by summing over exponentially many clusterings of the data.

19
Why This Paper?
  • Mixed-type data problems: both continuous and discrete features
  • How to perform density estimation?
  • One way: partition the continuous data into groups determined by the values of the discrete features.
  • Problem: the number of groups grows quickly (e.g., 5 features, each of which can take 4 values, gives 4^5 = 1024 groups)
  • How to determine which groups should be combined to reduce the total number of groups?
  • Possible solution: the idea in this paper, except that rather than the leaves being individual data points, they would be groups of data points as determined by the discrete feature values