1
Generative Models of Affinity Matrices
  • Rómer Rosales and Brendan Frey
  • romer@psi.toronto.edu, frey@psi.toronto.edu
  • Probabilistic and Statistical Inference Group
  • University of Toronto

2
Overview
  • Background
  • Generative Models of Affinity Matrices
  • Spectral Clustering
  • A Graphical Model of Spectral Clustering
  • A More General View of Spectral Clustering
  • Limitations in Spectral Clustering
  • New models of Affinity Matrices
  • Experimental Results
  • Conclusions

3
Background and Notation
  • Data set
  • We are given a set of data elements
  • With a given measure of similarity
  • Alternatively, an affinity L_ij for each pair
    of elements in the data set
  • Class labels
  • A finite set of size M, e.g., {1, ..., M}
  • Clusters
  • We want to infer a distribution or most likely
    class assignment for each data set element

4
Affinity Matrices
  • Different forms, e.g., with the standard L2 measure,
    L_ij = exp(-||x_i - x_j||^2 / 2σ^2)
  • Intuitively separates close points from far
    points
  • What scale σ? What form for L?
  • Idea: use a simpler form instead, and incorporate
    knowledge or specific properties of the desired
    clustering into a well-defined probabilistic model
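  • As a concrete illustration (not from the slides), a minimal
    Python sketch of this standard Gaussian affinity; the function
    name and the two-blob example are mine:

import numpy as np

def gaussian_affinity(X, sigma=1.0):
    # L_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)), the standard L2-based form.
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

# Two well-separated blobs give a nearly block-diagonal L.
X = np.vstack([np.random.randn(5, 2), np.random.randn(5, 2) + 10.0])
L = gaussian_affinity(X, sigma=1.0)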

5
Latent Representation Idea
  • There is a (usually low-dimensional) vector
    associated with each input data point
  • We don't observe these hidden vectors, but we
    observe some function of them (e.g., the pair-wise
    affinities or some high-dimensional version of them)
  • Want to find a probability distribution over them
    to explain the observations

6
A Generative Model for L
Joint distribution
7
Spectral Clustering Instance
  • For Spectral Clustering, choose a conditional for
    L_ij centered on the product of the hidden variables
    for points i and j
  • Uses the same variance for all pairs (i, j)
  • In SC, the key structural form is a product of
    hidden-variable pairs
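  • A hedged sketch of that structural form (the names h and sigma
    are mine, not the paper's exact notation): each observed entry
    L_ij is generated around the inner product of hidden vectors
    h_i and h_j, with one shared noise variance.

import numpy as np

rng = np.random.default_rng(0)
N, d = 8, 2                       # number of points, latent dimensionality
h = rng.standard_normal((N, d))   # hidden low-dimensional vectors
sigma = 0.1                       # the same variance for every entry (i, j)

mean = h @ h.T                    # product of hidden-variable pairs
L = mean + sigma * rng.standard_normal((N, N))  # noisy observed affinities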

8
Spectral Clustering (cont.)
  • The standard algorithm for Spectral Clustering is
    greedy inference in this model
  • Step 1
  • Choose the best assignment for the (d-dim) hidden
    variables (MAP) by minimizing the Frobenius norm:
    SVD (best rank-d approximation)
  • Step 2
  • Choose good means (and covariances) for the clusters
    given the eigenvector rows (e.g., use k-means, a
    mixture of Gaussians, etc.)
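  • The two greedy steps map onto standard library calls; a minimal
    sketch (the function name spectral_cluster is mine, assuming a
    symmetric affinity matrix L):

import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def spectral_cluster(L, d, n_clusters):
    # Step 1: the top-d eigenvectors of the symmetric affinity matrix give
    # the MAP hidden variables under the Frobenius-norm criterion
    # (best rank-d approximation).
    vals, vecs = eigh(L)
    V = vecs[:, -d:]
    # Step 2: fit means to the eigenvector rows (k-means here; a mixture
    # of Gaussians would also fit covariances).
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(V)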

9
Generalizing SC
  • New algorithms for optimization
  • Rooted in the probabilistic view, e.g., different
    forms of approximate inference in graphical
    models
  • New models based on SC components
  • Same basic components, but using a new structural
    form (e.g., different hidden-variable
    interactions)
  • New models of affinity matrices beyond SC

10
New Algorithms for Optimization
  • A simple example algorithm
  • Inference using the EM algorithm
  • Find posteriors over the class labels (E-step)
  • Find MAP estimates for the means and variances
    (M-step)
  • Jointly optimize each quantity given the rest of
    the variables, instead of the greedy two-step
    optimization, as sketched below
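  • A minimal EM sketch under illustrative assumptions (an isotropic
    Gaussian mixture over latent rows V; all names are mine, not the
    paper's):

import numpy as np

def em(V, M, n_iters=50):
    """V: (N, d) latent rows; M: number of classes. Illustrative only."""
    N, d = V.shape
    mu = V[np.random.choice(N, M, replace=False)]   # initialize means
    var = np.ones(M)
    for _ in range(n_iters):
        # E-step: posterior over class labels for each row.
        log_p = -((V[:, None, :] - mu[None]) ** 2).sum(-1) / (2 * var) \
                - 0.5 * d * np.log(var)
        log_p -= log_p.max(axis=1, keepdims=True)
        r = np.exp(log_p)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate means and variances given the posteriors.
        Nk = r.sum(0)
        mu = (r.T @ V) / Nk[:, None]
        var = np.array([(r[:, k] * ((V - mu[k]) ** 2).sum(-1)).sum()
                        for k in range(M)]) / (d * Nk)
    return r, mu, var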

11
New Algorithms
  • Example results

12
Example (cont.)
  • Usually converges to the same solution as SC
  • Other algorithms: e.g., variational inference

13
New Models
  • A slight change in the conditional distribution
    (a more intuitive example)
  • Perhaps this can be used to explain the two ends of
    the spectrum (scale → 0: clustering; scale → ∞:
    dimensionality reduction)

14
Overview
  • Background
  • Generative Models of Affinity Matrices
  • Spectral Clustering
  • A Graphical Model of Spectral Clustering
  • Generalizing Spectral Clustering
  • Limitations in Spectral Clustering
  • New models of Affinity Matrices
  • Experimental Results
  • Conclusions

15
Usual SC Examples
16
Less Usual SC Examples
  • Which clustering is better?

17
Less Usual SC Examples
  • Spectral Clustering results

18
Less Usual SC Examples
19
(figure-only slide)
20
Remarks
  • There is no such thing as a correct clustering (in
    general)
  • Some clusterings may simply agree or disagree with
    our perception
  • Optimal choice of scale is an issue
  • We can usually explain the clusterings that are
    produced

21
New Models of Affinity Matrices
  • A basic Bayesian-net view of affinity matrices
  • Also an MRF representation (Ising models)

22
  • Bayes Net of the Bayes Net

23
Inference in the Basic Affinity Matrix BN (or MRF)
  • For example (for clustering)
  • Difficult to perform inference
  • The MAP estimate of the class labels is equivalent
    to the MAX-CUT problem (NP-hard)
  • Can always try approximate inference

24
Scaled Affinity Matrix BN
  • Each point is connected only to a subset of the
    same-class points, represented by a random variable
    (a graph)
  • The internal class scale is also a random variable

25
Scaled Affinity Matrix
  • Avoids setting an explicit affinity scale
  • Allows different class-dependent scales and
    variances
  • Representation for the graph
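  • To make the class-dependent-scale idea concrete, a hedged sketch
    (the exact conditional form is not shown on the slide; per-class
    sigmas and zeroed inter-class entries are my illustrative choices):

import numpy as np

def scaled_affinity_mean(X, labels, sigmas):
    """X: (N, D) points, labels: (N,) class ids, sigmas: dict class -> scale.
    Within-class Gaussian affinities use that class's own scale;
    inter-class pairs are left unmodeled (0), matching the intuition
    that inter-class relationships are not modeled."""
    N = len(X)
    L = np.zeros((N, N))
    for i in range(N):
        for j in range(N):
            if labels[i] == labels[j]:
                d2 = ((X[i] - X[j]) ** 2).sum()
                L[i, j] = np.exp(-d2 / (2.0 * sigmas[labels[i]] ** 2))
    return L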

26
Scaled Affinity Matrix
  • Admissible graphs
  • Constrain same-class nodes to be connected
  • Indicator function
  • The remaining term has a simple conditional form
  • Intuition: introduces a bias in a random walk
  • We do not want to model inter-class relationships

27
Example
Class assignment: c = (1, 1, 1, 1, 2, 2, 2, 2)
A sparse admissible graph for c (entry ij carries the class label
when edge ij is present):
    1 0 0 1 0 0 0 0
    1 1 0 0 0 0 0 0
    1 0 1 1 0 0 0 0
    1 0 1 1 0 0 0 0
    0 0 0 0 2 2 0 0
    0 0 0 0 0 2 2 0
    0 0 0 0 0 0 2 2
    0 0 0 0 0 0 0 2
The fully connected same-class graph for c:
    1 1 1 1 0 0 0 0
    1 1 1 1 0 0 0 0
    1 1 1 1 0 0 0 0
    1 1 1 1 0 0 0 0
    0 0 0 0 2 2 2 2
    0 0 0 0 2 2 2 2
    0 0 0 0 2 2 2 2
    0 0 0 0 2 2 2 2
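A small sketch of the admissibility check implied by this example:
every class's nodes must form a connected subgraph (BFS restricted
to the class; all names are mine):

from collections import deque

def is_admissible(adj, labels):
    """adj: dict node -> set of neighbor nodes; labels: dict node -> class."""
    for c in set(labels.values()):
        nodes = {v for v in labels if labels[v] == c}
        start = next(iter(nodes))
        seen, queue = {start}, deque([start])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w in nodes and w not in seen:  # stay within the class
                    seen.add(w)
                    queue.append(w)
        if seen != nodes:   # some same-class node is unreachable
            return False
    return True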



28
Scaled Affinity Matrix
  • Another form
  • This definition is based on k-nearest neighbors

29
Approximate Inference in the Scaled Model
  • Assume we can compute the posterior over the class
    labels
  • Update based on expectations under this
    distribution (EM)
  • The MAP graph is given by the M minimum spanning
    trees (simple proof based on Zahn 1971, for the case
    of a single sigma for both classes)
  • Polynomial time
  • However, using the MAP estimate, global optimization
    tends to fall into local minima
  • Solution: use ICM on a single label (picked at
    random)
  • Compute the MAP graph (one MST for each class!) and
    iterate, as sketched below
  • We can do this because the classes are then given
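  • The per-class MST step is straightforward with SciPy; a minimal
    sketch (the function name per_class_msts is mine):

import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def per_class_msts(X, labels):
    """Returns {class: (member indices, MST as a sparse matrix)}."""
    msts = {}
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        D = squareform(pdist(X[idx]))          # pairwise distances in class c
        msts[c] = (idx, minimum_spanning_tree(D))
    return msts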

30
Results (Scaled Model)
31
Results (Scaled Model)
32
Results (Scaled Model)
Connected-graph prior (as before)
k-nearest-neighbor graph prior (k = 4)
33
Results (Scaled Model)
34
Results (Scaled Model)
35
Results (Scaled Model)
36
Conclusions
  • Affinity matrices in terms of Bayes nets
  • Provides a probabilistic view of Spectral
    Clustering and some generalizations
  • Allows incorporating desired clustering
    properties explicitly into the model
  • Scaled Affinity Matrix BN
  • Avoids setting an explicit affinity scale
  • Allows modeling different scales within classes
  • The data probability distribution is no longer
    constrained to be uniform
  • A view of the clustering / dimensionality-reduction
    continuum

37
(figure-only slide)
38
  • There is no right answer for clustering
  • One way to address this, up to a point: learn beta
    in a supervised fashion, where the user gives
    examples of close-by points in each class (e.g., in
    a different setting, Wagstaff et al., Ping et al.)
  • A way to generate LLE from our SC generative model?
  • UCI real dataset
  • LLE and IB

39
  • The local minima issue
  • Dynamic L vs. static L, with distances from
    different features
  • We have seen how they try to use different
    features, in line with Gestalt principles

40
(figure-only slide)
41
Generating Affinity Matrices
  • Advantages

42
Inference (cont.)
  • Why is the MAP estimate equivalent to finding the M
    minimum spanning trees?
  • I will explain here

43
A Generative Model of L