Title: Generative Models of Affinity Matrices
1. Generative Models of Affinity Matrices
- Rómer Rosales and Brendan Frey
- romer@psi.toronto.edu, frey@psi.toronto.edu
- Probabilistic and Statistical Inference Group
- University of Toronto
2. Overview
- Background
- Generative Models of Affinity Matrices
- Spectral Clustering
- A Graphical Model of Spectral Clustering
- A More General View of Spectral Clustering
- Limitations of Spectral Clustering
- New Models of Affinity Matrices
- Experimental Results
- Conclusions
3. Background and Notation
- Data set
  - We are given a set of N elements, x_1, ..., x_N
  - With a given measure of similarity between pairs
  - Alternatively, an affinity L_ij for each pair of elements in the data set
- Class labels
  - A finite set of size M, e.g., {1, ..., M}
- Clusters
  - We want to infer a distribution over, or the most likely, class assignment for each data set element
4. Affinity Matrices
- Different forms, e.g., with the standard L2 measure (the Gaussian kernel sketched below): L_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))
- Intuitively separates close points from far points
- What scale sigma? What form for L?
- Idea: use a simpler form for L, and instead incorporate knowledge of the specific properties of the desired clustering into a well-defined probabilistic model
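To make the Gaussian form above concrete, here is a minimal sketch of how such an affinity matrix could be computed; the function name and the toy data are illustrative, not from the talk.

```python
import numpy as np

def gaussian_affinity(X, sigma=1.0):
    """Pairwise affinities L_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)).

    X: (N, D) array of data points; sigma is the scale parameter
    whose choice the slide flags as problematic.
    """
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

# Example: two well-separated blobs give a near block-diagonal L.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (5, 2)), rng.normal(5, 0.3, (5, 2))])
L = gaussian_affinity(X, sigma=1.0)
```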
5. Latent Representation Idea
- There is a (usually low-dimensional) vector z_i associated with each input data point
- We don't observe the z_i, but we observe some function of them (e.g., the pair-wise affinities, or some high-dimensional version of them)
- We want to find a probability distribution over the z_i to explain the observations
6. A Generative Model for L
- Joint distribution (a plausible reconstruction follows)
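The joint-distribution equation did not survive extraction. A plausible reconstruction, assuming only the ingredients named elsewhere in the deck (affinities L_ij, latent vectors z_i, class labels c_i, parameters theta), is:

```latex
p(L, Z, c, \theta) \;=\;
\underbrace{\prod_{i<j} p(L_{ij} \mid z_i, z_j)}_{\text{affinity likelihood}}
\;\underbrace{\prod_{i} p(z_i \mid c_i, \theta)}_{\text{latent vectors}}
\; p(c)\, p(\theta)
```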
7. Spectral Clustering Instance
- For spectral clustering, choose a Gaussian conditional: p(L_ij | z_i, z_j) = N(L_ij; z_i^T z_j, sigma^2)
- Uses the same variance sigma^2 for all pairs
- In SC, the key structural form is a product of hidden-variable pairs, z_i^T z_j
- Under this likelihood, the MAP for Z minimizes the Frobenius norm ||L - Z Z^T||_F (see slide 8)
8. Spectral Clustering (cont.)
- The standard algorithm for spectral clustering is greedy inference in this model
- Step 1
  - Choose the best assignment for the (d-dim) hidden variables (MAP) to minimize the Frobenius norm: SVD (best rank-d approximation)
- Step 2
  - Choose good means (and covariances) given all cluster eigenvector rows (e.g., using k-means, mixture of Gaussians, etc.); a sketch of both steps follows
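A minimal sketch of the two-step procedure, under the Frobenius/rank-d reading above; the function and argument names are illustrative, and common variants (e.g., working with a normalized graph Laplacian instead of L) are omitted.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def spectral_cluster(L, d, M, seed=0):
    """Greedy two-step inference described on the slide.

    Step 1: MAP for the latent matrix Z under ||L - Z Z^T||_F,
            i.e., the best rank-d approximation (top-d eigenvectors).
    Step 2: cluster the rows of Z into M classes with k-means.
    """
    # L is symmetric, so SVD and eigendecomposition coincide up to signs.
    eigvals, eigvecs = np.linalg.eigh(L)
    top = np.argsort(eigvals)[::-1][:d]              # d largest eigenvalues
    Z = eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0.0))
    _, labels = kmeans2(Z, M, minit='++', seed=seed)  # "good means" heuristic
    return Z, labels
```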
9. Generalizing SC
- New algorithms for optimization
  - Rooted in the probabilistic view, e.g., different forms of approximate inference in graphical models
- New models based on SC components
  - Same basic components, but using a new structural form (e.g., different hidden-variable interactions)
- New models of affinity matrices beyond SC
10. New Algorithms for Optimization
- A simple example algorithm: inference using the EM algorithm
  - E-step: find posteriors over class labels
  - M-step: find MAP estimates for the means and variances
  - Jointly optimize the latent variables given the rest, instead of the greedy two-step optimization (a sketch of one EM iteration follows)
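A minimal sketch of one EM iteration under the reading above: posteriors over class labels in the E-step, then means and variances in the M-step. A mixture of spherical Gaussians over the latent rows Z is assumed; all names are illustrative.

```python
import numpy as np

def em_step(Z, means, variances, priors):
    """One EM iteration over latent rows Z (N, d) for M spherical Gaussians."""
    N, d = Z.shape
    M = len(priors)
    # E-step: responsibilities r[i, m] = p(c_i = m | z_i).
    log_r = np.empty((N, M))
    for m in range(M):
        sq = np.sum((Z - means[m]) ** 2, axis=1)
        log_r[:, m] = (np.log(priors[m])
                       - 0.5 * d * np.log(2 * np.pi * variances[m])
                       - sq / (2 * variances[m]))
    log_r -= log_r.max(axis=1, keepdims=True)   # stabilize the softmax
    r = np.exp(log_r)
    r /= r.sum(axis=1, keepdims=True)
    # M-step: maximum-likelihood updates of means, variances, priors.
    Nm = r.sum(axis=0)
    means = (r.T @ Z) / Nm[:, None]
    variances = np.array([
        np.sum(r[:, m] * np.sum((Z - means[m]) ** 2, axis=1)) / (Nm[m] * d)
        for m in range(M)])
    priors = Nm / N
    return means, variances, priors, r
```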
11. New Algorithms
12. Example (cont.)
- Usually converges to the same solution as SC
- Other algorithms: variational inference on the graphical model
13. New Models
- A slight change in the conditional distribution (a more intuitive example)
- Perhaps this can be used to explain the two ends of the spectrum (at one limit, clustering; at the other, dimensionality reduction)
14. Overview
- Background
- Generative Models of Affinity Matrices
- Spectral Clustering
- A Graphical Model of Spectral Clustering
- Generalizing Spectral Clustering
- Limitations in Spectral Clustering
- New models of Affinity Matrices
- Experimental Results
- Conclusions
15. Usual SC Examples
16. Less Usual SC Examples
- Which clustering is better?
17. Less Usual SC Examples
- Spectral Clustering results
18. Less Usual SC Examples
20. Remarks
- There is no such thing as a correct clustering (in general)
- Some clusterings may simply agree or disagree with our perception
- The optimal choice of scale is an issue
- We can usually explain those clusterings that are produced
21. New Models of Affinity Matrices
- A basic Bayesian-net view of affinity matrices
- Also an MRF representation; Ising models
22. Bayes Net of the Bayes Net
23. Inference in the Basic Affinity Matrix BN (or MRF)
- For example (for clustering)
- Difficult to perform inference
  - The MAP estimate of the class labels is equivalent to the MAX-CUT problem (NP-complete); a sketch of the equivalence follows
- Can always test approximate inference; the state space is just large
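The equation backing the MAX-CUT claim is missing from the transcript; a hedged sketch, assuming a likelihood that factorizes over pairs given a labeling c, is:

```latex
\hat{c} \;=\; \arg\max_{c} \sum_{i<j} \log p(L_{ij} \mid c_i, c_j)
\;=\; \arg\max_{c} \sum_{\substack{i<j \\ c_i \neq c_j}} w_{ij} \;+\; \mathrm{const},
\qquad
w_{ij} \;=\; \log \frac{p(L_{ij} \mid c_i \neq c_j)}{p(L_{ij} \mid c_i = c_j)}
```

Up to the constant (the all-same-class term), the MAP labeling maximizes the total weight of pairs split across classes, i.e., a weighted MAX-CUT; the weights w_ij are illustrative notation, not from the slides.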
24. Scaled Affinity Matrix BN
- Each point is only connected to a subset of same-class points, represented by a random variable
- The internal class scale is also a random variable
25. Scaled Affinity Matrix
- Avoids setting an explicit affinity scale
- Allows different class-dependent scales and variances
- Representation for the scaled model
26. Scaled Affinity Matrix
- Admissible graphs
  - Constrain same-class nodes to be connected
  - Indicator function (a sketch follows after this list)
- The remaining conditional has a simple form
- Intuition: introduces a bias in the random walk
- We do not want to model interclass relationships
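A minimal sketch of such an indicator function, assuming "admissible" means every class's induced subgraph is connected, as the bullets above suggest; the edge and label representations are illustrative.

```python
from collections import defaultdict, deque

def is_admissible(edges, c):
    """Indicator: every class's induced subgraph is connected.

    edges: iterable of (i, j) pairs; c: list mapping node index -> class.
    """
    # Adjacency restricted to same-class edges.
    adj = defaultdict(set)
    for i, j in edges:
        if c[i] == c[j]:
            adj[i].add(j)
            adj[j].add(i)
    nodes_by_class = defaultdict(set)
    for i, ci in enumerate(c):
        nodes_by_class[ci].add(i)
    # BFS within each class; fail if any same-class node is unreachable.
    for nodes in nodes_by_class.values():
        start = next(iter(nodes))
        seen, frontier = {start}, deque([start])
        while frontier:
            u = frontier.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    frontier.append(v)
        if seen != nodes:
            return False
    return True
```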
27. Example
- C = (1, 1, 1, 1, 2, 2, 2, 2)
- A sparse admissible graph, with entries carrying the class scale:
  1 0 0 1 0 0 0 0
  1 1 0 0 0 0 0 0
  1 0 1 1 0 0 0 0
  1 0 1 1 0 0 0 0
  0 0 0 0 2 2 0 0
  0 0 0 0 0 2 2 0
  0 0 0 0 0 0 2 2
  0 0 0 0 0 0 0 2
- The fully connected block form:
  1 1 1 1 0 0 0 0
  1 1 1 1 0 0 0 0
  1 1 1 1 0 0 0 0
  1 1 1 1 0 0 0 0
  0 0 0 0 2 2 2 2
  0 0 0 0 2 2 2 2
  0 0 0 0 2 2 2 2
  0 0 0 0 2 2 2 2
28. Scaled Affinity Matrix
- Another form
- This definition is based on k-nearest neighbors (a sketch follows)
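A minimal sketch of a k-nearest-neighbor graph of the kind this definition could be built on (cf. the K = 4 graph prior in the results slides); the dense-distance implementation is illustrative, not from the talk.

```python
import numpy as np

def knn_graph(X, k):
    """Symmetric k-nearest-neighbor graph on the rows of X (N, D).

    Returns the set of undirected edges (i, j) with i < j.
    """
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(sq, np.inf)            # exclude self-neighbors
    nn = np.argsort(sq, axis=1)[:, :k]      # k nearest per point
    edges = set()
    for i in range(len(X)):
        for j in nn[i]:
            edges.add((min(i, int(j)), max(i, int(j))))
    return edges
```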
29. Approximate Inference in the Scaled Model
- Assume we can compute the posterior over class labels
- Update the remaining variables based on expectations under this distribution (EM)
- The MAP graph given the class labels is given by the M minimum spanning trees (simple proof based on Zahn 1971, for the case of a single sigma for both classes)
  - Polynomial time
- However, using the MAP estimate, global optimization tends to fall into local minima
  - Solution: we use ICM, updating a single class label (picked at random) at a time
  - Compute the MAP graph (one MST per class!) and iterate
  - We can do this because the classes are now given (a sketch follows)
30. Results (Scaled Model)
31. Results (Scaled Model)
32. Results (Scaled Model)
- Connected graph prior (as before)
- K-neighbor graph prior, K = 4
33. Results (Scaled Model)
34. Results (Scaled Model)
35. Results (Scaled Model)
36. Conclusions
- Affinity matrices in terms of Bayes nets
  - Provides a probabilistic view of spectral clustering and some generalizations
  - Allows incorporating desired clustering properties explicitly into the model
- Scaled Affinity Matrix BN
  - Avoids setting an explicit affinity scale
  - Allows modeling different scales within classes
  - The data probability distribution is no longer constrained to be uniform
- A view of the clustering / dimensionality-reduction continuum
38.
- There is no right answer for clustering
- One way to solve this, up to a point: we can learn beta in a supervised fashion, where the user gives examples of close-by points in each class (e.g., in a different setting, Wagstaff et al., Ping et al.)
- A way to generate LLE from our SC generative model?
- UCI real dataset
- LLE and IB
39.
- The local minima (...)
- Dynamic L vs. static L, with distances from different features
- We have seen how they try to use different features, in line with Gestalt principles
41. Generating Affinity Matrices
42. Inference (cont.)
- Why is the MAP estimate equivalent to finding the M minimum spanning trees?
- I will explain here
43. A Generative Model of L