

1
Clustering on the Simplex
Morten Mørup, DTU Informatics, Intelligent Signal Processing, Technical University of Denmark

2
Joint work with
Lars Kai Hansen, DTU Informatics, Intelligent Signal Processing, Technical University of Denmark
Christian Walder, DTU Informatics, Intelligent Signal Processing, Technical University of Denmark
3
Clustering
  • Cluster analysis or clustering is the assignment
    of a set of observations into subsets (called
    clusters) so that observations in the same
    cluster are similar in some sense. (Wikipedia)

4
Clustering approaches
  • K-means iterative refinement algorithm (Lloyd, 1982; Hartigan, 1979)
  • The problem is NP-complete (Megiddo and Supowit, 1984)
  • Relaxations of the hard assignment problem:
  • Annealing approaches based on a temperature parameter (as T→0 the original
    clustering problem is recovered) (see for instance Hofmann and Buhmann, 1997)
  • Fuzzy clustering (Hathaway and Bezdek, 1988)
  • Expectation Maximization (Mixture of Gaussians)
  • Spectral Clustering

Assignment Step (S): assign each data point to the cluster with the closest mean value.
Update Step (C): calculate the new mean value for each cluster.
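A minimal sketch of this two-step iteration (a NumPy illustration under our own naming, not the slides' implementation):

```python
import numpy as np

def lloyd_kmeans(X, K, n_iter=100, seed=0):
    """Plain Lloyd iteration; X holds one observation per row (N x M)."""
    rng = np.random.default_rng(seed)
    # Initialize the means as K randomly chosen observations.
    mu = X[rng.choice(len(X), K, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assignment step (S): nearest mean for every observation.
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)
        z = d2.argmin(1)
        # Update step (C): recompute the mean of each non-empty cluster.
        for k in range(K):
            if (z == k).any():
                mu[k] = X[z == k].mean(0)
    return z, mu
```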
Guarantee of optimality: no single change in assignment is better than the current assignment (1-spin stability).
Drawbacks: previous relaxations are either not exact or depend on a problem-specific annealing parameter in order to recover the original binary combinatorial assignments.
5
From the K-means objective to Pairwise Clustering
K-means objective
Pairwise Clustering (Buhmann and Hofmann, 1994):
K is a similarity matrix; for K = X^T X the pairwise clustering objective is equivalent to the k-means objective.
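Reconstructed in standard notation (a hedged reading, since the slide's formulas did not survive transcription; s_k denotes the indicator vector of cluster k):

```latex
% k-means over assignments z_n and means \mu_k:
\min_{z,\mu}\ \sum_{n=1}^{N} \lVert x_n - \mu_{z_n} \rVert^2
% Pairwise clustering (Buhmann & Hofmann, 1994) over binary indicators s_k:
\max_{S}\ \sum_{k=1}^{K} \frac{s_k^\top K\, s_k}{\mathbf{1}^\top s_k},
\qquad K = X^\top X \;\Rightarrow\; \text{equivalent to the $k$-means objective}
```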
6
Although clustering is hard, there is room to be simple(x) minded!
Binary Combinatorial (BC)
Simplicial Relaxation (SR)
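The two constraint sets contrasted on this slide, written out (standard simplex notation; again a reconstruction):

```latex
% Binary Combinatorial (BC): each assignment vector is a vertex of the simplex.
\text{BC:}\quad s_n \in \{0,1\}^K,\quad \mathbf{1}^\top s_n = 1
% Simplicial Relaxation (SR): assignments range over the whole simplex.
\text{SR:}\quad s_n \in \Delta^{K-1} = \{\, s \in \mathbb{R}^K : s \ge 0,\ \mathbf{1}^\top s = 1 \,\}
```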
7
The simplicial relaxation (SR) admits standard continuous optimization to solve the pairwise clustering problems, for instance by normalization-invariant projected gradient ascent.
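One plausible instantiation (projected gradient ascent with a Euclidean simplex projection; the paper's exact normalization-invariant update may differ, and the step size mu is an illustrative choice):

```python
import numpy as np

def project_simplex(V):
    """Euclidean projection of every column of V onto the probability simplex
    (the sorting algorithm of Duchi et al.)."""
    K, N = V.shape
    U = -np.sort(-V, axis=0)                    # each column sorted descending
    css = np.cumsum(U, axis=0) - 1.0
    rho = np.arange(1, K + 1)[:, None]
    r = (U - css / rho > 0).sum(0)              # size of each active set
    tau = css[r - 1, np.arange(N)] / r
    return np.maximum(V - tau, 0.0)

def sr_clustering(Kmat, K, n_iter=200, mu=0.01, seed=0):
    """Maximize the relaxed pairwise objective
    f(S) = sum_k (s_k Kmat s_k^T) / (s_k 1) with S in R^{K x N},
    every column of S constrained to the simplex."""
    rng = np.random.default_rng(seed)
    S = project_simplex(rng.random((K, Kmat.shape[0])))
    for _ in range(n_iter):
        n = S.sum(1) + 1e-12                    # soft cluster sizes
        SK = S @ Kmat
        quad = (SK * S).sum(1)                  # s_k Kmat s_k^T per cluster
        grad = 2 * SK / n[:, None] - (quad / n**2)[:, None]
        S = project_simplex(S + mu * grad)      # ascent step, then project
    return S
```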
8
Synthetic data example
K-means
SR-clustering
The brown and grey clusters each contain 1,000 data points in R^2, whereas the remaining clusters each have 250 data points.
9
The SR-clustering algorithm is driven by high-density regions.
10
Thus, the solutions are in general substantially better than those of Lloyd's algorithm, at the same computational complexity.
SR-clustering (?init = 1)
SR-clustering (?init = 0.01)
Lloyd's K-means
11
Comparison of K-means, SR-clustering (?init = 1) and SR-clustering (?init = 0.01) for 10, 50 and 100 components.
12
SR-clustering for kernel-based semi-supervised learning
Kernel-based semi-supervised learning based on pairwise clustering (Basu et al., 2004; Kulis et al., 2005; Kulis et al., 2009)
13
The simplicial relaxation admits solving the problem as a (non-convex) continuous optimization problem.
14
Class labels can be handled by explicitly fixing the corresponding columns of S.
Must-links and cannot-links can be absorbed into the kernel.
Hence the problem more or less reduces to a standard SR-clustering problem for the estimation of S.
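A sketch of how the links might be folded into the kernel, in the spirit of the semi-supervised kernel k-means line of work cited above; the additive penalty form and the weight w are our assumptions:

```python
import numpy as np

def constrained_kernel(Kmat, must, cannot, w=1.0):
    """Return a copy of the kernel with must-links rewarded and
    cannot-links penalized (lists of index pairs)."""
    Kc = Kmat.astype(float).copy()
    for i, j in must:            # reward clustering i and j together
        Kc[i, j] += w
        Kc[j, i] += w
    for i, j in cannot:          # penalize clustering i and j together
        Kc[i, j] -= w
        Kc[j, i] -= w
    return Kc
```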
15
At stationarity, the gradients of the elements in each column of S that are 1 are larger than those of the elements that are 0. Thus, the impact of the supervision can be evaluated by estimating the minimal Lagrange multipliers that guarantee stationarity of the solution obtained by the SR-clustering algorithm. This is a convex optimization problem.
Thus, the Lagrange multipliers give a measure of conflict between the data and the supervision.
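A hedged KKT sketch of the stationarity condition just described (the notation is ours): for each column s_n on the simplex, with gradient g_n of the objective, equality multiplier λ_n and inequality multipliers β_n ≥ 0,

```latex
g_{kn} - \lambda_n + \beta_{kn} = 0, \qquad \beta_{kn}\, s_{kn} = 0, \quad \beta_{kn} \ge 0
% Hence at a binary solution the active entry (s_{kn} = 1) attains the largest
% gradient in its column; the minimal multipliers that certify stationarity of
% the supervised entries quantify the data--supervision conflict.
```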
16
Digit classification with one mislabeled observation from each class.
17
Community Detection in Complex Networks
Communities/modules are natural divisions of network nodes into densely connected subgroups (Newman and Girvan, 2003).
Graph G(V,E); adjacency matrix A; permuted adjacency matrix PAP^T. The community detection algorithm yields the permutation P of the graph from the clustering assignment S.
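The permutation step itself is simple to state in code; a small NumPy helper (ours) that sorts the nodes by community label:

```python
import numpy as np

def permuted_adjacency(A, z):
    """Return P A P^T for the permutation P that groups nodes by their
    cluster label z, so communities appear as dense diagonal blocks."""
    order = np.argsort(z, kind="stable")
    return A[np.ix_(order, order)]
```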
18
Common community detection objectives
  • Hamiltonian (Fu and Anderson, 1986; Reichardt and Bornholdt, 2004)
  • Modularity (Newman and Girvan, 2004)

Generic problems take the form reconstructed below.
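The cited objectives in their standard published forms, together with the common quadratic shape (the generic form is our hedged reading of the lost formula):

```latex
% Modularity (Newman & Girvan, 2004): m edges, degrees k_i, labels c_i.
Q = \frac{1}{2m}\sum_{ij}\Bigl(A_{ij} - \frac{k_i k_j}{2m}\Bigr)\,\delta(c_i, c_j)
% Hamiltonian (Reichardt & Bornholdt): resolution \gamma, null model p_{ij}.
\mathcal{H} = -\sum_{i<j}\bigl(A_{ij} - \gamma\, p_{ij}\bigr)\,\delta(c_i, c_j)
% Both are quadratic in the binary assignment S, i.e. of the generic form
\max_{S}\ \sum_k s_k^\top B\, s_k \qquad \text{for a suitable matrix } B.
```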
19
Again we can make an exact relaxation to the
simplex!
20
(No Transcript)
21
(No Transcript)
22
SR-clustering of complex networks
The quality of the solutions is comparable to results obtained by extensive Gibbs sampling.
23
So far we have demonstrated how binary combinatorial constraints are recovered at stationarity when relaxing the problems to the simplex. However, simplex constraints also hold promising data mining properties of their own!
24
The Principal Convex Hull (PCH)
The Convex Hull
Def: The convex hull (convex envelope) of X ∈ R^{M×N} is the minimal convex set containing X. (Informally, it can be described as a rubber band wrapped around the data points.) Finding the convex hull is solvable in linear time, O(N) (McCallum and Avis, 1979). However, the size of the convex hull grows exponentially with the dimensionality of the data, O(log^{M-1}(N)) (Dwyer, 1988).
Def: The principal convex hull is the best convex set of size K according to some measure of distortion D(·) (Mørup et al., 2009). (Informally, it can be described as a less flexible rubber band that wraps most of the data points.)
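For contrast with the PCH, the exact convex hull of low-dimensional data is readily computed, e.g. with SciPy's Qhull wrapper:

```python
import numpy as np
from scipy.spatial import ConvexHull

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 2))   # N = 1000 points in R^2
hull = ConvexHull(X)
print(hull.vertices)                 # indices of the points on the hull
print(hull.vertices.size)            # hull size grows with N and dimension
```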
25
The mathematical formulation of the Principal Convex Hull (PCH) is given by two simplex constraints, "principal" being measured in the Frobenius norm:
min_{C,S} ||X - XCS||_F^2 subject to each column of C and of S lying on the simplex, i.e. X ≈ (XC)S.
C gives the fractions in which observations in X are used to form each feature (distinct aspects/"freaks"). In general C will be very sparse! S gives the fraction by which each observation resembles each distinct aspect XC.
(Note: when K is large enough, the PCH recovers the convex hull.)
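A hedged sketch of fitting the PCH by projected gradient descent on C and S, reusing project_simplex from the SR-clustering sketch above (the plain-gradient scheme and step size are illustrative choices, not the authors' algorithm):

```python
import numpy as np
# project_simplex as defined in the SR-clustering sketch above

def pch(X, K, n_iter=500, mu=1e-3, seed=0):
    """Minimize ||X - X C S||_F^2 with C (N x K) and S (K x N),
    all columns constrained to the simplex; X is M x N."""
    rng = np.random.default_rng(seed)
    N = X.shape[1]
    C = project_simplex(rng.random((N, K)))
    S = project_simplex(rng.random((K, N)))
    for _ in range(n_iter):
        XC = X @ C                      # the K distinct aspects
        R = X - XC @ S                  # reconstruction residual
        gS = -2 * XC.T @ R              # gradient w.r.t. S
        gC = -2 * X.T @ (R @ S.T)       # gradient w.r.t. C
        S = project_simplex(S - mu * gS)
        C = project_simplex(C - mu * gC)
    return C, S
```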
26
Relation between the PCH model, low-rank decompositions and clustering approaches
PCH naturally bridges clustering and low-rank
approximations!
27
Two important properties of the PCH model
The PCH model is invariant to affine transformations and scaling.
The PCH model is unique up to permutation of the components.
28
A feature extraction example
The features have more contrast than those obtained by clustering approaches. As such, PCH aims for distinct aspects/regions in the data.
The PCH model strives to attain Platonic Ideal Forms.
29
PCH model for PET data (Positron Emission Tomography)
The data contain 3 components: high-binding regions, low-binding regions and non-binding regions. Each voxel is given as a concentration fraction of these regions, X ≈ (XC)S.
30
NMR spectroscopy of samples of mixtures of propanol, butanol and pentanol.
31
Collaborative filtering example
Medium-size and large-size MovieLens data (www.grouplens.org). Medium size: 1,000,209 ratings of 3,952 movies by 6,040 users. Large size: 10,000,054 ratings of 10,677 movies given by 71,567 users.
32
Conclusion
  • The simplex offers unique data mining properties.
  • Simplicial relaxations (SR) form exact relaxations of common
    hard-assignment clustering problems, i.e. K-means, pairwise clustering
    and community detection in graphs.
  • SR enables solving binary combinatorial problems using standard solvers
    from continuous optimization.
  • The proposed SR-clustering algorithm outperforms traditional iterative
    refinement algorithms.
  • No need for an annealing parameter; hard assignments are guaranteed at
    stationarity (Theorems 1 and 2).
  • Semi-supervised learning can be posed as a continuous optimization
    problem, with the associated Lagrange multipliers giving an evaluation
    measure of each supervised constraint.

33
Conclusion cont.
  • The Principal Convex Hull (PCH) is formed by two types of simplex
    constraints.
  • It extracts distinct aspects of the data.
  • It is relevant for data mining in general, wherever low-rank
    approximation and clustering approaches have been invoked.

34
A reformulation of Lex Parsimoniae
"The simplest explanation is usually the best." - William of Ockham
"The simplex explanation is usually the best."
"Simplicity is the ultimate sophistication." - Leonardo da Vinci
"Simplexity is the ultimate sophistication."
The presented work is described in:
M. Mørup and L. K. Hansen, An Exact Relaxation of Clustering, submitted to JMLR, 2009.
M. Mørup, C. Walder and L. K. Hansen, Simplicial Semi-supervised Learning, submitted.
M. Mørup and L. K. Hansen, Platonic Forms Revisited, submitted.