Integrating Constraints and Metric Learning in Semi-Supervised Clustering - PowerPoint PPT Presentation


1
Integrating Constraints and Metric Learning in
Semi-Supervised Clustering
  • Mikhail Bilenko, Sugato Basu, Raymond J. Mooney
  • ICML 2004
  • Presented by Xin Li

2
Semi-Supervised Clustering
K = 4
3
Semi-Supervised Clustering
4
Semi-Supervised Clustering
5
How to exploit supervision in clustering
  • Incorporate supervision as constraints
  • Learn a distance metric using supervision
  • Integration of these two approaches

6
K-means Clustering
  • X = {x1, x2, ...}
  • L = {l1, l2, ..., lk}
  • Euclidean distance
  • Minimizing the total squared distance from each point to its
    cluster centroid
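The objective on this slide did not survive extraction. As a minimal sketch of standard K-means (not code from the presentation), the quantity being minimized and the usual alternating procedure might look like the following; the function names `kmeans_objective` and `kmeans` are illustrative assumptions.

```python
import numpy as np

def kmeans_objective(X, labels, centroids):
    """Sum of squared Euclidean distances from each point to its assigned centroid."""
    diffs = X - centroids[labels]
    return float(np.sum(diffs ** 2))

def kmeans(X, k, n_iter=20, seed=0):
    """Plain Lloyd-style K-means: alternate assignment and centroid updates."""
    rng = np.random.default_rng(seed)
    # Initialize centroids with k distinct data points (an arbitrary choice).
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Squared distance of every point to every centroid, shape (n, k).
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for h in range(k):
            members = X[labels == h]
            if len(members):  # keep the old centroid if a cluster is empty
                centroids[h] = members.mean(axis=0)
    # Final assignment against the final centroids.
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1), centroids
```

Each iteration can only keep the objective the same or lower it, which is why the procedure converges to a local minimum.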

7
Clustering with constraints
  • Pairwise constraints
  • M Must-link pairs
  • (xi, xj) should be in the same cluster
  • C -- Cannot-link pairs
  • (xi, xj) should be in different clusters
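As a small illustration of the two constraint sets, the sketch below counts how many must-link and cannot-link pairs a given labeling violates. The representation (lists of index pairs) and the name `count_violations` are assumptions made for this example, not part of the presentation.

```python
def count_violations(labels, must_link, cannot_link):
    """Return (violated must-links, violated cannot-links) for a labeling.

    labels: sequence of cluster ids, one per point.
    must_link / cannot_link: iterables of (i, j) index pairs.
    """
    # A must-link pair is violated when the two points land in different clusters.
    ml = sum(1 for i, j in must_link if labels[i] != labels[j])
    # A cannot-link pair is violated when the two points share a cluster.
    cl = sum(1 for i, j in cannot_link if labels[i] == labels[j])
    return ml, cl
```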

8
Learning a pairwise distance metric
  • Binary classification: (xi, xj) → 0/1
  • M → positive examples
  • (xi, xj) are in the same cluster
  • C → negative examples
  • (xi, xj) are in different clusters
  • Apply the learned distance metric in clustering
  • Metric learning and clustering are disjoint steps

9
Unsupervised Clustering with Metric Learning
  • Learn a distance metric that optimizes a quality
    function
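The quality function itself is not in the transcript. A minimal sketch, assuming the distance is parameterized by a symmetric positive-definite matrix A and that the objective subtracts a log-determinant term per point (the form used in the ICML 2004 paper, as far as this sketch assumes), could look like this. The function names are illustrative.

```python
import numpy as np

def sq_dist_A(x, y, A):
    """Squared distance (x - y)^T A (x - y) under the metric matrix A."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(d @ A @ d)

def metric_kmeans_objective(X, labels, centroids, A):
    """Cluster dispersion under A, minus log det(A) for every point.

    The log-determinant term is an assumption of this sketch; it keeps the
    optimizer from shrinking A toward zero to make all distances vanish.
    """
    _, logdet = np.linalg.slogdet(A)
    return sum(sq_dist_A(x, centroids[l], A) - logdet for x, l in zip(X, labels))
```

With A equal to the identity, the log-determinant is zero and the value reduces to the ordinary K-means objective.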

10
Integrating Constraints and Metric Learning
Combining the previous two equations leads to the
following objective function, which minimizes
cluster dispersion under the learned metrics
while reducing constraint violations.
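The equation from this slide is missing in the transcript. A hedged reconstruction, based on the general form in the ICML 2004 paper rather than on the slide itself, is given below; the weights w_ij and \bar{w}_ij and the penalty functions f_M and f_C are symbols assumed here and should be checked against the paper.

```latex
\begin{align*}
J_{\text{mpckm}} = {} & \sum_{x_i \in X} \left( \| x_i - \mu_{l_i} \|_{A_{l_i}}^2 - \log \det A_{l_i} \right) \\
& + \sum_{(x_i, x_j) \in M} w_{ij}\, f_M(x_i, x_j)\, \mathbb{1}[l_i \neq l_j] \\
& + \sum_{(x_i, x_j) \in C} \bar{w}_{ij}\, f_C(x_i, x_j)\, \mathbb{1}[l_i = l_j]
\end{align*}
```

The first sum is the cluster dispersion under the per-cluster metrics; the second and third sums add a cost for each violated must-link and cannot-link constraint.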
11
Penalty for violating constraints
  • Penalty for violating a must-link constraint
    between distant points should be higher than that
    between nearby points.
  • Penalty for violating a cannot-link constraint
    between nearby points should be higher than that
    between distant points.
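One way to get this behavior is to make both penalties depend on the distance between the two points under the current metric. The sketch below is an assumption-laden illustration: `must_link_penalty` averages the distance under the two clusters' metrics, and `cannot_link_penalty` subtracts the pair's distance from `max_sq_dist`, a hypothetical parameter standing for the largest squared distance between any two points under the same metric.

```python
import numpy as np

def sq_dist_A(x, y, A):
    """Squared distance (x - y)^T A (x - y) under the metric matrix A."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(d @ A @ d)

def must_link_penalty(xi, xj, A_li, A_lj):
    """Grows with distance: separating far-apart must-linked points costs more."""
    return 0.5 * sq_dist_A(xi, xj, A_li) + 0.5 * sq_dist_A(xi, xj, A_lj)

def cannot_link_penalty(xi, xj, A, max_sq_dist):
    """Shrinks with distance: merging nearby cannot-linked points costs more.

    max_sq_dist is assumed to be the squared distance between the two most
    separated points in the data under A, so the penalty stays non-negative.
    """
    return max_sq_dist - sq_dist_A(xi, xj, A)
```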

12
MPCK-MEANS Algorithm
  • Constraints are utilized during cluster
    initialization and when assigning points to
    clusters.
  • The distance metric is adapted by re-estimating
    the weights in the matrices A_h.
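To make the re-estimation step concrete, here is a deliberately simplified sketch for one cluster with a diagonal A_h. It ignores the constraint terms entirely (an assumption of this example, not how the full algorithm works) and uses the closed form obtained by setting the derivative of the dispersion-minus-log-determinant objective to zero: each diagonal weight is the cluster size divided by the summed squared deviation in that dimension. The name `diagonal_metric_update` and the `eps` guard are illustrative.

```python
import numpy as np

def diagonal_metric_update(X_h, mu_h, eps=1e-12):
    """Re-estimate a diagonal metric for one cluster, ignoring constraints.

    X_h: points assigned to cluster h, shape (n_h, d).
    mu_h: centroid of cluster h, shape (d,).
    eps: small constant (an assumption) to avoid division by zero.
    """
    X_h = np.asarray(X_h, dtype=float)
    # Per-dimension sum of squared deviations from the centroid.
    scatter = ((X_h - mu_h) ** 2).sum(axis=0)
    return np.diag(len(X_h) / (scatter + eps))
```

Dimensions along which the cluster is tight receive large weights, and dimensions with high spread receive small ones.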

13
Initialization
  • An initial guess of the clusters.
  • Assign each point x to one of K clusters in a way
    that satisfies the constraints.
  • Compute the centroid of each cluster.
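The slide does not say how the constraint-satisfying assignment is built. One plausible ingredient, shown here purely as an assumption, is to merge must-linked points into groups with a union-find structure, so that each group can be placed in a single cluster as a unit. The name `must_link_components` is hypothetical.

```python
def must_link_components(n_points, must_link):
    """Group point indices into connected components of the must-link graph."""
    parent = list(range(n_points))

    def find(i):
        # Follow parent pointers to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in must_link:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    groups = {}
    for i in range(n_points):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())
```

Placing every group in a single cluster satisfies all must-link constraints by construction; cannot-link constraints between groups still need to be checked when choosing which groups seed which clusters.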

14
E-step
  • Every point x is assigned to the cluster that
    minimizes the sum of the distance of x to the
    cluster centroid according to the local metric
    and the cost of any constraint violations
    incurred by the cluster assignment.
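A minimal sketch of this assignment rule for a single point follows. It makes two simplifying assumptions that are not in the slides: every violated constraint costs a constant `w` (rather than a distance-dependent penalty), and unassigned points carry the label -1. The function name `e_step_assign` is illustrative.

```python
import numpy as np

def e_step_assign(i, X, labels, centroids, A_list, must_link, cannot_link, w=1.0):
    """Pick the cluster for point i that minimizes distance plus violation cost."""
    costs = []
    for h, (mu, A) in enumerate(zip(centroids, A_list)):
        d = X[i] - mu
        # Distance to the centroid under the cluster's own metric A_h,
        # minus log det(A_h) as in the metric-learning objective.
        cost = float(d @ A @ d) - np.linalg.slogdet(A)[1]
        for a, b in must_link:
            if i in (a, b):
                other = b if a == i else a
                # Violated if the partner is already assigned elsewhere.
                if labels[other] >= 0 and labels[other] != h:
                    cost += w
        for a, b in cannot_link:
            if i in (a, b):
                other = b if a == i else a
                # Violated if the partner already sits in this cluster.
                if labels[other] == h:
                    cost += w
        costs.append(cost)
    return int(np.argmin(costs))
```

Because each point's cost depends on the current labels of its constraint partners, the result can depend on the order in which points are visited.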

15
M-Step
16
Experimental Setting
17
Single Metric, Diagonal Matrix A
18
Single Metric, Diagonal Matrix A
19
Multiple Metrics, Full Matrix A
20
Multiple Metrics, Full Matrix A
21
Conclusion and Discussion
  • This paper has presented MPCK-MEANS, a new
    approach to semi-supervised clustering.
  • Supervision and metric learning are helpful in
    clustering and multiple distance metrics are not
    necessary in most cases.
  • Question 1: If we have supervision in
    clustering, why not use it in the same way
    as in a typical classification task?
  • Question 2: If there is an infinite number of
    classes, can we gain from supervision on part of
    them?