Unsupervised Learning: Clustering
1
Unsupervised Learning: Clustering
  • Rong Jin

2
Outline
  • Unsupervised learning
  • K means for clustering
  • Expectation Maximization algorithm for clustering

3
Unsupervised vs. Supervised Learning
  • Supervised learning
  • Training data
  • Every training example is labeled
  • Unsupervised learning
  • Training data
  • No data is labeled
  • We can still discover the structure of the data
  • Semi-supervised learning
  • Training data
  • Mixture of labeled and unlabeled data

Can you think of ways to utilize the unlabeled
data for improving prediction?
4
Unsupervised Learning
  • Clustering
  • Visualization
  • Density Estimation
  • Outlier/Novelty Detection
  • Data Compression

5
Clustering/Density Estimation

(Figure: clustering/density estimation example over the age variable; image not preserved)
6
Clustering for Visualization
7
Image Compression
http://www.ece.neu.edu/groups/rpl/kmeans/
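The linked demo compresses an image by clustering its pixel colors with K-means and replacing each pixel by its nearest palette color. A minimal sketch of that idea (synthetic pixels stand in for a real image; this is an illustration, not the code behind the link):

```python
import numpy as np

def quantize_colors(pixels, k=4, n_iter=20, seed=0):
    """Compress an (N, 3) array of RGB pixels to a k-color palette via K-means."""
    rng = np.random.default_rng(seed)
    # Initialize the palette with k randomly chosen pixels
    palette = pixels[rng.choice(len(pixels), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assign every pixel to its nearest palette color
        d2 = ((pixels[:, None, :] - palette[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each palette color to the mean of the pixels it owns
        for j in range(k):
            if np.any(labels == j):
                palette[j] = pixels[labels == j].mean(axis=0)
    # Final assignment with the converged palette
    d2 = ((pixels[:, None, :] - palette[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    return palette[labels], palette

# Synthetic "image": pixels near two pure colors compress to a 2-color palette
pix = np.vstack([np.tile([250, 10, 10], (40, 1)), np.tile([10, 10, 250], (40, 1))])
compressed, palette = quantize_colors(pix, k=2)
```

Storing one palette index per pixel plus the small palette is what yields the compression.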
8
K-means for Clustering
  • Keys to clustering
  • Find cluster centers
  • Determine appropriate cluster memberships (very hard)
  • K-means
  • Start with a random guess of cluster centers
  • Determine the membership of each data point
  • Adjust the cluster centers

9
K-means
  1. Ask user how many clusters they'd like (e.g., k = 5).

10
K-means
  1. Ask user how many clusters they'd like (e.g., k = 5).
  2. Randomly guess k cluster Center locations.

11
K-means
  1. Ask user how many clusters they'd like (e.g., k = 5).
  2. Randomly guess k cluster Center locations.
  3. Each datapoint finds out which Center it's closest to.
     (Thus each Center owns a set of datapoints.)

12
K-means
  1. Ask user how many clusters they'd like (e.g., k = 5).
  2. Randomly guess k cluster Center locations.
  3. Each datapoint finds out which Center it's closest to.
  4. Each Center finds the centroid of the points it owns.

13
K-means
  1. Ask user how many clusters they'd like (e.g., k = 5).
  2. Randomly guess k cluster Center locations.
  3. Each datapoint finds out which Center it's closest to.
  4. Each Center finds the centroid of the points it owns.
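Steps 1–4 above can be sketched in a few lines of NumPy (a minimal illustration, not the lecture's code; the toy data, k, and the convergence check are my choices):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain K-means: random centers, assign points, recompute centroids."""
    rng = np.random.default_rng(seed)
    # Step 2: randomly guess k cluster Center locations (pick k data points)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Step 3: each datapoint finds out which Center it's closest to
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Step 4: each Center moves to the centroid of the points it owns
        new_centers = np.array(
            [X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
             for j in range(k)])
        if np.allclose(new_centers, centers):  # stop when centers settle
            break
        centers = new_centers
    return centers, labels

# Two tight, well-separated blobs: K-means should recover their means
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.1, (50, 2)), rng.normal(5, 0.1, (50, 2))])
centers, labels = kmeans(X, k=2)
```

Note the cost of step 3: every iteration computes a distance from every point to every center, which motivates the "computational problem" question below.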

Any Computational Problem?
14
Improved K-means
  • Group points by region
  • KD tree
  • SR tree
  • Key difference
  • Find the closest center for each rectangle
  • Assign all the points within a rectangle to one
    cluster

15
Improved K-means
  • Find the closest center for each rectangle
  • Assign all the points within a rectangle to one
    cluster
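The rectangle test can be sketched as follows (my formulation of the pruning rule, not the lecture's code): a center owns a KD-tree rectangle outright if even the farthest point of the rectangle is closer to that center than the nearest point of the rectangle is to any other center.

```python
import numpy as np

def dist_bounds(lo, hi, c):
    """Min and max squared distance from center c to the axis-aligned box [lo, hi]."""
    # Per-dimension distance to the nearest face (0 if c is inside the slab)
    d_min = np.maximum(np.maximum(lo - c, c - hi), 0.0)
    # Per-dimension distance to the farthest corner
    d_max = np.maximum((c - lo) ** 2, (hi - c) ** 2)
    return (d_min ** 2).sum(), d_max.sum()

def owner_of_box(lo, hi, centers):
    """Index of the center that owns every point in the box, or None
    if no center dominates and the box must be split further."""
    bounds = [dist_bounds(lo, hi, c) for c in centers]
    for j, (_, far_j) in enumerate(bounds):
        # Center j owns the box if even its farthest corner is closer
        # than any other center's nearest approach to the box.
        if all(far_j < near_i for i, (near_i, _) in enumerate(bounds) if i != j):
            return j
    return None

lo, hi = np.array([0.0, 0.0]), np.array([1.0, 1.0])
centers = np.array([[0.5, 0.5], [10.0, 10.0]])
```

When `owner_of_box` returns None, a KD-tree implementation recurses into the node's children; when it returns an index, all points in the rectangle are assigned to that one cluster without per-point distance computations.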

16–24
Improved K-means
  (Slides 16–24 step through the rectangle-based assignment on an example; the figures are not preserved in this transcript.)
25
Gaussian Mixture Model for Clustering
  • Assume that data are generated from a mixture of
    Gaussian distributions
  • For each Gaussian distribution
  • Center μi
  • Variance σi (ignored here)
  • For each data point
  • Determine membership

26
Learning a Gaussian Mixture (with known covariance)
  • Probability
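The formula itself is not preserved in the transcript; for a mixture of k equal-variance spherical Gaussians (the "known covariance" case named in the title), the standard membership probability, in my notation, would be:

```latex
E[z_{ij}] \;=\;
\frac{\exp\!\left(-\lVert x_i - \mu_j \rVert^2 / 2\sigma^2\right)}
     {\sum_{n=1}^{k} \exp\!\left(-\lVert x_i - \mu_n \rVert^2 / 2\sigma^2\right)}
```

with the M-step re-estimating each center as the responsibility-weighted mean, \(\mu_j = \sum_i E[z_{ij}]\, x_i \,/\, \sum_i E[z_{ij}]\).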

27
Learning a Gaussian Mixture (with known covariance)
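A compact sketch of the resulting EM loop (spherical Gaussians with a known, shared variance; the quantile initialization and the toy data are my choices, not from the slides):

```python
import numpy as np

def em_gmm(X, k, sigma2=1.0, n_iter=50):
    """EM for a mixture of spherical Gaussians with known, shared variance."""
    # Deterministic init: spread centers across quantiles of the data
    qs = np.linspace(0.0, 1.0, k + 2)[1:-1]
    mu = np.quantile(X, qs, axis=0)
    for _ in range(n_iter):
        # E-step: soft membership (responsibility) of each point in each Gaussian
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        logp = -d2 / (2.0 * sigma2)
        r = np.exp(logp - logp.max(axis=1, keepdims=True))  # stabilized softmax
        r /= r.sum(axis=1, keepdims=True)
        # M-step: move each center to the responsibility-weighted mean
        mu = (r.T @ X) / r.sum(axis=0)[:, None]
    return mu, r

# Two 1-D clusters at -3 and +3: EM should recover both means
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(-3, 0.3, (60, 1)), rng.normal(3, 0.3, (60, 1))])
mu, r = em_gmm(X, k=2, sigma2=0.25)
```

Unlike K-means, every point contributes to every center, weighted by its responsibility; with a small variance the responsibilities sharpen and EM behaves much like K-means.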
28
Gaussian Mixture Example Start
29
After First Iteration
30
After 2nd Iteration
31
After 3rd Iteration
32
After 4th Iteration
33
After 5th Iteration
34
After 6th Iteration
35
After 20th Iteration