Unsupervised Learning: Clustering - PowerPoint PPT Presentation

Provided by: EricE162
Learn more at: https://cs.brynmawr.edu

1
Unsupervised Learning: Clustering
Some material adapted from slides by Andrew
Moore, CMU. Visit http://www.autonlab.org/tutorials/
for Andrew's repository of Data Mining
tutorials.
2
Unsupervised Learning
  • Supervised learning used labeled data pairs (x,
    y) to learn a function f: X → Y.
  • But what if we don't have labels?
  • No labels: unsupervised learning
  • Only some points are labeled: semi-supervised
    learning
  • Labels may be expensive to obtain, so we only get
    a few.
  • Clustering is the unsupervised grouping of data
    points. It can be used for knowledge discovery.

3
Clustering Data
4
K-Means Clustering
  • K-Means(k, data)
  • Randomly choose k cluster center locations
    (centroids).
  • Loop until convergence:
    • Assign each point to the cluster of the closest
      centroid.
    • Re-estimate the cluster centroids based on the
      data assigned to each.
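
The steps above can be sketched in plain Python. This is a minimal illustration, not the implementation used for these slides. It makes several assumptions the slides do not state: points are tuples of floats, "closest" means squared Euclidean distance, re-estimation takes the mean of the assigned points, convergence means the assignments stop changing, and an empty cluster keeps its previous centroid. The names k_means, max_iters, and seed are hypothetical.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def closest_centroid(point, centroids):
    """Index of the centroid nearest to `point` (ties go to the lowest index)."""
    return min(range(len(centroids)),
               key=lambda j: squared_distance(point, centroids[j]))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(k, data, max_iters=100, seed=0):
    """Return (centroids, assignments) for `data`, a list of tuples."""
    rng = random.Random(seed)
    # Randomly choose k data points as the initial centroids.
    centroids = rng.sample(data, k)
    assignments = None
    for _ in range(max_iters):
        # Assign each point to the cluster of the closest centroid.
        new_assignments = [closest_centroid(p, centroids) for p in data]
        # Assumed convergence test: no point changed cluster.
        if new_assignments == assignments:
            break
        assignments = new_assignments
        # Re-estimate each centroid as the mean of its assigned points.
        for j in range(k):
            members = [p for p, a in zip(data, assignments) if a == j]
            if members:  # an empty cluster keeps its old centroid
                centroids[j] = mean_point(members)
    return centroids, assignments
```

Calling k_means(2, data) on a list of 2-D tuples returns the two centroids and a cluster index for each point. Different seed values can lead to different clusterings.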

7
K-Means Animation
Example generated by Andrew Moore using Dan
Pelleg's super-duper fast K-means system:
Dan Pelleg and Andrew Moore. Accelerating Exact
k-means Algorithms with Geometric
Reasoning. Proc. Conference on Knowledge
Discovery in Databases, 1999.
8
Problems with K-Means
  • Very sensitive to the initial points.
    • Do many runs of K-Means, each with different
      initial centroids.
    • Seed the centroids using a better method than
      random (e.g., farthest-first sampling).
  • Must manually choose k.
    • Learn the optimal k for the clustering. (Note
      that this requires a performance measure.)
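
One way to read the two seeding remedies above is sketched below. It is an illustration under assumptions, not the method used in these slides. Farthest-first sampling is taken to mean: pick the first centroid at random, then repeatedly pick the data point whose nearest already-chosen centroid is farthest away. The sum of squared distances to the nearest centroid stands in as one possible performance measure for comparing runs. The function names and the seed parameter are hypothetical.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_first_centroids(k, data, seed=0):
    """Choose k initial centroids from `data` by farthest-first sampling."""
    rng = random.Random(seed)
    # The first centroid is still random; the rest are deterministic.
    centroids = [rng.choice(data)]
    while len(centroids) < k:
        # Take the point whose nearest chosen centroid is farthest away.
        next_point = max(
            data,
            key=lambda p: min(squared_distance(p, c) for c in centroids),
        )
        centroids.append(next_point)
    return centroids


def sum_squared_error(data, centroids):
    """Assumed performance measure: total squared distance from each
    point to its nearest centroid (lower means tighter clusters)."""
    return sum(min(squared_distance(p, c) for c in centroids) for p in data)
```

When several runs use the same k, the run with the lowest sum_squared_error can be kept. This measure alone does not select k, because the best achievable error never increases as k grows, so choosing k needs a measure that also penalizes extra clusters.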

9
Problems with K-Means
  • How do you tell it which clustering you want?
    • Constrained clustering techniques