Cluster Analysis - PowerPoint PPT Presentation

Slides: 15
Provided by: COMPUTA5

Transcript and Presenter's Notes

Title: Cluster Analysis


1
Cluster Analysis
  • Lecture Notes for Chapter 8

2
Announcements
  • Assignment update
  • Paper for next time
  • Focus on sections 1-3
  • Projects 10 days (have datasets ready)

3
Hierarchical Clustering: Group Average
Dist({3, 6, 4}, {2, 5}) = (Dist(3,2) + Dist(3,5) + Dist(6,2) + Dist(6,5) + Dist(4,2) + Dist(4,5)) / (3 × 2)
4
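Hierarchical Clustering: Group Average
Proximity of two clusters is the average of the
pairwise proximities between points in the two
clusters.
Dist({3, 6, 4}, {1}) = (0.22 + 0.37 + 0.23) / (3 × 1) ≈ 0.27
Dist({2, 5}, {1}) = (0.24 + 0.34) / (2 × 1) = 0.29
Dist({3, 6, 4}, {2, 5}) = (0.15 + 0.28 + 0.25 + 0.39 + 0.20 + 0.29) / (3 × 2) = 0.26
The last value is the smallest, so {3, 6, 4} and {2, 5} are merged next.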
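The arithmetic on this slide can be checked with a few lines of Python. This is only a minimal sketch: the helper name `group_average` and the dictionary of candidate merges are illustrative and not part of the lecture; the distance values are the ones listed on the slide.

```python
def group_average(pairwise_distances, size_a, size_b):
    """Average of all pairwise distances between two clusters."""
    expected = size_a * size_b
    if len(pairwise_distances) != expected:
        raise ValueError(f"expected {expected} distances, got {len(pairwise_distances)}")
    return sum(pairwise_distances) / expected

# Pairwise distances exactly as listed on the slide.
d_364_vs_1 = group_average([0.22, 0.37, 0.23], 3, 1)
d_25_vs_1 = group_average([0.24, 0.34], 2, 1)
d_364_vs_25 = group_average([0.15, 0.28, 0.25, 0.39, 0.20, 0.29], 3, 2)

candidates = {
    "{3,6,4} vs {1}": d_364_vs_1,
    "{2,5} vs {1}": d_25_vs_1,
    "{3,6,4} vs {2,5}": d_364_vs_25,
}
# The pair of clusters with the smallest group-average distance merges next.
closest_pair = min(candidates, key=candidates.get)

for name, value in candidates.items():
    print(f"{name}: {value:.3f}")
print("merge next:", closest_pair)
```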
5
Exercise
6
Hierarchical Clustering: Group Average
  • Compromise between Single and Complete Link
  • Strengths: less susceptible to noise and outliers
  • Limitations: biased towards globular clusters
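One way to see why group average sits between single and complete link is to apply all three to the same set of cross-cluster distances. The sketch below reuses the six distances between {3, 6, 4} and {2, 5} from the worked example; the `noisy` list is a hypothetical perturbation (one pair made artificially close) added only to illustrate sensitivity to noise.

```python
def single_link(distances):
    """MIN: proximity is the smallest cross-cluster distance."""
    return min(distances)

def complete_link(distances):
    """MAX: proximity is the largest cross-cluster distance."""
    return max(distances)

def group_average(distances):
    """Group average: mean of all cross-cluster distances."""
    return sum(distances) / len(distances)

# Six cross-cluster distances between {3, 6, 4} and {2, 5} from the worked example.
cross = [0.15, 0.28, 0.25, 0.39, 0.20, 0.29]
# Hypothetical perturbation: one pair becomes very close because of a noisy point.
noisy = [0.01] + cross[1:]

for name, linkage in [("MIN", single_link), ("MAX", complete_link), ("AVG", group_average)]:
    before, after = linkage(cross), linkage(noisy)
    print(f"{name}: {before:.3f} -> {after:.3f} (shift {abs(after - before):.3f})")
```

In this toy case the single-link proximity moves by the full size of the perturbation, while the group average moves by only one sixth of it, since the perturbed pair is one of six terms in the mean.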

7
Hierarchical Clustering: Comparison
[Figure panels: MIN, MAX, Group Average]
8
Hierarchical Clustering: Problems and Limitations
  • Once a decision is made to combine two clusters,
    it cannot be undone
  • No objective function is directly minimized
  • Different schemes have problems with one or more
    of the following:
    • Sensitivity to noise and outliers
    • Difficulty handling different sized clusters and
      convex shapes
    • Breaking large clusters

9
DBSCAN
  • DBSCAN is a density-based algorithm.
  • Density = number of points within a specified
    radius (Eps)
  • A point is a core point if it has at least a
    specified number of points (MinPts) within Eps,
    counting the point itself
  • Core points are at the interior of a cluster
  • A border point has fewer than MinPts within Eps,
    but is in the neighborhood of a core point
  • A noise point is any point that is not a core
    point or a border point.
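The three point types can be sketched in Python as follows. This is a minimal illustration, not the lecture's code: the function name `classify_points` and the toy coordinates are hypothetical, distances are assumed to be Euclidean, and the core test is `>= min_pts` with the point itself counted (some presentations use a strict `>`, so treat the threshold convention as an assumption).

```python
from math import dist

def classify_points(points, eps, min_pts):
    """Label each point 'core', 'border', or 'noise'."""
    # Neighborhood of each point: indices of all points within eps, itself included.
    neighbors = [
        [j for j, q in enumerate(points) if dist(p, q) <= eps]
        for p in points
    ]
    is_core = [len(n) >= min_pts for n in neighbors]

    labels = []
    for i in range(len(points)):
        if is_core[i]:
            labels.append("core")
        elif any(is_core[j] for j in neighbors[i]):
            # Not dense enough itself, but inside a core point's neighborhood.
            labels.append("border")
        else:
            labels.append("noise")
    return labels

# Hypothetical toy data: a tight square, one point at its edge, one far away.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (2.0, 1.0), (8.0, 8.0)]
labels = classify_points(points, eps=1.5, min_pts=4)
print(labels)
```

With these settings the four corners of the square come out as core points, the point at (2, 1) as a border point, and the isolated point as noise.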

10
DBSCAN: Core, Border, and Noise Points
11
DBSCAN Algorithm
  • Label all points as core, border or noise
  • Eliminate noise points
  • Put an edge between core points that are within
    Eps of each other
  • Make each group of connected points into a
    separate cluster
  • Assign each border point to one of the clusters
    of its associated core points
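The steps above can be sketched as a short Python function. It is a simplified illustration under stated assumptions, not a reference implementation: Euclidean distance, a brute-force neighbor search, a core test of `>= min_pts` that counts the point itself, and border points assigned to the first core neighbor found. The function name `dbscan` and the sample coordinates are hypothetical.

```python
from math import dist

def dbscan(points, eps, min_pts):
    """Return one cluster id per point; noise points get None."""
    n = len(points)
    neighbors = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    # Label core points (the neighborhood count includes the point itself).
    core = [len(neighbors[i]) >= min_pts for i in range(n)]

    cluster_of = [None] * n
    next_id = 0
    # Core points within eps of each other are connected; each connected
    # group of core points becomes one cluster (depth-first traversal).
    for start in range(n):
        if not core[start] or cluster_of[start] is not None:
            continue
        cluster_of[start] = next_id
        stack = [start]
        while stack:
            i = stack.pop()
            for j in neighbors[i]:
                if core[j] and cluster_of[j] is None:
                    cluster_of[j] = next_id
                    stack.append(j)
        next_id += 1

    # Each border point joins the cluster of one of its core neighbors
    # (here simply the first one found; the choice is arbitrary).
    for i in range(n):
        if core[i]:
            continue
        for j in neighbors[i]:
            if core[j]:
                cluster_of[i] = cluster_of[j]
                break
    # Points with no core neighbor are noise and keep the value None.
    return cluster_of

# Hypothetical data: two squares, one border point, one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1),
          (10, 10), (10, 11), (11, 10), (11, 11), (5, 5)]
assignment = dbscan(points, eps=1.5, min_pts=4)
print(assignment)
```

Noise points are never removed from the input here; they are simply left unassigned, which has the same effect as eliminating them before clusters are formed.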

12
DBSCAN: Core, Border and Noise Points
[Figure panels: Original Points; Point types: core, border and noise]
Eps = 10, MinPts = 4
13
When DBSCAN Works Well
[Figure panel: Original Points]
  • Resistant to Noise
  • Can handle clusters of different shapes and sizes

14
When DBSCAN Does NOT Work Well
  • If Eps is low?
  • If Eps is high?
  • Varying densities
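A small experiment can make the Eps questions concrete. The sketch below uses hypothetical one-dimensional data with a dense group and a sparse group, and reports how many core points each group has and whether the two groups would be joined by a core-to-core edge. The names `core_flags`, `summarize` and `MIN_PTS`, the data, and the `>=` core test are all assumptions made for illustration.

```python
def core_flags(xs, eps, min_pts):
    """True where a 1-D point has at least min_pts points (itself included) within eps."""
    return [sum(abs(x - y) <= eps for y in xs) >= min_pts for x in xs]

dense = [0, 1, 2, 3, 4]        # hypothetical dense group (spacing 1)
sparse = [20, 25, 30, 35, 40]  # hypothetical sparse group (spacing 5)
xs = dense + sparse
MIN_PTS = 3

def summarize(eps):
    """Return (core points in dense group, core points in sparse group, groups merged?)."""
    core = core_flags(xs, eps, MIN_PTS)
    dense_core = sum(core[:len(dense)])
    sparse_core = sum(core[len(dense):])
    # The groups merge if a core point in one is within eps of a core point in the other.
    merged = any(
        core[i] and core[j] and abs(xs[i] - xs[j]) <= eps
        for i in range(len(dense))
        for j in range(len(dense), len(xs))
    )
    return dense_core, sparse_core, merged

for eps in (1.5, 6, 17):
    print(eps, summarize(eps))
```

In this toy setup a low Eps leaves the sparse group without any core points, so it is treated entirely as noise, while a high Eps joins the two groups into one cluster. The intermediate value happens to separate them here, but when densities differ more strongly there may be no single Eps that works for both groups.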