Title: Cluster Analysis
1 Cluster Analysis
- Lecture Notes for Chapter 8
2 Announcements
- Assignment update
- Paper for next time
- Focus on sections 1-3
- Projects 10 days (have datasets ready)
3 Hierarchical Clustering: Group Average
Dist({3,6,4}, {2,5}) = (Dist(3,2) + Dist(3,5) + Dist(6,2) + Dist(6,5) + Dist(4,2) + Dist(4,5)) / (3 × 2)
4 Hierarchical Clustering: Group Average
Proximity of two clusters is the average of
pairwise proximity between points in the two
clusters.
Dist({3,6,4}, {1}) = (0.22 + 0.37 + 0.23) / (3 × 1) = 0.28
Dist({2,5}, {1}) = (0.24 + 0.34) / (2 × 1) = 0.29
Dist({3,6,4}, {2,5}) = (0.15 + 0.28 + 0.25 + 0.39 + 0.20 + 0.29) / (3 × 2) = 0.26
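The arithmetic above can be checked with a short script. This is only a sketch: the function name `group_average` and the dictionary layout are illustrative choices, and the distances are the rounded values from the slide. With these rounded inputs the first average evaluates to about 0.273 rather than the 0.28 shown on the slide; the other two match.

```python
def group_average(pairwise_distances):
    """Average of the pairwise distances between two clusters.

    The list holds one entry per cross-cluster pair, so its length
    equals |C_i| * |C_j|.
    """
    return sum(pairwise_distances) / len(pairwise_distances)

# Rounded pairwise distances copied from the slide.
candidates = {
    "{3,6,4} vs {1}": [0.22, 0.37, 0.23],
    "{2,5} vs {1}": [0.24, 0.34],
    "{3,6,4} vs {2,5}": [0.15, 0.28, 0.25, 0.39, 0.20, 0.29],
}

proximities = {name: group_average(d) for name, d in candidates.items()}
for name, value in proximities.items():
    print(f"{name}: {value:.3f}")

# The pair of clusters with the smallest group-average distance is merged next.
next_merge = min(proximities, key=proximities.get)
print("merge next:", next_merge)
```

Under these numbers the next merge is {3,6,4} with {2,5}, since 0.26 is the smallest of the three proximities.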
5 Exercise
6 Hierarchical Clustering: Group Average
- Compromise between Single and Complete Link
- Strengths
- Less susceptible to noise and outliers
- Limitations
- Biased towards globular clusters
7 Hierarchical Clustering: Comparison
[Figure panels: MIN, MAX, Group Average]
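To make the comparison concrete, the three proximity definitions can be applied to the same pair of clusters. The sketch below reuses the distances between {3,6,4} and {2,5} from the group-average slide, assuming the values are listed in the same order as the point pairs in that formula. MIN corresponds to single link and MAX to complete link; the variable names are illustrative.

```python
# Cross-cluster distances between {3, 6, 4} and {2, 5}, taken from the
# group-average slide. The pairing of values to point pairs is an
# assumption based on the order in which the slide lists them.
cross = {
    (3, 2): 0.15, (3, 5): 0.28,
    (6, 2): 0.25, (6, 5): 0.39,
    (4, 2): 0.20, (4, 5): 0.29,
}

single_link = min(cross.values())                   # MIN: closest pair
complete_link = max(cross.values())                 # MAX: farthest pair
group_average = sum(cross.values()) / len(cross)    # mean over all pairs

print(single_link, complete_link, round(group_average, 2))
```

The group average always falls between the MIN and MAX values for the same pair of clusters.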
8 Hierarchical Clustering: Problems and Limitations
- Once a decision is made to combine two clusters, it cannot be undone
- No objective function is directly minimized
- Different schemes have problems with one or more of the following:
- Sensitivity to noise and outliers
- Difficulty handling different sized clusters and convex shapes
- Breaking large clusters
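The first limitation follows from the greedy structure of agglomerative clustering. Below is a minimal sketch using group-average proximity on hypothetical 1-D points; the data, the function names, and the choice of absolute difference as the distance are all assumptions made for illustration.

```python
from itertools import combinations

def group_average(a, b):
    """Average absolute difference over all cross-cluster pairs (1-D points)."""
    return sum(abs(x - y) for x in a for y in b) / (len(a) * len(b))

def agglomerate(points):
    """Greedy bottom-up clustering; returns the merges in the order they happen."""
    clusters = [(p,) for p in points]
    merges = []
    while len(clusters) > 1:
        # Choose the closest pair given the current clusters only.
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: group_average(*pair))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)  # the merge is final and is never revisited
        merges.append((a, b))
    return merges

points = [1.0, 2.0, 6.0, 7.0, 20.0]  # hypothetical data
merges = agglomerate(points)
for a, b in merges:
    print(a, "+", b)
```

Each pass only looks at the current clusters, so an early merge that later turns out to be poor cannot be corrected, and no global objective is evaluated at any point.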
9 DBSCAN
- DBSCAN is a density-based algorithm.
- Density = number of points within a specified radius (Eps)
- A point is a core point if it has more than a specified number of points (MinPts) within Eps
- This count includes the core point itself
- These are points that are at the interior of a cluster
- A border point has fewer than MinPts within Eps, but is in the neighborhood of a core point
- A noise point is any point that is not a core point or a border point.
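The three definitions translate fairly directly into code. This is a sketch under stated assumptions: Euclidean distance, a brute-force neighborhood search, and the "more than MinPts" wording used above (some formulations use "at least" instead). The function name `classify_points` and the sample points are hypothetical.

```python
import math

def classify_points(points, eps, min_pts):
    """Label each point 'core', 'border' or 'noise'."""
    n = len(points)
    # Eps-neighborhood of every point; the point itself is included.
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    # Following the wording above: more than MinPts points within Eps.
    is_core = [len(nb) > min_pts for nb in neighbors]

    labels = []
    for i in range(n):
        if is_core[i]:
            labels.append("core")
        elif any(is_core[j] for j in neighbors[i]):
            labels.append("border")  # not core, but near a core point
        else:
            labels.append("noise")
    return labels

# Hypothetical data: a small square, one point on its edge, one far away.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1), (8, 8)]
labels = classify_points(points, eps=1.5, min_pts=3)
print(labels)
# ['core', 'core', 'core', 'core', 'border', 'noise']
```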
10 DBSCAN: Core, Border, and Noise Points
11 DBSCAN Algorithm
- Label all points as core, border or noise
- Eliminate noise points
- Put an edge between core points that are within Eps of each other
- Make each group of connected points into a separate cluster
- Assign each border point to one of the clusters of its associated core points
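The steps above can be sketched in plain Python. This is an illustrative, brute-force version rather than a reference implementation: it assumes Euclidean distance, uses the "more than MinPts" core test with the point itself counted, and assigns a border point to the cluster of the first core neighbor it finds. The function name `dbscan` and the sample data are hypothetical.

```python
import math

def dbscan(points, eps, min_pts):
    """Return one cluster id per point, or None for noise."""
    n = len(points)
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    core = [len(nb) > min_pts for nb in neighbors]  # count includes the point itself

    labels = [None] * n  # noise points simply keep None
    cluster_id = 0
    for start in range(n):
        if not core[start] or labels[start] is not None:
            continue
        # Follow the edges between core points that are within Eps of each other.
        labels[start] = cluster_id
        stack = [start]
        while stack:
            i = stack.pop()
            for j in neighbors[i]:
                if core[j] and labels[j] is None:
                    labels[j] = cluster_id
                    stack.append(j)
        cluster_id += 1

    # Assign each border point to the cluster of one of its core neighbors.
    for i in range(n):
        if not core[i]:
            for j in neighbors[i]:
                if core[j]:
                    labels[i] = labels[j]
                    break
    return labels

# Hypothetical data: two small groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 5)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)
# [0, 0, 0, 0, 0, 1, 1, 1, 1, None]
```

A border point within Eps of core points from two different clusters could go to either one; this sketch takes the first match, which is one of several reasonable tie-breaking choices.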
12 DBSCAN: Core, Border and Noise Points
[Figure panels: Original Points; Point types: core, border and noise]
Eps = 10, MinPts = 4
13 When DBSCAN Works Well
[Figure panel: Original Points]
- Resistant to Noise
- Can handle clusters of different shapes and sizes
14 When DBSCAN Does NOT Work Well
- If Eps is low?
- If Eps is high?
- Varying densities
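One way to explore these questions is to run the same data through several Eps values and count clusters and noise points. The sketch below does this for hypothetical 1-D data with one dense and one sparse group; the function name `cluster_summary`, the data, and the parameter values are assumptions, and the core test uses "more than MinPts" with the point itself counted.

```python
def cluster_summary(xs, eps, min_pts):
    """Return (number of clusters, number of noise points) for 1-D data."""
    n = len(xs)
    nbrs = [[j for j in range(n) if abs(xs[i] - xs[j]) <= eps] for i in range(n)]
    core = [len(nb) > min_pts for nb in nbrs]

    # Count connected groups of core points.
    seen, clusters = set(), 0
    for s in range(n):
        if core[s] and s not in seen:
            clusters += 1
            seen.add(s)
            stack = [s]
            while stack:
                i = stack.pop()
                for j in nbrs[i]:
                    if core[j] and j not in seen:
                        seen.add(j)
                        stack.append(j)

    # Noise: neither core nor within Eps of a core point.
    noise = sum(
        1 for i in range(n)
        if not core[i] and not any(core[j] for j in nbrs[i])
    )
    return clusters, noise

dense = [0.0, 0.1, 0.2, 0.3, 0.4]    # hypothetical dense group
sparse = [5.0, 6.0, 7.0, 8.0, 9.0]   # hypothetical sparse group
xs = dense + sparse

results = {eps: cluster_summary(xs, eps, min_pts=2) for eps in (0.15, 1.0, 5.0)}
for eps, (clusters, noise) in results.items():
    print(f"Eps={eps}: clusters={clusters}, noise={noise}")
```

On this data the smallest Eps finds only the dense group and labels the whole sparse group as noise, the middle value recovers both groups, and the largest value merges everything into a single cluster.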