Title: Jeremy Tantrum and Werner Stuetzle
1 Hierarchical Clustering Revisited
- Jeremy Tantrum and Werner Stuetzle
- Department of Statistics,
- University of Washington
This work has been supported by NSA contract
MDA904-02-C-0446
2 Current R Packages
Package   Function   Description
cluster   agnes      agglomerative hierarchical clustering
cluster   diana      divisive hierarchical clustering
mva       hclust     agglomerative hierarchical clustering
Specify max number of clusters and then do dendrogram cutting!
3 Implementing an R Package (Pruning)
- Algorithm
- Automatic vs Manual Pruning
- Projection Plots
4 Basic Algorithm
5 Basic Algorithm
- Prune nodes which are parents of two leaves
- Prune nodes which are parents of one leaf
6 Basic Algorithm
- Pruned nodes: data are assigned to the parent node, which becomes a leaf.
- Orphans: data belonging to orphaned leaves are assigned to (final) clusters using a k-nearest-neighbor classifier.
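The orphan step can be illustrated with a minimal sketch. It assumes Euclidean distance and a majority vote among the k nearest already-clustered observations. The function name `knn_assign`, the value k = 3, and the toy data are hypothetical and not taken from the package.

```python
import math
from collections import Counter

def knn_assign(orphans, labeled_points, labels, k=3):
    """Assign each orphan to a cluster by majority vote of its k nearest labeled points."""
    assigned = []
    for p in orphans:
        # Indices of labeled points, ordered by Euclidean distance to the orphan.
        order = sorted(range(len(labeled_points)),
                       key=lambda i: math.dist(p, labeled_points[i]))
        votes = Counter(labels[i] for i in order[:k])
        assigned.append(votes.most_common(1)[0][0])
    return assigned

# Toy data (assumed): two final clusters "A" and "B" and two orphaned observations.
clustered = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
cluster_labels = ["A", "A", "A", "B", "B", "B"]
orphan_points = [(0.3, 0.3), (4.5, 4.8)]

orphan_labels = knn_assign(orphan_points, clustered, cluster_labels, k=3)
print(orphan_labels)
```

An odd k with two clusters avoids tied votes. With more clusters a tie-breaking rule would be needed.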
7 Automatic vs Manual
- Manual: the user clicks on each node and, looking at projection plots, decides whether or not to prune.
- Automatic: the computer computes p-values for unimodality tests and prunes at a specified level.
- Semi-automatic? The computer prunes and the user checks the results using projection plots.
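The automatic mode might look roughly like the sketch below. It makes several assumptions. Each internal node of a binary tree carries a precomputed unimodality p-value. A node is pruned only when both of its children are leaves and its p-value is at or above the chosen level, meaning unimodality is not rejected. The one-leaf case is not handled. The `Node` class, `auto_prune`, and the p-values in the toy tree are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    name: str
    p_value: Optional[float] = None   # unimodality-test p-value; None for leaves
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None

def auto_prune(node: Node, level: float) -> Node:
    """Bottom-up pruning: a parent of two leaves becomes a leaf if unimodality is not rejected."""
    if node.is_leaf():
        return node
    node.left = auto_prune(node.left, level)
    node.right = auto_prune(node.right, level)
    if node.left.is_leaf() and node.right.is_leaf() and node.p_value >= level:
        # Data of both children are assigned to this node, which becomes a leaf.
        node.left = node.right = None
    return node

def leaves(node: Node):
    """Names of the leaves, left to right; these are the remaining clusters."""
    if node.is_leaf():
        return [node.name]
    return leaves(node.left) + leaves(node.right)

# Toy tree (assumed p-values): the split under "a" looks unimodal, the one under "b" does not.
tree = Node("root", 0.00,
            Node("a", 0.76, Node("a1"), Node("a2")),
            Node("b", 0.01, Node("b1"), Node("b2")))

final_clusters = leaves(auto_prune(tree, level=0.05))
print(final_clusters)
```

Because pruning proceeds bottom-up, a node whose children collapse into leaves is itself reconsidered in the same pass.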
8 Projection Plots
Density: unimodal and bimodal
CDF: unimodal and empirical
P-value: 0.76
9 Projection Plots
Density: unimodal and bimodal
CDF: unimodal and empirical
P-value: 0.00
10 Projection Plots
- Density
- Silverman's Gaussian-smoothed density estimate.
- Put a small Gaussian at each point and sum the result.
- Parameterized by the variance of these Gaussians.
- Larger variance => smoother (fewer modes).
- Non-robust to outliers!
- Makes good plots.
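A small sketch of the density idea: place a Gaussian at each observation, sum them, and count the local maxima on a grid. Here `bandwidth` is the standard deviation of each Gaussian, that is, the square root of the variance mentioned above. The function names, the grid size, and the sample are assumptions for illustration only.

```python
import math

def gaussian_kde(data, bandwidth):
    """Return a function that evaluates the Gaussian kernel density estimate."""
    n = len(data)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)

    return density

def count_modes(data, bandwidth, grid_size=401):
    """Count strict local maxima of the density estimate on an evenly spaced grid."""
    lo = min(data) - 3 * bandwidth
    hi = max(data) + 3 * bandwidth
    f = gaussian_kde(data, bandwidth)
    ys = [f(lo + (hi - lo) * i / (grid_size - 1)) for i in range(grid_size)]
    return sum(1 for i in range(1, grid_size - 1) if ys[i - 1] < ys[i] > ys[i + 1])

# Assumed sample with two well-separated groups of observations.
sample = [-2.3, -2.1, -2.0, -1.9, -1.7, 1.7, 1.9, 2.0, 2.1, 2.2, 2.3]

modes_small = count_modes(sample, bandwidth=0.3)   # narrow Gaussians
modes_large = count_modes(sample, bandwidth=3.0)   # wide Gaussians
print(modes_small, modes_large)
```

On this sample the narrow bandwidth keeps the two groups apart while the wide one merges them into a single mode, which is the smoothing effect described in the bullets.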
11 Projection Plots
- CDF
- Empirical CDF and MLE of closest unimodal CDF.
- Not model dependent.
- Harder to interpret by eye.
- Significance of size of DIP depends on number of observations.
- Makes good test.
12 Projection Plots
Empirical CDF
13 Projection Plots
Mode: robust to choice of mode
14 Projection Plots
Greatest convex minorant, least concave majorant
15 Projection Plots
DIP: max difference between these curves
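The construction on the last few slides can be sketched as follows, for a mode location supplied by the caller. The code takes the greatest convex minorant of the empirical CDF points to the left of the mode, the least concave majorant to the right, and reports the largest vertical gap between those curves and the empirical CDF points. This is a simplified illustration, not the exact dip statistic. It assumes distinct observations, and all function names are hypothetical.

```python
def ecdf_points(data):
    """Points (x_(i), i/n) of the empirical CDF at the sorted observations."""
    xs = sorted(data)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]

def _cross(o, a, b):
    """Cross product of vectors o->a and o->b (positive for a left turn)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_minorant(points):
    """Vertices of the greatest convex minorant (lower hull) of x-sorted points."""
    hull = []
    for p in points:
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    return hull

def concave_majorant(points):
    """Vertices of the least concave majorant (upper hull) of x-sorted points."""
    hull = []
    for p in points:
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull

def interpolate(hull, x):
    """Evaluate the piecewise-linear curve through the hull vertices at x."""
    for (x0, y0), (x1, y1) in zip(hull, hull[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return hull[-1][1]  # hull consisting of a single vertex

def max_gap(data, mode):
    """Largest vertical gap between the ECDF points and the GCM (left) / LCM (right)."""
    pts = ecdf_points(data)
    left = [p for p in pts if p[0] <= mode]
    right = [p for p in pts if p[0] >= mode]
    gap = 0.0
    if left:
        gcm = convex_minorant(left)
        gap = max(gap, max(y - interpolate(gcm, x) for x, y in left))
    if right:
        lcm = concave_majorant(right)
        gap = max(gap, max(interpolate(lcm, x) - y for x, y in right))
    return gap

# Assumed samples: observations concentrating at 10 versus two groups far from 10.
unimodal = [0, 4, 7, 9, 10, 11, 13, 16, 20]
bimodal = [0, 1, 2, 3, 10, 17, 18, 19, 20]

gap_unimodal = max_gap(unimodal, mode=10)
gap_bimodal = max_gap(bimodal, mode=10)
print(gap_unimodal, gap_bimodal)
```

For the first sample the empirical CDF is already convex up to the mode and concave after it, so the gap is essentially zero. The second sample leaves a clear gap on both sides of the mode.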