1
Hierarchical Clustering Revisited
  • Jeremy Tantrum and Werner Stuetzle
  • Department of Statistics,
  • University of Washington

This work has been supported by NSA contract
MDA904-02-C-0446
2
Current R Packages
  • cluster: agnes (agglomerative hierarchical clustering), diana (divisive hierarchical clustering)
  • mva: hclust (agglomerative hierarchical clustering)
  • Specify max number of clusters and then do dendrogram cutting!
3
Implementing an R Package (Pruning)
  • Algorithm
  • Automatic vs Manual Pruning
  • Projection Plots

4
Basic Algorithm
  • Plot Dendrogram

5
Basic Algorithm
  • Plot Dendrogram
  • Prune nodes which are parents of two leaves
  • Prune nodes which are parents of one leaf

6
Basic Algorithm
  • Pruned nodes: data are assigned to the parent node, which becomes a leaf.
  • Orphans: data belonging to orphaned leaves are assigned to (final) clusters
    using a k-nearest-neighbor classifier.
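
The orphan-assignment step can be illustrated with a minimal k-nearest-neighbor sketch. This is not the authors' implementation: the function name `knn_assign`, the choice of Euclidean distance, the value k = 3, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter

def knn_assign(orphans, labeled_points, labels, k=3):
    """Assign each orphan to the majority cluster among its k nearest labeled points.

    Hypothetical helper: Euclidean distance and majority voting are assumptions.
    """
    assigned = []
    for p in orphans:
        # Indices of labeled points, sorted by distance to the orphan.
        order = sorted(range(len(labeled_points)),
                       key=lambda i: math.dist(p, labeled_points[i]))
        # Majority vote among the k closest labeled points.
        votes = Counter(labels[i] for i in order[:k])
        assigned.append(votes.most_common(1)[0][0])
    return assigned

# Toy data: two final clusters "A" and "B", plus two orphaned observations.
labeled_points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
                  (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
labels = ["A", "A", "A", "B", "B", "B"]
orphans = [(0.3, 0.3), (4.8, 4.7)]

result = knn_assign(orphans, labeled_points, labels, k=3)
print(result)
```

Each orphan lands in the cluster whose members dominate its neighborhood, so the first orphan joins "A" and the second joins "B".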
7
Automatic vs Manual
  • Manual: the user clicks on each node and, looking at projection plots,
    decides whether or not to prune.
  • Automatic: the computer computes p-values for unimodality tests and prunes
    at a specified level.
  • Semi-automatic? The computer prunes and the user checks the results using
    projection plots.
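
A rough sketch of the automatic mode, under stated assumptions: each internal node already carries the p-value of a unimodality test on its data, and a node whose children are both leaves is pruned when that p-value is at or above the chosen level (unimodality is not rejected). The `Node` class, the `prune` function, and the threshold 0.05 are hypothetical, and the sketch covers only the parent-of-two-leaves case.

```python
class Node:
    """Hypothetical dendrogram node; p_value is from a unimodality test."""
    def __init__(self, name, p_value=None, left=None, right=None):
        self.name = name
        self.p_value = p_value
        self.left = left
        self.right = right

    def is_leaf(self):
        return self.left is None and self.right is None

def prune(node, alpha=0.05):
    """Bottom-up pruning: collapse a parent of two leaves if it looks unimodal."""
    if node.is_leaf():
        return node
    prune(node.left, alpha)
    prune(node.right, alpha)
    both_leaves = node.left.is_leaf() and node.right.is_leaf()
    if both_leaves and node.p_value >= alpha:
        # Unimodality not rejected: the node absorbs its data and becomes a leaf.
        node.left = None
        node.right = None
    return node

def leaf_names(node):
    """Names of the leaves, left to right."""
    if node.is_leaf():
        return [node.name]
    return leaf_names(node.left) + leaf_names(node.right)

# Toy tree: the root looks bimodal (p = 0.00), its left child unimodal (p = 0.76).
tree = Node("root", 0.00,
            left=Node("left", 0.76, Node("a"), Node("b")),
            right=Node("right"))
prune(tree, alpha=0.05)
clusters = leaf_names(tree)
print(clusters)
```

In the toy tree the left subtree collapses into a single leaf, while the root split survives because its p-value falls below the level.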
8
Projection Plots
  • Density: unimodal and bimodal
  • CDF: unimodal and empirical
  • P-value: 0.76
9
Projection Plots
  • Density: unimodal and bimodal
  • CDF: unimodal and empirical
  • P-value: 0.00
10
Projection Plots
  • Density
  • Silverman's Gaussian-smoothed density estimate.
  • Put a small Gaussian at each point and sum the
    result.
  • Parameterized by the variance of these Gaussians.
  • Larger variance ⇒ smoother (fewer modes).
  • Non-robust to outliers!
  • Makes good plots.
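
The kernel idea can be sketched in a few lines: place a Gaussian at every observation, sum them, and count the local maxima of the result. The function names, the evaluation grid, and the toy data are assumptions; the kernel width is given as a standard deviation `sd` (the square root of the variance mentioned in the slide).

```python
import math

def gaussian_kde(data, sd):
    """Return a density function: the average of Gaussians centered at each point."""
    norm = 1.0 / (len(data) * sd * math.sqrt(2.0 * math.pi))
    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / sd) ** 2) for xi in data)
    return density

def count_modes(density, lo, hi, steps=400):
    """Count strict local maxima of the density on an evenly spaced grid."""
    ys = [density(lo + (hi - lo) * i / steps) for i in range(steps + 1)]
    return sum(1 for i in range(1, steps) if ys[i - 1] < ys[i] > ys[i + 1])

# Toy projection: two well-separated groups of points.
data = [-2.1, -2.0, -1.9, -1.8, 1.8, 1.9, 2.0, 2.1]

modes_narrow = count_modes(gaussian_kde(data, sd=0.5), -5.0, 5.0)
modes_wide = count_modes(gaussian_kde(data, sd=3.0), -5.0, 5.0)
print(modes_narrow, modes_wide)
```

With the narrow kernel the estimate keeps both groups as separate modes; with the wide kernel they merge into one, which is the "larger variance, fewer modes" behavior.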

11
Projection Plots
  • CDF
  • Empirical CDF and MLE of closest unimodal CDF.
  • Not model dependent.
  • Harder to interpret by eye.
  • Significance of size of DIP depends on number of
    observations.
  • Makes good test.

12
Projection Plots
Empirical CDF
13
Projection Plots
  • Mode: robust to choice of mode
14
Projection Plots
  • Greatest convex minorant
  • Least concave majorant
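
As a sketch of the two curves named above, the following computes them for the empirical CDF evaluated at the sorted observations. It makes simplifying assumptions: the observations are distinct, the empirical CDF is represented by the points (x_i, i/n) rather than as a step function, and the curves are taken over the whole sample range rather than on either side of a mode. All function names are hypothetical.

```python
def ecdf_points(data):
    """Points (x_i, i/n) of the empirical CDF at the sorted observations."""
    xs = sorted(data)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]

def _cross(o, a, b):
    """Cross product of vectors o->a and o->b (positive for a left turn)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def greatest_convex_minorant(points):
    """Lower convex hull of points sorted by x: the largest convex curve below them."""
    hull = []
    for p in points:
        # Drop the last vertex while it would lie on or above the new chord.
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    return hull

def least_concave_majorant(points):
    """Upper convex hull of points sorted by x: the smallest concave curve above them."""
    hull = []
    for p in points:
        # Drop the last vertex while it would lie on or below the new chord.
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull

pts = ecdf_points([1.0, 2.0, 4.0, 7.0])
gcm = greatest_convex_minorant(pts)
lcm = least_concave_majorant(pts)
print(gcm)
print(lcm)
```

For this toy sample the empirical CDF points already form a concave curve, so the majorant keeps every point while the minorant reduces to the chord between the first and last points.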
15
Projection Plots
  • DIP: maximum difference between these curves