Title: Clustering Analysis: Outline
1. Clustering Analysis: Outline
- General Purpose
- Clustering Categories
- Clustering vs Other Multivariate Data Analysis
- Area of Application
- Distance Measures
- Hierarchical Tree Clustering
- K-Means Clustering
- K-Means vs ANOVA
- Expectation Maximization Clustering (for Categorical Variables)
- Two-Way Clustering (Block Clustering)
- Ex: Concept Clustering in Texts
2. Clustering Analysis: General Purpose
- How to organize observed data into meaningful structures, that is, to develop taxonomies.
- Ex: To organize the different species of animals before a meaningful description of the differences between animals is possible.
- Target: Minimize within-group variation and at the same time maximize between-group variation.
3. Clustering Analysis: Categories
- A. Hierarchical Clustering
- Ex: Tree Clustering
- B. i) K-Means Clustering
- ii) Expectation Maximization Clustering
- C. Block Clustering (Two-Way Joining)
- Ex: Concept Clustering within Texts
4. Clustering Analysis: Similarity to Discriminant and Factor Analysis
- In Factor Analysis, the original set of variables is reduced to a smaller number of factors, while in clustering the original set of variables is grouped into clusters.
- In Discriminant Analysis, the clusters are known in advance and the discriminating variables are worked out, while in clustering we try to discover natural clusters within the data.
5. Clustering Analysis: Statistical Significance Testing
- Unlike many other statistical procedures, cluster
analysis methods are mostly used when we do not
have any a priori hypotheses, but are still in
the exploratory phase of our research. In a
sense, cluster analysis finds the "most
significant solution possible." Therefore,
statistical significance testing is really not
appropriate for clustering analysis.
6. Clustering Analysis: Area of Application
- In general, whenever one needs to classify a "mountain" of information into manageable, meaningful piles, cluster analysis is of great utility.
- Ex: Clustering diseases, cures for diseases, or symptoms of diseases can lead to very useful taxonomies. In the field of psychiatry, the correct diagnosis of clusters of symptoms such as paranoia, schizophrenia, etc. is essential for successful therapy.
7. Clustering Analysis: Hierarchical Clustering
- i) Bottom-Up (upward): The purpose of this method is to join variables together into successively larger clusters, using some measure of similarity or distance.
- Initially, each variable is considered a separate cluster.
- At each step, the similarity threshold is relaxed so that clusters are successively joined.
- ii) Top-Down (downward): In this method, a partitioning scheme is followed.
- Initially, the whole data set is considered a single cluster.
- At each step, the similarity threshold is tightened so that the data are split into smaller clusters.
8. Clustering Analysis: Hierarchical Tree Clustering
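A minimal sketch of hierarchical tree clustering, assuming SciPy is available; the sample data, the Euclidean metric, average linkage, and the cut-off threshold are illustrative choices, not part of the slides.

```python
# Minimal sketch of bottom-up (tree) clustering with SciPy.
# Sample data, Euclidean metric, average linkage, and the cut threshold
# are illustrative assumptions.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

X = np.array([[1.0, 2.0], [1.2, 1.9], [5.0, 8.0], [5.1, 7.8], [9.0, 1.0]])

# Pairwise Euclidean distances, then successive merging of the closest clusters.
distances = pdist(X, metric="euclidean")
tree = linkage(distances, method="average")

# Cutting the tree at a chosen distance threshold yields flat cluster labels.
labels = fcluster(tree, t=2.0, criterion="distance")
print(labels)

# scipy.cluster.hierarchy.dendrogram(tree) would draw the tree itself
# if matplotlib is available.
```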
9. Clustering Analysis: Distance Measures
- Euclidean distance
- The distance between any two objects is not affected by the addition of new objects to the analysis, which may be outliers.
- distance(x, y) = { Σ_i (x_i - y_i)² }^½
- Chebychev distance
- Differentiates objects by the single dimension or attribute on which they are furthest apart.
- distance(x, y) = max_i |x_i - y_i|
- Percent disagreement
- This measure is particularly useful if the data for the dimensions included in the analysis are categorical in nature.
- distance(x, y) = (number of x_i ≠ y_i) / i
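A small sketch of the three distance measures above; the helper names and sample vectors are illustrative, not from the slides.

```python
# Minimal sketch of the three distance measures above.
import numpy as np

def euclidean(x, y):
    return np.sqrt(np.sum((x - y) ** 2))

def chebychev(x, y):
    return np.max(np.abs(x - y))

def percent_disagreement(x, y):
    # Fraction of dimensions i on which the (categorical) values differ.
    return np.mean(x != y)

x = np.array([1.0, 4.0, 2.0])
y = np.array([2.0, 1.0, 2.0])
print(euclidean(x, y))             # sqrt(1 + 9 + 0) = 3.162...
print(chebychev(x, y))             # largest single-dimension gap = 3.0

a = np.array(["red", "small", "round"])
b = np.array(["red", "large", "round"])
print(percent_disagreement(a, b))  # 1 of 3 dimensions disagree = 0.333...
```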
10. Clustering Analysis: K-Means Clustering
- When we already have hypotheses concerning the number of clusters in our cases or variables, we can apply k-means clustering.
- In general, the k-means method will produce exactly k different clusters of the greatest possible distinction.
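A minimal k-means sketch using scikit-learn (an assumed dependency); the synthetic data and the choice of k = 3 are illustrative only.

```python
# Minimal sketch: choosing k in advance and letting k-means find k maximally
# distinct clusters. The three synthetic groups of points are illustrative data.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=[0, 0], scale=0.5, size=(30, 2)),
    rng.normal(loc=[5, 5], scale=0.5, size=(30, 2)),
    rng.normal(loc=[0, 5], scale=0.5, size=(30, 2)),
])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(kmeans.labels_[:10])       # cluster membership of the first 10 cases
print(kmeans.cluster_centers_)   # the k cluster means
```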
11. Clustering Analysis: K-Means vs ANOVA
- K-means clustering is analogous to "ANOVA in reverse" in the sense that:
- The significance test in ANOVA evaluates the between-group variability against the within-group variability when testing the hypothesis that the group means differ from each other.
- In k-means clustering, the program tries to move objects (e.g., cases) in and out of groups (clusters) to get the most significant ANOVA results.
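A short sketch of the "ANOVA in reverse" idea: after fitting k-means, the within-cluster and between-cluster sums of squares can be compared, and a large between-to-within ratio is exactly what an ANOVA F test rewards. The synthetic data and variable names are illustrative assumptions.

```python
# After k-means, within-cluster variability is small relative to
# between-cluster variability; k-means searches for such a partition.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(m, 0.4, size=(40, 2)) for m in ([0, 0], [4, 4])])

labels = KMeans(n_clusters=2, n_init=10, random_state=1).fit_predict(X)
grand_mean = X.mean(axis=0)

within_ss = sum(((X[labels == c] - X[labels == c].mean(axis=0)) ** 2).sum()
                for c in np.unique(labels))
between_ss = sum(len(X[labels == c]) *
                 ((X[labels == c].mean(axis=0) - grand_mean) ** 2).sum()
                 for c in np.unique(labels))

print(within_ss, between_ss, between_ss / within_ss)
```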
12. Clustering Analysis: Expectation Maximization Clustering (Categorical Variables)
- Classification probabilities instead of classifications: each observation belongs to each cluster with a certain probability.
- Categorical variables: The EM algorithm can also accommodate categorical variables. The program will at first randomly assign different probabilities (weights, to be precise) to each class or category, for each cluster. In successive iterations, these probabilities are refined (adjusted) to maximize the likelihood of the data given the specified number of clusters.
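A minimal EM sketch for categorical data, assuming a mixture of independent categorical distributions (a latent class model); this is an illustrative implementation, not the specific program the slide refers to.

```python
# Minimal EM for a mixture of independent categorical distributions.
# X holds integer category codes per column; all names are illustrative.
import numpy as np

def em_categorical(X, n_clusters, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    n_cats = [int(X[:, j].max()) + 1 for j in range(d)]

    # Start from uniform mixing weights and random category probabilities
    # (weights) for each class or category, per cluster.
    pi = np.full(n_clusters, 1.0 / n_clusters)
    theta = [rng.dirichlet(np.ones(k), size=n_clusters) for k in n_cats]

    for _ in range(n_iter):
        # E-step: probability of each cluster given each observation.
        log_r = np.tile(np.log(pi), (n, 1))
        for j in range(d):
            log_r += np.log(theta[j][:, X[:, j]]).T
        r = np.exp(log_r - log_r.max(axis=1, keepdims=True))
        r /= r.sum(axis=1, keepdims=True)      # classification probabilities

        # M-step: refine weights and category probabilities to raise the
        # likelihood of the data given the chosen number of clusters.
        pi = r.mean(axis=0)
        for j in range(d):
            counts = np.array([r[X[:, j] == k].sum(axis=0)
                               for k in range(n_cats[j])]).T + 1e-9
            theta[j] = counts / counts.sum(axis=1, keepdims=True)
    return pi, theta, r

# Tiny illustration: six observations, two categorical features.
X = np.array([[0, 0], [0, 1], [1, 2], [2, 2], [2, 1], [0, 0]])
pi, theta, resp = em_categorical(X, n_clusters=2)
print(resp.round(2))   # each row sums to 1: soft cluster membership
```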
13. Clustering Analysis: Two-Way Clustering (Block Clustering)
- Block Clustering is useful in the relatively rare
circumstances when one expects that both cases
and variables will simultaneously contribute to
the uncovering of meaningful patterns of clusters.
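A sketch of two-way clustering of cases and variables at the same time, using scikit-learn's SpectralCoclustering as one available co-clustering routine; it is an illustrative stand-in rather than the specific block-clustering procedure described above, and the planted block structure in the data is assumed for demonstration.

```python
# Sketch of two-way (block) clustering: rows (cases) and columns (variables)
# are clustered simultaneously. Data and parameters are illustrative.
import numpy as np
from sklearn.cluster import SpectralCoclustering

rng = np.random.default_rng(0)
# A data matrix with a planted block structure: some groups of cases score
# high only on some groups of variables.
X = rng.random((12, 8))
X[:6, :4] += 3.0
X[6:, 4:] += 3.0

model = SpectralCoclustering(n_clusters=2, random_state=0).fit(X)
print(model.row_labels_)      # cluster assignment of the cases (rows)
print(model.column_labels_)   # cluster assignment of the variables (columns)
```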