Data Mining Techniques Clustering

Slides: 23

Provided by: leeyu2

Transcript and Presenter's Notes

1
Data Mining Techniques Clustering
2
Purpose

In clustering analysis, there is no
pre-classified data
Instead, clustering analysis is a process where a
set of objects is partitioned into several
clusters
All members in one cluster are similar to each
other and different from the members of other
clusters, according to some similarity metric
(e.g., the opposite of distance between objects)

3
Cluster Analysis
[Figure: scatter plot of customers (objects) on two variables, X (Income) and Y (Age), with the clusters marked]
4
Cluster Analysis
n objects, p variables
Data Matrix (n×p)
Dissimilarity Matrix (n×n)
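
To make the relation between the two matrices concrete, here is a minimal sketch that builds the n×n dissimilarity matrix from an n×p data matrix. It assumes Euclidean distance as the dissimilarity, which is only one possible choice; the function name and the three sample objects are hypothetical and do not come from the slides.

```python
import math

def dissimilarity_matrix(data):
    """Return the n x n matrix of pairwise Euclidean distances
    between the rows (objects) of an n x p data matrix."""
    n = len(data)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist = math.dist(data[i], data[j])
            d[i][j] = d[j][i] = dist  # the matrix is symmetric
    return d

# Hypothetical data matrix: n = 3 objects, p = 2 variables
data = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
matrix = dissimilarity_matrix(data)
for row in matrix:
    print(row)
```

The diagonal is zero and the matrix is symmetric, so only the n(n-1)/2 entries below the diagonal carry information.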
5
Attribute Types Involved in Cluster Analysis

Interval Variables
An interval variable contains continuous
measurements (e.g., height, weight, temperature,
cost, etc.) which follow a linear scale
It is essential that intervals keep the same
importance throughout the scale
Nominal Variables
A nominal variable takes on more than two states.
For example, the eye color of a person can be
blue, brown, green or grey
These states may be coded as 1, 2, ..., M;
however, their order and the interval between any
two states do not have any meaning

6
Attribute Types Involved in Cluster Analysis

Ordinal Variables
An ordinal variable takes on more than two
states. For example, you may ask someone to
convey his/her appreciation of some paintings in
terms of the following categories: 1 = detest,
2 = dislike, 3 = indifferent, 4 = like and 5 = admire
In an ordinal variable, the states are ordered
in a meaningful sequence. However, the intervals
between consecutive states are not necessarily
equal
Binary Variables
Binary variables have only two possible states.
For example, the gender of a person is either
female or male
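
The attribute type determines how dissimilarity can be computed. The sketch below shows two common textbook conventions, offered as assumptions rather than as the formulas used in this deck: simple matching for nominal variables (binary variables being the two-state case), and rescaling ordinal ranks to [0, 1] so they can be handled like interval values. The function names are hypothetical.

```python
def nominal_dissimilarity(a, b):
    """Simple matching: fraction of nominal variables on which
    objects a and b take different states."""
    if len(a) != len(b):
        raise ValueError("objects must have the same number of variables")
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return mismatches / len(a)

def ordinal_to_interval(rank, num_states):
    """Map a rank in 1..M onto [0, 1] so that ordinal values can be
    compared with an interval-style distance."""
    if not 1 <= rank <= num_states or num_states < 2:
        raise ValueError("rank must lie in 1..num_states, with num_states >= 2")
    return (rank - 1) / (num_states - 1)

# Two people described by (eye color, gender): they differ on one of two variables.
d_nominal = nominal_dissimilarity(("blue", "female"), ("brown", "female"))

# Painting ratings on the 5-state scale: 2 = dislike versus 5 = admire.
d_ordinal = abs(ordinal_to_interval(2, 5) - ordinal_to_interval(5, 5))

print(d_nominal, d_ordinal)
```

Rescaling ranks treats the steps between consecutive states as equal, which is a modelling simplification and not a property of ordinal data.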

7
Dissimilarity (Distance) Measure
8
Dissimilarity (Distance) Measure
9
Dissimilarity (Distance) Measure
10
Dissimilarity (Distance) Measure
11
Dissimilarity (Distance) Measure
12
Dissimilarity (Distance) Measure
13
Dissimilarity (Distance) Measure
14
Dissimilarity (Distance) Measure
15
Dissimilarity (Distance) Measure
16
Categorization of Clustering Methods

Exclusive vs. Non-Exclusive (Overlapping)
Hierarchical Methods vs. Partitioning Methods
Hierarchical Methods
Single Link Method
Complete Link Method
Partitioning Methods
Kohonen Self-Organizing Feature Maps
K-Means Methods
K-Medoids Methods (PAM, CLARA, CLARANS)
Density-Based Methods

17
Hierarchical Methods
Dissimilarity Matrix (5×5)
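
As a rough illustration of how the Single Link and Complete Link methods differ, the following sketch repeatedly merges the two closest clusters until the requested number remains. Single link measures the distance between two clusters by their closest pair of members, complete link by their farthest pair. Euclidean distance, the function name, and the five one-dimensional sample points are assumptions made for this example.

```python
import math

def agglomerative(points, num_clusters, linkage="single"):
    """Bottom-up hierarchical clustering.
    Returns a list of clusters, each a sorted list of point indices."""
    if linkage not in ("single", "complete"):
        raise ValueError("linkage must be 'single' or 'complete'")
    # single link: closest pair of members; complete link: farthest pair
    pick = min if linkage == "single" else max
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > num_clusters:
        best = None  # (distance, index of cluster a, index of cluster b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = pick(math.dist(points[i], points[j])
                         for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return clusters

# Hypothetical 1-D objects with slowly growing gaps between neighbours
points = [(0.0,), (1.0,), (2.1,), (3.3,), (5.0,)]
single = agglomerative(points, 2, "single")
complete = agglomerative(points, 2, "complete")
print(single)    # [[0, 1, 2, 3], [4]]
print(complete)  # [[0, 1], [2, 3, 4]]
```

On this data single link chains the first four objects together, while complete link prefers two more compact groups, so the choice of linkage can change the result even for the same dissimilarity matrix.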
18
K-Means Methods
19
K-Means Methods
20
K-Means Methods
21
K-Means Methods
Sensitive to outliers!
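
The outlier sensitivity can be seen in a small experiment. The sketch below is a basic k-means loop (assign each object to its nearest centroid, then recompute each centroid as the mean of its members), not necessarily the exact variant shown on the slides. Taking the first k objects as initial centroids is an assumption chosen to keep the run deterministic, and the sample data are hypothetical.

```python
import math

def k_means(points, k, max_iter=100):
    """Basic k-means. Returns (labels, centroids).
    Assumption: the first k points serve as initial centroids."""
    centroids = [tuple(float(x) for x in p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid for every point
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: each centroid becomes the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return labels, centroids

# Two well-separated hypothetical groups
clean = [(1,), (2,), (3,), (10,), (11,), (12,)]
labels_clean, centroids_clean = k_means(clean, 2)

# The same data plus a single extreme value
labels_out, centroids_out = k_means(clean + [(200,)], 2)

print(labels_clean, centroids_clean)
print(labels_out, centroids_out)
```

Without the outlier the two groups are recovered. With it, one centroid is pulled toward the extreme value and ends up representing that point alone, while the two real groups are merged into one cluster. Because centroids are means, a single distant object can dominate them.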
22
Exercise 7
Number of clusters: 2
Use Single Link, Complete Link and K-Means to
cluster the following data

Object   X    Y
1        22   60
2        40   25
3        60   30
4        64   66
5        80   30
6        82   55
