Data Mining Techniques Clustering

Slides: 23
Provided by: leeyu2

Transcript and Presenter's Notes


1
Data Mining Techniques Clustering
2
Purpose
  • In clustering analysis, there is no
    pre-classified data
  • Instead, clustering analysis is a process where a
    set of objects is partitioned into several
    clusters
  • All members in one cluster are similar to each
    other and different from the members of other
    clusters, according to some similarity metric
    (e.g., the opposite of distance between objects)

3
Cluster Analysis
[Figure: customers (objects) plotted by the variables X (Income) and Y (Age), with a cluster highlighted]
4
Cluster Analysis
n objects, p variables
Data Matrix
Dissimilarity Matrix (n×n)
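
As a minimal sketch of how the two matrices relate, the following builds an n×n dissimilarity matrix from an n×p data matrix. Euclidean distance is an assumption here (the slides do not say which measure is used at this point), and the function name `dissimilarity_matrix` and the sample data are hypothetical.

```python
import math

def dissimilarity_matrix(data):
    """Return the n x n matrix of pairwise Euclidean distances between rows.

    `data` is a list of n rows (objects), each with p numeric values (variables).
    """
    n = len(data)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist = math.dist(data[i], data[j])
            # The matrix is symmetric with zeros on the diagonal.
            d[i][j] = d[j][i] = dist
    return d

# Hypothetical data matrix: 3 objects (rows) x 2 variables (columns).
data = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
D = dissimilarity_matrix(data)
for row in D:
    print(row)
```

Because the matrix is symmetric, only the lower (or upper) triangle needs to be stored in practice.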
5
Attribute Types Involved in Cluster Analysis
  • Interval Variables
  • An interval variable contains continuous
    measurements (e.g., height, weight, temperature,
    cost, etc.) which follow a linear scale
  • It is essential that intervals keep the same
    importance throughout the scale
  • Nominal Variables
  • A nominal variable takes on more than two states.
    For example, the eye color of a person can be
    blue, brown, green or grey
  • These states may be coded as 1, 2, ..., M, but
    their order and the interval between any two
    states have no meaning

6
Attribute Types Involved in Cluster Analysis
  • Ordinal Variables
  • An ordinal variable takes on more than two
    states. For example, you may ask someone to
    rate his/her appreciation of some paintings
    using the following categories: 1 = detest,
    2 = dislike, 3 = indifferent, 4 = like and 5 = admire
  • In an ordinal variable, the states are ordered
    in a meaningful sequence. However, the intervals
    between consecutive states are not necessarily
    equal
  • Binary Variables
  • Binary variables have only two possible states.
    For example, the gender of a person is either
    female or male
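
The attribute types above call for different dissimilarity computations. The sketch below shows two commonly used textbook conventions, which are not necessarily the formulas on the slides: simple matching for nominal (and symmetric binary) attributes, and mapping ordinal ranks 1..M onto [0, 1] so they can be treated like interval values. The function names are hypothetical.

```python
def nominal_dissimilarity(a, b):
    """Simple matching: fraction of attributes on which objects a and b differ."""
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return mismatches / len(a)

def ordinal_to_interval(rank, num_states):
    """Map an ordinal rank in 1..M to [0, 1] via (rank - 1) / (M - 1)."""
    return (rank - 1) / (num_states - 1)

# Two people described by (eye color, gender): they differ on one of two attributes.
print(nominal_dissimilarity(["blue", "female"], ["brown", "female"]))

# The painting scale 1 = detest ... 5 = admire, rescaled to [0, 1].
print([ordinal_to_interval(r, 5) for r in range(1, 6)])
```

After rescaling, ordinal values can be fed into the same distance function used for interval variables.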

7
Dissimilarity (Distance) Measure
8
Dissimilarity (Distance) Measure
9
Dissimilarity (Distance) Measure
10
Dissimilarity (Distance) Measure
11
Dissimilarity (Distance) Measure
12
Dissimilarity (Distance) Measure
13
Dissimilarity (Distance) Measure
14
Dissimilarity (Distance) Measure
15
Dissimilarity (Distance) Measure
16
Categorization of Clustering Methods
  • Exclusive vs. Non-Exclusive (Overlapping)
  • Hierarchical Methods vs. Partitioning Methods
  • Hierarchical Methods
  • Single Link Method
  • Complete Link Method
  • Partitioning Methods
  • Kohonen Self-Organizing Feature Maps
  • K-Means Methods
  • K-Medoids Methods (PAM, CLARA, CLARANS)
  • Density-Based Methods

17
Hierarchical Methods
Dissimilarity Matrix (5×5)
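
The matrix from this slide did not survive in the transcript, so the sketch below uses a hypothetical 5×5 dissimilarity matrix to illustrate agglomerative clustering with the Single Link and Complete Link methods. Single link takes the smallest pairwise dissimilarity between two clusters, complete link the largest. The function name `agglomerative` is an assumption for illustration.

```python
def agglomerative(D, num_clusters, linkage="single"):
    """Merge the two closest clusters until `num_clusters` remain.

    D is a symmetric dissimilarity matrix; objects are indexed from 0.
    linkage="single" uses the minimum pairwise dissimilarity between
    clusters, linkage="complete" uses the maximum.
    """
    pick = min if linkage == "single" else max
    clusters = [[i] for i in range(len(D))]

    def cluster_distance(a, b):
        return pick(D[i][j] for i in a for j in b)

    while len(clusters) > num_clusters:
        best = None  # (distance, index p, index q) of the closest pair so far
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                dist = cluster_distance(clusters[p], clusters[q])
                if best is None or dist < best[0]:
                    best = (dist, p, q)
        _, p, q = best
        clusters[p] = sorted(clusters[p] + clusters[q])
        del clusters[q]
    return clusters

# Hypothetical 5x5 dissimilarity matrix (not the one from the slide).
D = [
    [0, 4, 9, 15, 16],
    [4, 0, 5, 11, 12],
    [9, 5, 0, 6, 7],
    [15, 11, 6, 0, 1],
    [16, 12, 7, 1, 0],
]
print(agglomerative(D, 2, "single"))    # [[0, 1, 2], [3, 4]]
print(agglomerative(D, 2, "complete"))  # [[0, 1], [2, 3, 4]]
```

On this matrix the two methods disagree about object 2: its nearest neighbor is object 1, so single link attaches it to the first cluster, while complete link judges clusters by their farthest members and attaches it to the second.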
18
K-Means Methods
19
K-Means Methods
20
K-Means Methods
21
K-Means Methods
Sensitive to outliers!
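
The slide content for K-Means is not in the transcript, so what follows is a minimal sketch of the standard iteration (assign each point to its nearest centroid, then recompute each centroid as the mean of its points), assuming Euclidean distance and caller-supplied initial centroids. The data and the name `k_means` are hypothetical. The second run adds a single outlier to show the sensitivity noted above.

```python
import math

def k_means(points, centroids, max_iter=100):
    """Basic k-means: alternate assignment and mean-update steps until stable."""
    centroids = [tuple(c) for c in centroids]
    groups = []
    for _ in range(max_iter):
        # Assignment step: each point joins the group of its nearest centroid.
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda c: math.dist(p, centroids[c]))
            groups[nearest].append(p)
        # Update step: each centroid moves to the mean of its group
        # (an empty group keeps its previous centroid).
        new_centroids = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, groups

# Hypothetical data: two tight groups of three points each.
points = [(1, 1), (2, 1), (1, 2), (8, 8), (9, 8), (8, 9)]
start = [(1, 1), (8, 8)]

clean_centroids, clean_groups = k_means(points, start)
print(clean_centroids)

# Same data plus one extreme point.
noisy_centroids, noisy_groups = k_means(points + [(100, 100)], start)
print(noisy_centroids)
```

With the outlier present, the second centroid is first dragged far from its group and ends up sitting on the outlier alone, while the six regular points collapse into a single cluster. Medoid-based methods such as PAM are less affected because cluster centers must be actual data objects.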
22
Exercise 7
Number of clusters: 2
Use Single Link, Complete Link and K-Means to cluster the following data:
Object   X    Y
1        22   60
2        40   25
3        60   30
4        64   66
5        80   30
6        82   55