1
Prototype Classification Methods
  • Fu Chang
  • Institute of Information Science
  • Academia Sinica
  • 2788-3799 ext. 1819
  • fchang@iis.sinica.edu.tw

2
Types of Prototype Methods
  • Crisp model (K-means, KM)
    • Prototypes are centers of non-overlapping
      clusters
  • Fuzzy model (Fuzzy c-means, FCM)
    • Prototypes are weighted averages of all samples
  • Gaussian Mixture model (GM)
    • Prototypes are the components of a mixture of
      distributions
  • Linear Discriminant Analysis (LDA)
    • Prototypes are projected sample means
  • K-nearest neighbor classifier (K-NN)
  • Learning vector quantization (LVQ)

3
Prototypes through Clustering
  • Given the number k of prototypes, find k clusters
    whose centers serve as the prototypes
  • Commonality of these methods
    • They use an iterative algorithm aimed at
      decreasing an objective function
    • They may converge to local minima
    • The number k as well as an initial solution
      must be specified

4
Clustering Objectives
  • The aim of the iterative algorithm is to decrease
    the value of an objective function
  • Notation
    • Samples: x_1, x_2, ..., x_N
    • Prototypes: p_1, p_2, ..., p_P
    • L2-distance: d(x_j, p_i) = \| x_j - p_i \|

5
Objectives (cntd)
  • Crisp objective (KM):
    J_{KM} = \sum_{i=1}^{P} \sum_{x_j \in C_i} \| x_j - p_i \|^2,
    where C_i is the cluster of prototype p_i
  • Fuzzy objective (FCM):
    J_{FCM} = \sum_{i=1}^{P} \sum_{j=1}^{N} u_{ij}^m \| x_j - p_i \|^2,
    with fuzzy memberships u_{ij} and fuzzifier m > 1
  • Gaussian mixture objective (GM): maximize the
    log-likelihood
    L = \sum_{j=1}^{N} \log \sum_{i=1}^{P} \pi_i \, p_i(x_j \mid \theta_i)

6
K-Means (KM) Clustering
7
The KM Algorithm
  • Initialize P prototype seeds p_1, p_2, ..., p_P
  • Grouping
    • Assign samples to their nearest prototypes
    • Form non-overlapping clusters out of these
      samples
  • Centering
    • Centers of the clusters become the new
      prototypes
  • Repeat the grouping and centering steps until
    convergence is reached (a sketch in code follows
    below)
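A minimal NumPy sketch of the grouping/centering loop described
above; the data layout (an (N, d) array X), the number of
prototypes P, and the convergence tolerance are illustrative
assumptions.

import numpy as np

def k_means(X, P, n_iter=100, tol=1e-6, seed=0):
    # X: (N, d) array of samples; P: number of prototypes
    rng = np.random.default_rng(seed)
    prototypes = X[rng.choice(len(X), size=P, replace=False)]  # initial seeds
    for _ in range(n_iter):
        # Grouping: assign each sample to its nearest prototype
        dist = np.linalg.norm(X[:, None, :] - prototypes[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Centering: cluster centers become the new prototypes
        new_prototypes = np.array([
            X[labels == i].mean(axis=0) if np.any(labels == i) else prototypes[i]
            for i in range(P)
        ])
        if np.linalg.norm(new_prototypes - prototypes) < tol:  # convergence check
            break
        prototypes = new_prototypes
    return prototypes, labels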

8
Justification
  • Grouping
    • Assigning samples to their nearest prototypes
      helps to decrease the objective
  • Centering
    • Also helps to decrease the above objective,
      because for any vectors y_1, ..., y_n and any z,
      \sum_{i} \| y_i - z \|^2 \ge \sum_{i} \| y_i - \bar{y} \|^2,
      where \bar{y} = \frac{1}{n} \sum_{i} y_i,
      and equality holds only if z = \bar{y}

9
Exercise I
  • Prove that for any group of vectors y_i and any
    vector z, the following inequality is always true:
    \sum_{i} \| y_i - z \|^2 \ge \sum_{i} \| y_i - \bar{y} \|^2,
    where \bar{y} is the mean of the y_i
  • Prove that the equality holds only when z = \bar{y}
  • Use this fact to prove that the centering step
    helps to decrease the objective function

10
Fuzzy c-Means (FCM) Clustering
11
Crisp vs. Fuzzy Membership
  • Membership matrix U = [u_ij] of size P x N
    • u_ij is the grade of membership of sample j with
      respect to prototype i
  • Crisp membership: u_ij \in \{0, 1\}, and each
    sample belongs to exactly one cluster
  • Fuzzy membership: u_ij \in [0, 1], with
    \sum_{i=1}^{P} u_{ij} = 1 for every sample j

12
Fuzzy c-means (FCM)
  • The objective function of FCM is
    J_{FCM} = \sum_{i=1}^{P} \sum_{j=1}^{N} u_{ij}^m \| x_j - p_i \|^2,  m > 1
  • subject to the constraints
    \sum_{i=1}^{P} u_{ij} = 1 for j = 1, ..., N, and u_{ij} \ge 0

13
Supplementary Subject
  • Lagrange Multipliers
  • Goal: maximize or minimize f(x), x = (x_1, x_2, ..., x_d)
  • Constraints: g_i(x) = 0, i = 1, 2, ..., n
  • Solution method
    • The solution of the above problem satisfies the
      following equations:
      \nabla_x L(x, \lambda) = 0 and g_i(x) = 0 for all i
    • where L(x, \lambda) = f(x) + \sum_{i=1}^{n} \lambda_i g_i(x)
    • and \lambda = (\lambda_1, ..., \lambda_n) are the
      Lagrange multipliers

14
Supplementary Subject (cntd)
  • Geometric interpretation
  • Assume there is only one constraint g(x) = 0
  • \nabla f is perpendicular to the contour \{x : f(x) = c\}
  • \nabla g is perpendicular to the contour g(x) = 0
  • \nabla f = \lambda \nabla g means that at the extremum
    point the two gradient directions are aligned
    with each other or, equivalently, the two
    contours are tangent to each other

15
Maximize f(x, y) under the constraint that g(x, y) = 1,
where f(x, y) = 4x and g(x, y) = x^2 + y^2
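Working this example through the multiplier conditions:

\nabla f = (4, 0), \quad \nabla g = (2x, 2y), \quad \nabla f = \lambda \nabla g
\Rightarrow 4 = 2 \lambda x, \; 0 = 2 \lambda y
\Rightarrow y = 0, \; x^2 + y^2 = 1 \Rightarrow x = \pm 1

The maximum of f = 4x on the unit circle is therefore attained at
(x, y) = (1, 0), with \lambda = 2 and f = 4.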
16
FCM (Cntd)
  • Introducing the Lagrange multiplier \lambda_j for
    the constraint \sum_{i=1}^{P} u_{ij} = 1, we
    rewrite the objective function as
    J = \sum_{i=1}^{P} \sum_{j=1}^{N} u_{ij}^m d_{ij}^2
        + \sum_{j=1}^{N} \lambda_j \left( 1 - \sum_{i=1}^{P} u_{ij} \right),
    where d_{ij} = \| x_j - p_i \|

17
FCM (Cntd)
  • Setting the partial derivatives to zero, we
    obtain
  • It follows that
  • and

18
FCM (Cntd)
  • Therefore,
    u_{ij} = \frac{(1 / d_{ij}^2)^{1/(m-1)}}{\sum_{k=1}^{P} (1 / d_{kj}^2)^{1/(m-1)}}
  • and, equivalently,
    u_{ij} = \frac{1}{\sum_{k=1}^{P} (d_{ij} / d_{kj})^{2/(m-1)}}

19
FCM (Cntd)
  • On the other hand, setting the derivative of J
    with respect to p_i to zero, we obtain
    -2 \sum_{j=1}^{N} u_{ij}^m (x_j - p_i) = 0

20
FCM (Cntd)
  • It follows that
    \sum_{j=1}^{N} u_{ij}^m x_j = p_i \sum_{j=1}^{N} u_{ij}^m
  • Finally, we obtain the update rule of p_i:
    p_i = \frac{\sum_{j=1}^{N} u_{ij}^m x_j}{\sum_{j=1}^{N} u_{ij}^m}

21
FCM (Cntd)
  • To summarize, the FCM update rules are
    u_{ij} = \frac{1}{\sum_{k=1}^{P} (d_{ij} / d_{kj})^{2/(m-1)}},
    \qquad
    p_i = \frac{\sum_{j=1}^{N} u_{ij}^m x_j}{\sum_{j=1}^{N} u_{ij}^m}

22
Exercise II
  • Show that if we only allow one cluster for a set
    of samples x_1, x_2, ..., x_n, then both the KM
    cluster center and the FCM cluster center must be
    the sample mean \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
23
The FCM Algorithm
  • Using a set of seeds as the initial solution for
    the p_i, FCM computes u_{ij} and p_i iteratively,
    until convergence is reached, for i = 1, 2, ..., P
    and j = 1, 2, ..., N (a sketch in code follows
    below)
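A minimal NumPy sketch of these alternating updates; the data
matrix X of shape (N, d), the fuzzifier m = 2, and the stopping
tolerance are illustrative assumptions.

import numpy as np

def fuzzy_c_means(X, P, m=2.0, n_iter=100, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    prototypes = X[rng.choice(len(X), size=P, replace=False)]  # initial seeds
    for _ in range(n_iter):
        # d_ij = ||x_j - p_i||, kept strictly positive to avoid division by zero
        d = np.linalg.norm(X[None, :, :] - prototypes[:, None, :], axis=2) + 1e-12
        # membership update: u_ij = 1 / sum_k (d_ij / d_kj)^(2/(m-1))
        ratio = (d[:, None, :] / d[None, :, :]) ** (2.0 / (m - 1.0))
        u = 1.0 / ratio.sum(axis=1)                 # shape (P, N)
        # prototype update: p_i = sum_j u_ij^m x_j / sum_j u_ij^m
        w = u ** m
        new_prototypes = (w @ X) / w.sum(axis=1, keepdims=True)
        if np.linalg.norm(new_prototypes - prototypes) < tol:
            break
        prototypes = new_prototypes
    return prototypes, u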

24
K-means vs. Fuzzy c-means: Positions of 3 Cluster Centers
[Figure: cluster-center positions found by K-means and by
Fuzzy c-means]
25
Gaussian Mixture Model
26
Given
  • Observed data x_i, i = 1, 2, ..., N, each of
    which is drawn independently from a mixture of
    probability distributions with density
    p(x) = \sum_{k=1}^{P} \pi_k \, p_k(x \mid \theta_k)
  • where \pi_1, ..., \pi_P are mixture coefficients,
    with \pi_k \ge 0 and \sum_{k=1}^{P} \pi_k = 1

27
Solution
  • Repeat the following estimations (the E- and
    M-steps of the EM algorithm for a Gaussian
    mixture) until convergence is reached; a sketch
    in code follows below
    \gamma_{ki} = \frac{\pi_k \, p_k(x_i \mid \mu_k, \Sigma_k)}
                       {\sum_{l=1}^{P} \pi_l \, p_l(x_i \mid \mu_l, \Sigma_l)}
    \mu_k = \frac{\sum_{i=1}^{N} \gamma_{ki} x_i}{\sum_{i=1}^{N} \gamma_{ki}},
    \quad
    \Sigma_k = \frac{\sum_{i=1}^{N} \gamma_{ki} (x_i - \mu_k)(x_i - \mu_k)^T}
                    {\sum_{i=1}^{N} \gamma_{ki}},
    \quad
    \pi_k = \frac{1}{N} \sum_{i=1}^{N} \gamma_{ki}
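A minimal sketch of these updates using SciPy's multivariate
normal density; the data matrix X, the number of components P,
the initialization, and the small ridge added to the covariances
are illustrative assumptions.

import numpy as np
from scipy.stats import multivariate_normal

def gmm_em(X, P, n_iter=100, seed=0):
    N, d = X.shape
    rng = np.random.default_rng(seed)
    means = X[rng.choice(N, size=P, replace=False)]          # initial means
    covs = np.stack([np.cov(X.T) + 1e-6 * np.eye(d)] * P)    # initial covariances
    pis = np.full(P, 1.0 / P)                                # mixture coefficients
    for _ in range(n_iter):
        # E-step: responsibilities gamma_ki
        dens = np.stack([pis[k] * multivariate_normal.pdf(X, means[k], covs[k])
                         for k in range(P)])                 # shape (P, N)
        gamma = dens / dens.sum(axis=0, keepdims=True)
        # M-step: re-estimate means, covariances, and mixture coefficients
        Nk = gamma.sum(axis=1)                               # effective counts
        means = (gamma @ X) / Nk[:, None]
        for k in range(P):
            diff = X - means[k]
            covs[k] = (gamma[k, None, :] * diff.T) @ diff / Nk[k] + 1e-6 * np.eye(d)
        pis = Nk / N
    return pis, means, covs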

28
Linear Discriminant Analysis (LDA)
29
Illustration
30
Definitions
  • Given
    • Samples x_1, x_2, ..., x_n
    • Classes: n_i of the samples are of class i,
      i = 1, 2, ..., C
  • Definition
    • Sample mean for class i:
      m_i = \frac{1}{n_i} \sum_{x \in \text{class } i} x
    • Scatter matrix for class i:
      S_i = \sum_{x \in \text{class } i} (x - m_i)(x - m_i)^T

31
Scatter Matrices
  • Total scatter matrix:
    S_T = \sum_{k=1}^{n} (x_k - m)(x_k - m)^T,
    where m is the mean of all samples
  • Within-class scatter matrix:
    S_W = \sum_{i=1}^{C} S_i
  • Between-class scatter matrix:
    S_B = \sum_{i=1}^{C} n_i (m_i - m)(m_i - m)^T,
    so that S_T = S_W + S_B

32
Multiple Discriminant Analysis
  • We seek vectors w_i, i = 1, 2, ..., C-1
  • and project the samples x to the (C-1)-dimensional
    space: y = W^T x
  • The criterion for W = (w_1, w_2, ..., w_{C-1}) is
    J(W) = \frac{| W^T S_B W |}{| W^T S_W W |}

33
Multiple Discriminant Analysis (Cntd)
  • Consider the Lagrangian (maximizing w_i^T S_B w_i
    subject to w_i^T S_W w_i = 1):
    L(w_i, \lambda_i) = w_i^T S_B w_i - \lambda_i (w_i^T S_W w_i - 1)
  • Take the partial derivative:
    \partial L / \partial w_i = 2 S_B w_i - 2 \lambda_i S_W w_i
  • Setting the derivative to zero, we obtain the
    generalized eigenvalue problem
    S_B w_i = \lambda_i S_W w_i

34
Multiple Discriminant Analysis (Cntd)
  • Find the roots of the characteristic equation
    \det( S_B - \lambda S_W ) = 0
    as eigenvalues
  • and then solve
    ( S_B - \lambda_i S_W ) w_i = 0
  • for the w_i with the largest C-1 eigenvalues

35
LDA Prototypes
  • The prototype of each class is the mean of the
    projected samples of that class; the projection
    is through the matrix W
  • In the testing phase
    • All test samples are projected through the same
      optimal W
    • The nearest prototype is the winner
  • A sketch in code follows below
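A minimal NumPy sketch of this pipeline (scatter matrices,
generalized eigenvectors, projected class-mean prototypes); the
pseudo-inverse route to the generalized eigenproblem and the data
layout are illustrative assumptions.

import numpy as np

def lda_prototypes(X, y, n_components=None):
    # X: (n, d) samples; y: (n,) integer class labels
    classes = np.unique(y)
    d = X.shape[1]
    m = X.mean(axis=0)
    S_W = np.zeros((d, d))
    S_B = np.zeros((d, d))
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        S_W += (Xc - mc).T @ (Xc - mc)                 # within-class scatter
        S_B += len(Xc) * np.outer(mc - m, mc - m)      # between-class scatter
    # generalized eigenproblem S_B w = lambda S_W w, via pinv(S_W) @ S_B
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(S_W) @ S_B)
    order = np.argsort(eigvals.real)[::-1]
    k = n_components or (len(classes) - 1)
    W = eigvecs[:, order[:k]].real                     # projection matrix
    prototypes = {c: (X[y == c] @ W).mean(axis=0) for c in classes}
    return W, prototypes

def lda_classify(x, W, prototypes):
    # project the test sample and pick the nearest projected class mean
    z = x @ W
    return min(prototypes, key=lambda c: np.linalg.norm(z - prototypes[c]))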

36
K-Nearest Neighbor (K-NN) Classifier
37
K-NN Classifier
  • For each test sample x, find the K nearest
    training samples and classify x according to the
    vote among the K neighbors
  • For the nearest-neighbor rule, the asymptotic
    error rate P can be bounded in terms of the Bayes
    error P* (see the bound and the sketch below)
  • This shows that the error rate is at most twice
    the Bayes error
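For reference, the classical Cover-Hart bound for the
nearest-neighbor rule with c classes and Bayes error P^* is

P^* \le P \le P^* \left( 2 - \frac{c}{c-1} P^* \right) \le 2 P^*

A minimal sketch of the K-NN vote itself, assuming NumPy arrays
X_train, y_train and a single test vector x:

import numpy as np
from collections import Counter

def knn_classify(x, X_train, y_train, K=5):
    # indices of the K nearest training samples (L2 distance)
    nearest = np.linalg.norm(X_train - x, axis=1).argsort()[:K]
    # majority vote among the K neighbors
    return Counter(y_train[nearest]).most_common(1)[0][0]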

38
Condensed Nearest Neighbor (CNN) Rule
  • K-NN is very powerful, but may take too much time
    to run when the number of training samples is huge
  • CNN serves as a method to condense the set of
    training samples
  • We can then perform K-NN on the condensed set of
    samples

39
CNN: The Algorithm
  1. For each class type c, randomly add a c-sample
     to the condensed set P_c
  2. Check whether all c-samples are absorbed, where a
     c-sample x is said to be absorbed if
     d(x, P_c) < d(x, P_{\neg c}),
     where d(x, S) = \min_{y \in S} \| x - y \| and
     P_{\neg c} = \bigcup_{c' \ne c} P_{c'}
  3. If there are still unabsorbed c-samples, randomly
     add one of them to P_c
  4. Go to step 2, until P_c no longer changes for any
     c (a sketch in code follows below)
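A minimal sketch of this condensing loop, under the assumption
that "absorbed" means the nearest condensed sample of x's own
class is closer than any condensed sample of another class:

import numpy as np

def cnn_condense(X, y, seed=0):
    # X: (n, d) training samples; y: (n,) class labels
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    condensed = {c: [X[rng.choice(np.where(y == c)[0])]] for c in classes}  # step 1

    def absorbed(x, c):
        # assumed test: own-class condensed set closer than all other classes'
        d_own = min(np.linalg.norm(x - p) for p in condensed[c])
        d_other = min(np.linalg.norm(x - p)
                      for cc in classes if cc != c for p in condensed[cc])
        return d_own < d_other

    changed = True
    while changed:                                     # repeat step 2 until stable
        changed = False
        for c in classes:
            unabsorbed = [i for i in np.where(y == c)[0] if not absorbed(X[i], c)]
            if unabsorbed:                             # step 3: add an unabsorbed sample
                condensed[c].append(X[rng.choice(unabsorbed)])
                changed = True
    return {c: np.array(v) for c, v in condensed.items()}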

40
Learning Vector Quantization (LVQ)
41
LVQ Algorithm
  1. Initialize R prototypes for each class:
     m_1(k), m_2(k), ..., m_R(k), where k = 1, 2, ..., K
  2. Sample a training sample x and find the nearest
     prototype m_j(k) to x
  3. If x and m_j(k) match in class type, move the
     prototype toward x:
     m_j(k) \leftarrow m_j(k) + \epsilon (x - m_j(k));
     otherwise, move it away from x:
     m_j(k) \leftarrow m_j(k) - \epsilon (x - m_j(k))
  4. Repeat from step 2, decreasing the learning rate
     \epsilon at each iteration (a sketch in code
     follows below)
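A minimal sketch of this procedure (LVQ1); the per-class random
initialization, the number of prototypes R, and the linearly
decaying learning rate are illustrative assumptions.

import numpy as np

def lvq1(X, y, R=3, n_epochs=20, eps0=0.3, seed=0):
    # X: (n, d) training samples; y: (n,) class labels; R prototypes per class
    rng = np.random.default_rng(seed)
    protos, labels = [], []
    for c in np.unique(y):
        idx = rng.choice(np.where(y == c)[0], size=R, replace=False)
        protos.append(X[idx])              # R random c-samples as initial prototypes
        labels.extend([c] * R)
    protos = np.vstack(protos).astype(float)
    labels = np.array(labels)
    for epoch in range(n_epochs):
        eps = eps0 * (1.0 - epoch / n_epochs)              # decreasing learning rate
        for i in rng.permutation(len(X)):
            j = np.linalg.norm(protos - X[i], axis=1).argmin()  # nearest prototype
            if labels[j] == y[i]:
                protos[j] += eps * (X[i] - protos[j])      # move toward a match
            else:
                protos[j] -= eps * (X[i] - protos[j])      # move away from a mismatch
    return protos, labels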

42
References
  • R. O. Duda, P. E. Hart, and D. G. Stork, Pattern
    Classification, 2nd Ed., Wiley Interscience,
    2001.
  • T. Hastie, R. Tibshirani, and J. Friedman, The
    Elements of Statistical Learning,
    Springer-Verlag, 2001.
  • F. Höppner, F. Klawonn, R. Kruse, and T. Runkler,
    Fuzzy Cluster Analysis: Methods for
    Classification, Data Analysis and Image
    Recognition, John Wiley & Sons, 1999.
  • S. Theodoridis and K. Koutroumbas, Pattern
    Recognition, Academic Press, 1999.