Title: Prototype Classification Methods
1. Prototype Classification Methods
- Fu Chang
- Institute of Information Science
- Academia Sinica
- 2788-3799 ext. 1819
- fchang_at_iis.sinica.edu.tw
2. Types of Prototype Methods
- Crisp model (K-means, KM)
  - Prototypes are the centers of non-overlapping clusters
- Fuzzy model (fuzzy c-means, FCM)
  - Prototypes are weighted averages of all samples
- Gaussian mixture model (GM)
  - Prototypes are the components of a mixture of distributions
- Linear Discriminant Analysis (LDA)
  - Prototypes are projected sample means
- K-nearest neighbor classifier (K-NN)
- Learning vector quantization (LVQ)
3. Prototypes through Clustering
- Given the number k of prototypes, find k clusters whose centers serve as prototypes
- Commonality
  - Use an iterative algorithm aimed at decreasing an objective function
  - May converge to a local minimum
  - The number k as well as an initial solution must be specified
4. Clustering Objectives
- The aim of the iterative algorithm is to decrease the value of an objective function
- Notations (reconstructed below)
  - Samples
  - Prototypes
  - L2-distance
5. Objectives (cont'd)
- Crisp objective
- Fuzzy objective
- Gaussian mixture objective
(standard forms of the three objectives are given below)
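The objective formulas themselves were lost with the slide images; under the notation above, their standard forms would be:

```latex
% Assumed standard forms of the three clustering objectives
J_{\mathrm{KM}}  = \sum_{i=1}^{P} \sum_{x_j \in C_i} \lVert x_j - p_i \rVert^2
\qquad \text{(crisp; } C_i \text{ is the } i\text{-th cluster)}
\\[4pt]
J_{\mathrm{FCM}} = \sum_{i=1}^{P} \sum_{j=1}^{N} u_{ij}^{\,m}\, \lVert x_j - p_i \rVert^2,
\quad m > 1
\qquad \text{(fuzzy)}
\\[4pt]
J_{\mathrm{GM}}  = -\sum_{j=1}^{N} \log \sum_{i=1}^{P} \pi_i\, p(x_j \mid \theta_i)
\qquad \text{(negative log-likelihood of the mixture)}
```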
6. K-Means (KM) Clustering
7. The KM Algorithm
- Initialize P seeds as prototypes p1, p2, ..., pP
- Grouping
  - Assign samples to their nearest prototypes
  - Form non-overlapping clusters out of these samples
- Centering
  - Centers of clusters become the new prototypes
- Repeat the grouping and centering steps until convergence is reached (a sketch follows below)
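A minimal sketch of the grouping/centering loop, assuming NumPy and Euclidean distance; the function and variable names are illustrative only, not part of the original slides:

```python
import numpy as np

def k_means(samples, num_prototypes, max_iters=100, seed=0):
    """Plain K-means: alternate grouping and centering until convergence."""
    rng = np.random.default_rng(seed)
    # Initialize prototypes with randomly chosen samples (the "seeds").
    prototypes = samples[rng.choice(len(samples), num_prototypes, replace=False)]
    for _ in range(max_iters):
        # Grouping: assign each sample to its nearest prototype.
        distances = np.linalg.norm(samples[:, None, :] - prototypes[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Centering: the center of each cluster becomes the new prototype.
        new_prototypes = np.array([
            samples[labels == i].mean(axis=0) if np.any(labels == i) else prototypes[i]
            for i in range(num_prototypes)
        ])
        if np.allclose(new_prototypes, prototypes):
            break  # convergence reached
        prototypes = new_prototypes
    return prototypes, labels
```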
8. Justification
- Grouping
  - Assigning samples to their nearest prototypes helps to decrease the objective, since each distance term is replaced by the smallest available one
- Centering
  - Also helps to decrease the objective, because of the inequality below, in which equality holds only if the reference point is the cluster mean
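The inequality behind the centering step (and Exercise I), reconstructed in its standard form:

```latex
\text{For any vectors } y_1,\ldots,y_n \text{ and any point } z:\qquad
\sum_{i=1}^{n} \lVert y_i - \bar{y} \rVert^2 \;\le\; \sum_{i=1}^{n} \lVert y_i - z \rVert^2,
\qquad \bar{y} = \tfrac{1}{n}\sum_{i=1}^{n} y_i,
\\[4pt]
\text{since } \sum_{i=1}^{n} \lVert y_i - z \rVert^2
= \sum_{i=1}^{n} \lVert y_i - \bar{y} \rVert^2 + n\,\lVert \bar{y} - z \rVert^2,
\quad \text{with equality iff } z = \bar{y}.
```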
9. Exercise I
- Prove that, for any group of vectors yi, the inequality shown above is always true
- Prove that the equality holds only when the reference point is the mean of the yi
- Use this fact to prove that the centering step helps to decrease the objective function
10. Fuzzy c-Means (FCM) Clustering
11. Crisp vs. Fuzzy Membership
- Membership matrix U of size P x N
  - uij is the grade of membership of sample j with respect to prototype i
- Crisp membership
- Fuzzy membership
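The two membership conditions, reconstructed in the standard way:

```latex
\text{Crisp:}\quad u_{ij} \in \{0,1\}, \qquad
\text{Fuzzy:}\quad u_{ij} \in [0,1], \qquad
\text{with } \sum_{i=1}^{P} u_{ij} = 1 \ \text{ for every sample } j .
```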
12. Fuzzy c-Means (FCM)
- The objective function of FCM is
- Subject to the constraints
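In standard form (the slide's own formulas were lost), with fuzzifier m > 1:

```latex
J_{\mathrm{FCM}}(U, p_1,\ldots,p_P)
= \sum_{i=1}^{P} \sum_{j=1}^{N} u_{ij}^{\,m}\, \lVert x_j - p_i \rVert^2 ,
\qquad
\text{subject to } \sum_{i=1}^{P} u_{ij} = 1,\ \ u_{ij} \in [0,1],
\quad j = 1,\ldots,N .
```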
13. Supplementary Subject
- Lagrange multipliers
- Goal: maximize or minimize f(x), x = (x1, x2, ..., xd)
- Constraints: gi(x) = 0, i = 1, 2, ..., n
- Solution method
  - The solution of the above problem satisfies the equations below
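A standard statement of the first-order conditions, assumed to be what the missing formulas showed:

```latex
L(x, \lambda) = f(x) + \sum_{i=1}^{n} \lambda_i\, g_i(x),
\qquad
\nabla_x L = \nabla f(x) + \sum_{i=1}^{n} \lambda_i \nabla g_i(x) = 0,
\qquad
g_i(x) = 0,\ \ i = 1,\ldots,n .
```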
14. Supplementary Subject (cont'd)
- Geometric interpretation
  - Assume there is only one constraint g(x) = 0
  - The gradient of f is perpendicular to the contour {x : f(x) = c}
  - The gradient of g is perpendicular to the contour g(x) = 0
  - The condition that the gradient of f is a multiple of the gradient of g means that, at the extremum point, the two perpendicular directions are aligned with each other or, equivalently, the two contours are tangent to each other
15. Example: maximize f(x, y) under the constraint g(x, y) = 1, where f(x, y) = 4x and g(x, y) = x^2 + y^2
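The original slide was a figure; the corresponding calculation, worked out here for completeness:

```latex
\nabla f = \lambda \nabla g:\quad (4, 0) = \lambda\,(2x, 2y)
\;\Rightarrow\; 2\lambda y = 0,\ \ 2\lambda x = 4
\;\Rightarrow\; y = 0,\ x = \pm 1 .
\\[4pt]
\text{The maximum of } f \text{ on } x^2 + y^2 = 1 \text{ is attained at } (x, y) = (1, 0),
\text{ where } f = 4 \text{ and } \lambda = 2 .
```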
16. FCM (cont'd)
- Introducing the Lagrange multiplier λj with respect to the constraint on column j of U, we rewrite the objective function as
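The augmented objective, reconstructed in the standard way with dij denoting the distance from sample j to prototype i:

```latex
J = \sum_{i=1}^{P} \sum_{j=1}^{N} u_{ij}^{\,m}\, d_{ij}^{2}
  + \sum_{j=1}^{N} \lambda_j \Bigl( \sum_{i=1}^{P} u_{ij} - 1 \Bigr),
\qquad d_{ij} = \lVert x_j - p_i \rVert .
```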
17. FCM (cont'd)
- Setting the partial derivatives with respect to uij to zero, we obtain
- It follows that (derivation below)
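A reconstruction of the missing steps, following the usual FCM derivation:

```latex
\frac{\partial J}{\partial u_{ij}} = m\, u_{ij}^{\,m-1} d_{ij}^{2} + \lambda_j = 0
\;\Rightarrow\;
u_{ij} = \Bigl( \frac{-\lambda_j}{m\, d_{ij}^{2}} \Bigr)^{\!1/(m-1)} ,
\\[4pt]
\text{and substituting into } \sum_{k=1}^{P} u_{kj} = 1 \text{ gives}
\qquad
u_{ij} = \frac{1}{\sum_{k=1}^{P} \bigl( d_{ij} / d_{kj} \bigr)^{2/(m-1)}} .
```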
18. FCM (cont'd)
19. FCM (cont'd)
- On the other hand, setting the derivative of J with respect to pi to zero, we obtain
20. FCM (cont'd)
- It follows that
- Finally, we obtain the update rule of pi (reconstructed below)
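The standard result of that calculation, assumed to match the missing slide formulas:

```latex
\frac{\partial J}{\partial p_i}
= -2 \sum_{j=1}^{N} u_{ij}^{\,m}\, (x_j - p_i) = 0
\;\Rightarrow\;
p_i = \frac{\sum_{j=1}^{N} u_{ij}^{\,m}\, x_j}{\sum_{j=1}^{N} u_{ij}^{\,m}} .
```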
21. FCM (cont'd)
22. Exercise II
- Show that if we only allow one cluster for a set of samples x1, x2, ..., xn, then both the KM cluster center and the FCM cluster center must be the sample mean of x1, x2, ..., xn
23. The FCM Algorithm
- Using a set of seeds as the initial solution for the pi, FCM computes uij and pi iteratively until convergence is reached, for i = 1, 2, ..., P and j = 1, 2, ..., N (a sketch follows below)
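A minimal NumPy sketch of the alternating updates derived above; the function and variable names are illustrative, and a small distance floor is added to avoid division by zero:

```python
import numpy as np

def fuzzy_c_means(samples, num_prototypes, m=2.0, max_iters=100, tol=1e-5, seed=0):
    """Alternate the u_ij and p_i updates until the prototypes stop moving."""
    rng = np.random.default_rng(seed)
    prototypes = samples[rng.choice(len(samples), num_prototypes, replace=False)]
    for _ in range(max_iters):
        # d_ij = ||x_j - p_i||, floored to keep the ratios finite
        dist = np.linalg.norm(samples[None, :, :] - prototypes[:, None, :], axis=2)
        dist = np.fmax(dist, 1e-12)
        # Membership update: u_ij = 1 / sum_k (d_ij / d_kj)^(2/(m-1))
        ratio = (dist[:, None, :] / dist[None, :, :]) ** (2.0 / (m - 1.0))
        u = 1.0 / ratio.sum(axis=1)                  # shape (P, N)
        # Prototype update: weighted means with weights u_ij^m
        w = u ** m
        new_prototypes = (w @ samples) / w.sum(axis=1, keepdims=True)
        if np.linalg.norm(new_prototypes - prototypes) < tol:
            break  # convergence reached
        prototypes = new_prototypes
    return prototypes, u
```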
24. K-Means vs. Fuzzy c-Means: Positions of 3 Cluster Centers
(figure: side-by-side panels comparing the three cluster centers found by K-means and by fuzzy c-means)
25. Gaussian Mixture Model
26. Given
- Observed data {xi : i = 1, 2, ..., N}, each of which is drawn independently from a mixture of probability distributions with the density below, where the πi are mixture coefficients
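With Gaussian components (the usual choice, assumed here), the mixture density is:

```latex
p(x) = \sum_{i=1}^{P} \pi_i\, \mathcal{N}(x \mid \mu_i, \Sigma_i),
\qquad \pi_i \ge 0, \quad \sum_{i=1}^{P} \pi_i = 1 .
```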
27. Solution
- Repeat the following estimations until convergence is reached
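The missing update formulas are presumably the EM steps for a Gaussian mixture; in standard form:

```latex
\gamma_{ij} = \frac{\pi_i\, \mathcal{N}(x_j \mid \mu_i, \Sigma_i)}
                   {\sum_{k=1}^{P} \pi_k\, \mathcal{N}(x_j \mid \mu_k, \Sigma_k)},
\qquad N_i = \sum_{j=1}^{N} \gamma_{ij},
\\[4pt]
\pi_i = \frac{N_i}{N}, \qquad
\mu_i = \frac{1}{N_i} \sum_{j=1}^{N} \gamma_{ij}\, x_j, \qquad
\Sigma_i = \frac{1}{N_i} \sum_{j=1}^{N} \gamma_{ij}\, (x_j - \mu_i)(x_j - \mu_i)^{\!\top} .
```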
28. Linear Discriminant Analysis (LDA)
29. Illustration
30. Definitions
- Given
  - Samples x1, x2, ..., xn
  - Classes: ni of them are of class i, i = 1, 2, ..., C
- Definitions
  - Sample mean for class i
  - Scatter matrix for class i
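In the standard notation (assumed to match the missing formulas):

```latex
m_i = \frac{1}{n_i} \sum_{x \in \text{class } i} x,
\qquad
S_i = \sum_{x \in \text{class } i} (x - m_i)(x - m_i)^{\!\top} .
```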
31. Scatter Matrices
- Total scatter matrix
- Within-class scatter matrix
- Between-class scatter matrix
(standard forms below)
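The three scatter matrices in their standard forms, with m the mean of all n samples:

```latex
S_T = \sum_{x} (x - m)(x - m)^{\!\top}, \qquad m = \frac{1}{n}\sum_{x} x ,
\\[4pt]
S_W = \sum_{i=1}^{C} S_i, \qquad
S_B = \sum_{i=1}^{C} n_i\, (m_i - m)(m_i - m)^{\!\top},
\qquad S_T = S_W + S_B .
```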
32. Multiple Discriminant Analysis
- We seek vectors wi, i = 1, 2, ..., C - 1
- And project the samples x to the (C - 1)-dimensional space, obtaining y
- The criterion for W = (w1, w2, ..., wC-1) is
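The projection and the criterion in their standard forms:

```latex
y = W^{\!\top} x,
\qquad
J(W) = \frac{\bigl| W^{\!\top} S_B\, W \bigr|}{\bigl| W^{\!\top} S_W\, W \bigr|}
\quad \text{(to be maximized)} .
```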
33. Multiple Discriminant Analysis (cont'd)
- Consider the Lagrangian
- Take the partial derivative
- Setting the derivative to zero, we obtain (see below)
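One standard way to carry out this step, maximizing each direction subject to a unit within-class scatter constraint:

```latex
L(w_i, \lambda_i) = w_i^{\!\top} S_B\, w_i - \lambda_i \bigl( w_i^{\!\top} S_W\, w_i - 1 \bigr),
\qquad
\frac{\partial L}{\partial w_i} = 2 S_B\, w_i - 2 \lambda_i S_W\, w_i = 0
\;\Rightarrow\;
S_B\, w_i = \lambda_i\, S_W\, w_i .
```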
34. Multiple Discriminant Analysis (cont'd)
- Find the roots of the characteristic function as eigenvalues
- And then solve the generalized eigenvector equations
- For the wi corresponding to the C - 1 largest eigenvalues
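Explicitly (a reconstruction of the two missing equations):

```latex
\det\bigl( S_B - \lambda\, S_W \bigr) = 0,
\qquad
\bigl( S_B - \lambda_i\, S_W \bigr) w_i = 0
\quad \text{for the } C-1 \text{ largest } \lambda_i .
```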
35. LDA Prototypes
- The prototype of each class is the mean of the projected samples of that class; the projection is through the matrix W
- In the testing phase
  - All test samples are projected through the same optimal W
  - The nearest prototype is the winner
36. K-Nearest Neighbor (K-NN) Classifier
37. K-NN Classifier
- For each test sample x, find the nearest K training samples and classify x according to the vote among the K neighbors
- The error rate satisfies the bound below
- This shows that the error rate is at most twice the Bayes error
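The bound referred to here is presumably the Cover-Hart asymptotic result for the nearest-neighbor rule: with c classes and Bayes error P*, the asymptotic error P of the 1-NN rule satisfies

```latex
P^{*} \;\le\; P \;\le\; P^{*}\Bigl( 2 - \frac{c}{c-1}\, P^{*} \Bigr) \;\le\; 2\,P^{*} .
```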
38. Condensed Nearest Neighbor (CNN) Rule
- K-NN is very powerful, but may take too much time to run if the number of samples is huge
- CNN serves as a method to condense the set of samples
- We can then perform K-NN on the condensed set of samples
39. CNN: The Algorithm
1. For each class type c, randomly add one c-sample to the condensed set Pc.
2. Check whether all c-samples are absorbed; a c-sample x is said to be absorbed if the condensed sets already classify it correctly, i.e., its nearest condensed prototype belongs to class c.
3. If there are still unabsorbed c-samples, randomly add one of them to Pc.
4. Go to step 2, until Pc no longer changes for all c.
(a sketch follows below)
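A minimal Hart-style condensing sketch, under the assumption stated in step 2 that "absorbed" means "correctly classified by 1-NN over the current condensed set"; the function name and structure are illustrative:

```python
import numpy as np

def condensed_nn(samples, labels, seed=0):
    """Return indices of a condensed training set for nearest-neighbor use."""
    rng = np.random.default_rng(seed)
    condensed_idx = []
    # Step 1: seed the condensed set with one random sample per class.
    for c in np.unique(labels):
        condensed_idx.append(rng.choice(np.flatnonzero(labels == c)))
    changed = True
    while changed:                      # Step 4: repeat until no change
        changed = False
        for j in rng.permutation(len(samples)):
            proto = samples[condensed_idx]
            nearest = np.linalg.norm(proto - samples[j], axis=1).argmin()
            # Steps 2-3: if sample j is not absorbed, add it to the set.
            if labels[condensed_idx][nearest] != labels[j]:
                condensed_idx.append(j)
                changed = True
    return np.array(condensed_idx)
```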
40. Learning Vector Quantization (LVQ)
41. LVQ Algorithm
1. Initialize R prototypes for each class: m1(k), m2(k), ..., mR(k), where k = 1, 2, ..., K.
2. Draw a training sample x and find the nearest prototype mj(k) to x.
3. If x and mj(k) match in class type, move mj(k) toward x; otherwise, move it away from x (update rule below).
4. Repeat from step 2, decreasing the learning rate ε at each iteration.
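The missing update formulas are presumably the LVQ1 rule; in standard form, with learning rate ε:

```latex
\text{if class}(x) = \text{class}(m_j(k)):\quad
m_j(k) \leftarrow m_j(k) + \varepsilon\,\bigl( x - m_j(k) \bigr),
\\[4pt]
\text{otherwise:}\quad
m_j(k) \leftarrow m_j(k) - \varepsilon\,\bigl( x - m_j(k) \bigr),
\qquad \varepsilon \to 0 \ \text{over the iterations}.
```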
42. References
- R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2nd Ed., Wiley-Interscience, 2001.
- T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning, Springer-Verlag, 2001.
- F. Höppner, F. Klawonn, R. Kruse, and T. Runkler, Fuzzy Cluster Analysis: Methods for Classification, Data Analysis and Image Recognition, John Wiley & Sons, 1999.
- S. Theodoridis and K. Koutroumbas, Pattern Recognition, Academic Press, 1999.