1
Prototype Classification Methods
  • Fu Chang
  • Institute of Information Science
  • Academia Sinica
  • 2788-3799 ext. 1819
  • fchang@iis.sinica.edu.tw

2
Types of Prototype Methods
  • Crisp model (K-means, KM)
    • Prototypes are centers of non-overlapping
      clusters
  • Fuzzy model (Fuzzy c-means, FCM)
    • Prototypes are weighted averages of all samples
  • Gaussian Mixture model (GM)
    • Prototypes are the components of a mixture of
      distributions
  • Linear Discriminant Analysis (LDA)
    • Prototypes are projected sample means
  • K-nearest neighbor classifier (K-NN)
  • Learning vector quantization (LVQ)

3
Prototypes through Clustering
  • Given the number k of prototypes, find k clusters
    whose centers serve as the prototypes
  • Commonality of these methods
    • They use an iterative algorithm aimed at
      decreasing an objective function
    • They may converge to local minima
    • The number k as well as an initial solution
      must be specified

4
Clustering Objectives
  • The aim of the iterative algorithm is to decrease
    the value of an objective function
  • Notation
    • Samples: x_1, x_2, ..., x_N
    • Prototypes: p_1, p_2, ..., p_P
    • L2-distance: d(x_j, p_i) = \| x_j - p_i \|

5
Objectives (cntd)
  • Crisp objective (KM):
    J_{KM} = \sum_{i=1}^{P} \sum_{x_j \in C_i} \| x_j - p_i \|^2,
    where C_i is the cluster of prototype p_i
  • Fuzzy objective (FCM):
    J_{FCM} = \sum_{i=1}^{P} \sum_{j=1}^{N} u_{ij}^m \| x_j - p_i \|^2,
    with fuzzy memberships u_{ij} and fuzzifier m > 1
  • Gaussian mixture objective (GM): maximize the
    log-likelihood
    L = \sum_{j=1}^{N} \log \sum_{i=1}^{P} \pi_i \, p_i(x_j \mid \theta_i)

6
K-Means (KM) Clustering
7
The KM Algorithm
  • Initialize P prototype seeds p_1, p_2, ..., p_P
  • Grouping
    • Assign samples to their nearest prototypes
    • Form non-overlapping clusters out of these
      samples
  • Centering
    • Centers of the clusters become the new
      prototypes
  • Repeat the grouping and centering steps until
    convergence is reached (a sketch in code follows
    below)
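A minimal NumPy sketch of the grouping/centering loop described
above; the data layout (an (N, d) array X), the number of
prototypes P, and the convergence tolerance are illustrative
assumptions.

import numpy as np

def k_means(X, P, n_iter=100, tol=1e-6, seed=0):
    # X: (N, d) array of samples; P: number of prototypes
    rng = np.random.default_rng(seed)
    prototypes = X[rng.choice(len(X), size=P, replace=False)]  # initial seeds
    for _ in range(n_iter):
        # Grouping: assign each sample to its nearest prototype
        dist = np.linalg.norm(X[:, None, :] - prototypes[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Centering: cluster centers become the new prototypes
        new_prototypes = np.array([
            X[labels == i].mean(axis=0) if np.any(labels == i) else prototypes[i]
            for i in range(P)
        ])
        if np.linalg.norm(new_prototypes - prototypes) < tol:  # convergence check
            break
        prototypes = new_prototypes
    return prototypes, labels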

8
Justification
  • Grouping
    • Assigning samples to their nearest prototypes
      helps to decrease the objective
  • Centering
    • Also helps to decrease the above objective,
      because for any vectors y_1, ..., y_n and any z,
      \sum_{i} \| y_i - z \|^2 \ge \sum_{i} \| y_i - \bar{y} \|^2,
      where \bar{y} = \frac{1}{n} \sum_{i} y_i,
      and equality holds only if z = \bar{y}

9
Exercise I
  • Prove that for any group of vectors y_i and any
    vector z, the following inequality is always true:
    \sum_{i} \| y_i - z \|^2 \ge \sum_{i} \| y_i - \bar{y} \|^2,
    where \bar{y} is the mean of the y_i
  • Prove that the equality holds only when z = \bar{y}
  • Use this fact to prove that the centering step
    helps to decrease the objective function

10
Fuzzy c-Means (FCM) Clustering
11
Crisp vs. Fuzzy Membership
  • Membership matrix U = [u_ij] of size P x N
    • u_ij is the grade of membership of sample j with
      respect to prototype i
  • Crisp membership: u_ij \in \{0, 1\}, and each
    sample belongs to exactly one cluster
  • Fuzzy membership: u_ij \in [0, 1], with
    \sum_{i=1}^{P} u_{ij} = 1 for every sample j

12
Fuzzy c-means (FCM)
  • The objective function of FCM is
    J_{FCM} = \sum_{i=1}^{P} \sum_{j=1}^{N} u_{ij}^m \| x_j - p_i \|^2,  m > 1
  • subject to the constraints
    \sum_{i=1}^{P} u_{ij} = 1 for j = 1, ..., N, and u_{ij} \ge 0

13
Supplementary Subject
  • Lagrange Multipliers
  • Goal: maximize or minimize f(x), x = (x_1, x_2, ..., x_d)
  • Constraints: g_i(x) = 0, i = 1, 2, ..., n
  • Solution method
    • The solution of the above problem satisfies the
      following equations:
      \nabla_x L(x, \lambda) = 0 and g_i(x) = 0 for all i
    • where L(x, \lambda) = f(x) + \sum_{i=1}^{n} \lambda_i g_i(x)
    • and \lambda = (\lambda_1, ..., \lambda_n) are the
      Lagrange multipliers

14
Supplementary Subject (cntd)
  • Geometric interpretation
  • Assume there is only one constraint g(x) = 0
  • \nabla f is perpendicular to the contour \{x : f(x) = c\}
  • \nabla g is perpendicular to the contour g(x) = 0
  • \nabla f = \lambda \nabla g means that at the extremum
    point the two gradient directions are aligned
    with each other or, equivalently, the two
    contours are tangent to each other

15
Maximize f(x, y) under the constraint that g(x, y) = 1,
where f(x, y) = 4x and g(x, y) = x^2 + y^2
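Working this example through the multiplier conditions:

\nabla f = (4, 0), \quad \nabla g = (2x, 2y), \quad \nabla f = \lambda \nabla g
\Rightarrow 4 = 2 \lambda x, \; 0 = 2 \lambda y
\Rightarrow y = 0, \; x^2 + y^2 = 1 \Rightarrow x = \pm 1

The maximum of f = 4x on the unit circle is therefore attained at
(x, y) = (1, 0), with \lambda = 2 and f = 4.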
16
FCM (Cntd)
  • Introducing the Lagrange multiplier \lambda_j for
    the constraint \sum_{i=1}^{P} u_{ij} = 1, we
    rewrite the objective function as
    J = \sum_{i=1}^{P} \sum_{j=1}^{N} u_{ij}^m d_{ij}^2
        + \sum_{j=1}^{N} \lambda_j \left( 1 - \sum_{i=1}^{P} u_{ij} \right),
    where d_{ij} = \| x_j - p_i \|

17
FCM (Cntd)
  • Setting the partial derivatives to zero, we
    obtain
  • It follows that
  • and

18
FCM (Cntd)
  • Therefore,
    u_{ij} = \frac{(1 / d_{ij}^2)^{1/(m-1)}}{\sum_{k=1}^{P} (1 / d_{kj}^2)^{1/(m-1)}}
  • and, equivalently,
    u_{ij} = \frac{1}{\sum_{k=1}^{P} (d_{ij} / d_{kj})^{2/(m-1)}}

19
FCM (Cntd)
  • On the other hand, setting the derivative of J
    with respect to p_i to zero, we obtain
    -2 \sum_{j=1}^{N} u_{ij}^m (x_j - p_i) = 0

20
FCM (Cntd)
  • It follows that
    \sum_{j=1}^{N} u_{ij}^m x_j = p_i \sum_{j=1}^{N} u_{ij}^m
  • Finally, we obtain the update rule of p_i:
    p_i = \frac{\sum_{j=1}^{N} u_{ij}^m x_j}{\sum_{j=1}^{N} u_{ij}^m}

21
FCM (Cntd)
  • To summarize, the FCM update rules are
    u_{ij} = \frac{1}{\sum_{k=1}^{P} (d_{ij} / d_{kj})^{2/(m-1)}},
    \qquad
    p_i = \frac{\sum_{j=1}^{N} u_{ij}^m x_j}{\sum_{j=1}^{N} u_{ij}^m}

22
Exercise II
  • Show that if we only allow one cluster for a set
    of samples x_1, x_2, ..., x_n, then both the KM
    cluster center and the FCM cluster center must be
    the sample mean \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
23
The FCM Algorithm
  • Using a set of seeds as the initial solution for
    the p_i, FCM computes u_{ij} and p_i iteratively,
    until convergence is reached, for i = 1, 2, ..., P
    and j = 1, 2, ..., N (a sketch in code follows
    below)
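A minimal NumPy sketch of these alternating updates; the data
matrix X of shape (N, d), the fuzzifier m = 2, and the stopping
tolerance are illustrative assumptions.

import numpy as np

def fuzzy_c_means(X, P, m=2.0, n_iter=100, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    prototypes = X[rng.choice(len(X), size=P, replace=False)]  # initial seeds
    for _ in range(n_iter):
        # d_ij = ||x_j - p_i||, kept strictly positive to avoid division by zero
        d = np.linalg.norm(X[None, :, :] - prototypes[:, None, :], axis=2) + 1e-12
        # membership update: u_ij = 1 / sum_k (d_ij / d_kj)^(2/(m-1))
        ratio = (d[:, None, :] / d[None, :, :]) ** (2.0 / (m - 1.0))
        u = 1.0 / ratio.sum(axis=1)                 # shape (P, N)
        # prototype update: p_i = sum_j u_ij^m x_j / sum_j u_ij^m
        w = u ** m
        new_prototypes = (w @ X) / w.sum(axis=1, keepdims=True)
        if np.linalg.norm(new_prototypes - prototypes) < tol:
            break
        prototypes = new_prototypes
    return prototypes, u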

24
K-means vs. Fuzzy c-means: Positions of 3 Cluster Centers
[Figure: cluster-center positions found by K-means and by
Fuzzy c-means]
25
Gaussian Mixture Model
26
Given
  • Observed data x_i, i = 1, 2, ..., N, each of
    which is drawn independently from a mixture of
    probability distributions with density
    p(x) = \sum_{k=1}^{P} \pi_k \, p_k(x \mid \theta_k)
  • where \pi_1, ..., \pi_P are mixture coefficients,
    with \pi_k \ge 0 and \sum_{k=1}^{P} \pi_k = 1

27
Solution
  • Repeat the following estimations (the E- and
    M-steps of the EM algorithm for a Gaussian
    mixture) until convergence is reached; a sketch
    in code follows below
    \gamma_{ki} = \frac{\pi_k \, p_k(x_i \mid \mu_k, \Sigma_k)}
                       {\sum_{l=1}^{P} \pi_l \, p_l(x_i \mid \mu_l, \Sigma_l)}
    \mu_k = \frac{\sum_{i=1}^{N} \gamma_{ki} x_i}{\sum_{i=1}^{N} \gamma_{ki}},
    \quad
    \Sigma_k = \frac{\sum_{i=1}^{N} \gamma_{ki} (x_i - \mu_k)(x_i - \mu_k)^T}
                    {\sum_{i=1}^{N} \gamma_{ki}},
    \quad
    \pi_k = \frac{1}{N} \sum_{i=1}^{N} \gamma_{ki}
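A minimal sketch of these updates using SciPy's multivariate
normal density; the data matrix X, the number of components P,
the initialization, and the small ridge added to the covariances
are illustrative assumptions.

import numpy as np
from scipy.stats import multivariate_normal

def gmm_em(X, P, n_iter=100, seed=0):
    N, d = X.shape
    rng = np.random.default_rng(seed)
    means = X[rng.choice(N, size=P, replace=False)]          # initial means
    covs = np.stack([np.cov(X.T) + 1e-6 * np.eye(d)] * P)    # initial covariances
    pis = np.full(P, 1.0 / P)                                # mixture coefficients
    for _ in range(n_iter):
        # E-step: responsibilities gamma_ki
        dens = np.stack([pis[k] * multivariate_normal.pdf(X, means[k], covs[k])
                         for k in range(P)])                 # shape (P, N)
        gamma = dens / dens.sum(axis=0, keepdims=True)
        # M-step: re-estimate means, covariances, and mixture coefficients
        Nk = gamma.sum(axis=1)                               # effective counts
        means = (gamma @ X) / Nk[:, None]
        for k in range(P):
            diff = X - means[k]
            covs[k] = (gamma[k, None, :] * diff.T) @ diff / Nk[k] + 1e-6 * np.eye(d)
        pis = Nk / N
    return pis, means, covs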

28
Linear Discriminant Analysis (LDA)
29
Illustration
30
Definitions
  • Given
    • Samples x_1, x_2, ..., x_n
    • Classes: n_i of the samples are of class i,
      i = 1, 2, ..., C
  • Definition
    • Sample mean for class i:
      m_i = \frac{1}{n_i} \sum_{x \in \text{class } i} x
    • Scatter matrix for class i:
      S_i = \sum_{x \in \text{class } i} (x - m_i)(x - m_i)^T

31
Scatter Matrices
  • Total scatter matrix:
    S_T = \sum_{k=1}^{n} (x_k - m)(x_k - m)^T,
    where m is the mean of all samples
  • Within-class scatter matrix:
    S_W = \sum_{i=1}^{C} S_i
  • Between-class scatter matrix:
    S_B = \sum_{i=1}^{C} n_i (m_i - m)(m_i - m)^T,
    so that S_T = S_W + S_B

32
Multiple Discriminant Analysis
  • We seek vectors w_i, i = 1, 2, ..., C-1
  • and project the samples x to the (C-1)-dimensional
    space: y = W^T x
  • The criterion for W = (w_1, w_2, ..., w_{C-1}) is
    J(W) = \frac{| W^T S_B W |}{| W^T S_W W |}

33
Multiple Discriminant Analysis (Cntd)
  • Consider the Lagrangian (maximizing w_i^T S_B w_i
    subject to w_i^T S_W w_i = 1):
    L(w_i, \lambda_i) = w_i^T S_B w_i - \lambda_i (w_i^T S_W w_i - 1)
  • Take the partial derivative:
    \partial L / \partial w_i = 2 S_B w_i - 2 \lambda_i S_W w_i
  • Setting the derivative to zero, we obtain the
    generalized eigenvalue problem
    S_B w_i = \lambda_i S_W w_i

34
Multiple Discriminant Analysis (Cntd)
  • Find the roots of the characteristic equation
    \det( S_B - \lambda S_W ) = 0
    as eigenvalues
  • and then solve
    ( S_B - \lambda_i S_W ) w_i = 0
  • for the w_i with the largest C-1 eigenvalues

35
LDA Prototypes
  • The prototype of each class is the mean of the
    projected samples of that class; the projection
    is through the matrix W
  • In the testing phase
    • All test samples are projected through the same
      optimal W
    • The nearest prototype is the winner
  • A sketch in code follows below
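A minimal NumPy sketch of this pipeline (scatter matrices,
generalized eigenvectors, projected class-mean prototypes); the
pseudo-inverse route to the generalized eigenproblem and the data
layout are illustrative assumptions.

import numpy as np

def lda_prototypes(X, y, n_components=None):
    # X: (n, d) samples; y: (n,) integer class labels
    classes = np.unique(y)
    d = X.shape[1]
    m = X.mean(axis=0)
    S_W = np.zeros((d, d))
    S_B = np.zeros((d, d))
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        S_W += (Xc - mc).T @ (Xc - mc)                 # within-class scatter
        S_B += len(Xc) * np.outer(mc - m, mc - m)      # between-class scatter
    # generalized eigenproblem S_B w = lambda S_W w, via pinv(S_W) @ S_B
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(S_W) @ S_B)
    order = np.argsort(eigvals.real)[::-1]
    k = n_components or (len(classes) - 1)
    W = eigvecs[:, order[:k]].real                     # projection matrix
    prototypes = {c: (X[y == c] @ W).mean(axis=0) for c in classes}
    return W, prototypes

def lda_classify(x, W, prototypes):
    # project the test sample and pick the nearest projected class mean
    z = x @ W
    return min(prototypes, key=lambda c: np.linalg.norm(z - prototypes[c]))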

36
K-Nearest Neighbor (K-NN) Classifier
37
K-NN Classifier
  • For each test sample x, find the K nearest
    training samples and classify x according to the
    vote among the K neighbors
  • For the nearest-neighbor rule, the asymptotic
    error rate P can be bounded in terms of the Bayes
    error P* (see the bound and the sketch below)
  • This shows that the error rate is at most twice
    the Bayes error
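For reference, the classical Cover-Hart bound for the
nearest-neighbor rule with c classes and Bayes error P^* is

P^* \le P \le P^* \left( 2 - \frac{c}{c-1} P^* \right) \le 2 P^*

A minimal sketch of the K-NN vote itself, assuming NumPy arrays
X_train, y_train and a single test vector x:

import numpy as np
from collections import Counter

def knn_classify(x, X_train, y_train, K=5):
    # indices of the K nearest training samples (L2 distance)
    nearest = np.linalg.norm(X_train - x, axis=1).argsort()[:K]
    # majority vote among the K neighbors
    return Counter(y_train[nearest]).most_common(1)[0][0]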

38
Condensed Nearest Neighbor (CNN) Rule
  • K-NN is very powerful, but may take too much time
    to run when the number of training samples is huge
  • CNN serves as a method to condense the set of
    training samples
  • We can then perform K-NN on the condensed set of
    samples

39
CNN: The Algorithm
  1. For each class type c, randomly add a c-sample
     to the condensed set P_c
  2. Check whether all c-samples are absorbed, where a
     c-sample x is said to be absorbed if
     d(x, P_c) < d(x, P_{\neg c}),
     where d(x, S) = \min_{y \in S} \| x - y \| and
     P_{\neg c} = \bigcup_{c' \ne c} P_{c'}
  3. If there are still unabsorbed c-samples, randomly
     add one of them to P_c
  4. Go to step 2, until P_c no longer changes for any
     c (a sketch in code follows below)
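A minimal sketch of this condensing loop, under the assumption
that "absorbed" means the nearest condensed sample of x's own
class is closer than any condensed sample of another class:

import numpy as np

def cnn_condense(X, y, seed=0):
    # X: (n, d) training samples; y: (n,) class labels
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    condensed = {c: [X[rng.choice(np.where(y == c)[0])]] for c in classes}  # step 1

    def absorbed(x, c):
        # assumed test: own-class condensed set closer than all other classes'
        d_own = min(np.linalg.norm(x - p) for p in condensed[c])
        d_other = min(np.linalg.norm(x - p)
                      for cc in classes if cc != c for p in condensed[cc])
        return d_own < d_other

    changed = True
    while changed:                                     # repeat step 2 until stable
        changed = False
        for c in classes:
            unabsorbed = [i for i in np.where(y == c)[0] if not absorbed(X[i], c)]
            if unabsorbed:                             # step 3: add an unabsorbed sample
                condensed[c].append(X[rng.choice(unabsorbed)])
                changed = True
    return {c: np.array(v) for c, v in condensed.items()}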

40
Learning Vector Quantization (LVQ)
41
LVQ Algorithm
  1. Initialize R prototypes for each class:
     m_1(k), m_2(k), ..., m_R(k), where k = 1, 2, ..., K
  2. Sample a training sample x and find the nearest
     prototype m_j(k) to x
  3. If x and m_j(k) match in class type, move the
     prototype toward x:
     m_j(k) \leftarrow m_j(k) + \epsilon (x - m_j(k));
     otherwise, move it away from x:
     m_j(k) \leftarrow m_j(k) - \epsilon (x - m_j(k))
  4. Repeat from step 2, decreasing the learning rate
     \epsilon at each iteration (a sketch in code
     follows below)
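A minimal sketch of this procedure (LVQ1); the per-class random
initialization, the number of prototypes R, and the linearly
decaying learning rate are illustrative assumptions.

import numpy as np

def lvq1(X, y, R=3, n_epochs=20, eps0=0.3, seed=0):
    # X: (n, d) training samples; y: (n,) class labels; R prototypes per class
    rng = np.random.default_rng(seed)
    protos, labels = [], []
    for c in np.unique(y):
        idx = rng.choice(np.where(y == c)[0], size=R, replace=False)
        protos.append(X[idx])              # R random c-samples as initial prototypes
        labels.extend([c] * R)
    protos = np.vstack(protos).astype(float)
    labels = np.array(labels)
    for epoch in range(n_epochs):
        eps = eps0 * (1.0 - epoch / n_epochs)              # decreasing learning rate
        for i in rng.permutation(len(X)):
            j = np.linalg.norm(protos - X[i], axis=1).argmin()  # nearest prototype
            if labels[j] == y[i]:
                protos[j] += eps * (X[i] - protos[j])      # move toward a match
            else:
                protos[j] -= eps * (X[i] - protos[j])      # move away from a mismatch
    return protos, labels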

42
References
  • R. O. Duda, P. E. Hart, and D. G. Stork, Pattern
    Classification, 2nd Ed., Wiley Interscience,
    2001.
  • T. Hastie, R. Tibshirani, and J. Friedman, The
    Elements of Statistical Learning,
    Springer-Verlag, 2001.
  • F. Höppner, F. Klawonn, R. Kruse, and T. Runkler,
    Fuzzy Cluster Analysis: Methods for
    Classification, Data Analysis and Image
    Recognition, John Wiley & Sons, 1999.
  • S. Theodoridis and K. Koutroumbas, Pattern
    Recognition, Academic Press, 1999.