Title: Pattern Recognition: Statistical and Neural
Nanjing University of Science and Technology
Lonnie C. Ludeman
Lecture 30, Nov 11, 2005
Lecture 30 Topics
1. General comments about the clustering problem
2. Present my small programs that can be used for performing clustering
3. Demonstrate the programs
4. Closing comments
Clustering is the art of grouping together
pattern vectors that in some sense belong
together because they have similar
characteristics and are different from other
pattern vectors.
In the most general problem the number of
clusters or subgroups is unknown as are the
properties that make them similar.
Review
Question: How do we start the process of finding clusters and identifying similarities?
Answer: First, realize that clustering is an art, and there is no correct answer, only feasible alternatives. Second, explore the structure of the data, similarity measures, and the limitations of the various clustering procedures.
Problems in performing meaningful clustering:
- Scaling
- The nonuniqueness of results
- Programs always give clusters even when there are no clusters
There are no correct answers; the clusters
provide us with different interpretations of the
data where the closeness of patterns is measured
with different definitions of similarity. The
results may produce ways of looking at the data
that we have not considered or noticed. These
structural insights may prove useful in the
pattern recognition process.
Methods for Clustering Quantitative Data
1. K-Means Clustering Algorithm
2. Hierarchical Clustering Algorithm
3. ISODATA Clustering Algorithm
4. Fuzzy Clustering Algorithm
K-Means Clustering Algorithm
1. Randomly select K cluster centers from the pattern space.
2. Distribute the set of patterns to the cluster centers using minimum distance.
3. Compute new cluster centers for each cluster.
4. Continue this process until the cluster centers do not change.
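The four steps above can be sketched in a few lines of Python. This is a minimal illustration (plain squared Euclidean distance, fixed random seed), not the LCLKmean.exe implementation:

```python
import random

def kmeans(patterns, k, max_iter=100, seed=0):
    """K-Means as on the slide: pick K centers at random, assign each
    pattern to the nearest center, recompute the centers as cluster
    means, and repeat until the centers stop changing."""
    rng = random.Random(seed)
    centers = rng.sample(patterns, k)          # step 1: random initial centers
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        # step 2: distribute patterns by minimum (squared Euclidean) distance
        clusters = [[] for _ in range(k)]
        for x in patterns:
            i = min(range(k),
                    key=lambda j: sum((a - b) ** 2 for a, b in zip(x, centers[j])))
            clusters[i].append(x)
        # step 3: compute new centers as the mean of each cluster
        new_centers = [
            [sum(c) / len(cl) for c in zip(*cl)] if cl else list(centers[i])
            for i, cl in enumerate(clusters)
        ]
        # step 4: stop when the centers do not change
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters
```

On well-separated data the loop typically converges in a handful of iterations, though the result still depends on the random initial centers.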
Agglomerative Hierarchical Clustering
Consider a set S of patterns to be clustered:
S = { x1, x2, ... , xk, ... , xN }
Define Level N by
S1(N) = { x1 }
S2(N) = { x2 }
...
SN(N) = { xN }
Clusters at Level N are the individual pattern vectors.
Define Level N - 1 to be the N - 1 clusters formed by merging two of the Level N clusters by the following process: compute the distances between all the clusters at Level N and merge the two with the smallest distance (resolve ties randomly) to give the Level N - 1 clusters
S1(N-1), S2(N-1), ... , SN-1(N-1)
Clusters at Level N - 1 result from this merging.
The process of merging two clusters at each step is performed sequentially until Level 1 is reached. Level 1 is a single cluster containing all samples:
S1(1) = { x1, x2, ... , xk, ... , xN }
Thus hierarchical clustering provides cluster assignments for all numbers of clusters from N to 1.
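The level-by-level merging can be sketched as follows. The slides do not fix how the distance between two multi-member clusters is measured, so single-link (closest pair of members) is assumed here as one common choice:

```python
def agglomerative(patterns):
    """Agglomerative hierarchical clustering as on the slides: start at
    Level N with each pattern in its own cluster, then repeatedly merge
    the two closest clusters until Level 1. Returns a dict mapping each
    level (number of clusters) to its list of clusters."""
    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    def cluster_dist(c1, c2):
        # single-link: distance between the closest pair of members
        return min(dist2(a, b) for a in c1 for b in c2)

    clusters = [[x] for x in patterns]              # Level N: singletons
    levels = {len(clusters): [c[:] for c in clusters]}
    while len(clusters) > 1:
        # find the pair of clusters with the smallest distance
        i, j = min(
            ((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
            key=lambda ij: cluster_dist(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] += clusters.pop(j)              # merge them
        levels[len(clusters)] = [c[:] for c in clusters]
    return levels
```

Indexing the returned dict by any level from N down to 1 gives the cluster assignment for that number of clusters, as the slide states.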
Fuzzy C-Means Clustering: Preliminary
Given a set S composed of pattern vectors which we wish to cluster:
S = { x1, x2, ... , xN }
Define C cluster membership functions μi, i = 1, 2, ... , C, where μi(xk) is the degree of membership of pattern xk in fuzzy cluster Cli, with
0 <= μi(xk) <= 1 and μ1(xk) + μ2(xk) + ... + μC(xk) = 1 for each xk
Define C cluster centroids as follows. Let Vi be the cluster centroid for fuzzy cluster Cli, i = 1, 2, ... , C.
Define a performance objective Jm as
Jm = Σ (k = 1 to Ns) Σ (i = 1 to C) [μi(xk)]^m (xk - Vi)^T A (xk - Vi)
where (xk - Vi)^T A (xk - Vi) is the A-norm squared distance from pattern xk to centroid Vi.
Definitions:
A is a symmetric positive definite matrix.
Ns is the total number of pattern vectors.
m is the fuzziness index (m > 1); higher values give fuzzier clusters.
The Fuzzy C-Means algorithm minimizes Jm by selecting the Vi and μi, i = 1, 2, ... , C, by an alternating iterative procedure, as described in the algorithm's details.
Fuzzy C-Means Clustering Algorithm: (a) Flow Diagram
[Flow diagram: alternate the centroid and membership updates; a yes/no convergence test decides whether to loop again or stop.]
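A rough sketch of the alternating iteration is below. It assumes A = I, so the A-norm reduces to plain squared Euclidean distance; the update formulas are the standard fuzzy C-Means ones, filled in here since the flow diagram itself is not reproduced in the text:

```python
import random

def fuzzy_c_means(patterns, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Alternating FCM iteration minimizing Jm with A = I.
    mu[i][k] is the membership of pattern xk in fuzzy cluster i."""
    rng = random.Random(seed)
    n, dim = len(patterns), len(patterns[0])
    # random initial memberships, normalized so each column sums to 1
    mu = [[rng.random() for _ in range(n)] for _ in range(c)]
    for k in range(n):
        s = sum(mu[i][k] for i in range(c))
        for i in range(c):
            mu[i][k] /= s

    def d2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    for _ in range(max_iter):
        # update centroids Vi as membership-weighted means
        V = []
        for i in range(c):
            w = [mu[i][k] ** m for k in range(n)]
            tot = sum(w)
            V.append([sum(w[k] * patterns[k][d] for k in range(n)) / tot
                      for d in range(dim)])
        # update memberships from the distances to the new centroids
        new_mu = [[0.0] * n for _ in range(c)]
        for k in range(n):
            dists = [d2(patterns[k], V[i]) for i in range(c)]
            for i in range(c):
                if dists[i] == 0.0:            # pattern sits on a centroid
                    for j in range(c):
                        new_mu[j][k] = 1.0 if j == i else 0.0
                    break
            else:
                for i in range(c):
                    new_mu[i][k] = 1.0 / sum(
                        (dists[i] / dists[j]) ** (1.0 / (m - 1))
                        for j in range(c))
        change = max(abs(new_mu[i][k] - mu[i][k])
                     for i in range(c) for k in range(n))
        mu = new_mu
        if change < tol:                       # the "Yes" branch: converged
            break
    return V, mu
```

Each column of mu sums to 1 by construction, matching the membership constraint above.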
General Programs for Performing Clustering
1. Available commercial packages: SPSS, SAS, GPSS, ...
2. Small programs for classroom use: LCLKmean.exe, LCLHier.exe, LCLFuzz.exe
Small programs for classroom use:
- LCLKmean.exe: uses the K-Means algorithm to cluster small data sets
- LCLHier.exe: performs hierarchical clustering of small data sets
- LCLFuzz.exe: performs fuzzy and crisp clustering of small data sets
Data File Format for the LCL Programs
NS = number of data samples; VS = data vector size; DATA in row vectors with a space between components.
Example text file (NS = 5, VS = 3):
5
3
1 6 3
2 0 5
7 1 4
6 6 8
2 2 3
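A small Python reader for this layout, as an illustration of the format described above (it is not part of the LCL programs, and the function name is mine):

```python
def read_lcl_file(path):
    """Read a data file in the format the LCL programs expect:
    first NS (number of samples), then VS (vector size), then
    NS rows of VS space-separated components."""
    with open(path) as f:
        tokens = f.read().split()
    ns, vs = int(tokens[0]), int(tokens[1])
    values = [float(t) for t in tokens[2:2 + ns * vs]]
    # regroup the flat list of values into NS row vectors of length VS
    return [values[r * vs:(r + 1) * vs] for r in range(ns)]
```

Reading the example file above would yield five 3-component vectors, the first being [1.0, 6.0, 3.0].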
Food for Thought
All the clustering techniques presented so far use a measure of distance or similarity. Many of these give equal-distance contours that are hyperspheres or hyperellipses. If these techniques are used directly on patterns that are not describable by those types of regions, we can expect to obtain poor results.
In some cases each cluster occupies a limited region (a subspace of the total pattern space) described by a nonlinear functional relation between components. An example appears below.
[Figure: existing pattern vectors lying along nonlinear curves in the pattern space.]
Standard K-Means, hierarchical, or fuzzy cluster analysis directly on the data will produce unsatisfactory results.
For this type of problem the patterns should first be preprocessed before a clustering procedure is performed. Two almost contradictory approaches can be used for this processing:
1. Extend the pattern space by techniques comparable to functional link nets, so that the clusters can be separated by spherical and elliptical regions.
2. Reduce the dimension of the space by a nonlinear form of processing, involving principal-component-like processing, before clustering.
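Approach 1 can be illustrated with a hypothetical nonlinear feature of my choosing (the slides do not prescribe one): augmenting each 2-D pattern with its radius from an assumed center, so that ring-shaped clusters become separable by spherical regions in the extended space.

```python
import math

def expand_to_radius(patterns, center=(0.0, 0.0)):
    """Extend the pattern space: map each 2-D pattern (x, y) to
    (x, y, r), where r is the distance from an assumed center.
    Concentric ring-shaped clusters, inseparable by spheres in 2-D,
    separate cleanly along the new r axis."""
    cx, cy = center
    return [(x, y, math.hypot(x - cx, y - cy)) for (x, y) in patterns]
```

After this expansion, an ordinary K-Means run on the 3-D vectors can split an inner ring from an outer ring, which it could not do on the raw 2-D data.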
Both methods imply that we know additional
information about the structure of the data.
This additional information may be known to us
or it may need to be determined. The process of
finding structure within data has been put in the
large category of Data Mining. So get a
shovel and start looking. Good luck in your
search for gold in the mounds of practical data.
Several very important topics in Pattern Recognition were not covered in this course because of time limitations. The following topics deserve your special attention to make your educational experience complete:
1. Feature Selection and Extraction
2. Hopfield and Feedback Neural Nets
3. Syntactical Pattern Recognition
4. Special Learning Theory
I Would Like to Thank
Nanjing University of Science and Technology
and
Lu Jian Feng, Yang Jing-yu, Wang Han
for inviting me to present this course on Statistical and Neural Pattern Recognition.
A Very Special Thanks to my new friends
Lu Jian Feng, Wang Qiong, Wang Huan
for looking after me. Their kindness and gentle assistance have made my stay in Nanjing a very enjoyable and unforgettable experience.
Last but not least, I would like to thank all you students for your kind attention throughout this course. Without your interest and cheerful faces it would have been difficult for me to teach. My apologies for teaching in English, which, I am sure, made your work a lot harder. Best of luck to all of you in your studies and life.
As you travel through life, may all your trails be downhill and the wind always at your back. Bye for now, and I hope our paths cross again in the future. I will have pleasant thoughts about NUST students and faculty, Nanjing, and China as I head back to New Mexico!
New Mexico
Land of Enchantment
End of Lecture 30