Title: Clustering and Self-Organizing Feature Map
1. Clustering and Self-Organizing Feature Map
2. Clustering
3. Introduction
- Cluster
  - A group of similar objects
- Clustering
  - A special method of classification
  - Unsupervised learning: no predefined classes
4. What is Good Clustering?
- High intra-cluster similarity
  - Objects are similar to one another within the same cluster
- Low inter-cluster similarity
  - Objects are dissimilar to the objects in other clusters
- The quality of a clustering depends on the similarity measure used
5. The Problem of Unsupervised Clustering
- Nearly identical to the problem of estimating the underlying distribution for classes with multi-modal features
- Example: four data sets with the same mean and covariance can still have very different cluster structure
6. Similarity Measures
- The most obvious measure of similarity between two samples is the distance between them
- If the Euclidean distance between two samples is less than some threshold distance d0, they are placed in the same cluster
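A minimal sketch of this threshold rule in Python; the function name `same_cluster` and the default value of d0 are illustrative, not from the slides.

```python
import numpy as np

def same_cluster(x, x_prime, d0=1.0):
    """Return True if two samples fall within the threshold distance d0.

    d0 is an assumed, user-chosen threshold; the slide leaves its value open.
    """
    return np.linalg.norm(np.asarray(x, float) - np.asarray(x_prime, float)) < d0

# Example: a nearby pair vs. a distant pair
print(same_cluster([0.0, 0.0], [0.5, 0.5], d0=1.0))  # True
print(same_cluster([0.0, 0.0], [3.0, 4.0], d0=1.0))  # False (distance is 5)
```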
7. Effect of Scaling the Axes
8. Normalization
- To achieve invariance, normalize the data
  - Subtract the mean and divide by the standard deviation
  - Inappropriate if the spread is due to the presence of subclasses
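A small sketch of the normalization step described above; the helper name `standardize` is assumed. Note the slide's caveat that this is inappropriate when the spread comes from subclasses.

```python
import numpy as np

def standardize(X):
    """Z-score normalization: subtract each feature's mean and divide by its std."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0)

# Example: features on very different scales become comparable
X = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
print(standardize(X))  # each column now has zero mean and unit variance
```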
9. Similarity Function
- Similarity function between vectors: s(x, x')
- When the angle between two vectors is a meaningful measure, the normalized inner product s(x, x') = x^T x' / (||x|| ||x'||) may be an appropriate similarity function
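A brief sketch of the normalized inner product as a similarity function; `cosine_similarity` is an assumed helper name.

```python
import numpy as np

def cosine_similarity(x, x_prime):
    """Normalized inner product: the cosine of the angle between two vectors."""
    x, x_prime = np.asarray(x, float), np.asarray(x_prime, float)
    return float(x @ x_prime / (np.linalg.norm(x) * np.linalg.norm(x_prime)))

print(cosine_similarity([1, 0], [1, 1]))  # ~0.707 (vectors 45 degrees apart)
print(cosine_similarity([1, 0], [0, 1]))  # 0.0   (orthogonal vectors)
```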
10. Tanimoto Coefficient
- For binary-valued attributes
- The ratio of the number of shared attributes to the number of attributes possessed by x or x': s(x, x') = x^T x' / (x^T x + x'^T x' - x^T x')
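A short sketch of the Tanimoto coefficient for binary attribute vectors, using the ratio above; the helper name and example vectors are illustrative.

```python
import numpy as np

def tanimoto(x, x_prime):
    """Tanimoto coefficient for binary attribute vectors:
    shared attributes divided by attributes possessed by either vector."""
    x, x_prime = np.asarray(x, float), np.asarray(x_prime, float)
    shared = x @ x_prime
    return float(shared / (x @ x + x_prime @ x_prime - shared))

print(tanimoto([1, 1, 0, 1], [1, 0, 0, 1]))  # 2 shared / 3 possessed by either ~= 0.667
```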
11. Distance between Sets
12. Criterion Function for Clustering
- Sum-of-squared-error criterion (sketched after this list)
  - J_e = sum over all clusters D_i of the squared distances ||x - m_i||^2 from each sample x in D_i to its cluster mean m_i
  - Also called the minimum-variance partition
  - Problems arise when the natural groupings have very different numbers of points
- General form
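A sketch of the sum-of-squared-error criterion J_e for a given labeling; the function name and the toy data are illustrative.

```python
import numpy as np

def sum_of_squared_error(X, labels, k):
    """J_e = sum over clusters of squared distances from members to the cluster mean."""
    X = np.asarray(X, float)
    J_e = 0.0
    for i in range(k):
        members = X[labels == i]
        if len(members) > 0:
            m_i = members.mean(axis=0)           # cluster mean
            J_e += ((members - m_i) ** 2).sum()  # squared-error contribution
    return J_e

X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
labels = np.array([0, 0, 1, 1])
print(sum_of_squared_error(X, labels, k=2))  # 0.5 + 0.5 = 1.0
```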
13. Categories of Clustering Methods
- Iterative optimization
  - Move a randomly selected point to another cluster if the move improves the criterion
- Hierarchical clustering
  - Group objects into a tree of clusters
  - AGNES (Agglomerative Nesting)
  - DIANA (Divisive Analysis)
- Partitioning clustering
  - Construct a partition of the object set V into k clusters (k is a user input parameter)
  - K-means
  - K-medoids
14. Hierarchical Clustering
- Produces a sequence of partitions of the N samples into C clusters
  - 1. Start with N clusters (one per sample)
  - 2. Merge the nearest two clusters, giving N - 1 clusters
  - 3. Repeat until the desired number of clusters C is reached
- Dendrogram
- Agglomerative: bottom-up
- Divisive: top-down
15. Hierarchical Method
- Agglomerative (AGNES)
- Divisive (DIANA)
16. Hierarchical Method
- Agglomerative algorithm
  - Input: a set V of objects
  - Put each object in its own cluster
  - Loop until the number of clusters is one:
    - Calculate the inter-cluster similarities
    - Merge the most similar pair of current clusters
17. Hierarchical Method
- Similarity (linkage) methods
  - Single-linkage
  - Complete-linkage
  - Average-linkage
18. Nearest-Neighbor Algorithm
- When d_min is used as the inter-cluster distance, the algorithm is called the nearest-neighbor algorithm
- If it is terminated when the distance between the nearest clusters exceeds an arbitrary threshold, it is called the single-linkage algorithm
- Generates a minimal spanning tree
- Chaining effect: a defect of this distance measure
- (A sketch of agglomerative clustering with the three linkage rules follows.)
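A compact sketch covering slides 16-18: agglomerative clustering with single, complete, or average linkage and an optional distance threshold for early termination. The helper names and the brute-force pairwise search are assumptions made for clarity, not part of the slides.

```python
import numpy as np

def linkage_distance(A, B, method="single"):
    """Distance between two clusters (arrays of points) under a given linkage."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)  # pairwise distances
    if method == "single":    # d_min: nearest pair
        return d.min()
    if method == "complete":  # d_max: farthest pair
        return d.max()
    return d.mean()           # average linkage

def agglomerative(X, n_clusters=1, method="single", threshold=None):
    """Merge the closest pair of clusters until n_clusters remain, or stop early
    when the nearest pair exceeds `threshold` (the single-linkage termination rule)."""
    clusters = [np.array([x]) for x in np.asarray(X, float)]  # one cluster per object
    while len(clusters) > n_clusters:
        # find the most similar (closest) pair of current clusters
        best, pair = np.inf, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                dist = linkage_distance(clusters[i], clusters[j], method)
                if dist < best:
                    best, pair = dist, (i, j)
        if threshold is not None and best > threshold:
            break  # early termination by distance threshold
        i, j = pair
        merged = np.vstack([clusters[i], clusters[j]])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters

X = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
print([len(c) for c in agglomerative(X, n_clusters=2, method="single")])  # [2, 2]
```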
19. K-means
- Uses the center of gravity (mean) of the objects in each cluster
- Algorithm (sketched below)
  - Input: k (the number of clusters) and a set V of n objects
  - Output: a set of k clusters that minimizes the sum-of-squared-error criterion
  - Method:
    - Choose k objects as the initial cluster centers
    - Loop:
      - For each object p, find the nearest center and assign p to that cluster
      - Compute the mean of each cluster and use it as the new center
- Pro: quick convergence
- Con: sensitive to noise, outliers, and the initial seed selection
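A minimal k-means sketch following the method above; the initialization by sampling k objects, the iteration cap, and the handling of empty clusters are assumed details.

```python
import numpy as np

def k_means(X, k, n_iter=100, seed=0):
    """Assign each object to the nearest center, then recompute each center
    as the mean (gravity center) of its cluster, until the centers stop moving."""
    X = np.asarray(X, float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # k objects as initial centers
    for _ in range(n_iter):
        # assignment step: nearest center for every object
        labels = np.argmin(np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2), axis=1)
        # update step: mean of each (non-empty) cluster becomes the new center
        new_centers = np.array([X[labels == i].mean(axis=0) if np.any(labels == i)
                                else centers[i] for i in range(k)])
        if np.allclose(new_centers, centers):
            break  # converged
        centers = new_centers
    return centers, labels

X = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]])
centers, labels = k_means(X, k=2)
print(labels)  # two objects per cluster
```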
20. K-means is Sensitive to the Initial Points
21. K-means Clustering
22. K-means Clustering
- Assign each object to the cluster to which it is closest
- Compute the center of each cluster
23. K-means Clustering
- Reassign objects to the cluster whose centroid is nearest
24. Graph-Theoretic Approach
- Removal of inconsistent edges
25. Self-Organizing Feature Maps
- A clustering method based on competitive learning
  - Only one neuron per group fires at any one time
  - Winner-takes-all: the winning neuron
  - Winner-takes-all behavior is achieved by lateral inhibitory connections
26. Self-Organizing Feature Map
- Neurons are placed at the nodes of a lattice (one- or two-dimensional)
- Neurons become selectively tuned to input patterns (stimuli) by a competitive learning process
- The locations of the tuned neurons become ordered: a topographic map of the input patterns is formed
- Spatial locations of the neurons in the lattice correspond to intrinsic statistical features contained in the input patterns
- The SOM is a non-linear generalization of PCA
27. SOFM Motivated by the Human Brain
- The brain is organized in such a way that different sensory data are represented by topologically ordered computational maps
- Tactile, visual, and acoustic sensory inputs are mapped onto areas of the cerebral cortex in a topologically ordered manner
- Such maps are a building block of the information-processing infrastructure of the nervous system
28. SOFM Motivated by the Human Brain
- Neurons transform input signals into a place-coded probability distribution
  - Sites of maximum relative activity within the map
  - Accessed by higher-order processors through simple connections
  - Each piece of incoming information is kept in its proper context
- Neurons dealing with closely related information are placed close together, so that they can communicate via short connections
29. Kohonen Model
- Captures the essential features of computational maps in the brain
- Capable of dimensionality reduction
30. Kohonen Model
- Transforms an incoming signal pattern into a discrete 1-D or 2-D map in an adaptively, topologically ordered fashion
- A topology-preserving transformation
- A class of vector-coding algorithms: optimally maps the input onto a fixed number of code words
- An input pattern is represented as a localized region, or spot, of activity in the network
- After initialization, three essential processes:
  - Competition
  - Cooperation
  - Synaptic adaptation
31. Competitive Process
- Find the best match of the input vector with the synaptic weight vectors
  - x = [x1, x2, ..., xm]^T
  - wj = [wj1, wj2, ..., wjm]^T, j = 1, 2, ..., l
- Best-matching (winning) neuron: i(x) = arg min_j ||x - wj||, j = 1, 2, ..., l
- Determines the location where the topological neighborhood of excited neurons is to be centered
- The continuous input space is mapped onto the discrete output space of neurons by the competitive process
32. Cooperative Process
- For a winning neuron, the neurons in its immediate neighborhood are excited more than those farther away
- The topological neighborhood decays smoothly with lateral distance d_{j,i}
  - Symmetric about the maximum point defined by d_{j,i} = 0
  - Monotonically decreasing to zero as d_{j,i} → ∞
- Neighborhood function, Gaussian case: h_{j,i(x)}(n) = exp(-d_{j,i}² / (2σ(n)²))
- The size of the neighborhood shrinks with time (a sketch follows below)
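A small sketch of a Gaussian neighborhood function whose width σ(n) shrinks with time; the exponential decay and the constants sigma0 and tau are assumed choices, not taken from the slides.

```python
import numpy as np

def gaussian_neighborhood(d, n, sigma0=2.0, tau=1000.0):
    """Gaussian topological neighborhood h_{j,i}(n) = exp(-d^2 / (2 sigma(n)^2)).

    d is the lateral distance between neuron j and the winner i;
    sigma(n) = sigma0 * exp(-n / tau) makes the neighborhood shrink with time.
    """
    sigma = sigma0 * np.exp(-n / tau)
    return np.exp(-(d ** 2) / (2.0 * sigma ** 2))

print(gaussian_neighborhood(d=0.0, n=0))  # 1.0 at the winner
print(gaussian_neighborhood(d=2.0, n=0))  # smaller for neighbors farther away
```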
33. Typical Window Function
34. Adaptive Process
- The synaptic weight vector is changed in relation to the input vector:
  - wj(n+1) = wj(n) + η(n) h_{j,i(x)}(n) (x - wj(n))
- Applied to all neurons inside the neighborhood of the winning neuron i
- Has the effect of moving the weight wj toward the input vector x
- Upon repeated presentation of the training data, the weights tend to follow the distribution of the inputs
- The learning rate η(n) may decay with time
35. SOFM Algorithm
- 1. Initialize the weights wj with random numbers
- 2. For input x(n), find the nearest cell (winning neuron):
  - i(x) = arg min_j ||x(n) - wj(n)||
- 3. Update the weights of the neighbors:
  - wj(n+1) = wj(n) + η(n) h_{j,i(x)}(n) [x(n) - wj(n)]
- 4. Reduce the neighborhood size and η
- 5. Go to step 2 (a complete sketch follows below)
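A compact sketch of steps 1-5, combining the competitive, cooperative, and adaptive processes. The grid size, decay constants, and random sampling of training inputs are illustrative choices rather than prescribed by the slides.

```python
import numpy as np

def train_sofm(X, grid=(10, 10), n_iter=5000, eta0=0.1, sigma0=3.0, tau=1000.0, seed=0):
    """Minimal SOFM training loop with a Gaussian neighborhood and
    exponentially decaying learning rate and neighborhood width."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, float)
    rows, cols = grid
    # 1. initialize weights with random numbers
    w = rng.random((rows * cols, X.shape[1]))
    # lattice coordinates of each neuron, used for lateral distances
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    for n in range(n_iter):
        x = X[rng.integers(len(X))]                       # present one sample
        # 2. find the nearest cell (winning neuron)
        i_win = np.argmin(np.linalg.norm(x - w, axis=1))
        # 3. update the weights of the neighbors
        eta = eta0 * np.exp(-n / tau)                     # 4. learning rate decays ...
        sigma = sigma0 * np.exp(-n / tau)                 # ... and the neighborhood shrinks
        d2 = ((coords - coords[i_win]) ** 2).sum(axis=1)  # squared lattice distances
        h = np.exp(-d2 / (2.0 * sigma ** 2))
        w += eta * h[:, None] * (x - w)
    return w

# Example in the spirit of slide 36: samples drawn uniformly from the 2-D unit square
samples = np.random.default_rng(1).random((1000, 2))
weights = train_sofm(samples, grid=(10, 10))
print(weights.shape)  # (100, 2): one weight vector per neuron in the 10x10 lattice
```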
36. Computer Simulation
- Input samples: random numbers within the 2-D unit square
- 100 neurons (10 x 10 lattice)
- Initial weights: random values in (0.0, 1.0)
- Display
  - Each neuron is positioned at (w1, w2)
  - Neighboring neurons are connected by lines
  - See the next slide
- Second example: Figures 9.8 and 9.9
37. SOFM Example (1): 2-D Lattice Driven by a 2-D Distribution
39. Topologically ordered map development (2)
40. Topologically ordered map development (3)
41. Topologically ordered map development (5)
42. Topologically ordered map development (1-D array of neurons)
43. SOFM Example (2): Phoneme Recognition
- Recognition result for "humppila"
44. SOFM Example (3)
- http://www-ti.informatik.uni-tuebingen.de/goeppert/KohonenApp/KohonenApp.html
- http://davis.wpi.edu/matt/courses/soms/applet.html
45. Summary of SOM
- A continuous input space of activation patterns, generated in accordance with a certain probability distribution
- The topology of the network, in the form of a lattice of neurons, defines a discrete output space
- A time-varying neighborhood function defined around the winning neuron
- A learning rate that decreases gradually with time but never goes to zero
46. Vector Quantization
- VQ is a data compression technique
  - The input space is divided into distinct regions
  - Reproduction vector, representative vector
  - Code words, code book
- Voronoi quantizer
  - Nearest-neighbor rule based on the Euclidean metric (sketched after this list)
- Learning Vector Quantization (LVQ)
  - A supervised learning technique
  - Moves the Voronoi vectors slightly in order to improve the quality of the classification decision
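A one-function sketch of the Voronoi quantizer's nearest-neighbor encoding rule; the codebook contents and helper name are illustrative.

```python
import numpy as np

def encode(x, codebook):
    """Voronoi quantizer: map an input to the index of the nearest code word
    (nearest-neighbor rule on the Euclidean metric)."""
    return int(np.argmin(np.linalg.norm(np.asarray(codebook, float) - np.asarray(x, float), axis=1)))

codebook = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]])  # illustrative code words
print(encode([0.9, 0.8], codebook))  # 1: the closest reproduction vector
```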
47. Voronoi Tessellation
48. Learning Vector Quantization
- Suppose wc is the Voronoi vector closest to the input xi
- Let C_{wc} be the class of wc
- Let C_{xi} be the class label of xi
- If C_{wc} = C_{xi}, then
  - wc(n+1) = wc(n) + α_n [xi - wc(n)]
- Otherwise
  - wc(n+1) = wc(n) - α_n [xi - wc(n)]
- The other Voronoi vectors are not modified (a sketch of one update step follows)
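A minimal sketch of a single LVQ update step as given above; the learning rate α and the helper name are assumptions.

```python
import numpy as np

def lvq_step(codebook, codebook_labels, x_i, c_xi, alpha=0.05):
    """One LVQ update: move the closest Voronoi vector toward x_i if the class
    labels agree, away from it otherwise; all other vectors are left unchanged."""
    codebook = np.asarray(codebook, float).copy()
    x_i = np.asarray(x_i, float)
    c = int(np.argmin(np.linalg.norm(codebook - x_i, axis=1)))  # closest Voronoi vector
    sign = 1.0 if codebook_labels[c] == c_xi else -1.0
    codebook[c] += sign * alpha * (x_i - codebook[c])
    return codebook

codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
labels = ["A", "B"]
print(lvq_step(codebook, labels, x_i=[0.2, 0.1], c_xi="A"))  # vector 0 moves toward x_i
```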
49. Adaptive Pattern Classification
- A combination of feature extraction and classification
- Feature extraction
  - Unsupervised, e.g., by a SOFM
  - Captures the essential information content of the input data
  - Has a data-reduction / dimensionality-reduction effect
- Classification
  - A supervised scheme such as an MLP