Title: The Self-Organizing Map and Applications
1. The Self-Organizing Map and Applications
- Jennie Si, Ph.D.
- Professor
- Dept. of Electrical Engineering
- Center for Systems Science and Engineering
Research - Arizona State University
- (480) 965-6133 (voice)
- (480) 965-0461 (fax)
- si_at_asu.edu (email)
2. Structure of the presentation
- Background
- The algorithm
- Analysis
- Case 1: motor data analysis
- Case 2: auditory data analysis
- Case 3: supplier change control
- A qualitative comparison
- Advanced issue: topology preserving
- Conclusions
3. References
- J. Si, S. Lin, and M. A. Vuong, "Dynamic topology representing network," Neural Networks, the official journal of the International Neural Network Society, 13 (6), pp. 617-627, 2000.
- S. Lin and J. Si, "Weight convergence and weight density distribution of the SOFM network with discrete input," Neural Computation, 10 (4), pp. 807-814, 1998.
- S. Lin, J. Si, and A. B. Schwartz, "Self-organization of firing activities in monkey's motor cortex: trajectory computation from spike signals," Neural Computation, the MIT Press, March 1997, pp. 607-621.
- R. Davis, "Industry data mining using the self-organizing map," M.S. thesis, Arizona State University, May 2001.
4. Evolution of the SOM algorithm
- Von der Malsburg, 1970s: the self-organization of orientation-sensitive nerve cells in the striate cortex
- Willshaw and von der Malsburg, 1976: the first paper on the formation of self-organizing maps on biological grounds, explaining retinotopic mapping from the retina to the visual cortex (in higher vertebrates)
- Kohonen, 1982: the paper on the self-organizing map, "Self-organized formation of topologically correct feature maps," in Biological Cybernetics
5. SOM - a computational shortcut
- To mimic basic functions similar to biological neural networks
- Implementation details of biological systems are ignored
- To create an ordered map of input signals
- Based on the internal structure of the input signals themselves
- Coordination of the unit activities through the lateral connections between the units
- A statistical data modeling tool
6. Two distinct properties of SOM
- Clustering of multidimensional input data
- Spatial ordering of the output map, so that similar input patterns tend to produce a response in units that are close to each other in the output map (topology preserving)
- Nodes in the output layer represent clustering information from the input data
7. Applications of SOM
- Speech processing
- Vector quantization
- Image coding
- Biological signal analysis
- Visualization of results from multi-dimensional data analysis
- And many, many more
8. Structure of the presentation
- Background
- The algorithm
- Analysis
- Case 1: motor data analysis
- Case 2: auditory data analysis
- Case 3: supplier change control
- A qualitative comparison
- Advanced issue: topology preserving
- Conclusions
9. The output map - an undirected graph
10. The self-organizing map
(figure: output graph G, weight vectors W, input vector X)
11. Learning the topological mapping from input
- The graph (output map) is usually pre-specified as a one-dimensional chain or a two-dimensional lattice.
- SOM intends to learn the topological mapping by means of self-organization driven by samples X.
- X is assumed to be connected in parallel to every node in the output map.
- A node in G, associated with a weight vector W, can be represented by its index i or by its position.
12. SOM building block
- Find the winner and the neighborhood of the winner by comparing the inner products W_i^T X, for i = 1, 2, ..., L, and selecting the node with the largest inner product.
- If the weight vectors W_i are normalized, the inner-product criterion is equivalent to the minimum Euclidean distance measure:

$$c(X) = \arg\min_{i} \| X - W_i \|, \qquad i = 1, 2, \ldots, L$$

- c(X) indicates the output node whose weight vector matches the input vector X best.
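A minimal Python sketch of the winner search (NumPy assumed; names are illustrative, not from any particular toolbox):

    import numpy as np

    def find_winner(X, W):
        """Return the index c(X) of the best-matching unit.

        X : (d,) input vector; W : (L, d) weight matrix, one row per node.
        Minimum Euclidean distance is used; with normalized rows of W this
        is equivalent to maximizing the inner products W_i^T X.
        """
        return int(np.argmin(np.linalg.norm(W - X, axis=1)))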
13. SOM building block (continued)
- Adaptive process (by Oja, 1982)
- The weight vectors inside the neighborhood of the winner are usually updated by a Hebbian-type learning law.
- The negative component is a nonlinear forgetting term.
14. Discrete-time update format for the adaptive process
- Simplifying the equation, with
- y_i(t) = 1 if node i is inside the neighborhood of the winner c
- y_i(t) = 0 otherwise
- gives the discrete-time format:

$$W_i(t+1) = W_i(t) + \alpha(t)\, y_i(t)\, [\, X(t) - W_i(t)\, ]$$
15. SOM algorithm implementation
1. Select a winner c in the map by
$$c = \arg\min_{i} \| X - W_i \|$$
2. Update the weights in the neighborhood of c by
$$W_i(t+1) = W_i(t) + \alpha(t)\, h_{ci}(t)\, [\, X(t) - W_i(t)\, ]$$
where h_{ci} is the neighborhood function, e.g. a Gaussian centered at the winner's grid position r_c:
$$h_{ci}(t) = \exp\!\left( - \frac{ \| r_i - r_c \|^2 }{ 2 \sigma^2(t) } \right)$$
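A hedged Python sketch of one full training iteration built from these two steps (Gaussian neighborhood; all names are illustrative):

    import numpy as np

    def som_step(X, W, pos, alpha, sigma):
        """One SOM iteration: select the winner c, then pull the weights
        of nodes near c on the output grid toward the input X.

        pos : (L, 2) grid coordinates r_i of the output nodes.
        """
        c = int(np.argmin(np.linalg.norm(W - X, axis=1)))   # winner c(X)
        d2 = np.sum((pos - pos[c]) ** 2, axis=1)            # ||r_i - r_c||^2
        h = np.exp(-d2 / (2.0 * sigma ** 2))                # h_ci(t)
        W += alpha * h[:, None] * (X - W)                   # update rule above
        return W, c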
16. Neighborhood function
The neighborhood is large at first and shrinks over time.
17. (figure only; no transcript)
18. Learning rate α(t)
- Essential for convergence
- Large enough for the network to adapt quickly to new training patterns
- Small enough for stability, so that the network does not forget the experience from past training patterns
- A decreasing function of time
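For illustration, one plausible pair of decreasing schedules in Python (exponential decay is an assumption; the slides do not fix a specific form):

    def schedules(t, t_max, a0=0.5, a1=0.01, s0=5.0, s1=0.5):
        """Learning rate alpha(t) and neighborhood radius sigma(t):
        large early for fast adaptation, small late for stability."""
        frac = t / float(t_max)
        alpha = a0 * (a1 / a0) ** frac   # decays from a0 to a1
        sigma = s0 * (s1 / s0) ** frac   # decays from s0 to s1
        return alpha, sigma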
19. SOM software implementation
- Matlab Neural Networks Toolbox
- SOM Toolbox, created by a group at the Helsinki University of Technology
- The SOM Toolbox can be downloaded from http://www.cis.hut.fi/projects/somtoolbox/about.html
20. Structure of the presentation
- Background
- The algorithm
- Analysis
- Case 1: motor data analysis
- Case 2: auditory data analysis
- Case 3: supplier change control
- A qualitative comparison
- Advanced issue: topology preserving
- Conclusions
21. Weight convergence (assumptions)
- The input has a discrete probability density.
- The learning rate α(k) satisfies the conditions of Robbins and Monro (1951), given below.
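The Robbins-Monro step-size conditions require the learning rate to decay, but not too quickly:

$$\sum_{k=1}^{\infty} \alpha(k) = \infty, \qquad \sum_{k=1}^{\infty} \alpha^2(k) < \infty .$$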
22. Weight convergence (results)
- The SOM algorithm (locally or globally) minimizes the objective function given below.
- The weights converge almost surely to a stationary solution if the stationary solution exists.
(Lin and Si, "Weight value convergence of the SOM algorithm for discrete input")
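For a discrete input set {x_k}, this objective is commonly written in the energy form (a reconstruction consistent with the notation above, not a verbatim formula):

$$J(W) = \frac{1}{2} \sum_{k} \sum_{i=1}^{L} h_{c(x_k)\,i} \, \| x_k - W_i \|^2 .$$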
23. Voronoi polyhedra
- Voronoi polyhedra on R^n
- Masked Voronoi polyhedra on the support of the input distribution
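The Voronoi cell of node i collects all inputs closest to its weight vector; the masked version restricts each cell to where the input actually lives:

$$V_i = \left\{ x \in \mathbb{R}^n : \| x - W_i \| \le \| x - W_j \| \ \text{ for all } j \right\} .$$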
24. Some extreme cases
- Assume the neighborhood function is constant, N_c(k) = N_c, in the final learning phase. The stationary weights then depend on the choice of N_c; the two extreme choices are examined next.
25. Extreme case 1
- The neighborhood covers the entire output map, i.e. N_c = {1, 2, ..., L}.
- Each weight vector converges to the same stationary state, which is the mass center of the training data set.
- To eliminate the effect of initial conditions, we should use a neighborhood function covering a large range of the output map.
26. Extreme case 2
- The neighborhood radius equals 0, i.e. N_c = {c}.
- The W_i become the centroids of the cells of the Voronoi partition of the inputs, and the final iterations of SOM become a sequential updating process of vector quantization.
- SOM can thus be used for vector quantization by shrinking the range of the neighborhood function to zero during the learning process.
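In symbols, for K training samples the two limits of the stationary weights are:

$$N_c = \{1, \ldots, L\}: \quad W_i \to \frac{1}{K} \sum_{k=1}^{K} x_k \quad \text{(mass center)};$$

$$N_c = \{c\}: \quad W_i \to \frac{1}{|V_i|} \sum_{x_k \in V_i} x_k \quad \text{(centroid of its Voronoi cell)}.$$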
27. Observations
- The Robbins-Monro algorithm ensures weight convergence to a root of dJ/dW_i = 0 almost surely, if the root exists.
- In practice the weights would only converge to local minima.
- It has been observed that SOM is capable, to some extent, of escaping from local minima when it is used for vector quantization (McAuliffe, 1990).
- Topological ordering of the weights has not been explicitly proved, but it remains a well-observed property in many applications.
28. Structure of the presentation
- Background
- The algorithm
- Analysis
- Case 1: motor data analysis
- Case 2: auditory data analysis
- Case 3: supplier change control
- A qualitative comparison
- Advanced issue: topology preserving
- Conclusions
29. Monkey motor cortical data analysis to interpret movement intention
- Collaborators: Andy Schwartz, Siming Lin
30. Motor Experiment Overview
31. Firing rate calculation
- dt: bin size; T_i / T_j: start / end times of the i-th bin [T_i, T_j], with T_j = T_i + dt; d_ij: the firing rate of the i-th bin.
- Calculation of d_ij: the number of spike intervals overlapping with the i-th bin is first determined (3 in the example shown). Counting from left to right, 30% of the first interval, 100% of the second interval, and 50% of the third interval are located in the i-th bin; summing these fractions and dividing by the bin size gives the firing rate.
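A hedged Python sketch of that fractional-interval rule (NumPy assumed; names are illustrative):

    import numpy as np

    def bin_firing_rate(spike_times, t_start, dt):
        """Rate for one bin [t_start, t_start + dt]: each interspike
        interval contributes the fraction of it that overlaps the bin,
        and the summed fractions are divided by the bin size."""
        t_end = t_start + dt
        isi_start = np.asarray(spike_times[:-1])
        isi_end = np.asarray(spike_times[1:])
        overlap = np.clip(np.minimum(isi_end, t_end)
                          - np.maximum(isi_start, t_start), 0.0, None)
        fractions = overlap / (isi_end - isi_start)   # e.g. 0.3, 1.0, 0.5
        return fractions.sum() / dt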
32. Self-organizing application: motor cortical information processing
- Spike signal and feature extraction
- Computation models using SOM
- Visualization of firing patterns of the motor cortex
- Neural trajectory computation
- Weights are adaptively updated by the average discharge rates
- Input: average discharge rates of 81 cells
- Output: a two-dimensional grid; each node codes the movement direction from the average discharge rates
33. The Self-Organized Map of Discharge Rates from 81 Neurons in the Center-Out Task
34. The Self-Organized Map of Discharge Rates from 81 Neurons in the Center-Out Task
35. The Self-Organized Map of Discharge Rates from 81 Neurons in the Spiral Task
36. Neural Directions: Four Trials for Training, One for Testing in the Spiral Task
37. Neural Trajectory: Four Trials for Training, One for Testing (left: monkey finger trajectory; right: SOM-predicted trajectory)
38. Neural Trajectory: Data from Spiral Tasks and Center-Out Tasks for Training (left: monkey finger trajectory; right: SOM-predicted trajectory)
39. Neural Trajectory: Average Testing Result from Five Trials Using Leave-K-Out (left: monkey finger trajectory; right: SOM-predicted trajectory)
40. Trajectory Computation Error in 100 Bins
41. Structure of the presentation
- Background
- The algorithm
- Analysis
- Case 1: motor data analysis
- Case 2: auditory data analysis
- Case 3: supplier change control
- A qualitative comparison
- Advanced issue: topology preserving
- Conclusions
42. Guinea pig auditory cortical data analysis to interpret its perception of sound
- Collaborators: Russ Witte, Jing Hu, Daryl Kipke
43. Surgical implant and neural recordings
(figure: sample signal from a single electrode; microvolts vs. time in seconds)
- Each of 60 frequencies spanning 6 octaves was repeated 10 times. Each stimulus interval lasted 700 ms, including 200 ms of tone-on time and 500 ms of off time (interstimulus interval).
44. Raster plot: a 6 s snapshot
45. Averaged spike count of 22 channels (figure panels: 1701 Hz and 1473 Hz)
46. Spike rate of channel 1, stimuli 1-60
47. Data processing
- Channel selection: 30 channels were selected.
- Bin width: the basic unit to hold spike counts from the experimental data; 5 ms, 10 ms, 20 ms, or higher. A 70 ms bin size was used.
- Noise filtering: apply a Gaussian filter to the binned data (see the sketch after this list).
- Frequency grouping: 12 out of 60 stimuli were selected, i.e. approximately one frequency per half octave.
- Trial loop selection / leave-1-out: among the 10 loops of experimental data, take 9 for training and leave one for testing.
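A minimal sketch of the binning and Gaussian noise-filtering steps (SciPy assumed; the 70 ms bin size follows the slide, while the smoothing width is an assumption):

    import numpy as np
    from scipy.ndimage import gaussian_filter1d

    def bin_and_smooth(spike_times, t_end, bin_ms=70.0, sigma_bins=1.0):
        """Bin spike times (ms) into counts, then smooth the counts
        with a 1-D Gaussian filter to suppress noise."""
        edges = np.arange(0.0, t_end + bin_ms, bin_ms)
        counts, _ = np.histogram(spike_times, bins=edges)
        return gaussian_filter1d(counts.astype(float), sigma=sigma_bins)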
48. SOM training and testing
- Input vector: bins from all channels are combined. Training/testing patterns come from all loops.
- Output: a two-dimensional grid. The position of each node in the map corresponds to a certain spike pattern.
- Training parameters: map size (e.g. 10 x 10), learning rate (0.02 / 0.0001), neighborhood function (Gaussian) and radius (e.g. 10 / 0), training/fine-tuning epochs, reduction schedules, etc.
- Calibration after training: label the output nodes using the labels (frequencies) of the training data, as sketched below.
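A hedged sketch of the calibration pass: each output node is labeled with the majority frequency among the training patterns it wins (names are illustrative):

    import numpy as np
    from collections import Counter, defaultdict

    def calibrate(W, X_train, freqs):
        """Map each winning node index to its most common stimulus
        frequency among the training patterns it best matches."""
        votes = defaultdict(Counter)
        for x, f in zip(X_train, freqs):
            c = int(np.argmin(np.linalg.norm(W - x, axis=1)))
            votes[c][f] += 1
        return {c: cnt.most_common(1)[0][0] for c, cnt in votes.items()}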
49. Tonotopic maps: natural vs. SOM
(figure: predicted stimulus map with labels 1473, 1028, and 717 Hz, alongside the auditory cortex map)
- Neuronal activities of one channel lead to the predicted stimulus map; the topology is preserved.
- Channel 12 has a narrow tuning curve. It is mostly tuned to 700-1500 Hz auditory tones.
50. Results from 10 sessions using leave-k-out
51. Structure of the presentation
- Background
- The algorithm
- Analysis
- Case 1: motor data analysis
- Case 2: auditory data analysis
- Case 3: supplier change control
- A qualitative comparison
- Advanced issue: topology preserving
- Conclusions
52. SOM for supplier change control analysis
- Collaborator: Rob Davis
53. SOM analysis methods
- U-matrices were used to find clusters in the data.
- Hit histograms were used to see the density of the input data on the organized output map.
- Empty maps with a label for all the data vectors were used to check the organization.
- Component planes were used to determine the characteristics of the clusters.
54. SOM: U-matrix computation
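One standard way to compute a U-matrix, averaging the distance from each node's weight vector to those of its 4-connected grid neighbors (a sketch; the exact variant used in the thesis may differ):

    import numpy as np

    def u_matrix(W_grid):
        """W_grid : (rows, cols, d) SOM weights on a rectangular grid.
        Returns a (rows, cols) array; large values mark cluster borders."""
        rows, cols, _ = W_grid.shape
        U = np.zeros((rows, cols))
        for r in range(rows):
            for c in range(cols):
                dists = []
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        dists.append(np.linalg.norm(W_grid[r, c] - W_grid[rr, cc]))
                U[r, c] = np.mean(dists)
        return U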
55. U-matrix example
56. Methodology: knowledge discovery
57. Description of the data used
- Supplier change control data
- Gun shop analogy for change control
- The data used describes a specific review process, but the methods outlined could be applied to data describing virtually any process.
58. Data fields
- MHSNum: unique labels
- Supplier: supplier name and location
- Commodity: group within Manufacturing Inc.
- Title: title of the change
- Description: description of the change
- Platform: the platform that the product is used in
- Reason: reason for the change being proposed
- InternalChangeClass: internal risk level identifier
- SupplierChangeClass: external risk level identifier
- ReviewBody: name of the internal group reviewing the change
- ChangeStatus: the status of the change in the review process
59. Data fields (continued)
- Owner: internal contact for the proposed change
- PropImpDate: the proposed implementation date for the change
- PWPDate: the date that the preliminary white paper (PWP) was submitted
- PWPApproval: the date that the PWP was approved
- FWPDate: the date that the final white paper (FWP) was submitted
- FWPApproval: the date that the FWP was approved
- ActualImpDate: the date that the change was actually implemented
60. Coding the data: main concerns
- Weight given to each variable
- Resolution of variables
- How to represent variables with multiple values (platforms, owners)
- How to represent the date information
61. Initial results: hit histogram
- Multiple platforms or owners were given contrived fractional values.
- Date variables were excluded, as well as title and description.
62. Initial results: labels
63. Initial results: component planes
64. Coding the data: solutions
- 10 x 10 pre-SOMs were used for variables with multiple values (platforms, owners).
- BMUs of the platform and owner pre-SOMs were used as two inputs to the main SOM, one for each digit of the BMU.
- Two inputs were used for the supplier variable, one for each digit.
- Dates were analyzed separately, and then the differences between them were used as inputs to the main SOM.
65. Updated dataset
- With continuous data, at some point there needs to be an update of the dataset.
- The original data was taken from the database on 1/12/01, and an updated dataset was retrieved on 3/7/01.
- The original dataset had 112 records; the updated one had 145. The original dataset had 43 records with date information; the updated one had 66.
66. Dealing with the dates
- Date format: YYWW, where YY is the last two digits of the year and WW is the work week (1-53).
- Dates were coded and put into their own SOM, with 3 variables for each date (one for the year and one for each of the two work-week digits). The results were organized, but so what?
- The differences in the dates were then analyzed using charts and histograms; a small coding sketch follows.
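A small sketch of the YYWW encoding and the week differences (ISO work weeks assumed; the original coding may differ in detail):

    from datetime import date

    def to_yyww(d: date) -> str:
        """Encode a date as YYWW: two-digit year plus ISO work week."""
        return f"{d.year % 100:02d}{d.isocalendar()[1]:02d}"

    def weeks_between(d1: date, d2: date) -> int:
        """Difference in whole weeks, as used for the date-difference inputs."""
        return (d2 - d1).days // 7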
67. Proposed vs. Actual Implementation Chart
68. Proposed vs. Actual Implementation Histogram
69. Throughput Times Chart
70. PWP Process Time Histogram
71. PWP to FWP Time Histogram
72. FWP Process Time Histogram
73. FWP Approval to Implementation Histogram
74. Putting it all together
- In order to discover useful knowledge from an SOM, a couple of things are necessary:
- The data needs to be accurately represented.
- Some measure of results or efficiency has to be included in the data; in this case the date information is a measure of the efficiency of the change control process.
75. Including the dates
- One reason for getting an updated dataset was to get more records with date information.
- When the dates were included in the main SOM, only the records with date information were used (66 in the new dataset), so that they would be accurately represented (vs. using average values for most of the data in the date fields).
76. Final results: dates included
- The dataset used for the final results includes 17 variables and 66 records.
- Five inputs were used for the differences between the dates (in weeks).
77. Final results: labels
78. Final results: component planes
79. Analyzing the results
- The clusters identified from the U-matrix can be examined in the component planes to see if there are any that we can label "good" or "bad," based on the date inputs, which are measures of efficiency.
- The most extreme "bad" cluster can be seen in the upper right corner (see the component plane for Prop-ActualImp).
- A possible cause for the bad results in that cluster may be found in other component planes (by looking for correlation between clusters).
80. Simplifying the analysis
- The understanding gained by examining the component planes can then be put into a simple table showing possible causes for the good or bad results.
- Because the dataset used in this analysis was small, the set of major indicators resulting from the SOM analysis is also small.
81. Major indicators: possible causes
- Implementation of the change was at least 3 months behind schedule.
- The FWP took over a month to review and approve.
- The FWP took over 4 months to be submitted after the PWP.
- The supplier was number 28.
- The review body was number 3 or 5.
82. Conclusions
- Pre-processing is the largest barrier to application of the SOM for data mining.
- The Rob Davis thesis offers solutions for representing a few different types of data.
- Some measure of results for the data needs to be included in order to get meaningful conclusions from the analysis.
83. Structure of the presentation
- Background
- The algorithm
- Analysis
- Case 1: motor data analysis
- Case 2: auditory data analysis
- Case 3: supplier change control
- A qualitative comparison
- Advanced issue: topology preserving
- Conclusions
84. A qualitative comparison
85. (figure only; no transcript)
86. Decision tree methods
- Human-understandable representation: the decision tree can be converted into a set of if-then rules to improve human readability.
- Robust to errors in classification of the training set.
- A testing set and pruning algorithms are used to minimize the tree size and the misclassification.
- Some can be used even when some training examples have unknown values.
87. Discriminant analysis methods
- Perform well when the classes can be approximately described by multivariate normal distributions with identical covariance matrices.
- Limitations and dangers:
- Primarily effective for continuous explanatory variables.
- The use of the Mahalanobis distance to the centroid of a class will lead to high error rates if unusually shaped classes are encountered.
- Dependent on the estimate of the covariance matrix.
88. Multiple-partition decision tree
- Limited to ordinal or nominal data; execution speed is an issue when continuous variables are used.
- High error rates if classes are shaped like two circles but with overlapping projections on some axes.
- Over-fitting with continuous data.
- A big (or even middle-sized) decision tree greatly decreases interpretability.
89. Structure of the presentation
- Background
- The algorithm
- Analysis
- Case 1: motor data analysis
- Case 2: auditory data analysis
- Case 3: supplier change control
- A qualitative comparison
- Advanced issue: topology preserving
- Conclusions
90. Topology distortion by SOM
- Dimension mismatch (left) and structure mismatch (right) when using a predetermined SOM to learn input having a spiral topology (left) or several disconnected manifolds (right).
91. Mapping from the input space to G
- Definition of topology preserving
92. Inverse mapping from G to the input space
93. Definition of topology preserving
94. The DTRN structure
- Each node i, i = 1, 2, ..., L, has an associated weight vector W_i.
- Each node has synaptic links S_i = (s_i1, s_i2, ..., s_iL) and an age factor t_i = (t_i1, t_i2, ..., t_iL).
- If s_ij = 1, then i and j are adjacent.
- L, s_ij, and W_i are dynamically updated during the learning process, i, j = 1, 2, ..., L.
95. The DTRN algorithm
- Step 0 (initialization): start with only one output node.
- Step 1: find the first and second winners.
96. The DTRN algorithm (continued)
97. Topology preserving
- Edges of the output graph are built using the Hebbian competitive learning rule, sketched below.
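A hedged Python sketch of a competitive Hebbian edge update in the spirit of DTRN (the age limit and the exact bookkeeping are assumptions, not the published algorithm):

    import numpy as np

    def hebbian_edge_update(X, W, S, T, age_max=50):
        """Connect the first and second winners for input X, age the
        first winner's other edges, and drop edges past age_max.

        S : (L, L) 0/1 adjacency matrix; T : (L, L) edge-age matrix.
        """
        d = np.linalg.norm(W - X, axis=1)
        c1, c2 = np.argsort(d)[:2]            # first and second winners
        T[c1, S[c1] == 1] += 1                # age existing edges of c1
        T[S[:, c1] == 1, c1] += 1
        S[c1, c2] = S[c2, c1] = 1             # connect the two winners
        T[c1, c2] = T[c2, c1] = 0             # fresh edge has age zero
        stale = T > age_max
        S[stale] = 0                          # remove stale links
        T[stale] = 0
        return c1, c2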
98. Dynamic size
99. Annealing and clustering
100. Four sets of 2D data
101. Simulations by DTRN
102. Structure of the presentation
- Background
- The algorithm
- Analysis
- Case 1: motor data analysis
- Case 2: auditory data analysis
- Case 3: supplier change control
- A qualitative comparison
- Advanced issue: topology preserving
- Conclusions
103. Conclusions
- SOM is a general-purpose clustering tool that preserves the topology of the input data.
- It also provides visualization capability.
- It is a statistical tool.
- Topology can become an issue in some cases, but new algorithms are available.
- Preprocessing is key in many successful applications.