Title: The Self-Organizing Map and Applications
1. The Self-Organizing Map and Applications
- Jennie Si, Ph.D.
- Professor
- Dept. of Electrical Engineering
- Center for Systems Science and Engineering
Research - Arizona State University
- (480) 965-6133 (voice)
- (480) 965-0461 (fax)
- si_at_asu.edu (email)
2. Structure of the presentation
- Background
- The algorithm
- Analysis
- Case 1: motor data analysis
- Case 2: auditory data analysis
- Case 3: supplier change control
- A qualitative comparison
- Advanced issue: topology preserving
- Conclusions
3. References
- J. Si, S. Lin, and M. A. Vuong, "Dynamic topology representing network," Neural Networks, the official journal of the International Neural Network Society, 13 (6), pp. 617-627, 2000.
- S. Lin and J. Si, "Weight convergence and weight density distribution of the SOFM network with discrete input," Neural Computation, 10 (4), pp. 807-814, 1998.
- S. Lin, J. Si, and A. B. Schwartz, "Self-organization of firing activities in monkey's motor cortex: trajectory computation from spike signals," Neural Computation, the MIT Press, March 1997, pp. 607-621.
- R. Davis, "Industry data mining using the self-organizing map," M.S. thesis, Arizona State University, May 2001.
4. Evolution of the SOM algorithm
- Von der Malsburg, 1970s: the self-organization of orientation-sensitive nerve cells in the striate cortex
- Willshaw and von der Malsburg, 1976: the first paper on the formation of self-organizing maps on biological grounds, explaining retinotopic mapping from the retina to the visual cortex (in higher vertebrates)
- Kohonen, 1982: the paper on the self-organizing map, "Self-organized formation of topologically correct feature maps," in Biological Cybernetics
5. SOM - a computational shortcut
- To mimic basic functions similar to biological neural networks
- Implementation details of biological systems are ignored
- To create an ordered map of input signals
- Based on the internal structure of the input signals themselves
- Coordination of the unit activities through the lateral connections between the units
- A statistical data modeling tool
6. Two distinct properties of SOM
- Clustering of multidimensional input data
- Spatial ordering of the output map, so that similar input patterns tend to produce a response in units that are close to each other in the output map (topology preserving)
- Nodes in the output layer represent clustering information from the input data
7. Applications of SOM
- Speech processing
- Vector quantization
- Image coding
- Biological signal analysis
- Visualization of results from multi-dimensional data analysis
- And many, many more
8. Structure of the presentation
- Background
- The algorithm
- Analysis
- Case 1: motor data analysis
- Case 2: auditory data analysis
- Case 3: supplier change control
- A qualitative comparison
- Advanced issue: topology preserving
- Conclusions
9. The output map - an undirected graph
10. The self-organizing map
(figure: output graph G, weight vectors W, input vector X)
11. Learning the topological mapping from input
- The graph (output map) is usually pre-specified as a one-dimensional chain or a two-dimensional lattice.
- SOM intends to learn the topological mapping by means of self-organization driven by samples X.
- X is assumed to be connected in parallel to every node in the output map.
- A node in G, associated with a weight vector W, can be represented by its index i or by its position.
12. SOM building block
- Find the winner and the neighborhood of the winner by comparing the inner products W_i^T X, for i = 1, 2, ..., L, and selecting the node with the largest inner product.
- If the weight vectors W_i are normalized, the inner-product criterion is equivalent to the minimum Euclidean distance measure:

$$c(X) = \arg\min_{i} \| X - W_i \|, \qquad i = 1, 2, \ldots, L$$

- c(X) indicates the output node whose weight vector matches the input vector X best.
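A minimal Python sketch of the winner search (NumPy assumed; names are illustrative, not from any particular toolbox):

    import numpy as np

    def find_winner(X, W):
        """Return the index c(X) of the best-matching unit.

        X : (d,) input vector; W : (L, d) weight matrix, one row per node.
        Minimum Euclidean distance is used; with normalized rows of W this
        is equivalent to maximizing the inner products W_i^T X.
        """
        return int(np.argmin(np.linalg.norm(W - X, axis=1)))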
13. SOM building block (continued)
- Adaptive process (by Oja, 1982)
- The weight vectors inside the neighborhood of the winner are usually updated by a Hebbian-type learning law.
- The negative component is a nonlinear forgetting term.
14. Discrete-time update format for the adaptive process
- Simplifying the equation, with
- y_i(t) = 1 if node i is inside the neighborhood of the winner c
- y_i(t) = 0 otherwise
- gives the discrete-time format:

$$W_i(t+1) = W_i(t) + \alpha(t)\, y_i(t)\, [\, X(t) - W_i(t)\, ]$$
15. SOM algorithm implementation
1. Select a winner c in the map by
$$c = \arg\min_{i} \| X - W_i \|$$
2. Update the weights in the neighborhood of c by
$$W_i(t+1) = W_i(t) + \alpha(t)\, h_{ci}(t)\, [\, X(t) - W_i(t)\, ]$$
where h_{ci} is the neighborhood function, e.g. a Gaussian centered at the winner's grid position r_c:
$$h_{ci}(t) = \exp\!\left( - \frac{ \| r_i - r_c \|^2 }{ 2 \sigma^2(t) } \right)$$
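A hedged Python sketch of one full training iteration built from these two steps (Gaussian neighborhood; all names are illustrative):

    import numpy as np

    def som_step(X, W, pos, alpha, sigma):
        """One SOM iteration: select the winner c, then pull the weights
        of nodes near c on the output grid toward the input X.

        pos : (L, 2) grid coordinates r_i of the output nodes.
        """
        c = int(np.argmin(np.linalg.norm(W - X, axis=1)))   # winner c(X)
        d2 = np.sum((pos - pos[c]) ** 2, axis=1)            # ||r_i - r_c||^2
        h = np.exp(-d2 / (2.0 * sigma ** 2))                # h_ci(t)
        W += alpha * h[:, None] * (X - W)                   # update rule above
        return W, c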
16. Neighborhood function
The neighborhood is large at first and shrinks over time.
17. (figure only; no transcript)
18. Learning rate α(t)
- Essential for convergence
- Large enough for the network to adapt quickly to new training patterns
- Small enough for stability, so that the network does not forget the experience from past training patterns
- A decreasing function of time
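For illustration, one plausible pair of decreasing schedules in Python (exponential decay is an assumption; the slides do not fix a specific form):

    def schedules(t, t_max, a0=0.5, a1=0.01, s0=5.0, s1=0.5):
        """Learning rate alpha(t) and neighborhood radius sigma(t):
        large early for fast adaptation, small late for stability."""
        frac = t / float(t_max)
        alpha = a0 * (a1 / a0) ** frac   # decays from a0 to a1
        sigma = s0 * (s1 / s0) ** frac   # decays from s0 to s1
        return alpha, sigma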
19. SOM software implementation
- Matlab Neural Networks Toolbox
- SOM Toolbox, created by a group at the Helsinki University of Technology
- The SOM Toolbox can be downloaded from http://www.cis.hut.fi/projects/somtoolbox/about.html
20. Structure of the presentation
- Background
- The algorithm
- Analysis
- Case 1: motor data analysis
- Case 2: auditory data analysis
- Case 3: supplier change control
- A qualitative comparison
- Advanced issue: topology preserving
- Conclusions
21. Weight convergence (assumptions)
- The input has a discrete probability density.
- The learning rate α(k) satisfies the conditions of Robbins and Monro (1951), given below.
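The Robbins-Monro step-size conditions require the learning rate to decay, but not too quickly:

$$\sum_{k=1}^{\infty} \alpha(k) = \infty, \qquad \sum_{k=1}^{\infty} \alpha^2(k) < \infty .$$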
22. Weight convergence (results)
- The SOM algorithm (locally or globally) minimizes the objective function given below.
- The weights converge almost surely to a stationary solution if the stationary solution exists.
(Lin and Si, "Weight value convergence of the SOM algorithm for discrete input")
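For a discrete input set {x_k}, this objective is commonly written in the energy form (a reconstruction consistent with the notation above, not a verbatim formula):

$$J(W) = \frac{1}{2} \sum_{k} \sum_{i=1}^{L} h_{c(x_k)\,i} \, \| x_k - W_i \|^2 .$$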
23. Voronoi polyhedra
- Voronoi polyhedra on R^n
- Masked Voronoi polyhedra on the support of the input distribution
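The Voronoi cell of node i collects all inputs closest to its weight vector; the masked version restricts each cell to where the input actually lives:

$$V_i = \left\{ x \in \mathbb{R}^n : \| x - W_i \| \le \| x - W_j \| \ \text{ for all } j \right\} .$$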
24. Some extreme cases
- Assume the neighborhood function is constant, N_c(k) = N_c, in the final learning phase. The stationary weights then depend on the choice of N_c; the two extreme choices are examined next.
25. Extreme case 1
- The neighborhood covers the entire output map, i.e. N_c = {1, 2, ..., L}.
- Each weight vector converges to the same stationary state, which is the mass center of the training data set.
- To eliminate the effect of initial conditions, we should use a neighborhood function covering a large range of the output map.
26. Extreme case 2
- The neighborhood radius equals 0, i.e. N_c = {c}.
- The W_i become the centroids of the cells of the Voronoi partition of the inputs, and the final iterations of SOM become a sequential updating process of vector quantization.
- SOM can thus be used for vector quantization by shrinking the range of the neighborhood function to zero during the learning process.
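In symbols, for K training samples the two limits of the stationary weights are:

$$N_c = \{1, \ldots, L\}: \quad W_i \to \frac{1}{K} \sum_{k=1}^{K} x_k \quad \text{(mass center)};$$

$$N_c = \{c\}: \quad W_i \to \frac{1}{|V_i|} \sum_{x_k \in V_i} x_k \quad \text{(centroid of its Voronoi cell)}.$$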
27. Observations
- The Robbins-Monro algorithm ensures weight convergence to a root of dJ/dW_i = 0 almost surely, if the root exists.
- In practice the weights would only converge to local minima.
- It has been observed that SOM is capable, to some extent, of escaping from local minima when it is used for vector quantization (McAuliffe, 1990).
- Topological ordering of the weights has not been explicitly proved, but it remains a well-observed property in many applications.
28. Structure of the presentation
- Background
- The algorithm
- Analysis
- Case 1: motor data analysis
- Case 2: auditory data analysis
- Case 3: supplier change control
- A qualitative comparison
- Advanced issue: topology preserving
- Conclusions
29. Monkey motor cortical data analysis to interpret movement intention
- Collaborators: Andy Schwartz, Siming Lin
30. Motor Experiment Overview
31. Firing rate calculation
- dt: bin size; T_i / T_j: start / end times of the i-th bin [T_i, T_j], with T_j = T_i + dt; d_ij: the firing rate of the i-th bin.
- Calculation of d_ij: the number of spike intervals overlapping with the i-th bin is first determined (3 in the example shown). Counting from left to right, 30% of the first interval, 100% of the second interval, and 50% of the third interval are located in the i-th bin; summing these fractions and dividing by the bin size gives the firing rate.
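A hedged Python sketch of that fractional-interval rule (NumPy assumed; names are illustrative):

    import numpy as np

    def bin_firing_rate(spike_times, t_start, dt):
        """Rate for one bin [t_start, t_start + dt]: each interspike
        interval contributes the fraction of it that overlaps the bin,
        and the summed fractions are divided by the bin size."""
        t_end = t_start + dt
        isi_start = np.asarray(spike_times[:-1])
        isi_end = np.asarray(spike_times[1:])
        overlap = np.clip(np.minimum(isi_end, t_end)
                          - np.maximum(isi_start, t_start), 0.0, None)
        fractions = overlap / (isi_end - isi_start)   # e.g. 0.3, 1.0, 0.5
        return fractions.sum() / dt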
32. Self-organizing application: motor cortical information processing
- Spike signal and feature extraction
- Computation models using SOM
- Visualization of firing patterns of the motor cortex
- Neural trajectory computation
- Weights are adaptively updated by the average discharge rates
- Input: average discharge rates of 81 cells
- Output: a two-dimensional grid; each node codes the movement direction from the average discharge rates
33. The Self-Organized Map of Discharge Rates from 81 Neurons in the Center-Out Task
34. The Self-Organized Map of Discharge Rates from 81 Neurons in the Center-Out Task
35. The Self-Organized Map of Discharge Rates from 81 Neurons in the Spiral Task
36. Neural Directions: Four Trials for Training, One for Testing in the Spiral Task
37. Neural Trajectory: Four Trials for Training, One for Testing (left: monkey finger trajectory; right: SOM-predicted trajectory)
38. Neural Trajectory: Data from Spiral Tasks and Center-Out Tasks for Training (left: monkey finger trajectory; right: SOM-predicted trajectory)
39. Neural Trajectory: Average Testing Result from Five Trials Using Leave-K-Out (left: monkey finger trajectory; right: SOM-predicted trajectory)
40. Trajectory Computation Error in 100 Bins
41. Structure of the presentation
- Background
- The algorithm
- Analysis
- Case 1: motor data analysis
- Case 2: auditory data analysis
- Case 3: supplier change control
- A qualitative comparison
- Advanced issue: topology preserving
- Conclusions
42. Guinea pig auditory cortical data analysis to interpret its perception of sound
- Collaborators: Russ Witte, Jing Hu, Daryl Kipke
43. Surgical implant and neural recordings
(figure: sample signal from a single electrode; microvolts vs. time in seconds)
- Each of 60 frequencies spanning 6 octaves was repeated 10 times. Each stimulus interval lasted 700 ms, including 200 ms of tone-on time and 500 ms of off time (interstimulus interval).
44. Raster plot: a 6 s snapshot
45. Averaged spike count of 22 channels (figure panels: 1701 Hz and 1473 Hz)
46. Spike rate of channel 1, stimuli 1-60
47. Data processing
- Channel selection: 30 channels were selected.
- Bin width: the basic unit to hold spike counts from the experimental data; 5 ms, 10 ms, 20 ms, or higher. A 70 ms bin size was used.
- Noise filtering: apply a Gaussian filter to the binned data (see the sketch after this list).
- Frequency grouping: 12 out of 60 stimuli were selected, i.e. approximately one frequency per half octave.
- Trial loop selection / leave-1-out: among the 10 loops of experimental data, take 9 for training and leave one for testing.
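A minimal sketch of the binning and Gaussian noise-filtering steps (SciPy assumed; the 70 ms bin size follows the slide, while the smoothing width is an assumption):

    import numpy as np
    from scipy.ndimage import gaussian_filter1d

    def bin_and_smooth(spike_times, t_end, bin_ms=70.0, sigma_bins=1.0):
        """Bin spike times (ms) into counts, then smooth the counts
        with a 1-D Gaussian filter to suppress noise."""
        edges = np.arange(0.0, t_end + bin_ms, bin_ms)
        counts, _ = np.histogram(spike_times, bins=edges)
        return gaussian_filter1d(counts.astype(float), sigma=sigma_bins)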
48. SOM training and testing
- Input vector: bins from all channels are combined. Training/testing patterns come from all loops.
- Output: a two-dimensional grid. The position of each node in the map corresponds to a certain spike pattern.
- Training parameters: map size (e.g. 10 x 10), learning rate (0.02 / 0.0001), neighborhood function (Gaussian) and radius (e.g. 10 / 0), training/fine-tuning epochs, reduction schedules, etc.
- Calibration after training: label the output nodes using the labels (frequencies) of the training data, as sketched below.
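A hedged sketch of the calibration pass: each output node is labeled with the majority frequency among the training patterns it wins (names are illustrative):

    import numpy as np
    from collections import Counter, defaultdict

    def calibrate(W, X_train, freqs):
        """Map each winning node index to its most common stimulus
        frequency among the training patterns it best matches."""
        votes = defaultdict(Counter)
        for x, f in zip(X_train, freqs):
            c = int(np.argmin(np.linalg.norm(W - x, axis=1)))
            votes[c][f] += 1
        return {c: cnt.most_common(1)[0][0] for c, cnt in votes.items()}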
49. Tonotopic maps: natural vs. SOM
(figure: predicted stimulus map with labels 1473, 1028, and 717 Hz, alongside the auditory cortex map)
- Neuronal activities of one channel lead to the predicted stimulus map; the topology is preserved.
- Channel 12 has a narrow tuning curve. It is mostly tuned to 700-1500 Hz auditory tones.
50. Results from 10 sessions using leave-k-out
51. Structure of the presentation
- Background
- The algorithm
- Analysis
- Case 1: motor data analysis
- Case 2: auditory data analysis
- Case 3: supplier change control
- A qualitative comparison
- Advanced issue: topology preserving
- Conclusions
52. SOM for supplier change control analysis
- Collaborator: Rob Davis
53. SOM analysis methods
- U-matrices were used to find clusters in the data.
- Hit histograms were used to see the density of the input data on the organized output map.
- Empty maps with a label for all the data vectors were used to check the organization.
- Component planes were used to determine the characteristics of the clusters.
54. SOM: U-matrix computation
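One standard way to compute a U-matrix, averaging the distance from each node's weight vector to those of its 4-connected grid neighbors (a sketch; the exact variant used in the thesis may differ):

    import numpy as np

    def u_matrix(W_grid):
        """W_grid : (rows, cols, d) SOM weights on a rectangular grid.
        Returns a (rows, cols) array; large values mark cluster borders."""
        rows, cols, _ = W_grid.shape
        U = np.zeros((rows, cols))
        for r in range(rows):
            for c in range(cols):
                dists = []
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        dists.append(np.linalg.norm(W_grid[r, c] - W_grid[rr, cc]))
                U[r, c] = np.mean(dists)
        return U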
55. U-matrix example
56. Methodology: knowledge discovery
57. Description of the data used
- Supplier change control data
- Gun shop analogy for change control
- The data used describes a specific review process, but the methods outlined could be applied to data describing virtually any process.
58. Data fields
- MHSNum: unique labels
- Supplier: supplier name and location
- Commodity: group within Manufacturing Inc.
- Title: title of the change
- Description: description of the change
- Platform: the platform that the product is used in
- Reason: reason for the change being proposed
- InternalChangeClass: internal risk level identifier
- SupplierChangeClass: external risk level identifier
- ReviewBody: name of the internal group reviewing the change
- ChangeStatus: the status of the change in the review process
59. Data fields (continued)
- Owner: internal contact for the proposed change
- PropImpDate: the proposed implementation date for the change
- PWPDate: the date that the preliminary white paper (PWP) was submitted
- PWPApproval: the date that the PWP was approved
- FWPDate: the date that the final white paper (FWP) was submitted
- FWPApproval: the date that the FWP was approved
- ActualImpDate: the date that the change was actually implemented
60. Coding the data: main concerns
- Weight given to each variable
- Resolution of variables
- How to represent variables with multiple values (platforms, owners)
- How to represent the date information
61. Initial results: hit histogram
- Multiple platforms or owners were given contrived fractional values.
- Date variables were excluded, as well as title and description.
62. Initial results: labels
63. Initial results: component planes
64. Coding the data: solutions
- 10 x 10 pre-SOMs were used for variables with multiple values (platforms, owners).
- BMUs of the platform and owner pre-SOMs were used as two inputs to the main SOM, one for each digit of the BMU.
- Two inputs were used for the supplier variable, one for each digit.
- Dates were analyzed separately, and then the differences between them were used as inputs to the main SOM.
65. Updated dataset
- With continuous data, at some point there needs to be an update of the dataset.
- The original data was taken from the database on 1/12/01, and an updated dataset was retrieved on 3/7/01.
- The original dataset had 112 records; the updated one had 145. The original dataset had 43 records with date information; the updated one had 66.
66. Dealing with the dates
- Date format: YYWW, where YY is the last two digits of the year and WW is the work week (1-53).
- Dates were coded and put into their own SOM, with 3 variables for each date (one for the year and one for each of the two work-week digits). The results were organized, but so what?
- The differences in the dates were then analyzed using charts and histograms; a small coding sketch follows.
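A small sketch of the YYWW encoding and the week differences (ISO work weeks assumed; the original coding may differ in detail):

    from datetime import date

    def to_yyww(d: date) -> str:
        """Encode a date as YYWW: two-digit year plus ISO work week."""
        return f"{d.year % 100:02d}{d.isocalendar()[1]:02d}"

    def weeks_between(d1: date, d2: date) -> int:
        """Difference in whole weeks, as used for the date-difference inputs."""
        return (d2 - d1).days // 7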
67. Proposed vs. Actual Implementation Chart
68. Proposed vs. Actual Implementation Histogram
69. Throughput Times Chart
70. PWP Process Time Histogram
71. PWP to FWP Time Histogram
72. FWP Process Time Histogram
73. FWP Approval to Implementation Histogram
74. Putting it all together
- In order to discover useful knowledge from an SOM, a couple of things are necessary:
- The data needs to be accurately represented.
- Some measure of results or efficiency has to be included in the data; in this case the date information is a measure of the efficiency of the change control process.
75. Including the dates
- One reason for getting an updated dataset was to get more records with date information.
- When the dates were included in the main SOM, only the records with date information were used (66 in the new dataset), so that they would be accurately represented (vs. using average values for most of the data in the date fields).
76. Final results: dates included
- The dataset used for the final results includes 17 variables and 66 records.
- Five inputs were used for the differences between the dates (in weeks).
77. Final results: labels
78. Final results: component planes
79. Analyzing the results
- The clusters identified from the U-matrix can be examined in the component planes to see if there are any that we can label "good" or "bad," based on the date inputs, which are measures of efficiency.
- The most extreme "bad" cluster can be seen in the upper right corner (see the component plane for Prop-ActualImp).
- A possible cause for the bad results in that cluster may be found in other component planes (by looking for correlation between clusters).
80. Simplifying the analysis
- The understanding gained by examining the component planes can then be put into a simple table showing possible causes for the good or bad results.
- Because the dataset used in this analysis was small, the set of major indicators resulting from the SOM analysis is also small.
81. Major indicators: possible causes
- Implementation of the change was at least 3 months behind schedule.
- The FWP took over a month to review and approve.
- The FWP took over 4 months to be submitted after the PWP.
- The supplier was number 28.
- The review body was number 3 or 5.
82. Conclusions
- Pre-processing is the largest barrier to application of the SOM for data mining.
- The Rob Davis thesis offers solutions for representing a few different types of data.
- Some measure of results for the data needs to be included in order to get meaningful conclusions from the analysis.
83. Structure of the presentation
- Background
- The algorithm
- Analysis
- Case 1: motor data analysis
- Case 2: auditory data analysis
- Case 3: supplier change control
- A qualitative comparison
- Advanced issue: topology preserving
- Conclusions
84. A qualitative comparison
85. (figure only; no transcript)
86. Decision tree methods
- Human-understandable representation: the decision tree can be converted into a set of if-then rules to improve human readability.
- Robust to errors in classification of the training set.
- A testing set and pruning algorithms are used to minimize the tree size and the misclassification.
- Some can be used even when some training examples have unknown values.
87. Discriminant analysis methods
- Perform well when the classes can be approximately described by multivariate normal distributions with identical covariance matrices.
- Limitations and dangers:
- Primarily effective for continuous explanatory variables.
- The use of the Mahalanobis distance to the centroid of a class will lead to high error rates if unusually shaped classes are encountered.
- Dependent on the estimate of the covariance matrix.
88. Multiple-partition decision tree
- Limited to ordinal or nominal data; execution speed is an issue when continuous variables are used.
- High error rates if classes are shaped like two circles but with overlapping projections on some axes.
- Over-fitting with continuous data.
- A big (or even middle-sized) decision tree greatly decreases interpretability.
89. Structure of the presentation
- Background
- The algorithm
- Analysis
- Case 1: motor data analysis
- Case 2: auditory data analysis
- Case 3: supplier change control
- A qualitative comparison
- Advanced issue: topology preserving
- Conclusions
90. Topology distortion by SOM
- Dimension mismatch (left) and structure mismatch (right) when using a predetermined SOM to learn input having a spiral topology (left) or several disconnected manifolds (right).
91. Mapping from the input space to G
- Definition of topology preserving
92. Inverse mapping from G to the input space
93. Definition of topology preserving
94. The DTRN structure
- Each node i, i = 1, 2, ..., L, has an associated weight vector W_i.
- Each node has synaptic links S_i = (s_i1, s_i2, ..., s_iL) and an age factor t_i = (t_i1, t_i2, ..., t_iL).
- If s_ij = 1, then i and j are adjacent.
- L, s_ij, and W_i are dynamically updated during the learning process, i, j = 1, 2, ..., L.
95. The DTRN algorithm
- Step 0 (initialization): start with only one output node.
- Step 1: find the first and second winners.
96. The DTRN algorithm (continued)
97. Topology preserving
- Edges of the output graph are built using the Hebbian competitive learning rule, sketched below.
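A hedged Python sketch of a competitive Hebbian edge update in the spirit of DTRN (the age limit and the exact bookkeeping are assumptions, not the published algorithm):

    import numpy as np

    def hebbian_edge_update(X, W, S, T, age_max=50):
        """Connect the first and second winners for input X, age the
        first winner's other edges, and drop edges past age_max.

        S : (L, L) 0/1 adjacency matrix; T : (L, L) edge-age matrix.
        """
        d = np.linalg.norm(W - X, axis=1)
        c1, c2 = np.argsort(d)[:2]            # first and second winners
        T[c1, S[c1] == 1] += 1                # age existing edges of c1
        T[S[:, c1] == 1, c1] += 1
        S[c1, c2] = S[c2, c1] = 1             # connect the two winners
        T[c1, c2] = T[c2, c1] = 0             # fresh edge has age zero
        stale = T > age_max
        S[stale] = 0                          # remove stale links
        T[stale] = 0
        return c1, c2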
98. Dynamic size
99. Annealing and clustering
100. Four sets of 2D data
101. Simulations by DTRN
102. Structure of the presentation
- Background
- The algorithm
- Analysis
- Case 1: motor data analysis
- Case 2: auditory data analysis
- Case 3: supplier change control
- A qualitative comparison
- Advanced issue: topology preserving
- Conclusions
103. Conclusions
- SOM is a general-purpose clustering tool that preserves the topology of the input data.
- It also provides visualization capability.
- It is a statistical tool.
- Topology can become an issue in some cases, but new algorithms are available.
- Preprocessing is key in many successful applications.