The Self-Organizing Map and Applications

Transcript and Presenter's Notes
1
The Self-Organizing Map and Applications
  • Jennie Si, Ph.D.
  • Professor
  • Dept. of Electrical Engineering
  • Center for Systems Science and Engineering
    Research
  • Arizona State University
  • (480) 965-6133 (voice)
  • (480) 965-0461 (fax)
  • si_at_asu.edu (email)

2
  • Structure of the presentation
  • Background
  • The algorithm
  • Analysis
  • Case 1: motor data analysis
  • Case 2: auditory data analysis
  • Case 3: supplier change control
  • A qualitative comparison
  • Advanced issue: topology preserving
  • Conclusions

3
References
J. Si, S. Lin, and M.-A. Vuong, "Dynamic topology
representing networks," Neural Networks, the
Official Journal of the International Neural
Network Society, 13 (6): 617-627, 2000.
S. Lin and J. Si, "Weight convergence and weight
density distribution of the SOFM network with
discrete input," Neural Computation, 10 (4):
807-814, 1998.
S. Lin, J. Si, and A. B. Schwartz,
"Self-Organization of Firing Activities in
Monkey's Motor Cortex: Trajectory Computation
from Spike Signals," Neural Computation, the MIT
Press, March 1997, pp. 607-621.
R. Davis, "Industry data mining using the
self-organizing map," M.S. thesis, Arizona State
University, May 2001.
4
Evolution of the SOM algorithm
  • von der Malsburg, 1970s, the self-organization
    of orientation-sensitive nerve cells in the
    striate cortex
  • Willshaw and von der Malsburg, 1976, the first
    paper on the formation of self-organizing maps
    on biological grounds, explaining retinotopic
    mapping from the retina to the visual cortex (in
    higher vertebrates).
  • Kohonen, 1982, the paper on the self-organizing
    map, "Self-organized formation of topologically
    correct feature maps," in Biological Cybernetics

5
SOM - a computational shortcut
  • To mimic basic functions of biological neural
    networks
  • Implementation details of biological systems
    ignored
  • To create an ordered map of input signals
  • Internal structure of the input signals
    themselves
  • Coordination of the unit activities through the
    lateral connections between the units
  • A statistical data modeling tool

6
Two distinct properties of SOM
  • Clustering of multidimensional input data
  • Spatially ordering the output map so that similar
    input patterns tend to produce a response in
    units that are close to each other in the output
    map.
  • Topology preserving
  • Nodes in the output layer represent clustering
    information from the input data

7
Applications of SOM
  • Speech processing
  • Vector quantization
  • Image coding
  • Biological signal analysis
  • Visualize results from multi-dimensional data
    analysis.
  • And many, many more

8
  • Structure of the presentation
  • Background
  • The algorithm
  • Analysis
  • Case 1: motor data analysis
  • Case 2: auditory data analysis
  • Case 3: supplier change control
  • A qualitative comparison
  • Advanced issue: topology preserving
  • Conclusions

9
The output map - undirected graph
10
The self-organizing map
Output graph G
Weight vector W
Input vector X
11
Learning the topological mapping from input
  • The graph (output map) is usually pre-specified
    as a one-dimensional chain or two dimensional
    lattice.
  • SOM aims to learn the topological mapping by
    means of self-organization driven by samples X
  • X is assumed to be connected in parallel to every
    node in the output map.
  • A node in G, associated with a weight vector W,
    can be represented by its index i or its position

12
SOM building block
  • Find the winner and the neighborhood of the
    winner
  • comparing the inner products Wi^T X, for i = 1,
    2, ..., L, and selecting the node with the
    largest inner product.
  • If the weight vectors Wi are normalized, the
    inner product criterion is equivalent to the
    minimum Euclidean distance measure.
  • c(X) = arg min_i ||X - Wi||, i = 1, 2, ..., L
  • with c(X) indicating the output node whose
    weight vector matches the input vector X the
    best (see the sketch below)
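As a minimal sketch of this winner-selection step (the function and variable names are my own, not from the slides):

```python
import numpy as np

def find_winner(W, x):
    # W: (L, d) array of weight vectors; x: (d,) input vector.
    # c(X) = arg min_i ||X - Wi||; equivalent to picking the
    # largest inner product when the weights are normalized.
    distances = np.linalg.norm(W - x, axis=1)
    return int(np.argmin(distances))
```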

13
SOM building block (continued)
  • Adaptive process (by Oja 1982)
  • The weight vectors inside the neighborhood of the
    winner are usually updated by a Hebbian-type
    learning law.
  • The negative component is a nonlinear forgetting
    term.

14
Discrete-time update format for the adaptive
process
  • Simplifying the equation,
  • yi(t) = 1 if node i is inside the neighborhood
    of the winner c
  • yi(t) = 0 otherwise
  • The discrete-time format obtained (sketched
    below)
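Assuming the standard discrete-time SOM rule Wi(t+1) = Wi(t) + a(t) yi(t) (X(t) - Wi(t)), here is a hedged NumPy sketch of one update step:

```python
import numpy as np

def discrete_time_update(W, x, neighborhood, alpha):
    # neighborhood: indices of nodes inside the winner's
    # neighborhood; yi(t) = 1 for them and 0 otherwise, so
    # only those weights move toward the input x.
    y = np.zeros(len(W))
    y[list(neighborhood)] = 1.0
    return W + alpha * y[:, None] * (x - W)
```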

15
SOM Algorithm implementation
1. Select a winner c in the map by
2. Update the weights in the neighborhood of c
by
Where hc is the neighborhood function, defined
as
16
Neighborhood Function
  • bell-shaped neighborhood

The neighborhood is large at first and shrinks over time
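One common bell-shaped choice is a Gaussian over lattice distance; this is an illustrative sketch assuming a 2-D lattice, not necessarily the slide's exact formula:

```python
import numpy as np

def gaussian_neighborhood(positions, c, sigma):
    # positions: (L, 2) lattice coordinates of the output nodes.
    # h_ci = exp(-||r_i - r_c||^2 / (2 sigma^2)); decreasing
    # sigma over training shrinks the neighborhood, as stated.
    d2 = np.sum((positions - positions[c]) ** 2, axis=1)
    return np.exp(-d2 / (2.0 * sigma ** 2))
```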
17
  • square neighborhood

18
Learning Rate α(t)
  • Essential for convergence
  • Large enough for the network to adapt quickly to
    the new training patterns
  • Small enough for stability, so that the network
    does not forget the experience from past
    training patterns
  • A decreasing function of time (a sketch of such
    a schedule follows)
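A sketch of one such decreasing schedule; the constants are illustrative, not taken from the slides:

```python
def learning_rate(t, a0=0.5, tau=1000.0):
    # Close to a0 early (fast adaptation to new patterns),
    # tending to 0 late (stability, no forgetting of old ones).
    return a0 / (1.0 + t / tau)
```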

19
SOM Software Implementation
  • MATLAB Neural Network Toolbox
  • SOM Toolbox created by a group at the Helsinki
    University of Technology
  • SOM Toolbox downloaded from
    http://www.cis.hut.fi/projects/somtoolbox/about.html

20
  • Structure of the presentation
  • Background
  • The algorithm
  • Analysis
  • Case 1: motor data analysis
  • Case 2: auditory data analysis
  • Case 3: supplier change control
  • A qualitative comparison
  • Advanced issue: topology preserving
  • Conclusions

21
Weight convergence (Assumptions)
  • The input has a discrete probability density
  • The learning rate α(k) satisfies the stochastic
    approximation conditions (Robbins and Monro,
    1951), shown below
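For reference, the Robbins-Monro step-size conditions referred to above are:

```latex
\sum_{k=1}^{\infty} \alpha(k) = \infty,
\qquad
\sum_{k=1}^{\infty} \alpha^{2}(k) < \infty
```

For example, α(k) = 1/k satisfies both conditions.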

22
Weight convergence (Results)
  • SOM algorithm (locally or globally) minimizes the
    objective function
  • Weights converge almost surely to a stationary
    solution if the stationary solution exists

Lin, Si, Weight Value Convergence of the SOM
Algorithm for Discrete Input
23
Voronoi polyhedra
Voronoi polyhedra on R^n
Masked Voronoi polyhedra on
24
Some extreme cases
  • Assume the neighborhood function is constant,
    Nc(k) = Nc, in the final learning phase; then

where
25
Extreme case 1
Neighborhood covers the entire output map, i.e.
Each weight vector converges to the same
stationary state, which is the mass center of the
training data set.
To eliminate the effect of initial conditions, we
should use a neighborhood function covering a
large range of the output map.
26
Extreme case 2
Neighborhood equals 0, i.e.
where
Wi become the centroids of the cells of the
Voronoi partition of the inputs, and the final
iterations of SOM become a sequential updating
process of vector quantization.
SOM could be used for vector quantization by
shrinking the range of the neighborhood function
to zero during the learning process.
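A sketch of this zero-neighborhood limit, in which the SOM step reduces to sequential vector quantization (names are illustrative):

```python
import numpy as np

def vq_update(W, x, alpha):
    # Only the winner moves, so repeated updates drive each Wi
    # toward the centroid of its Voronoi cell, i.e., a
    # sequential k-means-like vector-quantization step.
    c = int(np.argmin(np.linalg.norm(W - x, axis=1)))
    W[c] += alpha * (x - W[c])
    return c
```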
27
Observations
  • The Robbins-Monro algorithm ensures weight
    convergence to a root of dJ/dWi = 0 almost
    surely if the root exists.
  • In practice the weights would only converge to
    local minima.
  • It has been observed that SOM is capable, to
    some extent, of escaping from local minima when
    it is used for vector quantization (McAuliffe,
    1990).
  • Topological ordering of the weights is not
    explicitly proved, but it remains well observed
    in many applications.

28
  • Structure of the presentation
  • Background
  • The algorithm
  • Analysis
  • Case 1: motor data analysis
  • Case 2: auditory data analysis
  • Case 3: supplier change control
  • A qualitative comparison
  • Advanced issue: topology preserving
  • Conclusions

29
Monkey's motor cortical data analysis to
interpret its movement intention
  • Collaborators
  • Andy Schwartz, Siming Lin

30
Motor Experiment Overview
31
Firing rate calculation
dt bin size, Ti / Tj start / end time of the
i-th bin dij the firing rate of the i-th bin
Ti , Tj, Tj Tidt.
Calculation of dij the number of spike intervals
overlapping with the i-th bin is first determined
to be 3. As shown (counting from left to right),
30 of the first interval, 100 of the second
interval, and 50 of the third interval are
located in the I-th bin. Thus the equation above.
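One plausible reading of this overlap rule as code (a sketch; the slide's exact normalization may differ):

```python
def bin_firing_rate(spike_times, t_start, dt):
    # Each inter-spike interval contributes the fraction of its
    # length that falls inside the bin [t_start, t_start + dt);
    # e.g., fractions 0.3 + 1.0 + 0.5 for the slide's example.
    t_end = t_start + dt
    total = 0.0
    for a, b in zip(spike_times[:-1], spike_times[1:]):
        overlap = max(0.0, min(b, t_end) - max(a, t_start))
        if b > a:
            total += overlap / (b - a)
    return total / dt
```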

32
Self-Organizing Map Application:
Motor Cortical Information Processing
  • Spike signal and feature extraction
  • Computation models using SOM
  • Visualization of firing patterns of motor cortex
  • Neural trajectory computation
  • Weights are adaptively updated by the average
    discharge rates

Input: average discharge rates of 81 cells.
Output: two-dimensional grid; each node codes the
movement directions from the average discharge
rates.
33
The Self-Organized Map of Discharge Rates from
81 Neurons in the Center-Out Task
34
The Self-Organized Map of Discharge Rates from
81 Neurons in the Center-Out Task
35
The Self-Organized Map of Discharge Rates
from 81 Neurons in the Spiral Task
36
Neural Directions: Four Trials for Training,
One for Testing in the Spiral Task
37
Neural Trajectory: Four Trials for Training, One
for Testing
Left: monkey finger trajectory. Right:
SOM-predicted trajectory
38
Neural Trajectory: Data from Spiral Tasks and
Center-Out Tasks for Training
Left: monkey finger trajectory. Right:
SOM-predicted trajectory
39
Neural Trajectory: Average Testing Result from
Five Trials Using Leave-K-Out
Left: monkey finger trajectory. Right:
SOM-predicted trajectory
40
Trajectory Computation Error in 100 Bins
41
  • Structure of the presentation
  • Background
  • The algorithm
  • Analysis
  • Case 1: motor data analysis
  • Case 2: auditory data analysis
  • Case 3: supplier change control
  • A qualitative comparison
  • Advanced issue: topology preserving
  • Conclusions

42
Guinea pig auditory cortical data analysis to
interpret its perception of sound
Collaborators: Russ Witte, Jing Hu, Daryl Kipke
43
Surgical implant, neural recordings
Sample Signal from a single electrode
Microvolts
Time (sec)
Each of 60 frequencies spanning 6 octaves was
repeated 10 times. Each stimulus interval lasted
700 ms, including 200 ms of tone-on time and 500
ms of off time (interstimulus interval).
44
Raster Plot - 6 s snapshot
45
Averaged spike count of channel 22
1701 Hz
1473 Hz
46
Spike rate of Channel 1, Stimuli 1-60
47
Data Processing
  • Channel selection - 30 channels were selected
  • Bin width - the basic unit to hold spike counts
    from the experimental data: 5 ms, 10 ms, 20 ms,
    or higher. A 70 ms bin size was used.
  • Noise filtering - apply a Gaussian filter to the
    binned data.
  • Frequency grouping - 12 out of 60 stimuli were
    selected, approximately one frequency per half
    octave.
  • Trial loop selection / leave-1-out - among the
    10 loops of experimental data, take 9 for
    training and leave one for testing (see the
    sketch below)
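A hedged sketch of the smoothing and leave-one-out steps described above (array shapes and sigma are assumptions):

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def smooth_counts(spike_counts, sigma=1.0):
    # spike_counts: (n_loops, n_channels, n_bins) binned counts
    # (e.g., 70 ms bins); smooth along the time axis with a
    # Gaussian filter, as in the noise-filtering step above.
    return gaussian_filter1d(spike_counts.astype(float), sigma, axis=-1)

def leave_one_out(data, k):
    # Among the 10 loops, take 9 for training, leave loop k out.
    return np.delete(data, k, axis=0), data[k]
```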

48
SOM Training and Testing
  • Input vector: bins from all channels are
    combined. Training/testing patterns come from
    all loops.
  • Output: 2-dimensional grid. The position of each
    node in the map corresponds to a certain spike
    pattern.
  • Training parameters: map size (e.g., 10 x 10),
    learning rate (0.02/0.0001), neighborhood
    function (Gaussian) and radius (e.g., 10/0),
    training/fine-tuning epochs, reduction
    schedules, etc.
  • Calibration after training: label the output
    nodes using the labels (frequencies) of the
    training data (a sketch follows this list)
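A minimal sketch of the calibration step, labeling each node by the most frequent training label among the patterns it wins (names are my own):

```python
import numpy as np
from collections import Counter

def calibrate(W, X_train, labels):
    # Map each output node to the majority stimulus frequency
    # of the training patterns for which it is the winner;
    # nodes that never win simply remain unlabeled.
    hits = {}
    for x, lab in zip(X_train, labels):
        c = int(np.argmin(np.linalg.norm(W - x, axis=1)))
        hits.setdefault(c, []).append(lab)
    return {c: Counter(v).most_common(1)[0][0] for c, v in hits.items()}
```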

49
Tonotopic maps: natural vs. SOM
[Figure: SOM-predicted stimulus map with frequency
labels 1473, 1028, and 717 Hz]
Neuronal activities of one channel lead to the
predicted stimulus map - preserved topology
Auditory cortex map
Channel 12 has a narrow tuning curve. It is
mostly tuned to 700-1500 Hz auditory tones.
50
Results from 10 sessions using leave-k-out
51
  • Structure of the presentation
  • Background
  • The algorithm
  • Analysis
  • Case 1: motor data analysis
  • Case 2: auditory data analysis
  • Case 3: supplier change control
  • A qualitative comparison
  • Advanced issue: topology preserving
  • Conclusions

52
SOM for supplier change control analysis
Collaborator: Rob Davis
53
SOM Analysis Methods
  • U-matrices were used to find clusters in the data
  • Hit histograms were used to see the density of
    the input data on the organized output map
  • Empty maps with a label for all the data vectors
    were used to check the organization
  • Component planes were used to determine the
    characteristics of the clusters

54
SOM - U-Matrix Computation
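The computation itself appears only as a figure on this slide; below is a hedged sketch of a common simplified variant that assigns each node the average distance to its lattice neighbors (the classical U-matrix also interleaves per-edge cells):

```python
import numpy as np

def u_matrix(W_grid):
    # W_grid: (rows, cols, d) SOM weights on a rectangular
    # lattice. High values mark cluster borders, low values
    # cluster interiors.
    rows, cols, _ = W_grid.shape
    U = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            dists = []
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < rows and 0 <= nj < cols:
                    dists.append(np.linalg.norm(W_grid[i, j] - W_grid[ni, nj]))
            U[i, j] = np.mean(dists)
    return U
```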
55
U-Matrix Example
56
Methodology - Knowledge Discovery
57
Description of Data Used
  • Supplier change control data
  • Gun shop analogy for change control
  • The data used describes a specific review
    process, but the methods outlined could be
    applied to data describing virtually any process

58
Data Fields
  • MHSNum: unique labels
  • Supplier: supplier name and location
  • Commodity: group within Manufacturing Inc.
  • Title: title of change
  • Description: description of change
  • Platform: the platform that the product is used
    in
  • Reason: reason for the change being proposed
  • InternalChangeClass: internal risk level
    identifier
  • SupplierChangeClass: external risk level
    identifier
  • ReviewBody: name of the internal group reviewing
    the change
  • ChangeStatus: the status of the change in the
    review process

59
Data Fields (Continued)
  • Owner: internal contact for the proposed change
  • PropImpDate: the proposed implementation date
    for the change
  • PWPDate: the date that the preliminary white
    paper was submitted
  • PWPApproval: the date that the PWP was approved
  • FWPDate: the date that the final white paper was
    submitted
  • FWPApproval: the date that the FWP was approved
  • ActualImpDate: the date that the change was
    actually implemented

60
Coding the Data Main Concerns
  • Weight given to each variable
  • Resolution of variables
  • How to represent variables with multiple values
    (platforms, owners)
  • How to represent the date information

61
Initial Results Hit Histogram
  • Multiple platforms or owners were given contrived
    fractional values
  • Date variables excluded, as well as title and
    description

62
Initial Results Labels
63
Initial Results Component Planes
64
Coding the Data - Solutions
  • 10 x 10 Pre-SOMs were used for variables with
    multiple values (platforms, owners)
  • BMUs of the platform and owner pre-SOMs were used
    as two inputs in the main SOM, one for each digit
    of the BMU
  • Two inputs were used for the supplier variable
    one for each digit
  • Dates were analyzed separately, then the
    differences between them were used as inputs to
    the main SOM

65
Updated Dataset
  • With continuous data, at some point there needs
    to be an update of the dataset
  • The original data was taken from the database on
    1/12/01, and an updated dataset was retrieved on
    3/7/01.
  • The original dataset had 112 records; the
    updated one had 145. The original dataset had 43
    records with date information; the updated one
    had 66.

66
Dealing With the Dates
  • Date format: YYWW, where YY is the last two
    digits of the year and WW is the work week
    (1-53)
  • Dates were coded and put into their own SOM, 3
    variables for each date (one for the year and
    one for each of the two work-week digits). The
    results were organized, but so what?
  • The differences in the dates were then analyzed
    using charts and histograms (a coding sketch
    follows)
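A sketch of the YYWW coding and the week differences used as inputs (ISO week numbering is an assumption; the thesis may use a different work-week convention):

```python
from datetime import date

def to_yyww(d):
    # YYWW: last two digits of the year, then the work week.
    iso = d.isocalendar()
    return (iso[0] % 100) * 100 + iso[1]

def week_difference(d1, d2):
    # Date differences in whole weeks, e.g., proposed vs.
    # actual implementation time.
    return (d2 - d1).days // 7

# Example: to_yyww(date(2001, 3, 7)) -> 110, i.e., year 01, week 10.
```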

67
Proposed vs. Actual Implementation Chart
68
Proposed vs. Actual Implementation Histogram
69
Throughput Times Chart
70
PWP Process Time Histogram
71
PWP to FWP Time Histogram
72
FWP Process Time Histogram
73
FWP Approval to Implementation Histogram
74
Putting It All Together
  • To discover useful knowledge from an SOM, a
    couple of things are necessary:
  • The data needs to be accurately represented
  • Some measure of results or efficiency has to be
    included in the data; in this case the date
    information is a measure of the efficiency of
    the change control process

75
Including the Dates
  • One reason for getting an updated dataset was to
    get more records with date information
  • When the dates were included in the main SOM,
    only the records with date information were used
    (66 in the new dataset), so that they would be
    accurately represented (vs. using average values
    for most of the data in the date fields)

76
Final Results - Dates Included
  • The dataset used for the final results includes
    17 variables and 66 records
  • Five inputs were used for the differences between
    the dates (in weeks)

77
Final Results Labels
78
Final Results - Component Planes
79
Analyzing the Results
  • The clusters identified from the U-matrix can be
    examined in the component planes to see if there
    are any that we can label "good" or "bad", based
    on the date inputs, which are measures of
    efficiency
  • The most extreme "bad" cluster can be seen in
    the upper right corner (see the component plane
    for Prop-ActualImp)
  • A possible cause for the bad results in the
    cluster may be found in other component planes
    (by looking for correlations between clusters)

80
Simplifying the Analysis
  • The understanding gained by examining the
    component planes can then be put into a simple
    table showing possible causes for the "good" or
    "bad" results
  • Because the dataset used in this analysis was
    small, the list of major indicators resulting
    from the SOM analysis is also short

81
Major Indicators - Possible Causes
  • Implementation of change was at least 3 months
    behind schedule
  • FWP took over a month to review and approve
  • FWP took over 4 months to be submitted after PWP
  • Supplier was number 28
  • Review Body was number 3 or 5

82
Conclusions
  • Pre-Processing is the largest barrier to
    application of the SOM for data mining
  • The Rob Davis thesis offers solutions for
    representing a few different types of data
  • Some measure of results for the data needs to be
    included in order to get meaningful conclusions
    from analysis

83
  • Structure of the presentation
  • Background
  • The algorithm
  • Analysis
  • Case 1: motor data analysis
  • Case 2: auditory data analysis
  • Case 3: supplier change control
  • A qualitative comparison
  • Advanced issue: topology preserving
  • Conclusions

84
A qualitative comparison
85
(No Transcript)
86
Decision tree methods
  • Human-understandable representation: the
    decision tree can be converted into a set of
    if-then rules to improve human readability.
  • Robust to errors in classification of the
    training set.
  • A testing set and pruning algorithms are used
    to minimize the tree size and the
    misclassification rate.
  • Some can be used even when some training
    examples have unknown values.

87
Discriminant analysis methods
  • Perform well when the classes can be
    approximately described by multivariate normal
    distributions with identical covariance matrices.
  • Limitations and Dangers
  • Primarily effective for continuous explanatory
    variables
  • The use of the Mahalanobis distance to the
    centroid of a class will lead to high error
    rates if unusually shaped classes are
    encountered.
  • Dependent on the estimate of the covariance
    matrix

88
Multiple partition decision tree
  • Limited to ordinal or nominal data; execution
    speed is an issue when continuous variables are
    used.
  • High error rates if classes are shaped like two
    circles but with overlapping projections on some
    axes.
  • Over-fitting with continuous data
  • A big (or even middle-sized) decision tree
    greatly decreases the interpretability

89
  • Structure of the presentation
  • Background
  • The algorithm
  • Analysis
  • Case 1: motor data analysis
  • Case 2: auditory data analysis
  • Case 3: supplier change control
  • A qualitative comparison
  • Advanced issue: topology preserving
  • Conclusions

90
Topology distortion by SOM
Dimension mismatch (left) and structure mismatch
(right) when using a predetermined SOM to learn
input having a spiral topology (left) or several
disconnected manifolds (right)
91
Mapping from the input space to G
Definition of topology preserving
92
Inverse mapping from G to the input space
93
Definition of topology preserving
94
The DTRN Structure
  • Each node i, i = 1, 2, ..., L, has an associated
    weight vector Wi.
  • Each node has synaptic links Si = (si1, si2,
    ..., siL) and an age factor ti = (ti1, ti2, ...,
    tiL).
  • If sij = 1, then i and j are adjacent.
  • L, sij, and Wi are dynamically updated during
    the learning process, i, j = 1, 2, ..., L.

95
The DTRN algorithm
Step 0 (Initialization): start with only one
output node.
Step 1: Find the first and second winners
96
The DTRN algorithm (continued)
97
Topology preserving
Using the Hebbian competitive learning rule
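A hedged sketch of this competitive Hebbian step, connecting the first and second winners and aging/pruning edges (the aging scheme and max_age value are assumptions in the spirit of topology-representing networks, not the slide's exact rule):

```python
import numpy as np

def hebbian_edge_update(W, x, S, age, max_age=50):
    # Find the two closest nodes to x, link them, refresh that
    # link's age, age the winner's other links, prune old ones.
    d = np.linalg.norm(W - x, axis=1)
    c1, c2 = np.argsort(d)[:2]
    nbrs = S[c1] == 1
    age[c1, nbrs] += 1
    age[nbrs, c1] += 1
    S[c1, c2] = S[c2, c1] = 1
    age[c1, c2] = age[c2, c1] = 0
    old = (age[c1] > max_age) & (S[c1] == 1)
    S[c1, old] = 0
    S[old, c1] = 0
    return int(c1), int(c2)
```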
98
Dynamic Size
99
Annealing and clustering
100
Four Sets of 2D data
101
Simulations by DTRN
102
  • Structure of the presentation
  • Background
  • The algorithm
  • Analysis
  • Case 1: motor data analysis
  • Case 2: auditory data analysis
  • Case 3: supplier change control
  • A qualitative comparison
  • Advanced issue: topology preserving
  • Conclusions

103
Conclusions
  • SOM - a general-purpose clustering tool that
    preserves the topology of the input data
  • It also provides visualization capability
  • It is a statistical tool
  • Topology can become an issue in some cases, but
    new algorithms are available
  • Preprocessing is key in many successful
    applications