Self Organization: Hebbian Learning
1
Self Organization: Hebbian Learning
  • CS/CMPE 333 Neural Networks

2
Introduction
  • So far, we have studied neural networks that
    learn from their environment in a supervised
    manner
  • Neural networks can also learn in an unsupervised
    manner; this is known as self-organized learning
  • Self-organized learning discovers significant
    features or patterns in the input data through
    general rules that operate locally
  • Self-organizing networks typically consist of two
    layers with feedforward connections and elements
    to facilitate local learning

3
Self-Organization
  • Global order can arise from local interactions
    (Turing, 1952)
  • An input signal produces certain activity patterns
    in the network <-> weights are modified (feedback
    loop)
  • Principles of self-organization:
  • Modifications in weights tend to self-amplify
  • Limitation of resources leads to competition,
    selection of the most active synapses, and
    disregard of less active synapses
  • Modifications in weights tend to cooperate

4
Hebbian Learning
  • A self-organizing principle was proposed by Hebb
    in 1949 in the context of biological neurons
  • Hebb's principle:
  • When a neuron repeatedly excites another neuron,
    the threshold of the latter neuron is decreased,
    or the synaptic weight between the neurons is
    increased, in effect increasing the likelihood
    that the second neuron will fire
  • Hebbian learning rule:
  • Δw_ji = η y_j x_i
  • No desired or target signal is required in the
    Hebbian rule, hence it is unsupervised learning
  • The update rule is local to the weight
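
A minimal Python/NumPy sketch of the Hebbian update Δw_ji = η y_j x_i for a single linear neuron; the variable names and learning rate are illustrative:

```python
import numpy as np

def hebbian_update(w, x, eta=0.01):
    """One Hebbian step for a single linear neuron: w <- w + eta * y * x."""
    y = w @ x                  # post-synaptic activity (linear activation)
    return w + eta * y * x     # local, unsupervised update: uses only x and y

# toy usage
rng = np.random.default_rng(0)
w = rng.normal(size=3)
x = rng.normal(size=3)
w = hebbian_update(w, x)
```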

5
Hebbian Update
  • Consider the update of a single weight w (x and y
    are the pre- and post-synaptic activities)
  • w(n+1) = w(n) + η x(n) y(n)
  • For a linear activation function y(n) = w(n) x(n):
  • w(n+1) = w(n) [1 + η x²(n)]
  • Weights increase without bound. If the initial
    weight is negative, it grows more negative; if it
    is positive, it grows more positive
  • Hebbian learning is intrinsically unstable, unlike
    error-correction learning with the BP algorithm
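
A small numeric illustration of this instability (the learning rate and data are arbitrary): under the simple Hebbian rule with a linear activation, the weight norm keeps growing:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=3)
eta = 0.05
for n in range(2001):
    x = rng.normal(size=3)             # zero-mean random inputs
    w += eta * (w @ x) * x             # simple Hebbian step
    if n % 500 == 0:
        print(n, np.linalg.norm(w))    # the norm grows without bound
```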

6
Geometric Interpretation of Hebbian Learning
  • Consider a single linear neuron with p inputs
  • y = w^T x = x^T w
  • and
  • Δw = η [x_1 y, x_2 y, ..., x_p y]^T = η y x
  • The dot product can be written as
  • y = ||w|| ||x|| cos(α)
  • α = angle between vectors x and w
  • If α is zero (x and w are aligned), y is large; if
    α is 90° (x and w are orthogonal), y is zero
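
A quick numeric check of this relation (the vectors are arbitrary): the output equals ||w|| ||x|| cos(α), so an input aligned with the weight vector gives a large response:

```python
import numpy as np

w = np.array([1.0, 2.0, 0.5])
x = np.array([0.8, 1.9, 0.6])              # roughly aligned with w

y = w @ x                                  # neuron output
cos_a = y / (np.linalg.norm(w) * np.linalg.norm(x))
print(y, cos_a)                            # large y, cos(a) close to 1
```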

7
Similarity Measure
  • A network trained with Hebbian learning creates a
    similarity measure (the inner product) in its
    input space according to the information
    contained in the weights
  • The weights capture (memorize) the information
    in the data during training
  • During operation, when the weights are fixed, a
    large output y signifies that the present input
    is "similar" to the inputs x that created the
    weights during training
  • Similarity measures:
  • Hamming distance
  • Correlation
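
Small sketches of these similarity measures (the function names are illustrative): the inner product computed by the trained neuron, Hamming distance for binary patterns, and correlation:

```python
import numpy as np

def inner_product_similarity(w, x):
    return float(np.dot(w, x))            # what the trained linear neuron computes

def hamming_distance(a, b):
    # number of mismatching positions, for binary patterns
    return int(np.sum(np.asarray(a) != np.asarray(b)))

def correlation(a, b):
    # Pearson correlation coefficient between two patterns
    return float(np.corrcoef(np.asarray(a, float), np.asarray(b, float))[0, 1])
```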

8
Hebbian Learning as Correlation Learning
  • Hebbian learning (pattern-by-pattern mode):
  • Δw(n) = η y(n) x(n) = η x(n) x^T(n) w(n)
  • Using batch mode:
  • Δw = η Σ_{n=1}^{N} x(n) x^T(n) w(0)
  • The term Σ_{n=1}^{N} x(n) x^T(n) is a sample
    approximation of the autocorrelation of the input
    data
  • Thus Hebbian learning can be thought of as learning
    the autocorrelation of the input space
  • Correlation is a well-known operation in signal
    processing and statistics. In particular, it
    completely describes signals with Gaussian
    distributions
  • Applications in signal processing
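
A short NumPy sketch of the batch-mode view above: the summed update applies a sample estimate of the input autocorrelation matrix to the initial weights (data and constants are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 3))        # N = 1000 samples x(n) of a 3-d input (rows)
eta = 0.001
w0 = rng.normal(size=3)               # initial weight vector w(0)

R_hat = X.T @ X                       # sum_n x(n) x(n)^T: sample autocorrelation (unnormalized)
delta_w = eta * R_hat @ w0            # batch-mode Hebbian update applied to w(0)
```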

9
Oja's Rule
  • The simple Hebbian rule causes the weights to
    increase (or decrease) without bound
  • The weights can be normalized to unit length as
  • w_ji(n+1) = [w_ji(n) + η x_i(n) y_j(n)] /
    √( Σ_i [w_ji(n) + η x_i(n) y_j(n)]² )
  • This equation effectively imposes the constraint
    that the sum of squared weights at a neuron (the
    norm of the weight vector) be equal to 1
  • Oja approximated the normalization (for small η)
    as
  • w_ji(n+1) = w_ji(n) + η y_j(n) [x_i(n) − y_j(n) w_ji(n)]
  • This is Oja's rule, or the generalized Hebbian
    rule
  • It involves a forgetting term that prevents the
    weights from growing without bound
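
A minimal sketch of Oja's rule for a single linear neuron (the data, seed and learning rate are illustrative); the y·w term is the forgetting term that keeps the weight norm near one:

```python
import numpy as np

def oja_step(w, x, eta=0.01):
    """One step of Oja's rule: w <- w + eta * y * (x - y * w)."""
    y = w @ x
    return w + eta * y * (x - y * w)       # y*w is the forgetting (normalizing) term

rng = np.random.default_rng(0)
w = rng.normal(size=3)
for _ in range(5000):
    x = rng.multivariate_normal(np.zeros(3), np.diag([3.0, 1.0, 0.5]))
    w = oja_step(w, x)
print(np.linalg.norm(w))                   # stays close to 1
```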

10
Oja's Rule: Geometric Interpretation
  • The simple Hebbian rule finds the weight vector
    direction of largest variance in the input data
  • However, the magnitude of the weight vector
    increases without bound
  • Oja's rule has a similar interpretation:
    normalization only changes the magnitude, while
    the direction of the weight vector is the same
  • The magnitude is equal to one
  • Oja's rule converges asymptotically, unlike the
    Hebbian rule, which is unstable

11
(No Transcript)
12
The Maximum Eigenfilter
  • A linear neuron trained with Oja's rule produces a
    weight vector that is the principal eigenvector of
    the input autocorrelation matrix, and the variance
    of its output equals the largest eigenvalue
  • A linear neuron trained with Oja's rule solves the
    following eigenvalue problem:
  • R e_1 = λ_1 e_1
  • R = autocorrelation matrix of the input data
  • e_1 = principal eigenvector, which corresponds to
    the weight vector w obtained by Oja's rule
  • λ_1 = largest eigenvalue, which corresponds to the
    variance of the network's output
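
A sketch (with synthetic data and illustrative constants) checking the eigenfilter property: the Oja weight vector aligns with the principal eigenvector of R, and the output variance approaches λ_1:

```python
import numpy as np

rng = np.random.default_rng(0)
C = np.array([[4.0, 1.0, 0.0],
              [1.0, 2.0, 0.0],
              [0.0, 0.0, 0.5]])
X = rng.multivariate_normal(np.zeros(3), C, size=20000)   # zero-mean input data

# train a single linear neuron with Oja's rule
w = rng.normal(size=3)
eta = 0.005
for x in X:
    y = w @ x
    w += eta * y * (x - y * w)

R = X.T @ X / len(X)                      # sample autocorrelation matrix
vals, vecs = np.linalg.eigh(R)
e1 = vecs[:, -1]                          # principal eigenvector (largest eigenvalue)
print(abs(w @ e1))                        # close to 1: w aligns with e1
print(np.var(X @ w), vals[-1])            # output variance is close to lambda_1
```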

13
Principal Component Analysis (1)
  • Oja's rule, when applied to a single neuron,
    extracts the principal component of the input
    space in the form of the weight vector
  • How can we find other components in the input
    space with significant variance?
  • In statistics, PCA is used to obtain the
    significant components of data in the form of
    orthogonal principal axes
  • PCA is also known as the Karhunen-Loève (K-L)
    transform in signal processing
  • First proposed in 1901. Later developments
    occurred in the 1930s, 1940s and 1960s
  • A Hebbian network with Oja's rule can perform PCA

14
Principal Component Analysis (2)
  • PCA:
  • Consider a set of vectors x with zero mean and
    unit variance. There exists an orthogonal
    transformation y = Q^T x such that the covariance
    matrix of y, Λ = E[y y^T], is diagonal:
  • Λ_ij = λ_i if i = j and Λ_ij = 0 otherwise
    (diagonal matrix)
  • λ_1 > λ_2 > ... > λ_p are the eigenvalues of the
    covariance matrix of x, C = E[x x^T]
  • Columns of Q are the corresponding eigenvectors
  • The components of y are the principal components;
    the first has the maximum variance among all
    components
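
A compact NumPy sketch of classical PCA as summarized above: eigendecompose the covariance of zero-mean data and project with y = Q^T x (the data and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
C_true = np.array([[3.0, 1.0, 0.0], [1.0, 2.0, 0.0], [0.0, 0.0, 0.4]])
X = rng.multivariate_normal(np.zeros(3), C_true, size=5000)
X = X - X.mean(axis=0)                  # enforce zero mean

C = X.T @ X / len(X)                    # covariance matrix C = E[x x^T]
vals, Q = np.linalg.eigh(C)             # columns of Q are eigenvectors
order = np.argsort(vals)[::-1]          # sort so that lambda_1 > lambda_2 > ...
vals, Q = vals[order], Q[:, order]

Y = X @ Q                               # y = Q^T x for every sample (rows)
print(np.cov(Y.T).round(2))             # approximately diag(lambda_1, ..., lambda_p)
```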

15
PCA Example
16
Hebbian Network for PCA
17
Hebbian Network for PCA
  • Procedure:
  • Use Oja's rule to find the principal component
  • Project the data onto the subspace orthogonal to
    the principal component
  • Use Oja's rule on the projected data to find the
    next major component
  • Repeat the above for m < p (m = number of desired
    components, p = input space dimensionality)
  • How to find the projection onto the orthogonal
    direction?
  • Deflation method: subtract the principal component
    from the input
  • Oja's rule can be modified to perform this
    operation: Sanger's rule
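
A sketch of this deflation procedure in NumPy (the oja_fit helper and its settings are illustrative, not from the slides): fit one direction with Oja's rule, subtract its projection from the data, and repeat:

```python
import numpy as np

def oja_fit(X, eta=0.005, epochs=5, seed=0):
    """Fit one principal direction with Oja's rule (illustrative helper)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X.shape[1])
    for _ in range(epochs):
        for x in X:
            y = w @ x
            w += eta * y * (x - y * w)
    return w / np.linalg.norm(w)

def pca_by_deflation(X, m):
    """Extract m components via Oja's rule plus deflation."""
    components = []
    Xd = X.copy()
    for _ in range(m):
        w = oja_fit(Xd)
        components.append(w)
        Xd = Xd - np.outer(Xd @ w, w)     # deflation: remove the found component
    return np.array(components)
```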

18
Sanger's Rule
  • Sanger's rule is a modification of Oja's rule
    that implements the deflation method for PCA
  • Classical PCA involves matrix operations
  • Sanger's rule implements PCA in an iterative
    fashion for neural networks
  • Consider p inputs and m outputs, where m < p
  • y_j(n) = Σ_{i=1}^{p} w_ji(n) x_i(n),  j = 1, ..., m
  • and the update (Sanger's rule):
  • Δw_ji(n) = η y_j(n) [x_i(n) − Σ_{k=1}^{j} w_ki(n) y_k(n)]
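
A minimal sketch of Sanger's rule with m outputs and p inputs; W is an m-by-p weight matrix whose row j holds w_j (the names, data and constants are illustrative):

```python
import numpy as np

def sanger_step(W, x, eta=0.001):
    """Sanger update: dw_ji = eta * y_j * (x_i - sum_{k<=j} w_ki * y_k)."""
    y = W @ x                              # y_j = sum_i w_ji x_i, j = 1..m
    LT = np.tril(np.outer(y, y))           # lower-triangular part of y y^T (k <= j)
    return W + eta * (np.outer(y, x) - LT @ W)

# illustrative usage: extract m = 2 components of 3-d data
rng = np.random.default_rng(0)
X = rng.multivariate_normal(np.zeros(3), np.diag([4.0, 2.0, 0.5]), size=20000)
W = 0.1 * rng.normal(size=(2, 3))
for x in X:
    W = sanger_step(W, x)
print(W.round(2))          # rows approach the two leading eigenvectors (up to sign)
```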

19
PCA for Feature Extraction
  • PCA is the optimal linear feature extractor. This
    means that no other linear system can provide
    better features for reconstruction
  • PCA may or may not be the best preprocessing for
    pattern classification or recognition.
    Classification requires good discrimination, which
    PCA might not be able to provide
  • Feature extraction: transform the p-dimensional
    input space to an m-dimensional space (m < p),
    such that the m dimensions capture the information
    with minimal loss
  • The error e in the reconstruction is given by
  • e² = Σ_{i=m+1}^{p} λ_i
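
A numeric check (synthetic data, illustrative constants) that the mean squared reconstruction error with the first m components kept is the sum of the discarded eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.multivariate_normal(np.zeros(4), np.diag([5.0, 2.0, 1.0, 0.3]), size=50000)
X = X - X.mean(axis=0)

vals, Q = np.linalg.eigh(np.cov(X.T))
order = np.argsort(vals)[::-1]
vals, Q = vals[order], Q[:, order]

m = 2
Qm = Q[:, :m]                                   # keep the m leading eigenvectors
X_rec = (X @ Qm) @ Qm.T                         # project to m dims, then reconstruct
mse = np.mean(np.sum((X - X_rec) ** 2, axis=1))
print(mse, vals[m:].sum())                      # both close to lambda_3 + lambda_4
```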

20
PCA for Data Compression
  • PCA identifies an orthogonal coordinate system
    for the input data such that the variance of the
    projection on the principal axis is largest,
    followed by the next major axis, and so on
  • By discarding some of the minor components, PCA
    can be used for data compression, where a
    p-dimensional input is encoded in an m < p
    dimensional space
  • Weights are computed by Sanger's rule on typical
    inputs
  • The de-compressor (receiver) must know the
    weights of the network to reconstruct the
    original signal
  • x̂ = W^T y
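
A sketch of this compression scheme (all names are illustrative; the weights here come from an eigendecomposition, though Sanger's rule could supply them as on the slide): the sender transmits y = W x and the receiver reconstructs x̂ = W^T y:

```python
import numpy as np

def fit_compressor(X, m):
    """Learn W (m x p) from typical inputs; rows are the m leading eigenvectors."""
    vals, Q = np.linalg.eigh(np.cov((X - X.mean(axis=0)).T))
    order = np.argsort(vals)[::-1][:m]
    return Q[:, order].T                        # W, shape (m, p)

def compress(W, x):
    return W @ x                                # y = W x: m numbers instead of p

def decompress(W, y):
    return W.T @ y                              # x_hat = W^T y

# usage sketch
rng = np.random.default_rng(0)
X = rng.multivariate_normal(np.zeros(5), np.diag([6.0, 3.0, 1.0, 0.2, 0.1]), size=10000)
W = fit_compressor(X, m=2)
x_hat = decompress(W, compress(W, X[0]))        # approximate reconstruction of X[0]
```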

21
PCA for Classification (1)
  • Can PCA enhance classification?
  • In general, no. PCA is good for reconstruction,
    not for feature discrimination or classification

22
PCA for Classification (2)
23
PCA: Some Remarks
  • Practical uses of PCA:
  • Data compression
  • Cluster analysis
  • Feature extraction
  • Preprocessing for classification/recognition
    (e.g. preprocessing for MLP training)
  • Biological basis:
  • It is unlikely that the processing performed by
    biological neurons in, say, perception, involves
    only PCA. More complex feature extraction
    processes are involved

24
Anti-Hebbian Learning
  • Modify the Hebbian rule as:
  • Δw_ji(n) = −η x_i(n) y_j(n)
  • The anti-Hebbian rule finds the direction in space
    that has the minimum variance. In other words, it
    is the complement of the Hebbian rule
  • Anti-Hebbian learning performs decorrelation: it
    decorrelates the output from the input
  • The Hebbian rule is unstable, since it tries to
    maximize the variance. The anti-Hebbian rule, on
    the other hand, is stable and converges
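
A minimal sketch of the anti-Hebbian update, i.e. the Hebbian step with its sign flipped (names and constants are illustrative):

```python
import numpy as np

def anti_hebbian_step(w, x, eta=0.01):
    """Anti-Hebbian update: w <- w - eta * y * x (Hebbian rule with the sign flipped)."""
    y = w @ x                   # post-synaptic activity
    return w - eta * y * x      # correlated input/output activity weakens the weight

# toy usage
rng = np.random.default_rng(0)
w = rng.normal(size=3)
for _ in range(1000):
    w = anti_hebbian_step(w, rng.normal(size=3))
```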