1
WK7: Hebbian Learning
CS 476 Networks of Neural Computation
Dr. Stathis Kasderidis
Dept. of Computer Science, University of Crete
Spring Semester, 2009
2
Contents
  • Introduction to Hebbian Learning
  • Definitions on Pattern Association
  • Pattern Association Network
  • Formal Theory of Associations: Building Correlations
  • Examples
  • Conclusions

Contents
3
Hebbian Learning
  • Hebbian learning is the oldest and most famous of all learning rules. It was postulated by Donald Hebb (1949) in his book The Organisation of Behaviour:
  • "When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased."
  • Hebb proposed this change as a basis of associative learning. We may expand this into a two-part rule:

Hebb. Learn.
4
Hebbian Learning-1
  • If two neurons on either side of a synapse
    (connection) are activated simultaneously then
    the strength of that synapse is selectively
    increased.
  • If two neurons on either side of a synapse are
    activated asynchronously, then that synapse is
    selectively weakened or eliminated.
  • Such a synapse is called a Hebbian synapse. More
    precisely, we define a Hebbian synapse as a
    synapse that uses a time-dependent, highly local,
    and strongly interactive mechanism to increase
    synaptic efficiency as a function of the
    correlation between the pre-synaptic and
    post-synaptic activities.

Hebb. Learn.
5
Hebbian Learning-2
  • We analyse the four key mechanisms mentioned above:
  • Time-dependent mechanism: The modifications in the synapse depend on the exact time of occurrence of the pre-synaptic and post-synaptic signals.
  • Local mechanism: By its very nature, a synapse is the transmission site where information-bearing signals are in spatiotemporal contiguity. This locally available information is used by the synapse to produce a local modification that is input specific.
  • Interactive mechanism: The occurrence of a change in a synapse depends on signals on both

Hebb. Learn.
6
Hebbian Learning-3
  • sides of the synapse. That is, a Hebbian form of learning depends on a true interaction between the pre- and post-synaptic signals, in the sense that we cannot make any prediction from either one of these two activities by itself. The interaction may be deterministic or stochastic.
  • Correlational mechanism: The condition for a change in synaptic efficiency is the co-occurrence of pre- and post-synaptic signals. The correlation over time between the two signals is responsible for the synaptic change.

Hebb. Learn.
7
Hebbian Learning-4
  • We may classify synaptic modifications of a synapse as:
  • Hebbian: a synapse that increases its strength when positively correlated pre- and post-synaptic signals are present and decreases its strength when these signals are either uncorrelated or negatively correlated.
  • Anti-Hebbian: a synapse that weakens with positively correlated pre- and post-synaptic signals and strengthens with negatively correlated signals.
  • Non-Hebbian: a synapse whose modification does not involve any mechanism that is time-dependent, highly local and strongly interactive in nature (as in the previous cases).

Hebb. Learn.
8
Hebbian Learning-5
  • To formulate the Hebbian rule mathematically we consider a weight w_kj of a neuron k, with the pre- and post-synaptic signals denoted by x_j and y_k respectively. The adjustment to the weight w_kj at time step n is given by
  • Δw_kj(n) = F(y_k(n), x_j(n))
  • where F(·,·) is a function of both signals. The above formula can take many specific forms. Typical examples are:
  • Hebb's hypothesis: In the simplest case we have just the product of the two signals (this is also called the activity product rule):

Hebb. Learn.
9
Hebbian Learning-6
  • Δw_kj(n) = η y_k(n) x_j(n)
  • where η is the learning rate. This form emphasises the correlational nature of a Hebbian synapse. However, this simple rule leads to exponential growth of the weights (they become unbounded). Thus we need a mechanism to stop the unbounded increase of the weights. One such mechanism is the following.
  • Covariance hypothesis: In this case we replace the product of the pre- and post-synaptic signals with the departure of the same signals from their respective average values over a certain time interval. If x̄_j and ȳ_k are their

Hebb. Learn.
10
Hebbian Learning-7
  • time-averaged values, then the covariance form is defined by
  • Δw_kj(n) = η (y_k(n) - ȳ_k)(x_j(n) - x̄_j)
  • The covariance hypothesis allows for the following:
  • Convergence to a non-trivial state, which is reached when x_j(n) = x̄_j or y_k(n) = ȳ_k
  • Prediction of both synaptic potentiation (i.e. increase in synaptic strength) and synaptic depression (i.e. decrease in synaptic strength). A minimal sketch of both update rules is given below.
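  • The following minimal Python sketch illustrates the two update rules above for a single weight; the data, the learning rate value and the signals are invented for illustration, and only the update equations come from the slides.

    import numpy as np

    # Minimal sketch of the Hebb product rule and the covariance rule for one
    # weight w_kj; the signals below are invented, correlated toy data.
    eta = 0.1                                       # learning rate (eta)
    rng = np.random.default_rng(0)
    x = rng.normal(size=100)                        # pre-synaptic signal x_j(n)
    y = 0.5 * x + rng.normal(scale=0.1, size=100)   # correlated post-synaptic signal y_k(n)

    # Hebb's hypothesis (activity product rule): repeated application grows w without bound.
    w_hebb = 0.0
    for xn, yn in zip(x, y):
        w_hebb += eta * yn * xn                     # delta w = eta * y_k(n) * x_j(n)

    # Covariance hypothesis: use departures from the time-averaged values.
    x_bar, y_bar = x.mean(), y.mean()
    w_cov = 0.0
    for xn, yn in zip(x, y):
        w_cov += eta * (yn - y_bar) * (xn - x_bar)  # delta w = eta (y - y_bar)(x - x_bar)

    print(w_hebb, w_cov)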

Hebb. Learn.
11
Pattern Association
  • An associative memory is a brain-like distributed
    memory that learns associations. Association is a
    known and prominent feature of human memory.
  • Association takes two forms:
  • Auto-association: Here the task of a network is to store a set of patterns (vectors) by repeatedly presenting them to the network. The network is subsequently presented with a partial description or distorted (noisy) version of an original pattern stored in it, and the task is to retrieve (recall) that particular pattern.
  • Hetero-association: In this task we want to pair an arbitrary set of input patterns with an arbitrary set of output patterns.

Patt. Assoc.
12
Pattern Association-1
  • Auto-association involves the use of unsupervised
    learning (Hebbian, Hopfield) while
    hetero-association involves the use of
    unsupervised (Hebbian) or supervised learning
    (e.g. MLP/BP) approaches.
  • Let x_k denote a key pattern applied to an associative memory and y_k denote a memorised pattern. The pattern association performed by the network is described by
  • x_k → y_k ,  k = 1, 2, ..., q
  • where q is the number of patterns stored in the network. The key pattern x_k acts as a stimulus that not only determines the storage location of the memorised pattern y_k but also holds the key for its retrieval.

Patt. Assoc.
13
Pattern Association-2
  • In an auto-associative memory, y_k = x_k, so the input and output spaces have the same dimensionality. In a hetero-associative memory, y_k ≠ x_k, hence in this case the dimensionality of the output space may or may not equal the dimensionality of the input space.
  • There are two phases involved in the operation of the associative memory:
  • Storage phase, which refers to the training of the network in accordance with a suitable learning rule.
  • Recall phase, which involves the retrieval of a memorised pattern in response to the presentation of a noisy version of a key pattern to the network.

Patt. Assoc.
14
Pattern Association-3
  • Let the stimulus x (input) represent a noisy version of a key pattern x_j. This stimulus produces a response y (output). For perfect recall, we should find that y = y_j, where y_j is the memorised pattern associated with the key pattern x_j. When y ≠ y_j for x = x_j, the associative memory is said to have made an error in recall.
  • The number q of patterns stored in an associative
    memory provides a direct measure of the storage
    capacity of the network. In designing an
    associative memory, we want to make the storage
    capacity q (expressed as a percentage of the
    total number N of neurons) as large as possible
    and yet insist that a large fraction of the
    patterns is recalled correctly.

Patt. Assoc.
15
Pattern Association Network
  • A pattern associator is a network which is able
    to learn hetero-associations of two patterns. A
    schematic representation is given below

Associator
16
Pattern Association Network-1
  • The net input that arrives at every output unit is calculated as
  • net_i = Σ_j w_ij a_j
  • where i indexes an output neuron and j an input neuron. The dimensionality of the input space is N and of the output space is M. w_ij is the weight from neuron j to neuron i, and a_j is the activation of input neuron j.
  • The activation of each neuron is produced by using a suitable threshold function and a threshold. For example, we can assume that the activations are binary (i.e. either 0 or 1) and to achieve this we use the step

Associator
17
Pattern Association Network-2
  • function.
  • The training of the network takes place by using, for example, the Hebbian form. Thus what we have is a matrix of weights, all of them zero initially. Assume an input pattern of (1 0 1 0 1 0) and an output pattern of (1 1 0 0).

Associator
18
Pattern Association Network-3
  • If we assume a learning rate η = 1, then after a single learning step we get the weight matrix W = η y xᵀ, i.e. the outer product of the output and input patterns.
  • To recall from the matrix we simply apply the input pattern and perform matrix multiplication of the weight matrix with the input vector. In our example we get the net input vector (3 3 0 0).

Associator
19
Pattern Association Network-4
  • If we assume a threshold of 2, we get the correct answer (1 1 0 0) using the step function as the activation function.
  • We can learn multiple associations using the same weight matrix. For example, assume that a new input vector (1 1 0 0 0 1) is given with corresponding output

Associator
20
Pattern Association Network-5
  • vector (0 1 0 1). In this case, after a single presentation (with η = 1), we will get an updated weight matrix.
  • Again, we can get the correct output vectors when we introduce the corresponding input vectors.

Associator
21
Pattern Association Network-6
  • Again, by using the threshold of 2 and a step function, we get the correct answers (1 1 0 0) and (0 1 0 1). A short sketch of the whole storage/recall procedure is given below.
  • However, keep in mind that only a limited number of patterns can be stored before perfect recall fails. The typical capacity of an associator network is about 20% of the total number of neurons.
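  • The worked example of the previous slides can be reproduced with a few lines of NumPy. This is a minimal sketch, assuming the step function fires when the net input reaches the threshold; the patterns, learning rate (η = 1) and threshold (2) are the ones quoted above.

    import numpy as np

    eta = 1.0        # learning rate
    theta = 2.0      # threshold of the step activation function

    def train(pairs, n_in, n_out):
        """Store (input, output) pairs with the Hebbian outer-product rule."""
        W = np.zeros((n_out, n_in))
        for x, y in pairs:
            W += eta * np.outer(y, x)        # delta w_ij = eta * y_i * x_j
        return W

    def recall(W, x):
        """Step activation applied to the net input W x."""
        return (W @ x >= theta).astype(int)

    pairs = [
        (np.array([1, 0, 1, 0, 1, 0]), np.array([1, 1, 0, 0])),
        (np.array([1, 1, 0, 0, 0, 1]), np.array([0, 1, 0, 1])),
    ]
    W = train(pairs, n_in=6, n_out=4)
    for x, y in pairs:
        print(recall(W, x), "expected", y)   # both stored patterns are recalled correctly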

Associator
22
Pattern Association Network-7
  • Recall accuracy reflects the similarity of a key pattern to the stored patterns. The network can generalise in the sense that when an input pattern is not exactly the same as any of the stored patterns, it returns the stored pattern which most closely resembles the input.
  • Properties of pattern associators:
  • Generalisation
  • Fault tolerance
  • Distributed representations are necessary for generalisation and fault tolerance

Associator
23
Pattern Association Network-8
  • Prototype extraction and noise removal
  • Speed
  • Interference is not necessarily a bad thing (it
    is the basis of generalisation).

Associator
24
Correlations
  • We have stated that the simple Hebb form creates unbounded weights. One way to overcome this problem was the covariance rule. A second one is Oja's rule. The latter rule has the benefit that it is closely related to the principal components analysis method.
  • Let us restate the Hebb form for a single linear unit in the output layer and for an input vector with dimension larger than 1:
  • Δw_i = η V ξ_i
  • where V is the activation of the output unit, ξ_i is the activation of input neuron i, and η is the learning rate.

Correlations
25
Correlations-1
  • This rule as it stands does not have any (non-trivial) stable fixed point. To see this, let us assume for the moment that there are (hypothetically) some fixed points. (A fixed point is a pair (V, ξ) such that <Δw> = 0.) In this case we will have
  • 0 = <Δw_i> ∝ <V ξ_i> = <Σ_j w_j ξ_j ξ_i> = Σ_j C_ij w_j = (Cw)_i
  • where the angle brackets indicate an average over the input distribution P(ξ) and we have defined the correlation matrix C by
  • C_ij ≡ <ξ_i ξ_j>
  • or, in matrix form, C ≡ <ξ ξᵀ>

Correlations
26
Correlations-2
  • Several things should be noted about C (see also the numerical sketch below):
  • C is not the covariance matrix of the input, which would be defined in terms of the means ξ̄_i ≡ <ξ_i> as <(ξ_i - ξ̄_i)(ξ_j - ξ̄_j)>
  • C is symmetric, i.e. C_ij = C_ji, which implies that its eigenvalues are real and its eigenvectors can be taken as orthogonal
  • Because of the outer-product form, C is positive semi-definite, thus all its eigenvalues are positive or zero.
  • Now let us return to the equation
  • Cw = 0
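  • As a quick numerical illustration of these properties, the correlation matrix can be estimated from samples and its spectrum inspected; a minimal NumPy sketch with invented data follows.

    import numpy as np

    rng = np.random.default_rng(0)
    xi = rng.normal(size=(1000, 3))      # 1000 invented sample input vectors xi
    C = (xi.T @ xi) / len(xi)            # C_ij ~ <xi_i xi_j>
    print(np.allclose(C, C.T))           # True: C is symmetric
    print(np.linalg.eigvalsh(C))         # real eigenvalues, all >= 0 (positive semi-definite)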

Correlations
27
Correlations-3
  • This equation says that w is an eigenvector of C with eigenvalue 0. But this can never be stable because C has some positive eigenvalues. Thus we conclude that there are only unstable fixed points for the plain Hebb learning procedure.
  • One can prevent the divergence of Hebbian learning by constraining the growth of the weight vector w. There are several methods by which this can be achieved:
  • One way is to renormalise all the weights after each update, w_i → α w_i, choosing α such that |w| = 1

Correlations
28
Correlations-4
  • Another way is to clip the value of each weight at a lower and an upper bound, in other words to constrain the weight to those bounds whenever it tries to cross them, i.e.
  • w⁻ ≤ w_i ≤ w⁺
  • Another way is to use Oja's rule, which we examine next.
  • Oja modified the plain Hebb rule in such a way as to make it possible for the weight vector to approach a constant length |w| = 1, without having to do any renormalisation by hand.
  • Moreover, w approaches the eigenvector of C with the largest eigenvalue λ_max. We call this the maximal

Correlations
29
Correlations-5
  • eigenvector.
  • Oja's modification corresponds to adding a weight decay proportional to V² to the plain Hebb rule:
  • Δw_i = η V (ξ_i - V w_i)
  • Note that this form looks like a delta rule, where the correction Δw_i depends on the difference between the actual input and the back-propagated output.
  • We state some properties of Oja's rule without proof:
  • Unit length: |w| = 1
  • Eigenvector direction: w lies in the maximal

Correlations
30
Correlations-6
  • eigenvector direction of C
  • Variance maximisation: w lies in a direction that maximises <V²>
  • Other rules exist in the literature for modifying the plain Hebb rule. In most cases these are more complex forms. A minimal sketch of Oja's rule is given below.
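  • The convergence properties above are easy to check numerically; below is a minimal Python sketch of Oja's rule applied to invented two-dimensional data with one dominant variance direction.

    import numpy as np

    rng = np.random.default_rng(0)
    # Invented zero-mean inputs whose first component has the larger variance.
    xi = rng.normal(size=(5000, 2)) * np.array([3.0, 1.0])
    eta = 0.01                              # learning rate, chosen for illustration
    w = rng.normal(size=2)                  # random initial weights
    for x in xi:
        V = w @ x                           # linear output V = sum_i w_i xi_i
        w += eta * V * (x - V * w)          # Oja's rule: delta w_i = eta V (xi_i - V w_i)
    print(np.linalg.norm(w))                # close to 1 (unit length)
    print(w)                                # close to +/-(1, 0), the maximal eigenvector of C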

Correlations
31
Examples
  • Ex1 - Hippocampal model: There is strong support to date for the suggestion that the brain area known as the hippocampus uses Hebb-style learning for forming episodic memories.
  • A model which captures the interactions of the hippocampus (DG / CA3 / CA1) with the immediately surrounding regions (entorhinal cortex, subiculum) and the neocortical areas is given below

Examples
32
Examples-1
Examples
33
Examples-2
  • The module details are as follows (see also the configuration sketch below):
  • Entorhinal cortex: 600 neurons, each with 200 synapses and sparseness = 0.05
  • DG: 1000 neurons, with 60 synapses each and sparseness = 0.05
  • CA3: 1000 neurons, each with
  • 200 recurrent synapses (from other CA3 neurons)
  • 120 synapses from the entorhinal cortex
  • 4 synapses from DG
  • and a sparseness = 0.05

Examples
34
Examples-3
  • CA1: 1000 neurons, with 200 synapses each and sparseness = 0.01
  • Sparseness is the fraction of neurons activated when a new stimulus arrives. It is determined from real data from the rat hippocampal area.
  • Input arrives at the entorhinal cortex
  • The connections from entorhinal cortex → DG are trained using Hebbian learning
  • DG is a competitive network
  • CA3 is an auto-association network
  • CA3 recurrent connections use Hebbian learning
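  • For reference, the module parameters listed above can be collected into a small configuration. This is purely illustrative; the names are hypothetical and not taken from the original model code.

    # Hypothetical summary of the module sizes and sparseness values above.
    hippocampal_model = {
        "entorhinal_cortex": {"neurons": 600,  "synapses": 200, "sparseness": 0.05},
        "DG":                {"neurons": 1000, "synapses": 60,  "sparseness": 0.05},
        "CA3":               {"neurons": 1000, "sparseness": 0.05,
                              "synapses": {"recurrent": 200, "entorhinal": 120, "DG": 4}},
        "CA1":               {"neurons": 1000, "synapses": 200, "sparseness": 0.01},
    }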

Examples
35
Examples-4
  • The connections CA3 → CA1 are trained with a Hebbian rule
  • CA1 is a competitive network
  • The connections from CA1 → entorhinal cortex use Hebbian learning
  • Simulations of the model showed that one-shot learning is possible, and the model matched a number of experimental findings well.

Examples
36
Examples-5
  • Ex2 - VisNet: This network is a model of biological vision which tries to solve the problem of building position- and view-invariant representations from multiple views of the same object, e.g. a human face.
  • It uses a hierarchical layered structure where the neurons of an upper layer are connected to neurons of the previous layer through receptive fields of appropriate size. The fields become progressively wider as we move up the hierarchy.
  • In each layer we have an array of 32x32 cells, which use lateral inhibition in a competitive network arrangement.

Examples
37
Examples-6
  • Forward connections from one layer to another are trained by Hebbian-style learning.
  • Each cell receives 100 connections from the previous layer, with a 67% probability that a connection comes from within 4 cells of the centre of the distribution.
  • The architecture is shown below

Examples
38
Examples-7
Examples
39
Examples-8
  • The input to the model is an image of a face, which is then convolved with appropriate filters so as to detect different orientations and edges in the input image. This corresponds roughly to brain area V1.
  • The learning law that is used is a Hebbian rule with a memory trace:
  • Δw_kj(n) = η a_k(n) m_j(n)
  • m_i(n) = (1 - λ) a_i(n) + λ m_i(n-1)
  • where η is the learning rate and λ is a constant which determines the relative contribution of the memory trace and of the current activation. a_i(n) is the activation of the neuron at time n and is calculated in the usual way (see the sketch below).
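  • To make the trace rule concrete, here is a minimal Python sketch using the notation just introduced; the η and λ values and the activations are invented, and the memory trace is kept on the input side as in the formula above.

    import numpy as np

    eta = 0.1    # learning rate (eta), value chosen for illustration
    lam = 0.8    # trace constant (lambda), value chosen for illustration

    rng = np.random.default_rng(0)
    a_in = rng.random(size=(20, 5))   # invented input activations a_j(n) over 20 time steps
    a_out = rng.random(size=20)       # invented output activations a_k(n)

    w = np.zeros(5)                   # weights w_kj of one output neuron k
    m = np.zeros(5)                   # memory trace m_j of the input activations
    for n in range(20):
        m = (1 - lam) * a_in[n] + lam * m   # m_j(n) = (1 - lambda) a_j(n) + lambda m_j(n-1)
        w += eta * a_out[n] * m             # delta w_kj(n) = eta a_k(n) m_j(n)
    print(w)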

Examples
40
Examples-9
  • The model successfully recognises faces at different angles and positions in the input image. For more details, see the literature (Rolls & Treves, 1998).

Examples
41
Conclusions
  • Hebbian learning is the oldest learning law in neural networks.
  • It is used mainly to build pattern associators.
  • The original Hebb rule creates unbounded weights; for this reason there are other forms which try to correct this problem. There are also temporal forms of the Hebbian rule; a hybrid case is the memory-trace rule presented above for VisNet.
  • It has wide application in pattern association problems and in models of computational neuroscience and cognitive science.

Conclusions