Title: Learning Rules 1
1. Learning Rules 1
- Computational Neuroscience 03
- Lecture 8
2. Last week we showed how to model individual neurons using firing rate models, and then how to connect them together to form networks of neurons.
3. Feedforward and recurrent networks: here the weight vector is replaced by a weight matrix, and the feedforward input is often replaced by an input vector.
4. Recurrent networks can also do this, but they have much more complex dynamics than feedforward nets and are more difficult to analyse. Much of the analysis focuses on the eigenvectors of the recurrent weight matrix M. One can show, for instance, that a network exhibits selective amplification if there is one dominant eigenvector (cf. PCA).
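As a small numerical sketch of this point (my own, not from the lecture: the matrix M, the input h and all parameter values are invented), the following simulates a two-unit linear rate network $\tau \frac{dv}{dt} = -v + h + Mv$ and shows that the steady state is selectively amplified along the dominant eigenvector of M.

```python
# Minimal sketch: linear rate network tau dv/dt = -v + h + M v,
# with M built to have a dominant eigenvector e1 (eigenvalue 0.9).
# M, h and all parameters are invented for illustration.
import numpy as np

tau, dt = 10.0, 0.1
e1 = np.array([1.0, -1.0]) / np.sqrt(2)   # chosen "dominant" direction
e2 = np.array([1.0,  1.0]) / np.sqrt(2)
M = 0.9 * np.outer(e1, e1) + 0.1 * np.outer(e2, e2)

h = np.array([1.0, 0.5])                  # constant feedforward input
v = np.zeros(2)
for _ in range(5000):                     # run to steady state
    v += dt / tau * (-v + h + M @ v)

# Steady state is v = sum_i (e_i . h) e_i / (1 - lambda_i): the component
# along e1 is boosted by 1/(1 - 0.9) = 10, i.e. selective amplification.
print("v_ss =", v)
print("projection on e1:", e1 @ v, "expected", (e1 @ h) / (1 - 0.9))
print("projection on e2:", e2 @ v, "expected", (e2 @ h) / (1 - 0.1))
```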
5. Or, if one eigenvalue is exactly equal to 1 and the others are < 1, the network can integrate its inputs and therefore show persistent activity: the activity does not stop when the input stops.
But how are we to generate such precise weights? We need synaptic modification rules.
6. Hebb's postulate
Hebb's postulate of learning (or simply Hebb's rule, 1949) is the following: "When an axon of cell A is near enough to excite cell B and repeatedly or persistently takes part in firing it, some growth processes or metabolic changes take place in one or both cells such that A's efficiency, as one of the cells firing B, is increased."
This rule forms the basis of much of the research on the role of synaptic plasticity in memory and learning. It has been generalised to include decreases of strength when neuron A repeatedly fails to be involved in the activation of B, and such generalisations typically look at the correlation or covariance of the activities of the pre- and postsynaptic neurons.
7. Learning in the hippocampus
Here we see examples of long-term potentiation and depression (LTP and LTD). High-frequency stimulation induces LTP, while long-lasting low-frequency stimulation leads to LTD. "Long-term" refers to changes lasting > 10 minutes.
8. Learning Rules
- We will consider 3 types of learning, and will focus on Hebbian-type learning, though others exist (e.g. modifications based on pre- or postsynaptic activity only).
- Unsupervised learning: the network responds during training solely as a result of its connections and intrinsic dynamics. The net self-organises in a manner dependent on the inputs and the synaptic plasticity rule.
- Supervised learning: here the network also has a teacher, in the form of a set of desired target outputs for each input. Not especially biologically plausible, but good for existence proofs.
- Reinforcement learning: somewhere in between the two. The net does not know the target output, but gets feedback via reward/punishment.
9. Unsupervised Learning
Start with a single postsynaptic neuron and a linear activation function. As synaptic changes occur on a much longer time scale than the firing rate dynamics, the firing rate equation reduces to its steady state

$v = \mathbf{w} \cdot \mathbf{u}$

Start with the simplest Hebbian-style plasticity rule for a single neuron:

$\tau_w \frac{d\mathbf{w}}{dt} = v\mathbf{u}$
10. Since u and v denote firing rates, the product vu can be interpreted as the probability that the pre- and postsynaptic neurons fire together. Each different setting of u is known as an input pattern. As weight changes are slow, rather than summing all the changes separately we can average over the input patterns and compute the average change. Using $\langle \cdot \rangle$ to denote averages over the ensemble of input patterns, we get

$\tau_w \frac{d\mathbf{w}}{dt} = \langle v\mathbf{u} \rangle$

Remembering that $v = \mathbf{w}\cdot\mathbf{u}$, this gives the correlation-based rule

$\tau_w \frac{d\mathbf{w}}{dt} = Q\mathbf{w}$

where Q is the input correlation matrix, $Q_{bb'} = \langle u_b u_{b'} \rangle$.
11. E.g. for 2 patterns $\mathbf{u}^1$ and $\mathbf{u}^2$, $Q_{ij} = \frac{1}{2}(u^1_i u^1_j + u^2_i u^2_j)$.

So suppose $\mathbf{u}^1 = (1, 0)$ and $\mathbf{u}^2 = (0, 1)$; then

$Q = \begin{pmatrix} 0.5 & 0 \\ 0 & 0.5 \end{pmatrix}$

Alternatively, $\mathbf{u}^1 = (1, 1)$ and $\mathbf{u}^2 = (1, 0)$; then

$Q = \begin{pmatrix} 1 & 0.5 \\ 0.5 & 0.5 \end{pmatrix}$

Suppose $\tau_w = 10$ and $\mathbf{w} = (0.1, 0.1)$, and we use the 2nd matrix; then

$\Delta\mathbf{w} = Q\mathbf{w}/\tau_w = (0.015, 0.01)$

As $\mathbf{w}^{n+1} = \mathbf{w}^n + \Delta\mathbf{w}^n$, we get $\mathbf{w}^1 = (0.115, 0.11)$, $\mathbf{w}^2 \approx (0.13, 0.12)$, ...,

which leads to instability and uncontrolled growth of $\mathbf{w}$.
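A short numerical sketch (my own, in NumPy) reproducing the iteration above with the second correlation matrix; the first two updates match the values quoted, and the weight norm then grows without bound.

```python
# Sketch of the discrete Hebbian update w <- w + Q w / tau_w,
# using the second correlation matrix from the example above.
import numpy as np

u_patterns = np.array([[1.0, 1.0],
                       [1.0, 0.0]])                          # u^1 and u^2
Q = np.mean([np.outer(u, u) for u in u_patterns], axis=0)    # [[1, .5], [.5, .5]]

tau_w = 10.0
w = np.array([0.1, 0.1])
for n in range(1, 101):
    w = w + Q @ w / tau_w                  # correlation-based Hebb rule
    if n in (1, 2, 50, 100):
        print(n, w, np.linalg.norm(w))
# After 1 and 2 steps w matches (0.115, 0.11) and ~(0.13, 0.12);
# by step 100 the norm has blown up: unbounded growth.
```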
12. [Figure: weight trajectories for initial conditions $\mathbf{w} = (0.1, 0.1)$ and $\mathbf{w} = (0.1, -0.3)$.]
The outcome depends on the eigenvectors of Q and on the initial conditions, but is unstable in either case.
13. To avoid unbounded growth we can impose a saturation constraint. However, this means all weights go to their maximum or minimum value, and thus there is no competition between different synapses; the neuron then cannot distinguish between its presynaptic inputs. Also, since u and v are firing rates and therefore positive, the Hebb rule only describes LTP. However, the earlier figure showed that synapses can depress in strength if presynaptic activity is accompanied by a low level of postsynaptic activity. Results where the opposite is true can also be obtained.
14. Analysis focuses on the eigenvectors of Q, the correlation matrix of the input vectors. For instance, one can show that after training the Hebbian rule leads to $v \propto \mathbf{e}_1 \cdot \mathbf{u}$ for an arbitrary input $\mathbf{u}$, and that the weight vector, expressed as a sum of eigenvectors, is dominated by $\mathbf{e}_1$, i.e. $\mathbf{w} \propto \mathbf{e}_1$. That is, v is the projection of u onto the principal eigenvector of Q.
E.g. for a Q with principal eigenvector $(1, -1)/\sqrt{2}$ we would expect $\mathbf{w}$ to end up as $(w_{max}, 0)$ or $(0, w_{max})$. However, because of the saturation constraints we can instead get $(w_{max}, w_{max})$.
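A quick check (my own sketch, continuing the NumPy example above) that the unconstrained Hebbian weight vector ends up parallel to the principal eigenvector $\mathbf{e}_1$ of Q:

```python
# Compare the direction of the Hebbian weight vector with the
# principal eigenvector e1 of Q (no saturation constraints here).
import numpy as np

Q = np.array([[1.0, 0.5],
              [0.5, 0.5]])
vals, vecs = np.linalg.eigh(Q)       # eigh: Q is symmetric
e1 = vecs[:, np.argmax(vals)]        # principal eigenvector

w = np.array([0.1, 0.1])
for _ in range(200):
    w = w + Q @ w / 10.0             # unconstrained Hebbian updates

print("e1      =", e1)
print("w / |w| =", w / np.linalg.norm(w))   # parallel to e1 (up to sign)
```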
15. Covariance rule
To get LTD we can introduce a postsynaptic or a presynaptic threshold:

$\tau_w \frac{d\mathbf{w}}{dt} = (v - \theta_v)\mathbf{u}$   or   $\tau_w \frac{d\mathbf{w}}{dt} = v(\mathbf{u} - \boldsymbol{\theta}_u)$

Below threshold we get depression; above it, potentiation. A convenient choice for the thresholds is the average postsynaptic or presynaptic activity, $\langle v \rangle$ or $\langle \mathbf{u} \rangle$. If we now replace v with $\mathbf{w}\cdot\mathbf{u}$ and average over the input patterns, we get the covariance rule

$\tau_w \frac{d\mathbf{w}}{dt} = C\mathbf{w}$

where C is the covariance matrix of the input data, i.e.

$C = \langle (\mathbf{u} - \langle\mathbf{u}\rangle)(\mathbf{u} - \langle\mathbf{u}\rangle)^T \rangle = \langle \mathbf{u}\mathbf{u}^T \rangle - \langle\mathbf{u}\rangle\langle\mathbf{u}\rangle^T$
16. Note that as $\langle v \rangle$ keeps changing we need to keep updating the postsynaptic threshold, while the presynaptic threshold is independent of the weights. Although both versions average to the same rule, they do behave differently. The postsynaptic-threshold version only modifies weights for non-zero presynaptic activities; if v is below threshold this results in homosynaptic depression. Alternatively, the presynaptic-threshold version reduces the strength of inactive synapses whenever v > 0: heterosynaptic depression. Although the covariance rule allows LTD, it is still unstable due to positive feedback. Also, we still do not have competition, but this can be introduced by allowing the threshold to slide, as follows.
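A minimal sketch (my own, with invented input statistics and parameters) of the postsynaptic-threshold covariance rule; it allows LTD but the weight norm still grows without bound, illustrating the instability.

```python
# Sketch of the covariance rule with a postsynaptic threshold:
#   tau_w dw/dt = (v - <v>) u,   v = w . u
# Input statistics and parameters are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)
U = rng.normal(loc=2.0, scale=1.0, size=(500, 2))   # non-zero-mean inputs
tau_w, dt = 100.0, 1.0
w = np.array([0.05, -0.05])

for step in range(2000):
    u = U[rng.integers(len(U))]          # pick a random input pattern
    v = w @ u
    theta_v = np.mean(U @ w)             # current estimate of <v> = w . <u>
    w = w + dt / tau_w * (v - theta_v) * u
    if step % 500 == 0:
        print(step, w, np.linalg.norm(w))
# |w| keeps growing: LTD is possible but the rule is still unstable.
```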
17. BCM rule
As the covariance rule allows LTD even in the absence of postsynaptic (or presynaptic) activity, Bienenstock, Cooper and Munro (1982) proposed an alternative, for which there is experimental evidence, in which plasticity requires both pre- and postsynaptic activity and the postsynaptic threshold is dynamic:

$\tau_w \frac{d\mathbf{w}}{dt} = v\mathbf{u}(v - \theta_v)$
18. This is again unstable if $\theta_v$ is fixed. However, if we allow the threshold to grow faster than v, we get stability. For instance, use $\theta_v$ as a low-pass filtered version of $v^2$:

$\tau_\theta \frac{d\theta_v}{dt} = v^2 - \theta_v$

Usually $\tau_\theta$ is set to be less than $\tau_w$, so that the threshold changes faster than the weights. We now get competition between synapses, since strengthening some synapses raises v and hence the threshold, making it harder for the other synapses to be strengthened.
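A minimal sketch (my own; the two input patterns, the rectified-linear rate and all parameter values are invented) of the BCM rule with a sliding threshold:

```python
# Sketch of the BCM rule with a sliding threshold:
#   tau_w  dw/dt     = v u (v - theta)
#   tau_th dtheta/dt = v^2 - theta      (tau_th < tau_w)
import numpy as np

patterns = np.array([[1.0, 0.1],
                     [0.1, 1.0]])          # invented input patterns
tau_w, tau_th, dt = 50.0, 5.0, 0.1
w = np.array([0.5, 0.5])
theta = 0.0

rng = np.random.default_rng(1)
for step in range(20000):
    u = patterns[rng.integers(2)]
    v = max(w @ u, 0.0)                    # rectified linear rate
    theta += dt / tau_th * (v**2 - theta)  # threshold tracks v^2 quickly
    w += dt / tau_w * v * u * (v - theta)
    w = np.clip(w, 0.0, None)              # keep weights non-negative

print("w =", w, " theta =", theta)
# v stays bounded: once v rises, theta rises with v^2 and pushes
# (v - theta) negative, so strengthening is self-limiting.
```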
19. Synaptic normalisation
A more direct way of enforcing competition is through synaptic normalisation. The idea is that the postsynaptic neuron can only support a certain amount of total synaptic weight, so strengthening one synapse leads to weakening of the others. We can either hold the sum of the weights constant (if they all have the same sign, +ve or -ve), or constrain the sum of squares of the weights (cf. network pruning in ANNs). There are 2 types: subtractive normalisation and multiplicative normalisation.
20. Subtractive normalisation

$\tau_w \frac{d\mathbf{w}}{dt} = v\mathbf{u} - \frac{v(\mathbf{n}\cdot\mathbf{u})}{N_u}\mathbf{n}$

where n is a vector of ones of length $N_u$, so $\mathbf{n}\cdot\mathbf{u}$ is the sum of all the inputs u. Thus the second term is simply a vector of length $N_u$ with the same value in each element, i.e. (k, k, ..., k), whose sum over all the elements equals the sum over all the elements of $v\mathbf{u}$. Thus the total increase in the weights is 0. This rule must be augmented by a saturation constraint to prevent the weights becoming negative, i.e. if a weight reaches zero it is not moved further downwards. Also, without upper saturation the rule often leads to all weights bar one being zero. Note also that the rule requires global knowledge across the synapses.
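A sketch (my own, with invented input statistics, parameters and saturation bounds) of Hebbian learning with subtractive normalisation, showing the competition it induces:

```python
# Sketch of Hebbian learning with subtractive normalisation:
#   tau_w dw/dt = v u - v (n . u) n / N_u
# plus saturation constraints 0 <= w_i <= w_max.
# Input statistics and all parameters are invented for illustration.
import numpy as np

rng = np.random.default_rng(2)
N_u = 4
tau_w, dt, w_max = 50.0, 0.5, 1.0
w = np.full(N_u, 0.25)
n = np.ones(N_u)

for step in range(5000):
    u = rng.uniform(0.0, 1.0, size=N_u)
    u[0] += 0.5                              # input 0 is slightly stronger
    v = w @ u
    dw = v * u - v * (n @ u) / N_u * n       # subtractive constraint term
    w = np.clip(w + dt / tau_w * dw, 0.0, w_max)

print("w =", w)
# The constraint makes the synapses compete: all weights bar one are
# driven to zero, and the winner saturates at w_max.
```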
21. Multiplicative normalisation

$\tau_w \frac{d\mathbf{w}}{dt} = v\mathbf{u} - \alpha v^2 \mathbf{w}$

where $\alpha$ is a positive constant. This is known as Oja's rule (1982). The rule is more local than the previous one, as it only involves the weight in question and the pre- and postsynaptic activities. However, its form is based on theoretical arguments rather than experimental data. The previous rule was rigid, in that the constraint had to be satisfied at all times, whereas this one is more dynamic, with $|\mathbf{w}|^2$ gradually relaxing to $1/\alpha$. This induces competition: if one weight increases, the maintenance of a constant length of the weight vector forces the others to decrease.
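A sketch of Oja's rule (my own; the zero-mean correlated Gaussian inputs and all parameters are invented), checking that $|\mathbf{w}|^2$ relaxes to $1/\alpha$ and that w lines up with the principal eigenvector of the input correlation matrix, i.e. the first principal component:

```python
# Sketch of Oja's rule, tau_w dw/dt = v u - alpha v^2 w, on invented
# zero-mean 2-D inputs with principal axis along (1, 1)/sqrt(2).
import numpy as np

rng = np.random.default_rng(3)
cov = np.array([[1.0, 0.8],
                [0.8, 1.0]])
U = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=20000)

alpha, tau_w, dt = 1.0, 200.0, 1.0
w = np.array([1.0, -0.2])

for u in U:
    v = w @ u
    w += dt / tau_w * (v * u - alpha * v**2 * w)   # Oja's rule

e_vals, e_vecs = np.linalg.eigh(np.cov(U.T))
e1 = e_vecs[:, np.argmax(e_vals)]
print("|w|^2 =", w @ w, "(expect ~", 1 / alpha, ")")
print("w direction =", w / np.linalg.norm(w))
print("principal axis =", e1, "(matches up to sign)")
```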
22. Both the Hebbian and Oja's rules, run for long enough, generate weight vectors parallel to the principal eigenvector of the correlation matrix, as in A. This is basically principal component analysis (PCA), which is, in terms of retaining information, theoretically the optimal way to encode high-dimensional data in a lower-dimensional subspace.
However, B shows what happens if the input vectors do not have zero mean (as in real systems); this problem is alleviated by using covariance-based rules.
23. Timing-based rules
The previous rules don't take spike timing into account. This can be crucial, since if the presynaptic spike occurs after the postsynaptic spike we get LTD rather than LTP.
24. Therefore we need to integrate over time, as in the following:

$\tau_w \frac{d\mathbf{w}}{dt} = \int_0^\infty d\tau \left[ H(\tau)\, v(t)\, \mathbf{u}(t-\tau) + H(-\tau)\, v(t-\tau)\, \mathbf{u}(t) \right]$

where H is a function like the solid line in the previous figure (positive when the presynaptic spike leads, negative when it lags). Such functions still require saturation constraints, but the timing dependence can itself generate competition.
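A sketch of a pair-based STDP window (my own; the exponential form and the amplitude/time-constant values below are a common choice in the literature, not necessarily the H used in the lecture):

```python
# Sketch of a pair-based spike-timing-dependent plasticity (STDP) window.
import numpy as np

A_plus, A_minus = 0.005, 0.00525     # LTP/LTD amplitudes (assumed)
tau_plus, tau_minus = 20.0, 20.0     # ms (assumed)

def H(delta_t):
    """Weight change for a spike pair separated by
    delta_t = t_post - t_pre (ms): LTP if pre leads, LTD if post leads."""
    if delta_t >= 0:
        return A_plus * np.exp(-delta_t / tau_plus)
    return -A_minus * np.exp(delta_t / tau_minus)

# Apply the window to all pre/post spike pairs of two invented trains.
pre_spikes  = [10.0, 50.0, 90.0]     # ms
post_spikes = [15.0, 45.0, 95.0]
dw = sum(H(t_post - t_pre) for t_pre in pre_spikes for t_post in post_spikes)
print("net weight change:", dw)
```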
25. Multiple Postsynaptic Neurons
We can extend the rules defined previously to nets with multiple postsynaptic neurons. In these networks (with feedforward weight matrix W and recurrent weight matrix M) the steady-state output rates are

$\mathbf{v} = W\mathbf{u} + M\mathbf{v}$

Thus

$\mathbf{v} = KW\mathbf{u}$

where

$K = (I - M)^{-1}$

and the Hebbian rule becomes

$\tau_w \frac{dW}{dt} = \langle \mathbf{v}\mathbf{u}^T \rangle$   (which, using $\mathbf{v} = KW\mathbf{u}$, equals $KWQ$)
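A sketch of the multi-output case (my own; the network sizes, the particular W and M, and the input ensemble are invented):

```python
# Sketch of the multi-output steady state v = K W u, K = (I - M)^(-1),
# and the averaged Hebbian update tau_w dW/dt = <v u^T> = K W Q.
import numpy as np

rng = np.random.default_rng(4)
N_v, N_u = 3, 5
W = rng.uniform(0.0, 0.1, size=(N_v, N_u))    # adaptive feedforward weights
M = 0.1 * (np.ones((N_v, N_v)) - np.eye(N_v)) # fixed recurrent weights
K = np.linalg.inv(np.eye(N_v) - M)

U = rng.uniform(0.0, 1.0, size=(1000, N_u))   # ensemble of input patterns
Q = (U.T @ U) / len(U)                        # correlation matrix <u u^T>

tau_w, dt = 100.0, 1.0
for step in range(200):
    W += dt / tau_w * K @ W @ Q               # averaged Hebbian update

print("final |W| =", np.linalg.norm(W))
# Still unstable (W grows without bound), so the same normalisation or
# saturation constraints as in the single-neuron case are needed.
```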
26. We can also use feature-based models, where the network is indexed by input features rather than by individual inputs. Network models can have adaptive feedforward weights and fixed recurrent ones, or vice versa, or both layers can be adaptive. Competition can be obtained through mainly inhibitory recurrent connections. We can then get, e.g., self-organising maps and elastic nets, where the recurrent interaction function K can take various forms.
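As one illustration of a feature-based model (my own sketch, not from the slides): a minimal Kohonen-style self-organising map update. The Gaussian neighbourhood used as the interaction K, and all parameter values, are assumptions for the sketch; the forms of K in the lecture may differ.

```python
# Minimal Kohonen-style self-organising map sketch.
# Gaussian neighbourhood K and all parameters are assumed for illustration.
import numpy as np

rng = np.random.default_rng(5)
n_units, dim = 20, 2
prefs = rng.uniform(0.0, 1.0, size=(n_units, dim))   # feature preferences
eta, sigma = 0.1, 2.0                                 # learning rate, width

def K(i, winner):
    """Gaussian neighbourhood on the 1-D map of units (assumed form)."""
    return np.exp(-((i - winner) ** 2) / (2 * sigma ** 2))

for step in range(5000):
    x = rng.uniform(0.0, 1.0, size=dim)               # input feature vector
    winner = np.argmin(np.sum((prefs - x) ** 2, axis=1))
    for i in range(n_units):
        prefs[i] += eta * K(i, winner) * (x - prefs[i])

# Neighbouring units end up with similar preferred features: a smooth map.
print(np.round(prefs, 2))
```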