Title: The Physics of the Brain
1. Learning online with discrete, bounded synaptic weights.
2. Associative memory: hetero-associative and auto-associative.
3. Associative memory - storage. A matrix memory associates vectors $x^\mu$ with vectors $y^\mu$, where the upper index $\mu$ denotes the pattern number. A simple way of forming a weight matrix is
$$w_{ij} = \frac{1}{N}\sum_{\mu=1}^{P} y_i^\mu x_j^\mu,$$
or in vector form
$$W = \frac{1}{N}\sum_{\mu=1}^{P} y^\mu (x^\mu)^T.$$
4. Simplest case: orthonormal input vectors, for which recall is exact: with $W=\sum_\mu y^\mu (x^\mu)^T$, $W x^\nu = y^\nu$. Random binary (not orthonormal) case: $x_i^\mu, y_i^\mu \in \{+1,-1\}$ (here $x^\mu \cdot x^\mu = N$, which the $1/N$ factor normalizes).
5. The weight matrix: $w_{ij} = \frac{1}{N}\sum_\mu y_i^\mu x_j^\mu$. The field: $h_i = \sum_j w_{ij} x_j^\nu$. Recall: $h_i = y_i^\nu + \text{noise}$ (cross-talk from the other patterns). Capacity: what is the largest number $P$ such that there is still a high probability that $\operatorname{sign}(h_i) = y_i^\nu$? This number $P_{\max}$ defines the capacity $\alpha = P_{\max}/N$.
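A minimal numerical sketch of this storage-and-recall scheme (not from the slides; $N$, $P$, and the random seed are arbitrary illustrative choices):

```python
# Hebbian matrix memory with random +-1 patterns: store P pairs
# (x^mu, y^mu), present x^1, and check how many bits come back right.
import numpy as np

rng = np.random.default_rng(0)
N, P = 1000, 50                        # neurons per pattern, number of patterns
X = rng.choice([-1, 1], size=(P, N))   # input patterns x^mu
Y = rng.choice([-1, 1], size=(P, N))   # output patterns y^mu

W = (Y.T @ X) / N                      # w_ij = (1/N) sum_mu y_i^mu x_j^mu

h = W @ X[0]                           # field when pattern x^1 is presented
recalled = np.where(h >= 0, 1, -1)     # sign(h), avoiding sign(0) = 0
print("fraction of correct bits:", np.mean(recalled == Y[0]))
```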
6. Here the noise term is
$$\eta_i = \frac{1}{N}\sum_{\mu \neq \nu}\sum_j y_i^\mu x_j^\mu x_j^\nu.$$
The question we ask is: what is the probability that the noise will flip the sign of the signal? For large $P$ the noise term will approach a Gaussian distribution. Assume $x$ and $y$ are chosen randomly and independently with $\operatorname{prob}(x_i = 1) = \operatorname{prob}(y_i = 1) = 1/2$. The mean of the noise term is 0; what is its variance?
7. What is the variance of the noise term $\eta$? There are $N(P-1) \approx NP$ terms, each $+1$ or $-1$ with equal probability. The probability of getting $K$ terms equal to $+1$ is the binomial
$$\operatorname{prob}(K) = \binom{NP}{K}\left(\frac{1}{2}\right)^{NP},$$
which gives the value $\eta = (K - (NP - K))/N = (2K - NP)/N$. The random variable $K$ has mean $NP/2$ and variance $(1/4)NP$. What is the variance of the random variable $\eta$?
8. Show at home, using the mean and variance of $K$, that $\eta$ has mean 0 and variance $P/N$. Remember the variance is $\langle \eta^2 \rangle - \langle \eta \rangle^2$, where $\langle \cdot \rangle$ denotes an average.
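A quick numerical check of this homework claim (not from the slides; $N$, $P$, and the number of trials are arbitrary):

```python
# The cross-talk term eta = (1/N) sum_{mu != nu} y_i^mu (x^mu . x^nu)
# should have mean ~0 and variance ~(P-1)/N, i.e. ~P/N for large P.
import numpy as np

rng = np.random.default_rng(1)
N, P, trials = 500, 25, 2000
etas = np.empty(trials)
for t in range(trials):
    X = rng.choice([-1, 1], size=(P, N))
    Y = rng.choice([-1, 1], size=(P, N))
    # noise on bit i = 0 when recalling pattern nu = 0
    etas[t] = sum(Y[mu, 0] * (X[mu] @ X[0]) for mu in range(1, P)) / N

print("mean:", etas.mean())                      # expect ~0
print("variance:", etas.var(), " P/N =", P / N)  # expect ~P/N
```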
9. For large $NP$, the binomial approaches the Gaussian
$$\operatorname{prob}(\eta) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\eta^2/2\sigma^2},$$
and then the variance of $\eta$ is $\sigma^2 = P/N$. Assume for simplicity that $y_i^\nu = 1$. The probability of a mistake is the probability that the noise term is less than $-1$, which is the same as the probability that it exceeds $+1$. Define the error function
$$\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-u^2}\, du .$$
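A sketch of this flip probability under the Gaussian approximation (not from the slides; the parameter values are illustrative):

```python
# P_error = P(eta < -1) for eta ~ N(0, P/N), written with the
# complementary error function erfc(x) = 1 - erf(x).
from math import erfc, sqrt

def p_error(N, P):
    sigma = sqrt(P / N)                       # std of the noise term
    return 0.5 * erfc(1.0 / (sigma * sqrt(2.0)))

for P in (50, 100, 140, 200):
    print(f"P = {P:3d}: P_error = {p_error(1000, P):.3e}")
```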
11. For large $x$, $1 - \operatorname{erf}(x) \approx e^{-x^2}/(x\sqrt{\pi})$. The condition that recall is correct with high probability becomes, for large $N$, $N/2P \gtrsim \ln N$, and therefore $P_{\max} \approx N/(2\ln N)$.
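A worked version of this asymptotic step (a reconstruction of the standard argument, using the definitions from the previous slides):

```latex
\[
1-\operatorname{erf}(x)\;\sim\;\frac{e^{-x^{2}}}{x\sqrt{\pi}},
\qquad x \gg 1 .
\]
With $\sigma^{2}=P/N$, the per-bit error is dominated by the exponential,
\[
P_{\mathrm{error}}\;\sim\;\exp\!\Bigl(-\frac{N}{2P}\Bigr),
\]
and requiring all $N$ recalled bits to be correct, $N\,P_{\mathrm{error}}\ll 1$, gives
\[
\frac{N}{2P}\;\gtrsim\;\ln N
\quad\Longrightarrow\quad
P_{\max}\;\approx\;\frac{N}{2\ln N}.
\]
```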
12. More complex dynamics: the Hopfield model, a model of an autoassociative memory. Define an energy function
$$E = -\frac{1}{2}\sum_{ij} w_{ij}\, s_i s_j .$$
Dynamics:
$$s_i' = \operatorname{sign}\!\Bigl(\sum_j w_{ij}\, s_j\Bigr) = \operatorname{sign}(h_i).$$
I would now like to show that these dynamics always reduce the energy function. Assume the new state of unit $i$ is changed to $s_i' = \operatorname{sign}(h_i)$, and that $w_{ii} = 0$.
13. The change in energy when unit $i$ updates is (using $w_{ij} = w_{ji}$ and $w_{ii} = 0$)
$$\Delta E = -(s_i' - s_i)\, h_i .$$
Assume $s_i' = s_i$; then $\Delta E = 0$. Therefore, if $s_i' \neq s_i$, then $s_i' = -s_i$ and $\Delta E = -2 s_i' h_i$.
14. If the unit flips, then $s_i' = \operatorname{sign}(h_i)$, so $s_i' h_i = |h_i| > 0$ and $\Delta E = -2 s_i' h_i < 0$. Therefore these dynamics minimize this energy function. In order to have convergence, this energy function must be bounded from below!
15. What does it do?
- This is an auto-associative network.
- It retrieves stored patterns.
- It can retrieve corrupted versions of stored patterns (see the sketch after this list).
- It is also robust to noise added to the weight matrix.
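A minimal Hopfield sketch (not from the slides; sizes and the 20% corruption level are arbitrary choices) that retrieves a corrupted pattern while the energy decreases monotonically:

```python
# Store P patterns Hebbian-style, corrupt one, then run asynchronous
# updates and watch E = -(1/2) sum_ij w_ij s_i s_j fall as the overlap
# with the stored pattern rises.
import numpy as np

rng = np.random.default_rng(2)
N, P = 200, 10
X = rng.choice([-1, 1], size=(P, N))
W = (X.T @ X) / N
np.fill_diagonal(W, 0.0)               # w_ii = 0, as assumed in the proof

s = X[0].copy()
flip = rng.choice(N, size=N // 5, replace=False)
s[flip] *= -1                          # corrupt 20% of the bits

def energy(s):
    return -0.5 * s @ W @ s

for sweep in range(5):
    for i in rng.permutation(N):       # asynchronous (one unit at a time)
        s[i] = 1 if W[i] @ s >= 0 else -1
    print(sweep, "E =", energy(s), " overlap =", (s @ X[0]) / N)
```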
16. The capacity of a Hopfield network is $P_{\max} \approx 0.14\,N$. Material synapses may not be unbounded, and might be discrete. What are the implications of this? Batch learning with binary weights: $w_{ij} = \operatorname{sign}\bigl(\sum_\mu x_i^\mu x_j^\mu\bigr)$; or, if all weights must be positive and binary, clip the Hebbian sum to $\{0, 1\}$; if all negative and binary, to $\{-1, 0\}$.
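A sketch of the sign-clipped (binary $\pm 1$) variant (not from the slides; $N$, $P$ are illustrative, and $P$ is chosen odd so no Hebbian sum is exactly zero):

```python
# Batch learning with binary weights: clip the Hebbian sum to sign(.),
# then test one recall step on a stored pattern.
import numpy as np

rng = np.random.default_rng(3)
N, P = 200, 11
X = rng.choice([-1, 1], size=(P, N))
Wb = np.sign(X.T @ X).astype(float)    # binary (+-1) weights
np.fill_diagonal(Wb, 0.0)

h = Wb @ X[0]
s = np.where(h >= 0, 1, -1)
print("overlap after one step:", (s @ X[0]) / N)
```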
17. Show examples (note parallel/Little dynamics). In these different discrete batch-learning paradigms we still have linear scaling of the form $P_{\max} \propto N$.
[Figure: $P_{\max}$ versus $N$ for full (continuous) $W$ and for binary $W$; both scale linearly.]
18. But: introducing online learning.
Assumptions:
- Synapses are bounded.
- Synapses are discrete.
19. Assume synapses can be in a set of $Q$ discrete states between $w_{\min}$ and $w_{\max}$ (the figure shows $Q = 6$), where (for equally spaced levels)
$$w_k = w_{\min} + k\,\frac{w_{\max} - w_{\min}}{Q - 1}, \qquad k = 0, \ldots, Q-1,$$
and $Q$ is the number of discrete states.
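A sketch of an online update on such a synapse (an assumption, not a rule from the slides: each candidate plasticity event moves the weight one level up or down, clipped at the hard bounds; $Q = 6$ and the bounds follow the figure):

```python
# Q discrete, bounded synaptic states: repeated potentiation saturates
# at w_max instead of growing without bound.
import numpy as np

Q, w_min, w_max = 6, -1.0, 1.0
levels = np.linspace(w_min, w_max, Q)        # the Q allowed states

def online_update(state_idx, direction):
    """state_idx: current level index; direction: +1 (LTP) or -1 (LTD)."""
    return int(np.clip(state_idx + direction, 0, Q - 1))  # hard bounds

idx = 2
for d in (+1, +1, +1, +1, -1):
    idx = online_update(idx, d)
    print("w =", levels[idx])
```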
20. Examples for different $Q$.
21. The effect of $N$ on capacity: $P_{\max}$ increases only very moderately (roughly logarithmically) with $N$.
22. The disappointing observation by Fusi and Amit is that capacity grows only logarithmically with $N$. This is bad. Why? A rough argument due to Fusi, Drew and Abbott (2005) explains this. Variables: $r$, the rate of plasticity events; $N$, the number of synapses; and a new parameter $q$. This parameter allows us to randomly choose, from all eligible plasticity events, a subset that will actually change the weights. Once a plasticity event takes place it modifies synapses. After a time $t$, only a fraction
$$e^{-rqt}$$
of these synapses will not have been changed by ongoing activity.
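A numerical check of this surviving fraction (not from the slides; $r$, $q$, $t$, and the synapse count are arbitrary values):

```python
# Plasticity events arrive at rate r; each overwrites a given synapse
# with probability q. The fraction unchanged after time t should be
# E[(1-q)^K] with K ~ Poisson(r t), which equals exp(-r q t).
import numpy as np

rng = np.random.default_rng(4)
r, q, t, n_syn = 0.2, 0.1, 100.0, 100_000
n_events = rng.poisson(r * t, size=n_syn)   # events hitting each synapse
p_survive = (1 - q) ** n_events             # prob all events left it alone
survived = rng.random(n_syn) < p_survive

print("simulated fraction:", survived.mean())
print("exp(-r*q*t):       ", np.exp(-r * q * t))
```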
23. Therefore the signal after a time $t$ has passed is
$$S(t) \propto q\, e^{-rqt}.$$
We can assume (?) that the noise is on the order of $1/\sqrt{N}$. Therefore, the signal is equal to the noise when $q\, e^{-rqt} = 1/\sqrt{N}$, so
$$t = \frac{\ln(q\sqrt{N})}{rq}$$
is the time when the signal becomes equal to the noise. Although this looks bad, the value of $t$ can be optimized by changing $q$. How would you find this $q$?
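One way to find it: maximize $t(q)$ numerically and compare with the calculus answer, $dt/dq = 0 \Rightarrow \ln(q\sqrt{N}) = 1 \Rightarrow q^* = e/\sqrt{N}$. A sketch (not from the slides; $N$ and $r$ match the example on the next slide):

```python
# Maximize t(q) = ln(q*sqrt(N)) / (r*q) over q on a log grid.
import numpy as np

N, r = 1_000_000, 0.2
qs = np.logspace(-5, 0, 10_000)
t = np.log(qs * np.sqrt(N)) / (r * qs)

q_star = qs[np.argmax(t)]
print("numerical q*:", q_star, " analytic e/sqrt(N):", np.e / np.sqrt(N))
print("t_max:", t.max(), " analytic sqrt(N)/(e*r):", np.sqrt(N) / (np.e * r))
```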
24. This expression is optimized for $q^* = e/\sqrt{N}$, an extremely small fraction. We then obtain the value $t_{\max} = \sqrt{N}/(e\,r)$. However, this relatively high capacity comes at a cost: the signal-to-noise ratio ($S$/noise) is small. For the unoptimized case ($q = 1$), if $r = 0.2$ Hz, a memory with $10^6$ synapses would last 30 sec; with a billion synapses, one minute.
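A quick arithmetic check of those figures (my reading, not explicit on the slide: the 30 s and one-minute values match the $q = 1$ lifetime $t = \ln(\sqrt{N})/r$):

```python
# With q = 1 the signal-equals-noise time is t = ln(sqrt(N)) / r.
import numpy as np

r = 0.2
for n_syn in (1e6, 1e9):
    print(f"N = {n_syn:.0e}:  t = {np.log(np.sqrt(n_syn)) / r:.1f} s")
# N = 1e6 -> ~34.5 s ; N = 1e9 -> ~51.8 s
```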
25. Student summary.
26. The Cascade model, a possible solution (Fusi et al. 2005). Observation: if we could forget with a power-law dependence (rather than exponential) we could improve the scaling of memory. If the signal decayed as $S(t) \propto t^{-a}$, then setting $S(t) = 1/\sqrt{N}$ would give a lifetime $t \propto N^{1/2a}$, a power of $N$ rather than a logarithm. The aim of this model is to try and approach this limit.
The Cascade model: a chain of states with binary efficacy and increasingly slow transitions. Other variables: $f_+$ ($= 1 - f_-$) is the probability of a potentiation event given a plasticity event, and the transition probabilities fall off down the cascade as powers of $x$, with $x = 1/2$.
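A sketch of the key idea only, not the full model of Fusi et al. 2005 (the level count and rates are illustrative assumptions): a cascade whose levels decay with geometrically spaced rates $r x^k$ behaves like a weighted mixture of exponentials, and that mixture forgets roughly as a power law ($\sim 1/t$) rather than as a single exponential.

```python
# Superpose exponentials with rates r*x^k and weights ~x^k: the sum
# decays ~1/t over many decades, unlike any single exponential term.
import numpy as np

r, x, n_levels = 1.0, 0.5, 15
t = np.logspace(-1, 4, 50)
k = np.arange(n_levels)
weights = x ** k
weights /= weights.sum()
rates = r * x ** k
S = (weights[:, None] * np.exp(-rates[:, None] * t[None, :])).sum(axis=0)

for ti, si in zip(t[::10], S[::10]):
    print(f"t = {ti:9.2f}   S = {si:.4f}")   # roughly S ~ 1/t at large t
```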
27. The mean-field equations.
28. Power law (initial segment: exponent $k = 3/4$). What does "signal" mean here?
29. Therefore: the optimal number of elements in a cascade, as a function of $N$ (figure).
30. The cascade model: student summary.
- Describe the assumptions.
- What is it good for?
- What does it mean in terms of real biology?