Title: Extending SpikeProp
1. Extending SpikeProp
- Benjamin Schrauwen
- Jan Van Campenhout
- Ghent University
- Belgium
2. Overview
- Introduction
- SpikeProp
- Improvements
- Results
- Conclusions
3. Introduction
- Spiking neural networks are receiving increased attention
- Biologically more plausible
- Computationally stronger (W. Maass)
- Compact and fast hardware implementations are possible (analogue and digital)
- Have a temporal nature
- Main problem: supervised learning algorithms
4. SpikeProp
- Introduced by S. Bohte et al. in 2000
- An error-backpropagation learning algorithm
- Only for SNNs using time-to-first-spike coding
5. Architecture of SpikeProp
- Originally introduced by Natschläger and Ruf
- Every connection consists of several synaptic connections
- All 16 synaptic connections have enumerated delays (1-16 ms) and different weights; originally they all use the same synaptic filter
6. SRM neuron
- Modified Spike Response Model (Gerstner)
- The neuron reset is of no interest because only one spike per neuron is needed!
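The reduced SRM can be stated compactly; the following is a reconstruction following Bohte et al. (2002), with weights w, delays d, and a synaptic time constant τ taken from that notation:

```latex
% Membrane potential of neuron j as a weighted sum of delayed
% spike-response kernels over all presynaptic firing times t_i;
% neuron j fires at the first time t_j where x_j(t_j) reaches the
% threshold.
x_j(t) = \sum_{i \in \Gamma_j} \sum_{k=1}^{m} w_{ij}^{k}\,
         \varepsilon\bigl(t - t_i - d^{k}\bigr),
\qquad
\varepsilon(s) = \frac{s}{\tau}\, e^{\,1 - s/\tau}
\quad (s > 0,\ \text{else } 0)
```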
7. Idea behind SpikeProp
- Minimise the sum-squared error (SSE) between the actual and the desired output spike times
- Change each weight along the negative direction of the gradient
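In symbols (a reconstruction; t_j^a and t_j^d denote the actual and desired firing times of output neuron j):

```latex
% Sum-squared error over the output neurons J, and plain gradient
% descent on each synaptic weight with learning rate \eta.
E = \tfrac{1}{2} \sum_{j \in J} \bigl(t_j^{a} - t_j^{d}\bigr)^{2},
\qquad
\Delta w_{ij}^{k} = -\eta \, \frac{\partial E}{\partial w_{ij}^{k}}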
8. Math of SpikeProp
- Linearise the membrane potential around the threshold-crossing time
- Only the output-layer rule is given here
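A sketch of the output-layer rule as it appears in Bohte et al. (2002), with the shorthand y_i^k(t) = ε(t - t_i - d^k). Since the firing time t_j is only defined implicitly by the threshold crossing x_j(t_j) = ϑ, it is linearised around that crossing:

```latex
% Linearising x_j around the threshold crossing gives the derivative
% of the firing time with respect to a weight, and hence the update.
\frac{\partial t_j}{\partial w_{ij}^{k}}
  \approx - \frac{y_i^{k}(t_j^{a})}{\partial x_j(t_j^{a})/\partial t},
\qquad
\Delta w_{ij}^{k} = -\eta\, y_i^{k}(t_j^{a})\, \delta_j,
\qquad
\delta_j = \frac{t_j^{d} - t_j^{a}}
                {\sum_{i \in \Gamma_j}\sum_{l} w_{ij}^{l}\,
                 \partial y_i^{l}(t_j^{a})/\partial t}
```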
9. Problems with SpikeProp
- Overdetermined architecture
- Tendency to get stuck when a neuron stops firing
- Problems with weight initialisation
10. Solving some of the problems
- Instead of enumerating parameters, learn them:
  - Delays
  - Synaptic time constants
  - Thresholds
- We can then use a much more limited architecture
- Add a specific mechanism to keep neurons firing: decrease the threshold
11. Learn more parameters
- Quite similar to the weight update rule
- Gradient of the error with respect to each parameter
- Parameter-specific learning rate
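A sketch of the common template, with per-parameter learning rates η_p:

```latex
% The same gradient-descent template is reused for every learned
% parameter p (a delay, a synaptic time constant, or a threshold),
% each with its own learning rate.
\Delta p = -\eta_p \, \frac{\partial E}{\partial p},
\qquad
p \in \bigl\{ d_{ij}^{k},\ \tau_{ij}^{k},\ \vartheta_j \bigr\}
```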
12. Math of the improvements - delays
- The delta is the same as in the weight rule; thus, as for the weights, the delta formula differs between the output and the hidden layers
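The equations themselves are not reproduced in this transcript; the following is a reconstruction under the same linearisation as the weight rule. Because y_i^k(t) = ε(t - t_i - d_ij^k) depends on the delay only through the kernel argument, the gradient involves the kernel derivative ε':

```latex
% Delay update for the output layer, reusing \delta_j from the
% weight rule; for this kernel,
% \varepsilon'(s) = \varepsilon(s)\,(1/s - 1/\tau).
\Delta d_{ij}^{k}
  = -\eta_d \, \frac{\partial E}{\partial d_{ij}^{k}}
  = \eta_d \, w_{ij}^{k}\,
    \varepsilon'\bigl(t_j^{a} - t_i - d_{ij}^{k}\bigr)\, \delta_j
```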
13. Math of the improvements - synaptic time constants
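Similarly (again a reconstruction under the same linearisation), the time-constant rule uses the partial derivative of the kernel with respect to τ:

```latex
% With s = t_j^a - t_i - d_ij^k, the kernel derivative is
% \partial\varepsilon(s)/\partial\tau = \varepsilon(s)\,(s/\tau - 1)/\tau.
\Delta \tau_{ij}^{k}
  = -\eta_\tau \, \frac{\partial E}{\partial \tau_{ij}^{k}}
  = -\eta_\tau \, w_{ij}^{k}\,
    \frac{\partial \varepsilon(s)}{\partial \tau}\, \delta_j
```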
14. Math of the improvements - thresholds
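For the threshold (a reconstruction), differentiating the crossing condition x_j(t_j) = ϑ_j gives ∂t_j/∂ϑ_j = 1/(∂x_j/∂t), so the update collapses to the delta itself:

```latex
% Threshold update for an output neuron: since
% \partial t_j / \partial\vartheta_j = (\partial x_j/\partial t)^{-1},
% the error gradient reduces to -\delta_j.
\Delta \vartheta_j
  = -\eta_\vartheta \, \frac{\partial E}{\partial \vartheta_j}
  = \eta_\vartheta \, \delta_j
```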
15. What if training gets stuck?
- If one of the neurons in the network stops firing, the training rule stops working
- Solution: actively lower the threshold of a neuron whenever it stops firing (multiply it by 0.9), as sketched below
- This is the same as scaling all of its weights up
- Improves convergence
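A minimal sketch of this mechanism; the class and names are illustrative, not the authors' implementation:

```python
from dataclasses import dataclass

@dataclass
class Neuron:
    threshold: float
    fired: bool = False  # set during the forward pass

def rescue_silent_neurons(neurons, decay=0.9):
    """Multiply the threshold of every silent neuron by 0.9, which is
    equivalent to scaling all of its incoming weights up."""
    for n in neurons:
        if not n.fired:
            n.threshold *= decay

# usage: after a forward pass in which the second neuron stayed silent
layer = [Neuron(threshold=1.0, fired=True), Neuron(threshold=1.0)]
rescue_silent_neurons(layer)
print(layer[1].threshold)  # 0.9
```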
16. What about weight initialisation
- Weight initialisation is a difficult problem
- The original publication gives only a vague description of the process
- S. M. Moore contacted S. Bohte personally to clarify the subject for his master's thesis
- Weight initialisation is done by a complex procedure
- Moore concluded that weights should be initialised in such a way that every neuron initially fires, and that its membrane potential does not surpass the threshold too much
17. What about weight initialisation
- In this publication we chose a very simple initialisation procedure (sketched below)
- Initialise all weights randomly
- Afterwards, set a weight such that the sum of all weights equals 1.5
- Convergence rates could be increased by using a more complex initialisation procedure
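A minimal sketch of one reading of this procedure; the random range and the choice to adjust the last weight are assumptions, not the paper's exact recipe:

```python
import numpy as np

def init_weights(n_synapses, target_sum=1.5, rng=None):
    """Draw random weights, then set the last weight so that the sum
    of all weights equals target_sum (1.5, as on the slide)."""
    rng = rng or np.random.default_rng()
    # assumed range: small positive weights around target_sum / n
    w = rng.uniform(0.0, 2.0 * target_sum / n_synapses, size=n_synapses)
    w[-1] += target_sum - w.sum()  # force the exact sum
    return w

w = init_weights(16)
print(round(w.sum(), 6))  # 1.5
```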
18. Problem with large delays
- During testing of the algorithm, a problem arose when the trained delays became very large: delay learning stopped
- If the delayed input spike only arrives after the output spike, there is a problem: the kernel and its gradient are zero there, so the delay no longer receives any update
- Solved by constraining the delays, as sketched below
(Figure: the neuron's output spike preceding the delayed input spike)
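A minimal sketch of the constraint, assuming delays are simply clipped to a valid range after each gradient step (the bounds are illustrative):

```python
import numpy as np

def update_delays(delays, grads, lr=0.05, d_min=0.0, d_max=16.0):
    """One gradient step on the delays, followed by clipping, so a
    delayed input cannot drift permanently past the output spike."""
    delays = delays - lr * grads
    return np.clip(delays, d_min, d_max)
```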
19. Results
- Tested on binary XOR (MSE 1 ms)
20. Results
- Optimal learning rates (found by experiment)
- Some rates seem very high, but that is because the values we work with are times expressed in ms
- The idea that the learning rate must be approximately 0.1 is only correct when the inputs and weights are normalised!
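A one-line justification (to first order, ignoring the dependence of δ on the inputs): rescaling the inputs is equivalent to rescaling the learning rate, so the η ≈ 0.1 heuristic presupposes normalised quantities:

```latex
% Rescaling the inputs y -> c*y in a gradient step is absorbed
% into the learning rate.
\Delta w = -\eta \,(c\,y)\,\delta = -(c\,\eta)\, y\, \delta
```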
21. Conclusions
- Because the parameters can be learned, no enumeration is necessary; thus the architectures are much smaller
- For XOR:
  - 8 times fewer weights are needed
  - Learning converges faster (50% of the original)
  - No complex initialisation functions
  - Positive and negative weights can be mixed
- But convergence deteriorates when the weights are reduced further
22. Conclusions
- The technique has only been tested on a small problem; it should be tested on real-world applications
- But we are currently preparing a journal paper on a new backprop rule that
  - supports a multitude of coding hypotheses (population coding, convolution coding, ...)
  - has better convergence
  - has simpler weight initialisation
  - ...