Extending SpikeProp
IJCNN, July 27, 2004
Benjamin.Schrauwen@elis.UGent.be
1
Extending SpikeProp
  • Benjamin Schrauwen
  • Jan Van Campenhout
  • Ghent University
  • Belgium

2
Overview
  • Introduction
  • SpikeProp
  • Improvements
  • Results
  • Conclusions

3
Introduction
  • Spiking neural networks are receiving increased attention
  • Biologically more plausible
  • Computationally stronger (W. Maass)
  • Compact and fast implementations in hardware
    possible (analogue and digital)
  • Have a temporal nature
  • Main problem: supervised learning algorithms

4
SpikeProp
  • Introduced by S. Bohte et al. in 2000
  • An error-backpropagation learning algorithm
  • Works only for SNNs using time-to-first-spike coding

5
Architecture of SpikeProp
  • Originally introduced by Natschläger and Ruf
  • Every connection consists of several synaptic
    connections
  • All 16 synaptic connections have enumerated
    delays (1-16 ms) and different weights; originally
    they all use the same filter

6
SRM neuron
  • Modified Spike Response Model (Gerstner)

The neuron reset is of no interest because only one
spike is needed!
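The slide's equation is not reproduced in this transcript; as a sketch, the modified SRM potential in the SpikeProp setting (notation follows Bohte et al., and is an assumption rather than taken from this slide) is

```latex
x_j(t) = \sum_{i \in \Gamma_j} \sum_{k=1}^{16} w_{ij}^{k}\,
         \varepsilon\!\left(t - t_i - d^{k}\right),
\qquad
\varepsilon(t) = \frac{t}{\tau}\, e^{1 - t/\tau} \quad (t > 0,\ \text{else } 0)
```

where neuron j emits its single spike at the first time t_j with x_j(t_j) ≥ θ_j.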
7
Idea behind SpikeProp
Minimize the SSE between the actual output spike time
and the desired output spike time.
Change the weights along the negative direction of the
gradient.
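The two statements above correspond to the standard SpikeProp objective and update; as a sketch (symbols follow Bohte et al., not this slide):

```latex
E = \tfrac{1}{2} \sum_{j \in J} \left(t_j^{a} - t_j^{d}\right)^2,
\qquad
\Delta w_{ij}^{k} = -\eta \,\frac{\partial E}{\partial w_{ij}^{k}}
```

with t_j^a the actual and t_j^d the desired spike time of output neuron j.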
8
Math of SpikeProp
Linearise around the threshold-crossing time.
Only the output layer is shown here.
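A sketch of what the linearisation yields for the output layer, following the published SpikeProp derivation (assumed, since the slide's formulas are not in the transcript):

```latex
\frac{\partial t_j}{\partial x_j(t_j)} \approx
  \frac{-1}{\partial x_j(t_j)/\partial t},
\qquad
\Delta w_{ij}^{k} = -\eta\, y_i^{k}(t_j)\, \delta_j,
\qquad
\delta_j = \frac{t_j^{d} - t_j^{a}}
  {\sum_{i \in \Gamma_j} \sum_{l} w_{ij}^{l}\,
   \partial y_i^{l}(t_j)/\partial t_j}
```

where y_i^k(t) = ε(t − t_i − d^k) is the unweighted contribution of synapse k from neuron i.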
9
Problems with SpikeProp
  • Overdetermined architecture
  • Tendency to get stuck when a neuron stops firing
  • Problems with weight initialisation

10
Solving some of the problems
  • Instead of enumerating parameters learn them
  • Delays
  • Synaptic time constants
  • Thresholds
  • We can use much more limited architecture
  • Add specific mechanism to keep neurons firing
    decrease threshold

11
Learn more parameters
  • Quite similar to weight update rule
  • Gradient of error with respect to parameter
  • Parameter specific learning rate
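The common pattern behind these rules can be sketched as follows (notation assumed, mirroring the weight rule): for any learnable parameter p with its own learning rate η_p,

```latex
\Delta p = -\eta_p \,\frac{\partial E}{\partial p}
         = -\eta_p\, \delta_j \,\frac{\partial x_j(t_j)}{\partial p}
```

so only the partial derivative of the membrane potential with respect to p changes from one parameter to the next.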

12
Math of the improvements - delays
The delta is the same as in the weight rule; thus the
delta formula differs between the output and the inner
layers.
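Since the delay d enters only through ε(t − t_i − d), the delay gradient can be sketched as (sign conventions assumed):

```latex
\frac{\partial x_j(t_j)}{\partial d_{ij}^{k}}
  = -\,w_{ij}^{k}\, \varepsilon'\!\left(t_j - t_i - d_{ij}^{k}\right)
\;\;\Rightarrow\;\;
\Delta d_{ij}^{k}
  = \eta_d\, \delta_j\, w_{ij}^{k}\,
    \varepsilon'\!\left(t_j - t_i - d_{ij}^{k}\right)
```

reusing the same δ_j as the weight rule, scaled by the delay-specific learning rate η_d.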
13
Math of the improvements - synaptic time constants
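For the α-shaped filter ε(t) = (t/τ) e^{1−t/τ} assumed above, the derivative with respect to the synaptic time constant has a compact form, which the update rule would plug into the same δ_j pattern:

```latex
\frac{\partial \varepsilon(t)}{\partial \tau}
  = \varepsilon(t)\,\frac{t - \tau}{\tau^{2}}
```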
14
Math of the improvements - thresholds
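A sketch of the threshold rule (assumed, derived from the firing condition rather than copied from the slide): since neuron j fires when x_j(t_j) = θ_j, implicit differentiation gives

```latex
\frac{\partial t_j}{\partial \theta_j}
  = \frac{1}{\partial x_j(t_j)/\partial t}
\;\;\Rightarrow\;\;
\Delta \theta_j = -\eta_\theta\,
  \frac{\partial E}{\partial t_j}\,
  \frac{\partial t_j}{\partial \theta_j}
```

so a steeper potential at the crossing time means the spike time is less sensitive to the threshold.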
15
What if training gets stuck?
  • If one of the neurons in the network stops
    firing, the training rule stops working
  • Solution: actively lower the threshold of a neuron
    whenever it stops firing (multiply it by 0.9)
  • Same as scaling all the weights up
  • Improves convergence
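The mechanism above is simple enough to sketch in a few lines; this is an illustrative Python fragment (the function name and data layout are assumptions, not from the presentation), using the slide's factor of 0.9:

```python
def boost_silent_neurons(thresholds, fired, factor=0.9):
    """Lower the threshold of every neuron that failed to fire.

    thresholds: firing threshold per neuron
    fired: booleans, True if the neuron emitted a spike this pass
    factor: multiplicative decrease (0.9 on the slide), equivalent
            to scaling all of that neuron's weights up by 1/factor
    """
    return [th * factor if not f else th
            for th, f in zip(thresholds, fired)]
```

Called after each forward pass, this keeps the gradient defined for every neuron, since a neuron that never crosses its threshold produces no spike time to differentiate.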

16
What about weight initialisation
  • Weight initialisation is a difficult problem
  • Original publication has vague description of
    process
  • S. M. Moore contacted S. Bohte personally to
    clarify the subject for his master's thesis
  • Weight initialisation is done by a complex
    procedure
  • Moore concluded that weights should be
    initialised in such a way that every neuron
    initially fires, and that its membrane potential
    doesn't surpass the threshold too much

17
What about weight initialisation
  • In this publication we chose a very simple
    initialisation procedure
  • Initialise all weights randomly
  • Afterwards, adjust one weight so that the sum of all
    weights equals 1.5
  • Convergence rates could be increased by using a
    more complex initialisation procedure
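A minimal Python sketch of this procedure, under the reading that a single weight is adjusted to make the sum exact (the function name, the uniform distribution, and the choice of which weight to adjust are assumptions):

```python
import random

def init_weights(n, total=1.5, seed=None):
    """Randomly initialise n synaptic weights, then adjust the
    last one so the sum of all weights equals `total` (1.5 here).
    """
    rng = random.Random(seed)
    w = [rng.random() for _ in range(n)]
    w[-1] += total - sum(w)  # fix up one weight so the sum is exact
    return w
```

Fixing the total incoming weight guarantees each neuron receives enough drive to fire initially, without the complex procedure described above.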

18
Problem with large delays
  • During testing of the algorithm, a problem
    arose when the trained delays got very large:
    delay learning stopped
  • If the delayed input arrives after the output
    spike, there is a problem
  • Solved by constraining the delays

(Figure: output spike of the neuron preceding the delayed input spike)
19
Results
  • Tested on binary XOR (MSE 1 ms)
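For context, a time-to-first-spike encoding of XOR can be sketched as below; the specific spike times (early = 0 ms, late = 6 ms inputs; 10 ms / 16 ms targets) follow the common SpikeProp benchmark setup and are illustrative assumptions, not values from this presentation:

```python
EARLY, LATE = 0.0, 6.0            # input coding: logical 1 -> early spike (ms)
OUT_TRUE, OUT_FALSE = 10.0, 16.0  # target output spike times (ms)

def encode_xor():
    """Return (input spike times, target output spike time) pairs
    for the four XOR patterns, in time-to-first-spike coding."""
    dataset = []
    for a in (0, 1):
        for b in (0, 1):
            t_a = EARLY if a else LATE
            t_b = EARLY if b else LATE
            target = OUT_TRUE if a ^ b else OUT_FALSE
            dataset.append(((t_a, t_b), target))
    return dataset
```

The network is then trained until the mean squared error between actual and target output spike times drops to the 1 ms criterion.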

20
Results
  • Optimal learning rates (found by experiment)
  • Some rates seem very high, but that is because
    the values we work with are times expressed in ms
  • The idea that the learning rate must be approx. 0.1
    is only correct when the inputs and weights are
    normalised!

21
Conclusions
  • Because the parameters can be learned, no enumeration
    is necessary, so the architectures are much smaller
  • For XOR
  • 8 times fewer weights needed
  • Learning converges faster (50% of the original)
  • No complex initialisation functions
  • Positive and negative weights can be mixed
  • But convergence deteriorates with further
    reduction of the weights

22
Conclusions
  • The technique has only been tested on a small
    problem; it should be tested on real-world
    applications
  • But we are currently preparing a journal paper
    on a new backprop rule that
  • supports a multitude of coding hypotheses
    (population coding, convolution coding, ...)
  • has better convergence
  • has simpler weight initialisation
  • ...