1
CSC2535: Computation in Neural Networks
Lecture 8: Hopfield nets
  • Geoffrey Hinton

2
Hopfield Nets
  • Networks of binary threshold units with recurrent
    connections are very hard to analyse.
  • But Hopfield realized that if the connections are
    symmetric, there is a global energy function.
  • Each configuration of the network has an
    energy.
  • The binary threshold decision rule obeys the
    energy function: it minimizes the energy locally.
  • Hopfield proposed that memories could be energy
    minima of a neural net.
  • The binary threshold decision rule can then be
    used to clean up incomplete or corrupted
    memories.

3
The energy function
  • The global energy is the sum of many
    contributions. Each contribution depends on one
    connection weight and the binary states of two
    neurons.
  • The simple quadratic energy function makes it
    easy to compute how the state of one neuron
    affects the global energy (reconstructed below).
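
The equation from the original slide is not preserved in this
transcript. The standard Hopfield energy, assuming binary states
s_i, symmetric weights w_{ij}, and biases b_i, is

    E = -\sum_{i<j} s_i s_j w_{ij} - \sum_i s_i b_i

and the resulting "energy gap" for one unit, which the binary
threshold rule compares with zero, is

    \Delta E_i = E(s_i = 0) - E(s_i = 1) = b_i + \sum_j s_j w_{ij}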

4
Settling to an energy minimum
  • Pick the units one at a time and flip their
    states if that reduces the global energy (see the
    sketch after the figure).
  • Find the minima in the net shown below.
  • If units make simultaneous decisions, the energy
    could go up.

[Figure: a small example network of binary threshold units with
labelled weights (labels include -4, 3, 2, -1, -100, 0, and 5);
the task is to find its energy minima]
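
A minimal sketch of this one-unit-at-a-time settling procedure,
assuming 0/1 states and the energy gap given above (the function
name and NumPy implementation are not from the slides):

    import numpy as np

    def settle(W, b, s, max_sweeps=100, seed=0):
        """Flip one unit at a time whenever that lowers the energy.
        W: symmetric weights, zero diagonal; b: biases; s: 0/1 states."""
        rng = np.random.default_rng(seed)
        for _ in range(max_sweeps):
            changed = False
            for i in rng.permutation(len(s)):
                gap = b[i] + W[i] @ s          # energy(s_i=0) - energy(s_i=1)
                new = 1.0 if gap > 0 else 0.0  # on exactly when that is lower
                if new != s[i]:
                    s[i], changed = new, True
            if not changed:                    # no unit wants to flip: a minimum
                return s
        return s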
5
Storing memories
  • If we use activities of 1 and -1, we can store a
    state vector by incrementing the weight between
    any two units by the product of their activities
    (see the sketch below).
  • Treat biases as weights from a permanently-on
    unit.
  • With states of 0 and 1 the rule is slightly more
    complicated.
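
A sketch of this one-shot storage rule for +/-1 activities (the
helper name is an assumption; biases can be handled, as the slide
suggests, by appending a permanently-on unit to every vector):

    import numpy as np

    def store(patterns):
        """One-shot Hebbian storage: increment w_ij by the product of
        the activities of units i and j, summed over all stored vectors.
        patterns: array whose rows are +/-1 state vectors."""
        n = patterns.shape[1]
        W = np.zeros((n, n))
        for p in patterns:
            W += np.outer(p, p)       # w_ij += s_i * s_j
        np.fill_diagonal(W, 0.0)      # no self-connections
        return W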

6
Spurious minima
  • Each time we memorize a configuration, we hope to
    create a new energy minimum.
  • But what if two nearby minima merge to create a
    minimum at an intermediate location?
  • This limits the capacity of a Hopfield net.
  • Using Hopfield's storage rule the capacity of a
    totally connected net with N units is only 0.15N
    memories.

7
Better storage rules
  • We could improve efficiency by using sparse
    vectors.
  • It's optimal to have log N bits on in each vector
    and the rest off. This gives more useful bits per
    bit, provided we adjust the thresholds dynamically
    during retrieval.
  • Instead of trying to store vectors in one shot as
    Hopfield does, cycle through the training set and
    use the perceptron convergence procedure to train
    each unit to have the correct state given the
    states of all the other units in that vector (see
    the sketch below).
  • This uses the capacity of the weights efficiently.
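
A sketch of this perceptron-style storage scheme (the epoch count,
learning rate, and the decision to leave the weights unsymmetrized
are all assumptions):

    import numpy as np

    def perceptron_store(patterns, epochs=100, lr=1.0):
        """Cycle through the training set, using the perceptron rule
        to train each unit to adopt its correct state given all the
        others. patterns: array whose rows are +/-1 state vectors."""
        n = patterns.shape[1]
        W = np.zeros((n, n))
        b = np.zeros(n)
        for _ in range(epochs):
            for p in patterns:
                for i in range(n):
                    inp = p.copy()
                    inp[i] = 0                  # unit i sees only the others
                    pred = 1 if b[i] + W[i] @ inp > 0 else -1
                    if pred != p[i]:            # wrong state: perceptron update
                        W[i] += lr * p[i] * inp
                        b[i] += lr * p[i]
        return W, b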

8
Avoiding spurious minima by unlearning
  • Hopfield, Feinstein, and Palmer suggested the
    following strategy:
  • Let the net settle from a random initial state
    and then do unlearning (see the sketch below).
  • This will get rid of deep, spurious minima and
    increase memory capacity.
  • Crick and Mitchison proposed it as a model of
    what dreams are for.
  • But how much unlearning should we do?
  • And can we analyze what unlearning achieves?
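
A minimal sketch of one settle-then-unlearn step, assuming
"unlearning" means a small anti-Hebbian decrement (the step size
eps is an assumption, and settle() is the earlier sketch):

    import numpy as np

    def unlearn_step(W, b, eps=0.01, seed=0):
        """Let the net settle from a random initial state, then weaken
        the weights supporting whatever minimum it fell into."""
        rng = np.random.default_rng(seed)
        s = rng.integers(0, 2, size=len(b)).astype(float)  # random 0/1 start
        s = settle(W, b, s)                # settle() from the earlier sketch
        W -= eps * np.outer(s, s)          # anti-Hebbian "unlearning"
        np.fill_diagonal(W, 0.0)
        return W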

9
Wishful thinking?
  • Wouldn't it be nice if interleaved learning and
    unlearning corresponded to maximum likelihood
    fitting of a model to the training set?
  • This seems improbable, especially if we want to
    include hidden units whose states are not
    specified by the vectors to be stored.

10
Another computational role for Hopfield nets
[Figure: a net whose hidden units are used to represent an
interpretation of the input and whose visible units are used to
represent the inputs]
  • Instead of using the net to store memories, use
    it to construct interpretations of sensory input.
  • The input is represented by the visible units.
  • The interpretation is represented by the states
    of the hidden units.
  • The badness of the interpretation is represented
    by the energy.
  • This raises two difficult issues:
  • How do we escape from poor local minima to get
    good interpretations?
  • How do we learn the weights on connections to the
    units?

11
Stochastic units make search easier
  • Replace the binary threshold units by binary
    stochastic units (see the sketch after the
    figure).
  • Use temperature to make it easier to cross energy
    barriers.
  • Start at high temperature where it's easy to cross
    energy barriers.
  • Reduce slowly to low temperature where good
    states are much more probable than bad ones.

[Figure: an energy landscape with states A, B, and C]
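
A sketch of annealed search with binary stochastic units, assuming
the standard logistic rule p(s_i = 1) = 1/(1 + e^{-\Delta E_i / T})
(the geometric temperature schedule is an assumption):

    import numpy as np

    def anneal(W, b, s, T_start=10.0, T_end=0.1, sweeps=200, seed=0):
        """Binary stochastic units: unit i turns on with probability
        sigma(gap / T); temperature falls slowly to T_end."""
        rng = np.random.default_rng(seed)
        for T in np.geomspace(T_start, T_end, sweeps):
            for i in rng.permutation(len(s)):
                gap = b[i] + W[i] @ s                  # energy(off) - energy(on)
                p_on = 1.0 / (1.0 + np.exp(-gap / T))  # logistic rule
                s[i] = 1.0 if rng.random() < p_on else 0.0
        return s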
12
The annealing trade-off
  • At high temperature the transition probabilities
    for uphill jumps are much greater.
  • At low temperature the equilibrium probabilities
    of good states are much higher than the
    probabilities of bad ones.
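
Both claims follow from the equilibrium (Boltzmann) distribution,
which the transcript never states explicitly:

    p(s) \propto e^{-E(s)/T}, so p(A)/p(B) = e^{(E_B - E_A)/T}

At low T a small energy difference gives an enormous probability
ratio in favour of the lower-energy state; at high T the ratio
approaches 1 and uphill transitions become common.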

13
Why annealing works
  • In high dimensions, energy barriers are typically
    much more degenerate than the minima that they
    separate.
  • At high temperature the free energy of the
    barrier is lower than the free energy of the
    minima (made precise after the figure).

[Figure: an energy curve E with minima A and C separated by a wide
band of B states, and the corresponding free-energy curve F]
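
One standard way to make this precise (textbook statistical
mechanics, not from the slide): a barrier region containing
\Omega configurations of roughly equal energy E has entropy
S = \log \Omega, so its free energy is

    F = E - T S = E - T \log \Omega

and at high temperature a highly degenerate barrier can have lower
free energy than a narrow, lower-energy minimum.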