1
CSC2535: Computation in Neural Networks
Lecture 8: Hopfield nets
  • Geoffrey Hinton

2
Hopfield Nets
  • Networks of binary threshold units with recurrent
    connections are very hard to analyse.
  • But Hopfield realized that if the connections are
    symmetric, there is a global energy function.
  • Each configuration of the network has an
    energy.
  • The binary threshold decision rule obeys the
    energy function: it minimizes the energy locally.
  • Hopfield proposed that memories could be energy
    minima of a neural net.
  • The binary threshold decision rule can then be
    used to clean up incomplete or corrupted
    memories.

3
The energy function
  • The global energy is the sum of many
    contributions. Each contribution depends on one
    connection weight and the binary states of two
    neurons.
  • The simple quadratic energy function makes it
    easy to compute how the state of one neuron
    affects the global energy (reconstructed below).
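
The equation from the original slide is not preserved in this
transcript. The standard Hopfield energy, assuming binary states
s_i, symmetric weights w_{ij}, and biases b_i, is

    E = -\sum_{i<j} s_i s_j w_{ij} - \sum_i s_i b_i

and the resulting "energy gap" for one unit, which the binary
threshold rule compares with zero, is

    \Delta E_i = E(s_i = 0) - E(s_i = 1) = b_i + \sum_j s_j w_{ij}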

4
Settling to an energy minimum
  • Pick the units one at a time and flip their
    states if that reduces the global energy (see the
    sketch after the figure).
  • Find the minima in the net shown below.
  • If units make simultaneous decisions, the energy
    could go up.

[Figure: a small example network of binary threshold units with
labelled weights (labels include -4, 3, 2, -1, -100, 0, and 5);
the task is to find its energy minima]
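
A minimal sketch of this one-unit-at-a-time settling procedure,
assuming 0/1 states and the energy gap given above (the function
name and NumPy implementation are not from the slides):

    import numpy as np

    def settle(W, b, s, max_sweeps=100, seed=0):
        """Flip one unit at a time whenever that lowers the energy.
        W: symmetric weights, zero diagonal; b: biases; s: 0/1 states."""
        rng = np.random.default_rng(seed)
        for _ in range(max_sweeps):
            changed = False
            for i in rng.permutation(len(s)):
                gap = b[i] + W[i] @ s          # energy(s_i=0) - energy(s_i=1)
                new = 1.0 if gap > 0 else 0.0  # on exactly when that is lower
                if new != s[i]:
                    s[i], changed = new, True
            if not changed:                    # no unit wants to flip: a minimum
                return s
        return s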
5
Storing memories
  • If we use activities of 1 and -1, we can store a
    state vector by incrementing the weight between
    any two units by the product of their activities
    (see the sketch below).
  • Treat biases as weights from a permanently-on
    unit.
  • With states of 0 and 1 the rule is slightly more
    complicated.
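
A sketch of this one-shot storage rule for +/-1 activities (the
helper name is an assumption; biases can be handled, as the slide
suggests, by appending a permanently-on unit to every vector):

    import numpy as np

    def store(patterns):
        """One-shot Hebbian storage: increment w_ij by the product of
        the activities of units i and j, summed over all stored vectors.
        patterns: array whose rows are +/-1 state vectors."""
        n = patterns.shape[1]
        W = np.zeros((n, n))
        for p in patterns:
            W += np.outer(p, p)       # w_ij += s_i * s_j
        np.fill_diagonal(W, 0.0)      # no self-connections
        return W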

6
Spurious minima
  • Each time we memorize a configuration, we hope to
    create a new energy minimum.
  • But what if two nearby minima merge to create a
    minimum at an intermediate location?
  • This limits the capacity of a Hopfield net.
  • Using Hopfield's storage rule the capacity of a
    totally connected net with N units is only 0.15N
    memories.

7
Better storage rules
  • We could improve efficiency by using sparse
    vectors.
  • It's optimal to have log N bits on in each vector
    and the rest off. This gives more useful bits per
    bit, provided we adjust the thresholds dynamically
    during retrieval.
  • Instead of trying to store vectors in one shot as
    Hopfield does, cycle through the training set and
    use the perceptron convergence procedure to train
    each unit to have the correct state given the
    states of all the other units in that vector (see
    the sketch below).
  • This uses the capacity of the weights efficiently.
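
A sketch of this perceptron-style storage scheme (the epoch count,
learning rate, and the decision to leave the weights unsymmetrized
are all assumptions):

    import numpy as np

    def perceptron_store(patterns, epochs=100, lr=1.0):
        """Cycle through the training set, using the perceptron rule
        to train each unit to adopt its correct state given all the
        others. patterns: array whose rows are +/-1 state vectors."""
        n = patterns.shape[1]
        W = np.zeros((n, n))
        b = np.zeros(n)
        for _ in range(epochs):
            for p in patterns:
                for i in range(n):
                    inp = p.copy()
                    inp[i] = 0                  # unit i sees only the others
                    pred = 1 if b[i] + W[i] @ inp > 0 else -1
                    if pred != p[i]:            # wrong state: perceptron update
                        W[i] += lr * p[i] * inp
                        b[i] += lr * p[i]
        return W, b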

8
Avoiding spurious minima by unlearning
  • Hopfield, Feinstein, and Palmer suggested the
    following strategy:
  • Let the net settle from a random initial state
    and then do unlearning (see the sketch below).
  • This will get rid of deep, spurious minima and
    increase memory capacity.
  • Crick and Mitchison proposed it as a model of
    what dreams are for.
  • But how much unlearning should we do?
  • And can we analyze what unlearning achieves?
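
A minimal sketch of one settle-then-unlearn step, assuming
"unlearning" means a small anti-Hebbian decrement (the step size
eps is an assumption, and settle() is the earlier sketch):

    import numpy as np

    def unlearn_step(W, b, eps=0.01, seed=0):
        """Let the net settle from a random initial state, then weaken
        the weights supporting whatever minimum it fell into."""
        rng = np.random.default_rng(seed)
        s = rng.integers(0, 2, size=len(b)).astype(float)  # random 0/1 start
        s = settle(W, b, s)                # settle() from the earlier sketch
        W -= eps * np.outer(s, s)          # anti-Hebbian "unlearning"
        np.fill_diagonal(W, 0.0)
        return W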

9
Wishful thinking?
  • Wouldn't it be nice if interleaved learning and
    unlearning corresponded to maximum likelihood
    fitting of a model to the training set?
  • This seems improbable, especially if we want to
    include hidden units whose states are not
    specified by the vectors to be stored.

10
Another computational role for Hopfield nets
[Figure: a net whose hidden units are used to represent an
interpretation of the input and whose visible units are used to
represent the inputs]
  • Instead of using the net to store memories, use
    it to construct interpretations of sensory input.
  • The input is represented by the visible units.
  • The interpretation is represented by the states
    of the hidden units.
  • The badness of the interpretation is represented
    by the energy.
  • This raises two difficult issues:
  • How do we escape from poor local minima to get
    good interpretations?
  • How do we learn the weights on connections to the
    units?

11
Stochastic units make search easier
  • Replace the binary threshold units by binary
    stochastic units (see the sketch after the
    figure).
  • Use temperature to make it easier to cross energy
    barriers.
  • Start at high temperature where it's easy to cross
    energy barriers.
  • Reduce slowly to low temperature where good
    states are much more probable than bad ones.

[Figure: an energy landscape with states A, B, and C]
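
A sketch of annealed search with binary stochastic units, assuming
the standard logistic rule p(s_i = 1) = 1/(1 + e^{-\Delta E_i / T})
(the geometric temperature schedule is an assumption):

    import numpy as np

    def anneal(W, b, s, T_start=10.0, T_end=0.1, sweeps=200, seed=0):
        """Binary stochastic units: unit i turns on with probability
        sigma(gap / T); temperature falls slowly to T_end."""
        rng = np.random.default_rng(seed)
        for T in np.geomspace(T_start, T_end, sweeps):
            for i in rng.permutation(len(s)):
                gap = b[i] + W[i] @ s                  # energy(off) - energy(on)
                p_on = 1.0 / (1.0 + np.exp(-gap / T))  # logistic rule
                s[i] = 1.0 if rng.random() < p_on else 0.0
        return s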
12
The annealing trade-off
  • At high temperature the transition probabilities
    for uphill jumps are much greater.
  • At low temperature the equilibrium probabilities
    of good states are much higher than the
    probabilities of bad ones.
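
Both claims follow from the equilibrium (Boltzmann) distribution,
which the transcript never states explicitly:

    p(s) \propto e^{-E(s)/T}, so p(A)/p(B) = e^{(E_B - E_A)/T}

At low T a small energy difference gives an enormous probability
ratio in favour of the lower-energy state; at high T the ratio
approaches 1 and uphill transitions become common.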

13
Why annealing works
  • In high dimensions, energy barriers are typically
    much more degenerate than the minima that they
    separate.
  • At high temperature the free energy of the
    barrier is lower than the free energy of the
    minima (made precise after the figure).

[Figure: an energy curve E with minima A and C separated by a wide
band of B states, and the corresponding free-energy curve F]
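
One standard way to make this precise (textbook statistical
mechanics, not from the slide): a barrier region containing
\Omega configurations of roughly equal energy E has entropy
S = \log \Omega, so its free energy is

    F = E - T S = E - T \log \Omega

and at high temperature a highly degenerate barrier can have lower
free energy than a narrow, lower-energy minimum.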