Title: Secrets of Neural Network Models

1- Secrets of Neural Network Models
Note: These slides have been provided online for the convenience of students attending the 2003 Merck summer school, and for individuals who have explicitly been given permission by Ken Norman. Please do not distribute these slides to third parties without permission from Ken (which is easy to get: just email Ken at knorman_at_princeton.edu).
- Ken Norman
- Princeton University
- July 24, 2003
2- The Plan, and Acknowledgements
- The Plan
- I will teach you all of the secrets of neural network models in 2.5 hours
- Lecture for the first half
- Hands-on workshop for the second half
- Acknowledgements
- Randy O'Reilly
- my lab: Greg Detre, Ehren Newman, Adler Perotte, and Sean Polyn
3- The Big Question
- How does the gray glop in your head give rise to cognition?
- We know a lot about the brain, and we also know a lot about cognition
- The real challenge is to bridge between these two levels
4- Complexity and Levels of Analysis
- The brain is very complex: billions of neurons, trillions of synapses, all changing every nanosecond
- Each neuron is a very complex entity unto itself
- We need to abstract away from this complexity!
- Is there some simpler, higher level for describing what the brain does during cognition?
5- We want to draw on neurobiology for ideas about how the brain performs a particular kind of task
- Our models should be consistent with what we know about how the brain performs the task
- But at the same time, we want to include only aspects of neurobiology that are essential for explaining task performance
6- Learning and Development
- Neural network models provide an explicit, mechanistic account of how the brain changes as a function of experience
- Goals of learning
- To acquire an internal representation (a model) of the world that allows you to predict what will happen next, and to make inferences about unseen aspects of the environment
- The system must be robust to noise/degradation/damage
- Focus of workshop: Use neural networks to explore how the brain meets these goals
7- Outline of Lecture
- What is a neural network?
- Principles of learning in neural networks
- Hebbian learning: Simple learning rules that are very good at extracting the statistical structure of the environment (i.e., what things are there in the world, and how are they related to one another)
- Shortcomings of Hebbian learning: It's good at acquiring coarse category structure (prototypes) but it's less good at learning about atypical stimuli and arbitrary associations
- Error-driven learning: Very powerful rules that allow networks to learn from their mistakes
8- Outline, Continued
- The problem of interference in neocortical networks, and how the hippocampus can help alleviate this problem
- Brief discussion of PFC and how networks can support active maintenance in the face of distracting information
- Background information for the hands-on portion of the workshop
9- Overall Philosophy
- The goal is to give you a good set of intuitions for how neural networks function
- I will simplify and gloss over lots of things.
- Please ask questions if you don't understand what I'm saying...
10- What is a neural network?
- Neurons measure how much input they receive from other neurons; they fire (send a signal) if input exceeds a threshold value
- Input is a function of firing rate and connection strength
- Learning in neural networks involves adjusting connection strength
11- What is a neural network?
- Key simplifications
- We reduce all of the complexity of neuronal firing to a single number, the activity of the neuron, that reflects how often the neuron is spiking
- We reduce all of the complexity of synaptic connections between neurons to a single number, the synaptic weight, that reflects how strong the connection is
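To make these simplifications concrete, here is a minimal sketch of a single unit that sums weighted inputs and fires when the sum exceeds a threshold. The function name, threshold value, and numbers are illustrative assumptions, not the workshop simulator's actual equations:

```python
import numpy as np

def unit_activity(inputs, weights, threshold=0.5):
    """Compute one unit's activity from presynaptic activities and weights.

    inputs:  activities (firing rates) of the sending units, each in [0, 1]
    weights: synaptic weights, one per sending unit
    Returns 1.0 if the total weighted input exceeds the threshold, else 0.0.
    """
    net_input = np.dot(inputs, weights)  # input = firing rate x connection strength, summed
    return 1.0 if net_input > threshold else 0.0

# Example: three sending units; the two active ones drive the unit over threshold
print(unit_activity(np.array([0.0, 1.0, 1.0]), np.array([0.2, 0.4, 0.3])))  # 1.0
```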
12- Neurons are Detectors
- Each neuron is detecting some set of conditions (e.g., a smoke detector). Representation is what is detected.
13- Understanding Neural Components in Terms of the Detector Model
14- Detector Model
- Neurons feed on each other's outputs: layers of ever more complicated detectors
- Things can get very complex in terms of content, but each neuron is still carrying out the basic detector function
15- Two-layer Attractor Networks
[Diagram: Hidden Layer (Internal Representation) above an Input/Output Layer]
- Model of processing in neocortex
- Circles = units (neurons); lines = connections (synapses)
- Unit brightness = activity; line thickness = synaptic weight
- Connections are symmetric
16- Two-layer Attractor Networks
[Diagram: Hidden Layer (Internal Representation) above an Input/Output Layer, with an inhibitory interneuron labeled I]
- Units within a layer compete to become active.
- Competition is enforced by inhibitory interneurons that sample the amount of activity in the layer and send back a proportional amount of inhibition
- Inhibitory interneurons prevent epilepsy in the network
- Inhibitory interneurons are not pictured in subsequent diagrams
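One crude way to capture this kind of proportional feedback inhibition in simulation (my own sketch, not the actual inhibition machinery used in the workshop software) is to subtract an inhibition term that scales with the layer's total activity:

```python
import numpy as np

def settle_with_inhibition(net_inputs, inhibition_strength=0.8, step=0.2, n_steps=50):
    """Settle a layer under feedback inhibition proportional to total activity.

    The inhibitory interneurons are modeled as a single term that samples the
    layer's total activity and subtracts a proportional amount from every
    unit's drive, so only the most strongly driven units stay active.
    """
    activity = np.zeros_like(net_inputs)
    for _ in range(n_steps):
        inhibition = inhibition_strength * activity.sum()
        target = np.clip(net_inputs - inhibition, 0.0, 1.0)
        activity += step * (target - activity)  # relax gradually toward the target
    return activity

# Four units with graded excitatory drive; the weakly driven ones get squashed
print(np.round(settle_with_inhibition(np.array([0.9, 0.8, 0.3, 0.1])), 2))
```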
17- Two-layer Attractor Networks
[Diagram: Hidden Layer (Internal Representation) above an Input/Output Layer]
- These networks are capable of sustaining a stable pattern of activity on their own.
- Attractor: a fancy word for a stable pattern of activity
- Real networks are much larger than this; also, more than 1 unit is active in the hidden layer...
18- Properties of Two-Layer Attractor Networks
- I will show that these networks are capable of meeting the learning goals outlined earlier
- Given partial information (e.g., seeing something that has wings and feathers), the network can make a guess about other properties of that thing (e.g., it probably flies)
- Networks show graceful degradation
19-22- Pattern Completion in two-layer networks
[Animation frames: network diagram with input features wings, beak, feathers, flies; a partial input is presented and activity spreading through the hidden layer fills in the missing feature]
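As a toy illustration of pattern completion (my own sketch, using a symmetric Hopfield-style weight matrix rather than the slides' exact network), clamping a partial pattern and letting the network settle recovers the missing feature:

```python
import numpy as np

# Features: [wings, beak, feathers, flies], stored as a +1/-1 pattern
features = ["wings", "beak", "feathers", "flies"]
bird = np.array([1, 1, 1, 1])

# Hebbian outer-product storage with symmetric weights, no self-connections
W = np.outer(bird, bird).astype(float)
np.fill_diagonal(W, 0.0)

# Partial cue: we see wings, beak, feathers, but "flies" is unknown (0)
state = np.array([1.0, 1.0, 1.0, 0.0])
for _ in range(5):                 # settle: repeatedly update from net input
    state = np.sign(W @ state)     # each unit fires if its net input is positive
print(dict(zip(features, state)))  # "flies" gets filled in as +1
```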
23-27- Networks are Robust to Damage, Noise
[Animation frames: the same network with the "beak" input removed (damaged); the remaining features wings, feathers, flies still support the correct pattern]
28- Learning Overview
- Learning = changing connection weights
- Learning rules: How to adjust weights based on local information (presynaptic and postsynaptic activity) to produce appropriate network behavior
- Hebbian learning: building a statistical model of the world, without an explicit teacher...
- Error-driven learning: rules that detect undesirable states and change weights to eliminate these undesirable states...
29- Building a Statistical Model of the World
- The world is inhabited by things with relatively stable sets of features
- We want to wire detectors in our brains to detect these things. How can we do this?
- Answer: Leverage correlation
- The features of a particular thing tend to appear together, and to disappear together; a thing is nothing more than a correlated cluster of features
- Learning mechanisms that are sensitive to correlation will end up representing useful things
30- Hebbian Learning
- How does the brain learn about correlations?
- Donald Hebb proposed the following mechanism:
- When the pre-synaptic neuron and post-synaptic neuron are active at the same time, strengthen the connection between them
- "Neurons that fire together, wire together"
31-33- Hebbian Learning
[Animation frames: co-active pre- and post-synaptic units have the connection between them strengthened]
34- Hebbian Learning
- Proposed by Donald Hebb
- When the pre-synaptic (sending) neuron and post-synaptic (receiving) neuron are active at the same time, strengthen the connection between them
- "Neurons that fire together, wire together"
- When two neurons are connected, and one is active but the other is not, reduce the connection between them
- "Neurons that fire apart, unwire"
35-36- Hebbian Learning
[Animation frames illustrating the rule above]
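In rate-coded form, the wire-together/unwire-apart rule can be written as a single weight update. Here is a minimal sketch (my own formulation, with an illustrative learning rate; not the exact rule used in the workshop software):

```python
import numpy as np

def hebbian_update(w, pre, post, lr=0.1):
    """Hebbian weight change for rate-coded activities in [0, 1].

    Units that fire together (pre and post both high) get a stronger weight;
    units that fire apart (one high, one low) get a weaker weight.
    """
    fire_together = pre * post  # high only when both units are active
    fire_apart = pre * (1 - post) + (1 - pre) * post
    return w + lr * (fire_together - fire_apart)

w = 0.5
print(hebbian_update(w, pre=1.0, post=1.0))  # 0.6: fire together, wire together
print(hebbian_update(w, pre=1.0, post=0.0))  # 0.4: fire apart, unwire
```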
37- Biology of Hebbian Learning: NMDA-Mediated Long-Term Potentiation
38- Biology of Hebbian Learning: Long-Term Depression
- When the postsynaptic neuron is depolarized, but presynaptic activity is relatively weak, you get weakening of the synapse
39- What Does Hebbian Learning Do?
- Hebbian learning tunes units to represent correlated sets of input features.
- Here is why:
- Say that a unit has 1,000 inputs
- In this case, turning on and off a single input feature won't have a big effect on the unit's activity
- In contrast, turning on and off a large cluster of 900 input features will have a big effect on the unit's activity
40-41- Hebbian Learning
[Animation frames: a single input feature toggling, with little effect on the receiving unit]
42- Hebbian Learning
- Because small clusters of inputs do not reliably activate the receiving unit, the receiving unit does not learn much about these inputs
43-45- Hebbian Learning
[Animation frames: a large cluster of inputs reliably driving the receiving unit]
46- Hebbian Learning
- Big clusters of inputs reliably activate the receiving unit, so the network learns more about big (vs. small) clusters (the "gang effect").
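A quick way to see the gang effect numerically (a toy illustration of my own; all sizes and thresholds are made up): with Hebbian updates proportional to pre x post co-activity, inputs belonging to the big, reliable cluster accumulate weight, while inputs in a small cluster that never drives the unit learn nothing:

```python
import numpy as np

rng = np.random.default_rng(0)
n_inputs = 1000
w = np.full(n_inputs, 0.1)

# Inputs 0-899 form a big correlated cluster; inputs 900-999 a small cluster
big, small = np.arange(900), np.arange(900, 1000)

for _ in range(100):
    x = np.zeros(n_inputs)
    if rng.random() < 0.5:
        x[big] = 1.0    # the big cluster appears together
    else:
        x[small] = 1.0  # the small cluster appears together
    post = 1.0 if x @ w > 50 else 0.0  # the unit fires only with enough total drive
    w += 0.01 * x * post               # Hebbian: strengthen co-active inputs
    w = np.clip(w, 0, 1)

print(w[big].mean(), w[small].mean())  # big-cluster weights grow; small-cluster weights don't
```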
48- What Does Hebbian Learning Do?
- Hebbian learning finds the thing in the world that most reliably activates the unit, and tunes the unit to "like" that thing even more!
49-57- Hebbian Learning
[Animation frames: network with input features scaly, slithers, wings, beak, feathers, flies; over repeated presentations, the receiving unit is tuned toward the feature cluster that most reliably activates it]
58- What Does Hebbian Learning Do?
- Hebbian learning finds the thing in the world that most reliably activates the unit, and tunes the unit to "like" that thing even more!
- The outcome of Hebbian learning is a function of how well different inputs activate the unit, and how frequently they are presented
59- Self-Organizing Learning
- One detector can only represent one thing (i.e., one pattern of correlated features)
- Goal: We want to present input patterns to the network and have different units in the network specialize for different things, such that each thing is represented by at least one unit
- Random weights (different initial receptive fields) and competition are important for achieving this goal
- What happens without competition...
60-64- No Competition
[Animation frames: network with input features lives under water, scaly, slithers, wings, beak, feathers, flies; without competition, every hidden unit drifts toward the same large feature cluster]
65- No Competition
- Without competition, all units end up representing the same gang of features; other, smaller correlations get ignored
[Figure: all hidden units tuned to wings, beak, feathers, flies; scaly, slithers, lives under water are ignored]
66-72- Competition is important
[Animation frames: with inhibitory competition, different hidden units win for different input patterns (features: lives under water, scaly, slithers, wings, beak, feathers, flies)]
73- Competition is important
- When units have different initial receptive fields and they compete to represent input patterns, units end up representing different things
74- Hebbian Learning Summary
- Hebbian learning finds the thing in the world that most reliably activates the unit, and tunes the unit to "like" that thing even more
- When:
- There are multiple hidden units competing to represent input patterns
- Each hidden unit starts out with a distinct receptive field
- Then:
- Hebbian learning will tune these units so that each thing in the world (i.e., each cluster of correlated features) is represented by at least one unit
75-84- Problems with Penguins
[Animation frames: network with input features slithers, lives in Antarctica, waddles, wings, beak, feathers, flies; the penguin input activates the same hidden unit as the typical birds, so the network wrongly fills in "flies"]
85- Problems with Hebb, and Possible Solutions
- Self-organizing Hebbian learning is capable of discovering the high-level (coarse) categorical structure of the inputs
- However, it sometimes collapses across more subtle (but important) distinctions, and the learning rule does not have any provisions for fixing these errors once they happen
86- Problems with Hebb, and Possible Solutions
- In the penguin problem, if we want the network to remember that typical birds fly, but penguins don't, then penguins and typical birds need to have distinct (non-identical) hidden representations
- Hebbian learning assigns the same hidden unit to penguins and typical birds
- We need to supplement Hebbian learning with another learning rule that is sensitive to when the network makes an error (e.g., saying that penguins fly) and corrects the error by pulling apart the hidden representations of penguins vs. typical birds.
87- What is an error, exactly?
- One common way of conceptualizing error is in terms of predictions and outcomes
- If you give the network a partial version of a studied pattern, the network will make a prediction as to the missing features of that pattern (e.g., given something that has feathers, the network will guess that it probably flies)
- Later, you learn what the missing features are (the outcome). If the network's guess about the missing features is wrong, we want the network to be able to change its weights based on the difference between the prediction and the outcome.
- Today, I will present the GeneRec error-driven learning rule developed by Randy O'Reilly.
88-93- Error-Driven Learning
- Prediction phase
- Present a partial pattern
- The network makes a guess about the missing features.
[Animation frames: a partial penguin input is presented (features: slithers, lives in Antarctica, waddles, wings, beak, feathers, flies) and the network settles on a guess for the missing features]
94- Error-Driven Learning
- Prediction phase
- Present a partial pattern
- The network makes a guess about the missing features.
- Outcome phase
- Present the full pattern
- Let the network settle
[Figure: prediction-phase and outcome-phase network states side by side]
95-97- Error-Driven Learning
[Animation frames: in the outcome phase, the full pattern, including "waddles", is clamped and the network settles]
98- Error-Driven Learning
- We now need to compare these two activity patterns and figure out which weights to change.
[Figure: prediction-phase and outcome-phase activity patterns side by side]
99- Motivating the Learning Rule
- The goal of error-driven learning is to discover an internal representation for the item that activates the correct answer.
- Basically, we want to find hidden units that are associated with the correct answer (in this case, "waddles").
- The best way to do this is to examine how activity changes when "waddles" is clamped on during the outcome phase.
- Hidden units that are associated with "waddles" should show an increase in activity in the outcome (vs. prediction) phase.
- Hidden units that are not associated with "waddles" should show a decrease in activity in the outcome phase (because of increased competition from other units that are associated with "waddles").
100- Motivating the Learning Rule
- Hidden units that are associated with "waddles" should show an increase in activity in the outcome (vs. prediction) phase.
- Hidden units that are not associated with "waddles" should show a decrease in activity in the outcome phase
- Here is the learning rule:
- If a hidden unit shows increased activity (i.e., it's associated with the correct answer), increase its weights to the input pattern
- If a hidden unit shows decreased activity (i.e., it's not associated with the correct answer), reduce its weights to the input pattern
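For the input-to-hidden weights, this rule can be summarized as a weight change proportional to the input activity times the difference between outcome-phase (plus) and prediction-phase (minus) hidden activity. Here is a minimal sketch (illustrative shapes, learning rate, and activity values are my assumptions):

```python
import numpy as np

def generec_update(W, x, h_minus, h_plus, lr=0.1):
    """GeneRec-style error-driven update for input-to-hidden weights.

    W:       weights, shape (n_hidden, n_input)
    x:       input activities (same clamped input in both phases)
    h_minus: hidden activities in the prediction (minus) phase
    h_plus:  hidden activities in the outcome (plus) phase
    Units whose activity rose in the plus phase strengthen their weights
    to the input; units whose activity fell weaken them.
    """
    return W + lr * np.outer(h_plus - h_minus, x)

W = np.zeros((2, 3))
x = np.array([1.0, 1.0, 0.0])
h_minus = np.array([0.9, 0.1])  # the "flies" unit won the prediction phase
h_plus = np.array([0.2, 0.8])   # clamping "waddles" shifted the competition
print(generec_update(W, x, h_minus, h_plus))
```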
101-104- Error-Driven Learning
[Animation frames: comparing prediction-phase and outcome-phase activity; weights from the penguin features to the hidden unit associated with "waddles" are strengthened, and weights to the unit associated with "flies" are weakened]
105- Error-Driven Learning
- Hebb and error-driven learning have opposite effects on weights here!
- Error-driven learning increases the extent to which penguin is linked to the right-hand unit, whereas Hebb reinforced penguin's tendency to activate the left-hand unit
[Figure: prediction-phase and outcome-phase network states side by side]
106-116- Error-Driven Learning
[Animation frames: the network's response to the penguin input changes as error-driven learning proceeds; the penguin comes to activate its own hidden unit, which fills in "waddles" instead of "flies"]
117- Catastrophic Interference
- If you change the weights too strongly in response to penguin, then the network starts to behave like all birds waddle. New learning interferes with stored knowledge...
- The best way to avoid this problem is to make small weight changes, and to interleave penguin learning trials with typical-bird trials
- The typical-bird trials serve to remind the network to retain the association between wings/feathers/beak and flies...
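The following toy demonstration (my own; a plain delta rule stands in for GeneRec, and all patterns and sizes are made up) shows the interference effect directly: training penguin trials in a block overwrites the bird knowledge, while interleaving with a small learning rate preserves it:

```python
import numpy as np

rng = np.random.default_rng(0)

# Input features: [wings, beak, feathers, lives_in_Antarctica]
# Target outputs: [flies, waddles]
bird = (np.array([1.0, 1, 1, 0]), np.array([1.0, 0]))
penguin = (np.array([1.0, 1, 1, 1]), np.array([0.0, 1]))

def train(schedule, lr=0.05):
    """Simple delta-rule learner; schedule is a list of (input, target) pairs."""
    W = np.zeros((2, 4))
    for x, t in schedule:
        y = W @ x
        W += lr * np.outer(t - y, x)  # error-driven weight change
    return W

# Blocked: learn birds first, then hammer on penguin alone
blocked = [bird] * 200 + [penguin] * 200
# Interleaved: mostly birds, with occasional penguin trials mixed in
interleaved = [penguin if rng.random() < 0.1 else bird for _ in range(400)]

for name, sched in [("blocked", blocked), ("interleaved", interleaved)]:
    W = train(sched)
    print(name, "bird output:", np.round(W @ bird[0], 2))  # want ~[1, 0]: flies, not waddles
```

With the blocked schedule the bird output drifts toward "waddles" (interference); with the interleaved schedule the network keeps "flies" for birds while still learning the penguin.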
118-128- Interleaved Training
[Animation frames: penguin trials interleaved with typical-bird trials (features: slithers, lives in Antarctica, waddles, wings, beak, feathers, flies); the network retains "flies" for typical birds while learning "waddles" for penguins]
129- Gradual vs. One-Trial Learning
- Problem: It appears that the solution to the catastrophic interference problem is to learn slowly.
- But we also need to be able to learn quickly!
130- Gradual vs. One-Trial Learning
- Put another way: There appears to be a trade-off between learning rate and interference in the cortical network
- Our claim is that the brain avoids this trade-off by having two separate networks:
- A slow-learning cortical network that gradually develops internal representations that support generalization, prediction, categorization, etc.
- A fast-learning hippocampal network that is specialized for rapid memorization (but does not support generalization, categorization, etc.)
131- [Figure: hippocampal-cortical architecture. Lower-level cortex and neocortex connect to Entorhinal Cortex (input), which projects through the Dentate Gyrus to CA3, then CA1, and back out via Entorhinal Cortex (output); the hippocampus sits atop this cortical hierarchy]
132- Interactions Between Hippo and Cortex
- According to the Complementary Learning Systems theory (McClelland et al., 1995), the hippocampus rapidly memorizes patterns of cortical activity.
- The hippocampus manages to learn rapidly without suffering catastrophic interference because it has a built-in tendency to assign distinct, minimally overlapping representations to input patterns, even when they are very similar. Of course, this hurts its ability to categorize.
133- Interactions Between Hippo and Cortex
- The theory states that, when you are asleep, the hippocampus plays back stored patterns in an interleaved fashion, thereby allowing cortex to weave new facts and experiences into existing knowledge structures.
- Even if something just happens once in the real world, the hippocampus can keep re-playing it to cortex, interleaved with other events, until it sinks in...
- Detailed theory:
- Slow-wave sleep: hippocampal playback to cortex
- REM sleep: cortex randomly activates stored representations; this strengthens pre-existing knowledge and protects it against interference
134-141- Role of the Hippocampus
[Animation frames: the hippocampus sits on top of the cortical network (features: slithers, lives in Antarctica, waddles, wings, beak, feathers, flies); it rapidly memorizes the penguin pattern and replays it to cortex, interleaved with other patterns]
142- Error-Driven Learning Summary
- Error-driven learning algorithms are very powerful: So long as the learning rate is small, and training patterns are presented in an interleaved fashion, algorithms like GeneRec can learn internal representations that support good pattern completion of missing features.
- Error-driven learning is not meant to be a replacement for Hebbian learning: The two algorithms can co-exist!
- Hebbian learning actually improves the performance of GeneRec by ensuring that hidden units represent meaningful clusters of features
143- Error-Driven Learning Summary
- Theoretical issues to resolve with error-driven learning: The algorithm requires that the network know whether it is in a "prediction" phase or an "outcome" phase; how does the network know this?
- For that matter, the whole "phases" idea is sketchy
- GeneRec based on prediction/outcome differences is not the only way to do error-driven learning...
- Backpropagation
- Learning by reconstruction
- Adaptive Resonance Theory (Grossberg and Carpenter)
144- Learning by Reconstruction
- Instead of doing error-driven learning by comparing predictions and outcomes, you can also do error-driven learning as follows:
- First, you clamp the correct, full pattern onto the network and let it settle.
- Then, you erase the input pattern and see whether the network can reconstruct the input pattern based on its internal representation
- The algorithm is basically the same; you are still comparing two phases...
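Here is a sketch of that two-phase comparison (my own illustration, reusing the GeneRec-style phase-difference update from earlier; the tanh dynamics, sizes, and learning rate are all assumptions). The "plus" phase has the full pattern clamped; the "minus" phase runs on the network's own reconstruction:

```python
import numpy as np

def reconstruction_update(W, x_full, lr=0.1):
    """One step of reconstruction-based error-driven learning (sketch).

    Plus phase: clamp the full pattern and compute hidden activity.
    Minus phase: wipe the input, reconstruct it from the hidden layer via the
    symmetric weights, and recompute hidden activity from the reconstruction.
    The weight change is driven by the difference between the two phases.
    """
    h_plus = np.tanh(W @ x_full)      # hidden activity with the full pattern clamped
    x_recon = np.tanh(W.T @ h_plus)   # input wiped; reconstructed from the hidden layer
    h_minus = np.tanh(W @ x_recon)    # hidden activity given only the reconstruction
    return W + lr * (np.outer(h_plus, x_full) - np.outer(h_minus, x_recon))

rng = np.random.default_rng(0)
W = rng.normal(0, 0.1, size=(2, 4))
x = np.array([1.0, 1.0, 1.0, 0.0])
for _ in range(100):
    W = reconstruction_update(W, x)
print(np.round(np.tanh(W.T @ np.tanh(W @ x)), 2))  # reconstruction moves toward the pattern
```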
145- Learning by Reconstruction
- Clamp the to-be-learned pattern onto the input and let the network settle
[Animation frame: full pattern clamped on the input layer (features: slithers, lives in Antarctica, waddles, wings, beak, feathers, flies)]
146-149- Learning by Reconstruction
- Clamp the to-be-learned pattern onto the input and let the network settle
- Next, wipe the input layer clean (but not the hidden layer) and let the network settle
[Animation frames: the network reconstructs the input pattern from its internal representation]
150-151- Learning by Reconstruction
- Compare hidden activity in the two phases and adjust weights accordingly (i.e., if activation was higher with the correct answer clamped, increase weights; if activation was lower, decrease weights)
[Animation frames: side-by-side comparison of hidden activity in the two phases]
152-160- Adaptive Resonance Theory
[Animation frames: the network compares a top-down expectation against the bottom-up input (features: slithers, lives in Antarctica, waddles, wings, beak, feathers, flies); a MISMATCH! signal is triggered when expectation and input disagree]
161- Spreading Activation vs. Active Maintenance
- Spreading activation is generally very useful... it lets us make predictions/inferences/etc.
- But sometimes you just want to hold on to a pattern of activation without letting activation spread (e.g., a phone number, or a person's name).
- How do we maintain specific patterns of activity in the face of distraction?
162- Spreading Activation vs. Active Maintenance
- As you will see in the hands-on part of the workshop, the networks we have been discussing are not very robust to noise/distraction.
- Thus, there appears to be another tradeoff:
- Networks that are good at generalization/prediction are lousy at holding on to phone numbers/plans/ideas in the face of distraction
163- Spreading Activation vs. Active Maintenance
- Solution: We have evolved a network that is optimized for active maintenance: Prefrontal cortex! This complements the rest of cortex, which is good at generalization but not so good at active maintenance.
- PFC uses isolated representations to prevent spread of activity...
- Evidence for isolated "stripes" in PFC
164- Tripartite Functional Organization
- PC = posterior perceptual and motor cortex
- FC = prefrontal cortex
- HC = hippocampus and related structures
165- Tripartite Functional Organization
- PC: incremental learning about the structure of the environment
- FC: active maintenance, cognitive control
- HC: rapid memorization
- Roles are defined by functional tradeoffs
166- Key Trade-offs
- Extracting what is generally true (across events) vs. memorizing specific events
- Inference (spreading activation) vs. robust active maintenance
167- Hands-On Exercises
- The goal of the hands-on part of the workshop is to get a feel for the kinds of representations that are acquired by Hebbian vs. error-driven learning, and for network dynamics more generally.
168- Here is the network that we will be using
- Activity constraints: Only 10% of hidden units can be strongly active at once; in the input layer, only one unit per row
- Think of each row in the input as a feature dimension (e.g., shape), where the units in that row are mutually exclusive features along that dimension (square, circle, etc.)
169- This diagram illustrates the connectivity of the network
- Each hidden unit is connected to 50% of the input units; there are also recurrent connections from each hidden unit to all of the other hidden units
- Weights are symmetric
- Initial weight values were set randomly
170- I trained up the network on the following 8 patterns:
- Typical Bird Number 1
- Typical Bird Number 2
- Typical Bird Number 3
- Atypical Bird (duck)
- Typical Fish Number 1
- Typical Fish Number 2
- Typical Fish Number 3
- Atypical Fish (flying fish)
- In each pattern, the bottom 16 rows encode prototypical features that tend to be shared across patterns within a category; the top 8 rows encode item-specific features that are unique to each pattern.
- Each category has 3 typical items and one atypical item
- During training, the network studied typical patterns 90% of the time and it studied atypical patterns 10% of the time
171- To save time, the networks you will be using have been pre-trained on the 8 patterns (by presenting them repeatedly, in an interleaved fashion)
- For some of the simulations, you will be using a network that was trained with (purely) Hebbian learning
172- For other simulations, you will be using a network that was trained with a combination of error-driven (GeneRec) and Hebbian learning. Training of this network used a three-phase design:
- First, there was a "prediction" (minus) phase where a partial pattern was presented
- Second, there was an "outcome" (plus) phase where the full version of the pattern was presented
- Finally, there was a "nothing" phase where the input pattern was erased (but not the hidden pattern)
- Error-driven learning occurred based on the difference in activity between the minus and plus patterns, and based on the difference in activity between the plus and nothing patterns
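As a loose sketch of the bookkeeping in this three-phase design (my own illustration of the phase structure, not the actual simulator code; the tanh dynamics, sizes, and learning rate are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(0, 0.1, size=(4, 8))  # 4 hidden units, 8 input features (toy sizes)

def hidden(W, x):
    """Toy stand-in for letting the network settle: hidden activity given an input."""
    return np.tanh(W @ x)

def three_phase_step(W, partial_x, full_x, lr=0.05):
    h_minus = hidden(W, partial_x)     # 1) prediction (minus) phase: partial pattern
    h_plus = hidden(W, full_x)         # 2) outcome (plus) phase: full pattern
    x_wiped = np.tanh(W.T @ h_plus)    # 3) "nothing" phase: input erased; the network
    h_nothing = hidden(W, x_wiped)     #    runs on its own reconstruction
    W = W + lr * np.outer(h_plus - h_minus, full_x)                         # minus vs. plus
    W = W + lr * (np.outer(h_plus, full_x) - np.outer(h_nothing, x_wiped))  # plus vs. nothing
    return W

full = np.array([1.0, 1, 1, 1, 0, 0, 0, 0])
partial = np.array([1.0, 1, 0, 0, 0, 0, 0, 0])
for _ in range(50):
    W = three_phase_step(W, partial, full)
```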
173- When you get to the computer room, the simulation should already be open on the computer (some of you may have to double up; I think there are slightly fewer computers than students) and there will be a handout on the desk explaining what to do
- You can proceed at your own pace
- I will be there to answer questions (about the lecture and about the computer exercises), and my two grad students Ehren Newman and Sean Polyn will also be there to answer questions.
174- Your Helpers
[Photos: Ehren, Sean, and me]