Title: Secrets of Neural Network Models

1- Secrets of Neural Network Models
Note: These slides have been provided online for the convenience of students attending the 2003 Merck summer school, and for individuals who have explicitly been given permission by Ken Norman. Please do not distribute these slides to third parties without permission from Ken (which is easy to get: just email Ken at knorman_at_princeton.edu).
- Ken Norman
- Princeton University
- July 24, 2003
2- The Plan, and Acknowledgements
- The Plan
- I will teach you all of the secrets of neural network models in 2.5 hours
- Lecture for the first half
- Hands-on workshop for the second half
- Acknowledgements
- Randy O'Reilly
- my lab: Greg Detre, Ehren Newman, Adler Perotte, and Sean Polyn
3- The Big Question
- How does the gray glop in your head give rise to cognition?
- We know a lot about the brain, and we also know a lot about cognition
- The real challenge is to bridge between these two levels
4- Complexity and Levels of Analysis
- The brain is very complex: billions of neurons, trillions of synapses, all changing every nanosecond
- Each neuron is a very complex entity unto itself
- We need to abstract away from this complexity!
- Is there some simpler, higher level for describing what the brain does during cognition?
5- We want to draw on neurobiology for ideas about how the brain performs a particular kind of task
- Our models should be consistent with what we know about how the brain performs the task
- But at the same time, we want to include only aspects of neurobiology that are essential for explaining task performance
6- Learning and Development
- Neural network models provide an explicit, mechanistic account of how the brain changes as a function of experience
- Goals of learning
- To acquire an internal representation (a model) of the world that allows you to predict what will happen next, and to make inferences about unseen aspects of the environment
- The system must be robust to noise/degradation/damage
- Focus of workshop: Use neural networks to explore how the brain meets these goals
7- Outline of Lecture
- What is a neural network?
- Principles of learning in neural networks
- Hebbian learning: Simple learning rules that are very good at extracting the statistical structure of the environment (i.e., what things are there in the world, and how are they related to one another)
- Shortcomings of Hebbian learning: It's good at acquiring coarse category structure (prototypes) but it's less good at learning about atypical stimuli and arbitrary associations
- Error-driven learning: Very powerful rules that allow networks to learn from their mistakes
8- Outline, Continued
- The problem of interference in neocortical networks, and how the hippocampus can help alleviate this problem
- Brief discussion of PFC and how networks can support active maintenance in the face of distracting information
- Background information for the hands-on portion of the workshop
9- Overall Philosophy
- The goal is to give you a good set of intuitions for how neural networks function
- I will simplify and gloss over lots of things.
- Please ask questions if you don't understand what I'm saying...
10- What is a neural network?
- Neurons measure how much input they receive from other neurons; they fire (send a signal) if input exceeds a threshold value
- Input is a function of firing rate and connection strength
- Learning in neural networks involves adjusting connection strength
11- What is a neural network?
- Key simplifications
- We reduce all of the complexity of neuronal firing to a single number, the activity of the neuron, that reflects how often the neuron is spiking
- We reduce all of the complexity of synaptic connections between neurons to a single number, the synaptic weight, that reflects how strong the connection is
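To make these simplifications concrete, here is a minimal sketch of a single unit that sums weighted inputs and fires when the sum exceeds a threshold. The function name, threshold value, and numbers are illustrative assumptions, not the workshop simulator's actual equations:

```python
import numpy as np

def unit_activity(inputs, weights, threshold=0.5):
    """Compute one unit's activity from presynaptic activities and weights.

    inputs:  activities (firing rates) of the sending units, each in [0, 1]
    weights: synaptic weights, one per sending unit
    Returns 1.0 if the total weighted input exceeds the threshold, else 0.0.
    """
    net_input = np.dot(inputs, weights)  # input = firing rate x connection strength, summed
    return 1.0 if net_input > threshold else 0.0

# Example: three sending units; the two active ones drive the unit over threshold
print(unit_activity(np.array([0.0, 1.0, 1.0]), np.array([0.2, 0.4, 0.3])))  # 1.0
```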
12- Neurons are Detectors
- Each neuron is detecting some set of conditions (e.g., a smoke detector). Representation is what is detected.
13- Understanding Neural Components in Terms of the Detector Model
14- Detector Model
- Neurons feed on each other's outputs: layers of ever more complicated detectors
- Things can get very complex in terms of content, but each neuron is still carrying out the basic detector function
15- Two-layer Attractor Networks
[Diagram: Hidden Layer (Internal Representation) above an Input/Output Layer]
- Model of processing in neocortex
- Circles = units (neurons); lines = connections (synapses)
- Unit brightness = activity; line thickness = synaptic weight
- Connections are symmetric
16- Two-layer Attractor Networks
[Diagram: Hidden Layer (Internal Representation) above an Input/Output Layer, with an inhibitory interneuron labeled I]
- Units within a layer compete to become active.
- Competition is enforced by inhibitory interneurons that sample the amount of activity in the layer and send back a proportional amount of inhibition
- Inhibitory interneurons prevent epilepsy in the network
- Inhibitory interneurons are not pictured in subsequent diagrams
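One crude way to capture this kind of proportional feedback inhibition in simulation (my own sketch, not the actual inhibition machinery used in the workshop software) is to subtract an inhibition term that scales with the layer's total activity:

```python
import numpy as np

def settle_with_inhibition(net_inputs, inhibition_strength=0.8, step=0.2, n_steps=50):
    """Settle a layer under feedback inhibition proportional to total activity.

    The inhibitory interneurons are modeled as a single term that samples the
    layer's total activity and subtracts a proportional amount from every
    unit's drive, so only the most strongly driven units stay active.
    """
    activity = np.zeros_like(net_inputs)
    for _ in range(n_steps):
        inhibition = inhibition_strength * activity.sum()
        target = np.clip(net_inputs - inhibition, 0.0, 1.0)
        activity += step * (target - activity)  # relax gradually toward the target
    return activity

# Four units with graded excitatory drive; the weakly driven ones get squashed
print(np.round(settle_with_inhibition(np.array([0.9, 0.8, 0.3, 0.1])), 2))
```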
17- Two-layer Attractor Networks
[Diagram: Hidden Layer (Internal Representation) above an Input/Output Layer]
- These networks are capable of sustaining a stable pattern of activity on their own.
- Attractor: a fancy word for a stable pattern of activity
- Real networks are much larger than this; also, more than 1 unit is active in the hidden layer...
18- Properties of Two-Layer Attractor Networks
- I will show that these networks are capable of meeting the learning goals outlined earlier
- Given partial information (e.g., seeing something that has wings and feathers), the network can make a guess about other properties of that thing (e.g., it probably flies)
- Networks show graceful degradation
19-22- Pattern Completion in two-layer networks
[Animation frames: network diagram with input features wings, beak, feathers, flies; a partial input is presented and activity spreading through the hidden layer fills in the missing feature]
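As a toy illustration of pattern completion (my own sketch, using a symmetric Hopfield-style weight matrix rather than the slides' exact network), clamping a partial pattern and letting the network settle recovers the missing feature:

```python
import numpy as np

# Features: [wings, beak, feathers, flies], stored as a +1/-1 pattern
features = ["wings", "beak", "feathers", "flies"]
bird = np.array([1, 1, 1, 1])

# Hebbian outer-product storage with symmetric weights, no self-connections
W = np.outer(bird, bird).astype(float)
np.fill_diagonal(W, 0.0)

# Partial cue: we see wings, beak, feathers, but "flies" is unknown (0)
state = np.array([1.0, 1.0, 1.0, 0.0])
for _ in range(5):                 # settle: repeatedly update from net input
    state = np.sign(W @ state)     # each unit fires if its net input is positive
print(dict(zip(features, state)))  # "flies" gets filled in as +1
```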
23-27- Networks are Robust to Damage, Noise
[Animation frames: the same network with the "beak" input removed (damaged); the remaining features wings, feathers, flies still support the correct pattern]
28- Learning Overview
- Learning = changing connection weights
- Learning rules: How to adjust weights based on local information (presynaptic and postsynaptic activity) to produce appropriate network behavior
- Hebbian learning: building a statistical model of the world, without an explicit teacher...
- Error-driven learning: rules that detect undesirable states and change weights to eliminate these undesirable states...
29- Building a Statistical Model of the World
- The world is inhabited by things with relatively stable sets of features
- We want to wire detectors in our brains to detect these things. How can we do this?
- Answer: Leverage correlation
- The features of a particular thing tend to appear together, and to disappear together; a thing is nothing more than a correlated cluster of features
- Learning mechanisms that are sensitive to correlation will end up representing useful things
30- Hebbian Learning
- How does the brain learn about correlations?
- Donald Hebb proposed the following mechanism:
- When the pre-synaptic neuron and post-synaptic neuron are active at the same time, strengthen the connection between them
- "Neurons that fire together, wire together"
31-33- Hebbian Learning
[Animation frames: co-active pre- and post-synaptic units have the connection between them strengthened]
34- Hebbian Learning
- Proposed by Donald Hebb
- When the pre-synaptic (sending) neuron and post-synaptic (receiving) neuron are active at the same time, strengthen the connection between them
- "Neurons that fire together, wire together"
- When two neurons are connected, and one is active but the other is not, reduce the connection between them
- "Neurons that fire apart, unwire"
35-36- Hebbian Learning
[Animation frames illustrating the rule above]
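In rate-coded form, the wire-together/unwire-apart rule can be written as a single weight update. Here is a minimal sketch (my own formulation, with an illustrative learning rate; not the exact rule used in the workshop software):

```python
import numpy as np

def hebbian_update(w, pre, post, lr=0.1):
    """Hebbian weight change for rate-coded activities in [0, 1].

    Units that fire together (pre and post both high) get a stronger weight;
    units that fire apart (one high, one low) get a weaker weight.
    """
    fire_together = pre * post  # high only when both units are active
    fire_apart = pre * (1 - post) + (1 - pre) * post
    return w + lr * (fire_together - fire_apart)

w = 0.5
print(hebbian_update(w, pre=1.0, post=1.0))  # 0.6: fire together, wire together
print(hebbian_update(w, pre=1.0, post=0.0))  # 0.4: fire apart, unwire
```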
37- Biology of Hebbian Learning: NMDA-Mediated Long-Term Potentiation
38- Biology of Hebbian Learning: Long-Term Depression
- When the postsynaptic neuron is depolarized, but presynaptic activity is relatively weak, you get weakening of the synapse
39- What Does Hebbian Learning Do?
- Hebbian learning tunes units to represent correlated sets of input features.
- Here is why:
- Say that a unit has 1,000 inputs
- In this case, turning on and off a single input feature won't have a big effect on the unit's activity
- In contrast, turning on and off a large cluster of 900 input features will have a big effect on the unit's activity
40-41- Hebbian Learning
[Animation frames: a single input feature toggling, with little effect on the receiving unit]
42- Hebbian Learning
- Because small clusters of inputs do not reliably activate the receiving unit, the receiving unit does not learn much about these inputs
43-45- Hebbian Learning
[Animation frames: a large cluster of inputs reliably driving the receiving unit]
46- Hebbian Learning
- Big clusters of inputs reliably activate the receiving unit, so the network learns more about big (vs. small) clusters (the "gang effect").
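A quick way to see the gang effect numerically (a toy illustration of my own; all sizes and thresholds are made up): with Hebbian updates proportional to pre x post co-activity, inputs belonging to the big, reliable cluster accumulate weight, while inputs in a small cluster that never drives the unit learn nothing:

```python
import numpy as np

rng = np.random.default_rng(0)
n_inputs = 1000
w = np.full(n_inputs, 0.1)

# Inputs 0-899 form a big correlated cluster; inputs 900-999 a small cluster
big, small = np.arange(900), np.arange(900, 1000)

for _ in range(100):
    x = np.zeros(n_inputs)
    if rng.random() < 0.5:
        x[big] = 1.0    # the big cluster appears together
    else:
        x[small] = 1.0  # the small cluster appears together
    post = 1.0 if x @ w > 50 else 0.0  # the unit fires only with enough total drive
    w += 0.01 * x * post               # Hebbian: strengthen co-active inputs
    w = np.clip(w, 0, 1)

print(w[big].mean(), w[small].mean())  # big-cluster weights grow; small-cluster weights don't
```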
48- What Does Hebbian Learning Do?
- Hebbian learning finds the thing in the world that most reliably activates the unit, and tunes the unit to "like" that thing even more!
49-57- Hebbian Learning
[Animation frames: network with input features scaly, slithers, wings, beak, feathers, flies; over repeated presentations, the receiving unit is tuned toward the feature cluster that most reliably activates it]
58- What Does Hebbian Learning Do?
- Hebbian learning finds the thing in the world that most reliably activates the unit, and tunes the unit to "like" that thing even more!
- The outcome of Hebbian learning is a function of how well different inputs activate the unit, and how frequently they are presented
59- Self-Organizing Learning
- One detector can only represent one thing (i.e., one pattern of correlated features)
- Goal: We want to present input patterns to the network and have different units in the network specialize for different things, such that each thing is represented by at least one unit
- Random weights (different initial receptive fields) and competition are important for achieving this goal
- What happens without competition...
60-64- No Competition
[Animation frames: network with input features lives under water, scaly, slithers, wings, beak, feathers, flies; without competition, every hidden unit drifts toward the same large feature cluster]
65- No Competition
- Without competition, all units end up representing the same gang of features; other, smaller correlations get ignored
[Figure: all hidden units tuned to wings, beak, feathers, flies; scaly, slithers, lives under water are ignored]
66-72- Competition is important
[Animation frames: with inhibitory competition, different hidden units win for different input patterns (features: lives under water, scaly, slithers, wings, beak, feathers, flies)]
73- Competition is important
- When units have different initial receptive fields and they compete to represent input patterns, units end up representing different things
74- Hebbian Learning Summary
- Hebbian learning finds the thing in the world that most reliably activates the unit, and tunes the unit to "like" that thing even more
- When:
- There are multiple hidden units competing to represent input patterns
- Each hidden unit starts out with a distinct receptive field
- Then:
- Hebbian learning will tune these units so that each thing in the world (i.e., each cluster of correlated features) is represented by at least one unit
75-84- Problems with Penguins
[Animation frames: network with input features slithers, lives in Antarctica, waddles, wings, beak, feathers, flies; the penguin input activates the same hidden unit as the typical birds, so the network wrongly fills in "flies"]
85- Problems with Hebb, and Possible Solutions
- Self-organizing Hebbian learning is capable of discovering the high-level (coarse) categorical structure of the inputs
- However, it sometimes collapses across more subtle (but important) distinctions, and the learning rule does not have any provisions for fixing these errors once they happen
86- Problems with Hebb, and Possible Solutions
- In the penguin problem, if we want the network to remember that typical birds fly, but penguins don't, then penguins and typical birds need to have distinct (non-identical) hidden representations
- Hebbian learning assigns the same hidden unit to penguins and typical birds
- We need to supplement Hebbian learning with another learning rule that is sensitive to when the network makes an error (e.g., saying that penguins fly) and corrects the error by pulling apart the hidden representations of penguins vs. typical birds.
87- What is an error, exactly?
- One common way of conceptualizing error is in terms of predictions and outcomes
- If you give the network a partial version of a studied pattern, the network will make a prediction as to the missing features of that pattern (e.g., given something that has feathers, the network will guess that it probably flies)
- Later, you learn what the missing features are (the outcome). If the network's guess about the missing features is wrong, we want the network to be able to change its weights based on the difference between the prediction and the outcome.
- Today, I will present the GeneRec error-driven learning rule developed by Randy O'Reilly.
88-93- Error-Driven Learning
- Prediction phase
- Present a partial pattern
- The network makes a guess about the missing features.
[Animation frames: a partial penguin input is presented (features: slithers, lives in Antarctica, waddles, wings, beak, feathers, flies) and the network settles on a guess for the missing features]
94- Error-Driven Learning
- Prediction phase
- Present a partial pattern
- The network makes a guess about the missing features.
- Outcome phase
- Present the full pattern
- Let the network settle
[Figure: prediction-phase and outcome-phase network states side by side]
95-97- Error-Driven Learning
[Animation frames: in the outcome phase, the full pattern, including "waddles", is clamped and the network settles]
98- Error-Driven Learning
- We now need to compare these two activity patterns and figure out which weights to change.
[Figure: prediction-phase and outcome-phase activity patterns side by side]
99- Motivating the Learning Rule
- The goal of error-driven learning is to discover an internal representation for the item that activates the correct answer.
- Basically, we want to find hidden units that are associated with the correct answer (in this case, "waddles").
- The best way to do this is to examine how activity changes when "waddles" is clamped on during the outcome phase.
- Hidden units that are associated with "waddles" should show an increase in activity in the outcome (vs. prediction) phase.
- Hidden units that are not associated with "waddles" should show a decrease in activity in the outcome phase (because of increased competition from other units that are associated with "waddles").
100- Motivating the Learning Rule
- Hidden units that are associated with "waddles" should show an increase in activity in the outcome (vs. prediction) phase.
- Hidden units that are not associated with "waddles" should show a decrease in activity in the outcome phase
- Here is the learning rule:
- If a hidden unit shows increased activity (i.e., it's associated with the correct answer), increase its weights to the input pattern
- If a hidden unit shows decreased activity (i.e., it's not associated with the correct answer), reduce its weights to the input pattern
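For the input-to-hidden weights, this rule can be summarized as a weight change proportional to the input activity times the difference between outcome-phase (plus) and prediction-phase (minus) hidden activity. Here is a minimal sketch (illustrative shapes, learning rate, and activity values are my assumptions):

```python
import numpy as np

def generec_update(W, x, h_minus, h_plus, lr=0.1):
    """GeneRec-style error-driven update for input-to-hidden weights.

    W:       weights, shape (n_hidden, n_input)
    x:       input activities (same clamped input in both phases)
    h_minus: hidden activities in the prediction (minus) phase
    h_plus:  hidden activities in the outcome (plus) phase
    Units whose activity rose in the plus phase strengthen their weights
    to the input; units whose activity fell weaken them.
    """
    return W + lr * np.outer(h_plus - h_minus, x)

W = np.zeros((2, 3))
x = np.array([1.0, 1.0, 0.0])
h_minus = np.array([0.9, 0.1])  # the "flies" unit won the prediction phase
h_plus = np.array([0.2, 0.8])   # clamping "waddles" shifted the competition
print(generec_update(W, x, h_minus, h_plus))
```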
101-104- Error-Driven Learning
[Animation frames: comparing prediction-phase and outcome-phase activity; weights from the penguin features to the hidden unit associated with "waddles" are strengthened, and weights to the unit associated with "flies" are weakened]
105- Error-Driven Learning
- Hebb and error-driven learning have opposite effects on weights here!
- Error-driven learning increases the extent to which penguin is linked to the right-hand unit, whereas Hebb reinforced penguin's tendency to activate the left-hand unit
[Figure: prediction-phase and outcome-phase network states side by side]
106-116- Error-Driven Learning
[Animation frames: the network's response to the penguin input changes as error-driven learning proceeds; the penguin comes to activate its own hidden unit, which fills in "waddles" instead of "flies"]
117- Catastrophic Interference
- If you change the weights too strongly in response to penguin, then the network starts to behave like all birds waddle. New learning interferes with stored knowledge...
- The best way to avoid this problem is to make small weight changes, and to interleave penguin learning trials with typical-bird trials
- The typical-bird trials serve to remind the network to retain the association between wings/feathers/beak and flies...
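The following toy demonstration (my own; a plain delta rule stands in for GeneRec, and all patterns and sizes are made up) shows the interference effect directly: training penguin trials in a block overwrites the bird knowledge, while interleaving with a small learning rate preserves it:

```python
import numpy as np

rng = np.random.default_rng(0)

# Input features: [wings, beak, feathers, lives_in_Antarctica]
# Target outputs: [flies, waddles]
bird = (np.array([1.0, 1, 1, 0]), np.array([1.0, 0]))
penguin = (np.array([1.0, 1, 1, 1]), np.array([0.0, 1]))

def train(schedule, lr=0.05):
    """Simple delta-rule learner; schedule is a list of (input, target) pairs."""
    W = np.zeros((2, 4))
    for x, t in schedule:
        y = W @ x
        W += lr * np.outer(t - y, x)  # error-driven weight change
    return W

# Blocked: learn birds first, then hammer on penguin alone
blocked = [bird] * 200 + [penguin] * 200
# Interleaved: mostly birds, with occasional penguin trials mixed in
interleaved = [penguin if rng.random() < 0.1 else bird for _ in range(400)]

for name, sched in [("blocked", blocked), ("interleaved", interleaved)]:
    W = train(sched)
    print(name, "bird output:", np.round(W @ bird[0], 2))  # want ~[1, 0]: flies, not waddles
```

With the blocked schedule the bird output drifts toward "waddles" (interference); with the interleaved schedule the network keeps "flies" for birds while still learning the penguin.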
118-128- Interleaved Training
[Animation frames: penguin trials interleaved with typical-bird trials (features: slithers, lives in Antarctica, waddles, wings, beak, feathers, flies); the network retains "flies" for typical birds while learning "waddles" for penguins]
129- Gradual vs. One-Trial Learning
- Problem: It appears that the solution to the catastrophic interference problem is to learn slowly.
- But we also need to be able to learn quickly!
130- Gradual vs. One-Trial Learning
- Put another way: There appears to be a trade-off between learning rate and interference in the cortical network
- Our claim is that the brain avoids this trade-off by having two separate networks:
- A slow-learning cortical network that gradually develops internal representations that support generalization, prediction, categorization, etc.
- A fast-learning hippocampal network that is specialized for rapid memorization (but does not support generalization, categorization, etc.)
131- [Figure: hippocampal-cortical architecture. Lower-level cortex and neocortex connect to Entorhinal Cortex (input), which projects through the Dentate Gyrus to CA3, then CA1, and back out via Entorhinal Cortex (output); the hippocampus sits atop this cortical hierarchy]
132- Interactions Between Hippo and Cortex
- According to the Complementary Learning Systems theory (McClelland et al., 1995), the hippocampus rapidly memorizes patterns of cortical activity.
- The hippocampus manages to learn rapidly without suffering catastrophic interference because it has a built-in tendency to assign distinct, minimally overlapping representations to input patterns, even when they are very similar. Of course, this hurts its ability to categorize.
133- Interactions Between Hippo and Cortex
- The theory states that, when you are asleep, the hippocampus plays back stored patterns in an interleaved fashion, thereby allowing cortex to weave new facts and experiences into existing knowledge structures.
- Even if something just happens once in the real world, the hippocampus can keep re-playing it to cortex, interleaved with other events, until it sinks in...
- Detailed theory:
- Slow-wave sleep: hippocampal playback to cortex
- REM sleep: cortex randomly activates stored representations; this strengthens pre-existing knowledge and protects it against interference
134-141- Role of the Hippocampus
[Animation frames: the hippocampus sits on top of the cortical network (features: slithers, lives in Antarctica, waddles, wings, beak, feathers, flies); it rapidly memorizes the penguin pattern and replays it to cortex, interleaved with other patterns]
142- Error-Driven Learning Summary
- Error-driven learning algorithms are very powerful: So long as the learning rate is small, and training patterns are presented in an interleaved fashion, algorithms like GeneRec can learn internal representations that support good pattern completion of missing features.
- Error-driven learning is not meant to be a replacement for Hebbian learning: The two algorithms can co-exist!
- Hebbian learning actually improves the performance of GeneRec by ensuring that hidden units represent meaningful clusters of features
143- Error-Driven Learning Summary
- Theoretical issues to resolve with error-driven learning: The algorithm requires that the network know whether it is in a "prediction" phase or an "outcome" phase; how does the network know this?
- For that matter, the whole "phases" idea is sketchy
- GeneRec based on prediction/outcome differences is not the only way to do error-driven learning...
- Backpropagation
- Learning by reconstruction
- Adaptive Resonance Theory (Grossberg and Carpenter)
144- Learning by Reconstruction
- Instead of doing error-driven learning by comparing predictions and outcomes, you can also do error-driven learning as follows:
- First, you clamp the correct, full pattern onto the network and let it settle.
- Then, you erase the input pattern and see whether the network can reconstruct the input pattern based on its internal representation
- The algorithm is basically the same; you are still comparing two phases...
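Here is a sketch of that two-phase comparison (my own illustration, reusing the GeneRec-style phase-difference update from earlier; the tanh dynamics, sizes, and learning rate are all assumptions). The "plus" phase has the full pattern clamped; the "minus" phase runs on the network's own reconstruction:

```python
import numpy as np

def reconstruction_update(W, x_full, lr=0.1):
    """One step of reconstruction-based error-driven learning (sketch).

    Plus phase: clamp the full pattern and compute hidden activity.
    Minus phase: wipe the input, reconstruct it from the hidden layer via the
    symmetric weights, and recompute hidden activity from the reconstruction.
    The weight change is driven by the difference between the two phases.
    """
    h_plus = np.tanh(W @ x_full)      # hidden activity with the full pattern clamped
    x_recon = np.tanh(W.T @ h_plus)   # input wiped; reconstructed from the hidden layer
    h_minus = np.tanh(W @ x_recon)    # hidden activity given only the reconstruction
    return W + lr * (np.outer(h_plus, x_full) - np.outer(h_minus, x_recon))

rng = np.random.default_rng(0)
W = rng.normal(0, 0.1, size=(2, 4))
x = np.array([1.0, 1.0, 1.0, 0.0])
for _ in range(100):
    W = reconstruction_update(W, x)
print(np.round(np.tanh(W.T @ np.tanh(W @ x)), 2))  # reconstruction moves toward the pattern
```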
145- Learning by Reconstruction
- Clamp the to-be-learned pattern onto the input and let the network settle
[Animation frame: full pattern clamped on the input layer (features: slithers, lives in Antarctica, waddles, wings, beak, feathers, flies)]
146-149- Learning by Reconstruction
- Clamp the to-be-learned pattern onto the input and let the network settle
- Next, wipe the input layer clean (but not the hidden layer) and let the network settle
[Animation frames: the network reconstructs the input pattern from its internal representation]
150-151- Learning by Reconstruction
- Compare hidden activity in the two phases and adjust weights accordingly (i.e., if activation was higher with the correct answer clamped, increase weights; if activation was lower, decrease weights)
[Animation frames: side-by-side comparison of hidden activity in the two phases]
152-160- Adaptive Resonance Theory
[Animation frames: the network compares a top-down expectation against the bottom-up input (features: slithers, lives in Antarctica, waddles, wings, beak, feathers, flies); a MISMATCH! signal is triggered when expectation and input disagree]
161- Spreading Activation vs. Active Maintenance
- Spreading activation is generally very useful... it lets us make predictions/inferences/etc.
- But sometimes you just want to hold on to a pattern of activation without letting activation spread (e.g., a phone number, or a person's name).
- How do we maintain specific patterns of activity in the face of distraction?
162- Spreading Activation vs. Active Maintenance
- As you will see in the hands-on part of the workshop, the networks we have been discussing are not very robust to noise/distraction.
- Thus, there appears to be another tradeoff:
- Networks that are good at generalization/prediction are lousy at holding on to phone numbers/plans/ideas in the face of distraction
163- Spreading Activation vs. Active Maintenance
- Solution: We have evolved a network that is optimized for active maintenance: Prefrontal cortex! This complements the rest of cortex, which is good at generalization but not so good at active maintenance.
- PFC uses isolated representations to prevent spread of activity...
- Evidence for isolated "stripes" in PFC
164- Tripartite Functional Organization
- PC = posterior perceptual and motor cortex
- FC = prefrontal cortex
- HC = hippocampus and related structures
165- Tripartite Functional Organization
- PC: incremental learning about the structure of the environment
- FC: active maintenance, cognitive control
- HC: rapid memorization
- Roles are defined by functional tradeoffs
166- Key Trade-offs
- Extracting what is generally true (across events) vs. memorizing specific events
- Inference (spreading activation) vs. robust active maintenance
167- Hands-On Exercises
- The goal of the hands-on part of the workshop is to get a feel for the kinds of representations that are acquired by Hebbian vs. error-driven learning, and for network dynamics more generally.
168- Here is the network that we will be using
- Activity constraints: Only 10% of hidden units can be strongly active at once; in the input layer, only one unit per row
- Think of each row in the input as a feature dimension (e.g., shape), where the units in that row are mutually exclusive features along that dimension (square, circle, etc.)
169- This diagram illustrates the connectivity of the network
- Each hidden unit is connected to 50% of the input units; there are also recurrent connections from each hidden unit to all of the other hidden units
- Weights are symmetric
- Initial weight values were set randomly
170- I trained up the network on the following 8 patterns:
- Typical Bird Number 1
- Typical Bird Number 2
- Typical Bird Number 3
- Atypical Bird (duck)
- Typical Fish Number 1
- Typical Fish Number 2
- Typical Fish Number 3
- Atypical Fish (flying fish)
- In each pattern, the bottom 16 rows encode prototypical features that tend to be shared across patterns within a category; the top 8 rows encode item-specific features that are unique to each pattern.
- Each category has 3 typical items and one atypical item
- During training, the network studied typical patterns 90% of the time and it studied atypical patterns 10% of the time
171- To save time, the networks you will be using have been pre-trained on the 8 patterns (by presenting them repeatedly, in an interleaved fashion)
- For some of the simulations, you will be using a network that was trained with (purely) Hebbian learning
172- For other simulations, you will be using a network that was trained with a combination of error-driven (GeneRec) and Hebbian learning. Training of this network used a three-phase design:
- First, there was a "prediction" (minus) phase where a partial pattern was presented
- Second, there was an "outcome" (plus) phase where the full version of the pattern was presented
- Finally, there was a "nothing" phase where the input pattern was erased (but not the hidden pattern)
- Error-driven learning occurred based on the difference in activity between the minus and plus patterns, and based on the difference in activity between the plus and nothing patterns
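As a loose sketch of the bookkeeping in this three-phase design (my own illustration of the phase structure, not the actual simulator code; the tanh dynamics, sizes, and learning rate are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(0, 0.1, size=(4, 8))  # 4 hidden units, 8 input features (toy sizes)

def hidden(W, x):
    """Toy stand-in for letting the network settle: hidden activity given an input."""
    return np.tanh(W @ x)

def three_phase_step(W, partial_x, full_x, lr=0.05):
    h_minus = hidden(W, partial_x)     # 1) prediction (minus) phase: partial pattern
    h_plus = hidden(W, full_x)         # 2) outcome (plus) phase: full pattern
    x_wiped = np.tanh(W.T @ h_plus)    # 3) "nothing" phase: input erased; the network
    h_nothing = hidden(W, x_wiped)     #    runs on its own reconstruction
    W = W + lr * np.outer(h_plus - h_minus, full_x)                         # minus vs. plus
    W = W + lr * (np.outer(h_plus, full_x) - np.outer(h_nothing, x_wiped))  # plus vs. nothing
    return W

full = np.array([1.0, 1, 1, 1, 0, 0, 0, 0])
partial = np.array([1.0, 1, 0, 0, 0, 0, 0, 0])
for _ in range(50):
    W = three_phase_step(W, partial, full)
```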
173- When you get to the computer room, the simulation should already be open on the computer (some of you may have to double up; I think there are slightly fewer computers than students) and there will be a handout on the desk explaining what to do
- You can proceed at your own pace
- I will be there to answer questions (about the lecture and about the computer exercises), and my two grad students Ehren Newman and Sean Polyn will also be there to answer questions.
174- Your Helpers
[Photos: Ehren, Sean, and me]