Title: Computational Neuromodulation
1 Computational Neuromodulation
- Peter Dayan
- Gatsby Computational Neuroscience Unit
- University College London
Nathaniel Daw, Sham Kakade, Read Montague, John O'Doherty, Wolfram Schultz, Ben Seymour, Terry Sejnowski, Angela Yu
2
- 5. Diseases of the Will
- Contemplators
- Bibliophiles and Polyglots
- Megalomaniacs
- Instrument addicts
- Misfits
- Theorists
3 Theorists
There are highly cultivated, wonderfully endowed
minds whose wills suffer from a particular form
of lethargy. Its undeniable symptoms include a
facility for exposition, a creative and restless
imagination, an aversion to the laboratory, and
an indomitable dislike for concrete science and
seemingly unimportant data. When faced with a
difficult problem, they feel an irresistible urge
to formulate a theory rather than question
nature. As might be expected, disappointments
plague the theorist.
4 Computation and the Brain
- statistical computations
- representation from density estimation (Terry)
- combining uncertain information over space, time, modalities for sensory/memory inference
- learning as a hierarchical Bayesian problem
- learning as a filtering problem
- control theoretic computations
- optimising rewards, punishments
- homeostasis/allostasis
5 Conditioning
- prediction: of important events (policy evaluation)
- control: in the light of those predictions (policy improvement)
- Ethology
- Psychology
- classical/operant conditioning
- Computation
- dynamic programming
- Kalman filtering
- Algorithm
- TD/delta rules
- neuromodulators; amygdala; OFC; nucleus accumbens; dorsal striatum
6 Dopamine
- drug addiction, self-stimulation
- effect of antagonists
- effect on vigour
- link to action
- scalar signal
[Figure: dopamine cell recordings (Schultz et al); panels: no prediction; prediction, reward; prediction, no reward]
7 Prediction, but What Sort?
predict sum future reward
TD error
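The idea of predicting the sum of future reward and learning from the TD error can be sketched in a few lines. This is a minimal illustration, not the model from the talk: it assumes a tapped-delay-line state (one value per time step after cue onset), no discounting, a cue that arrives unpredictably from a baseline of value 0, and hypothetical names and parameters (`run_trial`, `alpha`, `reward_time`).

```python
import numpy as np

def run_trial(V, reward_time, reward, alpha=0.1, learn=True):
    """Run one trial and return the TD errors.

    V[i] is the predicted sum of future reward i steps after cue onset.
    delta[0] is the error at cue onset; delta[i + 1] is the error on leaving step i.
    """
    n = len(V)
    delta = np.zeros(n + 1)
    # Assumption: the cue arrives unpredictably from a baseline whose value is fixed at 0.
    delta[0] = V[0] - 0.0
    for i in range(n):
        r = reward if i == reward_time else 0.0
        v_next = V[i + 1] if i + 1 < n else 0.0
        delta[i + 1] = r + v_next - V[i]      # TD error: r + V(next) - V(current)
        if learn:
            V[i] += alpha * delta[i + 1]
    return delta

n_steps, reward_time = 5, 3
V = np.zeros(n_steps)

naive = run_trial(V, reward_time, reward=1.0, learn=False)       # no prediction
for _ in range(500):
    run_trial(V, reward_time, reward=1.0)
rewarded = run_trial(V, reward_time, reward=1.0, learn=False)    # prediction, reward
omitted = run_trial(V, reward_time, reward=0.0, learn=False)     # prediction, no reward
```

Under these assumptions the error sits at the reward before learning, moves to cue onset after learning, and turns negative at the expected reward time when the reward is omitted, which is the qualitative pattern the three panel labels describe.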
8 Rewards rather than Punishments
[Figure: TD error and V(t) compared with dopamine cells in VTA/SNc (Schultz et al); panels: no prediction; prediction, reward; prediction, no reward]
9 Prediction, but What Sort?
- Sutton
- Watkins policy evaluation
predict sum future reward
TD error
10 Policy Improvement
- Sutton define p(xM) do R-M on
- uses the same TD error
- Watkins value iteration with
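As one way to picture value-based policy improvement in the style attributed to Watkins, here is a minimal tabular Q-learning sketch. The two-state task, its rewards, and the parameters `alpha`, `gamma`, and `epsilon` are hypothetical choices for illustration; the slide's own equations did not survive extraction, so this is not a reconstruction of them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy task: transitions[state][action] = (reward, next_state).
# next_state None marks the end of a trial.
transitions = {
    0: {0: (0.2, None), 1: (0.0, 1)},   # small immediate reward vs. moving on
    1: {0: (1.0, None), 1: (0.0, None)},
}
Q = np.zeros((2, 2))
alpha, gamma, epsilon = 0.1, 1.0, 0.2

for _ in range(2000):
    s = 0
    while s is not None:
        # epsilon-greedy action choice
        if rng.random() < epsilon:
            a = int(rng.integers(2))
        else:
            a = int(np.argmax(Q[s]))
        r, s_next = transitions[s][a]
        future = gamma * Q[s_next].max() if s_next is not None else 0.0
        td_error = r + future - Q[s, a]   # same form of error as in prediction
        Q[s, a] += alpha * td_error
        s = s_next

greedy_policy = Q.argmax(axis=1)
```

In this toy task the learned greedy policy gives up the small immediate reward in state 0 in order to reach the larger reward available from state 1.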
11 Active Issues
- exploration/exploitation
- model-based (PFC)/cached (striatal) methods
- motivational influences
- vigour
- hierarchical control (PFC)
- hyperbolic discounting, Pavlovian misbehavior and the will
- representational learning
- appetitive/aversive opponency
- links with behavioural economics
12 Computation and the Brain
- statistical computations
- representation from density estimation (Terry)
- combining uncertain information over space, time, modalities for sensory/memory inference
- learning as a hierarchical Bayesian problem
- learning as a filtering problem
- control theoretic computations
- optimising rewards, punishments
- homeostasis/allostasis
- exploration/exploitation trade-offs
13 Uncertainty
Computational functions of uncertainty
- weaken top-down influence over sensory processing
- promote learning about the relevant representations
14 Norepinephrine
- vigilance
- reversals
- modulates plasticity? exploration?
- scalar
15 Aston-Jones Target Detection
detect and react to a rare target amongst common
distractors
- elevated tonic activity for reversal
- activated by rare target (and reverses)
- not reward/stimulus related? more response related?
16 Vigilance Task
- variable time in start
- ? controls confusability
- one single run
- cumulative is clearer
- exact inference
- effect of 80% prior
17 Phasic NE
- NE reports uncertainty about current state
- state in the model, not state of the model
- divisively related to prior probability of that state
- NE measured relative to default state sequence
- start → distractor
- temporal aspect: start → distractor
- structural aspect: target versus distractor
18 Phasic NE
- onset response from timing uncertainty (SET)
- growth as P(target)/0.2 rises
- act when P(target) > 0.95
- stop if P(target) < 0.01
- arbitrarily set NE = 0 after 5 timesteps
(small prob of reflexive action)
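The decision rule on this slide can be sketched as sequential Bayesian inference about target versus distractor. The sketch below is a simplification under stated assumptions: it ignores the start state and timing uncertainty, uses binary observations that match the true stimulus with a hypothetical probability `q`, and reads the thresholds as P(target) > 0.95 to act and P(target) < 0.01 to stop. The NE-like signal is taken to be the posterior divided by the 0.2 prior.

```python
def run_trial(observations, prior_target=0.2, q=0.675, act_at=0.95, stop_at=0.01):
    """Accumulate evidence for 'target' and return (outcome, NE-like trace).

    observations: iterable of 0/1 samples; 1 favours target, 0 favours distractor.
    q: assumed probability that a sample matches the true stimulus (hypothetical).
    """
    p = prior_target
    ne_trace = []
    for x in observations:
        like_target = q if x == 1 else 1.0 - q
        like_distractor = 1.0 - q if x == 1 else q
        # Bayes' rule for the two-hypothesis case
        p = p * like_target / (p * like_target + (1.0 - p) * like_distractor)
        ne_trace.append(p / prior_target)   # posterior relative to prior
        if p > act_at:
            return "respond", ne_trace
        if p < stop_at:
            return "reject", ne_trace
    return "undecided", ne_trace

target_outcome, target_ne = run_trial([1] * 10)
distractor_outcome, distractor_ne = run_trial([0] * 10)
```

Because the prior on target is 0.2, the ratio is bounded above by 5, and it grows only when the evidence contradicts the default expectation of a distractor.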
19 Four Types of Trial
[Figure: four types of trial; panel labels: 19, 1.5, 1, 77]
fall is rather arbitrary
20 Response Locking
slightly flatters the model since no
further response variability
21 Interrupts/Resets (SB)
PFC/ACC
LC
22 Active Issues
- approximate inference strategy
- interaction with expected uncertainty (ACh)
- other representations of uncertainty
- finer gradations of ignorance
23 Computation and the Brain
- statistical computations
- representation from density estimation (Terry)
- combining uncertain information over space, time, modalities for sensory/memory inference
- learning as a hierarchical Bayesian problem
- learning as a filtering problem
- control theoretic computations
- optimising rewards, punishments
- homeostasis/allostasis
- exploration/exploitation trade-offs
24 Computational Neuromodulation
- general excitability, signal/noise ratios
- specific prediction errors, uncertainty signals
25 Learning and Inference
- Learning: predict; control
- Δ weight ∝ (learning rate) × (error) × (stimulus)
- dopamine
- phasic prediction error for future reward
- serotonin
- phasic prediction error for future punishment
- acetylcholine
- expected uncertainty boosts learning
- norepinephrine
- unexpected uncertainty boosts learning
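One hedged way to see how uncertainty can boost learning is a scalar Kalman-filter version of the delta rule, where the effective learning rate is the Kalman gain. This is an illustrative sketch only: the function name, the noise parameters, and the use of `drift` to stand in for a suspected change in the environment are assumptions, and no claim is made that this is how ACh or NE are modelled in the talk.

```python
def kalman_delta_step(w, var, x, r, obs_noise=1.0, drift=0.0):
    """One update of weight w given stimulus x and outcome r.

    var is the current uncertainty (variance) about w.
    drift (hypothetical) inflates var when the world may have changed.
    """
    var = var + drift
    error = r - w * x                             # prediction error
    gain = var * x / (var * x * x + obs_noise)    # effective learning rate
    w_new = w + gain * error                      # weight change = rate x error
    var_new = (1.0 - gain * x) * var              # uncertainty shrinks after learning
    return w_new, var_new, gain

# Same stimulus and outcome, different prior uncertainty about the weight
w_uncertain, var_uncertain, gain_uncertain = kalman_delta_step(0.0, 4.0, x=1.0, r=1.0)
w_confident, var_confident, gain_confident = kalman_delta_step(0.0, 0.25, x=1.0, r=1.0)
```

With identical errors, the more uncertain learner takes the larger step, which is the sense in which uncertainty sets the learning rate in the weight-update rule above.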
26 Learning and Inference
[Figure: model schematic; labels: context; expected uncertainty; unexpected uncertainty; top-down processing; NE; ACh; cortical processing; prediction, learning, ...; bottom-up processing; sensory inputs]
27 Temporal Difference Prediction Error
[Figure: cue transition diagram ending in High Pain and Low Pain; transition probabilities 0.8, 1.0, 0.2, 0.2, 0.8, 1.0]
predict sum future pain
TD error
Δ weight ∝ (learning rate) × (error) × (stimulus)
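The values and TD errors implied by the transition probabilities can be worked through directly. The sketch below assumes a reading of the figure in which the first cue A leads to B with probability 0.8 and to D with probability 0.2, C leads to D with 0.8 and to B with 0.2, and B and D lead to high and low pain with probability 1.0. The pain magnitudes (1.0 and 0.0) are hypothetical placeholders.

```python
# Assumed reading of the figure; pain magnitudes are hypothetical.
pain = {"HIGH": 1.0, "LOW": 0.0}
second_cue_outcome = {"B": "HIGH", "D": "LOW"}            # each with probability 1.0
first_cue_transitions = {
    "A": {"B": 0.8, "D": 0.2},
    "C": {"D": 0.8, "B": 0.2},
}

# Value of a cue = expected sum of future pain
V = {cue: pain[outcome] for cue, outcome in second_cue_outcome.items()}
for cue, successors in first_cue_transitions.items():
    V[cue] = sum(p * V[nxt] for nxt, p in successors.items())

# TD error when the second cue appears: V(second cue) - V(first cue)
td_error_at_second_cue = {
    (first, second): V[second] - V[first]
    for first in first_cue_transitions
    for second in second_cue_outcome
}
```

Under these assumptions the largest positive error occurs on the rare C then B trials and the largest negative error on the rare A then D trials, while the common trial types produce only small errors.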
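28 Temporal Difference Prediction Error
[Figure: cue transition diagram ending in High Pain and Low Pain (transition probabilities 0.8, 1.0, 0.2, 0.2, 0.8, 1.0), with traces labelled TD error, Prediction error, and Value]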
29 Temporal Difference Prediction Error
experimental sequence: A B HIGH, C D LOW, C B HIGH, A B HIGH, A D LOW, C D LOW, A B HIGH, A B HIGH, C D LOW, C B HIGH, ...
[Figure: diagram labels: MR scanner; TD model; Brain responses; ?]
Ben Seymour, John O'Doherty
30 TD prediction error: ventral striatum
[Figure: brain image; labels: Z-4, R]
31 Temporal Difference Values
dorsal raphe?
right anterior insula
32 Rewards rather than Punishments
[Figure: TD error and V(t) compared with dopamine cells in VTA/SNc (Schultz et al); panels: no prediction; prediction, reward; prediction, no reward]
33 TD Prediction Errors
- computation: dynamic programming and optimal control
- algorithm: ongoing error in predictions of the future
- implementation
- dopamine: phasic prediction error for reward; tonic punishment
- serotonin: phasic prediction error for punishment; tonic reward
- evident in VTA; striatum; raphe?
- next: action; motivation; addiction; misbehavior
34 Two Cohenesque Theories
- Qualitative (AJ): exploration v exploitation
- high tonic mode involves labile attention
- search for better options
- important if short term reward rate is below par
- implemented by changed brittleness?
- Quantitative (EB): gain change in decision nets
- NE controls balance of recurrence/bottom-up
- implements changed S/N ratio with target
- detect to detect
- barely any benefit
- why only for targets?
35 Task Difficulty
- set ?0.65 rather than 0.675
- information accumulates over a longer period
- hits more affected than crs
- timing not quite right
36 Intra-trial Uncertainty
- phasic NE as unexpected state change within a model
- relative to prior probability against default
- interrupts (resets) ongoing processing
- tie to ADHD?
- close to alerting (AJ) but not necessarily tied to behavioral output (onset rise)
- close to behavioural switching (PR) but not DA
- farther from optimal inference (EB)
- phasic ACh: aspects of known variability within a state?
37 Where Next
- dopamine
- tonic release and vigour
- appetitive misbehaviour and hyperbolic discounting
- actions and habits
- psychosis
- serotonin
- aversive misbehaviour and psychiatry
- norepinephrine
- stress, depression and beyond
38 Experimental Data
- ACh & NE have similar physiological effects
- suppress recurrent feedback processing (e.g. Kimura et al, 1995; Kobayashi et al, 2000)
- enhance thalamocortical transmission (e.g. Gil et al, 1997)
- boost experience-dependent plasticity (e.g. Bear & Singer, 1986; Kilgard & Merzenich, 1998)
- ACh & NE have distinct behavioral effects
- ACh boosts learning to stimuli with uncertain consequences (e.g. Bucci, Holland, & Gallagher, 1998)
- NE boosts learning upon encountering global changes in the environment (e.g. Devauges & Sara, 1990)
39 Model Schematics
[Figure: model schematic; labels: context; expected uncertainty; unexpected uncertainty; top-down processing; NE; ACh; cortical processing; prediction, learning, ...; bottom-up processing; sensory inputs]
40 Attention
attentional selection for (statistically) optimal
processing, above and beyond the traditional view
of resource constraint
[Figure: trial timeline with intervals 0.1s, 0.1s, 0.2-0.5s, 0.15s]
generalize to the case that cue identity changes
with no notice
41 Formal Framework
- ACh: variability in quality of relevant cue
- NE: variability in identity of relevant cue
- cues: vestibular, visual, ...
- target: stimulus location, exit direction...
- avoid representing full uncertainty
Sensory Information
42 Simulation Results: Posner's Task
vary cue validity → vary ACh
fix relevant cue → low NE
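To illustrate why cue validity matters for inference in a cueing task, the sketch below combines a cue-based prior with fixed sensory evidence and compares valid and invalid trials. It is a toy calculation under assumptions, not the simulation reported on this slide: two possible target locations, a hypothetical sensory `likelihood_ratio` favouring the true location, and a cue whose stated validity is used directly as the prior.

```python
def posterior_true_location(validity, likelihood_ratio, cue_is_valid):
    """Posterior probability assigned to the true target location.

    validity: assumed probability that the cue points at the target.
    likelihood_ratio: hypothetical sensory evidence favouring the true location.
    """
    prior = validity if cue_is_valid else 1.0 - validity
    return prior * likelihood_ratio / (prior * likelihood_ratio + (1.0 - prior))

def validity_effect(validity, likelihood_ratio=2.0):
    """Advantage of valid over invalid trials given the same sensory evidence."""
    valid = posterior_true_location(validity, likelihood_ratio, True)
    invalid = posterior_true_location(validity, likelihood_ratio, False)
    return valid - invalid

effects = {g: validity_effect(g) for g in (0.5, 0.7, 0.9)}
```

In this toy setting the advantage of valid trials vanishes for an uninformative cue and grows as the cue becomes more reliable, so a signal that tracks cue unreliability would be expected to shrink the validity effect.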
43 Maze Task
example 2: attentional shift
no issue of validity
44 Simulation Results: Maze Navigation
fix cue validity → no explicit manipulation of ACh
45 Simulation Results: Full Model
46 Simulated Psychopharmacology
50% NE
ACh compensation
50% ACh/NE
NE can nearly catch up
47 Summary
- single framework for understanding ACh, NE and some aspects of attention
- ACh/NE as expected/unexpected uncertainty signals
- experimental psychopharmacological data replicated by model simulations
- implications from complex interactions between ACh & NE
- predictions at the cellular, systems, and behavioral levels
- activity vs weight vs neuromodulatory vs population representations of uncertainty