Computational Neuromodulation


Provided by: Angela216

Learn more at: http://www.cs.caltech.edu


1
Computational Neuromodulation

Peter Dayan
Gatsby Computational Neuroscience Unit
University College London

Nathaniel Daw, Sham Kakade, Read Montague,
John O'Doherty, Wolfram Schultz, Ben Seymour,
Terry Sejnowski, Angela Yu
2

5. Diseases of the Will
Contemplators
Bibliophiles and Polyglots
Megalomaniacs
Instrument addicts
Misfits
Theorists

3
Theorists
There are highly cultivated, wonderfully endowed
minds whose wills suffer from a particular form
of lethargy. Its undeniable symptoms include a
facility for exposition, a creative and restless
imagination, an aversion to the laboratory, and
an indomitable dislike for concrete science and
seemingly unimportant data. When faced with a
difficult problem, they feel an irresistible urge
to formulate a theory rather than question
nature. As might be expected, disappointments
plague the theorist.
4
Computation and the Brain

statistical computations
representation from density estimation (Terry)
combining uncertain information over space, time,
modalities for sensory/memory inference
learning as a hierarchical Bayesian problem
learning as a filtering problem
control theoretic computations
optimising rewards, punishments
homeostasis/allostasis

5
Conditioning
prediction: of important events (policy evaluation)
control: in the light of those predictions (policy improvement)

Ethology
Psychology: classical/operant conditioning
Computation: dynamic programming, Kalman filtering
Algorithm: TD/delta rules
Neurobiology: neuromodulators, amygdala, OFC, nucleus accumbens, dorsal striatum
6
Dopamine

drug addiction, self-stimulation
effect of antagonists
effect on vigour
link to action
scalar signal

[Figure: dopamine cell responses (Schultz et al); panels: no prediction; prediction, reward; prediction, no reward]
7
Prediction, but What Sort?

Sutton

predict sum future reward
TD error
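
The three response patterns attributed to Schultz et al (no prediction; prediction, reward; prediction, no reward) can be reproduced by a very small temporal-difference learner. The following is only a sketch under stated assumptions, not the model used in the talk: time within a trial is split into four steps after cue onset, there is no discounting, the pre-cue baseline is assumed to predict nothing, and the names `run_trial`, `REWARDED` and `OMITTED` are hypothetical.

```python
def run_trial(V, rewards, alpha=0.0):
    """Return the TD errors for one trial and update V in place.

    V[t] is the predicted sum of future reward from step t after cue onset.
    The first error is the transition from an unpredictive baseline
    (assumed value 0) into the cue state.
    """
    T = len(rewards)
    deltas = [V[0] - 0.0]  # cue onset: no reward yet, baseline predicts nothing
    for t in range(T):
        v_next = V[t + 1] if t + 1 < T else 0.0  # nothing predicted after the trial
        delta = rewards[t] + v_next - V[t]       # TD error
        V[t] += alpha * delta                    # delta-rule update
        deltas.append(delta)
    return deltas


REWARDED = [0.0, 0.0, 0.0, 1.0]  # assumed: reward arrives on the last step
OMITTED = [0.0, 0.0, 0.0, 0.0]   # same trial with the reward withheld

V = [0.0] * 4
naive = run_trial(V, REWARDED)         # no prediction
for _ in range(500):
    run_trial(V, REWARDED, alpha=0.2)  # learning trials
trained = run_trial(V, REWARDED)       # prediction, reward
omitted = run_trial(V, OMITTED)        # prediction, no reward

for name, d in [("naive", naive), ("trained", trained), ("omitted", omitted)]:
    print(name, [round(x, 3) for x in d])
```

Before learning the error sits at the time of reward; after learning it moves to cue onset; and omitting a predicted reward produces a negative error at the time the reward was due.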
8
Rewards rather than Punishments
TD error
[Figure: TD error and V(t) compared with dopamine cells in VTA/SNc (Schultz et al); panels: no prediction; prediction, reward; prediction, no reward]
9
Prediction, but What Sort?

Sutton
Watkins policy evaluation

predict sum future reward
TD error
10
Policy Improvement

Sutton define p(xM) do R-M on
uses the same TD error
Watkins value iteration with
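
As a rough illustration of the Watkins route to policy improvement, here is a tabular Q-learning sketch on a toy problem. Everything specific is an assumption made for the example rather than something taken from the slides: a four-state corridor with reward on reaching the last state, a discount factor of 0.9, a learning rate of 0.5, and purely random exploration.

```python
import random

N_STATES = 4        # assumed toy corridor: states 0..3, state 3 is terminal
ACTIONS = (-1, +1)  # index 0 = left, index 1 = right
GAMMA, ALPHA = 0.9, 0.5


def step(state, action):
    """Deterministic move; reward 1 on entering the terminal state."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward


rng = random.Random(0)
Q = [[0.0, 0.0] for _ in range(N_STATES)]  # the terminal row is never updated

for _ in range(500):
    s = 0
    while s != N_STATES - 1:
        a = rng.randrange(2)              # random behaviour; Q-learning is off-policy
        s2, r = step(s, ACTIONS[a])
        target = r + GAMMA * max(Q[s2])   # value-iteration style backup
        Q[s][a] += ALPHA * (target - Q[s][a])  # TD error drives the update
        s = s2

# Greedy action index per non-terminal state (1 means "right").
greedy = [max(range(2), key=lambda a: Q[s][a]) for s in range(N_STATES - 1)]
print(greedy, [[round(q, 3) for q in row] for row in Q])
```

The update has the same form as the prediction case: a TD error scaled by a learning rate, with the maximum over next actions standing in for the value of the next state.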

11
Active Issues

exploration/exploitation
model-based (PFC)/cached (striatal) methods
motivational influences
vigour
hierarchical control (PFC)
hyperbolic discounting, Pavlovian misbehavior and
the will
representational learning
appetitive/aversive opponency
links with behavioural economics

12
Computation and the Brain

statistical computations
representation from density estimation (Terry)
combining uncertain information over space, time,
modalities for sensory/memory inference
learning as a hierarchical Bayesian problem
learning as a filtering problem
control theoretic computations
optimising rewards, punishments
homeostasis/allostasis
exploration/exploitation trade-offs

13
Uncertainty
Computational functions of uncertainty

weaken top-down influence over sensory
processing
promote learning about the relevant
representations

14
Norepinephrine

vigilance
reversals
modulates plasticity? exploration?
scalar

15
Aston-Jones Target Detection
detect and react to a rare target amongst common
distractors

elevated tonic activity for reversal
activated by rare target (and reverses)
not reward/stimulus related? more response
related?

16
Vigilance Task

variable time in start
? controls confusability

one single run
cumulative is clearer

exact inference
effect of 80% prior

17
Phasic NE

NE reports uncertainty about current state
state in the model, not state of the model
divisively related to prior probability of that
state
NE measured relative to default state sequence start → distractor
temporal aspect: start → distractor
structural aspect: target versus distractor

18
Phasic NE

onset response from timing uncertainty (SET)
growth as P(target)/0.2 rises
act when P(target) > 0.95
stop if P(target) < 0.01
arbitrarily set NE = 0 after 5 timesteps

(small prob of reflexive action)
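
The decision rule on this slide can be sketched as sequential Bayesian updating of P(target), with the NE-like signal read out as the posterior divided by the 0.2 prior. This is a simplified sketch, not the model from the talk: it leaves out the start state and the timing uncertainty, treats each observation as a binary sample, and assumes an observation reliability `Q_OBS` of 0.675 and strict threshold comparisons. The function name `run` is hypothetical.

```python
PRIOR_TARGET = 0.2      # from the slide: the signal grows as P(target)/0.2
ACT, STOP = 0.95, 0.01  # thresholds from the slide (comparison direction assumed)
Q_OBS = 0.675           # assumed: probability an observation favours the true stimulus


def run(observations):
    """Update P(target) on each binary observation (1 favours target).

    Returns (decision, number of steps used, trace of P(target) / prior).
    """
    p = PRIOR_TARGET
    ne_trace = []
    for n, obs in enumerate(observations, start=1):
        like_target = Q_OBS if obs == 1 else 1 - Q_OBS
        like_distractor = 1 - Q_OBS if obs == 1 else Q_OBS
        p = like_target * p / (like_target * p + like_distractor * (1 - p))
        ne_trace.append(p / PRIOR_TARGET)  # divisive relation to the prior
        if p > ACT:
            return "act", n, ne_trace
        if p < STOP:
            return "stop", n, ne_trace
    return "undecided", len(observations), ne_trace


decision_t, steps_t, trace_t = run([1] * 20)  # evidence consistently for target
decision_d, steps_d, trace_d = run([0] * 20)  # evidence consistently for distractor
print(decision_t, steps_t, [round(x, 2) for x in trace_t])
print(decision_d, steps_d, [round(x, 2) for x in trace_d])
```

Because the target is the unlikely state, the ratio to its prior rises well above 1 when the evidence favours it and stays below 1 when the default distractor state is confirmed.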
19
Four Types of Trial
[Figure: four types of trial; recovered labels: 19, 1.5, 1, 77]
fall is rather arbitrary
20
Response Locking
slightly flatters the model since no
further response variability
21
Interrupts/Resets (SB)
PFC/ACC
LC
22
Active Issues

approximate inference strategy
interaction with expected uncertainty (ACh)
other representations of uncertainty
finer gradations of ignorance

23
Computation and the Brain

statistical computations
representation from density estimation (Terry)
combining uncertain information over space, time,
modalities for sensory/memory inference
learning as a hierarchical Bayesian problem
learning as a filtering problem
control theoretic computations
optimising rewards, punishments
homeostasis/allostasis
exploration/exploitation trade-offs

24
Computational Neuromodulation

general excitability, signal/noise ratios
specific prediction errors, uncertainty signals

25
Learning and Inference

Learning: predict; control
Δ weight ∝ (learning rate) × (error) × (stimulus)
dopamine
phasic prediction error for future reward
serotonin
phasic prediction error for future punishment

acetylcholine
expected uncertainty boosts learning
norepinephrine
unexpected uncertainty boosts learning
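
One standard way to make "uncertainty boosts learning" concrete is a scalar Kalman filter, in which the effective learning rate of the delta rule is set by the current uncertainty about the weight. The sketch below is a stand-in chosen for illustration, not the ACh/NE model from the talk; the function name `kalman_update` and the parameters `obs_noise` and `drift` are assumptions.

```python
def kalman_update(w, var, x, r, obs_noise=1.0, drift=0.0):
    """One Kalman-filter step for a single weight predicting r from stimulus x.

    w, var    : current weight estimate and its variance (uncertainty)
    obs_noise : assumed variance of the outcome noise
    drift     : assumed variance added per trial if the true weight can change
    """
    var = var + drift                            # uncertainty grows if the world may change
    error = r - w * x                            # prediction error
    gain = var * x / (var * x * x + obs_noise)   # uncertainty-dependent learning rate
    w_new = w + gain * error                     # delta rule: rate x error
    var_new = (1.0 - gain * x) * var             # observing the outcome reduces uncertainty
    return w_new, var_new


# Same stimulus and outcome, different prior uncertainty about the weight.
w_low, var_low = kalman_update(w=0.0, var=0.1, x=1.0, r=1.0)
w_high, var_high = kalman_update(w=0.0, var=10.0, x=1.0, r=1.0)
print(round(w_low, 3), round(w_high, 3))
```

With identical prediction errors, the weight moves further when the prior variance is larger, which is the sense in which an uncertainty signal can act as a learning rate.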

26
Learning and Inference
context
expected uncertainty
unexpected uncertainty
top-down processing
NE
ACh
cortical processing
prediction, learning, ...
bottom-up processing
sensory inputs
27
Temporal Difference Prediction Error
[Figure: cue transition diagram ending in High Pain or Low Pain; edge labels 0.8, 1.0, 0.2, 0.2, 0.8, 1.0]
predict sum future pain
TD error
Δ weight ∝ (learning rate) × (error) × (stimulus)
28
Temporal Difference Prediction Error
[Figure: panels TD error, Prediction error and Value, shown with the High Pain / Low Pain cue transition diagram (edge labels 0.8, 1.0, 0.2, 0.2, 0.8, 1.0)]
29
Temporal Difference Prediction Error
experimental sequence:
A B HIGH, C D LOW, C B HIGH, A B HIGH, A D LOW, C D LOW, A B HIGH, A B HIGH, C D LOW, C B HIGH ...
MR scanner
TD model
Brain responses
?
Ben Seymour, John O'Doherty
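
To show how a TD model turns such a sequence into a trial-by-trial regressor, the sketch below runs the delta rule over the ten trials listed on this slide. The pain magnitudes (`PAIN`), the learning rate (`ALPHA`), the zero initial values and the absence of discounting are assumptions made for illustration; none of them are given in the transcript.

```python
PAIN = {"HIGH": 1.0, "LOW": 0.2}  # assumed outcome magnitudes
ALPHA = 0.3                       # assumed learning rate

# The ten trials from the slide: first cue, second cue, outcome.
trials = [
    ("A", "B", "HIGH"), ("C", "D", "LOW"), ("C", "B", "HIGH"),
    ("A", "B", "HIGH"), ("A", "D", "LOW"), ("C", "D", "LOW"),
    ("A", "B", "HIGH"), ("A", "B", "HIGH"), ("C", "D", "LOW"),
    ("C", "B", "HIGH"),
]

V = {cue: 0.0 for cue in "ABCD"}  # predicted sum of future pain per cue
errors = []
for cue1, cue2, outcome in trials:
    d_cue = V[cue2] - V[cue1]        # TD error at the second cue (no pain yet)
    V[cue1] += ALPHA * d_cue
    d_out = PAIN[outcome] - V[cue2]  # TD error at the outcome (trial ends here)
    V[cue2] += ALPHA * d_out
    errors.append((cue1, cue2, outcome, d_cue, d_out))

for cue1, cue2, outcome, d_cue, d_out in errors:
    print(cue1, cue2, outcome, round(d_cue, 3), round(d_out, 3))
```

On the third trial (C B HIGH) the error at the second cue is positive, because B has already acquired some predicted pain while C has not; signed errors of this kind are what a regressor for the scanner data would be built from.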
30

TD prediction error: ventral striatum
[Figure: brain slice labelled Z-4, R]
31
Temporal Difference Values
dorsal raphe?
right anterior insula
32
Rewards rather than Punishments
TD error
[Figure: TD error and V(t) compared with dopamine cells in VTA/SNc (Schultz et al); panels: no prediction; prediction, reward; prediction, no reward]
33
TD Prediction Errors

computation: dynamic programming and optimal control
algorithm: ongoing error in predictions of the future
implementation:
dopamine: phasic prediction error for reward; tonic punishment
serotonin: phasic prediction error for punishment; tonic reward
evident in VTA; striatum; raphe?
next: action; motivation; addiction; misbehavior

34
Two Cohenesque Theories

Qualitative (AJ): exploration v exploitation
high tonic mode involves labile attention
search for better options
important if short term reward rate is below par
implemented by changed brittleness?
Quantitative (EB): gain change in decision nets
NE controls balance of recurrence/bottom-up
implements changed S/N ratio with target
detect to detect
barely any benefit
why only for targets?

35
Task Difficulty

set ?0.65 rather than 0.675
information accumulates over a longer period
hits more affected than correct rejections (CRs)
timing not quite right

36
Intra-trial Uncertainty

phasic NE as unexpected state change within a
model
relative to prior probability against default
interrupts (resets) ongoing processing
tie to ADHD?
close to alerting (AJ) but not necessarily tied
to behavioral output (onset rise)
close to behavioural switching (PR) but not DA
farther from optimal inference (EB)
phasic ACh aspects of known variability within a
state?

37
Where Next

dopamine
tonic release and vigour
appetitive misbehaviour and hyperbolic
discounting
actions and habits
psychosis
serotonin
aversive misbehaviour and psychiatry
norepinephrine
stress, depression and beyond

38
Experimental Data

ACh & NE have similar physiological effects
suppress recurrent feedback processing (e.g. Kimura et al, 1995; Kobayashi et al, 2000)
enhance thalamocortical transmission (e.g. Gil et al, 1997)
boost experience-dependent plasticity (e.g. Bear & Singer, 1986; Kilgard & Merzenich, 1998)

ACh & NE have distinct behavioral effects
ACh boosts learning to stimuli with uncertain consequences (e.g. Bucci, Holland, & Gallagher, 1998)
NE boosts learning upon encountering global changes in the environment (e.g. Devauges & Sara, 1990)
39
Model Schematics
context
expected uncertainty
unexpected uncertainty
top-down processing
NE
ACh
cortical processing
prediction, learning, ...
bottom-up processing
sensory inputs
40
Attention
attentional selection for (statistically) optimal
processing, above and beyond the traditional view
of resource constraint
[Figure: trial timeline; interval labels 0.1s, 0.1s, 0.2-0.5s, 0.15s]
generalize to the case that cue identity changes
with no notice
41
Formal Framework
ACh: variability in quality of relevant cue
NE: variability in identity of relevant cue
cues: vestibular, visual, ...
target: stimulus location, exit direction...
avoid representing full uncertainty
Sensory Information
42
Simulation Results: Posner's Task
vary cue validity → vary ACh
fix relevant cue → low NE
43
Maze Task
example 2: attentional shift
no issue of validity
44
Simulation Results: Maze Navigation
fix cue validity → no explicit manipulation of ACh
45
Simulation Results: Full Model
46
Simulated Psychopharmacology
50% NE
ACh compensation
50% ACh/NE
NE can nearly catch up
47
Summary

single framework for understanding ACh, NE and some aspects of attention
ACh/NE as expected/unexpected uncertainty signals
experimental psychopharmacological data replicated by model simulations
implications from complex interactions between ACh & NE
predictions at the cellular, systems, and behavioral levels
activity vs weight vs neuromodulatory vs population representations of uncertainty