Title: The Basal Ganglia (Lecture 6)
1The Basal Ganglia(Lecture 6)
- Harry R. Erwin, PhD
- COMM2E
- University of Sunderland
2Why is this important?
- Not well-understood
- Hot research area
- Apparently underlies reward learning.
- Related to the production of behaviour.
- May play a role in spatial localization.
- Now known to be insufficient for goal-directed
behaviour (Daw and Dayan), which seems to involve
forward models in the prefrontal cortex or some
specialised processing in the basal ganglia. Care
for a Nobel Prize? Solve this!
3Resources
- Shepherd, G., ed., 2004, The Synaptic
Organization of the Brain, 5th edition, Oxford
University Press.
- http://scat-he-g4.sunderland.ac.uk/harryerw/phpwiki/index.php/BasalGanglia
- http://www.unifr.ch/biochem/DREYER/BG.html
4Reinforcement Learning
- Montague, PR, Hyman, SE, and Cohen, JD, 2004,
Computational roles for dopamine in behavioural
control, Nature, 431:760-767, 14 October 2004.
- Reinforcement learning theories discuss how
(habitual) behaviour is organized in response to
rewards or reinforcers. This is not
stimulus-response learning. This is also not how
goal-oriented behaviour is learned.
5How it Works
- The 'reinforcement signal' distribution measures
the current value of the possible states of the
agent.
- The current state of the agent is converted into
a 'value' using a 'value function'.
- A 'policy function' then maps the agent's states
to its possible actions, with the probability of
each possible action weighted by the value of the
next state produced by the action.
6Temporal Difference Learning
- A form of reinforcement learning of interest here
is temporal difference learning, where
- current TD error = current reward + $\gamma$ (next
prediction) - current prediction.
- This supports the learning of a plan leading to a
reward.
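As a minimal sketch of the TD error on this slide (current reward plus gamma times the next prediction, minus the current prediction), the Python below replays a three-step plan in which only the final step is rewarded. The function names, the learning rate, the number of replays, and the value gamma = 0.9 are illustrative assumptions, not values from the lecture.

```python
def td_errors(rewards, predictions, gamma=0.9):
    """Return the TD error at each step of one episode.

    rewards[t] is the reward received on leaving step t and predictions[t]
    is the current prediction at step t. The prediction after the final
    step is taken to be zero.
    """
    errors = []
    for t, reward in enumerate(rewards):
        next_prediction = predictions[t + 1] if t + 1 < len(predictions) else 0.0
        errors.append(reward + gamma * next_prediction - predictions[t])
    return errors


def learn_predictions(rewards, gamma=0.9, learning_rate=0.5, episodes=200):
    """Replay one episode many times, nudging each prediction by its TD error."""
    predictions = [0.0] * len(rewards)
    for _ in range(episodes):
        for t, error in enumerate(td_errors(rewards, predictions, gamma)):
            predictions[t] += learning_rate * error
    return predictions


# Hypothetical three-step plan: only the last step is rewarded.
predictions = learn_predictions([0.0, 0.0, 1.0])
print([round(p, 3) for p in predictions])
```

After enough replays the predictions settle near 0.81, 0.9 and 1.0, so the prediction of reward spreads backwards along the plan, one step per factor of gamma.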
7Actor-Critic Model
- Sutton and Barto, 1998, Reinforcement Learning,
MIT Press, describe a mechanism for bootstrapping
reinforcement learning.
- Actor-critic methods have a separate memory
structure to explicitly represent the policy
independent of the value function. The policy
structure is known as the actor, because it is
used to select actions, and the estimated value
function is known as the critic, because it
criticizes the actions made by the actor.
- The critique takes the form of a TD error
estimate.
8The Algorithm
- $\delta_t = r_{t+1} + \gamma V(s_{t+1}) - V(s_t)$,
- where $r_{t+1}$ is the actual reward at time $t+1$,
- $s_t$ is the state at time $t$,
- $V(s)$ is the current perceived value of the state,
$s$, and
- $\gamma$ is the discount rate that translates a value at
time $t+1$ to a lower value at time $t$.
- $\delta_t$ is the TD error estimate at time $t+1$ of
following a specific action, $a$, at time $t$.
- $V(s)$ is zero at terminal states, and $r_t$ is zero
unless there is a real reward at time $t$.
9Interpretation
- $\delta_t$ is the quantity that appears to correspond to
the dopamine level output by the basal ganglia to
the cortex (Schultz, et al.).
- How are $V(s)$ and the preferences for the various
actions, $a$, updated?
- Given a set of actions, $a$, let $p(s_t, a_t)$ be the
preference for action $a$ at time $t$ given state $s$.
- Then let the probability of picking $a$ be
$\pi(s,a) = \exp(p(s,a)) / \sum_b \exp(p(s,a_b))$,
summed over all reasonable actions $a_b$.
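The action-selection rule on this slide is a Gibbs (softmax) distribution over the preferences. The following is a small sketch of that rule; the action names and preference values are made up for illustration, and subtracting the largest preference before exponentiating is a standard numerical precaution rather than part of the rule itself.

```python
import math


def action_probabilities(preferences):
    """Map {action: preference} to {action: probability} with a softmax."""
    peak = max(preferences.values())  # subtracted only for numerical stability
    weights = {a: math.exp(p - peak) for a, p in preferences.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}


# Hypothetical preferences for three candidate actions in one state.
probs = action_probabilities({"reach": 1.0, "wait": 0.0, "withdraw": -1.0})
for action, prob in probs.items():
    print(f"{action}: {prob:.3f}")
```

Actions with higher preference are chosen more often, but every action keeps a non-zero probability, so the agent continues to explore.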
10Learning Processes
- Now, update the value $V(s_t)$ by adding $\delta_t$
times some learning rate (less than one).
- Update $p(s_t, a_t)$ by adding $\delta_t$ times another
learning rate (less than one).
- That's all, folks.
- Note the state space is very large.
- Actor-critic learning cannot cope with changing
goal values.
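To make the two updates concrete, here is a minimal actor-critic sketch on a toy task. The task is a hypothetical corridor of five states with a reward at the far end; the task itself, the learning rates, the discount rate, and the fixed random seed are all assumptions chosen for illustration and are not taken from the lecture or from Sutton and Barto.

```python
import math
import random

N_STATES = 5        # hypothetical corridor: states 0..4, state 4 is terminal
ACTIONS = (-1, +1)  # index 0 steps left, index 1 steps right


def softmax_choice(prefs, rng):
    """Pick an action index with probability proportional to exp(preference)."""
    peak = max(prefs)
    weights = [math.exp(p - peak) for p in prefs]
    return rng.choices(range(len(prefs)), weights=weights)[0]


def train(episodes=500, gamma=0.9, critic_rate=0.1, actor_rate=0.5, seed=0):
    """Return (values, prefs) after training; all parameters are assumptions."""
    rng = random.Random(seed)
    values = [0.0] * N_STATES                      # critic: V(s)
    prefs = [[0.0, 0.0] for _ in range(N_STATES)]  # actor: p(s, a)
    goal = N_STATES - 1
    for _ in range(episodes):
        state = 0
        for _ in range(1000):  # safety cap on episode length
            if state == goal:
                break
            action = softmax_choice(prefs[state], rng)
            next_state = min(max(state + ACTIONS[action], 0), goal)
            reward = 1.0 if next_state == goal else 0.0
            # TD error; values[goal] is never updated, so it stays zero.
            td_error = reward + gamma * values[next_state] - values[state]
            values[state] += critic_rate * td_error        # critic update
            prefs[state][action] += actor_rate * td_error  # actor update
            state = next_state
    return values, prefs


values, prefs = train()
print([round(v, 2) for v in values])
print([[round(p, 2) for p in row] for row in prefs])
```

The same TD error drives both updates: the critic uses it to correct the value of the state just left, and the actor uses it to raise or lower the preference for the action just taken. Because the reward is built into the learned values, a change of goal would require relearning, which is the limitation noted on the slide.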
11A Few Points
- Actor-critic learning works better for high-level
rather than low-level actions. Somehow the
biological system is able to shift up.
- Note that the error, $\delta_t$, can be either positive
or negative. The basal ganglia output both
dopamine (+) and GABA (-) to represent the error.
Cocaine has the property of producing an error
signal that is always positive, which really
fouls up the learning process.
- Mirror neurons may play a role in this and autism
may be a malady of this system.
12The Bootstrap Issue
- To make this work, the critic has to either
innately know the rewards for various actor
actions, or it has to learn them.
- The resulting bootstrap problem is of
particular importance in biological systems that
might implement the model.
- One approach might be for the critic to reward
all actions indiscriminately and then, as noxious
stimuli are reported by sensory systems, reduce
the corresponding rewards.
- Is this biologically realistic?
13The Basal Ganglia
- A richly connected set of brain nuclei in the
fore- and mid-brain of amniotes.
- Degenerative diseases tend to produce severe
movement deficits, but there is reason to believe
the function of the basal ganglia is more
general: the selection among candidate movements,
goals, strategies, and interpretations of sensory
information.
- (Wilson, 2004, in Shepherd, from which much of
this presentation is derived).
14Rostral Anatomy
<http://thalamus.wustl.edu/course/cerebell.html>
15Medial Anatomy
<http://thalamus.wustl.edu/course/cerebell.html>
16Caudal Anatomy
<http://thalamus.wustl.edu/course/cerebell.html>
17Nuclei of the Basal Ganglia
- The most prominent are the following:
- Caudate Nucleus
- Putamen or Striatum
- Nucleus Accumbens
- Globus Pallidus (GP)
  - external segment (GPe), internal segment (GPi)
- Substantia Nigra (SN)
  - pars reticulata (SNr), pars compacta (SNc)
- Subthalamic Nucleus
- The two largest sources of input are the cerebral
cortex and thalamus.
18BG Circuits
(from Dreyer, http://www.unifr.ch/biochem/DREYER/BG.html)
19Neostriatum
- The neostriatum consists of the caudate nucleus,
the putamen, and the nucleus accumbens.
- For the caudate nucleus and putamen, inputs from
sensory, motor, and association cortical areas
converge with inputs from the thalamic
intralaminar nuclei, dopaminergic inputs from the
SNc, and serotonergic (5-HT) inputs from the
dorsal raphe nucleus.
- This subsystem supports planning and
reinforcement learning involving the PFC.
20Putamen
- A portion of the basal ganglia that forms the
outermost part of the lenticular nucleus.
- The motor and somatosensory cortices, the
intralaminar nuclei of the thalamus, and the
substantia nigra project to the putamen.
- The putamen projects to premotor and
supplementary motor areas of cortex via the
globus pallidus and thalamus.
- Coextensive with the insula, which has been found
to contain mirror neurons.
21Nucleus Accumbens
- There are similar connections from the limbic
cortex (emotional) and hippocampus, converging
with inputs from the ventral tegmental area (VTA)
in the nucleus accumbens.
- The VTA is dopaminergic and seems to play a role
in reward learning.
- This subsystem appears to support emotional
learning.
22Input Structure
- The cortex, thalamus, and amygdala provide
glutamatergic input to the neostriatum (and can
produce LTP or LTD).
- Most neostriatal interneurons are GABAergic,
except the cholinergic cells, which are
neuromodulatory, and the output of the principal
cells is also GABAergic.
23Neostriatal Structure
- Consists mainly of principal neurons and afferent
fibres, with smaller populations of interneurons.
- The neostriatum appears to be a functional
remapping of the cortex, based on common
interests of some sort. For example, the neurons
concerned with a finger will tend to project to a
common area.
- Coincidence detection is important.
24Neostriatal Neurons
- GABAergic principal neurons fire rarely and for
short periods of time (100-3000 msec).
- The axons emit local collaterals to form an
extremely rich arborization and then project to
their long-range destinations.
- Approximately half are direct pathway neurons and
the other half are indirect pathway neurons. It's
unclear in Wilson, but it may be that only the
direct pathway neurons are collateralized.
25Neostriatal Interneurons
- A number of rare types (eight to nine estimated).
Three major types as follows:
- Giant cholinergic interneurons forming a dense
plexus of extremely fine axonal branches.
Tonically active.
- GABA/parvalbumin-containing basket cells. Very
similar to basket cells of the hippocampus and
cerebral cortex. Linked by gap junctions.
- Somatostatin (SOM)/nitric oxide synthase
(NOS)-containing interneurons. A neuromodulatory
function. Probably GABAergic.
26Neostriatal Outputs
- The output of the neostriatum projects to the
GPe, GPi, and SNr.
- The GPi and SNr project (GABAergic projections)
outside the basal ganglia to the thalamus (and
mostly from there to the frontal cortex), the
lateral habenular nucleus, and the deep layers of
the superior colliculus.
- The GPe projects mostly to the subthalamic
nucleus, which also receives frontal input and
finally projects to the GPe, GPi, and SN.
27Intermediate Processing
- At the GP and SN, most afferent input is from the
neostriatum, with secondary input from the
subthalamic nucleus.
- The GPe projects to the GPi and SNr and has
recurrent local inhibitory connections.
- The GPe also receives some input from the
cerebral cortex and thalamus.
- The subthalamic neurons receive excitatory inputs
from the cortex and inhibitory input from the GPe.
28GP Processing
- The principal cells of the GP are inhibitory,
receive excitatory input from the subthalamic
nucleus, and inhibitory input from the
neostriatum.
- The GPe inhibits the GPi and the SNr, which are
the output nuclei of the basal ganglia.
29Phasic/Tonic
- The principal cells of the SNc are dopaminergic
and neuromodulatory. The SNc and the VTA seem to
encode rewards.
- The cells of the GP and SN fire tonically, at
very high rates, the GP and SNr inhibiting
neurons in the thalamus and SC.
- Phasic firing of neostriatal neurons produces a
pause in this tonic firing, allowing thalamic and
SC neurons to respond to input. (This can also
terminate tonic activity in the cortex.)
30Detailed Neostriatal Projections
- There are two pathways:
- Direct pathway neurons with direct projections
to GPi and SN (possibly in addition to the GPe),
directly playing a role in the output of the
basal ganglia.
- Indirect pathway neurons that project only to
GPe. These affect the output of the basal ganglia
via projections of the subthalamic nucleus and
the GPe.
31Cell Counts
- The neostriatum is estimated to contain about
100,000,000 neurons.
- The GP contains about 700,000 neurons in toto,
170,000 of them in the GPi, so the projection is
highly convergent.
- Spiking in the GP and SN is very localized.
- Principal cells of the neostriatum receive about
11,000 afferent synapses from about the same
number of thalamic and cortical neurons.
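The degree of convergence can be estimated from the counts on this slide. The short calculation below divides the neostriatal count by the pallidal counts; it assumes, purely for illustration, that every neostriatal neuron contributes equally, so the results are crude average ratios rather than measured connectivity.

```python
# Counts quoted on the slide.
NEOSTRIATUM_NEURONS = 100_000_000
GP_NEURONS = 700_000   # both segments together
GPI_NEURONS = 170_000  # internal segment only


def convergence_ratio(source_count, target_count):
    """Average number of source neurons per target neuron."""
    return source_count / target_count


print(round(convergence_ratio(NEOSTRIATUM_NEURONS, GP_NEURONS)))   # 143
print(round(convergence_ratio(NEOSTRIATUM_NEURONS, GPI_NEURONS)))  # 588
```

On these figures there are roughly 140 neostriatal neurons for every pallidal neuron, and several hundred for every GPi neuron.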
32Patch Structure
- The primate neostriatum is organized into cell
islands or clusters (striosomes or patches) in a
background of lesser cellular density (the
matrix). Afferent fibres observe this
compartmentalization, with some cortical regions
projecting to each.
- Infragranular pyramidal neurons (layers 5 and 6)
seem to project to the patches, while
supragranular neurons (layers 2 and 3) project to
the matrix.
33Targets of Patches
- The patches project preferentially to the
dopaminergic neurons of the SNc, while the matrix
projects to the SNr (non-dopaminergic neurons
projecting to the thalamus and SC).
- This results in two parallel pathways (in addition
to the direct and indirect pathways, which are
present in both).
- Interneurons in the neostriatum may provide
intercommunication between the two paths.
34General Role of the Basal Ganglia
- The basal ganglia are suspected of being a system
that detects candidate movements, goals,
strategies, or interpretations of sensory
patterns and releases responses.
- They seem to be a multisensory integration
system, and this seems particularly the case with
reference to the SC.
35How it May Work
- DA neurons fire in response to the resolution of
uncertainty about the prospects for reward,
providing a training signal for the neostriatal
system.
- These fire more at the moment when the animal
recognizes it can begin a behavioral sequence
that will end with a reward.
- They pause when an expected reward isn't received.
- The neostriatum thus detects patterns of cortical
activity associated with future reward,
associating values to situations.
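The firing pattern described on this slide is what a TD error produces in a simple cue-then-reward trial. The sketch below is an illustration under stated assumptions: each time step after the cue is treated as its own state, the prediction before the cue is fixed at zero because the cue arrives unpredictably, and the trial length, learning rate, and discount rate are arbitrary. It is not a model taken from the lecture.

```python
def run_trial(values, gamma=0.9, rate=0.2, rewarded=True, learn=True):
    """Simulate one trial; return (TD error at cue onset, TD errors after cue).

    values[t] is the learned prediction t steps after the cue. The reward,
    when given, arrives on the last step.
    """
    steps = len(values)
    cue_response = gamma * values[0]  # pre-cue prediction assumed to be zero
    errors = []
    for t in range(steps):
        reward = 1.0 if (rewarded and t == steps - 1) else 0.0
        next_value = values[t + 1] if t + 1 < steps else 0.0
        delta = reward + gamma * next_value - values[t]
        if learn:
            values[t] += rate * delta
        errors.append(delta)
    return cue_response, errors


values = [0.0] * 5  # hypothetical trial: reward five steps after the cue
naive_cue, naive_errors = run_trial(values, learn=False)
for _ in range(500):
    run_trial(values)
trained_cue, trained_errors = run_trial(values, learn=False)
_, omitted_errors = run_trial(values, rewarded=False, learn=False)

print("naive:   cue", round(naive_cue, 3), "reward", round(naive_errors[-1], 3))
print("trained: cue", round(trained_cue, 3), "reward", round(trained_errors[-1], 3))
print("omitted: reward time", round(omitted_errors[-1], 3))
```

Before learning, the error appears at the reward. After learning it has moved to the cue, the earliest moment at which the reward can be predicted, and omitting the expected reward gives a negative error at the time the reward was due, matching the pause described above.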
36Why Two Neostriatal Areas?
- The matrix seems to learn what has worked in the
past.
- The patches learn which cortical inputs are best
able to predict the value of particular
situations.
- Patches might use dopaminergic signals based on
current knowledge to learn how to predict
dopaminergic signals more accurately. (Houk,
Adams, and Barto)
- To avoid a bootstrap problem, there has to be
innate neural connectivity so that immediate
rewards for behaviour are signalled to the SN via
the patches.
37Basic Mechanism of the BG
- Disinhibition of proposed actions
- The basal ganglia output nuclei tonically inhibit
the thalamic nuclei and the superior colliculus.
- This inhibition is released when input patterns
excite principal neurons of the neostriatum.
- Tonic activity is regulated by striatal projections
to the GPe via the GP (inhibitory principal
neurons) and to the subthalamic nucleus
(excitatory principal neurons) that increase the
activity of the GPi and SNr neurons, producing a
balanced opposition of activity.
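Disinhibition and the opposing influence of the subthalamic nucleus can be illustrated with a toy firing-rate model. The circuit follows the connections described in these slides (direct pathway inhibits GPi, indirect pathway inhibits GPe, GPe inhibits the subthalamic nucleus, the subthalamic nucleus excites GPi, and GPi inhibits thalamus), but every baseline rate and weight in the code is a hypothetical number chosen only to make the effect visible.

```python
def rectify(rate):
    """Firing rates cannot be negative."""
    return max(0.0, rate)


def basal_ganglia_output(direct, indirect, cortical_drive=20.0):
    """Return (gpi_rate, thalamic_rate) for given striatal firing rates.

    All baseline rates and weights are hypothetical illustration values.
    """
    gpe = rectify(60.0 - indirect)       # indirect pathway inhibits GPe
    stn = rectify(40.0 - 0.5 * gpe)      # GPe inhibits the subthalamic nucleus
    gpi = rectify(70.0 + stn - direct)   # STN excites GPi; direct pathway inhibits it
    thalamus = rectify(cortical_drive - 0.5 * gpi)  # GPi inhibits thalamus
    return gpi, thalamus


print(basal_ganglia_output(direct=0.0, indirect=0.0))   # (80.0, 0.0)
print(basal_ganglia_output(direct=60.0, indirect=0.0))  # (20.0, 10.0)
print(basal_ganglia_output(direct=0.0, indirect=60.0))  # (110.0, 0.0)
```

At rest the tonic GPi output keeps the thalamus silent. A burst in the direct pathway lowers GPi firing and lets the thalamus respond to its cortical drive, while a burst in the indirect pathway raises GPi firing and keeps the thalamus suppressed.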
38Feedback in the Neostriatum
- Plenz, Dietmar, (2003), "When inhibition goes
incognito: feedback interaction between spiny
projection neurons in striatal function," TINS,
26(8):436-443, August 2003.
- This paper discusses how spiny projection neurons
(the principal GABAergic neurons of the striatum)
process cortical inputs in a highly parallel way.
39Implications
- Striatal dynamics are probably not 'winner take
all'. Local depolarization facilitates the
depolarization of nearby cells, so that
behavioural sequences can be generated.
- Plenz suggests the striatum could also function
as a resistive grid that computes state
transitions for movement trajectories. (See
Connolly and Burns, 1993, "A model for the
functioning of the striatum," Biological
Cybernetics 68:535-544.)
- This is an important but unclear idea.
40Conclusions
- If you want to use reward learning in a system
that generates behaviour, look at the
Actor-Critic model.
- If you want to build a biologically-inspired
reward learning system, consider the basal
ganglia as a model.
- If you want to do the same for a trajectory
prediction system, also consider modelling the
basal ganglia.