The Basal Ganglia (Lecture 6)

Slides: 41

Provided by: harry8

Transcript and Presenter's Notes

1
The Basal Ganglia (Lecture 6)

Harry R. Erwin, PhD
COMM2E
University of Sunderland

2
Why is this important?

Not well-understood
Hot research area
Apparently underlies reward learning.
Related to the production of behaviour.
May play a role in spatial localization.
Now known to be insufficient for goal-directed
behaviour (Daw and Dayan), which seems to involve
forward models in the prefrontal cortex or some
specialised processing in the basal ganglia. Care
for a Nobel Prize? Solve this!

3
Resources

Shepherd, G., ed., 2004, The Synaptic
Organization of the Brain, 5th edition, Oxford
University Press.
http://scat-he-g4.sunderland.ac.uk/harryerw/phpwiki/index.php/BasalGanglia
http://www.unifr.ch/biochem/DREYER/BG.html

4
Reinforcement Learning

Montague, PR, Hyman, SE, and Cohen, JD, 2004,
Computational roles for dopamine in behavioural
control, Nature, 431:760-767, 14 October 2004.
Reinforcement learning theories discuss how
(habitual) behaviour is organized in response to
rewards or reinforcers. This is not
stimulus-response learning. It is also not how
goal-oriented behaviour is learned.

5
How it Works

The 'reinforcement signal' distribution measures
the current value of the possible states of the
agent.
The current state of the agent is converted into
a 'value' using a 'value function'.
A 'policy function' then maps the agent's states
to its possible actions, with the probability of
each possible action weighted by the value of the
next state produced by the action.
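
The value and policy functions described above can be sketched in a few lines. This is only an illustration under stated assumptions: the toy states, the transition table, and the numeric values are all hypothetical, and weighting each action in direct proportion to the value of its next state is just one simple reading of "weighted by the value".

```python
# Hypothetical toy world: all names and numbers are illustrative assumptions.
# Each action leads deterministically to a next state.
transitions = {
    "start": {"left": "dead_end", "right": "corridor"},
    "corridor": {"back": "start", "forward": "goal"},
}

# Value function: maps each state to its current estimated value.
value = {"start": 0.2, "dead_end": 0.0, "corridor": 0.5, "goal": 1.0}


def policy(state):
    """Return action probabilities weighted by the value of the next state."""
    next_values = {a: value[s_next] for a, s_next in transitions[state].items()}
    total = sum(next_values.values())
    if total == 0:
        # No information about any next state: choose uniformly.
        return {a: 1.0 / len(next_values) for a in next_values}
    return {a: v / total for a, v in next_values.items()}


probs = policy("corridor")
print(probs)
```

From "corridor", the action leading to the higher-valued "goal" state receives most of the probability, while the lower-valued alternative is still tried occasionally.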

6
Temporal Difference Learning

A form of reinforcement learning of interest here
is temporal difference learning, where
current TD error = current reward + gamma × next
prediction - current prediction.
Supports the learning of a plan leading to a
reward.
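
A minimal sketch of how temporal difference learning propagates value back along a sequence of states that ends in a reward. The five-state chain, the discount factor, and the learning rate are assumptions chosen for illustration, not values from the lecture.

```python
GAMMA = 0.9    # discount factor (assumed)
ALPHA = 0.1    # learning rate (assumed)
N_STATES = 5   # states 0..4; leaving state 4 ends the episode with reward 1


def run_td(episodes=500):
    """Learn state values on a simple chain using the TD error."""
    V = [0.0] * (N_STATES + 1)  # the extra entry is the terminal state, value 0
    for _ in range(episodes):
        for s in range(N_STATES):
            s_next = s + 1
            reward = 1.0 if s_next == N_STATES else 0.0
            # TD error = current reward + gamma * next prediction - current prediction
            td_error = reward + GAMMA * V[s_next] - V[s]
            V[s] += ALPHA * td_error
    return V[:N_STATES]


values = run_td()
print([round(v, 3) for v in values])
```

The learned values rise steadily toward the rewarded end of the chain, so an agent that prefers higher-valued next states is led step by step to the reward.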

7
Actor-Critic Model

Sutton and Barto, 1998, Reinforcement Learning,
MIT Press, describe a mechanism for bootstrapping
reinforcement learning.
Actor-critic methods have a separate memory
structure to explicitly represent the policy
independent of the value function. The policy
structure is known as the actor, because it is
used to select actions, and the estimated value
function is known as the critic, because it
criticizes the actions made by the actor.
The critique takes the form of a TD error
estimate.

8
The Algorithm

$\delta_t = r_{t+1} + \gamma V(s_{t+1}) - V(s_t)$,
where $r_{t+1}$ is the actual reward at time $t+1$,
$s_t$ is the state at time $t$,
$V(s)$ is the current perceived value of the state,
$s$, and
$\gamma$ is the discount rate that translates a value at
time $t+1$ to a lower value at time $t$.
$\delta_t$ is the TD error estimate at time $t+1$ of
following a specific action, $a$, at time $t$.
$V(s)$ is zero at terminal states, and $r_t$ is zero
unless there is a real reward at time $t$.
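
A worked example of the TD error may help. Writing the error as delta, the discount rate as gamma, the reward as r, and the state value as V, the numbers below are assumed purely for illustration (gamma = 0.9, and the state values shown).

```latex
% Case 1: no reward yet, but the next state looks better than the current one
% (assumed: r_{t+1} = 0, V(s_{t+1}) = 0.8, V(s_t) = 0.5)
\delta_t = r_{t+1} + \gamma V(s_{t+1}) - V(s_t) = 0 + 0.9 \times 0.8 - 0.5 = 0.22

% Case 2: a reward of 1 arrives and the next state is terminal, so V(s_{t+1}) = 0
% (assumed: V(s_t) = 0.8)
\delta_t = 1 + 0.9 \times 0 - 0.8 = 0.2
```

In both cases the error is positive, meaning that things turned out better than the current prediction.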

9
Interpretation

$\delta_t$ is the quantity that appears to correspond to
the dopamine level output by the basal ganglia to
the cortex (Schultz, et al.).
How are $V(s)$ and the preferences for the various
actions, $a$, updated?
Given a set of actions, $a$, let $p(s_t, a_t)$ be the
preference for action $a$ at time $t$ given state $s$.
Then let the probability of picking $a$ be
$\pi(s,a) = \exp(p(s,a)) / \sum_b \exp(p(s,a_b))$, summed over all reasonable
actions $a_b$.
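
The mapping from action preferences to selection probabilities is the softmax rule, sketched below for a single state. The action names and preference values are hypothetical.

```python
import math


def softmax_policy(preferences):
    """Map action preferences p(s, a) for one state to selection probabilities."""
    # Subtracting the maximum leaves the result unchanged but avoids overflow.
    top = max(preferences.values())
    weights = {a: math.exp(p - top) for a, p in preferences.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}


# Hypothetical preferences for three candidate actions in one state.
probs = softmax_policy({"reach": 1.0, "wait": 0.0, "withdraw": -1.0})
print(probs)
```

Every action keeps a non-zero probability, so less preferred actions are still explored from time to time.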

10
Learning Processes

Now, update the value function $V(s_t)$ by adding $\delta_t$
times some learning rate (less than one).
Update $p(s_t, a_t)$ by adding $\delta_t$ times another
learning rate (less than one).
That's all, folks.
Note the state space is very large.
Actor-critic learning cannot cope with changing
goal values.
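
Putting the two updates together gives a very small actor-critic learner. The sketch below assumes a single decision state with two hypothetical actions, each of which ends the episode; the learning rates, rewards, and random seed are illustrative assumptions.

```python
import math
import random

ALPHA_V = 0.1                       # critic learning rate (assumed)
ALPHA_P = 0.1                       # actor learning rate (assumed)
ACTIONS = ["good", "bad"]           # hypothetical action names
REWARD = {"good": 1.0, "bad": 0.0}  # hypothetical rewards; each action ends the episode


def softmax(prefs):
    """Convert preferences into selection probabilities."""
    top = max(prefs.values())
    weights = {a: math.exp(p - top) for a, p in prefs.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}


def train(trials=200, seed=0):
    rng = random.Random(seed)
    value = 0.0                        # V(s) for the single non-terminal state
    prefs = {a: 0.0 for a in ACTIONS}  # p(s, a)
    for _ in range(trials):
        probs = softmax(prefs)
        action = rng.choices(ACTIONS, weights=[probs[a] for a in ACTIONS])[0]
        # The next state is terminal, so its value is zero and gamma drops out.
        delta = REWARD[action] + 0.0 - value
        value += ALPHA_V * delta          # critic update
        prefs[action] += ALPHA_P * delta  # actor update
    return value, prefs


value, prefs = train()
print(value, prefs)
```

The same error term drives both updates: the critic uses it to improve its prediction, and the actor uses it to shift preference toward the action that did better than predicted.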

11
A Few Points

Actor-critic learning works better for high-level
rather than low-level actions. Somehow the
biological system is able to shift up.
Note that the error, $\delta_t$, can be either positive
or negative. The basal ganglia output both
dopamine (+) and GABA (-) to represent the error.
Cocaine has the property of producing an error
signal that is always positive, which really
fouls up the learning process.
Mirror neurons may play a role in this and autism
may be a malady of this system.

12
The Bootstrap Issue

To make this work, the critic has to either
innately know the rewards for various actor
actions, or it has to learn them.
The resulting bootstrap problem is of
particular importance in biological systems that
might implement the model.
One approach might be for the critic to reward
all actions indiscriminately and then as noxious
stimuli are reported by sensory systems, reduce
the corresponding rewards.
Is this biologically realistic?
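
The approach of rewarding everything at first and then cutting back can be sketched as follows. This is a toy illustration only: the action names, the set of noxious actions (standing in for reports from sensory systems), and the penalty size are all assumptions, and the sketch says nothing about whether the scheme is biologically realistic.

```python
# Hypothetical action names; NOXIOUS stands in for reports from sensory systems.
ACTIONS = ["touch_flame", "eat_fruit", "groom"]
NOXIOUS = {"touch_flame"}
PENALTY = 0.5  # assumed amount by which one noxious report lowers the reward


def initial_rewards():
    """Start by rewarding every action equally."""
    return {action: 1.0 for action in ACTIONS}


def report_outcome(rewards, action):
    """Lower the stored reward for an action if it produced a noxious stimulus."""
    if action in NOXIOUS:
        rewards[action] -= PENALTY
    return rewards[action]


rewards = initial_rewards()
for _ in range(3):
    report_outcome(rewards, "touch_flame")
    report_outcome(rewards, "eat_fruit")
print(rewards)
```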

13
The Basal Ganglia

A richly connected set of brain nuclei in the
fore- and mid-brain of amniotes.
Degenerative diseases tend to produce severe
movement deficits, but there is reason to believe
the function of the basal ganglia is more
general: the selection among candidate movements,
goals, strategies, and interpretations of sensory
information.
(Wilson, 2004, in Shepherd, from which much of
this presentation is derived).

14
Rostral Anatomy

<http://thalamus.wustl.edu/course/cerebell.html>

15
Medial Anatomy

<http://thalamus.wustl.edu/course/cerebell.html>

16
Caudal Anatomy

<http://thalamus.wustl.edu/course/cerebell.html>

17
Nuclei of the Basal Ganglia

The most prominent are the following:
Caudate Nucleus
Putamen or Striatum
Nucleus Accumbens
Globus Pallidus (GP): external segment (GPe), internal segment (GPi)
Substantia Nigra (SN): pars reticulata (SNr), pars compacta (SNc)
Subthalamic Nucleus
The two largest sources of input are the cerebral
cortex and thalamus.

18
BG Circuits
(from Dreyer, http://www.unifr.ch/biochem/DREYER/BG.html)
19
Neostriatum

The neostriatum consists of the caudate nucleus,
the putamen, and the nucleus accumbens.
For the caudate nucleus and putamen, inputs from
sensory, motor, and association cortical areas
converge with inputs from the thalamic
intralaminar nuclei, dopaminergic inputs from the
SNc, and serotonergic (5-HT) inputs from the dorsal
raphe nucleus.
This subsystem supports planning and
reinforcement learning involving the PFC.

20
Putamen

A portion of the basal ganglia that forms the
outermost part of the lenticular nucleus.
The motor and somatosensory cortices, the
intralaminar nuclei of the thalamus, and the
substantia nigra project to the putamen.
The putamen projects to premotor and
supplementary motor areas of cortex via the
globus pallidus and thalamus.
Coextensive with the insula, which has been found
to contain mirror neurons.

21
Nucleus Accumbens

There are similar connections from the limbic
cortex (emotional) and hippocampus, converging
with inputs from the ventral tegmental area (VTA)
in the nucleus accumbens.
The VTA is dopaminergic and seems to play a role
in reward learning.
This subsystem appears to support emotional
learning.

22
Input Structure

The cortex, thalamus, and amygdala provide
glutamatergic input to the neostriatum (and can
produce LTP or LTD).
Most neostriatal interneurons are GABAergic,
except the cholinergic cells, which are
neuromodulatory, and the output of the principal
cells is also GABAergic.

23
Neostriatal Structure

Consists mainly of principal neurons and afferent
fibres, with smaller populations of interneurons.
The neostriatum appears to be a functional
remapping of the cortex, based on common
interests of some sort. For example the neurons
concerned with a finger will tend to project to a
common area.
Coincidence detection is important.

24
Neostriatal Neurons

GABAergic principal neurons firing rarely and for
short periods of time (100-3000 msec).
The axons emit local collaterals to form an
extremely rich arborization and then project to
their long-range destinations.
Approximately half are direct pathway neurons and
the other half are indirect pathway neurons. It's
unclear in Wilson, but it may be that only the
direct pathway neurons are collateralized.

25
Neostriatal Interneurons

A number of rare types (eight to nine estimated).
Three major types as follows:
Giant cholinergic interneurons forming a dense
plexus of extremely fine axonal branches. Tonic.
GABA/parvalbumin-containing basket cells. Very
similar to basket cells of the hippocampus and
cerebral cortex. Linked by gap junctions.
Somatostatin (SOM)/nitric oxide synthase
(NOS)-containing interneurons. A neuromodulatory
function. Probably GABAergic.

26
Neostriatal Outputs

The output of the neostriatum projects to the
GPe, GPi, and SNr.
The GPi and SNr project (GABAergic projections)
outside the basal ganglia to the thalamus (and
mostly from there to the frontal cortex), the
lateral habenular nucleus, and the deep layers of
the superior colliculus.
The GPe projects mostly to the subthalamic
nucleus, which also receives frontal input and
finally projects to the GPe, GPi, and SN.

27
Intermediate Processing

At the GP and SN, most afference is from the
neostriatum, with secondary input from the
subthalamic nucleus.
The GPe projects to the GPi and SNr and has
recurrent local inhibitory connections.
The GPe also receives some input from the
cerebral cortex and thalamus.
The subthalamic neurons receive excitatory inputs
from the cortex and inhibitory input from the GPe.

28
GP Processing

The principal cells of the GP are inhibitory,
receive excitatory input from the subthalamic
nucleus, and inhibitory input from the
neostriatum.
The GPe inhibits the GPi and the SNr, which are
the output nuclei of the basal ganglia.

29
Phasic/Tonic

The principal cells of the SNc are dopaminergic
and neuromodulatory. The SNc and the VTA seem to
encode rewards.
The cells of the GP and SN fire tonically, at
very high rates, the GP and SNr inhibiting
neurons in the thalamus and SC.
Phasic firing of neostriatal neurons produces a
pause in this tonic firing, allowing thalamic and
SC neurons to respond to input. (This can also
terminate tonic activity in the cortex.)
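
The gating effect of a pause in tonic firing can be shown with a very simple rate model. This is a sketch under stated assumptions: the firing rates, the linear subtraction, and the function and parameter names are all hypothetical and are not taken from Wilson.

```python
def relu(x):
    """Firing rates cannot go below zero."""
    return max(0.0, x)


def thalamic_response(striatal_burst, cortical_input, tonic_rate=80.0):
    """Thalamic (or SC) firing rate given striatal inhibition of the output nuclei.

    All rates are in arbitrary units and are assumed for illustration.
    """
    # The output nucleus (GPi/SNr) fires tonically unless the striatum inhibits it.
    output_rate = relu(tonic_rate - striatal_burst)
    # The target responds to its input only to the extent inhibition is lifted.
    return relu(cortical_input - output_rate)


print(thalamic_response(striatal_burst=0.0, cortical_input=50.0))   # gate closed
print(thalamic_response(striatal_burst=60.0, cortical_input=50.0))  # partly open
print(thalamic_response(striatal_burst=80.0, cortical_input=50.0))  # fully open
```

With no striatal burst, the tonic inhibition swamps the input and the target stays silent; a strong enough burst silences the output nucleus and the same input now gets through.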

30
Detailed Neostriatal Projections

There are two pathways
Direct pathway neurons with direct projections
to GPi and SN (possibly in addition to the GPe),
directly playing a role in the output of the
basal ganglia.
Indirect pathway neurons that project only to
GPe. These affect the output of the basal ganglia
via projections of the subthalamic nucleus and
the GPe.

31
Cell Counts

The neostriatum is estimated to contain about
100,000,000 neurons.
The GP contains about 700,000 neurons in total, 170,000
of them in the GPi. Highly convergent.
Spiking in the GP and SN is very localized.
Principal cells of the neostriatum receive about
11,000 afferent synapses from about the same
number of thalamic and cortical neurons.
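
The degree of convergence follows from simple arithmetic on the counts above. The ratios below are rough: they ignore the SNr and the fact that a single neostriatal neuron may project to more than one target, and the variable names are only for illustration.

```python
# Cell counts quoted on this slide.
NEOSTRIATUM = 100_000_000
GP_TOTAL = 700_000
GPI = 170_000

# The remainder of the GP is the external segment.
gpe = GP_TOTAL - GPI

# Neostriatal neurons per pallidal neuron (a crude convergence ratio).
ratio_gp = NEOSTRIATUM / GP_TOTAL
ratio_gpi = NEOSTRIATUM / GPI

print(gpe)               # 530000
print(round(ratio_gp))   # 143
print(round(ratio_gpi))  # 588
```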

32
Patch Structure

The primate neostriatum is organized into cell
islands or clusters (striosomes or patches) in a
background of lesser cellular density (the
matrix). Afferent fibres observe this
compartmentalization, with some cortical regions
projecting to each.
Infragranular pyramidal neurons (layers 5 and 6)
seem to project to the patches, while
extragranular neurons (layers 2 and 3) project to
the matrix.

33
Targets of Patches

The patches project preferentially to the
dopaminergic neurons of the SNc, while the matrix
projects to the SNr (non-dopaminergic neurons
projecting to the thalamus and SC).
Results in two parallel pathways (in addition to
the direct and indirect pathways, which are
present in both).
Interneurons in the neostriatum may provide
intercommunication between the two paths.

34
General Role of the Basal Ganglia

The basal ganglia are suspected of being a system
that detects candidate movements, goals,
strategies, or interpretations of sensory
patterns and releases responses.
They seem to be a multisensory integration
system, and this seems particularly the case with
reference to the SC.

35
How it May Work

DA neurons fire in response to the resolution of
uncertainty about the prospects for reward,
providing a training signal for the neostriatal
system.
These fire more at the moment when the animal
recognizes it can begin a behavioral sequence
that will end with a reward.
They pause when an expected reward isn't received.
The neostriatum thus detects patterns of cortical
activity associated with future reward,
associating values to situations.
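
The firing pattern described above is what a temporal difference error produces in a simple simulation. The sketch below assumes a cue followed after a fixed number of steps by a reward, an unpredictable cue (so the baseline state keeps a value of zero), no discounting, and an arbitrary learning rate; all of these are illustrative assumptions.

```python
GAMMA = 1.0  # no discounting within a short trial (assumed)
ALPHA = 0.2  # learning rate (assumed)
STEPS = 5    # time steps from cue onset to reward (assumed)


def run_trial(V, rewarded=True, learn=True):
    """Return the TD errors for one trial: index 0 is cue onset, the last is reward time."""
    errors = []
    # Cue onset: a jump from the baseline state (value 0, because the cue
    # cannot be predicted) into the first post-cue state.
    errors.append(GAMMA * V[0] - 0.0)
    for k in range(STEPS):
        last = k == STEPS - 1
        reward = 1.0 if (last and rewarded) else 0.0
        v_next = 0.0 if last else V[k + 1]
        delta = reward + GAMMA * v_next - V[k]
        if learn:
            V[k] += ALPHA * delta
        errors.append(delta)
    return errors


V = [0.0] * STEPS
first = run_trial(V)                                  # naive animal
for _ in range(300):
    run_trial(V)
trained = run_trial(V, learn=False)                   # after learning
omitted = run_trial(V, rewarded=False, learn=False)   # expected reward withheld

print(first[0], first[-1])
print(round(trained[0], 3), round(trained[-1], 3))
print(round(omitted[-1], 3))
```

Before learning, the error occurs at the reward. After learning, it occurs at the cue, the earliest moment at which the reward can be predicted, and withholding the reward produces a negative error at the expected time.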

36
Why Two Neostriatal Areas?

The matrix seems to learn what has worked in the
past.
The patches learn which cortical inputs are best
able to predict the value of particular
situations.
Patches might use dopaminergic signals based on
current knowledge to learn how to predict
dopaminergic signals more accurately. (Houk,
Adams, and Barto)
To avoid a bootstrap problem, there has to be
innate neural connectivity so that immediate
rewards for behaviour are signalled to the SN via
the patches.

37
Basic Mechanism of the BG

Disinhibition of proposed actions
The basal ganglia output nuclei tonically inhibit
the thalamic nuclei and the superior colliculus.
Released when input patterns excite principal
neurons of the neostriatum.
Tonic activity regulated by striatal projections
to the GPe via the GP (inhibitory principal
neurons) and to the subthalamic nucleus
(excitatory principal neurons) that increase the
activity of the GPi and SNr neurons, producing a
balanced opposition of activity.
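
The balanced opposition between the pathway that releases the output nuclei and the pathway that drives them can be sketched with a three-stage rate model. This is a deliberately simplified illustration: the rates, the linear subtraction, and the parameter names are assumptions, and the direct GPe-to-GPi inhibition described in the earlier slides is left out.

```python
def relu(x):
    """Firing rates cannot go below zero."""
    return max(0.0, x)


def output_rate(direct_drive, indirect_drive,
                gpe_tonic=60.0, stn_drive=70.0, gpi_base=30.0):
    """GPi/SNr rate given striatal drive on each pathway (arbitrary units, assumed)."""
    gpe = relu(gpe_tonic - indirect_drive)       # indirect pathway neurons inhibit the GPe
    stn = relu(stn_drive - gpe)                  # the GPe inhibits the subthalamic nucleus
    return relu(gpi_base + stn - direct_drive)   # STN excites, direct pathway inhibits


print(output_rate(0.0, 0.0))    # resting tonic output
print(output_rate(40.0, 0.0))   # direct pathway alone: output silenced
print(output_rate(0.0, 40.0))   # indirect pathway alone: output increased
print(output_rate(40.0, 40.0))  # both together: back to the resting level
```

In this toy model the direct pathway removes the tonic output and so releases the target, while the indirect pathway raises the output through the subthalamic nucleus and so opposes release.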

38
Feedback in the Neostriatum

Plenz, Dietmar, (2003), "When inhibition goes
incognito: feedback interaction between spiny
projection neurons in striatal function," TINS,
26(8):436-443, August 2003.
This paper discusses how spiny projection neurons
(the principal GABAergic neurons of the striatum)
process cortical inputs in a highly parallel way.

39
Implications

Striatal dynamics are probably not 'winner take
all'. Local depolarization facilitates the
depolarization of nearby cells, so that
behavioural sequences can be generated.
Plenz suggests the striatum could also function
as a resistive grid that computes state
transitions for movement trajectories. (See
Connolly and Burns, 1993, "A model for the
functioning of the striatum," Biological
Cybernetics 68:535-544.)
This is an important but unclear idea.
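
One way to make the resistive grid idea concrete is to treat it as a grid of cells that each settle to the average of their neighbours, with the goal held at a low potential and the walls at a high one; a trajectory is then read out by always stepping to the lowest neighbouring potential. The sketch below is an assumption about what such a grid might compute, not a reconstruction of the cited model; the grid size, start, and goal are arbitrary.

```python
SIZE = 7        # grid is SIZE x SIZE; the outer ring of cells is wall (assumed)
START = (1, 1)  # hypothetical start cell
GOAL = (5, 5)   # hypothetical goal cell


def relax(sweeps=500):
    """Settle the grid: walls held at 1.0, goal held at 0.0, others average their neighbours."""
    u = [[1.0] * SIZE for _ in range(SIZE)]
    u[GOAL[0]][GOAL[1]] = 0.0
    for _ in range(sweeps):
        for r in range(1, SIZE - 1):
            for c in range(1, SIZE - 1):
                if (r, c) != GOAL:
                    u[r][c] = (u[r - 1][c] + u[r + 1][c]
                               + u[r][c - 1] + u[r][c + 1]) / 4.0
    return u


def descend(u, start=START, max_steps=SIZE * SIZE):
    """Follow the steepest downhill neighbour from the start until the goal is reached."""
    path = [start]
    r, c = start
    for _ in range(max_steps):
        if (r, c) == GOAL:
            break
        neighbours = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
        r, c = min(neighbours, key=lambda p: u[p[0]][p[1]])
        path.append((r, c))
    return path


potential = relax()
path = descend(potential)
print(path)
```

Each step of the path is a transition from one state to a neighbouring state, which is one sense in which a settled grid could be said to compute state transitions for a trajectory.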

40
Conclusions

If you want to use reward learning in a system
that generates behaviour, look at the
Actor-Critic model.
If you want to build a biologically-inspired
reward learning system, consider the basal
ganglia as a model.
If you want to do the same for a trajectory
prediction system, also consider modelling the
basal ganglia.
