Title: Models of decisionmaking
1Models of decision-making
- Sommerakademie St. Johann
- 8 September 2009
- Konstantin Stark
- TU München
2Overview
- Visual-Saccadic Decision Making
- Random Dot Motion task
- Rise-to-threshold-accumulator models
- Random-walk
- Linear rise-to-threshold
- Value-based Decision Making
- Decision-making computation
- Valuation systems
- Modulating variables
3Visual-Saccadic Decision Making
4The Neurobiology of Visual-Saccadic Decision Making
- System that is becoming a model for understanding decision making in general
- How are decisions, the computational events that connect sensory data and a stored representation of the structure of the world with behavior, accomplished at a mechanistic level?
- Decision making: connection between sensation and action
- Within the material body or the nonmaterial soul?
- Reflexes as a simple model: a stereotyped sensory event triggers a fixed motor response
- How a visual stimulus might be used to trigger an eye movement
- Glimcher, Annu. Rev. Neurosci. 2003
5The Neurobiology of Visual-Saccadic Decision
Making
Descartes, 1664
6Random Dot Motion task
- Monkeys are shown a patch of moving dots
- Most of the dots are moving randomly, but a fraction of them has a coherent direction of motion
- The animal has to indicate by an eye movement which of two targets reflects the coherent motion
- Subjects are rewarded a fixed amount for correct answers and nothing for incorrect answers
- Newsome, Nature 1989
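The stimulus described above can be sketched in a few lines. This is only an illustrative toy, not the stimulus code used in the cited experiments: the function names, the dot count, and the choice of drawing noise directions uniformly from 0 to 360 degrees are all assumptions.

```python
import math
import random

def make_dot_directions(n_dots, coherence, signal_direction_deg, rng):
    """Return one motion direction (in degrees) per dot for a single frame.

    A fraction `coherence` of the dots moves in the signal direction;
    the remaining dots get a direction drawn uniformly at random.
    """
    n_signal = round(n_dots * coherence)
    directions = [signal_direction_deg] * n_signal
    directions += [rng.uniform(0.0, 360.0) for _ in range(n_dots - n_signal)]
    rng.shuffle(directions)
    return directions

def net_horizontal_motion(directions):
    """Mean horizontal motion component; positive means net rightward motion."""
    return sum(math.cos(math.radians(d)) for d in directions) / len(directions)

rng = random.Random(0)  # fixed seed so the sketch is reproducible
frame = make_dot_directions(n_dots=200, coherence=0.25,
                            signal_direction_deg=0.0, rng=rng)
print(round(net_horizontal_motion(frame), 3))
```

With coherence 0 the net horizontal motion hovers around zero, and with coherence 1 it equals 1, which is the quantity the animal has to judge.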
7Random Dot Motion task
8Random Dot Motion task: Neural Basis
- The likelihood that the monkey would make a rightward saccade is a function of the firing rate of the rightward motion-preferring neurons in area MT (middle temporal area)
- MT neurons respond strongly to high levels of correlated dot motion in their preferred direction
- Electrical activation of neurons in area MT could alter the probability that the animals would produce a particular saccade
- Suggests that movements of the animals could be predicted from activity in area MT
- Newsome, Nature 1989
- Salzman, Nature 1990
9Concrete and quantitative description of a
complete neural circuit within the primate brain
that could model a simple decision that was
driven by sensory data
Shadlen, PNAS 1996
10Decision-making circuit in the visual saccadic system
- Neurons of the visual cortices appear to act as a receptor system
- Sensory signal is passed to the parietal cortex, which combines information across time
- Signal is passed to the frontal eye field
- A movement occurs when this network produces activity in frontal eye field neurons that crosses a biophysical threshold
- Glimcher, Annu. Rev. Neurosci. 2003
- Shadlen, J. Neurophysiol. 2001
11Predicted properties of the integrative element
- Rightward integrators would show a gradual increase in activity during the motion stimulus if the monkey would decide to make a rightward movement
- The higher the fraction of dots moving rightward, the faster the activity should grow within the rightward integrative element
- If there is no net motion signal, the integrative element should still predict the upcoming decision
Shadlen, PNAS 1996
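The predicted ramping can be illustrated with a toy integrator. This is a sketch under stated assumptions, not the model from the cited paper: momentary evidence is taken to be Gaussian with a mean proportional to coherence, and `gain`, `noise_sd`, and the function names are hypothetical.

```python
import random

def integrate_evidence(coherence, n_steps, gain=1.0, noise_sd=1.0, rng=None):
    """Running sum of noisy momentary evidence; returns the whole trajectory."""
    rng = rng or random.Random()
    total, trajectory = 0.0, []
    for _ in range(n_steps):
        # Assumption: each sample has mean gain * coherence plus Gaussian noise.
        total += rng.gauss(gain * coherence, noise_sd)
        trajectory.append(total)
    return trajectory

def mean_final_level(coherence, n_trials=2000, n_steps=50, seed=0):
    """Average integrator level at the end of the stimulus, across trials."""
    rng = random.Random(seed)
    finals = [integrate_evidence(coherence, n_steps, rng=rng)[-1]
              for _ in range(n_trials)]
    return sum(finals) / n_trials

for c in (0.0, 0.1, 0.5):
    print(c, round(mean_final_level(c), 2))
```

Higher coherence gives a steeper average ramp. At zero coherence the average stays near zero, but each single trajectory still drifts to one side through noise alone, so its sign can still be read out as a prediction of the choice.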
12Lateral intraparietal area
- Likely site of the integrative element
- Receives projections from extrastriate visual areas like MT
- Projects to the frontal eye field
- Neurons were shown to be most active before movements having particular amplitudes and directions
- Neurons express a prestimulus bias toward one direction if there was 0 correlation of dot motion, and this bias correlated with the behavior of the animals
13RDM vs value driven choice
- Subjects care about the right choice because it affects the reward they get
- The value of making an eye movement = size of the reward × probability of correct direction
- Subjects can estimate the action value using their senses to measure the net direction of motion in the stimulus display
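As a worked example of the value computation above, the following sketch compares the two eye movements. The function names and the numbers are made up for illustration.

```python
def action_value(reward_size, p_correct):
    """Value of an eye movement: reward size times probability of being correct."""
    return reward_size * p_correct

def best_action(reward_size, p_rightward):
    """Compare the two saccades given the estimated probability of rightward motion."""
    values = {
        "right": action_value(reward_size, p_rightward),
        "left": action_value(reward_size, 1.0 - p_rightward),
    }
    return max(values, key=values.get), values

choice, values = best_action(reward_size=1.0, p_rightward=0.7)
print(choice, values)
```

Because the reward is the same fixed amount for both targets, the comparison reduces to picking the direction the sensory evidence makes more probable.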
14Models of perceptual decision making
- Provide a unified theory of value computation, action selection, and reaction time
- Make stark behavioural predictions about how the psychometric choice function and the reaction time distribution should be affected by the coherence of the stimulus
- Make predictions about which decision variables should be encoded in the brain and how they are computed
- Contain processes that encode the moment-to-moment sensory evidence in support of each potential state of the world
- Accumulator process associated with each of the two states that integrates the cumulative sensory evidence received in its favor
- Contain a criterion that needs to be satisfied for a choice to be made
- Limitation: only apply to the case of two alternatives
15Rise-to-threshold-accumulator models
- These models aim to provide a computational abstraction of a biophysically conceivable mechanism that explains saccade latencies and their variability across trials
- In saccadic decision making with a fixed set of potential target locations, the model assumes that subjects maintain a set of hypotheses, each of which corresponds to one such location
- As the stimulus appears, a measure of evidence for each of these hypotheses is continuously refined, implemented as a competition between alternative decision signals in the brain
- Gold, Trends in Cognitive Sciences 2001
- Gold, Neuron 2002
- Glimcher, Annual Review of Neuroscience 2003
16Rise-to-threshold-accumulator models
- Depending on the way in which information is assumed to be accumulated over time, two models are distinguished
- Random-walk
- Linear rise-to-threshold
17Random-walk model: Properties
- It predicts
- a logistic relationship between choice and the extent to which the dots are moving in a coherent direction
- that the distribution of reaction times is right-skewed, and that reaction times are longer for error trials
- that reaction times decrease with stimulus coherence and that accuracy improves
- It implements a sequential probability ratio test
- Key variable is the measure of relative evidence
- Positive values of this variable denote that the estimation process favors a rightward decision
- Process starts at a midline and stops when the variable crosses a threshold
- Size of every step is given by a Gaussian distribution with a mean that is proportional to the true direction of motion
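The process described above can be simulated directly. The following is a minimal sketch, assuming symmetric thresholds, unit step noise, and a drift equal to `drift_gain * coherence`; all parameter names and values are hypothetical and not fitted to any data.

```python
import random

def random_walk_trial(coherence, threshold=10.0, drift_gain=1.0,
                      noise_sd=1.0, rng=None, max_steps=100000):
    """Simulate one trial; return (choice, n_steps).

    choice is +1 (right) or -1 (left). coherence is signed:
    positive values mean net rightward motion.
    """
    rng = rng or random.Random()
    x = 0.0  # relative evidence, starts at the midline
    for step in range(1, max_steps + 1):
        # Gaussian step whose mean is proportional to the motion signal.
        x += rng.gauss(drift_gain * coherence, noise_sd)
        if x >= threshold:
            return +1, step
        if x <= -threshold:
            return -1, step
    return (1 if x > 0 else -1), max_steps  # safety stop, rarely reached

def summarize(coherence, n_trials=2000, seed=0):
    """Proportion of rightward choices and mean number of steps to threshold."""
    rng = random.Random(seed)
    trials = [random_walk_trial(coherence, rng=rng) for _ in range(n_trials)]
    p_right = sum(1 for choice, _ in trials if choice == 1) / n_trials
    mean_steps = sum(steps for _, steps in trials) / n_trials
    return p_right, mean_steps

for c in (0.05, 0.2, 0.5):
    print(c, summarize(c))
```

In this toy version, stronger coherence yields more rightward choices and fewer steps to threshold. It does not by itself produce slower error trials; that prediction needs further assumptions that are not modeled here.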
18Random-walk models
- Based on a sequential probability ratio test that is being carried out continually
- Each new incoming piece of sensory evidence either increases or decreases a single decision variable until it has drifted beyond a threshold associated with the saccadic movement towards a particular target
- The decision variable represents the relative evidence for the two alternatives
- Ratcliff, Psychological Review 1999
- Ratcliff, Psychological Review 2004
- Ratcliff, Psychological Science 1998
19Random-walk model
Smith, Trends in Neurosci. 2004
20Random-walk model: Behavioral evidence
- In a human version of the task, the results are as predicted by the model
- Choices are a logistic function of stimulus strength
- Reaction times decrease with stimulus strength
- Response times are longer in error than in correct trials
- Palmer, J. Vision 2005
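The first two behavioural predictions have a closed form for an idealized diffusion process with symmetric bounds and unit variance. The sketch below uses those standard expressions. The parameter names (`k` for sensitivity, `bound` for the threshold, `t_residual` for non-decision time) and the example values are assumptions for illustration, not fitted values from the cited study.

```python
import math

def p_correct(coherence, k, bound):
    """Logistic choice function: probability of choosing the correct target."""
    return 1.0 / (1.0 + math.exp(-2.0 * k * coherence * bound))

def mean_rt(coherence, k, bound, t_residual):
    """Mean reaction time: mean decision time plus a residual (non-decision) time."""
    drift = k * coherence
    if drift == 0.0:
        # Limit of (bound / drift) * tanh(bound * drift) as drift goes to 0.
        decision_time = bound ** 2
    else:
        decision_time = (bound / drift) * math.tanh(bound * drift)
    return decision_time + t_residual

for c in (0.0, 0.064, 0.256, 0.512):
    print(c, round(p_correct(c, k=10.0, bound=1.0), 3),
          round(mean_rt(c, k=10.0, bound=1.0, t_residual=0.3), 3))
```

Accuracy rises from chance along a logistic curve while mean reaction time falls as stimulus strength increases, matching the qualitative pattern listed above.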
21Random-walk model: Neurobiological evidence
Smith, Trends in Neurosci 2004
22Random-walk model: Neurobiological evidence
- Sensory process
- Set of processes capturing the incoming sensory evidence
- Direction of motion encoded in middle temporal area (MT)
- Encodes the motion properties of objects in the visual field
- Organized in response fields that respond preferentially to stimuli moving in a certain direction
- Born, Annu. Rev. Neurosci. 2005
- Britten, Vis. Neurosci. 1993
23Random-walk model: Neurobiological evidence
- Accumulator signal
- Measuring the amount of net evidence in favor of the two alternatives
- Lateral intraparietal area (LIP)
- Activity resembles the accumulation of sensory signal predicted by the model
- Organized in response fields that respond to eye movements in a particular direction
- For trials in which a choice is made into the response field, firing rates increase with stimulus coherence
- When a choice is made into a response field, all of the time courses rise to a single threshold about 100 ms before the initiation of the saccade
- Shadlen, PNAS 1999
- Shadlen, J. Neurophys. 2001
24Lateral intraparietal area
25Random-walk model: Neurobiological evidence
- Lateral intraparietal area (LIP)
- Microstimulation of LIP neurons can bias the proportion of choices that are made towards the stimulated response field
- Stimulation generated faster reaction times when movements were made into the neurons' response field
- Hanks, Nat. Neurosci. 2006
26Random-walk model: Neurobiological evidence
- Neural implementation of threshold
- LIP neurons exhibit threshold-like behaviour
- Time pressure leads to a decrease in the size of the threshold (Smith, Trends Neurosci. 2004)
- Natural candidate would be the superior colliculus
- Receives visual input from LIP and other cortical areas
- Sends motor signals to the brainstem and spinal cord
- Contains burst neurons that fire just before a saccade is initiated
- Lo and Wang, Nat. Neurosci. 2006
27LATER: linear approach to threshold with ergodic rate
- Randomness is introduced as trial-by-trial changes in the otherwise constant rate of rise of the decision signal
- Saccade towards a target is elicited as soon as a neural decision signal has reached a particular threshold
- Assumes a fixed threshold and a linear increase whose rate is subject to variation across trials, yet fixed within a given trial
28LATER
- Threshold for saccade release seems to be constant, whereas the slope of the rise in activity varies considerably across trials
- Observed saccadic latency was a function of the log probability of the corresponding target location
- The more likely the target location, the shorter the latency
- Assumes that learned a priori target probabilities determine the baseline levels of the decision signals, but not their rates of rise
- Carpenter, Nature 1995
- Reddi, Nat. Neurosci. 2000
- Reddi, Journal of Neurophysiology 2003
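A minimal simulation of the LATER idea follows, assuming the rate of rise is drawn from a Gaussian once per trial and that a higher prior probability is expressed as a higher starting baseline. The units, parameter values, and function names are hypothetical.

```python
import random

def later_latency(mean_rate, rate_sd, threshold=1.0, baseline=0.0, rng=None):
    """Latency of one trial under a linear rise from baseline to threshold.

    The rate is sampled once per trial and stays fixed within the trial.
    Returns None when the sampled rate is not positive (signal never arrives).
    """
    rng = rng or random.Random()
    rate = rng.gauss(mean_rate, rate_sd)
    if rate <= 0.0:
        return None
    return (threshold - baseline) / rate

def median_latency(baseline, n_trials=5000, seed=0):
    """Median latency across trials for a given baseline level."""
    rng = random.Random(seed)
    samples = [later_latency(5.0, 1.0, baseline=baseline, rng=rng)
               for _ in range(n_trials)]
    samples = sorted(s for s in samples if s is not None)
    return samples[len(samples) // 2]

print(round(median_latency(baseline=0.0), 3))  # less expected target
print(round(median_latency(baseline=0.3), 3))  # more expected target, higher baseline
```

Raising the baseline shortens the distance to threshold without touching the rate distribution, which shortens latencies for likely targets. Because latency is the reciprocal of a Gaussian rate, the latency distribution is right-skewed.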
29LATER: Limitations
- Has only been applied to saccade-to-target situations where no learning took place
- During learning, the baseline levels of decision signals are expected to change across trials
- Only accounts for simple marginal probabilities
- Does not account for higher-order contingencies
- No generative model has been proposed within linear rise-to-threshold models that would allow for statistical inference about parameter estimates and for model comparison
30Variability in reaction times
Hanes, Science 1996
31Prior probability and reward values in the
parietal cortex
32Prior probability and reward values in the
parietal cortex
33Prior probability and reward values in the
parietal cortex
- LIP
- Neurons appear to encode the prior probability that the movement will be reinforced across blocks
- Seems to carry information related to the instantaneous likelihood that the movement will be reinforced
- Neurons encode the value of a movement even when the sensory and motor properties are held constant
- Relative magnitude of the leftward and rightward rewards is linearly correlated with LIP firing rates
- Platt, Nature 1999
34Value-based decision making
35Value-based decision making
- Choice from several alternatives on the basis of a subjective value that is placed on them
- Rational decision making: mechanism that combines the likelihood and magnitude of a gain to determine the value of a course of action (Blaise Pascal)
- Three components
- Decision-making computation
- Valuation systems
- Modulating variables
- Rangel A., Nat Rev Neurosci. 2008
36Decision-making Computations
37Representation
- Representation of the decision problem
- Identification of the potential courses of action
- Identifying internal states and external states
38Valuation
- Different actions that are under consideration need to be assigned a value
- Values have to be reliable predictors of the benefits that result from each action
39Action selection
- Options with different values need to be compared in order to make a decision
- In real life, the true value of each action candidate is rarely known
- Conflict between valuation systems
- Brain assigns control to the system that has the less uncertain estimate of the true value of the action
- Habit systems should gradually take over from goal-directed systems, as the quality of estimates increases with experience
- Control assignment problem could be responsible for puzzling behaviours or decision-making pathologies (OCD)
40Outcome evaluation
- After implementing the decision, the desirability of the outcomes needs to be measured
- Activity in medial OFC could compute positive outcome valuations
41Learning
- Feedback measures are used to update the other processes to improve the quality of future decisions
- Goals
- Learn the most advantageous behaviours in every state
- Valuation systems must learn to assign values to actions that match their anticipated rewards
- Prediction error
- Measures the quality of the forecast
- Value of actions is changed by an amount that is proportional to the prediction error
- Animal learns to assign the correct value to actions
- BOLD signal in the ventral striatum correlates with prediction errors
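The prediction-error update described above can be written as a simple delta rule. This is a generic sketch, not a model taken from the slides: the learning rate of 0.1 and the function names are assumptions.

```python
def update_value(value, reward, learning_rate=0.1):
    """One learning step: move the value towards the reward by a
    fraction of the prediction error (reward minus forecast)."""
    prediction_error = reward - value
    return value + learning_rate * prediction_error, prediction_error

def learn(rewards, initial_value=0.0, learning_rate=0.1):
    """Apply the update over a sequence of rewards; return final value and all errors."""
    value, errors = initial_value, []
    for reward in rewards:
        value, delta = update_value(value, reward, learning_rate)
        errors.append(delta)
    return value, errors

value, errors = learn([1.0] * 100)
print(round(value, 4), round(errors[0], 4), round(errors[-1], 6))
```

With a constant reward, the prediction error shrinks on every trial and the learned value converges on the reward actually obtained.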
42Valuation systems
- Useful operational divisions of the valuation problem according to the style of the computations that are performed by each
- Three proposed types
- Pavlovian systems
- Habitual systems
- Goal-directed systems
43Pavlovian systems
- Assign values to a small set of behaviours that are evolutionarily appropriate responses to particular environmental stimuli
- Innate responses to specific predetermined stimuli
- Examples: preparatory behaviours, consummatory responses to a reward, avoidance behaviour in response to aversive stimuli
- Human behaviors that might be controlled by the Pavlovian system
- Overeating in the presence of food
- Obsessive-compulsive disorders
- Harvesting immediate smaller rewards at the expense of delayed larger rewards
44Pavlovian systems: Neural bases
- Pavlovian responses to negative stimuli are organized along the axis of the dorsal periaqueductal grey
- Amygdala has been shown to play a crucial role in influencing some Pavlovian responses
45Habit systems
- Can learn, through repeated training, to assign values to a large number of actions
- Habit system can also learn from observing the outcomes of actions that it did not take
- Characteristics
- Learn to assign value to stimulus-response associations on the basis of previous experience through a process of trial-and-error
- Learn to assign a value to actions that is commensurate with the expected reward
- Learn slowly because of the trial-and-error approach
- Might forecast the value of actions incorrectly immediately after a change in the action-reward contingencies
- Rely on generalizations
46Habit systems
- Dorsolateral striatum might play a crucial role in the control of habits
- Projections of dopamine neurons into this area are believed to be important for learning the values of actions
47Goal-directed systems
- Capable of computing values in novel situations and in environments with rapidly changing action-outcome contingencies
- Assigns values to actions by computing action-outcome associations and then evaluating the rewards of the different outcomes
- The value that is assigned to an action should equal the average reward to which it might lead
- Updates the value of an action as soon as the value of its outcome changes, whereas the habit system does not
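The contrast with the habit system can be made concrete with a small devaluation example. This is an illustrative sketch: the outcome names, probabilities, and the idea of representing the habit value as a cached number are assumptions.

```python
def goal_directed_value(outcome_probs, outcome_values):
    """Expected reward of an action: sum over outcomes of
    P(outcome | action) * value(outcome)."""
    return sum(p * outcome_values[outcome] for outcome, p in outcome_probs.items())

# Hypothetical associations for a single action, e.g. a lever press.
outcome_probs = {"food": 0.8, "nothing": 0.2}    # action-outcome associations
outcome_values = {"food": 1.0, "nothing": 0.0}   # outcome-value associations

# A habit-like value is learned from past experience and stored as one number.
cached_habit_value = goal_directed_value(outcome_probs, outcome_values)

# Devaluation: the outcome loses its value, for instance through satiety.
outcome_values["food"] = 0.0
value_after = goal_directed_value(outcome_probs, outcome_values)

print(cached_habit_value, value_after)
```

The recomputed value drops to zero immediately after devaluation, while the cached number stays at its old level until it is relearned through new experience.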
48How does the brain compute and compare values?
- In order to make good decisions, organisms need to assign values to actions that are commensurate with the level of reward that they generate
- Many models assume that the brain can flawlessly and instantaneously assign appropriate values to actions
- Organisms are susceptible to the influence of environmental variables that interfere with their ability to compute accurate values or select the best actions
- Damage to various neural systems may alter the way the brain assigns and compares values, contributing to psychiatric disorders
49How does the brain compute and compare values?
- Needs to store action-outcome and outcome-value associations
- Requires a large amount of accurate information to forecast values correctly
- Dorsomedial striatum might have a role in action-outcome associations
- OFC might encode outcome-value associations
50Modulating variables
- Action-outcome associations are probabilistic
- Goal-directed systems need to take into account the likelihood of the different outcomes
- Value modulators
- Factors that affect the value of an action
- Examples: riskiness and delay of payoffs, social context
- Might have different effects in each of the valuation systems
51Hypothetical model of realization of
reinforcement learning in the cortex-basal
ganglia network
δ: reward prediction error
Doya, Nat. Neurosci 2008
52Factors that affect decisions and learning
- Needs and desires
- Utility curve should reflect the decision maker's physiologic or economic needs
- Sigmoid shape with saturation and threshold
- Different desires lead to different thresholds of nonlinear valuation
- Devaluation: flattening of the curve, for instance by satiety
Doya, Nat. Neurosci 2008
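A sigmoid utility curve of the kind described above can be sketched with a logistic function. The parameterization is an assumption made for illustration: `threshold` sets the midpoint, `ceiling` the saturation level, and devaluation is modeled here simply as a lower `gain`.

```python
import math

def utility(amount, threshold, gain=1.0, ceiling=1.0):
    """Logistic utility: near zero well below threshold, saturating at ceiling."""
    return ceiling / (1.0 + math.exp(-gain * (amount - threshold)))

def marginal_gain(gain):
    """Utility gained by going from 4 to 6 units around a threshold of 5."""
    return utility(6.0, 5.0, gain) - utility(4.0, 5.0, gain)

print(round(marginal_gain(gain=1.0), 3))  # hungry: steep curve
print(round(marginal_gain(gain=0.2), 3))  # sated: flattened curve
```

With the flatter curve, the same extra amount adds much less utility, which is one way to express that a sated decision maker cares less about the reward.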
53Factors that affect decisions and learning
- Risk and uncertainty
- Buying insurance is supposed to be a rational behaviour, even though it leads to loss on average. The main reason for buying insurance is to improve the value of the worst-case outcome.
- Min-max evaluation: minimize the maximal punishment or maximize the minimal possible reward
- Types of uncertainty arise from
- the stochasticity inherent in the environmental dynamics
- unexpected variation of the environment
- limited knowledge
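The insurance argument can be checked with a small calculation that contrasts expected value with worst-case (maximin) evaluation. The probabilities, loss, and premium below are invented numbers, and the function names are hypothetical.

```python
def expected_value(outcomes):
    """outcomes: list of (probability, payoff) pairs."""
    return sum(p * payoff for p, payoff in outcomes)

def worst_case(outcomes):
    """Smallest payoff that can occur with non-zero probability."""
    return min(payoff for p, payoff in outcomes if p > 0)

def maximin_choice(options):
    """Pick the option whose worst-case payoff is largest."""
    return max(options, key=lambda name: worst_case(options[name]))

options = {
    "uninsured": [(0.99, 0.0), (0.01, -10000.0)],  # rare but large loss
    "insured": [(0.99, -150.0), (0.01, -150.0)],   # premium paid either way
}

for name, outcomes in options.items():
    print(name, round(expected_value(outcomes), 2), worst_case(outcomes))
print(maximin_choice(options))
```

Insurance has the lower expected value in this example but the much better worst case, so a maximin evaluation selects it.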
54Factors that affect decisions and learning
- Time spent and time remaining
- How fast should one learn from new experiences, and how stably should old knowledge be retained?
- Constant environment: start with rapid memory updating and then decay the learning rate as an inverse of the number of experiences
- Changing environment: learning rate should depend on the estimate of the time for which the past experiences remain valid
- Exclusiveness of commitment
- In deciding between actions with less than average rewards, an action with smaller reward in shorter time can be more appropriate because it allows moving on to the next action earlier
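The two learning-rate schedules can be compared in a short sketch. A learning rate of 1/n turns the update into a running average of all experiences, whereas a constant rate weights recent experiences more. The constant rate of 0.3 and the example data are assumptions.

```python
def running_estimate(samples):
    """Learning rate 1/n: the estimate equals the mean of all samples so far."""
    estimate = 0.0
    for n, x in enumerate(samples, start=1):
        estimate += (1.0 / n) * (x - estimate)
    return estimate

def constant_rate_estimate(samples, learning_rate=0.3):
    """Constant learning rate: old samples are forgotten exponentially."""
    estimate = 0.0
    for x in samples:
        estimate += learning_rate * (x - estimate)
    return estimate

# Environment that changes: reward is 0 for 50 trials, then 1 for 10 trials.
changing = [0.0] * 50 + [1.0] * 10
print(round(running_estimate(changing), 3))
print(round(constant_rate_estimate(changing), 3))
```

In the changing environment the 1/n schedule is still dominated by outdated experiences, while the constant rate has almost caught up with the new reward level. In a constant environment the 1/n schedule gives the less noisy estimate.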
55Time discounting
- Goal-directed and habitual systems assign lower values to delayed rewards than to immediate ones
- Incorporation of the timing of rewards
- Dual-process models: interaction of at least two different neural valuation systems, one with a low and one with a high discount rate
- Single valuation system that discounts future rewards either exponentially or hyperbolically
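The two discounting forms named above can be compared directly. The sketch uses the standard exponential form, reward times exp(-rate × delay), and the standard hyperbolic form, reward / (1 + rate × delay). The rewards, delays, and rates are arbitrary illustrative values.

```python
import math

def exponential_discount(reward, delay, rate):
    """Value falls by a constant proportion per unit of delay."""
    return reward * math.exp(-rate * delay)

def hyperbolic_discount(reward, delay, rate):
    """Value falls steeply at short delays and more slowly at long delays."""
    return reward / (1.0 + rate * delay)

def prefers_small(discount, rate, extra_delay):
    """Small reward (50) now versus large reward (100) ten time units later,
    with both options pushed into the future by extra_delay."""
    small = discount(50.0, 0.0 + extra_delay, rate)
    large = discount(100.0, 10.0 + extra_delay, rate)
    return small > large

print(prefers_small(hyperbolic_discount, 0.5, 0), prefers_small(hyperbolic_discount, 0.5, 20))
print(prefers_small(exponential_discount, 0.1, 0), prefers_small(exponential_discount, 0.1, 20))
```

With hyperbolic discounting the preference reverses when both options are moved further into the future, whereas exponential discounting keeps the same preference at every common delay.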
56Neural Basis of Modulating Variables
- Expectation of a high reward motivates subjects to choose actions despite a large cost, for which dopamine in the ACC is important
- Uncertainty of action outcomes can promote risk taking and exploratory choices, in which norepinephrine and OFC seem to be involved
- Predictable environments promote consideration of longer-delayed rewards, for which serotonin, dorsal striatum, and dorsal prefrontal cortex are key
57Neural substrates modulating decision making
DLPFC: dorsolateral prefrontal cortex; VS: ventral striatum; DS: dorsal striatum
Doya, Nat. Neurosci 2008
58Potential directions
- Psychiatry: disorders involve failure of decision-making processes
- Legal: evaluation of full command of decision-making faculties
- Failures of self-control: addiction, obesity
- Economy: effects of marketing on decisions
- Artificial intelligence: which features of decision-making should be imitated
- Personal: training individuals to become better decision makers