Models of decision-making

Slides: 59

Provided by: Tino

1
Models of decision-making

Summer Academy, St. Johann
8 September 2009
Konstantin Stark
TU München

2
Overview

Visual-Saccadic Decision Making
Random Dot Motion task
Rise-to-threshold-accumulator models
Random-walk
Linear rise-to-threshold
Value-based Decision Making
Decision-making computation
Valuation systems
Modulating variables

3
Visual-Saccadic Decision Making
4
The Neurobiology of Visual-Saccadic Decision
Making

System that is becoming a model for understanding
decision making in general
How are decisions, the computational events that
connect sensory data and a stored representation
of the structure of the world with behavior,
accomplished at a mechanistic level?
Decision making: the connection between sensation
and action
Within the material body or the nonmaterial soul?
Reflexes as a simple model: a stereotyped sensory
event triggers a fixed motor response
How a visual stimulus might be used to trigger an
eye movement
Glimcher, Annu. Rev. Neurosci. 2003

5
The Neurobiology of Visual-Saccadic Decision
Making
Descartes, 1664
6
Random Dot Motion task

Monkeys are shown a patch of moving dots
Most of the dots are moving randomly, but a
fraction of them has a coherent direction of
motion
The animal has to indicate by an eye movement
which of two targets reflects the coherent motion
Subjects are rewarded a fixed amount for correct
answers and nothing for incorrect answers
Newsome, Nature 1989

7
Random Dot Motion task
8
Random Dot Motion task: Neural Basis

The likelihood that the monkey would make a
rightward saccade is a function of the firing
rate of the rightward motion-preferring neurons
in area MT (middle temporal area)
MT neurons respond strongly to high levels of
correlated dot motion in their preferred
direction
Electrical activation of neurons in area MT could
alter the probability that the animals would
produce a particular saccade
Suggests that the movements of the animals could
be predicted from activity in area MT
Newsome, Nature 1989
Salzman, Nature 1990

9
Concrete and quantitative description of a
complete neural circuit within the primate brain
that could model a simple decision that was
driven by sensory data
Shadlen, PNAS 1996
10
Decision-making circuit in the visual-saccadic
system

Neurons of the visual cortices appear to act as a
receptor system
Sensory signal is passed to the parietal cortex,
which combines information across time
Signal is passed to the frontal eye field
A movement occurs when this network produces
activity in frontal eye field neurons that
crosses a biophysical threshold
Glimcher, Annu. Rev. Neurosci. 2003
Shadlen, J. Neurophysiol. 2001

11
Predicted properties of the integrative element

Rightward integrators would show a gradual
increase in activity during the motion stimulus
if the monkey would decide to make a rightward
movement
The higher the fraction of dots moving rightward,
the faster the activity should grow within the
rightward integrative element
If there is no net motion signal, the integrative
element should still predict the upcoming decision

Shadlen, PNAS 1996
12
Lateral intraparietal area

Likely site of the integrative element
Receives projections from extrastriate visual
areas like MT
Projects to the frontal eye field
Neurons were shown to be most active before
movements having particular amplitudes and
directions
Neurons express a prestimulus bias toward one
direction when there is zero correlation of dot
motion, and this bias correlates with the
behavior of the animals

13
RDM vs. value-driven choice

Subjects care about the right choice because it
affects the reward they get
The value of making an eye movement = size of the
reward × probability of correct direction
Subjects can estimate the action value using
their senses to measure the net direction of
motion in the stimulus display
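
The value rule on this slide can be written out as a one-line computation. The following is only a minimal sketch: the function name and the example numbers (one unit of reward, a 75% vs. 25% chance of being correct) are hypothetical and not taken from the cited studies.

```python
def action_value(reward_size, p_correct):
    """Value of a saccade: reward size times the probability it is the correct direction."""
    return reward_size * p_correct

# Hypothetical trial: the motion evidence suggests "right" is correct with p = 0.75.
value_right = action_value(reward_size=1.0, p_correct=0.75)
value_left = action_value(reward_size=1.0, p_correct=0.25)

# The higher-valued movement is the one to choose.
best = "right" if value_right > value_left else "left"
print(value_right, value_left, best)
```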

14
Models of perceptual decision making

Provide a unified theory of value computation,
action selection, and reaction time
Make stark behavioural predictions about how the
psychometric choice function and the reaction
time distribution should be affected by the
coherence of the stimulus
Make predictions about which decision variables
should be encoded in the brain and how they are
computed
Contain processes that encode the
moment-to-moment sensory evidence in support of
the potential state of the world
Contain an accumulator process for each of the two
states that integrates the cumulative sensory
evidence received in its favor
Contain a criterion that needs to be satisfied
for a choice to be made
Limitation: only apply to cases with two
alternatives

15
Rise-to-threshold-accumulator models

These models aim to provide a computational
abstraction of a biophysically conceivable
mechanism that explains saccade latencies and
their variability across trials
In saccadic decision making with a fixed set of
potential target locations, the model assumes
that subjects maintain a set of hypotheses each
of which corresponds to one such location
As the stimulus appears, a measure of evidence
for each of these hypotheses is continuously
refined, implemented as a competition between
alternative decision signals in the brain
Gold, Trends in Cognitive Sciences 2001
Gold, Neuron 2002
Glimcher, Annual Review of Neuroscience 2003

16
Rise-to-threshold-accumulator models

Depending on the way in which information is
assumed to be accumulated over time, two models
are distinguished
Random-walk
Linear rise-to-threshold

17
Random-walk model: Properties

It predicts
a logistic relationship between choice and the
extent to which the dots are moving in a coherent
direction
that the distribution of reaction times is
right-skewed, and that reaction times are longer
for error trials
that reaction times decrease with stimulus
coherence and that accuracy improves
It implements a sequential probability ratio test
Key variable is the measure of relative evidence
Positive values of this variable denote that the
estimation process favors the rightward choice
Process starts at a midline and stops when the
variable crosses a threshold
Size of every step is given by a Gaussian
distribution with a mean that is proportional to
the true direction of motion
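
The mechanism described on this slide can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the fitted model from the cited papers: the parameter names and values (`gain`, `noise`, `threshold`) are hypothetical, time is counted in discrete steps, and coherence is signed (positive means rightward). It illustrates only the trends in choice and mean decision time.

```python
import random

def random_walk_trial(coherence, threshold=2.0, gain=0.2, noise=0.3,
                      rng=random, max_steps=100_000):
    """Simulate one trial; return (choice, n_steps) with choice +1 (right) or -1 (left)."""
    evidence = 0.0  # process starts at the midline
    for step in range(1, max_steps + 1):
        # Gaussian step whose mean is proportional to the signed coherence
        evidence += rng.gauss(gain * coherence, noise)
        if evidence >= threshold:
            return +1, step
        if evidence <= -threshold:
            return -1, step
    return (1 if evidence >= 0 else -1), max_steps  # safety fallback

def summarize(coherence, n_trials=2000, seed=0):
    """Return (proportion of rightward choices, mean number of steps)."""
    rng = random.Random(seed)
    trials = [random_walk_trial(coherence, rng=rng) for _ in range(n_trials)]
    p_right = sum(choice == 1 for choice, _ in trials) / n_trials
    mean_steps = sum(steps for _, steps in trials) / n_trials
    return p_right, mean_steps

for c in (0.1, 0.8):
    print(c, summarize(c))
```

With these settings, the higher coherence should give a larger share of rightward choices and fewer steps to threshold, in line with the qualitative predictions listed above.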

18
Random-walk models

Based on a sequential probability ratio test that
is being carried out continually
Each new incoming piece of sensory evidence
either increases or decreases a single decision
variable until it has drifted beyond a threshold
associated with the saccadic movement towards a
particular target
The decision variable represents the relative
evidence for the two alternatives
Ratcliff, Psychological Review 1999
Ratcliff, Psychological Review 2004
Ratcliff, Psychological Science 1998
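
A sequential probability ratio test of the kind described on this slide can be sketched as follows. The sketch assumes, purely for illustration, that each piece of sensory evidence is a Gaussian sample whose mean is `mu_right` under one hypothesis and `mu_left` under the other; the function name, parameters, and numbers are hypothetical.

```python
def sprt(samples, mu_right, mu_left, sigma, log_threshold):
    """Accumulate the log-likelihood ratio (right vs. left) until it crosses a bound.

    Returns (decision, n_used); decision is "right", "left", or None if
    the samples run out before either bound is reached.
    """
    llr = 0.0  # decision variable: relative evidence for the two alternatives
    n = 0
    for n, x in enumerate(samples, start=1):
        # log N(x; mu_right, sigma) - log N(x; mu_left, sigma)
        llr += ((x - mu_left) ** 2 - (x - mu_right) ** 2) / (2 * sigma ** 2)
        if llr >= log_threshold:
            return "right", n
        if llr <= -log_threshold:
            return "left", n
    return None, n

print(sprt([1.0] * 10, mu_right=1.0, mu_left=-1.0, sigma=1.0, log_threshold=3.0))
```

Each sample equal to 1.0 adds 2.0 to the decision variable here, so the upper bound of 3.0 is crossed on the second sample.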

19
Random-walk model
Smith, Trends Neurosci. 2004
20
Random-walk model: Behavioral evidence

In a human version of the task, the results are
as predicted by the model
Choices are a logistic function of stimulus
strength
Reaction times decrease with stimulus strength
Response times are longer in error than in
correct trials
Palmer, J. Vision 2005

21
Random-walk model: Neurobiological evidence
Smith, Trends Neurosci. 2004
22
Random-walk model: Neurobiological evidence

Sensory process
Set of processes capturing the incoming sensory
evidence
Direction of motion encoded in middle temporal
area (MT)
Encodes the motion properties of objects in the
visual field
Organized in response fields that respond
preferentially to stimuli moving in a certain
direction
Born, Annu. Rev. Neurosci. 2005
Britten, Vis. Neurosci. 1993

23
Random-walk model: Neurobiological evidence

Accumulator signal
Measuring the amount of net evidence in favor of
the two alternatives
Lateral intraparietal area (LIP)
Activity resembles the accumulation of sensory
signal predicted by the model
Organized in response fields that respond to eye
movements in a particular direction
For trials in which a choice is made into the
response field, firing rates increase with
stimulus coherence
When a choice is made into a response field, all
of the time-courses rise to a single threshold
about 100ms before the initiation of the saccade
Shadlen, PNAS 1999
Shadlen, J. Neurophys. 2001

24
Lateral intraparietal area
25
Random-walk model: Neurobiological evidence

Lateral intraparietal area (LIP)
Microstimulation of LIP neurons can bias the
proportion of choices that are made towards the
stimulated response field
Stimulation produced faster reaction times when
movements were made into the neurons' response
field
Hanks, Nat. Neurosci. 2006

26
Random-walk model: Neurobiological evidence

Neural implementation of the threshold
LIP neurons exhibit threshold-like behaviour
Time pressure leads to a decrease in the size of
the threshold
Smith, Trends Neurosci. 2004
A natural candidate would be the superior
colliculus
Receives visual input from LIP and other cortical
areas
Sends motor signals to the brainstem and spinal
cord
Contains burst neurons that fire just before a
saccade is initiated
Lo and Wang, Nat. Neurosci. 2006

27
LATER: linear approach to threshold with ergodic
rate

Randomness is introduced as trial-by-trial
changes in the otherwise constant rate of rise of
the decision signal
A saccade towards a target is elicited as soon as
a neural decision signal has reached a particular
threshold
Assumes a fixed threshold and a linear increase
whose rate is subject to variation across trials,
yet fixed within a given trial
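
The LATER assumptions listed above (fixed threshold, linear rise, rate drawn once per trial) can be sketched in a few lines. This is an illustrative sketch only: the Gaussian distribution of the rate and all parameter values are assumptions made for the example, and the time units are arbitrary.

```python
import random

def later_latency(rng, threshold=1.0, baseline=0.0, mean_rate=5.0, sd_rate=1.0):
    """One LATER trial: time for a linearly rising signal to reach a fixed threshold."""
    rate = rng.gauss(mean_rate, sd_rate)  # varies across trials, fixed within a trial
    if rate <= 0:
        return None  # the signal never reaches threshold on this trial
    return (threshold - baseline) / rate

rng = random.Random(1)
latencies = sorted(t for t in (later_latency(rng) for _ in range(5000)) if t is not None)
median_latency = latencies[len(latencies) // 2]
mean_latency = sum(latencies) / len(latencies)
print(median_latency, mean_latency)
```

Because latency is the reciprocal of a symmetric rate, the simulated latency distribution is right-skewed: its mean lies above its median, which sits near (threshold - baseline) / mean_rate.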

28
LATER

Threshold for saccade release seems to be
constant, whereas the slope of the rise in
activity varied considerably across trials
Observed saccadic latency was a function of the
log probability of the corresponding target
location
The more likely the target location, the shorter
the latency
Assumes that learned a priori target
probabilities determine the baseline levels of
the decision signals, but not their rates of rise
Carpenter, Nature 1995
Reddi, Nat. Neurosci. 2000
Reddi, Journal of Neurophysiology 2003
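
One way to see why latency would depend on log prior probability is to let the prior shift only the baseline of the decision signal. The sketch below assumes, hypothetically, that the baseline equals `k * log(prior)`; the constant `k` and the other values are illustrative, not estimates from the cited experiments.

```python
import math

def median_latency_for_prior(prior, threshold=1.0, mean_rate=5.0, k=0.1):
    """Median latency when the prior moves the baseline but not the rate of rise."""
    baseline = k * math.log(prior)  # assumed form; equals 0 when prior = 1
    return (threshold - baseline) / mean_rate

for prior in (1.0, 0.5, 0.25):
    print(prior, median_latency_for_prior(prior))
```

Under this assumption, each halving of the prior lengthens the median latency by the same amount, so latency is linear in log prior and shorter for more likely targets.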

29
LATER: Limitations

Has only been applied to saccade-to-target
situations where no learning took place
During learning, the baseline levels of decision
signals are expected to change across trials
Only accounts for simple marginal probabilities
Does not account for higher-order contingencies
No generative model has been proposed within
linear rise to threshold models that would allow
for statistical inference about parameter
estimates and for model comparison

30
Variability in reaction times
Hanes, Science 1996
31
Prior probability and reward values in the
parietal cortex
32
Prior probability and reward values in the
parietal cortex
33
Prior probability and reward values in the
parietal cortex

LIP
Neurons appear to encode the prior probability
that the movement will be reinforced across
blocks
Seems to carry information related to the
instantaneous likelihood that the movement will
be reinforced
Neurons encode the value of a movement even when
the sensory and motor properties are held
constant
Relative magnitude of the leftward and rightward
rewards is linearly correlated with LIP firing
rates
Platt, Nature 1999

34
Value-based decision making
35
Value-based decision making

Choice from several alternatives on the basis of
a subjective value that is placed on them
Rational decision making mechanism that combines
the likelihood and magnitude of a gain to
determine the value of a course of action (Blaise
Pascal)
Three components
Decision-making computation
Valuation systems
Modulating variables
Rangel A., Nat Rev Neurosci. 2008

36
Decision-making Computations
37
Representation

Representation of the decision problem
Identification of the potential courses of action
Identifying internal states and external states

38
Valuation

Different actions that are under consideration
need to be assigned a value
Values have to be reliable predictors of the
benefits that result from each action

39
Action selection

Options with different values need to be compared
in order to make a decision
In real life, the true value of each action
candidate is rarely known
Conflict between valuation systems
The brain assigns control to the system that has
the less uncertain estimate of the true value of
the action
Habit systems should gradually take over from
goal-directed systems, as the quality of
estimates increases with experience
The control assignment problem could be
responsible for puzzling behaviours or
decision-making pathologies (e.g., OCD)

40
Outcome evaluation

After implementing the decision, the desirability
of the outcomes needs to be measured
Activity in medial OFC could compute positive
outcome valuations

41
Learning

Feedback measures are used to update the other
processes to improve the quality of future
decisions
Goal: find the most advantageous behaviours in
every state
Valuation systems must learn to assign values to
actions that match their anticipated rewards
Prediction error
Measures the quality of the forecast
Value of actions is changed by an amount that is
proportional to the prediction error
Animal learns to assign the correct value to
actions
BOLD signal in the ventral striatum correlates
with prediction errors
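
The update rule on this slide (change the value by an amount proportional to the prediction error) can be sketched as below. The learning rate of 0.1 and the constant reward of 1.0 are hypothetical choices for illustration.

```python
def update_value(value, reward, learning_rate=0.1):
    """Return (new_value, prediction_error) after observing one reward."""
    prediction_error = reward - value  # how far off the forecast was
    new_value = value + learning_rate * prediction_error
    return new_value, prediction_error

value = 0.0
for trial in range(50):
    value, delta = update_value(value, reward=1.0)

print(value, delta)
```

Over repeated trials with the same reward, the value approaches the reward and the prediction error shrinks towards zero.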

42
Valuation systems

Useful operational divisions of the valuation
problem according to the style of the
computations that are performed by each
Three proposed types
Pavlovian systems
Habitual systems
Goal-directed systems

43
Pavlovian systems

Assign values to a small set of behaviours that
are evolutionarily appropriate responses to
particular environmental stimuli
Innate responses to specific predetermined
stimuli
Examples: preparatory behaviours, consummatory
responses to a reward, avoidance behaviour in
response to aversive stimuli
Human behaviours that might be controlled by the
Pavlovian system
Overeating in the presence of food
Obsessive-compulsive disorders
Harvesting immediate smaller rewards at the
expense of delayed larger rewards

44
Pavlovian systems: Neural bases

Pavlovian responses to negative stimuli are
organized along the axis of the dorsal
periaqueductal grey
Amygdala has been shown to play a crucial role in
influencing some Pavlovian responses

45
Habit systems

Can learn, through repeated training, to assign
values to a large number of actions
Habit system can also learn from observing the
outcomes of actions that it did not take
Characteristics
Learn to assign value to stimulus-response
associations on the basis of previous experience
through a process of trial-and-error
Learn to assign a value to actions that is
commensurate with the expected reward
Learn slowly because of the trial-and-error
approach
Might forecast the value of actions incorrectly
immediately after a change in the action-reward
contingencies
Rely on generalizations

46
Habit systems

Dorsolateral striatum might play a crucial role
in the control of habits
Projections of dopamine neurons into this area
are believed to be important for learning the
values of actions

47
Goal-directed systems

Capable of computing values in novel situations
and in environments with rapidly changing
action-outcome contingencies
Assigns values to actions by computing
action-outcome associations and then evaluating
the rewards of the different outcomes
The value that is assigned to an action should
equal the average reward to which it might lead
Updates the value of an action as soon as the
value of its outcome changes, whereas the habit
system does not
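
The contrast with the habit system can be made concrete with a small devaluation example. This is a sketch under assumptions: the action name, outcome probabilities, and reward values are hypothetical, and the habit system is reduced to a cached number that is not recomputed.

```python
def goal_directed_value(action, outcome_probs, outcome_rewards):
    """Average reward of an action, computed from action-outcome and outcome-value tables."""
    return sum(p * outcome_rewards[outcome]
               for outcome, p in outcome_probs[action].items())

outcome_probs = {"press_lever": {"food": 0.8, "nothing": 0.2}}
outcome_rewards = {"food": 1.0, "nothing": 0.0}

value_before = goal_directed_value("press_lever", outcome_probs, outcome_rewards)

# Habit system: a value cached from earlier experience, not recomputed on demand.
habit_value = {"press_lever": value_before}

# Devaluation (for example by satiety): the outcome is suddenly worth nothing.
outcome_rewards["food"] = 0.0
value_after = goal_directed_value("press_lever", outcome_probs, outcome_rewards)

print(value_before, value_after, habit_value["press_lever"])
```

The goal-directed value drops as soon as the outcome value changes, while the cached habit value stays at its old level until it is relearned.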

48
How does the brain compute and compare values?

In order to make good decisions, organisms need
to assign values to actions that are commensurate
with the level of reward that they generate
Many models assume that the brain can flawlessly
and instantaneously assign appropriate values to
actions
Organisms are susceptible to the influence of
environmental variables that interfere with their
ability to compute accurate values or select the
best actions
Damage to various neural systems may alter the
way the brain assigns and compares values
contributing to psychiatric disorders

49
How does the brain compute and compare values?

Needs to store action-outcome and outcome-value
associations
Requires a large amount of accurate information
to forecast values correctly
Dorsomedial striatum might have a role in
action-outcome associations
OFC might encode outcome-value associations

50
Modulating variables

Action-outcome associations are probabilistic
Goal-directed systems need to take into account
the likelihood of the different outcomes
Value modulators
Factors that affect the value of an action
Examples: riskiness and delay of
payoffs, social context
Might have different effects in each of the
valuation systems

51
Hypothetical model of realization of
reinforcement learning in the cortex-basal
ganglia network
δ: reward prediction error
Doya, Nat. Neurosci 2008
52
Factors that affect decisions and learning

Needs and desires
Utility curve should reflect the decision maker's
physiologic or economic needs
Sigmoid shape with saturation and threshold
Different desires lead to different thresholds of
nonlinear valuation
Devaluation: flattening of the curve, for instance
by satiety

Doya, Nat. Neurosci 2008
53
Factors that affect decisions and learning

Risk and uncertainty
Buying insurance is considered rational
behaviour, even though it leads to a loss on
average. The main reason for buying insurance is
to improve the value of the worst-case outcome.
Min-max evaluation: minimize the maximal
punishment or maximize the minimal possible
reward
Types of uncertainty
Stochasticity inherent in the environmental
dynamics
Unexpected variation of the environment
Limited knowledge
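
The insurance argument on this slide can be checked with a small numerical example. The probabilities and amounts below are invented for illustration; the point is only the comparison between expected value and worst-case (min-max) evaluation.

```python
def expected_value(outcomes):
    """Probability-weighted average payoff; outcomes is a list of (probability, payoff)."""
    return sum(p * payoff for p, payoff in outcomes)

def worst_case(outcomes):
    """Smallest payoff that can occur with non-zero probability."""
    return min(payoff for p, payoff in outcomes if p > 0)

# Hypothetical numbers: a 1% chance of losing 10000, or a certain premium of 150.
uninsured = [(0.99, 0.0), (0.01, -10000.0)]
insured = [(1.0, -150.0)]

print(expected_value(uninsured), expected_value(insured))
print(worst_case(uninsured), worst_case(insured))
```

Here insurance is worse on average (a loss of 150 rather than 100) but far better in the worst case, which is what a min-max evaluation favours.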

54
Factors that affect decisions and learning

Time spent and time remaining
How fast should one learn from new experiences,
and how stably should old knowledge be retained?
Constant environment: start with rapid memory
updating and then decay the learning rate as the
inverse of the number of experiences
Changing environment: the learning rate should
depend on the estimate of the time for which the
past experiences remain valid
Exclusiveness of commitment
In deciding between actions with less than average
rewards, an action with a smaller reward in a
shorter time can be more appropriate because it
allows moving on to the next action earlier
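
The learning-rate rules for constant and changing environments can be compared in a short sketch. With a learning rate of 1/n the estimate equals the running mean of all experiences; a constant rate weights recent experiences more heavily. The sample sequences and the constant rate of 0.3 are hypothetical.

```python
def running_estimate(samples, rate_fn):
    """Update an estimate towards each sample, using learning rate rate_fn(n) at step n."""
    estimate = 0.0
    for n, x in enumerate(samples, start=1):
        estimate += rate_fn(n) * (x - estimate)
    return estimate

def decaying_rate(n):
    return 1.0 / n  # inverse of the number of experiences

def constant_rate(n):
    return 0.3  # hypothetical fixed rate for a changing environment

stable = [2.0, 4.0, 6.0, 8.0]
changed = [0.0] * 20 + [1.0] * 5  # the environment switches after 20 experiences

print(running_estimate(stable, decaying_rate))
print(running_estimate(changed, decaying_rate), running_estimate(changed, constant_rate))
```

In the stable sequence the decaying rate recovers the plain average; after the switch, the constant rate tracks the new value much faster than the decaying rate does.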

55
Time discounting

Goal-directed and habitual systems assign lower
values to delayed rewards than to immediate ones
Incorporation of the timing of rewards
Dual-process models: interaction of at least two
different neural valuation systems, one with a low
and one with a high discount rate
Single valuation system that discounts future
rewards either exponentially or hyperbolically
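
The two single-system discount functions named on this slide can be written out and compared. The functional forms are the standard exponential and hyperbolic ones; the reward sizes, delays, and discount parameters are hypothetical.

```python
import math

def exponential_discount(reward, delay, rate):
    """Present value under exponential discounting."""
    return reward * math.exp(-rate * delay)

def hyperbolic_discount(reward, delay, k):
    """Present value under hyperbolic discounting."""
    return reward / (1.0 + k * delay)

# Smaller-sooner (5 units) vs. larger-later (10 units, 5 time steps later),
# evaluated now and again when both options are pushed 20 steps into the future.
for shift in (0, 20):
    hyp = (hyperbolic_discount(5, shift, k=1.0), hyperbolic_discount(10, shift + 5, k=1.0))
    exp_ = (exponential_discount(5, shift, rate=0.2), exponential_discount(10, shift + 5, rate=0.2))
    print(shift, hyp, exp_)
```

With these numbers the hyperbolic discounter prefers the smaller reward when it is immediate but the larger one when both are far away, whereas the exponential discounter's preference does not change with the common delay.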

56
Neural Basis of Modulating Variables

Expectation of a high reward motivates subjects
to choose actions despite a large cost, for which
dopamine in the ACC is important
Uncertainty of action outcomes can promote risk
taking and exploratory choices, in which
norepinephrine and OFC seem to be involved
Predictable environments promote consideration of
longer delayed rewards, for which serotonin,
dorsal striatum, and dorsal prefrontal cortex are
key

57
Neural substrates modulating decision making
DLPFC: dorsolateral prefrontal cortex; VS: ventral
striatum; DS: dorsal striatum
Doya, Nat. Neurosci. 2008
58
Potential directions