1
Models of decision-making
  • Sommerakademie St. Johann
  • 8. September 2009
  • Konstantin Stark
  • TU München

2
Overview
  • Visual-Saccadic Decision Making
  • Random Dot Motion task
  • Rise-to-threshold-accumulator models
  • Random-walk
  • Linear rise-to-threshold
  • Value-based Decision Making
  • Decision-making computation
  • Valuation systems
  • Modulating variables

3
Visual-Saccadic Decision Making
4
The Neurobiology of Visual-Saccadic Decision
Making
  • System that is becoming a model for understanding
    decision making in general
  • How are decisions, the computational events that
    connect sensory data and a stored representation
    of the structure of the world with behavior,
    accomplished at a mechanistic level?
  • Decision making: the connection between sensation
    and action
  • Within the material body or the nonmaterial soul?
  • Reflexes as a simple model: a stereotyped sensory
    event triggers a fixed motor response
  • How a visual stimulus might be used to trigger an
    eye movement
  • Glimcher, Annu. Rev. Neurosci. 2003

5
The Neurobiology of Visual-Saccadic Decision
Making
Descartes, 1664
6
Random Dot Motion task
  • Monkeys are shown a patch of moving dots
  • Most of the dots are moving randomly, but a
    fraction of them has a coherent direction of
    motion
  • The animal has to indicate by an eye movement
    which of two targets reflects the coherent motion
  • Subjects are rewarded a fixed amount for correct
    answers and nothing for incorrect answers
  • Newsome, Nature 1989

7
Random Dot Motion task
8
Random Dot Motion task: Neural Basis
  • The likelihood that the monkey would make a
    rightward saccade is a function of the firing
    rate of the rightward motion-preferring neurons
    in area MT (middle temporal area)
  • MT neurons respond strongly to high levels of
    correlated dot motion in their preferred
    direction
  • Electrical activation of neurons in area MT could
    alter the probability that the animals would
    produce a particular saccade
  • Suggests that the animals' movements could be
    predicted from activity in area MT
  • Newsome, Nature 1989
  • Salzman, Nature 1990

9
Concrete and quantitative description of a
complete neural circuit within the primate brain
that could model a simple decision that was
driven by sensory data
Shadlen, PNAS 1996
10
Decision-making circuit in the visual saccadic
system
  • Neurons of the visual cortices appear to act as
    a receptor system
  • Sensory signal is passed to the parietal cortex,
    which combines information across time
  • Signal is passed to the frontal eye field
  • A movement occurs when this network produces
    activity in frontal eye field neurons that
    crosses a biophysical threshold
  • Glimcher, Annu. Rev. Neurosci. 2003
  • Shadlen, J. Neurophysiol. 2001

11
Predicted properties of the integrative element
  • Rightward integrators would show a gradual
    increase in activity during the motion stimulus
    if the monkey would decide to make a rightward
    movement
  • The higher the fraction of dots moving rightward,
    the faster the activity should grow within the
    rightward integrative element
  • If there is no net motion signal, the integrative
    element should still predict the upcoming decision

Shadlen, PNAS 1996
12
Lateral intraparietal area
  • Likely site of the integrative element
  • Receives projections from extrastriate visual
    areas like MT
  • Projects to the frontal eye field
  • Neurons were shown to be most active before
    movements having particular amplitudes and
    directions
  • Neurons expressed a prestimulus bias toward one
    direction when the dot motion had 0 correlation,
    and this bias correlated with the behavior of
    the animals

13
RDM vs value driven choice
  • Subjects care about the right choice because it
    affects the reward they get
  • The value of making an eye movement = size of the
    reward × probability of correct direction
  • Subjects can estimate the action value using
    their senses to measure the net direction of
    motion in the stimulus display
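
The product rule on this slide can be written out as a minimal sketch. The function name, the reward size and the probabilities below are illustrative assumptions, not values from the presentation.

```python
def action_value(reward_size: float, p_correct: float) -> float:
    """Value of a saccade: reward size times the probability that its direction is correct."""
    if not 0.0 <= p_correct <= 1.0:
        raise ValueError("p_correct must lie in [0, 1]")
    return reward_size * p_correct


# Hypothetical trial: one unit of reward, and the motion evidence suggests
# the rightward target is correct with probability 0.8.
values = {
    "right": action_value(1.0, 0.8),
    "left": action_value(1.0, 0.2),
}
best = max(values, key=values.get)  # target with the higher action value
```

Because the reward is fixed in the RDM task, ranking targets by value is the same as ranking them by the estimated probability of being correct.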

14
Models of perceptual decision making
  • Provide a unified theory of value computation,
    action selection, and reaction time
  • Make stark behavioural predictions about how the
    psychometric choice function and the reaction
    time distribution should be affected by the
    coherence of the stimulus
  • Make predictions about which decision variables
    should be encoded in the brain and how they are
    computed
  • Contain processes that encode the
    moment-to-moment sensory evidence in support of
    the potential state of the world
  • Contain an accumulator process for each of the two
    states that integrates the cumulative sensory
    evidence received in its favor
  • Contain a criterion that needs to be satisfied
    for a choice to be made
  • Limitation: only apply to cases with two
    alternatives

15
Rise-to-threshold-accumulator models
  • These models aim to provide a computational
    abstraction of a biophysically conceivable
    mechanism that explains saccade latencies and
    their variability across trials
  • In saccadic decision making with a fixed set of
    potential target locations, the model assumes
    that subjects maintain a set of hypotheses each
    of which corresponds to one such location
  • As the stimulus appears, a measure of evidence
    for each of these hypotheses is continuously
    refined, implemented as a competition between
    alternative decision signals in the brain
  • Gold, Trends in Cognitive Sciences 2001
  • Gold, Neuron 2002
  • Glimcher, Annual Review of Neuroscience 2003

16
Rise-to-threshold-accumulator models
  • Depending on the way in which information is
    assumed to be accumulated over time, two models
    are distinguished
  • Random-walk
  • Linear rise-to-threshold

17
Random-walk model: Properties
  • It predicts
  • a logistic relationship between choice and the
    extent to which the dots are moving in a coherent
    direction
  • that the distribution of reaction times is
    right-skewed, and that reaction times are longer
    for error trials
  • that reaction times decrease with stimulus
    coherence and that accuracy improves
  • It implements a sequential probability ratio test
  • Key variable is the measure of relative evidence
  • Positive values of this variable denote that the
    estimation process favors the rightward decision
  • Process starts at a midline and stops when the
    variable crosses a threshold
  • Size of every step is given by a Gaussian
    distribution with a mean that is proportional to
    the true direction of motion
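
A minimal simulation sketch of the process described above: the variable starts at the midline, takes Gaussian steps whose mean is proportional to the signed motion strength, and stops at the first threshold crossing. The gain `k`, the noise `sigma`, the threshold and the sign convention (positive coherence means rightward motion) are assumptions chosen for illustration.

```python
import random


def random_walk_trial(coherence, k=0.5, sigma=1.0, threshold=10.0,
                      rng=None, max_steps=100_000):
    """Simulate one trial; return (choice, number_of_steps)."""
    rng = rng or random.Random()
    x = 0.0  # relative evidence starts at the midline
    for step in range(1, max_steps + 1):
        # Gaussian step whose mean is proportional to the signed coherence
        x += rng.gauss(k * coherence, sigma)
        if x >= threshold:
            return "right", step
        if x <= -threshold:
            return "left", step
    return None, max_steps  # no threshold crossing within max_steps


def simulate(coherence, n_trials=1000, seed=0):
    """Return (fraction of rightward choices, mean number of steps)."""
    rng = random.Random(seed)
    results = [random_walk_trial(coherence, rng=rng) for _ in range(n_trials)]
    p_right = sum(choice == "right" for choice, _ in results) / n_trials
    mean_steps = sum(steps for _, steps in results) / n_trials
    return p_right, mean_steps


if __name__ == "__main__":
    for c in (0.0, 0.1, 0.5):
        print(c, simulate(c))
```

With these settings, stronger rightward coherence should give more rightward choices and fewer steps to threshold, which matches the qualitative predictions listed on the slide.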

18
Random-walk models
  • Based on a sequential probability ratio test that
    is being carried out continually
  • Each new incoming piece of sensory evidence
    either increases or decreases a single decision
    variable until it has drifted beyond a threshold
    associated with the saccadic movement towards a
    particular target
  • The decision variable represents the relative
    evidence for the two alternatives
  • Ratcliff, Psychological Review 1999
  • Ratcliff, Psychological Review 2004
  • Ratcliff, Psychological Science 1998
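
The sequential probability ratio test can be sketched as follows, assuming (for illustration only) that each evidence sample is Gaussian with mean `+mu` under rightward motion and `-mu` under leftward motion. The decision variable is then the accumulated log-likelihood ratio, and the two thresholds follow Wald's approximation for the chosen error rates `alpha` and `beta`. All parameter values are hypothetical.

```python
import math


def sprt(samples, mu=0.2, sigma=1.0, alpha=0.05, beta=0.05):
    """Sequential probability ratio test between 'right' (+mu) and 'left' (-mu).

    Returns (decision, number_of_samples_used).
    """
    upper = math.log((1 - beta) / alpha)   # crossing it selects 'right'
    lower = math.log(beta / (1 - alpha))   # crossing it selects 'left'
    llr = 0.0  # accumulated log-likelihood ratio (relative evidence)
    n = 0
    for n, x in enumerate(samples, start=1):
        # log N(x; +mu, sigma) - log N(x; -mu, sigma) = 2 * mu * x / sigma**2
        llr += 2.0 * mu * x / sigma ** 2
        if llr >= upper:
            return "right", n
        if llr <= lower:
            return "left", n
    return "undecided", n


decision, n_used = sprt([1.0] * 100)  # constant rightward evidence
```

Each sample either raises or lowers the single decision variable, and the test stops as soon as one threshold is crossed, as described in the bullets above.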

19
Random-walk model
Smith, Trends Neurosci. 2004
20
Random-walk model: Behavioral evidence
  • In a human version of the task, the results are
    as predicted by the model
  • Choices are a logistic function of stimulus
    strength
  • Reaction times decrease with stimulus strength
  • Response times are longer in error than in
    correct trials
  • Palmer, J. Vision 2005
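
For an idealized continuous diffusion with unit noise, drift `k * coherence` and symmetric bounds at `±bound`, the choice and mean decision-time curves have simple closed forms. The sketch below uses those textbook expressions; whether they match the exact parameterization fitted in the cited study is an assumption, and the parameter values in the example are hypothetical.

```python
import math


def p_correct(coherence, k, bound):
    """Logistic psychometric function of the symmetric diffusion model."""
    return 1.0 / (1.0 + math.exp(-2.0 * bound * k * coherence))


def mean_decision_time(coherence, k, bound):
    """Mean time to reach either bound (excludes sensory and motor delays)."""
    drift = k * coherence
    if drift == 0.0:
        return bound ** 2  # limit of the expression below as drift -> 0
    return (bound / drift) * math.tanh(bound * drift)


if __name__ == "__main__":
    for c in (0.0, 0.032, 0.064, 0.128, 0.256, 0.512):
        print(c, round(p_correct(c, 10.0, 1.0), 3),
              round(mean_decision_time(c, 10.0, 1.0), 3))
```

Accuracy rises logistically and decision time falls as stimulus strength increases, which are the first two behavioural results listed above.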

21
Random-walk model: Neurobiological evidence
Smith, Trends Neurosci. 2004
22
Random-walk model: Neurobiological evidence
  • Sensory process
  • Set of processes capturing the incoming sensory
    evidence
  • Direction of motion encoded in middle temporal
    area (MT)
  • Encodes the motion properties of objects in the
    visual field
  • Organized in response fields that respond
    preferentially to stimuli moving in a certain
    direction
  • Born, Annu. Rev. Neurosci. 2005
  • Britten, Vis. Neurosci. 1993

23
Random-walk model: Neurobiological evidence
  • Accumulator signal
  • Measuring the amount of net evidence in favor of
    the two alternatives
  • Lateral intraparietal area (LIP)
  • Activity resembles the accumulation of sensory
    signal predicted by the model
  • Organized in response fields that respond to eye
    movement in a particular direction
  • For trials in which a choice is made into the
    response field, firing rates increase with
    stimulus coherence
  • When a choice is made into a response field, all
    of the time-courses rise to a single threshold
    about 100ms before the initiation of the saccade
  • Shadlen, PNAS 1999
  • Shadlen, J. Neurophys. 2001

24
Lateral intraparietal area
25
Random-walk model: Neurobiological evidence
  • Lateral intraparietal area (LIP)
  • Microstimulation of LIP neurons can bias the
    proportion of choices that are made towards the
    stimulated response field
  • Stimulation generated faster reaction times when
    movements were made into the neurons' response
    field
  • Hanks, Nat. Neurosci. 2006

26
Random-walk model: Neurobiological evidence
  • Neural implementation of threshold
  • LIP neurons exhibit threshold-like behaviour
  • Time pressure leads to a decrease in the size of
    the threshold (Smith, Trends Neurosci. 2004)
  • Natural candidate would be the superior
    colliculus
  • Receives visual input from LIP and other cortical
    areas
  • Sends motor signals to the brainstem and spinal
    cord
  • Contains burst neurons that fire just before a
    saccade is initiated
  • Lo and Wang, Nat. Neurosci. 2006

27
LATER: linear approach to threshold with ergodic
rate
  • Randomness is introduced as trial-by-trial
    changes in the otherwise constant rate of rise of
    the decision signal
  • Saccade towards a target is elicited as soon as a
    neural decision signal has reached a particular
    threshold
  • Assumes a fixed threshold and a linear increase
    whose rate is subject to variation across trials,
    yet fixed within a given trial
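
A minimal sketch of a LATER-style trial, under the assumption that the rate of rise is drawn from a Gaussian once per trial and that latency is the distance from baseline to threshold divided by that rate. The parameter values are hypothetical, and redrawing non-positive rates is a simplification made here so that every trial yields a finite latency.

```python
import random
import statistics


def later_latency(rng, threshold=1.0, baseline=0.0, mean_rate=5.0, sd_rate=1.0):
    """Latency of one trial: linear rise from baseline to a fixed threshold."""
    rate = rng.gauss(mean_rate, sd_rate)  # varies across trials, fixed within a trial
    while rate <= 0.0:
        rate = rng.gauss(mean_rate, sd_rate)  # simplification: redraw non-positive rates
    return (threshold - baseline) / rate


rng = random.Random(0)
latencies = [later_latency(rng) for _ in range(5000)]
mean_latency = statistics.mean(latencies)
median_latency = statistics.median(latencies)
```

Because latency is proportional to the reciprocal of a roughly Gaussian rate, the simulated latency distribution is skewed towards long values even though the rate distribution itself is symmetric.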

28
LATER
  • Threshold for saccade release seemed to be
    constant, whereas the slope of the rise in
    activity varied considerably across trials
  • Observed saccadic latency was a function of the
    log probability of the corresponding target
    location
  • The more likely the target location, the shorter
    the latency
  • Assumes that learned a priori target
    probabilities determine the baseline levels of
    the decision signals, but not their rates of rise
  • Carpenter, Nature 1995
  • Reddi, Nat. Neurosci. 2000
  • Reddi, Journal of Neurophysiology 2003
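
One way to see how a baseline shift produces a latency that depends on log probability is the sketch below. It assumes, purely for illustration, that the baseline is an affine function of the log prior (`base_offset + base_gain * log(prior)`) while threshold and rate stay fixed; these names and values are hypothetical and not taken from the cited papers.

```python
import math


def expected_latency(prior, threshold=1.0, rate=5.0,
                     base_gain=0.1, base_offset=0.5):
    """Latency when the prior sets the baseline but not the rate of rise."""
    baseline = base_offset + base_gain * math.log(prior)  # assumed mapping
    return (threshold - baseline) / rate


if __name__ == "__main__":
    for p in (0.05, 0.25, 0.5, 0.95):
        print(p, round(expected_latency(p), 4))
```

Under this assumption the latency is a linear, decreasing function of the log prior, so more likely target locations give shorter latencies.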

29
LATER: Limitations
  • Has only been applied to saccade-to-target
    situations where no learning took place
  • During learning, the baseline levels of decision
    signals are expected to change across trials
  • Only accounts for simple marginal probabilities
  • Does not account for higher-order contingencies
  • No generative model has been proposed within
    linear rise to threshold models that would allow
    for statistical inference about parameter
    estimates and for model comparison

30
Variability in reaction times
Hanes, Science 1996
31
Prior probability and reward values in the
parietal cortex
32
Prior probability and reward values in the
parietal cortex
33
Prior probability and reward values in the
parietal cortex
  • LIP
  • Neurons appear to encode the prior probability
    that the movement will be reinforced across
    blocks
  • Seems to carry information related to the
    instantaneous likelihood that the movement will
    be reinforced
  • Neurons encode the value of a movement even when
    the sensory and motor properties are held
    constant
  • Relative magnitude of the leftward and rightward
    rewards are linearly correlated with LIP firing
    rates
  • Platt, Nature 1999

34
Value-based decision making
35
Value-based decision making
  • Choice from several alternatives on the basis of
    a subjective value that is placed on them
  • Rational decision making mechanism that combines
    the likelihood and magnitude of a gain to
    determine the value of a course of action (Blaise
    Pascal)
  • Three components
  • Decision-making computation
  • Valuation systems
  • Modulating variables
  • Rangel A., Nat Rev Neurosci. 2008

36
Decision-making Computations
37
Representation
  • Representation of the decision problem
  • Identification of the potential courses of action
  • Identifying internal states and external states

38
Valuation
  • Different actions that are under consideration
    need to be assigned a value
  • Values have to be reliable predictors of the
    benefits that result from each action

39
Action selection
  • Options with different values need to be compared
    in order to make a decision
  • In real life, the true value of each action
    candidate is rarely known
  • Conflict between valuation systems
  • Brain assigns control to the system that has the
    least uncertain estimate of the true value of the
    action
  • Habit systems should gradually take over from
    goal-directed systems, as the quality of
    estimates increases with experience
  • Control assignment problem could be responsible
    for puzzling behaviours or decision-making
    pathologies (OCD)
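
The uncertainty-based control assignment above can be sketched as a simple comparison. The `Estimate` class, the `1 / (1 + n)` shrinkage of habit uncertainty with experience, and all numbers are hypothetical choices made to illustrate the idea, not a model from the presentation.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    value: float        # estimated value of the action
    uncertainty: float  # e.g. variance of that estimate


def select_controller(habit: Estimate, goal_directed: Estimate) -> str:
    """Give control to the system with the less uncertain value estimate."""
    return "habit" if habit.uncertainty < goal_directed.uncertainty else "goal-directed"


def habit_uncertainty(n_experiences: int, initial: float = 1.0) -> float:
    """Assumed shrinkage of habit uncertainty with experience."""
    return initial / (1 + n_experiences)


goal = Estimate(value=1.0, uncertainty=0.2)
early = select_controller(Estimate(1.0, habit_uncertainty(1)), goal)
late = select_controller(Estimate(1.0, habit_uncertainty(20)), goal)
```

With little experience the goal-directed estimate is the less uncertain one; after enough repetitions the habit estimate takes over, mirroring the gradual handover described on the slide.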

40
Outcome evaluation
  • After implementing the decision, the desirability
    of the outcomes needs to be measured
  • Activity in medial OFC could compute positive
    outcome valuations

41
Learning
  • Feedback measures are used to update the other
    processes to improve the quality of future
    decisions
  • Goals
  • most advantageous behaviours in every state
  • Valuation systems must learn to assign values to
    actions that match their anticipated rewards
  • Prediction error
  • measures the quality of the forecast
  • Value of actions is changed by an amount that is
    proportional to the prediction error
  • Animal learns to assign the correct value to
    actions
  • BOLD signal in the ventral striatum correlates
    with prediction errors
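
The update rule stated above (change the value by an amount proportional to the prediction error) can be sketched in a few lines. The learning rate of 0.1 and the constant reward of 1.0 are illustrative assumptions.

```python
def update_value(value, reward, learning_rate=0.1):
    """Return (new_value, prediction_error) after one experienced outcome."""
    prediction_error = reward - value  # how far off the forecast was
    new_value = value + learning_rate * prediction_error
    return new_value, prediction_error


value = 0.0
for _ in range(50):
    # Hypothetical environment: the action always yields a reward of 1.0
    value, delta = update_value(value, reward=1.0)
```

As the value estimate approaches the true reward, the prediction error shrinks towards zero and the updates become smaller.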

42
Valuation systems
  • Useful operational divisions of the valuation
    problem according to the style of the
    computations that are performed by each
  • Three proposed types
  • Pavlovian systems
  • Habitual systems
  • Goal-directed systems

43
Pavlovian systems
  • Assign values to a small set of behaviours that
    are evolutionary appropriate responses to
    particular environmental stimuli
  • Innate responses to specific predetermined
    stimuli
  • Examples: preparatory behaviours, consummatory
    responses to a reward, aversive stimuli leading
    to avoidance behaviour
  • Human behaviors that might be controlled by the
    Pavlovian system
  • Overeating in the presence of food
  • Obsessive-compulsive disorders
  • Harvesting immediate smaller rewards at the
    expense of delayed larger rewards

44
Pavlovian systems: Neural bases
  • Pavlovian response to negative stimuli along the
    axis of the dorsal periaqueductal grey
  • Amygdala has been shown to play a crucial role in
    influencing some Pavlovian responses

45
Habit systems
  • Can learn, through repeated training, to assign
    values to a large number of actions
  • Habit system can also learn from observing the
    outcomes of actions that it did not take
  • Characteristics
  • Learn to assign value to stimulus-response
    associations on the basis of previous experience
    through a process of trial-and-error
  • Learn to assign a value to actions that is
    commensurate with the expected reward
  • Learn slowly because of the trial-and-error approach
  • Might forecast the value of actions incorrectly
    immediately after a change in the action-reward
    contingencies
  • Rely on generalizations

46
Habit systems
  • Dorsolateral striatum might play a crucial role
    in the control of habits
  • Projections of dopamine neurons into this area
    are believed to be important for learning the
    values of actions

47
Goal-directed systems
  • Capable of computing values in novel situations
    and in environments with rapidly changing
    action-outcome contingencies
  • Assigns values to actions by computing
    action-outcome associations and then evaluating
    the rewards of the different outcomes
  • The value that is assigned to an action should
    equal the average reward to which it might lead
  • Updates the value of an action as soon as the
    value of its outcome changes, whereas the habit
    system does not
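
A minimal sketch of the contrast drawn above, assuming a single action with two possible outcomes. The action and outcome names, the probabilities and the rewards are hypothetical, and the habit system is represented simply as a cached number that is assumed to have converged before the outcome is devalued.

```python
def goal_directed_value(action, transitions, outcome_values):
    """Average reward of an action: sum of P(outcome | action) * value(outcome)."""
    return sum(p * outcome_values[outcome]
               for outcome, p in transitions[action].items())


# Hypothetical action-outcome and outcome-value associations
transitions = {"press_lever": {"food": 0.8, "nothing": 0.2}}
outcome_values = {"food": 1.0, "nothing": 0.0}

# Assume the habit system has already learned (cached) the same value
cached_habit_value = goal_directed_value("press_lever", transitions, outcome_values)

# The outcome is devalued, e.g. by satiety
outcome_values["food"] = 0.0
new_goal_value = goal_directed_value("press_lever", transitions, outcome_values)
# cached_habit_value is unchanged until the habit system relearns by trial and error
```

The goal-directed value drops as soon as the outcome value changes, whereas the cached habit value stays at its old level.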

48
How does the brain compute and compare values?
  • In order to make good decisions, organisms need
    to assign values to actions that are commensurate
    with the level of reward that they generate
  • Many models assume that the brain can flawlessly
    and instantaneously assign appropriate values to
    actions
  • Organisms are susceptible to the influence of
    environmental variables that interfere with their
    ability to compute accurate values or select the
    best actions
  • Damage to various neural systems may alter the
    way the brain assigns and compares values
    contributing to psychiatric disorders

49
How does the brain compute and compare values?
  • Needs to store action-outcome and outcome-value
    associations
  • Requires a large amount of accurate information
    to forecast values correctly
  • Dorsomedial striatum might have a role in
    action-outcome associations
  • OFC might encode outcome-value associations

50
Modulating variables
  • Action-outcome associations are probabilistic
  • Goal-directed systems need to take into account
    the likelihood of the different outcomes
  • Value modulators
  • Factors, which affect the value of an action
  • Examples: riskiness and delay of
    payoffs, social context
  • Might have different effects in each of the
    valuation systems

51
Hypothetical model of realization of
reinforcement learning in the cortex-basal
ganglia network
δ: reward prediction error
Doya, Nat. Neurosci 2008
52
Factors that affect decisions and learning
  • Needs and desires
  • Utility curve should reflect the decision maker's
    physiologic or economic needs
  • Sigmoid shape with saturation and threshold
  • Different desires lead to different thresholds of
    nonlinear valuation
  • Devaluation: flattening of the curve, for instance
    by satiety

Doya, Nat. Neurosci 2008
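
A sigmoid utility curve with a threshold and a saturation level can be sketched with a logistic function. The logistic form, the parameter names and the choice to model devaluation as a reduced slope are assumptions made for illustration; the slide only specifies the qualitative shape.

```python
import math


def utility(amount, threshold=5.0, slope=1.0, saturation=1.0):
    """Sigmoid utility: small below the threshold, levelling off at saturation."""
    return saturation / (1.0 + math.exp(-slope * (amount - threshold)))


# Gain in utility from 3 to 7 units of reward, before and after devaluation
gain_normal = utility(7.0) - utility(3.0)
gain_sated = utility(7.0, slope=0.2) - utility(3.0, slope=0.2)  # flatter curve
```

A flatter curve means that the same increase in reward produces a smaller increase in utility, which is one way to read the satiety example.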
53
Factors that affect decisions and learning
  • Risk and uncertainty
  • Buying insurance is regarded as rational
    behaviour, even though it leads to a loss on
    average. The main reason for buying insurance is
    to improve the value of the worst-case outcome.
  • Min-max evaluation: minimize the maximal
    punishment or maximize the minimal possible
    reward
  • Types of uncertainty arise from
  • the stochasticity inherent in the environmental
    dynamics
  • Unexpected variation of the environment
  • Limited knowledge
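
The insurance example can be made concrete with a small comparison of expected-value and worst-case (maximin) evaluation. The premium, the loss size and the loss probability are hypothetical numbers chosen so that insurance loses on average but improves the worst case.

```python
# Each option is a list of (probability, payoff) pairs; values are hypothetical.
options = {
    "uninsured": [(0.99, 0.0), (0.01, -10_000.0)],
    "insured": [(1.0, -150.0)],  # the premium is paid in every case
}


def expected_value(lottery):
    """Probability-weighted average payoff."""
    return sum(p * x for p, x in lottery)


def worst_case(lottery):
    """Smallest payoff that can occur with non-zero probability."""
    return min(x for p, x in lottery if p > 0)


by_mean = max(options, key=lambda name: expected_value(options[name]))
by_maximin = max(options, key=lambda name: worst_case(options[name]))
```

Here the expected-value rule prefers staying uninsured, while the maximin rule prefers insurance because it maximizes the minimal possible payoff.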

54
Factors that affect decisions and learning
  • Time spent and time remaining
  • How fast should one learn from new experiences
    and how stably should old knowledge be retained
  • Constant environment: start with rapid memory
    updating and then decay the learning rate as the
    inverse of the number of experiences
  • Changing environment: the learning rate should
    depend on the estimate of the time for which past
    experiences remain valid
  • Exclusiveness of commitment
  • In deciding between actions with less than average
    rewards, an action with a smaller reward in a shorter
    time can be more appropriate because it allows
    moving on to the next action earlier
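
The two learning-rate schedules mentioned above can be compared in a short sketch. A learning rate of `1 / n` reproduces the running average of all experiences, while a constant learning rate weights recent experiences more heavily. The reward sequence and the constant rate of 0.2 are illustrative assumptions.

```python
def running_estimate(rewards, learning_rate=None):
    """Track a reward estimate; learning_rate=None uses the 1/n schedule."""
    estimate = 0.0
    for n, reward in enumerate(rewards, start=1):
        alpha = 1.0 / n if learning_rate is None else learning_rate
        estimate += alpha * (reward - estimate)
    return estimate


# Hypothetical environment whose reward switches from 0 to 1 halfway through
rewards = [0.0] * 50 + [1.0] * 50
stable = running_estimate(rewards)          # 1/n schedule: average of all trials
adaptive = running_estimate(rewards, 0.2)   # constant rate: tracks the recent level
```

In a constant environment the `1 / n` schedule is the natural choice; after a change, the constant-rate estimate follows the new reward level while the running average lags behind.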

55
Time discounting
  • Goal-directed and habitual systems assign lower
    values to delayed rewards than to immediate ones
  • Incorporation of the timing of rewards
  • Dual-process models: interaction of at least two
    different neural valuation systems, one with a
    low and one with a high discount rate
  • Single valuation system that discounts future
    rewards either exponentially or hyperbolically
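
The two single-system discount functions can be written out as a sketch using their standard forms. The rewards, delays and discount rates in the example are hypothetical.

```python
import math


def exponential_discount(reward, delay, rate):
    """Value falls by a constant factor per unit of delay."""
    return reward * math.exp(-rate * delay)


def hyperbolic_discount(reward, delay, rate):
    """Value falls steeply at short delays and more slowly at long delays."""
    return reward / (1.0 + rate * delay)


def prefers_larger_later(discount, rate, extra_delay):
    """Compare 50 units now with 100 units in 10 time steps, both shifted by extra_delay."""
    small = discount(50.0, 0.0 + extra_delay, rate)
    large = discount(100.0, 10.0 + extra_delay, rate)
    return large > small


hyper_now = prefers_larger_later(hyperbolic_discount, 0.5, 0.0)
hyper_far = prefers_larger_later(hyperbolic_discount, 0.5, 50.0)
expo_now = prefers_larger_later(exponential_discount, 0.1, 0.0)
expo_far = prefers_larger_later(exponential_discount, 0.1, 50.0)
```

With these numbers the hyperbolic chooser switches from the smaller, sooner reward to the larger, later one when both are pushed into the future, whereas the exponential chooser makes the same choice at both horizons.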

56
Neural Basis of Modulating Variables
  • Expectation of a high reward motivates subjects
    to choose actions despite a large cost, for which
    dopamine in the ACC is important
  • Uncertainty of action outcomes can promote risk
    taking and exploratory choices, in which
    norepinephrine and OFC seem to be involved
  • Predictable environments promote consideration of
    longer delayed rewards, for which serotonin,
    dorsal striatum, and dorsal prefrontal cortex are
    key

57
Neural substrates modulating decision making
DLPFC: dorsolateral prefrontal cortex; VS: ventral
striatum; DS: dorsal striatum
Doya, Nat. Neurosci 2008
58
Potential directions
  • Psychiatry: disorders involve failure of decision
    making processes
  • Legal: evaluation of full command of decision
    making faculties
  • Failures of self-control: addiction, obesity
  • Economy: effects of marketing on decisions
  • Artificial intelligence: which features of
    decision-making should be imitated
  • Personal: training individuals to become better
    decision makers