Title: Dynamics of Reward Bias Effects in Perceptual Decision Making
1. Dynamics of Reward Bias Effects in Perceptual Decision Making
- Jay McClelland and Juan Gao
- Building on Newsome and Rorie; Holmes and Feng; Usher and McClelland
2. Our Questions
- Can we trace the effect of reward bias on decision making over time?
- Can we determine what would be the optimal policy, and what constraints there are on this policy?
- Can we determine how well participants do at achieving optimality?
- Can we uncover the processing mechanisms that lead to the observed patterns of behavior?
3. Overview
- Experiment
- Results
- Optimality analysis
- Abstract dynamical model
- Mechanistic dynamical model
4. Human Experiment Examining Reward Bias Effect at Different Time Points after Target Onset
- Stimuli are rectangles shifted 1, 3, or 5 pixels left or right of fixation.
- Reward cue occurs 750 msec before stimulus.
- Small arrowhead visible for 250 msec.
- Only biased reward conditions (2 vs 1 and 1 vs 2) are considered.
- Response signal occurs at these times (msec) after stimulus onset:
- 0, 75, 150, 225, 300, 450, 600, 900, 1200, 2000
- Participant receives reward (one or two points) if response occurs within 250 msec of response signal and is correct.
- Participants were run for 15-25 sessions to provide stable data.
- Data shown are from later sets of sessions in which the biasing effect of reward appeared to be fairly stable.
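The reward rule described above can be sketched as a small scoring function. This is only an illustration: the function name, argument names, and the choice to count responses made before the response signal as unrewarded are assumptions, not details given in the slides.

```python
# Response-signal delays listed in the experiment description (msec after stimulus onset).
RESPONSE_SIGNAL_DELAYS_MS = [0, 75, 150, 225, 300, 450, 600, 900, 1200, 2000]


def points_earned(response_time_ms, signal_time_ms, correct, reward_points, window_ms=250):
    """Points for one trial: reward_points (1 or 2) if the response is correct
    and falls within window_ms after the response signal, otherwise 0.

    Assumption (not stated in the source): responses that precede the
    response signal earn nothing.
    """
    latency = response_time_ms - signal_time_ms
    on_time = 0 <= latency <= window_ms
    return reward_points if (correct and on_time) else 0
```

For example, a correct response 200 msec after the signal on a two-point trial earns 2 points, while the same response 300 msec after the signal earns none.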
5. A participant with very little reward bias
- Top panel shows probability of response giving larger reward as a function of actual response time for combinations of
- Stimulus shift (1, 3, 5 pixels)
- Reward-stimulus compatibility
- Lower panel shows data transformed to z scores, and corresponds to the theoretical construct
  [mean(x1(t) - x2(t)) + bias(t)] / sd(x1(t) - x2(t))
- where x1 represents the state of the accumulator associated with greater reward, x2 the same for lesser reward, and S (the subject) is thought to choose larger reward if x1(t) - x2(t) + bias(t) > 0.
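The z-score transform of the lower panel can be sketched as follows. If x1(t) - x2(t) is Gaussian, the probability of choosing the larger reward is the standard normal CDF evaluated at the construct above, so the inverse normal CDF of the observed probability estimates it. The function names and the clipping of probabilities of exactly 0 or 1 are assumptions added for illustration; the slides do not say how extreme proportions were handled.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, sd 1


def prob_to_z(p, n_trials=None):
    """Inverse normal CDF of P(choose larger reward).

    If n_trials is given, p is first clipped to [1/(2n), 1 - 1/(2n)] so that
    proportions of exactly 0 or 1 give finite z scores (an assumed correction).
    """
    if n_trials is not None:
        eps = 1.0 / (2 * n_trials)
        p = min(max(p, eps), 1.0 - eps)
    return _STD_NORMAL.inv_cdf(p)


def z_to_prob(z):
    """Standard normal CDF: maps (mean + bias) / sd back to a choice probability."""
    return _STD_NORMAL.cdf(z)
```

A probability of 0.5 maps to z = 0, meaning that the mean difference and the bias cancel exactly.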
6. Participants Showing Reward Bias
8. Abstract optimality analysis
9. Assumptions
- At a given time, two distributions with means mu and -mu and the same SD sigma. Choice: is x > X_c or x < X_c?
- For three difficulty levels: same SD sigma, means mu_i (i = 1, 2, 3), same X_c.
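Under these assumptions the reward-maximizing criterion can be sketched numerically. The sketch further assumes that the two stimulus directions are equally likely, that the larger reward (2 points) goes with the +mu distribution, that errors earn nothing, and that the three difficulty levels are equally frequent; none of this is spelled out in the slides. For a single difficulty level, setting the derivative of expected reward to zero gives the standard signal-detection result X_c = sigma^2 * ln(r_low / r_high) / (2 * mu). For mixed levels the sketch falls back on a grid search. All function names are hypothetical.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF


def expected_reward(criterion, mus, sigma, r_high=2.0, r_low=1.0):
    """Expected points per trial when choosing the high-reward side iff x > criterion."""
    total = 0.0
    for mu in mus:
        p_correct_high = 1.0 - PHI((criterion - mu) / sigma)  # stimulus mean is +mu
        p_correct_low = PHI((criterion + mu) / sigma)          # stimulus mean is -mu
        total += 0.5 * r_high * p_correct_high + 0.5 * r_low * p_correct_low
    return total / len(mus)


def optimal_criterion_single(mu, sigma, r_high=2.0, r_low=1.0):
    """Closed-form optimum for one difficulty level."""
    return sigma ** 2 * math.log(r_low / r_high) / (2.0 * mu)


def optimal_criterion_mixed(mus, sigma, r_high=2.0, r_low=1.0, n_grid=2001):
    """Grid search for the single criterion shared across difficulty levels.

    The optimum must lie between the smallest and largest single-level optima,
    because expected reward rises below that range and falls above it.
    """
    singles = [optimal_criterion_single(mu, sigma, r_high, r_low) for mu in mus]
    lo, hi = min(singles), max(singles)
    if hi == lo:
        return lo
    step = (hi - lo) / (n_grid - 1)
    grid = [lo + k * step for k in range(n_grid)]
    return max(grid, key=lambda c: expected_reward(c, mus, sigma, r_high, r_low))
```

With a 2:1 reward ratio the optimal criterion is negative, that is, shifted so that more of the evidence axis leads to the high-reward choice, and the shift grows as mu shrinks.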
10. Only one difficulty level
Three difficulty levels
Subject's sensitivity, as defined in the theory of signal detectability
When response signal delay varies
For each subject, fit with function
11. Subject Sensitivity
13. Real bias
Optimal bias
15. Dynamical analysis
- Based on a one-dimensional leaky integrator model.
- Initial condition x = 0.
- Choose left if x > 0 when the response signal is detected; otherwise choose right.
- Accuracy approximates an exponential approach to asymptote because of leakage.
- How is the reward implemented?
- A time-varying offset that optimizes reward?
- Offset in initial conditions?
- An additional term in the input to the decision variable?
- A fixed offset in the value of the decision variable?
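The time course of accuracy in such a model can be sketched as below, without any reward bias. The sketch assumes the integrator has the form dx = (lambda * x + drift) dt + s dW with x(0) = 0, where drift stands for the stimulus input (a * C in the notation of the later slides) and a left stimulus gives positive drift. This specific stochastic form, the function names, and the parameter values in the usage note are assumptions for illustration. Under that form x(t) is Gaussian, so accuracy has a closed form, and a small Euler simulation serves as a cross-check.

```python
import math
import random
from statistics import NormalDist


def mean_x(t, lam, drift):
    """Mean of x(t) for dx = (lam*x + drift) dt + s dW, x(0) = 0."""
    if lam == 0.0:
        return drift * t
    return drift * (math.exp(lam * t) - 1.0) / lam


def var_x(t, lam, s):
    """Variance of x(t) for the same process."""
    if lam == 0.0:
        return s * s * t
    return s * s * (math.exp(2.0 * lam * t) - 1.0) / (2.0 * lam)


def accuracy(t, lam, drift, s):
    """P(x(t) > 0) when the stimulus favours the positive (left) response."""
    if t <= 0.0:
        return 0.5
    return NormalDist().cdf(mean_x(t, lam, drift) / math.sqrt(var_x(t, lam, s)))


def simulate_accuracy(t, lam, drift, s, n_trials=2000, dt=0.005, seed=0):
    """Euler-Maruyama estimate of the same probability."""
    rng = random.Random(seed)
    n_steps = int(round(t / dt))
    sqrt_dt = math.sqrt(dt)
    n_correct = 0
    for _ in range(n_trials):
        x = 0.0
        for _ in range(n_steps):
            x += (lam * x + drift) * dt + s * sqrt_dt * rng.gauss(0.0, 1.0)
        if x > 0.0:
            n_correct += 1
    return n_correct / n_trials
```

With lambda < 0 the ratio mean/sd saturates at (drift / s) * sqrt(2 / |lambda|), so accuracy rises from chance toward a fixed asymptote rather than improving without bound.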
16. (1) Time-varying term that optimizes rewards (no free parameter for reward bias)
- Notes
- Equivalent to a time-varying criterion -b(t).
- There is a dip at
- Prediction and test: higher C level leads to an earlier dip.
- For multiple C levels, no analytical expressions.
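For a single C level, a reward-optimal time-varying offset can be sketched as follows. This is a derivation under stated assumptions, not the expression shown on the slide. It assumes a leaky integrator dx = (lambda * x + a * C) dt + s dW with x(0) = 0 and lambda < 0, equally likely stimulus directions, a 2:1 reward ratio, and the rule "choose the high-reward side if x + b(t) > 0". The Gaussian optimal-criterion result then gives b(t) = var(t) * ln(2) / (2 * mean(t)), which simplifies to s^2 * ln(2) * (1 + exp(lambda * t)) / (4 * a * C). Function names are hypothetical.

```python
import math


def optimal_bias(t, lam, a, C, s, r_high=2.0, r_low=1.0):
    """Reward-maximizing offset b(t) for one C level (see assumptions in the text)."""
    return s ** 2 * math.log(r_high / r_low) * (1.0 + math.exp(lam * t)) / (4.0 * a * C)


def z_high_reward(t, lam, a, C, s, compatible=True):
    """(mean + b(t)) / sd for t > 0 and lam < 0; mean is negative when the
    stimulus is incompatible with the high-reward side."""
    mean = a * C * (math.exp(lam * t) - 1.0) / lam
    sd = math.sqrt(s ** 2 * (math.exp(2.0 * lam * t) - 1.0) / (2.0 * lam))
    sign = 1.0 if compatible else -1.0
    return (sign * mean + optimal_bias(t, lam, a, C, s)) / sd


def dip_time(lam, a, C, s, r_high=2.0, r_low=1.0):
    """Time at which z_high_reward (compatible stimuli) is smallest, or None
    if the curve has no interior minimum under these assumptions."""
    ratio = s ** 2 * math.log(r_high / r_low) * (-lam) / (4.0 * a ** 2 * C ** 2)
    if lam >= 0.0 or ratio >= 1.0:
        return None
    u = (1.0 - ratio) / (1.0 + ratio)  # value of exp(lam * t) at the minimum
    return math.log(u) / lam
```

Under these assumptions the compatible-stimulus curve starts high (the offset dominates while the variance is small), falls to a minimum, and then rises as stimulus evidence accumulates. Raising C moves that minimum earlier, in line with the prediction stated in the notes.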
17. (2) Offset in initial conditions
- Notes
- Effect of the bias decays away for lambda < 0.
- Single C level, a dip at
- Prediction and test: higher C level leads to an earlier dip.
18. (3) Reward as a term in the input
- Reward signal comes -t seconds relative to stimulus.
- For t < 0: input b, noise sd s.
- For t > 0: input b + aC; noise continues as before.
- Notes
- Effect of the bias persists.
- But bias is sub-optimal initially, and there is
no dip.
19. (4) Reward as a constant offset in the decision variable
- Note
- Equivalent to setting criterion at m0
- Effect persists for lambda < 0.
- Single C level, a dip at
- Prediction and test: higher C level leads to an earlier dip.
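The contrast between an offset in initial conditions, a reward term in the input, and a constant offset in the decision variable can be sketched through the shift each one adds to the mean of the decision variable. The sketch assumes a linear leaky integrator, dx/dt = lambda * x + input + noise, so that the bias contribution to the mean can be computed separately from the stimulus contribution. It also assumes the integrator starts at zero when the reward cue arrives. The function names and the `lead` parameter (seconds between reward cue and stimulus onset) are hypothetical.

```python
import math


def _decay_integral(t, lam):
    """Integral of exp(lam * u) for u from 0 to t (equals t when lam == 0)."""
    if lam == 0.0:
        return t
    return (math.exp(lam * t) - 1.0) / lam


def bias_initial_offset(t, lam, x0):
    """Offset x0 in the initial condition: mean shift at time t after stimulus onset."""
    return x0 * math.exp(lam * t)


def bias_input_term(t, lam, b, lead):
    """Constant input b switched on `lead` seconds before stimulus onset."""
    return b * _decay_integral(t + lead, lam)


def bias_constant_offset(t, lam, m0):
    """Fixed offset m0 added to the decision variable: independent of time."""
    return m0
```

For lambda < 0 the first shift decays toward zero, the second saturates at b / |lambda|, and the third stays at m0, which matches the "decays away" and "persists" notes for the respective variants.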
20. (5) Reward as a term in the input, creating variability at stimulus onset
- Reward signal comes -t seconds relative to stimulus.
- For t < 0: input b, noise sd sb.
- For t > 0: input b + aC, noise sd sb + s.
- Notes
- Effect of the bias persists.
- If sb = 0, no dip.
- Prediction and test: given small sb, a longer reward period leads to a later and shallower dip.
21. Leaky Competing Integrator Model
Inputs for reward, stimulus, response signal
High threshold for