Title: A hierarchical Bayesian model of causal learning in humans and rats
1 A hierarchical Bayesian model of causal learning in humans and rats
- Tom Beckers¹, Hongjing Lu², Randall R. Rojas², Alan Yuille² - ¹UvA / ²UCLA
2 overview
- the problem: effects of pre-training on blocking in humans and animals
- a work-in-progress solution: causal learning as hierarchical Bayesian inference
- problems with the solution
3 prelude: blocking
- blocking: a touchstone phenomenon of associative learning in animals (Kamin, 1969) and humans (Dickinson, Shanks, & Evenden, 1984)
4 prelude: blocking
- blocking in animal Pavlovian conditioning
exp.: buzz → shock / buzz+light → shock
contr.: tone → shock / buzz+light → shock
5 prelude: blocking
- blocking in human causal learning
A → allergy, Z → no allergy / AX → allergy, KL → allergy, Z → no allergy
6 prelude: blocking
- the observation of blocking provided the main inspiration for the development of the Rescorla-Wagner model of conditioning (Rescorla & Wagner, 1972) and its application to human causal learning (Dickinson et al., 1984)
7 act 1: introducing the problem
- pre-training appears to significantly modulate blocking, both in human causal learning and in rat Pavlovian fear conditioning (e.g., Beckers et al., 2005, 2006; Vandorpe, De Houwer, & Beckers, 2007; Wheeler, Beckers, & Miller, 2008)
8 pre-training in humans
pretraining | elem. tr. | comp. tr.
additive: C, D, CD, E, M- | A, M- | AX, KL, M-
subadditive: C, D, CD, E, M- | A, M- | AX, KL, M-
Beckers et al., 2005, JEPLMC, Exp. 2
9 pre-training in rats
Beckers et al., 2006, JEPG, Exp. 1
10 pre-training in rats
Beckers et al., 2006, JEPG, Exp. 1
11 act 2: toward resolution
- to date, no convincing formal explanation is available for pre-training effects on blocking
- blocking in itself is compatible with a number of formalisations, including associative models, probabilistic models, and Bayesian inference models
12 learning as Bayesian inference
- Bayes' rule: p(H|D) = p(D|H) · p(H) / p(D)
- to evaluate the existence of a causal link:
p(H1|D) ∝ p(D|H1) · p(H1)
p(H0|D) ∝ p(D|H0) · p(H0)
- can be used for Bayesian parameter estimation of the weight of the causal link
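Spelled out with hypothetical numbers (a sketch, not values from any experiment), the comparison of H1 (causal link present) against H0 (no link) works like this:

```python
# Sketch of Bayes' rule for comparing two causal hypotheses.
# H1: a causal link exists between cue and outcome; H0: no link.
# All numbers are illustrative assumptions, not fit to any data.

prior_h1, prior_h0 = 0.5, 0.5       # uniform prior over the two hypotheses

# Likelihood of an observed trial sequence under each hypothesis
# (hypothetical values for five cue -> outcome pairings):
lik_h1 = 0.9 ** 5                   # a real link predicts the outcome well
lik_h0 = 0.5 ** 5                   # no link: the outcome is a coin flip

evidence = lik_h1 * prior_h1 + lik_h0 * prior_h0   # p(D)
post_h1 = lik_h1 * prior_h1 / evidence             # p(H1|D)
post_h0 = lik_h0 * prior_h0 / evidence             # p(H0|D)

print(round(post_h1, 3), round(post_h0, 3))        # posterior favours H1
```

The same machinery, applied to a continuous weight instead of a binary hypothesis, gives the parameter estimation mentioned in the last bullet.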
13 learning as Bayesian inference
- in this framework, learning implies the application of Bayes' theorem to find the most likely causal structure of the world given the available data
- the posterior probability of a causal model H given the data is a function of the probability of the data given the causal model (which can be derived analytically) and the prior probability of the causal model
14 learning as selection between causal graphs
15 a generative graphical model
hidden variables represent internal states that reflect the magnitudes of the effect generated by each individual cause
16 a generative graphical model
p(D|H) obviously depends on how R1 and R2 combine to produce R
17 a generative graphical model
simplest assumption: the linear-sum model, in which R is expressed as the sum of R1 and R2 plus some Gaussian noise
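A minimal sketch of this generative assumption, with made-up weights and noise level:

```python
# Sketch of the linear-sum integration rule: the observed response R
# is the sum of the per-cause magnitudes R1, R2 plus Gaussian noise.
# Weights and noise level are illustrative assumptions.
import math
import random

random.seed(0)

w1, w2 = 0.8, 0.3        # hidden causal magnitudes (assumed)
sigma = 0.1              # noise standard deviation (assumed)

def generate_R(present1, present2):
    """Sample R on a trial where causes 1/2 are present (1) or absent (0)."""
    mean = w1 * present1 + w2 * present2           # linear-sum combination
    return mean + random.gauss(0.0, sigma)

def likelihood_R(r, present1, present2):
    """Gaussian likelihood p(R | cue pattern, linear-sum rule)."""
    mean = w1 * present1 + w2 * present2
    return math.exp(-(r - mean) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

r = generate_R(1, 1)     # a compound trial with both causes present
print(round(r, 2))       # noisy value near w1 + w2 = 1.1
```

The likelihood function is what plugs into p(D|H) on the previous slide: compound outcomes near w1 + w2 are far more probable under this rule than outcomes near either weight alone.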
18 applying the linear-sum model to a blocking contingency
- priors on weights are set (close to) 0
- weights are then updated trial-by-trial by combining priors with likelihoods (likelihoods being derived using the linear-sum model)
p(H|D) = p(D|H) · p(H) / p(D)
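The trial-by-trial updating can be sketched with a simple grid approximation over the two weights; the priors, noise level, and trial counts below are illustrative assumptions, not the paper's values:

```python
# Sketch of trial-by-trial Bayesian weight estimation for a blocking
# design (elemental A -> outcome trials, then compound AX -> outcome
# trials) under the linear-sum rule, via a coarse grid posterior.
import math

sigma = 0.2                                        # outcome noise (assumed)
grid = [i / 20 for i in range(21)]                 # candidate weights 0..1

def gauss(r, mean):
    return math.exp(-(r - mean) ** 2 / (2 * sigma ** 2))

# joint posterior over (wA, wX), starting from a prior peaked near 0
post = {(wa, wx): math.exp(-(wa ** 2 + wx ** 2) / 0.2)
        for wa in grid for wx in grid}

def update(r, a_present, x_present):
    """One Bayesian update: multiply in the likelihood, renormalise."""
    for (wa, wx) in post:
        mean = wa * a_present + wx * x_present     # linear-sum prediction
        post[(wa, wx)] *= gauss(r, mean)
    z = sum(post.values())
    for k in post:
        post[k] /= z

for _ in range(10):
    update(1.0, 1, 0)                              # elemental A+ trials
for _ in range(10):
    update(1.0, 1, 1)                              # compound AX+ trials

wa_mean = sum(wa * p for (wa, _), p in post.items())
wx_mean = sum(wx * p for (_, wx), p in post.items())
print(round(wa_mean, 2), round(wx_mean, 2))
```

Because A alone already accounts for the outcome, the posterior weight on X stays small: that is blocking falling out of the linear-sum likelihood.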
19 applying the linear-sum model to a blocking contingency
- using some mathematical hocus-pocus, this yields
Lu, Rojas, Beckers, & Yuille, Proc. Cog. Sci., 2008
20 other integration models
- as indicated, these results crucially depend on the choice of the integration rule, here a linear-sum model
- one could think of an alternative rule, such as a max or noisy-max rule
- the noisy-max rule is a generalisation of the noisy-OR rule to continuous outcomes, the noisy-OR rule itself being a mathematical implementation of the power PC model
21 the noisy-max model
- basically, the noisy-max rule implies that R is either the maximum of R1 and R2 (the max rule), the average of R1 and R2, or something in between
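One simple way to write down "between the max and the average" is a convex mixture of the two; the mixing parameter and noise level below are illustrative assumptions, not the paper's parameterisation:

```python
# Sketch of a noisy-max-style integration rule as described on the
# slide: R lies between max(R1, R2) and the average of R1 and R2,
# plus Gaussian noise. lam and sigma are illustrative assumptions.
import random

random.seed(1)
sigma = 0.1

def noisy_max(r1, r2, lam=0.3):
    """lam = 0 gives the pure max rule; lam = 1 gives the average."""
    combined = (1 - lam) * max(r1, r2) + lam * (r1 + r2) / 2
    return combined + random.gauss(0.0, sigma)

# Deterministic part for two strong causes: the combined effect
# saturates near the stronger cause instead of summing to 1.5
# (contrast with the linear-sum rule).
print(round((1 - 0.3) * max(0.8, 0.7) + 0.3 * (0.8 + 0.7) / 2, 3))
```

Under this rule, a compound outcome no larger than the strongest single cue is exactly what the model expects, which is why it behaves so differently from the linear-sum rule on the same data.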
22 applying the noisy-max model to a blocking contingency
- using this rule instead of the linear-sum model yields, for the same blocking contingency:
Lu, Rojas, Beckers, & Yuille, Proc. Cog. Sci., 2008
23 Bayesian inference for model selection
- so far, we have used Bayesian inference to estimate parameter weights (causal strengths), based on a hypothetical integration model
- however, Bayesian inference can also be used to estimate which integration model best fits a set of data (this is called hierarchical Bayesian inference)
- i.e., what is the likelihood of observing the sequence of data (presence/absence of cues and outcomes) under a linear-sum rule versus a noisy-max rule?
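As a toy sketch of this model-selection step (cue magnitudes and outcome values are invented for illustration, and the pure max rule stands in for the noisy-max):

```python
# Sketch of Bayesian model selection between two integration rules,
# via the log-likelihood of compound-trial outcomes under each rule.
# All numbers are illustrative assumptions.
import math

sigma = 0.2
w1, w2 = 0.7, 0.6                      # assumed single-cue magnitudes

def log_gauss(r, mean):
    return (-(r - mean) ** 2 / (2 * sigma ** 2)
            - math.log(sigma * math.sqrt(2 * math.pi)))

def log_lik(data, rule):
    """Log-likelihood of compound outcomes under one integration rule."""
    mean = w1 + w2 if rule == "linear" else max(w1, w2)
    return sum(log_gauss(r, mean) for r in data)

additive_data = [1.3, 1.25, 1.35]      # compound outcomes near w1 + w2
subadd_data = [0.7, 0.75, 0.65]        # compound outcomes near max(w1, w2)

for data in (additive_data, subadd_data):
    llr = log_lik(data, "linear") - log_lik(data, "max")
    print("favours", "linear-sum" if llr > 0 else "max", round(llr, 1))
```

The sign of the log-likelihood ratio is the decision variable: additive-looking compounds push it toward the linear-sum rule, subadditive ones toward the max-style rule, which is the mechanism the next slides exploit.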
24 pre-training and model selection
- simplifying things slightly, the pretraining data are consistent with one causal graph only, so we do not need to select between a graph that contains one causal link and a graph that contains two causal links
- instead, for pretraining, we can focus on which integration rule best fits the data, given the causal graph with two links
25 learning as selection between causal graphs
26 pre-training and model selection
- the pretraining trials can thus serve to compute
log-likelihood ratios for the noisy-max model
relative to the linear-sum model
Lu, Rojas, Beckers, & Yuille, Proc. Cog. Sci., 2008
27 pre-training and model selection
- additive pretraining induces a preference towards the linear-sum model, and subadditive pretraining a preference towards the noisy-max model, relative to no CD compound trials (red line)
Lu, Rojas, Beckers, & Yuille, Proc. Cog. Sci., 2008
28 combining model selection and parameter estimation
- the model selected on the basis of the pretraining trials can then be applied to do parameter estimation based on the elemental and compound trials
Lu, Rojas, Beckers, & Yuille, Proc. Cog. Sci., 2008
29 hierarchical Bayesian inference in animal conditioning
- slightly different pretraining: subadditive versus irrelevant
- for model selection, the log-likelihood threshold is set such that, without pretraining, animals exhibit a preference for the linear-sum model
30 model selection by pre-training
Lu, Rojas, Beckers, & Yuille, Proc. Cog. Sci., 2008
31 parameter estimation
- using the model that best fits the pre-training data, we do Bayesian parameter estimation from the elemental and compound trials
- these weight estimates are translated into estimated suppression ratios using a suppression-ratio formula, where N = the number of lever presses in the pre-CS baseline period
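A sketch of the standard conditioned suppression ratio this builds on (the full mapping from Bayesian weight estimates to response rates is not reproduced here; the press counts are illustrative):

```python
# Standard conditioned suppression ratio: responses during the CS
# divided by responses during the CS plus the pre-CS baseline period.
# 0 = complete suppression (strong fear); 0.5 = no suppression.

def suppression_ratio(cs_presses, baseline_presses):
    return cs_presses / (cs_presses + baseline_presses)

print(suppression_ratio(5, 20))    # strong suppression: well below 0.5
print(suppression_ratio(10, 10))   # no suppression: exactly 0.5
```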
32 simulation results
Lu, Rojas, Beckers, & Yuille, Proc. Cog. Sci., 2008
33 so ...
- hierarchical, sequential Bayesian inference can account for influences of pretraining on subsequent learning with completely different stimuli
- key assumption: learners have multiple integration models available for combining the influence of multiple causes, i.e., both humans and rats have tacit knowledge that multiple cues may have a summative impact on the outcome (linear-sum model), or that the outcome may be effectively saturated at a level approximated by the weight of the strongest individual cause (noisy-max model)
34 so ...
- using standard Bayesian model selection, the
learner selects the model that best explains the
pretraining data, and then continues to favor the
most successful model during subsequent learning
with different cues
35 act 3: conflict / complication
- there are a number of problems and complications with what has been spelled out so far
- in its present form, the simulations capture some of the data very well; however, the model would probably fail with other data, e.g., on effects of outcome maximality
36 act 3: conflict / complication
- there are a number of problems and complications with what has been spelled out so far
- for the human data, the model does not assume any priors on the integration models; this is psychologically implausible
- on the other hand, humans and other animals are probably capable of learning other integration rules as well (e.g., super-additive rules)
37 act 3: conflict / complication
- there are a number of problems and complications with what has been spelled out so far
- if so, the present model still represents a gross simplification: we need a prior distribution over the whole range of possible integration models, and ways to compute posterior likelihoods for them
- the whole damn thing feels like shooting a cannonball to kill a fly