Title: A hierarchical Bayesian model of causal learning in humans and rats
1 A hierarchical Bayesian model of causal learning in humans and rats
- Tom Beckers¹, Hongjing Lu², Randall R. Rojas², Alan Yuille² - ¹UvA / ²UCLA
2 overview
- the problem: effects of pre-training on blocking in humans and animals
- a work-in-progress solution: causal learning as hierarchical Bayesian inference
- problems with the solution
3 prelude: blocking
- blocking: a touchstone phenomenon of associative learning in animals (Kamin, 1969) and humans (Dickinson, Shanks, & Evenden, 1984)
4 prelude: blocking
- blocking in animal Pavlovian conditioning
exp.: buzz → shock / buzz+light → shock
contr.: tone → shock / buzz+light → shock
5 prelude: blocking
- blocking in human causal learning
A → allergy, Z → no allergy / AX → allergy, KL → allergy, Z → no allergy
6 prelude: blocking
- the observation of blocking provided the main inspiration for the development of the Rescorla-Wagner model of conditioning (Rescorla & Wagner, 1972) and its application to human causal learning (Dickinson et al., 1984)
7 act 1: introducing the problem
- pre-training appears to significantly modulate blocking, both in human causal learning and in rat Pavlovian fear conditioning (e.g., Beckers et al., 2005, 2006; Vandorpe, De Houwer, & Beckers, 2007; Wheeler, Beckers, & Miller, 2008)
8 pre-training in humans
pretraining | elem. tr. | comp. tr.
additive: C, D, CD, E, M- | A, M- | AX, KL, M-
subadditive: C, D, CD, E, M- | A, M- | AX, KL, M-
Beckers et al., 2005, JEPLMC, Exp. 2
9 pre-training in rats
Beckers et al., 2006, JEPG, Exp. 1
10 pre-training in rats
Beckers et al., 2006, JEPG, Exp. 1
11 act 2: toward resolution
- to date, no convincing formal explanation is available for pre-training effects on blocking
- blocking in itself is compatible with a number of formalisations, including associative models, probabilistic models, and Bayesian inference models
12 learning as Bayesian inference
- Bayes' rule: p(H|D) = p(D|H) · p(H) / p(D)
- to evaluate the existence of a causal link:
p(H1|D) ∝ p(D|H1) · p(H1)
p(H0|D) ∝ p(D|H0) · p(H0)
- can be used for Bayesian parameter estimation of the weight of the causal link
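Spelled out with hypothetical numbers (a sketch, not values from any experiment), the comparison of H1 (causal link present) against H0 (no link) works like this:

```python
# Sketch of Bayes' rule for comparing two causal hypotheses.
# H1: a causal link exists between cue and outcome; H0: no link.
# All numbers are illustrative assumptions, not fit to any data.

prior_h1, prior_h0 = 0.5, 0.5       # uniform prior over the two hypotheses

# Likelihood of an observed trial sequence under each hypothesis
# (hypothetical values for five cue -> outcome pairings):
lik_h1 = 0.9 ** 5                   # a real link predicts the outcome well
lik_h0 = 0.5 ** 5                   # no link: the outcome is a coin flip

evidence = lik_h1 * prior_h1 + lik_h0 * prior_h0   # p(D)
post_h1 = lik_h1 * prior_h1 / evidence             # p(H1|D)
post_h0 = lik_h0 * prior_h0 / evidence             # p(H0|D)

print(round(post_h1, 3), round(post_h0, 3))        # posterior favours H1
```

The same machinery, applied to a continuous weight instead of a binary hypothesis, gives the parameter estimation mentioned in the last bullet.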
13 learning as Bayesian inference
- in this framework, learning implies the application of Bayes' theorem to find the most likely causal structure of the world given the available data
- the posterior probability of a causal model H given the data is a function of the probability of the data given the causal model (which can be derived analytically) and the prior probability of the causal model
14 learning as selection between causal graphs
15 a generative graphical model
hidden variables represent internal states that reflect the magnitudes of the effect generated by each individual cause
16 a generative graphical model
p(D|H) obviously depends on how R1 and R2 combine to produce R
17 a generative graphical model
simplest assumption: the linear-sum model, in which R is expressed as the sum of R1 and R2 plus some Gaussian noise
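A minimal sketch of this generative assumption, with made-up weights and noise level:

```python
# Sketch of the linear-sum integration rule: the observed response R
# is the sum of the per-cause magnitudes R1, R2 plus Gaussian noise.
# Weights and noise level are illustrative assumptions.
import math
import random

random.seed(0)

w1, w2 = 0.8, 0.3        # hidden causal magnitudes (assumed)
sigma = 0.1              # noise standard deviation (assumed)

def generate_R(present1, present2):
    """Sample R on a trial where causes 1/2 are present (1) or absent (0)."""
    mean = w1 * present1 + w2 * present2           # linear-sum combination
    return mean + random.gauss(0.0, sigma)

def likelihood_R(r, present1, present2):
    """Gaussian likelihood p(R | cue pattern, linear-sum rule)."""
    mean = w1 * present1 + w2 * present2
    return math.exp(-(r - mean) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

r = generate_R(1, 1)     # a compound trial with both causes present
print(round(r, 2))       # noisy value near w1 + w2 = 1.1
```

The likelihood function is what plugs into p(D|H) on the previous slide: compound outcomes near w1 + w2 are far more probable under this rule than outcomes near either weight alone.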
18 applying the linear-sum model to a blocking contingency
- priors on weights are set (close to) 0
- weights are then updated trial-by-trial by combining priors with likelihoods (likelihoods being derived using the linear-sum model)
p(H|D) = p(D|H) · p(H) / p(D)
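The trial-by-trial updating can be sketched with a simple grid approximation over the two weights; the priors, noise level, and trial counts below are illustrative assumptions, not the paper's values:

```python
# Sketch of trial-by-trial Bayesian weight estimation for a blocking
# design (elemental A -> outcome trials, then compound AX -> outcome
# trials) under the linear-sum rule, via a coarse grid posterior.
import math

sigma = 0.2                                        # outcome noise (assumed)
grid = [i / 20 for i in range(21)]                 # candidate weights 0..1

def gauss(r, mean):
    return math.exp(-(r - mean) ** 2 / (2 * sigma ** 2))

# joint posterior over (wA, wX), starting from a prior peaked near 0
post = {(wa, wx): math.exp(-(wa ** 2 + wx ** 2) / 0.2)
        for wa in grid for wx in grid}

def update(r, a_present, x_present):
    """One Bayesian update: multiply in the likelihood, renormalise."""
    for (wa, wx) in post:
        mean = wa * a_present + wx * x_present     # linear-sum prediction
        post[(wa, wx)] *= gauss(r, mean)
    z = sum(post.values())
    for k in post:
        post[k] /= z

for _ in range(10):
    update(1.0, 1, 0)                              # elemental A+ trials
for _ in range(10):
    update(1.0, 1, 1)                              # compound AX+ trials

wa_mean = sum(wa * p for (wa, _), p in post.items())
wx_mean = sum(wx * p for (_, wx), p in post.items())
print(round(wa_mean, 2), round(wx_mean, 2))
```

Because A alone already accounts for the outcome, the posterior weight on X stays small: that is blocking falling out of the linear-sum likelihood.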
19 applying the linear-sum model to a blocking contingency
- using some mathematical hocus-pocus, this yields
Lu, Rojas, Beckers, & Yuille, Proc. Cog. Sci., 2008
20 other integration models
- as indicated, these results crucially depend on the choice of the integration rule, here a linear-sum model
- one could think of an alternative rule, such as a max or noisy-max rule
- the noisy-max rule is a generalisation of the noisy-OR rule to continuous outcomes, the noisy-OR rule itself being a mathematical implementation of the power PC model
21 the noisy-max model
- basically, the noisy-max rule implies that R is either the maximum of R1 and R2 (the max rule), the average of R1 and R2, or something in between
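One simple way to write down "between the max and the average" is a convex mixture of the two; the mixing parameter and noise level below are illustrative assumptions, not the paper's parameterisation:

```python
# Sketch of a noisy-max-style integration rule as described on the
# slide: R lies between max(R1, R2) and the average of R1 and R2,
# plus Gaussian noise. lam and sigma are illustrative assumptions.
import random

random.seed(1)
sigma = 0.1

def noisy_max(r1, r2, lam=0.3):
    """lam = 0 gives the pure max rule; lam = 1 gives the average."""
    combined = (1 - lam) * max(r1, r2) + lam * (r1 + r2) / 2
    return combined + random.gauss(0.0, sigma)

# Deterministic part for two strong causes: the combined effect
# saturates near the stronger cause instead of summing to 1.5
# (contrast with the linear-sum rule).
print(round((1 - 0.3) * max(0.8, 0.7) + 0.3 * (0.8 + 0.7) / 2, 3))
```

Under this rule, a compound outcome no larger than the strongest single cue is exactly what the model expects, which is why it behaves so differently from the linear-sum rule on the same data.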
22 applying the noisy-max model to a blocking contingency
- using this rule instead of the linear-sum model yields, for the same blocking contingency:
Lu, Rojas, Beckers, & Yuille, Proc. Cog. Sci., 2008
23 Bayesian inference for model selection
- so far, we have used Bayesian inference to estimate parameter weights (causal strengths), based on a hypothetical integration model
- however, Bayesian inference can also be used to estimate which integration model best fits a set of data (this is called hierarchical Bayesian inference)
- i.e., what is the likelihood of observing the sequence of data (presence/absence of cues and outcomes) under a linear-sum rule versus a noisy-max rule?
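As a toy sketch of this model-selection step (cue magnitudes and outcome values are invented for illustration, and the pure max rule stands in for the noisy-max):

```python
# Sketch of Bayesian model selection between two integration rules,
# via the log-likelihood of compound-trial outcomes under each rule.
# All numbers are illustrative assumptions.
import math

sigma = 0.2
w1, w2 = 0.7, 0.6                      # assumed single-cue magnitudes

def log_gauss(r, mean):
    return (-(r - mean) ** 2 / (2 * sigma ** 2)
            - math.log(sigma * math.sqrt(2 * math.pi)))

def log_lik(data, rule):
    """Log-likelihood of compound outcomes under one integration rule."""
    mean = w1 + w2 if rule == "linear" else max(w1, w2)
    return sum(log_gauss(r, mean) for r in data)

additive_data = [1.3, 1.25, 1.35]      # compound outcomes near w1 + w2
subadd_data = [0.7, 0.75, 0.65]        # compound outcomes near max(w1, w2)

for data in (additive_data, subadd_data):
    llr = log_lik(data, "linear") - log_lik(data, "max")
    print("favours", "linear-sum" if llr > 0 else "max", round(llr, 1))
```

The sign of the log-likelihood ratio is the decision variable: additive-looking compounds push it toward the linear-sum rule, subadditive ones toward the max-style rule, which is the mechanism the next slides exploit.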
24 pre-training and model selection
- simplifying things slightly, the pretraining data are consistent with one causal graph only, so we do not need to select between a graph that contains one causal link and a graph that contains two causal links
- instead, for pretraining, we can focus on which integration rule best fits the data, given the causal graph with two links
25 learning as selection between causal graphs
26 pre-training and model selection
- the pretraining trials can thus serve to compute
log-likelihood ratios for the noisy-max model
relative to the linear-sum model
Lu, Rojas, Beckers, & Yuille, Proc. Cog. Sci., 2008
27 pre-training and model selection
- additive pretraining induces a preference towards the linear-sum model, and subadditive pretraining a preference towards the noisy-max model, relative to no CD compound trials (red line)
Lu, Rojas, Beckers, & Yuille, Proc. Cog. Sci., 2008
28 combining model selection and parameter estimation
- the model selected on the basis of the pretraining trials can then be applied to do parameter estimation based on the elemental and compound trials
Lu, Rojas, Beckers, & Yuille, Proc. Cog. Sci., 2008
29 hierarchical Bayesian inference in animal conditioning
- slightly different pretraining: subadditive versus irrelevant
- for model selection, the log-likelihood threshold is set such that, without pretraining, animals exhibit a preference for the linear-sum model
30 model selection by pre-training
Lu, Rojas, Beckers, & Yuille, Proc. Cog. Sci., 2008
31 parameter estimation
- using the model that best fits the pre-training data, we do Bayesian parameter estimation from the elemental and compound trials
- these weight estimates are translated into estimated suppression ratios using a suppression-ratio formula, where N = the number of lever presses in the pre-CS baseline period
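A sketch of the standard conditioned suppression ratio this builds on (the full mapping from Bayesian weight estimates to response rates is not reproduced here; the press counts are illustrative):

```python
# Standard conditioned suppression ratio: responses during the CS
# divided by responses during the CS plus the pre-CS baseline period.
# 0 = complete suppression (strong fear); 0.5 = no suppression.

def suppression_ratio(cs_presses, baseline_presses):
    return cs_presses / (cs_presses + baseline_presses)

print(suppression_ratio(5, 20))    # strong suppression: well below 0.5
print(suppression_ratio(10, 10))   # no suppression: exactly 0.5
```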
32 simulation results
Lu, Rojas, Beckers, & Yuille, Proc. Cog. Sci., 2008
33 so ...
- hierarchical, sequential Bayesian inference can account for influences of pretraining on subsequent learning with completely different stimuli
- key assumption: learners have multiple integration models available for combining the influence of multiple causes, i.e., both humans and rats have tacit knowledge that multiple cues may have a summative impact on the outcome (linear-sum model), or that the outcome may be effectively saturated at a level approximated by the weight of the strongest individual cause (noisy-max model)
34 so ...
- using standard Bayesian model selection, the
learner selects the model that best explains the
pretraining data, and then continues to favor the
most successful model during subsequent learning
with different cues
35 act 3: conflict / complication
- there are a number of problems and complications with what has been spelled out so far
- in its present form, the simulations capture some of the data very well; however, the model would probably fail with other data, e.g., on effects of outcome maximality
36 act 3: conflict / complication
- there are a number of problems and complications with what has been spelled out so far
- for the human data, the model does not assume any priors on the integration models; this is psychologically implausible
- on the other hand, humans and other animals are probably capable of learning other integration rules as well (e.g., super-additive rules)
37 act 3: conflict / complication
- there are a number of problems and complications with what has been spelled out so far
- if so, the present model still represents a gross simplification: we need a prior distribution over the whole range of possible integration models, and ways to compute posterior likelihoods for them
- the whole damn thing feels like shooting a cannonball to kill a fly