Title: Conditioning
1 Conditioning
Bear with me. Bare with me. Beer with me. Stay
focused.
2 Learning
Typically this subsides as this is learned.
- A. Two-process learning (Rescorla & Solomon, 1967)
  - fast fear and arousal
  - slow adaptive behavioral responses
- B. Three-process learning
  - everything in A, plus
  - declarative memory (as opposed to procedural)
- C. More-than-three-process learning
  - everything in A, plus
  - declarative memory
  - episodic memory
  - semantic memory
  - more stuff
3 Conditional and Unconditional
[Table on training: US vs. reinforcer, easier vs. harder.]
4 Classical and Operant
Classical conditioning (CC) predicts that the animal will produce the UR/CR while performing the desired action, but it does not explain why the animal learns to select that action.
5 Selectionist View
- Selectionist principles
  - Behaviors are varied, selected, and retained in a process similar to the natural selection of species
  - Only overt behaviors can be reinforced by the environment
  - Selection is based on the behavioral discrepancy
6 Behavioral Discrepancy
A behavioral discrepancy is the change in an ongoing behavior produced by the eliciting stimulus.
Example: the presentation of food produces salivation, which would not otherwise occur.
7 Unified Selection Principle
Whenever a behavioral discrepancy occurs, an
environment-behavior relation is selected that
consists -- other things being equal -- of all
those stimuli occurring immediately before the
discrepancy and all those responses occurring
immediately before and at the same time as the
elicited response.
Under this principle there is no difference
between Classical and Operant conditioning as
far as learning goes.
8 Conditioning Phenomena
It goes on...
9 Conditioning/Selection Models
- Trial-by-trial
  - Probabilistic (Dayan-Long, Cheng-Novick)
  - and not (Rescorla-Wagner)
  - NN (Donahoe)
- Moment-by-moment
  - Sutton-Barto
  - Mignault
  - Schmajuk (NN)
- A bazillion of others...
Processing of S1 and S2 should happen at roughly the same time, so almost all models assume a multiplicative relationship between the levels of S1 and S2.
10 Rescorla-Wagner model
- Trial based
- Based on the net prediction of the reward
- Learning only happens when a prediction discrepancy is detected
- Falls out straight from ML estimation of association strength
- Is essentially the delta rule (see the sketch after this list):
  ΔV_i = α_i (λ − Σ_j V_j)
  where Σ_j V_j is the net prediction, λ is the reward, ΔV_i is the association strength update, and α_i is the stimulus eligibility
- Problems
  - Does not deal well with overshadowing and downwards unblocking...
  - Does not depend on the temporal relations between stimuli
  - Does not explain the re-acquisition rate
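A minimal Python sketch of the delta-rule update above (the shared learning rate, the function name, and the blocking demonstration are illustrative choices, not taken from the slides):

import numpy as np

def rescorla_wagner(stimuli, rewards, alpha=0.1):
    """Trial-by-trial Rescorla-Wagner updates.

    stimuli: (n_trials, n_stimuli) array, 1 where a CS is present on a trial.
    rewards: (n_trials,) array of US magnitudes (lambda) per trial.
    alpha:   stimulus eligibility / learning rate (shared across stimuli here).
    """
    V = np.zeros(stimuli.shape[1])      # association strengths
    history = []
    for x, lam in zip(stimuli, rewards):
        v_net = V @ x                   # net prediction from all present stimuli
        delta = lam - v_net             # prediction discrepancy
        V = V + alpha * x * delta       # delta rule: only present stimuli are updated
        history.append(V.copy())
    return np.array(history)

# Blocking: S1 alone is paired with reward, then S1+S2 together; S2 gains
# little strength because S1 already predicts the reward.
stimuli = np.vstack([np.tile([1, 0], (50, 1)), np.tile([1, 1], (50, 1))])
rewards = np.ones(100)
print("final strengths (S1, S2):", rescorla_wagner(stimuli, rewards)[-1])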
11 Sutton-Barto model
- Problems
  - Does not model Inter-Stimulus Interval (ISI) effects, where the efficiency of training should decrease as the ISI increases
  - Does not deal with reacquisition
12 Temporal Difference model
- Is related to the SB model (and the RW model)
- Models reward in small discrete time intervals
- Models second-order conditioning
- Based on the assumption that the goal of learning is to accurately predict future US levels (see the sketch below):
  V(t) = Σ_{k≥0} γ^k E[r(t+k)]
  the discounted prediction of the future reward (V is the predicted value of the stimuli S)
- Problems
  - No model of attention, salience, configuration, etc...
  - No indirect associations modeled (sensory preconditioning)
  - Problems with downwards unblocking
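A minimal sketch of a TD(0) conditioning trial with a tapped-delay-line ("complete serial compound") stimulus representation; the representation, the parameter values, and all names are illustrative assumptions, not taken from the slides:

import numpy as np

def td_trial(w, cs_onset, us_time, n_steps, alpha=0.1, gamma=0.95):
    """One conditioning trial: the active state at time t is the delay-line
    index (t - cs_onset); the US is a unit reward pulse at us_time."""
    for t in range(cs_onset, n_steps):
        r = 1.0 if t == us_time else 0.0
        i = t - cs_onset
        v_t = w[i]                                   # current prediction
        v_next = w[i + 1] if t + 1 < n_steps else 0.0
        delta = r + gamma * v_next - v_t             # TD prediction error
        w[i] += alpha * delta                        # update the active time step
    return w

# CS onset at step 5, US at step 15: after training, the prediction ramps up
# between CS onset and the US (the value signal that lets TD handle
# second-order conditioning).
n_steps, cs_onset, us_time = 25, 5, 15
w = np.zeros(n_steps)
for _ in range(500):
    w = td_trial(w, cs_onset, us_time, n_steps)
print(np.round(w[:us_time - cs_onset + 1], 2))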
13 Statistical models
Treat conditioning as statistical inference: the US is predicted from the stimuli that are present, and the prediction weights are fit to the observed trials. With ML estimation this results in exactly the RW model (see the sketch below). The fitting is EM. Similar to comparator models of conditioning (whatever those are). Has problems with inhibitory conditioning.
The Dayan-Long model covers the conditioning phenomena, but it does not consider associability (eligibility in SB) or attention, and it makes no distinction between preparatory and consummatory conditioning.
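A minimal sketch of the ML-equals-RW point, under the assumption (mine, in the spirit of Dayan-Long) that the US is Gaussian with mean equal to the summed weights of the present stimuli; a stochastic-gradient step on the log-likelihood is then exactly the delta-rule/RW update:

import numpy as np

# Model: r ~ N(w . x, sigma^2) for a presence vector x.  The gradient of
# log N(r; w.x, sigma^2) with respect to w is x * (r - w.x) / sigma^2,
# so each gradient step is proportional to the RW update x * (r - w.x).
rng = np.random.default_rng(0)
true_w, sigma, lr = np.array([1.0, 0.5]), 0.1, 0.05

w = np.zeros(2)
for _ in range(2000):
    x = rng.integers(0, 2, size=2).astype(float)   # which CSs are present
    r = x @ true_w + rng.normal(0.0, sigma)        # observed US magnitude
    w += lr * x * (r - x @ w)                      # ML gradient step = RW delta rule
print("estimated association strengths:", np.round(w, 2))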
14 NN models
Warning: a personal opinion!
- Everything is a neural net, so things happen naturally
- The weights propagate, and this forms the dynamics of the stimulus-stimulus interactions
[Diagram: S1 and S2 feed into an intermediate layer ("stuff happens here"), which produces the Response. Whatever.]
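For concreteness, a toy network in the spirit of the diagram (every detail here is an illustrative assumption, not from the slides): S1 and S2 are the inputs, a small hidden layer is the "stuff happens here" box, and a single output unit is the Response, trained so that S1 but not S2 comes to drive the response.

import numpy as np

rng = np.random.default_rng(1)
W1 = rng.normal(0, 0.5, size=(3, 2))   # input -> hidden weights (3 hidden units)
W2 = rng.normal(0, 0.5, size=(1, 3))   # hidden -> response weights
lr = 0.5

def forward(x):
    h = np.tanh(W1 @ x)                      # hidden activity
    y = 1.0 / (1.0 + np.exp(-(W2 @ h)))      # response probability
    return h, y

# S1 is reinforced (target response 1), S2 is not (target 0).
patterns = [(np.array([1.0, 0.0]), 1.0), (np.array([0.0, 1.0]), 0.0)]
for _ in range(2000):
    x, target = patterns[rng.integers(2)]
    h, y = forward(x)
    err = target - y                                             # output error
    W2 += lr * err * h[None, :]                                  # output-layer update
    W1 += lr * (W2.T * err * (1 - h[:, None] ** 2)) @ x[None, :] # backprop to hidden layer
print([round(forward(x)[1].item(), 2) for x, _ in patterns])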
15 Bruce's favorite model
- Models the times and rates of the CS and of reinforcement
- Time-scale invariant
- Non-associative framework
- The rates of reinforcement attributed to the stimuli are computed from:
  - the cumulative number of reinforcements in the presence of S_n
  - the cumulative duration of the conjunction of S_1 and S_n
  - the cumulative duration of S_n
  (i.e., solve n = T·λ, where n_n is the reinforcement count during S_n and T_{mn} is the cumulative duration of the conjunction of S_m and S_n; see the sketch below)
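A minimal sketch of this rate estimation, using my own matrix formulation and made-up numbers: solve n = T @ lam, where n[i] counts reinforcements delivered while S_i was present and T[i, j] is the cumulative time S_i and S_j were jointly present.

import numpy as np

# Background (S0) is present for 1000 s; a CS (S1) is on for 200 s of that time;
# all 20 reinforcements occur while the CS is on.
T = np.array([[1000.0, 200.0],    # T[0,0]: duration of background, T[0,1]: overlap with CS
              [ 200.0, 200.0]])   # T[1,1]: duration of the CS
n = np.array([20.0, 20.0])        # reinforcements during background / during CS

lam = np.linalg.solve(T, n)       # estimated rates attributed to each stimulus
print("rates (background, CS):", lam)   # -> [0.0, 0.1] reinforcements per second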
16 References
- Dayan, P., and Abbott, L. F. (2001). Theoretical Neuroscience. MIT Press. (http://www.gatsby.ucl.ac.uk/dayan/book/)
- Dayan, P., and Long, T. (1998). Statistical Models of Conditioning. In Advances in Neural Information Processing Systems 10 (NIPS 10).
- Gallistel, C. R., and Gibbon, J. (2000). Time, Rate and Conditioning. Psychological Review, 107(2), 289-344.
- Pavlov, I. P. (1927). Conditioned Reflexes. Oxford: Oxford University Press.
- Mignault, A., and Marley, A. A. J. (1997). A Real-Time Neuronal Model of Classical Conditioning. Adaptive Behavior, 6(1), 3-61.
- Rescorla, R. A. (1988). Behavioral studies of Pavlovian conditioning. Annual Review of Neuroscience, 11, 329-352.
- Rescorla, R. A., and Solomon, R. L. (1967). Two-process learning theory: Relationships between Pavlovian conditioning and instrumental learning. Psychological Review, 74, 151-182.
- Rescorla, R. A., and Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In A. H. Black and W. F. Prokasy (Eds.), Classical Conditioning II: Current Research and Theory. New York: Appleton-Century-Crofts, pp. 64-99.
- Roitblat, H. L., and Meyer, J.-A. (Eds.) (1995). Comparative Approaches to Cognitive Science. MIT Press.
- Schmajuk, N. A. (1997). Animal Learning and Cognition: A Neural Network Approach. Cambridge University Press.
- Skinner, B. F. (1938). The Behavior of Organisms. New York: Appleton-Century-Crofts.
- Sutton, R. S., and Barto, A. G. (1990). Time-derivative models of Pavlovian reinforcement. In M. Gabriel and J. Moore (Eds.), Learning and Computational Neuroscience: Foundations of Adaptive Networks. MIT Press.
- Thorndike, E. L. (1911). Animal Intelligence: Experimental Studies. New York: Macmillan.
- Wilson, R. A., and Keil, F. C. (Eds.) (1999). The MIT Encyclopedia of the Cognitive Sciences (MITECS). MIT Press. (http://cognet.mit.edu/MITECS)