Title: Operant Conditioning
Operant Conditioning
- Chapter 13, Unit 4 Psychology
Introduction
- While classical conditioning (CC) is useful for explaining learned behaviour, there are many other learned behaviours that CC cannot explain, such as behaviours that are voluntary.
- Much of our learning occurs by trial and error.
- We all make adjustments to our behaviour according to the outcomes or consequences it produces.
- Operant conditioning is the learning that takes place as a result of these consequences.
Trial and error learning
- Describes an organism's attempts to learn, or to solve a problem, by trying alternative possibilities until a correct solution or desired outcome is achieved.
- Involves a number of attempts (trials) and a number of incorrect choices (errors) before the correct behaviour is learned.
- Also referred to as instrumental learning, as the individual is instrumental in learning the correct response.
- More recently, however, this learning has been referred to as operant conditioning, because the individual operates on the environment to solve a problem.
Trial and error learning (cont.)
- It involves:
- Motivation
- Exploration
- Incorrect and correct responses
- Reward
- Receiving a reward of some kind leads to the
repeated performance of the correct response,
strengthening the association between the
behaviour and its outcome.
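The slides describe a reward strengthening the association between a behaviour and its outcome. A minimal Python sketch of this trial-and-error pattern (not from the chapter; the response names, weights and numbers are illustrative assumptions): an agent chooses among several responses, and only the rewarded response has its selection strength increased, so errors become less frequent over trials.

```python
import random

# Hypothetical illustration of trial-and-error learning.
# One of several possible responses is "correct" and earns a reward;
# the rewarded response has its selection weight strengthened.

responses = ["press lever", "scratch wall", "sniff corner", "sit still"]
correct = "press lever"                 # the response that produces the reward
weights = {r: 1.0 for r in responses}   # start with no preference

def run_trial():
    """Pick a response in proportion to its current strength."""
    choice = random.choices(responses, weights=[weights[r] for r in responses])[0]
    rewarded = (choice == correct)
    if rewarded:
        weights[choice] += 1.0          # reward strengthens the behaviour-outcome link
    return choice, rewarded

errors = 0
for trial in range(1, 101):
    choice, rewarded = run_trial()
    if not rewarded:
        errors += 1

print(f"errors across 100 trials: {errors}")
print("final response strengths:", weights)
```

Because only the rewarded response is strengthened, its relative weight grows and the error rate falls across successive trials, mirroring the trials-and-errors pattern described above.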
Thorndike's experiments with cats
- American psychologist Edward Lee Thorndike (1874-1949) undertook the first studies of trial and error learning.
- He put a hungry cat in a puzzle box and placed a piece of fish outside the box where it could be seen (and smelt) but was just out of the cat's reach.
- The cat had to learn to escape from the box by operating a latch to release a door on the side of the box.
- Learning was measured by the time it took the cat, on consecutive trials, to escape.
The law of effect
- The results of these experiments led Thorndike to develop the law of effect.
- It states that a behaviour that is followed by satisfying consequences is strengthened, and a behaviour that is followed by annoying consequences is weakened.
- In the puzzle-box experiments, behaviour that enabled the cat to escape and get to the food (satisfying) was more likely to occur, and behaviour that kept the cat in the box (annoying) was less likely to occur.
- The cat became instrumental in obtaining its release to get the food.
Operant conditioning
- The term operant conditioning (OC) was not introduced until some years after Thorndike's experiments with cats escaping from puzzle boxes.
- OC was coined by American psychologist Burrhus Skinner.
- He referred to the responses observed in trial and error learning as operants.
- An operant is a response that occurs and acts on the environment to produce some kind of effect.
- OC is based on the principle that an organism will tend to repeat behaviours that have desirable consequences, or that enable it to avoid undesirable consequences.
- Organisms will tend not to repeat behaviours that have undesirable consequences.
Burrhus Skinner (1904-1990)
- Began his own experiments in the 1930s, but used the term operant conditioning to emphasise that animals and people learn to operate on the environment to produce desired consequences.
- He also contrasted operants with respondents in CC.
- Respondents are behaviours elicited by known or recognised stimuli (e.g. the meat powder making the dog salivate in Pavlov's experiment).
- He believed that all behaviour can be explained by the relationships between the behaviour, its antecedents (the events that precede or come before it), and its consequences.
- He argued that any behaviour that is followed by a consequence will change in strength and frequency depending on the nature of that consequence.
The Skinner box
- He created an apparatus called a Skinner box, which is a small chamber in which an experimental animal learns to make a particular response for which the consequences can be controlled by the researcher.
- It is attached to a cumulative recorder, which indicates how often each response is made (frequency) and the rate of response (speed).
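The cumulative recorder simply tallies responses over time. As an illustration only (the function name and timestamps below are assumptions, not part of the chapter), a sketch of how frequency and rate could be read off a list of response times:

```python
# Hypothetical sketch of what a cumulative recorder reports:
# total responses (frequency) and responses per unit time (rate).

def summarise_responses(response_times_s):
    """response_times_s: times (in seconds) at which the lever was pressed."""
    frequency = len(response_times_s)
    duration = response_times_s[-1] - response_times_s[0] if frequency > 1 else 0
    rate = frequency / duration if duration > 0 else float("nan")
    # Cumulative record: (time, total responses so far) pairs.
    cumulative = [(t, i + 1) for i, t in enumerate(response_times_s)]
    return frequency, rate, cumulative

# Example with made-up lever-press times.
times = [2.0, 5.5, 7.0, 8.2, 9.1, 9.8, 10.4]
freq, rate, record = summarise_responses(times)
print(f"frequency: {freq} presses, rate: {rate:.2f} presses/second")
```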
Skinner's experiments with rats
- In 1938, Skinner used the box to demonstrate OC.
- 1. A hungry rat is placed in the box.
- 2. It scurries around and randomly touches parts of the floor and walls.
- 3. The rat accidentally presses the lever and rat food is released into the box.
- After additional repetitions, the rat's random acts subsided and were replaced with more consistent lever pressing.
- Eventually, the rat was pressing the lever as fast as it could eat each pellet.
- The pellet is the reward for the correct response.
- Skinner referred to such rewards as reinforcers.
Skinner's experiment (cont.)
- The hunger of the rats was their motivation for frantic activity.
- Skinner believed that there was no need to search for internal agents to explain changes in behaviour.
- This view was based on the notion that behaviour can be understood in terms of environmental or external influences.
Elements of operant conditioning
- Central to OC is reinforcement (reward).
- A response that is rewarded is strengthened,
whereas one that is punished is weakened.
Reinforcement
- Reinforcement may involve receiving a pleasant stimulus or escaping an unpleasant stimulus.
- Reinforcement is applying a positive stimulus or removing a negative stimulus to subsequently strengthen or increase the likelihood of the particular response it follows.
- A reinforcer is any object or event that changes the probability that an operant behaviour will occur again.
- The term reinforcer is often used interchangeably with the term reward, although they are not technically the same.
- One difference is that a reward suggests an outcome that is positive; a stimulus is a reinforcer only if it strengthens the preceding behaviour.
- Also, a stimulus can be rewarding because it is pleasurable, but it cannot be said to reinforce unless it increases the likelihood of a response occurring.
- E.g. a person might enjoy eating chocolate and find it pleasurable, but chocolate cannot be considered a reinforcer unless it promotes or strengthens a particular response.
Schedules of reinforcement
- The schedule of reinforcement is the way in which the reinforcement is delivered in experimental settings.
- It influences the speed of learning and the strength of the learned response.
- Reinforcement may be provided on a continuous or partial reinforcement schedule.
- Continuous reinforcement is when every correct response in the early stages of learning is reinforced (the reinforcer is typically provided immediately after every correct response).
- Partial reinforcement is the process of reinforcing some correct responses but not all of them. It may be delivered in a number of ways, or by different schedules.
Schedules of reinforcement (cont.)
- The term schedule of reinforcement refers to the frequency and manner in which a desired response is reinforced.
- For instance, reinforcement can be given after a certain number of correct responses have been made (a ratio schedule) or after a certain amount of time has elapsed (an interval schedule).
- Furthermore, reinforcement may be given on a regular basis, such as after every 6th correct response or 30 seconds after a correct response (that is, fixed), or it may be unpredictable (that is, variable).
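As a rough sketch only (the function names and numbers are assumptions, not from the chapter), the fixed/variable and response-count/time-based distinctions above could be modelled like this:

```python
import random
import time

# Hypothetical sketches of the reinforcement schedules described above.

def fixed_ratio(n):
    """Reinforce after every n-th correct response (e.g. every 6th)."""
    count = 0
    def should_reinforce(correct_response=True):
        nonlocal count
        if correct_response:
            count += 1
            if count >= n:
                count = 0
                return True
        return False
    return should_reinforce

def variable_ratio(mean_n):
    """Reinforce after an unpredictable number of correct responses."""
    target = random.randint(1, 2 * mean_n)
    count = 0
    def should_reinforce(correct_response=True):
        nonlocal count, target
        if correct_response:
            count += 1
            if count >= target:
                count = 0
                target = random.randint(1, 2 * mean_n)
                return True
        return False
    return should_reinforce

def fixed_interval(seconds):
    """Reinforce the first correct response after a fixed time has elapsed."""
    last = time.monotonic()
    def should_reinforce(correct_response=True):
        nonlocal last
        if correct_response and time.monotonic() - last >= seconds:
            last = time.monotonic()
            return True
        return False
    return should_reinforce

# Continuous reinforcement is simply a fixed ratio of 1.
schedule = fixed_ratio(6)
presses = [schedule() for _ in range(12)]
print(presses)   # reinforced on the 6th and 12th correct responses
```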
Positive reinforcement
- The food pellet in the Skinner box is a positive reinforcer for the hungry rat pressing the lever.
- A positive reinforcer is a stimulus that strengthens or increases the likelihood of a desired response by providing a satisfying consequence (reward).
- Positive reinforcement occurs from giving or applying a positive reinforcer after the desired response has been made.
Negative reinforcement
- A negative reinforcer is any unpleasant or aversive stimulus that, when removed or avoided, strengthens or increases the likelihood of a desired response.
- Negative reinforcement is the removal or avoidance of an unpleasant stimulus. It has the effect of increasing the likelihood of a response being repeated.
- E.g. a Skinner box has a grid on the floor through which a mild electrical current can be passed continuously. The rat can feel the unpleasant foot shock (stimulus). When the rat presses the lever, the electric current is switched off and the mild shock is taken away.
- The removal of the shock (negative reinforcer) is referred to as negative reinforcement.
Distinction between positive and negative reinforcers
- Positive reinforcers are given and negative reinforcers are removed or avoided.
- Yet because both procedures lead to desirable consequences, each procedure strengthens (reinforces) the behaviour that produced the consequence.
- Examples of negative reinforcement in everyday life:
- Turning off a scary video
- Taking an aspirin to remove a headache
- Not drink-driving for fear of losing your licence
- In these examples, the removal or avoidance of the negative reinforcer provides a satisfying or desirable consequence.
A quick calculation
- Positive reinforcer (+): adding something pleasant
- Negative reinforcer (-): subtracting something unpleasant
Punishment
- Punishment is the delivery of an unpleasant stimulus following a response, or the removal of a pleasant stimulus following a response.
- It has the same unpleasant quality as a negative reinforcer, but the punishment is given or applied, whereas the negative reinforcer is removed or avoided.
- When closely associated with a response, punishment weakens the response, or decreases the probability of that response occurring again over time.
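To keep the two reinforcement procedures and the two forms of punishment straight, here is a tiny lookup sketch based on the definitions above. The wording of the labels is an illustrative assumption, not terminology from the chapter: adding something pleasant or removing something unpleasant strengthens a response, while adding something unpleasant or removing something pleasant weakens it.

```python
# Hypothetical summary of the definitions given in these slides:
# (what is done with the stimulus, how pleasant it is) -> effect on behaviour.

CONSEQUENCES = {
    ("add", "pleasant"):      "positive reinforcement (strengthens the response)",
    ("remove", "unpleasant"): "negative reinforcement (strengthens the response)",
    ("add", "unpleasant"):    "punishment (weakens the response)",
    ("remove", "pleasant"):   "punishment by removal (weakens the response)",
}

def classify(action, stimulus_quality):
    return CONSEQUENCES[(action, stimulus_quality)]

print(classify("add", "pleasant"))       # e.g. giving a food pellet
print(classify("remove", "unpleasant"))  # e.g. switching off the mild foot shock
print(classify("add", "unpleasant"))     # e.g. delivering an unpleasant stimulus
print(classify("remove", "pleasant"))    # e.g. taking away something enjoyed
```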
Factors that influence the effectiveness of reinforcement and punishment
- Reinforcement is intended to increase the likelihood of a behaviour being repeated, and punishment is intended to decrease the likelihood of a behaviour being repeated.
- In OC, what happens after the desired response is performed is very important in determining the strength of learning and the rate at which it occurs.
- E.g. when in the process of OC the consequence is presented (order), the time lapse between the response and the consequence (timing), and the appropriateness of the consequence used are all important in determining the effectiveness of reinforcement or punishment, and therefore learning.
Order of presentation
- For reinforcement and punishment to be used effectively, they must be presented after a desired response, never before.
- This ensures that an organism learns the consequences of a particular response.
- E.g. presenting a child with a lolly every time they use the toilet instead of their nappy while they are being toilet trained.
Timing
- Reinforcement and punishment are most effective when given immediately after the response has occurred.
- This allows an association to form between the response and the reinforcer or punisher.
- It also influences the strength of the response; e.g. if there is a delay, the learning will generally be very slow to progress and in some cases may not occur at all.
- This is easily controlled in a lab, but not as easily controlled in everyday life.
- E.g. there is a delay between studying hard in Year 12 and receiving your desired ENTER.
- Or a detention for misbehaviour may be given more than a day after the misdemeanour.
Appropriateness
- For any stimulus to be a reinforcer, it must provide a pleasing or satisfying consequence (reward) to its recipient.
- Technically, it will not be known whether something will act as a reinforcer until after it has been used.
- Also, it cannot be assumed that a reinforcer that works in one situation will work in another.
- Similarly, for any stimulus to be an appropriate punisher, it must provide a consequence that is unpleasant and therefore likely to decrease the likelihood of the undesirable behaviour.
- An inappropriate punisher can have the opposite effect and produce the same consequence as a reinforcer.
Key processes in operant conditioning
- Acquisition, extinction, spontaneous recovery, stimulus generalisation and stimulus discrimination are involved in both CC and OC; however, the way in which these processes occur is slightly different in operant conditioning.
Acquisition
- Refers to the overall learning process during which a specific response is established.
- Differs from acquisition in CC, as the means by which the behaviour is acquired is different: the types of behaviours acquired through OC are usually more complex than the reflexive, involuntary responses that became learned responses in CC.
- In OC, acquisition is the establishment of a response through reinforcement.
- The speed at which the response is established depends on whether continuous or partial reinforcement is used.
- Also, a gradual progression towards a more complex target behaviour can be achieved by reinforcing successive approximations. This is known as shaping.
Acquisition (cont.)
- Shaping is a procedure in which reinforcement is given for any response that successively approximates and ultimately leads to the final desired response, or target behaviour.
- Consequently, shaping is also known as the method of successive approximations.
- Skinner used shaping in one experiment where he set a target behaviour for a pigeon to turn a complete circle in an anticlockwise direction.
- He initially reinforced the pigeon with a food pellet, delivered through a mechanically operated door, every time it turned slightly to the left.
- He then waited until the pigeon turned further to the left before reinforcing it.
- By limiting the reinforcement only to those responses that gradually edged towards the target behaviour, Skinner was able to condition the pigeon to turn complete circles regularly.
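A minimal sketch of this successive-approximation idea, assuming a simulated "pigeon" whose turns are just random angles around a gradually shifting tendency. The model and all numbers are illustrative assumptions, not Skinner's actual procedure: the criterion for reinforcement is raised step by step from a slight left turn towards a full anticlockwise circle.

```python
import random

# Hypothetical shaping simulation: reinforce successive approximations of a
# full 360-degree anticlockwise turn.

tendency = 10.0          # average turn (degrees left) the simulated pigeon starts with
criterion = 20.0         # minimum turn that currently earns a food pellet
TARGET = 360.0

for trial in range(1, 501):
    turn = max(0.0, random.gauss(tendency, 30.0))   # this trial's turn
    if turn >= criterion:
        # Reinforcement strengthens (shifts the tendency towards) the reinforced turn.
        tendency += 0.2 * (turn - tendency)
        # Raise the bar: only reinforce responses closer to the target behaviour.
        criterion = min(TARGET, criterion + 5.0)

print(f"after shaping, typical turn is about {tendency:.0f} degrees "
      f"(criterion reached {criterion:.0f})")
```

Only responses that edge closer to the target are reinforced, so the typical turn drifts from a slight left turn towards the full circle, which is the logic the slide describes.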
Extinction
- In OC, extinction is the gradual decrease in the strength or rate of a conditioned (learned) response following consistent non-reinforcement of the response.
- It is said to occur when a conditioned response is no longer present.
- With OC, extinction occurs gradually over time once reinforcement is no longer given.
- E.g. when Skinner stopped reinforcing his rats or pigeons with food pellets, their conditioned response (e.g. lever pressing or turning circles) was eventually extinguished.
- Extinction is less likely to occur when partial reinforcement is used, i.e. when reinforcement does not regularly follow every correct response, as the uncertainty of the reinforcement leads to a greater tendency for the response to continue.
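As an illustrative sketch only (the decay model and numbers are assumptions, not data from Skinner), extinction can be pictured as response strength falling once reinforcement stops, with a partially reinforced response assumed to decline more slowly than a continuously reinforced one:

```python
# Hypothetical illustration of extinction: once reinforcement stops,
# response strength decays; partial reinforcement is assumed to decay
# more slowly, reflecting greater resistance to extinction.

def extinction_curve(initial_strength, decay_per_trial, trials):
    strength = initial_strength
    curve = []
    for _ in range(trials):
        strength *= (1 - decay_per_trial)   # no reinforcement is delivered
        curve.append(round(strength, 2))
    return curve

continuous = extinction_curve(initial_strength=1.0, decay_per_trial=0.20, trials=10)
partial    = extinction_curve(initial_strength=1.0, decay_per_trial=0.05, trials=10)

print("continuously reinforced, then extinguished:", continuous)
print("partially reinforced, then extinguished:   ", partial)
```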
Spontaneous recovery
- As in CC, extinction is often not permanent in OC.
- After the apparent extinction of the conditioned response, spontaneous recovery can occur and the organism will once again show the response in the absence of any reinforcement.
Stimulus generalisation
- This occurs when the correct response is made (usually at a reduced level) to another stimulus that is similar, but not necessarily identical, to the stimulus that was present when the conditioned response was reinforced.
- E.g. the sound of a car backfiring as it goes past an athletics carnival may cause the athletes to generalise this sound to that of the starter's pistol.
Stimulus discrimination
- In OC, stimulus discrimination occurs when an organism makes the correct response to a stimulus and is reinforced, but does not respond to any other stimulus, even when the stimuli are similar (but not identical).
- Skinner taught lab animals to discriminate between similar stimuli by reinforcing some responses but not others.
- E.g. a pigeon in a Skinner box could be taught to discriminate between a red and a green light by reinforcing the pigeon for pecking a target when the green light was illuminated, but not when the red one was.
- Also, sniffer dogs are used in airports throughout the world to detect the smuggling of contraband items (e.g. drugs).
- They have been taught this by OC.
Comparison of CC and OC
- The role of the learner, the timing of the stimulus and response, and the nature of the response.
Similarities of CC and OC
- Acquisition
- Extinction
- Spontaneous recovery
- Stimulus generalisation and discrimination
- Both types of conditioning are achieved as a
result of the repeated association of 2 events
that follow each other closely in time.
- These similarities have led some psychologists to believe that both OC and CC are variants of a single learning process.
- E.g. when Little Albert learned to fear the rat, his response (trembling) was CC. But when he learned to avoid the rat by crawling away (a response that had the effect of reducing his fear), that was an example of OC.
Differences between CC and OC
- OC
- Emphasis on the consequences of a response.
- Involves voluntary responses
- CC
- The behaviour of the learner does not have any environmental consequences.
- Response is involuntary.
The role of the learner
- In CC the learner is relatively passive when either the CS or the UCS is presented.
- In OC the learner must actively operate on the environment so as to obtain the reinforcement or the punishment.
Timing of the stimulus and response
- In CC the response depends on the presentation of the UCS occurring first.
- In OC the presentation of the reinforcer depends on the response occurring first.
- In CC, the timing of the two stimuli (CS, then UCS) produces an association between them that conditions the learner to anticipate the UCS and respond to it even if it is not presented.
- In OC, the association that is conditioned is between the stimulus and the response.
- In CC the timing of the two stimuli needs to be very close and the sequencing is vital: the CS must come before the UCS.
- In OC, while learning generally occurs faster when the reinforcement or punishment occurs soon after the response, there can be a considerable time difference between them.
The nature of the response
- In CC, the response by the learner is usually a reflexive, involuntary one.
- In OC, the response by the learner is usually a voluntary one.
- In CC, the response is often one involving the action of the autonomic nervous system, and the association of the two stimuli is often not a conscious or deliberate one.
- In OC, the response is more likely to involve the
central nervous system and to be conscious,
intentional and often goal-directed.