Basic Components of Operant Conditioning

by taratuta

Instrumental and Operant Conditioning: Learning the Consequences of Behavior
FIGURE 5.5
Thorndike’s Puzzle Box
This drawing illustrates the kind of puzzle
box used in Thorndike’s research. His cats
learned to open the door and reach food
by stepping on the pedal, but the learning
occurred gradually. Some cats actually
took longer to get out of the box on one
trial than on a previous trial.
follow them, between behavior and its consequences. A child learns to say “please” to
get a piece of candy. A headache sufferer learns to take a pill to escape the pain. A dog
learns to “shake hands” to get a treat.
From the Puzzle Box to the Skinner Box
Edward L. Thorndike, an American psychologist, did much of the fundamental research
on the consequences of behavior. While Pavlov was exploring classical conditioning,
Thorndike was studying animals’ intelligence, including their ability to solve problems.
For example, he placed a hungry cat in a puzzle box like the one in Figure 5.5. The cat
had to learn some response—such as stepping on a pedal—to unlock the door and get
food. During the first few trials in the puzzle box, the cat explored and prodded until
it finally hit the pedal. The animal eventually solved the puzzle, but very slowly. It did
not appear to understand, or suddenly gain insight into, the problem (Thorndike,
1898). After many trials, though, the cat solved the puzzle quickly each time it was
placed in the box. What was it learning? Thorndike argued that any response (such as
pacing or meowing) that did not produce a satisfying effect (opening the door) gradually became weaker. And any response (pressing the pedal) that did have a satisfying
effect gradually became stronger. The cat’s learning, said Thorndike, is governed by the
law of effect. According to this law, if a response made to a particular stimulus is followed by a satisfying effect (such as food or some other reward), that response is more
likely to occur the next time the stimulus is present. In contrast, responses that produce discomfort are less likely to be performed again. Thorndike described this kind
of learning as instrumental conditioning, because responses are strengthened when they
are instrumental in producing rewards (Thorndike, 1905).
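The law of effect can be illustrated with a small simulation. The sketch below is not Thorndike's procedure or data. It assumes a hypothetical cat with three possible responses whose strengths start out equal, a fixed increase in strength for the response that opens the door, and a small decrease for responses that do not. The response names, the step size, and the floor on strength are all illustrative assumptions.

```python
import random


def simulate_puzzle_box(trials=200, step=0.2, seed=0):
    """Toy model of the law of effect (illustrative assumptions only)."""
    rng = random.Random(seed)
    # All responses start equally strong; the names are hypothetical.
    strength = {"pace": 1.0, "meow": 1.0, "press_pedal": 1.0}
    attempts_per_trial = []
    for _ in range(trials):
        attempts = 0
        while True:
            attempts += 1
            responses = list(strength)
            weights = [strength[r] for r in responses]
            # Stronger responses are more likely to be chosen.
            choice = rng.choices(responses, weights=weights)[0]
            if choice == "press_pedal":
                # Satisfying effect (door opens): the response grows stronger.
                strength[choice] += step
                break
            # No satisfying effect: the response gradually weakens,
            # but never below a small floor.
            strength[choice] = max(0.05, strength[choice] - 0.1 * step)
        attempts_per_trial.append(attempts)
    return strength, attempts_per_trial


if __name__ == "__main__":
    final_strength, attempts = simulate_puzzle_box()
    print("Final response strengths:", final_strength)
    print("Mean attempts, first 20 trials:", sum(attempts[:20]) / 20)
    print("Mean attempts, last 20 trials:", sum(attempts[-20:]) / 20)
```

As in Thorndike's account, improvement in this toy model is gradual rather than sudden, and because each response is chosen at random, an individual trial can take longer than the one before it.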
About thirty years after Thorndike published his work, B. F. Skinner extended and
formalized many of Thorndike’s ideas. Skinner noted that, during instrumental conditioning, an organism learns a response by operating on the environment. So he used
the term operant conditioning to refer to the learning process in which behavior is
changed by its consequences, specifically by rewards and punishments. To study operant conditioning, Skinner devised some new tools. One of these was a small chamber
that, despite Skinner’s objections, came to be known as the Skinner box (see Figure 5.6).
law of effect A law stating that if a
response made in the presence of a particular stimulus is rewarded, the same
response is more likely to occur when
that stimulus is encountered again.
operant conditioning A process in
which responses are learned on the
basis of their rewarding or punishing
consequences.
Basic Components of Operant Conditioning
The chamber Skinner designed allowed researchers to arrange relationships between a particular response and its consequences. If an animal pressed a lever in the chamber, for
example, it might receive a food pellet. The researchers could then analyze how
consequences affect behavior. It turned out that stimulus generalization, stimulus discrimination, extinction, spontaneous recovery, and other phenomena seen in classical conditioning also appear in operant conditioning. However, research in operant conditioning
also focused on concepts known as operants, reinforcers, and discriminative stimuli.
Chapter 5
Learning
FIGURE 5.6
B. F. Skinner (1904–1990)
In the operant conditioning chamber
shown here, the rat can press a bar to obtain food pellets from a tube. Operant and
instrumental conditioning are similar in
most respects, but they do differ in one
way. Instrumental conditioning is measured by how long it takes for a response
(such as a bar press) to occur. Operant conditioning is measured by the rate at which
responses occur. In this chapter, the term
operant conditioning refers to both.
Skinner coined the term operant or operant response to
distinguish the responses in operant conditioning from those in classical conditioning.
In classical conditioning, the conditioned response does not affect whether or when the
stimulus occurs. Pavlov’s dogs salivated when a tone sounded. The salivation had no
effect on the tone or on whether food was presented. In contrast, an operant has some
effect on the world. It is a response that operates on the environment. When a child
says, “Momma, I’m hungry” and is then fed, the child has made an operant response
that influences when food will appear.
A reinforcer is a stimulus that increases the probability that the operant behavior
will occur again. There are two main types of reinforcers: positive and negative.
Positive reinforcers strengthen a response if they are presented after that response
occurs. The food given to a hungry pigeon after it pecks a key is a positive reinforcer
for key pecking. For people, positive reinforcers can include food, smiles, money, and
other desirable outcomes. Presentation of a positive reinforcer after a response is called
positive reinforcement. Negative reinforcers are the removal of unpleasant stimuli,
such as pain or noise. For example, the disappearance of a headache after you take a
pain reliever acts as a negative reinforcer that makes you more likely to take that pain
reliever in the future. When a response is strengthened by the removal of an unpleasant stimulus such as pain, the process is called negative reinforcement.
Notice that reinforcement always increases the likelihood of the behavior that precedes it, whether the reinforcer is adding something pleasant or removing something
unpleasant. Figure 5.7 shows this relationship.
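The distinction between the two types of reinforcement can be restated as a simple lookup. The sketch below is only a restatement of the definitions in this section. The function name and its parameters are hypothetical, and it deliberately returns None for combinations that this section does not describe as reinforcement.

```python
def classify_reinforcement(stimulus_is_pleasant, change):
    """Label a consequence that follows a response.

    stimulus_is_pleasant: True for a pleasant stimulus, False for an unpleasant one.
    change: "presented" if the stimulus appears after the response,
            "removed" if it is taken away after the response.
    """
    if change not in ("presented", "removed"):
        raise ValueError("change must be 'presented' or 'removed'")
    if stimulus_is_pleasant and change == "presented":
        # Something pleasant is added: the response is strengthened.
        return "positive reinforcement"
    if not stimulus_is_pleasant and change == "removed":
        # Something unpleasant is taken away: the response is also strengthened.
        return "negative reinforcement"
    # Other combinations are not covered by the definitions in this section.
    return None


if __name__ == "__main__":
    # A hungry pigeon receives food after pecking a key.
    print(classify_reinforcement(True, "presented"))
    # A headache disappears after taking a pain reliever.
    print(classify_reinforcement(False, "removed"))
```

Both labeled cases make the preceding response more likely; they differ only in whether a stimulus is added or taken away.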
Operants and Reinforcers
operant A response that has some effect on the world.
reinforcer A stimulus event that increases the probability that the response immediately preceding it will
occur again.
positive reinforcers Stimuli that
strengthen a response if they follow
that response.
negative reinforcers The removal of
unpleasant stimuli.
escape conditioning The process of
learning responses that stop an aversive
stimulus.
avoidance conditioning The process
of learning particular responses that
avoid an aversive stimulus.
The effects of negative reinforcement can
be seen in both escape conditioning and avoidance conditioning. Escape conditioning
occurs when we learn responses that stop an unpleasant stimulus. The left-hand panel
of Figure 5.8 shows an example from an animal laboratory, but escape conditioning
operates in humans, too. We not only learn to take pills to stop pain, but some parents
learn to stop their child’s annoying demands for a toy by agreeing to buy it. And television viewers learn to use the mute button to shut off obnoxious commercials.
When an animal or a person responds to a signal in a way that avoids an aversive
stimulus before it arrives, avoidance conditioning has occurred (see the right-hand
Escape and Avoidance Conditioning
FIGURE 5.7
Positive and Negative Reinforcement
Remember that behavior is
strengthened through positive
reinforcement when something
pleasant or desirable occurs following
the behavior. Behavior is strengthened
through negative reinforcement when the
behavior results in the removal or termination of something unpleasant. To see
how these principles apply in your own
life, list two examples of situations in
which your behavior was affected by positive reinforcement and two in which you
were affected by negative reinforcement.
POSITIVE REINFORCEMENT
Behavior: You put coins into a vending machine.
Presentation of a pleasant or positive stimulus: You receive a cold can of soda.
Frequency of behavior increases: You put coins in vending machines in the future.
NEGATIVE REINFORCEMENT
Behavior: In the middle of a boring date, you say you have a headache.
Termination of an unpleasant stimulus: The date ends early.
Frequency of behavior increases: You use the same tactic on future boring dates.
sections of Figure 5.8). Avoidance conditioning is an important influence on everyday
behavior. We go to work even when we would rather stay in bed, we stop at red lights
even when we are in a hurry, and we apologize for our mistakes even before they are
discovered. Each of these behaviors helps us avoid a negative consequence, such as lost
pay, a traffic ticket, or a scolding.
Avoidance conditioning represents a marriage of classical and operant conditioning.
If, as shown in Figure 5.8, a buzzer predicts shock (an unconditioned stimulus), the
FIGURE 5.8
A Shuttle Box
A shuttle box has two sections, usually separated by a barrier, and its floor is an electric grid. Shock can be administered through the grid to
either section. The left-hand panel shows escape conditioning, in which an animal learns to get away from a mild shock by jumping over the
barrier when the electricity is turned on. The next two panels show avoidance conditioning. Here, the animal has learned to avoid shock altogether by jumping over the barrier when it hears a warning buzzer just before shock occurs.
Panels: Escape conditioning; Avoidance conditioning
Source: Adapted from Hintzman (1978).
FIGURE 5.9
Stimulus Discrimination
This rat could jump from a stand through
any of three doors, but it was reinforced
only if it jumped through the door that
differed from the other two. The rat
learned to do this quite well: On this occasion, it discriminated vertical from horizontal stripes.
buzzer becomes a conditioned stimulus (CS). Through classical conditioning, the
buzzer (now a CS) triggers a conditioned fear response (CR). Like the shock itself, conditioned fear is an unpleasant internal sensation. The animal then learns the instrumental response of jumping the barrier. The instrumental response (jumping) is reinforced because it reduces fear.
Once learned, avoidance is a difficult habit to break, because avoidance responses
continue to be reinforced by fear reduction (Solomon, Kamin, & Wynne, 1953). So even
if the shock is turned off in a shuttle box, animals may keep jumping when the buzzer
sounds because they never discover that avoidance is no longer necessary. The same is
often true of people. Those who avoid escalators out of fear never get a chance to find
out that they are safe. Those with limited social skills may avoid potentially embarrassing social situations, but doing so also prevents them from learning how to be successful in those situations.
The study of avoidance conditioning has not only expanded our understanding of
negative reinforcement but has also led some psychologists to consider more complex
cognitive processes in operant learning. These psychologists suggest, for example, that
in order for people to learn to avoid an unpleasant event (such as getting fired or paying a fine), they must have established an expectancy or other mental representation of
that event. The role of such mental representations is emphasized in the cognitive theories of learning described later in this chapter.
The consequences of our behavior often depend on the situation we are in. Most people know that a flirtatious comment about someone’s appearance may be welcomed on a date, but not at the office.
And even if you have been rewarded for telling jokes, you are not likely to do so at a
funeral. In the language of operant conditioning, situations serve as discriminative
stimuli, which are stimuli that signal whether reinforcement is available if a certain
response is made. Stimulus discrimination occurs when an organism learns to make a
particular response in the presence of one stimulus but not another (see Figure 5.9).
The response is then said to be under stimulus control. Stimulus discrimination allows
people or animals to learn what is appropriate (reinforced) and inappropriate (not reinforced) in particular situations.
Stimulus generalization also occurs in operant conditioning. That is, an animal or
person will perform a response in the presence of a stimulus that is similar to, but not
exactly like, a stimulus that has signaled reinforcement in the past. The more similar
the new stimulus is to the old one, the more likely it is that the response will be performed. Suppose you ate a wonderful meal at a restaurant called “Captain Jack’s,” which
Discriminative Stimuli and Stimulus Control
discriminative stimuli Stimuli that signal whether reinforcement is available
if a certain response is made.