[Cartoon: Although the artist may not have intended it, this cartoon nicely illustrates one way in which discriminative stimuli can affect behavior. © The New Yorker Collection 1993 Tom Cheney from Cartoonbank.com. All Rights Reserved.]

Suppose, for example, that you once enjoyed a great meal at a restaurant that was decorated to look like the inside of a sailing ship. You might later be attracted to
other restaurants with nautical names or with interiors that look something like the
one where you had that great meal.
Stimulus generalization and stimulus discrimination complement each other. In one
study, for example, pigeons received food for pecking a key, but only when they saw
certain kinds of artwork (Watanabe, Sakamoto, & Wakita, 1995). As a result, these birds
learned to discriminate the works of the impressionist painter Claude Monet from those
of the cubist painter Pablo Picasso. Later, when the birds were shown new paintings by
other impressionist and cubist artists, they were able to generalize from the original
artists to other artists who painted in the same style, as if they had learned the conceptual categories of “impressionism” and “cubism.” We humans learn to place people
and objects into even more finely detailed categories, such as “honest,” “dangerous,” or
“tax deductible.” We discriminate one stimulus from another and then, through generalization, respond similarly to all those we perceive to be in a particular category. This
ability to respond in a similar way to all members of a category can save us considerable time and effort, but it can also lead to the development of unwarranted prejudice
against certain groups of people (see the chapter on social psychology).
Forming and Strengthening Operant Behavior
Your daily life is full of examples of operant conditioning. You go to movies, parties,
classes, and jobs primarily because doing so brings reinforcement. What is the effect of
the type or timing of your reinforcers? How do you learn new behaviors? How can you
get rid of old ones?
shaping The reinforcement of responses that come successively closer to some desired response.

Shaping Let’s say you want to train your dog, Sugar, to sit and “shake hands.” You
figure that positive reinforcement should work, so you decide to give Sugar a treat every
time she sits and shakes hands. But there’s a problem with this plan. Smart as Sugar is,
she may never make the desired response on her own, so you might never be able to
give the reinforcer. The way around this problem is to shape Sugar’s behavior. Shaping
is the process of reinforcing successive approximations—that is, responses that come successively closer to the desired behavior. For example, you might first give Sugar a treat
whenever she sits down. Next, you might reinforce her only when she sits and partially
lifts a paw. Finally, you might reinforce only complete paw lifting. Eventually, you
would require Sugar to perform the entire sit-lift-shake sequence before giving the treat.
Shaping is a powerful tool. Animal trainers have used it to teach chimpanzees to roller-skate, dolphins to jump through hoops, and pigeons to play Ping-Pong (Coren, 1999).
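To see the logic of shaping in another form, here is a minimal simulation sketch in Python. The learner, the criterion values, and the learning step are all illustrative assumptions, not anything from the text; the only part taken from the text is the rule itself: reinforce responses that come successively closer to the target, tightening the criterion as performance improves.

import random

criterion = 0.1   # at first, even a slight paw movement earns a treat
target = 0.9      # the complete desired response (a full "shake")
skill = 0.1       # the learner's current typical response strength

for trial in range(1, 61):
    # each response is a noisy value around the learner's current skill
    response = max(0.0, random.gauss(skill, 0.15))
    if response >= criterion:                      # a close-enough approximation
        skill = min(target, skill + 0.05)          # reinforcement strengthens it
        criterion = min(target, criterion + 0.05)  # then demand a closer one

print(f"final criterion {criterion:.2f}; learner now responds near {skill:.2f}")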
[Photo: GETTING THE HANG OF IT. Learning to eat with a spoon is, as you can see, a hit-and-miss process at first. However, this child will learn to hit the target more and more often as the food reward gradually shapes a more efficient, and far less messy, pattern of behavior.]

primary reinforcers Events or stimuli that satisfy physiological needs basic to survival.

secondary reinforcers Rewards that people or animals learn to like.

Secondary Reinforcement Operant conditioning often begins with the use of primary reinforcers, which are events or stimuli—such as food or water—that satisfy needs basic to survival. The effects of primary reinforcers are powerful and automatic.
But constantly giving Sugar food as a reward can disrupt training, because she will stop
to eat after every response. Also, once she gets full, food will no longer act as an effective reinforcer. To avoid these problems, animal trainers, parents, and teachers rely on
the principle of secondary reinforcement.
Secondary reinforcers are previously neutral stimuli that take on reinforcing properties after being paired with stimuli that are already reinforcing. In other words, they
are rewards that people or animals learn to like. If you say “Good girl!” just before feeding Sugar, those words will become reinforcing after a few pairings. “Good girl!” can
then be used alone to reinforce Sugar’s behavior. It helps if the words are again paired
with food every now and then. Does this remind you of classical conditioning? It should,
because the primary reinforcer (food) is an unconditioned stimulus. If the words “Good
girl!” become a reliable signal for food, they will act as a conditioned stimulus (CS). For
this reason, secondary reinforcers are sometimes called conditioned reinforcers.
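Because the text notes that this pairing process works like classical conditioning, one way to sketch it is with a simple delta-rule update of the kind used in Rescorla-Wagner-style conditioning models. The learning rate and the number of pairings below are illustrative assumptions, but the pattern matches the text: after a few pairings, the words alone carry reinforcing value.

value = 0.0        # current reinforcing value of the phrase "Good girl!"
food_value = 1.0   # value of the primary reinforcer (food) it is paired with
alpha = 0.3        # learning rate: how much each pairing matters

for pairing in range(1, 11):
    # each pairing moves the phrase's value toward the food's value
    value += alpha * (food_value - value)
    print(f"after pairing {pairing}: value = {value:.2f}")

# occasional re-pairing with food keeps this value from extinguishing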
The power of operant conditioning can be greatly increased by using secondary reinforcers. Consider the secondary reinforcer we call money. Some people will do almost
anything for it despite the fact that it tastes terrible and won’t quench your thirst. Its
reinforcing power lies in its association with the many rewards it can buy. Smiles and
other forms of social approval (such as the words “Good job!”) are also important secondary reinforcers for human beings. However, what becomes a secondary reinforcer
can vary widely from person to person and culture to culture. Tickets to a rock concert may be an effective secondary reinforcer for some people, but not for others. A
ceremony honoring outstanding job performance might be strongly reinforcing in an
individualist culture, but the same experience might be embarrassing for a person in a
collectivist culture, where group cooperation is given greater value than personal distinction (Miller, 2001). When carefully chosen, however, secondary reinforcers can build
or maintain behavior, even when primary reinforcement is absent for long periods.
Delay and Size of Reinforcement Much of our behavior is learned and maintained because it is regularly reinforced. But many of us overeat, smoke, drink too much, or procrastinate, even though we know these behaviors are bad for us. We may want to eliminate them, but they are hard to change. We seem to lack “self-control.” If behavior is controlled by its consequences, why do we do things that are
ultimately self-defeating?
Part of the answer lies in the timing of reinforcers. For example, the good feelings
(positive reinforcers) that follow excessive drinking are immediate. But because hangovers
and other negative consequences are usually delayed, their effects on future drinking are
weakened. In other words, operant conditioning is stronger when reinforcers appear soon
after a response occurs (Rachlin, 2000). Under some conditions, delaying a positive reinforcer for even a few seconds can decrease the effectiveness of positive reinforcement. (An
advantage of praise and other secondary reinforcers is that they can easily be delivered as
soon as the desired response occurs.) The size of the reinforcer is also important. In general, conditioning is faster when the reinforcer is large than when it is small.
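To make the timing argument concrete, behavioral researchers often describe the weakening effect of delay with a hyperbolic discounting formula, V = A / (1 + kD), where A is the size of a consequence, D is its delay, and k is how steeply value falls off with time. The sketch below applies that formula to the drinking example; the amounts, delays, and the value of k are illustrative assumptions, not figures from the text.

def discounted_value(amount, delay, k=0.5):
    """Subjective value V = A / (1 + k * D) of a consequence of size
    `amount` arriving after `delay` time units."""
    return amount / (1 + k * delay)

# the modest pleasure of excessive drinking is immediate...
print(discounted_value(amount=10, delay=0))     # 10.0
# ...while the larger hangover cost arrives many hours later
print(discounted_value(amount=-30, delay=12))   # about -4.3

# at the moment of choice, the immediate reinforcer outweighs the delayed
# punisher, which is why the self-defeating behavior persists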
Schedules of Reinforcement When a continuous reinforcement schedule is in
effect, a reinforcer is delivered every time a particular response occurs. This schedule
can be helpful when teaching someone a new skill, but it can be impractical in the long
run. Imagine how inefficient it would be, for example, if an employer had to deliver
praise or pay following every little task employees performed all day long. In most cases,
reinforcement is given only some of the time, on a partial, or intermittent, reinforcement schedule. Intermittent schedules are described in terms of when and how reinforcers are given. “When” refers to the number of responses that have to occur, or the
amount of time that must pass, before a reinforcer will occur. “How” refers to whether
the reinforcer will be delivered in a predictable or unpredictable way.
1. Fixed-ratio (FR) schedules provide reinforcement following a fixed number of
responses. Rats might receive food after every tenth time they press the lever
in a Skinner box (FR 10) or after every twentieth (FR 20). Computer helpline
technicians might be allowed to take a break after every fifth call they handle,
or every tenth.
2. Variable-ratio (VR) schedules also call for reinforcement after a certain number
of responses, but that number varies. As a result, it is impossible to predict which
particular response will bring reinforcement. On a VR 30 schedule, for example,
a rat will be reinforced after an average of thirty lever presses. This means that
the reward sometimes comes after ten presses, sometimes after fifteen, and other
times after fifty or more. Gambling offers humans a similar variable-ratio schedule. Slot machines pay off only after a frustratingly unpredictable number of lever
pulls, averaging perhaps one in twenty.
3. Fixed-interval (FI) schedules provide reinforcement for the first response that occurs
after some fixed time has passed since the last reward. On an FI 60 schedule, for
instance, the first response after sixty seconds has passed will be rewarded, regardless of how many responses have been made during that interval. Some radio
stations make use of fixed-interval schedules. Listeners who have won a call-in
contest might have to wait at least ten days before they are eligible to win again.
4. Variable-interval (VI) schedules reinforce the first response after some period of
time, but the amount of time varies unpredictably. So on a VI 60 schedule, the
first response that occurs after an average of 60 seconds is reinforced, but the
actual time between reinforcements might vary anywhere from 1 to 120 seconds
or more. Police in Illinois once used a VI schedule to encourage seat belt use.
They stopped drivers at random times and awarded prizes to those who were
buckled up (Mortimer et al., 1988). Kindergarten teachers, too, use VI schedules
when they give rewards to children who are in their seats when a chime sounds
at random intervals.
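Since each of the four schedules just described is simply a rule that decides whether a given response earns a reinforcer, they can be sketched as a short simulation. Everything below (the class names, the parameters, and the one-response-per-second assumption) is illustrative, not from the text.

import random

class FixedRatio:
    """FR n: reinforce every nth response."""
    def __init__(self, n):
        self.n, self.count = n, 0
    def respond(self, t):
        self.count += 1
        if self.count == self.n:
            self.count = 0
            return True
        return False

class VariableRatio:
    """VR n: reinforce after a number of responses that varies around n."""
    def __init__(self, n):
        self.n, self.count = n, 0
        self.required = random.randint(1, 2 * n)
    def respond(self, t):
        self.count += 1
        if self.count >= self.required:
            self.count, self.required = 0, random.randint(1, 2 * self.n)
            return True
        return False

class FixedInterval:
    """FI s: reinforce the first response after s seconds have passed."""
    def __init__(self, s):
        self.s, self.last = s, 0.0
    def respond(self, t):
        if t - self.last >= self.s:
            self.last = t
            return True
        return False

class VariableInterval:
    """VI s: like FI, but the required wait varies around s seconds."""
    def __init__(self, s):
        self.s, self.last = s, 0.0
        self.wait = random.uniform(1, 2 * s)
    def respond(self, t):
        if t - self.last >= self.wait:
            self.last, self.wait = t, random.uniform(1, 2 * self.s)
            return True
        return False

# one response per second for five minutes on each schedule
for schedule in (FixedRatio(10), VariableRatio(30),
                 FixedInterval(60), VariableInterval(60)):
    rewards = sum(schedule.respond(float(t)) for t in range(1, 301))
    print(f"{type(schedule).__name__}: {rewards} reinforcers in 300 responses")

Note how the ratio schedules deliver far more reinforcers for the same response rate, which is one way to see why they sustain such high rates of responding.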
As shown in Figure 5.10, different schedules of reinforcement produce different
patterns of responding (Skinner, 1961). Both fixed-ratio and variable-ratio schedules
produce especially high response rates overall. The reason, in both cases, is that the frequency of reward depends directly on the rate of responding. Under a fixed-interval
schedule, it does not matter how many responses are made during the time between
rewards. Because the timing of the reinforcement is so predictable, the rate of responding typically drops immediately after reinforcement and then increases as the time for
another reward approaches. When teachers schedule quizzes on the same day each
week, most students will study just before each quiz and then almost cease studying
immediately afterward. By contrast, the possibility of unpredictable “pop” quizzes (on
a variable-interval schedule) typically generates more consistent studying patterns
(Kouyoumdjian, 2004).

LEARN BY DOING: REINFORCEMENT SCHEDULES ON THE JOB
Make a list of all the jobs you have ever held, along with the reinforcement schedule on which you received your pay for each. Which of the four types of schedules (fixed ratio, fixed interval, variable ratio, or variable interval) was most common, and which was most satisfying to you?
partial reinforcement extinction effect A phenomenon in which behaviors learned under a partial reinforcement schedule are more difficult to extinguish than those learned on a continuous reinforcement schedule.
Schedules and Extinction Ending the relationship between an operant response
and its consequences weakens that response. In fact, failure to reinforce a response eventually extinguishes it. The response occurs less and less often and eventually may disappear. If you keep sending text messages to someone who never replies, you eventually stop trying. But extinction in operant conditioning does not erase learned
relationships (Delamater, 2004). If a discriminative stimulus for reinforcement reappears some time after an operant response has been extinguished, that response may
recur (spontaneously recover). And if again reinforced, the response will quickly return
to its former level, as though extinction had never happened.
In general, behaviors learned under a partial reinforcement schedule are far more
difficult to extinguish than those learned on a continuous reinforcement schedule.
This phenomenon is called the partial reinforcement extinction effect. Imagine,