Forming and Strengthening Operant Behavior

Instrumental and Operant Conditioning: Learning the Consequences of Behavior
[Cartoon caption: Although the artist may not have intended it, this cartoon nicely illustrates one way in which discriminative stimuli can affect behavior. © The New Yorker Collection 1993 Tom Cheney from Cartoonbank.com. All Rights Reserved.]
…was decorated to look like the inside of a sailing ship. You might later be attracted to
other restaurants with nautical names or with interiors that look something like the
one where you had that great meal.
Stimulus generalization and stimulus discrimination complement each other. In one
study, for example, pigeons received food for pecking a key, but only when they saw
certain kinds of artwork (Watanabe, Sakamoto, & Wakita, 1995). As a result, these birds
learned to discriminate the works of the impressionist painter Claude Monet from those
of the cubist painter Pablo Picasso. Later, when the birds were shown new paintings by
other impressionist and cubist artists, they were able to generalize from the original
artists to other artists who painted in the same style, as if they had learned the conceptual categories of “impressionism” and “cubism.” We humans learn to place people
and objects into even more finely detailed categories, such as “honest,” “dangerous,” or
“tax deductible.” We discriminate one stimulus from another and then, through generalization, respond similarly to all those we perceive to be in a particular category. This
ability to respond in a similar way to all members of a category can save us considerable time and effort, but it can also lead to the development of unwarranted prejudice
against certain groups of people (see the chapter on social psychology).
Forming and Strengthening Operant Behavior
Your daily life is full of examples of operant conditioning. You go to movies, parties,
classes, and jobs primarily because doing so brings reinforcement. What is the effect of
the type or timing of your reinforcers? How do you learn new behaviors? How can you
get rid of old ones?
shaping The reinforcement of responses that come successively closer
to some desired response.
Let’s say you want to train your dog, Sugar, to sit and “shake hands.” You
figure that positive reinforcement should work, so you decide to give Sugar a treat every
time she sits and shakes hands. But there’s a problem with this plan. Smart as Sugar is,
she may never make the desired response on her own, so you might never be able to
give the reinforcer. The way around this problem is to shape Sugar’s behavior. Shaping
is the process of reinforcing successive approximations—that is, responses that come successively closer to the desired behavior. For example, you might first give Sugar a treat
whenever she sits down. Next, you might reinforce her only when she sits and partially
lifts a paw. Finally, you might reinforce only complete paw lifting. Eventually, you
would require Sugar to perform the entire sit-lift-shake sequence before giving the treat.
Shaping is a powerful tool. Animal trainers have used it to teach chimpanzees to roller-skate, dolphins to jump through hoops, and pigeons to play Ping-Pong (Coren, 1999).
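For readers who like to see a procedure spelled out as code, shaping can be modeled as a loop that reinforces any response meeting a current criterion and then gradually raises that criterion toward the target behavior. This is only an illustrative toy model: the `shape` function, its numbers, and its update rule are invented for this sketch, not taken from the text.

```python
import random

def shape(target=1.0, start_criterion=0.2, step=0.01, trials=500, seed=1):
    """Toy model of shaping: reinforce any response that meets the current
    criterion, then move the criterion a little closer to the target."""
    rng = random.Random(seed)
    criterion = start_criterion
    behavior = 0.0          # the animal's typical response (e.g., how far a paw lifts)
    for _ in range(trials):
        response = behavior + rng.uniform(-0.3, 0.3)   # natural variation in responding
        if response >= criterion:                      # a successive approximation occurred
            behavior += 0.1 * (response - behavior)    # reinforcement shifts typical behavior
            criterion = min(target, criterion + step)  # raise the bar slightly
    return behavior, criterion

final_behavior, final_criterion = shape()
```

The ratcheting `criterion` plays the role of the trainer's shifting standard: first any sit earns a treat, then only a sit with a partial paw lift, and so on.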
Secondary Reinforcement Operant conditioning often begins with the use of primary reinforcers, which are events or stimuli—such as food or water—that satisfy needs basic to survival. The effects of primary reinforcers are powerful and automatic. But constantly giving Sugar food as a reward can disrupt training, because she will stop to eat after every response. Also, once she gets full, food will no longer act as an effective reinforcer. To avoid these problems, animal trainers, parents, and teachers rely on the principle of secondary reinforcement.

[Photo caption: …to eat with a spoon is, as you can see, a hit-and-miss process at first. However, this child will learn to hit the target more and more often as the food reward gradually shapes a more efficient, and far less messy, pattern of behavior.]
Secondary reinforcers are previously neutral stimuli that take on reinforcing properties after being paired with stimuli that are already reinforcing. In other words, they
are rewards that people or animals learn to like. If you say “Good girl!” just before feeding Sugar, those words will become reinforcing after a few pairings. “Good girl!” can
then be used alone to reinforce Sugar’s behavior. It helps if the words are again paired
with food every now and then. Does this remind you of classical conditioning? It should,
because the primary reinforcer (food) is an unconditioned stimulus. If the words “Good
girl!” become a reliable signal for food, they will act as a conditioned stimulus (CS). For
this reason, secondary reinforcers are sometimes called conditioned reinforcers.
The power of operant conditioning can be greatly increased by using secondary reinforcers. Consider the secondary reinforcer we call money. Some people will do almost anything for it despite the fact that it tastes terrible and won't quench their thirst. Its
reinforcing power lies in its association with the many rewards it can buy. Smiles and
other forms of social approval (such as the words “Good job!”) are also important secondary reinforcers for human beings. However, what becomes a secondary reinforcer
can vary widely from person to person and culture to culture. Tickets to a rock concert may be an effective secondary reinforcer for some people, but not for others. A
ceremony honoring outstanding job performance might be strongly reinforcing in an
individualist culture, but the same experience might be embarrassing for a person in a
collectivist culture, where group cooperation is given greater value than personal distinction (Miller, 2001). When carefully chosen, however, secondary reinforcers can build
or maintain behavior, even when primary reinforcement is absent for long periods.
Delay and Size of Reinforcement Much of our behavior is learned and maintained because it is regularly reinforced. But many of us overeat, smoke, drink too much, or procrastinate, even though we know these behaviors are bad for us. We may want to eliminate them, but they are hard to change. We seem to lack "self-control." If behavior is controlled by its consequences, why do we do things that are ultimately self-defeating?

Part of the answer lies in the timing of reinforcers. For example, the good feelings (positive reinforcers) that follow excessive drinking are immediate. But because hangovers and other negative consequences are usually delayed, their effects on future drinking are weakened. In other words, operant conditioning is stronger when reinforcers appear soon after a response occurs (Rachlin, 2000). Under some conditions, delaying a positive reinforcer for even a few seconds can decrease the effectiveness of positive reinforcement. (An advantage of praise and other secondary reinforcers is that they can easily be delivered as soon as the desired response occurs.) The size of the reinforcer is also important. In general, conditioning is faster when the reinforcer is large than when it is small.
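The weakening effect of delay can be captured with a simple discounting model. One standard formulation in the behavioral literature is hyperbolic discounting, in which a reinforcer's subjective value is divided by a function of its delay. The function and the numbers below are our illustrative assumptions, not taken from the text:

```python
def discounted_value(amount, delay, k=0.5):
    """Hyperbolic discounting: subjective value falls off as
    value = amount / (1 + k * delay), where k sets how steeply
    delay erodes a reinforcer's impact."""
    return amount / (1 + k * delay)

# Why drinking can win out over the hangover it causes:
pleasure_now = discounted_value(10, delay=0)            # full value: 10.0
hangover_later = discounted_value(30, delay=12, k=0.5)  # 30 / 7, roughly 4.3

# The hangover is objectively three times "bigger," but because it is
# delayed, its subjective weight at decision time is less than half
# that of the immediate pleasure.
```

This is one way to make the text's point concrete: a small immediate reinforcer can outweigh a large delayed punisher at the moment of choice.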
Schedules of Reinforcement When a continuous reinforcement schedule is in effect, a reinforcer is delivered every time a particular response occurs. This schedule can be helpful when teaching someone a new skill, but it can be impractical in the long run. Imagine how inefficient it would be, for example, if an employer had to deliver praise or pay following every little task employees performed all day long. In most cases, reinforcement is given only some of the time, on a partial, or intermittent, reinforcement schedule. Intermittent schedules are described in terms of when and how reinforcers are given. "When" refers to the number of responses that have to occur, or the amount of time that must pass, before a reinforcer will occur. "How" refers to whether the reinforcer will be delivered in a predictable or unpredictable way.

primary reinforcers Events or stimuli that satisfy physiological needs basic to survival.
secondary reinforcers Rewards that people or animals learn to like.
1. Fixed-ratio (FR) schedules provide reinforcement following a fixed number of
responses. Rats might receive food after every tenth time they press the lever
in a Skinner box (FR 10) or after every twentieth (FR 20). Computer helpline
technicians might be allowed to take a break after every fifth call they handle,
or every tenth.
2. Variable-ratio (VR) schedules also call for reinforcement after a certain number
of responses, but that number varies. As a result, it is impossible to predict which
particular response will bring reinforcement. On a VR 30 schedule, for example,
a rat will be reinforced after an average of thirty lever presses. This means that
the reward sometimes comes after ten presses, sometimes after fifteen, and other
times after fifty or more. Gambling offers humans a similar variable-ratio schedule. Slot machines pay off only after a frustratingly unpredictable number of lever
pulls, averaging perhaps one in twenty.
3. Fixed-interval (FI) schedules provide reinforcement for the first response that occurs after some fixed time has passed since the last reward. On an FI 60 schedule, for instance, the first response after sixty seconds has passed will be rewarded, regardless of how many responses have been made during that interval. Some radio stations make use of fixed-interval schedules. Listeners who have won a call-in contest might have to wait at least ten days before they are eligible to win again.
4. Variable-interval (VI) schedules reinforce the first response after some period of time, but the amount of time varies unpredictably. So on a VI 60 schedule, the first response that occurs after an average of 60 seconds is reinforced, but the actual time between reinforcements might vary anywhere from 1 to 120 seconds or more. Police in Illinois once used a VI schedule to encourage seat belt use. They stopped drivers at random times and awarded prizes to those who were buckled up (Mortimer et al., 1988). Kindergarten teachers, too, use VI schedules when they give rewards to children who are in their seats when a chime sounds at random intervals.

Make a list of all the jobs you have ever held, along with the reinforcement schedule on which you received your pay for each. Which of the four types of schedules (fixed ratio, fixed interval, variable ratio, or variable interval) was most common, and which was most satisfying to you?
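The delivery rules behind these schedules are simple enough to state as code. The sketch below is a hypothetical simulation (the function names and parameters are our own, not from the text): each schedule is a small function that, given a response at simulated time `t`, decides whether to deliver a reinforcer.

```python
import random

def fixed_ratio(n):
    """FR n: reinforce every n-th response."""
    state = {"count": 0}
    def respond(t):
        state["count"] += 1
        if state["count"] >= n:
            state["count"] = 0
            return True
        return False
    return respond

def variable_ratio(mean_n, rng):
    """VR mean_n: reinforce after an unpredictable number of
    responses that averages mean_n."""
    state = {"count": 0, "required": rng.randint(1, 2 * mean_n - 1)}
    def respond(t):
        state["count"] += 1
        if state["count"] >= state["required"]:
            state["count"] = 0
            state["required"] = rng.randint(1, 2 * mean_n - 1)
            return True
        return False
    return respond

def fixed_interval(seconds):
    """FI: reinforce the first response once `seconds` have elapsed
    since the last reinforcer."""
    state = {"last": 0.0}
    def respond(t):
        if t - state["last"] >= seconds:
            state["last"] = t
            return True
        return False
    return respond

def variable_interval(mean_seconds, rng):
    """VI: reinforce the first response after an unpredictable
    interval that averages mean_seconds."""
    state = {"last": 0.0, "wait": rng.uniform(0, 2 * mean_seconds)}
    def respond(t):
        if t - state["last"] >= state["wait"]:
            state["last"] = t
            state["wait"] = rng.uniform(0, 2 * mean_seconds)
            return True
        return False
    return respond

# One response per simulated second for 100 seconds:
fr10 = fixed_ratio(10)
fi20 = fixed_interval(20)
vr10 = variable_ratio(10, random.Random(0))

fr_rewards = sum(fr10(t) for t in range(1, 101))  # every 10th press: 10 rewards
fi_rewards = sum(fi20(t) for t in range(1, 101))  # every 20 seconds: 5 rewards
vr_rewards = sum(vr10(t) for t in range(1, 101))  # unpredictable, averaging ~1 in 10
```

Notice that the ratio schedules count responses while the interval schedules watch the clock, which is exactly why ratio schedules reward fast responding and interval schedules do not.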
As shown in Figure 5.10, different schedules of reinforcement produce different
patterns of responding (Skinner, 1961). Both fixed-ratio and variable-ratio schedules
produce especially high response rates overall. The reason, in both cases, is that the frequency of reward depends directly on the rate of responding. Under a fixed-interval
schedule, it does not matter how many responses are made during the time between
rewards. Because the timing of the reinforcement is so predictable, the rate of responding typically drops immediately after reinforcement and then increases as the time for
another reward approaches. When teachers schedule quizzes on the same day each
week, most students will study just before each quiz and then almost cease studying
immediately afterward. By contrast, the possibility of unpredictable “pop” quizzes (on
a variable-interval schedule) typically generates more consistent studying patterns
(Kouyoumdjian, 2004).
partial reinforcement extinction effect
A phenomenon in which behaviors
learned under a partial reinforcement
schedule are more difficult to extinguish than those learned on a continuous reinforcement schedule.
Schedules and Extinction Ending the relationship between an operant response
and its consequences weakens that response. In fact, failure to reinforce a response eventually extinguishes it. The response occurs less and less often and eventually may disappear. If you keep sending text messages to someone who never replies, you eventually stop trying. But extinction in operant conditioning does not erase learned
relationships (Delamater, 2004). If a discriminative stimulus for reinforcement reappears some time after an operant response has been extinguished, that response may
recur (spontaneously recover). And if again reinforced, the response will quickly return
to its former level, as though extinction had never happened.
In general, behaviors learned under a partial reinforcement schedule are far more
difficult to extinguish than those learned on a continuous reinforcement schedule.
This phenomenon is called the partial reinforcement extinction effect. Imagine,