Instrumental and Operant Conditioning: Learning the Consequences of Behavior

FIGURE 5.5 Thorndike's Puzzle Box. This drawing illustrates the kind of puzzle box used in Thorndike's research. His cats learned to open the door and reach food by stepping on the pedal, but the learning occurred gradually. Some cats actually took longer to get out of the box on one trial than on a previous trial.

follow them, between behavior and its consequences. A child learns to say "please" to get a piece of candy. A headache sufferer learns to take a pill to escape the pain. A dog learns to "shake hands" to get a treat.

From the Puzzle Box to the Skinner Box

Edward L. Thorndike, an American psychologist, did much of the fundamental research on the consequences of behavior. While Pavlov was exploring classical conditioning, Thorndike was studying animals' intelligence, including their ability to solve problems. For example, he placed a hungry cat in a puzzle box like the one in Figure 5.5. The cat had to learn some response—such as stepping on a pedal—to unlock the door and get food. During the first few trials in the puzzle box, the cat explored and prodded until it finally hit the pedal. The animal eventually solved the puzzle, but very slowly. It did not appear to understand, or suddenly gain insight into, the problem (Thorndike, 1898). After many trials, though, the cat solved the puzzle quickly each time it was placed in the box. What was it learning? Thorndike argued that any response (such as pacing or meowing) that did not produce a satisfying effect (opening the door) gradually became weaker. And any response (pressing the pedal) that did have a satisfying effect gradually became stronger. The cat's learning, said Thorndike, is governed by the law of effect. According to this law, if a response made to a particular stimulus is followed by a satisfying effect (such as food or some other reward), that response is more likely to occur the next time the stimulus is present.
In contrast, responses that produce discomfort are less likely to be performed again. Thorndike described this kind of learning as instrumental conditioning, because responses are strengthened when they are instrumental in producing rewards (Thorndike, 1905).

About thirty years after Thorndike published his work, B. F. Skinner extended and formalized many of Thorndike's ideas. Skinner noted that, during instrumental conditioning, an organism learns a response by operating on the environment. So he used the term operant conditioning to refer to the learning process in which behavior is changed by its consequences, specifically by rewards and punishments. To study operant conditioning, Skinner devised some new tools. One of these was a small chamber that, despite Skinner's objections, came to be known as the Skinner box (see Figure 5.6).

180 Chapter 5 Learning

FIGURE 5.6 B. F. Skinner (1904–1990). In the operant conditioning chamber shown here, the rat can press a bar to obtain food pellets from a tube.

law of effect: A law stating that if a response made in the presence of a particular stimulus is rewarded, the same response is more likely to occur when that stimulus is encountered again.
operant conditioning: A process in which responses are learned on the basis of their rewarding or punishing consequences.

Basic Components of Operant Conditioning

The chamber Skinner designed allowed researchers to arrange relationships between a particular response and its consequences. If an animal pressed a lever in the chamber, for example, it might receive a food pellet. The researchers could then analyze how consequences affect behavior. It turned out that stimulus generalization, stimulus discrimination, extinction, spontaneous recovery, and other phenomena seen in classical conditioning also appear in operant conditioning. However, research in operant conditioning also focused on concepts known as operants, reinforcers, and discriminative stimuli.
Operant and instrumental conditioning are similar in most respects, but they do differ in one way. Instrumental conditioning is measured by how long it takes for a response (such as a bar press) to occur. Operant conditioning is measured by the rate at which responses occur. In this chapter, the term operant conditioning refers to both.

Operants and Reinforcers

Skinner coined the term operant, or operant response, to distinguish the responses in operant conditioning from those in classical conditioning. In classical conditioning, the conditioned response does not affect whether or when the stimulus occurs. Pavlov's dogs salivated when a tone sounded. The salivation had no effect on the tone or on whether food was presented. In contrast, an operant has some effect on the world. It is a response that operates on the environment. When a child says, "Momma, I'm hungry," and is then fed, the child has made an operant response that influences when food will appear.

A reinforcer is a stimulus that increases the probability that the operant behavior will occur again. There are two main types of reinforcers: positive and negative. Positive reinforcers strengthen a response if they are presented after that response occurs. The food given to a hungry pigeon after it pecks a key is a positive reinforcer for key pecking. For people, positive reinforcers can include food, smiles, money, and other desirable outcomes. Presentation of a positive reinforcer after a response is called positive reinforcement. Negative reinforcers are the removal of unpleasant stimuli, such as pain or noise. For example, the disappearance of a headache after you take a pain reliever acts as a negative reinforcer that makes you more likely to take that pain reliever in the future. When a response is strengthened by the removal of an unpleasant stimulus such as pain, the process is called negative reinforcement. Notice that reinforcement always increases the likelihood of the behavior that precedes it, whether the reinforcer is adding something pleasant or removing something unpleasant. Figure 5.7 shows this relationship.

operant: A response that has some effect on the world.
reinforcer: A stimulus event that increases the probability that the response immediately preceding it will occur again.
positive reinforcers: Stimuli that strengthen a response if they follow that response.
negative reinforcers: The removal of unpleasant stimuli.
escape conditioning: The process of learning responses that stop an aversive stimulus.
avoidance conditioning: The process of learning particular responses that avoid an aversive stimulus.

FIGURE 5.7 Positive and Negative Reinforcement. Remember that behavior is strengthened through positive reinforcement when something pleasant or desirable occurs following the behavior. Behavior is strengthened through negative reinforcement when the behavior results in the removal or termination of something unpleasant. To see how these principles apply in your own life, list two examples of situations in which your behavior was affected by positive reinforcement and two in which you were affected by negative reinforcement.
Positive reinforcement: Behavior (you put coins into a vending machine) is followed by presentation of a pleasant or positive stimulus (you receive a cold can of soda), so the frequency of the behavior increases (you put coins in vending machines in the future).
Negative reinforcement: Behavior (in the middle of a boring date, you say you have a headache) is followed by termination of an unpleasant stimulus (the date ends early), so the frequency of the behavior increases (you use the same tactic on future boring dates).

Escape and Avoidance Conditioning

The effects of negative reinforcement can be seen in both escape conditioning and avoidance conditioning. Escape conditioning occurs when we learn responses that stop an unpleasant stimulus. The left-hand panel of Figure 5.8 shows an example from an animal laboratory, but escape conditioning operates in humans, too. We not only learn to take pills to stop pain, but some parents learn to stop their child's annoying demands for a toy by agreeing to buy it. And television viewers learn to use the mute button to shut off obnoxious commercials. When an animal or a person responds to a signal in a way that avoids an aversive stimulus before it arrives, avoidance conditioning has occurred (see the right-hand sections of Figure 5.8).

FIGURE 5.8 A Shuttle Box. A shuttle box has two sections, usually separated by a barrier, and its floor is an electric grid. Shock can be administered through the grid to either section. The left-hand panel shows escape conditioning, in which an animal learns to get away from a mild shock by jumping over the barrier when the electricity is turned on. The next two panels show avoidance conditioning. Here, the animal has learned to avoid shock altogether by jumping over the barrier when it hears a warning buzzer just before shock occurs. Source: Adapted from Hintzman (1978).

Avoidance conditioning is an important influence on everyday behavior. We go to work even when we would rather stay in bed, we stop at red lights even when we are in a hurry, and we apologize for our mistakes even before they are discovered. Each of these behaviors helps us avoid a negative consequence, such as lost pay, a traffic ticket, or a scolding. Avoidance conditioning represents a marriage of classical and operant conditioning. If, as shown in Figure 5.8, a buzzer predicts shock (an unconditioned stimulus), the
buzzer becomes a conditioned stimulus (CS). Through classical conditioning, the buzzer (now a CS) triggers a conditioned fear response (CR). Like the shock itself, conditioned fear is an unpleasant internal sensation. The animal then learns the instrumental response of jumping the barrier. The instrumental response (jumping) is reinforced because it reduces fear. Once learned, avoidance is a difficult habit to break, because avoidance responses continue to be reinforced by fear reduction (Solomon, Kamin, & Wynne, 1953). So even if the shock is turned off in a shuttle box, animals may keep jumping when the buzzer sounds because they never discover that avoidance is no longer necessary. The same is often true of people. Those who avoid escalators out of fear never get a chance to find out that they are safe. Those with limited social skills may avoid potentially embarrassing social situations, but doing so also prevents them from learning how to be successful in those situations.

The study of avoidance conditioning has not only expanded our understanding of negative reinforcement but also has led some psychologists to consider more complex cognitive processes in operant learning. These psychologists suggest, for example, that in order for people to learn to avoid an unpleasant event (such as getting fired or paying a fine), they must have established an expectancy or other mental representation of that event. The role of such mental representations is emphasized in the cognitive theories of learning described later in this chapter.

Discriminative Stimuli and Stimulus Control

The consequences of our behavior often depend on the situation we are in. Most people know that a flirtatious comment about someone's appearance may be welcomed on a date, but not at the office. And even if you have been rewarded for telling jokes, you are not likely to do so at a funeral. In the language of operant conditioning, situations serve as discriminative stimuli, which are stimuli that signal whether reinforcement is available if a certain response is made. Stimulus discrimination occurs when an organism learns to make a particular response in the presence of one stimulus but not another (see Figure 5.9). The response is then said to be under stimulus control. Stimulus discrimination allows people or animals to learn what is appropriate (reinforced) and inappropriate (not reinforced) in particular situations.

FIGURE 5.9 Stimulus Discrimination. This rat could jump from a stand through any of three doors, but it was reinforced only if it jumped through the door that differed from the other two. The rat learned to do this quite well: On this occasion, it discriminated vertical from horizontal stripes.

discriminative stimuli: Stimuli that signal whether reinforcement is available if a certain response is made.

Stimulus generalization also occurs in operant conditioning. That is, an animal or person will perform a response in the presence of a stimulus that is similar to, but not exactly like, a stimulus that has signaled reinforcement in the past. The more similar the new stimulus is to the old one, the more likely it is that the response will be performed. Suppose you ate a wonderful meal at a restaurant called "Captain Jack's," which