Chapter 14 / RNA Processing I: Splicing
its way into the cytoplasm before translation can begin. This allows an interval between transcription and translation traditionally known as the posttranscriptional phase. In this chapter we will see that most eukaryotic genes, in contrast to typical bacterial genes, are interrupted by noncoding DNA. RNA polymerase cannot distinguish the coding region of the gene from the noncoding regions, so it transcribes everything. Thus, the cell must remove the noncoding RNA from the original transcript, in a process called splicing. Eukaryotes also tack special structures onto the 5′- and 3′-ends of their mRNAs. The 5′-structure is called a cap, and the 3′-structure is a string of AMPs called poly(A). All three of these events occur in the nucleus before the mRNA emigrates to the cytoplasm, and it is becoming increasingly clear that all three occur before transcription is over. Thus, it might be more correct to refer to them as cotranscriptional, rather than posttranscriptional, events. To avoid any confusion, we will refer to them as mRNA-processing events. It appears that all three of these events are coordinated. We will return to this theme at the end of Chapter 15, after we have studied splicing (this chapter) and capping and polyadenylation (Chapter 15) in detail.

14.1 Genes in Pieces

If we expressed the sequence of the human β-globin gene as a sentence, here is how it might look:

This is bhgty the human β-globin qwtzptlrbn gene.

Two regions within the gene (bhgty and qwtzptlrbn) obviously make no sense: they contain sequences totally unrelated to the globin coding sequences surrounding them. These are sometimes called intervening sequences, or IVSs, but they usually go by the name Walter Gilbert gave them: introns.
Similarly, the parts of the gene that make sense are sometimes called coding regions, or expressed regions, but Gilbert’s name for them is more popular: exons. Some genes, especially in lower eukaryotes, have no introns at all; others have an abundance. The current record (362 introns) is held by the human titin gene, which codes for a huge muscle protein.

Evidence for Split Genes

Consider the major late locus of adenovirus, the first place introns were found, by Phillip Sharp and his colleagues in 1977. The adenovirus major late locus contains several genes that are transcribed late in infection. These genes encode structural proteins, such as hexon, one of the viral coat proteins. Several lines of evidence converged at that time to show that the genes of the adenovirus major late locus are interrupted, but perhaps the easiest to understand comes from studies using a technique called R-looping.

In R-looping experiments, RNA is hybridized to its DNA template. In other words, the DNA template strands are separated to allow a double-stranded hybrid to form between one of these strands and the RNA product. Such a hybrid double-stranded polynucleotide is actually a bit more stable than double-stranded DNA under the conditions of the experiment. After the hybrid forms, it is examined by electron microscopy. These experiments can be done in two basic ways: (1) using DNA whose two strands are separated only enough to let the RNA hybridize or (2) completely separating the two DNA strands before hybridization. Sharp and colleagues used the latter method, hybridizing single-stranded adenovirus DNA to mature mRNA for one of the viral coat proteins: the hexon protein. Figure 14.1 shows the results. (Do not be confused by the similarity between the terms exon and hexon. They are not related.) If the hexon gene had no introns, a smooth, linear hybrid would form where the mRNA lined up with its DNA template. But what if introns do occur in this gene?
Clearly, no introns are present in the mature mRNA, or they would code for nonsense that would appear in the protein product. Therefore, introns are sequences that occur in the DNA but are missing from mRNA. That means the hexon DNA and hexon mRNA will not be able to form a smooth hybrid. Instead, the intron regions of the DNA will not find counterparts in the mRNA and so will form unhybridized loops. That is exactly what happened in the experiment shown in Figure 14.1. The loops there are made of DNA, but we still call them R loops because hybridization with RNA caused them to form. The electron micrograph shows an RNA–DNA hybrid interrupted by three single-stranded DNA loops (labeled A, B, and C). These loops represent the introns in the hexon gene. Each loop is preceded by a short hybrid region, and the last loop is followed by a long hybrid region. Thus, the gene has four exons: three short ones near the beginning, followed by one large one. The three short exons are transcribed into a leader region that appears at the 5′-end of the hexon mRNA before the coding region; the long exon contains the coding region of the gene. In fact, the major late genes have different coding regions, but all share the same leader region encoded in the same three short exons.

When we discover something as surprising as introns in a virus, we wonder whether it is just a bizarre viral phenomenon that has no relationship to eukaryotic cellular processes. Thus, it was important to determine whether eukaryotic cellular genes also have introns. One of the first such demonstrations was an R-looping experiment done by Pierre Chambon and colleagues, using the chicken ovalbumin gene.
They observed six DNA loops of various sizes that could not hybridize to the mRNA, so this gene contains six introns spaced among seven exons. It is also interesting that most of the introns were considerably longer than most of the exons. This preponderance of introns is typical of higher eukaryotic genes. Introns in lower eukaryotes such as yeast tend to be shorter and much rarer.

So far we have discussed introns only in mRNA genes, but some tRNA genes also have introns, and even rRNA genes sometimes do. The introns in both these latter types of genes are a bit different from those in mRNA genes. For example, tRNA introns are relatively small, ranging in size from 4 to about 50 bp. Not all tRNA genes have introns; those that do have only one, and it is adjacent to the DNA bases corresponding to the anticodon of the tRNA. Genes in mitochondria and chloroplasts can also have introns. Indeed, these introns are some of the most interesting, as we will see.

SUMMARY Most higher eukaryotic genes coding for mRNA and tRNA, and a few coding for rRNA, are interrupted by unrelated regions called introns. The other parts of the gene, surrounding the introns, are called exons; the exons contain the sequences that finally appear in the mature RNA product. Genes for mRNAs have been found with anywhere from zero to 362 introns. Transfer RNA genes have either zero or one.

[Figure 14.1 panels: (a) electron micrograph, 5′ to 3′; (b) interpretive drawing with intron loops A, B, and C and the hybrid; (c) linear map of the gene ending in the hexon coding exon]
Figure 14.1 R-looping experiments reveal introns in adenovirus. (a) Electron micrograph of a cloned fragment of adenovirus DNA containing the 5′-part of the late hexon gene, hybridized to mature hexon mRNA. The loops represent introns in the gene that cannot hybridize to mRNA. (b) Interpretation of the electron micrograph, showing the three intron loops (labeled A, B, and C), the hybrid (heavy red line), and the unhybridized region of DNA upstream of the gene (upper left). The fork at the lower right is due to the 3′-end of the mRNA, which cannot hybridize because the 3′-end of the gene is not included. Therefore, the mRNA forms intramolecular double-stranded structures that have a forked appearance. (c) Linear arrangement of the hexon gene, showing the three short leader exons, the two introns separating them (A and B), and the long intron (C) separating the leaders from the coding exon of the hexon gene. All exons are represented by red boxes. (Source: (a) Berget, S. M., C. Moore, and P. A. Sharp, Spliced segments at the 5′ terminus of adenovirus 2 late mRNA. Proceedings of the National Academy of Sciences USA 74:3173, 1977.)

RNA Splicing

Consider the problem introns pose. They are present in genes but not in mature RNA. How is it that the information in introns does not find its way into the mature RNA products of the genes? The two main possibilities are: (1) The introns are never transcribed; the polymerase somehow jumps from one exon to the next and ignores the introns in between. (2) The introns are transcribed, yielding a primary transcript, an overlarge gene product that is cut down to size by removing the introns. As wasteful as it seems, the latter possibility is the correct one. The process of cutting introns out of immature RNAs and stitching together the exons to form the final product is called RNA splicing. The splicing process is outlined in Figure 14.2, although, as we will see later in the chapter, this picture is considerably oversimplified.

[Figure 14.2: Gene (Exon 1, Intron 1, Exon 2, Intron 2, Exon 3; start of transcription marked) → transcription → primary transcript (Exon 1, Intron 1, Exon 2, Intron 2, Exon 3) → splicing → mature transcript (Exon 1, Exon 2, Exon 3)]
Figure 14.2 Outline of splicing. The introns in a gene are transcribed along with the exons (colored boxes) in the primary transcript. Then they are removed as the exons are spliced together.
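The logic of Figure 14.2, and of the R-looping experiments in Figure 14.1, can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a description of how the splicing machinery works. A gene is represented as a hypothetical list of labeled segments with made-up sequences. Splicing simply drops the intron segments and joins the exons. The predicted R-loop pattern assumes that every exon hybridizes to mature mRNA and every intron loops out. The names `Segment`, `transcribe`, `splice`, and `r_loop_pattern` are illustrative only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    kind: str  # "exon" or "intron" (toy labels, not real annotation)
    seq: str   # RNA-sense sequence of this segment (made up for illustration)


def transcribe(gene: List[Segment]) -> str:
    """RNA polymerase copies everything, introns included (primary transcript)."""
    return "".join(segment.seq for segment in gene)


def splice(gene: List[Segment]) -> str:
    """Drop the introns and join the exons in order (mature transcript)."""
    return "".join(segment.seq for segment in gene if segment.kind == "exon")


def r_loop_pattern(gene: List[Segment]) -> List[str]:
    """Predict, 5' to 3', what mature mRNA hybridized to the gene looks like.

    Exons pair with the mRNA ("hybrid"); introns have no partner ("loop").
    """
    return ["hybrid" if segment.kind == "exon" else "loop" for segment in gene]


# Hypothetical three-exon gene laid out as in Figure 14.2.
gene = [
    Segment("exon", "AUGGCC"),
    Segment("intron", "GUAAGUACUAACCCAG"),
    Segment("exon", "GAAACC"),
    Segment("intron", "GUAUGUUACUAACUAG"),
    Segment("exon", "UAA"),
]

primary = transcribe(gene)
mature = splice(gene)
pattern = r_loop_pattern(gene)
print(len(primary), len(mature))       # 47 15
print(mature)                          # AUGGCCGAAACCUAA
print(pattern.count("loop"), "loops")  # 2 loops
```

Reading the pattern the other way round is how Figure 14.1 was interpreted. Three loops, each preceded by a hybrid and followed at the end by a long hybrid, imply three introns and four exons. In the same way, the six loops in the ovalbumin experiment imply seven exons. Real R-loops also depend on hybridization conditions and on how much of the gene is present, and this sketch ignores both.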
How do we know splicing takes place? Actually, at the time introns were discovered, circumstantial evidence to support splicing already existed. A class of large nuclear RNAs called heterogeneous nuclear RNA (hnRNA), widely believed to be precursors to mRNA, had been found (Chapter 10). These hnRNAs are the right size (larger than mRNAs) and have the right location (nuclear) to be unspliced mRNA precursors. Furthermore, hnRNA turns over very rapidly, which means it is made and converted to smaller RNAs quickly. This too suggested that these RNAs are merely intermediates in the formation of more stable RNAs. However, no direct evidence existed to show that hnRNA could be spliced to yield mRNA.

The mouse β-globin mRNA and its precursor provided an ideal place to look for such evidence. The mouse globin mRNA precursor is a member of the hnRNA population. It is found only in the nucleus, turns over very rapidly, and is about twice as large (1500 bases) as mature globin mRNA (750 bases). Also, mouse immature red blood cells make so much globin (about 90% of their protein) that α- and β-globin mRNAs are abundant and can be purified relatively easily; even their precursors exist in appreciable quantities. This abundance made experiments feasible. Furthermore, the β-globin precursor is the right size to contain both exons and introns.

Charles Weissmann, Philip Leder, and their coworkers used R-looping to test the hypothesis that the precursor still contained the introns. The experimental plan was to hybridize mature globin mRNA, or its precursor, to the cloned globin gene, then observe the resulting R loops (Figure 14.3). We know what the results with the mature mRNA should be. Because this RNA has no intron sequences, the introns in the gene will loop out.
On the other hand, if the precursor RNA still has all the intron sequences, no such loops will form. That is what happened. You may have a little difficulty recognizing the structures in Figure 14.3 because this R-looping was done with double-stranded, instead of single-stranded, DNA. Thus, the RNA hybridized to one of the DNA strands, displacing the other. The precursor RNA gave a smooth, uninterrupted R-loop; the mature mRNA gave an R-loop interrupted by an obvious loop of double-stranded DNA, which represents the large intron. The small intron was not visible in this experiment. Notice that the term intron can be used for intervening sequences in either DNA or RNA.

SUMMARY Messenger RNA synthesis in eukaryotes occurs in stages. The first stage is synthesis of the primary transcription product, an mRNA precursor that still contains introns copied from the gene, if any were present. This precursor is part of a pool of large nuclear RNAs called hnRNAs. The second stage is mRNA maturation. Part of the maturation of an mRNA precursor is the removal of its introns in a process called splicing. This yields the mature-sized mRNA.

Splicing Signals

Consider the importance of accurate splicing. If too little RNA is removed from an mRNA precursor, the mature RNA will be interrupted by nonsense regions. If too much is removed, important sequences may be left out.

Figure 14.3 Introns are transcribed. (a) R-looping experiment in which the mouse globin mRNA precursor was hybridized to a cloned mouse β-globin gene. A smooth hybrid formed, demonstrating that the introns are represented in the mRNA precursor. (b) Similar R-looping experiment in which mature mouse globin mRNA was used. Here, the large intron in the gene looped out, showing that this intron was no longer present in the mRNA. The small intron was not detected in this experiment. In the interpretive drawings, the dotted black lines represent RNA and the solid red lines represent DNA. (Source: Tilghman, S., P. J.
Curtis, D. C. Tiemeier, P. Leder, and C. Weissmann, The intervening sequence of a mouse β-globin gene is transcribed within the 15S β-globin mRNA precursor. Proceedings of the National Academy of Sciences USA 75:1312, 1978.)

Given the importance of accurate splicing, signals must occur in the mRNA precursor that tell the splicing machinery exactly where to “cut and paste.” What are these signals? One way to find out is to look at the base sequences of a number of different genes, locate the intron boundaries, and see what sequences are common to all of them. In principle, these common sequences could be part of the signal for splicing. The most striking observation, first made by Chambon, is that almost all introns in nuclear mRNA precursors begin and end the same way:

exon/GU–intron–AG/exon

In other words, the first two bases in the intron of a transcript are GU and the last two are AG. This kind of conservation does not occur by accident; surely the GU–AG motif is part of the signal that says, “Splice here.” However, a typical intron will contain several GUs and AGs within it. Why are these not used as splice sites? The answer is that splicing signals are more complex than that. They contain sequences at the exon–intron boundaries that extend beyond simply GU and AG, and they include a “branchpoint” sequence near the 3′-end of the intron, which we will discuss later in this chapter. Sequencing of many genes has revealed the following mammalian consensus sequences:

5′-AG/GUAAGU–intron–YNCURAC–YnNYAG/G-3′

where the slashes denote the exon–intron borders, Y is either pyrimidine (U or C), Yn denotes a string of about nine pyrimidines, R is either purine (A or G), the A in YNCURAC is a special A at the branchpoint within the intron, and N is any base.
The consensus sequences in yeast mRNA precursors are also well studied, and a little different from those in mammals:

5′-/GUAUGU–intron–UACUAAC–YAG/-3′

Finding consensus sequences is one thing; showing that they are really important is another. Several research groups have found ample evidence supporting the importance of these splice junction consensus sequences. Their experiments were of two basic types. In one, they mutated the consensus sequences at the splice junctions in cloned genes, then checked whether proper splicing still occurred. In the other, they collected defective genes from human patients with presumed splicing problems and examined the genes for mutations near the splice junctions. Both approaches gave the same answer: Disturbing the consensus sequences usually inhibits normal splicing.

Although the splice signals at the borders of an exon are necessary, they are not sufficient to define an exon. We will learn later in this chapter that the “branchpoint” sequence near the end of an intron is also required for the next exon to be recognized as such. Even all three consensus sequences are not always sufficient. That is because many introns in higher eukaryotes are enormous, ranging up to over 100 kb, and they can contain many exon-size sequences that are bounded by normal-looking splicing signals, including branchpoint sequences. Yet somehow, these “pseudoexons” rarely if ever get spliced into mature mRNAs. What sets the real exons apart from these pseudoexons? Part of the answer is that real exons tend to contain sequences known as exonic splicing enhancers (ESEs), which stimulate splicing, and pseudoexons tend to contain exonic splicing silencers (ESSs), which inhibit splicing. We will discuss these phenomena more fully later in this chapter.

SUMMARY The splicing signals in nuclear mRNA precursors are remarkably uniform. The first two bases of the intron are almost always GU, and the last two are almost always AG.
The 5′- and 3′-splice sites have consensus sequences that extend beyond the GU and AG motifs, and there is also a branchpoint consensus sequence. All three consensus sequences are important to proper splicing; when they are mutated, abnormal splicing can occur.

Effect of Splicing on Gene Expression

It seems obvious that splicing introduces a degree of inefficiency into the gene expression process. Introns must be transcribed, only to be immediately removed from pre-mRNAs and degraded. Moreover, inaccurate splicing can disrupt an mRNA and lead to mistranslation. So it is fair to ask why evolution has not eliminated splicing from eukaryotes. Indeed, introns are relatively rare and small in simple eukaryotes like yeasts, but they are abundant and long (typically much longer than exons) in higher eukaryotes, including humans.

One reason that splicing may have evolved to become so prominent in higher eukaryotes is that it actually facilitates gene expression. In 2003, Shihua Lu and Bryan Cullen surveyed 10 human genes with and without introns in their 5′-untranslated regions and found that the introns improved gene expression in every case. The improvement ranged from a relatively modest twofold to about 35-fold in the case of the β-globin gene, which actually depends on introns for efficient expression. The advantage of introns comes from at least two sources: They stimulate efficient mRNA 3′-end formation, and they make translation more efficient. It seems paradoxical that the presence or absence of introns could affect translation, as translation occurs in the cytoplasm, long after the introns have been removed. But we need to consider the fact that mRNAs do not exist as naked RNAs. Rather, they are complexed with a wide variety of proteins in the nucleus, and many of these proteins travel with the mRNA as a messenger ribonucleoprotein (mRNP)