53 141 Genes in Pieces

by taratuta

on 19 January 2017

14.1 Genes in Pieces
its way into the cytoplasm before translation can begin.
This allows an interval between transcription and translation traditionally known as the posttranscriptional phase.
In this chapter we will see that most eukaryotic genes, in
contrast to typical bacterial genes, are interrupted by noncoding DNA. RNA polymerase cannot distinguish the coding region of the gene from the noncoding regions, so it
transcribes everything. Thus, the cell must remove the noncoding RNA from the original transcript, in a process called
splicing. Eukaryotes also tack special structures onto the
5′- and 3′-ends of their mRNAs. The 5′-structure is called a
cap, and the 3′-structure is a string of AMPs called poly(A).
All three of these events occur in the nucleus before the
mRNA emigrates to the cytoplasm, and it is becoming increasingly clear that all three occur before transcription is
over. Thus, it might be more correct to refer to them as
cotranscriptional, rather than posttranscriptional, events. To
avoid any confusion, we will refer to them as mRNA-processing events. It appears that all three of these events
are coordinated. We will return to this theme at the end of
Chapter 15, after we have studied splicing (this chapter) and
capping and polyadenylation (Chapter 15) in detail.
14.1 Genes in Pieces
If we expressed the sequence of the human β-globin gene as
a sentence, here is how it might look:
This is bhgty the human β-globin qwtzptlrbn gene.
Two regions (bhgty and qwtzptlrbn) within the gene obviously make
no sense: they contain sequences totally unrelated to the
globin coding sequences surrounding them. These are sometimes called intervening sequences, or IVSs, but they usually
go by the name Walter Gilbert gave them: introns. Similarly,
the parts of the gene that make sense are sometimes called
coding regions, or expressed regions, but Gilbert’s name for
them is more popular: exons. Some genes, especially in lower
eukaryotes, have no introns at all; others have an abundance.
The current record (362 introns) is held by the human titin
gene, which codes for a huge muscle protein.
Evidence for Split Genes
Consider the major late locus of adenovirus—the first place
introns were found, by Phillip Sharp and his colleagues in
1977. The adenovirus major late locus contains several
genes that are transcribed late in infection. These genes
encode structural proteins, such as hexon, one of the viral
coat proteins. Several lines of evidence converged at that
time to show that the genes of the adenovirus major late
locus are interrupted, but perhaps the easiest to understand
comes from studies using a technique called R-looping.
In R-looping experiments, RNA is hybridized to its DNA
template. In other words, the DNA template strands are
separated to allow a double-stranded hybrid to form between one of these strands and the RNA product. Such a
hybrid double-stranded polynucleotide is actually a bit more
stable than a double-stranded DNA under the conditions of
the experiment. After the hybrid forms, it is examined by
electron microscopy. These experiments can be done in two
basic ways: (1) using DNA whose two strands are separated
only enough to let the RNA hybridize or (2) completely separating the two DNA strands before hybridization. Sharp
and colleagues used the latter method, hybridizing single-stranded adenovirus DNA to mature mRNA for one of the
viral coat proteins: the hexon protein. Figure 14.1 shows
the results. (Do not be confused by the similarity between
the terms exon and hexon. They are not related.)
If the hexon gene had no introns, a smooth, linear hybrid would occur where the mRNA lined up with its DNA
template. But what if introns do occur in this gene? Clearly,
no introns are present in the mature mRNA, or they would
code for nonsense that would appear in the protein product. Therefore, introns are sequences that occur in the DNA
but are missing from mRNA. That means the hexon DNA
and hexon mRNA will not be able to form a smooth hybrid. Instead, the intron regions of the DNA will not find
counterparts in the mRNA and so will form unhybridized
loops. That is exactly what happened in the experiment
shown in Figure 14.1. The loops there are made of DNA,
but we still call them R loops because hybridization with
RNA caused them to form.
The electron micrograph shows an RNA–DNA hybrid
interrupted by three single-stranded DNA loops (labeled A,
B, and C). These loops represent the introns in the hexon
gene. Each loop is preceded by a short hybrid region, and
the last loop is followed by a long hybrid region. Thus, the
gene has four exons: three short ones near the beginning,
followed by one large one. The three short exons are transcribed into a leader region that appears at the 5′-end of
the hexon mRNA before the coding region; the long exon
contains the coding region of the gene. In fact, the major
late genes have different coding regions, but all share the
same leader region encoded in the same three short exons.
When we discover something as surprising as introns in
a virus, we wonder whether it is just a bizarre viral phenomenon that has no relationship to eukaryotic cellular processes. Thus, it was important to determine whether
eukaryotic cellular genes also have introns. One of the first
such demonstrations was an R-looping experiment done by
Pierre Chambon and colleagues, using the chicken ovalbumin gene. They observed six DNA loops of various sizes
that could not hybridize to the mRNA, so this gene contains
six introns spaced among seven exons. It is also interesting
that most of the introns were considerably longer than most
of the exons. This preponderance of introns is typical of
higher eukaryotic genes. Introns in lower eukaryotes such
as yeast tend to be shorter and much rarer.
So far we have discussed introns only in mRNA genes,
but some tRNA genes also have introns, and even rRNA
genes sometimes do. The introns in both these latter types of
genes are a bit different from those in mRNA genes. For
example, tRNA introns are relatively small, ranging in size
from 4 to about 50 bp long. Not all tRNA genes have introns;
those that do have only one, and it is adjacent to the DNA
bases corresponding to the anticodon of the tRNA. Genes in
mitochondria and chloroplasts can also have introns. Indeed,
these introns are some of the most interesting, as we will see.
SUMMARY Most higher eukaryotic genes coding for
mRNA and tRNA, and a few coding for rRNA, are interrupted by unrelated regions called introns. The other
parts of the gene, surrounding the introns, are called
exons; the exons contain the sequences that finally
appear in the mature RNA product. Genes for mRNAs
have been found with anywhere from zero to 362 introns. Transfer RNA genes have either zero or one.
RNA Splicing
Consider the problem introns pose. They are present in genes
but not in mature RNA. How is it that the information in
introns does not find its way into the mature RNA products
of the genes? The two main possibilities are: (1) The introns
are never transcribed; the polymerase somehow jumps from
one exon to the next and ignores the introns in between.
(2) The introns are transcribed, yielding a primary transcript,
an overlarge gene product that is cut down to size by removing the introns. As wasteful as it seems, the latter possibility
is the correct one. The process of cutting introns out of immature RNAs and stitching together the exons to form the
final product is called RNA splicing. The splicing process is
outlined in Figure 14.2, although, as we will see later in the
chapter, this picture is considerably oversimplified.
Figure 14.1 R-looping experiments reveal introns in adenovirus.
(a) Electron micrograph of a cloned fragment of adenovirus DNA
containing the 5′-part of the late hexon gene, hybridized to mature
hexon mRNA. The loops represent introns in the gene that cannot
hybridize to mRNA. (b) Interpretation of the electron micrograph,
showing the three intron loops (labeled A, B, and C), the hybrid (heavy
red line), and the unhybridized region of DNA upstream of the gene
(upper left). The fork at the lower right is due to the 3′-end of the
mRNA, which cannot hybridize because the 3′-end of the gene is not
included. Therefore, the mRNA forms intramolecular double-stranded
structures that have a forked appearance. (c) Linear arrangements of
the hexon gene, showing the three short leader exons, the two introns
separating them (A and B), and the long intron (C) separating the
leaders from the coding exon of the hexon gene. All exons are
represented by red boxes. (Source: (a) Berget, M., Moore, and Sharp, Spliced
segments at the 3′ terminus of adenovirus 2 late mRNA. Proceedings of the
National Academy of Sciences USA 74:3173, 1977.)
Gene (from the start of transcription): Exon 1, Intron 1, Exon 2, Intron 2, Exon 3
Transcription
Primary transcript: Exon 1, Intron 1, Exon 2, Intron 2, Exon 3
Splicing
Mature transcript: Exon 1, Exon 2, Exon 3
Figure 14.2 Outline of splicing. The introns in a gene are transcribed
along with the exons (colored boxes) in the primary transcript. Then
they are removed as the exons are spliced together.
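The cut-and-paste logic of splicing can be sketched in a few lines of Python. This is only an illustration, not a model of the splicing machinery. It reuses the sentence analogy from the beginning of this section (with "beta" spelled out), and the names span_of and splice are hypothetical, chosen for this sketch. The intron positions are simply handed to the function; how the cell finds them is a separate question.

```python
def span_of(text, segment):
    """Return the (start, end) span of segment in text (end is exclusive)."""
    start = text.index(segment)
    return (start, start + len(segment))


def splice(primary_transcript, intron_spans):
    """Remove the given intron spans and join the remaining exons in order."""
    exons = []
    position = 0
    for start, end in sorted(intron_spans):
        exons.append(primary_transcript[position:start])  # exon before this intron
        position = end                                     # skip over the intron
    exons.append(primary_transcript[position:])            # final exon
    return "".join(exons)


# The "gene" is transcribed in full, nonsense regions included.
primary = "This is bhgty the human beta-globin qwtzptlrbn gene."
introns = [span_of(primary, "bhgty "), span_of(primary, "qwtzptlrbn ")]

mature = splice(primary, introns)
print(mature)
```

The primary transcript keeps both nonsense regions; only the spliced product reads as a sensible sentence.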
How do we know splicing takes place? Actually, at the
time introns were discovered, circumstantial evidence to
support splicing already existed. A class of large nuclear
RNAs called heterogeneous nuclear RNA (hnRNA), widely
believed to be precursors to mRNA, had been found (Chapter 10). These hnRNAs are the right size (larger than
mRNAs) and have the right location (nuclear) to be unspliced mRNA precursors. Furthermore, hnRNA turns
over very rapidly, which means it is made and converted to
smaller RNAs quickly. This too suggested that these RNAs
are merely intermediates in the formation of more stable
RNAs. However, no direct evidence existed to show that
hnRNA could be spliced to yield mRNA.
The mouse β-globin mRNA and its precursor provided
an ideal place to look for such evidence. The mouse globin
mRNA precursor is a member of the hnRNA population. It
is found only in the nucleus, turns over very rapidly, and is
about twice as large (1500 bases) as mature globin mRNA
(750 bases). Also, mouse immature red blood cells make
so much globin (about 90% of their protein) that α- and
β-globin mRNAs are abundant and can be purified relatively easily; even their precursors exist in appreciable
quantities. This abundance made experiments feasible. Furthermore, the β-globin precursor is the right size to contain
both exons and introns. Charles Weissmann and Philip
Leder and their coworkers used R-looping to test the hypothesis that the precursor still contained the introns.
The experimental plan was to hybridize mature globin
mRNA, or its precursor, to the cloned globin gene, then
observe the resulting R loops (Figure 14.3). We know what
the results with the mature mRNA should be. Because this
RNA has no intron sequences, the introns in the gene will
loop out. On the other hand, if the precursor RNA still has
all the intron sequences, no such loops will form. That is
what happened. You may have a little difficulty recognizing
the structures in Figure 14.3 because this R-looping was
done with double-, instead of single-stranded, DNA. Thus,
the RNA hybridized to one of the DNA strands, displacing
the other. The precursor RNA gave a smooth, uninterrupted R-loop; the mature mRNA gave an R-loop interrupted by an obvious loop of double-stranded DNA, which
represents the large intron. The small intron was not visible
in this experiment. Notice that the term intron can be used
for intervening sequences in either DNA or RNA.
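The reasoning behind this experiment can be written down as a small Python sketch. It assumes an idealized hybridization in which every gene segment with a counterpart in the RNA pairs with it, and every segment without one loops out. The function name predict_loops and the segment labels are hypothetical, introduced only for illustration.

```python
def predict_loops(gene_segments, rna_has_introns):
    """Return the gene segments expected to loop out of the RNA-DNA hybrid.

    gene_segments: ordered list of (name, kind), kind being "exon" or "intron".
    rna_has_introns: True for an unspliced precursor, False for mature mRNA.
    """
    if rna_has_introns:
        # The precursor is a full copy of the gene, so nothing is left unpaired.
        return []
    # Mature mRNA lacks the introns, so each intron in the DNA has no partner.
    return [name for name, kind in gene_segments if kind == "intron"]


# Idealized layout of the mouse globin gene: two introns among three exons.
globin_gene = [
    ("exon 1", "exon"),
    ("small intron", "intron"),
    ("exon 2", "exon"),
    ("large intron", "intron"),
    ("exon 3", "exon"),
]

print(predict_loops(globin_gene, rna_has_introns=True))
print(predict_loops(globin_gene, rna_has_introns=False))
```

Note that the idealized prediction for mature mRNA includes the small intron, which was not visible in the actual experiment.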
SUMMARY Messenger RNA synthesis in eukaryotes
occurs in stages. The first stage is synthesis of the
primary transcription product, an mRNA precursor
that still contains introns copied from the gene, if
any were present. This precursor is part of a pool of
large nuclear RNAs called hnRNAs. The second
stage is mRNA maturation. Part of the maturation
of an mRNA precursor is the removal of its introns
in a process called splicing. This yields the mature-sized mRNA.
Splicing Signals
Consider the importance of accurate splicing. If too little
RNA is removed from an mRNA precursor, the mature
RNA will be interrupted by nonsense regions. If too much
is removed, important sequences may be left out.
Figure 14.3 Introns are transcribed. (a) R-looping experiment in
which the mouse globin mRNA precursor was hybridized to a cloned
mouse β-globin gene. A smooth hybrid formed, demonstrating that
the introns are represented in the mRNA precursor. (b) Similar
R-looping experiment in which mature mouse globin mRNA was used.
Here, the large intron in the gene looped out, showing that this intron
was no longer present in the mRNA. The small intron was not detected
in this experiment. In the interpretive drawings, the dotted black lines
represent RNA and the solid red lines represent DNA. (Source: Tilghman,
S., P. J. Curtis, D. C. Tiemeier, P. Leder, and C. Weissmann, The intervening
sequence of a mouse β-globin gene is transcribed within the 15S β-globin mRNA
precursor. Proceedings of the National Academy of Sciences USA 75:1312, 1978.)
Chapter 14 / RNA Processing I: Splicing
Given the importance of accurate splicing, signals must
occur in the mRNA precursor that tell the splicing machinery exactly where to “cut and paste.” What are these signals? One way to find out is to look at the base sequences
of a number of different genes, locate the intron boundaries, and see what sequences are common to all of them. In
principle, these common sequences could be part of the
signal for splicing. The most striking observation, first
made by Chambon, is that almost all introns in nuclear
mRNA precursors begin and end the same way:
exon/GU–intron–AG/exon
In other words, the first two bases in the intron of a transcript are GU and the last two are AG. This kind of conservation does not occur by accident; surely the GU–AG
motif is part of the signal that says, “Splice here.” However,
a typical intron will contain several GU’s and AG’s within
it. Why are these not used as splice sites? The answer is
that splicing signals are more complex than that. They
contain sequences at the exon-intron boundaries that extend beyond simply GU and AG, and they include a
“branchpoint” sequence near the 3′-end of the intron,
which we will discuss later in this chapter. Sequencing of
many genes has revealed the following mammalian consensus sequences:
5′-AG/GUAAGU–intron–YNCURAC–YnNYAG/G-3′
where the slashes denote the exon–intron borders, Y is either
pyrimidine (U or C), Yn denotes a string of about nine pyrimidines, R is either purine (A or G), the A in YNCURAC is a special A in the
“branchpoint” sequence within the intron, and N is any base.
The consensus sequences in yeast mRNA precursors are also
well studied, and a little different from those in mammals:
5′-/GUAUGU–intron–UACUAAC–YAG/-3′
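One way to get a feel for the mammalian consensus is to translate it into regular expressions and test candidate introns against it. The Python sketch below does this under several loud assumptions: it checks only the intron portion (GUAAGU at the start, YNCURAC somewhere inside, YnNYAG at the end) and ignores the flanking exon bases; it treats "about nine pyrimidines" as nine or more; and it demands an exact match, although real splice sites only approximate the consensus. The function names and the toy intron are hypothetical.

```python
import re

# Ambiguity symbols used in the consensus, written as RNA character classes.
SYMBOLS = {"Y": "[CU]", "R": "[AG]", "N": "[ACGU]"}


def consensus_to_regex(consensus):
    """Expand Y, R, and N into character classes; other bases stay literal."""
    return "".join(SYMBOLS.get(base, base) for base in consensus)


FIVE_PRIME_SITE = re.compile(consensus_to_regex("GUAAGU"))
BRANCHPOINT = re.compile(consensus_to_regex("YNCURAC"))
# Assumption: "about nine pyrimidines" is modeled as nine or more.
THREE_PRIME_SITE = re.compile("[CU]{9,}" + consensus_to_regex("NYAG") + "$")


def matches_consensus(intron):
    """True if an uppercase RNA intron fits all three consensus elements exactly."""
    return (
        FIVE_PRIME_SITE.match(intron) is not None      # intron starts with GUAAGU
        and BRANCHPOINT.search(intron) is not None     # branchpoint present
        and THREE_PRIME_SITE.search(intron) is not None  # Yn N YAG at the end
    )


# Toy intron built to fit the consensus; not a real sequence.
toy_intron = "GUAAGU" + "CCA" + "UACUAAC" + "GA" + "UUUUCCUUU" + "GCAG"
# The same intron with its first base changed from G to A.
mutant_intron = "A" + toy_intron[1:]

print(matches_consensus(toy_intron))
print(matches_consensus(mutant_intron))
```

A single base change at the start of the intron is enough to break the match in this strict model.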
Finding consensus sequences is one thing; showing that
they are really important is another. Several research groups
have found ample evidence supporting the importance of
these splice junction consensus sequences. Their experiments were of two basic types. In one, they mutated the
consensus sequences at the splice junctions in cloned genes,
then checked whether proper splicing still occurred. In
the other, they collected defective genes from human
patients with presumed splicing problems and examined
the genes for mutations near the splice junctions. Both
approaches gave the same answer: Disturbing the consensus sequences usually inhibits normal splicing.
Although the splice signals at the borders of an exon are
necessary, they are not sufficient to define an exon. We will
learn later in this chapter that the “branchpoint” sequence
near the end of an intron is also required for the next exon to
be recognized as such. Even all three consensus sequences are
not always sufficient. That is because many introns in higher
eukaryotes are enormous, ranging up to over 100 kb, and
they can contain many exon-size sequences that are bounded
by normal-looking splicing signals, including branchpoint
sequences. Yet somehow, these “pseudoexons” rarely if ever
get spliced into mature mRNAs. What sets the real exons
apart from these pseudoexons? Part of the answer is that
real exons tend to contain sequences known as exonic splicing enhancers (ESEs), which stimulate splicing, and pseudoexons tend to contain exonic splicing silencers (ESSs), which
inhibit splicing. We will discuss these phenomena more fully
later in this chapter.
SUMMARY The splicing signals in nuclear mRNA
precursors are remarkably uniform. The first two
bases of the intron are almost always GU, and the
last two are almost always AG. The 5′- and 3′-splice
sites have consensus sequences that extend beyond
the GU and AG motifs, and there is also a branchpoint consensus sequence. All three consensus sequences are important to proper splicing; when they
are mutated, abnormal splicing can occur.
Effect of Splicing on Gene Expression
It seems obvious that splicing introduces a degree of inefficiency into the gene expression process. Introns must be
transcribed, only to be immediately removed from pre-mRNAs and degraded. Moreover, inaccurate splicing can
disrupt an mRNA and lead to mistranslation. So it is fair to
ask why evolution has not eliminated splicing from eukaryotes. Indeed, introns are relatively rare and small in simple
eukaryotes like yeasts, but they are abundant and long—
typically much longer than exons—in higher eukaryotes,
including humans.
One reason that splicing may have evolved to become so
prominent in higher eukaryotes is that it actually facilitates
gene expression. In 2003, Shihua Lu and Bryan Cullen surveyed 10 human genes with and without introns in their
5′-untranslated regions and found that the introns improved
gene expression in every case—from a relatively modest
two-fold to about 35-fold in the case of the β-globin gene,
which actually depends on introns for efficient expression.
The advantage of introns comes from at least two sources:
They stimulate efficient mRNA 3′-end formation, and they
make translation more efficient.
It seems paradoxical that the presence or absence of introns could affect translation, as translation occurs in the
cytoplasm, long after the introns have been removed. But we
need to consider the fact that mRNAs do not exist as naked
RNAs. Rather, they are complexed with a wide variety of
proteins in the nucleus, and many of these proteins travel
with the mRNA as a messenger ribonucleoprotein (mRNP)