The Greater Complexity of Eukaryotic Genomes Requires Elaborate Mechanisms for Gene Regulation

by taratuta

on 19-01-2017

III. Synthesizing the Molecules of Life
31. The Control of Gene Expression
31.1. Prokaryotic DNA-Binding Proteins Bind Specifically to Regulatory Sites in Operons
Figure 31.12. Helix-Turn-Helix Motif. These structures show three sequence-specific DNA-binding proteins that
interact with DNA through a helix-turn-helix motif (highlighted in yellow). In each case, the helix-turn-helix units
within a protein dimer are approximately 34 Å apart, corresponding to one full turn of DNA.
Figure 31.13. DNA Recognition Through β Strands. The structure of the methionine repressor bound to DNA reveals
that residues in β strands, rather than α helices, participate in the crucial interactions between the protein and
DNA.
31.2. The Greater Complexity of Eukaryotic Genomes Requires Elaborate Mechanisms for Gene Regulation
Gene regulation is significantly more complex in eukaryotes than in prokaryotes for a number of reasons. First, the
genome being regulated is significantly larger. The E. coli genome consists of a single, circular chromosome containing
4.6 Mb. This genome encodes more than 4000 proteins. In comparison, one of the simplest eukaryotes,
Saccharomyces cerevisiae (baker's yeast), contains 16 chromosomes ranging in size from 0.2 to 2.2 Mb (Figure 31.14).
The yeast genome totals 17 Mb and encodes approximately 6000 proteins. The genome within a human cell contains 23
pairs of chromosomes ranging in size from 50 to 250 Mb. Approximately 20,000 protein-coding genes are present within the 3000 Mb
of human DNA. It would be very difficult for a DNA-binding protein to recognize a unique site in this vast array of
DNA sequences. Consequently, more-elaborate mechanisms are required to achieve specificity.
Megabase (Mb)
A length of DNA consisting of 10⁶ base pairs (if double stranded) or
10⁶ bases (if single stranded).
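The difficulty can be made concrete with a rough estimate. Assuming, as an idealization, that a genome is random sequence with equal base frequencies, a particular k-bp site is expected to occur about 2N/4^k times by chance in N bp of double-stranded DNA (the factor of 2 counts both strands). The Python sketch below, whose function names are illustrative rather than taken from any source, finds the shortest site expected to occur less than once in each of the genome sizes quoted above.

```python
# Genome sizes quoted in the text, in megabases (1 Mb = 10**6 bp).
GENOME_SIZES_MB = {
    "E. coli": 4.6,
    "S. cerevisiae": 17,
    "H. sapiens": 3000,
}


def expected_occurrences(site_length: int, genome_bp: float) -> float:
    """Expected chance matches of one specific site in a random genome.

    Assumes equal base frequencies and counts both strands (factor of 2).
    """
    return 2 * genome_bp / 4 ** site_length


def min_unique_site_length(genome_bp: float) -> int:
    """Smallest site length k for which fewer than one chance match is expected."""
    k = 1
    while expected_occurrences(k, genome_bp) >= 1:
        k += 1
    return k


for name, size_mb in GENOME_SIZES_MB.items():
    print(f"{name}: {min_unique_site_length(size_mb * 1e6)} bp")
```

Under these assumptions a site must be about 12 bp long to be unique in E. coli, 13 bp in yeast, and 17 bp in the human genome. Real genomes are far from random, so the numbers indicate only the trend.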
Another source of complexity in eukaryotic gene regulation is the many different cell types present in most eukaryotes.
Liver and pancreatic cells, for example, differ dramatically in the genes that are highly expressed (see Table 31.1).
Moreover, eukaryotic genes are not generally organized into operons. Instead, genes that encode proteins for steps within
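a given pathway are often spread widely across the genome. Finally, transcription and translation are uncoupled in
eukaryotes (transcription takes place in the nucleus and translation in the cytoplasm), which eliminates regulatory mechanisms, such as attenuation, that depend on coupling the two processes.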
31.2.1. Nucleosomes Are Complexes of DNA and Histones
The DNA in eukaryotic chromosomes is not bare. Rather, eukaryotic DNA is tightly bound to a group of small basic
proteins called histones. In fact, histones constitute half the mass of a eukaryotic chromosome. The entire complex of a
cell's DNA and associated protein is called chromatin. Five major histones are present in chromatin: four histones, called
H2A, H2B, H3, and H4, associate with one another; the other histone is called H1. Histones have strikingly basic
properties because a quarter of the residues in each histone is either arginine or lysine.
In 1974, Roger Kornberg proposed that chromatin is made up of repeating units, each containing 200 bp of DNA and
two copies each of H2A, H2B, H3, and H4 (together called the histone octamer). These repeating units are known as nucleosomes.
Strong support for this model comes from the results of a variety of experiments, including observations of appropriately
prepared samples of chromatin viewed by electron microscopy (Figure 31.15). Chromatin viewed with the electron
microscope has the appearance of beads on a string; each bead has a diameter of approximately 100 Å. Partial digestion
of chromatin with DNase yields the isolated beads. These particles consist of fragments of DNA 200 bp in length
bound to the eight histones. More extensive digestion yields a shorter DNA fragment of 145 bp bound to the histone
octamer. This smaller complex of the histone octamer and the 145-bp DNA fragment is the nucleosome core particle.
The DNA connecting core particles in undigested chromatin is called linker DNA. Histone H1 binds, in part, to the linker
DNA.
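The digestion results imply simple bookkeeping for the nucleosome repeat. The sketch below assumes a uniform 200-bp repeat and a 145-bp core for every nucleosome (real repeat lengths vary among cell types and species) and computes the linker length and a rough nucleosome count for the 3000-Mb human genome quoted earlier. The constant and function names are illustrative.

```python
REPEAT_BP = 200           # DNA per repeating unit (partial DNase digestion)
CORE_BP = 145             # DNA protected in the core particle (extensive digestion)
HISTONES_PER_OCTAMER = 8  # two copies each of H2A, H2B, H3, and H4


def linker_length(repeat_bp: int = REPEAT_BP, core_bp: int = CORE_BP) -> int:
    """Linker DNA between adjacent core particles, in bp."""
    return repeat_bp - core_bp


def nucleosome_count(genome_bp: float, repeat_bp: int = REPEAT_BP) -> float:
    """Approximate number of nucleosomes needed to package genome_bp of DNA.

    Assumes the uniform repeat length applies across the whole genome.
    """
    return genome_bp / repeat_bp


n = nucleosome_count(3000e6)  # haploid human genome size from the text
print(linker_length())        # 55
print(f"{n:.1e} nucleosomes, {n * HISTONES_PER_OCTAMER:.1e} core histones")
```

The count covers one haploid genome and only the core histones; H1 is left out because it binds the linker rather than forming part of the octamer.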
31.2.2. Eukaryotic DNA Is Wrapped Around Histones to Form Nucleosomes
The overall structure of the nucleosome was revealed through electron microscopic and x-ray crystallographic studies
pioneered by Aaron Klug and his colleagues. More recently, the three-dimensional structure of a reconstituted
nucleosome core (Figure 31.16) was determined to relatively high resolution by x-ray diffraction methods. As was
shown by Evangelos Moudrianakis, the four types of histone that make up the protein core are homologous and similar
in structure (Figure 31.17). The eight histones in the core are arranged into an (H3)₂(H4)₂ tetramer and a pair of H2A-H2B dimers. The tetramer and dimers come together to form a left-handed superhelical ramp around which the DNA
wraps. In addition, each histone has an amino-terminal tail that extends out from the core structure. These tails are
flexible and contain a number of lysine and arginine residues. As we shall see, covalent modifications of these tails play
an essential role in modulating the affinity of the histones for DNA and other properties.
The DNA forms a left-handed superhelix as it wraps around the outside of the histone octamer. The protein core forms
contacts with the inner surface of the superhelix at many points, particularly along the phosphodiester backbone and the
minor groove of the DNA. Nucleosomes will form on almost all DNA sites, although some sequences are preferred
because the dinucleotide steps are properly spaced to favor bending around the histone core. Histone H1, which has a
different structure from the other histones, seals off the nucleosome at the location at which the linker DNA enters and
leaves the nucleosome. The amino acid sequences of histones, including their amino-terminal tails, are remarkably
conserved from yeast through human beings.
The winding of DNA around the nucleosome core contributes to DNA's packing by decreasing its linear extent. An
extended 200-bp stretch of DNA would have a length of about 680 Å. Wrapping this DNA around the histone octamer
reduces the length to approximately 100 Å along the long dimension of the nucleosome. Thus the DNA is compacted by
a factor of seven. However, human chromosomes in metaphase, which are highly condensed, are compacted by a factor
of 10⁴. Clearly, the nucleosome is just the first step in DNA compaction. What is the next step? The nucleosomes
themselves are arranged in a helical array approximately 360 Å across, forming a series of stacked layers approximately
110 Å apart (Figure 31.18). The folding of these fibers of nucleosomes into loops further compacts DNA.
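The compaction figures above follow from the 3.4-Å rise per base pair of B-DNA. A minimal sketch, treating successive levels of packing as multiplying (a simplifying assumption), reproduces the factor of about seven for the nucleosome and estimates how much additional folding the higher-order structures must supply to reach the metaphase value of 10⁴. Names are illustrative.

```python
RISE_PER_BP = 3.4  # Å per base pair in extended B-DNA


def extended_length(bp: int) -> float:
    """Contour length in Å of straight B-DNA."""
    return bp * RISE_PER_BP


def compaction_factor(bp: int, packed_length: float) -> float:
    """Ratio of extended length to packed length."""
    return extended_length(bp) / packed_length


nucleosome_fold = compaction_factor(200, 100.0)  # nucleosome long dimension ~100 Å
metaphase_fold = 1e4                              # overall metaphase compaction
# Assumption: packing levels multiply, so the rest must come from higher-order folding.
higher_order_fold = metaphase_fold / nucleosome_fold

print(extended_length(200))       # 680.0
print(round(nucleosome_fold, 1))  # 6.8
print(round(higher_order_fold))   # 1471
```

On this estimate, the nucleosome fiber and its loops must together account for roughly another 1500-fold of compaction.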
The writhing of DNA around the histone core in a left-handed helical manner also stores negative supercoils; if the DNA
in a nucleosome is straightened out, it will be underwound (Section 23.3.2). This underwinding is exactly what is needed
to separate the two DNA strands during replication and transcription (Sections 27.5 and 28.1.5).
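The stored underwinding can be expressed through the linking number (Lk = Tw + Wr, twist plus writhe) and the superhelical density σ = ΔLk/Lk₀, where Lk₀ is the linking number of relaxed DNA. The sketch below assumes two commonly cited values that are not given in this section: roughly one negative supercoil (ΔLk ≈ -1) restrained per nucleosome, and a relaxed helical repeat of 10.5 bp per turn. Under those assumptions it computes σ for one 200-bp repeat unit of closed DNA after the histones are removed. Names are illustrative.

```python
BP_PER_TURN = 10.5  # assumed helical repeat of relaxed B-DNA


def relaxed_linking_number(length_bp: float) -> float:
    """Lk0: number of helical turns in relaxed DNA of this length."""
    return length_bp / BP_PER_TURN


def superhelical_density(delta_lk: float, length_bp: float) -> float:
    """sigma = delta_Lk / Lk0; a negative value means the DNA is underwound."""
    return delta_lk / relaxed_linking_number(length_bp)


# Assumption: each nucleosome restrains roughly one negative supercoil, so in
# closed DNA, removing the histones from one 200-bp repeat unit leaves a
# linking deficit of about -1.
sigma = superhelical_density(-1.0, 200)
print(f"sigma per repeat unit: {sigma:.3f}")
```

A negative σ of this size means the freed DNA has a few percent fewer helical turns than relaxed DNA, which is the underwinding that favors strand separation.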
31.2.3. The Control of Gene Expression Requires Chromatin Remodeling
Does chromatin structure play a role in the control of gene expression? Early observations suggested that it does indeed.
The treatment of cell nuclei with the nonspecific DNA-cleaving enzyme DNase I revealed that regions adjacent to genes
that are being actively transcribed are more sensitive to cleavage than are other sites in the genome, suggesting that the
DNA in these regions is less compacted than it is elsewhere in the genome and more accessible to proteins. In addition,
some sites, usually within 1 kb of the start site of an active gene, are exquisitely sensitive to DNase I and other
nucleases. These hypersensitive sites correspond to regions that have few nucleosomes or have nucleosomes in an altered
conformational state. Hypersensitive sites are cell-type specific and developmentally regulated. For example, globin
genes in the precursors of erythroid cells from 20-hour-old chicken embryos are insensitive to DNase I. However, when
hemoglobin synthesis begins at 35 hours, regions adjacent to these genes become highly susceptible to digestion. In
tissues such as the brain that produce no hemoglobin, the globin genes remain resistant to DNase I throughout
development and into adulthood. The results of these studies suggest that a prerequisite for gene expression is a relaxing
of the chromatin structure.
Recent experiments even more clearly revealed the role of chromatin structure in regulating access to DNA binding sites.
Genes required for galactose utilization in yeast are activated by a DNA-binding protein called GAL4, which recognizes
DNA binding sites with two 5′-CGG-3′ sequences separated by 11 base pairs (Figure 31.19). Approximately 4000
potential GAL4 binding sites of the form 5′-CGG(N)₁₁CCG-3′ are present in the yeast genome, but only 10 of them
regulate genes necessary for galactose metabolism. What fraction of the potential binding sites are actually bound by
GAL4? This question is addressed through the use of a technique called chromatin immunoprecipitation (ChIP). GAL4
is first cross-linked to the DNA to which it is bound in chromatin. The DNA is then fragmented into small pieces, and
antibodies to GAL4 are used to isolate the chromatin fragments containing GAL4. The cross-linking is reversed, and the
DNA is isolated and characterized. The results of these studies reveal that only approximately 10 of the 4000 potential
GAL4 sites are occupied by GAL4 when the cells are growing on galactose; more than 99% of the sites appear to be
blocked. Thus, whereas in prokaryotes all sites appear to be equally accessible, chromatin structure shields a large
number of the potential binding sites in eukaryotic cells. GAL4 is thereby prevented from binding to sites that are
unimportant in galactose metabolism.
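The arithmetic behind the GAL4 numbers can be checked with a short sketch. It assumes random sequence with equal base frequencies, uses Python's `re` module with a lookahead (so overlapping matches are reported) to scan for the consensus 5′-CGG(N)₁₁CCG-3′, and uses the 17-Mb yeast genome size given earlier. Because the consensus is its own reverse complement, scanning one strand suffices, and only the six fixed bases enter the chance-match estimate. The function names and demo sequence are hypothetical.

```python
import re

# Consensus from the text: 5'-CGG(N)11CCG-3'. The lookahead reports overlapping
# matches. The pattern is its own reverse complement, so scanning the top
# strand also finds every site on the bottom strand.
GAL4_SITE = re.compile(r"(?=CGG[ACGT]{11}CCG)")


def find_gal4_sites(seq: str) -> list[int]:
    """0-based start positions of candidate GAL4 sites on the given strand."""
    return [m.start() for m in GAL4_SITE.finditer(seq.upper())]


def expected_chance_sites(genome_bp: float, fixed_bases: int = 6) -> float:
    """Expected chance matches in random sequence with equal base frequencies.

    Only the six fixed bases (CGG ... CCG) constrain the match. There is no
    factor of 2 for strands, because the palindromic consensus would be
    counted twice at the same position.
    """
    return genome_bp / 4 ** fixed_bases


demo = "TTT" + "CGG" + "A" * 11 + "CCG" + "TTT"
print(find_gal4_sites(demo))               # [3]
print(round(expected_chance_sites(17e6)))  # 4150

bound, potential = 10, 4000                # occupancy figures quoted in the text
print(f"unoccupied: {1 - bound / potential:.2%}")
```

The estimate of roughly 4150 chance matches agrees with the approximately 4000 potential sites cited above, and with about 10 of them bound, fewer than 1% are occupied.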
These lines of evidence and others reveal that chromatin structure is altered in active genes compared with inactive ones.
How is chromatin structure modified? As we shall see in Section 31.3.4, specific covalent modifications of histone
proteins are crucial. In addition, specific proteins that bind to DNA sequences called enhancers play a role.
31.2.4. Enhancers Can Stimulate Transcription by Perturbing Chromatin Structure
We can now understand the action of enhancers, already introduced in Section 28.2.6. Recall that these DNA sequences,
although they have no promoter activity of their own, greatly increase the activities of many promoters in eukaryotes,
even when the enhancers are located at a distance of several thousand base pairs from the gene being expressed.
Enhancers function by serving as binding sites for specific regulatory proteins (Figure 31.20). An enhancer is effective
only in the specific cell types in which appropriate regulatory proteins are expressed. In many cases, these DNA-binding
proteins influence transcription initiation by perturbing the local chromatin structure to expose a gene or its regulatory
sites rather than by direct interactions with RNA polymerase. This mechanism accounts for the ability of enhancers to act
at a distance.
The properties of enhancers are illustrated by studies of the enhancer controlling the muscle isoform of creatine kinase
(Section 14.1.5). The results of mutagenesis and other studies revealed the presence of an enhancer located between
1350 and 1050 base pairs upstream of the start site of the gene for this enzyme. Experimentally inserting this enhancer
near a gene not normally expressed in muscle cells is sufficient to cause the gene to be expressed at high levels in muscle
cells, but not other cells (Figure 31.21).
31.2.5. The Modification of DNA Can Alter Patterns of Gene Expression
The modification of DNA provides another mechanism, in addition to packaging with histones, for inhibiting
inappropriate gene expression in specific cell types. Approximately 70% of the 5′-CpG-3′ sequences in mammalian
genomes are methylated at the C-5 position of cytosine by specific methyltransferases. However, the distribution of these
methylated cytosines varies, depending on the cell type. Consider, again, the globin genes. In cells that are actively
expressing hemoglobin, the region from approximately 1 kb upstream of the start site of the β-globin gene to
approximately 100 bp downstream of the start site contains fewer 5-methylcytosine residues than does the corresponding
region in cells that do not express these genes. The relative absence of 5-methylcytosines near the start site is referred to
as hypomethylation. The methyl group of 5-methylcytosine protrudes into the major groove, where it could easily
interfere with the binding of proteins that stimulate transcription.
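The comparison of methylation near the β-globin start site amounts to counting methylated CpGs in a window. The sketch below assumes that per-site methylation calls are already available as a set of positions (the input format, function names, and demo data are hypothetical) and computes the fraction of CpGs methylated between 1 kb upstream and 100 bp downstream of a start site. Hypomethylation would show up as a lower fraction in expressing cells than in nonexpressing cells.

```python
import math


def cpg_positions(seq: str) -> list[int]:
    """0-based positions of the C in each 5'-CpG-3' dinucleotide."""
    s = seq.upper()
    return [i for i in range(len(s) - 1) if s[i:i + 2] == "CG"]


def methylated_fraction(seq: str, tss: int, methylated: set[int],
                        upstream: int = 1000, downstream: int = 100) -> float:
    """Fraction of CpGs methylated within [tss - upstream, tss + downstream].

    `tss` is the index of the start site in `seq`; `methylated` holds
    hypothetical positions of 5-methylcytosines. Returns NaN if the window
    contains no CpG.
    """
    lo, hi = tss - upstream, tss + downstream
    sites = [p for p in cpg_positions(seq) if lo <= p <= hi]
    if not sites:
        return math.nan
    return sum(p in methylated for p in sites) / len(sites)


# Toy example with a tiny window (upstream=4, downstream=2) so it fits on one line.
seq = "ACGTTCGAACG"
print(cpg_positions(seq))  # [1, 5, 9]
print(methylated_fraction(seq, tss=5, methylated={1}, upstream=4, downstream=2))  # 0.5
```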
The distribution of CpG sequences in mammalian genomes is not uniform. The deamination of 5-methylcytosine
produces thymine, so CpG sequences are prone to mutation to TpG. Many CpG sequences have been converted
into TpG through this mechanism. However, CpG sequences near the 5′ ends of genes have been maintained because of their role in
gene expression. Thus, the 5′ ends of many genes lie in CpG islands, regions of the genome that contain approximately four
times as many CpG sequences as does the remainder of the genome. Note that methylation is not a universal regulatory
device, even in multicellular eukaryotes. For example, Drosophila DNA is not methylated at all.
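CpG depletion and CpG islands are often quantified with an observed/expected CpG ratio, count(CG) × length / (count(C) × count(G)), a measure the text does not itself describe. The sketch below computes it and applies thresholds borrowed, as an assumption, from the widely used Gardiner-Garden and Frommer criteria (length of at least 200 bp, G+C content above 50%, observed/expected ratio above 0.6). It scores a whole sequence at once, whereas real island-finding tools scan sliding windows; the function names are illustrative.

```python
def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: count(CG) * length / (count(C) * count(G))."""
    s = seq.upper()
    c, g = s.count("C"), s.count("G")
    if c == 0 or g == 0:
        return 0.0
    # "CG" cannot overlap itself, so str.count gives the full dinucleotide count.
    cg = s.count("CG")
    return cg * len(s) / (c * g)


def looks_like_cpg_island(seq: str) -> bool:
    """Apply assumed Gardiner-Garden and Frommer-style thresholds to a whole sequence."""
    s = seq.upper()
    if len(s) < 200:
        return False
    gc_content = (s.count("G") + s.count("C")) / len(s)
    return gc_content > 0.5 and cpg_obs_exp(s) > 0.6


print(cpg_obs_exp("CGCGCGCG"))             # 2.0
print(looks_like_cpg_island("CG" * 100))   # True
print(looks_like_cpg_island("CAGT" * 50))  # False (G+C is exactly 50%)
```

Because deamination has depleted CpG throughout most of the genome, bulk genomic DNA scores well below 1 on this ratio, which is what makes the islands stand out.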
Figure 31.14. Yeast Chromosomes. Pulsed-field electrophoresis allows the separation of 16 yeast chromosomes. [From
G. Chu, D. Vollrath, and R. W. Davis. Science 234(1986):1583.]
Figure 31.15. Chromatin Structure. An electron micrograph of chromatin showing its "beads on a string" character.
[Courtesy of Dr. Ada Olins and Dr. Donald Olins.]
Figure 31.16. Nucleosome Core Particle. The structure consists of a core of eight histone proteins surrounded by DNA.
(A) A view showing the DNA wrapping around the histone core. (B) A view related to that in part A by a 90-degree rotation shows that the DNA forms a left-handed superhelix as it wraps around the core. (C) A schematic view.
Figure 31.17. Homologous Histones. Histones H2A, H2B, H3, and H4 each adopt a similar three-dimensional structure
as a consequence of common ancestry. Some parts of the tails present at the termini of the proteins are not shown.
Figure 31.18. Higher-Order Chromatin Structure. A proposed model for chromatin arranged in a helical array
consisting of six nucleosomes per turn of helix. The DNA double helix (shown in red) is wound around each histone
octamer (shown in blue). [After J. T. Finch and A. Klug. Proc. Natl. Acad. Sci. USA 73(1976):1900.]
Figure 31.19. GAL4 Binding Sites. The yeast transcription factor GAL4 binds to DNA sequences of the form 5′-CGG(N)₁₁CCG-3′. Two zinc-based domains are present in the DNA-binding region of this protein. These domains contact
the 5′-CGG-3′ sequences, leaving the center of the site uncontacted.