The Greater Complexity of Eukaryotic Genomes Requires Elaborate Mechanisms for Gene Regulation
by taratuta
III. Synthesizing the Molecules of Life 31. The Control of Gene Expression 31.1. Prokaryotic DNA-Binding Proteins Bind Specifically to Regulatory Sites in Operons
Figure 31.12. Helix-Turn-Helix Motif. These structures show three sequence-specific DNA-binding proteins that interact with DNA through a helix-turn-helix motif (highlighted in yellow). In each case, the helix-turn-helix units within a protein dimer are approximately 34 Å apart, corresponding to one full turn of DNA.
Figure 31.13. DNA Recognition Through β Strands. The structure of the methionine repressor bound to DNA reveals that residues in β strands, rather than α helices, participate in the crucial interactions between the protein and DNA.
31.2. The Greater Complexity of Eukaryotic Genomes Requires Elaborate Mechanisms for Gene Regulation
Gene regulation is significantly more complex in eukaryotes than in prokaryotes for a number of reasons. First, the genome being regulated is significantly larger. The E. coli genome consists of a single, circular chromosome containing 4.6 Mb. This genome encodes approximately 2000 proteins. In comparison, one of the simplest eukaryotes, Saccharomyces cerevisiae (baker's yeast), contains 16 chromosomes ranging in size from 0.2 to 2.2 Mb (Figure 31.14). The yeast genome totals 17 Mb and encodes approximately 6000 proteins. The genome within a human cell contains 23 pairs of chromosomes ranging in size from 50 to 250 Mb. Approximately 40,000 genes are present within the 3000 Mb of human DNA. It would be very difficult for a DNA-binding protein to recognize a unique site in this vast array of DNA sequences. Consequently, more elaborate mechanisms are required to achieve specificity.
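To get a rough sense of why a short recognition sequence cannot single out a unique site in a large genome, the following sketch estimates how often a site of a given length would occur by chance. It assumes, purely for illustration, a random sequence with equal base frequencies and a non-palindromic site counted on both strands. The genome sizes are the ones quoted in the text, and the function name `expected_chance_sites` is hypothetical.

```python
# Genome sizes in base pairs, as quoted in the text (1 Mb = 10**6 bp).
GENOME_SIZES_BP = {
    "E. coli": 4.6e6,
    "S. cerevisiae": 17e6,
    "human": 3000e6,
}


def expected_chance_sites(genome_bp: float, site_length: int) -> float:
    """Expected number of chance matches to one specific site.

    Assumption (not from the text): the genome is a random sequence with
    equal frequencies of A, C, G, and T, so a given site of length n
    matches at any position with probability (1/4)**n. Both strands are
    counted for a non-palindromic site, hence the factor of 2.
    """
    return 2 * genome_bp * 0.25 ** site_length


if __name__ == "__main__":
    for name, size in GENOME_SIZES_BP.items():
        for n in (6, 12, 16):
            hits = expected_chance_sites(size, n)
            print(f"{name:14s} {n:2d}-bp site: ~{hits:,.1f} chance matches")
```

Under these simplifying assumptions, a site that is long enough to be rare in a bacterial genome can still be expected to recur by chance many times in a genome of 3000 Mb, which is the difficulty the paragraph above describes.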
Megabase (Mb): A length of DNA consisting of 10^6 base pairs (if double stranded) or 10^6 bases (if single stranded).
Another source of complexity in eukaryotic gene regulation is the many different cell types present in most eukaryotes. Liver and pancreatic cells, for example, differ dramatically in the genes that are highly expressed (see Table 31.1). Moreover, eukaryotic genes are not generally organized into operons. Instead, genes that encode proteins for steps within a given pathway are often spread widely across the genome. Finally, transcription and translation are uncoupled in eukaryotes, eliminating some potential gene-regulatory mechanisms.
31.2.1. Nucleosomes Are Complexes of DNA and Histones
The DNA in eukaryotic chromosomes is not bare. Rather, eukaryotic DNA is tightly bound to a group of small basic proteins called histones. In fact, histones constitute half the mass of a eukaryotic chromosome. The entire complex of a cell's DNA and associated protein is called chromatin. Five major histones are present in chromatin: four histones, called H2A, H2B, H3, and H4, associate with one another; the other histone is called H1. Histones have strikingly basic properties because a quarter of the residues in each histone are either arginine or lysine.
In 1974, Roger Kornberg proposed that chromatin is made up of repeating units, each containing 200 bp of DNA and two copies each of H2A, H2B, H3, and H4, called the histone octamer. These repeating units are known as nucleosomes. Strong support for this model comes from the results of a variety of experiments, including observations of appropriately prepared samples of chromatin viewed by electron microscopy (Figure 31.15). Chromatin viewed with the electron microscope has the appearance of beads on a string; each bead has a diameter of approximately 100 Å. Partial digestion of chromatin with DNAse yields the isolated beads. These particles consist of fragments of DNA 200 bp in length bound to the eight histones.
More extensive digestion yields a reduced DNA fragment of 145 bp bound to the histone octamer. This smaller complex of the histone octamer and the 145-bp DNA fragment is the nucleosome core particle. The DNA connecting core particles in undigested chromatin is called linker DNA. Histone H1 binds, in part, to the linker DNA.
31.2.2. Eukaryotic DNA Is Wrapped Around Histones to Form Nucleosomes
The overall structure of the nucleosome was revealed through electron microscopic and x-ray crystallographic studies pioneered by Aaron Klug and his colleagues. More recently, the three-dimensional structure of a reconstituted nucleosome core (Figure 31.16) was determined to relatively high resolution by x-ray diffraction methods. As was shown by Evangelos Moudrianakis, the four types of histone that make up the protein core are homologous and similar in structure (Figure 31.17). The eight histones in the core are arranged into a (H3)2(H4)2 tetramer and a pair of H2A-H2B dimers. The tetramer and dimers come together to form a left-handed superhelical ramp around which the DNA wraps. In addition, each histone has an amino-terminal tail that extends out from the core structure. These tails are flexible and contain a number of lysine and arginine residues. As we shall see, covalent modifications of these tails play an essential role in modulating the affinity of the histones for DNA and other properties.
The DNA forms a left-handed superhelix as it wraps around the outside of the histone octamer. The protein core forms contacts with the inner surface of the superhelix at many points, particularly along the phosphodiester backbone and the minor groove of the DNA. Nucleosomes will form on almost all DNA sites, although some sequences are preferred because the dinucleotide steps are properly spaced to favor bending around the histone core.
Histone H1, which has a different structure from the other histones, seals off the nucleosome at the location at which the linker DNA enters and leaves the nucleosome. The amino acid sequences of histones, including their amino-terminal tails, are remarkably conserved from yeast through human beings.
The winding of DNA around the nucleosome core contributes to DNA's packing by decreasing its linear extent. An extended 200-bp stretch of DNA would have a length of about 680 Å. Wrapping this DNA around the histone octamer reduces the length to approximately 100 Å along the long dimension of the nucleosome. Thus the DNA is compacted by a factor of seven. However, human chromosomes in metaphase, which are highly condensed, are compacted by a factor of 10^4. Clearly, the nucleosome is just the first step in DNA compaction.
What is the next step? The nucleosomes themselves are arranged in a helical array approximately 360 Å across, forming a series of stacked layers approximately 110 Å apart (Figure 31.18). The folding of these fibers of nucleosomes into loops further compacts DNA.
The writhing of DNA around the histone core in a left-handed helical manner also stores negative supercoils; if the DNA in a nucleosome is straightened out, it will be underwound (Section 23.3.2). This underwinding is exactly what is needed to separate the two DNA strands during replication and transcription (Sections 27.5 and 28.1.5).
31.2.3. The Control of Gene Expression Requires Chromatin Remodeling
Does chromatin structure play a role in the control of gene expression? Early observations suggested that it does indeed. The treatment of cell nuclei with the nonspecific DNA-cleaving enzyme DNAse I revealed that regions adjacent to genes that are being actively transcribed are more sensitive to cleavage than are other sites in the genome, suggesting that the DNA in these regions is less compacted than it is elsewhere in the genome and more accessible to proteins.
In addition, some sites, usually within 1 kb of the start site of an active gene, are exquisitely sensitive to DNAse I and other nucleases. These hypersensitive sites correspond to regions that have few nucleosomes or have nucleosomes in an altered conformational state. Hypersensitive sites are cell-type specific and developmentally regulated. For example, globin genes in the precursors of erythroid cells from 20-hour-old chicken embryos are insensitive to DNAse I. However, when hemoglobin synthesis begins at 35 hours, regions adjacent to these genes become highly susceptible to digestion. In tissues such as the brain that produce no hemoglobin, the globin genes remain resistant to DNAse I throughout development and into adulthood. The results of these studies suggest that a prerequisite for gene expression is a relaxing of the chromatin structure.
Recent experiments even more clearly revealed the role of chromatin structure in regulating access to DNA binding sites. Genes required for galactose utilization in yeast are activated by a DNA-binding protein called GAL4, which recognizes DNA binding sites with two 5′-CGG-3′ sequences separated by 11 base pairs (Figure 31.19). Approximately 4000 potential GAL4 binding sites of the form 5′-CGG(N)11CCG-3′ are present in the yeast genome, but only 10 of them regulate genes necessary for galactose metabolism. What fraction of the potential binding sites are actually bound by GAL4? This question is addressed through the use of a technique called chromatin immunoprecipitation (ChIP). GAL4 is first cross-linked to the DNA to which it is bound in chromatin. The DNA is then fragmented into small pieces, and antibodies to GAL4 are used to isolate the chromatin fragments containing GAL4. The cross-linking is reversed, and the DNA is isolated and characterized.
The results of these studies reveal that only approximately 10 of the 4000 potential GAL4 sites are occupied by GAL4 when the cells are growing on galactose; more than 99% of the sites appear to be blocked. Thus, whereas in prokaryotes all sites appear to be equally accessible, chromatin structure shields a large number of the potential binding sites in eukaryotic cells. GAL4 is thereby prevented from binding to sites that are unimportant in galactose metabolism.
These lines of evidence and others reveal that chromatin structure is altered in active genes compared with inactive ones. How is chromatin structure modified? As we shall see in Section 31.3.4, specific covalent modifications of histone proteins are crucial. In addition, the binding of specific proteins to DNA sequences called enhancers at specific sites in the genome plays a role.
31.2.4. Enhancers Can Stimulate Transcription by Perturbing Chromatin Structure
We can now understand the action of enhancers, already introduced in Section 28.2.6. Recall that these DNA sequences, although they have no promoter activity of their own, greatly increase the activities of many promoters in eukaryotes, even when the enhancers are located at a distance of several thousand base pairs from the gene being expressed. Enhancers function by serving as binding sites for specific regulatory proteins (Figure 31.20). An enhancer is effective only in the specific cell types in which appropriate regulatory proteins are expressed. In many cases, these DNA-binding proteins influence transcription initiation by perturbing the local chromatin structure to expose a gene or its regulatory sites rather than by direct interactions with RNA polymerase. This mechanism accounts for the ability of enhancers to act at a distance.
The properties of enhancers are illustrated by studies of the enhancer controlling the muscle isoform of creatine kinase (Section 14.1.5).
The results of mutagenesis and other studies revealed the presence of an enhancer located between 1350 and 1050 base pairs upstream of the start site of the gene for this enzyme. Experimentally inserting this enhancer near a gene not normally expressed in muscle cells is sufficient to cause the gene to be expressed at high levels in muscle cells, but not other cells (Figure 31.21).
31.2.5. The Modification of DNA Can Alter Patterns of Gene Expression
The modification of DNA provides another mechanism, in addition to packaging with histones, for inhibiting inappropriate gene expression in specific cell types. Approximately 70% of the 5′-CpG-3′ sequences in mammalian genomes are methylated at the C-5 position of cytosine by specific methyltransferases. However, the distribution of these methylated cytosines varies, depending on the cell type. Consider, again, the globin genes. In cells that are actively expressing hemoglobin, the region from approximately 1 kb upstream of the start site of the β-globin gene to approximately 100 bp downstream of the start site contains fewer 5-methylcytosine residues than does the corresponding region in cells that do not express these genes. The relative absence of 5-methylcytosines near the start site is referred to as hypomethylation. The methyl group of 5-methylcytosine protrudes into the major groove where it could easily interfere with the binding of proteins that stimulate transcription.
The distribution of CpG sequences in mammalian genomes is not uniform. The deamination of 5-methylcytosine produces thymine, so CpG sequences are subject to mutation to TpG. Many CpG sequences have been converted into TpG through this mechanism. However, sites near the 5′ ends of genes have been maintained because of their role in gene expression. Thus, most genes are found in CpG islands, regions of the genome that contain approximately four times as many CpG sequences as does the remainder of the genome.
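One common way to quantify CpG enrichment in a stretch of sequence is the ratio of observed to expected CpG dinucleotides. The sketch below is an illustration under stated assumptions, not a method described in the text: the expected count is approximated as (number of C × number of G) / sequence length, and the function name `cpg_observed_expected` and both toy sequences are hypothetical.

```python
def cpg_observed_expected(seq: str) -> float:
    """Ratio of observed CpG dinucleotides to the number expected by chance.

    Assumption: the expected count is count(C) * count(G) / len(seq),
    which is what independent base frequencies would predict.
    Returns 0.0 when the sequence contains no C or no G.
    """
    seq = seq.upper()
    c_count = seq.count("C")
    g_count = seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    # "CG" cannot overlap with itself, so str.count finds every occurrence.
    observed = seq.count("CG")
    expected = c_count * g_count / len(seq)
    return observed / expected


if __name__ == "__main__":
    island_like = "CGCGGCGCATCGCGTACGCG"  # toy CpG-rich sequence (made up)
    depleted = "CTGCATGGACTGTCAGGCTA"     # toy sequence with no CpG (made up)
    print("island-like:", cpg_observed_expected(island_like))
    print("depleted:   ", cpg_observed_expected(depleted))
```

A ratio well below 1 is what the deamination argument above would predict for bulk mammalian DNA, whereas regions that have retained their CpG sequences would score higher.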
Note that methylation is not a universal regulatory device, even in multicellular eukaryotes. For example, Drosophila DNA is not methylated at all.
Figure 31.14. Yeast Chromosomes. Pulsed-field electrophoresis allows the separation of 16 yeast chromosomes. [From G. Chu, D. Wollrath, and R. W. Davis. Science 234(1986):1583.]
Figure 31.15. Chromatin Structure. An electron micrograph of chromatin showing its "beads on a string" character. [Courtesy of Dr. Ada Olins and Dr. Donald Olins.]
Figure 31.16. Nucleosome Core Particle. The structure consists of a core of eight histone proteins surrounded by DNA. (A) A view showing the DNA wrapping around the histone core. (B) A view related to that in part A by a 90-degree rotation shows that the DNA forms a left-handed superhelix as it wraps around the core. (C) A schematic view.
Figure 31.17. Homologous Histones. Histones H2A, H2B, H3, and H4 each adopt a similar three-dimensional structure as a consequence of common ancestry. Some parts of the tails present at the termini of the proteins are not shown.
Figure 31.18. Higher-Order Chromatin Structure. A proposed model for chromatin arranged in a helical array consisting of six nucleosomes per turn of helix. The DNA double helix (shown in red) is wound around each histone octamer (shown in blue). [After J. T. Finch and A. Klug. Proc. Natl. Acad. Sci. USA 73(1976):1900.]
Figure 31.19. Gal4 Binding Sites. The yeast transcription factor GAL4 binds to DNA sequences of the form 5′-CGG(N)11CCG-3′. Two zinc-based domains are present in the DNA-binding region of this protein. These domains contact the 5′-CGG-3′ sequences, leaving the center of the site uncontacted.
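Sites of the form 5′-CGG(N)11CCG-3′ can be located in a sequence with a simple pattern search. The following is a minimal sketch, assuming a plain DNA string as input; the function name `find_gal4_like_sites` and the toy sequence are hypothetical. Because this motif is its own reverse complement, scanning one strand is sufficient. The last lines repeat the occupancy arithmetic from the ChIP result reported in the text (about 10 of roughly 4000 potential sites).

```python
import re
from typing import List, Tuple

# CGG, any 11 bases, CCG. The lookahead allows overlapping sites to be found.
GAL4_LIKE = re.compile(r"(?=(CGG[ACGT]{11}CCG))")


def find_gal4_like_sites(seq: str) -> List[Tuple[int, str]]:
    """Return (0-based start, matched 17-mer) for each CGG(N)11CCG site."""
    seq = seq.upper()
    return [(m.start(), m.group(1)) for m in GAL4_LIKE.finditer(seq)]


if __name__ == "__main__":
    toy = "TTACGGAGTACTGTCCTCCGAATT"  # made-up sequence containing one site
    for start, site in find_gal4_like_sites(toy):
        print(start, site)

    # Occupancy arithmetic using the approximate counts quoted in the text.
    potential_sites = 4000
    occupied_sites = 10
    print(f"occupied fraction: {occupied_sites / potential_sites:.2%}")
```

A sequence match of this kind only identifies potential sites; as the ChIP experiment shows, whether a site is actually bound depends on chromatin structure, which a pattern search cannot capture.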