Review
Department of Biochemistry, Duke University Medical Center, Durham, North Carolina 27710, USA
| Abstract |
|---|
[Keywords: CTD; RNA polymerase II; cotranscriptional; nuclear organization; phosphorylation]
A feature of the CTD that was discovered early, and that clearly carries functional implications, was that it is subject to hyperphosphorylation. RNAPII can exist in a form with a highly phosphorylated CTD (subunit II0; RNAPII0) and a form with a nonphosphorylated CTD (subunit IIa; RNAPIIA) (for a review, see Dahmus and Dynan 1992
). Phosphorylation occurs principally on Ser2 and Ser5 of the repeats (Dahmus 1995
, 1996
), although these positions are not equivalent (West and Corden 1995
; Yuryev and Corden 1996
). A consequence of hyperphosphorylation is that the mobility in SDS gels of the II0 form of the largest subunit is markedly reduced relative to that of form IIa (e.g., see Greenleaf 1992
). Learning that RNAPII could exist in two forms led to efforts to understand functional differences between them. We now know that the phosphorylation state changes as RNAPII progresses through the transcription cycle.
Early results from Dahmus suggested that the initiating RNAPII was form IIA while the elongating enzyme was form II0 (Cadena and Dahmus 1987
; Payne et al. 1989
). In the meantime, the first CTD kinase (yeast CTDK-I) was purified (Lee and Greenleaf 1989
, 1991
; Sterner et al. 1995
) and used to prepare biochemical amounts of hyperphosphorylated recombinant CTD, which was then employed to generate and affinity purify antiphosphoCTD antibodies (Lee and Greenleaf 1991
; Weeks et al. 1993
). These antibodies were used in fluorescence microscopy to investigate the in vivo distribution of RNAPII0 on Drosophila polytene chromosomes. Consistent with the results from Dahmus, this approach demonstrated that sites of active transcription contained RNAPII0, whereas some inactive genes and promoter-proximal sites with paused polymerases contained RNAPIIA (Weeks et al. 1993
). Using a different cross-linking method and assay, Lis and colleagues (O'Brien et al. 1994
) observed the same distribution. These results reinforced the idea that promoter binding and early events are carried out by RNAPIIA, whereas elongation is carried out by RNAPII0 (Dahmus 1994
). Almost all subsequent experiments are consistent with this overall notion. However, it should be kept in mind that some genes may differ from this picture (e.g., Lee and Lis 1998
). An example of gene class-specific differences in CTD phosphorylation was already found in 1993: By immunofluorescence, elongating RNAPs on developmentally induced loci in Drosophila (ecdysone puffs on polytene chromosomes) were recognized exclusively as "II0" enzymes, whereas RNAPs on stress-induced loci (heat-shock puffs) were recognized as both "II0" and "IIA" forms (Weeks et al. 1993
).
It is very important to note that the "II0" designation simply indicates hyperphosphorylation of the CTD (as detected originally by mobility shift of the Rpb1 subunit); RNAPII0, however, is not necessarily a homogeneous population of molecules. While RNAPII0 does consist of RNAPs with hyperphosphorylated CTDs, the patterns of phosphorylation on individual CTDs can vary widely. This variation can be due to differential phosphorylation of Ser2 versus Ser5 residues and/or to differential phosphorylation of repeats along the length of the CTD. As expanded on below, modulating these patterns regulates the affinity of the CTD for its binding partners, and consequently different phosphorylation patterns present at different stages of transcription control the timely recruitment to transcribing RNAPII of factors important for RNA maturation and other events. Much of this review deals with what these CTD phosphorylation patterns may be, how they are created, and what their functional significance is. While the recent past has witnessed significant progress toward answering these questions, our hope is that this review will underscore the point that we have a great deal to learn about virtually every aspect of CTD phosphorylation and function.
| Patterns and consequences of CTD phosphorylation |
|---|
If the CTD is not required for catalyzing the synthesis of RNA chains, what does it do? While its purpose was nebulous for some time, it is now clear that a major function of the CTD is to serve as a binding scaffold for a variety of nuclear factors. Since the activities of bound factors become physically associated with RNAPII, the processes they represent become linked to this transcriptase. The early proposal that the PCTD (phosphoCTD) physically links pre-mRNA processing to transcription by tethering processing factors to elongating RNAPII (Corden 1990
; Greenleaf 1993
) has been borne out experimentally over the last decade (for reviews, see Corden and Patturajan 1997
; Goldstrohm et al. 2001
; Maniatis and Reed 2002
; Proudfoot et al. 2002
). We now understand that the PCTD, via its recruitment of PCTD-binding factors, plays a major role in coordinating a number of nuclear processes with RNA chain synthesis and the translocation of RNAPII along a gene.
The role of CTD phosphorylation in facilitating pre-mRNA processing has thus far been best characterized for 5'-end capping and 3'-end cleavage and polyadenylation. The 7-methyl G5'ppp5'N cap is added when the transcript is ~25 bases long, soon after its 5' end emerges from the exit channel of RNAPII (Jove and Manley 1984
; Rasmussen and Lis 1993
). The observations that acquisition of such a cap is unique to RNAPII transcripts (Shatkin 1976) and that transcripts made by a CTD-less RNAPII are very inefficiently capped (McCracken et al. 1997a) suggested that capping enzyme associates with the transcription complex via interactions with the CTD. An exploration of this hypothesis led to the finding that capping enzyme indeed associates physically with the CTD of RNAPII in vitro (Cho et al. 1997
; McCracken et al. 1997a
). Subsequent cross-linking studies (chromatin immunoprecipitation, or ChIP) showed that capping enzyme also associates with transcribed genes in vivo, in a manner that requires CTD phosphorylation; consistent with 5' capping being an early event in the life of a nascent transcript, capping enzyme localizes to genes near their 5' ends (Komarnitsky et al. 2000
; Schroeder et al. 2000
).
Analogous to capping, the formation of 3' ends of messages is also coupled to transcription by RNAPII through interactions between the CTD and the processing machinery (for a review, see Proudfoot 2004
). Attempts to uncover the biochemical basis of this functional link led to the finding that cleavage and polyadenylation factors bind to the PCTD in vitro (McCracken et al. 1997b
; Birse et al. 1998
). ChIP experiments reveal that CF IA, a factor involved in 3'-end formation, accumulates toward the 3' ends of genes (Licatalosi et al. 2002
), and its cross-linking is dependent on CTD phosphorylation (Licatalosi et al. 2002
; Ahn et al. 2004
).
CTD phosphorylation patterns along genes
With the advent of ChIP it became feasible to explore the phosphorylation status of the CTD on RNAPs at different positions along a transcription unit. The commercial availability of anti-CTD monoclonal antibodies (mAbs) with phosphorylation pattern-dependent specificities helped spur these studies. A major finding was that phosphorylation of Ser5 residues predominates near the beginning of genes, whereas polymerases near the ends of genes are extensively phosphorylated on Ser2 residues (Komarnitsky et al. 2000
; Morris et al. 2005
).
In vivo, Ser5 phosphorylation near the 5' ends of genes depends principally on the kinase activity of TFIIH (Kin28 in yeast; CDK7 in metazoans) (Komarnitsky et al. 2000
; Schroeder et al. 2000
). In vitro, this kinase correspondingly adds phosphates to Ser5 positions of CTD repeats (Hengartner et al. 1998
; Sun et al. 1998
). Subsequent to the action of TFIIH kinase, Ser2 residues are phosphorylated by CTDK-I in yeast (CDK9 kinase in metazoans) (Marshall et al. 1996
; Lee and Greenleaf 1997
; Prelich 2002
). Fittingly, in vitro CTDK-I preferentially adds phosphates to Ser2 residues of repeats already containing Ser5PO4 (Jones et al. 2004
).
In parallel with its location at the 5' end of genes in vivo, capping enzyme binds directly to and its activity is modulated by Ser5P CTD repeats in vitro (e.g., E.J. Cho et al. 1998
; Ho and Shuman 1999
). These findings are consistent with the observed interaction between genes encoding capping enzyme and TFIIH kinase (Rodriguez et al. 2000
). Analogously, the accumulation of CF IA at the 3' end of genes depends on CTDK-I (Ahn et al. 2004
), the gene for the catalytic subunit of CTDK-I shows genetic interactions with 3'-end-forming factors (Lindstrom and Hartzog 2001
; Skaar and Greenleaf 2002
), and Pcf11, a subunit of CF IA, binds directly to repeats with Ser2 phosphates (Licatalosi et al. 2002
).
This developing picture of CTD phosphorylation, which has revealed most about the two ends of a gene, raises a number of important questions: What is the phosphorylation pattern in the middle of a gene? How does Ser5P transition into Ser2P? Can Ser2P and Ser5P residues coexist in the same heptad? Why are contiguous repeats required for viability? Recent results are beginning to answer some of these questions.
Analogous to the studies on 5'- and 3'-end factors, investigations into PCTD-binding proteins that are found specifically in the body of a gene should provide insights into CTD phosphorylation patterns internal to transcription units. A case in point is the histone methyltransferase Set2, a PCTD-binding protein found at internal sites along transcription units (Li et al. 2002
, 2003
; Krogan et al. 2003
; Schaft et al. 2003
; Xiao et al. 2003
). Recent data indicate that the histone H3 K36 methyl groups added by Set2 recruit a histone deacetylase that dampens the activity potential of just-transcribed chromatin (Carrozza et al. 2005
; Joshi and Struhl 2005
; Keogh et al. 2005
; Chu et al. 2006
). Set2-mediated methylation of H3 K36 in vivo requires the presence of CTDK-I (Krogan et al. 2003
; Xiao et al. 2003
). Correspondingly, the PCTD-binding domain of Set2 (the SRI [Set2 Rpb1-interacting] domain) is essential for cotranscriptional methylation (Kizer et al. 2005). The binding specificity of this ~100-amino-acid domain was determined using a series of synthetic CTD peptides of varying length and phosphorylation patterns (Kizer et al. 2005
; M. Li et al. 2005
). The SRI domain displays several novel and notable requirements for optimal binding: It needs repeats phosphorylated on both Ser2 and Ser5; it needs at least four phosphate groups; and these phosphates need to be on contiguous heptads. Thus the SRI domain binds optimally to a diheptad comprising doubly phosphorylated repeats.
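The three binding requirements just listed can be restated as a simple predicate. The following is only an illustrative sketch: the representation of the CTD as a list of per-repeat sets of phosphorylated serine positions, and the function name `has_sri_optimal_site`, are assumptions introduced here and do not come from the cited studies. It encodes the qualitative rule (two contiguous heptads, each carrying both Ser2P and Ser5P, hence at least four phosphates) and says nothing about binding affinities.

```python
from typing import List, Set

# Hypothetical representation: each heptad repeat is the set of its
# phosphorylated serine positions, e.g. {2, 5} for a Ser2,5P repeat.
Repeat = Set[int]

DOUBLY_PHOSPHORYLATED = {2, 5}


def has_sri_optimal_site(ctd: List[Repeat]) -> bool:
    """Return True if the CTD contains two adjacent heptads that are
    both phosphorylated on Ser2 and Ser5 (a doubly phosphorylated diheptad)."""
    return any(
        DOUBLY_PHOSPHORYLATED <= first and DOUBLY_PHOSPHORYLATED <= second
        for first, second in zip(ctd, ctd[1:])
    )


# Two contiguous Ser2,5P repeats satisfy all three requirements.
contiguous = has_sri_optimal_site([{2, 5}, {2, 5}])
# Four phosphates, but the doubly phosphorylated repeats are not contiguous.
interrupted = has_sri_optimal_site([{2, 5}, set(), {2, 5}])
# Four phosphates on contiguous repeats, but Ser5P only.
ser5_only = has_sri_optimal_site([{5}, {5}, {5}, {5}])
```

Under these assumptions only the first pattern qualifies; the other two each violate one of the stated requirements.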
The SRI domain's binding requirements argue for the presence of doubly phosphorylated (Ser2,5P) repeats on the CTD of elongating RNAPII. The presence of such repeats is also supported by the demonstrated proficiency of CTDK-I at generating 2,5P repeats in vitro (Jones et al. 2004
). Both of these features are consistent with the requirement of Set2 for the presence of CTDK-I in vivo.
What we do and do not know: a working model of CTD phosphorylation and function
Much of the above information has been incorporated into a working model shown in Figure 1. This overview contains many features of recently published models (Sims et al. 2004
; Zorio and Bentley 2004
), but it differs from the others in explicitly including proteins that bind to Ser2,5P repeats as major components of the elongating complex. This view of RNAPII action implies that the CTD exists in at least four major phosphorylation states during the transcription of a gene. RNAPII at a promoter initially carries a largely unphosphorylated CTD, and the enzyme is associated with a set of factors, such as Mediator, that interact with this form of the CTD. Integrator, a newly described factor involved in snRNA 3'-end processing, is another such multiprotein complex (Baillat et al. 2005
). In the case of Mediator, there are contacts between the factor and both the CTD and the body of RNAPII (Asturias et al. 1999
; Dotson et al. 2000
), although the identities of the Mediator subunits that bind directly to the CTD are not yet known.
After initiation, an elongation-phase kinase (CTDK-I in yeast; P-TEFb in metazoa) (Marshall et al. 1996
; Lee and Greenleaf 1997
) modifies mainly Ser2 residues to generate elongation-proficient RNAPII; elongation-related factors such as Set2 bind to the CTD in this third state of phosphorylation. (It is not clear whether CTK1 is the actual ortholog of CDK9. A detailed study of evolutionary relationships among CDKs extant in sequence databases of humans, Drosophila melanogaster, and several simpler eukaryotes suggests that the yeast protein closest to human and Drosophila CDK9 is Bur1 [Liu and Kipreos 2000
]. The analysis further suggests that yeast Ctk1 is most related to two uncharacterized human proteins, gi|14110386| and gi|20521690|, and one uncharacterized Drosophila protein, gi|24668141|.) Because of the binding specificity of Set2 and because CTDK-I efficiently uses Ser5P repeats to generate Ser2,5P repeats in vitro (Jones et al. 2004
), we propose that during elongation the CTD contains repeats phosphorylated at both Ser2 and Ser5 positions. This proposal is not at odds with the observation that the ChIP signal generated by mAb H5 (usually used to detect Ser2P) increases when RNAPII is at sites within transcription units (e.g., Komarnitsky et al. 2000
), because mAb H5 actually binds Ser2,5P repeats better than Ser2P repeats (Jones et al. 2004
). Also, the H14 (Ser5P-specific mAb) signal persists as RNAPII moves from the 5' end into the gene (Boehm et al. 2003
), even though it may decrease (Ahn et al. 2004
; Morris et al. 2005
); the remaining H14 signal could then indicate the presence of either Ser5P or Ser2,5P repeats (Jones et al. 2004
). Furthermore, the initial step in the ChIP procedure, formaldehyde cross-linking, can couple proteins to the CTD (A.L. Greenleaf, unpubl.) and may block an unknown number of epitopes; such epitope masking will reduce the ChIP signal of the cognate antibody by an unknown amount. Overall, extant data do not permit an unequivocal assignment of the number of Ser2,5P repeats or their distribution along the CTD of elongating RNAPII. It will be interesting to see if all proteins that are recruited to the middle of genes via PCTD binding display Set2-like specificity for doubly phosphorylated repeats. This need not be the case, because there may be a mixture of differently phosphorylated repeats on the CTD at any given time.
Finally, near the 3' end of the gene it is widely believed that CTD phosphorylation is dominated by Ser2P residues; this is consistent with the binding specificity and localization of some 3'-end processing factors (Licatalosi et al. 2002
; Ahn et al. 2004
). If there are actually fewer Ser5P residues at the 3' end, a Ser5P-specific protein phosphatase must act on the PCTD, as indicated in Figure 1 (for a review, see Meinhart et al. 2005
). As with the 5' end and middle of the gene, however, there are caveats to the idea that Ser2 phosphorylation predominates at 3' ends. First, not all genes analyzed by ChIP experiments show a loss of mAb H14 reactivity (Ser5P) at the 3' end (Boehm et al. 2003
). Second, the antibody usually used to detect Ser2P repeats (mAb H5) actually reacts better with Ser2,5P repeats, as mentioned (Jones et al. 2004
). Third, the repeat nature of the CTD bespeaks a large number of potential phospho-epitopes, and quantifying these is extremely difficult. Fourth, proteins cross-linked to the CTD will block access of the cognate antibodies, altering ChIP signals by an unknown amount. Thus, while the overall idea that Ser2P residues increase in abundance toward the 3' end is likely to be upheld, it will take substantial additional effort to establish the actual number, distribution, and protein occupancy of heptad repeat types on the CTD of RNAPII at the 3' ends of genes.
To stimulate discussion and experiments, we have incorporated some of the facts and caveats discussed above into a set of hypothetical results that relate phosphorylation patterns on different CTDs to antibody signals that might be generated by ChIP analysis of those CTDs (Fig. 2). We emphasize that these results are purely hypothetical. CTD #1, for example, comprises eight nonphosphorylated (NP) repeats and nine Ser5P repeats, as might be found on an initiating RNAPII after TFIIH acts on it. There are also CTD-binding proteins cross-linked to CTD #1, obscuring five NP repeats and two 5P repeats. Thus, three NP repeats and seven 5P repeats are available for antibody binding. In a ChIP gedanken experiment we employ the most commonly used anti-CTD mAbs to analyze CTD #1. The resulting signal strengths will be proportional to the number of repeats recognized: three for mAb 8WG16 (NP) and seven for mAb H14 (5P). The mAb H5, which reacts with Ser2P and Ser2,5P repeats, gives on NP and 5P repeats only a signal that we will call background.
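The arithmetic of this gedanken experiment for CTD #1 can be written out as a short sketch. The dictionary layout, the function name `chip_signals`, and the assumption that every accessible repeat of a recognized type contributes one equal unit of signal are illustrative simplifications introduced here; in particular, the sketch ignores the reported preference of mAb H5 for Ser2,5P over Ser2P repeats and treats background as zero.

```python
# Simplified epitope assignments taken from the discussion above:
# 8WG16 -> nonphosphorylated (NP) repeats, H14 -> Ser5P-containing repeats,
# H5 -> Ser2P-containing repeats. Equal weighting is an assumption.
RECOGNIZES = {
    "8WG16": {"NP"},
    "H14": {"5P", "2,5P"},
    "H5": {"2P", "2,5P"},
}


def chip_signals(total, masked):
    """Count, for each mAb, the repeats it recognizes that are not
    obscured by cross-linked proteins."""
    accessible = {}
    for state, count in total.items():
        hidden = masked.get(state, 0)
        if hidden > count:
            raise ValueError(f"more {state} repeats masked than present")
        accessible[state] = count - hidden
    return {
        mab: sum(n for state, n in accessible.items() if state in states)
        for mab, states in RECOGNIZES.items()
    }


# CTD #1: eight NP and nine Ser5P repeats; five NP and two 5P are obscured.
ctd1_total = {"NP": 8, "5P": 9}
ctd1_masked = {"NP": 5, "5P": 2}
ctd1_signals = chip_signals(ctd1_total, ctd1_masked)
```

With these inputs the sketch returns three units for 8WG16, seven for H14, and zero (that is, background only) for H5, matching the numbers given in the text.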
| PCTD-associating proteins (PCAPs) and functions of the CTD |
|---|
The unique amino acid sequence and restricted amino acid composition of the CTD endow it with some unusual properties. For example, the CTD is very hydrophilic, and in aqueous solution it has little stable secondary structure (Cagas and Corden 1995
; Bienkiewicz et al. 2000
); thus, it has the ability to adopt numerous conformations that should enable it to bind cognate factors of different structural types. Its length, if stretched out, is potentially >1200 Å in mammals (1500 Å if the linker region is included); thus there is room for binding of several factors. It is easy to see how the PCTD could tether several discrete functional entities to an elongating RNAPII at the same time. Of evolutionary interest, the CTDs in most animals, plants and fungi contain many identical repeats; it has been proposed that divergence of CTD repeat sequences has been constrained during much of eukaryotic evolution by essential interactions between different CTD-binding factors and canonical CTD repeats (Stiller and Cook 2004
; Guo and Stiller 2005
).
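The stretched-out length quoted above can be checked with a back-of-the-envelope calculation. The values used below are assumptions not stated in this section: 52 heptad repeats for the mammalian CTD and a rise of roughly 3.4 Å per residue for an extended polypeptide chain. The function name is likewise illustrative.

```python
def extended_length_angstrom(n_repeats, residues_per_repeat=7, rise_per_residue=3.4):
    """Approximate end-to-end length (in angstroms) of a fully extended
    repeat array, assuming a fixed rise per residue."""
    return n_repeats * residues_per_repeat * rise_per_residue


# Assumed: 52 repeats in the mammalian CTD, ~3.4 angstroms per residue.
mammalian_ctd_length = extended_length_angstrom(52)
```

Under these assumptions the estimate comes to a little over 1200 Å for the repeats alone, in line with the figure given in the text; the linker region is not included.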
The first PCAPs and RNA processing
The first systematic effort to identify CTD-binding proteins involved a yeast two-hybrid screen that used part of the mammalian CTD as bait. This screen uncovered two classes of protein (carrying a CTD-interacting domain [CID] either at the N or C terminus), now called SCAFs (SR-like CTD-associated factors) (Yuryev et al. 1996
; Conrad et al. 2000
). While the functions of the mammalian SCAFs are not yet known, the yeast homolog of one class, Nrd1 protein, in fact, functions in processing of RNAPII transcripts (Steinmetz et al. 2001
; Arigo et al. 2006
). Following quickly on the heels of finding the SCAFs, several other PCAPs were recognized, largely as a result of studying CTD-truncated RNAPII in mammalian cells. Unexpectedly, RNAPII missing most of its CTD caused more drastic defects in pre-mRNA processing than in transcription itself; this observation led to demonstrations that, for example, cleavage/polyadenylation factors display CTD associations (McCracken et al. 1997b
; Barilla et al. 2001
; Licatalosi et al. 2002
; Maniatis and Reed 2002
; Proudfoot et al. 2002
; Kyburz et al. 2003
). As mentioned earlier, it was found that CF-IA subunit Pcf11 binds directly to the CTD, preferentially to repeats phosphorylated on Ser2 (Licatalosi et al. 2002
). Interestingly, the CID of Pcf11 is homologous to that of Nrd1; these and other PCTD-interacting domains (PCIDs) are discussed more in a later section.
RNA processing factors acting at the other end of the gene were also found to bind the PCTD, as several groups demonstrated capping enzyme/PCTD connections (Cho et al. 1997
; McCracken et al. 1997a
; Yue et al. 1997
). Subsequently Shuman and colleagues (Ho and Shuman 1999
) showed that whereas mammalian guanylyltransferase (GTase) binds CTD repeats carrying either Ser2P or Ser5P, only the repeats with Ser5P allosterically activate the enzyme. The groups of Buratowski (Komarnitsky et al. 2000
) and Bentley (Schroeder et al. 2000
) showed that capping enzyme is cross-linkable to chromatin at promoter regions of genes but not at internal or 3'-terminal regions, nicely correlating its localization with its function. Recently the structure of a capping GTase complexed with a PCTD peptide was solved (Fabrega et al. 2003
). This structure is a good example of how a flexible CTD can fit into an extended docking site on its binding partner.
Splicing factors also were shown to associate with the phosphorylated form of RNAPII by coprecipitation or co-localization approaches (e.g., Kim et al. 1997
; Misteli and Spector 1999
); however, these demonstrations could not reveal which factor or subunit contacted the PCTD. Additional splicing factors that bind the PCTD and the transcription-splicing connection are discussed more in a later section.
A few more PCAPs
A number of additional proteins that bind directly to the PCTD were found by diverse approaches. Recently, use of CTD phospho-peptides in an affinity chromatography approach identified a novel PCTD-binding protein called Rtt103 (YDR289c) that binds specifically to Ser2P repeats, presumably via a domain with homology to the CIDs of Nrd1 and Pcf11; identifying proteins that interact with Rtt103 implicated the exonuclease Rat1 in transcription termination (Kim et al. 2004
). Another two-hybrid screen yielded some of the same SCAFs mentioned above, and in addition, revealed a putative prolyl isomerase, SRCyp (Bourquin et al. 1997
); as proposed for Ess1 (below) (Morris et al. 1999
; Wu et al. 2000
), this activity may be involved in modulating the structure of the PCTD and/or its associated factors. Investigations of large human RNAPII complexes revealed that the histone acetyltransferase PCAF interacts with the phosphorylated form of RNAPII (H. Cho et al. 1998
); it may be that this association is important for facilitating the movement of RNAPII0 through chromatin. Along related lines, a multisubunit "Elongator" has been copurified with RNAPII0 (although direct binding to the CTD has not been demonstrated), and contains among its subunits a HAT activity (Otero et al. 1999
; Wittschieben et al. 1999
). Interestingly, genetic studies (Jona et al. 2001
) revealed interactions between genes encoding Elongator subunits and CTK1, the gene encoding the catalytic subunit of CTDK-I. A number of other proteins of disparate function have been shown to bind directly to the PCTD, including the KRAB/Cys2-His2 zinc finger protein ZNF74 (Grondin et al. 1997
), and the splicing and transcription-associated proteins PSF and p54nrb/NonO (Emili et al. 2002
). Notably, a pool of nontranscribing RNAPII carries a PCTD and is associated with certain transcription and processing factors in potential assembly areas called "transcriptosomes" (cf. Gall 2000
).
Not just RNA processing anymore: many more PCAPs and functions
Because most of the PCAPs mentioned to this point were not discovered in a systematic way, and because the original and subsequent two-hybrid screens picked up only a very small number of PCAPs (Yuryev et al. 1996
; Bourquin et al. 1997
; Guo et al. 2004
), the existence of additional PCAPs seemed very likely. A biochemical approach that included PCTD direct-binding assays and affinity-matrix purification procedures was applied to yeast extracts and resulted in identification of novel PCAPs. The initial group included a prolyl isomerase (Ess1), a splicing factor (Prp40), and a ubiquitin ligase (Rsp5) (Morris et al. 1999
; Morris and Greenleaf 2000
). Subsequently, an improved approach, applied on a larger scale, revealed >100 proteins in the yeast proteome that are retained specifically on an affinity matrix carrying a synthetic three-repeat CTD peptide in which both Ser2 and Ser5 residues of each repeat are phosphorylated ("2,5P" peptide column) (Phatnani et al. 2004
); recall that this is the pattern preferentially generated by CTDK-I. A striking feature of the proteins identified by this approach is the number of functional classes into which they fall. In addition to pre-mRNA processing factors, proteins recovered from the 2,5P peptide column represent factors with known or proposed roles in transcription, chromatin structure modification, DNA damage/repair, protein degradation, protein synthesis, RNA degradation, snRNA modification, and snoRNP biogenesis.
The idea that PCTD binding by these proteins is functionally meaningful has been investigated to date for several of the proteins; most is currently known about the histone methyltransferase, Set2, whose PCTD-mediated link to elongating RNAPII was described earlier. It is worth reiterating that binding studies with recombinant Set2 constructs demonstrated that its SRI domain binds with high selectivity to Ser2,5P CTD repeats (Kizer et al. 2005
). NMR structure determination (M. Li et al. 2005
; Vojnic et al. 2006
) together with point mutagenesis and phospho-peptide binding studies (M. Li et al. 2005
, see a later section) provide a molecular picture of the SRI domain and suggest how it binds to 2,5P CTD repeats, tethering Set2 to elongating RNAPII. Recent ideas about Set2 function (Carrozza et al. 2005
; Joshi and Struhl 2005
; Keogh et al. 2005
; Chu et al. 2006
) tie in nicely with the notion that it is part of a transcription elongation megacomplex.
A few of the other proteins identified in the biochemical search through the yeast proteome have already been shown to bind directly to the PCTD; these include Ess1, Prp40, Ssd1, and Hrr25 (Morris et al. 1999
; Morris and Greenleaf 2000
; Phatnani et al. 2004
). Interestingly, the binding domains of three of these proteins bind best to Ser2,5P repeats, whereas the binding domain of Ssd1 binds equally well to Ser2,5P and Ser2P repeats (Phatnani et al. 2004
); the functional significance of these specificities has yet to be explored in vivo. Of this group, the protein with perhaps the most novel implications is Hrr25, a protein kinase involved in response to DNA damage (Ho et al. 1997
). The selective binding of Hrr25 to 2,5P repeats suggests a role in DNA damage responses for RNAPII carrying repeats phosphorylated in this pattern. This suggestion is in line with published studies showing that ctk1Δ strains are sensitive to certain DNA damaging agents (Ostapenko and Solomon 2003
). On the other hand, PCTD-associated Hrr25 may be involved in other processes. For example, recent results implicate Hrr25 and two other 2,5P-binding proteins (Enp1, Tsr1) in ribosome biogenesis (Schafer et al. 2006
).
Another kinase isolated as a 2,5P-binding protein is Hog1, a stress-activated protein kinase that plays an essential role in adaptation to conditions of high osmolarity. Interestingly, it has recently been found to interact directly with Rpb1 in a manner that appears to depend both on CTD phosphorylation and osmotic stress. Moreover, in osmotically stressed cells, Hog1 can be cross-linked to the coding regions of osmoregulated genes (Proft et al. 2006
). It will be interesting to see whether the recruitment and/or activity of Hog1 is dependent on its direct binding to the PCTD. Along similar lines, preliminary studies on Cbf5, another protein isolated by 2,5P-affinity chromatography, have now revealed that it is a direct-binding PCAP (R.J. Boruta, H.P. Phatnani, and A.L. Greenleaf, unpubl.). Cbf5 is a component of the H/ACA snoRNP, an RNA/protein particle that converts certain U residues in rRNAs and other RNAs to pseudouridine (Meier 2005
). This result fits nicely with the developing understanding of cotranscriptional assembly of the H/ACA snoRNP, and especially with the contemporaneous discovery that the process depends on CTDK-I (Ballarino et al. 2005
; Yang et al. 2005
). It also provides another example supporting the view that many of the proteins isolated by 2,5P-affinity chromatography will ultimately be found to interact with the PCTD in a functionally meaningful way.
Unexpectedly, a large number of the 2,5P-repeat-binding proteins have known or proposed roles in protein synthesis or degradation. In addition there are several proteins with connections to the proteasome or with potential chaperone-like functions. These associations are consistent with a role for the PCTD in a cotranscriptional process that involves translation (Iborra et al. 2001
; Brogna et al. 2002
); alternatively, they may bind the PCTD in the course of executing other functions (e.g., some ribosomal proteins regulate splicing of their own message [Dabeva and Warner 1993
; Fewell and Woolford 1999
; Warner 1999
]; others perform different extraribosomal functions [Wool 1996
; Jeffery 2003
; Zimmermann 2003
]). If some kind of nuclear translation does occur (Dahlberg and Lund 2004
; Iborra et al. 2004b
), it likely participates in nonsense-mediated decay (Iborra et al. 2004a
). Consistent with this idea, recent systems biology analyses strongly suggest connections between the transcription/pre-mRNA processing/export machinery and the NMD machinery (Maciag et al. 2006
).
| PCIDs and binding modes |
|---|
Pin1, a prolyl isomerase
The first structure of a PCAP, mammalian Pin1 (homologous to yeast Ess1), was reported in 1997 (Ranganathan et al. 1997
), and the structure of a fungal homolog was reported more recently (Z. Li et al. 2005
). Pin1/Ess1 consists of a prolyl isomerase domain and a small N-terminal WW domain. In the mammalian protein the domains are coupled via a flexible linker, whereas the fungal enzyme has a more rigid linker, leading to a different spatial relationship between the domains in the two structures. The structure of mammalian Pin1 complexed with a CTD one-repeat phospho-peptide shows that the peptide, which assumes an extended coil-like conformation, contacts only the WW domain (Verdecia et al. 2000
). Curiously, while the peptide was phosphorylated on both Ser2 and Ser5, only the phosphate on Ser5 made contact with the protein. One explanation for this may be that with only seven amino acids (YSPTSPS; phosphorylated serines are underlined) the peptide was not long enough to contain the actual in vivo binding epitope, which might extend across the canonical repeat boundaries (consideration of repeat "phasing" is discussed in Greenleaf 2003
). Later binding studies with yeast Ess1 WW domain, using multirepeat CTD peptides, indicate that it does have a strong preference for Ser2,5P repeats (Phatnani et al. 2004
). It may be that because the peptides used in these later tests comprised three canonical repeats they contained binding epitopes that extend across repeat boundaries.
The 5' end: capping enzyme
The next structure of a complex between the PCTD and a cognate binding protein was that of Cgt1 capping GTase bound to a four-repeat peptide carrying phosphates on each Ser5 residue (Fabrega et al. 2003
). In this case, both the type of protein bound and the phosphorylation pattern on the CTD peptide were different from the earlier studies. Nevertheless, a number of the questions posed earlier were answered in these experiments, the results of which included many novel findings. For example, unlike Pin1, the region of Cgt1 that binds the PCTD is not a separate domain; rather, it is a part of the GTase domain, but separate from the active site. The PCTD binds in a long groove that extends some 40 Å along the protein surface. At each end there is a "docking site" that binds a Ser5-PO4 and several adjacent residues of the CTD peptide. Each docking site makes critical contacts with highly conserved CTD repeat residues, such as the almost invariant Tyr, in addition to the Ser5-PO4. An important feature of docking site 1 is that it binds residues from two consecutive canonical repeats; thus, observing this binding mode depended on using a peptide containing more than one canonical repeat. Very interestingly, the central portion of the three repeats involved in interacting with the Cgt1 protein loops out from the surface of the protein and does not participate directly in binding. The conformation of the CTD segments bound to Cgt1 contrasts with that in the Pin1 complex by not being coiled, but extended and nonhelical. Thus the first two structures solved demonstrated that not all PCIDs are the same and also that the flexible CTD sequence can adopt different conformations in binding to different proteins.
The 3' end: polyA/cleavage factor Pcf11
Yet another mode of binding was observed when the structure of the CID of yeast Pcf11 was determined. The CID is a domain of ~140 amino acids discovered a decade ago (Yuryev et al. 1996
). In some proteins, such as yeast Pcf11, the CID shows binding specificity for repeats carrying Ser2P (Licatalosi et al. 2002
). In other proteins, however, it apparently can have a different binding specificity; mammalian SCAF8, for example, binds best to doubly phosphorylated repeats (Patturajan et al. 1998b
). Meinhart and Cramer (2004)
solved the crystal structure of the Pcf11 CID by itself, and then they soaked in a 12-residue repeat peptide in which the central Ser2 was phosphorylated. Unexpectedly, the Ser2 phosphate group does not make any contacts with the CID. Noble et al. (2005)
, who also recently solved the CID structure, determined that the KD for a similar peptide was ~180 µM; intriguingly, Hollingworth et al. (2006)
found that RNA also binds weakly to the CID of Pcf11, displaying an apparent competition with CTD phospho-peptides. It will be interesting to see if this competition is functionally significant for 3'-end processing in vivo. In addition, experiments utilizing differently phosphorylated peptides will be important in comparing binding modes for CIDs with homologous structures but different binding specificities (e.g., Pcf11 and SCAF8).
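To give a feel for what a KD of roughly 180 µM means, the following minimal sketch computes the fraction of CID occupied by a CTD phospho-peptide. It assumes simple 1:1 equilibrium binding and treats the supplied concentrations as free concentrations. The optional competitor term uses the standard competitive-binding expression; the competitor concentration and its Ki are hypothetical placeholders, since no RNA affinity is given in the text.

```python
def fraction_bound(peptide_uM, kd_uM=180.0, competitor_uM=0.0, ki_uM=None):
    """Fraction of CID occupied by peptide under 1:1 equilibrium binding.

    kd_uM defaults to the ~180 uM value quoted in the text.
    competitor_uM and ki_uM are hypothetical inputs used only to
    illustrate competition; they are not measured values.
    """
    if peptide_uM < 0 or competitor_uM < 0 or kd_uM <= 0:
        raise ValueError("concentrations must be non-negative and KD positive")
    apparent_kd = kd_uM
    if competitor_uM > 0:
        if ki_uM is None or ki_uM <= 0:
            raise ValueError("a positive Ki is required when a competitor is present")
        # A competitive ligand raises the apparent KD by the factor (1 + [I]/Ki).
        apparent_kd = kd_uM * (1.0 + competitor_uM / ki_uM)
    return peptide_uM / (apparent_kd + peptide_uM)


if __name__ == "__main__":
    for conc in (18.0, 180.0, 1800.0):
        print(f"{conc:7.1f} uM peptide: {fraction_bound(conc):.2f} bound")
```

At a peptide concentration equal to the KD, half of the CID is occupied; a competitor present at its own Ki doubles the apparent KD in this model.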
The middle: FF domains and SRI domain
The solution structures of two other classes of PCID have been solved by NMR methods, and additional binding motifs and modes have emerged. Certain FF domains, ~50-residue motifs characterized by two conserved Phe residues (Bedford and Leder 1999
), were shown to bind to the PCTD (Carty et al. 2000
; Morris and Greenleaf 2000
). Interestingly, the FF1 domain from the yeast splicing factor Prp40 (FF1Prp40) has a 3D structure extremely similar to that of the FF1 domain from the mammalian splicing-related protein HYPA/FBP11 (FF1FBP11), but its binding specificity is very different (Allen et al. 2002
; Gasch et al. 2005
). Whereas FF1FBP11 binds the PCTD, FF1Prp40 instead binds to N-terminal TPR repeats of the multifunctional yeast protein Clf1. Such differences are not too surprising, since the amino acid sequences of different FF domains are poorly conserved. For example, Gasch et al. (2005)
carried out a phylogenetic analysis of FFs from splicing-related factors and found that they could be placed in several different subgroups. To a large extent, this grouping placed FF domains with similar pKas together. As might be expected for a domain that binds the negatively charged PCTD, FF1FBP11 has a basic pKa of 9.6; in contrast the FF1Prp40 that does not bind the PCTD has a pKa of 4.7. The other individual FF domain previously shown to bind the PCTD, FF5 of CA150 (Carty et al. 2000
), has a pKa of 9.1, consistent with this analysis. It will be interesting to see if all basic FF domains, dispersed among different proteins, bind the PCTD. Also, the identification of the binding partners of the other FF domain classes (neutral and acidic) should be very informative.
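The grouping described above can be sketched as a simple classification of FF domains into acidic, neutral, and basic classes. The three values are those quoted in the text; the cutoffs of 6.0 and 8.0 are hypothetical and chosen only for illustration, not taken from Gasch et al. (2005).

```python
# Values quoted in the text for individual FF domains.
FF_DOMAINS = {
    "FF1 (FBP11)": 9.6,
    "FF1 (Prp40)": 4.7,
    "FF5 (CA150)": 9.1,
}


def charge_class(value, acidic_below=6.0, basic_above=8.0):
    """Assign a charge class; both thresholds are illustrative assumptions."""
    if value < acidic_below:
        return "acidic"
    if value > basic_above:
        return "basic"
    return "neutral"


if __name__ == "__main__":
    for name, value in FF_DOMAINS.items():
        print(f"{name}: {value} -> {charge_class(value)}")
```

Under these assumed cutoffs, the two domains reported to bind the PCTD fall in the basic class and FF1 of Prp40 falls in the acidic class, in line with the trend noted in the text.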
The solution structure of another small domain, which is found only in one class of chromatin-modifying enzyme, was solved recently. The histone methyltransferase Set2 contains at its C terminus a 100-residue domain, the SRI domain, that binds the PCTD, tethering Set2 to elongating RNAPII and coupling methylation of Lys 36 in histone H3 to transcription elongation (Kizer et al. 2005
). Human and yeast SRI domains are structurally quite similar even though the amino acid sequences are only ~20% identical (M. Li et al. 2005
; Vojnic et al. 2006
). NMR resonance perturbation experiments suggest that the PCTD-binding sites are similarly positioned on the two domains, which have similar binding characteristics (M. Li et al. 2005
; Vojnic et al. 2006
).
As additional PCIDs are identified and their structures are solved, it will be instructive to see how many families of PCID there are, how they are distributed among factors of different functions, and how they bind to the PCTD. Elucidating this structural information and combining it with functional studies will be important for filling in gaps in our understanding of the CTD and its functions.
| The PCTD as a major organizer of nuclear functions |
|---|
|
|
|---|
We have seen that the CTD of actively transcribing RNAPII is phosphorylated at multiple sites, and that the pattern of phosphorylation changes as polymerase traverses a transcription unit; in turn, different phosphorylation patterns recruit different proteins to the CTD. Thus, during the process of RNA chain synthesis the PCTD orchestrates formation of a megacomplex that is linked to RNAPII. However, the elongation megacomplex is not static in composition, but changes components and capabilities as RNAPII moves through different regions of a gene. A simplified overview of these events was depicted in the model of Figure 1. A more detailed snapshot of a hypothetical elongation megacomplex in the middle of a gene is shown in Figure 3.
[Figure 3: hypothetical RNAPII elongation megacomplex in the middle of a gene. Surviving caption fragments: 650 Å (Meinhart et al. 2005); 150 Å (Cramer et al. 2001).]
Considering first the DNA/chromatin template, the histone methyltransferase Set2 is depicted as simultaneously contacting the PCTD and a nucleosome near transcribing RNAPII, since strong evidence exists that Set2 binds directly to the PCTD via its SRI domain and cotranscriptionally modifies histone H3 in nucleosomes (discussed above). A chromatin remodeling factor (CRF) is included to represent potentially CTD-bound factors that modify chromatin structure to facilitate transcription by RNAPII. Hrr25, a PCTD-interacting protein implicated in the response to DNA damage, is also shown attached to the PCTD, where it might receive a signal from the polymerase that DNA damage has been encountered (red adduct about to enter RNAPII).
As for the RNA transcript, we show Prp40 binding to the PCTD, positioning its associated U1 snRNP to recognize a 5' splice site in the transcript and tethering it to the PCTD until branchpoint-binding protein (BBP) and the associated 3' splice site are encountered. We have also positioned the H/ACA snoRNP component Cbf5 such that it can access a hypothetical intron-encoded small nucleolar RNA (snoRNA) (red recognition sequence indicated in RNA) to initiate cotranscriptional snoRNP assembly (discussed above). Analogously, it may be that U3 snoRNP assembly also begins cotranscriptionally, since we found two of its components, Utp20 and Rrp5, in the collection of proteins bound to doubly phosphorylated CTD repeats (Phatnani et al. 2004
). In the hypothetical megacomplex, we have also included putative PCAPs involved in proteasome function (e.g., Cic1) and RNA degradation (e.g., Mrt4). Also present is a representative factor (XF) that links RNA processing to nuclear export (for a review, see Maniatis and Reed 2002
). Finally, we point out that the order in which these interactions occur is not known; for instance, PCAPs that also bind RNA (e.g., Prp40, Cbf5) could bind the PCTD either before or after binding their cognate RNA. In addition, PCAP binding to the PCTD may be stabilized by interactions with other components of the megacomplex (e.g., Phatnani et al. 2004
, and its Supplementary Tables 1, 2).
The snapshot of the elongation megacomplex illustrated in Figure 3 represents only one of many possible configurations. Because of the number of possible phosphorylation sites in the CTD and the existence of multiple CTD kinases and phosphatases (and one or more prolyl isomerases), the extent and pattern of CTD phosphorylation potentially can be modulated to generate a vast number of different phospho-epitopes (Sudol et al. 2001
; Buratowski 2003
). The arrays of binding sites thus generated have the potential to recruit many combinations of binding partners. We propose that this feature contributes to remodeling or fine-tuning the functional capabilities of the RNAPII elongation megacomplex in response to different signals. These signals could indicate, for example, position along the transcription unit, presence and nature of introns (Batsche et al. 2006
), alterations in cellular physiology, actions of gene-specific modulators, or presence of DNA damage. Determining the nature of such signals and how they function should form the basis for exciting future experiments.
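The scale of this combinatorial potential can be illustrated with a short calculation. The sketch below assumes that each heptad repeat is independently in one of four states (unphosphorylated, Ser2P, Ser5P, or Ser2,5P) and ignores prolyl isomerization and any other modification, so it gives only a lower bound on the number of distinct patterns. The repeat counts of 26 (yeast) and 52 (mammals) are commonly cited values and are not taken from this section.

```python
from itertools import product

# Per-repeat states considered here; other modifications are ignored.
REPEAT_STATES = ("unphosphorylated", "Ser2P", "Ser5P", "Ser2,5P")


def count_patterns(n_repeats):
    """Number of Ser2/Ser5 phosphorylation patterns over n_repeats repeats."""
    if n_repeats < 0:
        raise ValueError("n_repeats must be non-negative")
    return len(REPEAT_STATES) ** n_repeats


def enumerate_patterns(n_repeats):
    """Yield every pattern as a tuple of per-repeat states (small n only)."""
    return product(REPEAT_STATES, repeat=n_repeats)


if __name__ == "__main__":
    # Short peptides, such as the three- and four-repeat peptides discussed above.
    for n in (1, 2, 3, 4):
        print(f"{n} repeat(s): {count_patterns(n)} patterns")
    # Full-length CTDs, using assumed repeat counts.
    for label, n in (("yeast, 26 repeats", 26), ("mammalian, 52 repeats", 52)):
        print(f"{label}: {count_patterns(n):.3e} patterns")
```

Even under these restrictive assumptions, a three-repeat peptide already has 64 possible patterns, and the count grows fourfold with each additional repeat.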
Open questions
The past decade has seen significant progress in our understanding of the CTD and its interacting factors, but crucial questions about many aspects of CTD phosphorylation and function remain open. Among them are the following baker's dozen:
We have listed these questions because it is useful to realize what we still do not know about the CTD. Keeping these unknowns in mind should both guide interpretations of experimental data and help stimulate new experiments. We have made great strides in the last 10 yr, but the amount left to learn suggests that the next decade of CTD investigation will be at least as productive and even more exciting.
| Acknowledgments |
|---|
|
|
|---|
| Footnotes |
|---|
E-MAIL: arno{at}biochem.duke.edu; FAX (919) 684-8885.
Article is online at http://www.genesdev.org/cgi/doi/10.1101/gad.1477006.
| References |
|---|
|
|
|---|
Akoulitchev, S., Makela, T.P., Weinberg, R.A., and Reinberg, D. 1995. Requirement for TFIIH kinase activity in transcription by RNA polymerase II. Nature 377: 557–560.
Allen, M., Friedler, A., Schon, O., and Bycroft, M. 2002. The structure of an FF domain from human HYPA/FBP11. J. Mol. Biol. 323: 411–416.
Arigo, J.T., Carroll, K.L., Ames, J.M., and Corden, J.L. 2006. Regulation of yeast NRD1 expression by premature transcription termination. Mol. Cell 21: 641–651.
Asturias, F.J., Jiang, Y.W., Myers, L.C., Gustafsson, C.M., and Kornberg, R.D. 1999. Conserved structures of mediator and RNA polymerase II holoenzyme. Science 283: 985–987.
Baillat, D., Hakimi, M.A., Naar, A.M., Shilatifard, A., Cooch, N., and Shiekhattar, R. 2005. Integrator, a multiprotein mediator of small nuclear RNA processing, associates with the C-terminal repeat of RNA polymerase II. Cell 123: 265–276.
Ballarino, M., Morlando, M., Pagano, F., Fatica, A., and Bozzoni, I. 2005. The cotranscriptional assembly of snoRNPs controls the biosynthesis of H/ACA snoRNAs in Saccharomyces cerevisiae. Mol. Cell. Biol. 25: 5396–5403.
Barilla, D., Lee, B.A., and Proudfoot, N.J. 2001. Cleavage/polyadenylation factor IA associates with the carboxyl-terminal domain of RNA polymerase II in Saccharomyces cerevisiae. Proc. Natl. Acad. Sci. 98: 445–450.
Barron-Casella, E. and Corden, J.L. 1992. Conservation of the mammalian RNA polymerase II largest-subunit C-terminal domain. J. Mol. Evol. 35: 405–410.
Batsche, E., Yaniv, M., and Muchardt, C. 2006. The human SWI/SNF subunit Brm is a regulator of alternative splicing. Nat. Struct. Mol. Biol. 13: 22–29.
Bedford, M.T. and Leder, P. 1999. The FF domain: A novel motif that often accompanies WW domains. Trends Biochem. Sci. 24: 264–265.
Bienkiewicz, E.A., Moon Woody, A., and Woody, R.W. 2000. Conformation of the RNA polymerase II C-terminal domain: Circular dichroism of long and short fragments. J. Mol. Biol. 297: 119–133.
Birse, C.E., Minvielle-Sebastia, L., Lee, B.A., Keller, W., and Proudfoot, N.J. 1998. Coupling termination of transcription to messenger RNA maturation in yeast. Science 280: 298–301.
Boehm, A.K., Saunders, A., Werner, J., and Lis, J.T. 2003. Transcription factor and polymerase recruitment, modification, and movement on dhsp70 in vivo in the minutes following heat shock. Mol. Cell. Biol. 23: 7628–7637.
Bourquin, J.P., Stagljar, I., Meier, P., Moosmann, P., Silke, J., Baechi, T., Georgiev, O., and Schaffner, W. 1997. A serine/arginine-rich nuclear matrix cyclophilin interacts with the C-terminal domain of RNA polymerase II. Nucleic Acids Res. 25: 2055–2061.
Brogna, S., Sato, T.A., and Rosbash, M. 2002. Ribosome components are associated with sites