RNA polymerases (RNAPs) are among the most important cellular enzymes. They are present in all living organisms, from Bacteria and Archaea to Eukarya, and are responsible for DNA-dependent transcription. Whereas Bacteria and Archaea have only one RNAP, Eukarya possess three RNAPs (I, II and III), and plants have two additional ones (IV and V), for a total of five [1-2]. All of the RNAPs are evolutionarily related and have common structural and functional properties. The minimally conserved structural organization is represented by the bacterial enzyme, which contains only 4 subunits (α, α’, β, β’), whereas Archaea and Eukarya RNAPs are composed of 12 subunits (Rpb1-Rpb12). In prokaryotes, one RNAP transcribes all of the genes into all of the RNAs; in eukaryotes, however, this is achieved by three RNAPs. RNAPI transcribes genes that encode 18S and 28S ribosomal RNAs; RNAPIII transcribes short genes, such as those for tRNAs and 5S ribosomal RNA; and RNAPII transcribes all protein-coding genes and genes for small noncoding RNAs (e.g., small nuclear RNAs (snRNAs) that are involved in splicing). The largest catalytic subunits of all three eukaryotic polymerases share homology among themselves and with the largest subunit of bacterial polymerase. Only the largest subunit of RNAPII (Rpb1) contains an unusual, evolutionarily conserved carboxy-terminal domain (CTD), which is subjected to numerous post-translational modifications of extraordinary importance in gene expression regulation [6-8].
RNAPII transcription plays a central role in gene expression and is highly regulated at many steps, such as initiation, elongation and termination. Furthermore, phosphorylation of the Rpb1 CTD is known to regulate all of the transcription steps and coordinate these steps with other nuclear events. Prior to mRNA biosynthesis, RNAPII proceeds through several steps, such as promoter recognition, preinitiation complex (PIC) assembly, open complex formation, initiation and promoter escape.
This sequence of events is initiated by the binding of gene-specific activators and coactivators, which results in the recruitment of basal transcription machinery (i.e., general transcription factors (GTFs): TFIIA, TFIIB, TFIID, TFIIE, TFIIF, and TFIIH) and RNAPII to promoters [9-11]. Basal transcription factors position RNAPII on promoters to form the PIC but also function at later steps, such as promoter melting and initiation site selection. Thereafter, initiation proceeds, and RNAPII leaves the promoter during promoter clearance and proceeds into processive transcript elongation. Finally, when the gene has been fully transcribed, transcriptional termination occurs, and RNAPII is released and recycled to reinitiate a new round of transcription [12-14].
During its passage across a gene, RNAPII must overcome several challenges. Initially, the polymerase needs to escape from the promoter, and the synthesis of the pre-mRNA must be tightly coupled to its subsequent processing (i.e., capping, splicing, and polyadenylation). Then, initiation factors must be exchanged for elongation factors, which are thought to increase the transcription rate and RNAPII processivity. In fact, recently, there has been an extraordinary increase in the number of proteins known to influence transcription elongation by avoiding transcriptional arrest, facilitating chromatin passage and mRNA processing [16-21], allowing mRNA packaging into a mature ribonucleoprotein (mRNP) and controlling mRNP quality and mRNA export [13, 22-28]. The discovery of all of these factors has provided further evidence that the elongation phase is also highly regulated in eukaryotic cells and strictly coordinated with other nuclear processes [12-14].
2. RNAPII CTD phosphorylation: The CTD code
During the last two decades, gene expression studies have provided further evidence that many steps in gene expression, originally considered distinct and independent, are, in fact, highly coordinated, linked and regulated in a complex web of connections [29-30]. The central coordinator that directs this regulatory network (i.e., from transcription initiation to termination and with pre-mRNA processing) in combination with many other nuclear functions is RNAPII, and the carboxy-terminal domain (CTD) of its largest subunit is of remarkable importance. CTD phosphorylation regulates and coordinates the entire transcription cycle with pre-mRNA processing, mRNA transport and chromatin remodeling and modification. The CTD, therefore, has a critical integrating role in essentially all of the mRNA biogenesis steps, and it is subject to dynamic regulation during the transcription cycle [21, 31-32]. RNAPII phosphorylation is thus one of the key processes in the regulation of transcription specifically and gene expression in general; consequently, deciphering the mechanisms that underlie the regulation of RNAPII phosphorylation has become one of the most studied issues in the field of gene expression.
RNAPII is comprised of 12 subunits (Rpb1-12) that are structurally and functionally conserved from yeast to mammals [33-34]. In 1985, the largest subunit of RNAPII, Rpb1, from mouse and
Original studies showed that two RNAPII forms can be differentiated in SDS-PAGE gels because of the different mobility of Rpb1. These two forms were termed RNAPIIA and RNAPIIO, and they differ in the extent of CTD phosphorylation: RNAPIIA is hypophosphorylated, and RNAPIIO is hyperphosphorylated. Moreover, the IIA and IIO forms are functionally distinct: the IIA form is preferentially recruited to the promoter and associated with preinitiation complexes, whereas RNAPIIO functions during elongation and thus requires de-phosphorylation before it can be recruited into the PIC to reinitiate a new round of transcription. We currently know that this earlier two-step transcription cycle model, which is based on the two RNAPII forms, is overly simple. Different phosphorylated forms of RNAPII are specific to and characteristic of the different steps of the transcription cycle, and the correct progression of RNAPII through the transcription cycle is dependent on changes in the CTD phosphorylation status. Differential CTD phosphorylation promotes the exchange of initiation and elongation factors during promoter clearance, the exchange of elongation and 3’-end processing factors at termination, as well as RNAPII recycling and, moreover, links pre-mRNA processing and other nuclear events with transcription [17, 42]. Therefore, the CTD phosphorylation cycle is very complex. It is widely known that the three serines (i.e., Ser2, Ser5, and Ser7) [7, 51], the tyrosine [52-53] and the threonine in each repeat can be phosphorylated. Additionally, both proline residues can be isomerized by a prolyl isomerase. Moreover, glycosylation of serines and threonine can also occur, and in human cells, the CTD can be methylated at some of the degenerate repeat sites.
The multitude of possible CTD modifications, especially Ser phosphorylations, in combination with the numerous repetitions, gives rise to a wide range of variations (i.e., phosphorylation patterns) that have been termed the “CTD code” (Figure 2) [6-8].
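To give a sense of scale for this combinatorial space, the following minimal Python sketch counts phosphorylation patterns for a single consensus repeat and an upper bound for a whole CTD. It rests on simplifying assumptions that are not claims of the text: every repeat is taken to match the consensus heptad Tyr1-Ser2-Pro3-Thr4-Ser5-Pro6-Ser7, every site is treated as independently modifiable, and only phosphorylation is counted. The function names are hypothetical and chosen for illustration.

```python
from itertools import product

# Consensus CTD heptad, positions 1-7 (assumption: all repeats are consensus).
HEPTAD = "YSPTSPS"
# Residue types described in the text as phosphorylatable.
PHOSPHO_ACCEPTORS = {"Y": "Tyr", "S": "Ser", "T": "Thr"}


def phospho_sites(repeat=HEPTAD):
    """Return the phospho-acceptor sites of one repeat, e.g. 'Ser2'."""
    return [
        f"{PHOSPHO_ACCEPTORS[aa]}{pos}"
        for pos, aa in enumerate(repeat, start=1)
        if aa in PHOSPHO_ACCEPTORS
    ]


def patterns_per_repeat(sites):
    """Enumerate every on/off phosphorylation pattern for the given sites."""
    patterns = []
    for flags in product((False, True), repeat=len(sites)):
        marks = [f"{site}P" for site, on in zip(sites, flags) if on]
        patterns.append("+".join(marks) if marks else "unphosphorylated")
    return patterns


def upper_bound(n_sites, n_repeats):
    """Upper bound on CTD-wide patterns if all sites were independent."""
    return (2 ** n_sites) ** n_repeats


if __name__ == "__main__":
    sites = phospho_sites()
    print(sites)
    print(len(patterns_per_repeat(["Ser2", "Ser5", "Ser7"])), "serine-only patterns per repeat")
    print(len(patterns_per_repeat(sites)), "patterns per repeat with Tyr1 and Thr4")
    print(upper_bound(len(sites), 26), "patterns for 26 consensus repeats (upper bound)")
```

Real CTDs fall well short of this bound, since many repeats are degenerate and the marks are not deposited independently; the sketch only illustrates why the term "code" is apt.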
The RNAPII CTD code determines and coordinates the timely sequential recruitment of the specific factors required during the transcription cycle. The CTD therefore functions as a scaffold that coordinates mRNA biogenesis, including transcription initiation, promoter clearance, elongation, and termination [31, 61-62], as well as RNA processing [17, 21, 30] and snRNA and snoRNA gene expression [63-65], by recruiting the appropriate set of factors when required during active transcription. These factors recognize CTD phosphorylation patterns either indirectly or directly by contacting phosphorylated residues. Among the CTD-associated factors are export and histone modifier factors and DNA repair factors.
2.1. Ser2P and Ser5P, and to a lesser extent, Ser7P, are the main determinants of the CTD code
Determining precisely which serine residues are phosphorylated in a particular repeat has been challenging because of the number of phospho-acceptor amino acid residues and consensus motif repetitions (Figure 1). However, studies involving chromatin immunoprecipitation with specific monoclonal antibodies have provided evidence that differential phosphorylation of the Ser residues coincides with the temporal and spatial recruitment of different factors [8, 32, 48, 66-67]. In fact, these antibodies have been widely used to decipher and characterize the role of CTD phosphorylation during the transcription cycle and in gene expression regulation [8, 32, 68]. Antibodies that selectively recognize either Ser2 or Ser5 phosphorylation (i.e., Ser2P or Ser5P, respectively) were the first to be described; phosphorylation of these two residues has been extensively studied, and they have been considered the two main determinants of the CTD code. It is widely known that CTD phosphorylation switches from Ser5 to Ser2 during the course of transcription and is subject to dynamic regulation during the whole transcription cycle [69-71]. The level of Ser5 phosphorylation peaks early in the transcription cycle and remains constant or decreases as RNAPII progresses to the 3′ end of the gene (Figure 3; [48, 67, 72]). In contrast, Ser2 phosphorylation is the predominant modification in the coding and 3′-end gene regions and occurs simultaneously with productive elongation [31, 48, 73]. On the other hand, de-phosphorylation of Ser5 occurs during the initiation-elongation transition and throughout the entire elongation step, whereas Ser2 de-phosphorylation occurs at the end of transcription to recycle the polymerase and reinitiate a new round of transcription. Therefore, reversible phosphorylation/de-phosphorylation of the CTD plays a significant role in modulating the transcription cycle [31-32].
Most recently, the use of new anti-CTD monoclonal antibodies has demonstrated that Ser7, which is the most degenerate position of the CTD, can be phosphorylated during the transcription of snRNA genes and protein-coding genes [64, 68, 74]. Consequently, this mark increases the complexity of the CTD code [7-8]. Ser7 phosphorylation is mediated by the same kinase [74-75], although, at least in
2.2. Tyrosine 1 and Threonine 4 can also be phosphorylated
Tyrosine 1 (Tyr1) is evolutionarily conserved and present in all 52 repeats of the mammalian CTD and in all of the 26-27 repeats of the yeast CTD. Although it is well known that Tyr1 is susceptible to phosphorylation by tyrosine kinases
In mammals, there is an important degeneracy at some positions in the CTD, mainly in most of the carboxy-terminal repeats. In addition, the last repeat of the CTD is followed by a conserved 10 amino acid extension (Figure 1) that contains a constitutive casein kinase (CK) II site. Although deletion of this extension results in degradation of the CTD and affects transcription and pre-mRNA processing [83-84], mutation of the CKII target site does not affect RNAPII CTD stability. Additionally, this extension is required for the phosphorylation of Tyr1 by c-Abl in mammals, and it has been suggested that Tyr1 phosphorylation could be involved in functions specific to these higher eukaryotes. Finally, non-consensus residues, such as lysine and arginine, are also present in the CTD, and they could potentially be modified by acetylation, ubiquitylation, sumoylation (lysine residues) and methylation (lysine and arginine residues). Therefore, the possibilities for CTD modification are enormous, and only some of the modifications have been demonstrated to influence, through interaction with numerous factors, different aspects of gene expression.
3. Modifying enzymes: Kinases and phosphatases
Most of what is known concerning CTD-protein interactions, and in particular RNAPII CTD modifying enzymes, is derived from animal and yeast models, especially
Below, in figure 4, we will highlight the most relevant features and functions of CTD kinases and phosphatases, with special emphasis on the budding yeast enzymes because extensive studies on RNAPII CTD phosphorylation have been performed on that organism, and most of these enzyme complexes are evolutionarily conserved.
3.1. RNAPII CTD kinases
The CTD is phosphorylated by members of the cyclin-dependent kinase (CDK) family, which usually consist of a catalytic and a cyclin subunit. Although CDKs are cell cycle regulators, several members of this family have direct functions in RNAPII activity regulation [39, 88]. All these kinases are members of multiprotein transcription regulatory complexes and, in mammals, the best known are Cdk7/CycH, Cdk8/CycC and Cdk9/CycT; recently, Cdk12/CycK has been characterized as a new CTD kinase. These kinases are evolutionarily conserved, and the following four complexes with kinase activity have been identified in the well-known yeast model
| Yeast complex | Human complex | Other factors listed | Functions | References |
|---|---|---|---|---|
| ySrb10 / Srb11 | hCdk8 / CycC | Bdf1 and Taf2, Med2; Gcn4, Msn2, Ste12, Gal4 | PIC inhibition / activation; Scaffold complex formation | |
| yKin28 / Ccl1-Tfb3 | hCdk7 / CycH | | Scaffold complex formation; Capping complex recruitment; Bur1 activity stimulation; Elongation factor Paf1C recruitment; SAGA complex recruitment; snRNA 3’ processing | [64, 72, 74-76, 90, 94-100] |
| yBur1 / Bur2 | hCdk9 / CycT | hDSIF (ySpt5), Rad6/Bre1 | Ctk1 activity stimulation; PAF complex recruitment; Histone genes 3’-end processing | [54, 99, 101-104] |
| yCtk1 / Ctk2-Ctk3 | hCdk12 / CycK | | RNAPII release from basal initiation factors; 3’-processing factors recruitment | [16, 49, 89, 105-112] |
3.1.1. Pre-initiation and Initiation RNAPII CTD kinases
Human Cdk8/CycC and yeast Srb10/Srb11 are part of the CDK-module of Mediator, a large complex of 25-30 proteins that is structured in 4 sub-complexes or modules that act as a molecular bridge between DNA-binding transcription factors and RNAPII [114-115]. Mediator is required for the expression of nearly all RNAPII transcribed genes. Cdk8/Srb10 is part of the CDK-module (Cdk8, cyclin C, MED12 and MED13 in mammals; Srb8, 9, 10 and 11 in yeast), which dynamically associates with Mediator [93, 117]. Although Cdk8/Srb10 can phosphorylate Ser2 and Ser5 of the CTD repeats
The Cdk7/cyclin H complex in mammals and its homolog in
Additionally, yeast Kin28 phosphorylates two subunits of Mediator (i.e., Med4 and Rgr1/Med14), and although the function of these modifications is unknown, it has been demonstrated that Mediator significantly enhances the phosphorylation of RNAPII CTD by Kin28 [94, 96]. In fact,
As discussed above, in yeast, Kin28 and Srb10 have overlapping functions in promoting transcription, PIC dissociation and subsequent scaffold complex formation. Genetic analysis has provided further evidence that Kin28 and Srb10 are not redundant because only Kin28 is essential for growth, and Srb10 is much less processive in terms of phosphorylation than Kin28. It is clear that Kin28 is the primary kinase responsible for the high level of phosphorylation of RNAPII during initiation [48, 67, 94, 141]. In fact, one essential role of Kin28 that Srb10 does not have is the stimulation of pre-mRNA processing. What does appear clear, at least in yeast, is that PIC dissociation is dependent on the kinase activities of both Kin28 and Srb10. Another function of RNAPII CTD Ser5 phosphorylation by Kin28 is the enhancement of Bur1/Bur2 recruitment and Ser2 CTD phosphorylation near the promoters. Moreover, it has recently been demonstrated that the TFIIH kinase places bivalent marks on the CTD, because it also phosphorylates Ser7 during transcription initiation [74-75].
3.1.2. RNAPII CTD elongating kinases
Eukaryotic organisms possess many factors that regulate transcriptional elongation; among these factors is Cdk9 kinase, which is the catalytic subunit of the positive transcription elongation factor b (P-TEFb) that controls the elongation phase of transcription by RNAPII in mammals and
In higher eukaryotes, the transcription factor P-TEFb not only regulates CTD phosphorylation, but it also inhibits the action of transcriptional repressors and is required for the association of several elongation factors with the transcribing polymerase. P-TEFb also targets DRB sensitivity-inducing factor (DSIF) and negative elongation factor (NELF) [142-144] (Table 1). Thus, P-TEFb promotes transcription by the following two different mechanisms: inhibiting the action of transcriptional repressors and phosphorylating the CTD during transcription elongation. Until recently, it was believed that Cdk9 was the only CTD Ser2 kinase in higher eukaryotes. In fact, Cdk9 can reconstitute the activity of both
Bur1 kinase and its cyclin, Bur2, form an essential CDK in
Bur1 shares another function with the mammalian and
Ctk1 was originally identified as the kinase subunit of the yeast CTDK-I complex that catalyzes phosphorylation of the RNAPII CTD. Ctk2 is the cyclin, and the function of Ctk3 remains unknown. Ctk1 is the principal kinase responsible for CTD-Ser2P during transcription elongation, which is coincident with reduced Ser5P [73, 156]. Although Ctk1 is not directly involved in transcription elongation [16, 18, 157], it associates with RNAPII throughout elongation, and the kinase activity of Ctk1 is required for the association of polyadenylation and termination factors and histone modification factors. Additionally, Ctk1 interacts genetically as well as biochemically with the TREX complex, which couples transcription elongation to mRNA export. Moreover, Ctk1 promotes the dissociation of basal transcription factors from elongating RNAPII early during transcription; its kinase activity, however, is not required for this function.
In addition to its functions in the transcription of protein-coding genes, Ctk1 is involved in RNAPI transcription, interacts with RNAPI
3.2. RNAPII CTD phosphatases
Dynamic de-phosphorylation of Ser2P and Ser5P makes a significant contribution to changes in CTD phosphorylation patterns during the transcription cycle and is essential for RNAPII recycling [8, 31]. Dephosphorylation is achieved by several CTD phosphatases (Table 2). Initially, only one phosphatase was identified, Fcp1, which is required for Ser2P de-phosphorylation, transcription elongation and RNAPII recycling to initiate new rounds of transcription [47, 165]. Two other CTD phosphatases were later identified in yeast: Ssu72, a component of the mRNA 3' end processing machinery [79, 88, 166], and Rtr1. In mammals, in addition to Fcp1, there are other CTD phosphatases, i.e., the small phosphatase SCP1 and RPAP2, which is the human homolog of Rtr1. Briefly, Fcp1 dephosphorylates Ser2P; Ssu72 dephosphorylates Ser5P and Ser7P [50, 69]; and Rtr1 in yeast and SCP1 in mammals specifically dephosphorylate Ser5P [79, 167-168].
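The phosphatase specificities summarized in the preceding paragraph can be written as a small lookup structure. The Python sketch below is illustrative only: the dictionary layout and the function name `phosphatases_for` are hypothetical, each entry records just what the summary sentence above states, and homologs discussed elsewhere in the text (e.g., RPAP2 or mammalian Ssu72) are deliberately left out because their targets are not given in that sentence.

```python
# Substrate specificities as stated in the summary sentence of the text.
# Layout and names are illustrative assumptions, not an established schema.
CTD_PHOSPHATASES = {
    "Fcp1": {"organisms": {"yeast", "mammals"}, "marks": {"Ser2P"}},
    "Ssu72": {"organisms": {"yeast"}, "marks": {"Ser5P", "Ser7P"}},
    "Rtr1": {"organisms": {"yeast"}, "marks": {"Ser5P"}},
    "SCP1": {"organisms": {"mammals"}, "marks": {"Ser5P"}},
}


def phosphatases_for(mark, organism=None):
    """List phosphatases reported to remove `mark`, optionally per organism."""
    return sorted(
        name
        for name, info in CTD_PHOSPHATASES.items()
        if mark in info["marks"]
        and (organism is None or organism in info["organisms"])
    )


if __name__ == "__main__":
    for mark in ("Ser2P", "Ser5P", "Ser7P"):
        print(mark, phosphatases_for(mark))
    print("Ser5P in yeast:", phosphatases_for("Ser5P", organism="yeast"))
```

Querying by mark rather than by enzyme makes the overlap explicit: Ser5P is the only mark in this summary with more than one listed phosphatase.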
Chromatin immunoprecipitation studies have provided further evidence that the increase in Ser2P occurs as transcription progresses through the gene and follows Ser5P de-phosphorylation. Rtr1 in yeast was identified as the RNAPII CTD phosphatase driving the Ser5P-to-Ser2P transition at the 5’ regions of the transcribed genes. Rtr1 genetically interacts with the RNAPII machinery, and Rtr1 deletion provokes global Ser5P accumulation in whole-cell extracts and Ser5P association throughout the coding regions [167, 171]. RPAP2 was identified in a systematic analysis carried out to determine the composition and organization of the soluble RNAPII machinery, and as in the case of Rtr1, Ser5P levels increase
| Phosphatase | CTD target | Functions | References |
|---|---|---|---|
| | CTD-Ser5P | Promote Ser5P to Ser2P transition; Association of Integrator with snRNA genes; Transcription termination and 3’-end processing; Facilitate Fcp1 activity; Gene looping and RNAPII recycling | [50, 69, 79, 166, 172-176] |
| y/h Fcp1 | CTD-Ser2P | Positive regulator of RNAPII transcription; Transcription termination and RNAPII recycling | [47, 73, 165, 177-182] |
Ssu72 was first described as a Ser5P phosphatase and recently as a Ser7P phosphatase [50, 69]. It was originally identified through its functional interaction with the general transcription factor TFIIB [184-185]. Afterward, it was demonstrated that Ssu72 is part of the cleavage and polyadenylation factor (CPF) with a role at the 3’-end of genes [166, 175], where it is crucial for transcription-coupled 3’-end processing and termination of protein-coding genes [175, 186-187]. Later, Ssu72 was characterized as a Ser5P phosphatase and a potential tyrosine phosphatase and, most recently, it has been demonstrated that Ssu72 is also a Ser7P phosphatase [50, 69]. A genome-wide distribution analysis of Ssu72 has demonstrated two peaks of association (Figure 4): a low peak at the 5’-end of genes and a higher peak at the cleavage and polyadenylation site or immediately after it. In agreement with this distribution, Ssu72 dephosphorylates the RNAPII CTD following cleavage and polyadenylation and recycles the terminating RNAPII, giving rise to a hypophosphorylated polymerase. In fact, inactivation of Ssu72 leads to the accumulation of Ser7P marks, which prevents RNAPII recruitment to the PIC and therefore inhibits transcription initiation, resulting in cell death. In other words, Ssu72 is critical for transcription termination, 3’-end processing and RNAPII recycling to restart a new round of transcription. Additionally, it has been shown that Ssu72 has a function in gene looping. In a screen looking for mammalian retinoblastoma tumor suppressors, a human homolog of yeast Ssu72 was identified. As in yeast, mammalian Ssu72 associates with TFIIB and the yeast cleavage/polyadenylation factor Pta1, and exhibits intrinsic phosphatase activity. The crystal structure of the complex formed by human symplekin (Pta1 in yeast), hSsu72 and a CTD phosphopeptide has been elucidated, and hSsu72 was demonstrated to have a function in coupling transcription to pre-mRNA 3'-end processing.
Fcp1 was the first discovered CTD phosphatase and is highly conserved among eukarya [177, 189-191]. It directly de-phosphorylates RNAPII, and its activity is stimulated
Gene transcription is decreased in cells lacking Fcp1 function, and
4. Other factors influencing RNAPII CTD phosphorylation
Although many factors can have effects on CTD phosphorylation, we will highlight the following two that we believe are of significant relevance: the prolyl isomerases hPin1/yEss1 and the structure of the RNAPII itself. In addition, we will describe the role of ySub1 in CTD phosphorylation, because it has been extensively studied by us.
4.1. hPin1 / yEss1
The CTD can adopt either cis- or trans-conformations, which can significantly affect its modification, especially its phosphorylation. Peptidyl prolyl isomerases (PPIases) are enzymes that accelerate the rates of rotation about the peptide bond preceding proline and are important for protein folding and regulation of dynamic cellular processes [193-194]. Pin1 in mammals and Ess1 in
4.2. RNAPII structure and Rpb1-CTD localization
The structure of the complete 12-subunit RNAPII (Rpb1-12) is known [203-204]. The Rpb4 and Rpb7 subunits form a sub-complex that is conserved in all three eukaryotic RNA polymerases and in the archaeal RNAP [205-206]. Crystal structures of the Rpb4/7 heterodimer in the context of the complete RNAPII complex localized it in the proximity of the Rpb1-CTD [203, 207], and biochemical and genetic studies suggest that Rpb4/7 might have a function in the recruitment of some CTD-binding proteins to transcribing RNAPII. Moreover, it is possible that the Rpb4/7 sub-complex regulates the access of CTD-modifying enzymes during the whole transcription cycle [203, 207, 209-212]. Indeed, structural studies have provided further evidence that the CTD extends from the RNAPII core enzyme near the RNA exit channel, where it is ideally located to bind and be affected by the action of a multitude of factors, among them kinases, phosphatases and isomerases. In fact, in yeast, the prolyl isomerase Ess1 and the phosphatase Fcp1 are associated with Rpb7 and Rpb4, respectively [55, 87, 208].
4.3. The ssDNA binding protein Sub1 as a general regulator of transcription
Sub1 is an ssDNA binding protein that has been implicated in several steps of mRNA metabolism, such as initiation, transcription termination and 3’-end processing [186, 213-215]. Sub1 was originally described as a transcriptional stimulatory protein that is homologous to the human positive coactivator PC4, which physically interacts with activators and components of the RNAPII basal transcription machinery [216-220]. Sub1 genetically and physically interacts with TFIIB [214-215, 221], and several functions have been proposed for Sub1 that include stimulating PIC recruitment and promoter escape. In fact, most recently, using a quantitative proteomic screen to identify promoter-bound PIC components, Sub1 was identified as a functional PIC component that is associated with RNAPII complexes. In addition, we have recently demonstrated that Sub1 globally regulates RNAPII CTD phosphorylation (Figure 5) and that it is a
5. RNAPII CTD phosphorylation and pre-mRNA processing
The CTD is an unordered structure that extends from the RNAPII core enzyme, near the RNA exit channel [204, 209]. This location is well suited for interaction with a plethora of factors, such as the CTD-modifying enzymes and binding factors involved in distinct nuclear processes, for example, components of the RNA processing machinery [32, 88]. Furthermore, its length and its ability to adopt numerous conformations permit it to interact with different factors at the same time [31-32], and it is currently clear that these interactions depend on the CTD phosphorylation patterns during the transcription cycle [8, 21].
As transcription progresses, the nascent RNA is capped to protect the 5′ end, intron sequences are removed, and a polyadenylated tail is added to the 3’ end. Coupling mRNA processing to transcription increases processing efficiency and allows multiple regulatory pathways to guarantee that only correctly modified mRNAs are exported. For more than a decade, numerous studies have provided evidence that the CTD serves as a scaffold for the assembly of an enormous variety of protein complexes to coordinate not only transcription of non-coding and protein-coding genes [8, 58-62, 64-65], but also pre-mRNA processing [21, 31-32]: capping [42, 135, 137], splicing, and 3’-end cleavage and polyadenylation. All of these functions are achieved through the recognition and reading of the CTD code during the transcription cycle [6-8, 31]. Thus, the CTD plays a crucial role in the co-transcriptional processing of nascent RNA, both by recruiting the RNA processing machineries and by regulating their activities. Indeed, a functional CTD is not required for
The capping reaction consists of the addition of an inverted 7-methylguanosine cap to the first RNA residue by a 5'-5' triphosphate bridge. The cap is a characteristic of all RNAPII transcripts and is added to the 5’-end of nascent transcripts when they are only 25-50 bases long. The capping complex contains the following three enzymatic activities: RNA 5'-triphosphatase, guanylyl transferase and RNA (guanine-7) methyltransferase [17, 67]. In yeast, these activities are carried out by three enzymes (i.e., Cet1, Ceg1 and Abd1, respectively), whereas in metazoans, they are performed by two enzymes (i.e., HCE and MT) because guanylyl transferase and RNA 5'-triphosphatase are two functional domains of the HCE protein. Following Ser5 phosphorylation by TFIIH, the mRNA capping complex binds directly and specifically to Ser5P residues through the Ceg1 subunit in yeast or the guanylyl transferase domain in metazoans [48, 67, 78, 95, 137]. Furthermore, the interaction of the phosphorylated CTD with the capping complex allosterically stimulates the capping enzyme activity and, in turn, enhances early transcription [136, 232]. Because the CTD is located near the RNA exit channel, its interaction with the capping complex positions the complex for rapid processing of the mRNA 5’-end as the nascent transcript emerges from the polymerase. This is thought to protect the RNA from degradation and to promote the progression of RNAPII into productive transcription elongation. In fact, by coupling capping and early transcription, only capped RNA will be elongated [67, 136, 232-233].
Not only are capping and transcription linked at the 5’-end regions of protein-coding genes, but so are polyadenylation and transcription termination at the 3’-end regions. In brief, 3’-end processing consists of the following two-step reaction: endonucleolytic cleavage of the pre-mRNA and subsequent addition of a poly(A) tail. Both enzymatic reactions require a functional CTD [42, 230]. In fact, deletion of the CTD or absence of CTD phosphorylation negatively affects 3’-end processing [16, 30, 106, 157, 234]. Furthermore, the CTD binds 3′-end processing factors and stimulates cleavage/polyadenylation
Ser7 phosphorylation has been functionally related to 3’-end processing of snRNA in higher eukaryotes. Human snRNA genes, contrary to protein-coding genes, are not polyadenylated, and instead of a poly(A) signal, they contain a conserved 3′ box RNA-processing element that is recognized by the snRNA gene-specific Integrator RNA 3′ end-processing complex. This complex binds to the RNAPII CTD and links transcription and 3’-end processing [63-64, 240-241]. Therefore, in metazoans, Ser7P, in combination with Ser2P, is a major determinant for the recruitment of the Integrator complex to snRNA genes during their transcription [64, 240-241]. In yeast, the recruitment of the Integrator-like complex depends on Ser7 phosphorylation, the promoter elements and the specialized PIC that binds those elements. After promoter escape, the RNA processing complex travels with the elongating phosphorylated polymerase up to the 3’-end box at the end of the snRNA transcription unit, where it associates with the nascent transcript in a co-transcriptional manner.
As in the case of capping and cleavage/polyadenylation, a number of studies performed
6. RNAPII CTD phosphorylation coordinates transcription to other nuclear processes
6.1. Coupling the CTD code and the histone code
The nucleosome is the basic element of chromatin and consists of a histone octamer composed of two copies of histone 2A (H2A), H2B, H3 and H4, wrapped by 146 bp of DNA. The histones carry numerous post-translational modifications, and some of these are associated with transcription. In fact, a general view is that histone post-translational modifications correlate with either positive or negative transcriptional states. Numerous discoveries have led to the idea that such modifications regulate transcription either directly by causing structural changes to chromatin (e.g., histone acetylation) or indirectly by recruiting protein complexes (e.g., histone methylation) [255-257]. Therefore, chromatin not only plays an essential role in packaging the DNA, but also in regulating gene expression. Most histone modifications reside in the amino- and carboxy-terminal tails, and a few of them in the globular domains. As in the case of CTD phosphorylation, where Ser5P triggers Ser2 phosphorylation, some histone modifications mark the deposition of another, thus creating a complex epigenetic signal code, the “histone code”, that governs chromatin organization and DNA-dependent processes such as transcription. Therefore, the histone code is responsible for an active or inactive chromatin state with respect to transcription, because it coordinates the recruitment of various chromatin modifying and remodeling complexes to regulate chromatin structure and, consequently, transcription [258-259]. Because this review focuses on RNAPII CTD phosphorylation, only certain histone modifications, which are functionally related to the CTD code and transcription, will be discussed. There are excellent reviews that discuss all the histone modifications and their roles in different nuclear processes [255, 258, 260-261].
Lysine is a key substrate residue because it undergoes many mutually exclusive modifications important for transcription regulation (i.e., acetylation, methylation, ubiquitination and sumoylation) [255, 261]. Lysine residues can be mono-, di- or trimethylated, and each level of modification can result in distinct biological effects. In brief, with respect to transcription, acetylation is activating and sumoylation appears to be repressive, and the two modifications may interfere with each other. Methylation, on the other hand, can have distinct effects: lysine 4 of histone H3 is trimethylated (H3K4me3) at the 5’-ends of genes during activation, whereas trimethylation of H3K9 occurs in transcriptionally silent regions. Arginine residues of H3 and H4 can also be mono- or dimethylated, which activates transcription. Serine/threonine phosphorylation of H3 at specific sites also marks activated transcription, and ubiquitination of H2B and H2A is associated with active and repressed transcription, respectively (reviewed in [255-257]). All histone modifications are removable by specific enzymes (e.g., histone deacetylases (HDACs), phosphatases, and ubiquitin proteases) ([255-257], and references therein). In fact, HDACs play important regulatory roles during active transcription.
Methylation of H3K4 and H3K36 are the best characterized histone modifications with roles in active transcription, and their functions are directly linked to RNAPII CTD phosphorylation (Figure 6). H3K4 is methylated by the Set1/COMPASS complex, while H3K36 methylation is mediated by the Set2 complex. The profile of H3K4 tri-methylation (H3K4me3) strongly correlates with the distribution pattern of RNAPII CTD-Ser5P. It is mainly found around the transcriptional start site (TSS), where it contributes to transcription initiation, elongation and RNA processing. Set1 recruitment and H3K4 tri-methylation usually peak at the promoter and 5’ region of a gene and depend on Kin28/Cdk7 activity (Figure 7) and on the Paf1 complex, an RNAPII-associated complex. H3K4 mono- and di-methylation tend to extend further into the coding regions than tri-methylation. On the other hand, H3K36 methylation by Set2 is observed across the entire coding region, with an increase toward the 3’-ends of actively transcribed genes. Ctk1 also regulates H3K4 methylation [158, 266]. Thus, differential phosphorylation of the CTD by Kin28 and Ctk1 is responsible for the characteristic distribution of H3K4 tri-methylation in the coding region. In contrast to Set1, the recruitment of Set2 and H3K36 methylation depend on a CTD-Ser2P/Ser5P double mark (Figure 6) and, therefore, on Ctk1 kinase activity [158, 266]. Interestingly, the other Ser2 kinase complex, Bur1/2, also promotes Set2 recruitment and assists H3K36 methylation, particularly at the 5’-ends of genes, and is required for the histone H2B ubiquitination activity of the Rad6/Bre1 complex [101, 103, 152].
H3 acetylation/deacetylation is also relevant during active transcription. Histone acetyltransferase and histone deacetylase complexes (HATs and HDACs, respectively) are recruited to the transcriptional machinery during elongation through their interaction with RNAPII. Indeed, they modulate histone occupancy in the coding regions of actively transcribed genes, and this depends on CTD phosphorylation status [267-268]. HATs acetylate nucleosomes, promoting nucleosome eviction and allowing RNAPII to pass through. Afterward, the nucleosomes are immediately reassembled behind the polymerase, and HDACs are co-transcriptionally recruited to rapidly and efficiently deacetylate them. Altogether, this avoids cryptic transcription and maintains active transcription. Methylation of histone H3 by Set1 and Set2 is required for deacetylation of nucleosomes in coding regions by the HDAC complexes Set3C and Rpd3C(S), respectively. HDAC recruitment is triggered by H3K4 methylation at promoters and within coding regions, which restricts hyperacetylated histones to promoters and maintains transcription activity. Set1-dependent H3K4me2 can be recognized by two different HDACs, Rpd3S or Set3C. The Set1-Set3C pathway preferentially affects actively transcribed genes with promoters configured for efficient initiation/re-initiation. In contrast, the Set1-Rpd3S pathway is active at loci subjected to cryptic and weak transcription, encompassing repressed promoters of coding genes. Related to this, phosphorylation of the CTD by Kin28/Cdk7 is important for the initial recruitment of the Rpd3S and Set3C HDACs to coding sequences (Figure 7). In fact, it has been reported that Set3C and Rpd3C(S) are co-transcriptionally recruited in the absence of Set1 and Set2, and that this recruitment is stimulated by the CTD kinase Kin28/Cdk7. Hence, co-transcriptional recruitment of Rpd3C(S) and Set3C is stimulated by CTD Ser5P to achieve the deacetylation of H3 residues.
This, together with evidence that the RNAPII CTD recruits additional chromatin modifying complexes, histone chaperones and elongation factors, suggests that phosphorylated RNAPII is crucial in coordinating the activities of the many factors required for regulating histone dynamics and, consequently, transcription elongation at actively transcribed genes (and references therein).
In addition to the complexes involved in mRNA processing, several other proteins bind to the RNAs as soon as their 5’-end emerges from RNAPII, packaging them into a messenger ribonucleoparticle (mRNP). This set of interactions with packaging and export factors plays the dual function of protecting the RNA from degradation and preparing it for export. Although interactions between the CTD and many mRNA processing factors have been characterized, this is not the case for mRNA packaging and export factors. However, packaging and export also seem to be coupled to transcription through RNAPII CTD interactions, because defects in transcription elongation, splicing, and 3’-end processing affect export. In yeast, mRNA export is linked to transcription through the TREX (transcription export) complex, which is composed of the THO complex (Tho2, Hpr1, Mft1, and Thp2), the evolutionarily conserved RNA export proteins Sub2 (UAP56 in human) and Yra1 (REF/Aly in human), and a novel protein termed Tex1 [271-272]. Deletion of individual THO components causes defects in transcription, transcription-dependent hyper-recombination, and defects in mRNA export [160, 273]. In addition, the Sub2/Yra1 complex is directly recruited to actively transcribed regions via the THO complex [272, 274]. Although it has been shown that the TREX complex and Ctk1 are functionally related, recruitment of the TREX complex to transcribed genes is not dependent on Ctk1 in yeast, and the association of the human TREX complex with transcripts might be coupled to transcription indirectly through splicing. Thus, the potential role of the CTD and CTD phosphorylation in this process has remained unclear; however, in a very recent study, the mRNA export factor Yra1 was identified as a phosphorylated CTD-binding protein. This study therefore provides strong support for the idea that the phosphorylated CTD is directly involved in the co-transcriptional recruitment of export factors to active genes.
In summary, many aspects of mRNA metabolism, from 5’ capping to export, occur co-transcriptionally and are coordinated through transcription, with the RNAPII CTD and its phosphorylation being the main coordinator in most cases (Figure 7).
7. RNAPII CTD phosphorylation and transcription regulation
The levels of CTD phosphorylation/de-phosphorylation are precisely modulated during the entire transcription cycle, which regulates the association of many important factors with initiating and elongating RNAPII, such as transcription and pre-mRNA processing factors, chromatin modifiers and mRNA export factors. The interplay of all of these factors is essential for regulating transcription and, consequently, gene expression. Thus, for a typical protein-coding gene, the following set of coordinated nuclear events must occur for it to be properly transcribed into a functional mRNA before that mRNA is exported to the cytoplasm and translated (Figure 7). Unphosphorylated RNAPII is recruited to the pre-initiation complex (PIC); then, after binding to the promoter, it is phosphorylated on Ser5 by yKin28 (hCdk7). Ser5 phosphorylation is required for RNAPII dissociation from the PIC and consequently promotes transcription initiation. Simultaneously, Ser5 phosphorylation directs the recruitment of capping and splicing factors, the Set1 methyltransferase complex, and the Set3C and Rpd3S histone deacetylase complexes. During early elongation, Ser5P levels decrease, whereas Ser2P levels increase due to the kinase activity of yBur1 (hCdk9) near the promoter and of yCtk1 (hCdk12) further along the coding region and at the 3’-end. This leads to the recruitment of the histone methyltransferase Set2 and the activation of the Rpd3S complex, which prevents cryptic transcription within genes. When RNAPII arrives at and recognizes the termination site, the 3’-end-processing factors that are associated with the CTD carry out cleavage and polyadenylation of the nascent mRNA, which also requires proper phosphorylation of the polymerase. During the termination process, the CTD is de-phosphorylated by Ssu72 and Fcp1, and the polymerase is recycled to initiate a new round of transcription. 
All along the gene, packaging and export factors (TREX complexes) are incorporated into the transcriptional machinery protecting the transcript from degradation and preparing it for export to the cytoplasm.
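The sequence of events summarized above can also be written down as a small lookup structure, which may be convenient when annotating genome-wide occupancy data. The following Python sketch is only a bookkeeping aid under stated assumptions: the class name, field names and helper functions are hypothetical, the stage boundaries are a simplification of the text, and no quantitative behavior is modeled.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CycleStage:
    """One step of the transcription cycle as summarized in the text (illustrative only)."""
    name: str
    ctd_state: str                # qualitative CTD phosphorylation state
    enzymes: Tuple[str, ...]      # CTD kinases or phosphatases acting at this step
    recruited: Tuple[str, ...]    # factors whose recruitment the text links to this step


# Entries restate the summary paragraph; they are not an exhaustive catalogue.
CYCLE = (
    CycleStage("PIC assembly", "unphosphorylated", (), ()),
    CycleStage("initiation", "Ser5P",
               ("yKin28/hCdk7",),
               ("capping factors", "splicing factors",
                "Set1 methyltransferase", "Set3C", "Rpd3S")),
    CycleStage("elongation", "Ser5P decreasing, Ser2P increasing",
               ("yBur1/hCdk9", "yCtk1/hCdk12"),
               ("Set2 methyltransferase",)),
    CycleStage("3'-end processing", "phosphorylated",
               (),
               ("cleavage/polyadenylation factors",)),
    CycleStage("termination", "de-phosphorylated", ("Ssu72", "Fcp1"), ()),
)

# Packaging and export factors (TREX) are described as joining all along the gene,
# so they are kept apart from the stage-specific entries.
ALONG_THE_GENE = ("TREX complex",)


def recruited_at(stage_name: str) -> Tuple[str, ...]:
    """Return the factors listed for a named stage, plus those present along the whole gene."""
    for stage in CYCLE:
        if stage.name == stage_name:
            return stage.recruited + ALONG_THE_GENE
    raise KeyError(stage_name)


def stages_using(enzyme: str) -> list:
    """Return the names of the stages at which a given CTD-modifying enzyme acts."""
    return [stage.name for stage in CYCLE if enzyme in stage.enzymes]


if __name__ == "__main__":
    for stage in CYCLE:
        print(f"{stage.name}: {stage.ctd_state}; enzymes: {', '.join(stage.enzymes) or 'none'}")
```

For example, `stages_using("Ssu72")` returns the termination step only, and `recruited_at("elongation")` lists Set2 together with the TREX complex. Keeping the marks, enzymes and recruited factors in separate fields is a design choice made here for readability; a real analysis would need residue-level and repeat-level detail that this sketch deliberately omits.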
8. Therapeutic potential
Cellular differentiation, morphogenesis, development and adaptability of all organisms depend on proper gene expression; therefore, variations in gene regulation can have profound effects on protein function, challenging the viability of the organism. It is now clear that RNAPII phosphorylation has an important role in gene expression and, therefore, in all the processes mentioned above. Consequently, over the last decade CTD phosphorylation has attracted the attention of biomedical research, especially because the CTD kinase Cdk9 has been implicated in several physiological cell processes whose deregulation may be associated with cancer, and because Cdk9 activity is required for human immunodeficiency virus type 1 (HIV-1) replication. In this regard, many studies have shown the enormous potential of Cdk9 kinase inhibition as a treatment for several kinds of tumors, HIV infection, and cardiac hypertrophy.
HIV-1 requires host cell factors for all steps of viral replication, among them the transcription elongation factor P-TEFb. Transcription of HIV-1 genes is carried out by host RNAPII and is induced by a viral trans-activator protein, Tat. When bound to the TAR region of the viral RNA, Tat activates HIV-1 transcription through the early recruitment of host transcriptional activators, including P-TEFb, which phosphorylates the RNAPII CTD and thereby promotes elongation of the viral transcript. Thus, drugs that inhibit Cdk9, such as flavopiridol, have been investigated as antiretroviral therapy for AIDS patients. Therefore, from the point of view of basic research, the study of the functions of Cdk9 and of RNAPII CTD phosphorylation is of great interest for understanding the mechanisms that regulate HIV replication, which should in turn lead to progress in AIDS biomedical research.

Further evidence has been provided of deregulated Cdk9 function in several tumors, such as lymphoma, neuroblastoma, primitive neuroectodermal tumor, rhabdomyosarcoma and prostate cancer [282-284]. Cdk9 inhibition by chemotherapeutic agents, such as flavopiridol or CYC202, has been shown to reduce transcription in malignant cells, mainly affecting RNAs with short half-lives. Most of these RNAs code for anti-apoptotic proteins, for instance the oncoprotein Mcl-1, which is necessary for the maintenance of tumor proliferation. Unfortunately, these agents have shown only modest activity in patients, although promising studies continue at present.

Cardiac hypertrophy consists of an increase in the size of cardiomyocytes and is associated with cardiac conditions such as hypertension or diminished heart function. Hypertrophy is a physiological response to a stress stimulus that results in an increase in cell size and that may eventually lead to heart failure. The increase in cell size is accompanied by increased mRNA transcription, which requires Cdk9 activity. Thus, therapy with Cdk9 inhibitors has been reported to benefit patients with cardiovascular disorders.
9. Concluding remarks
The primary function of RNAPII CTD phosphorylation in eukaryotes is the integration of transcription with distinct nuclear processes. Thus, CTD phosphorylation operates as a fine-tuning regulatory mechanism during the whole transcription cycle and is consequently of extraordinary importance for proper gene expression. Since the late 1980s, a large number of laboratories have tried to decipher the mechanism underlying the creation of a CTD code and how this code is translated during transcription to coordinate mRNA processing, export and chromatin modifications. Although great progress has been achieved, most recently thanks to genome-wide analysis techniques, a number of issues remain unresolved. For instance, it is very challenging to determine the exact phosphorylation state of specific residues within specific repeats during each step of transcription, as well as the exact number of repeats that are phosphorylated within the CTD at every step of transcription, and how this is related to specific roles of the CTD in gene expression. Moreover, it needs to be determined whether phosphorylation of the repeats with non-consensus sequences is regulated in the same manner as that of the consensus repeats, and whether this is achieved by the same set of CTD modifying enzymes. In addition, other residues such as lysine and arginine can potentially be modified, further increasing the complexity of the CTD code; if such modifications occur during transcription, studying them may further elucidate known CTD functions or reveal new ones. Finally, a detailed understanding of RNAPII CTD phosphorylation is highly relevant: it will add insight into the processes that alter gene expression, such as HIV infection and cancer, and will help to investigate whether other human CTD modifying enzymes, in addition to Cdk9, may be good candidates for therapy. 
In conclusion, much progress has been made, but many questions remain open, and the new high-throughput techniques in genomics and proteomics should help to advance toward a complete understanding much faster.
This work was supported by a grant from the Spanish Ministerio de Ciencia e Innovación (BFU 2009-07179) to OC. AG was supported by a fellowship from the Junta de Castilla y León. The IBFG acknowledges support from “Ramón Areces Foundation”.