RNA Polymerase II Phosphorylation and Gene Expression Regulation

This work was supported by a grant from the Spanish Ministerio de Ciencia e Innovacion (BFU 2009-07179) to OC. AG was supported by a fellowship from the Junta de Castilla y Leon. The IBFG acknowledges support from “Ramon Areces Foundation”.


Introduction
RNA polymerases (RNAPs) are among the most important cellular enzymes. They are present in all living organisms, from Bacteria and Archaea to Eukarya, and are responsible for DNA-dependent transcription. Whereas Bacteria and Archaea have only one RNAP, Eukarya possess three RNAPs in animals (I, II and III) and five in plants (with the additional IV and V) [1][2]. All of the RNAPs are evolutionarily related and have common structural and functional properties. The minimally conserved structural organization is represented by the bacterial enzyme, which contains only 4 subunits, whereas Archaea and Eukarya RNAPs are composed of 12 subunits (Rpb1-Rpb12) [3]. In prokaryotes, a single RNAP transcribes all genes; in eukaryotes, this task is divided among three RNAPs. RNAPI transcribes genes that encode 18S and 28S ribosomal RNAs; RNAPIII transcribes short genes, such as those for tRNAs and 5S ribosomal RNA; and RNAPII transcribes all protein-coding genes and genes for small noncoding RNAs (e.g., small nuclear RNAs (snRNAs) that are involved in splicing). The largest catalytic subunits of all three eukaryotic polymerases share homology among themselves and with the largest subunit of bacterial polymerase [4]. Only the largest subunit of RNAPII (Rpb1) contains an unusual, evolutionarily conserved carboxy-terminal domain (CTD) [5], which is subject to numerous post-translational modifications of extraordinary importance in gene expression regulation [6][7][8].
RNAPII transcription plays a central role in gene expression and is highly regulated at many steps, such as initiation, elongation and termination. Furthermore, phosphorylation of the Rpb1 CTD is known to regulate all of the transcription steps and to coordinate these steps with other nuclear events. Prior to mRNA biosynthesis, RNAPII proceeds through several steps, such as promoter recognition, preinitiation complex (PIC) assembly, open complex formation, initiation and promoter escape. 
This sequence of events is initiated by the binding of gene-specific activators and coactivators, which results in the recruitment of basal transcription machinery (i.e., general transcription factors (GTFs): TFIIA, TFIIB, TFIID, TFIIE, TFIIF, and TFIIH) and RNAPII to promoters [9][10][11]. Basal transcription factors position RNAPII on promoters to form the PIC but also function at later steps, such as promoter melting and initiation site selection. Thereafter, initiation proceeds, and RNAPII leaves the promoter during promoter clearance and proceeds into processive transcript elongation. Finally, when the gene has been fully transcribed, transcriptional termination occurs, and RNAPII is released and recycled to reinitiate a new round of transcription [12][13][14].
During its passage across a gene, RNAPII must overcome challenges. Initially, the polymerase needs to escape from the promoter, and the synthesis of the pre-mRNAs must be tightly coupled to its subsequent processing (i.e., capping, splicing, and polyadenylation). Then, initiation factors must be exchanged for elongation factors [15], which are thought to increase the transcription rate and RNAPII processivity. In fact, recently, there has been an extraordinary increase in the number of proteins known to influence transcription elongation by avoiding transcriptional arrest, facilitating chromatin passage and mRNA processing [16][17][18][19][20][21], allowing mRNA packaging into a mature ribonucleoprotein (mRNP) and controlling mRNP quality and mRNA export [13,[22][23][24][25][26][27][28]. Therefore, the discovery of all of these factors has provided further evidence that the elongation phase is also highly regulated in eukaryotic cells and strictly coordinated with other nuclear processes [12][13][14].

RNAPII CTD phosphorylation: The CTD code
During the last two decades, gene expression studies have provided further evidence that many steps in gene expression, originally considered distinct and independent, are, in fact, highly coordinated, linked and regulated in a complex web of connections [29][30]. The central coordinator that directs this regulatory network (i.e., from transcription initiation to termination and with pre-mRNA processing) in combination with many other nuclear functions is RNAPII, and the carboxy-terminal domain (CTD) of its largest subunit is of remarkable importance. CTD phosphorylation regulates and coordinates the entire transcription cycle with pre-mRNA processing, mRNA transport and with chromatin remodeling and modification [13]. The CTD, therefore, has a critical integrating role in essentially all of the mRNA biogenesis steps, thus, it is subject to a dynamic regulation during the transcription cycle (i.e., [21,[31][32]). Therefore, RNAPII phosphorylation is one of the key processes in the regulation of transcription specifically and gene expression in general; consequently, deciphering the mechanisms that underlie RNAPII phosphorylation regulation has become one of the most studied issues in the field of gene expression.
RNAPII comprises 12 subunits (Rpb1-12) that are structurally and functionally conserved from yeast to mammals [33][34]. In 1985, the largest subunit of RNAPII, Rpb1, from mouse and Saccharomyces cerevisiae, was cloned [4,35], and its sequence revealed that it contained a highly conserved carboxy-terminal domain (CTD). This domain has been extensively studied since then and, although it is a simple tandem repetition of the heptapeptide consensus sequence Tyr1-Ser2-Pro3-Thr4-Ser5-Pro6-Ser7 (YSPTSPS; Figure 1), the CTD has an extremely complex functionality. The consensus sequence is present in animals, plants, yeast, and many protists [5,[36][37], and it has been hypothesized that the CTD structure originated through amplifications of a repetitive DNA sequence and that the number of repeats appears to be directly correlated with genomic complexity (Figure 1A; [38]). For example, mouse and human CTDs contain 52 repeats [35,[39][40]; the Drosophila CTD contains 45 repeats [41]; 25-27 repeats are found in the yeast CTD (Figure 1A; [4]); and 15 repeats are found in protozoan CTDs [5,38]. Although the CTD is completely dispensable for in vitro transcription, it is required for efficient RNA processing [17,42]. In fact, the CTD is essential for cell viability because its deletion is lethal in mice, Drosophila and yeast, and partial truncations or site-specific mutations cause specific growth defects [5,42]. Original studies showed that two RNAPII forms can be differentiated in SDS-PAGE gels because of the different mobility of Rpb1 [43]. These two forms were termed RNAPIIA and RNAPIIO, and they differ in the extent of CTD phosphorylation. RNAPIIA is hypophosphorylated [44], and RNAPIIO is hyperphosphorylated [45]. 
Moreover, both forms, IIA and IIO, are functionally distinct because the IIA form is preferentially recruited to the promoter and associated with preinitiation complexes [46], whereas RNAPIIO functions during elongation, is highly phosphorylated [44] and thus requires dephosphorylation to stimulate its recruitment into the PIC complexes and to reinitiate a new round of transcription [47]. We currently know that this earlier two-step transcription cycle
The RNAPII CTD code determines and coordinates the timely sequential recruitment of required specific factors during the transcription cycle. Therefore, the CTD functions as a scaffold that coordinates mRNA biogenesis, such as transcription initiation [58], promoter clearance [59], elongation [60], and termination [31,[61][62], as well as RNA processing [17,21,30] and snRNA and snoRNA gene expression [63][64][65] by recruiting the appropriate set of factors when required during active transcription. These factors recognize CTD phosphorylation patterns either indirectly or directly by contacting phosphorylated residues. Among the CTD-associated factors are export and histone modifier factors and DNA repair factors [21].
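To give a sense of how many phospho-acceptor positions the heptapeptide repeats described above provide, the following minimal Python sketch builds an idealized CTD from the consensus sequence and counts its Tyr, Ser and Thr residues. It assumes that every repeat matches the consensus exactly, which is not true of real CTDs (many repeats are degenerate), and it uses 26 repeats for yeast as an arbitrary value within the 25-27 range quoted in the text. The function names are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# Idealized model: every repeat is the exact consensus heptapeptide.
# Real CTDs contain degenerate repeats, so these counts are only rough estimates.
CONSENSUS = "YSPTSPS"  # Tyr1-Ser2-Pro3-Thr4-Ser5-Pro6-Ser7

# Repeat numbers quoted in the text (26 for yeast is an assumption within 25-27).
REPEATS = {"human": 52, "Drosophila": 45, "yeast": 26}

# Residues that can carry a phosphate group: Tyr, Ser and Thr.
PHOSPHO_ACCEPTORS = set("YST")


def idealized_ctd(n_repeats: int) -> str:
    """Return a CTD made of n_repeats exact copies of the consensus heptapeptide."""
    return CONSENSUS * n_repeats


def phospho_acceptor_sites(ctd: str) -> List[Tuple[int, int, str]]:
    """Return (repeat number, position within repeat, residue) for each Tyr/Ser/Thr."""
    sites = []
    for i, residue in enumerate(ctd):
        if residue in PHOSPHO_ACCEPTORS:
            sites.append((i // 7 + 1, i % 7 + 1, residue))
    return sites


for organism, n in REPEATS.items():
    n_sites = len(phospho_acceptor_sites(idealized_ctd(n)))
    print(f"{organism}: {n} repeats, {n_sites} potential phospho-acceptor sites")
```

Under these assumptions each repeat contributes five potential sites (Tyr1, Ser2, Thr4, Ser5 and Ser7), which illustrates why assigning a modification to a particular residue in a particular repeat is experimentally difficult.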

Ser2P and Ser5P, and to a lesser extent, Ser7P, are the main determinants of the CTD code
Determining precisely which serine residues are phosphorylated in a particular repeat has been challenging because of the number of phospho-acceptor amino acid residues and consensus motif repetitions (Figure 1). However, studies involving chromatin immunoprecipitation with specific monoclonal antibodies have provided evidence that differential phosphorylation of the Ser residues coincides with the temporal and spatial recruitment of different factors [8,32,48,[66][67]. In fact, these antibodies have been widely used to decipher and characterize the role of CTD phosphorylation during the transcription cycle and in gene expression regulation [8,32,68]. Antibodies that selectively recognize either Ser2 or Ser5 phosphorylation (i.e., Ser2P or Ser5P, respectively) were the first to be described [66]; phosphorylation of these two residues has been extensively studied, and they have been considered the two main determinants of the CTD code [6]. It is widely known that CTD phosphorylation switches from Ser5 to Ser2 during the course of transcription and is subject to dynamic regulation during the whole transcription cycle [69][70][71]. The level of Ser5 phosphorylation peaks early in the transcription cycle and remains constant or decreases as RNAPII progresses to the 3′ end of the gene (Figure 3; [48,67,72]). In contrast, Ser2 phosphorylation is the predominant modification in the coding and 3′-end gene regions and occurs simultaneously with productive elongation [31,48,73]. On the other hand, de-phosphorylation of Ser5 occurs during the initiation-elongation transition and throughout the entire elongation step, whereas Ser2 de-phosphorylation occurs at the end of transcription to recycle the polymerase and reinitiate a new round of transcription. 
Therefore, reversible phosphorylation/de-phosphorylation of the CTD plays a significant role in modulating the transcription cycle [31][32].
Most recently, the use of new anti-CTD monoclonal antibodies has demonstrated that Ser7, which is the most degenerate position of the CTD [41], can be phosphorylated during the transcription of snRNA genes and protein-coding genes [64,68,74]. Consequently, this mark increases the complexity of the CTD code [7][8]. Ser7 phosphorylation is mediated by the same kinase that phosphorylates Ser5 [74][75], although, at least in Saccharomyces cerevisiae, Ser7P appears not to be dephosphorylated by the same Ser5 phosphatase (see below) [76]. The first study on Ser7 phosphorylation provided evidence that this modification is functionally important for transcription and processing of snRNAs [8,64] and hypothesized that the CTD code could be gene-transcription dependent. In mammals, Ser7P peaks at the promoter region of snRNA genes but is enhanced toward the 3′ end of protein-coding genes [68]. Recent genome-wide distribution studies in yeast have provided further evidence that Ser7P in protein-coding genes occurs early during transcription initiation and is maintained during the entire transcription cycle. In fact, Ser7P is not only maintained, but it is also generated de novo during transcription elongation. Additionally, it has been hypothesized that Ser7 phosphorylation could facilitate elongation and suppress cryptic transcription [77].
During transcription initiation and promoter escape, the RNAPII CTD is phosphorylated on Ser5 (Ser5P) [48,78]. Concurrently, Ser7 is phosphorylated (Ser7P), establishing a bivalent mark at both protein-coding and noncoding genes [74][75][76]. Shortly after promoter dissociation, Ser5P is rapidly removed while phosphorylated Ser2 (Ser2P) and Ser7P continue to accumulate [70,77]. Finally, all CTD marks are rapidly removed at the end of transcription, and the hypophosphorylated RNAPII (in grey) is ready to assemble into the preinitiation complex and re-initiate transcription [73,[79][80]. Small circles represent phosphorylated serine residues (green circles for Ser5P, blue circles for Ser7P and red circles for Ser2P). Differently colored big circles represent the distinct phosphorylated forms of RNAPII during initiation, elongation and termination. TSS: transcription start site; p(A): polyadenylation site.
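The succession of serine marks described in this section can be summarized as a small qualitative model. The Python sketch below encodes, for each stage of the transcription cycle, which marks are predominantly present, and then lists the marks gained and lost at each transition. The stage names, the set-based encoding and the function name are illustrative assumptions; in vivo, the marks are graded along the gene rather than strictly present or absent.

```python
from typing import Dict, FrozenSet, List, Tuple

# Qualitative summary of the marks described in the text.
# Presence/absence is a simplification: real phosphorylation levels are graded.
CTD_MARKS_BY_STAGE: Dict[str, FrozenSet[str]] = {
    "preinitiation": frozenset(),                 # hypophosphorylated RNAPII
    "initiation": frozenset({"Ser5P", "Ser7P"}),  # bivalent mark near the promoter
    "elongation": frozenset({"Ser2P", "Ser7P"}),  # Ser5P removed, Ser2P accumulates
    "recycling": frozenset(),                     # all marks removed after termination
}

STAGE_ORDER = ["preinitiation", "initiation", "elongation", "recycling"]


def transitions() -> List[Tuple[str, str, List[str], List[str]]]:
    """Return (from_stage, to_stage, marks gained, marks lost) for consecutive stages."""
    result = []
    for current, following in zip(STAGE_ORDER, STAGE_ORDER[1:]):
        gained = sorted(CTD_MARKS_BY_STAGE[following] - CTD_MARKS_BY_STAGE[current])
        lost = sorted(CTD_MARKS_BY_STAGE[current] - CTD_MARKS_BY_STAGE[following])
        result.append((current, following, gained, lost))
    return result


for current, following, gained, lost in transitions():
    print(f"{current} -> {following}: gained {gained}, lost {lost}")
```

Encoding the stages as sets makes the Ser5P-to-Ser2P switch explicit as a set difference, while Ser7P is the only mark shared by initiation and elongation in this simplified view.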

Tyrosine 1 and Threonine 4 can also be phosphorylated
Tyrosine 1 (Tyr1) is evolutionarily conserved and present in all of the 52 repeats of the mammalian CTD, and in all of the 26-27 repeats of the yeast CTD. Although it is well known that Tyr1 is susceptible to phosphorylation by tyrosine kinases in vivo [52][53] and that Tyr1 mutations are lethal in yeast [81], the function of this modification is unknown. Additionally, threonine 4 (Thr4) is also subject to phosphorylation, at least in mammalian and yeast cells [54,82], and it has recently been demonstrated that phosphorylation of Thr4 is required specifically for histone mRNA 3' end processing, facilitating the recruitment of 3' processing factors to histone genes, and is evolutionarily conserved from yeast to human [54].
In mammals, there is an important degeneracy at some positions in the CTD, mainly in most of the carboxy-terminal repeats. The last repeat of the CTD is followed by a conserved 10 amino acid extension (Figure 1; [5]) that contains a constitutive site for casein kinase (CK) II [83]. Although deletion of this extension results in degradation of the CTD and affects transcription and pre-mRNA processing [83][84], mutation of the CKII target site does not affect RNAPII CTD stability. Additionally, this extension is required for the phosphorylation of Tyr1 by c-Abl in mammals, and it has been suggested that Tyr1 phosphorylation could be involved in functions specific to these higher eukaryotes [85]. Finally, non-consensus residues, such as lysine and arginine, are also present in the CTD, and they could potentially be modified by acetylation, ubiquitylation and sumoylation (lysine residues) and by methylation (lysine and arginine residues) [86]. Therefore, the possibilities for CTD modification are enormous, and only some of the modifications have been demonstrated to influence different aspects of gene expression through interactions with numerous factors.

Modifying enzymes: Kinases and phosphatases
Most of what is known concerning CTD-protein interactions, and in particular RNAPII CTD modifying enzymes, is derived from animal and yeast models, especially Saccharomyces cerevisiae, because the consensus sequence and repetitive structure of the CTD, as well as the CTD-modifying enzymes, are highly conserved across a wide range of organisms. A number of kinases and phosphatases that target the CTD have been described and extensively studied (Tables 1 and 2, and references therein). Recent genome-wide distribution studies of the CTD modifications in yeast have provided further evidence that a complex interplay exists between these enzymes (i.e., kinases, phosphatases and isomerase), which coordinate a universal RNAPII CTD cycle [69]. These modifying enzymes alter specific serine residues within the CTD repeats and have distinct and specific functions along the transcription cycle. Although the catalytic mechanisms of CTD kinases and phosphatases are known, the basis for their specificity remains incompletely understood [87][88].
Below, in figure 4, we will highlight the most relevant features and functions of CTD kinases and phosphatases, with special emphasis on the budding yeast enzymes because extensive studies on RNAPII CTD phosphorylation have been performed on that organism, and most of these enzyme complexes are evolutionarily conserved.

RNAPII CTD kinases
The CTD is phosphorylated by members of the cyclin-dependent kinase (CDK) family, which usually consist of a catalytic subunit and a cyclin subunit. Although CDKs are cell cycle regulators, several members of this family have direct functions in RNAPII activity regulation [39,88]. All these kinases are members of multiprotein transcription regulatory complexes and, in mammals, the best known are Cdk7/CycH, Cdk8/CycC and Cdk9/CycT; recently, Cdk12/CycK has been characterized as a new CTD kinase [89]. These kinases are evolutionarily conserved, and the following four complexes with kinase activity have been identified in the well-known yeast model Saccharomyces cerevisiae: Kin28/Ccl1, Srb10/Srb11, Bur1/Bur2 and Ctk1/Ctk2 (Table 1).
[48,67,72]. In contrast, Ser2 phosphorylation is the predominant modification in the gene body and towards the 3′ end, and occurs concurrently with productive elongation [31,48]. Ctk1 is the principal kinase responsible for Ser2 phosphorylation in the body of the genes [16,73]. In addition to Ctk1, the Bur1/Bur2 kinase complex phosphorylates Ser2 when RNAPII is near the promoter and stimulates Ser2 phosphorylation by Ctk1 during elongation [99]. Several CTD phosphatases have been shown to specifically de-phosphorylate Ser5P (Ssu72 and Rtr1), Ser2P (Fcp1) and Ser7P (Ssu72) to promote the initiation-elongation transition, elongation, termination, and RNAPII recycling [50,73,79,167,182]. Srb10 was demonstrated to phosphorylate the RNAPII CTD prior to PIC assembly, negatively regulating transcription initiation [92].

Pre-initiation and Initiation RNAPII CTD kinases Cdk8/CycC and Srb10/Srb11
Human Cdk8/CycC and yeast Srb10/Srb11 are part of the CDK-module of Mediator [113], a large complex of 25-30 proteins that is structured in 4 sub-complexes or modules that act as a molecular bridge between DNA-binding transcription factors and RNAPII [114][115].
Mediator is required for the expression of nearly all RNAPII transcribed genes [116]. Cdk8/Srb10 is part of the CDK-module (Cdk8, cyclin C, MED12 and MED13 in mammals; Srb8, 9, 10 and 11 in yeast), which dynamically associates with Mediator [93,117]. Although Cdk8/Srb10 can phosphorylate Ser2 and Ser5 of the CTD repeats in vitro [90, 92-94, 109, 113], the in vivo relevance of Cdk8/Srb10 remains to be defined. In fact, several studies have provided evidence that Srb10/11 can have both negative [116] and positive [118] effects on gene expression in vivo. Srb10 was demonstrated to phosphorylate the RNAPII CTD prior to PIC assembly, negatively regulating transcription initiation ( Figure 4; [92]). Notably, human Cdk8 represses transcription via phosphorylation and inactivation of the cyclin H subunit of TFIIH, which is the Cdk7 partner [90]. However, subsequent work showed that Srb10 functions in association with Kin28 (hCdk7) to promote RNAPII re-initiation [94]. Following PIC formation and an initial round of transcription, it is thought that subsequent rounds of RNAPII binding and promoter clearance are facilitated via a "scaffold complex" that is composed of a subset of Mediator subunits and GTFs (except TFIIB and TFIIF) that remains bound at the promoter [119]. Therefore, Kin28 and Srb10 have overlapping positive functions in promoting transcription and in the formation of the scaffold complex [94]. Srb10 phosphorylates two subunits of the general transcription factor TFIID (Bdf1 and Taf2) at the PIC; however, the role of these phosphorylation events has not yet been defined. Moreover, Srb10 phosphorylates and inactivates some transcription factors [120][121][122] by triggering their nuclear export or degradation [123][124] and phosphorylates and enhances the activity of others (Table 1). In summary, the in vivo relevance of RNAPII phosphorylation by Cdk8/Srb10 and its role in gene expression have yet to be elucidated.
Additionally, yeast Kin28 phosphorylates two subunits of Mediator (i.e., Med4 and Rgr1/Med14), and although the function of these modifications is unknown, it has been demonstrated that Mediator significantly enhances the phosphorylation of the RNAPII CTD by Kin28 [94,96]. In fact, in vitro assembly of TFIIH into a pre-initiation complex requires Mediator [139], and following transcription initiation, phosphorylation of Ser5 by Kin28 parallels the release of Mediator from the CTD of RNAPII as promoter clearance occurs [80].
As discussed above, in yeast, Kin28 and Srb10 have overlapping functions in promoting transcription, PIC dissociation and subsequent scaffold complex formation [94]. Genetic analysis has provided further evidence that Kin28 and Srb10 are not redundant because only Kin28 is essential for growth, and Srb10 is much less processive in terms of phosphorylation than Kin28 [140]. It is clear that Kin28 is the primary kinase responsible for the high level of phosphorylation of RNAPII during initiation [48,67,94,141]. In fact, one essential role of Kin28 that Srb10 does not have is the stimulation of pre-mRNA processing. However, what appears clear, at least in yeast, is that PIC dissociation is dependent on the kinase activities of Kin28 and Srb10. Additionally, another function of RNAPII CTD Ser5 phosphorylation by Kin28 is the enhancement of Bur1/Bur2 recruitment and Ser2 CTD phosphorylation near the promoters [99]. Moreover, it has recently been demonstrated that TFIIH kinase places bivalent marks on the CTD, thereby phosphorylating Ser7 during transcription initiation [74][75].

RNAPII CTD elongating kinases Cdk9/CycT and Cdk12/CycK
Eukaryotic organisms possess many factors that regulate transcriptional elongation; among these factors is the Cdk9 kinase, which is the catalytic subunit of the positive transcription elongation factor b (P-TEFb) that controls the elongation phase of transcription by RNAPII in mammals and Drosophila melanogaster [12]. Cdk9 is the major Ser2 kinase, but it also contributes to Ser5 phosphorylation in vitro and in vivo during the initiation-elongation transition and the release of the polymerase from promoter-proximal pausing [109,142]. Cdk9 activity is also required for efficient coupling of transcription with pre-mRNA processing [108]. Additionally, it has very recently been shown that Thr4 is phosphorylated by Cdk9 [54].
In higher eukaryotes, the transcription factor P-TEFb not only regulates CTD phosphorylation, but it also inhibits the action of transcriptional repressors and is required for the association of several elongation factors with the transcribing polymerase. P-TEFb also targets DRB sensitivity-inducing factor (DSIF) and negative elongation factor (NELF) [142][143][144] (Table 1). Thus, P-TEFb promotes transcription by the following two different mechanisms: inhibiting the action of transcriptional repressors and phosphorylating the CTD during transcription elongation. Until recently, it was believed that Cdk9 was the only CTD Ser2 kinase in higher eukaryotes. In fact, Cdk9 can reconstitute the activity of both S. cerevisiae Ser2 CTD kinases, Bur1 and Ctk1. However, it has recently been demonstrated that Drosophila have one ortholog of yeast Ctk1, Cdk12, whereas humans have two, Cdk12 and Cdk13; only Cdk12 has been clearly demonstrated to be an elongating CTD kinase [89,145].

Bur1/Bur2
Bur1 kinase and its cyclin, Bur2, form an essential CDK in S. cerevisiae involved in transcription elongation [147][148]. Although the Bur1 and Ctk1 kinase complexes appear to functionally reconstitute the activity of P-TEFb in yeast [149], Bur1 is more closely related, in sequence and function, to mammalian P-TEFb than Ctk1 is [147,149], and as we have discussed, it is clear that Cdk12 is the functional equivalent of yeast Ctk1 [89,145]. Bur1 can phosphorylate Ser2 and Ser5 [99,147,150][151], and although it was first demonstrated to show some preference for Ser5 and to be less active than Ctk1 or Kin28 [147], later studies provided evidence that Bur1 interacts with the RNAPII CTD and phosphorylates it at Ser2. In fact, Bur1 phosphorylates elongating RNAPII molecules that have been previously phosphorylated at Ser5 and are located near the promoter during early transcription elongation (Figure 4; [99]). Thus, it has been hypothesized that Bur1/Bur2 is recruited to RNAPII whose repeats are phosphorylated on Ser5 to enhance phosphorylation on Ser2 by Ctk1. Consistent with this, Bur1 produces the Ser2-phosphorylated residues that remain when Ctk1 is inactivated [152]. Bur1 also stimulates transcription elongation, like its mammalian homologue P-TEFb [150,152], and mutations in BUR1 cause sensitivity to drugs that are known to affect transcription elongation (e.g., 6-azauracil) [147,150]. More recently, a chemical-genomic analysis has provided further evidence that Bur1 also phosphorylates Ser7 in the body of the genes [77].
Bur1 shares another function with the mammalian and Schizosaccharomyces pombe Cdk9 [142,153]. Bur1 kinase activity is important for the in vivo phosphorylation of the elongation factor Spt5 (mammalian DSIF) [102,154]. Spt5 contains a carboxy-terminal domain that consists of approximately 15 repeats (CTR) that are similar to the RNAPII CTD [102], which is subject to phosphorylation. The Spt5-CTR is required for efficient elongation by RNAPII and for chromatin modifications in transcribed regions (see below). Thus, Spt5 phosphorylation mediates, at least in part, Bur1 kinase roles on transcription elongation and histone modifications [154].

Ctk1/Ctk2-Ctk3
Ctk1 was originally identified as the kinase subunit of the yeast CTDK-I complex that catalyzes phosphorylation of the RNAPII CTD [155]. Ctk2 is the cyclin, and the function of Ctk3 remains unknown. Ctk1 is the principal kinase that is responsible for CTD-Ser2P during transcription elongation, which is coincident with reduced Ser5P [73,156]. Although Ctk1 is not directly involved in transcription elongation [16,18,157], it associates with RNAPII throughout elongation [49], and the kinase activity of Ctk1 is required for the association of polyadenylation and termination factors [16] and histone modification factors [158]. Additionally, Ctk1 interacts genetically as well as biochemically with the TREX complex [159], which couples transcription elongation to mRNA export [160]. Moreover, Ctk1 promotes the dissociation of basal transcription factors from elongating RNAPII early during transcription; however, its kinase activity is not required for this function [105].
In addition to its functions in the transcription of protein-coding genes, Ctk1 is involved in RNAPI transcription, interacts with RNAPI in vivo [161], and is required for the integrity of the rDNA tandem array [162]. All of these studies suggest that Ctk1 might participate in the regulation of distinct nuclear transcriptional machineries. Additionally, it has been demonstrated that Ctk1 is required for DNA damage-induced transcription [163], and notably, that Ctk1 has a role in the fidelity of translation elongation in the cytoplasm [110,164].

Rtr1 / RPAP2
Chromatin immunoprecipitation studies have provided further evidence that the increase in Ser2P occurs as transcription progresses through the gene and follows Ser5P dephosphorylation. Rtr1 in yeast was identified as the RNAPII CTD phosphatase that drives the Ser5P-Ser2P transition at the 5' regions of the transcribed genes. Rtr1 genetically interacts with the RNAPII machinery, and Rtr1 deletion provokes global Ser5P accumulation in whole-cell extracts and Ser5P association throughout the coding regions [167,171]. RPAP2 was identified in a systematic analysis carried out to determine the composition and organization of the soluble RNAPII machinery [169], and as in the case of Rtr1, Ser5P levels increase in vivo when RPAP2 is knocked down. Additionally, RPAP2 depletion affects snRNA gene expression, as do mutations of the Ser7 residue [64]. In fact, Ser7P recruits the 3'-end processing Integrator complex and RPAP2 to drive Ser5 de-phosphorylation of the RNAPII CTD during the transcription of snRNA genes [170,183]. Recently, a model has been proposed in which RPAP2 recruitment to snRNA genes through CTD-Ser7P triggers a cascade of events that are critical for proper gene expression [170].

Ssu72
Ssu72 was first described as a Ser5P phosphatase and recently as a Ser7P phosphatase [50,69]. Ssu72 was originally identified as a factor that functionally interacts with the general transcription factor TFIIB [184][185]. Afterward, it was demonstrated that Ssu72 is part of the cleavage and polyadenylation factor (CPF) with a role at the 3'-end of genes [166,175]. In fact, Ssu72 is crucial for transcription-coupled 3'-end processing and termination of protein-coding genes [175,[186][187]. Later, Ssu72 was characterized as a Ser5P phosphatase [79] and a potential tyrosine phosphatase [188] and, most recently, it has been demonstrated that Ssu72 is also a Ser7P phosphatase [50,69]. A genome-wide distribution analysis of Ssu72 has demonstrated two peaks of association (Figure 4): a low peak at the 5'-end of genes and a higher peak at the cleavage and polyadenylation site or immediately after it [50]. In agreement with this, Ssu72 dephosphorylates the RNAPII CTD following cleavage and polyadenylation and recycles the terminating RNAPII, giving rise to a hypophosphorylated polymerase. In fact, inactivation of Ssu72 leads to the accumulation of Ser7P marks that prevent RNAPII recruitment to the PIC and therefore inhibit transcription initiation, which results in cell death [50]. In other words, Ssu72 is critical for transcription termination, 3'-end processing and RNAPII recycling to restart a new round of transcription. Additionally, it has been shown that Ssu72 has a function in gene looping [172]. In a screen looking for mammalian retinoblastoma tumor suppressors, a human homolog of yeast Ssu72 was identified. As in yeast, mammalian Ssu72 associates with TFIIB and the yeast cleavage/polyadenylation factor Pta1, and exhibits intrinsic phosphatase activity [176]. 
The crystal structure of the complex formed by human symplekin (Pta1 in yeast), hSsu72 and a CTD phosphopeptide has been elucidated, and hSsu72 was demonstrated to have a function in coupling transcription to pre-mRNA 3'-end processing [187].
Gene transcription is decreased in cells lacking Fcp1 function, and fcp1 mutants exhibited a general accumulation of hyperphosphorylated RNAPII in whole-cell extracts, and specifically in the gene coding regions [178]. Fcp1 also has the ability to stimulate RNAPII transcript elongation in vitro independent of its phosphatase activity [182], which suggests that it associates with and modulates elongating RNAPII. In agreement with this, chromatin immunoprecipitation studies have demonstrated that Fcp1 associates with the promoter and coding region of active genes in vivo [73]. Recent genome-wide studies have provided further evidence that Fcp1 associates with genes from promoter to 3'-end regions, showing the highest association of Fcp1 with the cleavage and polyadenylation site. This association occurs after Bur1 and Ctk1 have dissociated, which permits Fcp1 to completely dephosphorylate all the remaining Ser2P residues ([50], Figure 4). Fcp1 is also responsible for de-phosphorylation of RNAPII following its release from DNA [165]. Fcp1 association with genes at the cleavage polyadenylation site overlaps with Ssu72 association, whereas this overlapping does not exist at the 5' and coding regions (Figure 4). This fact indicates that CTD de-phosphorylation may be coupled at the 3'-ends, and it has been hypothesized that Ssu72 activity may be important for Fcp1 function, thereby coupling Ser2P dephosphorylation to the removal of Ser5P and Ser7P [69].
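The residue specificities described above for the yeast CTD kinases and phosphatases can be collected into a simple lookup table. The Python sketch below is only a convenience summary of statements made in this text: it ignores where along the gene each enzyme acts, lists for Srb10 the specificities reported in vitro, and omits the potential tyrosine phosphatase activity of Ssu72. The dictionary layout and the function name are illustrative assumptions.

```python
from typing import Dict, List, Set

# Yeast CTD kinases and the serine positions they are reported to phosphorylate.
KINASES: Dict[str, Set[str]] = {
    "Kin28": {"Ser5", "Ser7"},
    "Srb10": {"Ser2", "Ser5"},        # shown in vitro; in vivo relevance undefined
    "Bur1": {"Ser2", "Ser5", "Ser7"},
    "Ctk1": {"Ser2"},
}

# Yeast CTD phosphatases and the phosphorylated positions they are reported to remove.
PHOSPHATASES: Dict[str, Set[str]] = {
    "Rtr1": {"Ser5"},
    "Ssu72": {"Ser5", "Ser7"},
    "Fcp1": {"Ser2"},
}


def enzymes_for(residue: str, table: Dict[str, Set[str]]) -> List[str]:
    """Return the alphabetically sorted enzymes in `table` that act on `residue`."""
    return sorted(name for name, targets in table.items() if residue in targets)


for residue in ("Ser2", "Ser5", "Ser7"):
    print(residue,
          "kinases:", enzymes_for(residue, KINASES),
          "phosphatases:", enzymes_for(residue, PHOSPHATASES))
```

Such a table makes the overlap visible at a glance: in this summary, every serine position is targeted by more than one kinase, whereas Ser2P and Ser7P each have a single listed phosphatase.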

Other factors influencing RNAPII CTD phosphorylation
Although many factors can have effects on CTD phosphorylation, we will highlight the following two that we believe are of significant relevance: the prolyl isomerases hPin1/yEss1 and the structure of the RNAPII itself. In addition, we will describe the role of ySub1 in CTD phosphorylation, because it has been extensively studied by us.

hPin1 / yEss1
The CTD can adopt either cis- or trans-conformations, which can significantly affect its modification, especially its phosphorylation. Peptidyl prolyl isomerases (PPIases) are enzymes that accelerate the rate of rotation about the peptide bond preceding proline and are important for protein folding and regulation of dynamic cellular processes [193][194]. Pin1 in mammals and Ess1 in S. cerevisiae are RNAPII CTD PPIases. Phosphorylated Ser2 and Ser5 match the pSer-Pro sequence that is recognized by Pin1, and the CTD appears to be its principal target of regulation [195][196]. Pin1 has specificity for phosphorylated Ser/Thr-Pro sequences, and it modulates RNAPII activity during the cell cycle at least in part by regulating RNAPII CTD phosphorylation levels [195]. Yeast Ess1 physically interacts with the CTD [55,197], and in vitro it preferentially binds and isomerizes Ser5P residues [198]. Whereas Pin1 stimulates RNAPII CTD hyperphosphorylation, which results in transcription repression and inhibition of mRNA splicing [195][196], in vivo studies have proposed that Ess1 promotes RNAPII CTD de-phosphorylation. In any case, both isomerases have important functions in transcription: the initiation-elongation transition is inhibited by Pin1 [196], whereas Ess1 affects multiple steps, such as initiation, elongation, 3′-end processing, and termination [197,199,200,201]. In fact, it has been demonstrated that Ess1 promotes Ssu72-dependent function by creating the CTD structural conformation that is recognized by Ssu72 [202], and it has recently been confirmed that isomerization is a key regulator of RNAPII CTD de-phosphorylation at the end of genes [69].

RNAPII structure and Rpb1-CTD localization
The structure of the complete 12-subunit RNAPII (Rpb1-12) is known [203][204]. The Rpb4 and Rpb7 subunits form a sub-complex that is conserved in all three eukaryotic RNA polymerases and in archaeal RNAP [205][206]. Crystal structures of the Rpb4/7 heterodimer in the context of the complete RNAPII complex localized it in the proximity of the Rpb1-CTD [203,207], and biochemical and genetic studies suggest that Rpb4/7 might have a function in the recruitment of some CTD-binding proteins to transcribing RNAPII. Moreover, it is possible that the Rpb4/7 sub-complex regulates the access of CTD modifying enzymes during the whole transcription cycle [203,207,209,210,211,212]. Indeed, structural studies have provided further evidence that the CTD extends from the RNAPII core enzyme near the RNA exit channel [204], where it is ideally located to bind, and be acted upon by, a multitude of factors, among them kinases, phosphatases and isomerases. In fact, in yeast, the prolyl isomerase Ess1 and the phosphatase Fcp1 are associated with Rpb7 and Rpb4, respectively [55,87,208].

The ssDNA binding protein Sub1 as a general regulator of transcription
Sub1 is an ssDNA binding protein that has been implicated in several steps of mRNA metabolism, such as initiation, transcription termination and 3'-end processing [186,213,214,215]. Sub1 was originally described as a transcriptional stimulatory protein that is homologous to the human positive coactivator PC4, which physically interacts with activators and components of the RNAPII basal transcription machinery [216][217][218][219][220]. Sub1 genetically and physically interacts with TFIIB [214,215,221], and several functions have been proposed for Sub1 that include stimulating PIC recruitment and promoter escape. In fact, most recently, using a quantitative proteomic screen to identify promoter-bound PIC components, Sub1 was identified as a functional PIC component that is associated with RNAPII complexes [225]. In addition, we have recently demonstrated that Sub1 globally regulates RNAPII CTD phosphorylation (Figure 5, [222]) and that it is a bona fide elongation factor that influences transcription elongation rates (García and Calvo, unpublished results). Although Sub1 has been broadly studied and several functions have been hypothesized for it [213,215,222,223,224,225], the exact mechanism by which Sub1 functions in transcription remains unclear. Sub1 globally regulates RNAPII-CTD phosphorylation during the entire transcription cycle by modulating, albeit differentially, the activity and recruitment of CTD modifying enzymes [222,224]. We have proposed a model showing how Sub1 might function to globally regulate RNAPII CTD phosphorylation (Figure 5). In wild-type cells (wt), non-phosphorylated Sub1 joins the promoter (possibly via TFIIB; [214,215,221]), contacting the promoter via its DNA binding domain. At that point, Sub1 interacts with the Cdk8-Mediator complex, helping to maintain the PIC in a stable but inactive conformation.
Sub1 is then phosphorylated (possibly by the action of kinases at the PIC, similarly to PC4, its human homolog), losing its DNA binding capacity and promoting clearance of TFIIB [214,215,226]. The PIC next changes conformation such that Kin28 can be activated and, with the help of Srb10, promotes PIC dissociation into the scaffold complex as well as the recruitment of the elongating kinases Ctk1 and Bur1. In contrast, in the absence of Sub1 (sub1Δ), Srb10 activity and recruitment are decreased, while Kin28 recruitment and activity increase, in agreement with TFIIH being negatively regulated by Cdk8-containing Mediator complexes [90,227]. As a result, Ser5P levels are increased, and consequently Bur1 and Ctk1 association with chromatin is also enhanced [99,228]. Furthermore, in sub1Δ cells there is a reduction in Fcp1 phosphatase levels and in its association with chromatin, which induces an additional increase in Ser2P, impairing RNAPII recycling after transcription termination. Thus, a decrease in RNAPII recruitment is observed in cells lacking Sub1 [224]. Additionally, Sub1 influences phosphorylation of the elongation factor Spt5 by Bur1 (García and Calvo, unpublished results). We currently do not understand the biochemical basis for these effects. We have not found evidence that Sub1 associates with any of the CTD kinases, or that Sub1 influences the CTD kinase activities by affecting post-translational modifications of the kinases. Therefore, we currently consider two possible explanations for the effects of Sub1 on the activities of the CTD kinases. One explanation is that Sub1 enhances the association (or dissociation) of an unidentified, common regulator with the kinases, whereas the other is that Sub1 in some manner influences kinase accessibility to the CTD.

RNAPII CTD phosphorylation and pre-mRNA processing
The CTD is an unordered structure that extends from the RNAPII core enzyme, near the RNA exit channel [204,209]. This location is well suited for interaction with a plethora of factors, such as the CTD-modifying enzymes and binding factors involved in distinct nuclear processes, for example, components of the RNA processing machinery [32,88]. Furthermore, its length and its ability to adopt numerous conformations permit it to interact with different factors at the same time [31][32], and it is now clear that these interactions depend on the CTD phosphorylation patterns during the transcription cycle [8,21].
As transcription progresses, the nascent RNA is capped to protect the 5′ end, intron sequences are removed, and a polyadenylated tail is added to the 3' end. Coupling mRNA processing to transcription increases processing efficiency and allows multiple regulatory pathways to guarantee that only correctly modified mRNAs are exported. For more than a decade, numerous studies have provided evidence that the CTD serves as a scaffold for the assembly of an enormous variety of protein complexes to coordinate not only transcription of non-coding and protein-coding genes [8,58,59,60,61,62,64,65], but also pre-mRNA processing [21,31,32]: capping [42,135,137], splicing [229], and 3'-end cleavage and polyadenylation [42]. All of these functions are achieved through the recognition and reading of the CTD code during the transcription cycle [6,7,8,31]. Thus, co-transcriptional CTD-mediated processing of nascent RNA plays a crucial role in both recruitment of RNA processing machineries and regulation of their activities. Indeed, a functional CTD is not required for in vitro transcription by RNAPII, but it is essential for efficient pre-mRNA processing [42,230,231].

Capping
The capping reaction consists of the addition of an inverted 7-methylguanosine cap to the first RNA residue through a 5'-5' triphosphate bridge. The cap is a characteristic of all RNAPII transcripts and is added to the 5'-end of nascent transcripts when they are only 25-50 bases long. The capping complex contains the following three enzymatic activities: RNA 5'-triphosphatase, guanylyl transferase and RNA (guanine-7) methyltransferase [17,67]. In yeast, these activities are carried out by three enzymes (i.e., Cet1, Ceg1 and Abd1, respectively), whereas in metazoans they are performed by two enzymes (i.e., HCE and MT), because guanylyl transferase and RNA 5'-triphosphatase are two functional domains of the HCE protein [17]. Following Ser5 phosphorylation by TFIIH, the mRNA capping complex binds directly and specifically to Ser5P residues through the Ceg1 subunit in yeast or the guanylyl transferase domain in metazoans [48,67,78,95,137]. Furthermore, interaction of the phosphorylated CTD with the capping complex allosterically stimulates the capping enzyme activity and, in turn, enhances early transcription [136,232]. Because the CTD is located near the RNA exit channel, its interaction with the capping complex positions the complex for rapid processing of the mRNA 5'-end as the nascent transcript emerges from the polymerase. This is thought to protect the RNA from degradation and to promote the progression of RNAPII into productive transcription elongation. In fact, by coupling capping and early transcription, only capped RNA will be elongated [67,136,232,233].

3'-end processing
Not only are capping and transcription linked at the 5'-end regions of protein-coding genes, but polyadenylation and transcription termination are also linked at the 3'-end regions. In brief, 3'-end processing consists of the following two-step reaction: endonucleolytic cleavage of the pre-mRNA and subsequent addition of a poly(A) tail [17]. Both enzymatic reactions require a functional CTD [42,230]. In fact, deletion of the CTD or absence of CTD phosphorylation negatively affects 3'-end processing [16,30,106,157,234]. Furthermore, the CTD binds 3′-end processing factors and stimulates cleavage/polyadenylation in vivo and in vitro [42,230]. The cleavage is achieved by a complex that consists of CstF, CPSF, CF1, and CF2 in higher eukaryotes and CF1A, CF1B, and CFII in yeast, whereas the polyadenylation reaction is performed by a poly(A) polymerase in both cases [17]. The cleavage/polyadenylation factors CPSF and CstF can specifically bind to CTD affinity columns and are copurified with RNAPII [42]. In yeast, several 3'-end factors preferentially bind phosphorylated CTD [72,106,107,235]. Furthermore, yeast 3'-end processing factors are recruited depending on Ser2 phosphorylation by Ctk1 when RNAPII reaches the 3'-end regions of the transcribed genes. Therefore, regulation of CTD phosphorylation as the polymerase transcribes facilitates coordination of the assembly of the 3′-end processing machinery with transcription [16]. Additionally, the polyadenylation signals are required for proper transcription termination in mammals and yeast [236][237]. In fact, Rtt103, which is a 3'-end mRNA processing factor, interacts with the CTD phosphorylated on Ser2 and recruits a 5'-3' RNA exonuclease, thereby promoting the release of RNAPII from the DNA [238][239].
In summary, Ser5 phosphorylation by TFIIH kinase (Kin28/Cdk7) is required to recruit the RNA-capping machinery to RNAPII [48,67,72], whereas Ser2 phosphorylation is required for the recruitment of 3'-end processing complexes and for transcription termination [16,30,106,238,239] (Figure 7). However, it is unknown whether phosphorylation of Ser5 and Ser2 of all of the repeats or only some of the repeats is required to enhance capping and cleavage/polyadenylation, respectively.
Ser7 phosphorylation has been functionally related to 3'-end processing of snRNA in higher eukaryotes. Human snRNA transcripts, unlike those of protein-coding genes, are not polyadenylated, and instead of a poly(A) signal, snRNA genes contain a conserved 3′ box RNA processing element that is recognized by the snRNA gene-specific Integrator RNA 3′-end processing complex. This complex binds to the RNAPII CTD and links transcription and 3'-end processing [63][64][240][241]. Therefore, in metazoans, Ser7P, in combination with Ser2P, is a major determinant for the recruitment of the Integrator complex to snRNA genes during their transcription [64,240,241]. In yeast, the Integrator-like complex recruitment depends on Ser7 phosphorylation, the promoter elements and the specialized PIC that binds those elements [74]. After promoter escape, the RNA processing complex travels with the elongating phosphorylated polymerase up to the 3' box at the end of the snRNA transcription unit, where it associates with the nascent transcript in a co-transcription-dependent manner.

Splicing
As in the case of capping and cleavage/polyadenylation, a number of studies performed in vivo and in vitro during the last decades have demonstrated the existence of a functional interaction between the transcriptional machinery and the splicing apparatus [21,242]. However, this functional interaction and the underlying mechanism are less well understood. The most complex pre-mRNA processing reaction is splicing, which is carried out by a large complex, the spliceosome, consisting of at least 150 protein components and five snRNAs [242]. The first indication of a coupling between transcription and splicing came from studies demonstrating that truncation of the CTD severely altered splicing in vitro [42]. Later it was shown that the CTD directly affects splicing, and that a phosphorylated CTD is required for an efficient splicing reaction [231,243]. These data provided evidence that an elongating RNAPII with phosphorylated CTD is an active component of the splicing reaction. A number of physical links between the phosphorylated CTD and the splicing apparatus have been established, and chromatin immunoprecipitation analyses have shown that the direct binding of the splicing machinery to the nascent RNA is responsible in large part for co-transcriptional splicing in yeast and mammals [244][245]. Hyperphosphorylated, but not hypophosphorylated, RNAPII has been found associated with splicing factors and detected in active spliceosomes [246][247][248]. For instance, in yeast, the splicing factor Prp40 binds to phosphorylated CTD [249]; in mammals, Spt6 binds selectively to the CTD-Ser2P [112], and the spliceosome-associated protein CA150 interacts with phosphorylated CTD while interacting with the SF1 splicing factor [250][251]. Together, these studies led to the idea that the phosphorylated CTD acts as a scaffold, binding multiple splicing factors and directly enhancing spliceosome assembly.
Corroborating this idea, a recent study identified a splicing factor, U2AF65, that interacts directly with the CTD to activate splicing and likely plays a role in spliceosome assembly [242,252]. Another recent study provided evidence that coupling transcription and splicing through CTD phosphorylation can be a regulatory point in the control of gene expression. For instance, it has been described that a set of inducible genes can be actively transcribed by RNAPII phosphorylated on Ser5, but not on Ser2, under non-inducing conditions, giving rise to a full-length unspliced transcript. However, after induction, Cdk9 is recruited and phosphorylates the CTD on Ser2, and the generated transcript is properly spliced [253]. This finding strongly implicates Ser2P as a key element in the integration of splicing and transcription. In addition to constitutive splicing, functional links between the CTD and alternative splicing have also been described [86]. Thus, it has been suggested that the CTD may regulate the choice of alternative exons by increasing the local concentration of splicing factors [229], and that it may possibly participate in the physical modulation of alternative splicing.

Coupling the CTD code and the histone code
The nucleosome is the basic element of chromatin and consists of a histone octamer composed of two copies each of histone 2A (H2A), H2B, H3 and H4, wrapped by 146 bp of DNA [254]. The histones carry numerous post-translational modifications, and some of these are associated with transcription. In fact, a general view is that histone post-translational modifications correlate with either positive or negative transcriptional states. Numerous discoveries have led to the idea that such modifications regulate transcription either directly by causing structural changes to chromatin (e.g., histone acetylation) or indirectly by recruiting protein complexes (e.g., histone methylation) [255][256][257]. Therefore, chromatin not only plays an essential role in packaging the DNA, but also in regulating gene expression. Most histone modifications reside in the amino- and carboxy-terminal tails, and a few of them in the globular domains. As in the case of CTD phosphorylation, where Ser5P triggers Ser2 phosphorylation, some histone modifications mark the deposition of another, thus creating a complex epigenetic signal code, the "histone code", that governs chromatin organization and DNA-dependent processes such as transcription. Therefore, the histone code is responsible for an active or inactive chromatin state with respect to transcription, because it coordinates the recruitment of various chromatin modifying and remodeling complexes to regulate chromatin structure and, consequently, transcription [258][259]. Because this review focuses on RNAPII CTD phosphorylation, only certain histone modifications, which are functionally related to the CTD code and transcription, will be discussed. There are excellent reviews that discuss all the histone modifications and their roles in different nuclear processes [255,258,260,261].
Lysine is a key substrate residue because it undergoes many mutually exclusive modifications important for transcription regulation (i.e., acetylation, methylation, ubiquitination and SUMOylation) [255,261]. The lysine residues can be mono-, di- or trimethylated, and each level of modification can result in distinct biological effects. In brief, with respect to transcription, acetylation activates and SUMOylation appears to be repressive, and both modifications may mutually interfere. On the other hand, methylation can have distinct effects; thus, lysine 4 of histone H3 is trimethylated (H3K4me3) at the 5'-ends of genes during activation, whereas trimethylation of H3K9 occurs in transcriptionally silent regions. Arginine residues of H3 and H4 can also be mono- or dimethylated, which activates transcription. Serine/threonine phosphorylation of H3 at specific sites also marks activated transcription, and ubiquitination of H2B and H2A is associated with active and repressed transcription, respectively (reviewed in [255][256][257]). All histone modifications are removable by specific enzymes, e.g., histone deacetylases (HDACs), phosphatases, and ubiquitin proteases ([255][256][257], and references therein). In fact, HDACs play important regulatory roles during active transcription [262].
Methylation of H3K4 and H3K36 are the best characterized histone modifications with roles in active transcription [263], and their functions are directly linked to RNAPII CTD phosphorylation (Figure 6). H3K4 is methylated by the Set1/COMPASS complex, while H3K36 methylation is mediated by the Set2 complex. The profile of H3K4 tri-methylation (H3K4me3) strongly correlates with the distribution pattern of the RNAPII CTD-Ser5P, and it is mainly found around the transcriptional start site (TSS). Set1 recruitment and H3K4 tri-methylation usually peak at the promoter and 5' region of a gene, depending on Kin28/Cdk7 activity (Figure 7) and on the Paf1 complex, an RNAPII-associated complex [265], and contribute to transcription initiation, elongation and RNA processing [98,264]. H3K4 mono- and di-methylation tend to extend further along the coding regions than tri-methylation. On the other hand, H3K36 methylation by Set2 is observed across the entire coding region with an increase toward the 3'-ends of actively transcribed genes. Ctk1 also regulates H3K4 methylation [158,266]. Thus, differential phosphorylation of the CTD by Kin28 and Ctk1 is responsible for the characteristic distribution of H3K4 tri-methylation in the coding region [158]. In contrast to Set1, the recruitment of Set2 and H3K36 methylation depend on a CTD-Ser2P/Ser5P double mark (Figure 6), and therefore, on Ctk1 kinase activity [158,266]. Interestingly, the other Ser2 kinase complex, Bur1/2, also promotes Set2 recruitment and assists H3K36 methylation, particularly at the 5′ ends of genes, and is required for the histone 2B ubiquitination activity of the Rad6/Bre1 complex [101,103,152]. H3 acetylation/deacetylation is also relevant during active transcription. Thus, histone acetyltransferase and histone deacetylase complexes (HATs and HDACs, respectively) are recruited to the transcriptional machinery during elongation through interaction with RNAPII.
Indeed, they modulate histone occupancy in the coding regions of actively transcribed genes, and this depends on CTD phosphorylation status [267][268]. HATs acetylate nucleosomes, promoting nucleosome eviction and allowing RNAPII to pass through. Afterward, the nucleosomes are immediately reassembled behind the polymerase, and HDACs are co-transcriptionally recruited to rapidly and efficiently deacetylate the reassembled nucleosomes. Altogether, this avoids cryptic transcription and maintains active transcription [262]. Methylation of histone H3 by Set1 and Set2 is required for deacetylation of nucleosomes in coding regions by the histone deacetylase complexes (HDACs) Set3C and Rpd3C(S), respectively. HDAC recruitment is triggered by H3K4 methylation at promoters and within coding regions to restrict hyperacetylated histones to promoters and to maintain transcription activity. Set1-H3K4me2 can be recognized by two different HDACs, RPD3S or SET3C [264]. The Set1-SET3C pathway preferentially affects actively transcribed genes with promoters configured for efficient initiation/re-initiation [269]. In contrast, the Set1-RPD3S pathway is active at loci subjected to cryptic and weak transcription encompassing repressed promoters of coding genes. Related to this, phosphorylation of the CTD by Kin28/Cdk7 is important for the initial recruitment of the Rpd3S and Set3 HDACs to coding sequences ([268], Figure 7). In fact, it has been reported that Set3C and Rpd3C(S) are co-transcriptionally recruited in the absence of Set1 and Set2, and that their recruitment is stimulated by the CTD kinase Kin28/Cdk7. Hence, the co-transcriptional recruitment of Rpd3C(S) and Set3C is stimulated by CTD Ser5P to achieve the deacetylation of H3 residues.
This, together with evidence that the RNAPII CTD recruits additional chromatin modifying complexes, histone chaperones and elongation factors, suggests that phosphorylated RNAPII is crucial in coordinating the activities of the many factors required for regulating histone dynamics and, consequently, transcription elongation at actively transcribed genes ([262], and references therein).

Transport
In addition to the complexes involved in mRNA processing, several other proteins bind to the RNAs as soon as their 5'-end emerges from the RNAPII, packaging them into a messenger ribonucleoparticle (mRNP). This set of interactions with packaging and export factors plays a dual function of protecting the RNA from degradation and preparing it to be exported. Although interactions between the CTD and many mRNA processing factors have been characterized, this is not the case for mRNA packaging and export factors. However, packaging and export also seem to be coupled to transcription through RNAPII CTD interactions, because defects in transcription elongation, splicing, and 3′-end processing affect export [270]. In yeast, mRNA export is linked to transcription through the TREX (transcription export) complex, which is composed of the THO complex (Tho2, Hpr1, Mft1, and Thp2), the evolutionarily conserved RNA export proteins Sub2 (UAP56 in human) and Yra1 (REF/Aly in human), and a novel protein termed Tex1 [271][272]. Deletion of individual THO components causes defects in transcription and in mRNA export, as well as transcription-dependent hyper-recombination [160,273]. In addition, the Sub2/Yra1 complex is directly recruited to the actively transcribed regions via the THO complex [272,274]. Although it has been shown that the TREX complex and Ctk1 are functionally related [159], recruitment of the TREX complex to transcribed genes is not dependent on Ctk1 in yeast [16], and the association of the human TREX complex with transcripts might be coupled to transcription indirectly through splicing [275]. Thus, the potential role of the CTD and CTD phosphorylation in this process remains unclear. However, in a very recent study, the mRNA export factor Yra1 was identified as a protein that binds the phosphorylated CTD [276].
This study thus provides strong support for the idea that the phosphorylated CTD is directly involved in the co-transcriptional recruitment of export factors to active genes. In summary, many aspects of mRNA metabolism, from 5' capping to export, occur co-transcriptionally and are coordinated through transcription, with the RNAPII CTD and its phosphorylation being the main coordinator in most cases (Figure 7).

Figure 7. RNAPII CTD phosphorylation/de-phosphorylation is co-transcriptionally connected and coordinated with other nuclear processes: pre-mRNA processing, histone modifications and mRNA export. The main complexes required for co-transcriptional processes occurring during the expression of a regular protein-coding gene are shown. See text for details.

RNAPII CTD phosphorylation and transcription regulation
The levels of CTD phosphorylation/de-phosphorylation are precisely modulated during the entire transcription cycle, which regulates the association of many important factors with initiating and elongating RNAPII, such as transcription and pre-mRNA processing factors, chromatin modifiers and mRNA export factors [21]. The interplay of all of these factors is essential to regulating transcription and, consequently, gene expression. Thus, for a regular protein-coding gene to be properly transcribed into a functional mRNA before it is exported to the cytoplasm and translated, the following set of coordinated nuclear events must occur (Figure 7). Unphosphorylated RNAPII is recruited to the pre-initiation complex (PIC); then, after binding to the promoter, it is phosphorylated on Ser5 by yKin28 (hCdk7). Ser5 phosphorylation is required for RNAPII dissociation from the PIC and consequently promotes transcription initiation. Simultaneously, Ser5 phosphorylation promotes recruitment of capping and splicing factors, the Set1 methyltransferase complex, and the Set3C and Rpd3S histone deacetylase complexes. During early elongation, Ser5P levels decrease, whereas Ser2P levels increase due to the kinase activity of yBur1 (hCdk9) near the promoters and of yCtk1 (hCdk12) further along the coding region and at the 3'-ends. This leads to the recruitment of the histone methyltransferase Set2 and the activation of the Rpd3S complex, which prevents cryptic transcription within the genes. When RNAPII arrives at and recognizes the termination site, the 3'-end processing factors that are associated with the CTD carry out cleavage and polyadenylation of the nascent mRNA, which also requires proper phosphorylation of the polymerase. During the termination process, the CTD is de-phosphorylated by Ssu72 and Fcp1, and the polymerase is recycled to initiate a new round of transcription.
All along the gene, packaging and export factors (TREX complexes) are incorporated into the transcriptional machinery protecting the transcript from degradation and preparing it for export to the cytoplasm.

Therapeutic potential
Cellular differentiation, morphogenesis, development and adaptability of all organisms depend on proper gene expression; therefore, variations in gene regulation can have profound effects on protein function, challenging the viability of the organisms. It is now clear that RNAPII phosphorylation has an important role in gene expression and, therefore, in all the processes mentioned above. Consequently, over the last decade CTD phosphorylation has attracted the attention of biomedical research, especially because the CTD kinase Cdk9 is involved in several physiological cell processes whose deregulation may be associated with cancer, and because Cdk9 activity is required for human immunodeficiency virus type 1 (HIV-1) replication. Accordingly, many studies have shown the enormous potential of Cdk9 kinase inhibition as a treatment for several kinds of tumors, HIV infection, and cardiac hypertrophy.
The human immunodeficiency virus type 1 (HIV-1) requires host cell factors for all steps of viral replication, among them the transcription elongation factor P-TEFb. Transcription of HIV-1 viral genes is achieved by host RNAPII and is induced by a viral trans-activator protein, Tat. When bound to the TAR viral RNA region, Tat activates HIV-1 transcription by the early recruitment of host transcriptional activators including P-TEFb, which phosphorylates the RNAPII CTD, promoting viral transcript elongation [280]. Thus, treatment with drugs that inhibit Cdk9, such as flavopiridol, has been used as an antiretroviral therapy in AIDS patients [281]. Therefore, from the point of view of basic research, the study of the functions of Cdk9 and of RNAPII CTD phosphorylation is of great interest for understanding the mechanisms that regulate HIV replication, which consequently leads to progress in AIDS biomedical research. Further evidence has been provided of a deregulated Cdk9 function in several tumors such as lymphoma, neuroblastoma, primitive neuroectodermal tumor, rhabdomyosarcoma or prostate cancer [282][283][284]. Cdk9 inhibition by chemotherapeutic agents, such as flavopiridol or CY-202, has been shown to reduce transcription in malignant cells, mainly affecting RNAs with short half-lives. Most of these RNAs code for anti-apoptotic proteins, for instance the onco-protein Mcl-1, which is necessary for the maintenance of tumor proliferation. Unfortunately, these agents have only modest activity in patients, although promising studies continue at present [285]. Cardiac hypertrophy consists of an increased size of cardiomyocytes and is associated with some cardiac conditions such as hypertension or diminished heart function. Hypertrophy is a physiological response to a stress stimulus that results in an increase in cell size, and that may eventually lead to heart failure. Increased cell size involves increased mRNA transcription, which requires Cdk9 activity.
Thus, it has been shown that therapy with Cdk9 inhibitors benefits patients with cardiovascular disorders [286].

Concluding remarks
The primary function of RNAPII CTD phosphorylation in eukaryotes is the integration of transcription with distinct nuclear processes. Thus, CTD phosphorylation operates as a fine-tuning regulatory mechanism during the whole transcription cycle and is consequently of extraordinary importance for proper gene expression. Since the late 1980s, a large number of laboratories have tried to decipher the mechanism underlying the creation of a CTD code and how this code is translated during transcription to coordinate mRNA processing, export and chromatin modifications. Although great progress has been achieved, most recently due to genome-wide analysis techniques, a number of issues remain unsolved. For instance, it is very challenging to determine the exact phosphorylation state of specific residues within specific repeats during each step of transcription, as well as to determine the exact number of repeats that are phosphorylated within the CTD at every step of transcription, and how this is related to CTD-specific roles in gene expression. Moreover, it needs to be determined whether phosphorylation of the repeats with non-consensus sequences is regulated in the same manner as that of the consensus repeats, and whether this is achieved by the same set of CTD modifying enzymes. In addition, other residues such as lysine and arginine can potentially be modified, which further increases the complexity of the CTD code; determining whether they are modified during transcription may further elucidate known CTD functions or reveal new ones. Finally, a detailed understanding of RNAPII CTD phosphorylation is highly relevant: it will add insight into processes that alter gene expression, such as HIV infection and cancer, and will help to investigate whether other human CTD modifying enzymes, in addition to Cdk9, may be good candidates for therapy.
In conclusion, much progress has been made, but further work is still needed, and new high-throughput techniques in genomics and proteomics will help to reach a complete understanding much faster.

Author details
Olga Calvo and Alicia García Instituto de Biología Funcional y Genómica, Consejo Superior de Investigaciones Científicas / Universidad de Salamanca, Spain