Mammalian Cis-Acting RNA Sequence Elements

Cis-acting regulatory sequence elements are sequences contained in the 3′ and 5′ untranslated regions, introns, or coding regions of precursor RNAs and mature mRNAs that are selectively recognized by a complementary set of one or more trans-acting factors to regulate posttranscriptional gene expression. This chapter focuses on mammalian cis-acting regulatory elements that have recently been discovered in different regions of precursor and mature RNAs. The chapter begins with an overview of two large networks of mRNAs that contain conserved AU-rich elements (AREs) or GU-rich elements (GREs), and of their role in mammalian cell physiology. Other, less conserved, cis-acting elements and their functional roles in different steps of RNA maturation and metabolism are then discussed. The molecular characteristics of pathological cis-acting sequences that arise from gene mutations or transcriptional aberrations are briefly outlined, together with proposed approaches to restore normal gene expression. Concise models of the function of posttranscriptional regulatory networks within different cellular compartments conclude this chapter.


Introduction
The control of gene expression is fundamental to mammalian cell life. Although much of this control occurs at the level of transcription, posttranscriptional control is both prevalent and momentous [1]. Work over the past quarter century has resulted in the identification of unifying concepts in posttranscriptional regulation. One unifying concept states that posttranscriptional regulation is mediated by two major molecular components: cis-acting regulatory sequence elements and trans-acting factors. Cis-acting regulatory sequence elements are subsequences contained in the 5′ untranslated region (UTR), 3′ UTR, introns, and coding regions of precursor RNA and mature mRNA that are selectively recognized by a complementary set of one or more trans-acting factors to regulate posttranscriptional gene expression. The lists of conserved cis-elements have been expanding over the past decade, but the mechanisms of the precise assembly of RNA-binding complexes in an orchestrated temporal and spatial manner have not been comprehensively described. Conserved sequences within pre-mRNAs play a major role in determining the mRNA's configuration, stability, and ultimately the posttranslational fate of protein products. Mammalian pre-mRNAs contain almost as much conserved sequence as that ascribed to transcriptional regulatory elements, and many of these cis-elements can be attributed to known molecular functions, as described in the following paragraphs.
Trans-acting factors include RNA-binding proteins (RNA-BPs) and microRNAs (miRNAs), which are able to influence the fate of mRNA by controlling processes such as translation and mRNA degradation (reviewed in Refs. [2][3][4][5]). The combinatorial interplay between RNA-BPs, various miRNAs, and a given mRNA allows for the transcript-specific regulation critical to many cellular decisions during cell division, cell quiescence, or cell senescence [6]. RNA-BP classification is growing and becoming more defined as more structural data become available. Significant progress has been made in defining RNA-binding domains, such as an RNA recognition motif (RRM), zinc fingers, double-stranded RNA-binding domains, K homology domains, pumilio homology domains, and others, that were recently reviewed in [7,8].
In the pre-genomic era, very few cis-acting RNA sequences had been discovered, one example being the AU-rich elements (AREs) in the 3′ UTR of cytokine mRNAs [9]. Advances in genomic methodologies accelerated the discovery and functional characterization of cis-acting sequences. Microarray-based studies that evaluated mRNA stability and translation on a genome-wide basis have provided valuable information about the posttranscriptional regulation of a wide variety of physiologically important transcripts [10][11][12]. Genome-wide measurements of mRNA decay and bioinformatic motif discovery methods were used to identify the GU-rich element (GRE) as a highly conserved sequence enriched in the 3′ UTR and other regions of mRNA transcripts [13]. Various experimental approaches have been developed to understand the functional importance of cis-acting sequence interactions and the network of transcripts that they regulate. One of the most widely used techniques involves immunopurification of specific RNA-binding proteins from cellular extracts followed by high-throughput analysis of the co-purified RNA species [14]. Coupling this technique to powerful bioinformatic analysis has helped researchers define the binding specificity of cis-acting elements [15]. The advent of new technologies such as next-generation sequencing (NGS) and chemical cross-linking procedures has allowed fine-scale mapping of cis-binding motifs as well as refinement of RNA-binding protein binding sites.
A variety of methods have been developed to identify the in vivo target RNAs of a given RNA-BP, including microarray (Chip) or high-throughput sequencing (Seq) analysis of RNA isolated by RNA-BP immunoprecipitation (RIP-Chip, RIP-Seq, and RIPiT-Seq), photoactivatable ribonucleoside-enhanced cross-linking and immunoprecipitation (PAR-CLIP), individual-nucleotide resolution cross-linking and immunoprecipitation (iCLIP), and high-throughput sequencing of RNA isolated by UV cross-linking and immunoprecipitation (HiTS-CLIP) [16][17][18][19][20]. These methodologies combine immunoprecipitation of the RNA-BP with NGS analysis of the associated mRNA or microRNA transcripts and genome-wide identification of cis-elements within the target transcripts. Newer techniques such as sequence-specificity landscapes (SEQRS), HiTS-Kin/HiTS-EQ, and digestion-optimized (DO) RIP-Seq focus on the identification of multiple trans-acting factors [7,21,22]. These techniques allow evaluation of the specificity of cellular RNA-BP/RNA binding patterns in cell lysates under different conditions and might aid in interpreting multiprotein complex formation and RNA-BP competition for RNA substrates. Identified RNA-binding complexes can then be isolated and interrogated in vitro using structural and cell-based reporter assays.
This chapter focuses on mammalian cis-acting regulatory elements that have recently been discovered in different regions of precursor and mature mRNA. First, we summarize recent observations of two large networks of mRNAs that contain conserved AREs or GREs in their pre-mRNA splice sites, polyadenylation sites, and 3′/5′ UTRs. We outline the known roles of AREs and GREs in regulating mRNA stability or translation and in mammalian cell physiology, with particular emphasis on the dynamic response to environmental and developmental signals. Second, we describe advances in the identification of other conserved cis-acting elements and their functional roles in different steps of RNA maturation and metabolism. We briefly outline the molecular characteristics of pathological cis-acting sequences arising from gene mutations or transcriptional aberrations and review novel approaches to restore normal gene expression. We conclude with a concise predictive model of the function of posttranscriptional regulatory networks within different cellular compartments.

AU-rich element (ARE)
It was noted over a quarter of a century ago that mRNAs exhibit substantial variations in turnover rate upon exposure to different cell stimuli [23][24][25]. Among the prominent discoveries in the field of mammalian cis-acting elements, the AU-rich element was the most notable, as it proved to be the most robust determinant of mRNA instability in cytokine and early response genes [26]. Insight into the biological significance and physiological function of the ARE as a coordinate regulator of a posttranscriptional network came from the experimental identification of the ELAVL1 (HuR) and ZFP36 (TTP) proteins [27][28][29]. AREs are defined by a repeating pentamer (AUUUA) with 1 or 2 A-to-U substitutions [9]. Bioinformatic searches of the human transcriptome have provided computational estimates of the sequence characteristics and nucleotide lengths of ARE sequences required for mRNA instability [30,31]. The number of pentamers has an additive effect on mRNA decay and deadenylation. AREs are classified into five clusters depending on their sequence content and the position of A or U. Cluster I AREs contain up to five copies of the AUUUA motif with a nearby U-rich region and cause synchronous RNA deadenylation [32]. Cluster II AREs are composed of at least two overlapping copies of AUUUA with an adjacent (U/A) nonamer region and cause asynchronous deadenylation. Cluster III through V AREs contain more U-rich regions and are rather 'poorly structured' (Table 1), with an inconsistent deadenylation pattern. This classification system has proved helpful in understanding the observed behavior and function of ARE-containing transcripts [25]. Genome-wide analyses of mRNA half-lives showed that many labile transcripts contain conserved ARE sequence elements in their 3′ UTRs [21]. Overall, transcripts containing 3′ UTR AREs represent approximately 5% of the transcriptome [33].
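Because pentamer number acts additively, a first-pass ARE annotation often simply counts AUUUA copies. The Python sketch below is a minimal illustration, not the method of the cited studies: it counts overlapping AUUUA pentamers and tests for a UUAUUUA(U/A)(U/A) nonamer, which is an assumed consensus for the overlapping, cluster II-type arrangement. All function names are hypothetical.

```python
import re

# Zero-width lookaheads so that overlapping motifs (e.g. AUUUAUUUA) are all counted.
PENTAMER = re.compile(r"(?=AUUUA)")
# Assumed nonamer consensus: UUAUUUA followed by two A/U nucleotides.
NONAMER = re.compile(r"(?=UUAUUUA[AU][AU])")


def to_rna(seq: str) -> str:
    """Uppercase a sequence and convert DNA (T) to RNA (U)."""
    return seq.upper().replace("T", "U")


def count_are_pentamers(utr: str) -> int:
    """Number of (possibly overlapping) AUUUA pentamers in a 3' UTR."""
    return sum(1 for _ in PENTAMER.finditer(to_rna(utr)))


def pentamer_positions(utr: str) -> list[int]:
    """0-based start positions of every AUUUA pentamer."""
    return [m.start() for m in PENTAMER.finditer(to_rna(utr))]


def has_are_nonamer(utr: str) -> bool:
    """True if the assumed UUAUUUAWW nonamer occurs anywhere in the UTR."""
    return NONAMER.search(to_rna(utr)) is not None


if __name__ == "__main__":
    utr = "GCAUUUAUUUAUUUAGC"  # toy sequence, not a real transcript
    print(count_are_pentamers(utr), pentamer_positions(utr), has_are_nonamer(utr))
    # → 3 [2, 6, 10] True
```

The lookahead form matters here: a plain `re.findall("AUUUA", ...)` would consume the shared A and report only two pentamers in the toy sequence.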
Human mRNAs encoding cytokines and members of the NF-κB cascade are particularly enriched for AREs (Table 1). AREs play decisive roles in regulating the effects of cytokines on inflammatory responses: mutation of the ARE in transcripts such as TNF-alpha, IFNG, or IRF5 [34] resulted in a profound autoimmune-like inflammatory syndrome [35,36]. In general, transcripts containing functional AREs have short half-lives, although they can be rapidly stabilized in different cell types or stimulation conditions through complex posttranscriptional mechanisms involving trans-acting factors [10,37]. Numerous trans-binding factors interact with AREs (e.g., ELAVLs, ZFP36, KSRP, TIA1, TIAL1, HNRNPC1, and others, which are described in other chapters of this book) and determine the fate of ARE-harboring transcripts. Most of these proteins shuttle between the cytoplasm and the nucleus, where they can affect RNA splicing and 3′-end processing, in addition to altering the rate of decay in the cytoplasm [38]. In this respect, it is interesting to note that AREs are also found in intronic regions of pre-mRNAs [39][40][41][42]. This observation suggests that trans-acting factors could bind AREs in the nucleus and fulfill a function different from their cytoplasmic one. Furthermore, the considerable overlap between ARE-BP binding sites and other cis-elements, such as GU-rich and poly-U sequences, warrants further investigation, since the formation of secondary RNA structure might involve all of these elements and subsequently govern the coordinated behavior of RNA-BPs in different cellular compartments or under different cellular stimuli [43][44][45].

GU-rich element (GRE)
GU-rich elements (GREs) are recognized as essential regulators of mRNA splicing, stability, and translation in mammalian cells [11,46]. GRE-containing RNAs represent approximately 8% of the human transcriptome [47]. Genome-wide analyses of mRNA decay rates led to the discovery of non-ARE-containing cohorts of mRNAs that exhibit rapid turnover. Computational de novo motif searches identified conserved sequence elements in their 3′ UTRs in the form of a consensus U(GUUUG)n sequence [13] or GU repeats [48]. These elements were first tested in vivo in reporter systems, where they conferred instability on reporter mRNAs. A well-established rabbit beta-globin reporter system showed that GREs regulate the decay of exogenously expressed GRE-containing reporter transcripts within cells [13]. Further verification of GRE-mediated mRNA decay came from the observation that siRNA-mediated knockdown of the CELF1 protein stabilized GRE-containing beta-globin reporter transcripts as well as endogenous GRE-containing transcripts [49][50][51].
These studies also showed that both GU-rich sequences and GU repeats are enriched in unstable mRNAs, although the number of GUUUG pentamers in a GRE does not seem to correlate with the mRNA decay rate. GREs were subsequently tested for binding specificity to the CELF1 and CELF2 proteins using systematic evolution of ligands by exponential enrichment (SELEX), yeast three-hybrid selection, and surface plasmon resonance quantitative binding assays, revealing that the CELF family preferentially binds 15-22-nucleotide GU-rich RNA sequences [52][53][54]. Several studies reported that other proteins (e.g., TARDBP, FUS) bind very short UG repeats with higher affinity, but that binding dropped once the repeats became longer than 15 nucleotides [55,56]. Binding to dispersed GRE pentanucleotides (mostly by RRM-containing proteins) has also been reported, although its general functional consequences are only beginning to emerge (see the comprehensive review in Ref. [57]).
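The two features discussed above, the number of GUUUG pentamers and the length of uninterrupted UG repeats, can be tabulated per 3′ UTR before being correlated with measured decay rates. The following Python sketch is illustrative only: `gre_features` is a hypothetical helper, and reading repeats only in the UG register is a simplifying assumption.

```python
import re


def gre_features(utr: str) -> dict:
    """Tabulate simple GRE features of a 3' UTR (DNA or RNA input)."""
    rna = utr.upper().replace("T", "U")
    # Overlapping GUUUG pentamers (the lookahead does not consume sequence).
    pentamers = sum(1 for _ in re.finditer(r"(?=GUUUG)", rna))
    # Length, in repeat units, of each uninterrupted (UG)n run found left to right.
    ug_runs = [len(m.group(0)) // 2 for m in re.finditer(r"(?:UG)+", rna)]
    return {
        "GUUUG_pentamers": pentamers,
        "max_UG_repeat": max(ug_runs, default=0),
    }


if __name__ == "__main__":
    print(gre_features("AAGUUUGUUUGAACCUGUGUGUGCC"))  # toy sequence
```

Keeping the two counts separate mirrors the observation above that pentamer number alone does not track decay rate.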
Using whole-genome microarrays and high-throughput NGS methodologies, GRE targets have been identified in a number of mammalian cell types, for example, resting and activated human T cells, mouse brain cells, myoblasts, and human malignant cell lines [48][58][59][60][61].
Most studies have characterized GREs as binding sites located predominantly in 3′ UTRs that cause mRNA decay (or stabilization) depending upon the cellular and environmental context [62]. These UG-rich sequences serve as binding sites for the CELF and ELAVL protein families. Interestingly, these two families share over 80% sequence conservation within their RNA recognition motifs but have opposite effects: CELF binding to GREs leads to mRNA degradation, whereas ELAVL proteins function as mRNA stabilizers [63]. In addition, several studies reported that UGU repeat sequences are enriched in introns with the same frequency as AREs [64,65]. The authors found significant enrichment of short UG-rich motifs in intronic regions flanking exons, supporting a role for GREs in alternative splicing [66,67]; MBNL and CELF proteins bind these motifs competitively to activate or repress the splicing of pre-mRNA targets. This is not surprising, as an estimated 90% of human genes produce alternatively spliced mRNA transcripts [68,69]. Alignment of the genomic regions adjacent to canonical and alternative polyadenylation sites identified UUCUG and UGUU as conserved cis-elements that are essential for mRNA maturation and polyadenylation site utilization [70][71][72][73].
Thus, AREs and GREs can regulate pre-mRNA splicing, translation, and/or mRNA deadenylation or decay, depending on the repertoire of proteins they interact with in different intracellular settings. The classification of AREs and GREs has been described in multiple manuscripts [74][75][76][77], and an overview is shown in Table 1. Single nucleotide polymorphism studies in humans demonstrated that SNPs in ARE and GRE sites are associated with a higher risk of human diseases that involve the adaptive immune response; mutations in these conserved cis-acting elements resulted in changes in RNA stability and in binding preferences for RNA-BPs (reviewed in Refs. [44,63,78,79]). The opposing effects of RNA-BPs on mRNA turnover may have important implications for the role of posttranscriptional regulation in proliferative diseases such as cancer. Most existing data suggest that unbalanced expression and function of ARE-BPs drive neoplastic growth and proliferation and contribute to cancer pathogenesis [44,80]. A definitive, clinically relevant causal connection to human pathology has not yet been demonstrated.

Poly(A) tail and polyadenylation sequences
The addition and removal of the poly(A) tail are rate-limiting steps in the maturation and degradation of most mammalian mRNAs [81][82][83]. Two tightly coupled reactions, cleavage and polyadenylation, involve a large number of protein components. Alternative polyadenylation of RNA is a posttranscriptional modification that plays an important role in gene expression, as it produces mRNAs that share the same coding region but differ in their 3′ UTRs. This process is highly tissue specific and generates alternative mRNA isoforms with different stabilities, translational efficiencies, and even subcellular localizations [84][85][86]. In mammals, the cleavage/polyadenylation site is composed of three sets of consensus cis-elements: the highly conserved AAUAAA hexamer and the less conserved U/GU-rich and UGUA elements. A bioinformatics analysis showed that the overwhelming majority of mammalian mRNAs harbor a conserved AAUAAA hexamer or its close canonical variant, AUUAAA [87,88]. Flanking sequences are very important for poly(A) site function [89]. For example, two downstream U/GU-rich regions are both necessary for binding of the specific cleavage/polyadenylation complex [90,91]. A number of trans-binding factors regulate poly(A) site utilization and the efficiency of pre-mRNA processing in the nucleus, including five large protein families (CPSF, HNRNP, CF, MBNL, and CSTF) as well as snoRNAs [92][93][94][95]. These families have opposing effects on polyadenylation site utilization in nascent RNAs, determining the final pool of mature mRNA isoforms and the subsequent choreography and activity of trans-binding factors in the cytoplasm (reviewed in [96,97]). Immediately after cleavage, poly(A) polymerases (PAPs) extend the poly(A) tail, completing the mRNA maturation process [98,99].
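Locating the hexamer is usually the first step when annotating candidate cleavage/polyadenylation sites. The Python sketch below scans a 3′-end sequence for the AAUAAA and AUUAAA hexamers named above and then looks for UGUA elements upstream of a chosen hexamer. The 50-nt upstream window and the function names are hypothetical choices, not values from the cited studies, and the downstream U/GU-rich elements are not modeled.

```python
import re

HEXAMERS = ("AAUAAA", "AUUAAA")  # canonical hexamer and its close variant


def to_rna(seq: str) -> str:
    return seq.upper().replace("T", "U")


def find_pas_hexamers(seq: str) -> list[tuple[int, str]]:
    """Sorted (position, hexamer) pairs for every canonical PAS hexamer."""
    rna = to_rna(seq)
    hits = [(m.start(), h) for h in HEXAMERS
            for m in re.finditer(f"(?={h})", rna)]
    return sorted(hits)


def upstream_ugua(seq: str, hexamer_pos: int, window: int = 50) -> list[int]:
    """Positions of UGUA elements lying entirely within `window` nt upstream of a hexamer."""
    rna = to_rna(seq)
    start = max(0, hexamer_pos - window)
    # Searching a slice means matches cannot run past the hexamer start.
    return [start + m.start() for m in re.finditer("(?=UGUA)", rna[start:hexamer_pos])]


if __name__ == "__main__":
    seq = "UGUAGGGGAAUAAA"  # toy 3'-end fragment
    for pos, hexamer in find_pas_hexamers(seq):
        print(pos, hexamer, upstream_ugua(seq, pos))
```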
Genome-wide polyadenylation site (PAS) analysis in mammalian cells identified a great diversity of PAS utilization in different tissues and organs [73,100]. Mutations can cause loss of the canonical polyadenylation signal and a subsequent switch to alternative PAS utilization [101].
Another conserved regulatory cis-element is the cytoplasmic polyadenylation element (CPE). Many mammalian RNAs contain a CPE, a UUUUA/U sequence located in the 3′ UTR. The CPE serves as a binding site for the cytoplasmic polyadenylation element-binding proteins CPEB1-4 [102]. The most pronounced differences in CPE usage have been described under conditions of stress [103].
The nuclear poly(A)-binding proteins (PABPs) act as poly(A) gatekeepers during mRNA processing: they first bind to a newly added (A)12 stretch and then allow the poly(A) tail to grow up to 250 nucleotides before the mRNA is exported into the cytoplasm [104,105].
In the cytoplasm, the poly(A) tail acts as a cis-regulatory element and mediates mRNA translation. Recently developed methodologies make it affordable to count differentially polyadenylated mRNAs and to measure poly(A) tail length [106][107][108]. In somatic cells, mRNA deadenylation can lead to the degradation or stabilization of translationally silent transcripts; however, the importance of poly(A) tail length in these processes is currently under scrutiny, as there is evidence that translation is regulated independently of poly(A) tail length in the somatic cell cycle [109]. During embryonic development, translationally repressed mRNAs can be reactivated by cytoplasmic poly(A) tail elongation at the precise time when their encoded proteins are needed [108,110].

Other intermediate cis-elements
A number of ARE-like elements that regulate important posttranscriptional networks of gene expression have been identified in several mammalian systems.
Poly(U) sequences, the third most conserved cis-element after AREs and GREs, were recently found to be enriched at cross-linked nucleotide sites in CLIP assays [111]. Among poly(U) k-mers, the UUUUU pentanucleotide is the most highly enriched. HNRNPC and HNRNPD (AUF1) can recognize and bind U-rich sequences in pre-mRNAs, mature mRNAs, and non-coding RNAs, and they influence target transcript diversity through pre-mRNA splicing in the nucleus and transcript stability in the cytoplasm [41]. Interestingly, cluster V of the ARE and GRE elements (see Table 1) includes hundreds of mRNAs harboring U-pentanucleotides in the 3′ UTR, suggesting that CELF and ELAVL family proteins can also bind poly(U) tracts under certain conditions, perhaps with lower affinity [112].
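Statements such as "UUUUU is the most enriched pentanucleotide at cross-link sites" come from comparing k-mer frequencies in windows around cross-linked nucleotides with a background set. A minimal Python sketch of that comparison follows. It assumes the windows have already been extracted, and the pseudocount-smoothed frequency ratio is one simple choice among many; published CLIP pipelines typically add statistical testing. The function names are hypothetical.

```python
from collections import Counter


def kmer_counts(seqs, k=5):
    """Count all overlapping k-mers across a collection of sequences."""
    counts = Counter()
    for seq in seqs:
        rna = seq.upper().replace("T", "U")
        for i in range(len(rna) - k + 1):
            counts[rna[i:i + k]] += 1
    return counts


def kmer_enrichment(foreground, background, k=5, pseudocount=1.0):
    """Pseudocount-smoothed frequency ratio (foreground / background) per k-mer."""
    fg, bg = kmer_counts(foreground, k), kmer_counts(background, k)
    kmers = set(fg) | set(bg)
    fg_total = sum(fg.values()) + pseudocount * len(kmers)
    bg_total = sum(bg.values()) + pseudocount * len(kmers)
    return {km: ((fg[km] + pseudocount) / fg_total) / ((bg[km] + pseudocount) / bg_total)
            for km in kmers}


if __name__ == "__main__":
    crosslink_windows = ["UUUUUUU"]   # toy foreground windows
    background = ["ACGUACGUAC"]       # toy background sequences
    scores = kmer_enrichment(crosslink_windows, background)
    print(max(scores, key=scores.get))
```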
Uridylation is an independent biochemical process catalyzed by terminal uridylyl transferases such as ZCCHC11 and ZCCHC6. In mammalian cells, uridylation readily occurs on deadenylated mRNAs through recognition of short poly(A) tails (<25 nt). The poly(A)-binding protein PABPC1 antagonizes uridylation of polyadenylated mRNAs, contributing to changes in mRNA half-lives [113]. MicroRNAs can also induce uridylation of their targets; however, the selectivity of mRNA uridylation has not been decisively demonstrated. Novel methods such as TAIL-Seq allow genome-wide discovery of alternative mRNA tailing processes, such as uridylation and guanylation, downstream of shortened poly(A) tails [114]. Dynamic control of mRNA tailing is implicated in turnover and translational control and is fundamental for early embryonic development [115].
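Tail-sequencing reads can, in principle, be split into a poly(A) stretch and any non-templated nucleotides added after it. The Python sketch below is a deliberately naive illustration of that idea, not the published TAIL-Seq pipeline. It assumes each read ends exactly at the transcript's 3′ terminus and contains no sequencing errors. The 25-nt threshold mirrors the short-tail figure quoted above, and `call_tail` and `TailCall` are hypothetical names.

```python
from typing import NamedTuple

SHORT_TAIL_NT = 25  # short poly(A) tail threshold quoted in the text


class TailCall(NamedTuple):
    poly_a_length: int
    additions: str     # non-A nucleotides found 3' of the A-run (e.g. "UU")
    uridylated: bool
    short_tail: bool


def call_tail(read: str) -> TailCall:
    """Naively split the 3' end of a read into a poly(A) run and trailing U/G additions."""
    rna = read.upper().replace("T", "U")
    end = len(rna)
    while end > 0 and rna[end - 1] in "UG":     # peel off candidate U/G additions
        end -= 1
    start = end
    while start > 0 and rna[start - 1] == "A":  # then measure the A-run before them
        start -= 1
    poly_a = end - start
    # Without an A-run, added nucleotides cannot be told apart from templated 3' UTR sequence.
    additions = rna[end:] if poly_a > 0 else ""
    return TailCall(poly_a, additions, "U" in additions, 0 < poly_a < SHORT_TAIL_NT)


if __name__ == "__main__":
    print(call_tail("CCG" + "A" * 10 + "UU"))
```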
GC-rich sequences are also conserved in coding and non-coding regions of mammalian mRNAs. These GC-rich elements (GCREs) were identified in NCL (nucleolin), PCBP1, and UPF protein-binding complexes [116]. GCREs regulate mRNA stability, decay, and translational efficiency [117]. Several lines of evidence establish a primary function for GCREs as regulators of mRNA transcription [118].
The CU-rich element (CURE) is a target for several RNA- or DNA-binding proteins, for example, PCBP1 [119] and PTBP1 [120,121], and regulates gene expression via a broad but poorly defined spectrum of posttranscriptional mechanisms.
Oligonucleotides (T/C)nGGG/G from four separate strands can fold into stacked tertiary structures known as G-quadruplexes, forming polymorphic loops of three G-quartet layers with four G-tracts [122][123][124]. Folded G-structures (Gs) 2-7 are found in 3′ and 5′ UTRs but are very rare in coding and intergenic regions, and they could influence all aspects of RNA metabolism [125,126]. Studies have shown that 3′ UTR G-quadruplexes can bind more than two dozen proteins that interact with the Gs structure and act as regulators of transcription, splicing, processing, localization, and stability; these interactions have recently been discussed in excellent reviews [127,128]. Moreover, bioinformatic and computational scans have shown the prevalence of intermolecular DNA-RNA G-quadruplexes and of (Gs) 4 pairing with miRNA in mammalian cells [129,130]. These observations imply almost endless possibilities for intermolecular interactions, which would undoubtedly have a significant impact on our understanding of transcriptional and posttranscriptional gene expression and regulation in mammalian cells.
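Candidate G-quadruplex-forming sequences are commonly located with a pattern-matching rule. The Python sketch below assumes the widely used heuristic of four runs of at least three G separated by loops of 1-7 nt. This rule is an assumption here, since the chapter does not state one. The sketch reports non-overlapping matches on one strand only, and `find_pqs` is a hypothetical name.

```python
import re

# Assumed heuristic: four G-tracts of >=3 G separated by 1-7 nt loops (loops may contain G).
PQS = re.compile(r"G{3,}[ACGU]{1,7}G{3,}[ACGU]{1,7}G{3,}[ACGU]{1,7}G{3,}")


def find_pqs(seq: str) -> list[tuple[int, int]]:
    """Non-overlapping (start, end) spans of putative quadruplex sequences; end is exclusive."""
    rna = seq.upper().replace("T", "U")
    return [(m.start(), m.end()) for m in PQS.finditer(rna)]


if __name__ == "__main__":
    print(find_pqs("AAGGGUUGGGUUGGGUUGGGAA"))  # toy sequence
```

A match only flags sequence potential; whether a quadruplex actually folds in cells depends on ion conditions and competing structures, which such a scan cannot assess.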
Internal ribosome entry sites (IRESs) are heterogeneous cis-acting regulatory elements located primarily in the 5′ untranslated regions of mammalian mRNAs. IRESs mediate alternative, cap-independent translation, bypassing the need for the m7GpppN cap structure and for many of the translation initiation factors otherwise required for ribosomal subunits to recognize the initiation codon (e.g., AUG) [131]. Because an IRES can be several hundred nucleotides long, it has been difficult to identify the structural elements that underlie common IRES secondary structures or functions [132,133]. In-depth sequence scans of the human transcriptome identified a variety of poly-U, poly-A, and CU-rich k-mers that appear to be important determinants of IRES activity [134]. These k-mers represent binding sites for IRES trans-acting factors and are located less than 150 nt upstream of the AUG start codon [135]. IRES-mediated translation initiation is commonly presented as a cell survival mechanism in response to stress; however, the significance of this process and its implications for human disease are unknown, owing to a lack of experimental results that unambiguously demonstrate the effect in vivo [136].
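The positional constraint above, k-mers located within roughly 150 nt of the start codon, can be expressed as a windowed scan of the 5′ UTR. In the Python sketch below, the three k-mers are hypothetical placeholders for the poly-U, poly-A, and CU-rich classes, because the text does not list the actual k-mers. The input is assumed to be the 5′ UTR ending immediately before the AUG.

```python
WINDOW_NT = 150  # upstream distance quoted in the text

# Hypothetical placeholders for the poly(U), poly(A) and CU-rich k-mer classes.
EXAMPLE_KMERS = ("UUUUU", "AAAAA", "CUCUC")


def kmers_near_start(utr5: str, kmers=EXAMPLE_KMERS, window: int = WINDOW_NT):
    """Return (kmer, distance) pairs for k-mers starting in the last `window` nt of a 5' UTR.

    utr5 is assumed to end immediately before the AUG start codon, so a k-mer
    starting at index i lies len(utr5) - i nt upstream of the A of AUG.
    """
    rna = utr5.upper().replace("T", "U")
    offset = max(0, len(rna) - window)
    hits = []
    for kmer in kmers:
        i = rna.find(kmer, offset)
        while i != -1:  # collect overlapping occurrences
            hits.append((kmer, len(rna) - i))
            i = rna.find(kmer, i + 1)
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    print(kmers_near_start("G" * 10 + "UUUUU" + "G" * 20))  # toy leader
```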
The Pumilio response element (PRE) is another cis-element that is well defined in nonmammalian systems. A consensus 5′-UGUANAUA was derived from gel shift, RIP, PAR-CLIP, and crystal structure approaches [137]. It is present in almost 3000 mammalian mRNAs and serves as a cis-element for the PUM family of proteins [138,139]. PUM proteins exert two modes of translational repression: deadenylation-mediated repression and a deadenylation-independent mechanism [140].
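Degenerate consensus motifs such as the PRE (5′-UGUANAUA) are easiest to search after translating the IUPAC ambiguity codes into a regular expression. A minimal Python sketch with hypothetical helper names follows. The same helper handles the UGUKUGU pattern (K = G or U) that appears in the miRNA discussion of this chapter.

```python
import re

# IUPAC nucleotide ambiguity codes for RNA, mapped to regular-expression classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "[AG]", "Y": "[CU]", "K": "[GU]", "M": "[AC]",
    "S": "[CG]", "W": "[AU]", "N": "[ACGU]",
}


def motif_regex(motif: str) -> re.Pattern:
    """Compile a degenerate motif into a regex that also reports overlapping matches."""
    body = "".join(IUPAC[c] for c in motif.upper().replace("T", "U"))
    return re.compile(f"(?={body})")


def find_motif(seq: str, motif: str) -> list[int]:
    """0-based start positions of every match of `motif` in `seq`."""
    rna = seq.upper().replace("T", "U")
    return [m.start() for m in motif_regex(motif).finditer(rna)]


if __name__ == "__main__":
    print(find_motif("CCUGUAAAUAGG", "UGUANAUA"))  # toy PRE-containing sequence
```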
Another novel 3′ UTR motif (UAAC/GUUAU) is also prevalent (7% of mammalian 3′ UTRs contain one or more copies) and strongly conserved across species [141]. This motif is a binding target for HNRNP A2/B1 and A1 and is involved in mRNA deadenylation. A fundamental role for UAAC/GUUAU and similar elements as regulators of mammalian mRNA translational activation or repression has yet to be demonstrated [142].

Short multivalent regulatory motifs
Mapping the positional enrichment of short intronic splicing regulatory elements (ISREs) in mammalian pre-mRNAs is another way of identifying the cis-acting elements that matter most for pre-mRNA splicing. De novo searches for multivalent RNA motifs identified a number of conserved tetra- to hexamers that mediate position-specific combinatorial binding by RNA-binding proteins [143,144]. The position of short motifs can predict tissue-specific RNA isoform abundance, and these motifs can serve as intronic splicing enhancers or silencers during embryonic development and in adult organisms [145]. Because consensus splice-site sequence elements are very short (e.g., 5′-UUAGGU, AAGGAC, AAGAAC, CCUCUG, GCUGCG, CUGCUG-3′), how the spliceosome distinguishes authentic splice sites remains a long-standing question. One explanation [146,147] is that these sequences form specific secondary structures that increase their binding affinity for the RNA-binding motifs of many RNA-BPs. The strong association of ISREs with differences in splicing patterns, despite their poor evolutionary conservation, suggests that these motifs act as cis-acting splice codes that allow the progressive divergence of alternative splicing in vertebrates [148].
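Positional enrichment of this kind starts from per-exon motif counts in the intronic windows on either side of an exon. The Python sketch below computes those counts for the hexamers quoted above. The 200-nt window and the function names are hypothetical choices, and the statistics needed to compare included with skipped exons are left out.

```python
# Hexamers quoted in the text; any motif list could be substituted.
EXAMPLE_MOTIFS = ("UUAGGU", "AAGGAC", "AAGAAC", "CCUCUG", "GCUGCG", "CUGCUG")


def count_overlapping(seq: str, motif: str) -> int:
    """Number of (possibly overlapping) occurrences of motif in seq."""
    n, i = 0, seq.find(motif)
    while i != -1:
        n += 1
        i = seq.find(motif, i + 1)
    return n


def flank_profile(upstream_intron: str, downstream_intron: str,
                  motifs=EXAMPLE_MOTIFS, window: int = 200) -> dict:
    """Count each motif in the intronic windows flanking one exon.

    upstream_intron: intron sequence ending at the exon's 3' splice site.
    downstream_intron: intron sequence starting at the exon's 5' splice site.
    Returns {motif: (upstream_count, downstream_count)}.
    """
    up = upstream_intron.upper().replace("T", "U")[-window:]
    down = downstream_intron.upper().replace("T", "U")[:window]
    return {m: (count_overlapping(up, m), count_overlapping(down, m)) for m in motifs}


if __name__ == "__main__":
    print(flank_profile("CUGCUGCUG", "AAGAACAA")["CUGCUG"])  # toy flanks
```

Summing these per-exon tuples separately over, for example, tissue-included and tissue-skipped exons yields the kind of position-specific profile described above.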

MicroRNAs (miRNAs)
MicroRNAs are conserved regulatory sequences that act pervasively, in trans, on mRNA. miRNA-binding sites are important regulators of mRNA half-life and activity. Most miRNAs influence mRNA life span through biochemical interactions with the mRNA and/or RNA-BPs [149]. This can be achieved through direct competition for a shared binding site or through remodeling of the mRNA structure to favor (or impede) nearby miRNA association [150]. In support of this, a recent bioinformatics analysis determined that UUUGUUU motifs, which bear an uncanny resemblance to GRE-binding sites, are enriched adjacent to many miRNA-binding sites, and their presence tends to augment miRNA activity [151]. On the other hand, any miRNA that contains a UGUKUGU or UAUKUAU seed sequence (K represents G or U) could in theory bind and occlude GRE-BP- or ARE-BP-binding motifs, preventing their interaction with cis-elements within the mRNA. For example, an interaction between miR-122 and CELF1 has been demonstrated, suggesting that CELF1 can play a role in the degradation of GRE-containing miRNAs [152]. It has been computed that the proximity of RNA-BP-binding sites to residues pairing with miRNAs can quantitatively predict mRNA cis-element performance for several intensely studied RNA-BPs and miRNAs [153][154][155]. Although the mechanistic details of the interplay between cis-acting elements, RNA-BPs, and miRNAs are understudied, they arguably should be a high priority, given recent observations that miRNA expression and/or processing is affected in many human diseases and disorders [156][157][158]. Bioinformaticians and biologists have made significant progress toward understanding the systems biology of the RNA life cycle; several useful metadata hubs have been created that incorporate existing experimental data and computational approaches [159,160]. The available software and websites have recently been reviewed comprehensively in Ref. [161].
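The proximity argument above can be made concrete by locating canonical seed matches and measuring their distance to the nearest UUUGUUU motif. The Python sketch below assumes the common 7mer-m8 site definition, a perfect Watson-Crick match to miRNA nucleotides 2-8. It measures start-to-start distance. Both are simplifying choices, the helper names are hypothetical, and the miRNA in the usage lines is a toy sequence, not a real miRNA.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGU", "UGCA")


def to_rna(seq: str) -> str:
    return seq.upper().replace("T", "U")


def reverse_complement(rna: str) -> str:
    return rna.translate(COMPLEMENT)[::-1]


def find_all(haystack: str, needle: str) -> list[int]:
    """0-based start positions of overlapping occurrences of needle."""
    hits, i = [], haystack.find(needle)
    while i != -1:
        hits.append(i)
        i = haystack.find(needle, i + 1)
    return hits


def seed_sites(utr: str, mirna: str) -> list[int]:
    """Positions of 7mer-m8 sites: the reverse complement of miRNA nt 2-8."""
    site = reverse_complement(to_rna(mirna)[1:8])
    return find_all(to_rna(utr), site)


def distance_to_motif(utr: str, pos: int, motif: str = "UUUGUUU") -> Optional[int]:
    """Start-to-start distance from a site to the nearest motif, or None if absent."""
    hits = find_all(to_rna(utr), motif)
    return min((abs(h - pos) for h in hits), default=None)


if __name__ == "__main__":
    toy_mirna = "AUGCACUCAAAAGG"                 # illustrative only
    utr = "CC" + "GAGUGCA" + "AA" + "UUUGUUU" + "CC"
    for site in seed_sites(utr, toy_mirna):
        print(site, distance_to_motif(utr, site))
```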
However, we are still far from having a comprehensive understanding of mechanisms of RNA biogenesis and its relevance in physiological and pathological conditions.

Pathological cis-elements
The human genome contains a large number of short repetitive sequences that are prone to higher-than-average mutation rates and transcriptional errors [162]. These can engender tandem repeat expansions in cis-acting elements of the 3′ or 5′ UTR, introns, or coding regions and cause a large variety of inherited human diseases. For example, endogenous nucleotide repeat expansions are implicated in many human autosomal dominant diseases and have emerged as a new group of repeat expansion disorders associated with tri-, penta-, or hexanucleotide repeat expansions. Pathological repeats can elicit toxicity triggered by toxic RNA or by abnormally translated dipeptide or homopolymeric proteins [163]. Such disorders include, but are not limited to, the following conditions:
• Spinocerebellar ataxias (SCA types 1-37) are the largest and most diverse group of inherited neurological diseases characterized by progressive ataxia. Several tandem repeat expansion mutations have been discovered, including coding (CAG)n expansions in SCA1, 2, 3, 6, 7, and 17; non-coding (CTG)n in SCA8 [164]; non-coding (CAG)n in SCA12; (ATTCT)n in SCA10; (TGGAA)n in SCA31; and (GGCCTG)n in SCA36 (see OMIM.org for details).
• Huntington's disease is caused by CAG repeat expansions in the HTT gene [167].
• Fragile X syndrome (FXS) arises when the CGG repeat in the FMR1 gene expands beyond 200 repeats.
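Repeat length is the key quantity for the disorders listed above. The Python sketch below measures the longest uninterrupted run of a given repeat unit in any of its reading registers. It ignores interrupted or imperfect repeats, which real genotyping pipelines must handle, and `longest_repeat_run` is a hypothetical name.

```python
def longest_repeat_run(seq: str, unit: str) -> int:
    """Longest run of consecutive, uninterrupted copies of `unit` in any register."""
    seq = seq.upper().replace("T", "U")
    unit = unit.upper().replace("T", "U")
    k, best = len(unit), 0
    for frame in range(k):          # try every register of the repeat unit
        run = 0
        for i in range(frame, len(seq) - k + 1, k):
            if seq[i:i + k] == unit:
                run += 1
                best = max(best, run)
            else:
                run = 0             # an interruption resets the run
    return best


if __name__ == "__main__":
    allele = "GC" + "CAG" * 12 + "CAA" + "CAG" * 3  # toy allele with one interruption
    print(longest_repeat_run(allele, "CAG"))
```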
The molecular pathogenesis of endogenous nucleotide repeat expansion diseases is complicated and involves repeat-associated non-AUG (RAN) translation, in which translation of mutant polypeptides is initiated without an AUG initiation codon, or is driven by open reading frame shifts caused by three-base-pair repeats that expand through slipped mispairing in the course of DNA synthesis (reviewed in [169,170]). Although the posttranscriptional modification state of these transcripts (e.g., mRNA capping and polyadenylation) is unknown, two translational pathways have been described: (1) ATG-initiated translation, which produces multiple polypeptides if the transcript contains multiple ORFs; and (2) RAN translation of the expanded repeat, which can produce up to six distinct RAN polypeptides: poly-Gln, poly-Ser, and poly-Ala from the three reading frames of the CAG-repeat strand, and poly-Leu, poly-Cys, and poly-Ala from the complementary CUG-repeat strand. Repeats located in antisense transcripts of the genes listed above are also substrates for RAN translation, further expanding the number of pathological dipeptide or homopolymeric RAN proteins produced during disease pathogenesis.
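The six RAN products follow directly from reading a pure CAG repeat, and its reverse complement, in all three frames. The Python sketch below checks this with a minimal codon table containing only the six codons such repeats can generate; any other codon is reported as "X". It models the genetic code only, not how RAN initiation happens.

```python
# Only the codons produced by pure CAG/CUG repeats (standard genetic code assignments).
CODONS = {
    "CAG": "Q", "AGC": "S", "GCA": "A",  # frames of a CAG repeat
    "CUG": "L", "UGC": "C", "GCU": "A",  # frames of a CUG repeat
}
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    return rna.translate(COMPLEMENT)[::-1]


def three_frame_products(rna: str) -> list[str]:
    """Translate an RNA in frames 0, 1, 2 (single-letter codes; 'X' = codon not in table)."""
    products = []
    for frame in range(3):
        codons = (rna[i:i + 3] for i in range(frame, len(rna) - 2, 3))
        products.append("".join(CODONS.get(c, "X") for c in codons))
    return products


if __name__ == "__main__":
    sense = "CAG" * 5
    print(three_frame_products(sense))                      # ['QQQQQ', 'SSSS', 'AAAA']
    print(three_frame_products(reverse_complement(sense)))  # ['LLLLL', 'CCCC', 'AAAA']
```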
A common aspect of these pathologies is that they are caused by mutated cis-elements and are often produced through bidirectional transcription. The resulting toxic RNA causes intracellular stress and sequesters RNA-BPs on the expanded repeats [171], which changes the biochemistry of posttranscriptional regulatory networks in affected tissues. The diseases mentioned above are an incomplete list of a growing number of disorders that could share similar therapeutic opportunities. The recently developed CRISPR-Cas9 'base editor' methodology has demonstrated nucleotide-level editing precision, suggesting that CRISPR-based editing, for example repeat excision, could serve as a genetic therapy for the conditions listed above [172]; such approaches may also correct many other RNA pathologies, for example, those driven by nonsense-mediated mRNA decay [173].

Models for the effects of cis-acting elements
mRNA molecules move through different cellular compartments within messenger ribonucleoprotein (mRNP) complexes. There they associate dynamically with RNA-binding proteins that bind to conserved cis-elements shared by subsets of transcripts [174]. The association of specific trans-binding factors with these shared regulatory cis-elements coordinates the fate of the bound transcripts. It does so through posttranscriptional processes such as splicing, intracellular localization, translation, storage, or mRNA decay [175,176]. Not surprisingly, very few transcripts have only one type of regulatory element. Focusing on individual scenarios, we built a concise predictive model of higher-order complexes that can form simultaneously within different cellular compartments, starting in the nucleus and moving into the cytoplasm.
A. Regulation of splicing by cis-elements (Figure 1A): Cis-elements within precursor RNA are recognized by different components of the spliceosome during constitutive splicing [177]. Binding of RNA-BPs to short intronic splicing regulatory elements (ISREs) regulates exon inclusion or exon skipping during stage-specific splicing transitions, in a position-dependent manner [67]. These processes are orchestrated by competitive recognition and binding by the U small nuclear ribonucleoproteins (snRNPs) that make up the spliceosome.
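Because the effect of an ISRE depends on where it sits relative to the regulated exon, a first step in analyzing one is to map motif hits to regions. The sketch below is a minimal illustration under stated assumptions. The degenerate motif YCAY (Y = C or U) is used only as an example; it is the motif recognized by the Nova splicing regulators. The pre-mRNA, exon coordinates, and function names are hypothetical. The sketch only labels positions and does not predict whether a given site promotes inclusion or skipping.

```python
import re

# IUPAC codes needed for degenerate motifs such as YCAY (Y = C or U).
IUPAC = {"Y": "[CU]", "R": "[AG]", "N": "[ACGU]"}


def motif_to_regex(motif: str) -> str:
    """Expand IUPAC ambiguity codes into a regular-expression pattern."""
    return "".join(IUPAC.get(base, base) for base in motif)


def map_isre_sites(pre_mrna: str, exon_start: int, exon_end: int,
                   motif: str) -> list[tuple[int, str]]:
    """Locate (possibly overlapping) motif hits and label each by region.

    exon_start/exon_end are 0-based, half-open coordinates; a hit that
    straddles a boundary is labeled by its first nucleotide.
    """
    # A zero-width lookahead lets overlapping hits be reported.
    pattern = re.compile(f"(?={motif_to_regex(motif)})")
    sites = []
    for match in pattern.finditer(pre_mrna):
        pos = match.start()
        if pos < exon_start:
            region = "upstream intron"
        elif pos < exon_end:
            region = "exon"
        else:
            region = "downstream intron"
        sites.append((pos, region))
    return sites


# Hypothetical pre-mRNA: 8-nt intron, 8-nt exon, 8-nt intron.
pre_mrna = "AAUCAUAA" + "GGGGGGGG" + "AACCAUAA"
print(map_isre_sites(pre_mrna, 8, 16, "YCAY"))
```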
RNA-BPs also bind to multivalent intronic sequences in precursor mRNA and regulate alternative splicing (e.g., exon skipping, alternative splice-site usage, or intron retention). Alternatively spliced transcripts may contain different 3′ or 5′ UTRs that can be subject to differential

C. Regulation of translation by cis-acting elements (Figure 1C): Most eukaryotic mRNAs are translated by the cap-dependent mechanism, which requires recognition of the cap structure (m7GpppN) at the 5′ end by eukaryotic initiation factors (eIFs). The eIFs recruit the small ribosomal subunit and initiator Met-tRNA, and this complex scans along the 5′ UTR of the mRNA to reach the start codon (an AUG triplet). During scanning, RNA secondary structure is unwound in an ATP-dependent manner. 5′ UTRs are often GC-rich and prone to folding into secondary structures, which may hinder ribosomal assembly [178]. Hairpin loops have been described as secondary-structure regulatory elements in only a handful of mRNAs, and their role in genome-wide translation is not known. Combining ribosome profiling (Ribo-seq) with fluorescent visualization may shed light on the role of hairpin loops in translation in the near future [179][180][181][182]. Other cis-element structures within the 5′ UTR include AREs and GREs. Their effects on translation are mediated by combinations of RNA-BPs, and they are often found within hairpin loops. Visualizing a folded hairpin structure in vivo is not possible at current resolution limits.
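The scanning step of cap-dependent initiation can be illustrated with a deliberately simplified sketch. It treats the first AUG encountered from the 5′ end as the start codon and reports the GC fraction of the 5′ UTR upstream of it. The function name and example sequence are hypothetical. Real start-codon selection also depends on nucleotide context, leaky scanning, and upstream ORFs, none of which are modeled here.

```python
def scan_for_start(mrna: str) -> tuple[int, float]:
    """Return (index of first AUG, GC fraction of the 5' UTR upstream of it).

    Simplification: the first AUG read from the 5' end is taken as the
    start codon, as in a naive scanning model.
    """
    start = mrna.find("AUG")
    if start == -1:
        raise ValueError("no AUG triplet found")
    utr = mrna[:start]
    gc = sum(base in "GC" for base in utr) / len(utr) if utr else 0.0
    return start, gc


# Hypothetical mRNA fragment: a short GC-rich 5' UTR followed by an AUG.
start, gc = scan_for_start("GCCGCGGCGCAGCCACCAUGGCUUCAUAA")
print(start, round(gc, 2))  # prints: 17 0.88
```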
Translation initiation via an internal ribosome entry site (IRES) occurs in a cap-independent manner. Mammalian IRESs bypass the eIF4E-m7GpppN cap interaction and recruit the ribosomal subunits and initiator tRNA to the transcript, initiating translation at the canonical AUG start codon.
The poly(A) tail also contributes to translation, both by stabilizing the mRNA and by facilitating mRNA circularization, which promotes translation. Deadenylation tends to slow the translation rate and eventually leads to mRNA degradation.
D. Regulation of mRNA stabilization or decay by cis-acting elements (Figure 1D): In mammalian cells, mRNA stability and decay are regulated by cis-elements in the 3′ UTR. Numerous RNA-BPs serve as trans-binding factors for AREs, GREs, and other elements, promoting transcript deadenylation and subsequent decay by exonucleases. Other RNA-BPs have the opposite function: they stabilize mRNAs and promote their translation. Posttranslational modification of RNA-BPs (particularly within RNA-binding domains) can cause them to dissociate from RNA-binding complexes and be replaced by competitors. This contributes to mRNA stabilization or destabilization [76]. Cells must maintain a fine-tuned balance between these opposing activities for proper function at the organismal level.
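A common first pass at identifying candidate ARE- or GRE-containing transcripts is to count consensus motifs in the 3′ UTR. The sketch below uses commonly cited simplified definitions: the AUUUA pentamer, the UUAUUUA(U/A)(U/A) nonamer, and the UGUUUGUUUGU GRE consensus. These definitions are assumptions for illustration. Published ARE and GRE classifications also weigh motif clustering, flanking U-richness, and structural context. The example sequences are hypothetical.

```python
import re

# Simplified consensus motifs (assumptions; real classification schemes
# also consider clustering, flanking sequence, and structure).
MOTIFS = {
    "ARE pentamer": "AUUUA",
    "ARE nonamer": "UUAUUUA[AU][AU]",
    "GRE consensus": "UGUUUGUUUGU",
}


def count_motifs(utr: str) -> dict[str, int]:
    """Count overlapping occurrences of each motif in a 3' UTR sequence."""
    counts = {}
    for name, pattern in MOTIFS.items():
        # Zero-width lookahead so that overlapping hits are all counted,
        # e.g. AUUUAUUUA contains two AUUUA pentamers sharing one A.
        counts[name] = len(re.findall(f"(?={pattern})", utr))
    return counts


print(count_motifs("AUUUAUUUAUUUA"))    # hypothetical ARE-rich fragment
print(count_motifs("CCUGUUUGUUUGUCC"))  # hypothetical GRE-containing fragment
```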
E. Interplay between mRNA, miRNA, and RNA-BPs (Figure 1E): Estimates of how different miRNAs and mRNAs are loaded into RNA-BP-bound RISC (RNA-induced silencing complex) were derived from CLIP assay results [184][185][186]. Several scenarios can be inferred from these data. If both a miRNA and an RNA-BP are bound to the 3′ UTR of an mRNA, they will be close enough to each other that the complex can be identified by CLIP. In that case they may act cooperatively to promote assembly of the decay machinery. Independent binding by a competing RNA-BP might disrupt this complex. The strength of canonical and noncanonical miRNA-mRNA pairing can be computed to predict possible biochemical outcomes [187][188][189].
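Canonical miRNA-mRNA pairing is usually scored by the seed region, miRNA nucleotides 2 to 8. The sketch below classifies seed-matched sites into the 8mer, 7mer-m8, 7mer-A1, and 6mer categories used by prediction tools such as TargetScan. Noncanonical sites, 3′-supplementary pairing, and site context are not modeled. The miRNA and UTR sequences are hypothetical toy sequences, not real transcripts.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def revcomp(rna: str) -> str:
    """Reverse complement of an RNA sequence (Watson-Crick pairs only)."""
    return rna.translate(COMPLEMENT)[::-1]


def classify_seed_sites(utr: str, mirna: str) -> list[tuple[int, str]]:
    """Find canonical seed-matched sites for a miRNA in a 3' UTR.

    The 6-nt core pairs with miRNA nucleotides 2-7. On the mRNA, a match
    to nucleotide 8 lies immediately 5' of the core, and the A opposite
    nucleotide 1 lies immediately 3' of it.
    """
    core = revcomp(mirna[1:7])   # pairs with miRNA nt 2-7
    m8_base = revcomp(mirna[7])  # pairs with miRNA nt 8
    sites = []
    pos = utr.find(core)
    while pos != -1:
        has_m8 = pos > 0 and utr[pos - 1] == m8_base
        end = pos + len(core)
        has_a1 = end < len(utr) and utr[end] == "A"
        if has_m8 and has_a1:
            kind = "8mer"
        elif has_m8:
            kind = "7mer-m8"
        elif has_a1:
            kind = "7mer-A1"
        else:
            kind = "6mer"
        sites.append((pos, kind))
        pos = utr.find(core, pos + 1)
    return sites


# Hypothetical 22-nt miRNA and 3' UTR fragments (illustrative only).
mirna = "UCCGAAGUCAGAUUAGCAAUGC"
print(classify_seed_sites("GGACUUCGGAGG", mirna))  # one 8mer site
print(classify_seed_sites("CCCUUCGGCC", mirna))    # one 6mer site
```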
The length and secondary structure of the mRNA 3′ UTR can greatly influence the binding efficiency of both miRNAs and RNA-BPs. They can also disrupt or facilitate the assembly of RNA-BP complexes by providing high-affinity or multiple-occupancy binding sites. The outcomes of this scenario could range from marginal translational repression to accelerated mRNA degradation.
Sequences within miRNAs that resemble cis-elements (AREs or GREs) could be bound with high affinity by the RNA-recognition motifs (RRMs) of RNA-BPs. In theory, such miRNAs could occupy these RRMs and act as competitive inhibitors of RNA-BP activity. This could potentiate or hinder translational repression and degradation of the target mRNA, depending on which RNA-BP is affected.
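The decoy scenario can be made semi-quantitative with the standard equilibrium expression for two ligands competing for a single binding site:

θ_target = ([T]/K_T) / (1 + [T]/K_T + [D]/K_D)

Here [T] and [D] are the target mRNA and decoy miRNA concentrations, and K_T and K_D are their dissociation constants. The sketch assumes 1:1 binding and free ligand concentrations approximately equal to total concentrations. All numerical values are hypothetical, chosen only to show the trend.

```python
def target_occupancy(target_nM: float, kd_target_nM: float,
                     decoy_nM: float, kd_decoy_nM: float) -> float:
    """Fraction of RRM sites bound by the target mRNA under competition.

    Assumes 1:1 binding and that free concentrations equal total
    concentrations (ligands in excess over the RNA-binding protein).
    """
    t = target_nM / kd_target_nM
    d = decoy_nM / kd_decoy_nM
    return t / (1.0 + t + d)


# Hypothetical concentrations and affinities (nM).
for decoy in (0, 10, 100):
    print(decoy, round(target_occupancy(10, 10, decoy, 10), 3))
```

As the decoy concentration rises, the occupancy of the target mRNA falls. The RNA-BP's effect on that transcript would weaken in proportion, whether that effect is repression or stabilization.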

Conclusions and perspectives
Examples given in this chapter suggest that mRNA regulation is important in multiple aspects of mammalian biology. However, it is largely unknown how combinatorial regulation is achieved across the biological complexity of whole organisms. Transcriptome-wide mapping of cis-elements and trans-binding sites reveals substantial regulatory potential in the non-coding parts of mRNA. The more we learn about the cross-talk, molecular assembly, and compartmentalization of RNA-protein complexes, the more unifying principles we may find. Understanding the factors and elements that regulate the expression of a particular gene in a single cell [190] is of paramount importance when designing molecular therapies or when attempting to modulate the expression of a target gene. Thus, scientists and geneticists have exciting opportunities ahead in the field of therapeutic genome editing.