Open access peer-reviewed chapter

Mammalian Cis-Acting RNA Sequence Elements

By Irina Vlasova-St. Louis and Calandra Sagarsky

Submitted: July 23rd 2017. Reviewed: October 31st 2017. Published: February 21st 2018.

DOI: 10.5772/intechopen.72124


Abstract

Cis-acting regulatory sequence elements are sequences contained in the 3′ and 5′ untranslated regions, introns, or coding regions of precursor RNAs and mature mRNAs that are selectively recognized by a complementary set of one or more trans-acting factors to regulate posttranscriptional gene expression. This chapter focuses on mammalian cis-acting regulatory elements that have been recently discovered in different regions of preprocessed and mature mRNAs. The chapter begins with an overview of two large networks of mRNAs that contain conserved AU-rich elements (AREs) or GU-rich elements (GREs), and their role in mammalian cell physiology. Other, less conserved, cis-acting elements and their functional roles in different steps of RNA maturation and metabolism are then discussed. The molecular characteristics of pathological cis-acting sequences that arise from gene mutations or transcriptional aberrations are briefly outlined, together with proposed approaches to restore normal gene expression. Concise models of the function of posttranscriptional regulatory networks within different cellular compartments conclude this chapter.

Keywords

  • cis-elements
  • posttranscriptional gene regulation
  • mRNA splicing
  • translation
  • mRNA stability
  • decay
  • AU-rich elements (AREs) or GU-rich elements (GREs)

1. Introduction

The control of gene expression is fundamental to mammalian cell life. Although much of this control occurs at the level of transcription, posttranscriptional control is both prevalent and momentous [1]. Work over the past quarter century has resulted in the identification of unifying concepts in posttranscriptional regulation. One unifying concept states that posttranscriptional regulation is mediated by two major molecular components: cis-acting regulatory sequence elements and trans-acting factors. Cis-acting regulatory sequence elements are subsequences contained in the 5′ untranslated region (UTR), 3′ UTR, introns, and coding regions of precursor RNA and mature mRNA that are selectively recognized by a complementary set of one or more trans-acting factors to regulate posttranscriptional gene expression. The lists of conserved cis-elements have been expanding over the past decade, but the mechanisms of the precise assembly of RNA-binding complexes in an orchestrated temporal and spatial manner have not been comprehensively described. Conserved sequences within pre-mRNAs play a major role in determining the mRNA’s configuration, stability, and ultimately the posttranslational fate of protein products. Mammalian pre-mRNAs contain almost as much conserved sequence as that ascribed to transcriptional regulatory elements, and many of these cis-elements can be attributed to known molecular functions, as described in the following paragraphs.

Trans-acting factors include RNA-binding proteins (RNA-BPs) and microRNAs (miRNAs), which are able to influence the fate of mRNA by controlling processes such as translation and mRNA degradation (reviewed in Refs. [2, 3, 4, 5]). The combinatorial interplay between RNA-BPs, various miRNAs, and a given mRNA allows for the transcript-specific regulation critical to many cellular decisions during cell division, cell quiescence, or cell senescence [6]. RNA-BP classification is growing and becoming more defined as more structural data become available. Significant progress has been made in defining RNA-binding domains, such as an RNA recognition motif (RRM), zinc fingers, double-stranded RNA-binding domains, K homology domains, pumilio homology domains, and others, that were recently reviewed in [7, 8].

In the pre-genomic era, very few cis-acting RNA sequences had been discovered, for example, AU-rich elements (AREs) in the 3′ UTR of cytokine mRNAs [9]. Advances in genomic methodologies escalated the discoveries and functional identifications of cis-acting sequences. Microarray-based studies that evaluated mRNA stability and translation on a genome-wide basis have provided valuable information about the role of posttranscriptional regulation of a wide variety of transcripts that have an important physiological function [10, 11, 12]. Genome-wide measurements of mRNA decay and bioinformatic sequence motif discovery methods were used to identify the GU-rich element (GRE) as a highly conserved sequence that was enriched in the 3′ UTR and other regions of mRNA transcripts [13]. Various experimental approaches have been developed to understand the functional importance of cis-acting sequence interactions and the network of transcripts that they regulate. One of the most widely used techniques involves immunopurification of specific RNA-binding proteins from cellular extracts followed by a high-throughput analysis of the co-purified RNA species [14]. The coupling of this technique to powerful bioinformatic analysis has led researchers to understand the binding specificity of cis-acting elements [15]. The advent of new technology such as next generation sequencing (NGS) and chemical cross-linking procedures has allowed for fine-scale mapping of cis-binding motifs as well as for the refinement of RNA-binding protein-binding sites. 
A variety of methods have been developed to identify the in vivo target RNAs of a given RNA-BP, including microarray (Chip) or high-throughput sequencing (Seq) of RNA isolated by RNA-BP immunoprecipitation (RIP-Chip, RIP-Seq, and RIPiT-Seq), photoactivatable ribonucleoside-enhanced cross-linking and immunoprecipitation (PAR-CLIP), individual-nucleotide resolution cross-linking and immunoprecipitation (iCLIP), or high-throughput sequencing of RNA isolated by UV cross-linking and immunoprecipitation (HiTS-CLIP) [16, 17, 18, 19, 20]. These methodologies involve RNA immunoprecipitation of an RNA-BP, followed by NGS analysis of the associated mRNA or microRNA transcripts and genome-wide identification of cis-elements within the RNA target transcripts. More novel techniques such as sequence-specificity landscapes (SEQRS), HiTS-Kin/HiTS-EQ, and digestion optimized (DO)RIP-Seq focus on the identification of multiple trans-acting factors [7, 21, 22]. These techniques allow for the evaluation of the specificity of cellular RNA-BP/RNA-binding patterns from cell lysates under different conditions and might aid in the interpretation of multiprotein complex formation and RNA-BP competition for RNA substrate. Identified RNA-binding complexes can then be isolated and interrogated in vitro using structural and cell-based reporter assays.

This chapter focuses on mammalian cis-acting regulatory elements that have been recently discovered in different regions of mRNA, both preprocessed and mature. First, we summarize recent observations of two large networks of mRNAs that contain conserved AREs or GREs in their pre-mRNA splicing sites, polyadenylation sites, and 3′/5′ UTRs. We outline the known roles of AREs and GREs in the regulation of mRNA stability or translation and their role in mammalian cell physiology, with a particular emphasis on the dynamic response to environmental and developmental signals. Second, we describe advances in the identification of other conserved cis-acting elements and their functional roles in different steps of RNA maturation and metabolism. We briefly outline the molecular characteristics of pathological cis-acting sequences arising from gene mutation or transcriptional aberration and review novel approaches to restore normal gene expression. We conclude with an overview of a concise predictive model of the function of posttranscriptional regulatory networks within different cellular compartments.

2. AU-rich element (ARE)

It was noted over a quarter of a century ago that mRNAs exhibit substantial variations in turnover rate upon exposure to different cell stimuli [23, 24, 25]. Of the prominent discoveries in the mammalian cis-acting elements field, the AU-rich element was the most notable, as it was the most robust determinant of mRNA instability in cytokines and early response genes [26]. Insight into the biological significance and physiological function of the ARE as a coordinate regulator of posttranscriptional networks was revealed through the experimental identification of the ELAVL1 (HuR) and ZFP36 (TTP) proteins [27, 28, 29]. The structure of AREs is defined as a repeating pentamer (AUUUA) with 1 or 2 A to U substitutions [9]. Bioinformatic searches throughout the human transcriptome have provided computational estimates of the sequence characteristics and nucleotide lengths of ARE sequences required for an mRNA to be unstable [30, 31]. The number of pentamers has an additive effect on mRNA decay and deadenylation processes. AREs are classified into five clusters depending on their sequence content and the position of A or U. Cluster I AREs contain up to five copies of the AUUUA motif with a nearby U-rich region and cause synchronous RNA deadenylation [32]. Cluster II AREs are composed of at least two overlapping copies of the AUUUA motif with an adjacent (U/A) nonamer region and cause asynchronous deadenylation. Cluster III through V AREs were identified to contain more U-rich regions and were rather ‘poorly structured’ (Table 1), with an inconsistent deadenylation pattern. This classification system has proved to be helpful in understanding the observed behavior and function of ARE-containing transcripts [25].
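Because the number of AUUUA pentamers is described above as having an additive effect on decay, a first-pass screen of a 3′ UTR often starts by counting them. The following is a minimal sketch, not a published tool: the function name and the example sequence are hypothetical, and overlapping pentamers (which share a terminal A) are counted separately by assumption.

```python
import re

def count_pentamers(rna, pentamer="AUUUA"):
    """Count occurrences of a pentamer, including overlapping ones."""
    # Accept DNA-style input by converting T to U.
    rna = rna.upper().replace("T", "U")
    # A zero-width lookahead lets matches that share a nucleotide be counted.
    return len(re.findall(f"(?={re.escape(pentamer)})", rna))

# Hypothetical 3' UTR fragment with three overlapping AUUUA pentamers.
print(count_pentamers("CCAUUUAUUUAUUUAGG"))  # 3
```

The same function can be pointed at GUUUG by passing `pentamer="GUUUG"`.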

Cluster | ARE sequences | GRE sequences
I | AUUUAUUUAUUUAUUUAUUUA | GUUUGUUUGUUUGUUUGUUUG
II | AUUUAUUUAUUUAUUUA | GUUUGUUUGUUUGUUUG
III | WAUUUAUUUAUUUAW | GUKUGUUUGUKUG
IV | WWAUUUAUUUAWW | KKGUUUGUUUGKK
V | WWWWAUUUAWWWW | KKKU/GUKUG/UKKK

ARE trans-acting factors: ELAVL1, ELAVL2, ZFP36, KSRP, TIA1, TIAL1, HNRNPC1, HNRNPD, GAPDH
ARE functional categories: cytokines, chemokines, growth factors; cell signaling; apoptosis
GRE functional categories: transcription factors; cell cycle; cell metabolism; cell–cell communication regulators
GRE trans-acting factors: CELF1, CELF2, ELAVL4, RBM38, TARDBP, FUS

Table 1.

Structural and functional comparison of AU-rich and GU-rich elements.

ARE and GRE mRNAs were clustered (with allowance for one mismatch) into five subclasses based on the number of pentameric repeats (AUUUA or GUUUG) and surrounding sequences. W indicates A or U. K indicates G or U. This table was made based on previous publications [33, 47, 74, 75, 77]. ARE- or GRE-containing transcripts in clusters I and II contain four or more overlapping AUUUA or GUUUG pentamers and are each represented by only a few hundred transcripts. Most of the transcripts in these clusters are cytokines, transcription factors, and early response genes. Clusters III through V contain shorter sequences with less sequence repetition and contain up to several thousand members.

Trans-acting factors that bind to ARE (far left column) or GRE (far right column) are: ELAVL1,2,4 (embryonic lethal, abnormal vision)-like 1,2,4; ZFP36, zinc finger protein 36; TIA1, T-cell intracellular antigen 1; TIAL1, TIA1-cytotoxic granule associated RNA-binding protein like 1; KSRP, KH-type splicing regulatory protein; HNRNP C1,D, heterogeneous nuclear ribonucleoprotein C1, D; GAPDH, Glyceraldehyde 3-phosphate dehydrogenase; CELF1,2, CUGBP-ELAV-like family member 1,2; RBM38, RNA-binding protein 38; TARDBP, Tat RNA regulatory element (TAR) DNA-binding protein; FUS, fused in sarcoma.
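The ARE column of Table 1 can be read as a set of sequence patterns in which W stands for A or U. As a hedged illustration, the sketch below assigns a sequence to the most repetitive ARE cluster whose pattern it contains. It is a simplification under stated assumptions: exact matching only (the one-mismatch allowance mentioned in the legend is not implemented), and the function and variable names are hypothetical.

```python
import re

# ARE patterns transcribed from Table 1, ordered from most to least repetitive.
ARE_CLUSTERS = [
    ("I", "AUUUAUUUAUUUAUUUAUUUA"),
    ("II", "AUUUAUUUAUUUAUUUA"),
    ("III", "WAUUUAUUUAUUUAW"),
    ("IV", "WWAUUUAUUUAWW"),
    ("V", "WWWWAUUUAWWWW"),
]

def are_cluster(rna):
    """Return the first (most repetitive) ARE cluster matched, or None."""
    rna = rna.upper().replace("T", "U")
    for name, motif in ARE_CLUSTERS:
        # W = A or U, as defined in the table legend.
        pattern = motif.replace("W", "[AU]")
        if re.search(pattern, rna):
            return name
    return None

print(are_cluster("CCUUAUUUAUUUAUUCC"))  # IV
```

Checking the clusters in order matters, because every cluster I sequence also contains the shorter patterns of the later clusters.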

Genome-wide analyses of mRNA transcript half-lives showed that many labile transcripts contain conserved ARE sequence elements in their 3′ UTRs [21]. Overall, 3′ UTR-ARE-containing transcripts represent approximately 5% of the transcriptome [33]. Human mRNAs encoding cytokines and members of the NF-κB cascade are particularly enriched for AREs (Table 1). AREs play decisive roles in regulating the effects of cytokines on inflammatory responses, since mutation of the ARE in cytokines such as TNF-alpha, IFNG, or IRF5 [34] resulted in a profound autoimmune-like inflammatory syndrome [35, 36]. In general, transcripts containing functional AREs have short half-lives, although they can be rapidly stabilized in different cell types or stimulation conditions through complex posttranscriptional mechanisms involving trans-acting factors [10, 37]. Numerous trans-binding factors interact with AREs (e.g., ELAVLs, ZFP36, KSRP, TIA1, TIAL1, HNRNPC1, and others, which are described in other chapters of this book) and determine the outcomes for ARE-harboring transcripts. The majority of these proteins shuttle between the cytoplasm and the nucleus, where they can affect RNA splicing and 3′-end processing, in addition to altering the rate of decay in the cytoplasm [38]. In this respect, it is interesting to note that AREs are also found in intronic regions of pre-mRNAs [39, 40, 41, 42]. This observation leads to the speculation that trans-acting factors could bind AREs in the nucleus and fulfill a function that is different from their cytoplasmic one. Furthermore, the considerable overlap between the binding sites for ARE-BPs and other cis-elements, such as GU-rich and poly-U sequences, warrants further investigation, since the formation of secondary RNA structure might involve all of the above and subsequently govern the coordinate behavior of RNA-BPs in different cellular compartments or under different cellular stimuli [43, 44, 45].

3. GU-rich element (GRE)

GU-rich elements (GREs) are recognized as essential regulators of mRNA splicing, stability, and translation in mammalian cells [11, 46]. GRE-containing RNAs represent approximately 8% of the transcripts of the human transcriptome [47]. Genome-wide analyses of mRNA decay rates allowed for the discovery of non-ARE-containing cohorts of mRNAs that exhibited rapid turnover. A computational de novo motif search identified conserved sequence elements in their 3′ UTRs in the form of a consensus U(GUUUG)n sequence [13] or GU repeats [48]. These elements were first tested in vivo in reporter systems and conferred instability onto reporter mRNAs. A well-utilized rabbit beta-globin reporter system identified GREs as sequences that regulate the decay of exogenously expressed GRE-containing reporter transcripts within cells [13]. Further verification of GRE-mediated mRNA decay came from the observation that siRNA-mediated knockdown of the CELF1 protein led to the stabilization of GRE-containing beta-globin reporter transcripts as well as endogenous GRE-containing transcripts [49, 50, 51]. These studies also showed that both GU-rich sequences and GU repeats are enriched in unstable mRNAs, though the number of GUUUG pentamers in the GRE does not seem to correlate with the mRNA decay rate. GREs were subsequently tested for RNA-binding specificities to CELF1 and CELF2 proteins using systematic evolution of ligands by exponential enrichment (SELEX), yeast three-hybrid system selection methods, and surface plasmon resonance quantitative binding assays, revealing that the CELF family preferentially binds to 15–22 nucleotide GU-rich RNA sequences [52, 53, 54]. Several studies reported that other proteins (e.g., TARDBP, FUS) bind to very short UG repeats with higher affinity, but that this affinity dropped once the repeats became longer than 15 nucleotides [55, 56].
Binding to dispersed GRE pentanucleotides (mostly by RRM-containing proteins) has also been reported, although its unified functional consequences are just beginning to emerge (refer to a comprehensive review in Ref. [57]).

Using whole genome microarrays and high-throughput NGS methodologies, GRE targets have been identified in a number of mammalian cells, for example, resting and activated human T cells, mouse brain cells, and myoblasts or human malignant cell lines [48, 58, 59, 60, 61]. The majority of studies characterized GREs as binding sites that are located predominantly in 3′ UTRs and that cause mRNA decay (or stabilization) depending upon the cellular and environmental context [62]. These UG-rich sequences serve as binding sites for the CELF and ELAVL protein families. Interestingly, these two families of RNA-binding proteins share over 80% sequence conservation within their RNA recognition motifs but cause opposite outcomes: binding of the CELF family to GREs leads to mRNA degradation, whereas the ELAVL family functions as mRNA stabilizers [63]. In addition, several studies reported that UGU repeat sequences were enriched in introns, with the same frequency as AREs [64, 65]. The authors found significant enrichment of short UG-rich motifs in intronic regions flanking exons, supporting a role for GREs in alternative splicing [66, 67]: these motifs activate or repress the splicing of pre-mRNA targets through competitive binding by MBNL and CELF proteins. This is not surprising, as an estimated 90% of human genes produce alternatively spliced mRNA transcripts [68, 69]. Alignment of the genomic regions adjacent to canonical and alternative polyadenylation sites identified UUCUG and UGUU as conserved cis-elements, which are essential for mRNA maturation and polyadenylation site utilization [70, 71, 72, 73].

Thus, AREs and GREs can regulate pre-mRNA splicing, translation, and/or mRNA deadenylation or decay depending on the repertoire of proteins they interact with in different intracellular settings. The classification of AREs and GREs has been described in multiple manuscripts [74, 75, 76, 77], and an overview is shown in Table 1. Single nucleotide polymorphism studies in humans demonstrated that SNPs in ARE and GRE sites are associated with a higher risk of human diseases that involve the adaptive immune response; mutations in these conserved cis-acting elements resulted in changes in RNA stability and in binding preferences for RNA-BPs (reviewed in Refs. [44, 63, 78, 79]). The opposing effects of RNA-BPs on mRNA turnover may have important implications for the role of posttranscriptional regulation in proliferative diseases such as cancer. Most existing data suggest that the unbalanced expression and function of ARE-BPs appear to drive neoplastic growth and proliferation and contribute to cancer pathogenesis [44, 80]. A definitive causal connection that is clinically relevant to human pathology has not yet been demonstrated.

4. Poly(A) tail and polyadenylation sequences

The addition and removal of the poly(A) tail are the rate-limiting steps of the maturation and degradation processes that the majority of mammalian mRNAs undergo [81, 82, 83]. Two tightly coupled reactions, cleavage and polyadenylation, involve a large number of protein components. Alternative polyadenylation of RNA is a posttranscriptional modification that plays an important role in gene expression, as it produces mRNAs that share the same coding region but differ in their 3′ UTRs. This process is highly tissue specific and results in the generation of alternative mRNA isoforms with different stability, translational efficiency, and even subcellular localization [84, 85, 86]. In mammals, the poly(A) cleavage/polyadenylation site is composed of three sets of consensus cis-elements: the highly conserved AAUAAA hexamer and the less conserved U/GU-rich and UGUA elements. A bioinformatics analysis showed that an overwhelming majority of mammalian mRNAs harbor a conserved AAUAAA sequence or its close variant, AUUAAA [87, 88]. Flanking sequences are very important for the poly(A) site to function [89]. For example, two downstream U/GU-rich regions are both necessary for binding of the specific cleavage polyadenylation complex [90, 91]. A number of trans-binding factors regulate poly(A) site utilization and the efficiency of pre-mRNA processing in the nucleus, including five large families of CPSF, HNRNP, CF, MBNL, and CSTF proteins as well as snoRNAs [92, 93, 94, 95]. These families have opposing effects on polyadenylation site utilization in nascent RNAs, determining the final pool of mature mRNA isoforms and the subsequent choreography and activity of trans-binding factors in the cytoplasm (reviewed in [96, 97]). Immediately after cleavage, poly(A) polymerases (PAPs) promote lengthening of the poly(A) tail, completing the mRNA maturation process [98, 99].
Genome-wide polyadenylation site (PAS) analysis in mammalian cells identified a great diversity of PAS utilization in different tissues and organs [73, 100]. Mutations can cause the loss of the canonical polyadenylation signal and a subsequent switch to alternative PAS utilization [101].
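To make the hexamer signals named above concrete, the following minimal sketch lists every AAUAAA or AUUAAA occurrence in a sequence. It is an illustration only: the function name and example sequence are hypothetical, and real PAS prediction also weighs the flanking U/GU-rich and UGUA elements, which this sketch ignores.

```python
import re

# Canonical hexamer and its close variant, as named in the text.
# The lookahead allows overlapping occurrences to be reported.
PAS_HEXAMERS = re.compile(r"(?=(AAUAAA|AUUAAA))")

def find_pas_hexamers(rna):
    """Return (0-based position, hexamer) for each signal hexamer found."""
    rna = rna.upper().replace("T", "U")
    return [(m.start(), m.group(1)) for m in PAS_HEXAMERS.finditer(rna)]

# Hypothetical 3' UTR fragment carrying both hexamer variants.
print(find_pas_hexamers("GGAAUAAACCCAUUAAAGG"))
# [(2, 'AAUAAA'), (11, 'AUUAAA')]
```

A transcript with more than one hit is a candidate for alternative polyadenylation, although hexamer presence alone does not show that a site is used.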

Another conserved regulatory cis-element is the cytoplasmic polyadenylation element (CPE). Many mammalian RNAs contain a CPE, a UUUUA/U sequence, located in the 3′ UTR. The CPE serves as a binding site for cytoplasmic polyadenylation element-binding proteins (CPEBs) 1–4 [102]. The most pronounced differences in CPE usage have been described under conditions of stress [103].

The nuclear poly(A)-binding proteins (PABPs) act as poly(A) keepers during mRNA processing by first binding to the newly added (A)12 nucleotides and then allowing the poly(A) tail to grow up to 250 nucleotides before the mRNA is exported into the cytoplasm [104, 105]. In the cytoplasm, the poly(A) tail acts as a cis-regulatory element and mediates mRNA translation. Recently developed methodologies make it affordable to count differentially polyadenylated mRNAs and assess the length of the poly(A) tail [106, 107, 108]. In somatic cells, mRNA deadenylation can lead to the degradation or stabilization of translationally silent transcripts; however, the importance of poly(A) tail length in these processes is currently under scrutiny, as there is evidence that translation is regulated independently of poly(A) tail length in the somatic cell cycle [109]. In embryonic developmental processes, translationally repressed mRNAs can be reactivated by cytoplasmic poly(A) tail elongation at the precise time when their encoded proteins need to be translated [108, 110].

5. Other intermediate cis-elements

A number of ARE-like transcripts have been identified in several mammalian systems to regulate important posttranscriptional networks of gene expression.

Poly(U) sequences are the third most conserved cis-elements after AREs and GREs and have recently been found in the sequence composition of cross-linked nucleotide sites identified with the CLIP assay [111]. Poly(U) tracts are most highly enriched for UUUUU pentanucleotides. HNRNPC and HNRNPD (AUF1) can recognize and bind to U sequences in pre-mRNAs, mature mRNAs, and non-coding RNAs and influence target transcript diversity in the nucleus through pre-mRNA splicing and transcript stability in the cytoplasm [41]. It is interesting to note that cluster V of the ARE and GRE elements (see Table 1) includes hundreds of mRNAs harboring U-pentanucleotides in the 3′ UTR, suggesting that the CELF and ELAVL families can also bind to poly(U) tracts under certain conditions, perhaps with lower affinity [112].

Uridylation is an independent biochemical process that is facilitated by uridylation enzymes such as ZCCHC11 and ZCCHC6. In mammalian cells, uridylation readily occurs on deadenylated mRNAs through the recognition of short poly(A) tails (<25 nt). Protein PABPC1 antagonizes uridylation of polyadenylated mRNAs, contributing to changes in mRNA half-lives [113]. MicroRNA can also induce uridylation of its targets; however, selectivity of mRNA uridylation has not been decisively demonstrated. The development of novel methods, such as TAIL-Seq, allows for genome-wide discovery of alternative mRNA tailing processes such as uridylation and guanylation at downstream sites of shortened poly(A) tails [114]. Dynamic control of mRNA tailing is implicated in turnover and translational control and is fundamental for early embryonic development [115].
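The relationship described above, terminal uridines added downstream of a shortened poly(A) tail, can be illustrated on the 3′ end of a single read. This is a toy sketch under strong assumptions: the read is error-free, the tail is a clean run of A followed by a clean run of U, and the function names are hypothetical. It is not the TAIL-Seq analysis pipeline. The 25-nt cutoff is the short-tail threshold quoted in the text.

```python
import re

# A run of A residues followed by a run of U residues at the very 3' end.
TAIL = re.compile(r"(A*)(U*)$")

def parse_tail(read_3p_end):
    """Return (poly(A) length, number of terminal U residues)."""
    seq = read_3p_end.upper().replace("T", "U")
    match = TAIL.search(seq)  # always matches, possibly with empty groups
    return len(match.group(1)), len(match.group(2))

def has_uridylated_short_tail(read_3p_end, max_a=25):
    """True if terminal U residues follow a poly(A) tail shorter than max_a."""
    a_len, u_len = parse_tail(read_3p_end)
    return u_len > 0 and a_len < max_a

print(parse_tail("CCGAAAAAAUU"))  # (6, 2)
```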

GC-rich sequences were also found to be conserved in coding and non-coding regions of mammalian mRNAs. Classified as GC-rich elements (GCREs), these were identified in NCL (nucleolin), PCBP1 and UPF protein-binding complexes [116]. GCREs regulate mRNA stability, decay, and translational efficiency [117]. Several lines of evidence establish primary function for GCRE as regulators of mRNA transcription [118].

The CU-rich element (CURE) is a target for several RNA- or DNA-binding proteins, for example, PCBP1 [119] and PTBP1 [120, 121], and regulates gene expression via a broad but poorly defined spectrum of posttranslational mechanisms.

Oligonucleotides (T/C)nGGG/G from four separate strands can be folded into stacked tertiary structures known as G-quadruplexes, forming polymorphic loops of three G-quartet layers with four G-tracts [122, 123, 124]. Folded G-structures (Gs)2–7 are found in 3′ and 5′ UTRs, but are very rare in coding and intergenic regions, and could influence all aspects of RNA metabolism [125, 126]. Studies have shown that 3′ UTR G-quadruplexes can bind more than two dozen proteins that interact with the Gs structure and serve as regulators of transcription, splicing, processing, localization, and stability and have been recently discussed in excellent reviews [127, 128]. Moreover, bioinformatics and computational scans have shown the prevalence of intermolecular DNA–RNA G-quadruplexes and (Gs)4 pairing with miRNA in mammalian cells [129, 130]. These observations imply almost endless possibilities of intermolecular interactions, which undoubtedly would have significant impact on our understanding of transcriptional and posttranscriptional gene expression and regulation in mammalian cells.
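Computational scans for putative G-quadruplexes are commonly run with a simple pattern of four G-tracts separated by short loops. The sketch below uses one widely quoted heuristic: tracts of at least three G residues, which corresponds to the three G-quartet layers mentioned above, with loops of 1–7 nucleotides. The loop bounds and the function name are assumptions brought in for illustration and do not come from this chapter. A match only flags a candidate and does not show that the structure folds.

```python
import re

# Four G-tracts (>= 3 G each) separated by three loops of 1-7 nucleotides.
# The loop length range is an assumed, commonly used heuristic.
G4_PATTERN = re.compile(
    r"G{3,}[ACGU]{1,7}G{3,}[ACGU]{1,7}G{3,}[ACGU]{1,7}G{3,}"
)

def find_putative_g4(rna):
    """Return (start, end) spans of non-overlapping candidate G4 motifs."""
    rna = rna.upper().replace("T", "U")
    return [m.span() for m in G4_PATTERN.finditer(rna)]

# Hypothetical sequence with four GGG tracts separated by UUA loops.
print(find_putative_g4("AAGGGUUAGGGUUAGGGUUAGGGAA"))  # [(2, 23)]
```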

Internal ribosome entry sites (IRESs) are heterogeneous cis-acting regulatory elements located primarily in the 5′ untranslated regions of mammalian mRNAs. IRESs facilitate alternative mRNA translation, bypassing the need for the m7GpppN cap structure and for many translation initiation trans-acting factors in the recognition of the translation initiation codon (e.g., AUG) by ribosomal subunits [131]. Since an IRES can be several hundred nucleotides long, it has been difficult to identify the structural elements of IRESs that are important for common secondary structures or functions [132, 133]. In-depth sequence scans through the human transcriptome identified a variety of poly-U, poly-A, and CU-rich k-mers that seem to be important determinants of IRES activity [134]. These k-mers represent binding sites for IRES trans-acting factors and are located less than 150 nt upstream of the AUG start codon [135]. Translation initiation mediated by IRESs is commonly presented as a cell survival mechanism in response to stress; however, the significance of this process and its implications for human diseases are unknown due to the lack of solid in vitro experimental results that would unambiguously demonstrate the effect in vivo [136].

The Pumilio response element (PRE) is another cis-element that is well defined in nonmammalian systems. A consensus sequence, 5′-UGUANAUA, was derived from gel shift, RIP, PAR-CLIP, and crystal structure approaches [137]. It is present in almost 3000 mammalian mRNAs and serves as a cis-element for the PUM family of proteins [138, 139]. PUMs exert two modes of mRNA translational repression: deadenylation-mediated repression and a deadenylation-independent mechanism [140].
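A consensus such as UGUANAUA contains a degenerate position (N, any nucleotide), so searching for it requires translating the consensus into a pattern. The sketch below does this for the small subset of IUPAC codes used in this chapter. The helper names and the example sequence are hypothetical, and the scan reports sequence matches only, not verified PUM binding.

```python
import re

# Subset of IUPAC nucleotide codes used in this chapter (RNA alphabet).
IUPAC_RNA = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "N": "[ACGU]", "K": "[GU]", "W": "[AU]",
}

def motif_to_regex(motif):
    """Translate a degenerate RNA motif into a compiled lookahead regex."""
    body = "".join(IUPAC_RNA[base] for base in motif.upper())
    return re.compile(f"(?=({body}))")

def find_sites(rna, motif="UGUANAUA"):
    """Return 0-based start positions of every (overlapping) motif match."""
    rna = rna.upper().replace("T", "U")
    return [m.start() for m in motif_to_regex(motif).finditer(rna)]

print(find_sites("CCUGUAAAUAGG"))  # [2]
```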

Another novel 3′ UTR motif (UAAC/GUUAU) is also prevalent (7% of mammalian 3′ UTRs contain one or more copies) and has strong species conservation [141]. This motif is a binding target for HNRNP A2/B1 and A1 and is involved in mRNA deadenylation. A fundamental role of UAAC/GUUAU and similar elements as regulators of the mammalian mRNA translational activation or repression is yet to be demonstrated [142].

6. Short multivalent regulatory motifs

Mapping mammalian pre-mRNA positional enrichment of short intronic splicing regulatory elements (ISREs) is another example of the identification of cis-acting elements that are most important for pre-mRNA splicing. De novo searches for multivalent RNA motifs identified a number of conserved tetra- to hexamers that mediate the position-specific combinatorial binding by RNA-binding proteins [143, 144]. The position of short motifs can predict the tissue-specific RNA isoform abundance and can serve as an intronic splicing enhancer or silencer during embryonic development and in adult organisms [145]. Since the consensus sequence elements of splice sites are very short (e.g., 5′-UUAGGU, AAGGAC, AAGAAC, CCUCUG, GCUGCG, CUGCUG-3′), the mechanism by which the spliceosome distinguishes them as authentic splice sites remains a long-standing question. One of the explanations provided in [146, 147] suggests that these sequences form specific secondary structures that increase binding affinities to RNA-binding motifs across many RNA-BPs. The strong association of ISREs with differences in splicing patterns, but poor evolutionary conservation, suggests the role for these motifs to act as cis-acting splice codes that allow for the progressive divergence of alternative splicing in vertebrates [148].

7. MicroRNAs (miRNAs)

MicroRNAs are conserved regulatory sequences that pervasively act, in trans, on mRNA. miRNA-binding sites are important regulators of mRNA half-life and activity. The majority of miRNAs influence mRNA life span through biochemical interactions with mRNA and/or RNA-BPs [149]. This could be achieved through direct competition for a shared binding site or through remodeling of the mRNA structure to favor (or impede) miRNA association nearby [150]. In support of this, a recent bioinformatics analysis determined that UUUGUUU motifs, which bear an uncanny resemblance to GRE-binding sites, are enriched adjacent to many miRNA-binding sites, and their presence tends to augment miRNA activity [151]. On the other hand, any miRNA that contains a UGUKUGU or UAUKUAU seed sequence (K represents G or U) could in theory bind and occlude GRE-BP- or ARE-BP-binding motifs, preventing any interaction with cis-elements within the mRNA. For example, an interaction between miR-122 and CELF1 has been demonstrated, suggesting that CELF1 can play a role in the degradation of GRE-containing miRNAs [152]. It has been computed that the proximity of RNA-BP-binding sites to residues that pair with miRNA can quantitatively predict mRNA cis-element performance for several intensely studied RNA-BPs and miRNAs [153, 154, 155]. Although the mechanistic details of the interplay between cis-acting elements, RNA-BPs, and miRNAs are understudied, they should perhaps be a high priority, given recent observations that miRNA expression and/or processing are affected in many human diseases and disorders [156, 157, 158]. Significant progress has been made by bioinformaticians and biologists to better understand the systems biology of the RNA life cycle; several useful metadata hubs were created, which incorporate existing experimental data and computational approaches [159, 160]. The comprehensive list of available software and websites has been recently reviewed in Ref. [161].
However, we are still far from having a comprehensive understanding of mechanisms of RNA biogenesis and its relevance in physiological and pathological conditions.
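The degenerate seed sequences mentioned above (UGUKUGU and UAUKUAU, with K = G or U) each stand for a small set of concrete sequences. As a minimal sketch, the function below enumerates them so that they can be compared against annotated miRNA seeds. The function name is hypothetical, and only the codes used in this chapter are included.

```python
from itertools import product

# Nucleotides allowed at each code (K = G or U, W = A or U).
IUPAC_CHOICES = {"A": "A", "C": "C", "G": "G", "U": "U", "K": "GU", "W": "AU"}

def expand_degenerate(motif):
    """List every concrete sequence encoded by a degenerate motif."""
    choices = [IUPAC_CHOICES[base] for base in motif.upper()]
    return ["".join(combo) for combo in product(*choices)]

print(expand_degenerate("UGUKUGU"))  # ['UGUGUGU', 'UGUUUGU']
print(expand_degenerate("UAUKUAU"))  # ['UAUGUAU', 'UAUUUAU']
```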

8. Pathological cis-elements

The human genome contains a large number of short repetitive sequences that are prone to higher than average mutation rates and transcriptional errors [162]. These errors can engender tandem repeat expansions in cis-acting elements of the 3′ or 5′ UTRs, introns, or coding regions and cause a large variety of inherited human diseases. For example, endogenous nucleotide repeat expansions are implicated in many human autosomal dominant diseases, which have emerged as a new group of disorders driven by tri- or pentanucleotide repeat expansion. Pathological repeats can elicit toxicity that is triggered by toxic RNA or by abnormally translated dipeptide or homopolymeric proteins [163]. Such disorders include, but are not limited to, the following conditions:

  • Spinocerebellar ataxias (SCA types 1–37) are the largest and most diverse group of inherited neurological diseases in which neurological dysfunction is driven by defects known as ataxias. Several mutations in tandem repeat expansions were discovered, including coding (CAG)n mutations in SCA1, 2, 3, 6, 7, and 17 genes; non-coding (CTG)n in SCA8 [164]; non-coding (CAG)n in SCA12; (ATTCT)n in SCA10; (TGGAA)n in SCA31; and (GGCCTG)n in SCA36 (please see OMIM.org for details).

  • Myotonic dystrophies (DM), where DM1 is associated with >300 CUG repeats in the DMPK mRNA, and DM2 with expanded CCUG repeats in ZNF9 mRNA [165].

  • Amyotrophic lateral sclerosis (ALS) and frontotemporal dementia are associated with GGGGCC/CCCCGG repeat expansion in the non-coding region of the C9orf72 (C9ALS/FTD) gene [166].

  • Huntington’s disease is caused by CAG repeat expansion in the HTT gene [167].

  • Fragile X syndrome (FXS) arises when the FMR1 gene reaches >230 CGG repeats.

  • Fragile X-associated tremor/ataxia syndrome (FXTAS) is associated with CGG/CCG repeat expansion in the fragile X gene, FMR1 [168].
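The disorders listed above are all defined by the number of uninterrupted tandem copies of a short repeat unit. As a minimal Python sketch of how such a count could be obtained from a sequence (the function name `longest_repeat_run` and the example fragment are hypothetical, and no disease thresholds are encoded here):

```python
import re


def longest_repeat_run(seq, unit):
    """Largest number of uninterrupted tandem copies of `unit` found in `seq`."""
    # The lookahead tries every start position, so a run is not missed
    # when an earlier, shorter match overlaps its first copy.
    pattern = re.compile("(?=((?:" + re.escape(unit) + ")+))")
    return max(
        (len(m.group(1)) // len(unit) for m in pattern.finditer(seq)),
        default=0,
    )


# Hypothetical transcript fragment with one run of 12 and one run of 3 CUG units.
fragment = "GG" + "CUG" * 12 + "AC" + "CUG" * 3

print(longest_repeat_run(fragment, "CUG"))
```

In practice, repeat tracts of pathological length are often too long for short sequencing reads, so a count like this is only meaningful on a sequence that spans the whole tract.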

Molecular pathogenesis of endogenous nucleotide repeat expansion diseases is complicated and involves repeat-associated non-AUG (RAN) translation, in which translation of mutant polypeptides is initiated without an AUG initiation codon, or is driven by open reading frame shifts due to expanded three-base-pair repeats that arise through slipped mispairing in the course of DNA synthesis (reviewed in [169, 170]). Although the posttranscriptional modification state of these transcripts (e.g., mRNA capping and polyadenylation) is unknown, two translational pathways are described: (1) AUG-initiated translation produces multiple polypeptides if there are multiple ORFs within the transcript. (2) RAN translation of the expanded repeat can produce up to six distinct RAN polypeptides: poly-Gln, poly-Ala, and poly-Ser RAN proteins from the CAG-repeat transcript; and poly-Leu, poly-Ala, and poly-Cys polypeptides from the complementary CUG-repeat transcript. Repeats located in antisense transcripts of the genes listed above are also substrates for RAN translation, further expanding the number of pathological dipeptide or homopolymeric RAN proteins produced during disease pathogenesis.
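The origin of the six homopolymeric products can be verified by translating a pure repeat tract in each of its three reading frames. The following is a minimal Python sketch under stated assumptions: it uses the standard genetic code, ignores the requirement for an initiation codon (as RAN translation does), and the names `translate` and `ran_products` and the repeat count are illustrative only.

```python
from itertools import product

# Standard genetic code, codons ordered with bases U, C, A, G
# (third position varying fastest); '*' marks a stop codon.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(rna, frame):
    """Translate `rna` from offset `frame` (0, 1, or 2), one letter per codon."""
    codons = (rna[i:i + 3] for i in range(frame, len(rna) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


def ran_products(unit, n=10):
    """Peptides encoded by n tandem copies of `unit` in each reading frame."""
    rna = (unit * n).upper().replace("T", "U")
    return {frame: translate(rna, frame) for frame in range(3)}


for unit in ("CAG", "CUG", "GGGGCC"):
    print(unit, ran_products(unit, n=4))
```

With a trinucleotide unit each frame yields a homopolymer (Q, S, and A for CAG; L, C, and A for CUG), whereas a hexanucleotide unit such as GGGGCC yields dipeptide repeats, which is consistent with the dipeptide products mentioned in the text.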

An interesting common aspect of these pathologies is that they are caused by mutated cis-elements, and the toxic transcripts are often produced through bidirectional transcription. The resultant toxic RNA causes intracellular stress and sequestration of RNA-BPs by the expanded sequence repeats [171], which changes the biochemistry of posttranscriptional regulatory networks in affected tissues. The abovementioned diseases represent an incomplete list of a growing number of disorders that can potentially have similar therapeutic opportunities. The recently developed ‘base editor’ CRISPR-Cas9 methodology has demonstrated nucleotide-level editing precision, which could make this approach suitable for repeat excision as a genetic therapy for the conditions listed above [172]; it may also correct many other RNA pathologies, for example, those driven by nonsense-mediated mRNA decay [173].

9. Models for the effects of cis-acting elements

mRNA molecules move through different cellular compartments within messenger ribonucleoprotein (mRNP) complexes in dynamic association with RNA-binding proteins that bind to conserved cis-elements shared by subsets of transcripts [174]. The association of specific trans-binding factors with conserved regulatory cis-elements shared by subsets of mRNAs coordinates the fate of these bound transcripts through posttranscriptional processes such as splicing, intracellular localization, translation, storage, or mRNA decay [175, 176]. Not surprisingly, very few transcripts have only one type of regulatory element. Focusing on individual scenarios, we built a concise predictive model of higher-order complexes that can be formed simultaneously within different cellular compartments, starting from the nucleus and moving into the cytoplasm.

  1. Regulation of splicing by cis-elements ( Figure 1A ):

    The cis-elements within precursor RNA are recognized and processed by different components of the spliceosome during constitutive splicing events [177]. Binding of RNA-BPs to short intronic splicing regulatory elements (ISREs) regulates exon inclusion or exon skipping during stage-specific constitutive splicing transitions, in a position-dependent manner [67]. These processes are orchestrated by biochemical recognition and competitive binding by the family of U snRNPs that compose the spliceosome.

    RNA-BPs also bind to multivalent intronic sequences in precursor mRNA and regulate alternative splicing (e.g., exon skipping, alternative splice site selection, or intron retention). Alternatively spliced transcripts may contain different 3′ or 5′ UTRs, which can subject the mature transcripts to differential translational regulation. Important regulators of alternative splicing efficiency are the PTBP, SR, RBM, and HNRNP families of proteins and snRNAs. The use of alternative exons leads to the production of transcripts with different open reading frames (ORFs) and diversifies the repertoire of encoded proteins, giving rise to protein isoforms with alternative N- and C-termini.

  2. Regulation of adenylation by cis-acting elements ( Figure 1B ):

    Alternative polyadenylation (APA) occurs in tandem with splicing. Many splicing factors are also 3′-end processing factors within the mRNA 3′-end cleavage and polyadenylation (CPA) complexes. Cis-elements upstream of canonical or alternative PAS serve as docking sites for specific RNA-binding proteins (e.g., CPSF, CF, CSTFs, HNRNPs, MBNL, and CPEB), which in turn recruit canonical poly(A) polymerases (PAPOL). The CPA complex requires stabilization by a downstream GU/GC-rich sequence element (DSE) and its interaction with the CPSF-processing factors. The upstream sequence element (USE) is U-rich and serves an auxiliary role, binding to CF and PAPOL and stabilizing the cleavage complex.

    The cleavage and polyadenylation specific factor (CPSF) binds weaker noncanonical polyadenylation (AUUAAA) signals and cuts at the proximal polyadenylation site (PAS). The utilization of distal canonical PAS results in the processing of the full mature transcript. Cleavage at the proximal PAS leads to shortening of the 3′ untranslated region and loss of regulatory sequences within the 3′ UTR (e.g., ARE or GRE or miRNA-binding sites). MBNL can mask the region upstream of weak noncanonical PA signals, blocking the binding of cleavage factor I (CF).

    The CPEB1 protein binds the cytoplasmic polyadenylation element (CPE, consensus sequence 5′-UUUUUAU-3′) located upstream of non-canonical PA signals within the mRNA and shuttles it into the cytoplasm. The cytoplasmic CPEB1-CPE complex recruits poly(A) polymerase (PAP), which promotes the lengthening of the poly(A) tail and increases translation efficiency. The greater the distance between the CPE and the poly(A) tail of a transcript, the lower the rate of adenylation.

  3. Regulation of translation by cis-acting elements ( Figure 1C ):

    Most eukaryotic mRNAs are translated by the cap-dependent mechanism, which requires recognition of the cap structure (m7GpppN) at the 5′ end by eukaryotic initiation factor (eIF) complexes. eIFs recruit ribosomal subunits and initiator Met-tRNA and scan along the 5′ UTR of the mRNA to reach the start codon (an AUG triplet). During the scanning, the secondary RNA structure unwinds in an ATP-dependent manner. The 5′ UTR is GC-rich and is prone to folding into secondary structures, which may hinder ribosomal assembly [178]. Hairpin loops as secondary structure regulatory elements have been described only for a handful of mRNAs, and their role in genome-wide translation is not known. A combination of new ribo-sequencing with fluorescent visualization might shed light on the role of hairpin loops in translation in the near future [179, 180, 181, 182]. Other internal 5′ UTR cis-element structures are AREs and GREs. Their effects on translation are mediated by a combination of RNA-BPs. They are often found to be part of hairpin loops. Visualizing a folded hairpin structure in vivo is not possible at current resolution limits.

    The translation initiation via internal ribosomal entry site (IRES) occurs in a cap-independent manner. Mammalian IRES facilitates bypassing of the eIF4E-m7GpppN cap interaction and recruitment of the small and large ribosomal subunits and tRNA to the transcript, initiating translation at the canonical AUG start codon.

    G-quadruplexes within/near IRES may potentiate alternative translation. However, G4 structures in 3′ or 5′ UTRs and an open reading frame mainly repress cap-dependent translation (reviewed in Ref. [183]).

    The poly(A) tail also plays a role in translation as an mRNA stabilizer and a facilitator of mRNA circularization, which promotes translation. De-adenylation processes tend to slow down the translation rate and eventually lead to mRNA degradation.

  4. Regulation of mRNA stabilization or decay by cis-acting elements ( Figure 1D ):

    In mammalian cells, mRNA stabilization or decay is regulated by cis-elements in the 3′ UTR. Numerous known RNA-BPs serve as trans-binding factors for ARE/GRE and other elements to facilitate transcript deadenylation and subsequent decay by exonucleases. There are also a number of RNA-BPs with the opposite function, which stabilize and promote mRNA translation. Posttranslational alteration of RNA-BPs (particularly within RNA-binding domains) can lead them to dissociate from RNA-binding complexes, and be replaced by other competitors, thereby contributing to mRNA de/stabilization [76]. A fine-tuned balance must be reached in cells for proper function at the organismal level.

  5. Interplay between mRNA, miRNA and RNA-BPs ( Figure 1E ):

    Estimates of how different miRNAs and mRNAs are loaded into the RNA-BP-bound RISC (RNA-induced silencing complex) were derived from CLIP assay results [184, 185, 186]. Several scenarios can be extracted from these. If both miRNA and RNA-BP are bound to the 3′ UTR of mRNA, they will be sufficiently close to each other and the complex can be identified by CLIP. They would work cooperatively to promote the assembly of decay machinery. Independent binding by a competitor RNA-BP might disrupt this complex. The strength of miRNA-mRNA canonical and noncanonical bond formation can be computed to project possible biochemical outcomes [187, 188, 189].

    The mRNA 3′ UTR length and secondary structure formation can greatly influence both miRNA- and RNA-BP-binding efficiency; they can also disrupt or facilitate the assembly of RNA-BP complexes by providing high-affinity or multioccupancy binding sites. The outcomes of this scenario could be anywhere from marginal translational repression to accelerated mRNA degradation.

    Cis-acting sequences within miRNAs that resemble cis-elements (ARE or GRE) have perfect complementarity to RNA-BP’s RNA-recognition motifs (RRMs). They can, in theory, occlude RRM-binding sites, acting as alternative inhibitors of RNA-BP activity. This could potentiate (or hinder) translational repression and mRNA degradation of target mRNA, depending on which RNA-BP was affected.
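The alternative polyadenylation scenario (item 2 above) lends itself to a small worked example: choosing the proximal instead of the distal poly(A) signal shortens the 3′ UTR and removes downstream regulatory elements. The following Python sketch is an illustration only. The poly(A) signal hexamers and the CPE consensus come from the text; the ARE pentamer AUUUA and the GRE variant UGUUUGU are assumptions chosen for the example, the 3′ UTR sequence is invented, and cleavage is simplified to occur at the signal itself rather than downstream of it.

```python
import re

# Poly(A) signals named in the text: canonical AAUAAA, noncanonical AUUAAA.
PAS_HEXAMERS = ("AAUAAA", "AUUAAA")

ELEMENTS = {
    "CPE": "UUUUUAU",  # consensus given in the text
    "ARE": "AUUUA",    # assumption: core ARE pentamer
    "GRE": "UGUUUGU",  # assumption: one sequence matching UGUKUGU
}


def find_all(rna, motif):
    """0-based start positions of every (possibly overlapping) occurrence."""
    return [m.start() for m in re.finditer("(?=" + re.escape(motif) + ")", rna)]


def pas_sites(utr):
    """Sorted (position, hexamer) pairs for each poly(A) signal in the 3' UTR."""
    return sorted(
        (pos, hexamer) for hexamer in PAS_HEXAMERS for pos in find_all(utr, hexamer)
    )


def elements_retained(utr, pas_pos):
    """Positions of elements lying entirely upstream of the chosen signal."""
    return {
        name: [p for p in find_all(utr, motif) if p + len(motif) <= pas_pos]
        for name, motif in ELEMENTS.items()
    }


# Hypothetical 3' UTR: CPE, proximal AUUAAA, ARE, GRE, distal AAUAAA.
example_utr = "GCCUUUUUAUGCAUUAAAGCGCAUUUAGCUGUUUGUGCAAUAAAGC"

for position, hexamer in pas_sites(example_utr):
    print(hexamer, position, elements_retained(example_utr, position))
```

In this invented sequence, use of the proximal signal keeps only the CPE, while use of the distal signal keeps the CPE, ARE, and GRE, mirroring the loss of 3′ UTR regulatory sequences described for proximal PAS usage.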

Figure 1.

Predictive scenarios of cis-element effects and trans-binding factor behavior in mRNA splicing, adenylation, translation, and decay. Blunt arrows indicate direct suppression; arrows represent activation. These figures were made using the Ingenuity Pathway Analysis software, based upon observations from previous studies or suggested regulatory mechanisms.

A. Consensus multivalent sequences represent the intronic splice sites that are recognized by a family of small nuclear ribonucleoproteins (U snRNPs). These regulatory cis-elements can be divided into two types: (1) intronic regions which almost always begin with the dinucleotide GU and end with AG; and (2) intronic regions which have either AU and AC termini or GU and AG termini. Introns are also rich in pyrimidine nucleotides that cumulatively compose a polypyrimidine tract, which also has a unique adenosine-containing branch point sequence upstream. Of the other four types of cis-acting elements, two are located within exons (exonic splicing enhancers, ESEs, and exonic splicing silencers, ESSs), and two are located within introns (intronic splicing enhancers, ISEs, and intronic splicing silencers, ISSs). The key trans-acting splicing factors are shown: SR, serine/arginine-rich (SR) proteins; U1 snRNPs, U1 small nuclear ribonucleoproteins; HNRNPs, heterogeneous nuclear ribonucleoproteins; PTB, polypyrimidine tract binding protein.

B. Adenylation of pre-mRNA is triggered by cis-regulatory sequences named poly(A) signals, AAUAAA and/or AUUAAA, and by the U/GU-rich and UGUA elements. By direct analogy to splicing, canonical adenylation is regulated by RNA-BPs or snRNAs. CF, cleavage factor; CSTF, cleavage stimulation factor; CPSF, cleavage polyadenylation specificity factor; MBNL, muscleblind-like protein; PAP, poly(A) polymerase; PABP, poly(A) binding protein; CPEB, cytoplasmic polyadenylation element binding protein; miRNA BS, miRNA binding sites; S RNA-BP, stabilizing RNA-binding protein; D RNA-BP, destabilizing RNA-binding protein; CPA, cleavage polyadenylation assembly; CPE, cytoplasmic polyadenylation element.

C. Cis-mediated regulation of canonical and alternative translation includes sequences in all parts of mRNA. In canonical translation, the initiation factors (RNA-BPs) bind the 5′ m7GpppN cap, and then linearly scan through the 5′ UTR until reaching an AUG start codon. For simplicity, the components of the translation machinery are shown as eIF2 and eIFs (eukaryotic early translation initiation factors). PABP, poly(A) binding protein; IRES, internal ribosomal entry site; P, phosphorylation of RNA-BP.

D. Schematic illustration of the cytoplasmic mRNA decay complex formation. The details for this scenario are provided in the text. S RNA-BP, stabilizing RNA-binding proteins; D RNA-BP, destabilizing RNA-binding proteins; PABP, poly(A)-binding protein; eIF2 and eIFs, eukaryotic early translation initiation factors.

E. Scenarios for miRNA-mediated mRNA translational repression or decay pathways. The details for this scenario are provided in the text. RISC, RNA-induced silencing complex; P, phosphorylation of RNA-BP.

10. Conclusions and perspectives

Examples given in this chapter suggest that mRNA regulation is important in multiple aspects of mammalian biology; however, it is largely unknown how combinatorial regulation is achieved at the level of biological complexity of whole organisms. Transcriptome-wide mapping of cis-elements and trans-binding sites demonstrates the huge regulatory potential of non-coding parts of mRNA. The more details we learn about cross-talk, molecular assembly, and compartmentalization of RNA-protein complexes, the more unifying principles we may find. Understanding the factors and elements involved in regulating the expression of a particular gene in a single cell [190] is of paramount importance when designing molecular therapies or when attempting to modulate the expression of a target gene. Thus, scientists and geneticists have exciting opportunities ahead in the field of therapeutic genome editing.

Acknowledgments

This work was supported by a University of Minnesota Department of Medicine start-up fund to I. V-S. We acknowledge the University of Minnesota Supercomputing Institute for providing access to Ingenuity Pathway Analysis.

Conflict of interest

None declared.

Abbreviations

3′ UTR: 3′ untranslated region
ARE: AU-rich element, adenylate (A)- and uridylate (U)-rich element
DMPK: Dystrophia myotonica protein kinase
GRE: GU-rich element, guanylate (G)- and uridylate (U)-rich element
m7GpppN cap: 7-Methylguanosine cap
Met-tRNA: Methionine loaded onto transfer RNA
NFkB: Nuclear factor kappa-light chain enhancer of activated B cells
PTBP: Polypyrimidine tract binding protein
RRM: RNA-recognition motif
SRSF1: Serine/arginine-rich splicing factor 1
UPF1: Up-frameshift protein 1

© 2018 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution 3.0 License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite and reference

Irina Vlasova-St. Louis and Calandra Sagarsky (February 21st 2018). Mammalian Cis-Acting RNA Sequence Elements, Gene Expression and Regulation in Mammalian Cells - Transcription From General Aspects, Fumiaki Uchiumi, IntechOpen, DOI: 10.5772/intechopen.72124. Available from:
