NGS‐based methods to profile transcriptome‐wide RNA modifications.
Epitranscriptomics is a rapidly growing field concerned with the complete delineation and elucidation of chemical modifications of nucleotides, found within all classes of RNA, that do not involve a change in the ribonucleotide sequence. More than 140 diverse and distinct nucleotide modifications have been identified in RNA, dwarfing the number of nucleotide modifications found in DNA. The majority of epitranscriptomic modifications have been identified in ribosomal RNA (rRNA), transfer RNA (tRNA), and small nuclear RNA (snRNA). However, knowledge of the occurrence, and specifically the function, of RNA modifications remains scarce. Recently, the rapid advancement of next‐generation sequencing and mass spectrometry technologies has allowed for the identification and functional characterization of nucleotide modifications in both protein‐coding and non‐coding RNA on a global, transcriptome‐wide scale. In this chapter, we will introduce the concepts of nucleotide modification, summarize transcriptome‐wide RNA modification mapping techniques, highlight recent studies exploring the functions of RNA modifications and their association with disease, and finally offer insight into the future progression of epitranscriptomics.
- RNA modifications
- gene expression
RNA has been shown to play critical roles in regulating cellular functions. Comparative transcriptomics between mammals has revealed that ∼66% of human genomic DNA is transcribed. Remarkably, only ∼2% of the transcriptional output is protein‐coding messenger RNA (mRNA), while ∼98% encompasses a wide variety of non‐coding RNA (ncRNA) molecules [1, 2]. ncRNAs have been classified functionally as either housekeeping or regulatory. The housekeeping ncRNAs include ribosomal RNA (rRNA), transfer RNA (tRNA), and small nuclear RNA (snRNA), while examples of regulatory ncRNAs are microRNA (miRNA) and long non‐coding RNA (lncRNA) [3–5]. The complexity of RNA is further increased by numerous post‐transcriptional modifications, which alter the chemical structure of the nucleotides without changing the nucleotide sequence. By analogy with the field of epigenetics, which investigates the modifications of DNA and histone proteins, the study of chemical modifications of RNA is called epitranscriptomics [6, 7]. More than 140 chemically diverse and distinct modified nucleotides have been identified in both mRNA and ncRNA, including N6‐methyladenosine (m6A), 5‐methylcytidine (m5C), pseudouridine (Ѱ), inosine (I, produced by A‐to‐I editing of adenosine), and N1‐methyladenosine (m1A). These modifications have been identified mostly in the housekeeping ncRNAs [3, 4, 8]; however, chemical modifications have also been detected in mRNA and the regulatory ncRNAs [9–11]. Unfortunately, knowledge about the occurrence and function of RNA modifications at the transcriptome level remains scarce. Recently, interest in RNA modifications and their functions has gained momentum, owing mainly to novel adaptations of next‐generation sequencing (NGS) and mass spectrometry technologies, which have allowed transcriptome‐wide detection of distinct RNA modifications [12, 13].
Accurate regulation of the transcriptome is critical for gene expression and its subsequent control of cellular functions, including metabolism, proliferation, differentiation, and development. Thus, alterations in transcriptome regulation can disrupt cellular functions and lead to disease. Accumulating evidence has identified and functionally characterized several distinct types of chemical modifications of RNA nucleotides in both protein‐coding and ncRNAs, further advancing the burgeoning field of epitranscriptomics. In this chapter, we will first provide an overview of RNA modifications and then synopsize several transcriptome‐wide RNA modification mapping techniques such as m6A‐seq, m5C‐seq, pseudouridine‐seq, and NAD captureSeq. Next, we will highlight novel insights into the potential functions of RNA modifications and their disease relevance as revealed and facilitated by epitranscriptomic profiling. Finally, we will offer our perspective on how the field will progress or evolve in the near future.
2. An overview of post‐transcriptional modifications of RNA
The process of mRNA maturation involving 5ʹ‐capping, splicing, and polyadenylation has been well studied. However, the more subtle post‐transcriptional modifications studied in epitranscriptomics, also termed RNA‐epigenetics, are only now coming fully to light. The post‐transcriptional modifications found in RNA are often called marks because they mark a region of RNA that potentially contributes to the regulation of cellular processes, including gene expression, protein translation, or RNA stability. As with mRNA maturation, enzymes are required to catalyze the reactions that chemically modify RNA nucleotides. The most common post‐transcriptional RNA modification, Ψ, was also the first to be discovered. Originally discovered in rRNA and tRNA, Ψ modifications are also present in mRNA [16, 17]. Site‐specific isomerization of uridine (U) to Ψ (5‐ribosyluracil) is irreversibly catalyzed by Ψ synthases. The family of Ψ synthases (PUS) consists of enzymes that function independently and enzymes that require H/ACA ribonucleoprotein complexes. Compared to U, Ψ contains an extra imino group (>C═NH), which serves as an additional hydrogen bond donor, while the carbon‐carbon (C─C) glycosidic bond linking the sugar to the base is more stable than the carbon‐nitrogen (C─N) bond found in U. These two chemical changes confer rigidity on the sugar‐phosphate backbone and enhance local base stacking.
The most common internal modification in eukaryotic mRNA is m6A. Unlike Ψ, m6A modifications are reversible, suggesting that the modifications are involved in regulatory switches. Methyltransferases (METTL3, METTL14, and WTAP), termed writers, catalyze the methylation of adenosine [21–23], whereas demethylases (FTO and ALKBH5), termed erasers, remove the methyl group [24, 25]. The m6A marks are recognized by YTH domain proteins, termed readers, which regulate mRNA processing and metabolism [26, 27].
An additional class of nucleotide modifications, termed RNA editing, creates an irreversible change in the nucleotide sequence. These modifications include insertions, deletions, and base substitutions and occur in all classes of RNA. When they occur in the coding region of an mRNA, the amino acid sequence of the protein can be altered relative to the sequence encoded by genomic DNA. RNA editing by deamination converts adenosine (A) to inosine (I) and cytidine (C) to uridine (U). A‐to‐I editing is an abundant class of RNA modifications found throughout metazoans. The conversion of A to I by base deamination results in the synthesis of distinct proteins, which creates functional diversity and serves to enhance the response to rapid environmental changes. RNA editing by deamination is mediated by two major classes of enzymes. The first class is a group of tissue‐specific and context‐dependent adenosine deaminases called ADARs (adenosine deaminases acting on RNA) [30–32], which catalyze the hydrolytic deamination of A to I in double‐stranded regions of RNA secondary structure. The second class of enzymes, the vertebrate‐specific apolipoprotein B mRNA editing catalytic polypeptide‐like (APOBEC) family, promotes C‐to‐U editing by cytidine deamination. APOBEC1, the first‐discovered member of the APOBEC family, was characterized as the zinc‐dependent cytidine deaminase that catalyzes a C‐to‐U modification, resulting in an in‐frame stop codon in APOB mRNA.
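The consequence of deamination for the encoded protein can be illustrated with a minimal sketch. It assumes the standard genetic code and decodes inosine as guanosine; the helper names `edit` and `translate` are hypothetical, and the example codons (CAA and CAG) are chosen purely for illustration rather than taken from the text above.

```python
from itertools import product

# Standard genetic code, codons enumerated in UCAG order ("*" marks a stop codon).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def edit(rna, pos, kind):
    """Apply a single deamination event ("C-to-U" or "A-to-I") at a 0-based position."""
    expected, new = {"C-to-U": ("C", "U"), "A-to-I": ("A", "I")}[kind]
    if rna[pos] != expected:
        raise ValueError(f"{kind} editing requires {expected} at position {pos}")
    return rna[:pos] + new + rna[pos + 1:]


def translate(rna):
    """Translate complete codons, decoding inosine (I) as guanosine (G)."""
    decoded = rna.replace("I", "G")
    codons = [decoded[i:i + 3] for i in range(0, len(decoded) - 2, 3)]
    return "".join(CODON_TABLE[c] for c in codons)


if __name__ == "__main__":
    print(translate("CAA"), translate(edit("CAA", 0, "C-to-U")))  # glutamine codon, then stop
    print(translate("CAG"), translate(edit("CAG", 1, "A-to-I")))  # glutamine codon, then arginine
```

In this toy model, a single C‐to‐U event turns a glutamine codon into a stop codon, and a single A‐to‐I event recodes glutamine to arginine, which is the sense in which editing changes the protein relative to the genomic sequence.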
3. NGS‐based RNA modification techniques
The first transcriptome‐wide, NGS‐based approach for mapping m6A modifications demonstrated the feasibility of identifying RNA modifications across the entire transcriptome and established the field of epitranscriptomics. The most important features of NGS‐based techniques are that modifications can be mapped on a global scale at single‐nucleotide resolution and that the modified nucleotides are analyzed within the context of the surrounding gene sequence. These features ensure that the nucleotide modifications are accurately assigned to the appropriate RNA and not falsely attributed to homologous genes or RNA contaminants. Several high‐throughput NGS‐based technologies, including RNA‐seq, have now been established to profile and quantify RNA modifications (m6A, m6Am, m5C, m1A, A‐to‐I, Ѱ, and NAD cap). These RNA‐seq‐based methodologies can be divided into two classes: immunoprecipitation‐based and chemical‐based methods. Table 1 lists six representative NGS‐based detection methods of RNA modifications.
| Method | RNA modification | Principle |
|---|---|---|
| m6A‐seq, MeRIP‐seq, m6A‐LAIC‐seq | m6A, m6Am | Methyl‐RNA immunoprecipitation and UV cross‐linking |
| m1A‐ID‐seq | m1A | Methyl‐RNA immunoprecipitation and the inherent ability of m1A to stall reverse transcription |
| Bisulfite sequencing | m5C | Chemical conversion of unmodified nucleotides |
| ICE‐seq | A‐to‐I editing | Cyanoethylation of RNA combined with reverse transcription |
| Pseudo‐seq, Ѱ‐seq | Ѱ | Chemical modification to terminate reverse transcription at the pseudouridylated site |
| NAD captureSeq | NAD | Chemoenzymatic capture |
RNA immunoprecipitation (RIP)‐based methods use an RNA modification‐specific antibody or an enzyme‐specific antibody to capture modified RNA, followed by RNA‐seq. m6A‐seq, methylated RIP‐seq (MeRIP‐seq), and m6A‐level and isoform‐characterization sequencing (m6A‐LAIC‐seq) combine RNA‐seq with RIP specific for m6A methylation. Figure 1A displays a typical m6A‐seq workflow. RIP is performed using an anti‐m6A antibody to enrich m6A‐modified RNAs, followed by cDNA library preparation, high‐throughput sequencing, and finally analysis to identify the occurrence and consensus motif (RRACU) of global m6A modifications. A modified RIP approach, called m6A individual‐nucleotide‐resolution cross‐linking and immunoprecipitation (miCLIP), uses ultraviolet light‐induced antibody‐RNA cross‐linking to induce site‐specific signatures at m6A marks. The resulting mutations or truncations in the cDNA facilitate the detection of m6A marks at single‐nucleotide resolution. As illustrated in Figure 1B, m1A‐ID‐seq, which combines m1A immunoprecipitation with the tendency of m1A residues to truncate reverse transcription products, has been applied successfully for the transcriptome‐wide characterization of m1A.
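The analysis step of an immunoprecipitation‐based workflow can be sketched as a comparison of read counts between the antibody‐enriched (IP) library and the input library. The following is a minimal sketch, assuming that reads have already been counted in fixed windows along a transcript; the function names, the pseudocount, and the log2 threshold are illustrative assumptions, and published pipelines rely on dedicated peak callers with statistical testing rather than a fixed cutoff.

```python
import math


def log2_enrichment(ip_counts, input_counts, pseudocount=1.0):
    """Per-window log2(IP / input) after normalizing each library to its total depth."""
    n = len(ip_counts)
    ip_total = sum(ip_counts) + pseudocount * n
    input_total = sum(input_counts) + pseudocount * n
    scores = []
    for ip, inp in zip(ip_counts, input_counts):
        ip_frac = (ip + pseudocount) / ip_total
        input_frac = (inp + pseudocount) / input_total
        scores.append(math.log2(ip_frac / input_frac))
    return scores


def enriched_windows(ip_counts, input_counts, min_log2_fc=1.0):
    """Indices of windows whose IP signal exceeds input by at least min_log2_fc."""
    scores = log2_enrichment(ip_counts, input_counts)
    return [i for i, score in enumerate(scores) if score >= min_log2_fc]


if __name__ == "__main__":
    ip = [10, 12, 200, 11, 9]      # hypothetical IP read counts per window
    inp = [10, 11, 12, 10, 10]     # hypothetical input read counts per window
    print(enriched_windows(ip, inp))
```

Normalizing to library depth before taking the ratio keeps a deeply sequenced IP library from appearing enriched everywhere; the pseudocount only guards against division by zero in uncovered windows.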
Chemical‐based methods rely on nucleotide conversion, nucleotide misincorporation, or the truncation of cDNA products during reverse transcription. RNA bisulfite conversion followed by high‐throughput sequencing (BS‐seq, Figure 2A) is a chemical conversion method in which bisulfite treatment converts unmodified cytosine residues to uracil while leaving m5C residues unchanged. BS‐seq is the only method currently available for the detection of site‐specific endogenous m5C [40, 41]. Inosine chemical erasing (ICE) uses nucleotide switching to detect A‐to‐I modifications. Inosine ribonucleotides are cyanoethylated with acrylonitrile to form N1‐cyanoethylinosine (ce1I). The newly formed N1‐cyanoethyl group of ce1I inhibits the Watson‐Crick base pairing of I with C. Thus, cyanoethylation of I blocks cDNA synthesis by preventing extension of the cDNA that would bear a cytosine (C) opposite the editing site during reverse transcription. Without cyanoethylation, in contrast, I pairs with C and is read as guanosine (G) in the cDNA (Figure 2B). To detect RNA pseudouridylation, several groups developed Pseudo‐seq (Ѱ‐seq). RNA is treated with N‐cyclohexyl‐Nʹ‐β‐(4‐methylmorpholinium)ethylcarbodiimide (CMC), which binds covalently to U, G, and Ѱ residues, and is then exposed to alkaline pH to remove the U‐CMC and G‐CMC adducts. Reverse transcription will pause at the remaining intact N3‐CMC‐Ѱ sites, allowing for the mapping of Ѱ modifications [16, 17] (Figure 2C). When mapped reads from CMC‐treated samples are compared with those from non‐treated controls, Ѱ is detected at sites with an increased proportion of reads supporting reverse transcription termination. NAD captureSeq (Figure 2D) requires the chemo‐enzymatic modification of the NAD that caps the 5ʹ end of RNA. The first step, the transglycosylation of NAD, is catalyzed by ADP‐ribosyl cyclase (ADPRC) from Aplysia californica in the presence of an alkynyl alcohol.
In the second step, the modified NAD is biotinylated by a copper‐catalyzed azide‐alkyne cycloaddition. Thirdly, the biotin‐linked RNA is captured on streptavidin beads and processed further for cDNA library preparation and NGS. The NAD‐biotin‐captured sequences are then identified by comparison with the control samples, which were not subjected to the first step of chemo‐enzymatic biotinylation.
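The treated‐versus‐control comparison used in Pseudo‐seq can be sketched as a difference in reverse transcription stop ratios. This is a minimal sketch under stated assumptions: stop and coverage counts are taken to be already tabulated per candidate position, and the function names and thresholds (`min_coverage`, `min_delta`) are hypothetical choices for illustration, not values from the cited studies.

```python
def stop_ratio(stops, coverage):
    """Fraction of reads covering a position that terminate there."""
    return stops / coverage if coverage > 0 else 0.0


def candidate_psi_sites(treated, control, min_coverage=20, min_delta=0.2):
    """Return (position, delta) pairs where CMC treatment raises the RT stop ratio.

    treated / control: dicts mapping position -> (rt_stops, coverage).
    Positions with too little coverage in either library are skipped.
    """
    sites = []
    for pos in sorted(treated):
        t_stops, t_cov = treated[pos]
        c_stops, c_cov = control.get(pos, (0, 0))
        if t_cov < min_coverage or c_cov < min_coverage:
            continue
        delta = stop_ratio(t_stops, t_cov) - stop_ratio(c_stops, c_cov)
        if delta >= min_delta:
            sites.append((pos, round(delta, 3)))
    return sites


if __name__ == "__main__":
    # Hypothetical counts: position -> (reads terminating here, reads covering here)
    treated = {100: (30, 100), 101: (5, 100), 102: (4, 10)}
    control = {100: (2, 100), 101: (4, 100), 102: (0, 10)}
    print(candidate_psi_sites(treated, control))
```

The same treated‐minus‐control logic applies to NAD captureSeq, where enrichment is judged against a control that skipped the ADPRC step, although the quantity compared there is read abundance rather than a stop ratio.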
4. Physiological functions of RNA modifications
Although the effects of RNA modification on physiological function are not fully understood, there is increasing evidence that these modifications play critical roles in the regulation of gene expression, cellular functions, and development. Disruptions of RNA modification mechanisms have also been associated with disease. We present here a few examples that demonstrate the importance of RNA modification for physiological function.
As stated earlier, m6A modifications are commonly found throughout eukaryotes, as demonstrated by multiple m6A‐seq studies. Human m6A‐seq analyses revealed 12,769 putative m6A sites within 6990 protein‐coding and 250 non‐coding transcripts, whereas, in mice, 4513 m6A peaks were identified in 3376 protein‐coding and 66 non‐coding transcripts. The m6A consensus motif, RRACU, was identified with a median distance from m6A peaks of 24 nucleotides. Interestingly, the majority of m6A sites were conserved between the mouse and human transcriptomes and were enriched within long internal exons and around stop codons, suggesting strong evolutionary selection [26, 36]. m6A‐LAIC‐seq showed that methylated transcripts utilized proximal alternative polyadenylation (APA) sites, which resulted in shorter 3′ untranslated regions, whereas non‐methylated transcripts tended to use distal APA sites. This observation correlated with the finding that m6A‐modified transcripts had both significantly shorter RNA half‐lives and slightly lower translational efficiencies than unmarked transcripts.
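The relationship between peaks and the consensus motif can be sketched as a search for the nearest RRACU match to each peak summit. This is a minimal sketch, assuming that each peak is available as a sequence plus a 0‐based summit coordinate; the function names are hypothetical, R is expanded to A or G, and T is accepted in place of U so that cDNA‐derived sequences can be scanned.

```python
import re
from statistics import median

# R = purine (A or G); the lookahead lets overlapping matches be reported.
RRACU = re.compile(r"(?=([AG][AG]AC[UT]))")


def motif_positions(seq):
    """0-based positions of the central (methylatable) A of each RRACU match."""
    return [m.start() + 2 for m in RRACU.finditer(seq.upper())]


def nearest_motif_distance(seq, summit):
    """Distance in nucleotides from a peak summit to the closest motif, or None."""
    positions = motif_positions(seq)
    if not positions:
        return None
    return min(abs(p - summit) for p in positions)


def median_motif_distance(peaks):
    """Median summit-to-motif distance over (sequence, summit) pairs that contain a motif."""
    distances = [nearest_motif_distance(seq, summit) for seq, summit in peaks]
    distances = [d for d in distances if d is not None]
    return median(distances) if distances else None


if __name__ == "__main__":
    peaks = [("CCGGACUCCCC", 10), ("GGACU", 2), ("CCCC", 1)]  # hypothetical peaks
    print(median_motif_distance(peaks))
```

Peaks without any motif are excluded from the median here; whether to exclude or penalize them is a design choice that affects the reported distance.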
In vitro and in vivo genetic depletion of the m6A writer, Mettl3, in both mouse and human, led to the absence of m6A modification within Nanog mRNA, which encodes a pluripotency factor. The absence of m6A marks extended Nanog expression throughout differentiation and inhibited the exit of embryonic stem cells from self‐renewal towards lineage differentiation. m6A‐seq in mouse naïve embryonic stem cells (ESCs), 11‐day‐old embryoid bodies (EBs), and mouse embryonic fibroblasts (MEFs) revealed that m6A marks reduced the mRNA stability of key naïve pluripotency‐promoting transcripts and facilitated differentiation. These findings suggest that m6A modification provides the flexibility of the stem cell transcriptome required to differentiate into different lineages. NANOG is also important in both the maintenance and specification of cancer stem cells, which can metastasize and form primary tumors. The exposure of breast cancer cells to hypoxia induced the expression of the eraser ALKBH5, which resulted in m6A demethylation in the 3ʹ UTR of NANOG mRNA and an increased half‐life of NANOG mRNA, thereby promoting the breast cancer stem cell (BCSC) phenotype. The m6A reader YTHDF2 protects the 5′ UTR of stress‐induced transcripts from demethylation, and cap‐independent translation initiation was enhanced by 5′ UTR methylation. m6A modification is also critical for the regulation of HIV‐1 replication and the effect of HIV‐1 on the host immune system. HIV‐1 infection induced m6A modification in both host and viral mRNAs. HIV‐1 coding, non‐coding, and splicing regulatory regions contained a total of 14 m6A methylation peaks. In addition, methylation of two highly conserved m6A target sites in the stem loop II region of the HIV‐1 Rev response element (RRE) enhanced the binding of the HIV‐1 Rev protein to the RRE in vivo and enhanced nuclear export of HIV‐1 RNA.
The long non‐coding RNA X‐inactive specific transcript (XIST) regulates transcriptional silencing of genes on the X chromosome. XIST is heavily modified with at least 78 m6A sites. Knockdown of METTL3 leads to decreased XIST m6A marks and impairs XIST‐mediated gene silencing.
m1A is commonly found at position 58 in the tRNA T‐loop, at position 9 of metazoan mitochondrial tRNAs, and in eukaryotic rRNAs. Initiator tRNAMet contains fully modified m1A58, which stabilizes its tertiary structure. Hypomodification of tRNA m1A58 affects the association with polysomes and the subsequent efficiency of translation [53, 54]. m1A modifications in tRNA function in the response to environmental stress, whereas m1A‐modified rRNA regulates ribosome biogenesis. m1A‐ID‐seq demonstrated that m1A methylation responded dynamically to stimuli and identified 901 m1A peaks, enriched within the 5ʹ UTR near the start codons, in 600 distinct protein‐coding and non‐coding RNAs.
m5C sites have been detected in several eukaryotic tRNAs, rRNAs, and mRNAs. m5C marks stabilize the secondary structure of tRNA, alter aminoacylation and codon recognition, and regulate translational fidelity. A low level of internal m5C was found in mRNA cap structures in mammalian and virus‐infected mammalian cells [58, 59]. BS‐seq identified 10,275 sites in protein‐coding and non‐coding RNAs. m5C marks in mRNAs were enriched near argonaute‐binding sites within the 3ʹ UTR.
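The BS‐seq readout that underlies such site counts can be sketched as a per‐cytosine non‐conversion rate: after bisulfite treatment, unmodified cytosines are read as T in the cDNA, whereas m5C is still read as C. This is a minimal sketch, assuming that C and T read counts have already been tallied at each reference cytosine; the function names and the coverage and rate thresholds are illustrative assumptions, and real analyses also need to account for incomplete conversion in structured RNA.

```python
def nonconversion_rate(c_reads, t_reads):
    """Fraction of reads that still show C at a reference cytosine after bisulfite treatment."""
    total = c_reads + t_reads
    return c_reads / total if total else 0.0


def candidate_m5c_sites(pileup, min_coverage=10, min_rate=0.2):
    """Return (position, rate) pairs for cytosines that largely resist conversion.

    pileup: dict mapping position -> (reads_with_C, reads_with_T).
    """
    sites = []
    for pos in sorted(pileup):
        c_reads, t_reads = pileup[pos]
        if c_reads + t_reads < min_coverage:
            continue  # too few reads to estimate a rate
        rate = nonconversion_rate(c_reads, t_reads)
        if rate >= min_rate:
            sites.append((pos, round(rate, 3)))
    return sites


if __name__ == "__main__":
    # Hypothetical pileup: position -> (C reads, T reads)
    pileup = {5: (9, 1), 17: (1, 49), 30: (3, 2)}
    print(candidate_m5c_sites(pileup))
```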
A‐to‐I editing sites are distributed throughout human mRNA, including exons, introns, and 5ʹ and 3ʹ UTRs. Alu repeat elements contain the highest frequency of A‐to‐I editing sites among the untranslated regions of the genome. Intronic editing mediated by ADAR1 contributes to the maintenance of mature mRNA by protecting it against unfavorable processing of the Alu sequence and by degradation of aberrant transcripts by nonsense‐mediated decay (NMD). A‐to‐I RNA editing is diminished in brain tissue from patients with Alzheimer's disease relative to controls. The reduction occurs predominantly in the hippocampus and to a lesser extent in the temporal and frontal lobes. These alterations result in decreased levels of protein recoding, the process of changing the amino acid sequence by A‐to‐I editing, in Alzheimer's disease. The APOBEC3 family of cytidine deaminases has been associated with mutations in cancer genomes in several types of cancer. Accumulated data linking mutations in oncogenes and tumor suppressor genes with APOBEC3B activity provide evidence that cytidine deaminase‐induced mutagenesis is activated in tumorigenesis, thus suggesting novel therapeutic targets.
Pseudo‐seq revealed that Ψ marks in mRNA are regulated in response to stimuli, such as serum starvation in human cells and nutrient deprivation in yeast. The observations indicate that Ψ provides a rapid regulatory mechanism to rewire the genetic code through inducible mRNA modification. Pseudouridylation of rRNA and of the telomerase RNA component (TERC) was also found to be reduced in dyskeratosis congenita patients. Furthermore, missense mutations in pseudouridine synthase 1 (PUS1) may lead to deficient pseudouridylation of mitochondrial tRNAs in patients with mitochondrial myopathy and sideroblastic anemia (MLASA).
NAD captureSeq identified NAD as a 5ʹ RNA cap in a subset of regulatory RNAs in bacteria, and it was subsequently proposed that this type of capping may be common across all of life. It is safe to predict that investigation of the roles and mechanisms of 5ʹ NAD caps in eukaryotes will draw increasing attention in the biomedical field, mainly for two reasons. First, the chemical modification of the 5ʹ end of RNA is critical for RNA processing, localization, stability, translational efficiency, and epitranscriptomic regulation of gene expression. Second, NAD is both a co‐substrate for enzymes, such as the sirtuins and poly(adenosine diphosphate‐ribose) polymerases, and a critical electron‐carrying coenzyme for enzymes that catalyze oxidation‐reduction reactions. NAD is involved in nearly all physiological processes. For example, cellular NAD+ levels are modulated during aging, and the production and use of NAD+ have been associated with prolonged health and life spans. Studying the regulation of NAD‐mediated RNA capping, and hence of gene expression, will undoubtedly enrich our understanding of NAD's expanding roles in normal physiology and disease pathogenesis.
Although rapid advances have been made in epitranscriptomics in the past few years, more work is needed in this field. To date, more than 140 different RNA modifications have been identified. However, only a few reliable high‐throughput techniques are available to determine the global occurrence of a particular RNA modification. Thus, there is a need for the development of more high‐throughput techniques to characterize the full spectrum of RNA modifications. It is also important to pursue the comprehensive identification and characterization of the enzymes responsible for RNA modification, since several of these enzymes have been shown to play important roles in development and disease. It is likewise essential to decipher the functions and disease involvement of RNA modifications. Development of additional technologies to alter RNA modifications, including the engineering of RNA‐modifying enzymes with modified substrate specificity and activity via the CRISPR‐Cas9 system, will open the door to new types of detection and analysis pipelines. With further technological development, we will be able to elucidate the sequence‐specific signatures in RNA that direct modifications and then better relate these RNA marks to their corresponding biological functions. Finally, the advancement of current approaches, coupled with new technologies, will allow for the development of new therapies and therapeutic targets for human diseases associated with deficient RNA modification.