Classification of small noncoding RNAs.
Despite an inability to encode proteins, small noncoding RNAs (sncRNAs) have critical functions in the regulation of gene expression. They have demonstrated roles in cancer development and progression and are frequently dysregulated. Here we review the biogenesis and mechanism of action, expression patterns, and detection methods of two types of sncRNAs frequently described in cancer: miRNAs and piRNAs. Both miRNAs and piRNAs have been observed to play both oncogenic and tumor-suppressive roles, with miRNAs acting to directly regulate the mRNA of key cancer-associated genes, while piRNAs play crucial roles in maintaining the integrity of the epigenetic landscape. Elucidating these important functions of sncRNAs in normal and cancer biology relies on numerous in silico workflows and tools to profile sncRNA expression. Thus, we also discuss the key detection methods for cancer-relevant sncRNAs, including the discovery of genes that have yet to be described.
- small noncoding RNAs
- gene expression profiling
- computational biology
The central dogma of molecular biology, which has prevailed for many decades, states that genetic information flows from DNA to RNA to protein. Nevertheless, RNAs that do not encode proteins were discovered as early as the 1950s [1, 2]. While protein-coding genes represent less than 2% of the human genome, it has been established that ~90% of the genome can be transcribed.
Small noncoding RNAs (sncRNAs) are ncRNA species that are <200 nucleotides in length and can be further categorized by their shared molecular features and biological mechanisms of action (Table 1). SncRNAs have diverse structural and functional roles in the regulation of gene expression, RNA splicing, epigenetic processes and chromatin structure. Because of these broad roles, the deregulation of sncRNAs has been shown to be involved in human diseases, including cancer. MicroRNAs (miRNAs) and PIWI-interacting RNAs (piRNAs) are two of the most studied sncRNA species. Here we describe current knowledge of the biogenesis and mechanisms of action of these sncRNAs and their expression profiling in cancer.
|sncRNA class|Description|Length (nt)|
|---|---|---|
|MicroRNAs (miRNAs)|Evolutionarily conserved, endogenous, single-stranded sncRNAs, derived from endogenous short hairpin transcripts|18–25|
|PIWI-interacting RNAs (piRNAs)|Largest group; single-stranded ncRNAs; generated by a Dicer-independent mechanism; a uridine at the 5′ end, 5′ monophosphate, and 2′-O-methyl at the 3′ end|21–36|
|Transfer RNAs and ribosomal RNAs|Often referred to as “housekeeping” RNAs; take part in the translation process in ribonucleoproteins||
|Small nuclear RNAs (snRNAs)|Found within the splicing speckles and Cajal bodies of the nucleus; roles in processing pre-messenger RNA, regulating transcription factors and maintaining telomeres|150|
|Small nucleolar RNAs (snoRNAs)|Regulators of rRNA stability and function; some snoRNAs regulate gene expression and silencing processes. (i) C/D box snoRNAs (60–200 nt): catalyze the 2′-O-ribose methylation of rRNA residues; (ii) H/ACA box snoRNAs (120–250 nt): guide pseudouridylation of rRNA; (iii) small Cajal body-specific RNAs: function as a Cajal body localization signal||
|Small interfering RNAs (siRNAs)|Partially complementary passenger and guide RNA strands; involved in post-transcriptional gene silencing through the RISC-mediated degradation of mRNA targets|19–23|
|Transfer RNA fragments (tRFs)|Generated by specific cleavage of tRNA transcripts. (i) Stress-induced tRFs (31–40 nt): repress translation and modulate the cellular stress response; interact with AGO proteins to form complexes for RNA interference silencing; (ii) smaller tRFs (14–30 nt): biogenesis and function unclear; some interact with PIWI or AGO proteins||
|Y RNAs|Components of the Ro ribonucleoprotein; involved in DNA replication, RNA stability, and responses to stress|100|
|7SL RNAs|Component of the signal recognition particle (SRP) that mediates co-translational insertion of secretory proteins into the endoplasmic reticulum lumen||
|Small NF90-associated RNAs (SNaRs)|Interact with NF90’s double-stranded RNA-binding motifs and act as transcriptional regulators|117|
|Vault RNAs (vtRNAs)|Associated with large ribonucleoprotein particles (vaults); essential for intracellular trafficking|100|
1.1.1. miRNA biogenesis
MiRNAs are transcribed by RNA polymerase II to produce primary miRNA (pri-miRNA) transcripts. Pri-miRNAs are folded hairpin intermediary RNA structures that can harbor multiple mature miRNA sequences and even protein-coding exons. After transcription, pri-miRNAs are processed and cleaved into mature miRNAs through different pathways (Figure 1a). In the “canonical” pathway, pri-miRNAs go through two cleavage events: (i) in the nucleus, the RNase III enzyme Drosha cleaves the pri-miRNA hairpin at its base to generate a precursor miRNA (pre-miRNA, ∼60 nt), and (ii) the pre-miRNA is translocated to the cytoplasm by Exportin-5, where it is cleaved by Dicer (also an RNase III enzyme) into two mature (∼22 nt) miRNA molecules. The major alternative miRNA processing pathway is the mirtron pathway. Mirtrons are short hairpin introns with splice acceptor and donor sites. In this pathway, a splicing event takes the place of cleavage by Drosha, after which the mirtron and canonical miRNA pathways converge. Thus, the mirtron pathway is considered Drosha-independent but Dicer-dependent. Several other miRNA processing pathways have also been reported. Co-transcribed miRNAs that share similar seed regions are considered members of a miRNA family. Mechanistically, either of the strands derived from a mature miRNA duplex can be loaded into a member of the Argonaute (AGO) family of proteins (AGO1–4 in humans) in an ATP-dependent manner to form the RNA-induced silencing complex (RISC). Although one of the strands is usually preferentially incorporated, this varies according to context, and the sequence of the incorporated strand determines the targets that will be recognized by RISC.
1.1.2. piRNA biogenesis
PiRNAs are typically transcribed from genomic regions called piRNA clusters, which are usually 50–100 kb long, contain mainly transposable DNA elements and their remnants, and are found in large pericentromeric or subtelomeric domains. PiRNAs are generated by RNase III-independent pathways that do not involve double-stranded RNA precursors, through two main biogenesis pathways (Figure 1b). (i) Primary processing pathway: long piRNA precursors are cleaved by PIWI proteins, preferentially at uridine residues. The 3′ ends of piRNAs harbor extra nucleotides, which are trimmed upon association with PIWI proteins. At this step, the lengths of mature primary piRNAs are determined, and they depend on the molecular size of the PIWI proteins.
Upon maturation, the 3′ ends of piRNAs are 2′-O-methylated by Hen1/Pimet, which is associated with PIWI proteins. This modification maintains the stability of piRNAs in vivo and can be used as a distinguishing feature in piRNA studies. (ii) Ping-Pong cycle: this pathway is initiated in the cytoplasm to produce “secondary” piRNAs. The PIWI protein-piRNA complex (loaded with primary piRNAs), together with AGO3, is responsible for cleaving both sense and antisense transposon transcripts. Secondary piRNAs result from these transposon fragments and are complementary to the first 10 nt of the loaded primary piRNA. The primary piRNA-loaded complex shows a strong bias for uracil at the 5′ end (1-U), and, accordingly, AGO3-piRNAs tend to have adenosine at the 10th nucleotide from the 5′ end (10-A). Thus, 1-U and 10-A are signatures of piRNAs made via the Ping-Pong cycle. The cleavage of transposons by the AGO3-piRISCs and Aub-piRISCs and the generation of secondary piRNAs are the main mechanisms involved in the control of transcript levels and silencing of transposons [13, 31].
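The 1-U/10-A relationship follows directly from the 10-nt complementarity described above, and it can be illustrated with a short Python sketch. This is only a minimal illustration under simplifying assumptions: the function names and the example sequences are hypothetical, perfect complementarity over the first 10 nt is required, and no real piRNA analysis tool is being reproduced.

```python
# Minimal sketch of the Ping-Pong signature (hypothetical helper names).
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def to_rna(seq):
    """Upper-case a read and convert the DNA alphabet (T) to RNA (U)."""
    return seq.strip().upper().replace("T", "U")


def reverse_complement(seq):
    """Return the reverse complement of an RNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(to_rna(seq)))


def has_1u(seq):
    """True if the first nucleotide is uridine (1-U)."""
    return to_rna(seq).startswith("U")


def has_10a(seq):
    """True if the 10th nucleotide from the 5' end is adenosine (10-A)."""
    rna = to_rna(seq)
    return len(rna) >= 10 and rna[9] == "A"


def is_ping_pong_pair(primary, secondary, overlap=10):
    """True if the first `overlap` nt of both piRNAs are reverse complementary."""
    p, s = to_rna(primary), to_rna(secondary)
    if len(p) < overlap or len(s) < overlap:
        return False
    return reverse_complement(p[:overlap]) == s[:overlap]


# Illustrative (made-up) sequences: a 1-U primary piRNA and its partner.
primary = "UGACCGUAAGCUAGCUAGCUAGCUA"
secondary = reverse_complement(primary[:10]) + "GGCAUUCGAUCGAUC"
print(has_1u(primary), has_10a(secondary), is_ping_pong_pair(primary, secondary))
```

Because the partner is antiparallel, its 10th nucleotide pairs with the first nucleotide of the primary piRNA, so a 1-U primary piRNA implies a 10-A partner.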
Each step of miRNA and piRNA biogenesis is subject to regulation. Thus, examining the biogenesis pathways of these sncRNAs through high-throughput sequencing techniques may uncover mechanisms of aberrant miRNA/piRNA expression and deregulation in many human diseases.
1.2. Mechanisms of action
1.2.1. miRNA-mediated mechanisms
Once assembled into RISC, the miRNA 5′ seed region (nucleotides 2–7) interacts with specific region(s) within the 3′ untranslated region (3′ UTR) of target messenger RNAs (mRNAs). A single miRNA can interact with multiple target mRNAs. Depending on the degree of miRNA/mRNA complementarity, degradation or repression of the targeted mRNA(s) is triggered. Pairing with a fully complementary target leads to cleavage of the target mRNA and subsequent miRNA and mRNA degradation. However, pairing with imperfect complementarity can lead to AGO2-mediated RNA interference. The reported mechanisms include: (i) recruitment, by the GW182 component of RISC, of associated proteins that deadenylate, decap and degrade the target mRNA, (ii) Eukaryotic Translation Initiation Factor 4A2 (eIF4A2) acting as a “roadblock” to inhibit the ribosome-scanning step of initiation, and (iii) translational activation through recruitment of AGO2 and FXR1 instead of GW182. Of note, the miRNA-RISC can shuttle between the cytoplasm and the nucleus through Importin-8 or Exportin-1, highlighting the ability of newly transcribed miRNAs to act in different cellular compartments.
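Seed-based target recognition can be sketched in a few lines of Python. The example below is a simplified illustration, not a published prediction algorithm: it takes nucleotides 2–7 of a miRNA as the seed, and reports every position in a 3′ UTR that is perfectly complementary to it. The function names and the UTR sequence are hypothetical, and real tools also weigh conservation, free energy and site accessibility.

```python
# Minimal sketch of seed-site scanning (hypothetical helper names).
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def to_rna(seq):
    """Upper-case a sequence and convert T to U."""
    return seq.strip().upper().replace("T", "U")


def reverse_complement(seq):
    """Return the reverse complement of an RNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(to_rna(seq)))


def seed_sites(mirna, utr, seed_start=2, seed_end=7):
    """Return 0-based UTR positions that pair perfectly with the miRNA seed.

    The seed spans nucleotides seed_start..seed_end (1-based, inclusive).
    """
    seed = to_rna(mirna)[seed_start - 1:seed_end]
    site = reverse_complement(seed)  # sequence expected in the mRNA
    target = to_rna(utr)
    positions = []
    index = target.find(site)
    while index != -1:
        positions.append(index)
        index = target.find(site, index + 1)  # allow overlapping matches
    return positions


mirna = "UGAGGUAGUAGGUUGUAUAGUU"   # illustrative 22-nt miRNA
utr = "AAACUACCUCAGGG"             # made-up 3' UTR fragment
print(seed_sites(mirna, utr))
```

Because only six nucleotides are matched, such a scan returns many sites by chance in long UTRs, which is one reason a single miRNA can be predicted to interact with many mRNAs.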
Beyond the regulation of their production, several processes modify miRNA function. MiRNAs also have functional roles in transcriptional gene silencing through DNA modification and deposition of repressive histone marks, in promoting a transcriptionally active chromatin state, and in altering alternative splicing profiles. Alternative splicing, alternative polyadenylation affecting 3′ UTRs, and cell type-specific RNA-binding proteins that affect target mRNA secondary structures can change the available pool of miRNA binding targets. Moreover, the subcellular localization of a given miRNA-RISC modulates its ability to bind target mRNAs. These cell type-specific and biological state-specific factors contribute to the specificity of miRNA action. Lastly, miRNAs can be released into and detected in extracellular fluids and delivered to different cells, and so act as regulators in autocrine, paracrine and/or endocrine processes.
1.2.2. piRNA-mediated mechanisms
The most well-known function of piRNAs is the silencing of transposons in germline cells to ensure genome stability during gametogenesis. Similar to the lesser-known function of miRNAs, piRNAs primarily act as guides for PIWI proteins and drive histone modifications promoting heterochromatin assembly and DNA methylation.
PIWI proteins are mainly found in the nucleus and co-localize with Polycomb group proteins, playing crucial roles as epigenetic modifiers. Knockout of PIWI proteins decreases histone H3 lysine 9 methylation, a marker of repressed gene expression. The complementary sequence of the piRNAs is responsible for directing these proteins to specific targets in the genome and recruiting epigenetic factors, thereby putatively participating in epigenetic control, cell metabolism and genome stability. Alterations in piRNA expression have significant implications for the biology of stem cells and cancer.
2. MicroRNA expression profiling in cancer
2.1. MicroRNA detection
Various experimental approaches can be used for measuring miRNA expression levels. The most frequently used are quantitative PCR (qPCR), digital color-coded barcoding profiling, miRNA microarrays, and high-throughput RNA sequencing (RNA-seq) methods. Material considerations and experimental aims dictate which approach is optimal. While qPCR is efficient for analyzing a few miRNAs, array- and sequencing-based methods offer parallel analyses of multiple miRNAs. Experiments that aim to discover previously undescribed transcripts require RNA-seq approaches.
2.2. MicroRNA expression in cancer
MiRNA expression has been shown to be dysregulated in all stages of cancer and in nearly every cancer type [55, 56, 57]. Genome-wide profiling has demonstrated that miRNA expression signatures are associated with tumor type, tumor grade and clinical outcomes; thus, miRNAs are potential candidates for diagnostic and prognostic biomarkers, as well as therapeutic targets [56, 58, 59]. In fact, miRNA expression signatures have been observed to be impacted by smoking status in lung adenocarcinoma patients. Furthermore, the expression patterns of miRNAs may be able to supplement the diagnostic utility of mRNAs, particularly for key tumor features such as subtype identification [58, 60]. There are currently ~2400 human miRNAs annotated in miRBase.
|Category|Resource|Description|
|---|---|---|
|Gene expression databases|ArrayExpress|EMBL-EBI ArrayExpress functional genomics data|
||GEO|NCBI Gene Expression Omnibus|
||Oncomine|Web applications for translational bioinformatics|
||TCGA|NIH The Cancer Genome Atlas|
|miRNA databases|miRBase|miRBase 22: the microRNA database|
||miRCancer|microRNA Cancer Association Database|
||SomamiR DB|Somatic mutations altering microRNA-ceRNA interactions|
||TransmiR|Transcription factor microRNA regulations|
|piRNA databases|piRBase|piRNA annotation and function analyses|
||piRNABank|Web analysis of mammalian and Drosophila piRNAs|
||piRNA cluster database|Resource for genomic piRNA clusters|
|miRNA discovery tools|deepBase|Annotate and discover small, long and circular ncRNAs|
||miRDeep|Identification of novel and known miRNAs in NGS data|
||miRMaster|miRNA analysis framework, novel miRNA detection, isoforms and variants search|
||miRNAkey|Software for the analysis of miRNA sequencing data|
||OASIS|Online small-RNA detection and prediction platform|
||Tools4miRs|Curation of methods for miRNA analysis|
|miRNA target prediction tools|miRDB|miRNA target prediction and functional annotations|
||miRTargetLink|Human microRNA-mRNA interaction networks|
||miRWalk|Online prediction of microRNA binding sites|
||pathDIP|Pathway enrichment analysis by online data integration|
||TargetScan|Predict target sites of conserved miRNAs|
2.3. Identification of novel microRNA sequences
The annotated human miRNA transcriptome mainly contains abundant and conserved miRNA sequences. Therefore, cell lineage- and tissue-specific miRNAs, especially the less abundant species, may not necessarily be included in current miRBase annotations. Re-analyses of high-throughput sequencing data of human tissues, cancers and cell lines have resulted in large-scale discoveries of previously unannotated miRNAs that are expressed in a tissue-specific manner [55, 62, 63, 64].
A wide range of stand-alone and web-based miRNA discovery bioinformatics tools have been designed to quantify miRNA expression and to predict miRNA candidates and their isoforms from small RNA sequencing data (Table 2). These tools align the small RNA sequences to reference genomes and predict putative novel miRNA precursors based on the molecular features of these sequences, such as their folding characteristics, the formation of hairpin structures, and whether the precursor gives rise to the three products of miRNA processing by Dicer: 5′ and 3′ arm sequences (the mature and star strands) and a hairpin loop (Figure 2). Additionally, other filtering criteria may be incorporated to further enrich for real miRNA candidates, such as GC content, seed sequence composition and similarity to known sequences, as well as expression considerations. Comparing the features of the novel miRNA candidates to annotated miRNA species present in public repositories, such as miRBase, therefore allows estimation of the probability that a candidate is a real miRNA, as well as confirmation of its novelty.
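Two of the secondary filters mentioned above, GC content and similarity to known sequences, can be sketched as follows. This is a deliberately simple illustration: the function names, the GC bounds (0.3 to 0.7) and the example sequences are placeholders chosen for the example rather than values taken from any tool in Table 2, and novelty is reduced to an exact-match lookup. Hairpin folding and read-distribution checks, which do the real work in discovery tools, are not modeled.

```python
# Minimal sketch of post-prediction filtering (hypothetical names and cutoffs).

def to_rna(seq):
    """Upper-case a sequence and convert T to U."""
    return seq.strip().upper().replace("T", "U")


def gc_content(seq):
    """Fraction of G and C nucleotides in a sequence."""
    rna = to_rna(seq)
    if not rna:
        raise ValueError("empty sequence")
    return (rna.count("G") + rna.count("C")) / len(rna)


def screen_candidates(candidates, known, min_gc=0.3, max_gc=0.7):
    """Keep candidate names that are unannotated and within the GC bounds.

    candidates: dict mapping candidate name -> mature sequence
    known: iterable of annotated mature sequences (e.g. from a database export)
    """
    annotated = {to_rna(seq) for seq in known}
    kept = []
    for name, seq in candidates.items():
        rna = to_rna(seq)
        if rna in annotated:
            continue  # already annotated, so not novel
        if not min_gc <= gc_content(rna) <= max_gc:
            continue  # outside the placeholder GC range
        kept.append(name)
    return kept


candidates = {
    "cand1": "UGAGGUAGUAGGUUGUAUAGUU",  # identical to the 'known' entry below
    "cand2": "GGGGCCCCGGGGCCCCGGGGCC",  # GC content of 1.0
    "cand3": "ACGUACGUACGUACGUACGUAC",  # GC content of 0.5, unannotated
}
known = ["TGAGGTAGTAGGTTGTATAGTT"]
print(screen_candidates(candidates, known))
```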
2.4. Assessment of miRNA expression and biological function from sequencing data
To estimate miRNA expression levels, high-quality sequence reads that map to individual miRNAs are quantified and normalized for differences in sequencing depth to allow for comparison between samples. A variety of statistical tests can be applied to determine differential expression. For example, tissue specificity of the miRNAs derived from a given organ site can be assessed by comparing expression patterns across tissue types, using Principal Component Analysis (PCA) or nonlinear t-Distributed Stochastic Neighbor Embedding (t-SNE) [62, 63]. Additionally, differential expression of miRNAs between biological states, such as neoplastic versus nonmalignant tissue samples, can be compared using various standard parametric or nonparametric statistical tests (Figure 3) [63, 64].
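Depth normalization can be illustrated with reads per million (RPM), one common scaling choice; the text above does not specify which normalization was used, so RPM is an assumption here. The sketch below scales raw counts by library size and computes a log2 fold change between two samples. The function names, the pseudocount of 1 and the toy counts are all illustrative, and a fold change alone is not a statistical test.

```python
# Minimal sketch of depth normalization and fold change (hypothetical names).
import math


def reads_per_million(counts):
    """Scale raw read counts (dict: miRNA -> count) to reads per million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library has no mapped reads")
    return {name: count * 1e6 / total for name, count in counts.items()}


def log2_fold_change(case, control, pseudocount=1.0):
    """log2 ratio of two normalized values; the pseudocount avoids log(0)."""
    return math.log2((case + pseudocount) / (control + pseudocount))


# Toy libraries of different depth (made-up numbers).
tumor = {"miR-A": 300, "miR-B": 700}
normal = {"miR-A": 1000, "miR-B": 9000}

tumor_rpm = reads_per_million(tumor)
normal_rpm = reads_per_million(normal)
for name in tumor:
    print(name, round(log2_fold_change(tumor_rpm[name], normal_rpm[name]), 2))
```

In this toy case the raw count of miR-A is lower in the tumor library, yet its normalized abundance is higher, which is why normalization has to precede any comparison between samples.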
Once miRNAs of interest are identified, their function can be assessed through in silico methods of gene-target prediction. Prediction of miRNA:mRNA targets enables the understanding of their involvement in genetic regulatory networks. Since one miRNA can target multiple gene transcripts, it is challenging to comprehensively capture regulatory targets without also yielding false predictions. Therefore, a variety of computational approaches have been developed for the prediction of miRNA:mRNA target interactions, which consider features such as (i) seed match, (ii) conservation, (iii) free energy, and (iv) site accessibility. The growing availability of high-throughput next-generation sequencing (NGS) data will not only lead to novel miRNA discovery but will also allow us to further elucidate the role of miRNA expression in human biology and in diseases such as cancer.
3. PIWI-interacting RNA expression profiling in cancer
PiRNAs are known to act in an evolutionarily conserved innate protection mechanism against transposable elements in germ cell genomes. Beyond the piRNA functions described in germ cells, there is increasing evidence of multifaceted action not restricted to transposon silencing in somatic cells. Although the function of piRNAs in somatic cells and their relationship with tumorigenesis and cancer progression are still unknown, many studies seek to evaluate PIWI proteins and piRNA expression in a variety of malignancies.
3.1. piRNA detection and resources
Since piRNAs resemble miRNAs in length and structure, the same expression profiling platforms are applicable, with small RNA sequencing, microarrays, and quantitative PCR being the most widely used. The identification of piRNAs is mainly performed by small RNA sequencing, by extracting the reads of the proper length (generally from 24 to 32 nucleotides) that present piRNA-like features. As previously discussed, piRNAs are frequently identified by a uridine in the first position, an adenosine at the 10th position, 2′-O-methylation at the 3′ end, and mapping to clusters in the genome. Although expression can be confirmed by in situ hybridization and Northern blotting, the co-immunoprecipitation assay is the gold standard technique. This analysis allows the isolation and characterization of RNAs physically interacting with PIWI proteins. However, the lack of highly specific antibodies for human PIWI proteins limits the discovery of relevant piRNAs. Functional studies using knockdown or knockout experiments for newly discovered piRNAs are fundamental to elucidating the biological role of these sequences.
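The first-pass read selection described above can be sketched as a simple filter. The example below keeps reads of 24 to 32 nt that start with uridine and tallies identical sequences. It is an illustrative sketch only: the function names and reads are hypothetical, and features such as 3′ methylation or genomic clustering cannot be assessed from the read sequence alone, so passing this filter does not establish that a read is a piRNA.

```python
# Minimal sketch of a piRNA-like read filter (hypothetical helper names).
from collections import Counter


def to_rna(read):
    """Upper-case a read and convert the DNA alphabet (T) to RNA (U)."""
    return read.strip().upper().replace("T", "U")


def is_pirna_like(read, min_len=24, max_len=32):
    """True if the read is min_len..max_len nt long and starts with U (1-U)."""
    rna = to_rna(read)
    return min_len <= len(rna) <= max_len and rna.startswith("U")


def count_pirna_like(reads):
    """Collapse identical piRNA-like reads into sequence -> count."""
    return Counter(to_rna(read) for read in reads if is_pirna_like(read))


# Made-up reads in the DNA alphabet, as typically found in FASTQ files.
reads = [
    "T" + "ACGT" * 6,  # 25 nt, 1-U: kept
    "T" + "ACGT" * 6,  # duplicate of the read above
    "A" + "ACGT" * 6,  # 25 nt but starts with A: dropped
    "T" + "ACGT" * 5,  # 21 nt: too short
    "T" + "ACGT" * 8,  # 33 nt: too long
]
print(count_pirna_like(reads))
```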
The increasing application of large-scale small RNA sequencing has enabled the discovery of a large number of piRNAs. The most widely used piRNA compendiums are piRBase and piRNABank, which contain millions of annotated human piRNA sequences (8,438,265 and 11,147,151 annotated piRNAs to date, respectively) (Table 2) [76, 77]. Despite the large number of annotated sequences in these databases and the many studies describing piRNA expression in somatic and malignant tissues, this knowledge must be considered with caution. It has been demonstrated that different piRNA databases include some RNA fragments that have similar sizes and features to piRNAs, representing possible contaminants. Nevertheless, sncRNAs derived from tRNAs have been reported to interact with PIWIL2 and to be deregulated in cancer.
PIWI-interacting RNAs regulate the expression of mRNAs by guiding PIWI proteins. Bioinformatics approaches have shown that approximately 28.5% of human mRNA sequences contain at least one retrotransposon sequence in their 3′ UTRs, and those mRNAs can be post-transcriptionally regulated by piRNAs. In addition, many piRNAs do not match transposon sequences, suggesting an even greater set of targets and functional roles for piRNAs. In fact, cross-linking immunoprecipitation (CLIP) analyses have unveiled many nontransposon mRNAs engaged with PIWI proteins. In Caenorhabditis elegans, it was previously demonstrated that piRNA action is analogous to that of miRNAs, in that seed sequences are required for mRNA targeting, but unlike miRNAs, piRNAs do not tolerate many mismatches outside this region. Potential piRNA targets can be retrieved using algorithms initially designed for miRNAs, such as miRanda, where stringent alignment scores (≥170) and free-energy scores (≤−20.0 kcal/mol) are required for piRNA analyses. However, the identification of piRNA targets is very challenging, as the targeting rules are still unresolved.
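The stringency thresholds quoted above can be applied as a post-filter on predicted piRNA:mRNA pairs. The sketch below assumes that predictions have already been parsed into records with an alignment score and a free energy; the `Hit` record, its field names and the example values are hypothetical and do not reflect the output format of miRanda or any other tool.

```python
# Minimal sketch of threshold filtering on predicted targets (hypothetical records).
from collections import namedtuple

# One predicted interaction: alignment score (unitless) and energy (kcal/mol).
Hit = namedtuple("Hit", ["pirna", "target", "score", "energy"])


def filter_hits(hits, min_score=170.0, max_energy=-20.0):
    """Keep hits with score >= min_score and free energy <= max_energy."""
    return [h for h in hits if h.score >= min_score and h.energy <= max_energy]


# Made-up predictions for illustration.
hits = [
    Hit("piR-x1", "GENE1", 182.0, -25.3),  # passes both thresholds
    Hit("piR-x1", "GENE2", 165.0, -27.0),  # score too low
    Hit("piR-x2", "GENE3", 175.0, -18.4),  # binding energy too weak
    Hit("piR-x3", "GENE4", 170.0, -20.0),  # exactly at both thresholds
]
for hit in filter_hits(hits):
    print(hit.pirna, hit.target, hit.score, hit.energy)
```

Both conditions must hold at once, since a more negative free energy indicates a more stable duplex while a higher alignment score indicates better sequence complementarity.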
3.2. piRNA profiles in cancer
PIWI proteins 1–4 and PIWI-related proteins (DDX4, HENMT1, MAEL and TDRD1) have been reported to be disrupted in tumor cell lines and patient samples [85, 86]. The sncRNA repertoire of cancer cell lines from the NCI-60 panel (59 cell lines from nine different tissues) was recently characterized, and piRNAs comprised the largest proportion of expressed transcripts, followed by miRNAs and snRNAs. In lung cancer cell lines, 555 piRNAs and piRNA-like sncRNAs (piRNA-Ls) were previously described as differentially expressed compared with lung bronchial epithelial cell lines. Among them, piR-L-163 was found to be downregulated in cancer cell lines and to interact with phosphorylated ERM, regulating cell proliferation, migration and invasion.
Interestingly, piRNA expression profiling studies in tumor tissues revealed that piRNAs can be influenced by etiologic factors, such as tobacco consumption and HPV infection in lung and head and neck cancer [88, 89, 90]. The piRNA transcriptome of 6260 samples (from 11 organs) from The Cancer Genome Atlas (TCGA) consortium was previously screened. Tumor samples presented a higher number of expressed piRNAs (n = 522) compared to somatic non-neoplastic tissues (n = 273), suggesting their potential as biomarkers. RNA sequencing found piR-1245 to be overexpressed and to have oncogenic roles in colorectal cancer, inducing proliferation, colony formation, invasion, and apoptosis resistance. Several other piRNAs have been reported to be overexpressed in numerous human malignancies [92, 93, 94, 95]. Conversely, piRNAs have also been described to have anti-tumor effects. For example, piR-39980 was demonstrated through functional assays to decrease proliferation, migration, invasion and colony formation, and to induce apoptosis, in fibrosarcoma cell lines upon piRNA-mimic transfection.
The role of piRNAs in the response to chemotherapy has also been addressed [97, 98, 99]. In PIWIL2-knockout mouse embryonic fibroblast models, the commonly overexpressed gene PIWIL2 was demonstrated to facilitate chromatin acetylation and relaxation in response to cisplatin treatment, leading to enhanced DNA repair and highlighting its potential role in treatment resistance. piR-FTH1 was reported to drive chemoresistance in breast tumor cell lines, where its repression could sensitize tumor cells to doxorubicin. Similarly, inhibition of piR-L-138 can increase apoptosis in cisplatin-treated lung cancer cell lines and patient-derived xenografts.
4. Emerging roles of sncRNA as cancer biomarkers
Considering the tissue specificity of miRNAs and piRNAs in cancerous and healthy samples [55, 56], several individual sncRNAs or sncRNA sets have been proposed as diagnostic or prognostic markers. A set of 24 miRNAs evaluated by qPCR has been shown to correctly discriminate malignant from benign thyroid nodules with high sensitivity and specificity, potentially avoiding unnecessary diagnostic thyroidectomies. In gastric adenocarcinoma, a three-piRNA recurrence risk signature was reported using the small RNA sequencing data from the TCGA database. Similarly, higher expression of piR-1245 was linked to lower overall survival in three independent cohorts of colorectal cancer patients.
The use of sncRNAs as liquid biopsy cancer markers is also under intense investigation [102, 103]. In fact, both miRNAs and piRNAs are detectable in human serum, as demonstrated by a recent study based on RNA sequencing analysis of 477 serum samples. Moreover, sncRNAs are enriched in extracellular vesicles (miRNAs ~40%, piRNAs ~40%), allowing for their export from the cell in which they were synthesized to affect cells at a distance. Models based on miRNA and piRNA combinations were able to correctly distinguish colon and prostate cancer patients from healthy individuals. A four-miRNA expression signature in the serum of triple-negative breast cancer patients was also demonstrated to be an optimal survival predictor. Recently, a qPCR assay comprising two targets (piR-5937 and piR-28876) and one reference piRNA (piR-28131) was proposed for the detection of early colon cancer. Despite the low piRNA levels in the serum of cancer patients, these piRNAs showed better detection sensitivity than currently used biomarkers such as CA19-9 and carcinoembryonic antigen (CEA).
Many studies are currently investigating the ability of miRNA/piRNA signatures to empower cancer screening through the prediction of cancer recurrence or progression, stratification of patients by prognosis, and prediction of tumor response to various treatments. However, more efforts are still needed to screen miRNA/piRNA biomarker candidates and further validate them in large cohorts.
Here, we summarized the roles of small noncoding RNAs in normal and disease biology and highlighted the importance of developing high-throughput sncRNA detection methods in genome analyses. Transcribed through a variety of mechanisms, these molecules act in the widespread and specific regulation of gene expression. However, before these results can be translated to the clinic, many factors must still be considered, including the development of effective and specific delivery systems for sncRNA-based therapeutics and the broad validation of these sequences in large external cohorts. As our ability to detect and validate these sequences develops, we will continue to uncover their biological functions and potential uses in the clinical management of many diseases, including cancer.
This work was supported by grants from the Canadian Institutes for Health Research (CIHR FDN-143345), and scholarships from CIHR, Vanier Canada, the BC Cancer Foundation, the Ligue nationale contre le cancer, the Fonds de Recherche en Santé Respiratoire (appel d’offres 2018 emis en commun avec la Fondation du Souffle), the Fondation Charles Nicolle, and the São Paulo Research Foundation (FAPESP 2015/17707-5 and 2018/06138-8). E.A.M. is a Vanier Canada Graduate Scholar.
Conflict of interest
The authors have no conflicts to declare.