Occurrence of lncRNA in various plant species.
Next-generation sequencing (NGS) technologies encompass advanced sequencing platforms that generate high-throughput RNA-seq data for exploring all aspects of the transcriptome. They include short-read sequencing approaches such as 454, Illumina, SOLiD, and Ion Torrent, and more advanced single-molecule long-read approaches including PacBio and nanopore sequencing. Together with computational approaches, these technologies are revealing the importance of the complex non-coding part of the genome, once dubbed “junk DNA.” The ready availability of high-throughput RNA-seq data has enabled the genome-wide identification of long non-coding RNAs (lncRNAs). High-confidence lncRNAs can be filtered from whole RNA-seq datasets using computational pipelines. With respect to their genomic location, they can be categorized into intergenic, intronic, sense, antisense, and bidirectional lncRNAs. In plants, lncRNAs are transcribed by the plant-specific RNA polymerases IV and V in addition to RNA polymerase II, and they mediate epigenetic regulation through RNA-directed DNA methylation (RdDM). lncRNAs regulate gene expression through a variety of mechanisms, including target mimicry, histone modification, and chromosome looping. Their differential expression during developmental processes and in response to different stresses indicates their diverse roles in plants.
- next-generation sequencing (NGS)
- high-throughput RNA-seq
- long non-coding RNA (lncRNA)
Next-generation sequencing (NGS) technologies provide a platform for producing high-throughput sequencing data in less time and at reduced cost. Tremendous improvements in recent years have allowed millions of DNA fragments to be sequenced in parallel, pushing genomics to a new frontier by capturing the fine details of DNA fragments. Earlier, the Maxam-Gilbert and Sanger sequencing techniques were the leading approaches after the discovery of the DNA structure. However, these techniques were time-consuming and limited to small-scale work, ranging from a few genes to the genomes of simple organisms. The need to sequence complex genomes in a short time and at reduced cost drove the technological advances that evolved into NGS technologies. NGS systems provide rapid, reproducible, and highly accurate sequencing, and are based on short-read sequencing approaches and more advanced single-molecule long-read sequencing. The short-read approaches depend on sequencing by synthesis (SBS) and sequencing by ligation (SBL) methods. These methods require pre-processing of the DNA before the sequencing steps, according to the requirements of the different NGS platforms. In the SBS approach, nucleotides are added by a polymerase to the elongating DNA strand, and a signal is recorded as fluorescence or as a change in ionic concentration for every nucleotide incorporated [5, 6]. In the SBL approach, by contrast, fluorophore-bound probes with one- or two-base matching are ligated to the adjacent oligonucleotide on the DNA fragment. The emitted fluorescence spectrum identifies the bases complementary to the probe at a specific position, and successive primer resets are used to read the complete DNA sequence.
Most short-read sequencing approaches require clonal amplification of the DNA on a solid surface, for example by bead-based or solid-state methods, or the generation of DNA nanoballs. In all of these methods, the DNA is first fragmented and then ligated to a common set of adaptors for amplification and subsequent sequencing. The short-read sequencing approaches include the 454, Illumina, SOLiD, and Ion Torrent platforms. Moreover, the
2. High-throughput RNA sequencing
The transcriptome consists of the whole set of transcripts present in a cell, together with their expression levels at a particular developmental stage and under particular cellular conditions. Detailed study of an organism at the transcriptome level is necessary for revealing the molecular constituents involved in that stage or condition of the tissue. High-throughput RNA sequencing (RNA-seq) has emerged as an important technique in transcriptomics for studying all aspects of gene expression at large scale, and it is one of the most commonly used techniques for quantifying and mapping transcriptomes. It involves the conversion of RNA into cDNA, followed by random sequencing of the cDNA fragments on NGS platforms. The millions of short reads generated are assembled by various bioinformatics approaches. Subsequent mapping of these short reads to a reference genome or gene set reveals the position of the gene from which each RNA was transcribed. High-throughput technologies also include direct RNA sequencing (DRS), in which native RNA is sequenced directly without a cDNA preparation step. The technique is suited to sequencing native poly(A)+ RNA where reverse transcription is undesirable. It is applicable to determining precise sequences, identifying alternative polyadenylation sites, and working with small amounts of nucleic acid. In the cap analysis of gene expression (CAGE) technique, RNAs with a 5′ cap are targeted. Short sequence tags are generated from the 5′ ends of the targeted RNAs, with one tag per RNA molecule, which allows precise mapping of the 5′ ends. Serial analysis of gene expression (SAGE) is another method for sequencing RNA molecules; it targets polyadenylated messages, and tags are generated near the 3′ ends, typically one internal tag per RNA molecule. Similarly, paired-end tags (PET) also target polyadenylated RNA molecules, but the sequence tag combines information from the 5′ and 3′ ends of the same RNA molecule.
Furthermore, rapid amplification of cDNA ends (RACE) is a PCR-based method used to identify unknown sequences adjoining a known region. Together with NGS technologies, it can be used for deep transcriptome sequencing of a particular locus. Targeted RNA sequencing is likewise aimed at specific loci: RNAs are selected using tiling microarrays and then sequenced. GRO-seq is an RNA profiling method that combines nuclear run-on experiments with NGS analysis to generate information on RNA polymerase complexes competent for transcription. High-throughput RNA-seq is thus helpful for identifying transcript species (messenger RNAs, non-coding RNAs, and small RNAs) in a short time, and for determining 5′ and 3′ splice sites, splicing patterns, and post-transcriptional modifications. Quantification of transcripts reveals changes in gene expression under different conditions.
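The quantification step mentioned above can be sketched in a few lines. The snippet below is a minimal illustration, not a method described in this chapter: it normalizes read counts to transcripts per million (TPM) and computes a log2 fold change between two conditions. The transcript names, lengths, counts, and the pseudocount of 1 are all hypothetical values chosen for illustration.

```python
import math

def tpm(counts, lengths_nt):
    """Convert raw read counts to transcripts per million (TPM)."""
    # Reads per kilobase of transcript corrects for transcript length.
    rpk = {t: counts[t] / (lengths_nt[t] / 1000.0) for t in counts}
    # Scaling by the per-sample RPK total corrects for sequencing depth.
    scale = sum(rpk.values()) / 1_000_000.0
    return {t: (v / scale if scale else 0.0) for t, v in rpk.items()}

def log2_fold_change(control, treated, pseudocount=1.0):
    """log2(treated / control) per transcript; the pseudocount avoids log(0)."""
    return {
        t: math.log2((treated[t] + pseudocount) / (control[t] + pseudocount))
        for t in control
    }

# Hypothetical transcripts and read counts (illustrative only).
lengths = {"tx_a": 1000, "tx_b": 2000, "tx_c": 500}
control_counts = {"tx_a": 100, "tx_b": 200, "tx_c": 50}
stress_counts = {"tx_a": 100, "tx_b": 800, "tx_c": 50}

control_tpm = tpm(control_counts, lengths)
stress_tpm = tpm(stress_counts, lengths)
lfc = log2_fold_change(control_tpm, stress_tpm)

for name, value in sorted(lfc.items()):
    print(f"{name}: log2FC = {value:+.2f}")
```

Because TPM is a relative measure, a strong increase in one transcript lowers the TPM of the others even when their raw counts are unchanged; dedicated differential expression tools account for this with more robust normalization and statistical testing.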
3. Long non-coding RNA (lncRNA)
3.1. Discovery and identification of lncRNA
In the era of NGS, high-throughput RNA-seq data have highlighted the importance of the non-coding part of the genome in gene function. Non-coding RNAs (ncRNAs) are transcribed from non-coding DNA, earlier called junk DNA. Extensive studies of transcriptomes from multiple species indicate that about 90% of the genome can be transcribed, whereas only a small portion of the transcribed regions potentially codes for proteins. ncRNAs are categorized into housekeeping and regulatory ncRNAs on the basis of their expression and their role in different cell types. Housekeeping ncRNAs (e.g., tRNA, rRNA, and snRNA) are prominently expressed and have a structural role in all cells. Regulatory ncRNAs, by contrast, show temporal expression in specific cell types and include microRNAs (miRNAs), small interfering RNAs (siRNAs), enhancer RNAs (eRNAs), promoter-associated RNAs (PARs), Piwi-interacting RNAs (piRNAs), and long non-coding RNAs (lncRNAs). A length criterion of >200 nt is used to identify lncRNAs in all organisms. lncRNAs form a major group of ncRNAs and regulate various biological processes through different molecular mechanisms.
In plants, the lncRNA was first reported in
|Sr. no.||Plant name||Number of lncRNAs||Tissues/organ/stress||Reference|
|2||~6480||Organ-specific and stress responsive|||
|3||2214||Cultured cells and synchronized vegetative cells|||
|4||2248||Three vegetative tissues and flower development|||
|5||3274||Fruit development and sex differentiation tissues|||
|6||1556||Floral, fruit tissue, and two vegetative tissues|||
|7||23,324||Control, osmotic, and salt stress in leaf and root tissues|||
|8||2224||Development and reproductive organs|||
|10||2542||Control and drought condition|||
|12||4422||Root, stem, and leaf|||
|13||10,774||In wild and ripening mutant|||
|1565||Tomato yellow-leaf curl virus stress|||
|14||44,698||Organ-specific and stress responsive|||
|7245||Leaves (under conditions of nitrogen deficiency and sufficiency)|||
3.2. Classification of lncRNAs
The biotypes of lncRNAs are defined with respect to their genomic location and are mainly categorized into intergenic, intronic, sense, antisense, and bidirectional lncRNAs. As the terms suggest, intergenic lncRNAs are transcribed from the region between two genes, while introns are the source of intronic lncRNAs. Sense and antisense lncRNAs are derived from regions overlapping exons on the sense and antisense strands, respectively. When the transcription of a lncRNA is initiated close to an adjacent mRNA on the complementary strand, it is termed a bidirectional lncRNA.
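The positional classes described above can be expressed as a simple decision procedure. The following is a minimal sketch under stated assumptions, not the pipeline used in any cited study: the `Gene` and `Transcript` records, the `classify` function, the example coordinates, and the 1000 nt window used to call a transcript bidirectional are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Gene:
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"
    exons: List[Tuple[int, int]] = field(default_factory=list)

@dataclass
class Transcript:
    chrom: str
    start: int
    end: int
    strand: str

def overlaps(a_start, a_end, b_start, b_end):
    """True if two closed intervals share at least one position."""
    return a_start <= b_end and b_start <= a_end

def tss(feature):
    """Transcription start site: left end on '+', right end on '-'."""
    return feature.start if feature.strand == "+" else feature.end

def classify(lnc, genes, bidirectional_window=1000):
    """Assign a positional class relative to annotated protein-coding genes.

    The window size is an assumed threshold, not a value from the text.
    """
    same_chrom = [g for g in genes if g.chrom == lnc.chrom]
    # Transcripts inside a gene body: exon overlap gives sense/antisense,
    # otherwise the transcript lies within an intron.
    for g in same_chrom:
        if not overlaps(lnc.start, lnc.end, g.start, g.end):
            continue
        if any(overlaps(lnc.start, lnc.end, s, e) for s, e in g.exons):
            return "sense" if lnc.strand == g.strand else "antisense"
        return "intronic"
    # Outside all genes: opposite strand with a nearby start site is
    # treated as bidirectional, everything else as intergenic.
    for g in same_chrom:
        if lnc.strand != g.strand and abs(tss(lnc) - tss(g)) <= bidirectional_window:
            return "bidirectional"
    return "intergenic"

# Hypothetical gene with three exons on the plus strand.
gene = Gene("chr1", 5000, 9000, "+", [(5000, 5500), (7000, 7400), (8500, 9000)])
examples = {
    "sense": Transcript("chr1", 7100, 7300, "+"),
    "antisense": Transcript("chr1", 7100, 7300, "-"),
    "intronic": Transcript("chr1", 6000, 6500, "+"),
    "bidirectional": Transcript("chr1", 4000, 4800, "-"),
    "intergenic": Transcript("chr1", 20000, 21000, "+"),
}
results = {name: classify(tx, [gene]) for name, tx in examples.items()}
print(results)
```

A production pipeline would typically read gene models from an annotation file and handle transcripts that overlap several genes; this sketch returns the class for the first overlapping gene only.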
4. Molecular mechanisms of the functioning of lncRNAs
Knowledge of the role of lncRNAs in gene regulation has grown dramatically in recent years with the availability of high-throughput RNA-seq data. In plants, studies remain limited in scale compared with animals, but the available reports suggest the different mechanisms described below.
4.1. lncRNA as target mimics of miRNA
Target mimicry is a mechanism by which lncRNAs regulate the functions of miRNAs. They inhibit the interaction between a miRNA and its respective targets by binding to the miRNA through a partially complementary sequence. The novel mechanism of target mimicry was first discovered in
4.2. Histone modification
lncRNAs are known to regulate gene expression through epigenetic changes, which may alter gene expression in plants. Vernalization is the best-known example of lncRNA-mediated epigenetic regulation in plants. In
4.3. Precursor lncRNA
lncRNAs constitute an important class of riboregulators by acting as precursors in the synthesis of shorter ncRNAs, such as miRNAs and siRNAs. In this mechanism, some lncRNAs are processed into shorter ncRNAs or may directly act as a precursor. The genes of primary miRNA transcripts (pri-miRNAs) encoding miRNAs are transcribed by RNA polymerase II. In plants, miRNAs constitute only a modest portion of the small regulatory ncRNA pool because of the presence of other complex small regulatory ncRNAs. In addition, plants have the plant-specific RNA polymerases IV and V, which are involved in the transcription of siRNAs and endogenous siRNAs. For example, in
4.4. RNA-directed DNA methylation
Chromatin modification is facilitated by lncRNAs and small RNAs (sRNAs), which recruit chromatin modifiers to specific locations in the DNA. RNA-directed DNA methylation (RdDM) is a conserved process that recruits DNA methyltransferases and histone modifiers for DNA methylation and repressive histone modification, respectively.
4.5. Chromosome looping
This mechanism differs from RdDM and histone modification in that it involves only structural changes of the chromatin, which in turn affect the binding potential of RNA polymerase and other transcription factors. A persuasive example of chromosome looping mechanism in plants by
4.6. Protein re-localization
The mechanism of lncRNA in protein relocalization was first described in
5. Expression profiling of lncRNAs
5.1. During developmental stages of different tissues
The expression of lncRNAs is regulated by different environmental and biological factors, reflecting their diverse biological roles. They exhibit spatial and temporal expression during different developmental stages of various plant tissues. In contrast to animals, little is known about the functioning of lncRNAs in plants. The available reports reveal their role in nodule formation, lateral root development, vegetative and gametophytic development, cell-wall synthesis, flowering time [27, 54], and several other processes. Expression profiles developed from high-throughput RNA-seq data of various plant organs mark lncRNAs as an indispensable unit of the transcriptome. For instance, the expression profiles of lncRNAs from root, leaf, stem, spike, and grain in three developmental stages of
5.2. Expression under biotic and abiotic stresses
The expression of lncRNAs is affected by biotic and/or abiotic factors in plants, but the mechanism remains poorly understood. Stress-responsive lncRNAs have emerged as an important component of the plant defense machinery. Their differential expression patterns in response to various biotic and abiotic stresses suggest diverse functions of lncRNAs at different intervals of stress exposure. For instance, the expression of 1832 lincRNAs is markedly affected after 2 h and/or 10 h of drought, salt, cold, and/or ABA (abscisic acid) treatments in
6. Databases for lncRNAs
New high-throughput technologies have driven an exponential rise in RNA-seq data from various plant species. A significant number of lncRNAs have been identified and characterized for their diverse biological roles. It is therefore necessary to organize these data in web-based platforms or databases for further improvement, updates, and analysis. With the aid of several computational tools, the data can be analyzed for phylogenetic relationships, expression patterns, molecular interactions, single nucleotide polymorphisms, epigenetic variations, etc., which assists in understanding lncRNAs in plants. The information in these databases can be organized for a single plant species or for numerous species. For instance, PLncRNAdb is specific for four plant species including
7. Biological roles of lncRNAs
Present knowledge of the function of lncRNAs in plants is still limited, and a large part of their functions and mechanisms is yet to be identified. Nevertheless, the biological role of lncRNAs has been studied in several plant species, as summarized in Table 2. Some of these roles are discussed here to highlight the importance of lncRNAs in plants.
|Sr. no.||Species name||Annotated lncRNAs||Biological role||Regulatory mechanism||References|
|1||Auxin-controlled development||Chromatin loop dynamics|||
|Lateral root development||Alternative splicing regulators|||
|Vegetative and gametophytic development||Antisense transcription|||
|Flowering time||Promoter interference|||
|Flowering time||Histone modification||[28, 54, 78]|
|Phosphate homeostasis||Target mimicry|||
|2.||Nodule formation||Protein re-localization|||
|3.||Cell-wall synthesis||siRNA precursor|||
|4.||Nodule formation||Protein re-localization|||
|5.||Phosphate homeostasis||Translational enhancer|||
|Fertility||Promoter interference||[81, 82]|
|6.||Local cytokinin synthesis||dsRNA degradation|||
7.1. lncRNA in plant fertility
The participation of lncRNAs in producing the male sterile lines in
7.2. Role in alternative splicing
Plant lncRNAs are known to increase the complexity of the transcriptome and proteome by participating in alternative splicing. This was first reported in Arabidopsis, where a lncRNA behaved as an alternative splicing competitor (ASCO). Together with the nuclear speckle RNA-binding protein (NSR), ASCO-lncRNA forms an alternative splicing regulatory module. The expression of AtNSR in primary and lateral root meristems regulates the development of lateral roots. In transgenic plants overexpressing ASCO-lncRNA, its interaction with AtNSR affects the splicing pattern of the mRNAs targeted by NSR [67, 87]. This indicates the role of lncRNAs as regulators of alternative splicing.
7.3. Plant lncRNAs in photomorphogenesis
Most plant growth and developmental processes are regulated by different climatic factors, among which light is one of the most important. The role of lncRNAs in the regulation of photomorphogenesis remains an interesting area of research because most of the identified regulatory molecules are proteins. In
8. Limitations in computational analysis of lncRNAs
The selection of lncRNAs from the complete set of RNAs is broadly based on three criteria: (i) a transcript length of ≥200 bp, (ii) a small open reading frame (ORF) of ≤300 bp, and (iii) no homology to known proteins. In addition, several other factors, such as the type of cDNA library or transcriptional sequence data, the depth of sequencing, and the coding potential of the transcripts, also contribute to the screening of lncRNAs. Challenges arise during computational analysis when a protein-coding gene fulfills the basic selection criteria yet encodes a functional peptide. Conversely, a functional long non-coding transcript may have an ORF of >300 bp and share homology with known protein-coding genes, which also hinders identification. Another challenge comes from transcripts that not only function as RNA molecules but also encode a peptide. Advances in computational approaches have been made to overcome these limitations and to differentiate more accurately between coding and non-coding transcripts. The use of support vector machines (SVMs) and other machine learning algorithms alongside these computational methods has increased the confidence with which coding and non-coding transcripts can be discriminated. However, the identity and function of each computationally identified lncRNA need to be validated individually by experimentation.
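The three basic selection criteria can be sketched as a simple filter. The code below is an illustrative sketch only, not a published pipeline: the function names, the example sequences, and the `ids_with_protein_hit` set (standing in for the results of a separate homology search that is assumed to have been run beforehand) are hypothetical. The ORF scan considers only complete ATG-to-stop ORFs in the three forward frames, which is a simplifying assumption.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def longest_orf_nt(seq):
    """Length in nucleotides of the longest complete ATG..stop ORF
    in the three forward reading frames (stop codon included)."""
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        orf_start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if orf_start is None and codon == "ATG":
                orf_start = i
            elif orf_start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - orf_start)
                orf_start = None
    return best

def is_candidate_lncrna(tx_id, seq, ids_with_protein_hit,
                        min_len=200, max_orf=300):
    """Apply the three basic criteria: length, ORF size, and homology."""
    if len(seq) < min_len:               # (i) transcript length >= 200
        return False
    if longest_orf_nt(seq) > max_orf:    # (ii) ORF no longer than 300
        return False
    if tx_id in ids_with_protein_hit:    # (iii) no homology to known proteins
        return False
    return True

# Hypothetical transcripts (illustrative sequences only).
transcripts = {
    "tx_long_orf": "ATG" + "GCT" * 120 + "TAA" + "C" * 50,   # 366 nt ORF
    "tx_candidate": "ATG" + "GCT" * 10 + "TAA" + "C" * 200,  # 36 nt ORF
    "tx_short": "ATGC" * 10,                                 # only 40 nt long
    "tx_homolog": "ATG" + "GCT" * 10 + "TAA" + "C" * 200,
}
ids_with_protein_hit = {"tx_homolog"}  # assumed output of a homology search

candidates = [
    tx_id for tx_id, seq in transcripts.items()
    if is_candidate_lncrna(tx_id, seq, ids_with_protein_hit)
]
print(candidates)
```

As the limitations above make clear, such hard thresholds misclassify short coding genes and long-ORF non-coding transcripts, which is why coding-potential classifiers and experimental validation are used in addition.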
Authors are grateful to Panjab University, Chandigarh, India for research facilities. SKU is grateful to Department of Science and Technology (DST), Government of India, Science and Engineering Research Board (SERB) for Early Career Research Award (ECR/2016/001270), and DST-INSPIRE Faculty Fellowship (IFA12-LSPA-09).