Occurrence of lncRNA in various plant species.
Next-generation sequencing (NGS) technologies encompass advanced sequencing platforms that generate high-throughput RNA-seq data for exploring all aspects of the transcriptome. They include short-read sequencing approaches such as 454, Illumina, SOLiD, and Ion Torrent, and more advanced single-molecule long-read approaches, including PacBio and nanopore sequencing. Together with computational approaches, these technologies are revealing the importance of the complex non-coding part of the genome, once dubbed “junk DNA.” The ready availability of high-throughput RNA-seq data has allowed the genome-wide identification of long non-coding RNAs (lncRNAs). High-confidence lncRNAs can be filtered from whole RNA-seq datasets using a computational pipeline. They can be categorized as intergenic, intronic, sense, antisense, or bidirectional lncRNAs according to their genomic localization. In plants, lncRNAs are transcribed by the plant-specific RNA polymerases IV and V in addition to RNA polymerase II, and they mediate epigenetic regulation through RNA-directed DNA methylation (RdDM). lncRNAs regulate gene expression through a variety of mechanisms, including target mimicry, histone modification, and chromosome looping. The differential expression of lncRNAs during developmental processes and in response to different stresses indicates their diverse roles in plants.
- next-generation sequencing (NGS)
- high-throughput RNA-seq
- long non-coding RNA (lncRNA)
Next-generation sequencing (NGS) technologies provide a platform for producing high-throughput sequencing data in less time and at reduced cost. The tremendous improvements of recent years have allowed millions of DNA fragments to be sequenced in parallel, which has moved genomics forward by capturing the fine details of DNA fragments. Earlier, the Maxam and Gilbert and the Sanger sequencing techniques were the leading approaches after the discovery of the DNA structure. However, these techniques were time-consuming and limited to small-scale work, ranging from a few genes to the genomes of simple organisms. The need to sequence complex genomes quickly and cheaply drove the technological advances that evolved into NGS technologies. NGS systems provide rapid, reproducible, and highly accurate sequencing, and are based on short-read sequencing approaches and on more advanced single-molecule long-read sequencing. The short-read approaches rely on sequencing by synthesis (SBS) and sequencing by ligation (SBL). These methods require pre-processing of the DNA before sequencing, according to the requirements of the particular NGS platform. In the SBS approach, a polymerase adds nucleotides to the elongating DNA strand, and a signal is recorded as fluorescence or as a change in ionic concentration for every nucleotide incorporated [5, 6]. In the SBL approach, by contrast, fluorophore-bound probes with one- or two-base matching are ligated to an adjacent oligonucleotide on the DNA fragments. The emitted fluorescence spectrum identifies the bases complementary to the probe at a specific position, and primer resets are used to read out the complete DNA sequence.
Most short-read sequencing approaches require clonal amplification of DNA on a solid surface, using bead-based or solid-state methods or the generation of DNA nanoballs. In all of these methods, the DNA is first fragmented, then ligated to a common set of adaptors for amplification, and subsequently sequenced. The short-read sequencing approaches include the 454, Illumina, SOLiD, and Ion Torrent platforms. In-silico approaches are then used to assemble the data generated by these techniques. Limitations of short-read approaches, for example in de novo sequencing and in resolving genomic variation, led to the development of more advanced long-read sequencing approaches. Long-read sequencing is used for complex genomes with many long repetitive elements, structural variations, and copy-number alterations, which are significant for the occurrence of disease and for evolution and adaptation [7, 8, 9]. These approaches produce reads of several kilobases and allow higher resolution of the genome. In contrast to short reads, a single long read can completely span a repetitive or complex region of the genome, thus reducing ambiguity in the size and position of genomic elements. Pacific Biosciences and Oxford Nanopore offer commercially available platforms that sequence long reads of thousands of bases per read. Both technologies are based on single-molecule sequencing but differ in how nucleotides are detected: Oxford Nanopore detects them through nanopores, while Pacific Biosciences uses optical detection inside zero-mode waveguides. In addition, in the synthetic approach, short-read sequencing data are combined with informatics and biochemical approaches to construct synthetic long reads.
Long reads also enable deep transcriptomic studies, such as the analysis of allele-specific transcription and alternative splicing, the identification of the exact connectivity of exons, and the discrimination of gene isoforms [6, 11, 12].
2. High-throughput RNA sequencing
The transcriptome consists of the whole set of transcripts present in a cell, together with their expression levels at a particular developmental stage and under particular cellular conditions. A detailed study of an organism at the transcriptome level is necessary for revealing the molecular constituents involved in that particular stage or condition of the tissue. High-throughput RNA sequencing (RNA-seq) has emerged as an important technique in transcriptomics for studying all aspects of gene expression at large scale. It is one of the most commonly used techniques for the quantification and mapping of transcriptomes. It involves the conversion of RNA into cDNA, followed by random sequencing of the cDNA fragments on NGS platforms. The millions of short reads generated are assembled by various bioinformatics approaches. Subsequent mapping of these short reads to a reference genome or gene set reveals the position of the gene from which each RNA was transcribed. High-throughput technologies also include direct RNA sequencing (DRS), in which native RNA is sequenced directly, without a cDNA preparation step. The technique is suited to sequencing native poly(A)+ RNA where reverse transcription is undesirable. It can be applied to precise sequence determination and to the identification of alternative polyadenylation sites, and it works with small amounts of nucleic acid. In the cap analysis of gene expression (CAGE) technique, RNAs with a 5′ cap are targeted. Short sequence tags are generated from the 5′ ends of the targeted RNAs, with one tag per RNA molecule, which allows precise mapping of the 5′ ends. Serial analysis of gene expression (SAGE) is another method for sequencing RNA molecules; it targets polyadenylated messages, and tags are generated near the 3′ ends, typically one internal tag per RNA molecule. Similarly, paired-end tags (PET) also target polyadenylated RNA molecules, but the sequence tag combines information from the 5′ and 3′ ends of the same RNA molecule.
Furthermore, rapid amplification of cDNA ends (RACE) is a PCR-based method used to identify unknown sequences adjoining a known region. Combined with NGS technologies, it can be used for deep transcriptome sequencing of a particular locus. Targeted RNA sequencing is likewise directed at a specific locus: RNAs are selected using tiling microarrays and then sequenced. Whereas conventional RNA profiling methods measure the steady-state levels of RNA, GRO-seq combines NGS analysis with nuclear run-on experiments to generate information on transcriptionally competent RNA polymerase complexes. High-throughput RNA-seq is helpful for rapidly identifying the transcripts (messenger RNAs, non-coding RNAs, and small RNAs) of a species and for determining the 5′ and 3′ splice sites, splicing patterns, and post-transcriptional modifications. The quantification of transcripts reveals changes in gene expression under different conditions.
3. Long non-coding RNA (lncRNA)
3.1. Discovery and identification of lncRNA
In the era of NGS, high-throughput RNA-seq data have highlighted the importance of the non-coding part of the genome in gene function. Non-coding RNAs (ncRNAs) are transcribed from non-coding DNA, earlier called junk DNA. Extensive study of transcriptomes from multiple species indicated that about 90% of the genome can be transcribed, whereas only a small portion of the transcribed regions potentially codes for proteins. The ncRNAs are categorized into housekeeping and regulatory ncRNAs on the basis of their expression and role in different cell types. Housekeeping ncRNAs (e.g., tRNA, rRNA, and snRNA) are prominently expressed and have a structural role in all cells. Regulatory ncRNAs, by contrast, show temporal expression in specific cell types and include microRNAs (miRNAs), small interfering RNAs (siRNAs), enhancer RNAs (eRNAs), promoter-associated RNAs (PARs), Piwi-interacting RNAs (piRNAs), and long non-coding RNAs (lncRNAs). A length criterion of >200 nt is used to identify lncRNAs in all organisms. lncRNAs constitute a major group of ncRNAs and regulate various biological processes through different molecular mechanisms.
In plants, lncRNA was first reported in Glycine max, where it was involved in changing the sub-cellular localization of a protein. In Medicago truncatula and Oryza sativa, the lncRNAs MtENOD40 and OsENOD40, respectively, were discovered in connection with nodule formation, which pointed to the involvement of lncRNAs in biological processes [25, 26]. Likewise, in Arabidopsis thaliana, the lncRNAs COLD-INDUCED LONG-ANTISENSE INTRAGENIC RNA (COOLAIR) and COLD-ASSISTED INTRONIC NONCODING RNA (COLDAIR) [27, 28], which are involved in the regulation of flowering, were identified and studied for their diverse functions in the plant system. Furthermore, the exponential rise in high-throughput RNA-seq data has contributed to the discovery of lncRNAs at the genome-wide level, although such studies in plants are limited to a few species. The combination of experimental RNomics with computational approaches has contributed to the identification of lncRNAs and of their functions in wide-ranging biological processes. Accurate identification and functional annotation remain an ongoing challenge in the bioinformatic analysis of high-throughput RNA-seq data. Data on the lncRNAs identified in plants are regularly deposited in different databases. A pipeline with multiple filters designed for the assembly and identification of high-confidence lncRNAs is shown in Figure 1 [30, 31]. The present status of most of the lncRNAs identified in different plant species is given in Table 1.
|Sr. no.||Plant name||Number of lncRNAs||Tissues/organ/stress||Reference|
|2||Arabidopsis thaliana||~6480||Organ-specific and stress responsive|||
|3||Chlamydomonas reinhardtii||2214||Cultured cells and synchronized vegetative cells|||
|4||Cicer arietinum||2248||Three vegetative tissues and flower development|||
|5||Cucumis sativus||3274||Fruit development and sex differentiation tissues|||
|6||Fragaria vesca||1556||Floral, fruit tissue, and two vegetative tissues|||
|7||Medicago truncatula||23,324||Control, osmotic, and salt stress in leaf and root tissues|||
|8||Oryza sativa||2224||Development and reproductive organs|||
|9||Physcomitrella patens||2711||Developmental stages|||
|10||Populus trichocarpa||2542||Control and drought condition|||
|11||Setaria italica||584||Drought stress|||
|12||Selaginella moellendorffii||4422||Root, stem, and leaf|||
|13||Solanum lycopersicum||10,774||In wild and ripening mutant|||
|||Solanum lycopersicum||1565||Tomato yellow-leaf curl virus stress|||
|14||Triticum aestivum||44,698||Organ-specific and stress responsive|||
|16||Zea mays||1704||Different tissues|||
|||Zea mays||7245||Leaves (under conditions of nitrogen deficiency and sufficiency)|||
3.2. Classification of lncRNAs
The biotypes of lncRNAs are defined by their genomic localization, and are mainly categorized into intergenic, intronic, sense, antisense, and bidirectional lncRNAs. As the terms suggest, intergenic lncRNAs are transcribed from the region between two genes, while introns are the source of intronic lncRNAs. Sense and antisense lncRNAs are derived from regions that overlap exons on the sense and antisense strands, respectively. When transcription of a lncRNA is initiated close to an adjacent mRNA on the complementary strand, it is termed a bidirectional lncRNA.
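The positional categories above can be illustrated with a minimal Python sketch. It is not the classification procedure of any cited pipeline: the `Gene` and `Transcript` records, the `classify` function, and the 1000 bp window used to call a transcript bidirectional are all hypothetical choices made for illustration, and real annotation files (for example GTF/GFF) would need to be parsed into these records first.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Transcript:
    chrom: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


@dataclass
class Gene(Transcript):
    # Exon intervals as (start, end) pairs; hypothetical minimal gene model.
    exons: List[Tuple[int, int]] = field(default_factory=list)


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if two closed intervals share at least one base."""
    return a_start <= b_end and b_start <= a_end


def tss(t: Transcript) -> int:
    """Transcription start site: left end on '+', right end on '-'."""
    return t.start if t.strand == "+" else t.end


def classify(lnc: Transcript, genes: List[Gene], window: int = 1000) -> str:
    """Assign a positional class to a lncRNA relative to annotated genes.

    `window` (bp) is an assumed cutoff for calling a transcript
    bidirectional; the chapter does not specify one.
    """
    same_chrom = [g for g in genes if g.chrom == lnc.chrom]
    # 1) Overlap with a gene body: exon overlap gives sense/antisense,
    #    otherwise the transcript lies within an intron.
    for g in same_chrom:
        if overlaps(lnc.start, lnc.end, g.start, g.end):
            if any(overlaps(lnc.start, lnc.end, s, e) for s, e in g.exons):
                return "sense" if lnc.strand == g.strand else "antisense"
            return "intronic"
    # 2) No overlap: opposite strand with a nearby start site is
    #    treated as bidirectional (a simplification).
    for g in same_chrom:
        if g.strand != lnc.strand and abs(tss(lnc) - tss(g)) <= window:
            return "bidirectional"
    # 3) Everything else lies between genes.
    return "intergenic"


gene = Gene("chr1", 1000, 5000, "+", exons=[(1000, 2000), (4000, 5000)])
for name, t in {
    "in_intron": Transcript("chr1", 2500, 3500, "+"),
    "exon_opposite": Transcript("chr1", 1500, 1800, "-"),
    "upstream_opposite": Transcript("chr1", 200, 800, "-"),
    "far_away": Transcript("chr1", 20000, 21000, "+"),
}.items():
    print(name, classify(t, [gene]))
```

The order of the checks matters in this sketch: overlap with a gene body is tested before proximity, so a transcript is only considered bidirectional or intergenic if it overlaps no annotated gene.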
4. Molecular mechanisms of the functioning of lncRNAs
Knowledge of lncRNAs in gene regulation has grown exponentially in recent years with the availability of high-throughput RNA-seq data. In plants, studies remain small in scale compared with those in animals, but the available reports suggest the following mechanisms.
4.1. lncRNA as target mimics of miRNA
Target mimicry is a mechanism by which lncRNAs regulate the functions of miRNAs. The lncRNA inhibits the interaction between a miRNA and its targets by binding to the miRNA through a partially complementary sequence. This novel mechanism was first discovered in Arabidopsis, where INDUCED BY PHOSPHATE STARVATION 1 (IPS1) was the first lncRNA identified as an endogenous target mimic (eTM) of miR399, which is involved in phosphate homeostasis. During phosphate starvation, the expression of miR399 is induced in companion cells and phloem. Subsequently, the expression of the PHO2 gene, a target of miR399, is repressed [47, 48, 49, 50]. This gene encodes UBC24 (an E2 ubiquitin conjugase-related enzyme), and the reduction in its expression leads to increased expression of Pht1;8 and Pht1;9 (phosphate transporter genes) in roots [47, 48]. By sequestering miR399, IPS1 attenuates this repression of PHO2. Later, a similar mechanism was discovered in animals, including humans, suggesting that target mimicry is a prevalent phenomenon [51, 52].
4.2. Histone modification
lncRNAs are known to regulate gene expression through epigenetic changes, which may alter gene expression in plants. Vernalization is the best-known example of lncRNA-mediated epigenetic regulation in plants. In Arabidopsis, the FLOWERING LOCUS C (FLC) gene is the principal regulator of the vernalization process and controls flowering time. The expression of this gene is regulated by the COOLAIR and COLDAIR lncRNAs through histone modifications.
4.3. Precursor lncRNA
lncRNAs constitute an important class of riboregulators by acting as precursors in the synthesis of shorter ncRNAs, such as miRNAs and siRNAs. In this mechanism, some lncRNAs are processed into shorter ncRNAs or may act directly as precursors. The genes for primary miRNA transcripts (pri-miRNAs) are transcribed by RNA polymerase II. In plants, miRNAs constitute a modest portion of the small regulatory ncRNA pool because of the presence of other complex small regulatory ncRNAs. In addition, plants have the plant-specific RNA polymerases IV and V, which are involved in the transcription of siRNAs and endogenous siRNAs. For example, in Triticum aestivum, 19 lncRNAs were predicted to be precursors of 28 miRNAs. In Arabidopsis, the 24-nt sequences of several siRNAs matched five lncRNAs (npc34, npc351, npc375, npc520, and npc523), which were considered potential precursor lncRNAs. The mapping of siRNAs to both strands of these lncRNAs strengthened the findings.
4.4. RNA-directed DNA methylation
Chromatin modification is facilitated by lncRNAs and small RNAs (sRNAs), which recruit chromatin modifiers to specific locations in the DNA. RNA-directed DNA methylation (RdDM) is a conserved process that recruits DNA methyltransferases and histone modifiers for DNA methylation and repressive histone modification, respectively.
4.5. Chromosome looping
This mechanism differs from RdDM and histone modification in that it involves only structural changes of chromatin, which affect the binding potential of RNA polymerase and other transcription factors. A persuasive example of chromosome looping in plants is the APOLO lncRNA, which influences auxin transport by regulating the expression of PID, a regulator of polar auxin transport. When the APOLO locus is transcribed by RNA Pol V and modified through RdDM, the locus is silenced and loops to PID, which inhibits PID transcription. In contrast, when RNA Pol II transcribes APOLO, loop formation is restrained and PID is expressed.
4.6. Protein re-localization
The role of lncRNA in protein re-localization was first described in G. max and Medicago sativa [61, 62]. The symbiotic interactions between soil bacteria and leguminous plants are regulated by the Enod40 gene (early nodulin gene), which is induced by nitrogen-fixing bacteria in the pericycle and the dividing cortical cells of roots [24, 63]. Its presence in non-leguminous plants, such as rice, suggests that Enod40 lncRNA occurs widely [26, 64]. The secondary structure of Enod40 lncRNA is highly stable and has five highly conserved domains. The ORF of Enod40 is very short and encodes two short peptides. These short peptides regulate the biological activities of Enod40 and consequently help in nodulation [65, 66]. In M. truncatula, Enod40 has been reported to re-localize MtRBP1 (Medicago truncatula RNA binding protein 1). Enod40 interacts directly with MtRBP1 and re-localizes the protein from nuclear speckles into cytoplasmic granules during nodulation.
5. Expression profiling of lncRNAs
5.1. During developmental stages of different tissues
The expression of lncRNAs is regulated by different environmental and biological factors, which points to their diverse biological roles. They exhibit spatial and temporal expression during different developmental stages of various plant tissues. In contrast to animals, little is known about the functioning of lncRNAs in plants. The available reports reveal their role in nodule formation, lateral root development, vegetative and gametophytic development, cell-wall synthesis, flowering time [27, 54], and several other processes. Expression profiles developed from high-throughput RNA-seq data of various plant organs mark lncRNAs as an indispensable part of the transcriptome. For instance, the expression profiles of lncRNAs from root, leaf, stem, spike, and grain at three developmental stages of T. aestivum suggest a role in developmental processes. Furthermore, lncRNAs show differential expression patterns comparable to those of mRNAs, which highlights their function in the related stages. The differential expression of lncRNAs in 11 tissues of chickpea and 13 of maize supports these findings. These results also indicate a higher number of lncRNAs in actively dividing cells and reproductive tissues than in other tissues [30, 33, 42, 43]. Depending on their expression values, lncRNAs can be divided into categories ranging from very low to very high expression [30, 31]. To normalize and estimate expression levels, fragments per kilobase of transcript per million mapped reads (FPKM), reads per kilobase of transcript per million mapped reads (RPKM), or transcripts per million (TPM) are calculated. Differences in expression levels among tissues of different plants may be related to differences in genetic makeup and in the depth of transcriptome sequencing. The tissue specificity index (TSI) is also calculated to study the differential expression pattern of lncRNAs.
The value of TSI ranges from zero to one: zero for housekeeping genes and one, or close to one, for strictly tissue-specific genes. TSI analysis has revealed that lncRNAs are involved in flower and fruit development in Fragaria vesca, flower development in Cicer arietinum, fiber development in Gossypium arboreum, and the development of root and floral tissues in Morus notabilis. In addition to TSI, cell-type specificity can be assessed for lncRNAs expressed in specific cells. For instance, cell-type-specific lncRNAs have been identified in specialized cells of Arabidopsis, although their expression was lower than that of mRNAs. Knowledge of lncRNAs in plants is limited, but the growing analysis of high-throughput RNA-seq data has allowed their biological roles to be predicted through expression profiling.
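The normalization and tissue-specificity measures mentioned above can be sketched in a few lines of Python. The chapter does not give the formulas, so this sketch assumes the commonly used definitions: FPKM as fragments scaled by transcript length in kilobases and by millions of mapped fragments, TPM as length-normalized rates rescaled to sum to one million, and a tau-style index, sum(1 - x_i/x_max)/(n - 1), as the tissue specificity index. The function names and toy numbers are hypothetical.

```python
from typing import List, Sequence


def fpkm(counts: Sequence[float], lengths_bp: Sequence[int]) -> List[float]:
    """Fragments per kilobase of transcript per million mapped fragments.

    Assumes the total number of mapped fragments equals sum(counts).
    """
    total = sum(counts)
    return [c * 1e9 / (length * total) for c, length in zip(counts, lengths_bp)]


def tpm(counts: Sequence[float], lengths_bp: Sequence[int]) -> List[float]:
    """Transcripts per million: per-base rates rescaled to sum to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    total_rate = sum(rates)
    return [r / total_rate * 1e6 for r in rates]


def tissue_specificity(expr: Sequence[float]) -> float:
    """Tau-style index across tissues: 0 = uniform, 1 = single tissue.

    This formulation is an assumption; the chapter only states the
    0-to-1 range of the TSI.
    """
    n = len(expr)
    if n < 2:
        raise ValueError("need expression values from at least two tissues")
    x_max = max(expr)
    if x_max == 0:
        return 0.0  # not expressed anywhere; treated as non-specific here
    return sum(1 - x / x_max for x in expr) / (n - 1)


# Toy example: three transcripts in one library.
counts = [100, 400, 500]
lengths = [1000, 2000, 500]
print([round(v, 1) for v in fpkm(counts, lengths)])
print([round(v, 1) for v in tpm(counts, lengths)])

# Toy example: one lncRNA measured in four tissues.
print(tissue_specificity([5.0, 5.0, 5.0, 5.0]))   # uniform expression
print(tissue_specificity([0.0, 0.0, 12.0, 0.0]))  # single-tissue expression
```

In practice these values would be computed per sample from a read-count matrix; TPM has the convenient property that values sum to the same total in every sample, which makes proportions comparable across libraries.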
5.2. Expression under biotic and abiotic stresses
The expression of lncRNAs is affected by biotic and abiotic factors in plants, but the mechanism remains poorly understood. Stress-responsive lncRNAs have emerged as an important component of the plant defense machinery. Their differential expression patterns in response to various biotic and abiotic stresses suggest diverse functions at different intervals of stress exposure. For instance, in Arabidopsis the expression of 1832 lincRNAs is markedly affected after 2 h and/or 10 h of drought, salt, cold, and/or abscisic acid (ABA) treatment. Moreover, the expression of one candidate stress-responsive lincRNA increased after treatment with elf18 (EF-Tu), which activates pathogen-associated molecular pattern responses. Likewise, in T. aestivum, 283 lncRNAs were identified as fungal-responsive, of which 254 and 52 lincRNAs were specifically expressed after infection with Blumeria graminis f. sp. tritici and Puccinia striiformis f. sp. tritici, respectively. Later, a total of 44,698 lncRNAs, comprising both stress-responsive and tissue-specific lncRNAs, were identified in T. aestivum. In Solanum lycopersicum, 1565 lncRNAs were expressed in response to tomato yellow-leaf curl virus. In Populus trichocarpa, 2542 lncRNAs were expressed under drought stress. The exploration of lncRNAs in various plant species under different stress conditions points to their dynamic role in plant defense.
6. Databases for lncRNAs
New high-throughput technologies have driven an exponential rise in RNA-seq data from various plant species. A large number of lncRNAs have been identified and characterized for their diverse biological roles. It is therefore necessary to organize these data in web-based platforms or databases for further improvement, updating, and analysis. With the aid of several computational tools, the data can be analyzed for phylogenetic relationships, expression patterns, molecular interactions, single nucleotide polymorphisms, epigenetic variations, and so on, which assists in understanding lncRNAs in plants. The information in these databases can be organized for a single plant species or for many. For instance, PLncRNAdb covers four plant species, A. thaliana, A. lyrata, P. trichocarpa, and Z. mays, and contains 5000 lncRNAs. Information on 37 plants and 6 algae, with data on >120,000 lncRNAs, can be accessed in the GreeNC database. NONCODE v4 and PLncDB hold information on 3853 and >13,000 lncRNA transcripts, respectively, in Arabidopsis. Some databases cover both coding and non-coding transcripts, such as PlantNATsDB, which holds data on NATs from 70 plant species. Some databases are plant-specific, such as TAIR10, PNRD, and PlantNATsDB, while others (e.g., RNACentral, lncRNAdb v2.0, and NONCODE v4) also contain information from other organisms. These well-managed databases will allow researchers to study lncRNAs in greater depth.
7. Biological roles of lncRNAs
Present knowledge of the function of lncRNAs in plants is still limited, and a large part of their functions and mechanisms is yet to be identified. Nevertheless, the biological roles of lncRNAs have been studied in several plant species, as summarized in Table 2. Some of these roles are discussed here to highlight the importance of lncRNAs in plants.
|Sr. no.||Species name||Annotated lncRNAs||Biological role||Regulatory mechanism||References|
|1||Arabidopsis thaliana||APOLO||Auxin-controlled development||Chromatin loop dynamics|||
|||Arabidopsis thaliana||ASCO-lncRNA||Lateral root development||Alternative splicing regulators|||
|||Arabidopsis thaliana||asHSFB2a||Vegetative and gametophytic development||Antisense transcription|||
|||Arabidopsis thaliana||COLDAIR||Flowering time||Promoter interference|||
|||Arabidopsis thaliana||COOLAIR||Flowering time||Histone modification||[28, 54, 78]|
|||Arabidopsis thaliana||IPS1||Phosphate homeostasis||Target mimicry|||
|2||Glycine max||GmENOD40||Nodule formation||Protein re-localization|||
|3||Hordeum vulgare||HvCesA6 lnc-NAT||Cell-wall synthesis||siRNA precursor|||
|4||Medicago truncatula||MtENOD40||Nodule formation||Protein re-localization|||
|5||Oryza sativa||Cis-NATPHO1;2||Phosphate homeostasis||Translational enhancer|||
|||Oryza sativa||LDMAR (P/TMS12-1)||Fertility||Promoter interference||[81, 82]|
|6||Petunia hybrida||SHO lnc-NAT||Local cytokinin synthesis||dsRNA degradation|||
|7||Solanum lycopersicum||TPS11||Phosphate homeostasis||Unknown|||
7.1. lncRNA in plant fertility
The participation of lncRNAs in producing male sterile lines of O. sativa is an important example of their role in plant fertility. These male sterile lines are necessary for hybridization and breeding. lncRNAs are known to induce photoperiod-sensitive male sterility (PSMS) in O. sativa [82, 86], but the mechanism is not completely understood. According to the available reports, two different lncRNA mechanisms are possible. In one, high expression of the long-day-specific male-fertility-associated RNA (LDMAR), a lncRNA, is important for the fertility of the rice plant under long-day (LD) conditions. In male sterile plants, programmed cell death (PCD) of anther cells occurs because of lowered expression of LDMAR under LD conditions. The reduced expression of LDMAR is mediated by overexpression of Psi-LDMAR (a siRNA), which is transcribed from the promoter region of LDMAR. Enhanced expression of Psi-LDMAR causes methylation of the promoter region through the RdDM mechanism. The other mechanism involves osa-smR5864w (a 21-nt sRNA), which is formed from a unique ncRNA encoded by LDMAR. A C-to-G point mutation in osa-smR5864w, resulting in loss of function, leads to the production of light- and temperature-sensitive male sterile lines of rice.
7.2. Role in alternative splicing
Plant lncRNAs are known to increase the complexity of the transcriptome and proteome by participating in alternative splicing. This was first reported in Arabidopsis, where a lncRNA behaved as an alternative splicing competitor (ASCO). Together with the nuclear speckle RNA-binding protein (NSR), ASCO-lncRNA forms an alternative splicing regulatory module. The expression of AtNSR in primary and lateral root meristems regulates the development of lateral roots. In transgenic plants, the interaction of AtNSR with overexpressed ASCO-lncRNA affects the splicing pattern of mRNAs targeted by NSR [67, 87]. This indicates a role for lncRNA as a regulator of alternative splicing.
7.3. Plant lncRNAs in photomorphogenesis
Most plant growth and developmental processes are regulated by climatic factors, among which light is one of the most important. The role of lncRNAs in the regulation of photomorphogenesis remains an interesting area of research because most of the regulatory molecules identified so far are proteins. In A. thaliana, several light-responsive lncRNAs associated with histone modifications have been identified. HIDDEN TREASURE 1 (HID1), a novel lncRNA involved in photomorphogenesis, has been identified and functionally characterized. It may control photomorphogenesis by regulating the expression of PHYTOCHROME INTERACTING FACTOR 3 (PIF3), a transcription factor involved in the light response. It could negatively regulate the expression of the PIF3 gene by binding to its promoter, either directly or in association with chromatin. HID1 homologs with conserved functions have been described in other plant species. These findings also shed light on the involvement of other ncRNAs in light responses.
8. Limitations in computational analysis of lncRNAs
The selection of lncRNAs from the complete set of RNAs is broadly based on three criteria: (i) a transcript length of ≥200 bp, (ii) a small open reading frame (ORF) of ≤300 bp, and (iii) no homology to known proteins. In addition, several other factors, such as the type of cDNA library or transcriptional sequence data, the depth of sequencing, and the coding potential of transcripts, also contribute to the screening of lncRNAs. A challenge arises in computational analysis when a protein-coding gene fulfills the basic selection criteria yet encodes a functional peptide. Conversely, a functional long non-coding transcript may have an ORF of >300 bp or share homology with known protein-coding genes, which also hinders identification. Another challenge comes from transcripts that not only function as RNA molecules but also encode a peptide. Advances in computational approaches have been made to overcome these limitations and to differentiate more accurately between coding and non-coding transcripts. The use of support vector machines (SVMs) or other machine learning algorithms alongside these computational methods has increased the confidence with which coding and non-coding transcripts are discriminated. However, the identity and function of each computationally identified lncRNA need to be validated individually by experimentation.
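As an illustration only, the three basic selection criteria can be expressed as a simple Python filter. This is a minimal sketch, not the pipeline of Figure 1: the function names are hypothetical, the ORF scan is a naive six-frame search for ATG-to-stop stretches (counting the stop codon), and the homology criterion is reduced to a precomputed set of transcript IDs that had a protein hit (for example from a separate similarity search), which is assumed to be supplied by the user.

```python
from typing import Dict, List, Set

STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def longest_orf_length(seq: str) -> int:
    """Length (bp, stop codon included) of the longest ATG..stop ORF
    found in any of the six reading frames; 0 if none is found."""
    seq = seq.upper().replace("U", "T")
    best = 0
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            orf_start = None
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if orf_start is None and codon == "ATG":
                    orf_start = i
                elif orf_start is not None and codon in STOP_CODONS:
                    best = max(best, i + 3 - orf_start)
                    orf_start = None
    return best


def is_lncrna_candidate(seq_id: str, seq: str, protein_hits: Set[str],
                        min_length: int = 200, max_orf: int = 300) -> bool:
    """Apply the three criteria: length, ORF size, and lack of homology.

    `protein_hits` is a hypothetical, precomputed set of transcript IDs
    with similarity to known proteins.
    """
    return (len(seq) >= min_length
            and longest_orf_length(seq) <= max_orf
            and seq_id not in protein_hits)


def filter_candidates(transcripts: Dict[str, str],
                      protein_hits: Set[str]) -> List[str]:
    """Return the IDs of transcripts that pass all three filters."""
    return [tid for tid, seq in transcripts.items()
            if is_lncrna_candidate(tid, seq, protein_hits)]


# Toy transcripts (synthetic sequences, not real data).
toy = {
    "t_noncoding": "AC" * 150,                  # 300 nt, no ORF
    "t_coding": "ATG" + "GCC" * 120 + "TAA",    # 366 nt, one long ORF
    "t_short": "AC" * 50,                       # 100 nt, too short
    "t_homolog": "AG" * 150,                    # 300 nt, has a protein hit
}
print(filter_candidates(toy, protein_hits={"t_homolog"}))
```

As the text notes, fixed thresholds of this kind misclassify short protein-coding genes and non-coding transcripts with long ORFs, which is why coding-potential classifiers and experimental validation are used in addition to such filters.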
Authors are grateful to Panjab University, Chandigarh, India for research facilities. SKU is grateful to Department of Science and Technology (DST), Government of India, Science and Engineering Research Board (SERB) for Early Career Research Award (ECR/2016/001270), and DST-INSPIRE Faculty Fellowship (IFA12-LSPA-09).