Open access peer-reviewed chapter

Transcriptome Analysis of Non‐Coding RNAs in Livestock Species: Elucidating the Ambiguity

By Duy N. Do, Pier-Luc Dudemaine, Bridget Fomenky and Eveline M. Ibeagha-Awemu

Submitted: November 18th 2016. Reviewed: May 23rd 2017. Published: September 13th 2017.

DOI: 10.5772/intechopen.69872

Abstract

The recent remarkable development of transcriptomics technologies, especially next generation sequencing technologies, allows deeper exploration of the hidden landscapes of complex traits and creates great opportunities to improve livestock productivity and welfare. Non-coding RNAs (ncRNAs), RNA molecules that are not translated into proteins, are key transcriptional regulators of health and production traits; thus, transcriptomic analyses of ncRNAs are important for a better understanding of the regulatory architecture of livestock phenotypes. In this chapter, we present an overview of common frameworks for generating and processing RNA sequence data to obtain ncRNA transcripts. Then, we review common approaches for analyzing ncRNA transcriptome data and present current state-of-the-art methods for identification of ncRNAs and functional inference of identified ncRNAs, with emphasis on tools for livestock species. We also discuss future challenges and perspectives for ncRNA transcriptome data analysis in livestock species.

Keywords

  • bioinformatics
  • genome editing
  • livestock species
  • long non-coding RNA
  • non-coding RNA
  • microRNA
  • transcriptome

1. Introduction

A vast portion of the mammalian transcriptome is composed of non-protein coding transcripts or non-coding RNA (ncRNA). Some ncRNAs are processed into functionally important transcripts such as microRNA (miRNA), transfer RNA (tRNA), ribosomal RNA (rRNA), small nucleolar RNA (snoRNA), small nuclear RNA (snRNA), small interfering RNA (siRNA), PIWI-interacting RNA (piRNA), circular RNA (circRNA), long non-coding RNA (lncRNA) and several classes with limited information about their functions. In addition to the well described ncRNA classes, clusters of ncRNA (22–200 nucleotides (nt)) were detected at the 5′ and 3′ ends of human and mouse genes, and named promoter-associated short RNAs (PASRs) and termini-associated short RNAs (TASRs) [1]. Mercer et al. [2] described a class of ncRNA, about 50–200 nt, that are processed from the 3′UTRs of protein-coding genes (uaRNAs). The uaRNAs are in sense direction to the protein-coding gene and show stage-, sex- and subcellular-specific expression. Another class of ncRNA is derived from tRNA precursors and named tRNA-derived RNA fragments (tRFs) or tRNA-derived small RNAs (tsRNAs); some appear to be processed by Dicer while others are processed independently of Dicer [3, 4]. Small nucleolar RNAs (snoRNAs) can also be processed into small miRNA-like molecules called sno-derived RNAs or sdRNAs [5, 6], which play roles in guiding enzymes to target RNAs for modification [7]. In this chapter, only the main classes of functional ncRNAs (miRNA, snoRNA, siRNA, piRNA and lncRNA), not considering the translation-related ncRNAs (rRNA and tRNA), will be further discussed. NcRNAs have been implicated in many biological processes including transcriptional interference, translational modifications, mRNA cleavage, epigenetic modifications, regulation of structural organization, modulation of alternative splicing, serving as small RNA precursors, and endo- or secondary siRNA generation [7–10].

2. Transcriptome analysis of non-coding RNA

2.1. Platforms for transcriptome analysis of non-coding RNA

Transcriptome analysis reached a turning point in its history with the arrival of high-throughput next-generation sequencing technologies like RNA-Sequencing (RNA-Seq) [11, 12]. Before this time, the microarray was the gold standard for transcript profiling or simultaneous measurement of the expression level of thousands of genes in a given sample [13, 14]. Microarray technology, however, has major drawbacks like non-specific probe hybridization signals and errors in background level measurements [15], as well as limited gene diversity since probes are designed to represent only a set of preselected genes. Unique hybridization properties of each probe may affect their dynamic range and thus create bias in data processing algorithms [16]. The flexibility offered by RNA-Seq technology enables detection of unknown splice junctions [17], novel transcripts [18], new single nucleotide polymorphisms (SNPs) [19] and many other features, all in the same assay. RNA-Seq technology has taken the possibility of fine-tuning our knowledge of the transcriptome to a much higher level. In recent years, RNA-Seq has proved its worth as a technology that will replace microarray in whole-genome transcript profiling [20–22]. Comparison of differential gene expression results between RNA-Seq data sets showed better overlap than comparison between RNA-Seq and microarray data [23, 24], supporting RNA-Seq as the preferred method to analyze the transcriptome. Moreover, correlation of transcriptome quantification by the two methods with transcript levels measured by shotgun mass spectrometry showed better estimation with RNA-Seq analysis [25]. As RNA-Seq technology has evolved, other new aspects have been included such as allele-specific transcriptome analysis. Moreover, since the RNA-Seq procedure does not rely on known genome annotation, but rather on all the information available in a given sample, there is a clear opportunity to make discoveries at a rate never expected before.

A diversity of platforms offers a wide range of RNA-sequencing possibilities [12]. For example, Illumina HiSeq and MiSeq technologies offer short sequence reads (36–300 base pairs (bp)) while Oxford Nanopore can reach sequence lengths of greater than 150 kilo base pairs (kb) [26]. Some sequencing techniques are DNA-polymerase dependent (i.e. sequencing-by-synthesis, e.g. Illumina MiSeq/HiSeq) while others like PacBio and Oxford Nanopore are single-molecule sequencers. The sequencing error rate ranges from 0.1% (Illumina MiSeq/HiSeq) to about 13% (PacBio RS II single pass). An overview of sequencing platforms and their characteristics is shown in Table 1. The error rate varies between platforms [27], so it is important to consider this, especially when the goal is to sequence short transcripts like miRNA.
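To make the point about error rates and short transcripts concrete, the following minimal sketch estimates the probability that a read contains no miscalled base. It assumes errors are independent and uniform along the read, which is a simplification; the function name is hypothetical and the two rates are the approximate per-base values listed in Table 1 (0.1% and ~13%).

```python
def prob_error_free(read_length: int, per_base_error: float) -> float:
    """Probability that a read of `read_length` bases has no miscalled base.

    Assumption: errors are independent and equally likely at every position.
    """
    if read_length < 0:
        raise ValueError("read_length must be non-negative")
    if not 0.0 <= per_base_error <= 1.0:
        raise ValueError("per_base_error must be between 0 and 1")
    return (1.0 - per_base_error) ** read_length


if __name__ == "__main__":
    mirna_length = 22  # a typical mature miRNA length, used for illustration
    for label, rate in [("0.1% per base", 0.001), ("13% per base", 0.13)]:
        p = prob_error_free(mirna_length, rate)
        print(f"{label}: P(error-free {mirna_length}-nt read) = {p:.3f}")
```

Under these assumptions, most 22-nt reads are error-free at the lower rate while only a small fraction are at the higher rate, which is one reason low-error short-read platforms are commonly chosen for miRNA work.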

| Platform | Read length¹ (base pair) | Throughput² | Number of reads³ | Error profile |
|---|---|---|---|---|
| Illumina MiniSeq (high output) | 75 (SE) | 1.6–1.8 Gb | 22–25 M | <1%, substitution |
| | 75 (PE) | 3.3–7.5 Gb | 44–50 M | <1%, substitution |
| | 150 (PE) | 6.6–7.5 Gb | 44–50 M | |
| Illumina MiniSeq (mid output) | 75 (SE) | 2.1–2.4 Gb | 14–16 M | <1%, substitution |
| Illumina MiSeq v2 | 36 (SE) | 540–610 Mb | 12–15 M | <0.1%, substitution |
| | 25 (PE) | 750–850 Mb | 24–30 M | <0.1%, substitution |
| | 150 (PE) | 4.5–5.1 Gb | 24–30 M | <0.1%, substitution |
| | 250 (PE) | 7.5–8.5 Gb | 24–30 M | <0.1%, substitution |
| Illumina MiSeq v3 | 75 (PE) | 3–4 Gb | 44–50 M | <0.1%, substitution |
| | 300 (PE) | 13–15 Gb | 44–50 M | <0.1%, substitution |
| Illumina NextSeq 500/550 (high output) | 75 (SE) | 25–30 Gb | 400 M | <1%, substitution |
| | 75 (PE) | 50–60 Gb | 800 M | <1%, substitution |
| | 150 (PE) | 100–120 Gb | 800 M | <1%, substitution |
| Illumina NextSeq 500/550 (mid output) | 75 (PE) | 16–20 Gb | ~260 M | <1%, substitution |
| | 150 (PE) | 32–40 Gb | ~260 M | <1%, substitution |
| Illumina HiSeq 2500 v2 Rapid run | 36 (SE) | 9–11 Gb | 300 M | 0.1%, substitution |
| | 50 (PE) | 25–30 Gb | 600 M | 0.1%, substitution |
| | 100 (PE) | 50–60 Gb | | 0.1%, substitution |
| | 150 (PE) | 75–90 Gb | | 0.1%, substitution |
| | 250 (PE) | 125–150 Gb | | 0.1%, substitution |
| Illumina HiSeq 2500 v3 | 36 (SE) | 47–52 Gb | 1.5 B | 0.1%, substitution |
| | 50 (PE) | 135–150 Gb | 3 B | 0.1%, substitution |
| | 100 (PE) | 270–300 Gb | | 0.1%, substitution |
| Illumina HiSeq 2500 v4 | 36 (SE) | 64–72 Gb | 2 B | 0.1%, substitution |
| | 50 (PE) | 180–200 Gb | 4 B | 0.1%, substitution |
| | 100 (PE) | 360–400 Gb | | 0.1%, substitution |
| | 125 (PE) | 450–500 Gb | | 0.1%, substitution |
| Illumina HiSeq3000/4000 | 50 (SE) | 105–125 Gb | 2.5 B | 0.1%, substitution |
| | 75 (PE) | 325–375 Gb | | 0.1%, substitution |
| | 150 (PE) | 650–750 Gb | | 0.1%, substitution |
| Illumina HiSeqX | 150 (PE) | 800–900 Gb | 2.6–3 B | 0.1%, substitution |
| | 150 (PE) | 167 Gb–6 Tb | 1.6–20 B | |
| Ion Proton | 200 (SE) | Up to 10 Gb | 60 M | 1% indel |
| Ion PGM 318 | 200 or 400 (SE) | 0.6–2 Gb | 4–5.5 M | 1% indel |
| Ion PGM 316 | 200 or 400 (SE) | 0.3–1 Gb | 2–3 M | 1% indel |
| Ion PGM 314 | 200 or 400 (SE) | 30–100 Mb | 0.4–0.5 M | 1% indel |
| PacBio Sequel | 8–12 kb (SE) | 3.5–7 Gb | >100,000 | N/A |
| PacBio RS II | ~20 kb | 0.5–1 Gb | ~55,000 | ~13%, indel |
| 454 GS Junior | ~400 (SE, PE) | 35 Mb | ~0.1 M | 1%, indel |
| 454 GS Junior+ | ~700 (SE, PE) | 70 Mb | ~0.1 M | 1%, indel |
| 454 GS FLX Titanium XLR70 | Up to 600; 450 mode (SE, PE) | 450 Mb | ~1 M | 1%, indel |
| 454 GS FLX Titanium XL+ | Up to 1000; 700 mode (SE, PE) | 700 Mb | ~1 M | 1%, indel |
| SOLiD 5500 xl | 50 or 75 (SE) | 160–320 Gb | ~1.4 B | ≤0.1%, AT bias |
| SOLiD 5500 Wildfire | 50 or 75 (SE) | 80–160 Gb | 700 M | ≤0.1%, AT bias |
| Oxford Nanopore MK1 MinION | Up to 200 Kb | ~1.5 Gb | | ~12%, indel |
| Oxford Nanopore GridION X5 | ~Hundreds of Kb | 100 Gb | | |
| Oxford Nanopore PromethION | | ~4 Tb | | |

Table 1.

Overview of some sequencing platforms for transcriptome analysis and their characteristics.

¹ SE: single end; PE: paired end; Kb: kilobase pairs.

² Mb: megabases; Gb: gigabases; Tb: terabases.

³ M: million; B: billion.


The challenges of managing RNA-Seq data are considerable in terms of data storage and analysis as well as algorithm development. Since the technology is not yet fully mature, shortcomings exist at every step of sequence analysis. Various tools are available for alignment of reads, transcript construction, quantification, differential gene expression, pathways and correlation analyses [28] (Tables 2 and 3). Nonetheless, the use and specificity of the software differ greatly from one type of analysis to another, and the hardest part is making sure that the right tool is chosen at every step. A review of best practices for RNA-Seq data analysis was published recently [29]. The gap between the rapid evolution of RNA-Seq technology and the development of data analysis tools is hindering wide application in livestock species. Most data analysis tools are developed for use with genomes of human and common model organisms (mouse, rat) and require tweaking before use with livestock genomes. For example, when performing target prediction analysis for newly discovered transcripts, it is common practice to use human/mouse databases, as they bring a lot of power to the analysis. However, considerable bias arises from the assumption that livestock biological systems are identical to those of human or mouse.

| Step | Tools | Application/Web link | References |
|---|---|---|---|
| Trimming* | Trimmomatic | Illumina single end and paired end quality and adapter trimming. http://www.usadellab.org/cms/?page=trimmomatic | [39] |
| | PEAT | Specific for paired end sequencing quality and adapter trimming. https://github.com/jhhung/PEAT | [50] |
| | Trim Galore | Quality and adapter trimming with some extra functionality for Bisulfite-Seq. https://www.bioinformatics.babraham.ac.uk/projects/trim_galore | [51] |
| | Skewer | Adapter trimming, can take into account indels. https://github.com/relipmoc/skewer | [52] |
| | AlienTrimmer | Detect and remove alien k-mers in both ends of sequence reads. ftp://ftp.pasteur.fr/pub/gensoft/projects/AlienTrimmer/ | [53] |
| | Cutadapt | Finds and remove adapter, primers, poly-A and other types of unwanted sequences. https://github.com/marcelm/cutadapt | |
| | NxTrim | Discard as little sequence as possible from Illumina Nextera Mate Pair reads, single end and paired end reads. https://github.com/sequencing/NxTrim | [54] |
| | SeqPurge | Can detect very short adapter sequences. https://github.com/imgag/ngs-bits/blob/master/doc/tools/SeqPurge.md | [55] |
| Alignment** | STAR | Align RNA-Seq reads to a reference genome, detect splice junctions. https://github.com/alexdobin/STAR | [45] |
| | Bowtie / Bowtie2 | Align short DNA sequences to genomes with Burrows-Wheeler index. bowtie-bio.sourceforge.net/bowtie2 | [56, 57] |
| | BWA | Mapping low-divergent sequences against large reference genome. bio-bwa.sourceforge.net | [58] |
| | TopHat2 | Use Bowtie for alignment. TopHat analyzes results to identify splice junctions. https://ccb.jhu.edu/software/tophat | [59] |
| | Rockhopper | Specific for bacterial RNA-Seq data. It supports de novo and reference based transcript assembly. cs.wellesley.edu/~btjaden/Rockhopper | [60] |
| | SpliceMap | De novo splice junction discovery and alignment tool. https://web.stanford.edu/group/wonglab/SpliceMap | [61] |
| | StringTie | De novo transcript assembly. Quantitation of full-length transcripts representing multiple splice variants for each gene locus. https://ccb.jhu.edu/software/stringtie | [47] |
| | Trinity | De novo reconstruction of transcriptomes from RNA-seq data. https://github.com/trinityrnaseq/trinityrnaseq/wiki | [62] |

Table 2.

Frequently used tools for trimming and alignment.

*Further trimming tools are available at: https://omictools.com/adapter-trimming-category/


**Further alignment tools are available at: https://omictools.com/read-alignment-category/


| Names | Major purpose¹ | Functions included² (known miRNA annotation; novel miRNA discovery; DE analyses; target prediction; pathway enrichment; livestock species) | References |
|---|---|---|---|
| miRDeep | miRNA identification | +++ | [74] |
| mirTools | miRNA identification | +++++ | [71] |
| UEA sRNA Workbench | miRNA identification | +++++ | [76] |
| sRNAtoolbox | miRNA identification | +++++ | [77] |
| MIReNA | miRNA identification | ++ | [81] |
| miRExpress | miRNA identification | ++ | [93] |
| DARIO | miRNA identification | +++ | [94] |
| Target scan | Target prediction | ++ | [95] |
| DIANA-microT-CDS | Target prediction | ++ | [96] |
| miRanda | Target prediction | ++ | [97] |
| miRDB | Target prediction | ++ | [98] |
| miRTar | Target prediction | + | [99] |
| mirWIP | Target prediction | + | [100] |
| MMIA | Target prediction | ++ | [101] |
| PITA | Target prediction | ++ | [102] |
| psRNATarget | Target prediction | + | [103] |
| RNA22 | Target prediction | ++ | [104] |
| RNAhybrid | Target prediction | ++ | [105] |
| TargetRank | Target prediction | + | [106] |
| DIANA-mirPath v3 | Down-stream miRNA analyses | +++ | [107] |
| miRGator | Integrated tools | +++ | [108] |
| MAGIA | Down-stream miRNA analyses | + | [109] |
| miRNet | Down-stream miRNA analyses | ++ | [110] |
| miRSystem | Down-stream miRNA analyses | ++ | [111] |
| miRNAMap | Integrated tools | +++++ | [112] |
| miRTarBase | Integrated tools | ++++++ | [113] |
| TransmiR | Down-stream miRNA analyses | ++ | [114] |
| PicTar | Target prediction | ++ | [115] |
| miRWalk | Integrated tools | ++++ | [116] |
| MiRecords | Integrated tools | ++++ | [117] |
| multiMiR | Integrated tools | ++++ | [118] |
| miRconnX | Integrated tools | +++++ | [119] |
| DIANA-mirExTra | Down-stream miRNA analyses | + | [120] |
| TarBase | Database | +++++ | [121] |

Table 3.

Overview of tools used for the analysis of miRNA sequence data.

¹ Further tools for miRNA annotation are available at: https://tools4mirs.org/software/known_mirna_identification/; further tools for novel miRNA discovery and miRNA precursor prediction are available at: https://tools4mirs.org/software/precursor_prediction/; further tools for miRNA target prediction are available at: https://tools4mirs.org/software/target_prediction and https://omictools.com/mirna-target-prediction-category


2“+” Function is included, “−” Function is not included.


2.2. Generation of ncRNA sequence data and pre-mapping quality control

2.2.1. Generation of ncRNA sequence data

The choice of the sequencing platform is critical to attain the goals of a study. Numerous protocols and commercial kits to generate cDNA libraries from RNA samples are available and they are mostly based on the same principles (e.g. fragmentation, reverse-transcription, adapter ligation and amplification). The steps in library preparation for lncRNA are the same as for mRNA since they share similar biogenesis pathways. The starting material for lncRNA library preparation is total RNA. The majority of lncRNA transcripts have poly-A tails while a small proportion do not. Library preparation methods based on poly-A tail selection are cheaper but less robust since transcripts without poly-A tails are lost. An ideal but more expensive method involves depletion of rRNA, which constitutes ~90% of total RNA. Library preparation with rRNA-depleted total RNA is robust as it allows quantification of all other RNA transcripts, including lowly expressed transcripts. Thus, the first step in lncRNA library preparation is to consider whether to perform poly-A tail selection or to deplete rRNA (Figure 1). The next decision is whether or not to preserve strand information during library preparation. As lncRNA annotation is still in the initial phase, it is crucial to preserve strand information to enable correct genome localization of novel transcripts. Paired-end sequencing should be preferred over single-end sequencing for lncRNA characterization to facilitate construction of transcripts with clear-cut exon boundaries. Paired-end sequencing also allows accurate detection of splicing positions. Sequencing long fragments (>100 bp) is also desirable to get adequate coverage of the genome and, consequently, better transcript construction. The number of multiplexed samples on each sequencing lane affects lncRNA sequence depth. Reducing cost by multiplexing more samples than necessary reduces the quality of the results obtained.
It has been demonstrated that the required depth of sequencing depends on the nature of the expected results [30, 31]. To accomplish lncRNA discovery with confidence, a minimum of 100 million reads per sample is suggested to enable de novo transcript assembly.

Figure 1.

Starting material and sequencing method considerations according to RNA species to be analyzed.

The procedure for the generation of miRNA sequence data differs slightly from the procedure for lncRNA analysis. First, miRNAs are small (18–24 nt) and do not require RNA fragmentation prior to library construction. Total RNA is the recommended starting material for miRNA library preparation (Figure 1). Although some commercial kits provide the option to enrich the miRNA fraction prior to library preparation, there is evidence that some small RNA species are lost during enrichment [32]. The protocols for miRNA library preparation are generally similar to those for lncRNA and include an adapter ligation step, reverse transcription and amplification, followed by size selection and purification of the cDNA. Fifty bp single-end sequencing is sufficient for miRNA libraries since miRNAs are generally small. Thus, Illumina platforms are well suited for sequencing miRNA libraries. Studies showed that approximately 2 million reads are sufficient for differential expression analysis while 8 million reads are sufficient for discovery analysis [33, 34]. Considering that over 150 million reads are available per lane on HiSeq machines, sample multiplexing can be as high as 18 to 20 libraries per lane.
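The multiplexing estimate above is simple arithmetic and can be sketched as follows. The function name is hypothetical, and the figures (150 million reads per lane, 8 million reads per library for discovery, 2 million for differential expression) are the ones quoted in the text; actual lane yield varies between runs, so a real design would leave some margin.

```python
def max_libraries_per_lane(reads_per_lane: int, reads_per_library: int) -> int:
    """Largest number of libraries that can share a lane at the target depth."""
    if reads_per_library <= 0:
        raise ValueError("reads_per_library must be positive")
    return reads_per_lane // reads_per_library


if __name__ == "__main__":
    lane_yield = 150_000_000  # lower bound quoted in the text for a HiSeq lane
    print("discovery (8 M reads each):",
          max_libraries_per_lane(lane_yield, 8_000_000))
    print("differential expression (2 M reads each):",
          max_libraries_per_lane(lane_yield, 2_000_000))
```

Because the text states that more than 150 million reads are available per lane, the quoted range of 18 to 20 libraries is consistent with this lower-bound calculation for discovery-depth sequencing.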

2.2.2. Common data processing steps

Once sequence data are available, many bioinformatics tools are used in the analytical procedures. Some processing steps are optional but strongly recommended, while others are required before the next step can be performed. Many pipelines have been developed to answer specific questions, but the software used can be very different. A global view of the general processing steps and frequently used tools for lncRNA and miRNA sequence data analyses is presented in Figures 2 and 3, respectively. These processing steps can be modified to include desired or specific tools depending on the research question.

Figure 2.

General processing steps and tools used in lncRNA sequence analysis.

Figure 3.

General processing steps and tools used in miRNA sequence analysis.

2.2.3. Raw data quality control

Sequence data generated by Illumina and most other platforms are in FASTQ format. A FASTQ file is a text file consisting of the nucleic acid sequence (read) and the base calling accuracy score (Phred score) attributed to each base of the sequence. FastQC [35], Picard tools (https://broadinstitute.github.io/picard/) and NGS QC tool kit [36] are often used to assess the quality of raw sequence reads. This step is necessary to determine if the sequencing outcome is as expected. These tools report the total number of reads, the overall quality of base calls according to position, GC percentage and other features. Care should be taken when interpreting the results because GC content is species specific and some software evaluates GC content according to the human genome. In order to avoid bias in the mapping step, quality trimming is necessary to get rid of low-quality bases and remaining adapter sequences. A recent study showed that incorrect trimming can generate short reads that impair the capacity to correctly predict differences in expression [37]. Several trimming tools are available [38] (https://omictools.com/adapter-trimming-category) including Trimmomatic [39], FASTX-Toolkit [40], CutAdapt [41], etc. (Table 2). Following trimming, filtering of reads is necessary to get rid of very short and overall low-quality reads to keep the bias level as low as possible.
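The FASTQ structure, Phred decoding and the trim-then-filter logic described above can be illustrated with a minimal sketch. It is not how FastQC or Trimmomatic work internally: it assumes four-line FASTQ records, Phred+33 quality encoding, and a deliberately simple rule that removes low-quality bases from the 3′ end only. All function names and the length threshold are hypothetical.

```python
import io
from typing import Iterator, List, TextIO, Tuple

PHRED_OFFSET = 33  # assumption: Sanger / Illumina 1.8+ quality encoding


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (read name, sequence, quality string) from four-line FASTQ records."""
    while True:
        header = handle.readline()
        if header == "":          # end of file
            return
        if not header.strip():    # tolerate blank lines between records
            continue
        seq = handle.readline().strip()
        handle.readline()         # '+' separator line, ignored
        qual = handle.readline().strip()
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record near: {header!r}")
        yield header[1:].strip(), seq, qual


def phred_scores(qual: str) -> List[int]:
    """Decode a quality string into per-base Phred scores."""
    return [ord(ch) - PHRED_OFFSET for ch in qual]


def trim_3prime(seq: str, qual: str, min_q: int = 20) -> Tuple[str, str]:
    """Remove bases from the 3' end until one reaches `min_q` (simple scheme)."""
    scores = phred_scores(qual)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], qual[:end]


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a read (0.0 for an empty read)."""
    if not seq:
        return 0.0
    return sum(base in "GCgc" for base in seq) / len(seq)


if __name__ == "__main__":
    demo = io.StringIO("@read1\nACGTACGTAC\n+\nIIIIIIII##\n@read2\nGGCC\n+\n####\n")
    min_length = 5  # hypothetical post-trimming length filter
    for name, seq, qual in read_fastq(demo):
        trimmed_seq, _ = trim_3prime(seq, qual)
        status = "kept" if len(trimmed_seq) >= min_length else "discarded"
        print(name, trimmed_seq or "-", f"GC={gc_fraction(trimmed_seq):.2f}", status)
```

Dedicated trimmers additionally handle adapter removal, sliding-window quality rules and paired-end synchronization, so this sketch is only meant to show what the quality string encodes and why trimming is followed by a length filter.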

2.2.4. Alignment

After trimming and filtering, reads are ready for alignment or de novo construction. Alignment consists of mapping reads to a reference genome. Various alignment tools have been developed [42, 43] (https://omictools.com/read-alignment-category) including frequently used tools like TopHat [44], STAR [45], Bowtie [46], StringTie [47], etc. (Table 2). Each of these tools has its own specifications, so it is important to understand the utility of each tool and the options it offers. The alignment tool used can have a great impact on the end results. It has been observed that the choice of aligner and specific options can affect results of differential gene expression analysis [48]. Aligners can be grouped in two types, gapped (also known as split, e.g. STAR, BWA, etc.) and ungapped (e.g. Bowtie, etc.). Bowtie (ungapped group) can easily map reads to a genome, but is less effective at finding splice junctions. Aligners in the gapped group are able to align reads and detect splice variants. In the absence of a reference genome, de novo assemblers (e.g. Trinity [49]) can be used. In the context of lncRNA read alignment, gapped aligners are preferred since the transcripts are not all annotated and part of a read from a given transcript may align to one position of the genome and the remainder to another position. Alignment is one of the longest steps in RNA-Seq sequence analysis; therefore, selection of the right tool can have a significant impact on the analysis. It is also important to perform mapping quality control following alignment. Quality checks include the percentages of mapped and unmapped reads, the location of the reads (intronic and exonic) and the 5′–3′ coverage.
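As an illustration of the first mapping quality check mentioned above, the sketch below computes the percentages of mapped and unmapped reads from alignment records in SAM text format. It relies only on the FLAG field defined in the SAM specification (0x4 unmapped, 0x100 secondary, 0x800 supplementary); the function name and the example records are hypothetical, and for paired-end data each mate is counted as a separate record.

```python
from typing import Dict, Iterable

FLAG_UNMAPPED = 0x4          # segment unmapped
FLAG_SECONDARY = 0x100       # secondary alignment
FLAG_SUPPLEMENTARY = 0x800   # supplementary alignment


def mapping_summary(sam_lines: Iterable[str]) -> Dict[str, float]:
    """Count mapped and unmapped primary records in SAM-formatted lines."""
    mapped = unmapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.rstrip("\n").split("\t")[1])
        if flag & (FLAG_SECONDARY | FLAG_SUPPLEMENTARY):
            continue  # count each read once, through its primary record
        if flag & FLAG_UNMAPPED:
            unmapped += 1
        else:
            mapped += 1
    total = mapped + unmapped
    percent = 100.0 * mapped / total if total else 0.0
    return {"total": total, "mapped": mapped,
            "unmapped": unmapped, "percent_mapped": percent}


if __name__ == "__main__":
    example = [
        "@HD\tVN:1.6\tSO:unsorted",
        "r1\t0\tchr1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
        "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
        "r1\t256\tchr2\t500\t0\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
        "r3\t16\tchr1\t900\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    ]
    print(mapping_summary(example))
```

In practice such statistics are usually taken from the aligner's own log or from established utilities; the sketch only shows which information in the alignment file the percentages are derived from.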

2.2.5. Transcript construction and quantification

RNA-Seq transcript construction and alignment can both demand considerable computing time. Many transcript construction tools are available (https://omictools.com/transcript-quantification-category), including commonly used tools like Cufflinks [63], iReckon [64], StringTie [47], etc. This step requires paired-end data and high sequence coverage to reconstruct lowly expressed transcripts. With the assumption that transcripts are species specific, raw data or alignment files from all samples from the same population can be merged to increase coverage [65]. This modification helps clarify transcript boundaries in the case of de novo transcript assembly. Particular considerations for lncRNA transcript construction include sample pooling according to species and tissue type, because lncRNA expression is known to be tissue specific [66–68].

2.2.6. miRNA processing steps

Overall, the procedures for miRNA identification and discovery are less time consuming and do not include as many steps as those for mRNA and lncRNA identification. The global process includes quality and adapter trimming with quality checkpoints before and after each step. A size selection to keep sequences between 17 and 30 nt (sometimes up to 35 nt) is often performed right after the quality and adapter trimming step. This is followed by read mapping and filtering out of other RNA sequences (rRNA, tRNA, snRNA, mRNA, lncRNA, etc.). The reads thought to represent miRNA are analyzed with miRNA prediction tools like miRDeep2 [69], miRanalyzer [70], mirTools 2.0 [71], etc. (Table 3). Subsequent interrogation of the miRBase database enables classification of retained miRNAs as known or novel miRNAs. A tool like miRDeep2 has a quantifier module that generates a read count table for each miRNA using precursor and mature sequence files as input. An overview of tools for miRNA identification is presented in Table 3 and further discussed in the next section.
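The size selection and read counting steps described above can be sketched as follows. This is a simplified illustration and not the miRDeep2 algorithm: reads are size-selected to 17–30 nt as in the text, identical reads are collapsed into counts (a common convention, assumed here), and sequences are looked up by exact match in a small dictionary standing in for a known-miRNA reference. The function names, the reference entry and its sequence are hypothetical, and unmatched sequences are only unannotated candidates that would still need precursor and structure analysis before being called novel miRNAs.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def size_select(reads: Iterable[str], min_len: int = 17, max_len: int = 30) -> List[str]:
    """Keep reads whose length falls within the expected small-RNA range."""
    return [r for r in reads if min_len <= len(r) <= max_len]


def collapse(reads: Iterable[str]) -> Counter:
    """Collapse identical reads into a sequence -> read count table."""
    return Counter(reads)


def split_known_unannotated(
    counts: Counter, known: Dict[str, str]
) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Separate counts into known miRNAs (by name) and unannotated sequences."""
    by_sequence = {seq: name for name, seq in known.items()}
    known_counts: Dict[str, int] = {}
    unannotated: Dict[str, int] = {}
    for seq, n in counts.items():
        if seq in by_sequence:
            known_counts[by_sequence[seq]] = n
        else:
            unannotated[seq] = n
    return known_counts, unannotated


if __name__ == "__main__":
    # Hypothetical trimmed reads (DNA alphabet) and a hypothetical reference entry.
    reads = ["TAGCTTATCAGACTGATGTTGA"] * 3 + ["ACGT", "A" * 40, "TTTGGCAATGGTAGAACTCACA"]
    reference = {"hypothetical_miR_A": "TAGCTTATCAGACTGATGTTGA"}
    counts = collapse(size_select(reads))
    known_counts, unannotated = split_known_unannotated(counts, reference)
    print("known:", known_counts)
    print("unannotated candidates:", unannotated)
```

Real pipelines map reads to the genome and to precursor sequences rather than relying on exact string matches, which also allows isomiRs and sequencing errors to be handled.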

3. Tools for ncRNA identification

3.1. Tools for miRNA identification

The identification of miRNAs involves either annotation of known miRNAs or discovery of novel miRNAs. A variety of algorithms and bioinformatics tools are applied to annotate known miRNAs as well as to discover new miRNAs from sequence data. These tools can use several features such as sequence conservation among species and structural features like hairpin formation and minimal folding free energy [72]. Many tools are available for miRNA annotation (https://tools4mirs.org/software/known_mirna_identification/) [73] including frequently used tools like miRDeep [74], miRanalyzer [75], mirTools 2.0 [71], UEA sRNA Workbench [76], sRNAtoolbox [77], and SeqBuster [78] (Table 3). Many more tools have been developed for novel miRNA discovery and miRNA precursor prediction (https://tools4mirs.org/software/precursor_prediction/) [73] including frequently used tools like MiPred [79], miRanalyzer [75], miR-Abela [80], MIReNA [81], UEA sRNA Workbench [76] and miRDeep [74] (Table 3). Major features of miRNA discovery tools have been reviewed [82–84]. Regarding livestock species, the choice of methods for miRNA annotation and novel miRNA discovery varies among studies and species. For example, De Vliegher et al. [85] used miRBase [86] and UNAFold [87] for miRNA annotation and discovery in bovine mammary gland tissues while Peng et al. [88] used miRBase [86] and RNAfold [89] for these purposes in porcine mammary glands. In our own studies, miRBase [86] and miRDeep2 [74] were used to identify miRNAs in various tissues including bovine mammary gland tissues [90], milk fat [90–92], milk whey and cells [90].

3.2. Tools for lncRNA identification

To date, a large number of lncRNA genes have been identified in the genomes of human (141,353), cow (23,896) and chicken (13,085) (http://www.bioinfo.org/noncode/analysis.php, accessed on 24-03-2017). Several methodologies have been described to distinguish lncRNAs from mRNAs and have been successfully applied to livestock species, such as coding potential calculator (CPC) [122], PhyloCSF [123], coding-non-coding index (CNCI) [124], coding potential assessment tool (CPAT) [125], Predictor of Long non-coding RNAs and mRNAs based on an improved k-mer scheme (PLEK) [126] and Flexible Extraction of LncRNAs (FEELnc) [127]. The FEELnc program, developed by the Functional Annotation of Animal Genomes (FAANG) consortium [128], is recommended as a standardized protocol for lncRNA analyses in animal species. In order to distinguish lncRNAs from mRNAs, FEELnc uses a machine-learning method to estimate a protein-coding score according to the RNA size, open reading frame coverage and multi k-mer usage [127]. FEELnc can also derive an automatically computed cut-off that maximizes the sensitivity and specificity of lncRNA prediction. An overview of tools for lncRNA identification/characterization is presented in Table 4.
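To give a feel for the three kinds of features named above (RNA size, open reading frame coverage and k-mer usage), the following sketch computes them for a single transcript sequence. It is not the FEELnc implementation and does not train or apply any classifier; the ORF definition (ATG to the first in-frame stop codon, forward strand only) and all function names are assumptions made for illustration.

```python
from collections import Counter
from itertools import product
from typing import Dict

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_length(seq: str) -> int:
    """Length in nt of the longest ATG...stop ORF in the three forward frames."""
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)  # stop codon included
                start = None
    return best


def orf_coverage(seq: str) -> float:
    """Fraction of the transcript covered by its longest complete ORF."""
    return longest_orf_length(seq) / len(seq) if seq else 0.0


def kmer_frequencies(seq: str, k: int = 3) -> Dict[str, float]:
    """Relative frequency of every A/C/G/T k-mer, using overlapping windows."""
    seq = seq.upper().replace("U", "T")
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[km] for km in kmers)  # windows with other symbols are ignored
    return {km: (counts[km] / total if total else 0.0) for km in kmers}


if __name__ == "__main__":
    transcript = "CCATGAAATAGCC"  # toy sequence, for illustration only
    print("length:", len(transcript))
    print("ORF coverage:", round(orf_coverage(transcript), 3))
    freqs = kmer_frequencies(transcript, k=2)
    print("most frequent 2-mer:", max(freqs, key=freqs.get))
```

A coding-potential tool would compute such features for sets of known mRNAs and lncRNAs, fit a model on them, and then choose a score cut-off; those steps are outside the scope of this sketch.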

| Tools | Type | Major Function/web link | References |
|---|---|---|---|
| ChIPBase | Database | Identifies binding motif matrices and their binding sites. Predicts transcriptional regulatory relationships between transcription factors and genes. http://rna.sysu.edu.cn/chipbase/ | [129] |
| LNCipedia | Database | Provides basic transcript information and structure, human lncRNA transcripts and genes. http://www.lncipedia.org/ | [130] |
| lncRNAdb | Database | Provides comprehensive annotation of eukaryotic lncRNAs. Offers an improved user interface enabling greater accessibility to sequence information, expression data and the literature. http://www.lncrnadb.org/ | [131] |
| LNCat | Database | Stores the information of 24 lncRNA annotation resources. Allows achieving refined annotation of lncRNAs within the interested region. http://biocc.hrbmu.edu.cn/LNCat/ | [132] |
| LncRNASNP | Database | Provide comprehensive resources of single nucleotide polymorphisms (SNPs) in human/mouse lncRNAs. bioinfo.life.hust.edu.cn/lncRNASNP/ | [133] |
| lncRNAWiki | Database | Provide open-content and publicly editable curation and collection of information on human lncRNAs. http://lncrna.big.ac.cn/index.php/Main_Page | [134] |
| NONCODE | Database | Presents the most complete collection and annotation of non-coding RNAs (excluding tRNAs and rRNAs) for 18 species including human, mouse, cow, rat, chicken, pig, fruitfly, zebrafish, Caenorhabditis elegans and yeast. www.noncode.org/ | [135] |
| ALDB | Database | Enables the exploration and comparative analysis of lncRNAs in domestic animals. Offers information on genome-wide expression profiles and animal quantitative trait loci (QTLs) of domestic animals. http://res.xaut.edu.cn/aldb/index.jsp | [136] |
| GENCODE | Database | Presents all gene features in the human genome. Contains annotation of lncRNA loci publicly available with the predominant transcript form consisting of two exons. https://www.gencodegenes.org | [137] |
| ncRDeathDB | Database | Present a comprehensive bioinformatics resource to ncRNA-associated cell death interactions. www.rna-society.org/ncrdeathdb | [138] |
| LncVar | Database | Presents genetic variation associated with long noncoding genes. bioinfo.ibp.ac.cn/LncVar | [139] |
| IRNdb | Database | Combines microRNA, PIWI-interacting RNA, and lncRNA information with immunologically relevant target genes. http://irndb.org | [140] |
| AnnoLnc | Annotation | Presents online portal for systematically annotating newly identified human lncRNAs. | [141] |
| LongTarget | Target prediction | Present a computational method and program to predict lncRNA DNA-binding motifs and binding sites. lncrna.smu.edu.cn | [142] |
| LncRNA2Function | Functional inferences | Facilitates search for the functions of a specific lncRNA or the lncRNAs associated with a given functional term, or annotate functionally a set of human lncRNAs of interest. http://mlg.hit.edu.cn/lncrna2function | [143] |
| Co-LncRNA | Function inference | Presents a web-based computational tool that allows users to identify GO annotations and KEGG pathways that may be affected by co-expressed protein-coding genes of single or multiple lncRNAs. www.bio-bigdata.com/Co-LncRNA/ | [144] |
| LncReg | Function inference | Provides regulatory information about lncRNAs, such as targets, regulatory mechanisms, and experimental evidence for regulation and key molecules participating in regulation. bioinformatics.ustc.edu.cn/lncreg/ | [145] |
| Linc2GO | Function inference | Provides comprehensive functional annotations for human lincRNA. http://www.bioinfo.tsinghua.edu.cn/~liuke/Linc2GO | [146] |
| FARNA | Function annotation | Integrates ncRNA information related to expression, pathways and diseases in a large number of human tissues and primary cells. www.cbrc.kaust.edu.sa/farna/ | [147] |
| ViRBase | Database | Provides the scientific community with a resource for efficient browsing and visualization of virus-host ncRNA-associated interactions and interaction networks in viral infection. http://www.rna-society.org/virbase | [148] |
| LncRNA2Target | Database | Stores lncRNA-to-target genes. Provides a web interface for searching targets of a particular lncRNA or for the lncRNAs that target a particular gene. https://www.lncrna2target.org/ | [149] |
| Lncin | Function annotation | Identifies lncRNAs-associated modules from protein interaction networks and predicts the function of lncRNAs based on the protein functions in the modules. lncin.ym.edu.tw | [150] |
| NPInter | Function annotation | Integrates experimentally verified functional interactions between noncoding RNAs (excluding tRNAs and rRNAs) and other biomolecules (proteins, RNA and genomic DNA). www.bioinfo.org.cn/NPInter | [151] |
| CPC | Coding potential assessment | Distinguishes between coding and noncoding RNA. Uses a Support Vector Machine-based classifier to assess the protein-coding potential of a transcript. cpc.cbi.pku.edu.cn/ | [122] |
| CNCI | Coding potential assessment | Distinguishes between protein-coding and non-coding sequences independent of known annotations. Applies to a variety of species without whole-genome sequence or with poorly annotated information. https://github.com/www-bioinfo-org/CNCI | [124] |
| CPAT | Coding potential assessment | Distinguishes between coding and noncoding RNA. Uses a logistic regression model to assess the protein coding potential. rna-cpat.sourceforge.net/ | [125] |
| FEELnc | LncRNA prediction | Derives an automatically computed cut-off so it maximizes the lncRNA prediction sensitivity and specificity. https://github.com/tderrien/FEELnc | [127] |
| PLEK | lncRNA prediction | Uses k-mer scheme and a support vector machine (SVM) algorithm to distinguish lncRNAs from mRNAs. http://www.ibiomedical.net/plek/ | [126] |

Table 4.

Overview of tools for the analysis of lncRNA sequence data.

3.3. Tools for identification of other non-coding RNA

Currently, few tools have been developed for the identification of ncRNA classes other than miRNAs and lncRNAs. Popular tools for piRNA identification include proTRAC [152], piClust [153] and piRNAQuest [154] (Table 5). proTRAC detects piRNA clusters through a probabilistic analysis of deviations from an assumed uniform distribution, while piClust uses a density-based clustering approach to detect piRNA clusters. piRNAQuest allows a search of the piRNome for silencers [154]. Another notable framework is SeqCluster [155], a Python pipeline for the annotation and classification of non-miRNA small ncRNAs. The pipeline permits versatile and user-friendly interaction with the data so that small RNA sequences of putative functional importance can be classified easily [155]. For other small RNAs, ncPRO-seq [156] allows the discovery of unknown ncRNA- or siRNA-coding regions from small RNA sequence data. DARIO [94] is a web tool for the annotation and detection of ncRNAs from various species, although it does not cover livestock species. CoRAL [157] is a machine learning method that classifies ncRNAs by relying on biologically interpretable features. Several tools have also been developed for predicting circRNAs, such as PredicircRNATool [158] and PredcircRNA [159], which apply machine learning approaches to distinguish circRNAs from other ncRNAs (Table 5).
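The density-based idea behind piRNA cluster detection can be illustrated with a deliberately simplified sketch. It is not the piClust or proTRAC algorithm: it only groups mapped read positions on a single chromosome whenever neighbouring reads lie within a maximum gap, and keeps groups that contain enough reads. The function name and the thresholds `max_gap` and `min_reads` are hypothetical choices made for illustration.

```python
def call_read_clusters(positions, max_gap=1000, min_reads=5):
    """Group read positions into clusters separated by gaps larger than max_gap.

    Returns a list of (first_position, last_position, read_count) tuples.
    The thresholds are illustrative assumptions, not values from any published tool.
    """
    clusters = []
    current = []
    for pos in sorted(positions):
        # A gap larger than max_gap closes the cluster being built.
        if current and pos - current[-1] > max_gap:
            if len(current) >= min_reads:
                clusters.append((current[0], current[-1], len(current)))
            current = []
        current.append(pos)
    # Close the final cluster, if it is dense enough.
    if len(current) >= min_reads:
        clusters.append((current[0], current[-1], len(current)))
    return clusters


# Toy read start positions (made up for this example).
reads = [100, 150, 300, 420, 500, 9000, 20000, 20100, 20150, 20300, 20900, 21000]
clusters = call_read_clusters(reads)
print(clusters)
```

Real tools additionally model read length, strand bias, 1U/10A nucleotide preferences and multi-mapping reads, none of which is attempted here.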

| Tools | Types | Main features/web link | References |
| --- | --- | --- | --- |
| proTRAC | piRNA prediction | Detects and analyses piRNA clusters based on quantifiable deviations from a hypothetical uniform distribution regarding the decisive piRNA cluster characteristics. https://sourceforge.net/projects/protrac/ | [152] |
| piClust | piRNA prediction | Finds piRNA clusters and transcripts from small RNA-seq data using a density based clustering approach. http://epigenomics.snu.ac.kr/piclustweb | [153] |
| piRNAQuest | piRNA database | Provides annotation of piRNAs based on their genomic location in gene, intron, intergenic, CDS, UTR, repeat elements, pseudogenes and syntenic regions. bicresources.jcbose.ac.in/zhumur/pirnaquest | [154] |
| SeqCluster | ncRNA classification | A Python framework for the annotation and classification of the non-miRNA small RNA transcriptome. http://seqcluster.readthedocs.io/# | [155] |
| ncPRO-seq | ncRNA discovery | Allows the discovery of unknown ncRNA- or siRNA-coding regions from sRNA sequence data. http://ncpro.curie.fr/ | [156] |
| DARIO | ncRNA discovery | Allows annotation and detection of ncRNAs from various species but not livestock species. http://dario.bioinf.uni-leipzig.de/index.py | [94] |
| CoRAL | ncRNA classification | A machine learning method that classifies ncRNA by relying on biologically interpretable features. http://wanglab.pcbi.upenn.edu/coral | [157] |
| DASHR | Database | Stores information on human small ncRNAs: miRNAs, piRNAs, snRNAs, snoRNAs, scRNAs (small cytoplasmic RNAs), tRNAs and rRNAs. lisanwanglab.org/DASHR | [160] |
| Sno/scaRNAbase | Database | A curated database for small nucleolar RNAs (snoRNAs) and small Cajal body-specific RNAs (scaRNAs). gene.fudan.edu.cn/snoRNAbase.nsf | [161] |
| snoRNA | Database | Contains over 1000 snoRNA sequences from Bacteria, Archaea, and Eukaryotes. http://evolveathome.com/snoRNA/snoRNA.php | [162] |
| CircNet | | Provides the following resources: (i) novel circRNAs, (ii) integrated miRNA-target networks, (iii) expression profiles of circRNA isoforms, (iv) genomic annotations of circRNA isoforms, and (v) sequences of circRNA isoforms. circnet.mbc.nctu.edu.tw | [163] |
| PredicircRNATool | circRNA prediction | Uses a machine learning method to distinguish circRNAs from non-circularized, expressed exons based on conformational and thermodynamic properties of the flanking introns. https://sourceforge.net/projects/predicircrnatool | [158] |
| circRNADb | circRNA database | Contains 32,914 human circular RNAs. http://reprod.njmu.edu.cn/circrnadb | [164] |
| PredcircRNA | circRNA prediction | Applies a machine learning approach to predict circRNA. https://github.com/xypan1232/PredcircRNA | [159] |
| circBase | Database | Provides scripts to identify known and novel circRNAs in sequence data. circbase.org | [165] |
| Circ2Traits | Database | Contains a database of potential associations of circular RNAs with diseases in human. http://gyanxet-beta.com/circdb | [166] |
| CircInteractome | Database | Provides a web tool for mapping RNA-binding protein (RBP) and miRNA binding sites on human circRNAs. Allows users to (i) identify potential circRNAs which can act as RBP sponges, (ii) design junction-spanning primers for specific detection of circRNAs of interest, (iii) design siRNAs for circRNA silencing, and (iv) identify potential internal ribosomal entry sites. https://circinteractome.nia.nih.gov | [167] |
| tRNAdb | Database | Contains 12,000 tRNA genes from 577 species and 623 tRNA sequences from 104 species; provides various services such as graphical representations of tRNA secondary structures. trnadb.bioinf.uni-leipzig.de | [168] |

Table 5.

Overview of tools and databases for sequence analysis of other small ncRNAs.

4. Tools for differential expression analysis of non-coding RNA

Various tools detect genes (mRNA or ncRNA) that are differentially expressed (DE) between two or more conditions or states from sequence data. The tools differ mainly in their statistical methods, their input and output file formats and their filtering steps. Many tools, such as DESeq [169], edgeR [170], NBPSeq [171], TSPM [172], baySeq [173], EBSeq [174], NOISeq [175], SAMseq [176] and ShrinkSeq [177], take count data as input, whereas limma [178] uses transformed data and Cufflinks uses BAM files (the binary version of sequence alignment data). Tools that use count data can be divided into two groups: parametric methods (DESeq [169], edgeR [170], NBPSeq [171], TSPM [172], baySeq [173], EBSeq [174]) and non-parametric methods (NOISeq [175], SAMseq [176]). Most parametric tools (baySeq [173], DESeq [169], NBPSeq [171], edgeR [170] and EBSeq [174]) use a negative binomial model to account for overdispersion, whereas ShrinkSeq offers a choice between a negative binomial and a zero-inflated negative binomial distribution. The tools also differ in their testing approach: DESeq, edgeR and NBPSeq perform classical hypothesis tests, while baySeq, EBSeq and ShrinkSeq apply Bayesian methods. The methods and their performance have been compared and reviewed by many authors [29, 179–183]. In general, no single method performs well for all datasets. In a survey of the performance of DE analysis methods, Conesa et al. [29] observed that the limma package [178] performed well under many conditions. Many studies observed similar performance of DESeq and edgeR in ranking genes [29, 179–183]. However, DESeq is more conservative and edgeR more liberal in controlling the false discovery rate (FDR) [29]. Among the other tools, SAMseq is better at controlling the FDR, while NOISeq is efficient in avoiding false positives [29].
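Two ideas that recur in the paragraph above, overdispersion under a negative binomial model and control of the FDR, can be made concrete with a small sketch. It does not reproduce the estimators of DESeq, edgeR or any other package named here. It uses a plain method-of-moments estimate of the dispersion, based on the negative binomial mean-variance relation (variance = mu + alpha × mu²), and the standard Benjamini-Hochberg adjustment of p-values. The function names and the toy numbers are assumptions made for illustration.

```python
from statistics import mean, variance


def moment_dispersion(counts):
    """Method-of-moments dispersion (alpha) under var = mu + alpha * mu**2.

    Returns 0.0 when the counts show no more variance than a Poisson model.
    """
    mu = mean(counts)
    if mu == 0:
        return 0.0
    var = variance(counts)  # sample variance, needs at least two values
    return max((var - mu) / mu ** 2, 0.0)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy read counts for one gene across four replicates (made up).
alpha = moment_dispersion([10, 20, 30, 40])
# Toy p-values for four genes (made up).
fdr = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(round(alpha, 4), [round(q, 4) for q in fdr])
```

Dedicated packages stabilise the per-gene dispersion by sharing information across genes, which matters greatly with the small replicate numbers typical of livestock studies; the per-gene estimate above is only meant to show what overdispersion means.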

5. Bioinformatics tools for target prediction and functional inference of non-coding RNA

Following the discovery and detection of important ncRNAs from RNA sequence data, the next step is to understand their regulatory roles. Since ncRNAs commonly act by interacting with target genes (mostly by inhibiting their expression), various tools have been developed to predict their target genes and to infer their functions (Tables 3 and 4). A simple workflow for inferring the functions of miRNAs is shown in Figure 4.

Figure 4.

A simple workflow for inference of miRNA function.

5.1. Functional inference of miRNAs

5.1.1. Bioinformatics tools for target prediction and functional inference of miRNAs

Individual targets of a given miRNA can be inferred by computational or experimental methods. Computational prediction exploits the sequence-specific nature of miRNA targeting: target genes are normally predicted from the strength of binding between a miRNA and its putative targets. Generally, the methods for computational prediction of miRNA targets can be grouped into single platforms such as TargetScan [95], PicTar [115] and RNAhybrid [105], multiple platforms such as miRWalk [116], TarBase [121] and miRecords [117], and integrative platforms that include downstream analyses of putative target genes, such as DIANA-microT-CDS [96] and miRPathDB [184]. Collections of tools for miRNA target prediction are available at https://omictools.com/mirna-target-prediction-category and https://tools4mirs.org/software/target_prediction/ [185] (Table 3). The prediction tools differ mainly in the algorithm applied and in the filtering steps that consider the secondary structure of the target mRNA (reviewed in [83, 115, 186]). Consequently, the specificity, sensitivity and accuracy of prediction differ among tools. In practice, the usefulness of a tool also depends on usability factors and on the skills of the user (input and output formatting, programming requirements, availability of a web interface and so on). Taken together, all these factors affect the popularity of tools [72, 187]. A word cloud of the popularity of tools based on their citations per year is shown in Figure 5.

Figure 5.

Word cloud for relative use of miRNA target prediction tools (based on number of citations per year).

5.1.2. Popular single platforms for miRNA target prediction

TargetScan can be accessed via the web interface or run locally as a Perl script [95]. The software detects targets in the 3′UTR of protein-coding transcripts by base-pairing rules (seed complementarity) and predicts targets for miRNA families instead of individual miRNAs. To assess important miRNA-target interactions, TargetScan outputs two metrics: the probability of conserved targeting (Pct) and the total contextual score (TCS). Pct is a Bayesian estimate of the probability that a miRNA site in the 3′UTR of an mRNA is conserved due to miRNA targeting, while TCS represents the strength of the sequence features (site type, 3′ pairing contribution, local AU contribution, position contribution, target site abundance and seed-pairing stability) that facilitate miRNA-target hybridization/cleavage.

PicTar also searches for identical seed sequences to predict miRNA-mRNA interactions [115]. It derives an overall score for the strength of the miRNA-target interaction, based on the maximum likelihood that a given 3′UTR sequence is targeted by a fixed set of miRNAs. The PicTar algorithm scores any 3′UTR that has at least one aligned, conserved predicted binding site for a miRNA, and then incorporates all possible binding sites into the score.

RNAhybrid predicts target genes based on the free energy of hybridization of a long and a short RNA [105]. Hybridization is performed in a kind of domain mode; that is, the short sequence is hybridized to the best fitting part of the long one.

RNA22 [104] is a pattern-based approach for finding miRNA binding sites and the corresponding miRNA:mRNA complexes. It is resilient to noise and does not rely on a cross-species sequence conservation filter. Unlike the previous methods, RNA22 starts by finding putative miRNA binding sites in the sequence of interest and then identifies the targeting miRNA. It can therefore identify putative miRNA binding sites even when the targeting miRNA is unknown.

miRanda was among the first bioinformatics tools for predicting the target genes of miRNAs. The miRanda algorithm is based on the complementarity of miRNAs to the 3′UTR of genes [97]. It scores each alignment with a weighted sum of match and mismatch scores for base pairs and gap penalties, and also considers the binding energy of the duplex structure, the evolutionary conservation of the whole target site and its position within the 3′UTR.
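The seed-complementarity principle shared by several of these tools can be sketched in a few lines. The example below is a simplified illustration, not the TargetScan or PicTar algorithm: it only scans a 3′UTR for perfect Watson-Crick matches to miRNA nucleotides 2-7 and classifies each hit with the commonly used site-type labels (6mer, 7mer-A1, 7mer-m8, 8mer). Conservation, context scoring and hybridization energy are ignored. The function names and the toy UTR strings are assumptions for illustration; the miRNA shown is the widely reported let-7a sequence.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Reverse complement of an RNA string written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def find_seed_sites(mirna, utr):
    """Return (index_of_6mer_core_in_utr, site_type) for each perfect seed match."""
    mirna = mirna.upper().replace("T", "U")
    utr = utr.upper().replace("T", "U")
    core = reverse_complement(mirna[1:7])  # UTR sequence pairing with miRNA nt 2-7
    m8 = COMPLEMENT[mirna[7]]              # UTR base pairing with miRNA nt 8
    sites = []
    start = utr.find(core)
    while start != -1:
        end = start + len(core)
        # miRNA nt 8 pairs with the UTR base just upstream (5') of the core.
        has_m8 = start > 0 and utr[start - 1] == m8
        # An adenine just downstream (3') of the core sits opposite miRNA nt 1.
        has_a1 = end < len(utr) and utr[end] == "A"
        if has_m8 and has_a1:
            site_type = "8mer"
        elif has_m8:
            site_type = "7mer-m8"
        elif has_a1:
            site_type = "7mer-A1"
        else:
            site_type = "6mer"
        sites.append((start, site_type))
        start = utr.find(core, start + 1)
    return sites


let7a = "UGAGGUAGUAGGUUGUAUAGUU"
# Toy UTR fragments (made up) covering each site type.
print(find_seed_sites(let7a, "AAACUACCUCAGGG"))
print(find_seed_sites(let7a, "GGGUACCUCAGG"))
```

Because many 3′UTRs contain such short matches by chance, the tools described above add conservation, context or energy filters on top of the seed match.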

5.1.3. Portals for miRNA target prediction

miRWalk, a comprehensive database developed by Dweep et al. [116], documents miRNA binding sites within the complete sequence of a gene and combines this information with predicted binding site data from 12 target prediction programs (DIANA-microTv4.0, DIANA-microT-CDS, miRanda-rel2010, mirBridge, miRDB4.0, miRmap, miRNAMap, doRiNA (i.e. PicTar2), PITA, RNA22v2, RNAhybrid2.1 and Targetscan6.2) to build platforms of binding sites for the promoter, coding (5 prediction datasets), 5′UTR and 3′UTR regions. It also contains experimentally verified miRNA-target interaction information collected via text-mining searches and data from existing resources (miRTarBase, PhenomiR, miR2Disease and HMDD). miRecords is a resource for animal miRNA-target interactions developed at the University of Minnesota [117]. It integrates predicted miRNA targets produced by 10 miRNA target prediction programs (DIANA-microTv4.0, miRanda-rel2010, miRDB4.0, PicTar2, PITA, RNAhybrid2.1, Targetscan6.2, miRTarget2, microinspector and NBmiRTar). It also contains information on experimentally validated miRNA targets obtained from the literature. mirDIP integrates 12 miRNA prediction datasets from miRNA prediction databases (including DIANA-microTv4.0, miRanda-rel2010, miRDB4.0, PicTar2, PITA, RNAhybrid2.1, Targetscan6.2 and microCosm) and allows users to customize miRNA target searches. multiMiR contains a collection of nearly 50 million records from 14 different databases [118]. It allows user-defined cut-offs for predicted binding strength so that the most confident predictions can be selected.
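The logic of such portals, which pool predictions from several programs and let the user apply cut-offs, can be sketched as a simple consensus filter. This is not the interface of miRWalk, miRecords, mirDIP or multiMiR. The tool names (`toolA`, `toolB`, `toolC`), the miRNA and gene identifiers, the scores and the convention that a higher score means a stronger prediction are all hypothetical.

```python
from collections import defaultdict


def consensus_targets(predictions, cutoffs, min_tools=2):
    """Keep miRNA-gene pairs supported by at least min_tools prediction tools.

    predictions: {tool: {(mirna, gene): score}}, higher score = stronger (assumed).
    cutoffs:     {tool: minimum score for a prediction from that tool to count}.
    """
    support = defaultdict(set)
    for tool, scores in predictions.items():
        cutoff = cutoffs.get(tool, 0.0)
        for pair, score in scores.items():
            if score >= cutoff:
                support[pair].add(tool)
    return {
        pair: sorted(tools)
        for pair, tools in support.items()
        if len(tools) >= min_tools
    }


# Toy predictions from three hypothetical tools (made-up scores).
predictions = {
    "toolA": {("miR-x", "GENE1"): 0.9, ("miR-x", "GENE2"): 0.4},
    "toolB": {("miR-x", "GENE1"): 0.8, ("miR-x", "GENE3"): 0.95},
    "toolC": {("miR-x", "GENE2"): 0.7, ("miR-x", "GENE1"): 0.2},
}
cutoffs = {"toolA": 0.5, "toolB": 0.5, "toolC": 0.5}
consensus = consensus_targets(predictions, cutoffs)
print(consensus)
```

Requiring agreement among tools usually reduces false positives at the cost of sensitivity, so the choice of `min_tools` is a trade-off rather than a fixed rule. Scores from different programs are not on a common scale, which is why a separate cut-off per tool is used here.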

5.1.4. Integrated tools for miRNA analysis

Various integrated tools and workflows for miRNA analysis have been developed to perform downstream analyses of putative target genes (e.g. gene ontology and pathway enrichment of target genes), such as MMIA [101], MAGIA [109] and miRconnX [119], and to link miRNAs to transcription factors or analyze the combined effect of several miRNAs, such as DIANA-mirExTra v2.0 [120] and TransMIR [114]. Typically, predicted target genes are used as input for functional enrichment to infer the potential functions of miRNAs. Furthermore, several tools, such as miRNet [110], miRSystem [111] and DIANA-miRPath v3.0 [107], correlate the expression levels of miRNAs with those of mRNAs in a particular experiment to infer miRNA function. Several tools have also been developed to link miRNAs directly to biological processes, such as DMirNet [188], miRNet [110] and DIANA-miRPath v3.0 [107]. Many tools and resources link miRNAs to specific phenotypes or environments, including diseases, for example obsessive-compulsive disorder [189], autophagy in gerontology [190], epilepsy [191] and cancer [192].

Among the most popular integrated tools, DIANA-tools (www.microrna.gr) covers a wide range of research scenarios by integrating several tools: DIANA-microT-CDS, DIANA-TarBase v7.0, DIANA-miRGen v3.0, DIANA-miRPath v3.0 and DIANA-mirExTra v2.0. DIANA-microT-CDS uses different thresholds and meta-analysis followed by pathway enrichment to perform miRNA target prediction [96]. DIANA-TarBase is a manually curated target database with more than half a million miRNA-target interactions curated from published experiments performed with 356 different cell types from 24 species. DIANA-miRPath is an online software suite dedicated to the assessment of miRNA regulatory roles and the identification of controlled pathways [107]. DIANA-mirExTra performs combined differential expression analysis of mRNAs and miRNAs to uncover miRNAs and transcription factors that play important regulatory roles between two investigated states [193]. miRNet is an easy-to-use web-based tool for statistical analysis and functional interpretation of various datasets generated in miRNA studies in various species. It also allows users to explore the results of miRNA-target interactions [110]. MMIA is a web tool that integrates miRNA and mRNA expression data with predicted miRNA target information to analyze miRNA-associated phenotypes and biological functions by gene set enrichment analysis [101].
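The functional enrichment step mentioned above, in which predicted target genes are tested for over-representation in a pathway or gene ontology term, is commonly based on a hypergeometric (one-sided Fisher) test. The sketch below shows that calculation with the standard library only. It is a generic illustration, not the procedure implemented in DIANA-miRPath, MMIA or any other tool named here, and the gene identifiers are placeholders.

```python
from math import comb


def enrichment_pvalue(universe, pathway, targets):
    """One-sided hypergeometric test for over-representation.

    Returns (overlap, p) where p = P(X >= overlap) when len(targets) genes
    are drawn at random from the universe without replacement.
    """
    universe = set(universe)
    pathway = set(pathway) & universe
    targets = set(targets) & universe
    N, K, n = len(universe), len(pathway), len(targets)
    k = len(pathway & targets)
    total = comb(N, n)
    # Sum the upper tail of the hypergeometric distribution.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return k, tail / total


# Toy example: 10 background genes, a 5-gene pathway, 5 predicted targets.
universe = [f"g{i}" for i in range(10)]
pathway = ["g0", "g1", "g2", "g3", "g4"]
overlap, p = enrichment_pvalue(universe, pathway, ["g0", "g1", "g2", "g3", "g4"])
print(overlap, p)
```

The choice of background (`universe`) strongly influences the result; restricting it to genes expressed in the tissue under study is a common choice, and p-values across many pathways should be corrected for multiple testing.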

5.2. Functional inference of lncRNA

Compared to miRNAs, fewer bioinformatics tools have been developed for functional inference of lncRNAs. Several databases curate computationally predicted and experimentally verified lncRNAs, such as LncRNAdb [194], GENCODE [137], lncRNAtor [7], lncRNome [195], NONCODE [135], lncRNAWiki [134], LncRNA2Function [143] and starBase v2.0 [196]. LncRNAdb was the first lncRNA database [194] and its updated version (LncRNAdb v2.0) integrates lncRNAs reported in livestock species (cattle, sheep, pig, horse and chicken) [131]. The deepBase database is an online platform for the annotation and discovery of lncRNAs from RNA-seq data and contains a large number of transcript entries for bovine (43,156) and chicken (47,004) lncRNAs. Among other databases relevant to livestock species, RNAcentral [197] currently houses information from 23 ncRNA databases (http://rnacentral.org/, accessed March 2017) but contains only a small number of lncRNAs from livestock species (cattle, pig, horse and chicken). The latest version of NONCODE [135] contains lncRNAs for 16 species, including cattle and chicken. The first lncRNA database with a particular focus on domesticated animals was ALDB [136], which contains 12,103 pig lincRNAs (long intergenic non-coding RNAs), 8923 chicken lincRNAs and 8250 cow lincRNAs (http://www.ibiomedical.net/aldb/, accessed March 2017). However, no database currently covers all available information on lncRNAs from livestock species. A comprehensive resource would therefore be valuable for subsequent genomic and functional annotation of lncRNAs and for comparative interspecies analyses [198].

The functions of lncRNAs can also be inferred by connecting their expression patterns with specific cell types or biological processes. LncRNAs can act in cis and/or in trans to influence or interact with nearby or distant genes, respectively [2, 199]. For cis-regulation, the genomic location can be used as a guide for guilt-by-association analysis, which gives a global view of lncRNAs and protein-coding genes that are tightly co-expressed and thus presumably co-regulated. Cis-relationships can foreseeably arise through complementary sequence motifs, tethering, blocking and product-independent transcription [2]. For example, the human HOTTIP lncRNA is a cis-acting lncRNA expressed in the HOXA cluster that activates transcription of flanking genes [200]. Bioinformatics tools for cis-regulation prediction include ncFANs (http://www.ebiomed.org/ncFANs) [201], which uses a coding-non-coding gene co-expression network to infer lncRNA function.
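The guilt-by-association idea for cis-acting lncRNAs can be sketched as follows: restrict the search to protein-coding genes near the lncRNA, then keep those whose expression across samples is strongly correlated with it. This is a minimal illustration and not the ncFANs method. The 100 kb window, the correlation threshold, the record layout and all coordinates and expression values are assumptions made for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant expression carries no co-expression signal
    return sxy / sqrt(sxx * syy)


def cis_candidates(lncrna, genes, window=100_000, min_abs_r=0.8):
    """Return (gene_name, distance_bp, r) for nearby, co-expressed genes."""
    hits = []
    for gene in genes:
        if gene["chrom"] != lncrna["chrom"]:
            continue
        # Distance between the two intervals; 0 if they overlap.
        gap = max(gene["start"] - lncrna["end"], lncrna["start"] - gene["end"], 0)
        if gap > window:
            continue
        r = pearson(lncrna["expr"], gene["expr"])
        if abs(r) >= min_abs_r:
            hits.append((gene["name"], gap, round(r, 3)))
    return hits


# Toy records (hypothetical coordinates and expression across five samples).
lnc = {"name": "lncX", "chrom": "chr1", "start": 50_000, "end": 52_000,
       "expr": [1, 2, 3, 4, 5]}
genes = [
    {"name": "geneA", "chrom": "chr1", "start": 60_000, "end": 70_000,
     "expr": [2, 4, 6, 8, 10]},      # near and co-expressed
    {"name": "geneB", "chrom": "chr1", "start": 900_000, "end": 910_000,
     "expr": [2, 4, 6, 8, 10]},      # co-expressed but too far away
    {"name": "geneC", "chrom": "chr1", "start": 40_000, "end": 45_000,
     "expr": [5, 1, 4, 2, 3]},       # near but weakly correlated
    {"name": "geneD", "chrom": "chr2", "start": 50_000, "end": 52_000,
     "expr": [1, 2, 3, 4, 5]},       # different chromosome
]
print(cis_candidates(lnc, genes))
```

Correlation alone does not establish regulation; candidates from such a screen are hypotheses for functional testing, and trans-acting relationships would require dropping the distance filter.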

6. Emerging platforms and technologies for understanding and using ncRNAs

Efficient and reliable techniques for accurate detection of genome information are important for the productivity and health of livestock species [202]. The introduction of next generation sequencing technologies has considerably increased the throughput of ncRNA studies. Consequently, studies on ncRNAs have contributed to a better understanding of disease resistance, productivity, breeding and meat quality in livestock species [203]. Although the number of detected ncRNA transcripts is increasing continuously, the ncRNAs identified and annotated in livestock species are still sparse compared with human data. Therefore, there is a need to continue exploring the ncRNA transcriptome of livestock species [204]. The ability to explore and modify the genomes of livestock species could help to improve disease resistance, productivity and breeding capability, and to generate new biomedical models [205].

Genome editing tools have emerged that allow efficient and precise genome manipulation in many organisms, including livestock. Genome editing is built on engineered, programmable and highly specific nucleases that induce site-specific changes in the genomes of cellular organisms [206]. Subsequent cellular DNA repair processes generate the desired insertions, deletions or substitutions at the loci of interest, establishing links between genetic variation and biological phenotypes [207]. Presently, four engineered nuclease systems have been developed for genome editing: meganucleases derived from microbial mobile elements, zinc finger nucleases (ZFNs) based on a eukaryotic transcription factor DNA-binding motif, transcription activator-like effector nucleases (TALENs) derived from a plant-invasive bacterial protein, and the clustered regularly interspaced short palindromic repeats (CRISPR)-CRISPR associated protein 9 (Cas9) system [208]. CRISPR from Prevotella and Francisella 1 (Cpf1) is used as an alternative to the Cas9 nuclease and requires only a single CRISPR RNA (crRNA) for targeting [209]. CRISPR/Cas9 is easy to apply and has developed rapidly in recent years because only a programmable RNA is required to generate sequence specificity [210].

The CRISPR–Cas9 system is based on a bacterial CRISPR-Cas9 nuclease from Streptococcus pyogenes and enables inexpensive and high-throughput interrogation of gene function [211]. CRISPR-based screening can be used to study non-coding sequences and to characterize enhancer elements and regulatory sequences, which is crucial for elucidating the roles of ncRNAs [212]. With the CRISPR–Cas9 system, the genome can be cut at specific sites [213]. Genome editing techniques have been modified and used to alter the genomes of many organisms, thus offering opportunities for the generation of genetically modified farm animals [214]. CRISPR offers the ability to target and study particular DNA sequences in the vast expanse of a genome [215]. The CRISPR–Cas9 system has two main components: a Cas9 enzyme that cuts DNA like a pair of molecular scissors, and a small RNA molecule that directs the scissors to a specific DNA sequence. If a repair template is provided, the genome can be edited as desired at nearly any site [216].
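How the guide RNA restricts where Cas9 can cut can be illustrated with a short search for candidate target sites. The sketch relies on two facts that are not stated in the text above but are well established for the S. pyogenes enzyme: the guide typically matches a 20-nt protospacer, and the protospacer must be followed immediately by an NGG protospacer-adjacent motif (PAM). The function names are hypothetical, and off-target scoring, GC content and guide efficiency are deliberately ignored.

```python
import re

DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna):
    """Reverse complement of a DNA string written 5'->3'."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(dna))


def find_protospacers(dna, guide_len=20):
    """List candidate SpCas9 sites as (strand, index, protospacer, PAM).

    For the '-' strand the index refers to the reverse-complemented sequence.
    """
    dna = dna.upper()
    # Lookahead so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % guide_len)
    hits = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for match in pattern.finditer(seq):
            hits.append((strand, match.start(), match.group(1), match.group(2)))
    return hits


# Toy sequence (made up): a 20-nt protospacer followed by a TGG PAM.
toy = "ACGTACGTACGTACGTACGT" + "TGG"
sites = find_protospacers(toy)
print(sites)
```

In practice, candidate guides found this way are then filtered against the whole genome of the species of interest to avoid sites with close matches elsewhere.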

To adapt this far-reaching gene-editing technology to agricultural improvement, various approaches have been applied to a number of livestock species. In pigs, direct cytoplasmic injection of Cas9 mRNA and single-guide RNA into zygotes generated biallelic knockout piglets [217]. The CRISPR-Cas9 system was used to generate gene-edited pigs protected from porcine reproductive and respiratory syndrome virus [218] and to genetically modify single blastocysts by inducing indel mutations at a given gene locus [219]. Both TALENs and ZFNs have been injected directly into pig zygotes to produce live genome-edited pigs [220]. The porcine myostatin (MSTN) gene, which functions as a negative regulator of muscle growth, was disrupted using the CRISPR/Cas9 system to efficiently generate biologically safe genetically modified pigs [221]. Similarly, zygote injection of TALEN mRNA targeting the MSTN gene led to the production of gene-edited cattle and sheep [205].

In cattle, the CRISPR/Cas9 system was successfully used to produce cloned embryos that could be used to develop transgenic livestock for agricultural science [222]. Hornlessness was introduced into dairy cattle by genome editing and reproductive cloning, providing the potential to improve the welfare of millions of cattle [223]. In the cattle industry, gene-edited calves with specified genetics have been produced by ovum pickup, in vitro fertilization and zygote microinjection (OPU-IVF-ZM). The CRISPR/Cas9 system has also been used efficiently to generate gene knockout sheep [224].

In livestock, the CRISPR-Cas9 system has been greatly enhanced by single-guide RNAs that direct site-specific DNA breaks, which can then be repaired through homology-directed repair. It has been used for diverse applications, from disease modelling of individual loci to parallelized loss-of-function screens of thousands of regulatory elements [225]. Equally, the bioinformatics design of CRISPR deletions is now possible with a tool known as CRISPETa, which was demonstrated through efficient CRISPR deletion of an enhancer and an exonic fragment of MALAT1, a lncRNA. CRISPETa can be used for single target regions or thousands of targets and offers high-coverage library designs for entire classes of non-coding elements, which can be adopted for use in livestock species [226]. CRISPR-Cas9 may be combined with a gene drive to investigate the control of any biological process and can be used to accelerate livestock breeding [225]. Gene drives constructed with the CRISPR-Cas9 tool favour the inheritance of edited alleles, which makes it possible to modify a whole population [227]. During the copying process, the gene drive initiates a double-strand break in the DNA. The break can then be repaired by cellular pathways such as homology-directed repair, using the sequence of the chromosome containing the gene drive elements as a repair template [228]. Editing genomic DNA elements in non-coding regions is vital, since silencing of ncRNA genes using RNA interference tools still presents major challenges. An improved vector system adapted to delete non-protein-coding regulatory elements, double excision CRISPR knockout (DECKO), uses two-step cloning to produce lentiviral vectors that express two guide RNAs concurrently [229], and has been used effectively to silence several ncRNAs (the miRNAs miR21 and miR29a and the lncRNAs UCA1 and MALAT1) [230]. The use of genome editing technologies will open new lines of enquiry to advance our knowledge of the biological functions of ncRNAs in livestock species and facilitate the creation of animals with precise alterations.
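The statement that a gene drive favours the inheritance of an edited allele can be made concrete with a deliberately simple, deterministic model. Assuming random mating, no fitness cost and a germline conversion efficiency c in heterozygotes (all assumptions introduced here, not taken from the studies cited above), heterozygotes transmit the drive allele with probability (1 + c)/2 instead of 1/2. The drive allele frequency q then changes each generation as q' = q² + 2q(1 − q)(1 + c)/2 = q + c·q·(1 − q).

```python
def drive_frequency(q0, conversion, generations):
    """Drive allele frequency over time under the recursion q' = q + c*q*(1 - q).

    q0:         starting frequency of the drive allele (0..1)
    conversion: conversion efficiency c in heterozygotes (0 = Mendelian)
    Assumes random mating and no fitness cost (illustrative assumptions).
    """
    freqs = [q0]
    q = q0
    for _ in range(generations):
        q = q + conversion * q * (1 - q)
        freqs.append(q)
    return freqs


# Hypothetical scenario: drive released at 10% frequency.
mendelian = drive_frequency(0.1, 0.0, 5)   # no drive: frequency stays constant
with_drive = drive_frequency(0.1, 1.0, 5)  # perfect conversion
print([round(q, 3) for q in with_drive])
```

Real populations deviate from this idealised picture through fitness costs, resistance alleles and non-random mating, so the sketch only shows the direction of the effect, not a realistic rate of spread.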

7. Conclusion and remarks

With the application of next generation sequencing technologies, the number of ncRNAs reported in livestock species has increased dramatically in the last 5 years. Various tools and pipelines have been introduced to make sense of ncRNA sequence data. This chapter has provided a comprehensive overview of the current and emerging tools and methods for generating and analyzing ncRNA (miRNA, lncRNA and other small ncRNA) transcriptome sequence data, with special emphasis on tools that can be applied to livestock species. While bioinformatics tools for miRNA analyses are quite mature, comprehensive bioinformatics tools for lncRNAs and for small ncRNAs other than miRNAs are generally lacking. It is our belief that comprehensive “omics” databases that integrate existing and future ncRNA transcriptome databases in the framework of livestock species will contribute towards elucidating the ambiguity surrounding RNA sequence data. Moreover, the recently introduced platforms for understanding ncRNAs (such as genome editing tools) bring great opportunities for broader and deeper exploration of ncRNA functions. In addition, meticulous in silico prediction and careful interpretation of results are critical when handling ncRNA sequence data. Finally, wet-lab validation of the results of transcriptome data will be vital to confirm the functions of ncRNAs in livestock species.

Acknowledgments

We acknowledge financial support from Agriculture and Agri-Food Canada.

© 2017 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution 3.0 License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite and reference


Duy N. Do, Pier-Luc Dudemaine, Bridget Fomenky and Eveline M. Ibeagha-Awemu (September 13th 2017). Transcriptome Analysis of Non‐Coding RNAs in Livestock Species: Elucidating the Ambiguity, Applications of RNA-Seq and Omics Strategies - From Microorganisms to Human Health, Fabio A. Marchi, Priscila D.R. Cirillo and Elvis C. Mateo, IntechOpen, DOI: 10.5772/intechopen.69872. Available from:
