Open access peer-reviewed chapter - ONLINE FIRST

Transcriptome Analysis Using RNA Sequencing for Finding Genes Related to Fiber in Cotton: A Review

Written By

Shalini P. Etukuri, Varsha C. Anche, Mirzakamol S. Ayubov, Lloyd T. Walker and Venkateswara R. Sripathi

Submitted: March 18th, 2022 Reviewed: March 21st, 2022 Published: May 5th, 2022

DOI: 10.5772/intechopen.104572

Cotton Edited by Ibrokhim Y. Abdurakhmonov

From the Edited Volume

Cotton [Working Title]

Prof. Ibrokhim Y. Abdurakhmonov

Chapter metrics overview

13 Chapter Downloads

View Full Metrics


The cotton crop is economically important and primarily grown for its fiber. Although the genus Gossypium consists of over 50 species, only four domesticated species produce spinnable fiber. However, the genes determine the molecular phenotype of fiber, and variation in their expression primarily contributes to associated phenotypic changes. Transcriptome analyses can elucidate the similarity or variation in gene expression (GE) among organisms at a given time or a circumstance. Even though several algorithms are available for analyzing such high-throughput data generated from RNA Sequencing (RNA-Seq), a reliable pipeline that includes a combination of tools such as an aligner for read mapping, an assembler for quantitating full-length transcripts, a differential gene expression (DGE) package for identifying differences in the transcripts across the samples, a gene ontology tool for assigning function, and enrichment and pathway mapping tools for finding interrelationships between genes based on their associated functions are needed. Therefore, this chapter first introduces the cotton crop, fiber phenotype, transcriptome, then discusses the basic RNA-Seq pipeline and later emphasizes various transcriptome analyses studies focused on genes associated with fiber quality and its attributes.


  • Gossypium
  • species
  • gene
  • expression
  • sequencing
  • fiber
  • transcriptome
  • and RNA-Seq

1. Introduction

Cotton is a globally and economically important fiber, oil, and protein source. It belongs to the family Malvaceae and genus Gossypiumwith more than 50 species, but only four domesticated species produce spinnable fiber. Of these, G. hirsutumand G. barbadenseare allopolyploids that originated in the United States. The remaining two species are diploids (G. herbaceumand G. arboreum) from Africa and/or Asia [1]. Genus Gossypiumconsists of one tetraploid (AD) and eight diploid genome groups (A – G and K). It is believed that the allotetraploid, upland cotton, G. hirsutum(AtAtDtDt; ~2.5 Gb) is most likely evolved from the diploid A-genome ancestor, G. arboreum(A2A2; ~1.8 Gb) or G. herbaceum(A1A1; ~1.6 Gb), and the diploid D-genome progenitor, G. raimondii(D5D5; ~900 Mb) [1, 2, 3]. According to the United States Department of Agriculture (USDA) and Foreign Agricultural Service (FAS), the acreage, yield, and production projections for the year 2021–2022 across the world and the United States are ~32 and ~4 million hectares, ~810 and ~950 kilograms per hectare, and ~120 and ~18 million 480-lb bales, respectively [4]. However, the United Nations (UN) projected that the global population would surpass the mark of 10 billion by 2050 [5]. Therefore, to meet the clothing needs of a constantly expanding population, the yield or production must be increased by developing or employing better cotton crop improvement strategies.

The percent composition of cotton fibers includes cellulose (94%), waxes (0.6%), pectin (0.9%), proteins (1.3%), minerals (1.2%), organic acids (0.8%), sugars (0.3%), and miscellaneous (0.9%) substances [6, 7, 8]. Cellulose is a homopolymer containing repeated units of β-(1→4)-D-anhydroglycopyranose. Polymerization and crystallinity of these units impart strength to the fiber [6, 7, 8]. Cotton fibers are chemically composed of five layers: (i) cuticle, (ii) primary wall, (iii) winding (transition) layer, (iv) secondary wall, and (v) lumen. The outermost layer of the cotton fiber is the cuticle, and it is composed of waxes (cutin and suberin), pectins, proteins, sugars, ash, and other substances. The primary wall and winding layer comprise amorphous cellulose, hemicelluloses, esterified and non-esterified pectins, proteins, and metal ions. The secondary cell wall (SCW) is made of crystallinity cellulose. Finally, the innermost layer is the lumen, and it comprises proteins, malic, citric, and other organic acids [6, 7, 8]. Mature cotton fibers are cellulose-rich with a thicker secondary wall and smaller lumen, while immature fibers are low in cellulose content with a thin wall and a large lumen. Based on the length, the mature cotton fibers can be classified as lint (long) and fuzz (short) fiber [9]. The unicellular cotton fibers emerge from the ovule surface immediately after flowering (days post-anthesis, DPA). The temporal progression of fiber development occurs in approximately 50 days, and it can be divided into four overlapping phases based on morphological features: (i) initiation (0–4 DPA), (ii) elongation (3–21 DPA), (iii) SCW formation and thickening (15–40 DPA), and (iv) maturation (38–52 DPA) [10]. However, genotype and environmental interactions affect the duration of phases and the rate of progression. Moreover, an array of transcripts expressed at these phases vary substantially. Thus, enabling us to study these differences in gene expression (GE).

The term transcriptome is referred to as a repertoire of RNA types found in a cell or a tissue or an organism at a given period or a collection point or a location or a specific treatment or a developmental phase or an environmental condition or a physiological state [11, 12]. The transcriptome is dynamic, and it tends to respond to subtle changes in environment or experimental condition or treatment [11, 12]. The earlier GE or transcript analysis methods such as in situhybridization (ISH), northern blot, Quantitative reverse transcription PCR (RT-qPCR), microarrays, serial gene expression analysis (SAGE), and expressed sequence tags (EST) analysis primarily focused on a single gene or a group of genes, while RNA-Seq or whole transcriptome sequencing (WTS) technique aimed at studying a wide range of transcripts [11, 13]. WTS is a contemporary and more promising strategy that has been widely used in studying the similarity or variability of GE, depending on the objective of the study. Although microarrays is an inexpensive and user-friendly method with a decent throughput, the fundamental limitation is the requirement of prior knowledge for immobilizing a probe set on the chip. RNA-Seq has largely replaced microarrays as the predominant method for quantifying GE due to its significant advantages in providing a more comprehensive overview of the transcriptome [14, 15]. RNA-Seq can be used to study differential expressed genes (DEGs), a dynamic range of transcripts including, novel transcripts, alternative splice variants (ASVs), novel isoforms, and gene fusions, non-coding RNA (ncRNA), and single nucleotide variants (SNVs) from a complex landscape of transcripts collected from a sample [11, 15].

RNA-Seq is also the method of choice because of its throughput, ease, and portability in using hypothesis-free experimental designs. RNA-Seq has been widely used in profiling the transcriptomes of diverse crop species, including cotton [3] and its fiber [16]. In cotton, RNA-Seq based differential gene expression (DGE) for fiber-related genes among diverse species [17, 18, 19], genotypes [20, 21], and in various biotic [22, 23] and abiotic [24, 25] stresses have been reported. The inherent variation in the biochemical composition of the cotton fiber, and the temporal and spatial GE in fiber development phases, offers broad scope for analyzing the fiber transcriptome using RNA-Seq. Combining biochemical, molecular, and microscopic approaches to analyze cotton fiber showed that the significantly enriched genes were associated with the cytoskeleton, cell wall, cellulose biosynthesis, carbohydrate, and energy metabolism during fiber development [26, 27]. Since the fiber transcriptome has been reported [28], various transcriptome-based studies reported in cotton have shown the differential expression (DE) of several genes among different phases of fiber development [26]. Further, RNA-Seq enabled researchers to investigate a comprehensive set of genes involved in fiber yield and quality [20, 29, 30]. This chapter reviews the basic RNA-Seq analysis pipeline and tools currently being used in Section 2 below. Also, a few RNA-Seq studies primarily aimed at fiber quality attributes such as fiber length, strength, development, initiation, elongation, and color are discussed in Section 3. However, fiber single-cell transcriptome analysis is beyond the scope of this chapter.


2. RNA-seq analysis pipeline

RNA-Seq is a complex subject with several aspects to be considered. A few key factors include experimental design, sequencing, and data analysis. This section provides an overview of the experimental design and primarily focuses on the basic pipeline of RNA-Seq analysis.

2.1 Experimental design

The time, effort, and cost of RNA-Seq analysis primarily rely on the experimental design [31]. A single sample on a short-read (Illumina) and long-read (PacBio SMRT/Oxford Nanopore) sequencing including RNA-Seq are offered as low as $100 [32]. However, the sequencing costs vary with the experimental design (control vs. treatment; normal vs. condition; early vs. late collection time point; and the number of samples vs. the number of replicates), throughput (30 vs.100 million reads), NGS platform (Illumina vs. PacBio), sequencing time (today vs. a year later), and setup (academic vs. commercial). In addition, there are several factors one needs to consider before RNA-Sequencing that include the size of the genome (e.g., ~2.5 Gb), the number of genes (e.g., ~50,000), gene density (e.g., 20 genes/1 Mb), the targeted coverage (1X vs. 10X), sequencing chemistry (short-read vs. long-read), the read type (single-end vs. paired-end), the library type (stranded vs. non-stranded), the number of reads (e.g., >30 million) per sample, and the number of samples per lane (1 vs. 12) [11]. It is ideal for including six biological/technical replicates per sample in any experiment [33]. However, in most cases, a minimum of three replicates are used to draw statistically significant conclusions [11].

Different variations in RNA-Seq methodology are available, and they are primarily based on coding (e.g., mRNA-Seq) and non-coding regions (e.g., small RNA-Seq). Besides total RNA-Seq, targeted RNA-Seq, digital gene expression (DGE)-Seq, and single-cell RNA-Seq (scRNA-Seq) are widely used [11], but only mRNA-Seq is highlighted in this chapter as it represents the coding portion of the genome. A typical short-read-based RNA-Seq experiment generally starts by selecting a population of cells or tissue, then extracting total RNA, enrichment of mRNA, fragmenting mRNA, converting fragments into cDNAs, adding adapters to cDNAs, creating a library, sequencing (Illumina), and data analysis [11]. The detailed RNA-Seq methodology has been reviewed earlier [15, 34, 35, 36], and it is beyond the scope of this chapter.

2.2 Read processing and quality check

After collecting the raw read data from an NGS platform, we must consider several metrics. The most important statistic is Phred quality (numeric) scores calculated at each position or base within the read. Phred score indicates the probability of base calling at each position is likely accurate. In addition, the Phred score is the mean value calculated at each position for all reads in that sample. For instance, in a read library of 100 bases, the Phred score is the mean value calculated across each sample within the reads between positions 0 and 100. Therefore, a higher Phred score value is expected at each position in a read and across all reads in a sample as an indicator of quality data. The second quality metric to consider is the adapter content, the percent of sequences containing the adapter. It is often detected in silicoby checking for the adapter contamination at each read position and across all reads in a sample. Sometimes, adapter contamination is seen towards the end of the read. Especially when the sequencing fragment (cDNA) is too short, the DNA polymerase continues to sequence until the end of the fragment and into the opposite adapter. Another quality metric to consider is GC content, in which we expect to see a single peak with a smooth progression without multiple peaks. The presence of multiple peaks in GC content represents the possible contamination during the library preparation.

RNA-Seq is primarily performed in core laboratories or offered as a charge per service. However, it is obvious to find traces of ribosomal RNA from humans or other species, if contaminated during the sample handling and processing. Before analysis, we can eliminate such contamination issues by comparing query sequences against the reference databases such as DeconSeq [37] and PrinSeq [38]. Some efficient algorithms like FASTQC [39] will take a random subset of the reads from each sample and map them to various possible contaminant datasets to determine the quality. A clean RNA-Seq data should have the vast majority of reads (80–90%) from the organism being sequenced, and the remaining reads (10–20%) can be from related species owing to their homology. Also, the genomic origin of the reads is determined by their exonic proportion. In mRNA-Sequencing, ~80% of the reads must have exonic regions, and the remaining (~20%) can be intronic or intergenic regions. Before performing the alignment, raw reads collected are preprocessed for filtering low-quality reads and removing non-biological sequences such as adapter sequences, barcodes, and indexes using trimming tools such as Cutadapt [40], TrimGalore [41], Scythe [42], Trimmomatic [43], HTStream [44], and BBduk [45]. The popular tools for quality checking of raw reads before and after trimming include iSeqQC [46], MultiQC [47], NGS QC [48], FASTQC [39], and FASTX-Toolkit [49].

2.3 Read alignment

The alignment approaches available for both short and long reads for alignment and quantification are: (i) traditional reference-based splice-aware alignment, and (ii) pseudo alignment. The reads are mapped directly to the reference genome in reference-based alignment without ignoring the splice junctions across the exons. For instance, pre-mRNA contains exons and introns, but the processed mRNA joins the spliced-out exons. So, it is inevitable to find some short reads that intersect these exon-exon junctions that do not map directly back to the reference genome. The selected aligner must be aware of such spliced products and junctions. It is important to realize that if our goal is to discover novel isoforms, we must consider exon-exon junctions. These methods typically run-on clusters because they require a large amount of memory or central processing unit (CPU) time. There are several read-alignment tools available for both short and long reads. The popular reference-based read alignment tools are PuffAligner [50], STAR [51], HISAT2 [52], HTSeq [53], TopHat2 [54], and Bowtie2 [55].

The other relatively advanced approach in RNA-Seq is pseudo alignment. In which, the mapper joins reads together based on their compatibility with the transcripts and not based on precise alignment, as its primary goal is to quantify the transcript expression. The main idea of pseudo alignment is to ignore the location of mapped reads and to consider only the aligned reads. Computationally, pseudo alignment is more efficient, faster, less memory intensive, and a better prediction tool than reference-based alignment. However, a transcriptome or a transcript repertoire related to an organism of our interest is used as a reference in pseudo alignment. A splice-aware aligner is not required here, as reads are mapped to a reference transcriptome but not the genome. The popular pseudo alignment tools such as Salmon [56], Kallisto [57], and Sailfish [58] are currently being used. A few tools available for checking the alignment quality are QC3 [59], QoRTs [60], Qualimap [61], RseQC [62], and RNA-SeQC [63]. Further, %GC, base quality, and mapping efficiency of aligned reads are assessed along with distribution of read count, insert size, and depth to detect sample bias resulting from library preparation.

2.4 Transcript abundance estimation and quantification

In a standard RNA-Seq pipeline, read quantification is performed after filtering low-quality reads, aligning them to an annotated genome or the transcriptome to identify their genomic origin. The standard alignment and counting methods mainly relied on base-to-base alignment or by mapping to an annotated or unannotated genome. The major limitation of standard tools is that genes often have multi-mapped reads. As a result, the algorithms such as STAR [51], HTSeq [53], and Tophat2 [54] underestimate gene expression (GE), thus resulting in false negatives. In contrast, the Cufflinks [64] overestimates GE and results in many false positives (FPs). Most transcriptome-based tools avoid base-to-base alignment of the reads, thus reducing the computational time and costs. Also, transcriptome-based tools provide quantification estimates much faster and more accurately at the transcript level. Less memory intensive tools such as Salmon [56], Kallisto [57], and Sailfish [58] are used for transcript quantification and abundance estimation with minor differences. For instance, Salmon first quantifies pseudo-counts, then quasi-mapping, and finally estimates transcript abundance. The pseudo-counts obtained can be used to find the differential gene or isoform-level expression. A few tools available for transcript abundance estimation and quantification are StringTie [65], STAR [51], tximport [66], DESeq2 [67], and Cufflinks [64].

The most straightforward approach for quantifying GE by RNA-Seq is to count the reads that align with each gene. The gene-level quantification approaches such as HTSeq commonly utilize annotated information, where gene models correspond to the structure of transcripts. Raw read counts are usually affected by transcript length and the total number of reads. For instance, the longer transcripts have higher read counts at the same expression level. Thus, the raw read counts are normalized to compare expression levels between samples. The reads per kilobase (kb) of the exon per million mapped reads (RPKM) are used to normalize the single-end read data for sequencing depth and gene length differences. While fragments per kb of transcript per million reads mapped (FPKM) is used to normalize the paired-end read data for differences in sequencing depth and gene length. In contrast to RPKM and FPKM, transcripts per million reads (TPM) is used to normalize the differences in gene length first and library size later [68]. Therefore, correcting gene length within the same gene across samples is avoided. However, it is required to precisely rank the GE levels within the sample to accurately report longer genes with relatively more reads at the same expression level.

2.5 Finding differentially expressed genes

Normalized read count data is taken in differential expression (DE) analysis, and statistical analysis is performed to identify quantitative changes in GE levels between experimental groups. For instance, statistical testing determines whether the observed differences in read counts are significant compared to natural random variation. Often the selection of the analysis tool depends on the experimental design and availability. We can use DE analysis tools for pair-wise or multiple comparisons between or among the samples. When the same GE contribution is observed in several samples, their average value is taken as the eventual GE level. Testing for DE across thousands of genes requires correction for multiple comparisons. The two common ways in statistics for correction are Bonferroni correction and false discovery rate (FDR). FDR is the most widely adopted approach in RNA-Seq as it operates on the whole population and aims to keep the false positive rate below the acceptable threshold (<5%). The upregulated or downregulated DEGs are typically represented using volcano plots or MA plots. Top-ranked significant (p-value < 0.05; Log2 Fold Change, Log2FC > 1.00; and FDR < 0.05) DEGs are usually shown as heatmaps. The average linkage method is used to compute the hierarchical clustering, whereas the euclidean algorithm computes the closeness or distance between rows and columns. Dimensionality reduction on expression data is obtained to eliminate outliers and batch effects. Principal component analysis (PCA) is most commonly used as it reduces the complexity of expression data by showing relationships among samples or replicates as clusters in two-dimensional space.

Even though different tools such as Glimma [69], Ballgown [70], EBSeq [71], limma [72], voom [73], DESeq2 [67], edgeR [74], and baySeq [75] are available for DE analysis, DESeq2 and edgeR were most widely used. DESeq2 normalizes the gene read counts by library size and composition to avoid sampling bias and batch effects. In addition, it models gene read counts with the negative binomial distribution and uses hierarchical modeling to stabilize the gene variance. Further, it uses the Benjamini-Hochberg (BH) statistic to calculate the false discovery rate. Although DESeq2 and edgeR rely on the negative binomial distribution assumption, they differ in the test statistic. For example, DESeq2 depends on the Wald test, and edgeR relies on the quasi-likelihood F-test. Also, the distribution assumption of Bayesian approaches, baySeq, and EBSeq is the negative binomial model. While limma, voom, limma+voom tools use a normal linear model. The test statistic used for these tools is empirical Bayes moderated t-statistic. In comparison, Ballgown uses a nested linear model and parametric F-test. In addition, quality control on expression data is determined using tools such as iSeqQC [46], DEGreport [76], NOISeq [77], and EDASeq [78] for detecting the sample heterogeneity, outliers, and cross-sample contamination. These tools mostly rely on statistical approaches such as correlation analysis and dimensionality reduction.

2.6 Functional annotation

After finding DEGs or gene clusters, assigning a function is commonly employed. These genes or gene sets are screened to see whether they are enriched in a particular pathway, localized to a specific cell location, or have a specific function. Based on these features, the DEGs are classified into Biological Process (BP), Cellular Component (CC), and Molecular Function (MF) [79]. In gene ontology (GO) analysis, initially, an individual or a set of DEGs are assigned with functions or GO terms, and then the enrichment analysis is performed on gene sets. Finally, filters lowly expressed genes to reduce the number of hypotheses to be tested. For instance, given a set of DEGs that are upregulated among samples under a particular condition, an enrichment analysis will find GO terms that are overrepresented or underrepresented using functional annotations for that gene set. For example, we can use the goana and camera functions in the limma Bioconductor package to find the most enriched GO terms on the gene sets and enrichment analysis. A few routinely used functional annotation and enrichment tools include: Panther [80], FoldGO [81], DAVID [82], ReviGO [83], and AmiGO [84].

2.7 Enrichment and pathway analysis

The three methods used to assess the gene sets or pathways are enrichment-based, pathway topology-based, and combined. The enrichment-based approaches, over-representation (ORA) and functional class scoring (FCS) analysis/gene set enrichment analysis (GSEA) are widely used. At the same time, pathway topology tools help us understand GE as a set in a coordinated network [85]. While the combined approach utilizes the features of both enrichment-based and pathway topology approaches.

A typical ORA pipeline includes DE analysis to find the number of DEGs and their reference genes associated with each pathway. ORA is simple and robust in identifying a few significant genes or gene-sets, i.e., it relies on a portion of the data. As the background assumption is based on low-input, independent genes or gene sets in a pathway are treated as separate entities, and the interaction among the genes or gene clusters are ignored, it may result in many false positives. The GSEA is more accurate than ORA as the entire list of genes is considered. A typical GSEA first enriches significant genes and gene sets based on their P-values, rank order, and weighted scoring, and then identifies independent pathways. However, GSEA also ignores the interaction between the gene sets or pathways. A few widely used pathway tools are Cytoscape [86], BioCyc [87], and EcoCyc [88]. Pathway topology is developed to mimic the biological perspective as, in reality, genes work in a coordinated or regulated environment in the form of networks or pathways. The idea is to perturb a pathway and thus leverage the topology to study the effect on a single gene or gene set. Pathway topology analysis predicts the gene function, gene position, fold change, and interactions among genes. It relies on more data and is computationally intensive, and it is currently limited to signaling pathways alone. A few integrated tools such as iDEP [89], ingenuity pathway analysis [90], and ipathway guide [91] are gaining more attention recently with the availability of cloud-based and data science tools in RNA-Seq analysis. We can visualize the up-regulated and down-regulated genes on a Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway using the Pathview package [92] and KEGG mapper [93]. Protein-protein interactions (PPIs) and their enrichment among up-regulated or down-regulated genes can be retrieved using the STRING database [94].

Most transcriptome analysis studies include a combination of tools discussed above based on the objective or biological question under investigation. In general, the upstream analysis includes read processing and alignment, while the downstream analysis includes quantification, annotation, enrichment, and pathway assignment. A few recent RNA-Seq studies in identifying DEGs associated with fiber quality and its characteristics are discussed in the next section.


3. Transcriptome analysis in cotton for finding fiber related genes and pathways

3.1 Transcriptome analysis for fiber quality

Several attributes that determine fiber quality include fiber length (mm), strength (cN/tex), development: initiation and elongation (%), uniformity (%), and micronaire (μg/inch). Fiber strength and length are important in deciding the spinning and yarn quality [95]. In addition, the micronaire value reflects the fiber fineness and maturity, which influences its processing and dyeing [96]. We reviewed a few recent RNA-Seq studies in identifying DEGs associated with fiber quality attributes below (Table 1).

S. NoSignificant DEGs (~5) identifiedAssociated function/pathwayCitation
1ABC transporter G family member 10 (Gh_A01G0397); Protein DETOXIFICATION 19 (Gh_A01G0453); RING-H2 finger protein (Gh_A09G2087); Probable ubiquitin-conjugating enzyme 24 (Gh_A10G0253); and Aquaporin TIP1-1 (Gh_D04G1049)Secondary metabolites biosynthesis[97]
2Heat shock protein 90-6 (Gh_A07G1723); Nuclear pore complex protein 133 (Gh_A07G1727); Transcriptional regulator STERILE APETALA (Gh_A07G1730); SNF1-related protein kinase regulatory subunit beta-2 (Gh_A07G1735); and Fasciclin-like arabinogalactan protein 2 (Gh_A07G1742)Primary and secondary cell wall (SCW) synthesis[98]
3Aldehydedehydrogenases (Gh_A06G1256); xyloglucan endotransglycosylase/hydrolase 8 (Gh_A06G1277); glycosylphosphatidylinositol (GPI)-anchored protein (Gh_A06G1301); and Aminomethyltransferase (Gh_A06G1313)Cell wall and cellulose biosynthesis[30]

Table 1.

Transcriptome analysis for fiber quality.

In an integrated study, a high-density mapping has been used to identify 36 stable and 18 novel quantitative trait locus (QTLs) associated with fiber quality in the CCR170 RIL hybrid generated from sGK156 and 901-001 varieties of G. hirsutum. Their RNA-Seq analysis included these two parental types and two RILs (MBZ70-053 and MBZ70-236) to identify 24,941 unique and 473 DEGs associated with pectin and phenylpropanoid biosynthesis and plant hormone signaling. Their bioinformatics analysis included Trimmomatic for trimming reads, HISAT2 for read alignment, StringTie for quantification of genes, DESeq2 for differential gene expression, BLASTX program and GO tools for gene set enrichment, and KAAS for pathway analysis [99]. In a different transcriptome-based study, a high yielding cotton cultivar Jimian 5 and a high fiber quality G. thurberiintrogression line, DH962 have been used to identify 780 DEGs linked with fiber quality at 10 DPA [97]. Also, their study integrated DEGs from transcriptome data and QTLs from phenotypic data to identify 31 genes associated with nine QTLs. Further, their study included Bowtie and TopHat tools for aligning clean reads to the reference genome, Cufflinks for transcript prediction, DESeq for DEGs, OmicShare tools for functional annotation, and Cotton Functional Genomics Database for finding pathways associated with the significantly enriched DEGs [97].

In a different study, RNA-Seq analysis has been performed on data obtained from different TM-1 tissues from NCBI to identify genes controlling fiber quality. Results showed that 91 genes had been expressed at different fiber developmental stages (5, 10, 20, and 25 DPA). Functional annotation of these genes using GO analysis revealed that most of the genes have been involved in binding and enzymatic activity [98]. Further, their study revealed 11 candidate genes for fiber quality linked with the genomic locations of chromosomes, A07 and A13, by combining the genome-wide association data (GWAS) with publicly available RNA-Seq datasets [98]. A transcriptome study has been conducted between two recombinant inbred lines, L1 and L2, with a varied fiber quality, to underline the differences in gene expression (GE) during fiber development stages and identify the genes responsible for the fiber quality in upland cotton (G. hirsutum) [100]. Their RNA-Seq analysis utilized Trimmomatic, Bowtie, TopHat2, Cufflinks, and DESeq2 tools to find over 1000 DEGs between L1 and L2 at 15, 20, 25, and 30 DPA. Among these, 363 DEGs colocalized within the fiber strength QTL. Further, DEGs have been annotated, and pathways were assigned by STEM, BLASTX, KOBAS, WGCNA, and Cytoscape tools. In addition, their co-expression network analysis revealed five modules closely associated with fiber-development stages. The significantly enriched genes belonged to leucine-rich repeat (LRR) receptor-like protein kinase, Rho GTPase-activating protein, bHLH transcription factor (TF), TPX2 protein, and actin-1 protein classes [100]. In a separate study, by integrating fine-mapping data with RNA-Seq, four DEGs linked with a QTL responsible for fiber quality traits have been identified in G. hirsutumRIL118 and Yumian 1 lines [30]. Their analysis included Bowtie and TopHat for mapping reads to the reference genome, HTSeq for transcript abundance estimation, DEGSeq for finding DEGs, KEGG, and KOBAS for gene enrichment analysis and finding KEGG pathways [30].

3.2 Transcriptome analysis for fiber length

The cotton fiber length and other attributes such as strength, elongation, and evenness determine the spinning or yarn quality. However, the cotton fiber length varies from one variety to the other and generally ranges between 0.9 and 1.6 inches. Longer fibers are preferred over shorter ones in the textile industry due to their uniformity, fineness, and strength. However, the fiber length is determined by the underlying molecular mechanisms, including gene expression and regulation. Recent RNA-Seq studies related to fiber length attribute are presented below (Table 2).

S. NoSignificant DEGs (~5) identifiedAssociated function/pathwayCitation
1Homeobox protein knotted-1-like (Gh_D12G244300); Cellulose synthase A catalytic subunit 4 (Gh_A07G229300 and Gh_A08G054700); Transcription factor MYB46 (Gh_A13G243300); and Cellulose synthase A catalytic subunit 8 (Gh_D05G155000)Modulating cellulose synthesis[101]
24-coumarate—CoA ligase-like 7 (Gh_D03G1318); DNA ligase 1 (Gh_D03G1330); Protein UPSTREAM OF FLC (Gh_D03G1331); E3 ubiquitin-protein ligase 1 (Gh_D03G1332); and Protein CHUP1 (Gh_D03G1337)Sucrose synthesis[102]
31-aminocyclopropane-1-carboxylate oxidase-like (Gh_A08G1512); 3-ketoacyl-CoA synthase 19 (Gh_A06G1558); gibberellin-regulated family protein 2 (Gh_A06G0045); Ethylene responsive element binding factor 3 (Gh_A04G0106); and zinc finger transcription factor constans 21 (Gh_D06G0023)Secondary metabolites biosynthesis[103]

Table 2.

Transcriptome analysis for fiber length.

Integrated transcriptome and genotyping study aimed at deciphering molecular mechanisms associated with cotton fiber length identified 2662 significant DEGs that belonged to energy metabolism during fiber initiation and auxin signaling pathway during fiber elongation by utilizing ovule and fiber samples collected at −3, 0, 5, and 10 DPA from two contrasting RILs (MBZ70-053 and MBZ70-236) of G. hirsutumhybrid, CCRI70 [104]. Their pipeline included Trimmomatic for trimming adapters from raw reads, HISAT2 for aligning clean reads to the reference genome, StringTie2 for quantifying genes, DESeq2 for finding DEGs, BLASTX and GO tools for functional annotation, and KAAS, WGCNA, and Cytoscape for gene set enrichment and pathway analysis [104]. A study conducted in Upland cotton investigated the role of class II KNOX protein (GhKNL1) in fiber development, which primarily acts as a transcription repressor in regulating SCW formation. The comparative transcriptome profiling of two transgenic cotton varieties (silenced and dominant repression for GhKNL1 gene) and a genetic standard (TM-1). The GhKNL1 silenced variety showed improved fiber length and thickened SCW, whereas the dominant repression for GhKNL1s showed shortened fiber length and thinner SCW. Furthermore, it has been reported that GhKNL1 could bind to promoters to facilitate cellulose synthesis and SCW development, thus affecting the cotton fiber length [101].

In a study conducted to evaluate Germin-like proteins (GLPs) in regulating cotton fiber development, the RNA-Seq analysis between the wild type and RNAi line for GbGLP1 (YZ-1) gene with an overexpression promoter revealed that higher expression levels of GhGLP1 lead to shortened fibers. Their RNA-Seq analysis identified 566 DEGs in the RNAi lines, while most of them belonged to genes and TFs involved in SCW biosynthesis. Also, comparative transcriptome screening of thirty long and short fiber varieties of cotton revealed that the GhGLP1 promotes fiber elongation by delaying the SCW thickening. Moreover, the YZ-1 knockdown line for the GhGLP1 gene resulted in improved fiber length and retarded SCW thickening, thus suggesting its negative role in fiber elongation [105]. A separate study combined GWAS and linkage mapping identified a sucrose synthesis-like gene linked with a significant QTL on chromosome D03 that affects fiber length in Upland cotton [102]. Their RNA-Seq and qRT-PCR results consistently showed elevated expression levels of eleven candidate genes related to fiber length in G. hirsutum, including the sucrose synthesis gene at -5 DPA to 20 DPA with increments of five. In addition, their study included Bowtie2 for mapping clean reads to the reference genome and cufflinks for obtaining GE levels for analyzing the publicly available data [102].

Comparative transcriptome analysis between G. hirsutumCSB25 line developed for fiber elongation and genetic standard, TM-1 has been performed to evaluate fiber traits. Their data analysis included Tophat2 for aligning clean reads to the reference genome, HTSeq to estimate GE, EdgeR for finding DEGs, GAGE and REVIGO for GO enrichment, and KEGG tools for finding gene sets and pathways. They identified 1872 DEGs in their study, and most of them belonged to cytoskeleton and cell wall metabolism. In addition, their investigation revealed that most of the genes were enriched in plant hormone signaling, phenylpropanoid, amino acid, sucrose, and starch biosynthesis [103].

3.3 Transcriptome analysis for fiber strength

There is a huge demand for stronger cotton fiber in the global textile industry. Individual fiber strength determines the yarn strength. However, fiber strength is often measured as bundle fiber strength (BFS), i.e., grams per tex (g/tex), where ‘g’ is the breaking force, and tex (g/km) is the fineness. BFS is usually not an accurate measure for yarn strength because of the variability in fiber properties and interaction between the fibers. The fiber strength of Upland cotton has been improved considerably through molecular breeding approaches. However, efforts on gene expression and regulation of fiber strength during its development are limited, and a few such studies are discussed here (Table 3).

S. NoSignificant DEGs (~5) identifiedAssociated function/pathwayCitation
1Pentatricopeptide repeat-containing protein (CotAD_13630 and CotAD_22605); methyl-sterol monooxygenase 1-1-like protein (CotAD_19764); homeobox-leucine zipper protein HAT22-like (CotAD_36040); and Phospholipase (CotAD_36045)Secondary metabolites biosynthesis[106]
2Peroxidase (GhPER8); serine/threonine-protein kinase-like protein (GhCCR2); Cinnamyl alcohol dehydrogenase 4 (GhCAD4); Hydroxcinnamoyl-CoA quinate (ChHCT6); and Catechol-O-MethylTransferase family (GhCOMT4)Lignin biosynthesis[107]
3Expansin A10 (Gh_D13G0786); Pectin methylesterase 3 (Gh_A05G1180); Sucrose synthase 5 (Gh_A07G0665); COBRA-like protein 2 precursor (Gh_D12G0298); and Chitinase/ Chitinase-like (Gh_D06G0479)Cell wall modification[108]

Table 3.

Transcriptome analysis for fiber strength genes.

Using comparative fiber transcriptome analysis between G. mustelinumintrogression line (IL9) and its recurrent parent (PD94042), over 250 significantly enriched DEGs associated with the fiber strength QTL have been identified at 17 and 21 DPA. Among which, 52 DEGs have been identified as candidate genes and two DEGs associated fiber strength QTL regions. Their GO enrichment and KEGG analysis showed that most of these DEGs belonged to the biosynthesis of secondary metabolites and metabolic pathways [106]. An RNA-Seq analysis study aimed to understand the molecular mechanism underlying the fiber development and quality included a CSL line (SL7) and G. hirsutumline (L22) to identify 70 significantly enriched DEGs associated with plant hormone transduction pathways. Their findings indicated that the introgressed chromosomal segment of SL7 plays a crucial role in expressing a transcription factor that contributes to the fiber strength [109]. A study that screened publicly available transcriptomic data for bHLH transcription factors found that GhbHLH18 is coexpressed with most lignin biosynthesis genes [107]. Furthermore, they suggested that GhbHLH18 is preferentially expressed during the early fiber elongation and is negatively regulates fiber strength and length by binding to the E-box of its promoter and enhancing peroxidase-mediated (GhPER8) lignin biosynthesis [107].

Transcriptome analysis in chromosome segment substitution lines (CSSLs) revealed 71 significant DEGs associated with fiber strength among four lines, CCRI45, MBI7561, MBI7747, and MBI7285, collected at 15, 20, 25, and 28 DPA [110]. They suggested the possible roles of these genes in cell wall biogenesis, SCW deposition, and cotton fiber strength. Their analysis further identified 16 DEGs consistently found in the introgressed segments from the G. barbadensechromosomes across all possible comparisons. Their data analysis included NGS QC Toolkit for trimming and filtering raw reads, TopHat for aligning clean reads to the reference genome, HTSeq for quantifying expression levels, GFOLD for DEGs, and BLAST2GO for functional annotation and protein class assignment [110]. Comparative transcriptome analysis of two contrasting near-isogenic lines (NILs) of G. hirsutum, MD90ne, and MD52ne for fiber strength at 15 and 20 DPA revealed over 1000 significant DEGs [108]. In addition, the fiber elongation and cell wall integrity genes have been enriched in ethylene and receptor-like kinases (RLKs) signaling pathways. In data processing, they utilized Sickle, GSNAP, Bedtools, EdgeR, and AgriGO tools for trimming, mapping, annotation, differential gene expression, and enrichment analysis, respectively [108]. Further, they compared the RNA-Seq data with previously published microarray data [111].

3.4 Transcriptome analysis for fiber development

Cotton fibers are natural, unicellular outgrowths that emerge from the epidermis of the ovules. The differentiation and developmental phases (initiation, elongation, SCW synthesis, and maturation) of the cotton fiber determine the other attributes such as fiber length and strength. The variation in GE at different developmental stages of cotton fiber can be assessed using RNA-Seq. A few RNA-Seq studies related to fiber development are discussed below (Table 4).

S. NoSignificant DEGs (~5) identifiedAssociated function/pathwayCitation
1Tubulin beta-2 chain 6 (CotAD_51888); Tubulin alpha-3 chain 1 (CotAD_75071); Actin-7 (CotAD_57082); Kinesin-related protein 11 (CotAD_24245); and Clathrin light chain 1 (CotAD_46336)Fatty acid metabolism[26]
2Zinc finger protein 5 (Gh_A03G1255); Histone deacetylase 1 (Gh_D05G0849); UBX domain-containing protein 1 (Gh_D02G2408); Zinc finger protein 10 (Gh_D01G1033); and Protein indeterminate-domain 5 (Gh_A01G2114)Transcriptional and apoptosis regulation[112]
3WRKY transcription factors: WRKY11 (Gh_A07G0017); WRKY33 (Gh_D03G1371); WRKY40 (Gh_A06G1923); WRKY41 (Gh_D08G1232); and Abscisic acid 8-hydroxylase 1 (Gh_A08G1344).Metabolic pathways[113]

Table 4.

Transcriptome analysis for fiber development.

Comparative transcriptome analyses of three fiber developmental stages with non-fiber tissues (leaf, root, stigma, and anther) identified 1205, 1135, and 937 significantly upregulated, and 124, 179, and 213 downregulated DEGs at 7, 14, and 26 DPA, respectively in G. hirsutumduring fiber development. Moreover, the identified DEGs have been enriched in functional and metabolic pathways, including signal transduction, catalytic activity, and carbohydrate metabolism [26]. Their pipeline included cutadapt for collecting quality reads, TopHat2 for aligning clean reads to the reference genome, StringTie for assembling mapped reads into transcripts, HTSeq for quantification of GE, DeSeq for finding DEGs, and GOseq R and KOBAS for finding protein classes and pathways [26]. A study aimed to understand the genes and complex networks associated with cotton fiber development and its domestication utilized G. hirsutumand screened transcriptomes collected at 5, 10, 15, and 20 DPA to reveal convergence and divergence in duplicated and homoeologous coexpression networks. Their analysis included GSNAP for mapping, PolyCat for homoeolog-specific expression, HTseq for read count data, DEseq2 for finding DEGs, and weighted gene coexpression network analysis (WGCNA) for coexpression data to corroborate the idea of widespread gene usage in cotton fibers, subgenome-specific expression bias, and similarities and differences in coexpression modules within the subgenomes of a polyploid [114].

In a study aimed at screening publicly available datasets for myeloblastosis (MYB) like TFs and finding their role in fiber development, a research group has identified 36 R2R3-MYBs highly expressed at 20 DPA in Upland cotton suggested them as potential SCW synthesis regulators [115]. Comparative transcriptome analysis of G. arboreum, G. hirsutum, and transgenic line revealed that the GhTCP4 TF plays an essential role in activating SCW genes by interacting with cis-elements in the promoter region. In contrast, GhHOX3 regulates TCP gene expression, thus promoting fiber cell elongation [116]. A comprehensive genome-wide analysis of G. arboreum, G. raimondii, and G. hirsutumfrom publicly available data revealed 196, 195, and 386 C2H2-like zinc finger genes, respectively. Also, the phylogenetic analysis of C2H2-like zinc finger proteins identified seven subgroups with similar exon-intron and protein motif compositions. Further, the differential expression (DE) pattern of 16 C2H2-like zinc finger genes identified in RNA-Seq data has been validated with RT-qPCR analysis in Ligon-lintless-1 (Li1) mutant and TM-1 at 0, 5, 8, and 10 DPA and suggested the role of these transcription factors in biochemical and physiological functions during cotton fiber development [112]. In another study, comparative transcriptome analysis of phytochrome A1 gene (PHYA1) RNAi line and its parent Coker 312 using RNA-Seq to study the GE profiles of 10 DPA fibers identified 142 DEGs that play an essential role in fiber development. Their pipeline included Trimmomatic for trimming low-quality reads, ArrayStar for aligning clean reads to the reference genome, DeSeq2 for finding DEGs, Blast2GO and InterProScan for functional annotation, and KEGG database for functional protein classes and pathways [113].

3.5 Transcriptome analysis for fiber initiation and elongation

Cotton fibers are single and elongated cells derived from epidermis of seed as external outgrowths. Therefore, cotton fibers are ideal for studying cell developmental stages such as differentiation and elongation. Further, the initiation step is critical in the fiber development process because it is the stage where the cell fate is determined or committed to developing into a fiber. Therefore, fiber initiation and elongation are ideal stages for undertaking RNA-Seq analysis to understand early fiber development, and a few such studies are discussed here (Table 5).

S. NoSignificant DEGs (~5) identifiedAssociated function/pathwayCitation
1Myb-related proteins: Myb 16 (Gh_A01G187100); Myb 306 (Gh_D04G053900); Myb 330 (Gh_A13G137700); Myb 16 (Gh_A05G373500); and Transcription factor WER (Gh_A05G364100)Cell cycle and translational regulation[29]
2Transcription factor LABRA 3, (GB_D11G0963); BURP domain protein RD22 (GB_D05G0504); Protodermal factor 1 (GB_A07G1636); Homeobox-leucine zipper protein PROTODERMAL FACTOR 2 (GB_D06G1803); and Zinc finger protein 8 (GB_D01G1500)Asparagine and cell wall biosynthesis, and stress responses[117]
3Trehalose phosphatase/ synthetase 11 (Gh_D08G0936); Ethylene responsive element binding factor 1 (Gh_D11G0426); Methyl esterase 1 (Gh_D11G1910); Tubulin beta-1 chain 1 (Gh_D01G0939); and Tubulin alpha-2 chain 2 (Gh_D02G2420)Carbohydrate metabolism, ethylene response, and microtubule synthesis[118]

Table 5.

Transcriptome analysis for fiber Initiation and elongation.

A recent study combined the Laser-capture microdissection (LCM) technology with RNA-Seq to understand the cotton cell types during the fiber developmental shifts. LCM can differentiate the epidermal cells from the fiber, while RNA-Seq can identify the subtle differences between these cell types [29]. Their results suggested that the fiber cell initiation in cotton can be triggered by phytohormones and MYB-like transcription factors, cell cycle arrest, ribosome biosynthesis, and homoeolog expression bias of cell cycle and ribosome biosynthetic genes [29]. A recent omics-based study conducted in the ovules collected immediately after anthesis in upland cotton showed DE of several MYB-like TFs and early fiber development genes associated with biosynthesis and signaling of phytohormones, indole-3-acetic acid, cytokinins, gibberellic acid, jasmonic acid, and brassinosteroids [27]. Another RNA-Seq study has been conducted to understand effect of temperature on fuzz fiber initiation in a thermo-sensitive variety of G. barbadense, L7009 subjected temperature stress at 4 DPA to identify 43,826 DEGs. Of these, 189, 9667, 240, and 901 DEGs belonged to plant hormone signal transduction, fiber development, fuzz fiber initiation, and transcription factors, respectively. Also, they reported that high temperatures could induce fiber development, fiber quality, and fuzz initiation. Further, the significantly enriched DEGs belonged to stress response, asparagine, and cell wall biosynthesis. However, the fuzz initiation can be inhibited by low-temperature treatment in L7009. Furthermore, they reported the 4 DPA stage as the most susceptible stage to temperature stress during the fuzz initiation [117].

A genome-wide transcriptome profiling of fiber-bearing ovules of G. arboreumat an increment of 0.5 from -0.5 DPA till 3.0 DPA has been investigated to understand the molecular basis for fiber initiation. A total of 12,049 DEGs and 1049 DE transcription factors have been detected from the analyses. Most identified DEGs belonged to the ribosome and amino acid biosynthesis and carbon metabolism. A few significantly enriched DEGs belonged to fatty acid degradation and flavonoid biosynthesis. Further, during fiber initiation, the significantly induced DE transcription factors belonged to the trihelix family, referred to as GaGTs, and often found on 12 of 13 chromosomes in G. arboreum[119]. In a study aimed at screening the transcriptome profiles for finding variations in fiber initiation and elongation among diverse fiber types of G. hirsutum, for example, long-staple cotton (LSC), short-staple cotton (SSC), long fiber group (LFG), and short-fiber group (SFG) to identify twelve genes in fiber development; among these, glycosyl hydrolase, Pectin lyase-like superfamily protein (PER64), and Pectin lyase (PL) were down-regulated in fiber elongation [118]. Their pipeline included Trimmomatic for filtering low-quality reads, TopHat2 for aligning clean reads to the reference genome, StringTie for assembling mapped reads into transcripts, HTSeq for quantification of GE, DeSeq for finding DEGs, GO for functional annotation, and KEGG for finding protein classes and pathways [118].

3.6 Transcriptome analysis for fiber color

Most cotton (G. hirsutum) fibers produced worldwide are white, despite the lint and fiber of tetraploid cotton (G. barbadense), exhibiting various colors including red, blue, green, and several shades of brown. The fiber color trait in cotton is genetically inherited, resulting from pigments blended with cellulose. Generally, the yields of colored cotton are typically lower, and the fiber is shorter and weaker but softer when compared with white cotton. However, the fiber qualities such as fiber length, strength, and color have been improved in hybrids between G. barbadenseand G. hirsutum. More recently, colored cotton fiber has gained importance due to its unique and desirable characteristics and emerged as an eco-friendly dye-free textile material. A few recent RNA-Seq studies focussed on fiber color-related genes are presented below (Table 6). Using multi-omic approaches (metabolome and RNA-Seq analysis), the biochemical and regulatory roles of genes involved in light-induced green color formation in cotton have been reported [120]. Their study compared early initiation (15 DPA) and late accumulated (45 DPA) metabolites under different lighting conditions and identified 236 differential metabolites. Among which, 20% of metabolites belonged to the phenylpropanoid pathway. Their RNA-Seq analysis included gene set enrichment and KEGG pathway analysis to identify genes and regulatory networks linked with light-induced fiber color formation. These networks are highly correlated with the corresponding phenylpropanoid metabolites [120].

S. NoSignificant DEGs (~5) identifiedAssociated function/pathwayCitation
1Cryptochrome 1 (Gh_A05G1941); Asparagine synthetase 1 (Gh_A07G0951); E3 ubiquitin-protein ligase 1 (Gh_A10G0098); Glutamate dehydrogenase 1 (Gh_A11G2354); and Mannose-6-phosphate isomerase 2 (Gh_A12G0488)Phenylpropanoid pathway[120]
2Phenylalanine ammonia-lyase 6 (Gh_D10G2528); Cytochrome P450 84A (Gh_D11G1805); Aldehyde dehydrogenase family 2 member 4 (Gh_D07G0047); Flavonoid 3',5'-methyltransferase (Gh_D04G1818); and Shikimate O- hydroxycinnamoyltransferase 1 (Gh_A05G1005)Phenylpropanoid pathway[121]
3Shikimate O-hydroxycinnamoyltransferase (Gorai.009G122600); Cinnamoyl-CoA reductase 2 (Gorai.003G055600); Flavonoid 3-O-glucosyltransferase (Gorai.012G009500); Chalcone synthase 1 (Gorai.005G035100); and Dihydroflavonol-4-reductase (Gorai.009G200600)Flavonoid and phenylpropanoid pathways[122]

Table 6.

Transcriptome analysis for fiber color.

Another study compared transcriptomes and metabolomes of Green Colored Fiber (GCF) accession and its near-isogenic line, White Colored Fiber (WCF) at 12, 18, and 24 DPA, to identify 2047 non-redundant metabolites enriched in eighty pathways, including biosynthesis of phenylpropanoid, wax, cutin, and suberin [121]. Their metabolome analysis identified higher levels of metabolites (sinapaldehyde) linked with the phenylpropanoid pathway in the GCF line compared with the WCF phenotype. Moreover, the metabolites identified in their study overlapped with the transcriptome analysis showing significant up-regulation of the genes responsible for the biosynthesis of select metabolites. The WGCNA analysis on DEGs identified between GCF and WCF has shown 16 gene modules co-expressed with fiber color at selected time points. At a visually different fiber color stage between GCF and WCF, the blue module at 24 DPA was of prime importance due to the upregulation of 56 hub and two homoeologous Gh4CL4 genes that have a potential role in green pigment biosynthesis [121]. A study aimed at understanding the gene expression and regulation of the pigment biosynthesis generated RNAi lines for the chalcone flavanone isomerase gene in the brown-colored fiber (BCF) line [122]. In addition, they compared the transcriptome profiles of BCF with its transgenic fiber phenotypes, white and green, to identify 13 significantly enriched DEGs in flavonoid and phenylpropanoid pathways [122].


4. Conclusions

In conclusion, comprehensive phenotyping, genotyping, and transcriptome approaches coupled with integrated bioinformatics pipelines have considerably improved our understanding of genes associated with fiber quality and yield traits. However, besides the genes related to the fiber quality characteristics discussed in this chapter, fiber uniformity, fineness, and micronaire attributes must also be considered in cotton germplasm improvement programs. Further, functional annotation, enrichment, and gene network analyses tools will continue to evolve with better features to visualize subtle changes in gene expression associated with biological pathways. Furthermore, to better understand complex traits (e.g., fiber quality and yield) and polyploid plant genomes (e.g., Upland cotton), more advanced computational pipelines need to be developed to integrate multi-omic and multi-dimensional phenotypic data. Moreover, fiber cell is a single elongated structure that serves as an ideal model for single-cell genomics. Thereby, dissecting the complexity associated with the initial input mRNA quantity in single-cell RNA-Seq will aid in screening thousands of samples per sequencing run. Therefore, the bulk RNA-Seq data generated by such voluminous efforts demands lightweight data science tools that utilize less memory footprint.



The authors acknowledge Ms. Padma S. Ragam for reviewing this book chapter. Also, the authors would like to thank anonymous reviewers and editors for their efforts in improving this book chapter. Finally, the authors acknowledge the funding support by the Capacity Building grant #2020-38821-31103 from the USDA National Institute of Food and Agriculture (NIFA).


Conflict of interest

The authors declare no conflict of interest.


  1. 1. Wendel JF, Grover CE. Taxonomy and evolution of the cotton genus,Gossypium. Cotton. 2015;57:25-44
  2. 2. Hendrix B, Stewart JM. Estimation of the nuclear DNA content ofGossypiumspecies. Annals of Botany. 2005;95(5):789-797
  3. 3. Sripathi VR, Buyyarapu R, Kumpatla SP, Williams AJ, Nyaku ST, Tilahun Y, et al. Bioinformatics tools and genomic resources available in understanding the structure and function ofGossypium. Bioinformatics-Updated Features and Applications. 2016. p. 231
  4. 4. FAS-USDA. 2022. Cotton: World Markets and Trade: February 2022 report. Available from:
  5. 5. United Nations. World Population Prospects: The 2017 Revision, Key Findings and Advance Tables. Department of Economics and Social Affairs PD, editor. New York: United Nations; 2017. p. 46
  6. 6. Liu Y. Chemical composition and characterization of cotton fibers. In: Cotton Fiber: Physics, Chemistry and Biology. Cham: Springer; 2018. pp. 75-94
  7. 7. McGrath JE, Hickner MA, Höfer R. Polymers for a sustainable environment and green energy. Polymer Science. 2013;10:849
  8. 8. Dochia M, Sirghie C, Kozłowski RM, Roskwitalski Z. Cotton fibres. In: Handbook of Natural Fibres. Cambridge, England: Woodhead Publishing; 2012. pp. 11-23
  9. 9. Hu H, Wang M, Ding Y, Zhu S, Zhao G, Tu L, et al. Transcriptomic repertoires depict the initiation of lint and fuzz fibres in cotton (Gossypium hirsutumL.). Plant Biotechnology Journal. 2018;16(5):1002-1012
  10. 10. Salih H, Leng X, He SP, Jia YH, Gong WF, Du XM. Characterization of the early fiber development gene, Ligon-lintless 1 (Li1), using microarray. Plant Gene. 2016;6:59-66
  11. 11. Sripathi VR, Anche VC, Gossett ZB, Walker LT. Recent applications of RNA-sequencing in food and agriculture. Applications of RNA-Seq in Biology and Medicine. 2021. p. 97
  12. 12. Lowe R, Shirley N, Bleackley M, Dolan S, Shafee T. Transcriptomics technologies. PLoS Computational Biology. 2017;13(5):e1005457
  13. 13. Morozova O, Hirst M, Marra MA. Applications of new sequencing technologies for transcriptome analysis. Annual Review of Genomics and Human Genetics. 2009;10:135-151
  14. 14. Wolf JB. Principles of transcriptome analysis and gene expression quantification: an RNA-seq tutorial. Molecular Ecology Resources. 2013;13(4):559-572
  15. 15. Wang Z, Gerstein M, Snyder M. RNA-Seq: a revolutionary tool for transcriptomics. Nature Reviews. Genetics. 2009;10(1):57-63
  16. 16. Yoo MJ, Wendel JF. Comparative evolutionary and developmental dynamics of the cotton (Gossypiumhirsutum) fiber transcriptome. PLoS Genetics. 2014;10(1):e1004073
  17. 17. Chen ZJ, Sreedasyam A, Ando A, Song Q , De Santiago LM, Hulse-Kemp AM, et al. Genomic diversifications of fiveGossypiumallopolyploid species and their impact on cotton improvement. Nature Genetics. 2020;52(5):525-533
  18. 18. Parekh MJ, Kumar S, Fougat RS, Zala HN, Pandit RJ. Transcriptomic profiling of developing fiber in levant cotton (Gossypiumherbaceum L.). Functional and Integrative Genomics. 2018;18(2):211-223
  19. 19. Lin M, Pang C, Fan S, Song M, Wei H, Yu S. Global analysis of theGossypium hirsutumL. Transcriptome during leaf senescence by RNA-Seq. BMC Plant Biology. 2015;15(1):1-8
  20. 20. Ma Q , Wu M, Pei W, Wang X, Zhai H, Wang W, et al. RNA-Seq-mediated transcriptome analysis of a fiberless mutant cotton and its possible origin based on SNP markers. PLoS One. 2016;11(3):e0151994
  21. 21. Peng Z, He S, Gong W, Sun J, Pan Z, Xu F, et al. Comprehensive analysis of differentially expressed genes and transcriptional regulation induced by salt stress in two contrasting cotton genotypes. BMC Genomics. 2014;15(1):1-28
  22. 22. Naqvi RZ, Zaidi SS, Akhtar KP, Strickler S, Woldemariam M, Mishra B, et al. Transcriptomics reveals multiple resistance mechanisms against cotton leaf curl disease in a naturally immune cotton species.Gossypiumarboreum. Scientific reports. 2017;7(1):1-5
  23. 23. Xu L, Zhu L, Tu L, Liu L, Yuan D, Jin L, et al. Lignin metabolism has a central role in the resistance of cotton to the wilt fungus Verticillium dahliae as revealed by RNA-Seq-dependent transcriptional analysis and histochemistry. Journal of Experimental Botany. 2011;62(15):5607-5621
  24. 24. Zhang F, Zhu G, Du L, Shang X, Cheng C, Yang B, et al. Genetic regulation of salt stress tolerance revealed by RNA-Seq in cotton diploid wild species,Gossypiumdavidsonii. Scientific Reports. 2016;6(1):1-5
  25. 25. Bowman MJ, Park W, Bauer PJ, Udall JA, Page JT, Raney J, et al. RNA-Seq transcriptome profiling of upland cotton (Gossypium hirsutumL.) root tissue under water-deficit stress. PLoS One. 2013;8(12):e82634
  26. 26. Yang J, Gao L, Liu X, Zhang X, Wang X, Wang Z. Comparative transcriptome analysis of fiber and nonfiber tissues to identify the genes preferentially expressed in fiber development inGossypiumhirsutum. Scientific Reports. 2021;11(1):1-8
  27. 27. Wang L, Wang G, Long L, Altunok S, Feng Z, Wang D, et al. Understanding the role of phytohormones in cotton fiber development through omic approaches; recent advances and future directions. International Journal of Biological Macromolecules. 2020;15(163):1301-1313
  28. 28. Wilkins TA, Arpat AB. The cotton fiber transcriptome. Physiologia Plantarum. 2005;124(3):295-300
  29. 29. Ando A, Kirkbride RC, Jones DC, Grimwood J, Chen ZJ. LCM and RNA-Seq analyses revealed roles of cell cycle and translational regulation and homoeolog expression bias in cotton fiber cell initiation. BMC Genomics. 2021;22(1):1-6
  30. 30. Liu D, Zhang J, Liu X, Wang W, Liu D, Teng Z, et al. Fine mapping and RNA-Seq unravels candidate genes for a major QTL controlling multiple fiber quality traits at the T 1 region in upland cotton. BMC Genomics. 2016;17(1):1-3
  31. 31. Conesa A, Madrigal P, Tarazona S, Gomez-Cabrero D, Cervera A, McPherson A, et al. A survey of best practices for RNA-Seq data analysis. Genome Biology. 2016;17(1):1-9
  32. 32. Naphade S, Bhatnagar R, Hanson-Smith V, Choi I, Zhang A. Systematic comparative analysis of strand-specific RNA-Seq library preparation methods for low input samples. Scientific Reports. 2022;12(1):1-10
  33. 33. Schurch NJ, Schofield P, Gierliński M, Cole C, Sherstnev A, Singh V, et al. How many biological replicates are needed in an RNA-Seq experiment and which differential expression tool should you use? RNA. 2016;22(6):839-851
  34. 34. Hrdlickova R, Toloue M, Tian B. RNA-Seq methods for transcriptome analysis. Wiley Interdisciplinary Reviews: RNA. 2017;8(1):e1364
  35. 35. Jazayeri SM, Melgarejo Munoz LM, Romero HM. RNA-Seq: a glance at technologies and methodologies. Acta Biológica Colombiana. 2015;20(2):23-35
  36. 36. Nagalakshmi U, Waern K, Snyder M. RNA-Seq: a method for comprehensive transcriptome analysis. Current Protocols in Molecular Biology. 2010;89(1):4-11
  37. 37. Schmieder R, Edwards R. Fast identification and removal of sequence contamination from genomic and metagenomic datasets. PLoS One. 2011;6(3):e17288
  38. 38. Schmieder R, Edwards R. Quality control and preprocessing of metagenomic datasets. Bioinformatics. 2011;27(6):863-864
  39. 39. Andrews, S. FastQC: A Quality Control Tool for High Throughput Sequence Data. 2010. Available from:
  40. 40. Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet. journal. 2011;17(1):10-12
  41. 41. Krueger F. Trim galore. A wrapper tool around Cutadapt and FastQC to consistently apply quality and adapter trimming to FastQ files. Babraham Bioinformatics. 2015. Available from:
  42. 42. Buffalo V. Scythe—A Bayesian adapter trimmer. UC Davis Bioinforma. Core. 2011. Available from:
  43. 43. Bolger AM, Lohse M, Usadel B. Trimmomatic: A flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30(15):2114-2120
  44. 44. David S. HTStream: A toolkit for high throughput sequencing analysis. In: Theses and Dissertations Collection, Digital Initiatives. University of Idaho Library; 2017. Available from:
  45. 45. Bushnell B. BBDuk. Jt Genome Inst. 2020. Available from:[Accessed: June 25, 2020]
  46. 46. Kumar G, Ertel A, Feldman G, Kupper J, Fortina P. iSeqQC: A tool for expression-based quality control in RNA-sequencing. BMC Bioinformatics. 2020;21(1):1-10
  47. 47. Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016;32(19):3047-3048
  48. 48. Patel RK, Jain M. NGS QC Toolkit: A toolkit for quality control of next generation sequencing data. PLoS One. 2012;7(2):e30619
  49. 49. Gordon A, Hannon GJ. Fastx-toolkit. FASTQ/A short-reads preprocessing tools (unpublished). 2010. Available from:
  50. 50. Almodaresi F, Zakeri M, Patro R. PuffAligner: A fast, efficient and accurate aligner based on the pufferfish index. Bioinformatics. 2021;37(22):4048-4055
  51. 51. Dobin A, Gingeras TR. Mapping RNA-seq reads with STAR. Current Protocols in Bioinformatics. 2015;51(1):11-14
  52. 52. Kim D, Langmead B, Salzberg SL. HISAT: a fast spliced aligner with low memory requirements. Nature Methods. 2015;12(4):357-360
  53. 53. Anders S, Pyl PT, Huber W. HTSeq—A Python framework to work with high-throughput sequencing data. Bioinformatics. 2015;31(2):166-169
  54. 54. Kim D, Pertea G, Trapnell C, Pimentel H, Kelley R, Salzberg SL. TopHat2: Accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biology. 2013;14(4):1-3
  55. 55. Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nature Methods. 2012;9(4):357-359
  56. 56. Patro R, Duggal G, Love MI, Irizarry RA, Kingsford C. Salmon provides fast and bias-aware quantification of transcript expression. Nature Methods. 2017;14(4):417-419
  57. 57. Bray NL, Pimentel H, Melsted P, Pachter L. Near-optimal probabilistic RNA-Seq quantification. Nature Biotechnology. 2016;34(5):525-527
  58. 58. Patro R, Mount SM, Kingsford C. Sailfish enables alignment-free isoform quantification from RNA-Seq reads using lightweight algorithms. Nature Biotechnology. 2014;32(5):462-464
  59. 59. Guo Y, Zhao S, Sheng Q , Ye F, Li J, Lehmann B, et al. Multi-perspective quality control of Illumina exome sequencing data using QC3. Genomics. 2014;103(5-6):323-328
  60. 60. Hartley SW, Mullikin JC. QoRTs: A comprehensive toolset for quality control and data processing of RNA-Seq experiments. BMC Bioinformatics. 2015;16(1):1-7
  61. 61. García-Alcalde F, Okonechnikov K, Carbonell J, Cruz LM, Götz S, Tarazona S, et al. Qualimap: evaluating next-generation sequencing alignment data. Bioinformatics. 2012;28(20):2678-2679
  62. 62. Wang L, Wang S, Li W. RSeQC: quality control of RNA-Seq experiments. Bioinformatics. 2012;28(16):2184-2185
  63. 63. DeLuca DS, Levin JZ, Sivachenko A, Fennell T, Nazaire MD, Williams C, et al. RNA-SeQC: RNA-Seq metrics for quality control and process optimization. Bioinformatics. 2012;28(11):1530-1532
  64. 64. Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, Van Baren MJ, et al. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nature Biotechnology. 2010;28(5):511-515
  65. 65. Shumate A, Wong B, Pertea G, Pertea M. Improved Transcriptome Assembly Using a Hybrid of Long and Short Reads with StringTie. bioRxiv. 2021
  66. 66. Soneson C, Love MI, Robinson MD. Differential analyses for RNA-Seq: transcript-level estimates improve gene-level inferences. F1000Research. 2015;4: 1521
  67. 67. Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-Seq data with DESeq2. Genome Biology. 2014;15(12):1-21
  68. 68. Monga I, Kaur K, Dhanda SK. Revisiting hematopoiesis: applications of the bulk and single-cell transcriptomics dissecting transcriptional heterogeneity in hematopoietic stem cells. Briefings in Functional Genomics. 2022. Elac002
  69. 69. Su S, Law CW, Ah-Cann C, Asselin-Labat ML, Blewitt ME, Ritchie ME. Glimma: Interactive graphics for gene expression analysis. Bioinformatics. 2017;33(13):2050-2052
  70. 70. Pertea M, Kim D, Pertea GM, Leek JT, Salzberg SL. Transcript-level expression analysis of RNA-Seq experiments with HISAT, StringTie and Ballgown. Nature Protocols. 2016;11(9):1650-1667
  71. 71. Leng N, Dawson J, Kendziorski C. EBSeq: An R package for differential expression analysis using RNA-Seq data. R Package Version. 2015;1(10):1-39
  72. 72. Ritchie ME, Phipson B, Wu DI, Hu Y, Law CW, Shi W, et al. limma powers differential expression analyses for RNA-Sequencing and microarray studies. Nucleic Acids Research. 2015;43(7):e47
  73. 73. Law CW, Chen Y, Shi W, Smyth GK. voom: Precision weights unlock linear model analysis tools for RNA-Seq read counts. Genome Biology. 2014;15(2):1-7
  74. 74. Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010;26(1):139-140
  75. 75. Hardcastle TJ, Kelly KA. baySeq: Empirical Bayesian methods for identifying differential expression in sequence count data. BMC Bioinformatics. 2010;11(1):1-4
  76. 76. Pantano L, Hutchinson J, Barrera V, Piper M, Khetani R, Daily K. DEGreport: Report of DEG analysis 2017. R package version 1.16.0. Available from:
  77. 77. Tarazona S, Furió-Tarí P, Turrà D, Pietro AD, Nueda MJ, Ferrer A, et al. Data quality aware analysis of differential expression in RNA-Seq with NOISeq R/Bioc package. Nucleic Acids Research. 2015;43(21):e140
  78. 78. Risso D, Schwartz K, Sherlock G, Dudoit S. GC-content normalization for RNA-Seq data. BMC Bioinformatics. 2011;12(1):1-7
  79. 79. Gene Ontology Consortium. Gene ontology consortium: Going forward. Nucleic Acids Research. 2015;43(D1):D1049-D1056
  80. 80. Mi H, Ebert D, Muruganujan A, Mills C, Albou LP, Mushayamaha T, et al. PANTHER version 16: A revised family classification, tree-based classification tool, enhancer regions and extensive API. Nucleic Acids Research. 2021;49(D1):D394-D403
  81. 81. Wiebe DS, Mukhin AM, Omelyanchuk NA, Mironova VV. FoldGO for functional annotation of transcriptome data to identify fold-change-specific GO categories. In: Mathematical Modeling and High-Performance Computing in Bioinformatics, Biomedicine and Biotechnology (MM-HPC-BBB-2018). 2018. p. 73
  82. 82. Jiao X, Sherman BT, Huang DW, Stephens R, Baseler MW, Lane HC, et al. DAVID-WS: A stateful web service to facilitate gene/protein list analysis. Bioinformatics. 2012;28(13):1805-1806
  83. 83. Supek F, Bošnjak M, Škunca N, Šmuc T. REVIGO summarizes and visualizes long lists of gene ontology terms. PLoS One. 2011;6(7):e21800
  84. 84. Carbon S, Ireland A, Mungall CJ, Shu S, Marshall B, Lewis S. AmiGO Hub, Web Presence Working Group. AmiGO: online access to ontology and annotation data. Bioinformatics. 2009;25(2):288-289
  85. 85. Fernandes M, Husi H. ORA, FCS, and PT strategies in functional enrichment analysis. In: Proteomics Data Analysis. New York, NY: Humana; 2021. pp. 163-178
  86. 86. Smoot ME, Ono K, Ruscheinski J, Wang PL, Ideker T. Cytoscape 2.8: New features for data integration and network visualization. Bioinformatics. 2011;27(3):431-432
  87. 87. Green ML, Karp PD. The outcomes of pathway database computations depend on pathway ontology. Nucleic Acids Research. 2006;34(13):3687-3697
  88. 88. Karp PD. Pathway databases: A case study in computational symbolic theories. Science. 2001;293(5537):2040-2044
  89. 89. Ge SX, Son EW, Yao R. iDEP: An integrated web application for differential expression and pathway analysis of RNA-Seq data. BMC Bioinformatics. 2018;19(1):1-24
  90. 90. Krämer A, Green J, Pollard J, Tugendreich S. Causal analysis approaches in ingenuity pathway analysis. Bioinformatics. 2014;30(4):523-530
  91. 91. Draghici S, Khatri P, Tarca AL, Amin K, Done A, Voichita C, et al. A systems biology approach for pathway level analysis. Genome Research. 2007;17(10):1537-1545
  92. 92. Luo W, Pant G, Bhavnasi YK, Blanchard SG, Brouwer C. Pathview web: User friendly pathway visualization and data integration. Nucleic Acids Research. 2017;45(W1):W501-W508
  93. 93. Kanehisa M, Sato Y. KEGG mapper for inferring cellular functions from protein sequences. Protein Science. 2020;29(1):28-35
  94. 94. Szklarczyk D, Gable AL, Nastou KC, Lyon D, Kirsch R, Pyysalo S, et al. The STRING database in 2021: Customizable protein–protein networks, and functional characterization of user-uploaded gene/measurement sets. Nucleic Acids Research. 2021;49(D1):D605-D612
  95. 95. Yang X, Wang Y, Zhang G, Wang X, Wu L, Ke H, et al. Detection and validation of one stable fiber strength QTL on c9 in tetraploid cotton. Molecular Genetics and Genomics. 2016;291(4):1625-1638
  96. 96. Rodgers J, Zumba J, Fortier C. Measurement comparison of cotton fiber micronaire and its components by portable near infrared spectroscopy instruments. Textile Research Journal. 2017;87(1):57-69
  97. 97. Wang H, Zhang R, Shen C, Li X, Zhu D, Lin Z. Transcriptome and QTL analyses reveal candidate genes for fiber quality in Upland cotton. The Crop Journal. 2020;8(1):98-106
  98. 98. Dong C, Wang J, Yu Y, Ju L, Zhou X, Ma X, et al. Identifying functional genes influencingGossypium hirsutumfiber quality. Frontiers in Plant Science. 2019;9:1968
  99. 99. Jiang X, Gong J, Zhang J, Zhang Z, Shi Y, Li J, et al. Quantitative trait loci and transcriptome analysis reveal genetic basis of fiber quality traits in CCRI70 RIL population ofGossypiumhirsutum. Frontiers in Plant Science. 2021;12:753755-753755
  100. 100. Zou X, Liu A, Zhang Z, Ge Q , Fan S, Gong W, et al. Co-expression network analysis and hub gene selection for high-quality fiber in upland cotton (Gossypiumhirsutum) using RNA-Sequencing analysis. Genes. 2019;10(2):119
  101. 101. Wang Y, Li Y, Gong SY, Qin LX, Nie XY, Liu D, et al. GhKNL1 controls fiber elongation and secondary cell wall synthesis by repressing its downstream genes in cotton (Gossypiumhirsutum). Journal of Integrative Plant Biology. 2022;64(1):39-55
  102. 102. Zhang C, Li L, Liu Q , Gu L, Huang J, Wei H, et al. Identification of loci and candidate genes responsible for fiber length in upland cotton (Gossypium hirsutumL.) via association mapping and linkage analyses. Frontiers in Plant Science. 2019;10:53
  103. 103. Hsu CY, Arick MA, Miao Q , Saha S, Jenkins JN, Ayubov MS, et al. Transcriptome analysis of ten days post anthesis elongating fiber in the upland cotton (Gossypiumhirsutum) chromosome substitution line CS-B25. American Journal of Plant Sciences. 2018;9(06):1334
  104. 104. Jiang X, Fan L, Li P, Zou X, Zhang Z, Fan S, et al. Co-expression network and comparative transcriptome analysis for fiber initiation and elongation reveal genetic differences in two lines from upland cotton CCRI70 RIL population. PeerJ. 2021;9:e11812
  105. 105. Sun M, Ye Z, Tan J, Chen S, Zhang X, Tu L. A cotton germin-like protein GbGLP2 controls fiber length via regulating genes involved in secondary cell wall synthesis. Molecular Breeding. 2020;40(10):1-4
  106. 106. Chen Q , Wang W, Khanal S, Han J, Zhang M, Chen Y, et al. Transcriptome analysis reveals genes potentially related to high fiber strength in aGossypium hirsutumline IL9 withGossypium mustelinumintrogression. Genome. 2021;64(11):985-995
  107. 107. Gao Z, Sun W, Wang J, Zhao C, Zuo K. GhbHLH18 negatively regulates fiber strength and length by enhancing lignin biosynthesis in cotton fibers. Plant Science. 2019;286:7-16
  108. 108. Islam MS, Fang DD, Thyssen GN, Delhom CD, Liu Y, Kim HJ. Comparative fiber property and transcriptome analyses reveal key genes potentially related to high fiber strength in cotton (Gossypium hirsutumL.) line MD52ne. BMC Plant Biology. 2016;16(1):1-9
  109. 109. Song Z, Chen Y, Zhang C, Zhang J, Huo X, Gao Y, et al. RNA-Seq reveals hormone-regulated synthesis of non-cellulose polysaccharides associated with fiber strength in a single-chromosomal-fragment-substituted upland cotton line. The Crop Journal. 2020;8(2):273-286
  110. 110. Lu Q , Shi Y, Xiao X, Li P, Gong J, Gong W, et al. Transcriptome analysis suggests that chromosome introgression fragments from sea island cotton (Gossypiumbarbadense) increase fiber strength in upland cotton (Gossypiumhirsutum). G3: Genes, Genomes, Genetics. 2017;7(10):3469-3479
  111. 111. Hinchliffe DJ, Meredith WR, Yeater KM, Kim HJ, Woodward AW, Chen ZJ, et al. Near-isogenic cotton germplasm lines that differ in fiber-bundle strength have temporal differences in fiber gene expression patterns as revealed by comparative high-throughput profiling. Theoretical and Applied Genetics. 2010;120(7):1347-1366
  112. 112. Salih H, Odongo MR, Gong W, He S, Du X. Genome-wide analysis of cotton C2H2-zinc finger transcription factor family and their expression analysis during fiber development. BMC Plant Biology. 2019;19(1):1-7
  113. 113. Miao Q , Deng P, Saha S, Jenkins JN, Hsu CY, Abdurakhmonov IY, et al. Transcriptome analysis of ten-DPA fiber in an upland cotton (Gossypium hirsutum) line with improved fiber traits from phytochrome A1 RNAi plants. American Journal of Plant Sciences. 2017;8(10):2530-2553
  114. 114. Gallagher JP, Grover CE, Hu G, Jareczek JJ, Wendel JF. Conservation and divergence in duplicated fiber coexpression networks accompanying domestication of the polyploidGossypiumhirsutum L. G3: Genes, Genomes, Genetics. 2020;10(8):2879-2892
  115. 115. Huang J, Guo Y, Sun Q , Zeng W, Li J, Li X, et al. Genome-wide identification of R2R3-MYB transcription factors regulating secondary cell wall thickening in cotton fiber development. Plant and Cell Physiology. 2019;60(3):687-701
  116. 116. Cao JF, Zhao B, Huang CC, Chen ZW, Zhao T, Liu HR, et al. The miR319-targeted GhTCP4 promotes the transition from cell elongation to wall thickening in cotton fiber. Molecular Plant. 2020;13(7):1063-1077
  117. 117. Cheng G, Zhang L, Wei H, Wang H, Lu J, Yu S. Transcriptome analysis reveals a gene expression pattern associated with fuzz fiber initiation induced by high temperature in Gossypium barbadense. Genes. 2020;11(9):1066
  118. 118. Qin Y, Sun H, Hao P, Wang H, Wang C, Ma L, et al. Transcriptome analysis reveals differences in the mechanisms of fiber initiation and elongation between long-and short-fiber cotton (Gossypiumhirsutum L.) lines. BMC Genomics. 2019;20(1):1-6
  119. 119. Mo H, Wang L, Ma S, Yu D, Lu L, Yang Z, et al. Transcriptome profiling ofGossypiumarboreum during fiber initiation and the genome-wide identification of trihelix transcription factors. Gene. 2019;709:36-47
  120. 120. Tang Z, Fan Y, Zhang L, Zheng C, Chen A, Sun Y, et al. Quantitative metabolome and transcriptome analysis reveals complex regulatory pathway underlying photoinduced fiber color formation in cotton. Gene. 2021;767:145180
  121. 121. Sun S, Xiong XP, Zhu Q , Li YJ, Sun J. Transcriptome sequencing and metabolome analysis reveal genes involved in pigmentation of green-colored cotton fibers. International Journal of Molecular Sciences. 2019;20(19):4838
  122. 122. Liu HF, Luo C, Song W, Shen H, Li G, He ZG, et al. Flavonoid biosynthesis controls fiber color in naturally colored cotton. PeerJ. 2018;6:e4537

Written By

Shalini P. Etukuri, Varsha C. Anche, Mirzakamol S. Ayubov, Lloyd T. Walker and Venkateswara R. Sripathi

Submitted: March 18th, 2022 Reviewed: March 21st, 2022 Published: May 5th, 2022