Recent studies in mapping of fiber-related traits in cotton.
Cotton (Gossypium spp.) produces naturally soft, single-celled trichomes on the seed coat as fiber, supplying the main natural raw material for the textile industry. It is one of the leading cash crops in the world and is evolutionarily important as a model system for detailed scientific investigations. Cotton production is going through a major transition: it is losing market share to synthetic fibers, Bt and herbicide resistance genes have become highly popular in cotton cultivars, and changes in textile technologies have recently shifted fiber demand toward the superior fiber quality standards required in the global market. Recently, next-generation sequencing technologies, which offer high-throughput sequencing at greatly reduced cost, have provided opportunities to sequence the diploid and tetraploid cotton genomes. Given the large volume of literature on molecular mapping, new genomic resources, characterization of cotton genomes, discoveries of many novel genes and regulatory elements including small RNAs and microRNAs, and new genetic tools such as gene silencing and gene editing for genome manipulation, this report attempts to provide readers with a comprehensive review of recent advances in cotton fiber genomics research.
- fiber genomics
- SSR markers
- QTL analysis
- genome-wide analysis
Cotton (Gossypium spp.), a natural fiber source for the textile industry, has significant economic impact in about 80 countries including the USA, China, India, Pakistan, Brazil, Egypt, and Uzbekistan. Cotton seeds are a rich source of vegetable oils, medicinal compounds, and byproducts. Cottonseed is also used in livestock feed. The estimated world cotton area and production are 32–34 million hectares and ~26 million metric tons of fiber, respectively. Biologically, cotton fibers are single-celled trichomes that grow from the epidermal cell layer of the ovule in a boll. The nucleus of allotetraploid cotton contains two subgenomes, At and Dt, which resulted from the ancestral allopolyploidization of progenitor A-genome and D-genome diploids. The presence of both subgenomes further complicates the understanding of the evolution, function, and composition of allotetraploid cotton. Most of the economically important traits in cotton are controlled by complex quantitative trait loci (QTL) composed of many genes that collectively express the phenotype. Not all fiber quality traits are positively related to lint yield, and many have a negative genetic correlation with important agronomic traits, including lint yield. The principle of cotton breeding is mainly to select desirable traits based on market demand, and breeders embrace this objective in their genetic improvement programs. Previously, cotton breeders primarily emphasized yield and agronomic characteristics, but recent technological changes in the spinning industry have necessitated genetic improvements in major cotton fiber quality traits such as length, strength, micronaire (fineness), uniformity, reflectance, elongation, and short fiber content. Accordingly, cotton producers demand superior lines that offer not only high yield but also improved fiber quality.
Although significant progress has been achieved in developing genetically engineered insect- and herbicide-resistant cultivars, which suggests that integrating molecular and conventional methods is useful and efficient for the genetic improvement of cotton, breeding complex traits such as fiber quality efficiently and rapidly remains a great challenge. One of the main obstacles is the narrow genetic base of current cultivar germplasm, which has two causes: (1) a “genetic bottleneck” that occurred during the domestication of cotton [8, 9]; and (2) the repeated use of similar cotton genotypes as major parents in breeding programs. These constraints have prompted researchers to adopt novel tools and change breeding approaches to achieve successful cotton improvement programs. With advances in modern genomic technologies, considerable progress has been made in applying innovative approaches to cotton genetic research and breeding. This review emphasizes the latest advances in molecular mapping, genome-wide analyses, genome sequencing, characterization of fiber genes, and existing genomic approaches for the improvement of fiber traits in allotetraploid cotton species.
2. Molecular markers
DNA markers are “landmarks” in the genome that can be selected because of their proximity to a QTL of interest. Selecting DNA markers linked to a QTL increases the efficiency of breeding and usually reduces costs and reliance on subjective phenotypic selection while requiring minimal backcrossing. Molecular markers represent sites of detectable variation in the genomic DNA, and they are broadly categorized as: (1) restriction enzyme-based DNA markers (such as RFLP and AFLP), (2) polymerase chain reaction (PCR)-based markers (mostly SSRs), and (3) single nucleotide polymorphism (SNP)-based markers.
The first RFLP map in an interspecific cotton population (G. hirsutum × G. barbadense) comprised 705 RFLP loci assembled into 41 linkage groups that covered 4675 cM, with loci mapped on 11 pairs of homoeologous chromosomes. An RFLP map with 63 fiber QTLs linked to the A-subgenome (chromosomes 3, 7, 9, 10, and 12) and 29 fiber QTLs associated with the D-subgenome (chromosomes 14Lo, 20, and the long arm of chromosome 26) was reported in 2005. RFLPs were used extensively to identify a wide range of QTLs linked to fiber quality, length, strength, uniformity, wall thickness, micronaire, fineness, and maturity [13, 14].
A second group of molecular markers, AFLPs, are powerful and efficient, and they continue to be used in cotton genomic investigations. In the recent past, AFLP markers have been used to monitor the differential expression of cotton fiber transcripts during elongation and secondary cell wall thickening in interspecific (G. hirsutum × G. barbadense) RI lines.
SSRs are the most informative, versatile, and readily detectable DNA-based markers. They have been used in cotton to identify agronomically and economically important genes, to construct genetic linkage maps, and to conduct linkage disequilibrium-based association studies [18, 19]. Although traditional methods of microsatellite marker development are costly and time-consuming, large numbers of SSRs have been used to explore the molecular diversity, population structure, and elite alleles of several Upland cotton cultivars [21, 22, 23, 24]. As a result, many marker loci associated with fiber quality traits and fiber development have been identified [21, 22, 23, 25].
3. Mapping for fiber-related traits in cotton
Conventional breeding has delivered significant genetic improvement in cotton, but the complexity of the genome and the limited understanding of economically important traits have hampered efficient breeding. Cotton molecular breeding has been a reliable tool for the characterization and mobilization of complex QTLs of interest, and genomics-assisted breeding has become an effective method for selecting parents for agronomic, stress-responsive, and fiber-related traits. In the past decade, the focus of cotton genomic research has shifted from QTL mapping based on genotyping a few markers to large-scale, marker-based genome-wide association studies (GWAS) that use high-throughput next-generation sequencing (NGS)-based genotyping methods.
An interspecific high-density linkage map of allotetraploid cotton has been constructed from an F2 population derived from G. hirsutum and G. darwinii, with 2763 markers assigned to 26 linkage groups covering 4176.7 cM and showing a few differences between the At and Dt subgenomes. Among the 601 distorted SSR loci reported, fewer segregation distortion loci were located in the At subgenome than in the Dt subgenome. Recently, 185 cotton genotypes were evaluated to map major fiber traits using 95 polymorphic SSRs. This study also covered other marker-trait associations, such as average boll weight and gin turnout percentage, together with fiber traits such as micronaire value, staple length, fiber bundle strength, and uniformity index. The results showed a clear association of MGHES-51, MGHES-31, and MGHES-55 with all these traits, which is useful for future marker-assisted breeding and gene cloning studies in cotton [25, 27]. Similarly, the EST-SSR markers MGHES-31 and MGHES-55 were associated with lint percentage in unique RIL populations derived from crosses between linted and lintless genotypes.
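To illustrate how segregation distortion at a codominant marker such as an SSR is commonly assessed in an F2 population, the sketch below applies a chi-square goodness-of-fit test against the expected 1:2:1 genotype ratio. This is a generic textbook test and not necessarily the procedure used in the studies cited above; the function names and genotype counts are hypothetical.

```python
# Minimal sketch: chi-square test for segregation distortion at one
# codominant marker in an F2 population (expected ratio 1:2:1).
# All names and counts are illustrative assumptions, not data from the review.

# Chi-square critical value for 2 degrees of freedom at alpha = 0.05.
CRITICAL_DF2_P05 = 5.991


def chi_square_f2(n_aa, n_ab, n_bb):
    """Return the chi-square statistic against the 1:2:1 F2 expectation."""
    total = n_aa + n_ab + n_bb
    if total == 0:
        raise ValueError("at least one genotyped individual is required")
    observed = (n_aa, n_ab, n_bb)
    expected = (total / 4, total / 2, total / 4)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def is_distorted(n_aa, n_ab, n_bb, critical=CRITICAL_DF2_P05):
    """Flag a marker whose genotype counts deviate from 1:2:1."""
    return chi_square_f2(n_aa, n_ab, n_bb) > critical


if __name__ == "__main__":
    # Hypothetical counts of parent-A homozygotes, heterozygotes,
    # and parent-B homozygotes among 100 F2 plants.
    print(chi_square_f2(45, 40, 15), is_distorted(45, 40, 15))
    print(chi_square_f2(25, 50, 25), is_distorted(25, 50, 25))
```

In practice, a significance threshold applied across thousands of markers would usually be corrected for multiple testing; the fixed 0.05 threshold here is only for illustration.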
SSRs have been used to examine molecular diversity and population structure and to scan for polymorphisms. Genome-wide mapping in over 500 inbred cotton cultivars collected from China, the United States, and the Soviet Union helped identify 494 fiber-quality-associated SSRs. Of the 216 markers related to fiber quality identified in this study, 13 had been reported in other studies as fiber-trait-associated markers.
Gene-based markers have been developed from candidate genes and EST sequences to detect polymorphism in Gossypium species and to support genetic and physical mapping. EST-derived microsatellites have increased the number of microsatellites available for genetic map construction and have facilitated the use of functional genomics to elucidate the fiber development process. EST-derived SSR-based high-density genetic maps for cotton fiber genes were reported in a number of studies [28, 29, 30, 31, 32]. The early EST-SSR studies in cotton [28, 29] focused on mapping loci in an interspecific (G. hirsutum × G. barbadense) RIL population, while recent efforts aimed at mapping colored fiber loci (Lc1 and Lc2).
With advances in molecular marker development in cotton, three new marker types, indel (insertion-deletion), SNP, and retrotransposon-microsatellite amplified polymorphism (REMAP), were used to increase map density in allotetraploid cotton cultivars. SNP markers in cotton can be used to associate genes with the respective fiber traits. Over one hundred genomic regions have been identified by tagging >2500 SSR and SNP markers using interspecific recombinant inbred lines (RILs). A large set of gene-associated SNPs was identified by comparative transcriptome profiling of four wild (G. tomentosum, G. mustelinum, G. armourianum, and G. longicalyx) and two cultivated cotton species (G. barbadense and G. hirsutum). By combining RNA-Seq and super bulked segregant analysis sequencing (sBSAseq), the Li2 mutant and its wild-type near-isogenic line (NIL), G. hirsutum cv. DP5690, were screened to identify the Ligon-lintless 2 (Li2) gene sequence, and subgenome-specific SNPs were identified in tetraploid cotton.
Expansin proteins play an important role in fiber length and quality. Using NGS approaches, SNPs linked to six expansin genes were identified. An A-type cyclin-dependent kinase (GhCDKA) protein, which has conserved cyclin-binding, ATP-binding, and catalytic domains, plays a key role in fiber development. CDKA gene expression was validated by northern blot and qRT-PCR analyses, and SNPs linked to the CDKA gene locus were assigned to chromosome 16. Using comparative and resequencing analyses, 24 million SNPs were identified between the A- and D-genomes in cotton. This analysis revealed that the A-genome is relatively more variable (duplications and deletions) than the D-genome. In G. hirsutum, 1472 conversion events, including 113 overlapping genetic events, were identified between homologous chromosomes.
Two hundred twenty and 115 BAC contigs have been identified for the two homoeologous chromosomes 12 and 26 of Upland cotton, spanning 73.49 Mb and 34.23 Mb in physical length, respectively. Numerous fiber unigenes and non-unigene locations were found on both chromosomes. New marker loci and linkage groups were applied in different cotton species using informative sequence-based markers and DNA sequence information. Some of the ESTs and BACs were physically anchored and clustered into the high-density genetic map, and were functionally annotated and classified across different Gossypium species.
A single dominant gene, Ligon lintless-2 (Li2), results in significantly shorter fibers than the wild type in cotton (G. hirsutum). Two near-isogenic Li2 lines (one mutant and one wild type) were painstakingly created through five generations of backcrossing (BC5). An additional cross was used to develop a linkage map of the Li2 locus, located on chromosome 18. The SSR marker NAU3991 was successfully mapped and showed complete linkage with the Li2 locus. Marker-assisted breeding and in vitro mutagenesis of cotton ovules can provide additional insight into the regulatory aspects of the Li2 mutation in cotton. Similarly, using linkage mapping and analysis in G. raimondii, the Li1 gene on chromosome 22 was identified. Many genes were found to be altered in the Li1 mutant line in association with early termination of fiber elongation. Several additional studies have also identified factors that affect downstream Li-associated genes.
In addition, a comprehensive review of molecular markers, marker-assisted selection (MAS), and marker-assisted backcross (MAB) breeding has been presented recently. Cleaved amplified polymorphic sequence (CAPS) and derived CAPS (dCAPS) markers obtained from genes of interest are becoming increasingly valuable in molecular breeding of crops, along with SNP markers. In cotton, CAPS and dCAPS markers specific to phytochrome genes and the transcription factor HY5 were developed using comparative sequence analysis of the PHYA1, PHYB, and HY5 genes, which showed close association of these genes with fiber quality and early flowering traits [45, 46].
A single significant QTL can control multiple fiber quality traits, such as fiber length, micronaire, strength, and uniformity, and characterizing such a QTL helps to elucidate the molecular basis of fiber quality. One such fiber QTL was mapped between the markers HAU2119 and SWU2302 in a G. hirsutum RIL population. Three candidate genes have been identified within this QTL by RNA-Seq and RT-PCR analysis.
Recently, 352 wild and domesticated cotton accessions were screened, and 93 domestication sweeps were assigned to 74 Mb and 104 Mb of the A- and D-subgenomes in allotetraploid cotton, respectively. Further, a genome-wide association study (GWAS) identified 19 candidate loci for fiber-quality-related traits. This study suggested possible asymmetric subgenome domestication for directional selection of long fibers. The effects of domestication on cis-regulatory divergence were shown by genome-wide screening for DNase I-hypersensitive sites and linking the variation to gene function.
Numerous additional genetic and genomic resources are available to the cotton research community through specific databases. CottonDB, established in 1995, is one of the earliest plant genome databases. The International Cotton Genome Initiative (ICGI) was formed to coordinate the development of cotton genomics science, including the creation of an integrated and saturated genetic and physical map of cotton. The Cotton Microsatellite Database (CMD), an invaluable resource for cotton microsatellites, was developed to meet the goals of ICGI with support from Cotton Incorporated. Later, CottonGen, a more comprehensive and robust database covering genomic, genetic, and breeding information on cotton, was established with enhanced features such as data visualization, mining, sharing, and retrieval. Currently, there are 103 maps available in CottonGen. A few recent mapping studies for fiber-related traits in cotton are presented in Table 1.
|#||Author names||Publication year||Mapping population||Trait types mapped||Published journal||Reference|
|1||Iqbal and Rahman||2017||Inbred cultivars and segregating biparental population||Lint||Frontiers in Plant Science|||
|2||Adhikari et al.||2017||F2 and F2:3||Fiber quality||Euphytica|||
|3||Wang et al.||2017||BC3F2, BC3F2:3 and BC3F2:4||Fiber length||Theoretical and Applied Genetics|||
|4||Su et al.||2016||Segregating biparental population||Fiber quality||Scientific Reports|||
|5||Wang et al.||2016||RILs||Yield and fiber||PLoS ONE|||
|6||Nie et al.||2016||Inbred cultivars||Fiber quality||BMC Genomics|||
|7||Shang et al.||2016||RILs and BC||Fiber quality||G3: Genes, Genomes, Genetics|||
|8||Liu et al.||2016||F2 and RIL118||Fiber quality||BMC Genomics|||
|9||Badigannavar and Myers||2015||F2||Fiber||Journal of Cotton Science|||
|10||Wang et al.||2015||RILs||Yield and Fiber||PLoS ONE|||
4. Genome-sequencing efforts in cotton
Genome-sequencing efforts in cotton have significantly advanced our genomic knowledge over the past decade. The availability of full cotton genome sequences should provide a better understanding of fiber and fiber-related traits. Cotton genome sequencing began with the closest diploid progenitor species, G. raimondii (D5) and G. arboreum (A2), and was later extended to the tetraploid species G. hirsutum (AD)1 and G. barbadense (AD)2. Currently, whole genome sequencing of the diploid progenitor species G. herbaceum (A1) is progressing under a collaboration between Alabama A&M University and USDA ARS. Genome sequence information from the diploid and tetraploid species may aid in developing genetically engineered cotton lines with superior agronomic traits. Recent reports on whole genome analysis in cotton are summarized in Table 2.
|Authors||Year of publication||Species||Sequencing platform||Assembler used||Assembled genome size||Predicted genes||Journal published||Reference|
|Sripathi et al.||Unpublished||Gossypium herbaceum||Roche 454 and Illumina HiSeq 2000||ABySS, SSPACE and SEALER||~1.46 Gb||41,387||Unpublished|||
|Li et al.||2015||Gossypium hirsutum||Illumina HiSeq 2000 platform||SOAPdenovo, SSPACE, and GapCloser||~2.17 Gb||76,943||Nature Biotechnology|||
|Liu et al.||2015||Gossypium barbadense||Roche 454, Illumina HiSeq 2000, PacBio SMRT||Newbler v2.3||~2.20 Gb||77,526||Scientific Reports (Published online)|||
|Li et al.||2014||Gossypium arboreum||Illumina HiSeq 2000 platform||SOAPdenovo||~1.94 Gb||41,330||Nature Genetics|||
|Wang et al.||2012||Gossypium raimondii||Illumina HiSeq 2000 platform||SOAPdenovo and SSPACE||~0.58 Gb||40,976||Nature Genetics|||
Briefly, several key facts have been gleaned from the genome-sequencing efforts in the tetraploid cottons G. hirsutum and G. barbadense. The allotetraploid G. hirsutum TM-1 genome was sequenced using whole-genome shotgun reads, BAC-end sequences, and genotyping-by-sequencing (GBS) genetic maps. It was subsequently assembled into a ~2.56 Gb genome with 32,032 and 34,402 genes from the At and Dt subgenomes, respectively. Comparative subgenome analysis revealed higher percentages of gene loss, disrupted genes, structural rearrangements, and sequence divergence in the At subgenome than in the Dt subgenome, which can be attributed to evolutionary irregularities. However, no genome-wide expression dominance was found between the two subgenomes. Notably, positively selected genes (PSGs) for fiber improvement in the At subgenome and for stress tolerance in the Dt subgenome are tied to genomic signatures of selection and domestication. Between the two allopolyploid cottons, G. barbadense is the preferred species for producing superior, extra-long fibers. Whole-genome sequence analysis of G. barbadense suggested that an uneven genome-wide duplication 20 million years ago (MYA) and pseudogenization 11–20 MYA might be probable causes of genomic divergence. Further, based on the sequenced genomic information, the role of a fiber-specific cell elongation regulator, PRE1 (of At subgenome origin), in conditioning extra-long fibers was revealed.
Since the completion of whole genome sequencing (WGS) of the nuclear genome of cotton, the focus has shifted to the chloroplast and mitochondrial genomes. The complete cotton chloroplast genome is 160,301 bp in length with a total of 131 genes, of which 112 are unique and 19 are duplicated. The cotton chloroplast genome lacks rpl22 and infA, and contains a number of dispersed direct and inverted repeats. Phylogenetic analysis of cotton and 25 other completely sequenced angiosperm chloroplast genomes revealed strong relationships among Malvales, Brassicales, and Myrtales within the rosids.
The complete mitochondrial (mt) DNA sequence of G. raimondii was assembled into a circular genome of 676,078 bp and then compared with those of other plants. The analysis showed 39 protein-coding, 6 rRNA, and 25 tRNA genes. Interestingly, almost all of the G. raimondii mt genome has been transferred to Chr1 within the nucleus. Phylogenetic analysis with related mt genomes of rosids showed that G. raimondii is closely related to G. barbadense. Similar to the plastid genome analysis, the phylogenetic analysis of mt genomes revealed that Brassicales are closely related to Malvales within the rosids. Sequencing and characterization of both the nuclear and cytoplasmic genomes of Gossypium species will enrich the knowledge used to identify fiber-related genes for the improvement of cotton fiber quality traits using modern genetic engineering tools [58, 59].
The whole genome sequences of G. hirsutum and G. barbadense have revolutionized molecular biology investigations in cotton. Knowledge gained from decoding the cotton genome has improved our understanding of gene function, which should ultimately benefit growers through improved yield and fiber quality. The reference sequences of the tetraploid cotton genome have provided an unprecedented opportunity to bridge the gap between functional and structural genomic research. Scientists are using new advanced technologies “to mine” for useful genes and to understand the molecular processes of fiber development for cotton germplasm enhancement. For example, Paterson reported that, among 48 genes whose expression is upregulated in domesticated G. hirsutum fibers at 10 days post anthesis (dpa), 20 genes lie within a QTL hotspot affecting length, uniformity, and short fiber content, a 10-fold enrichment relative to random genes. Thirteen genes lie in the homologous hotspot affecting fiber elongation and fineness, a 15-fold enrichment. The reference tetraploid genome sequence revealed non-random patterns in diverse data sets that are concentrated in specific small regions of the At and Dt genomes. With such enriched genomic data in hand, scientists are much closer to identifying the causal gene(s). For example, expression patterns of genes in the G. hirsutum wild type and its near-isogenic fuzzless/lintless mutant at the fiber initiation stage were analyzed using RNA-Seq data. Recently, Chen et al. reported, based on integrated genome-wide expression profiling, that differential gene regulation causes the difference in fiber quality between G. barbadense and G. hirsutum.
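The fold-enrichment figures quoted above can be read as a ratio of two proportions. The sketch below assumes the usual definition (the share of the gene set that falls in a region, divided by the share of all annotated genes that falls in that region); the review does not state the exact calculation used. The 20 and 48 come from the text, whereas the hotspot and genome-wide gene counts are hypothetical values chosen only to make the arithmetic concrete.

```python
# Minimal sketch of a fold-enrichment calculation under an assumed definition.
# The function name and the background gene counts are hypothetical.


def fold_enrichment(hits_in_region, hits_total, genes_in_region, genes_total):
    """Ratio of the observed fraction of a gene set inside a region to the
    fraction expected if the genes were placed at random."""
    if hits_total <= 0 or genes_in_region <= 0 or genes_total <= 0:
        raise ValueError("totals and region size must be positive")
    observed_fraction = hits_in_region / hits_total
    expected_fraction = genes_in_region / genes_total
    return observed_fraction / expected_fraction


if __name__ == "__main__":
    # 20 of 48 upregulated genes in the hotspot (from the text); the hotspot
    # is assumed to hold 1000 of 24000 annotated genes (illustrative only).
    print(fold_enrichment(20, 48, 1000, 24000))
```

A formal analysis would normally pair this ratio with a significance test, such as a hypergeometric test, which is omitted here for brevity.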
5. Fiber-related transcriptome and gene expression studies in cotton
The functions of the cotton transcriptome have been studied using multiple techniques, such as comparative genomics, BLAST, Gene Ontology (GO) analysis, and pathway enrichment with the Kyoto Encyclopedia of Genes and Genomes (KEGG). In this section, we briefly summarize some of the key fiber-related findings.
To study key fiber development genes, fuzzless/lintless (fl) cotton mutants have been used to elucidate the molecular mechanisms relevant to fiber length development. Furthermore, different fiber developmental stages have been studied to understand the mechanisms of overall fiber development comprehensively. For example, the G. hirsutum wild type (WT) and its near-isogenic fl mutant were used in comparative transcriptome analysis and microarray studies of developing ovules [64, 65]. In fl mutants, calcium- and phytohormone-mediated signal transduction pathways, biosynthesis of auxin and ethylene, and stress-responsive transcription factors (TFs) were downregulated at the fiber initiation stage, whereas genes related to carbohydrate and lipid metabolism, the mitochondrial electron transport system (mETS), and cell wall loosening and elongation were strongly downregulated at the fiber elongation stage. A number of genes, including CELLULOSE SYNTHASES and SUCROSE SYNTHASE C, were downregulated in fl mutants at the fiber initiation and secondary cell wall biosynthesis stages compared to the WT. Interestingly, some genes related to phytohormone signaling and stress response that were upregulated in the WT during early fiber initiation began to be expressed only later, at 15 days post anthesis (dpa), in fl mutants.
Comparative transcriptome analysis showed that only a few genes were differentially expressed in ovules at 0 dpa and fibers at 3 dpa. The importance of the auxin signaling and sugar signaling pathways in modulating different fiber developmental stages was studied using Pathway Studio analysis.
Another transcriptome analysis of ovules from the G. hirsutum WT and its fl mutant at the fiber initiation and elongation stages was carried out using high-throughput tag sequencing (Tag-seq). Differentially expressed gene (DEG) analysis revealed substantial changes in gene type and abundance between the wild-type and mutant libraries. Most of the DEGs in the WT1/M1 and WT2/M2 libraries developed for the study of fiber cell development included cellulose synthase, phosphatase, and dehydrogenase genes.
To identify important genes in Ligon lintless-1 (Li1) mutants during the secondary cell wall synthesis stage, high-throughput microarray technology and real-time PCR were employed. Transcriptome analysis found at least 100 transcripts with at least 2-fold expression differences during secondary wall biogenesis. The expansin, sucrose synthase, and tubulin gene families were identified in the Li1 mutant. This points to Li1 gene activity during later fiber developmental stages.
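A fold-change cutoff of the kind mentioned above is usually applied on a log2 scale, where a 2-fold difference in either direction corresponds to an absolute log2 ratio of at least 1. The sketch below shows one simple way to apply such a filter; it is not the pipeline of the cited study, and the gene labels, expression values, and pseudocount are hypothetical.

```python
# Minimal sketch of a 2-fold expression filter on a log2 scale.
# Gene names, values, and the pseudocount are illustrative assumptions.
import math


def log2_fold_change(mutant, wild_type, pseudocount=1.0):
    """log2 ratio of mutant to wild-type expression; the pseudocount
    avoids division by zero for unexpressed transcripts."""
    return math.log2((mutant + pseudocount) / (wild_type + pseudocount))


def filter_two_fold(expression, threshold=1.0):
    """Keep transcripts whose |log2 fold change| >= threshold.

    `expression` maps a transcript name to (mutant, wild_type) values.
    A threshold of 1.0 corresponds to a 2-fold change up or down.
    """
    selected = {}
    for gene, (mutant, wild_type) in expression.items():
        lfc = log2_fold_change(mutant, wild_type)
        if abs(lfc) >= threshold:
            selected[gene] = lfc
    return selected


if __name__ == "__main__":
    # Hypothetical normalized expression values (mutant, wild type).
    example = {"geneA": (7.0, 1.0), "geneB": (3.0, 3.0), "geneC": (1.0, 3.0)}
    print(filter_two_fold(example))
```

Real microarray or RNA-Seq analyses typically combine such a cutoff with replicate-based statistical testing, which this sketch leaves out.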
A comparative microarray analysis was used to study fiber elongation in two short fiber mutants and their near-isogenic WT to identify key genes or metabolic pathways. Genes related to energy production, increased mitochondrial electron transport activity, and response to reactive oxygen showed higher expression in the short fiber mutants than in the wild type. At least 88 fiber elongation genes were identified that were not affected by growth conditions.
Defects in the fiber secondary cell wall (SCW) result in non-fluffy fibers with low dry weight and fineness in the immature fiber (im) mutant of G. hirsutum. The im mutant had lower cellulose content and thinner cell walls than its near-isogenic WT line (NIL) TM-1 at the same fiber developmental stage. Sucrose content, an important carbon source for cellulose synthesis, was also significantly lower in the im mutant than in TM-1 during 25–35 dpa. Comparative analysis of fiber transcriptional profiles indicated that SCW biosynthesis was delayed by 3 days in the im mutant relative to TM-1. Genes significantly upregulated in TM-1 were associated with cellulose synthesis, secondary cell wall biogenesis, cell wall thickening, and sucrose metabolism. Quantitative reverse transcription PCR (qRT-PCR) analysis validated that 12 genes related to carbohydrate metabolism were differentially expressed at the earlier transition. qRT-PCR also showed differences between TM-1 and the im mutant in the SCW biosynthesis stages of fiber development and underscored the importance of im regulatory gene functions in fiber SCW biosynthesis.
Human selection has altered the phenotypic evolution of fiber development over long periods of selective breeding, as shown by comparative transcriptome profiling of developing fibers using RNA-Seq analysis. Over 5000 genes with a regulatory or participatory role in primary and secondary cell wall synthesis were differentially expressed between wild and domesticated cottons as a result of artificial selection. Some 210,965 unigenes of more than 100 bp were obtained from 47.2 million paired-end reads of TM-1 anthers through transcriptome sequencing. Many enriched genes were found in the processes of transcription, translation, and post-translation as well as hormone signal transduction. In addition, the participation of transcription factor families and cell wall-related genes active during cell expansion and carbohydrate metabolism was analyzed.
To characterize fiber elongation and cell wall differentiation, combined transcriptome and metabolome analyses were carried out in G. barbadense and G. hirsutum cultivars. The results showed that G. barbadense fibers at 10–28 dpa underwent primary cell wall synthesis to support elongation, transitional cell wall remodeling, and secondary wall cellulose synthesis with continued fiber elongation. Deep sequencing of transcriptomes and non-targeted metabolomes of single-celled cotton fiber showed that a discrete developmental stage of transitional cell wall remodeling occurs before secondary wall cellulose synthesis begins in both genotypes. Among the 40,000 transcripts expressed in the fiber, including all of the cell wall-related transcripts, expression was similar between genotypes. However, cellulose synthase gene expression patterns were more complex than expected. Oxidative stress in fiber tissues was lower in G. barbadense than in G. hirsutum. A transcriptional repression of lignification during cell wall synthesis was also identified. The results implicated a positive contribution of the ascorbate-glutathione cycle to improved fiber length through an enhanced capacity to manage reactive oxygen species (ROS).
Crossed and backcrossed inbred lines of G. hirsutum and G. barbadense have been developed to investigate fiber yield per acre. Using these experimental populations, a number of differentially expressed genes (DEGs) and DEG-based SSCP-SNP markers that co-localize with yield and yield component QTL have been identified. A microarray-based comparative transcriptome analysis of 10 dpa fibers found 1486 DEGs in the backcross inbred line (BIL) population. Of these, 212 DEGs were mapped to 24 yield QTL regions and 11 yield QTL hotspots, and an additional 81 DEGs were mapped within the 7 lint-yield QTL identified in the BIL population.
In another study, using the cotton EST sequences available at NCBI, 29,547 and 19,578 unigenes were assigned to the Dt and At subgenomes of tetraploid cotton, respectively. Among these, the majority of the abundantly expressed genes played intricate roles in cotton fiber development. Selected publications on recent fiber development-related investigations are summarized in Table 3.
|#||Authors||Year of publication||Methods||Trait studied||Journal published||Reference|
|1||MacMillan et al.||2017||RNA-Seq||Fiber quality||BMC Genomics|||
|2||Miao et al.||2017||small RNA-Seq||Fiber quality||PLoS ONE|||
|3||Li et al.||2017||RNA-Seq||Fiber development||BMC Genomics|||
|4||Hu et al.||2017||small RNA-Seq||Fiber yield||Plant Biotechnology Journal|||
|5||Thyssen et al.||2017||RNA-Seq||Lint||Plant Journal|||
|6||Naoumkina et al.||2017||RNA-Seq||Lint||Genomics|||
|7||Ma et al.||2016||RNA-Seq||Fiber development||PLoS ONE|||
|8||Naoumkina et al.||2016||RNA-Seq||Fiber development||BMC Genomics|||
|9||Hinchliffe et al.||2016||RNA-Seq||Fiber color||Journal of Experimental Botany|||
|10||Zou et al.||2016||RNA-Seq||Fiber development||Science, China|||
6. Characterization of specific genes for cotton fiber
A cascade of genes is expressed during the stages of fiber development, and several novel genes have recently been identified. One such family is the α-expansins, cell wall proteins that disrupt non-covalent bonds between wall components to facilitate cell wall extension. Six α-expansin genes were isolated using a genomic library screen and PCR-based strategies. Four of them (GhExp3-GhExp6) are expressed in multiple cotton tissues, whereas only two (GhExp1 and GhExp2) are expressed in developing cotton fibers. GhExp1 transcripts are highly abundant in the fiber, while GhExp2 transcripts are detected only at low levels. Of the two, GhExp1 therefore appears to be the more important for cell wall extension during fiber development.
Actin-depolymerizing factor (ADF), which modulates the polymerization and depolymerization of actin filaments, is also important for fiber development. The GhADF2, GhADF3, GhADF4, and GhADF5 genes, encoding ADF proteins, have been isolated from cotton (G. hirsutum) cDNAs. Bioinformatic analyses have shown the molecular evolutionary relationships of these genes, including their highly conserved status. Interestingly, GhADF2 was predominantly expressed in the fibers and not in other tissues. Downregulation of GhADF1 in transgenic cotton plants increased fiber length and strength compared with wild-type fiber. Transgenic fibers also contained more abundant F-actin filaments in the cortical region of the cells than control fibers. In transgenic fibers, the secondary cell wall appeared thicker and the cellulose content was higher than in the control. These findings showed the critical role of GhADF1 in elongation and secondary cell wall formation during fiber development.
Plants have signaling systems that mediate responses to environmental stimuli. Co-expression of, and preferential interaction between, two calcineurin B-like (CBL) proteins and CBL-interacting protein kinase (CIPK) genes were determined in the elongating fiber cells of G. hirsutum. A receptor-like kinase gene (GhRLK1) has been characterized specifically during secondary cell wall synthesis in cotton fiber cells. The GhRLK1 product is considered to be located in the plasma membrane, and the gene is thought to function in a signal transduction pathway involved in the induction and maintenance of active secondary cell wall formation during fiber development.
The DET3 gene, which encodes subunit C of the vacuolar ATPase (V-ATPase), participates in brassinosteroid-induced cell elongation. Seven candidate expressed sequence tags (ESTs) were screened to analyze the function of GhDET3 in the elongation of cotton fibers, yielding detailed data about this gene. The amino acid sequence of GhDET3 had high homology with DET3 from Arabidopsis, rice, and maize. qRT-PCR analysis detected ubiquitous expression of this gene in all tissues and organs. GhDET3 mRNA accumulation peaked during the fiber elongation stage (12 dpa) and was lowest at the fiber initiation stage (0 dpa ovules), underscoring the vital role GhDET3 plays in cotton fiber elongation.
Transcription factor genes of the MADS-box family have wide-ranging roles in many diverse aspects of plant development. Transcripts of one such gene have been detected in developing cotton fiber cells and in other plant tissues. Alternative splicing produced transcripts that encode proteins with altered K-domains and/or C-terminal regions, and the resulting variant proteins may have altered cellular roles.
Reactive oxygen species (ROS) play a prominent role in signal transduction and cellular homeostasis, as well as in plant cell development [81, 82]. However, growing and maturing cells encounter oxidative stress when resource imbalances occur. Microarray analyses have been conducted to study oxidative stress-related genes and their expression levels in a single plant cell [81, 82]. Antioxidant genes were substantially upregulated in domesticated diploid and polyploid cotton (Gossypium) compared with their wild counterparts. By contrast, genomic merger and ancient allopolyploid formation had no significant influence on the regulation of ROS-related genes in three wild allopolyploid species. The ROS-related processes were regulated by different sets of antioxidant genes. Reduced expression of ROS-related genes has also been observed in immature fiber mutants.
Ten cotton class III peroxidase-coding (GhPOX) genes were isolated from G. hirsutum. Class III peroxidases are heme-containing enzymes encoded by a large multigene family, and they participate in the release or utilization of ROS. Among the ten genes, GhPOX1 showed the most predominant expression in fast-elongating cotton fiber cells, with a transcript level over 400-fold higher in growing fiber cells than in ovules, flowers, roots, stems, and leaves. These results suggested that GhPOX1 plays an important role during fiber cell elongation, possibly by mediating the production of ROS.
GhPFN1 is a gene isoform that is preferentially expressed in cotton fibers and is tightly associated with their fast elongation. Overexpression and quantitative analyses indicated that GhPFN1 may play a critical role in the rapid elongation of cotton fibers by promoting actin polymerization. Brassinosteroids (BRs) promote fiber elongation. The BIN2 gene is a member of the shaggy-like protein kinase family and functions as a negative regulator of BR signaling in Arabidopsis. Cotton BIN2 genes have been characterized to investigate BR-mediated responses in the development of cotton fibers. To further elucidate their role, a cotton BIN2 gene was transformed into Arabidopsis. The resulting transgenic Arabidopsis plants exhibited reduced growth, confirming the role of BIN2 genes. The cotton BIN2 genes thus encode functional BIN2 isoforms that inhibit growth by negative regulation of BR signaling.
One approach to detecting novel genes involved in fiber development is the identification of fiber-associated gene-rich islands on cotton chromosomes. Using this approach, 10 gene-rich islands associated with different stages of G. hirsutum fiber development have been found on chromosomes 5, 10, 14, and 15. The distribution of a large number of fiber genes across the At and Dt subgenomes of AD tetraploid cotton has been studied extensively. In an attempt to develop an integrated genetic and physical map of fiber development, 103 fiber transcription factors, 259 fiber development genes, and 173 expressed sequence tag-simple sequence repeats (EST-SSRs) have been mapped. According to this study, major transcription factor genes and more fiber QTLs were mapped to the Dt subgenome than to the At subgenome, whereas more fiber development genes were mapped to the At subgenome than to the Dt subgenome. The Dt subgenome may provide the transcription factor genes that potentially regulate the expression of the fiber genes in the At subgenome.
Differential expression of candidate genes between the wild type and a mutant during the fiber elongation stage has been shown by RNA sequencing and qPCR analysis. Twelve candidate genes for the Ligon lintless-1 mutant (li-1) were found in F2 mapping populations derived from a cross between the Li 1 and H7124 genotypes. In the li-1 mutant genotype, genes encoding ribosomal proteins, actin protein, ATP synthase, and beta-tubulin 5 were identified as putative candidates affecting the fiber development process.
Heat shock transcription factors (Hsfs) have an important role in both plant stress responses and development. With global warming, cotton is increasingly exposed to elevated temperatures, which affect both lint yield and lint quality. Forty GhHsf genes selected by Wang et al. were classified into three groups: A, B, and C. These GhHsfs were detected in the majority of cotton plant tissues, particularly around the developing ovules. When cotton plants were exposed to high temperature, GhHsf39 showed the most rapid response to heat shock. It has been suggested that differential expression of Hsfs may play a role in fiber development, a possibility that requires further study.
UDP-glucuronosyltransferase (UGT) is a cytosolic glycosyltransferase that catalyzes the transfer of the glucuronic acid component of UDP-glucuronic acid to a small hydrophobic molecule. UGTs make up one of the largest and most important multigene families in plants. The promoter of the cotton UGT gene GhGlcAT1 contains specific transcription regulatory elements that provide clues about the roles of GhGlcAT1 in cotton fiber development, especially during fiber elongation. A phylogenetic study of cotton UGT proteins was conducted in selected cultivars and wild Gossypium species. The study identified, analyzed, and compared 142 UGTs in G. raimondii, 146 in G. arboreum, and 196 in G. hirsutum. The conserved consensus sequence had 44 amino acids. The study also showed that the GrUGTs and GaUGTs could be grouped into 16 phylogenetic groups (A-P) and the GhUGTs into 15 groups. Additionally, RNA-Seq data were used to study the expression patterns of the UGT genes in wild-type G. hirsutum and its isogenic fuzzless/lintless mutant during fiber initiation.
WRKY gene products aid in managing stress responses in multiple plant species, but they have not been extensively studied at the various stages of fiber development. Ding et al. examined the relationship between WRKY transcription factors and fiber development in G. raimondii and G. arboreum through genome and transcriptome analyses of 112 G. raimondii and 109 G. arboreum WRKY genes. The transcriptome analysis identified several WRKY genes active during fiber initiation, elongation, and maturation, with different expression patterns between species. Allele-specific (Dt and At) WRKY gene expression in G. hirsutum and alternative splicing events were likewise seen in both diploid and tetraploid cotton during fiber development. In summary, this study provided new results on the evolution and role of the WRKY gene family in cotton species.
The role of the MYB family of transcription factors was evaluated during cotton fiber development. Among 1986 MYB and MYB-related putative proteins, 524 non-redundant cotton MYB genes were identified and grouped into four subgroups (1R-MYB, 2R-MYB, 3R-MYB, and 4R-MYB). In addition, the MYB transcription factors were classified into 16 subgroups according to phylogenetic tree analysis. Of all GhMYB genes, 69.1% belonged to the 2R-MYB subfamily. In conclusion, this study highlights important aspects of the functions of MYB transcription factors in cotton fiber development. Furthermore, it contributes to the understanding of the MYB regulatory network and its effects on other aspects of cotton fiber development.
CrRLK1L, one of the subgroups of the receptor-like kinase (RLK) gene family, has previously been shown to be important for developmental patterning and spatial regulation in cotton. Members of the CrRLK1L family are believed to act as sensors of cell wall integrity and as regulators of polar elongation. One study focused on CrRLK1L in cotton fiber development. A total of 44 CrRLK1L genes were isolated from G. raimondii, 40 from G. arboreum, and 79 from G. hirsutum. Among these, six genes played an important role in fiber development.
To examine pectin methylesterase (PME) expression levels, 80 PME genes (GaPME01-GaPME80) were isolated from G. arboreum, 78 (GrPME01-GrPME78) from G. raimondii, and 135 (GhPME001-GhPME135) from G. hirsutum. Differences in PME expression levels across fiber developmental stages were observed using qRT-PCR. Expression in fiber was predominant during the secondary cell wall thickening stage, suggesting tissue-specific expression patterns in cotton fiber.
Lysophosphatidic acid acyltransferase (LPAAT) is an enzyme of the Kennedy pathway in higher plants and is encoded by a multigene family. Recently, a role for a modified LPAAT gene (At-Gh13LPAAT5) in increasing cottonseed oil content and fiber quality has been proposed on the basis of combined genome-wide and transcriptome analyses.
7. Small RNA-mediated gene regulation studies in cotton
Small RNAs (including microRNAs, tasiRNAs, and piRNAs) are mainly 17–24 nucleotide long sequences that are scattered across plant genomes and play an important role in regulating target gene expression via posttranscriptional and translational repression at different stages of plant development. The discovery of novel miRNA genes will help in understanding the key mechanisms associated with miRNA genesis and the regulation of fiber development in cotton. Although the regulatory mechanisms of microRNAs and small non-coding RNAs were determined for overall plant growth and development, their specific roles in the regulation of fiber cell elongation and developmental processes were elucidated more widely only after 2008. The roles of small interfering RNAs and microRNAs in cotton ovule development and fiber elongation were annotated by Abdurakhmonov and his colleagues as a first attempt in fiber genomics. They identified three plant microRNAs (miR172, miR390, and ath-miR853-like) and demonstrated dpa-specific small RNA expression profiles during ovule development. These results suggested complex dpa-specific small RNA regulation in ovule development covering the 0–10 dpa fiber development stages.
Multiple approaches have since been developed to identify the role of small RNAs in fiber initiation and elongation. For example, a deep-sequencing approach was used to investigate the global expression and complexity of small RNAs in wild-type and fuzzless/lintless cotton ovules. Over 20 conserved candidate miRNA families, comprising 111 members, were identified during fiber initiation and elongation. More than 100 unique target genes were predicted for most of the conserved miRNAs, and two cell-type-specific novel miRNA candidates were also found in cotton ovules. More than 4 million small RNA sequences have been analyzed from fiber and non-fiber cotton tissues. Thirty-one miRNA families, including 27 conserved and 4 novel miRNAs, have been identified from these tissues. In addition, 19 unique miRNA families were identified, representing 32 miRNA precursors. Seven of these precursors had been previously reported, and 25 were new.
The enrichment of siRNAs in ovules and fibers suggests that small RNA metabolism and chromatin modification are active during fiber development. A recent study identified 46 novel and 96 known miRNAs in elongating cotton fibers. That study also found 64 differentially expressed miRNAs, of which 16 were predicted to be novel. Several novel fiber miRNAs have been identified using high-throughput sequencing technologies during the secondary cell wall thickening stage in cotton. Small RNA libraries were constructed from developing fiber cells of the short fiber mutants Li-1 and Li-2 and their near-isogenic wild-type lines. Among the 24 conserved and 147 novel miRNA families identified, four miRNAs showed significant negative correlations with fiber length.
Earlier, miRNA-specific DNA markers were developed and mapped in cotton to study the genetic variation of miRNAs and their putative target genes. A number of pre-miRNA and putative target gene primers were examined, and polymorphic loci were mapped across the tetraploid cotton chromosomes. MiRNA-based sequence-related amplified polymorphism (SRAP) markers were used to map additional miRNA loci. RT-PCR analysis of pre-miRNAs and putative target genes revealed unique expression patterns between the parents across different fiber development stages.
In G. hirsutum, ~300 miRNAs have been identified that target over 3000 genes and possibly regulate stress responses, metabolism, hormone signal transduction, and fiber development. Among the 79 and 46 miRNA families identified in G. hirsutum and G. raimondii, respectively, eight miRNAs were specifically related to fiber elongation and associated pathways such as calcium and auxin signal transduction, fatty acid metabolism, anthocyanin biosynthesis, and xylem tissue differentiation. In addition, one tasiRNA was identified and its target, ARF4, was experimentally validated in vivo.
Approximately 10 million non-coding RNAs (ncRNAs) from fiber tissue of the allotetraploid cotton (G. hirsutum) were sequenced 7 days after flowering (DAF). The 24 nt ncRNAs were the dominant species, followed by 21 nt and 23 nt ncRNAs. This study further screened ~560 miRNA gene loci and suggested a role for miRNAs in the elongation and secondary cell wall synthesis stages of cotton fiber development.
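The size-class ranking reported above (24 nt, then 21 nt, then 23 nt) is obtained by tallying read lengths in a small RNA library. The sketch below shows one way such a tally could be computed; it is an illustration, not the pipeline used in the cited study. The function names, the 17–26 nt window, and the toy reads are all assumptions made for the example, and real libraries would first require adapter trimming and quality filtering.

```python
from collections import Counter


def length_distribution(reads, min_len=17, max_len=26):
    """Return {length: fraction of reads} for reads within the size window.

    The 17-26 nt window is an assumed filter, not a value from the source.
    """
    counts = Counter(len(r) for r in reads if min_len <= len(r) <= max_len)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {n: counts[n] / total for n in sorted(counts)}


def rank_by_abundance(dist):
    """List read lengths from most to least abundant."""
    return [n for n, _ in sorted(dist.items(), key=lambda kv: kv[1], reverse=True)]


# Hypothetical toy library: five 24-nt, three 21-nt, two 23-nt reads,
# plus one 30-nt read that falls outside the size window.
toy_reads = ["A" * 24] * 5 + ["C" * 21] * 3 + ["G" * 23] * 2 + ["T" * 30]
toy_dist = length_distribution(toy_reads)
toy_ranking = rank_by_abundance(toy_dist)
```

Reporting fractions rather than raw counts makes libraries of different sequencing depth comparable.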
8. Functional genomics and genome-editing technologies in regulation of fiber genes
Genome modification (GM) and genome-editing technologies (GETs) are invaluable in the discovery of genes of interest and support the functional genomics of many organisms, including cotton. Various GM approaches and GETs have been developed to investigate the mechanisms of gene regulation. One of the most commonly used GM approaches is RNA interference (RNAi), which builds on the earlier use of antisense technology to study gene structure and function. RNAi is an emerging technique based on homology-dependent posttranscriptional gene silencing induced by double-stranded RNA (dsRNA). Associations of many important genes with fiber development have been detected using RNAi. Recently, considerable work has highlighted the suitability of this method for cotton improvement [106, 107, 108]. For example, modifying or regulating flowering time is arduous in a plant improvement program, but it is sometimes a critical step in producing novel high-yielding varieties that are better adapted to a specific environment.
In a collaborative project, scientists from Uzbekistan and the USA developed cotton plants with early flowering, higher yield, and improved fiber qualities by applying RNAi technology to regulate a phytochrome gene (Figure 1). Considering the potential of this research, the scientists have patented the technology. A number of novel RNAi cultivars were successfully field-trialed on over 60,000 hectares in Uzbekistan. This is the first time that a major crop developed through RNAi technology has been planted over such a large area.
The majority of published RNAi studies have focused on the functional aspects of cotton fiber-related genes [3, 109]. Later, Wang et al. characterized the dihydroflavonol 4-reductase (DFR) enzyme, which mediates the biosynthesis of two classes of polyphenols, anthocyanins and proanthocyanidins (PAs), in Upland cotton. To silence GhDFR1 in cotton, the DFR gene was cloned from developing fibers and used for virus-induced gene silencing. Silencing GhDFR1 significantly decreased the accumulation of anthocyanins and PAs. In addition, two PA monomers, (-)-epicatechin and (-)-epigallocatechin, decreased markedly in the fibers of GhDFR1-silenced plants, while two new monomers, (-)-catechin and (-)-gallocatechin, were present compared with control plants infected with the empty vector. The contribution of GhDFR1 to the biosynthesis of anthocyanins and PAs in cotton has thus been confirmed.
Overexpression of a gene of interest in an alien or host genome can help reveal gene function and structure. Wang et al. characterized the cellular functions of GhTCP14, a class I TEOSINTE-BRANCHED1/CYCLOIDEA/PCF (TCP) transcription factor from Upland cotton. According to their work, GhTCP14 was expressed mainly in fiber cells at the initiation and elongation stages of development, and its expression increased in response to exogenous auxin. Overexpression of GhTCP14 in Arabidopsis thaliana enhanced the initiation and elongation of trichomes and root hairs. Moreover, it affected root gravitropism in a manner similar to a mutation of the auxin efflux carrier gene PIN-FORMED2 (PIN2). In the GhTCP14-expressing plants, the auxin uptake carrier gene AUXIN1 (AUX1) was upregulated, while PIN2 was downregulated. Electrophoretic mobility shift assays showed that GhTCP14 binds to the promoters of the auxin response genes PIN2, IAA3, and AUX1, consistent with its transcriptional activity. Together, these results point to a potential regulatory role for GhTCP14 in the auxin-mediated differentiation and elongation of cotton fiber cells. The actin-bundling protein GhFIM2 has also been functionally characterized in cotton by overexpression. The abundance of actin bundles, which accompanies accelerated fiber growth at the fast-elongating stage, increased when GhFIM2 was overexpressed. Secondary cell wall biogenesis was also activated when GhFIM2 was overexpressed. These results indicated the importance of GhFIM2 in the development of cotton fiber cells.
Recently, the clustered regularly interspaced short palindromic repeats (CRISPR) and CRISPR-associated protein 9 (Cas9) system has emerged as a simple and efficient tool for genome editing in eukaryotic cells. Most commercially grown cotton is tetraploid, which makes it much more difficult to target both sets of homologous alleles. In an initial effort to standardize CRISPR/Cas9 in tetraploid cotton, a single-copy gene encoding green fluorescent protein (GFP) was used to determine the efficacy of the system in generating targeted mutations (indels) with three independent sgRNAs. Literature analysis showed that the application of novel-generation GETs to cotton in general, and to fiber trait improvement in particular, is at a very early stage and requires more attention, coordinated effort, and continuous investment.
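Choosing sgRNAs starts with locating candidate target sites in the gene of interest. The sketch below scans both strands of a DNA sequence for 20-nt protospacers immediately followed by an NGG protospacer-adjacent motif (PAM), which is the motif recognized by the commonly used Streptococcus pyogenes Cas9. The source does not state which Cas9 variant or guide-design rules were used, so the PAM, the 20-nt length, the function names, and the toy sequence are all assumptions for illustration. Off-target scoring, which matters greatly in tetraploid cotton with its At and Dt homoeologs, is not attempted here.

```python
import re

# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_protospacers(seq, length=20):
    """List candidate (strand, start, protospacer, pam) sites with an NGG PAM.

    `start` is the 0-based offset on the scanned strand; for the "-" strand
    it refers to the reverse-complemented sequence.
    """
    seq = seq.upper()
    # The lookahead lets overlapping candidate sites all be reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % length)
    sites = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for m in pattern.finditer(s):
            sites.append((strand, m.start(), m.group(1), m.group(2)))
    return sites


# Hypothetical toy target: a 20-nt stretch followed by a TGG PAM.
toy_target = "ATGCATGCATGCATGCATGC" + "TGG" + "AAAA"
toy_sites = find_protospacers(toy_target)
```

For a polyploid genome, each candidate would additionally need to be checked against both subgenome copies of the target, either to hit both homoeologs or to discriminate between them.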
For all crops, and cotton in particular, where limited genetic diversity exists among the agriculturally elite types, genetic improvement will depend on innovative exploitation of genetic resources and on efficient strategies that use both conventional and advanced molecular technologies. Comprehensive information from independent and diverse research is needed to understand the molecular and genetic mechanisms associated with fiber development and additional agronomic traits. A cotton fiber provides a single-cell crucible for understanding the mechanisms of primary and secondary cell wall synthesis. It facilitates not only the study of cotton fibers but also a broader understanding of how all plant cell walls grow in relation to cell division and cell elongation. The elongating fibers also exemplify a scheme for the production and utilization of complex biochemical substances that are first synthesized and then transported beyond the metabolic confines of the cell membrane into the so-called "outer-space." A cotton fiber cell can be studied in detail relatively away from the noise of metabolic activity within the "inner-space," where thousands (and soon to be tens of thousands) of functionally characterized genes participate in a spatial and temporal interplay of plant growth. These genes first promote regulated cell growth in an organ-specific manner and later march the specialized fiber cells toward senescence and apoptosis. To study the complex but very interesting process of fiber development, the cotton research community has extensively applied genetic tools, from mapping to genome modification technologies, through the characterization of key genes and the development of molecular markers. This has enabled researchers to tag complex fiber QTLs, clone and characterize genes, and breed novel superior-fiber cultivars using MAS and GE technologies. However, the functional genomics of fiber still lags behind that of other crops in the use of new-generation GETs because of the complexity and multi-allelic nature of fiber-related genes, which call for well-planned cooperative research activities and larger investments.
We thank the Academy of Sciences of Uzbekistan and the Committee for Coordination Science and Technology Development of Uzbekistan for basic science (FA-F5-T030) and several applied (FA-A6-T081, FA-A6-T085 and YF5-FА-Т-005) research grants. We gratefully acknowledge the Office of International Research Programs (OIRP) of the United States Department of Agriculture (USDA) – Agricultural Research Service (ARS) and the U.S. Civilian Research and Development Foundation (CRDF) for international cooperative grants P121, P121B, and UZB-TA-2992, which were devoted to cotton microRNA research. We also thank Dr. Sharma for his efforts to review this manuscript.