Association Mapping for Improving Fiber Quality in Upland Cottons

Improved fiber yield is a constant goal of upland cotton (Gossypium hirsutum) breeding worldwide, but understanding of the genetic basis of yield-related traits remains limited. Dissecting the genetic architecture of complex traits is an ongoing challenge for geneticists. Two complementary approaches to genetic mapping, linkage mapping and association mapping, have led to successful dissection of complex traits in many crop species. Both methods detect quantitative trait loci (QTL) by identifying marker–trait associations, and the only fundamental difference between them lies in the mapping population, which directly determines mapping resolution and power. The availability of genomic tools and resources is now leading to a new revolution in plant breeding, as they facilitate the study of the genotype and its relationship with the phenotype, in particular for complex traits. Next Generation Sequencing (NGS) technologies allow the mass sequencing of genomes and transcriptomes, which is producing a vast array of genomic information. With the development of high-throughput genotyping, phenotyping will be a major challenge for genetic mapping studies. We believe that high-quality phenotyping and appropriate experimental design, coupled with new statistical models, will accelerate progress in dissecting the genetic architecture of complex traits.


Introduction
Cotton is a crop of immense importance as a dominant source of fiber, and of oil from cottonseed, all over the world [1]. The improvement of cotton fiber quality has become more important because of changes in spinning technology and ever-increasing demand for fiber. Cotton is grown in more than 80 countries and contributes to the world economy as a raw material for the textile industry [2].
The genus Gossypium is made up of about 52 species, of which 47 are diploid and 7 are allotetraploid [3][4][5][6][7]. Of all the species of the genus, the two most common diploids are G. arboreum L. and G. herbaceum L., while G. hirsutum L. and G. barbadense L. are considered the most commercially valuable tetraploids. G. hirsutum, characterized by high yield, moderate fiber quality and wide adaptability, accounts for 95% of overall cotton production [8], while G. barbadense (Pima and Egyptian) offers superior fiber quality [9,10].
Efforts to broaden the genetic base of the genus Gossypium have not produced successful outcomes because of the large and complex architecture of its genome. Moreover, owing to developmental barriers, genetic studies have not yet been able to deliver the required traits in cotton [11]. Associations between markers and traits can be used to accelerate breeding programs. The heritable variation present among landraces in the gene pool can be exploited through mapping based on linkage disequilibrium, which speeds up cotton breeding by identifying markers linked to traits of interest and so enables molecular breeding. Genome-wide association can be used to analyze nucleotide sequence polymorphisms in genetic markers that govern a specific phenotype [12,13]. Association mapping relies on the extent of disequilibrium between pairs of loci in the population under analysis. It reveals strong connections between a required trait and a genetic marker, while the nonrandom association between two quantitative trait loci or markers constitutes linkage disequilibrium [8]. The degree of disequilibrium and the size of the population provide valuable information about the origin of an individual [13,14]. Many loci related to polygenic traits have been identified via genetic maps, and linkage disequilibrium (LD) was measured in humans through diverse analytical methods [15,16]. Population-based mapping of polygenic traits became a widely used technique thanks to innovations in omics and the availability of advanced bioinformatic tools for analyzing genetic variation [17]. The main benefits of this technique include the ability to work with a large number of loci, the production of highly saturated maps, its speed and its low cost [18].

Fiber quality
Cotton fiber is a natural fiber, known as a "trichome", formed by the elongation of a single cell of the outer layer of the ovule in cottonseed; it contains about 89-100% cellulose [19][20][21][22]. As little as 30% of lint primordia are able to differentiate into mature fibers, giving about 20,000 fibers within a single ovule [23,24]. The ideal cotton fiber should be white like snow, durable like iron, attractive like silk and stretched like wool [25]. It is hard to combine all these qualities within a cotton breeding program, but efforts have been made to obtain the most desired ones. Fiber quality is an array of quantitative traits (length, fineness, strength, uniformity and elongation) that enhance yarn value during spinning [26][27][28]. It results from a complex interaction between the physiology and the genetic make-up of the plant within a cotton growing season [29,30].
Enhancing fiber quality through genetics is the ultimate objective of cotton breeding strategy. Cotton scientists have long been involved in fiber quality improvement because of the increasing demand for multiple cotton products. The critical goals of all cotton-related techniques are fiber yield and quality, together with the precise parameters that contribute to its economic value at the global level. Spinning automation drives fiber improvement according to the interests of the textile sector, and as a result breeders take fiber quality measurements into account. For instance, prevailing spinning automation emphasizes fiber strength over length and fineness [31]. Moreover, fiber quality improvement is a demanding task because quality can be determined only after the crop is harvested.
The main goal of all genetic improvement is to increase yield. The rate of improvement in lint production has declined since the 1980s [32][33][34]. Nonetheless, genetic diversity increased at the start of the 21st century [35,36].

Marker assisted selection
Because of the inverse relationship between seedcotton yield and fiber quality, and the complicated involvement of multiple genes in these traits, breeders need more effective methods to develop varieties. In the past, the textile industry flourished principally through selection of new recombinants among germplasm entries with traditional breeding approaches [37,38]. Elite cultivated cotton genotypes have a narrow genetic base, so it has been proposed that wider germplasm should be used for trait improvement. Some popular traits, such as disease and insect resistance, have been enhanced by introgression [39]. The advent of DNA markers paved the way for plant breeders to accelerate the breeding process with fast, reliable techniques that substitute for traditional selection methods in improving both agronomic and economic traits of plants [40].
A molecular marker is a specific DNA segment with a known position on the chromosome [41], or a gene whose phenotypic expression is easily distinguished and can be used to identify an individual [42,43]. DNA markers are polymorphic, a property that can be used to differentiate homozygotes from heterozygotes [44]. Marker assisted selection has many advantages over conventional breeding, as reviewed by many researchers [45][46][47] [56,57]. Cotton is an important cash crop at the global level, yet marker assisted selection has not achieved the desired goals because of compatibility barriers arising from historic domestication and insufficient polymorphism [58][59][60].
Molecular characterization is the way to transfer required traits into modern genotypes [45][61][62][63][64]. Quantitative trait loci (QTLs) allow gene pyramiding for yield and fiber quality through the development of linkage maps. Genome-wide association mapping using linkage disequilibrium is regarded as the most valuable strategy for identifying QTLs in crop sciences. In association mapping, the association between a trait of interest and germplasm entries is assessed using population structure information and linkage disequilibrium (LD) [65]. LD mapping is highly popular thanks to the sophistication of mathematical methods and the availability of large numbers of DNA markers.
Traits controlled by multiple genes, such as fiber quality, can be studied more precisely with linkage maps now that new genomic data are available for Gossypium spp. such as Gossypium raimondii Ulbrich [66,67], Gossypium arboreum L. [68] and Gossypium hirsutum L. [69,70]. It was revealed in [71] that the tetraploid species derived from a cross between two diploid species, Gossypium arboreum L. (A genome) and Gossypium raimondii Ulbrich (D genome), about 1-2 million years ago. Moreover, this may pave the way for fiber improvement, as a higher number of QTLs has been assigned to the Dt subgenome than to the At subgenome in Hawaiian cotton [72][73][74].
Many researchers have identified QTLs for seedcotton yield and its components [9,70][75][76][77][78][79]. However, QTLs were mostly mapped in filial generations. QTL detection is strongly affected by the low heritability and high experimental error typical of such plant materials, so an effective approach is needed to develop stable populations that overcome these obstacles. The accuracy of QTL detection depends on the allelic frequency between the QTL for the desired trait and the related marker [80]. Molecular breeding methods designed with information obtained through QTL analysis in association mapping draw valuable genetic variation from stable populations [81].

Association mapping of fiber traits using genotyping by sequencing (GBS)
Molecular markers are highly favored for linkage map development because they are polymorphic, are transmitted to the next generation in Mendelian ratios and do not show epistasis. Molecular breeding with highly saturated maps, in which QTLs are connected with economic traits through informative genetic markers, provides a good resource for cotton improvement [64]. Genomic analysis in many crop species, including cotton, has been done using populations derived from the hybridization of only two parents, which is a major drawback for the genomic information obtained. It has therefore been difficult to apply QTL information gained from such populations to breeding objectives, because the individuals in these populations share genetically similar backgrounds.
Association mapping rests on the hypothesis that, in a panel of markers, some alleles lie close enough to the loci controlling the required traits to co-segregate with them, and are therefore in linkage disequilibrium. Germplasm entries are used for identifying QTLs of interest through genome-wide association mapping [82]. Many factors, including mating type, frequency of gene flow and population structure, can affect this mapping approach [18]. Association mapping overcomes drawbacks of traditional bi-parental mapping: it uses populations of well-established genotypes, detects only the required gene and identifies high polymorphism [83][84][85]. This methodology also favors knowledge based on linkage disequilibrium over linkage mapping.
Marker assisted breeding combines recent genomic approaches with traditional breeding procedures for improving crop traits. Reproducibility of genetic markers is essential for this purpose. Morphological traits are scored and genotyping is carried out with molecular markers [86]. Molecular markers are very effective for identifying and overcoming problems, such as segregation distortion, in the transfer of traits from other species [87]. Genetic markers are also effective for determining genetic variation in the Gossypium gene pool. [88] classified DNA markers into groups: 1) non-hybridization based, which include Amplified Fragment Length Polymorphism (AFLP), Simple Sequence Repeats (SSR), Sequence-Related Amplified Polymorphism (SRAP), Inter-Simple Sequence Repeats (ISSR), Expressed Sequence Tag SSRs (EST-SSR), Single Nucleotide Polymorphisms (SNPs) etc. Numerous linkage maps have been developed in allotetraploid cotton using diverse mapping populations and different DNA marker techniques [76][89][90][91][92][93][94]. Numerous SSRs and SNPs have been developed in cotton [95][96][97][98][99]. The development of saturated genetic maps from SSR and SNP loci in cotton paves the way for locating quantitative traits related to breeding objectives [100][101][102][103][104]. Nonetheless, association analysis and very fine mapping are not possible because these maps carry too little information. Highly saturated maps are urgently needed in cotton to overcome these sequencing drawbacks and accelerate variety development.
Single nucleotide polymorphisms (SNPs) are positions on a chromosome at which two genotypes differ by a single base [64]. [115] speculated that a SNP occurs every 100-300 bp in any genome, and it has been revealed that such markers are the most frequently occurring of all markers and are more abundant than microsatellites. SNPs can be genotyped rapidly and economically thanks to the availability of high-throughput genotyping tools [116]. Assessment of gene expression [117,118], genome-wide association [68,119] and SNP detection have been carried out in individuals with different genome sizes, and also in polyploid species with limited genetic variation such as cotton [10,120] and wheat [121], using low-cost high-throughput genotyping tools. SNPs have been discovered and genotyped in different species in diverse ways [10][120][121][122].
Genotyping-by-sequencing (GBS) is a powerful and simple approach that allows numerous SNPs to be discovered concurrently in a large number of genotypes [123]. In GBS, methylation-sensitive restriction enzymes are used to target the sequences flanking restriction sites, producing a reduced representation of the genome [121,122]. Compared with reduced representation libraries (RRL) and restriction site associated DNA (RAD), GBS is much simpler: it requires less DNA, library preparation is achieved in just two steps on plates, DNA fragment analysis is circumvented and the pooled library is amplified by PCR [122]. Prior discovery and validation of polymorphisms is not required in this procedure, and it can be applied to any polymorphic species or to mapping populations of any size [124]. Numerous SNPs have been discovered with GBS in many species, including maize [122], wheat, barley [121], sorghum [125], rice [126], soybean [127], oat [128] and cotton [10,79,129,130].
Association mapping furnishes a saturated map for the desired trait, in contrast to mapping a single pair of genes harboring the required character [131]. Verification of QTLs is therefore compulsory for mapping. Association mapping is a way to examine the genetic variation of required traits: it relates variation in the desired traits to allelic polymorphism, and genetic markers connected to economic traits are selected according to the extent of linkage disequilibrium [132]. Moreover, LD reveals ancestral patterns through information on populations and ecology [133,134].
LD-based association mapping has been applied with different strategies for determining genetic diversity, the pattern of contributing sources and the design of the population [135,136]. Individuals of a population are grouped by the combined genetic distance among entries established via LD [137][138][139]. The extent of LD in a natural population reflects not only linked loci but also loci on non-homologous chromosomes, owing to selection, population behavior and hybridization, so such relationships must be analyzed with great care. Detecting sequence polymorphism that controls a specific trait is the property of this mapping [140]. Moreover, association studies and linkage mapping differ considerably in the resolution and precision of QTLs, the amount of information and the evaluation procedures [132].
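As a rough illustration of what linkage disequilibrium between two markers means numerically, the following Python sketch computes r2, one commonly used LD statistic, from alleles coded 0/1. It assumes fully homozygous (inbred) lines, so that each line contributes a single haplotype. The function name ld_r2 and the toy data are hypothetical and are not taken from any of the cited studies.

```python
def ld_r2(hap_a, hap_b):
    """Return r^2 between two biallelic loci from haplotypes coded 0/1."""
    if len(hap_a) != len(hap_b) or not hap_a:
        raise ValueError("haplotype lists must be non-empty and equal in length")
    n = len(hap_a)
    p_a = sum(hap_a) / n                                   # frequency of allele 1 at locus A
    p_b = sum(hap_b) / n                                   # frequency of allele 1 at locus B
    p_ab = sum(a * b for a, b in zip(hap_a, hap_b)) / n    # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b                                   # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return float("nan")                                # monomorphic locus: LD undefined
    return d * d / denom


# Toy data, invented for illustration: eight inbred lines, three loci.
locus_a = [1, 1, 1, 1, 0, 0, 0, 0]
locus_b = [1, 1, 1, 0, 0, 0, 0, 1]   # mostly travels with locus_a
locus_c = [1, 0, 1, 0, 1, 0, 1, 0]   # independent of locus_a

r2_ab = ld_r2(locus_a, locus_b)
r2_ac = ld_r2(locus_a, locus_c)
print(r2_ab, r2_ac)
```

In this toy case the first pair of loci is in partial LD and the second pair is in equilibrium. Real analyses would also need to handle heterozygous calls, missing data and significance testing, which this sketch leaves out.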
Even so, standard statistical analysis is not directly appropriate for LD-derived tools, so natural populations are first partitioned into distinct categories with model-based procedures [141]. Bayesian modeling is widely used for assessing, from allele frequencies, the probability that a genotype belongs to a specific population category. With this technique the genotypes are assigned to particular populations, and these assignments can be incorporated into statistical methods for association mapping that account for population structure. Population structure is analyzed with the STRUCTURE software [135], which has been used for association studies in many plants. Various association mapping studies have been conducted in cotton for different traits, such as seedcotton yield and its components [142][143][144], salt tolerance [145], plant architecture, earliness [146], protein and oil contents [147] and fiber quality [8,60,132][148][149][150].
In contrast, genetic mapping in populations developed by hybridizing parents in conventional ways yields maps that are not saturated, and such populations are labor intensive, always at risk, costly to develop and demand further work after the evaluation of numerous genotypes of the gene pool [84]. Association mapping uses LD and removes the need for bi-parental populations by exploiting the genetic variation present within available stable populations, such as cultivars and accessions developed over time and maintained as a gene pool. Genome-wide association mapping has been used in Arabidopsis [151] and rice [152] to identify loci connected to economic traits. Association studies allow the development of highly saturated maps by identifying QTLs related to economic traits at the whole-genome level in permanent mapping populations.
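To make the role of population structure concrete, the Python sketch below compares a naive single-marker regression with one that fits subpopulation membership as a fixed effect. It is a simplification under stated assumptions: each line belongs to exactly one subpopulation, the correction is done by centering genotype and phenotype within subpopulations, and no kinship term is included, so it is not a mixed linear model and does not reproduce STRUCTURE or TASSEL. All names and data are hypothetical.

```python
from collections import defaultdict


def center_within_groups(values, groups):
    """Subtract each subpopulation's mean from the values of its members."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def marker_r2(genotypes, phenotypes, groups=None):
    """Squared correlation between marker and trait.

    With `groups`, both variables are centered within subpopulations first,
    which is equivalent to fitting subpopulation as a fixed effect.
    """
    if groups is None:
        groups = [0] * len(genotypes)          # one group: plain mean-centering
    x = center_within_groups(genotypes, groups)
    y = center_within_groups(phenotypes, groups)
    sxx = sum(v * v for v in x)
    syy = sum(v * v for v in y)
    sxy = sum(a * b for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


# Toy data, invented: two subpopulations that differ both in allele
# frequency and in mean fiber length, with no marker effect inside either.
subpop = ["A", "A", "A", "A", "B", "B", "B", "B"]
marker = [1, 1, 1, 0, 0, 0, 0, 1]
length = [30, 31, 29, 30, 26, 27, 25, 26]

naive = marker_r2(marker, length)
corrected = marker_r2(marker, length, subpop)
print(naive, corrected)
```

The naive fit attributes part of the trait variation to the marker only because allele frequency and trait mean both differ between the two groups; once subpopulation is accounted for, the apparent association disappears in this toy example.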
Abdurakhmonov et al. [60] used association analysis to detect associations with fiber traits among cotton germplasm entries, with a view to using the genetic variation in marker-based breeding. LD-based association mapping was carried out in germplasm comprising diverse genotypes from all over the world. A total of 95 SSRs were screened across all germplasm entries to identify genome-wide QTLs associated with fiber properties. They found about 11-12% LD among all SSRs and also observed significant population structure among all entries. They applied a general linear model and a mixed linear model incorporating kinship and population structure, and overall found 6 and 13% of primer pairs to be related to fiber quality. They concluded that the markers selected in this study can be used for improving fiber by drawing on hidden sources of genetic variability.
Genetic variation, population structure and LD-based association analysis for fiber traits were studied in germplasm grown under two different climatic zones [85]. The upland gene pool, containing 335 elite entries, was screened with 202 SSRs. Genome-wide LD extended on average to 25 cM across all genotypes at the 0.01 probability level. LD dropped to about 5 cM at r2 > 0.2, showing potential for association among genotypes for yield-contributing traits. A mixed linear model and population analysis were applied to detect associations, taking permutation-based significance and population structure into account. Overall, many markers for fiber traits were common to the genotypes at both locations. Mixed linear model associations ranged from 7 to 43%, with strong to very strong evidence of relation to fiber properties as confirmed by Bayes factors, and will be a very effective resource for association analysis for yield improvement in marker-based breeding.
Wang et al. [153] identified associations with yield and fiber traits using a mixed linear model in Pima cotton germplasm entries. They observed 72 loci, of which 46 were connected to fiber and 26 to yield. They concluded that marker associations with fiber traits are of vital value for enhancing quality.
Fang et al. [154] used a multi-parent population to detect associations with yield and fiber quality traits. They screened 1582 polymorphic microsatellites in a first set of 275 RILs developed from diverse parents to identify QTLs connected to fiber. Association analysis with TASSEL found 131 QTLs for fiber quality traits, and the same QTLs were verified in a second set of 275 RILs with 270 SSRs. Of these, 54 QTLs were new and 77 agreed with previous studies. They concluded that the common and new QTLs detected in this study can be used for overcoming problems in fiber quality enhancement.
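Statements such as "LD dropped to about 5 cM at r2 > 0.2" summarize how pairwise LD declines with map distance. The Python sketch below shows one simple way to obtain such a figure: marker pairs are binned by distance and the first bin whose mean r2 falls below a threshold is reported. The bin width, the use of bin means and the toy values are assumptions made for illustration; published studies often fit a decay curve instead, and the method used in [85] is not specified here.

```python
def ld_decay_distance(pairs, threshold=0.2, bin_width=5.0):
    """Return the lower edge of the first distance bin with mean r^2 below threshold.

    pairs: iterable of (distance_cM, r2) for marker pairs on the same chromosome.
    Returns None if no bin drops below the threshold.
    """
    bins = {}
    for dist, r2 in pairs:
        idx = int(dist // bin_width)                 # bin index for this pair
        total, count = bins.get(idx, (0.0, 0))
        bins[idx] = (total + r2, count + 1)
    for idx in sorted(bins):                         # scan bins from short to long distance
        total, count = bins[idx]
        if total / count < threshold:
            return idx * bin_width
    return None


# Toy data, invented: (distance in cM, r^2) for seven marker pairs.
toy_pairs = [
    (1.0, 0.60), (3.0, 0.40),      # 0-5 cM bin, mean 0.50
    (6.0, 0.30), (8.0, 0.22),      # 5-10 cM bin, mean 0.26
    (12.0, 0.10), (14.0, 0.08),    # 10-15 cM bin, mean 0.09
    (22.0, 0.03),
]

decay_at_02 = ld_decay_distance(toy_pairs, threshold=0.2)
decay_at_03 = ld_decay_distance(toy_pairs, threshold=0.3)
print(decay_at_02, decay_at_03)
```

A shorter decay distance implies that more markers are needed to cover the genome but that detected associations are located more precisely.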
A genetic map was constructed from a RIL population derived from the G. hirsutum lines TM-1 and NM24016, the latter carrying superior fiber quality transferred from G. barbadense, and the relationships between yield components and fiber traits were determined. A total of 429 SSRs and 412 GBS-based SNPs were used to develop the map, which spanned about half the length of the upland cotton genome [10]. The markers were distributed randomly across all loci of the genome. The yield components and fiber traits showed extreme phenotypic expression across multiple locations. The authors found 28 QTLs for agronomic and fiber properties that are useful from a breeding perspective.
Cai et al. [8] used 99 upland cotton genotypes to detect associations for fiber traits. The relationships among fiber components were determined with 97 polymorphic microsatellites. A total of 107 genomic regions were associated with fiber traits, of which 70 were detected in two or more zones and 37 in just one. Most of the associations were considered reliable, as they agreed with earlier findings for fiber quality. The authors also observed genomic regions related to two or more traits and assumed that such regions derived from genotypes with a minor allele frequency of less than 5%, from local sources or acclimatized in China. They concluded that fiber traits can be improved by using such loci from diverse resources.
Islam et al. [123] carried out GBS to identify SNPs that can be used for improving economic traits in the cotton gene pool. RILs and 11 contrasting parents were used in the study, and two separate methods were applied for calling SNPs with a variant allele frequency of >0.1. SNP quality control was performed and calling was done against the available G. raimondii Ulbrich genome. In total, 1071 and 1223 SNPs were observed in the At and Dt genomes, respectively. Moreover, these SNPs were usually found at higher frequency in coding regions. GBS was conducted in germplasm consisting of 154 accessions to verify 111 of the total SNPs; the SNPs were verified in all parents, and none of the genotypes were found to share the same SNPs. They concluded that SNPs can be identified in G. hirsutum with ease and that genetic improvement can proceed once true SNPs have been obtained.
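A frequency cut-off of the kind mentioned above can be sketched in a few lines of Python. The example below keeps SNPs whose alternate-allele frequency across lines is strictly greater than 0.1, ignoring missing calls. This is an assumption about how such a filter might look: the study may define "variant allele frequency" differently (for example at the read level), and the SNP names, genotype coding (0, 1, 2 copies of the alternate allele, None for missing) and data are all hypothetical.

```python
def alt_allele_frequency(calls):
    """Alternate-allele frequency from diploid calls coded 0, 1, 2 (None = missing)."""
    observed = [c for c in calls if c is not None]
    if not observed:
        return None                                  # no usable calls at this site
    return sum(observed) / (2 * len(observed))       # two alleles per diploid call


def filter_snps(genotype_table, min_freq=0.1):
    """Return SNP ids whose alternate-allele frequency is strictly above min_freq."""
    kept = []
    for snp_id, calls in genotype_table.items():
        freq = alt_allele_frequency(calls)
        if freq is not None and freq > min_freq:
            kept.append(snp_id)
    return kept


# Toy genotype table, invented: four SNPs scored in ten lines.
toy_table = {
    "snp_1": [0, 0, 2, 2, 0, None, 2, 0, 0, 0],   # 6 alt alleles / 18 called
    "snp_2": [0, 0, 0, 0, 0, 0, 0, 0, 0, 2],      # 2 / 20, not strictly above 0.1
    "snp_3": [None] * 10,                          # all calls missing
    "snp_4": [0, 1, 0, 0, 2, 0, 0, 0, 0, 0],      # 3 / 20
}

kept_snps = filter_snps(toy_table)
print(kept_snps)
```

Rare variants are often excluded in this way because associations based on very few carriers are statistically unreliable.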
Association analysis for fiber traits was conducted in a germplasm collection of Hawaiian cotton consisting of 503 genotypes [132]. A total of 494 genome-wide microsatellites were used, of which 179 reproducible SSRs were screened across genotypes grown under diverse climatic conditions. Population structure and LD were used to detect associations with various fiber traits with a mixed linear model in the TASSEL program. QTLs were selected from associations between markers and phenological traits according to their association values. In total, 426 alleles were detected, and the germplasm was divided into seven subgroups on the basis of hybridization, climate and topography. 216 polymorphic loci were associated with fiber-related traits, explaining 2.7% of phenotypic variation on average, with a range of 0.58-5.12%. LD decreased significantly within 0-5 cM. Thirteen QTLs were the same as in earlier findings, 3 were connected to similar traits and 7 corresponded to fiber formation. The authors concluded that novel alleles for fiber quality identified by LD-based association mapping can be applied in breeding cultivars and in tagging genes of interest.
GBS was carried out in a population developed from multiple parents to overcome the inverse relationship between yield and fiber traits [155]. The authors assumed that GBS would serve as a valuable resource for developing a highly saturated map by generating a large number of SNPs. Association analysis for fiber traits was performed with a mixed linear model in TASSEL across four separate climates, using 5071 SNPs developed from GBS and 223 SSRs in 547 RILs. One QTL cluster related to fiber traits, including length, short fiber content, strength and uniformity, was found and verified on A07. They also studied the underlying genes connected to fiber traits and revealed that the SNP (CFBid0004), derived from a 10 bp deletion in GhRBB1_A07, is directly associated with fiber traits in the RILs and in 104 approved American varieties. GhRBB1_A07 can therefore be used in MAS for the improvement of fiber traits among germplasm entries.
Sun et al. [150] studied the genetic architecture of major fiber traits in cotton germplasm using association mapping under different climatic zones. Mixed linear model association analysis showed that fiber length, strength and uniformity were associated with 16, 10 and 7 SNPs, respectively; two major genomic regions were located on chromosome 7 of G. raimondii, and four genes contributing to fiber length were also identified. Moreover, population structure analysis showed that populations from low elevations had less genetic variation among accessions than those from high elevations. The frequency of favorable alleles was higher in genotypes from low elevations than in those from high elevations. They concluded that the number of favorable alleles in a genotype can be used for the enhancement of fiber.
Associations were examined at the whole-genome level for plant ideotype, heat tolerance, yield-contributing traits and fiber quality in a germplasm collection grown under different climatic conditions for three consecutive years [156]. Associations in the genetic stock were detected using SNPs. Fiber traits showed low to high heritability, with broad-sense heritability ranging from 0.26 to 0.89, compared with 0.14-0.43 for yield components. Phylogenetic analysis showed that the genotypes had been developed from diverse parents carrying multiple traits of breeding value. The authors pointed out that a smaller number of informative markers can be used for association mapping studies, as LD extended up to 5 Mbp and decreased to 2 Mbp at r2 ≥ 0.2. Using a mixed linear model, 17 significant SNPs were associated with fiber length and 50 SNPs with fineness. The results showed that most genome-wide associations were non-significant, as many SNPs explained less than 5% of phenotypic variation; this was attributed to low polymorphism of markers in cotton or low coverage of the SNP chip in the germplasm.
Sun et al. [150] used association analysis for fiber quality traits in germplasm with wide variation among genotypes, grown at multiple locations. An Illumina SNP array was used for the genome-wide analysis. They found 10,511 SNPs distributed over all loci, of which 46 were significantly associated with fiber quality. They observed two QTLs for strength and length on At07 and Dt11.
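Broad-sense heritability figures like those quoted above are the share of phenotypic variance attributable to genotype. As a minimal sketch, the Python function below estimates it on an entry-mean basis from a balanced single-environment trial using the textbook one-way ANOVA variance components. How heritability was actually estimated in [156] is not stated here, so the design, the function name and the data are assumptions for illustration only.

```python
def broad_sense_heritability(plots):
    """Entry-mean broad-sense heritability from a balanced genotype x replicate table.

    plots: dict mapping genotype name -> list of replicate measurements.
    Uses H2 = var_g / (var_g + var_e / r) with ANOVA variance components.
    """
    rep_counts = {len(v) for v in plots.values()}
    if len(rep_counts) != 1:
        raise ValueError("design must be balanced")
    r = rep_counts.pop()                 # replicates per genotype
    g = len(plots)                       # number of genotypes
    if g < 2 or r < 2:
        raise ValueError("need at least two genotypes and two replicates")
    grand = sum(sum(v) for v in plots.values()) / (g * r)
    means = {k: sum(v) / r for k, v in plots.items()}
    # Mean squares for genotypes and for residual error.
    ms_g = r * sum((m - grand) ** 2 for m in means.values()) / (g - 1)
    ms_e = sum((x - means[k]) ** 2 for k, v in plots.items() for x in v) / (g * (r - 1))
    var_g = max((ms_g - ms_e) / r, 0.0)  # negative estimates are truncated at zero
    denom = var_g + ms_e / r
    return var_g / denom if denom > 0 else float("nan")


# Toy data, invented: fiber length (mm) for three genotypes, two replicates each.
toy_trial = {
    "line_1": [28.0, 30.0],
    "line_2": [31.0, 33.0],
    "line_3": [25.0, 27.0],
}

h2 = broad_sense_heritability(toy_trial)
print(round(h2, 3))
```

Multi-environment trials would add a genotype-by-environment variance term to the denominator, which this single-environment sketch omits.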

Conclusion
Enhancing fiber quality through genetics is the ultimate objective of cotton breeding strategy. Cotton scientists have long been involved in fiber quality improvement because of the increasing demand for multiple cotton products. Conventional approaches alone would be laborious and slow, so modern plant improvement methods should be integrated with them. Molecular characterization is the way to transfer required traits into modern genotypes. Genotyping-by-sequencing (GBS) is a powerful and simple approach that allows numerous SNPs to be discovered concurrently in a large number of genotypes. Quantitative trait loci (QTLs) allow gene pyramiding for yield and fiber quality through the development of linkage maps. Molecular breeding with highly saturated maps, in which QTLs are connected with economic traits through informative genetic markers, provides a good resource for cotton improvement. Genome-wide association mapping using linkage disequilibrium is regarded as the most valuable strategy for identifying QTLs in crop sciences. Highly saturated maps are urgently needed in cotton to overcome the sequencing drawbacks and accelerate variety development.