Species diversity in cotton and their importance.
Conventional breeding interventions in cotton have been successful and have doubled cotton productivity, but that gain took around 40 years. Genetic engineering, one of the techniques of molecular biology, brought significant improvement in productivity within a few years of its introduction. With cotton genomics maturing, many reference genomes and related genomic resources have been developed. Newer wild species have been discovered and many countries are conserving genetic resources within and between species. This valuable germplasm can be exchanged among countries to increase cotton productivity. As many as 249 mapping and association studies have been carried out and many QTLs have been discovered, so it is high time for researchers to move on to fine-mapping studies. Genomic selection holds great promise for quantitative traits like fiber quality and productivity, since it takes into account all minor QTLs. Only two studies have so far involved genomic selection in cotton, which underlines how much remains to be explored. Genome editing and transformation techniques have been widely used in cotton, with as many as 65 events developed across various traits and eight studies carried out using CRISPR technology. These promising technologies have huge prospects for the sustainability of cotton production.
- wild species
- reference genomes
- QTL mapping
- genomic selection
- genetic transformation
Cotton is one among many fiber-producing species, but it is the only major crop cultivated to meet one of the basic human necessities, clothing. Finds from the ancient Harappan civilization in the Indus valley suggested that cotton was first used around the 2nd millennium BC. However, cotton fabric discovered at Duweilah in Jordan indicates that cotton was used as early as the 4th millennium BC, and the latest discoveries at Mehrgarh in Pakistan suggest that cotton fibers were used as early as the 6th millennium BC. Cotton has long been a chief source of clothing and is likely to remain so because of its comfort, safety and eco-friendly attributes. However, with the revolution in the textile industry, synthetic fibers flooded the markets under the tag line “cost-effective”, since they can be manufactured at will with the fiber properties needed to meet spinning demands. Synthetic fibers were regarded as a major threat to cotton cultivation, but once people realized that petrochemical-based synthetic fibers are unsustainable, less safe and less eco-friendly, cotton remained the most preferred and produced fiber for clothing. Tellingly, most people strictly prefer cotton-based clothing over synthetic fibers for newborn children, which speaks to its safety and comfort. More recently, reinforcing natural fiber with synthetic fibers has yielded blended fibers with improved properties. Apart from its primary application in clothing, coarse cotton is widely used in hospitals as cotton swabs. Cotton linters (fiber <3.5 mm) are used in the paper industry along with other pulps to manufacture technical papers, art papers etc.
Cotton oil extracted from the seed is used in the cosmetics and paint industries, and it can also be used for cooking if the gossypol content is very low. Cottonseed meal is used as dairy feed. Apart from being economically important, the cotton fiber serves as a powerful single-celled model for cell wall and cellulose research. Around 100 million people are involved in cotton production, with over 250 million deriving employment in transportation and ginning, and several million more in textile manufacturing, the agricultural inputs sector and cottonseed crushing, among others. The cotton export value during 2017 was around 15.62 billion US$ (rice export value: 24.99 billion US$; wheat export value: 45.13 billion US$). Cotton has around a 30% share of world textile fibers. The global textile and apparel market was worth about 1.7 trillion US$, indicating that cotton is a very important crop globally. The world population is booming and is expected to reach 8–10 billion people by 2050. Cotton productivity has also risen, through few but impactful breeding and molecular breeding interventions such as the introduction of hirsutum, early maturing types, intra- and interspecific hybrids/derivatives and genetically modified (GM) crops (pest and herbicide tolerance). However, the increase in cotton productivity (1961: 8.62 q/ha vs. 2018: 21.90 q/ha) is very low compared to principal crops such as wheat (1961: 10 q/ha vs. 2018: 34.25 q/ha) and rice (1961: 18.69 q/ha vs. 2018: 46.78 q/ha) (Figure 1), owing to limited international collaboration and limited exchange of germplasm and technology. In 2050, the required cotton production is 94.71 MT of seed cotton (33.15 MT of lint). To meet the projected demand on the same area of land, i.e., 33 mha, productivity needs to rise to 28.7 q/ha (2.87 t/ha) of seed cotton.
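The productivity target follows directly from the two figures quoted above. A minimal sketch of the arithmetic, assuming only the 94.71 MT demand and 33 mha area given in the text (the function and variable names are illustrative):

```python
# Figures quoted in the text for 2050.
SEED_COTTON_DEMAND_MT = 94.71   # million tonnes of seed cotton
CURRENT_AREA_MHA = 33.0         # million hectares

QUINTALS_PER_TONNE = 10         # 1 quintal (q) = 100 kg


def required_yield_t_per_ha(demand_mt: float, area_mha: float) -> float:
    """Yield in t/ha needed to meet a demand on a fixed area."""
    return demand_mt / area_mha


yield_t = required_yield_t_per_ha(SEED_COTTON_DEMAND_MT, CURRENT_AREA_MHA)
yield_q = yield_t * QUINTALS_PER_TONNE
print(f"{yield_t:.2f} t/ha = {yield_q:.1f} q/ha")
```

The result, about 2.87 t/ha or 28.7 q/ha, is directly comparable with the 21.90 q/ha reported for 2018.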
After the era of transgenic introductions, there has been no new technological breakthrough to lift the stagnant yield plateau. To sustain future demand with scarce cultivable land and uncertain climatic vulnerabilities, there is a need for strong, focused and coordinated research among the world cotton research community. Molecular breeding holds considerable prospects for breaking the yield stagnation. Here we review the cotton genomics research carried out to date and its ability to meet future demands.
2. Current situation
Cotton is grown in around 105 countries and total cotton production during 2018–2019 was 71.02 million tons (Figure 2). India, China, United States of America, Brazil, Pakistan, Turkey, Uzbekistan, Australia, Greece and Benin are the top ten producers. India is the highest producer followed by China and the United States. Australia has the highest productivity of 49.05 q/ha seed cotton followed by China (45.32 q/ha) and Brazil (39.76 q/ha) (Figure 3). In the last fifty years, the area has remained more or less the same while productivity has shown an upward trend (Figure 2). The percent productivity improvement graph (Figure 4) shows two periods of negative growth in the world, around 1975 and 1995, both of which were addressed with technological improvements. Currently, we are again witnessing a negative trend in cotton productivity growth around the world. India and USA together contribute 51% of the cotton area; however, their productivity (India: 13.9 q/ha; USA: 26.9 q/ha) (Figure 5) is low compared to countries like Australia (49.05 q/ha seed cotton) and China (45.32 q/ha). Improving cotton productivity in USA and India would change the outlook for cotton production sustainability. The world’s two largest democracies, India and USA, have a mutual interest in promoting global security, stability and economic prosperity with cooperation at various levels. This co-operation can be extended to cotton research, together with other countries like Australia, to make cotton productivity sustainable.
3. Genetic resource, origin, distribution and uses
Genetic variability is the main driving force of all breeding programs. The basic requirement of varietal improvement, hybrid development and marker-assisted selection is the availability of genetic resources. Even basic molecular studies depend on the occurrence of important morphological or physiological characters on which they can be based. It is therefore necessary to understand and utilize within- and between-species diversity for crop improvement. Cotton belongs to the family Malvaceae and the genus Gossypium, which has high species diversity and includes diploids and tetraploids, with all the diploids sharing a common chromosome number, i.e. 2n = 2x = 26. However, with 3-fold variation in DNA content per genome, they are classified into eight cytological groups (A, B, C, D, E, F, G, and K) [11, 12, 13]. All the tetraploids also share the same chromosome number, i.e. 2n = 4x = 52, and have an AD genome. In total, the genus Gossypium has around 51 recognized species, which includes 7 tetraploids and 44 diploids (Table 1). The A genome is thought to have originated in Asia/Africa, whereas the D genome is a derivative of the A-genome formed by allopatric speciation, i.e. due to trans-oceanic dispersal (Africa to Peru) of the A-genome. The modern-day tetraploids (AD-genome) originated from the trans-oceanic dispersal of the A-genome to Peru followed by a polyploidization event with the native D-genome of Peru. Cotton fiber is a single-cell extension of the seed epidermis with deposition of cellulose. Only four species underwent the parallel selection pressure of domestication, in America (Gossypium hirsutum and G. barbadense; tetraploids) and in Africa-Asia (G. arboreum and G. herbaceum; diploids), and only these species produce seed epidermal cell extensions between 10 mm and 35 mm long and hence are cultivated for lint. The rest of the species produce lint shorter than 10 mm in shades varying from brown to white.
Recent updates to the number of species in the genus Gossypium include G. trifurcatum being tentatively placed in the B genome. G. lanceolatum was shown to be a domesticated form of G. hirsutum and does not hold species status. G. stephensii and G. ekmanianum are the two newly discovered tetraploids with species status [17, 18]. Wild species are considered a treasure of important genes required to combat biotic and abiotic stress. The various species of the Gossypium genus and their important traits of interest are presented in Table 1.
|Sl. no.||Species||Genome||Ploidy/chromosome number||Origin||Habitat/Important traits|
|Primary Gene Pool|
|1||G. hirsutum||AD1||2n = 4x = 52||Mexico||Cultivated|
|2||G. barbadense||AD2||2n = 4x = 52||South America||Cultivated, Verticillium wilt resistance|
|3||G. tomentosum||AD3||2n = 4x = 52||Hawaiian Islands||Wild, sucking pest tolerance, Drought/ Heat Resistance, Fiber strength|
|4||G. mustelinum||AD4||2n = 4x = 52||NE Brazil||Wild|
|5||G. darwinii||AD5||2n = 4x = 52||Galapagos Islands||Wild, Nematode resistance, Drought Resistance|
|6||G. ekmanianum||AD6||2n = 4x = 52||Dominican Republic||Wild|
|7||G. stephensii||AD7||2n = 4x = 52||Wake Atoll||Wild|
|Secondary Gene Pool|
|1||G. herbaceum||A1||2n = 2x = 26||India||Cultivated, Drought resistance|
|2||G. arboreum||A2||2n = 2x = 26||Africa||Cultivated, Drought resistance|
|3||G. anomalum||B1||2n = 2x = 26||Africa||Wild, Fiber Length, Fiber Strength, Fiber fineness, Bollworm tolerance, Sucking pest tolerance, Bacterial Blight resistance, Drought Resistance, and Mite resistance.|
|4||G. triphyllum||B2||2n = 2x = 26||Cape Verde Islands||Wild, Jassid, Bollworm resistance, highly resistant to Bacterial blight|
|5||G. trifurcatum||B||2n = 2x = 26||Somalia||Wild|
|6||G. capitis-viridis||B3||2n = 2x = 26||Cape Verde Islands||Wild, Immune to bacterial blight|
|7||G. thurberi||D1||2n = 2x = 26||Sonora Desert||Wild, Fiber Strength, Bollworm tolerance, Fusarium wilt resistance, Frost resistance, Prolific boll bearing and high GOT.|
|8||G. armourianum||D2-1||2n = 2x = 26||Baja California||Wild, Bollworm tolerance, Sucking pest tolerance, Bacterial Blight resistance|
|9||G. harknessii||D2-2||2n = 2x = 26||Baja California||Wild, Verticillium wilt resistance, Fusarium wilt resistance, CMS male sterility source, Drought Resistance.|
|10||G. davidsonii||D3-d||2n = 2x = 26||Baja California||Wild, sucking pest tolerance, Resistance to salinity, and Bacterial Blight resistance.|
|11||G. klotzschianum||D3-k||2n = 2x = 26||Galapagos Islands||Wild, Sucking pest resistance|
|12||G. aridum||D4||2n = 2x = 26||Pacific slopes of Mexico||Wild, arborescent, CMS male sterility source, Drought resistance, High seed index|
|13||G. raimondii||D5||2n = 2x = 26||Pacific slopes of Peru||Wild, Fiber Length, Fiber Strength, Fiber fineness, Bollworm tolerance, sucking pest tolerance, Bacterial Blight resistance, Drought Resistance, High GOT.|
|14||G. gossypioides||D6||2n = 2x = 26||South Central Mexico||Wild, Resistance to Leaf Hoppers|
|15||G. lobatum||D7||2n = 2x = 26||SW Mexico||Wild, arborescent, Resistance to bollworm|
|16||G. trilobum||D8||2n = 2x = 26||West-Central Mexico||Wild, CMS male sterility source, Glabrous leaves|
|17||G. laxum||D9||2n = 2x = 26||SW Mexico||Wild, arborescent|
|18||G. turneri||D10||2n = 2x = 26||NW Mexico||Wild|
|19||G. schwendimanii||D11||2n = 2x = 26||SW Mexico||Wild, arborescent|
|20||G. longicalyx||F1||2n = 2x = 26||East Central Africa||Wild, trailing shrub, Fiber Length, Fiber fineness, and Nematode resistance|
|Tertiary Gene Pool|
|1||G. sturtianum||C1||2n = 2x = 26||Central Australia||Wild, Ornamental, Fiber Strength, Fusarium wilt resistance, Cold and Frost resistance and Insensitive to photoperiod|
|2||G. robinsonii||C2||2n = 2x = 26||Western Australia||Wild|
|3||G. stocksii||E1||2n = 2x = 26||Arabian Peninsula and the Horn of Africa||Wild, Fiber Length, Fiber Strength, Drought Resistance|
|4||G. somalense||E2||2n = 2x = 26||Horn of Africa and Sudan||Wild|
|5||G. areysianum||E3||2n = 2x = 26||Arabian Peninsula||Wild, Fiber Length, Fiber Strength, Drought Resistance|
|6||G. incanum||E4||2n = 2x = 26||Arabian Peninsula||Wild|
|7||G. benadirense||E||2n = 2x = 26||Somalia, Ethiopia, Kenya||Wild|
|8||G. bricchettii||E||2n = 2x = 26||Somalia||Wild|
|9||G. vollensenii||E||2n = 2x = 26||Somalia||Wild|
|10||G. bickii||G1||2n = 2x = 26||Central Australia||Wild|
|11||G. australe||G2||2n = 2x = 26||North Trans Australia||Wild, high GOT, Drought Resistance|
|12||G. nelsonii||G3||2n = 2x = 26||Central Australia||Wild|
|13||G. costulatum||K||2n = 2x = 26||North Kimberleys of W Australia||Wild, decumbent|
|14||G. populifolium||K||2n = 2x = 26||N Kimberleys, Australia||Wild|
|15||G. cunninghamii||K||2n = 2x = 26||The northern tip of NT, Australia||Wild|
|16||G. pulchellum||K||2n = 2x = 26||N Kimberleys, Australia||Wild|
|17||G. pilosum||K||2n = 2x = 26||NW Australia||Wild|
|18||G. anapoides||K||2n = 2x = 26||N Kimberleys, Australia||Wild|
|19||G. enthyle||K||2n = 2x = 26||N Kimberleys, Australia||Wild|
|20||G. exiguum||K||2n = 2x = 26||N Kimberleys, Australia||Wild, prostrate|
|21||G. londonderriense||K||2n = 2x = 26||N Kimberleys, Australia||Wild|
|22||G. marchantii||K||2n = 2x = 26||Australia||Wild, decumbent|
|23||G. nobile||K||2n = 2x = 26||N Kimberleys, Australia||Wild|
|24||G. rotundifolium||K||2n = 2x = 26||N Kimberleys, Australia||Wild, prostrate|
G. arboreum and G. herbaceum, known as old-world cottons, were mainly grown in the Indian sub-continent. The Indus valley discoveries indicate that cotton was grown as early as the 6th millennium BC, and the use of cotton is mentioned in the Rig Veda (15th century BC) and Manu’s Dharmashastra (800 BC). From the Indian sub-continent cotton spread to Mesopotamia, Egypt and Nubia. During the first century, cotton was introduced to Europe by Arab traders. The East India Company, which colonized India (1757) and began ruling it, was the biggest importer of raw cotton and sold the finished goods to India and the world. G. arboreum var neglecta grown in Bengal was known to produce lint that could be spun to 480 counts yarn and made into muslins, a result of both the weavers’ skill and the cotton germplasm that was available. Garments produced here were called “webs of woven wind”. However, the East India Company, which wanted to sell only its own finished cotton garments, chopped off the thumbs of weavers, and with the weavers lost, the germplasm too vanished from the world forever. Though the polyploidization event of the tetraploids happened in Peru, Gossypium hirsutum and G. barbadense, also called new-world cottons, originated in Mexico and Peru, respectively, from where they spread to both South and North America. The arrival of European colonists hastened the spread of new-world cotton to the rest of the world. The East India Company also bought and introduced early maturing and high yielding G. hirsutum cotton to India, with many unsuccessful attempts being made (1790 in Bombay and Madras, 1840–42 in Deccan, Konkan and Hubli, 1853 in Punjab). However, the most significant development in terms of the spread of cotton was the introduction of the Cambodia variety in the Tamil Nadu region. At present, G. hirsutum is cultivated on 95% of the cotton area due to its high yield ability. G. barbadense is the best source for fiber quality improvement of G. hirsutum, as it is known to produce lint that can be spun to 80–120 counts yarn. However, owing to their low yields, G. arboreum, G. herbaceum, and G. barbadense are not grown widely. Maintaining germplasm and utilizing within-species variation are big challenges in varietal development because they are expensive. The germplasm maintained in different countries can be efficiently utilized in breeding programs. The germplasm preserved is listed in Table 2. Considerable pre-breeding effort is needed to utilize the rare alleles/genes present in wild species. In principal crops like rice and wheat, IRRI and CIMMYT, respectively, are taking up the pre-breeding work and supplying the genetic materials to breeders around the world. Some notable works in cotton pre-breeding include developing populations involving genes, segments or whole chromosomes from wild species. RHMBHMTUP-C4, a random mated population, was developed involving G. hirsutum, G. barbadense, G. mustelinum and G. tomentosum. RMBUP-C4 was developed by crossing three elite hirsutum lines with 18 chromosome substitution lines from G. barbadense. There is huge scope for pre-breeding work in cotton to combat biotic and abiotic stresses.
|Country||G. hirsutum||G. barbadense||G. herbaceum||G. arboreum||Other species||Total||Reference|
|United States of America||6302||1584||194||1729||502||10,311|||
4. Cotton conventional breeding
Conventional breeding is the base for any trait improvement, and without proper knowledge of conventional breeding techniques, advanced molecular breeding techniques would likely lead to costly mistakes. Until the advent of molecular marker technologies, conventional breeding was the sole method for genetic improvement. Popular conventional interventions in cotton began with the development of determinate growth types. The development of early maturing types (resistant to boll weevils) and the inclusion of morphological traits such as frego bracts, glabrous leaves, nectariless and high gossypol for resistance to boll weevils and bollworms then followed.
Australian Conventional Breeding: American bollworm, bacterial blight and Verticillium wilt were the major problems in Australia. Norm Thomson used okra leaf types in an Australian breeding program to develop Siokra1-1, released in 1985 as the first okra leaf variety with bacterial blight resistance. Dr. Peter Reid released a Verticillium wilt resistant variety that became popular outside Australia.
Indian Conventional Breeding: The introduction of G. hirsutum during the 1970s, the development of the first intra-hirsutum hybrid cotton (H4) in India by C T Patel in 1970, and the development of the first interspecific hybrid (G. hirsutum x G. barbadense) in cotton (Varalaxmi) by B.H. Katarki in 1972 led to the use of hybrid vigour for higher productivity in the Indian subcontinent. Morphological traits such as frego bracts, glabrous leaves, nectariless, and antibiosis [36, 37, 38, 39, 40] were included in the breeding program for bollworm tolerance. Wild species were widely used through introgression breeding to develop novel varieties: Badnawar-1, Khandwa-1, and Khandwa-2 from the G. hirsutum x G. tomentosum cross; Arogya and PKV081 from G. hirsutum x G. anomalum; Devitej from G. hirsutum x G. herbaceum; SRT-1, Deviraj and Gujarat 67 from G. hirsutum x G. arboreum; and MCU2 and MCU5 from G. hirsutum x G. barbadense.
US Conventional breeding: Boll weevils were a major threat to cotton cultivation, and overcoming them by developing early maturing short-staple types was one of the significant interventions. Exotic germplasm was used to break the negative association between yield and fiber quality, and varieties like MD51ne with higher fiber quality were developed. The development of sub okra, smooth leaf, and nectariless types for reducing tarnished plant bug populations was another achievement. Private companies like Delta Pine made huge advances in cotton productivity. They released a variety suitable for mechanical harvest called Deltapine Smooth Leaf, which covered around 25% of the US cotton area by 1963. Deltapine-16, with improved disease resistance and better fiber quality, covered around 28% of the US area by 1972. Deltapine Acala 90, a premium quality cotton, was used as parental germplasm in the development of many other varieties globally. Together, these Delta Pine interventions improved cotton production in the United States significantly.
Uzbekistan Conventional Breeding: The establishment of the Turkestan Breeding Station in 1922, with a major emphasis on the collection of cotton germplasm under the leadership of Dr. Zaitsev and Dr. Mauer, was a major milestone towards the huge germplasm collection of present-day Uzbekistan. The early maturing types AK-Djura and Dehkam developed by Dr. Zaitsev were an important contribution. Utilizing G. barbadense as a source of resistance to Fusarium wilt and of large boll types led to the release of the 35-1 and 35-2 cultivars. The high yielding cultivar Termez-14 developed by Dr. Ibragimov was another breakthrough. The Verticillium wilt resistant variety C-6524, developed by Dr. Alexander Avtonomov and Dr. Vadim Avtonomov, occupied more than two hundred thousand hectares for fifteen years till 2004.
All these interventions, along with production technologies, improved world productivity from 9.65 q/ha in 1960 to 16.20 q/ha in 2000, a span of 40 years. Conventional breeding methods are effective in transferring traits but take considerably more time and resources, and the transfer of a trait is uncertain. A breeding program with a large number of entries requires many samples to be tested for traits like fiber quality or oil content, and considerable resources and time are spent waiting for the cotton crop to mature. In contrast, Bt cotton technology, one of the spin-off technologies of genomics and molecular breeding, played a significant role along with other production technologies and achieved a 5 q/ha improvement in just five years (Figure 4). Thus, cotton genomics and marker-assisted selection have huge potential to save time and resources and to assist conventional breeding in meeting future demands.
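The contrast between the two periods is easier to see as an average annual gain. The sketch below uses only the figures quoted above (9.65 to 16.20 q/ha over 40 years, and 5 q/ha over five years); it assumes a simple linear average and ignores year-to-year variation, and the helper name is illustrative.

```python
def annual_gain(start_q_ha: float, end_q_ha: float, years: int) -> float:
    """Average yearly productivity gain in q/ha, assuming a linear trend."""
    return (end_q_ha - start_q_ha) / years


conventional_rate = annual_gain(9.65, 16.20, 40)   # 1960 to 2000
bt_era_rate = 5.0 / 5                              # 5 q/ha in five years

print(f"conventional era: {conventional_rate:.3f} q/ha per year")
print(f"Bt cotton era:    {bt_era_rate:.3f} q/ha per year")
print(f"ratio:            {bt_era_rate / conventional_rate:.1f}x")
```

On these numbers the average gain in the Bt era is roughly six times that of the preceding four decades, although, as the text notes, other production technologies contributed as well.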
5. Advances in cotton genome sequencing
A reference genome is a boon because it makes it possible to characterize species-specific genes and gene families, which are then amenable to functional genomics work. The publication of “Toward Sequencing Cotton (Gossypium) Genomes” in 2007 by Chen and co-workers laid down the framework for genome sequencing of cotton. The initial plan was to sequence the D-genome (G. raimondii) first, followed by the A and then the AD genome. The first assembly developed in cotton was accordingly that of the D-genome. Now, as many as nine assemblies of G. hirsutum, four assemblies of G. barbadense, three assemblies of G. arboreum, three assemblies of G. raimondii and one assembly each of wild species such as G. australe, G. darwinii, G. longicalyx, G. mustelinum, G. tomentosum and G. turneri have been documented in CottonGen. The assembly statistics of the various Gossypium genomes are presented in Table 3. Various researchers have used the cotton reference genomes to produce a total of 17,224,361 SNPs that are documented in CottonGen. The gene annotations and physical maps provided are valuable to cotton scientists for studies such as the development and validation of linkage maps, GWAS, expression studies, design of guide RNAs for gene editing, and genome-wide characterization of gene families. Since sequencing costs continue to fall, there are huge prospects for developing newer assemblies that capture all the variation (within and between species) and identify rare alleles/genes that would help sustain future demands in cotton improvement.
|Species||G. hirsutum||G. hirsutum||G. hirsutum||G. hirsutum||G. hirsutum||G. hirsutum||G. hirsutum||G. hirsutum|
|Number of contigs||1235||1283||3718||4831||6733||4746||265,279||44,816|
|N50 of Contigs (kb)||5020||4760||1976||113.02||7839||1891(L50)||34||80|
|No. of scaffolds||342||599||2238||48||1025||2190||40,407||8591|
|N50 of scaffolds (Mb)||—||—||—||15.510||108.1||97.73 (L50)||1.6||0.764|
|Total assembled genome size (Mb)/Scaffold length (Mb)||2290||2286||2309||2295.26||2305.2||2347.01||2432.7||2173|
|Number of annotated protein coding genes||74,350||73,624||73,707||72,761||75,376||70,199||70,478||76,943|
|Species||G. barbadense||G. barbadense||G. barbadense||G. barbadense||G. arboreum||G. arboreum||G. arboreum||G. herbaceum|
|Number of Contigs||4766||6902||4930||—||2432||8223||40,381||1781|
|N50 of Contigs (kb)||1800||77.66||2151.56|
|No. of scaffolds||2048||29||3032||29,751||1269||—||7914||732|
|N50 of scaffolds (Mb)||93.8||23.44||92.88 (L50)||0.260||—||—||0.665||—|
|Total assembled genome size (Mb) / Scaffold length (Mb)||2195.8||2224.98||2266.65||2573.19||1637||—||1694||1556|
|Number of annotated protein coding genes||74,561||75,071||71,297||80,876||43,278||40,960||41,330||43,278|
|Species||G. raimondii||G. raimondii||G. raimondii||G. australe||G. darwinii||G. longicalyx||G. mustelinum||G. tomentosum||G. turneri|
|Cultivar||D5-4||D5-3 (CMD10)||—||G2-lz||AD5-32, no. 1808015.09||F1-1||1408120.09, 1408120.10, 1408121.01, 1408121.02, 1408121.03||7179.01,02,03||D10-3|
|Number of contigs||187||41,307||19,735||2598||821||17||2147||750||220|
|N50 of contigs (kb)||6291.83||44.9||135.6||1825.35||9100||95,880||2300||10,000||7909.23|
|No. of scaffolds||—||4715||1033||650||334||—||383||319||—|
|N50 of scaffolds (Mb)||58.81||2.2||6.0||143.60||101.9||—||106.8||102.9||60.46|
|Total assembled genome size (Mb)/Scaffold length (Mb)||734.88||775.2||761.4||1752||2183||1190.67||2315||2193.6||755.20|
|Number of annotated protein coding genes||40,743||40,976||37,505||40,694||78,303||38,378||74,699||78,338||38,489|
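The N50 and L50 values reported in Table 3 summarize how contiguous an assembly is. As a minimal sketch of how these statistics are usually defined (the function names and the toy lengths are illustrative, not taken from any of the cited assemblies):

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def l50(lengths):
    """Return the L50: how many of the longest sequences reach the N50 point."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return count


# Toy example: six contigs with a total length of 290 units.
toy_contigs = [80, 70, 50, 40, 30, 20]
print(n50(toy_contigs), l50(toy_contigs))
```

A larger N50 together with a smaller number of contigs or scaffolds generally indicates a more contiguous assembly, which is why both are listed for each genome.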
6. Transcriptome studies in cotton
Isolating and characterizing the spatio-temporal pool of mRNA, in order to study differential gene expression between contrasting genotypes and to understand the pathways underlying the superiority of a specific trait, has been well implemented in cotton. Many traits have been targeted to find the key responsible genes and pathways, with fiber initiation and elongation receiving the most attention [61, 62, 63, 64, 65, 66, 67, 68]. Transcriptome studies have also covered green and brown colored cotton [69, 70, 71], cadmium tolerance, cold stress, drought stress, nematode resistance, semigamy in Pima cotton, whitefly-mediated cotton leaf curl infection, and Mepiquat chloride-induced compact types. Many studies have identified differentially expressed genes; however, the causes of the differential expression remain poorly understood, which points towards epigenetic regulation and transcription factors as the probable causes. Very few methylation studies have been carried out in cotton, covering fiber quality, male sterility, cold stress, salt tolerance and fruiting branch development. Thus, there is huge scope for studies on differential methylation and transcription factors, and for correlating them with differential expression patterns.
7. Genetic markers in cotton
In cotton, various markers like restriction fragment length polymorphism (RFLP), random amplified polymorphic DNA (RAPD), amplified fragment length polymorphism (AFLP), simple sequence repeats or microsatellites (SSRs) [87, 88], sequence-related amplified polymorphism (SRAP) [89, 90], target region amplified polymorphism (TRAP), inter simple sequence repeats (ISSRs), expressed sequence tag-simple sequence repeats (EST-SSRs) and single nucleotide polymorphism (SNP) [94, 95, 96, 97] have been used for various genomic studies, with each marker system having its advantages and disadvantages. SSRs were initially thought to be sufficiently polymorphic, but with the advent of high-throughput SNPs they are being used less. In the era of sequencing, the availability of the cotton reference genome is a boon to cotton researchers, as a large number of SNPs are identified using whole-genome re-sequencing and transcriptome sequencing. Further, falling costs and the advent of reduced representation sequencing methods like genotyping-by-sequencing (GBS) and specific-locus amplified fragment sequencing (SLAF) provide scope for high-throughput genotyping. To date, CottonGen holds 7,870,031 SSRs and 17,224,361 SNP markers that are available to researchers for various studies. A few SNP arrays have been developed in cotton, like the 63k cotton array, the 80k SNP array, and a 50k array by Samir Sawanth and I.S. Katageri (unpublished), which are being utilized for linkage and association mapping studies. However, the trait-associated SNPs identified using these techniques cannot be assayed cheaply in a minimally equipped laboratory, so it is necessary to exploit them through marker systems like CAPS (Cleaved Amplified Polymorphic Sequences) and dCAPS (derived CAPS), which require minimal laboratory setup. CAPS and dCAPS can be used as co-dominant marker systems and can be scored by simple agarose gel electrophoresis. They are highly stable as they are specifically designed for certain genomic targets [101, 102, 103]. However, only a few CAPS and dCAPS markers have been developed in cotton. There are huge prospects for developing simple PCR-based markers in cotton so that breeders working in remote research stations with minimal laboratory facilities can take advantage of DNA markers.
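The idea behind converting a SNP into a CAPS marker can be sketched in a few lines: a SNP is a candidate when one allele carries a restriction enzyme recognition site that the other allele lacks. The example below is an illustration only. The two amplicon sequences are hypothetical, EcoRI (recognition sequence GAATTC) is used simply because its site is palindromic so a single-strand scan suffices, and the function names are assumptions rather than part of any published pipeline.

```python
ECORI_SITE = "GAATTC"  # EcoRI recognition sequence (palindromic)


def count_sites(sequence: str, site: str) -> int:
    """Count occurrences of a recognition site on the given strand."""
    sequence, site = sequence.upper(), site.upper()
    return sum(
        1
        for i in range(len(sequence) - len(site) + 1)
        if sequence[i:i + len(site)] == site
    )


def is_caps_candidate(allele_a: str, allele_b: str, site: str) -> bool:
    """True when the alleles differ in their number of cut sites,
    so that digestion would give allele-specific band patterns."""
    return count_sites(allele_a, site) != count_sites(allele_b, site)


# Hypothetical amplicon fragments that differ by a single A/G SNP.
ref_allele = "ATGCCGAATTCTTGACC"   # contains GAATTC, would be cut
alt_allele = "ATGCCGAGTTCTTGACC"   # SNP abolishes the site, stays uncut

print(is_caps_candidate(ref_allele, alt_allele, ECORI_SITE))
```

For enzymes with non-palindromic recognition sequences the reverse complement would also need to be scanned, and a dCAPS design would additionally introduce a mismatch in the primer to create the site.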
8. Molecular mapping and quantitative trait mapping
Quantitative trait locus (QTL) identification establishes the association between a marker and a measurable phenotype at the genomic level and helps in understanding the genetics of the traits under study. Various types of populations such as F2, recombinant inbred lines (RILs), backcross inbred lines (BILs) and Multi-parent Advanced Generation Inter-Cross (MAGIC) populations are commonly used in cotton. Bi-parental RIL mapping is one of the most common methodologies successfully employed for identifying QTLs in cotton for various traits. Genome-wide association studies (GWAS) are also used to establish associations between traits and DNA markers in cotton germplasm; this technique detects marker-trait associations by exploiting linkage disequilibrium (LD mapping). In cotton, the construction of linkage maps and detection of QTLs for various economic traits has been in progress since 1994, when the first RFLP linkage map was published, after which many maps have been constructed [94, 96, 97, 105, 108]. Many genome-wide association studies have also been carried out [95, 107, 109]. Currently, there are around 249 QTL mapping and association studies using various populations and germplasm (Table 4); QTLs identified using bi-parental mapping/GWAS are presented in Table 5.
Across these studies, chromosomes 5, 7, 10 and 25 harbor many QTLs for fiber length, and chromosomes 7 and 21 harbor many for fiber strength. For yield (seed cotton yield/lint yield), chromosomes 1, 13 and 26 appear to be very important. For boll weight, chromosomes 7 and 13 harbor many QTLs. Chromosome 25 is over-represented for boll number, and chromosomes 16 and 13 for lint percentage (Figure 6). Efforts have also been made to develop linkage maps involving wild species, such as G. hirsutum x G. anomalum, G. trilobum x G. raimondii, G. nelsonii x G. australe, G. hirsutum x G. darwinii, G. hirsutum x G. mustelinum, G. darwinii x G. darwinii, G. klotzschianum x G. davidsonii, G. hirsutum x G. tomentosum [117, 118, 119] and G. thurberi x G. trilobum.
After validation, QTLs can be used directly for marker-assisted selection. Transfer or pyramiding of QTLs is one way of realizing targeted trait introgression; alternatively, these QTLs can be used for fine mapping and map-based cloning before marker-assisted selection. However, only a few validation studies have been carried out, for example for the virescent gene in virescent mutants [121, 122], the fuzzless gene in the fuzzless mutant, fiber length, fiber strength, leaf shape and a QTL affecting root-knot nematode multiplication. Although some fine mapping has been done, more concentrated efforts are still required to dissect these traits. Unlike rice (MAS 946-1, Swarna Sub-1 and Cadet) and wheat (Patwin, Expresso and AGS2026), cotton has no successful field-deployed cultivars developed using the identified QTLs. Now that marker development and QTL mapping have been carried out in depth in cotton, at least for major traits like fiber quality and yield, the focus worldwide should be on using the major QTLs identified, first in fine mapping and then in marker-assisted selection.
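The marker-trait association that underlies QTL detection can be illustrated with single-marker regression, the simplest form of QTL analysis. The sketch below is only an illustration under stated assumptions: the marker names, genotype codes and fiber length values are hypothetical, and the mapping studies cited above typically rely on interval mapping or mixed-model GWAS with significance thresholds rather than this bare regression.

```python
from statistics import mean


def single_marker_regression(genotypes, phenotypes):
    """Regress a trait on one marker; return (allele effect, R^2)."""
    g_mean = mean(genotypes)
    p_mean = mean(phenotypes)
    sxx = sum((g - g_mean) ** 2 for g in genotypes)
    syy = sum((p - p_mean) ** 2 for p in phenotypes)
    sxy = sum((g - g_mean) * (p - p_mean)
              for g, p in zip(genotypes, phenotypes))
    if sxx == 0 or syy == 0:
        # Monomorphic marker or constant trait: nothing to estimate.
        return 0.0, 0.0
    effect = sxy / sxx              # trait change per unit of genotype code
    r_squared = sxy ** 2 / (sxx * syy)  # share of variance explained
    return effect, r_squared


# Hypothetical RIL population: 0 = allele from parent A, 1 = allele from parent B.
markers = {
    "M1": [0, 0, 0, 0, 1, 1, 1, 1],
    "M2": [0, 1, 0, 1, 0, 1, 0, 1],
}
# Hypothetical fiber length (mm) of the same eight lines.
fiber_length_mm = [27.0, 27.5, 26.5, 27.0, 30.0, 30.5, 29.5, 30.0]

results = {name: single_marker_regression(codes, fiber_length_mm)
           for name, codes in markers.items()}

for name, (effect, r_squared) in results.items():
    print(f"{name}: effect = {effect:.2f} mm, R^2 = {r_squared:.3f}")
```

In this toy data set, marker M1 explains most of the variation in fiber length while M2 explains very little, which is the kind of contrast a QTL scan looks for along each chromosome.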
|Sl. no.||Genome||Number of maps|
|1.||G. hirsutum x G. barbadense||57|
|2.||G. barbadense x G. hirsutum||9|
|3.||G. hirsutum x G. hirsutum||99|
|4.||G. barbadense x G. barbadense||4|
|5.||G. hirsutum x G. anomalum||1|
|6.||G. trilobum x G. raimondii||1|
|7.||G. australe x G. nelsonii||1|
|8.||G. hirsutum x G. darwinii||1|
|9.||G. hirsutum x G. mustelinum||2|
|10.||G. darwinii x G. darwinii||1|
|11.||G. davidsonii x G. klotzschianum||1|
|12.||G. hirsutum x G. tomentosum||4|
|13.||G. thurberi x G. trilobum||1|
|14.||MAGIC (Multi parent advanced generation intercross)||6|
|Sl. no.||Trait||Documented QTLs|
9. Genomic selection in cotton
QTL mapping and genome-wide association studies have identified many genomic regions responsible for important agronomic and fiber quality traits in cotton. Among them, only a few traits, such as disease resistance and pest resistance, are qualitatively governed by a few genes/QTLs with major effects. Marker-assisted selection (MAS) is well suited to handling these traits. But in the majority of crops, including cotton, most yield, yield-contributing and fiber quality traits are quantitatively governed by one or a few QTLs with relatively large effects along with several QTLs with small effects, which are not captured through QTL mapping [128, 129]. Hence, the targeted phenotype has not been achieved successfully through MAS. In such situations, genomic selection (GS) is a promising and powerful genomic tool for breeding these traits. GS is a form of MAS in which selection is based on genotypic data for marker alleles covering the entire genome, irrespective of whether the effects associated with these marker loci are significant or not. Marker effects are first estimated in a training population that has been both genotyped and phenotyped; based on these estimates, genomic estimated breeding values (GEBVs) of individuals/lines are calculated without actually phenotyping them, and these GEBVs form the basis of selection (Figure 7). Empirical GS studies in maize (Zea mays; [132, 133, 134, 135]), rice (Oryza sativa; [136, 137, 138, 139]), wheat (Triticum aestivum; [140, 141, 142, 143, 144]) and sorghum (Sorghum bicolor; [145, 146, 147]) have shown that GS has become an efficient approach in crop breeding, aided by recent developments in high-density array-based DNA marker technologies and their reduced genotyping costs. Many models for estimating marker effects have been developed for GS. 
Their predictive ability depends mainly on factors such as marker density, training population size, and the relationship between training and breeding populations [131, 148]. Hence, the model that gives the highest GEBV accuracy is selected. To date, only two cotton GS studies have been reported. Islam et al. (2020) compared the prediction ability (PA) and prediction accuracy (PACC) of several GS models in cotton, including genomic BLUP (GBLUP), ridge regression BLUP (rrBLUP), BayesB, Bayesian LASSO, and reproducing kernel Hilbert spaces (RKHS). They reported that BayesB gave the highest accuracies among the five GS methods tested; the same model was suggested for cotton by Gapare et al. in 2018. In many field crops, GS prediction accuracies of >0.80 have been reached for different traits [149, 150], whereas in cotton accuracies of 0.71 and 0.59 have been achieved for fiber length and strength, respectively. For fiber elongation, PA and PACC were 0.65 and 0.69, respectively. In most plant breeding programs, and especially in cotton, GS is still in its infancy. One of the biggest barriers to implementing GS in practical plant breeding is the high start-up cost of accurate phenotyping, maintaining a large training population and genotyping entire breeding populations. However, genotyping costs are continually decreasing, and genotyping large plant populations is now much more manageable than conventional phenotyping. GS may have its greatest potential at points in the breeding program where selection using conventional methods is too costly and time-consuming.
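The GEBV workflow described above (estimate all marker effects in a training population, then score unphenotyped candidates) can be sketched with ridge regression, the idea behind rrBLUP. This is a minimal illustration rather than the procedure used in the cited cotton studies: the genotypes (coded 0/1/2), phenotypes, candidate names and the shrinkage value `ridge_lambda` are all hypothetical, and in practice the shrinkage is usually derived from variance components or cross-validation.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_marker_effects(genotypes, phenotypes, ridge_lambda):
    """Ridge estimates of all marker effects: (Z'Z + lambda*I) u = Z'y."""
    n_lines, n_markers = len(genotypes), len(genotypes[0])
    mu = sum(phenotypes) / n_lines
    col_means = [sum(row[j] for row in genotypes) / n_lines
                 for j in range(n_markers)]
    z = [[row[j] - col_means[j] for j in range(n_markers)] for row in genotypes]
    y = [p - mu for p in phenotypes]
    ztz = [[sum(z[i][a] * z[i][b] for i in range(n_lines))
            + (ridge_lambda if a == b else 0.0)
            for b in range(n_markers)] for a in range(n_markers)]
    zty = [sum(z[i][a] * y[i] for i in range(n_lines)) for a in range(n_markers)]
    return mu, col_means, solve_linear_system(ztz, zty)


def predict_gebv(genotype, mu, col_means, effects):
    """GEBV = training mean + sum of (centered genotype * marker effect)."""
    return mu + sum((g - m) * u for g, m, u in zip(genotype, col_means, effects))


# Hypothetical training population: 8 lines x 4 markers, with fiber length (mm).
train_genotypes = [
    [0, 0, 0, 0], [2, 0, 0, 2], [0, 2, 0, 2], [2, 2, 0, 0],
    [0, 0, 2, 2], [2, 0, 2, 0], [0, 2, 2, 0], [2, 2, 2, 2],
]
train_fiber_length = [26.0, 28.0, 27.0, 29.0, 26.5, 28.5, 27.5, 29.5]

mu, col_means, effects = fit_marker_effects(
    train_genotypes, train_fiber_length, ridge_lambda=2.0)  # assumed shrinkage

# Hypothetical selection candidates that were genotyped but never phenotyped.
candidates = {"C1": [2, 2, 2, 0], "C2": [0, 0, 0, 2], "C3": [1, 2, 0, 1]}
gebvs = {name: predict_gebv(g, mu, col_means, effects)
         for name, g in candidates.items()}

for name in sorted(gebvs, key=gebvs.get, reverse=True):
    print(f"{name}: GEBV = {gebvs[name]:.2f} mm")
```

Every marker receives an effect estimate, including small ones that a QTL scan would discard as non-significant, which is what distinguishes GS from MAS on a few major QTLs. Bayesian models such as BayesB differ mainly in allowing marker-specific shrinkage instead of one common penalty.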
With the advent of recombinant DNA technology in the 1970s, the genetic manipulation of plants entered a new age. Genes and traits previously unavailable through traditional breeding became available through DNA recombination, and with greater specificity than ever before. This modern genetic engineering technology allows the transfer of genetic material across a wide range of species and has removed the traditional limits of crossbreeding. It involves the transfer of desired genes into the plant genome, followed by regeneration of a whole plant from the transformed tissue/cell. For successful development of transgenic plants, identification of a suitable target tissue and an efficient gene transfer protocol are essential. Therefore, understanding the variability among crop plants and genotypes in their in vitro regeneration response, and optimizing a routine regeneration protocol, are prerequisites for using transformation technology in any crop. Currently, the most widely used methods for transferring genes into plants are Agrobacterium-mediated transformation [151, 152, 153] and particle bombardment. Other methods, such as polyethylene glycol (PEG)-mediated transformation and electroporation, have also been used to transfer genes into plants. Cotton is recalcitrant to regeneration from in vitro tissue cultures. Compared with many other crops, it is more difficult to obtain somatic embryogenesis, shoot multiplication and plant regeneration in cotton. The nature of the tissue explants, the genetic makeup of the crop plant and the presence of different growth hormones have a direct effect on regeneration potential. Genotype-dependent genetic transformation is well studied and used commercially in cotton. Coker genotypes, which are amenable to in vitro regeneration by somatic embryogenesis, are widely used in genetic transformation experiments [151, 152, 153, 157]. 
Genotype-independent genetic transformation techniques, although developed [152, 158], show a very low frequency of heritable gene incorporation. In the beginning, the two major goals of genetic engineering in cotton were to confer insect resistance and tolerance to more environmentally acceptable herbicides. To date, more than 65 transgenic cotton events have been approved in India and around the world. Continuous exposure of bollworms to Bt cotton has led to resistance in them, reducing the efficiency of control. The cotton bollworm P450 monooxygenase gene (CYP6AE14) has been silenced through a plant-mediated RNAi approach to impair larval tolerance to gossypol. Genetic engineering is a remarkable breakthrough in modern crop improvement. Bt cotton came at the most opportune time, when bollworms were causing heavy destruction of the cotton crop and leaving farmers helpless. Since its release in the USA in 1995, in China in 1997 and in India in 2002, Bt technology has had a significant impact on bollworm control, and a reduction in pesticide usage has been seen.
Acceptance of genetically altered cotton in various regions of the world is offering new opportunities for improving cotton yield and quality. Over-expression of GhUGP1 (cotton uridine diphosphate glucose pyrophosphorylase) in upland cotton improves fiber quality and reduces fiber sugar content. Overexpression of the novel sucrose synthase gene GhSusA1 leads to a considerable increase in biomass and fiber length, with a moderate increase in fiber strength. A silkworm fibroin gene was used to improve fiber structure and quality. Transgenic cotton plants expressing the fiber expansin gene (GhEXPA8) showed a significant improvement in fiber length and micronaire values. The fiber quality QTL-associated phytochrome gene PHYA1 was targeted through RNAi to explore the biological roles of PHYA1 and (indirectly) other phytochrome genes in cotton. The elimination of gossypol from cottonseed has been a long-standing goal of geneticists. A cotton variant with suppressed terpenoid aldehydes (gossypol) but retaining lysigenous glands was obtained using antisense technology against the (+)-delta-cadinene synthase gene. RNAi knockdown of delta-cadinene synthase gene(s) was used to engineer plants that produced ultra-low gossypol cottonseed (ULGCS). Recently, ULGCS lines were obtained by suppressing CDN genes using PTGS and a seed-specific promoter (α-globulin), and these lines are under field evaluation. In the future, increased research investment in transgenic approaches to biotic and abiotic stresses is needed. Much focus is also required on exploiting and improving cotton fiber and yield traits through alien gene incorporation. With regard to public acceptance and concerns, a large-scale public awareness campaign covering benefits, biosafety and risk assessment is needed.
11. CRISPR Cas system for crop improvement in cotton
In the last decade, there has been a revolution in genome modification, with continuous advances in targeted genome modification technologies. Genome editing tools such as zinc-finger nucleases (ZFNs) and transcription activator-like effector nucleases (TALENs) were extensively used before the advent of CRISPR/Cas9 technology. ZFNs and TALENs did not become as popular as CRISPR/Cas9 because of their lower efficiency, lower specificity and greater difficulty of design and engineering. The CRISPR/Cas9 (Clustered Regularly Interspaced Short Palindromic Repeats/CRISPR-associated protein 9) system is the latest gene-editing technology, which has the power to alter DNA, the code of life. Scientists predict that CRISPR has enormous potential for the next green revolution by 2050. To date, the CRISPR/Cas9 system has been successfully applied to efficient genome editing not only in model plant species but also in crop species. Many institutions and groups all over the world are studying the feasibility of the CRISPR/Cas9 system in cotton (Gossypium hirsutum L.). Successful use of the CRISPR/Cas9 system in cotton still relies on Agrobacterium-mediated transformation and tissue culture, a genotype-dependent and low-efficiency process, but it provides a powerful tool for cotton functional genomics, as it appears to be more efficient than RNA interference (RNAi) and virus-induced gene silencing (VIGS) at knocking out the function of target genes. The applications of the CRISPR/Cas9 system in cotton are presented in Table 6. Chen and coworkers demonstrated for the first time that CRISPR/Cas9 can be used for advanced functional genomic research in cotton through targeted mutagenesis of two endogenous genes [Cloroplastos alterados 1 (GhCLA1) and vacuolar H+-pyrophosphatase (GhVP)]. Many factors influence the mutation rates obtained with the CRISPR/Cas9 system. 
One study compared sgRNA expression and mutagenesis efficiency when an endogenous cotton U6 promoter was used in place of the existing Arabidopsis AtU6-29 promoter. Mutagenesis efficiency improved 4 to 6 times when the endogenous U6 promoter was used to drive sgRNA expression. This study provided a fast and effective method to validate sgRNA mutagenesis efficiency in cotton using CRISPR/Cas9. Gao and coworkers analyzed the nature of the mutations induced by the CRISPR/Cas9 system through a transient expression study of two genes, Translation elongation factor 1 (GhEF1) and Phytoene desaturase (GhPDS), in cotton. The CRISPR/Cas9 system has also been used for targeting multiple sites and simultaneously editing multiple genes. Wang and colleagues successfully used the CRISPR/Cas9 system in allotetraploid cotton and accomplished multiple-site genome editing by targeting the exogenously transformed gene Discosoma red fluorescent protein2 (DsRed2) and the endogenous gene Cloroplastos alterados 1 (GhCLA1).
|Sl. no.||Gene||Mutation type||Method of Cas9 system delivery||Phenotype||Gene function||Reference|
|1||GhCLA1 (Cloroplastos alterados 1)||Nucleotide insertion and substitution||Agrobacterium-mediated transformation||Albino phenotype was observed||A novel gene for chloroplast development|||
|GhVP (vacuolar H+-pyrophosphatase)||—||Involved in acidifying intracellular compartments and transporting protons across the plasma membrane.|
|2||GhMYB25-like A & GhMYB25-like D||Nucleotide insertions and deletions (indels)||Agrobacterium-mediated transformation||—||GhMYB25-like is involved in the development of cotton fiber.|||
|3||Arginase (ARG)||Nucleotide insertions and deletions (indels)||Agrobacterium-mediated transformation||Improved lateral root system||Plays an important role in the regulation of lateral root formation.|||
|4||GhPDS, GhCLA1 & GhEF1||Deletions (64%)||Agrobacterium-mediated transformation||Albino phenotypes observed||A novel gene for chloroplast development.|||
|5||An endogenous gene GhCLA1 and DsRed2 (Discosoma red fluorescent protein2)||Nucleotide insertions and deletions (indels)||Agrobacterium tumefaciens-mediated transformation||Red fluorescence disappeared and an albino phenotype was observed||AtCLA1 is involved in chloroplast development. The DsRed2 protein is used as a reporter because of its advantages over other reporter proteins.|||
|6||ALARP||Nucleotide insertions and deletions||Agrobacterium-mediated transformation||—||A gene encoding alanine-rich protein that is preferentially expressed in cotton fibers|||
|7||Cotton Gland Formation (CGF3)||Nucleotide insertions and deletions||Agrobacterium-mediated transformation||Glandless phenotype||Plays a critical role in the formation of glands in the cotton plant|||
|8||Cotton Gland Pigmentation 1 (CGP1)||Nucleotide insertions and deletions||Agrobacterium-mediated transformation||Decreased accumulation of gossypol and related terpenoids, as well as the color intensity in glands||CGP1 is an MYB Transcription Factor that regulates gossypol accumulation but not gland morphogenesis.|||
CRISPR/Cas9 has been used to edit a few agronomically important cotton genes, such as genes involved in fiber development (GhMYB25-like A and GhMYB25-like D) and a gene encoding arginase (ARG), whose editing increased lateral root formation. Zhu and colleagues demonstrated the high editing efficiency of the CRISPR/Cas9 system in cotton by targeting ALARP, a gene encoding an alanine-rich protein that is preferentially expressed in cotton fibers. CRISPR/Cas9 knockout of the Cotton Gland Formation (CGF3) gene resulted in a glandless phenotype in cotton. Gao and coworkers confirmed the important role of Cotton Gland Pigmentation 1 (CGP1) in gland biology through CRISPR knockout of CGP1; decreased accumulation of gossypol and related terpenoids was observed in the knockout plants. These successful studies indicate that the CRISPR/Cas9 system can be used effectively in cotton functional genomics. However, the CRISPR/Cas9 system has some limitations, including off-target effects, a PAM (protospacer adjacent motif) requirement that restricts the number of potential target sites, and difficulties in generating homozygous mutations in the offspring [175, 176, 177]. Therefore, there is considerable scope for modifying the CRISPR/Cas9 system or finding alternative CRISPR systems. Zeng and co-workers, for the first time, established an efficient CRISPR/LbCpf1 system to expand the scope of genome editing in allotetraploid cotton by targeting the cotton endogenous gene Cloroplastos alterados (GhCLA). In addition to the CRISPR/Cas9 and CRISPR/LbCpf1 systems, a new effector with a single nuclease domain, a relatively small size, low-frequency off-target effects and cleavage capability at high temperature has recently been established and designated CRISPR/Cas12b (C2c1). 
CRISPR/Cas12b is a heat-induced system that requires temperatures between 40 and 55°C for effective cleavage; below 40°C, cleavage cannot be accomplished [180, 181]. Recently, manipulation of the Cloroplastos alterados (GhCLA) gene in cotton plants using AacCas12b has been successfully established with no off-target effects. This system is ideal for plant species that can tolerate temperatures above 40°C, such as cotton, which can grow well at temperatures reaching 45°C. Some researchers are deactivating one or both of Cas9's cutting domains and fusing new enzymes onto the protein, so that Cas9 can then be used to transport those enzymes to a specific DNA sequence. In one example, Cas9 is fused to a deaminase, an enzyme that mutates specific DNA bases, eventually replacing cytidine with thymidine. A new base editor system (GhBE3), consisting of a cytidine deaminase domain fused with nCas9 and uracil glycosylase inhibitor (UGI), has been developed for use in allotetraploid cotton and gave high base-editing efficiency with no detectable off-target effects. All the above studies indicate that CRISPR/Cas9 and its alternatives are potential gene-editing tools that would be superior to RNAi for cotton functional genomics. In the future, this technology will have much scope for targeting tolerance to sucking pests, increased fiber yield and improved fiber quality traits in cotton.
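The PAM constraint mentioned above, and the way an alternative nuclease widens the set of editable sites, can be illustrated with a small scan for candidate target sites. The sketch below assumes the commonly cited motifs: a 20-nt protospacer followed by a 3' NGG PAM for SpCas9, and a 5' TTTV PAM followed by the protospacer for Cpf1 (Cas12a). The 23-nt Cpf1 guide length, the function names and the toy sequence are assumptions for illustration, and only the given strand is scanned; a real design tool would also scan the reverse complement and screen for off-targets across the genome.

```python
import re


def find_cas9_sites(seq, guide_len=20):
    """Return (protospacer_start, protospacer, pam) for 5'-N20-NGG-3' sites."""
    seq = seq.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % guide_len)
    return [(m.start(1), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


def find_cpf1_sites(seq, guide_len=23):
    """Return (protospacer_start, protospacer, pam) for 5'-TTTV-N23-3' sites."""
    seq = seq.upper()
    # V means A, C or G; the PAM lies upstream (5') of the protospacer.
    pattern = re.compile(r"(?=(TTT[ACG])([ACGT]{%d}))" % guide_len)
    return [(m.start(2), m.group(2), m.group(1)) for m in pattern.finditer(seq)]


# Hypothetical 30-nt target region, not a real cotton sequence.
toy_target = "TTTC" + "ATGC" * 5 + "AGG" + "CAT"

cas9_sites = find_cas9_sites(toy_target)
cpf1_sites = find_cpf1_sites(toy_target)

for label, sites in (("Cas9", cas9_sites), ("Cpf1", cpf1_sites)):
    for start, protospacer, pam in sites:
        print(f"{label}: start={start} protospacer={protospacer} PAM={pam}")
```

Because the two nucleases read different PAMs on opposite sides of the protospacer, a G-rich region may offer Cas9 sites where a T-rich region offers only Cpf1 sites, which is one sense in which the CRISPR/LbCpf1 system expands the scope of editing.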
Conventional cotton breeders around the world have made a significant impact on cotton productivity improvement and germplasm conservation, and their meticulous early research remains invaluable. The advent of newer technologies is an added advantage for young breeders, but a thorough knowledge of conventional breeding, and of the advantages and limitations of both conventional and advanced molecular breeding, has to be kept in mind. A small mistake made early in molecular breeding may exact a heavy price in the end. The wealth of data already developed, such as QTLs, transgenic events and gene-edited lines, can be cautiously used in crop improvement programs, and further research in advanced technologies like genomic selection, fine mapping and gene editing has to be a priority area for sustainable cotton production.
Conflict of interest
We thank sincerely: Dr. B. M. Khadi, former Director, ICAR- Central Institute for Cotton Research, Nagpur, (India).