The literature pertaining to cassava (rev. ref.) on agro-morphological characterization reviewed in this paper.
The identification of cassava cultivars is important for understanding the crop’s production system, enabling crop improvement practitioners to design and deliver tailored solutions with which farmers can secure high yields and sustainable production. Across the lowland tropics today, a large number of improved varieties and landraces of cassava are under cultivation, making it difficult for breeders and geneticists to set improvement goals for the crop. The identification and characterization of cassava genotypes is currently based on either morphological characters or molecular features. The major aim of cultivar identification is to catalog the crop’s genetic diversity, but a consensus approach has still not been established. Of the two approaches to variety identification, morphological characters seem to account for most of the genetic variability reported in cassava. However, these characters must be treated with caution, as phenotypic changes can be due to environmental and climatic conditions as well as to the segregation of new, highly heterozygous populations, thus making the accurate identification of varieties difficult. The use of molecular markers has allowed researchers to establish accurate relationships between genotypes, and to measure and track their heterozygous status. Since the early 1990s, molecular geneticists working with cassava have been developing and deploying DNA-based tools for the identification and characterization of landraces and improved varieties. In the last five years, economists and social scientists have also adopted DNA-based variety identification to measure the adoption rates of varieties, and to support the legal protection of breeders’ rights. Despite the advances made in the deployment of molecular markers for cassava, the adoption of multiple platforms, as well as their costs and variable throughput, has limited their use by cassava crop improvement practitioners.
The post-genomic era has produced a large number of genome and transcriptome sequencing tools, and has increased our capacity to develop and deploy genome-based tools to account for the crop’s genetic variability by accurately measuring and tracking allele diversity. These technologies allow the creation of haplotype catalogs that can be widely shared across the cassava crop improvement community. Low-density genome-wide SNP markers might be the solution for the wide adoption of molecular tools for the identification of cultivars or varieties of cassava. In this review, we survey the efforts made in the past 30 years to establish the tools for cultivar identification of cassava in farmers’ fields and gene banks. We also emphasize the need for a global picture of the genetic diversity of this crop at its center of origin in South America.
Today, a large number of the varieties of cassava which are under cultivation have persisted from pre-Columbian times, having been perpetuated through vegetative propagation, particularly at its center of origin in South America [4, 5]. From South America, this crop spread to sub-Saharan Africa in the 16th century, and to South and Southeast Asia (SEA) in the late 18th and early 19th centuries. Crop improvement, led by the International Institute of Tropical Agriculture (IITA) in Africa and the International Centre for Tropical Agriculture (CIAT) in Latin America and the Caribbean (LAC), as well as in South Asia and SEA, has made improved varieties more common in farmers’ fields [2, 8, 9]. For instance, CIAT and Kasetsart University in Thailand developed what is considered to be the most successful variety ever bred, KU50, which has a notably high fresh root yield and dry matter content [10, 11]. Since its official release in Thailand, this variety has spread throughout SEA. In Vietnam, KU50 was released in 1995 as KM94, and was later introduced in Cambodia as Malay [12, 13, 14]. It covers nearly one million hectares today. In cassava, it is quite common for the same variety to be renamed when it is introduced to a new area, leading to the existence of synonymous varieties. The opposite situation also occurs, where different varieties are identified under the same name (homonyms) [15, 16, 17].
There is currently little understanding of the number of cassava varieties grown throughout the lowland tropics, but this number is likely to be in the order of thousands, based on the results obtained by Rabbi et al. and Floro et al. This number can also be estimated from the total number of accessions (genotypes) of the crop kept under conservation in different ex-situ gene banks. In 2010, CIAT commissioned a survey of the status of germplasm conservation of cassava across 50 cassava gene banks. Of the 50 gene banks surveyed, 34 provided information that allowed the estimation that as many as 14,791 distinct landraces were under conservation in gene banks. The real number, however, is likely to be significantly lower once all varieties are characterized using DNA-based molecular markers [2, 4, 12].
In the past 30 years, a body of knowledge about the varietal identification and genetic diversity of cassava has been developed for genetic materials found in ex-situ collections, experimental field trials, and farmers’ fields, using morphological descriptors [17, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48], morphological descriptors and molecular markers [16, 49, 50, 51, 52, 53, 54, 55, 56, 57], and molecular markers alone [2, 4, 12, 15, 25, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126]. The morphological descriptors were first defined by CIAT in the late 1970s and early 1980s [47, 48], and were later revised by Fukuda et al. [21, 46]. Approximately 75 morphological descriptors, also known as traits, have been defined, and 199 alleles have been made available for distinguishing cassava varieties under ex-situ conservation or for cataloging the local varietal inventory of farmers. In more than half of these studies, a measure of genetic diversity was included [19, 23, 25, 29, 30, 33, 34, 35, 36, 37, 40, 41, 43]. Additional efforts to identify cassava varieties have combined morphological descriptors and molecular markers, under the assumption that combining the knowledge of farmers with DNA-based genetic profiles should more accurately account for the large genetic differentiation observed among cassava varieties in gene banks, breeding programs, and farmers’ fields [16, 49, 50, 51, 52, 53, 54, 55, 56, 57].
Across the scientific community investigating cassava, the most widely used methods for the identification of varieties, and for the estimation of the crop’s genetic diversity, have involved molecular markers. Since the advent of DNA-based molecular marker technologies, cassava scientists have adopted nearly all of the most popular techniques to elucidate and describe the crop’s varietal identities, diversity, domestication, and ancestry [2, 4, 64, 72, 73, 74, 78, 126]. These molecular approaches have focused on two primary objectives: (1) to access an adequate number of highly informative DNA-based molecular markers across the cultivated species; and (2) to assess the crop’s global ex-situ germplasm, and that of populations produced by publicly funded breeding programs. Thus, the use of molecular markers could allow the building of a global varietal haplotype catalog, containing the molecular descriptions of the most common varieties of cassava grown during the last 50 years across sub-Saharan Africa, South Asia and SEA, and LAC. This information will facilitate the development, registration, and release of varieties that will effectively replace old varieties with the latest modern cultivars.
Access to a global catalog of the crop’s molecular haplotypes will enable studies on the adoption of improved varieties. DNA-based marker technology must be cost-effective, easy to use, and reproducible across laboratories. The reproducibility of molecular marker techniques is extremely important in cassava, due to the presence of fixed somatic mutations, which are potentially caused by clonal propagation, although evidence for this phenomenon is limited [36, 68, 95, 128, 129].
A robust set of highly informative DNA-based markers could be used for variety identification, quality control, and the measurement of genetic diversity, with a potential use in variety registration. Thus, cassava breeders will be able to trace infringements of Plant Breeder’s Rights, particularly when the cassava variety is licensed for exclusive commercial use.
2. Morphological descriptors
The need to improve cassava varieties, to fight hunger, malnutrition, and poverty in the tropics, has led to the identification of the problem of discriminating between
Cassava has large phenotypic variance in the field, with a wide eco-geographic adaptation range, suggesting that there is a significant amount of genetic diversity available for breeding. Thus, the identification and differentiation of commercial and landrace cultivars is very important. Until recently, it has been necessary to rely on the morphological characteristics of the vegetative parts of cassava. Consequently, a range of vegetative descriptors has been used to distinguish cassava varieties from each other in Africa, Asia, LAC, SEA, and Oceania (Table 1). The resolution achieved by the morphological descriptors of Fukuda et al. can account for cassava’s genetic differentiation between accessions, facilitating the understanding of the crop’s genetic resources. Among the 28 morphology-based cassava varietal identification and genetic differentiation studies used in this review, the ranges of qualitative (6–44) and quantitative (0–28) morphological descriptors are significantly different (Table 1). Approximately one-third of these studies jointly evaluated qualitative and quantitative morphological descriptors, producing a noticeable increase in the number of genetic targets sampled, and thus improving the assessment of genetic diversity in both natural and segregating populations, allowing for the selection of contrasting parents for breeding.
| Region | Country | Location | No. of cassava accessions | Scoring schedule | No. of qualitative descriptors | No. of quantitative descriptors | No. of duplicates | Rev. Ref. |
|---|---|---|---|---|---|---|---|---|
| Africa | Côte d’Ivoire | Collection maintained at CNRA’s research station at Bouaké | 340 | 5–12 MAP | 14 | 0 | 35 | |
| Africa | Nigeria | Collection maintained at IITA, Ibadan, Nigeria | 1766 | NP | 32 | 8 | 0 | |
| Africa | Nigeria | Collection maintained at IITA, Ubiaja | 1890 | NP | 28 | 8 | 0 | |
| Africa | Benin | Field collected in 55 villages surveyed in the southern region | 125 | NP | 20 | 0 | 0 | |
| Africa | Côte d’Ivoire | Field collected across 26 villages in the Centre-west, South-west and West region | 159 | 5–12 MAP | 14 | 0 | 16 | |
| Africa | Chad | Field collected in Mandoul, Moyen Chari, Tandjilé, Logone Occidental and Oriental region | 59 | 3, 6, 9 & 12 MAP | 32 | 13 | 3 | |
| Africa | Cameroon | Field collected across the Humid Forest & Guinea Savannah Agroecologies | 89 | 3, 6, 9 & 12 MAP | 35 | 14 | 0 | |
| Africa | Côte d’Ivoire | Field collected in the forest zone of the Ivory Coast | 44 | NP | 20 | 4 | 0 | |
| Africa | Angola | Collection maintained at the Agronomic Investigation Institute | 40 | 12 MAP | 12 | 10 | 0 | |
| LAC | Brazil | Collection maintained at Embrapa Mandioca e Fruticultura, Cruz das Almas | 14 | 8 MAP | 10 | 4 | 0 | |
| LAC | Brazil | Collection maintained at Mandioca do Cerrado (BGMC) - Embrapa | 16 | 12 MAP | 33 | 0 | 0 | |
| LAC | Costa Rica | Collection maintained at Centro Agronómico Tropical de Investigación y Enseñanza (CATIE) | 37 | NP | 44 | 28 | 0 | |
| LAC | Brazil | Collection maintained at Embrapa Mandioca e Fruticultura, Cruz das Almas | 200 | NP | 19 | 16 | 0 | |
| LAC | Brazil | Collection maintained at Embrapa Mandioca e Fruticultura, Cruz das Almas | 95 | 11–12 MAP | 32 | 0 | 0 | |
| LAC | Brazil | Collection maintained at Embrapa Mandioca e Fruticultura, Cruz das Almas | 95 | 11–12 MAP | 35 | 13 | 0 | |
| LAC | Brazil | Regional Germplasm Bank of Eastern Amazon, situated in Belém, Pará, Brazil | 262 | 11–12 MAP | 21 | 0 | 0 | |
| LAC | Brazil | Collection maintained at Mato Grosso’s State University (UNEMAT - Cáceres) and Embrapa Agrossilvipastoril | 158 | 6–8 & 12 MAP | 29 | 9 | 0 | |
| LAC | Brazil | Field collected in the Brazilian Middle North Regions, Viçosa-MG | 10 | 8 MAP | 24 | 0 | 0 | |
| Asia | India | Western Ghats region of Tamil Nadu, covering 32 villages in the southern region of Western Ghats with altitude ranging from 250 to 2552 feet above MSL | 56 | NP | 6 | 2 | 0 | |
| SEA | Indonesia | Field collected in Java, Sumatera, Kalimantan, Sulawesi, Maluku, Nusa Tenggara Timur and Papua Islands | 181 | 12 MAP | 10 | 0 | 0 | |
| SEA | Vietnam | Collection maintained at the Root Crop Research and Development Center (RCRDC), and Field Crops Research Institute, located in Chuong My, Hanoi | 7 | 4–8 MAP | 20 | 0 | 0 | |
| Pacific Islands (Oceania) | Vanuatu | Collection maintained at Vanuatu Agricultural Research and Training Centre (VARTC) | 145 | 12 MAP | 12 | 0 | 4 | |
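As an illustration of how qualitative and quantitative descriptors can be evaluated jointly, the sketch below computes a Gower-type dissimilarity between two accessions. Gower's coefficient is one common choice for mixed data and is used here as an assumption; the studies summarized in Table 1 may have used other measures. The descriptor values and ranges are invented for the example.

```python
from typing import Sequence


def gower_distance(
    qual_a: Sequence[str],
    qual_b: Sequence[str],
    quant_a: Sequence[float],
    quant_b: Sequence[float],
    quant_ranges: Sequence[float],
) -> float:
    """Gower-type dissimilarity between two accessions (0 = identical).

    Qualitative descriptors contribute 0 (same state) or 1 (different state).
    Quantitative descriptors contribute |a - b| divided by the descriptor's
    range across the whole collection, so every term lies between 0 and 1.
    """
    total = 0.0
    n_terms = 0
    for state_a, state_b in zip(qual_a, qual_b):
        total += 0.0 if state_a == state_b else 1.0
        n_terms += 1
    for value_a, value_b, value_range in zip(quant_a, quant_b, quant_ranges):
        total += abs(value_a - value_b) / value_range if value_range > 0 else 0.0
        n_terms += 1
    if n_terms == 0:
        raise ValueError("at least one descriptor is required")
    return total / n_terms


# Hypothetical accessions: three qualitative descriptors (for example leaf
# colour, plant shape, root pulp colour) and two quantitative descriptors
# (for example petiole length in cm, plant height in cm).
d = gower_distance(
    ["green", "erect", "white"],
    ["green", "umbrella", "cream"],
    [12.0, 250.0],
    [9.0, 200.0],
    quant_ranges=[10.0, 100.0],
)
print(round(d, 2))
```

Because each descriptor contributes a value between 0 and 1, adding quantitative descriptors to a qualitative set increases the number of traits sampled without letting any single trait dominate the distance.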
Although cassava displays strikingly high levels of heterozygosity, clonal propagation has permitted the spread of a small set of superior clones, increasing their frequency of occurrence across different regions. This set of clones is grown under a large number of different names. A single genotype cultivated in a given geographical region might be found under different names, resulting in the unintentional presence of duplicated genotypes in any one collection. Variety identification based on morphological descriptors in cassava has not revealed the presence of these duplicated entries in ex-situ collections or under cultivation (Table 1), although 20 to 25% genotypic redundancy is expected. This review covers a total of 4,285 cassava accessions from Africa, Asia, LAC, SEA, and the Pacific Islands, but the number of duplicate cassava accessions reported is extremely low (1.4%) (Table 1). This result might be explained by the high morphological variability reported in cassava due to changes in soil, climatic, and biotic factors, which makes it difficult to precisely describe the morphological characteristics of this crop. The inability to identify genetic duplicates in a germplasm collection has profound implications for cost-effective germplasm conservation, as well as for germplasm use by breeding programs. Thus, the accurate and reliable identification and elimination of duplicates within a germplasm collection will facilitate genetic resource management and use, while reducing maintenance costs.
These studies have revealed an important heterogeneity within cassava cultivars, particularly those held by farmers [31, 33, 37, 39, 43]. The use of morphological descriptors in the early characterization and identification of cassava varieties is useful for identifying new genetic variability, but it can be a lengthy process, taking more than a year to obtain and analyze this type of data. The number of potentially uncharacterized varieties still used in traditional farming is estimated to be as high as 15,000. Thus, it is likely that the available number of morphological descriptors is inadequate to account for the crop’s large genetic variability, as well as the number of cassava cultigens which are affected by environmental factors that influence their phenotypes. This situation highlights the need to develop a method to measure the crop’s genetic variability that reduces or eliminates the need to use morphological descriptors. Molecular markers, due to their nature, could provide an immense advantage in the identification of varieties and the characterization of genetic variability, by providing more detailed information about polymorphisms, independent of the physiological status of the plant or the environmental conditions in which it grows.
3. Molecular markers deployed in cassava
Since the mid-1980s, molecular markers have been used in cassava for a large number of genetic diversity and variety identification studies (Table 2). The scientific community working on cassava is therefore well acquainted with the development and use of these markers [64, 71, 72, 73, 74, 78, 81, 126, 127, 131, 135, 136, 137, 138, 139, 140]. The first attempt to use molecular markers for variety identification in crops was undertaken at CIAT by Hussain et al., Ramirez et al., and Ocampo et al., using isozymes. In 1992, Ocampo et al., using αβ-esterase isozymes, analyzed 86% of the global cassava collection of 4,034
| Region | Location | Source | No. of cassava samples | Morphological Descriptors | Isozymes | RFLPs | RAPDs | SSRs | ISSRs | SRAPs | ISTR | AFLPs | DArTs | SNPs | No. of Duplicates | Rev. Ref. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
In 1995, Ocampo et al. implemented a DNA fingerprinting method for genetic analysis based on restriction fragment length polymorphisms (RFLPs). RFLPs allowed Ocampo et al. to estimate the number of duplicates in the CIAT collection: of the 5,500 genotypes, approximately 1,000 could be duplicates, indicating an approximate 18% redundancy in the global germplasm collection (Table 2). The RFLP marker system is an attractive approach because RFLPs are inherited in a co-dominant mode, allowing homozygotes to be distinguished from heterozygotes, and are locus-specific and highly informative, targeting specific sites on the genome due to restriction-site specificity. However, RFLPs are laborious and costly to use, and can only resolve mutations at the enzyme cut site, limiting their use in phylogenetic reconstruction. Nevertheless, these efforts demonstrate that variety identification can be achieved using molecular genetic tools, and used for germplasm management, including quality control of experimental lines across breeding programs.
The polymerase chain reaction (PCR) technique, published in 1986 by Mullis et al., allowed cassava scientists at CIAT to investigate genetic differences using minute amounts of DNA, coupled with random primer amplification to produce random amplified polymorphic DNA (RAPD) [70, 83, 86, 126, 142, 143]. RFLPs were therefore superseded by PCR-based markers. Since then, other PCR-based molecular marker tools have been adapted and deployed, such as amplified fragment length polymorphisms (AFLPs) [72, 73], inter-simple sequence repeats (ISSRs) [56, 115, 117], simple sequence repeats (SSRs) [71, 73, 74, 78], sequence-related amplified polymorphisms (SRAPs), inter-sequence tagged repeats (ISTRs), and diversity arrays technology (DArT). Over the past 30 years, SSRs have been the molecular marker approach most widely used in cassava, both for variety identification and to estimate the genetic diversity of the crop (Table 2). Chavarriaga-Aguirre et al. used this approach to search for duplicates in CIAT’s core collection, but reported a lower frequency of duplicates than was reported by Ocampo and co-workers.
The release of the cassava reference genome by Prochnik et al. allowed cassava geneticists in Africa and LAC to identify tens of thousands of genome-wide sequence variations across multiple landraces and improved cultigens [2, 145, 146]. These genomic variations were unraveled by re-sequencing using restriction-site associated DNA-sequencing (RAD-seq) or genotyping by sequencing (GBS). These two methods can detect small genetic differences between individuals, and therefore may be useful for studying organisms with reduced genetic variation, such as clonal lineages like cassava or highly inbred organisms like maize. However, one has to ask whether the large number of SNPs resolved with or without prior knowledge of the genome are more reliable than SSRs or SNP arrays built from expressed sequence tag databases with a high frequency of heterozygous loci in the population. Two SNP arrays have been built for cassava: the Illumina GoldenGate 1,190-SNP assay by Ferguson et al., and the Fluidigm® Dynamic 96 SNP Array™ SNPY-Chip by Becerra Lopez-Lavalle and co-workers at CIAT [4, 12, 105].
4. Current status of PCR-based DNA analysis for variety identification
PCR-based DNA molecular markers have been used to assess the genetic diversity of cassava, and to establish the relationships among genotypes (Table 2). In 1994, Marmey et al. showed the value of RAPDs for analyzing the crop’s genetic diversity, as well as for detecting duplicated accessions (10%) among collections. In 1996, Angel et al. showed that RAPDs give comparable results to RFLPs, offering a cost- and time-effective alternative to restriction and hybridization DNA analysis. Another powerful PCR-based molecular marker tool used in cassava is the AFLP method described by Vos et al., in which selected restriction fragments from the digestion of total DNA are reduced in complexity by PCR and resolved to within a 1 to 2 bp difference. Roa et al. concluded that AFLPs were an effective and efficient molecular methodology with which to estimate genetic similarities within cassava, and among other Manihot species.
Of the 77 studies listed in Table 2, 13% used RAPDs, and 12% used AFLPs, including studies incorporating morphological descriptors [50, 51, 55, 64, 68, 72, 73, 111]. Both molecular marker methods have been shown to be powerful and able to provide genetic data that reflect the observed phenotypic differences, geographic origins, and pedigree background of the plants. The AFLP fingerprinting technique detected a larger number of duplicates in the African and LAC cassava landraces than RAPDs, suggesting that AFLPs are suitable for estimating genetic similarity and dissimilarity. The proportion of duplicates identified across these studies ranged from 4 to 35%. AFLP data indicated that cassava varieties can become widespread and adopted by farmers under different names, leading germplasm curators to consider them to be different varieties.
SSRs were used in 47% of the studies reviewed here (Table 2), and their use has been favored over that of RAPDs or AFLPs in cassava. SSR markers are abundant and evenly distributed across the cassava genome, are co-dominant and highly polymorphic, and are not influenced by the environment. Compared with AFLPs, SSRs are less technically challenging to implement. SSR data can easily be shared across different laboratories, particularly if fingerprinting data are generated with fluorescently labeled SSR markers and resolved in capillary DNA-sequencing instruments. Overall, the authors consulted for this review agreed that SSR profiles generated for improved and landrace genomes were extremely useful in the conservation of diversity in Africa, Asia, LAC, and SEA, as well as for guiding the best crop improvement strategy. Studies involving the development of molecular tools to accelerate the introgression of disease resistance, such as resistance to cassava mosaic disease (CMD), have been extremely successful in identifying the SSRs that will best guide this effort. CMD resistance has been efficiently introgressed into LAC’s breeding lines, and successfully transferred to Africa [150, 151, 152].
Over the last decade, we have witnessed an important shift in the cassava research community in Africa and LAC, led by IITA and CIAT, toward sequence-based nucleotide variation mining. In sub-Saharan Africa, Ferguson et al. characterized and validated 1,190 SNPs using version 4.1 of the cassava genome and Illumina’s GoldenGate assay. They demonstrated that SNP markers could successfully measure the genetic variability of cassava, while accurately detecting duplicates in IITA’s gene bank collection. The SNP data of Ferguson et al. allowed the comparison of the genetic diversity between cassava varieties from the Americas and Africa, and showed that cassava from the Americas displayed greater genetic diversity than their counterparts in Africa. These researchers also showed that the levels of genetic diversity in west, southern, eastern, and central Africa were similar. These two observations suggested massive adoption of the improved varieties developed by IITA for African farmers.
In 2015, Rabbi et al. undertook a large varietal identification survey on 917 accessions using 56,489 SNP loci generated by next-generation sequencing, compared against 64 released cassava varieties and popular landraces in Ghana. Rabbi et al. accomplished variety identification and ancestry estimation through two complementary cluster methods: distance-based hierarchical clustering, and model-based maximum likelihood admixture analysis. They found that 30% of the identified accessions from farmers’ fields matched specific released varieties. A hierarchical clustering analysis revealed that the number of major varieties was 11, and 69% of the accessions belonged to one of the 11 groups, while the remaining accessions had two or more ancestries. Rabbi et al. demonstrated that reduced subsets of SNP markers could reproduce the results obtained from the full set of markers, concluding that GBS can be performed at higher DNA multiplexing. However, these results, as well as those of Ferguson et al., indicated that a large number of SNPs may not be needed to achieve accurate identification of cassava varieties, whether in farmers’ fields or in formal germplasm collections.
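The idea of matching field accessions against a library of released varieties can be illustrated with a much simpler procedure than the hierarchical clustering and admixture analyses described above. The sketch below is not the method of Rabbi et al.; it assumes biallelic SNP calls coded as 0, 1, or 2 copies of the alternate allele (`None` for a missing call), a pairwise identity-by-state distance, and a hypothetical match threshold of 0.05. All sample names and genotypes are invented.

```python
from typing import Dict, List, Optional, Tuple

Genotype = List[Optional[int]]  # 0/1/2 alternate-allele counts, None = missing


def ibs_distance(a: Genotype, b: Genotype) -> float:
    """Mean allele-sharing distance over loci called in both samples (0 to 1)."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        raise ValueError("no loci called in both samples")
    return sum(abs(x - y) for x, y in pairs) / (2 * len(pairs))


def identify(
    sample: Genotype,
    references: Dict[str, Genotype],
    threshold: float = 0.05,  # hypothetical cut-off; tune on replicate data
) -> Tuple[Optional[str], float]:
    """Return (closest reference name or None, distance to closest reference)."""
    best_name: Optional[str] = None
    best_dist = float("inf")
    for name, profile in references.items():
        dist = ibs_distance(sample, profile)
        if dist < best_dist:
            best_name, best_dist = name, dist
    if best_dist <= threshold:
        return best_name, best_dist
    return None, best_dist  # no released variety is close enough


# Invented reference library and field samples.
references = {
    "ref_A": [0, 1, 2, 1, 0, 2, 1, 0],
    "ref_B": [2, 1, 0, 0, 1, 0, 2, 2],
}
field_sample = [0, 1, 2, 1, None, 2, 1, 0]  # one missing call
unknown_sample = [1, 1, 1, 1, 1, 1, 1, 1]

print(identify(field_sample, references))
print(identify(unknown_sample, references))
```

A nonzero threshold leaves room for genotyping error between clonal replicates; samples that match no reference are candidates for landraces or admixed material and would need the cluster-based analyses described in the text.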
Concurrently, CIAT and the Beijing Genome Institute (BGI) committed to developing genomic resources in the post-genomic era, with the aim of increasing scientists’ understanding of the evolution and distribution of cassava from its origin in the Americas to Africa and Asia. Next-generation sequence information from both wild and domesticated species offered cassava researchers the opportunity to investigate individual genes which have played a role in the domestication of cassava. Whole genome sequences allow researchers to exploit genomic variations associated with resistance to pests such as whiteflies or mites, and diseases such as frog skin disease and cassava brown streak disease, as well as to improve the nutritional value of the crop, such as by increasing the pro-vitamin A content. In 2013, CIAT’s geneticists and bioinformaticians explored the genetic variation present in 150 LAC accessions, and identified a panel of 180 highly informative single nucleotide variants (SNVs, MAF > 0.25), with high discriminative power and a uniform genome distribution of 5 to 10 SNPs per chromosome. These SNVs were transferred to a SNPtype™ allele-specific PCR assay and validated on the same set of samples (Fluidigm® Dynamic Array™, USA) (Becerra Lopez-Lavalle, personal communication).
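The minor allele frequency (MAF) criterion mentioned above can be sketched as follows. This is a minimal illustration, assuming biallelic SNP calls coded as 0, 1, or 2 copies of the alternate allele with `None` for missing calls; the SNP names and genotypes are invented, and the sketch covers only the MAF > 0.25 filter, not the per-chromosome distribution or discriminative-power criteria used to assemble the actual panel.

```python
from typing import Dict, List, Optional


def minor_allele_frequency(calls: List[Optional[int]]) -> float:
    """MAF of one biallelic SNP from 0/1/2 alternate-allele counts."""
    called = [c for c in calls if c is not None]
    if not called:
        return 0.0  # no information, treat as uninformative
    alt_freq = sum(called) / (2 * len(called))  # diploid: two alleles per sample
    return min(alt_freq, 1.0 - alt_freq)


def select_informative(
    snp_calls: Dict[str, List[Optional[int]]],
    maf_threshold: float = 0.25,  # threshold quoted in the text
) -> List[str]:
    """Names of SNPs whose MAF exceeds the threshold, in input order."""
    return [
        snp
        for snp, calls in snp_calls.items()
        if minor_allele_frequency(calls) > maf_threshold
    ]


# Invented calls for four SNPs across six accessions.
snps = {
    "snp_1": [0, 1, 2, 1, 1, 0],     # alt freq 5/12, kept
    "snp_2": [0, 0, 0, 1, 0, 0],     # alt freq 1/12, dropped
    "snp_3": [2, 2, 1, 2, None, 2],  # alt freq 9/10, MAF 0.1, dropped
    "snp_4": [0, 2, 1, 1, 2, 0],     # alt freq 6/12, kept
}
print(select_informative(snps))
```

Markers with a high MAF are the ones most likely to differ between any two unrelated clones, which is why a small panel of such markers can discriminate a large number of varieties.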
Of the 180 SNVs identified by CIAT, a 96-SNP Fluidigm® Dynamic Array™ (referred to as an “SNPY-CHIP”) was first assembled and used by Peña-Venegas et al., who aimed to validate the identity of 173 Amazonian cassava landraces classified as unique by indigenous growers. The cassava SNPY-CHIP allowed the classification of 44 genotypes into 21 duplicate-genotype clusters, confirming the uniqueness of 150 (87%) of the 173 materials identified as unique by indigenous people of the Colombian Amazon. The SNPY-CHIP array also allowed the exploration of the diversity and population structure of these materials. When the 150 unique genotypes characterized in this study were compared with genotypes from the CIAT core collection, the cassava genotypes from the Tikuna community of San Martín de Amacayacu (AMA) appeared to be closely related to Peruvian manioc genotypes (PER). CIAT scientists demonstrated that these SNP markers have a very low genotyping error rate, and are easy to store and share in genotype databases. The information generated from the 99 accessions evaluated along with the 150 from the Peña-Venegas et al. study allows us to assess the value of each SNP with a high MAF, indicating a genotyping success. The 99 cassava genotypes represent a good sampling of the global cassava germplasm collection. Of the 99 genotypes, 71 were from the Americas: five from Argentina, two from Bolivia, four from Brazil, 26 from Colombia, three from Costa Rica, three from Cuba, three from Ecuador, five from Guatemala, five from Mexico, two from Panama, two from Paraguay, five from Peru, one from Puerto Rico, and five from Venezuela; nine were from Asia: two from China, three from Thailand, two from Indonesia, and two from Malaysia; three were hybrids from ICA-CIAT (Colombia); three were genotypes from Africa (TMS60444, C18 and TME3); and 13 were samples of unknown origin (AM206-5, AM560-2, FLA 21, FLA61, FLA 19, GLA8, GM905-52, GM905-57, GM905-60, SM301-3, SG107-35, GUT64, and JAC3).
These 99 genotypes exhibit good phenotypic differentiation and are likely to be of ancient origin in the Americas. Of the 272 genotypes analyzed, 249 (91%) were unique genotypes, showing the effectiveness of these SNPs for varietal identification and the identification of duplicates (9%). The SNP results unequivocally identified all accessions, including those nominated as morphological duplicates.
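The grouping of accessions into duplicate-genotype clusters can be sketched as below. This is an illustrative procedure, not the published analysis: it assumes SNP calls coded 0/1/2 (`None` for missing), treats two accessions as duplicates when they differ at no more than a hypothetical `max_mismatches` loci, and merges duplicates transitively with a union-find structure. Accession names and genotypes are invented.

```python
from typing import Dict, List, Optional

Genotype = List[Optional[int]]


def mismatches(a: Genotype, b: Genotype) -> int:
    """Number of loci called in both samples where the genotypes differ."""
    return sum(
        1 for x, y in zip(a, b) if x is not None and y is not None and x != y
    )


def duplicate_clusters(
    profiles: Dict[str, Genotype],
    max_mismatches: int = 0,  # hypothetical tolerance for genotyping error
) -> List[List[str]]:
    """Group accessions whose SNP profiles are (near-)identical."""
    names = list(profiles)
    parent = {name: name for name in names}

    def find(name: str) -> str:
        # Follow parent links to the cluster root, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if mismatches(profiles[a], profiles[b]) <= max_mismatches:
                parent[find(a)] = find(b)  # merge the two clusters

    clusters: Dict[str, List[str]] = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return list(clusters.values())


# Invented profiles: acc_1, acc_2 and acc_4 are the same clone.
profiles = {
    "acc_1": [0, 1, 2, 1],
    "acc_2": [0, 1, 2, 1],
    "acc_3": [2, 1, 0, 0],
    "acc_4": [0, 1, 2, 1],
    "acc_5": [1, 1, 1, 2],
}
clusters = duplicate_clusters(profiles)
n_unique = len(clusters)
print(n_unique, sorted(sorted(c) for c in clusters))
```

The number of unique genotypes equals the number of samples, minus the samples that fall in duplicate clusters, plus the number of those clusters. With the figures reported for the Amazonian landraces, 173 − 44 + 21 = 150, which agrees with the 150 unique materials stated in the text.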
The accurate identification of cassava varieties in the Colombian Amazon using the DNA-based SNPY-CHIP provided the opportunity to undertake large variety adoption studies using SNP-based DNA fingerprinting. This work established the methodological basis for a multidisciplinary approach and for synergy of efforts between agricultural scientists and economists. Floro et al. estimated the level and determinants of adoption of improved varieties in the Cauca department of Colombia, using the SNPY-CHIP. They collected cassava samples from each variety identified by cassava growers, and interviewed 217 households in Cauca, Colombia. Four hundred and thirty-six cassava samples were collected, and DNA fingerprinting was undertaken using the SNPY-CHIP. The genetic analysis allowed the identification of duplicated genetic material, as well as the improved hybrids developed by CIAT, thus reducing the 117 varieties named by farmers to 60 true genetic types found in CIAT’s germplasm collection or its global cassava breeding program (Figure 1). A further set of 60 unique genotypes was identified that is missing from CIAT’s germplasm collection (Figure 1). DNA fingerprinting was therefore shown to be important in the procurement of new germplasm to introduce into breeding programs or to furnish publicly funded gene banks with the most diverse and complete set of accessions. The cassava genetics research team at CIAT reorganized the 436 stem samples collected, and planted them back in the Cauca region in the Morales Municipality, to assess their morphological features. The morphology displayed by each of the 120 varietal plots confirmed the results obtained by SNP fingerprinting (Figure 2).
An ambitious variety adoption study using DNA fingerprinting (SNPY-CHIP) and socioeconomic approaches was undertaken by CIAT scientists in Vietnam. The cassava germplasm found in Vietnam has very limited morphological description and molecular information, limiting its use for breeding. However, farming communities in Vietnam have maintained traditional knowledge about this genetic diversity through the vernacular names given to varieties. Depending on the context, however, informal naming of varieties can lead to either overestimates or underestimates of crop diversity. Ocampo et al. studied the varietal composition found in Vietnamese cassava production regions using SNP markers. They procured 97 different varieties based on farmer identification, from a total of 1,570 cassava genotypes collected across six agro-ecological zones. Vietnamese farmers distinguished the different varieties mainly by the morphology of the vegetative parts, giving names such as Bamboo Leaf, Long Leaf, Purple Bud, Red Bud, and Red Branch. CIAT’s SNPY-CHIP allowed for the characterization of 85 distinct genetic groups out of the 1,570 genotypes collected, and indicated a 12.4% overestimation of varietal differences based on vernacular names given by local farmers. When compared against CIAT’s global germplasm reference set, the allele diversity contained in the 85 genetically distinct varieties represents a rich and diverse collection. A set of ten major varieties grown across Vietnam (KM94, KM419, BRA1305, KM101, KM140, PER262, KM60, KM57, and two unidentified varieties with a high frequency) accounted for 82% of the frequency distribution, of which KM94 (KU50) and KM419 represented 48% of the genotypes investigated.
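The comparison between vernacular names and DNA-based genetic groups can be sketched as follows. The function below flags synonyms (one genotype grown under several names) and homonyms (one name applied to several genotypes) from pairs of farmer-given name and genotype cluster. The records are invented, reusing the example names from the text with made-up genotype labels. The overestimation formula is an assumption: the reported 12.4% is consistent with (97 − 85) / 97, but the source does not state how it was calculated.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def naming_conflicts(
    records: Iterable[Tuple[str, str]],
) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    """Return (synonyms, homonyms) from (farmer_name, genotype_id) records.

    synonyms: genotype_id -> names, for genotypes grown under several names
    homonyms: name -> genotype_ids, for names covering several genotypes
    """
    names_by_genotype = defaultdict(set)
    genotypes_by_name = defaultdict(set)
    for name, genotype in records:
        names_by_genotype[genotype].add(name)
        genotypes_by_name[name].add(genotype)
    synonyms = {g: sorted(n) for g, n in names_by_genotype.items() if len(n) > 1}
    homonyms = {n: sorted(g) for n, g in genotypes_by_name.items() if len(g) > 1}
    return synonyms, homonyms


def name_overestimate(n_names: int, n_genotypes: int) -> float:
    """Share of farmer-given names in excess of distinct genotypes (assumed formula)."""
    return (n_names - n_genotypes) / n_names


# Invented survey records: (vernacular name, genotype cluster from fingerprinting).
records = [
    ("Red Bud", "G1"),
    ("Purple Bud", "G1"),   # same clone under a second name: synonym
    ("Long Leaf", "G2"),
    ("Long Leaf", "G3"),    # one name, two clones: homonym
    ("Bamboo Leaf", "G4"),
]
synonyms, homonyms = naming_conflicts(records)
print(synonyms)
print(homonyms)
print(round(100 * name_overestimate(97, 85), 1))
```

Synonyms inflate diversity estimates based on names, whereas homonyms hide diversity, which is why name-based inventories can err in either direction.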
5. Conclusions: challenges and future perspective for the varietal and cultivar identification of cassava
This review has highlighted the potential of SNP-based variety identification in cassava as a means to assess the rate of variety adoption, to acquire novel genetic resources, and to control the quality of breeding products. Further progress toward a full characterization of varieties across all cassava-growing regions, using SNP-based approaches, can be anticipated. Among the 21 morphological and 77 molecular-based variety identification studies used in this review (Tables 1 and 2), those based on morphological descriptors are lengthy, time-consuming, labor-intensive, and space-demanding. Moreover, as the number of varieties to be evaluated increases, the limited number of morphological descriptors available becomes a constraint on the identification of new genotypes.
The basic principle of molecular marker technologies is the detection of polymorphisms in protein or deoxyribonucleic acid (DNA) information. For cassava, isozymes, RFLPs, RAPDs, SSRs, ISSRs, SRAPs, ISTRs, AFLPs, DArTs, and SNPs have been successfully used for detecting genetic variation in the crop (Table 2). Of these markers, SSRs are by far the most popular molecular method used by cassava scientists to differentiate among varieties and to measure the crop’s genetic diversity. Nearly 17% of the 4,950 materials that underwent varietal identification were fingerprinted using SSR markers. However, SSR-based fingerprinting data have limited use outside the discrete experimental units evaluated in this review, which limits the opportunity to consolidate SSR genotyping information into a general database that could enable a global variety identification system.
Unlike SSRs, SNP alleles have been recommended for the construction of shared DNA fingerprinting databases. CIAT’s newly designed SNPY-CHIP has been used to genetically characterize approximately 2,100 cassava genotypes, collected from both farmers’ fields and ex situ collections (Table 2). Its set of 96 SNPs is well distributed throughout the cassava genome. These SNP markers have proven to be stable and repeatable, and have a high power of discrimination. The SNPY-CHIP alleles, initially deployed on the Fluidigm® Dynamic Array™ technology (San Francisco, CA, USA), should be transferable across platforms, allowing direct global data analysis together with SNP information coming from next-generation sequencing performed by other laboratories or research groups. CIAT, through CGIAR, has a collaborative agreement with Intertek-AgriTech (https://www.intertek.com/agriculture/agritech/) to access genotyping services, thus ensuring high-quality, cost-effective data production. The emphasis on high-quality breeding products stresses the need for quality control at all levels of the variety development pipeline, ensuring traceability and preservation of identity. This is the first step toward building a robust identification platform for the global conservation and use of cassava, as well as toward standardizing the administration and management of plant varieties. Considering the effectiveness of the 96 SNPY-CHIP markers, in 2021 we transferred them to the Intertek genotyping platform with support from Excellence in Breeding (EiB, https://excellenceinbreeding.org/toolbox/collection/236), and 93 markers passed the validation stage with 345 diverse accessions from the genebank and breeding progenitors. These markers are publicly available in the EiB low-density genotyping platform for quality control and variety identification by the cassava community.
A database of more than 2,100 accessions genotyped using these 96 SNP markers has been developed and is maintained by the cassava program at CIAT; it will enhance variety identification and genetic diversity analysis for the global cassava community.
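How a shared reference database of this kind might be queried for variety identification can be sketched as follows. This is an illustrative example only: the accession names, the six-marker profiles, the minimum call rate, and the identity threshold are hypothetical and do not describe CIAT’s database or the EiB platform.

```python
def match_variety(query, reference, min_identity=0.95, min_calls=0.8):
    """Return (accession, identity) for the best match, or (None, identity) if none qualifies.

    query     -- tuple of SNP calls for one field sample; None marks a missing call
    reference -- dict mapping accession name to a tuple of SNP calls
    Both thresholds are assumptions chosen for illustration.
    """
    best_name, best_score = None, 0.0
    for name, profile in reference.items():
        # Compare only markers called in both the query and the reference profile.
        pairs = [(q, r) for q, r in zip(query, profile)
                 if q is not None and r is not None]
        if len(pairs) < min_calls * len(query):
            continue  # too few shared calls for a reliable comparison
        score = sum(q == r for q, r in pairs) / len(pairs)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < min_identity:
        return None, best_score
    return best_name, best_score

# Hypothetical reference profiles (six markers instead of 96, for brevity).
reference = {
    "ACC_A": ("AA", "AG", "CC", "TT", "GG", "CT"),
    "ACC_B": ("AG", "GG", "CT", "TT", "AG", "CC"),
}

known_sample = ("AA", "AG", None, "TT", "GG", "CT")    # one missing call
novel_sample = ("GG", "AA", "TT", "CC", "AA", "TT")    # matches no reference

print(match_variety(known_sample, reference))
print(match_variety(novel_sample, reference))
```

A sample that returns no accession is a candidate for a genotype not yet represented in the reference set, which is how fingerprinting can flag material worth adding to a collection.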
As growing emphasis is placed on quality at all levels, and on traceability and the preservation of the identity of varieties, accurate identification of the cassava varieties grown by farmers will improve the crop’s management and production, and will facilitate tracking and replacing specific varieties. Breeders can replace varieties susceptible to pests and diseases with more tolerant or resistant varieties. Knowledge of the distribution of susceptible varieties will help policy makers to target breeding toward the development of resistant or tolerant varieties for full varietal replacement and seed system development.
The author acknowledges the support of Ricardo Labarta, Tatiana Ovalle and Janneth Gutierrez in helping to demonstrate the use of molecular markers to generate the estimates of cassava adoption in farmers’ fields that are necessary for impact assessment studies. Without them, the work considered in this paper would not have been possible. The author also appreciates the comments and suggestions of the anonymous reviewers. Funding was provided by the CGIAR Research Program on Roots, Tubers and Bananas (RTB) (Grant No: 02-2019-RTB II-CIAT) and USAID (Grant: MTO No. 069033).
Conflict of interest
The authors declare no conflict of interest.