Abstract
Coffea arabica L. produces a high-quality beverage with pleasant aroma and flavor, but diseases, pests and abiotic stresses often reduce its yield. Therefore, improving important agronomic traits of this commercial species remains a target for most coffee improvement programs. With advances in genomic and sequencing technology, it is feasible to understand the coffee genome and the molecular inheritance underlying coffee traits, thereby helping improve the efficiency of breeding programs. Thanks to the rapid development of genomic resources and the publication of the C. canephora reference genome, third-generation markers based on single-nucleotide polymorphisms (SNPs) have gradually been identified and assayed in Coffea, particularly in C. arabica. However, high-throughput genotyping assays are still needed to rapidly characterize coffee genetic diversity and to evaluate the introgression of different cultivars in a cost-effective way. The DArTseq™ platform, developed by Diversity Arrays Technology, is one such approach and has attracted increasing interest worldwide because it can generate thousands of high-quality SNPs in a timely and cost-effective manner. These validated SNP markers will be useful for molecular genetics and for innovative approaches in coffee breeding.
Keywords
- Coffea spp.
- high throughput genotyping
- molecular markers
- plant breeding
- DArTseq
1. Introduction
Coffee is an important crop and the second most traded commodity in the world (after petroleum) providing a living to more than 125 million people. Commercial coffee production is controlled by only two species belonging to the
The development of new genomic tools can help us explore, more deeply and more precisely, genomic diversity at the intra- and inter-specific levels [4]. Two examples of high-throughput platforms are next-generation sequencing (NGS) [5] and DNA microarrays [6]. Compared to whole-genome sequencing, an SNP array approach provides a time-effective, low-cost and more straightforward genotyping technology for germplasm screening [7, 8].
Thanks to the rapid development of genomic resources and the publication of the reference genome [9], third-generation markers based on single-nucleotide polymorphisms (SNPs) have gradually been identified and assayed in
2. Genetic diversity of Coffea arabica L.
The
The assessment of the population structure and genetic relationships of these ET accessions, among themselves and in relation to traditional cultivars, is fundamental for the efficient use of their genetic diversity in Arabica coffee breeding programs [25]. However, selecting genetically diverse parental lines on the basis of morphological and agronomic traits is often difficult because of a high degree of morphological similarity [26].
During the past 30 years, molecular markers have been increasingly used in germplasm diversity assessment of various crops [27, 28]. Molecular information provides insight into the genetic structure of individual genotypes and eventually helps in the accurate selection of superior genotypes to maximize selection gains [29].
3. C. arabica diversity assessment by molecular markers
Several studies have assessed Arabica genetic diversity, with differing results. In general, across different types of material (cultivars, accessions, hybrids, and spontaneous genotypes), practically all studies using different marker systems show very low genetic variation [3]. Arabica’s genetic diversity has been evaluated with a range of molecular markers, such as Random Amplified Polymorphic DNA (RAPD) [30, 31], Inter Simple Sequence Repeat (ISSR) [32], Simple Sequence Repeat (SSR) [23, 29, 33, 34], and SSR combined with Amplified Fragment Length Polymorphism (AFLP) [35, 36].
In a recent study presented in the World Coffee Research annual report a genetic diversity assessment of 800 Arabica’s accessions from the collection at CATIE, Costa Rica, shows the least genetic diversity of
Of course, all
In contrast, ET germplasm may be a good source of sensory quality traits in the cup. The cup quality profile of the new Arabica F1 hybrids developed for Central America is said to derive largely from one of the two progenitors, a selected ET accession from the FAO-1964 pool [42]. Silvarolla et al. found three nearly caffeine-free coffee plants among the offspring of ET germplasm [43]. Male sterility, a character useful for F1-hybrid seed production, has been detected in a few ET accessions [44].
4. Next generation sequencing techniques in C. arabica
NGS encompasses technologies that produce millions of short DNA sequences at low cost and in a short time. The platforms most commonly used for high-throughput genomic research, especially in non-model plant species, include the second-generation sequencing techniques (SGseqTs) Illumina/Solexa, 454/Roche, ABI/SOLiD, and Helicos, with reads mostly between 25 and 700 bp in length [45]. Results from such research indicate that NGS techniques (NGSTs) should not be restricted to the genomes of model organisms, because non-model plants have also provided useful resources for genomic studies [45].
In contrast to classical molecular markers, SNPs are the most abundant markers, particularly in the non-coding regions of the genome [46]. NGS combined with complexity reduction methods, such as genotyping by sequencing (GBS) and DArTseq™ (sequencing-based Diversity Arrays Technology), enables large-scale discovery of SNPs in a wide variety of non-model organisms [47, 48, 49]. These techniques provide measures of genetic divergence and diversity within the major genetic clusters that comprise crop germplasm [50].
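As an illustration of the kind of per-marker diversity measures that SNP data make possible, the following minimal Python sketch computes minor allele frequency, expected heterozygosity and call rate from a genotype matrix. The function name `snp_diversity`, the 0/1/2 diploid-style allele coding and the toy matrix are assumptions made for this example; they are not taken from any of the cited studies.

```python
import numpy as np

def snp_diversity(genotypes):
    """Per-SNP summary statistics.

    genotypes: 2-D array (accessions x SNPs), coded as the number of copies
    of the alternative allele (0, 1 or 2); missing calls are np.nan.
    The diploid-style coding is an assumption of this sketch.
    """
    g = np.asarray(genotypes, dtype=float)
    # Alternative-allele frequency per SNP, ignoring missing calls
    p = np.nanmean(g, axis=0) / 2.0
    # Minor allele frequency
    maf = np.minimum(p, 1.0 - p)
    # Expected heterozygosity under Hardy-Weinberg proportions
    he = 2.0 * p * (1.0 - p)
    # Fraction of accessions with a non-missing call
    call_rate = 1.0 - np.isnan(g).mean(axis=0)
    return maf, he, call_rate

# Toy data: 4 accessions x 3 SNPs (hypothetical values)
toy = [[0, 1, 2],
       [0, 1, np.nan],
       [2, 1, 2],
       [0, 1, 2]]
maf, he, call_rate = snp_diversity(toy)
```

In practice, statistics like these are commonly used to filter markers (for example on call rate or minor allele frequency) before any diversity or structure analysis; the thresholds are study-specific.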
The genotyping profiles of SNPs can be compared across laboratories and sequencing platforms. These benefits have resulted in the increasing use of SNPs as high-quality markers for genotype identification in a wide range of crops [51], as recently demonstrated in cacao (
Although significant, the number of reports concerning genomic resources in
5. DArTseq™: an effective tool for genome diversity in C. arabica
The DArTseq™ technology, developed by Diversity Arrays Technology (DArT; https://www.diversityarrays.com), is one of the methods that have received increasing interest worldwide because it can generate thousands of high-quality SNPs in a timely and cost-effective manner [59, 60]. The DArTseq™ method, a variation of GBS, implements complexity reduction methods that effectively target low-copy sequences of the genome [61]. In addition, the process is optimized for each organism and type of study by testing combinations of restriction enzymes (REs) and selecting the one most effective in reducing genome complexity [59].
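The idea of complexity reduction with a pair of restriction enzymes can be illustrated with a small in silico digest. The sketch below is only a conceptual example, not the DArTseq™ protocol itself: the enzyme pair (PstI as the rare cutter, MseI as the frequent cutter), the function names and the toy sequence are assumptions chosen for illustration, and only top-strand cut positions are modeled.

```python
import re

# Recognition site and top-strand cut offset for each enzyme.
# PstI cuts CTGCA^G, MseI cuts T^TAA. The choice of this pair is an
# assumption of the example; the text does not name the enzymes used.
ENZYMES = {"PstI": ("CTGCAG", 5), "MseI": ("TTAA", 1)}

def cut_positions(seq, site, offset):
    """Return top-strand cut coordinates for every occurrence of `site`."""
    # A lookahead pattern also finds overlapping occurrences
    return [m.start() + offset for m in re.finditer(f"(?={site})", seq)]

def double_digest(seq, rare="PstI", frequent="MseI", min_len=0, max_len=10**9):
    """Keep fragments flanked by one cut of each enzyme within a size range."""
    seq = seq.upper()
    cuts = []
    for name in (rare, frequent):
        site, offset = ENZYMES[name]
        cuts += [(pos, name) for pos in cut_positions(seq, site, offset)]
    cuts.sort()
    fragments = []
    # Walk over adjacent cut sites; retain only mixed-end fragments
    for (start, left), (end, right) in zip(cuts, cuts[1:]):
        if left != right and min_len <= end - start <= max_len:
            fragments.append(seq[start:end])
    return fragments

toy_seq = "AAACTGCAGGGGGTTAACCCCTTAAGGGCTGCAGTT"  # hypothetical sequence
print(double_digest(toy_seq))
```

Only fragments with one end from each enzyme and a length inside the chosen window are retained, which is how such a digest samples a reproducible subset of the genome rather than the whole of it.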
The DArTseq™ technology has been utilized in diploid but more often in polyploid plant species, such as rice (
In coffee, we have reported a genetic diversity study in 87 accessions of
As a result, 16,995 SNP markers, derived from 34,000 unique sequences, were obtained by DArTseq™ from 87 accessions of different

Figure 1.
Heat map for the 87 accessions of
For the heat map, the genomic relations matrix
where
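A genomic relationship matrix of the kind visualized in a heat map is often computed from centered SNP genotypes. The sketch below assumes VanRaden's first formulation, G = ZZᵀ / (2 Σ pᵢ(1 − pᵢ)), with 0/1/2 allele coding and no missing data; this is a commonly used definition and is not necessarily the one applied in the study described here. The function name and toy matrix are hypothetical.

```python
import numpy as np

def vanraden_grm(genotypes):
    """Genomic relationship matrix (VanRaden's first method, assumed here).

    genotypes: 2-D array (accessions x SNPs) coded 0/1/2, without missing
    values. Monomorphic-only input would give a zero denominator.
    """
    m = np.asarray(genotypes, dtype=float)
    # Allele frequency of the counted allele at each SNP
    p = m.mean(axis=0) / 2.0
    # Center each SNP column by twice its allele frequency
    z = m - 2.0 * p
    # Scale so that the matrix is analogous to a pedigree relationship matrix
    denom = 2.0 * np.sum(p * (1.0 - p))
    return z @ z.T / denom

# Toy data: 4 accessions x 3 SNPs (hypothetical values)
toy = [[0, 1, 2],
       [1, 1, 0],
       [2, 0, 1],
       [1, 2, 1]]
G = vanraden_grm(toy)
```

The resulting square matrix has one row and one column per accession, so it can be passed directly to a heat-map or clustering routine.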

Figure 2.
Bar plot from the STRUCTURE software, used to study the diversity of the 87 coffee accessions with SNP marker data. The 87 genotypes are shown below the plot and were divided into five groups (K = 5).
The sub-populations were denoted as Pop1, Pop2, Pop3, Pop4 and Pop5. The first group clustered
The results obtained from this
6. Advantages and disadvantages of NGS techniques in C. arabica genomics
High-quality reference genome assemblies accelerate plant breeding by enabling the selection of desirable genes for improved agronomic traits, including high yield, tolerance to various abiotic and biotic stresses, and resistance to pathogens [68]. However, draft genomes suffer from stretches of unknown sequence and ambiguous assembly caused by homologous sequences, whereas high-quality genomes are required for comparative genomics and functional annotation aimed at crop improvement [68, 69].
These NGSTs are classified as second and third generation. Their success is mainly due to advances in nanofluidics and automated single-molecule imaging [69]. SGseqTs are methods that require a PCR step for signal amplification prior to sequencing, whereas third-generation sequencing techniques (TGSeqTs) can perform single-molecule sequencing (SMS) [70].
SGseqTs vary in their sequencing chemistry, cost, accuracy, speed and read length. As an advantage over first-generation sequencing methods, they produce thousands to billions of short reads (25–800 nucleotides) [69, 70]. As a disadvantage, however, their accuracy varies because library preparation depends on several amplification steps and each manipulation introduces artifacts; additionally, the short reads produced by these procedures are not well suited to de novo genome assembly [69, 70].
Therefore, novel technologies are being designed to involve minimal or no manipulation of the natural DNA molecule; TGSeqTs are able to analyze natural DNA/RNA molecules without any manipulation or amplification [70]. TGSeqTs have an average read length longer than 10 kb, and the availability of such long reads is a great advantage.
The first SMS technology was developed by Quake and commercialized in 2009 by Helicos BioSciences; it worked similarly to Illumina sequencers, but without any bridge amplification [70, 71]. However, it was slow, expensive and produced relatively short reads of around 35 bp; therefore, two single-molecule approaches were developed to overcome these disadvantages [72].
The first approach, Single Molecule Real-Time (SMRT) sequencing, was developed by Craighead, Korlach, Turner and Webb and was further refined and commercialized by Pacific Biosciences (PacBio) from 2011 [73]. The second approach, nanopore sequencing, was first hypothesized in the 1990s and was further developed and commercialized by Oxford Nanopore Technologies (ONT), founded in 2005. The advantages of SMRT sequencing over second-generation techniques have come at the price of higher per-base sequencing costs [70].
Finally, DArTseq™ technique is based on genomic complexity reduction. This technique benefitted from the development in NGSTs and now DArTseq™ markers are replaced by NGS-DArT markers. Sansaloni et al. [60] found that the combined use of DArTseq™ with NGS makes available a larger number of markers than the conventional DArT method. DArTseq™ markers in combination with other molecular techniques have been used to create deeper genetic maps in
7. A future in genomic resources of C. arabica
Arabica cultivars and landraces are generally propagated by seed. The mating system is primarily based on self-fertilization, and this autogamy leads to high levels of inbreeding. In addition, an effective clonal propagation system is being adopted, but it is limited to F1 Arabica hybrids. Molecular analyses of genetic diversity are clearly needed to support this scenario [74, 75].
The development of a new coffee variety takes about 25 years. Selection can become more efficient when sequencing approaches are adopted in the variety development process [66, 76]. In the 1990s, Marker-Assisted Selection (MAS) was proposed, which enabled the selection of individuals with specific alleles. However, MAS has proven inefficient for polygenic and/or low-heritability traits [77]. Genome-wide selection (GS), proposed by Meuwissen et al. [78], has since gained importance because of its potential.
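The principle of GS, estimating the effects of all markers jointly and summing them into a genomic estimated breeding value (GEBV), can be sketched with ridge regression, one of several models used for this purpose. The sketch below is a minimal illustration under stated assumptions: the function names, the 0/1/2 marker coding, the shrinkage parameter `lam` and the toy data are all hypothetical, and a real analysis would choose the model and penalty by cross-validation on a training population.

```python
import numpy as np

def ridge_marker_effects(genotypes, phenotypes, lam=1.0):
    """Estimate marker effects by ridge regression on centered data.

    genotypes: (individuals x markers) array coded 0/1/2 (assumed coding).
    phenotypes: vector of trait values for the same individuals.
    lam: shrinkage parameter (hypothetical default; tune in practice).
    """
    x = np.asarray(genotypes, dtype=float)
    y = np.asarray(phenotypes, dtype=float)
    x_mean, y_mean = x.mean(axis=0), y.mean()
    xc = x - x_mean
    # Solve (Xc'Xc + lam * I) beta = Xc'(y - mean(y))
    lhs = xc.T @ xc + lam * np.eye(x.shape[1])
    beta = np.linalg.solve(lhs, xc.T @ (y - y_mean))
    return beta, x_mean, y_mean

def predict_gebv(genotypes, beta, x_mean, y_mean):
    """Predict breeding values for (possibly unphenotyped) individuals."""
    return y_mean + (np.asarray(genotypes, dtype=float) - x_mean) @ beta

# Toy training set: 6 individuals x 2 markers (hypothetical values)
train_x = np.array([[0, 0], [1, 0], [2, 1], [0, 2], [1, 1], [2, 2]])
train_y = 3.0 + 2.0 * train_x[:, 0] - 1.0 * train_x[:, 1]
beta, x_mean, y_mean = ridge_marker_effects(train_x, train_y, lam=0.5)
gebv = predict_gebv([[2, 0], [0, 1]], beta, x_mean, y_mean)
```

Because every marker receives a small, shrunken effect, this kind of model can capture polygenic traits for which selecting on a handful of significant markers performs poorly.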
With the development of NGSTs, GS has become a reality for several economically important species. However, the procedure requires precaution for polyploid species, which have subgenomes with duplicate regions or with high similarity, such as
Genome sequencing initiatives of Arabica accessions have been launched by several research groups (https://coffeegenome.ucdavis.edu/, among others) but an open-access genome assembly, with a reliable sorting of homologous sequences, is not yet available [77, 79]. Decoding the allotetraploid genome of
8. Conclusions
DArTseq™ technology identifies thousands of high-quality polymorphic SNP markers in a timely and cost-effective manner. Our study confirmed that genotyping by DArTseq™ can be successfully used in studies of genetic diversity, especially in coffee. In addition, trait-associated SNPs identified by GWAS may be helpful in developing strategies to improve the biochemical quality of coffee or other important traits. These SNP markers may be useful for marker-assisted selection (MAS) and genomic selection in Arabica coffee breeding programs.
Acknowledgments
This work was supported by the FONDO SECTORIAL SAGARPA-CONACYT [2016-2101-277838].
Conflict of interest
The authors have no conflicting interests, and all authors have approved the manuscript and agree with its submission to IntechOpen.