Lessons from Common Bean on How Wild Relatives and Landraces Can Make Tropical Crops More Resistant to Climate Change

Warming is expected to lead to drier environments worldwide, especially in the tropics, and it is unclear how crops will react. Drought tolerance often varies at small spatial scales in natural ecosystems, where many of the wild relatives and landraces of the main crops have been collected. Through a series of examples, we will show that collections of wild relatives and landraces, many of those deposited at germplasm banks, may represent this desired source of variation, as they are genetically diverse and phenotypically variable. For instance, using a spectrum of genotyping and phenotyping approaches, we have studied the extent of genetic and phenotypic diversity for drought tolerance in wild and landraces of common bean (Phaseolus vulgaris L.) and compared it with the one available at cultivated varieties. Not surprisingly, most of the naturally available variation to cope with drought in the natural environments was lost through domestication and recent plant breeding. It is therefore imperative to exploit the reservoir of wild relatives and landraces to make crops more tolerant. Yet, it remains to be seen if the rate at which this naturally available variation can be incorporated into the cultivated varieties may keep pace with the rate of climate change.


Introduction: Common bean -a model to explore the usefulness of wild relatives and landraces as a resource for the future
In this chapter, we review the utility of genome-environment association approaches to infer the potential of wild accessions and landraces to make tropical crops more resistant to climate change, using the food crop common bean (Phaseolus vulgaris L.) as a model. Figure 1. Geographic distribution of wild relatives (light gray) and landraces (dark gray) of common bean and its diversity in terms of seed size, colors, and patterns.
Wild beans are thought to have diversified and adapted locally in South and Central America from an original range in Central America [1,2], after which domestication in the southern and northern ends of each region gave origin to Andean and Mesoamerican domesticates, respectively [3][4][5][6][7]. Both gene pools followed somewhat parallel pathways of dissemination through the world, generating new secondary centers of diversity in Africa and Asia [8].
Common bean is a source of nutrients and protein for over 500 million people in Latin America and Africa, and more than 4.5 out of 23 million hectares are grown in zones where drought is severe, such as in northeastern Brazil, coastal Peru, the central and northern highlands of Mexico, and in Eastern and Southern Africa [9,10]. This situation may worsen as increased drought due to climate change will reduce global crop production in greater than 10% by 2050 [11]. Increasing drought tolerance in common bean varieties is therefore needed.
Characterizing geo-referenced landraces and wild accessions of common bean at the genetic level (e.g., Figure 1) and quantifying single-nucleotide polymorphism (SNP) allelic associations with a bioclimatic-based drought index offer an efficient path to identify adaptive variations suitable to breed new drought-tolerant varieties.
In the following two sections, we fist explain the theoretical bases behind genome-environment associations, as well as its caveats (Section 2), and later we exemplify it with concrete cases that used geo-referenced landraces and wild accessions of common bean to infer naturally available adaptive variations (Section 3).

Strategies to infer adaptability of wild relatives and landraces to their natural habitats
Understanding the genomic signatures associated with environmental variation provides insights into how species adapt to their environment [13][14][15]. Recent genomic studies in wild populations have demonstrated that genome-environment associations, which are associations between SNP alleles and accessions' environment of origin, can indeed be used to identify adap tive loci and predict phenotypic variation. For instance, Turner et al. [16] predicted genetic adaptive variation to serpentine soils in Arabidopsis lyrata; Hancock et al. [17] identified climate-adaptive genetic loci among a set of geographically diverse Arabidopsis thaliana; Fischer et al. [18] predicted adaptive variation to topoclimatic factors in Arabidopsis halleri; Pluess et al. [19] predicted genetic local adaptation to climate at a regional scale in Fagus sylvatica; and Yeaman et al. [20] detected convergent local adaptation in two distantly related species of conifers.
This genome-environment association approach has also been explored in some crop accessions as a prospection strategy of germplasm, alternative to traditional phenotyping. For example, Yoder et al. [21] was able to capture adaptive variation to thermal tolerance, drought tolerance, and resistance to pathogens in Medicago truncatula; Lasky et al. [22] predicted genotype-by-environment interactions to drought stress and aluminum toxicity in Sorghum bicolor; and Berthouly-Salazar et al. [23] uncovered genomic regions involved in adaption to abiotic and biotic stress on two climate gradients in Cenchrus americanus.
Nonetheless, since genomic signatures associated with habitat heterogeneity can result from causes other than adaptation and selection [24,25], for example, random genetic drift (Figure 2), and are also influenced by differences in ancestral variation and recombination in the genome [27][28][29], some further approaches need to be undertaken to clarify the truthful nature of the divergent regions. For instance, the origin of habitat-associated variants from novel or standing genetic variation leads to distinctively different patterns of genomic divergence [30][31][32]. One approach that can help to distinguish these underlying causes of divergence is comparing summary statistics (i.e., Tajima's D) from different genomic sections because demographic processes usually leave genome-wide signatures while selection tends to imprint more localized regions [33]. Specifically, habitat-mediated purifying selection is associated with localized low values of nucleotide diversity (π) [34] and Tajima's D [35] and high scores of the Watterson's theta (θ) estimator [36] because only low-frequency polymorphisms can avoid being eliminated by widespread directional selection. Although recent population bottlenecks tend to achieve the same reduction in nucleotide variation, this pattern is expected at a more genome-wide level. Similarly, local adaptation tends to homogenize haplotypes within the same niche, fix polymorphisms in different populations, and eliminate lowfrequency polymorphism. Consequently, few haplotypes with high frequency are retained, Figure 2. Multiple causes explain genome-environment associations. External processes, such as divergent selection, which is the main focus when assessing adaptation in wild relatives and landraces of crops, is only one of many possible causes. At the same time, the genomic background may be homogenized by gene flow. Similarly, background selection and genomic features in regions of reduced recombination rate and shared ancestral polymorphism (more prone to genetic drift due to their reduced effective population size) could induce hotspots of spurious genome-environment associations. Therefore, besides external processes driven by natural selection, both inherent properties of the genome and the demographic and evolutionary history of the crop influence the extent of the genome-environment associations. Modified from Ravinet, Faria [26].
corresponding to high values of nucleotide diversity (π) and Tajima's D and low scores of the Watterson's theta (θ) estimator [33]. While independent domestication events, extensive population structure, and population expansions after bottlenecks can produce the same patterns, these demographic processes also imprint genomes at a more genome-wide level.
In the following two sub-sections, we explain how to implement genome-environment associations in order to infer the adaptability of wild relatives and landraces to their natural habitats (Section 2.1) and discuss ways to account for causes, other than adaptation and selection, that may be shaping the genomic landscape of signatures associated with habitat heterogeneity (Section 2.2).

Using genome-environment association scans to identify loci associated with bioclimatic-based indexes
First of all, in order to account for possible demographic effects, subpopulation structure must be determined in geo-referenced landraces and wild accessions using principal coordinates analysis (PCoA) implemented in the software Trait Analysis by aSSociation, Evolution and Linkage, TASSEL v.5 [37]. The same dataset and software can be used to perform association analyses between the SNP markers and bioclimatic-based indexes (e.g. [12,38,39]).
As a rule of thumb, a total of ten generalized (GLM) and mixed (MLM) linear models should be compared [40]. Within each model family, five models are usually built as follows: (1) models with the gene pool identity and the first two PCoA axes scores as covariates; (2) models with the within-gene pool subpopulation identity (e.g. [41]) and the first two PCoA axes scores as covariates; (3) models with the first two PCoA axes scores as covariates; (4) models with the within-gene pool subpopulation identity (e.g. [41]) as covariates; and (5) models with the gene pool identity as covariates. All five MLMs usually use a centered IBS kinship matrix as a random effect to control for genomic background implementing the EMMA and P3D algorithms to reduce computing time [42]. QQ-plots of the P-values should be inspected to assess whether excessive numbers of false positives are generated and choose in this way the optimum model. Significant associations are determined using strict Bonferroni corrections of P-values at alpha = 0.001, leading, for example, to a significance threshold of 4.4 × 10 −8 in a usual dataset of ca. 23,000 SNP markers (0.001 divided by the number of markers) or −log 10 (4.4 × 10 −8 ) = 7.36. The construction of customized PCoA and Manhattan diagrams can be carried out with the software R v.3.3.1 (R Core Team).
Finally, candidate genes for habitat adaptation can be identified within the 1000 bp sections, flanking each SNP marker that is associated with a bioclimatic-based index by using the corresponding reference genome (e.g. [5]) and the PhytoMine and BioMart tools in Phytozome v.12 (phytozome.jgi.doe.gov).

Accounting for genomic constrains by inspecting genome-wide patterns of variation
In order to identify causes other than adaptation and selection that may be shaping the genomic landscape of signatures associated with habitat heterogeneity (i.e., genomic constrains and genetic drift), sliding window approaches (e.g., window size = 1 × 10 6 bps, step size = 200 kb) can be implemented to describe patterns of variation and overall divergence across the genome. For instance, SNP density, nucleotide diversity as measured by π [34], Watterson's theta (θ) estimator [36], and Tajima's D [35] can be computed using the software TASSEL v.5 [37] and customized R scripts. Results of all windowed analyses are usually plotted against window midpoints in millions of base pairs (Mb) in the software R v.3.3.1 (R Core Team). The centromeres can be marked to visualize the extent of the centromeric repeats and its correlation with overall patterns of diversity and divergence.
It is advisable to calculate bootstrap-based means and 95% confidence intervals around the mean for some summary statistics (i.e., SNP density, π, θ, and Tajima's D) when computed in sliding windows that contained or did not contain at least one marker that was associated with a bioclimatic-based index. For this, each summary statistic of windows containing and not containing associated SNPs should be randomly resampled with replacement (bootstrapping) across windows within grouping factors (associated vs. not associated). The overall mean is then stored for each grouping factor. This step should be iterated at least 1000 times using customized R scripts. Bootstrapping must be performed independently for each summary statistic in order to eliminate correlations among these.

The adaptive potential of wild relatives and landraces in common bean
In common bean, ecological gradients related with drought stress are associated with divergent selection at the genetic level, after accounting for gene pool and subpopulation structure. This divergent selective pressure might be a consequence of local-level rainfall patterns. Specifically, in tropical environments near the equator with bimodal rainfall, a mid-season dry period occurs that can last 2-4 weeks. In contrast, in the sub-tropics, a dry period of 3 or more months can occur. In response to this mid-cycle drought of the sub-tropics, P. vulgaris enters a survival mode of slow growth and reduced physiological activity until rainfall resumes and flowering occurs [43]. Beans growing in wetter conditions on the other hand are less frequently subjected to these environmental pressures and have a fitness advantage to mature in a shorter length of time. Given these ecological differences, and consistent with genomic signatures of divergent selection, the reaction typically associated with drought tolerance, although favorable under dry conditions, seems detrimental under more humid conditions. The awareness about this trade-off may aid the breeding of new drought-tolerant varieties specifically adapted to unique microenvironments (e.g. [44]) and local regions rather than varieties eventually obsolete, originally intended for a wider range of environments.
In the next two sub-sections, we summarize the concrete evidence supporting these statements (Section 3.1) and explain how we can discard other fortuitous causes that may also explain the same pattern (Section 3.2), based on the approaches that we introduced in the previous section (Section 2).

The signatures of adaptation in common bean are widespread throughout the genome
SNP markers are good at recovering the well-described Andean and Mesoamerican gene pool structure and the five within-gene pool subpopulations observed in wild common bean [41]. Because of this, in a previous research by us with more than 22,000 SNP markers, QQ-plots, from the association analyses between those SNP markers and a bioclimatic-based drought index [12], indicated that GLM analyses likely had excessive rates of false positives whereas MLM models controlling for population structure and using a kinship matrix reduced more effectively the false positive rate.
In that particular case, the MLM model with the first two PCoA axes scores used as covariates was the best at controlling for false positives. This model yielded a total of 115 SNP markers associated with the bioclimatic-based drought index at a Bonferroni-corrected significance threshold of 7.36 -log 10 (P-value). These markers explained on average 51.3% ± 0.4 of the variation in the bioclimatic-based drought index. The 115 SNPs were clustered in 90 different regions, defined as overlapping 1000 bp sections that flanked associated markers (Figure 3). Associated SNPs and regions were widespread in all 11 common bean chromosomes. Pv8 had more regions with at least 2 associated SNPs than any other chromosome, and these regions had more associated SNPs than in any other chromosome for a total of 5 regions with an average number of associated SNPs of 3.2. The single region that contained more associated SNPs was also situated in chromosome Pv8 with 6 SNPs explaining on average 51.1% ± 0.3 of the variation in the bioclimatic-based drought index. After chromosome Pv8, Pv3 was also outstanding, having 4 regions (with at least 2 associated SNPs) with an average number of associated SNPs of 2.5. Therefore, a total of 75 regions, comprising 99 SNP markers associated with the bioclimatic-based drought index, contained at least 1 gene for a total of 77 genes. Most genes were in chromosomes Pv1, Pv3, and Pv8 with 11, 14, and 16 genes.
Only two regions, at chromosomes Pv1 and Pv8 and containing a total of seven different SNPs, spanned two or more genes. The one in Pv8 was the region with more associated SNPs (six in total). One of the two genes in this region encoded an Ankyrin repeat-containing protein, which was associated with osmotic regulation via the assembly of cation channels in the membranes [45]. Among other identified candidate genes, there was a phototropic-responsive NPH3 gene [46] in Pv3.

Rampant divergent selection: interpreting genomic signatures of adaptation in common bean beyond genomic constrains
As a follow-up of the previous example, associated genomic windows were enriched for SNP density and positive Tajima's D scores. This conclusion was achieved after implementing a sliding window analysis to explore the patterns of genome-wide diversity (Figure 3). Marker density decayed drastically toward the centromeres. This decay in diversity proportional to the decay in the rate of recombination was first described in D. melanogaster and has been confirmed in many organisms since then. The correlation was initially understood as an effect of genetic hitchhiking, but background selection has been increasingly appreciated as a contributing factor [28], perhaps in many cases the dominating one.  ). These very same statistics were compared between 1 Mb sliding windows that contained (associated) or did not contain (no associated) at least one marker that was associated with the bioclimatic-based drought index.
Selective process, such as purifying selection and local adaptation (divergent selection), differentially imprint regions within the same genome, causing a heterogeneous departure of genetic variation from the neutral expectations and from the background trend [28]. Divergent selection tends to homogenize haplotypes within the same niche, fix polymorphisms in different populations, and eliminate low-frequency polymorphism. Consequently, few haplotypes with high frequency are retained, corresponding to high values of nucleotide diversity and Tajima's D and low scores of the Watterson's theta (θ) estimator [33]. We have identified these signatures in the various genomic regions associated with a bioclimatic-based drought index. Therefore, it is unlikely that independent domestication events, extensive population structure, and population expansions after bottlenecks are responsible for these patterns because the mixed linear model that we used to identify the genome-environment associations accounted for population structure, while demographic processes would leave genomewide signatures in both, associated and no-associated windows.

Conclusions
Wild accessions and landraces of common bean occupy more geographical regions with extreme ecologies [2] and extensive drought stress [12] than cultivated accessions. Those regions include the arid areas of Peru, Bolivia and Argentina, and the valleys of northwest Mexico. Hence, a broad habitat distribution for wild common bean has exposed these genotypes to both dry and wetter conditions, while cultivated common bean has a narrower distribution and is traditionally considered susceptible to drought. These differences in the ecologies of wild and cultivated common bean have been associated with higher genetic diversity in the former group when surveying candidate genes for drought tolerance such as the ASR [47], DREB [48], and ERECTA [49] gene families, once the population structure [41] and the background distribution of genetic diversity have been accounted for.
Also, as identified through the genome-environment association approach that was illustrated in this chapter, there are notorious differences between the adaptations of wild accessions and landraces found in arid and more humid environments, in congruence with natural divergent selection acting for thousands of years. Several of these differences might be valuable for plant breeding. Therefore, we reinforce, as was envisioned by Acosta [50], that wild accessions and landraces of common bean be taken into account to exploit naturally available divergent variations for drought tolerance. We envision that this lesson from common bean will inspire the exploitation of wild relatives and landraces of other crops to face the threats imposed by current climate change.

Prospects
This chapter ultimately illustrates that genomic signatures of environmental adaptation (e.g. [51]) are useful for germplasm characterization, potentially enhancing future marker-assisted selection and crop improvement. We envision that genome-environment association studies coupled with estimates of genome-wide diversity will become more common in the upcoming years. These types of studies will likely go beyond estimates of drought tolerance, as exemplified here, to also include estimates regarding frost stress (i.e. [52][53][54]), nutrient limitation [55,56], as well as other threats imposed by climate change [57,58] in different types of ecosystems (e.g. [59]) and screened by a variety wide range of genotyping techniques [60][61][62][63]. Genomic selection models [64] could also incorporate at some point environmental variables in order to improve the prediction of phenotypic variation and the estimation of the genotype-by-environment interactions as well as phenotypic characterizations through novel high-throughput methods such as remote sensing and image analysis [65], and novel models such as genome-environment associations [66] in the light of linkage disequilibrium (LD) [67] and various stochastic approaches [68,69].