Bean Genome Diversity Reveals the Genomic Consequences of Speciation, Adaptation, and Domestication

Here we review whether genomic islands of speciation are repeatedly more prone to harbor within-species differentiation due to genomic features, such as suppressed recombination, smaller effective population size, and increased drift, across repeated hierarchically nested levels of divergence. Our discussion focuses on two species of Phaseolus beans with strong genepool and population substructure and multiple independent domestications each. We review regions of species-associated divergence, as well as divergence recovered in within-species between-genepool comparisons and in within-genepool wild-cultivated comparisons. We discuss whether regions with overall high relative differentiation coincide with sections of low SNP density and with between-species pericentric inversions. Such convergences would suggest that shared variants are being recurrently fixed at replicated comparisons, and in a similar manner across different hierarchically nested levels of divergence, likely as the result of genomic features that make certain regions more prone to accumulate islands of speciation as well as within-species divergence. We conclude that neighboring signatures of speciation, adaptation, and domestication in Phaseolus beans seem to be influenced by ubiquitous genomic constraints, which may continue shaping, fortuitously, genomic differentiation at various other scales of divergence. This pattern also suggests that genomic regions important for adaptation may frequently be sheltered from recombination.


Introduction: A strategy to discern among confounding causes of genomic divergence
Genomic signatures associated with the divergence of species, genepools, and ecotypes can result from causes other than reduced gene flow, for example, random genetic drift and selection [1]. Moreover, the origin of the outlier variants from novel or standing genetic variation leads to distinctively different patterns of genomic divergence [2][3][4]. One approach that can help to distinguish these underlying causes of divergence is carrying out a replicated sampling of contrasting populations [5,6]. If genetic drift rather than selection is responsible for the divergence, it is unlikely that signals of differentiation reappear consistently across replicates [5]. On the other hand, if selection acted on the same genetic variants at the replicated contrasting pairs, genomic regions with comparatively high divergence between individuals from contrasting populations should be identical at each of the replicated populations. Parallel selection on shared genetic variation should therefore lead to low divergence within populations and across replicates, in the exact genomic regions where equivalent variants are selected at each contrasting population [6]. Discerning among gene flow, genetic drift, and selection as the cause of parallel genomic divergence is possible as long as there is some degree of replication considered in the sampling of contrasting populations.
The genomic landscape of divergence can also be influenced by differences in ancestral variation and recombination in the genome [7,8]. Lineage sorting may be enhanced relative to background levels by a reduction in the effective population size (Ne) due to processes other than gene flow, like low recombination [8][9][10]. Since differentiation is further accelerated in low-recombining regions because of linked selection [11][12][13], the imprint caused by genomic features on the differentiation landscape should be ubiquitous across different levels of divergence. Therefore, besides a replicated sampling of contrasting populations, a hierarchical nested sampling across various scales of divergence is advisable in order to examine whether genomic islands of divergence may display differentiation due to suppressed recombination, smaller effective population size, and increased drift.
In order to discern among confounding causes of genomic divergence in a system with strong population structure and subjected to domestication, we suggest conducting the following analyses by taking advantage of a replicated hierarchical nested sampling across various scales of divergence:
A. Analyze whether FST outliers between species coincide with high FST values at within-species comparisons. This pattern is expected if genomic islands of speciation are repeatedly more prone to harbor within-species divergence as a result of limited recombination [8].
B. Assess whether the within-species between-genepool divergence FST profiles are similar among four available comparisons. This trend is expected if the same variants were selected as the result of similar selective pressures at multiple domestication events, but not if divergence outliers were due to population divergence, that is, genetic drift [5].
C. Assess whether the within-genepool wild-cultivated divergence FST profiles are similar among the available comparisons. This coincidence is expected if the same variants are selected as the result of parallel domestication but not if divergence is due to genetic drift [5].
Finally, we suggest exploring whether regions of high FST between contrasting populations co-localize with regions of low FST in within-population comparisons. ΔDiv can be used to analyze the difference between these two FST values in each window. Peaks in the ΔDiv statistic point to genomic regions that diverged as a result of parallel divergence from shared variation rather than due to novel variation evolving at each site [6].
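The window-wise difference described above can be sketched as follows. This is a minimal illustration rather than the implementation of [6]: it assumes that both FST profiles were already computed on the same genomic windows, that ΔDiv is taken as the between-population value minus the within-population value, and that "peaks" are simply the top fraction of windows. The function names, the toy values, and the cutoff are hypothetical.

```python
from typing import List, Sequence


def delta_div(fst_between: Sequence[float], fst_within: Sequence[float]) -> List[float]:
    """Per-window difference: contrasting-population FST minus within-population FST.

    Assumption: both profiles are aligned on identical genomic windows.
    """
    if len(fst_between) != len(fst_within):
        raise ValueError("both profiles must cover the same windows")
    return [b - w for b, w in zip(fst_between, fst_within)]


def peak_windows(values: Sequence[float], top_fraction: float = 0.05) -> List[int]:
    """Indices of the windows in the top fraction of the distribution (at least one)."""
    n_top = max(1, int(round(len(values) * top_fraction)))
    ranked = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return sorted(ranked[:n_top])


# Toy profiles (made-up numbers): windows 1 and 2 both show high
# between-population FST, but only window 1 also shows low
# within-population FST.
toy_between = [0.10, 0.80, 0.75, 0.15, 0.20]
toy_within = [0.08, 0.05, 0.70, 0.10, 0.18]
toy_peaks = peak_windows(delta_div(toy_between, toy_within), top_fraction=0.2)
```

In the toy profiles, only window 1 stands out, because window 2 is highly differentiated in both the between-population and the within-population comparison.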

Beans as a model system to study divergence across various scales of divergence in a replicated hierarchical nested framework
Phaseolus beans, with their striking genepool structure and multiple domestications, constitute an excellent model system [14,15] in which to apply the approach described in the previous section and to explore to what extent genomic features, besides reduced gene flow and divergent selection, may lead to genomic divergence between species (i.e., speciation islands) and within species (i.e., during the natural colonization of new habitats as well as part of the domestication syndromes) [16]. Common and lima beans are the only bean species with multiple domestications among the five domesticated species of Phaseolus [14]. Wild common bean (P. vulgaris L.) diverged from its sister species in the tropical Andes [17] and colonized South and Central America from its original distribution in Central America, originating what nowadays is known as the Andean and Mesoamerican genepools. Independent domestications in each genepool gave rise to the Andean and Mesoamerican cultivars [18][19][20]. On the other hand, wild lima bean (P. lunatus L.) diverged from common bean, after which natural spread also led to a strong genepool structure, with two Andean and two Mesoamerican genepools. Further independent domestications happened in one Andean and one Mesoamerican genepool [21].
With this in mind, in this chapter we discuss how the recurrent phylogeographic splits and nested domestication events of common and lima beans help understand whether genomic islands of speciation in Phaseolus species are more prone to harbor within-species divergence due to reduced recombination and increased drift (Figure 1). We concretely focus our discussion by asking the following questions:
1. Are between-species FST outliers recovered in within-species comparisons?
2. Is there any parallelism in the within-species divergence FST profiles?
3. Are low-recombining regions (i.e., centromeres) more prone to exhibit divergence across repeated and hierarchically nested scales of divergence?
If there were some parallelisms in the genetic adaptations to the Mesoamerican and Andean environments or in the genetic consequences of the domestication syndromes, then there would be matching signals of differentiation in the within-species between-genepool divergence FST profiles and in the within-genepool wild-cultivated divergence FST profiles, respectively. These patterns of repeatability would not be observed if between-genepool and wild-cultivated divergence outliers were due to genetic drift [5], if selection pressures were different [22] or if equivalent selective forces did not act on the same shared variation [6,23].
Yet, genomic constraints, rather than true signals of convergent adaptation and domestication, could still be the reason for these parallelisms. If genomic features were indeed constraining divergence, then genomic islands of differentiation would coincide with low-recombining regions regardless of the nature and the scale of divergence.
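One simple way to look for matching signals among FST profiles is a pairwise correlation of the window-wise values. The sketch below is only descriptive and rests on several assumptions that are not part of the original analyses: every profile is computed on the same windows, Pearson's correlation is used as the similarity measure, and the comparison names and values are hypothetical. Because neighboring windows are not independent, the coefficient should not be read as a formal test.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long FST profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2 windows)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant profile has no defined correlation")
    return sxy / sqrt(sxx * syy)


def pairwise_similarity(profiles: Dict[str, Sequence[float]]) -> Dict[Tuple[str, str], float]:
    """Correlation for every pair of comparisons, keyed by sorted name pairs."""
    return {
        (a, b): pearson(profiles[a], profiles[b])
        for a, b in combinations(sorted(profiles), 2)
    }


# Hypothetical toy profiles: two replicates sharing their peaks and one
# comparison whose peaks fall elsewhere.
toy_profiles = {
    "replicate_1": [0.10, 0.20, 0.80, 0.70, 0.10, 0.15],
    "replicate_2": [0.12, 0.18, 0.75, 0.72, 0.09, 0.20],
    "unrelated": [0.60, 0.10, 0.15, 0.20, 0.70, 0.10],
}
toy_similarity = pairwise_similarity(toy_profiles)
```

A high coefficient between replicates is compatible with parallel selection, but also with shared genomic constraints, which is why the comparison against recombination proxies discussed next remains necessary.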

Evidence that genomic features constrain divergence across scales
By looking at the genomic diversity patterns in common and lima beans [24][25][26][27][28][29][30], there is evidence that differentiation across repeated and hierarchically nested levels of divergence always co-occurs with regions of low SNP density (Figure 2). Increased lineage sorting, and consequently rapid differentiation, is a common phenomenon in low-recombining regions because of linked selection and a reduction in the effective population size [8][9][10]. Likewise, low-recombining regions also tend to exhibit a decline in diversity due to background selection and, to a lesser extent, because of genetic hitchhiking [11]. This can be understood as evidence that regions with low SNP diversity are enriched for contiguous signatures of differentiation between bean species, between genepools, and as part of the multiple domestication syndromes. These concurring signatures could be a by-product of genomic constraints inherent to low-recombining regions.
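The co-occurrence of differentiation peaks with low SNP density can be examined with a simple resampling scheme. The following sketch is an assumption-laden illustration, not the procedure used in the cited surveys: it takes per-window SNP counts as a proxy for recombination, defines outliers as the windows with the highest FST, and compares their mean SNP density against randomly drawn window sets. All names and numbers are hypothetical, and random draws ignore the autocorrelation among neighboring windows, so the resulting p-value is likely optimistic.

```python
import random
from statistics import mean
from typing import List, Sequence, Tuple


def top_fst_windows(fst: Sequence[float], n_top: int) -> List[int]:
    """Indices of the n_top windows with the highest FST."""
    ranked = sorted(range(len(fst)), key=lambda i: fst[i], reverse=True)
    return sorted(ranked[:n_top])


def low_density_enrichment(
    snp_density: Sequence[float],
    outliers: Sequence[int],
    n_perm: int = 2000,
    seed: int = 1,
) -> Tuple[float, float]:
    """Mean SNP density in outlier windows and a one-sided permutation p-value.

    The p-value is the share of random window sets of the same size whose
    mean density is as low as, or lower than, the observed one.
    """
    rng = random.Random(seed)
    observed = mean(snp_density[i] for i in outliers)
    indices = range(len(snp_density))
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(indices, len(outliers))
        if mean(snp_density[i] for i in sample) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Made-up example: the two FST peaks sit in the two SNP-poorest windows.
toy_fst = [0.10, 0.12, 0.09, 0.85, 0.90, 0.11, 0.10, 0.13, 0.08, 0.10, 0.12, 0.09]
toy_density = [50, 48, 52, 5, 4, 51, 49, 47, 53, 50, 46, 55]
toy_outliers = top_fst_windows(toy_fst, n_top=2)
toy_observed, toy_p = low_density_enrichment(toy_density, toy_outliers)
```

Where estimates of the recombination rate are available, they would be a more direct covariate than SNP density in the same scheme.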
One of the regions that repeatedly exhibit high differentiation across hierarchically nested levels of divergence in the presence of low SNP density is the centromeric section of chromosome Pv11. The wild-cultivated divergence peak in this chromosome is shared by three domestication syndromes and is located beside the outlier peak detected for all within-species between-genepool comparisons, which in turn coincides with a major between-species peak.
In this wide section of chromosome Pv11, there are indications that convergent divergence is consistently correlated with very low SNP density, as expected because of combined effects of linked and background selection in low-recombining regions [8][9][10][22]. The observation that genomic constraints are biasing divergence across scales in this section of chromosome Pv11 is reinforced by the fact that previous genomic scans did not attribute to this region a consistently outstanding role during the domestication syndromes [20,21] or in conferring adaptation to different environments and latitudes across the Americas [31]. The only exception is the candidate gene influencing plant size (Phvul.011G213300) as part of the Mesoamerican domestication syndrome of common bean [20], but this pattern has not been consistently reported for the other domestication events, so it cannot explain the steady repeatability across hierarchically nested levels of divergence in windows with low SNP density.
Other "hotspots" for spurious divergence due to genomic constraints may be the regions with low SNP density in chromosomes Pv8 and Pv10 that exhibit signatures of between-species divergence as well as repeated between-genepool and within-genepool wild-cultivated divergence (Figure 2). The region in chromosome Pv8 was previously reported to be highly divergent during the domestication of the Andean common bean, but there were no candidate genes in this region associated with that domestication syndrome in particular [20], even though the same region is known for being involved in plant and seed growth (i.e., Phvul.008G168000) during the Mesoamerican domestication of the same species. This paradox may then be a consequence of genomic constraints obscuring genuine anthropic selection and repeatedly forcing divergence in this region. Similarly, the wide divergent region in chromosome Pv10, characterized by two outlier peaks split by a "high valley," actually matches a pericentric inversion between species [32], exemplifying how genomic features inexorably condition differentiation across scales of divergence.
The observation that low-recombining regions are enriched for differentiation across repeated and hierarchically nested levels of divergence in Phaseolus beans contrasts with the profiles of the genome-wide selection scans carried out in common bean. While low-recombining regions are more prone to exhibit signatures of divergence, regions toward the arms of the chromosomes with high SNP density more often harbor adaptive variation [31]. This trend follows expectations because low-recombining regions are more liable to display divergence because of linked selection [11,33,34], whereas recombination hotspots usually exhibit higher SNP density and are enriched with functional genes [11,35], an already well-described relationship for common bean [36,37]. Also, adaptive divergent selection usually homogenizes haplotypes within the same niche and fixes polymorphisms in different populations, so that few haplotypes with high frequency remain. This selective process leads to high values of nucleotide diversity and Tajima's D and low values of Watterson's theta (θ) estimator [38], a tendency that was corroborated in wild common bean when looking for adaptive variants [31] but that was lacking in the present study while retrieving the genomic landscape of divergence between species, genepools, and domestication statuses.
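The relationship among nucleotide diversity, Watterson's theta, and Tajima's D mentioned above can be made concrete with a short calculation. The sketch below uses the standard Tajima (1989) constants and assumes phased, biallelic haplotypes encoded as strings of "0" and "1" with no missing data; the function name and the toy haplotypes are hypothetical and are not taken from the bean data sets.

```python
from math import sqrt
from typing import List, Optional, Tuple


def diversity_stats(haplotypes: List[str]) -> Tuple[float, float, Optional[float]]:
    """Return (pi, Watterson's theta, Tajima's D) for a set of 0/1 haplotypes.

    pi and theta are per locus (not per site); D is None without segregating sites.
    """
    n = len(haplotypes)
    if n < 4:
        raise ValueError("at least four haplotypes are needed for Tajima's D")
    length = len(haplotypes[0])
    if any(len(h) != length for h in haplotypes):
        raise ValueError("haplotypes must have equal length")
    pairs = n * (n - 1) / 2
    seg_sites, pi = 0, 0.0
    for site in range(length):
        derived = sum(1 for h in haplotypes if h[site] == "1")
        if 0 < derived < n:
            seg_sites += 1
            # Number of pairwise differences at this site over all pairs.
            pi += derived * (n - derived) / pairs
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    theta_w = seg_sites / a1
    if seg_sites == 0:
        return pi, theta_w, None
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return pi, theta_w, (pi - theta_w) / sqrt(variance)


# Two divergent haplotypes, each at high frequency (toy data).
few_common_haplotypes = ["00000"] * 4 + ["11111"] * 4
# Only rare variants, each present in a single haplotype (toy data).
rare_variants_only = ["10000", "01000", "00100", "00010", "00001", "00000", "00000", "00000"]
```

With few haplotypes at high frequency, nucleotide diversity exceeds Watterson's theta and Tajima's D is positive, whereas a sample dominated by rare variants yields a negative D.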

Signatures of shared within-species parallel divergence
There is some evidence of parallelisms in the genetic adaptations to the Mesoamerican and Andean environments in common and lima beans (Figure 2). The landscape of genomic adaptation has remained largely unexplored in Phaseolus beans. Among the few other studies addressing this question, a panel of wild common bean sampled across the Andean and Mesoamerican ranges revealed that, regardless of the strength of the bottlenecks [39], the signatures of divergent adaptation are widespread along the genome and coincide with regions of elevated SNP density [31], frequent recombination, and high gene content [36]. However, these surveys have not explicitly addressed the colonization of the Andes by lineages coming from Central America and the corresponding change in selection pressures associated with different altitudes, latitudes, and microenvironments. Topographically complex mountainous systems, such as the Andes, harbor an impressive heterogeneity of climates at a small scale [40][41][42][43]. The ridges and valleys constitute physical barriers that limit dispersal and cause local variation in rainfall, resulting in genetic isolation and variation in habitats. Both processes have likely accelerated the evolution of high species diversity in this region [44][45][46][47][48]. Yet, the relative effects of geographic isolation [49][50][51], environmental variation at a small scale [52][53][54][55][56][57][58], and their potential interactions across genepools remain poorly understood in wild beans. Therefore, characterizing the genomic consequences associated with the colonization of heterogeneous environments may ultimately disclose further cases of genetic parallelism in the adaptation of beans.
The genomic consequences of multiple domestication events are also moderately recurrent, as revealed by our survey. Of the twelve regions putatively differentiated as the result of the domestication syndrome, only five (42%) appear in more than one comparison, and none appears in all. Two peaks, in chromosomes Pv3 and Pv10, are repeated across three of the five profiles of the domestication syndromes. At least the region in chromosome Pv3 has been reported to be involved in the vernalization pathway (i.e., Phvul.003G033400) as part of the Mesoamerican domestication of common bean [20]. Two other divergence peaks, in chromosomes Pv8 and Pv11, are consistent across all three genomic profiles of the Mesoamerican domestication syndrome. The region in chromosome Pv8 is known for being related to the encoding of nitrate reductase (i.e., Phvul.008G168000), a critical element for plant and seed growth, during the Mesoamerican domestication of common bean [20]. Also as part of this domestication event, the region in chromosome Pv11 is associated with increased plant size through the ubiquitin ligase degradation pathway (i.e., Phvul.011G213300) that controls flower and stem size [20]. More loosely, a peak at chromosome Pv2 in the Mesoamerican common bean domestication FST profile is recovered in the profiles of all three lima bean domestications. This region has been linked with the domestication syndrome of lima bean since it is involved in the regulation of seed germination (i.e., Phvul.002G033500) and leaf size (i.e., Phvul.002G041800) and is enriched by inflated linkage disequilibrium scores [21]. Although scattered, some of these few regions may reveal true parallelisms in the domestication syndromes, whereas others may still be constrained by genomic features.
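The tally of regions shared among comparisons can be expressed as a small bookkeeping routine. The sketch below assumes that each candidate region has already been assigned the set of wild-cultivated comparisons in which it is an outlier; the region and comparison labels are placeholders and do not correspond to the peaks reported in this chapter. The last line only rechecks the proportion quoted in the text (five of twelve regions).

```python
from typing import Dict, Set


def repeatability_summary(region_hits: Dict[str, Set[str]], n_comparisons: int) -> Dict[str, int]:
    """Count regions found in more than one comparison and in all comparisons."""
    total = len(region_hits)
    shared = sum(1 for hits in region_hits.values() if len(hits) > 1)
    universal = sum(1 for hits in region_hits.values() if len(hits) == n_comparisons)
    percent_shared = round(100 * shared / total) if total else 0
    return {
        "regions": total,
        "shared": shared,
        "universal": universal,
        "percent_shared": percent_shared,
    }


# Placeholder input: four hypothetical peaks scored across five comparisons.
toy_hits = {
    "peak_a": {"cmp_1", "cmp_2", "cmp_3"},
    "peak_b": {"cmp_1"},
    "peak_c": {"cmp_2", "cmp_4"},
    "peak_d": {"cmp_5"},
}
toy_summary = repeatability_summary(toy_hits, n_comparisons=5)

# Proportion quoted in the text: five shared regions out of twelve.
quoted_percent = round(100 * 5 / 12)
```

Keeping the per-region sets, rather than only the counts, makes it straightforward to ask later which specific comparisons share a given peak.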
Also striking is the rarity of regions putatively involved in domestication and shared by several domestication events. This trend, mostly expected for quantitative traits with complex genetic architectures [59][60][61], had already been noticed for the common bean [20], potentially applying to lima bean as well [21], and so does not necessarily speak for a prevalent role of drift. Since divergence that lacks repeatability is a likely result of lineage sorting, caution must be exercised while interpreting these signals. Singularities may result from different adaptive pressures across the Americas unique to each species, distinctive adaptation to the Mesoamerican microenvironments, dissimilar selection as part of each domestication event [22], equivalent selective forces acting on different genetic variants [6,23], or genetic drift [5]. Discerning among these causes requires further genotyping in an extended panel specifically addressing each comparison. At least for the divergence peak at chromosome Pv7 in the wild-cultivated Mesoamerican common bean comparison, other drivers besides the domestication itself are an unlikely reason for divergence because a wide region in chromosome Pv7 is known for being associated with increased seed weight (i.e., Phvul.007G094299-Phvul.007G.99700) during the Mesoamerican domestication of common bean [20], as well as with flowering regulation (i.e., Phvul.007G096500 and Phvul.007065600) as part of the domestication of lima bean [21] and both common bean genepools [20].

Take-home message
Genomic islands of speciation are not necessarily more prone to harbor within-species divergence, yet underlying genomic constraints could still be shaping parallel divergence at broader genomic scales. With that in mind, we first discussed how genomic features and linked selection could enhance convergent differentiation in low-recombining regions. Later, we reviewed cases of moderate repeatability in the genomic consequences of multiple adaptation and domestication events. This chapter emphasizes that differentiation across repeated and hierarchically nested levels of divergence co-occurs with regions of low SNP density, and these concurring signatures may be a by-product of genomic constraints inherent to low-recombining regions.
We advise a more systematic use of repeated and hierarchically nested samplings in order to improve our understanding of the underlying causes of the genomic landscape of divergence. Because certain regions are more prone to accumulate islands of divergence as the result of genomic constraints, we advocate that studies of genomic divergence should consider more systematically a dual-purpose sampling, such as the one we described in the first section. First, using replicated populations under presumably similar selection pressures helps to account for lineage sorting and to characterize the nature of the selected variants, i.e., novel versus standing [6]. Second, a hierarchically nested sampling across various levels of divergence allows for further assessments of the processes that, like genomic constraints, may give rise to parallel divergence patterns [2][3][4][62]. Finally, some of these examinations must be verified with genomic features and estimates of the recombination rate [63][64][65]. We foresee that as the evidence of pervasive genomic constraints shaping genomic differentiation across species and at countless scales of divergence accumulates, replicated samplings of contrasting populations in a hierarchically nested framework of divergence will become indispensable.
In the long run, we look forward to seeing more coherent and systematic samplings of replicated contrasting populations across hierarchically nested levels of divergence in [...] of genomic divergence has always been challenging, but the field is now moving toward a more cohesive framework. New ways [66,67] to characterize obscuring genomic features promise to aid our understanding of how the genomic landscape of divergence is shaped.
Among the five domesticated species in the Phaseolus genus, common and lima beans are the only ones exhibiting range expansions toward South America and multiple domestications [14]. However, exploring the landscape of divergence in other domesticated Phaseolus species is equally insightful because of their overlapping distribution ranges, nested phylogenetic relationships, and divergent adaptations. For instance, year (P. dumosus) and runner (P. coccineus) beans are Mesoamerican and well adapted to humid habitats, which makes them a potential source of resistance to biotic stresses. On the other hand, tepary bean (P. acutifolius) is also Mesoamerican but is well known for growing in desert and semiarid environments, which makes it a likely source of tolerance to abiotic stresses. These species also possess well-established genomic resources [68] that could speed up newer genome-wide comparisons. Phaseolus species that never underwent domestication are also abundant (ca. 70) and could enrich our understanding of genomic divergence in this intricate complex. Considering the Phaseolus species complex as a whole will ultimately reinforce beans as a model for understanding speciation, adaptation, and crop evolution [14,15][69][70][71][72].
para la Promoción de la Investigación y la Tecnología del Banco de la República de Colombia to MC, and by the Lundell and Tullberg (Sweden) grants to AC. The Geneco mobility fund from Lund University is thanked for subsidizing the meeting between AC and MB in the spring of 2015 at Nashville (TN, USA). AC's writing time was sponsored by the grants 4.1-2016-00418 from Vetenskapsrådet (VR) and BS2017-0036 from Kungliga Vetenskapsakademien (KVA). MB received support from the Evans-Allen fund of the US Department of Agriculture. The editorial fund from the Colombian Corporation for Agricultural Research is acknowledged for financing this publication.