From the Gene Sequence to the Phylogeography through the Population Structure: The Cases of Yersinia ruckeri and Vibrio tapetis

Multilocus sequence analysis (MLSA) and multilocus sequence typing (MLST) are nowadays considered gold standards in the study of microbial systematics, both techniques being based on the interpretation of the sequences of several housekeeping genes. In this context, the sequences can be analyzed from different points of view. On the one hand, the phylogeny of the bacterial species can be estimated using the MLSA approach and, on the other hand, the structure of the population can be inferred by means of MLST. Moreover, most species display some degree of population structure that can be interpreted in geographic and chronological contexts, that is, through phylogeographic studies. In this review, the phylogeny and population structure of two important fish and shellfish pathogens, Yersinia ruckeri and Vibrio tapetis, which exhibit very different evolutionary patterns, are analyzed. In both cases, the species form robust and monophyletic groups from a phylogenetic point of view. Regarding population structure, very different results were found. While Y. ruckeri follows an epidemic model of clonal expansion, with well-adapted clones that expand rapidly and become widely distributed, V. tapetis appears to have a mixed structure in which clonality and a high level of variability paradoxically coexist. Furthermore, phylogeographic studies provided the evolutionary and geographical context for the species, allowing the determination of historical and spatial influences on the diversification of both species.


Introduction
Phylogenetic analysis has long played a central role in basic microbiology. Sequence data offer direct genealogical information that can be efficiently used to estimate phylogenetic relationships and parameters associated with population dynamics. Furthermore, sequencing methods provide standardized and unambiguous data that are portable through online databases with direct access to the information needed to identify and monitor emerging pathogenic agents [1,2]. Reconstructing the patterns of descent for a group of organisms can yield important insight into why and how members of that group have specific characteristics and how those organisms are distributed across the environment. Integrating population patterns with phylogenetic knowledge provides insights into the epidemiological tracking of an organism at different evolutionary scales, from a single host to the entire globe [3,4]. On the other hand, more recently emerging fields of microbiology, including comparative genomics and phylogenomics, require substantial expertise in phylogenetic analysis and computational skills to handle the large-scale data involved [5]. Understanding how current and emerging technologies can be used to maximize phylogenetic knowledge is advantageous only when accompanied by a thorough grasp of the strengths and weaknesses of these methods.
Since the conception of phylogenetic trees [6], morphological comparisons have been used to determine patterns of descent. Historically, numerous DNA-based approaches have been used to discriminate, subtype and build phylogenies for groups of organisms. Multilocus sequence analysis (MLSA) represents the current standard in microbial molecular systematics. In this context, MLSA is implemented in a relatively straightforward way, consisting essentially of the concatenation of several gene fragments for the same set of organisms, resulting in one matrix which is used to infer a phylogeny by means of purely algorithmic methods [7][8][9].
For microbial pathogens, phylogenetic analyses are often conducted in order to determine whether one particular outbreak may be related to another during an epidemic. While the clonal nature of an outbreak could be readily measured and predicted, Maynard-Smith et al. [10] pointed out the potential importance of homologous recombination as a determinant of the overall population structure of many bacterial species. These notions are now supported by several typing methods, including multilocus sequence typing (MLST). Unambiguous genotyping systems are key to describing epidemiological and ecological patterns and highlighting the evolutionary processes that shape microbial populations. Levels of genetic diversity are sufficiently high in most microbial taxa that the sequences of several housekeeping gene fragments can provide a medium-resolution overview of their population genetic structure [11]. For pathogenic bacteria whose members exhibit varying degrees of virulence, the integration of population genetic, evolutionary and epidemiological studies can provide important insights into the origins and spread of bacterial disease.
MLSA and MLST are based on housekeeping genes, which are subject to purifying selection and slow evolution and the variation within these genes is nearly neutral [12]. Although there are normally fewer polymorphic sites in individual housekeeping genes compared with hypervariable genes, the use of the combined sequences of multiple housekeeping genes has been shown to provide high discriminatory power while retaining signatures of longer-term evolutionary relationships or clonal stability. Furthermore, analysis of multiple loci can buffer against potentially skewed evolutionary pictures obtained by single-locus analysis [13,14].
Most species display some degree of population structure that can be interpreted in a geographical and chronological context [15]. Phylogeography uses genetic information to study the geographical distribution of genealogical lineages, especially those found within species [16,17]. Because the discipline has deep roots in historical biogeography and population genetics, phylogeography was heralded as a bridge linking the study of micro- and macroevolutionary processes, providing the empirical and conceptual link between systematics and population genetics. Based on appropriate sampling of individuals and genes, this approach allows the assessment of biogeographic hypotheses, the description of the evolution of isolated reproductive population units and the inference of the processes underlying the origin, distribution and maintenance of diversity [18]. Detecting concordance between geographical variation in genotypes, or their genealogies, and the environment is therefore at the core of phylogeographic studies.
The generation of large volumes of sequence data, combined with the development of novel analytical techniques and conceptual advances, promises a better understanding of the complexity of the evolution of bacterial populations. The application, advantages and constraints of MLSA, MLST and phylogeographic analysis in taxonomic studies will be illustrated in this chapter with two examples: Yersinia ruckeri and Vibrio tapetis.

MLSA: inferring phylogeny of bacteria species
The 16S rRNA gene was the most common phylogenetic marker for 40 years. This molecule evolves slowly, so it is often difficult to establish phylogenetic relationships among recently diverged taxa. MLSA now represents the standard in microbial molecular systematics [19]. It is a rapid and robust classification method for studying the phylogenetic relationships of very diverse prokaryotic taxa, including entire genera, by combining the information contained in the sequences of several specific genes [19]. The technique consists essentially of concatenating the sequences of several housekeeping genes (more than five) and establishing the relationships among taxa by phylogenetic inference [7,8,20,21]. This amount of data provides greater resolving power than a single gene such as the 16S rRNA gene, although that marker is still considered useful at taxonomic levels above the species.
MLSA can be used for bacterial identification and classification as well as for inferring evolutionary relationships and variability among different groups of bacteria. At the identification level, several studies have demonstrated that MLSA using the concatenation of eight housekeeping genes provides robust phylogenetic resolution for microorganisms sharing 70-95% average nucleotide identity (ANI) and can therefore distinguish species of the same genus [22]. It has even been proposed as a replacement for DNA-DNA hybridization (DDH), although concerns have arisen about this replacement [23][24][25][26]. At the systematic level, MLSA is considered a technique of intermediate resolution between the 16S rRNA gene and whole-genome-based approaches [19]. At the evolutionary level, MLSA is a useful tool for studying the variability of different evolutionary entities, from families to species, as long as the genes selected for the analysis properly reflect the similarity of the complete genomes within the studied group and the evolutionary rates of the genes represent the evolution of the species.
The critical point in MLSA is the suitability of the genes chosen for the analysis. In fact, genes that are perfectly informative within a given species, genus, or family may not be useful, or even present, in other taxa [19,27]. Ideally, the best strategy to obtain a reasonable estimate of the species tree is to consider multiple genealogies inferred from unlinked loci and to use multiple individuals per species [28][29][30]. To date, there is no general criterion for determining which genes are most useful for taxonomic purposes, but some attributes have been described for the genes to be used in the analysis [9,26,31,32]. Genes should contain enough genetic information and, although there is no specification regarding length, they should be short enough to be easily sequenced. Very often the fragments used in phylogenetic reconstructions are the same as those employed for MLST, which results in fragments that are too short. The genes should also reflect the evolutionary history of the studied taxon [33]. Therefore, conserved genes must be selected for higher taxa and faster-evolving genes for the species or subspecies level. In this respect, the so-called core genes, that is, the orthologous genes, should be used in preference to accessory genes [34].
The lack of a universal set of conserved orthologous loci for a given taxon and, therefore, of a set of primers that could amplify them across the studied group often precludes the comparative analysis of evolutionary processes and patterns among closely related species and genera [26]. The strongest conflicting signals are usually derived from horizontal gene transfer (HGT) events in the dataset [35][36][37]. The resulting phylogenetic hypothesis may be distorted, since standard treeing methods assume a single underlying evolutionary history [20,38,39].
There are no official recommendations about the inclusion of amino acid-based sequence analysis in MLSA studies, although it is advisable because the study of nucleotide sequences alone can lead to an "overinterpretation" of phylogenetic differentiation in closely related taxa [32]. Usually, a base change at the third position of a given codon has no influence on the resulting protein sequence and therefore on the structure and/or function of the protein, but it can also have the opposite effect. For this reason, nucleotide alignments should be built with reference to their amino acid sequences. It must be taken into account that a bacterium is not only a sequence of DNA and that, for taxonomic purposes, the living unit at all its levels should be considered.
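The concatenation step described above can be illustrated with a minimal sketch. It assumes that each gene has already been aligned separately and that every strain is represented in each alignment; the gene fragments, strain labels (s1 to s3) and function names are invented for illustration and do not correspond to any published dataset or software.

```python
# Toy MLSA sketch: concatenate per-gene alignments and compare strains.
# Strain labels and sequences are hypothetical examples.

def concatenate_alignments(alignments, gene_order):
    """Join the aligned fragments of each strain in a fixed gene order.

    alignments: {gene: {strain: aligned_sequence}}
    Only strains present for every gene are kept, so that all
    concatenated sequences have the same length.
    """
    shared = set.intersection(*(set(alignments[g]) for g in gene_order))
    return {
        strain: "".join(alignments[g][strain] for g in gene_order)
        for strain in sorted(shared)
    }


def percent_identity(seq_a, seq_b):
    """Percentage of identical positions, skipping columns with a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not compared:
        raise ValueError("no ungapped columns to compare")
    matches = sum(a == b for a, b in compared)
    return 100.0 * matches / len(compared)


alignments = {
    "gyrB": {"s1": "ATGGCA", "s2": "ATGGCG", "s3": "ATGGCA"},
    "recA": {"s1": "TTGACC", "s2": "TTGACC", "s3": "TTAACC"},
}
matrix = concatenate_alignments(alignments, ["gyrB", "recA"])
for strain, sequence in matrix.items():
    print(strain, sequence)
print(round(percent_identity(matrix["s1"], matrix["s2"]), 2))
```

The resulting matrix is what a tree-building program would take as input; the pairwise identity is only a descriptive summary and is not a substitute for phylogenetic inference.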
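The point about third-codon positions can be made concrete with a small sketch that classifies the codon differences between two aligned coding sequences as synonymous or nonsynonymous. It assumes in-frame, gap-free sequences in upper case and the standard genetic code; the function name and the example sequences are hypothetical.

```python
# Sketch: classify codon differences between two in-frame coding sequences.
# Assumes the standard genetic code and sequences without gaps or ambiguity codes.

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_differences(seq_a, seq_b):
    """Return (codon number, differing codon positions, class) per changed codon."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3:
        raise ValueError("sequences must be aligned, in frame and of equal length")
    results = []
    for start in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[start:start + 3], seq_b[start:start + 3]
        if codon_a == codon_b:
            continue
        positions = [p + 1 for p in range(3) if codon_a[p] != codon_b[p]]
        same_residue = CODON_TABLE[codon_a] == CODON_TABLE[codon_b]
        results.append(
            (start // 3 + 1, positions, "synonymous" if same_residue else "nonsynonymous")
        )
    return results


# GCT -> GCC keeps alanine; ATG -> ATA replaces methionine with isoleucine.
print(codon_differences("GCTATGAAA", "GCCATAAAA"))
```

Both changes in the example fall on the third codon position, yet only the first is silent, which is why inspecting the translated alignment alongside the nucleotide alignment is informative. This simple count is not an estimate of dN/dS, which additionally requires normalizing by the numbers of synonymous and nonsynonymous sites.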

MLST: establishing the bacterial population structure
Nucleotide sequence data from multiple housekeeping genes in an appropriately sampled population can be used in a variety of analyses to determine population structure. The simplest of these analyses is MLST, which establishes the allele present at each locus and uses a clustering algorithm to determine the relationships among strains from the matrix of pairwise differences between their allelic profiles [40]. The major advantage of MLST over other typing methods, such as multilocus enzyme electrophoresis (MLEE), is the unambiguous nature of the data obtained and the ease with which they can be stored and exchanged electronically, meaning that any isolate typed with the method can be rapidly compared with all previously typed strains.
The number of alleles obtained for each locus is much higher using MLST than MLEE and the information obtained by MLST is more precise. Publicly available databases such as http://pubmlst.org and http://www.mlst.net/ provide examples where clinical subtyping has allowed epidemiological, geographical, and/or evolutionary hypotheses to be examined for pathogens such as Neisseria meningitidis, N. gonorrhoeae, Streptococcus pneumoniae, Vibrio parahaemolyticus and Staphylococcus aureus [40][41][42][43]. National and international surveillance of bacterial clones can be performed using these resources.
Unweighted pair group method with arithmetic mean (UPGMA) dendrograms based on pairwise comparisons among allelic profiles can be constructed on the website to detect relationships between query isolates and/or database isolates. However, although clustering algorithms are useful for detecting the genetic relatedness of a small number of isolates, they can become impractical for visualizing the larger sample sizes (e.g., >1000) found in MLST databases. In addition, as these methods are not based on an evolutionary model, they are often inaccurate in reconstructing evolutionary events [44]. The development of the eBURST algorithm [1] has addressed both issues. The model incorporated into eBURST assumes that, due to selection or genetic drift, some genotypes will occasionally increase in frequency in the population and then gradually diversify by the accumulation of mutations and/or recombinational replacements, resulting in slight variants of the founding genotype. Using allelic profile data, one sequence type (ST) is assigned to each isolate. STs sharing high genetic similarity are grouped into clonal complexes (CCs). The founding genotype of each CC is then identified parsimoniously as the genotype that differs from the highest number of other genotypes in the CC at only one locus. Further diversification will produce variants of the founder allelic profile that differ at two or more loci. Thus, the simple principle underlying eBURST is that bacterial populations consist of a series of clonal complexes (sets of variants of a founding genotype) that can be recognized from the allelic profiles of the strains within an MLST database [1].
While MLST is very effective for establishing which isolates are identical or closely related, the approach will not provide much information about the relationships between more distantly related isolates unless the population is strictly clonal. However, additional phylogenetic information can be gathered if the nucleotide sequences themselves are studied, by analyzing the extent of linkage disequilibrium between alleles and by looking for recombination through the congruence of gene trees or the presence of mosaic structures [45,46]. Knowledge of the extent of recombination in bacterial pathogens is important, since low levels of recombination result in a highly clonal population, where lineages persist with little variation over hundreds or thousands of years. At the other extreme, high rates of recombination lead to weakly clonal and/or non-clonal populations in which lineages diversify so rapidly that the isolates recovered in one decade may be completely different from those recovered in the next [47].
For highly clonal species such as Salmonella enterica [48], most substitutions in the genome have arisen by mutation. Alleles that arise independently multiple times in different branches are therefore incongruous with the tree. The phylogenetic relationships between isolates can be inferred from the dendrogram derived from the pairwise differences between STs and, independently, from a consensus tree constructed from the gene sequences. The characterization of weakly clonal pathogens (e.g., N. meningitidis, S. pneumoniae) is more problematic, since clones diversify rapidly by the accumulation of recombinational exchanges. However, MLST is very useful for identifying the currently circulating hypervirulent lineages, because these are recognized as clusters of isolates with identical, or very similar, multilocus sequence types.
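As a rough illustration of how allelic profiles and the matrix of pairwise differences are derived, the following sketch numbers the distinct sequences found at each locus, combines them into profiles and counts the loci at which two profiles differ. It is a toy example under stated assumptions: the allele and ST numbers are arbitrary labels assigned in order of first appearance (curated databases assign their own identifiers), and the isolates, sequences and function names are invented.

```python
# Sketch: from per-locus sequences to allelic profiles, STs and pairwise differences.
# Numbering is arbitrary (order of first appearance); data are hypothetical.

def assign_alleles(sequences_by_isolate):
    """Give each distinct sequence at ONE locus an arbitrary allele number."""
    allele_numbers = {}
    assigned = {}
    for isolate, sequence in sequences_by_isolate.items():
        if sequence not in allele_numbers:
            allele_numbers[sequence] = len(allele_numbers) + 1
        assigned[isolate] = allele_numbers[sequence]
    return assigned


def allelic_profiles(loci, locus_order):
    """Build one tuple of allele numbers per isolate, in a fixed locus order."""
    per_locus = {locus: assign_alleles(loci[locus]) for locus in locus_order}
    isolates = list(loci[locus_order[0]])
    return {
        isolate: tuple(per_locus[locus][isolate] for locus in locus_order)
        for isolate in isolates
    }


def assign_sequence_types(profiles):
    """Give each distinct allelic profile an arbitrary ST number."""
    st_numbers = {}
    assigned = {}
    for isolate, profile in profiles.items():
        if profile not in st_numbers:
            st_numbers[profile] = len(st_numbers) + 1
        assigned[isolate] = st_numbers[profile]
    return assigned


def allelic_distance(profile_a, profile_b):
    """Number of loci at which two allelic profiles differ."""
    return sum(a != b for a, b in zip(profile_a, profile_b))


loci = {
    "glnA": {"A": "ACGT", "B": "ACGT", "C": "ACGA"},
    "gyrB": {"A": "TTGA", "B": "TTGC", "C": "TTGA"},
}
profiles = allelic_profiles(loci, ["glnA", "gyrB"])
sequence_types = assign_sequence_types(profiles)
print(profiles)
print(sequence_types)
print(allelic_distance(profiles["B"], profiles["C"]))
```

Note that the distance counts differing loci and ignores how many nucleotides differ within an allele, which is the property that makes allelic profiles robust to single recombination events that introduce several changes at once.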
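The founder rule described above can be sketched in a few lines. This is a simplified illustration and not the eBURST program itself: it assumes that STs are grouped by chaining single-locus variant (SLV) links, it breaks ties between candidate founders by taking the lowest ST label, and it omits the bootstrap support that eBURST reports. The six-locus profiles and function names are invented.

```python
# Simplified sketch of the eBURST founder rule (not the eBURST software).
# Assumption: groups are built by chaining single-locus variant (SLV) links.
from collections import defaultdict


def locus_differences(profile_a, profile_b):
    """Number of loci at which two allelic profiles differ."""
    return sum(a != b for a, b in zip(profile_a, profile_b))


def slv_links(profiles):
    """Map each ST to the set of STs that differ from it at exactly one locus."""
    links = defaultdict(set)
    sts = sorted(profiles)
    for index, st_a in enumerate(sts):
        for st_b in sts[index + 1:]:
            if locus_differences(profiles[st_a], profiles[st_b]) == 1:
                links[st_a].add(st_b)
                links[st_b].add(st_a)
    return links


def clonal_complexes(profiles):
    """Group STs connected through chains of SLV links; return groups and links."""
    links = slv_links(profiles)
    seen, groups = set(), []
    for st in sorted(profiles):
        if st in seen:
            continue
        stack, group = [st], set()
        while stack:
            current = stack.pop()
            if current in group:
                continue
            group.add(current)
            stack.extend(links[current] - group)
        seen |= group
        groups.append(sorted(group))
    return groups, links


def predicted_founder(group, links):
    """ST with the most SLVs inside the group; None for a singleton."""
    if len(group) < 2:
        return None
    members = set(group)
    return max(group, key=lambda st: len(links[st] & members))


profiles = {
    1: (1, 1, 1, 1, 1, 1),
    2: (2, 1, 1, 1, 1, 1),
    3: (1, 2, 1, 1, 1, 1),
    4: (1, 1, 3, 1, 1, 1),
    5: (2, 1, 1, 1, 1, 4),
    6: (9, 9, 9, 9, 9, 9),
}
groups, links = clonal_complexes(profiles)
for group in groups:
    print(group, "founder:", predicted_founder(group, links))
```

In this toy dataset ST 1 has three SLVs and is therefore the parsimonious founder, ST 5 is a double-locus variant of the founder reached through ST 2, and ST 6 remains a singleton.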

Phylogeography: putting the geography into phylogeny
Phylogeography attempts to infer history from the geographical variation of genes and genetically controlled characters. In the phylogenetic/population genetic approach, phylogenetic trees, networks, or clades are visualized from the observed variation data [49][50][51]. The usefulness of this approach thus lies in integrating both phylogeny and geography within a quantitative analytical framework that encompasses the diverse aspects of phylogeographic concordance [16,52]. In this context, several classes of analytical techniques are used according to their function. The first class of techniques (e.g., AMOVA, wombling, Monmonier's maximum difference algorithm, cline models fitted by maximum likelihood) extracts spatial patterns from geographically distributed genetic data to identify either geographical partitions or clines (first-order patterns, in the terminology of spatial statistics) or, alternatively, patterns of isolation by distance (second-order patterns) [53][54][55]. The second class (e.g., analysis of distance matrices, allelic aggregation index) attempts to infer historical scenarios directly from the observed distributions of genes or taxa and one or more phylogenetic models [56]. A third class of techniques, such as Slatkin's distribution, provides statistical testing of the previously inferred scenario [57]. Phylogenetic trees and networks are often visualized over a cartographic background. Spatial interpolation algorithms [58] estimate parameter values at unsampled locations from a spatial distribution of observed points, providing a means of interpreting and visualizing the sampled data at different sets of locations [59,60].
Many species show pronounced phylogeographic structure, or even regional or continental endemism, which contradicts the previously held paradigm of continuous and global panmixia. Indeed, biogeographic and macro-ecological studies at the community level have shown that relatively few free-living microbial eukaryotes have a cosmopolitan distribution [61,62]. However, prokaryotes are generally smaller and have faster reproduction cycles than the eukaryotic microorganisms that were the subject of these biogeographic studies [63]. Several studies have reported clear phylogeographic structuring in bacterial communities, including marine, soil and soil-freshwater bacteria [64][65][66]. Conversely, the absence of spatial structuring in other prokaryotes, including cyanobacteria, has been corroborated by molecular data for bacteria from those same environments [67][68][69]. For microorganisms occurring in extreme environments, phylogeographic structure indicates the effects of strong geographic isolation and dispersal constraints, although not all show clear spatial structure [70,71]. For the more widely distributed bacteria, biogeographic patterns may result from historical and/or contemporary environmental processes. The importance of these processes in structuring microbial systems is still poorly understood [72], and few studies have focused on phylogeographic structure and dispersal limitation in bacteria on a truly global scale in discontinuous but globally common habitats.

Yersinia ruckeri
Yersinia ruckeri is a Gram-negative bacterium and the causative agent of enteric redmouth (ERM) disease or yersiniosis in salmonid and non-salmonid fish reared in both fresh and marine waters. Y. ruckeri, initially isolated from rainbow trout (Oncorhynchus mykiss) in the Hagerman valley of Idaho (USA) in the 1950s [73], is now widely found in fish populations throughout North and South America, Australia, Africa and Europe [74]. The pathogen Y. ruckeri is a serologically variable, highly clonal species. It includes two biotypes: biotype 1 strains are positive for motility and lipase activity, whereas biotype 2 strains are negative for both tests [75]. The species has been grouped into 6 serovars [76], 5 O-serotypes [77], or 4 O-serotypes with different subgroups [78] by using different serotyping systems. In addition, Y. ruckeri strains can be grouped into clonal types on the basis of biotype, serotype and outer membrane protein (OMP) profiles [79]. Strains of serotypes O1a (classic serovar I) and O2b (classic serovar II) cause most epizootic outbreaks and serotype O1a is predominant in cultured salmonids [76,78].
ERM has been successfully controlled for decades by vaccination with commercial monovalent killed whole-cell vaccines. Although the formulations of most commercial vaccines are based only on serotype O1a (Hagerman strain), different degrees of cross-protection among serotypes have been described [76]. In recent years, reports of ERM vaccine breakdown have emerged in Europe and the USA, mostly attributed to biotype 2 strains [80][81][82]. Other epizootics have occurred in vaccinated Atlantic salmon (Salmo salar) in Chile, caused by serotype O1b/biotype 1 Y. ruckeri strains [83].
Molecular techniques have been used to study intraspecific genetic variability, showing low genetic diversity. Using MLEE, only four electropherotypes were identified among 47 isolates of Y. ruckeri, indicating that the genetic structure of Y. ruckeri is clonal, with one predominant clonal group [84]. The ribotypes, pulsed-field gel electrophoresis (PFGE) patterns and interspersed repetitive sequence (IRS)-PCR patterns of 30 Y. ruckeri O1a strains have been studied, revealing a high level of genetic homogeneity among all the isolates [85]. On the other hand, a total of 44 pulsotypes identified by PFGE among 160 isolates have provided better insights into the relationships between similar Y. ruckeri clones responsible for recent ERM outbreaks among salmonids [86]. A heterogeneous assemblage of phenotypes with respect to pathogenicity and host has been reported in serotype O1a Y. ruckeri strains [87], and an expansion of the clonal group theory for this species was suggested, thereby highlighting the existence of new clonal groups [88].
In the context of the genetic approach, none of the studies have focused on the sequencing and analysis of housekeeping genes to understand the Y. ruckeri population structure. The existing studies have been limited to MLSA analysis in which few isolates were included for the comparison and description of new species within the Yersinia genus [89][90][91].
Using a sequence-based approach, new studies were carried out by our research group to reconstruct the phylogeny and to characterize the molecular epidemiology and population structure of a collection of 103 strains of Y. ruckeri (Table 1). The studies included the sequencing of six housekeeping genes, glnA (glutamine synthetase), gyrB (DNA gyrase B subunit), recA (recombinase A), Y-HSP60 (heat-shock protein 60 kDa), dnaJ (heat-shock protein 40 kDa) and thrA (aspartokinase-homoserine dehydrogenase), as described previously [92].
Intraspecific sequence similarities for the individual genes ranged from 97.2 to 100% for dnaJ, 98.5 to 100% for gyrB, 98.8 to 100% for glnA, 99.4 to 100% for Y-HSP60, 98.1 to 100% for recA and 99.0 to 100% for thrA. For the concatenated sequence (2786 bp) of the protein-coding genes, the similarity ranged between 98.9 and 100%. The best-fit model of nucleotide evolution was determined using Modeltest 3.7 following the Akaike information criterion (AIC). The maximum-likelihood (ML) estimation was implemented in PHYML 3.0 without correction for substitution rate heterogeneity or estimation of invariant sites, as recommended by Modeltest. Clade support was evaluated by analyzing 1000 bootstrap pseudo-replicates. To further probe the robustness of our MLSA-based phylogeny, the DNA sequence data were analyzed by the neighbor-joining (NJ), maximum parsimony (MP) and Bayesian inference methods. All three analyses yielded similar tree topologies, which were highly congruent with our findings based on maximum likelihood (Figure 1).
The results of the MLSA confirm that there is significant diversity among Yersinia ruckeri isolates, which formed distinct clusters. Except for one isolate, all Y. ruckeri strains joined in a major cluster with a complex topology that does not seem to reflect previous typing schemes of the species [84,85]. The ML tree topology suggests considerable genetic diversity among the isolates of serotype O1a. Thus, isolates belonging to serotype O1a appear spread among different branches together with representatives of other serotypes and biotypes. Interestingly, only the groups of isolates associated with recent outbreaks in the USA, Chile and Peru fall into well-defined branches.
Based on the sequences of the six housekeeping genes available in the public database http://pubmlst.org/yruckeri/, an MLST scheme for Y. ruckeri has been developed [92]. Table 2 shows a descriptive analysis of nucleotide and allele diversity for each locus. Synonymous substitutions (dS) occurred more frequently than non-synonymous substitutions (dN) in every gene. Furthermore, all loci were shown to be under purifying selection (dN/dS < 1), meaning that selection is strong enough that most amino acid substitutions are deleterious, as is typically observed for housekeeping genes.
Among all the isolates of Y. ruckeri, 30 different sequence types (STs) were established (Table 3), 21 of which were represented by a single isolate, evidencing high genetic diversity. From this MLST scheme, eBURST analysis identified two clonal complexes (CCs), showing a common evolutionary origin for 94 isolates forming 21 STs in CC1 and for six isolates of six STs in CC2. ST 14 and ST 21 were identified as the founders of CC1 and CC2, respectively (data not shown). Furthermore, the presence of three STs not associated with any clonal complex (singletons) suggests genetic diversification among Y. ruckeri strains within the population.
All the alleles analyzed showed a nonrandom distribution, or linkage disequilibrium (index of association, IA, of 0.5563, p = 0.000), in the Y. ruckeri population, suggesting that mutation drives the diversification of this species and supporting a clonal structure for the population [10]. Furthermore, recombination events were not detected for glnA, recA, Y-HSP60, dnaJ and thrA when the DnaSP5 and RDP4 software packages were used. However, recombination was found among gyrB alleles (Rmin = 1), indicating that the relatively low genetic diversity present in the alleles analyzed could have obscured the detection of recombination at the other five loci and suggesting that recombination could occur within different subpopulations. Based on the single-locus variants (SLVs) found in the two clonal complexes and the different subgroups identified by the eBURST algorithm, the variant alleles can be used to determine the events responsible for evolution within the population [93]. Thus, the per-allele and per-site recombination/mutation (r/m) parameters were calculated empirically from 25 SLVs identified within the two clonal complexes of Y. ruckeri. Twenty of the 25 SLVs arose from a recombination event, whereas only five arose by mutation. This resulted in a per-allele r/m ratio of 4:1. In the per-site analysis, the r/m ratio was 7.5:1. These two parameters suggest that the initial steps of Y. ruckeri clonal diversification, at the level of alleles or of individual nucleotide sites, are approximately 4- and 7.5-fold more likely to occur by recombination than by point mutation. Recombination thus appears to play a greater role than mutation in the generation and maintenance of the genetic diversity of Y. ruckeri (Figure 2).
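For readers unfamiliar with the index of association, the following sketch shows one common way of computing it from allelic profiles: the observed variance of the number of loci at which pairs of isolates differ is compared with the variance expected if alleles at different loci were independent. The formulation used here (pairwise mismatch frequencies per locus, population variance over pairs) is an assumption made for illustration and may differ in detail from the software used in the study; the profiles and the function name are invented, and no significance test is included.

```python
# Sketch of the index of association (I_A) from allelic profiles.
# Assumed formulation: I_A = V_obs / V_exp - 1, with V_exp = sum_j h_j (1 - h_j),
# where h_j is the fraction of isolate pairs that differ at locus j.
from itertools import combinations


def index_of_association(profiles):
    """Return (I_A, standardized I_A) for a list of allelic profiles."""
    n_loci = len(profiles[0])
    if n_loci < 2 or len(profiles) < 2:
        raise ValueError("need at least two loci and two isolates")
    pairs = list(combinations(profiles, 2))
    # Number of loci at which each pair of isolates differs.
    mismatches = [sum(a != b for a, b in zip(p, q)) for p, q in pairs]
    mean = sum(mismatches) / len(mismatches)
    v_obs = sum((k - mean) ** 2 for k in mismatches) / len(mismatches)
    v_exp = 0.0
    for locus in range(n_loci):
        h = sum(p[locus] != q[locus] for p, q in pairs) / len(pairs)
        v_exp += h * (1.0 - h)
    if v_exp == 0.0:
        raise ValueError("no variation in the dataset")
    i_a = v_obs / v_exp - 1.0
    return i_a, i_a / (n_loci - 1)


# Two loci whose alleles always occur together: complete linkage.
linked = [(1, 1), (1, 1), (2, 2), (2, 2)]
# Every combination of alleles occurs once: no association.
shuffled = [(1, 1), (1, 2), (2, 1), (2, 2)]
print(index_of_association(linked))
print(index_of_association(shuffled))
```

Values clearly above zero, as in the completely linked example, point to a nonrandom association of alleles; in practice the significance of the observed value is usually assessed by randomizing alleles among isolates, which is not shown here.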

The epidemic model is also consistent with the epidemiology of Y. ruckeri, which suggests that ERM started as a geographically isolated disease that relatively quickly became widely disseminated [74,92]. From an epidemiological perspective, the strong association of sub-founders ST 1 and ST 2 with the majority of the ERM outbreaks in salmonid cultures also allows these STs to be linked to the virulence of Y. ruckeri strains. On the other hand, although serotypes were not strictly associated with STs in this MLST study, our results suggest that serotypes O1a and O1b are examples of recently emerged and disseminated variants. In addition, nonmotile Y. ruckeri strains (biotype 2), which have caused recent outbreaks in vaccinated fish, were included in the sub-founders ST 1 and ST 2, indicating that the biotype 2 phenotype may have evolved from related motile Y. ruckeri strains.
The phylogeographic analysis showed concordance with the eBURST diagram obtained previously for the Y. ruckeri population. Thus, it was possible to construct the complete evolutionary networks, showing the missing putative variants that link the established STs, which the eBURST algorithm had separated into two clonal complexes and three singletons (Figure 3). The inclusion of geographical data in this analysis indicates high genetic differentiation within Y. ruckeri caused by the fixation of different alleles in individual geographical areas (data not shown). These findings explain the high diversity of STs found in Europe and the USA, including those observed in non-salmonid fish species, and support the hypothesis that the majority of Y. ruckeri STs have evolved independently in specific areas. Furthermore, the presence in the USA, UK and Peru of different STs grouping nonmotile isolates provides strong evidence of the independent emergence and dissemination of biotype 2 Y. ruckeri strains in different geographical areas [92,94,95].
The sequence dataset was divided into 29 predefined subpopulations consisting of the sequences of the STs present in each geographical origin, and the geographical distances between populations were measured using geographical coordinates. The Mantel test ("isolation-by-distance" analysis) of the correlation between the genetic and geographic distance matrices showed no significant positive correlation for the full dataset (Z = 18 × 10^11, r = 0.0139, one-sided p = 0.5959), indicating a lack of overall genetic differentiation between the different geographical areas. Furthermore, demographic analyses indicate a recent global expansion of Y. ruckeri, as revealed by both Tajima's test (D = -1.669, p < 0.001) and Fu's test (Fs = -13.83056, p < 0.001). This can explain the emergence of the genetic variants that have caused recent outbreaks in farmed salmonids in several areas, such as Chile, Canada and Portugal, where isolates of uncommon serotypes and phenotypes were involved (Figure 4) [95].
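The logic of the Mantel test can be illustrated with a short permutation sketch. It assumes two symmetric distance matrices over the same populations, uses the Pearson correlation between their upper triangles as the statistic and estimates a one-sided p-value by permuting the rows and columns of one matrix together. The function names, the number of permutations and the toy coordinates are assumptions made for illustration; this is not the software used in the study.

```python
# Sketch of a one-sided Mantel permutation test on two distance matrices.
# Toy data and parameter values are hypothetical.
import math
import random


def upper_triangle(matrix):
    """Flatten the entries above the diagonal of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel_test(genetic, geographic, permutations=999, seed=0):
    """Return (r, one-sided p) for a positive genetic/geographic correlation."""
    rng = random.Random(seed)
    n = len(genetic)
    geographic_vector = upper_triangle(geographic)
    r_observed = pearson(upper_triangle(genetic), geographic_vector)
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)
        # Permute rows and columns together so the matrix stays a distance matrix.
        permuted = [[genetic[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if pearson(upper_triangle(permuted), geographic_vector) >= r_observed - 1e-12:
            hits += 1
    return r_observed, (hits + 1) / (permutations + 1)


# Five populations on a line; genetic distance grows exactly with geography.
positions = [0, 1, 3, 7, 15]
geographic = [[abs(a - b) for b in positions] for a in positions]
genetic = [[2 * d for d in row] for row in geographic]
r, p = mantel_test(genetic, geographic)
print(round(r, 4), p)
```

In this constructed example the correlation is perfect and the p-value is small, the signature of isolation by distance; a correlation close to zero with a large p-value, as reported for Y. ruckeri, indicates that genetic distance does not increase with geographic distance.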

Vibrio tapetis
Vibrio tapetis is the causative agent of brown ring disease (BRD), a major limiting factor for the culture of Manila clams (Ruditapes philippinarum) in Europe, which has been associated with large economic losses in the sector [96]. It is considered the only disease with demonstrated bacterial etiology that affects adult clams. The disease was first described in Landeda (France) in 1987, associated with an episode of mass mortality of Manila clams. Since then, it has been detected throughout the European Atlantic coast and occasionally on the Mediterranean and Adriatic coasts, as well as in Korea and Japan [97,98]. The disease is named after the most visible sign in affected animals, an abnormal deposit of brown organic material (composed mostly of conchiolin) on the inner surface of the valves, usually located between the pallial line and the edge of the shell and not subjected to calcification [97,98]. V. tapetis was initially considered a homogeneous taxon, but further studies of new isolates from different geographic and host origins, including fish and shellfish species, demonstrated the existence of intraspecific variability at the phenotypic, serological and genetic levels. Serologically, at least three serogroups were detected using slide agglutination and dot blot [99,100]. At the genetic level, differences were first detected in the plasmid content and ribotypes of the different strains [101,102]. More recently, based on ERIC-PCR (enterobacterial repetitive intergenic consensus), REP-PCR (repetitive extragenic palindromic) and RAPD (randomly amplified polymorphic DNA) analysis, three major groups associated with the host and the serogroup were established [103]. The existence of these three groups was confirmed by preliminary MLSA and by protein expression studies using 2D-PAGE. This MLSA study was performed on the basis of five protein-coding housekeeping genes but with only three strains representative of the described groups [104].
Population structure and phylogenetic analyses of V. tapetis (as well as their relationship with geography) were performed using thirty strains of different host and geographic origin. The partial sequences of ten housekeeping genes were used: atpA (α subunit of ATPase), ftsZ (cell division protein), gapA (glyceraldehyde-3-phosphate dehydrogenase), Y-HSP60 (heat-shock protein 60 kD), pyrH (uridine monophosphate kinase), rctB (replication origin-binding protein), recA (recombinase A), rpoA (α subunit of RNA polymerase), rpoD (RNA polymerase sigma factor) and topA (topoisomerase I). The selected genes had been shown to provide good phylogenetic resolution in other Vibrio species [105], and the results were also compared with those obtained by DDH [106] as well as by other typing methods [101][102][103]107]. The concatenation of these ten genes rendered a fragment of 5826 bp in length. The intraspecific sequence similarities ranged between 84.8 and 100% for individual genes and between 93.3 and 100% for the concatenated sequence.
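The concatenation and similarity figures above can be illustrated with a short sketch. It assumes that the sequences of each locus are already aligned and of equal length, and it computes similarity as simple percent identity over ungapped columns, which may differ from the distance model used in the original analysis. The gene order, strain labels and sequences are hypothetical toy values.

```python
def concatenate(loci, strain, gene_order):
    """Join the aligned sequences of one strain in a fixed gene order."""
    return "".join(loci[gene][strain] for gene in gene_order)


def percent_identity(seq_a, seq_b):
    """Percentage of identical positions, ignoring columns with a gap in either sequence."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    matches = sum(a == b for a, b in compared)
    return 100.0 * matches / len(compared)


# Hypothetical toy alignments for two loci and two strains.
loci = {
    "atpA": {"strain1": "ATGC", "strain2": "ATGA"},
    "recA": {"strain1": "GGCC", "strain2": "GGCC"},
}
gene_order = ["atpA", "recA"]
concat_1 = concatenate(loci, "strain1", gene_order)
concat_2 = concatenate(loci, "strain2", gene_order)
similarity = percent_identity(concat_1, concat_2)
```

Keeping the gene order fixed for every strain is what makes positions in the concatenated sequences comparable across strains.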
The phylogenetic reconstruction for the concatenated gene sequences was carried out using three different methods, NJ, MP and ML, with 1000 bootstrap replicates in all cases. The topology of all the trees was the same, with differences only in bootstrap values. Visual inspection of the tree obtained from the V. tapetis concatenated alignment reveals two tight clusters, one formed by most of the isolates and a second, smaller one composed of three isolates (Figure 5). These two clusters shared sequence similarities of 93.3-93.5%, while strains within each cluster showed less than 0.6% sequence divergence in cluster 1 and less than 0.4% in cluster 2, with most of the substitutions located at the third codon position.
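The observation that most substitutions fall at the third codon position can be checked with a simple tally of variable alignment columns. The sketch below is only illustrative: it assumes an ungapped, in-frame alignment whose first column is the first position of a codon, and the function names and toy sequences are hypothetical.

```python
from collections import Counter


def variable_sites(alignment):
    """Return (column index, number of distinct bases) for every polymorphic column."""
    sites = []
    for index, column in enumerate(zip(*alignment)):
        bases = set(column) - {"-", "N"}  # ignore gaps and ambiguous calls
        if len(bases) > 1:
            sites.append((index, len(bases)))
    return sites


def summarize_sites(alignment):
    """Count polymorphic sites by codon position (1-3) and by number of bases observed."""
    by_codon_position = Counter()
    by_base_count = Counter()
    for index, n_bases in variable_sites(alignment):
        by_codon_position[index % 3 + 1] += 1  # assumes the alignment starts in frame
        by_base_count[n_bases] += 1
    return by_codon_position, by_base_count


# Hypothetical toy alignment of three codons.
alignment = [
    "ATGGCAAAA",
    "ATGGCTAAG",
    "ATGGCCAAA",
]
codon_positions, base_counts = summarize_sites(alignment)
```

Here both polymorphic sites lie at third codon positions; the second tally separates sites with two bases from those with three.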
In the largest cluster, high diversity is observed regarding host and geographical origin; it contains the isolates classified as group one (represented by the type strain) and group two (represented by the isolate GR0202RD) by Rodríguez et al. [103]. Different branches are formed, most of them related to host origin: the adult Manila clam isolates cluster together with those from cockle and Venus clam in the major branch, and the corkwing wrasse isolate appears close to them. The carpet shell clam isolates fall into a separate branch, as do the wedge sole isolates, which form a cluster close to the shi drum isolates. The second cluster, formed by isolates HH6087 (halibut), 102 and 127 (R. philippinarum seed), is a very robust branch supported by a 99% bootstrap value. These isolates share their geographic origin, all of them having been isolated in the British Isles (UK and Ireland), although from different hosts. The two-cluster topology is supported by the trees generated individually for each gene.
Figure 5. Phylogenetic reconstruction based on the concatenated alignment of ten housekeeping genes of V. tapetis using the NJ method. Horizontal branch lengths are proportional to evolutionary divergence. Bootstrap values from 1000 replicates appear next to the corresponding branches.
MLST analysis revealed the heterogeneity of the population of this clam pathogen. The high variability of the population is reflected in the number of alleles identified, which ranged from 3 to 9 depending on the gene analyzed. The allele combinations led to the description of 10 STs (Table 4), all of them singletons. Even when the stringent SLV criterion was relaxed (from 9/10 to 1/10 shared alleles), no SLVs or DLVs were found (data not shown). This variability is also reflected in the 450 single-nucleotide polymorphisms (SNPs) detected across the 5826 bp surveyed. Most of the SNPs were biallelic; only 7 were triallelic. As usual for housekeeping genes, the nucleotide substitutions found throughout the concatenated sequence were more frequently synonymous (dS) than non-synonymous (dN). The dN/dS ratio shows that all the genes except rpoD are under positive selection (Table 5).
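The steps from sequences to STs and to the search for single- and double-locus variants (SLVs and DLVs) can be sketched as follows. This is a simplified illustration, not the eBURST implementation: allele and ST numbers are assigned in order of first appearance rather than taken from a curated database, and the gene names and sequences are hypothetical toy data with three loci instead of ten.

```python
from itertools import combinations


def number_alleles(sequences):
    """Give each distinct sequence of one locus an allele number (order of first appearance)."""
    numbers = {}
    alleles = []
    for seq in sequences:
        if seq not in numbers:
            numbers[seq] = len(numbers) + 1
        alleles.append(numbers[seq])
    return alleles


def allelic_profiles(loci):
    """Build one allelic profile (tuple of allele numbers) per strain."""
    columns = [number_alleles(seqs) for seqs in loci.values()]
    return list(zip(*columns))


def assign_sts(profiles):
    """Map each distinct allelic profile to a sequence type (ST) number."""
    sts = {}
    for profile in profiles:
        if profile not in sts:
            sts[profile] = len(sts) + 1
    return sts


def variant_links(sts):
    """List ST pairs that differ at exactly one locus (SLV) or two loci (DLV)."""
    links = {"SLV": [], "DLV": []}
    for (profile_a, st_a), (profile_b, st_b) in combinations(sts.items(), 2):
        differences = sum(a != b for a, b in zip(profile_a, profile_b))
        if differences == 1:
            links["SLV"].append((st_a, st_b))
        elif differences == 2:
            links["DLV"].append((st_a, st_b))
    return links


# Hypothetical toy data: sequences of four strains (same order in every list).
loci = {
    "atpA": ["AAA", "AAA", "AAT", "AAA"],
    "recA": ["CCC", "CCC", "CCG", "CCC"],
    "rpoD": ["GGG", "GGA", "GGT", "GGG"],
}
profiles = allelic_profiles(loci)
sts = assign_sts(profiles)
links = variant_links(sts)
```

In the toy data, ST1 and ST2 are SLVs of each other, whereas ST3 shares no allele with the others and would be a singleton. A population in which every ST is a singleton, as described above, would yield empty SLV and DLV lists.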
The alleles were found to be in linkage disequilibrium (the I_A value for the whole strain collection was 6.3008, p = 0.000), which points to mutation as the main cause of diversification. These data agree with the fact that, although several approaches were used to detect recombination events, some events were found only with the Rmin and Phi tests, in the atpA, pyrH, rctB, recA, rpoD and topA genes, and none with RDP4 (Table 5). However, the split graph generated with SplitsTree for all the isolates included in this study shows some structures typical of highly recombining populations (Figure 6). It has been described that the simplest method to detect recombination in aligned sequences is to look for mosaic structures by eye. Significant mosaic structure is indicative of recombinational exchange, usually among isolates of the same species [93].
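For readers unfamiliar with the index of association, the following minimal sketch computes it from allelic profiles as I_A = V_O / V_E - 1, where V_O is the observed variance of the number of loci at which two strains differ and V_E is the variance expected under linkage equilibrium. It follows the common pairwise formulation and is not necessarily identical to the program used in the original study; significance is usually assessed separately, by recomputing the index after randomly reshuffling alleles among strains, which is not shown here. The profiles are hypothetical.

```python
from itertools import combinations


def index_of_association(profiles):
    """Compute I_A = V_O / V_E - 1 from a list of allelic profiles."""
    pairs = list(combinations(profiles, 2))
    n_loci = len(profiles[0])
    # For every pair of strains, record at which loci their alleles differ.
    mismatches = [[int(p[j] != q[j]) for j in range(n_loci)] for p, q in pairs]
    # K: number of mismatching loci per pair; V_O is its variance over all pairs.
    k = [sum(row) for row in mismatches]
    mean_k = sum(k) / len(k)
    v_obs = sum((x - mean_k) ** 2 for x in k) / len(k)
    # V_E: sum over loci of h * (1 - h), with h the fraction of pairs differing there.
    v_exp = 0.0
    for j in range(n_loci):
        h = sum(row[j] for row in mismatches) / len(pairs)
        v_exp += h * (1 - h)
    return v_obs / v_exp - 1  # undefined (division by zero) if no locus is polymorphic


# Hypothetical profiles: two loci whose alleles are always inherited together...
linked = [(1, 1), (1, 1), (2, 2), (2, 2)]
# ...versus two loci whose alleles occur in every combination.
shuffled = [(1, 1), (1, 2), (2, 1), (2, 2)]
ia_linked = index_of_association(linked)
ia_shuffled = index_of_association(shuffled)
```

Values clearly above zero, as in the linked example, indicate linkage disequilibrium; values near or below zero are expected when alleles at different loci are associated at random.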
The contradiction between the results inferred from the I_A index and those from the SplitsTree analysis can be explained by analyzing the genealogies deduced with ClonalFrame after stripping the sequences of recombinational events [108]. Figure 7 shows the clonal genealogy constructed for the 30 isolates (Figure 7A) and the evolutionary events for the three nodes marked in the tree (Figure 7B), in which the height of the red line indicates the probability of recombination on a scale from 0 (row bottom) to 1 (row top) and each nucleotide substitution is represented by a black cross. Node A represents the divergence between the two clusters generated by the MLSA analysis, and the representation of evolutionary events shows that it was generated exclusively by mutation events. The opposite occurs at nodes B and C, which represent the diversification within clusters one and two, respectively, and whose evolution is produced mainly by recombination.
Table 5. Genetic characteristics, evolutionary variations and recombination among the ten loci included in the MLST scheme of the V. tapetis population.
Figure 6. Split decomposition analysis of the concatenated sequences of the 30 V. tapetis strains studied. The split graph was generated using SplitsTree v4.
Since the three isolates of cluster two of V. tapetis share their geographic origin (British Isles), a phylogeographic approach was used with the aim of correlating the geographic origin and the genetic evolution of the isolates. The concatenated sequences were divided into four subpopulations consisting of the sequences of the STs belonging to each geographic origin. For this study, regions within the same country were not taken into account. Despite the two well-differentiated groups shown in the phylogeographic network (Figure 8), no significant correlation was observed between the identified STs and geographic distance in the "isolation-by-distance" analysis (Mantel test: Z = 25 × 10^11, r = 0.1314, p = 0.9750). The values obtained for Tajima's test (D = -0.99118, p < 0.10), negative and non-significant, and for Fu's test (Fs = 29.227, p = 0.000), positive and significant, are indicative of population expansion. These indices suggest that the population has undergone a recent expansion following a bottleneck or a selective sweep.
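To make the demographic statistic more concrete, the sketch below computes Tajima's D from an alignment using the standard formula, which compares the mean number of pairwise differences with the number of segregating sites scaled by sample size. It assumes an ungapped alignment of equal-length sequences and does not compute the associated p value; the function names and toy sequences are hypothetical.

```python
import math
from itertools import combinations


def segregating_sites(alignment):
    """Number of alignment columns that contain more than one base."""
    return sum(1 for column in zip(*alignment) if len(set(column)) > 1)


def mean_pairwise_differences(alignment):
    """Average number of differing positions over all pairs of sequences."""
    pairs = list(combinations(alignment, 2))
    total = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return total / len(pairs)


def tajimas_d(alignment):
    """Tajima's D; returns None when there are no segregating sites."""
    n = len(alignment)
    s = segregating_sites(alignment)
    if s == 0:
        return None
    pi = mean_pairwise_differences(alignment)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Difference between the two diversity estimators, divided by its standard deviation.
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Hypothetical toy alignment in which every variant is carried by a single sequence.
alignment = ["TAAA", "ATAA", "AATA", "AAAT"]
d_value = tajimas_d(alignment)
```

An excess of rare variants, as in this toy alignment, gives a negative D, which is the pattern usually interpreted as a sign of recent population expansion or of a selective sweep.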
The phylogeographic network is very useful to clarify all the previous data. First, the two groups of isolates that can be observed are the same two clusters found in the phylogenetic study (MLSA) and in the inferred clonal genealogy. Node A in the clonal genealogy is represented in this network by the long branch (note that this is not an evolutionary method, so branch length is not representative of distance) generated by 370 nucleotide substitutions, which according to the reconstructed evolutionary events are likely mutations. At the ends of this branch are located nodes B and C of the clonal genealogy (the two clusters in MLSA), which are generated essentially by recombination according to ClonalFrame. These recombination events can be seen in the topology of the graph. On the other hand, the inconsistency between the results of the I_A index (predominance of mutation) and the split graph (predominance of recombination) can be explained by the number of nucleotide substitutions produced by mutation between the two clusters (370 substitutions between the two groups), which are probably masking the recombination events in the evolution of each group. Moreover, the lack of clonal groups in the population structure obtained with the eBURST algorithm can be explained by the number of evolutionary intermediates missing from the sampled population, represented as red dots in the graph.
To date, the groups defined for V. tapetis have been associated with their host origin. In this work, we described species diversification for the first time on the basis of geographic origin. The distribution of the STs identified for the V. tapetis population is shown in Figure 9. In the analyses performed, at both the phylogenetic and population levels, three isolates appear in an independent group. These isolates have different host origins, halibut and Manila clam seed, but share a common geographical origin, the British Isles. The two groups are separated by a large genetic distance, produced mostly by mutation, a finding that supports the description of two subspecies for V. tapetis: V. tapetis subsp. tapetis and V. tapetis subsp. britanniensis [106]. The former comprises the majority of the isolates regardless of their geographical origin, whereas the latter includes the British isolates. V. tapetis shows a non-clonal, panmictic population in which the two subspecies are generated by mutational events but the diversification within each of them is produced mostly by recombination [109].
Figure 9. Geographic distribution of the ten STs identified for V. tapetis. Regions of the same country are considered as a single location.

Conclusions
In conclusion, MLSA, MLST and phylogeographic analysis are successful for (i) unambiguously genotyping both Yersinia ruckeri and Vibrio tapetis species, (ii) establishing evolutionary relationships among the bacterial populations at different levels and (iii) capturing geographical structure of these pathogens. The case studies reviewed here constitute good examples of the usefulness of these powerful tools for understanding the evolution, epidemiology and genetic population/landscape of bacterial pathogens.
The results obtained in our studies suggest that the processes involved in genetic variability and evolution differ between the two species. Using the MLST approach, two different models of population expansion were detected: a mutation-based epidemic model for Y. ruckeri and a panmictic model for V. tapetis, in which recombination was the genetic event contributing most to diversification.
The phylogeographic approach indicated that well-adapted clones of Y. ruckeri expanded to become widely distributed, while V. tapetis was divided into two defined groups, one of them associated with a specific geographical area.
It is noteworthy that the observed diversification, regardless of the underlying process, could be related to host specificity to some extent, which may indicate the existence of a certain degree of functional specialization. Further studies using "omics" techniques will make it possible to test this hypothesis.