Phylogenetics, Reticulation and Evolution

Incongruence between phylogenetic trees constructed from different gene sequences has bothered practitioners for decades. Paraphyletic or polyphyletic clustering has traditionally been treated as noise that distorts the underlying genealogy. Nevertheless, recent genomic data have provided a first indication that horizontal gene transfer (HGT) in microbes and interspecific hybridization (or polyploidization) in eukaryotes challenge the doctrine of common descent. Owing to promiscuous recombination, the initial stages of life would not have had a genealogical history but a common physical one, whose graphic representation is known as evolutionary reticulation. Reticulate evolution in plants has long been recognized, and recent genomic evidence from animals also indicates its widespread occurrence. Given the mounting evidence for hybridization and polyploidy in eukaryotic taxa, it is essential to have methods to infer reticulate evolutionary histories. Considering the different forms of trans-specific genetic transfer and introgression across the tree of life, the origin of a given species may not coincide with the origin of its genes. Accordingly, estimates of molecular mutation rates might be erroneous if based on strictly genealogical thinking. With abundant new data now available, it is time to move forward, because a major shift in our understanding of species, speciation and phylogenetics is taking place.


Introduction
Since Darwin's seminal work, it has been claimed that organic diversity can be represented by a unique branching pattern of inclusive hierarchies depicting genealogical relationships among organisms [1]. This tree of life, based on shared homologies, was considered to reflect a genuine attribute of nature, shaped exclusively by descent with modification.
Nevertheless, there is neither a priori independent evidence nor a rigorous test to ensure that nature's biodiversity is organized in such a nested pattern owing to common descent [2]. In fact, it was proposed almost 20 years ago that the initial stages of life, including the origin of the last cellular ancestor, were dominated by lateral gene transfer [3]. This breakthrough challenged the doctrine of common descent by indicating that the ancestral state would not have been an individual but a community of entities sharing a common physical history rather than a genealogical one. Apparently, the three domains of life emerged independently through a sorting process from a pool of entities involved in promiscuous recombination. The reticulate evolution produced by these processes of gene recombination in prokaryotes is mimicked by repeated intercrossing (hybridization) between metazoan populations or lineages. Consequently, their evolutionary histories cannot be adequately represented as bifurcating phylogenetic trees. Instead, these deviations produce a network of relationships that remains difficult to handle, despite the numerous reconstruction methods proposed recently [4].
Traditional phylogenetic analysis applied to animal and plant phyla has run into gross, irreconcilable discrepancies since its onset. Molecular phylogenomics has corrected some of these paradoxes, but what gets clarified at one end gets muddled at another. A paradigmatic example is the recent synthesis of animal phylogeny and taxonomy [5], which is plagued with conflicts near the base of Eukaryota and Metazoa. Likewise, the phylogenomic approach to animal evolution by Telford et al. [6] resolved the most derived branches but remains contentious with regard to the placement of Eumetazoa, Bilateria, Protostomia, Deuterostomia and Lophotrochozoa. Similarly, the phylogenetic origin of major plant taxa is unclear. For example, the placement of the Celastrales-Oxalidales-Malpighiales clade within Rosidae remains one of the most confounding phylogenetic questions in angiosperms, with previous analyses placing it with either Fabidae or Malvidae [7].
Theoretically, species correspond to independent, reproductively isolated populations, although Darwin recognized interspecific hybridization as a merging process involving two ancestors. This phenomenon, in which lineages merge instead of diverging, is known as reticulate or network evolution: the origination of a lineage through the partial merging of two ancestor lineages. Hybridization has played an important role in genome diversification and in adapting organisms to their environment. Nevertheless, methods for reconstructing reticulate relationships are still in their infancy and have limited applicability. Reticulate evolution in plants has long been recognized, but recent genomic evidence from animals indicates that this phenomenon is much more common than anticipated. Given the mounting evidence of hybridization in eukaryotic taxa, it is essential to have methods to infer reticulate evolutionary histories. With abundant new data now available, it is time to move forward, because a major shift in our understanding of species, speciation and phylogenetics is taking place.
Many groups of closely related species, including insects, vertebrates, microbes and plants, have reticulate phylogenies. In microbes, lateral gene transfer is the dominant process that distorts strictly genealogical, tree-like phylogenies. In multicellular eukaryotes, hybridization and introgression among related species are of prime importance. Introgression and reticulation can thereby affect all parts of the tree of life, not just the crown species. Accordingly, conceptual issues regarding adaptive evolution, speciation, phylogenetics and comparative genomics must be revised to fit these recent findings. Reticulation is produced by phenomena such as lateral gene transfer, introgressive hybridization and polyploidization. In fact, certain alleles in a gene tree may appear more closely related to alleles from a different species than to other conspecific alleles, giving rise to instances of paraphyly or polyphyly. Such anomalous clustering in the evolutionary history of species poses serious challenges to practitioners of phylogenetic analysis, because it produces genomic regions whose genealogies are locally incongruent with the speciation pattern. Thus, phylogenetic analyses should account for the reticulate component of evolution, especially now that whole genome sequencing provides unprecedented phylogenetic information across the web of life [8]. Here, we present genetic and genomic evidence indicating the evolutionary importance of reticulation in multicellular eukaryotes and summarize relevant issues of reticulation and their bearing on phylogenetic practice.

Horizontal gene transfer (HGT)
HGT, the transfer of genetic material mainly among prokaryotes, can occur via bacterial transformation, conjugation or transduction. It does not involve mitosis or meiosis and does not require immediate ancestry. Bacterial genomes have revealed a complex evolutionary history that, for most genes, cannot be represented by a single strictly bifurcating tree. Comparative analysis of sequenced genomes indicates that lineage-specific gene loss has been common in evolution, which complicates the notion of a species tree, the notion of a last universal common ancestor, and the delimitation of taxonomic units in these asexual organisms.
In eukaryotes, HGT has been reported in phagotrophic protists, where it is largely limited to the ancient acquisition of bacterial genes. Nevertheless, standard mitochondrial genes encoding ribosomal and respiratory proteins are subject to evolutionarily frequent horizontal transfer between distantly related flowering plants. These transfers have produced a variety of genomic outcomes, including gene duplication, recapture of genes lost through transfer to the nucleus, and chimeric half-monocot, half-dicot genes [9].
Intergenomic comparisons indicate that HGT appears to be a dominant process generating innovations and complex adaptations, such as the acquisition of shade-dwelling habits in ferns. Molecular evidence indicates that the chimeric photoreceptor neochrome was acquired from hornworts, thereby optimizing phototropic responses [10]. HGT involves not only individual genes but also whole chromosomes and even nuclear genomes transferred by asexual means. In the fungal genus Fusarium, HGT was responsible for the acquisition of chromosomes that greatly increased the organism's pathogenicity [11].
The horizontal transfer of a complete genome, giving rise to a new Nicotiana species, was achieved by grafting somatic tissues of two transgenic 48-chromosome plants, Nicotiana tabacum and Nicotiana glauca. The resulting octoploid species, Nicotiana tabauca (2n = 96), has double the genome size, and its fertile F1 shows phenotypic traits intermediate between both parental species [12]. In Amborella trichopoda (the sister group to all other extant angiosperms), whole mitochondrial genome transfer and subsequent fusion with the recipient genome have been reported. The plant's huge mitochondrial (mt) genome (3.9 Mb) comes from six different genomic sources and includes mtDNA from three types of green algae, a fungus and other angiosperms. These findings emphasize the role of trans-specific genomic compatibility, fusion and syngamy in forming more complex wholes [13].
Genome-wide quantification of reticulate evolution reveals that ecological specialization somehow restricts intra- and interspecific recombination [14]. Nevertheless, genomic architecture and transposable element content are also central to HGT and to recombination potential. In addition, genomic regions differ in their levels of potential HGT and reticulate evolution, from single genes to whole genomes. Genetic distances, genomic rearrangements and genome synteny also all show evidence of HGT and network-like evolution at both whole-genome and core-genome scales. Moreover, core proteome genes have undergone reticulate evolution of complex traits and have played a decisive causal role in the radiation and adaptation of life on Earth.

Interspecific hybridization
One potential cause of gene tree/species tree discordance and concomitant polyphyly is occasional mating (hybridization) between otherwise distinct species. The resulting transfer of alleles from one parental species into the gene pool of the other via hybrid offspring (introgression) introduces variation at rates much higher than mutation.
Thus, significant levels of genomic replacement may accrue over long periods, even at low hybridization rates. This has recently been demonstrated in extant Anopheles mosquitoes [15] and in some Heliconius butterfly species, and ancient DNA has been used to detect past hybridization events [16]. These instances force us to accept an ad hoc species definition applicable only to terminal taxa rather than to the original bifurcating ancestors. Indeed, species identity changes along the branches of the tree, and the accumulation of introgressed regions can shift the majority of gene trees to a different topology [4].
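As a rough illustration of how low hybridization rates can still add up, consider a deliberately simplified model (our assumption, not one taken from the cited studies): each generation a fixed fraction m of the recipient gene pool is replaced by donor-derived alleles, with no selection and no drift, so native ancestry decays as (1 - m)^t. The function name and the rates below are hypothetical.

```python
def introgressed_fraction(m: float, generations: int) -> float:
    """Expected share of a recipient genome of donor ancestry after `generations`.

    Hypothetical neutral model: every generation a fraction m of the recipient
    gene pool is replaced by donor-derived alleles (one-way gene flow, no
    selection, no drift), so native ancestry decays as (1 - m) ** generations.
    """
    if not 0.0 <= m <= 1.0:
        raise ValueError("m must lie in [0, 1]")
    if generations < 0:
        raise ValueError("generations must be non-negative")
    native_ancestry = (1.0 - m) ** generations
    return 1.0 - native_ancestry


if __name__ == "__main__":
    # Illustrative per-generation rates, not estimates for any real taxon.
    for m in (1e-4, 1e-3):
        for t in (1_000, 10_000):
            share = introgressed_fraction(m, t)
            print(f"m = {m:g}, t = {t:>6}: {share:.1%} of the genome introgressed")
```

Under these assumptions, m = 0.001 sustained for 1,000 generations replaces roughly 63% of the genome, because (1 - m)^t is close to e^(-mt). Selection against introgressed alleles, drift and linkage would all change this figure in real populations.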
Hybridization is increasingly recognized as a widespread process between ecologically and behaviorally divergent animal species. Determining phylogenetic relationships in the presence of hybridization remains a major challenge for evolutionary biologists. If hybridization has occurred among the species of a given taxon, cladistic analysis fails to account for it, since the relationships are reticulate rather than genealogical. Because hybridization yields incongruent, intersecting data that obscure the underlying hierarchy, the results are always plagued with convergences and parallelisms of no biological relevance [17].
Recombination is a form of reticulation that mimics the problems derived from hybridization, except that it occurs at the gene level. Recombination can be diagnosed by examining the compatibility of the phylogenetic partitions supported by the polymorphic sites along a sequence. One strategy looks for changes in the most parsimonious topology along sequences, while others use a maximum chi-square test or a maximum-likelihood approach to detect specific incongruent evolutionary patterns. Unfortunately, no general method exists to place a putative hybrid in the appropriate clade.
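One simple way to examine the compatibility of partitions supported by polymorphic sites is the pairwise four-gamete test: under an infinite-sites model (no recurrent mutation), two biallelic sites whose haplotypes show all four allele combinations cannot both fit a single tree without recombination. The sketch below assumes aligned, equal-length sequences and skips columns containing gaps or ambiguous bases; the function names and the toy alignment are illustrative, and in real data an incompatibility can also reflect homoplasy rather than recombination.

```python
from itertools import combinations
from typing import List, Tuple

AMBIGUOUS = set("-N?")


def biallelic_sites(alignment: List[str]) -> List[int]:
    """Column indices with exactly two states and no gaps or ambiguous bases."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to equal length")
    sites = []
    for i in range(length):
        column = {seq[i] for seq in alignment}
        if len(column) == 2 and not column & AMBIGUOUS:
            sites.append(i)
    return sites


def incompatible_pairs(alignment: List[str]) -> List[Tuple[int, int]]:
    """Pairs of biallelic sites that fail the four-gamete test.

    Under an infinite-sites model, observing all four two-site haplotypes means
    the two partitions cannot sit on one tree without recombination (or homoplasy).
    """
    sites = biallelic_sites(alignment)
    conflicts = []
    for i, j in combinations(sites, 2):
        haplotypes = {(seq[i], seq[j]) for seq in alignment}
        if len(haplotypes) == 4:
            conflicts.append((i, j))
    return conflicts


if __name__ == "__main__":
    toy_alignment = ["AACG", "ATCG", "GACT", "GTCT"]  # hypothetical sequences
    print(biallelic_sites(toy_alignment))
    print(incompatible_pairs(toy_alignment))
```

In the toy alignment, site 1 conflicts with both sites 0 and 3, while sites 0 and 3 agree. Each incompatible pair implies at least one recombination event between the two sites, so the positions of such pairs along a real alignment hint at where breakpoints may lie.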
Introgression (also known as introgressive hybridization or interspecific gene flow) occurs when alleles from one species enter the gene pool of another through interspecific mating and the subsequent backcrossing of hybrids into parental populations. When hybridization is symmetrical, the resulting hybrid species might be polyphyletic, as might both parental species. Because hybrid speciation is often associated with whole genome duplication (polyploidy), evidence of polyploidy may strengthen the suspicion that polyphyly derives from hybrid speciation [4]. However, in several cases of putative hybrid speciation, alternative explanations have been difficult to rule out. Considering that mitochondrial alleles are more easily introgressed than nuclear ones, their heterospecific cytoplasmic origin will be detected more frequently. Consequently, mitochondrial gene trees could be particularly susceptible to the effects of introgression, and especially misleading when introgressed haplotype lineages become fixed, leaving no hint of their heterospecific origin.
The discovery of cytoplasmic introgression and the disparity between rDNA and cpDNA phylogenies of several plant groups reflect past hybridization and subsequent introgression. If an analysis includes hybrids, then no matter where they are placed, a cladistic method produces only divergently branching patterns and thus can never retrieve the correct phylogeny, leaving us with confusing and conflicting results.

Polyploidy
Polyploidy, in its allopolyploid form, is interspecific hybridization followed by whole genome duplication (WGD). As the most drastic modification a cell can experience, it involves rapid, profound and nonrandom changes in chromatin composition, segregation patterns and copy number variation of dispersed repetitive DNA [18,19]. Polyploidy is also instrumental in introgressing alien DNA into breeding lines, enabling the introduction of novel characters, as demonstrated by FISH, GISH and genetic mapping [20]. Its evolutionary role has motivated intense study because duplicated gene pathways provide new opportunities for increased body-plan complexity, organismal differentiation and adaptation through the recruitment of genes to new roles [21,22]. Polyploidy has played a significant role in the hybrid speciation and adaptive radiation of flowering plants but has been considered irrelevant to mammalian speciation because of severe disruptions in the sex-determination system and dosage compensation mechanisms [23,24]. Recent comparative genomic data have further demonstrated the evolutionary significance of polyploidy, revealing three rounds of WGD (3R hypothesis) in vertebrate evolution [25] and five rounds in flowering plants [26].
The convergence of distinct lineages through interspecific hybridization (allopolyploidy), followed by endoreduplication that increases ploidy level, is a driving force in the origin of most flowering plant species. Likewise, the grass tribe Triticeae (Hordeeae) is characterized by its evolutionary complexity, as indicated by numerous events of auto- and allopolyploidization. Introgression involving diploid and polyploid ancestors is the major factor contributing to its complex history [27]. Moreover, several analyses of multi-gene data sets demonstrated conflict between the chloroplast data set and both the nuclear and mitochondrial data sets. Nevertheless, synthetic polyploids are able to stabilize their genomes within a few generations of their formation.
Several strategies have been advanced to explain such conflicting patterns in a phylogeny [7].
Following WGD, homologous genes can be related in two ways: paralogy and orthology. Paralogous genes are related through a duplication event, whereas orthologous genes are related through speciation. Consequently, gene trees based on multigene families in polyploid species become problematic if these two forms of homology are confounded. Because of this limitation, mitochondrial single-copy genes, rather than nuclear genes, are a more reliable source of allele orthology. A gene tree that includes paralogous copies may depict species as polyphyletic because its topology reflects gene duplication as well as speciation, and the cause of this polyphyly may be misinterpreted if orthology is assumed. Because mitochondrial loci are single-copy genes rather than members of multigene families, it was long considered safe to assume orthology for sequences amplified with mitochondrial primers. Paralogy in nuclear genes nevertheless remains a serious phylogenetic challenge, considering that most angiosperms are polyploid.
If the 3R and 5R hypotheses are scientifically valid, their implication is to make the search for common ancestry irrelevant to science. To celebrate 150 years of Darwin's Origin of Species, the prestigious journal Heredity published an issue on speciation whose editorial introduction states: "many questions concerning the causes of speciation remain open and speciation continues to be one of the most actively studied topics in modern evolutionary biology" [28]. The end result is that we have a comprehensive understanding neither of speciation nor of the reality of species, and the origin of species by natural selection continues to be debated. One wonders whether the scientific community is not heading in the wrong direction by studying patterns instead of the process itself [1,3].
Along these lines, Lynn Margulis claimed that "…neodarwinism will ultimately be viewed as only a minor twentieth-century religious sect within the sprawling religious persuasion of Anglo-Saxon biology" [29].
In short, gene duplication following polyploidy can give rise to multigene families, that is, groups of locally distributed, tandemly oriented redundant genes that can subsequently be involved in non-allelic homologous recombination. Duplicated genes can follow three different fates. First, both copies can persist, keeping their sequence identity while maintaining a high level of gene expression. Second, one copy can be silenced (by physical elimination or methylation), a fate known as nonfunctionalization. Silenced copies may become pseudogenes, nonfunctional sequences that retain similarity to one or more paralogs and can confound phylogenetic analyses. The third fate is functional diversification, either neofunctionalization, in which one copy acquires a new role, or subfunctionalization, in which the copies partition or specialize the ancestral function. Clearly, these processes of gene birth and death after duplication interfere with the general assumptions of phylogenetic analysis and blur the end results.
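The orthology/paralogy distinction can be made concrete on a gene tree whose internal nodes are already labelled as speciation or duplication events: two genes are orthologs if their most recent common ancestor is a speciation node and paralogs if it is a duplication node. The sketch assumes those labels are known (in practice they must be inferred, for example by reconciling the gene tree with a species tree); the tree, gene names and helper functions are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Gene-tree node; internal nodes carry an event label, leaves do not."""
    name: str
    event: Optional[str] = None  # "speciation" or "duplication" (assumed known)
    children: List["Node"] = field(default_factory=list)


def path_to(root: Node, leaf: str) -> Optional[List[Node]]:
    """Nodes on the path from root down to the named leaf, or None if absent."""
    if not root.children:
        return [root] if root.name == leaf else None
    for child in root.children:
        sub_path = path_to(child, leaf)
        if sub_path is not None:
            return [root] + sub_path
    return None


def homology_type(root: Node, gene_a: str, gene_b: str) -> str:
    """Classify two genes by the event at their most recent common ancestor."""
    path_a, path_b = path_to(root, gene_a), path_to(root, gene_b)
    if path_a is None or path_b is None:
        raise KeyError("gene not found in tree")
    lca = root
    for node_a, node_b in zip(path_a, path_b):
        if node_a is not node_b:
            break
        lca = node_a
    if lca.event == "duplication":
        return "paralogs"
    if lca.event == "speciation":
        return "orthologs"
    raise ValueError("common ancestor lacks an event label")


if __name__ == "__main__":
    # Hypothetical: a WGD in the ancestor of species X and Y yields copies alpha
    # and beta; each copy is then split by the X/Y speciation event.
    tree = Node("WGD", "duplication", [
        Node("alpha", "speciation", [Node("X_alpha"), Node("Y_alpha")]),
        Node("beta", "speciation", [Node("X_beta"), Node("Y_beta")]),
    ])
    print(homology_type(tree, "X_alpha", "Y_alpha"))
    print(homology_type(tree, "X_alpha", "Y_beta"))
```

A tree built from X_alpha and Y_beta, mistakenly treated as orthologs, would record the duplication rather than the X/Y split, which is how undetected paralogy can make species appear polyphyletic.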

Incomplete lineage sorting
Incomplete lineage sorting (ILS) occurs when polymorphisms persist across successive speciation events, so that the true genealogy of a gene or genome region differs from the species branching pattern. Incomplete lineage sorting and introgression are the two main causes of discordance between gene trees and species trees of eukaryotic coding sequences. For instance, around 15% of human genes are more closely related to their homologs in gorillas than to those of the chimpanzee sister lineage. This anomaly probably derives from a large ancestral effective population size (Ne) and the short time span between the speciation events that separated humans, chimpanzees and gorillas. Shared polymorphisms between humans and other apes have been found at the MHC and ABO blood group loci. In the Anopheles gambiae species complex, a very large chromosomal inversion encompassing 8.5% of the genome is maintained by balancing selection [15]. Unlike lateral transfer and introgression, incomplete lineage sorting does not produce phylogenetic reticulation at the species level. Nevertheless, it confounds molecular phylogenetic analysis by making two different clades appear more closely related than they really are. A pattern arising from chance events is thus taken as genealogical.
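The link between ancestral population size, speciation timing and discordance can be made explicit with a standard result of the multispecies coalescent for three species with topology ((A,B),C) and one gene copy per species: if the internal branch lasts t coalescent units (generations divided by 2Ne for diploids), each of the two discordant gene trees has probability e^(-t)/3. The sketch below computes these probabilities and checks them by simulation; it ignores gene flow, selection and population structure, and the function names are illustrative.

```python
import math
import random
from typing import Dict


def gene_tree_probs(t: float) -> Dict[str, float]:
    """Rooted gene-tree probabilities for species tree ((A,B),C).

    t is the internal branch length in coalescent units (generations / 2Ne for
    diploids); one gene copy per species, no gene flow.
    """
    if t < 0:
        raise ValueError("t must be non-negative")
    # A and B fail to coalesce (prob e^-t), then a random pair joins first.
    discordant = math.exp(-t) / 3.0
    return {
        "((A,B),C)": 1.0 - 2.0 * discordant,
        "((A,C),B)": discordant,
        "((B,C),A)": discordant,
    }


def simulate_concordance(t: float, n: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo estimate of the probability that the gene tree matches ((A,B),C)."""
    rng = random.Random(seed)
    concordant = 0
    for _ in range(n):
        # In the (A,B) ancestral population, the A and B lineages coalesce at rate 1.
        if rng.expovariate(1.0) < t:
            concordant += 1
        # Otherwise three lineages reach the root population, and each of the
        # three pairs is equally likely to coalesce first.
        elif rng.randrange(3) == 0:
            concordant += 1
    return concordant / n


if __name__ == "__main__":
    for t in (0.1, 0.5, 2.0):
        analytic = gene_tree_probs(t)["((A,B),C)"]
        print(f"t = {t}: analytic {analytic:.3f}, simulated {simulate_concordance(t):.3f}")
```

A large Ne or a short interval between speciation events makes t small, pushing all three topologies toward a probability of 1/3 each; a long internal branch makes the gene tree almost always match the species tree.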
Several analytical methods assume that reticulation events are the sole cause of incongruence among gene trees and seek phylogenetic networks that explain all incongruences. These methods, however, overestimate the degree of reticulation when other causes of incongruence are at play. Indeed, recent studies of the human genome [30,31], Mus [32] and butterflies [33] have shown that detecting hybridization in practice is complicated by incomplete lineage sorting.
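One widely used way to separate introgression from incomplete lineage sorting is Patterson's D statistic (the ABBA-BABA test). For species related as (((P1,P2),P3),O), incomplete lineage sorting alone is expected to produce ABBA and BABA site patterns in roughly equal numbers, so a consistent excess of one pattern suggests gene flow between P3 and either P2 or P1. The sketch assumes one aligned haploid sequence per taxon and an outgroup carrying the ancestral allele; a real analysis would also assess significance, for instance with a block jackknife, which is omitted here. The toy sequences are hypothetical.

```python
from typing import Tuple

UNUSABLE = set("-N?")


def abba_baba(p1: str, p2: str, p3: str, outgroup: str) -> Tuple[int, int, float]:
    """Count ABBA and BABA patterns and return (ABBA, BABA, Patterson's D).

    Assumes species tree (((P1,P2),P3),O), one aligned haploid sequence per
    taxon, and an outgroup carrying the ancestral allele "A".
    """
    if not len(p1) == len(p2) == len(p3) == len(outgroup):
        raise ValueError("sequences must be aligned to equal length")
    abba = baba = 0
    for a1, a2, a3, anc in zip(p1, p2, p3, outgroup):
        states = {a1, a2, a3, anc}
        if len(states) != 2 or states & UNUSABLE or a3 == anc:
            continue  # keep clean biallelic sites where P3 carries the derived allele
        if a1 == anc and a2 == a3:
            abba += 1  # P2 shares the derived allele with P3
        elif a2 == anc and a1 == a3:
            baba += 1  # P1 shares the derived allele with P3
    informative = abba + baba
    d = (abba - baba) / informative if informative else 0.0
    return abba, baba, d


if __name__ == "__main__":
    # Hypothetical aligned sequences for P1, P2, P3 and the outgroup.
    print(abba_baba("ACGTAG", "GCACAT", "GTGCAT", "ACATAG"))
```

A positive D, as in the toy data, points to excess allele sharing between P2 and P3 beyond what incomplete lineage sorting alone would be expected to produce.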
Some authors claim that significant steps have been taken to put phylogenetic networks on a par with phylogenetic trees as a model for capturing evolutionary relationships. Nevertheless, despite progress in phylogenetic network inference, methods for inferring reticulate evolutionary histories while accounting for ILS remain poorly understood. Their limited applicability stems mainly from two issues: the lack of a phylogenetic network inference method that accounts for ILS, and the lack of a method to assess the confidence associated with an inference that traverses phylogenetic network space. To address these issues, methods for assessing the complexity of a network and for using the bootstrap to measure branch support of inferred networks have been developed [33].

Identifying complex patterns of genetic diversity through networks
Branching diagrams dominate phylogenetic thinking. Nevertheless, bacterial genome evolution gives rise to complex patterns that cannot be accommodated by a tree [34]. The complexity and depth of the relationships among the three domains of life defy traditional methods. For example, a web of genetic similarity built from proteomic data of 14 eukaryotes, 104 prokaryotes, 2389 viruses and 1044 plasmids clearly showed the chimeric origin of eukaryotes. These fusion events between Archaebacteria and Eubacteria would not have been detected by conventional phylogenetic algorithms and trees. The web also indicated that eukaryotic genes connected to a given prokaryotic domain tend to connect to other entities of the same domain [35]. Genes derived from Archaea or Bacteria tend to carry out different functions and act in distinct cell compartments. This complex interweaving within the web suggests an early integration of their respective genetic repertoires. Thus, web analysis is well suited to the study of deep evolutionary events.
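The contrast between a tree and a web of similarity can be sketched with a toy sequence similarity network: sequences become nodes, an edge joins any pair whose similarity exceeds a threshold, and connected components reveal clusters. Here similarity is the Jaccard index of shared k-mers, a simple stand-in chosen for illustration rather than the measure used in the cited study; the sequences, threshold and function names are hypothetical. A chimeric sequence built from fragments of two unrelated groups links them into one component, a connection that a strictly bifurcating tree cannot represent.

```python
from itertools import combinations
from typing import Dict, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """All overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Shared k-mers divided by all distinct k-mers in the pair."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_network(seqs: Dict[str, str], threshold: float, k: int = 3) -> Dict[str, Set[str]]:
    """Undirected graph linking sequences whose k-mer Jaccard index >= threshold."""
    profiles = {name: kmers(seq, k) for name, seq in seqs.items()}
    graph: Dict[str, Set[str]] = {name: set() for name in seqs}
    for a, b in combinations(seqs, 2):
        if jaccard(profiles[a], profiles[b]) >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_components(graph: Dict[str, Set[str]]) -> List[Set[str]]:
    """Clusters of mutually reachable nodes, found by depth-first search."""
    seen: Set[str] = set()
    components = []
    for start in sorted(graph):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(graph[node] - component)
        seen |= component
        components.append(component)
    return components


if __name__ == "__main__":
    # Hypothetical toy sequences: two unrelated groups plus a chimera of both.
    toy = {
        "A1": "ABCDEFGH", "A2": "ABCDEFGX",
        "B1": "PQRSTUVW", "B2": "PQRSTUVZ",
        "chimera": "ABCDESTUVW",
    }
    without_chimera = {name: s for name, s in toy.items() if name != "chimera"}
    print(connected_components(similarity_network(without_chimera, 0.25)))
    print(connected_components(similarity_network(toy, 0.25)))
```

Without the chimera the toy data split into two clusters; adding it merges them into a single component, because its first half resembles group A and its second half resembles group B.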
Reticulate patterns can also stem from inadequate analysis or data processing, misspecification of the model, or misuse of data or sequence alignments. Although network analysis allows a drastic reduction in data misinterpretation, the most important point is to be aware that genomic hybridization is a more probable explanation for the differences among gene trees [36].

Conclusions
Interspecific gene exchanges are much more common than previously appreciated. They involve not only hybridizing sister species undergoing genomic introgression but also whole groups that exchange adaptive and nonadaptive genomic regions, as exemplified by Anopheles, Heliconius and hominids. Considering that hybridization between sister species may or may not affect the species tree, estimates of introgression rates derived solely from species tree topologies can underestimate the overall level of gene flow. Thus, traits and the genes behind them can have histories very different from that of the species tree.
The only literature survey dealing with the frequency, causes and consequences of species-level paraphyly and polyphyly indicates that their incidence is taxonomically widespread [37]. Interestingly, almost 25% of the literature surveyed does not offer an explanation for polyphyletic gene trees. Polyphyly was observed in 15% of species across cnidarians, mollusks, insects, crustaceans, arachnids and echinoderms, and half of the citations dealing with these deviations attribute them to faulty taxonomy. Introgressive hybridization and incomplete lineage sorting were also invoked in one third of the 2319 species analysed, whereas inadequate phylogenetic information is invoked in only a few papers [37]. Consequently, species-level monophyly cannot be assumed as an a priori axiom. To explain the origin of polyphyly above the species level, traditional phylogenetics relies on a Lamarckian explanation and way of thinking: the environment triggers evolutionary innovations, while organisms passively adapt to the new environmental demands. Natural selection is conceived as the source and driving force that shapes life as we see it. We advise keeping phylogenetic thinking distant from, and objective about, a particular (i.e. Darwinian) evolutionary view. The search for evolutionary relationships does not require alignment with a particular world view to discover the pattern that connects [38]. Otherwise, any data set that does not fit the model is labelled as convergence or parallelism, descriptive concepts with no informational or explanatory value. The morphophysiological discrepancies observed among animal and plant phyla [5-7] are incontrovertible evidence that traditional phylogenetic thinking cannot explain the origin of body plans. Distorted presumptions about nature and inadequate or faulty methodologies conspire to maintain the present phylogenetic incongruences.
Given that HGT occurs across the tree of life, the origin of a given species will not necessarily coincide with the origin of its genes, which could have evolved in other genetic backgrounds and been transferred horizontally across reproductive barriers. Accordingly, estimates of molecular mutation rates might be erroneous if based on genealogical thinking. Some instances of polyphyly might derive not from faulty taxonomy but from unforeseen non-Mendelian mechanisms.