Duplicated Gene Evolution Following Whole-Genome Duplication in Teleost Fish

Baocheng Guo1,2,3, Andreas Wagner1,2 and Shunping He3* 1 Institute of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich 2 The Swiss Institute of Bioinformatics, Quartier Sorge-Batiment Genopode, Lausanne 3Fish Phylogenetics and Biogeography Group, Key Laboratory of Aquatic Biodiversity and Conservation, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 1,2Switzerland 3PR China


Introduction
Gene and genome duplication have been thought to play an important part in evolution since the 1930s (Bridges 1936; Stephens 1951; Ohno 1970). Ohno (1970) proposed that the increased complexity and genome size of vertebrates resulted from two rounds (2R) of whole genome duplication (WGD) early in vertebrate evolution, which provided raw material for the evolutionary diversification of vertebrates. Recent genomic sequence data provide substantial evidence for the abundance of duplicated genes in many organisms.
Extensive comparative genomic studies have demonstrated that teleost fish experienced another round of genome duplication, the so-called fish-specific genome duplication (FSGD) (Amores et al. 1998; Taylor et al. 2003; Meyer and Van de Peer 2005). Because the timing of this WGD and the radiation of teleost species approximately coincided, it has been suggested that the large number of teleosts (about 27,000 species, more than half of all vertebrate species (Nelson 2006)) and their tremendous morphological diversity might be causally related to the FSGD event (Amores et al. 1998; Taylor et al. 2001; Taylor et al. 2003; Christoffels et al. 2004; Hoegg et al. 2004; Vandepoele et al. 2004). Semon and Wolfe (2007) showed that thousands of genes that remained duplicated when Tetraodon and zebrafish diverged subsequently underwent reciprocal loss in these two species, which may have been associated with reproductive isolation between teleosts and eventually contributed to teleost diversification. A study in yeast demonstrated that speciation of polyploid yeasts may be associated with reciprocal gene loss at duplicated loci (Scannell et al. 2006). Thus, speciation accompanied by differential retention and loss of duplicated genes after genome duplication may be a powerful lineage-splitting force (Lynch and Conery 2000). For two reasons, teleost fish represent an excellent model system to study the retention and loss of duplicated genes as well as their evolutionary trajectory following whole-genome

Identifying duplicated genes that resulted from the FSGD event throughout the teleost genomes
We obtained 23,155 gene families from the database HOMOLENS version 4 (ftp://pbil.univ-lyon1.fr/databases/homolens4.php) (Penel et al. 2009), which is based on Ensembl release 49. We chose HOMOLENS because it allowed us to reliably retrieve sets of orthologous genes for our evolutionary analysis. HOMOLENS is devoted to metazoan genomes and contains gene families from the complete animal genomes found in Ensembl. It has the same architecture as HOVERGEN (Duret et al. 1994), in which genes are organized in families that include precalculated alignments and phylogenies. In HOMOLENS 4, alignments are computed using MUSCLE (Edgar 2004) with default parameters, and phylogenetic trees are computed with PHYML, using the JTT amino acid substitution model (Jones et al. 1992). Phylogenies are based on conserved blocks of the alignments selected with Gblocks (Castresana 2000). Each phylogenetic tree is reconciled with a species tree using the program RAP (Dufayard et al. 2005), which, combined with the tree pattern search functionality, allows detection of ancient gene duplications or selection of orthologous genes (Penel et al. 2009). Several studies on duplicated gene evolution have been performed with data retrieved from HOMOLENS (Brunet et al. 2006; Studer et al. 2008).
We employed a topology-based method to identify duplicated genes that resulted from the FSGD event in the five teleost genomes we studied. Briefly, if two teleosts have been subject to the same whole genome duplication event, a gene X that was duplicated in this event and retained in both genomes should form two gene lineages, "Xa" and "Xb" (Figure 1A). We identified gene trees with the topology shown in Figure 1A using the TreePattern functionality (Dufayard et al. 2005) of the FamFetch client for HOMOLENS. We required duplicated genes to exist in at least two species to increase the likelihood that they result from the FSGD event (Figure 1B).
In total, we identified 1,500 gene families with duplicated genes in this way.
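The topology criterion described above can be illustrated with a minimal Python sketch. This is not the TreePattern implementation; it only mimics the idea on toy data. The nested-tuple tree encoding, the "species|gene_id" leaf labels, the function names, and the species list are all assumptions made for illustration.

```python
# Assumption: a gene tree is a nested tuple and each leaf is a "species|gene_id" string.
# Assumption: the species names below are illustrative placeholders for the five study species.
TELEOSTS = frozenset({"zebrafish", "medaka", "stickleback", "Takifugu", "Tetraodon"})


def leaf_species(node):
    """Return the set of species found among the leaves under this node."""
    if isinstance(node, str):
        return {node.split("|")[0]}
    species = set()
    for child in node:
        species |= leaf_species(child)
    return species


def find_candidate_duplications(node, teleosts=TELEOSTS, min_shared=2):
    """Return the outermost binary nodes whose two child clades contain only
    teleost genes and share at least `min_shared` species (the 'N >= 2' rule)."""
    if isinstance(node, str):
        return []
    if len(node) == 2:
        left, right = leaf_species(node[0]), leaf_species(node[1])
        if left <= teleosts and right <= teleosts and len(left & right) >= min_shared:
            return [node]  # report this node and do not search inside it
    hits = []
    for child in node:
        hits.extend(find_candidate_duplications(child, teleosts, min_shared))
    return hits


# Toy tree: a human outgroup plus "Xa" and "Xb" clades sharing two teleost species.
toy_tree = (
    "human|X",
    (
        ("zebrafish|Xa", ("Takifugu|Xa", "Tetraodon|Xa")),
        ("zebrafish|Xb", "Tetraodon|Xb"),
    ),
)

# Toy tree with a duplication seen in one species only; it fails the N >= 2 rule.
lineage_specific_tree = ("human|X", ("zebrafish|Xa", "zebrafish|Xb"))

if __name__ == "__main__":
    print(len(find_candidate_duplications(toy_tree)))
    print(len(find_candidate_duplications(lineage_specific_tree)))
```

Requiring at least two shared species is what separates a candidate FSGD duplication from a duplication that arose later in a single lineage.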

Differential retention and loss of duplicated genes during teleost diversification
The most common fate of a duplicated gene is nonfunctionalization (pseudogenization). After a whole genome duplication event, many genes share this fate, so that a genome's gene content may appear only slightly increased long after the duplication (Wolfe and Shields 1997; Jaillon et al. 2004). Our data suggest that only 3.3 percent (zebrafish) to 7.2 percent (Takifugu) of genes in current teleost genomes result from the FSGD event (Table 1). These percentages are lower than the 13 percent of retained duplicates in yeast (Wolfe and Shields 1997). One possible reason for this difference might lie in our topology-based method to identify likely FSGD duplicates (Figure 1), which requires duplicated genes to exist in at least two teleost genomes. Our method would thus overlook duplicated genes that result from the FSGD but are retained in only one teleost genome. While we cannot exclude this possibility, we note that our observations are consistent with a genome-wide study of Tetraodon, in which Jaillon et al. (2004) showed that up to 3 percent of duplicated genes may have been retained since the FSGD event. One plausible explanation of the difference in duplicated gene retention between teleosts and yeast may come from the different ages of the genome duplication events. In addition, Kassahn et al. (2009) suggested that a minimum of 3 to 4 percent of protein-coding loci have been retained in two copies in each of the five model
(B) Tree topology we used for duplicated gene identification in the database HOMOLENS 4. 'N ≥ 2' means that duplicated gene pairs must exist in at least two species to increase the likelihood that the duplicated genes actually resulted from the FSGD event.
(Hoegg et al. 2004; Vandepoele et al. 2004), whereas the yeast whole genome duplication may have occurred more recently, between 100 and 150 MYA (Sugino and Innan 2005). More time has elapsed since the FSGD, allowing more duplicate genes to be lost.
Differential retention and loss of duplicated genes is a common phenomenon during speciation after genome duplication. It has been observed in yeast (Scannell et al. 2006) as well as in teleosts (Semon and Wolfe 2007), and is thought to contribute to speciation. We thus expected that our dataset would contain many gene families with differential gene retention and loss, and fewer families in which both copies are retained in all five teleost genomes. Indeed, when we consider all five species together, 90.4 percent of the 1,500 gene families we identified show differential retention and loss of duplicated genes, and in only 9.6 percent (144 gene families) are both copies retained in all five teleost genomes. Figure 2 and Table 1 show the relevant data, broken down by study species. In 45.4 percent to 89.3 percent (depending on the species) of the 1,500 gene families we identified, both duplicates were retained. In 9.9 percent to 44.6 percent of the duplicates (depending on the species), one copy was lost. Our data also indicate that differences in differential gene retention are associated with the phylogenetic position of, and the relatedness between, two teleost species (Figure 2). Taken together, these observations indicate that differential duplicated gene retention and loss are pervasive in teleosts, that the loss of duplicated genes is an ongoing process that has continued for hundreds of millions of years after the FSGD event, and that this process may be associated with teleost diversification.
We next discuss an illustrative example of differential duplicate gene retention and loss. It involves Hox genes, which encode a subclass of homeodomain transcription factors that help determine the anterior-posterior axis of bilaterian animals (McGinnis and Krumlauf 1992). In vertebrates, Hox genes have evolved a highly compact organization, in which genes are arranged in clusters on chromosomes.
Hox gene clusters are one of the best-studied systems for assessing gene retention and loss after the FSGD event (Amores et al. 1998; Prohaska and Stadler 2004; Hoegg et al. 2007; Guo et al. 2010), owing to their genomic architecture and the variation in their gene complement among teleosts. Seven or eight Hox clusters with different complements of Hox genes exist in extant diploid teleosts. They are a result of the FSGD event, which was followed by the loss of some Hox gene duplicates. The putative Hox cluster complement of the teleost ancestor and the Hox clusters of several model teleost species are shown in Figure 3. Hox clusters exhibit remarkably different gene complements in different teleost lineages after the FSGD event. Theoretically, 8 Hox clusters containing at least 80 Hox genes may have existed in the ancestor of teleosts after the FSGD event. To date, 66 of these Hox genes have been found in different teleost species, and extant evolutionarily diploid teleosts usually have 45 to 49 Hox genes in their genomes (Figure 3). According to the summary of Hoegg et al. (2007) (Figure 3), the Ostariophysi have lost seven Hox genes since their hypothetical common ancestor with the Neoteleosts; eight Hox genes were lost during the evolution of the Neoteleosts; and the pufferfish lineage lost three genes in the common lineage leading to Takifugu and Tetraodon. Some Hox genes are specifically preserved in particular teleosts. For example, HoxA1b has been identified thus far only in the Japanese eel (Guo et al. 2010). At the cluster level, eight Hox clusters were retained in basal species such as the Japanese eel (Guo et al. 2010) and the goldeye (Chambers et al. 2009), whereas one Hox cluster (C or D) was lost in the Otocephala (Amores et al. 1998) and the Euteleostei (Kurosawa et al. 2006), respectively. Based on the phylogeny of teleosts, Guo et al. (2010)
Fig. 2. Differential retention and loss of duplicated genes during teleost diversification. The topology is adopted from Negrisolo et al. (2010). *: retention of both copies; **: retention of one copy; ***: loss of both copies.

www.intechopen.com
Fig. 3. Hox gene clusters, the best-studied examples of differential duplicate gene retention and loss in teleosts. Hypothetical Hox clusters of the teleost ancestor (modified from Guo et al. 2010), and Hox clusters of teleost model fish species, together with specific gene loss events shown on a phylogenetic tree of selected fish species (adapted from Hoegg et al. 2007).
was lost independently in the Otocephala and Euteleostei after the FSGD event. The ongoing process of Hox gene loss and retention in teleosts illustrates again that degeneration of functionally important duplicated genes can last for hundreds of millions of years after the FSGD event.
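The family-level classification used in this section (both copies retained in every species versus differential retention and loss) can be sketched in Python as follows. The data layout, the family identifiers, and the function names are hypothetical, and the toy counts are invented for illustration only; they are not the data behind Table 1 or Figure 2.

```python
def classify_family(copies_by_species):
    """Label a family 'all_retained' if every species keeps both FSGD copies,
    otherwise 'differential' (at least one species has lost a copy)."""
    if all(n == 2 for n in copies_by_species.values()):
        return "all_retained"
    return "differential"


def summarize(families):
    """Return the percentage of families in each class."""
    if not families:
        raise ValueError("no gene families given")
    counts = {"all_retained": 0, "differential": 0}
    for copies in families.values():
        counts[classify_family(copies)] += 1
    total = len(families)
    return {label: 100.0 * n / total for label, n in counts.items()}


# Toy input (hypothetical): number of FSGD-derived copies (0, 1 or 2) per species.
toy_families = {
    "fam1": {"zebrafish": 2, "Takifugu": 2, "Tetraodon": 2},
    "fam2": {"zebrafish": 1, "Takifugu": 2, "Tetraodon": 2},
    "fam3": {"zebrafish": 2, "Takifugu": 1, "Tetraodon": 1},
    "fam4": {"zebrafish": 0, "Takifugu": 2, "Tetraodon": 2},
}

if __name__ == "__main__":
    print(summarize(toy_families))
```

Applied to the counts reported in the text, the same arithmetic gives 144 / 1,500 = 9.6 percent of families with both copies retained in all five species, and the remaining 90.4 percent with differential retention and loss.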

Molecular evolution of duplicated genes
We next wished to study patterns of sequence evolution in the 1,500 duplicate gene families we had identified. To this end, we downloaded both nucleic acid and amino acid sequences for genes in these families. For each species, we retained only one gene copy in each duplicated clade (Figure 1B) for further analysis, and discarded all other copies in those gene families where additional duplications had occurred after the FSGD event. We then aligned the amino acid sequences within each gene family with MUSCLE (Edgar 2004), and derived DNA alignments from the protein alignments with RevTrans (Wernersson and Pedersen 2003). All subsequent computations were done on these DNA alignments. We estimated the nucleic acid evolutionary distance between fish genes and their human orthologs using the LogDet nucleotide substitution model (Tamura and Kumar 2002) in PHYLIP-3.6b (Felsenstein 2004).
Previous studies show that duplicated genes in yeast often diverge asymmetrically (Kellis et al. 2004), meaning that one copy evolves significantly faster than the other. We asked whether this is also the case for teleost duplicates. To this end, we compared the evolutionary distances of duplicated genes to their human orthologs within the 1,500 gene families we had identified. There is indeed evidence for asymmetric evolution between duplicated gene pairs from the FSGD event (Table 2). The average evolutionary distances to the human ortholog differ significantly between the two members of duplicated gene pairs in each of our five teleost species (paired t-test: P < 4.8 × 10^-95). Because both members of a duplicated gene pair stemming from the FSGD diverged from their human ortholog at the same time, differences between evolutionary distances translate directly into differences between evolutionary rates. Taken together, our observations suggest that duplicate genes tend not to accumulate sequence change at the same rate. Our results are consistent with previous work in teleosts (Brunet et al. 2006; Steinke et al. 2006) and yeast (Kellis et al. 2004), and confirm that asymmetric sequence evolution between duplicated genes is a frequent pattern of duplicated gene evolution after a genome duplication event.
Table 2. Average evolutionary distances of duplicated genes in five teleost species to their human orthologs. Duplicate_L: duplicated gene in each duplicate pair that has the larger distance to the human orthologue (distances averaged over all duplicate gene families); Duplicate_S: duplicated gene in each duplicate pair that has the smaller distance to the human orthologue (distances averaged over all duplicate gene families). All means are ± one standard deviation. *: paired t-test.
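The comparison of Duplicate_L and Duplicate_S distances can be sketched with a paired t statistic computed from first principles. This is a minimal illustration under stated assumptions: the function name and the toy distances are hypothetical, the values are not taken from Table 2, and the P-value lookup against the t distribution is left out.

```python
import math
from statistics import mean, stdev


def paired_t(pairs):
    """Return (t, df) for a paired t-test on (distance_1, distance_2) tuples.

    Each pair is first ordered into a larger (Duplicate_L) and a smaller
    (Duplicate_S) distance, following the layout described for Table 2.
    """
    diffs = [max(a, b) - min(a, b) for a, b in pairs]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two duplicate pairs")
    standard_error = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / standard_error, n - 1


# Toy distances to the human ortholog for four duplicate pairs (invented values).
toy_pairs = [(0.9, 0.7), (0.9, 1.2), (0.8, 0.7), (0.6, 1.1)]

if __name__ == "__main__":
    t_value, df = paired_t(toy_pairs)
    print(f"t = {t_value:.3f}, df = {df}")
```

Because each pair is ordered by size before the test, every difference is non-negative by construction, which is worth keeping in mind when reading the resulting statistic. Since both copies diverged from the human ortholog at the same time, a difference in distance corresponds to a difference in rate.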

Conclusion
In summary, we used a phylogenetic method to identify 1,500 duplicated gene families in five teleost species that are likely to have resulted from the FSGD event. Only a small fraction of genes in extant teleost genomes have been retained as duplicates from the FSGD event. Differential retention and loss of duplicated genes is pervasive in the five species we studied, as illustrated by genes in the teleost Hox gene clusters. Sequence analysis suggests that duplicated gene pairs may evolve asymmetrically. Our work provides a framework for future studies of the evolutionary trajectory of duplicated genes in teleost genomes.