Microbial Biodiversity and Biogeography on the Deep Seafloor

Microbes are widely distributed on and within the Earth (Gold, 1992; Whitman et al., 1998). They have co-evolved with the Earth throughout its history and have acquired their diversity along the way. Although most microbes (an estimated 99% or more of the total number of species present on Earth) are still uncultivated (Amann et al., 1995), vigorous surveys of natural environments by microbiologists, from the cold poles to hot deep-sea vents, have revealed the wide distribution of microbes. The accumulation of 16S rRNA gene sequence data and the development of useful bioinformatics tools allow us to draw a big picture of microbial biodiversity and biogeography in natural environments (Martiny et al., 2006). This will help us to address some fundamental questions about microbial communities: where they are; what species are present there; how they constitute communities; what factors control the diversity and distribution patterns of the communities; and how they have evolved from the past to the present and will evolve in the future. Considering their wide distribution and powerful metabolic functions, microbes are likely to contribute to the maintenance of the whole ecosystem on Earth and to global climate change.

on the ecosystem functioning (Duffy & Stachowicz, 2006; Prosser et al., 2007), it is important to measure and interpret correctly the microbial diversity in natural environments. Microbial biogeography is the descriptive and explanatory study of microbial biodiversity over space and time. It aims to reveal where microbes live, what kinds of microbes are present there, and how abundant they are. The scope of biogeography extends to understanding the underlying mechanisms that generate and maintain the distribution patterns of microbial communities in natural environments. Traditional biogeography has focused on large eukaryotes, such as plants and animals. The recent development of molecular biological techniques has enabled us to approach the biogeography of microbes, including protists and prokaryotes, at the gene sequence level (Darling et al., 2000; Whitaker et al., 2003). This offers a challenge to the famous classical Baas Becking hypothesis, 'Everything is everywhere, but the environment selects' (Baas Becking, 1934). To assess how environmental similarity (i.e., contemporary physicochemical conditions) and geographic distance (i.e., historical events) affect the distribution patterns of the biodiversity of microbial communities or populations, diversity is related to the physicochemical and geographic differences among environments. A variety of habitats in natural environments have been targeted for microbial biogeography, such as soil (Cho & Tiedje, 2000; Fierer & Jackson, 2006), lake sediments (Yannarell & Triplett, 2005), ocean (García-Martínez & Rodríguez-Valera, 2000), deep-sea sediments (Schauer et al., 2009) and deep-sea hydrothermal fields (Kato et al., 2010). These studies have suggested that the distribution patterns of microbes in habitats are controlled not only by environmental factors (e.g., pH, salinity and oxygen concentration) but also by geographic isolation.
However, further data collection and reliable explanations are needed to propose and evaluate theories regarding the generation and evolution of the distribution patterns of microbes in natural environments. While the study of microbial biodiversity and biogeography may seem merely descriptive, it is the first step and the essential basis for theory construction in microbial ecology (Prosser et al., 2007).

Seafloor microbial communities
The deep seafloor is seemingly an unrelieved, monotonous and poor environment, like a desert where organisms are scarce. In fact, this notion is not always correct. There are varied environments on the deep seafloor, such as hydrothermal vents, cold seeps, iron-rich mats, outcrops of young crustal rocks and aged ferromanganese crusts (hereafter, Mn crusts). Furthermore, phylogenetically and physiologically diverse microbes (especially prokaryotes, i.e., the domains Bacteria and Archaea) thrive in these environments (e.g., Takai & Horikoshi, 1999; Inagaki et al., 2002; Santelli et al., 2008; Kato et al., 2009a; Nitahara et al., 2011). Following the Baas Becking hypothesis, these microbes may adapt to each environment and should form a unique community structure in each. However, the hypothesis has not been well tested for the microbial communities on the deep seafloor, especially in non-hydrothermal and unsedimented areas far from land where organic inputs derived from surface photosynthetic ecosystems are not significant. Such deep seafloor accounts for a large part of the surface area of Earth (Smith & Sandwell, 1997). Microbes on and within the seafloor are thought to play a role in geochemical cycling between the oceans and Earth's crust (Edwards et al., 2005). Hence, understanding the microbial diversity and biogeography on the deep seafloor is important for modeling the global relationship between microbes and Earth at present, and such models can also be applied to the past and future. Furthermore, massive sulfide deposits and Mn crusts on the deep seafloor have recently attracted attention as mineral resources (Rona, 2003; Hoagland et al., 2010). There are diverse microbes on and in these seafloor minerals (Kato et al., 2010; Nitahara et al., 2011). Microbial biogeography on the deep seafloor may ultimately contribute to the development of deep-sea mining techniques that utilize microbes.

Analytical methods
The study of microbial biodiversity and biogeography starts with collecting data on microbial communities in the environment. In this chapter, we introduce methods for analyzing microbial communities based on the nucleotide sequences of genes, especially the small subunit ribosomal RNA gene (called the 16S rRNA gene for prokaryotes), which is generally used for such analyses of biodiversity and biogeography. The 16S rRNA gene has several merits for this analysis: 1) all prokaryotes have this gene; 2) its sequence length (approximately 1500 bases) is moderate and adequate for analysis; 3) it contains conserved regions that allow PCR primers to be designed; 4) it contains variable regions that allow sequences to be affiliated at the species level; 5) an enormous number of sequences have been deposited in public databases and can be used conveniently; and 6) useful bioinformatics tools specialized for this gene are available. The recent rapid development of molecular biological techniques, including gene amplification and nucleotide sequencing, has enabled us to approach the unexpected biodiversity of microbes in natural environments. In particular, next-generation sequencing techniques (e.g., pyrosequencing) open new windows for approaching microbial biodiversity (Sogin et al., 2006). However, it is hard to draw biological meaning from biodiversity estimates based on the short sequences (~400 bases) produced by next-generation sequencing. Nearly full-length sequences of 16S rRNA genes, at least, are needed to connect biodiversity to ecology. It should be noted that information on gene sequences cannot be related to ecology directly. To connect biodiversity to ecology, even the determination of whole-genome sequences is not enough, and information on the function of the microbes represented by the gene sequences must be obtained by culture-dependent analysis. However, cultivation of all microbes in an environment is impossible at present. Indeed, most prokaryotes on Earth are still uncultivated (Amann et al., 1995).
There are several steps in measuring and comparing the biodiversity of microbial communities based on 16S rRNA gene analysis (Figure 1). The target environmental samples are collected, genomic DNA is extracted from the samples, and then the 16S rRNA gene sequences in the genomic DNA extracts are determined by PCR-cloning-sequencing analysis. In some cases, analyses of electrophoresis patterns without a sequencing step, such as denaturing gradient gel electrophoresis (DGGE) and terminal restriction fragment length polymorphism (T-RFLP), are also used for biodiversity measurement; however, in contrast to sequencing analysis, these analyses mask valuable information on the biodiversity (Nocker et al., 2007). It should be noted that PCR-cloning-sequencing analysis alone is insufficient for the determination of biodiversity because of methodological biases (Wintzingerode et al., 1997), in addition to its relatively high cost in time and money. For example, even when the same DNA extract was used, the diversity and composition of microbial communities determined using different primer sets were dramatically different from each other. Hence, for reliable assessment of the biodiversity and distribution pattern of microbial communities, the 16S rRNA gene sequences used for comparative analysis should be obtained by the same method. All of the collected 16S rRNA gene sequences are aligned into an alignment dataset. The alignment process is very important for accurate assessment of biodiversity and biogeography because an incorrect alignment causes overestimation of biodiversity and of the differences among communities. Alignments have often been performed with multiple sequence alignment tools such as ClustalW (Larkin et al., 2007) and MUSCLE (Edgar, 2004). However, for large 16S rRNA gene datasets of one thousand sequences or more, these tools take an immense amount of time.
Recently, improved alignment methods that incorporate the secondary structure of 16S rRNA and use a reference alignment have been provided by several 16S rRNA gene database projects, such as RDP using Infernal (Cole et al., 2009), Greengenes using the NAST aligner (Desantis et al., 2006), and SILVA using the SINA aligner (Pruesse et al., 2007). However, these methods, in particular NAST, have predicted higher diversity than the pairwise and multiple alignment methods (Schloss, 2010). In our experience, sequences with long insertions seem to be better aligned by Infernal, as built into RDP, than by the other methods. Several prokaryotes, especially Archaea, are known to have long insertions (including introns) in their 16S rRNA genes (Burggraf et al., 1993; Itoh et al., 1998; Itoh et al., 2003). Chimeric sequences are often observed in datasets. Such chimeric sequences must be removed from the datasets before the subsequent analysis using chimera-check tools such as Mallard (Ashelford et al., 2006) and Bellerophon (Huber et al., 2004). NAST automatically checks for chimeric sequences in the datasets and removes the chimeric part of the sequences. However, sequences with relatively long insertions also seem to be recognized as chimeric by NAST. Until the alignment method and chimera-check function are improved further, we recommend not using NAST for the subsequent phylogenetic analysis. Overall, for alignment of 16S rRNA gene sequences, we recommend the SINA aligner built into SILVA, or MUSCLE when the number of sequences is small. Finally, the accuracy of the alignment dataset should be confirmed by eye. After the construction of a 16S rRNA gene alignment dataset, the sequences are assigned to operational taxonomic units (OTUs) or phylotypes for each habitat. An OTU is a group of mutually similar sequences, defined by a genetic distance threshold.
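The assignment of sequences to OTUs at a given distance cut-off can be illustrated with a minimal Python sketch. It assumes furthest-neighbor (complete-linkage) clustering as one common choice; the tools named in this chapter may use other algorithms or defaults. The function name, sequence names and distance values are hypothetical and serve only as an illustration.

```python
from itertools import combinations


def cluster_otus(names, dist, cutoff):
    """Agglomerative furthest-neighbor (complete-linkage) clustering.

    names  : list of sequence identifiers
    dist   : dict mapping frozenset({a, b}) -> pairwise genetic distance
    cutoff : largest distance allowed between any two members of one OTU
    """
    clusters = [[n] for n in names]

    def linkage(c1, c2):
        # Complete linkage: cluster distance is the largest pairwise distance.
        return max(dist[frozenset((a, b))] for a in c1 for b in c2)

    while len(clusters) > 1:
        # Find the closest pair of clusters under complete linkage.
        (i, j), d = min(
            (((i, j), linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if d > cutoff:
            break  # no remaining pair can be merged within the cut-off
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical distance matrix for four sequences (illustrative values only).
names = ["s1", "s2", "s3", "s4"]
pairs = {("s1", "s2"): 0.01, ("s1", "s3"): 0.04, ("s2", "s3"): 0.04,
         ("s1", "s4"): 0.25, ("s2", "s4"): 0.26, ("s3", "s4"): 0.24}
dist = {frozenset(k): v for k, v in pairs.items()}

for cutoff in (0.03, 0.05):
    print(cutoff, cluster_otus(names, dist, cutoff))
```

With these toy distances, the stricter cut-off keeps s3 as its own OTU, whereas the looser cut-off merges it with s1 and s2, which shows why the same cut-off must be used when communities are compared.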
In general, similarity thresholds of 97% (0.03 cut-off), 95% (0.05 cut-off) and 80% (0.20 cut-off) are used as species-, genus- and family-level taxonomic definitions, respectively (Ludwig et al., 1998). For comparative analysis of communities, the same definition level of OTUs must be used. Assignment of sequences to OTUs can be performed using DOTUR (Schloss & Handelsman, 2005) or its successor mothur (Schloss et al., 2009). DOTUR requires a distance matrix generated from the alignment dataset using ARB (Ludwig et al., 2004) or DNADIST in the PHYLIP package (Felsenstein, 1989), whereas mothur can generate the matrix itself. α-diversity measures can be calculated using the distance matrix generated from the alignment dataset: Chao1 species richness estimates, the abundance-based coverage estimator of species richness (ACE) and rarefaction curves (species-based, qualitative), Shannon and Simpson indices (species-based, quantitative), Phylogenetic Diversity (PD; divergence-based, qualitative) and θ (divergence-based, quantitative), and so on (Lozupone & Knight, 2008). The species-based measurements of α-diversity can all be performed using mothur. PD and θ can be calculated by PHYLOCOM (Webb et al., 2008) and ARLEQUIN (Excoffier et al., 2005), respectively. β-diversity can also be measured using the distance matrix generated from the alignment dataset: Sørensen and Jaccard indices (species-based), UniFrac and FST (divergence-based), and so on (Lozupone & Knight, 2008). These β-diversity values provide measures of distance between pairs of communities. Furthermore, the resulting distance matrix can be used for multivariate statistical techniques such as clustering [e.g., unweighted pair group method using arithmetic average (UPGMA)] and ordination [e.g., principal coordinate analysis (PCoA)]. Several species-based measures of β-diversity can be calculated using SONS (Schloss & Handelsman, 2006), which has been incorporated into mothur.
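To make the species-based measures concrete, the following Python sketch computes a few of them from OTU abundance counts and OTU membership sets. It is only an illustration under stated assumptions: Chao1 is written in its bias-corrected form, Simpson's index is taken as the plain sum of squared proportions, and Jaccard and Sørensen are expressed as similarities; software packages may report other variants. All function names and input values are hypothetical.

```python
import math


def chao1(counts):
    """Bias-corrected Chao1 richness estimate from per-OTU clone counts."""
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)  # singletons
    f2 = sum(1 for c in counts if c == 2)  # doubletons
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


def shannon(counts):
    """Shannon index H' = -sum(p_i * ln p_i)."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson(counts):
    """Simpson index as the sum of squared proportions (one of several variants)."""
    total = sum(counts)
    return sum((c / total) ** 2 for c in counts)


def jaccard(otus_a, otus_b):
    """Jaccard similarity: shared OTUs divided by all OTUs observed in either community."""
    return len(otus_a & otus_b) / len(otus_a | otus_b)


def sorensen(otus_a, otus_b):
    """Sørensen similarity: twice the shared OTUs divided by the summed OTU counts."""
    return 2 * len(otus_a & otus_b) / (len(otus_a) + len(otus_b))


# Hypothetical clone library: number of clones assigned to each OTU.
library = [5, 3, 1, 1, 2]
print(chao1(library), shannon(library), simpson(library))

# Hypothetical OTU membership of two communities.
rock = {"otu1", "otu2", "otu3"}
seawater = {"otu2", "otu3", "otu4", "otu5"}
print(jaccard(rock, seawater), sorensen(rock, seawater))
```

The α-diversity functions take one community at a time, whereas the β-diversity functions take a pair of communities; a distance for clustering or ordination can be obtained as one minus the similarity.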
UniFrac (Lozupone & Knight, 2005; Lozupone et al., 2006), and its successor Fast UniFrac (Hamady et al., 2009), is an effective divergence-based method for measuring β-diversity (Lozupone et al., 2010) and can easily perform clustering and ordination analyses. For UniFrac analysis, a phylogenetic tree and definition data for OTUs and habitats are needed as input. The tree can be constructed from the alignment dataset by the neighbor-joining (NJ) or maximum-likelihood (ML) method. An NJ tree can be constructed using ARB or ClearCut (Sheneman et al., 2006). An ML tree can be constructed using FASTTREE (Price et al., 2010) or PHYML (Guindon et al., 2010). Such clustering and ordination analyses can also be performed using R (R Development Core Team, 2011). In addition, the numbers of shared OTUs and the shared Chao1 richness among communities can be calculated and displayed automatically in Venn diagrams using mothur. Overall, our recommended analytical steps are shown in Figure 1.
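The idea behind the unweighted UniFrac distance can be sketched in a few lines of Python: it is the fraction of the branch length covered by two communities that leads to OTUs from only one of them. The sketch below is not the UniFrac or Fast UniFrac implementation; the tree encoding (each node mapped to its parent and branch length), the function names and the toy tree are assumptions made for illustration.

```python
# Hypothetical toy tree: each node maps to (parent, branch length to parent).
TREE = {
    "root": (None, 0.0),
    "n1": ("root", 0.1),
    "n2": ("root", 0.4),
    "otuA": ("n1", 0.2),
    "otuB": ("n1", 0.3),
    "otuC": ("n2", 0.1),
    "otuD": ("n2", 0.2),
}


def branches_covered(tree, leaves):
    """Nodes whose parent branch lies on a path from one of the leaves to the root."""
    covered = set()
    for leaf in leaves:
        node = leaf
        # Walk towards the root; stop early if this path was already recorded.
        while tree[node][0] is not None and node not in covered:
            covered.add(node)
            node = tree[node][0]
    return covered


def unweighted_unifrac(tree, comm_a, comm_b):
    """Branch length unique to one community divided by branch length covered by either."""
    a = branches_covered(tree, comm_a)
    b = branches_covered(tree, comm_b)
    unique = sum(tree[n][1] for n in a ^ b)
    total = sum(tree[n][1] for n in a | b)
    return unique / total


# Hypothetical communities defined by the OTUs (leaves) detected in each.
rock = {"otuA", "otuB"}
seawater = {"otuB", "otuC"}
print(unweighted_unifrac(TREE, rock, seawater))
```

Two identical communities give a distance of 0, and two communities that share no branch give 1; a similarity can be expressed as one minus this distance.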

Data collection
In this chapter, we try to assess microbial biodiversity and biogeography on the deep seafloor using the bioinformatics tools described above, although few data from non-hydrothermal and unsedimented deep seafloor in the open sea are available for biogeographical analysis. To investigate the distribution pattern of microbial communities on the seafloor in the open sea, several datasets were collected (Table 1) and used for the following analysis. The locations where the samples were collected are shown in Figure 2. The samples comprise basaltic rocks, Mn crusts, sulfide deposits called dead chimneys, which were collected from hydrothermally inactive vents, sandy sediments that were not organic-rich, and overlying bottom seawater (Figure 2). Although the samples were mainly collected on spreading ridges, they were collected far from hydrothermal vents and may not be significantly influenced by hydrothermal activity. We analyzed and compared the communities as described above using the collected 16S rRNA gene sequences.

α-diversity
The α-diversity of a microbial community is often indicated using Chao1 species richness estimates, rarefaction curves and/or Shannon's index. However, we should not simply compare the α-diversity indicators reported by different investigators because these indicators are biased by the PCR primers, alignment software and OTU or phylotype clustering methods used in the analyses. As is often the case, the sequences deposited in public databases do not include all clones in the libraries but only the representative OTUs. Furthermore, the definition levels of OTUs are not always consistent; OTU 0.03 (i.e., the 97% similarity level) is usually used, but OTU 0.01 or others are also used in some cases. In general, OTU 0.03 and OTU 0.05 are used as species- and genus-level definitions, respectively. In this chapter, to compare the α-diversity of the communities in the collected data as impartially as possible, we used N0.05/Nt, the number of OTU 0.05 (N0.05) expressed as a percentage of the total clone number (Nt) (Table 1). This comparison can roughly address the differences in α-diversity, even if only representative sequences were deposited and several definition levels of OTU (<0.05 cut-off) were used. The N0.05/Nt values are summarized in Figure 3. The N0.05/Nt values for the samples of dead chimneys and seawater, except Asp, were <40%. In contrast, those for the samples of Mn crusts, basaltic rocks (except Rh3) and sediments were 50% or higher. Note that the N0.05/Nt values for Re5 and Rj were very high (>90%), probably because of the small total clone numbers compared with the other rock or sediment samples (Table 1). The relatively high N0.05/Nt of Asp may be due to contamination from seafloor sediment during sampling. In fact, some phylotypes recovered from Asp were closely related to those from seafloor rocks and sediments (Kato et al., 2009b).
The N0.05/Nt did not correlate with the total clone numbers analyzed (r² = 0.223) when the data of Asp, Re5 and Rj were excluded (the plot is not shown). Thus, the difference in the α-diversity associated with each habitat type is potentially meaningful for microbial ecology. It is difficult to answer questions such as why the diversity of dead chimneys and seawater is lower than that of Mn crusts, basaltic rocks and sediments. Further data collection and experimental investigations are needed to answer them. Investigating which kinds of habitats show high or low within-community diversity is important for understanding the mechanisms by which microbial communities acquire their diversity. It should be noted that, to compare diversity among communities fairly, it is important to unify the methodological process, including DNA extraction and PCR primer sets in the sequencing process (Wintzingerode et al., 1997), and the alignment software, the region of the alignment used and the distance calculation methods in the phylogenetic and statistical processes (Schloss, 2010).
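The N0.05/Nt ratio and its check against library size are simple arithmetic, sketched here in Python. The sketch assumes that r² is the squared Pearson correlation coefficient, which the text does not state explicitly, and the clone libraries listed are hypothetical values, not those of Table 1.

```python
def otu_ratio(n_otus, n_clones):
    """Number of OTUs as a percentage of the total clone number (N0.05/Nt)."""
    return 100.0 * n_otus / n_clones


def r_squared(xs, ys):
    """Squared Pearson correlation coefficient between two equally long lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


# Hypothetical libraries: (number of OTUs at the 0.05 cut-off, total clones).
libraries = {"rock_1": (45, 80), "crust_1": (60, 95), "chimney_1": (20, 70)}

ratios = {name: otu_ratio(n, t) for name, (n, t) in libraries.items()}
clone_numbers = [t for _, t in libraries.values()]
print(ratios)
print(r_squared(clone_numbers, list(ratios.values())))
```

A low r² between the ratio and the clone numbers suggests that the ratio is not simply an artifact of library size, which is the argument made above.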

β-diversity
To compare the microbial communities of the samples based on β-diversity measures, UniFrac was used here. The result of the clustering analysis is shown in Figure 4A. In UniFrac analysis, the jackknifing method can be used to assess confidence in the nodes of the UPGMA tree. In the present case, none of the nodes in the tree, except the root, was strongly supported by jackknifing (<50%). The PCoA results are shown in Figures 4B to E. Figure 4B is a three-dimensional image representing the first, second and third principal coordinate axes. Based on the results of the clustering analysis and PCoA, and taking each habitat type into account, the samples were assigned to six groups as shown in Figure 4. Group1 to Group3 represent the communities of basaltic rocks and Mn crusts, together with one community each from sediment and chimney. Group4 and Group5 represent the communities of dead chimneys and of ambient bottom seawater, respectively. Group6, which includes As and Sdi, is the farthest from the other communities. Re5 was excluded from all groups because of its ambiguous behavior, which may be caused by the small size of its clone library. Using UniFrac, we can easily compare the samples and see the differences at the community level (Figure 4). The results indicated that the seawater communities (Group5) were clearly distinguished from the other communities. However, the difference between Group4 (representing dead chimney communities) and Group5 was not apparent in the two-dimensional image of the first and second axes (describing 10.81% and 7.79% of the variation, respectively; Figure 4C). On the other hand, these communities were separated along the third axis, which describes 6.93% of the variation. This means that the factor separating the seawater and dead chimney communities is the third most influential factor for community similarity, not the first or second.
If certain environmental characteristics of the habitats (e.g., pH and the availability of energy sources such as iron, sulfide and ammonium as electron donors and oxygen, nitrate and sulfate as electron acceptors) and physiological characteristics of the microbes in the communities (e.g., free-living or attached lifestyles) were correlated with the third axis, these traits would be the factors causing the difference between the seawater and dead chimney communities. Likewise, detailed geographical and physicochemical characterization of the environments and physiological characterization of the communities in the habitats will help us to identify the factors that, for example, cause the differences among Group1 to Group4, cause Rh1, Mt and Sdt to group together in Group2 despite their different habitat types (Mn crust, basaltic rock and sediment), and cause the difference between Group6 and the other groups. Geographic distance may be one of the factors affecting the biogeography of the microbial communities. For example, the genetic distance between pairs of populations of Synechococcus or Sulfolobus in hot springs is related to the geographic distance despite the similar environmental characteristics of each habitat, which can be interpreted as a result of genetic drift caused by geographical isolation or of adaptation to fluctuating environments in the past (Papke et al., 2003; Whitaker et al., 2003). For the deep seafloor communities, the relationships between community similarity and geographic distance are shown in Figure 5. For all habitat types combined (i.e., basaltic rocks, Mn crusts, dead chimneys, sediments and bottom seawater), the similarity between communities does not seem to be related to geographic distance (Figure 5A). For basaltic rocks and dead chimneys (Figure 5B and C), a positive correlation between community similarity and geographic distance is also not observed.
In such solid habitats, environmental characteristics can vary; for example, a gradient of oxygen concentration may arise from chemical and biological consumption. This environmental variation will affect the diversity and composition of microbial communities in these habitats. This means that the observed community traits can be strongly biased by the sampling position (e.g., interior or exterior parts of a sample). Hence, it is difficult to show a clear relationship between community similarity and geographic distance for solid habitats. For overlying bottom seawater (Figure 5D), our result implies that community similarity is related to geographic distance. Such a correlation for marine microbial communities has already been reported (García-Martínez & Rodríguez-Valera, 2000). Further sampling from various depths and locations in the global oceans will provide a clearer view of oceanic biogeography.

Fig. 5. The relationship between community similarity and geographic distance. The results for (A) all habitat types combined, (B) basaltic rocks, (C) dead chimneys, and (D) ambient seawater are shown. The similarities between pairs of communities were calculated using UniFrac; a value of 1 means that the two communities are identical. The geographic distance between pairs of habitats was calculated using spDistsN1 in the sp package of R. The fitted line and the regression coefficient are shown in each panel.
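For readers who wish to reproduce the geographic distances outside R, a great-circle distance can be sketched in Python with the haversine formula. This is a stand-in under the assumption of a spherical Earth with a mean radius of 6371 km; the R routine mentioned above may use a different Earth model, so values can differ slightly. The site names and coordinates are hypothetical and are not the sampling sites of this chapter.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # assumption: spherical Earth with mean radius


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against rounding slightly above the valid range.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def pairwise_distances(sites):
    """Return {(name_a, name_b): distance in km} for every pair of sites."""
    return {
        (a, b): great_circle_km(*sites[a], *sites[b])
        for a, b in combinations(sorted(sites), 2)
    }


# Hypothetical sampling sites as (latitude, longitude) in degrees.
sites = {"site_A": (0.0, 0.0), "site_B": (0.0, 90.0), "site_C": (45.0, 45.0)}
for pair, km in pairwise_distances(sites).items():
    print(pair, round(km, 1))
```

Each pairwise distance can then be plotted against the corresponding UniFrac similarity and a line fitted, as in Figure 5.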

www.intechopen.com
To address the relationship between community similarity and the phylogeny of the OTUs in each community, the OTUs must be compared at the nucleotide-sequence level, not by the band patterns of DGGE or the fragment sizes of T-RFLP. As shown by the UniFrac results (Figure 4), community similarity on the deep seafloor is positively related to habitat type. The numbers of OTUs shared among the communities at the genus and family levels (i.e., 0.05 and 0.20 cut-offs) are shown in Venn diagrams (Figure 6) drawn by mothur. Among the solid samples (i.e., basaltic rocks, Mn crusts, dead chimneys and sediments), the integrated community for the dead chimneys and that for the basaltic rocks contain many unique OTUs 0.05 (67.7% and 70.0%, respectively) at the genus level, in contrast to those for sediments and Mn crusts (39.6% and 41.9%, respectively) (Figure 6A). At the family level, over 80% of the OTUs 0.20 of each community were shared with others (Figure 6B). These unique OTUs 0.05 potentially include indigenous members of each habitat; for example, the unique clusters for basaltic rocks, Ocean Crust Clades I to VII defined by Mason et al. (2007), and for sulfide chimneys, Clusters A to C defined by Kato et al. (2010). Unfortunately, it is unclear whether and how these potentially indigenous members play a role in the microbial ecosystem and elemental cycling because they are not phylogenetically close to known cultured species and their physiological characteristics are unknown (Mason et al., 2007; Kato et al., 2010). Further cultivation efforts and characterization are important for linking the phylogeny of the OTUs to their function and significance in the environment. Over 50% of the OTUs 0.05 in the Mn crusts were shared with those in the basaltic rocks (Figure 6A). Furthermore, all of the OTUs 0.20 in the Mn crusts were shared with those in the basaltic rocks (Figure 6B).
These results indicate that the bacterial members of the Mn crust and basaltic rock communities are phylogenetically close to each other, which is consistent with the UniFrac (divergence-based) result that the Mn crust communities clustered with some basaltic rock communities (Group2 in Figure 4). The phylogeny of the shared and unique OTUs can be confirmed by phylogenetic analysis (such as homology searches against public databases and phylogenetic tree construction), although this is not shown in this chapter. Although the physiology of OTUs (e.g., metabolic function, growth rate and optimal growth temperature and pH) cannot be directly determined from their phylogeny, the physiological characteristics of OTUs may not be so different from those of closely related cultured species. The physiology of OTUs inferred from their phylogeny will provide basic information for constructing working hypotheses for microbial ecosystem modeling and for preparing culture media targeting these uncultured members. Such comparative analysis using nucleotide sequences is also used to check for cross-contamination among habitats. The OTUs shared between the bottom seawater community and the others are shown in Figure 6C and D. At the genus level, approximately 6-8% of the total OTUs 0.05 of the Mn crusts, dead chimneys or basaltic rocks were shared with the seawater community (Figure 6C). At the family level, 83%-100% of the OTUs 0.20 of each community were shared with others (Figure 6D), similar to the comparison among the solid samples (Figure 6B). Assuming that all of the OTUs 0.05 detected in the seawater are indigenous to the seawater, these shared OTUs 0.05 observed in the solid samples are potentially contaminants from the seawater community. However, it is also possible that these shared OTUs 0.05 differ from each other at a higher similarity level (e.g., 97% or 99% similarity).
Microdiverse clusters at the level of >99% similarity have been reported for marine prokaryotes such as Pelagibacter (SAR11 cluster) (Acinas et al., 2004), Marine Group I Crenarchaeota (Durbin & Teske, 2010), and Halomonas and Marinobacter (Kaye et al., 2011). Hence, we need to be careful before concluding that OTUs shared between target and reference environments are contaminants from the reference environment.
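The percentages of unique and shared OTUs discussed in this section amount to set operations on the OTU membership of each community, as in the following Python sketch. It is not the mothur procedure itself; the function names and the OTU sets are hypothetical and chosen only to show the arithmetic.

```python
def percent_unique(target, others):
    """Percentage of OTUs in `target` that occur in none of the `others`."""
    pooled = set().union(*others)
    return 100.0 * len(target - pooled) / len(target)


def percent_shared(target, reference):
    """Percentage of OTUs in `target` that also occur in `reference`."""
    return 100.0 * len(target & reference) / len(target)


# Hypothetical OTU membership at one cut-off for three communities.
chimney = {"otu1", "otu2", "otu3", "otu4"}
rock = {"otu1", "otu5", "otu6"}
seawater = {"otu2", "otu7"}

print(percent_unique(chimney, [rock, seawater]))  # OTUs found only in the chimney
print(percent_shared(chimney, seawater))          # candidates for seawater carry-over
```

As noted above, an OTU shared with the seawater community at one cut-off is only a candidate contaminant; repeating the comparison at a stricter cut-off may separate the sequences into distinct OTUs.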

Concluding remarks
In this chapter, we introduced recent bioinformatics tools for assessing microbial diversity and biogeography. These useful tools allow us to analyze vast amounts of sequence data quickly and accurately and to obtain an overall view of the biogeography of microbial communities in natural environments. We should use these tools effectively for the analysis of microbial biodiversity and biogeography. Furthermore, both nucleotide sequencing technology and bioinformatics are developing steadily. Microbiologists, especially those who study not only biogeography and biodiversity but also evolution, ecology and biogeosciences, should take care not to overlook these advancing techniques and should apply them to their studies. We applied the recent bioinformatics tools to actual data collected from deep seafloor environments. Our results provide insight into the microbial diversity and biogeography of the global deep seafloor in the open oceans: for example, the relationship between community similarity and habitat type or geographic distance, and the commonalities and differences among the communities at the community and OTU levels. To provide a persuasive explanation of the biogeography of the global deep seafloor, careful collection of more molecular biological and environmental data from more seafloor habitats in various locations is needed.