A Comparative overview of next-generation sequencing technologies applied in metagenome sequencing
Microorganisms are among nature's most effective agents for eliminating toxic pollutants, and the process of removing pollutants with microbes is termed bioremediation. Metagenomics, a strategic approach for analysing microbial communities at the genomic level, is one of the most valuable technological additions to bioremediation. Identification and screening of metagenomes from polluted environments are crucial steps in a metagenomic study. This chapter reviews recent case studies that illustrate metagenomic approaches to bioremediation in different contaminated environments, such as soil and water. It then explains sequence- and function-based metagenomic strategies and tools, covering metagenomic screening, FACS, and advanced metagenomic sequencing strategies, and lists the metagenomes most prevalent in bioremediation together with their respective projects. Finally, we give a detailed view of the major bioinformatic tools and datasets most commonly used for metagenomic data analysis and processing in metagenomic bioremediation.
- Microbial Metagenomes
Since humans began to spread across the planet, Earth has accumulated numerous toxic pollutants from multiple sources. Advances in scientific technology have produced multiple tools for reducing pollutants in different ways, and bioremediation is widely considered one of the best ways to neutralise polluted environments [1, 2]. In this genomic era, metagenomic approaches have been developed and are recognised as effective methods for removing various kinds of pollutants [3, 4]. Metagenomics is a strategic approach for analyzing microbial communities at the genomic level, and it offers a view of the community of “uncultured microbiota”. Recent studies suggest that microbial communities are potential alternatives for eliminating toxic contaminants from our environment [5-8]. The term metagenomics was coined by Jo Handelsman et al. in 1998, who accessed the collective genomes and the biosynthetic machinery of soil microflora in a study on cloning the metagenome. Bioremediation has always adopted new advances in science and technology to establish better environments. Compared with previous years, interest in metagenomics-based bioremediation studies has gradually increased [10-12]. These studies suggest that metagenomics is one of the most valuable additions to bioremediation for establishing a clean, nontoxic environment.
In this chapter, we discuss recent metagenomic approaches to bioremediation with the help of several recent case studies. We first explain the methodology behind metagenomic analysis, from sample screening through to metagenomic analysis with respect to bioremediation. Metagenomic bioremediation surveys microbial communities and exploits their extensive biochemical pathways to degrade toxic pollutants. One part of our study highlights case studies of metagenomic applications to air, water, and soil contamination, giving a topic-specific overview of metagenomic bioremediation of water contamination, soil contamination, and then air contamination. The following part focuses on recently developed sequence- and function-based metagenomic strategies for analyzing metagenomes from contaminated environments. In addition, we describe the highly prevalent metagenomes derived from metagenomic communities that are also highly capable of degrading contaminants and toxins in the environment. Finally, we provide an overview of the bioinformatic tools used to process and analyze metagenomic bioremediation data.
2. Applications of metagenomics in bioremediation
Environmental scientists consider metagenomic bioremediation a promising tool for removing contaminants from the environment [13-15]. As noted earlier, several recent studies have reported metagenomic approaches in bioremediation. Compared with other bioremediation approaches, metagenomic bioremediation has been reported to give better outcomes with higher degradation ratios. A recent study highlighted the potential of metagenome-derived bacteria from petroleum reservoirs. In this study, microbial strains and metagenomic clones were isolated from petroleum reservoirs, and their petroleum degradation abilities were evaluated either individually or in pools using seawater-based artificial ecosystems. The results showed that metagenomic clones were able to biodegrade up to 94% of phenanthrene and methylphenanthrenes, with rates ranging from 55% to 70% after 21 days. The authors concluded that the bacterial strains and metagenomic clones showed high petroleum-degrading potential.
Metagenomic approaches in bioremediation help to characterize bacterial communities in different kinds of contaminated environments. A metaproteogenomic study was carried out on the long-term adaptation of bacterial communities in metal-contaminated sediments. The aim of this study was to understand the effect of long-term metal exposure (110 years) on sediment microbial communities. The authors selected two freshwater sites whose metal levels differed by one order of magnitude. The samples from the two sites were compared by shotgun metaproteogenomics, which yielded a total of 69–118 Mbp of DNA and 943–1241 proteins. The two communities were found to be functionally very similar. However, significant genetic differences were observed in three categories: synthesis of exopolymeric substances, virulence and defense mechanisms, and elements involved in horizontal gene transfer. This study is a good example of advanced metagenomic approaches applied to the bioremediation of contaminated environments.
3. Metagenomic bioremediation of different contaminations
Environments with intense human activity are becoming increasingly polluted by different kinds of toxic contaminants [18-20]. Contamination is diverse and affects water, soil, and air, which are considered the most important sources of life [21-23]. Metagenomic analysis is applied to many kinds of polluted environments, primarily contaminated soil and water [24, 25].
3.1. Metagenomic bioremediation of soil contaminations
Soil contamination is a serious problem [26, 27], as soil is considered one of the major sources of life. Compared with other bioremediation approaches, microbial and environmental researchers are increasingly inclined to apply metagenomic approaches [10, 29, 30]. A recent case study discusses the metagenomic analysis of Arctic soils in Canada contaminated by high concentrations of diesel. Because the study concerned Arctic soils, its objective was to identify the microorganisms and functional genes that are abundant and active during hydrocarbon degradation at cold temperatures. The researchers sequenced the soil metagenome and performed reverse-transcription quantitative real-time PCR (RT-qPCR) to quantify the expression of several hydrocarbon-degrading genes. Pseudomonas species were detected as the most abundant organisms in diesel-contaminated soils in cold environments. RT-qPCR assays confirmed that Pseudomonas and Rhodococcus species actively expressed hydrocarbon degradation genes in Arctic biopile soils. The results indicated that biopile treatment leads to major shifts in soil microbial communities, favoring aerobic bacteria that degrade hydrocarbons.
3.2. Metagenomic bioremediation of water contaminations
Water pollution has increased dramatically compared with conditions in the 20th century [32, 33]. Applying metagenomics to the bioremediation of contaminated water is one of the most promising ways to reduce water contamination [34-37]. Several recent case studies suggest that metagenomic applications have been widely used for the identification and treatment of pollutants in the sea, groundwater, and drinking water [34-37]. A recent study performed at Gulf of Mexico beaches presents a longitudinal metagenomic analysis of water and soil affected by the Deepwater Horizon oil spill. Approximately 7×10⁵ cubic meters of crude oil were released into the Gulf of Mexico as a consequence of the Deepwater Horizon drilling rig explosion, and thousands of square miles of the Earth’s surface were covered in crude oil. The researchers performed high-throughput DNA sequencing of close-to-shore water and beach soil samples collected in Louisiana and Mississippi before and during the appearance of oil. The sequencing results identified an unusual increase in the human pathogen Vibrio cholerae, a sharp increase in Rickettsiales sp., and a decrease in Synechococcus sp. in water samples. In addition, a metagenomic analysis was performed on the bioremediation of hexavalent chromium-contaminated water in a fixed-film bioreactor. This study concerns hexavalent chromium, Cr(VI), contamination from a dolomite stone mine in Limpopo Province, South Africa, which caused extensive groundwater contamination. To restrict any further negative environmental impact at the site, an effective and sustainable treatment strategy was developed for the removal of up to 6.49 mg/l Cr(VI) from the groundwater. During operation, the microbial community shifted in relative dominance to establish an optimal metal-reducing community, including Enterobacter cloacae, Flavobacterium sp., and Ralstonia sp., which achieved 100% reduction.
This study provides an effective demonstration of a biological chromium(VI) bioremediation system.
4. Metagenomic strategies and tools for bioremediation
Advances in scientific technology have driven improvements in the research tools applied across different fields of scientific research. These technologically advanced inventions have helped researchers uncover previously unknown aspects of nature. Multiple technologies are now being integrated with metagenomics for a better understanding of the biological and life sciences. In this section, we therefore discuss the major recent metagenomic strategies and tools applied in metagenomic bioremediation.
4.1. Screening of metagenomes from polluted environments
Identification and screening of metagenomes from polluted environments are crucial in a metagenomic study. Interactions within the microbial community can be detected precisely only when metagenomes are carefully screened from the contaminated environment. A recent study proposed an updated method for high-throughput genetic screening of a soil metagenomic library. The soil metagenomic DNA of a fosmid clone library was first spotted on high-density membranes. A pool of radio-labeled oligonucleotide probes, designed to target genes encoding specific enzymes, was then added to the membranes for hybridization. Next, the positive hybridizing spots were assigned to the corresponding clones in the library, and the metagenomic inserts were sequenced.
When assembly and annotation were completed, new coding DNA sequences related to the genes of interest were identified, with low protein similarity to the closest hits in the databases. This work highlights the sensitivity of DNA/RNA hybridization techniques as an effective and complementary way to recover novel genes from large metagenomic clone libraries of soil microbiota. Nevertheless, multiple molecular biology-based techniques may also be applied during metagenome extraction and screening. The basic workflow for extracting metagenomes from contaminated soil is explained in Fig. 1. The first step is to collect contaminated soil from the environment. The collected sample can be processed in two ways: either by direct cell lysis and DNA purification, or by separating the cells from the contaminated soil and then performing cell lysis and DNA purification. The isolated DNA is then cloned using specific cloning vectors, and the cloned DNA is delivered into host cells using different gene delivery systems. The multiplied host cells containing contaminated-soil DNA form a metagenome library, which is then screened. A recent study screened biosurfactant producers from petroleum hydrocarbon-contaminated sources in cold marine environments. In this study, the researchers isolated and characterized 55 biosurfactant-producing microorganisms from 8 different genera, including 1 Alcanivorax, 1 Exiguobacterium, and 2 Halomonas strains.
4.2. Fluorescence-Activated Cell Sorting (FACS)
Fluorescence-activated cell sorting is one of the most widely used cell sorting techniques; during metagenomic screening it sorts microbial cells on the basis of fluorescence, at a rate of about 5,000 cells per second. Figure 2 shows the schematic flow of the SIGEX and intracellular biosensor methods. High-throughput screening does not require a selectable phenotype. This has led to a focus on readily visible phenotypes such as pigments, which make fluorescence-activated cell sorting applicable. Moreover, FACS can be used to detect the expression of certain types of genes through the regulation of a fluorescent biosensor present in the same cell as the metagenomic DNA [47, 48]. Hence, these screening methods are critical tools for the rapid selection of cells from metagenomic libraries.
4.3. Metagenomic sequencing strategies
Genome sequencing technologies have been upgraded frequently since the completion of the human genome at the beginning of the 21st century. Multiple next-generation sequencing strategies are applied to sequence the metagenomes of different microbial communities [51, 52]. Sequencing began with the Sanger method, which was widely used for human genome sequencing [53, 54]. Technological progress has since produced next-generation sequencing techniques such as pyrosequencing [55, 56], ligation sequencing [57, 58], reversible terminator sequencing [60, 61], and single-molecule sequencing by synthesis [62, 63], which provide high-throughput reads in comparatively less time [64-66]. A comparative overview of recent sequencing technologies applied in metagenome sequencing is provided in Table 1. However, most metagenomics researchers prefer pyrosequencing for sequencing the metagenomes of microbial communities [67-70].
|Sequence Reaction Method||Read Length (bp)||Amplification||Data Production||Templates per run||Commercially Available as|
|Sanger’s Method||~900 to 1,100||PCR||1 Mb per day||96||ABI 3730xl|
|Pyrosequencing||~400||Emulsion PCR||400 Mb per run in 7.5 to 8 hours||1,000,000||454 FLX Roche|
|Reversible Terminator||36 to 175||Bridge PCR||>17 Gb per run in 3 to 6 days||40,000,000||Illumina SOLEXA Genome Analyser|
|Ligation Sequencing||~50||Emulsion PCR||10 to 15 Gb per run in 6 days|
|Single Molecule Sequence by Synthesis||30 to 35||None||21 to 28 Gb per run in 8 days|
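To make the data production figures in Table 1 easier to compare, the following minimal sketch converts each platform's stated output into megabases per day. The figures are those given in the table; the dictionary layout and the names `PLATFORMS`, `mb_per_day`, and `throughput_table` are illustrative choices. The conversion assumes 1 Gb = 1,000 Mb and treats the ">17 Gb" entry as a lower bound of exactly 17 Gb.

```python
# Per-run output (Mb) and run duration (days) as (low, high) ranges from Table 1.
# Assumptions: 1 Gb = 1000 Mb; ">17 Gb" is treated as exactly 17 Gb (a lower bound).
PLATFORMS = {
    "Sanger": {"output_mb": (1, 1), "run_days": (1, 1)},
    "Pyrosequencing": {"output_mb": (400, 400), "run_days": (7.5 / 24, 8 / 24)},
    "Reversible terminator": {"output_mb": (17_000, 17_000), "run_days": (3, 6)},
    "Ligation sequencing": {"output_mb": (10_000, 15_000), "run_days": (6, 6)},
    "Single-molecule SBS": {"output_mb": (21_000, 28_000), "run_days": (8, 8)},
}


def mb_per_day(output_mb, run_days):
    """Return (lowest, highest) daily throughput in Mb for the given ranges."""
    low = output_mb[0] / run_days[1]   # smallest output over the longest run
    high = output_mb[1] / run_days[0]  # largest output over the shortest run
    return low, high


def throughput_table(platforms=PLATFORMS):
    """Map each platform name to its (low, high) throughput in Mb per day."""
    return {
        name: mb_per_day(spec["output_mb"], spec["run_days"])
        for name, spec in platforms.items()
    }


if __name__ == "__main__":
    for name, (low, high) in throughput_table().items():
        print(f"{name:<22} {low:>8.0f} to {high:>8.0f} Mb/day")
```

Under these assumptions, every next-generation platform in the table delivers more than a thousand times the daily output listed for the Sanger method, although the comparison ignores read length, error rate, and cost.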
5. Prevalent metagenomes for bioremediation
Metagenomes extracted from uncultured microbial communities at multiple contaminated sites are screened and then assessed for degrading properties. Microbial communities vary according to the characteristics of the source and site of contamination. A metagenomic analysis of heavy metal-contaminated groundwater revealed metagenomes of γ- and β-Proteobacteria, dominated by Rhodanobacter-like γ-proteobacterial and Burkholderia-like β-proteobacterial species, in a habitat with extremely high levels of uranium, nitrate, technetium, and various organic contaminants. Many metagenome projects are under way around the world; Table 2 lists several environmental metagenome projects together with the top phylum, that is, the one with the highest percentage of presence in each metagenomic community. Studies of microbial adaptation to toxic environments may reveal new metagenomic communities useful for efficient bioremediation. The specific functions and interactions of microbial communities with respect to contaminant degradation may result from environment-driven gene switching in the metagenomes.
|Top Phylum||Percentage of Presence in Community||Domain||Metagenome Projects||Source|
|Actinobacteria||38.04||Bacteria||BASE - Biomes of Australian Soil Environments||Soil|
|Chlorobi||56.04||Bacteria||Antarctica Aquatic Microbial Metagenome||Environmental|
|Actinobacteria||38.21||Bacteria||American Lake Mendota metagenome||Water|
|Proteobacteria||31.62||Bacteria||Swedish Lake Vattern metagenome||Water|
|Proteobacteria||29.68||Bacteria||Detoxification of arsenic mediated by microbial sulphate reduction in Mediterranean marine sediments||Environmental|
|Proteobacteria||48.12||Bacteria||Illumina and 454-based metatranscriptomic analyses of a diatom-induced bacterioplankton bloom in the North Sea||Environmental|
|Unassigned Bacteria||34.8||Bacteria||Functional metagenomic profiling of Tibetan Plateau soils affected by permafrost or seasonal freezing||Soil|
|Euryarchaeota||22.71||Archaea||Lonar Lake Sediment prokaryotic metagenome||Water|
|Unassigned Bacteria||53.84||Bacteria||Metagenome of a microbial consortium obtained from the tuna oil field in the Gippsland Basin, Australia||Environmental|
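The entries in Table 2 can also be summarised programmatically. The sketch below transcribes the phylum, percentage, domain, and source columns into tuples and then counts how many projects each top phylum leads and which single entry is most dominant. The tuple layout and the function names are illustrative assumptions, not part of any of the listed projects.

```python
from collections import Counter

# (top phylum, percentage of presence, domain, source), transcribed from Table 2.
ROWS = [
    ("Actinobacteria", 38.04, "Bacteria", "Soil"),
    ("Chlorobi", 56.04, "Bacteria", "Environmental"),
    ("Actinobacteria", 38.21, "Bacteria", "Water"),
    ("Proteobacteria", 31.62, "Bacteria", "Water"),
    ("Proteobacteria", 29.68, "Bacteria", "Environmental"),
    ("Proteobacteria", 48.12, "Bacteria", "Environmental"),
    ("Unassigned Bacteria", 34.8, "Bacteria", "Soil"),
    ("Euryarchaeota", 22.71, "Archaea", "Water"),
    ("Unassigned Bacteria", 53.84, "Bacteria", "Environmental"),
]


def projects_per_phylum(rows):
    """Count how many projects list each phylum as the top phylum."""
    return Counter(phylum for phylum, _, _, _ in rows)


def most_dominant(rows):
    """Return the row with the highest percentage of presence."""
    return max(rows, key=lambda row: row[1])


if __name__ == "__main__":
    print(projects_per_phylum(ROWS).most_common())
    print(most_dominant(ROWS))
```

In this transcription, Proteobacteria is the top phylum in three of the nine projects, while the single highest share belongs to Chlorobi in the Antarctic aquatic metagenome.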
6. Bioinformatic tools for metagenomic bioremediation
Over the last two decades, bioinformatics has advanced and has been adopted in many fields, from the basic sciences to the advanced applied sciences. Our previous study gave an overview of the basic applications of bioinformatics in bioremediation. Bioinformatics serves multiple tasks in metagenomic bioremediation, chiefly in metagenomic data analysis [76, 77]. A special issue on bioinformatics approaches and tools for metagenomic analysis has provided an advanced view of the comprehensive bioinformatic tools and methodologies used in metagenomics.
Metagenomic projects are generating large volumes of sequence data, challenging bioinformatics to develop better and more robust tools for analyzing them. A recent study characterized soil microbial communities using metagenomic approaches. The researchers used 33 publicly available metagenomes obtained from diverse soil sites and integrated several state-of-the-art computational tools to explore the phylogenetic and functional characteristics of the microbial communities in soil. Bioinformatics has recently seen multiple advances relevant to metagenomic bioremediation. This section focuses on the recent bioinformatic tools and datasets most commonly used to analyze metagenomic data in bioremediation. A comparative overview of the functions and suitability of the most widely used tools for metagenomic analysis is given in Table 3.
MEtaGenome ANalyzer (MEGAN) is one of the most widely used software tools for efficiently analyzing large volumes of metagenomic sequence data [80, 81]. It is typically used to interactively analyze and compare metagenomic and metatranscriptomic data, both taxonomically and functionally. For taxonomic analysis, the program places reads onto the NCBI taxonomy; functional analysis is performed by mapping reads to the SEED, COG, and KEGG classifications. In addition, samples can be compared taxonomically and functionally using a wide range of charting and visualization techniques, such as co-occurrence plots. The software also performs PCoA (Principal Coordinates Analysis) and clustering, allowing high-level comparison of large numbers of samples. Different attributes of the samples can be captured and used during analysis. Moreover, MEGAN supports several input data formats and can export the results of an analysis in various text-based and graphical formats. Its multiple methods of analysis, its ability to accept and compare high-throughput data, its robustness, and its ease of use are among the features that have made MEGAN one of the most widely used metagenome analyzers.
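MEGAN's published method for placing reads onto the NCBI taxonomy is lowest common ancestor (LCA) assignment: a read is assigned to the deepest taxon that contains all of its significant database hits. The sketch below illustrates only that core idea. It uses a tiny hand-written child-to-parent map (a hypothetical subset, not the real NCBI tree), it is not MEGAN's code, and it omits the score thresholds that a real analysis applies before computing the LCA.

```python
# Toy taxonomy as a child -> parent map (hypothetical subset, not the NCBI tree).
PARENT = {
    "Pseudomonas putida": "Pseudomonas",
    "Pseudomonas fluorescens": "Pseudomonas",
    "Pseudomonas": "Gammaproteobacteria",
    "Rhodanobacter": "Gammaproteobacteria",
    "Burkholderia": "Betaproteobacteria",
    "Gammaproteobacteria": "Proteobacteria",
    "Betaproteobacteria": "Proteobacteria",
    "Proteobacteria": "Bacteria",
    "Bacteria": "root",
}


def lineage(taxon):
    """Return the path from the root down to the given taxon."""
    path = [taxon]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path[::-1]


def lowest_common_ancestor(hit_taxa):
    """Return the deepest taxon shared by the lineages of all hits of a read."""
    lineages = [lineage(taxon) for taxon in hit_taxa]
    lca = "root"
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:  # lineages diverge at this depth
            break
        lca = ranks[0]
    return lca


if __name__ == "__main__":
    print(lowest_common_ancestor(["Pseudomonas putida", "Pseudomonas fluorescens"]))
    print(lowest_common_ancestor(["Pseudomonas putida", "Burkholderia"]))
```

A read whose hits all fall within one genus is assigned at genus level, whereas a read with hits spread across classes is pushed up to the phylum. This is why conserved sequences tend to receive less specific assignments.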
Simple Metagenomics Analysis SHell for microbial communities (SmashCommunity) is a stand-alone metagenomic annotation and analysis pipeline that shares design principles and routines with SmashCell. It is suitable for data from the Sanger and 454 sequencing technologies. It supports state-of-the-art software for essential metagenomic tasks such as assembly and gene prediction. It also provides tools to estimate the quantitative phylogenetic and functional compositions of metagenomes, to compare the compositions of multiple metagenomes, and to produce intuitive visual representations of such analyses. It provides optimized parameter sets for Arachne and Celera for metagenome assembly, and for GeneMark and MetaGene for predicting protein-coding genes in metagenomes. SmashCommunity also includes scripts for downstream analysis of datasets. These scripts can generate intuitive tree-based visualizations of results using the batch access API of the interactive Tree of Life (iTOL) web tool. SmashCommunity can also compare multiple metagenomes using these profiles, cluster them with a relative entropy-based distance measure suited to such quantitative profiles, perform bootstrap analysis of the clustering, and generate visual representations of the clustering results.
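As an illustration of what a relative entropy-based distance between quantitative profiles can look like, the sketch below computes the Jensen-Shannon distance, a symmetric measure built from relative entropy (Kullback-Leibler divergence). The choice of this particular measure is an assumption made for illustration; the text above does not specify which formula SmashCommunity implements, and the function names are hypothetical.

```python
import math


def normalize(counts):
    """Convert raw counts into a probability profile that sums to 1."""
    total = sum(counts)
    return [count / total for count in counts]


def kl_divergence(p, q):
    """Relative entropy D(p || q) in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_distance(counts_a, counts_b):
    """Square root of the Jensen-Shannon divergence between two count profiles.

    Both inputs must list counts for the same categories in the same order.
    """
    p, q = normalize(counts_a), normalize(counts_b)
    midpoint = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    divergence = 0.5 * kl_divergence(p, midpoint) + 0.5 * kl_divergence(q, midpoint)
    return math.sqrt(max(divergence, 0.0))  # guard against tiny negative rounding


if __name__ == "__main__":
    # Hypothetical phylum-level counts for two metagenomes.
    sample_a = [380, 310, 200, 110]
    sample_b = [120, 480, 300, 100]
    print(round(jensen_shannon_distance(sample_a, sample_b), 3))
```

With base-2 logarithms the distance ranges from 0 for identical profiles to 1 for profiles with no categories in common, which makes it convenient as input to hierarchical clustering.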
Community Cyberinfrastructure for Advanced Microbial Ecology Research and Analysis (CAMERA) is a database and associated computational infrastructure that provides a single system for depositing, locating, analyzing, visualizing, and sharing data about microbial biology through an advanced web-based analysis portal. CAMERA holds a large body of data, including environmental metagenomic and genomic sequence data, associated environmental parameters, pre-computed search results, and software tools that support powerful cross-analysis of environmental samples. CAMERA collects metadata relevant to environmental metagenome datasets and links them with annotation in a semantically aware environment, which allows users to write expressive semantic queries against the database. It also provides data submission tools that allow researchers to share and forward data to other metagenomic sites and community data archives. CAMERA can be regarded as a complete genome-analysis tool that allows users to query, analyze, annotate, and compare metagenome and genome data.
Rapid Annotation using Subsystems Technology for Metagenomes (MG-RAST) is an automated analysis platform for metagenomes that provides quantitative insights into microbial populations based on sequence data. The pipeline performs quality control, protein prediction, clustering, and similarity-based annotation on nucleic acid sequence datasets using a number of bioinformatic tools. Users can upload raw sequence data in FASTA format; the sequences are normalized and processed, and summaries are generated automatically. The MG-RAST server provides several methods of access to different data types, including phylogenetic and metabolic reconstructions, and can compare the metabolism and annotations of one or more metagenomes and genomes. In addition, the server offers a comprehensive search capability. The pipeline is implemented in Perl using a number of open-source components, including the SEED framework, NCBI BLAST, SQLite, and Sun Grid Engine.
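To illustrate the kind of preprocessing that precedes annotation of FASTA uploads, the sketch below parses FASTA text and applies a simple quality filter on read length and ambiguous bases. It is not MG-RAST's pipeline: the function names and the threshold values (`min_length`, `max_ambiguous_fraction`) are hypothetical and chosen only for the example.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def passes_qc(sequence, min_length=50, max_ambiguous_fraction=0.05):
    """Hypothetical filter: the read is long enough and has few non-ACGT bases."""
    if not sequence or len(sequence) < min_length:
        return False
    ambiguous = sum(1 for base in sequence if base not in "ACGT")
    return ambiguous / len(sequence) <= max_ambiguous_fraction


def filter_reads(text, **thresholds):
    """Return the (header, sequence) pairs that pass the quality filter."""
    return [(h, s) for h, s in parse_fasta(text) if passes_qc(s, **thresholds)]


if __name__ == "__main__":
    example = ">read1\nACGTACGT\nACGT\n>read2\nACNNNNNN\n>read3\nAC\n"
    print(filter_reads(example, min_length=4, max_ambiguous_fraction=0.1))
```

Production pipelines add further steps, such as dereplication and removal of host or contaminant sequences, which are outside the scope of this sketch.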
|Single Cell Genomes||-||-||-||-||-||-|
|16S rDNA Metagenomes||+||+||-||+||+||+|
The Integrated Microbial Genomes and Metagenomes (IMG/M) system supports the annotation, analysis, and distribution of microbial genome and metagenome datasets. IMG/M provides comparative data using analytical tools extended to handle metagenome data, together with metagenome-specific analysis [88, 89]. IMG/M consists of samples of microbial community aggregate genomes integrated with IMG’s comprehensive set of genomes from all three domains of life, as well as plasmids, viruses, and genome fragments. Function-based comparison of metagenome samples and genomes is provided by analytical tools that allow examination of the relative abundance of protein families, functional families, or functional categories across metagenome samples and genomes. Registered users gain additional benefit from IMG/M: the tools for handling substantially larger metagenome datasets are available only to registered users as part of the ‘My IMG’ toolkit, which supports specifying, managing, and analyzing persistent sets of genes, functions, genomes, or metagenome samples and scaffolds.
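The idea of comparing the relative abundance of protein families across samples can be sketched in a few lines. The example below converts raw per-family hit counts into fractions and lines up two samples side by side. The family identifiers and counts are invented for illustration, and the functions are hypothetical helpers rather than any part of the IMG/M interface.

```python
def relative_abundance(family_counts):
    """Convert raw per-family hit counts into fractions that sum to 1."""
    total = sum(family_counts.values())
    if total == 0:
        return {family: 0.0 for family in family_counts}
    return {family: count / total for family, count in family_counts.items()}


def compare_samples(sample_a, sample_b):
    """Return {family: (abundance_a, abundance_b)} over the union of families."""
    rel_a = relative_abundance(sample_a)
    rel_b = relative_abundance(sample_b)
    families = sorted(set(rel_a) | set(rel_b))
    return {f: (rel_a.get(f, 0.0), rel_b.get(f, 0.0)) for f in families}


if __name__ == "__main__":
    # Hypothetical hit counts per protein family in two metagenome samples.
    contaminated = {"famA": 30, "famB": 70}
    pristine = {"famA": 10, "famC": 10}
    for family, (a, b) in compare_samples(contaminated, pristine).items():
        print(f"{family}: {a:.2f} vs {b:.2f}")
```

Normalizing to fractions before comparison matters because metagenome samples usually differ in sequencing depth, so raw counts are not directly comparable.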
Metagenomics is a strategic approach for analyzing microbial communities at the genomic level, offering a view of the community of “uncultured microbiota”. Bioremediation has always adopted new advances in science and technology to establish better environments, and metagenomics can be considered one of the most valuable of these. Identification and screening of metagenomes from polluted environments are crucial in a metagenomic study. The second section of this chapter presented recent case studies illustrating metagenomic approaches to bioremediation. The third section discussed metagenomic bioremediation in different contaminated environments, such as soil and water. The fourth section explained sequence- and function-based metagenomic strategies and tools, covering metagenomic screening, FACS, and advanced metagenomic sequencing strategies. The fifth section dealt with the metagenomes prevalent in bioremediation and listed prevalent metagenomic organisms with their respective projects. The last section gave a detailed view of the major bioinformatic tools and datasets most commonly used for metagenomic data analysis and processing in metagenomic bioremediation.