Open access peer-reviewed chapter

Metagenomics — A Technological Drift in Bioremediation

By Pratap Devarapalli and Ranjith N. Kumavath

Submitted: September 25th 2014; Reviewed: May 5th 2015; Published: September 9th 2015

DOI: 10.5772/60749

Downloaded: 1715


Nature has its own ways of resolving imbalances in its environment, and microorganisms are among its best tools for eliminating toxic pollutants. The process of eliminating pollutants using microbes is termed bioremediation. Metagenomics is a strategic approach for analysing microbial communities at a genomic level, and it is one of the most significant technological advances applied to bioremediation. Identification and screening of metagenomes from polluted environments are crucial in a metagenomic study. This chapter first presents multiple recent case studies that illustrate metagenomic approaches to bioremediation in different contaminated environments such as soil and water. It then explains sequence- and function-based metagenomic strategies and tools, covering metagenomic screening, FACS, and advanced metagenomic sequencing strategies, followed by the prevalent metagenomes in bioremediation and a list of widespread metagenomic organisms and their respective projects. Finally, it gives a detailed view of the major bioinformatic tools and datasets most commonly used in metagenomic data analysis and processing during metagenomic bioremediation.


  • Metagenomics
  • Bioremediation
  • Microbial Metagenomes
  • Bioinformatics
  • Pollution

1. Introduction

Ever since humans began to dominate this planet, Earth has been burdened with toxic pollutants from multiple sources. Advanced scientific technology has given rise to multiple tools for reducing pollutants, and bioremediation is considered to be the best way to neutralise polluted environments on Earth [1, 2]. In this genomic era, metagenomic approaches have been developed and are recognised as effective aids in removing various kinds of pollutants [3, 4]. Metagenomics is a strategic approach for analyzing microbial communities at a genomic level, which provides a community-level view of the “Uncultured Microbiota”. Recent studies suggest that microbial communities are potential alternatives for eliminating toxic contaminants from our environment [5-8]. The term metagenomics was coined by Jo Handelsman et al. in 1998, who accessed the collective genomes and the biosynthetic machinery of soil microflora during a study on cloning the metagenome [9]. Bioremediation has always adopted new advances in science and technology to establish better environments. In recent years, interest in metagenomics-based bioremediation studies has gradually increased [10-12]. These studies suggest that metagenomics is one of the most promising additions to bioremediation for establishing a clean, nontoxic environment.

In this chapter, we discuss recent approaches of metagenomics in bioremediation with the help of multiple recent case studies. First, we explain the methodology behind metagenomic analysis, starting from sample screening and ending with metagenomic analysis with respect to bioremediation. Metagenomic bioremediation surveys microbial communities and exploits their extensive biochemical pathways for degrading toxic pollutants. Part of our study emphasizes case studies of metagenomic applications to air, water, and soil contamination, providing a topic-specific landscape of metagenomic bioremediation of water contamination, soil contamination, and air contamination. The following part focuses on recently developed sequence- and function-based metagenomic strategies for analyzing metagenomes from contaminated environments. In addition, we describe the highly prevalent metagenomes derived from microbial communities that are capable of degrading contaminants and toxins in the environment. Finally, we provide a landscape view of the bioinformatic tools used in the processing and analysis of metagenomic bioremediation data.

2. Applications of metagenomics in bioremediation

Environmental scientists consider metagenomic bioremediation to be one of the potential tools for removing contaminants from the environment [13-15]. As cited earlier, multiple recent studies have reported metagenomic approaches in bioremediation. Compared with other bioremediation approaches, metagenomic bioremediation has been reported to give better outcomes with higher degradation ratios. The results of a recent study emphasized the potential of bacteria and metagenomic clones derived from petroleum reservoirs [16]. In this study, microbial strains and metagenomic clones were isolated from petroleum reservoirs, and their petroleum degradation abilities were evaluated either individually or in pools using seawater artificial ecosystems. The results showed that metagenomic clones were able to biodegrade up to 94% of phenanthrene and methyl phenanthrenes, with rates ranging from 55% to 70% after 21 days [16]. The authors concluded that the bacterial strains and metagenomic clones showed high petroleum-degrading potential.

Metagenomic approaches in bioremediation help to characterize bacterial communities in different kinds of contaminated environments. A metaproteogenomic study was carried out on the long-term adaptation of bacterial communities in metal-contaminated sediments [17]. The aim of this study was to understand the effect of long-term metal exposure (110 years) on sediment microbial communities. The authors selected two freshwater sites whose metal levels differed by one order of magnitude. The samples extracted from the two sites were compared by shotgun metaproteogenomics, which yielded a total of 69–118 Mbp of DNA and 943–1241 proteins. The two communities were found to be functionally very similar. However, significant genetic differences were observed for three categories: synthesis of exopolymeric substances, virulence and defense mechanisms, and elements involved in horizontal gene transfer. This study can be considered a good example of advanced metagenomic approaches applied to the bioremediation of contaminated environments.

3. Metagenomic bioremediation of different contaminations

Environments where human activity abounds are increasingly polluted by different kinds of toxic contaminants [18-20]. Contamination is diverse and affects almost all sources of life, including water, soil, and air, which are considered the most important [21-23]. Metagenomic analysis is applied to multiple kinds of polluted environments, primarily contaminated soil and water [24, 25].

3.1. Metagenomic bioremediation of soil contaminations

Soil contamination is a serious problem [26, 27], as soil is considered one of the major sources of life [28]. Microbial and environmental researchers are increasingly inclined to apply metagenomic approaches rather than other approaches to bioremediation [10, 29, 30]. A recent case study discusses the metagenomic analysis of arctic soils in Canada contaminated by high concentrations of diesel [31]. Because the study concerned arctic soils, its objective was to identify the microorganisms and functional genes that are abundant and active during hydrocarbon degradation at cold temperatures. The authors sequenced the soil metagenome and performed reverse-transcriptase real-time PCR (RT-qPCR) to quantify the expression of several hydrocarbon-degrading genes. Pseudomonas species were detected as the most abundant organisms in diesel-contaminated soils in cold environments. RT-qPCR assays confirmed that Pseudomonas and Rhodococcus species actively expressed hydrocarbon degradation genes in arctic biopile soils. The results indicated that biopile treatment leads to major shifts in soil microbial communities, favoring aerobic bacteria that degrade hydrocarbons [31].

3.2. Metagenomic bioremediation of water contaminations

Water pollution has increased dramatically in comparison with the conditions of the 20th century [32, 33]. Applying metagenomics to the bioremediation of contaminated water is one of the best ways to reduce water contamination [34-37]. Multiple recent case studies suggest that metagenomic applications have been widely used for the identification and treatment of pollutants and contaminants in the sea, groundwater, and drinking water [34-37]. A recent study performed at Gulf of Mexico beaches describes a longitudinal metagenomic analysis of water and soil affected by the Deepwater Horizon oil spill [34]. Approximately 7 × 10^5 cubic meters of crude oil were released into the Gulf of Mexico as a consequence of the Deepwater Horizon drilling rig explosion, and thousands of square miles of the earth’s surface were covered in crude oil. During this study, researchers performed high-throughput DNA sequencing of close-to-shore water and beach soil samples before and during the appearance of oil in Louisiana and Mississippi. The sequencing results identified an unusual increase in the human pathogen Vibrio cholerae, a sharp increase in Rickettsiales sp., and a decrease in Synechococcus sp. in water samples [34]. In addition, a metagenomic analysis was performed for the bioremediation of hexavalent chromium-contaminated water in a fixed-film bioreactor [38]. This study addressed hexavalent chromium (Cr6+) contamination from a dolomite stone mine in Limpopo Province, South Africa, which had caused extensive groundwater contamination. To restrict any further negative environmental impact at the site, an effective and sustainable treatment strategy for the removal of up to 6.49 mg/l Cr6+ from the groundwater was developed. The microbial community shifted in relative dominance during operation to establish an optimal metal-reducing community, including Enterobacter cloacae, Flavobacterium sp. and Ralstonia sp., which achieved 100% reduction. This study provides an effective demonstration of a biological chromium (VI) bioremediation system [38].

4. Metagenomic strategies and tools for bioremediation

Advances in scientific technology have improved the research tools applied in different fields of scientific research [39]. These technologically advanced inventions have driven researchers towards uncovering previously hidden aspects of nature [40]. Multiple technologies are now being integrated with metagenomics for a better understanding of the biological and life sciences [41]. In this section, we discuss the major recent metagenomic strategies and tools applied in metagenomic bioremediation.

4.1. Screening of metagenomes from polluted environments

Identification and screening of metagenomes from polluted environments are crucial in a metagenomic study. Microbial community interactions can be detected precisely only when metagenomes are carefully screened from a contaminated environment. A recent study [42] proposed an updated method for high-throughput genetic screening of a soil metagenomic library. The soil metagenomic DNA of a fosmid clone library was first spotted on high-density membranes. A pool of radio-labeled oligonucleotide probes, designed to target genes encoding specific enzymes, was then added to the membranes for hybridization. Positive hybridizing spots were subsequently assigned to the corresponding clones in the library, and the metagenomic inserts were sequenced.

Figure 1.

An ideal systematic workflow of steps involved in contaminated-soil metagenomics.

When assembly and annotation were completed, new coding DNA sequences related to the genes of interest were identified that had low protein similarity to the closest hits in the databases. This work highlights the sensitivity of DNA/RNA hybridization techniques as an effective and complementary way to recover novel genes from large metagenomic clone libraries of soil microbiota. Nevertheless, multiple other molecular biology-based techniques [43] may also be applied during metagenome extraction and screening. The basic workflow for extracting metagenomes from contaminated soil is shown in Fig. 1. The workflow begins with the collection of contaminated soil from the environment. The collected sample can be processed in two ways: by direct cell lysis and DNA purification, or by separating the cells from the contaminated soil before cell lysis and DNA purification. The isolated DNA is then cloned using specific cloning vectors, and the cloned DNA is delivered into host cells using different gene delivery systems. The multiplied host cells containing contaminated-soil DNA form a metagenome library, and these contaminated-soil metagenomes are then screened. A recent study screened biosurfactant producers from petroleum hydrocarbon-contaminated sources in cold marine environments. In this study, the researchers isolated and characterized 55 biosurfactant-producing isolates from 8 different genera, including 1 Alcanivorax, 1 Exiguobacterium, and 2 Halomonas strains [44].

4.2. Fluorescence-Activated Cell Sorting (FACS)

Fluorescence-activated cell sorting is one of the most widely used cell sorting techniques; it sorts microbial cells based on fluorescence during metagenomic screening [45], at a rate of 5,000 cells per second [46]. Figure 2 shows the schematic flow of the SIGEX and intracellular biosensor methods. High-throughput screening of this kind does not require a selectable phenotype. This has led to a focus on readily visible phenotypes such as pigments, which lend themselves to fluorescence-activated cell sorting. Moreover, FACS can be used to detect the expression of certain types of genes through their regulation of a fluorescent biosensor present in the same cell as the metagenomic DNA [47, 48]. Hence, these screening methods will be critical tools for the rapid selection of cells from metagenomic libraries.

Figure 2.

Systematic workflow representing examples of high-throughput screens: (A) SIGEX and (B) an intracellular biosensor. SIGEX exploits the principle that catabolic genes are often substrate-induced by fusing a promoterless GFP to the metagenomic DNA and identifying clones in which GFP production is induced by the substrate of interest. An intracellular biosensor detects biologically active small molecules. GFP expression is dependent on the presence of a small molecule that activates a regulator. Finally, FACS is used to sort the GFP+ and GFP- cells separately.
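The sorting decision behind a SIGEX-style screen can be sketched as a simple threshold gate. This is only an illustration under stated assumptions: the `Clone` record, the fluorescence values, and the `GATE` threshold are hypothetical and are not taken from the chapter or from any FACS instrument software. The only figure reused from the text is the sorting rate of 5,000 cells per second.

```python
from dataclasses import dataclass


@dataclass
class Clone:
    name: str
    gfp_without_substrate: float  # arbitrary fluorescence units (hypothetical)
    gfp_with_substrate: float     # arbitrary fluorescence units (hypothetical)


GATE = 100.0  # hypothetical GFP+ threshold, not a value from the chapter


def is_substrate_induced(clone, gate=GATE):
    """True if the clone is GFP- without substrate and GFP+ with it."""
    return clone.gfp_without_substrate < gate <= clone.gfp_with_substrate


def sorting_time_seconds(n_cells, cells_per_second=5000):
    """Time needed to pass n_cells through the sorter at the quoted rate."""
    return n_cells / cells_per_second


library = [
    Clone("clone_A", 12.0, 850.0),   # induced by the substrate: kept
    Clone("clone_B", 640.0, 700.0),  # fluorescent even without substrate: rejected
    Clone("clone_C", 9.0, 15.0),     # never fluorescent: rejected
]

hits = [c.name for c in library if is_substrate_induced(c)]
print(hits)
print(sorting_time_seconds(1_000_000), "seconds for one million cells")
```

Gating on both measurements is what separates substrate-induced clones from clones that express GFP constitutively, which a single threshold on the induced sample alone would not do.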

4.3. Metagenomic sequencing strategies

Genome sequencing technologies have been upgraded frequently [49] since the completion of the human genome at the beginning of the 21st century [50]. Multiple next-generation sequencing strategies are applied to sequence the metagenomes of different microbial communities [51, 52]. Sequencing technologies began with the Sanger sequencing method, which was widely used during human genome sequencing [53, 54]. Technological drift has since delivered next-generation sequencing techniques such as pyrosequencing [55, 56], ligation sequencing [57, 58], reversible terminator sequencing [60, 61], and single-molecule sequencing by synthesis [62, 63], which provide high-throughput reads in comparatively less time [64-66]. A comparative overview of the sequencing technologies applied in metagenome sequencing is provided in Table 1. However, most metagenomics researchers prefer the pyrosequencing method for sequencing the metagenomes of microbial communities [67-70].

Sequence Reaction Method | Read Length (bases) | Amplification | Data Production | Templates per Run | Commercially Available as
Sanger’s Method | ~900 to 1,100 | PCR | 1 Mb per day | 96 | ABI 3730xl
Pyrosequencing | ~400 | Emulsion PCR | 400 Mb per run in 7.5 to 8 hours | 1,000,000 | 454 FLX Roche
Reversible Terminator | 36 to 175 | Bridge PCR | >17 Gb per run in 3 to 6 days | 40,000,000 | Illumina SOLEXA Genome Analyser
Ligation Sequencing | ~50 | Emulsion PCR | 10 to 15 Gb per run in 6 days | 85,000,000 | ABI SOLiD
Single-Molecule Sequencing by Synthesis | 30 to 35 | None | 21 to 28 Gb per run in 8 days | 800,000,000 | Helicos Heliscope

Table 1.

A Comparative overview of next-generation sequencing technologies applied in metagenome sequencing
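Because Table 1 reports data production per run with different run lengths, the platforms are easier to compare on a common per-day scale. The short calculation below is a sketch of that conversion. It copies the yields and run times from Table 1 and makes two labeled assumptions: the ">17 Gb" entry is treated as exactly 17 Gb, and 1 Gb is taken as 1,000 Mb.

```python
# Yield per run (Mb) and run duration (days), copied from Table 1.
# Each entry is ((low_yield, high_yield), (shortest_run, longest_run)).
# Assumptions: ">17 Gb" is treated as 17 Gb, and 1 Gb = 1,000 Mb.
PLATFORMS = {
    "Sanger": ((1, 1), (1, 1)),
    "Pyrosequencing": ((400, 400), (7.5 / 24, 8 / 24)),
    "Reversible terminator": ((17_000, 17_000), (3, 6)),
    "Ligation": ((10_000, 15_000), (6, 6)),
    "Single-molecule SBS": ((21_000, 28_000), (8, 8)),
}


def mb_per_day_range(yield_mb, days):
    """Return (slowest, fastest) throughput in Mb per day."""
    low_yield, high_yield = yield_mb
    shortest_run, longest_run = days
    return low_yield / longest_run, high_yield / shortest_run


for name, (yield_mb, days) in PLATFORMS.items():
    slowest, fastest = mb_per_day_range(yield_mb, days)
    print(f"{name:22s} {slowest:8.0f} to {fastest:8.0f} Mb/day")
```

Under these assumptions a pyrosequencing run corresponds to roughly 1,200 to 1,280 Mb per day of instrument time, which ignores library preparation and any idle time between runs.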

5. Prevalent metagenomes for bioremediation

Metagenomes extracted from uncultured microbial communities at multiple contaminated sites are screened and further examined for degrading properties [71]. Microbial communities vary according to the characteristics of the source and site of contamination [72]. A metagenomic analysis of heavy metal-contaminated groundwater revealed metagenomes of γ- and β-Proteobacteria, dominated by Rhodanobacter-like γ-proteobacterial and Burkholderia-like β-proteobacterial species, in a habitat with extremely high levels of uranium, nitrate, technetium and various organic contaminants [73]. Moreover, multiple metagenome projects are under way around the world; Table 2 lists several environmental metagenome projects together with the top phylum, that is, the one with the highest percentage of presence in each metagenomic community. Studies on microbial adaptation to toxic environments may help to trace new metagenomic communities useful for efficient bioremediation. The specific functions and interactions of microbial communities with respect to their contaminant-degrading capabilities may result from environment-driven gene switching in the metagenomes.

Top Phylum | Percentage of Presence in Community | Domain | Metagenome Project | Source
Actinobacteria | 38.04 | Bacteria | BASE - Biomes of Australian Soil Environments | Soil
Chlorobi | 56.04 | Bacteria | Antarctica Aquatic Microbial Metagenome | Environmental
Actinobacteria | 38.21 | Bacteria | American Lake Mendota metagenome | Water
Proteobacteria | 31.62 | Bacteria | Swedish Lake Vattern metagenome | Water
Proteobacteria | 29.68 | Bacteria | Detoxification of arsenic mediated by microbial sulphate reduction in Mediterranean marine sediments | Environmental
Proteobacteria | 48.12 | Bacteria | Illumina and 454-based metatranscriptomic analyses of a diatom-induced bacterioplankton bloom in the North Sea | Environmental
Unassigned Bacteria | 34.8 | Bacteria | Functional metagenomic profiling of Tibetan Plateau soils affected by permafrost or seasonal freezing | Soil
Euryarchaeota | 22.71 | Archaea | Lonar Lake Sediment prokaryotic metagenome | Water
Unassigned Bacteria | 53.84 | Bacteria | Metagenome of a microbial consortium obtained from the tuna oil field in the Gippsland Basin, Australia | Environmental
Actinobacteria | 27.1 | Bacteria | Meta soil | Soil

Table 2.

List of environmental metagenome projects, with the top phylum by percentage of presence in each metagenomic community

6. Bioinformatic tools for metagenomic bioremediation

In the last two decades, bioinformatics has advanced and has been adopted across multiple fields of science, from basic sciences to advanced applied sciences [74]. Our previous study gave an overview of basic applications of bioinformatics in bioremediation [75]. Bioinformatics performs multiple tasks in the field of metagenomic bioremediation, chiefly during metagenomic data analysis [76, 77]. A special issue on bioinformatics approaches and tools for metagenomic analysis has provided an advanced view of the comprehensive bioinformatic tools and methodologies used in metagenomics [78].

Multiple metagenomic projects are generating large volumes of metagenomic sequence data, challenging bioinformatics to develop better and more robust tools for analyzing them. A recent study characterized soil microbial communities using metagenomic approaches [79]. The researchers used 33 publicly available metagenomes obtained from diverse soil sites and integrated several state-of-the-art computational tools to explore the phylogenetic and functional characteristics of the microbial communities in soil. Recently, multiple advances have taken place in bioinformatics with respect to metagenomic bioremediation. This section focuses on recent bioinformatic tools and datasets widely used in the analysis of metagenomic data in bioremediation. A comparative overview of the functions and suitability of the most widely used tools for metagenomic analysis is given in Table 3.

6.1. MEGAN

Meta Genome Analyzer (MEGAN) is one of the most widely used software tools for efficiently analyzing large volumes of metagenomic sequence data [80, 81]. It is used to interactively analyze and compare metagenomic and metatranscriptomic data, both taxonomically and functionally. For taxonomic analysis, the program places reads onto the NCBI taxonomy; functional analysis is performed by mapping reads to the SEED, COG, and KEGG classifications. In addition, samples can be compared taxonomically and functionally using a wide range of charting and visualization techniques such as co-occurrence plots. The software also performs PCoA (Principal Coordinates Analysis) and clustering, allowing high-level comparison of large numbers of samples [82]. Different attributes of the samples can be captured and used during analysis. Moreover, MEGAN supports different input data formats and can export the results of an analysis in different text-based and graphical formats. Its multiple methods of analysis, its ability to accept and compare high-throughput data, its robustness, and its ease of use are some of the features that have made MEGAN one of the most widely used metagenome analyzers.
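Placing a read onto a taxonomy can be illustrated with a lowest common ancestor (LCA) rule, which is the approach MEGAN is documented as using: a read whose database hits span several taxa is assigned to the deepest node shared by all of them. The sketch below is a simplified illustration, not MEGAN's code. The `PARENT` table is a small hypothetical stand-in for the NCBI taxonomy, and the function names are chosen for this example only.

```python
# Toy child -> parent table: a hypothetical stand-in for the NCBI taxonomy.
PARENT = {
    "Pseudomonas putida": "Pseudomonas",
    "Pseudomonas fluorescens": "Pseudomonas",
    "Pseudomonas": "Gammaproteobacteria",
    "Rhodanobacter": "Gammaproteobacteria",
    "Burkholderia": "Betaproteobacteria",
    "Gammaproteobacteria": "Proteobacteria",
    "Betaproteobacteria": "Proteobacteria",
    "Proteobacteria": "Bacteria",
    "Bacteria": "root",
}


def lineage(taxon):
    """Path from a taxon up to the root, with the taxon itself first."""
    path = [taxon]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def lowest_common_ancestor(hit_taxa):
    """Deepest node shared by the lineages of all taxa hit by one read."""
    if not hit_taxa:
        raise ValueError("a read needs at least one hit to be placed")
    shared = set(lineage(hit_taxa[0]))
    for taxon in hit_taxa[1:]:
        shared &= set(lineage(taxon))
    # Walking one lineage from leaf to root, the first shared node is the deepest.
    for node in lineage(hit_taxa[0]):
        if node in shared:
            return node
    return None  # no shared node, e.g. a taxon missing from the table


print(lowest_common_ancestor(["Pseudomonas putida", "Pseudomonas fluorescens"]))
print(lowest_common_ancestor(["Pseudomonas putida", "Rhodanobacter"]))
print(lowest_common_ancestor(["Pseudomonas putida", "Burkholderia"]))
```

The design trades resolution for caution: reads with ambiguous hits are pushed to a higher rank rather than being assigned to a single species on weak evidence.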

6.2. SmashCommunity

Simple Metagenomics Analysis SHell for microbial communities (SmashCommunity) is a stand-alone metagenomic annotation and analysis pipeline that shares design principles and routines with SmashCell [83]. It is suitable for data generated by Sanger and 454 sequencing technologies. It supports state-of-the-art software for essential metagenomic tasks such as assembly and gene prediction. It also provides tools to estimate the quantitative phylogenetic and functional compositions of metagenomes, to compare the compositions of multiple metagenomes, and to produce intuitive visual representations of such analyses [84]. It provides optimized parameter sets for Arachne and Celera for metagenome assembly, and for GeneMark and MetaGene for predicting protein-coding genes in metagenomes. SmashCommunity also includes scripts for downstream analysis of datasets. These scripts can generate intuitive tree-based visualizations of results using the batch access API of the interactive Tree of Life (iTOL) web tool. SmashCommunity can also compare multiple metagenomes using these profiles, cluster them based on a relative entropy-based distance measure suitable for comparing such quantitative profiles, perform bootstrap analysis of the clustering, and generate visual representations of the clustering results.
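To make the idea of a relative entropy-based distance between quantitative profiles concrete, the sketch below computes the Jensen-Shannon distance between two phylum-count profiles. Treating Jensen-Shannon as the measure is an assumption made for illustration; the chapter does not state which relative entropy-based measure SmashCommunity implements, and the function names and counts here are hypothetical.

```python
import math


def normalise(counts):
    """Convert raw counts into relative abundances that sum to 1."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


def relative_entropy(p, q):
    """Kullback-Leibler divergence D(p || q) in bits; q must be > 0 where p is."""
    return sum(pv * math.log2(pv / q[k]) for k, pv in p.items() if pv > 0)


def jensen_shannon_distance(counts_a, counts_b):
    """Square root of the Jensen-Shannon divergence between two profiles."""
    p, q = normalise(counts_a), normalise(counts_b)
    taxa = set(p) | set(q)
    p = {t: p.get(t, 0.0) for t in taxa}
    q = {t: q.get(t, 0.0) for t in taxa}
    # The midpoint profile is non-zero wherever p or q is, so D(. || m) is finite.
    m = {t: 0.5 * (p[t] + q[t]) for t in taxa}
    divergence = 0.5 * relative_entropy(p, m) + 0.5 * relative_entropy(q, m)
    return math.sqrt(max(divergence, 0.0))


# Hypothetical read counts per phylum for two samples.
soil = {"Actinobacteria": 380, "Proteobacteria": 300, "Euryarchaeota": 20}
lake = {"Actinobacteria": 120, "Proteobacteria": 320, "Chlorobi": 60}
print(round(jensen_shannon_distance(soil, lake), 3))
```

With base-2 logarithms the distance is 0 for identical profiles and 1 for profiles with no taxa in common, and it is symmetric, which makes it usable as input to a clustering step.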


6.3. CAMERA

Community Cyberinfrastructure for Advanced Microbial Ecology Research and Analysis (CAMERA) is a database and associated computational infrastructure that provides a single system for depositing, locating, analyzing, visualizing, and sharing data about microbial biology through an advanced web-based analysis portal [85]. CAMERA holds a large volume of data, including environmental metagenomic and genomic sequence data, associated environmental parameters, pre-computed search results, and software tools to support powerful cross-analysis of environmental samples. CAMERA collects and links metadata relevant to environmental metagenome datasets with annotation in a semantically aware environment that allows users to write expressive semantic queries to the database. It also provides data submission tools that allow researchers to share and forward data to other metagenomic sites and community data archives. CAMERA is best regarded as a complete genome-analysis tool that allows users to query, analyze, annotate, and compare metagenome and genome data [86].

6.4. MG-RAST

Rapid Annotation using Subsystems Technology for Metagenomes (MG-RAST) is an automated analysis platform for metagenomes, providing quantitative insights into microbial populations based on sequence data [87]. This pipeline performs quality control, protein prediction, clustering, and similarity-based annotation on nucleic acid sequence datasets using a number of bioinformatic tools. Users can upload raw sequence data in FASTA format; the sequences will be normalized and processed, and summaries will be automatically generated. The MG-RAST server provides several methods of access to different data types, including phylogenetic and metabolic reconstructions, and has the ability to compare metabolism and annotations of one or more metagenomes and genomes. In addition, the server also offers a comprehensive search capability. The pipeline is implemented in Perl by using a number of open-source components, including the SEED framework, NCBI BLAST, SQLite, and Sun Grid Engine.
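The kind of input handling described above, reading FASTA data and producing summary statistics after a basic quality filter, can be sketched in a few lines. This is not MG-RAST's pipeline or API: the functions `parse_fasta` and `summarise`, the `min_length` filter, and the example reads are all hypothetical and only illustrate the general idea.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def summarise(records, min_length=0):
    """Count sequences and bases and compute GC fraction after a length filter."""
    kept = [seq for _, seq in records if len(seq) >= min_length]
    total = sum(len(seq) for seq in kept)
    gc = sum(seq.count("G") + seq.count("C") for seq in kept)
    return {
        "sequences": len(kept),
        "bases": total,
        "gc_fraction": gc / total if total else 0.0,
    }


# Hypothetical two-read input; real uploads would be read from a file.
example = ">read1\nACGT\nGGCC\n>read2\nAT\n"
print(summarise(parse_fasta(example)))
print(summarise(parse_fasta(example), min_length=5))
```

Keeping the parser as a generator means large read sets can be streamed through the summary without holding every header in memory, although `summarise` as written still collects the kept sequences in a list.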

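Single Genomes | - | - | + | + | - | -
Single Cell Genomes | - | - | - | - | - | -
Shotgun Metagenomes | + | + | + | + | + | -
16S rDNA Metagenomes | + | + | - | + | + | +
Gene Prediction | + | - | + | + | - | -
Functional Analysis | + | + | + | + | + | -
Taxonomy Assignment | + | + | + | + | + | +
Comparative Analyses | + | + | + | + | + | +
Data Management | + | - | + | - | - | -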

Table 3.

A comparative overview of the functions and suitability of the most widely used tools for metagenomic analysis

+ Yes, - No

6.5. IMG/M

The Integrated Microbial Genomes and Metagenomes (IMG/M) system supports annotation, analysis, and distribution of microbial genome and metagenome datasets. IMG/M provides comparative data analysis tools extended to handle metagenome data, together with metagenome-specific analysis [88, 89]. IMG/M consists of samples of microbial community aggregate genomes integrated with IMG’s comprehensive set of genomes from all three domains of life, as well as plasmids, viruses, and genome fragments. Function-based comparison of metagenome samples and genomes is provided by analytical tools that allow examination of the relative abundance of protein families, functional families or functional categories across metagenome samples and genomes. Registered users gain additional benefit from IMG/M: the tools that handle substantially larger metagenome datasets are available only to registered users as part of the ‘My IMG’ toolkit, which supports specifying, managing, and analyzing persistent sets of genes, functions, genomes or metagenome samples and scaffolds.

7. Summary

Metagenomics is a strategic approach for analyzing microbial communities at a genomic level, and it gives a glimpse of the community-level view of the “Uncultured Microbiota”. Bioremediation has always adopted new advances in science and technology to establish better environments, and metagenomics can be considered one of the best of these. Identification and screening of metagenomes from polluted environments are crucial in a metagenomic study. The second section emphasizes multiple recent case studies that explain metagenomic approaches in bioremediation. The third section covers metagenomic bioremediation in different contaminated environments such as soil and water. The fourth section explains sequence- and function-based metagenomic strategies and tools, giving a detailed view of metagenomic screening, FACS, and advanced metagenomic sequencing strategies. The fifth section deals with the prevalent metagenomes in bioremediation and lists prevalent metagenomic organisms and their respective projects. The last section gives a detailed view of the major bioinformatic tools and datasets most commonly used in metagenomic data analysis and processing during metagenomic bioremediation.

How to cite and reference

Pratap Devarapalli and Ranjith N. Kumavath (September 9th 2015). Metagenomics — A Technological Drift in Bioremediation, Advances in Bioremediation of Wastewater and Polluted Soil, Naofumi Shiomi, IntechOpen, DOI: 10.5772/60749. Available from:
