Open access peer-reviewed chapter - ONLINE FIRST

Soil Metagenomics: Prospects and Challenges

By Prashant Kaushik, Opinder Singh Sandhu, Navjot Singh Brar, Vivek Kumar, Gurdeep Singh Malhi, Hari Kesh and Ishan Saini

Submitted: June 25th 2020. Reviewed: July 2nd 2020. Published: September 4th 2020.

DOI: 10.5772/intechopen.93306


Improved strategies for examining RNA or DNA extracted from soil allow us to understand the diversity and functions of soil microbes, which are difficult to identify with conventional culture techniques. The literature on soil metagenomics and its usefulness is growing steadily, and so is practical experience with its implementation. Omics techniques will help metagenomics contribute to agricultural sustainability. In this effort, a thorough understanding of a reference soil would support upcoming soil survey initiatives by reducing bias and increasing objectivity. Although interpretations drawn from limited data have influenced microbial ecologists, the extent of methodological bias remains unknown. A detailed catalog of functional genes and soil microorganisms does not yet exist for any soil. This chapter discusses soil metagenomics, its importance, and conventional methods of analysis, along with its prospects and challenges.


  • genomics
  • soil
  • microbes
  • metagenomics

1. Introduction

Soil is a robust and enormously diverse ecosystem (an estimated 2000 to 8.3 million bacterial species per gram). It therefore serves as a vast reservoir of microorganisms, pathogenic or beneficial, each inhabiting a distinct niche within the soil ecosystem [1, 2, 3, 4]. Every soil fraction (i.e., sand, silt, clay, and organic matter), whether in grasslands, forests, or deserts, offers habitats for nematodes and for a wide range of microbes, bacteria among them, that contribute to nutrient cycling [5, 6, 7, 8]. Moreover, each distinct microhabitat is occupied by microorganisms able to adapt and establish colonies in that specific niche [9]. The key factors that influence the microbial load in the soil ecosystem include soil pH, organic compounds, and temperature [10, 11, 12]. Soil development is determined not only by chemical and physical activity but also by the continual succession of different microbial species, which may improve soil attributes with respect to function and structure [13, 14]. Soil shelters various soil-dwelling animals, reptiles, and insects, along with a tremendous number of microbes inside soil aggregates [15].

Metagenomics remains a ground-breaking technology that has made it possible to explore microbial diversity to its full extent [16]. Within the soil ecosystem, microbes react quickly to anthropogenic pressures, which makes them feasible indicators of soil quality and health [17, 18]. Recently, efforts have been made to identify genes from environmental samples using culture-independent techniques [19, 20]. However, these genes were amplified or recognized on the basis of their similarity to previously identified genes, which makes the approach unsuitable for discovering novel elements of metal resistance [21, 22]. With the development of culture-independent metagenomic methods, metagenomics has been used to evaluate the soil microbial community and has enhanced our understanding of the soil ecosystem [17, 20]. Furthermore, soil microbial communities consist mainly of a few dominant species and numerous rare taxa [23].

These low-abundance taxa may belong to novel microbial lineages and may play a vital role in the biogeochemical interactions of the soil–plant system [24]. Therefore, information obtained from full metagenomic sequencing is crucial for uncovering the genomic data of low-abundance populations and revealing their activity in the soil [6]. This approach has been applied effectively to investigate varied microbial niches in the human gut, grassland soil, and aquatic ecosystems [25, 26, 27]. Furthermore, attempts have been made to evaluate the abundance of soil microbes and of genes involved in heavy-metal resistance in agricultural soils [28, 29]. Beyond estimating soil microorganisms, soil metagenomics can also help characterize the soil and its habitats across different soil types [30]. This chapter presents the importance of soil metagenomics and standard methods of analysis, along with the challenges and prospects of the field.

2. Soil health and metagenomics

Soil is an interconnected system because of its microorganisms. Despite their remarkable capacity to adapt to changing conditions, soil microbes are highly sensitive to land management and to changes in weather [1, 11, 14]. Drawing on such knowledge, our ancestors learned to grow plants and developed cultivation methods, such as inoculating food and floral crops with mycorrhizal fungi, to reduce the impact of soil-borne diseases [31, 32, 33, 34]. As one of the most varied ecosystems, composed of known and unknown microbial species, soil provides an ecological niche [34, 35]. Soil biochemistry reflects many uncharacterized functions that are essential for sustaining life [35, 36, 37]. Nevertheless, modern technologies that rely on heavy machinery and intensive management have intensified agriculture and degraded cultivable farmland through damage to fertility, soil structure, and soil microbial life [38, 39, 40]. As a result, large arable areas have turned into uncultivable or saline soil [38]. At the same time, agricultural land is being lost to nonagricultural uses [39]. Soil, which holds about twice as much carbon as the atmosphere, forms through a complex process, and many years are needed to form 1 cm of topsoil [41]. Metagenomic data can be used to investigate the gene sequences involved in microbial symbiosis, the most ancient symbiosis in nature, dating back around 400 million years [42]. As population pressure increases, so does concern about global sustainability. Improving and sustaining soil quality has therefore been a major concern for many years, and soil health is among the most crucial aspects of agriculture [8, 24].

Metagenomics offers an entirely new way of looking at microbial communities. It has transformed contemporary microbiology and has the potential to revolutionize our understanding of various ecosystems [43, 44]. In metagenomics, the power of genomic analysis is applied to whole populations of microorganisms [45]. Metagenomic approaches are shedding light on the myriad capabilities of microbial communities that drive the planet's energy and nutrient cycles and shape the evolution of life [46, 47]. Metagenomics is expected to yield insight into microbial interactions, which can in turn be used to enhance human well-being, energy production, and food security [48]. Metagenomics combines the strengths of genomics, systems biology, and bioinformatics, and applying them to the study of whole communities provides an unparalleled capability [43, 45]. Although metagenomics is still a very new science, it has produced valuable insight into microbial communities because of its radically different way of viewing the microbial world [49, 50]. The mixed sample of DNA may then be analyzed directly or cloned into a form that can be maintained in laboratory bacteria, creating a library that contains the genomes of all the microbes present in that environment [51, 52, 53]. The introduction of culture-independent approaches has removed the obstacles and barriers to understanding environmental samples [30].

Metagenomics initially relied on shotgun sequencing; today it is equally useful for studies of marker genes such as 16S rRNA using next-generation sequencing (NGS) systems. In these studies, the specific region of DNA encoding 16S rRNA is extracted, amplified, sequenced, and identified on the basis of sequence similarity to genes available in public databases [54, 55]. NGS, together with polymerase chain reaction (PCR) and DNA fingerprinting techniques, has become increasingly rapid, effective, sensitive, and cost-efficient [55]. Culture-independent approaches depend on the direct extraction of soil DNA followed by examination of the genes encoding rRNA [56]. The use of next-generation sequencing and analysis has revealed previously undiscovered microbial community structure in a variety of soil ecosystems [57, 58, 59]. A comprehensive study of the soil metagenome provided a functional characterization of soil microorganisms linked to genes involved in nutrient cycling [58]. Nevertheless, efforts are now being directed at testing predicted gene functions against their actual roles in situ, particularly in soil, where metagenomes can easily be trapped within biofilms [60, 61].
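The similarity-based identification step described above can be illustrated with a minimal sketch. It assumes that the query and the reference sequences have already been aligned to the same length, and it uses made-up fragments and hypothetical taxon names in place of real database entries. Real pipelines rely on dedicated alignment tools and curated databases, so this only shows the principle of scoring a read against references and keeping the best match.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of matching positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


def best_match(query: str, references: dict[str, str]) -> tuple[str, float]:
    """Return the name and identity of the reference most similar to the query."""
    scored = {name: percent_identity(query, seq) for name, seq in references.items()}
    name = max(scored, key=scored.get)
    return name, scored[name]


# Hypothetical, made-up fragments standing in for database entries.
REFERENCES = {
    "taxon_A": "ACGTACGTAC",
    "taxon_B": "ACGTTCGAAC",
}
QUERY = "ACGTACGTAA"

if __name__ == "__main__":
    name, identity = best_match(QUERY, REFERENCES)
    print(f"best match: {name} ({identity:.1f}% identity)")
```

In practice an identity threshold would be applied before a read is assigned to a taxon; the choice of threshold is study-specific and is not prescribed here.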

3. The need for microbial identification and characterization

It is well recognized that microorganisms are more abundant and more diverse than any other organisms on the planet [4]. Nevertheless, the distribution of microbial diversity at global scales is still only partially understood. Microbial diversity and community composition are significantly affected by environmental factors [62]. Consequently, indexing, cataloging, and identification of microorganisms are prerequisites for their exploration [62]. Microbial diversity in any habitat relates chiefly to the number of species present at a given time [63, 64]. Soil microbial communities play essential roles in soil health management, agro-ecosystem functioning, the availability of plant nutrients, and the turnover of organic matter in soil, and they are strongly influenced by both anthropogenic and natural activities [65, 66]. For instance, many microbes that contribute to ecosystem services are currently threatened by poor agricultural practices, changing climate patterns, and soil and land degradation [67, 68]. In recent years, the use of artificial fertilizers, herbicides, fungicides, and other pesticides has degraded the soil microflora and its diversity [7, 31, 69]. Therefore, studying microorganisms under a changing environment will offer a broader picture of how microbes are shifting the functional characteristics of soils and how they persist in endangered ecosystems [69, 70].

4. Metagenomics for sustainable agricultural practices

Today, the environmental focus in agriculture is largely on achieving sustainability. Many metagenomic initiatives have been completed in agriculture, but they hold little promise of assisting marginal farmers [71, 72, 73]. Therefore, applied studies are needed that could raise growers' income and benefit agriculture [74]. Recent advances have made soil metagenomics an exceptional area of research because of its role in understanding the microorganisms associated with plant growth and development [75, 76]. Likewise, restoration of the microbial population was found to improve grain yield and soil health [77]. Metagenomics can predict the structure of soil microbial communities and the effects on microbial groups in connected niches [9, 35].

Sustainably managed agricultural systems comprise different microhabitats with large environmental fluctuations and high genetic biodiversity [78]. Reports from agricultural soils have confirmed high microbial abundance and plant growth-promoting activities [79]. Many studies document recent metagenomic advances in agriculture [76, 77, 78]. Soil microbes play a crucial part in triggering plant development, stress responses, and defense [80, 81]. Understanding the connection between the soil microbiota and plants through soil metagenomics is highly advantageous for developing cropping systems [1, 82]. Metagenomic studies of soils amended with organic manures from different farm animals will be valuable in formulating fertilization strategies [12, 65]. For sustainable agricultural production, beneficial microbes of agricultural value can serve as an essential alternative [39, 73]. Metagenomics can also address basic questions of restoration associated with agriculturally important microorganisms [83].

Direct DNA extraction and characterization through PCR and metagenomic surveys have advanced the study of the soil ecosystem [81]. Metagenomics of plant-microbe associations can be used to study interactions between beneficial microbes and pathogenic strains. In fact, beneficial endosymbionts, such as nitrifying bacteria and phosphate-solubilizing bacteria (PSB) with plant growth-promoting abilities, have recently been found inside beneficial microbes such as arbuscular mycorrhizal fungi (AMF) [84]. The specificity of plant-microbe symbiosis development can be understood at the molecular level for both agronomic and horticultural crops using forward and reverse genetic approaches [85]. As reported earlier, microbial inoculation can increase plant production and sustainability in agricultural fields. Metagenomic studies can therefore reveal which microbial strains interact with which chemical compounds in mycorrhizospheric soil, and can characterize the community structure, horizontal gene transfer, and phylogeny of microbes interacting with other environmental factors [86, 87]. In addition, precise niche information can be obtained for microbial communities inhabiting the soil adhering to roots, the interface between roots and soil, and the root surface, or colonizing the roots themselves [87]. This interaction is bidirectional: the plant obtains essential nutrients from the soil, and in return the rhizospheric and/or root microbiota receive photosynthesis-derived organic compounds, a process known as rhizodeposition [88]. Thus, the crucial symbiosis underlying plant-microbe community associations can be readily elucidated by metagenomics for agricultural purposes, because NGS can determine the relative abundance of a microbe whether or not it can be cultured under laboratory conditions [89]. In addition, results for hundreds of samples can be obtained simultaneously, on the same day the samples are loaded [89].
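As a small illustration of what "relative abundance" means in this context, the sketch below converts per-taxon read counts into proportions of the total. The taxon labels and counts are hypothetical, and the function name is chosen for this example only; it is not taken from any particular analysis package.

```python
def relative_abundance(counts: dict[str, int]) -> dict[str, float]:
    """Convert raw read counts per taxon into fractions of the sample total."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("the sample must contain at least one read")
    return {taxon: n / total for taxon, n in counts.items()}


# Hypothetical read counts for one rhizosphere sample.
sample_counts = {"taxon_A": 50, "taxon_B": 30, "taxon_C": 20}

if __name__ == "__main__":
    for taxon, fraction in relative_abundance(sample_counts).items():
        print(f"{taxon}: {fraction:.2%}")
```

Because the values are proportions of the reads in one sample, they describe community composition rather than absolute cell numbers.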

5. Amplicon sequencing and bioprospecting of metagenomes

In an amplicon sequencing study, soil DNA is first extracted, and 16S/18S rDNA sequences are then amplified using a specific set of primers targeting variable regions of the 16S/18S rDNA, followed by purification of the fragments using magnetic beads [54, 90, 91, 92]. Subsequently, adapters are ligated, the library of fragments (clones) is amplified, and the samples are sequenced on an NGS platform (Figure 1). The dataset obtained after sequencing is used to identify microbial diversity [54, 55]. With NGS and the associated software, it is possible to resolve highly complex microbiota compositions with greater precision and to relate them to the microbial ecology of the soil [16, 55, 92]. However, when amplicon sequencing with marker genes is chosen, care must be taken to ensure accurate data analysis and interpretation [92].
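One common way to summarize microbial diversity from an amplicon dataset is to compute richness and the Shannon index from a table of OTU counts. The chapter does not name a specific diversity measure, so the choice of the Shannon index here is an assumption made for illustration, and the sample names and counts are hypothetical.

```python
import math


def richness(counts: list[int]) -> int:
    """Number of OTUs observed at least once in the sample."""
    return sum(1 for c in counts if c > 0)


def shannon_index(counts: list[int]) -> float:
    """Shannon index H = -sum(p_i * ln p_i) over OTUs with non-zero counts."""
    positive = [c for c in counts if c > 0]
    total = sum(positive)
    if total == 0:
        raise ValueError("at least one OTU must have a non-zero count")
    return -sum((c / total) * math.log(c / total) for c in positive)


# Hypothetical OTU counts for two samples with the same richness.
otu_counts = {
    "sample_even": [25, 25, 25, 25],
    "sample_skewed": [97, 1, 1, 1],
}

if __name__ == "__main__":
    for name, counts in otu_counts.items():
        print(name, richness(counts), round(shannon_index(counts), 3))
```

The two samples contain the same number of OTUs, but the evenly distributed one yields the higher index, which is why richness alone is usually not reported on its own.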

Figure 1.

Metagenomic analysis of environmental microbial sampling based on nucleic acids.

The discovery of novel metabolites and biomolecules (DNA polymerases, cellulases, lipases/esterases, chitinases, and antibiotics) from microbial assemblages was the first application of metagenomics, and it has advanced with the development of NGS techniques that allow comparisons among community metagenomes, metaproteomes, and metatranscriptomes [93, 94]. Techniques for recovering novel metabolites involve cloning microbial DNA from the environment and constructing small- or large-insert libraries, which are then screened by either function-based or sequence-based methods. Examples of soil amplicon sequencing studies are shown in Table 1. The resulting metagenomic libraries are subsequently transformed into hosts such as Escherichia coli (most commonly), Sulfolobus solfataricus, Thermus thermophilus, Streptomyces, and Proteobacteria, which show significant differences in expression modes [95, 96, 97, 98].

Origin | Sequencing platform/amplicon analysis technique | Total sequencing size | Country | Results | References
Potato field | Pyrosequencing | 1674 OTUs | USA | Identification of potato soil-borne pathogens | [99]
Soil of 3 islands in the Yellow Sea | Pyrosequencing | 10,166 reads | South Korea | Wood decomposing, plant-parasitic, endophytic, ectomycorrhizal and saprotrophic fungi | [100]
6 sites of forest and grassland soils | Pyrosequencing | 598,962 sequences | Germany | Identification of 17 bacterial phyla and 4 proteobacterial classes | [101]
Pea field | Pyrosequencing | 55,460 sequences | Denmark | Fungal species, diversity, community composition of phylum Ascomycota, and Basidiomycota |
Hitchiti Pinus forest, prior used as cotton cultivation | Deep Ion Torrent sequencing | >3,000,000 sequences | USA | 12 fungal strains identification | [103]
Sossego copper mine | Pyrosequencing | 10,978 OTUs | Brazil | 36 bacterial phyla and five proteobacteria classes | [104]
Riverine Wetland soil | Illumina system | 1872 OTUs | USA | 56 different bacterial phyla | [105]
Solid biomedical dumpsites | Illumina system | 1,706,442 sequences | Tanzania | 31 bacterial phyla belonging to aromatic hydrocarbons degraders, chitin degraders, chlorophenol degraders and atrazine metabolizers |
Grave-soil human cadavers | Illumina system | 1,729,482 reads | USA | 45 decomposing microbes identification | [107]
Zea mays fields | Illumina MiSeq | 2,453,023 reads | UK, France, Italy | Comparative account of soil microorganisms of three different sites | [108]
Pepper field | Illumina system | 4147 OTUs | Spain | Studying soil-borne pathogens | [109]
Solid waste dumping site, Chite river site, Turial river site, Tuikual river site | Illumina system | 111,3884 sequences | India | Identified 27 proteobacteria and bacteroidetes | [110]
Tomato | Illumina amplicon sequencing analysis and phytohormone measurements | 337,961 high-quality reads and 647 fungal OTUs | Denmark | Identification of 27 endophytic fungi and root hormone quantification | [111]

Table 1.

Examples of soil amplicon sequencing done so far covering different habitat types.

6. Soil metagenomics: pipelines and outputs

Metagenomics began with DNA cloning and screening and has since led to significant advances in microbiology, evolution, and ecology [91, 92]. The first projects not only proved the concept of metagenomics but also uncovered enormous gene diversity in the microbial world. The steps in soil metagenomics are listed below and shown in Figure 2.

Figure 2.

The layout of metagenomics showing collection of samples from agricultural field and analysis.

6.1 Sample processing

Sampling is the first and most crucial step. The extracted DNA must be of high quality for metagenomic library construction and sequencing. Fractionation or selective lysis is suitable for communities that are associated with a host. Fractionation should be checked to confirm adequate enrichment of the target with minimal contamination.

6.2 Sequencing and assembly

The outcome of metagenomic sequencing depends significantly on the sequencing platform used. NGS technologies such as Illumina/Solexa, 454/Roche, and Oxford Nanopore are now routinely used for metagenomic projects. Contigs are essential for recovering full-length sequences, so the assembly of short reads is a key step in metagenomics; it can be accomplished by co-assembly or de novo assembly methods. De novo assembly, however, requires sophisticated computational tools and assemblers (e.g., MetaVelvet and Meta-IDBA).
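To make the idea of building contigs from short reads concrete, the following toy sketch repeatedly merges the pair of reads with the longest suffix-to-prefix overlap. This greedy overlap approach is a deliberately simplified stand-in chosen for illustration: it is not the algorithm used by MetaVelvet or Meta-IDBA, it ignores sequencing errors and reverse complements, and the reads and the minimum overlap of 3 bases are hypothetical.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge reads pairwise by largest overlap until no overlap of min_len remains."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_size, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            return contigs
        # Join the two reads, writing the shared overlap only once.
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Hypothetical short reads sampled from one underlying sequence.
reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]

if __name__ == "__main__":
    print(greedy_assemble(reads))
```

Reads without a sufficient overlap are returned unmerged, which mirrors how an assembly of a complex community typically yields many separate contigs rather than complete genomes.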

6.3 Binning and annotation

Binning is the process of sorting DNA sequences into groups that represent individual genomes. In the first approach, binning exploits the conserved nucleotide composition of genomes. In the second, DNA fragments are searched against a reference database and binned according to their similarity to known sequences. Some binning algorithms, such as MetaCluster and PhymmBL, use both composition and similarity. If the goal is to analyze reconstructed genomes and large contigs, the minimum contig length for this strategy should be 30,000 bp or longer. In feature prediction, the assembled sequences are labeled, whereas functional annotation involves mapping them against an existing database. Sequences that cannot be mapped represent the vast novelty present in metagenomic samples. Several reference databases can be used for functional annotation, including TIGRFAM, KEGG, eggNOG, and PFAM.
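The compositional side of binning can be sketched with two simple sequence features, GC content and a normalized k-mer frequency profile, together with the 30,000 bp length filter mentioned above. This is a minimal illustration under stated assumptions: the function names, the default k of 4, and the example contigs are hypothetical, and the sketch does not reproduce how MetaCluster or PhymmBL compute their features or assign bins.

```python
from collections import Counter
from itertools import product

MIN_CONTIG_LENGTH = 30_000  # minimum contig length quoted in the text (bp)


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def kmer_profile(seq: str, k: int = 4) -> dict[str, float]:
    """Normalized frequency of every possible A/C/G/T k-mer in the sequence."""
    seq = seq.upper()
    alphabet = set("ACGT")
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= alphabet  # skip windows with ambiguous bases
    )
    total = sum(counts.values())
    return {
        "".join(kmer): (counts["".join(kmer)] / total if total else 0.0)
        for kmer in product("ACGT", repeat=k)
    }


def filter_by_length(contigs: dict[str, str], min_length: int = MIN_CONTIG_LENGTH) -> dict[str, str]:
    """Keep only contigs at least min_length bases long."""
    return {name: seq for name, seq in contigs.items() if len(seq) >= min_length}


if __name__ == "__main__":
    # Hypothetical contigs; real ones would be read from an assembly file.
    contigs = {"contig_1": "ACGT" * 8000, "contig_2": "AATT" * 100}
    for name, seq in filter_by_length(contigs).items():
        profile = kmer_profile(seq)
        print(name, len(seq), round(gc_content(seq), 2), len(profile))
```

Contigs with similar profiles would then be grouped by a clustering method of the analyst's choice; that step is left out here because it depends on the tool and dataset.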

6.4 Statistical analysis and data sharing

Statistical assessment of metagenomic data is vital for establishing the significance of results. It requires an appropriate experimental design with proper replication. Sharing metagenomic data requires substantial computational infrastructure and storage capacity. Several centralized services provide standard formats for recording and documenting experimental details.
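Why replication matters for significance can be shown with a small exact permutation test that compares a measured quantity (for example, the relative abundance of one taxon) between two groups of replicate samples. The chapter does not prescribe a statistical test, so the permutation test is an assumption chosen because it needs no distributional model, and the replicate values below are hypothetical.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_p(group_a: list[float], group_b: list[float]) -> float:
    """Two-sided exact permutation p-value for a difference in group means."""
    if not group_a or not group_b:
        raise ValueError("both groups need at least one replicate")
    pooled = list(group_a) + list(group_b)
    observed = abs(mean(group_a) - mean(group_b))
    extreme = 0
    total = 0
    # Enumerate every way of relabeling the pooled replicates into two groups.
    for idx in combinations(range(len(pooled)), len(group_a)):
        chosen = set(idx)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


# Hypothetical replicate measurements from two treatments.
treated = [1, 2, 3]
control = [10, 11, 12]

if __name__ == "__main__":
    print(exact_permutation_p(treated, control))
```

With three replicates per group there are only 20 possible relabelings, so the smallest attainable two-sided p-value is 2/20. This is one concrete reason why designs with too few replicates cannot demonstrate significance, however large the observed difference.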

7. Future road map

Robust extraction and characterization of the DNA of soil microbiota through amplicon sequencing have revolutionized ecology and the environmental sciences. In essence, the metagenomic analysis of nucleic acids gives direct access to the genomes of the uncultivated majority of underexploited microbial life. Aided by developments in sequencing technologies, microbiologists have discovered more novel species, genera, and genes. The unprecedented range of soil types has prompted continued exploration of a variety of agricultural and environmental features. The capacity to examine soil microbial communities with increasing capability holds perhaps the greatest promise for answering the many mysteries of the microbial world. Molecular methods, including metagenomics, have revolutionized the study of microbial ecology. Yet nearly all microorganisms still cannot be linked to their metabolic roles within a soil community. The increased capacity provided by high-throughput sequencing technologies has helped to characterize and quantify soil diversity. However, these methods are usually used to process more samples at a relatively shallow depth rather than to survey the genomes within a single sample adequately. Figure 3 describes various applications of metagenomics.

Figure 3.

A brief account of applications of metagenomics in different fields.

In addition to high diversity, methodological biases pose a considerable challenge for soil microbial characterization. These biases arise in soil sampling, DNA extraction, adsorption of nucleic acids to soil particles, contributions of extracellular DNA, sample preparation, sequencing protocols, sequence analysis, and functional annotation. Because current sequencing technologies produce millions of reads, the difficulty of interpreting these results adds to the problems microbial ecologists face in determining the involvement of various microorganisms in the many processes of soil. Without a suitable benchmark methodology or dataset for verifying the fidelity of amplicon or metagenomic analyses, it is impossible to assess whether the presence and activity of organisms are adequately evaluated. Furthermore, methodological limitations that prevent the detection of some active and abundant bacteria in soil could lead to an equally serious degree of misinterpretation. No single protocol can be regarded as adequate for the isolation of DNA. Likewise, the taxonomic and potential functional characterization of the soil microbiota would benefit critically from a combination of strategies.

Exact replicates are difficult to obtain because the composition of soil microbial communities changes. A further challenge is that the total number of species in a single soil sample is unknown, with widely varying estimates. A crucial first step toward addressing several of the problems faced by soil microbiologists is to begin building a comprehensive catalog of all microbial community members and functions for at least one reference soil. Such a relatively comprehensive reference dataset would shed light on the as-yet-unknown shape of the soil microbial species frequency distribution and could serve as a definitive reference for assessing shifts in community composition across soil landscapes (i.e., beta diversity). In other words, the extent of bias associated with any individual approach (i.e., a single DNA extraction method) could be determined explicitly by comparing extraction strategies in combination with detailed characterization of the selected reference soil. For instance, the isolation and characterization of cells via single-cell genomics can support phylogenetically targeted analysis. Coupled with extensive DNA-based characterization of the microbial diversity of the chosen reference soil, this research initiative should ideally assess several levels of gene expression, at the level of RNA (metatranscriptomics), proteins (metaproteomics), and metabolites (metametabolomics). By establishing how a reference soil is structured, both temporally and spatially, the information from this coordinated effort could help supply the missing links between conventional soil analyses and the underlying composition of soil microbial communities.

An in-depth exploration of a single reference soil should involve experiments well beyond the typical metagenomic analyses applied to soil samples. This effort will require considerable benchmarking of the sampling procedure itself, which is linked to the identification of a suitable reference site. Such an endeavor would call for a coordinated interdisciplinary consortium with expertise spanning chemistry, soil physics, biochemistry, microbiology, and bioinformatics. The outcomes of this effort could provide an objective foundation for creating standardized protocols for ongoing and future soil microbiological investigations.

How to cite and reference

Prashant Kaushik, Opinder Singh Sandhu, Navjot Singh Brar, Vivek Kumar, Gurdeep Singh Malhi, Hari Kesh and Ishan Saini (September 4th 2020). Soil Metagenomics: Prospects and Challenges [Online First], IntechOpen, DOI: 10.5772/intechopen.93306. Available from:
