Metagenomics can be defined as the set of techniques and procedures used for the culture-independent analysis of the total genomic content of the microorganisms living in a given environment. It has promising applications in both medical and environmental microbiology. The most common use of metagenomics in environmental microbiology is to study, through the analysis of rRNA genes, the diversity of microbial communities in particular environments and how these communities change in response to changes in the physical and chemical properties of those environments.
Metagenomics also provides an opportunity to obtain and identify novel enzymes with industrial applications from extreme environments where unculturable extremophiles live. In such circumstances, functional metagenomics enables the isolation of genes coding for extremozymes, enzymes that remain catalytically active under extreme conditions, or of genes that allow a better understanding of the mechanisms that make such organisms resistant to extreme environmental conditions.
Metagenomics is especially important for studying soil microbiology. It is estimated that the number of distinct microorganisms in 1 gram of soil exceeds the number of microbial species cultured so far. Metagenomics therefore appears to be the ideal culture-independent technique for unraveling the biodiversity of soil microorganisms and for studying how this biodiversity is affected by continuously changing conditions.
2. Sequencing technologies and metagenomics
Taxonomic profiling, characterization, and analysis of microbial communities are now mostly performed using next-generation sequencing (NGS) platforms. These platforms generate high-throughput, short-read sequence data from metagenomic samples at a steadily decreasing cost. They also have the advantage of avoiding the need to clone DNA fragments.
NGS technologies have been developed to suit a range of applications, costs, and capabilities. The most commonly used platforms are the 454 Life Sciences (Roche) and Illumina (Solexa) systems. The 454 sequencing technology, the first commercially available next-generation technology, is based on pyrosequencing. It provides high-throughput and relatively cheap analysis. During the sequencing reaction, nucleotide incorporation into the growing chain is detected through the released pyrophosphate, which is converted into light by an enzymatic reaction. The four nucleotides are added sequentially, one type at a time, so each light signal can be attributed to a specific nucleotide. Finally, the light signals are converted into sequence information. In the 454 pyrosequencer, the DNA fragments are amplified after being fixed on beads in a water-oil emulsion. Pyrosequencing has been employed widely in the analysis of microbial diversity in many environments, including marine environments and different soil environments [11, 12].
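The way sequential nucleotide flows are turned into sequence information can be illustrated with a minimal sketch. It assumes an idealized, noise-free light signal that is exactly proportional to the number of bases incorporated in a flow, and a cyclic dispensing order `TACG`; the function names and the flow order are chosen for illustration and are not taken from any instrument software.

```python
from itertools import cycle

FLOW_ORDER = "TACG"  # assumed cyclic dispensing order, for illustration only


def simulate_flowgram(read, flow_order=FLOW_ORDER):
    """Return one (nucleotide, signal) pair per flow needed to synthesize `read`.

    `read` is the sequence of bases added to the growing chain. The signal is
    an idealized light intensity: the number of identical bases incorporated
    during that flow (0 when the dispensed nucleotide does not match).
    """
    unknown = set(read) - set(flow_order)
    if unknown:
        raise ValueError(f"bases not covered by the flow order: {sorted(unknown)}")

    flows = []
    pos = 0
    dispenser = cycle(flow_order)
    while pos < len(read):
        nucleotide = next(dispenser)
        run = 0
        # A homopolymer stretch is incorporated within a single flow.
        while pos < len(read) and read[pos] == nucleotide:
            run += 1
            pos += 1
        flows.append((nucleotide, run))
    return flows


def decode_flowgram(flows):
    """Convert (nucleotide, signal) pairs back into a base sequence."""
    return "".join(nucleotide * signal for nucleotide, signal in flows)


flows = simulate_flowgram("TTAGGC")
print(flows)
print(decode_flowgram(flows))
```

In this idealized model a run of identical bases yields a single, proportionally stronger signal, so decoding depends on reading the signal intensity of each flow correctly.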
Illumina sequencing technology relies on fluorescently labeled reversible terminator nucleotides. Unlike the dideoxynucleotides used in Sanger sequencing, which are chemically modified to prevent further DNA synthesis, these terminator nucleotides carry a blocking group that can be removed in a single step. DNA synthesis takes place on a chip to which primers are attached. After each cycle, the dyes attached to the incorporated nucleotides are excited by a laser and the incorporated bases are scanned. Before the next synthesis cycle can proceed, the blocking group and the dye are removed by a chemical reaction. The Illumina sequencing platform has been used successfully to study microbial diversity in many environments [13, 14, 15].
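The cycle described above (incorporate one blocked nucleotide, image its dye, remove block and dye) can be sketched as a toy simulation. The dye colours, the function names, and the convention that the template string is given in the order in which the polymerase reads it are all assumptions made for this illustration.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Hypothetical dye colours; real instruments use their own dye chemistry.
DYE_FOR_BASE = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
BASE_FOR_DYE = {dye: base for base, dye in DYE_FOR_BASE.items()}


def sequence_by_synthesis(template, n_cycles):
    """Simulate reversible-terminator sequencing of a single template.

    `template` is assumed to be written in the order the polymerase reads it.
    Returns the dye colour recorded in each cycle.
    """
    signals = []
    for position in range(min(n_cycles, len(template))):
        # 1. Incorporation: the blocking group limits each cycle to one base.
        incorporated = COMPLEMENT[template[position]]
        # 2. Imaging: the dye on the incorporated nucleotide is recorded.
        signals.append(DYE_FOR_BASE[incorporated])
        # 3. Cleavage: block and dye are removed, which is modeled here simply
        #    by moving on to the next template position.
    return signals


def call_bases(signals):
    """Translate the recorded dye colours into the synthesized sequence."""
    return "".join(BASE_FOR_DYE[dye] for dye in signals)


signals = sequence_by_synthesis("AAGC", n_cycles=4)
print(signals)
print(call_bases(signals))
```

Because exactly one base is added per cycle, a run of identical bases is read over several cycles, and the read length equals the number of cycles performed.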
In addition to the abovementioned technologies, more recently developed sequencing technologies are available and are being employed in metagenomic studies. These include the SOLiD 5500 W Series developed by Applied Biosystems, single-molecule real-time (SMRT) DNA sequencing from Pacific Biosciences, and Ion Torrent semiconductor sequencing. More innovative technologies under development could be of great use for metagenomic studies in the near future. Strand sequencing technologies, currently being developed by Oxford Nanopore Technologies, enable the sequencing of an intact DNA strand as it passes through a protein nanopore. Irys Technology, developed by BioNano Genomics, is one of the promising new technologies of the genomics era.
3. The metagenomic approaches
A metagenomics research strategy starts with selecting an ecological or biological environment of interest that hosts a wide variety of microbial communities with potential biotechnological applications. The environments that attract metagenomic researchers are mainly those characterized by extreme or unique conditions. These include environments with highly acidic or alkaline pH; high metal concentrations, pressures, or radiation; and high salinity or extreme temperatures.
Metagenomic analysis consists of isolating genomic DNA that represents the whole community in the soil sample, constructing a DNA library from the isolated DNA, and screening the library for a target gene. It is important to select a DNA extraction method that provides sufficient yield and DNA that represents the diversity of the whole microbial community in the target environment. This is still one of the most challenging steps of metagenomic analysis. The chemical and physical characteristics of soils are complex and vary widely with soil type, which makes it difficult to develop a reference method for DNA extraction from soils. In addition, soils contain many substances, such as humic and fulvic acids, that are co-extracted with genomic DNA and inhibit its downstream processing. Therefore, optimization and comparison of different extraction methods are usually required for each type of soil [18, 19, 20, 21].
A DNA library is then constructed from the genomic DNA isolated from the target environment. The isolated DNA is first fragmented, by either restriction enzyme digestion or mechanical shearing, into fragments of a size suitable for cloning. The resulting DNA fragments are cloned into an appropriate cloning vector. Plasmid vectors are used for small DNA fragments, and the libraries generated are called small-insert genomic libraries. Large inserts are cloned into cosmid or fosmid vectors, which can hold inserts of up to 40 kb, or into BAC vectors, which can carry inserts larger than 40 kb.
DNA libraries are usually constructed in a well-studied microorganism that is easy to manipulate in the laboratory, such as Escherichia coli. When the genes within the DNA inserts need to be expressed in other microorganisms, shuttle vectors are used to transfer the libraries into a suitable host.
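The fragmentation and vector-selection logic can be sketched in a few lines. The example performs an in-silico digest with EcoRI (recognition site GAATTC, cut after the G on the top strand) and then picks a vector class by insert size. The 40 kb boundary comes from the text; the 10 kb upper limit for "small" plasmid inserts is an assumption introduced only for this illustration, as are the function names.

```python
ECORI_SITE = "GAATTC"
ECORI_CUT_OFFSET = 1  # EcoRI cuts between G and AATTC on the top strand

SMALL_INSERT_MAX_BP = 10_000   # assumed plasmid limit, not stated in the text
COSMID_FOSMID_MAX_BP = 40_000  # cosmid/fosmid capacity given in the text


def digest(sequence, site=ECORI_SITE, cut_offset=ECORI_CUT_OFFSET):
    """Cut a linear top-strand sequence at every occurrence of `site`."""
    fragments = []
    start = 0
    hit = sequence.find(site)
    while hit != -1:
        cut = hit + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        hit = sequence.find(site, hit + 1)
    fragments.append(sequence[start:])
    # Drop empty pieces that arise when a cut falls at the very start or end.
    return [fragment for fragment in fragments if fragment]


def choose_vector(insert_bp):
    """Suggest a vector class for an insert of the given length in base pairs."""
    if insert_bp <= 0:
        raise ValueError("insert length must be positive")
    if insert_bp <= SMALL_INSERT_MAX_BP:
        return "plasmid"
    if insert_bp <= COSMID_FOSMID_MAX_BP:
        return "cosmid or fosmid"
    return "BAC"


print(digest("AAGAATTCCCGAATTCTT"))
for size in (3_000, 35_000, 120_000):
    print(size, choose_vector(size))
```

Mechanical shearing, the alternative mentioned in the text, produces breaks that are not tied to a recognition site and is not modeled here.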
Finally, a screening assay is applied to search for a gene with a particular function, and the gene product is functionally analyzed. Two metagenomic strategies are commonly used in research. The first, called targeted metagenomics, focuses on marker genes such as the ribosomal genes 16S rRNA and 18S rRNA to study the composition of the microbial community in a certain environment, or on specific protein-coding genes of medical or industrial importance [26, 27, 28]. The second, shotgun metagenomics, uses high-throughput next-generation sequencing to achieve wide coverage of genomic DNA sequences in order to assess the entire taxonomic structure or functional potential of microbial communities.
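As an illustration of how marker-gene data describe community composition, the sketch below summarizes a list of taxon labels, assumed to have been assigned to 16S rRNA reads by some upstream classifier, as relative abundances and a Shannon diversity index. The Shannon index is one common diversity measure chosen here as an example; it is not prescribed by the text, and the genus labels are made-up sample data.

```python
import math
from collections import Counter


def relative_abundance(taxon_labels):
    """Return the fraction of reads assigned to each taxon."""
    counts = Counter(taxon_labels)
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


def shannon_index(taxon_labels):
    """Shannon diversity H = sum(p_i * ln(1 / p_i)) over the observed taxa."""
    abundances = relative_abundance(taxon_labels)
    return sum(p * math.log(1 / p) for p in abundances.values())


# Made-up assignments for ten reads, for illustration only.
reads = ["Bacillus"] * 5 + ["Pseudomonas"] * 3 + ["Streptomyces"] * 2

print(relative_abundance(reads))
print(round(shannon_index(reads), 3))
```

A community dominated by a single taxon gives an index of zero, and the index grows as reads are spread more evenly over more taxa, which makes it a convenient single number for comparing samples taken under different conditions.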
The most challenging aspect of the screening process in metagenomics is the analysis of the huge amount of sequence data generated from the constructed library. A wide range of bioinformatic tools has been developed over the years to help analyze metagenomic data and compare them with available online databases.
This work was funded by the Deanship of Scientific Research at Princess Nourah bint Abdulrahman University, through the Research Groups Program Grant no. (RGP-1438-0006).