Approximate number of identified natural metabolites.
Screening for microbial secondary metabolites (SMs) has attracted the attention of the scientific community since the 1940s. Since the discovery of penicillin, intensive research has been conducted worldwide to detect and identify novel microbial secondary metabolites. Over time, however, the rate at which traditional experimental approaches yield novel SMs has decreased significantly, and the search for new discovery techniques has become a high-priority objective. The development of omics-based techniques such as metabolomics and genomics has revealed the potential to discover novel SMs that are encoded in microbial DNA but are either not expressed in laboratory media or produced in undetectable amounts; these can be found by detecting the biosynthetic gene clusters (BGCs) associated with the biosynthesis of secondary metabolites. Nowadays, the integration of gene editing tools such as CRISPR-Cas9 with metabolomics provides a successful platform for identifying and detecting known and novel SMs and for increasing SM production.
- secondary metabolites
- secondary metabolites identification
- production of secondary metabolites
- gene editing
The term secondary metabolites (SMs) was first mentioned in 1891 by A. Kossel. Microbial secondary metabolites have attracted the attention of the scientific world since the introduction of penicillin in the 1940s. The identification and characterization of SMs then peaked between the 1940s and 1960s, a period known as "the golden era of SMs discovery" [1, 2]. Many compounds characterized and reported during the golden era are still in use today. Unfortunately, the discovery of approved novel chemical scaffolds of secondary metabolites has decreased significantly since the golden era. Possible explanations for this decline include the following: (1) the same biosynthetic modules are used for SM production by many bacteria, (2) research has focused on specific groups of microorganisms such as actinobacteria, resulting in the repeated isolation and identification of known compounds, and (3) only about 1% of the microbial community can be cultured in the laboratory, owing to the difficulty of identifying optimal medium compositions, so the majority of SMs remain unidentified.
Microbial secondary metabolites play significant roles in our lives. Because of their unusual chemical structures, microbial secondary metabolites show a variety of biological activities and serve as antimicrobial agents, antitumor agents, enzyme inhibitors, immunosuppressive agents, antiparasitic agents, herbicides, and anthelmintics, and they also have applications in the food industry. For instance, one of the major successes in human medicine is the discovery of immunosuppressants such as cyclosporine A, which played a significant role in establishing the field of organ transplantation.
All biochemical reactions carried out by an organism are collectively called metabolism, and the products of metabolism are called metabolites. Through these reactions, organisms produce primary and secondary metabolites. Primary metabolites are found in all living cells that are able to divide, whereas secondary metabolites are present only incidentally and are not immediately essential to the organism's survival. To date, over 2,140,000 secondary metabolites (SMs) have been identified, and they show vast diversity in function, structure, and biosynthesis (Table 1). The major sources of SMs are plants (about 80%) and microorganisms. Among microorganisms, bacteria (especially actinobacteria) and fungi have been reported to produce the majority of the SMs identified so far.
|Source||All known compounds||Bioactive|
Microbial secondary metabolites (SMs) such as antibiotics, alkaloids, toxins, pigments, growth hormones, antitumor agents, and others are low molecular mass products that are produced by microorganisms, usually during the late growth phase. In fact, microbial secondary metabolites are not essential for the growth and development of microorganisms that produce them but are associated with some other functions such as competition, interactions, defense, and others [3, 5].
The development of omics-based techniques such as genomics, metabolomics, proteomics, and transcriptomics has revealed that microorganisms have the potential to produce more secondary metabolites than originally expected [6, 38]. These products are often encoded by clustered genes present on chromosomal DNA and rarely on plasmid DNA. Most of these new SMs have been predicted only through bioinformatics analysis of the putative SM gene clusters in a sequenced genome. This is because the newly revealed SMs are either not produced under laboratory conditions or are produced in amounts too low for traditional detection techniques to detect [7, 8]. The metabolomics approach aims to discover and characterize secondary metabolites in natural or engineered biosystems and to measure as many low-molecular-weight compounds as possible. Metabolomics-based technologies such as mass spectrometry (MS) and nuclear magnetic resonance (NMR) have been identified as significant analytical methods for detecting SMs produced under specific conditions. The present chapter provides an overview of present-day metabolomic and genetic engineering approaches for the enhancement and identification of secondary metabolites.
2. Genomics for screening and enhancement of SMs
2.1 Gene editing for metabolites discovery
Biosynthetic gene clusters (BGCs) are groups of genes associated with the biosynthesis of secondary metabolites. A BGC includes all the genetic information necessary for the biosynthesis, assembly, and modification of an SM and for the regulation of its export and transport. Microbial genomes contain a variety of cryptic or silent genes that are responsible for the production of secondary metabolites but are not expressed under laboratory conditions. It has been reported that most BGCs remain silent or cannot be fully expressed under standard laboratory conditions. These silent BGCs are potentially significant for the discovery of novel SMs [10, 11, 12, 43].
Owing to developments in genomics and bioinformatics, extensive sequencing data and genetic information are now accessible, enabling genome mining of relevant BGCs with potential for valuable SM production. Synthetic biology and genetic engineering tools are therefore now used to identify novel BGCs. Genetic engineering is widely used and is moving beyond traditional tools, which has opened a new era in the detection of novel secondary metabolites. Genetic engineering for SM production can be carried out in heterologous as well as homologous hosts: gene manipulation in a heterologous host enables the activation of BGCs obtained from unculturable organisms, whereas gene manipulation in a homologous host retains all the natural factors essential for secondary metabolite production. While no single approach works for all genes of interest, a variety of techniques have been developed to induce the expression of these genes.
Several genome engineering techniques have emerged and are used in the metabolite production field, including transcription activator-like effector nucleases (TALENs), zinc finger nucleases (ZFNs), and the clustered regularly interspaced short palindromic repeats system (CRISPR-Cas9) [45, 46]. Each genome engineering technology has its own advantages and disadvantages (Table 2). For instance, ZFNs and TALENs have been used successfully in various microbes but remain limited by the difficulty of engineering them. Recently, CRISPR-Cas9 has been reported to be a significant and promising genome editing technology for the discovery and production of SMs [14, 16, 23].
|No.||Feature||CRISPR/Cas9||Zinc finger nucleases (ZFNs)||Transcription activator-like effector nucleases (TALENs)|
|1||Protein engineering steps||Does not require protein engineering steps; very simple to test multiple gRNAs||Requires complex protein engineering steps for each target||Requires protein engineering steps for each target|
|2||Mode of action||Induces double-strand breaks (DSBs) in target DNA, or single-strand DNA nicks (Cas9 nickase)||Induces DSBs in target DNA||Induces DSBs in target DNA|
|4||Structural proteins||Consists of a single monomeric protein and a chimeric RNA||Work as dimers; only a protein component is required||Also work as dimers and require a protein component|
|5||Mutation rate||Low mutation rate has been observed||High mutation rate has been observed in plants||Mutation rate is high compared with CRISPR|
|6||Components||crRNA, Cas9 protein||Zn-finger domains, nonspecific FokI nuclease domain||TALE DNA-binding domains, nonspecific FokI nuclease domain|
|7||Length of target sequence (bp)||20–22||18–24||24–59|
|8||Target recognition efficiency||High||High||High|
|9||Level of experiment||Easy and very fast procedure||Complicated procedure; requires expertise in protein engineering||Relatively easy procedure|
|10||Methylated DNA cleavage||Can cleave methylated DNA in human cells; this aspect has not been much explored in plants||Unable to do so||The capacity of TALENs to cleave methylated DNA remains uncertain|
|11||Multiplexing||The main advantage of CRISPR: several genes can be edited at the same time, and only Cas9 is needed||Highly difficult to achieve with ZFNs||Very difficult to achieve with TALENs, because separate dimeric proteins are needed for each target|
2.1.1 Gene cloning
Direct cloning of entire BGCs into a heterologous host is the most general and widely used approach for activating silent BGCs. Many new cloning tools have been introduced, including Cas9-assisted targeting of chromosome segments (CATCH), transformation-associated recombination (TAR), and TAR-CRISPR [15, 17, 18]. The basic steps of gene cloning are: selecting a suitable heterologous host, cloning the target BGC, transferring the BGC into the chosen host, expressing it in the host system, and optimizing production.
Cas9-assisted targeting of chromosome segments (CATCH) is a cloning technique that uses the CRISPR-Cas9 gene editing system to clone BGCs directly into the host. Compared with traditional cloning tools such as PCR and restriction enzymes, CATCH is predicted to become a useful molecular tool for the direct cloning of large gene clusters. Transformation-associated recombination (TAR) has been used to clone large BGCs for decades. However, the TAR approach has a low cloning efficiency, which means that hundreds of colonies must be screened to detect a few positive clones [15, 18]. To address this challenge, TAR and CRISPR-Cas9 have been coupled in a new approach called TAR-CRISPR, and a significant increase in cloning efficiency has been reported as a result. Compared with traditional TAR cloning, TAR-CRISPR has the advantages that positive clones can be obtained with secondary screening and less manpower and that it does not require extensive experience of working with yeast. TAR-CRISPR cloning is therefore expected to support the further development of BGC cloning and SM production.
2.1.2 Gene refactoring
Gene refactoring or replacement is useful not only in BGCs’ activation but also for novel SMs’ discovery. In fact, several silent BGCs have been refactored by replacing the BGC promoter to yield natural products such as secondary metabolites [19, 20, 21, 22].
Another new tool in gene refactoring is the multiplexed CRISPR-Cas9- and transformation-associated recombination (TAR)-mediated promoter engineering method (mCRISTAR). This tool combines the advantages of the CRISPR-Cas9 system and TAR, but it differs from the TAR-CRISPR approach discussed earlier. Whereas TAR-CRISPR, a yeast-based method, is used to clone BGCs, mCRISTAR uses CRISPR-Cas9 to introduce double-strand breaks in the promoter regions of the BGC, and the resulting fragments are reassembled by TAR with synthetic gene-cluster-specific promoter cassettes. Another refactoring tool that has enabled faster cloning and refactoring of BGCs is direct pathway cloning (DiPaC), which relies on PCR amplification and in vitro DNA assembly to capture and express biosynthetic gene clusters. DiPaC was recently employed to capture small BGCs, followed by their activation and the expression of novel natural products. DiPaC has also been used successfully to clone mid-sized and large BGCs.
2.1.3 Gene insertion or deletion
Many studies have documented the effect of gene knockout or knock-in on BGC expression or on levels of SM production. However, conventional gene editing methods are time-intensive, whereas the CRISPR-Cas9-based approach allows much faster and more efficient gene editing. The emergence of CRISPR-Cas9 has opened a new era of gene editing opportunities. Recently, CRISPR gene editing has been used to insert promoters in order to activate SM production in microorganisms.
Nowadays, CRISPR-Cas9 is used to introduce promoters at multiple BGCs simultaneously, resulting in the activation of these BGCs and the subsequent production of SMs. Multiplexed site-specific genome engineering (MSGE) has also been used to edit multiple BGCs and has led to a significant increase in secondary metabolite production.
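As a rough illustration of how candidate Cas9 target sites near a BGC promoter might be enumerated before a promoter knock-in, the sketch below scans a DNA string for 20-nt protospacers immediately followed by an NGG protospacer-adjacent motif (PAM), the motif recognized by the commonly used Streptococcus pyogenes Cas9. The function name, the example sequence, and the restriction to the forward strand are assumptions made for illustration only; practical guide design would also scan the reverse strand and assess off-target risk.

```python
import re


def find_spcas9_targets(sequence, protospacer_len=20):
    """Return (start, protospacer, pam) for each forward-strand site with an NGG PAM."""
    seq = sequence.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % protospacer_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


# Hypothetical promoter-region fragment, invented for illustration only.
promoter_region = "ATGCGTACGTTAGCCTAGGCTAACGTTAGGCATCGATCGTAGCTAGGTTACG"

for start, protospacer, pam in find_spcas9_targets(promoter_region):
    print(start, protospacer, pam)
```

Each reported protospacer is only a candidate; which site is suitable for inserting a promoter depends on the cluster architecture and on the editing system actually used.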
While gene editing approaches provide a significant platform for manipulating the genetic machinery of microbes toward the production of novel natural secondary metabolites, the identification of these metabolites is equally important. Metabolomics plays a significant role in the identification and characterization of secondary metabolites produced by native or genetically modified microorganisms.
2.2 Identification and characterization of secondary metabolites
Unlike other omics techniques, metabolomics often requires a broad array of instrumentation, such as coulometric array detectors for detecting redox compounds, fluorescence spectrometers for detecting aromatic compounds, and evaporative light scattering detectors (ELSD) for detecting lipids, whereas genomics, proteomics, or transcriptomics measurements are often conducted with a single instrument.
In general, microbial secondary metabolites are investigated using two different approaches: targeted and untargeted metabolite identification. Targeted experiments aim to detect a specific group of compounds (about 20 compounds) that have already been identified, whereas untargeted investigations aim to detect and identify a wide range of the metabolites produced by microorganisms, including both known and novel metabolites.
Over the past decade, two general technologies have emerged as the primary tools in metabolomics: nuclear magnetic resonance (NMR) spectroscopy and mass spectrometry (MS). Common MS-based analyses include gas chromatography-MS (GC-MS), capillary electrophoresis-MS (CE-MS), and liquid chromatography-MS (LC-MS) [32, 42]. These high-throughput tools provide broad coverage of many classes of metabolites, including amino acids, lipids, sugars, organic acids, and others.
2.2.1 Detection of secondary metabolites
Mass spectrometry (MS) is a technique that measures the mass-to-charge ratio of ionized molecules. It is commonly coupled with chromatography, whose principle is to separate constituents that travel at different speeds under specific conditions and to record their retention times. The various constituents therefore take different times to pass from the inlet to the detector of the chromatography system.
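To make the mass-to-charge ratio concrete, the following minimal sketch computes the monoisotopic mass of a compound from its elemental formula and the m/z expected for its protonated ion. Penicillin G (C16H18N2O4S) is used purely as a familiar example, protonation is assumed as the ionization mode, and the function names are hypothetical; other adducts would shift the value differently.

```python
# Monoisotopic masses (Da) of the most abundant isotope of each element.
MONOISOTOPIC_MASS = {
    "C": 12.000000,
    "H": 1.007825,
    "N": 14.003074,
    "O": 15.994915,
    "S": 31.972071,
}
PROTON_MASS = 1.007276  # Da


def monoisotopic_mass(formula):
    """Sum the isotope masses for a formula given as {element: count}."""
    return sum(MONOISOTOPIC_MASS[element] * count for element, count in formula.items())


def protonated_mz(neutral_mass, charge=1):
    """m/z of an [M + nH]n+ ion, assuming ionization by protonation."""
    return (neutral_mass + charge * PROTON_MASS) / charge


penicillin_g = {"C": 16, "H": 18, "N": 2, "O": 4, "S": 1}
mass = monoisotopic_mass(penicillin_g)
print(f"neutral mass: {mass:.4f} Da, [M+H]+: {protonated_mz(mass):.4f}")
```

The same neutral molecule therefore appears at different m/z values depending on its charge state, which is one reason detected features must be annotated before they are compared with reference masses.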
Nuclear magnetic resonance (NMR) spectroscopy uses the magnetic properties of atomic nuclei to determine the chemical and physical properties of the atoms or molecules that contain them. Magnetic nuclei placed in a magnetic field absorb and re-emit electromagnetic radiation at a specific resonance frequency, which depends on the magnetic properties of the isotope as well as on the strength of the magnetic field.
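The dependence of the resonance frequency on both the isotope and the field strength can be sketched with the Larmor relation, in which the frequency equals the isotope's gyromagnetic ratio (divided by 2π) multiplied by the field strength. The snippet below is a minimal illustration; the rounded constants for 1H and 13C and the function name are assumptions introduced here, and chemical shift effects are ignored.

```python
# Gyromagnetic ratio divided by 2*pi, in MHz per tesla (rounded values).
GAMMA_OVER_2PI_MHZ_PER_T = {"1H": 42.577, "13C": 10.708}


def resonance_frequency_mhz(isotope, field_tesla):
    """Larmor frequency (MHz) of a bare nucleus in a static field of the given strength."""
    return GAMMA_OVER_2PI_MHZ_PER_T[isotope] * field_tesla


for isotope in ("1H", "13C"):
    print(isotope, round(resonance_frequency_mhz(isotope, 11.74), 1), "MHz")
```

In the same 11.74 T magnet, protons resonate near 500 MHz while 13C nuclei resonate near 126 MHz, which is why spectrometers are usually described by their proton frequency.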
Both MS and NMR can be used in targeted and untargeted metabolomics, and the two techniques are often complementary. While NMR can be used to differentiate between structural isomers, MS provides information on the formula of the molecule. Compared with NMR, mass spectrometry is more sensitive and can detect a wider range of metabolites. NMR spectroscopy, on the other hand, is highly quantitative and reproducible but requires a larger amount of sample for analysis [34, 35].
2.2.2 Data analysis
The complexity and sheer volume of information obtained from NMR spectroscopy or MS are considered one of the major challenges in metabolomics experiments. Extracting the important information generated by MS or NMR spectroscopy depends on computer software that organizes the vast amount of data. First, the raw data acquired from NMR spectroscopy or MS must be converted into formats compatible with the software packages. The goal of metabolomics data analysis is to compare and identify differences between hundreds or thousands of SMs. It is impractical to visualize changes between groups of metabolites by analyzing metabolites individually; therefore, univariate and multivariate statistical techniques are used to interpret the data. One of the most widely used statistical methods is principal component analysis (PCA) [33, 36, 37]. With PCA, the data can be simplified without losing their main features: the method reduces the dimensionality of the data set while keeping the characteristics that contribute most to the variance. PCA provides information on multivariate differences among samples and is usually conducted at the early stages of data analysis.
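The idea of reducing dimensionality while keeping the direction of greatest variance can be sketched in a few lines. The example below centers a small samples-by-metabolites intensity table, builds its covariance matrix, and extracts only the first principal component by power iteration. Power iteration is a simplification chosen here to keep the sketch dependency-free (dedicated statistics packages normally compute all components by matrix decomposition), and the intensity values and function name are invented for illustration.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (direction, scores, explained_fraction) for PC1 of a samples x features table."""
    n, p = len(rows), len(rows[0])
    # Center each feature (column) on its mean.
    means = [sum(row[j] for row in rows) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in rows]
    # Sample covariance matrix (p x p).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    # Power iteration: repeated multiplication converges to the dominant eigenvector.
    # Assumes the data are not constant and the start vector is not orthogonal to PC1.
    v = [float(j + 1) for j in range(p)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p)) for i in range(p))
    total_variance = sum(cov[i][i] for i in range(p))
    # Project each centered sample onto the PC1 direction.
    scores = [sum(c[j] * v[j] for j in range(p)) for c in centered]
    return v, scores, eigenvalue / total_variance


# Hypothetical peak intensities: rows are samples, columns are three metabolite features.
intensities = [
    [1.0, 2.0, 0.5],  # control culture, replicate 1
    [1.2, 2.1, 0.4],  # control culture, replicate 2
    [3.0, 4.1, 0.6],  # engineered strain, replicate 1
    [3.1, 3.9, 0.5],  # engineered strain, replicate 2
]
direction, scores, explained = first_principal_component(intensities)
print("PC1 scores:", [round(s, 2) for s in scores])
print("fraction of variance explained:", round(explained, 3))
```

In this toy table the two groups of samples fall on opposite sides of zero along the first component, which is the kind of separation a PCA scores plot is used to reveal.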
In addition, different univariate statistical tests, such as ANOVA, the nonparametric Wilcoxon signed-rank and Kruskal-Wallis tests, and the parametric Student's t-test, can be used to analyze individual metabolites. Furthermore, multiple-testing corrections such as false discovery rate calculations or the Bonferroni correction can be used to validate the results.
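Because one univariate test is run per metabolite, the resulting p-values are usually corrected before differences are called significant. The sketch below contrasts the Bonferroni correction with a false discovery rate procedure; the Benjamini-Hochberg step-up method is used here as one common choice of FDR calculation, which is an assumption, as are the function names and the example p-values.

```python
def bonferroni(p_values, alpha=0.05):
    """Flag tests that remain significant when alpha is divided by the number of tests."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Flag tests rejected by the Benjamini-Hochberg step-up procedure at level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose ordered p-value satisfies p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank / m * alpha:
            cutoff_rank = rank
    # Every test ranked at or below the cutoff is rejected.
    rejected = [False] * m
    for rank, index in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[index] = True
    return rejected


# Hypothetical p-values, one per metabolite feature.
p_values = [0.001, 0.012, 0.025, 0.045, 0.5]
print("Bonferroni:", bonferroni(p_values))
print("Benjamini-Hochberg:", benjamini_hochberg(p_values))
```

Bonferroni controls the chance of any false positive and is therefore stricter, whereas the FDR approach tolerates a small proportion of false positives in exchange for retaining more candidate metabolites.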
2.2.3 Metabolites’ identification
Owing to the development of various bioinformatics software tools, most metabolites can be identified. Two types of metabolite identification are applied: (a) putative identification and (b) definitive identification. In putative identification, one or two molecular properties are used for identification. In definitive identification, two or more properties, such as the retention time and the accurate mass and/or the fragmentation mass spectrum and/or the NMR spectrum, are compared with those of an authentic chemical standard. Definitive identification is more accurate than putative identification because it relies on an authentic chemical standard, and it is usually performed after putative identification.
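The distinction between the two identification levels can be sketched as a simple decision rule: an accurate-mass match alone gives a putative identification, while agreement in both accurate mass and retention time with an authentic standard analyzed under the same conditions gives a definitive one. The tolerances (5 ppm and 0.2 min), the dictionary layout, and the function names below are assumptions for illustration, not values taken from any particular protocol.

```python
def ppm_error(observed_mz, reference_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - reference_mz) / reference_mz * 1e6


def classify_identification(feature, standard, ppm_tol=5.0, rt_tol_min=0.2):
    """Return 'definitive', 'putative', or 'no match' for a detected feature.

    `standard` holds the m/z and retention time measured for an authentic
    chemical standard on the same analytical system (assumed here).
    """
    mass_ok = abs(ppm_error(feature["mz"], standard["mz"])) <= ppm_tol
    rt_ok = abs(feature["rt_min"] - standard["rt_min"]) <= rt_tol_min
    if mass_ok and rt_ok:
        return "definitive"  # two independent properties agree with the standard
    if mass_ok:
        return "putative"    # accurate mass alone supports the assignment
    return "no match"


# Hypothetical values for an authentic standard and two detected features.
standard = {"mz": 335.1060, "rt_min": 6.40}
print(classify_identification({"mz": 335.1065, "rt_min": 6.45}, standard))
print(classify_identification({"mz": 335.1065, "rt_min": 8.00}, standard))
```

A fuller implementation would also compare fragmentation or NMR spectra, as the text describes, before accepting an assignment as definitive.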
Both spectral-based and chemical structure-based databases are available for metabolite identification. Generally, the spectra generated during analysis are compared with those of reference compounds in the databases, and a similarity score is assigned to each match. Even though metabolome databases are updated daily, a significant number of secondary metabolites in biological systems remain unidentified.
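One simple way to score the similarity between a measured spectrum and a reference spectrum is the cosine of the angle between their binned intensity vectors. The sketch below is a minimal, assumption-laden illustration: the 1 Da bin width, the peak lists, the reference names, and the function names are all invented, and real database search tools apply more elaborate peak matching and weighting.

```python
import math


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into m/z bins; `peaks` is a list of (mz, intensity)."""
    binned = {}
    for mz, intensity in peaks:
        key = int(mz // bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_similarity(peaks_a, peaks_b, bin_width=1.0):
    """Cosine score between two spectra: 1 for identical patterns, 0 for no shared bins."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_matches(query, library, bin_width=1.0):
    """Return (name, score) pairs for all library spectra, best match first."""
    scored = [(name, cosine_similarity(query, ref, bin_width)) for name, ref in library.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical fragment spectra as (m/z, intensity) pairs.
query = [(114.1, 30.0), (160.0, 100.0), (176.1, 55.0)]
library = {
    "reference_A": [(114.1, 28.0), (160.0, 100.0), (176.1, 60.0)],
    "reference_B": [(91.1, 100.0), (119.0, 40.0)],
}
for name, score in rank_matches(query, library):
    print(name, round(score, 3))
```

A high score only indicates spectral resemblance, so the top-ranked reference remains a putative assignment until it is confirmed against an authentic standard.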
Some of the common databases used in nuclear magnetic resonance (NMR) spectroscopy are METLIN (
Microorganisms are a rich source of secondary metabolites with significant pharmaceutical, biomedical, and food applications. Nowadays, the integration of gene editing tools, especially CRISPR-Cas9 (for gene cloning, gene refactoring, and gene insertion or deletion), with metabolomics provides a successful platform for identifying and detecting known and novel SMs and for increasing SM production. However, some challenges remain in the application of metabolomics and gene editing. Complete identification of novel SMs requires a combination of different methods, which increases the cost of screening. Thus, there is a pressing need for a comprehensive and sensitive technique that can provide complete information on any SM under any conditions. The off-target effect of CRISPR-Cas9 is also a significant problem. Nevertheless, the integration of metabolomics and CRISPR-Cas9-based gene editing tools may improve the efficiency of microbial secondary metabolite discovery.