Open access peer-reviewed chapter

Metabolomics and Genetic Engineering for Secondary Metabolites Discovery

Written By

Ahmed M. Shuikan, Wael N. Hozzein, Rakan M. Alshuwaykan and Ibrahim A. Arif

Submitted: 22 November 2021 Reviewed: 24 January 2022 Published: 31 March 2022

DOI: 10.5772/intechopen.102838

From the Edited Volume

Secondary Metabolites - Trends and Reviews

Edited by Ramasamy Vijayakumar and Suresh Selvapuram Sudalaimuthu Raja



Since the 1940s, microbial secondary metabolites (SMs) have attracted the attention of the scientific community, and intensive research has been conducted to discover and identify novel microbial SMs. However, the discovery of novel SMs has declined significantly because of factors such as (1) unculturable microbes, (2) the limits of traditional detection techniques, and (3) the fact that not all SMs are expressed in the laboratory. Finding new techniques that can overcome these challenges has therefore become a priority. The development of omics-based techniques such as genomics and metabolomics has revealed the potential to discover novel SMs that are encoded in microbial DNA but are either not expressed in the laboratory or produced in undetectable amounts, by detecting the biosynthetic gene clusters (BGCs) associated with their biosynthesis. Today, the integration of metabolomics with gene-editing techniques such as CRISPR-Cas9 provides a successful platform both for detecting and identifying known and unknown SMs and for increasing SM production.


  • metabolomics
  • genetic engineering
  • secondary metabolites identification
  • genomics
  • CRISPR-Cas9
  • production of secondary metabolites
  • microorganisms
  • gene editing

1. Introduction

Since the introduction of penicillin in the 1940s, microbial secondary metabolites (SMs) have attracted the attention of scientists all over the world. Penicillin proved to be an effective treatment for many kinds of infection, so scientists began to search for other microbial products that could be used to treat different diseases or be useful in other aspects of life. The period between the 1940s and the 1960s is therefore referred to as "the golden era of SM discovery" [1, 2]. During this era, many SMs were discovered, characterized, and reported, and several of them are still in use today. Unfortunately, after the golden era, the number of approved SMs with novel chemical scaffolds declined dramatically [1]. This decline in the detection and identification of microbial SMs could be due to two factors: (1) almost 99% of the microbial community is unculturable because its optimal medium compositions are difficult to identify [2], which means that most SMs remain unidentified; and (2) scientists have focused on specific groups of microorganisms such as Actinobacteria, which has led to the repeated identification of known compounds rather than to new screening methodologies for other microorganisms.

The biochemical reactions carried out by organisms are collectively called metabolism, and the products of metabolism are called metabolites. There are two kinds of metabolites: primary and secondary. Primary metabolites are found in all living cells capable of division, whereas secondary metabolites are present only incidentally and do not immediately affect the organism's survival. Microbial SMs are low-molecular-mass products, often with unusual chemical structures, that microorganisms usually produce during the late growth phase. They are not essential for the growth and development of the microbe but are associated with other functions such as competition, interaction, and defense [3, 4]. SMs show a wide variety of biological activities that can be exploited in different fields, for example as antitumor, immunosuppressive, antimicrobial, antiparasitic, and anthelmintic agents, and in the food industry. One example of their importance is the discovery of immunosuppressants such as cyclosporin A, which played a significant role in establishing the field of organ transplantation.

Over 2 million SMs have now been found, with vast diversity in structure, function, and biosynthesis (Table 1). Plants (about 80%) and microbes (approximately 20%) are the primary sources of the SMs discovered so far [3]. Among microbes, Actinobacteria and fungi produce the bulk of the SMs discovered to date [5]. Omics-based techniques such as genomics, metabolomics, proteomics, and transcriptomics have helped overcome the problem of unculturable microbes and have revealed that microorganisms can produce more SMs than originally expected [6, 7]. Using omics techniques, scientists can detect the clustered chromosomal genes that encode SMs directly, without culturing the microbes.

| Source | All known compounds | Bioactive |
|---|---|---|
| Plant kingdom | 600,000–700,000 | 150,000–200,000 |
| Microbes | Over 50,000 | 22,000–23,000 |
| Higher plants | 500,000–600,000 | ~100,000 |
| Animal kingdom | 300,000–400,000 | 50,000–100,000 |
| Protozoa | Several hundreds | 100–200 |
| Marine animals | 20,000–25,000 | 7000–8000 |
| Algae, lichens | 3000–5000 | 1500–2000 |
| Insects, worms | 8000–10,000 | 800–1000 |

Table 1.

Approximate number of identified natural metabolites.

NA–Data not available.

Source: Bérdy [5].

Thanks to developments in genomics and bioinformatics, scientists can now access extensive genetic information and mine genomes for biosynthetic gene clusters (BGCs) with the potential to produce valuable SMs [8]. Genetic engineering has consequently become widely used and has moved beyond traditional tools, opening a new era in the detection of novel SMs [9]. By using bioinformatic tools to analyze putative SM gene clusters in sequenced genomes, scientists have been able to predict new SMs that traditional techniques had missed, because these SMs are either not produced under laboratory conditions or are produced in amounts too low for traditional techniques to detect [10, 11]. Metabolomics, in turn, aims to characterize and identify SMs in natural and engineered biosystems.
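Genome-mining tools such as antiSMASH [8] detect BGCs using profile hidden Markov models and rule-based cluster definitions. The Python sketch below illustrates only the underlying grouping idea, assuming the genes have already been annotated: core biosynthetic genes that lie close together on a contig are merged into one candidate region, which is then padded with flanking sequence. The domain labels in `CORE_DOMAINS`, the `Gene` record, and the `max_gap` and `flank` defaults are hypothetical illustrations, not antiSMASH's actual rules or API.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical labels for "core" biosynthetic domains (illustration only).
CORE_DOMAINS = {"PKS_KS", "NRPS_A", "NRPS_C", "terpene_synthase"}


@dataclass(frozen=True)
class Gene:
    name: str
    start: int             # start coordinate on the contig (bp)
    end: int               # end coordinate on the contig (bp)
    domain: Optional[str]  # pre-computed domain annotation, or None


def candidate_clusters(genes: List[Gene], max_gap: int = 10_000,
                       flank: int = 5_000) -> List[Tuple[int, int, List[str]]]:
    """Group core biosynthetic genes lying within max_gap bp of each other and
    return (region_start, region_end, core_gene_names) per candidate BGC."""
    cores = sorted((g for g in genes if g.domain in CORE_DOMAINS),
                   key=lambda g: g.start)
    groups: List[List[Gene]] = []
    for gene in cores:
        # Join the current group if this core gene is close enough to it;
        # otherwise start a new candidate cluster.
        if groups and gene.start - max(g.end for g in groups[-1]) <= max_gap:
            groups[-1].append(gene)
        else:
            groups.append([gene])
    regions = []
    for group in groups:
        # Pad each region with a fixed flank to catch tailoring/regulatory genes.
        start = max(0, group[0].start - flank)
        end = max(g.end for g in group) + flank
        regions.append((start, end, [g.name for g in group]))
    return regions


genes = [
    Gene("orf1", 1_000, 4_000, "PKS_KS"),
    Gene("orf2", 4_500, 9_000, "NRPS_A"),
    Gene("orf3", 9_500, 10_500, None),
    Gene("orf4", 60_000, 62_000, "terpene_synthase"),
]
print(candidate_clusters(genes))
# [(0, 14000, ['orf1', 'orf2']), (55000, 67000, ['orf4'])]
```

The non-core gene `orf3` ends up inside the first region only because of the flank padding, which mirrors why cluster boundaries from real tools are approximate and often need manual refinement.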

Metabolomics platforms such as mass spectrometry (MS) and nuclear magnetic resonance (NMR) spectroscopy can accurately measure low-molecular-weight compounds, and both have been reported as important analytical techniques for detecting SMs under specific conditions [12]. This chapter provides an overview of metabolomics and genetic engineering techniques, especially CRISPR-Cas9, for the discovery and production enhancement of microbial SMs.


2. Genetic engineering for SMs detection

The genes involved in the biosynthesis of an SM are grouped in a biosynthetic gene cluster (BGC), which includes all of the genetic information required for the regulation, assembly, modification, and biosynthesis of that SM [13]. As mentioned previously, not all microorganisms can be cultured in the laboratory, so not all SMs can be detected with traditional techniques (culturing followed by chemical detection). In addition, many microbes carry silent or cryptic BGCs in their genomes that are responsible for SM production. These silent BGCs are a potentially significant source of novel SMs [13, 14, 15, 16].

Genetic engineering tools are now used, in place of traditional detection techniques, to identify novel BGCs [9]. Genetic engineering can be carried out in either homologous or heterologous hosts. Gene manipulation in a homologous (native) host retains the factors necessary for SM production, whereas gene manipulation in a heterologous host makes it possible to activate BGCs obtained from unculturable microorganisms [17].

A variety of genetic engineering techniques have been developed to induce the expression of genes of interest. In SM research, several genome-editing techniques have been used to detect SMs and enhance their production, including clustered regularly interspaced short palindromic repeats (CRISPR)-Cas9, zinc finger nucleases (ZFNs), and transcription activator-like effector nucleases (TALENs) [18, 19]. Each technique has its own advantages and disadvantages (Table 2), but CRISPR-Cas9 has been reported to be the most promising for both the discovery of SMs and the enhancement of their production [9, 17, 20, 21].

| Feature | CRISPR/Cas9 | Zinc finger nucleases (ZFNs) | Transcription activator-like effector nucleases (TALENs) |
|---|---|---|---|
| Protein engineering steps | Not required; very easy to test several times | Complex protein engineering required for each new target | Protein engineering steps required for each new target |
| Mode of action | Introduces double-strand breaks, or single-strand nicks (Cas9 nickase), into the target DNA | Induces double-strand breaks (DSBs) in the target DNA | Induces DSBs in the target DNA |
| Cloning | Not required | Required | Required |
| Structural proteins | A single monomeric protein (Cas9) plus a chimeric RNA | Dimeric proteins that require only one protein component to function | Dimeric proteins that require a protein component to function |
| Mutation rate | Low | High (observed in plants) | Higher than CRISPR |
| Components | crRNA, Cas9 protein | Zinc-finger domains; non-specific FokI nuclease domain | TALE DNA-binding domains; non-specific FokI nuclease domain |
| Length of target sequence (bp) | 20–22 | 18–24 | 24–59 |
| Target recognition efficiency | High | High | High |
| Ease of use | Easy and very fast procedure | Complicated procedure that requires protein-engineering expertise | Relatively easy procedure |
| Methylated DNA cleavage | Can cleave methylated DNA in human cells; this has received little attention in plants, where it is an area of particular concern | Unable to do so | Many unanswered questions about TALENs' ability to cleave methylated DNA |
| Multiplexing | Main advantage: multiple genes can be edited at the same time with only one Cas9 | Extremely difficult | Extremely difficult, because distinct dimeric proteins are required for each target |

Table 2.

Comparing different genomic engineering techniques used in metabolomics.

Source: Shuikan [11].
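CRISPR/Cas9 targeting depends on a short spacer sequence (Table 2 gives 20–22 bp) located next to a protospacer-adjacent motif (PAM). For the commonly used Streptococcus pyogenes Cas9 (SpCas9), the PAM is NGG immediately 3' of a 20-nt protospacer, and the enzyme cuts about 3 bp upstream of the PAM. The Python sketch below, assuming SpCas9 and a plain A/C/G/T sequence, lists every candidate spacer on both strands; it deliberately ignores the off-target and efficiency scoring that real guide-design tools add. The function names are hypothetical. Multiplexed editing, as summarized in Table 2, amounts to expressing several such spacers together with a single Cas9.

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_spacers(seq: str, spacer_len: int = 20) -> List[Tuple[str, int, str]]:
    """Return (strand, start, spacer) for every spacer_len-nt site that is
    immediately followed by an NGG PAM (SpCas9). `start` is 0-based on the
    5'->3' string of that strand ('-' means the reverse complement)."""
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for i in range(len(s) - spacer_len - 2):
            pam = s[i + spacer_len:i + spacer_len + 3]
            if pam[1:] == "GG":  # the first PAM base (N) can be anything
                hits.append((strand, i, s[i:i + spacer_len]))
    return hits


# A single + strand site: 20 A's followed by the PAM "TGG".
print(find_spacers("A" * 20 + "TGG"))
# [('+', 0, 'AAAAAAAAAAAAAAAAAAAA')]
```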

2.1 Gene insertion/deletion

Gene insertion or deletion is useful not only for activating BGCs but also for discovering novel SMs [22]. Several silent BGCs have been refactored by replacing their native promoters, yielding new natural products [23, 24, 25, 26].

A promising technique recently developed in the genetic engineering field is multiplexed CRISPR-Cas9- and transformation-associated recombination (TAR)-mediated promoter engineering (mCRISTAR) [21, 27, 28, 29, 30]. mCRISTAR combines the advantages of TAR and CRISPR-Cas9: CRISPR-Cas9 introduces double-strand breaks in the promoter regions of the BGC, and TAR then reassembles the resulting fragments with synthetic, gene-cluster-specific promoter cassettes [21].
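The mCRISTAR workflow can be pictured as two operations on a sequence: excise the native promoter regions, then rejoin the fragments through synthetic promoter cassettes whose ends share homology with the neighbouring fragments. The Python sketch below models this purely at the string level, assuming the promoter boundaries are already known. The function names, the 3-bp homology arms, and the cassette sequence are hypothetical; real TAR assembly in yeast relies on longer homology arms and on homologous recombination, not on string matching.

```python
from typing import List, Tuple


def excise(seq: str, regions: List[Tuple[int, int]]) -> List[str]:
    """Remove each [start, end) region (e.g. a native promoter cut out by a
    pair of Cas9 breaks) and return the remaining fragments in order."""
    fragments, prev = [], 0
    for start, end in sorted(regions):
        fragments.append(seq[prev:start])
        prev = end
    fragments.append(seq[prev:])
    return fragments


def tar_assemble(fragments: List[str], cassettes: List[str], arm: int) -> str:
    """Join fragments through promoter cassettes whose first and last `arm`
    bases are homologous to the neighbouring fragment ends (a string-level
    stand-in for homologous recombination in yeast)."""
    if arm < 1:
        raise ValueError("homology arms must be at least 1 bp")
    if len(cassettes) != len(fragments) - 1:
        raise ValueError("need exactly one cassette per excised region")
    assembled = fragments[0]
    for cassette, nxt in zip(cassettes, fragments[1:]):
        left, right = cassette[:arm], cassette[-arm:]
        if not assembled.endswith(left) or not nxt.startswith(right):
            raise ValueError("homology arms do not match the fragment ends")
        # Each arm overlaps a fragment end, so keep the overlap only once.
        assembled = assembled[:-arm] + cassette + nxt[arm:]
    return assembled


# Toy cluster: gene A end | native promoter (TTTTTTTT) | gene B start
cluster = "GCGCATGAAA" + "TTTTTTTT" + "CCCATGGGGA"
fragments = excise(cluster, [(10, 18)])
# Hypothetical synthetic promoter GACGTC flanked by 3-bp homology arms.
print(tar_assemble(fragments, ["AAA" + "GACGTC" + "CCC"], arm=3))
# GCGCATGAAAGACGTCCCCATGGGGA
```

Passing several `(start, end)` regions and a matching list of cassettes models the multiplexed case, in which several promoters of one BGC are replaced in a single assembly.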

2.2 Gene cloning

Gene cloning typically involves five steps: (1) selecting a suitable heterologous host, (2) cloning the target gene, (3) transferring the gene into the host, (4) expressing the gene in the host system, and (5) optimizing production [31].

Several new and useful cloning techniques have also been introduced, such as transformation-associated recombination (TAR), Cas9-assisted targeting of chromosome segments (CATCH), and TAR-CRISPR [20, 32, 33]. CATCH is a cloning tool that uses the CRISPR-Cas9 system for the direct cloning of BGCs. Compared with PCR- and restriction-enzyme-based cloning, CATCH appears to be more useful for the direct cloning of large gene clusters. TAR has been used for about a decade to clone large BGCs, but it suffers from low cloning efficiency [20, 33]. To address this challenge, TAR and CRISPR-Cas9 have been coupled in a new approach called TAR-CRISPR [33]. TAR-CRISPR differs from the mCRISTAR method discussed earlier: it is a yeast-based cloning method in which CRISPR-Cas9 assists the capture of a target region, whereas mCRISTAR uses CRISPR-Cas9 to break the double-stranded DNA in the promoter regions of a BGC so that TAR can reassemble the fragments with synthetic gene-cluster-specific promoter cassettes. Coupling CRISPR with TAR has been reported to increase cloning efficiency significantly [33], and TAR-CRISPR cloning is expected to support further progress in BGC cloning and SM production.
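The core of CATCH [32] is that Cas9 cuts on both sides of a target cluster so the released segment can be captured in a vector. The Python sketch below computes that released segment from two guides, assuming both targets lie on the + strand, each is immediately followed by an NGG PAM, and Cas9 makes a blunt cut 3 bp 5' of that PAM. The guide and genome sequences and the function names are hypothetical; the uniqueness check is only a crude stand-in for the off-target analysis a real design would need.

```python
import re


def cas9_cut_site(seq: str, spacer: str) -> int:
    """Index of the blunt Cas9 cut for a unique + strand target: 3 bp 5' of
    the NGG PAM that immediately follows `spacer`."""
    # The lookahead allows overlapping matches, so repeated sites are counted.
    starts = [m.start() for m in
              re.finditer("(?=" + re.escape(spacer) + "[ACGT]GG)", seq)]
    if len(starts) != 1:
        raise ValueError(f"expected one target site, found {len(starts)}")
    return starts[0] + len(spacer) - 3


def catch_excise(seq: str, left_spacer: str, right_spacer: str) -> str:
    """Return the segment released by cutting at both flanking guide sites."""
    left = cas9_cut_site(seq, left_spacer)
    right = cas9_cut_site(seq, right_spacer)
    if left >= right:
        raise ValueError("left guide must cut upstream of the right guide")
    return seq[left:right]


LEFT = "ACGTACGTACGTACGTACGT"
RIGHT = "GATTACAGATTACAGATTAC"
genome = "TTTTT" + LEFT + "AGG" + "C" * 10 + RIGHT + "TGG" + "AAAAA"
print(len(catch_excise(genome, LEFT, RIGHT)))  # 33
```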

While gene-editing techniques play a significant role in the detection and production of microbial secondary metabolites, metabolomics is also important in the identification and characterization of secondary metabolites produced by native or genetically modified microorganisms.


3. Identification and characterization of secondary metabolites

The identification and characterization of SMs are essential steps in their discovery. Metabolomics often requires a broad array of instrumentation, such as evaporative light-scattering detectors (ELSD) for lipids, coulometric array detectors for redox-active compounds, and fluorescence detectors for aromatic compounds, whereas other omics techniques such as genomics, transcriptomics, and proteomics can often be performed on a single instrument platform.

In microbial SM research, experiments follow one of two approaches: targeted or untargeted metabolite identification [34]. As the name suggests, a targeted experiment aims to identify a specific, already known group of SMs, whereas an untargeted experiment aims to profile as many of the SMs produced by a microorganism as possible, both known and novel [35].

Two general technologies are currently used as the primary tools in metabolomics: mass spectrometry (MS) and nuclear magnetic resonance (NMR) spectroscopy [4, 36, 37].

These high-throughput tools provide broad coverage of many classes of metabolites, including amino acids, lipids, sugars, and organic acids.

Both NMR and MS have been used to identify targeted and untargeted SMs [38], and the two techniques are often complementary. MS provides information on the mass of molecules, whereas NMR can differentiate between structural isomers [39]. MS is more sensitive than NMR and can detect a larger number of metabolites, while NMR is highly quantitative and reproducible but requires larger sample amounts [40, 41].


4. Data analysis

A major challenge in metabolomic experiments is the huge amount of data generated by NMR spectroscopy or MS [7, 37]. Computer software is therefore essential to organize these data and extract the significant information they contain [40, 42].
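Before any statistics, peak lists from MS runs have to be organized into a common samples × features table; dedicated software such as XCMS or MZmine does this, together with peak picking and retention-time alignment. The Python sketch below shows only the m/z alignment step, assuming each sample has already been reduced to (m/z, intensity) pairs. The 10 ppm tolerance, the greedy grouping rule, and the sample names are illustrative assumptions; greedy chaining can merge neighbouring features that a real aligner would keep apart.

```python
from typing import Dict, List, Tuple


def build_feature_matrix(samples: Dict[str, List[Tuple[float, float]]],
                         ppm: float = 10.0):
    """Align (m/z, intensity) peak lists from several samples into shared
    features. Returns (feature_mz, sample_names, matrix), where matrix[i][j]
    is the intensity of feature j in sample i (0.0 if not detected)."""
    names = sorted(samples)
    index = {name: i for i, name in enumerate(names)}
    peaks = sorted((mz, name, inten)
                   for name, peak_list in samples.items()
                   for mz, inten in peak_list)
    # Greedy grouping along the m/z axis: a new feature starts whenever the
    # gap to the previous peak exceeds the ppm tolerance.
    groups: List[List[Tuple[float, str, float]]] = []
    for peak in peaks:
        if groups and (peak[0] - groups[-1][-1][0]) / peak[0] * 1e6 <= ppm:
            groups[-1].append(peak)
        else:
            groups.append([peak])
    feature_mz = [sum(p[0] for p in g) / len(g) for g in groups]
    matrix = [[0.0] * len(groups) for _ in names]
    for j, group in enumerate(groups):
        for _, name, inten in group:
            matrix[index[name]][j] += inten
    return feature_mz, names, matrix


samples = {
    "wild_type": [(300.1000, 1000.0), (450.2000, 50.0)],
    "mutant": [(300.1012, 1500.0), (612.3000, 80.0)],
}
mz, names, matrix = build_feature_matrix(samples)
print(names, matrix)
# ['mutant', 'wild_type'] [[1500.0, 0.0, 80.0], [1000.0, 50.0, 0.0]]
```

The two peaks near m/z 300.10 differ by about 4 ppm and are merged into one feature, while the peaks present in only one sample become features with a zero in the other row.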

Because examining metabolites one at a time is impractical for visualizing changes across groups of metabolites, multivariate statistical approaches are used to interpret the results. Principal component analysis (PCA) is one of the most widely used of these approaches [39, 43, 44]: it simplifies the data by reducing their dimensionality without losing their core features. PCA describes multivariate differences among SMs, whereas univariate statistical tests, such as the non-parametric Wilcoxon signed-rank and Kruskal–Wallis tests and the parametric Student's t-test and ANOVA, can be used to analyze individual metabolites [45].
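PCA and per-metabolite testing can both be sketched on a samples × metabolites matrix. The Python sketch below, assuming NumPy and autoscaled data, computes PCA scores by singular value decomposition; because testing many metabolites one by one inflates false positives [45], it also adds a Benjamini–Hochberg correction for a list of per-metabolite p-values. Those p-values would come from univariate tests such as `scipy.stats.ttest_ind`, `scipy.stats.wilcoxon`, `scipy.stats.kruskal`, or `scipy.stats.f_oneway`. The function names `pca` and `benjamini_hochberg` are illustrative, not taken from a specific metabolomics package.

```python
import numpy as np


def pca(X, n_components: int = 2):
    """PCA of a samples x metabolites matrix via SVD after autoscaling
    (mean-centring each metabolite and dividing by its standard deviation).
    Returns (scores, explained_variance_ratio)."""
    X = np.asarray(X, dtype=float)
    sd = X.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0  # leave constant metabolites unscaled
    Z = (X - X.mean(axis=0)) / sd
    U, S, _ = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = S ** 2 / np.sum(S ** 2)
    return scores, explained[:n_components]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (false discovery rate control)
    for one p-value per metabolite from a univariate test."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # Make adjusted values monotone, working down from the largest p-value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted


# Two perfectly correlated metabolites: the first component carries
# essentially all of the variance.
scores, ratio = pca([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
print(scores.shape, ratio)
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.2]))
```

For the four p-values above, the adjusted values are about 0.04, 0.053, 0.053, and 0.2, so at a 5% false discovery rate only the first metabolite would be reported.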

Owing to the development of many bioinformatics tools, many metabolites can now be identified. Two types of metabolite identification are applied: (1) definitive identification and (2) putative identification [7]. Many metabolomics databases are available online. Some are used for NMR, such as the Biological Magnetic Resonance Data Bank (BMRB), while others are used for MS, such as METLIN, MassBank, the Golm Metabolome Database (GMD), NIST, and MMCD [46].
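Putative identification is commonly based on comparing an accurately measured mass with database entries, whereas definitive identification additionally requires comparison with an authentic standard analysed under the same conditions. The Python sketch below illustrates the putative step, assuming singly protonated [M+H]+ ions and a 5 ppm tolerance. The two-entry `REFERENCE` dictionary and the function names are hypothetical stand-ins for databases such as METLIN or MassBank, which offer far richer searches (other adducts, MS/MS spectra, retention information).

```python
import re
from typing import Dict, List, Tuple

# Monoisotopic masses (u) of the most abundant isotope of each element.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
                "O": 15.99491461956, "S": 31.97207100}
PROTON = 1.007276466812  # added to form the [M+H]+ ion (charge 1)


def monoisotopic_mass(formula: str) -> float:
    """Neutral monoisotopic mass of a simple formula such as 'C16H18N2O4S'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


def putative_ids(observed_mz: float, reference: Dict[str, str],
                 ppm_tol: float = 5.0) -> List[Tuple[str, float]]:
    """Candidate (name, ppm_error) pairs whose [M+H]+ m/z lies within ppm_tol."""
    hits = []
    for name, formula in reference.items():
        theoretical = monoisotopic_mass(formula) + PROTON
        error_ppm = (observed_mz - theoretical) / theoretical * 1e6
        if abs(error_ppm) <= ppm_tol:
            hits.append((name, round(error_ppm, 2)))
    return sorted(hits, key=lambda hit: abs(hit[1]))


# Tiny stand-in for a metabolite database (two SMs named in this chapter).
REFERENCE = {"penicillin G": "C16H18N2O4S", "cyclosporin A": "C62H111N11O12"}
print(putative_ids(335.1062, REFERENCE))  # [('penicillin G', 0.58)]
```

A match like this remains putative: isomers share a formula and therefore a mass, which is one reason NMR or an authentic standard is still needed for a definitive identification.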


5. Conclusion

Microorganisms are one of the most significant sources of SMs, which play important roles in many aspects of life, including pharmaceutical, biomedical, and food applications. The integration of genetic engineering and metabolomics provides a powerful platform for the production, detection, and characterization of known and unknown SMs. In particular, combining CRISPR-Cas9 with metabolomics may improve the efficiency of microbial SM discovery. What is still needed is a sensitive technique able to provide comprehensive information on any SM under all conditions.


  1. Li JWH, Vederas JC. Drug discovery and natural products: End of an era or an endless frontier? Biomeditsinskaya Khimiya. 2011;57(2):148-160
  2. Pelaez F. The historical delivery of antibiotics from microbial natural products - Can history repeat? Biochemical Pharmacology. 2006;71(7):981-990
  3. McMurry JE. Organic chemistry with biological applications. In: Secondary Metabolites: An Introduction to Natural Products Chemistry. Stamford, USA: Cengage Learning Ltd; 2015. pp. 1016-1046
  4. Berg M, Vanaerschot M, Jankevics A, Cuypers B, Breitling R, Dujardin J-C. LC-MS metabolomics from study design to data-analysis – Using a versatile pathogen as a test case. Computational and Structural Biotechnology Journal. 2013;4:e201301002
  5. Bérdy J. Bioactive microbial metabolites. The Journal of Antibiotics. 2005;58(1):1-26
  6. Putri SP, Nakayama Y, Matsuda F, Uchikata T, Kobayashi S, Matsubara A, et al. Current metabolomics: Practical applications. Journal of Bioscience and Bioengineering. 2013;115:579-589
  7. Go EP. Database resources in metabolomics: An overview. Journal of Neuroimmune Pharmacology. 2010;5(1):18-30
  8. Blin K, Kim HU, Medema MH, Weber T. Recent development of antiSMASH and other computational approaches to mine secondary metabolite biosynthetic gene clusters. Briefings in Bioinformatics. 2019;20:1103-1113
  9. Tong Y, Weber T, Lee SY. CRISPR/Cas-based genome engineering in natural product discovery. Natural Product Reports. 2019;36:1262-1280
  10. Lim FY, Sanchez JF, Wang CCC, Keller NP. Toward awakening cryptic secondary metabolite gene clusters in filamentous fungi. Methods in Enzymology. 2012;517:303-324
  11. Shuikan AM, Hozzein WN, Alzharani MM, Sandouka MN, Al Yousef SA, Alharbi SA, et al. Enhancement and identification of microbial secondary metabolites. In: Extremophilic Microbes and Metabolites - Diversity, Bioprospecting and Biotechnological Applications. London: IntechOpen; 2020
  12. Lenders J, Frédérich M, De Tullio P. Nuclear magnetic resonance: A key metabolomics platform in the drug discovery process. Drug Discovery Today: Technologies. 2015;13:39-46
  13. Bino RJ, Hall RD, Fiehn O, Kopka J, Saito K, Draper J, et al. Potential of metabolomics as a functional genomics tool. Trends in Plant Science. 2004;9(9):418-425
  14. Tran PN, Yen MR, Chiang CY, Lin HC, Chen PY. Detecting and prioritizing biosynthetic gene clusters for bioactive compounds in bacteria and fungi. Applied Microbiology and Biotechnology. 2019;103:3277-3287
  15. Valayil JM. Activation of microbial silent gene clusters: Genomics driven drug discovery approaches. Biochem Anal Biochem. 2016;5:276
  16. Rutledge PJ, Challis GL. Discovery of microbial natural products by activation of silent biosynthetic gene clusters. Nature Reviews. Microbiology. 2015;13:509-523
  17. Zhang MM, Wang Y, Ang EL, Zhao H. Engineering microbial hosts for production of bacterial natural products. Natural Product Reports. 2016;33:963-987
  18. Miller JC, Holmes MC, Wang J, Guschin DY, Lee YL, Rupniewski I, et al. An improved zinc-finger nuclease architecture for highly specific genome editing. Nature Biotechnology. 2007;25:778-785
  19. Jankele R, Svoboda P. TAL effectors: Tools for DNA targeting. Briefings in Functional Genomics. 2014;13:409-419
  20. Yamanaka K, Reynolds KA, Kersten RD, Ryan KS, Gonzalez DJ, Nizet V, et al. Direct cloning and refactoring of a silent lipopeptide biosynthetic gene cluster yields the antibiotic taromycin A. Proceedings of the National Academy of Sciences. 2014;111:1957-1962
  21. Kang HS, Charlop-Powers Z, Brady SF. Multiplexed CRISPR/Cas9-and TAR-mediated promoter engineering of natural product biosynthetic gene clusters in yeast. ACS Synthetic Biology. 2016;5:1002-1010
  22. Voytas DF. Plant genome engineering with sequence-specific nucleases. Annual Review of Plant Biology. 2013;64:327-350
  23. Lee NC, Larionov V, Kouprina N. Highly efficient CRISPR/Cas9-mediated TAR cloning of genes and chromosomal loci from complex genomes in yeast. Nucleic Acids Research. 2015;43:e55-e55
  24. Horbal L, Marques F, Nadmid S, Mendes MV, Luzhetskyy A. Secondary metabolites overproduction through transcriptional gene cluster refactoring. Metabolic Engineering. 2018;49:299-315
  25. Shao Z, Rao G, Li C, Abil Z, Luo Y, Zhao H. Refactoring the silent spectinabilin gene cluster using a plug-and-play scaffold. ACS Synthetic Biology. 2013;2:662-669
  26. Bauman KD, Li J, Murata K, Mantovani SM, Dahesh S, Nizet V, et al. Refactoring the cryptic streptophenazine biosynthetic gene cluster unites phenazine, polyketide, and nonribosomal peptide biochemistry. Cell Chemical Biology. 2019;26:724-736
  27. Pohl C, Kiel JAKW, Driessen AJM, Bovenberg RAL, Nygard Y. CRISPR/Cas9 based genome editing of Penicillium chrysogenum. ACS Synthetic Biology. 2016;5:754-764
  28. Sander JD, Joung JK. CRISPR-Cas systems for editing, regulating and targeting genomes. Nature Biotechnology. 2014;32:347-355
  29. Zhang MM, Wong FT, Wang Y, Luo S, Lim YH, Heng E, et al. CRISPR–Cas9 strategy for activation of silent Streptomyces biosynthetic gene clusters. Nature Chemical Biology. 2017;13:607
  30. Li L, Zheng G, Chen J, Ge M, Jiang W, Lu Y. Multiplexed site-specific genome engineering for overproducing bioactive secondary metabolites in actinomycetes. Metabolic Engineering. 2017;40:80-92
  31. Greunke C, Duell ER, D'Agostino PM, Glöckle A, Lamm K, Gulder TAM. Direct pathway cloning (DiPaC) to unlock natural product biosynthetic potential. Metabolic Engineering. 2018;47:334-345
  32. Jiang W, Zhao X, Gabrieli T, Lou C, Ebenstein Y, Zhu TF. Cas9-assisted targeting of chromosome segments CATCH enables one step targeted cloning of large gene clusters. Nature Communications. 2015;6:810
  33. Bonet B, Teufel R, Crüsemann M, Ziemert N, Moore BS. Direct capture and heterologous expression of Salinispora natural product genes for the biosynthesis of enterocin. Journal of Natural Products. 2015;78:539-542
  34. Breitling R, Ceniceros A, Jankevics A, Takano E. Metabolomics for secondary metabolite research. Metabolites. 2013;3:1076-1083
  35. Wu C, Kim HK, van Wezel GP, Choi YH. Metabolomics in the natural products field—A gateway to novel antibiotics. Drug Discovery Today: Technologies. 2015;13:11-17
  36. Dunn WB, Broadhurst DI, Atherton HJ, Goodacre R, Griffin JL. Systems level studies of mammalian metabolomes: The roles of mass spectrometry and nuclear magnetic resonance spectroscopy. Chemical Society Reviews. 2011;40(1):387-426
  37. Midelfart A, Dybdahl A, Gribbestad IS. Metabolic analysis of the rabbit cornea by proton nuclear magnetic resonance spectroscopy. Ophthalmic Research. 1996;28(5):319-329
  38. Dettmer K, Aronov PA, Hammock BD. Mass spectrometry based metabolomics. Mass Spectrometry Reviews. 2007;26(1):51-78
  39. Alia A, Ganapathy S, de Groot HJ. Magic angle spinning (MAS) NMR: A new tool to study the spatial and electronic structure of photosynthetic complexes. Photosynthesis Research. 2009;102(2-3):415-425
  40. Dunn WB, Broadhurst D, Begley P, Zelena E, Francis-McIntyre S, Anderson N, et al. Procedures for large-scale metabolic profiling of serum and plasma using gas chromatography and liquid chromatography coupled to mass spectrometry. Nature Protocols. 2011;6(7):1060-1083
  41. Johnson CH, Gonzalez FJ. Challenges and opportunities of metabolomics. Journal of Cellular Physiology. 2012;227(8):2975-2981
  42. Lu W, Bennett BD, Rabinowitz JD. Analytical strategies for LC-MS-based targeted metabolomics. Journal of Chromatography. B, Analytical Technologies in the Biomedical and Life Sciences. 2008;871(2):236-242
  43. Hotelling H. Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology. 1933;24:417-441
  44. Young SP, Wallace GR. Metabolomic analysis of human disease and its application to the eye. Journal of Ocular Biology, Diseases, and Informatics. 2009;2(4):235-242
  45. Broadhurst DI, Kell DB. Statistical strategies for avoiding false discoveries in metabolomics and related experiments. Metabolomics. 2006;2(4):171-196
  46. Brown M, Dunn WB, Dobson P, Patel Y, Winder CL, Francis-McIntyre S, et al. Mass spectrometry tools and metabolite-specific databases for molecular identification in metabolomics. Analyst. 2009;134(7):1322-1332
