Open access peer-reviewed chapter

Drug Repurposing: Scopes in Herbal/Natural Products-based Drug Discovery and Role of in silico Techniques

Written By

Manisha Kotadiya

Submitted: 18 December 2022 Reviewed: 04 January 2023 Published: 25 January 2023

DOI: 10.5772/intechopen.109821

From the Edited Volume

Drug Repurposing - Advances, Scopes and Opportunities in Drug Discovery

Edited by Mithun Rudrapal

Abstract

Natural products and their derivatives are among the most promising and prolific resources for identifying small compounds with potential therapeutic activity. Work with herbal or natural products can now be accelerated by collecting the data available on their chemical, pharmacological, and biological properties. In silico tools and methods improve the chances of obtaining better results in a precise way, and they can help experimentalists focus their resources in fruitful directions. However, herbal product-based drug discovery remains constrained by the limits of current knowledge, by the quality, quantity, and relevance of the available data, and by the scope and limitations of cheminformatics methods. Pharmaceutical re-profiling aims to establish strategies for using approved drugs and rejected drug candidates in the treatment of new diseases. Drug repurposing offers a known safety profile and a lower average development cost for already approved or withdrawn drug candidates. In silico methods can be exploited to discover the actions of uninvestigated phytochemicals by identifying their molecular targets through a combination of cheminformatics, bioinformatics, and systems biology approaches, which is advantageous for small-molecule drug identification. Rule-based, similarity-based, shape-based, pharmacophore-based, and network-based approaches, as well as docking and machine learning methods, are discussed.

Keywords

  • docking
  • molecular simulation
  • bioinformatic tools
  • machine learning
  • target identification
  • databases

1. Introduction

Herbal or natural products and their derivatives have long been used as traditional medicines to treat ailments and various diseases. Today, they are also a prolific source of inspiration for identifying small therapeutic molecules. Around two-thirds of the small-molecule drugs approved between 1981 and 2019 are related to natural products to some extent. William C. Campbell, Satoshi Omura, and Youyou Tu were awarded the Nobel Prize for the discovery of two natural products, avermectin and artemisinin, which are used in the treatment of parasitic diseases [1].

Shaped by evolutionary processes, natural compounds exhibit a variety of biological activities across different species. Because of these characteristics, a wide range of products from natural resources are regarded as privileged structures [2, 3]. They are highly diverse with respect to structure and to pharmacological and physiological properties. Some have good ADME and physicochemical properties, whereas others lie clearly beyond what is generally recognized as drug-like chemical space [4, 5, 6]. Almost all phytochemicals and other compounds of natural origin have complex molecular structures with respect to 3D molecular shape, geometry, stereochemistry, ring complexity, and conformational flexibility (for example, a large number of rotatable bonds), and many lack aromaticity [7, 8, 9]. This poses numerous fundamental obstacles to 3D cheminformatics methodologies, which is why the creation of force fields and algorithms for the prediction and identification of protein-bound conformations of such complex compounds remains one of the most actively pursued research areas in cheminformatics and bioinformatics [10, 11, 12, 13, 14, 15].

In silico methods can contribute to natural product-based small-molecule discovery and can support experimentalists throughout lead identification [16, 17, 18, 19]. They are used not only to identify bioactive molecules but also to prioritize material for testing [20, 21]. In silico methods are also applied to the following tasks:

  1. data curation and dereplication,

  2. chemical space analysis, visualization, and comparison,

  3. assessment of natural product-likeness,

  4. prediction of ADME properties and safety profiling.

An on-site high-performance computing facility is no longer required. Calculations can now be run at very large scale in the cloud at low cost and with little complexity. Software license fees, however, are a significant cost component that has steadily climbed in recent years. At the same time, the number of sophisticated open-source tools is increasing, similar to what has long been the case in bioinformatics. Some of the best software packages in this category are as follows:

  1. RDKit and CDK [22, 23]

  2. KNIME [24, 25] (an open-source analytics platform), and

  3. Scikit-learn (an open-source Python module for machine learning) [26]

This chapter summarizes the methods and in silico tools for repurposing. It aims to provide a concise but comprehensive overview of the scope and limitations of herbal and other natural product-based drug discovery in a format that is accessible to researchers from different areas with an interest in drug discovery. The discussion covers a large number of cheminformatics and bioinformatics methods as well as data resources relevant to natural product-based drug discovery.

2. Herbal/natural product databases and computational methods/tools

Most databases also provide free bulk download, allowing for virtual screening and other uses. According to published analyses, the total number of natural compounds whose structures can be obtained via bulk download from free databases exceeds 250 k and is approaching 300 k. Unfortunately, many databases have a short lifespan; just a handful are sustainably managed and under continuous improvement. Data quality is always an issue, but with phytoconstituents extra care should be taken, especially when integrating the data with computational methods that rely on a correct representation of 3D molecular structures. This is because stereochemical information on phytocompounds is frequently erroneous. Natural product databases can be grouped as follows:

  1. encyclopedic and natural product/herbal data sources,

  2. databases augmented with phytoconstituents used in traditional medicines,

  3. specialized databases dedicated to certain ecosystems, geographical locations, animals, pharmacological activities, or even their classes.

Super Natural II [27] is the most comprehensive free database, with over 325 k substances. The database can be queried through a chemistry-aware online interface; however, bulk download is not supported. A handful of the best free, downloadable resources are described below:

  1. the Universal Natural Products Database (UNPD) [5], which lists more than 200 k compounds;

  2. the TCM database-Taiwan [28], which has information on over 60 k compounds discovered in Chinese medicinal plants;

  3. the Natural Products Atlas [29], which contains information on over 25 k compounds found in bacteria and fungi; and

  4. the Collective Molecular Activities of Useful Species (CMAUP) database [30, 31], which has information on over 47 k compounds from over 5600 plants.

By overlapping our collection of roughly 250 k compounds with the whole ChEMBL database (a database offering bioactivity data on approximately 2 million compounds) [32, 33], we found that only around 16% of the compounds were present in ChEMBL. Similarly, when we compared the dataset to all small-molecule ligands documented in the Protein Data Bank (PDB), we found that only roughly 2000 molecules have at least one co-crystallized X-ray structure of high quality [6].
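The overlap figures quoted above come down to set arithmetic on structure identifiers. The following is a minimal sketch of that calculation, assuming each compound has already been reduced to one canonical identifier; the identifiers and the function name `overlap_percentage` are hypothetical placeholders, not part of any database API.

```python
def overlap_percentage(collection, reference):
    """Return the percentage of entries in `collection` that also occur in `reference`."""
    collection = set(collection)
    if not collection:
        return 0.0
    return 100.0 * len(collection & set(reference)) / len(collection)


# Hypothetical identifiers standing in for canonical structure keys.
natural_products = {"NP-0001", "NP-0002", "NP-0003", "NP-0004", "NP-0005"}
bioactivity_db = {"NP-0002", "SYN-0107", "SYN-0342"}

share = overlap_percentage(natural_products, bioactivity_db)
print(f"{share:.1f}% of the natural product set is covered")
```

In practice the result depends heavily on how structures are standardized before comparison (for example, whether stereochemistry is retained), which is one reason why the erroneous stereochemical annotations mentioned earlier matter.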

3. In silico analysis, physicochemical studies, and structural properties of natural products

Computational chemistry plays a key role in characterizing compounds by their physicochemical and structural properties. Phytocompounds cover a much larger chemical space than synthetic compounds [34]. The structural uniqueness (and complexity) of some phytocompounds and of natural compounds from other sources could allow them to address macromolecular targets. They are on average heavier and more hydrophobic than synthetic drugs and synthetic, drug-like compounds. Their structural complexity is also often higher, in particular with regard to stereochemistry (commonly quantified by the number of chiral centers [35], the fraction of Csp3 atoms, and/or the number of bridgehead atoms in ring systems) and 3D molecular shape [36]. Natural compounds show an enormous diversity of ring systems, in particular of aliphatic systems. One study showed that 83% of the core ring scaffolds of natural products are absent in commercially available screening databases [37]. Compounds from natural sources in different kingdoms have distinct physicochemical and structural properties. For example, natural compounds with macrocycles or long aliphatic chains are more common in marine species than in terrestrial species. Bacteria also produce a large number of macrocyclic natural compounds. Natural compounds have a large number of heteroatoms and, as a result, a wide range of functional groups [38].

Natural compounds are unparalleled in terms of structural diversity, which is also expressed at the fragment level [39], and computational methods are available to assess this diversity. The majority of studies comparing the diversity of natural compounds with that of synthetic drugs use the concept of molecular frameworks (scaffolds) introduced by Bemis and Murcko [40]. A powerful tool for the intuitive, visual analysis of the structural diversity of sets of compounds is Scaffold Hunter [41]. Its main characteristics are listed below:

  • The open-source, Java-based program has a graphical interface and offers different clustering techniques.
  • Scaffold Hunter is based on the concept of hierarchical representation and classification of molecular scaffolds (“scaffold tree”).
  • An early prototype of this tool served as the foundation for the structural classification of natural products (SCONP), a technique for mapping the chemical space of compounds [42].

Principal component analysis (PCA) is one of the most widely used approaches for modeling chemical space [43]. It projects high-dimensional data into a low-dimensional space for improved interpretability while keeping information loss to a minimum. Several studies have used such maps of the chemical space of natural compounds and other small molecules [44] for mode-of-action prediction and for the analysis of structure-activity relationships. Despite significant variations in chemical structure, these studies reveal a high degree of similarity between natural substances and synthetic pharmaceuticals in terms of pharmacophore characteristics [45].

t-distributed Stochastic Neighbor Embedding (t-SNE) [46] and the more recently introduced Uniform Manifold Approximation and Projection for Dimension Reduction (UMAP) [47] are further effective techniques for dimensionality reduction. Both generate plots in which similar objects are clustered together and dissimilar ones are represented by distant points. Although t-SNE can produce plots that appear superior to those produced by PCA, the approach does not scale well with dataset size. UMAP is conceptually similar to t-SNE and yields comparable results, but it is faster. Medina-Franco’s research group has been working on many techniques for such intuitive characterization, visualization, and comparison of chemical collections, with an emphasis on compound databases (Figure 1).
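To make the projection step of PCA concrete, the sketch below centers a small descriptor table, builds its covariance matrix, and extracts the first principal axis by power iteration. It is an illustration under stated assumptions rather than the procedure used in the cited studies: the descriptor values are invented, the function names are hypothetical, and real analyses would normally standardize the descriptors and rely on an established library such as scikit-learn.

```python
import math


def center(rows):
    """Subtract the column means from every row."""
    n, dims = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    return [[r[j] - means[j] for j in range(dims)] for r in rows]


def covariance(centered):
    """Sample covariance matrix of already centered data (needs >= 2 rows)."""
    n, dims = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(dims)]
            for i in range(dims)]


def first_component(cov, iterations=200):
    """Approximate the leading eigenvector of `cov` by power iteration."""
    dims = len(cov)
    v = [1.0 / math.sqrt(dims)] * dims
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(dims)) for i in range(dims)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance left to explain
            return v
        v = [x / norm for x in w]
    return v


def project(centered, axis):
    """Coordinate of each centered row along `axis`."""
    return [sum(a * b for a, b in zip(row, axis)) for row in centered]


# Invented, pre-scaled descriptor rows (e.g. size, lipophilicity, chirality).
descriptors = [[0.2, 0.1, 0.9], [0.4, 0.3, 0.7], [0.9, 0.8, 0.1], [0.7, 0.9, 0.2]]
centered = center(descriptors)
axis = first_component(covariance(centered))
scores = project(centered, axis)  # one coordinate per compound on PC1
```

Plotting the scores of two or three leading components gives the kind of chemical space map described above; further components can be obtained by deflating the covariance matrix, which this sketch omits.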

Figure 1.

Examples of approved drugs and their interactions with their target proteins: (A) (−)-galantamine, an acetylcholinesterase (AChE) inhibitor used for the treatment of Alzheimer’s disease (PDB: 1DX6), (B) tacrolimus, a macrocyclic immunosuppressant (PDB: 1FKF), and (C) chenodeoxycholic acid for the treatment of hypocholesterolaemia (PDB: 6HL1) [39].

4. Computational methods for the analysis of natural compounds and their drug-likeness prediction

Computational tools are able to discriminate natural compounds and natural-like compounds from synthetic compounds with high accuracy, and they are also able to quantify the natural compound-likeness of compounds. As such, they are frequently used in compound design, library design, the selection of natural compounds (and their derivatives and analogs) from heterogeneous compound collections, and compound prioritization [48]. The Natural Product-Likeness Score created by Ertl is one of the most well-established techniques [49]. This score rates compounds by the resemblance of their fragments to those of known natural compounds using Bayesian statistics. The Natural Product-Likeness Score has been re-implemented with certain changes in various tools and platforms [50]. Additional techniques include a conceptually related method based on extended connectivity fingerprints (ECFPs) and a rule-based approach [50].
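The idea of scoring fragments with Bayesian statistics can be sketched as follows. This is a deliberately simplified illustration and not Ertl's published implementation: fragments are represented by arbitrary strings, each fragment receives a smoothed log-ratio of its frequency in natural products versus synthetic compounds, and the molecule score is the mean weight of its fragments. All names, the pseudocount, and the normalization are assumptions made for this example.

```python
import math
from collections import Counter


def train_fragment_weights(np_molecules, synthetic_molecules, pseudocount=1.0):
    """Log-ratio of smoothed fragment frequencies: natural products vs. synthetic."""
    np_counts = Counter(f for mol in np_molecules for f in set(mol))
    syn_counts = Counter(f for mol in synthetic_molecules for f in set(mol))
    n_np, n_syn = len(np_molecules), len(synthetic_molecules)
    weights = {}
    for frag in set(np_counts) | set(syn_counts):
        p_np = (np_counts[frag] + pseudocount) / (n_np + 2 * pseudocount)
        p_syn = (syn_counts[frag] + pseudocount) / (n_syn + 2 * pseudocount)
        weights[frag] = math.log(p_np / p_syn)
    return weights


def np_likeness(fragments, weights):
    """Mean fragment weight; fragments never seen in training contribute 0."""
    frags = set(fragments)
    if not frags:
        return 0.0
    return sum(weights.get(f, 0.0) for f in frags) / len(frags)


# Hypothetical fragment labels; a real workflow would derive them from structures.
known_nps = [{"lactone", "glycoside"}, {"lactone", "polyol"}]
known_synthetics = [{"sulfonamide", "biaryl"}, {"biaryl", "amide"}]
weights = train_fragment_weights(known_nps, known_synthetics)

query_score = np_likeness({"lactone", "amide"}, weights)
```

Positive scores indicate fragments that are more typical of natural products, negative scores fragments more typical of synthetic compounds. A score near zero can also mean that the fragments were simply absent from the training data, which is worth checking before interpreting it.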

More recently, we developed NP-Scout (Natural Product-Scout), a tool for identifying NPs and NP-like compounds in large sets of molecules. Its random forest classifiers are trained and tested on a database of known biologically active compounds. On a test set, a classifier based on MACCS (Molecular ACCess System) keys achieved an area under the receiver operating characteristic curve (AUC) of 0.997 and a Matthews correlation coefficient (MCC) of 0.960. NP-Scout uses similarity maps to highlight the regions of a compound that contribute to its classification as an NP or as a synthetic compound (Figure 2). NP-Scout can be accessed via a free online service [52].
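For readers less familiar with the two metrics quoted above, the sketch below computes the MCC from a confusion matrix and the ROC AUC from classifier scores using the pairwise-ranking definition. The function names and the toy numbers are illustrative only; they are not the NP-Scout evaluation data.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC from confusion-matrix counts; returns 0.0 when it is undefined."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


def roc_auc(positive_scores, negative_scores):
    """Probability that a random positive is scored above a random negative."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(positive_scores) * len(negative_scores))


# Toy example: classifier scores for natural products (positives)
# and synthetic compounds (negatives).
auc = roc_auc([0.9, 0.8], [0.1, 0.85])
mcc = matthews_corrcoef(tp=25, tn=25, fp=25, fn=25)
```

The MCC ranges from −1 to 1 and uses all four cells of the confusion matrix, which makes it more informative than plain accuracy when the two classes are of unequal size.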

Figure 2.

Similarity maps of (A) vorapaxar and (B) empagliflozin [51].

Recently, the Natural Compounds Molecular Fingerprint (NC-MFP) was presented as a novel method of describing the structural properties of natural compounds in terms of the scaffolds and fragments they are made up of [53]. The NC-MFP has been shown to outperform existing fingerprints in distinguishing natural substances from synthetic ones.

5. Computational identification of natural products likely to interfere with biological activities

Computational techniques have a robust track record in the identification of bioactive compounds of natural origin. Researchers have used a wide range of virtual screening methods, from simple, fast methods based on 2D molecular fingerprint similarity to more complex approaches. Machine learning algorithms have lately become a standard in screening for pharmacologically active natural compounds [54].

The structural properties of many NPs pose particular challenges to 3D virtual screening techniques. These include high conformational flexibility, complex shapes and ring systems (particularly macrocycles), the inadequacies of molecular force fields that were primarily parameterized for synthetic substances, and uncertainty related to protonation states, tautomerism, and oxidation states. One way to reduce the structural complexity of natural substances is to remove sugars and sugar-like moieties that are not required for bioactivity on a target of interest [55]. This can be done, for example, by use of defined SMARTS patterns. Given the sparsity of available structural data, docking of natural compounds to the structures of macromolecules can pose a profound challenge. This is because docking algorithms and scoring functions are highly sensitive even to very small changes in 3D structure, such as those commonly induced by ligand binding (including solvent effects). However, this hurdle may be overcome by the prudent use of homology modeling techniques, induced-fit docking approaches, and molecular dynamics simulations. In the case of extremely flexible proteins, docking against multiple protein structures (“ensemble docking”) may be a good way forward (not only for screening but also for binding site prediction) [56, 57]. Diligence and patience will certainly be required and, above all, checking the plausibility of a hypothesis against all available information can help to piece the puzzle together.

Docking algorithms produce good results more often in binding mode prediction than in virtual screening [58]. Provided that the natural compound of interest is not excessively large or flexible (as a rough guide, not exceeding 35 heavy atoms or eight rotatable bonds), that the ligand binding site is well defined (i.e., not overly shallow, not solvent-exposed), and that the interaction between the binding partners involves two or more directed interactions, there is a good chance that a satisfactorily accurate binding pose can be obtained that offers crucial insights for the development of optimization strategies. Binding pose prediction is more practicable than virtual screening because it allows researchers to set aside the most difficult component of docking, which is ranking compounds by their binding affinity, and to focus their efforts on a single ligand-target combination. Docking, particularly in the context of NP research, allows for the rationalization of stereoselectivity in ligand binding (and other processes, such as metabolism). The significance of incorporating accurate conformational information into 3D techniques, particularly docking, cannot be overemphasized.

In the following, we examine several exemplary investigations in which virtual screening has been used effectively to identify bioactive compounds. Using katsumadain A, a diarylheptanoid inhibiting influenza neuraminidase, as a template for 3D molecular shape-based screening, a number of structurally distinct NPs were identified that inhibit the viral enzyme with IC50 values in the sub-micromolar to low-micromolar range (for example, artocarpin (1), which is depicted in Figure 3) [59]. In another study, pharmacophore-based virtual screening was combined with a shape-based approach in order to identify activators of the G protein-coupled bile acid receptor 1 (GPBAR1) [51]. In addition to several NP databases, a collection of synthetic compounds was screened. Among the 14 selected NPs, eight (57%) showed a measured receptor activation of at least 15% at 20 μM concentration.

Figure 3.

Natural compounds and their derivatives identified by virtual screening.

Two of these compounds, farnesiferol B (2) and microlobidene (3), are based on molecular scaffolds that had not previously been associated with GPBAR1 modulation. Both compounds were reported to have EC50 values of approximately 14 μM. Among the 19 selected synthetic compounds, only two were active (applying the identical activity threshold).
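The rough guide for binding pose prediction given earlier in this section (no more than 35 heavy atoms, no more than eight rotatable bonds, a well-defined binding site, and at least two directed interactions) can be turned into a simple pre-docking checklist. The sketch below is only an illustration of that rule of thumb: the class, the function, and the example values are hypothetical, and the thresholds are exposed as parameters because the text presents them as a rough guide rather than as hard limits.

```python
from dataclasses import dataclass


@dataclass
class DockingCase:
    """Properties of one ligand-target pair; values are supplied by the user."""
    heavy_atoms: int
    rotatable_bonds: int
    site_well_defined: bool      # not overly shallow, not solvent-exposed
    directed_interactions: int   # e.g. expected hydrogen bonds


def pose_prediction_concerns(case, max_heavy_atoms=35, max_rotatable_bonds=8,
                             min_directed_interactions=2):
    """Return the reasons why a docking pose may be unreliable (empty if none)."""
    concerns = []
    if case.heavy_atoms > max_heavy_atoms:
        concerns.append("ligand is large")
    if case.rotatable_bonds > max_rotatable_bonds:
        concerns.append("ligand is highly flexible")
    if not case.site_well_defined:
        concerns.append("binding site is shallow or solvent-exposed")
    if case.directed_interactions < min_directed_interactions:
        concerns.append("too few directed interactions")
    return concerns


# Hypothetical example: a mid-sized, fairly rigid ligand versus a macrocycle.
compact = DockingCase(heavy_atoms=28, rotatable_bonds=4,
                      site_well_defined=True, directed_interactions=3)
macrocycle = DockingCase(heavy_atoms=57, rotatable_bonds=11,
                         site_well_defined=True, directed_interactions=2)

for name, case in (("compact", compact), ("macrocycle", macrocycle)):
    print(name, pose_prediction_concerns(case) or "no concerns")
```

A non-empty list does not mean that docking will fail; it flags cases where additional measures such as ensemble docking or molecular dynamics refinement deserve consideration.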

6. In silico prediction of the therapeutic targets of natural products

Identifying the macromolecular targets of small compounds is critical for assessing the pharmacological activity and safety of drugs, as well as for their further development. However, the mechanism of action of a significant percentage of marketed medications is unknown or only loosely understood.

In silico target prediction is a large-scale application of virtual screening [60], in which one, several, or even many molecules are assessed against the broadest possible collection of macromolecules. A number of techniques and models have been released in recent years [61], and they have emerged as valuable tools in early drug discovery. The majority of target prediction strategies are ligand-based, which reflects the difficulties associated with docking and other structure-based methods.

Ligand-based methodologies range from simple similarity-based approaches to advanced machine learning and network-based approaches. Surprisingly, despite the variety of computational tools for target prediction available today, our understanding of their utility under real-world conditions remains limited [62]. This is largely due to the (in practice) prohibitive expense of prospective experimental evaluation of such models, but also to the inadequate, superficial retrospective validation techniques that are often used [63]. To the best of our knowledge, the only computational technique that has received rigorous experimental validation is the well-established Similarity Ensemble Approach (SEA) [64]. One may argue that testing models on existing data leads to an overestimation of how well a model would perform in real-world scenarios.

In real-world projects, additional information, such as phenotypic assay readouts from different cell types or data on structurally similar drugs, is likely to be available. By merging all available information, some false-positive predictions are likely to be eliminated, leaving far fewer prospective targets to be studied experimentally. In a recent in-depth study of the behavior and scope of a similarity-based approach and a machine learning approach for predicting the targets of small molecules, we showed that the reliability of either approach’s predictions is strongly influenced by the structural relationship between the compounds of interest and the compounds represented in the training set. This issue must be carefully examined when working with natural compounds, given that target prediction algorithms are largely built for and trained on measured data for synthetic compounds. In the same investigation, we found that, surprisingly, the similarity-based strategy outperformed the machine learning technique on the available data. While a comparison of these two methods should be approached with caution for several reasons, the results indicate that the basic similarity-based strategy is a solid choice, particularly when model interpretability is considered. This is also reflected in the high performance of other well-known similarity-based models, such as SwissTargetPrediction [65].

The majority of natural compounds differ structurally from the more common synthetic compounds that account for most of the measured activity data. More sophisticated similarity-based approaches that compare molecules based on their 3D molecular structure are expected to recognize such distant structural similarities, but how effectively these methods would function in practice was unknown until recently. We investigated the capability of ROCS [66], a leading shape-based screening engine that also takes chemical feature distributions into consideration, to identify the macromolecular targets of “complex” molecules using knowledge of “non-complex” molecules with measured bioactivity data [67].

We designated molecules as “complex” for this work if they are either large in size (45 to 55 heavy atoms) or macrocyclic. We classified compounds as “non-complex” when they were small in size (15 to 30 heavy atoms). A collection of 28 pharmacologically important targets was investigated. A diverse set of 10 complex small molecules was compiled automatically for each of the targets. For each of these molecules, a single low-energy conformation was used as a query for ROCS screening against a multi-conformational knowledge base. The knowledge base contains 3642 targets and 272,640 non-complex molecules. This study found that ROCS ranked at least one known target among the top 10 positions (out of 3642) for up to 37% of the 280 complex small molecules used as queries. This result is remarkable given the dissimilarity of the queries and the compounds in the knowledge base. It suggests that target prediction is achievable for a large number of difficult, complex compounds. It should be noted that, in many circumstances, researchers will be able to narrow down the number of target candidates considerably based on expert knowledge and accessible information. There were at least 31 NPs and NP-like molecules among the 280 complex small molecules. The top-10 success rate for these compounds was lower (23% vs. 37%). This is because the median Tanimoto coefficient between a complex NP (or NP-like substance) and the nearest non-complex molecule in the knowledge base is only 0.13. For pairs of compounds with such a minimal degree of similarity, it is reasonable to expect that the respective binding modes will differ, a situation that is normally outside the reach of ligand-based approaches.
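The two bookkeeping steps of this evaluation, assigning molecules to the “complex” and “non-complex” classes and computing a top-k success rate from ranked target lists, can be sketched as follows. The size limits are taken from the text above; the function names, the `unclassified` label, and the example rankings are assumptions for illustration and do not reproduce the ROCS workflow itself.

```python
def size_class(heavy_atoms, is_macrocycle=False):
    """Classify a molecule using the size ranges quoted in the study."""
    if is_macrocycle or 45 <= heavy_atoms <= 55:
        return "complex"
    if 15 <= heavy_atoms <= 30:
        return "non-complex"
    return "unclassified"  # label introduced here for everything in between


def top_k_success_rate(ranked_targets_per_query, known_targets_per_query, k=10):
    """Fraction of queries with at least one known target among the top k ranks."""
    if not ranked_targets_per_query:
        return 0.0
    hits = 0
    for ranked, known in zip(ranked_targets_per_query, known_targets_per_query):
        if set(ranked[:k]) & set(known):
            hits += 1
    return hits / len(ranked_targets_per_query)


# Hypothetical ranked target lists (best first) for three query molecules.
rankings = [["T1", "T2", "T3"], ["T4", "T5", "T6"], ["T7", "T8", "T9"]]
known = [{"T2"}, {"T9"}, {"T9"}]

rate_top2 = top_k_success_rate(rankings, known, k=2)
rate_top3 = top_k_success_rate(rankings, known, k=3)
```

Reporting the rate for several values of k, and separately for NP-like queries, makes differences such as the 23% versus 37% gap described above directly visible.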

In addition to 3D similarity-based techniques, 3D pharmacophore-based methodologies are commonly used for target prediction in natural compound research. A profiling investigation, for example, screened secondary metabolites extracted from the medicinal plant Ruta graveolens against a battery of over 2000 pharmacophore models covering over 280 targets. Among many other bioactive compounds and interactions, the in silico screening identified arborinine as an inhibitor of acetylcholinesterase (AChE) (measured IC50 = 35 μM). Machine learning-based methods for predicting the targets of natural compounds have attracted the greatest attention in recent years. Some examples of such tools are given below:

  1. SPIDER,

  2. TIGER, and

  3. Starfish

SPIDER employs self-organizing maps in conjunction with “fuzzy” chemical descriptors, which allows it to be applied to NPs. The model proved useful in identifying 5-lipoxygenase, peroxisome proliferator-activated receptors, steroid receptors, prostaglandin E2 synthase 1, and the farnesoid X receptor as targets of archazolid A, and it accurately predicted prostanoid receptor 3 as a molecular target of doliculide, a 16-membered depsipeptide [68]. SPIDER has also successfully identified the targets of fragment-like natural compounds. These include sparteine, for which the kappa opioid receptor, p38 mitogen-activated protein kinase, and muscarinic and nicotinic receptors were experimentally verified as targets [3]; DL-goitrin, whose experimentally confirmed targets are the pregnane X receptor and a cholinergic receptor; graveolinine, which acts on cyclooxygenase-2 and the serotonin 5-HT2B receptor; and isomacroin, which acts on adenosine A3 and platelet-derived growth factor receptors.

DEcRyPT (Drug-Target Relationship Predictor) uses random forest regression to refine the list of possible macromolecular targets predicted by SPIDER. DEcRyPT was used to successfully identify 5-lipoxygenase as a target of the ortho-naphthoquinone β-lapachone; the hydroquinone form of lapachone was shown to be an inhibitor of 5-lipoxygenase.

TIGER is conceptually related to SPIDER. However, it uses updated CATS descriptors and employs a different technique for scoring predicted targets. The glucocorticoid, orexin, and cholecystokinin receptors were successfully identified by TIGER as targets of the marine NP (±)-marinopyrrole A. Among other proteins, the model correctly predicted estrogen receptors as binding partners of the stilbenoid resveratrol [69]. Starfish is a stacked ensemble approach for target prediction trained on synthetic compounds.

As part of the development process, various machine learning methods were investigated. The authors determined the optimum stacking strategy to be one in which molecular fingerprints are fed into a k-nearest neighbors model and a random forest model. The probabilities predicted by these models for each of the targets are then used as input for a logistic regression-based meta-classifier (level 1). On a test set of NPs, the stacking technique performed much better than the separate models (ROC AUC 0.94; BEDROC score 0.73).

Network-based techniques for predicting the biological targets of natural compounds have also been published. Cheng and colleagues, for example, created statistical models to link natural compounds to cancer targets and to proteins involved in aging-related disorders. A neural network was recently trained on clinical indication data and applied to identify privileged molecular scaffolds in natural products. Based on the predictions of these models, a unique template database for 100 indications was created, which may serve as a starting point for NP-based drug development. The reader is directed to the literature for further information on this subject.

Natural compounds that are likely to interfere with biological assays can also be identified computationally. The proclivity of compounds to interfere with biological assays remains a significant challenge in compound screening experiments. The flavonoid quercetin, a well-known pan-assay interference compound, exemplifies the scope of the issue: as of 28 July 2020, the PubChem BioAssay repository listed quercetin as conclusively bioactive in over 800 separate bioassays, representing a hit rate of more than 50%. The most commonly observed mechanism of assay interference is aggregate formation, which happens under certain assay conditions. Covalent binding, redox cycling, interference with assay spectroscopy, metal chelation, membrane disruption, and decomposition in buffers are further significant mechanisms [70].

7. Computational identification of natural products likely to interfere with biological assays

The development of computational techniques to address this challenge has been gradual. Until recently, the tools available to users comprised numerous rule sets, a few similarity-based techniques, and a statistical method. The best-known and most widely used rule set is the pan-assay interference compounds (PAINS) rule set. Despite the unambiguous statements of its creators, users of the PAINS rule set all too frequently overlook the significant limitations of its scope, applicability, and reliability. Other relevant rule sets include the Rapid Elimination of Swill (REOS) rules and a set of rules derived from a nuclear magnetic resonance-based assay for detecting small compounds that give false-positive test results owing to reactivity (ALARM NMR) [71]. Aggregator Advisor is a useful similarity-based technique that flags compounds with a close structural relationship to known aggregators based on molecular scaffolds.

Hit Dexter 2.0 is the second generation of a series of machine learning models meant to identify compounds that are likely to exhibit prolific hitter behavior in primary screening and/or confirmatory dose-response tests, independent of the underlying (interference) mechanism. All of these methods are derived from databases dominated by synthetic compounds. As we demonstrate in our work on Hit Dexter 2.0, the training set, although consisting of around 250 k compounds, covers just a small proportion (approximately 15%) of the active compounds with molecules that are structurally related closely enough for the model to make credible predictions. This means that, once again, caution is required when employing any of these techniques, specifically in the area of NPs.
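The notion of coverage used here can be illustrated with a nearest-neighbor check: a query compound counts as covered if its highest Tanimoto similarity to any training compound reaches a chosen threshold. The sketch below assumes that fingerprints are available as sets of on-bit indices. The threshold of 0.5, the function names, and the toy fingerprints are assumptions for illustration and are not the settings used for Hit Dexter 2.0.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    fp_a, fp_b = set(fp_a), set(fp_b)
    union = fp_a | fp_b
    if not union:
        return 0.0
    return len(fp_a & fp_b) / len(union)


def nearest_neighbor_similarity(query_fp, training_fps):
    """Highest similarity between the query and any training compound."""
    return max((tanimoto(query_fp, fp) for fp in training_fps), default=0.0)


def coverage(query_fps, training_fps, threshold=0.5):
    """Fraction of queries whose nearest training neighbor meets the threshold."""
    if not query_fps:
        return 0.0
    covered = sum(
        1 for fp in query_fps
        if nearest_neighbor_similarity(fp, training_fps) >= threshold
    )
    return covered / len(query_fps)


# Toy fingerprints: sets of hypothetical on-bit indices.
training_set = [{1, 2, 3}, {10, 11}]
natural_products = [{1, 2, 3, 4}, {20, 21}]

covered_fraction = coverage(natural_products, training_set, threshold=0.5)
```

Reporting the nearest-neighbor similarity alongside each prediction gives users a simple indication of whether a compound lies within the region of chemical space on which the model was trained.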

8. In silico prediction of ADME and safety profiles of natural products

The ADME and safety characteristics of NPs are frequently a source of difficulty in NP-based drug development. The hERG channel (whose blockage has been associated with potentially deadly cardiac arrhythmia), cytochrome P450 enzymes (which can be involved in drug-drug interactions and toxicity), and P-glycoprotein (an efflux pump with broad substrate specificity that can effectively cause drug resistance) are some of the most well-known anti-targets hit by NPs. A wide range of computational models (e.g., pharmacophore models, statistical models, docking, machine learning models, etc.) is used to address these and many additional anti-targets and end points. However, because of the data available, these and many other in silico methods are trained and tested using substances that are mainly of synthetic origin. For example [72, 73]:

  1. Hit Dexter 2.0’s applicability to natural compounds is limited. The reliability of Hit Dexter’s predictions has been shown to decline significantly beyond a certain distance from the training data, because the training data consist mostly of synthetic substances.

  2. In contrast, FAME 3, a conceptually comparable machine learning model for predicting the sites of metabolism of small molecules, has been shown to perform well on natural compounds, even though the bulk of the compounds in the training set are synthetic. The reason for the robustness and good performance of the FAME 3 models on these molecules is that the metabolic lability of atom positions is described by their local atom environments, and these local environments are shared between natural compounds and synthetic substances to a much greater extent than their global molecular similarity would suggest (Table 1).
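The contrast between local atom environments and global molecular similarity can be illustrated with a toy example. The sketch below is not the FAME 3 descriptor scheme: it assumes a deliberately crude environment (an atom's element plus the elements of its direct neighbors), represents molecules as hypothetical heavy-atom graphs, and uses small molecules invented for the demonstration.

```python
from typing import List, Sequence, Tuple

# A molecule is (heavy-atom element symbols, bonds as pairs of atom indices).
Molecule = Tuple[List[str], List[Tuple[int, int]]]
Environment = Tuple[str, Tuple[str, ...]]


def atom_environments(mol: Molecule) -> List[Environment]:
    """Describe each atom by its element and the sorted elements of its neighbours."""
    atoms, bonds = mol
    neighbours: List[List[str]] = [[] for _ in atoms]
    for i, j in bonds:
        neighbours[i].append(atoms[j])
        neighbours[j].append(atoms[i])
    return [(atoms[i], tuple(sorted(neighbours[i]))) for i in range(len(atoms))]


def atom_coverage(query: Molecule, training: Sequence[Molecule]) -> float:
    """Fraction of query atoms whose local environment occurs in the training set."""
    seen = {env for mol in training for env in atom_environments(mol)}
    envs = atom_environments(query)
    if not envs:
        return 0.0
    return sum(env in seen for env in envs) / len(envs)


def global_similarity(a: Molecule, b: Molecule) -> float:
    """Whole-molecule Tanimoto similarity over the sets of environments."""
    ea, eb = set(atom_environments(a)), set(atom_environments(b))
    if not ea and not eb:
        return 0.0
    return len(ea & eb) / len(ea | eb)


# Hypothetical training molecules and query (heavy atoms only).
ethanol: Molecule = (["C", "C", "O"], [(0, 1), (1, 2)])
ethylamine: Molecule = (["C", "C", "N"], [(0, 1), (1, 2)])
aminoethanol: Molecule = (["N", "C", "C", "O"], [(0, 1), (1, 2), (2, 3)])

TRAINING = [ethanol, ethylamine]
local = atom_coverage(aminoethanol, TRAINING)
best_global = max(global_similarity(aminoethanol, m) for m in TRAINING)

if __name__ == "__main__":
    print(f"atoms with a known local environment: {local:.2f}")
    print(f"best whole-molecule similarity:       {best_global:.2f}")
```

In this toy case every atom of the query has a local environment that already occurs somewhere in the training molecules, while the query as a whole is only moderately similar to any single training molecule, which is the pattern the argument above relies on.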

Sr. No. | Purpose | Software
1 | Docking | Glide, AutoDock, TarFisDock, Flare
2 | Binding site prediction | SiteMap, Computed Atlas of Surface Topography of proteins (CASTp), FINDSITE, LigASite
3 | Pathway analysis | Therapeutic Performance Mapping System
4 | Drug design | Forge, Spark
5 | Pharmacokinetic parameters | SwissADME
6 | Genomics | Connectivity map (CMap), Directionality map (DMAP)
7 | Molecular simulation | iMODS, GROMACS

Table 1. Available software for in silico drug repurposing.

9. Conclusions

NPs pose remarkable challenges to both experimentalists and theorists, yet data on recently approved small-molecule drugs demonstrate that NP research is worthwhile and can deliver useful new therapeutics. Modern in silico approaches can contribute significantly to accelerating and de-risking natural product-based drug development. However, model applicability must be carefully monitored, especially when dealing with NPs, because computational approaches are often developed for and trained on data for synthetic compounds. Unfortunately, even recently published models sometimes lack a rigorous definition of their applicability domain and do not appropriately warn users about compounds with unreliable predictions. Researchers may be tempted to use one of the numerous free, user-friendly web applications. The same principle holds for these web applications: in the absence of solid indications of the reliability of individual predictions, those predictions should not be trusted. Given the renewed interest in NP research, the increasing availability of biological, chemical, and structural data, and improvements in algorithms, modeling techniques, and computing power, computational techniques will continue to be integrated into natural product-based drug development pipelines.

Conflict of interest

The author declares that there is no conflict of interest.

References

  1. Newman D, Cragg G. Natural products as sources of new drugs over the nearly four decades from 01/1981 to 09/2019. Journal of Natural Products. 2020;83:770-803
  2. Cragg G, Newman J. Biodiversity: A continuing source of novel drug lead. Pure and Applied Chemistry. 2005;77:7-24
  3. Rodrigues T, Reker D, Schneider P, Schneider G. Counting on natural products for drug design. Nature Chemistry. 2016;8:531-541
  4. Atanasov G, Waltenberger B, Pferschy-Wenzig E, Linder T, Wawrosch C, Uhrin P, et al. Discovery and resupply of pharmacologically active plant-derived natural products: A review. Biotechnology Advances. 2015;33:1582-1614
  5. Gu J, Gui J, Chen L, Yuan G, Lu H, Xu X. Use of natural products as chemical library for drug discovery and network pharmacology. PLoS One. 2013;8:e62839
  6. Chen Y, de Lomana M, Friedrich N, Kirchmair J. Characterization of the chemical space of known and readily obtainable natural products. Journal of Chemical Information and Modeling. 2018;58:1518-1532
  7. Clemons P, Bodycombe N, Carrinski H, Wilson J, Shamji WB, Koehler A, et al. Small molecules of different origins have distinct distributions of structural complexity that correlate with protein-binding profiles. Proceedings of the National Academy of Sciences USA. 2010;107:18787-18792
  8. Chen H, Engkvist O, Blomberg N, Li J. A comparative analysis of the molecular topologies for drugs, clinical candidates, natural products, human metabolites and general bioactive compounds. MedChemComm. 2012;3:312-321
  9. David B, Grondin A, Schambel P, Vitorino M, Zeyer D. Plant natural fragments, an innovative approach for drug discovery. Phytochemistry Reviews. 2019. DOI: 10.1007/s11101-019-09612-4
  10. Friedrich N, Flachsenberg F, Meyder A, Sommer K, Kirchmair J, Rarey M. Conformator: A novel method for the generation of conformer ensembles. Journal of Chemical Information and Modeling. 2019;59:731-742
  11. Friedrich N, de Bruyn KC, Flachsenberg F, Sommer K, Rarey M, Kirchmair J. Benchmarking commercial conformer ensemble generators. Journal of Chemical Information and Modeling. 2017;57:2719-2728
  12. Olgac A, Orhan I, Banoglu B. The potential role of in silico approaches to identify novel bioactive molecules from natural resources. Future Medicinal Chemistry. 2017;9:1665-1686
  13. Ikram N, Durrant J, Muchtaridi M, Zalaludin A, Purwitasari N, Mohamed N, et al. Molecular docking and 3D-pharmacophore modelling to study the interactions of Chalcone derivatives with estrogen receptor alpha. Journal of Chemical Information and Modeling. 2015;55:308-316
  14. Grienke U, Mihaly-Bison J, Schuster D, Afonyushkin T, Binder M, Guan S, et al. Pharmacophore-based discovery of FXR-agonists. Part II: Identification of bioactive triterpenes from Ganoderma lucidum. Bioorganic & Medicinal Chemistry. 2011;19:6779-6791
  15. Landrum G. “RDKit,” can be found under www.rdkit.org
  16. Steinbeck C, Han Y, Kuhn S, Horlacher O, Luttmann E, Willighagen E. The chemistry development kit (CDK): An open-source Java library for chemo- and bioinformatics. Journal of Chemical Information and Computer Sciences. 2003;43:493-500
  17. “KNIME | Open for Innovation,” can be found under https://www.knime.com/
  18. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research. 2011;12:2825-2830
  19. Banerjee P, Erehman J, Gohlke B, Wilhelm T, Preissner R, Dunkel M. Super Natural II: A database of natural products. Nucleic Acids Research. 2014;43:935
  20. Chen C. TCM database@Taiwan: The World's largest traditional Chinese medicine database for drug screening in silico. PLoS One. 2011;6:e15939
  21. “Natural Products Atlas (2019),” can be found under https://www.npatlas.org
  22. Wolber G, Langer T. LigandScout: 3-D pharmacophores derived from protein-bound ligands and their use as virtual screening filters. Journal of Chemical Information and Modeling. 2005;45:160-169
  23. Ertl P, Schuffenhauer A. Cheminformatics analysis of natural products: Lessons from nature inspiring the design of new drugs. Progress in Drug Research. 2008;66:219-235
  24. Chen Y, Stork C, Hirte S, Kirchmair J. NP-Scout: Machine learning approach for the quantification and visualization of the natural product-likeness of small molecules. Biomolecules. 2019;9:43
  25. Lucas X, Gruning B, Bleher S, Gunther S. Journal of Chemical Information and Modeling. 2015;55:915-924
  26. Hert J, Irwin J, Laggner C, Keiser M, Shoichet B. Quantifying biogenic bias in screening libraries. Nature Chemical Biology. 2009;5:479-483
  27. El-Elimat T, Zhang X, Jarjoura D, Moy F, Orjala J, Kinghorn A, et al. Chemical diversity of metabolites from fungi, cyanobacteria, and plants relative to FDA-approved anticancer agents. ACS Medicinal Chemistry Letters. 2012;3:645-649
  28. Muigg P, Rosen J, Bohlin L, Backlund A. Marine natural products: A source of novel anticancer drug. Phytochemistry Reviews. 2013;12:449-457
  29. Chavez-Hernandez A, Sanchez-Cruz N, Medina-Franco J. A fragment library of natural products and its comparative chemoinformatic characterization. Molecular Informatics. 2020;39:2000050
  30. Zeng X, Zhang P, Wang Y, Qin C, Chen S, He W, et al. CMAUP: A database of collective molecular activities of useful plants. Nucleic Acids Research. 2019;47:1118
  31. Bemis G, Murcko M. The properties of known drugs. 1. Molecular frameworks. Journal of Medicinal Chemistry. 1996;39:2887-2893
  32. Bento A, Gaulton A, Hersey A, Bellis L, Chambers J, Davies M, et al. The ChEMBL bioactivity database: An update. Nucleic Acids Research. 2014;42:1083-1090
  33. Schafer T, Kriege N, Humbeck L, Klein K, Koch O, Mutzel P. Scaffold Hunter: A comprehensive visual analytics framework for drug discovery. Journal of Cheminformatics. 2017;9:28
  34. Koch M, Schuffenhauer A, Scheck M, Wetzel S, Casaulta M, Odermatt A, et al. Charting biologically relevant chemical space: A structural classification of natural products (SCONP). Proceedings of the National Academy of Sciences USA. 2005;102:17272-17277
  35. Saldivar-Gonzalez F, Pilon-Jimenez A, Medina-Franco J. BIOFACQUIM: A Mexican compound database of natural products. Physical Sciences Reviews. 2019;4:2018-0103
  36. Frederick R, Bruyere C, Vancraeynest V, Reniers J, Meinguet C, Pochet L, et al. Novel trisubstituted harmine derivatives with original in vitro anticancer activity. Journal of Medicinal Chemistry. 2012;55:6489-6501
  37. Rosen J, Rickardson L, Backlund A, Gullbo J, Bohlin L, Larsson R, et al. QSAR and Combinatorial Science. 2009;28:436-446
  38. Van der Maaten L, Hinton G. Visualizing data using t-SNE. Journal of Machine Learning Research. 2008;9:2579-2605
  39. McInnes L, Healy J, Melville J. UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction, arXiv e-prints 2018, 1802.03426v2
  40. Yu M. Knowledge-based approach to de novo design using reaction vectors. Journal of Chemical Information and Modeling. 2011;51:541-557
  41. Ertl P, Roggo S, Schuffenhauer A. Natural product-likeness score and its application for prioritization of compound libraries. Journal of Chemical Information and Modeling. 2008;48:68-74
  42. Jayaseelan K, Moreno P, Truszkowski A, Ertl P, Steinbeck C. Natural product-likeness score revisited: An open-source, open-data implementation. BMC Bioinformatics. 2012;13:106
  43. Zaid H, Raiyn J, Nasser A, Saad B, Rayan A. Physicochemical properties of natural based products versus synthetic chemical. Open Nutraceuticals Journal. 2010;3:194-202
  44. “NP-Scout,” can be found under https://nerdd.zbh.uni-hamburg.de/npscout/
  45. Seo M, Shin H, Myung Y, Hwang S, No KT. Development of natural compound molecular fingerprint (NC-MFP) with the dictionary of natural products (DNP) for natural product-based drug development. Journal of Cheminformatics. 2020;12:6
  46. Kirchweger B, Rollinger J. In: Filho VC, editor. Natural Products as Source of Molecules with Therapeutic Potential. 2018. pp. 333-364
  47. Kirchweger B, Rollinger M. A strengths-weaknesses-opportunities-threats (SWOT) analysis of cheminformatics in natural product research. Progress in the Chemistry of Organic Natural Products. 2019;110:239-271
  48. Grienke U, Schmidtke M, Kirchmair J, Pfarr K, Wutzler P, Durrwald R, et al. Antiviral potential and molecular insight into neuraminidase inhibiting diarylheptanoids from Alpinia katsumadai. Journal of Medicinal Chemistry. 2010;53:778-786
  49. Amaro R, Baudry J, Chodera J, Demir O, McCammon J, Miao Y, et al. Ensemble docking in drug discovery. Biophysical Journal. 2018;114:2271-2278
  50. Warren G, Andrews C, Capelli A, Clarke B, LaLonde J, Lambert M, et al. A critical assessment of docking programs and scoring functions. Journal of Medicinal Chemistry. 2006;49:5912-5931
  51. “ROCS. OpenEye Scientific Software,” can be found under https://www.eyesopen.com
  52. Kirchweger B, Kratz J, Ladurner A, Grienke U, Langer T, Dirsch V, et al. In silico workflow for the discovery of natural products activating the G protein-coupled bile acid receptor 1. Frontiers in Chemistry. 2018;6:242
  53. Grisoni F, Merk D, Friedrich L, Schneider G. Design of natural-product-inspired multitarget ligands by machine learning. ChemMedChem. 2019;14:1129-1134
  54. Cereto-Massague A, Ojeda M, Valls C, Mulero M, Pujadas G, Garcia-Vallve S. Tools for in silico target fishing. Methods. 2015;71:98-103
  55. Mathai N, Chen Y, Kirchmair J. Validation strategies for target prediction methods. Briefings in Bioinformatics. 2019;21:791-802
  56. Mathai N, Kirchmair J. Similarity-based methods and machine learning approaches for target prediction in early drug discovery: Performance and scope. International Journal of Molecular Sciences. 2020;21:3585
  57. Keiser M, Setola V, Irwin J, Laggner C, Abbas A, Hufeisen S, et al. Predicting new molecular targets for known drugs. Nature. 2009;462:175-181
  58. Lounkine E, Keiser M, Whitebread S, Mikhailov D, Hamon J, Jenkins J, et al. Large-scale prediction and testing of drug activity on side-effect targets. Nature. 2012;486:361-367
  59. Gfeller D, Grosdidier A, Wirth M, Daina A, Michielin O, Zoete V. SwissTargetPrediction: A web server for target prediction of bioactive small molecules. Nucleic Acids Research. 2014;42:32-38
  60. Chen Y, Mathai N, Kirchmair J. Scope of 3D shape-based approaches in predicting the macromolecular targets of structurally complex small molecules including natural products and macrocyclic ligands. Journal of Chemical Information and Modeling. 2020;60:2858-2875
  61. Rollinger J, Schuster D, Danzl B, Schwaiger S, Markt P, Schmidtke M, et al. In silico target fishing for rationalized ligand discovery exemplified on constituents of Ruta graveolens. Planta Medica. 2009;75:195-204
  62. Reker D, Rodrigues T, Schneider P, Schneider G. Identifying the macromolecular targets of de novo-designed chemical entities through self-organizing map consensus. Proceedings of the National Academy of Sciences. 2014;111:4067-4072
  63. Schneider P, Schneider G. De-orphaning the marine natural product (±)-marinopyrrole A by computational target prediction and biochemical validation. Chemical Communications. 2017;53:2272-2274
  64. Cockroft N, Cheng X, Fuchs J. Starfish: A Stacked Ensemble Target Fishing Approach and its Application to Natural Products. Chemical Communications. 2019;9:4906-4920
  65. Reker D, Perna A, Rodrigues T, Schneider P, Reutlinger M, Monch B, et al. Revealing the macromolecular targets of complex natural products. Nature Chemistry. 2014;6:1072-1078
  66. Schneider G, Reker D, Chen T, Hauenstein K, Schneider P, Altmann K, et al. De-orphaning the macromolecular targets of the natural anticancer compound doliculide. Angewandte Chemie International Edition. 2016;55:12408-12411
  67. Walters W, Stahl M, Murcko M. Virtual screening—An overview. Drug Discovery Today. 1998;3:160-178
  68. Huth J, Mendoza R, Olejniczak E, Johnson R, Cothron D, Liu Y, et al. ALARM NMR: A rapid and robust experimental method to detect reactive false positives in biochemical screens. Journal of the American Chemical Society. 2005;127:217-224
  69. Irwin J, Duan D, Torosyan H, Doak A, Ziebart K, Sterling T, et al. An aggregation advisor for ligand discovery. Journal of Medicinal Chemistry. 2015;58:7076-7087
  70. Yang J, Ursu O, Lipinski C, Sklar L, Oprea T, Bologa C. Badapple: Promiscuity patterns from noisy evidence. Journal of Cheminformatics. 2016;8:29
  71. Stork C, Chen Y, Sicho M, Kirchmair J. Hit Dexter 2.0: Machine-learning models for the prediction of frequent hitters. Journal of Chemical Information and Modeling. 2019;59:1030-1043
  72. Kirchmair J, Goller A, Lang D, Kunze J, Testa B, Wilson D, et al. Predicting drug metabolism: Experiment and/or computation? Nature Reviews Drug Discovery. 2015;14:387-404
  73. Sicho M, Stork C, Mazzolari A, de Bruyn KC, Pedretti A, Testa B, et al. FAME 3: Predicting the sites of metabolism in synthetic compounds and natural products for phase 1 and phase 2 metabolic enzymes. Journal of Chemical Information and Modeling. 2019;59:3400-3412
