Open access peer-reviewed chapter

Chemoinformatic Approach: The Case of Natural Products of Panama

Written By

Dionisio A. Olmedo and José L. Medina-Franco

Submitted: 28 May 2019 Reviewed: 07 June 2019 Published: 31 July 2019

DOI: 10.5772/intechopen.87779

From the Edited Volume

Cheminformatics and its Applications

Edited by Amalia Stefaniu, Azhar Rasul and Ghulam Hussain

Chapter metrics overview

1,306 Chapter Downloads

View Full Metrics

Abstract

Chemoinformatic analysis was used to characterize a compound database of natural products from Panama and other reference collections. Data mining allowed to compare drug-likeness properties with public and commercial software and to achieve a statistical analysis of the physicochemical properties. Visualization of the chemical space in 3D indicates a high structural similarity. Molecular flexibility and complexity were evaluated using 2D descriptors, whereas the molecular scaffold was obtained using the Murcko method, and these showed few differences between the explored data set. In this chapter, we also present and discuss an example of the application of the chemoinformatic approach using the concept of modeling the activity landscape to study the structure-activity relationships (SARs) of compounds with activity against Plasmodium falciparum.

Keywords

  • chemoinformatic
  • complexity
  • data mining
  • physicochemical properties
  • scaffold

1. Introduction

Natural products (NPs) and their derivatives constitute a significant fraction of approved drugs [1, 2, 3], bioactive compounds [4, 5, 6, 7, 8], and lead compounds for drug discovery [9]. NP fragment has been used to guide the synthesis of bioactive compounds and generate BIOS combinatorial libraries [10, 11, 12, 13, 14, 15]. NPs have structures with different substituent patterns, giving rise to different biological activities for compounds with very similar structures [16, 17, 18, 19]. These bioactive metabolites have greater affinity for biological targets and, overall, may have better bioavailability than synthetic compounds, and the presence of pan-assay interference compounds (PAIN) is less frequent in this type of product [20]. The chemoinformatic analysis of several databases of NPs developed by academic institutions and private companies [21] has been carried out in different countries. Thus, the following databases were obtained: BIOFACQUIM [22], CIFPMA [23], NuBBE [24, 25], NANPDB [26], TCM [27], HIT [28], and NPACT [29]. The application of chemoinformatic tools involves the generation, manipulation, and analysis of data set of chemical substances. This allows us through mathematical calculations to order, develop, and evaluate structural information that can be visualized in 2D and 3D [30]. The determination of the physicochemical properties carried out on different databases of NPs and principal component analysis (PCA) was used as an approximation to display the chemical spaces [22, 23, 24, 31, 32, 33, 34, 35, 36, 37].

Computational exploration of NPs has increased in recent years, giving greater relevance to studies that include structural diversity metrics calculated with parameters based on distances such as Euclidean distance, Manhattan distances, and Cosine distance. Other criteria are based on circular fingerprint (ECFP-4, ECFP-6) [22, 23, 24, 38, 39, 40, 41, 42, 43, 44, 45] and fingerprint based on substructure (MACCS, PubChem) [22, 23, 24, 39, 40, 41, 42, 43, 44, 45]. Another metric used in NPs is the comparison by similarity that uses the Tanimoto index/Tanimoto coefficient [22, 23, 24, 45, 46, 47, 48, 49].

In this study, the molecular scaffolds of natural products have been obtained using the Murcko method [22, 23, 24, 50, 51, 52, 53, 54, 55, 56, 57]. Meanwhile, the molecular complexity is frequently evaluated by descriptors in 2D such as fraction of sp3 hybridized carbons (Fsp3) [23], fraction of chiral centers (FCC) [23], and globularity [22, 23, 24, 58, 59, 60, 61, 62, 63].

An update of the Natural Products Database from the University of Panama (UPMA) containing 454 compounds (Unpublished data) has been evaluated against different therapeutic targets such as cytotoxicity bioassay in cell lines, antifungal assay in vitro, parasites of tropical diseases (Leishmania sp., Plasmodium falciparum, and Trypanosoma cruzi), and the bioassay against HIV-1 virus, demonstrating an inhibitor effect on protease, reverse transcriptase, nuclear factor NFkappaB, and Tat protein affecting the viral replication. These are the most significant biological targets in which the natural products from Panama present bioactivity. The values of their biological activities are represented as percentages in Figure 1.

Figure 1.

Biological endpoints and targets in which natural products from Panama present bioactivity.

Advertisement

2. Application of chemoinformatic antimalarial databases: case of natural products from Panama

2.1 Preparation curated and processing of data set

In this chapter, we present a chemoinformatic analysis of natural products with antimalarial activities (in vitro), expressed as pIC50 against sensitive and resistant strains. Databases of natural products with antimalarial activity (NPAs) were constructed in-house by reviewing published articles including those compounds that were isolated and characterized by spectroscopic techniques of nuclear magnetic resonance. Around 1312 compounds were compared to 8 reference data sets: an open database, DrugBank (antimalarial drug), European Bioinformatics Institute. (CHEMBL drug indications) (antimalarial activities), Open Source Drug Discovery (OSDD) Malaria, Malaria Box (Medicines for Malaria Venture (MMV)), St. Jude Children’s Research Hospital (St. Jude), Novartis (GNF Malaria Box), and GlaxoSmithKline (GSK) Tres Cantos antimalarial set. All data sets were curated using the “Wash” function implemented in the Molecular Operating Environment (MOE2018.0101) software [64]. The structure of the studied compounds was represented by simplified molecular input line entry system (SMILES) notation, thus obtaining 20,364 unique molecules that are summarized in Table 1. The difference between initial compounds and unique compounds is due to the fact that during the data preparation (curation process), the duplicate compounds are eliminated, those that have positive or negative partial loads have neutralized their protonation states, the metals are disconnected, and the energy is minimized using the molecular mechanistic force field (MMFF94). The result of the data curation is the reduction of the initial number of molecules present in the databases evaluated in this work.

DatabasesInitial compoundsUnique compoundsSource
Natural Products Antimalarial (NPAs)13531312Databases of NP in house
DrugBank Version 5.0. (Drug Antimalarial)264https://www.drugbank.ca
European Bioinformatics Institute. (CHEMBL Drugs Indications) (Antimalarial activities2724[https://www.ebi.ac.uk/chembl]
Open Source Drug Discovery (OSDD) Malaria9388http://opensourcemalaria.org/
Malaria Box-Medicine of Malaria Venture (MMV)124124https://www.ebi.ac.uk/chembl/malaria/source
St. Jude Children’s Research Hospital’s1.4781.478https://www.ebi.ac.uk/chemblntd
Novartis-GNF Malaria Box4.8784.868Available in: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3941073/
Available in: https://www.ebi.ac.uk/chemblntd
GlaxoSmithKline Tres Cantos Antimalarial12.47012.466Open Source Malaria (GSK-TCMDC). Available in: https://www.ebi.ac.uk/chemblntd

Table 1.

Databases analyzed with chemoinformatic tools.

2.2 Molecular descriptors

The descriptors of physicochemical properties, hydrogen bond acceptors (HBAs), hydrogen bond donors (HBDs), number of rotatable bonds (NRBs), the octanol/water partition coefficient (logP), topological polar surface area (TPSA), and molecular weight (MW), or others such as molar refractivity, are important physicochemical parameters for quantitative structure-activity relationship (QSAR) analysis. These molecular descriptors are based on Lipinski’s rule and Verger’s rule regarding the prediction of the pharmacological similarity of orally active pharmacological potential [65, 66, 67]. The statistical analysis of the physicochemical properties was realized with RStudio Software 1.0.136 AGPL [68].

2.3 3D visualization of chemical space of compounds with antimalarial activity

PCAs were done with MOE software [64], and the dominant characteristics are expressed as covariance and visualized with the corresponding 2D or 3D graphic score plot with DataWarrior program v. 5.0 [69]. Figures 28 showed the distribution of different compounds with antimalarial activities in the chemical spaces.

Figure 2.

3D visualization of the chemical space of natural product databases.

Figure 3.

3D visualization of the chemical space of synthetic compounds.

Figure 4.

3D visualization of the chemical spaces of natural products and GNF DBs.

Figure 5.

3D visualization of the chemical spaces of natural products and TCMDC DBs.

Figure 6.

3D visualization of the chemical spaces of natural products and DBK DBs.

Figure 7.

3D visualization of the chemical spaces of natural products, OSM and St. Jude.

Figure 8.

3D visualization of the chemical spaces of all databases.

In Figures 28 we observed that NPs, drugs, and synthetic compounds occupy, in general, similar chemical space and are overlapping in most of the evaluated databases.

2.4 Molecular diversity based on fingerprints

Three binary molecular fingerprints were calculated with RStudio package rcdk: Extended connectivity fingerprints with diameter 4 (ECFP-4) for similarity searching, molecular access system (MACCS) keys of 166 bits for determining similarity and molecular diversity, and PubChem keys of 881 bits for encoding molecular fragment information [42, 43, 44]. The similarity of fingerprints by structural pairs of compounds was calculated with the Tanimoto coefficient and analyzed with the cumulative distribution function (CDF). This approach has been used to calculate, measure, and represent the molecular variety of compound data sets [23].

Figures 911 show the CDFs of the pairwise similarity of the different data sets evaluated with Tanimoto coefficient and ECPF-4, MACCS keys, and PubChem fingerprints, respectively.

Figure 9.

Curve for cumulative frequency distribution (CFD) based on ECFP-4.

Figure 10.

Curve for cumulative frequency distribution based on MACCS keys.

Figure 11.

Curve for cumulative frequency distribution based on PubChem.

Figure 12.

Cyclic system retrieval curves for all databases evaluated in this study.

Figures 911 provide information on the structural diversity of the six databases. Similar approach has been previously published [23]; the curves obtained with ECFP-4 did not prove to be a suitable fingerprint representation for these data sets. In the three similarity graphs based on fingerprints, it is shown that the database of natural products with antimalarial activity, OMS, and MMV has the lowest molecular diversity, while GSK DB was the most diverse.

In Tables 24, the statistical values of the pairwise Tanimoto similarity with the data sets analyzed are shown. In these tables, CHEMBL and DrugBank databases are excluded from our analysis, due to the small amount of data.

Similarity ECFP-4/Tanimoto coefficient
DBsMin.1st Qu.MedianMean3rd Qu.Max.
GSK0.017240.057890.088440.114900.122450.82353
NPs0.000000.078260.099100.105650.123891.00000
OSM0.000000.078260.099170.106070.123971.00000
MMV0.000000.078260.099240.106150.124031.00000
ST JUDE0.000000.081970.103450.109800.128571.00000
GNF0.000000.082090.103450.107720.127391.00000

Table 2.

The statistical values of the similarity of the Tanimoto coefficient with ECFP-4.

Similarity MACCS keys/Tanimoto coefficient
DBsMin.1st Qu.MedianMean3rd Qu.Max.
GSK0.078130.256820.333330.370090.455810.92683
NPs0.000000.344260.436360.446730.545451.00000
OSM0.000000.344830.436360.446930.545451.00000
MMV0.000000.344830.436360.446770.544121.00000
ST JUDE0.000000.333330.412500.423130.500001.00000
GNF0.000000.317460.394370.399990.476191.00000

Table 3.

The statistical values of the similarity of the Tanimoto coefficient with MACCS keys.

Similarity PubChem/Tanimoto coefficient
DBsMin.1st Qu.MedianMean3rd Qu.Max.
GSK0.081250.245000.375550.402630.540021.00000
NPs0.036840.322980.438020.461840.586211.00000
OSM0.036840.323400.439020.462530.587301.00000
MMV0.036840.324440.440330.463210.587911.00000
ST JUDE0.036840.382240.471430.476240.561951.00000
GNF0.000000.405980.481170.478000.554461.00000

Table 4.

The statistical values of the similarity of the Tanimoto coefficient with PubChem.

2.5 Molecular scaffolds: content and diversity

2.5.1 Scaffold content

Murcko scaffolds were calculated with the program Molecular Equivalent Indices (MEQI) [50, 51] and DataWarrior program [69]. MEQI has been used to obtain the codes corresponding to the chemotypes most frequently analyzed in the databases. [23, 45, 52, 53, 54, 55]. The distribution and diversity of the molecular scaffolds present in the data sets were calculated and analyzed using the cyclic system retrieval (CSR) curves [42]. These curves were obtained by plotting the fraction of scaffold and the fraction of compounds that contain cyclic systems [43, 44].

Table 5 indicates that the MMV DB (0.491) was the most diverse in scaffold content taken as reference the F50 values compared to the data set from GSK (0.183), NPs (0.168), and GNF (0.161), respectively. CSR curves on Figure 12 further confirm the relative scaffold variety of the eight databases. The analysis of area under curve (AUC) metrics associated with the CSR curves is reported in Table 5. The CSR curves showed that MMV has more variety in scaffold content with AUC value of 0.507. In contrast OSM, NPs, GNF, GSK, St. Jude, and CHEMBL were the least diverse (e.g., AUC scores of 0.745, 0.712, 0.705, 0.698, 0.655 and 0.607, respectively). The CSR curves provide information on the diversity of the most frequent scaffolds in all databases.

DBsNumber of Compounds (M)Unique chemotypes (N)FN/MNSINGFNSING/MFNSING/NSAUCF50
NPs12986290.48464000.30820.63590.71250.1685
DBK551.000051.00001.00000.48000.4000
CHEMBL24180.7500160.66670.88890.60720.3333
OSM89390.4382270.30340.69230.74530.1025
MMV1241220.98391200.96770.98360.50790.4918
St. JUDE9154790.52353250.35520.67850.65510.2474
GNF486032290.664426900.55350.83310.70540.1615
GSK12,46367030.537850090.40190.74730.69820.1837

Table 5.

Summary of the scaffold diversity of the eight databases analyzed in this work.

M = number of molecules in the BD, N = number of chemotypes or substructures, FN/M = chemotype diversity fraction, NSING = singleton number, FNSING/M = singleton fraction between total molecules, FNSING/N = fraction of singleton among total chemotypes, AUC = area under the curve, F50 = fraction of chemotype required to recover 50% of the molecules.

2.5.2 Shannon entropy (SE) and scaled Shannon entropy (SSE)

The Shannon entropy has been adapted to measure the scaffold diversity based on the (N) number of most recurrent scaffolds [70]. The scaled Shannon entropy is a normalized value that measures the most common chemotypes present in a database. Thus, SSE closer to 1 indicates higher scaffold diversity, while SSE closer to zero (0) indicates lower diversity. In this study, we calculated the SSE for values ranging from N = 10 to N = 40.

Figure 13 shows a histogram with the distribution of the 40 most populated scaffolds in NPAs. The histogram includes the corresponding chemotype code. The comparison of the scaffolds of the NPAs allowed the identification of the 68MBD chemotype as one of the most active compounds in this database.

Figure 13.

Scaled Shannon entropy of the most frequent scaffolds with values ranging from 10 to 40 in natural products.

2.5.3 Molecular complexity and flexibility

The structural descriptors used to quantify fraction of sp3 hybridized carbons (Fsp3) [23, 58, 63, 70], fraction of chiral centers (CCF) [23, 59, 63, 70], fraction of aromatic atoms (Faro-atm), globularity [60], principal moments of inertia (PMI), normalized principal moments of inertia ratio (NRP) [61, 62], molecular complexity, shape index of Kier, and molecular flexibility were calculated with DataWarrior program [69] and MOE 2018.0101 [64]. Figures 1419 showed the descriptors utilized to evaluate the complexity and the molecular flexibility.

Figure 14.

Distribution of the fraction of sp3 hybridized carbons in different databases.

Figure 15.

Distribution of the fraction of chiral centers in different databases.

Figure 16.

Distribution of the fraction of aromatic atoms (Faro-atm) in different databases.

Figure 17.

Shape index distribution of different databases.

Figure 18.

Distribution of the molecular flexibility in different databases.

Figure 19.

Distribution of the molecular complexity in different databases.

Tables 68 summarize the statistics of the distribution of Fsp3, FCC, and Faro-atm of NPs and reference data sets. These results indicate that the NP data set has the largest complexity molecular in Fsp3 (0.63) and CCF (0.16) and a low distribution of Faro-atm (0.67–0.78). In contrast, GNF, MMV, St. Jude, and GSK DBs are very similar in these three metrics with values between 0.25 and 0.37, 0.27 and 0.37, and 0.014 and 0.025, respectively. In contrast, the structural flexibility was evaluated with the index of form presenting all databases in the range of 0.41–0.58 indicating that many of the compounds present sphericity and intermediate molecular flexibility (data not presented).

Fraction of sp3 hybridized atoms (Fsp3)
DBsMin1qstmedianmean3qrtmaxdev.st
NPs0.0000.4810.6360.6560.8332.0000.254
CHEMBL0.1670.3420.5360.6210.6271.3330.374
MMV0.0000.1670.3000.3160.4020.8000.190
OSM0.0000.1740.2550.2770.3380.8930.145
DBK0.2500.4380.5190.4630.5450.5650.175
GNF0.0000.2270.3640.3770.5002.6670.207
STJUDE0.0000.2220.3330.3530.4711.1360.178
GSK0.0000.2500.3750.3720.5001.5000.180

Table 6.

Distribution of Fsp3 in different databases.

Fraction of chiral centers (CCF)
DBsmin1qstmedianmean3qrtmaxdev.st
NPs0.0000.0330.1390.1610.2670.6560.145
CHEMBL0.0000.0000.0360.1280.1410.5330.192
MMV0.0000.0000.0000.0140.0000.1110.028
OSM0.0000.0000.0000.0080.0000.2860.035
DBK0.0000.0000.0190.0200.0400.0430.024
GNF0.0000.0000.0000.0250.0400.5560.053
STJUDE0.0000.0000.0000.0240.0450.2170.037
GSK0.0000.0000.0000.0170.0340.5000.033

Table 7.

Distribution of FCC in different databases.

Fraction of aromatic atoms (Faro-atm)
DBsmin1qstmedianmean3qrtmaxdev.st
NPs0.0000.0000.3240.3410.6001.1330.294
CHEMBL0.0000.2990.5560.5090.6901.0910.321
MMV0.2610.6820.8260.8170.9561.4290.230
OSM0.0000.6770.7330.7860.8601.5000.232
DBK0.5380.5910.7330.7200.8620.8750.171
GNF0.0000.5220.6670.6700.8181.7140.235
STJUDE0.0000.5530.7120.7080.8571.5560.216
GSK0.0000.5710.7060.7130.8571.4000.208

Table 8.

Distribution of fraction of aromatic atoms.

The descriptors globularity, PMI, and NRP did not prove to be suitable metrics to measure and differentiate the molecular complexity in the data sets evaluated. This is because the corresponding values computed for all data sets were very low (close to zero) and did not differentiate the data sets (data not shown). The large molecular complexity of NPs measured is in agreement with previous studies using similar metrics [23, 63, 71].

Advertisement

3. Activity landscape modeling

The methods of modeling the landscape based on properties of the compounds (property landscape modeling (PLM)) is at the interface between experimental sciences and computational chemistry, being a frequent strategy to systematically describe the structure-property relationships (SPR) of the compound data set [72]. PLM have been used in medicinal chemistry in the stages of drug discovery with a quantitative, descriptive, and statistical approach to activity cliffs [72, 73, 74]. Structure-activity relationships (SARs), using the concept of modeling the activity landscape (activity landscape modeling ALM), are an increasing common practice in the drug discovery process to identify the activity cliffs, guide the optimization of compound hits, and to avoid the deleterious effects of the activity cliffs in the studies of the classic models of QSAR and in the search of structural similarity. In this research we analyze, through the web tool Activity Landscape Plotter (ALP) [72], a set of data from NPs from Panama with antimalarial activity against four strains of Plasmodium falciparum in the erythrocyte gametocyte stage (Figures 20 and 24).

Figure 20.

Structural similarity compared with activity cliffs in NPAs.

Figure 21.

Structural similarity compared with activity cliffs in GSK and Novartis (GNF).

Figure 22.

SAS maps of compounds with antimalarial activity ((a), (b), and (c)) through the web tool activity landscape plotter.

Figure 23.

DAS map with MACCS key fingerprint.

Figure 24.

Antimalarial compounds in NPs from Panama.

The generation and comparison of structure-activity pairs, by structure-activity similarity maps (SAS map). The SAS map has been used to link up structure and biological activity, based on a systematic pairwise comparison of all the compounds in a data set analyzed. We compare the values of structure-activity similarity, the activity difference, and structure-activity landscape index (SALI) to find the pairs of compounds with high molecular similarity and the activity difference that are located in the upper right quadrant of the SAS map (activity cliffs) [72, 73, 74, 75, 76]. Figures 1721 show SAS map in NP of Panama, NP published, GSK, and GNF. In SAS maps, data points are colored by density (Figure 22).

The SAS maps using the molecular fingerprints EFCP-4, MACCS keys, and PubChem led to the identification of a total of 26 pairs of compounds with structure-activity similarity ratios >0.50 and structure-activity landscape index values varying between 0.3 and 5.0. The web application Activity Landscape Plotter [72] is a tool that allows us to perform QSAR. The SAS generated represent 55 natural products isolated in Panama with antimalarial activity which were analyzed and compared the biological activities against strains of Plasmodium falciparum sensitive, resistant and multiresistant. The analysis with the parameters the (SAS / Tanimoto index / ECFP-4), a total of twenty-six pairs of compounds showed similarity values greater than 70%, sixteen pairs greater than 80% and only two pairs of compounds gave a similarity greater than 85%. While with activity cliffs, only three pairs of compounds show structural similarity correlated with the values of pIC50 activity [72, 77].

SAS maps are color-coded according to their intensity and we observe that most pairs of compounds with antimalarial activity show an intense red color. Analyzed are located in the region of little structural similarity, indicating that the natural products have high structural diversity and low difference in activity, attributed to having similar functional groups in their molecules.

DAS maps represent the pairwise activity differences for each possible pair of compounds in an evaluated data set, against two biological targets. These maps permitted to differentiate if a structural modification can increase or decrease the activity under one target or other (Figure 23).

With this web application, we have carried out a QSAR study in a fast, simple, and easily interpretable way, obtaining three natural products as leading computational compounds for their optimization as Plasmodium falciparum blockers, which exhibit a gametocidal activity [78] (Figure 24).

Advertisement

4. Conclusion

The chemoinformatic analysis of the 20,364 compounds (1312 NPs and 19,052 synthetic (MMV, OSM, GNF, St. Jude, GSK, CHEMBL, and DrugBank)) indicates that so many natural products and synthetic products (S) share the same chemical space showing molecules that have similar structural properties. NPs present a greater diversity based on fingerprint than the synthetic compounds. Also, NPs have a higher proportion of chiral carbons and atoms with sp3 hybridization and greater complexity, while synthetic products contain a greater proportion of aromatic atoms. Finally, concerning the properties related to cyclicity, relative shape, and flexibility, all have very similar values, which could explain the antimalarial activity of computationally determined compound hits in this work against Plasmodium falciparum-sensitive (3D7, D6, poW, D10) and chloroquine-resistant strains (W2, Dd).

Advertisement

Acknowledgments

The DAO acknowledges the SNI 2018 awards from SENACYT of Panama.

Advertisement

Conflict of interest

The authors declare that there are no financial or commercial conflicts of interest.

References

  1. 1. Newman DJ, Cragg GM. Natural products as sources of new drugs from 1981 to 2014. Journal of Natural Products. 2016;9:629-661. DOI: 10.1021/acs.jnatprod.5b01055
  2. 2. Newman DJ, Cragg GM. Natural products as sources of new drugs over the 30 years from 1981 to 2010. Journal of Natural Products. 2016;75:311-335. DOI: 10.1021/np200906s
  3. 3. Newman DJ, Cragg GM. Natural products as sources of new drugs over the last 25 years. Journal of Natural Products. 2007;70:461-477. DOI: 10.1021/np068054v
  4. 4. Thomford NE, Senthebane DA, Rowe A, Munro D, Seele P, Maroyi A, et al. Natural products for drug discovery in the 21st century: Innovations for novel drug discovery. International Journal of Molecular Sciences. 2018;19:1578. DOI: 10.3390/ijms19061578
  5. 5. Gurnani N, Mehta D, Gupta M, Mehta BK. Natural products: Source of potential drugs. African Journal of Basic & Applied Sciences. 2014;6:171-186. DOI: 10.5829/idosi.ajbas.2014.6.6.21983
  6. 6. Hong J. Role of natural product diversity in chemical biology. Current Opinion in Chemical Biology. 2011;15:350-354. DOI: 10.1016/j.cbpa.2011.03.004
  7. 7. Schreiber SL. Organic chemistry: Molecular diversity by design. Nature. 2009;457:153-154. DOI: 10.1038/457153a
  8. 8. Schneider G, Grabowski K. Properties and architecture of drugs and natural products revisited. Current Chemical Biology. 2007;1:115-127. DOI: 10.2174/2212796810701010115
  9. 9. Cragg GM, Newman DJ. Natural products: A continuing source of novel drug leads. Biochimica et Biophysica Acta. 2013;1830:3670-3695. DOI: 10.1016/j.bbagen.2013.02.008
  10. 10. Sen S, Prabhu G, Bathula C, Hati S. Diversity-oriented asymmetric synthesis. Synthesis. 2014;46:2099-2121. DOI: 10.1055/s-0033-1341247
  11. 11. van Hattum H, Waldmann H. Biology-oriented synthesis: Harnessing the power of evolution. Journal of the American Chemical Society. 2014;136:11853-11859. DOI: 10.1021/ja505861d
  12. 12. Welsch ME, Snyder SA, Stockwell BR. Privileged scaffolds for library design and drug discovery. Current Opinion in Chemical Biology. 2010;14:347-361. DOI: 10.1016/j.cbpa
  13. 13. Wetzel S, Bon RS, Kumar K, Waldmann H. Biology-oriented synthesis. Angewandte Chemie (International Ed. in English). 2011;50:10800-10826. DOI: 10.1002/anie.201007004
  14. 14. Ertl P, Roggo R, Schuffenhauer A, Natural Product-likeness A. Score and its application for prioritization of compound libraries. Journal of Chemical Information and Modeling. 2008;48:68-74. DOI: 10.1021/ci700286x
  15. 15. Mang C, Jakupovic S, Schunk S, Ambrosi H-D, Schwarz O, Jakupovic J. Natural products in combinatorial chemistry: An andrographolide-based library. Journal of Combinatorial Chemistry. 2006;8(2):268-274. DOI: 10.1021/cc050143n
  16. 16. Wach JY, Gademann K. Reduce to the maximum: Truncated natural products as powerful modulators of biological processes. Synlett. 2012;23:163-170. DOI: 10.1055/s-0031-1290125
  17. 17. Feher M, Schmidt JM. Property distribution: Differences between drugs, natural products, and molecule from combinatorial chemistry. Journal of Chemical Information and Modeling. 2003;43:218-227. DOI: 10.1021/ci0200467
  18. 18. Hert J, Irwin JJ, Laggner C, Keiser MJ, Shoichet BK. Quantifying biogenic bias in screening libraries. Nature Chemical Biology. 2009;5:479-483. DOI: 10.1038/nchembio.180
  19. 19. Schenone M, Dancik V, Wagner BK, Clemons PA. Target identification and mechanism of action in chemical biology and drug discovery. Nature Chemical Biology. 2013;9:232-240. DOI: 10.1038/nchembio.1199
  20. 20. Baell JB. Feeling nature’s PAINS: Natural products, natural product drugs, and pan assay interference compounds (PAINS). Journal of Natural Products. 2016;79(3):616-628. DOI: 10.1021/acs.jnatprod.5b00947
  21. 21. Egieyeh S, Syce J, Christoffels A, Malan SF. Exploration of scaffolds from natural products with antiplasmodial activities, currently registered antimalarial drugs and public malarial screen data. Molecules. 2016;21:104. DOI: 10.3390/molecules21010104
  22. 22. Pilón-Jiménez B, Saldivar-Gonzalez F, Díaz-Eufracio BI, Medina-Franco JL. BIOFACQUIM: A mexican compound database of natural products. Biomolecules. 2019;9:31. DOI: 10.3390/biom9010031
  23. 23. Olmedo DA, González-Medina M, Gupta MP, Medina-Franco JL. Cheminformatic characterization of natural products from Panama. Molecular Diversity. 2017;21(4):779-789. DOI: 10.1007/s11030-017-9781-4
  24. 24. Pilon AC, Valli M, Dametto AC, MEF P, Freire RT, Castro-Gamboa I, et al. NuBBEDB: An updated database to uncover chemical and biological information from Brazilian biodiversity. Scientific Reports. 2017;7:7215-1-7215-12. DOI: 10.1038/s41598-017-07451-x
  25. 25. Valli M, dos Santos RN, Figueira LD, Nakajima CH, Castro-Gamboa I, Andricopulo AD, et al. Development of a natural products database from the biodiversity of Brazil. Journal of Natural Products. 2013;76(3):439-444. DOI: 10.1021/np3006875
  26. 26. Ntie-Kang F, Telukunta KK, Döring K, Simoben CV, Moumbock AF, Malange YI, et al. NANPDB: A resource for natural products from northern african sources. Journal of Natural Products. 2017;80(7):2067-2076. DOI: 10.1021/acs.jnatprod.7b00283
  27. 27. Chen CY-C. TCM database@Taiwan: The world's largest traditional chinese medicine database for drug screening in silico. PLoS ONE. 2011;6(1):e15939. DOI: 10.1371/journal.pone.0015939
  28. 28. Ye H, Ye L, Kang H, Zhang D, Tao L, Tang K, et al. HIT: Linking herbal active ingredients to targets. Nucleic Acids Research. 2011;39:D1055-D1059. DOI: 10.1093/nar/gkq1165
  29. 29. Mangal M, Sagar P, Singh H, Raghava GPS, Agarwal SM. NPACT: Naturally occurring plant-based anti-cancer compound-activity-target database. Nucleic Acids Research. 2013;41:D1124-D1129. DOI: 10.1093/nar/gks1047
  30. 30. Bhalerao SA, Verna DR, D’Souza LR, Teli NC, Didwana VS. Chemoinformatics: The application of informatic methods to solve chemical problem. Research Journal of Pharmaceutical, Biological and Chemical Sciences. 2007;4(3):475-499
  31. 31. Wenderski TA, Stratton CF, Bauer RA, Kopp F, Tan DS. Principal component analysis as a tool for library design: A case study investigating natural products, brand-name drugs, natural product-like libraries, and drug-like libraries. Methods in Molecular Biology. 2015;1263:225-242. DOI: 10.1007/978-1-4939-2269-7_18
  32. 32. Medina-Franco JL, Mayorga-Martínez K, Giulianotti MA, Houghten RA, Pinilla C. Visualization of the chemical space in drug discovery. Current Computer-Aided Drug Design. 2008;4:322-333. DOI: 10.2174/157340908786786010
  33. 33. Osolodkin DI, Radchenko EV, Orlov AA, Voronkov AE, Palyulin VA, Zefirov NS. Progress in visual representations of chemical space. Expert Opinion on Drug Discovery. 2015;10(9):959-973. DOI: 10.1517/17460441
  34. 34. Ringner M. What is principal component analysis? Nature Biotechnology. 2018;26:303-304. DOI: 10.1038/nbt0308-303
  35. 35. Jolliffe I. Principal component analysis. In: Everitt BS, Howell DC, editors. Encyclopedia of Statistics in Behavioral Science. Vol. 3. Aberdeen, Chichester, UK: University of Aberdeen, John Wiley and Sons, Ltd; 2005. pp. 1580-1584. DOI: 10.1002/0470013192.bsa501
  36. 36. Clemons PA, Wilson JA, Dančík V, Muller S, Carrinski HA, Wagner BK, et al. Quantifying structure and performance diversity for sets of small molecules comprising small-molecule screening collections. Proceedings of the National Academy of Sciences of the United States of America. 2011;108:6817-6822. DOI: 10.1073/pnas.1015024108
  37. 37. Djuric SW, Akritopoulou-Zanze I, Cox PB, Galasinski S. Compound collection enhancement and paradigms for high-throughput screening-an update. Annual Reports in Medicinal Chemistry. 2010;45:409-428
  38. 38. Rogers D, Hahn M. Extended-connectivity fingerprints. Journal of Chemical Information and Modeling. 2010;50:742-754. DOI: 10.1021/ci100050t
  39. 39. Medina-Franco JL, Martínez-Mayorga K, Peppard TL, Del Rio A. Chemoinformatic analysis of GRAS (generally recognized as safe) flavor chemicals and natural products. PLoS One. 2012;7(11):e50798. DOI: 10.1371/journal.pone.0050798
  40. 40. Durant JL, Leland BA, Henry DR, Nourse JG. Reoptimization of MDL keys for use in drug discovery. Journal of Chemical Information and Computer Sciences. 2002;42(6):1273-1280
  41. 41. Bolton EE, Wang Y, Thiessen PA, Bryant SH. PubChem: Integrated platform of small molecules and biological activities. Annual Reports in Computational Chemistry. 2008;31(4):217-241
  42. 42. Rogers D, Brown R, Hahn M. Using extended-connectivity fingerprints with Laplacian-modified Bayesian analysis in high-throughput screening. Journal of Biomolecular Screening. 2005;10:682-686. DOI: 10.1177/1087057105281365
  43. 43. Hert J, Willet P, Wilton DJ, Acklin P, Azzaoui K, Jacoby E, et al. Comparison of topological descriptors for similarity-based virtual screening using multiple bioactive reference structures. Organic and Biomolecular Chemistry. 2004;2:3256-3266. DOI: 10.1039/B409865J
  44. 44. Barnard JM, Downs GM. Chemical fragment generation and clustering software. Journal of Chemical Information and Computer Sciences. 1997;37:141-142. DOI: 10.1021/ci960090k
  45. 45. González-Medina M, Prieto-Martínez FD, Owen JR, Medina-Franco JL. Consensus diversity plots: A global diversity analysis of chemical libraries. Journal of Cheminformatics. 2016;8:63. DOI: 10.1186/s13321-016-0176-9
  46. 46. Bajusz D, Rácz A, Károly Héberger K. Why is Tanimoto index an appropriate choice for fingerprint-based similarity calculations? Journal of Cheminformatics. 2015;7:20. DOI: 10.1186/s13321-015-0069-3
  47. 47. Owen JR, Nabney IT, Medina-Franco JL, López-Vallejo F. Visualization of molecular fingerprints. Journal of Chemical Information and Modeling. 2011;51:1552-1563. DOI: d10.1021/ci1004042
  48. 48. Skinnider MA, Dejong CA, Franczak BC, McNicholas PD, Nathan A, Magarvey NA. Comparative analysis of chemical similarity methods for modular natural products with a hypothetical structure enumeration algorithm. Journal of Cheminformatics. 2017;9:46. DOI: 10.1186/s13321-017-0234-y
  49. 49. Medina-Franco JL. Discovery and development of lead compounds from natural sources using computational approaches. In: Mukherjee P, editor. Evidence-Based Validation of Herbal Medicine. Amsterdam, The Netherlands: Elsevier; 2015. pp. 455-475
  50. 50. Xu Y-J, Johnson M. Using molecular equivalence numbers to visually explore structural features that distinguish chemical libraries. Journal of Chemical Information and Computer Sciences. 2002;42:912-926. DOI: 10.1021/ci025535l
  51. 51. Xu Y-J, Johnson M. Algorithm for naming molecular equivalence classes represented by labeled pseudographs. Journal of Chemical Information and Computer Sciences. 2001;41:181-185. DOI: 10.1021/ci0003911
  52. 52. Lopez-Vallejo F, Castillo R, Yepez-Mulia L, Medina-Franco JL. Benzotriazoles and indazoles are scaffolds with biological activity against Entamoeba histolytica. Journal of Biomolecular Screening. 2011;16:862-868. DOI: 10.1177/1087057111414902
  53. 53. Hu Y, Bajorath J. Quantifying the tendency of therapeutic target proteins to bind promiscuous or selective compounds. PLoS ONE. 2015;10:e0126838. DOI: 10.1371/journal.pone.0126838
  54. 54. Lipkus AH, Yuan Q , Lucas KA, Funk SA, Bartelt WF 3rd, Schenck RJ, et al. Structural diversity of organic chemistry. A scaffold analysis of the CAS registry. The Journal of Organic Chemistry. 2008;73:4443-4451. DOI: 10.1021/jo8001276
  55. 55. Krier M, Bret G, Rognan D. Assessing the scaffold diversity of screening libraries. Journal of Chemical Information and Modeling. 2006;46:512-524. DOI: 10.1021/ci050352v
  56. 56. Medina-Franco JL, Martínez-Mayorga K, Bender A, Scior T. Scaffold diversity analysis of compound data sets using an entropy-based measure. QSAR and Combinatorial Science. 2009;28:1551-1560. DOI: 10.1002/qsar.200960069
  57. 57. Yongye AB, Waddell J, Medina-Franco JL. Molecular scaffold analysis of natural products databases in the public domain. Chemical Biology & Drug Design. 2012;80:717-724. DOI: 10.1111/cbdd.12011
  58. 58. Todeschini R, Consonni V. Molecular Descriptors for Chemoinformatics. Germany: Wiley-VCH; 2009
  59. 59. Clemons PA, Bodycombe NE, Carrinski HA, Wilson JA, Shamji AF, Wagner BK, et al. Small molecules of different origins have distinct distributions of structural complexity that correlate with protein-binding profiles. Proceedings of the National Academy of Sciences of the United States of America. 2010;107:18787-18792. DOI: 10.1073/pnas.1012741107
  60. 60. Meyer AY. Molecular mechanics and molecular shape. III. Surface area and cross-sectional areas of organic molecules. Journal of Computational Chemistry. 1986;7:144-152. DOI: 10.1002/jcc.540070207
  61. 61. Sauer WHB, Schwarz MK. Molecular shape diversity of combinatorial libraries: A prerequisite for broad bioactivity. Journal of Chemical Information and Computer Sciences. 2003;43:987-1003. DOI: 10.1021/ci025599w
  62. 62. Rolfe A, Lushington GH, Hanson PR. Reagent based DOS: A ‘click, click, cyclize strategy to probe chemical space. Organic & Biomolecular Chemistry. 2010;8:2198-2203. DOI: 10.1039/b927161a
  63. 63. Méndez-Lucio O, Medina-Franco JL. The many roles of molecular complexity in drug discovery. Drug Discovery Today. 2017;22:120-126. DOI: 10.1016/j.drudis.2016.08.009
  64. 64. Molecular Operating Environment (MOE). 2018.0101. 910-1010 Sherbrooke, St. W. Montreal, QC H3A 2R7; Canada: Chemical Computing Group, Corporate Headquarters Montreal
  65. 65. Lipinski CA, Lombardo F, Dominy BW, Feeney PJ. Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Advanced Drug Delivery Reviews. 2001;46(1-3):3-26. DOI: 10.1016/S0169-409X(00)00129-0
  66. 66. Veber DF, Johnson SR, Cheng HY, Smith BR, Ward KW, Kopple KD. Molecular properties that influence the oral bioavailability of drug candidates. Journal of Medicinal Chemistry. 2002;45(12):2615-2623. DOI: 10.1021/jm020017n
  67. 67. Leeson PD, Springthorpe B. The influence of drug-like concepts on decision-making in medicinal chemistry. Nature Reviews. Drug Discovery. 2007;6(11):881-890. DOI: 10.1038/nrd2445
  68. 68. RStudio Team. RStudio: Integrated Development for R. Boston: RStudio, Inc.; 2015. Available form: http://www.rstudio.com/
  69. 69. Sander T, Freyss J, Von Korff M, Rufener C. Datawarrior: An open-source program for chemistry aware data visualization and analysis. Journal of Chemical Information and Modeling. 2015;55:460-473. DOI: 10.1021/ci500588j
  70. 70. Godden JW, Bajorath J. Analysis of chemical information content using Shannon entropy. In: Lipkowitz KB, Cundari TR., editors. Reviews in Computational Chemistry. Hoboken: John Wiley & Sons, Inc.; 2007. pp. 263-289. DOI: 10.1002/9780470116449.ch5
  71. 71. Lovering F, Bikker J, Humblet C. Escape from flatland: Increasing saturation as an approach to improving clinical success. Journal of Medicinal Chemistry. 2009;52:6752-6756. DOI: 10.1021/jm901241
  72. 72. González-Medina M, Prieto-Martínez FD, Naveja J, Méndez-Lucio O, El-Elimat T, Pearce CJ, et al. Chemoinformatic expedition of the chemical space of fungal products. Future Medicinal Chemistry. 2016;8:1399-1412. DOI: 10.4155/fmc-2016-0079
  73. 73. González-Medina M, Méndez-Lucio O, Medina-Franco JL. Activity landscape plotter: A web-based application for the analysis of structure−activity relationships. Journal of Chemical Information and Modeling. 2017;57:397-402
  74. 74. Cruz-Monteagudo M, Medina-Franco JL, Pérez-Castillo Y, Nicolotti O, Cordeiro MNDS, Borges F. Activity cliffs in drug discovery: Dr Jekyll or Mr Hyde? Drug Discovery Today. 2014;19(8):1069-1080
  75. 75. Bajorath J. Modeling of activity landscapes for drug discovery. Expert Opinion on Drug Discovery. 2012;7(6):463-473
  76. 76. Garcia-Sanchez MO, Cruz-Monteagudo M, Medina-Franco JL. Challenges and advances in computational chemistry on physics quantitative structure-epigenetic activity relationship. In: Lezcznski J, Roy K, editors. Advances in QSAR Modeling: Application in Pharmaceutical, Chemical, Foods, Agricultural and Environmental Science, 24. Gewerbestrasse 11, 6330 Cham, Switzerland: Springer Nature. Springer International Publishing AG; 2017. pp. 303-338
  77. 77. Medina-Franco JL. Scanning structure–activity relationships with structure–activity similarity and related maps: From consensus activity cliffs to selectivity switches. Journal of Chemical Information and Modeling. 2012;52(10):2485-2493. DOI: 10.1021/ci300362x
  78. 78. Kiszewski AE. Blocking plasmodium falciparum malaria transmission with drugs: The gametocytocidal and sporontocidal properties of current and prospective antimalarials. Pharmaceuticals. 2011;4(1):44-68. DOI: 10.3390/ph4010044

Written By

Dionisio A. Olmedo and José L. Medina-Franco

Submitted: 28 May 2019 Reviewed: 07 June 2019 Published: 31 July 2019