Summary of some small molecule libraries currently available for anticancer drug discovery.
Computer modeling of natural products (NPs) and NP scaffolds is increasingly gaining importance in drug discovery, particularly in hit/lead discovery programs and at the lead optimization stage. Even though industry had lost interest in the implication of NPs in hit/lead searches, recent reports still show that computer modeling could be a useful assert for the identification of starting scaffolds from nature, which could be further exploited by synthetic modifications. In this chapter, the focus is on some useful tools for computer modeling aimed at the discovery of anticancer drugs from NP scaffolds. We also focus on some recent developments toward the identification of potential anticancer agents by the application of computer modeling. The chapter will lay emphasis on natural sources of anticancer compounds, present some useful databases and computational tools for anticancer drug discovery, and show some recent case studies of the application of computational modeling in anticancer drug discovery, as well as some success stories in virtual screening applications in anticancer drug discovery, highlighting some useful results on the application of on lead discovery (including promising NP scaffolds) against an interesting anticancer drug target, the protein kinase C-related kinase (PRK1).
- molecular modeling
- natural products
- virtual screening
1.1. Cancer and natural products
Cancer is one of the most feared causes of death, as it represents several disease forms and treatments possibilities are still limited for late stages of the disease . Among the known drugs for cancer treatment, camptothecin, vinblastine, vincristine, podophyllotoxin, and taxol are of natural origin . Nature is known to be an immense repository of natural products (NPs), constituting the source of about half of the anticancer drugs currently in the market . Inspite of the drop in interest for NPs in drug discovery projects, from an industrial point of view, recent reports still show that NPs could constitute a useful assert for the identification of starting scaffolds for further discovery [4–6]. The quest for anticancer drugs of natural origin or with NP scaffolds has resulted in the development of NP databases, the most promising one being the naturally occurring plant-based anti-cancer compound activity-target database (NPACT), with ~1500 NPs, including experimentally verified in vitro and in vivo biological activities (in the form of IC50s, ED50s, EC50s, GI50s, etc.), along with physical, elemental, and topological properties of the compounds, the tested cancer types, cell lines, protein targets, commercial suppliers, and drug-likeness classification for each compound .
1.2. Prostate cancer
Among the many diverse cancer forms, prostate cancer is the second leading cause of cancer deaths in men worldwide . Two different types of prostate cancer were identified: androgen-dependent prostate cancer and androgen-independent prostate cancer . Androgen hormones (testosterone or dihydrotestosterone) are known to activate the androgen receptors located in the cell nucleus. The main function of these receptors is to modify gene expressions, thus controlling several biological activities in the cells, including cell growth and differentiation, development, and function of male reproductive and accessory sex tissues [10, 11]. It was found that the androgen receptors signaling pathway plays an important role in the progress and development of prostate cancer. In the first stage of androgen-dependent prostate cancer, the survival and growth of the cancer cells are mainly dependent on androgen hormones . The initial treatment of androgen-dependent cancer, based on androgen ablation, is called hormone therapy. This procedure aims to stop the cell growth of cancer cells, which, in most cases, respond to this therapy. The recurring prostate cancer cells from hormone therapy would further not respond to androgen ablation, thus leading to the development of androgen-independent prostate cancer, which can further progress to metastasis . The molecular mechanism of tumor recurrence is not completely clear. For the second stage of prostate cancer, there is no efficient therapy available.
1.3. Protein kinase C–related kinase
Protein kinase C–related kinase (PRK1, also known as PKN1) is a serine/threonine kinase known to play a role in controlling the activity of androgen receptors in prostate cancer. It was shown that the activation of PRK1 stimulates the activity of androgen receptors and is involved in tumorigenesis . In 2008, Schüle et al. showed that PRK1 phosphorylates histone H3 upon ligand-dependent recruitment to androgen receptor target genes . The phosphorylation of histone H3 at threonine 11 (H3T11) increases demethylation of Lys-9 by Jumonji C (JmjC)-domain-containing protein (JMJD2C), which promotes androgen receptor-dependent gene expression and tumor cell proliferation . Additionally, PRK1 may directly phosphorylate JMJD2C, thereby stimulating its activity . Meanwhile, the role of PRK1 in androgen-independent prostate cancer is unknown. PRK1 can be activated by the Rho family GTPases, thus mediating several processes related to the migration and cancer cell invasion and consequently playing a major role in the formation of metastases [15, 16]. Thus, PRK1 is considered to be a promising therapeutic target, and the discovery of novel potent and selective inhibitors could supply a meaningful tool for the treatment of prostate cancer. On the other hand, the discovery of selective and potent inhibitors would provide a tool for understanding particular biological roles of PRK1. In spite of the importance of PRK1 in the targeted therapy of cancer, only a few known inhibitors have been identified. The known PKC inhibitors (Figures 1 and 2) include staurosporine (PubChem CID: 44259) and its analogue (Ro-318220; PubChem CID: 5083), bisindolylmaleimide I (BIM I; PubChem CID: 2396), and lestaurtinib (PubChem CID: 126565), as well as the nonselective Akt inhibitor GSK-690693 (PubChem CID: 16725726) and Pfizer’s JAK nonselective inhibitor CP-690550 (tofacitinib; PubChem CID: 9926791) [17, 18]. In a previous work, several novel PRK1 inhibitors were identified containing different scaffolds, using a homology model, with varying potencies . In the present work, the focus is to search for new small molecules and natural products, which could inhibit PRK1 by using the recently published crystal structures.
1.4. Structural analysis of PRK1
PRK1 kinase belongs to the protein kinase C (PKC) superfamily and was first identified in 1994 from a human hippocampal cDNA library . Three isoforms are found in mammals (called PRK1, PRK2, and PRK3). They possess different enzymatic properties and are distributed among different tissues . The PRK1 structure is divided into three conserved regions:
an N-terminal lobe (which includes a regulatory region containing three homologous stretches and is rich in charged amino acids),
an auto-inhibitory domain called C2-like region (which is sensitive to arachidonic acids), and
a C-terminal lobe (which contains the catalytic domains or called kinase domain).
Both lobes are connected by a hinge region. The catalytic domain is located between both terminal lobes and shows high conservation and similarity to the PKC family kinase domain [21, 22]. Moreover, PRK1 and PKC are members of AGC kinase family . The characteristic feature of these kinases is a C-terminal regulatory region (C-tail). The C-tails regulate the enzymatic activity and insert conserved phenylalanine residues into the ATP-binding site [23–25].
1.5. Computational approaches in drug discovery
Virtual screening (VS) was first used at the end of the last century to refer to computational algorithms and techniques used to identify novel hit/lead compounds for biological target from large chemical libraries, depending on the known structure or the drug target or active ligands [26–29]. The VS approach is considered as a complementary approach to experimental or physical screening or high-throughput screening (HTS). VS has been fully integrated to most modern day drug discovery projects . However, VS does not require the physical existence of the compounds to be tested. It is only based on computer models or the three-dimensional (3D) structures of the drug target, also known as structure-based VS (SBVS) or on the chemical structure of known active ligand(s), also known as ligand-based VS (LBVS) . The fundamentals of the SBVS approach depend on the availability of the 3D structure of the biological target and a database of small molecules. The SBVS approach often uses molecular docking to generate the protein-ligand complexes of small molecules from a database into the target active site. The aim is to identify the compounds, which interact favorably with the target binding site . Meanwhile, LBVS methods utilize chemical similarity analysis of structurally diverse or known active ligands, with the view of identifying novel small molecules, which could show similar biological activities [33–35]. However, both approaches have practical limitations. Therefore, researchers often combine SBVS and LBVS, with the aim that this might improve the efficiency of the VS results [36, 37]. Computational approaches for the prediction of biological activity also include quantitative structure-activity relationships (QSAR) analysis, which is aimed at finding a correlation between predicted and experimental biological activities of small molecules. This approach incorporates the influence of the molecular descriptors. QSAR models can be used initially to predict the activity of untested hits from a virtual screening campaign . On other hand, QSAR modeling is useful tool during lead optimization, which is aimed at the improvement of the biological activities of the identified hits . Recently, many crystal structures of the PRK1 drug target have been published . Due to the flexibility of the biding site of the PRK1 structure, which adapts to different conformations, each protein-ligand complex was individually analyzed to reveal the important intermolecular interactions responsible for ligand binding that are useful for the identification new hits. The aim of this research project is to analyze ligand binding to PRK1 and to identify novel-specific inhibitors that could block the activity of PRK1, using structure-based VS. In a preliminary study, the applied docking methods were validated and used to investigate its ability to predict the experimental conformations of the co-crystallized ligands. Second, several QSAR models were generated in order to find the significant correlations between the computational and experimental biological activity data. The constructed QSAR models contain different scoring functions, including computed binding-free energy (BFE) values for the protein-ligand complexes, in addition to molecular descriptors of the isolated ligands. To investigate the predictive ability of the generated QSAR models, internal and external methods were performed. Moreover, an enrichment study was performed, in order to assess the ability of the scoring functions to identify known binders or actives in large databases containing actives and inactives. By focusing on kinase data sets, GSK data sets 1 and 2 were screened to search for novel lead compounds, employing virtual screening methods. A NP library including compounds isolated from African flora was also screened. This chapter begins by presenting database tools (small molecule libraries) useful in VS campaigns, with a focus on small molecule libraries and NPs with anticancer properties. This is followed by some recent success stories on VS for the identification of inhibitors and/or modulators of some anticancer drug targets. The end of the chapter shows the case study of VS for the identification of PRK1 inhibitors.
2. Databases of small molecule libraries and some recent success stories in anticancer drug discovery using computer-based methods
2.1. Small molecule databases for virtual screening
A summary of small molecule libraries utilizable in virtual screening experiments has been provided in Table 1. Known cancer drugs have been included in several databases, including ChEBI , ChEMBL , DrugBank , EpiDBase , NANPDB , NCI-DIS, NCI-DTP, and NCI-FDA , along with SANCDB , SuperDrug , p-ANAPL , PubChem , and ZINC15 . However, those with tested and proven in vitro activities against known cancer cell lines and/or with known in vivo activities only include CancerDR , OAA , and SYFPEITHI , along with the NP libraries: AfroCancer , CancerHSP , CHMIS-C , InPACdb , MAPS , NPACT , and TIPdb .
|AfroCancer||A data set of natural anticancer products from African flora|||
|Cancer drug resistance database (CancerDR)||A database of 148 anticancer drugs and their effectiveness against around 1000 cancer cell lines|||
|CancerHSP||An anticancer herbs database of systems pharmacology|||
|ChEBI||A database for chemical entities of biological interest|||
|ChEMBL||An open large-scale bioactivity compound database|||
|CHMIS-C||A herbal medicines database for cancer|||
|DrugBank||A resource that combines detailed drug data with comprehensive drug target information|||
|DUDE datasets||Benchmarking data sets of actives and decoys for diverse targets, including proteins, GPCRs and ion channels, clustered ligands, etc. drawn from ChEMBL|||
|EpiDBase||A database for small molecule epigenetic modulators|||
|InPACdb||Indian plant anticancer compounds database|||
|MAPS Database||A database of phytochemicals, including the data of >500 medicinal plants|||
|NANPDB||A natural products database for compounds of Northern African origin, with a significant number of bioactive metabolites exhibiting anticancer activity|||
|NCI Drug Information System (DIS)||A searchable database of 3D structures (mainly organic compounds) from the NCI Drug Information System (DIS)|||
|NCI-DTP database||Tested compounds from the National Cancer Institute (NCI) Developmental Therapeutics Program (DTP)|||
|NCI-FDA||Several sets of FDA-approved anticancer drugs to enable cancer research|||
|NPACT||A database for plant-based anticancer compounds|||
|Oral anticancer agents (OAA) from Singapore||A database of 39,772 oral anticancer agents prescribed to 8837 patients in Singapore, with 55 clinically significant drug-drug interactions for the evaluation of drug interaction facts|||
|SANCDB||A database of natural products from South Africa, containing also anticancer agents from the flora and fauna of this ecologically diverse country|||
|SuperDrug||A database of 3D-structures of active ingredients of essential marketed drugs|||
|SYFPEITHI||A database of major histocompatibility complex (MHC) ligands and peptide motifs|||
|TIPdb||A database of anticancer, antiplatelet, and antituberculosis phytochemicals from indigenous plants in Taiwan|||
|p-ANAPL||A collection of samples of natural products from African sources|||
|PubChem||A public repository for information on chemical substances and their biological activities|||
|ZINC15||A free database of over 100 million purchasable compounds for virtual screening|||
2.2. Some recent success stories
Computational modeling has been applied to understand drug-target interactions in several validated anticancer drug targets, for the identification of novel inhibitors from small molecule databases and/or for the elucidation of modes of action. We here present some few recent cases. A typical example is the recent discovery of new inhibitors of CXC chemokine receptor 2 (CXCR2) by applying ligand-based pharmacophore models . It should be mentioned that CXCR2 and its ligand, CXCL8, are known to be implicated in a number of inflammation-mediated diseases, including cancer [62–65]. In the study, Ha et al. generated a pharmacophore model based on known CXCR2 antagonists and used it to screen a database of 5 million commercially available compounds from different vendors. The authors were able to identify small molecule hits, which were further tested by in vitro screening in a cell-based CXCR2-mediated β-arrestin-2 recruitment assay, followed by several other cell-based assays. In vivo studies were conducted by lipopolysaccharide (LPS)-induced lung inflammation in mice. It was also shown that one of the compounds inhibits CXCR2 signaling through down regulation of surface CXCR2.
Moreover, the same compound was shown to inhibit CXCL8-mediated neutrophil migration and LPS-induced lung inflammation in mice significantly. The identified compounds were shown to also inhibit CXCR2/β-arrestin-2 association, cell migration and proliferation, and acute inflammation in mouse models. Another study combined structure-based docking and pharmacophores to design novel indole and chromene analogues, which were cyclin-dependent kinase 2 (CDK2) inhibitors. The identified compounds proved to be active against MCF-7 and HeLa cell lines. The study was conducted by exploiting stereo-specific information obtained from crystal structures of CDK2, substituting the pharmacophores on their moiety and docking the target protein to calculate the binding affinities . Other recent successful cases, where docking, QSAR, machine learning, and pharmacophore-based screening were combined to search for small molecules targeting cancer, are abundant in the literature [67–70].
3. Case study: structure-based virtual screening and QSAR studies on PRK1 inhibitors
3.1. Structure-based design studies (X-ray target structures and cross docking)
Four crystal structures of PRK1 are available in the protein data bank (PDB), Table 2. The PRK1 binding site is reported to exhibit either intrinsic or induced flexibility [17, 24]. Moreover, different bound inhibitors are known to induce different conformational changes in the binding site residues . Furthermore, chemical characteristics of the co-crystallized ligands play an important role in the applicability of the X-ray structure for virtual screening. Therefore, a cross-docking procedure was carried out, as a retrospective study aimed at determining the appropriate structure for the virtual screening. The selected structure should be able to bind the nonnative ligands with low root mean square deviation (RMSD), with respect to the reference binding conformation. On the other hand, the performance of the docking power and scoring functions is varying for different targets . The docking power measures the ability of docking algorithms to predict the correct conformation of the ligands .
The four X-ray structures of PRK1, including the apo-form, were used for a cross-docking study . The co-crystallized ligands were docked toward the target binding sites, using the Glide cross-docking script, implemented in Schrödinger Suite (2014 version) with standard precision (SP) for flexible ligand docking. In order to optimize the docking solution, an option was selected to perform post-docking minimization and includes number of poses up to 5 per ligand. The structures were prepared using the default setting implemented in Protein Preparation Wizard (Schrödinger 2014). The co-crystallized water molecules were deleted. Hydrogen atoms and partial charges were assigned. Finally, the structure energy minimized applying the OPLS 2005 force field. The binding site was defined using Grid Generation of Schrödinger suite 2014 and sets the co-crystallized ligand as a center of the binding pocket. In the case of apo-structure 4OTD, the binding site was defined by applying centroid of selected residues Leu650 and Lys753 with box size 14 Å. A hydrogen bond constraint was defined at PRK1 hinge region reside Ser704. In addition, all ligands were prepared using LigPrep implemented in Schrödinger utility 2014 involving generation of ionization and tautomeric states at pH 7.4 with less than 10 low-energy ring conformations. The ligands were energy minimized using MMFFs force field.
The RMSD values were calculated, with respect to the respective reference ligand conformations. Table 2 shows the RMSD values and Glide SP scores for the top-ranked poses in the corresponding structures. It was observed that none of the available structures had a binding site conformation, which allowed all the three different ligands to be docked with RMSD < 2.0 Å. Furthermore, the calculated average RMSD values for each structure were high. However, the applied docking method performed well by reproducing the binding mode of the co-crystallized inhibitor in self-docking. Moreover, the docking method correctly scored the co-crystallized inhibitors at the top of the list (Table 2), e.g., the X-ray structure of the ligand lestaurtinib was re-docked with an RMSD = 0.24 Å into its co-crystallized structure (PDB: 4OTG), being also ranked as the top scoring pose (Glide SP = −11.67 kcal/mol). Similar results were observed with the other inhibitors when X-ray structures were docked toward their respective co-crystallized target structures. Meanwhile, the binding pocket conformation for the apo-form of PRK1 was not found to be suitable for docking when using any of the current inhibitor structures. RMSD values were high and the docking scores were low when compared with those obtained in the rest PRK1 structures (Table 2). Therefore, an ensemble of PRK1 structures was proposed for the further virtual screening study.
3.2. Docking of active inhibitors
The data set DS1 includes 28 active (Figures 1 and 2 ) and 300 inactive compounds. First, the active compounds were docked toward the ensemble of the three PRK1 structures using the previously described docking methods. The docking solutions were first inspected, with the aim of comparing them with the experimental conformations (the co-crystallized ligands in other kinases). The active compounds could be divided into four subsets, the first set being staurosporine derivatives (10 actives, including the co-crystallized inhibitors lestaurtinib and Ro-318220). The second subset is made of the tofacitinib or pyrrolopyrimidine family (nine actives, in addition to the inhibitor tofacitinib in PDB ID: 4OTI). The third set was made of two isoquinoline derivatives, forming a group of five actives. The last subset contains the remaining inhibitors, with diverse scaffolds. The binding mode for each compound was analyzed and compared with the experimental data. As previously seen in the cross-docking for the co-crystallized inhibitors (Table 2), the binding modes for lestaurtinib and Ro-318220 were correctly reproduced. The binding modes for the further staurosporine derivatives were compared with these two analogues. Since the ensemble docking procedure was applied to dock the actives, eight compounds were docked toward the 4OTG structure target site, while the others (Ro-318220 and BIM1) were docked toward the 4OTH site. It was interesting to notice that the top-ranked active poses had a quite conserved binding mode, which shares two H bonds, interacting with the hinge region of PRK1. The obtained binding modes for all staurosporine derivatives were identical to the published structure (in the PRK1 X-ray structure 4OTG). Since a part of the C-terminal regulation region (C-tail) is not resolved, resulting in a more open and accessible ATP binding pocket, most of the staurosporine derivatives could be docked into it. Furthermore, all staurosporine derivatives were docked into both PRK1 structures, which are co-crystallized with one of staurosporine derivatives (4OTG or 4OTH). For pyrrolopyrimidine and tofacitinib derivatives, the inhibitors were docked into three X-ray structures of PRK1. Tofacitinib was co-crystallized in the structure 4OTI. The RMSD value of re-docking tofacitinib was 0.55 Å. The binding modes for the other analogues were compared with the experimental data (4OTI). Additionally, several other analogues were co-crystallized with the Akt1 kinase (PDB IDs: 3MV5, 3MVJ, and 3OCB).
Each active compound forms 2 H bonds with the hinge region, besides additional H bonds with the surrounding residues in the binding pocket. There are also some isoquinoline derivatives that were co-crystallized with other kinases, e.g., cAMP-dependent kinase (PDB IDs: 1YDS and 1YDR) and Rho kinase (PDB IDs: 2GNI). This subset of actives interacts with the PRK1 hinge region by mediating 1 H bond. A former virtual screening campaign had identified several actives, including CB-6046000 and CB-5743914, which were docked using the same previously described docking method. The binding mode showed two H bonds between the dihydroindol-2-one and the backbone of the hinge region, mainly Ser704 and Glu702. Furthermore, the binding mode of PD-0166285 matched the experimental data observed for the LCK kinase (PDB ID: 3KMM).
3.3. Scoring power
The rescoring for the derived docking poses was performed by calculating BFEs and using different solvation models implemented in the AMBER12 package. IC50 values had been previously determined for 26 of the active compounds (Table 3), with only percentage inhibitions available for two others) . Thus, the correlation coefficient (R2) between the experimental pIC50 and the calculated enthalpy changes (∆H) for protein-ligand binding was calculated (Table 3 and Figure 3). The BFE calculation was carried out using one snapshot after two consecutive minimizations steps and applying the Generalized Born solvation model [73, 74]. First, the minimization was carried out for water and counterion molecules without the ligand-protein complex, which was restrained to their initial coordinates with a force constant of 500 kcal/mol/Å2. In this step, there were 2000 iterations (beginning with 1000 steepest descent and followed by 1000 conjugate gradient). The second minimization step was applied for the whole system through 10,000 iterations (first 5000 steepest descent and then 5000 conjugate gradients). A significant correlation was found when using the Nguyen and Simmerling (igb = 8) version of the Generalized Born solvation model [73, 74] and applying the two minimizations steps. The cross-validated R2 was found to be 0.60, with a root mean square error (RMSE) of 0.89. To understand the effect of the chemical modification on the main scaffold, the binding mode of the actives and the interactions in the binding pocket was analyzed. As mentioned previously, PRK1 actives can be divided into four subsets (Table 3). The docking score (Glide SP) and the MM/GBSA BFE values were employed to explore the differences in the inhibitory activity. However, there was a weak correlation between the pIC50 and Glide SP scores (R2 of 0.43, see Table 4).
|Compound||IC50 (nM)||Exp. pIC50||Target structure||Glide SP||MM/GBSA**||PEOE_PC-||Pred. pIC50*|
It was observed, from docking, that the isoquinoline derivatives are able to form an H bond between the N atom of the isoquinoline motif and the NH backbone of Ser704 located in the hinge region. The isoquinoline motif occupied the adenine binding site; meanwhile, the substituents were located in the sugar-binding site, which is surrounded by several hydrophilic residues (e.g., Asp708, Asp750). Rescoring, using MMGBSA, showed that H-7 (IC50 = 658 μM and ∆H = −34.51 kcal/mol) had a higher score than HA-1077 (IC50 = 1.95 μM and ∆H= −29.42 kcal/mol), Table 3. The more favorable value resulted from the additional hydrophobic interactions between the front hydrophobic pocket residues and Phe910 (from the C-tail) and the methyl substituent on the piperazine ring. Meanwhile, the van der Waals interaction became weaker in the case of HA-1077. This could be explained by the shifting of the position of the substituent, since its seven-membered ring is bulkier. Both derivatives contain a positively charged ring nitrogen, which consequently forms an additional H bond in the ATP-sugar binding pocket (Figure 4). Meanwhile, the interaction between Asp708 and the piperazine ring enables the methyl group to get closer to Phe910 and its surrounding residues. Moreover, the sulfonyl moiety of both compounds is located under the P-loop, where it forms hydrophilic and hydrophobic interactions.
The remaining compounds share common interactions (as previously mentioned), but the biological activities of these compounds could not be predicted using the derived BFE model. However, the lower activity of compound ambnee93542761 could be explained by the loss of the interaction between of the sulfonyl moiety and weak hydrophobic interactions with Phe910. Nonetheless, it forms a hydrogen bond with the catalytic Asp764 (IC50 = 29.33 μM and ∆H = −30.32 kcal/mol). The results of this subset show the importance of the hydrophobic interactions with the Phe910 of the C-tail and its surrounding residues. However, the BFE values and docking scores failed to predict the affinity of F2457-0067, which shows a favorable BFE value but a low IC50 value (Table 3). This could probably result from the long linker between the isoquinoline motif and the positively charged morpholine ring. The generated conformations show additional van der Waals interactions. On the other hand, the morpholine ring shows more polar properties compared to the other compounds.
The second subset of actives is made of staurosporine derivatives. Most of these compounds form two H bonds between the pyrrole ring and the hinge backbone (NH of Ser704 and CO of Glu702). The docking results for staurosporine derivatives were compared with two PRK1 crystal structures (4OTG and 4OTH). The favorable BFE values of staurosporine derivatives could be attributed to the strong hydrophobic interactions with residues from the P-loop, e.g., Val635 and Leu627, in addition to the interactions with Leu753. The only observed deviation was with parts of the docked ligand interacting with the sugar pocket, e.g., staurosporine (the most active compound) showed a BFE value of ∆H = −60.85 kcal/mol and an IC50 = 0.0008 μM (Table 3). The favorable BFE value and activity of staurosporine could be estimated through the targeting of both hydrophobic pocket residues (behind the gatekeeper Met701 and near Phe910); on the other hand, several polar groups occupied the sugar polar pocket. By comparison with lestaurtinib (IC50 = 0.0086 μM and ∆H= −46.92 kcal/mol), the differences in the activity and BFE values could be attributed to the smaller ring/shorter linker in lestaurtinib for targeting Asp708 and Asp750 (Figure 5).
The third subset of PRK1 inhibitors is made of tofacitinib analogues. Ten compounds belong to this subset (Table 3). All derivatives interact with the hinge region by forming two H bonds through the pyrrolopyrimidine scaffold (Figure 6). The major substituents, which influence notably the BFE values and the biological activity of tofacitinib, are the N-methyl of the pyridine ring. Any chemical modification changing these interactions will consequently change the BFE value and the activity. A comparison of the calculated BFE values of tofacitinib with those of the other derivatives could clarify the effects of these interactions. As previously mentioned, the loss of the interactions made with the P-loop residues clearly influences the calculated BFE values and, therefore, affects the biological activities. Consequently, Z1129905037 (IC50 = 55.73 μM and ∆H = −27.92 kcal/mol) possesses a BFE value lower than tofacitinib (IC50 = 0.129 μM and ∆H = −38.88 kcal/mol) (Table 3). Furthermore, contrary to tofacitinib, the binding mode of Z1129905037 does not show hydrophobic interactions with the residue located at the bottom of the binding pocket. The ATP-sugar binding site is occupied by non-polar substituents in the case of Z1129905037, which is unfavorable and consequently reduces the activity. The other tofacitinib analogues, which do not contain any tertiary amine as the linker, show higher BFE values and obviously lower biological activities. One exception is PZ0151 (IC50 = 1.06 μM and ∆H= −43.19 kcal/mol) which displays the same binding mode with tofacitinib. The pyrrolidine ring in PZ0151 takes the position of the CN group of tofacitinib but rather shows a lower activity. This could be as a result of the small size of the binding site under the P-loop residues, which is not suitable for large substituents. In general, it is difficult to explain the changes of the BFE values caused by chemical modification of all compounds under study. The observed correlation coefficient (R2 = 0.65) could only be used to explain major differences in biological activities. However, it could be used as an indicator to identify the interactions, which play major roles in the determination of biological activity.
3.4. Ranking power
An enrichment study was performed in order to measure the ability to identify known binders or actives in large databases of inactives. The previously mentioned data set (named DS1), containing 28 actives and 300 inactives, was used to perform the enrichment study. The ligands were docked using the same ensemble and rescored depending on the previous findings. The results are presented in a BOX-PLOT and receiver operating characteristic (ROC) curves (Figures 7 and 8). Table 4 shows enrichment study results, using an ensemble of PRK1 structures, where the available scoring functions in Schrödinger and BFE methods were considered (∆H, Glide SP and Glide XP). As seen in Figure 7, Glide SP was able to discriminate between actives and inactives better than the BFE scoring (∆H).
The median values of the pair active/inactive were −9.17/7.23 and −38.79/34.02 for Glide SP and ∆H, respectively. The area under the curve (AUC) and enrichment factors (EF) at two different percentages were calculated. Table 4 shows the enrichment study results. It is interesting to note that at EF1%, all screened hits were actives when using both Schrödinger’s scoring functions (Glide SP and XP). Interesting, it was also observed that for the EF3% (for Glide SP and XP), the rate of true positives was 100% (Table 4). The main factor to measure the ranking power of the applied docking methods was the AUC value (Table 4). It is clear that the Glide scoring functions performed better to discriminate between the actives of inactives, having AUC = 0.90 and 0.88 for SP and XP, respectively.
3.5. QSAR model generation
Several QSAR models were generated to identify a possible correlation between experimentally obtained pIC50 values and calculated BFE or descriptor values (for the compounds on Figures 1 and 2). Calculated BFE (obtained by the MM/GBSA approach) from the previous study and 2D molecular descriptors, e.g., different descriptors referring to the partial charge and number of rotatable bonds, were considered. Since the Glide SP score showed a significant discrimination power between the active and inactive compounds, further scoring functions were also tested to optimize the QSAR models.
The best correlation coefficient in the previous study was found by applying Nguyen and Simmerling (igb = 8) Generalized Born solvation model [73, 74], after a minimization step (R2 = 0.65, RMSE = 0.89 and cross-validated q2 = 0.60, n = 26). In order to improve the correlation, partial charges of the ligands were calculated to compute several molecular descriptors for generating PLS models . The inclusion of the PEOE_VSA-0 descriptor improved the correlation coefficient value only slightly (model QSAR_, R2 = 0.68, RMSE = 0.8, and q2 = 0.61, n = 26). This descriptor indicates the partial equalization of electronegativities with approximated accessible van der Waals surface area (in Å2) of the molecule . Furthermore, the inclusion of another descriptor based on ligand partial charges (PEOE_PC-) , which indicates to the total negative partial charge, was used instead of PEOE_VSA-0 to generate QSAR_2 (R2 = 0.70, RMSE = 0.78 and q2 = 0.62, n = 26; Eq. (1)). The model QSAR_3, containing three descriptors, showed a slightly better correlation coefficient (R2 = 0.71, RMSE = 0.76 and q2 = 0.63, n = 26; Eq. (2)). The relative importance of the individual descriptors was MM-GBSA: 1, GLIDE_SP: 0.417, and PEOE_PC-: 0.371. The other tested descriptors were not helpful and were not considered for the final models.
Further internal validation of the generated QSAR models was carried out by means of bootstrapping, in which the samples are randomly selected from the used inhibitors to form training and test sets [77, 78]. The respective models are generated, and their statistical parameters (R2 and q2) are calculated and compared with those of the original QSAR models (which had been generated from the whole data set). The random selection of the samples was performed within each cluster generated by hierarchical clustering, depending on the structural similarity or within activity distribution clusters of PRK1 inhibitors (low, moderate, and high active inhibitors) . PRK1 inhibitors can be divided into four subsets depending on the chemical scaffolds (Figures 1 and 2). In the next step, hierarchical clustering using maximum common substructure (MACCS) keys and calculation of Tanimoto coefficients were carried out using the molecular operator environment (MOE) software tool (chemical computing group, Montreal, Canada, version 2014). Then, several samples within each cluster were taken, considering their activities. From 26 compounds, the sample subset contains only 20 inhibitors. The statistic parameters of the obtained QSAR models are presented in Table 5. The goal of the bootstrapping is to perturb the training set while not considering the statistical parameters of the test set. Since the obtained QSAR models were built from subset of the total data set, the values of and satisfy the minimum acceptable statistical parameters when compared with the three original QSAR models. These values also revolve around the values of the original QSAR models, generated from 26 PRK1 inhibitors (Table 5). It was observed that the obtained QSAR models possess acceptable statistic parameters (q2 > 0.5) and were comparable with those found in the original QSAR models. However, the QSAR_BT_3 model showed poor correlation when compared with the results found in the whole data set (q2 = 0.60, Table 5). The remaining six inhibitors were used as the external test set to verify the ability of the obtained QSAR models (QSAR_BTs) to predict the biological activities of the test set compounds. The values of between the calculated and measured pIC50 are 0.43, 0.58, and 0.67 for QSAR_BT_1, QSAR_BT_2, and QSAR_BT_3, respectively (Table 5). These results may confirm the ability of the model QSAR_3 and its derivatives (QSAR_BT_3) to predict the biological activities of novel compounds.
|Training set (20 inhibitors)||Test set (6 inhibitors)|
Interestingly, QSAR_3 have predicted pIC50 values higher or close to 7 for the most potent PRK1 inhibitors (IC50 < 100 nM). Moreover, QSAR_3 possessed the ability to distinguish between weak, moderate, and highly potent compounds (Table 3). Thus, the model could be used as a filter for the prioritization of specific compounds (hits) for synthesis and testing. Figure 9 displays the graph of the correlation between the observed and predicted pIC50 values using QSAR_3 model.
3.6. Validation of the generated QSAR models and GSK databases screening
It has previously been demonstrated that considering only the statistic parameters (R2, RMSE, and q2) to validate QSAR models is insufficient and sometimes misleading [78–82]. Therefore, it is important to validate QSAR models by testing the ability to accurately predict the biological activities of ligands not used for QSAR model generation. In the current study, two data sets of compounds were available: the first set containing 26 PRK1 inhibitors with IC50 values was used as a training set for QSAR model generation (Figures 1 and 2). The second set was used as an external test set to validate the QSAR model. These compounds were identified from an in vitro screening (GSK PKIS1) and contain 35 compounds including 14 active compounds and 21 inactive compounds (Figures 10 and 11) . Seven of the active compounds, for which IC50 values were measured (0.04–4.9 μM, Table 6), were taken as external test set to further evaluate the predictive power of the QSAR models.
Table 5 provides a comparison of the calculated pIC50 versus experimental pIC50 values. The calculation of pIC50 values was performed by using the QSAR_1, 2, and 3 models. An important statistic parameter to test the predictive power of the QSAR models is the correlation coefficient between the predicted and observed pIC50 values for the test set . The correlation coefficients between the experimental and predicted pIC50 values for the seven test set compounds were quite weak (= 0.49, n = 7) when using QSAR_3 model. The predictive power of the QSAR_1 and 2 models was investigated. The correlations of both models were within a similar range (= 0.58 and 0.37, n = 7). A further analysis was performed to detect the outliers depending on Z-score values. The Z-score values were calculated using MOE, for QSAR_3. Golbraikh et al. mentioned that at least five compounds are required for a test set to validate a QSAR model . After the analysis of the correlation and Z-scores, compound SB-750140 found to negatively affect the correlation coefficients. The correlation using QSAR_3 shows an improvement (= 0.95, RMSD = 0.17 and q2 = 0.89, n = 6). A similar observation was made for the two other QSAR models (QSAR_1: = 0.93, RMSD = 0.21 and q2 = 0.82, n = 6; QSAR_2:= 0.98, RMSD = 0.12 and q2 = 0.95, n = 6). The outcomes confirm that the generated QSAR models show satisfactory ability to predict the biological activities of the test set compounds. Thus, they could be accepted having an value > 0.5 . Moreover, taking the last consideration of the threshold for the predicted pIC50 > 7 found in case of actives to refer to potent compounds was clearly in the test set, since the most potent inhibitors in the test set (IC50 < 100) were identified with predicted pIC50 > 7 (Table 6). Meanwhile, the biological activities of the moderately active compounds were predictive with calculated pIC50 in the range 5.5–7 (Table 6). Among the inactive compounds in the test set, three compounds were predicted to be highly potent compounds, but these compounds were inactive in the assay. The visualization of the false positives could not clarify the absence of activity of both compounds. However, GSK978744A and GSK1030059A, which mainly target the PLK kinase, possess the same chemical scaffold, and their binding modes were investigated and compared with the binding modes of analogues co-crystallized with other kinases (CDK2; PDB ID: 2I40, NEK2; PDB ID: 2XNN). However, it is not clear, from structural interactions, why this compound is not active. Additionally, it might possible that the exclusion of entropy changes or solvation of the binding pocket could play a role in the wrong estimation of the biological activity.
Since the QSAR_2 model shows high value of in the test set (QSAR_2: = 0.98, RMSD = 0.12 and q2 = 0.95, n = 6), the discrimination performance was therefore discussed. Similar to QSAR_3, the most potent inhibitors in the test set (IC50 < 100) were identified with predicted pIC50 > 7 (Table 6). Furthermore, between the remaining active compounds, two of them were predicted to be high potent inhibitors pIC50 > 7 (GW784307A: predicted pIC50 = 7.36; GSK1007102B: predicted pIC50 = 7.21). However, the IC50 values of these two compounds have not been measured yet, but it reported to be active since it binds to PRK1 at 1 μM. Thus, the consideration of the threshold for the predicted pIC50 > 7 could also be applicable in the QSAR_2 model. The weak active compounds GSK319347A, GW759710A, and GW829877X (shown in Table 6) also predicted as moderate or low active compounds where predicted pIC50 values were less than 7. Among the inactive compounds, three compounds (the same compounds as in the case of the QSAR_3 model) were predicted to be highly potent compounds, but these compounds were inactive in the assay.
By comparing the three generated QSAR models (QSAR_1, 2, and 3) depending on the predicted pIC50 values among them to the measured ones, it was found the predicted pIC50 values when using QSAR_3 model were closer to the experimental values (Table 6), but QSAR_2 model showed the highest value of in the test set. Furthermore, in bootstrap validation, it was found that the QSAR_3 model had the ability to predict the biological activity of the test set compounds more accurately than the models QSAR_1 and 2 (Table 6). Thus, the QSAR_3 model could be used to predict the biological activities of the highly potent compounds, which should exhibit biological activities power in the nanomolar range and differentiate between the active compounds, according to their activity (low, moderate, and high).
3.7. Virtual screening using the p-ANAPL library
The above protocol was further used to virtually screen for possible PRK1 inhibitors from the p-ANAPL library . As previously applied, ensemble docking was performed to dock the compounds to the X-ray structures of PRK1 stored in the PDB. Both data sets were prepared using Schrödinger 2014.U2, including the generation of several conformations per compound. The docking solutions were visualized using MOE to explore the conserved interactions with residues at the PRK1 hinge region. The selection of the hits was based first on the displayed binding mode, followed by their scoring, then predicting the pIC50 values using the generated QSAR models. Depending only on docking scores was insignificant for the selection of promising hit compounds. The docking solutions for the entire data set were visualized and several hits compounds were selected. In the next step, their predicted pIC50 values were calculated using QSAR_1, 2, and 3. Depending on the set threshold for the predicted pIC50, several top scoring compounds could be selected as promising hits for the biological assays, e.g., those in Figure 12. Figure 13 displays the binding modes of the selected hits and the interaction with residues at the binding pocket of the corresponding X-ray structure of PRK1.
One of the promising hits, DBT-6b, is Bartericin B (MW = 408.49; PubChem CID: 12136210) from Dorstenia barteri . The binding mode of DBT-6b is shown in Figure 13A. The compound is predicted to interact with hinge region residue Ser704 with one H bond. The phenol ring forms an additional H bond with the backbone of Asp764 from the DFG-motif. The interaction with the front polar pocket and the ribose binding pocket is mediated by two H bonds between Asp708 and propanol substituent in DBT-6b. The calculated pIC50 values were 7.85, 7.64, and 7.45 using the generated QSAR models QSAR_1, 2, and 3, respectively. The second hit (P87), vitexin or apigenin-8-C glucoside (MW = 432.38; PubChem CID: 5280441), is a flavonoid glycoside from diverse sources, e.g., Hyparrhenia hirta . The binding mode of P87 (Figure 13B) shows that the compound might form one H bond with the hinge region residue Ser704. Additionally, the substituents on the oxane ring are targeting the front polar pocket and forming two H bonds with Asp708 and Asp750. The phenyl ring interacts with the back hydrophobic pocket and further forms an H bond with Asp764 in the DFG motif. P87 was predicted to be a highly potent compound, with a predicted pIC50 value greater than 7, using QSAR models (pred_pIC50= 8.02, 8.6, and 9, respectively). Bartericin B and vitexin are the only two examples of promising compounds from natural products which could target PRK1. Among the screened databases, other hits were selected as promising compounds, which could inhibit PRK1 activity in the nanomolar range. In further investigations, the selected compounds can be submitted for biological assay to figure out their ability to bind into PRK1.
This chapter provides a brief summary of database resources for anticancer drug discovery and some recent inputs and success stories for the identification novel inhibitors of selected anticancer drug targets by the use of computer modeling. While the experimental validation of some of the natural product hits identified in the case study is ongoing, the question about the applicability domains of the derived QSAR models reported is still to be answered. Among the identified hits, 50 closely similar compounds to Bartericin B (with cut-off Tanimoto coefficient of 0.7), having reported biological activities in PubChem  and ChEMBL , with 239 biological activities and 152 published patent data. A similarity search of vitexin in PubChem gives 112 similar compounds, corresponding to 524 biological activities and 208 patents. Bartericin B is known to exhibit antimicrobial activities against Trichomonas gallinarum, with minimum lethal concentrations (MLCs) of 0.244 and 0.121 μg/mL after 24 H and 48 H, respectively , while its analogue (Bartericin A from Dorstenia angusticornis) is known to be very active against some bacteria and yeasts associated with human pathologies . Both chalcones are known to also have potential antiprotozoal activities , e.g., against Plasmodium falciparum . Vitexin, on the other hand, is known for its antioxidative and  spasmolytic effects . The compound is known to be abundant in plants of the genus Vitex (Verbenaceae), which is known to exert insect antifeeding activities, among others . These compounds could now be tested in biological assays for potential PRK1 inhibition.
A.N. is currently a doctoral candidate financed by the German Academic Exchange Services (DAAD), Germany. F.N.K. acknowledges a Georg Forster fellowship from the Alexander von Humboldt Foundation, Germany.
List of abbreviations
|AfroCancer||African Anticancer Natural Products Database|
|BFE||Binding free energy|
|EC50||Half maximal effective concentration, i.e., the concentration of a drug, antibody or toxicant which induces a response halfway between the baseline and maximum after a specified exposure time|
|ED50||The median effective dose, a dose that produces the desired effect in 50% of a population|
|GI50||The growth inhibition of 50%, drug concentration resulting in a 50% reduction in the net protein increase|
|IC50||The drug concentration causing 50% inhibition of the desired activity|
|NANPDB||Northern African Natural Products Database|
|NPACT||Naturally Occurring Plant-based Anti-cancer Compound Activity-Target Database|
|p-ANAPL||Pan-African natural products library|
|PLS||Partial least squares|
|PRK1||protein kinase C–related kinase|
|QSAR||Quantitative structure-activity relationship|
|RMSD||Root mean square deviation|
|RMSE||Root mean square error|
- The authors declare that they have no competing interests.