The various HIV-1 protease and HIV-1 integrase inhibitors and their structures. The affected residues in the HIV-1 protease and HIV-1 integrase binding pockets are also shown.
Molecular docking has been under development and refinement for many years, but its ability to bring a medicine to the drug market effectively is still widely questioned. In this chapter, we introduce several successful cases, including drugs for the treatment of HIV, cancers, and other prevalent diseases. Technical details such as the docking software, Protein Data Bank (PDB) structures, and other computational methods employed are also collected and presented. In most cases, the structures of the drugs or drug candidates and the interacting residues on the target proteins are also shown. In addition, a few successful examples of drug repurposing using molecular docking are described. These cases should give us confidence that docking will be used extensively in industry and basic research, and they should encourage us to actively apply molecular docking and related technologies to create new therapies for diseases.
- computational drug design
- molecular docking
- drug repurposing
Molecular docking is one of many computational tools that can be used in drug discovery [1, 2, 3, 4]. It is a form of structure-based drug discovery that estimates the binding affinities between small molecules and macromolecular targets (usually proteins). The first step in molecular docking is choosing a drug target. Any macromolecule can serve as a target; common targets include enzymes and regulatory elements. Next, the three-dimensional structure of the target must be determined or predicted; high-resolution structures can be determined by X-ray crystallography, NMR, or electron microscopy (EM). Solved structures of thousands of popular targets are available in the Protein Data Bank (PDB). Many drug targets have known binding sites; when they do not, software that predicts potential binding sites for different ligands can be used. Docking studies can be performed with known ligands (naturally occurring molecules or known drugs) or with novel ligands. Virtual screening (i.e., identifying novel ligands with molecular docking) is an extremely useful, if time-consuming, method of drug discovery because molecules can be designed or selected to have high binding affinity for a very specific site. Docking results are often validated with further computational methods, such as molecular dynamics simulations. The most successful candidates from computational trials can then be tested experimentally.
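Docking programs usually report a score in kcal/mol that is meant to approximate the binding free energy. The minimal Python sketch below assumes that such a score can be treated as a standard binding free energy ΔG (1 M standard state) and converts it into an approximate dissociation constant through ΔG = RT ln Kd. The function name `dg_to_kd` is hypothetical, and docking scores are only rough estimates of ΔG, so the resulting Kd values should be read as orders of magnitude.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL = 1.987204e-3


def dg_to_kd(dg_kcal_per_mol: float, temperature_k: float = 298.15) -> float:
    """Convert a binding free energy (kcal/mol, negative = favorable) into an
    approximate dissociation constant in mol/L, using dG = R*T*ln(Kd)."""
    return math.exp(dg_kcal_per_mol / (R_KCAL * temperature_k))


if __name__ == "__main__":
    for dg in (-6.0, -9.0, -12.0):
        print(f"dG = {dg:6.1f} kcal/mol  ->  Kd ~ {dg_to_kd(dg):.2e} M")
```

At 298 K, every additional 1.36 kcal/mol of favorable ΔG lowers Kd roughly tenfold, which is why modest scoring errors translate into large errors in predicted affinity.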
A search algorithm, which thoroughly and efficiently explores the possible positions, orientations, and conformations of potential drugs and target proteins, and a scoring function, which accurately identifies the most energetically favorable binding poses, are generally considered the two most important components of a molecular docking program. However, other factors also affect the effectiveness and accuracy of molecular docking, such as the availability and quality of an experimentally determined or predicted structure of the target protein, conformational changes of the target protein upon drug binding, and the identification of potential binding sites. As discussed in previous chapters, many commercial and academic search algorithms, scoring functions, and docking software packages have been developed and improved over the past decades. Nevertheless, it is still questioned whether there are any success stories in which molecular docking has helped to bring a drug to market.
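The division of labor between a search algorithm and a scoring function can be illustrated with a deliberately tiny model. The Python sketch below is a toy, not any real docking program: the 2D receptor and ligand coordinates are hypothetical, the scoring function is a simple 12-6 Lennard-Jones-like pair term, and the search algorithm is Metropolis Monte Carlo over the ligand's position and rotation. Real programs work in 3D with force-field or empirical terms and use more sophisticated search strategies.

```python
import math
import random

# Hypothetical 2D "receptor" atoms (x, y) in Å; illustration only.
RECEPTOR = [(0.0, 0.0), (3.0, 0.0), (1.5, 2.5)]
# Hypothetical rigid 2-atom "ligand", given as offsets from its own origin.
LIGAND = [(0.0, 0.0), (1.0, 0.0)]


def place(pose):
    """Return ligand atom coordinates for a pose (x, y, theta)."""
    x, y, theta = pose
    c, s = math.cos(theta), math.sin(theta)
    return [(x + c * dx - s * dy, y + s * dx + c * dy) for dx, dy in LIGAND]


def score(pose, r0=1.5):
    """Scoring function: 12-6 Lennard-Jones-like pair term, lower is better.

    Each ligand-receptor pair contributes (r0/r)**12 - 2*(r0/r)**6, which has
    its minimum of -1 at r = r0 and becomes strongly repulsive at short range.
    """
    total = 0.0
    for lx, ly in place(pose):
        for rx, ry in RECEPTOR:
            r = max(math.hypot(lx - rx, ly - ry), 0.1)  # guard against r = 0
            q = (r0 / r) ** 6
            total += q * q - 2.0 * q
    return total


def monte_carlo_search(n_steps=5000, temperature=0.3, step=0.2, seed=0):
    """Search algorithm: Metropolis Monte Carlo over (x, y, theta).

    Returns the best pose visited and its score.
    """
    rng = random.Random(seed)
    pose = (rng.uniform(-2.0, 5.0), rng.uniform(-2.0, 5.0),
            rng.uniform(0.0, 2.0 * math.pi))
    current = score(pose)
    best_pose, best = pose, current
    for _ in range(n_steps):
        trial = tuple(v + rng.gauss(0.0, step) for v in pose)
        e = score(trial)
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if e <= current or rng.random() < math.exp(-(e - current) / temperature):
            pose, current = trial, e
            if e < best:
                best_pose, best = trial, e
    return best_pose, best


if __name__ == "__main__":
    best_pose, best_score = monte_carlo_search()
    print("best pose:", best_pose, "score:", round(best_score, 3))
```

Swapping either part independently (for example, a genetic algorithm instead of Monte Carlo, or an empirical scoring function instead of the pair term) leaves the other untouched, which mirrors how docking packages combine search and scoring components.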
Although many molecular docking algorithms have been developed and improved over many decades, biomedical laboratories and pharmaceutical companies used to be hesitant to apply this technology to drug screening. Here are some possible reasons:
- The force fields that describe intra- and intermolecular interaction energies were not accurate or precise enough to estimate the binding affinities between proteins and potential drugs.
- Computers were not fast enough to calculate the interaction energies of the many possible binding conformations of one or many candidate compounds using a sophisticated model that takes into account all the factors, components, and conditions of molecular interactions.
- The number of protein-ligand complex structures was too small, and the resolution of the available structures was poor.
- The search (sampling) algorithms for exploring possible binding orientations and conformations were not efficient enough to identify plausible binding poses in a reasonable time.
These reasons and concerns are tightly interconnected and, fortunately, have been largely addressed in recent years. For example, the number of structures in the PDB has increased from 47,605 to 133,759 since 2007, and the resolution of determined structures has improved significantly. As a result, the accuracy of both physics-based and knowledge-based scoring functions, which help researchers identify the most energetically favorable binding poses and estimate binding affinities, has improved. Substantial improvements in computer hardware and software also make it possible to screen large numbers of natural and synthetic compounds and to search for the best binding poses efficiently.
When we dock a compound to a target protein, we often need to use other computational methods before docking or in parallel with it. For instance, structure prediction may be needed if the structure of the target protein has not yet been determined. The accumulation of good-resolution PDB structures and accurate structure prediction algorithms make it possible for researchers to obtain reliable structural models for molecular docking experiments. The increased quantity, quality, and diversity of protein-compound complex structures provide a solid basis for accurate binding site prediction methods, which help reduce the surface area on the target protein that docking algorithms must search [7, 8, 9]. Other computational methods, such as pharmacophore and quantitative structure-activity relationship (QSAR) models, can be applied before docking to reduce the computational load and time [10, 11, 12]. In summary, molecular docking technology has matured and has been applied at different stages of the drug discovery process. Its success stories, however, are rarely mentioned and are not widely known; they are introduced in this chapter.
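As an illustration of using a QSAR model before docking, the Python sketch below fits a one-descriptor linear QSAR model by ordinary least squares and keeps only library compounds whose predicted activity passes a threshold, so that fewer compounds need to be docked. The descriptor, training data, compound names, and threshold are all hypothetical; practical QSAR models use many descriptors and proper validation.

```python
def fit_linear_qsar(descriptor, activity):
    """Ordinary least-squares fit of activity = slope * descriptor + intercept."""
    n = len(descriptor)
    mx = sum(descriptor) / n
    my = sum(activity) / n
    sxx = sum((x - mx) ** 2 for x in descriptor)
    sxy = sum((x - mx) * (y - my) for x, y in zip(descriptor, activity))
    slope = sxy / sxx
    return slope, my - slope * mx


def prioritize(library, slope, intercept, threshold):
    """Return compounds whose predicted activity >= threshold, best first.

    `library` maps a (hypothetical) compound name to its descriptor value.
    """
    predicted = {name: slope * x + intercept for name, x in library.items()}
    keep = [name for name, p in predicted.items() if p >= threshold]
    return sorted(keep, key=predicted.get, reverse=True)


if __name__ == "__main__":
    # Hypothetical training set: descriptor value -> measured pIC50.
    slope, intercept = fit_linear_qsar([1.0, 2.0, 3.0, 4.0], [5.0, 5.5, 6.0, 6.5])
    library = {"cmpd_A": 1.5, "cmpd_B": 3.8, "cmpd_C": 5.0}
    print("send to docking:", prioritize(library, slope, intercept, threshold=6.0))
```

Only the compounds that pass this cheap filter would go on to the more expensive docking step.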
2. Identification of medicine for HIV
The worldwide human immunodeficiency virus (HIV) epidemic has channeled massive amounts of money into research on ways to treat and prevent this virus. Because bringing a drug to market can take many years and cost astronomical amounts of money, it is of the utmost importance for researchers to use cost-effective ways of finding new therapeutics. Computational methods have gradually become commonplace in drug design research. They have been used to confirm established findings, to discover new compounds, binding sites, and conformations, and even to repurpose existing drugs to treat other illnesses. HIV research in particular has seen an influx of computational methods used both to confirm the discoveries of previous studies and to establish new ones. Methods such as docking and molecular dynamics save researchers valuable time and allow them to make accurate and precise predictions of what is happening at the molecular level. While computational drug design methods are nowhere near replacing
2.1. Human immunodeficiency virus
Acquired immunodeficiency syndrome (AIDS) is caused in humans by the retrovirus HIV. HIV infects important helper T cells of the human immune system, specifically CD4+ T cells. HIV is an enveloped, positive-sense, single-stranded RNA virus. Two types of HIV have been characterized: HIV-1 and HIV-2. HIV-1 was discovered first, and it is more virulent and more infectious than HIV-2. After the viral capsid has entered the cell, an enzyme called reverse transcriptase liberates the positive-sense RNA genome from the attached viral proteins and copies it into a complementary DNA (cDNA) molecule. Reverse transcription is very error-prone, and the resulting mutations make drug resistance against this component of HIV likely. For this reason, HIV reverse transcriptase is a challenging target for HIV therapeutics. The cDNA and its complementary strand form a double-stranded viral DNA that is transported to the nucleus, where the integrase enzyme integrates it into the host's genome. The virus may then remain dormant or go on to produce new HIV-1 virions, which are assembled at the plasma membrane of the host cell. Within the budding virions, the viral polyproteins are cleaved by the HIV-1 protease enzyme; only after this cleavage can the internal components assemble into a mature virion capable of infecting other cells. The two targets on which computational drug researchers have focused most are HIV-1 integrase and HIV-1 protease.
2.2. HIV-1 integrase
Integrase (IN) is a retroviral enzyme that is not exclusive to HIV. It allows the genetic material of the virus to be integrated into the DNA of the host cell. Integration occurs after the double-stranded viral DNA has been produced by reverse transcriptase. Once integration has occurred, there is no turning back: the integrated viral DNA, called a provirus, makes the cell a permanent carrier of the virus. In general, retroviral integrases catalyze two reactions, 3′-processing and strand transfer. Both reactions are catalyzed by the same active site of the enzyme and proceed via transesterification.
2.2.1. HIV-1 integrase inhibitor—Raltegravir and its ensuing analogs
The most common integrase inhibitors are integrase strand transfer inhibitors (INSTIs). Mg2+ and Mn2+ are critical cofactors in the integration step, and inactivating these cofactors impairs the function of integrase. Most HIV-1 INSTIs contain a structural motif that coordinates the two divalent magnesium ions in the enzyme's active site. Researchers screened over 250,000 compounds to identify potent inhibitors. The most active inhibitors contained a distinct beta-diketo acid (DKA) moiety, which can coordinate the metal ions within the IN active site. Similar antiviral activity was retained when the DKA pharmacophore was transferred to a naphthyridine carboxamide core. Building on the success of these diketo acid analogs, a class of N-alkyl hydroxypyridinone carboxylic acids was developed, and these new analogs had a good pharmacokinetic profile in rats. The most promising pyrimidinone carboxamide derivative was MK-0518, also known as Raltegravir, which was the first integrase inhibitor to progress to Phase III clinical trials. Although multiple resistance mutations have been observed in both treatment-experienced and treatment-naïve patients, Raltegravir still proved to be an effective IN inhibitor. In October 2007, Raltegravir became the first FDA-approved IN inhibitor (Table 1).
Bringing a single drug to market can cost upwards of $2 billion, and even then only one in three drugs generates enough revenue to cover its research and development costs. Pharmaceutical researchers and executives can therefore see the allure of modifying current drug leads rather than designing a new drug. “Me-too” drugs can yield an optimized drug and create vital marketplace competition, but many argue that such slight modifications produce negligible improvements. The HIV-1 integrase inhibitor market has seen a surge of “me-too” drugs. Although Raltegravir has become a well-known and widely used anti-HIV drug, amino acid mutations have already conferred robust viral resistance to it. This resistance normally arises when one of three residues, Y143, Q148, or N155, is mutated in conjunction with at least one other mutation. The strongest resistance mutation appears to be the Q148H integrase mutant (IC50 > 700 nM), and the additional G140S mutation has been shown to restore the poor replication capacity of Q148H to wild-type levels. Despite this resistance profile, pharmaceutical companies still invest heavily in “me-too” research and development based on this drug. A distinction should be made between me-too drugs and second-generation drugs. First, a second-generation inhibitor needs to exhibit a new mode of action. Second, it needs to show significantly improved potency or decreased toxicity. A major problem with second-generation drugs is cross-resistance, so these drugs should maintain potency while avoiding cross-resistance.
2.2.2. Using docking studies to predict the binding mode of S-1360
Predicting the bioactive conformation of a ligand is very important, but the task becomes difficult when the receptor site contains a region of unusual conformational flexibility. The numerous crystal structures available for HIV-1 integrase show many differences in the active site regions of the IN core domain. S-1360 was one of the first beta-diketo acid IN inhibitors to enter clinical studies. Dayam and Neamati sought to predict the bioactive (active site bound) conformations of S-1360. To achieve this, they performed extensive docking studies with three different crystal structures, and they extended the study to include 5CITEP and a bis-diketo acid (BDKA).
To predict the binding mode of S-1360, 104 unique conformations within a 20 kcal/mol energy range were generated using the catConf module of Catalyst. All 104 conformations of S-1360 were docked into active sites A, B, and C (PDB: 1QS4, 1BIS, and 1BL3, respectively). Based on GOLD fitness scores, the 10 highest-scoring conformers were selected for further analysis. The researchers noted that S-1360 adopted very different binding orientations in active sites A, B, and C. In active site A, the bound conformation with the highest GOLD fitness score was found 102 times out of 200, and S-1360 occupies a space near D64, D116, N120, and the Mg2+ ion. In active site B, the most favorable conformation of S-1360 was found 62 times. The triazole and diketo acid moieties of S-1360 occupy a deep cavity surrounded by I151, N155, V75, and Q62, and they show favorable van der Waals and electrostatic interactions with D64, I151, E152, and N155 (Figure 2). In active site C, the most favorable binding conformation was found 172 times. The researchers also compared the best binding orientations of S-1360 in the active sites of the three crystal structures. Dayam and Neamati observed that S-1360 in active site A adopts a planar conformation and interacts with various residues throughout the active site. In this orientation, S-1360 forms hydrogen bonds with K159 and N120, and the two oxygen atoms from the furan ring and the keto group form coordinate bonds with the Mg2+ ions. This conformation appears to be stable because it occurs 102 out of 200 times. The binding conformation of S-1360 in active site C is also very stable, appearing 172 out of 200 times. Although the conformations in A and C are stable, they are not consistent with experimentally observed results.
S-1360 selectively inhibits the strand transfer reaction of HIV-1 integrase, but in active sites A and C, S-1360 did not interact with amino acids in the strand transfer (ST) cavity. Instead, S-1360 formed strong interactions with various amino acids and the Mg2+ ion in the cavity where the 3′-processing reaction of IN is believed to take place.
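Counts such as "found 102 times out of 200" come from grouping docked poses that are geometrically similar. The Python sketch below shows one common way to do this, greedy RMSD clustering in which poses are processed from highest to lowest fitness. The 2.0 Å cutoff is a widely used convention and an assumption here, the example poses are hypothetical, and the text does not specify exactly how Dayam and Neamati counted recurring conformations; GOLD's own analysis tools are not reproduced.

```python
import math


def rmsd(pose_a, pose_b):
    """RMSD (Å) between two poses with identical atom ordering.

    No superposition is done: docked poses already share the receptor frame.
    """
    total = sum(math.dist(p, q) ** 2 for p, q in zip(pose_a, pose_b))
    return math.sqrt(total / len(pose_a))


def cluster_poses(poses, fitness, cutoff=2.0):
    """Greedy clustering of docked poses, highest fitness first.

    Each pose joins the first existing cluster whose representative (its
    highest-scoring member) lies within `cutoff` Å RMSD; otherwise it starts
    a new cluster. Returns lists of pose indices.
    """
    order = sorted(range(len(poses)), key=lambda i: fitness[i], reverse=True)
    clusters = []
    for i in order:
        for cluster in clusters:
            if rmsd(poses[i], poses[cluster[0]]) <= cutoff:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Hypothetical two-atom poses and GOLD-style fitness values (higher is better).
EXAMPLE_POSES = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
                 [(0.1, 0.0, 0.0), (1.1, 0.0, 0.0)],
                 [(5.0, 0.0, 0.0), (6.0, 0.0, 0.0)]]
EXAMPLE_FITNESS = [60.0, 58.0, 50.0]

if __name__ == "__main__":
    clusters = cluster_poses(EXAMPLE_POSES, EXAMPLE_FITNESS)
    top = clusters[0]
    print(f"top-scoring pose found {len(top)} times out of {len(EXAMPLE_POSES)}")
```

A large population in the top-scoring cluster suggests that the search repeatedly converges on the same pose, which is the sense in which a frequently recurring conformation is described as "stable".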
Docking in this study was performed with version 1.2 of the Genetic Optimization for Ligand Docking (GOLD) software, which uses a genetic algorithm to explore the conformational flexibility of the ligand together with partial flexibility of the active site. GOLD was tested on a dataset of over 300 complexes and succeeded in reproducing the experimentally bound conformation of the ligand in more than 70% of cases. GOLD requires the user to define the binding site. For this study, Dayam and Neamati defined an active site of 20 Å radius centered on D64. GOLD then searches for a cavity within the defined area and treats all solvent-accessible atoms in that area as active site atoms. “All docking runs were carried out using standard default settings with a population size of 100, a maximum number of 100,000 operations, a mutation, and crossover rate of 95”. At the end of each run, GOLD reported all predicted bound conformations ordered by fitness score, which consists of hydrogen-bonding, complex energy, and ligand internal energy terms.
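The radius-based site definition can be sketched as a simple distance filter. In the Python example below, the center is assumed to be the centroid of the D64 atoms (the text does not say which D64 atom or point GOLD used), and the atom list is hypothetical. GOLD's subsequent cavity detection and solvent-accessibility filtering are not reproduced; this only shows which residues fall inside a 20 Å sphere.

```python
import math
from typing import List, Set, Tuple

# (residue name, residue number, (x, y, z) in Å)
Atom = Tuple[str, int, Tuple[float, float, float]]


def residue_centroid(atoms: List[Atom], resname: str, resnum: int) -> Tuple[float, float, float]:
    """Mean coordinate of all atoms belonging to one residue."""
    coords = [xyz for r, n, xyz in atoms if r == resname and n == resnum]
    k = len(coords)
    return tuple(sum(c[i] for c in coords) / k for i in range(3))


def residues_within_radius(atoms: List[Atom], center, radius: float = 20.0) -> Set[Tuple[str, int]]:
    """Residues with at least one atom within `radius` Å of `center`."""
    return {(r, n) for r, n, xyz in atoms if math.dist(xyz, center) <= radius}


if __name__ == "__main__":
    # Hypothetical atoms; a real study would read them from the PDB entry.
    atoms = [("ASP", 64, (0.0, 0.0, 0.0)), ("ASP", 64, (2.0, 0.0, 0.0)),
             ("GLU", 152, (10.0, 0.0, 0.0)), ("LYS", 159, (30.0, 0.0, 0.0))]
    center = residue_centroid(atoms, "ASP", 64)
    print(sorted(residues_within_radius(atoms, center, 20.0)))
```

Restricting the search to such a sphere is what makes the user-defined site important: a residue that lies just outside the radius can never be sampled as a contact.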
2.2.3. Validating the resistance profiles of “me-too” Raltegravir analogs using docking studies
Serrao et al. sought to validate the resistance profiles of me-too Raltegravir analogs. There are minor variations in the
Serrao et al. proposed that residues essential to the compounds' interaction with HIV-1 integrase would be prime candidates for resistance mutations. “Raltegravir makes direct interactions with three residues encompassing the [IN] catalytic motif (D64, D116, E152)”. The researchers wanted to predict the interacting residues of Raltegravir's analogs in a similar way, aiming to show that these compounds would have little success in eradicating the virus. Because S-1360 was one of the first clinical IN inhibitor candidates, the researchers also examined the interactions between S-1360 and 1BL3 and compared them with those of Raltegravir. They found that the two drugs interact with the same residues (D64, T66, D116, Y143, Q148, E152, and N155), with Raltegravir showing an additional interaction with E92. This observation is consistent with clinical findings: the E92Q mutation confers upwards of sevenfold viral resistance to Raltegravir [32, 33, 34]. This agreement supports the reliability of their docking technique. The researchers then described the interactions between HIV-1 integrase and each of the most potent Raltegravir analogs. Most of the compounds in this follow-up study interacted with the same binding pocket in which Raltegravir is active. If the researchers' predictions are correct, these candidate drugs will fail to replace Raltegravir. The researchers note that, while there is always the possibility for me-too drugs to evolve into blockbuster drugs, the studied HIV-1 integrase “drugs appear to have a small chance of improving the clinical outlook of HIV patients with Raltegravir viral strains”.
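The reasoning behind this comparison can be expressed with simple set operations. The Python sketch below uses the contact residues and resistance positions named in the text (Y143, Q148, N155, and E92 from E92Q); the Jaccard index is our own choice of overlap measure and is not taken from Serrao et al. The idea is that a compound whose predicted contacts largely coincide with Raltegravir's is expected to share its resistance profile.

```python
# Contact residues named in the text for S-1360 (docked into 1BL3) and Raltegravir.
S1360_CONTACTS = {"D64", "T66", "D116", "Y143", "Q148", "E152", "N155"}
RALTEGRAVIR_CONTACTS = S1360_CONTACTS | {"E92"}

# Resistance-associated positions mentioned in the text.
KNOWN_RESISTANCE_SITES = {"Y143", "Q148", "N155", "E92"}


def resistance_candidates(contacts):
    """Contacted residues that coincide with known resistance positions."""
    return contacts & KNOWN_RESISTANCE_SITES


def contact_overlap(a, b):
    """Jaccard index of two contact sets (1.0 = identical binding footprint)."""
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    print("Raltegravir-only contacts:", RALTEGRAVIR_CONTACTS - S1360_CONTACTS)
    print("resistance candidates:", sorted(resistance_candidates(RALTEGRAVIR_CONTACTS)))
    print("overlap:", contact_overlap(S1360_CONTACTS, RALTEGRAVIR_CONTACTS))
```

A high overlap score for a new analog would flag it as likely to be cross-resistant, which is the outcome the study predicted for most of the me-too compounds.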
2.3. HIV-1 protease
HIV-1 protease, a retroviral aspartyl protease, is an essential element of the HIV life cycle. It is a homodimer, and each subunit is made up of 99 amino acids. The protease cleaves the Gag and Pol polyproteins; when these are cleaved at the appropriate places, a mature, infectious HIV virion is produced. When HIV protease is blocked, the virus is not infectious. HIV's ability to replicate and infect additional cells can therefore be disrupted by inhibition of the protease or by mutation of its active site. For this reason, substantial research funding has gone into developing HIV-1 protease inhibitors.
HIV-1 protease is a homodimeric enzyme, and each monomer contributes one of the two aspartic acid residues that are essential for catalysis, D25 and D25′. The Asp-Thr-Gly sequence is present in HIV-1 protease, but it is also conserved among mammalian aspartic proteases. Each monomer has an extended beta-sheet region known as “the flap”; the flaps form the hydrophobic substrate-binding cavity, with the two aspartyl residues at its bottom. HIV-1 protease is highly selective and very catalytically active in hydrolyzing peptide bonds. Although its mechanism shares many known features of aspartic proteases, the detailed mechanism of this enzyme is not yet fully understood.
2.3.1. Saquinavir and Nelfinavir—HIV-1 protease inhibitors and their ensuing resistance
The ideal HIV-1 protease inhibitor should be potent and selective for HIV-1 protease over other mammalian aspartic proteases. It should also have good bioavailability and a sufficient duration of action in the human body. There were no known inhibitors of HIV-1 protease when it was first identified as a good target for antiviral therapy, so a logical starting point was the enzyme class to which HIV-1 protease belongs, the aspartic proteases.
When researchers were designing HIV-1 protease inhibitors, they noted that a stereocenter in the drug correlated with its activity: the transition-state hydroxyl group had to have the R configuration, or the drug completely lost its activity. This discovery led researchers to identify Ro-31-8959, or Saquinavir, as a prime candidate for further study. Saquinavir has an IC50 < 0.37 nM for HIV-1 protease and does not inhibit other aspartic proteases, making it both highly potent and selective. Despite its potency, it shows poor oral bioavailability of only 4%, which researchers attribute to the drug's high molecular weight and large number of amide bonds. Agouron Pharmaceuticals and Lilly Research Laboratories collaborated to produce Nelfinavir. The structure of Nelfinavir is very similar to that of Saquinavir, with a few changes. Labile components of Saquinavir were replaced with a hydroxytoluene amide group; however, this modification reduced potency. The developers then replaced a phenyl group with a phenylthio group, which better fills the hydrophobic pocket of the HIV-1 protease active site. With an IC50 of 2 nM, Nelfinavir is also a very potent HIV-1 protease inhibitor.
As with HIV-1 integrase inhibitors, resistance remains a pressing problem in HIV-1 treatment. Because Saquinavir and Nelfinavir have similar structures, the sets of amino acid substitutions that confer resistance to them are different but highly overlapping. The mutations that affect the binding site for Saquinavir are at G48, I84, or L90. For Nelfinavir, the only difference is that D30 replaces G48. Beyond the binding pocket, other overlapping sets of residues elsewhere in HIV-1 protease also confer resistance when mutated; these sites include L10, M46, L63, A71, and N88. Because many of the HIV-1 protease inhibitors currently on the market are very similar in structure, it is not surprising that there is a high degree of cross-resistance among them.
2.3.2. Predicting HIV-1 protease resistance with docking studies
There are several methods for interpreting the resistance behavior of HIV-1 from genotypic data. Physics-based docking approaches are increasingly used to evaluate the interaction energies of protein-inhibitor complexes, and this technique has been widely applied to study the interactions between HIV-1 protease and its inhibitors. In 2005, Jenwitheesuk and Samudrala used a protein-inhibitor docking approach to determine the correlation between experimentally measured and computationally calculated protease inhibitor binding affinities. They also supplemented their docking with a molecular dynamics (MD) protocol, in part because most docking programs treat the protein as rigid. HIV-1 protease has flaps that move upon binding, and with a rigid target structure the opening and closing of the flaps cannot be modeled. The MD protocol was therefore used to simulate the flexibility of the ligand-enzyme complex. The researchers used X-ray crystal structures of various wild-type HIV-1 protease-inhibitor complexes; for Saquinavir and Nelfinavir, they selected 1HXB and 1OHR, respectively (Figures 3 and 4). They then replaced the wild-type side chains with mutant side chains.
When preparing the inhibitor structures, the researchers treated them as all-atom models, filling empty valences with hydrogen atoms, and allowed all rotatable bonds in the inhibitors to rotate freely. The docking calculations were carried out with AutoDock version 3.0.5 using a Lamarckian genetic algorithm. Genetic algorithms borrow ideas from natural genetics and biological evolution. The ligand's placement relative to the protein is described by a set of values (translation, orientation, and conformation) called state variables, and in the genetic algorithm (GA) each state variable corresponds to a gene. The genotype is given by the ligand's state variables, and the phenotype is given by its atomic coordinates. In molecular docking, the fitness of an individual is the total interaction energy between the ligand and the protein. Individuals are selected from the current generation based on their fitness, random pairs are mated to produce offspring by crossover, and some offspring undergo random mutation. In the Lamarckian variant, some individuals also undergo a local search, and the improved state variables are written back into their genotype so that offspring inherit them. This process is repeated for many generations to find the ligand-protein interaction with the best fitness. In the study by Jenwitheesuk and Samudrala, a total of 27,000 generations were run. AutoDock reports energy terms for the intermolecular energy, the internal energy of the ligand, and the torsional free energy; the researchers computed the final docked energy of each protein-ligand complex as the sum of the intermolecular energy and the internal energy of the ligand.
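The Python sketch below is a toy Lamarckian genetic algorithm, written to illustrate the steps described above rather than to reproduce AutoDock. The genotype is three hypothetical state variables (x, y, and one torsion angle), the "intermolecular" and "internal" energies are made-up smooth functions, and the docked energy is their sum, as in the study. Selection, one-point crossover, and mutation act on genotypes; a simple random local search (in the spirit of, but not identical to, AutoDock's Solis-Wets search) is applied to a fraction of the offspring, and its result is written back into the genotype, which is what makes the algorithm Lamarckian. All parameter values are illustrative.

```python
import math
import random


def e_inter(g):
    """Toy intermolecular energy: a well with minimum -5 at (x, y) = (1, -2)."""
    x, y, _ = g
    return (x - 1.0) ** 2 + (y + 2.0) ** 2 - 5.0


def e_intra(g):
    """Toy ligand internal (torsional strain) energy: minimum 0 at torsion 0."""
    return 0.5 * (1.0 - math.cos(g[2]))


def docked_energy(g):
    """Final docked energy = intermolecular + ligand internal energy."""
    return e_inter(g) + e_intra(g)


def local_search(g, rng, steps=20, step=0.1):
    """Random local search; returns an improved genotype (never a worse one)."""
    best, best_e = list(g), docked_energy(g)
    for _ in range(steps):
        trial = [v + rng.gauss(0.0, step) for v in best]
        e = docked_energy(trial)
        if e < best_e:
            best, best_e = trial, e
    return best


def lamarckian_ga(pop_size=50, generations=100, mutation_rate=0.1,
                  ls_rate=0.06, seed=1):
    rng = random.Random(seed)
    pop = [[rng.uniform(-5, 5), rng.uniform(-5, 5), rng.uniform(-math.pi, math.pi)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=docked_energy)              # lower energy = fitter
        parents = pop[: pop_size // 2]           # selection on fitness
        children = [list(pop[0])]                # elitism: keep the best individual
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, 3)            # one-point crossover between genes
            child = a[:cut] + b[cut:]
            child = [v + rng.gauss(0.0, 0.5) if rng.random() < mutation_rate else v
                     for v in child]             # random mutation
            if rng.random() < ls_rate:
                child = local_search(child, rng)  # Lamarckian write-back
            children.append(child)
        pop = children
    best = min(pop, key=docked_energy)
    return best, docked_energy(best)


if __name__ == "__main__":
    best, energy = lamarckian_ga()
    print("best genotype:", [round(v, 3) for v in best], "energy:", round(energy, 3))
```

Elitism guarantees that the best docked energy never gets worse from one generation to the next; the much larger number of generations used in the study reflects the far more complex energy landscape of a real protein-ligand system.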
Jenwitheesuk and Samudrala observed a significant improvement in the correlation coefficient when they supplemented their docking procedure with MD simulation to account for protein flexibility (the correlation coefficient increased from 0.38 to 0.87). Their docking-with-dynamics protocol was 64% accurate for phenotypically resistant profiles and 83% accurate for phenotypically susceptible groups. An earlier study by Shenderovich et al. followed a similar protocol but used only 50 HIV-1 protease sequences, whereas Jenwitheesuk and Samudrala used 1792; this larger sample could include all of the reported resistance mutations. Jenwitheesuk and Samudrala also added a protein-inhibitor relaxation step to their protocol, which allowed it to account for rearrangement of side chains on the active site surface. The relatively short MD simulation of 0.1 ps had a significant effect on the flap region (which moved away from the binding pocket, RMSD = 0.54 Å) but was not long enough to affect the main chain of the protein. Using this protocol, the resistance and susceptibility predictions for Nelfinavir and Saquinavir were 86% and 94% accurate, respectively.
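The two evaluation measures used above, a correlation coefficient between experimental and calculated affinities and prediction accuracy within resistant and susceptible groups, can be computed as in the Python sketch below. The numbers in the usage example are hypothetical and are not data from the study; the labels "R" and "S" are our own shorthand for resistant and susceptible.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def per_class_accuracy(predicted, observed, label):
    """Fraction of samples observed as `label` that were also predicted as `label`."""
    idx = [i for i, o in enumerate(observed) if o == label]
    return sum(predicted[i] == label for i in idx) / len(idx)


if __name__ == "__main__":
    # Hypothetical binding energies (kcal/mol), not values from the study.
    experimental = [-9.1, -10.4, -11.8, -12.5]
    calculated = [-8.7, -10.9, -11.2, -12.9]
    print("r =", round(pearson_r(experimental, calculated), 3))
    observed = ["R", "S", "S", "S", "R"]
    predicted = ["R", "R", "S", "S", "S"]
    print("resistant accuracy:", per_class_accuracy(predicted, observed, "R"))
    print("susceptible accuracy:", per_class_accuracy(predicted, observed, "S"))
```

Reporting accuracy separately for each group matters because a protocol can score well overall while still missing many resistant cases, as the gap between 64% and 83% above illustrates.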
The study also examined the two key mutations discussed earlier, Asp30Asn and Gly48Val. The docking-with-MD protocol consistently failed to identify these mutations as causes of drug resistance. This suggests that researchers should not rely solely on one method or system when making decisions about therapeutic regimens without consulting other methods, resources, and techniques. The study was nevertheless able to identify other mutations around the binding pocket. The docking-with-MD protocol identified mutations that correspond to high levels of resistance to Amprenavir (another HIV-1 protease inhibitor): I50V and the combinations I84V + L90M and I54V + V82A + I84V + L90M. These mutations also confer cross-resistance to Nelfinavir and Saquinavir.
2.4. Repurposing HIV-1 protease inhibitors
American trypanosomiasis, or Chagas disease, is caused by the protist
In recent years, there has been growing interest in drug repurposing (also known as drug repositioning). The process involves taking known and approved medications, and sometimes drugs discontinued in other trials, and using them for clinical applications other than their intended ones. Drug repurposing is gaining popularity because the number of safe and effective new drugs reaching the pharmaceutical market has declined significantly over the past few decades. Pharmaceutical companies are reluctant to fund research and product development because developing a new drug is a long and costly process. One of the major benefits of repurposing is that it avoids much of the cost of researching and developing a novel drug from scratch.
Bellera et al. reported the computer-aided identification of the approved drugs Clofazimine, Benidipine, and Saquinavir as potential trypanocidal compounds. The major drug target is cruzipain (Cz), the major cysteine protease of the parasite, which is essential for replication of the parasite's intracellular form. Bellera et al. compiled a balanced dataset of 147 compounds: 77 Cz inhibitors and 70 non-inhibitors. They then performed docking studies on Saquinavir, Benidipine, Clofazimine, and the inactive compound verapamil, using PDB structure 1ME4, a crystal structure of Cz in complex with a reversible inhibitor. The compounds were docked with the Lamarckian genetic algorithm, with the active site defined as a 19 × 15 × 15 Å grid box. The researchers performed 100 docking runs for each compound, treating the active site as rigid and the ligands as flexible, and used AutoDock 4.2 for the docking study. The docking results correlated with experimental evidence: the scores for Saquinavir, Benidipine, and Clofazimine were −12.76, −8.42, and −7.36 kcal/mol, respectively, whereas the inactive verapamil scored only −6.37 kcal/mol.
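The separation between active and inactive compounds can be checked directly from the scores reported above. The short Python sketch below ranks the four compounds by docking score (more negative is better) and tests whether every active scores better than every inactive. It does not rerun AutoDock, and with only four compounds it is an illustration rather than a statistical validation.

```python
# Docking scores (kcal/mol) reported by Bellera et al.; more negative = better.
SCORES = {"Saquinavir": -12.76, "Benidipine": -8.42,
          "Clofazimine": -7.36, "verapamil": -6.37}
ACTIVE = {"Saquinavir", "Benidipine", "Clofazimine"}


def rank_by_score(scores):
    """Compound names sorted from most to least favorable score."""
    return sorted(scores, key=scores.get)


def separates_actives(scores, active):
    """True if every active compound scores better than every inactive one."""
    worst_active = max(scores[name] for name in active)
    best_inactive = min(scores[name] for name in scores if name not in active)
    return worst_active < best_inactive


if __name__ == "__main__":
    print(rank_by_score(SCORES))
    print("actives separated from inactives:", separates_actives(SCORES, ACTIVE))
```

In a real virtual screen the same check would be run over the full labeled dataset, typically summarized with an enrichment or ROC metric rather than a single yes/no test.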
3. Identification of medicine for cancer
Cancer is one of the most devastating diseases and a persistent public health threat. As of 2016, cancer was the second leading cause of death in the United States, with an estimated 1,685,210 new cases and 595,690 deaths. Beyond the high incidence, which adds to the pressure on researchers to discover a cure, the mechanisms of the disease add another level of complexity that must be overcome. Many cancer cells lack specific molecular targets, which makes it extremely difficult for anticancer chemotherapeutics to be fully effective. Anticancer therapy can also be toxic to normal tissues, leading to unwanted side effects. Because of these adverse effects, many anticancer chemotherapeutics are given at suboptimal doses, which often results in therapy failure, drug resistance, and metastatic disease. These complications demonstrate the critical need for new anticancer therapies that are effective with minimal undesired reactions. In order to aid in the task, many researchers are turning to
3.1. Docking for identifying novel proteasome inhibitors and understanding the binding mechanisms
A variety of cancer therapeutics already exists and is available to patients, and many of these therapies act on a specific molecular target in order to eradicate cancerous cells. One protein that receives extensive attention because of its pivotal biological role in eukaryotic cells is the proteasome. There are two major forms: the 20S proteasome, which is responsible for intracellular protein degradation, and the 26S proteasome complex, which functions in the ubiquitin pathway as an ATP-dependent proteasome. The 26S proteasome has three proteolytic activities: peptidyl glutamyl peptide hydrolase (PGPH) activity in the β1 subunit, trypsin-like (T-L) activity in the β2 subunit, and chymotrypsin-like (CT-L) activity in the β5 subunit.
Degradation of proteins in the cytoplasm and nucleus of eukaryotic cells can affect: regulation of cellular pathways particularly cell growth and proliferation, apoptosis, DNA repair, transcription, immune responses, and signaling processes . Inhibition of proteasomes has therefore become an attractive target for anticancer therapies. The drug Bortezomib was developed by Millennium Pharmaceuticals Inc. and received regular approval by the Food and Drug Administration in 2005 as the first proteasome inhibitor to be used for the treatment of multiple myeloma . Bortezomib is a peptide boronate inhibitor of the proteasome and it selectively binds to the protein to inhibit its chymotryptic-like activity . The anticancer effects demonstrated by Bortezomib are mainly observed by the inhibition of the transcription factor NF
In the search for more efficacious proteasome inhibitors, molecular docking has become a valuable tool for expediting this demanding process.
Molecular docking has been useful not only for identifying potential proteasome inhibitors but also for understanding how they bind to the proteasome. One study by Zhang et al. focused on MG132 (Z-Leu-Leu-Leu-al), a peptide aldehyde that is a potent and selective proteasome inhibitor. The proteins and ligands were prepared for docking with the Insight II software, and MG132 was then covalently docked to the β5 subunit of the 20S proteasome using GOLD version 4.0. Docking proposed two binding modes for MG132, both with low docking energies. Further analysis, including molecular dynamics (MD) simulations, revealed that binding mode I was more stable than mode II. The resulting model allowed the authors to re-examine the structure–activity relationships of proteasome inhibitors, specifically the interactions at the P2 and P4 sites. Knowing the binding mode is valuable not only for improving existing proteasome inhibitors but also for developing more potent ones.
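A stability comparison like the one between binding modes I and II is often made by tracking how far the docked ligand drifts from its starting pose during the MD run. The sketch below is a minimal illustration, not the protocol Zhang et al. used. It assumes the trajectory frames have already been superimposed on the protein and that each frame is a list of (x, y, z) ligand-atom coordinates. The function names and the toy coordinates are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]
Frame = Sequence[Coord]  # ligand atom coordinates for one MD frame


def rmsd(a: Frame, b: Frame) -> float:
    """RMSD between two matched atom lists (same units as the input).

    Assumes both frames are already superimposed on the protein, so no
    fitting/rotation step is done here.
    """
    if len(a) != len(b) or not a:
        raise ValueError("frames must be non-empty and have the same atoms")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(a, b))
    return math.sqrt(total / len(a))


def mean_drift(frames: List[Frame]) -> float:
    """Average RMSD of each later frame from the starting (docked) pose."""
    if len(frames) < 2:
        raise ValueError("need the docked pose plus at least one MD frame")
    start = frames[0]
    return sum(rmsd(start, f) for f in frames[1:]) / (len(frames) - 1)


def most_stable_mode(trajectories: Dict[str, List[Frame]]) -> str:
    """Name of the binding mode whose ligand drifts least from its docked pose."""
    return min(trajectories, key=lambda name: mean_drift(trajectories[name]))


# Toy two-atom "ligand" trajectories; the numbers are invented for illustration.
mode_i = [[(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
          [(0.1, 0.0, 0.0), (1.6, 0.0, 0.0)],
          [(0.2, 0.0, 0.0), (1.7, 0.0, 0.0)]]
mode_ii = [[(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
           [(1.0, 0.0, 0.0), (2.5, 0.0, 0.0)],
           [(2.0, 0.0, 0.0), (3.5, 0.0, 0.0)]]
print(most_stable_mode({"mode I": mode_i, "mode II": mode_ii}))  # prints: mode I
```

Real analyses usually compute RMSD with a trajectory-analysis package after fitting on the protein backbone. They often also monitor interaction energies, so the single averaged drift value here is a deliberate simplification.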
Ma et al. used the binding mechanism of MG132 as a reference for docking a series of peptide aldehyde derivatives that they synthesized. The 17 peptide aldehydes are listed in Table 2: eight belong to the Cbz class and nine to the Boc class at the R4 position. All 17 were docked with GOLD 4.0 into the β5 subunit of the 20S proteasome, using the crystal structure of the 20S proteasome in complex with MG101, the first known inhibitor. The docking results indicated that the size and length of the P3 side chain are critical to the activity of the peptide aldehydes. In the Cbz series, compounds 3 and 4, which carry Glu(OtBu) residues at the P3 site, were the most active inhibitors. In the Boc series, when the tert-butyl ester at P3 was replaced with a benzyl ester, the Asp(OBzl) residue in compound 10 gave more active inhibition than the Glu(OBzl) residue in compound 12. Also in the Boc series, Ser(OBzl) in compound 15 had the most suitable side-chain length, giving the most active inhibition of the CT-L active site. These results highlight that the P3 substituents are vital for inhibitor potency, which is essential knowledge for designing more effective proteasome inhibitors.
Peptide aldehydes are not the only compounds being considered as proteasome inhibitors for cancer therapy. Santoro et al. investigated whether cationic and anionic porphyrins can act as proteasome inhibitors. Porphyrins are hydrophilic compounds with tumor-localizing properties that are used with red light in photodynamic therapy of tumors. Cationic and anionic porphyrins were docked with AutoDock Vina to the 20S proteasome complexed with Bortezomib (PDB: 2F16). The cationic porphyrin H2T4 demonstrated similar inhibitory activity in all three catalytic sites of the proteasome when observed during
3.2. Docking for identifying inhibitors of CAs
Besides the proteasome, several isoforms of carbonic anhydrase (CA) have become attractive anticancer drug targets. Carbonic anhydrases are ubiquitous metalloenzymes divided into four unrelated gene families: the α-, β-, γ-, and δ-CAs. Mammals have 16 α-CA isozymes that differ in tissue distribution, catalytic activity, and subcellular localization. The α-CAs are of particular interest because their catalytic and inhibition mechanisms are well established. One α-CA, CA IX, is a potential anticancer drug target because it can serve as a biological marker for certain tumors. CA IX is a transmembrane protein with an extracellular catalytic domain, and in normal tissues it is found mainly in the gastrointestinal tract. Under hypoxic conditions, CA IX is overexpressed through hypoxia-inducible factor-1 (HIF-1) and is associated with different types of cancer cells. Its overexpression also lowers the extracellular pH of the tumor, making the tumor environment acidic. The appeal of CA IX as an anticancer drug target lies in its restricted expression in normal tissues (Table 2).
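The link between CA IX activity and tumor acidity follows from the reaction that carbonic anhydrases catalyze, the reversible hydration of carbon dioxide. Each bicarbonate ion formed is accompanied by a released proton:

$$\mathrm{CO_2} + \mathrm{H_2O} \;\rightleftharpoons\; \mathrm{HCO_3^-} + \mathrm{H^+}$$

Because the catalytic domain of CA IX faces the extracellular space, the protons it produces are released outside the cell. This is consistent with the lowered extracellular pH of CA IX-overexpressing tumors.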
| Compounds | R4 position | P3 position | P2 position |
|---|---|---|---|
Amresh et al. used molecular docking and several others
3.3. Docking for identifying inhibitors of EGFR
The epidermal growth factor receptor (EGFR) is another enticing target in the development of anticancer therapeutics. EGFR is a receptor tyrosine kinase (TK) of the ErbB family, whose members regulate many developmental, metabolic, and physiological processes. Binding of epidermal growth factor leads to homodimerization or heterodimerization of the receptor. EGFR gene mutations, gene amplification, and EGFR protein overexpression lead to dysregulated TK activity, which is observed in many tumors. Overexpression of EGFR is frequently observed in breast, lung, ovarian, and prostate cancer and is associated with aggressive tumor behavior. EGFR is a principal activator of downstream survival and growth pathways such as the p42/44 MAPK and PI3K/AKT pathways. Inhibition of these pathways leads to apoptosis of cancer cells, making EGFR a particularly promising target in cancer research.
The mutations G719S, L858R, T790M, G719S/T790M, and T790M/L858R, which alter EGFR kinase activity, are commonly observed in patients with cancer. García-Godoy et al. used molecular docking to study how EGFR inhibitors interact with wild-type and mutant EGFR. For wild-type human EGFR, they used the structure of EGFR in complex with AZD9291 (PDB: 4ZAU). Docking was also performed on EGFR carrying the G719S mutation (PDB: 2ITN) and the L858R mutation (PDB: 2EB3), both in complex with AMP-PNP. The results indicated that in both complexes, Met793 was an important residue for interactions between the ligand and the active site. In the final docking study, the double mutants T790M/L858R and T790M/G719S were examined: the T790M/L858R structure (PDB: 4JR5) was complexed with the ligand 3QY, and the T790M/G719S structure (PDB: 3UG2) was complexed with gefitinib (ligand ID: IRE). In both cases, the results revealed a critical interaction between the ligand and Met793 in the active site of the mutant EGFR. The authors concluded that the interactions observed in each case may help explain why patients with different EGFR mutations are more or less sensitive to certain treatments. This suggests that the choice of therapy should take into account the mutation a patient carries.
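Once docked poses are available, a check for a key interaction such as the ligand–Met793 contact can be scripted. The sketch below assumes the ligand and residue atoms have already been extracted from each docked complex (for example, parsed from a PDB file) into name-to-coordinate dictionaries. The function name and coordinates are hypothetical. The 3.5 Å cutoff is a common rule of thumb for hydrogen-bond heavy-atom distances, not a value taken from García-Godoy et al.

```python
import math
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]


def close_contacts(ligand: Dict[str, Coord],
                   residue: Dict[str, Coord],
                   cutoff: float = 3.5) -> List[Tuple[str, str, float]]:
    """Return (ligand atom, residue atom, distance) for pairs within `cutoff` Å,
    sorted from closest to farthest."""
    hits = []
    for lig_atom, lig_xyz in ligand.items():
        for res_atom, res_xyz in residue.items():
            d = math.dist(lig_xyz, res_xyz)
            if d <= cutoff:
                hits.append((lig_atom, res_atom, round(d, 2)))
    return sorted(hits, key=lambda hit: hit[2])


# Hypothetical coordinates for one docked pose and the Met793 atoms near it.
ligand_pose = {"N1": (0.0, 0.0, 0.0), "C2": (1.4, 0.0, 0.0)}
met793 = {"N": (0.0, 2.9, 0.0), "SD": (6.0, 0.0, 0.0)}

print(close_contacts(ligand_pose, met793))  # [('N1', 'N', 2.9), ('C2', 'N', 3.22)]
```

Running the same check on the wild-type complex and on each mutant complex shows whether a given contact is preserved across structures. Distance alone does not establish a hydrogen bond, because angles and atom types also matter.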
Mahajan et al. recognized the value of EGFR as a target for anticancer therapy and used molecular docking to discover potential EGFR inhibitors. A library of 50,000 compounds was prepared with LigPrep (version 3.3; Schrödinger, LLC, 2015) to be tested by several
3.4. Repurposing approved drugs to anticancer applications
Molecular docking for drug repurposing is another effective method that many researchers use to discover new indications for existing drugs, and it is especially useful when assessing existing pharmaceuticals as potential anticancer therapies. Avastin, originally developed for metastatic colon cancer and non-small cell lung cancer, was later approved for metastatic breast cancer, although the FDA revoked that indication in 2011. Rituxan, originally intended for non-Hodgkin's lymphoma, has been repurposed for chronic lymphocytic leukemia and rheumatoid arthritis. Using molecular docking to predict the physical interactions between a ligand and its target has been a successful practice in drug repurposing.
Avastin and Rituxan are not the only drugs that have been repurposed for cancer therapy. Oliva et al. used molecular docking to support the repurposing of the FDA-approved psychotropic drug Chlorpromazine, which had shown antiproliferative activity against colon and brain tumors. The drug exerts this activity by inhibiting cytochrome c oxidase (CcO), the 13-subunit terminal enzyme of the mitochondrial respiratory chain, which transfers electrons to oxygen [74, 75]. Cytochrome c oxidase subunit 4 isoform 1 (COX4-1) was the focus of the study because, in patients with glioblastoma, increased COX4-1 expression has been associated with Temozolomide chemoresistance.
4. Identification of medicine for other prevalent diseases
Influenza, commonly called the flu, is a viral infection that can be mild or severe depending on the strain and the host it infects. Because the influenza virus mutates rapidly, new vaccines must be made and administered annually. Each year, researchers must predict which strains of the influenza virus are most likely to prevail in the coming flu season, and annual vaccines are manufactured based on those recommendations. Unfortunately, the virus may mutate after that decision has been made, rendering the vaccines ineffective. In that case, flu outbreaks and even pandemics may occur. In a pandemic, vaccination may no longer be a feasible option, and antiviral agents become a critical resource.
Two classes of antiviral drugs have been used to treat influenza. The first marketed influenza antivirals were the adamantanes, specifically Amantadine and Rimantadine (Figure 8A, B), which block the M2 proton channel. This class was effective against influenza A, but drug resistance developed rapidly [79, 80]. Hayden et al. recovered 17 Rimantadine-resistant influenza strains from 13 patients. They compared the M2 coding sequences of these resistant strains with those of 8 drug-sensitive strains and found that every resistant strain had a nonsynonymous substitution in RNA segment 7. The most common mutation, S31N, was found in 14 separate isolates; the other mutations were A30V, A30T, and V27A. By 2009, most circulating influenza A strains had become resistant to adamantanes.
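The comparison Hayden et al. performed can be sketched at the protein level: align M2 sequences from resistant and sensitive isolates and report each amino-acid change in the standard wild-type/position/mutant notation. The sketch assumes the sequences are already translated and aligned. The five-residue stretch below is a toy fragment numbered from residue 27 so that its positions match the mutations named above, and the function name is hypothetical.

```python
from typing import List


def substitutions(reference: str, variant: str, first_position: int = 1) -> List[str]:
    """List amino-acid substitutions between two aligned protein sequences.

    Uses <wild-type><position><mutant> notation, e.g. 'S31N'. Positions assume
    the reference itself contains no gaps; columns with a gap ('-') are skipped
    rather than reported as insertions or deletions.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to the same length")
    return [
        f"{ref}{pos}{alt}"
        for pos, (ref, alt) in enumerate(zip(reference, variant), start=first_position)
        if ref != alt and "-" not in (ref, alt)
    ]


# Toy five-residue stretch numbered from position 27 (illustrative only).
sensitive = "VIAAS"
print(substitutions(sensitive, "VIAAN", first_position=27))  # ['S31N']
print(substitutions(sensitive, "AIATS", first_position=27))  # ['V27A', 'A30T']
```

Mutations identified this way can then be introduced into a structural model of the target to test by docking whether they weaken drug binding.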
The second class of influenza drugs is the neuraminidase inhibitors. Neuraminidase, also called sialidase, is an enzyme involved in the release of viral progeny. At the end of the viral replication cycle, neuraminidase cleaves terminal sialic acid residues (N-acetylneuraminic acid, Neu5Ac) from host-cell glycoconjugates during budding, releasing progeny virions that go on to infect other cells. Because inhibiting this enzyme greatly reduces the spread of the virus through the body, it is an attractive drug target. Two neuraminidase inhibitors are currently on the market: Zanamivir (Relenza) and Oseltamivir (Tamiflu). Zanamivir (4-guanidino-Neu5Ac2en) was created by computer-assisted rational design based on the X-ray structure of influenza neuraminidase, first solved by Varghese et al. (now PDB: 7NN9). In further studies, Colman et al. characterized the active site of this protein, identifying a large pocket containing “an unusually large number of charged residues,” including R119 and E120 (see the note on residue numbering). Von Itzstein et al. used the GRID software to analyze the active site of influenza neuraminidase and its interactions with various novel inhibitors. The inhibitor with the most energetically favorable interactions was 4-guanidino-Neu5Ac2en, now known as Zanamivir; one of the terminal amino groups of its guanidino group interacted with the carboxyl group of glutamic acid 119 (Figure 9A, B). Von Itzstein et al. then tested Zanamivir in influenza-infected ferrets and mice, validating the results of their computational studies. Hayden et al. conducted randomized, double-blind trials that concluded Zanamivir was both effective and safe for treating influenza A and B. The drug received FDA approval in 1999 and has since been used alongside annual vaccines to prevent and minimize influenza outbreaks.
Malaria is an infectious disease caused by parasitic protists and spread by mosquitoes. There are several species of this parasite; the most deadly and most prevalent is Plasmodium falciparum.
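GRID evaluates the interaction energy of a chemical probe group at each point of a three-dimensional lattice laid over the target. Favorable regions suggest where ligand groups should be placed. The sketch below is a deliberately crude stand-in for that idea and is not GRID's energy function, which also includes Lennard-Jones and hydrogen-bonding terms. It keeps only a screened Coulomb term and a hard distance exclusion. The charges, dielectric constant, grid settings, and function names are all assumptions. The sketch shows why a positively charged probe, such as a guanidinium nitrogen, is drawn toward a negatively charged glutamate carboxylate.

```python
import itertools
import math
from typing import List, Optional, Tuple

Coord = Tuple[float, float, float]
COULOMB = 332.06  # kcal·Å/(mol·e²): converts e²/Å into kcal/mol


def probe_energy(point: Coord, charges: List[Tuple[Coord, float]],
                 probe_charge: float, dielectric: float = 4.0) -> float:
    """Screened Coulomb energy (kcal/mol) of a point-charge probe at `point`."""
    return sum(COULOMB * probe_charge * q / (dielectric * math.dist(point, pos))
               for pos, q in charges)


def best_probe_point(charges: List[Tuple[Coord, float]], probe_charge: float = 1.0,
                     extent: float = 4.0, spacing: float = 0.5,
                     min_dist: float = 2.5) -> Optional[Tuple[Coord, float]]:
    """Scan a cubic grid (±extent Å, step `spacing`) and return the most favorable point."""
    steps = int(round(2 * extent / spacing)) + 1
    axis = [-extent + i * spacing for i in range(steps)]
    best = None
    for point in itertools.product(axis, repeat=3):
        # Crude steric exclusion: skip grid points closer than min_dist to any atom.
        if min(math.dist(point, pos) for pos, _ in charges) < min_dist:
            continue
        energy = probe_energy(point, charges, probe_charge)
        if best is None or energy < best[1]:
            best = (point, energy)
    return best


# Hypothetical site: a carboxylate-like charge (-1) and an arginine-like charge (+1).
site = [((-3.0, 0.0, 0.0), -1.0), ((3.0, 0.0, 0.0), 1.0)]
point, energy = best_probe_point(site)
print(point, round(energy, 1))
```

The most favorable grid point lies next to the negative charge. This is the kind of hot spot a designer would try to fill with a cationic group.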
The lactate dehydrogenase enzyme of
The Zika virus (ZIKV), named after the Ugandan forest in which it was originally found, was first isolated from monkeys. ZIKV belongs to the genus Flavivirus, which also includes dengue, yellow fever, and West Nile viruses; hepatitis C virus belongs to the same family (Flaviviridae) but to a different genus. ZIKV can be transmitted by mosquitoes or through sexual contact. Symptoms of infection include fever, joint pain, and rash lasting up to 7 days. ZIKV has also been associated with Guillain-Barré syndrome, an autoimmune disease. The virus can also be transmitted from mother to fetus, which can result in severe birth defects. From 2007 to 2014, several small outbreaks of the virus were reported [97, 98, 99], and in 2015 a major ZIKV epidemic began in Brazil. As outbreaks become more severe, finding a drug to treat ZIKV is increasingly urgent.
Non-structural protein 5 methyltransferase (NS5 MTase) is crucial for the stability of the flaviviral genome and for the virus's ability to evade the immune response, which makes it an attractive antiviral target. Zhang et al. used docking simulations (AutoDock 4.2) to explore designs and binding sites for novel NS5 MTase inhibitors. They found that compound 10, a dengue virus inhibitor reported by Lim et al. (PDB: 3P8Z), may bind to ZIKV NS5 MTase. Ramharack and Soliman combined several computational tools in their study. Preliminary methods included homology modeling, binding site prediction, and pharmacophore modeling. To narrow down the results, they used molecular docking (AutoDock Vina). Of 31 compounds subjected to docking, 3 were chosen for the next step, molecular dynamics simulation, which showed that two of them had “substantial stability in complex with the target enzyme (ZIKV NS5).”
Hepatitis C virus (HCV) is related to ZIKV; both belong to the family Flaviviridae. Hepatitis C is commonly treated with polymerase inhibitors such as Ribavirin and Sofosbuvir. Sacramento et al. used computational modeling (MODELLER 9.16) to model the binding of these HCV polymerase inhibitors to Zika RNA polymerase (PDB: 4WTG). These simulations, as well as their
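A docking run with AutoDock Vina is typically driven by a plain-text configuration file. The file names the receptor and ligand (both in PDBQT format) and defines a search box around the binding site. Vina then writes each predicted pose with a `REMARK VINA RESULT:` line giving its affinity in kcal/mol. The helper below is a minimal sketch that writes such a file and reads the affinities back. The file names, box center, and box size are hypothetical placeholders, not values from Ramharack and Soliman.

```python
import re
from typing import List, Sequence


def vina_config(receptor: str, ligand: str, center: Sequence[float],
                size: Sequence[float], exhaustiveness: int = 8,
                num_modes: int = 9, out: str = "out.pdbqt") -> str:
    """Build the text of an AutoDock Vina configuration file (box in Å)."""
    cx, cy, cz = center
    sx, sy, sz = size
    lines = [
        f"receptor = {receptor}",
        f"ligand = {ligand}",
        f"center_x = {cx}", f"center_y = {cy}", f"center_z = {cz}",
        f"size_x = {sx}", f"size_y = {sy}", f"size_z = {sz}",
        f"exhaustiveness = {exhaustiveness}",
        f"num_modes = {num_modes}",
        f"out = {out}",
    ]
    return "\n".join(lines) + "\n"


# Matches the affinity (first number) on each pose's result line.
VINA_RESULT = re.compile(r"^REMARK VINA RESULT:\s+(-?\d+(?:\.\d+)?)", re.MULTILINE)


def vina_affinities(pdbqt_text: str) -> List[float]:
    """Extract the predicted affinity (kcal/mol) of every pose in Vina output."""
    return [float(value) for value in VINA_RESULT.findall(pdbqt_text)]


# Hypothetical inputs for illustration only.
config = vina_config("ns5_mtase.pdbqt", "candidate.pdbqt",
                     center=(10.5, -2.0, 33.1), size=(20, 20, 20))
sample_output = (
    "MODEL 1\nREMARK VINA RESULT:    -9.1      0.000      0.000\nENDMDL\n"
    "MODEL 2\nREMARK VINA RESULT:    -8.4      1.732      2.561\nENDMDL\n"
)
print(vina_affinities(sample_output))  # [-9.1, -8.4]
```

The configuration text would be saved to a file and passed to the command-line program as `vina --config conf.txt`. Ranking many ligands then amounts to collecting the best (most negative) affinity from each output file.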
Tuberculosis (TB) is an infectious disease caused by Mycobacterium tuberculosis.
Shikimate kinase is an enzyme involved in an amino acid biosynthesis pathway in M. tuberculosis.
Another response to drug resistance is drug repurposing. Because approved drugs have already been shown not to have severe side effects, repurposing speeds up development and saves money. Such studies often use molecular docking and other computational methods to save further time and money by screening more candidate drugs in a shorter time frame. Kahlous et al. selected 1,991 FDA-approved (non-antibiotic) drugs and tested them for antibiotic activity against M. tuberculosis.
5. Summary and discussion
The cost of bringing a drug to market and the growing number of drug resistance profiles are major factors that researchers need to address when designing a drug. Bringing a drug to market can cost more than 1 billion dollars and take 10 years. Databases such as PubChem and ChEMBL now hold millions of compounds, and screening all of them in the laboratory, manually or automatically, would take months or even years, even if every compound could be obtained. More and more researchers are therefore turning to computational methods to design drugs efficiently, with the potential to save pharmaceutical companies money. While these
To summarize the cases reported above (see Table 3):
- Most of these projects aimed to identify new inhibitors of an enzyme that plays an essential role in a key metabolic or proliferation pathway or in the infection process of a pathogen.
- One or more high-resolution PDB structures of the target protein were used, and the docking results often revealed the key residues involved in catalysis, binding or inhibition mechanisms, and drug resistance.
- Other computational methods or tools, such as structure prediction, binding site prediction, pharmacophore models, QSAR models, and MD simulation, were used in sequence or in parallel.
- Drug repurposing has received increasing attention.
| Disease | Target protein | PDB ID | Docking software | Drug(s) | Purpose | Other computational method(s) used |
|---|---|---|---|---|---|---|
| HIV | Integrase | 1QS4, 1BIS, 1BLE | GOLD | S-1360 | To predict the binding mode | |
| HIV | Protease | 1HXB, 1OHR | AutoDock | Saquinavir, Nelfinavir | To predict drug resistance | MD simulation |
| Cancer | Proteasome | 2F16 | Glide, GOLD | PI-083, MG132, peptide aldehydes | To identify new drugs; to understand the binding mechanism | MD simulation |
| Cancer | Carbonic anhydrase IX | 3IAI | AutoDock | ZINC03363328, ZINC08828920, ZINC12941947, ZINC03622539, ZINC1665054 | To discover inhibitors | Post-docking energy minimization |
| Cancer | EGFR | 2ITN and others | AutoDock, jMetalCpp | AMP-PNP, Dacomitinib, and others | To study the effects of mutations | Optimization algorithms |
| Influenza | Neuraminidase | 7NN9 | GRID | Zanamivir | To analyze the active site | |
| Malaria | M18 aspartyl aminopeptidase | 4EME | GOLD, Glide | CHEMBL588000 and others | To identify new drugs | Pharmacophore and QSAR models |
| Malaria | Lactate dehydrogenase | 1LDG | MolDock | Itraconazole, Atorvastatin, Posaconazole | To select potential drugs | |
| Zika | NS5 MTase | 3P8Z | AutoDock | New candidates | To identify new drugs | MD simulation |
| Zika | RNA polymerase | 4WTG | MODELLER | Ribavirin, Sofosbuvir | To model and compare ligand binding | |
| TB | Shikimate kinase | 2DFN and others | MolDock | New candidates | To identify new inhibitors | |
| TB | Shikimate kinase | 2XCS, 2XCT, 2FUM | OpenEye HYBRID, Glide | Diclofenac and others | Drug repurposing | |
Enzymes and membrane proteins (receptors) are the two major classes of drug targets. According to previous studies, the structures deposited in the PDB are severely biased [118, 119]: a large proportion of solved structures belong to soluble proteins, especially enzymes. This bias not only makes enzyme structures easier to obtain for molecular docking but also makes the scoring functions and force fields of docking and related computational approaches more accurate for enzymes than for membrane proteins. However, membrane receptors, glycoproteins, and non-structural proteins are also important drug targets. Developing a reliable strategy to determine or predict the structures of these targets remains a major challenge in molecular docking.
Drug resistance is also a major cause of treatment failure in both cancers and infectious diseases. As docking calculations advance, it may become possible to predict likely drug resistance and side effects before treatment, or even before drug approval, so that backup drugs can be developed and made available before resistance emerges. The improved reliability of molecular docking also facilitates precision medicine.
As Figure 1 shows, computational approaches play key roles in different steps of the drug discovery process: obtaining protein structures, binding site prediction, virtual drug screening, binding verification, binding affinity estimation, prediction of drug resistance, binding kinetic modeling, and so on. Molecular docking helps achieve many of these objectives effectively and efficiently. It is often cheaper and faster than laboratory experiments, and it can do things that conventional approaches cannot, such as predicting potential side effects or drug resistance. Other computational tools, such as 3D structure and binding site prediction models, molecular dynamics simulation, and kinetic modeling, are also well established. They are applied at different steps of drug discovery to provide more information on target proteins or drug efficacy, to narrow the search space and reduce the computational load, and to validate docking results. Moreover, drug repurposing is another important application of molecular docking that makes drug development more cost- and time-effective.
In the era of “big data,” the growing number of protein structures and improved computational software and hardware have advanced all related computational methods, not just molecular docking. With progress in understanding protein folding, structural flexibility, and molecular recognition, molecular docking has matured. As the core technology of virtual drug discovery, molecular docking will be widely applied at many stages of the drug discovery process.
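One simple way docking can hint at resistance is to dock the same drug into wild-type and mutant models of the target and compare the scores. A mutation that makes the score noticeably less favorable is a candidate resistance mutation. The sketch below assumes scores are docking energies in kcal/mol, where more negative means tighter predicted binding. The 1.0 kcal/mol threshold, the mutant labels, and the scores are hypothetical. Docking scores are rough estimates, so flagged mutations would still need validation by MD simulation or experiment.

```python
from typing import Dict, List


def flag_possible_resistance(wild_type_score: float,
                             mutant_scores: Dict[str, float],
                             threshold: float = 1.0) -> List[str]:
    """Return mutants whose docking score is worse than wild type by more than `threshold`.

    delta = mutant score - wild-type score; with energies where more negative
    means tighter binding, delta > 0 means the mutation weakens predicted binding.
    """
    flagged = []
    for mutant, score in sorted(mutant_scores.items()):
        delta = score - wild_type_score
        if delta > threshold:
            flagged.append(mutant)
    return flagged


# Hypothetical scores for one drug docked to wild-type and three mutant models.
print(flag_possible_resistance(-10.0, {"mut_A": -9.5, "mut_B": -8.2, "mut_C": -10.4}))
# prints ['mut_B']
```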
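Narrowing the search space is often organized as a funnel. Cheap filters remove unsuitable compounds before the more expensive docking step, and only the best-scoring survivors go on to MD simulation or laboratory testing. The sketch below illustrates that ordering under stated assumptions. Lipinski's rule of five stands in for the pre-filter, and molecular properties are assumed to have been computed already. `dock` is a hypothetical callable returning a docking score in kcal/mol (more negative is better); in practice it would wrap a docking program. The compound names and values are invented.

```python
from typing import Callable, Dict, List, Tuple


def passes_rule_of_five(props: Dict[str, float], max_violations: int = 1) -> bool:
    """Lipinski filter: MW <= 500, logP <= 5, H-bond donors <= 5, acceptors <= 10."""
    violations = sum([
        props["mw"] > 500,
        props["logp"] > 5,
        props["hbd"] > 5,
        props["hba"] > 10,
    ])
    return violations <= max_violations


def screen(library: Dict[str, Dict[str, float]],
           dock: Callable[[str], float],
           top_n: int = 3) -> List[Tuple[str, float]]:
    """Filter the library, dock the survivors, and return the best `top_n` by score."""
    candidates = [name for name, props in library.items() if passes_rule_of_five(props)]
    scored = [(name, dock(name)) for name in candidates]  # expensive step runs on fewer compounds
    scored.sort(key=lambda pair: pair[1])  # most negative (best) score first
    return scored[:top_n]


# Hypothetical library and precomputed scores standing in for real docking runs.
library = {
    "cpd1": {"mw": 350, "logp": 2.1, "hbd": 2, "hba": 5},
    "cpd2": {"mw": 720, "logp": 6.3, "hbd": 4, "hba": 9},  # two violations: filtered out
    "cpd3": {"mw": 410, "logp": 3.0, "hbd": 1, "hba": 6},
}
fake_scores = {"cpd1": -7.2, "cpd2": -11.0, "cpd3": -8.5}
print(screen(library, fake_scores.__getitem__, top_n=2))  # [('cpd3', -8.5), ('cpd1', -7.2)]
```

Placing the inexpensive filter first is the design choice that saves computation: docking, and any later MD simulation, only runs on the compounds that survive. The trade-off is that some genuine binders outside the filter's property ranges are discarded.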
- Colman et al. (1983) refers to Arg 119 and Glu 120 as Arg 118 and Glu 119. This text uses the more up to date numbering used in .