Molecular interactions, including protein-protein, enzyme-substrate, protein-nucleic acid, drug-protein, and drug-nucleic acid interactions, play important roles in many essential biological processes, such as signal transduction, transport, cell regulation, gene expression control, enzyme inhibition, antibody–antigen recognition, and even the assembly of multi-domain proteins. These interactions very often lead to the formation of stable protein–protein or protein-ligand complexes that are essential for their biological functions. The tertiary structure of proteins is necessary to understand the binding mode and affinity between interacting molecules. However, it is often difficult and expensive to obtain complex structures by experimental methods such as X-ray crystallography or NMR. Thus, docking computation is considered an important approach for understanding protein-protein and protein-ligand interactions [1-3]. The number of three-dimensional protein structures determined by experimental techniques keeps growing: structure databases such as the Protein Data Bank (PDB) and the Worldwide Protein Data Bank (wwPDB) hold over 88000 protein structures, many of which play vital roles in critical metabolic pathways and may be regarded as potential therapeutic targets. In addition, specific databases containing structures of binary complexes, together with information about their binding affinities, have become available, such as PDBbind, PLD, AffinDB, and BindingDB. As a result, molecular docking procedures continue to improve and are becoming more important than ever.
Molecular docking is a widely used computer simulation procedure to predict the conformation of a receptor-ligand complex, where the receptor is usually a protein or a nucleic acid molecule and the ligand is either a small molecule or another protein (Figure 1).
The accurate prediction of the binding modes between the ligand and protein is of fundamental importance in modern structure-based drug design. The most important application of docking software is virtual screening, in which the most interesting and promising molecules are selected from an existing database for further research. This places demands on the computational method used: it must be fast and reliable. Another application is the study of molecular complexes.
Since the pioneering work of Kuntz et al. during the early 1980s, significant progress has been made in docking research to improve computational speed and accuracy. Over recent years, several important steps beyond this point have been taken. Efficiently handling the flexibility of the protein receptor is currently considered one of the major challenges in the field of docking. The binding-site location and binding orientation can be greatly influenced by protein flexibility. In fact, X-ray structure determination of protein–ligand complexes frequently reveals ligands with a buried surface area in the range of 70–100%, which can only be achieved as a consequence of protein flexibility. There are many interesting docking suites and algorithms that have shown significant progress in predicting near-native binding poses by using biophysical and biochemical information in combination with bioinformatics.
Modeling the interaction of two molecules is a complex problem. Many forces are involved in the intermolecular association, including hydrophobic, van der Waals, or stacking interactions between aromatic amino acids, hydrogen bonding, and electrostatic forces. Modeling the intermolecular interactions in a ligand-protein complex is difficult because there are many degrees of freedom and because the effect of solvent on the binding association is insufficiently understood. The process of docking a ligand to a binding site tries to mimic the natural course of interaction of the ligand and its receptor via the lowest energy pathway. There are simple methods for docking rigid ligands with rigid receptors and flexible ligands with rigid receptors, but general methods of docking that consider conformationally flexible ligands and receptors are problematic. Docking protocols can be described as a combination of a search algorithm and a scoring function (Figure 2).
The search algorithm should create an optimum number of configurations that include the experimentally determined binding modes. Although a rigorous search algorithm would go through all possible binding modes between the two molecules, such a search would be impractical because of the size of the search space and the time it would take to complete. As a consequence, only a small fraction of the total conformational space can be sampled, so a balance must be reached between the computational expense and the amount of the search space examined. Common search algorithms include molecular dynamics, Monte Carlo methods, genetic algorithms, fragment-based, point complementarity and distance geometry methods, Tabu, and systematic searches. The scoring function, on the other hand, consists of a number of mathematical methods used to predict the strength of the non-covalent interaction, called the binding affinity. In all the computational methodologies, one important problem is the development of an energy scoring function that can rapidly and accurately describe the interaction between the protein and ligand. Several reviews on scoring are available in the literature [10-12].
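The trade-off between sampling and cost described above can be illustrated with a minimal Metropolis Monte Carlo sketch. The function `toy_score`, the location of its minimum, and all parameter values are hypothetical stand-ins for a real scoring function and a real pose representation; the sketch is not taken from any docking program.

```python
import math
import random


def toy_score(pose):
    """Hypothetical scoring function: lower is better, minimum at (1.0, -2.0, 0.5)."""
    target = (1.0, -2.0, 0.5)
    return sum((p - t) ** 2 for p, t in zip(pose, target))


def monte_carlo_search(score, start, steps=2000, step_size=0.5,
                       temperature=1.0, seed=0):
    """Sample poses with the Metropolis criterion and return the best one seen."""
    rng = random.Random(seed)  # fixed seed so that runs are reproducible
    current = list(start)
    current_score = score(current)
    best, best_score = list(current), current_score
    for _ in range(steps):
        # Propose a random perturbation of every coordinate of the pose.
        trial = [x + rng.uniform(-step_size, step_size) for x in current]
        trial_score = score(trial)
        delta = trial_score - current_score
        # Always accept improvements; accept worse poses with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_score = trial, trial_score
            if current_score < best_score:
                best, best_score = list(current), current_score
    return best, best_score


if __name__ == "__main__":
    pose, value = monte_carlo_search(toy_score, start=(0.0, 0.0, 0.0))
    print(pose, value)
```

Only `steps` poses are ever scored, so the quality of the result depends on how well this small sample covers the search space, which is the balance discussed above.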
There are three important applications of scoring functions in molecular docking. The first is to determine the binding mode and site of a ligand on a protein. The second is to predict the absolute binding affinity between protein and ligand, which is particularly important in lead optimization. The third application, and perhaps the most important one in structure-based drug design, is to identify potential drug hits or leads for a given protein target by searching a large ligand database. In recent years, different scoring functions have been developed that exhibit different accuracies and computational efficiencies. The most commonly used types are force-field, empirical, knowledge-based, and consensus scoring.
The protein-ligand docking procedure can typically be divided into two parts: rigid body docking and flexible docking.
Rigid Docking. This approximation treats both the ligand and the receptor as rigid and explores only six degrees of translational and rotational freedom, hence excluding any kind of flexibility. Most docking suites employ a rigid-body docking procedure as a first step.
Flexible Docking. A more common approach is to model the ligand flexibility while assuming a rigid protein receptor, thereby considering only the conformational space of the ligand. Ideally, however, protein flexibility should also be taken into account, and some approaches in this regard have been developed. There are three general categories of algorithms to treat ligand flexibility: systematic methods, random or stochastic methods, and simulation methods. Because of the large size of proteins and their many degrees of freedom, their flexibility may be the most challenging issue in molecular docking. The methods that address protein flexibility can be grouped into soft docking, side-chain flexibility, molecular relaxation, and protein ensemble docking. They were described by Huang et al.
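The six rigid-body degrees of freedom can be made concrete with a short sketch. It assumes, purely for illustration, that a pose is stored as three translations and three Euler angles, and that the rotation is applied about the ligand centroid; the function names and the angle convention are hypothetical and do not come from any particular docking suite.

```python
import math


def rotation_matrix(rx, ry, rz):
    """Return R = Rz * Ry * Rx for rotation angles given in radians."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]


def apply_pose(coords, pose):
    """Rotate the ligand about its centroid, then translate it.

    pose = (tx, ty, tz, rx, ry, rz): the six rigid-body degrees of freedom.
    """
    tx, ty, tz, rx, ry, rz = pose
    shift = (tx, ty, tz)
    n = len(coords)
    centroid = [sum(atom[i] for atom in coords) / n for i in range(3)]
    rot = rotation_matrix(rx, ry, rz)
    moved = []
    for atom in coords:
        local = [atom[i] - centroid[i] for i in range(3)]
        rotated = [sum(rot[i][j] * local[j] for j in range(3)) for i in range(3)]
        moved.append(tuple(rotated[i] + centroid[i] + shift[i] for i in range(3)))
    return moved


if __name__ == "__main__":
    ligand = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]
    print(apply_pose(ligand, (5.0, 0.0, 0.0, 0.0, 0.0, math.pi / 2)))
```

Because the transformation is rigid, all interatomic distances within the ligand are preserved; only its position and orientation relative to the receptor change.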
3. Experimental docking procedures
There are a number of excellent reviews of molecular docking methods and a large number of publications comparing the performance of a variety of molecular docking tools [1-3]. In what follows, we describe the four-step procedure adopted in this study to perform the molecular docking.
3.1. Target selection
Ideally, the target structure should be determined experimentally by either X-ray crystallography or nuclear magnetic resonance and downloaded from the PDB; however, docking has also been performed successfully against models obtained by comparative homology modeling or threading. The model should be of good quality, which can be tested using validation software such as Molprobity. After the model has been selected, it must be prepared by removing the water molecules from the cavity, stabilizing charges, filling in the missing residues, and generating the side chains, all according to the available parameters. At this point the receptor should be biologically active and in a stable state.
3.2. Ligand selection and preparation
The type of ligands chosen for docking will depend on the goal. They can be obtained from various databases, e.g. ZINC and/or PubChem, or they can be sketched by means of the Chemsketch tool. Often it is necessary to apply filters to reduce the number of molecules to be docked. Examples include net charge, molecular weight, polar surface area, solubility, commercial availability, similarity thresholds, pharmacophores, synthetic accessibility, and absorption, distribution, metabolism, excretion, and toxicology properties. In many cases researchers design their own molecules, such as those generated by us in the example that will be described in this work in the section 5.
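The filtering step can be sketched as a simple property filter. The `Ligand` record, the example entries, and the threshold values below are all hypothetical and chosen only for illustration; in practice the properties would be computed by a cheminformatics toolkit and the cut-offs chosen for the project at hand.

```python
from dataclasses import dataclass


@dataclass
class Ligand:
    name: str
    mol_weight: float          # molecular weight in Da
    net_charge: int            # formal net charge
    polar_surface_area: float  # polar surface area in square angstroms


def passes_filters(ligand, max_weight=500.0, max_abs_charge=1, max_psa=140.0):
    """Return True if the ligand satisfies every (hypothetical) threshold."""
    return (
        ligand.mol_weight <= max_weight
        and abs(ligand.net_charge) <= max_abs_charge
        and ligand.polar_surface_area <= max_psa
    )


def filter_library(ligands, **thresholds):
    """Keep only the ligands that pass all property filters."""
    return [lig for lig in ligands if passes_filters(lig, **thresholds)]


if __name__ == "__main__":
    library = [
        Ligand("candidate_a", 320.4, 0, 75.0),   # made-up example entries
        Ligand("candidate_b", 812.9, 0, 90.0),   # too heavy
        Ligand("candidate_c", 410.2, -3, 60.0),  # too highly charged
    ]
    print([lig.name for lig in filter_library(library)])
```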
3.3. Docking
In this step the ligand is docked onto the receptor and the interactions are checked. The scoring function generates a score that is used to select the best-fitting ligand.
3.4. Evaluating docking results
The success of docking algorithms in predicting a ligand binding pose is normally measured in terms of the root-mean-square deviation (RMSD) between the experimentally observed heavy-atom positions of the ligand and those predicted by the algorithm. The flexibility of the system is a major challenge in the search for the correct pose. The number of degrees of freedom included in the conformational search is a central aspect that determines the search efficiency. Performance is usually considered good when the RMSD is less than 2 Å.
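As a minimal sketch of this criterion, the RMSD can be computed directly from two lists of heavy-atom coordinates. The sketch assumes that both lists contain the same atoms in the same order and that no superposition is applied, since docked and experimental poses share the receptor frame; the function names and the example coordinates are illustrative.

```python
import math


def rmsd(predicted, reference):
    """Root-mean-square deviation (in the units of the input, e.g. angstroms)."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (p[i] - r[i]) ** 2
        for p, r in zip(predicted, reference)
        for i in range(3)
    )
    return math.sqrt(squared / len(predicted))


def is_successful_pose(predicted, reference, threshold=2.0):
    """Apply the usual success criterion: RMSD below 2 angstroms."""
    return rmsd(predicted, reference) < threshold


if __name__ == "__main__":
    crystal = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    docked = [(x + 1.5, y, z) for x, y, z in crystal]  # shifted by 1.5 along x
    print(rmsd(docked, crystal), is_successful_pose(docked, crystal))
```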
3.5. Docking software description
There are many algorithms available to assess and rationalize ligand-protein or protein-protein interactions, and their number is constantly increasing. Speed and accuracy are key features for obtaining successful results in docking approaches. Several algorithms share common methodologies with novel extensions focused on obtaining a fast method with accuracy as high as possible. The most common docking programs include AutoDock, DOCK, FlexX, GOLD, ICM, ADAM, DARWIN, DIVALI, and DockVision.
4. Application of molecular docking to a particular case — Biopolymers docked to dengue virus E protein
In recent decades, the incidence of dengue disease has increased dramatically around the world. About 2.5 billion people (two fifths of the world population) are at risk of contracting the disease. Every year, dengue virus (DENV) infects more than 50 million people, with approximately 22 000 fatal cases. The disease is endemic in more than 100 countries of Africa, the Americas, the Eastern Mediterranean, Southeast Asia, and the Western Pacific, with the last two regions being the most affected. Before 1970 only nine countries had suffered epidemics of Hemorrhagic Dengue (HD), a number that had increased more than fourfold by 1995. There are four antigenically distinct, but closely related, serotypes of DENV, which is a Flavivirus member of the family Flaviviridae. Each serotype comprises genotypes with different levels of virulence; nevertheless, the virulence factors are not fully established. A better understanding of the mechanisms and the molecules involved in the key steps of the DENV transmission cycle may lead to the identification of new anti-dengue targets. In fact, the presence of two or more serotypes in the same geographical region implies a growing risk to the population of contracting Hemorrhagic Dengue or Dengue Shock Syndrome (DSS) due to a phenomenon known as Antibody-Dependent Enhancement (ADE). As a result, the diagnosis and treatment of dengue disease has become a global problem. The mature DENV virion contains three structural proteins: capsid protein (C), membrane protein (M), and envelope protein (E). In particular, the DENV E glycoprotein (51-60 kDa, ~495 aa), found on the viral surface, is important in the initial attachment of the viral particle to the host cell, as it contains two N-linked glycosylation sites at Asn-67 and Asn-153.
While the glycosylation site at position 153 is conserved in most flaviviruses, the site at position 67 is thought to be unique to dengue virus. N-linked oligosaccharide side chains on flavivirus E proteins have been associated with viral morphogenesis, infectivity, and tropism [27, 28]. In addition, E protein is closely associated with the lipid envelope and contains one or more cellular receptor-binding sites and a fusion peptide. It is found as a homodimer on the surface of the mature virion, and inside the cell it forms a prM-E heterodimer with the prM protein. E protein is the principal component of the virion surface and contains the antigenic determinants (epitopes) responsible for the neutralization of the virus and the hemagglutination of erythrocytes, thereby inducing an immunological response in the infected host. On native virions, the elongated three-domain E molecule is positioned tangentially to the virus envelope in a head-to-tail homodimeric conformation. Upon penetration of the virion into the target cell endosome, E dimers are converted into stable target-cell membrane-inserted homotrimers that reorient themselves vertically to promote virus-cell fusion at low pH. Furthermore, there is a great deal of evidence that E protein contains the majority of molecular markers for pathogenicity. Comparison of the nucleotide sequence of the E protein gene in different flaviviruses has demonstrated perfect conservation of 12 cysteine residues, which form six disulfide bridges. The structural model for the E protein was refined by Mandl and co-workers, who correlated the structural properties of different epitopes with disulfide bonds.
4.1. Biopolymers as potential adjuvant carriers
The aim of this work is to study the docking of monomers of polyvinylpyrrolidone (PVP), chitosan (CS), and chitosan-tripropylphosphate-chitosan (CS/TPP/CS) with the E protein of dengue virus in order to use them as potential adjuvant carriers. Given their structure, these polymers have specific molecular anchor sites that are expected to be exploited to induce antigenic specificity to the conserved regions of dengue virus. Several authors report that the E protein produces immunity and confers protection against infection in mice with low levels of neutralizing antibodies [33-35]. Because of its dual role in receptor binding and in cell entry through membrane fusion, the E protein, apart from being the most exposed protein, is the main target against which neutralizing antibodies are produced to inhibit its functions.
At present, the biggest challenge in developing an efficient dengue vaccine is to achieve a life-long protective immune response to all 4 serotypes (DEN1-4) simultaneously. Although several vaccines are currently being developed, so far only a live attenuated chimeric dengue-yellow fever (YF) vaccine has reached phase 3 clinical trials. The candidate vaccines can be divided into the following types: (a) live attenuated, (b) DEN-DEN and DEN-YF live chimeric virus, (c) inactivated whole virus, (d) live recombinant, (e) DNA, and (f) subunit vaccines.
Chitosan is a polycationic polymer composed of D-glucosamine and N-acetyl-D-glucosamine linked by β(1,4)-glycosidic bonds. It is produced by deacetylation of chitin, which is extracted from the shells of crabs and shrimp. It is a linear, hydrophilic, positively charged, water-soluble biopolymer that can form thin films, hydrogels, porous scaffolds, fibers, and micro- and nanoparticles under mildly acidic conditions. As a polycationic polymer, it associates readily with macromolecules such as insulin, pDNA, siRNA, and heparin, among others, and with antigenic molecules, protecting them in turn from hydrolytic and enzymatic degradation.
Polyvinylpyrrolidone (N-vinyl-2-pyrrolidone, PVP) has chemical, physical, and physiological properties that have been exploited in various industries, including but not limited to the medical, pharmaceutical, cosmetic, food, and textile industries, owing to its biological compatibility, low toxicity, tackiness, resistance to thermal degradation in solution, and inert behavior in salt and acidic solutions [38, 39]. It is a water-soluble homopolymer with a wide range of molecular weights (2.5 to 1,200 kDa), chain lengths between 12 and 1350 monomers, and end-to-end distances ranging from 2.3 to 93 nanometers. It is physically and chemically stable: it tolerates heating in air for up to 16 hours at 100 °C, and shows no change of appearance after 2 months at 24 °C in 15% HCl. When heated with strong bases such as lithium carbonate, trisodium phosphate, or sodium metasilicate, it generates a precipitate due to ring opening and subsequent crosslinking of chains. Yen-Jen et al. studied its effect as a drug deliverer and intracellular acceptor.
4.1.1. Molecular docking
In this work, the molecular docking calculations were performed using the AutoDock program. It uses a Lamarckian genetic algorithm (LGA) and a force-field function based approximately on the AMBER force field, which consists of five terms: 1) a Lennard-Jones 12-6 dispersion term, 2) a directional 12-10 hydrogen-bonding term, 3) a Coulomb electrostatic potential, 4) an entropic term, and 5) a pairwise desolvation term. The scaling factors of these terms are empirically calibrated using a set of 30 structurally known protein-ligand complexes whose affinities have been experimentally determined. The AutoDock program has become widely used because of its good precision and high versatility; moreover, the latest version of AutoDock (version 4.0) added flexibility for the side chains in the receptor.
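The functional form of the first three terms can be sketched as follows. This is an illustration of 12-6, 12-10, and Coulomb pair potentials under simplifying assumptions, not AutoDock's implementation: the directional weighting of the hydrogen bond, the distance-dependent dielectric, the entropic and desolvation terms, and the empirical scaling factors are all omitted, and the parameter values in the example are hypothetical.

```python
def lennard_jones_12_6(r, well_depth, r_eq):
    """12-6 dispersion/repulsion term with minimum -well_depth at r = r_eq."""
    a = well_depth * r_eq ** 12
    b = 2.0 * well_depth * r_eq ** 6
    return a / r ** 12 - b / r ** 6


def hydrogen_bond_12_10(r, well_depth, r_eq):
    """12-10 hydrogen-bond term (no directional weighting in this sketch)."""
    c = 5.0 * well_depth * r_eq ** 12
    d = 6.0 * well_depth * r_eq ** 10
    return c / r ** 12 - d / r ** 10


def coulomb(r, q1, q2, dielectric=4.0):
    """Coulomb energy in kcal/mol for charges in e and r in angstroms.

    A constant dielectric is assumed here purely for simplicity.
    """
    return 332.06 * q1 * q2 / (dielectric * r)


if __name__ == "__main__":
    # Hypothetical parameters: well depth 0.15 kcal/mol, equilibrium distance 4.0 A.
    print(lennard_jones_12_6(4.0, 0.15, 4.0))   # equals -well_depth at r_eq
    print(hydrogen_bond_12_10(1.9, 5.0, 1.9))   # equals -well_depth at r_eq
    print(coulomb(3.0, 0.4, -0.4))              # attractive for opposite charges
```

Both pair potentials are written so that their minimum value is the negative of the well depth at the equilibrium distance, which makes the two parameters easy to interpret.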
4.1.2. Model preparation
In this study we used the E protein of dengue virus. It is a dimer with 394 amino acids (aa) per monomer and, as mentioned before, it is the main component of the dengue virus envelope. E protein enters the cell by fusion with the membrane after a conformational change induced by low pH, which converts the dimer into a trimer in which the fusion peptide between domains II and III is exposed. When the pH is lower than 6.3, the dimers dissociate, domains I and II rotate outwards, and the fusion loop is exposed and interacts with the endosomal membrane of the cell. Domain III then rotates backwards to pull domains I and II, which are already bound to the cell membrane by the fusion peptide, thus bringing the cell membrane together with the membrane of the virus in order to release the RNA [27, 29, 41-45]. It is important to mention that Bressanelli showed that the domains retain the structure they have at neutral pH, but their relative orientation is altered. For best results during the molecular docking process, we optimized the original model of the dengue virus E protein (PDB code 1OKE) through a number of refinement and validation cycles with the Phenix and Molprobity programs, respectively. Figure 3 shows the corrected model of the dimer and trimer.
4.1.3. Ligand preparation
4.1.4. Molecular docking
Molecular docking was performed by means of the AutoDock program, which combines rapid grid-based energy evaluation with an efficient search of torsional freedom. This program uses a semi-empirical free energy force field to evaluate the conformations during the docking simulations. The force field is parameterized using a large number of protein-inhibitor complexes for which the inhibition constants (Ki) are known. The force field evaluates binding in two steps, starting from the ligand and the protein in their separated, unbound conformations. In the first step, the intramolecular energies are estimated for the transition from the unbound state to the conformations adopted in the protein-ligand bound state. In the second step, the intermolecular energies of combining the ligand and the protein in their bound conformations are evaluated. The force field includes six pair-wise evaluations (V) and an estimate of the conformational entropy lost upon binding (ΔSconf):

$$\Delta G = \left(V^{L-L}_{bound} - V^{L-L}_{unbound}\right) + \left(V^{P-P}_{bound} - V^{P-P}_{unbound}\right) + \left(V^{P-L}_{bound} - V^{P-L}_{unbound} + \Delta S_{conf}\right)$$

where L refers to the ligand and P to the “protein” in a ligand-protein docking. Each of the pair-wise energetic terms includes evaluations for dispersion/repulsion, hydrogen bonding, electrostatics, and desolvation.
The calculations can be summarized in the following four steps: (1) preparation of coordinate files using AutoDockTools, (2) pre-calculation of atomic affinities using AutoGrid, (3) docking of ligands using AutoDock, and (4) analysis of the results using AutoDockTools.
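Steps (2) and (3) are normally launched from the command line, and a small driver script can make the sequence explicit. The sketch below assumes that the `autogrid4` and `autodock4` executables are on the path and that the grid and docking parameter files were already written in step (1); the file names are hypothetical, and the script only builds the commands unless `dry_run` is set to `False`.

```python
import subprocess


def autogrid_command(grid_parameter_file):
    """Command for step (2): pre-calculate the atomic affinity grids."""
    log_file = grid_parameter_file.rsplit(".", 1)[0] + ".glg"
    return ["autogrid4", "-p", grid_parameter_file, "-l", log_file]


def autodock_command(docking_parameter_file):
    """Command for step (3): dock the ligand using the pre-calculated grids."""
    log_file = docking_parameter_file.rsplit(".", 1)[0] + ".dlg"
    return ["autodock4", "-p", docking_parameter_file, "-l", log_file]


def run_pipeline(grid_parameter_file, docking_parameter_file, dry_run=True):
    """Build the two commands in order and execute them unless dry_run is True."""
    commands = [
        autogrid_command(grid_parameter_file),
        autodock_command(docking_parameter_file),
    ]
    if not dry_run:
        for command in commands:
            subprocess.run(command, check=True)  # stop at the first failure
    return commands


if __name__ == "__main__":
    # Hypothetical file names, used for illustration only.
    for command in run_pipeline("receptor.gpf", "ligand_receptor.dpf"):
        print(" ".join(command))
```

The resulting docking log file is the input for step (4), the analysis in AutoDockTools.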
4.2.1. Amino acids of interest in the dengue virus infection mechanism
In the loop conformation, several amino acids are involved in the trimerization of the DENV E unit. Of particular interest are those located between domains I and II, including aa268-270 (the kl loop). Also important are the loop for fusion with the host cell, located between domains II and III, which is subsequently exposed in the trimer together with the aa98-111 fusion peptide, and the C-terminus of domain III, which anchors the protein to the virus membrane. Other important amino acids were mentioned by Mazumder, who made a structural analysis of the dengue virus E protein of the 4 serotypes in order to find the conserved and exposed sites as well as the T-cell epitopes. In our study, we additionally considered the sites of interest described by Modis [42, 43] (Table 1).
In addition to the ten conserved regions presented in Table 1, we predicted around 740 E proteins of the 4 serotypes, some of which are also included in Table 1. Their sequence variability was quantified using Shannon's entropy, with values ranging from 0.3 to 1.1 bits.
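For reference, Shannon's entropy of one alignment column is H = -Σ p_i log2(p_i), where p_i is the frequency of residue i in that column; a fully conserved position gives 0 bits. The sketch below computes it for a toy alignment. The sequences are made up for illustration, and the treatment of gaps (counted as an ordinary symbol) is an assumption rather than the procedure used in this study.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the residues observed in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def alignment_entropy(sequences):
    """Per-position entropy for a list of aligned, equal-length sequences."""
    if len({len(seq) for seq in sequences}) != 1:
        raise ValueError("all sequences must have the same aligned length")
    return [column_entropy(column) for column in zip(*sequences)]


if __name__ == "__main__":
    toy_alignment = ["MRCV", "MRCI", "MKCL", "MKCT"]  # hypothetical sequences
    print(alignment_entropy(toy_alignment))
```

In the toy alignment, the first and third positions are fully conserved, the second has two equally frequent residues, and the fourth has four different residues.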
The conserved regions identified are N8-G14, V24-D42, R73-E79, V97-S102, D192-M196, V208-W220, V252-H261, G281-C285, E314-T319, E370-G374, and K394-G399. The hidden amino acids that become exposed in the trimer are M1, H244, K246, G254, G330, and K344, and the exposed residues that become hidden are S16, Q52, Q167, S169, P243, D290, Q293, S331, and E343. It is worth mentioning that we have identified at least 14 conserved negative sites in at least 3 of the 4 serotypes (C3, C60, R73, T189, F213, A267, F306, T319, S376, F392, K394, S424, G445, and V485). The importance of this finding lies in the fact that epitopes with negative sites have been shown to work better as vaccines than those with positive sites, as they are less likely to change because of their functional restrictions.
4.2.2. Dengue virus E protein — PVP docking
The docking of PVP molecules with the E protein of dengue virus showed that the interface of domains I and II was the most energetically favorable site for binding (Figure 5). The interaction between protein and ligand takes place through 8 hydrogen bonds with the Asn124, Lys202, and Asp203 amino acids (Table 2). This region is extremely important for the pivotal role it plays in the conformational changes triggered by low pH, which in turn are closely related to the infectivity of the virus. In particular, the PVP molecule, which interacts with aa124, 202, and 203 in the E protein-BOG ligand complex, could act as a blocker of the activity of the kl hairpin (aa268-270), which is responsible for the conformational changes in the E protein at low pH. In other words, through its proximity to these amino acids it could sterically hinder their function as a hinge, thereby preventing the hinge action between domains I and II, which in turn could stiffen the area. Alternatively, if the BOG ligand is absent, the molecule could be internalized into the hydrophobic pocket and replace it, but further molecular simulations would be required to determine how it would act at low pH, in order to find out whether the conformational changes would appear or be inhibited. PVP is well known to be highly stable at acid pH and high temperatures, so its structural integrity is expected to remain intact. Mutations in the amino acids of the kl loop or hairpin result in an increase of the pH threshold at which conformational changes occur; this is achieved by replacing long hydrophobic side chains with short ones. As a result, the site can be consistently regarded as a potential trigger in the virus replication cycle and a good candidate for inhibition (Figure 5).
4.2.3. Dengue virus E protein — CS docking
The docking of CS molecules with the E protein of dengue virus resulted in an interaction with the interface of domains I and II of the protein (see Figure 6). The CS ligand binds to seven amino acids of the E protein through ten hydrogen bonds (see Table 3). The elongated CS molecule settles into a channel formed in the surface of domain II of the protein. Additionally, it interacts with amino acids near the kl hinge or loop of the domain I and II interface. There is a remarkable similarity between the BOG and NAG complexes. The interactions between amino acids and the CS molecule, which are shown in Table 10 (aa65, 68, 202, 249, 251, 272, 273), suggest that the mechanism of action of this molecule is similar to that of the PVP ligand. Additionally, it is very close to the conserved region V252-H261, which forms a channel in the 4 serotypes. This finding is of the highest importance since the molecule could very well serve as a ligand for the 4 serotypes, and it could be even more useful in the development of a chimeric vaccine with the four domains III of the E protein, similar to the chimeric vaccine developed in India at the International Centre for Genetic Engineering and Biotechnology.
4.2.4. Dengue virus E protein — CS/TPP/CS docking
In this case, we used the CS and TPP monomers, taking into account that the CS units are linked by β(1,4) bonds. As in the E protein–PVP docking, the CS/TPP/CS ligand docked to the E protein between domains I and II, although we observed more interactions in the case of the PVP monomer. Table 4 and Figure 7 illustrate seven interactions between the amino acids and the BOG. The CS-TPP-CS complex interacts with aa49, 124, 126, 200, 202, 203, and 271, and the docking results suggest that these three molecules are attracted the most to the area formed by the hydrophobic pocket, indicating that the latter molecule has a direct interaction with the BOG ligand oxygen.
We have reviewed the key concepts and current experimental procedures, including the recent advances in protein flexibility, ligand sampling, and scoring functions. In addition, challenges and possible future directions were addressed in this chapter. As an example of a protein-ligand study, we analyzed the interaction between the dengue virus E protein and the polyvinylpyrrolidone and chitosan biopolymers, and we confirmed that the PVP, CS, and CS/TPP/CS biopolymers can fulfill the function of adjuvant carriers in the potential development of a chimeric dengue vaccine against the 4 serotypes of dengue virus. Furthermore, the ring-shaped molecules have shown affinity to, or preference for, a site of vital importance in the virus's cycle of infection and replication, which has placed us on the path to developing an inhibitor of the aforementioned conformational changes (see Figures 5-7). Their binding to the E protein is possible due to the great affinity they present to simulated molecules. However, further molecular simulation analysis is required to determine the behavior of the protein in the absence of the BOG ligand or under different environmental conditions at low pH.