Treatment of Hydrogen Bonds in Protein Simulations

The hydrogen bond plays an essential role in maintaining the secondary structures of protein, and an accurate description of hydrogen bond interaction is critical in protein folding simulation. Modern classical force fields treat hydrogen bonding as nonbond‐ ed interaction, which is dominated by electrostatic interaction. However, in the widely used nonpolarizable force fields, atomic charges are fixed and are determined in a mean-field fashion. Applying nonpolarizable AMBER force field in the folding simulations of some short peptides, the native structure cannot be well populated. When polarization effect is introduced into the simulation by utilizing either the on-the-fly charge fitting or the polarizable hydrogen bond model, the native structure becomes prominent in the free energy landscape. These studies highlight the necessity of electrostatic polarization effect in protein simulation.


Introduction
The function and chemical properties of protein are determined by its three-dimensional structure. [1,3] The native structure of protein at physiological temperature and solvent condition, which is located at a free energy minimum, is a result of delicate balance between many competing interactions, including van der Waals forces, hydrophobic interaction, and electrostatic interaction, such as salt bridge and hydrogen bond (H-bond). [4,5] During folding process, the unfolded structure collapses first to minimize the area of hydrophobic surface that is exposed to the solvent and hydrogen bonds begin to form. Hydrogen bonds, providing most of the directional interactions, are the dominant feature in the formation of protein secondary or tertiary structures. They play an indispensable role in the stabilization of the native structure of a protein. α helix and pleated β -sheet were predicted in 1951 by Linus Pauling and Robert Corey on the basis of hydrogen bonding and cooperative criteria. [6] Therefore, correctly describing the thermodynamic properties of inter/intra-protein hydrogen bonds is essential.
Hydrogen bond is formed when an electronegative atom (H-bond acceptor, A) approaches a hydrogen atom bonded to another electronegative atom (H-bond donor, D). There exist many hydrogen bond donors and acceptors in proteins, such as the amide and carbonyl groups of the backbone, as well as the polar functional groups (amides, carboxyl groups, hydroxyl groups, and amines) in the side chains of some amino acids.
Different parameters can shed light on the different aspects of the nature of H-bonds established for a given system. The interaction energy is the best indicator of H-bond strength. The geometry of the H-bond complex, and in particular, the distance and angle around the H-bond also provide a very good indication of such interaction. When a hydrogen bond is formed, the distance between the H and the acceptor (A) atoms has to be smaller than the sum of their corresponding van der Waals radii. Furthermore, the D-H-A angle should be close to 180º. In practice, a cutoff of 120 or 90º is employed. Now, it is well accepted that, in neutral systems, H-bonds formed between O, N, or O/N pairs of atoms are very strong, whereas those formed between C-H and a π system are weak. The formation of an H-bond can usually cause red shift of IR spectroscopic bands, even though improper blue shifts have been also detected.

Quantum chemistry method
Quantum chemistry method is an ideal approach to calculate the energy of a hydrogen bonded system since all electronic and steric effects are fully taken into account. Molecular orbital methods, especially the post Hartree-Fock (HF) methods, are capable of providing an accurate description of systems containing hydrogen bonds. Thus, Møller-Plesset methods such as MP2 [7] can be considered ideal for the study of hydrogen bonds. However, even a single point calculation at MP2 level of medium-sized hydrogen bond systems can be very demanding. Density functional theory (DFT) methods such as B3LYP [8,9] is becoming more and more popular these days. It has been shown that B3LYP calculation provides very good description of strong H-bond systems, comparing with the experimental data or MP2 results. However, Kim et al. have shown that in the case of some systems with weak H-bond such as C-H π or O-H π, B3LYP does not offer the same level of reliability as MP2. [10] The basis set superposition error (BSSE) is another issue to be considered in the computation of H-bond complexes, which arises because of the incompleteness of the basis set functions. Applying the basis set of the complex to each molecule will yield a lower energy than that from the calculation with the basis set for the monomer itself. Therefore, BSSE introduces a nonphysical attraction between the two monomers. It has been shown that the BSSE correction of the potential energy surface leads to sizeable differences in the H-bond lengths for correlated calculations using split-valence basis sets with polarization and diffuse functions.
Although much progress has been made for investigating the nature of H-bonds utilizing quantum chemistry methods, few ab initio calculations have been carried out on the calculation of dynamic properties of hydrogen bond interaction for biomolecules due to relatively large size of systems. Thus, a more efficient computational method is desired.

Molecular mechanics method
Molecular mechanics is a popular computational method, in which the electronic degrees of freedom of the molecules are ignored, and only motions of the nuclei are calculated. The fundamental assumption underlying all molecular mechanics methodologies is the Born-Oppenheimer approximation, which allows separated motions for nuclear and electrons.
In molecular mechanics methods, the molecule of interest and surrounding solvent are typically treated as classical particles interacting through an empirically derived and computationally tractable potential energy function ("force field"). Molecular dynamics or Monte Carlo simulations can provide wealth of information to interpret, complement and design experiments. Most force fields were developed in the early 1980s, such as AMBER, CHARMM, and OPLS. In the first-generation AMBER force field, hydrogen bond energy is explicitly expressed as a 10-12 function as 10 . The inclusion of the first term is to prevent the occurrence of unrealistically short H-bonds and the second term is to "fine tune" the H-bond distances and energies to desired values.
This kind of potential function had not been changed until AMBER94 force field was proposed. In AMBER94, hydrogen bond is incorporated into electrostatics interaction due to the improved performance of the new RESP charge model and new van der Waals parameters. Current AMBER force fields, following the potential energy functional form of AMBER94, comprise bonded (bond, angle, and dihedral) and nonbonded terms (van der Waals and electrostatic interaction), which are parameterized against gas phase quantum chemical calculations, spectroscopic, and thermodynamics experimental data of small model molecules. The potential energy function has the following form.

Deficiency in the treatment of hydrogen bond
Utilizing molecular dynamics simulations to investigate the structural properties of proteins has been an important tool in biological science for decades. With continuous advancing of computer technology, the difficulty in time scale for protein simulation is lessened significantly. For example, D. E. Shaw utilized a specialized supercomputer, Anton, to investigate the folding mechanism of a series of proteins, and the simulation time was extended to microseconds. [11] A 58-residue protein BPTI conducted by Shaw also reached one millisecond. [12] However, successful application of computer simulation is impeded by the accuracy of force field employed. Defects in the potential energy could strongly bias the simulation results toward incorrect conformations.
An important deficiency that exists in current pairwise force fields is lack of polarization. [13 , 16] When an isolated molecule interacts with another molecule, or is placed in an external electric field, its charge distribution will be distorted due to polarization effect. Current force fields are amino-acid specific or mean-field like and, therefore, fail to give accurate representation of the electrostatics of the specific protein environment, which is highly inhomogeneous and protein specific. In a previous work of Zhang et al., [17] a short α -helix cannot fold to its native structure with AMBER03 [18] force field, despite that this force field has been proved to bias helical conformation. [19] In the past two decades, many attempts have been made to explicitly incorporate polarization effects into molecular modeling, [20] such as the fluctuating charge model, [21,22] Drude oscillator, [23,24] induced multipole, [25,29] and quantum mechanical treatment. [30,33]

Improvement in the treatment of hydrogen bonds
Based on a recently proposed fragment-based scheme (molecular fractionation with conjugate caps, MFCC), a new scheme termed polarized protein-specific charge (PPC) model has been proposed for protein dynamics. [34] PPC is derived from first principle quantum calculation and therefore can correctly represent the electronically polarized state of the protein. It has been proven that the polarization effect plays an important role in pK a shifts for ionizable residues, [34] hydrogen bond stability, [35,40] protein folding, and native structure stabilization, [17,41,45] dynamic properties of proteins, [46,48] protein-ligand binding affinity, [47,49,51] and protein-protein recognition specificity. [52] The polarization of backbone hydrogen bonding is found critical to the success of folding simulation of proteins. In those works, the polarization of hydrogen bonds is included in the simulations by performing on-the-fly quantum mechanical calculations to generate dynamically updated atomic charges. However, on-the-fly charge fitting is computationally expensive, in which quantum mechanical calculations must be periodically carried out to obtain new atomic charges for residues involved in hydrogen bond formation or cleavage. It is desirable to employ a simpler but effective method to include polarization of hydrogen bond for protein simulation. Thus, a new polarization model for the main chain hydrogen bond based on an analytical fitting of atomic charges was proposed, which is computationally more efficient. [53] In the fitting, a pair of alanine dipeptides with a backbone hydrogen bond was utilized as the model system, which is shown in Figure 1.
Atomic charges are calculated by a restrained fitting to the electrostatic potential generated from quantum chemistry calculation. [54,56] By varying the donor-acceptor distance but keeping the other degree of freedom fixed, the charge variation of atom N and O with respect to their asymptotic values are shown in Figure 1, both of which can be fitted to a single exponential function. In this approach, charge transfer between residues is not allowed and in which R is the donor-acceptor distance in angstroms. As shown in Figure 1, at the normal hydrogen bond length (∼3 Å), the transferred charge is only about 0.1 e and 0.13 e for N and O atoms, respectively. In the charge fitting procedure, mutual polarization was included. Specifically, in the calculation of one dipeptide, the other one was taken as background charges centered at atomic positions. The calculation was iterated until convergence was reached. When the two dipeptides are infinitely separated, the residues are given the corresponding AMBER charges. Thus, the charge variation can be determined by hydrogen bond length alone. Compared with PPC model, this polarized hydrogen bond (PHB) charge model is more efficient. In the current approach, only the main chain hydrogen bonds are considered. Hydrogen bonds involving side chain atoms, as well as salt bridges, also have strong electro-  static polarization effect, which may be taken into consideration by refined versions of this approach.

Folding of a short helix under the PPC model
Electrostatic polarization and charge transfer play important roles in protein folding. [57,58] However, the fixed PPC from a prefixed structure cannot present the dynamic properties of protein accurately, which undergoes a large conformational change in folding. A straightforward method is performing on-the-fly quantum calculation to generate dynamically updated atomic charge at each step of the molecular dynamics simulations. However, the computational demanding is too expensive. A good approximation is implementing polarization into molecular simulations when hydrogen bonds form or break, which is a good indicator of secondary structure. During simulations, only residues involved in the formation or breaking of hydrogen bonds are subjected to quantum calculation to refit atomic charges, while charges of the rest residues are kept to their previous values.
The folding of a short α -helix 2I9M, [59] which is a "de novo"-designed 17-residue peptide, was investigated by employing the dynamic PPC model in molecular dynamics simulations. For comparison, REMD simulation under AMBER03 force field is also performed. Starting from a linear structure, the peptide was successfully folded to its native state with the low backbone root mean square deviation (RMSD) under PPC model. In contrast, due to the lack of polarization effect, the hydrogen bonds of the peptide are too "loose" to form stable secondary structure under AMBER03 force field, as shown in Figure 2. Thus, polarized protein-specific charge, especially polarized hydrogen bond charge, plays a critical role in the formation of secondary structures.

The thermal stability of a short helix under the PHB model
REMD simulations with 12 replicas (temperatures 267, 283, 300, 328, 353, 380, 409, 439, 471, 506, 542, and 577 K) were also performed for investigating the impact of H-bonds on the thermal stability of the peptide. Comparing with standard molecular dynamics simulation at room temperature, the energy barrier can be surmounted much more easily at high-temperature replicas. [60] Two sets of MD simulations were performed using, respectively, the standard AMBER03 and the PHB model. In the PHB model, only atomic charges of donors and acceptors in hydrogen bonds are different, whereas other force field parameters were taken from the AMBER03 force field. The starting conformation was a linear structure generated using LEaP module in AmberTools and relaxed using energy minimization. The relaxed structure was heated to its target temperature for each replica in 100 ps. The solvation effect was modeled using the generalized Born (GB) model developed by Onufriev et al. [61] MD simulations were extended to 120 ns for each replica.
The structure with RMSD from NMR structure below 2 Å was classified as the folded state. The populations of the folded structure in two REMD simulations show a clear difference, as shown in Figure 3. At the lowest temperature (267 K), the percentage of folded structure using PHB charge model was nearly 30% higher than that in AMBER03 force field. At NMR experimental temperature (283 K), the fraction of folded structure in the PHB model is over 50%, whereas that in AMBER03 force field is only about 35%, despite the overestimated helical propensity of AMBER03 force field.
The melting temperature is defined as the temperature at which the free energies of the native and unfolded states are equal, indicating the equal population (50% each) of folded and unfolded states. In Figure 3, the melting temperature under AMBER03 force field is well below the NMR temperature of 283 K, which is obviously inconsistent with the experimental observation, suggesting 2I9M is thermally unstable. While under PHB model, the melting temperature is about 286 K, which is in agreement with the experiment. This result shows that hydrogen bond polarization is critical to the thermal stability of the helical structure of 2I9M.
The distribution of the peptide structure from the REMD simulation is determined by the free energy distribution, which is calculated by Weighted Histogram Analysis Method (WHAM) [62,63] using density of states. Figure 4 plots the free energy changes along the RMSD at 267, 283, and 300 K using both the AMBER03 and PHB model. The free energy curve under AMBER03 force field is first examined. At 267 K (top figure), the free energy curve has a global minimum at RMSD around 1.6 Å and also a local minimum at an RMSD value around 4 Å (unfolded state). The energy difference between the global minimum and the local minimum is only 0.5 kcal/mol, indicating comparable populations of structures are distributed at these two geometries. While the free energy curve under the PHB model displays two nearly degenerate minima at RMSD near 1.25 and 1.6 Å, both representing the folded structures. The free energy is essentially flat between the two minima. While at RMSD = 3.5 Å, the free energy curve shows a very shallow local minimum, which is about 1 kcal/mol higher than the global minimum. This free energy curve explains why the helix is thermally more stable under the PHB model. By increasing temperatures, the free energy curve becomes flatter and the minimum at 1.25 Å gradually disappears but the minimum at 1.6 Å remains, as shown in the middle and bottom figures of Figure 4. While under AMBER03 force field, comparing the free energy curves at 283 and 300 K with those at 267 K, we observe that the free energy difference between the folded and the unfolded states gradually diminished.
The above analysis implies the importance of polarization of backbone hydrogen bonding to the helix formation and the thermal stability of the helix, which is consistent with the observed effect of hydrogen bond cooperativity. [64,67] The distributions of the number of hydrogen bonds at 267, 283, and 300 K in both REMD simulations are plotted in Figure 5. The number of hydrogen bond is counted using a sigmoidal function of the donor-acceptor distance d as related to the strength of hydrogen bonds. It shows that the peak shifts from 3.8 under the AMBER03 charge to 5.0 under the PHB model at 283 K, indicating that hydrogen bonds are strengthened by including the polarization effect. Decreasing the temperature to 267 K, the difference between the AMBER03 charge and PHB model becomes more remarkable, whereas increasing the temperature to 300 K does the opposite.

Folding of a long helix under the PHB model
2KHK is a 53 amino acid straight long helix belonging to b30 to 82 domain of F1Fo adenosine-5triphosphate synthesis in Escherichia coli. In order to minimize the overall size of the water box in MD simulations with explicit water, a 10-ns MD simulation with implicit water was performed first. The last snapshot from this short MD simulation was extracted as the initial structure (shown in Figure 6) for explicit water simulation. Simulations were performed using both the standard AMBER03 force field and the PHB model, respectively.  The overall fluctuations of RMSD and Rg using AMBER03 charges are both larger than those using PHB model. The most populated state under PHB model has an RMSD of 2.1 Å, while under standard AMBER simulation, it is as large as 4.0 Å. The result indicates that hydrogen bond polarization helps the folding of long helix, which is also reflected from the fluctuation of Rg along the simulation time. Under PHB simulation, initially the peptide adopts a more compact structure than the native structure. After 75 ns, the Rg wanders around the experimental value with small fluctuation. While under AMBER03 force field, Rg gradually deviated from the native state. Snapshots of the intermediate states along the simulation time under PHB model are plotted in Figure 8. It is noted that although the structure near the N-terminal of the peptide is flexible, the helix in the middle region is becoming more and more stable as simulation goes on.
The two-dimensional free energy landscapes from MD simulations under both AMBER03 force field and PHB model are shown in Figure 9, using the RMSD and Rg of the backbone as the reaction coordinates.
Under AMBER03 force fields, due to the lack of polarization effect, enthalpy change may not be enough to compensate for the entropy loss in the formation of secondary structure. The global minimum located at RMSD of around 4.1 Å and Rg around 17.2 Å (versus 20.3 Å calculated from the nuclear magnetic resonance (NMR) structure). The representative struc- ture is obviously twisted from the middle of the peptide as shown in Figure 9 (top panel). Therefore, no stable native structure is founded in standard AMBER simulation. For comparison, the free energy landscape under the PHB model displays a global minimum at RMSD around 2.0 Å (folded state), as shown in Figure 9 (bottom panel). The global minimum obtained in the PHB model has an Rg value of about 20.0 Å, which is very close to the NMR value of 20.3 Å. There is also a local minimum at RMSD around 3.1 Å, which is a partially unfolded state. At this local minimum, the N terminal of the peptide is unfolded, but the middle helical fragment is almost intact. This free energy analysis shows that the helix is thermally more stable under the PHB model.

Summary
Among many factors that are important to protein stability, hydrogen bonds are found to correlate highly with the thermal stability. They are the dominant feature in protein secondary structures of α helices and β sheets. Recent theoretical studies demonstrated that the strength of hydrogen bonds from simulations under standard nonpolarizable force fields is underestimated due to the lack of polarization effect.
In this chapter, we present an accurate and efficient polarizable hydrogen bond model that can be implemented in molecular dynamics simulations. By investigating the thermal stability of a short α helical peptide and the folding of a long α helical peptide, it shows that including polarization effect is very important in maintaining the hydrogen bonds, stabilizing the structure, and folding protein to its native conformation. It also should be noted that folding simulations of peptides with β sheets or mixed helix-sheet structures are still very challenging. The difficulty lies in not only the lack of polarization effect but also the unbalanced secondary structure propensity in used force fields.