The energetic state of a protein is one of the most important representative parameters of its stability. The energy of a protein can be defined as a function of its atomic coordinates. This energy function consists of several components: 1. Bond energy and angle energy, representative of the covalent bonds, bond angles. 2. Dihedral energy, due to the dihedral angles. 3. A van der Waals term (also called Leonard-Jones potential) to ensure that atoms do not have steric clashes. 4. Electrostatic energy accounting for the Coulomb’s Law m protein structure, i.e. the long-range forces between charged and partially charged atoms. All these quantitative terms have been parameterized and are collectively referred to as the ‘force-field’, for e.g. CHARMM, AMBER, AMBERJOPLS and GROMOS. The goal of energy Minimization is to find a set of coordinates representing the minimum energy conformation for the given structure. Various algorithms have been formulated by varying the use of derivatives. Three common algorithms used for this optimization are steepest descent, conjugate gradient and Newton–Raphson. Although energy Minimization is a tool to achieve the nearest local minima, it is also an indispensable tool in correcting structural anomalies, viz. bad stereo-chemistry and short contacts. An efficient optimization protocol could be devised from these methods in conjunction with a larger space exploration algorithm, e.g. molecular dynamics.
- energy minimization
- minimum energy conformation
- force fields
- global minimum energy
- molecular dynamics simulations
- molecular modeling
Molecular modeling relies on the event of theoretical and computational methodologies, to model and study the behavior of molecules, from little chemical systems to big biological molecules and material assemblies. The applying fields of molecular modeling regard computational chemistry, drug design, computational biology and materials science. The fundamental computational technique to perform molecular modeling is simulation. Molecular simulation techniques need specific extra computational and code software system . Most molecular modeling studies involve three stages. Within the initial stage a model is chosen to explain the intra- and inter-molecular interaction within the system. The two most typical models that are utilized in molecular modeling are quantum mechanics and molecular mechanics. These models enable the energy of any arrangement of the atoms and molecules within the system to be calculated and permit the modeler to work out how the energy of the system varies because the positions of the atoms and molecules change. The second stage of a molecular modeling study is that the calculation itself, such as an energy Minimization, a molecular dynamics or Monte Carlo simulation, or a Conformations search. Finally, the calculation should be analyzed, not solely to calculate properties however additionally to see that it’s been performed properly .
In molecular modeling we tend to are particularly curious about minimum points on the energy surface. Minimum energy arrangements of the atoms correspond to stable states of the system; any movement off from a minimum provides a configuration with a better energy. There is also a really sizable amount of minima on the energy surface. The minimum with the very lowest energy is known as the global energy minimum. To spot those geometries of the system that correspond to minimum points on the energy surface we tend to use a Minimization algorithm. The highest point on the pathway between two minima is of particular interest and is understood as the saddle point, with the arrangement of the atoms being the transition structure. Both minima and saddle points are stationary points on the energy surface, wherever the primary derivative of the energy function is zero with regard to all the coordinates .
A geographical analogy is useful thanks to illustrate several of the ideas as during this analogy minimum points correspond to the lowest of valleys. A minimum is also represented as being in an exceedingly ‘long and slender valley’ or ‘a flat and featureless plain’. Saddle points correspond to mountain passes. Confer with consult with algorithms creating steps as ‘uphill’ and downhill’.
1.1 Energy minimization: a brief description about the problem
The Minimization problem can be formally stated as follows: given a function
With respect to present discussion, the most important functions are the quantum mechanics or molecular mechanics energy with the variables
Minimization algorithms can be classified into two categories: one in which we use derivatives of the energy with respect to the coordinates and second in which we do not use any derivative. Derivatives are extremely important because they have details about the shape of the energy surface and due to this efficiency to locate the minimum energy is increases drastically. For any problem, before choosing best algorithm (or algorithms), several points should be considered for e.g. the best Minimization algorithm should use least memory to generate the answer as quickly as possible. For different problems of molecular modeling, different Minimization method is used because we do not have any Minimization method developed which could be applied to all. Any method which is developed for efficient performance with quantum mechanics may or may not be compatible for molecular mechanics because quantum mechanics deals with models having very less atoms as compare to molecular mechanics. Another point is that procedures like inversion of matrix in some Minimization methods works fine for small systems but problem arises when number of atoms increases to thousands. To calculate the number of derivatives of different Conformations and their energies, different level of performance is required for quantum mechanics than molecular mechanics. Molecular mechanics requires an algorithm that is having more number of steps; quantum mechanics has the opposite scenario. Therefore we have various methods in various popular software packages .
As, most of the Minimization algorithms can only identify the minimum energy point which is closest to the starting point, thus it can be stated that they only move downwards or more appropriately downhill on the energy surface. Suppose, this schematic energy surface is shown in Figure 1, having three starting points A, B and C to obtained the minima. The locations at which any hypothetical ball stops rolling on energy surface under gravitational force will have corresponding energy minima. But more important thing is to identify global energy minimum which can only be generated by using different starting points, which will be minimized later. Using this criterion, some of the Minimization methods can move uphill to find out energy minimum than the closest one. But not a single algorithm till date is reported for efficiently identification of the global minimum energy from a random starting point. To identify the number of different minimum energy Conformations the shape of energy surface is very useful. For example, population or number of minimum in a deep and narrow valley will be very less than population at broad minimum because it is having higher energy as the vibrational energy is more widely spaced in the minimum and so less accessible. Therefore, the global energy minimum may not be the most highly populated minimum. Thus, there may be the case that the ‘functional’ structure (e.g. the biologically active conformation of a drug molecule) may not belong to the global minimum, or to the most highly populated conformation, or even to a minimum energy structure at all .
Every Minimization algorithm has a set of initial coordinates as input. These coordinates can be generated from different sources. These can be generated using traditional experimental method like X-ray crystallography or NMR. Alternatively, these can be generated by employing a theoretical method like conformational search procedure. But for practical efficiency, both types of methods can be used combinatorially. For example, the behavior of a protein in water can be studied using its x-ray generated structure. Then place this in a solvent completely. Monte Carlo or molecular dynamics simulation can generate the atomics or Cartesian coordinates of the solvent molecules.
For derivatives based Minimization methods, calculation of the derivatives of the energy is performed with respect to the different variables i.e. Cartesian or internal coordinates, as the case may be. These derivatives can be generated using either analytical or numerical procedures but derivatives obtained through analytical procedure are more preferred because these can be generated more readily and these are more exact. Although, if derivatives generated by only numerical procedure is available then one should use a non-derivative Minimization procedure as it is more efficient .
Although, in some situations it is always preferable to use derivatives generated though numerical procedure. By following way these can be generated: suppose there is a small alteration (
2. Non-derivative minimization methods
2.1 The simplex method
A geometrical figure having
The simplex algorithm identifies an energy minimum by traveling around on the potential energy surface in a manner that is similar to the movement of an amoeba. There are three possible primary moves. The most common move is a reflection of the vertex having maximum value on the opposite sides of the simplex. The reflection is used as an effort to produce a new point having a lower value. If this is the lowest energy point than any other points in the simplex then next move may be applied which is a “reflection and expansion.” Reflection move will be failed to generate a better point, when a “floor of the valley” is reached. In this situation, simplex will make simple contraction all along the highest point dimension. If this fails to further decrease the energy then another kind of move is possible. In this move, contractions occur in all the directions towards the lowest point. The Figure 2 illustrates above discussed three moves.
The vertices of the initial simplex have to be first generated before applying the simplex algorithm. The first conformation of the method fit to just one of these vertices. Rest of the points can be generated using various methodologies, e.g. simplest method is to increase a fix value to each coordinate successively. To calculate the functional value of the applicable vertex, the energy of the whole system is measured for each new point.
When the starting configuration of the system is having high energy, it is best to use simplex method. The simplex method is more helpful in this because it seldom go wrong in the identification of a fitter answer. Nonetheless, it requires large computational time for the analysis of the high number of energy instances. For e.g. to create the starting simplex needs 3
An important question is that what is the reason behind containing one extra vertex in the simplex than the degree of freedom? The answer to this is that of simplex is having lesser vertices than
2.2 The sequential univariate search method
It is seldom appropriate to use the simplex method for the calculations involve in quantum mechanics because in that case very high number of energy assessments has to be done. In this case a much befitting non-derivative procedure like the sequential univariate search method is well-advised . This procedure consistently repeats through the coordinates successively. For every coordinate, two new configurations are created by making changes in the present coordinates (i.e.
The minimum is bound to reach when the changes in all the coordinates are adequately very small. Alternatively, a new iteration is performed. In comparison to the simplex method, the sequential univariate method normally needs less function assessment. But if two or more coordinates have a strong connection or bonding then the sequential univariate search method may converge slowly. It also converges slowly when the energy surface is similar to a long narrow valley.
3. Derivative minimization methods
Most of the favorite Minimization procedures utilize derivatives because the information which is helpful in minimization is furnished by derivatives. The direction of the first derivative of the energy (the gradient) points where the minimum lies and the magnitude of the gradient tells about the steepness of the local slope. The energy of the system can be decreased by moving each atom with respect to the force acting on it. The force is equal to minus the gradient. Second derivatives point towards the curvature of the function. This information can be utilized to find out where the function will change its direction (i.e. pass through a minimum or any different non-moving point).
The energy functions which are often utilized in molecular modeling are seldom quadratic and thus the Taylor series expansion can only be a well-advised approximation. There are two crucial consequences of this. First is, for a pure quadratic function a given minimization procedure executes very well rather than for a molecular mechanics or quantum mechanics energy surface. For example, the Newton–Raphson algorithm can identify the minimum in a one step for a purely quadratic function. But, for a typical molecular modeling energy function, it needs to run several iterations. The second consequence is that, even though they may function very well close to a minimum, where the harmonic approximation is more logical, the harmonic approximation is very bad and far from minimum. Due to this some of the less robust methods will not be successful. Because of this reason Minimization protocol must be picked very carefully. A robust or may be inefficient method could be exploited earlier then a comparatively least robust but more efficient procedure.
On the basis of highest order derivatives used, the derivative methods can be classified. The first derivatives or the gradients based methods are called as first-order methods. Methods in which both first and second order derivatives are used are known as second-order methods. Because the simplex method does not use any derivatives can thus be called as a zeroth-order method.
3.1 First-order minimization methods
3.1.1 The steepest descents method
The steepest descents method moves in the direction parallel to the net force, which in our geographical analogy corresponds to walking straight downhill. For
Once the direction of movement is clearly characterized then it should be decided that how much distance to be covered along the gradient. Consider the two-dimensional energy surface of Figure 4. The gradient direction from the initial point is along the line shown. Suppose we have a cross-section through the surface along the line, the function will pass through a minimum and then increase, as shown in the figure . We can identify the minimum point by performing a line search or we can take a step of arbitrary size along the direction of the force.
3.1.2 Line search in one dimension
The goal of a line search is to find out the minimum along a specific direction (i.e. along a line through the multidimensional space) . In the very first step of the line search is to
Alternatively, we can set a suitable quadratic function to the three points. Then apply differentiation to this suited function to modify an approximation to the minimum along the line which should be identified analytically. To get a better approximate, a new function can be set then, as shown in Figure 5. Higher-order polynomials may yield an improved fit to the bracketing points but when these are utilized with functions that altered aggressively in the bracketed region, these higher-order polynomials can yield wrong interpolations. The gradient at the minimum point obtained from the line search will be perpendicular to the previous direction. Thus, when the line search method is used to locate the minimum along the gradient then the next direction in the steepest descents algorithm will be orthogonal to the previous direction (i.e.
3.1.3 Arbitrary step approach
As, we know that the line search may be computationally very expensive, New coordinates can be identified by walking a step of arbitrary size along the gradient unit vector
The largest inter-atomic forces indicate the direction of the gradient. Therefore, the steepest descent is more suitable for alleviating attributes of the highest-energy in the initial conformation. If the harmonic calculations corresponding to the energy is hypothesized badly and the initial point is distant from a minimum, even then the method performs strongly. But, in the case of downward movements in a long slender valley, the method uses short steps in high number and this causes trouble to the method. Although, it is not suitable manner to find out the minimum, the steepest descents process is bound to move in the right-angled direction at every point. The route constantly over compensates itself and vibrates. However, Subsequent steps reintroduce errors which were already rectified by prior steps .
3.1.4 Conjugate gradients minimization
The vibrating activity of the steepest descents procedure in slender depression is absent in the set of directions generated by the conjugate gradients methods. Rather, both the directions of consecutive steps and the gradients are orthogonal in the steepest descents method . More specifically, in the conjugate gradients method, the gradients are orthogonal in nature at every point and the directions of consecutive steps are
Both the conjugate gradients method and the steepest descents method move in the direction of the gradient in the first step. The line search method should ideally be used to find out the one-dimensional minimum in all direction to assure that each gradient is orthogonal to all preceding gradients and that each direction is conjugate to all preceding directions. However, at this stage random step procedure is also achievable . To identify the second point a line search should be applied along the line with gradient but it must pass through the point. Therefore, the conjugate gradient procedure identifies the perfect minimum of the function in just two moves.
3.2 Second order derivative methods
3.2.1 The Newton-Raphson method
Second-order methods utilize the information from both the first derivatives and the second derivatives to find out a minimum. First derivatives provide gradient information while second derivative furnish details about the curvature of the function. Having these properties, the Newton–Raphson method is the simplest second order method . For a strictly quadratic function of the first derivative the second derivative will be same everywhere. If we talk about a multidimensional function the Hessian matrix of second derivatives essentially be inverted. Thus, for larger molecules it is more computationally expensive because there are a large number of atoms present and this necessitates bigger storage. The Newton- Raphson method is thus more appropriate to small molecules (usually less than 100 atoms or so) .
As stated earlier, for a strictly quadratic function, the Newton–Raphson method requires just one step to locate the minimum from any point on the surface. Practically, the surface is exclusively quadratic to a first approximation and this necessitates a large number of steps to move. The Hessian matrix of second derivatives should be calculated first and then inverted at each step. This must be ‘positive definite’ in a Newton–Raphson Minimization method. A positive definite matrix is one for which all the eigen-values are positive. When the Hessian matrix is not positive definite then the Newton–Raphson method moves to saddle points where the energy increases, rather than a narrow point where energy decreases. Additionally, the harmonic approximation is not suitable at positions which are very far from the minimum because this leads to instability of the Minimization. This can be solved by employing a more efficient and robust method (prior to the application of the Newton–Raphson method) to find out minimum or to reach close to minimum (in case of the positive definite Hessian matrix) .
3.2.2 Quasi-Newton method
Computation of the inverse Hessian matrix can be a possibly long procedure that represents an important disadvantage to the ‘pure’ second derivative methods such as Newton–Raphson. Furthermore, analytical second derivatives could not be generated preferably. Variable metric methods which are also an alternative name to the Quasi-Newton methods gradually develop the inverse Hessian matrix in consecutive iterations. That means, a sequence of matrices
At each iteration
4. Which minimization method should 1 use?
The choice of Minimization algorithm is determined by a number of components, including the storage and computational requirements, the relative speeds with which the various parts of the calculation can be performed, the availability of analytical derivatives and the robustness of the method. Thus, any method that requires the Hessian matrix to be stored (let alone its inverse calculated) may present memory problems when applied to systems containing thousands of atoms. Calculations on systems of this size are invariably performed using molecular mechanics, and so the steepest descents and the conjugate gradients methods are very popular here. For molecular mechanics calculations on small molecules, the Newton–Raphson method may be used, although this algorithm can have problems with structures that are far from a minimum. For this reason it is usual to perform a few steps of Minimization using a more robust method such as the simplex or steepest descents before applying the Newton–Raphson algorithm Analytical expressions for both first and second derivatives are available for most of the terms found in common force fields. The steepest descent method can actually be superior to conjugate gradients when the starting structure is some way from the minimum. However, conjugate gradients are much better once the initial strain has been removed. Quantum mechanical calculations are restricted to systems with relatively small numbers of atoms, and so storing the Hessian matrix is not a problem. As the energy calculation is often the most time-consuming part of the calculation, it is desirable that the Minimization method chosen takes as few steps as possible to reach the minimum. For many levels of quantum mechanics theory analytical first derivatives are available. However, analytical second derivatives are only available for a few levels of theory and can be expensive to compute. The quasi-Newton methods are thus particularly popular for quantum mechanical calculations.
When using internal coordinates in a quantum mechanical Minimization it can be important to use an appropriate Z-matrix as input. For many systems the Z-matrix can often be written in many different ways as there are many combinations of internal coordinates. There should be no strong coupling between the coordinates. Dummy atoms can often help in the construction of an appropriate Z-matrix. A dummy atom is used solely to define the geometry and has no nuclear charge and no basis functions. Strong coupling between coordinates can give long ‘valleys’ in the energy surface, which may also present problems. Care must be taken when defining the Z-matrix for cyclic systems in particular. The natural way to define a cyclic compound would be to number the atoms sequentially around the ring. However, this would then mean that the ring closure bond will be very strongly coupled to all of the other bonds, angles and torsion angles. Some quantum mechanics programs are able to convert the input coordinates (be they Cartesian or internal) into the most efficient set for Minimization so removing from the user the problems of trying to decide what is an appropriate set of internal coordinates. For energy Minimizations redundant internal coordinates have been shown to give significant improvements in efficiency compared with Cartesian coordinates or non-redundant internal coordinates, especially for flexible and polycyclic systems . The redundant internal coordinates employed generally com- comprise the bond lengths, angles and torsion angles in the system. These methods obviously also require the means to inter-convert between the internal coordinate representation and the Cartesian coordinates that are often used as input and desired as output. Of particular importance is the need to transform energy derivatives and the Hessian matrices (if appropriate) .
5. Differentiating between minima, maxima and saddle points
A configuration at which all the first derivatives are zero need not necessarily be a minimum point; this condition holds at both maxima and saddle points as well. From simple calculus we know that the second derivative of a function of one variable, f’(x) is positive at a mini- minimum and negative at a maximum. It is necessary to calculate the eigenvalues of the Hessian matrix to distinguish between minima, maxima and saddle points. At a minimum point there will be six zero and 3 N — 6 positive eigenvalues if 3 N Cartesian coordinates are used. The six zero eigenvalues correspond to the translational and rotational degrees of free- freedom of the molecule (thus these six zero eigenvalues are not obtained when internal coordinates are used). At a maximum point all eigenvalues are negative and at a saddle point one or more eigenvalues are negative.
6. What should be the convergence criteria?
In contrast to the simple analytical functions that we have used to illustrate the operation of the various Minimization methods, in ‘real’ molecular modeling applications it is rarely possible to identify the ‘exact’ location of minima and saddle points. We can only ever hope to find an approximation to the true minimum or saddle point. Unless instructed otherwise, most Minimization methods would keep going forever, moving ever closer to the minimum. It is therefore necessary to have some means to decide when the Minimization calculation is sufficiently close to the minimum and so can be terminated. Any calculation is of course limited by the precision with which numbers can be stored on the computer, but in most instances it is usual to stop well before this limit is reached. A simple strategy is to monitor the energy from one iteration to the next and to stop when the difference in energy between successive steps falls below a specified threshold. An alternative is to monitor the change in coordinates and to stop when the difference between successive configurations is sufficiently small. A third method is to calculate the root-mean- square gradient. This is obtained by adding the squares of the gradients of the energy with respect to the coordinates, dividing by the number of coordinates and taking the square root. It is also useful to monitor the maximum value of the gradient to ensure that the Minimization has properly relaxed all the degrees of freedom and has not left a large amount of strain in one or two coordinates .
7. Applications of energy minimization
Energy Minimization is very widely used in molecular modeling and is an integral part of techniques such as conformational search procedures. Energy Minimization is also used to prepare a system for other types of calculation. For example, energy mini- Minimization may be used prior to a molecular dynamics or Monte Carlo simulation in order to relieve any unfavorable interactions in the initial configuration of the system . This is especially recommended for simulations of complex systems such as macromolecules or large molecular assemblies.
The energetic state of a protein is one of the most important representative parameter of its stability. The energy of a protein (E) can be defined as a function of its atomic coordinates, thus providing a quantitative criterion for model selection and refinement. This energy function consists of several components e.g. (1) Bond energy and angle energy, representative of the covalent bonds, bond angles. (2) Dihedral energy, due to the dihedral angles. (3) A van der Waals term (also called Leonard-Jones potential) to ensure that atoms do not have steric clashes. (4) Electrostatic energy accounting for the Coulomb’s Law m protein structure, i.e. the long-range forces between charged and partially charged atoms. All these quantitative terms have been parameterized and are collectively referred to as the ‘forcefield’. The goal of energy Minimization is to find a set of coordinates representing the minimum energy conformation for the given structure. Various algorithms have been formulated by varying the use of derivatives. The common algorithm used for this optimization is steepest descent, conjugate gradient and Newton–Raphson etc. These methods complement each other in search of the local minima. Therefore, a reasonable energy Minimization protocol involves few initial steps of steepest descent, followed by a larger number of conjugate gradient iterations. Although energy Minimization is a tool to achieve the nearest local minima, it is also an indispensable tool in correcting structural anomalies, viz. bad stereo-chemistry and short contacts. An efficient optimization protocol could be devised from these methods in conjunction with a larger space exploration algorithm, e.g. molecular dynamics.
The authors are grateful to the Sam Higginbottom University of Agriculture, Technology and Sciences, Allahabad, for providing the facilities and support to complete the present work.
Conflict of interest
The author declares no conflict of interest.