Reaction Path Optimization and Sampling Methods and Their Applications for Rare Events

Reaction mechanisms are an important tool for chemists in the determination of thermodynamic and kinetic properties of chemical reactions.(Hanggi & Borkovec 1990; Heidrich 1995; March 1992; Tolman 1925) The mechanisms are integral in the understanding of detailed molecular or chemical transitions from one equilibrium state (reactant) to another equilibrium state (product). In computational chemistry, the reaction mechanism is often represented as a reaction path on the Born-Oppenheimer potential energy surface (PES) of the system of interest through construction of a potential energy function of the nuclear coordinates.(Bader & Gangi 1975; Lewars 2011; Mezey 1987; Truhlar 2001; Wales 2003) The PES serves as an important theoretical construct to provide a framework to describe the transition between different states in detail. The equilibrium states correspond to local minimum on the PES with zero first order derivatives (gradient) in all directions and all positive eigenvalues of the second order derivative (Hessian) matrix, excluding rotation and translation degrees of freedom. The transition states (TSs), based on the transition state theory (TST),(Doll 2005; Eyring 1935; Laidler & King 1983; Pechukas 1981; Truhlar et al. 1983; Wigner 1938; Yamamoto 1960) are the first order saddle points with zero gradient and only one negative eigenvalue of the Hessian matrix. The equilibrium states are often easy to identify through experimental or computational studies. Understanding the detailed transition process between equilibrium states is of more interest in research, but unfortunately is very difficult to study experimentally. On a given PES, one can imagine that there could exist an infinite number of possible routes connecting two predefined states on that surface. However, not every route has the same weight in elucidating of reaction mechanisms. In the static point of view, the minimum energy path (MEP) is the route that needs the least amount of potential energy for the system to undergo the transition. The MEP connecting two local minima must go through one or more TSs, and is identified as a representative reaction path. In the statistical point of view, the minimum free energy path (MFEP) is the most probable transition path connects two metastable states. The simulation of the systems either through molecular dynamics (MD) or Monte Carlo (MC) sampling on the PES could generate an ensemble of transition paths, from which the MFEP can be identified. Both an MEP and an MFEP can be used to predict important properties, such as a reaction’s kinetic isotope effect.


Introduction
Reaction mechanisms are an important tool for chemists in the determination of thermodynamic and kinetic properties of chemical reactions.(Hänggi & Borkovec 1990;Heidrich 1995;March 1992;Tolman 1925) The mechanisms are integral in the understanding of detailed molecular or chemical transitions from one equilibrium state (reactant) to another equilibrium state (product).In computational chemistry, the reaction mechanism is often represented as a reaction path on the Born-Oppenheimer potential energy surface (PES) of the system of interest through construction of a potential energy function of the nuclear coordinates.(Bader & Gangi 1975;Lewars 2011;Mezey 1987;Truhlar 2001;Wales 2003) The PES serves as an important theoretical construct to provide a framework to describe the transition between different states in detail.The equilibrium states correspond to local minimum on the PES with zero first order derivatives (gradient) in all directions and all positive eigenvalues of the second order derivative (Hessian) matrix, excluding rotation and translation degrees of freedom.The transition states (TSs), based on the transition state theory (TST), (Doll 2005;Eyring 1935;Laidler & King 1983;Pechukas 1981;Truhlar et al. 1983;Wigner 1938;Yamamoto 1960) are the first order saddle points with zero gradient and only one negative eigenvalue of the Hessian matrix.The equilibrium states are often easy to identify through experimental or computational studies.Understanding the detailed transition process between equilibrium states is of more interest in research, but unfortunately is very difficult to study experimentally.On a given PES, one can imagine that there could exist an infinite number of possible routes connecting two predefined states on that surface.However, not every route has the same weight in elucidating of reaction mechanisms.In the static point of view, the minimum energy path (MEP) is the route that needs the least amount of potential energy for the system to undergo the transition.The MEP connecting two local minima must go through one or more TSs, and is identified as a representative reaction path.In the statistical point of view, the minimum free energy path (MFEP) is the most probable transition path connects two metastable states.The simulation of the systems either through molecular dynamics (MD) or Monte Carlo (MC) sampling on the PES could generate an ensemble of transition paths, from which the MFEP can be identified.Both an MEP and an MFEP can be used to predict important properties, such as a reaction's kinetic isotope effect.
Although one could generate a reduced PES for complex systems with selected reaction coordinates, (Klähn et al. 2005;Shi et al. 2008;Tao et al. 2009a) it is rarely practical to construct a reduced PES of a system of interest to identify a reaction path connecting two minima.Moreover, reducing a complex multi-dimensional system to a simplified pathway is a form of data reduction.This reduction is non-unique and a choice imposed in this reduction will affect the quality and applicability of the results.It is more feasible to search for the MEP or MFEP directly on a given PES.Given the complexity and high degree of freedom of most systems of interest in chemistry, molecular biology and materials science, there has been a rapid development in methodologies for reaction pathway identification in large systems.As an attempt to reflect the current development and to better understand the consequences of specific methodological choices, this chapter reviews the recent progress of various methods to identify reaction paths with or without knowing the TS(s) a priori.Specifically, it emphasizes the applications of these methods in macromolecular systems.It should be noted that identifying saddle points on a PES alone is not the focus of this report.This report does not cover the geometry optimization including equilibrium and transition structures, (Farkas & Schlegel 2003;Henkelman et al. 2000a;Olsen et al. 2004;Schlegel 1982Schlegel , 2003Schlegel , 2011) ) conformational sampling, (Beusen 1996;Leach 1991;Parish 2002) or global minimum search methodologies, (Floudas & Pardalos 1996;Horst 1995Horst , 2000;;Torn 1989) which are all important for the studies of computational chemistry and biology.Other related topics, including enhanced sampling methods, (Earl & Deem 2005;Hamelberg et al. 2004;Lei & Duan 2007;Okur et al. 2006;Sugita 1999;Swendsen & Wang 1986;Thomas et al. 2005;Wen et al. 2004) simulation of nonequilibrium states, (Bair et al. 2002;Cummings & Evans 1992;Hoover 1983;Hoover & Hoover 2005;Kjelstrup & Hafskjold 1996;Li et al. 2008;Mundy et al. 2000) and minimization methods, (Bonnans 2003;Dennis & Schnabel 1996;Fletcher 2000;Gill 1982;Haslinger & Mäkinen 2003;Nocedal 2006;Scales 1985) are not covered in this chapter either.The curious readers are welcome to read cited references for more information.

PES walking methods
Without the intention to generate a complete PES, it is logical to develop methods to explore the PES by walking along the surface from certain starting points using local information of the PES, such as the energy, gradient and even the Hessian.(Hratchian & Schlegel 2005a;Schlegel 2003Schlegel , 2011) ) In this way, one can start from somewhere on the PES, either reactant, product, or TS, and walk uphill or downhill, depending on the starting points to reach the adjacent stationary points.The walking trajectories, after successfully reaching these stationary points, are the reaction pathways that describe the mechanism of transitions.For smaller systems, walking methods may be sufficient to fully understand a given reaction mechanism.However, for larger and more complex systems, pathways explored by such walking mechanisms are often not reversible, and can show significant hysteresis that results in a poor representation of the reaction.

Reaction path following methods
When using mass-weighted Cartesian coordinates, a steepest descent path from the TS down to the reactant and product is referred to as the intrinsic reaction coordinate (IRC) path.(Fukui 1981;Quapp & Heidrich 1984;Tachibana & Fukui 1980;Yamashita et al. 1981) The steepest descent pathway is given by the differential equation where x is the vector of Cartesian coordinates, s is the step size of the path, and g is the energy gradient at x.The path obtained by solving this equation is the IRC, when x is massweighted.Numerous methods were developed to locate the TS on a PES.(Baker 1986;Banerjee et al. 1985;Bell & Crighton 1984;Cerjan 1981;Ionova & Carter 1993, 1995;Jensen 1983;Müller & Brown 1979;Peng et al. 1996;Simons & Nichols 1990) The IRC could be optimized based on its variational nature.(Bofill & Quapp 2011;Quapp 2008) However, it is more applicable for many systems to construct the IRC by solving Eq. ( 1) from a TS.Gonzalez and Schlegel developed the implicit trapezoid method (GS−IRC) for reaction path following at second order accuracy.(Gonzalez & Schlegel 1989, 1990) In their initial development, the points along the target reaction path are constructed by constrained optimization using internal degrees of freedom of the molecules.For each step of optimization along the path, the new point is constructed and optimized so that the gradient at each point is tangent to the path.Therefore, the resulting path is both continuous and differentiable.This initial method is correct to second order in the limit of small step size.
The same method was later developed up to sixth order accuracy.(Gonzalez & Schlegel 1991) The GS−IRC method is generally efficient for small systems.
To improve the computational efficiency, the velocity Verlet algorithm (Verlet 1967) to propagate a classical dynamics trajectory was applied to integrate the IRC with a magnitude of the velocity damping for each step.(Hratchian & Schlegel 2002) This method is referred as the damped velocity Verlet (DVV) algorithm.The time step for each integration step is adjusted to ensure that the damped trajectory stays close to the IRC.The DVV-IRC method can be considered as running downhill along the PES from TS in a slow motion (by damping the velocity at each step).It enjoys the stability of the Verlet integrator and low cost of computation since the Hessian does not need to be calculated.
In their later work, Hratchian and Schlegel introduced an approach using a Hessian based predictor-corrector (HPC) integrator to solve Eq. ( 1).(Hratchian & Schlegel 2004) The HPC integrator comprises two steps: the predictor step and the corrector step.The gradient g and Hessian H of the system PES are used to calculate the predictor step with second order accuracy.Then, the correction of the predicted step is calculated through a modified Bulirsch-Stoer algorithm based on the gradient information at the predicted step.(Bulirsch & Stoer 1964, 1966a, b) Although, the HPC-IRC method is comparable to GS-IRC with fourth order accuracy, calculation of the Hessian at each step can be rather expensive for large systems.This bottleneck was resolved by applying a Hessian updating scheme in their later development.(Hratchian & Schlegel 2005b) For each step of an IRC calculation, the Hessian is not calculated de novo, but updated from the Hessian of the previous step and the change of the gradient and step size between two steps.With this scheme, the Hessian only needs to be calculated once at the TS, and then is updated at each step of the IRC calculation.This HPC-IRC method with Hessian updating has been applied successfully in large protein systems using a combined quantum mechanical and molecular mechanical (QM/MM) method.(Tao et al. 2010;Tao et al. 2009b;Zhou et al. 2010) In these studies, the inhibition mechanism of matrix metalloproteinase 2 (MMP2) by its potent inhibitor, was elucidated in great detail using QM/MM methods.The TS of the key reaction in the active site of MMP2 was identified.The IRC of the reaction including the protein environment was calculated to confirm that the reactant and product are connected through the identified TS (Fig. 1).Very recently, Hratchian and Schlegel applied the Euler (first-order) predictor and corrector method (EulerPC) using an Euler explicit integrator in the calculation of predicted step.(Hratchian et al. 2010) This method avoids the expensive Hessian calculation at the TS and updating afterwards.By repeating the evaluation and correction several steps after the prediction, the error of the calculation is greatly reduced.The newly developed EulerPC method shows comparable accuracy with HPC but with much less computational cost, and is tested on several rather large enzymatic systems.(Hratchian & Frisch 2011) As a summary, the IRC calculations are becoming practical even for large enzyme systems using the QM/MM approach.However, to apply any of the IRC methods listed in this section, a well-defined TS structure is necessary to serve as the starting point.For a large system, e.g. an enzymatic reaction system, using QM/MM methods may take substantial effort in identifying a TS.

Uphill walking methods
By walking uphill from a minimum, one could reach an adjacent TS.Applying a reaction path following method on the obtained TS could yield another minimum corresponding to a product or intermediate state and a complete reaction pathway could be formed.Simons and coworkers developed methods that walk on the PES toward the selected direction (either uphill or downhill) using local gradient and Hessian information.(Nichols et al. 1990;Simons & Nichols 1990;Taylor & Simons 1985) By applying a local quadratic approximation, the PES close to a starting structure x 0 can be written as where E 0 , F 0 and H 0 are the energy, gradient and Hessian at x 0 , respectively.Vector x is in the region around x 0 , in which the local quadratic approximation is valid.In the attempt to walk uphill, the vector with the lowest Hessian eigenvalue will be chosen to define the moving direction.The potential energy increases along the chosen direction, but remains at minima along the other eigenvectors.For the best computational efficiency, the step size for each round of searching is controlled using eigenvalues of the Hessian matrix.After leaving the region in which the local quadratic approximation is valid, the Hessian matrix H can be recomputed or updated for further calculations.Along the walk, the Hessian eigenvalue of the eigenvector, which is being followed may cross with other eigenvalues.This is likely to occur when the starting geometry is not within the quadratic approximation region of the TS.In such situations, a decision needs to be made to either keep track of the original eigenvector, or follow the eigenvector with the lowest eigenvalue after crossing.This decision may significantly affect the final TS.
An algorithm was developed by Ohno and Maeda to find reaction pathways on a PES systemically.(Maeda 2003;Ohno 2004;Ohno & Maeda 2006;Yang et al. 2005) In their method, the PES around the equilibrium structure (ES) is expanded using reduced normal coordinates in terms of normal coordinates Q i with eigenvalues i , The degrees of freedom for translation and rotation are projected out from the normal coordinates.In such representation, any constant energy around the ES within the limit of harmonic potential gives spherical hypersurface (hypersphere) (Fig. 2).Any structure represented on a hypersphere is somewhat distorted from the ES around which the hypersphere is constructed.This common feature is referred to as anharmonic downward distortions following (ADDF), and is used for a reaction path search by walking uphill toward the direction with the local maxima of ADDF.Through a series of hyperspheres with different sizes but common origin, different reaction paths may be identified by following the local maxima of ADDF on each hypersphere.TS regions or dissociation channels (DC) can be identified through the variation of the first order derivatives along these reaction paths.Further calculations need to be carried out to precisely determine the real TS structures, which may not be on any hypersphere.The reaction path following methods can be applied to the TSs to find new ESs.The whole procedure can be repeated for new TSs.This is referred as scaled hypersphere search (SHS) method.Theoretically, the SHS method can be repeated until all the stationary points and reaction channels are identified for a given system.This method provides a means to systematically explore the PES and is referred to as global reaction route mapping (GRRM).By applying GRRM, the PESs of several small organic molecules were explored with numerous ESs and TSs identified for each system.(Maeda & Ohno 2007;Yang et al. 2005) Recently, a new GRRM method was developed to search reaction pathways for large flexible systems using a microiteration-ADDF ( -ADDF) technique.(Maeda et al. 2009) The microiteration scheme was originally developed for the QM/MM method.(Svensson et al. 1996;Vreven et al. 2006b;Vreven et al. 2006a;Vreven et al. 2003) For large systems, it may not be practical to follow all the ADDF maxima.Instead, two other methods were developed by the same authors to follow only large ADDF pathways (lADDF), (Maeda & Ohno 2007) or to follow the reverse direction from a point on a very large hypershpere to the sphere center with decreasing hypershere radius (double-ended ADDF, or dADDF).(Maeda et al. 2009) Fig. 2. Schematic illustrations of computational procedures for GRRM by the SHS method: (A) Though the reference harmonic potential has a constant energy on the scaled hypersphere surface, the real potential has some minima on the same surface, which correspond to the anharmonic downward distortions indicating the symptoms of chemical reactions.Following those minima on different sizes of scaled hypersphere (spheres 1 and 2), reaction routes (paths 1-3) can be traced from the equilibrium structure (ES) by the SHS method as shown by arrows.(B) Starting from an ES, find all reaction routes as energy minima on the scaled hypersphere (maxima of anharmonic downward distortions), then continue uphill walking to reach DC or TS, and then downhill walking from the TS region to DC or another ES.From each new ES, the above procedures for finding DC, TS, and ES should be repeated until no new ES is found.This one-after-another approach in the SHS method can be automated, and it enables us to perform GRRM within finite processes.(Reprinted with permission from ref. (Ohno & Maeda 2006).Copyright 2006 American Chemical Society.) Both methods were implemented with the -ADDF technique.In this new GRRM method, a large system is divided into reaction-center and nonreaction-center parts.were examples used to test the stability and efficiency of the new method.Multiple reaction pathways were identified in both cases.Thus, the GRRM method with -ADDF could serve as a powerful tool to explore PESs of reactions in large molecular systems.However, the GRRM has not been reported to be applied on protein systems.

Combined method for determining reaction path, minima, and TSs
It is worth noting that Schlegel et al. developed a method to determine TSs, minima and the reaction path in a single procedure without calculating the Hessian matrix.(Ayala & Schlegel 1997) In this method, a starting approximate path is constructed as several (5 to 7) structures on PES, and iteratively relaxed until two endpoints reach minima, and one of the middle points reaches a TS.The final path is a second order approximation of the steepest descent path.However, this method became impractical for large biological systems, such as proteins, because of the computational cost of eigenvector following searches for TSs.Nevertheless, this method provides a rigorous means of identifying reaction paths and TSs simultaneously.

Reduced Hessian methods
Hessian calculations are important for TS calculations, as well as reaction path following methods.However it may not be practical or necessary to calculate a full Hessian for macromolecular systems, since a large amount of degrees of freedom are unrelated to the reaction of interest.Therefore, these degrees of freedom can be somewhat disregarded.The partial Hessian vibrational analysis (PHVA), (Li & Jensen 2002) was developed to diagonalize only a subblock of the Hessian matrix to yield vibrational frequencies for partially optimized systems.In a recent development of vibrational subsystem analysis (VSA), the complexity of Hessian calculation can be reduced by separating a large system into an active "subsystem" with the remainder of the system defining the "environment".The environment is kept at a minimum energy with respect to the motion of the subsystem, thus an effective Hessian involving only the subsystem needs to be considered.(Woodcock et al. 2008) The VSA is an improvement over PHVA, but does entail higher computational costs.These reduced Hessian approaches can only work if an adequate subsystem including all important atoms for the reaction can be identified.By applying the methods introduced in this section, one could locate an IRC on any given PES with well-defined TSs.There are certain limitations of these methods, especially for large systems with high degrees of freedom.For large systems, the uphill walking methods should not be one's first choice since PESs of such systems are rather complicated and rugged, with numerous local saddle points.The downhill walking methods worked very well for the listed studies.However, a TS needs to be identified before applying IRC calculations.There is always a possibility that the IRC calculated from identified TS does not reach the desired reactant or product.It is even more likely that there are multiple TSs that exist between the reactant or product.In consideration of these factors, there might be more interest to obtain information about reaction pathway rather than TSs, especially for large biomolecular systems.Accordingly, the so-called chain-of-states methods were developed to obtain a reaction pathway without identifying TSs.

Chain-of-states methods
In chain-of-states methods, a number of replicas (i.e.states) of a system are used to connect two endpoints, and are subject to minimization simultaneously.The first and the last replicas usually correspond to the reactant and product, and are often fixed during the minimization.For large complex systems, chain-of-states methods can be used to address issues relating to hysteresis, free energy, reaction rates, and multiple pathways.In this section, various chain-of-states methods to build reaction paths are surveyed.

Line integral methods
Elber and Karplus (EK) developed a method using a line integral representation of a discretized path subject to optimization.(Elber 1987) In their method, the objective function subject to optimization reads as where j R is the coordinates of replica j, M is the total number of steps from the starting , and () j V R is the potential energy of the system at replica j R .The degrees of freedom of the rigid body, i.e. translation and rotation, are projected out from the minimization for replicas with reference to the end points of the path.The objective function is subject to non-linear optimization for the final reaction path.The method was applied to several systems including the conformational change of myoglobin.(Elber 1987) Czerminski and Elber then developed the self-penalty walk (SPW) method (Czerminski & Elber 1990a) based on original EK formulation.The main development of SPW is the addition of repulsion terms for each replica j: where , λ and ρ determine the range and maximal value of the repulsion between replica i and j.These repulsive terms can help to prevent the aggregation of replicas in the neighborhood of two endpoints where the energies of replicas are lower than those close to the TS region.As discussed in their paper, the repulsive terms reflect the stiffness of the reaction path and mimic the effect of kinetic energy on the classical trajectory.
The reaction path calculation of alanine dipeptide isomerization using SPW displayed a convergence rate that is 10 times faster than the one using the EK method.The conformational change of isobutyryl-ala 3 -NH-methyl (IAN) between the helix and extended chain was also studied.By using the optimal values of parameters in this method, a reaction path that is very close to the MEP was obtained for IAN from the calculations with a straight line as initial path.In another study, the SPW method was applied to study the diffusion of carbon monoxide through leghemoglobin.(Nowak et al. 1991) Three similar but distinct diffusion pathways were identified and compared.The barrier heights calculated for the three pathways were in the agreement with the proposed model.Ulitsky and Elber (UE) proposed a locally updated planes (LUP) method to calculate steepest descent paths (SDP) in flexible polyatomic systems.(Ulitsky & Elber 1990) For a series of replicas {r k } k=1,M , s k is the unit vector along the gradient for replica r k .The SPD satisfies that , where V is the potential energy.For a discretized path, the vector s k is approximated as (r k+1r k-1 )/| r k+1r k-1 |.To refine the path of each round, the coupled differential equations of all the replicas {(/t)r k (t)=V proj } k=1,M were solved by a fifth order Adams predictor-corrector algorithm.(Gear 1971) The SDP could be reached in the limit of t→∞.Choi and Elber later improved the LUP method by a gradient updating scheme.(Choi & Elber 1991) The local gradient vector s k is calculated based on the initial path, and updated based on new reaction path after every M steps of optimization.The authors found that M=20 to be efficient for the helix formation of tetrapeptides under their studies.As Choi and Elber pointed out, the final results depend on the initial guess.If multiple MEPs exist between two endpoints and the initial guess is not in the radius of convergence of a single path, the result may be discontinuous and contain segments from different MEPs.

Nudged elastic band (NEB) methods
As pointed by Jónsson, Mills and Jacobesen, (Jónsson et al. 1998) the line integral methods suffer from the "corner cutting" problems in which the final paths bypass the TS region, leading to overestimated barriers.This problem originates from the fact that elastic forces added to replicas have non-zero components perpendicular to the path.The optimization of objective functions that include elastic forces will have a tendency to pull replicas off from the MEP.In addition, replicas along the final path tend to aggregate around endpoints where the potential energies are smaller than the TS region, which is underrepresented in the chain.This is because the actual forces of the path pull the replicas downhill against elastic forces.To solve these problems, the NEB methods were developed to project out perpendicular components of elastic forces and parallel components of the true force with respect to the path under minimization.(Henkelman et al. 2000a;Jónsson et al. 1998) For a reaction path with N+1 replicas [R 0 , R 1 ,…,R N ], the NEB method is implemented as the following: The tangent at the replica i, τ i , can be estimated based on adjacent replicas i-1 and i+1: The force acting on a replica subject to optimization is where F i  is the sum of the true force perpendicular to the tangent: V is the potential energy of the system, k is the elastic force constant.By projecting out the elastic force perpendicular to the path, the final path should relax to the MEP in principle.
One potential problem that the NEB method may encounter is producing kinks along the path, mainly in regions where the parallel component of force is large compared with the perpendicular component.A new NEB method was developed by Henkelman and Jónsson to eliminate the kinks along the path.(Henkelman & Jónsson 2000) In the new implementation, the tangent is defined as where V i is the potential energy of replica i, V(R i ).However, when the replica i is at a minimum (V i+1 >V i <V i-1 ) or at a maximum (V i+1 <V i >V i-1 ), the tangent is estimated as , and . This implementation was motivated by the calculations of finding the MEP from a TS.It is always easier to find a MEP by following the PES downhill from a TS rather than going uphill from a minimum.Therefore, the local tangent of the path is always defined by the higher energy replicas nearby to improve the stability of the calculation.The updated NEB method is reported to behave well and remove kinks from the reaction paths.(Henkelman & Jónsson 2000) Henkelman, Uberuaga and Jónsson (Henkelman et al. 2000b) developed climbing image NEB (CI-NEB) by defining the force of the replica with the highest energy i max as The basic idea is to invert the component of force along the path.Therefore, the minimization movement based on a given force will lead this replica toward the saddle point, which has the maximum energy along the path but minimum in all other directions.In addition, the elastic force constant along the path is scaled linearly on the energy of the replicas.Therefore, the replicas around the TS region would be connected through stronger elastic bands than those closer to the endpoints.Compared with the regular NEB method, the CI-NEB method generates reaction pathways with TS regions better represented, and leads to more accurate estimates of reaction barriers.Maragakis et al. (Maragakis et al. 2002) proposed the adaptive nudged elastic band approach (ANEBA) to search for saddle points.Instead of starting with a large number of replicas, three movable replicas are added for the initial round of the NEB calculation.After obtaining reasonable convergence, the two replicas that are adjacent to the one with the highest energy will serve as endpoints for the next round of NEB calculations after inserting new movable replicas in between.This process could be repeated until a well-defined TS is identified.The major goal of ANEBA is TS searching, but a reaction path that approximates the MEP will be obtained after finding the TS.
Trygubenko and Wales proposed a doubly nudged elastic band (DNEB) method by retaining a portion of perpendicular component of elastic force.(Trygubenko & Wales 2004b;Trygubenko & Wales 2004a) The limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) quasi-Newton optimizer was implemented with the DNEB method.One of the strategies adapted is gradually projecting the perpendicular component of elastic force until the full path is reasonably stable.Then replicas with potential energies above their adjacent replicas are subject to an eigenvector-following calculation for TSs.Chu et al. (Chu et al. 2003) developed the first superlinear minimizer for the NEB method that was based on expanding the adopted basis Newton-Raphsion (ABNR) method (Brooks et al. 1983), and this is available in CHARMM.(Brooks et al. 2009) Other improved optimizers, (Alfonso & Jordan 2003;Bergonzo et al. 2009;Galván & Field 2008) and other software packages, such as AMBER, (Case et al. 2005) have efficient implementations of NEB as well.
In general, the NEB methods have been applied in numerous studies, including diffusion processes on surfaces,(Ágoston & Albe 2010; Sørensen et al. 1996;Uberuaga et al. 2000;Villarba & Jonsson 1994;Yang et al. 2009) nucleation process, (Lutsko 2008) stability of nanoparticle, (Vélez et al. 2008) and dissociative adsorption of a molecule on a surface.(Mills 1995) Xie, Liu and Yang adapted the NEB method for enzymatic reactions.(Xie et al. 2004) The major development of their method is carefully choosing the degrees of freedom that are essential for the reaction.The chosen degrees of freedom were subject for the calculation of distances between replicas.The large number of soft and floppy spectator degrees of freedoms are, therefore, excluded from the elastic force computation in NEB.The potential problem of such selection is discontinuity of contributions from spectator degrees of freedoms to the total potential energy.One solution is starting from relatively rigid reference systems and keeping position restraints to the environmental atoms throughout the NEB calculations.The third alteration is cutting off the elastic band for intermediate states from multiple reaction steps to allow them to relax to the minimum.They applied this modified NEB method to study the mechanism of enzyme class A -lactamase, and obtained the MEP of the reaction at the active site.With further improvement, (Cisneros et al. 2005;Liu et al. 2004) the NEB methods were applied in more enzyme systems to map out the detailed MEP of their mechanisms.(Cisneros et al. 2009;Zhao & Liu 2008)

Zero temperature string (ZTS) methods
Similar to NEB, the ZTS method was developed to find the reaction pathway connecting two minima on a PES.(Cameron et al. 2010;E et al. 2002, 2007;Ren 2003) In the ZTS method, a series of evenly distributed initial states between two endpoints are minimized towards a MEP using path gradients and tangent information.Instead of adding spring forces on the path to maintain appropriate distances between states, a step is added to evenly redistribute the states along the path after each step of minimization.The basics of the ZTS method can be summarized as the following.For transitions between two metastable states A and B, the MEP is a smooth curve φ* between A and B which satisfies where   From an arbitrary string φ connecting A and B, searching MEP can be realized by evolving φ through where u  is the perpendicular component of force with reference to φ.For simplicity, φ is parametrized by normalized arc length , for which =0 at A and =1 at B. The Eq. ( 13) can be rewritten as  is the unit tangent factor of φ and scalar field r( ,t) is a Lagrange multiplier determined by parameterization.In real applications, the string is discretized into a series of replicas, which move according to the first term of the right hand side of Eq. ( 15).Eq. ( 15) can be solved using ordinary differential equation (ODE) solvers to evolve the string towards a MEP.The reparametrization can be carried out after certain number of evolving steps of the string to redistribute the replicas to enforce the equal arc length (distance) between adjacent replicas.In a further development, (E et al. 2007) the Eq. ( 15) is rewritten as . Eq. ( 17) is equivalent to Eq. ( 15), but avoids a force projection operation in the string evolving step for better numerical stability.It is noted that the ZTS method does not require the location of minima beforehand to generate an initial string.The final converged string should connect two minima, as long as the two endpoints of the initial string lie in the attraction basins of the minima (Fig. 3).Based on the framework of the ZTS method, the transition path theory was developed to sample the minimum free energy path.This theory will be introduced in section 4.3.

Growing string methods
So far, the chain-of-states algorithms introduced in section 3.1−3.3require an initial path composed by series of replicas.The initial paths are usually constructed by linear interpolation between two minima.There are two potential drawbacks of these methods when applying them in studies involving expensive quantum mechanics (QM) calculations.First, the expensive QM calculation needs to be carried out for every replica in each round of optimization from the very beginning of the reaction path calculation until convergence.Second, the initial linear interpolation path may include structures with severely overlapping atoms, which can cause failure of the QM calculations.The first drawback means intensive computation of reaction path optimization.The second drawback simply means immediate failure of the chain-of-states calculation.Growing string methods (GSM) were proposed (Peters et al. 2004) and are under continuous development to address these issues.(Goodrow et al. 2008(Goodrow et al. , 2009(Goodrow et al. , 2010;;Quapp 2005Quapp , 2009) ) The basic idea is to gradually grow two strings separately from two minima by continuously adding replicas to each string until both strings merge.The GSM generally comprises five steps.(Goodrow et al. 2008) In step one, two replicas are added along the linear synchronous transit path, and placed close to reactant and product, which defines the linear path.The reactant and product plus each added replica close to them serve as starting segments of two separate strings.In step two, the two string segments are minimized until the norm of the orthogonal force of each replica converges to a specified tolerance.After step two, the two string segments lie on the steepest descent direction from the frontier replica of each segment.In step three, replicas on both string segments are redistributed uniformly in terms of arc length, similar to the ZTS.The arc length of the string can be calculated by integrating the cubic spline fitted to all the replicas.The rigid body movement needs to be projected out for all the replicas with reference to the reactant and product after each round of the minimization.In step four, a new replica is added to each string segment along the fitted cubic spline.Steps two through four are repeated until the arc length between two frontier replicas from the string segments is small enough so that the two string segments can be considered a united string.In step five, the joined string is minimized and reparametrized with a fixed number of replicas until the sum of the norm of the orthogonal forces on all nodes falls under predefined tolerance.Step five is analogous to the ZTS method.One or more replicas with maximum energies among their neighbors can be used for TS search calculations.The general GSM is illustrated in Fig. 4. In various developments, Newton projector, (Quapp 2005) internal coordinate, (Goodrow et al. 2008) conjugate gradient method, (Goodrow et al. 2008) and different TS search strategies (Goodrow et al. 2009(Goodrow et al. , 2010) ) are implemented in GSM for better performance.The major goal of GSM is improving the calculation efficiency of the ZTS framework for systems involving expensive QM calculations.By growing strings stepwise from two endpoints instead of minimizing from an initial path with potentially bad geometries, a large amount of calculations can be saved from carrying out QM calculations to move the fixed number of replicas from initial path to MEP iteratively.The GSM has been applied in one study using a model system representation of an enzymatic system (Maresh et al. 2008) and some other studies using small models.(Goodrow & Bell 2008;Zheng & Bell 2008a, b) The application of GSM directly on protein system has yet to be reported.

Conjugate peak refinement (CPR) method
The CPR method (Fischer 1992;Noé et al. 2006) focuses on finding saddle points along a path that connects a predefined reactant (r) and product (p).At the beginning of the CPR calculation, the replica with maximum potential energy, y 1 , is searched along an initial path, e.g.linear interpolation between r and p (Fig. 5).Then, the energy is minimized along the direction of each conjugate vector to reach the next replica, x 1 .This replica is added to the path, which is represented as (r, x 1 , p).The procedure is repeated for the partial path (r, x 1 ) and (x 1 , p).If there is only one TS that exists between r and p, the maximum may not exist for one path segment.Therefore the procedure will not be carried out for this specific path segment.The procedure is carried out recursively on each new path segment defined by newly added replicas, until the root mean square (RMS) of the gradient falls under a predefined tolerance for the saddle points by refinement along conjugate vectors.In practice, the linear interpolation often produ c e s v e r y p o o r s t r u c t u r e s w i t h s e v e r e overlapping atoms.This could result in extremely high energies and gradients.Therefore, certain limitations need to be imposed on atomic movements during refinement.The CPR method can possibly identify one or more TSs between r and p through recursive searches and refinement.However, a smooth path will be not a direct result from the CPR procedure.From identified TSs, steepest descent calculations can be carried out to generate a final MEP r p i y 1 x 1 www.intechopen.combetween r and p. Implicitly, it is assumed that TSs identified through the CPR method lie on the same MEP.Otherwise, the steepest descent calculations from TSs do not necessarily generate a continuous path between r and p.The CPR method has been applied to study the ligand-binding pathways in myolobins, (Golden & Olsen 2008) several small molecule systems (Fischer et al. 1994;Fischer et al. 1995;Verma et al. 1996) as well as large molecules, such as proteins.(Blondel et al. 1999;Caflisch et al. 1997;Dutzler et al. 2002;Fischer et al. 1993;Gruia et al. 2005;Santos et al. 2000)

Reference path methods
In some applications, especially for macromolecules, the convergence to the MEP using chain-of-states methods could require a large number of iterations.Reference path methods were developed to generate reference reaction paths that are a good approximation to true MEPs with fast convergence rates and evenly distributed replicas along the path.Such reference paths can be used as an expansion to calculate free energies, reaction rates, or kinetic isotope effects.

Hyperplane projection methods
Czerminski and Elber (Czerminski & Elber 1989) proposed applying holonomic constraints on replica R along a reaction path connecting replicas R 1 and R 2 : where is a parameter that varies from 0 (reactant) to 1 (product) with small steps.For each value, the potential energy of the system V(R α ) is minimized while keeping the system in the hyperplane defined by Eq. ( 18).It is necessary to project out the rigid body motions from the optimization using either R 1 or R 2 as reference structures.The optimized path satisfies the condition that at any point along the path, only one direction may have negative energy curvature, because energy is minimized for all degrees of freedom except the direction perpendicular to the hyperplane defined by Eq. ( 18).The Powell conjugate gradient algorithm was applied to optimize the reaction path with linear constraints in this method.The conformational change of IAN between the helix and extended chain was studied using this method.The lowest energy path of the transition identified by the calculation was presented.In addition, the 257 direct transitions between 112 minima of IAN were also identified and subject to statistical analysis.The same authors later introduced another constraint to the path using parameter : where is the parameter between 0 (reactant) and 1 (product).(Czerminski & Elber 1990b) Combining application of and constraints, more comprehensive results were presented for conformational changes of IAN.

RPATH/restraint method
Woodcock et al. implemented the reaction path (RPATH) method with restraints of both best-fit root mean square distances (RMSD) and angles on replicas along the pathway.(Woodcock et al. 2003) The RMSD restraint forces are defined as where N is the number of replicas, k rms is the force constant used to restrain distance between adjacent replicas along the reaction pathway, r i is the best-fit RMSD between replica i and i+1, and r is the average distance between adjacent replicas.The angle energetic penalty term reads The angle , illustrated in Fig. 6, defines the deviation of the pathway from linearity.The force constant k ang controls the rigidity of the pathway.Constant COSMAX determines the value of cos( ) subject to the angle forces.
Fig. 6.Definition of angle θ for replica i in RPATH calculation.RMSD i-1,i is the distance between replica i-1 and i.It is similar to RMSD i,i+1 and RMSD i-1,i+1 .
The RPATH/restraint method was applied to study the mechanism of chorismate mutase.(Woodcock et al. 2003) All protein residues or water molecules that had any atom within 6 Å of the substrate were replicated 21 times to build the reaction path.All other atoms within the system served as the environment of all the replicas.Starting from a linear interpolation pathway, the optimization of the RPATH/restraint calculation was converged within 400 steps.The reaction barrier obtained from RPATH/restraint calculations was also in good agreement with experiments.

RPATH/constraint method
Recently, Chu and coworkers (Brokaw et al. 2009) applied an equal distance holonomic constraint in the RPATH framework.Given two states of a molecular system with N atoms, r 0 and r k , a chain of K+1 replicas can be constructed to connect these two states.The distance between each pair of adjacent replicas is set to be equal to each other: Here, Δl i is the distance between replica i and i+1.l  is the average distance between adjacent replicas.The distance (Δl) can be in any form, including best-fit RMS distance.To apply the equal distance constraint defined in Eq. ( 22), the following scheme is used to propagate the reaction path with two ends, r 0 and r k , fixed at the initial coordinates.
i. Set up and calculate initial average distance, l  , for replicas r 0(0) through r k(0) .The superscript "(0)" indicates the optimization iteration step.ii.To maintain the equal distance, a set of K coefficients, ( i ) (n) (i=0,K-1), are used to update the coordinates of each replica i: iii.Solve ( i ) (n) by setting the first order Taylor expansion of each of ( with respect to ( j ) (n) to zero: ) is greater than a selected tolerance, then repeat steps (ii) and (iii).After convergence, the RPATH calculation leads to a reaction path composed by K+1 equal distance replicas connecting states r 0 and r k .In the framework of using constraints with RPATH, a kinetic energy potential can be added to the potential energy with overall objective functions as a Hamiltonian for path optimization.Therefore the optimized path is the so-called minimum Hamiltonian path (MHP) instead of MEP.The kinetic energy component in the potential prevents kinks, therefore helping maintain the smoothness of the path.This smoothness comes at the cost of deviating from the MEP, resulting in higher reaction barriers, but with the benefit as a better starting point for free energy studies.The RPATH/constraint was applied to study the helix-to-sheet transition of a GNNQQNY heptapeptide.(Brokaw et al. 2009) An initial reaction path with 100 replicas was generated.Then a stable intermediate was chosen to divide the pathway into two segments in which the number of replicas was doubled.Using this divide-and-conquer strategy, a smooth reaction pathway with 464 replicas was obtained to describe the transition (Fig. 7).

Free energy sampling of reaction paths for large systems
The methods introduced in Section 3 provide a framework for determining a minimum potential energy pathway connecting two equilibrium states.For biological processes the minimum free energy pathway (MFEP) is often more desirable to compute, ensuring that the results obtained for the PES are converged, in a thermodynamic sense.Experimental observables can be transformed more readily to free energies of the reactant and product states.(Hu et al. 2008) Biased sampling methods, such as umbrella sampling (Bartels & Karplus 1997;Kästner 2011;Torrie & Valleau 1977;Torrie & Valleau 1974) and metadynamics (Bussi et al. 2006;Laio 2002;Laio & Gervasio 2008;Raiteri et al. 2006) have been widely applied to sample free energies of the transition path for large systems.Due to their high computational cost, the applications of these methods have been limited in studies of enzymatic reaction mechanisms using QM/MM methods.However, powerful modern computing systems and advanced methodologies make the dynamical simulation of enzymatic reactions for free energy profiles much more feasible than a decade ago.In this Fig. 7.An optimized path of the helix-to-sheet transition of a solvated GNNQQNY heptapeptide RPATH/constraint.The path contains a total of 464 replicas.Structures of selected replicas are shown to illustrate the nature of the transition.(Reprinted with permission from ref. (Brokaw et al. 2009).Copyright 2009 American Chemical Society.)section, methods for direct simulation of protein systems using QM/MM methods are surveyed.

QM/MM free energy path (FE) and minimum free energy (MFEP) path methods
QM/MM-FE methods have been pioneered by Yang and coworkers (Zhang et al. 2000) and are sequential based approaches for first optimizing an equilibrium geometry on a reaction potential energy surface, followed by the calculation of free energies between two states, or along a path.(Hu & Yang 2008) The first innovation of their approach involves separating the QM and MM parts of the system and optimizing them independently to allow for faster convergence.There is no concurrent optimization of the QM and MM energy functions making it effective in reducing the number of expensive QM energy and gradient evaluations.Starting from a given structure, the coordinates of the MM portion are frozen while the QM coordinates are optimized.The MM portion is subsequently minimized while the QM portion is held fixed and the process is repeated until convergence is reached for each partition.Further simplification is made by using an approximate QM/MM energy function for the MM minimization where only Coulombic interactions between the point charges of the MM atoms and the QM electrostatic potential (ESP) fitted charges are accounted for: In order for the approximation to yield a monotonic convergence of the QM/MM energy, the MM minimization must be able to approximate the PES closely.It is important to fully optimize the reactant and product structure at the same level of theory applied on QM/MM-FE calculations.
From the converged PES obtained through the sequential optimizations, a free energy of perturbation is performed: .. MMA represents an ensemble average over the MM sub-system with the QM region frozen to the optimized coordinates.(Zhang et al. 2000) A major advantage of the QM/MM-FE method over previous QM-FE approaches (Chandrasekhar & Jorgensen 1985;Chandrasekhar et al. 1985;Jorgensen 1989) is the inclusion of the enzymatic environment in the QM optimization.
Yang and coworkers (Hu & Yang 2008) have extended the QM/MM-FE method to overcome the challenge of having the optimized path be influenced by the initial starting conformation.This influence is problematic when solvent effects may be prominent in the reaction system, e.g. for solvent exposed active sites.Instead of using MM minimization in the sequential optimization scheme, MD sampling is used for obtaining the MM subsystem minima.Optimization of the QM region is then performed with a fixed MM conformational ensemble.Another key distinction of this so-called QM/MM-MFEP over the original QM/MM-FE method is that the optimization function for the QM region is the QM potential of mean force instead of the total energy surface.The associated gradient with respect to the i-th QM coordinate provides a convenient way to compute the free energy as an ensemble average over the QM atoms.The efficiency of the method derives from having a fixed size ensemble of MM conformations instead of repetitive sampling at each step.This is achieved through the updated QM reference structures of each iteration cycle.A precise potential of mean force (PMF) is obtained through optimization with classical numerical schemes.Convergence is equally quick, often achieved within 10 steps.

Transition path sampling (TPS) methods
First pioneered by Pratt (Pratt 1986) and developed much further by Chandler and his collaborators, (Bolhuis et al. 2002;Bolhuis et al. 2000;Dellago & Bolhuis 2004;Dellago & Bolhuis 2009;Dellago et al. 2002) TPS methods are a set of computational techniques to determine the transition process between two metastable states of complex systems.TPS does not require a reaction path a priori, because no specific or "typical" reaction pathway is pursued in TPS methods at all.Instead, a large number of transitions between two metastable states (say A and B in Fig. 8) need to be sampled to represent a transition path ensemble (TPE) between these states.The TPE comprises all the dynamical trajectories that start from one state and end up in the other, which are called reactive trajectories.It is impractical to sample the TPE by starting a dynamics simulation from either A or B in many trials and hoping that enough number of trials can overcome the barrier and reach the other state.This is simply because the transition between A and B is a rare event, and can only be observed in an extremely limited number of trajectories if at all.Instead of this direct sampling strategy, the importance sampling methods, such as generalized MC procedures, were applied to enhance the sampling efficiency and generate a collection of reactive trajectories with a frequency proportional to their probability weight in the TPE.Starting from a reactive trajectory, new trajectories can be generated through so-called shooting moves (Fig. 8).From initial trajectory, a random frame is selected as the shooting point.Then, the velocity of each particle in the shooting point is perturbed.From the shooting point with perturbed velocity, the equations of motion are integrated forward and backward to obtain a complete trajectory.In a simplified formalism, the new trajectory is accepted based on the acceptance probability: Then, the particle momenta at that point are modified by addition of a small perturbation p. From the point with perturbed momenta the equations of motion are integrated forward and backward to obtain a complete trajectory.For small perturbations, the new trajectory will be close to the old one near the shooting point but will then rapidly diverge from it due to the chaoticity of the underlying dynamics.(With kind permission from Springer Science+Business Media: Fig. 8 in ref. (Dellago & Bolhuis 2009).) Function h A [x 0 (n) ] indicates that whether the system resides in state A at time 0 in the new trajectory: h A [x 0 (n) ] is 1 for 0 xA  , and 0 otherwise.Function h B [x T (n) ] is defined analogously to indicate whether the system resides in state B at time T in the new trajectory.According to this expression, any new trajectory that is reactive will be accepted.The perturbation added to the shooting point can be adjusted to control the acceptance probability and optimize the TPE sampling efficiency.(Dellago et al. 1999) To begin path sampling, an initial reactive trajectory is required as the starting point.In most studies of rare events, this may not be achieved by simply running a long molecular dynamics trajectory starting from either metastable state.In practice, the system can be driven from one state to the other artificially.Such a trajectory may not carry large statistical weight in the TPE, but will be sufficient as a starting point for the sampling procedure.
A key factor for a successful TPS calculation is the sampling efficiency in the path space.For a simple system with only one MEP and one TS in between, the path sampling could A B δp converge quickly.However, the PES of a complex system may have multiple reaction pathways, with one or more stable intermediate states presenting on each pathway.To enhance the sampling efficiency for such a complex system, the TPS with path replica exchange (Bolhuis 2008) and multi states transition TPS (Rogal & Bolhuis 2008) were developed.Based on the TPS framework, a transition interface sampling (TIS) method (Moroni et al. 2004;Van Erp et al. 2003;Vanerp & Bolhuis 2005) was developed to measure the positive flux through a series of hypersurfaces in phase space for the calculation of reaction rate constants.The TPS and TIS methods were applied to simulate the -hairpin folding, (Bolhuis 2003) base-pair binding of DNA, (Hagan 2003) Trp-cage folding in explicit solvent, (Juraszek & Bolhuis 2006) and the reaction mechanism of lactate dehydrogenase.(Quaytman & Schwartz 2007)

Transition path theory (TPT)
Most of the reaction path theories are based on the TST, (Eyring 1935;Wigner 1938;Yamamoto 1960) with major emphasis on identifying a first order saddle point on the PES as the TS.The TS concept is also the foundation for the methods introduced in section 2 and 3.The TPS method bypasses the TS identification by sampling the reactive trajectories directly.(Bolhuis et al. 2002) Vanden-Eijnden and his coworkers advanced further by proposing TPT.(E et al. 2005b;E et al. 2005a;E & Vanden-Eijnden 2006;Maragliano et al. 2006;Metzner et al. 2006;Ren et al. 2005;Vanden-Eijnden ;Vanden-Eijnden & Venturoli 2009) The TPT is a framework to study the transition trajectories between two metastable states and the probability density functions of these reactive trajectories.In the TPT framework, the reactive trajectories are viewed as a portion of a long trajectory that oscillates between the metastable states A and B (Fig. 9).Two important quantities in TPT are probability density function and probability current function, which can be defined for a hypothetical long trajectory of an overdamped system with friction i and white noise i on compoment x i , as the following.For a given position x (neither in A nor in B) and t, the function q(x) defines the probability that the trajectory reaches first B rather than A. The probability density to observe a reactive trajectory at point x and time t is where Z is the partition function of the whole trajectory, V(x) is the potential function of the system at x. Another quantity of interest is the probability flux of reactive trajectories across an isoprobability surface, or isocommittor surface, dividing A and B through x.The flux is the vector field with ith component reads () 11 , () () where k B is the Boltzmann factor, T is the temperature.It should be noted that for a system in thermodynamic equilibrium, the net flux through any surface is zero.The overall reaction rate from A to B can be calculated by integrating the reactive flux through an isocommittor surface S dividing A and B: where n ˆ() S z is the unit vector perpendicular to S toward B. The TPT is a general theory about the rare transition events between metastable states based on reactive trajectories but in configurational space.Similar to the TPS, the concept of reactive trajectories does not require the identification of the TS, which is not well defined for many complex systems.The method to apply TPT in practice is finite temperature string (FTS) method, (E et al. 2005b) which is a finite temperature generalization of the ZTS method introduced in section 3.3.In the FTS method, the concept of string ensemble φ ω ( ) is introduced to have a mean value as the string φ( ), where is a parameter between [0,1].The string ensemble can evolve by the equation where  is the unit tangent vector along φ, ω is a white noise with covariance    Using isocommitter surface, a transition tube can be defined as certain region around the most probable reactive trajectory through isocommitter surfaces between A and B. These regions have a significant probability to be visited by reactive trajectories.Of course, there could exist more than one transition tube between two metastable states.These may correspond to the multiple reaction pathways or mechanisms observed on the PES.The free energy associated with isocommitter surface S( ) is given as In this presentation, these hyperplanes serve as reaction coordinates of the transition between metastable states.The free energies are a minimum at A and B. There may exist one or more maxima between the two metastable states.
The FTS method has been applied in a couple of biomolecular systems very recently.In one study, (Rosta et al. 2011) the mechanism of cleavage of the RNA backbone catalyzed by ribonuclease H is elucidated through QM/MM simulations.The converged strings for the entire reaction are reported with estimated free energies along the path (Fig. 10).In another study, (Ovchinnikov et al. 2011) the conformational change between the prepowerstroke and rigor states of myosin VI was simulated using a replica exchange umbrella sampling The harmonic Fourier bead (HFB) approximation proposed by Khavrutskii and his coworkers (Khavrutskii et al. 2006;Khavrutskii et al. 2008a;Khavrutskii & Mccammon 2007) possesses many similarities with FTS method.The main difference is the representation and parametrization of the path by 1 ( ) (0) ( (1) (0)) sin( )

A B
where is single progress parameter between 0 and 1, q i is the ith bead, which is the ith component of the configuration vector R=(q 1 ,…,q 3N ), i n b are the amplitudes of the Fourier basis functions.The string or reaction path in HFB can evolve under certain constraints or reparametrization, similar to ZTS and FTS, to obtain MEP and MFEP.The HFB method was applied to study the conformational transition of the signature peptide of the KcsA ion selectivity filter.(Khavrutskii et al. 2008b) A novel hypothesis of the ion selectivity mechanism was proposed based on the HFB simulation results.

Action based methods
There is a group of methods that are based on minimization of action functional of transition paths.According to the Onsager-Machlup (Onsager & Machlup 1953) and Freidlin-Wentzell theories,(E et al. 2004;Freidlin 1984) the most probable path for the random dynamical system is the minimum action path (MAP).The MAP is a more general definition than MEP, for MEP is undefined for nongradient systems.For gradient systems with energy landscapes, the MAP degenerates with MEP in the high friction limit.Therefore, MEP can be obtained through optimization of path action.Olender and Elber (Olender 1997) first proposed to minimize a functional, which is called "scalar work" for a trajectory: where V is the potential energy of the system, dl is a segment of the trajectory.For the steepest descent path, W scalar has a minimum value, where V  is parallel to dl along the path.More recently, the geometric minimum action method (Vanden-Eijnden & Heymann 2008) and adaptive minimum action method (Zhou et al. 2008) were developed as two action based methods for MEP optimization.Some action-based methods are developed to simulate the reaction path.Doniach and his coworkers (Eastman et al. 2001) proposed reaction path annealing methods to simulate protein folding trajectories which are distributed according to the Onsager-Machlup action functional: where the action S[X(t)] can be calculated for an overdamped system with friction in discretization form as M is the mass matrix, V is the potential energy function, and Δt is the time step of the trajectories.The reaction path annealing method was employed to simulate the conformational changes among the -helix, -sheet and the random coil of a seven residue peptide from prion-like protein.(Lipfert et al. 2005) Faccioli and co-workers developed the dominant reaction path (DRP) method, and applied it on numerous systems.(Autieri et al. 2009;Faccioli 2008;Faccioli et al. 2006;Mazzola et al. 2011;Sega et al. 2007) Their method is based on the Fokker-Planck equation, which defines the probability of finding a particle at position x and time t: where D is the diffusion constant, U is the potential energy function for the particle.The Boltzmann distribution of probability P(X)~exp(−V(X)/k B T) is a stationary solution of Eq. ( 40).Using a substitution where the effective Hamiltonian operator is defined as and the effective potential is defined as A path-integral representation can be obtained as the solution of Eq. ( 42) subject to the boundary conditions as that the system is in state A at time 0, and state B at time T. this presentation also allows a switch from the time-dependent Newtonian description to the energy-dependent Hamilton-Jacobi (HJ) description.In the HJ framework, the most probable trajectory can be obtained by minimizing the target HJ functional  XX () where is the friction, and E eff is an effective energy parameter, which determines the total time of trajectory.In very recent studies, the simulation of reaction pathways connecting two boundary states were carried out to search for the dominant reaction pathway for folding of a small peptide, (Faccioli et al. 2010) an organic reaction from cyclobutene to butadiene, (Beccara et al. 2010) and the folding of tetraalanine at semiempirical level of theory.(Beccara et al. 2011) Although the DPR method is still in its early stage of development, the current progress demonstrates the efficiency of this method in sampling the reaction pathways for rare transition events.

Summary
The development of reaction path methods to study the mechanism of rare events of high dimensional systems has been an active research field for more than two decades.Many efficient methods have been developed and applied in numerous applications.The IRC is a well defined reaction pathway of the MEP.It plays an important role in explaining reactions in complicated systems, especially when the reaction coordinates are not simple order parameters.However, one can only generate an IRC after obtaining a TS.In many studies, identifying a TS is the bottleneck of the research.In some cases, the IRC generated from a TS may not connect with the target reactant and product.For large systems, there may exist multiple reaction pathways between the reactant and product.To map out multiple reaction pathways through IRC, it is required to identify multiple TSs a priori, which is a very challenging task.The chain-of-states methods were developed to simultaneously optimize a series of replicas as the representation of the reaction pathway connecting two equilibrium states.Starting from an initial guess, the path optimization could lead to a reaction pathway that closely resembles the MEP between two equilibrium states.One advantage of the chainof-states methods is that no TS is required to generate a reaction path.The replicas with highest energy from reaction path calculations are often subject to a TS search for better estimation of the reaction barriers.
For large systems, special caution is needed when carrying out the IRC calculations or chainof-states calculations.To ensure the continuity of the reaction pathways that connect the reactant and product, large amounts of spectator degrees of freedom need to be controlled, or frozen, during the calculations.This treatment can make the working PES cleaner, but has the drawback of ignoring important degrees of freedom that contribute to the free energies, which can be directly compared with the experimental observations.Recently, more research has been reported to develop methods that sample the reaction pathway directly in the transition path space, rather than the phase space.With these developments, multiple reaction pathways could potentially be sampled without obtaining TSs for each pathway a priori.For rare events with high thermodynamic or kinetic barriers, the efficiency of sampling in the reaction path space is higher than the one of sampling in the phase space.
The free energies could be estimated for each pathway with the sampled ensembles.Currently, the reaction path sampling method is still in its early stage of development for both theory and sampling techniques.It is clear that advanced computational power, theories and sampling techniques are in sight to make the computational study of the mechanisms of rare events for large systems more practical and appealing.

Fig. 3 .
Fig. 3. Initial string and calculated MEP using the string method with ten images.The empty circle indicates the saddle point identified by combining the string method with the climbing image technique.(Reprinted with permission from ref. (E et al. 2007).Copyright 2007, American Institute of Physics.)

Fig. 8 .
Fig.8.In the shooting algorithm for deterministic dynamics a new path (green) is generated from an old one (black) by first randomly selecting one point on the old path, the shooting point.Then, the particle momenta at that point are modified by addition of a small perturbation p. From the point with perturbed momenta the equations of motion are integrated forward and backward to obtain a complete trajectory.For small perturbations, the new trajectory will be close to the old one near the shooting point but will then rapidly diverge from it due to the chaoticity of the underlying dynamics.(With kind permission from Springer Science+Business Media: Fig.8in ref.(Dellago & Bolhuis 2009).)

Fig. 9 .
Fig. 9. Schematic illustration of a long trajectory oscillating between the reactant state A and the product state B. The reactive pieces of this trajectory, during which the system travels from A to B, are shown in green.(Reprinted with permission from ref. (E & Vanden-Eijnden 2010).Copyright 2010 Annual Reviews.)

Fig. 10 .
Fig. 10.2D free energy surface and reaction strings for the complete catalytic reaction with a protonated Asp132, obtained by projection onto coordinates Qe = ET probing bond formation and breaking and Qp = (PT1a+PT1b)/2+PT2 probing PT steps.The black curve shows the initial string; the blue and purple curves show the converged strings for the entire reaction and for the final step, respectively.The inset shows a schematic of the active-site coordination in the intermediate state.(Reprinted with permission from ref. (Rosta et al. 2011).Copyright 2011 American Chemical Society.)algorithm in a string method framework.The free energy and reaction rate of the transition were calculated and found to be consistent with the experimental observation.(De La Cruz et al. 2001) The harmonic Fourier bead (HFB) approximation proposed by Khavrutskii and his coworkers(Khavrutskii et al. 2006;Khavrutskii et al. 2008a;Khavrutskii & Mccammon 2007) possesses many similarities with FTS method.The main difference is the representation and parametrization of the path by